from:"Gregory Haskins"

Re: [kvm-devel] High vm-exit latencies during kvm boot-up/shutdown

2007-10-23 Thread Gregory Haskins

On Tue, 2007-10-23 at 16:19 +0200, Avi Kivity wrote:
> Jan Kiszka wrote:
> > Avi,
> >
> > [somehow your mails do not get through to my private account, so I'm
> > switching]
> >
> > Avi Kivity wrote:
> >   
> >> Jan Kiszka wrote:
> >> 
> >>> Clarification: I can't precisely tell what code is executed in VM mode,
> >>> as I don't have qemu or that guest instrumented. I just see the kernel
> >>> entering VM mode and leaving it again more than 300 us later. So I
> >>> wonder why this is allowed while some external IRQ is pending.
> >>>
> >>>   
> >>>   
> >> How do you know an external interrupt is pending?
> >> 
> >
> > It's the host timer IRQ, programmed to fire in certain intervals (100 us
> > here). Test case is some latency measurement tool like tglx's cyclictest
> > or similar programs we use in Xenomai.
> >
> >   
> >> kvm programs the hardware to exit when an external interrupt arrives.
> >>
> >> 
> >
> > Here is a latency trace I just managed to capture over 2.6.23.1-rt1 with 
> > latest kvm from git hacked into (kvm generally seems to work fine this way):
> >
> > ...
> > qemu-sys-7543  0...1 13897us : vmcs_write16+0xb/0x20 
> > (vmx_save_host_state+0x1a7/0x1c0)
> > qemu-sys-7543  0...1 13897us : vmcs_writel+0xb/0x30 (vmcs_write16+0x1e/0x20)
> > qemu-sys-7543  0...1 13898us : segment_base+0xc/0x70 
> > (vmx_save_host_state+0xa0/0x1c0)
> > qemu-sys-7543  0...1 13898us : vmcs_writel+0xb/0x30 
> > (vmx_save_host_state+0xb0/0x1c0)
> > qemu-sys-7543  0...1 13898us : segment_base+0xc/0x70 
> > (vmx_save_host_state+0xbf/0x1c0)
> > qemu-sys-7543  0...1 13898us : vmcs_writel+0xb/0x30 
> > (vmx_save_host_state+0xcf/0x1c0)
> > qemu-sys-7543  0...1 13898us : load_msrs+0xb/0x40 
> > (vmx_save_host_state+0xe7/0x1c0)
> > qemu-sys-7543  0...1 13898us : kvm_load_guest_fpu+0x8/0x40 
> > (kvm_vcpu_ioctl_run+0xbf/0x570)
> > qemu-sys-7543  0D..1 13899us : vmx_vcpu_run+0xc/0x110 
> > (kvm_vcpu_ioctl_run+0x120/0x570)
> > qemu-sys-7543  0D..1 13899us!: vmcs_writel+0xb/0x30 
> > (vmx_vcpu_run+0x22/0x110)
> > qemu-sys-7543  0D..1 14344us : vmcs_read32+0xb/0x20 
> > (vmx_vcpu_run+0xc7/0x110)
> > qemu-sys-7543  0D..1 14345us : vmcs_readl+0x8/0x10 (vmcs_read32+0x16/0x20)
> > qemu-sys-7543  0D..1 14345us : vmcs_read32+0xb/0x20 
> > (vmx_vcpu_run+0xf4/0x110)
> > qemu-sys-7543  0D..1 14345us+: vmcs_readl+0x8/0x10 (vmcs_read32+0x16/0x20)
> > qemu-sys-7543  0D..1 14349us : irq_enter+0xb/0x30 (do_IRQ+0x45/0xc0)
> > qemu-sys-7543  0D.h1 14350us : do_IRQ+0x73/0xc0 (f8caae24 0 0)
> > qemu-sys-7543  0D.h1 14351us : handle_level_irq+0xe/0x120 (do_IRQ+0x7d/0xc0)
> > qemu-sys-7543  0D.h1 14351us : __spin_lock+0xc/0x30 
> > (handle_level_irq+0x24/0x120)
> > qemu-sys-7543  0D.h2 14352us : mask_and_ack_8259A+0x14/0x120 
> > (handle_level_irq+0x37/0x120)
> > qemu-sys-7543  0D.h2 14352us+: __spin_lock_irqsave+0x11/0x60 
> > (mask_and_ack_8259A+0x2a/0x120)
> > qemu-sys-7543  0D.h3 14357us : __spin_unlock_irqrestore+0xc/0x60 
> > (mask_and_ack_8259A+0x7a/0x120)
> > qemu-sys-7543  0D.h2 14358us : redirect_hardirq+0x8/0x70 
> > (handle_level_irq+0x72/0x120)
> > qemu-sys-7543  0D.h2 14358us : __spin_unlock+0xb/0x40 
> > (handle_level_irq+0x8e/0x120)
> > qemu-sys-7543  0D.h1 14358us : handle_IRQ_event+0xe/0x110 
> > (handle_level_irq+0x9a/0x120)
> > qemu-sys-7543  0D.h1 14359us : timer_interrupt+0xb/0x60 
> > (handle_IRQ_event+0x67/0x110)
> > qemu-sys-7543  0D.h1 14359us : hrtimer_interrupt+0xe/0x1f0 
> > (timer_interrupt+0x20/0x60)
> > ...
> >
> > One can see 445 us latency between vm-enter and vm-exit in vmx_vcpu_run -
> > and this while cyclictest runs at a period of 100 us!
> >
> > I got the same results over Adeos/I-pipe & Xenomai with the function
> > tracer there, also pointing to the period while the CPU is in VM mode.
> >
> > Anyone any ideas? Greg, I put you on CC as you said you once saw "decent
> > latencies" with your patches. Are there still magic bits missing in
> > official kvm?
> >   
> 
> No bits missing as far as I know.  It should just work.

That could very well be the case these days.  I know back when I was
looking at it, KVM would not run on VMX + -rt without modification or it
would crash/hang (this was around the time I was working on that
smp_function_call stuff).  And without careful modification it would run
very poorly, with high (300us+) latencies revealed in cyclictest.

However, I was able to craft the vmx_vcpu_run path so that a VM could
run side-by-side with cyclictest with sub 40us latencies.  In fact,
normally it was sub 30us, but on an occasional run I would get a spike
to ~37us.
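
For readers who have not used cyclictest, the kind of measurement behind
these numbers can be sketched roughly as follows.  This is only a minimal
illustration, not cyclictest itself: the function names are made up for
this example, and a real test would also run under SCHED_FIFO with memory
locked, which is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Sleep until absolute deadlines spaced period_ns apart and return the
 * worst observed wake-up latency in nanoseconds, or -1 on error.
 * (Hypothetical helper, named for this sketch only.)
 */
int64_t max_wakeup_latency_ns(int64_t period_ns, int loops)
{
	struct timespec next, now;
	int64_t worst = 0;

	assert(period_ns > 0 && period_ns < NSEC_PER_SEC && loops > 0);

	if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
		return -1;

	for (int i = 0; i < loops; i++) {
		/* Advance the absolute deadline by one period. */
		next.tv_nsec += period_ns;
		if (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}

		/* Sleep until the deadline, then see how late we woke up. */
		if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
			return -1;
		if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
			return -1;

		int64_t lat = ts_to_ns(&now) - ts_to_ns(&next);
		if (lat > worst)
			worst = lat;
	}
	return worst;
}
```

Calling max_wakeup_latency_ns(100000, 10000) would approximate a 100 us
period run; the returned worst case is the figure being compared in this
thread.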

Unfortunately I am deep into other non-KVM related -rt issues at the
moment, so I can't work on it any further for a bit.

Regards,
-Greg


signature.asc
Description: This is a digitally signed message part
-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and config

Re: [kvm-devel] KVM and Prempt?

2007-10-19 Thread Gregory Haskins

On Fri, 2007-10-19 at 16:17 +0200, Back, Michael (ext) wrote:

> >Are you referring to the -rt patch?
> 
> Yes, I take the patch from
> http://www.kernel.org/pub/linux/kernel/projects/rt/
> 

Ah, yes.  There were problems in the past with running KVM on -rt if you
have the full kit enabled.  It's possible they are still not fixed.

I had patches to address this issue for an older KVM and had it up and
running on a full PREEMPT_RT kernel, but they have become quite stale at
this point.  I will be picking up this work again sometime in the near
future, so watch this space.

Are you building KVM as a module from the -rt tree, or as an external
module?

-Greg


Re: [kvm-devel] KVM and Prempt?

2007-10-19 Thread Gregory Haskins

Hi Michael,

On Fri, 2007-10-19 at 15:32 +0200, Back, Michael (ext) wrote:
> 
> 
> 2.6.31.1 should be 2.6.23.1 - sorry
> 
> _  
> From: Back, Michael (ext)
> Sent: Friday, October 19, 2007 3:16 PM
> To: 'kvm-devel@lists.sourceforge.net'
> Subject: KVM and Prempt?
> 
> 
> Hello, 
> I tried to run Windows XP with KVM on Linux 2.6.31.1 on an AMD Opteron
> and on an Intel Xeon; on both it works fine!
> 
> After this test I patched the kernel with the current preempt patch, and
> on both it doesn't work! 

Are you referring to the -rt patch?

> -> After a very short time - I could see the windows startup screen -
> the complied system froze!
> 
> Has someone ever tried to do the same and it works? 
> Or will KVM with Windows on a prempt kernel  
> - never work? 
> - maybe work in the future? 
> - should now work but this .. and this … should be done and consider?
> 
> With best regards, 
> Michael Back
> 
> # 
> ASTRUM IT GmbH
> Michael Back
> 
> Dipl.-Ing. Univ.
> software engineer
> Projectmanagement
> 
> Am Wolfsmantel 46
> 91058 Erlangen, Germany
> Tel.: +49 (91 31) 94 08 - 374
> Fax: +49 (91 31) 94 08 - 108  
>  [EMAIL PROTECTED]>  
> 
> contact address by Siemens
> Siemens AG
> Medical Solutions
> MED MRZ
> Allee am Roethelheimpark 2
> 91052 Erlangen, Germany
> Tel.: +49 (9131) 84-5307
> Fax: +49 (9131) 84-8767
>  [EMAIL PROTECTED]>
> 
> Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard
> Cromme; Managing Board: Peter Loescher, Chairman, President and Chief
> Executive Officer; 
> 
> Heinrich Hiesinger, Joe Kaeser, Rudi Lamprecht, Eduardo Montes,
> Juergen Radomski, Erich R. Reinhardt, Hermann Requardt, Uriel J.
> Sharef, Peter Y. Solmssen, Klaus Wucherer;
> 
>  Registered offices: Berlin and Munich; Commercial registries: Berlin
> Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE 23691322
> 



Re: [kvm-devel] [ kvm-Bugs-1807620 ] KVM's --disable-gcc-check doesn't work

2007-10-04 Thread Gregory Haskins

On Thu, 2007-10-04 at 21:49 +0200, Farkas Levente wrote:
> Gregory Haskins wrote:
> > On Thu, 2007-10-04 at 19:27 +0200, Farkas Levente wrote:
> > 
> >> ok but now that qemu code was imported into kvm, it would probably
> >> be better to switch to gcc-4.x?
> > 
> > Sure.  Are you volunteering? ;)  I'm sure both upstream QEMU developers,
> > KVM developers, and the community using either would be most
> > appreciative.  I know I would be.
> 
> here i mean packaging kvm for fedora/redhat/centos using gcc-4.x
> instead of gcc-3.x. if currently there is no reason to use gcc-3.x then i
> will change all of my spec files.

You *could*, sure.  I have done this for local builds here.  But if you
go that route I would recommend making a patch to KVM so it doesn't fall
back into QEMU mode automatically (today if it can't open the kvm module
it will assume "-no-kvm" like behavior).  Otherwise you will have a
bunch of support calls about why it's not working properly should someone
cause the system to fall back.
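
To make the suggestion concrete, here is a rough sketch of the policy I
mean: refuse to start rather than silently dropping into emulation.  It is
an illustration only and is not taken from the KVM or QEMU sources; the
pick_accel() name, the enum, and the allow_fallback knob are all
hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

enum accel { ACCEL_KVM, ACCEL_EMULATION, ACCEL_NONE };

/*
 * Decide which execution mode to use.  dev_path would normally be
 * "/dev/kvm"; allow_fallback is a made-up policy flag for this sketch,
 * not an existing command-line option.
 */
enum accel pick_accel(const char *dev_path, int allow_fallback)
{
	int fd;

	assert(dev_path != NULL);

	fd = open(dev_path, O_RDWR);
	if (fd >= 0) {
		/* The device node is usable; a real caller would keep the fd. */
		close(fd);
		return ACCEL_KVM;
	}

	if (allow_fallback) {
		fprintf(stderr, "warning: cannot open %s, using emulation\n",
			dev_path);
		return ACCEL_EMULATION;
	}

	/* A gcc-4.x build cannot emulate reliably, so refuse to continue. */
	fprintf(stderr, "error: cannot open %s and fallback is disabled\n",
		dev_path);
	return ACCEL_NONE;
}
```

A package built with gcc-4.x would call this with allow_fallback set to 0,
so a missing module produces a clear error instead of a broken guest.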

> 
> ps. anyway, is it planned to be temporary or permanent to use a qemu fork
> for kvm and not try to propagate changes back to the upstream qemu?

I'll defer to Avi here, though as I understand it: things that benefit
the upstream source get pushed; things that are KVM-only don't.

Regards,
-Greg


Re: [kvm-devel] [ kvm-Bugs-1807620 ] KVM's --disable-gcc-check doesn't work

2007-10-04 Thread Gregory Haskins

On Thu, 2007-10-04 at 19:27 +0200, Farkas Levente wrote:

> ok but now that qemu code was imported into kvm, it would probably
> be better to switch to gcc-4.x?

Sure.  Are you volunteering? ;)  I'm sure both upstream QEMU developers,
KVM developers, and the community using either would be most
appreciative.  I know I would be.

Regards,
-Greg


Re: [kvm-devel] [ kvm-Bugs-1807620 ] KVM's --disable-gcc-check doesn't work

2007-10-04 Thread Gregory Haskins

On Thu, 2007-10-04 at 18:33 +0200, Farkas Levente wrote:
> hi,
> what's the real reason that kvm can't be compiled gcc-4.x?
> wouldn't it be better to be able to compile with the current compilers too?

It's actually an issue with QEMU's CPU emulation code.  It takes advantage
of compiler traits that are no longer true in the 4.x series.  The code
will actually compile under 4.x; it just will not run properly.
However, KVM doesn't use QEMU's CPU emulation, so if you do not care
about running with KVM disabled (e.g. -no-kvm), you don't technically
need to worry about it.

Hope that helps.
-Greg

> 
> SourceForge.net wrote:
> > Bugs item #1807620, was opened at 2007-10-04 18:25
> > Message generated for change (Tracker Item Submitted) made by Item Submitter
> > You can respond by visiting: 
> > https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1807620&group_id=180599
> > 
> > Please note that this message will contain a full copy of the comment 
> > thread,
> > including the initial issue submission, for this request,
> > not just the latest update.
> > Category: None
> > Group: None
> > Status: Open
> > Resolution: None
> > Priority: 5
> > Private: No
> > Submitted By: Technologov (technologov)
> > Assigned to: Nobody/Anonymous (nobody)
> > Summary: KVM's --disable-gcc-check doesn't work
> > 
> > Initial Comment:
> > KVM's configure switch: "--disable-gcc-check" doesn't work.
> > 
> > This bug makes KVM very hard to compile on openSUSE 10.2/10.3, which, 
> > unlike Fedora doesn't have compat-gcc-34 package.
> > 
> > Here is the error:
> > 
> > [EMAIL PROTECTED]:~/Linstall/kvm-45> ./configure --disable-gcc-check
> > ./configure: cannot locate gcc 3.x. please install it or specify with 
> > --qemu-cc
> > 
> > According to HELP, it should work:
> > [EMAIL PROTECTED]:~/Linstall/kvm-45> ./configure --help
> > Usage: ./configure [options]
> > 
> > Options include:
> > 
> > --prefix=PREFIXwhere to install things (/usr/local)
> > --with-patched-kernel  don't use external module
> > --kerneldir=DIRkernel build directory 
> > (/lib/modules/2.6.22.5-31-bigsmp/build)
> > --qemu-cc=""   compiler for qemu (needs gcc3.x) ()
> > --disable-gcc-checkdon't insist on gcc-3.x
> >- this will break running without kvm
> > 
> > 
> > Host: openSUSE 10.3, 32-bit, Intel Core 2 CPU, KVM-45.
> > 
> > -Alexey Technologov
> > 
> > --
> > 



Re: [kvm-devel] What happens on an INT80 instruction

2007-10-02 Thread Gregory Haskins

On Mon, 2007-10-01 at 17:23 -0600, Cam Macdonell wrote:

> 
> Actually, I looking into doing a PhD dissertation :)  I'm just trying to 
> get a better working understanding of how kvm (and other VMMs) handle 
> instructions like int80 that should trap into the OS, but of course in a 
> VM need to trap into the guest OS (which is running at user-level) and 
> not the host OS.  Do traps by a guest app to the guest OS involve the 
> VMM at all?

Hi Cam,
   The answer has to do with a few different variables:

1) The capabilities of the virtualization technology hw (e.g. VMX, SVM,
etc)

2) The programming of those capabilities by the VMM

The important thing to remember when using VM hardware is: they aren't
*really* executing guest code in "userspace" (though that is a very nice
way to think of them in many respects; I do this myself when it's
convenient).  They are really executing in a special context
("guest-mode") where the hardware can be programmed in various ways by
the VMM.

Intel VMX, for instance (and AMD SVM is similar), allows the guest to
have its own Interrupt Descriptor Table (IDT) independent of the host's
IDT.  This governs how interrupts are handled when they are injected
into the guest context, just like the host IDT governs how they are
delivered to the VMM.  One primary difference, however, is the host has
some programmatic control over the behavior of the guest (within the
constraints of the hardware capabilities, of course).  For instance, the
host can program the VMX hardware to cause a VMEXIT when certain
instructions or events happen inside the guest.

VMX has such a control for INTx instructions (see Section 20.6.3 in the
Intel SDM Volume 3b), but (IIUC, and as Anthony mentioned) it is
limited to the first 32 of the 256 vectors in x86 (i.e. the "hardware
exceptions"), whereas the remaining vectors are not trappable.  What this
means is that if the vector is >= 32, or if it's < 32 but the
exit-control is not enabled, the INTx instruction will be delivered right
to the guest's IDT without leaving guest context.  Otherwise, it will
VMEXIT back to the host.

IIUC, KVM in particular only sets the control for a handful of the 32
vectors (#PF is one, I'm pretty sure ;).  KVM doesn't care about INT80,
and the VMX hardware doesn't support that exit-condition even if it did.
What this means is that on KVM/VMX, an INT80 is delivered to whatever
the guest set up in its own IDT for vector 0x80, and that's it.  The host
wouldn't even know, per se.  However, I'm sure there might be some
VMM/HW combo out there other than KVM that might trap INT80, so YMMV.
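
The rule as I described it above can be written down as a toy model.
This only restates my understanding from this message (one selectable bit
per vector for vectors 0..31, nothing for the rest); it has not been
checked against the SDM, and the function name is invented for the
example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the rule described in this message: only vectors 0..31
 * can be selected for a VM exit, via one bit each in a 32-bit bitmap.
 * Returns 1 if the vector would exit to the host, 0 if it would be
 * delivered through the guest's own IDT.
 */
int vector_causes_vmexit(uint32_t exit_bitmap, unsigned int vector)
{
	assert(vector < 256);	/* x86 has 256 vectors */

	if (vector >= 32)
		return 0;	/* not trappable in this model */

	return (int)((exit_bitmap >> vector) & 1u);
}
```

With only the #PF bit (vector 14) set, vector 14 exits while vector 0x80
stays inside the guest, which is the INT80 case being asked about.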

I hope this helps to clarify.

Good luck on that dissertation!
-Greg


Re: [kvm-devel] Userspace hypercalls?

2007-08-27 Thread Gregory Haskins

(Responding from the beach ;)

I've already done a good portion of what is being discussed here in my pvio 
series.  Please review it and give me your feedback.

For instance, you will see the single hypercall passing the channel id down.   
-Original Message- 
From: Avi Kivity <[EMAIL PROTECTED]> 
Cc: kvm-devel <[EMAIL PROTECTED]> 
To: Anthony Liguori <[EMAIL PROTECTED]> 

Sent: 8/27/2007 10:47:22 AM 
Subject: Re: [kvm-devel] Userspace hypercalls? 

Avi Kivity wrote: 
> 
> Thinking a little more about this, it isn't about handling hypercalls  
> in userspace, but about handling a virtio sync() in userspace. 
> 
> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's  
> event channel, but asymmetric) that has a channel parameter.  The  
> kernel handler for that hypercall dispatches calls to either a kernel  
> handler or a userspace handler.  That means we don't need a separate  
> ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls. 

And thinking a tiny little bit more about this, we can have the kernel  
(optionally) fire an eventfd, so a separate userspace thread or process  
can be woken up to service the device, without a heavyweight exit. 
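
The single-hypercall idea discussed here can be pictured with a small
sketch.  It assumes nothing about the real interface: the table, the enum,
and both function names are hypothetical, and the "handlers" are reduced
to a tag saying where the wake-up would be serviced.

```c
#include <assert.h>
#include <stddef.h>

enum handler_kind { HANDLER_NONE, HANDLER_KERNEL, HANDLER_USER };

#define MAX_CHANNELS 8	/* arbitrary size for the sketch */

struct channel_table {
	enum handler_kind kind[MAX_CHANNELS];
};

/* Bind a channel id to either an in-kernel or a userspace handler. */
void channel_bind(struct channel_table *t, unsigned int id,
		  enum handler_kind k)
{
	assert(t != NULL && id < MAX_CHANNELS);
	t->kind[id] = k;
}

/*
 * Model of one "wake channel" hypercall: the channel id alone selects
 * where the wake-up is serviced, so no per-device hypercall numbers are
 * needed.  Unknown ids report HANDLER_NONE.
 */
enum handler_kind wake_channel(const struct channel_table *t,
			       unsigned int id)
{
	assert(t != NULL);

	if (id >= MAX_CHANNELS)
		return HANDLER_NONE;

	return t->kind[id];
}
```

The point of the design is that adding a device means binding another
channel id, not defining another hypercall.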

--  
Any sufficiently difficult bug is indistinguishable from a feature. 


Re: [kvm-devel] [PATCH 00/13] PV-IO v4

2007-08-24 Thread Gregory Haskins

On Fri, 2007-08-24 at 17:22 -0700, Dor Laor wrote:

> Cheers and enjoy the vacation

Thanks!

>  (you're not going to Tucson, Arizona, are you...)

While I can't quite say I'd rather be in AZ than on vacation with the
family ;), I am disappointed that the timing will prevent me from
joining you guys.  Have a good time out there, and I will talk to you
guys when I get back.

-Greg

(Thanks for the patches, BTW.  I will try to take a look ASAP)


[kvm-devel] [PATCH 13/13] KVM: Add an IOQNET backend driver

2007-08-24 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig   |   10 +
 drivers/kvm/Makefile  |4 
 drivers/kvm/ioqnet_host.c |  578 +
 3 files changed, 591 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index b45c9c3..942acc5 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -46,6 +46,16 @@ config KVM_PV_HOST
depends on KVM
select IOQ
 
+config KVM_IOQNET
+tristate "IOQNET backend host support"
+   depends on KVM_PV_HOST
+   ---help---
+   Adds a backend driver for connecting guest IOQNET drivers to a host
+based netif interface.  This ethernet like interface can then be used
+   to wire the guest into more elaborate network configurations such as
+   via a standard linux bridge.  You only need this if you plan to run
+   guests which have an IOQNET driver.  If unsure, say N.
+
 config KVM_GUEST
bool "KVM Guest support"
depends on X86
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index e7b52e8..e4df631 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -13,4 +13,6 @@ obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
 kvm-amd-objs = svm.o
 obj-$(CONFIG_KVM_AMD) += kvm-amd.o
 kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
-obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
\ No newline at end of file
+obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
+kvm-ioqnet-objs := ioqnet_host.o
+obj-$(CONFIG_KVM_IOQNET) += kvm-ioqnet.o
\ No newline at end of file
diff --git a/drivers/kvm/ioqnet_host.c b/drivers/kvm/ioqnet_host.c
new file mode 100644
index 000..ffec49e
--- /dev/null
+++ b/drivers/kvm/ioqnet_host.c
@@ -0,0 +1,578 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * ioqnet - A paravirtualized network device based on the IOQ interface.
+ *
+ * This module represents the backend driver for an IOQNET driver on the KVM
+ * platform.
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * Derived in part from the SNULL example from the book "Linux Device
+ * Drivers" by Alessandro Rubini and Jonathan Corbet, published
+ * by O'Reilly & Associates.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include  /* printk() */
+#include  /* kmalloc() */
+#include   /* error codes */
+#include   /* size_t */
+#include  /* mark_bh */
+
+#include 
+#include/* struct device, and other headers */
+#include  /* eth_type_trans */
+#include   /* struct iphdr */
+#include  /* struct tcphdr */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus_host.h"
+#include "kvm.h"
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#define IOQNET_NAME "ioqnet"
+
+/*
+ * FIXME: Any "BUG_ON" code that can be triggered by a malicious guest must
+ * be turned into an inject_gp()
+ */
+
+struct ioqnet_queue {
+   struct ioq  *queue;
+   struct ioq_notifier  notifier;
+};
+
+struct ioqnet_priv {
+   spinlock_t   lock;
+   struct kvm  *kvm;
+   struct kvm_pv_device pvdev;
+   struct net_device   *netdev;
+   struct net_device_stats  stats;
+   struct ioqnet_queue  rxq;
+   struct ioqnet_queue  txq;
+   struct tasklet_structtxtask;
+   int  connected;
+   int  opened;
+};
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+/*
+ * Enable and disable receive interrupts.
+ */
+static void ioqnet_rx_ints(struct net_device *dev, int enable)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   struct ioq *ioq = priv->rxq.queue;
+
+   if (priv->connected) {
+   if (enable)
+   ioq_start(ioq, 0);
+   else
+   ioq_stop(ioq, 0);
+   }
+}
+
+/*
+ * Open and close
+ */
+
+int ioqnet_open(struct net_device *dev)
+{
+   struct ioqnet_priv *p

[kvm-devel] [PATCH 12/13] KVM: Add PVBUS support to the KVM host

2007-08-24 Thread Gregory Haskins

PVBUS allows VMM agnostic PV drivers to discover/configure virtual resources

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig  |2 
 drivers/kvm/Makefile |1 
 drivers/kvm/kvm.h|3 
 drivers/kvm/kvm_main.c   |4 
 drivers/kvm/pvbus_host.c |  636 ++
 drivers/kvm/pvbus_host.h |   66 +
 6 files changed, 711 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index b81a188..b45c9c3 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -45,7 +45,7 @@ config KVM_PV_HOST
 boolean "Add paravirtualization backend support to KVM"
depends on KVM
select IOQ
-
+
 config KVM_GUEST
bool "KVM Guest support"
depends on X86
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index eb32ce5..e7b52e8 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -5,6 +5,7 @@
 kvm-objs := kvm_main.o mmu.o x86_emulate.o
 ifeq ($(CONFIG_KVM_PV_HOST),y)
 kvm-objs += ioq_host.o
+kvm-objs += pvbus_host.o
 endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index aaa6d12..9f1cdfa 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -14,6 +14,8 @@
 #include 
 #include 
 #include 
+#include 
+  
 #include 
 
 #include "ioq.h"
@@ -414,6 +416,7 @@ struct kvm {
struct kvm_io_bus pio_bus;
 #ifdef CONFIG_KVM_PV_HOST
struct ioq_mgr *ioqmgr;
+   struct kvm_pvbus *pvbus;
 #endif
 };
 
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 03d0d67..8d1e4ce 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -18,6 +18,7 @@
 #include "kvm.h"
 #include "x86_emulate.h"
 #include "segment_descriptor.h"
+#include "pvbus_host.h"
 
 #include 
 #include 
@@ -302,6 +303,7 @@ static struct kvm *kvm_create_vm(void)
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
kvmhost_ioqmgr_init(kvm);
+   kvm_pvbus_init(kvm);
return kvm;
 }
 
@@ -3305,6 +3307,7 @@ static __init int kvm_init(void)
memset(__va(bad_page_address), 0, PAGE_SIZE);
 
kvmhost_ioqmgr_module_init();
+   kvm_pvbus_module_init();
 
return 0;
 
@@ -3320,6 +3323,7 @@ static __exit void kvm_exit(void)
kvm_exit_debug();
__free_page(pfn_to_page(bad_page_address >> PAGE_SHIFT));
kvm_mmu_module_exit();
+   kvm_pvbus_module_exit();
 }
 
 module_init(kvm_init)
diff --git a/drivers/kvm/pvbus_host.c b/drivers/kvm/pvbus_host.c
new file mode 100644
index 000..cc506f4
--- /dev/null
+++ b/drivers/kvm/pvbus_host.c
@@ -0,0 +1,636 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus.h"
+#include "pvbus_host.h"
+#include "kvm.h"
+
+struct pvbus_map {
+   int (*compare)(const void *left, const void *right);
+   const void* (*getkey)(struct rb_node *node);
+
+   struct mutex   lock;
+   struct rb_root root;
+   size_t count;
+};
+
+struct _pv_devtype {
+   struct kvm_pv_devtype *item;
+   struct rb_node node;
+};
+
+struct _pv_device {
+   struct kvm_pv_device  *item;
+   struct rb_node node;
+   struct _pv_devtype*parent;
+   intsynced;
+};
+
+static struct pvbus_map pvbus_typemap;
+
+struct kvm_pvbus_eventq {
+   struct mutex lock;
+   struct ioq  *ioq;
+
+};
+
+struct kvm_pvbus {
+   struct mutexlock;
+   struct kvm *kvm;
+   struct pvbus_mapdevmap;
+   struct kvm_pvbus_eventq eventq;
+};
+
+/*
+ * --
+ * generic rb map management
+ * --
+ */
+
+static void pvbus_map_init(struct pvbus_map *map)
+{
+   mutex_init(&map->lock);
+   map->root = RB_ROOT;
+}
+
+static int pvbus_map_register(struct pvbus_map *map, struct rb_node *node)
+{
+   int ret = 0;
+   struct rb_root *root;
+   struct rb_node **new, *parent = NULL;
+
+   mutex_lock(
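
(The patch text is cut off here in the archive.)  As a rough userspace
illustration of what a map with compare/getkey callbacks does, the sketch
below uses a plain unbalanced binary tree instead of the kernel rbtree, so
there is no rebalancing or locking.  All names are stand-ins chosen for
this example and are not the ones in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
	struct node *left, *right;
	const char  *key;
};

struct map {
	int (*compare)(const void *left, const void *right);
	const void *(*getkey)(struct node *node);
	struct node *root;
	size_t count;
};

static int str_compare(const void *l, const void *r) { return strcmp(l, r); }
static const void *node_key(struct node *n) { return n->key; }

void map_init(struct map *m)
{
	m->compare = str_compare;
	m->getkey  = node_key;
	m->root    = NULL;
	m->count   = 0;
}

/* Insert a node; returns 0 on success, -1 if the key already exists. */
int map_register(struct map *m, struct node *n)
{
	struct node **link;

	assert(m != NULL && n != NULL);

	/* Walk down with a pointer to the link we may need to fill in. */
	link = &m->root;
	while (*link) {
		int c = m->compare(m->getkey(n), m->getkey(*link));

		if (c < 0)
			link = &(*link)->left;
		else if (c > 0)
			link = &(*link)->right;
		else
			return -1;
	}

	n->left = n->right = NULL;
	*link = n;
	m->count++;
	return 0;
}

/* Look up a node by key; returns NULL if it is not present. */
struct node *map_find(struct map *m, const void *key)
{
	struct node *cur = m->root;

	while (cur) {
		int c = m->compare(key, m->getkey(cur));

		if (c < 0)
			cur = cur->left;
		else if (c > 0)
			cur = cur->right;
		else
			return cur;
	}
	return NULL;
}
```

Keeping compare and getkey as callbacks is what lets one map
implementation serve both the device-type table and the per-VM device
table.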

[kvm-devel] [PATCH 11/13] KVM: Add support for IOQ

2007-08-24 Thread Gregory Haskins

IOQ is a shared-memory-queue interface for implementing PV driver
communication.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig|5 +
 drivers/kvm/Makefile   |3 
 drivers/kvm/ioq.h  |   12 +-
 drivers/kvm/ioq_host.c |  365 
 drivers/kvm/kvm.h  |4 +
 drivers/kvm/kvm_main.c |3 
 6 files changed, 391 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index d17ce96..b81a188 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -41,6 +41,11 @@ config KVM_AMD
  Provides support for KVM on AMD processors equipped with the AMD-V
  (SVM) extensions.
 
+config KVM_PV_HOST
+boolean "Add paravirtualization backend support to KVM"
+   depends on KVM
+   select IOQ
+
 config KVM_GUEST
bool "KVM Guest support"
depends on X86
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index cd621fc..eb32ce5 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -3,6 +3,9 @@
 #
 
 kvm-objs := kvm_main.o mmu.o x86_emulate.o
+ifeq ($(CONFIG_KVM_PV_HOST),y)
+kvm-objs += ioq_host.o
+endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
index 7e955f1..347fa0b 100644
--- a/drivers/kvm/ioq.h
+++ b/drivers/kvm/ioq.h
@@ -25,7 +25,17 @@
 
 #include 
 
-#define IOQHC_REGISTER   1
+struct kvm;
+
+#ifdef CONFIG_KVM_PV_HOST
+int kvmhost_ioqmgr_init(struct kvm *kvm);
+int kvmhost_ioqmgr_module_init(void);
+#else
+#define kvmhost_ioqmgr_init(kvm) {}
+#define kvmhost_ioqmgr_module_init() {}
+#endif
+
+#define IOQHC_REGISTER1
 #define IOQHC_UNREGISTER  2
 #define IOQHC_SIGNAL 3
 
diff --git a/drivers/kvm/ioq_host.c b/drivers/kvm/ioq_host.c
new file mode 100644
index 000..413f103
--- /dev/null
+++ b/drivers/kvm/ioq_host.c
@@ -0,0 +1,365 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmhost_ioq {
+   struct ioq        ioq;
+   struct rb_node    node;
+   atomic_t          refcnt;
+   struct kvm_vcpu  *vcpu;
+   int               irq;
+};
+
+struct kvmhost_map {
+   spinlock_t lock;
+   struct rb_root root;
+};
+
+struct kvmhost_ioq_mgr {
+   struct ioq_mgr  mgr;
+   struct kvm *kvm;
+   struct kvmhost_map  map;
+};
+
+struct kvmhost_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmhost_ioq, ioq);
+}
+
+struct kvmhost_ioq_mgr* to_mgr(struct ioq_mgr *mgr)
+{
+   return container_of(mgr, struct kvmhost_ioq_mgr, mgr);
+}
+
+/*
+ * --
+ * rb map management
+ * --
+ */
+
+static void kvmhost_map_init(struct kvmhost_map *map)
+{
+   spin_lock_init(&map->lock);
+   map->root = RB_ROOT;
+}
+
+static int kvmhost_map_register(struct kvmhost_map *map,
+   struct kvmhost_ioq *ioq)
+{
+   int ret = 0;
+   struct rb_root *root;
+   struct rb_node **new, *parent = NULL;
+
+   spin_lock(&map->lock);
+
+   root = &map->root;
+   new  = &(root->rb_node);
+
+   /* Figure out where to put new node */
+   while (*new) {
+   struct kvmhost_ioq *this;
+
+   this   = container_of(*new, struct kvmhost_ioq, node);
+   parent = *new;
+
+   if (ioq->ioq.id < this->ioq.id)
+   new = &((*new)->rb_left);
+   else if (ioq->ioq.id > this->ioq.id)
+   new = &((*new)->rb_right);
+   else {
+   ret = -EEXIST;
+   break;
+   }
+   }
+
+   if (!ret) {
+   /* Add new node and rebalance tree. */
+   rb_link_node(&ioq->node, parent, new);
+   rb_insert_color(&ioq->node, root);
+   }
+
+   spin_unlock(&map->lock);
+
+   ret

[kvm-devel] [PATCH 10/13] KVM: Add a gpa_to_hva helper function

2007-08-24 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h |    1 +
 drivers/kvm/mmu.c |   12 
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 839e11c..2987edc 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -507,6 +507,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
 void kvm_mmu_zap_all(struct kvm *kvm);
 
 hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa);
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa);
 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1)
 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB)
 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 3d98255..2d7afba 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -776,6 +776,18 @@ hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa)
| (gpa & (PAGE_SIZE-1));
 }
 
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa)
+{
+   struct page *page;
+
+   if ((gpa & HPA_ERR_MASK) == 0)
+   return NULL;
+
+   page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
+   return kmap_atomic(page, gpa & PAGE_MASK);
+}
+EXPORT_SYMBOL_GPL(gpa_to_hva);
+
 hpa_t gva_to_hpa(struct kvm_vcpu *vcpu, gva_t gva)
 {
gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, gva);


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

[kvm-devel] [PATCH 09/13] KVM: Importing Dor's base PV infrastructure work to kvm.git HEAD

2007-08-24 Thread Gregory Haskins

From: Dor Laor <[EMAIL PROTECTED]>

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h      |   11 +++
 drivers/kvm/kvm_main.c |  153 +++-
 drivers/kvm/svm.c      |   11 +++
 drivers/kvm/svm.h      |    2 -
 drivers/kvm/vmx.c      |    6 ++
 5 files changed, 148 insertions(+), 35 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index a42a6f3..839e11c 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -316,9 +316,6 @@ struct kvm_vcpu {
unsigned long cr0;
unsigned long cr2;
unsigned long cr3;
-   gpa_t para_state_gpa;
-   struct page *para_state_page;
-   gpa_t hypercall_gpa;
unsigned long cr4;
unsigned long cr8;
u64 pdptrs[4]; /* pae */
@@ -388,6 +385,12 @@ struct kvm_memory_slot {
unsigned long *dirty_bitmap;
 };
 
+struct kvm_hypercall {
+   unsigned long (*hypercall)(struct kvm_vcpu*, unsigned long args[]);
+   struct module *module;
+   int idx;
+};
+
 struct kvm {
struct mutex lock; /* protects everything except vcpus */
int naliases;
@@ -588,6 +591,8 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 
 int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_register_hypercall(struct module *module, struct kvm_hypercall *hypercall);
+int kvm_unregister_hypercall(struct kvm_hypercall *hypercall);
 
 static inline int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 u32 error_code)
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index d154487..6428746 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -84,6 +84,8 @@ static struct kvm_stats_debugfs_item {
 
 static struct dentry *debugfs_dir;
 
+static struct kvm_hypercall hypercalls[KVM_NR_HYPERCALLS];
+
 #define MAX_IO_MSRS 256
 
 #define CR0_RESERVED_BITS  \
@@ -1263,53 +1265,150 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 
vcpu->run->exit_reason = KVM_EXIT_HLT;
++vcpu->stat.halt_exits;
+
return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
+int kvm_register_hypercall(struct module* module,
+  struct kvm_hypercall *hypercall)
+{
+   int r = 0;
+
+   if (hypercall->idx >= KVM_NR_HYPERCALLS ||
+   hypercall->idx < 0) {
+   printk(KERN_DEBUG "%s:hypercall registration idx(%d)\n",
+__FUNCTION__, hypercall->idx);
+   return -EINVAL;
+   }
+
+   spin_lock(&kvm_lock);
+
+   if (hypercalls[hypercall->idx].hypercall) {
+   printk(KERN_DEBUG "%s:hypercall idx(%d) already taken\n",
+   __FUNCTION__, hypercall->idx);
+   r = -EEXIST;
+   goto out;
+   }
+
+   if (!try_module_get(module)) {
+   printk(KERN_DEBUG "%s: module reference count++ failed\n",
+   __FUNCTION__);
+   r = -EINVAL;
+   goto out;
+   }
+
+   hypercalls[hypercall->idx].hypercall = hypercall->hypercall;
+   hypercalls[hypercall->idx].module = module;
+
+out:
+   spin_unlock(&kvm_lock);
+
+   return r;
+}
+EXPORT_SYMBOL_GPL(kvm_register_hypercall);
+
+int kvm_unregister_hypercall(struct kvm_hypercall *hypercall)
+{
+   if (hypercall->idx >= KVM_NR_HYPERCALLS ||
+   hypercall->idx < 0) {
+   printk(KERN_DEBUG "%s:hypercall unregistration idx(%d)\n",
+__FUNCTION__, hypercall->idx);
+   return -EINVAL;
+   }
+
+   spin_lock(&kvm_lock);
+   if (!hypercalls[hypercall->idx].hypercall) {
+   printk(KERN_DEBUG "%s:hypercall idx(%d) was not registered\n",
+   __FUNCTION__, hypercall->idx);
+   spin_unlock(&kvm_lock);
+   return -EEXIST;
+   }
+
+   hypercalls[hypercall->idx].hypercall = 0;
+   module_put(hypercalls[hypercall->idx].module);
+   spin_unlock(&kvm_lock);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_unregister_hypercall);
+
+/*
+ * Generic hypercall dispatcher routine.
+ * Returns 0 for user space handling, 1 on success handling
+ */
 int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-   unsigned long nr, a0, a1, a2, a3, a4, a5, ret;
+   unsigned long nr, ret;
+   unsigned long args[6];
+   int res = 1;
 
kvm_arch_ops->cache_regs(vcpu);
ret = -KVM_EINVAL;
 #ifdef CONFIG_X86_64
if (is_long_mode(vcpu)) {
nr = vcpu->regs[VCPU_REGS_RAX];
-   a0 = vcpu->regs[VCPU_REGS_RDI];
-   a1 = vcpu->regs[VCPU_REGS_RSI];
-   a2 = vcpu->regs[VCPU_REGS_RDX];
-   a3 = vcpu-

[kvm-devel] [PATCH 08/13] KVM: Add a guest side driver for IOQ

2007-08-24 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig       |   16 ++
 drivers/kvm/Makefile      |    2 
 drivers/kvm/ioq.h         |   39 +
 drivers/kvm/ioq_guest.c   |  196 +++
 drivers/kvm/pvbus.h       |   63 +++
 drivers/kvm/pvbus_guest.c |  382 +
 include/linux/kvm.h       |    4 
 7 files changed, 701 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 445c6e4..d17ce96 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -41,4 +41,20 @@ config KVM_AMD
  Provides support for KVM on AMD processors equipped with the AMD-V
  (SVM) extensions.
 
+config KVM_GUEST
+   bool "KVM Guest support"
+   depends on X86
+   default n
+ 
+config KVM_PVBUS_GUEST
+tristate "Paravirtualized Bus (PVBUS) support"
+depends on KVM_GUEST
+select IOQ
+select PVBUS
+---help---
+   PVBUS is an infrastructure for generic PV drivers to take advantage
+   of an underlying hypervisor without having to understand the details
+   of the hypervisor itself.  You only need this option if you plan to
+   run this kernel as a KVM guest.
+
 endif # VIRTUALIZATION
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c0a789f..cd621fc 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -8,3 +8,5 @@ kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
 kvm-amd-objs = svm.o
 obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
+obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
\ No newline at end of file
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
new file mode 100644
index 000..7e955f1
--- /dev/null
+++ b/drivers/kvm/ioq.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _KVM_IOQ_H_
+#define _KVM_IOQ_H_
+
+#include 
+
+#define IOQHC_REGISTER   1
+#define IOQHC_UNREGISTER  2
+#define IOQHC_SIGNAL 3
+
+struct ioq_register {
+   ioq_id_t id;
+   u32  irq;
+   u64  ring;
+};
+
+
+#endif /* _KVM_IOQ_H_ */
diff --git a/drivers/kvm/ioq_guest.c b/drivers/kvm/ioq_guest.c
new file mode 100644
index 000..5f16390
--- /dev/null
+++ b/drivers/kvm/ioq_guest.c
@@ -0,0 +1,196 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmguest_ioq {
+   struct ioq  ioq;
+   int         irq;
+};
+
+struct kvmguest_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmguest_ioq, ioq);
+}
+
+static int ioq_hypercall(unsigned long nr, void *data)
+{
+   return hypercall(2, __NR_hypercall_ioq, nr, __pa(data));
+}
+
+/*
+ * --
+ * interrupt handler
+ * --
+ */
+irqreturn_t kvmguest_ioq_intr(int irq, void *dev)
+{
+   struct kvmguest_ioq *_ioq = to_ioq(dev);
+
+   ioq_wakeup(&_ioq->ioq);
+
+   return IRQ_HANDLED;
+}
+
+/*
+ * --
+ * ioq implementation
+ * --
+ */
+
+static int kvmguest_ioq_signal(struct ioq *ioq)
+{
+   return ioq_hypercall(IOQHC_SIGNAL, &ioq->id);
+}
+
+static void kvmguest_ioq_destroy(struct ioq *ioq)
+{
+   struct kvmguest_ioq *_ioq = to_ioq(ioq);
+   int ret;
+
+

[kvm-devel] [PATCH 07/13] IRQ: Export create_irq/destroy_irq

2007-08-24 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/io_apic.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index f57f8b9..f8d2508 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -1907,6 +1907,7 @@ int create_irq(void)
}
return irq;
 }
+EXPORT_SYMBOL(create_irq);
 
 void destroy_irq(unsigned int irq)
 {
@@ -1918,6 +1919,7 @@ void destroy_irq(unsigned int irq)
__clear_irq_vector(irq);
spin_unlock_irqrestore(&vector_lock, flags);
 }
+EXPORT_SYMBOL(destroy_irq);
 
 /*
  * MSI mesage composition



[kvm-devel] [PATCH 06/13] IOQNET: Add a test harness infrastructure to IOQNET

2007-08-24 Thread Gregory Haskins

We can add an IOQNET loop-back device and register it with the PVBUS to test
many aspects of the system (IOQ, PVBUS, and IOQNET itself).
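
[Editorial sketch, not part of the patch] The loop-back idea can be
illustrated in plain userspace C: two endpoints are cross-linked, and a
signal on one end wakes the other.  All names below (lb_endpoint, lb_link,
lb_signal, lb_unlink) are hypothetical stand-ins, not the identifiers used
in loopback.c, and the -1 return for an unlinked endpoint is a choice made
for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one end of a loop-back queue pair. */
struct lb_endpoint {
	struct lb_endpoint *peer;   /* the other end, or NULL once torn down */
	unsigned int wakeups;       /* how often this end was woken by its peer */
};

/* Cross-link two endpoints so that each one's peer is the other. */
static void lb_link(struct lb_endpoint *a, struct lb_endpoint *b)
{
	assert(a && b && a != b);
	a->peer = b;
	b->peer = a;
	a->wakeups = 0;
	b->wakeups = 0;
}

/*
 * Signal from one end: wake the peer if it is still linked.
 * Returns 0 when the wakeup was delivered, -1 when there is no peer
 * (an assumption of this sketch).
 */
static int lb_signal(struct lb_endpoint *ep)
{
	if (!ep->peer)
		return -1;
	ep->peer->wakeups++;
	return 0;
}

/* Tear down one end, clearing the back-pointers on both sides. */
static void lb_unlink(struct lb_endpoint *ep)
{
	if (ep->peer) {
		ep->peer->peer = NULL;
		ep->peer = NULL;
	}
}
```

Because both ends live on the same host, a harness of this shape can drive
the queue, bus, and driver code without any hypervisor in the loop.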

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig           |   10 +
 drivers/net/ioqnet/Makefile   |    3 
 drivers/net/ioqnet/loopback.c |  517 +
 3 files changed, 530 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index ef05437..a6a467e 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2993,4 +2993,14 @@ config IOQNET_DEBUG
depends on IOQNET
default n
 
+config IOQNET_LOOPBACK
+tristate "IOQNET loopback device test harness"
+depends on IOQNET
+default n
+---help---
+This will install a special PVBUS device that implements two IOQNET
+devices.  The devices are, of course, linked to one another forming a
+loopback mechanism.  This allows many subsystems to be tested: IOQ,
+PVBUS, and IOQNET itself.  If unsure, say N.
+ 
 endif # NETDEVICES
diff --git a/drivers/net/ioqnet/Makefile b/drivers/net/ioqnet/Makefile
index d7020ee..7d2d156 100644
--- a/drivers/net/ioqnet/Makefile
+++ b/drivers/net/ioqnet/Makefile
@@ -4,8 +4,11 @@
 
 ioqnet-objs = driver.o
 obj-$(CONFIG_IOQNET) += ioqnet.o
+ioqnet-loopback-objs = loopback.o
+obj-$(CONFIG_IOQNET_LOOPBACK) += ioqnet-loopback.o
 
 
 ifeq ($(CONFIG_IOQNET_DEBUG),y)
 EXTRA_CFLAGS += -DIOQNET_DEBUG
 endif
+
diff --git a/drivers/net/ioqnet/loopback.c b/drivers/net/ioqnet/loopback.c
new file mode 100644
index 000..9cc8d47
--- /dev/null
+++ b/drivers/net/ioqnet/loopback.c
@@ -0,0 +1,517 @@
+/*
+ * ioqnet test harness
+ *
+ * Copyright (C) 2007 Novell, Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#ifndef ETH_ALEN
+#define ETH_ALEN 6
+#endif
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+/*
+ * -
+ * First we must create an IOQ implementation to use while under test
+ * since these operations will all be local to the same host
+ * -
+ */
+
+struct ioqnet_lb_ioq {
+   struct ioq             ioq;
+   struct ioqnet_lb_ioq  *peer;
+   struct tasklet_struct  task;
+};
+
+struct ioqnet_lb_ioqmgr {
+   struct ioq_mgr  mgr;
+
+   /*
+* Since this is just a test harness, we know ahead of time that
+* we aren't going to need more than a handful of IOQs.  So to keep
+* lookups simple we will simply create a static array of them
+*/
+   struct ioqnet_lb_ioq ioqs[8];
+   int pos;
+};
+
+static struct ioqnet_lb_ioqmgr lb_ioqmgr;
+
+struct ioqnet_lb_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct ioqnet_lb_ioq, ioq);
+}
+
+struct ioqnet_lb_ioqmgr* to_mgr(struct ioq_mgr *mgr)
+{
+   return container_of(mgr, struct ioqnet_lb_ioqmgr, mgr);
+}
+
+/*
+ * --
+ * ioq implementation
+ * --
+ */
+static void ioqnet_lb_ioq_wake(unsigned long data)
+{
+   struct ioqnet_lb_ioq *_ioq = (struct ioqnet_lb_ioq*)data;
+
+   if (_ioq->peer)
+   ioq_wakeup(&_ioq->peer->ioq);
+}
+
+static int ioqnet_lb_ioq_signal(struct ioq *ioq)
+{
+   struct ioqnet_lb_ioq *_ioq = to_ioq(ioq);
+
+   if (_ioq->peer)
+   tasklet_schedule(&_ioq->task);
+
+   return 0;
+}
+
+static void ioqnet_lb_ioq_destroy(struct ioq *ioq)
+{
+   struct ioqnet_lb_ioq *_ioq = to_ioq(ioq);
+
+   if (_ioq->peer) {
+   _ioq->peer->peer = NULL;
+   _ioq->peer   = NULL;
+   }
+
+   if (_ioq->ioq.locale == ioq_locality_north) {
+   kfree(_ioq->ioq.ring);
+   kfree(_ioq->ioq.head_desc);
+   } else
+   kfree(_ioq);
+}
+
+/*
+ * --
+ * ioqmgr implementa

[kvm-devel] [PATCH 05/13] IOQ: Add an IOQ network driver

2007-08-24 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig         |   10 +
 drivers/net/Makefile        |    2 
 drivers/net/ioqnet/Makefile |   11 +
 drivers/net/ioqnet/driver.c |  678 +++
 include/linux/ioqnet.h      |   44 +++
 5 files changed, 745 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 81ef81c..ef05437 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2983,4 +2983,14 @@ config NETPOLL_TRAP
 config NET_POLL_CONTROLLER
def_bool NETPOLL
 
+config IOQNET
+   tristate "IOQNET (IOQ based paravirtualized network driver)"
+   select IOQ
+   select PVBUS
+
+config IOQNET_DEBUG
+bool "IOQNET debugging"
+   depends on IOQNET
+   default n
+
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e684212..09a744d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -232,6 +232,8 @@ obj-$(CONFIG_ENP2611_MSF_NET) += ixp2000/
 
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
 
+obj-$(CONFIG_IOQNET) += ioqnet/
+
 obj-$(CONFIG_FS_ENET) += fs_enet/
 
 obj-$(CONFIG_NETXEN_NIC) += netxen/
diff --git a/drivers/net/ioqnet/Makefile b/drivers/net/ioqnet/Makefile
new file mode 100644
index 000..d7020ee
--- /dev/null
+++ b/drivers/net/ioqnet/Makefile
@@ -0,0 +1,11 @@
+#
+# Makefile for the IOQNET ethernet driver
+#
+
+ioqnet-objs = driver.o
+obj-$(CONFIG_IOQNET) += ioqnet.o
+
+
+ifeq ($(CONFIG_IOQNET_DEBUG),y)
+EXTRA_CFLAGS += -DIOQNET_DEBUG
+endif
diff --git a/drivers/net/ioqnet/driver.c b/drivers/net/ioqnet/driver.c
new file mode 100644
index 000..244f633
--- /dev/null
+++ b/drivers/net/ioqnet/driver.c
@@ -0,0 +1,678 @@
+/*
+ * ioqnet - A paravirtualized network device based on the IOQ interface
+ *
+ * Copyright (C) 2007 Novell, Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * Derived from the SNULL example from the book "Linux Device
+ * Drivers" by Alessandro Rubini and Jonathan Corbet, published
+ * by O'Reilly & Associates.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include  /* printk() */
+#include  /* kmalloc() */
+#include   /* error codes */
+#include   /* size_t */
+#include  /* mark_bh */
+
+#include 
+#include/* struct device, and other headers */
+#include  /* eth_type_trans */
+#include   /* struct iphdr */
+#include  /* struct tcphdr */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+#define RX_RINGLEN 64
+#define TX_RINGLEN 64
+#define TX_PTRS_PER_DESC 64
+
+struct ioqnet_queue {
+   struct ioq  *queue;
+   struct ioq_notifier  notifier;
+};
+
+struct ioqnet_tx_desc {
+   struct sk_buff  *skb;
+   struct ioqnet_tx_ptr data[TX_PTRS_PER_DESC];
+};
+
+struct ioqnet_priv {
+   spinlock_t   lock;
+   struct net_device   *dev;
+   struct pvbus_device *pdev;
+   struct net_device_stats  stats;
+   struct ioqnet_queue  rxq;
+   struct ioqnet_queue  txq;
+   struct tasklet_struct    txtask;
+};
+
+static int ioqnet_queue_init(struct ioqnet_priv *priv,
+struct ioqnet_queue *q,
+size_t ringsize,
+void (*func)(struct ioq_notifier*))
+{
+   int ret = priv->pdev->createqueue(priv->pdev, &q->queue, ringsize, 0);
+   if (ret < 0)
+   return ret;
+
+   q->notifier.signal = func;
+   q->queue->notifier = &q->notifier;
+
+   return 0;
+}
+
+/* Perform a hypercall to register/connect our queues */
+static int ioqnet_connect(struct ioqnet_priv *priv)
+{
+   struct ioqnet_connect data = {
+   .rxq = priv->rxq.queue->id,
+   .txq = priv->txq.queue->id,
+   };
+
+   return priv->pdev->call(priv->pdev, IOQNET_CONNECT,
+   &data, sizeof(data), 0);
+}
+
+static int ioqnet_disconnect(struct ioqnet_priv *priv)
+{
+   return priv->pdev->call(priv->pdev, IOQNET_DISCONNECT, NULL, 0, 0);
+}
+
+/* Perform a hypercall to get the assigned MAC addr */
+static int ioqnet_query_mac(struct ioqnet_priv *priv)
+{
+   return priv->pdev->call(priv->pdev,
+   IOQNET_QUERY_MAC,
+   priv->dev->dev_addr,
+   ETH_ALEN, 0);
+}
+
+
+/*
+ * Enable and disable receive interrupts.
+ */
+static void ioqnet_rx_ints(struct net_device *dev, int enable)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev

[kvm-devel] [PATCH 04/13] PARAVIRTUALIZATION: Add support for a bus abstraction

2007-08-24 Thread Gregory Haskins

PV usually comes in two flavors:  device PV, and "core" PV.  The existing PV
ops deal in terms of the latter.  However, it would be useful to add an
interface for a virtual bus with provisions for discovery/configuration of
backend PV devices.  It is often desirable to run PV devices even if the
entire core is not operating with PVOPS.  Therefore, we introduce a separate
interface to deal with the devices.
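
[Editorial sketch, not part of the patch] The discovery step of such a bus
can be pictured in userspace C as a name comparison between a device and
the registered drivers, followed by an optional probe callback.  The types
and functions below (pv_device, pv_driver, pv_match, pv_bind) are
hypothetical and only mirror the strcmp-based match in this patch; the
real code goes through the Linux driver core rather than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device and driver records. */
struct pv_device {
	const char *name;
};

struct pv_driver {
	const char *name;
	int (*probe)(struct pv_device *dev);   /* optional, may be NULL */
};

/* A device and a driver match when their names are equal. */
static int pv_match(const struct pv_device *dev, const struct pv_driver *drv)
{
	return strcmp(dev->name, drv->name) == 0;
}

/*
 * Walk the driver table and bind the device to the first driver that
 * matches by name and whose probe (if any) accepts it.
 * Returns the driver index, or -1 if nothing could be bound.
 */
static int pv_bind(struct pv_device *dev, struct pv_driver *drivers, size_t n)
{
	size_t i;

	assert(dev && drivers);
	for (i = 0; i < n; i++) {
		if (!pv_match(dev, &drivers[i]))
			continue;
		if (drivers[i].probe && drivers[i].probe(dev) != 0)
			continue;   /* the driver declined this device */
		return (int)i;
	}
	return -1;
}
```

The point of the design is that neither side of the match needs to know
which hypervisor registered the device.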

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 arch/i386/Kconfig            |    2 +
 arch/x86_64/Kconfig          |    2 +
 drivers/Makefile             |    1 
 drivers/pvbus/Kconfig        |    7 ++
 drivers/pvbus/Makefile       |    6 ++
 drivers/pvbus/pvbus-driver.c |  120 ++
 include/linux/pvbus.h        |   59 +
 7 files changed, 197 insertions(+), 0 deletions(-)

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index f952493..a89b8a5 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -1137,6 +1137,8 @@ source "drivers/pci/pcie/Kconfig"
 
 source "drivers/pci/Kconfig"
 
+source "drivers/pvbus/Kconfig"
+
 config ISA_DMA_API
bool
default y
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index ffa0364..abf1f63 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -740,6 +740,8 @@ source "drivers/pcmcia/Kconfig"
 
 source "drivers/pci/hotplug/Kconfig"
 
+source "drivers/pvbus/Kconfig"
+
 endmenu
 
 
diff --git a/drivers/Makefile b/drivers/Makefile
index f0878b2..54dd639 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -88,3 +88,4 @@ obj-$(CONFIG_DMA_ENGINE)  += dma/
 obj-$(CONFIG_HID)  += hid/
 obj-$(CONFIG_PPC_PS3)  += ps3/
 obj-$(CONFIG_OF)   += of/
+obj-$(CONFIG_PVBUS)+= pvbus/
diff --git a/drivers/pvbus/Kconfig b/drivers/pvbus/Kconfig
new file mode 100644
index 000..1ca094d
--- /dev/null
+++ b/drivers/pvbus/Kconfig
@@ -0,0 +1,7 @@
+#
+# PVBUS configuration
+#
+
+config PVBUS
+   bool "Paravirtual Bus"
+
diff --git a/drivers/pvbus/Makefile b/drivers/pvbus/Makefile
new file mode 100644
index 000..0df2c2e
--- /dev/null
+++ b/drivers/pvbus/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for the PVBUS bus specific drivers.
+#
+
+obj-y += pvbus-driver.o
+
diff --git a/drivers/pvbus/pvbus-driver.c b/drivers/pvbus/pvbus-driver.c
new file mode 100644
index 000..3f6687d
--- /dev/null
+++ b/drivers/pvbus/pvbus-driver.c
@@ -0,0 +1,120 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Paravirtualized-Bus - This is a generic infrastructure for virtual devices
+ * and their drivers.  It is inspired by Rusty Russell's lguest_bus, but with
+ * the key difference that the bus is decoupled from the underlying hypervisor
+ * in both name and function.
+ *
+ * Instead, it is intended that external hypervisor support will register
+ * arbitrary devices.  Generic drivers can then monitor this bus for
+ * compatible devices regardless of the hypervisor implementation. 
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+
+#define PVBUS_NAME "pvbus"
+
+/*
+ * This function is invoked whenever a new driver and/or device is added
+ * to check if there is a match
+ */
+static int pvbus_dev_match(struct device *_dev, struct device_driver *_drv)
+{
+   struct pvbus_device *dev = container_of(_dev,struct pvbus_device,dev);
+   struct pvbus_driver *drv = container_of(_drv,struct pvbus_driver,drv);
+
+   return !strcmp(dev->name, drv->name);
+}
+
+/*
+ * This function is invoked after the bus infrastructure has already made a
+ * match.  The device will contain a reference to the paired driver which
+ * we will extract.
+ */
+static int pvbus_dev_probe(struct device *_dev)
+{
+   int ret = 0;
+   struct pvbus_device *dev = container_of(_dev, struct pvbus_device, dev);
+   struct pvbus_driver *drv = container_of(_dev->driver,
+                                           struct pvbus_driver, drv);
+
+   if (drv->probe)
+   ret = drv->probe(dev);
+
+   return ret;
+}
+
+static struct bus_type pv_bus = {
+   .name   = PVBUS_NAME,
+   .match  = pvbus_dev_match,
+};
+
+static stru

[kvm-devel] [PATCH 03/13] IOQ: Adding basic definitions for IO-Queue logic

2007-08-24 Thread Gregory Haskins

IOQ is a generic shared-memory-queue mechanism that happens to be friendly
to virtualization boundaries.  Note that it is not virtualization specific
due to its flexible transport layer.
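
[Editorial sketch, not part of the patch] A shared-memory queue of this
kind is typically built on a ring with head and tail indices plus a "full"
flag to tell a full ring from an empty one.  The userspace C below shows
that bookkeeping only.  The names (ring, ring_idx, ring_push, ring_pop)
and the fixed RING_COUNT are assumptions of the sketch; the real IOQ adds
ownership bits, signaling, and an iterator interface on top, and its exact
push/pop semantics are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define RING_COUNT 4u   /* hypothetical ring size for this sketch */

/* Index block: head, tail, and a flag that disambiguates head == tail. */
struct ring_idx {
	uint32_t head;   /* 0-based index of the oldest entry */
	uint32_t tail;   /* 0-based index of the next free slot */
	uint8_t  full;   /* 1 when every slot is occupied */
};

struct ring {
	struct ring_idx idx;
	uint64_t        slot[RING_COUNT];
};

static void ring_init(struct ring *r)
{
	assert(r);
	r->idx.head = 0;
	r->idx.tail = 0;
	r->idx.full = 0;
}

static int ring_empty(const struct ring *r)
{
	return !r->idx.full && r->idx.head == r->idx.tail;
}

/* Store a value at the tail.  Returns 0 on success, -1 if the ring is full. */
static int ring_push(struct ring *r, uint64_t value)
{
	if (r->idx.full)
		return -1;
	r->slot[r->idx.tail] = value;
	r->idx.tail = (r->idx.tail + 1) % RING_COUNT;
	if (r->idx.tail == r->idx.head)
		r->idx.full = 1;
	return 0;
}

/* Remove the value at the head.  Returns 0 on success, -1 if the ring is empty. */
static int ring_pop(struct ring *r, uint64_t *value)
{
	if (ring_empty(r))
		return -1;
	*value = r->slot[r->idx.head];
	r->idx.head = (r->idx.head + 1) % RING_COUNT;
	r->idx.full = 0;
	return 0;
}
```

Keeping all index state in fixed-width fields is what lets two sides with
different word sizes share the same memory layout.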

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/ioq.h |  176 +++
 lib/Kconfig         |   11 ++
 lib/Makefile        |    1 
 lib/ioq.c           |  228 +++
 4 files changed, 416 insertions(+), 0 deletions(-)

diff --git a/include/linux/ioq.h b/include/linux/ioq.h
new file mode 100644
index 000..d3a18a1
--- /dev/null
+++ b/include/linux/ioq.h
@@ -0,0 +1,176 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * IOQ is a generic shared-memory-queue mechanism that happens to be friendly
+ * to virtualization boundaries. It can be used in a variety of ways, though
+ * its intended purpose is to become the low-level communication path for
+ * paravirtualized drivers.  Note that it is not virtualization specific
+ * due to its flexible signaling layer.
+ *
+ * The following is a list of key design points:
+ *
+ * #) All shared memory is always allocated on explicitly one side of the
+ *    link.  This typically would be the guest side in a VM/VMM scenario.
+ * #) The code has the concept of "north" and "south" where north denotes the
+ *    memory-owner side (e.g. guest).
+ * #) An IOQ is "created" on the north side (which generates a unique ID), and
+ *    is "connected" on the remote side via its ID.  This facilitates call-path
+ *    setup in a manner that is friendly across VM/VMM boundaries.
+ * #) An IOQ is manipulated using an iterator idiom.
+ * #) An "IOQ Manager" abstraction handles the translation between two
+ *    endpoints, e.g. allocating "north" memory, signaling, translating
+ *    addresses (e.g. GPA to PA).
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IOQ_H
+#define _LINUX_IOQ_H
+
+#include 
+#include 
+#include 
+
+struct ioq_mgr;
+
+/*
+ *-
+ * The following structures represent data that is shared across boundaries
+ * which may be quite disparate from one another (e.g. Windows vs Linux,
+ * 32 vs 64 bit, etc).  Therefore, care has been taken to make sure they
+ * present data in a manner that is independent of the environment.
+ *---
+ */
+typedef u64 ioq_id_t;
+
+struct ioq_ring_desc {
+   u64 cookie; /* for arbitrary use by north-side */
+   u64 ptr;
+   u64 len;
+   u64 alen;
+   u8  valid;
+   u8  sown; /* South owned = 1, North owned = 0 */
+};
+
+#define IOQ_RING_MAGIC 0x47fa2fe4
+#define IOQ_RING_VER   1
+
+struct ioq_ring_idx {
+   u32 head;/* 0 based index to head of ptr array */
+   u32 tail;/* 0 based index to tail of ptr array */
+   u8  full;
+};
+
+struct ioq_irq {
+   u8  enabled;
+   u8  pending;
+};
+
+enum ioq_locality {
+   ioq_locality_north,
+   ioq_locality_south,
+};
+
+struct ioq_ring_head {
+   u32 magic;
+   u32 ver;
+   ioq_id_t id;
+   u32 count;
+   u64 ptr; /* ptr to array of ioq_ring_desc[count] */
+   struct ioq_ring_idx idx[2];
+   struct ioq_irq  irq[2];
+   u8  padding[16];
+};
+
+/* --- END SHARED STRUCTURES --- */
+
+enum ioq_idx_type {
+   ioq_idxtype_valid,
+   ioq_idxtype_inuse,
+   ioq_idxtype_invalid,
+};
+
+enum ioq_seek_type {
+   ioq_seek_tail,
+   ioq_seek_next,
+   ioq_seek_head,
+   ioq_seek_set
+};
+
+struct ioq_iterator {
+   struct ioq            *ioq;
+   struct ioq_ring_idx   *idx;
+   u32                    pos;
+   struct ioq_ring_desc  *desc;
+   int                    update;
+};
+
+int  ioq_iter_seek(struct ioq_iterator *iter, enum ioq_seek_type type,
+  long offset, int flags);
+int  ioq_iter_push(struct ioq_iterator *iter, int flags);
+int  ioq_iter_pop(struct ioq_iterator *iter,  int flags);
+
+struct ioq_notifier {
+

[kvm-devel] [PATCH 02/13] KVM: Add hypercall definitions

2007-08-24 Thread Gregory Haskins

Author: Ingo Molnar <[EMAIL PROTECTED]>
Author: Dor Laor <[EMAIL PROTECTED]>

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/kvm.h |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 1d5a49c..7e9b862 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -298,4 +298,24 @@ struct kvm_signal_mask {
 #define KVM_GET_FPU   _IOR(KVMIO,  0x8c, struct kvm_fpu)
 #define KVM_SET_FPU   _IOW(KVMIO,  0x8d, struct kvm_fpu)
 
+/*
+ * Hypercall calling convention:
+ *
+ * Each hypercall may have 0-6 parameters.
+ *
+ * 64-bit hypercall index is in RAX, goes from 0 to __NR_hypercalls-1
+ *
+ * 64-bit parameters 1-6 are in the standard gcc x86_64 calling convention
+ * order: RDI, RSI, RDX, RCX, R8, R9.
+ *
+ * 32-bit index is EBX, parameters are: EAX, ECX, EDX, ESI, EDI, EBP.
+ * (the first 3 are according to the gcc regparm calling convention)
+ *
+ * No registers are clobbered by the hypercall, except that the
+ * return value is in RAX.
+ */
+#define KVM_NR_HYPERCALLS  1
+
+#define __NR_hypercall_test  0
+
 #endif
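
The calling convention above puts the hypercall index in one register and up to six parameters in fixed slots. A minimal host-side sketch of what dispatching on that index could look like follows. It is an illustration only: struct hc_regs, hc_table, hc_test() and hc_dispatch() are hypothetical names that do not appear in this patch, and the handler body is invented.

```c
#include <stdint.h>
#include <stddef.h>

#define KVM_NR_HYPERCALLS    1
#define __NR_hypercall_test  0

/* Hypothetical register snapshot: index plus the six parameter slots. */
struct hc_regs {
	uint64_t nr;       /* hypercall index (RAX on 64-bit) */
	uint64_t p[6];     /* RDI, RSI, RDX, RCX, R8, R9 on 64-bit */
};

typedef int64_t (*hc_handler_t)(const struct hc_regs *regs);

/* Illustrative handler: returns the sum of the first two parameters. */
static int64_t hc_test(const struct hc_regs *regs)
{
	return (int64_t)(regs->p[0] + regs->p[1]);
}

static const hc_handler_t hc_table[KVM_NR_HYPERCALLS] = {
	[__NR_hypercall_test] = hc_test,
};

/* Returns the handler's result, or -1 if the index is out of range. */
static int64_t hc_dispatch(const struct hc_regs *regs)
{
	if (regs->nr >= KVM_NR_HYPERCALLS || !hc_table[regs->nr])
		return -1;
	return hc_table[regs->nr](regs);
}
```

The range check comes first because the index arrives from the guest and cannot be trusted to stay within 0 to KVM_NR_HYPERCALLS-1.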


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

[kvm-devel] [PATCH 01/13] PV: Add basic infrastructure for paravirtual/hypercall infrastructure

2007-08-24 Thread Gregory Haskins

Author: Ingo Molnar <[EMAIL PROTECTED]>
Author: Dor Laor <[EMAIL PROTECTED]>

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/asm-i386/hypercall.h   |  138 
 include/asm-x86_64/hypercall.h |  100 +
 2 files changed, 238 insertions(+), 0 deletions(-)

diff --git a/include/asm-i386/hypercall.h b/include/asm-i386/hypercall.h
new file mode 100644
index 000..8eeb23a
--- /dev/null
+++ b/include/asm-i386/hypercall.h
@@ -0,0 +1,138 @@
+#ifndef __ASM_HYPERCALL_H
+#define __ASM_HYPERCALL_H
+
+#define CONFIG_PARAVIRT 1
+#ifdef CONFIG_PARAVIRT
+
+/*
+ * Hypercalls, according to the calling convention
+ * documented in include/linux/kvm_para.h
+ *
+ * Copyright (C) 2007, Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]>
+ * Copyright (C) 2007, Qumranet, Inc., Dor Laor <[EMAIL PROTECTED]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+static inline int __hypercall0(unsigned int nr)
+{
+   int ret;
+   asm (" call hypercall_addr\n"
+   : "=a" (ret)
+   : "a" (nr)
+   : "memory", "cc"
+   );
+   return ret;
+}
+
+static inline int __hypercall1(unsigned int nr, unsigned long p1)
+{
+   int ret;
+   asm (" call hypercall_addr\n"
+   : "=a" (ret)
+   : "a" (nr),
+ "b" (p1)
+   : "memory", "cc"
+   );
+   return ret;
+}
+
+static inline int
+__hypercall2(unsigned int nr, unsigned long p1, unsigned long p2)
+{
+   int ret;
+   asm (" call hypercall_addr\n"
+   : "=a" (ret)
+   : "a" (nr),
+ "b" (p1),
+ "c" (p2)
+   : "memory", "cc"
+   );
+   return ret;
+}
+
+static inline int
+__hypercall3(unsigned int nr, unsigned long p1, unsigned long p2,
+unsigned long p3)
+{
+   int ret;
+   asm (" call hypercall_addr\n"
+   : "=a" (ret)
+   : "a" (nr),
+ "b" (p1),
+ "c" (p2),
+ "d" (p3)
+   : "memory", "cc"
+   );
+   return ret;
+}
+
+static inline int
+__hypercall4(unsigned int nr, unsigned long p1, unsigned long p2,
+unsigned long p3, unsigned long p4)
+{
+   int ret;
+   asm (" call hypercall_addr\n"
+   : "=a" (ret)
+   : "a" (nr),
+ "b" (p1),
+ "c" (p2),
+ "d" (p3),
+ "S" (p4)
+   : "memory", "cc"
+   );
+   return ret;
+}
+
+static inline int
+__hypercall5(unsigned int nr, unsigned long p1, unsigned long p2,
+unsigned long p3, unsigned long p4, unsigned long p5)
+{
+   int ret;
+   asm (" call hypercall_addr\n"
+   : "=a" (ret)
+   : "a" (nr),
+ "b" (p1),
+ "c" (p2),
+ "d" (p3),
+ "S" (p4),
+ "D" (p5)
+   : "memory", "cc"
+   );
+   return ret;
+}
+
+static inline int
+__hypercall6(unsigned int nr, unsigned long p1, unsigned long p2,
+unsigned long p3, unsigned long p4, unsigned long p5,
+unsigned long p6)
+{
+   int ret;
+   asm (" call hypercall_addr\n"
+   : "=a" (ret)
+   : "a" (nr),
+ "b" (p1),
+ "c" (p2),
+ "d" (p3),
+ "S" (p4),
+ "D" (p5),
+ "bp" (p6)
+   : "memory", "cc"
+   );
+   return ret;
+}
+
+
+#define hypercall(nr_params, args...)  \
+({ \
+   int __ret;  \
+   \
+   __ret = __hypercall##nr_params(args);   \
+   \
+   __ret;  \
+})
+
+#endif /* CONFIG_PARAVIRT */
+
+#endif /* __ASM_HYPERCALL_H */
diff --git a/include/asm-x86_64/hypercall.h b/include/asm-x86_64/hypercall.h
new file mode 100644
index 000..5cf15b3
--- /dev/null
+++ b/include/asm-x86_64/hypercall.h
@@ -0,0 +1,100 @@
+#ifndef __ASM_HYPERCALL_H
+#define __ASM_HYPERCALL_H
+
+

[kvm-devel] [PATCH 00/13] PV-IO v4

2007-08-24 Thread Gregory Haskins

The following series implements v4 of the PV-IO series.

Changes since v3:

1) Rebased on top of kvm.git HEAD
2) Forward ported Ingo's/Dor's baseline work for paravirtualization (dropped
   balloon and net driver)
3) Fixed numerous bugs:  You can now pass packets via the IOQNET loopback.

I am on vacation next week (figures...just in time for the KVM forum) and
will also be traveling for business for a good portion of Sept.  Based on
that, I figured I would make one more drop with my latest stuff.

Have a good time at the forum!  I will talk to you guys when everyone is back
in the swing of things.

-Greg




Re: [kvm-devel] Remove APIC lock

2007-08-24 Thread Gregory Haskins

On Fri, 2007-08-24 at 22:24 +0800, Dong, Eddie wrote:
> [EMAIL PROTECTED] wrote:
> > Gregory Haskins wrote:
> >> On Fri, 2007-08-24 at 21:08 +0800, Dong, Eddie wrote:
> >>> Avi:
> >>> 
> >>> apic->lock is used in many place to avoid race condition with
> >>> apic timer call back function which may run on different pCPU.
> >>> This patch migrate the apic timer to same CPU with the one VP
> >>> runs on, thus the lock is no longer necessary.
> >>> 
> >> 
> >> What about sources that can inject interrupts besides the timer?
> >> (E.g. in-kernel PV drivers)
> > 
> > Injecting IRQ is OK, since it is just operation to IRR
> > register which we
> > can
> > use atomic operations. Xen also do in that way.
> > 
> O, said too quick. Xen current has evolved to  be protected by a
> bigger irqlock for both APIC & IOAPIC, and PIC uses per chip lock.
> 
> For our case, PIC/IOAPIC now is using kvm->lock. So APIC is working
> with kvm->lock too. But this lock may be too big for pv driver. We may
> need to think of a solution to cover both APIC & IOAPIC in future.

Yeah, I would highly recommend you make this more fine grained.  For
example, I had a vcpu->irq.lock, and a per-chip lock (e.g. one per apic,
per pic, (and one per ioapic but I never got there)).  Event injection
is a hot-spot, so coarse locking is probably going to cause performance
deficiencies.
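
As a rough userspace sketch of the per-chip locking idea (not the actual KVM code), each interrupt chip below carries its own lock, so an injection into the APIC never contends with a PIC update. All names here (chip_lock, model_apic, model_pic and their helpers) are hypothetical, and a C11 test-and-set spinlock stands in for a kernel spinlock.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Minimal test-and-set spinlock standing in for a kernel spinlock_t. */
struct chip_lock {
	atomic_flag held;
};

static void chip_lock_init(struct chip_lock *l)
{
	atomic_flag_clear(&l->held);
}

static void chip_lock_acquire(struct chip_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the current holder releases */
}

static void chip_lock_release(struct chip_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical interrupt-chip models, each guarded by its own lock. */
struct model_apic {
	struct chip_lock lock;
	uint64_t irr[4];        /* one pending bit per vector, 256 total */
};

struct model_pic {
	struct chip_lock lock;
	uint8_t irr;            /* one pending bit per input line, 8 total */
};

static void model_apic_init(struct model_apic *apic)
{
	chip_lock_init(&apic->lock);
	memset(apic->irr, 0, sizeof(apic->irr));
}

static void model_pic_init(struct model_pic *pic)
{
	chip_lock_init(&pic->lock);
	pic->irr = 0;
}

/* Injection into the APIC contends only with other APIC users. */
static void apic_inject(struct model_apic *apic, uint8_t vector)
{
	chip_lock_acquire(&apic->lock);
	apic->irr[vector / 64] |= UINT64_C(1) << (vector % 64);
	chip_lock_release(&apic->lock);
}

/* Raising a PIC line never touches the APIC lock. */
static void pic_raise(struct model_pic *pic, unsigned int line)
{
	chip_lock_acquire(&pic->lock);
	pic->irr |= (uint8_t)(1u << (line & 7));
	chip_lock_release(&pic->lock);
}

static int apic_pending(struct model_apic *apic, uint8_t vector)
{
	int set;

	chip_lock_acquire(&apic->lock);
	set = (int)((apic->irr[vector / 64] >> (vector % 64)) & 1);
	chip_lock_release(&apic->lock);
	return set;
}
```

With a single coarse lock, apic_inject() and pic_raise() would serialize against each other even though they touch unrelated state.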


Re: [kvm-devel] Remove APIC lock

2007-08-24 Thread Gregory Haskins

On Fri, 2007-08-24 at 21:08 +0800, Dong, Eddie wrote:
> Avi:
> 
> apic->lock is used in many place to avoid race condition with apic
> timer call back
> function which may run on different pCPU. This patch migrate the
> apic timer to
> same CPU with the one VP runs on, thus the lock is no longer
> necessary.
> 

What about sources that can inject interrupts besides the timer?  (E.g.
in-kernel PV drivers)


Re: [kvm-devel] [kvm-commits] KVM: VMX: Use shadow TPR/cr8 for 64-bits guests

2007-08-23 Thread Gregory Haskins

On Thu, 2007-08-23 at 10:46 +0300, Avi Kivity wrote:
> repository: /home/avi/kvm/linux-2.6
> branch: lapic5
> commit 8ed05c33d82a394c90b5dd830513416e59ffcf68
> Author: Yang, Sheng <[EMAIL PROTECTED]>
> Date:   Wed Aug 22 15:03:15 2007 +0300
> 
> KVM: VMX: Use shadow TPR/cr8 for 64-bits guests
> 
> This patch enables TPR shadow of VMX on CR8 access. 64bit Windows using
> CR8 access TPR frequently. The TPR shadow can improve the performance of
> access TPR by not causing vmexit.
> 

I remember when I added something similar to the lapic series that both
of my C2D systems (Merom + Woodcrest) reported that they didn't support
the feature in the HW.  Since those were fairly new, which ones *do*
support this (out of curiosity), or is there a microcode update floating
around out there ;).

-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins

On Tue, 2007-08-21 at 20:12 +0300, Avi Kivity wrote:

> No, sync() means "make the other side aware that there's work to be done".
> 

Ok, but still the important thing isn't the kick per se, but the
resulting completion.  Can we do interrupt-driven reclamation?  Some
of those virtio_net emails I saw kicking around earlier today implied
buffers are reclaimed on the next xmit (e.g. polling), which violates the
netif rules for avoiding deadlock.  I suppose that could have just been
an implementation decision... but I remember wondering how reaping would
work when virtio first came out.

Regards,
-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins

On Tue, 2007-08-21 at 10:06 -0400, Gregory Haskins wrote:
> On Tue, 2007-08-21 at 23:47 +1000, Rusty Russell wrote:
> > 
> > In the guest -> host direction, an interface like virtio is designed
> > for batching, with the explicit distinction between add_buf & sync.
> 
> Right.  IOQ has "iter_push()" and "signal()" as synonymous operations.

Hi Rusty,
  This reminded me of an area that I thought might have been missing in
virtio compared to IOQ.  That is, flexibility in the io-completion via
the distinction between "signal" and "sync".  sync() implies that it's a
blocking call based on the full drain of the queue, correct?  The
ioq_signal() operation is purely a "kick".  You can, of course, still
implement synchronous functions with a higher layer construct such as
the ioq->wq.  For example:

void send_sync(struct ioq *ioq, struct sk_buff *skb)
{
	DECLARE_WAITQUEUE(wait, current);
	struct ioq_iterator iter;

	ioq_iter_init(ioq, &iter, ioq_idxtype_inuse, IOQ_ITER_AUTOUPDATE);

	ioq_iter_seek(&iter, ioq_seek_head, 0, 0);

	/* Update the iter.desc->ptr with skb details */

	mb();
	iter.desc->valid = 1;
	iter.desc->sown  = 1; /* give ownership to the south */
	mb();

	ioq_iter_push(&iter, 0);

	add_wait_queue(&ioq->wq, &wait);

	/* Wait until we (the north) own it again, i.e. sown drops to 0 */
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!iter.desc->sown)
			break;
		schedule();
	}

	set_current_state(TASK_RUNNING);
	remove_wait_queue(&ioq->wq, &wait);
}

But really the goal behind this design was to allow for fine-grained
selection of how io-completion is notified.  E.g. callback-based
(interrupt-driven) deferred reclamation/reaping (see
ioqnet_tx_complete), sleeping-wait via ioq->wq, busy-wait, etc.

Is there a way to do something similar in virtio? (and forgive me if
there is..I still haven't seen the code).  And if not and people like
that idea, what would be a good way to add it to the interface?

Regards,
-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins

On Tue, 2007-08-21 at 23:47 +1000, Rusty Russell wrote:

> Hi Gregory,
> 
>   The main current use is disk drivers: they process out-of-order.

Maybe for you ;)  I am working on the networking/IVMC side.

> 
> >   I think the use of rings for the tx-path in of
> > itself is questionable unless you can implement something like the bidir
> > NAPI that I demonstrated in ioqnet.  Otherwise, you end up having to
> > hypercall on each update to the ring anyway and you might as well
> > hypercall directly w/o using a ring.
> 
>   In the guest -> host direction, an interface like virtio is designed
> for batching, with the explicit distinction between add_buf & sync.

Right.  IOQ has "iter_push()" and "signal()" as synonymous operations.
But note that batching via deferred synchronization does not implicitly
require a shared queue. E.g. you could batch internally and then
hypercall at the "sync" point.  However, batching via a queue is still
nice because at least you give the host side a chance to independently
"notice" the changes concurrently before the sync.  But I digress...

>   On
> the receive side, you can have explicit interrupt suppression on
> implicit mitigation caused by scheduling effects.

Agreed.  This is precisely what the bidir NAPI stuff is doing and I
didn't mean to imply that virtio wasn't capable of it too.  All I meant
is that if you *don't* take advantage of it, the guest->host path via a
queue is likely overkill.  E.g. you might as well hypercall instead.
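
To make the batching argument concrete, here is a small sketch in which several inserts are covered by one notification, so the insert-to-hypercall ratio rises above 1. It is a single-threaded model under simplifying assumptions: struct txq, txq_push() and txq_signal() are hypothetical names (not the IOQ or virtio API), the "hypercall" is just a counter, and the host is assumed to consume everything at the signal point.

```c
#include <stddef.h>

#define TXQ_SIZE 8

struct txq {
	int      slots[TXQ_SIZE];
	size_t   head;        /* next slot to fill */
	size_t   unsynced;    /* entries queued since the last signal */
	unsigned hypercalls;  /* number of simulated guest->host exits */
};

/* Queue one entry without notifying the host. Returns 0, or -1 if full. */
static int txq_push(struct txq *q, int val)
{
	if (q->unsynced == TXQ_SIZE)
		return -1;
	q->slots[q->head] = val;
	q->head = (q->head + 1) % TXQ_SIZE;
	q->unsynced++;
	return 0;
}

/*
 * One simulated hypercall covers everything queued since the last one.
 * Returns the number of entries handed over (0 means no exit was needed).
 */
static size_t txq_signal(struct txq *q)
{
	size_t n = q->unsynced;

	if (n) {
		q->hypercalls++;
		q->unsynced = 0;
	}
	return n;
}
```

Calling txq_signal() after every single txq_push() degenerates to one exit per insert, which is the case where a direct hypercall would do just as well.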

>   But in fact as we can see, two rings need less from each ring than one
> ring.  One ring must have producer and consumer indices, so the producer
> doesn't overrun the consumer.  But if the second ring is used to feed
> consumption, the consumer index isn't required any more: in fact, it's
> just confusing to have.

Don't get me wrong.  I am totally in favor of the two ring approach.
You have enlightened me on that front. :)  I was under the impression
that then making the two-ringed approach support out-of-order added
significantly more complexity.  Did I understand that wrong?

> 
>   I really think that a table of descriptors, a ring for produced
> descriptors and a ring for used descriptors is the most cache-friendly,
> bidir-non-trusting simple implementation possible.  Of course, the
> produced and used rings might be the same format, which allows code
> sharing and if you squint a little, that's your "lowest level" simple
> ringbuffer.

Sounds reasonable to me.

> 
> Thanks for the discussion,

Ditto!
-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins

On Tue, 2007-08-21 at 15:25 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > On Tue, 2007-08-21 at 17:58 +1000, Rusty Russell wrote:
> >
> >   
> >> Partly the horror of the code, but mainly because it is an in-order
> >> ring.  You'll note that we use a reply ring, so we don't need to know
> >> how much the other side has consumed (and it needn't do so in order).
> >>
> >> 
> >
> > I have certainly been known to take a similar stance when looking at Xen
> > code ;) (recall the lapic work I did).  However, that said I am not yet
> > convinced that an out-of-order ring (at least as a fundamental
> > primitive) buys us much.  
> 
> It's pretty much required for block I/O into disk arrays.

You are misunderstanding me.  I totally agree that block io is
inherently out-of-order.  What I am trying to convey is that at a
fundamental level *everything* (including block-io) can be viewed as an
ordered sequence of events.

For instance, consider that a block-io driver is making requests like
"perform read transaction X", and "perform write transaction Y".
Likewise, the host side can pass events like "completed transaction Y"
and "completed transaction X".  At this level, everything is *always*
ordered, regardless of the fact that X and Y were temporally rearranged
by the host.

This is what the ioq/pvbus series is trying to address:  These low-level
primitives for moving events in and out of the guest in a VMM agnostic
way.  From there, you could apply higher level constructs such as an
out-of-order sg descriptor ring to represent your block-io data.  The
low-level primitives simply become a way to convey changes to that
construct.

In a nutshell, IOQ provides a simple bi-directional ordered event
channel and a context associated hypercall mechanism (see
pvbus_device->call()) to accomplish these low-level chores.
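
A minimal sketch of that idea, assuming nothing about the real IOQ layout: the channel below is strictly FIFO, yet it can carry completions for transactions that the host finished in a different order than they were requested. The types and functions (struct event, struct ev_chan, chan_send(), chan_recv()) are hypothetical illustrations.

```c
#include <stdint.h>
#include <stddef.h>

enum ev_type { EV_READ_REQ, EV_WRITE_REQ, EV_COMPLETE };

struct event {
	enum ev_type type;
	uint32_t     txn;    /* transaction id chosen by the requester */
};

#define CHAN_SIZE 16

/* Strictly in-order channel: events leave in the order they entered. */
struct ev_chan {
	struct event ring[CHAN_SIZE];
	size_t head;     /* next event to receive */
	size_t count;    /* events currently queued */
};

static int chan_send(struct ev_chan *c, enum ev_type type, uint32_t txn)
{
	size_t tail;

	if (c->count == CHAN_SIZE)
		return -1;               /* channel full */
	tail = (c->head + c->count) % CHAN_SIZE;
	c->ring[tail].type = type;
	c->ring[tail].txn = txn;
	c->count++;
	return 0;
}

static int chan_recv(struct ev_chan *c, struct event *out)
{
	if (c->count == 0)
		return -1;               /* nothing pending */
	*out = c->ring[c->head];
	c->head = (c->head + 1) % CHAN_SIZE;
	c->count--;
	return 0;
}
```

If the guest sends requests X then Y and the host completes Y first, the host simply sends EV_COMPLETE for Y and then for X. The channel itself never reorders anything; the out-of-order behavior lives entirely in the transaction ids.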

I am also advocating caution on the tx path, as I think indirection
(e.g. queuing) as opposed to direct access (e.g. contextual hypercall)
has limited applicability.  Trying to come up with a complex
"one-size-fits-all" queue for the tx path may not be worthwhile since in
the end there is still a 1:1 with queue-insert:hypercall.  You might as
well just pass the descriptor directly via the contextual hypercall.
Where this ends up being a win is where you can do the bi-dir NAPI-like
tricks like IOQNET and have the queue-insert to hypercall ratio become >
1.  

> 
> Xen does out-of-order, btw, on its single ring, but at the cost of some 
> complexity.  I don't believe it is worthwhile and prefer split 
> request/reply rings.

I am not against the split rings either.  The article that Rusty
forwarded was very interesting indeed.  But if I understood the article
and Rusty, there are kind of two aspects to it.  A) Using two rings to
make a cache-friendly ordered ring, or B) adding out-of-order
capability to these two rings.  I am certainly in favor of (A) for use
as the low-level event transport.  I just question whether the
complexity of (B) is justified as the one and only queuing mechanism
when there are plenty of patterns that simply cannot take advantage of
it.

What I am wondering is if we should have a set of low-level primitives
that deal primarily with ordered event sequencing and VMM abstraction,
and a higher set of code expressed in terms of these primitives for
implementing the constructs such as (B) for block-io.

> 
> With my VJ T-shirt on, I can even say it's more efficient, as each side 
> of the ring will have a single writer and a single reader, reducing 
> ping-pong effects if the interrupt completions happens to land on the 
> wrong cpu.

Agreed.

> 
> Network tx can be out of order too (with some traffic destined to other 
> guests, some to the host, and some to external interfaces, completions 
> will be out of order).

Well, not with respect to the 1:1 event delivery channel as I envision
it (unless I am misunderstanding you?)

Regards,
-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins

On Tue, 2007-08-21 at 17:58 +1000, Rusty Russell wrote:

> Partly the horror of the code, but mainly because it is an in-order
> ring.  You'll note that we use a reply ring, so we don't need to know
> how much the other side has consumed (and it needn't do so in order).
> 

I have certainly been known to take a similar stance when looking at Xen
code ;) (recall the lapic work I did).  However, that said I am not yet
convinced that an out-of-order ring (at least as a fundamental
primitive) buys us much.  I think the use of rings for the tx-path in and
of itself is questionable unless you can implement something like the bidir
NAPI that I demonstrated in ioqnet.  Otherwise, you end up having to
hypercall on each update to the ring anyway and you might as well
hypercall directly w/o using a ring.

At a fundamental level, I think we simply need an efficient and in-order
(read: simple) ring to move data in, and a context associated hypercall
to get out.  We can also use that simple ring to move data out if it's
advantageous to do so (read: tx NAPI can be used).  From there, we can
build more complex constructs from these primitives, like out-of-order
sg block-io.

OTOH, it's possible that it's redundant to have a simple low-level
infrastructure and then build a more complex ring for out-of-order
processing on top of it.  I'm not sure.  My gut feeling is that it will
probably result in a cleaner implementation: The higher-layered ring can
stop worrying about the interrupt/hypercall details (it would use the
simple ring as its transport) and implementations that don't need
out-of-order (e.g. networks) don't have to deal with the associated
complexity.

What are your thoughts on this layering approach?

Regards,
-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-20 Thread Gregory Haskins

On Mon, 2007-08-20 at 07:03 -0700, Dor Laor wrote:
> >> > 2) We either need huge descriptors or some chaining
> >mechanism to
> >> > handle scatter-gather.
> >> >
> >>
> >> Or, my preference, have a small sglist in the descriptor;
> >
> >
> >Define "small" ;)
> >
> >There a certainly patterns that cannot/will-not take advantage of SG
> >(for instance, your typical network rx path), and therefore the sg
> >entries are wasted in some cases.  Since they need to be (IMHO) u64,
> >they suck down at least 8 bytes a piece.  Because of this I elected to
> >use the model of one pointer per descriptor, with an external
> descriptor
> >for SG.  What are your thoughts on this?
> 
> Using Rusty's code there is no waste.
> Each descriptor has a flag (head|next). Next flag stands for pointer to
> the
> next descriptor with u32 next index. So the waste is 4 bytes.
> Sg descriptors are chained on the same descriptor ring.

Right, so he is using a chaining mechanism and I was using a
single-pointer + external-descriptor mechanism.  (Actually you can chain
with IOQ too if you want but I chose to implement the IOQNET example
with external-descriptors).  I'm not sure if either way is particularly
better than the other.  The important thing (IMO) is that either way you
avoid waste for the (not so uncommon) non-sg case.
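
For illustration, a chained layout along the lines Dor describes might be walked as follows. This is a sketch under assumptions: the field layout of struct chain_desc and the DESC_F_NEXT flag are hypothetical and are not taken from Rusty's code or from IOQ.

```c
#include <stdint.h>
#include <stddef.h>

#define DESC_F_NEXT 0x1u   /* hypothetical flag: 'next' holds a valid index */

struct chain_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
	uint32_t next;     /* index of the next descriptor in the same ring */
};

/*
 * Sum the lengths of one scatter-gather chain starting at 'head'.
 * Returns -1 on an out-of-range index or a chain longer than the ring
 * (which would indicate a loop).
 */
static int64_t chain_total_len(const struct chain_desc *ring,
			       uint32_t ring_size, uint32_t head)
{
	int64_t total = 0;
	uint32_t idx = head;
	uint32_t hops;

	for (hops = 0; hops < ring_size; hops++) {
		if (idx >= ring_size)
			return -1;
		total += ring[idx].len;
		if (!(ring[idx].flags & DESC_F_NEXT))
			return total;
		idx = ring[idx].next;
	}
	return -1;
}
```

A non-sg buffer is just a chain of length one, so the only per-descriptor cost of supporting scatter-gather is the flags and next fields.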

You still owe me some code, BTW ;)


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-20 Thread Gregory Haskins

On Sun, 2007-08-19 at 12:24 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > 2) We either need huge descriptors or some chaining mechanism to
> > handle scatter-gather.
> >   
> 
> Or, my preference, have a small sglist in the descriptor;

Define "small" ;)

There are certainly patterns that cannot/will-not take advantage of SG
(for instance, your typical network rx path), and therefore the sg
entries are wasted in some cases.  Since they need to be (IMHO) u64,
they suck down at least 8 bytes apiece.  Because of this I elected to
use the model of one pointer per descriptor, with an external descriptor
for SG.  What are your thoughts on this?

-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-17 Thread Gregory Haskins

On Fri, 2007-08-17 at 17:43 +1000, Rusty Russell wrote:

>   Sure, these discussions can get pretty esoteric.  The question is
> whether you want a point-to-point transport (as we discuss here), or an
> N-way.  Lguest has N-way, but I'm not convinced it's worthwhile, as
> there's some overhead involved in looking up recipients (basically futex
> code).

Ah, ok I get it.  In that case: yeah, I agree 1:1 is probably the way to
go.  We can always build some kind of N-way transport in terms of 1:1
primitives if it's desirable (though it's probably better in most cases to
just reuse something like an ethernet transport/bridge than invent
something new).

> 
> I agree that page sharing is silly.  But we can design a mechanism where
> it such a "DMA agent" need only enforce a few very simple rules not the
> whole protocol, and yet the guest doesn't know whether it's talking to
> an agent or the host.

Interesting...  I would love to hear more of your ideas surrounding this.

> Well, for cache reasons you should really try to avoid having both sides
> write to the same data.  Hence two separate cache-aligned regions is
> better than one region and a flip bit.

While I certainly can see what you mean about the cache implications for
a bit-flip design, I don't see how you can get away with not having both
sides write to the same memory in other designs either.  Wouldn't you
still have to adjust descriptors from one ring to the other?  E.g.
wouldn't both sides be writing descriptor pointer data in this case, or
am I missing something?

> And if you make them separate pages, then this can also be inter-guest
> safe 8)

Ok, now you are making my head hurt 8)

> 
> Yeah, I agree.  I'm not sure how important it is IRL, but it *feels*
> clever 8)

Heh, yeah, I agree I don't know how much it saves.  I kind of got it for
free based on the general design of the queue, so I thought "hey that's
pretty cool".  It shouldn't *hurt*, anyway ;)

> 
> Yeah, I fear grant tables too.  But in any scheme, the descriptors imply
> permission, so with a little careful design and implementation it should
> "just work"...
> 

I am certainly looking forward to hearing more of your ideas in this
area.  Very interesting, indeed.

Regards,
-Greg


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-16 Thread Gregory Haskins

Hi Rusty,

 Comments inline...

On Fri, 2007-08-17 at 11:25 +1000, Rusty Russell wrote:
> 
> Transport has several parts.  What the hypervisor knows about (usually
> shared memory and some interrupt mechanism and possibly "DMA") and what
> is convention between users (eg. ringbuffer layouts).  Whether it's 1:1
> or n-way (if 1:1, is it symmetrical?).

TBH, I am not sure what you mean by 1:1 vs n-way ringbuffers (it's
probably just lack of sleep and tomorrow I will smack myself for
asking ;)

But could you elaborate here?

>   Whether it has to be host <->
> guest, or can be inter-guest.  Whether it requires trust between the
> sides.
> 
> My personal thoughts are that we should be aiming for 1:1 untrusting.

Untrusting I understand, and I agree with you there.  Obviously the host
is implicitly trusted (you have no choice, really) but I think the
guests should be validated just as you would for a standard
userspace/kernel interaction (e.g. validate pointer arguments and their
range, etc).
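
As a sketch of the kind of check meant here, assuming a guest whose memory is a single contiguous range of mem_size bytes starting at guest-physical address 0 (an assumption made only for illustration; guest_range_ok() is a hypothetical helper):

```c
#include <stdint.h>

/*
 * Returns 1 if [gpa, gpa + len) lies entirely inside [0, mem_size),
 * 0 otherwise. Written so that gpa + len is never computed, since a
 * hostile guest could pick values that wrap around 64 bits.
 */
static int guest_range_ok(uint64_t gpa, uint64_t len, uint64_t mem_size)
{
	if (len == 0)
		return 0;                  /* reject empty buffers outright */
	if (gpa >= mem_size)
		return 0;
	return len <= mem_size - gpa;      /* safe: gpa < mem_size here */
}
```

The subtraction form matters: a naive gpa + len <= mem_size test passes for gpa near UINT64_MAX because the sum wraps to a small number.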

> And not having inter-guest is just
> poor form (and putting it in later is impossible, as we'll see).

I agree that having an ability to do inter-guest is a good idea.
However, I don't know if I am convinced that it has to be done in a
direct, zero-copy way. Mediating through the host certainly can work and
is probably acceptable for most things.  In this way the host is
essentially acting as a DMA agent to copy from one guest's memory to the
other.  It solves the "trust" issue and simplifies the need to have a
"grant table" like mechanism which can get pretty hairy, IMHO.

I *could* be convinced otherwise, but that is my current thought.  This
would essentially look very similar to how my patch #4 (loopback) works.
It takes a pointer from a tx-queue and copies the data to a pointer
from an empty descriptor in the other side's rx-queue.  If you move that
concept down into the host this is how I was envisioning it working.
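
A stripped-down sketch of that copy step, with the host acting as the DMA agent between one side's tx descriptor and the other side's posted rx descriptor. struct buf_desc and mediate_copy() are hypothetical and use plain pointers; a real host would first translate and validate guest addresses.

```c
#include <string.h>
#include <stddef.h>

struct buf_desc {
	void   *ptr;    /* buffer address */
	size_t  len;    /* payload length (tx) or buffer capacity (rx) */
	size_t  alen;   /* actual length written by the copier */
	int     valid;  /* descriptor carries a usable buffer */
};

/* Copy one tx payload into one posted rx buffer. Returns 0, or -1 on failure. */
static int mediate_copy(const struct buf_desc *tx, struct buf_desc *rx)
{
	if (!tx->valid || !rx->valid)
		return -1;
	if (tx->len > rx->len)
		return -1;    /* rx buffer too small; refuse rather than truncate */
	memcpy(rx->ptr, tx->ptr, tx->len);
	rx->alen = tx->len;
	return 0;
}
```

Because only the host touches both buffers, neither guest needs to map or trust the other's memory.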

> 
> It seems that a shared-memory "ring-buffer of descriptors" is the
> simplest implementation.  But there are two problems with a simple
> descriptor ring:
> 
> 1) A ring buffer doesn't work well for things which process
> out-of-order, such as a block device.
> 2) We either need huge descriptors or some chaining mechanism to
> handle scatter-gather.
> 

I definitely agree that a simple descriptor-ring in and of itself doesn't
solve all possible patterns directly.  I don't know if you had a chance
to look too deeply into the IOQ code yet, but it essentially is a very
simple descriptor-ring as you mention.

However, I don't view that as a limitation because I envision this type
of thing to be just one "tool" or layer in a larger puzzle.  One that
can be applied many different ways to solve more complex problems.

(The following is a long and boring story about my train of thought and
how I got to where I am today with this code)

What I was seeing as a general problem is efficient basic event
movement.  Each guest->host or host->guest transition is expensive, so
we want to minimize the number of these occurring (ideally down to 1
(or less!) per operation).

Now moving events out of a guest in one (or fewer) IO operations is
fairly straightforward (hypercall namespace is typically pretty large
and they can have accompanying parameters (including pointers)
associated with them).  However, moving events *into* the guest in one
(or fewer) shots is difficult because by default you really only have a
single parameter (interrupt vector) to convey any meaning.  To make
matters worse, the namespace for vectors can be rather small (e.g. 256
on x86).

Now traditionally we would of course solve the latter problem by
turning around and doing some kind of additional IO operation to get
more details about the event.  And why not?  It's dirt cheap on
bare-metal.  Of course, in a VM this is particularly expensive and we
want to avoid it.

Enter the shared memory concept:  E.g. put details about the event
somewhere in memory that can be read in the guest without a VMEXIT.  Now
your interrupt vector is simply ringing the doorbell on your
shared-memory.  The question becomes: how do you synchronize access to
the memory without necessitating as much overhead as you had to begin
with? E.g. how does one side know when the other side is done and wants
more data, etc.  What if you want to parallelize things, etc.

Enter the shared-memory queue: Now you have a way to organize your
memory such that both sides can use it effectively and simultaneously.

So there you have it:  We can use a simple shared-memory-queue to
efficiently move event data into a guest.  And we can use hypercalls to
efficiently move it out.  As it turns out, there are also cases where
using a queue for the output side makes sense too, but the basic case is
for input.  But long story short, that is the basic fundamental purpose
of this subsystem.
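
To make the "1 (or less!) signal per operation" idea concrete, here is a
minimal single-threaded sketch of a queue with a suppressible doorbell.
It is an illustration only, not the IOQ code: `evq`, `evq_push` and
`evq_drain` are hypothetical names, the `doorbells` counter merely stands
in for an interrupt injection, and a real shared-memory version would
need memory barriers or atomics that are omitted here.

```c
#include <stdint.h>

#define EVQ_LEN 8 /* hypothetical ring size */

struct evq {
	uint32_t events[EVQ_LEN];
	uint32_t head;        /* next slot the consumer reads */
	uint32_t tail;        /* next slot the producer writes */
	uint32_t count;       /* number of queued events */
	int      irq_enabled; /* consumer wants a doorbell */
	unsigned doorbells;   /* how many (expensive) signals were sent */
};

/* Producer side: queue an event, ring the doorbell only if asked to. */
static int evq_push(struct evq *q, uint32_t ev)
{
	if (q->count == EVQ_LEN)
		return -1; /* ring full */

	q->events[q->tail] = ev;
	q->tail = (q->tail + 1) % EVQ_LEN;
	q->count++;

	if (q->irq_enabled) {
		q->irq_enabled = 0; /* consumer notified; suppress repeats */
		q->doorbells++;     /* stands in for injecting an interrupt */
	}
	return 0;
}

/* Consumer side: drain what is queued, then re-arm the doorbell. */
static unsigned evq_drain(struct evq *q, uint32_t *out, unsigned max)
{
	unsigned n = 0;

	while (q->count && n < max) {
		out[n++] = q->events[q->head];
		q->head = (q->head + 1) % EVQ_LEN;
		q->count--;
	}
	if (!q->count)
		q->irq_enabled = 1;
	return n;
}
```

Pushing three events before the consumer runs costs a single doorbell
here, which is the "less than one transition per operation" case.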

Now enter the more complex usage patterns:  

For instance, a block device driver co

[kvm-devel] [PATCH 05/10] IRQ: Export create_irq/destroy_irq

2007-08-16 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/io_apic.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index d8bfe31..6bf8794 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -1849,6 +1849,7 @@ int create_irq(void)
}
return irq;
 }
+EXPORT_SYMBOL(create_irq);
 
 void destroy_irq(unsigned int irq)
 {
@@ -1860,6 +1861,7 @@ void destroy_irq(unsigned int irq)
__clear_irq_vector(irq);
spin_unlock_irqrestore(&vector_lock, flags);
 }
+EXPORT_SYMBOL(destroy_irq);
 
 /*
  * MSI mesage composition


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

[kvm-devel] [PATCH 09/10] KVM: Add PVBUS support to the KVM host

2007-08-16 Thread Gregory Haskins

PVBUS allows VMM agnostic PV drivers to discover/configure virtual resources

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig  |   10 +
 drivers/kvm/Makefile |3 
 drivers/kvm/kvm.h|4 
 drivers/kvm/kvm_main.c   |4 
 drivers/kvm/pvbus_host.c |  636 ++
 drivers/kvm/pvbus_host.h |   66 +
 6 files changed, 723 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index d9def33..9f2ef22 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -52,6 +52,16 @@ config KVM_IOQ_HOST
depends on KVM
select IOQ
 
+config KVM_PVBUS_HOST
+   boolean "Paravirtualized Bus (PVBUS) host support"
+   depends on KVM
+   select KVM_IOQ_HOST
+   ---help---
+PVBUS is an infrastructure for generic PV drivers to take advantage
+of an underlying hypervisor without having to understand the details
+   of the hypervisor itself.  You only need this option if you plan to
+   run PVBUS based PV guests in KVM.
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 2095061..8926fa9 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -7,6 +7,9 @@ kvm-objs := kvm_main.o mmu.o x86_emulate.o
 ifeq ($(CONFIG_KVM_IOQ_HOST),y)
 kvm-objs += ioq_host.o
 endif
+ifeq ($(CONFIG_KVM_PVBUS_HOST),y)
+kvm-objs += pvbus_host.o
+endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index c38c84f..8dc9ac3 100755
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vmx.h"
 #include "ioq.h"
@@ -393,6 +394,9 @@ struct kvm {
 #ifdef CONFIG_KVM_IOQ_HOST
struct ioq_mgr *ioqmgr;
 #endif
+#ifdef CONFIG_KVM_PVBUS_HOST
+   struct kvm_pvbus *pvbus;
+#endif
 
 };
 
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index fbffd2f..d35ce8d 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -44,6 +44,7 @@
 
 #include "x86_emulate.h"
 #include "segment_descriptor.h"
+#include "pvbus_host.h"
 
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
@@ -350,6 +351,7 @@ static struct kvm *kvm_create_vm(void)
spin_unlock(&kvm_lock);
}
kvmhost_ioqmgr_init(kvm);
+   kvm_pvbus_init(kvm);
return kvm;
 }
 
@@ -3616,6 +3618,7 @@ static __init int kvm_init(void)
memset(__va(bad_page_address), 0, PAGE_SIZE);
 
kvmhost_ioqmgr_module_init();
+   kvm_pvbus_module_init();
 
return 0;
 
@@ -3637,6 +3640,7 @@ static __exit void kvm_exit(void)
mntput(kvmfs_mnt);
unregister_filesystem(&kvm_fs_type);
kvm_mmu_module_exit();
+   kvm_pvbus_module_exit();
 }
 
 module_init(kvm_init)
diff --git a/drivers/kvm/pvbus_host.c b/drivers/kvm/pvbus_host.c
new file mode 100644
index 000..cc506f4
--- /dev/null
+++ b/drivers/kvm/pvbus_host.c
@@ -0,0 +1,636 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus.h"
+#include "pvbus_host.h"
+#include "kvm.h"
+
+struct pvbus_map {
+   int (*compare)(const void *left, const void *right);
+   const void* (*getkey)(struct rb_node *node);
+
+   struct mutex   lock;
+   struct rb_root root;
+   size_t count;
+};
+
+struct _pv_devtype {
+   struct kvm_pv_devtype *item;
+   struct rb_node node;
+};
+
+struct _pv_device {
+   struct kvm_pv_device  *item;
+   struct rb_node node;
+   struct _pv_devtype*parent;
+   intsynced;
+};
+
+static struct pvbus_map pvbus_typemap;
+
+struct kvm_pvbus_eventq {
+   struct mutex lock;
+   struct ioq  *ioq;
+
+};
+
+struct kvm_pvbus {
+   struct mutexlock;
+   struct kvm *kvm;
+   struct pvbus_mapdevmap;
+   struct kvm_pvbus_ev

[kvm-devel] [PATCH 10/10] KVM: Add an IOQNET backend driver

2007-08-16 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig   |5 
 drivers/kvm/Makefile  |2 
 drivers/kvm/ioqnet_host.c |  566 +
 3 files changed, 573 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 9f2ef22..19551a2 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -62,6 +62,11 @@ config KVM_PVBUS_HOST
of the hypervisor itself.  You only need this option if you plan to
run PVBUS based PV guests in KVM.
 
+config KVM_IOQNET
+   tristate "IOQNET host support"
+   depends on KVM
+   select KVM_PVBUS_HOST
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 8926fa9..66e5272 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -22,3 +22,5 @@ kvm-net-host-objs = kvm_net_host.o
 obj-$(CONFIG_KVM_NET_HOST) += kvm_net_host.o
 kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
 obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
+kvm-ioqnet-objs := ioqnet_host.o
+obj-$(CONFIG_KVM_IOQNET) += kvm-ioqnet.o
\ No newline at end of file
diff --git a/drivers/kvm/ioqnet_host.c b/drivers/kvm/ioqnet_host.c
new file mode 100644
index 000..0f4d055
--- /dev/null
+++ b/drivers/kvm/ioqnet_host.c
@@ -0,0 +1,566 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * ioqnet - A paravirtualized network device based on the IOQ interface.
+ *
+ * This module represents the backend driver for an IOQNET driver on the KVM
+ * platform.
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * Derived in part from the SNULL example from the book "Linux Device
+ * Drivers" by Alessandro Rubini and Jonathan Corbet, published
+ * by O'Reilly & Associates.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include  /* printk() */
+#include  /* kmalloc() */
+#include   /* error codes */
+#include   /* size_t */
+#include  /* mark_bh */
+
+#include 
+#include/* struct device, and other headers */
+#include  /* eth_type_trans */
+#include   /* struct iphdr */
+#include  /* struct tcphdr */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus_host.h"
+#include "kvm.h"
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#define IOQNET_NAME "ioqnet"
+
+/*
+ * FIXME: Any "BUG_ON" code that can be triggered by a malicious guest must
+ * be turned into an inject_gp()
+ */
+
+struct ioqnet_queue {
+   struct ioq  *queue;
+   struct ioq_notifier  notifier;
+};
+
+struct ioqnet_priv {
+   spinlock_t   lock;
+   struct kvm  *kvm;
+   struct kvm_pv_device pvdev;
+   struct net_device   *netdev;
+   struct net_device_stats  stats;
+   struct ioqnet_queue  rxq;
+   struct ioqnet_queue  txq;
+   struct tasklet_structtxtask;
+   int  connected;
+   int  opened;
+};
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+/*
+ * Enable and disable receive interrupts.
+ */
+static void ioqnet_rx_ints(struct net_device *dev, int enable)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   struct ioq *ioq = priv->rxq.queue;
+
+   if (priv->connected) {
+   if (enable)
+   ioq_start(ioq, 0);
+   else
+   ioq_stop(ioq, 0);
+   }
+}
+
+/*
+ * Open and close
+ */
+
+int ioqnet_open(struct net_device *dev)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   
+   priv->opened = 1;
+   netif_start_queue(dev);
+   
+   return 0;
+}
+
+int ioqnet_release(struct net_device *dev)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   
+   priv->opened = 0;
+   netif_stop_queue(dev);
+
+   return 0;
+}
+
+/*
+ * Configuration changes (passed on by ifconfig)
+ */

[kvm-devel] [PATCH 07/10] KVM: Add a gpa_to_hva helper function

2007-08-16 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h |1 +
 drivers/kvm/mmu.c |   12 
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 9934f11..05d5be1 100755
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -475,6 +475,7 @@ void vcpu_load(struct kvm_vcpu *vcpu);
 void vcpu_put(struct kvm_vcpu *vcpu);
 
 hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa);
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa);
 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1)
 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB)
 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index e84c599..daaf0d2 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -766,6 +766,18 @@ hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 }
 EXPORT_SYMBOL_GPL(gpa_to_hpa);
 
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa)
+{
+   struct page *page;
+
+   if (gpa & HPA_ERR_MASK)
+   return NULL;
+
+   page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
+   return kmap_atomic(page, gpa & PAGE_MASK);
+}
+EXPORT_SYMBOL_GPL(gpa_to_hva);
+
 hpa_t gva_to_hpa(struct kvm_vcpu *vcpu, gva_t gva)
 {
gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, gva);
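

For reference, a guest physical address splits into a page frame number
and an offset within that page.  The sketch below is a user-space
illustration that assumes 4 KiB pages; the `EX_`-prefixed constants and
`ex_` helpers are hypothetical stand-ins for the kernel's PAGE_SHIFT,
PAGE_SIZE and PAGE_MASK, not kernel code.  Note that Linux defines
PAGE_MASK as ~(PAGE_SIZE - 1), so `gpa & PAGE_MASK` yields the
page-aligned base, while the offset within the page is `gpa & ~PAGE_MASK`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel macros, assuming 4 KiB pages. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1ULL << EX_PAGE_SHIFT)
#define EX_PAGE_MASK  (~(EX_PAGE_SIZE - 1))

/* Guest frame number: which page the address falls in. */
static uint64_t ex_gfn(uint64_t gpa)
{
	return gpa >> EX_PAGE_SHIFT;
}

/* Byte offset of the address inside its page. */
static uint64_t ex_page_offset(uint64_t gpa)
{
	return gpa & ~EX_PAGE_MASK;
}

/* Page-aligned base of the address (what `gpa & PAGE_MASK` gives). */
static uint64_t ex_page_base(uint64_t gpa)
{
	return gpa & EX_PAGE_MASK;
}
```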



[kvm-devel] [PATCH 01/10] IOQ: Adding basic definitions for IO-Queue logic

2007-08-16 Thread Gregory Haskins

IOQ is a generic shared-memory-queue mechanism that happens to be friendly
to virtualization boundaries.  Note that it is not virtualization specific
due to its flexible transport layer.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/ioq.h |  176 +++
 lib/Kconfig |   11 ++
 lib/Makefile|1 
 lib/ioq.c   |  228 +++
 4 files changed, 416 insertions(+), 0 deletions(-)

diff --git a/include/linux/ioq.h b/include/linux/ioq.h
new file mode 100644
index 000..d3a18a1
--- /dev/null
+++ b/include/linux/ioq.h
@@ -0,0 +1,176 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * IOQ is a generic shared-memory-queue mechanism that happens to be friendly
+ * to virtualization boundaries. It can be used in a variety of ways, though
+ * its intended purpose is to become the low-level communication path for
+ * paravirtualized drivers.  Note that it is not virtualization specific
+ * due to its flexible signaling layer.
+ *
+ * The following is a list of key design points:
+ *
+ * #) All shared-memory is always allocated on explicitly one side of the
+ *link.  This typically would be the guest side in a VM/VMM scenario.
+ * #) The code has the concept of "north" and "south" where north denotes the
+ *memory-owner side (e.g. guest).
+ * #) An IOQ is "created" on the north side (which generates a unique ID), and
+ *is "connected" on the remote side via its ID.  This facilitates call-path
+ *setup in a manner that is friendly across VM/VMM boundaries.
+ * #) An IOQ is manipulated using an iterator idiom.
+ * #) An "IOQ Manager" abstraction handles the translation between two
+ *endpoints. E.g. allocating "north" memory, signaling, translating
+ *addresses (e.g. GPA to PA)
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IOQ_H
+#define _LINUX_IOQ_H
+
+#include 
+#include 
+#include 
+
+struct ioq_mgr;
+
+/*
+ *-
+ * The following structures represent data that is shared across boundaries
+ * which may be quite disparate from one another (e.g. Windows vs Linux,
+ * 32 vs 64 bit, etc).  Therefore, care has been taken to make sure they
+ * present data in a manner that is independent of the environment.
+ *---
+ */
+typedef u64 ioq_id_t;
+
+struct ioq_ring_desc {
+   u64 cookie; /* for arbitrary use by north-side */
+   u64 ptr;
+   u64 len;
+   u64 alen;
+   u8  valid;
+   u8  sown; /* South owned = 1, North owned = 0 */
+};
+
+#define IOQ_RING_MAGIC 0x47fa2fe4
+#define IOQ_RING_VER   1
+
+struct ioq_ring_idx {
+   u32 head;/* 0 based index to head of ptr array */
+   u32 tail;/* 0 based index to tail of ptr array */
+   u8  full;
+};
+
+struct ioq_irq {
+   u8  enabled;
+   u8  pending;
+};
+
+enum ioq_locality {
+   ioq_locality_north,
+   ioq_locality_south,
+};
+
+struct ioq_ring_head {
+   u32 magic;
+   u32 ver;
+   ioq_id_tid;
+   u32 count;
+   u64 ptr; /* ptr to array of ioq_ring_desc[count] */
+   struct ioq_ring_idx idx[2];
+   struct ioq_irq  irq[2];
+   u8  padding[16];
+};
+
+/* --- END SHARED STRUCTURES --- */
+
+enum ioq_idx_type {
+   ioq_idxtype_valid,
+   ioq_idxtype_inuse,
+   ioq_idxtype_invalid,
+};
+
+enum ioq_seek_type {
+   ioq_seek_tail,
+   ioq_seek_next,
+   ioq_seek_head,
+   ioq_seek_set
+};
+
+struct ioq_iterator {
+   struct ioq*ioq;
+   struct ioq_ring_idx   *idx;
+   u32pos;
+   struct ioq_ring_desc  *desc;
+   intupdate;
+};
+
+int  ioq_iter_seek(struct ioq_iterator *iter, enum ioq_seek_type type,
+  long offset, int flags);
+int  ioq_iter_push(struct ioq_iterator *iter, int flags);
+int  ioq_iter_pop(struct ioq_iterator *iter,  int flags);
+
+struct ioq_notifier {
+

[kvm-devel] [PATCH 06/10] KVM: Add a guest side driver for IOQ

2007-08-16 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig   |   28 +++
 drivers/kvm/Makefile  |3 
 drivers/kvm/ioq.h |   39 +
 drivers/kvm/ioq_guest.c   |  195 +++
 drivers/kvm/pvbus.h   |   63 +++
 drivers/kvm/pvbus_guest.c |  382 +
 include/linux/kvm.h   |4 
 7 files changed, 706 insertions(+), 8 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 22d0eb4..aca79d1 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,16 +47,32 @@ config KVM_BALLOON
  The driver inflate/deflate guest physical memory on demand.
  This ability provides memory over commit for the host
 
-config KVM_NET
-   tristate "Para virtual network device"
-   depends on KVM
-   ---help---
- Provides support for guest paravirtualization networking
-
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
---help---
  Provides support for host paravirtualization networking
 
+config KVM_GUEST
+   bool "KVM Guest support"
+   depends on X86
+   default y
+
+config KVM_PVBUS_GUEST
+   tristate "Paravirtualized Bus (PVBUS) support"
+   depends on KVM_GUEST
+   select IOQ
+   select PVBUS
+   ---help---
+PVBUS is an infrastructure for generic PV drivers to take advantage
+of an underlying hypervisor without having to understand the details
+   of the hypervisor itself.  You only need this option if you plan to
+   run this kernel as a KVM guest.
+
+config KVM_NET
+   tristate "Para virtual network device"
+   depends on KVM && KVM_GUEST
+   ---help---
+ Provides support for guest paravirtualization networking
+
 endif # VIRTUALIZATION
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 92600d8..c6a59bb 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -14,4 +14,5 @@ kvm-net-objs = kvm_net.o
 obj-$(CONFIG_KVM_NET) += kvm-net.o
 kvm-net-host-objs = kvm_net_host.o
 obj-$(CONFIG_KVM_NET_HOST) += kvm_net_host.o
-
+kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
+obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
new file mode 100644
index 000..7e955f1
--- /dev/null
+++ b/drivers/kvm/ioq.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _KVM_IOQ_H_
+#define _KVM_IOQ_H_
+
+#include 
+
+#define IOQHC_REGISTER   1
+#define IOQHC_UNREGISTER  2
+#define IOQHC_SIGNAL 3
+
+struct ioq_register {
+   ioq_id_t id;
+   u32  irq;
+   u64  ring;
+};
+
+
+#endif /* _KVM_IOQ_H_ */
diff --git a/drivers/kvm/ioq_guest.c b/drivers/kvm/ioq_guest.c
new file mode 100644
index 000..068aeb1
--- /dev/null
+++ b/drivers/kvm/ioq_guest.c
@@ -0,0 +1,195 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmguest_ioq {
+   struct ioqioq;
+   int   irq;
+};
+
+struct kvmguest_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmguest_ioq, ioq);
+}
+
+static int ioq_hypercall(unsigned long nr, void *data)
+{
+   return hyperca

[kvm-devel] [PATCH 08/10] KVM: Add support for IOQ

2007-08-16 Thread Gregory Haskins

IOQ is a shared-memory-queue interface for implementing PV driver
communication.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig|5 +
 drivers/kvm/Makefile   |3 
 drivers/kvm/ioq.h  |   12 +-
 drivers/kvm/ioq_host.c |  365 
 drivers/kvm/kvm.h  |5 +
 drivers/kvm/kvm_main.c |3 
 include/linux/kvm.h|1 
 7 files changed, 393 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index aca79d1..d9def33 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,6 +47,11 @@ config KVM_BALLOON
  The driver inflate/deflate guest physical memory on demand.
  This ability provides memory over commit for the host
 
+config KVM_IOQ_HOST
+   boolean "Add IOQ support to KVM"
+   depends on KVM
+   select IOQ
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c6a59bb..2095061 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -4,6 +4,9 @@
 EXTRA_CFLAGS :=
 
 kvm-objs := kvm_main.o mmu.o x86_emulate.o
+ifeq ($(CONFIG_KVM_IOQ_HOST),y)
+kvm-objs += ioq_host.o
+endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
index 7e955f1..b942113 100644
--- a/drivers/kvm/ioq.h
+++ b/drivers/kvm/ioq.h
@@ -25,7 +25,17 @@
 
 #include 
 
-#define IOQHC_REGISTER   1
+struct kvm;
+
+#ifdef CONFIG_KVM_IOQ_HOST
+int kvmhost_ioqmgr_init(struct kvm *kvm);
+int kvmhost_ioqmgr_module_init(void);
+#else
+#define kvmhost_ioqmgr_init(kvm) {}
+#define kvmhost_ioqmgr_module_init() {}
+#endif
+
+#define IOQHC_REGISTER1
 #define IOQHC_UNREGISTER  2
 #define IOQHC_SIGNAL 3
 
diff --git a/drivers/kvm/ioq_host.c b/drivers/kvm/ioq_host.c
new file mode 100644
index 000..413f103
--- /dev/null
+++ b/drivers/kvm/ioq_host.c
@@ -0,0 +1,365 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmhost_ioq {
+   struct ioqioq;
+   struct rb_nodenode;
+   atomic_t  refcnt;
+   struct kvm_vcpu  *vcpu;
+   int   irq;
+};
+
+struct kvmhost_map {
+   spinlock_t lock;
+   struct rb_root root;
+};
+
+struct kvmhost_ioq_mgr {
+   struct ioq_mgr  mgr;
+   struct kvm *kvm;
+   struct kvmhost_map  map;
+};
+
+struct kvmhost_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmhost_ioq, ioq);
+}
+
+struct kvmhost_ioq_mgr* to_mgr(struct ioq_mgr *mgr)
+{
+   return container_of(mgr, struct kvmhost_ioq_mgr, mgr);
+}
+
+/*
+ * --
+ * rb map management
+ * --
+ */
+
+static void kvmhost_map_init(struct kvmhost_map *map)
+{
+   spin_lock_init(&map->lock);
+   map->root = RB_ROOT;
+}
+
+static int kvmhost_map_register(struct kvmhost_map *map,
+   struct kvmhost_ioq *ioq)
+{
+   int ret = 0;
+   struct rb_root *root;
+   struct rb_node **new, *parent = NULL;
+
+   spin_lock(&map->lock);
+
+   root = &map->root;
+   new  = &(root->rb_node);
+
+   /* Figure out where to put new node */
+   while (*new) {
+   struct kvmhost_ioq *this;
+
+   this   = container_of(*new, struct kvmhost_ioq, node);
+   parent = *new;
+
+   if (ioq->ioq.id < this->ioq.id)
+   new = &((*new)->rb_left);
+   else if (ioq->ioq.id > this->ioq.id)
+   new = &((*new)->rb_right);
+   else {
+   ret = -EEXIST;
+   break;
+   }
+   }
+
+   if (!ret) {
+   /* Add new node and rebalance tree. */
+   rb_link_node(&ioq->node, parent, new);
+   rb_insert_color(&ioq->node, r

[kvm-devel] [PATCH 04/10] IOQNET: Add a test harness infrastructure to IOQNET

2007-08-16 Thread Gregory Haskins

We can add a IOQNET loop-back device and register it with the PVBUS to test
many aspects of the system (IOQ, PVBUS, and IOQNET itself).

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig   |   10 +
 drivers/net/ioqnet/Makefile   |3 
 drivers/net/ioqnet/loopback.c |  502 +
 3 files changed, 515 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 7ee7454..426947d 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2957,6 +2957,16 @@ config IOQNET_DEBUG
depends on IOQNET
default n
 
+config IOQNET_LOOPBACK
+   tristate "IOQNET loopback device test harness"
+   depends on IOQNET
+   default n
+   ---help---
+   This will install a special PVBUS device that implements two IOQNET
+   devices.  The devices are, of course, linked to one another forming a
+   loopback mechanism.  This allows many subsystems to be tested: IOQ,
+   PVBUS, and IOQNET itself.  If unsure, say N.
+
 endif #NETDEVICES
 
 config NETPOLL
diff --git a/drivers/net/ioqnet/Makefile b/drivers/net/ioqnet/Makefile
index d7020ee..7d2d156 100644
--- a/drivers/net/ioqnet/Makefile
+++ b/drivers/net/ioqnet/Makefile
@@ -4,8 +4,11 @@
 
 ioqnet-objs = driver.o
 obj-$(CONFIG_IOQNET) += ioqnet.o
+ioqnet-loopback-objs = loopback.o
+obj-$(CONFIG_IOQNET_LOOPBACK) += ioqnet-loopback.o
 
 
 ifeq ($(CONFIG_IOQNET_DEBUG),y)
 EXTRA_CFLAGS += -DIOQNET_DEBUG
 endif
+
diff --git a/drivers/net/ioqnet/loopback.c b/drivers/net/ioqnet/loopback.c
new file mode 100644
index 000..0e36b43
--- /dev/null
+++ b/drivers/net/ioqnet/loopback.c
@@ -0,0 +1,502 @@
+/*
+ * ioqnet test harness
+ *
+ * Copyright (C) 2007 Novell, Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#ifndef ETH_ALEN
+#define ETH_ALEN 6
+#endif
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+/*
+ * -
+ * First we must create an IOQ implementation to use while under test
+ * since these operations will all be local to the same host
+ * -
+ */
+
+struct ioqnet_lb_ioq {
+   struct ioq   ioq;
+   struct ioqnet_lb_ioq*peer;
+   struct tasklet_structtask;
+};
+
+struct ioqnet_lb_ioqmgr {
+   struct ioq_mgr  mgr;
+
+   /*
+* Since this is just a test harness, we know ahead of time that
+* we aren't going to need more than a handful of IOQs.  So to keep
+* lookups simple we will simply create a static array of them
+*/
+   struct ioqnet_lb_ioq ioqs[8];
+   int pos;
+};
+
+static struct ioqnet_lb_ioqmgr lb_ioqmgr;
+
+struct ioqnet_lb_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct ioqnet_lb_ioq, ioq);
+}
+
+struct ioqnet_lb_ioqmgr* to_mgr(struct ioq_mgr *mgr)
+{
+   return container_of(mgr, struct ioqnet_lb_ioqmgr, mgr);
+}
+
+/*
+ * --
+ * ioq implementation
+ * --
+ */
+static void ioqnet_lb_ioq_wake(unsigned long data)
+{
+   struct ioqnet_lb_ioq *_ioq = (struct ioqnet_lb_ioq*)data;
+
+   if (_ioq->peer)
+   ioq_wakeup(&_ioq->peer->ioq);
+}
+
+static int ioqnet_lb_ioq_signal(struct ioq *ioq)
+{
+   struct ioqnet_lb_ioq *_ioq = to_ioq(ioq);
+
+   if (_ioq->peer)
+   tasklet_schedule(&_ioq->task);
+
+   return 0;
+}
+
+static void ioqnet_lb_ioq_destroy(struct ioq *ioq)
+{
+   struct ioqnet_lb_ioq *_ioq = to_ioq(ioq);
+
+   if (_ioq->peer) {
+   _ioq->peer->peer = NULL;
+   _ioq->peer   = NULL;
+   }
+
+   if (_ioq->ioq.locale == ioq_locality_north) {
+   kfree(_ioq->ioq.ring);
+   kfree(_ioq->ioq.head_desc);
+   } else
+   kfree(_ioq);
+}
+
+/*
+ * --
+ * ioqmgr implementa

[kvm-devel] [PATCH 03/10] IOQ: Add an IOQ network driver

2007-08-16 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig |   10 +
 drivers/net/Makefile|2 
 drivers/net/ioqnet/Makefile |   11 +
 drivers/net/ioqnet/driver.c |  658 +++
 include/linux/ioqnet.h  |   44 +++
 5 files changed, 725 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index fb99cd4..7ee7454 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2947,6 +2947,16 @@ config NETCONSOLE
If you want to log kernel messages over the network, enable this.
See  for details.
 
+config IOQNET
+   tristate "IOQNET (IOQ based paravirtualized network driver)"
+   select IOQ
+   select PVBUS
+
+config IOQNET_DEBUG
+   bool "IOQNET debugging"
+   depends on IOQNET
+   default n
+
 endif #NETDEVICES
 
 config NETPOLL
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a77affa..4c8a918 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -224,6 +224,8 @@ obj-$(CONFIG_ENP2611_MSF_NET) += ixp2000/
 
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
 
+obj-$(CONFIG_IOQNET) += ioqnet/
+
 obj-$(CONFIG_FS_ENET) += fs_enet/
 
 obj-$(CONFIG_NETXEN_NIC) += netxen/
diff --git a/drivers/net/ioqnet/Makefile b/drivers/net/ioqnet/Makefile
new file mode 100644
index 000..d7020ee
--- /dev/null
+++ b/drivers/net/ioqnet/Makefile
@@ -0,0 +1,11 @@
+#
+# Makefile for the IOQNET ethernet driver
+#
+
+ioqnet-objs = driver.o
+obj-$(CONFIG_IOQNET) += ioqnet.o
+
+
+ifeq ($(CONFIG_IOQNET_DEBUG),y)
+EXTRA_CFLAGS += -DIOQNET_DEBUG
+endif
diff --git a/drivers/net/ioqnet/driver.c b/drivers/net/ioqnet/driver.c
new file mode 100644
index 000..8352029
--- /dev/null
+++ b/drivers/net/ioqnet/driver.c
@@ -0,0 +1,658 @@
+/*
+ * ioqnet - A paravirtualized network device based on the IOQ interface
+ *
+ * Copyright (C) 2007 Novell, Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * Derived from the SNULL example from the book "Linux Device
+ * Drivers" by Alessandro Rubini and Jonathan Corbet, published
+ * by O'Reilly & Associates.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include  /* printk() */
+#include  /* kmalloc() */
+#include   /* error codes */
+#include   /* size_t */
+#include  /* mark_bh */
+
+#include 
+#include/* struct device, and other headers */
+#include  /* eth_type_trans */
+#include   /* struct iphdr */
+#include  /* struct tcphdr */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+#define RX_RINGLEN 64
+#define TX_RINGLEN 64
+#define TX_PTRS_PER_DESC 64
+
+struct ioqnet_queue {
+   struct ioq  *queue;
+   struct ioq_notifier  notifier;
+};
+
+struct ioqnet_tx_desc {
+   struct sk_buff  *skb;
+   struct ioqnet_tx_ptr data[TX_PTRS_PER_DESC];
+};
+
+struct ioqnet_priv {
+   spinlock_t   lock;
+   struct net_device   *dev;
+   struct pvbus_device *pdev;
+   struct net_device_stats  stats;
+   struct ioqnet_queue  rxq;
+   struct ioqnet_queue  txq;
+   struct tasklet_struct txtask;
+};
+
+static int ioqnet_queue_init(struct ioqnet_priv *priv,
+struct ioqnet_queue *q,
+size_t ringsize,
+void (*func)(struct ioq_notifier*))
+{
+   int ret = priv->pdev->createqueue(priv->pdev, &q->queue, ringsize, 0);
+   if (ret < 0)
+      return ret;
+
+   q->notifier.signal = func;
+   q->queue->notifier = &q->notifier;
+
+   return 0;
+}
+
+/* Perform a hypercall to register/connect our queues */
+static int ioqnet_connect(struct ioqnet_priv *priv)
+{
+   struct ioqnet_connect data = {
+   .rxq = priv->rxq.queue->id,
+   .txq = priv->txq.queue->id,
+   };
+
+   return priv->pdev->call(priv->pdev, IOQNET_CONNECT,
+   &data, sizeof(data), 0);
+}
+
+static int ioqnet_disconnect(struct ioqnet_priv *priv)
+{
+   return priv->pdev->call(priv->pdev, IOQNET_DISCONNECT, NULL, 0, 0);
+}
+
+/* Perform a hypercall to get the assigned MAC addr */
+static int ioqnet_query_mac(struct ioqnet_priv *priv)
+{
+   return priv->pdev->call(priv->pdev,
+   IOQNET_QUERY_MAC,
+   priv->dev->dev_addr,
+   ETH_ALEN, 0);
+}
+
+
+/*
+ * Enable and disable receive interrupts.
+ */
+static void ioqnet_rx_ints(struct net_device *dev

[kvm-devel] [PATCH 02/10] PARAVIRTUALIZATION: Add support for a bus abstraction

2007-08-16 Thread Gregory Haskins

PV usually comes in two flavors:  device PV, and "core" PV.  The existing PV
ops deal in terms of the latter.  However, it would be useful to add an
interface for a virtual bus with provisions for discovery/configuration of
backend PV devices.  Often times it is desirable to run PV devices even if the
entire core is not operating with PVOPS.  Therefore, we introduce a separate
interface to deal with the devices.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 arch/i386/Kconfig|2 +
 arch/x86_64/Kconfig  |2 +
 drivers/Makefile |1 
 drivers/pvbus/Kconfig|7 ++
 drivers/pvbus/Makefile   |6 ++
 drivers/pvbus/pvbus-driver.c |  120 ++
 include/linux/pvbus.h|   59 +
 7 files changed, 197 insertions(+), 0 deletions(-)

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index c2d54b8..acf4506 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -1125,6 +1125,8 @@ source "drivers/pci/pcie/Kconfig"
 
 source "drivers/pci/Kconfig"
 
+source "drivers/pvbus/Kconfig"
+
 config ISA_DMA_API
bool
default y
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 145bb82..17d6c78 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -721,6 +721,8 @@ source "drivers/pcmcia/Kconfig"
 
 source "drivers/pci/hotplug/Kconfig"
 
+source "drivers/pvbus/Kconfig"
+
 endmenu
 
 
diff --git a/drivers/Makefile b/drivers/Makefile
index adad2f3..179e669 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -81,3 +81,4 @@ obj-$(CONFIG_GENERIC_TIME)+= clocksource/
 obj-$(CONFIG_DMA_ENGINE)   += dma/
 obj-$(CONFIG_HID)  += hid/
 obj-$(CONFIG_PPC_PS3)  += ps3/
+obj-$(CONFIG_PVBUS)+= pvbus/
diff --git a/drivers/pvbus/Kconfig b/drivers/pvbus/Kconfig
new file mode 100644
index 000..1ca094d
--- /dev/null
+++ b/drivers/pvbus/Kconfig
@@ -0,0 +1,7 @@
+#
+# PVBUS configuration
+#
+
+config PVBUS
+   bool "Paravirtual Bus"
+
diff --git a/drivers/pvbus/Makefile b/drivers/pvbus/Makefile
new file mode 100644
index 000..0df2c2e
--- /dev/null
+++ b/drivers/pvbus/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for the PVBUS bus specific drivers.
+#
+
+obj-y += pvbus-driver.o
+
diff --git a/drivers/pvbus/pvbus-driver.c b/drivers/pvbus/pvbus-driver.c
new file mode 100644
index 000..3f6687d
--- /dev/null
+++ b/drivers/pvbus/pvbus-driver.c
@@ -0,0 +1,120 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Paravirtualized-Bus - This is a generic infrastructure for virtual devices
+ * and their drivers.  It is inspired by Rusty Russell's lguest_bus, but with
+ * the key difference that the bus is decoupled from the underlying hypervisor
+ * in both name and function.
+ *
+ * Instead, it is intended that external hypervisor support will register
+ * arbitrary devices.  Generic drivers can then monitor this bus for
+ * compatible devices regardless of the hypervisor implementation. 
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+
+#define PVBUS_NAME "pvbus"
+
+/*
+ * This function is invoked whenever a new driver and/or device is added
+ * to check if there is a match
+ */
+static int pvbus_dev_match(struct device *_dev, struct device_driver *_drv)
+{
+   struct pvbus_device *dev = container_of(_dev,struct pvbus_device,dev);
+   struct pvbus_driver *drv = container_of(_drv,struct pvbus_driver,drv);
+
+   return !strcmp(dev->name, drv->name);
+}
+
+/*
+ * This function is invoked after the bus infrastructure has already made a
+ * match.  The device will contain a reference to the paired driver which
+ * we will extract.
+ */
+static int pvbus_dev_probe(struct device *_dev)
+{
+   int ret = 0;
+   struct pvbus_device*dev = container_of(_dev,struct pvbus_device, dev);
+   struct pvbus_driver*drv = container_of(_dev->driver,
+  struct pvbus_driver, drv);
+
+   if (drv->probe)
+      ret = drv->probe(dev);
+
+   return ret;
+}
+
+static struct bus_type pv_bus = {
+   .name   = PVBUS_NAME,
+   .match  = pvbus_dev_match,
+};
+
+static stru

[kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-16 Thread Gregory Haskins

Here is the v3 release of the patch series for a generalized PV-IO
infrastructure.  It has v2 plus the following changes:

1) The big change is that PVBUS is now based on the bus/device_register
   APIs.  The code is inspired by the lguest_bus except it has been decoupled
   from the hypervisor.  Also, the "device" object provides an actual
   interface to the device which yields a tight-coupling to the underlying
   device provider.  This offers some interesting features as evident in patch
   #4 where we register some in-host virtual devices for the purposes of
   testing the IOQ/PVBUS/IOQNET infrastructure.

2) The KVM specific portions of the code have been adapted to the new model
   and hotplug support has been added.
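
For readers who have not looked at patch #2 yet, here is a rough userspace
sketch of the bus idea in item 1: devices and drivers are paired purely by
name, and the driver's probe hook runs after the match.  Every sketch_* name
below is a hypothetical stand-in, not the kernel's struct device or the
pvbus API itself; only the strcmp-based match rule is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_driver;

/* Hypothetical userspace stand-in for a bus device. */
struct sketch_device {
	const char *name;
	const struct sketch_driver *driver; /* set once a driver is bound */
};

/* Hypothetical userspace stand-in for a bus driver. */
struct sketch_driver {
	const char *name;
	int (*probe)(struct sketch_device *dev); /* optional, 0 on success */
};

/* Same rule as pvbus_dev_match in the patch: the names must be identical. */
static int sketch_match(const struct sketch_device *dev,
			const struct sketch_driver *drv)
{
	return !strcmp(dev->name, drv->name);
}

/* Example probe hook that always accepts the device. */
static int sketch_probe_ok(struct sketch_device *dev)
{
	(void)dev;
	return 0;
}

/*
 * Try each driver in turn; bind the first one that matches by name and
 * whose probe hook (if any) succeeds.  Returns 1 if bound, 0 otherwise.
 */
static int sketch_bind(struct sketch_device *dev,
		       const struct sketch_driver *drivers, size_t ndrivers)
{
	size_t i;

	assert(dev != NULL && drivers != NULL);
	for (i = 0; i < ndrivers; i++) {
		if (!sketch_match(dev, &drivers[i]))
			continue;
		if (drivers[i].probe && drivers[i].probe(dev) != 0)
			continue;
		dev->driver = &drivers[i];
		return 1;
	}
	return 0;
}
```

Since the match is by name only, the driver side never needs to know which
hypervisor registered the device, which is the decoupling item 1 refers to.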

The test harness in #4 has helped to find some bugs.  Some more remain that I
will fix after this mail goes out.  There is also some work to do with the
"shutdown" portion of the code (e.g. you will see a bus panic if you try to
rmmod a pvbus device/driver right now).

Note that the first four patches are not specific to KVM.  Once we get a
better picture of what we can use, I will obviously start cross posting those
particular patches to LKML so they can be reviewed by a broader community.
For now I will keep it KVM since their status is in flux.

I am looking forward to seeing Dor's virtio based patch series.  Please send
it to me ASAP, even if it's not complete or even compiling.

Regards,
-Greg

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Re: [kvm-devel] [RFC PATCH 0/9] PV device infrastructure v2

2007-08-15 Thread Gregory Haskins

On Wed, 2007-08-15 at 15:37 -0700, Dor Laor wrote:

> 
> If you'll be quick enough you can rip the lguest_bus into a very light
> weight
> virtio_bus. Please keep it thin as possible, Rusty's code is 217 lines
> long, half
> of it comments. I'm planing to have a flexible use of this bus while one
> can either choose virtio_bus or pci bus.

Already done ;)

It's not quite polished yet.  I will send it tomorrow.  Currently it looks
like this:

 arch/x86_64/Kconfig  |2
 drivers/Makefile |1
 drivers/pvbus/Kconfig|7 +++
 drivers/pvbus/Makefile   |6 ++
 drivers/pvbus/pvbus-driver.c |   89 +++
 include/linux/pvbus.h|   55 ++
 6 files changed, 160 insertions(+)

Of course it's still IOQ oriented currently, but we can reconcile that
when we get together.

-Greg


Re: [kvm-devel] a few may be wiki question

2007-08-15 Thread Gregory Haskins

On Wed, 2007-08-15 at 15:52 +0200, Farkas Levente wrote:
> Gregory Haskins wrote:
> > 
> > On thing that is interesting about this (to me) is that, in a way it
> > kind of is a "poor mans" swap for the guests memory on the host.  E.g.
> > you could give your guests a really small amount of "physical" ram (say,
> > I dunno, 64MB/ea) and a large swap file (say several gigs).  Since the
> > disk-io emanating from the guest would likely be mediated by the hosts
> > buffercache, its kind of like you just gave the guest a large chunk of
> > (indirectly accessible) pageable ram.
> 
> i just imagine th opposite setup. suppose i've got 4 guest and 4gb ram.
> most of the time guest are idle they just used as a compile server for
> different platforms (centos, mandrake, fedora, windows etc). i can
> schedule the compiles (ie. 1:00 guest1, 2:00 guest 2 etc). in this case
> i gives the host 1-2gb and all guest 2-3gb logical ram. which is in sum
> 9-14gb ram even i have only 4gb physical ram.
> _BUT_ as i know that the guests are usually idle _and_ they have high
> resource requirement at different time, probably all of them can use
> 2-3gb real ram as memory. and in this setup if all other time most of
> the guest's ram are swapped out it's not really bother me since they are
> idle.
> or did is something misunderstood?

I think you are just misunderstanding me.  We are saying the same thing
(I think) :)

What it sounds like you are talking about is oversubscribing your ram.
IIUC this is not possible today unless you employ a guest-swap scheme
like I mentioned.  Until the balloon/swap support is in, the guest "ram"
allocation aggregated across all VMs cannot exceed host-physical ram (in
your case, 4GB).  In fact, it has to be a little less of course so the
host still has some to play with ;)
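
To put numbers on that constraint, here is a small sketch of the check.  The
function name and the idea of a fixed "host reserve" are my own assumptions
for illustration; nothing like this exists in kvm itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if the guests' combined "ram" plus a reserve for the host
 * fits into host-physical ram, 0 if the setup is oversubscribed.
 * All sizes are in megabytes.
 */
static int sketch_fits_in_host(const unsigned *guest_mb, size_t nguests,
			       unsigned host_reserve_mb, unsigned host_total_mb)
{
	unsigned long total = host_reserve_mb;
	size_t i;

	assert(guest_mb != NULL || nguests == 0);
	for (i = 0; i < nguests; i++)
		total += guest_mb[i];

	return total <= host_total_mb;
}
```

With 4096MB of host ram, four guests at 2048MB each plus 1024MB for the host
add up to 9216MB, so the check fails.  Four guests at 768MB each plus the
same reserve come to exactly 4096MB, which fits.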


Re: [kvm-devel] [RFC PATCH 0/9] PV device infrastructure v2

2007-08-15 Thread Gregory Haskins

On Wed, 2007-08-15 at 00:13 -0400, Gregory Haskins wrote:
> On Wed, 2007-08-15 at 06:58 +0300, Avi Kivity wrote:

> > since it wants to be hypervisor agnostic, it cannot specify an ABI (as
> > some already have ABIs, for example Xen).
> 
> I see, and that is a good point.  By only being an API, virtio can work
> over XenBus or any other ABI it wants.  IOQ is an API *and* an ABI.
> While this doesnt preclude it from ever working on Xen, it does imply
> that you would need to add the IOQ ABI to the Xen infrastructure if you
> wanted to use it (at least, if you wanted to use it directly instead of
> doing some kind of funky queue-in-queue with IOQ on XenBus..yuk). 

I think I misspoke when I said this (I was up way too late for coherent
thought ;)

While what you said is true (virtio = API, IOQ = API+ABI) note that I
believe both can be considered hypervisor agnostic.  (That was a primary
design goal of mine).  All the hypervisor would need to do is write an
ioq_mgr layer.  Xen for instance, could use either hypercalls/interrupts
directly like I do for the KVM implementation.  Or it could use XenBus
as a signaling transport.
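
A rough sketch of what I mean, in plain userspace C.  The ring layout is
fixed (that is the part playing the ABI role), while the signaling is a hook
that each hypervisor port fills in.  All of the sketch_* names are
hypothetical and are not the actual ioq/ioq_mgr structures from the series;
the two "transports" here only count how often they were kicked.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SLOTS 4

/* Hypothetical hooks a hypervisor port would provide. */
struct sketch_transport {
	const char *name;
	int (*signal)(void *ctx); /* kick the other side, 0 on success */
	void *ctx;
};

/* Fixed shared-memory layout, identical no matter who signals. */
struct sketch_ring {
	unsigned head; /* next slot to fill */
	unsigned tail; /* next slot to drain */
	unsigned slots[SKETCH_SLOTS];
};

/* Queue code knows the ring layout but not how signals get delivered. */
static int sketch_post(struct sketch_ring *ring,
		       const struct sketch_transport *t, unsigned value)
{
	assert(ring != NULL && t != NULL && t->signal != NULL);

	if (ring->head - ring->tail == SKETCH_SLOTS)
		return -1; /* ring is full */

	ring->slots[ring->head % SKETCH_SLOTS] = value;
	ring->head++;

	return t->signal(t->ctx);
}

/* Stand-in for a hypercall/interrupt based transport. */
static int sketch_hypercall_signal(void *ctx)
{
	++*(int *)ctx;
	return 0;
}

/* Stand-in for a XenBus based transport. */
static int sketch_xenbus_signal(void *ctx)
{
	++*(int *)ctx;
	return 0;
}
```

The same sketch_post() works with either transport, which is the sense in
which the queue code stays hypervisor agnostic.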

I am only bringing this up as a correction to my statement, not to try
to sway someone in some kind of IOQ vs VirtIO debate.  :)

Regards,
-Greg


Re: [kvm-devel] [RFC PATCH 0/9] PV device infrastructure v2

2007-08-15 Thread Gregory Haskins

On Wed, 2007-08-15 at 00:52 -0700, Dor Laor wrote:

> If the above is followed, any enhancement will be appriciated.

Since I am close, I will probably make at least one more v3 drop "as is"
with the new lguest_bus inspired pvbus (with hotplug, etc).  From there,
we can (virtually) get together and figure out what can be used
directly, what can be used in spirit, and what should be thrown away.

Dor, I know you mentioned you wanted to go stepwise, but note that I
*did* break it up into a series.  I just delivered it all at once ;)

But the fact is, I really need to do this stuff in-kernel so I am very
much interested in getting the solution designed with that in mind from
the get-go.  So note that you will continue to see a series that is
oriented to support both your userspace approach and my current
in-kernel direction.  (Note that my current series doesn't deal with the
userspace side yet).

Please point me to your current branch when it is available for me to
look at.  I'm looking forward to helping out here!

Regards,
-Greg   



Re: [kvm-devel] a few may be wiki question

2007-08-15 Thread Gregory Haskins

On Wed, 2007-08-15 at 11:07 +0300, Avi Kivity wrote:
> Farkas Levente wrote:
> > this rise another question if swapping will be used the it moves the
> > guest memory to the guest's swap or the host's swap? if to the host's
> > swap then this implies i should have to allocate large enough swap for
> > the host. ie. even if i give only 256mb to the host still have to gives
> > 6-8GB swap partition to the host.
> >   
> 
> Yes, when swapping is implemented then guest memory will be swapped to 
> the host swap.  Of course, if the guest has its own swap file, then it 
> will swap to the guest's swap independently.

And to clarify: FYI, I believe the guest-swap option should actually
work today.  If you assign a partition as a swap device in the guest it
should use it.  kvm-host will simply see it as disk io as any other.
This is in contrast to the host-swap and/or balloon driver which is
still a work-in-progress.

One thing that is interesting about this (to me) is that, in a way it
kind of is a "poor man's" swap for the guest's memory on the host.  E.g.
you could give your guests a really small amount of "physical" ram (say,
I dunno, 64MB/ea) and a large swap file (say several gigs).  Since the
disk-io emanating from the guest would likely be mediated by the host's
buffercache, it's kind of like you just gave the guest a large chunk of
(indirectly accessible) pageable ram.

OTOH, virtualized disk-io is pretty slow compared to bare-metal io.  As
I said, this is a "poor man's" solution ;)  What it really comes down to
is: does having emulated disk-io to a buffercache offer adequate
performance for your applications running in the guest?  If so, this may
be a good interim solution to get a larger number of guests running on a
given host with "dynamic ram".

Regards,
-Greg


Re: [kvm-devel] [RFC PATCH 0/9] PV device infrastructure v2

2007-08-14 Thread Gregory Haskins

On Wed, 2007-08-15 at 06:58 +0300, Avi Kivity wrote:
>
> 
> Yes.  virtio adapts the driver interface to an API can drive a shared
> memory queue more easily, while you actually provide the queue protocol
> (and ABI).

If I understand what you are saying, it actually jives with something
that struck me as a potentially win-win a while back when I first heard
about virtio.  That is: you can actually implement virtio with ioq.  I
am not sure if that idea has any merit or not, especially given the news
that virtio for KVM is almost done, but I thought I would mention it.

> 
> virtio seems to have more modest goals:
> 
> - make it easier to write guest/host (or guest/guest) transports, but
> not actually provide them
> - limited to guest only (and Linux only)
> - no discovery/hotplug (yet?)

As an aside, the version of the pvbus that I am working on right now
(v3?) adds hotplug capability.  I assume that there is some merit to me
continuing the work at least on the discovery/hotplug front then?

> 
> since it wants to be hypervisor agnostic, it cannot specify an ABI (as
> some already have ABIs, for example Xen).

I see, and that is a good point.  By only being an API, virtio can work
over XenBus or any other ABI it wants.  IOQ is an API *and* an ABI.
While this doesn't preclude it from ever working on Xen, it does imply
that you would need to add the IOQ ABI to the Xen infrastructure if you
wanted to use it (at least, if you wanted to use it directly instead of
doing some kind of funky queue-in-queue with IOQ on XenBus..yuk). 

-Greg


Re: [kvm-devel] [RFC PATCH 0/9] PV device infrastructure v2

2007-08-14 Thread Gregory Haskins

On Wed, 2007-08-15 at 05:50 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > This series incorporates all of v1 plus the following changes based on
> > feedback to date:
> >   
> 
> Are you positioning this as an alternative to virtio?

Absolutely not!  I really just want to see a decent PV-IO solution
working.

When I originally started this project virtio didn't exist.  And even
recently I was under the impression that there was no implementation of
virtio for KVM in existence.  I just learned this morning that Dor is
pretty close to getting something working, but that was news to me.

> If so, be aware that virtio is (a) mostly done (b) very well done.

I would very much like to help make virtio work, which is really where I
was going with this.  My design is a little bit different so I was
submitting it in case there was any ideas worth salvaging to be picked
up by the official project.  (I tried to explain this in the v1
announcement so I apologize if anyone, particularly Rusty, felt
slighted; it was not my intention)

> 
> Can you describe what you are trying to achieve that virtio doesn't do?
> 

To be perfectly honest, I have never been able to find an implementation
of virtio (I looked and even asked Rusty directly via email but never
found/heard anything back) so I don't know exactly what its capabilities
are.  My impressions from reading Rusty's email proposals are that there
are certainly similarities to the virtio interface and the ioq interface
in concept.  Where they seem to differ is that the concept of the ring
is more exposed in IOQ via the iterator idiom instead of the sg-buffer
idiom.

At the time I first saw the virtio proposals, I couldn't quite wrap my
head around how I could do things like "zero copy" + deferred pointer
reaping, which was a design goal of mine.  So I kept up with the IOQ
design for the interim to at least demonstrate where I was trying to go.
Perhaps it will be a useful innovation and the virtio interface design
will pick up some of my ideas.  Perhaps virtio deals with it already.
Or perhaps no one will think it's a good idea and it gets pushed
to /dev/null ;)  I am not sure what the answer will be.
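
To make "zero copy" + deferred pointer reaping a bit more concrete, here is
a simplified userspace sketch of the idea.  It is not the IOQ iterator API
from the series; the sketch_* names and the single ownership flag are
assumptions I am making purely for illustration.  The producer publishes a
pointer to its buffer rather than a copy, and only reclaims the buffer on a
later pass, once the consumer has handed the descriptor back.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RINGLEN 4

/* One descriptor: points at the caller's buffer instead of copying it. */
struct sketch_desc {
	void *ptr;
	size_t len;
	int owned_by_consumer; /* 1 while the other side may still read ptr */
};

struct sketch_txring {
	struct sketch_desc desc[SKETCH_RINGLEN];
	unsigned head; /* next slot the producer fills */
	unsigned reap; /* oldest slot not yet reclaimed */
};

/* Producer: publish a pointer to the buffer; no payload copy is made. */
static int sketch_tx_post(struct sketch_txring *r, void *buf, size_t len)
{
	struct sketch_desc *d;

	assert(r != NULL && buf != NULL);
	if (r->head - r->reap == SKETCH_RINGLEN)
		return -1; /* every slot is still waiting to be reaped */

	d = &r->desc[r->head % SKETCH_RINGLEN];
	d->ptr = buf;
	d->len = len;
	d->owned_by_consumer = 1;
	r->head++;
	return 0;
}

/* Consumer: hand a descriptor back once it is done with the buffer. */
static void sketch_consume(struct sketch_txring *r, unsigned idx)
{
	r->desc[idx % SKETCH_RINGLEN].owned_by_consumer = 0;
}

/*
 * Producer, some time later: walk forward from the reap point and release
 * finished buffers in order.  Stops at the first descriptor the consumer
 * still owns.  Returns the number of descriptors reclaimed.
 */
static unsigned sketch_tx_reap(struct sketch_txring *r,
			       void (*release)(void *ptr))
{
	unsigned n = 0;

	while (r->reap != r->head) {
		struct sketch_desc *d = &r->desc[r->reap % SKETCH_RINGLEN];

		if (d->owned_by_consumer)
			break;
		if (release)
			release(d->ptr);
		d->ptr = NULL;
		r->reap++;
		n++;
	}
	return n;
}
```

Because reaping is in order, a descriptor that the consumer finished early
still waits behind an older one that is in flight.  That keeps the sketch
simple; a real design may well choose differently.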

But in any case, I was also trying to go beyond the shared-ring
interface design.  For instance, the series also provides:

*) a system for efficiently discovering/communicating with PV backend
devices that was not laden with legacy interfaces like PCI.  (I am in
the process of converting over to use bus_register as Dor suggested).

*) generalization of as much of the shared-memory code as possible so it
could be reused (for instance, both guest and host can use the same IOQ
interface, and the code itself will largely work with any hypervisor, or
even non-hypervisor shared-memory systems, like RDMA/AMP).

Again, perhaps virtio will cover all these areas too.  Without having
seen it I am not really sure where the overlap exists, but perhaps there
will be at least some aspects of my series that are useful.  That is why
I submitted it.  Ideally I can hook up with whomever is working on the
implementation (sounds like Dor?) and we can crank something out
together.  :)

Again, apologies if anyone felt slighted. :(

-Greg


[kvm-devel] [RFC PATCH 7/9] KVM: Add PVBUS support to the KVM host

2007-08-14 Thread Gregory Haskins

PVBUS allows VMM agnostic PV drivers to discover/configure virtual resources

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig  |   10 +
 drivers/kvm/Makefile |3 
 drivers/kvm/kvm.h|4 
 drivers/kvm/kvm_main.c   |4 
 drivers/kvm/pvbus_host.c |  466 ++
 drivers/kvm/pvbus_host.h |   66 +++
 6 files changed, 553 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 0f81af1..cb674bb 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -52,6 +52,16 @@ config KVM_IOQ_HOST
depends on KVM
select IOQ
 
+config KVM_PVBUS_HOST
+   boolean "Paravirtualized Bus (PVBUS) host support"
+   depends on KVM
+   select KVM_IOQ_HOST
+   ---help---
+PVBUS is an infrastructure for generic PV drivers to take advantage
+of an underlying hypervisor without having to understand the details
+   of the hypervisor itself.  You only need this option if you plan to
+   run PVBUS based PV guests in KVM.
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 2095061..8926fa9 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -7,6 +7,9 @@ kvm-objs := kvm_main.o mmu.o x86_emulate.o
 ifeq ($(CONFIG_KVM_IOQ_HOST),y)
 kvm-objs += ioq_host.o
 endif
+ifeq ($(CONFIG_KVM_PVBUS_HOST),y)
+kvm-objs += pvbus_host.o
+endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index c38c84f..8dc9ac3 100755
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vmx.h"
 #include "ioq.h"
@@ -393,6 +394,9 @@ struct kvm {
 #ifdef CONFIG_KVM_IOQ_HOST
struct ioq_mgr *ioqmgr;
 #endif
+#ifdef CONFIG_KVM_PVBUS_HOST
+   struct kvm_pvbus *pvbus;
+#endif
 
 };
 
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index fbffd2f..d35ce8d 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -44,6 +44,7 @@
 
 #include "x86_emulate.h"
 #include "segment_descriptor.h"
+#include "pvbus_host.h"
 
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
@@ -350,6 +351,7 @@ static struct kvm *kvm_create_vm(void)
spin_unlock(&kvm_lock);
}
kvmhost_ioqmgr_init(kvm);
+   kvm_pvbus_init(kvm);
return kvm;
 }
 
@@ -3616,6 +3618,7 @@ static __init int kvm_init(void)
memset(__va(bad_page_address), 0, PAGE_SIZE);
 
kvmhost_ioqmgr_module_init();
+   kvm_pvbus_module_init();
 
return 0;
 
@@ -3637,6 +3640,7 @@ static __exit void kvm_exit(void)
mntput(kvmfs_mnt);
unregister_filesystem(&kvm_fs_type);
kvm_mmu_module_exit();
+   kvm_pvbus_module_exit();
 }
 
 module_init(kvm_init)
diff --git a/drivers/kvm/pvbus_host.c b/drivers/kvm/pvbus_host.c
new file mode 100644
index 000..44d81f3
--- /dev/null
+++ b/drivers/kvm/pvbus_host.c
@@ -0,0 +1,466 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus.h"
+#include "pvbus_host.h"
+#include "kvm.h"
+
+struct pvbus_map {
+   int (*compare)(const void *left, const void *right);
+   const void* (*getkey)(struct rb_node *node);
+
+   spinlock_t lock;
+   struct rb_root root;
+};
+
+struct _pv_devtype {
+   struct kvm_pv_devtype *item;
+   struct rb_node node;
+   struct list_head   devlist;
+};
+
+struct _pv_device {
+   struct kvm_pv_device  *item;
+   struct rb_node node;
+   struct list_head   list;
+   struct _pv_devtype*parent;
+};
+
+static struct pvbus_map pvbus_typemap; /* stores globally available types */
+
+struct kvm_pvbus {
+   spinlock_t   lock;
+   struct pvbus_map typemap; /* stores locally instantiated types */
+   struct pvbus_map devmap;
+};
+
+/*
+ * --
+ * generic rb map manageme

[kvm-devel] [RFC PATCH 9/9] KVM: Add an IOQNET backend driver

2007-08-14 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig   |5 
 drivers/kvm/Makefile  |2 
 drivers/kvm/ioqnet_host.c |  566 +
 3 files changed, 573 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index cb674bb..585581b 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -62,6 +62,11 @@ config KVM_PVBUS_HOST
of the hypervisor itself.  You only need this option if you plan to
run PVBUS based PV guests in KVM.
 
+config KVM_IOQNET
+   tristate "IOQNET host support"
+   depends on KVM
+   select KVM_PVBUS_HOST
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 8926fa9..66e5272 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -22,3 +22,5 @@ kvm-net-host-objs = kvm_net_host.o
 obj-$(CONFIG_KVM_NET_HOST) += kvm_net_host.o
 kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
 obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
+kvm-ioqnet-objs := ioqnet_host.o
+obj-$(CONFIG_KVM_IOQNET) += kvm-ioqnet.o
\ No newline at end of file
diff --git a/drivers/kvm/ioqnet_host.c b/drivers/kvm/ioqnet_host.c
new file mode 100644
index 000..0f4d055
--- /dev/null
+++ b/drivers/kvm/ioqnet_host.c
@@ -0,0 +1,566 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * ioqnet - A paravirtualized network device based on the IOQ interface.
+ *
+ * This module represents the backend driver for an IOQNET driver on the KVM
+ * platform.
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * Derived in part from the SNULL example from the book "Linux Device
+ * Drivers" by Alessandro Rubini and Jonathan Corbet, published
+ * by O'Reilly & Associates.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include  /* printk() */
+#include  /* kmalloc() */
+#include   /* error codes */
+#include   /* size_t */
+#include  /* mark_bh */
+
+#include 
+#include/* struct device, and other headers */
+#include  /* eth_type_trans */
+#include   /* struct iphdr */
+#include  /* struct tcphdr */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus_host.h"
+#include "kvm.h"
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#define IOQNET_NAME "ioqnet"
+
+/*
+ * FIXME: Any "BUG_ON" code that can be triggered by a malicious guest must
+ * be turned into an inject_gp()
+ */
+
+struct ioqnet_queue {
+   struct ioq  *queue;
+   struct ioq_notifier  notifier;
+};
+
+struct ioqnet_priv {
+   spinlock_t   lock;
+   struct kvm  *kvm;
+   struct kvm_pv_device pvdev;
+   struct net_device   *netdev;
+   struct net_device_stats  stats;
+   struct ioqnet_queue  rxq;
+   struct ioqnet_queue  txq;
+   struct tasklet_structtxtask;
+   int  connected;
+   int  opened;
+};
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+/*
+ * Enable and disable receive interrupts.
+ */
+static void ioqnet_rx_ints(struct net_device *dev, int enable)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   struct ioq *ioq = priv->rxq.queue;
+
+   if (priv->connected) {
+   if (enable)
+   ioq_start(ioq, 0);
+   else
+   ioq_stop(ioq, 0);
+   }
+}
+
+/*
+ * Open and close
+ */
+
+int ioqnet_open(struct net_device *dev)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   
+   priv->opened = 1;
+   netif_start_queue(dev);
+   
+   return 0;
+}
+
+int ioqnet_release(struct net_device *dev)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   
+   priv->opened = 0;
+   netif_stop_queue(dev);
+
+   return 0;
+}
+
+/*
+ * Configuration changes (passed on by ifconfig)
+ */

[kvm-devel] [RFC PATCH 4/9] KVM: Add a guest side driver for IOQ

2007-08-14 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig   |   27 +-
 drivers/kvm/Makefile  |3 -
 drivers/kvm/ioq.h |   39 +
 drivers/kvm/ioq_guest.c   |  192 +
 drivers/kvm/pvbus.h   |   41 ++
 drivers/kvm/pvbus_guest.c |  189 
 include/linux/kvm.h   |4 +
 7 files changed, 487 insertions(+), 8 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 22d0eb4..cba03d2 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,16 +47,31 @@ config KVM_BALLOON
  The driver inflate/deflate guest physical memory on demand.
  This ability provides memory over commit for the host
 
-config KVM_NET
-   tristate "Para virtual network device"
-   depends on KVM
-   ---help---
- Provides support for guest paravirtualization networking
-
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
---help---
  Provides support for host paravirtualization networking
 
+config KVM_GUEST
+   bool "KVM Guest support"
+   depends on X86
+   default y
+
+config KVM_PVBUS_GUEST
+   tristate "Paravirtualized Bus (PVBUS) support"
+   depends on KVM_GUEST
+   select IOQ
+   ---help---
+PVBUS is an infrastructure for generic PV drivers to take advantage
+of an underlying hypervisor without having to understand the details
+   of the hypervisor itself.  You only need this option if you plan to
+   run this kernel as a KVM guest.
+
+config KVM_NET
+   tristate "Para virtual network device"
+   depends on KVM && KVM_GUEST
+   ---help---
+ Provides support for guest paravirtualization networking
+
 endif # VIRTUALIZATION
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 92600d8..c6a59bb 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -14,4 +14,5 @@ kvm-net-objs = kvm_net.o
 obj-$(CONFIG_KVM_NET) += kvm-net.o
 kvm-net-host-objs = kvm_net_host.o
 obj-$(CONFIG_KVM_NET_HOST) += kvm_net_host.o
-
+kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
+obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
new file mode 100644
index 000..7e955f1
--- /dev/null
+++ b/drivers/kvm/ioq.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _KVM_IOQ_H_
+#define _KVM_IOQ_H_
+
+#include 
+
+#define IOQHC_REGISTER   1
+#define IOQHC_UNREGISTER  2
+#define IOQHC_SIGNAL 3
+
+struct ioq_register {
+   ioq_id_t id;
+   u32  irq;
+   u64  ring;
+};
+
+
+#endif /* _KVM_IOQ_H_ */
diff --git a/drivers/kvm/ioq_guest.c b/drivers/kvm/ioq_guest.c
new file mode 100644
index 000..f7c88aa
--- /dev/null
+++ b/drivers/kvm/ioq_guest.c
@@ -0,0 +1,192 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmguest_ioq {
+   struct ioqioq;
+   int   irq;
+};
+
+struct kvmguest_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmguest_ioq, ioq);
+}
+
+static int ioq_hypercall(unsigned long nr, void *data)
+{
+   return hyperca

[kvm-devel] [RFC PATCH 5/9] KVM: Add a gpa_to_hva helper function

2007-08-14 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h |1 +
 drivers/kvm/mmu.c |   12 
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 9934f11..05d5be1 100755
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -475,6 +475,7 @@ void vcpu_load(struct kvm_vcpu *vcpu);
 void vcpu_put(struct kvm_vcpu *vcpu);
 
 hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa);
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa);
 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1)
 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB)
 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index e84c599..daaf0d2 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -766,6 +766,18 @@ hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 }
 EXPORT_SYMBOL_GPL(gpa_to_hpa);
 
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa)
+{
+   struct page *page;
+
+   if ((gpa & HPA_ERR_MASK) == 0)
+   return NULL;
+
+   page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
+   return kmap_atomic(page, gpa & PAGE_MASK);
+}
+EXPORT_SYMBOL_GPL(gpa_to_hva);
+
 hpa_t gva_to_hpa(struct kvm_vcpu *vcpu, gva_t gva)
 {
gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, gva);


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

[kvm-devel] [RFC PATCH 8/9] IOQ: Add an IOQ network driver

2007-08-14 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig|4 
 drivers/net/Makefile   |2 
 drivers/net/ioqnet.c   |  631 
 include/linux/ioqnet.h |   42 +++
 4 files changed, 679 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index fb99cd4..eb46c07 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2947,6 +2947,10 @@ config NETCONSOLE
If you want to log kernel messages over the network, enable this.
See  for details.
 
+config IOQNET
+   tristate "IOQ based paravirtualized network driver"
+   select IOQ
+
 endif #NETDEVICES
 
 config NETPOLL
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a77affa..f1b4916 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -224,6 +224,8 @@ obj-$(CONFIG_ENP2611_MSF_NET) += ixp2000/
 
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
 
+obj-$(CONFIG_IOQNET) += ioqnet.o
+
 obj-$(CONFIG_FS_ENET) += fs_enet/
 
 obj-$(CONFIG_NETXEN_NIC) += netxen/
diff --git a/drivers/net/ioqnet.c b/drivers/net/ioqnet.c
new file mode 100644
index 000..5500631
--- /dev/null
+++ b/drivers/net/ioqnet.c
@@ -0,0 +1,631 @@
+/*
+ * ioqnet - A paravirtualized network device based on the IOQ interface
+ *
+ * Copyright (C) 2007 Novell, Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * Derived from the SNULL example from the book "Linux Device
+ * Drivers" by Alessandro Rubini and Jonathan Corbet, published
+ * by O'Reilly & Associates.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include  /* printk() */
+#include  /* kmalloc() */
+#include   /* error codes */
+#include   /* size_t */
+#include  /* mark_bh */
+
+#include 
+#include/* struct device, and other headers */
+#include  /* eth_type_trans */
+#include   /* struct iphdr */
+#include  /* struct tcphdr */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+static int timeout = 5;   /* In jiffies */
+module_param(timeout, int, 0);
+
+#define RX_RINGLEN 64
+#define TX_RINGLEN 64
+#define TX_PTRS_PER_DESC 64
+
+struct ioqnet_queue {
+   struct ioq  *queue;
+   struct ioq_notifier  notifier;
+};
+
+struct ioqnet_tx_desc {
+   struct sk_buff  *skb;
+   struct ioqnet_tx_ptr data[TX_PTRS_PER_DESC];
+};
+
+struct ioqnet_priv {
+   spinlock_t   lock;
+   struct net_device   *dev;
+   struct net_device_stats  stats;
+   struct ioqnet_queue  rxq;
+   struct ioqnet_queue  txq;
+   struct tasklet_structtxtask;
+   u64  pvb_instance;
+};
+
+static int ioqnet_queue_init(struct ioqnet_priv *priv,
+struct ioqnet_queue *q,
+size_t ringsize,
+void (*func)(struct ioq_notifier*))
+{
+   int ret = pvbus_ops->ioqmgr->create(pvbus_ops->ioqmgr,
+   &q->queue, ringsize, 0);
+   if (ret < 0)
+   return ret;
+
+   q->notifier.signal = func;
+
+   return 0;
+}
+
+/*
+ * Enable and disable receive interrupts.
+ */
+static void ioqnet_rx_ints(struct net_device *dev, int enable)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   struct ioq *ioq = priv->rxq.queue;
+   if (enable)
+   ioq_start(ioq, 0);
+   else
+   ioq_stop(ioq, 0);
+}
+
+static void ioqnet_alloc_rx_desc(struct ioq_ring_desc *desc, size_t len)
+{
+   struct sk_buff *skb = dev_alloc_skb(len + 2);
+   BUG_ON(!skb);
+
+   skb_reserve(skb, 2); /* align IP on 16B boundary */
+
+   desc->cookie = (u64)skb;
+   desc->ptr= (u64)__pa(skb->data);
+   desc->len= len; /* total length  */
+   desc->alen   = 0;   /* actual length - to be filled in by host */
+
+   mb();
+   desc->valid  = 1;
+   desc->sown   = 1;   /* give ownership to the south */
+   mb();
+}
+
+static void ioqnet_setup_rx(struct ioqnet_priv *priv)
+{
+   struct ioq *ioq = priv->rxq.queue;
+   struct ioq_iterator iter;
+   int ret;
+   int i;
+
+   /*
+* We want to iterate on the "valid" index.  By default the iterator
+* will not "autoupdate" which means it will not hypercall the host
+* with our changes.  This is good, because we are really just
+* initializing stuff here anyway.  Note that you can always manually
+* signal the host with ioq_signal() if the autoupdate feature is not
+* used.
+

[kvm-devel] [RFC PATCH 6/9] KVM: Add support for IOQ

2007-08-14 Thread Gregory Haskins

IOQ is a shared-memory-queue interface for implementing PV driver
communication.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig|5 +
 drivers/kvm/Makefile   |3 
 drivers/kvm/ioq.h  |   12 +-
 drivers/kvm/ioq_host.c |  365 
 drivers/kvm/kvm.h  |5 +
 drivers/kvm/kvm_main.c |3 
 include/linux/kvm.h|1 
 7 files changed, 393 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index cba03d2..0f81af1 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,6 +47,11 @@ config KVM_BALLOON
  The driver inflate/deflate guest physical memory on demand.
  This ability provides memory over commit for the host
 
+config KVM_IOQ_HOST
+   boolean "Add IOQ support to KVM"
+   depends on KVM
+   select IOQ
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c6a59bb..2095061 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -4,6 +4,9 @@
 EXTRA_CFLAGS :=
 
 kvm-objs := kvm_main.o mmu.o x86_emulate.o
+ifeq ($(CONFIG_KVM_IOQ_HOST),y)
+kvm-objs += ioq_host.o
+endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
index 7e955f1..b942113 100644
--- a/drivers/kvm/ioq.h
+++ b/drivers/kvm/ioq.h
@@ -25,7 +25,17 @@
 
 #include 
 
-#define IOQHC_REGISTER   1
+struct kvm;
+
+#ifdef CONFIG_KVM_IOQ_HOST
+int kvmhost_ioqmgr_init(struct kvm *kvm);
+int kvmhost_ioqmgr_module_init(void);
+#else
+#define kvmhost_ioqmgr_init(kvm) {}
+#define kvmhost_ioqmgr_module_init() {}
+#endif
+
+#define IOQHC_REGISTER1
 #define IOQHC_UNREGISTER  2
 #define IOQHC_SIGNAL 3
 
diff --git a/drivers/kvm/ioq_host.c b/drivers/kvm/ioq_host.c
new file mode 100644
index 000..413f103
--- /dev/null
+++ b/drivers/kvm/ioq_host.c
@@ -0,0 +1,365 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmhost_ioq {
+   struct ioqioq;
+   struct rb_nodenode;
+   atomic_t  refcnt;
+   struct kvm_vcpu  *vcpu;
+   int   irq;
+};
+
+struct kvmhost_map {
+   spinlock_t lock;
+   struct rb_root root;
+};
+
+struct kvmhost_ioq_mgr {
+   struct ioq_mgr  mgr;
+   struct kvm *kvm;
+   struct kvmhost_map  map;
+};
+
+struct kvmhost_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmhost_ioq, ioq);
+}
+
+struct kvmhost_ioq_mgr* to_mgr(struct ioq_mgr *mgr)
+{
+   return container_of(mgr, struct kvmhost_ioq_mgr, mgr);
+}
+
+/*
+ * --
+ * rb map management
+ * --
+ */
+
+static void kvmhost_map_init(struct kvmhost_map *map)
+{
+   spin_lock_init(&map->lock);
+   map->root = RB_ROOT;
+}
+
+static int kvmhost_map_register(struct kvmhost_map *map,
+   struct kvmhost_ioq *ioq)
+{
+   int ret = 0;
+   struct rb_root *root;
+   struct rb_node **new, *parent = NULL;
+
+   spin_lock(&map->lock);
+
+   root = &map->root;
+   new  = &(root->rb_node);
+
+   /* Figure out where to put new node */
+   while (*new) {
+   struct kvmhost_ioq *this;
+
+   this   = container_of(*new, struct kvmhost_ioq, node);
+   parent = *new;
+
+   if (ioq->ioq.id < this->ioq.id)
+   new = &((*new)->rb_left);
+   else if (ioq->ioq.id > this->ioq.id)
+   new = &((*new)->rb_right);
+   else {
+   ret = -EEXIST;
+   break;
+   }
+   }
+
+   if (!ret) {
+   /* Add new node and rebalance tree. */
+   rb_link_node(&ioq->node, parent, new);
+   rb_insert_color(&ioq->node, r

[kvm-devel] [RFC PATCH 1/9] IOQ: Adding basic definitions for IO-Queue logic

2007-08-14 Thread Gregory Haskins

IOQ is a generic shared-memory-queue mechanism that happens to be friendly
to virtualization boundaries.  Note that it is not virtualization specific
due to its flexible transport layer.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/ioq.h |  176 +++
 lib/Kconfig |   11 ++
 lib/Makefile|1 
 lib/ioq.c   |  228 +++
 4 files changed, 416 insertions(+), 0 deletions(-)

diff --git a/include/linux/ioq.h b/include/linux/ioq.h
new file mode 100644
index 000..d3a18a1
--- /dev/null
+++ b/include/linux/ioq.h
@@ -0,0 +1,176 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * IOQ is a generic shared-memory-queue mechanism that happens to be friendly
+ * to virtualization boundaries. It can be used in a variety of ways, though
+ * its intended purpose is to become the low-level communication path for
+ * paravirtualized drivers.  Note that it is not virtualization specific
+ * due to its flexible signaling layer.
+ *
+ * The following are a list of key design points:
+ *
+ * #) All shared-memory is always allocated on explicitly one side of the
+ *link.  This typically would be the guest side in a VM/VMM scenario.
+ * #) The code has the concept of "north" and "south" where north denotes the
+ *memory-owner side (e.g. guest).
+ * #) An IOQ is "created" on the north side (which generates a unique ID), and
+ *is "connected" on the remote side via its ID.  This facilitates call-path
+ *setup in a manner that is friendly across VM/VMM boundaries.
+ * #) An IOQ is manipulated using an iterator idiom.
+ * #) A "IOQ Manager" abstraction handles the translation between two
+ *endpoints. E.g. allocating "north" memory, signaling, translating
+ *addresses (e.g. GPA to PA)
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IOQ_H
+#define _LINUX_IOQ_H
+
+#include 
+#include 
+#include 
+
+struct ioq_mgr;
+
+/*
+ *-
+ * The following structures represent data that is shared across boundaries
+ * which may be quite disparate from one another (e.g. Windows vs Linux,
+ * 32 vs 64 bit, etc).  Therefore, care has been taken to make sure they
+ * present data in a manner that is independent of the environment.
+ *---
+ */
+typedef u64 ioq_id_t;
+
+struct ioq_ring_desc {
+   u64 cookie; /* for arbitrary use by north-side */
+   u64 ptr;
+   u64 len;
+   u64 alen;
+   u8  valid;
+   u8  sown; /* South owned = 1, North owned = 0 */
+};
+
+#define IOQ_RING_MAGIC 0x47fa2fe4
+#define IOQ_RING_VER   1
+
+struct ioq_ring_idx {
+   u32 head;/* 0 based index to head of ptr array */
+   u32 tail;/* 0 based index to tail of ptr array */
+   u8  full;
+};
+
+struct ioq_irq {
+   u8  enabled;
+   u8  pending;
+};
+
+enum ioq_locality {
+   ioq_locality_north,
+   ioq_locality_south,
+};
+
+struct ioq_ring_head {
+   u32 magic;
+   u32 ver;
+   ioq_id_tid;
+   u32 count;
+   u64 ptr; /* ptr to array of ioq_ring_desc[count] */
+   struct ioq_ring_idx idx[2];
+   struct ioq_irq  irq[2];
+   u8  padding[16];
+};
+
+/* --- END SHARED STRUCTURES --- */
+
+enum ioq_idx_type {
+   ioq_idxtype_valid,
+   ioq_idxtype_inuse,
+   ioq_idxtype_invalid,
+};
+
+enum ioq_seek_type {
+   ioq_seek_tail,
+   ioq_seek_next,
+   ioq_seek_head,
+   ioq_seek_set
+};
+
+struct ioq_iterator {
+   struct ioq*ioq;
+   struct ioq_ring_idx   *idx;
+   u32pos;
+   struct ioq_ring_desc  *desc;
+   intupdate;
+};
+
+int  ioq_iter_seek(struct ioq_iterator *iter, enum ioq_seek_type type,
+  long offset, int flags);
+int  ioq_iter_push(struct ioq_iterator *iter, int flags);
+int  ioq_iter_pop(struct ioq_iterator *iter,  int flags);
+
+struct ioq_notifier {
+

[kvm-devel] [RFC PATCH 2/9] PARAVIRTUALIZATION: Add support for a bus abstraction

2007-08-14 Thread Gregory Haskins

PV usually comes in two flavors:  device PV, and "core" PV.  The existing PV
ops deal in terms of the latter.  However, it would be useful to add an
interface for a virtual bus with provisions for discovery/configuration of
backend PV devices.  Oftentimes it is desirable to run PV devices even if the
entire core is not operating with PVOPS.  Therefore, we introduce a separate
interface to deal with the devices.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/pvbus.h |   43 +++
 kernel/Makefile   |2 +-
 kernel/pvbus.c|   45 +
 3 files changed, 89 insertions(+), 1 deletions(-)

diff --git a/include/linux/pvbus.h b/include/linux/pvbus.h
new file mode 100644
index 000..edfe185
--- /dev/null
+++ b/include/linux/pvbus.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Paravirtualized-Bus
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_PVBUS_H
+#define _LINUX_PVBUS_H
+
+#include 
+
+struct pvbus_dev {
+   u64 instance;
+   u32 version;
+};
+
+struct pvbus_ops {
+   int (*enumerate)(const char *dev, struct pvbus_dev inst[],
+size_t *cnt, int flags);
+   int (*call)(u64 inst, u32 func, void *data, size_t len, int flags);
+
+   struct ioq_mgr *ioqmgr;
+};
+
+extern struct pvbus_ops *pvbus_ops;
+
+#endif /* _LINUX_PVBUS_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 642d427..3008163 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
signal.o sys.o kmod.o workqueue.o pid.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-   hrtimer.o rwsem.o latency.o nsproxy.o srcu.o die_notifier.o
+   hrtimer.o rwsem.o latency.o nsproxy.o srcu.o die_notifier.o pvbus.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
diff --git a/kernel/pvbus.c b/kernel/pvbus.c
new file mode 100644
index 000..e638a69
--- /dev/null
+++ b/kernel/pvbus.c
@@ -0,0 +1,45 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Paravirtualized-Bus
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+
+int native_enumerate(const char *dev, struct pvbus_dev inst[],
+size_t *cnt, int flags)
+{
+   return -ENOSYS;
+}
+
+int native_call(u64 inst, u32 func, void *data, size_t len, int flags)
+{
+   return -ENOSYS;
+}
+
+static struct pvbus_ops native_pvbus = {
+   .enumerate = native_enumerate,
+   .call  = native_call,
+   .ioqmgr= NULL,
+};
+
+/* This will get reassigned if a PV compatible platform is detected */
+struct pvbus_ops *pvbus_ops = &native_pvbus;
+EXPORT_SYMBOL(pvbus_ops);
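
The "reassigned if a PV compatible platform is detected" pattern can be
illustrated outside the kernel.  This is only a minimal userspace sketch under
stated assumptions: struct bus_ops, demo_call(), demo_platform_detected() and
bus_call() are hypothetical names invented for the illustration, and only the
-ENOSYS native stub mirrors what the patch itself does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct pvbus_ops: one operation is enough here. */
struct bus_ops {
	int (*call)(uint64_t inst, uint32_t func, void *data, size_t len);
};

/* Native stub: on bare metal there is no PV bus to talk to. */
static int native_call(uint64_t inst, uint32_t func, void *data, size_t len)
{
	(void)inst; (void)func; (void)data; (void)len;
	return -ENOSYS;
}

static struct bus_ops native_ops = { .call = native_call };

/* Default to the native stubs, mirroring pvbus_ops in the patch. */
static struct bus_ops *bus_ops = &native_ops;

/* Hypothetical backend that a hypervisor-aware platform might install. */
static int demo_call(uint64_t inst, uint32_t func, void *data, size_t len)
{
	(void)inst; (void)func; (void)data; (void)len;
	return 0;
}

static struct bus_ops demo_ops = { .call = demo_call };

/* Hypothetical hook run once the platform has been identified. */
static void demo_platform_detected(void)
{
	bus_ops = &demo_ops;
}

/* Drivers always go through the pointer and never care who is behind it. */
static int bus_call(uint64_t inst, uint32_t func, void *data, size_t len)
{
	assert(bus_ops && bus_ops->call);
	return bus_ops->call(inst, func, data, len);
}
```

Before detection every call fails cleanly with -ENOSYS, so a PV driver can
probe the bus without first checking whether it is running virtualized.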



[kvm-devel] [RFC PATCH 3/9] IRQ: Export create_irq/destroy_irq

2007-08-14 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/io_apic.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index d8bfe31..6bf8794 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -1849,6 +1849,7 @@ int create_irq(void)
}
return irq;
 }
+EXPORT_SYMBOL(create_irq);
 
 void destroy_irq(unsigned int irq)
 {
@@ -1860,6 +1861,7 @@ void destroy_irq(unsigned int irq)
__clear_irq_vector(irq);
spin_unlock_irqrestore(&vector_lock, flags);
 }
+EXPORT_SYMBOL(destroy_irq);
 
 /*
  * MSI mesage composition



[kvm-devel] [RFC PATCH 0/9] PV device infrastructure v2

2007-08-14 Thread Gregory Haskins

This series incorporates all of v1 plus the following changes based on
feedback to date:

*) s/ioq_iter/ioq_iter_init

*) removed unnecessary desc->offset

*) utilized create_irq() to actually assign interrupt resource

*) removed inadvertent cleanup of whitespace from series

*) added memory barriers in strategic locations

*) misc cleanups I can't quite remember at this time ;)


Notably missing:


*) ioq_iter_push/pop have not been rolled into ioq_iter_seek

*) kmap_atomic is still in use as I don't have a clear understanding of how to
   support this properly.

*) still not using bus_register/device_register


Regards,
-Greg



Re: [kvm-devel] a few may be wiki question

2007-08-14 Thread Gregory Haskins

On Tue, 2007-08-14 at 15:50 +0300, Avi Kivity wrote:
> Farkas Levente wrote:
> > hi,
> > i try to setup a centos host server with kvm and a few guest os for the
> > first time. imho there is only a very limited docs about kvm (even if i
> > try to read them:-). so there are a few general questions:
> > - which is the recommended host config?:
> >   - should i used x86_64 or i386 host kernel? i don't know in advance,
> > but probably most guest os will be 32 bit. is there any advantage or
> > disadvantage to use x86_64 as a host os?
> >   
> 
> If you have more than 1GB of RAM, 64-bit will be slightly faster.  
> Otherwise there is no preference.

One other difference is that an x86_64 host can run either 32-bit or 64-bit
guests.  An i386 host can only run 32-bit guests.

-Greg


Re: [kvm-devel] [RFC PATCH 1/8] IOQ: Adding basic definitions for IO-Queue logic

2007-08-14 Thread Gregory Haskins

On Tue, 2007-08-14 at 07:44 -0400, Gregory Haskins wrote:
> On Tue, 2007-08-14 at 01:01 -0700, Dor Laor wrote:

> > Instead of the above code I would call to iter_seek and do everything inside
> > then return a proper error code if needed.
> 
> I'm not sure I fully understand your point.  But if I do get what you are
> saying that wouldn't work right for the design.  iter_push and iter_pop
> are special forms of iter_seek(seek_next) in that they advance the index
> AND the position.  iter_seek only advances the position.

/me slapping myself

I think I see what you are saying now.  You just meant make a single
function "iter_seek" which has flags for "seek_next, seek_push, and
seek_pop"? 

Yes, that would work quite nicely and cleans things up a bit.  
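
To make the idea concrete, here is a minimal sketch of a single seek entry
point, assuming a simplified ring and iterator.  The names struct ring,
struct iter, enum it_seek and this iter_seek() are hypothetical stand-ins, not
the IOQ API from the patch.  A plain seek moves only the iterator position,
while push and pop also move the head or tail index.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum it_seek {
	IT_SEEK_NEXT,	/* advance the position only */
	IT_SEEK_PUSH,	/* advance the head index and the position */
	IT_SEEK_POP	/* advance the tail index and the position */
};

struct ring {
	uint32_t head;
	uint32_t tail;
	uint32_t count;	/* number of slots in the ring */
	uint8_t  full;	/* set when head == tail because the ring is full */
};

struct iter {
	struct ring *ring;
	uint32_t     pos;
};

static int iter_seek(struct iter *it, enum it_seek type)
{
	struct ring *r = it->ring;

	assert(r->count > 0);

	switch (type) {
	case IT_SEEK_NEXT:
		break;
	case IT_SEEK_PUSH:
		if (it->pos != r->head)
			return -EINVAL;	/* only valid when pointed at the head */
		if (r->full)
			return -ENOSPC;
		r->head = (r->head + 1) % r->count;
		if (r->head == r->tail)
			r->full = 1;
		break;
	case IT_SEEK_POP:
		if (it->pos != r->tail)
			return -EINVAL;	/* only valid when pointed at the tail */
		if (r->head == r->tail && !r->full)
			return -ENOENT;	/* ring is empty */
		r->tail = (r->tail + 1) % r->count;
		r->full = 0;
		break;
	default:
		return -EINVAL;
	}

	/* All three variants move the iterator itself forward by one slot. */
	it->pos = (it->pos + 1) % r->count;
	return 0;
}
```

The index checks happen before the position moves, so a failed push or pop
leaves both the ring and the iterator untouched.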


Re: [kvm-devel] [RFC PATCH 1/8] IOQ: Adding basic definitions for IO-Queue logic

2007-08-14 Thread Gregory Haskins

On Tue, 2007-08-14 at 01:01 -0700, Dor Laor wrote:

> Why not use the standard naming for guest/host or front/backend?

I was trying to stress that this isn't virtualization specific.  For
instance, you could use an IOQ on something like an AMP or RDMA based
system too.  However, that being said I am not married to the
generalized terms.  I could easily change the locality code to be
"guest/host" and be just as happy if it eliminates confusion.

> I didn’t see usage for cookie in downstream patches. Is it obsolete?

Its use is optional, but the IOQNET driver does take advantage of it.
It stores things like the real skb pointer.

> 
> >+u64 ptr;
> >+u64 offset;
> 
> The ptr & offset can be united.

Ya, you are probably right.  In earlier designs I didn't have the cookie
field so I wanted ptr to remain fixed at the base of the allocation.
That is probably not as important any more.

> What's alen?

"actual length".  For instance, look at how IOQNET manages the
(guest-side) rx queue.  It populates the ring with a bunch of MTU sized
skbs and sets len = MTU, alen = 0.  The host side then comes and adjusts
the alen to match what the actual packet length is.
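
As a rough illustration of that len/alen handshake, here is a minimal
userspace sketch.  It assumes a cut-down descriptor: struct rx_desc,
guest_post_rx() and host_fill_rx() are hypothetical names, the buffer is a
plain pointer rather than a physical address, and the memory barriers the real
driver needs are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the len/alen/sown fields of struct ioq_ring_desc. */
struct rx_desc {
	unsigned char *buf;	/* receive buffer (the patch stores __pa() here) */
	uint64_t       len;	/* total size of the posted buffer */
	uint64_t       alen;	/* actual length, filled in by the host */
	uint8_t        sown;	/* 1 = south (host) owned, 0 = north owned */
};

/* Guest side: post an empty buffer of `len` bytes and hand it south. */
static void guest_post_rx(struct rx_desc *d, unsigned char *buf, uint64_t len)
{
	d->buf  = buf;
	d->len  = len;
	d->alen = 0;
	d->sown = 1;
}

/* Host side: copy a packet in, record its real length, hand it back north. */
static int host_fill_rx(struct rx_desc *d, const void *pkt, uint64_t pktlen)
{
	assert(d->sown);	/* the host may only touch south-owned slots */

	if (pktlen > d->len)
		return -1;	/* packet does not fit in the posted buffer */

	memcpy(d->buf, pkt, (size_t)pktlen);
	d->alen = pktlen;
	d->sown = 0;
	return 0;
}
```

The guest never has to guess the packet size: it reads alen once ownership
has returned to it.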

> 
> >+u8  valid;
> >+u8  sown; /* South owned = 1, North owned = 0 */
> 
> What about no-one owns it?

There is *always* an owner ;)

>  if you plan of using the valid field you might need
> to handle complex races.

If we were only using the valid/sown flags that would probably be true.
But the ioq_ring_indexes are also used and I think they eliminate the
races.

Typically an application will move to either the head or tail of an
index, and then process until the point that the valid/sown flags
indicate we are done.  I think you will find this is race free, but
certainly let me know if you think otherwise.
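
The "process until the flags say we are done" pattern can be sketched
roughly as follows.  This is a single-threaded userspace illustration with
hypothetical names (flag_desc, north_drain), not the iterator code from the
patch, and it deliberately leaves out any memory barriers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down descriptor: only the two flags matter here. */
struct flag_desc {
    uint8_t valid;
    uint8_t sown;   /* South owned = 1, North owned = 0 */
};

/*
 * North-side consumer: start at *pos and keep going while the descriptor
 * is valid and owned by north.  Stops at the first descriptor that is
 * invalid or still south-owned.  Returns how many were processed.
 */
uint32_t north_drain(struct flag_desc *ring, uint32_t size, uint32_t *pos)
{
    uint32_t n = 0;

    assert(size > 0);
    while (n < size && ring[*pos].valid && !ring[*pos].sown) {
        ring[*pos].valid = 0;          /* mark the slot as consumed */
        *pos = (*pos + 1) % size;
        n++;
    }
    return n;
}
```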

> 
> >+};
> >+
> >+#define IOQ_RING_MAGIC 0x47fa2fe4
> >+#define IOQ_RING_VER   1
> >+
> >+struct ioq_ring_idx {
> >+u32 head;/* 0 based index to head of ptr array
> >*/
> >+u32 tail;/* 0 based index to tail of ptr array
> >*/
> >+u8  full;
> 
> full = head==tail ???

Sometimes ;)

I might have screwed this up, but as far as I know the logic is sound.
The way I figured it, there are two conditions where we can have
head==tail: when the ring is empty, or when it's full.  The full flag
allows them to be differentiated.
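
To make the empty-vs-full distinction concrete, here is a small userspace
sketch of a head/tail index with a full flag.  The names (ring_idx,
ring_count, ring_push, ring_pop) are assumptions for the example and only
loosely mirror ioq_ring_idx; it is not the code from lib/ioq.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ioq_ring_idx. */
struct ring_idx {
    uint32_t head;  /* next slot to fill   */
    uint32_t tail;  /* next slot to drain  */
    uint8_t  full;  /* disambiguates head == tail */
};

/* Number of occupied slots in a ring with 'size' entries. */
uint32_t ring_count(const struct ring_idx *idx, uint32_t size)
{
    if (idx->full)
        return size;
    if (idx->head >= idx->tail)
        return idx->head - idx->tail;
    return size - idx->tail + idx->head;
}

/* Advance head by one slot.  Returns 0 on success, -1 if the ring is full. */
int ring_push(struct ring_idx *idx, uint32_t size)
{
    assert(size > 0);
    if (idx->full)
        return -1;
    idx->head = (idx->head + 1) % size;
    if (idx->head == idx->tail)
        idx->full = 1;   /* head caught up with tail: full, not empty */
    return 0;
}

/* Advance tail by one slot.  Returns 0 on success, -1 if the ring is empty. */
int ring_pop(struct ring_idx *idx, uint32_t size)
{
    assert(size > 0);
    if (!idx->full && idx->head == idx->tail)
        return -1;
    idx->tail = (idx->tail + 1) % size;
    idx->full = 0;
    return 0;
}
```

Without the flag, one slot would have to stay permanently unused to tell
the two head==tail cases apart.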

> Why have a notifier in a separate structure?

The notifier structure is something allocated and provided by the user
of the ioq.  For an example, see the IOQNET driver.

> >+
> >+int ioq_iter_push(struct ioq_iterator *iter, int flags)
> >+{
> >+struct ioq_ring_head *head_desc = iter->ioq->head_desc;
> >+struct ioq_ring_idx  *idx  = iter->idx;
> >+int ret = -ENOSPC;
> >+
> >+/*
> >+ * Its only valid to push if we are currently pointed at the head
> >+ */
> >+if (iter->pos != idx->head)
> >+return -EINVAL;
> >+
> >+if (ioq_ring_count(idx, head_desc->count) < head_desc->count) {
> >+idx->head++;
> >+idx->head %= head_desc->count;
> >+
> >+if (idx->head == idx->tail)
> >+idx->full = 1;
> >+
> >+ret = ioq_iter_seek(iter, ioq_seek_next, 0, flags);
> 
> Instead of the above code I would call to iter_seek and do everything inside
> then return a proper error code if needed.

I'm not sure I fully understand your point.  But if I do get what you are
saying, that wouldn't work right for the design.  iter_push and iter_pop
are special forms of iter_seek(seek_next) in that they advance the index
AND the position.  iter_seek only advances the position.
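
A stripped-down sketch of that difference, assuming hypothetical names
(mini_idx, mini_iter, mini_seek_next, mini_push) and leaving out the
full-ring check and the flags argument that the real functions have:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared index: only the head matters for this example. */
struct mini_idx {
    uint32_t head;
};

/* Hypothetical iterator: a private position over a shared index. */
struct mini_iter {
    struct mini_idx *idx;
    uint32_t         pos;
    uint32_t         size;
};

/* seek(next): moves only the iterator's private position. */
void mini_seek_next(struct mini_iter *it)
{
    assert(it->size > 0);
    it->pos = (it->pos + 1) % it->size;
}

/*
 * push: only valid when the iterator points at the head.  It advances the
 * shared head index AND the iterator position.  Returns 0, or -1 if the
 * iterator is not at the head.
 */
int mini_push(struct mini_iter *it)
{
    if (it->pos != it->idx->head)
        return -1;
    it->idx->head = (it->idx->head + 1) % it->size;
    mini_seek_next(it);
    return 0;
}
```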

> EXPORT_SYMBOL(ioq_iter_pop);
> >+
> >+int ioq_iter(struct ioq *ioq, struct ioq_iterator *iter,
> >+ enum ioq_idx_type type, int flags)
> 
> ioq_iter_init ?

Yeah, that's a good way to think of it.  I think I originally had a more
descriptive name and then got on a short "ioq_" kick ;)

-Greg


Re: [kvm-devel] [RFC PATCH 0/8] PV device infrastructure

2007-08-14 Thread Gregory Haskins

On Tue, 2007-08-14 at 00:44 -0700, Dor Laor wrote:

> Can you provide performance figures? What’s the tcp/udp between guest & host?

None yet :(  It's pretty much "pseudo code that compiles" at this
point. ;) I still have to work out several issues before I can even
begin testing.  Interrupt management jumps out at me as one of those
issues.

The one saving grace is that I originally developed the ioq.[ch] code in
userspace with a mutex/condvar based ioqmgr.  So in general I know the
queuing code works.  But once I started really coding to the kernel it
became difficult to maintain the test harness so I cut it out and
abandoned it.

> I saw that you use kvm_vcpu_send_interrupt in a comment. Did you actually 
> inject irqs?

Not yet, just a place holder for the real implementation.  Now you know
why I was interested in the lapic project tho ;)

> Through the kernel or userspace?

The thing obviously would prefer an in-kernel injection path, but I
think I might need to talk to you soon about getting the userspace path
working in the interim until the lapic3 branch is merged.

> 
> Also I have some general comments:
> - I like the approach of a general shared memory queue (one can call it dma 
> descriptor 
>   data structure or whatever). Rusty has a similar dma descriptor code that 
> can be generalized
>   easily too.

By all means, point me to the area you think I can contribute here.  One
of the things I was thinking of as I looked at the two was that
perhaps the code in IOQ could be used to *implement* virtio.  Even if
people don't like having the iterator methods exposed directly to
end-users (e.g. drivers), the stuff I did to generalize the queuing and
to abstract the VMM platform away might be useful.

> - Regarding the virtual bus, I didn't see any use of standard Linux APIs such 
> as 'bus_register' or
>   'driver_register'. Take a look at lguest_bus.c, it is very nice piece of 
> code.

This wasn't on purpose per se.  I will definitely take a look at that.

On that topic, I really see the queuing code as independent from the
pvbus code.  Did you like any of the directions I took things there, or
am I off in the weeds?

> - IMO the ring code lacks lots of memory barrier calls (actually my original 
> implementation 
>   partially suffered from that too ...

Totally agree.  It's on my plate.  I think at the very least I need them
in the ioq_signal() code.  But there are probably other places too, like
before the ownership bit is changed (e.g. desc->sown).
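
For the ownership-bit case, the ordering requirement looks roughly like the
sketch below.  It uses C11 atomics in userspace purely as an illustration;
in the kernel the equivalent would be barrier primitives such as smp_wmb()
and smp_rmb() rather than <stdatomic.h>.  The names (own_desc,
north_publish, south_try_take) are hypothetical and this is not the IOQ
code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: a payload field plus the ownership flag. */
struct own_desc {
    uint64_t        len;
    _Atomic uint8_t sown;   /* South owned = 1, North owned = 0 */
};

/*
 * North side: fill in the descriptor first, then hand it to south.  The
 * release store keeps the write to len from being reordered after the
 * ownership change.
 */
void north_publish(struct own_desc *d, uint64_t len)
{
    assert(d != NULL);
    d->len = len;
    atomic_store_explicit(&d->sown, 1, memory_order_release);
}

/*
 * South side: check ownership with an acquire load before reading the
 * payload.  Returns 1 and fills *len if south owns the descriptor,
 * otherwise returns 0.
 */
int south_try_take(struct own_desc *d, uint64_t *len)
{
    if (!atomic_load_explicit(&d->sown, memory_order_acquire))
        return 0;
    *len = d->len;
    return 1;
}
```

Without the release/acquire pairing, the other side could see the flag
flip before the descriptor contents are visible.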

>   Since Rusty has taken care for that in his virtio implementation I think we 
> should base
>   our KVM PV devices over it. I just started porting virtio to KVM and expect 
> to see the first
>   packets coming in today. I'll send initial patches when it be in better 
> shape next week.
>   Nevertheless, any ideas/code are welcomed.

I am looking forward to your patches!

-Greg


Re: [kvm-devel] [RFC PATCH 4/8] KVM: Add a gpa_to_hva helper function

2007-08-14 Thread Gregory Haskins

On Tue, 2007-08-14 at 12:43 +0530, Amit Shah wrote:
> On Tuesday 14 August 2007 09:37:45 Gregory Haskins wrote:

> There are several things about this patch which aren't nice:
> - It implicitly does a kmap_atomic. That's a bad thing for procedures that 
> might sleep after calling this
> - It's prone to not being kunmapped
> - kmap_atomic takes two arguments
> - 'page' is not checked for !NULL
> - we can simply use gpa_to_hpa and get page_address(hpa), instead of 
> duplicating all that it does.
> 

You are probably right on all counts.  I am not an expert in this area
and have been getting help from Hollis and Anthony via IRC.  There is a
very good chance I have misunderstood some of their extremely helpful
pointers to me.

-Greg


[kvm-devel] [RFC PATCH 4/8] KVM: Add a gpa_to_hva helper function

2007-08-13 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h |1 +
 drivers/kvm/mmu.c |   12 
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 9934f11..05d5be1 100755
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -475,6 +475,7 @@ void vcpu_load(struct kvm_vcpu *vcpu);
 void vcpu_put(struct kvm_vcpu *vcpu);
 
 hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa);
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa);
 #define HPA_MSB ((sizeof(hpa_t) * 8) - 1)
 #define HPA_ERR_MASK ((hpa_t)1 << HPA_MSB)
 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index e84c599..daaf0d2 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -766,6 +766,18 @@ hpa_t gpa_to_hpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 }
 EXPORT_SYMBOL_GPL(gpa_to_hpa);
 
+void* gpa_to_hva(struct kvm *kvm, gpa_t gpa)
+{
+   struct page *page;
+
+   if ((gpa & HPA_ERR_MASK) == 0)
+   return NULL;
+
+   page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
+   return kmap_atomic(page, gpa & PAGE_MASK);
+}
+EXPORT_SYMBOL_GPL(gpa_to_hva);
+
 hpa_t gva_to_hpa(struct kvm_vcpu *vcpu, gva_t gva)
 {
gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, gva);



[kvm-devel] [RFC PATCH 1/8] IOQ: Adding basic definitions for IO-Queue logic

2007-08-13 Thread Gregory Haskins

IOQ is a generic shared-memory-queue mechanism that happens to be friendly
to virtualization boundaries.  Note that it is not virtualization specific
due to its flexible transport layer.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/ioq.h |  178 +
 lib/Kconfig |   11 +++
 lib/Makefile|1 
 lib/ioq.c   |  219 +++
 4 files changed, 409 insertions(+), 0 deletions(-)

diff --git a/include/linux/ioq.h b/include/linux/ioq.h
new file mode 100644
index 000..52f68f5
--- /dev/null
+++ b/include/linux/ioq.h
@@ -0,0 +1,178 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * IOQ is a generic shared-memory-queue mechanism that happens to be friendly
+ * to virtualization boundaries. It can be used in a variety of ways, though
+ * its intended purpose is to become the low-level communication path for
+ * paravirtualized drivers.  Note that it is not virtualization specific
+ * due to its flexible signaling layer.
+ *
+ * The following are a list of key design points:
+ *
+ * #) All shared-memory is always allocated on explicitly one side of the
+ *link.  This typically would be the guest side in a VM/VMM scenario.
+ * #) The code has the concept of "north" and "south" where north denotes the
+ *memory-owner side (e.g. guest).
+ * #) An IOQ is "created" on the north side (which generates a unique ID), and
+ *is "connected" on the remote side via its ID.  This facilitates call-path
+ *setup in a manner that is friendly across VM/VMM boundaries.
+ * #) An IOQ is manipulated using an iterator idiom.
+ * #) A "IOQ Manager" abstraction handles the translation between two
+ *endpoints. E.g. allocating "north" memory, signaling, translating
+ *addresses (e.g. GPA to PA)
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IOQ_H
+#define _LINUX_IOQ_H
+
+#include 
+#include 
+#include 
+
+struct ioq_mgr;
+
+/*
+ *-
+ * The following structures represent data that is shared across boundaries
+ * which may be quite disparate from one another (e.g. Windows vs Linux,
+ * 32 vs 64 bit, etc).  Therefore, care has been taken to make sure they
+ * present data in a manner that is independent of the environment.
+ *---
+ */
+typedef u64 ioq_id_t;
+
+struct ioq_ring_desc {
+   u64 cookie; /* for arbitrary use by north-side */
+   u64 ptr;
+   u64 offset;
+   u64 len;
+   u64 alen;
+   u8  valid;
+   u8  sown; /* South owned = 1, North owned = 0 */
+};
+
+#define IOQ_RING_MAGIC 0x47fa2fe4
+#define IOQ_RING_VER   1
+
+struct ioq_ring_idx {
+   u32 head;/* 0 based index to head of ptr array */
+   u32 tail;/* 0 based index to tail of ptr array */
+   u8  full;
+};
+
+struct ioq_irq {
+   u8  enabled;
+   u8  pending;
+};
+
+enum ioq_locality {
+   ioq_locality_north,
+   ioq_locality_south,
+};
+
+struct ioq_ring_head {
+   u32 magic;
+   u32 ver;
+   ioq_id_t id;
+   u32 count;
+   u64 ptr; /* ptr to array of ioq_ring_desc[count] */
+   struct ioq_ring_idx idx[2];
+   struct ioq_irq  irq[2];
+   u8  padding[16];
+};
+
+/* --- END SHARED STRUCTURES --- */
+
+enum ioq_idx_type {
+   ioq_idxtype_valid,
+   ioq_idxtype_inuse,
+   ioq_idxtype_invalid,
+};
+
+enum ioq_seek_type {
+   ioq_seek_tail,
+   ioq_seek_next,
+   ioq_seek_head,
+   ioq_seek_set
+};
+
+struct ioq_iterator {
+   struct ioq            *ioq;
+   unsigned long          flags;
+   int                    update;
+   struct ioq_ring_idx   *idx;
+   u32                    pos;
+   struct ioq_ring_desc  *desc;
+};
+
+int  ioq_iter_seek(struct ioq_iterator *iter, enum ioq_seek_type type,
+  long offset, int flags);
+int  ioq_iter_push(struct ioq_iterator *iter, int flags);
+int  ioq

[kvm-devel] [RFC PATCH 2/8] PARAVIRTUALIZATION: Add support for a bus abstraction

2007-08-13 Thread Gregory Haskins

PV usually comes in two flavors:  device PV, and "core" PV.  The existing PV
ops deal in terms of the latter.  However, it would be useful to add an
interface for a virtual bus with provisions for discovery/configuration of
backend PV devices.  Often times it is desirable to run PV devices even if the
entire core is not operating with PVOPS.  Therefore, we introduce a separate
interface to deal with the devices.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 include/linux/pvbus.h |   43 +++
 kernel/Makefile   |2 +-
 kernel/pvbus.c|   45 +
 3 files changed, 89 insertions(+), 1 deletions(-)

diff --git a/include/linux/pvbus.h b/include/linux/pvbus.h
new file mode 100644
index 000..edfe185
--- /dev/null
+++ b/include/linux/pvbus.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Paravirtualized-Bus
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_PVBUS_H
+#define _LINUX_PVBUS_H
+
+#include 
+
+struct pvbus_dev {
+   u64 instance;
+   u32 version;
+};
+
+struct pvbus_ops {
+   int (*enumerate)(const char *dev, struct pvbus_dev inst[],
+size_t *cnt, int flags);
+   int (*call)(u64 inst, u32 func, void *data, size_t len, int flags);
+
+   struct ioq_mgr *ioqmgr;
+};
+
+extern struct pvbus_ops *pvbus_ops;
+
+#endif /* */
diff --git a/kernel/Makefile b/kernel/Makefile
index 642d427..3008163 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o 
profile.o \
signal.o sys.o kmod.o workqueue.o pid.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-   hrtimer.o rwsem.o latency.o nsproxy.o srcu.o die_notifier.o
+   hrtimer.o rwsem.o latency.o nsproxy.o srcu.o die_notifier.o pvbus.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
diff --git a/kernel/pvbus.c b/kernel/pvbus.c
new file mode 100644
index 000..e638a69
--- /dev/null
+++ b/kernel/pvbus.c
@@ -0,0 +1,45 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Paravirtualized-Bus
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+
+int native_enumerate(const char *dev, struct pvbus_dev inst[],
+size_t *cnt, int flags)
+{
+   return -ENOSYS;
+}
+
+int native_call(u64 inst, u32 func, void *data, size_t len, int flags)
+{
+   return -ENOSYS;
+}
+
+static struct pvbus_ops native_pvbus = {
+   .enumerate = native_enumerate,
+   .call  = native_call,
+   .ioqmgr= NULL,
+};
+
+/* This will get reassigned if a PV compatible platform is detected */
+struct pvbus_ops *pvbus_ops = &native_pvbus;
+EXPORT_SYMBOL(pvbus_ops);



[kvm-devel] [RFC PATCH 7/8] IOQ: Add an IOQ network driver

2007-08-13 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig|   52 ++--
 drivers/net/Makefile   |2 
 drivers/net/ioqnet.c   |  623 
 include/linux/ioqnet.h |   42 +++
 4 files changed, 695 insertions(+), 24 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index fb99cd4..6bab885 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -883,12 +883,12 @@ config SMC911X
help
  This is a driver for SMSC's LAN911x series of Ethernet chipsets
  including the new LAN9115, LAN9116, LAN9117, and LAN9118.
- Say Y if you want it compiled into the kernel, 
+ Say Y if you want it compiled into the kernel,
  and read the Ethernet-HOWTO, available from
  <http://www.linuxdoc.org/docs.html#howto>.
 
- This driver is also available as a module. The module will be 
- called smc911x.  If you want to compile it as a module, say M 
+ This driver is also available as a module. The module will be
+ called smc911x.  If you want to compile it as a module, say M
  here and read 
 
 config NET_VENDOR_RACAL
@@ -1221,7 +1221,7 @@ config IBM_EMAC_RX_SKB_HEADROOM
  will always reserve at least 2 bytes to make IP header
  aligned, so usually there is no need to add any additional
  headroom.
- 
+
  If unsure, set to 0.
 
 config IBM_EMAC_PHY_RX_CLK_FIX
@@ -1229,10 +1229,10 @@ config IBM_EMAC_PHY_RX_CLK_FIX
depends on IBM_EMAC && (405EP || 440GX || 440EP || 440GR)
help
  Enable this if EMAC attached to a PHY which doesn't generate
- RX clock if there is no link, if this is the case, you will 
+ RX clock if there is no link, if this is the case, you will
  see "TX disable timeout" or "RX disable timeout" in the system
  log.
- 
+
  If unsure, say N.
 
 config IBM_EMAC_DEBUG
@@ -1249,7 +1249,7 @@ config IBM_EMAC_RGMII
bool
depends on IBM_EMAC && 440GX
default y
-   
+
 config IBM_EMAC_TAH
bool
depends on IBM_EMAC && 440GX
@@ -1482,9 +1482,9 @@ config E100
select MII
---help---
  This driver supports Intel(R) PRO/100 family of adapters.
- To verify that your adapter is supported, find the board ID number 
- on the adapter. Look for a label that has a barcode and a number 
- in the format 123456-001 (six digits hyphen three digits). 
+ To verify that your adapter is supported, find the board ID number
+ on the adapter. Look for a label that has a barcode and a number
+ in the format 123456-001 (six digits hyphen three digits).
 
  Use the above information and the Adapter & Driver ID Guide at:
 
@@ -1496,7 +1496,7 @@ config E100
 
  <http://appsr.intel.com/scripts-df/support_intel.asp>
 
- More specific information on configuring the driver is in 
+ More specific information on configuring the driver is in
  .
 
  To compile this driver as a module, choose M here and read
@@ -1950,7 +1950,7 @@ config E1000
depends on PCI
---help---
  This driver supports Intel(R) PRO/1000 gigabit ethernet family of
- adapters.  For more information on how to identify your adapter, go 
+ adapters.  For more information on how to identify your adapter, go
  to the Adapter & Driver ID Guide at:
 
  <http://support.intel.com/support/network/adapter/pro100/21397.htm>
@@ -1960,7 +1960,7 @@ config E1000
 
  <http://support.intel.com>
 
- More specific information on configuring the driver is in 
+ More specific information on configuring the driver is in
  .
 
  To compile this driver as a module, choose M here and read
@@ -2074,7 +2074,7 @@ config R8169_VLAN
---help---
  Say Y here for the r8169 driver to support the functions required
  by the kernel 802.1Q code.
- 
+
  If in doubt, say Y.
 
 config SIS190
@@ -2100,7 +2100,7 @@ config SKGE
  and related Gigabit Ethernet adapters. It is a new smaller driver
  with better performance and more complete ethtool support.
 
- It does not support the link failover and network management 
+ It does not support the link failover and network management
  features that "portable" vendor supplied sk98lin driver does.
 
  This driver supports adapters based on the original Yukon chipset:
@@ -2201,16 +2201,16 @@ config SK98LIN
- SK-9871 V2.0 Gigabit Ethernet 1000Base-ZX Adapter
- SK-9872 Gigabit Ethernet Server Adapter (SK-NET GE-ZX dual link)
- SMC EZ Card 1000 (SMC9452TXV.2)
- 
+
  The adapters support Jumbo Frames.

[kvm-devel] [RFC PATCH 6/8] KVM: Add PVBUS support to the KVM host

2007-08-13 Thread Gregory Haskins

PVBUS allows VMM agnostic PV drivers to discover/configure virtual resources

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig  |   10 +
 drivers/kvm/Makefile |3 
 drivers/kvm/kvm.h|4 
 drivers/kvm/kvm_main.c   |4 
 drivers/kvm/pvbus_host.c |  459 ++
 drivers/kvm/pvbus_host.h |   66 +++
 6 files changed, 546 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 0f81af1..cb674bb 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -52,6 +52,16 @@ config KVM_IOQ_HOST
depends on KVM
select IOQ
 
+config KVM_PVBUS_HOST
+   boolean "Paravirtualized Bus (PVBUS) host support"
+   depends on KVM
+   select KVM_IOQ_HOST
+   ---help---
+PVBUS is an infrastructure for generic PV drivers to take advantage
+of an underlying hypervisor without having to understand the details
+   of the hypervisor itself.  You only need this option if you plan to
+   run PVBUS based PV guests in KVM.
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 2095061..8926fa9 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -7,6 +7,9 @@ kvm-objs := kvm_main.o mmu.o x86_emulate.o
 ifeq ($(CONFIG_KVM_IOQ_HOST),y)
 kvm-objs += ioq_host.o
 endif
+ifeq ($(CONFIG_KVM_PVBUS_HOST),y)
+kvm-objs += pvbus_host.o
+endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index c38c84f..8dc9ac3 100755
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vmx.h"
 #include "ioq.h"
@@ -393,6 +394,9 @@ struct kvm {
 #ifdef CONFIG_KVM_IOQ_HOST
struct ioq_mgr *ioqmgr;
 #endif
+#ifdef CONFIG_KVM_PVBUS_HOST
+   struct kvm_pvbus *pvbus;
+#endif
 
 };
 
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index fbffd2f..d35ce8d 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -44,6 +44,7 @@
 
 #include "x86_emulate.h"
 #include "segment_descriptor.h"
+#include "pvbus_host.h"
 
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
@@ -350,6 +351,7 @@ static struct kvm *kvm_create_vm(void)
spin_unlock(&kvm_lock);
}
kvmhost_ioqmgr_init(kvm);
+   kvm_pvbus_init(kvm);
return kvm;
 }
 
@@ -3616,6 +3618,7 @@ static __init int kvm_init(void)
memset(__va(bad_page_address), 0, PAGE_SIZE);
 
kvmhost_ioqmgr_module_init();
+   kvm_pvbus_module_init();
 
return 0;
 
@@ -3637,6 +3640,7 @@ static __exit void kvm_exit(void)
mntput(kvmfs_mnt);
unregister_filesystem(&kvm_fs_type);
kvm_mmu_module_exit();
+   kvm_pvbus_module_exit();
 }
 
 module_init(kvm_init)
diff --git a/drivers/kvm/pvbus_host.c b/drivers/kvm/pvbus_host.c
new file mode 100644
index 000..574ca4e
--- /dev/null
+++ b/drivers/kvm/pvbus_host.c
@@ -0,0 +1,459 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus.h"
+#include "pvbus_host.h"
+#include "kvm.h"
+
+struct pvbus_map {
+   int (*compare)(const void *left, const void *right);
+   const void* (*getkey)(struct rb_node *node);
+
+   spinlock_t lock;
+   struct rb_root root;
+};
+
+struct _pv_devtype {
+   struct kvm_pv_devtype *item;
+   struct rb_node node;
+   struct list_head   devlist;
+};
+
+struct _pv_device {
+   struct kvm_pv_device  *item;
+   struct rb_node node;
+   struct list_head   list;
+   struct _pv_devtype*parent;
+};
+
+static struct pvbus_map pvbus_typemap; /* stores globally available types */
+
+struct kvm_pvbus {
+   spinlock_t   lock;
+   struct pvbus_map typemap; /* stores locally instantiated types */
+   struct pvbus_map devmap;
+};
+
+/*
+ * --
+ * generic rb map manageme

[kvm-devel] [RFC PATCH 8/8] KVM: Add an IOQNET backend driver

2007-08-13 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig   |5 
 drivers/kvm/Makefile  |2 
 drivers/kvm/ioqnet_host.c |  556 +
 3 files changed, 563 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index cb674bb..1c884c5 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -62,6 +62,11 @@ config KVM_PVBUS_HOST
of the hypervisor itself.  You only need this option if you plan to
run PVBUS based PV guests in KVM.
 
+config KVM_IOQNET
+   boolean "IOQNET host support"
+   depends on KVM
+   select KVM_PVBUS_HOST
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 8926fa9..66e5272 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -22,3 +22,5 @@ kvm-net-host-objs = kvm_net_host.o
 obj-$(CONFIG_KVM_NET_HOST) += kvm_net_host.o
 kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
 obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
+kvm-ioqnet-objs := ioqnet_host.o
+obj-$(CONFIG_KVM_IOQNET) += kvm-ioqnet.o
\ No newline at end of file
diff --git a/drivers/kvm/ioqnet_host.c b/drivers/kvm/ioqnet_host.c
new file mode 100644
index 000..aff7e5c
--- /dev/null
+++ b/drivers/kvm/ioqnet_host.c
@@ -0,0 +1,556 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * ioqnet - A paravirtualized network device based on the IOQ interface.
+ *
+ * This module represents the backend driver for an IOQNET driver on the KVM
+ * platform.
+ *
+ * Author:
+ *  Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * Derived in part from the SNULL example from the book "Linux Device
+ * Drivers" by Alessandro Rubini and Jonathan Corbet, published
+ * by O'Reilly & Associates.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include  /* printk() */
+#include  /* kmalloc() */
+#include   /* error codes */
+#include   /* size_t */
+#include  /* mark_bh */
+
+#include 
+#include/* struct device, and other headers */
+#include  /* eth_type_trans */
+#include   /* struct iphdr */
+#include  /* struct tcphdr */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvbus_host.h"
+#include "kvm.h"
+
+MODULE_AUTHOR("Gregory Haskins");
+MODULE_LICENSE("GPL");
+
+#define IOQNET_NAME "ioqnet"
+
+/*
+ * FIXME: Any "BUG_ON" code that can be triggered by a malicious guest must
+ * be turned into an inject_gp()
+ */
+
+struct ioqnet_queue {
+   struct ioq  *queue;
+   struct ioq_notifier  notifier;
+};
+
+struct ioqnet_priv {
+   spinlock_t   lock;
+   struct kvm  *kvm;
+   struct kvm_pv_device pvdev;
+   struct net_device   *netdev;
+   struct net_device_stats  stats;
+   struct ioqnet_queue  rxq;
+   struct ioqnet_queue  txq;
+   struct tasklet_struct    txtask;
+   int  connected;
+   int  opened;
+};
+
+#undef PDEBUG /* undef it, just in case */
+#ifdef IOQNET_DEBUG
+#  define PDEBUG(fmt, args...) printk( KERN_DEBUG "ioqnet: " fmt, ## args)
+#else
+#  define PDEBUG(fmt, args...) /* not debugging: nothing */
+#endif
+
+/*
+ * Enable and disable receive interrupts.
+ */
+static void ioqnet_rx_ints(struct net_device *dev, int enable)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   struct ioq *ioq = priv->rxq.queue;
+
+   if (priv->connected) {
+   if (enable)
+   ioq_start(ioq, 0);
+   else
+   ioq_stop(ioq, 0);
+   }
+}
+
+/*
+ * Open and close
+ */
+
+int ioqnet_open(struct net_device *dev)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   
+   priv->opened = 1;
+   netif_start_queue(dev);
+   
+   return 0;
+}
+
+int ioqnet_release(struct net_device *dev)
+{
+   struct ioqnet_priv *priv = netdev_priv(dev);
+   
+   priv->opened = 0;
+   netif_stop_queue(dev);
+
+   return 0;
+}
+
+/*
+ * Configuration changes (passed on by ifconfig)
+ */

[kvm-devel] [RFC PATCH 0/8] PV device infrastructure

2007-08-13 Thread Gregory Haskins

This patch series represents the state of my current work w.r.t. PV and KVM.
It is still a work in progress and so far has only been compile tested (though
earlier versions had a test harness that ironed many bugs out).  I submit it
now as an RFC on the directions I am taking and to solicit feedback.  Note
that it is based on Dor's PV branch, not kvm.git master.  

As a brief synopsis, the series implements the following:

*) A generic shared-memory based IO subsystem called "IOQ".  This is an
 iterator based interface, with an asymmetric allocation scheme conducive to
 virtualization constraints.  It's designed to allow for "zero copy" (yeah, I
 know it's not really ;) type buffer passing between host and guest, including
 deferred pointer reaping.  It also allows for bi-directional NAPI which can
 mitigate both guest interrupts as well as host-hypercalls on active queues.   

*) A generic paravirtual-bus abstraction (called PVBUS) for
 discovery/configuration of PV devices.  This allows backend driver writers to
 simply plug into the system without mucking with ACPI, Bochs, PCI, and other
 legacy related items.  It also allows guest drivers to dynamically find,
 query, and connect to backend devices in a hypervisor and system neutral way.
 This is helpful, for instance, for s390 that from what I understand doesn't
 have PCI. 

*) An example PV network driver called "IOQNET".  Note that the guest side of
 this driver is VMM independent since it sits on top of the IOQ/PVBUS
 interfaces. 
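
To make the "bi-directional NAPI" point above a bit more concrete, here is a
rough sketch of signal mitigation on a ring.  It is NOT the IOQ API: every
name here (demo_ring, demo_ring_push, demo_ring_poll) is made up for
illustration, it models only one direction, and it is single-threaded with no
memory barriers.  The idea it shows: the producer signals once, further
signals stay masked while the consumer is busy, and the consumer re-arms
signalling only after it has drained the ring.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8   /* power of two so indices wrap with a mask */

/* Hypothetical ring layout; the real IOQ structures are not shown here. */
struct demo_ring {
	uint64_t desc[DEMO_RING_SIZE]; /* e.g. guest-physical buffer addresses */
	uint32_t head;                 /* next slot the producer fills */
	uint32_t tail;                 /* next slot the consumer reads */
	int      notify_enabled;       /* consumer wants a signal on new work */
	unsigned signals_sent;         /* stands in for an interrupt/hypercall */
};

/* Producer: queue one descriptor, signal only if signalling is armed. */
static int demo_ring_push(struct demo_ring *r, uint64_t d)
{
	assert(r);
	if (r->head - r->tail == DEMO_RING_SIZE)
		return -1;                      /* ring full */
	r->desc[r->head & (DEMO_RING_SIZE - 1)] = d;
	r->head++;
	if (r->notify_enabled) {
		r->signals_sent++;
		/* Model the consumer's handler masking further signals
		 * until it has polled the ring. */
		r->notify_enabled = 0;
	}
	return 0;
}

/* Consumer: drain up to max entries, re-arm signalling once empty. */
static unsigned demo_ring_poll(struct demo_ring *r, uint64_t *out, unsigned max)
{
	unsigned n = 0;

	assert(r && out);
	r->notify_enabled = 0;
	while (n < max && r->tail != r->head) {
		out[n++] = r->desc[r->tail & (DEMO_RING_SIZE - 1)];
		r->tail++;
	}
	if (r->tail == r->head)
		r->notify_enabled = 1;          /* idle again: ask for a signal */
	return n;
}
```

With three pushes in a row on an armed ring only the first one signals, which
is the kind of saving an active queue should see in both directions.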

I started this work a while back before I knew about Rusty's VirtIO proposals.
I had recently contemplated distilling the original interface into an
implementation of virtio, but there were some key differences in approach
taken.  Rather than just throw those differences away, I figured I would
submit it "as is" to foster discussion.  I would ideally like to see a
convergence between IOQ and VirtIO with Rusty's blessing.  So Rusty: If you
see anything that I did that looks interesting to you, I would like to work
with you on bringing some of those ideas into a VirtIO implementation for
KVM. 

I would like to thank all of you out there who guided me along the way.  In
particular, Hollis, Anthony, Dor and Rusty were most helpful in helping me to
understand how to go about building something like this. 

-Greg


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

[kvm-devel] [RFC PATCH 3/8] KVM: Add a guest side driver for IOQ

2007-08-13 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig   |   27 +-
 drivers/kvm/Makefile  |3 -
 drivers/kvm/ioq.h |   39 +
 drivers/kvm/ioq_guest.c   |  190 +
 drivers/kvm/pvbus.h   |   41 ++
 drivers/kvm/pvbus_guest.c |  189 +
 include/linux/kvm.h   |4 +
 7 files changed, 485 insertions(+), 8 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 22d0eb4..cba03d2 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,16 +47,31 @@ config KVM_BALLOON
  The driver inflate/deflate guest physical memory on demand.
  This ability provides memory over commit for the host
 
-config KVM_NET
-   tristate "Para virtual network device"
-   depends on KVM
-   ---help---
- Provides support for guest paravirtualization networking
-
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
---help---
  Provides support for host paravirtualization networking
 
+config KVM_GUEST
+   bool "KVM Guest support"
+   depends on X86
+   default y
+
+config KVM_PVBUS_GUEST
+   tristate "Paravirtualized Bus (PVBUS) support"
+   depends on KVM_GUEST
+   select IOQ
+   ---help---
+PVBUS is an infrastructure for generic PV drivers to take advantage
+of an underlying hypervisor without having to understand the details
+   of the hypervisor itself.  You only need this option if you plan to
+   run this kernel as a KVM guest.
+
+config KVM_NET
+   tristate "Para virtual network device"
+   depends on KVM && KVM_GUEST
+   ---help---
+ Provides support for guest paravirtualization networking
+
 endif # VIRTUALIZATION
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index 92600d8..c6a59bb 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -14,4 +14,5 @@ kvm-net-objs = kvm_net.o
 obj-$(CONFIG_KVM_NET) += kvm-net.o
 kvm-net-host-objs = kvm_net_host.o
 obj-$(CONFIG_KVM_NET_HOST) += kvm_net_host.o
-
+kvm-pvbus-objs := ioq_guest.o pvbus_guest.o
+obj-$(CONFIG_KVM_PVBUS_GUEST) += kvm-pvbus.o
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
new file mode 100644
index 000..7e955f1
--- /dev/null
+++ b/drivers/kvm/ioq.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _KVM_IOQ_H_
+#define _KVM_IOQ_H_
+
+#include 
+
+#define IOQHC_REGISTER   1
+#define IOQHC_UNREGISTER  2
+#define IOQHC_SIGNAL 3
+
+struct ioq_register {
+   ioq_id_t id;
+   u32  irq;
+   u64  ring;
+};
+
+
+#endif /* _KVM_IOQ_H_ */
diff --git a/drivers/kvm/ioq_guest.c b/drivers/kvm/ioq_guest.c
new file mode 100644
index 000..d608407
--- /dev/null
+++ b/drivers/kvm/ioq_guest.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmguest_ioq {
+   struct ioq ioq;
+   int   irq;
+};
+
+struct kvmguest_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmguest_ioq, ioq);
+}
+
+static int ioq_hypercall(unsigned long nr, void *data)
+{
+   return hyperca

[kvm-devel] [RFC PATCH 5/8] KVM: Add support for IOQ

2007-08-13 Thread Gregory Haskins

IOQ is a shared-memory-queue interface for implementing PV driver
communication.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/Kconfig|5 +
 drivers/kvm/Makefile   |3 
 drivers/kvm/ioq.h  |   12 +-
 drivers/kvm/ioq_host.c |  365 
 drivers/kvm/kvm.h  |5 +
 drivers/kvm/kvm_main.c |3 
 include/linux/kvm.h|1 
 7 files changed, 393 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index cba03d2..0f81af1 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,6 +47,11 @@ config KVM_BALLOON
  The driver inflate/deflate guest physical memory on demand.
  This ability provides memory over commit for the host
 
+config KVM_IOQ_HOST
+   boolean "Add IOQ support to KVM"
+   depends on KVM
+   select IOQ
+
 config KVM_NET_HOST
tristate "Para virtual network host device"
depends on KVM
diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c6a59bb..2095061 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -4,6 +4,9 @@
 EXTRA_CFLAGS :=
 
 kvm-objs := kvm_main.o mmu.o x86_emulate.o
+ifeq ($(CONFIG_KVM_IOQ_HOST),y)
+kvm-objs += ioq_host.o
+endif
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/ioq.h b/drivers/kvm/ioq.h
index 7e955f1..b942113 100644
--- a/drivers/kvm/ioq.h
+++ b/drivers/kvm/ioq.h
@@ -25,7 +25,17 @@
 
 #include 
 
-#define IOQHC_REGISTER   1
+struct kvm;
+
+#ifdef CONFIG_KVM_IOQ_HOST
+int kvmhost_ioqmgr_init(struct kvm *kvm);
+int kvmhost_ioqmgr_module_init(void);
+#else
+#define kvmhost_ioqmgr_init(kvm) {}
+#define kvmhost_ioqmgr_module_init() {}
+#endif
+
+#define IOQHC_REGISTER1
 #define IOQHC_UNREGISTER  2
 #define IOQHC_SIGNAL 3
 
diff --git a/drivers/kvm/ioq_host.c b/drivers/kvm/ioq_host.c
new file mode 100644
index 000..413f103
--- /dev/null
+++ b/drivers/kvm/ioq_host.c
@@ -0,0 +1,365 @@
+/*
+ * Copyright 2007 Novell.  All Rights Reserved.
+ *
+ * See include/linux/ioq.h for documentation
+ *
+ * Author:
+ * Gregory Haskins <[EMAIL PROTECTED]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "ioq.h"
+#include "kvm.h"
+
+struct kvmhost_ioq {
+   struct ioq        ioq;
+   struct rb_node    node;
+   atomic_t  refcnt;
+   struct kvm_vcpu  *vcpu;
+   int   irq;
+};
+
+struct kvmhost_map {
+   spinlock_t lock;
+   struct rb_root root;
+};
+
+struct kvmhost_ioq_mgr {
+   struct ioq_mgr  mgr;
+   struct kvm *kvm;
+   struct kvmhost_map  map;
+};
+
+struct kvmhost_ioq* to_ioq(struct ioq *ioq)
+{
+   return container_of(ioq, struct kvmhost_ioq, ioq);
+}
+
+struct kvmhost_ioq_mgr* to_mgr(struct ioq_mgr *mgr)
+{
+   return container_of(mgr, struct kvmhost_ioq_mgr, mgr);
+}
+
+/*
+ * --
+ * rb map management
+ * --
+ */
+
+static void kvmhost_map_init(struct kvmhost_map *map)
+{
+   spin_lock_init(&map->lock);
+   map->root = RB_ROOT;
+}
+
+static int kvmhost_map_register(struct kvmhost_map *map,
+   struct kvmhost_ioq *ioq)
+{
+   int ret = 0;
+   struct rb_root *root;
+   struct rb_node **new, *parent = NULL;
+
+   spin_lock(&map->lock);
+
+   root = &map->root;
+   new  = &(root->rb_node);
+
+   /* Figure out where to put new node */
+   while (*new) {
+   struct kvmhost_ioq *this;
+
+   this   = container_of(*new, struct kvmhost_ioq, node);
+   parent = *new;
+
+   if (ioq->ioq.id < this->ioq.id)
+   new = &((*new)->rb_left);
+   else if (ioq->ioq.id > this->ioq.id)
+   new = &((*new)->rb_right);
+   else {
+   ret = -EEXIST;
+   break;
+   }
+   }
+
+   if (!ret) {
+   /* Add new node and rebalance tree. */
+   rb_link_node(&ioq->node, parent, new);
+   rb_insert_color(&ioq->node, r

Re: [kvm-devel] add back pending timer irqs for kernel APIC timer

2007-08-13 Thread Gregory Haskins

On Mon, 2007-08-13 at 11:50 +0300, Avi Kivity wrote:

> > This patch keep track of the pending irqs and inject them back to guest
> > eventually
> > even the guest may be descheduled. This is also what we did in Xen.
> > BTW, This policy will also be applied to future kernel PIT, I just do it
> > step by step.
> >
> >   
> 
> I see.  We have something like that in userspace (called the 
> time-drift-fix, or tdf).  Will look at the patch now.
> 
> 

Note that my old lapic branch did a similar "tdf" thing as well and it
worked quite nicely, so I think Eddie is on the right track with adding
this.


Re: [kvm-devel] help with understanding GPA/GVA issues

2007-08-10 Thread Gregory Haskins

On Fri, 2007-08-10 at 19:59 +0300, Avi Kivity wrote:

> Note that passing a virtual address is highly discouraged as its meaning 
> can change from vcpu to vcpu, it might not be mapped, translation is 
> slow, etc.  Just let the guest do the translation.

Yeah, Hollis and Anthony straightened me out via IRC.  I will mostly be
dealing with kmalloc/skb buffers so we settled on the following as
optimal:

guest-side:

gpa = __pa(ptr);

host-side:

gfn  = gpa >> PAGE_SHIFT;
page = gfn_to_page(gfn);
ptr = kmap(page);

..

kunmap(page);
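
For reference, the arithmetic behind that recipe can be sketched in plain C.
This is only an illustration: it assumes 4 KiB pages, and the DEMO_ names are
invented here rather than taken from the kernel.  Note that the host-side
mapping above yields the base of the page, so presumably the in-page offset
still has to be added to reach the actual buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages, as on x86; real code would use the kernel's own
 * PAGE_SHIFT/PAGE_SIZE.  The DEMO_ prefix marks these as illustrative. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/* Guest frame number: which guest page the address lives in. */
static uint64_t demo_gpa_to_gfn(uint64_t gpa)
{
	return gpa >> DEMO_PAGE_SHIFT;
}

/* Byte offset of the address inside its page. */
static uint64_t demo_gpa_offset(uint64_t gpa)
{
	return gpa & (DEMO_PAGE_SIZE - 1);
}

/* Recombine a frame number and an in-page offset into an address. */
static uint64_t demo_gfn_to_gpa(uint64_t gfn, uint64_t offset)
{
	assert(offset < DEMO_PAGE_SIZE);
	return (gfn << DEMO_PAGE_SHIFT) | offset;
}
```

So a gpa of 0x12345678 splits into gfn 0x12345 and offset 0x678, and the host
would add 0x678 to whatever address the page mapping returns.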

> 
> We have something running at qumranet, will be sent out soon.  I am 
> somewhat discouraged in trying to get the thing to page -- Shaohua's 
> approach is much simpler.
> 

Cool!  I'll keep an eye out.

-Greg


Re: [kvm-devel] help with understanding GPA/GVA issues

2007-08-10 Thread Gregory Haskins

On Fri, 2007-08-10 at 09:32 -0500, Anthony Liguori wrote:
> Gregory Haskins wrote:
> > Hi All,
> >   I am working on some PV stuff and had some questions about the ability
> > to share memory across the Guest/Host boundary.
> >
> > It seems that most examples of how to do this always involve starting
> > with a *page, converting it to a gfn via page_to_gfn(), and using that
> > as a gpa to pass across the boundary.
> >   
> 
> Do you mean page_to_pfn?  I assume you're talking about a page within 
> the guest right?

Er, ya.  Sorry...up too late last night ;)

> 
> > I understand that this method avoids a software traipse through the
> > page-walker, so it's nice.  What I can't quite figure out is what are
> > the other types of memory (if any) that can be passed across.
> >
> > For instance, is a pointer from kmalloc() considered a gpa, a gva,
> > neither?
> 
> gpa = guest physical address.  It's a pa or a pfn << PAGE_SHIFT.
> 
> gva = guest virtual address.  It's returned from pretty much anything 
> that allocates memory (kmalloc for instance).  This is all within the 
> guest of course.
> 
> >   Or are gva's only pointers that come from guest-userspace,
> > etc.  Is it possible to pass something like a skb->data pointer (I
> > understand that I may have to run the page-walker for some of these)?
> >   
> 
> Yes, you can pass through any gva.  There are a couple of things to be 
> aware of though.  When passing a gva, you have to be sure that the gva 
> is actually mapped in memory as KVM cannot cause Linux to fault in a 
> page.

Cool!  Ya, I totally understand and agree that it has to be "DMA" class
memory that is mapped and pinned in guest context.  IIUC you basically
need to follow the same rules as you would for a DMA based device in a
bare-metal scenario.

>   Also, for something like skb->data, you should probably just pass 
> the gpa since they'll usually fall within a single page anyway.

Hmm...good point.
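
A quick way to check the "falls within a single page" assumption for a given
buffer might look like the sketch below.  It assumes 4 KiB pages, and
demo_fits_in_one_page is a made-up helper name, not an existing kernel
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/* Returns 1 if the len-byte buffer starting at addr stays inside one page,
 * 0 if it crosses a page boundary. */
static int demo_fits_in_one_page(uint64_t addr, size_t len)
{
	assert(len > 0);
	uint64_t first = addr >> DEMO_PAGE_SHIFT;
	uint64_t last  = (addr + len - 1) >> DEMO_PAGE_SHIFT;
	return first == last;
}
```

A buffer that fails this check would need more than one gpa passed across,
since guest-physical pages that are adjacent in the guest need not be adjacent
on the host.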

> 
> > If so, how would I do this:  E.g. can I just pass the pointer, and then
> > do gva_to_hpa() on the host?  Or do I need to prep the pointer before
> > sending it?
> >   
> 
> There's no need to prep provided that you know the va is mapped into 
> memory in the guest.
> 

Thanks Anthony!  Very helpful indeed and I appreciate the explanation.

On this topic:  I know there has been talk going on of giving each VM
its own linux va context.  IIUC, when that happens we wouldn't need the
gva_to_hpa type functions, right?  We could use things like
copy_to_user(), etc?  Out of curiosity, what's the status of that
project?

-Greg


[kvm-devel] help with understanding GPA/GVA issues

2007-08-09 Thread Gregory Haskins

Hi All,
  I am working on some PV stuff and had some questions about the ability
to share memory across the Guest/Host boundary.

It seems that most examples of how to do this always involve starting
with a *page, converting it to a gfn via page_to_gfn(), and using that
as a gpa to pass across the boundary.

I understand that this method avoids a software traipse through the
page-walker, so it's nice.  What I can't quite figure out is what are
the other types of memory (if any) that can be passed across.

For instance, is a pointer from kmalloc() considered a gpa, a gva,
neither?  Or are gva's only pointers that come from guest-userspace,
etc.  Is it possible to pass something like a skb->data pointer (I
understand that I may have to run the page-walker for some of these)?
If so, how would I do this:  E.g. can I just pass the pointer, and then
do gva_to_hpa() on the host?  Or do I need to prep the pointer before
sending it?

TIA
-Greg



Re: [kvm-devel] KVM and RT

2007-08-01 Thread Gregory Haskins

On Wed, 2007-08-01 at 09:56 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > Hi Team,
> >   I don't know if anyone here also subscribes to linux-rt-users, but it
> > seems as though Ingo et al. rejected my modifications which ran the
> > smp_call() in a thread (VFCIPI).  
> 
> It's not surprising.  650 lines including a custom memory allocator is
> excessive.

Well, as a tactical solution I definitely agree.  As you know, I was
going for a more broadly applicable feature going way beyond KVM.  Of
those 650 lines, a good chunk would fall away if I incorporated some of
the feedback (plist instead of custom prio_array, convert to workqueue).
And the last 150 lines are a custom allocator to work around the
regression of GFP_ATOMIC on PREEMPT_RT.  But I digress... some of the
feedback was that I was "wrong and misguided" or something like
that...ouch.  Back to the drawing board. ;)

> 
> > So FYI: KVM is still broken on RT and
> > needs to be addressed.
> >
> > In a nutshell, kvm_lock cannot be used as it is today.  It either needs
> > to be a raw_spinlock_t, or the locking needs to be done differently.
> > The code currently blows up when you shut down a VM running on top of
> > PREEMPT_RT.  Just thought you might want to know.
> >   
> 
> What about hoisting the lock outside the IPI as I suggested earlier?

I think your proposal should work fine as long as you are not atomic
when you take the lock wherever it's moved to.

-Greg


[kvm-devel] KVM and RT

2007-07-31 Thread Gregory Haskins

Hi Team,
  I don't know if anyone here also subscribes to linux-rt-users, but it
seems as though Ingo et al. rejected my modifications which ran the
smp_call() in a thread (VFCIPI).  So FYI: KVM is still broken on RT and
needs to be addressed.

In a nutshell, kvm_lock cannot be used as it is today.  It either needs
to be a raw_spinlock_t, or the locking needs to be done differently.
The code currently blows up when you shut down a VM running on top of
PREEMPT_RT.  Just thought you might want to know.

Regards,
-Greg



Re: [kvm-devel] [PATCH 2/2] KVM: Protect race-condition between VMCS and current_vmcs on VMX hardware

2007-07-31 Thread Gregory Haskins

On Tue, 2007-07-31 at 17:18 +0800, Dong, Eddie wrote:

> 
> I may miss something, why does that matter?

As it turns out, it doesn't ;)  So we have dropped the patch.  But not
for the reason you are suggesting.

>  __vcpu_clear will eventually
> get executed though it is a little bit delayed. vmclear will eventually
> dump 
> internal state of VM-a VMCS to memory and VM-b get its own VMCS 
> loaded.  Here the point is vmclear has a parameter to identify which
> VM's VMCS to dump, not only a memory address. Jun, please correct me if
> I am wrong.

The race is against per_cpu(current_vmcs), not the actual VMCS.
However, Avi pointed out that the race is benign so the race doesn't
matter.

-Greg


[kvm-devel] [PATCH][RFC] RT: Preemptible Function-Call-IPI Support

2007-07-30 Thread Gregory Haskins

This patch is an RFC for the "Threaded IPI" idea I was talking about last
week on the linux-rt/kvm list.  It builds and boots fine for me.  However,
note the following:

1) Currently only x86_64 has been converted.  After I get more feedback I will
   convert the other relevant architectures as well
2) The priority-inheritance logic is non-functioning.  All FUNCTION_CALL IPIs
   are directed at a task with normal priority.  Eventually the task will
   inherit the priority of the highest waiter.

I have confirmed that KVM now shuts down cleanly in RT without modifying the KVM
code.

Comments/Suggestions/Bug-reports welcome!

Thanks!
-Greg



This code allows FUNCTION_CALL IPIs to become preemptible by executing
them in kthread context instead of interrupt context.  They are referred
to as "Virtual Function Call IPIs" (VFCIPI) because we no longer rely
on the actual FCIPI facility.  Instead we schedule a thread to run.  This
essentially replaces the synchronous FCIPI with an async RESCHEDULE IPI.

Since the function will be executed in kthread context, it is fully
sleepable and preemptible, thus providing more determinism.  It also allows
code that was written to expect spin_locks to work properly, even though
they may have converted to rt_mutex under the hood.  In summary, this
subsystem does for FCIPI interrupts what PREEMPT_HARDIRQs does for normal
interrupts.
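
As a rough illustration of the queue-then-run-later structure described above
(not the code in this patch), the sketch below models one target CPU in a
single thread: the sender queues the call and sets a flag standing in for the
RESCHEDULE IPI, and a separate function plays the role of the kthread that
later runs the queued calls.  All demo_ names are hypothetical, and there are
no real threads, IPIs, or priorities here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CALLS 16

struct demo_call {
	void (*func)(void *info);
	void *info;
};

/* Pending function calls for one (modelled) target CPU. */
struct demo_cpu {
	struct demo_call pending[DEMO_MAX_CALLS];
	size_t count;
	int need_resched;   /* stands in for the RESCHEDULE IPI */
};

/* Sender side: rather than running func in interrupt context on the
 * target, queue it and kick the target's worker. */
static int demo_vfcipi_send(struct demo_cpu *cpu, void (*func)(void *),
			    void *info)
{
	assert(cpu && func);
	if (cpu->count == DEMO_MAX_CALLS)
		return -1;              /* queue full */
	cpu->pending[cpu->count].func = func;
	cpu->pending[cpu->count].info = info;
	cpu->count++;
	cpu->need_resched = 1;
	return 0;
}

/* Worker body: in the real design this would run in kthread context,
 * where func is allowed to sleep.  Returns the number of calls run. */
static size_t demo_vfcipi_thread(struct demo_cpu *cpu)
{
	size_t i, ran = cpu->count;

	for (i = 0; i < cpu->count; i++)
		cpu->pending[i].func(cpu->pending[i].info);
	cpu->count = 0;
	cpu->need_resched = 0;
	return ran;
}

/* Example callback: bump a counter passed through info. */
static void demo_increment(void *info)
{
	(*(int *)info)++;
}
```

Nothing runs at send time; the callbacks only execute once the worker gets
the CPU, which is what makes them preemptible.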

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/smp.c |   18 +-
 include/linux/smp.h  |   25 ++-
 include/linux/vfcipi.h   |   10 +
 init/main.c  |3 
 kernel/Kconfig.preempt   |   12 +
 kernel/Makefile  |1 
 kernel/vfcipi/Makefile   |4 
 kernel/vfcipi/heap.c |  136 ++
 kernel/vfcipi/heap.h |   20 ++
 kernel/vfcipi/thread.c   |  445 ++
 10 files changed, 662 insertions(+), 12 deletions(-)

diff --git a/arch/x86_64/kernel/smp.c b/arch/x86_64/kernel/smp.c
index 8cf7a0d..71fbe2f 100644
--- a/arch/x86_64/kernel/smp.c
+++ b/arch/x86_64/kernel/smp.c
@@ -367,7 +367,7 @@ __smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 }
 
 /*
- * smp_call_function_single - Run a function on another CPU
+ * smp_call_function_single__nodelay - Run a function on another CPU
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
  * @nonatomic: Currently unused.
@@ -378,9 +378,9 @@ __smp_call_function_single(int cpu, void (*func) (void *info), void *info,
  * Does not return until the remote CPU is nearly ready to execute 
  * or is or has executed.
  */
-
-int smp_call_function_single (int cpu, void (*func) (void *info), void *info,
-   int nonatomic, int wait)
+int
+smp_call_function_single__nodelay (int cpu, void (*func) (void *info),
+ void *info, int nonatomic, int wait)
 {
/* prevent preemption and reschedule on another processor */
int me = get_cpu();
@@ -398,7 +398,7 @@ int smp_call_function_single (int cpu, void (*func) (void *info), void *info,
put_cpu();
return 0;
 }
-EXPORT_SYMBOL(smp_call_function_single);
+EXPORT_SYMBOL(smp_call_function_single__nodelay);
 
 /*
  * this function sends a 'generic call function' IPI to all other CPUs
@@ -437,7 +437,7 @@ static void __smp_call_function (void (*func) (void *info), void *info,
 }
 
 /*
- * smp_call_function - run a function on all other CPUs.
+ * smp_call_function__nodelay - run a function on all other CPUs.
  * @func: The function to run. This must be fast and non-blocking.
  * @info: An arbitrary pointer to pass to the function.
  * @nonatomic: currently unused.
@@ -451,15 +451,15 @@ static void __smp_call_function (void (*func) (void *info), void *info,
  * hardware interrupt handler or from a bottom half handler.
  * Actually there are a few legal cases, like panic.
  */
-int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
-   int wait)
+int smp_call_function__nodelay (void (*func) (void *info), void *info,
+   int nonatomic, int wait)
 {
spin_lock(&call_lock);
__smp_call_function(func,info,nonatomic,wait);
spin_unlock(&call_lock);
return 0;
 }
-EXPORT_SYMBOL(smp_call_function);
+EXPORT_SYMBOL(smp_call_function__nodelay);
 
 static void stop_this_cpu(void *dummy)
 {
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 442f87b..5017a97 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -62,10 +62,29 @@ extern void smp_cpus_done(unsigned int max_cpus);
 /*
  * Call a function on all other processors
  */
-int smp_call_function(void(*func)(void *info), void *info, int retry, int wait);
 
-int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
-

Re: [kvm-devel] [PATCH 2/2] KVM: Clean up VMCLEAR/VMPTRLD code on VMX

2007-07-27 Thread Gregory Haskins

On Fri, 2007-07-27 at 17:09 -0700, Nakajima, Jun wrote:
> Gregory Haskins wrote:
> > Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
> > ---
> > 
> >  drivers/kvm/vmx.c |   71 +++--
> >  1 files changed, 58 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
> > index 96837d6..86f1eea 100644
> > --- a/drivers/kvm/vmx.c
> > +++ b/drivers/kvm/vmx.c
> > @@ -191,6 +191,20 @@ static struct kvm_msr_entry *find_msr_entry(struct kvm_vcpu *vcpu, u32 msr)
> >     return NULL;
> >  }
> > 
> > +static void vmcs_load(struct vmcs *vmcs)
> > +{
> > +   u64 phys_addr = __pa(vmcs);
> > +   u8 error;
> > +
> > +   asm volatile (ASM_VMX_VMPTRLD_RAX "; setna %0"
> > + : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
> > + : "cc");
> > +
> > +   if (error)
> > +   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
> > +  vmcs, phys_addr);
> > +}
> 
> I don't believe this instruction fails under normal conditions, but we
> should terminate the guest cleanly in such cases, rather than just doing
> printk().

Note that this was just a verbatim move of existing code.  I don't
disagree with your assessment...it's just that this is not the right place
for a change like that.  If you want to submit a further patch, that
would certainly be appreciated.

> 
> Jun
> ---
> Intel Open Source Technology Center


[kvm-devel] [PATCH 2/2] KVM: Clean up VMCLEAR/VMPTRLD code on VMX

2007-07-27 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/vmx.c |   71 +++--
 1 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 96837d6..86f1eea 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -191,6 +191,20 @@ static struct kvm_msr_entry *find_msr_entry(struct kvm_vcpu *vcpu, u32 msr)
return NULL;
 }
 
+static void vmcs_load(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(vmcs);
+   u8 error;
+   
+   asm volatile (ASM_VMX_VMPTRLD_RAX "; setna %0"
+ : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
+ : "cc");
+
+   if (error)
+   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+  vmcs, phys_addr);
+}
+
 static void vmcs_clear(struct vmcs *vmcs)
 {
u64 phys_addr = __pa(vmcs);
@@ -210,10 +224,34 @@ static void __vcpu_clear(void *arg)
struct vcpu_vmx *vmx = to_vmx(vcpu);
int cpu = raw_smp_processor_id();
 
-   if (vcpu->cpu == cpu)
+   if (vcpu->cpu != -1) {
+   /*
+* We should *never* try to __vcpu_clear a remote VMCS. This
+* would have been addressed at a higher layer already
+*/
+   BUG_ON(vcpu->cpu != cpu);
+
+   /*
+* Execute the VMCLEAR operation regardless of whether the 
+* VMCS is currently active on this CPU or not (it doesn't
+* necessarily have to be)
+*/
vmcs_clear(vmx->vmcs);
-   if (per_cpu(current_vmcs, cpu) == vmx->vmcs)
-   per_cpu(current_vmcs, cpu) = NULL;
+
+   /*
+* And finally, if this VMCS *was* currently active on this
+* CPU, mark the CPU as available again
+*/
+   if (per_cpu(current_vmcs, cpu) == vmx->vmcs)
+   per_cpu(current_vmcs, cpu) = NULL;
+   } else
+   /*
+* If vcpu->cpu thinks we are not installed anywhere,
+* but this CPU thinks we are currently active, something is
+* wacked.
+*/
+   BUG_ON(per_cpu(current_vmcs, cpu) == vmx->vmcs);
+
rdtscll(vcpu->host_tsc);
 }
 
@@ -223,7 +261,9 @@ static void vcpu_clear(struct kvm_vcpu *vcpu)
smp_call_function_single(vcpu->cpu, __vcpu_clear, vcpu, 0, 1);
else
__vcpu_clear(vcpu);
+
to_vmx(vcpu)->launched = 0;
+   vcpu->cpu   = -1;
 }
 
 static unsigned long vmcs_readl(unsigned long field)
@@ -431,7 +471,6 @@ static void vmx_load_host_state(struct kvm_vcpu *vcpu)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   u64 phys_addr = __pa(vmx->vmcs);
int cpu;
u64 tsc_this, delta;
 
@@ -440,16 +479,22 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu)
if (vcpu->cpu != cpu)
vcpu_clear(vcpu);
 
-   if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-   u8 error;
-
+   /*
+* By the time we get here, we know that either our VMCS is already
+* loaded on the current CPU (from previous runs), or that it's not
+* loaded *anywhere* in the system at all (due to the vcpu_clear()
+* operation above).  Either way, we must check to make sure we are
+* the currently loaded pointer, and correct it if we are not.
+*
+* Note: A race condition exists against current_vmcs between the
+* following update, and any IPIs dispatched to clear a different
+* VMCS.  Currently, this race condition is believed to be benign,
+* but tread carefully.
+*/
+   if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
+   /* Re-establish ourselves as the current VMCS */
+   vmcs_load(vmx->vmcs);
per_cpu(current_vmcs, cpu) = vmx->vmcs;
-   asm volatile (ASM_VMX_VMPTRLD_RAX "; setna %0"
- : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
- : "cc");
-   if (error)
-   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
-  vmx->vmcs, phys_addr);
}
 
if (vcpu->cpu != cpu) {



[kvm-devel] [PATCH 1/2] KVM: Remove arch specific components from the general code

2007-07-27 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h  |   31 
 drivers/kvm/kvm_main.c |   26 +--
 drivers/kvm/kvm_svm.h  |3 
 drivers/kvm/svm.c  |  394 
 drivers/kvm/vmx.c  |  249 +++---
 5 files changed, 397 insertions(+), 306 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index fc27c2f..6cbf087 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -15,7 +15,6 @@
 #include 
 #include 
 
-#include "vmx.h"
 #include 
 #include 
 
@@ -140,14 +139,6 @@ struct kvm_mmu_page {
};
 };
 
-struct vmcs {
-   u32 revision_id;
-   u32 abort;
-   char data[0];
-};
-
-#define vmx_msr_entry kvm_msr_entry
-
 struct kvm_vcpu;
 
 /*
@@ -309,15 +300,12 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
 struct kvm_io_device *dev);
 
 struct kvm_vcpu {
+   int valid;
struct kvm *kvm;
int vcpu_id;
-   union {
-   struct vmcs *vmcs;
-   struct vcpu_svm *svm;
-   };
+   void *_priv;
struct mutex mutex;
int   cpu;
-   int   launched;
u64 host_tsc;
struct kvm_run *run;
int interrupt_window_open;
@@ -340,14 +328,6 @@ struct kvm_vcpu {
u64 shadow_efer;
u64 apic_base;
u64 ia32_misc_enable_msr;
-   int nmsrs;
-   int save_nmsrs;
-   int msr_offset_efer;
-#ifdef CONFIG_X86_64
-   int msr_offset_kernel_gs_base;
-#endif
-   struct vmx_msr_entry *guest_msrs;
-   struct vmx_msr_entry *host_msrs;
 
struct kvm_mmu mmu;
 
@@ -366,11 +346,6 @@ struct kvm_vcpu {
char *guest_fx_image;
int fpu_active;
int guest_fpu_loaded;
-   struct vmx_host_state {
-   int loaded;
-   u16 fs_sel, gs_sel, ldt_sel;
-   int fs_gs_ldt_reload_needed;
-   } vmx_host_state;
 
int mmio_needed;
int mmio_read_completed;
@@ -579,8 +554,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 
 void fx_init(struct kvm_vcpu *vcpu);
 
-void load_msrs(struct vmx_msr_entry *e, int n);
-void save_msrs(struct vmx_msr_entry *e, int n);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index bc11c2d..9cc16b8 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -367,7 +367,7 @@ static void free_pio_guest_pages(struct kvm_vcpu *vcpu)
 
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
-   if (!vcpu->vmcs)
+   if (!vcpu->valid)
return;
 
vcpu_load(vcpu);
@@ -377,7 +377,7 @@ static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 
 static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
 {
-   if (!vcpu->vmcs)
+   if (!vcpu->valid)
return;
 
vcpu_load(vcpu);
@@ -1646,24 +1646,6 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
-void load_msrs(struct vmx_msr_entry *e, int n)
-{
-   int i;
-
-   for (i = 0; i < n; ++i)
-   wrmsrl(e[i].index, e[i].data);
-}
-EXPORT_SYMBOL_GPL(load_msrs);
-
-void save_msrs(struct vmx_msr_entry *e, int n)
-{
-   int i;
-
-   for (i = 0; i < n; ++i)
-   rdmsrl(e[i].index, e[i].data);
-}
-EXPORT_SYMBOL_GPL(save_msrs);
-
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
int i;
@@ -2402,7 +2384,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 
mutex_lock(&vcpu->mutex);
 
-   if (vcpu->vmcs) {
+   if (vcpu->valid) {
mutex_unlock(&vcpu->mutex);
return -EEXIST;
}
@@ -2450,6 +2432,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
kvm->nvcpus = n + 1;
spin_unlock(&kvm_lock);
 
+   vcpu->valid = 1;
+
return r;
 
 out_free_vcpus:
diff --git a/drivers/kvm/kvm_svm.h b/drivers/kvm/kvm_svm.h
index a869983..82e5d77 100644
--- a/drivers/kvm/kvm_svm.h
+++ b/drivers/kvm/kvm_svm.h
@@ -20,7 +20,10 @@ static const u32 host_save_user_msrs[] = {
 #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
 #define NUM_DB_REGS 4
 
+struct kvm_vcpu;
+
 struct vcpu_svm {
+   struct kvm_vcpu *vcpu;
struct vmcb *vmcb;
unsigned long vmcb_pa;
struct svm_cpu_data *svm_data;
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 850a1b1..3248187 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -49,6 +49,11 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_DEATURE_SVML (1 << 2)
 
+static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
+{
+   return (struct vcpu_svm*)vcpu->_priv;
+}
+
 unsigned long iopm_base;
 unsigned long msrpm_base;
 
@@ -95,7 +100,7 @@ s

[kvm-devel] [PATCH 0/2] Arch cleanup v5

2007-07-27 Thread Gregory Haskins

(Note the last "v3" I sent out was a duplicate v3...it should have been v4)

This series includes the following changes from v4 (v3):

Patch #1: I have folded Rusty's cleanup in (thanks Rusty!)
Patch #2: Rebased on Rusty's changes
Patch #3: Dropped

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>


Re: [kvm-devel] [PATCH 1/2] Rename svm() and vmx() to to_svm() and to_vmx().

2007-07-27 Thread Gregory Haskins

On Fri, 2007-07-27 at 16:53 +1000, Rusty Russell wrote:
> On Thu, 2007-07-26 at 14:45 -0400, Gregory Haskins wrote:
> > Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
> 
> OK, in anticipation that you would do it, I've done a trivial
> s/svm()/to_svm()/ and s/vmx()/to_vmx()/ patch and put my patch on top of
> it.

Thanks Rusty...I actually hadn't gotten around to it, so this is
appreciated.  I will fold it in now and resend the new patch.

> 
> I think the result is quite nice (there are some potential cleanups of
> the now-gratuitous to-and-fro conversions, but this is simple).
> Probably easiest to fold this one straight into yours and post as one
> patch.
> 
> Cheers,
> Rusty.
> 
> ==
> This goes on top of "[PATCH 1/3] KVM: Remove arch specific components from
> the general code" and changes svm() to to_svm() and vmx() to to_vmx().
> 
> It uses a tmp var where multiple calls would be needed, and fixes up
> some linewrap issues.  It can be simply folded into the previous patch.
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
> 
> diff -r b318edfbdb7d drivers/kvm/svm.c
> --- a/drivers/kvm/svm.c   Fri Jul 27 15:55:31 2007 +1000
> +++ b/drivers/kvm/svm.c   Fri Jul 27 16:09:24 2007 +1000
> @@ -49,7 +49,7 @@ MODULE_LICENSE("GPL");
>  #define SVM_FEATURE_LBRV (1 << 1)
>  #define SVM_DEATURE_SVML (1 << 2)
>  
> -static inline struct vcpu_svm* svm(struct kvm_vcpu *vcpu)
> +static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
>  {
>   return (struct vcpu_svm*)vcpu->_priv;
>  }
> @@ -100,7 +100,7 @@ static inline u32 svm_has(u32 feat)
>  
>  static unsigned get_addr_size(struct kvm_vcpu *vcpu)
>  {
> - struct vmcb_save_area *sa = &svm(vcpu)->vmcb->save;
> + struct vmcb_save_area *sa = &to_svm(vcpu)->vmcb->save;
>   u16 cs_attrib;
>  
>   if (!(sa->cr0 & X86_CR0_PE) || (sa->rflags & X86_EFLAGS_VM))
> @@ -186,7 +186,7 @@ static inline void write_dr7(unsigned lo
>  
>  static inline void force_new_asid(struct kvm_vcpu *vcpu)
>  {
> - svm(vcpu)->asid_generation--;
> + to_svm(vcpu)->asid_generation--;
>  }
>  
>  static inline void flush_guest_tlb(struct kvm_vcpu *vcpu)
> @@ -199,22 +199,24 @@ static void svm_set_efer(struct kvm_vcpu
>   if (!(efer & KVM_EFER_LMA))
>   efer &= ~KVM_EFER_LME;
>  
> - svm(vcpu)->vmcb->save.efer = efer | MSR_EFER_SVME_MASK;
> + to_svm(vcpu)->vmcb->save.efer = efer | MSR_EFER_SVME_MASK;
>   vcpu->shadow_efer = efer;
>  }
>  
>  static void svm_inject_gp(struct kvm_vcpu *vcpu, unsigned error_code)
>  {
> - svm(vcpu)->vmcb->control.event_inj =SVM_EVTINJ_VALID |
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + svm->vmcb->control.event_inj =  SVM_EVTINJ_VALID |
>   SVM_EVTINJ_VALID_ERR |
>   SVM_EVTINJ_TYPE_EXEPT |
>   GP_VECTOR;
> - svm(vcpu)->vmcb->control.event_inj_err = error_code;
> + svm->vmcb->control.event_inj_err = error_code;
>  }
>  
>  static void inject_ud(struct kvm_vcpu *vcpu)
>  {
> - svm(vcpu)->vmcb->control.event_inj =SVM_EVTINJ_VALID |
> + to_svm(vcpu)->vmcb->control.event_inj = SVM_EVTINJ_VALID |
>   SVM_EVTINJ_TYPE_EXEPT |
>   UD_VECTOR;
>  }
> @@ -233,19 +235,21 @@ static int is_external_interrupt(u32 inf
>  
>  static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  {
> - if (!svm(vcpu)->next_rip) {
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + if (!svm->next_rip) {
>   printk(KERN_DEBUG "%s: NOP\n", __FUNCTION__);
>   return;
>   }
> - if (svm(vcpu)->next_rip - svm(vcpu)->vmcb->save.rip > 15) {
> + if (svm->next_rip - svm->vmcb->save.rip > 15) {
>   printk(KERN_ERR "%s: ip 0x%llx next 0x%llx\n",
>  __FUNCTION__,
> -svm(vcpu)->vmcb->save.rip,
> -svm(vcpu)->next_rip);
> - }
> -
> - vcpu->rip = svm(vcpu)->vmcb->save.rip = svm(vcpu)->next_rip;
> - svm(vcpu)->vmcb->control.int_state &= ~SVM_INTERRUPT_SHADOW_MASK;
> +svm->vmcb->save.rip,
> +svm->next_rip);
> + }
> +
> + vcpu->rip = svm->vmcb->save.rip = svm->next_rip;
> + svm->vmcb->control.int_state &=

Re: [kvm-devel] Threaded IPIs?

2007-07-27 Thread Gregory Haskins

On Fri, 2007-07-27 at 07:56 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > Hi guys,
> >   While working with the -rt kernel, I have noticed a problem in KVM.
> > Specifically, when you stop a VM you sometimes get "sleep while
> > atomic" oopses.  It turns out that the issue is related to an
> > smp_function_call IPI that KVM does to remotely flush the VMX hardware
> > on shutdown.  The code tries to acquire the global kvm_lock (which is a
> > normal spinlock_t, of course converted to rt_mutex on -rt) from the
> > interrupt context of the IPI handler.  You know the rest of the
> > story
> >
> > The obvious solution is to convert the kvm_lock to a raw_spinlock_t.
> > However, I really don't want to do this unless we absolutely have to
> > since it will just increase latencies for a good portion of the rest of
> > KVM.
> >   
> 
> Are you talking about decache_vcpus_from_cpu() that is called from
> hardware_disable()?  

Yeah, exactly.

> Would having the caller acquire the lock solve the
> problem?  

Possibly, though I would have to look at how/where the caller executes.
If it's in_atomic, it will hit the same problem that the callee does.

That being said, I have no doubt that we can solve this particular KVM
issue much more simply than with what I am proposing.  What I am
wondering, however, is if this is an opportunity to further increase the
preemptibility of the RT kernel.  I figure, most drivers that use
hardirqs and spinlock_t continue to work transparently on PREEMPT_RT
because threaded IRQs can sleep (and be preempted), so why not FC-IPIs
too?

I'm partway done with a proof-of-concept patch.  I'll send it out later.


Re: [kvm-devel] [PATCH 3/3] KVM: Protect race-condition between VMCS and current_vmcs on VMX hardware

2007-07-27 Thread Gregory Haskins

On Fri, 2007-07-27 at 07:58 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > We need to provide locking around the current_vmcs/VMCS interactions to
> > protect against race conditions.
> >
> >   
> 
> We agreed the race was benign?  Do you no longer think so?

No, I agree (see "0/3" note).  I was just including it in case we wanted
to lock it down for posterity.  Since it adds what would amount to
unnecessary overhead, the comments provided in 2/3 are probably
sufficient.

-Greg


Re: [kvm-devel] [PATCH] KVM: Remove arch specific components from the general code

2007-07-26 Thread Gregory Haskins

On Fri, 2007-07-27 at 09:27 +1000, Rusty Russell wrote:

> So the in-kernel apic code has to traverse every element in the array?
> That is clearly better because?

I can't speak for Eddie's implementation, but the one that I had worked
on did in fact take advantage of the array.   Not for traversing it as
you suggested, but rather for efficient mapping of vcpu ID to APIC.  The
modeled "APIC BUS" already knew its targets.  It would then use
"kvm->vcpus[target]" to find the right APIC.

However, that being said, I really saw my implementation as "cheating".
In a clean design, the apic-bus model should be independent of the kvm
to vcpu mapping.  The apic-bus should never really "see" the entire VM.
It should only see itself as a bus with some interconnected APICs to
talk to.  If that were true, your decision to move the kvm->vcpu array
to a list would be inconsequential to the in-kernel APIC code.  This
extra level of modeling never came to be in my now defunct series,
however.

So in summary, I don't think moving to a list like you did is a terrible
thing, but other provisions will have to be made if you are not
providing a good mapping mechanism.  E.g. either a proper apic-bus model
will have to map processor-id to apic independently, or your kvm->vcpu
mapping will need to be adaptable to relatively efficient lookups
(hlist, maybe?).
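
To make the trade-off concrete, here is a minimal user-space sketch of the two lookup styles.  The types and names (vcpu_model, MODEL_MAX_VCPUS, lookup_array, lookup_list) are hypothetical stand-ins for illustration, not the actual KVM structures, and the ceiling of 4 is arbitrary:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_VCPUS 4 /* hypothetical static ceiling */

/* Hypothetical stand-in for a vcpu; not the real struct kvm_vcpu. */
struct vcpu_model {
	int vcpu_id;
	struct vcpu_model *next; /* used only by the list form */
};

/* Array form: the target ID is the index, so the lookup is O(1). */
static struct vcpu_model *lookup_array(struct vcpu_model *vcpus[], int target)
{
	struct vcpu_model *v;

	if (target < 0 || target >= MODEL_MAX_VCPUS)
		return NULL;
	v = vcpus[target];
	/* The array form relies on slot index == vcpu ID. */
	assert(v == NULL || v->vcpu_id == target);
	return v;
}

/* List form: walk the list until the ID matches, O(n) in the vcpu count. */
static struct vcpu_model *lookup_list(struct vcpu_model *head, int target)
{
	for (struct vcpu_model *v = head; v != NULL; v = v->next)
		if (v->vcpu_id == target)
			return v;
	return NULL;
}
```

With only a handful of vcpus the list walk is cheap, but it sits on the interrupt delivery path, which is why a mapping with efficient lookups matters to the APIC code.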

>   We get to place an artificial maximum
> and keep a ceiling variable like the existing code does?

This always bothered me too.  But I guess realistically some reasonable
ceiling could probably be found (something in the 8-64 range sounds
right to me) if the dynamic list idea is shot down.  If we do go this
static route, it should probably be in the Kconfig.

> 
> > > + spin_lock(&kvm->lock);
> > > + /* What do we care if they create duplicate CPU ids?  But be nice. */
> > >   
> > 
> > Oh, we care.  Esp. the apic code.
> 
> Yeah, I left the check although it's currently unneeded, but I still
> don't think we should care.  We shouldn't use the cpu_id internally, but
> use pointers.  Sure, the guest might get confused, but that's not the kvm
> module's problem.

As Avi said, only the APIC (or more specifically, the apic-bus model)
should care.  But this is _very_ important.  Interrupt messages have
destination addresses expressed in terms of these IDs ;)

Regards,
-Greg


[kvm-devel] Threaded IPIs?

2007-07-26 Thread Gregory Haskins

Hi guys,
  While working with the -rt kernel, I have noticed a problem in KVM.
Specifically, when you stop a VM you sometimes get "sleep while
atomic" oopses.  It turns out that the issue is related to an
smp_function_call IPI that KVM does to remotely flush the VMX hardware
on shutdown.  The code tries to acquire the global kvm_lock (which is a
normal spinlock_t, of course converted to rt_mutex on -rt) from the
interrupt context of the IPI handler.  You know the rest of the
story

The obvious solution is to convert the kvm_lock to a raw_spinlock_t.
However, I really don't want to do this unless we absolutely have to
since it will just increase latencies for a good portion of the rest of
KVM.

There are probably quite a few solutions here which don't involve the
"big hammer" conversion to raw_spinlocks.  One of them I was kicking
around was "what if FUNCTION_CALL IPIs (FC-IPI) could be generally
threaded just like hard/soft IRQs"?  This brings up some questions:

a) Will KVM function properly if we did this (I believe so)?

b) Would this be a good way to solve the problem (perhaps, but
simpler solutions probably exist)?

c) Would this be useful to other subsystems besides KVM (I have no idea
what else is low-level enough to use FC-IPI)?

Help me to brainstorm on this threaded-IPI idea for a minute.  Assuming
this idea has merit, here are some of the ground-rules I was considering
for this feature:

1) By default, general IPIs will continue to act like NODELAY IRQs (e.g.
execute in interrupt context).  This means things like RESCHEDULE, et
al. will continue to function as they do today.

2) By default, FC-IPIs would be threaded if hardirqs are threaded.

3) An option in the call parameter could specify if NODELAY-like
behavior is desired for subsystems that care to limit the deferment.
This would cause the FC-IPI to override the deferment mechanism and
execute directly in interrupt context on a per-call basis.

4) On systems where hardirqs are not threaded, the DEFER/NODELAY flag is
ignored and FC-IPIs resume their current behavior.
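
The four ground rules above can be sketched as a single decision function.  This is only an illustration under stated assumptions: the enum and function names (ipi_kind, ipi_mode, fc_ipi_mode) are hypothetical and do not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of IPIs; not kernel definitions. */
enum ipi_kind { IPI_GENERAL, IPI_FUNCTION_CALL };
enum ipi_mode { RUN_IN_IRQ_CONTEXT, RUN_IN_THREAD };

/*
 * Decide where an IPI handler would run under the proposed rules.
 * hardirqs_threaded: whether the system threads its hardirqs.
 * nodelay: the per-call option requesting NODELAY-like behavior.
 */
static enum ipi_mode fc_ipi_mode(enum ipi_kind kind, bool hardirqs_threaded,
				 bool nodelay)
{
	assert(kind == IPI_GENERAL || kind == IPI_FUNCTION_CALL);

	if (kind != IPI_FUNCTION_CALL)
		return RUN_IN_IRQ_CONTEXT; /* rule 1: general IPIs unchanged */
	if (!hardirqs_threaded)
		return RUN_IN_IRQ_CONTEXT; /* rule 4: flag ignored */
	if (nodelay)
		return RUN_IN_IRQ_CONTEXT; /* rule 3: per-call override */
	return RUN_IN_THREAD;              /* rule 2: threaded by default */
}
```

Note that rule 4 is checked before rule 3, so the NODELAY option has no effect on systems where hardirqs are not threaded.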

There would be some remaining challenges to resolve too, such as:

A) Normal deferment mechanisms have threads that naturally affine to any
arbitrary processor that is free based on the scheduler policy, whereas
FC-IPIs usually affine to a specific processor (as is the case with
KVM).  Is there a way today to affine a deferred work item (e.g.
work-queues, tasklets, etc) to a specific CPU?  If not, we would have to
create one.

I guess I cannot think of any others at the moment, but (A) is big
enough to chew on for now ;)

Comments?

Regards,
-Greg



[kvm-devel] [PATCH 3/3] KVM: Protect race-condition between VMCS and current_vmcs on VMX hardware

2007-07-26 Thread Gregory Haskins

We need to provide locking around the current_vmcs/VMCS interactions to
protect against race conditions.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/vmx.c |   25 +++--
 1 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index d6354ca..78ff917 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -219,6 +219,7 @@ static void __vcpu_clear(void *arg)
 {
struct kvm_vcpu *vcpu = arg;
int cpu = raw_smp_processor_id();
+   unsigned long flags;
 
if (vcpu->cpu != -1) {
/*
@@ -238,8 +239,10 @@ static void __vcpu_clear(void *arg)
 * And finally, if this VMCS *was* currently active on this
 * CPU, mark the CPU as available again
 */
+   local_irq_save(flags);
if (per_cpu(current_vmcs, cpu) == vmx(vcpu)->vmcs)
per_cpu(current_vmcs, cpu) = NULL;
+   local_irq_restore(flags);
} else
/*
 * If vcpu->cpu thinks we are not installed anywhere,
@@ -464,6 +467,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu)
 {
int cpu;
u64 tsc_this, delta;
+   unsigned long flags;
+   int reload = 0;
 
cpu = get_cpu();
 
@@ -477,16 +482,24 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu)
 * operation above).  Either way, we must check to make sure we are
 * the currently loaded pointer, and correct it if we are not.
 *
-* Note: A race condition exists against current_vmcs between the
-* following update, and any IPIs dispatched to clear a different
-* VMCS.  Currently, this race condition is believed to be benign,
-* but tread carefully.
+* Note: We disable interrupts to prevent a race condition in
+* current_vmcs against IPIs from remote CPUs to clear their own VMCS.
+*
+* Also note that preemption is currently disabled, so there is no race
+* between the current_vmcs and the VMPTRLD operation which happens
+* shortly after the current_vmcs update external to the critical
+* section.
 */
+   local_irq_save(flags);
if (per_cpu(current_vmcs, cpu) != vmx(vcpu)->vmcs) {
-   /* Re-establish ourselves as the current VMCS */
-   vmcs_load(vmx(vcpu)->vmcs);
per_cpu(current_vmcs, cpu) = vmx(vcpu)->vmcs;
+   reload = 1;
}
+   local_irq_restore(flags);
+
+   if (reload)
+   /* Re-establish ourselves as the current VMCS */
+   vmcs_load(vmx(vcpu)->vmcs);
 
if (vcpu->cpu != cpu) {
struct descriptor_table dt;



[kvm-devel] [PATCH 1/3] KVM: Remove arch specific components from the general code

2007-07-26 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/kvm.h  |   31 -
 drivers/kvm/kvm_main.c |   26 +---
 drivers/kvm/kvm_svm.h  |3 
 drivers/kvm/svm.c  |  322 +---
 drivers/kvm/vmx.c  |  236 +--
 5 files changed, 320 insertions(+), 298 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index fc27c2f..6cbf087 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -15,7 +15,6 @@
 #include 
 #include 
 
-#include "vmx.h"
 #include 
 #include 
 
@@ -140,14 +139,6 @@ struct kvm_mmu_page {
};
 };
 
-struct vmcs {
-   u32 revision_id;
-   u32 abort;
-   char data[0];
-};
-
-#define vmx_msr_entry kvm_msr_entry
-
 struct kvm_vcpu;
 
 /*
@@ -309,15 +300,12 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
 struct kvm_io_device *dev);
 
 struct kvm_vcpu {
+   int valid;
struct kvm *kvm;
int vcpu_id;
-   union {
-   struct vmcs *vmcs;
-   struct vcpu_svm *svm;
-   };
+   void *_priv;
struct mutex mutex;
int   cpu;
-   int   launched;
u64 host_tsc;
struct kvm_run *run;
int interrupt_window_open;
@@ -340,14 +328,6 @@ struct kvm_vcpu {
u64 shadow_efer;
u64 apic_base;
u64 ia32_misc_enable_msr;
-   int nmsrs;
-   int save_nmsrs;
-   int msr_offset_efer;
-#ifdef CONFIG_X86_64
-   int msr_offset_kernel_gs_base;
-#endif
-   struct vmx_msr_entry *guest_msrs;
-   struct vmx_msr_entry *host_msrs;
 
struct kvm_mmu mmu;
 
@@ -366,11 +346,6 @@ struct kvm_vcpu {
char *guest_fx_image;
int fpu_active;
int guest_fpu_loaded;
-   struct vmx_host_state {
-   int loaded;
-   u16 fs_sel, gs_sel, ldt_sel;
-   int fs_gs_ldt_reload_needed;
-   } vmx_host_state;
 
int mmio_needed;
int mmio_read_completed;
@@ -579,8 +554,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 
 void fx_init(struct kvm_vcpu *vcpu);
 
-void load_msrs(struct vmx_msr_entry *e, int n);
-void save_msrs(struct vmx_msr_entry *e, int n);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index bc11c2d..9cc16b8 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -367,7 +367,7 @@ static void free_pio_guest_pages(struct kvm_vcpu *vcpu)
 
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
-   if (!vcpu->vmcs)
+   if (!vcpu->valid)
return;
 
vcpu_load(vcpu);
@@ -377,7 +377,7 @@ static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 
 static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
 {
-   if (!vcpu->vmcs)
+   if (!vcpu->valid)
return;
 
vcpu_load(vcpu);
@@ -1646,24 +1646,6 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
-void load_msrs(struct vmx_msr_entry *e, int n)
-{
-   int i;
-
-   for (i = 0; i < n; ++i)
-   wrmsrl(e[i].index, e[i].data);
-}
-EXPORT_SYMBOL_GPL(load_msrs);
-
-void save_msrs(struct vmx_msr_entry *e, int n)
-{
-   int i;
-
-   for (i = 0; i < n; ++i)
-   rdmsrl(e[i].index, e[i].data);
-}
-EXPORT_SYMBOL_GPL(save_msrs);
-
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
int i;
@@ -2402,7 +2384,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 
mutex_lock(&vcpu->mutex);
 
-   if (vcpu->vmcs) {
+   if (vcpu->valid) {
mutex_unlock(&vcpu->mutex);
return -EEXIST;
}
@@ -2450,6 +2432,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
kvm->nvcpus = n + 1;
spin_unlock(&kvm_lock);
 
+   vcpu->valid = 1;
+
return r;
 
 out_free_vcpus:
diff --git a/drivers/kvm/kvm_svm.h b/drivers/kvm/kvm_svm.h
index a869983..82e5d77 100644
--- a/drivers/kvm/kvm_svm.h
+++ b/drivers/kvm/kvm_svm.h
@@ -20,7 +20,10 @@ static const u32 host_save_user_msrs[] = {
 #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
 #define NUM_DB_REGS 4
 
+struct kvm_vcpu;
+
 struct vcpu_svm {
+   struct kvm_vcpu *vcpu;
struct vmcb *vmcb;
unsigned long vmcb_pa;
struct svm_cpu_data *svm_data;
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 850a1b1..0c12e9e 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -49,6 +49,11 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_DEATURE_SVML (1 << 2)
 
+static inline struct vcpu_svm* svm(struct kvm_vcpu *vcpu)
+{
+   return (struct vcpu_svm*)vcpu->_priv;
+}
+
 unsigned long iopm_base;
 unsigned long msrpm_base;
 
@@ -95,7 +100,7 @@ s

[kvm-devel] [PATCH 2/3] KVM: Clean up VMCLEAR/VMPTRLD code on VMX

2007-07-26 Thread Gregory Haskins

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
---

 drivers/kvm/vmx.c |   70 +++--
 1 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 5f0a7fd..d6354ca 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -188,6 +188,20 @@ static struct kvm_msr_entry *find_msr_entry(struct kvm_vcpu *vcpu, u32 msr)
return NULL;
 }
 
+static void vmcs_load(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(vmcs);
+   u8 error;
+   
+   asm volatile (ASM_VMX_VMPTRLD_RAX "; setna %0"
+ : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
+ : "cc");
+
+   if (error)
+   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+  vmcs, phys_addr);
+}
+
 static void vmcs_clear(struct vmcs *vmcs)
 {
u64 phys_addr = __pa(vmcs);
@@ -206,10 +220,34 @@ static void __vcpu_clear(void *arg)
struct kvm_vcpu *vcpu = arg;
int cpu = raw_smp_processor_id();
 
-   if (vcpu->cpu == cpu)
+   if (vcpu->cpu != -1) {
+   /*
+* We should *never* try to __vcpu_clear a remote VMCS. This
+* would have been addressed at a higher layer already
+*/
+   BUG_ON(vcpu->cpu != cpu);
+
+   /*
+* Execute the VMCLEAR operation regardless of whether the 
+* VMCS is currently active on this CPU or not (it doesn't
+* necessarily have to be)
+*/
vmcs_clear(vmx(vcpu)->vmcs);
-   if (per_cpu(current_vmcs, cpu) == vmx(vcpu)->vmcs)
-   per_cpu(current_vmcs, cpu) = NULL;
+
+   /*
+* And finally, if this VMCS *was* currently active on this
+* CPU, mark the CPU as available again
+*/
+   if (per_cpu(current_vmcs, cpu) == vmx(vcpu)->vmcs)
+   per_cpu(current_vmcs, cpu) = NULL;
+   } else
+   /*
+* If vcpu->cpu thinks we are not installed anywhere,
+* but this CPU thinks we are currently active, something is
+* wacked.
+*/
+   BUG_ON(per_cpu(current_vmcs, cpu) == vmx(vcpu)->vmcs);
+
rdtscll(vcpu->host_tsc);
 }
 
@@ -220,6 +258,7 @@ static void vcpu_clear(struct kvm_vcpu *vcpu)
else
__vcpu_clear(vcpu);
vmx(vcpu)->launched = 0;
+   vcpu->cpu   = -1;
 }
 
 static unsigned long vmcs_readl(unsigned long field)
@@ -423,7 +462,6 @@ static void vmx_load_host_state(struct kvm_vcpu *vcpu)
  */
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu)
 {
-   u64 phys_addr = __pa(vmx(vcpu)->vmcs);
int cpu;
u64 tsc_this, delta;
 
@@ -432,16 +470,22 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu)
if (vcpu->cpu != cpu)
vcpu_clear(vcpu);
 
-   if (per_cpu(current_vmcs, cpu) != vmx(vcpu)->vmcs) {
-   u8 error;
-
+   /*
+* By the time we get here, we know that either our VMCS is already
+* loaded on the current CPU (from previous runs), or that it's not
+* loaded *anywhere* in the system at all (due to the vcpu_clear()
+* operation above).  Either way, we must check to make sure we are
+* the currently loaded pointer, and correct it if we are not.
+*
+* Note: A race condition exists against current_vmcs between the
+* following update, and any IPIs dispatched to clear a different
+* VMCS.  Currently, this race condition is believed to be benign,
+* but tread carefully.
+*/
+   if (per_cpu(current_vmcs, cpu) != vmx(vcpu)->vmcs) {
+   /* Re-establish ourselves as the current VMCS */
+   vmcs_load(vmx(vcpu)->vmcs);
per_cpu(current_vmcs, cpu) = vmx(vcpu)->vmcs;
-   asm volatile (ASM_VMX_VMPTRLD_RAX "; setna %0"
- : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
- : "cc");
-   if (error)
-   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
-  vmx(vcpu)->vmcs, phys_addr);
}
 
if (vcpu->cpu != cpu) {



[kvm-devel] [PATCH 0/3] Arch cleanup v3

2007-07-26 Thread Gregory Haskins

Changes from v2

Patch #1: Fixed bad indent

I split the original #2 up into two patches:

Patch #2: Contains only VMX/VMCS cleanup
Patch #3: Contains a fix for the VMCS race condition previously discussed.

Patch #3 is optional given the recent discovery that the race should not cause
actual problems.

Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>



Re: [kvm-devel] [PATCH 2/2] KVM: Protect race-condition between VMCS and current_vmcs on VMX hardware

2007-07-26 Thread Gregory Haskins

On Thu, 2007-07-26 at 19:31 +0300, Avi Kivity wrote:
> Avi Kivity wrote:
> >>
> >> Sure.  It can happen when two VMs are running simultaneously.  Let's call
> >> them VM-a and VM-b.  Assume the scenario: VM-a is on CPU-x, gets
> >> migrated to CPU-y, and VM-b gets scheduled in on CPU-x.  There is a race
> >> on CPU-x with the VMCS handling logic between the VM-b process context,
> >> and the IPI to execute the __vcpu_clear for VM-a.
> >>   
> >
> > A race indeed, good catch.
> >
> > I think the race is only on the per_cpu(current_vmcs) variable, no?  
> > The actual vmcs ptr (as loaded by vmptrld) is handled by the processor.
> 
> btw, I think the race is benign.  If __vcpu_clear() wins, vcpu_load() 
> gets to set current_vmcs and all is well.  If vcpu_load() wins, 
> __vcpu_clear() stomps on current_vmcs, but the only effect of that is that
> the next time vcpu_load() is called, it issues an unnecessary vmptrld.

Hmm.. Yes I think you are right.  When I first started thinking about
this is when I thought we needed to VMCLEAR the current before the
VMPTRLD, in which case this would be a real bug.  But in light of you
setting me straight on that issue, I think this race drops away too.  We
should probably comment the code just in case current_vmcs gets more
complex in the future so it doesn't get lost ;)
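
For the record, the two orderings described above can be modeled in user space.  This is a sketch under stated assumptions: cpu_model and the model_* functions are hypothetical, VMPTRLD is reduced to a counter, and the "stomp" stands in for the late write of a racing __vcpu_clear().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one CPU; none of these names are KVM APIs. */
struct cpu_model {
	const void *current_vmcs; /* models per_cpu(current_vmcs, cpu) */
	const void *hw_vmcs;      /* models the pointer the processor tracks */
	int vmptrld_count;        /* number of modeled VMPTRLD operations */
};

/* Models the load path: issue VMPTRLD only if the cache disagrees. */
static void model_vcpu_load(struct cpu_model *c, const void *vmcs)
{
	if (c->current_vmcs != vmcs) {
		c->hw_vmcs = vmcs;
		c->vmptrld_count++;
		c->current_vmcs = vmcs;
	}
}

/* Models the late half of a racing __vcpu_clear(): the cache is stomped. */
static void model_stomp(struct cpu_model *c)
{
	c->current_vmcs = NULL;
}

/* Loads the same VMCS twice, optionally with a stomp between the loads. */
static int model_vmptrld_count(int stomp_between_loads)
{
	static const int vmcs_b = 0; /* stand-in for VM-b's VMCS */
	struct cpu_model c = { NULL, NULL, 0 };

	model_vcpu_load(&c, &vmcs_b);
	if (stomp_between_loads)
		model_stomp(&c);
	model_vcpu_load(&c, &vmcs_b);

	/* In both orderings the hardware pointer ends up correct. */
	assert(c.hw_vmcs == &vmcs_b);
	return c.vmptrld_count;
}
```

Without the stomp the second load is skipped; with it, the only difference is one extra VMPTRLD, which matches the "benign" reading of the race.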


Re: [kvm-devel] [PATCH 2/2] KVM: Protect race-condition between VMCS and current_vmcs on VMX hardware

2007-07-26 Thread Gregory Haskins

On Thu, 2007-07-26 at 18:35 +0300, Avi Kivity wrote:

> A race indeed, good catch.
> 
> I think the race is only on the per_cpu(current_vmcs) variable, no?  The 
> actual vmcs ptr (as loaded by vmptrld) is handled by the processor.

Correct.

> 
> > Disabling interrupts was chosen as the sync-primitive, because the code
> > will always be on the CPU in question.
> >
> >   
> 
> Looks a bit heavy handed.  How about replacing (in __vcpu_clear())
> 
> if (per_cpu(current_vmcs, cpu) == vcpu->vmcs)
> per_cpu(current_vmcs, cpu) = NULL;
> 
> by
> 
> cmpxchg_local(&per_cpu(current_vmcs, cpu), vcpu->vmcs, NULL);
> 
> ?

Hmm...possibly.  I've never worked with the cmpxchg subsystem so let me
look into it a little bit and get back to you.
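
As a rough user-space analogue of the suggested replacement, the check-then-clear pair collapses into one compare-and-exchange.  This sketch uses C11 atomics rather than the kernel primitive, and the names current_vmcs_model and clear_if_current are hypothetical.  One caveat: cmpxchg_local() only needs to be atomic with respect to the local CPU, whereas the C11 operation used here is a stronger, fully atomic one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's per_cpu(current_vmcs, cpu) slot. */
static _Atomic(void *) current_vmcs_model;

/*
 * Clear the slot only if it still holds `vmcs`, as a single operation.
 * Returns 1 if the slot was cleared, 0 if it held something else.
 */
static int clear_if_current(void *vmcs)
{
	void *expected = vmcs;

	assert(vmcs != NULL); /* clearing "nothing" is not meaningful here */
	return atomic_compare_exchange_strong(&current_vmcs_model,
					      &expected, NULL);
}
```

Because the comparison and the store happen as one step, an interrupt cannot slip in between them, which is the property that disabling interrupts was meant to provide.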

