Re: Re[2]: vkernel & GSoC, some questions

2008-03-16 Thread Matthew Dillon
Basically DragonFly has a syscall API that allows a userland process
to create and completely control any number of VM spaces, including
the ability to pass execution control to a VM space and get it back,
and control memory mappings within that VM space (and in the virtual
kernel process itself) on a page-by-page basis, so only 'invalid' PTEs
are passed through to the virtual kernel by the real kernel and the
real kernel caches page mappings with real hardware pmaps.  Any
exception that occurs within a running VM space is routed back to the
virtual kernel process by the real kernel.  Any real signal (e.g. the
vkernel's 'clock' interrupt) or exception that occurs also forces control
to return to the vkernel process.
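
A minimal sketch of how a vkernel might drive this syscall API.  The
vmspace_*() names follow the DragonFly interface described above, but the
prototypes and the VMSPACE_CTL_RUN command value here are assumptions for
illustration, not copies of the real headers:

    /*
     * Illustrative only: prototypes and constants below are assumed, not
     * taken from DragonFly's actual headers.
     */
    #include <stddef.h>
    #include <sys/types.h>

    int vmspace_create(void *id, int type, void *data);
    int vmspace_mmap(void *id, void *addr, size_t len, int prot, int flags,
                     int fd, off_t off);
    int vmspace_ctl(void *id, int cmd, void *tframe, void *vframe);

    #define VMSPACE_CTL_RUN 1       /* assumed command value */

    /*
     * For each guest process the vkernel creates a VM space, maps pages into
     * it on demand, and passes the cpu to it.  vmspace_ctl() returns whenever
     * the context faults, makes a guest "syscall", or the vkernel process
     * receives a real signal (e.g. its clock interrupt).
     */
    void run_guest(void *ctx_id, void *tframe, void *vframe)
    {
        vmspace_create(ctx_id, 0, NULL);
        for (;;) {
            vmspace_ctl(ctx_id, VMSPACE_CTL_RUN, tframe, vframe);
            /* Here the vkernel inspects the returned frames, services the
             * page fault or emulated syscall (possibly via vmspace_mmap()),
             * and then loops to resume the guest context. */
        }
    }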

A DragonFly virtual kernel is just a user process which uses this feature
to manipulate VM contexts (i.e. for processes running under the vkernel
itself), providing a complete emulation environment that is opaque to
userland.  The vkernel itself is not running in an emulated environment,
it is a 'real' (and singular) user process running on the machine.
These VM contexts are managed by the real kernel as pure VM contexts,
NOT as threads or processes or anything else.  Since the VM context in
the real kernel basically has one VM entry (representing the software
emulated mmap of the entire address space), and since the pmaps use
throw-away PTEs, the real-kernel overhead is minimal and there is no
real limit to the number of virtualized processes the virtual kernel
can control, nor are there any other resource limits within the real kernel.

One can even run a virtual kernel inside a virtual kernel... not sure why
anyone would want to do it, but it works!  I can even thrash the virtual
kernel without it having any effect whatsoever on the real kernel or
system.

The ENTIRE operational overhead rests solely in operations which must
perform a context switch.  CPU-bound programs will run at full speed and
I/O-bound programs aren't too bad either.  Context-switch-heavy programs
suffer as they do in a hardware virtualized environment.  Make no
mistake about that: run any sort of kernel that wasn't designed for a
hardware virtualized environment inside one and you are going to have
horrible performance, as many people trying to simply 'move' their
existing machines to virtualized environments have found out the hard
way.   I could probably shave off a microsecond from our virtual
kernel syscall path, but it isn't a priority for me... I'm using a
code-efficient but performance-inefficient implementation to pass
contextual information between the emulated VM context and the
virtual kernel, and it's a fairly expensive copy op that would benefit
greatly if it were converted to shared memory or if I simply cached the
userland page in the real kernel to avoid the copyout/lookup/pmap op.
I could probably also parallelize the real I/O backend for the 'disk'
better, but it isn't a priority for me either.
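
A generic illustration of the shared-memory alternative mentioned above (this
is not DragonFly's actual mechanism): two cooperating processes map one shared
page, so per-event context state can be published without a copy through the
kernel on every switch.

    /* Minimal shared-page sketch; struct layout is purely illustrative. */
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    struct shared_ctx {
        volatile long trap_no;      /* which fault/interrupt occurred */
        volatile long fault_addr;   /* faulting guest address         */
    };

    int main(void)
    {
        struct shared_ctx *ctx = mmap(NULL, sizeof(*ctx),
                                      PROT_READ | PROT_WRITE,
                                      MAP_SHARED | MAP_ANON, -1, 0);
        if (ctx == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        pid_t pid = fork();
        if (pid == 0) {              /* "guest" side publishes its state  */
            ctx->trap_no = 14;
            ctx->fault_addr = 0x1000;
            _exit(0);
        }
        waitpid(pid, NULL, 0);       /* "vkernel" side reads it, no copy  */
        printf("trap %ld at 0x%lx\n", ctx->trap_no, ctx->fault_addr);
        return 0;
    }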

SMP is supported the same way it is in a real kernel: the
virtual kernel simply creates an LWP for each 'cpu' (for all intents
and purposes you can think of it as forking once for each cpu).  All
the LWPs have access to the same pool of VM contexts and thus the
virtual kernel can schedule its processes to any of the LWPs on a whim.
It just uses the same process scheduler that the real kernel does...
nearly all the code in the virtual kernel is the same, in fact, the
vkernel 'platform' is only 700K of source code.
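
A rough sketch of that "one LWP per virtual cpu" structure using plain
pthreads (names and structure here are hypothetical, not the vkernel's actual
platform code):

    #include <pthread.h>
    #include <stdio.h>

    #define NCPUS 4     /* number of virtual cpus to emulate */

    static void *vcpu_main(void *arg)
    {
        long cpuid = (long)arg;
        /* A real vkernel cpu thread would loop here: pick a runnable guest
         * VM context from the shared scheduler queues, run it via the
         * vmspace API, then handle whatever fault or syscall came back. */
        printf("virtual cpu %ld online\n", cpuid);
        return NULL;
    }

    int main(void)
    {
        pthread_t cpu[NCPUS];

        for (long i = 0; i < NCPUS; i++)
            pthread_create(&cpu[i], NULL, vcpu_main, (void *)i);
        for (int i = 0; i < NCPUS; i++)
            pthread_join(cpu[i], NULL);
        return 0;
    }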

There are some minor (and admittedly not very well developed) shims to
reduce the load on the real machine when you do things like run a
vkernel simulating many cpus on a machine which only has a few
physical cpus.  Spinning in a thread vs. on a hard cpu is not the best
thing in the world to do, after all.  In any case, this means that,
generally speaking, SMP performance in a virtual kernel will scale as
DragonFly's own SMP performance is improved.  Right now the vkernels
can be built SMP but it isn't recommended... those kinds of builds
are best used to test SMP work and not for real applications.

--

Insofar as virtual kernels versus machine emulation and performance go,
people need to realize that *NO* machine emulation technology is going
to perform well for any task requiring a lot of context switching or a lot
of non-MMU-resolvable page faults.  No matter WHAT technology you use,
at some point any real I/O operation will have to pass through the real
kernel, period.  For example, a syscall-heavy process running under a
virtual kernel will perform just about as badly as a syscall-heavy
process running under something like VMWare.  Hardware virtualized MMU
support isn't quite advanced enough to solve the performance problem.
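
To make the context-switch cost concrete, here is a tiny, self-contained
benchmark sketch: it times a trivial syscall in a loop.  Run natively and then
under any emulated environment, the per-call difference is exactly the
overhead being discussed (the numbers themselves will of course vary by
system):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        enum { N = 1000000 };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            (void)getppid();         /* cheap syscall, forces a kernel entry */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 +
                    (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per getppid() round trip\n", ns / N);
        return 0;
    }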

Re: Re[2]: vkernel & GSoC, some questions

2008-03-16 Thread Matthew Dillon

:
:Given the fact that there are not as many developers as needed, what would be
:a practical purpose of vkernel?
:
:UML is typically used to debug drivers and/or for hosting.  Now that Linux is
:about to have or already has container technology, hosting on UML makes little
:sense.

The single largest benefit UML or a hardware emulated environment has
over a jail is that it is virtually impossible to crash the real kernel
no matter what you are doing within the virtualized environment.  I
don't know any ISP that is able to keep a user-accessible (shell prompt)
machine up consistently outside of a UML environment.  The only reason
machines don't crash more is that they tend to run a subset of available
applications in a subset of possible load- and resource-related
circumstances.

Neither jails nor containers nor any other native-kernel technology will
EVER solve that problem.  For that matter, no native-kernel technology
will ever come close to providing the same level of compartmentalization
from a security standpoint, and particularly not if you intend to run
general-purpose applications in that environment.

The reason UML is used, particularly for web hosting, is that
web developers require numerous non-trivial backend tools to be installed,
each of which has the potential to hog resources, crash the machine,
create security holes, or otherwise create hell for everyone else.  The
hell needs to be restricted and narrowed as much as possible so human
resources can focus on the cause rather than on the collateral damage.
For any compute-intensive business, collateral damage is the #1 IT issue,
the cost of power is the #2 issue, and network resources are the #3
issue.  Things like cpus and machines... those are in the noise.  They're
basically free.

With a virtual kernel like UML (or our vkernel), the worst that happens
is that the vkernel itself crashes and reboots in 5 seconds (+ fsck time
for that particular user).  No other vkernel is affected, no other
customer is affected, no other compartmentalized resource is affected.

Jails are great, no question about it, and there are numerous applications
which require the performance benefits that running in a jail versus
an emulated environment provides, but we will never, EVER see jails
replace UML.  This is particularly true considering the resources being
put into improving emulated environments.  The overhead for running an
emulated environment ten years from now is probably going to be a
fraction of the overhead it is now, as hardware catches up to desire.

-Matt



Re: Re[2]: vkernel & GSoC, some questions

2008-03-17 Thread Alexander Sack
On Sun, Mar 16, 2008 at 7:13 PM, Matthew Dillon <[EMAIL PROTECTED]>
wrote:

>Basically DragonFly has a syscall API that allows a userland process
>to create and completely control any number of VM spaces, including
>the ability to pass execution control to a VM space and get it back,
>and control memory mappings within that VM space (and in the virtual
>kernel process itself) on a page-by-page basis, so only 'invalid' PTEs
>are passed through to the virtual kernel by the real kernel and the
>real kernel caches page mappings with real hardware pmaps.  Any
>exception that occurs within a running VM space is routed back to the
>virtual kernel process by the real kernel.  Any real signal (e.g. the
>vkernel's 'clock' interrupt) or exception that occurs also forces
> control
>to return to the vkernel process.


Matt, I'm sorry, I'm not trying to hijack this thread, but isn't the vkernel
approach very similar to VMWare's hosted architecture products (such as
Fusion for the Mac and Workstation for Windows)?

As I understand it, they have a regular process like vkernel, called
vmware-vmx, which manages the different VM contexts running
alongside the host OS.  It also passes invalid PTEs through to the
real kernel and manages contexts in, I believe, the same fashion you just
described.  There is also an I/O subsystem alongside it to reuse the
hosted drivers to manage the virtualized filesystem and devices - not sure
what DragonFly does.

I realize that their claim to fame is, as you said, x86 binary code
translation, but I believe VMWare's product is very close to what you are
describing with respect to vkernels (please correct me if I'm wrong).  It's
just that this thread has devolved slightly into a hypervisor vs. hosted
architecture debate and I believe there is room for both.

Thanks!

-aps

-- 
"What lies behind us and what lies in front of us is of little concern to
what lies within us." -Ralph Waldo Emerson


Re: Re[2]: vkernel & GSoC, some questions

2008-03-17 Thread Matthew Dillon

:Matt, I'm sorry, I'm not trying to hijack this thread, but isn't the vkernel
:approach very similar to VMWare's hosted architecture products (such as
:Fusion for the Mac and Workstation for Windows)?
:
:As I understand it, they have a regular process like vkernel, called
:vmware-vmx, which manages the different VM contexts running
:alongside the host OS.  It also passes invalid PTEs through to the
:real kernel and manages contexts in, I believe, the same fashion you just
:described.  There is also an I/O subsystem alongside it to reuse the
:hosted drivers to manage the virtualized filesystem and devices - not sure
:what DragonFly does.
:
:I realize that their claim to fame is, as you said, x86 binary code
:translation, but I believe VMWare's product is very close to what you are
:describing with respect to vkernels (please correct me if I'm wrong).  It's
:just that this thread has devolved slightly into a hypervisor vs. hosted
:architecture debate and I believe there is room for both.
:
:Thanks!
:
:-aps

This reminds me of XEN.  Basically instead of trying to rewrite
instructions or do 100% hardware emulation it sounds like they are
providing XEN-like functionality where the target OS is aware it is
running inside a hypervisor and can make explicit 'shortcut' calls to
the hypervisor instead of attempting to access the resource via
emulated hardware.

These shortcuts are going to be considerably more efficient, resulting
in better performance.  It is also the claim to fame that a vkernel
architecture has.  In fact, XEN is really much closer to a vkernel
architecture than it is to a hypervisor architecture.  A vkernel can
be thought of as the most generic and flexible implementation, with
access to many system calls versus the fairly limited set XEN provides,
while a hypervisor provides access to the same subset by emulating
hardware devices.

In all three cases the emulated hardware -- disk and network, basically --
devolves into calling read() or write() or the real-kernel
equivalent.  A hypervisor has the most work to do since it is trying to
emulate a hardware interface (adding another layer).  XEN has less work
to do as it is really not trying to emulate hardware.  A vkernel has
even less work to do because it is running as a userland program and can
simply make the appropriate system call to implement the back-end.
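
As a concrete illustration of that "devolves into calling read() or write()"
point, a userland kernel's virtual disk can simply be a backing file, with
each guest block request turning into one pread()/pwrite().  The helper below
is illustrative, not the vkernel's actual disk driver:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define SECTOR_SIZE 512

    /* Service one guest "disk" request against the backing file. */
    static ssize_t vdisk_io(int backing_fd, int is_write, void *buf,
                            size_t nsectors, off_t lba)
    {
        off_t off = lba * SECTOR_SIZE;
        size_t len = nsectors * SECTOR_SIZE;
        return is_write ? pwrite(backing_fd, buf, len, off)
                        : pread(backing_fd, buf, len, off);
    }

    int main(void)
    {
        char sector[SECTOR_SIZE] = "hello from the guest";
        char check[SECTOR_SIZE];
        int fd = open("vkernel-root.img", O_RDWR | O_CREAT, 0644);

        if (fd < 0) { perror("open"); return 1; }
        vdisk_io(fd, 1, sector, 1, 0);      /* guest writes sector 0 */
        vdisk_io(fd, 0, check, 1, 0);       /* guest reads it back   */
        printf("%s\n", check);
        close(fd);
        return 0;
    }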

There are more similarities than differences.  I expect VMWare is feeling
the pressure from having to hack their code so much to support multiple
operating systems... I mean, literally, every time Microsoft comes out
with an update VMWare has to hack something new in.  It's really amazing
how hard it is to emulate a complete hardware environment, let alone do
it efficiently.

Frankly, I would love to see something like VMWare force an industry-wide
API for machine access which bypasses the holy hell of a mess we have
with the BIOS, and see BIOSes then respec to a new far cleaner API.  The
BIOS is the stinking pile of horseshit that has held back OS development
for the last 15 years.

For hardware emulation to really work efficiently one pretty much has to
dedicate an entire cpu to the emulator in order to allow it to operate
more like a coprocessor and save a larger chunk of the context switch
overhead which is the bane of VMWare, UML/vkernel, AND XEN.  This may
seem wasteful but when you are talking about systems with 4 or more cores
which are more I/O- and memory-limited than they are CPU-limited,
dedicating a whole cpu to handle critical path operations would probably
boost performance considerably.
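
A sketch of what dedicating a cpu to the emulator could look like, assuming
FreeBSD's cpuset(2) affinity interface is the mechanism used (the policy a
real emulator needs is obviously more involved than pinning one process):

    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <stdio.h>

    int main(void)
    {
        cpuset_t mask;

        CPU_ZERO(&mask);
        CPU_SET(3, &mask);      /* reserve cpu 3 for the emulator loop */

        /* Pin the current process to that single cpu (-1 == ourselves). */
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                               sizeof(mask), &mask) != 0) {
            perror("cpuset_setaffinity");
            return 1;
        }
        printf("emulator pinned to cpu 3\n");
        /* ... run the critical-path emulation loop here ... */
        return 0;
    }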

-Matt



Re: Re[2]: vkernel & GSoC, some questions

2008-03-17 Thread Alexander Sack
Some interesting reading for anyone who cares:

http://citeseer.ist.psu.edu/rd/89980079%2C480988%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/24361/http:zSzzSzwww.usenix.orgzSzpublicationszSzlibraryzSzproceedingszSzusenix01zSzsugermanzSzsugerman.pdf/venkitachalam01virtualizing.pdf

> These shortcuts are going to be considerably more efficient, resulting
>in better performance.  It is also the claim to fame that a vkernel
>architecture has.  In fact, XEN is really much closer to a vkernel
>architecture than it is to a hypervisor architecture.  A vkernel can
>be thought of as the most generic and flexible implementation, with
>access to many system calls versus the fairly limited set XEN provides,
>while a hypervisor provides access to the same subset by emulating
>hardware devices.

I've never used XEN (paravirtualization) but I assume that the target
OS then has special system calls or shortcuts to ask the underlying
monitor/hypervisor to do the right things (like allocate safe (virtual)
memory instead of relying on a shadow/trap model, etc.).

>In all three cases the emulated hardware -- disk and network, basically --
>devolves into calling read() or write() or the real-kernel
>equivalent.  A hypervisor has the most work to do since it is trying to
>emulate a hardware interface (adding another layer).  XEN has less work
>to do as it is really not trying to emulate hardware.  A vkernel has
>even less work to do because it is running as a userland program and can
>simply make the appropriate system call to implement the back-end.

I'm pretty sure this is what VMWare does for their hosted product.
It's a simple userland process that makes syscalls and traps
interrupts which eventually devolve into reads and writes.  I believe
they do a lot of performance work in interrupt coalescing and doing
their darndest to prevent world-wide context switches.

>  There are more similarities than differences.  I expect VMWare is feeling
>the pressure from having to hack their code so much to support multiple
>operating systems... I mean, literally, every time Microsoft comes out
>with an update VMWare has to hack something new in.  It's really amazing
>how hard it is to emulate a complete hardware environment, let alone do
>it efficiently.

No doubt virtualization is a tough job and I'm wondering if future
hardware enhancements will make software like VMWare/vkernel/XEN
obsolete in the end.

>Frankly, I would love to see something like VMWare force an industry-wide
>API for machine access which bypasses the holy hell of a mess we have
>with the BIOS, and see BIOSes then respec to a new far cleaner API.  The
>BIOS is the stinking pile of horseshit that has held back OS development
>for the last 15 years.

EFI?  Just kidding...

-aps

-- 
"What lies behind us and what lies in front of us is of little concern
to what lies within us." -Ralph Waldo Emerson


Re: Re[2]: vkernel & GSoC, some questions

2008-03-18 Thread Peter Jeremy
On Mon, Mar 17, 2008 at 01:16:41PM -0700, Matthew Dillon wrote:
>This reminds me of XEN.  Basically instead of trying to rewrite
>instructions or do 100% hardware emulation it sounds like they are
>providing XEN-like functionality where the target OS is aware it is
>running inside a hypervisor and can make explicit 'shortcut' calls to
>the hypervisor instead of attempting to access the resource via
>emulated hardware.

That reminds me of IBM VM/CMS: CP (the hypervisor) had a variety of
magic "syscalls" (via the DIAGNOSE instruction) that CMS would use
to perform (e.g.) real I/O.

>Frankly, I would love to see something like VMWare force an industry-wide
>API for machine access which bypasses the holy hell of a mess we have

It would need to be open and I can't see any particular driver for
VMWare (or anyone else) to force this.

>with the BIOS, and see BIOSes then respec to a new far cleaner API.  The
>BIOS is the stinking pile of horseshit that has held back OS development
>for the last 15 years.

I'd go further and say that BIOSes are getting worse: Back in the
AT-clone days, you could just totally ignore the BIOS once you'd
gotten the kernel loaded.  Now you _have_ to keep talking to the
BIOS for things like ACPI - but the BIOSes are still just as broken
as they used to be.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: Re[2]: vkernel & GSoC, some questions

2008-03-18 Thread Robert Watson


On Tue, 18 Mar 2008, Peter Jeremy wrote:

>> with the BIOS, and see BIOSes then respec to a new far cleaner API.  The
>> BIOS is the stinking pile of horseshit that has held back OS development
>> for the last 15 years.
>
> I'd go further and say that BIOSes are getting worse: Back in the AT-clone
> days, you could just totally ignore the BIOS once you'd gotten the kernel
> loaded.  Now you _have_ to keep talking to the BIOS for things like ACPI -
> but the BIOSes are still just as broken as they used to be.


On Sun's Niagara (sun4v) platform, it is expected that all OSes will sit on
top of the hypervisor that ships in the firmware, abstracting away countless 
basic hardware services behind hypercalls.


Robert N M Watson
Computer Laboratory
University of Cambridge