Vimage howto

2008-12-08 Thread Julian Elischer

Well, not completely, but I've had a number of questions over the
last few months about what it is. Marko and I have written the
following "how to virtualize your module" document, and I've been
directing people to it. After another couple of questions I think
it could do with wider distribution.

It is available at:

http://perforce.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/vimage/porting_to_vimage.txt

but I include it here for popular enjoyment.

Please contact me or Marko if you have any questions or suggestions on 
this.

===
Vimage: what is it?
===

Vimage is a framework in the BSD kernel which allows a co-operating module
to operate on multiple independent instances of its state so that it can
participate in a virtual machine / virtual environment scenario.

The implementation approach taken by the vimage framework is a replacement
of selected global state variables with constructs that allow for the
virtualized state to be stored and resolved in appropriate instances of
module-specific container structures.  The code operating on virtualized state
has to conform to a set of rules described further below, among other things
in order to allow for all the changes to be conditionally compilable, i.e.
permitting the virtualized code to fall back to operation on global state.

The most visible change throughout the existing code is typically replacement
of direct references to global variables with macros; foo_bar thus becomes
V_foo_bar.  V_foo_bar macros will resolve back to the foo_bar global in default
kernel builds, and alternatively to some_base_pointer->_foo_bar for options
VIMAGE kernel configs.  Prepending of V_ prefixes to variable references
helps in visual discrimination between global and virtualized state.  The
framework extends the sysctl infrastructure to support access to virtualized
state through introduction of the SYSCTL_V family of macros; those also
automatically fall back to their standard SYSCTL counterparts in default
kernel builds.  Transparent kldsym(2) lookups are provided for virtualized
variables explicitly marked as visible to the kldsym interface, which permits
userland binaries such as netstat to operate unmodified on options VIMAGE
kernels, though this may have wide security implications.
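
As a rough illustration of the V_ macro idea described above, here is a
minimal userland sketch.  It is not the kernel's actual implementation:
struct foo_state, cur_foo and foo_select() are hypothetical names made up
for this example, and the real framework resolves the base pointer
differently.  It only shows how code written once against V_foo_bar can
compile against either a plain global or a per-instance container.

```c
#include <assert.h>
#include <stddef.h>

/* Comment this out to model a default (non-virtualized) kernel build. */
#define VIMAGE 1

#ifdef VIMAGE
/* Hypothetical container holding one instance of the module's state. */
struct foo_state {
	int _foo_bar;
};

/* Hypothetical base pointer: the instance currently being operated on. */
static struct foo_state *cur_foo;

/* Virtualized build: V_foo_bar resolves through the base pointer. */
#define V_foo_bar (cur_foo->_foo_bar)

/* Assumed helper that switches the instance the module operates on. */
static void
foo_select(struct foo_state *inst)
{
	assert(inst != NULL);
	cur_foo = inst;
}
#else
/* Default build: V_foo_bar falls back to the ordinary global. */
static int foo_bar;
#define V_foo_bar foo_bar
#endif

/* Module code refers only to V_foo_bar and builds in both modes. */
static int
foo_bump(void)
{
	return (++V_foo_bar);
}
```

With VIMAGE defined, calling foo_bump() after foo_select(&a) changes only
the instance a; a second instance b keeps its own counter.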

The vimage struct is currently primarily a placeholder for pointers to
module-specific struct instances; currently V_NET (networking), V_CPU
(CPU scheduling), and V_PROCG (jail-style interprocess protection) major
module classes are defined.  Each vimage module may or may not be further
split into minor or submodules; the networking subsystem (vimage id V_NET;
struct vnet) in particular is organized in submodules such as VNET_MOD_NET
(mandatory shared infrastructure: routing tables, interface lists etc.);
VNET_MOD_INET (IPv4 state including transport protocols); VNET_MOD_INET6,
VNET_MOD_IPSEC, VNET_MOD_IPFW, VNET_MOD_NETGRAPH etc.  VNET submodules
are special in that they not only provide storage for virtualized
data, but also enforce ordering of initialization and cleanup.  Hence, not
all submodules must necessarily allocate private storage for their specific
data; they may be defined solely to support proper initialization
ordering.
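
The ordering role of submodules can be sketched roughly as follows.  This
is a simplified userland model under stated assumptions, not the kernel
API: struct submod_info, submod_register(), vnet_sketch_alloc() and
vnet_sketch_free() are hypothetical names, and the numeric values of the
VNET_MOD_* ids are assumed.  Submodules are initialized in ascending id
order and torn down in reverse; a submodule with size 0 gets no private
storage and exists only to take part in the ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Submodule ids: names are from the text, numeric values are assumed. */
enum { VNET_MOD_NET, VNET_MOD_INET, VNET_MOD_INET6, VNET_MOD_MAX };

struct vnet {
	void *mod_data[VNET_MOD_MAX];	/* private storage, NULL if none */
	int   init_log[VNET_MOD_MAX];	/* ids in the order they were set up */
	int   init_count;
};

/* Hypothetical submodule descriptor. */
struct submod_info {
	int    id;
	size_t size;			/* 0: ordering only, no storage */
	void (*init)(struct vnet *);
	void (*destroy)(struct vnet *);
};

static const struct submod_info *registered[VNET_MOD_MAX];

static void
submod_register(const struct submod_info *smi)
{
	assert(smi->id >= 0 && smi->id < VNET_MOD_MAX);
	registered[smi->id] = smi;
}

/* Tear down submodules with id < upto, highest id first. */
static void
vnet_sketch_teardown(struct vnet *vnet, int upto)
{
	for (int id = upto - 1; id >= 0; id--) {
		const struct submod_info *smi = registered[id];
		if (smi == NULL)
			continue;
		if (smi->destroy != NULL)
			smi->destroy(vnet);
		free(vnet->mod_data[id]);
		vnet->mod_data[id] = NULL;
	}
}

/* Create an instance, setting up submodules in ascending id order. */
static struct vnet *
vnet_sketch_alloc(void)
{
	struct vnet *vnet = calloc(1, sizeof(*vnet));
	if (vnet == NULL)
		return (NULL);
	for (int id = 0; id < VNET_MOD_MAX; id++) {
		const struct submod_info *smi = registered[id];
		if (smi == NULL)
			continue;
		if (smi->size != 0) {
			vnet->mod_data[id] = calloc(1, smi->size);
			if (vnet->mod_data[id] == NULL) {
				/* Undo only what was already set up. */
				vnet_sketch_teardown(vnet, id);
				free(vnet);
				return (NULL);
			}
		}
		vnet->init_log[vnet->init_count++] = id;
		if (smi->init != NULL)
			smi->init(vnet);
	}
	return (vnet);
}

static void
vnet_sketch_free(struct vnet *vnet)
{
	vnet_sketch_teardown(vnet, VNET_MOD_MAX);
	free(vnet);
}
```

Registration order does not matter in this model; only the ids decide
the order in which the init and destroy hooks run.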

Each process is associated with a vimage, and vimages currently hang off of
ucred-s.  This relationship defines a process's administrative affinity
to a vimage and thus indirectly to all of its modules (NET, CPU, PROCG)
as well as to any submodules.  All network interfaces and sockets hold
pointers back to their parent vnets; this relationship is obviously entirely
independent from proc->ucred->vimage bindings.  Hence, when a process
opens a socket, the socket will get bound to a vnet instance hanging off of
proc->ucred->vimage->vnet, but once such a socket-vnet binding gets
established, it cannot be changed for the entire socket lifetime.  Certain
classes of network interfaces (Ethernet in particular) can be assigned
from one vnet to another at any time.  By definition all vnets are
independent and can communicate only if they are explicitly provided
with communication paths; currently only netgraph can be used to establish
inter-vnet datapaths.
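
The bindings described above can be modelled with a few toy structures.
All field names (p_ucred, cr_vimage, v_net, so_vnet, if_vnet) and both
helper functions are assumptions made for this sketch and are not taken
from the kernel sources.  The point is only that a socket copies its vnet
pointer once, at creation, while an interface's vnet pointer can be
rewritten later.

```c
#include <assert.h>
#include <stddef.h>

struct vnet   { int id; };
struct vimage { struct vnet *v_net; };		/* one major module shown */
struct ucred  { struct vimage *cr_vimage; };
struct proc   { struct ucred *p_ucred; };

struct socket { struct vnet *so_vnet; };	/* set once at creation */
struct ifnet  { struct vnet *if_vnet; };	/* may be reassigned */

/*
 * Hypothetical socket creation: follow proc->ucred->vimage->vnet once
 * and remember the result for the lifetime of the socket.
 */
static void
sketch_socreate(const struct proc *p, struct socket *so)
{
	assert(p != NULL && so != NULL);
	so->so_vnet = p->p_ucred->cr_vimage->v_net;
}

/* Hypothetical interface move from its current vnet to another one. */
static void
sketch_if_move(struct ifnet *ifp, struct vnet *to)
{
	assert(ifp != NULL && to != NULL);
	ifp->if_vnet = to;
}
```

No function is provided to rebind a socket, which mirrors the rule that
the socket-vnet binding is fixed; if the process later ends up with
different credentials, only sockets opened afterwards see the new vnet.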

In network traffic processing the vnet affinity is defined either by the
inbound interface or by the socket / pcb - vnet binding.  However, there
are many functions in the network stack that cannot implicitly fetch
the vnet context from their standard arguments.  Instead of explicitly
extending argument lists of such functions with a struct vnet *,
a per-thread variable td_vnet was introduced, which can be fetched via
the curvnet macro (#define curvnet curthread->td_vnet).  The curvnet
context has to be set on entry to the network stack (socket operations,
packet reception, or timer-driven functions) and cleared on exit.  This
must be done via the provided CURVNET_SET() / CURVNET_RESTORE() family of
macros, which allow for stacking of curvnet context setting and 
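
The set-on-entry, restore-on-exit pattern for curvnet might look roughly
like the sketch below.  This is a simplified userland model and not the
kernel's macro definitions: the single global thread0, the saved_vnet
local and both entry-point functions are assumptions for illustration.
Saving the previous value in a local variable is what lets nested entries
stack.

```c
#include <assert.h>
#include <stddef.h>

struct vnet   { int id; };
struct thread { struct vnet *td_vnet; };

/* Toy stand-in for the per-thread state: a single global "thread". */
static struct thread thread0;
#define curthread (&thread0)
#define curvnet   (curthread->td_vnet)

/* Assumed simplified forms: save the old context, then switch. */
#define CURVNET_SET(arg)			\
	struct vnet *saved_vnet = curvnet;	\
	curvnet = (arg)
#define CURVNET_RESTORE()			\
	curvnet = saved_vnet

static int inner_seen;		/* vnet id observed in the nested entry */
static int outer_seen_after;	/* vnet id observed after it returned */

/* Hypothetical timer-driven entry point. */
static void
sketch_timer(struct vnet *vnet)
{
	assert(vnet != NULL);
	CURVNET_SET(vnet);
	inner_seen = curvnet->id;
	CURVNET_RESTORE();
}

/* Hypothetical packet-reception entry point that nests another entry. */
static void
sketch_input(struct vnet *if_vnet, struct vnet *other)
{
	assert(if_vnet != NULL);
	CURVNET_SET(if_vnet);
	sketch_timer(other);		/* nested set and restore */
	outer_seen_after = curvnet->id;	/* outer context is back */
	CURVNET_RESTORE();
}
```

Because CURVNET_SET() declares a local in this model, it can be used only
once per scope, and each function that sets the context must restore it
on every exit path.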

Re: Vimage howto

2008-12-08 Thread Bruce M. Simpson

Julian,

Thank you (and Marko) very much for preparing this document.

The VIMAGE import has had me at something of an impasse re: the IGMPv3
branch, and clearly written documentation is a big help indeed.


Julian Elischer wrote:

Well, not completely, but I've had a number of questions over the
last few months about what it is. Marko and I have written the
following "how to virtualize your module" document, and I've been
directing people to it. After another couple of questions I think
it could do with wider distribution.


Thank you also for providing it here on the list, as opposed to relying
on Perforce alone. Whilst I understand that committers rate p4 for
experimental work in the FreeBSD sphere, sadly it is simply not
accessible to the not-so-silent majority who are not committers,
which makes its continued use questionable at best.


regards,
BMS