> Maybe migrating the entire VM system would be neat, but I don't think
> it would be my top priority. With current technology (CSE, shared
> IUCV, shared DASD, etc) you can already build a set of VM systems that
> look similar enough that you don't care where the virtual machine is
> running.

I would concur. Moving CP is more complex than necessary. Moving
virtual machines is far easier by comparison, and given VM's new life
as the hosting platform, what we care about is the contents of the
virtual machines, not CP.

> The big challenge would be to define very clearly which virtual
> machines this would work for. You don't want something like V=R
> Recovery, which worked in theory, but whose requirements were such
> that it could rarely do so in real life.

From the discussions that Perry Ruiter and I had back in 2000 about
this, the biggest problems are in device state and connectivity,
followed by transferring memory pages that have been flushed out to
disk, which implies some controlled effort to do the process
migration. Uncontrolled migration attempts (e.g., crash recovery
situations) probably haven't a prayer of working.
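
To make "controlled" concrete, the shape this implies is the classic
pre-copy loop: iterate over the guest's dirty pages while it keeps
running, and only stun it for the final pass plus the device state.
A toy sketch in C -- every function here is a hypothetical stand-in,
not a CP service:

/* Toy model of a controlled ("pre-copy") migration loop.  All the
 * function names are hypothetical stand-ins, not CP or Linux APIs. */
#include <stdio.h>

#define THRESHOLD 16  /* stun when this few pages remain dirty */

/* pretend to send pages; the guest keeps re-dirtying a shrinking set */
static int send_dirty_pages(void)   { static int dirty = 1000; return dirty /= 4; }
static void stun_guest(void)        { puts("guest stopped"); }
static void send_device_state(void) { puts("device/connection state sent"); }
static void resume_on_target(void)  { puts("guest resumed on target"); }

int main(void)
{
    int dirty;
    /* copy while the guest runs, until the dirty set is small enough */
    while ((dirty = send_dirty_pages()) > THRESHOLD)
        printf("pass complete, %d pages still dirty\n", dirty);

    stun_guest();           /* brief outage: last pages + device state */
    send_dirty_pages();
    send_device_state();
    resume_on_target();
    return 0;
}

A crash leaves you at the "stun" step with no chance to have run the
loop, which is exactly why the uncontrolled case is hopeless.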

A key point would be that this would likely work only in situations
where the alternate system was cabled to all the same disks at the
same physical addresses, and with network connections cabled to the
same switches. The introduction of L2 support and IEEE VLAN support in
the OSA and in the VM and Linux TCP stacks makes the network takeover
part of this a lot easier, in that we can actually implement MAC-level
takeover if desired (the code is already in the Linux stack; there'd
be some development necessary for VM).
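
On the Linux side, the guest-visible piece of MAC-level takeover is
already just an ioctl on the interface. A minimal sketch, assuming a
layer-2 OSA connection and an interface named "eth0" (both the name
and the MAC below are made-up examples):

/* Assign the departed system's MAC to a local interface (needs root;
 * most drivers want the interface down while the address changes). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_arp.h>

int main(void)
{
    const unsigned char mac[6] = { 0x02, 0x00, 0x00, 0x12, 0x34, 0x56 };
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed name */
    ifr.ifr_hwaddr.sa_family = ARPHRD_ETHER;
    memcpy(ifr.ifr_hwaddr.sa_data, mac, 6);

    if (ioctl(fd, SIOCSIFHWADDR, &ifr) < 0) {
        perror("SIOCSIFHWADDR");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}

You'd follow that with a gratuitous ARP so the switches learn the new
port; the VM-side development I mentioned is the part that isn't
written yet.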

One area that might be interesting would be to investigate a Linux
device driver using CP *BLOCKIO services for disk I/O instead of
directly addressing the disks. There would be a performance impact, but
the additional layer of isolation would effectively remove any disk
geometry-sensitive components in the I/O layer, and it would have the
side effect of putting a lot more of the I/O state in a place where CP
could get at it more effectively. Of course, that introduces a problem
with moving IUCV connections, but I think that's relatively easily
solvable (i.e., what to do when an IUCV sever occurs and how to
redrive the I/O request transparently).
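
The redrive part is mostly bookkeeping: keep every in-flight request
queued until its completion arrives, and on a sever, reconnect and
resubmit whatever is still pending. A toy model in C -- the iucv_*
functions are hypothetical stand-ins, not the real IUCV interface:

/* Sever-and-redrive bookkeeping for *BLOCKIO-style requests. */
#include <stdio.h>

#define MAX_PENDING 8

struct bio_req { long block; int pending; };
static struct bio_req queue[MAX_PENDING];

static void iucv_connect(void) { puts("connected to *BLOCKIO"); }
static void iucv_send(struct bio_req *r) { printf("sent block %ld\n", r->block); }

static void submit(long block)             /* caller's I/O request */
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (!queue[i].pending) {
            queue[i].block = block;
            queue[i].pending = 1;
            iucv_send(&queue[i]);
            return;
        }
}

static void complete(long block)           /* completion message arrived */
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (queue[i].pending && queue[i].block == block)
            queue[i].pending = 0;
}

static void on_sever(void)                 /* connection died */
{
    iucv_connect();                        /* reconnect ... */
    for (int i = 0; i < MAX_PENDING; i++)
        if (queue[i].pending)
            iucv_send(&queue[i]);          /* ... and redrive survivors */
}

int main(void)
{
    iucv_connect();
    submit(100);
    submit(200);
    complete(100);
    on_sever();                            /* only block 200 is redriven */
    return 0;
}

Since the caller never sees the sever, the same trick covers the
migration case: the target system reconnects and redrives.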

Another area might be to do just Linux process migration to guests
already running on the alternate system. There is code in the Linux
OpenSSI toolkit to do that kind of thing for identically configured
systems (CPU, memory, I/O) in a sort of "stun, transfer, resume"
model, but it involves some fairly sophisticated kernel hacking to
activate it. See www.openssi.org if you're interested (it also has
some interesting functions for presenting guests as a single system
image).
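
The "stun" and "resume" halves of that model are just signal plumbing;
the hard part OpenSSI implements is the transfer of kernel state,
which the placeholder below elides. A minimal local illustration in C
(transfer_state() is hypothetical):

/* Stun a process, pretend to ship its state, resume it. */
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

static void transfer_state(pid_t pid)      /* placeholder for the hard part */
{
    printf("would checkpoint and ship state of pid %d here\n", (int)pid);
}

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {                        /* the process being "migrated" */
        for (;;) { puts("working"); sleep(1); }
    }
    sleep(2);
    kill(pid, SIGSTOP);                    /* stun */
    transfer_state(pid);                   /* transfer */
    kill(pid, SIGCONT);                    /* resume (locally, in this toy) */
    sleep(2);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 0;
}

In the real thing the resume happens in a freshly created process on
the target system, which is why the two systems have to be identically
configured.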

-- db
