Re: [Libguestfs] Splitting up virt-v2v

Martin Kletzander Wed, 17 Mar 2021 03:25:12 -0700

On Wed, Nov 25, 2020 at 10:29:45AM +0000, Richard W.M. Jones wrote:

For a long time I've wanted to split up virt-v2v into smaller
components to make it easier to consume.  It's never been clear how to
do this, but I think I have a workable plan now, described in this email.


Fittingly, this reply has been a long time coming as well.  I know we actually
talked about all this on IRC, but I then planned to post the summary here and
never did.  Unfortunately I cannot find the conversation in my IRC logs any
more, so I will try to remember as many things as possible.

----------------------------------------------------------------------

First, the AIMS, which are:

(a) Preserve current functionality, including copying conversion,
   in-place conversion, and the virt-v2v command line.

(b) Allow warm migration to use virt-v2v without requiring the
   "--debug-overlays hack".

(c) Allow threads, multi-conn, and parallel copying of guest disks, all
   for better copying performance.

(d) Allow an alternate supervisor to convert and copy many guests in
   parallel, given that the supervisor has a global view of the
   system/network (I'm not intending to implement this, only to make
   it possible).

(e) Better progress bars.

(f) Better logging.

(g) Reuse as much existing code as possible.  This is NOT a rewrite!


So my idea was that this could be split into phases similar to what is shown in
v2v/types.mli, separately for each input and output object, plus some core
functionality.  Any internal state could be kept in a specific file so that it
is accessible to all these helpers.  After that virt-v2v _could_ be implemented
by a shell script if someone wanted.  That is just to illustrate how usable the
pieces would be, not to claim that a shell script would actually be viable.

There are some command line options to limit virt-v2v to do only some phase or
skip a phase (--print-source, --no-copy, -o null) and the split would make this
more approachable.  The internal state serialised into a file does not have to
be parsed by anyone else, so it can be pretty relaxed when it comes to backward
compatibility (two input helpers will probably not need to be run from two
different versions of virt-v2v).

What would be nice to have exposed is the internal representation of all the
information needed to construct an output guest description.  It would have to
be in an extensible format, but it would not be prone to getting stale as often.
Think of it as `-o json` plus a few helpers that convert this type of
information into the output format.  What would be nice about this is that
supporting a new format would not require a new output type for each tiny
change.  Keeping an existing format up to date could also be easier because
the clear split could lower the barrier for developers to support their cloud
solution's format.  Of course I do not know how much of an issue this currently
is (or is not), but it seemed like a good idea to me.
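
To make the `-o json` idea concrete, here is a minimal sketch in Python.  It is
an illustration only: the field names (name, memory_mib, vcpus, disks) and the
render_summary helper are hypothetical and do not correspond to any existing
virt-v2v format.  The point it shows is that an output helper reads only the
keys it knows and ignores the rest, which is what keeps the format extensible.

```python
import json

# Hypothetical neutral guest description, as a conversion step might emit it.
# None of these field names come from virt-v2v; they are for illustration.
GUEST_JSON = """
{
  "name": "guest1",
  "memory_mib": 2048,
  "vcpus": 2,
  "disks": [{"path": "/var/tmp/guest1-sda", "format": "raw"}],
  "x-future-field": "ignored by older helpers"
}
"""


def render_summary(description: dict) -> str:
    """Toy output helper: turn the neutral description into one target format.

    Keys this helper does not know about are simply ignored.
    """
    lines = [
        f"name={description['name']}",
        f"memory={description['memory_mib']}M",
        f"vcpus={description['vcpus']}",
    ]
    for index, disk in enumerate(description.get("disks", [])):
        lines.append(f"disk{index}={disk['path']} ({disk['format']})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_summary(json.loads(GUEST_JSON)))
```

Supporting another target would then mean writing another small renderer over
the same description rather than adding a new output type to the core.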

----------------------------------------------------------------------

Here's my PLAN:

/usr/bin/virt-v2v still exists, but it's now a supervisor program
(possibly even a shell script) that runs the steps below:

(1) Set up the input side by running "helper-v2v-input-<type>".  For
   all input types this creates a temporary directory containing:

   /tmp/XXXXXX/in1    NBD endpoints overlaying the source disk(s)
   /tmp/XXXXXX/in2    (these are actually Unix domain sockets)
   /tmp/XXXXXX/in3
   /tmp/XXXXXX/metadata.in   Metadata parsed from the source.

   Currently for most inputs we have a running nbdkit process for
   each source disk, and we'd do the same here, except we add
   nbdkit-cow-filter on top so that the source disk is protected from
   being modified.  Another small difference is that for -i disk
   (local input) we would need an active nbdkit process on top of the
   disk, whereas currently we set the disk as a qcow2 backing file.

(2) Perform the conversion by running "helper-v2v-convert".  This does
   the conversion and sparsification.  It writes directly to the NBD
   endpoints (in*) above.  The writes are stored in the COW overlay
   so the source disk is not modified.


This would make it easy for someone to copy the disks themselves and then
provide them as nbd sockets if they want to modify the copies in place, locally,
*after* copying and without any extra cow layer on top.  That is good.

   Conversion will also create an output metadata file:

   /tmp/XXXXXX/metadata.out   Target metadata

   Exact format of the metadata files is to be decided, but some kind
   of not-quite-libvirt-XML may be suitable.  It's also not clear if
   the metadata format is an internal detail of virt-v2v, or if we
   document it as a stable API.

(3) Set up the output side by running "helper-v2v-output-<type>
   setup".  This will read the output metadata and do whatever is
   needed to set up the empty output disks (perhaps by creating a
   guest on the target, but also this could be done in step (5)
   below).

   This will create:

   /tmp/XXXXXX/out1    NBD endpoints overlaying the target disk(s)
   /tmp/XXXXXX/out2    (these are actually Unix domain sockets)
   /tmp/XXXXXX/out3

(4) Do the copy.  By default this will run either nbdcopy or qemu-img
   convert from in* -> out*.

   Copying could be done in parallel, currently it is done serially.

(5) Finalize the output by running "helper-v2v-output-<type> final".
   This might create the target guest and whatever else is needed.

(6) Kill the NBD servers and clean up the temporary directory.


Of course the split I suggested is not required; what you propose here would
work just as well.  I just wanted to share the idea I had because I thought it
could actually be easier to do, maintain, and future-proof.
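
Steps (1) to (5) of the plan can be sketched as a list of commands that a
supervisor would run in order.  This is only a sketch in Python under stated
assumptions: the helper names are the ones proposed above and do not exist yet,
the arguments passed to them (the working directory) are invented for
illustration, and the copy step assumes nbdcopy is given NBD URIs pointing at
the Unix domain sockets.  The function only builds the command lines; it does
not run anything.

```python
import shlex
from typing import List, Tuple


def nbd_uri(socket_path: str) -> str:
    """NBD URI for a server listening on a Unix domain socket."""
    return f"nbd+unix:///?socket={socket_path}"


def plan_steps(input_type: str, output_type: str, workdir: str,
               ndisks: int) -> List[Tuple[str, List[str]]]:
    """Return (step name, argv) pairs for steps (1) to (5) of the plan.

    The helper programs are the proposed ones; their arguments here are
    hypothetical.
    """
    steps = [
        ("input", [f"helper-v2v-input-{input_type}", workdir]),
        ("convert", ["helper-v2v-convert", workdir]),
        ("output-setup", [f"helper-v2v-output-{output_type}", "setup", workdir]),
    ]
    # One copy per disk, from in<N> to out<N>.
    for i in range(1, ndisks + 1):
        steps.append((f"copy-disk{i}",
                      ["nbdcopy",
                       nbd_uri(f"{workdir}/in{i}"),
                       nbd_uri(f"{workdir}/out{i}")]))
    steps.append(("output-final",
                  [f"helper-v2v-output-{output_type}", "final", workdir]))
    return steps


if __name__ == "__main__":
    for name, argv in plan_steps("disk", "local", "/tmp/XXXXXX", 2):
        print(f"{name}: {shlex.join(argv)}")
```

Step (6), killing the NBD servers and removing the temporary directory, is left
out because it is cleanup rather than a command of its own.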

----------------------------------------------------------------------

Let's see how this plan matches the aims.

Aim (a):

 Copying conversion works as outlined above.  In-place conversion
 works by placing an NBD server on top of the files you want to
 convert and running helper-v2v-convert (virt-v2v --in-place would
 also still work for backwards compat).


I remember --in-place doing some input-related shenanigans that made it
different from "just convert this".  But I think keeping the original --in-place
will not cause any issues.

Aim (b):

 Warm migration: Should be fairly clear this can work in the same way
 as in-place conversion, but I'll discuss this further with Martin K
 and Tomas to make sure I'm not missing anything.


The separation of steps works a bit better.  I think keeping the pre-checks and
everything is good.  It's just that when one is messing with the workflow, it
is easier to plug various phases together at different times when they are
split further apart.  I would imagine high-level debugging by a non-expert is
easier as well.

Aims (c), (d):

 Threads etc for performance: Although I don't plan to implement
 this, it's clear that an alternate supervisor program could improve
 performance here by either doing copies of a single guest / multiple
 disks in parallel, but even better by having a global view of the
 system and doing copies of multiple guests' disks in parallel.

 This is outside the scope of the virt-v2v project, but in scope for
 something like MTV.


And easy to do with the split ;-)
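
As a rough illustration of how an alternate supervisor could copy several disks
at once, here is a minimal Python sketch using a thread pool.  It assumes the
in*/out* endpoints already exist and that nbdcopy accepts the two NBD URIs as
source and destination; the copy_all function and its parameters are
hypothetical.  The copy function is injectable so the scheduling can be tried
without any NBD servers running.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source NBD URI, destination NBD URI)


def nbdcopy_pair(pair: Pair) -> int:
    """Copy one disk with nbdcopy and return its exit status."""
    src, dst = pair
    return subprocess.run(["nbdcopy", src, dst]).returncode


def copy_all(pairs: List[Pair],
             copy_one: Callable[[Pair], int] = nbdcopy_pair,
             max_workers: int = 4) -> Dict[Pair, int]:
    """Run copy_one for every pair in parallel; map each pair to its status."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(copy_one, pairs))
    return dict(zip(pairs, codes))
```

A supervisor with a global view could pass the pairs for many guests to the
same pool and tune max_workers to what the network and storage can sustain.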

Aim (e):

 Better progress bars: nbdcopy should have support for
 machine-readable progress bars, once I push the changes.  It will
 mean no more need to parse debug logs.

Aim (f):

 Better logging: I hope we can log each step separately.

 A custom supervisor program would also be able to tell which
 particular step failed (eg. did it fail in conversion?  did it fail
 copying a disk and which one?)
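
A custom supervisor that logs each step separately and reports which step
failed might look roughly like the Python sketch below.  The run_steps function,
the step names, and the one-log-file-per-step layout are assumptions made for
illustration; nothing here is an existing virt-v2v interface.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Tuple


def run_steps(steps: List[Tuple[str, List[str]]], logdir: str) -> Optional[str]:
    """Run steps in order, writing <logdir>/<step name>.log for each one.

    Returns the name of the first step that exits non-zero, or None if all
    steps succeed.  Later steps are not run after a failure.
    """
    logs = Path(logdir)
    logs.mkdir(parents=True, exist_ok=True)
    for name, argv in steps:
        with open(logs / f"{name}.log", "wb") as log:
            # Capture stdout and stderr of this step in its own log file.
            result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            return name
    return None


if __name__ == "__main__":
    import tempfile

    # Stand-in commands; a real supervisor would run the helpers and copies.
    demo = [
        ("convert", [sys.executable, "-c", "print('converted')"]),
        ("copy-disk1", [sys.executable, "-c", "raise SystemExit(1)"]),
    ]
    failed = run_steps(demo, tempfile.mkdtemp())
    print("failed step:", failed)
```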

Aim (g):

 This works by splitting up the existing v2v code base into separate
 binaries.  It is already broadly structured (internally) like this.
 So it's not a rewrite, it's a big refactoring.

 However I'd probably write a new virt-v2v supervisor binary, because
 the existing command line parsing code is extremely complex.


Sounds similar to what I thought.  And if it is simplified, then virt-v2v can
just forward arguments to appropriate places.  Sounds good.

I do not know if I mentioned everything and I am not sure how deep we went into
some of the details, but I guess the best way will show up later on no matter
what.

Once again sorry for such a late public reply, hopefully this will at least keep
part of the conversation archived =)

Have a great day,
Martin
