Avihai Horon <avih...@nvidia.com> wrote:

>> You have a point here.
>> But I will approach this case in a different way:
>>
>> Destination QEMU needs to be older, because it doesn't have the feature.
>> So we need to NOT be able to do the switchover for older machine
>> types.
>> And have something like this in qemu/hw/machine.c
>>
>> GlobalProperty hw_compat_7_2[] = {
>>      { "our_device", "explicit-switchover", "off" },
>> };
>>
>> Or whatever we want to call the device and the property, and disable it
>> for older machine types so that migration still works for them.
>
> Let me see if I get this straight (I'm not that familiar with
> hw_compat_x_y):
>
> You mean that device Y, which adds support for explicit-switchover in
> QEMU version Z, should add a property like you wrote above, and use it
> to disable explicit-switchover usage for Y devices when a Y device from
> a QEMU older than Z is migrated?

More "to" than "from".

Let me elaborate.  We have two QEMUs:

QEMU version X has device dev.  Let's call it qemu-X.
QEMU version Y (X+1) adds feature foo to device dev.  Let's call it qemu-Y.

We have two machine types (for this exercise we don't care about
architectures):

PC-X.0
PC-Y.0

So, the possible combinations are:

First the easy cases: same QEMU on both sides, different machine types.

$ qemu-X -M PC-X.0   -> to -> qemu-X -M PC-X.0

  good.  Neither side uses feature foo.

$ qemu-X -M PC-Y.0   -> to -> qemu-X -M PC-Y.0

  impossible.  qemu-X doesn't have machine type PC-Y.0, so nothing to
  see here.

$ qemu-Y -M PC-X.0   -> to -> qemu-Y -M PC-X.0

  good.  We have feature foo on both sides.  Notice that I recommend
  not using feature foo here.  We will see why in the difficult cases.

$ qemu-Y -M PC-Y.0   -> to -> qemu-Y -M PC-Y.0

  good.  Both sides use feature foo.

Now the difficult cases, when we mix QEMU versions.

$ qemu-X -M PC-X.0  -> to -> qemu-Y -M PC-X.0

  The source doesn't have feature foo; the destination does.
  But if we disable it for machine PC-X.0, it will work.

$ qemu-Y -M PC-X.0  -> to -> qemu-X -M PC-X.0

  Same as the previous example, but here we have feature foo on the
  source and not on the destination.  Disabling it for machine PC-X.0
  fixes the problem.

We can't migrate a PC-Y.0 machine when one of the QEMUs is qemu-X, so
that case is impossible.

Does this make more sense?

And now, how hw_compat_X_Y works.

It is an array of records with this format:

- name of device (we give ourselves some rope here; for instance,
  migration counts as a device in this context)

- name of property: self-explanatory.  The important bit is that
  we can get the value of the property in the device driver.

- value of the property: self-explanatory.
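
For reference, each entry is (roughly) this struct.  A simplified
sketch of QEMU's GlobalProperty (see include/hw/qdev-core.h; the real
one carries a couple of extra bookkeeping fields):

typedef struct GlobalProperty {
    const char *driver;    /* device name; "migration" counts as one here */
    const char *property;  /* property name, readable from the device code */
    const char *value;     /* value forced for the affected machine types */
} GlobalProperty;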

With this mechanism, what we do when we add a feature to a device that
matters for migration is:
- for the machine type of the version we are "developing", the feature
  is enabled by default, for whatever "enabled" means for that feature.

- for old machine types we disable the feature, so migration to/from an
  old QEMU keeps working, as long as the old machine type is used.

- there is a way to enable the feature on the command line even for old
  machine types on a new QEMU, but only developers use that, for
  testing.  Normal users/admins never do that.
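
To make this concrete, here is a sketch of the device side.  The
device, state struct and field names are invented for this example;
DEFINE_PROP_BOOL is the usual qdev way of declaring such a property:

#include "hw/qdev-properties.h"

/* Hypothetical device state; names made up for illustration. */
typedef struct OurDeviceState {
    DeviceState parent_obj;
    bool explicit_switchover;  /* backs the "explicit-switchover" property */
} OurDeviceState;

static Property our_device_properties[] = {
    /* Default "on" for new machine types; a hw_compat entry like the
     * one quoted above flips it to "off" for old machine types. */
    DEFINE_PROP_BOOL("explicit-switchover", OurDeviceState,
                     explicit_switchover, true),
    DEFINE_PROP_END_OF_LIST(),
};

/* At migration time the device simply honours the field. */
static bool our_device_use_explicit_switchover(OurDeviceState *s)
{
    return s->explicit_switchover;
}

And the command-line override mentioned above is the standard -global
switch, e.g. "-global our_device.explicit-switchover=on" on a new QEMU
running an old machine type.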

So what does hw_compat_7_2 mean?

OK, we need to know the versions.  The new version is 8.0.

hw_compat_7_2 has all the properties representing "features", defaults,
whatever has changed since 7.2.  In other words, it lists what we need
to disable to get back to the feature set that existed when 7.2 was
released.
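
Machine types pick these arrays up cumulatively: each versioned machine
type first applies the options of the next newer one, then adds its own
compat entries.  Roughly this pattern (sketched after hw/i386/pc_piix.c;
the exact function names vary):

static void pc_i440fx_7_2_machine_options(MachineClass *m)
{
    /* Inherit everything from the newer machine type first... */
    pc_i440fx_8_0_machine_options(m);
    /* ...then add the entries that roll features back to 7.2. */
    compat_props_add(m->compat_props, hw_compat_7_2, hw_compat_7_2_len);
    compat_props_add(m->compat_props, pc_compat_7_2, pc_compat_7_2_len);
}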

Let's go to a real example.

In the development tree we have:

GlobalProperty hw_compat_8_0[] = {
    { "migration", "multifd-flush-after-each-section", "on"},
};
const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
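
"migration" can appear as the device name because MigrationState is
itself a QOM object with qdev properties.  Paraphrased from
migration/migration.c (the real declarations may differ slightly):

/* Default "off" (new algorithm); hw_compat_8_0 turns it "on" for old
 * machine types. */
DEFINE_PROP_BOOL("multifd-flush-after-each-section", MigrationState,
                 multifd_flush_after_each_section, false),

/* Helper the RAM migration code uses to pick the algorithm. */
bool migrate_multifd_flush_after_each_section(void)
{
    MigrationState *s = migrate_get_current();

    return s->multifd_flush_after_each_section;
}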

Feature is implemented in the following commits:

77c259a4cb1c9799754b48f570301ebf1de5ded8
b05292c237030343516d073b1a1e5f49ffc017a8
294e5a4034e81b3d8db03b4e0f691386f20d6ed3

When we are doing migration with multifd and we pass the end of memory
(i.e. we finish one iteration through all the RAM), we need to make sure
that we don't send the same page through two channels, i.e. the contents
of the page at iteration 1 through channel 1 and the contents of the
page at iteration 2 through channel 2.  The problem is that they could
arrive out of order: the page from iteration 1 could arrive later than
the one from iteration 2 and overwrite new data with old data, which is
undesirable.
We could use complex algorithms to fix that, but one easy way of doing
it is:

- When we finish a run through all memory (i.e. one iteration), we flush
  all channels and make sure that everything has arrived at the
  destination before we start sending data of the next iteration.  I
  call that synchronizing all channels.

And that is what we *should* have done.  But when I implemented the
feature, I did this synchronization every time we finish a cycle
(around 100 milliseconds), i.e. 10 times per second.  Such a cycle is
called a section for historical reasons.  And when you are migrating
multi-terabyte RAM machines with very fast networking, we end up
waiting too long on the synchronizations.

Once we detected the problem and found the cause, we changed that.  The
problem is that if we run an old QEMU against a new QEMU (or
vice versa), we would not be able to migrate, because one side
sends/expects synchronizations at different points.

So we have to maintain the old algorithm and the new algorithm.  That
is what we did here.  For machine types older than <current in
development>, i.e. 8.0, we use the old algorithm
(multifd-flush-after-each-section is "on").

But the default for new machine types is the new algorithm, much faster.
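
In code, the decision point in the RAM save path ends up looking roughly
like this (a sketch, not the literal upstream diff):

/* Old behaviour: sync at the end of every section (~10 times/second). */
if (migrate_multifd_flush_after_each_section()) {
    multifd_send_sync_main(f);
}

/* New behaviour: sync only when a full pass over all RAM completes. */
if (complete_round && !migrate_multifd_flush_after_each_section()) {
    multifd_send_sync_main(f);
    /* Tell the destination to sync its channels at the same point. */
    qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
}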

I know that the explanation has been quite long, but inventing an
example would have been even more complex.

Does this make sense?

Later, Juan.

