Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-11-16 Thread Dor Laor

On 10/28/2009 08:54 AM, Michael Goldish wrote:
> - "Dor Laor"  wrote:
> > On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:
> > > Hi Michael, I am reviewing your patchset and have just a minor
> > > remark to make here:
> > >
> > > On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish  wrote:
> > >> This patch adds a new test that checks the timedrift introduced by
> > >> migrations.
> > >> It uses the same parameters used by the timedrift test to get the
> > >> guest time.
> > >> In addition, the number of migrations the test performs is
> > >> controlled by the parameter 'migration_iterations'.
> > >>
> > >> Signed-off-by: Michael Goldish
> > >> ---
> > >>  client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
> > >>  client/tests/kvm/tests/timedrift_with_migration.py |   95 
> > >>  2 files changed, 115 insertions(+), 13 deletions(-)
> > >>  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
> > >>
> > >> diff --git a/client/tests/kvm/kvm_tests.cfg.sample
> > >> b/client/tests/kvm/kvm_tests.cfg.sample
> > >> index 540d0a2..618c21e 100644
> > >> --- a/client/tests/kvm/kvm_tests.cfg.sample
> > >> +++ b/client/tests/kvm/kvm_tests.cfg.sample
> > >> @@ -100,19 +100,26 @@ variants:
> > >>         type = linux_s3
> > >>
> > >>     - timedrift:    install setup
> > >> -        type = timedrift
> > >>         extra_params += " -rtc-td-hack"
> > >> -        # Pin the VM and host load to CPU #0
> > >> -        cpu_mask = 0x1
> > >> -        # Set the load and rest durations
> > >> -        load_duration = 20
> > >> -        rest_duration = 20
> > >> -        # Fail if the drift after load is higher than 50%
> > >> -        drift_threshold = 50
> > >> -        # Fail if the drift after the rest period is higher than 10%
> > >> -        drift_threshold_after_rest = 10
> > >> -        # For now, make sure this test is executed alone
> > >> -        used_cpus = 100
> > >> +        variants:
> > >> +            - with_load:
> > >> +                type = timedrift
> > >> +                # Pin the VM and host load to CPU #0
> > >> +                cpu_mask = 0x1
> >
> > Let's use -smp 2 always.
>
> We can also just make -smp 2 the default for all tests. Does that sound
> good?

Yes.

> > btw: we need not run the load test in parallel with standard tests.
>
> We already don't, because the load test has used_cpus = 100 which
> forces it to run alone.

Soon I'll have 100 CPUs on my laptop :), better change it to -1 or MAX_INT.

> > >> +                # Set the load and rest durations
> > >> +                load_duration = 20
> > >> +                rest_duration = 20
> > >
> > > Even the default duration here seems way too brief; is there any
> > > reason why 20s was chosen instead of, let's say, 1800s? I am under
> > > the impression that 20s of load won't be enough to cause any
> > > noticeable drift...
> > >
> > >> +                # Fail if the drift after load is higher than 50%
> > >> +                drift_threshold = 50
> > >> +                # Fail if the drift after the rest period is higher than 10%
> > >> +                drift_threshold_after_rest = 10
> > >
> > > I am also curious about those thresholds and the reasoning behind
> > > them. Is there any official agreement on what we consider to be an
> > > unreasonable drift?
> > >
> > > Another thing that struck me is the drift calculation: in the
> > > original timedrift test, the guest drift is normalized against the
> > > host drift:
> > >
> > > drift = 100.0 * (host_delta - guest_delta) / host_delta
> > >
> > > While in the new drift tests, we consider only the guest drift. I
> > > believe it's better to normalize all tests on a single drift
> > > calculation criterion; those values should be reviewed, and at
> > > least a certain level of agreement should be reached in our
> > > development community.
> >
> > I think we don't need to calculate a drift ratio. We should define a
> > threshold in seconds, let's say 2 seconds. Beyond that, there should
> > not be any drift.
>
> Are you talking about the timedrift with load, or the timedrift with
> migration or reboot tests?  I was told that when running the load test
> for e.g. 60 secs, the drift should be given in % of that duration.
> In the case of migration and reboot, absolute durations are used (in
> seconds, no %).  Should we do that in the load test too?

Yes, but: during extreme load, we do expect that a guest *without* pv
clock will drift and won't be able to catch up until the load stops;
only then will it catch up. So my recommendation is to do the following:
- pvclock guest (can check with 'cat
  /sys/devices/system/clocksource/clocksource0/current_clocksource'):
  don't allow drift during huge loads.
  pvclock exists (and is safe) for RHEL 5.4 guests and for upstream
  kernels since ~2.6.29 (first introduced around 2.6.27).
- non-pv clock guest: run the load, stop the load, wait 5 seconds,
  then measure time.

For both, use absolute times.
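
For illustration, the branching could look roughly like this (a sketch
only; `session` stands for a logged-in guest shell session with a
get_command_output() method, as in the kvm-autotest client utilities,
and the function names are mine, not the patch's):

    CLOCKSOURCE = "/sys/devices/system/clocksource/clocksource0/current_clocksource"

    def guest_uses_pvclock(session):
        # 'kvm-clock' is the paravirtual clocksource seen inside KVM guests.
        return session.get_command_output("cat %s" % CLOCKSOURCE).strip() == "kvm-clock"

    def drift_policy(session):
        if guest_uses_pvclock(session):
            # pvclock guests should not drift even while loaded.
            return "check_during_load"
        # Non-pv guests may lag under load and catch up afterwards:
        # stop the load, wait ~5 seconds, then measure.
        return "check_after_rest"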





> > Do we support migration to a different host? We should, especially in
> > this test. The destination host's reading should also be used.
> > Apart from that, good patchset, and good thing you refactored some of
> > the code to shared utils.
>
> We don't, and it would be very messy to implement with the framework
> right now.  We should probably do that as some sort of server side test,
> but we don't have server side tests right now, so doing it may take a
> little time and effort.  I got the impression that there are more
> important things to do at the moment, but please correct me if I'm wrong.

Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-10-27 Thread Michael Goldish

- "Dor Laor"  wrote:

> On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:
> > Hi Michael, I am reviewing your patchset and have just a minor
> > remark to make here:
> >
> > On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish  wrote:
> >> This patch adds a new test that checks the timedrift introduced by
> >> migrations.
> >> It uses the same parameters used by the timedrift test to get the
> >> guest time.
> >> In addition, the number of migrations the test performs is
> >> controlled by the parameter 'migration_iterations'.
> >>
> >> Signed-off-by: Michael Goldish
> >> ---
> >>   client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
> >>   client/tests/kvm/tests/timedrift_with_migration.py |   95 
> >>   2 files changed, 115 insertions(+), 13 deletions(-)
> >>   create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
> >>
> >> diff --git a/client/tests/kvm/kvm_tests.cfg.sample
> >> b/client/tests/kvm/kvm_tests.cfg.sample
> >> index 540d0a2..618c21e 100644
> >> --- a/client/tests/kvm/kvm_tests.cfg.sample
> >> +++ b/client/tests/kvm/kvm_tests.cfg.sample
> >> @@ -100,19 +100,26 @@ variants:
> >>          type = linux_s3
> >>
> >>      - timedrift:    install setup
> >> -        type = timedrift
> >>          extra_params += " -rtc-td-hack"
> >> -        # Pin the VM and host load to CPU #0
> >> -        cpu_mask = 0x1
> >> -        # Set the load and rest durations
> >> -        load_duration = 20
> >> -        rest_duration = 20
> >> -        # Fail if the drift after load is higher than 50%
> >> -        drift_threshold = 50
> >> -        # Fail if the drift after the rest period is higher than 10%
> >> -        drift_threshold_after_rest = 10
> >> -        # For now, make sure this test is executed alone
> >> -        used_cpus = 100
> >> +        variants:
> >> +            - with_load:
> >> +                type = timedrift
> >> +                # Pin the VM and host load to CPU #0
> >> +                cpu_mask = 0x1
>
> Let's use -smp 2 always.

We can also just make -smp 2 the default for all tests. Does that sound
good?

> btw: we need not run the load test in parallel with standard tests.

We already don't, because the load test has used_cpus = 100 which
forces it to run alone.

> >> +                # Set the load and rest durations
> >> +                load_duration = 20
> >> +                rest_duration = 20
> >
> > Even the default duration here seems way too brief; is there any
> > reason why 20s was chosen instead of, let's say, 1800s? I am under
> > the impression that 20s of load won't be enough to cause any
> > noticeable drift...
> >
> >> +                # Fail if the drift after load is higher than 50%
> >> +                drift_threshold = 50
> >> +                # Fail if the drift after the rest period is higher than 10%
> >> +                drift_threshold_after_rest = 10
> >
> > I am also curious about those thresholds and the reasoning behind
> > them. Is there any official agreement on what we consider to be an
> > unreasonable drift?
> >
> > Another thing that struck me is the drift calculation: in the
> > original timedrift test, the guest drift is normalized against the
> > host drift:
> >
> > drift = 100.0 * (host_delta - guest_delta) / host_delta
> >
> > While in the new drift tests, we consider only the guest drift. I
> > believe it's better to normalize all tests on a single drift
> > calculation criterion; those values should be reviewed, and at
> > least a certain level of agreement should be reached in our
> > development community.
> 
> I think we don't need to calculate a drift ratio. We should define a
> threshold in seconds, let's say 2 seconds. Beyond that, there should
> not be any drift.

Are you talking about the timedrift with load, or the timedrift with
migration or reboot tests?  I was told that when running the load test
for e.g. 60 secs, the drift should be given in % of that duration.
In the case of migration and reboot, absolute durations are used (in
seconds, no %).  Should we do that in the load test too?

> Do we support migration to a different host? We should, especially in
> this test. The destination host's reading should also be used.
> Apart from that, good patchset, and good thing you refactored some of
> the code to shared utils.

We don't, and it would be very messy to implement with the framework
right now.  We should probably do that as some sort of server side test,
but we don't have server side tests right now, so doing it may take a
little time and effort.  I got the impression that there are more
important things to do at the moment, but please correct me if I'm wrong.

> >
> > Other than this concern that came to my mind, the new tests look
> > good and work fine here. I had to do a slight rebase in one of the
> > patches, very minor stuff. The default values and the drift
> > calculation can be changed at a later time. Thanks!

Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-10-27 Thread Dor Laor

On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:
> Hi Michael, I am reviewing your patchset and have just a minor remark
> to make here:
>
> On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish  wrote:
>> This patch adds a new test that checks the timedrift introduced by migrations.
>> It uses the same parameters used by the timedrift test to get the guest time.
>> In addition, the number of migrations the test performs is controlled by the
>> parameter 'migration_iterations'.
>>
>> Signed-off-by: Michael Goldish
>> ---
>>   client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
>>   client/tests/kvm/tests/timedrift_with_migration.py |   95 
>>   2 files changed, 115 insertions(+), 13 deletions(-)
>>   create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
>>
>> diff --git a/client/tests/kvm/kvm_tests.cfg.sample
>> b/client/tests/kvm/kvm_tests.cfg.sample
>> index 540d0a2..618c21e 100644
>> --- a/client/tests/kvm/kvm_tests.cfg.sample
>> +++ b/client/tests/kvm/kvm_tests.cfg.sample
>> @@ -100,19 +100,26 @@ variants:
>>          type = linux_s3
>>
>>      - timedrift:    install setup
>> -        type = timedrift
>>          extra_params += " -rtc-td-hack"
>> -        # Pin the VM and host load to CPU #0
>> -        cpu_mask = 0x1
>> -        # Set the load and rest durations
>> -        load_duration = 20
>> -        rest_duration = 20
>> -        # Fail if the drift after load is higher than 50%
>> -        drift_threshold = 50
>> -        # Fail if the drift after the rest period is higher than 10%
>> -        drift_threshold_after_rest = 10
>> -        # For now, make sure this test is executed alone
>> -        used_cpus = 100
>> +        variants:
>> +            - with_load:
>> +                type = timedrift
>> +                # Pin the VM and host load to CPU #0
>> +                cpu_mask = 0x1

Let's use -smp 2 always.
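
For illustration, making that the default could be a one-line change in
the same cfg idiom already used above for -rtc-td-hack (the placement is
hypothetical, not part of this patch):

    # Give every guest two vCPUs by default
    extra_params += " -smp 2"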

btw: we need not run the load test in parallel with standard tests.

>> +                # Set the load and rest durations
>> +                load_duration = 20
>> +                rest_duration = 20
>
> Even the default duration here seems way too brief; is there any
> reason why 20s was chosen instead of, let's say, 1800s? I am under the
> impression that 20s of load won't be enough to cause any noticeable
> drift...
>
>> +                # Fail if the drift after load is higher than 50%
>> +                drift_threshold = 50
>> +                # Fail if the drift after the rest period is higher than 10%
>> +                drift_threshold_after_rest = 10
>
> I am also curious about those thresholds and the reasoning behind them.
> Is there any official agreement on what we consider to be an
> unreasonable drift?
>
> Another thing that struck me is the drift calculation: in the original
> timedrift test, the guest drift is normalized against the host drift:
>
> drift = 100.0 * (host_delta - guest_delta) / host_delta
>
> While in the new drift tests, we consider only the guest drift. I
> believe it's better to normalize all tests on a single drift
> calculation criterion; those values should be reviewed, and at least a
> certain level of agreement should be reached in our development
> community.

I think we don't need to calculate a drift ratio. We should define a
threshold in seconds, let's say 2 seconds. Beyond that, there should not
be any drift.

Do we support migration to a different host? We should, especially in
this test. The destination host's reading should also be used.

Apart from that, good patchset, and good thing you refactored some of
the code to shared utils.

> Other than this concern that came to my mind, the new tests look good
> and work fine here. I had to do a slight rebase in one of the patches,
> very minor stuff. The default values and the drift calculation can be
> changed at a later time. Thanks!




Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-10-12 Thread Michael Goldish

- "Lucas Meneghel Rodrigues"  wrote:

> Hi Michael, I am reviewing your patchset and have just a minor remark
> to make here:
> 
> On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish 
> wrote:
> > This patch adds a new test that checks the timedrift introduced by
> > migrations.
> > It uses the same parameters used by the timedrift test to get the
> > guest time.
> > In addition, the number of migrations the test performs is
> > controlled by the parameter 'migration_iterations'.
> >
> > Signed-off-by: Michael Goldish 
> > ---
> >  client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
> >  client/tests/kvm/tests/timedrift_with_migration.py |   95 
> >  2 files changed, 115 insertions(+), 13 deletions(-)
> >  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
> >
> > diff --git a/client/tests/kvm/kvm_tests.cfg.sample
> > b/client/tests/kvm/kvm_tests.cfg.sample
> > index 540d0a2..618c21e 100644
> > --- a/client/tests/kvm/kvm_tests.cfg.sample
> > +++ b/client/tests/kvm/kvm_tests.cfg.sample
> > @@ -100,19 +100,26 @@ variants:
> >         type = linux_s3
> >
> >     - timedrift:    install setup
> > -        type = timedrift
> >         extra_params += " -rtc-td-hack"
> > -        # Pin the VM and host load to CPU #0
> > -        cpu_mask = 0x1
> > -        # Set the load and rest durations
> > -        load_duration = 20
> > -        rest_duration = 20
> > -        # Fail if the drift after load is higher than 50%
> > -        drift_threshold = 50
> > -        # Fail if the drift after the rest period is higher than 10%
> > -        drift_threshold_after_rest = 10
> > -        # For now, make sure this test is executed alone
> > -        used_cpus = 100
> > +        variants:
> > +            - with_load:
> > +                type = timedrift
> > +                # Pin the VM and host load to CPU #0
> > +                cpu_mask = 0x1
> > +                # Set the load and rest durations
> > +                load_duration = 20
> > +                rest_duration = 20
>
> Even the default duration here seems way too brief; is there any
> reason why 20s was chosen instead of, let's say, 1800s? I am under the
> impression that 20s of load won't be enough to cause any noticeable
> drift...

Apparently I've been working with a bad qemu version for quite a while,
because after 20s of load I often get a huge (80%) drift.  This normally
shouldn't happen.  We might want to wait a little longer than 20s, but
there's no need to wait as long as 1800s AFAIK.  The test is meant to
catch drift problems, and apparently when there's a problem, it reveals
itself quickly.  I'm not sure there are drift problems that reveal
themselves only after 1800s of load, but quite frankly, I know very
little about this, so the duration value should be tuned by the user.

> > +                # Fail if the drift after load is higher than 50%
> > +                drift_threshold = 50
> > +                # Fail if the drift after the rest period is higher than 10%
> > +                drift_threshold_after_rest = 10
> 
> I am also curious about those thresholds and the reasoning behind
> them. Is there any official agreement on what we consider to be an
> unreasonable drift?

I really don't know.  After asking around I got the impression that the
threshold should depend on the load.  Theoretically it should be possible
to get any amount of drift with enough load -- that's the way I see it,
but I'm really not sure.

Maybe the best thing for us to do is run the time drift test a few times
with functional qemu versions as well as with broken ones (but not as broken
as the one I'm using), so we can see what thresholds best differentiate
between functional and broken code.

> Another thing that struck me is the drift calculation: in the
> original timedrift test, the guest drift is normalized against the
> host drift:
>
> drift = 100.0 * (host_delta - guest_delta) / host_delta
>
> While in the new drift tests, we consider only the guest drift. I
> believe it's better to normalize all tests on a single drift
> calculation criterion; those values should be reviewed, and at least
> a certain level of agreement should be reached in our development
> community.

The new tests use the host clock as reference like the original test.
We check how much time passed on the host, and then how much time passed
in the guest, and take the difference between those two.  The result is an
absolute number of seconds because there's no "load duration" with which
to normalize the result.  We're just interested in how much drift is caused
by a single reboot or a single migration (or a few).  I don't think it
matters how long the reboot/migration procedure took -- we treat it as a
single discrete event.
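
To make that concrete, the migration variant's measurement presumably
boils down to something like the sketch below (the names are
illustrative stand-ins for the shared utility functions, not the
patch's actual code; `vm.migrate()` here means a localhost migration):

    import time

    def get_guest_time(session):
        # Stand-in: ask the guest for its epoch time; the real test uses
        # the configurable time command/regex of the timedrift params.
        return float(session.get_command_output("date +%s"))

    def migration_drift(vm, session, iterations=3):
        host_before = time.time()
        guest_before = get_guest_time(session)
        for _ in range(iterations):
            vm.migrate()  # illustrative migration helper
        host_delta = time.time() - host_before
        guest_delta = get_guest_time(session) - guest_before
        # Absolute drift in seconds; no normalization by any duration.
        return abs(host_delta - guest_delta)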

> Other than this concern that came to my mind, the new tests look good
> and work fine here. I had to do a slight rebase in one of the
> patches, very minor stuff. The default values and the drift
> calculation can be changed at a later time. Thanks!

Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-10-12 Thread Lucas Meneghel Rodrigues
Hi Michael, I am reviewing your patchset and have just a minor remark
to make here:

On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish  wrote:
> This patch adds a new test that checks the timedrift introduced by migrations.
> It uses the same parameters used by the timedrift test to get the guest time.
> In addition, the number of migrations the test performs is controlled by the
> parameter 'migration_iterations'.
>
> Signed-off-by: Michael Goldish 
> ---
>  client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
>  client/tests/kvm/tests/timedrift_with_migration.py |   95 
>  2 files changed, 115 insertions(+), 13 deletions(-)
>  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
>
> diff --git a/client/tests/kvm/kvm_tests.cfg.sample 
> b/client/tests/kvm/kvm_tests.cfg.sample
> index 540d0a2..618c21e 100644
> --- a/client/tests/kvm/kvm_tests.cfg.sample
> +++ b/client/tests/kvm/kvm_tests.cfg.sample
> @@ -100,19 +100,26 @@ variants:
>         type = linux_s3
>
>     - timedrift:    install setup
> -        type = timedrift
>         extra_params += " -rtc-td-hack"
> -        # Pin the VM and host load to CPU #0
> -        cpu_mask = 0x1
> -        # Set the load and rest durations
> -        load_duration = 20
> -        rest_duration = 20
> -        # Fail if the drift after load is higher than 50%
> -        drift_threshold = 50
> -        # Fail if the drift after the rest period is higher than 10%
> -        drift_threshold_after_rest = 10
> -        # For now, make sure this test is executed alone
> -        used_cpus = 100
> +        variants:
> +            - with_load:
> +                type = timedrift
> +                # Pin the VM and host load to CPU #0
> +                cpu_mask = 0x1
> +                # Set the load and rest durations
> +                load_duration = 20
> +                rest_duration = 20

Even the default duration here seems way too brief; is there any
reason why 20s was chosen instead of, let's say, 1800s? I am under the
impression that 20s of load won't be enough to cause any noticeable
drift...

> +                # Fail if the drift after load is higher than 50%
> +                drift_threshold = 50
> +                # Fail if the drift after the rest period is higher than 10%
> +                drift_threshold_after_rest = 10

I am also curious about those thresholds and the reasoning behind them.
Is there any official agreement on what we consider to be an
unreasonable drift?

Another thing that struck me is the drift calculation: in the original
timedrift test, the guest drift is normalized against the host drift:

drift = 100.0 * (host_delta - guest_delta) / host_delta

While in the new drift tests, we consider only the guest drift. I
believe it's better to normalize all tests on a single drift
calculation criterion; those values should be reviewed, and at least a
certain level of agreement should be reached in our development
community.
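
For reference, a small worked example of that normalization (the
numbers are made up): if the host clock advanced 60.0s during the load
while the guest clock advanced only 51.0s, then

    host_delta = 60.0
    guest_delta = 51.0
    drift = 100.0 * (host_delta - guest_delta) / host_delta
    # drift == 15.0, i.e. 15% -- under the default drift_threshold of 50,
    # but over the drift_threshold_after_rest of 10 if it persisted.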

Other than this concern that came to my mind, the new tests look good
and work fine here. I had to do a slight rebase in one of the patches,
very minor stuff. The default values and the drift calculation can be
changed at a later time. Thanks!
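
For context, after this patchset the timedrift section of
kvm_tests.cfg.sample plausibly ends up looking like the sketch below.
Only the with_load values appear in the quoted diff; the with_migration
defaults shown here are hypothetical, see the patch for the real ones:

    - timedrift:    install setup
            extra_params += " -rtc-td-hack"
            variants:
                - with_load:
                    type = timedrift
                    # Pin the VM and host load to CPU #0
                    cpu_mask = 0x1
                    # Set the load and rest durations
                    load_duration = 20
                    rest_duration = 20
                    # Fail if the drift after load is higher than 50%
                    drift_threshold = 50
                    # Fail if the drift after the rest period is higher than 10%
                    drift_threshold_after_rest = 10
                    # For now, make sure this test is executed alone
                    used_cpus = 100
                - with_migration:
                    type = timedrift_with_migration
                    # Hypothetical defaults
                    migration_iterations = 3
                    # Absolute drift threshold, in seconds
                    drift_threshold = 10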