Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration
On 10/28/2009 08:54 AM, Michael Goldish wrote:
> ----- "Dor Laor" wrote:
>> On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:
>>> Hi Michael, I am reviewing your patchset and have just a minor
>>> remark to make here:
>>>
>>> On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish wrote:
>>>> This patch adds a new test that checks the timedrift introduced
>>>> by migrations. It uses the same parameters used by the timedrift
>>>> test to get the guest time. In addition, the number of migrations
>>>> the test performs is controlled by the parameter
>>>> 'migration_iterations'.
>>>>
>>>> Signed-off-by: Michael Goldish
>>>> ---
>>>>  client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
>>>>  client/tests/kvm/tests/timedrift_with_migration.py |   95
>>>>  2 files changed, 115 insertions(+), 13 deletions(-)
>>>>  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
>>>>
>>>> diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
>>>> index 540d0a2..618c21e 100644
>>>> --- a/client/tests/kvm/kvm_tests.cfg.sample
>>>> +++ b/client/tests/kvm/kvm_tests.cfg.sample
>>>> @@ -100,19 +100,26 @@ variants:
>>>>                 type = linux_s3
>>>>
>>>>     - timedrift:    install setup
>>>> -        type = timedrift
>>>>         extra_params += " -rtc-td-hack"
>>>> -        # Pin the VM and host load to CPU #0
>>>> -        cpu_mask = 0x1
>>>> -        # Set the load and rest durations
>>>> -        load_duration = 20
>>>> -        rest_duration = 20
>>>> -        # Fail if the drift after load is higher than 50%
>>>> -        drift_threshold = 50
>>>> -        # Fail if the drift after the rest period is higher than 10%
>>>> -        drift_threshold_after_rest = 10
>>>> -        # For now, make sure this test is executed alone
>>>> -        used_cpus = 100
>>>> +        variants:
>>>> +            - with_load:
>>>> +                type = timedrift
>>>> +                # Pin the VM and host load to CPU #0
>>>> +                cpu_mask = 0x1
>>
>> Let's use -smp 2 always.
>
> We can also just make -smp 2 the default for all tests. Does that
> sound good?

Yes

>> btw: we need not run the load test in parallel with standard tests.
>
> We already don't, because the load test has used_cpus = 100 which
> forces it to run alone.

Soon I'll have 100 on my laptop :), better change it to -1 or MAX_INT

>>>> +                # Set the load and rest durations
>>>> +                load_duration = 20
>>>> +                rest_duration = 20
>>>
>>> Even the default duration here seems way too brief; is there any
>>> reason why 20s was chosen instead of, let's say, 1800s? I am under
>>> the impression that 20s of load won't be enough to cause any
>>> noticeable drift...
>>>
>>>> +                # Fail if the drift after load is higher than 50%
>>>> +                drift_threshold = 50
>>>> +                # Fail if the drift after the rest period is higher than 10%
>>>> +                drift_threshold_after_rest = 10
>>>
>>> I am also curious about those thresholds and the reasoning behind
>>> them. Is there any official agreement on what we consider to be an
>>> unreasonable drift?
>>>
>>> Another thing that struck me was drift calculation: on the
>>> original timedrift test, the guest drift is normalized against the
>>> host drift:
>>>
>>> drift = 100.0 * (host_delta - guest_delta) / host_delta
>>>
>>> While in the new drift tests, we consider only the guest drift. I
>>> believe it is better to normalize all tests on one drift
>>> calculation criterion; those values should be reviewed, and at
>>> least a certain level of agreement in our development community
>>> should be reached.
>>
>> I think we don't need to calculate a drift ratio. We should define
>> a threshold in seconds, let's say 2 seconds. Beyond that, there
>> should not be any drift.
>
> Are you talking about the timedrift-with-load test or the
> timedrift-with-migration/reboot tests? I was told that when running
> the load test for e.g. 60 secs, the drift should be given in % of
> that duration. In the case of migration and reboot, absolute
> durations are used (in seconds, no %). Should we do that in the load
> test too?

Yes, but: during extreme load, we do predict that a guest *without* pv
clock will drift and won't be able to catch up until the load stops;
only then will it catch up. So my recommendation is to do the
following:

- pvclock guest - can be checked with
  'cat /sys/devices/system/clocksource/clocksource0/current_clocksource';
  don't allow drift during huge loads. Exists (+safe) for RHEL 5.4
  guests and ~2.6.29 (from 2.6.27).
- non-pv clock - run the load, stop the load, wait 5 seconds, measure
  time.

For both, use absolute times.

>> Do we support migration to a different host? We should, especially
>> in this test too. The destination host reading should also be used.
>>
>> Apart from that, good patchset, and good thing you refactored some
>> of the code to shared utils.
>
> We don't, and it would be very messy to implement with the framework
> right now. We should probably do that as some sort of server side
> test, but we don't have server side tests right now, so doing it may
> take a little time and effort. I got the impression that there are
> more important things to do at the moment, but please correct me if
> I'm wrong.
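[The two-branch measurement policy recommended above can be sketched roughly as follows. This is an illustration only, not code from the patchset: `run_cmd` is a hypothetical callable standing in for whatever guest shell-session API the framework provides.]

```python
# Sketch of the clocksource-dependent drift-measurement policy
# suggested above. NOTE: run_cmd is a hypothetical stand-in for a
# guest shell session; the real kvm-autotest session API is not
# shown in this thread.

CLOCKSOURCE_PATH = (
    "/sys/devices/system/clocksource/clocksource0/current_clocksource")

def guest_clocksource(run_cmd):
    """Return the guest's active clocksource, e.g. 'kvm-clock' or 'tsc'."""
    return run_cmd("cat %s" % CLOCKSOURCE_PATH).strip()

def drift_policy(run_cmd):
    """Choose how to measure drift based on the guest clocksource.

    A pvclock (kvm-clock) guest should keep time even under heavy
    load, so drift can be measured while the load runs. A non-pv
    guest is expected to fall behind and catch up afterwards, so the
    load is stopped and the guest is given a few seconds to settle
    before measuring.
    """
    if guest_clocksource(run_cmd) == "kvm-clock":
        return {"measure_during_load": True, "settle_secs": 0}
    return {"measure_during_load": False, "settle_secs": 5}
```

[Either way, per the recommendation, the resulting drift would then be compared against an absolute threshold in seconds rather than a percentage.]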
Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration
- "Dor Laor" wrote: > On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote: > > Hi Michael, I am reviewing your patchset and have just a minor > remark > > to make here: > > > > On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish > wrote: > >> This patch adds a new test that checks the timedrift introduced by > migrations. > >> It uses the same parameters used by the timedrift test to get the > guest time. > >> In addition, the number of migrations the test performs is > controlled by the > >> parameter 'migration_iterations'. > >> > >> Signed-off-by: Michael Goldish > >> --- > >> client/tests/kvm/kvm_tests.cfg.sample | 33 > --- > >> client/tests/kvm/tests/timedrift_with_migration.py | 95 > > >> 2 files changed, 115 insertions(+), 13 deletions(-) > >> create mode 100644 > client/tests/kvm/tests/timedrift_with_migration.py > >> > >> diff --git a/client/tests/kvm/kvm_tests.cfg.sample > b/client/tests/kvm/kvm_tests.cfg.sample > >> index 540d0a2..618c21e 100644 > >> --- a/client/tests/kvm/kvm_tests.cfg.sample > >> +++ b/client/tests/kvm/kvm_tests.cfg.sample > >> @@ -100,19 +100,26 @@ variants: > >> type = linux_s3 > >> > >> - timedrift:install setup > >> -type = timedrift > >> extra_params += " -rtc-td-hack" > >> -# Pin the VM and host load to CPU #0 > >> -cpu_mask = 0x1 > >> -# Set the load and rest durations > >> -load_duration = 20 > >> -rest_duration = 20 > >> -# Fail if the drift after load is higher than 50% > >> -drift_threshold = 50 > >> -# Fail if the drift after the rest period is higher than > 10% > >> -drift_threshold_after_rest = 10 > >> -# For now, make sure this test is executed alone > >> -used_cpus = 100 > >> +variants: > >> +- with_load: > >> +type = timedrift > >> +# Pin the VM and host load to CPU #0 > >> +cpu_mask = 0x1 > > > Let's use -smp 2 always. We can also just make -smp 2 the default for all tests. Does that sound good? > btw: we need not to parallel the load test with standard tests. 
We already don't, because the load test has used_cpus = 100 which forces it to run alone. > >> +# Set the load and rest durations > >> +load_duration = 20 > >> +rest_duration = 20 > > > > Even the default duration here seems way too brief here, is there > any > > reason why 20s was chosen instead of, let's say, 1800s? I am under > the > > impression that 20s of load won't be enough to cause any noticeable > > drift... > > > >> +# Fail if the drift after load is higher than 50% > >> +drift_threshold = 50 > >> +# Fail if the drift after the rest period is > higher than 10% > >> +drift_threshold_after_rest = 10 > > > > I am also curious about those tresholds and the reasoning behind > them. > > Is there any official agreement on what we consider to be an > > unreasonable drift? > > > > Another thing that struck me out is drift calculation: On the > original > > timedrift test, the guest drift is normalized against the host > drift: > > > > drift = 100.0 * (host_delta - guest_delta) / host_delta > > > > While in the new drift tests, we consider only the guest drift. I > > believe is better to normalize all tests based on one drift > > calculation criteria, and those values should be reviewed, and at > > least a certain level of agreement on our development community > should > > be reached. > > I think we don't need to calculate drift ratio. We should define a > threshold in seconds, let's say 2 seconds. Beyond that, there should > not be any drift. Are you talking about the timedrift with load or timedrift with migration or reboot tests? I was told that when running the load test for e.g 60 secs, the drift should be given in % of that duration. In the case of migration and reboot, absolute durations are used (in seconds, no %). Should we do that in the load test too? > Do we support migration to a different host? We should, especially in > this test too. The destination host reading should also be used. 
> Apart for that, good patchset, and good thing you refactored some of > the code to shared utils. We don't, and it would be very messy to implement with the framework right now. We should probably do that as some sort of server side test, but we don't have server side tests right now, so doing it may take a little time and effort. I got the impression that there are more important things to do at the moment, but please correct me if I'm wrong. > > > > Other than this concern that came to my mind, the new tests look > good > > and work fine here. I had to do a slight rebase in one of the > patches, > > very minor stuff. The default values and the drift calculation can > be > > changed on a later time. Thanks! > > -- > > To unsubscribe fro
Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration
On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:
> Hi Michael, I am reviewing your patchset and have just a minor
> remark to make here:
>
> On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish wrote:
>> This patch adds a new test that checks the timedrift introduced by
>> migrations. It uses the same parameters used by the timedrift test
>> to get the guest time. In addition, the number of migrations the
>> test performs is controlled by the parameter
>> 'migration_iterations'.
>>
>> Signed-off-by: Michael Goldish
>> ---
>>  client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
>>  client/tests/kvm/tests/timedrift_with_migration.py |   95
>>  2 files changed, 115 insertions(+), 13 deletions(-)
>>  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
>>
>> diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
>> index 540d0a2..618c21e 100644
>> --- a/client/tests/kvm/kvm_tests.cfg.sample
>> +++ b/client/tests/kvm/kvm_tests.cfg.sample
>> @@ -100,19 +100,26 @@ variants:
>>                 type = linux_s3
>>
>>     - timedrift:    install setup
>> -        type = timedrift
>>         extra_params += " -rtc-td-hack"
>> -        # Pin the VM and host load to CPU #0
>> -        cpu_mask = 0x1
>> -        # Set the load and rest durations
>> -        load_duration = 20
>> -        rest_duration = 20
>> -        # Fail if the drift after load is higher than 50%
>> -        drift_threshold = 50
>> -        # Fail if the drift after the rest period is higher than 10%
>> -        drift_threshold_after_rest = 10
>> -        # For now, make sure this test is executed alone
>> -        used_cpus = 100
>> +        variants:
>> +            - with_load:
>> +                type = timedrift
>> +                # Pin the VM and host load to CPU #0
>> +                cpu_mask = 0x1

Let's use -smp 2 always.

btw: we need not run the load test in parallel with standard tests.

>> +                # Set the load and rest durations
>> +                load_duration = 20
>> +                rest_duration = 20
>
> Even the default duration here seems way too brief; is there any
> reason why 20s was chosen instead of, let's say, 1800s? I am under
> the impression that 20s of load won't be enough to cause any
> noticeable drift...
>
>> +                # Fail if the drift after load is higher than 50%
>> +                drift_threshold = 50
>> +                # Fail if the drift after the rest period is higher than 10%
>> +                drift_threshold_after_rest = 10
>
> I am also curious about those thresholds and the reasoning behind
> them. Is there any official agreement on what we consider to be an
> unreasonable drift?
>
> Another thing that struck me was drift calculation: on the original
> timedrift test, the guest drift is normalized against the host
> drift:
>
> drift = 100.0 * (host_delta - guest_delta) / host_delta
>
> While in the new drift tests, we consider only the guest drift. I
> believe it is better to normalize all tests on one drift calculation
> criterion; those values should be reviewed, and at least a certain
> level of agreement in our development community should be reached.

I think we don't need to calculate a drift ratio. We should define a
threshold in seconds, let's say 2 seconds. Beyond that, there should
not be any drift.

Do we support migration to a different host? We should, especially in
this test too. The destination host reading should also be used.

Apart from that, good patchset, and good thing you refactored some of
the code to shared utils.

> Other than this concern that came to my mind, the new tests look
> good and work fine here. I had to do a slight rebase in one of the
> patches, very minor stuff. The default values and the drift
> calculation can be changed at a later time. Thanks!

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration
- "Lucas Meneghel Rodrigues" wrote: > Hi Michael, I am reviewing your patchset and have just a minor remark > to make here: > > On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish > wrote: > > This patch adds a new test that checks the timedrift introduced by > migrations. > > It uses the same parameters used by the timedrift test to get the > guest time. > > In addition, the number of migrations the test performs is > controlled by the > > parameter 'migration_iterations'. > > > > Signed-off-by: Michael Goldish > > --- > > client/tests/kvm/kvm_tests.cfg.sample | 33 --- > > client/tests/kvm/tests/timedrift_with_migration.py | 95 > > > 2 files changed, 115 insertions(+), 13 deletions(-) > > create mode 100644 > client/tests/kvm/tests/timedrift_with_migration.py > > > > diff --git a/client/tests/kvm/kvm_tests.cfg.sample > b/client/tests/kvm/kvm_tests.cfg.sample > > index 540d0a2..618c21e 100644 > > --- a/client/tests/kvm/kvm_tests.cfg.sample > > +++ b/client/tests/kvm/kvm_tests.cfg.sample > > @@ -100,19 +100,26 @@ variants: > > type = linux_s3 > > > > - timedrift: install setup > > - type = timedrift > > extra_params += " -rtc-td-hack" > > - # Pin the VM and host load to CPU #0 > > - cpu_mask = 0x1 > > - # Set the load and rest durations > > - load_duration = 20 > > - rest_duration = 20 > > - # Fail if the drift after load is higher than 50% > > - drift_threshold = 50 > > - # Fail if the drift after the rest period is higher than > 10% > > - drift_threshold_after_rest = 10 > > - # For now, make sure this test is executed alone > > - used_cpus = 100 > > + variants: > > + - with_load: > > + type = timedrift > > + # Pin the VM and host load to CPU #0 > > + cpu_mask = 0x1 > > + # Set the load and rest durations > > + load_duration = 20 > > + rest_duration = 20 > > Even the default duration here seems way too brief here, is there any > reason why 20s was chosen instead of, let's say, 1800s? 
I am under > the > impression that 20s of load won't be enough to cause any noticeable > drift... Apparently I've been working with a bad qemu version for quite a while, because after 20s of load I often get a huge (80%) drift. This normally shouldn't happen. We might want to wait a little longer than 20s, but there's no need to wait as long as 1800s AFAIK. The test is meant to catch drift problems, and apparently when there's a problem, it reveals itself quickly. I'm not sure there are drift problems that reveal themselves after only 1800s of load, but quite frankly, I know very little about this, so the timeout value should be changed by the user. > > + # Fail if the drift after load is higher than 50% > > + drift_threshold = 50 > > + # Fail if the drift after the rest period is higher > than 10% > > + drift_threshold_after_rest = 10 > > I am also curious about those tresholds and the reasoning behind > them. > Is there any official agreement on what we consider to be an > unreasonable drift? I really don't know. After asking around I got the impression that the threshold should depend on the load. Theoretically it should be possible to get any amount of drift with enough load -- that's the way I see it, but I'm really not sure. Maybe the best thing for us to do is run the time drift test a few times with functional qemu versions as well as with broken ones (but not as broken as the one I'm using), so we can see what thresholds best differentiate between functional and broken code. > Another thing that struck me out is drift calculation: On the > original > timedrift test, the guest drift is normalized against the host drift: > > drift = 100.0 * (host_delta - guest_delta) / host_delta > > While in the new drift tests, we consider only the guest drift. 
I > believe is better to normalize all tests based on one drift > calculation criteria, and those values should be reviewed, and at > least a certain level of agreement on our development community > should be reached. The new tests use the host clock as reference like the original test. We check how much time passed on the host, and then how much time passed in the guest, and take the difference between those two. The result is an absolute number of seconds because there's no "load duration" with which to normalize the result. We're just interested in how much drift is caused by a single reboot or a single migration (or a few). I don't think it matters how long the reboot/migration procedure took -- we treat it as a single discrete event. > Other than this concern that came to my mind, the new tests look good > and work fine here. I had to do a slight rebase in one of the > patches, > very minor stuff. The default values and the drift c
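[The two scoring criteria discussed in this exchange can be written out as a small sketch (illustrative only, not the tests' actual code): the load test normalizes the guest/host difference against the host delta and reports a percentage, while the migration/reboot tests report the raw difference in seconds.]

```python
# Two ways of scoring time drift, as discussed in this thread.

def relative_drift(host_delta, guest_delta):
    """Drift as a percentage of elapsed host time (original load test).

    host_delta/guest_delta: seconds elapsed on the host/guest clocks
    over the same measurement window.
    """
    return 100.0 * (host_delta - guest_delta) / host_delta

def absolute_drift(host_delta, guest_delta):
    """Drift in seconds, treating a reboot or migration as a single
    discrete event; no load duration to normalize against."""
    return abs(host_delta - guest_delta)

# A guest that lost 3 seconds over a 60-second host window:
#   relative_drift(60.0, 57.0) -> 5.0 (percent)
#   absolute_drift(60.0, 57.0) -> 3.0 (seconds)
```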
Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration
Hi Michael, I am reviewing your patchset and have just a minor remark
to make here:

On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish wrote:
> This patch adds a new test that checks the timedrift introduced by
> migrations. It uses the same parameters used by the timedrift test
> to get the guest time. In addition, the number of migrations the
> test performs is controlled by the parameter 'migration_iterations'.
>
> Signed-off-by: Michael Goldish
> ---
>  client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
>  client/tests/kvm/tests/timedrift_with_migration.py |   95
>  2 files changed, 115 insertions(+), 13 deletions(-)
>  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py
>
> diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
> index 540d0a2..618c21e 100644
> --- a/client/tests/kvm/kvm_tests.cfg.sample
> +++ b/client/tests/kvm/kvm_tests.cfg.sample
> @@ -100,19 +100,26 @@ variants:
>                 type = linux_s3
>
>     - timedrift:    install setup
> -        type = timedrift
>         extra_params += " -rtc-td-hack"
> -        # Pin the VM and host load to CPU #0
> -        cpu_mask = 0x1
> -        # Set the load and rest durations
> -        load_duration = 20
> -        rest_duration = 20
> -        # Fail if the drift after load is higher than 50%
> -        drift_threshold = 50
> -        # Fail if the drift after the rest period is higher than 10%
> -        drift_threshold_after_rest = 10
> -        # For now, make sure this test is executed alone
> -        used_cpus = 100
> +        variants:
> +            - with_load:
> +                type = timedrift
> +                # Pin the VM and host load to CPU #0
> +                cpu_mask = 0x1
> +                # Set the load and rest durations
> +                load_duration = 20
> +                rest_duration = 20

Even the default duration here seems way too brief; is there any
reason why 20s was chosen instead of, let's say, 1800s? I am under the
impression that 20s of load won't be enough to cause any noticeable
drift...

> +                # Fail if the drift after load is higher than 50%
> +                drift_threshold = 50
> +                # Fail if the drift after the rest period is higher than 10%
> +                drift_threshold_after_rest = 10

I am also curious about those thresholds and the reasoning behind
them. Is there any official agreement on what we consider to be an
unreasonable drift?

Another thing that struck me was drift calculation: on the original
timedrift test, the guest drift is normalized against the host drift:

drift = 100.0 * (host_delta - guest_delta) / host_delta

While in the new drift tests, we consider only the guest drift. I
believe it is better to normalize all tests on one drift calculation
criterion; those values should be reviewed, and at least a certain
level of agreement in our development community should be reached.

Other than this concern that came to my mind, the new tests look good
and work fine here. I had to do a slight rebase in one of the patches,
very minor stuff. The default values and the drift calculation can be
changed at a later time. Thanks!