Re: [PATCH] [RFC] KVM test: Control files automatic generation to save memory

2010-02-18 Thread Lucas Meneghel Rodrigues
On Sun, 2010-02-14 at 12:07 -0500, Michael Goldish wrote:
 - Lucas Meneghel Rodrigues l...@redhat.com wrote:
 
  As our configuration system generates a list of dicts
  with test parameters, and that list might be potentially
  *very* large, keeping all this information in memory might
  be a problem for smaller virtualization hosts due to
  the memory pressure created. Tests made on my 4GB laptop
  show that most of the memory is being used during a
  typical kvm autotest session.
  
  So, instead of keeping all this information in memory,
  let's take a different approach and unfold all the
  tests generated by the config system and generate a
  control file:
  
  job.run_test('kvm', params={param1, param2, ...}, tag='foo', ...)
  job.run_test('kvm', params={param1, param2, ...}, tag='bar', ...)
  
  By dumping all the dicts that were previously kept in memory to
  a control file, the memory usage of a typical kvm autotest
  session is drastically reduced, making it easier to run on smaller
  virt hosts.
  
  The advantages of taking this new approach are:
   * You can see what tests are going to run and the dependencies
 between them by looking at the generated control file
   * The control file is all ready to use, you can for example
 paste it on the web interface and profit
   * As mentioned, a lot less memory consumption, avoiding
 memory pressure on virtualization hosts.
  
  This is a crude 1st pass at implementing this approach, so please
  provide comments.
  
  Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
  ---
 
 Interesting idea!
 
 - Personally I don't like the renaming of kvm_config.py to
 generate_control.py, and prefer to keep them separate, so that
 generate_control.py has the create_control() function and
 kvm_config.py has everything else.  It's just a matter of naming;
 kvm_config.py deals mostly with config files, not with control files,
 and it can be used for other purposes than generating control files.

Fair enough, no problem.

 - I wonder why so much memory is used by the test list.  Our daily
 test sets aren't very big, so although the parser should use a huge
 amount of memory while parsing, nearly all of that memory should be
 freed by the time the parser is done, because the final 'only'
 statement reduces the number of tests to a small fraction of the total
 number in a full set.  What test set did you try with that 4 GB
 machine, and how much memory was used by the test list?  If a
 ridiculous amount of memory was used, this might indicate a bug in
 kvm_config.py (maybe it keeps references to deleted tests, forcing
 them to stay in memory).
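As an illustration of that suspected failure mode (toy code invented for this sketch, not kvm_config.py itself): if the parser keeps a side list of every dict it ever created, filtering the main list frees nothing, because the "deleted" tests remain reachable.

```python
class ToyParser:
    """Toy parser that accidentally pins every generated dict in memory."""

    def __init__(self):
        self._history = []  # lingering references: the suspected bug pattern

    def generate(self, n):
        tests = [{"name": "test%d" % i} for i in range(n)]
        self._history.extend(tests)  # even filtered-out tests stay reachable
        return tests

parser = ToyParser()
tests = parser.generate(1000)
tests = tests[:3]  # the 'only' filter keeps a small fraction...
# ...but all 1000 dicts are still referenced via parser._history,
# so their memory cannot be reclaimed.
print(len(parser._history))
```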

This problem wasn't found during the daily test routine; rather, it was a
comment I heard from Naphtali about the typical autotest memory usage.
Also Marcelo made a similar comment, so I thought it was a problem worth
looking into. I tried to run the default test set that we selected for
upstream (3 resulting dicts) on my 4GB RAM laptop; here are my findings:

 * Before autotest usage: Around 20% of memory used, 10% used as cache.
 * During autotest usage: About 99% of memory used, 27% used as cache.

So yes, there's a significant memory usage increase that doesn't happen
when using a flat, autogenerated control file. Sure, it doesn't make my
laptop crawl, but it is a *lot* of resource usage anyway.

Also, let's assume that for small test sets we can reclaim all the
memory. We still have to consider large test sets. I am all for
profiling the memory usage and fixing any bugs, but we need to keep in
mind that one might want to run large test sets, and large test sets
imply keeping a fairly large amount of data in memory. If the amount of
memory is negligible in most use cases, then let's just fix the bugs and
forget about the proposed approach.

Also, a flat control file is quicker to run, because there's no
parsing of the config file happening at run time. So this control file
generation idea makes some sense, which is why I decided to code this
first-pass attempt at it.
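To make that concrete, here is a minimal sketch of what unfolding the dicts into a flat control file could look like. The create_control() name comes from the discussion in this thread; the parameter dicts and the "shortname" key are invented examples, not the real test sets.

```python
def create_control(test_dicts):
    """Unfold a list of test parameter dicts into flat control file text.

    Each dict becomes one job.run_test() line, so the harness never has
    to hold the full list of dicts in memory at run time.
    """
    lines = []
    for params in test_dicts:
        tag = params.get("shortname", "kvm")
        lines.append("job.run_test('kvm', params=%r, tag=%r)" % (params, tag))
    return "\n".join(lines)

# Example: two tiny parameter dicts produce two run_test lines.
control_text = create_control([
    {"shortname": "boot", "main_vm": "vm1"},
    {"shortname": "reboot", "main_vm": "vm1"},
])
print(control_text)
```

Since repr() of a dict of strings is valid Python source, the emitted lines can be pasted straight into a control file.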

 - I don't think this approach will work for control.parallel, because
 the tests have to be assigned dynamically to available queues, and
 AFAIK this can't be done by a simple static control file.

Not necessarily: as the control file is a program, we can just generate
code that uses some sort of function to do the assignment. I don't
fully see all that's needed to get the job done, but in theory it should
be possible.
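As a sketch of that idea, the generated control file could itself embed a small dispatcher that hands the pre-unfolded dicts to whichever worker is free. The names below are illustrative only, not the actual control.parallel API.

```python
import queue
import threading

def run_parallel(test_dicts, num_queues, run_one):
    """Dynamically assign pre-generated test dicts to available workers."""
    pending = queue.Queue()
    for params in test_dicts:
        pending.put(params)

    def worker():
        # Each worker pulls the next dict as soon as it becomes free,
        # so assignment stays dynamic even though the dicts are static.
        while True:
            try:
                params = pending.get_nowait()
            except queue.Empty:
                return
            run_one(params)

    threads = [threading.Thread(target=worker) for _ in range(num_queues)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Example: each "test" just records its tag.
done = []
run_parallel([{"tag": "a"}, {"tag": "b"}, {"tag": "c"}],
             num_queues=2, run_one=lambda p: done.append(p["tag"]))
```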


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] KVM test: Control files automatic generation to save memory

2010-02-18 Thread Michael Goldish

- Lucas Meneghel Rodrigues l...@redhat.com wrote:

 On Sun, 2010-02-14 at 12:07 -0500, Michael Goldish wrote:
  
  Interesting idea!
  
  - Personally I don't like the renaming of kvm_config.py to
  generate_control.py, and prefer to keep them separate, so that
  generate_control.py has the create_control() function and
  kvm_config.py has everything else.  It's just a matter of naming;
  kvm_config.py deals mostly with config files, not with control files,
  and it can be used for other purposes than generating control files.
 
 Fair enough, no problem.
 
  - I wonder why so much memory is used by the test list.  Our daily
  test sets aren't very big, so although the parser should use a huge
  amount of memory while parsing, nearly all of that memory should be
  freed by the time the parser is done, because the final 'only'
  statement reduces the number of tests to a small fraction of the total
  number in a full set.  What test set did you try with that 4 GB
  machine, and how much memory was used by the test list?  If a
  ridiculous amount of memory was used, this might indicate a bug in
  kvm_config.py (maybe it keeps references to deleted tests, forcing
  them to stay in memory).
 
 This problem wasn't found during the daily test routine; rather, it was
 a comment I heard from Naphtali about the typical autotest memory usage.
 Also Marcelo made a similar comment, so I thought it was a problem worth
 looking into. I tried to run the default test set that we selected for
 upstream (3 resulting dicts) on my 4GB RAM laptop; here are my findings:
 
  * Before autotest usage: Around 20% of memory used, 10% used as cache.
  * During autotest usage: About 99% of memory used, 27% used as cache.

Before autotest usage, were there any VMs running?

3 dicts can't possibly take up so much space.  If it is indeed kvm_config's
fault (which I doubt), there's probably a bug in it that prevents it from
freeing unused memory, and once we fix that bug the problem should be gone.
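One quick way to settle whether the dict list itself is to blame is to measure it directly. A rough sketch follows; the deep_size helper and the sample dicts are invented for illustration (sys.getsizeof alone reports only shallow container sizes, hence the recursion).

```python
import sys

def deep_size(obj, seen=None):
    """Rough recursive size of an object in bytes (containers + contents)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # count shared objects only once
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set)):
        size += sum(deep_size(i, seen) for i in obj)
    return size

# Three small parameter dicts should only amount to a few kilobytes,
# nowhere near enough to pressure a 4 GB machine.
tests = [{"shortname": "boot", "main_vm": "vm1"} for _ in range(3)]
print(deep_size(tests))
```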

 So yes, there's a significant memory usage increase that doesn't happen
 when using a flat, autogenerated control file. Sure, it doesn't make my
 laptop crawl, but it is a *lot* of resource usage anyway.
 
 Also, let's assume that for small test sets we can reclaim all the
 memory. We still have to consider large test sets. I am all for
 profiling the memory usage and fixing any bugs, but we need to keep in
 mind that one might want to run large test sets, and large test sets
 imply keeping a fairly large amount of data in memory. If the amount of
 memory is negligible in most use cases, then let's just fix the bugs and
 forget about the proposed approach.
 
 Also, a flat control file is quicker to run, because there's no
 parsing of the config file happening at run time. So this control file

Agreed, but on the other hand, the static control file idea introduces
an extra preprocessing step (not necessarily bad).

 generation idea makes some sense, which is why I decided to code this
 first-pass attempt at it.
 
  - I don't think this approach will work for control.parallel, because
  the tests have to be assigned dynamically to available queues, and
  AFAIK this can't be done by a simple static control file.
 
 Not necessarily: as the control file is a program, we can just generate
 code that uses some sort of function to do the assignment. I don't
 fully see all that's needed to get the job done, 

Re: [PATCH] [RFC] KVM test: Control files automatic generation to save memory

2010-02-14 Thread Michael Goldish

- Lucas Meneghel Rodrigues l...@redhat.com wrote:


Interesting idea!

- Personally I don't like the renaming of kvm_config.py to
generate_control.py, and prefer to keep them separate, so that
generate_control.py has the create_control() function and
kvm_config.py has everything else.  It's just a matter of naming;
kvm_config.py deals mostly with config files, not with control files,
and it can be used for other purposes than generating control files.

- I wonder why so much memory is used by the test list.  Our daily
test sets aren't very big, so although the parser should use a huge
amount of memory while parsing, nearly all of that memory should be
freed by the time the parser is done, because the final 'only'
statement reduces the number of tests to a small fraction of the total
number in a full set.  What test set did you try with that 4 GB
machine, and how much memory was used by the test list?  If a
ridiculous amount of memory was used, this might indicate a bug in
kvm_config.py (maybe it keeps references to deleted tests, forcing
them to stay in memory).

- I don't think this approach will work for control.parallel, because
the tests have to be assigned dynamically to available queues, and
AFAIK this can't be done by a simple static control file.

- Whether or not this is a good idea probably depends on the users.
On one hand, users will be required to run generate_control.py before
autotest.py, and the generated control files will be very big and
ugly; on the other hand, maybe they won't care.

I probably haven't given this enough thought so I might have missed a
few things.


  client/tests/kvm/control |   64 
  client/tests/kvm/generate_control.py |  586 ++
  client/tests/kvm/kvm_config.py   |  524 --
  3 files changed, 586 insertions(+), 588 deletions(-)
  delete mode 100644 client/tests/kvm/control
  create mode 100755 client/tests/kvm/generate_control.py
  delete mode 100755 client/tests/kvm/kvm_config.py
 
 diff --git a/client/tests/kvm/control b/client/tests/kvm/control
 deleted file mode 100644
 index 163286e..000
 --- a/client/tests/kvm/control
 +++ /dev/null
 @@ -1,64 +0,0 @@
 -AUTHOR = """
 -u...@redhat.com (Uri Lublin)
 -dru...@redhat.com (Dror Russo)
 -mgold...@redhat.com (Michael Goldish)
 -dh...@redhat.com (David Huff)
 -aerom...@redhat.com (Alexey Eromenko)
 -mbu...@redhat.com (Mike Burns)
 -"""
 -TIME = 'MEDIUM'
 -NAME = 'KVM test'
 -TEST_TYPE = 'client'
 -TEST_CLASS = 'Virtualization'
 -TEST_CATEGORY = 'Functional'
 -
 -DOC = """
 -Executes the KVM test framework on a given host. This module is separated in
 -minor functions, that execute different tests for doing Quality Assurance on
 -KVM (both kernelspace and userspace) code.
 -
 -For online docs, please refer to http://www.linux-kvm.org/page/KVM-Autotest
 -"""
 -import sys, os, logging
 -# Add the KVM tests dir to the python path
 -kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests/kvm')
 -sys.path.append(kvm_test_dir)
 -# Now we can import modules inside the KVM tests dir
 -import kvm_utils, kvm_config
 -
 -# set English environment (command output might be localized, need to be safe)
 -os.environ['LANG'] = 'en_US.UTF-8'
 -
 -build_cfg_path = os.path.join(kvm_test_dir, "build.cfg")
 -build_cfg = kvm_config.config(build_cfg_path)
 -# Make any desired changes to the build configuration here. For example:
 -#build_cfg.parse_string("""
 -#release_tag = 84
 -#""")
 -if not kvm_utils.run_tests(build_cfg.get_list(), job):
 -logging.error(KVM build 

Re: [PATCH] [RFC] KVM test: Control files automatic generation to save memory

2010-02-14 Thread Dor Laor

On 02/14/2010 07:07 PM, Michael Goldish wrote:


Interesting idea!

- Personally I don't like the renaming of kvm_config.py to
generate_control.py, and prefer to keep them separate, so that
generate_control.py has the create_control() function and
kvm_config.py has everything else.  It's just a matter of naming;
kvm_config.py deals mostly with config files, not with control files,
and it can be used for other purposes than generating control files.

- I wonder why so much memory is used by the test list.  Our daily
test sets aren't very big, so although the parser should use a huge
amount of memory while parsing, nearly all of that memory should be
freed by the time the parser is done, because the final 'only'
statement reduces the number of tests to a small fraction of the total
number in a full set.  What test set did you try with that 4 GB
machine, and how much memory was used by the test list?  If a
ridiculous amount of memory was used, this might indicate a bug in
kvm_config.py (maybe it keeps references to deleted tests, forcing
them to stay in memory).


I agree, it's worth getting to the bottom of it - I wonder how many 
objects are created on the kvm unstable set. It should be a huge number.
Besides that, one can always call the Python garbage collection 
interface in order to free unreferenced memory immediately.
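For reference, a sketch of how that could look; parse_and_trim and the toy parser below are invented names for illustration, not kvm_config.py code. Note that gc.collect() only helps once nothing still references the parsed objects.

```python
import gc

def parse_and_trim(parse, only_filter):
    """Parse a full test set, apply the 'only' filter, then drop the rest."""
    full_list = parse()
    trimmed = [d for d in full_list if only_filter(d)]
    del full_list  # remove the last reference to the big list
    gc.collect()   # reclaim reference cycles immediately instead of waiting
    return trimmed

# Example with toy stand-ins for the parser and the 'only' filter.
trimmed = parse_and_trim(
    parse=lambda: [{"name": "test%d" % i} for i in range(1000)],
    only_filter=lambda d: d["name"].endswith("0"),
)
```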



