Re: [libvirt] Need a better word than allocated or ascertained

2011-01-10 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2011-01-07 15:23:54]:

 
 CC'ing Balbir..
 
 On Fri, 07 Jan 2011 10:33:08 +0100, Zdenek Styblik sty...@turnovfree.net 
 wrote:
  On 01/07/2011 10:10 AM, Justin Clift wrote:
   On 07/01/2011, at 6:12 PM, Nikunj A. Dadhania wrote:
   snip
   Guaranteed sounds best to me.
  
   That's not guaranteed, to the best of my knowledge
  
   Balbir suggested enforced; I guess I dropped it somewhere.
   https://www.redhat.com/archives/libvir-list/2010-August/msg00712.html
   
   Balbir's suggested wording (from the email):
   
 limit to enforce on memory contention
   
   Does that mean it's the minimum memory limit it would really like to
   have, but can't guarantee it?  (ie it's not guaranteed)
  
  I'm getting a bit confused here. enforced really doesn't fit into the
  context, or does it?
  
  What should it say/explain? [soft-limit]
  Who is target audience?
  
  And I think the last question is very important, because your technical
  mumbo-jumbo might be just fine and tip-top to the last bit, but if
  nobody else understands it, then such help seems a bit helpless to
  me. Meaning:
  * allocated/guaranteed I can imagine;
  * ascertained gave me a really nonsensical translation, although that
  might be caused by a crappy dictionary;
  * enforced - uh ... how? what? when? Is it when the host is running low on
  memory and/or there are many VMs competing for memory? If so, please
  explain it somewhere if it isn't already (yeah, I'm trying to figure out
  the meaning).
  
  Or what happens when memory reaches 'soft-limit'?

Enforced is the same as policing or forcing, whether or not the
application likes it. A soft limit is enforced when we hit resource
contention (that is, the operating system finds it has to do work to
find free memory for applications): soft limits kick in and try to
push each cgroup down to its soft limit.
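The push-down behaviour Balbir describes can be sketched as a toy model (illustrative only; the function and field names are made up, and real kernel reclaim is incremental rather than all-at-once):

```python
def reclaim_on_contention(cgroups):
    """Toy model: under contention, each cgroup whose usage exceeds
    its soft limit is pushed back down toward that limit; cgroups
    under their soft limit are left alone."""
    freed = 0
    for cg in cgroups:
        excess = cg["usage"] - cg["soft_limit"]
        if excess > 0:
            cg["usage"] = cg["soft_limit"]  # reclaim the excess pages
            freed += excess
    return freed

# Two guests over their soft limits, one under it (values in MB).
guests = [
    {"name": "vm1", "usage": 900, "soft_limit": 512},
    {"name": "vm2", "usage": 400, "soft_limit": 512},  # untouched
    {"name": "vm3", "usage": 600, "soft_limit": 256},
]
freed = reclaim_on_contention(guests)
```

Note the work-conserving aspect discussed later in the thread: without contention, nothing triggers this path at all.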


-- 
Three Cheers,
Balbir

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Need a better word than allocated or ascertained

2011-01-10 Thread Balbir Singh
* Zdenek Styblik sty...@turnovfree.net [2011-01-10 14:08:43]:

 On 01/10/2011 09:55 AM, Balbir Singh wrote:
  * Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2011-01-07 15:23:54]:
 [...]
  Or what happens when memory reaches 'soft-limit'?
  
  Enforced is the same as policing or forcing, whether or not the
  application likes it. A soft limit is enforced when we hit resource
  contention (that is, the operating system finds it has to do work to
  find free memory for applications): soft limits kick in and try to
  push each cgroup down to its soft limit.
  
 
 Such an explanation makes more sense to me than the proposed sentence.
 However, there are some critical factors, like a] my lack of knowledge on
 many libvirt (or virtualization in general) topics and b] me not being a
 native English speaker, which may or may not play a role.
 
 --- SNIP ---
 A soft limit is enforced when the host is running short on free resources
 or during resource contention. The guest's memory is then pushed down
 toward the soft limit in an attempt to regain free resources.
 The limit is in kilobytes.
 Applies to QEMU and LXC only.
 --- SNIP ---

Good, well stated IMHO

 
 I don't know. This is like the 10th version and wow, what a pile of
 nonsense I came up with :[
 Guest memory won't be pushed below the soft limit, because the guest could
 go ape (OOM-killer/whatever) about it, and we don't want that.
 Could it be understood as resource allocation/reservation like in e.g.
 VMware ESX? But it might work differently in QEMU/LXC than in VMware.
 Anyway, this is probably off-topic here.
 
 I would just go for a longer explanation rather than squeezing everything
 into 5 words (which seems impossible to me) or changing just one word.
 
 ~~~ non-relevant part ~~~
 Other things I've noticed at the page...
 
 I would change the table to:
 
 Name | Units | Required | Desc |
 --hard-limit limit | kB | optional | some description
 
 Or
 
 Name | Required | Desc |
 --hard-limit limit | optional | some description; limit is in kilobytes
 
 Also, I think it should be 'kB', not 'kb', which means 'kilobits'[1]. I
 don't want to bitch or anything like that. Please, take it very, very
 easy. Although it's explained in the description that kb is meant as
 kilobytes, it might be only me who is used to the kb vs. kB thing. Dunno :\


I'd agree, conventions need to be properly followed.
 
 I would put e.g. 'QEMU and LXC only' on a new line, but this might be
 unnecessary (= just a formatting issue). There could also be a special
 column 'Applies to' and whatnot (at this point, I feel like I must be
 really bored to come up with such stuff; please apply stfu if necessary
 w/o hard feelings ;] ).
 This info is also duplicated in a paragraph below in 'Platform or
 Hypervisor specific notes', so if something changes it must be changed
 in two places.
 
 Links:
 ---
 [1] http://en.wikipedia.org/wiki/KB
 
 Have a nice day,
 Zdenek
 
 -- 
 Zdenek Styblik
 Net/Linux admin
 OS TurnovFree.net
 email: sty...@turnovfree.net
 jabber: sty...@jabber.turnovfree.net

-- 
Three Cheers,
Balbir



Re: [libvirt] [PATCH] Update docs for memory parameters and memtune command

2010-10-18 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-10-18 14:03:53]:

 On Mon, 18 Oct 2010 09:55:46 +0200, Matthias Bolte 
 matthias.bo...@googlemail.com wrote:
  2010/10/18 Nikunj A. Dadhania nik...@linux.vnet.ibm.com:
   From: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
  
   docs/formatdomain.html.in: Add memtune element details
 [...]
   @@ -211,6 +216,22 @@
          <code>hugepages</code> element set within it. This tells the
          hypervisor that the guest should have its memory allocated using
          hugepages instead of the normal native page size.</dd>
  +      <dt><code>memtune</code></dt>
  +      <dd> The optional <code>memtune</code> element provides details
  +      regarding the memory tuneable parameters for the domain. If this is
  +      omitted, it defaults to the OS provided defaults.</dd>
  +      <dt><code>hard_limit</code></dt>
  +      <dd> The optional <code>hard_limit</code> element is the maximum memory
  +       the guest can use. The units for this value are kilobytes (i.e. blocks
  +       of 1024 bytes)</dd>
  
  Well, the maximum of memory a guest can use is also controlled by the
  memory and currentMemory element in some way. How does hard_limit
  relate to those two?
 
 memory and currentMemory are related to the balloon size, while these are
 operating-system-provided limits.
  
  +      <dt><code>soft_limit</code></dt>
  +      <dd> The optional <code>soft_limit</code> element is the memory limit to
  +       enforce during memory contention. The units for this value are
  +       kilobytes (i.e. blocks of 1024 bytes)</dd>
  
  Is this an upper or a lower limit? Does it mean in case of contention
  this guest may only use up to soft_limit kilobytes of memory (upper
  limit)? Or does it mean in case of contention make sure that this
  guest can access at least soft_limit kilobytes of memory (lower
  limit)?
  
 Upper limit of memory the guest can use (i.e. up to soft_limit) during
 contention. Balbir, correct me if this is wrong.


Yes, that interpretation is correct. We try to push the guest back to
its soft limit on contention; this is typically the case when the guest
uses more than the assigned soft limit.
 
  How does this relate to the memory and currentMemory element?  
 
 At present no relation, they are implemented by the OS.

This feature allows us to set useful limits; when there is no contention,
no limits are enforced (IOW, this is work conserving, so to speak).
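For concreteness, a domain definition using the elements from this patch would look roughly like the following (a sketch; values are in kilobytes, and the numbers are made up):

```
<domain type='qemu'>
  ...
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <memtune>
    <hard_limit>524288</hard_limit>
    <soft_limit>262144</soft_limit>
    <swap_hard_limit>786432</swap_hard_limit>
  </memtune>
  ...
</domain>
```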

 
  How does it related to the min_guarantee element?
  
 It is not related to min_guarantee.
 
  +      <dt><code>swap_hard_limit</code></dt>
  +      <dd> The optional <code>swap_hard_limit</code> element is the maximum
  +       swap the guest can use. The units for this value are kilobytes
  +       (i.e. blocks of 1024 bytes)</dd>
  
  What about the min_guarantee element anyway? It's not implemented in virsh.
  
 I missed it; I will add the docs about min_guarantee and send the updated
 patch. It is not implemented in virsh. However, I have taken care of
 parsing it in the domain configuration.
 

-- 
Three Cheers,
Balbir



Re: [libvirt] [PATCH v3 04/13] XML parsing for memory tunables

2010-10-12 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-10-08 14:43:34]:

 On Fri, 8 Oct 2010 14:10:53 +0530, Balbir Singh bal...@linux.vnet.ibm.com 
 wrote:
  * Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-10-08 12:00:44]:
  
   On Thu, 7 Oct 2010 12:49:29 +0100, Daniel P. Berrange 
   berra...@redhat.com wrote:
On Mon, Oct 04, 2010 at 12:47:22PM +0530, Nikunj A. Dadhania wrote:
 On Mon, 4 Oct 2010 12:16:42 +0530, Balbir Singh 
 bal...@linux.vnet.ibm.com wrote:
  * Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-09-28 
  15:26:30]:
   snip
   +unsigned long hard_limit;
   +unsigned long soft_limit;
   +unsigned long min_guarantee;
   +unsigned long swap_hard_limit;
  
  The hard_limit, soft_limit, swap_hard_limit are s64 and the value is
  in bytes. What is the unit supported in this implementation?

Actually if libvirt is built on 32bit these aren't big enough - make
them into 'unsigned long long' data types I reckon.

    I was thinking that, since we use units of KB, we would be able to
    represent 2^42 bytes of memory limit, i.e. 4 terabytes. Won't this
    suffice in the case of 32-bit?
  
  
   How would you represent -1 (2^63 - 1), the unlimited/max limit we use
   today?
  
 I think I have answered this question in the thread: it is specific to
 cgroups that -1 means unlimited; this may not be true for other HVs.

OK, so how do we handle unlimited values in general?
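The arithmetic behind this exchange can be checked directly (Python used as a calculator; "unlimited" here is the cgroup sentinel discussed in the thread):

```python
# A 32-bit unsigned long counting kilobytes can represent at most:
max_kb_32 = 2**32 - 1
max_bytes_32 = max_kb_32 * 1024          # ~4 TiB, as Nikunj notes

# The cgroup "unlimited" sentinel is written as -1, which the kernel
# stores as the maximum signed 64-bit byte value:
unlimited_bytes = 2**63 - 1
unlimited_kb = unlimited_bytes // 1024

# The sentinel, even expressed in KB, does not fit in 32 bits,
# which is the objection raised above:
fits = unlimited_kb <= max_kb_32
```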

-- 
Three Cheers,
Balbir



Re: [libvirt] [PATCH v3 04/13] XML parsing for memory tunables

2010-10-08 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-10-08 12:00:44]:

 On Thu, 7 Oct 2010 12:49:29 +0100, Daniel P. Berrange berra...@redhat.com 
 wrote:
  On Mon, Oct 04, 2010 at 12:47:22PM +0530, Nikunj A. Dadhania wrote:
   On Mon, 4 Oct 2010 12:16:42 +0530, Balbir Singh 
   bal...@linux.vnet.ibm.com wrote:
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-09-28 15:26:30]:
 snip
 +unsigned long hard_limit;
 +unsigned long soft_limit;
 +unsigned long min_guarantee;
 +unsigned long swap_hard_limit;

The hard_limit, soft_limit, swap_hard_limit are s64 and the value is
in bytes. What is the unit supported in this implementation?
  
  Actually if libvirt is built on 32bit these aren't big enough - make
  them into 'unsigned long long' data types I reckon.
  
 I was thinking that, since we use units of KB, we would be able to
 represent 2^42 bytes of memory limit, i.e. 4 terabytes. Won't this
 suffice in the case of 32-bit?


How would you represent -1 (2^63 - 1), the unlimited/max limit we use
today?

-- 
Three Cheers,
Balbir



Re: [libvirt] [PATCH v3 05/13] Implement cgroup memory controller tunables

2010-10-04 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-09-28 15:26:35]:

 From: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
 
 Provides interfaces for setting/getting memory tunables like hard_limit,
 soft_limit and swap_hard_limit
 
 Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com

The changes look good to me. An unsigned long holding KB should cover
all the values in bytes as well.

Acked-by: Balbir Singh bal...@linux.vnet.ibm.com
 

-- 
Three Cheers,
Balbir



Re: [libvirt] [PATCH v3 04/13] XML parsing for memory tunables

2010-10-04 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-09-28 15:26:30]:

 From: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
 
 Adding parsing code for memory tunables in the domain xml file
 
 v2:
 + Fix typo min_guarantee
 
 Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
 ---
  src/conf/domain_conf.c |   50 
 +---
  src/conf/domain_conf.h |   12 ---
  src/esx/esx_vmx.c  |   30 +-
  src/lxc/lxc_controller.c   |2 +-
  src/lxc/lxc_driver.c   |   12 +--
  src/openvz/openvz_driver.c |8 ---
  src/qemu/qemu_conf.c   |8 ---
  src/qemu/qemu_driver.c |   18 
  src/test/test_driver.c |   12 +--
  src/uml/uml_conf.c |2 +-
  src/uml/uml_driver.c   |   14 ++--
  11 files changed, 104 insertions(+), 64 deletions(-)
 
 diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
 index e05d5d7..0dd74e4 100644
 --- a/src/conf/domain_conf.c
 +++ b/src/conf/domain_conf.c
  @@ -4231,6 +4231,38 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
       def->description = virXPathString("string(./description[1])", ctxt);
  
       /* Extract domain memory */
  -    if (virXPathULong("string(./memory[1])", ctxt, &def->maxmem) < 0) {
  +    if (virXPathULong("string(./memory[1])", ctxt,
  +                      &def->mem.max_balloon) < 0) {
           virDomainReportError(VIR_ERR_INTERNAL_ERROR,
                                "%s", _("missing memory element"));
           goto error;
       }
  
  -    if (virXPathULong("string(./currentMemory[1])", ctxt, &def->memory) < 0)
  -        def->memory = def->maxmem;
  +    if (virXPathULong("string(./currentMemory[1])", ctxt,
  +                      &def->mem.cur_balloon) < 0)
  +        def->mem.cur_balloon = def->mem.max_balloon;
  
       node = virXPathNode("./memoryBacking/hugepages", ctxt);
       if (node)
  -        def->hugepage_backed = 1;
  +        def->mem.hugepage_backed = 1;
  +
  +    /* Extract other memory tunables */
  +    if (virXPathULong("string(./memtune/hard_limit)", ctxt,
  +                      &def->mem.hard_limit) < 0)
  +        def->mem.hard_limit = 0;
  +
  +    if (virXPathULong("string(./memtune/soft_limit[1])", ctxt,
  +                      &def->mem.soft_limit) < 0)
  +        def->mem.soft_limit = 0;
  +
  +    if (virXPathULong("string(./memtune/min_guarantee[1])", ctxt,
  +                      &def->mem.min_guarantee) < 0)
  +        def->mem.min_guarantee = 0;
  +
  +    if (virXPathULong("string(./memtune/swap_hard_limit[1])", ctxt,
  +                      &def->mem.swap_hard_limit) < 0)
  +        def->mem.swap_hard_limit = 0;
  +

Quick question, does 0 represent invalid values? I'd presume you'd
want to use something like -1. We support unsigned long long for the
values to be set (64 bit signed), unlimited translates to 2^63 - 1, is
ULong sufficient to represent that value?

       if (virXPathULong("string(./vcpu[1])", ctxt, &def->vcpus) < 0)
           def->vcpus = 1;
 
  @@ -6382,10 +6401,25 @@ char *virDomainDefFormat(virDomainDefPtr def,
           virBufferEscapeString(buf, "  <description>%s</description>\n",
                                 def->description);
  
  -    virBufferVSprintf(buf, "  <memory>%lu</memory>\n", def->maxmem);
  +    virBufferVSprintf(buf, "  <memory>%lu</memory>\n", def->mem.max_balloon);
       virBufferVSprintf(buf, "  <currentMemory>%lu</currentMemory>\n",
  -                      def->memory);
  -    if (def->hugepage_backed) {
  +                      def->mem.cur_balloon);
  +    virBufferVSprintf(buf, "  <memtune>\n");
  +    if (def->mem.hard_limit) {
  +        virBufferVSprintf(buf, "    <hard_limit>%lu</hard_limit>\n",
  +                          def->mem.hard_limit);
  +    }
  +    if (def->mem.soft_limit) {
  +        virBufferVSprintf(buf, "    <soft_limit>%lu</soft_limit>\n",
  +                          def->mem.soft_limit);
  +    }
  +    if (def->mem.swap_hard_limit) {
  +        virBufferVSprintf(buf, "    <swap_hard_limit>%lu</swap_hard_limit>\n",
  +                          def->mem.swap_hard_limit);
  +    }
  +    virBufferVSprintf(buf, "  </memtune>\n");
  +
  +    if (def->mem.hugepage_backed) {
           virBufferAddLit(buf, "  <memoryBacking>\n");
           virBufferAddLit(buf, "    <hugepages/>\n");
           virBufferAddLit(buf, "  </memoryBacking>\n");
 diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
 index 7195c04..2ecc2af 100644
 --- a/src/conf/domain_conf.h
 +++ b/src/conf/domain_conf.h
 @@ -864,9 +864,15 @@ struct _virDomainDef {
      char *name;
      char *description;
 
 -    unsigned long memory;
 -    unsigned long maxmem;
 -    unsigned char hugepage_backed;
 +    struct {
 +        unsigned long max_balloon;
 +        unsigned long cur_balloon;
 +        unsigned long hugepage_backed;
 +        unsigned long hard_limit;
 +        unsigned long soft_limit;
 +        unsigned long min_guarantee;
 +        unsigned long swap_hard_limit;

The hard_limit, soft_limit, swap_hard_limit are s64 and the value is
in bytes. What is 

Re: [libvirt] [RFC] Memory controller exploitation in libvirt

2010-08-31 Thread Balbir Singh
 held might not be the right word for soft limit.
 How about - Memory limit ensured during contention

I'd recommend "limit to enforce on memory contention".

Balbir



Re: [libvirt] [RFC] Memory controller exploitation in libvirt

2010-08-30 Thread Balbir Singh
On Mon, Aug 30, 2010 at 11:56 AM, Nikunj A. Dadhania
nik...@linux.vnet.ibm.com wrote:
 On Tue, 24 Aug 2010 11:07:29 +0100, Daniel P. Berrange 
 berra...@redhat.com wrote:
 On Tue, Aug 24, 2010 at 03:17:44PM +0530, Nikunj A. Dadhania wrote:
 
  On Tue, 24 Aug 2010 11:02:49 +0200, Matthias Bolte 
  matthias.bo...@googlemail.com wrote:
 
  snip
 
   Yes the ESX driver allows to control ballooning through
   virDomainSetMemory and virDomainSetMaxMemory.
  
   ESX itself also allows to set what's called memoryMinGaurantee in the
   thread, but this is not exposed in libvirt.
  The LXC driver uses virDomainSetMemory to set the memory hard limit, while
  QEmu/ESX use it to change the ballooning. And as you said, ESX does support
  memoryMinGaurantee; we can get this exported in libvirt using this new API.
 
  Here I am trying to group all the memory-related parameters into one single
  public API, as we have in virDomainSetSchedulerParameters. Currently, the
  names do not convey what they modify in the layer below and are confusing.

 For historical design record, I think it would be good to write a short
 description of what memory tunables are available for each hypervisor,
 covering VMWare, OpenVZ, Xen, KVM and LXC (the latter both cgroups based).
 I do recall that OpenVZ  in particular had a huge number of memory
 tunables.

 This is an attempt at covering the memory tunables supported by the various
 hypervisors in libvirt. Let me know if I have missed any memory tunable.
 Moreover, input from the maintainers/key contributors of each HV on these
 parameters would be appreciated. This would help in getting complete
 coverage of the memory tunables that libvirt can support.

 1) OpenVZ
 =
 vmguarpages: Memory allocation guarantee, in pages.
 kmemsize: Size of unswappable kernel memory(in bytes), allocated for 
 processes in this
          container.
 oomguarpages: The guaranteed amount of memory for the case the memory is
              “over-booked” (out-of-memory kill guarantee), in pages.
 privvmpages: Memory allocation limit, in pages.


 OpenVZ driver does not implement any of these functions:
 domainSetMemory
 domainSetMaxMemory
 domainGetMaxMemory

 The driver does, however, have an internal implementation for setting
 memory, openvzDomainSetMemoryInternal, which reads the value from the
 domain xml file.

 2) VMWare
 =
 ConfiguredSize: Virtual memory the guest can have.

 Shares: Priority of the VM in case there is not enough memory, or when
        there is spare memory. It has symbolic values like Low, Normal, High
        and Custom.

 Reservation: Guaranteed lower bound on the amount of physical memory that
             the host reserves for the VM, even in the case of overcommit.
             The VM is allowed to allocate up to this level, and after it
             has hit the reservation, those pages are not reclaimed. If the
             guest is not using memory up to the reservation, the host can
             use that portion of memory.

 Limit: This is the upper bound for the amount of physical memory that the host
       can allocate for the VM.

 Memory Balloon

 ESX driver uses following:
 * domainSetMaxMemory to set the max virtual memory for the VM.
 * domainSetMemory to inflate/deflate the balloon.
 * ESX provides lower bound(Reservation), but is not being exploited currently.

 3) Xen
 ==
 maxmem_set: Maximum amount of memory reservation of the domain
 mem_target_set: Set current memory usage of the domain


 4) KVM  LXC
 
 memory.limit_in_bytes: Memory hard limit
 memory.soft_limit_in_bytes: Memory limit held during contention

held might not be the right word for soft limit.

 memory.memsw_limit_in_bytes: Memory+Swap hard limit
 memory.swappiness: Controls the tendency to move the VM's processes to
                  swap. The value range is 0-100, where 0 means avoid
                  swapping as long as possible and 100 means swap processes
                  aggressively.

 Statistics:
 memory.usage_in_bytes: Current memory usage
 memory.memsw_usage_in_bytes: Current memory+swap usage
 memory.max_usage_in_bytes: Maximum memory usage recorded
 memory.memsw_max_usage_in_bytes: Maximum memory+swap usage

We also have memory.stat and memory.use_hierarchy - the question is, do
we care about hierarchical control? We also have controls to decide
whether to move memory when moving from one cgroup to another, which
might not apply to the LXC/QEMU case. There is also memory.failcnt,
which I am not sure makes sense to export.
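The cgroup files discussed in this thread map onto the memtune parameter names proposed elsewhere in this archive roughly as follows (a sketch; note that the kernel's swap-limit file is spelled memory.memsw.limit_in_bytes, with a dot, which this thread writes with an underscore):

```python
# Mapping from the memtune parameter names discussed in this archive to
# the cgroup v1 memory-controller files they are backed by (QEMU/LXC).
MEMTUNE_TO_CGROUP = {
    "hard_limit":      "memory.limit_in_bytes",
    "soft_limit":      "memory.soft_limit_in_bytes",
    "swap_hard_limit": "memory.memsw.limit_in_bytes",
}

def to_cgroup_bytes(limit_kb):
    """libvirt takes the limits in kilobytes; the cgroup files take bytes."""
    return limit_kb * 1024

# e.g. a 512 MB hard limit expressed as 524288 kB in the domain XML:
value = to_cgroup_bytes(524288)
```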

Balbir



Re: [libvirt] [RFC] Memory controller exploitation in libvirt

2010-08-24 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-08-24 11:53:27]:

 
  Subject: [RFC] Memory controller exploitation in libvirt
 
  Memory CGroup is a kernel feature that can be exploited effectively in the
  current libvirt/qemu driver. Here is a shot at that.
 
   At present, QEmu uses the memory ballooning feature, where memory can be
   inflated/deflated as and when needed, co-operatively between the host and
   the guest. There should be some mechanism by which the host can have more
   control over the guest's memory usage. Memory CGroup provides features
   such as a hard limit and soft limit for memory, and a hard limit for swap.
 
  Design 1: Provide new API and XML changes for resource management
  =
 
   Not all of the memory controller tunables are supported by the current
   abstractions provided by the libvirt API. libvirt works on various OSes.
   The new API will support GNU/Linux initially, and as and when other
   platforms start supporting memory tunables, the interface can be enabled
   for them. Add the following two function pointers to the virDriver
   interface.
 
  1) domainSetMemoryParameters: which would take one or more name-value
 pairs. This makes the API extensible, and agnostic to the kind of
 parameters supported by various Hypervisors.
  2) domainGetMemoryParameters: For getting current memory parameters
 
   Corresponding libvirt public API:
   int virDomainSetMemoryParameters (virDomainPtr domain,
                                     virMemoryParameterPtr params,
                                     unsigned int nparams);
   int virDomainGetMemoryParameters (virDomainPtr domain,
                                     virMemoryParameterPtr params,
                                     unsigned int nparams);
 
 

Does nparams imply setting several parameters together? Does bulk
loading help? I would prefer splitting out the API if possible
into

virCgroupSetMemory() - already present in src/util/cgroup.c
virCgroupGetMemory() - already present in src/util/cgroup.c
virCgroupSetMemorySoftLimit()
virCgroupSetMemoryHardLimit()
virCgroupSetMemorySwapHardLimit()
virCgroupGetStats()

 
  Parameter list supported:
 
 MemoryHardLimits (memory.limit_in_bytes) - Maximum memory
 MemorySoftLimits (memory.soft_limit_in_bytes) - Desired memory

Soft limits allow you to set a memory limit that applies on contention.

 MemoryMinimumGaurantee - Minimum memory required (without this amount of
 memory, VM should not be started) 
 
 SwapHardLimits (memory.memsw.limit_in_bytes) - Maximum swap
 SwapSoftLimits (currently not supported by the kernel) - Desired swap space
 

We *don't* support SwapSoftLimits in the memory cgroup controller, with
no plans to support it in the future either at this point. The
semantics are just too hard to get right at the moment.

 The tunables memory.limit_in_bytes, memory.soft_limit_in_bytes and
 memory.memsw.limit_in_bytes are provided by the memory controller in the
 Linux kernel.
 
  I am not an expert here, so just listing what new elements need to be added
  to the XML schema:
 
  <define name=resource>
    <element memory>
      <element memoryHardLimit/>
      <element memorySoftLimit/>
      <element memoryMinGaurantee/>
      <element swapHardLimit/>
      <element swapSoftLimit/>
    </element>
  </define>
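An instance of the proposed schema would then look something like this (a sketch only; the element names, including the memoryMinGaurantee spelling, follow the proposal above, and the values are made-up kilobyte counts):

```
<resource>
  <memory>
    <memoryHardLimit>524288</memoryHardLimit>
    <memorySoftLimit>262144</memorySoftLimit>
    <memoryMinGaurantee>131072</memoryMinGaurantee>
    <swapHardLimit>786432</swapHardLimit>
  </memory>
</resource>
```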
 

I'd prefer a syntax that integrates well with what we currently have

<cgroup>
  <path>...</path>
  <controller>
    <name>..</name>
    <soft_limit>...</soft_limit>
    <hard_limit>...</hard_limit>
  </controller>
  ...
</cgroup>

But I am not an XML expert, nor an expert in designing XML
configurations.

  Pros:
  * Support all the tunables exported by the kernel
  * More tunables can be added as and when required
 
  Cons:
  * Code changes would touch various levels
   * Might need to redefine (change the scope of) the existing memory
     APIs. Currently, domainSetMemory is used to set limit_in_bytes in LXC
     and memory ballooning in QEmu, while domainSetMaxMemory is not defined
     in QEmu, and in the case of LXC it sets the internal object's maxmem
     variable.
 
  Future: 
 
  * Later on, CPU/IO/Network controllers related tunables can be
added/enhanced along with the APIs/XML elements:
 
  CPUHardLimit
CPUSoftLimit
CPUShare
CPUPercentage
IO_BW_Softlimit
IO_BW_Hardlimit
IO_BW_percentage
 
  * libvirt-cim support for resource management
 
  Design 2: Reuse the current memory APIs in libvirt
  ==
 
   Use memory.limit_in_bytes to tweak memory hard limits.
   Init - Set memory.limit_in_bytes to the maximum memory.

   Claiming memory from the guest:
   a) Reduce the balloon size.
   b) If the guest does not co-operate (how do we know?), reduce
   memory.limit_in_bytes.
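The a)/b) policy above can be sketched as a toy control loop (hypothetical helper and field names; detecting a non-cooperative guest is exactly the open question flagged in the text, modelled here by simply checking usage against the target):

```python
def claim_memory(guest, target_kb):
    """Toy model of Design 2: first ask nicely via the balloon,
    then clamp memory.limit_in_bytes if the guest ignores us."""
    guest["balloon_target_kb"] = target_kb          # a) shrink the balloon
    if guest["usage_kb"] > target_kb:               # guest did not co-operate
        guest["limit_in_bytes"] = target_kb * 1024  # b) enforce the hard limit
        return "enforced"
    return "cooperative"

vm = {"usage_kb": 800, "balloon_target_kb": 1024, "limit_in_bytes": None}
result = claim_memory(vm, 512)
```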


This is a policy 

Re: [libvirt] [RFC] Memory controller exploitation in libvirt

2010-08-24 Thread Balbir Singh
* Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-08-24 13:35:10]:

 On Tue, 24 Aug 2010 13:05:26 +0530, Balbir Singh bal...@linux.vnet.ibm.com 
 wrote:
  * Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-08-24 11:53:27]:
  
   
Subject: [RFC] Memory controller exploitation in libvirt
   
 Corresponding libvirt public API:
 int virDomainSetMemoryParameters (virDomainPtr domain,
                                   virMemoryParameterPtr params,
                                   unsigned int nparams);
 int virDomainGetMemoryParameters (virDomainPtr domain,
                                   virMemoryParameterPtr params,
                                   unsigned int nparams);
   
   
  
  Does nparams imply setting several parameters together? Does bulk
  loading help? I would prefer splitting out the API if possible
  into
 Yes, it helps: when parsing the parameters from the domain xml file, we can
 call this API and set them all at once. BTW, it can also be called with one
 parameter if desired.

  
  virCgroupSetMemory() - already present in src/util/cgroup.c
  virCgroupGetMemory() - already present in src/util/cgroup.c
  virCgroupSetMemorySoftLimit()
  virCgroupSetMemoryHardLimit()
  virCgroupSetMemorySwapHardLimit()
  virCgroupGetStats()
 This is at the cgroup level (internal API) and will be implemented in the
 way that is suggested. The RFC should not be specific to cgroups: libvirt
 is supported on multiple OSes, and the APIs described above in the RFC are
 public API.


I thought we were talking about cgroups in the QEMU driver for Linux.
IMHO the generalization is too big. ESX, for example, already abstracts
its WLM/RM needs in its driver.
 
   SwapHardLimits (memory.memsw_limit_in_bytes) - Maximum swap 
   SwapSoftLimits (Currently not supported by kernel) - Desired swap 
   space 
   
  
  We *don't* support SwapSoftLimits in the memory cgroup controller, with
  no plans to support it in the future either at this point. The
 Ok.
 
   Tunables memory.limit_in_bytes, memory.softlimit_in_bytes and
   memory.memsw_limit_in_bytes are provided by the memory controller in 
   the
   Linux kernel.
   
I am not an expert here, so just listing what new elements need to be 
   added
to the XML schema:
   
 <define name=resource>
   <element memory>
     <element memoryHardLimit/>
     <element memorySoftLimit/>
     <element memoryMinGaurantee/>
     <element swapHardLimit/>
     <element swapSoftLimit/>
   </element>
 </define>
   
  
  I'd prefer a syntax that integrates well with what we currently have
  
  <cgroup>
    <path>...</path>
    <controller>
      <name>..</name>
      <soft_limit>...</soft_limit>
      <hard_limit>...</hard_limit>
    </controller>
    ...
  </cgroup>
 
 Again, this is the libvirt domain xml file; IMO, it should not be
 cgroup-specific.


See the comment above. 

-- 
Three Cheers,
Balbir



Re: [libvirt] [RFC] Memory controller exploitation in libvirt

2010-08-24 Thread Balbir Singh
* Daniel P. Berrange berra...@redhat.com [2010-08-24 11:02:44]:

 On Tue, Aug 24, 2010 at 01:05:26PM +0530, Balbir Singh wrote:
  * Nikunj A. Dadhania nik...@linux.vnet.ibm.com [2010-08-24 11:53:27]:
  
   
Subject: [RFC] Memory controller exploitation in libvirt
   
Memory CGroup is a kernel feature that can be exploited effectively in 
   the
current libvirt/qemu driver. Here is a shot at that.
   
At present, QEmu uses memory ballooning feature, where the memory can be
inflated/deflated as and when needed, co-operatively between the host and
the guest. There should be some mechanism where the host can have more
control over the guests memory usage. Memory CGroup provides features 
   such
as hard-limit and soft-limit for memory, and hard-limit for swap area.
   
Design 1: Provide new API and XML changes for resource management
=
   
All the memory controller tunables are not supported with the current
abstractions provided by the libvirt API. libvirt works on various OS. 
   This
new API will support GNU/Linux initially and as and when other platforms
starts supporting memory tunables, the interface could be enabled for
them. Adding following two function pointer to the virDriver interface.
   
1) domainSetMemoryParameters: which would take one or more name-value
   pairs. This makes the API extensible, and agnostic to the kind of
   parameters supported by various Hypervisors.
2) domainGetMemoryParameters: For getting current memory parameters
   
 Corresponding libvirt public API:
 int virDomainSetMemoryParameters (virDomainPtr domain,
                                   virMemoryParameterPtr params,
                                   unsigned int nparams);
 int virDomainGetMemoryParameters (virDomainPtr domain,
                                   virMemoryParameterPtr params,
                                   unsigned int nparams);
   
   
  
  Does nparams imply setting several parameters together? Does bulk
  loading help? I would prefer splitting out the API if possible
  into
  
  virCgroupSetMemory() - already present in src/util/cgroup.c
  virCgroupGetMemory() - already present in src/util/cgroup.c
  virCgroupSetMemorySoftLimit()
  virCgroupSetMemoryHardLimit()
  virCgroupSetMemorySwapHardLimit()
  virCgroupGetStats()
 
 Nope, we don't want cgroups exposed in the public API, since this
 has to be applicable to the VMWare and OpenVZ drivers too.


I am not talking about exposing these as public API, but about having
them be part of src/util/cgroup.c and utilized by the qemu driver.

It is good to abstract out the OS-independent parts, but my concern
was double exposure through APIs like driver->setMemory() that are
currently used, versus the newer API.

 
Parameter list supported:
   
   MemoryHardLimits (memory.limits_in_bytes) - Maximum memory 
   MemorySoftLimits (memory.softlimit_in_bytes) - Desired memory 
  
  Soft limits allows you to set memory limit on contention.
  
   MemoryMinimumGaurantee - Minimum memory required (without this amount 
   of
   memory, VM should not be started) 
   
   SwapHardLimits (memory.memsw_limit_in_bytes) - Maximum swap 
   SwapSoftLimits (Currently not supported by kernel) - Desired swap 
   space 
   
  
  We *don't* support SwapSoftLimits in the memory cgroup controller, with
  no plans to support it in the future either at this point. The
  semantics are just too hard to get right at the moment.
 
 That's not a huge problem. Since we have many hypervisors to support
 in libvirt, I expect the set of tunables will expand over time, and
 not every hypervisor driver in libvirt will support every tunable.
 They'll just pick the tunables that apply to them. We can leave
 SwapSoftLimits out of the public API until we find a HV that needs
 it
 
  
   Tunables memory.limit_in_bytes, memory.soft_limit_in_bytes and
   memory.memsw.limit_in_bytes are provided by the memory controller in the
   Linux kernel.
   
I am not an expert here, so I am just listing what new elements need to be
added to the XML schema:
   
<define name="resource">
   <element name="memory">
     <element name="memoryHardLimit"/>
     <element name="memorySoftLimit"/>
     <element name="memoryMinGuarantee"/>
     <element name="swapHardLimit"/>
     <element name="swapSoftLimit"/>
   </element>
</define>
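For illustration, a domain definition using the proposed elements might look like the fragment below. The element names come from the schema sketch above (with the spelling of "Guarantee" normalized); the placement under a resource block and the values are placeholders, not a committed format:

```xml
<domain type='qemu'>
  <name>test</name>
  <memory>524288</memory>
  <resource>
    <memoryHardLimit>524288</memoryHardLimit>
    <memorySoftLimit>393216</memorySoftLimit>
    <memoryMinGuarantee>262144</memoryMinGuarantee>
    <swapHardLimit>786432</swapHardLimit>
  </resource>
  ...
</domain>
```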
   
  
  I'd prefer a syntax that integrates well with what we currently have:
  
  <cgroup>
    <path>...</path>
    <controller>
      <name>...</name>
      <soft_limit .../>
      <hard_limit .../>
    </controller>
    ...
  </cgroup>
 
 That is exposing far too much info about the cgroups implementation
 details. The XML representation needs to be decoupled from the
 implementation.

Re: [libvirt] About cgroup mechanism using in libvirt

2010-06-14 Thread Balbir Singh
On Mon, Jun 14, 2010 at 3:10 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Sat, Jun 12, 2010 at 07:23:33AM -0400, Alex Jia wrote:
 Hey Daniel,
 The cgroup mechanism has been integrated into libvirt for the LXC and QEMU
 drivers. The LXC driver uses all of the cgroup controllers except net_cls
 and cpuset, while the QEMU driver only uses the cpu and devices controllers
 at present.

 From the user's point of view, some virsh commands can be used to control
 guest resources:
 1. Using 'virsh schedinfo' command to get/set CPU scheduler priority for a guest

 QEMU + LXC use the cpu controller 'cpu_shares' tunable

 2. Using 'virsh vcpupin' command to control guest vcpu affinity

 QEMU pins the process directly, doesn't use cgroups.  LXC hasn't
 implemented this yet

 3. Using 'virsh setmem' command to change memory allocation
 4. Using 'virsh setmaxmem' command to change maximum memory limit

 QEMU uses balloon driver.  LXC uses cgroups memory controller


Not sure if I understand this, but the balloon driver and memory
cgroups are not mutually exclusive. One could use both together and  I
would certainly like to see additional commands to support cgroups.
What happens if a guest (like freebsd) does not support ballooning?
Are you suggesting we'll not need cgroups at all with QEMU?

 5. Using 'virsh setvcpus' command to change number of virtual CPUs

 QEMU uses cpu hotplug. LXC hasn't implemented this.

 I just want to confirm: does 1 above use the CPU scheduler controller?
 Maybe 4 uses the memory controller, and maybe 5 uses the cpuset
 controller? I am not sure.


I think we'll need some notion of soft limits as well; I'm not sure they
can be encapsulated using the current set. We need memory shares, for
example, to encapsulate them.

Balbir

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] cgroup: Enable memory.use_hierarchy of cgroup for domain

2010-05-06 Thread Balbir Singh
On Thu, May 6, 2010 at 7:40 PM, Ryota Ozaki ozaki.ry...@gmail.com wrote:
 Through conversation with Kumar L Srikanth-B22348, I found
 that the function of getting memory usage (e.g., virsh dominfo)
 doesn't work for lxc with ns subsystem of cgroup enabled.

 This is because of features of the ns and memory subsystems.
 The ns subsystem creates a child cgroup on every process fork, and
 as a result the processes in a container are not assigned to the
 cgroup for the domain (e.g., libvirt/lxc/test1/). For example,
 libvirt_lxc and init (or whatever is specified in the XML) are
 assigned to libvirt/lxc/test1/8839/ and libvirt/lxc/test1/8839/8849/,
 respectively. On the other hand, the memory subsystem by default
 accounts memory usage only within a group of processes, i.e., it does
 not take any child (or descendant) groups into account. With these
 two features combined, virsh dominfo, which just checks the memory
 usage of the domain's cgroup, always returns zero because that cgroup
 has no processes.

 Setting memory.use_hierarchy on a group enables accounting (and
 limiting) of the memory usage of every descendant group of that group.
 By setting it on the domain's cgroup, we can get proper memory usage
 for LXC with the ns subsystem enabled. (To be exact, the setting is
 required only when the memory and ns subsystems are enabled at the
 same time, e.g., mount -t cgroup none /cgroup.)
 ---

This does sound like a valid use case and the correct fix.

  src/util/cgroup.c |   49 +
  1 files changed, 45 insertions(+), 4 deletions(-)

 diff --git a/src/util/cgroup.c b/src/util/cgroup.c
 index b8b2eb5..93cd6a9 100644
 --- a/src/util/cgroup.c
 +++ b/src/util/cgroup.c
 @@ -443,7 +443,38 @@ static int virCgroupCpuSetInherit(virCgroupPtr parent, virCgroupPtr group)
     return rc;
  }

 -static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group, int create)
 +static int virCgroupSetMemoryUseHierarchy(virCgroupPtr group)
 +{
 +    int rc = 0;
 +    unsigned long long value;
 +    const char *filename = "memory.use_hierarchy";
 +
 +    rc = virCgroupGetValueU64(group,
 +                              VIR_CGROUP_CONTROLLER_MEMORY,
 +                              filename, &value);
 +    if (rc != 0) {
 +        VIR_ERROR("Failed to read %s/%s (%d)", group->path, filename, rc);
 +        return rc;
 +    }
 +
 +    /* Setting twice causes error, so if already enabled, skip setting */
 +    if (value == 1)
 +        return 0;
 +
 +    VIR_DEBUG("Setting up %s/%s", group->path, filename);
 +    rc = virCgroupSetValueU64(group,
 +                              VIR_CGROUP_CONTROLLER_MEMORY,
 +                              filename, 1);
 +
 +    if (rc != 0) {
 +        VIR_ERROR("Failed to set %s/%s (%d)", group->path, filename, rc);
 +    }
 +
 +    return rc;
 +}
 +
 +static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group,
 +                              int create, int memory_hierarchy)
  {
     int i;
     int rc = 0;
 @@ -477,6 +508,16 @@ static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group, int creat
                     break;
                 }
             }

Can you please add a comment here stating that memory.use_hierarchy
should always be set prior to creating subcgroups and attaching
tasks?

 +            if (memory_hierarchy &&
 +                group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].mountPoint != NULL &&
 +                (i == VIR_CGROUP_CONTROLLER_MEMORY ||
 +                 STREQ(group->controllers[i].mountPoint, group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].mountPoint))) {
 +                rc = virCgroupSetMemoryUseHierarchy(group);
 +                if (rc != 0) {
 +                    VIR_FREE(path);
 +                    break;
 +                }
 +            }
         }

         VIR_FREE(path);
 @@ -553,7 +594,7 @@ static int virCgroupAppRoot(int privileged,
     if (rc != 0)
         goto cleanup;

 -    rc = virCgroupMakeGroup(rootgrp, *group, create);
 +    rc = virCgroupMakeGroup(rootgrp, *group, create, 0);

  cleanup:
     virCgroupFree(rootgrp);
 @@ -653,7 +694,7 @@ int virCgroupForDriver(const char *name,
     VIR_FREE(path);

     if (rc == 0) {
 -        rc = virCgroupMakeGroup(rootgrp, *group, create);
 +        rc = virCgroupMakeGroup(rootgrp, *group, create, 0);
         if (rc != 0)
             virCgroupFree(group);
     }
 @@ -703,7 +744,7 @@ int virCgroupForDomain(virCgroupPtr driver,
     VIR_FREE(path);

     if (rc == 0) {
 -        rc = virCgroupMakeGroup(driver, *group, create);
 +        rc = virCgroupMakeGroup(driver, *group, create, 1);
         if (rc != 0)
             virCgroupFree(group);
     }

A comment on why Domains get hierarchy support and Drivers don't will
help unless it is very obvious to developers.

Balbir

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] dont't crash in virsh dominfo domain

2010-03-19 Thread Balbir Singh
On Thu, Mar 18, 2010 at 7:18 PM, Daniel Veillard veill...@redhat.com wrote:
 On Wed, Mar 17, 2010 at 09:11:07PM +0100, Guido Günther wrote:
 Hi,

 virsh dominfo domain crashes with:

 #0  strlen () at ../sysdeps/i386/i486/strlen.S:69
 #1  0x080891c9 in qemudNodeGetSecurityModel (conn=0x8133940, secmodel=0xb5676ede) at qemu/qemu_driver.c:4911
 #2  0xb7eb5623 in virNodeGetSecurityModel (conn=0x8133940, secmodel=0x0) at libvirt.c:5118
 #3  0x0806767a in remoteDispatchNodeGetSecurityModel (server=0x811, client=0x8134080, conn=0x8133940, hdr=0x81a8388, rerr=0xb56771d8, args=0xb56771a0, ret=0xb5677144) at remote.c:1306
 #4  0x08068acc in remoteDispatchClientCall (server=0x811, client=0x8134080, msg=0x8168378) at dispatch.c:506
 #5  0x08068ee3 in remoteDispatchClientRequest (server=0x811, client=0x8134080, msg=0x8168378) at dispatch.c:388
 #6  0x0805baba in qemudWorker (data=0x811de2c) at libvirtd.c:1528
 #7  0xb7bb8585 in start_thread (arg=0xb5677b70) at pthread_create.c:300
 #8  0xb7b3a29e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

 if there's no primary security driver set, since we only initialize the
 secmodel.model and secmodel.doi if we have one. The attached patch checks
 for primarySecurityDriver instead of securityDriver, since the latter is
 always set in qemudSecurityInit().
 Cheers,
  -- Guido

 From 1d26ec760739b0ea17d1b29730dbdb5632d3565c Mon Sep 17 00:00:00 2001
 From: Guido Günther a...@sigxcpu.org
 Date: Wed, 17 Mar 2010 21:04:11 +0100
 Subject: [PATCH] Don't crash without a security driver

 virsh dominfo vm crashes if there's no primary security driver set,
 since we only initialize the secmodel.model and secmodel.doi if we have
 one. The attached patch checks for securityPrimaryDriver instead of
 securityDriver, since the latter is always set in qemudSecurityInit().

 Closes: http://bugs.debian.org/574359
 ---
  src/qemu/qemu_driver.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
 index 67d9ade..e26c591 100644
 --- a/src/qemu/qemu_driver.c
 +++ b/src/qemu/qemu_driver.c
 @@ -4956,7 +4956,7 @@ static int qemudNodeGetSecurityModel(virConnectPtr conn,
      int ret = 0;

      qemuDriverLock(driver);
 -    if (!driver->securityDriver) {
 +    if (!driver->securityPrimaryDriver) {
          memset(secmodel, 0, sizeof (*secmodel));
          goto cleanup;
      }
 --

I've seen this issue too... I can confirm that this patch fixes the issue.

Balbir

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] kernel summit topic - 'containers end-game'

2009-06-30 Thread Balbir Singh
* Serge E. Hallyn se...@us.ibm.com [2009-06-30 15:06:13]:

 Quoting Balbir Singh (bal...@linux.vnet.ibm.com):
  On Tue, Jun 23, 2009 at 8:26 PM, Serge E. Hallyn se...@us.ibm.com wrote:
   A topic on ksummit agenda is 'containers end-game and how do we
   get there'.
  
   So for starters, looking just at application (and system) containers, 
   what do
   the libvirt and liblxc projects want to see in kernel support that is 
   currently
   missing?  Are there specific things that should be done soon to make 
   containers
   more useful and usable?
  
   More generally, the topic raises the question... what 'end-games' are 
   there?
   A few I can think of off-hand include:
  
          1. resource control
  
  We intend to hold an io-controller minisummit before KS; we should have
  updates on that front. We also need to discuss CPU hard limits and
  memory soft limits. We need controls for memory large pages, mlock, OOM
  notification support, shared page accounting, etc. Eventually, on the
  libvirt front, we want to isolate cgroup and LXC support into
  individual components (long term).
 
 Thanks, Balbir.  By the last sentence, are you talking about having
 cgroup in its own libcgroup, or do you mean something else?
 
 On the topic of cgroups, does anyone not agree that we should try
 to get rid of the ns cgroup, at least once user namespaces can
 prevent root in a container from escaping their cgroup?


I would have no objections to trying to obsolete ns cgroup once user
namespaces can do what you suggest. 

-- 
Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] kernel summit topic - 'containers end-game'

2009-06-29 Thread Balbir Singh
On Tue, Jun 23, 2009 at 8:26 PM, Serge E. Hallyn se...@us.ibm.com wrote:
 A topic on ksummit agenda is 'containers end-game and how do we
 get there'.

 So for starters, looking just at application (and system) containers, what do
 the libvirt and liblxc projects want to see in kernel support that is 
 currently
 missing?  Are there specific things that should be done soon to make 
 containers
 more useful and usable?

 More generally, the topic raises the question... what 'end-games' are there?
 A few I can think of off-hand include:

        1. resource control

We intend to hold an io-controller minisummit before KS; we should have
updates on that front. We also need to discuss CPU hard limits and
memory soft limits. We need controls for memory large pages, mlock, OOM
notification support, shared page accounting, etc. Eventually, on the
libvirt front, we want to isolate cgroup and LXC support into
individual components (long term).

        2. lightweight virtual servers
        3. (or 2.5) unprivileged containers/jail-on-steroids
                (lightweight virtual servers in which you might, just
                maybe, almost, be able to give away a root account, at
                least as much as you could do so with a kvm/qemu/xen
                partition)
        4. checkpoint, restart, and migration

 For each end-game, what kernel pieces do we think are missing?  For instance,
 people seem agreed that resource control needs io control :)  Containers imo
 need a user namespace.  I think there are quite a few network namespace
 exploiters who require sysfs directory tagging (or some equivalent) to
 allow us to migrate physical devices into network namespaces.  And
 checkpoint/restart needs... checkpoint/restart.

Balbir Singh

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 1 of 2] Add internal cgroup manipulation functions

2008-10-03 Thread Balbir Singh
Dan Smith wrote:
 This patch adds src/cgroup.{c,h} with support for creating and manipulating
 cgroups.  It's quite naive at the moment, but should provide something to
 work with to move forward with resource controls.
 
 All groups created with the internal API are forced under $mount/libvirt/
 to keep everything together.  The first time a group is created, the libvirt
 directory is also created, and the settings from the root are inherited.
 
 The code scans the mount table to look for the first mount of type cgroup,
 and assumes that all controllers are mounted there.  I think this
 could/should be updated to prefer a mount with just the controller(s) we
 want, if there are multiple ones.
 
 If you have the cpuset controller enabled, and cpuset.cpus_exclusive is 1,
 then all cgroups to be created will fail.  Since we probably shouldn't
 blindly set the root to be non-exclusive, we may also want to consider this
 condition to be no cgroup support.
 
 diff -r 444e2614d0a2 -r 8e948eb88328 src/Makefile.am
 --- a/src/Makefile.am Wed Sep 17 16:07:03 2008 +
 +++ b/src/Makefile.am Mon Sep 29 09:37:42 2008 -0700
 @@ -96,7 +96,8 @@
   lxc_conf.c lxc_conf.h   \
   lxc_container.c lxc_container.h \
   lxc_controller.c\
 - veth.c veth.h
 + veth.c veth.h   \
 + cgroup.c cgroup.h
 
  OPENVZ_DRIVER_SOURCES =  \
   openvz_conf.c openvz_conf.h \
 diff -r 444e2614d0a2 -r 8e948eb88328 src/cgroup.c
 --- /dev/null Thu Jan 01 00:00:00 1970 +
 +++ b/src/cgroup.c Mon Sep 29 09:37:42 2008 -0700
 @@ -0,0 +1,526 @@
 +/*
 + * cgroup.c: Tools for managing cgroups
 + *
 + * Copyright IBM Corp. 2008
 + *
 + * See COPYING.LIB for the License of this software
 + *
 + * Authors:
 + *  Dan Smith [EMAIL PROTECTED]
 + */
 +#include <config.h>
 +
 +#include <stdio.h>
 +#include <stdint.h>
 +#include <inttypes.h>
 +#include <mntent.h>
 +#include <fcntl.h>
 +#include <string.h>
 +#include <errno.h>
 +#include <stdlib.h>
 +#include <stdbool.h>
 +#include <sys/stat.h>
 +#include <sys/types.h>
 +#include <libgen.h>
 +
 +#include "internal.h"
 +#include "util.h"
 +#include "cgroup.h"
 +
 +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
 +#define DEBUG0(msg) VIR_DEBUG(__FILE__, "%s", msg)
 +
 +struct virCgroup {
 +char *path;
 +};
 +

There is no support for permissions, is everything run as root?

 +void virCgroupFree(virCgroupPtr *group)
 +{
 +if (*group != NULL) {
 +free((*group)->path);
 +free(*group);
 +*group = NULL;
 +}
 +}
 +
 +static virCgroupPtr cgroup_get_mount(void)
 +{
 +FILE *mounts;
 +struct mntent entry;
 +char buf[512];

Is 512 arbitrary? How do we know it is going to be sufficient?

 +virCgroupPtr root = NULL;
 +
 +root = calloc(1, sizeof(*root));
 +if (root == NULL)
 +return NULL;
 +
 +mounts = fopen("/proc/mounts", "r");
 +if (mounts == NULL) {
 +DEBUG0("Unable to open /proc/mounts: %m");
 +goto err;
 +}
 +
 +while (getmntent_r(mounts, &entry, buf, sizeof(buf)) != NULL) {
 +if (STREQ(entry.mnt_type, "cgroup")) {
 +root->path = strdup(entry.mnt_dir);
 +break;
 +}
 +}
 +
 +if (root->path == NULL) {
 +DEBUG0("Did not find cgroup mount");

Or strdup failed due to ENOMEM

 +goto err;
 +}
 +
 +fclose(mounts);
 +
 +return root;
 +err:
 +virCgroupFree(root);
 +
 +return NULL;
 +}
 +
 +int virCgroupHaveSupport(void)
 +{
 +virCgroupPtr root;
 +
 +root = cgroup_get_mount();
 +if (root == NULL)
 +return -1;
 +
 +virCgroupFree(root);
 +

This is quite a horrible way of wasting computation.

 +return 0;
 +}
 +
 +static int cgroup_path_of(const char *grppath,
 +  const char *key,
 +  char **path)
 +{
 +virCgroupPtr root;
 +int rc = 0;
 +
 +root = cgroup_get_mount();

So every routine calls cgroup_path_of(), which reads the mount table, finds
the entry for cgroup, and returns it; why not do that just once and reuse it?

 +if (root == NULL) {
 +rc = -ENOTDIR;
 +goto out;
 +}
 +
 +if (asprintf(path, "%s/%s/%s", root->path, grppath, key) == -1)
 +rc = -ENOMEM;
 +out:
 +virCgroupFree(root);
 +
 +return rc;
 +}
 +
 +int virCgroupSetValueStr(virCgroupPtr group,
 + const char *key,
 + const char *value)
 +{
 +int fd = -1;
 +int rc = 0;
 +char *keypath = NULL;
 +
 +rc = cgroup_path_of(group->path, key, &keypath);
 +if (rc != 0)
 +return rc;
 +
 +fd = open(keypath, O_WRONLY);

I see a mix of open and fopen calls. I would prefer to stick to just one;
it helps with readability.

 +if (fd < 0) {
 +DEBUG("Unable to open %s: %m", keypath);
 +rc 

[libvirt] [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
Hi, Everyone,

I've seen a new set of patches from Dan Smith, which implement cgroup support
for libvirt. While the patches seem simple, there are some issues that have been
pointed out in the posting itself.

I hope that libvirt will switch over (maybe after your concerns are addressed,
and definitely in the longer run) to using libcgroups rather than having an
internal implementation of cgroups. The advantage of switching over would be
reusing the functionality that libcgroup already provides.

libcgroups (libcg.sf.net) provides

1. Ability to configure and mount cgroups and controllers via initscripts and a
configuration file
2. An API to control and read cgroups information
3. Thread safety around API calls
4. Daemons to automatically classify a task based on a certain set of rules
5. API to extract current cgroup classification (where is the task currently in
the cgroup hierarchy)

While re-implementing might sound like a cool thing to do, here are the
drawbacks:

1. It leads to code duplication and reduces code reuse
2. It leads to confused users

I understand that in the past there has been a perception that libcgroups might
not yet be ready, because we did not have ABI stability built into the library
and the header file had old comments about things changing. I would urge the
group to look at the current implementation of libcgroups (look at v0.32) and
help us

1. Fix any issues you see or point them to us
2. Add new API or request for new API that can help us integrate better with 
libvirt



-- 
Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] wrote:
 On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
 I understand that in the past there has been a perception that libcgroups 
 might
 not yet be ready, because we did not have ABI stability built into the 
 library
 and the header file had old comments about things changing. I would urge the
 group to look at the current implementation of libcgroups (look at v0.32) and
 help us

 1. Fix any issues you see or point them to us
 2. Add new API or request for new API that can help us integrate better with 
 libvirt

 To expand on what I said in my other mail about providing value-add over
 the representation exposed by the kernel, here's some thoughts on the API
 exposed.

 Consider the following high level use case of libvirt

  - A set of groups, in a 3 level hierarchy APPNAME/DRIVER/DOMAIN
  - Control the ACL for block/char devices
  - Control memory limits

 This translates into an underlying implementation: I need to create 3
 levels of cgroups in the filesystem, attach my PIDs at the 3rd level,
 use the memory and device controllers, and set values for the attributes
 exposed by the controllers. Notice I'm not actually setting any config
 params at the 1st & 2nd levels, but they do still need to exist to ensure
 namespace uniqueness amongst different applications using cgroups.

 The current cgroups API provides calls that directly map to individual
 actions on the kernel-exposed filesystem. So as an application developer
 I have to explicitly create the 3 levels of hierarchy, tell it I want
 to use the memory & device controllers, format config values into the
 syntax required for each attribute, and remember the attribute names.

 // Create the hierarchy APPNAME/DRIVER/DOMAIN
 c1 = cgroup_new_cgroup("libvirt")
 c2 = cgroup_new_cgroup_parent(c1, "lxc")
 c3 = cgroup_new_cgroup_parent(c2, domain.name)

 // Setup the controllers I want to use
 cgroup_add_controller(c3, "devices")
 cgroup_add_controller(c3, "memory")

 // Add my domain's PID to the cgroup
 cgroup_attach_task(c3, domain.pid)

 // Set the device ACL limits
 cgroup_set_value_string(c2, "devices.deny", "a");

 char buf[1024];
 sprintf(buf, "%c %d:%d", 'c', 1, 3);
 cgroup_set_value_string(c2, "devices.allow", buf);

 // Set memory limit
 cgroup_set_value_uint64(c2, "memory.limit_in_bytes", domain.memory * 1024);

 This really isn't providing any semantically useful abstraction over
 the direct filesystem manipulation. It is just a bunch of wrappers for
 mkdir(), mount() and read()/write() calls. My application still has to
 know far too much information about the details of cgroups as exposed
 by the kernel.


True, it definitely does, and the way I look at APIs is that they are
layers. We've built the first layer, which abstracts permissions, paths
and strings into a set of useful APIs. The second layer does the things
you describe; the question then is, why don't we have it yet?

Let me try and answer that question

1. We've been trying to build configuration, classification and the
low-level plumbing.
2. We've been planning to build exactly what you describe; we call
that the pluggable architecture, where controllers plug in their
logic and provide the abstractions you need, but we have not gotten there yet.

When you announced cgroup support in libvirt, it was definitely going
to be a user and we hoped that you would come to us with your exact
requirements that you've mentioned now (believe me, your feedback is
very useful). The question to ask is whether it was cheaper for you to
build these abstractions into libvirt, or to have helped us or asked us
to do so; we would have gladly obliged. You might say that the onus is
on the maintainers to do the right thing without feedback, but I would
beg to differ.

What you've asked for, I consider as a layer on top of the API we have
now and should be easy to build.

 I do not care that there is a concept of  'controllers' at all, I just
 want to set device ACLs and memory limits. I do not care what the attributes
 in the filesystem are called, again I just want to set device ACLs and memory
 limits.  I do not care what the data format for them must be for device/memory
 settings. Memory settings could be stored in base-2, base-10 or base-16 I
 should not have to know this information.

 With this style of API, the library provide no real value-add or  compelling
 reason to use it.

 What might a more useful API look like? At least from my point of view,
 I'd like to be able to say:

  // Tell it I want $PID placed in APPNAME/DRIVER/DOMAIN
  char *path[] = { "libvirt", "lxc", domain.name };
  cg = cgroup_new_path(path, domain.pid)

  // I want to deny all devices
  cgroup_deny_all_devices(cg);

  // Allow /dev/null - either by node/major/minor
  cgroup_allow_device_node(cg, 'c', 1, 3);

  // Or more conveniently, just give it a node to copy info

Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
On Sat, Oct 4, 2008 at 1:17 AM, Daniel P. Berrange [EMAIL PROTECTED] wrote:
 On Sat, Oct 04, 2008 at 12:13:38AM +0530, Balbir Singh wrote:
 On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] 
 wrote:
 True, it definitely does and the way I look at APIs is that they are
 layers. We've built the first layer that abstracts permissions, paths
 and strings into a set of useful API. The second layer does things
 that you say, the question then is why don't we have it yet?

 Let me try and answer that question

 1. We've been trying to build configuration, classification and the
 low level plumbing
 2. We've been planning to build exactly what you describe; we call
 that the pluggable architecture, where controllers plug in their
 logic and provide the abstractions you need, but we have not gotten there yet.

 When you announced cgroup support in libvirt, it was definitely going
 to be a user and we hoped that you would come to us with your exact
 requirements that you've mentioned now (believe me, your feedback is
 very useful). The question to ask is whether it was cheaper for you to
 build these abstractions into libvirt, or to have helped us or asked us
 to do so; we would have gladly obliged. You might say that the onus is
 on the maintainers to do the right thing without feedback, but I would
 beg to differ.

 The thing I didn't mention, is that until Dan posted his current patches
 actually implementing the cgroups stuff in LXC driver, I didn't have a
 good picture of what the ideal higher level interface would look like.
 If you try and imagine high level APIs, without having an app actually
 using them, it's all too easy to design something that turns out to not
 be useful.

 So while I know the low level cgroups API isn't what we need, it takes
 the current proof of concept in the libvirt LXC driver to discover what
 is an effective approach for libcgroups. I suspect our code will evolve
 further as we learn from what we've got now.  By doing this entirely
 within libvirt we can experiment with effective implementation strategies
 without having to lock down a formally supported API immediately. Once
 things settle down, it'll be easier for libcgroups to see exactly what is
 important for a high level API and thus make one that's useful to more
 apps in the long term.


Please remember my words: if you ever find that you have a code base
that looks like what we have in libcgroups, please switch over to
libcgroup. I fear that you will reach that stage; the code that is
going in right now has too many things hard-coded and will need a lot
of changes going forward. Things like adding support for new
controllers are not going to be straightforward, your assumption that
only root can create a container might be broken, and we'll build
support for hierarchies, which will require further changes, etc. I am
not trying to scare you, just trying to make sure we don't solve the
same problems twice.

Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh

 The thing I didn't mention, is that until Dan posted his current patches
 actually implementing the cgroups stuff in LXC driver, I didn't have a
 good picture of what the ideal higher level interface would look like.
 If you try and imagine high level APIs, without having an app actually
 using them, it's all too easy to design something that turns out to not
 be useful.

 So while I know the low level cgroups API isn't what we need, it takes
 the current proof of concept in the libvirt LXC driver to discover what
 is an effective approach for libcgroups. I suspect our code will evolve
 further as we learn from what we've got now.  By doing this entirely
 within libvirt we can experiment with effective implementation strategies
 without having to lock down a formally supported API immediately. Once
 things settle down, it'll be easier for libcgroups to see exactly what is
 important for a high level API and thus make one that's useful to more
 apps in the long term.


Agreed, the libvirt changes for cgroups have shown us a useful layer
to build. We'll keep on top of it and try and build something that
everyone can use.

Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support

2008-10-03 Thread Balbir Singh
Daniel P. Berrange wrote:
 On Wed, Oct 01, 2008 at 08:41:19AM +0530, Balbir Singh wrote:
 Dan Smith wrote:
 DB At the same time having the controllers mounted is mandatory for
 DB libvirt to work and asking the admin to set things up manually
 DB also sucks. So perhaps we'll need to mount them automatically, but
 DB make this behaviour configurable in some way, so the admin can
 DB override it

 Perhaps we can:

  - Have a list of controllers we use (memory and devices so far)
  - Create each group in all mounts required to satisfy our necessary
controllers
  - Select the appropriate mount when setting a cont.key value

 I am not sure how libvirt provides thread safety, but I did not see any
 explicit code for that.
 
 The thread safety model for libvirt has two levels
 
  - A single virConnectPtr object must only be used by one thread. 
If you have multiple threads, you must provide each with its
own connect object
 
  - Within a stateless driver (Xen, OpenVZ, Test), there is no shared
state between virConnectPtr objects, so there are no thread issues
in this respect
 
  - With a stateful driver, the libvirtd daemon ensures that only a
single thread is active at once, so again there are no thread
issues there either.
 
 Now, in a short while I will be making the daemon fully-multithreaded. When
 this happens, the stateful drivers will be required to maintain mutexes for
 locking. The locking model will have 2 levels: one lock over the driver as
 a whole. This is held only while acquiring a lock against the object being
 modified (eg the virtual domain object).
 
 Each virtual domain lives in one cgroup, so there is a single virCGroup
 object associated with each domain. The virCGroup object's state is
 self-contained, so independent virCGroup objects can be accessed
 concurrently from multiple threads without any thread-safety issues.

Thanks, that was quite insightful.

-- 
Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
Daniel P. Berrange wrote:
 On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
 Hi, Everyone,

 I've seen a new set of patches from Dan Smith, which implement cgroup support
 for libvirt. While the patches seem simple, there are some issues that have 
 been pointed out in the posting itself.

 I hope that libvirt will switch over (maybe after your concerns are addressed,
 and definitely in the longer run) to using libcgroups rather than having an
 internal implementation of cgroups. The advantages of switching over would be
 using the functionality that libcgroup already provides

 libcgroups (libcg.sf.net) provides

 1. Ability to configure and mount cgroups and controllers via initscripts
 and a configuration file
 2. An API to control and read cgroups information
 3. Thread safety around API calls
 4. Daemons to automatically classify a task based on a certain set of rules
 5. API to extract the current cgroup classification (where the task currently
 sits in the cgroup hierarchy)
 
 So from a functional point of view you are addressing essentially three
 use cases
 
  1. System configuration for controllers
  2. Automatic task classification
  3. Application development API for creating groups
 
 If each piece is correctly designed, the choice of implementation for
 each of these can be, and in some cases must be, totally independent.
 
 Since the kernel restricts a single controller to being attached to only
 one cgroupfs mount point, and that attachment cannot be changed once made,
 the choice of how / where to mount controllers must remain outside the
 scope of applications. If any application using cgroups were to specify
 mount points, it would be inflicting its own requirements on every user of
 cgroups. This implies that applications must be designed to work with
 whatever controller mount configuration the admin has set up, and not
 configure things themselves. So the impl for point 1 (configuration)
 must, by necessity, be completely independent of the impl for point 3
 (application API).
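That discovery step — working with whatever mount configuration the admin chose, rather than mounting anything yourself — can be sketched roughly like this. A minimal illustration only: the parsing follows the conventional /proc/mounts format, and the sample mount table below is hypothetical.

```python
# Find where the admin mounted a given cgroup controller by scanning a
# /proc/mounts-style table, instead of dictating mount points ourselves.

def find_controller_mount(mounts_text, controller):
    """Return the mount point hosting `controller`, or None if not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, ...
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        mount_point, options = fields[1], fields[3].split(",")
        if controller in options:
            return mount_point
    return None

# Hypothetical mount table: cpu/cpuacct/memory co-mounted, devices separate.
SAMPLE = """\
cgroup /cgroup/cpumem cgroup rw,cpu,cpuacct,memory 0 0
cgroup /cgroup/devices cgroup rw,devices 0 0
"""

print(find_controller_mount(SAMPLE, "memory"))   # -> /cgroup/cpumem
print(find_controller_mount(SAMPLE, "devices"))  # -> /cgroup/devices
```

In real use the table would come from reading /proc/mounts, but the routing logic is the same.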
 
 Considering automatic task classification. The task classification engine
 must be able to cope with the fact that applications have some functional
 requirements on cgroups setup. Taking libvirt as an example, we have a
 specific need to apply some controllers over a group of processes forming
 a container. A task classification engine must not re-classify individual
 tasks within a container because that would conflict with the semantics
 required by libvirt. It is, however, free to re-classify the libvirtd 
 daemon itself - whatever cgroup libvirtd is placed in, it will create the
 LXC cgroups below this point.
 
 So if libvirt is designed correctly, it will work with whatever cgroup
 task classification engine that might be running. Similarly if the task
 classification engine has been designed to co-operate with applications
 there is no problem running it alongside libvirt. Thus the implementations
 of point 2 (task classification) and point 3 (application API) have no
 need to be formally tied together. Furthermore, tying them together does
 not magically solve the problem that both applications and the cgroups task
 classification engine need to be intelligently designed to co-operate.
 

Agreed!

 
 While re-implementing might sound like a cool thing to do, here are the 
 drawbacks

 1. It leads to code duplication and reduces code reuse
 
 This is important if the library code is providing significant value add to
 the application using it. As it stands, libcgroup is merely a direct interface
 to the cgroups filesystem providing weakly typed setters and getters - with
 the exception of looking at the mount table to find where a controller lives,
 this is not hard / complex code, so the benefits of re-use are not
 particularly high.
 

Please see my earlier email on layering of API.

 In such a scenario reducing code duplication is not in itself a benefit, since
 there are costs associated with using external libraries. It is more
 complicated to integrate two independent styles of API, particularly with
 different views on error reporting, memory management and varying expectations
 for the semantic models exposed.
 

I disagree; I see a lot of code that does the same thing: look through
/proc/mounts, read and parse values to write and read. I see two APIs you've
built on top of what libcgroup has (one for setting the memory limit and the
other for devices). Please compare the patch sizes as well and you'll see what
I mean.

 There are a number of 'hard' questions wrt cgroups usage by applications,
 two of which are outlined above. Simply having all applications use a single
 API cannot magically solve any of these problems - no matter what API is used
 application developers need to take care to design their usage of cgroups
 such that it 'plays nicely' with other applications.
 

Playing nicely is a definite requirement, but not using existing code or
contributing to it if something is broken and re

Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support

2008-09-30 Thread Balbir Singh
Daniel P. Berrange wrote:
 On Tue, Sep 30, 2008 at 11:11:57AM -0700, Dan Smith wrote:
 BS For all practical purposes, it is not possible to mount all
 BS controllers at the same place. Consider a simple case of ns, if
 BS the ns controller is mounted, you need root permissions to create
 BS new groups, which defeats the whole purpose of the cgroup
 BS filesystem and assigning permissions, so that an application can
 BS create groups on its own.

 I don't think I'd go so far as saying that it defeats the whole
 purpose, but I understand your point.

 After just a small amount of playing around, it seems like it might be
 reasonable to just mount the controllers we care about somewhere just
 for libvirt.

 - What to do if memory and device controllers aren't present
 - What to do if the root group is set for exclusive cpuset behavior
 BS These need to be fixed as well.

 ...that's why I pointed them out :)

 I'm thinking that mounting the controllers we care about at daemon
 startup (as mentioned above) would solve both of these issues as well.

 Does anyone have an opinion on taking that approach?
 
 The trouble is then libvirt would be dictating policy to the host
 admin, because once you mount a particular controller, you can't
 change the way it's mounted. So if libvirt mounted each controller
 separately, then the admin couldn't have a mount with multiple
 controllers active, and vice-versa. The kernel cgroups interface
 really sucks in this regard :-(
 
 At the same time having the controllers mounted is mandatory for libvirt
 to work and asking the admin to set things up manually also sucks. So
 perhaps we'll need to mount them automatically, but make this behaviour
 configurable in some way, so the admin can override it


As I mentioned in my previous email, one could use the cgconfigparser to
automatically mount the controllers from initscripts at boot, and then also use
a policy to automatically classify tasks.
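For illustration, a configuration along these lines could mount the controllers and pre-create a group at boot. The syntax follows libcgroup's cgconfig.conf format; the mount points, group name, and limit value are hypothetical:

```
# Mount two controllers at admin-chosen locations.
mount {
    cpu    = /cgroup/cpu;
    memory = /cgroup/memory;
}

# Pre-create a group with a memory limit (256 MB here, as an example).
group vm1 {
    memory {
        memory.limit_in_bytes = 268435456;
    }
}
```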



-- 
Balbir



Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support

2008-09-30 Thread Balbir Singh
Dan Smith wrote:
 DB The trouble is then libvirt would be dictating policy to the host
 DB admin, because once you mount a particular controller, you can't
 DB change the way it's mounted. So if libvirt mounted each controller
 DB separately, then the admin couldn't have a mount with multiple
 DB controllers active, and vice-versa.
 
 Oh, I see.  I had left that out of my quick test.  I had assumed that
 it would behave as you would expect.
 
 DB The kernel cgroups interface really sucks in this regard :-(
 
 I was going to go with "surprisingly unideal"... but yeah.

The interface was designed to allow the flexibility of separating controllers.
One might need different resources for different tasks; they should not be
forced to share the same set of controllers. Cgroups has the notion of busy
(as in no new groups are created underneath), so a hierarchy needs to be not
busy before the way it is mounted can be changed.

This has made our lives very hard while working on libcgroup. The other thing
that gets hard is controller interplay and rules. CPUsets, for example, has
rules about not allowing tasks to attach without adding cpus and mems, and
other rules about exclusivity and about certain files existing only in the
root.
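To make that cpuset rule concrete, here is a small sketch of why attach order matters. This runs against a scratch directory standing in for a cpuset mount, not a real cgroup filesystem; `attach_task` is a hypothetical helper that emulates the kernel's check, though the file names follow the cpuset controller's actual interface.

```python
import os
import tempfile

def attach_task(group_dir, pid):
    # The kernel rejects an attach while cpuset.cpus or cpuset.mems is
    # empty; emulate that check against our scratch directory.
    for name in ("cpuset.cpus", "cpuset.mems"):
        path = os.path.join(group_dir, name)
        if not os.path.exists(path) or not open(path).read().strip():
            raise OSError("cannot attach: %s is not set" % name)
    with open(os.path.join(group_dir, "tasks"), "a") as f:
        f.write("%d\n" % pid)

grp = tempfile.mkdtemp()
try:
    attach_task(grp, 1234)          # fails: cpus/mems not populated yet
except OSError as err:
    print(err)

with open(os.path.join(grp, "cpuset.cpus"), "w") as f:
    f.write("0-1")
with open(os.path.join(grp, "cpuset.mems"), "w") as f:
    f.write("0")
attach_task(grp, 1234)              # succeeds once both are set
```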

 
 DB At the same time having the controllers mounted is mandatory for
 DB libvirt to work and asking the admin to set things up manually
 DB also sucks. So perhaps we'll need to mount them automatically, but
 DB make this behaviour configurable in some way, so the admin can
 DB override it
 
 Perhaps we can:
 
  - Have a list of controllers we use (memory and devices so far)
  - Create each group in all mounts required to satisfy our necessary
controllers
  - Select the appropriate mount when setting a cont.key value
 

I am not sure how libvirt provides thread safety, but I did not see any
explicit code for that?
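The "select the appropriate mount" step in the list above could be sketched roughly as follows. This is a hypothetical illustration: the `CGroup` class and the scratch directory standing in for the cgroup filesystem are invented, and a real implementation would build the controller map from /proc/mounts.

```python
import os
import tempfile

class CGroup:
    def __init__(self, mounts, group):
        self.mounts = mounts    # controller name -> mount point
        self.group = group      # group name under each mount

    def set_value(self, cont_key, value):
        # "memory.limit_in_bytes" -> controller "memory"; route the write
        # to that controller's mount.
        controller = cont_key.split(".", 1)[0]
        path = os.path.join(self.mounts[controller], self.group, cont_key)
        with open(path, "w") as f:
            f.write(str(value))

# Demo against a throwaway directory tree mimicking two separate mounts.
root = tempfile.mkdtemp()
mounts = {"memory": os.path.join(root, "mem"),
          "devices": os.path.join(root, "dev")}
for m in mounts.values():
    os.makedirs(os.path.join(m, "vm1"))

cg = CGroup(mounts, "vm1")
cg.set_value("memory.limit_in_bytes", 64 * 1024 * 1024)
```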

 It will muck things up a bit, but I think it might be doable.
 

I would really recommend looking at libcgroup in the long run and using it.


-- 
Balbir



Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support

2008-09-29 Thread Balbir Singh
Dan Smith wrote:
 This patch set adds basic cgroup support to the LXC driver.  It consists of
 a small internal cgroup manipulation API, as well as changes to the driver
 itself to utilize the support.  Currently, we just set a memory limit
 and the allowed devices list.  The cgroup.{c,h} interface can be easily
 redirected to libcgroup in the future if and when the decision to move in
 that direction is made.
 
 Some discussion on the following points is probably warranted, to help
 determine how deep we want to go with this internal implementation, in terms
 of supporting complex system configurations, etc.
 
  - What to do if controllers are mounted in multiple places

For all practical purposes, it is not possible to mount all controllers at the
same place. Consider the simple case of ns: if the ns controller is mounted,
you need root permissions to create new groups, which defeats the whole purpose
of the cgroup filesystem and assigning permissions so that an application can
create groups on its own.

  - What to do if memory and device controllers aren't present
  - What to do if the root group is set for exclusive cpuset behavior

These need to be fixed as well.


-- 
Balbir



Re: [Libvir] [RFC] Add Container support to libvirt

2008-01-15 Thread Balbir Singh
* Daniel P. Berrange [EMAIL PROTECTED] [2008-01-15 15:52:13]:

 On Tue, Jan 15, 2008 at 12:26:43AM -0800, Dave Leskovec wrote:
  Greetings,
  
  Following up on the XML format for the Linux Container support I 
  proposed...  I've made the following recommended changes:
  * Changed mount tags
  * Changed nameserver tag to be consistent with gateway
  * Moved cpushare and memory tags outside container tag
  
  This is the updated format:
  <domain type='linuxcontainer'>
    <name>Container123</name>
    <uuid>8dfd44b31e76d8d335150a2d98211ea0</uuid>
    <container>
      <filesystem>
        <mount>
          <source dir='/home/user/lxc_files/etc/'/>
          <target dir='/etc/'/>
        </mount>
        <mount>
          <source dir='/home/user/lxc_files/var/'/>
          <target dir='/var/'/>
        </mount>
      </filesystem>
 
 Comparing this to the Linux-VServer XML that Daniel posted, you're both
 pretty much representing the same concepts so we need to make a decision
 about which format to use for  filesystem mounts.
 
 OpenVZ also provides a /domain/container/filesystem tag, though it
 uses a concept of filesystem templates auto-cloned per container
 rather than explicit mounts. I think I'd like to see
 
    <filesystem type='mount'>
      <source dir='/home/user/lxc_files/etc/'/>
      <target dir='/etc/'/>
    </filesystem>
 
 For the existing OpenVZ XML, we can augment their filesystem tag with
 an attribute type='template'.
 
    <application>/usr/sbin/container_init</application>
    <network hostname='browndog'>
      <ip address='192.168.1.110' netmask='255.255.255.0'>
        <gateway address='192.168.1.1'/>
        <nameserver address='192.168.1.1'/>
      </ip>
    </network>
 
 Again this is pretty similar to needs of VServer / OpenVZ. In the existing
 OpenVZ XML, the gateway and nameserver tags are immediately within the
 network tag, rather than nested inside the ip tag. Aside from that it
 looks to be a consistent set of information.
 
    </container>
    <cpushare>40</cpushare>
 
 As Daniel points out, we've thus far explicitly excluded tuning info from
 the XML. Not that I have any suggestion on where else to put it at this
 time. This is a minor thing though, easily implemented once we come to a
 decision.

At some point, we'll need resource management extensions to libvirt.
vserver and OpenVZ both use resource management, and it will be useful
for containers and kvm/qemu as well. I think we'll need a resource
management feature extension to the XML format.

Currently, resource management is provided through control groups (I
can send out links if desired). Ideally, once configured, the control
groups should be persistent (visible across reboots), so we need to
save state.

Thoughts?

 
    <memory>65536</memory>
    <devices>
      <console tty='/dev/pts/4'/>
    </devices>
  </domain>
  
  Does this look ok now?  All comments and questions are welcome.
 
 Pretty close.
 
 Dan.
 -- 
 |=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
 |=-   Perl modules: http://search.cpan.org/~danberr/  -=|
 |=-   Projects: http://freshmeat.net/~danielpb/   -=|
 |=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
 

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
