Re: [Devel] [PATCH vz7] fuse: fuse_prepare_write() cannot handle page from killed request

2017-02-14 Thread Konstantin Khorenko

Dima, please review the patch.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 02/14/2017 03:03 AM, Maxim Patlasov wrote:

After fuse_prepare_write() called __fuse_readpage(file, page, ...),
the page might already be unlocked by fuse_kill_requests():


for (i = 0; i < req->num_pages; i++) {
struct page *page = req->pages[i];
SetPageError(page);
unlock_page(page);


so it is incorrect to touch it at all. The problem can easily be
fixed the same way as it was done in fuse_readpage(): by checking the
"killed" flag.

Another minor complication is that there are three different use-cases
for that snippet from fuse_kill_requests() above: fuse_readpages(),
fuse_readpage() and fuse_prepare_write(). Among them, only the latter
needs an explicit page_cache_release() call. That's why the patch
introduces an ad-hoc request flag, "page_needs_release".
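
For the caller, the resulting pattern looks like this (condensed from
the fuse_prepare_write() hunk below; not verbatim):

    num_read = __fuse_readpage(file, page, page_len, &err, &req, NULL,
                               true /* page_needs_release */, &killed);
    ...
    if (err) {
        /* If the request was killed, fuse_kill_requests() has already
         * unlocked and released the page, so we must not touch it. */
        if (!killed) {
            unlock_page(page);
            page_cache_release(page);
        }
    }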

https://jira.sw.ru/browse/PSBM-54547
Signed-off-by: Maxim Patlasov 
---
 fs/fuse/file.c   |   15 ++-
 fs/fuse/fuse_i.h |3 +++
 fs/fuse/inode.c  |2 ++
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a514748..41ed6f0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1008,7 +1008,7 @@ static void fuse_short_read(struct fuse_req *req, struct inode *inode,

 static int __fuse_readpage(struct file *file, struct page *page, size_t count,
   int *err, struct fuse_req **req_pp, u64 *attr_ver_p,
-  bool *killed_p)
+  bool page_needs_release, bool *killed_p)
 {
struct fuse_io_priv io = { .async = 0, .file = file };
struct inode *inode = page->mapping->host;
@@ -1040,6 +1040,7 @@ static int __fuse_readpage(struct file *file, struct page *page, size_t count,
req->pages[0] = page;
req->page_descs[0].length = count;
req->page_cache = 1;
+   req->page_needs_release = page_needs_release;

num_read = fuse_send_read(req, &io, page_offset(page), count, NULL);
killed = req->killed;
@@ -1071,7 +1072,7 @@ static int fuse_readpage(struct file *file, struct page *page)
goto out;

num_read = __fuse_readpage(file, page, count, &err, &req, &attr_ver,
-  &killed);
+  false, &killed);
if (!err) {
/*
 * Short read means EOF.  If file size is larger, truncate it
@@ -1153,6 +1154,7 @@ static void fuse_send_readpages(struct fuse_req *req, struct file *file)
req->out.page_zeroing = 1;
req->out.page_replace = 1;
req->page_cache = 1;
+   req->page_needs_release = false;
fuse_read_fill(req, file, pos, count, FUSE_READ);
fuse_account_request(fc, count);
req->misc.read.attr_ver = fuse_get_attr_version(fc);
@@ -2368,6 +2370,7 @@ static int fuse_prepare_write(struct fuse_conn *fc, struct file *file,
unsigned num_read;
unsigned page_len;
int err;
+   bool killed = false;

if (fuse_file_fail_immediately(file)) {
unlock_page(page);
@@ -2385,12 +2388,14 @@ static int fuse_prepare_write(struct fuse_conn *fc, struct file *file,
}

num_read = __fuse_readpage(file, page, page_len, &err, &req, NULL,
-  NULL);
+  true, &killed);
if (req)
fuse_put_request(fc, req);
if (err) {
-   unlock_page(page);
-   page_cache_release(page);
+   if (!killed) {
+   unlock_page(page);
+   page_cache_release(page);
+   }
} else if (num_read != PAGE_CACHE_SIZE) {
zero_user_segment(page, num_read, PAGE_CACHE_SIZE);
}
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 22eb9c9..fefa8ff 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -330,6 +330,9 @@ struct fuse_req {
/** Request contains pages from page-cache */
unsigned page_cache:1;

+   /** Request pages need page_cache_release() */
+   unsigned page_needs_release:1;
+
/** Request was killed -- pages were released */
unsigned killed:1;

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b63aae2..ddd858c 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -378,6 +378,8 @@ static void fuse_kill_requests(struct fuse_conn *fc, struct inode *inode,
struct page *page = req->pages[i];
SetPageError(page);
unlock_page(page);
+   if (req->page_needs_release)
+   page_cache_release(page);
req->pages[i] = NULL;
}


[Devel] [PATCH] KVM: nVMX: Fix the NMI IDT-vectoring handling

2017-02-14 Thread Denis Plotnikov
From: Wanpeng Li 

Run kvm-unit-tests/eventinj.flat in L1:

Sending NMI to self
After NMI to self
FAIL: NMI

This test scenario checks whether the VMM can handle NMI IDT-vectoring
info correctly.

At the beginning, L2 writes to the LAPIC to send a self-NMI; the EPT page
tables on both L1 and L0 are empty, so:

- L2's memory access generates an EPT violation, which is intercepted by L0.

  The EPT violation vmexit occurs during delivery of this NMI, and the NMI
  info is recorded in vmcs02's IDT-vectoring info.

- L0 walks L1's EPT12, sees that the mapping is invalid, and injects the EPT
  violation into L1.

  The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info
  since it is a nested vmexit.

- L1 receives the EPT violation, then fixes its EPT12.
- L1 executes VMRESUME to resume L2, which generates a vmexit and causes L1 to
  exit to L0.
- L0 emulates the VMRESUME called from L1, then returns to L2.

  L0 merges the requirement of vmcs12's IDT-vectoring info and injects it into
  L2 through vmcs02.

- L2 re-executes the faulting instruction and causes an EPT violation again.
- Since L1's EPT12 is now valid, L0 can fix its EPT02.
- L0 resumes L2.

  The EPT violation vmexit occurs during delivery of this NMI again, and the
  NMI info is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI
  through vmentry event injection since it is caused by EPT02's EPT violation.

However, vmx_inject_nmi() refuses to inject an NMI from IDT-vectoring info if
the vCPU is in guest mode. This patch fixes that by permitting injection of an
NMI from IDT-vectoring info when it is L0's responsibility to inject it into L2.
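
Condensed, the patched vmx_inject_nmi() has the following shape (a sketch
of the diff below, with the vm86 special case omitted; not the verbatim
kernel code):

    static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
    {
        struct vcpu_vmx *vmx = to_vmx(vcpu);

        if (!is_guest_mode(vcpu)) {
            /* soft-vNMI tracking and stats stay limited to L1 */
            ++vcpu->stat.nmi_injections;
            vmx->nmi_known_unmasked = false;
        }
        /* ...but the vmentry event injection itself now also runs in
         * guest mode, so L0 can re-inject an NMI taken from vmcs02's
         * IDT-vectoring info into L2. */
        vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
                     INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
    }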

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Jan Kiszka 
Cc: Bandan Das 
Signed-off-by: Wanpeng Li 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Radim Krčmář 
(cherry picked from commit c5a6d5f7faad8549bb5ff7e3e5792e33933c5b9f)

Fix #PSBM-59729

Signed-off-by: Denis Plotnikov 
---
 arch/x86/kvm/vmx.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a193d23..18e761f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4997,29 +4997,30 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-   if (is_guest_mode(vcpu))
-   return;
+   if (!is_guest_mode(vcpu)) {
+   if (!cpu_has_virtual_nmis()) {
+   /*
+* Tracking the NMI-blocked state in software is built upon
+* finding the next open IRQ window. This, in turn, depends on
+* well-behaving guests: They have to keep IRQs disabled at
+* least as long as the NMI handler runs. Otherwise we may
+* cause NMI nesting, maybe breaking the guest. But as this is
+* highly unlikely, we can live with the residual risk.
+*/
+   vmx->soft_vnmi_blocked = 1;
+   vmx->vnmi_blocked_time = 0;
+   }
 
-   if (!cpu_has_virtual_nmis()) {
-   /*
-* Tracking the NMI-blocked state in software is built upon
-* finding the next open IRQ window. This, in turn, depends on
-* well-behaving guests: They have to keep IRQs disabled at
-* least as long as the NMI handler runs. Otherwise we may
-* cause NMI nesting, maybe breaking the guest. But as this is
-* highly unlikely, we can live with the residual risk.
-*/
-   vmx->soft_vnmi_blocked = 1;
-   vmx->vnmi_blocked_time = 0;
+   ++vcpu->stat.nmi_injections;
+   vmx->nmi_known_unmasked = false;
}
 
-   ++vcpu->stat.nmi_injections;
-   vmx->nmi_known_unmasked = false;
if (vmx->rmode.vm86_active) {
        if (kvm_inject_realmode_interrupt(vcpu, NMI_VECTOR, 0) != EMULATE_DONE)
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
return;
}
+
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
 }
-- 
2.10.1.352.g0cf3611



Re: [Devel] [vzlin-dev] [PATCH vz7] fuse: fuse_prepare_write() cannot handle page from killed request

2017-02-14 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> After fuse_prepare_write() called __fuse_readpage(file, page, ...),
> the page might be already unlocked by fuse_kill_requests():
>
>>  for (i = 0; i < req->num_pages; i++) {
>>  struct page *page = req->pages[i];
>>  SetPageError(page);
>>  unlock_page(page);
ACK.
>
> so it is incorrect to touch it at all. The problem can easily be
> fixed the same way as it was done in fuse_readpage(): by checking the
> "killed" flag.
>
> Another minor complication is that there are three different use-cases
> for that snippet from fuse_kill_requests() above: fuse_readpages(),
> fuse_readpage() and fuse_prepare_write(). Among them only the latter
> needs explicit page_cache_release() call. That's why the patch introduces
> ad-hoc request flag "page_needs_release".
>
> https://jira.sw.ru/browse/PSBM-54547
> Signed-off-by: Maxim Patlasov 
> ---
>  fs/fuse/file.c   |   15 ++-
>  fs/fuse/fuse_i.h |3 +++
>  fs/fuse/inode.c  |2 ++
>  3 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index a514748..41ed6f0 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1008,7 +1008,7 @@ static void fuse_short_read(struct fuse_req *req, struct inode *inode,
>  
>  static int __fuse_readpage(struct file *file, struct page *page, size_t count,
>  int *err, struct fuse_req **req_pp, u64 *attr_ver_p,
> -bool *killed_p)
> +bool page_needs_release, bool *killed_p)
>  {
>   struct fuse_io_priv io = { .async = 0, .file = file };
>   struct inode *inode = page->mapping->host;
> @@ -1040,6 +1040,7 @@ static int __fuse_readpage(struct file *file, struct page *page, size_t count,
>   req->pages[0] = page;
>   req->page_descs[0].length = count;
>   req->page_cache = 1;
> + req->page_needs_release = page_needs_release;
>  
>   num_read = fuse_send_read(req, &io, page_offset(page), count, NULL);
>   killed = req->killed;
> @@ -1071,7 +1072,7 @@ static int fuse_readpage(struct file *file, struct page *page)
>   goto out;
>  
>   num_read = __fuse_readpage(file, page, count, &err, &req, &attr_ver,
> -&killed);
> +false, &killed);
>   if (!err) {
>   /*
>* Short read means EOF.  If file size is larger, truncate it
> @@ -1153,6 +1154,7 @@ static void fuse_send_readpages(struct fuse_req *req, struct file *file)
>   req->out.page_zeroing = 1;
>   req->out.page_replace = 1;
>   req->page_cache = 1;
> + req->page_needs_release = false;
>   fuse_read_fill(req, file, pos, count, FUSE_READ);
>   fuse_account_request(fc, count);
>   req->misc.read.attr_ver = fuse_get_attr_version(fc);
> @@ -2368,6 +2370,7 @@ static int fuse_prepare_write(struct fuse_conn *fc, struct file *file,
>   unsigned num_read;
>   unsigned page_len;
>   int err;
> + bool killed = false;
>  
>   if (fuse_file_fail_immediately(file)) {
>   unlock_page(page);
> @@ -2385,12 +2388,14 @@ static int fuse_prepare_write(struct fuse_conn *fc, struct file *file,
>   }
>  
>   num_read = __fuse_readpage(file, page, page_len, &err, &req, NULL,
> -NULL);
> +true, &killed);
>   if (req)
>   fuse_put_request(fc, req);
>   if (err) {
> - unlock_page(page);
> - page_cache_release(page);
> + if (!killed) {
> + unlock_page(page);
> + page_cache_release(page);
> + }
>   } else if (num_read != PAGE_CACHE_SIZE) {
>   zero_user_segment(page, num_read, PAGE_CACHE_SIZE);
>   }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 22eb9c9..fefa8ff 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -330,6 +330,9 @@ struct fuse_req {
>   /** Request contains pages from page-cache */
>   unsigned page_cache:1;
>  
> + /** Request pages need page_cache_release() */
> + unsigned page_needs_release:1;
> +
>   /** Request was killed -- pages were released */
>   unsigned killed:1;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b63aae2..ddd858c 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -378,6 +378,8 @@ static void fuse_kill_requests(struct fuse_conn *fc, struct inode *inode,
>   struct page *page = req->pages[i];
>   SetPageError(page);
>   unlock_page(page);
> + if (req->page_needs_release)
> + page_cache_release(page);
>   req->pages[i] = NULL;
>   }
>  

[Devel] [PATCH libvzctl 2/3] env_nsops: ns_set_memory_param -- Allow to select which param to set

2017-02-14 Thread Cyrill Gorcunov
During the restore procedure we will restore memory limits in several
steps: limiting RAM first and limiting swap at a later stage. Thus
provide a mask to select which parameter to set.
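
For example, a restore path could then split the two limits like this
(hypothetical call sites; the actual wiring lands in the follow-up
restore patch):

    /* restore stage: limit RAM only, leave swap unlimited */
    ret = ns_set_memory_param(h, ub, MPARAM_PHYSPAGES);
    ...
    /* post-restore stage: now apply the swap limit as well */
    ret = ns_set_memory_param(h, ub, MPARAM_SWAPPAGES);

Existing callers keep the old behaviour by passing MPARAM_ALL.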

Signed-off-by: Cyrill Gorcunov 
---
 lib/env_nsops.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/env_nsops.c b/lib/env_nsops.c
index 4650506..a5f1ec4 100644
--- a/lib/env_nsops.c
+++ b/lib/env_nsops.c
@@ -278,7 +278,13 @@ static int ns_set_ub(struct vzctl_env_handle *h,
 
 #define PAGE_COUNTER_MAX ((unsigned long)LONG_MAX)
 
-static int ns_set_memory_param(struct vzctl_env_handle *h, struct vzctl_ub_param *ub)
+#define MPARAM_PHYSPAGES    1
+#define MPARAM_SWAPPAGES    2
+#define MPARAM_ALL          (MPARAM_PHYSPAGES | MPARAM_SWAPPAGES)
+
+static int ns_set_memory_param(struct vzctl_env_handle *h,
+  struct vzctl_ub_param *ub,
+  int mask)
 {
int ret = 0;
int pagesize = get_pagesize();
@@ -300,7 +306,7 @@ static int ns_set_memory_param(struct vzctl_env_handle *h, struct vzctl_ub_param
x += ub->physpages ? (float)pagesize * ub->physpages->l : cur_mem;
new_ms = x > PAGE_COUNTER_MAX ? PAGE_COUNTER_MAX : (unsigned long) x;
 
-   if (ub->physpages) {
+   if (ub->physpages && (mask & MPARAM_PHYSPAGES)) {
x = (float)pagesize * ub->physpages->l;
new_mem = x > PAGE_COUNTER_MAX ? PAGE_COUNTER_MAX : (unsigned long) x;
 
@@ -321,7 +327,8 @@ static int ns_set_memory_param(struct vzctl_env_handle *h, struct vzctl_ub_param

}
 
-   return cg_env_set_memory(h->ctid, CG_SWAP_LIMIT, new_ms);
+   return (mask & MPARAM_SWAPPAGES) ?
+       cg_env_set_memory(h->ctid, CG_SWAP_LIMIT, new_ms) : 0;
 }
 
 static int ns_apply_res_param(struct vzctl_env_handle *h,
@@ -346,7 +353,7 @@ static int ns_apply_res_param(struct vzctl_env_handle *h,
 * unlimited memory resources until 
 * configuration was activated by vcmmd
 */
-   ret = ns_set_memory_param(h, ub);
+   ret = ns_set_memory_param(h, ub, MPARAM_ALL);
if (ret)
goto err;
ret = vcmm_register(h, env);
@@ -357,7 +364,7 @@ static int ns_apply_res_param(struct vzctl_env_handle *h,
env->res->memguar = NULL;
}
} else
-   ret = ns_set_memory_param(h, ub);
+   ret = ns_set_memory_param(h, ub, MPARAM_ALL);
 
if (env->res->ub->pagecache_isolation) {
ret = cg_env_set_memory(EID(h), "memory.disable_cleancache",
-- 
2.7.4



[Devel] [PATCH libvzctl 1/3] cgroup: Allow to attach to specific cgroup only

2017-02-14 Thread Cyrill Gorcunov
Signed-off-by: Cyrill Gorcunov 
---
 lib/cgroup.c| 4 +++-
 lib/cgroup.h| 2 +-
 lib/env_nsops.c | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/cgroup.c b/lib/cgroup.c
index c101a64..806cab2 100644
--- a/lib/cgroup.c
+++ b/lib/cgroup.c
@@ -618,11 +618,13 @@ int cg_disable_pseudosuper(const int pseudosuper_fd)
return do_write_data(pseudosuper_fd, NULL, "0", 1);
 }
 
-int cg_attach_task(const char *ctid, pid_t pid)
+int cg_attach_task(const char *ctid, pid_t pid, char *cg_subsys)
 {
int ret, i;
 
for (i = 0; i < sizeof(cg_ctl_map)/sizeof(cg_ctl_map[0]); i++) {
+   if (cg_subsys && strcmp(cg_ctl_map[i].subsys, cg_subsys))
+   continue;
ret = cg_set_ul(ctid, cg_ctl_map[i].subsys, "tasks", pid);
if (ret == -1)
return -1;
diff --git a/lib/cgroup.h b/lib/cgroup.h
index 9d4508d..f815e1e 100644
--- a/lib/cgroup.h
+++ b/lib/cgroup.h
@@ -51,7 +51,7 @@ int cg_destroy_cgroup(const char *ctid);
 int cg_enable_pseudosuper(const char *ctid);
 int cg_pseudosuper_open(const char *ctid, int *fd);
 int cg_disable_pseudosuper(const int pseudosuper_fd);
-int cg_attach_task(const char *ctid, pid_t pid);
+int cg_attach_task(const char *ctid, pid_t pid, char *cg_subsys);
 int cg_set_param(const char *ctid, const char *subsys, const char *name, const char *data);
 int cg_get_param(const char *ctid, const char *subsys, const char *name, char *out, int size);
 int cg_get_ul(const char *ctid, const char *subsys, const char *name,
diff --git a/lib/env_nsops.c b/lib/env_nsops.c
index 20d1acb..4650506 100644
--- a/lib/env_nsops.c
+++ b/lib/env_nsops.c
@@ -705,7 +705,7 @@ static int do_env_create(struct vzctl_env_handle *h, struct start_param *param)
 * children into appropriate cgroups.
 */
if (!param->fn) {
-   ret = cg_attach_task(h->ctid, getpid());
+   ret = cg_attach_task(h->ctid, getpid(), NULL);
if (ret)
goto err;
}
@@ -866,7 +866,7 @@ static int ns_env_enter(struct vzctl_env_handle *h, int flags)
if (dp == NULL)
return vzctl_err(-1, errno, "Unable to open dir %s", path);
 
-   ret = cg_attach_task(EID(h), getpid());
+   ret = cg_attach_task(EID(h), getpid(), NULL);
if (ret)
goto err;
 
-- 
2.7.4
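
With the new argument a task can be attached to a single controller
instead of all of them, e.g. (hypothetical usage; the subsystem name
must match an entry in cg_ctl_map):

    /* attach to every controller, as before */
    cg_attach_task(EID(h), getpid(), NULL);

    /* attach to one specific controller only */
    cg_attach_task(EID(h), getpid(), "memory");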



[Devel] [PATCH libvzctl 3/3] Rework setting up memory limits on restore

2017-02-14 Thread Cyrill Gorcunov
Currently we restore memory limits in the post-resume stage, when
applications are already executing after the restore stage, which in
the worst case may cause the container to stall the restoration
procedure.

Instead, let's use another strategy (sketched after the list below):

 - start restore with ram limited but unlimited swap
 - on post-restore stage setup swap limit back
 - kick containers to run
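
In terms of the MPARAM_* masks from the previous patch, the intended
flow is roughly (a sketch under the assumption that the restore start
passes only the RAM mask; the exact call sites are in the diff below):

    /* restore stage: cap RAM, keep swap unlimited */
    ns_apply_res_param(h, env, MPARAM_PHYSPAGES);
    ...
    /* VZCTL_CPT_POST_RESTORE stage: re-apply the swap limit */
    ns_apply_res_param(h, env, MPARAM_SWAPPAGES);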

https://jira.sw.ru/browse/PSBM-58742

Signed-off-by: Cyrill Gorcunov 
---
 lib/env.c|  22 +
 lib/env.h|   2 +-
 lib/env_nsops.c  | 116 +--
 scripts/vz-rst-action.in |  21 ++---
 4 files changed, 88 insertions(+), 73 deletions(-)

diff --git a/lib/env.c b/lib/env.c
index f804ca4..a412b65 100644
--- a/lib/env.c
+++ b/lib/env.c
@@ -1218,27 +1218,29 @@ int vzctl2_env_restore(struct vzctl_env_handle *h,
goto err;
 
 
-   logger(10, 0, "* Wait for post-resume");
+   logger(10, 0, "* Wait for post-restore action script");
ret = read_p(h->ctx->status_p[0]);
if (ret) {
-   logger(-1, 0, "Error %d reported from post-resume", ret);
+   logger(-1, 0, "Error %d reported from post-restore action script", ret);
goto err;
}
-   logger(10, 0, "* Continue post-resume");
+
+   if (!(flags & VZCTL_SKIP_SETUP)) {
+   logger(10, 0, "* Setting up parameters on container");
+   ret = vzctl2_apply_param(h, env, VZCTL_CPT_POST_RESTORE);
+   if (ret)
+   goto err;
+   }
+
+   logger(10, 0, "* Continue post-restore action script");
if (write(h->ctx->wait_p[1], &ret, sizeof(ret)) == -1)
ret = vzctl_err(VZCTL_E_SYSTEM, errno, "Unable to write to the"
-   " wait fd when post-resume the Container");
+   " wait fd when post-restore the Container");
if (ret)
goto err;
 
close(h->ctx->wait_p[1]); h->ctx->wait_p[1] = -1;
 
-   if (!(flags & VZCTL_SKIP_SETUP)) {
-   ret = vzctl2_apply_param(h, env, VZCTL_CPT_POST_RESUME);
-   if (ret)
-   goto err;
-   }
-
h->ctx->state = 0;
 
logger(10, 0, "* Wait on error pipe");
diff --git a/lib/env.h b/lib/env.h
index d56205d..5240c2d 100644
--- a/lib/env.h
+++ b/lib/env.h
@@ -60,7 +60,7 @@
 enum {
VZCTL_CONF_PARAM= 0x2,
VZCTL_CONF_QUIET= 0x4,
-   VZCTL_CPT_POST_RESUME   = 0x8,
+   VZCTL_CPT_POST_RESTORE  = 0x8,
 };
 
 struct vzctl_opts {
diff --git a/lib/env_nsops.c b/lib/env_nsops.c
index a5f1ec4..ad9ae73 100644
--- a/lib/env_nsops.c
+++ b/lib/env_nsops.c
@@ -331,53 +331,66 @@ static int ns_set_memory_param(struct vzctl_env_handle *h,
cg_env_set_memory(h->ctid, CG_SWAP_LIMIT, new_ms) : 0;
 }
 
+static int ns_apply_memory_param(struct vzctl_env_handle *h,
+        struct vzctl_env_param *env,
+        struct vzctl_ub_param *ub,
+        int mparam_mask)
+{
+    int ret;
+
+    if (is_managed_by_vcmmd()) {
+        if (h->ctx->state == VZCTL_STATE_STARTING) {
+            /* apply parameters to avoid running with
+             * unlimited memory resources until
+             * configuration was activated by vcmmd
+             */
+            ret = ns_set_memory_param(h, ub, mparam_mask);
+            if (!ret)
+                ret = vcmm_register(h, env);
+        } else
+            ret = vcmm_update(h, env);
+        if (ret) {
+            free(env->res->memguar);
+            env->res->memguar = NULL;
+        }
+    } else
+        ret = ns_set_memory_param(h, ub, mparam_mask);
+    return ret;
+}
+
 static int ns_apply_res_param(struct vzctl_env_handle *h,
-        struct vzctl_env_param *env)
+        struct vzctl_env_param *env,
+        int mparam_mask)
 {
-    int ret;
-    struct vzctl_ub_param *ub;
+    struct vzctl_ub_param *ub;
+    int ret;
 
-    ret = get_vswap_param(h, env, &ub);
-    if (ret)
-        return ret;
+    ret = get_vswap_param(h, env, &ub);
+    if (ret)
+        return ret;
 
-    if (is_vz_kernel()) {
-        ret = ns_set_ub(h, ub);
-        if (ret)
-            goto err;
-    }
+    if (is_vz_kernel()) {
+        ret = ns_set_ub(h, ub);
+        if (ret)
+            goto err;
+    }
 
-   if (is_managed_by_vcmmd()) {
-   if (h->ctx->state == VZCTL_STATE_STARTING) {
-   /* apply parameters to avoid running with
-* unlimited memory resources until 
-  

[Devel] [PATCH libvzctl 0/3] Rework setting up memory limits on restore

2017-02-14 Thread Cyrill Gorcunov
Igor, please ping me in PM once you're ready to merge the series: I'll
rebuild a new criu first with the needed changes, so libvzctl will
require the new criu to exist before the install.

Cyrill Gorcunov (3):
  cgroup: Allow to attach to specific cgroup only
  env_nsops: ns_set_memory_param -- Allow to select which param to set
  Rework setting up memory limits on restore

 lib/cgroup.c |   4 +-
 lib/cgroup.h |   2 +-
 lib/env.c|  22 
 lib/env.h|   2 +-
 lib/env_nsops.c  | 133 ++-
 scripts/vz-rst-action.in |  21 ++--
 6 files changed, 104 insertions(+), 80 deletions(-)

-- 
2.7.4
