[Devel] [PATCH rh7] timers should not get negative argument

2016-06-17 Thread Cyrill Gorcunov
From: Vasily Averin 

This patch fixes a 25-second delay on login into systemd-based containers.

A userspace application can set a timer for a time in the past
and expect the timer to expire immediately.

This may not work as expected inside migrated containers.
The translated argument passed to the timer can become negative,
and the corresponding timer will then sleep for a very long time.

https://jira.sw.ru/browse/PSBM-48475

CC: Vladimir Davydov 
CC: Konstantin Khorenko 
Signed-off-by: Vasily Averin 
Acked-by: Cyrill Gorcunov 
---
 kernel/posix-timers.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-pcs7.git/kernel/posix-timers.c
===
--- linux-pcs7.git.orig/kernel/posix-timers.c
+++ linux-pcs7.git/kernel/posix-timers.c
@@ -133,6 +133,8 @@ static struct k_clock posix_clocks[MAX_C
 (which_clock) == CLOCK_MONOTONIC_COARSE)
 
 #ifdef CONFIG_VE
+static struct timespec zero_time;
+
 void monotonic_abs_to_ve(clockid_t which_clock, struct timespec *tp)
 {
 	struct ve_struct *ve = get_exec_env();
@@ -151,6 +153,10 @@ void monotonic_ve_to_abs(clockid_t which
 		set_normalized_timespec(tp,
 				tp->tv_sec + ve->start_timespec.tv_sec,
 				tp->tv_nsec + ve->start_timespec.tv_nsec);
+	if (timespec_compare(tp, &zero_time) <= 0) {
+		tp->tv_sec =  0;
+		tp->tv_nsec = 1;
+	}
 }
 #endif
 
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
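
The clamping done by the patch above can be tried out in userspace. The following is a minimal sketch, not the kernel code: ts_add_normalized() and ve_to_abs_clamped() are hypothetical names, and the container start offset is assumed to be possibly negative after migration. It shows that a container-relative time which translates to zero or to a negative value is replaced by 1 ns, so a timer armed with it expires almost immediately.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: add an offset to a time and normalize the result
 * so that 0 <= tv_nsec < NSEC_PER_SEC (tv_sec may end up negative). */
static struct timespec ts_add_normalized(struct timespec t, struct timespec off)
{
	struct timespec r;
	long long sec = (long long)t.tv_sec + off.tv_sec;
	long long nsec = (long long)t.tv_nsec + off.tv_nsec;

	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		sec++;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		sec--;
	}
	r.tv_sec = (time_t)sec;
	r.tv_nsec = (long)nsec;
	assert(r.tv_nsec >= 0 && r.tv_nsec < NSEC_PER_SEC);
	return r;
}

/* Translate a container-relative time to an absolute one and clamp:
 * a result that is zero or negative becomes 1 ns. */
static struct timespec ve_to_abs_clamped(struct timespec t, struct timespec start)
{
	struct timespec r = ts_add_normalized(t, start);

	/* After normalization, r <= 0 means tv_sec < 0 or r is exactly zero. */
	if (r.tv_sec < 0 || (r.tv_sec == 0 && r.tv_nsec == 0)) {
		r.tv_sec = 0;
		r.tv_nsec = 1;
	}
	return r;
}
```

Clamping to 1 ns rather than to 0 keeps the value non-zero, which matters wherever a zero time is given a special meaning (for timer_settime(), a zero it_value disarms the timer).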


[Devel] test - pls ignore

2016-06-17 Thread Vladimir Davydov



[Devel] [NEW KERNEL] 3.10.0-327.18.2.vz7.14.16 (rhel7)

2016-06-17 Thread builder
Changelog:

OpenVZ kernel rh7-3.10.0-327.18.2.vz7.14.16

* sysctl: dropped unused fs.ve-area-access-check, net.ipv4.tcp_max_tw*
* device_cgroup: allow managing devices from inside a Container in
  @pseudosuper state without the usual Container constraints.
  This is used at the CRIU restore stage.
* device_cgroup: allow changing device mount permission via cgroup.
  Previously this was possible only via ioctl on a running Container,
  which is inconvenient at the CRIU restore stage.
* device_cgroup: kill ACC_QUOTA permission. Not needed anymore.
* ploop: fixed a couple of bugs introduced during rebase from 2.6.32-x
* netlink: added possibility to dump and restore netlink sockets with
  data in receive queue (in case of no ongoing callback execution)


Generated changelog:

* Fri Jun 17 2016 Konstantin Khorenko [3.10.0-327.18.2.vz7.14.16]
- netlink/diag: report flags for netlink sockets (Andrey Vagin) [PSBM-28386]
- netlink: add an ability to restore messages in a receive queue (Andrey Vagin) [PSBM-28386]
- netlink: allow to set peeking offset for sockets (Andrey Vagin) [PSBM-28386 PSBM-48484 PSBM-28386]
- ploop: io_kaio: fix silly bug in kaio_complete_io_state() (Maxim Patlasov)
- ploop: fix counting bio_qlen (Maxim Patlasov)
- ve/device_cgroup: kill ACC_QUOTA permission (Andrey Ryabinin) [PSBM-48482]
- ve/device_cgroup: allow to change device mount permission via cgroup (Andrey Ryabinin) [PSBM-48431]
- ve/security: device_cgroup -- Allow manage devices in @pseudosuper state (Cyrill Gorcunov) [PSBM-48421]
- ve/sysctl: remove unused fs.ve-area-access-check, net.ipv4.tcp_max_tw* (Pavel Tikhomirov) [PSBM-47061]


Built packages: 
http://kojistorage.eng.sw.ru/packages/vzkernel/3.10.0/327.18.2.vz7.14.16/


[Devel] [PATCH rh7] mm: memcontrol: fix race between kmem uncharge and charge reparenting

2016-06-17 Thread Vladimir Davydov
When a cgroup is destroyed, all user memory pages get recharged to the
parent cgroup. Recharging is done by mem_cgroup_reparent_charges, which
keeps looping until res <= kmem. This is supposed to guarantee that by
the time the cgroup gets released, no pages are charged to it. However,
the guarantee might be violated if mem_cgroup_reparent_charges races
with a kmem charge or uncharge.

Currently, kmem is charged before res and uncharged after. As a result,
kmem might become greater than res for a short period of time even if
there are still user memory pages charged to the cgroup. In this case
mem_cgroup_reparent_charges will give up prematurely, and the cgroup
might be released though there are still pages charged to it. Uncharging
such a page will trigger a kernel panic:

  general protection fault:  [#1] SMP
  CPU: 0 PID: 972445 Comm: httpd ve: 0 Tainted: G   OE     3.10.0-427.10.1.lve1.4.9.el7.x86_64 #1 12.14
  task: 88065d53d8d0 ti: 880224f34000 task.ti: 880224f34000
  RIP: 0010:[]  [] mem_cgroup_charge_statistics.isra.16+0x13/0x60
  RSP: 0018:880224f37a80  EFLAGS: 00010202
  RAX:  RBX: 8807b26f0110 RCX: 
  RDX: 79726f6765746163 RSI: ea000c9c0440 RDI: 8806a55662f8
  RBP: 880224f37a80 R08:  R09: 03808000
  R10: 00b8 R11: ea001eaa8980 R12: ea000c9c0440
  R13: 0001 R14:  R15: 8806a5566000
  FS:  () GS:8807d400() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f54289bd74c CR3: 0006638b1000 CR4: 06f0
  DR0:  DR1:  DR2: 
  DR3:  DR6: 0ff0 DR7: 0400
  Stack:
   880224f37ac0 811e9ddf 88060001 ea000c9c0440
   0001 037d1000 880224f37c78 0380
   880224f37ad0 811ee99a 880224f37b08 811b9ec9
  Call Trace:
   [] __mem_cgroup_uncharge_common+0xcf/0x320
   [] mem_cgroup_uncharge_page+0x2a/0x30
   [] page_remove_rmap+0xb9/0x160
   [] ? res_counter_uncharge+0x13/0x20
   [] unmap_page_range+0x460/0x870
   [] unmap_single_vma+0x81/0xf0
   [] unmap_vmas+0x49/0x90
   [] exit_mmap+0xac/0x1a0
   [] mmput+0x6b/0x140
   [] flush_old_exec+0x467/0x8d0
   [] load_elf_binary+0x33c/0xde0
   [] ? get_user_pages+0x52/0x60
   [] ? load_elf_library+0x220/0x220
   [] search_binary_handler+0xd5/0x300
   [] do_execve_common.isra.26+0x657/0x720
   [] SyS_execve+0x29/0x30
   [] stub_execve+0x69/0xa0

To prevent this from happening, let's always charge kmem after res and
uncharge before res.

https://bugs.openvz.org/browse/OVZ-6756

Reported-by: Anatoly Stepanov 
Signed-off-by: Vladimir Davydov 
---
 mm/memcontrol.c | 44 
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1c3fbb2d2c48..de7c36295515 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3163,10 +3163,6 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t 
gfp, u64 size)
int ret = 0;
bool may_oom;
 
-   ret = res_counter_charge(&memcg->kmem, size, &fail_res);
-   if (ret)
-   return ret;
-
/*
 * Conditions under which we can wait for the oom_killer. Those are
 * the same conditions tested by the core page allocator
@@ -3198,8 +3194,33 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t 
gfp, u64 size)
res_counter_charge_nofail(&memcg->memsw, size,
  &fail_res);
ret = 0;
-   } else if (ret)
-   res_counter_uncharge(&memcg->kmem, size);
+   }
+
+   if (ret)
+   return ret;
+
+   /*
+* When a cgroup is destroyed, all user memory pages get recharged to
+* the parent cgroup. Recharging is done by mem_cgroup_reparent_charges
+* which keeps looping until res <= kmem. This is supposed to guarantee
+* that by the time cgroup gets released, no pages is charged to it.
+*
+* If kmem were charged before res or uncharged after, kmem might
+* become greater than res for a short period of time even if there
+* were still user memory pages charged to the cgroup. In this case
+* mem_cgroup_reparent_charges would give up prematurely, and the
+* cgroup could be released though there were still pages charged to
+* it. Uncharge of such a page would trigger kernel panic.
+*
+* To prevent this from happening, kmem must be charged after res and
+* uncharged before res.
+*/
+   ret = res_counter_charge(&memcg->kmem, size, &fail_res);
+   if (ret) {
+   res_counter_uncharge(&memcg->res, size);
+   if (do_swap_account)
+   res_counter_u
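
The ordering rule from this patch (charge kmem after res, uncharge kmem before res) can be illustrated outside the kernel. What follows is a minimal single-threaded sketch under stated assumptions: struct counters, charge_kmem() and uncharge_kmem() are hypothetical stand-ins for the two res_counters and their callers, and the limits are plain fields. It cannot reproduce the race itself; it only shows that with this ordering kmem <= res holds after every individual step, so a concurrent observer looping until res <= kmem would never see kmem exceed res while user pages are still charged.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters of a memory cgroup. */
struct counters {
	uint64_t res;        /* user + kernel memory */
	uint64_t kmem;       /* kernel memory only */
	uint64_t res_limit;
	uint64_t kmem_limit;
};

/* Charge kernel memory: res first, kmem second. */
static int charge_kmem(struct counters *c, uint64_t size)
{
	if (c->res + size > c->res_limit)
		return -ENOMEM;
	c->res += size;
	assert(c->kmem <= c->res);      /* holds between the two steps */

	if (c->kmem + size > c->kmem_limit) {
		c->res -= size;         /* roll back the res charge */
		return -ENOMEM;
	}
	c->kmem += size;
	assert(c->kmem <= c->res);
	return 0;
}

/* Uncharge kernel memory: kmem first, res second. */
static void uncharge_kmem(struct counters *c, uint64_t size)
{
	assert(size <= c->kmem);
	c->kmem -= size;
	assert(c->kmem <= c->res);      /* holds between the two steps */
	c->res -= size;
	assert(c->kmem <= c->res);
}
```

With the opposite order (kmem first on charge, res first on uncharge) the intermediate assertion would be the one at risk: kmem would grow, or res would shrink, before the other counter caught up.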

Re: [Devel] [PATCH rh7] mm: memcontrol: fix race between kmem uncharge and charge reparenting

2016-06-17 Thread Konstantin Khorenko

Kirill, please review.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 06/17/2016 01:35 PM, Vladimir Davydov wrote:

[...]

[Devel] [PATCH RHEL7 COMMIT] ploop: io_kaio: fix silly bug in kaio_complete_io_state()

2016-06-17 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.15
-->
commit 75347fcbd30151977601d819a56f0a0bb57182f5
Author: Maxim Patlasov 
Date:   Fri Jun 17 13:32:34 2016 +0400

ploop: io_kaio: fix silly bug in kaio_complete_io_state()

It's useless to check for preq->req_rw & REQ_FUA after:
preq->req_rw &= ~REQ_FUA;

Signed-off-by: Maxim Patlasov 
Acked-by: Dmitry Monakhov 

Note: original code:
...
  preq->req_rw &= ~REQ_FUA;

/* Convert requested fua to fsync */
   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state) ||
   test_and_clear_bit(PLOOP_REQ_KAIO_FSYNC,
   &preq->state))
   post_fsync = 1;

if (!post_fsync &&
!ploop_req_delay_fua_possible(preq->req_rw, preq) &&
(preq->req_rw & REQ_FUA))
post_fsync = 1;

preq->req_rw &= ~REQ_FUA;
...
---
 drivers/block/ploop/io_kaio.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
index 54f8e21..81da1c5 100644
--- a/drivers/block/ploop/io_kaio.c
+++ b/drivers/block/ploop/io_kaio.c
@@ -78,8 +78,6 @@ static void kaio_complete_io_state(struct ploop_request * 
preq)
return;
}
 
-   preq->req_rw &= ~REQ_FUA;
-
/* Convert requested fua to fsync */
if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state) ||
test_and_clear_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state))
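
Why the removed line made the later check useless can be seen in isolation. This is a minimal sketch, not the ploop code: SKETCH_FUA is a hypothetical bit standing in for REQ_FUA, and the other conditions of the original check (PLOOP_REQ_FORCE_FUA, ploop_req_delay_fua_possible()) are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_FUA (1u << 3)	/* hypothetical bit, stands in for REQ_FUA */

/* Old order: the flag is cleared first, so the test below can never
 * be true and the fsync request is silently lost. */
static bool needs_fsync_old(unsigned int *rw)
{
	*rw &= ~SKETCH_FUA;
	return (*rw & SKETCH_FUA) != 0;
}

/* Fixed order: test the flag first, clear it afterwards. */
static bool needs_fsync_fixed(unsigned int *rw)
{
	bool fua = (*rw & SKETCH_FUA) != 0;

	*rw &= ~SKETCH_FUA;
	assert(!(*rw & SKETCH_FUA));
	return fua;
}
```

Both variants leave the flag cleared in *rw; only the fixed one still reports that it had been set.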


[Devel] [PATCH RHEL7 COMMIT] ploop: fix counting bio_qlen

2016-06-17 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.15
-->
commit 6cf1b457fb7252f4d2ada14f8cff0d3b91c26b5d
Author: Maxim Patlasov 
Date:   Fri Jun 17 13:32:34 2016 +0400

ploop: fix counting bio_qlen

The commit ec1eeb868 (May 22 2015) ported "separate queue for discard bio"
patch from RHEL6-based kernel incorrectly. Original patch stated clearly
that if we want to decrement bio_discard_qlen, bio_qlen must not change:

@@ -500,7 +502,7 @@ ploop_bio_queue(struct ploop_device * pl
(err = ploop_discard_add_bio(plo->fbd, bio))) {
BIO_ENDIO(bio, err);
list_add(&preq->list, &plo->free_list);
-   plo->bio_qlen--;
+   plo->bio_discard_qlen--;
plo->bio_total--;
return;
}

but that port did the opposite:

@@ -521,6 +523,7 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * 
bio,
BIO_ENDIO(plo->queue, bio, err);
list_add(&preq->list, &plo->free_list);
plo->bio_qlen--;
+   plo->bio_discard_qlen--;
plo->bio_total--;
return;
}

Signed-off-by: Maxim Patlasov 
Acked-by: Dmitry Monakhov 
---
 drivers/block/ploop/dev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 01a5189..2ef1449 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -530,7 +530,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
}
BIO_ENDIO(plo->queue, bio, err);
list_add(&preq->list, &plo->free_list);
-   plo->bio_qlen--;
plo->bio_discard_qlen--;
plo->bio_total--;
return;
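
The counting problem fixed above can be shown with three plain counters. This is a minimal sketch under an assumption that is not stated in the patch: that queueing a discard bio increments bio_discard_qlen and bio_total but not bio_qlen. struct qstats and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical subset of the per-device queue counters. */
struct qstats {
	int bio_qlen;		/* queued regular bios */
	int bio_discard_qlen;	/* queued discard bios */
	int bio_total;		/* all queued bios */
};

/* Assumed accounting when a discard bio is queued. */
static void account_discard(struct qstats *q)
{
	q->bio_discard_qlen++;
	q->bio_total++;
}

/* Error path as it was after the incorrect port: bio_qlen is
 * decremented although it was never incremented for this bio. */
static void unaccount_discard_buggy(struct qstats *q)
{
	assert(q->bio_discard_qlen > 0);
	q->bio_qlen--;
	q->bio_discard_qlen--;
	q->bio_total--;
}

/* Error path after the fix: only the counters touched on queueing. */
static void unaccount_discard_fixed(struct qstats *q)
{
	assert(q->bio_discard_qlen > 0);
	q->bio_discard_qlen--;
	q->bio_total--;
}
```

Under that assumption, one failed discard leaves bio_qlen at -1 with the buggy path and at 0 with the fixed one.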


[Devel] [PATCH RHEL7 COMMIT] ve/device_cgroup: kill ACC_QUOTA permission

2016-06-17 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.15
-->
commit 32eb6c887fd633b840453f9011a62d8253ef689c
Author: Andrey Ryabinin 
Date:   Fri Jun 17 13:26:15 2016 +0400

ve/device_cgroup: kill ACC_QUOTA permission

This is a leftover from PCS6. Currently this code does absolutely
nothing, so let's remove it.

https://jira.sw.ru/browse/PSBM-48482

Signed-off-by: Andrey Ryabinin 

khorenko@: keep MAY_QUOTACTL and ACC_QUOTA defines with
comment about deprecation.
---
 include/linux/fs.h   |  2 +-
 security/device_cgroup.c | 14 +++---
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index b035f62..7203dba 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -77,7 +77,7 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* called from RCU mode, don't block */
 #define MAY_NOT_BLOCK  0x0080
 /* for devgroup-vs-openvz only */
-#define MAY_QUOTACTL   0x0001
+#define MAY_QUOTACTL   0x0001  /* deprecated */
 #define MAY_MOUNT  0x0002
 
 /*
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index fc14cdc..8e77d78 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -22,10 +22,10 @@
 #define ACC_MKNOD 1
 #define ACC_READ  2
 #define ACC_WRITE 4
-#define ACC_QUOTA 8
+#define ACC_QUOTA 8/* deprecated */
 #define ACC_HIDDEN 16
 #define ACC_MOUNT 64
-#define ACC_MASK (ACC_MKNOD | ACC_READ | ACC_WRITE | ACC_QUOTA | ACC_MOUNT)
+#define ACC_MASK (ACC_MKNOD | ACC_READ | ACC_WRITE | ACC_MOUNT)
 
 #define DEV_BLOCK 1
 #define DEV_CHAR  2
@@ -941,8 +941,6 @@ int __devcgroup_inode_permission(struct inode *inode, int 
mask)
access |= ACC_WRITE;
if (mask & MAY_READ)
access |= ACC_READ;
-   if (mask & MAY_QUOTACTL)
-   access |= ACC_QUOTA;
if (mask & MAY_MOUNT)
access |= ACC_MOUNT;
 
@@ -962,8 +960,6 @@ int devcgroup_device_permission(umode_t mode, dev_t dev, 
int mask)
access |= ACC_WRITE;
if (mask & MAY_READ)
access |= ACC_READ;
-   if (mask & MAY_QUOTACTL)
-   access |= ACC_QUOTA;
 
return __devcgroup_check_permission(type, MAJOR(dev), MINOR(dev), 
access);
 }
@@ -972,7 +968,7 @@ int devcgroup_device_visible(umode_t mode, int major, int 
start_minor, int nr_mi
 {
struct dev_cgroup *dev_cgroup;
struct dev_exception_item *ex;
-   short access = ACC_READ | ACC_WRITE | ACC_QUOTA;
+   short access = ACC_READ | ACC_WRITE;
bool match = false;
 
rcu_read_lock();
@@ -1076,8 +1072,6 @@ static unsigned decode_ve_perms(unsigned perm)
mask |= ACC_READ;
if (perm & S_IWOTH)
mask |= ACC_WRITE;
-   if (perm & S_IXGRP)
-   mask |= ACC_QUOTA;
if (perm & S_IXUSR)
mask |= ACC_MOUNT;
 
@@ -1092,8 +1086,6 @@ static unsigned encode_ve_perms(unsigned mask)
perm |= S_IROTH;
if (mask & ACC_WRITE)
perm |= S_IWOTH;
-   if (mask & ACC_QUOTA)
-   perm |= S_IXGRP;
if (mask & ACC_MOUNT)
perm |= S_IXUSR;
 


[Devel] [PATCH RHEL7 COMMIT] ve/device_cgroup: allow to change device mount permission via cgroup

2016-06-17 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.15
-->
commit 3c080800a7f1d1a3102c9d9de1b46b80e0fec187
Author: Andrey Ryabinin 
Date:   Fri Jun 17 13:08:24 2016 +0400

ve/device_cgroup: allow to change device mount permission via cgroup

Currently, in order to allow a Container to mount a device, we call
ioctl(get_vzctlfd(), VZCTL_SETDEVPERMS, &devperms) with the S_IXUSR bit set.

In fact, this ioctl() is just a wrapper around the dev cgroup interface,
which is very odd. Instead, let's allow changing the mount permission via
the dev cgroup interface directly.
Since the letter 'm' is already taken by the mknod permission, we will use
a capital 'M' for the mount permission.

E.g.:

$ echo 'b 182:954545 M' > /sys/fs/cgroup/devices/$ID/devices.allow
$ cat /sys/fs/cgroup/devices/$ID/devices.list
...
b 182:954545 rmM

https://jira.sw.ru/browse/PSBM-48431

Signed-off-by: Andrey Ryabinin 
---
 security/device_cgroup.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index f94d08e..fc14cdc 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -269,7 +269,7 @@ static void devcgroup_css_free(struct cgroup *cgroup)
 #define DEVCG_LIST 3
 
 #define MAJMINLEN 13
-#define ACCLEN 4
+#define ACCLEN 5
 
 static void set_access(char *acc, short access)
 {
@@ -281,6 +281,8 @@ static void set_access(char *acc, short access)
acc[idx++] = 'w';
if (access & ACC_MKNOD)
acc[idx++] = 'm';
+   if (access & ACC_MOUNT)
+   acc[idx++] = 'M';
 }
 
 static char type_to_char(short type)
@@ -771,7 +773,7 @@ static int devcgroup_update_access(struct dev_cgroup 
*devcgroup,
}
if (!isspace(*b))
return -EINVAL;
-   for (b++, count = 0; count < 3; count++, b++) {
+   for (b++, count = 0; count < ACCLEN - 1; count++, b++) {
switch (*b) {
case 'r':
ex.access |= ACC_READ;
@@ -782,9 +784,12 @@ static int devcgroup_update_access(struct dev_cgroup 
*devcgroup,
case 'm':
ex.access |= ACC_MKNOD;
break;
+   case 'M':
+   ex.access |= ACC_MOUNT;
+   break;
case '\n':
case '\0':
-   count = 3;
+   count = ACCLEN - 1;
break;
default:
return -EINVAL;
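
The access-string handling touched by this patch can be exercised in userspace. This is a minimal sketch, not the kernel file: the ACC_* values and ACCLEN are taken from the patches in this thread, set_access() follows the hunk above, and parse_access() is a hypothetical standalone version of the loop inside devcgroup_update_access().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define ACC_MKNOD 1
#define ACC_READ  2
#define ACC_WRITE 4
#define ACC_MOUNT 64
#define ACCLEN 5	/* up to four letters plus the terminating NUL */

/* Format an access mask as a string such as "rwmM". */
static void set_access(char *acc, short access)
{
	int idx = 0;

	memset(acc, 0, ACCLEN);
	if (access & ACC_READ)
		acc[idx++] = 'r';
	if (access & ACC_WRITE)
		acc[idx++] = 'w';
	if (access & ACC_MKNOD)
		acc[idx++] = 'm';
	if (access & ACC_MOUNT)
		acc[idx++] = 'M';
	assert(idx < ACCLEN);
}

/* Parse at most ACCLEN - 1 access letters; stop at newline or NUL. */
static int parse_access(const char *b, short *access)
{
	int count;

	*access = 0;
	for (count = 0; count < ACCLEN - 1; count++, b++) {
		switch (*b) {
		case 'r':
			*access |= ACC_READ;
			break;
		case 'w':
			*access |= ACC_WRITE;
			break;
		case 'm':
			*access |= ACC_MKNOD;
			break;
		case 'M':
			*access |= ACC_MOUNT;
			break;
		case '\n':
		case '\0':
			count = ACCLEN - 1;	/* end of input, leave the loop */
			break;
		default:
			return -EINVAL;
		}
	}
	return 0;
}
```

The loop bound is tied to ACCLEN so that the parser and the formatter agree on the maximum number of letters; with the old hard-coded 3, a fourth letter in "rwmM" would not have been read.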


[Devel] [PATCH RHEL7 COMMIT] ve/security: device_cgroup -- Allow manage devices in @pseudosuper state

2016-06-17 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.15
-->
commit 6504a698d0cb68644ad61f139e528c7fb605a246
Author: Cyrill Gorcunov 
Date:   Fri Jun 17 13:06:56 2016 +0400

ve/security: device_cgroup -- Allow manage devices in @pseudosuper state

When restoring containers with several disks it's more convenient
to mount the devices first and then set up the permissions needed. For
this reason we allow bypassing the device permission checks inside a VE,
but only if the @pseudosuper state is enabled.

https://jira.sw.ru/browse/PSBM-48421

CC: Vladimir Davydov 
CC: Konstantin Khorenko 
CC: Andrey Vagin 
Signed-off-by: Cyrill Gorcunov 
---
 security/device_cgroup.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 0a6d9c4..f94d08e 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -902,8 +902,24 @@ static int __devcgroup_check_permission(short type, u32 
major, u32 minor,
 minor, access);
rcu_read_unlock();
 
+#ifdef CONFIG_VE
+   /*
+* When restoring container allow everything in
+* pseudosuper state. We need this for early
+* mounting of second ploop device. Still, don't
+* change behaviour on the ve0.
+*/
+   if (!rc) {
+   struct ve_struct *ve = get_exec_env();
+
+   if (!ve_is_super(ve) && ve->is_pseudosuper)
+   return 0;
+   return -EPERM;
+   }
+#else
if (!rc)
return -EPERM;
+#endif
 
return 0;
 }
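
The decision made by the CONFIG_VE branch above reduces to a small truth table. This is a minimal sketch, not the kernel code: struct ve_sketch and check_permission() are hypothetical, with is_super standing in for ve_is_super(ve) and rule_matched for a non-zero rc.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two ve_struct properties used here. */
struct ve_sketch {
	bool is_super;		/* true for ve0 (the host) */
	bool is_pseudosuper;	/* true while a container is being restored */
};

/* Returns 0 if access is allowed, -EPERM otherwise. */
static int check_permission(bool rule_matched, const struct ve_sketch *ve)
{
	assert(ve);
	if (!rule_matched) {
		/* No matching rule: allow only inside a container that is
		 * in pseudosuper state; the host keeps the old behaviour. */
		if (!ve->is_super && ve->is_pseudosuper)
			return 0;
		return -EPERM;
	}
	return 0;
}
```

Only one row differs from the pre-patch behaviour: no matching rule, not ve0, pseudosuper set.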


Re: [Devel] [vzlin-dev] [PATCH rh7] ploop: io_kaio: fix silly bug in kaio_complete_io_state()

2016-06-17 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> It's useless to check for preq->req_rw & REQ_FUA after:
> preq->req_rw &= ~REQ_FUA;
ACK :) But in order to make it clear for others let's post original code
here!
...
  preq->req_rw &= ~REQ_FUA;

/* Convert requested fua to fsync */
   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state) ||
   test_and_clear_bit(PLOOP_REQ_KAIO_FSYNC,
   &preq->state))
   post_fsync = 1;

if (!post_fsync &&
!ploop_req_delay_fua_possible(preq->req_rw, preq) &&
(preq->req_rw & REQ_FUA))
post_fsync = 1;

preq->req_rw &= ~REQ_FUA;
...


>
> Signed-off-by: Maxim Patlasov 
> ---
>  drivers/block/ploop/io_kaio.c |2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
> index 79aa9af..de26319 100644
> --- a/drivers/block/ploop/io_kaio.c
> +++ b/drivers/block/ploop/io_kaio.c
> @@ -71,8 +71,6 @@ static void kaio_complete_io_state(struct ploop_request * 
> preq)
>   return;
>   }
>  
> - preq->req_rw &= ~REQ_FUA;
> -
>   /* Convert requested fua to fsync */
>   if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state) ||
>   test_and_clear_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state))




Re: [Devel] [vzlin-dev] [PATCH rh7] ploop: fix counting bio_qlen

2016-06-17 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> The commit ec1eeb868 (May 22 2015) ported "separate queue for discard bio"
> patch from RHEL6-based kernel incorrectly. Original patch stated clearly
> that if we want to decrement bio_discard_qlen, bio_qlen must not change:
>
> @@ -500,7 +502,7 @@ ploop_bio_queue(struct ploop_device * pl
> (err = ploop_discard_add_bio(plo->fbd, bio))) {
> BIO_ENDIO(bio, err);
> list_add(&preq->list, &plo->free_list);
> -   plo->bio_qlen--;
> +   plo->bio_discard_qlen--;
> plo->bio_total--;
> return;
> }
>
> but that port did the opposite:
>
> @@ -521,6 +523,7 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * 
> bio,
> BIO_ENDIO(plo->queue, bio, err);
> list_add(&preq->list, &plo->free_list);
> plo->bio_qlen--;
> +   plo->bio_discard_qlen--;
> plo->bio_total--;
> return;
> }
>
> Signed-off-by: Maxim Patlasov 
> ---
>  drivers/block/ploop/dev.c |1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> index db55be3..e1fbfcf 100644
> --- a/drivers/block/ploop/dev.c
> +++ b/drivers/block/ploop/dev.c
> @@ -523,7 +523,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * 
> bio,
>   }
>   BIO_ENDIO(plo->queue, bio, err);
>   list_add(&preq->list, &plo->free_list);
> - plo->bio_qlen--;
>   plo->bio_discard_qlen--;
>   plo->bio_total--;
>   return;
ACK

