[Devel] [PATCH rh7] ms/cgroup: fix rmdir EBUSY regression in 3.11

2017-01-11 Thread Andrey Ryabinin
From: Hugh Dickins 

commit bb78a92f47696b2da49f2692b6a9fa56d07c444a upstream.

On 3.11-rc we are seeing cgroup directories left behind when they should
have been removed.  Here's a trivial reproducer:

cd /sys/fs/cgroup/memory
mkdir parent parent/child; rmdir parent/child parent
rmdir: failed to remove `parent': Device or resource busy

It's because cgroup_destroy_locked() (step 1 of destruction) leaves
cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
destruction) remove it; but step 2 is run by work queue, which may not
yet have removed the children when parent destruction checks the list.

Fix that by checking through a non-empty list of children: if every one
of them has already been marked CGRP_DEAD, then it's safe to proceed:
those children are invisible to userspace, and should not obstruct rmdir.

(I didn't see any reason to keep the cgrp->children checks under the
unrelated css_set_lock, so moved them out.)
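
The check described above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `struct node` and `destroy_allowed()` are hypothetical names, a plain singly linked list stands in for the RCU-protected children list, and a boolean stands in for the CGRP_DEAD flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup; not the kernel structure. */
struct node {
	int count;             /* stands in for cgrp->count */
	bool dead;             /* stands in for the CGRP_DEAD flag */
	struct node *children; /* first child, or NULL */
	struct node *sibling;  /* next child of the same parent */
};

/* Returns 0 if the node may be destroyed, -EBUSY otherwise. */
static int destroy_allowed(const struct node *n)
{
	const struct node *child;

	assert(n != NULL);
	if (n->count)
		return -EBUSY;

	/* Dead children may linger on the list; only live ones block. */
	for (child = n->children; child; child = child->sibling) {
		if (!child->dead)
			return -EBUSY;
	}
	return 0;
}
```

With this shape, a parent whose only child is already marked dead can be removed at once, which is the "rmdir parent/child parent" case from the reproducer.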

tj: Flattened nested ifs a bit and updated comment so that it's
correct on both for-3.11-fixes and for-3.12.

Signed-off-by: Hugh Dickins 
Signed-off-by: Tejun Heo 

https://jira.sw.ru/browse/PSBM-53314

[aryabinin: s/cgroup_is_dead()/cgroup_is_removed()]
Signed-off-by: Andrey Ryabinin 
---
 kernel/cgroup.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1c047b9..6aafc51 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4432,11 +4432,29 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	struct dentry *d = cgrp->dentry;
 	struct cgroup_event *event, *tmp;
 	struct cgroup_subsys *ss;
+	struct cgroup *child;
+	bool empty;
 
 	lockdep_assert_held(&d->d_inode->i_mutex);
 	lockdep_assert_held(&cgroup_mutex);
 
-	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children))
+	if (atomic_read(&cgrp->count))
+		return -EBUSY;
+
+	/*
+	 * Make sure there's no live children.  We can't test ->children
+	 * emptiness as dead children linger on it while being destroyed;
+	 * otherwise, "rmdir parent/child parent" may fail with -EBUSY.
+	 */
+	empty = true;
+	rcu_read_lock();
+	list_for_each_entry_rcu(child, &cgrp->children, sibling) {
+		empty = cgroup_is_removed(child);
+		if (!empty)
+			break;
+	}
+	rcu_read_unlock();
+	if (!empty)
 		return -EBUSY;
 
 	/*
-- 
2.10.2

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


Re: [Devel] [PATCH rh7] ms/cgroup: fix rmdir EBUSY regression in 3.11

2017-01-11 Thread Cyrill Gorcunov
On Wed, Jan 11, 2017 at 12:30:49PM +0300, Andrey Ryabinin wrote:
> 
> [aryabinin: s/cgroup_is_dead()/cgroup_is_removed()]
> Signed-off-by: Andrey Ryabinin 
Acked-by: Cyrill Gorcunov 

Thank you!


[Devel] [PATCH RHEL7 COMMIT] ms/cgroup: fix rmdir EBUSY regression in 3.11

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit 34697deacd12106ba28bc7e44efe5221de278ff8
Author: Hugh Dickins 
Date:   Wed Jan 11 14:20:30 2017 +0400

ms/cgroup: fix rmdir EBUSY regression in 3.11

commit bb78a92f47696b2da49f2692b6a9fa56d07c444a upstream.

On 3.11-rc we are seeing cgroup directories left behind when they should
have been removed.  Here's a trivial reproducer:

cd /sys/fs/cgroup/memory
mkdir parent parent/child; rmdir parent/child parent
rmdir: failed to remove `parent': Device or resource busy

It's because cgroup_destroy_locked() (step 1 of destruction) leaves
cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
destruction) remove it; but step 2 is run by work queue, which may not
yet have removed the children when parent destruction checks the list.

Fix that by checking through a non-empty list of children: if every one
of them has already been marked CGRP_DEAD, then it's safe to proceed:
those children are invisible to userspace, and should not obstruct rmdir.

(I didn't see any reason to keep the cgrp->children checks under the
unrelated css_set_lock, so moved them out.)

tj: Flattened nested ifs a bit and updated comment so that it's
correct on both for-3.11-fixes and for-3.12.

Signed-off-by: Hugh Dickins 
Signed-off-by: Tejun Heo 

https://jira.sw.ru/browse/PSBM-53314

[aryabinin: s/cgroup_is_dead()/cgroup_is_removed()]
Signed-off-by: Andrey Ryabinin 
Acked-by: Cyrill Gorcunov 
---
 kernel/cgroup.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7c185a0..5ea44e1 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4434,11 +4434,29 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	struct dentry *d = cgrp->dentry;
 	struct cgroup_event *event, *tmp;
 	struct cgroup_subsys *ss;
+	struct cgroup *child;
+	bool empty;
 
 	lockdep_assert_held(&d->d_inode->i_mutex);
 	lockdep_assert_held(&cgroup_mutex);
 
-	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children))
+	if (atomic_read(&cgrp->count))
+		return -EBUSY;
+
+	/*
+	 * Make sure there's no live children.  We can't test ->children
+	 * emptiness as dead children linger on it while being destroyed;
+	 * otherwise, "rmdir parent/child parent" may fail with -EBUSY.
+	 */
+	empty = true;
+	rcu_read_lock();
+	list_for_each_entry_rcu(child, &cgrp->children, sibling) {
+		empty = cgroup_is_removed(child);
+		if (!empty)
+			break;
+	}
+	rcu_read_unlock();
+	if (!empty)
 		return -EBUSY;
 
 	/*


[Devel] [PATCH RHEL7 COMMIT] ms/xfs: rework buffer dispose list tracking

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit 80545dc5820da58dc3b7512dab667992ebcd487d
Author: Dmitry Monakhov 
Date:   Wed Jan 11 14:34:39 2017 +0400

ms/xfs: rework buffer dispose list tracking

Patchset description:
[7.3] rebase xfs lru patches

rh7-3.10.0-514 already has 'fs-xfs-rework-buffer-dispose-list-tracking', but
originally it depends on ms/xfs-convert-buftarg-LRU-to-generic. In order to
preserve the original logic I've reverted RHEL's patch (the first one) and
reapplied it later, in its natural order:
TOC:
0001-Revert-fs-xfs-rework-buffer-dispose-list-tracking.patch

0002-ms-xfs-convert-buftarg-LRU-to-generic-code.patch
0003-From-c70ded437bb646ace0dcbf3c7989d4edeed17f7e-Mon-Se.patch [not changed]
0004-ms-xfs-rework-buffer-dispose-list-tracking.patch

===
This patch description:

In converting the buffer lru lists to use the generic code, the locking
for marking the buffers as on the dispose list was lost.  This results in
confusion in LRU buffer tracking and accounting, resulting in reference
counts being mucked up and the filesystem being unmountable.

To fix this, introduce an internal buffer spinlock to protect the state
field that holds the dispose list information.  Because there is now
locking needed around xfs_buf_lru_add/del, and they are used in exactly
one place each two lines apart, get rid of the wrappers and code the logic
directly in place.

Further, the LRU emptying code used on unmount is less than optimal.
Convert it to use a dispose list as per a normal shrinker walk, and repeat
the walk that fills the dispose list until the LRU is empty.  This avoids
needing to drop and regain the LRU lock for every item being freed, and
allows the same logic as the shrinker isolate call to be used.  Simpler,
easier to understand.
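
The unmount-time drain described in the last paragraph can be sketched as a single-threaded simulation. This is an illustration under stated assumptions, not the XFS implementation: `struct buf`, `struct lru`, `lru_isolate()` and `lru_drain()` are hypothetical names, and the LRU lock is represented only by a counter of lock round trips, so that the saving from batching is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the XFS structures. */
struct buf {
	struct buf *next;
	bool on_dispose;           /* plays the role of the dispose state bit */
	bool freed;
};

struct lru {
	struct buf *head;
	unsigned long nr;
	unsigned long lock_trips;  /* counts simulated lock/unlock pairs */
};

/* Move up to max buffers from the LRU to *dispose in one locked pass. */
static unsigned long lru_isolate(struct lru *l, struct buf **dispose,
				 unsigned long max)
{
	unsigned long moved = 0;

	l->lock_trips++;           /* "take" the LRU lock once per batch */
	while (l->head && moved < max) {
		struct buf *bp = l->head;

		l->head = bp->next;
		l->nr--;
		bp->on_dispose = true;
		bp->next = *dispose;
		*dispose = bp;
		moved++;
	}
	return moved;              /* the lock would be dropped here */
}

/* Repeat the walk until the LRU is empty; free outside the lock. */
static unsigned long lru_drain(struct lru *l, unsigned long batch)
{
	unsigned long freed = 0;

	assert(batch > 0);
	while (l->nr) {
		struct buf *dispose = NULL;

		lru_isolate(l, &dispose, batch);
		while (dispose) {
			struct buf *bp = dispose;

			dispose = bp->next;
			bp->freed = true;  /* stands in for freeing the buffer */
			freed++;
		}
	}
	return freed;
}
```

In this sketch three buffers drained in batches of two cost two lock round trips instead of three. The real code also has to skip buffers that are still held elsewhere, which the sketch leaves out.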

Signed-off-by: Dave Chinner 
Signed-off-by: Glauber Costa 
Cc: "Theodore Ts'o" 
Cc: Adrian Hunter 
Cc: Al Viro 
Cc: Artem Bityutskiy 
Cc: Arve Hjønnevåg 
Cc: Carlos Maiolino 
Cc: Christoph Hellwig 
Cc: Chuck Lever 
Cc: Daniel Vetter 
Cc: David Rientjes 
Cc: Gleb Natapov 
Cc: Greg Thelen 
Cc: J. Bruce Fields 
Cc: Jan Kara 
Cc: Jerome Glisse 
Cc: John Stultz 
Cc: KAMEZAWA Hiroyuki 
Cc: Kent Overstreet 
Cc: Kirill A. Shutemov 
Cc: Marcelo Tosatti 
Cc: Mel Gorman 
Cc: Steven Whitehouse 
Cc: Thomas Hellstrom 
Cc: Trond Myklebust 
Signed-off-by: Andrew Morton 
Signed-off-by: Al Viro 
(cherry picked from commit a408235726aa82c0358c9ec68124b6f4bc0a79df)

https://jira.sw.ru/browse/PSBM-55577

Signed-off-by: Dmitry Monakhov 
---
 fs/xfs/xfs_buf.c | 147 +++
 fs/xfs/xfs_buf.h |   8 ++-
 2 files changed, 78 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index bf933d5..8d8c9ce 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -80,37 +80,6 @@ xfs_buf_vmap_len(
 }
 
 /*
- * xfs_buf_lru_add - add a buffer to the LRU.
- *
- * The LRU takes a new reference to the buffer so that it will only be freed
- * once the shrinker takes the buffer off the LRU.
- */
-static void
-xfs_buf_lru_add(
-   struct xfs_buf  *bp)
-{
-   if (list_lru_add(&bp->b_target->bt_lru, &bp->b_lru)) {
-   bp->b_lru_flags &= ~_XBF_LRU_DISPOSE;
-   atomic_inc(&bp->b_hold);
-   }
-}
-
-/*
- * xfs_buf_lru_del - remove a buffer from the LRU
- *
- * The unlocked check is safe here because it only occurs when there are not
- * b_lru_ref counts left on the inode under the pag->pag_buf_lock. it is there
- * to optimise the shrinker removing the buffer from the LRU and calling
- * xfs_buf_free().
- */
-static void
-xfs_buf_lru_del(
-   struct xfs_buf  *bp)
-{
-   list_lru_del(&bp->b_target->bt_lru, &bp->b_lru);
-}
-
-/*
  * Bump the I/O in flight count on the buftarg if we haven't yet done so for
  * this buffer. The count is incremented once per buffer (per hold cycle)
  * because the corresponding decrement is deferred to buffer release. Buffers
@@ -181,12 +150,14 @@ xfs_buf_stale(
 */
xfs_buf_ioacct_dec(bp);
 
-   atomic_set(&(bp)->b_lru_ref, 0);
-   if (!(bp->b_lru_flags & _XBF_LRU_DISPOSE) &&
+   spin_lock(&bp->b_lock);
+   atomic_set(&bp->b_lru_ref, 0);
+   if (!(bp->b_state & XFS_BSTATE_DISPOSE) &&
(list_lru_del(&bp->b_target->bt_lru, &bp->b_lru)))
atomic_dec(&bp->b_hold);
 
ASSERT(atomic_read(&bp->b_hold) >= 1);
+   spin_unlock(&bp->b_lock);
 }
 
 static int
@@ -987,10 +958,28 @@ xfs_buf_rele(
/* the last reference has been dropped ... */
xfs_buf_ioacct_d

[Devel] [PATCH RHEL7 COMMIT] ms/xfs-convert-buftarg-lru-to-generic-code-fix

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit a438079b37e68bd54fa4942175643cc7add1e027
Author: Andrew Morton 
Date:   Wed Jan 11 14:34:38 2017 +0400

ms/xfs-convert-buftarg-lru-to-generic-code-fix

Patchset description:
[7.3] rebase xfs lru patches

rh7-3.10.0-514 already has 'fs-xfs-rework-buffer-dispose-list-tracking', but
originally it depends on ms/xfs-convert-buftarg-LRU-to-generic. In order to
preserve the original logic I've reverted RHEL's patch (the first one) and
reapplied it later, in its natural order:
TOC:
0001-Revert-fs-xfs-rework-buffer-dispose-list-tracking.patch

0002-ms-xfs-convert-buftarg-LRU-to-generic-code.patch
0003-From-c70ded437bb646ace0dcbf3c7989d4edeed17f7e-Mon-Se.patch [not changed]
0004-ms-xfs-rework-buffer-dispose-list-tracking.patch

===
This patch description:

fix warnings

Cc: Dave Chinner 
Cc: Glauber Costa 
Signed-off-by: Andrew Morton 
Signed-off-by: Al Viro 
(cherry picked from commit addbda40bed47d8942658fca93e14b5f1cbf009a)

Signed-off-by: Vladimir Davydov 

https://jira.sw.ru/browse/PSBM-55577

Signed-off-by: Dmitry Monakhov 
---
 fs/xfs/xfs_buf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 87a314a..bf933d5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1654,7 +1654,7 @@ xfs_buftarg_isolate(
return LRU_REMOVED;
 }
 
-static long
+static unsigned long
 xfs_buftarg_shrink_scan(
struct shrinker *shrink,
struct shrink_control   *sc)
@@ -1662,7 +1662,7 @@ xfs_buftarg_shrink_scan(
struct xfs_buftarg  *btp = container_of(shrink,
struct xfs_buftarg, bt_shrinker);
LIST_HEAD(dispose);
-   longfreed;
+   unsigned long   freed;
unsigned long   nr_to_scan = sc->nr_to_scan;
 
freed = list_lru_walk_node(&btp->bt_lru, sc->nid, xfs_buftarg_isolate,
@@ -1678,7 +1678,7 @@ xfs_buftarg_shrink_scan(
return freed;
 }
 
-static long
+static unsigned long
 xfs_buftarg_shrink_count(
struct shrinker *shrink,
struct shrink_control   *sc)


[Devel] [PATCH RHEL7 COMMIT] ms/xfs: convert buftarg LRU to generic code

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit 6490df80cd2cc557ea7ce8520c8aa0ed5528a5f0
Author: Dmitry Monakhov 
Date:   Wed Jan 11 14:34:37 2017 +0400

ms/xfs: convert buftarg LRU to generic code

Patchset description:
[7.3] rebase xfs lru patches

rh7-3.10.0-514 already has 'fs-xfs-rework-buffer-dispose-list-tracking', but
originally it depends on ms/xfs-convert-buftarg-LRU-to-generic. In order to
preserve the original logic I've reverted RHEL's patch (the first one) and
reapplied it later, in its natural order:
TOC:
0001-Revert-fs-xfs-rework-buffer-dispose-list-tracking.patch

0002-ms-xfs-convert-buftarg-LRU-to-generic-code.patch
0003-From-c70ded437bb646ace0dcbf3c7989d4edeed17f7e-Mon-Se.patch [not changed]
0004-ms-xfs-rework-buffer-dispose-list-tracking.patch

===
This patch description:

Convert the buftarg LRU to use the new generic LRU list and take advantage
of the functionality it supplies to make the buffer cache shrinker node
aware.
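
Two properties of the generic LRU matter for this conversion: adding reports whether the item was newly inserted, so the caller takes its reference only in that case, and items are counted per node. The following is a rough userspace sketch of those two properties only. All names (`struct item`, `struct node_lru`, `sketch_lru_add()`, `sketch_lru_del()`, `SKETCH_MAX_NODES`) are hypothetical and the kernel's list_lru is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 4      /* hypothetical node count */

struct item {
	bool on_lru;
	int nid;                    /* node the item belongs to */
	int hold;                   /* reference count */
};

struct node_lru {
	unsigned long nr_items[SKETCH_MAX_NODES];
};

/* Returns true only if the item was not on the LRU before. */
static bool sketch_lru_add(struct node_lru *lru, struct item *it)
{
	assert(it->nid >= 0 && it->nid < SKETCH_MAX_NODES);
	if (it->on_lru)
		return false;
	it->on_lru = true;
	lru->nr_items[it->nid]++;
	return true;
}

/* Returns true only if the item was actually removed. */
static bool sketch_lru_del(struct node_lru *lru, struct item *it)
{
	assert(it->nid >= 0 && it->nid < SKETCH_MAX_NODES);
	if (!it->on_lru)
		return false;
	it->on_lru = false;
	lru->nr_items[it->nid]--;
	return true;
}

/* The LRU owns one reference for as long as the item is on it. */
static void item_cache(struct node_lru *lru, struct item *it)
{
	if (sketch_lru_add(lru, it))
		it->hold++;
}

static void item_uncache(struct node_lru *lru, struct item *it)
{
	if (sketch_lru_del(lru, it))
		it->hold--;
}
```

Because the add and delete helpers report what they did, repeated calls stay balanced, and a per-node count lets a shrinker look at a single node's items.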

Signed-off-by: Glauber Costa 
Signed-off-by: Dave Chinner 
Cc: "Theodore Ts'o" 
Cc: Adrian Hunter 
Cc: Al Viro 
Cc: Artem Bityutskiy 
Cc: Arve Hjønnevåg 
Cc: Carlos Maiolino 
Cc: Christoph Hellwig 
Cc: Chuck Lever 
Cc: Daniel Vetter 
Cc: David Rientjes 
Cc: Gleb Natapov 
Cc: Greg Thelen 
Cc: J. Bruce Fields 
Cc: Jan Kara 
Cc: Jerome Glisse 
Cc: John Stultz 
Cc: KAMEZAWA Hiroyuki 
Cc: Kent Overstreet 
Cc: Kirill A. Shutemov 
Cc: Marcelo Tosatti 
Cc: Mel Gorman 
Cc: Steven Whitehouse 
Cc: Thomas Hellstrom 
Cc: Trond Myklebust 
Signed-off-by: Andrew Morton 
Signed-off-by: Al Viro 
(cherry picked from commit e80dfa19976b884db1ac2bc5d7d6ca0a4027bd1c)

https://jira.sw.ru/browse/PSBM-55577

Signed-off-by: Dmitry Monakhov 
---
 fs/xfs/xfs_buf.c | 170 ++-
 fs/xfs/xfs_buf.h |   5 +-
 2 files changed, 81 insertions(+), 94 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index c0de0e2..87a314a 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -85,20 +85,14 @@ xfs_buf_vmap_len(
  * The LRU takes a new reference to the buffer so that it will only be freed
  * once the shrinker takes the buffer off the LRU.
  */
-STATIC void
+static void
 xfs_buf_lru_add(
struct xfs_buf  *bp)
 {
-   struct xfs_buftarg *btp = bp->b_target;
-
-   spin_lock(&btp->bt_lru_lock);
-   if (list_empty(&bp->b_lru)) {
-   atomic_inc(&bp->b_hold);
-   list_add_tail(&bp->b_lru, &btp->bt_lru);
-   btp->bt_lru_nr++;
+   if (list_lru_add(&bp->b_target->bt_lru, &bp->b_lru)) {
bp->b_lru_flags &= ~_XBF_LRU_DISPOSE;
+   atomic_inc(&bp->b_hold);
}
-   spin_unlock(&btp->bt_lru_lock);
 }
 
 /*
@@ -107,24 +101,13 @@ xfs_buf_lru_add(
  * The unlocked check is safe here because it only occurs when there are not
  * b_lru_ref counts left on the inode under the pag->pag_buf_lock. it is there
  * to optimise the shrinker removing the buffer from the LRU and calling
- * xfs_buf_free(). i.e. it removes an unnecessary round trip on the
- * bt_lru_lock.
+ * xfs_buf_free().
  */
-STATIC void
+static void
 xfs_buf_lru_del(
struct xfs_buf  *bp)
 {
-   struct xfs_buftarg *btp = bp->b_target;
-
-   if (list_empty(&bp->b_lru))
-   return;
-
-   spin_lock(&btp->bt_lru_lock);
-   if (!list_empty(&bp->b_lru)) {
-   list_del_init(&bp->b_lru);
-   btp->bt_lru_nr--;
-   }
-   spin_unlock(&btp->bt_lru_lock);
+   list_lru_del(&bp->b_target->bt_lru, &bp->b_lru);
 }
 
 /*
@@ -199,18 +182,10 @@ xfs_buf_stale(
xfs_buf_ioacct_dec(bp);
 
atomic_set(&(bp)->b_lru_ref, 0);
-   if (!list_empty(&bp->b_lru)) {
-   struct xfs_buftarg *btp = bp->b_target;
-
-   spin_lock(&btp->bt_lru_lock);
-   if (!list_empty(&bp->b_lru) &&
-   !(bp->b_lru_flags & _XBF_LRU_DISPOSE)) {
-   list_del_init(&bp->b_lru);
-   btp->bt_lru_nr--;
-   atomic_dec(&bp->b_hold);
-   }
-   spin_unlock(&btp->bt_lru_lock);
-   }
+   if (!(bp->b_lru_flags & _XBF_LRU_DISPOSE) &&
+   (list_lru_del(&bp->b_target->bt_lru, &bp->b_lru)))
+   atomic_dec(&bp->b_hold);
+
ASSERT(atomic_read(&bp->b_hold) >= 1);
 }
 
@@ -1597,11 +1572,14 @@ xfs_buf_iomove(
  * returned. These buffers will have an elevated hold count, so wait on those
  * while freeing all the buffers only held by the LRU.
  */
-void
-xfs_wait_buftarg(
-   struct xfs_buftarg  *btp)
+static enum lru_status
+xfs_buft

[Devel] [PATCH RHEL7 COMMIT] fs: constify iov_iter_count/iov_iter_iovec helpers

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit 31eb8052e9a1198abc40891d8aea3840e6004b93
Author: Dmitry Monakhov 
Date:   Wed Jan 11 14:44:03 2017 +0400

fs: constify iov_iter_count/iov_iter_iovec helpers

This is done as part of fixing CEPH compilation after the rebase to RHEL7.3.
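
As a small illustration of why the const qualifier matters: a caller that only holds a pointer to a const iterator can use a const-taking accessor directly, without a cast and without a discarded-qualifier warning. The sketch below uses hypothetical names (`struct sketch_iter`, `sketch_iter_count()`, `remaining()`) and is not the kernel's iov_iter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an iterator; not struct iov_iter. */
struct sketch_iter {
	const void *data;
	size_t count;
};

/* Read-only accessor: taking a const pointer lets const callers use it. */
static inline size_t sketch_iter_count(const struct sketch_iter *i)
{
	assert(i != NULL);
	return i->count;
}

/* A caller that itself only receives a const pointer. */
static size_t remaining(const struct sketch_iter *it)
{
	return sketch_iter_count(it);
}
```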

https://jira.sw.ru/browse/PSBM-54817

Signed-off-by: Dmitry Monakhov 
---
 include/linux/fs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e30e8a1..a27bd15 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -448,13 +448,13 @@ static inline int iov_iter_has_iovec(const struct iov_iter *i)
 {
 	return i->ops == &ii_iovec_ops;
 }
-static inline struct iovec *iov_iter_iovec(struct iov_iter *i)
+static inline struct iovec *iov_iter_iovec(const struct iov_iter *i)
 {
 	BUG_ON(!iov_iter_has_iovec(i));
 	return (struct iovec *)i->data;
 }
 
-static inline size_t iov_iter_count(struct iov_iter *i)
+static inline size_t iov_iter_count(const struct iov_iter *i)
 {
 	return i->count;
 }


[Devel] [PATCH RHEL7 COMMIT] Revert: [fs] xfs: rework buffer dispose list tracking

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit d4dd61f05d16cabfdb0e277b45da08e99b55ebf5
Author: Dmitry Monakhov 
Date:   Wed Jan 11 14:34:36 2017 +0400

Revert: [fs] xfs: rework buffer dispose list tracking

Patchset description:
[7.3] rebase xfs lru patches

rh7-3.10.0-514 already has 'fs-xfs-rework-buffer-dispose-list-tracking', but
originally it depends on ms/xfs-convert-buftarg-LRU-to-generic. In order to
preserve the original logic I've reverted RHEL's patch (the first one) and
reapplied it later, in its natural order:
TOC:
0001-Revert-fs-xfs-rework-buffer-dispose-list-tracking.patch

0002-ms-xfs-convert-buftarg-LRU-to-generic-code.patch
0003-From-c70ded437bb646ace0dcbf3c7989d4edeed17f7e-Mon-Se.patch [not changed]
0004-ms-xfs-rework-buffer-dispose-list-tracking.patch

===
This patch description:

RH Bugzilla: 1349175

ms commit a408235726aa82c0358c9ec68124b6f4bc0a79df
Author: Dave Chinner 
Date:   Wed Aug 28 10:18:06 2013 +1000

xfs: rework buffer dispose list tracking

In converting the buffer lru lists to use the generic code, the locking
for marking the buffers as on the dispose list was lost.  This results in
confusion in LRU buffer tracking and accounting, resulting in reference
counts being mucked up and the filesystem being unmountable.

To fix this, introduce an internal buffer spinlock to protect the state
field that holds the dispose list information.  Because there is now
locking needed around xfs_buf_lru_add/del, and they are used in exactly
one place each two lines apart, get rid of the wrappers and code the logic
directly in place.

Further, the LRU emptying code used on unmount is less than optimal.
Convert it to use a dispose list as per a normal shrinker walk, and repeat
the walk that fills the dispose list until the LRU is empty.  This avoids
needing to drop and regain the LRU lock for every item being freed, and
allows the same logic as the shrinker isolate call to be used.  Simpler,
easier to understand.

Signed-off-by: Dave Chinner 
Signed-off-by: Glauber Costa 
Cc: "Theodore Ts'o" 
Cc: Adrian Hunter 
Cc: Al Viro 
Cc: Artem Bityutskiy 
Cc: Arve Hjonnevag 
Cc: Carlos Maiolino 
Cc: Christoph Hellwig 
Cc: Chuck Lever 
Cc: Daniel Vetter 
Cc: David Rientjes 
Cc: Gleb Natapov 
Cc: Greg Thelen 
Cc: J. Bruce Fields 
Cc: Jan Kara 
Cc: Jerome Glisse 
Cc: John Stultz 
Cc: KAMEZAWA Hiroyuki 
Cc: Kent Overstreet 
Cc: Kirill A. Shutemov 
Cc: Marcelo Tosatti 
Cc: Mel Gorman 
Cc: Steven Whitehouse 
Cc: Thomas Hellstrom 
Cc: Trond Myklebust 
Signed-off-by: Andrew Morton 
Signed-off-by: Al Viro 

Signed-off-by: Brian Foster 

https://jira.sw.ru/browse/PSBM-55577

Signed-off-by: Dmitry Monakhov 
---
 fs/xfs/xfs_buf.c | 57 
 fs/xfs/xfs_buf.h |  8 +++-
 2 files changed, 11 insertions(+), 54 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e380398..c0de0e2 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -96,7 +96,7 @@ xfs_buf_lru_add(
atomic_inc(&bp->b_hold);
list_add_tail(&bp->b_lru, &btp->bt_lru);
btp->bt_lru_nr++;
-   bp->b_state &= ~XFS_BSTATE_DISPOSE;
+   bp->b_lru_flags &= ~_XBF_LRU_DISPOSE;
}
spin_unlock(&btp->bt_lru_lock);
 }
@@ -198,21 +198,19 @@ xfs_buf_stale(
 */
xfs_buf_ioacct_dec(bp);
 
-   spin_lock(&bp->b_lock);
-   atomic_set(&bp->b_lru_ref, 0);
+   atomic_set(&(bp)->b_lru_ref, 0);
if (!list_empty(&bp->b_lru)) {
struct xfs_buftarg *btp = bp->b_target;
 
spin_lock(&btp->bt_lru_lock);
if (!list_empty(&bp->b_lru) &&
-   !(bp->b_state & XFS_BSTATE_DISPOSE)) {
+   !(bp->b_lru_flags & _XBF_LRU_DISPOSE)) {
list_del_init(&bp->b_lru);
btp->bt_lru_nr--;
atomic_dec(&bp->b_hold);
}
spin_unlock(&btp->bt_lru_lock);
}
-   spin_unlock(&bp->b_lock);
ASSERT(atomic_read(&bp->b_hold) >= 1);
 }
 
@@ -1014,26 +1012,10 @@ xfs_buf_rele(
/* the last reference has been dropped ... */
xfs_buf_ioacct_dec(bp);
if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {
-   /*
-* If the buffer is added to t

[Devel] [PATCH RHEL7 COMMIT] fs/ceph: honor kernel direct aio changes v2

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit dc5b3fbb992470c36ea83967aab4945648a980c0
Author: Dmitry Monakhov 
Date:   Wed Jan 11 14:44:04 2017 +0400

fs/ceph: honor kernel direct aio changes v2

Base patches:
fs/ceph: honor kernel direct aio changes
fs/ceph: add BUG_ON to iov_iter access

Changes: replace open-coded iter-to-iovec conversion with the proper helper.

This is done as part of fixing CEPH compilation after the rebase to RHEL7.3.

https://jira.sw.ru/browse/PSBM-54817

Signed-off-by: Dmitry Monakhov 
---
 fs/ceph/file.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 82676fa..0b72417 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -40,8 +40,8 @@
  */
 static size_t dio_get_pagev_size(const struct iov_iter *it)
 {
-const struct iovec *iov = it->iov;
-const struct iovec *iovend = iov + it->nr_segs;
+const struct iovec *iov = iov_iter_iovec(it);
+size_t total = iov_iter_count(it);
 size_t size;
 
 size = iov->iov_len - it->iov_offset;
@@ -50,8 +50,10 @@ static size_t dio_get_pagev_size(const struct iov_iter *it)
  * and the next base are page aligned.
  */
 while (PAGE_ALIGNED((iov->iov_base + iov->iov_len)) &&
-   (++iov < iovend && PAGE_ALIGNED((iov->iov_base)))) {
-size += iov->iov_len;
+   PAGE_ALIGNED(((iov++)->iov_base))) {
+   size_t n =  min(iov->iov_len, total);
+   size += n;
+   total -= n;
 }
 dout("dio_get_pagevlen len = %zu\n", size);
 return size;
@@ -71,7 +73,7 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
struct page **pages;
int ret = 0, idx, npages;
 
-   align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
+   align = (unsigned long)(iov_iter_iovec(it)->iov_base + it->iov_offset) &
(PAGE_SIZE - 1);
npages = calc_pages_for(align, nbytes);
pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
@@ -82,10 +84,11 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
}
 
for (idx = 0; idx < npages; ) {
-   void __user *data = tmp_it.iov->iov_base + tmp_it.iov_offset;
+   struct iovec *tmp_iov = iov_iter_iovec(&tmp_it);
+   void __user *data = tmp_iov->iov_base + tmp_it.iov_offset;
size_t off = (unsigned long)data & (PAGE_SIZE - 1);
size_t len = min_t(size_t, nbytes,
-  tmp_it.iov->iov_len - tmp_it.iov_offset);
+  tmp_iov->iov_len - tmp_it.iov_offset);
int n = (len + off + PAGE_SIZE - 1) >> PAGE_SHIFT;
ret = get_user_pages_fast((unsigned long)data, n, write,
   pages + idx);
@@ -522,10 +525,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *i,
size_t left = len = ret;
 
while (left) {
-   void __user *data = i->iov[0].iov_base +
-   i->iov_offset;
-   l = min(i->iov[0].iov_len - i->iov_offset,
-   left);
+   struct iovec *iov = (struct iovec *)i->data;
+   void __user *data = iov->iov_base + i->iov_offset;
+   l = min(iov->iov_len - i->iov_offset, left);
 
ret = ceph_copy_page_vector_to_user(&pages[k],
data, off, l);
@@ -1121,7 +1123,7 @@ static ssize_t inline_to_iov(struct kiocb *iocb, struct iov_iter *i,
 
while (left) {
struct iovec *iov = iov_iter_iovec(i);
-   void __user *udata = iov->iov_base + i->iov_offset;
+   void __user *udata = iov->iov_base;
size_t n = min(iov->iov_len - i->iov_offset, left);
 
if (__copy_to_user(udata, kdata, n)) {
@@ -1139,8 +1141,8 @@ static ssize_t inline_to_iov(struct kiocb *iocb, struct iov_iter *i,
size_t left = min_t(loff_t, iocb->ki_pos + len, i_size) - pos;
 
while (left) {
-   struct iovec *iov = iov_iter_iovec(i);
-   void __user *udata = iov->iov_base + i->iov_offset;
+   struct iovec *iov = (struct iovec *)i->data;
+   void __user *udata = iov->iov_base;
size_t n = min(iov->iov_len - i->iov_offset, left);
 
if (__clear_user(udata, n)) {


[Devel] [PATCH RHEL7 COMMIT] Revert "configs: temporary disable CEPH and XFS compilation"

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.8
-->
commit 1c089c8e86762f744e4ff0031f2b2a068ffb0b03
Author: Konstantin Khorenko 
Date:   Wed Jan 11 14:45:22 2017 +0400

Revert "configs: temporary disable CEPH and XFS compilation"

This reverts commit 58208fc80ccc242fa0f6a633559ab33d0bbc4c55.

XFS and CEPH compilation is fixed now, so re-enable compilation of the modules.

https://jira.sw.ru/browse/PSBM-55577
https://jira.sw.ru/browse/PSBM-54817

Signed-off-by: Konstantin Khorenko 
---
 configs/kernel-3.10.0-x86_64-debug.config | 2 --
 configs/kernel-3.10.0-x86_64.config   | 2 --
 2 files changed, 4 deletions(-)

diff --git a/configs/kernel-3.10.0-x86_64-debug.config b/configs/kernel-3.10.0-x86_64-debug.config
index 6c07440..3191c05 100644
--- a/configs/kernel-3.10.0-x86_64-debug.config
+++ b/configs/kernel-3.10.0-x86_64-debug.config
@@ -5900,8 +5900,6 @@ CONFIG_FB_INTEL_I2C=y
 # CONFIG_USB_OTG_WHITELIST is not set
 # CONFIG_USB_OTG_BLACKLIST_HUB is not set
 # CONFIG_DISABLE_DEV_COREDUMP is not set
-# CONFIG_CEPH_FS is not set
-# CONFIG_XFS_FS is not set
 
 CONFIG_BCACHE=m
 # CONFIG_BCACHE_DEBUG is not set
diff --git a/configs/kernel-3.10.0-x86_64.config b/configs/kernel-3.10.0-x86_64.config
index a4bfa30..51289af 100644
--- a/configs/kernel-3.10.0-x86_64.config
+++ b/configs/kernel-3.10.0-x86_64.config
@@ -5873,8 +5873,6 @@ CONFIG_FB_INTEL_I2C=y
 # CONFIG_USB_OTG_WHITELIST is not set
 # CONFIG_USB_OTG_BLACKLIST_HUB is not set
 # CONFIG_DISABLE_DEV_COREDUMP is not set
-# CONFIG_CEPH_FS is not set
-# CONFIG_XFS_FS is not set
 
 CONFIG_BCACHE=m
 # CONFIG_BCACHE_DEBUG is not set
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RHEL7 COMMIT] ms/kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.9
-->
commit 7efd73d3548321581182145c349fadcce13ad00b
Author: Jim Mattson 
Date:   Wed Jan 11 18:56:16 2017 +0400

ms/kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)

When L2 exits to L0 due to "exception or NMI", software exceptions
(#BP and #OF) for which L1 has requested an intercept should be
handled by L1 rather than L0. Previously, only hardware exceptions
were forwarded to L1.
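
The change in the first filter can be sketched in userspace as follows. This is an illustration, not the KVM code: every `sk_`/`SK_` name is hypothetical, the constant values are assumed to mirror the VMX interruption-information layout (valid bit 31, type in bits 10:8, vector in bits 7:0) and should be checked against the kernel headers, and the page-fault special case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the interruption-information field. */
#define SK_VECTOR_MASK          0xffu
#define SK_INTR_TYPE_MASK       0x700u
#define SK_VALID_MASK           0x80000000u
#define SK_TYPE_NMI             (2u << 8)
#define SK_TYPE_HARD_EXCEPTION  (3u << 8)
#define SK_TYPE_SOFT_EXCEPTION  (6u << 8)

static bool sk_is_nmi(uint32_t intr_info)
{
	return (intr_info & (SK_INTR_TYPE_MASK | SK_VALID_MASK))
		== (SK_TYPE_NMI | SK_VALID_MASK);
}

static bool sk_is_hard_exception(uint32_t intr_info)
{
	return (intr_info & (SK_INTR_TYPE_MASK | SK_VALID_MASK))
		== (SK_TYPE_HARD_EXCEPTION | SK_VALID_MASK);
}

/* Did L1 ask to intercept this vector? */
static bool sk_bitmap_hit(uint32_t intr_info, uint32_t exception_bitmap)
{
	uint32_t vector = intr_info & SK_VECTOR_MASK;

	assert(vector < 32);        /* exception vectors only */
	return (exception_bitmap & (1u << vector)) != 0;
}

/* Old filter: anything that is not a hardware exception stays in L0. */
static bool sk_l1_wants_exit_old(uint32_t intr_info, uint32_t bitmap)
{
	if (!sk_is_hard_exception(intr_info))
		return false;
	return sk_bitmap_hit(intr_info, bitmap);
}

/* New filter: only NMIs stay in L0; software exceptions reach the bitmap. */
static bool sk_l1_wants_exit_new(uint32_t intr_info, uint32_t bitmap)
{
	if (sk_is_nmi(intr_info))
		return false;
	return sk_bitmap_hit(intr_info, bitmap);
}
```

Under these assumptions a #BP (vector 3, software exception) that L1 has requested in its bitmap is rejected by the old filter and forwarded by the new one, while NMIs are kept in L0 in both cases.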

Signed-off-by: Jim Mattson 
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini 

Backport of ms commit ef85b67385436ddc1998f45f1d6a210f935b3388
Fixes CVE-2016-9588
https://vulners.com/cve/CVE-2016-9588

https://jira.sw.ru/browse/PSBM-58194

Signed-off-by: Evgeny Yakovlev 
---
 arch/x86/kvm/vmx.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3d39923..8dea43b9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1243,10 +1243,10 @@ static inline bool nested_cpu_has_posted_intr(struct vmcs12 *vmcs12)
return vmcs12->pin_based_vm_exec_control & PIN_BASED_POSTED_INTR;
 }
 
-static inline bool is_exception(u32 intr_info)
+static inline bool is_nmi(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
-   == (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK);
+   == (INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK);
 }
 
 static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
@@ -5185,7 +5185,7 @@ static int handle_exception(struct kvm_vcpu *vcpu)
if (is_machine_check(intr_info))
return handle_machine_check(vcpu);
 
-   if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR)
+   if (is_nmi(intr_info))
return 1;  /* already handled by vmx_vcpu_run() */
 
if (is_no_device(intr_info)) {
@@ -7629,7 +7629,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 
switch (exit_reason) {
case EXIT_REASON_EXCEPTION_NMI:
-   if (!is_exception(intr_info))
+   if (is_nmi(intr_info))
return false;
else if (is_page_fault(intr_info))
return enable_ept;
@@ -8226,8 +8226,7 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
kvm_machine_check();
 
/* We need to handle NMIs before interrupts are enabled */
-   if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
-   (exit_intr_info & INTR_INFO_VALID_MASK)) {
+   if (is_nmi(exit_intr_info)) {
kvm_before_handle_nmi(&vmx->vcpu);
asm("int $2");
kvm_after_handle_nmi(&vmx->vcpu);


Re: [Devel] [PATCH RHEL7 COMMIT] ms/kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)

2017-01-11 Thread Konstantin Khorenko

Please consider creating a ReadyKernel live patch for this issue.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team



Re: [Devel] [PATCH RHEL7 COMMIT] ms/KVM: x86: drop error recovery in em_jmp_far and em_ret_far

2017-01-11 Thread Konstantin Khorenko

Please consider creating a ReadyKernel live patch for this issue.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team


[Devel] [PATCH RHEL7 COMMIT] ms/KVM: x86: drop error recovery in em_jmp_far and em_ret_far

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.9
-->
commit 55d41d1ae7e2e9bc4911e47615e7dbb67cbadb2b
Author: Radim Krčmář 
Date:   Wed Jan 11 18:56:17 2017 +0400

ms/KVM: x86: drop error recovery in em_jmp_far and em_ret_far

em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.

Found by syzkaller:

  WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   [...]
  Call Trace:
   [...] __dump_stack lib/dump_stack.c:15
   [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
   [...] panic+0x1b7/0x3a3 kernel/panic.c:179
   [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
   [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
   [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
   [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
   [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
   [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
   [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
   [...] complete_emulated_io arch/x86/kvm/x86.c:6870
   [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
   [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
   [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
   [...] vfs_ioctl fs/ioctl.c:43
   [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
   [...] SYSC_ioctl fs/ioctl.c:694
   [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
   [...] entry_SYSCALL_64_fastpath+0x1f/0xc2

Reported-by: Dmitry Vyukov 
Cc: sta...@vger.kernel.org
Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář 

Backport of ms commit 2117d5398c81554fbf803f5fd1dc55eb78216c0c
Fixes CVE-2016-9756
https://vulners.com/cve/CVE-2016-9756

https://jira.sw.ru/browse/PSBM-58195

Signed-off-by: Evgeny Yakovlev 
---
 arch/x86/kvm/emulate.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index bce2c74..f9da33c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2125,16 +2125,10 @@ static int em_iret(struct x86_emulate_ctxt *ctxt)
 static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
 {
int rc;
-   unsigned short sel, old_sel;
-   struct desc_struct old_desc, new_desc;
-   const struct x86_emulate_ops *ops = ctxt->ops;
+   unsigned short sel;
+   struct desc_struct new_desc;
u8 cpl = ctxt->ops->cpl(ctxt);
 
-   /* Assignment of RIP may only fail in 64-bit mode */
-   if (ctxt->mode == X86EMUL_MODE_PROT64)
-   ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
-VCPU_SREG_CS);
-
memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
 
rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl,
@@ -2144,12 +2138,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
return rc;
 
rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
-   if (rc != X86EMUL_CONTINUE) {
-   WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
-   /* assigning eip failed; restore the old cs */
-   ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
-   return rc;
-   }
+   /* Error handling is not implemented. */
+   if (rc != X86EMUL_CONTINUE)
+   return X86EMUL_UNHANDLEABLE;
+
return rc;
 }
 
@@ -2209,14 +2201,8 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 {
int rc;
unsigned long eip, cs;
-   u16 old_cs;
int cpl = ctxt->ops->cpl(ctxt);
-   struct desc_struct old_desc, new_desc;
-   const struct x86_emulate_ops *ops = ctxt->ops;
-
-   if (ctxt->mode == X86EMUL_MODE_PROT64)
-   ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
-VCPU_SREG_CS);
+   struct desc_struct new_desc;
 
rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
@@ -2233,10 +2219,10 @
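
The failure mode from the commit message (CS saved only in 64-bit mode, but restored on every failure) can be modeled in plain userspace C. This is a hedged sketch under stated assumptions, not the emulator code: `struct model_ctxt`, `far_jump_old()` and `far_jump_new()` are hypothetical names, the return codes are assumed to mirror the emulator's constants, and a fixed placeholder value stands in for the uninitialized stack contents so that the model itself has defined behavior.

```c
#include <assert.h>

/* Assumed to mirror the emulator's return codes; values are for the model only. */
enum { X86EMUL_CONTINUE = 0, X86EMUL_UNHANDLEABLE = 1, X86EMUL_PROPAGATE_FAULT = 2 };

enum model_mode { MODEL_MODE_PROT32, MODEL_MODE_PROT64 };

struct model_ctxt {
	enum model_mode mode;
	unsigned short cs_sel;   /* currently loaded CS selector */
	int eip_assign_fails;    /* forces the assign_eip_far() step to fail */
};

#define STACK_GARBAGE 0xdead     /* placeholder for never-initialized old_sel */

/* Pre-patch shape: old CS is captured only in 64-bit mode but restored always. */
static int far_jump_old(struct model_ctxt *c, unsigned short new_sel)
{
	unsigned short old_sel = STACK_GARBAGE;

	if (c->mode == MODEL_MODE_PROT64)
		old_sel = c->cs_sel;

	c->cs_sel = new_sel;                 /* new descriptor is loaded first */
	if (c->eip_assign_fails) {
		c->cs_sel = old_sel;         /* outside long mode this is garbage */
		return X86EMUL_PROPAGATE_FAULT;
	}
	return X86EMUL_CONTINUE;
}

/* Post-patch shape: no recovery is attempted, the failure is reported as unhandleable. */
static int far_jump_new(struct model_ctxt *c, unsigned short new_sel)
{
	c->cs_sel = new_sel;
	if (c->eip_assign_fails)
		return X86EMUL_UNHANDLEABLE;
	return X86EMUL_CONTINUE;
}
```

In the model, a failing far jump outside 64-bit mode leaves `cs_sel` set to the placeholder in the old variant, which corresponds to the stack leak the patch closes; the new variant never writes a saved value back.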

Re: [Devel] [PATCH RHEL7 COMMIT] ms/kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)

2017-01-11 Thread Vasily Averin
It requires nested virtualization to be enabled.
That is not supported yet, so there is no sense in fixing it in ReadyKernel.

On 2017-01-11 18:00, Konstantin Khorenko wrote:
> Please consider creating a ReadyKernel live patch for this issue.
> 
> -- 
> Best regards,
> 
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
> 


Re: [Devel] [PATCH RHEL7 COMMIT] ms/KVM: x86: drop error recovery in em_jmp_far and em_ret_far

2017-01-11 Thread Vasily Averin
According to Red Hat, this issue has low impact.
I do not see anything Virtuozzo-specific here, and it isn't critical for us,
so we're not going to force a fix and will wait until it is fixed in RHEL7
kernels.

On 2017-01-11 18:01, Konstantin Khorenko wrote:
> Please consider creating a ReadyKernel live patch for this issue.
> 
> -- 
> Best regards,
> 
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
> 

[Devel] [PATCH RHEL7 COMMIT] ve/fs/fadvise: introduce FADV_DEACTIVATE flag

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.9
-->
commit 8d30c1ed7eb9c3174eec56a5d0dc29316fd86a39
Author: Andrey Ryabinin 
Date:   Wed Jan 11 19:30:03 2017 +0400

ve/fs/fadvise: introduce FADV_DEACTIVATE flag

FADV_DEACTIVATE advises kernel to move file pages from active to
inactive list.

This allows Chunk Servers (CS) to mark particular page cache parts to be
reclaimed in the first turn.

https://jira.sw.ru/browse/PSBM-57915

Signed-off-by: Andrey Ryabinin 
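
From userspace the new hint would presumably be passed through posix_fadvise(). The following is a minimal sketch under that assumption: the value 32 is taken from the uapi hunk of this patch, while `deactivate_file_range()` and `deactivate_is_fatal()` are hypothetical helper names. A kernel without this patch is expected to reject the unknown advice with EINVAL, so the sketch treats that case as "hint unsupported" rather than as a failure.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* makes posix_fadvise() visible */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

#ifndef FADV_DEACTIVATE
#define FADV_DEACTIVATE 32       /* value from the patch's fadvise.h hunk */
#endif

/*
 * Hypothetical helper: ask the kernel to move the cached pages of
 * [offset, offset + len) to the inactive list. posix_fadvise() returns
 * the error number directly and does not set errno.
 */
static int deactivate_file_range(int fd, off_t offset, off_t len)
{
	return posix_fadvise(fd, offset, len, FADV_DEACTIVATE);
}

/* Hypothetical policy: an unpatched kernel answering EINVAL is not fatal. */
static int deactivate_is_fatal(int err)
{
	return err != 0 && err != EINVAL;
}
```

A caller such as a chunk server could invoke `deactivate_file_range(fd, off, len)` after finishing with a region and log the result only when `deactivate_is_fatal()` reports true.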
---
 include/uapi/linux/fadvise.h |  1 +
 mm/fadvise.c | 43 +++
 2 files changed, 44 insertions(+)

diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h
index a3e0703..b6ade7e 100644
--- a/include/uapi/linux/fadvise.h
+++ b/include/uapi/linux/fadvise.h
@@ -17,6 +17,7 @@
 #define POSIX_FADV_DONTNEED 4 /* Don't need these pages.  */
 #define POSIX_FADV_NOREUSE 5 /* Data will be accessed once.  */
 #endif
+#define FADV_DEACTIVATE 32 /* Mark pages as good candidates for reclaim */
 
 #ifdef __KERNEL__
 extern int generic_fadvise(struct file* file, loff_t off, loff_t len, int adv);
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 0b25007..50beef3 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -22,6 +22,43 @@
 
 #include 
 
+static void fadvise_deactivate(struct address_space *mapping,
+   pgoff_t start, pgoff_t end)
+{
+   struct pagevec pvec;
+   pgoff_t index = start;
+   int i;
+
+   if (start > end)
+   return;
+
+   /*
+* Note: this function may get called on a shmem/tmpfs mapping:
+* pagevec_lookup() might then return 0 prematurely (because it
+* got a gangful of swap entries); but it's hardly worth worrying
+* about - it can rarely have anything to free from such a mapping
+* (most pages are dirty), and already skips over any difficulties.
+*/
+
+   pagevec_init(&pvec, 0);
+   while (index <= end && pagevec_lookup(&pvec, mapping, index,
+   min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+   for (i = 0; i < pagevec_count(&pvec); i++) {
+   struct page *page = pvec.pages[i];
+
+   /* We rely upon deletion not changing page->index */
+   index = page->index;
+   if (index > end)
+   break;
+
+   deactivate_page(page);
+   }
+   pagevec_release(&pvec);
+   cond_resched();
+   index++;
+   }
+}
+
 /*
  * POSIX_FADV_WILLNEED could set PG_Referenced, and POSIX_FADV_NOREUSE could
  * deactivate the pages and clear PG_Referenced.
@@ -47,6 +84,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
case POSIX_FADV_WILLNEED:
case POSIX_FADV_NOREUSE:
case POSIX_FADV_DONTNEED:
+   case FADV_DEACTIVATE:
/* no bad return value, but ignore advice */
break;
default:
@@ -127,6 +165,11 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
}
}
break;
+   case FADV_DEACTIVATE:
+   start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT;
+   end_index = (endbyte >> PAGE_CACHE_SHIFT);
+   fadvise_deactivate(mapping, start_index, end_index);
+   break;
default:
ret = -EINVAL;
}
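
The index arithmetic in the FADV_DEACTIVATE case rounds the start offset up to a page boundary and shifts the end byte down, so a range that does not reach past its first partial page yields start > end and fadvise_deactivate() returns without touching anything. A small standalone sketch of that arithmetic follows, assuming a 4 KiB page and assuming `endbyte` is the inclusive last byte of the range; the `MODEL_*` constants, `struct page_span` and both functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u                              /* assumed 4 KiB pages */
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)

struct page_span {
	uint64_t start_index;  /* first page index handed to fadvise_deactivate() */
	uint64_t end_index;    /* last page index handed to fadvise_deactivate() */
};

/* Same expressions as in the patch, with the page constants replaced. */
static struct page_span deactivate_span(uint64_t offset, uint64_t endbyte)
{
	struct page_span s;

	s.start_index = (offset + (MODEL_PAGE_SIZE - 1)) >> MODEL_PAGE_SHIFT;
	s.end_index = endbyte >> MODEL_PAGE_SHIFT;
	return s;
}

/* Mirrors the "if (start > end) return;" guard in fadvise_deactivate(). */
static bool span_is_empty(struct page_span s)
{
	return s.start_index > s.end_index;
}
```

For example, offset 100 with end byte 8191 gives the single page index 1, and offset 4097 with end byte 8000 gives an empty span.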


[Devel] [PATCH libvzctl] scritps: vz-rst-action.in -- Fix typo in error message

2017-01-11 Thread Cyrill Gorcunov
If the restore fails, it is better to have a precise error string.

Signed-off-by: Cyrill Gorcunov 
---
Igor, please don't force a new libvzctl build solely
because of this patch; it's just a typo and can wait.

 scripts/vz-rst-action.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/vz-rst-action.in b/scripts/vz-rst-action.in
index 0afbe11..77442f8 100755
--- a/scripts/vz-rst-action.in
+++ b/scripts/vz-rst-action.in
@@ -105,11 +105,11 @@ case "$CRTOOLS_SCRIPT_ACTION" in
[ -f "$CRTOOLS_IMAGE_DIR/vz_memory_limit_in_bytes.img" ] && \
{ echo `cat $CRTOOLS_IMAGE_DIR/vz_memory_limit_in_bytes.img` > \
/sys/fs/cgroup/memory/machine.slice/$VEID/memory.limit_in_bytes || \
-   { echo "Failed to restore core_pattern"; exit 1; } }
+   { echo "Failed to restore memory.limit_in_bytes"; exit 1; } }
[ -f "$CRTOOLS_IMAGE_DIR/vz_memory_memsw_limit_in_bytes.img" ] && \
{ echo `cat $CRTOOLS_IMAGE_DIR/vz_memory_memsw_limit_in_bytes.img` > \
/sys/fs/cgroup/memory/machine.slice/$VEID/memory.memsw.limit_in_bytes || \
-   { echo "Failed to restore core_pattern"; exit 1; } }
+   { echo "Failed to restore memory.memsw.limit_in_bytes"; exit 1; } }
fi
;;
 "post-restore")
-- 
2.7.4



[Devel] [PATCH RHEL7 COMMIT] ms/net: fix creation adjacent device symlinks

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.9
-->
commit 86ebd49b6913ee3e9e4e66220e350b6982c985e5
Author: Alexander Y. Fomichev 
Date:   Wed Jan 11 20:19:36 2017 +0400

ms/net: fix creation adjacent device symlinks

__netdev_adjacent_dev_insert may add an adjacent device from a different net
namespace; without a proper check this leads to broken
sysfs links from/to devices in another namespace.
Fix: rewrite netdev_adjacent_is_neigh_list macro as a function,
 move net_eq check into netdev_adjacent_is_neigh_list.
 (thanks David)
 related to: 4c75431ac3520631f1d9e74aa88407e6374dbbc4

Signed-off-by: Alexander Fomichev 
Signed-off-by: David S. Miller 

ms commit: 7ce64c7 ("net: fix creation adjacent device symlinks")
https://jira.sw.ru/browse/PSBM-58300

Signed-off-by: Pavel Tikhomirov 
Acked-by: Andrew Vagin 
---
 net/core/dev.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b1a183d..65f68fd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5182,9 +5182,14 @@ void netdev_adjacent_sysfs_del(struct net_device *dev,
sysfs_remove_link(&(dev->dev.kobj), linkname);
 }
 
-#define netdev_adjacent_is_neigh_list(dev, dev_list) \
-   (dev_list == &dev->adj_list.upper || \
-dev_list == &dev->adj_list.lower)
+static inline bool netdev_adjacent_is_neigh_list(struct net_device *dev,
+struct net_device *adj_dev,
+struct list_head *dev_list)
+{
+   return (dev_list == &dev->adj_list.upper ||
+   dev_list == &dev->adj_list.lower) &&
+   net_eq(dev_net(dev), dev_net(adj_dev));
+}
 
 static int __netdev_adjacent_dev_insert(struct net_device *dev,
struct net_device *adj_dev,
@@ -5214,7 +5219,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
pr_debug("dev_hold for %s, because of link added from %s to %s\n",
 adj_dev->name, dev->name, adj_dev->name);
 
-   if (netdev_adjacent_is_neigh_list(dev, dev_list)) {
+   if (netdev_adjacent_is_neigh_list(dev, adj_dev, dev_list)) {
ret = netdev_adjacent_sysfs_add(dev, adj_dev, dev_list);
if (ret)
goto free_adj;
@@ -5235,7 +5240,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
return 0;
 
 remove_symlinks:
-   if (netdev_adjacent_is_neigh_list(dev, dev_list))
+   if (netdev_adjacent_is_neigh_list(dev, adj_dev, dev_list))
netdev_adjacent_sysfs_del(dev, adj_dev->name, dev_list);
 free_adj:
kfree(adj);
@@ -5267,8 +5272,7 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
if (adj->master)
sysfs_remove_link(&(dev->dev.kobj), "master");
 
-   if (netdev_adjacent_is_neigh_list(dev, dev_list) &&
-   net_eq(dev_net(dev),dev_net(adj_dev)))
+   if (netdev_adjacent_is_neigh_list(dev, adj_dev, dev_list))
netdev_adjacent_sysfs_del(dev, adj_dev->name, dev_list);
 
list_del_rcu(&adj->list);
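
The rewritten predicate combines two conditions: the list must be one of the device's direct-neighbour lists, and both devices must live in the same net namespace. Here is a userspace model of that logic, offered as a sketch only: all `model_*` types and names are hypothetical stand-ins for struct net, struct list_head and struct net_device, and a plain pointer comparison stands in for net_eq(dev_net(dev), dev_net(adj_dev)).

```c
#include <assert.h>
#include <stdbool.h>

struct model_net { int id; };                          /* stand-in for struct net */
struct model_list { struct model_list *next, *prev; }; /* stand-in for list_head */

struct model_adj_lists {
	struct model_list upper;
	struct model_list lower;
};

struct model_netdev {
	const struct model_net *net;          /* namespace the device lives in */
	struct model_adj_lists adj_list;      /* direct neighbours */
	struct model_adj_lists all_adj_list;  /* all upper/lower devices */
};

/* Sysfs links are wanted only for direct neighbours in the same namespace. */
static bool model_is_neigh_list(const struct model_netdev *dev,
				const struct model_netdev *adj_dev,
				const struct model_list *dev_list)
{
	return (dev_list == &dev->adj_list.upper ||
		dev_list == &dev->adj_list.lower) &&
		dev->net == adj_dev->net;
}
```

Because the namespace test now sits inside the predicate, the insert path, its error unwind and the remove path all agree on when a symlink exists, which is the consistency the three call-site changes in the diff rely on.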


[Devel] [PATCH RHEL7 COMMIT] ms/macvlan: unregister net device when netdev_upper_dev_link() fails

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.9
-->
commit 5fdf0a73250e93877a68d57948f4237134d0d522
Author: Cong Wang 
Date:   Wed Jan 11 20:19:37 2017 +0400

ms/macvlan: unregister net device when netdev_upper_dev_link() fails

rtnl_newlink() doesn't unregister it for us on failure.

Cc: Patrick McHardy 
Cc: David S. Miller 
Signed-off-by: Cong Wang 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller 

ms commit: da37705 ("macvlan: unregister net device when
netdev_upper_dev_link() fails")

https://jira.sw.ru/browse/PSBM-58300

Signed-off-by: Pavel Tikhomirov 
Acked-by: Andrew Vagin 
---
 drivers/net/macvlan.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a6bbb1d..808cf38 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -966,8 +966,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 
err = netdev_upper_dev_link(lowerdev, dev);
if (err)
-   goto destroy_port;
-
+   goto unregister_netdev;
 
dev->priv_flags |= IFF_MACVLAN;
list_add_tail_rcu(&vlan->list, &port->vlans);
@@ -975,6 +974,8 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 
return 0;
 
+unregister_netdev:
+   unregister_netdevice(dev);
 destroy_port:
port->count -= 1;
if (!port->count)
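
The fix follows the usual stacked-label unwind pattern: each error label undoes one step and falls through to the labels for the earlier steps. Below is a simplified userspace model of that control flow, assuming the order "take port reference, register device, link to lower device"; `struct model_port`, `struct model_dev` and `model_newlink()` are hypothetical, and the two error parameters merely inject failures at the corresponding steps.

```c
#include <assert.h>

struct model_port { int count; int destroyed; };
struct model_dev  { int registered; };

/* register_err and link_err are the injected results of the two fallible steps. */
static int model_newlink(struct model_port *port, struct model_dev *dev,
			 int register_err, int link_err)
{
	int err;

	port->count += 1;

	err = register_err;            /* models register_netdevice() */
	if (err)
		goto destroy_port;
	dev->registered = 1;

	err = link_err;                /* models netdev_upper_dev_link() */
	if (err)
		goto unregister_netdev;

	return 0;

unregister_netdev:
	dev->registered = 0;           /* the step the patch adds */
destroy_port:
	port->count -= 1;
	if (!port->count)
		port->destroyed = 1;
	return err;
}
```

Jumping straight to `destroy_port` after a link failure, as the pre-patch code did, would leave `dev->registered` set in this model, which corresponds to the half-created macvlan device that later crashed on removal.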


[Devel] [PATCH RHEL7 COMMIT] ms/net: prevent of emerging cross-namespace symlinks

2017-01-11 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at 
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.9
-->
commit 53259712b96f1c8cfdb17096e79f26e007bef3a7
Author: Alexander Y. Fomichev 
Date:   Wed Jan 11 20:19:35 2017 +0400

ms/net: prevent of emerging cross-namespace symlinks

Patchset description:
macvlan: fix crash on list_del_rcu in macvlan_dellink

This fixes a problem where the criu zdtm macvlan test, run on the host, crashes VZ7.
Note: macvlans are prohibited inside vz7 Containers, so a Container owner/user
cannot crash the node this way.

1 - Remove cross-namespace upper_/lower_ symlinks in sysfs
when the upper device is moved to another netns; these symlinks prevented
creation (with a warning) of another upper dev with the same name on the
same lower dev in the initial netns.

2 - Fix for 1

3 - Fix partial macvlan device creation when netdev_upper_dev_link
fails; removing such a device causes a crash.
Crash:

[43183.592029] [ cut here ]
[43183.592057] WARNING: at fs/sysfs/dir.c:560 sysfs_add_one+0xa5/0xd0()
[43183.592059] sysfs: cannot create duplicate filename 
'/devices/virtual/net/zdtmbr0/upper_zdtmmvlan0'
...
[43183.657255] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[43183.657285] IP: [] __list_del_entry+0x29/0xd0
[43183.657313] PGD 147afb067 PUD 1466d8067 PMD 0
[43183.657330] Oops:  [#1] SMP
[43183.657344] Modules linked in: xt_mark macvlan nf_conntrack_netlink 
nfnetlink udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag 
binfmt_misc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM 
iptable_mangle ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 tun 8021q 
garp mrp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
intel_powerclamp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw 
gf128mul glue_helper ablk_helper cryptd sg sbs ppdev sbshc pcspkr 
virtio_balloon parport_pc parport shpchp lpc_ich veth overlay ip6_vzprivnet 
ip6_vznetstat pio_kaio pio_nfs nfsd auth_rpcgss nfs_acl lockd grace sunrpc 
pio_direct pfmt_raw pfmt_ploop1 ploop ip_vznetstat ip_vzprivnet vziolimit
[43183.657631]  vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge 
stp llc ip_tables ext4 mbcache jbd2 sd_mod sr_mod crc_t10dif cdrom 
crct10dif_generic ata_generic pata_acpi crct10dif_pclmul crct10dif_common ahci 
ata_piix crc32c_intel libahci libata serio_raw virtio_pci virtio_ring e1000 
virtio fjes floppy dm_mirror dm_region_hash dm_log dm_mod
[43183.657807] CPU: 0 PID: 332 Comm: kworker/u64:7 ve: 0 Tainted: G
W     3.10.0-514.vz7.27.5 #1 27.5
[43183.657823] Hardware name: Parallels Software International Inc. 
Parallels Virtual Platform/Parallels Virtual Platform, BIOS 6.10.24198.1226784 
12/09/2015
[43183.657842] Workqueue: netns cleanup_net
[43183.657858] task: 880144c8d050 ti: 880144d0 task.ti: 
880144d0
[43183.657871] RIP: 0010:[]  [] 
__list_del_entry+0x29/0xd0
[43183.657889] RSP: 0018:880144d03ce0  EFLAGS: 00010207
[43183.657898] RAX:  RBX: 880012598000 RCX: 
dead0200
[43183.657906] RDX:  RSI: 880144d03d10 RDI: 
8800125988c8
[43183.657915] RBP: 880144d03ce0 R08: 880144d03d38 R09: 

[43183.657926] R10: 0001 R11:  R12: 
880144d03d10
[43183.657935] R13: 88009fe22848 R14: 880144d03d10 R15: 
88009fe22780
[43183.657946] FS:  () GS:88014ae0() 
knlGS:
[43183.657959] CS:  0010 DS:  ES:  CR0: 80050033
[43183.657967] CR2:  CR3: 0001459f5000 CR4: 
000406f0
[43183.657975] DR0: 00010140 DR1:  DR2: 

[43183.657983] DR3:  DR6: 0ff0 DR7: 
0600
[43183.657990] Stack:
[43183.657997]  880144d03d00 a05368ce 880012598000 
880144d03dd8
[43183.658023]  880144d03d78 8156fb02 880144d03d10 
880144d03d10
[43183.658046]   880144c8d050 810b3080 
880144d03d38
[43183.658069] Call Trace:
[43183.658081]  [] macvlan_dellink+0x1e/0x50 [macvlan]
[43183.658093]  [] default_device_exit_batch+0x102/0x190
[43183.658108]  [] ? wake_up_atomic_t+0x30/0x30
[43183.658118]  [] ops_exit_list.isra.5+0x53/0x60
[43183.658127]  [] cleanup_net+0x260/0x480
[43183.658142]  [] process_one_work+0x17b/0x470
[43183.658151]  [] worker_thread+0x126/0x410
[43183.658160]  [] ? rescuer_thread+0x460/0x460
[43183.658171]  [] kthread+0xcf/0xe0
[43183.658181]  [] ? create_kthread+0x60/0x60
[43183.658193]  [] ret_fro