[Devel] [PATCH rh7 4.1/8] ms/mm/rmap: share the i_mmap_rwsem fix

2020-12-01 Thread Andrey Ryabinin
Use down_read_nested() to avoid a lockdep complaint.

Signed-off-by: Andrey Ryabinin 
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 523957450d20..90cf61e209ac 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1724,7 +1724,7 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
return ret;
pgoff = page_to_pgoff(page);
 
-   i_mmap_lock_read(mapping);
+   down_read_nested(&mapping->i_mmap_rwsem, SINGLE_DEPTH_NESTING);
vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
unsigned long address = vma_address(page, vma);
 
-- 
2.26.2

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RH7] mm: Fix race between reparenting memcg and list_lru_del()

2020-12-01 Thread Kirill Tkhai
From: Roman Gushchin 

On reparenting, struct list_lru_one::nr_items may become
negative, so the shrinker-bit logic does not work as expected.

This leads to a cleared shrinker bit while the LRU is not
actually empty.

(We will pull the description from ms git later, when it is available.)

https://lkml.org/lkml/2020/11/30/1093

Signed-off-by: Kirill Tkhai 
---
 mm/list_lru.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 21e12a8364ff..05d517197fbe 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -511,7 +511,6 @@ static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
struct list_lru_node *nlru = &lru->node[nid];
int dst_idx = memcg_cache_id(dst_memcg);
struct list_lru_one *src, *dst;
-   bool set;
 
/*
 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
@@ -523,9 +522,8 @@ static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
dst = list_lru_from_memcg_idx(nlru, dst_idx);
 
list_splice_init(&src->list, &dst->list);
-   set = (!dst->nr_items && src->nr_items);
dst->nr_items += src->nr_items;
-   if (set)
+   if (src->nr_items)
memcg_set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
src->nr_items = 0;
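
The changed condition can be sketched in user space as follows. This is
only a simplified model, not the kernel code: struct lru_one, drain_old()
and drain_new() are hypothetical stand-ins for struct list_lru_one and the
old and new bodies of memcg_drain_list_lru_node(), the shrinker bit is a
plain bool, and the list splice is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct list_lru_one: only the counter is modelled. */
struct lru_one {
	long nr_items;
};

/* Old logic: set the bit only if dst looked empty before the merge. */
static bool drain_old(struct lru_one *src, struct lru_one *dst, bool bit)
{
	bool set = (!dst->nr_items && src->nr_items);

	dst->nr_items += src->nr_items;
	if (set)
		bit = true;
	src->nr_items = 0;
	return bit;
}

/* New logic: set the bit whenever src contributed any items. */
static bool drain_new(struct lru_one *src, struct lru_one *dst, bool bit)
{
	dst->nr_items += src->nr_items;
	if (src->nr_items)
		bit = true;
	src->nr_items = 0;
	return bit;
}
```

With an assumed starting state of dst->nr_items == -1 and a cleared bit,
draining three items through drain_old() leaves the bit cleared although
dst now counts two items, while drain_new() sets it.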
 



[Devel] [PATCH RH7 6/6] ms/netfilter: ipset: Fix calling ip_set() macro at dumping

2020-12-01 Thread Vasily Averin
From: Jozsef Kadlecsik 

The ip_set() macro is called at dumping either when only ip_set_ref_lock
is held or when neither that lock nor the nfnl mutex is held. Take this
into account properly. Also, use Pablo's suggestion to use
rcu_dereference_raw(), because the ref_netlink protects the set.

Signed-off-by: Jozsef Kadlecsik 
Signed-off-by: Pablo Neira Ayuso 

(cherry-picked from commit 8a02bdd50b2ecb6d62121d2958d3ea186cc88ce7)
https://jira.sw.ru/browse/PSBM-122965
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_core.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index a22af3e..b067879 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -55,11 +55,15 @@ MODULE_AUTHOR("Jozsef Kadlecsik 
");
 MODULE_DESCRIPTION("core IP set support");
 MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET);
 
-/* When the nfnl mutex is held: */
+/* When the nfnl mutex or ip_set_ref_lock is held: */
 #define ip_set_dereference(p)  \
-   rcu_dereference_protected(p, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET))
+   rcu_dereference_protected(p,\
+   lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET) || \
+   lockdep_is_held(&ip_set_ref_lock))
 #define ip_set(inst, id)   \
ip_set_dereference((inst)->ip_set_list)[id]
+#define ip_set_ref_netlink(inst,id)\
+   rcu_dereference_raw((inst)->ip_set_list)[id]
 
 /* The set types are implemented in modules and registered set types
  * can be found in ip_set_type_list. Adding/deleting types is
@@ -1261,7 +1265,7 @@ ip_set_dump_done(struct netlink_callback *cb)
struct ip_set_net *inst =
(struct ip_set_net *)cb->args[IPSET_CB_NET];
ip_set_id_t index = (ip_set_id_t)cb->args[IPSET_CB_INDEX];
-   struct ip_set *set = ip_set(inst, index);
+   struct ip_set *set = ip_set_ref_netlink(inst, index);
 
if (set->variant->uref)
set->variant->uref(set, cb, false);
@@ -1457,7 +1461,7 @@ ip_set_dump_do(struct sk_buff *skb, struct netlink_callback *cb)
 release_refcount:
/* If there was an error or set is done, release set */
if (ret || !cb->args[IPSET_CB_ARG0]) {
-   set = ip_set(inst, index);
+   set = ip_set_ref_netlink(inst, index);
if (set->variant->uref)
set->variant->uref(set, cb, false);
pr_debug("release set %s\n", set->name);
-- 
1.8.3.1


[Devel] [PATCH RH7 5/6] ms/netfilter: ipset: fix suspicious RCU usage in find_set_and_id

2020-12-01 Thread Vasily Averin
From: Kadlecsik József 

find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held.
However, in the error path there can be a follow-up recvmsg() without
the mutex held. Use the start() function of struct netlink_dump_control
instead of dump() to verify and report if the specified set does not
exist.

Thanks to Pablo Neira Ayuso for helping me to understand the subtleties
of the netlink protocol.

Reported-by: syzbot+fc69d7cb21258ab4a...@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik 
Signed-off-by: Pablo Neira Ayuso 

(cherry-picked from commit 5038517119d50ed0240059b1d7fc2faa92371c08)
https://jira.sw.ru/browse/PSBM-122965
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_core.c | 41 ---
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 0a53827..a22af3e 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1284,30 +1284,33 @@ dump_attrs(struct nlmsghdr *nlh)
 }
 
 static int
-dump_init(struct netlink_callback *cb, struct ip_set_net *inst)
+ip_set_dump_start(struct netlink_callback *cb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(cb->skb);
int min_len = nlmsg_total_size(sizeof(struct nfgenmsg));
struct nlattr *cda[IPSET_ATTR_CMD_MAX + 1];
struct nlattr *attr = (void *)nlh + min_len;
+   struct sk_buff *skb = cb->skb;
+   struct ip_set_net *inst = ip_set_pernet(sock_net(skb->sk));
u32 dump_type;
-   ip_set_id_t index;
int ret;
 
ret = nla_parse(cda, IPSET_ATTR_CMD_MAX,
  attr, nlh->nlmsg_len - min_len, ip_set_setname_policy);
if (ret)
-   return ret;
+   goto error;
 
cb->args[IPSET_CB_PROTO] = nla_get_u8(cda[IPSET_ATTR_PROTOCOL]);
if (cda[IPSET_ATTR_SETNAME]) {
+   ip_set_id_t index;
struct ip_set *set;
 
set = find_set_and_id(inst, nla_data(cda[IPSET_ATTR_SETNAME]),
  &index);
-   if (!set)
-   return -ENOENT;
-
+   if (!set) {
+   ret = -ENOENT;
+   goto error;
+   }
dump_type = DUMP_ONE;
cb->args[IPSET_CB_INDEX] = index;
} else {
@@ -1323,10 +1326,17 @@ dump_init(struct netlink_callback *cb, struct ip_set_net *inst)
cb->args[IPSET_CB_DUMP] = dump_type;
 
return 0;
+
+error:
+   /* We have to create and send the error message manually :-( */
+   if (nlh->nlmsg_flags & NLM_F_ACK) {
+   netlink_ack(cb->skb, nlh, ret);
+   }
+   return ret;
 }
 
 static int
-ip_set_dump_start(struct sk_buff *skb, struct netlink_callback *cb)
+ip_set_dump_do(struct sk_buff *skb, struct netlink_callback *cb)
 {
ip_set_id_t index = IPSET_INVALID_ID, max;
struct ip_set *set = NULL;
@@ -1337,18 +1347,8 @@ ip_set_dump_start(struct sk_buff *skb, struct netlink_callback *cb)
bool is_destroyed;
int ret = 0;
 
-   if (!cb->args[IPSET_CB_DUMP]) {
-   ret = dump_init(cb, inst);
-   if (ret < 0) {
-   nlh = nlmsg_hdr(cb->skb);
-   /* We have to create and send the error message
-* manually :-(
-*/
-   if (nlh->nlmsg_flags & NLM_F_ACK)
-   netlink_ack(cb->skb, nlh, ret);
-   return ret;
-   }
-   }
+   if (!cb->args[IPSET_CB_DUMP])
+   return -EINVAL;
 
if (cb->args[IPSET_CB_INDEX] >= inst->ip_set_max)
goto out;
@@ -1484,7 +1484,8 @@ ip_set_dump(struct sock *ctnl, struct sk_buff *skb,
 
{
struct netlink_dump_control c = {
-   .dump = ip_set_dump_start,
+   .start = ip_set_dump_start,
+   .dump = ip_set_dump_do,
.done = ip_set_dump_done,
};
return netlink_dump_start(ctnl, skb, nlh, &c);
-- 
1.8.3.1


[Devel] [PATCH RH7 2/6] ms/netfilter: ipset: fix a missing check of nla_parse

2020-12-01 Thread Vasily Averin
From: Aditya Pakki 

When nla_parse fails, we should not use the results (the first
argument). The fix checks if it fails, and if so, returns its error code
upstream.

Signed-off-by: Aditya Pakki 
Signed-off-by: Jozsef Kadlecsik 

(cherry-picked from commit f4f5748bfec94cf418e49bf05f0c81a1b9ebc95)
VvS: replaced original nla_parse_deprecated() by nla_parse()
https://jira.sw.ru/browse/PSBM-122965
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_core.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 6ef5898..d5344e5 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1542,10 +1542,14 @@ call_ad(struct sock *ctnl, struct sk_buff *skb, struct ip_set *set,
memcpy(&errmsg->msg, nlh, nlh->nlmsg_len);
cmdattr = (void *)&errmsg->msg + min_len;
 
-   nla_parse(cda, IPSET_ATTR_CMD_MAX,
+   ret = nla_parse(cda, IPSET_ATTR_CMD_MAX,
  cmdattr, nlh->nlmsg_len - min_len,
  ip_set_adt_policy);
 
+   if (ret) {
+   nlmsg_free(skb2);
+   return ret;
+   }
errline = nla_data(cda[IPSET_ATTR_LINENO]);
 
*errline = lineno;
-- 
1.8.3.1


[Devel] [PATCH RH7 4/6] ms/netlink: add a start callback for starting a netlink dump

2020-12-01 Thread Vasily Averin
From: Tom Herbert 

The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert 
Signed-off-by: David S. Miller 

(cherry-picked from commit fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06)
https://jira.sw.ru/browse/PSBM-122965
Signed-off-by: Vasily Averin 
---
 include/linux/netlink.h  |  2 ++
 include/net/genetlink.h  |  2 ++
 net/netlink/af_netlink.c |  4 
 net/netlink/genetlink.c  | 16 
 4 files changed, 24 insertions(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index a35a751..813f623 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -136,6 +136,7 @@ netlink_skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 struct netlink_callback {
struct sk_buff  *skb;
const struct nlmsghdr   *nlh;
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff * skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
@@ -158,6 +159,7 @@ struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int flags);
 
 struct netlink_dump_control {
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *skb, struct netlink_callback *);
int (*done)(struct netlink_callback *);
void *data;
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index e33a65d..c867ebd 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -116,6 +116,7 @@ static inline void genl_info_net_set(struct genl_info *info, struct net *net)
  * @flags: flags
  * @policy: attribute validation policy
  * @doit: standard command callback
+ * @start: start callback for dumps
  * @dumpit: callback for dumpers
  * @done: completion callback for dumps
  * @ops_list: operations list
@@ -124,6 +125,7 @@ struct genl_ops {
const struct nla_policy *policy;
int(*doit)(struct sk_buff *skb,
   struct genl_info *info);
+   int(*start)(struct netlink_callback *cb);
int(*dumpit)(struct sk_buff *skb,
 struct netlink_callback *cb);
int(*done)(struct netlink_callback *cb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index eb2b5de..d1db782c 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2195,6 +2195,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
cb = &nlk->cb;
memset(cb, 0, sizeof(*cb));
+   cb->start = control->start;
cb->dump = control->dump;
cb->done = control->done;
cb->nlh = nlh;
@@ -2207,6 +2208,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
mutex_unlock(nlk->cb_mutex);
 
+   if (cb->start)
+   cb->start(cb);
+
ret = netlink_dump(sk);
sock_put(sk);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 87b4a12..5fb5884 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -472,6 +472,20 @@ void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 }
 EXPORT_SYMBOL(genlmsg_put);
 
+static int genl_lock_start(struct netlink_callback *cb)
+{
+   /* our ops are always const - netlink API doesn't propagate that */
+   const struct genl_ops *ops = cb->data;
+   int rc = 0;
+
+   if (ops->start) {
+   genl_lock();
+   rc = ops->start(cb);
+   genl_unlock();
+   }
+   return rc;
+}
+
 static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
/* our ops are always const - netlink API doesn't propagate that */
@@ -544,6 +558,7 @@ static int genl_family_rcv_msg(const struct genl_family *family,
.module = family->module,
/* we have const, but the netlink API doesn't */
.data = (void *)ops,
+   .start = genl_lock_start,
.dump = genl_lock_dumpit,
.done = genl_lock_done,
};
@@ -555,6 +570,7 @@ static int genl_family_rcv_msg(const struct genl_family *family,
} else {
struct netlink_dump_control c = {
.module = family->module,
+   .start = ops->start,
.dump = ops->dumpit,
.done = ops->done,
};
-- 
1.8.3.1
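
The callback order this patch sets up (start once, then dump until it
reports completion, then done) can be sketched in user space roughly as
below. Every name here is hypothetical: struct dump_control and
struct dump_cb are cut-down analogues of struct netlink_dump_control and
struct netlink_callback, run_dump() loops where the kernel would resume
the dump on later receive calls, and the demo_* callbacks exist only for
illustration.

```c
#include <stddef.h>

struct dump_cb;

/* Hypothetical, trimmed-down analogue of struct netlink_dump_control. */
struct dump_control {
	int (*start)(struct dump_cb *cb);	/* optional: set up context */
	int (*dump)(struct dump_cb *cb);	/* > 0: more data, 0: finished */
	int (*done)(struct dump_cb *cb);	/* optional: tear down context */
};

/* Hypothetical analogue of struct netlink_callback. */
struct dump_cb {
	struct dump_control ops;
	long args[3];	/* [0] records left, [1] records emitted, [2] done flag */
};

static int run_dump(const struct dump_control *control, struct dump_cb *cb)
{
	int ret;

	*cb = (struct dump_cb){ .ops = *control };	/* also zeroes args[] */

	if (cb->ops.start)
		cb->ops.start(cb);	/* result ignored, as in the af_netlink.c hunk */

	while ((ret = cb->ops.dump(cb)) > 0)
		;	/* simplified: keep dumping until nothing is left */

	if (cb->ops.done)
		cb->ops.done(cb);
	return ret;
}

/* Demo callbacks: start prepares three records for dump to emit. */
static int demo_start(struct dump_cb *cb)
{
	cb->args[0] = 3;
	return 0;
}

static int demo_dump(struct dump_cb *cb)
{
	if (cb->args[0] == 0)
		return 0;
	cb->args[0]--;
	cb->args[1]++;
	return 1;
}

static int demo_done(struct dump_cb *cb)
{
	cb->args[2] = 1;
	return 0;
}
```

Running run_dump() with all three demo callbacks emits three records;
with .start left NULL the context is never set up and the dump finishes
immediately, which is the situation the optional start hook is meant to
avoid.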


[Devel] [PATCH RH7 3/6] ms/netfilter: ipset: Fix the last missing check of nla_parse_deprecated()

2020-12-01 Thread Vasily Averin
From: Jozsef Kadlecsik 

In dump_init() the outdated comment was incorrect and we had a missing
validation check of nla_parse_deprecated().

Signed-off-by: Jozsef Kadlecsik 
(cherry-picked from commit 13c6ba1f855415cf3b9c58ea926ae8858050ec1c)
VvS: replaced original nla_parse_deprecated() by nla_parse()
https://jira.sw.ru/browse/PSBM-122965
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index d5344e5..0a53827 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1292,10 +1292,12 @@ dump_init(struct netlink_callback *cb, struct ip_set_net *inst)
struct nlattr *attr = (void *)nlh + min_len;
u32 dump_type;
ip_set_id_t index;
+   int ret;
 
-   /* Second pass, so parser can't fail */
-   nla_parse(cda, IPSET_ATTR_CMD_MAX,
+   ret = nla_parse(cda, IPSET_ATTR_CMD_MAX,
  attr, nlh->nlmsg_len - min_len, ip_set_setname_policy);
+   if (ret)
+   return ret;
 
cb->args[IPSET_CB_PROTO] = nla_get_u8(cda[IPSET_ATTR_PROTOCOL]);
if (cda[IPSET_ATTR_SETNAME]) {
-- 
1.8.3.1


[Devel] [PATCH RH7 1/6] ms/netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present

2020-12-01 Thread Vasily Averin
From: Florian Westphal 

The set uadt functions assume lineno is never NULL, but it is in
case of ip_set_utest().

syzkaller managed to generate a netlink message that calls this with
LINENO attr present:

general protection fault:  [#1] PREEMPT SMP KASAN
RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104
Call Trace:
 ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867
 nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229
 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
 nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563

Pass a dummy lineno storage; it's easier than patching all set
implementations.

This seems to be a day-0 bug.

Cc: Jozsef Kadlecsik 
Reported-by: syzbot+34bd2369d38707f3f...@syzkaller.appspotmail.com
Fixes: a7b4f989a6294 ("netfilter: ipset: IP set core support")
Signed-off-by: Florian Westphal 
Acked-by: Jozsef Kadlecsik 
Signed-off-by: Pablo Neira Ayuso 

(cherry-picked from commit 22dad713b8a5ff488e07b821195270672f486eb2)
https://jira.sw.ru/browse/PSBM-122965
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 6d20f97..6ef5898 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1678,6 +1678,7 @@ ip_set_utest(struct sock *ctnl, struct sk_buff *skb,
struct ip_set *set;
struct nlattr *tb[IPSET_ATTR_ADT_MAX + 1] = {};
int ret = 0;
+   u32 lineno;
 
if (unlikely(protocol_min_failed(attr) ||
 !attr[IPSET_ATTR_SETNAME] ||
@@ -1694,7 +1695,7 @@ ip_set_utest(struct sock *ctnl, struct sk_buff *skb,
return -IPSET_ERR_PROTOCOL;
 
rcu_read_lock_bh();
-   ret = set->variant->uadt(set, tb, IPSET_TEST, NULL, 0, 0);
+   ret = set->variant->uadt(set, tb, IPSET_TEST, &lineno, 0, 0);
rcu_read_unlock_bh();
/* Userspace can't trigger element to be re-added */
if (ret == -EAGAIN)
-- 
1.8.3.1
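
The dummy-storage approach can be illustrated with a small user-space
sketch. demo_uadt() and demo_utest() are hypothetical stand-ins for a set
type's uadt handler and for ip_set_utest(); the only behaviour assumed is
the one described above, namely that the handler stores through lineno
whenever the LINENO attribute is present.

```c
#include <stdint.h>

/* Hypothetical stand-in for a set type's uadt handler. */
static int demo_uadt(int has_lineno_attr, uint32_t attr_value, uint32_t *lineno)
{
	if (has_lineno_attr)
		*lineno = attr_value;	/* unconditional store: a NULL pointer would fault */
	return 0;
}

/* Hypothetical stand-in for the test path after the fix. */
static int demo_utest(int has_lineno_attr, uint32_t attr_value)
{
	uint32_t lineno;	/* dummy storage; the stored value is never read */

	return demo_uadt(has_lineno_attr, attr_value, &lineno);
}
```

Giving the caller a throwaway variable keeps every handler unchanged,
which matches the reasoning in the commit message.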


[Devel] [PATCH RH7 1/2] ms/netfilter: ipset: Fix forceadd evaluation path

2020-12-01 Thread Vasily Averin
From: Jozsef Kadlecsik 

When the forceadd option is enabled, the hash:* types should find and replace
the first entry in the bucket with the new one if there are no reusable
(deleted or timed out) entries. However, the position index was just not set
to zero and remained the invalid -1 if there were no reusable entries.

Reported-by: syzbot+6a86565c74ebe30ae...@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik 

(cherry-picked from commit 8af1c6fbd9239877998c7f5a591cb2c88d41fb66)
https://jira.sw.ru/browse/PSBM-123063
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index aa10e4a..45046e5 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -758,6 +758,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
}
}
if (reuse || forceadd) {
+   if (j == -1)
+   j = 0;
data = ahash_data(n, j, set->dsize);
if (!deleted) {
 #ifdef IP_SET_HASH_WITH_NETS
-- 
1.8.3.1
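
The slot selection described above can be sketched in user space as
follows. This is a simplified model under stated assumptions: enum
slot_state and pick_slot() are hypothetical, and the real mtype_add()
tracks reusable entries through its own bucket bookkeeping rather than an
array of states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot states for one hash bucket. */
enum slot_state { SLOT_USED, SLOT_REUSABLE };

/*
 * Pick the bucket position for a new entry.
 * Returns the first reusable slot, slot 0 when forceadd is set and
 * nothing is reusable, or -1 when the entry cannot be placed.
 */
static int pick_slot(const enum slot_state *bucket, size_t n, bool forceadd)
{
	int j = -1;

	for (size_t i = 0; i < n; i++) {
		if (bucket[i] == SLOT_REUSABLE) {
			j = (int)i;
			break;
		}
	}
	if (j == -1 && forceadd)
		j = 0;	/* the fix: fall back to the first entry, never index with -1 */
	return j;
}
```

For a full bucket, pick_slot() yields 0 with forceadd and -1 without it;
before the fix the forceadd path would have gone on to use -1 as an index.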


[Devel] [PATCH RH7 2/2] ms/netfilter: ipset: Correct rcu_dereference() call in ip_set_put_comment()

2020-12-01 Thread Vasily Averin
From: Jozsef Kadlecsik 

The function is called when rcu_read_lock() is held and not
when rcu_read_lock_bh() is held.

Signed-off-by: Jozsef Kadlecsik 
Signed-off-by: Pablo Neira Ayuso 

(cherry-picked from commit 17b8b74c0f8dbf9b9e3301f9ca5b65dd1c079951)
https://jira.sw.ru/browse/PSBM-123063
Signed-off-by: Vasily Averin 
---
 include/linux/netfilter/ipset/ip_set_comment.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
index 8e2bab1..70877f8d 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -43,11 +43,11 @@ ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
rcu_assign_pointer(comment->c, c);
 }
 
-/* Used only when dumping a set, protected by rcu_read_lock_bh() */
+/* Used only when dumping a set, protected by rcu_read_lock() */
 static inline int
 ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
 {
-   struct ip_set_comment_rcu *c = rcu_dereference_bh(comment->c);
+   struct ip_set_comment_rcu *c = rcu_dereference(comment->c);
 
if (!c)
return 0;
-- 
1.8.3.1


[Devel] [PATCH RH7] ms/net_sched: gen_estimator: fix lockdep splat

2020-12-01 Thread Vasily Averin
From: Eric Dumazet 

syzbot reported a lockdep splat in gen_new_estimator() /
est_fetch_counters() when attempting to lock est->stats_lock.

Since est_fetch_counters() is called from BH context from timer
interrupt, we need to block BH as well when calling it from process
context.

Most qdiscs use per cpu counters and are immune to the problem,
but net/sched/act_api.c and net/netfilter/xt_RATEEST.c are using
a spinlock to protect their data. They both call gen_new_estimator()
while object is created and not yet alive, so this bug could
not trigger a deadlock, only a lockdep splat.

Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller 

(cherry-picked from commit 40ca54e3a686f13117f3de0c443f8026dadf7c44)
https://jira.sw.ru/browse/PSBM-123087
Signed-off-by: Vasily Averin 
---
 net/core/gen_estimator.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index bca1d06..0e3024c 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -159,7 +159,11 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
est->intvl_log = intvl_log;
est->cpu_bstats = cpu_bstats;
 
+   if (stats_lock)
+   local_bh_disable();
est_fetch_counters(est, &b);
+   if (stats_lock)
+   local_bh_enable();
est->last_bytes = b.bytes;
est->last_packets = b.packets;
old = rcu_dereference_protected(*rate_est, 1);
-- 
1.8.3.1


[Devel] [PATCH RH7] ms/xfrm: check id proto in validate_tmpl()

2020-12-01 Thread Vasily Averin
From: Cong Wang 

syzbot reported a kernel warning in xfrm_state_fini(), which
indicates that we have entries left in the list
net->xfrm.state_all whose proto is zero. And
xfrm_id_proto_match() doesn't consider them as a match with
IPSEC_PROTO_ANY in this case.

Proto with value 0 is probably not a valid value, at least
verify_newsa_info() doesn't consider it valid either.

This patch fixes it by checking the proto value in
validate_tmpl() and rejecting invalid ones, like what iproute2
does in xfrm_xfrmproto_getbyname().

Reported-by: syzbot 
Cc: Steffen Klassert 
Cc: Herbert Xu 
Signed-off-by: Cong Wang 
Signed-off-by: Steffen Klassert 

(cherry-picked from commit 6a53b7593233ab9e4f96873ebacc0f653a55c3e1)
https://jira.sw.ru/browse/PSBM-123084
Signed-off-by: Vasily Averin 
---
 net/xfrm/xfrm_user.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 472e1e2..e2dd99f 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1391,6 +1391,21 @@ static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family)
default:
return -EINVAL;
}
+
+   switch (ut[i].id.proto) {
+   case IPPROTO_AH:
+   case IPPROTO_ESP:
+   case IPPROTO_COMP:
+#if IS_ENABLED(CONFIG_IPV6)
+   case IPPROTO_ROUTING:
+   case IPPROTO_DSTOPTS:
+#endif
+   case IPSEC_PROTO_ANY:
+   break;
+   default:
+   return -EINVAL;
+   }
+
}
 
return 0;
-- 
1.8.3.1
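
The added protocol check can be modelled in user space roughly as below.
check_tmpl_proto() and the SK_* constants are hypothetical names local to
this sketch; the numeric values are the usual IP protocol numbers plus 255
for IPSEC_PROTO_ANY. One deliberate difference from the patch: the
compile-time IS_ENABLED(CONFIG_IPV6) guard is replaced by a runtime flag
so that both cases can be exercised.

```c
#include <errno.h>
#include <stdint.h>

/* Protocol numbers; the SK_ names are local to this sketch. */
enum {
	SK_IPPROTO_ROUTING = 43,
	SK_IPPROTO_ESP = 50,
	SK_IPPROTO_AH = 51,
	SK_IPPROTO_DSTOPTS = 60,
	SK_IPPROTO_COMP = 108,
	SK_IPSEC_PROTO_ANY = 255,
};

/* Returns 0 for an acceptable template protocol, -EINVAL otherwise. */
static int check_tmpl_proto(uint8_t proto, int ipv6_enabled)
{
	switch (proto) {
	case SK_IPPROTO_AH:
	case SK_IPPROTO_ESP:
	case SK_IPPROTO_COMP:
	case SK_IPSEC_PROTO_ANY:
		return 0;
	case SK_IPPROTO_ROUTING:
	case SK_IPPROTO_DSTOPTS:
		/* only meaningful when IPv6 support is available */
		return ipv6_enabled ? 0 : -EINVAL;
	default:
		return -EINVAL;	/* includes proto 0, the case from the report */
	}
}
```

A template with proto 0 is rejected up front, so no state with a zero
proto can be left behind in net->xfrm.state_all.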


[Devel] [PATCH RH7] ms/ip6mr: fix stale iterator

2020-12-01 Thread Vasily Averin
From: Nikolay Aleksandrov 

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
 WARNING: bad unlock balance detected!
 4.15.0-rc3+ #128 Not tainted
 syzkaller971460/3195 is trying to release lock (mrt_lock) at:
 [<6898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by syzkaller971460/3195:
  #0:  (&p->lock){+.+.}, at: [<744a6565>] seq_read+0xd5/0x13d0
 fs/seq_file.c:165

 stack backtrace:
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
  __lock_release kernel/locking/lockdep.c:3775 [inline]
  lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
  __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
  _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
  ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
  traverse+0x3bc/0xa00 fs/seq_file.c:135
  seq_read+0x96a/0x13d0 fs/seq_file.c:189
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:e574a15c EFLAGS: 0292 ORIG_RAX: 014d
 RAX: ffda RBX: 000f RCX: 20a3afb0
 RDX: 0001 RSI: 0067 RDI: 
 RBP:  R08:  R09: 
 R10:  R11:  R12: 
 R13:  R14:  R15: 
 BUG: sleeping function called from invalid context at lib/usercopy.c:25
 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
 INFO: lockdep is turned off.
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
  __might_sleep+0x95/0x190 kernel/sched/core.c:6013
  __might_fault+0xab/0x1d0 mm/memory.c:4525
  _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
  copy_to_user include/linux/uaccess.h:155 [inline]
  seq_read+0xcb4/0x13d0 fs/seq_file.c:279
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:e574a15c EFLAGS: 0292 ORIG_RAX: 014d
 RAX: ffda RBX: 000f RCX: 20a3afb0
 RDX: 0001 RSI: 0067 RDI: 
 RBP:  R08:  R09: 
 R10:  R11:  R12: 
 R13:  R14:  R15: 
 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
 lib/usercopy.c:26

Reported-by: syzbot 

Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: David S. Miller 

(cherry-picked from commit 4adfa79fc254efb7b0eb3cd58f62c2c3f805f1ba)
https://jira.sw.ru/browse/PSBM-122990
Signed-off-by: Vasily Averin 
---
 net/ipv6/ip6mr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 00c6c7d..856173b6 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -497,6 +497,7 @@ static void *ipmr_mfc_seq_start(struct seq_file *seq, loff_t *pos)
return ERR_PTR(-ENOENT);
 
it->mrt = mrt;
+   it->cache = NULL;
return *pos ? ipmr_mfc_seq_idx(net, seq->private, *pos - 

[Devel] [PATCH RH7] ms/ext4: fix argument checking in EXT4_IOC_MOVE_EXT

2020-12-01 Thread Vasily Averin
From: Theodore Ts'o 

If the starting block number of either the source or destination file
exceeds the EOF, EXT4_IOC_MOVE_EXT should return EINVAL.

Also fix the helper function mext_check_coverage() so that it returns
immediately if the logical block is beyond EOF, instead of looping
until the block number wraps all the way around.  This takes long
enough that if multiple threads are pounding on the same inode doing
nonsensical things, it can end up triggering the kernel's soft lockup
detector.

Reported-by: syzbot+c61979f6f2cba5cb3...@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o 
Cc: sta...@kernel.org
(cherry-picked from commit f18b2b83a727a3db208308057d2c7945f368e625)
https://jira.sw.ru/browse/PSBM-122991
Signed-off-by: Vasily Averin 
---
 fs/ext4/move_extent.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 6c925d6..930c7bd 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -530,9 +530,13 @@ mext_check_arguments(struct inode *orig_inode,
orig_inode->i_ino, donor_inode->i_ino);
return -EINVAL;
}
-   if (orig_eof < orig_start + *len - 1)
+   if (orig_eof <= orig_start)
+   *len = 0;
+   else if (orig_eof < orig_start + *len - 1)
*len = orig_eof - orig_start;
-   if (donor_eof < donor_start + *len - 1)
+   if (donor_eof <= donor_start)
+   *len = 0;
+   else if (donor_eof < donor_start + *len - 1)
*len = donor_eof - donor_start;
if (!*len) {
ext4_debug("ext4 move extent: len should not be 0 "
-- 
1.8.3.1
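
The length clamping changed in mext_check_arguments() can be sketched in
user space for a single file as below. clamp_old(), clamp_new() and
check_one() are hypothetical helpers, and using uint64_t for every
operand is an assumption of this sketch rather than the kernel's exact
types.

```c
#include <errno.h>
#include <stdint.h>

/* Old clamp: the subtraction wraps around when start lies beyond EOF. */
static void clamp_old(uint64_t eof, uint64_t start, uint64_t *len)
{
	if (eof < start + *len - 1)
		*len = eof - start;
}

/* New clamp: a start at or beyond EOF yields a zero length. */
static void clamp_new(uint64_t eof, uint64_t start, uint64_t *len)
{
	if (eof <= start)
		*len = 0;
	else if (eof < start + *len - 1)
		*len = eof - start;
}

/* A zero length is rejected, mirroring the !*len check in the context lines. */
static int check_one(uint64_t eof, uint64_t start, uint64_t *len)
{
	clamp_new(eof, start, len);
	return *len ? 0 : -EINVAL;
}
```

With eof = 5, start = 10 and len = 4, clamp_old() leaves a huge wrapped
length behind, while check_one() returns -EINVAL as the commit message
requires.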
