[PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock

2018-09-17 Thread Vlad Buslov
Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with a single global rtnl lock, which removes any
possibility of parallelism. This patch set is a third step towards removing
the rtnl lock dependency from the TC rules update path.

Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was added.
Handlers registered with this flag are called without the rtnl lock taken.
The end goal is to have the rule update handlers (RTM_NEWTFILTER,
RTM_DELTFILTER, etc.) registered with the UNLOCKED flag to allow parallel
execution.
However, there is no intention to completely remove or split the rtnl lock
itself. This patch set addresses specific problems in the implementation of
the classifiers API that prevent its control path from being executed
concurrently. Additional changes are required to refactor the classifiers
API and individual classifiers for parallel execution. This patch set lays
the groundwork to eventually register rule update handlers as rtnl-unlocked
by modifying the code in the cls API that works with Qdiscs and blocks. A
following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its
dependencies to allow concurrent execution:
- Extend Qdisc API with rcu to lookup and take reference to Qdisc
  without relying on rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by
  tcf_block_find().
- Create infrastructure to allow registering Qdiscs with class ops that
  do not require the caller to hold rtnl lock.

All three netlink rule update handlers use tcf_block_find() to look up the
Qdisc and block, and this patch set introduces additional means of
synchronization to substitute for the rtnl lock in the cls API.

Some functions in the cls and sch APIs have historic names that no longer
clearly describe their intent. In order not to make this code even more
confusing when introducing their concurrency-friendly versions, rename
these functions to describe their actual implementation.

Changes from V1 to V2:
- Rebase on latest net-next.
- Patch 8 - remove.
- Patch 9 - fold into patch 11.
- Patch 11:
  - Rename tcf_block_{get|put}() to tcf_block_refcnt_{get|put}().
- Patch 13 - remove.

Vlad Buslov (10):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_refcnt_{get|put}()
  net: sched: use reference counting for tcf blocks on rules update

 include/linux/rtnetlink.h |   6 ++
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  20 +++-
 net/core/rtnetlink.c  |   6 ++
 net/sched/cls_api.c   | 250 +-
 net/sched/sch_api.c   |  24 -
 net/sched/sch_atm.c   |   2 +-
 net/sched/sch_cbq.c   |   2 +-
 net/sched/sch_cbs.c   |   2 +-
 net/sched/sch_drr.c   |   4 +-
 net/sched/sch_dsmark.c|   2 +-
 net/sched/sch_fifo.c  |   2 +-
 net/sched/sch_generic.c   |  48 +++--
 net/sched/sch_hfsc.c  |   2 +-
 net/sched/sch_htb.c   |   4 +-
 net/sched/sch_mq.c|   4 +-
 net/sched/sch_mqprio.c|   4 +-
 net/sched/sch_multiq.c|   6 +-
 net/sched/sch_netem.c |   2 +-
 net/sched/sch_prio.c  |   6 +-
 net/sched/sch_qfq.c   |   4 +-
 net/sched/sch_red.c   |   4 +-
 net/sched/sch_sfb.c   |   4 +-
 net/sched/sch_tbf.c   |   4 +-
 24 files changed, 301 insertions(+), 112 deletions(-)

-- 
2.7.5



[PATCH net-next v2 08/10] net: sched: protect block idr with spinlock

2018-09-17 Thread Vlad Buslov
Protect block idr access with a spinlock, instead of relying on the rtnl
lock. Take the tn->idr_lock spinlock during block insertion and removal.
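
The pattern in this patch -- do any sleeping allocation before taking the
spinlock, then insert without sleeping while it is held -- can be modeled in
plain userspace C. The sketch below uses invented toy_* names and a fixed
pointer table standing in for the idr and tn->idr_lock; it is illustrative,
not kernel code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_IDR_SIZE 64

/* Toy stand-in for the kernel idr: a fixed pointer table guarded by a
 * spinlock. The rule being modeled: nothing inside the locked section may
 * sleep, so any allocation (what idr_preload(GFP_KERNEL) reserves and the
 * GFP_NOWAIT insertion then consumes) must happen before the lock is taken.
 */
struct toy_idr {
	atomic_flag lock;		/* models tn->idr_lock */
	void *slots[TOY_IDR_SIZE];
};

static void toy_idr_lock(struct toy_idr *idr)
{
	while (atomic_flag_test_and_set(&idr->lock))
		;	/* spin */
}

static void toy_idr_unlock(struct toy_idr *idr)
{
	atomic_flag_clear(&idr->lock);
}

/* Insert ptr into the first free slot; returns the id or -1 when full.
 * The critical section only scans and stores -- it never allocates. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr)
{
	int id = -1;

	toy_idr_lock(idr);
	for (int i = 0; i < TOY_IDR_SIZE; i++) {
		if (!idr->slots[i]) {
			idr->slots[i] = ptr;
			id = i;
			break;
		}
	}
	toy_idr_unlock(idr);
	return id;
}

static void toy_idr_remove(struct toy_idr *idr, int id)
{
	toy_idr_lock(idr);
	idr->slots[id] = NULL;
	toy_idr_unlock(idr);
}

static void *toy_idr_find(struct toy_idr *idr, int id)
{
	void *p;

	toy_idr_lock(idr);
	p = (id >= 0 && id < TOY_IDR_SIZE) ? idr->slots[id] : NULL;
	toy_idr_unlock(idr);
	return p;
}
```

In the kernel, idr_preload()/idr_preload_end() plays the role of up-front
allocation here: idr node memory is reserved while sleeping is still allowed,
so the GFP_NOWAIT insert under the spinlock cannot sleep.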

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 58b2d8443f6a..924723fb74f6 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -473,6 +473,7 @@ tcf_chain0_head_change_cb_del(struct tcf_block *block,
 }
 
 struct tcf_net {
+   spinlock_t idr_lock; /* Protects idr */
struct idr idr;
 };
 
@@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, 
struct net *net,
struct netlink_ext_ack *extack)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
+   int err;
+
+   idr_preload(GFP_KERNEL);
+   spin_lock(&tn->idr_lock);
+   err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
+   GFP_NOWAIT);
+   spin_unlock(&tn->idr_lock);
+   idr_preload_end();
 
-   return idr_alloc_u32(&tn->idr, block, &block->index, block->index,
-GFP_KERNEL);
+   return err;
 }
 
 static void tcf_block_remove(struct tcf_block *block, struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+   spin_lock(&tn->idr_lock);
idr_remove(&tn->idr, block->index);
+   spin_unlock(&tn->idr_lock);
 }
 
 static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
@@ -2285,6 +2295,7 @@ static __net_init int tcf_net_init(struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+   spin_lock_init(&tn->idr_lock);
idr_init(&tn->idr);
return 0;
 }
-- 
2.7.5



[PATCH net-next v2 06/10] net: sched: change tcf block reference counter type to refcount_t

2018-09-17 Thread Vlad Buslov
As a preparation for removing the rtnl lock dependency from the rules update
path, change the tcf block reference counter type to refcount_t to allow
modification by concurrent users.

In the block put function, perform the decrement and check of the reference
counter in one operation to accommodate concurrent modification by unlocked
users. After this change, tcf_chain_put() at the end of the block put
function is called with block->refcnt == 0 and will deallocate the block
after the last chain is released, so there is no need to manually deallocate
the block in this case. However, if the block reference counter reaches 0
and there are no chains to release, the block must still be deallocated
manually.
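
The two outcomes described above can be sketched in userspace C with
stdatomic. All names are hypothetical stand-ins (a counter replaces
chain_list, a flag replaces kfree() so the result is observable); this models
the commit's logic, not the kernel code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* If the last reference is dropped and there are no chains, the block is
 * freed directly; otherwise flushing the chains is what eventually frees
 * it when the last chain goes away. */
struct toy_block {
	atomic_int refcnt;
	int nchains;		/* stand-in for chain_list */
	bool freed;		/* set instead of calling kfree() */
};

static void toy_block_free(struct toy_block *b)
{
	b->freed = true;
}

static void toy_flush_all_chains(struct toy_block *b)
{
	/* Releasing the last chain deallocates the block, as
	 * tcf_chain_destroy() does once refcnt is already 0. */
	b->nchains = 0;
	toy_block_free(b);
}

static void toy_block_put(struct toy_block *b)
{
	/* atomic_fetch_sub() returning 1 plays the role of
	 * refcount_dec_and_test() returning true. */
	if (atomic_fetch_sub(&b->refcnt, 1) != 1)
		return;

	if (b->nchains == 0)
		toy_block_free(b);	 /* no chains: deallocate manually */
	else
		toy_flush_all_chains(b); /* last chain frees the block */
}
```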

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c   | 59 ---
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9a295e690efe..45fee65468d0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -345,7 +345,7 @@ struct tcf_chain {
 struct tcf_block {
struct list_head chain_list;
u32 index; /* block index for shared blocks */
-   unsigned int refcnt;
+   refcount_t refcnt;
struct net *net;
struct Qdisc *q;
struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cfa4a02a6a1a..c3c7d4e2f84c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
if (!chain->index)
block->chain0.chain = NULL;
kfree(chain);
-   if (list_empty(&block->chain_list) && block->refcnt == 0)
+   if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
kfree(block);
 }
 
@@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, 
struct Qdisc *q,
INIT_LIST_HEAD(&block->owner_list);
INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
-   block->refcnt = 1;
+   refcount_set(&block->refcnt, 1);
block->net = net;
block->index = block_index;
 
@@ -719,7 +719,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
/* block_index not 0 means the shared block is requested */
block = tcf_block_lookup(net, ei->block_index);
if (block)
-   block->refcnt++;
+   refcount_inc(&block->refcnt);
}
 
if (!block) {
@@ -762,7 +762,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 err_block_insert:
kfree(block);
} else {
-   block->refcnt--;
+   refcount_dec(&block->refcnt);
}
return err;
 }
@@ -802,34 +802,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-   if (block->refcnt == 1) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-
-   /* Hold a refcnt for all chains, so that they don't disappear
-* while we are iterating.
+   if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
 */
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
+   bool free_block = list_empty(&block->chain_list);
 
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
 
-   tcf_block_offload_unbind(block, q, ei);
+   if (!free_block) {
+   /* Hold a refcnt for all chains, so that they don't
+* disappear while we are iterating.
+*/
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
 
-   if (block->refcnt == 1) {
-   /* At this point, all the chains should have refcnt >= 1. */
-   list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
}
 
-   block->refcnt--

[PATCH net-next v2 07/10] net: sched: implement functions to put and flush all chains

2018-09-17 Thread Vlad Buslov
Extract the code that flushes and puts all chains on a tcf block into two
standalone functions to be shared with the functions that locklessly get/put
a reference to the block.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 55 +
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c3c7d4e2f84c..58b2d8443f6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -538,6 +538,31 @@ static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
qdisc_put_unlocked(q);
 }
 
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain;
+
+   /* Hold a refcnt for all chains, so that they don't disappear
+* while we are iterating.
+*/
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
+
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
+}
+
+static void tcf_block_put_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain, *tmp;
+
+   /* At this point, all the chains should have refcnt >= 1. */
+   list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+   tcf_chain_put_explicitly_created(chain);
+   tcf_chain_put(chain);
+   }
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,8 +820,6 @@ EXPORT_SYMBOL(tcf_block_get);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
   struct tcf_block_ext_info *ei)
 {
-   struct tcf_chain *chain, *tmp;
-
if (!block)
return;
tcf_chain0_head_change_cb_del(block, ei);
@@ -813,32 +836,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
 
if (tcf_block_shared(block))
tcf_block_remove(block, block->net);
-
-   if (!free_block) {
-   /* Hold a refcnt for all chains, so that they don't
-* disappear while we are iterating.
-*/
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
-
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
-
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
tcf_block_offload_unbind(block, q, ei);
 
-   if (free_block) {
+   if (free_block)
kfree(block);
-   } else {
-   /* At this point, all the chains should have
-* refcnt >= 1.
-*/
-   list_for_each_entry_safe(chain, tmp, &block->chain_list,
-list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
-   }
-   }
+   else
+   tcf_block_put_all_chains(block);
} else {
tcf_block_offload_unbind(block, q, ei);
}
-- 
2.7.5



[PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function

2018-09-17 Thread Vlad Buslov
The rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference-counted objects that rely on the
rtnl lock cannot use it with the refcount helper function that atomically
decrements the reference count and obtains the mutex.

This patch implements a simple wrapper function around
refcount_dec_and_mutex_lock() that obtains the rtnl lock when the reference
counter value reaches 0.
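
The semantics being wrapped -- decrement, and take the mutex only when the
count hits zero -- can be sketched in userspace C. This is an illustrative
model with toy_* names and an atomic_flag spinlock standing in for
rtnl_mutex, not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true, with the lock held, only when this call dropped the final
 * reference; the final decrement happens under the lock so teardown is
 * serialized against new lookups. */
struct toy_mutex {
	atomic_flag held;
};

static void toy_mutex_lock(struct toy_mutex *m)
{
	while (atomic_flag_test_and_set(&m->held))
		;	/* spin */
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	atomic_flag_clear(&m->held);
}

static bool toy_dec_and_lock(atomic_int *r, struct toy_mutex *m)
{
	int old = atomic_load(r);

	/* Fast path: decrement without the lock while the counter is
	 * clearly not dropping to zero. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(r, &old, old - 1))
			return false;	/* not the last reference */
	}

	/* Possibly the last reference: take the lock and perform the
	 * final decrement under it. */
	toy_mutex_lock(m);
	if (atomic_fetch_sub(r, 1) != 1) {
		toy_mutex_unlock(m);
		return false;
	}
	return true;	/* caller unlocks after teardown */
}
```

The kernel helper has, to my understanding, the same shape: a lockless fast
path for counts above one, then the mutex plus a final decrement-and-test.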

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h | 1 +
 net/core/rtnetlink.c  | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5225832bd6ff..dffbf665c086 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -34,6 +34,7 @@ extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
+extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
 extern wait_queue_head_t netdev_unregistering_wq;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e4ae0319e189..ad9d1493cb27 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -130,6 +130,12 @@ int rtnl_is_locked(void)
 }
 EXPORT_SYMBOL(rtnl_is_locked);
 
+bool refcount_dec_and_rtnl_lock(refcount_t *r)
+{
+   return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
+}
+EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);
+
 #ifdef CONFIG_PROVE_LOCKING
 bool lockdep_rtnl_is_held(void)
 {
-- 
2.7.5



[PATCH net-next v2 09/10] net: sched: implement tcf_block_refcnt_{get|put}()

2018-09-17 Thread Vlad Buslov
Implement get/put functions for blocks that only take/release the reference
and perform deallocation. These functions are intended to be used by the
unlocked rules update path to always hold a reference to the block while
working with it. They rely on the new fine-grained locking mechanisms
introduced in previous patches in this set, instead of relying on the global
protection provided by the rtnl lock.

Extract code that is common with tcf_block_detach_ext() into common
function __tcf_block_put().

Extend tcf_block with rcu to allow safe deallocation when it is accessed
concurrently.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c   | 74 ---
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 45fee65468d0..931fcdadf64a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -357,6 +357,7 @@ struct tcf_block {
struct tcf_chain *chain;
struct list_head filter_chain_list;
} chain0;
+   struct rcu_head rcu;
 };
 
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 924723fb74f6..0a7a3ace2da9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -241,7 +241,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
block->chain0.chain = NULL;
kfree(chain);
if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
-   kfree(block);
+   kfree_rcu(block, rcu);
 }
 
 static void tcf_chain_hold(struct tcf_chain *chain)
@@ -537,6 +537,19 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
return idr_find(&tn->idr, block_index);
 }
 
+static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
+{
+   struct tcf_block *block;
+
+   rcu_read_lock();
+   block = tcf_block_lookup(net, block_index);
+   if (block && !refcount_inc_not_zero(&block->refcnt))
+   block = NULL;
+   rcu_read_unlock();
+
+   return block;
+}
+
 static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 {
if (!q)
@@ -573,6 +586,40 @@ static void tcf_block_put_all_chains(struct tcf_block 
*block)
}
 }
 
+static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
+   struct tcf_block_ext_info *ei)
+{
+   if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
+*/
+   bool free_block = list_empty(&block->chain_list);
+
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
+
+   if (q)
+   tcf_block_offload_unbind(block, q, ei);
+
+   if (free_block)
+   kfree_rcu(block, rcu);
+   else
+   tcf_block_put_all_chains(block);
+   } else if (q) {
+   tcf_block_offload_unbind(block, q, ei);
+   }
+}
+
+static void tcf_block_refcnt_put(struct tcf_block *block)
+{
+   __tcf_block_put(block, NULL, NULL);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,7 +842,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
if (tcf_block_shared(block))
tcf_block_remove(block, net);
 err_block_insert:
-   kfree(block);
+   kfree_rcu(block, rcu);
} else {
refcount_dec(&block->refcnt);
}
@@ -835,28 +882,7 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-   if (refcount_dec_and_test(&block->refcnt)) {
-   /* Flushing/putting all chains will cause the block to be
-* deallocated when last chain is freed. However, if chain_list
-* is empty, block has to be manually deallocated. After block
-* reference counter reached 0, it is no longer possible to
-* increment it or add new chains to block.
-*/
-   bool free_block = list_empty(&block->chain_list);
-
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-   if (!free_block)
-   tcf_block_flush_all_chains(block)

[PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu

2018-09-17 Thread Vlad Buslov
Currently, Qdisc API functions assume that users have the rtnl lock taken.
To implement an rtnl-unlocked classifiers update interface, the Qdisc API
must be extended with functions that do not require the rtnl lock.

Extend the Qdisc structure with rcu. Implement a special version of the put
function, qdisc_put_unlocked(), that can be called without the rtnl lock
taken. This function only takes the rtnl lock if the Qdisc reference counter
reaches zero and is intended to be used as an optimization.
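
One piece of this patch, deferring qdisc_free() through call_rcu(), can be
sketched in userspace C. The toy_* names and the explicit toy_synchronize()
standing in for a grace period elapsing are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Deferred destruction in the style of call_rcu(): queue a callback and
 * run it only once the (modeled) grace period ends. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

static struct toy_rcu_head *toy_rcu_pending;

static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_rcu_pending;
	toy_rcu_pending = head;
}

/* Model of a grace period elapsing: run every queued callback. */
static void toy_synchronize(void)
{
	while (toy_rcu_pending) {
		struct toy_rcu_head *head = toy_rcu_pending;

		toy_rcu_pending = head->next;
		head->func(head);
	}
}

/* A qdisc-like object freed from the callback, the way qdisc_free_cb()
 * in the patch frees the Qdisc. */
struct toy_qdisc {
	bool *freed_flag;
	struct toy_rcu_head rcu;
};

static void toy_qdisc_free_cb(struct toy_rcu_head *head)
{
	/* open-coded container_of() */
	struct toy_qdisc *q = (struct toy_qdisc *)
		((char *)head - offsetof(struct toy_qdisc, rcu));

	*q->freed_flag = true;
	free(q);
}
```

The point mirrored here is that unlocked readers which looked the object up
before the final put can still dereference it safely until the grace period
ends.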

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h |  5 +
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/sch_api.c   | 18 ++
 net/sched/sch_generic.c   | 25 -
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dffbf665c086..d3dff3e41e6c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -84,6 +84,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct 
net_device *dev)
return rtnl_dereference(dev->ingress_queue);
 }
 
+static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device 
*dev)
+{
+   return rcu_dereference(dev->ingress_queue);
+}
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
 
 #ifdef CONFIG_NET_INGRESS
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7dc769e5452b..a16fbe9a2a67 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -102,6 +102,7 @@ int qdisc_set_default(const char *id);
 void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
struct nlattr *tab,
struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fadb1a4d4ee8..8a0d1523d11b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -90,6 +90,7 @@ struct Qdisc {
struct gnet_stats_queue __percpu *cpu_qstats;
int padded;
refcount_t  refcnt;
+   struct rcu_head rcu;
 
/*
 * For performance sake on SMP, we put highly modified fields at the end
@@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue 
*dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
+void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2096138c4bf6..070bed39155b 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -314,6 +314,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 
handle)
return q;
 }
 
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
+{
+   struct Qdisc *q;
+   struct netdev_queue *nq;
+
+   if (!handle)
+   return NULL;
+   q = qdisc_match_from_root(dev->qdisc, handle);
+   if (q)
+   goto out;
+
+   nq = dev_ingress_queue_rcu(dev);
+   if (nq)
+   q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+out:
+   return q;
+}
+
 static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
unsigned long cl;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 3e7696f3e053..531fac1d2875 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc)
kfree((char *) qdisc - qdisc->padded);
 }
 
+void qdisc_free_cb(struct rcu_head *head)
+{
+   struct Qdisc *q = container_of(head, struct Qdisc, rcu);
+
+   qdisc_free(q);
+}
+
 static void qdisc_destroy(struct Qdisc *qdisc)
 {
const struct Qdisc_ops  *ops = qdisc->ops;
@@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
kfree_skb_list(skb);
}
 
-   qdisc_free(qdisc);
+   call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
@@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_put);
 
+/* Version of qdisc_put() that is called with rtnl mutex unlocked.
+ * Intended to be used as optimization, this function only takes rtnl lock if
+ * qdisc reference counter reached zero.
+ */
+
+void qdisc_put_unlocked(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN ||
+   !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
+   return;
+
+   qdisc_destroy(qdisc);
+   rtnl_unlock();
+}
+EXPORT_SYMBOL(qdisc_put_unlocked);
+
 /* Attach 

[PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock

2018-09-17 Thread Vlad Buslov
As a preparation for removing the rtnl lock dependency from the rules update
path, use the Qdisc rcu and reference counting capabilities instead of
relying on the rtnl lock while working with Qdiscs. Create a new
tcf_block_release() function, and use it to free resources taken by
tcf_block_find(). Currently, this function only releases the Qdisc; it is
extended in later patches in this series.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 88 -
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1a67af8a6e8c..cfa4a02a6a1a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -527,6 +527,17 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
return idr_find(&tn->idr, block_index);
 }
 
+static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
+{
+   if (!q)
+   return;
+
+   if (rtnl_held)
+   qdisc_put(q);
+   else
+   qdisc_put_unlocked(q);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -537,6 +548,7 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
struct netlink_ext_ack *extack)
 {
struct tcf_block *block;
+   int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
block = tcf_block_lookup(net, block_index);
@@ -548,55 +560,91 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
const struct Qdisc_class_ops *cops;
struct net_device *dev;
 
+   rcu_read_lock();
+
/* Find link */
-   dev = __dev_get_by_index(net, ifindex);
-   if (!dev)
+   dev = dev_get_by_index_rcu(net, ifindex);
+   if (!dev) {
+   rcu_read_unlock();
return ERR_PTR(-ENODEV);
+   }
 
/* Find qdisc */
if (!*parent) {
*q = dev->qdisc;
*parent = (*q)->handle;
} else {
-   *q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+   *q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
if (!*q) {
NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't 
exists");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
}
 
+   *q = qdisc_refcount_inc_nz(*q);
+   if (!*q) {
+   NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+   err = -EINVAL;
+   goto errout_rcu;
+   }
+
/* Is it classful? */
cops = (*q)->ops->cl_ops;
if (!cops) {
NL_SET_ERR_MSG(extack, "Qdisc not classful");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
 
if (!cops->tcf_block) {
NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+   goto errout_rcu;
}
 
+   /* At this point we know that qdisc is not noop_qdisc,
+* which means that qdisc holds a reference to net_device
+* and we hold a reference to qdisc, so it is safe to release
+* rcu read lock.
+*/
+   rcu_read_unlock();
+
/* Do we search for filter, attached to class? */
if (TC_H_MIN(*parent)) {
*cl = cops->find(*q, *parent);
if (*cl == 0) {
NL_SET_ERR_MSG(extack, "Specified class doesn't 
exist");
-   return ERR_PTR(-ENOENT);
+   err = -ENOENT;
+   goto errout_qdisc;
}
}
 
/* And the last stroke */
block = cops->tcf_block(*q, *cl, extack);
-   if (!block)
-   return ERR_PTR(-EINVAL);
+   if (!block) {
+   err = -EINVAL;
+   goto errout_qdisc;
+   }
if (tcf_block_shared(block)) {
NL_SET_ERR_MSG(extack, "This filter block is shared. 
Please use the block index to manipulate the filters");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+   goto errout_qdisc;
}
}
 
return blo

[PATCH net-next v2 04/10] net: sched: add helper function to take reference to Qdisc

2018-09-17 Thread Vlad Buslov
Implement a function to take a reference to a Qdisc that relies on the rcu
read lock instead of the rtnl mutex. The function only takes a reference to
the Qdisc if its reference counter isn't zero. Intended to be used by the
unlocked cls API.
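
The increment-if-not-zero semantics this helper builds on can be sketched
with a C11 compare-and-swap loop. The toy_* names are illustrative; this is
a model of the behavior, not the kernel's refcount_t code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Once the counter has reached zero (object on its way to destruction),
 * no new reference may be taken; the lookup must fail instead. */
struct toy_qdisc {
	atomic_int refcnt;
};

static struct toy_qdisc *toy_qdisc_refcount_inc_nz(struct toy_qdisc *q)
{
	int old = atomic_load(&q->refcnt);

	while (old != 0) {
		/* On failure the CAS reloads 'old', so a concurrent final
		 * put is observed and we bail out instead of resurrecting
		 * a dying object. */
		if (atomic_compare_exchange_weak(&q->refcnt, &old, old + 1))
			return q;
	}
	return NULL;
}
```

An unlocked caller combines this with an rcu read-side section: the rcu lock
keeps the object's memory valid during the lookup, and the conditional
increment decides whether the reference may actually be kept.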

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8a0d1523d11b..9a295e690efe 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
refcount_inc(&qdisc->refcnt);
 }
 
+/* Intended to be used by unlocked users, when concurrent qdisc release is
+ * possible.
+ */
+
+static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN)
+   return qdisc;
+   if (refcount_inc_not_zero(&qdisc->refcnt))
+   return qdisc;
+   return NULL;
+}
+
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
if (qdisc->flags & TCQ_F_NOLOCK)
-- 
2.7.5



[PATCH net-next v2 02/10] net: sched: rename qdisc_destroy() to qdisc_put()

2018-09-17 Thread Vlad Buslov
The current implementation of qdisc_destroy() decrements the Qdisc reference
counter and only actually destroys the Qdisc if the reference counter value
reaches zero. Rename qdisc_destroy() to qdisc_put() in order for it to
better describe the way in which this function is currently implemented and
used.

Extract the code that deallocates the Qdisc into a new private
qdisc_destroy() function. It is intended to be shared between the regular
qdisc_put() and its unlocked version, introduced in the next patch in this
series.
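
The resulting split -- a public put that drops one reference, and a private
destroy that runs only on the last put -- can be sketched in userspace C.
The toy_* names and the 'destroyed' flag are illustrative stand-ins:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_qdisc {
	atomic_int refcnt;
	bool destroyed;		/* set instead of real teardown */
};

/* Private: actual teardown/deallocation lives here. */
static void toy_qdisc_destroy(struct toy_qdisc *q)
{
	q->destroyed = true;
}

/* Public: drop one reference; destroy only on the last put.
 * fetch_sub() returning 1 corresponds to refcount_dec_and_test(). */
static void toy_qdisc_put(struct toy_qdisc *q)
{
	if (atomic_fetch_sub(&q->refcnt, 1) == 1)
		toy_qdisc_destroy(q);
}
```

The rename matters because every caller converted in this patch was already
relying on put semantics, not unconditional destruction.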

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/sch_api.c   |  6 +++---
 net/sched/sch_atm.c   |  2 +-
 net/sched/sch_cbq.c   |  2 +-
 net/sched/sch_cbs.c   |  2 +-
 net/sched/sch_drr.c   |  4 ++--
 net/sched/sch_dsmark.c|  2 +-
 net/sched/sch_fifo.c  |  2 +-
 net/sched/sch_generic.c   | 23 ++-
 net/sched/sch_hfsc.c  |  2 +-
 net/sched/sch_htb.c   |  4 ++--
 net/sched/sch_mq.c|  4 ++--
 net/sched/sch_mqprio.c|  4 ++--
 net/sched/sch_multiq.c|  6 +++---
 net/sched/sch_netem.c |  2 +-
 net/sched/sch_prio.c  |  6 +++---
 net/sched/sch_qfq.c   |  4 ++--
 net/sched/sch_red.c   |  4 ++--
 net/sched/sch_sfb.c   |  4 ++--
 net/sched/sch_tbf.c   |  4 ++--
 20 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d326fd553b58..fadb1a4d4ee8 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -554,7 +554,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
-void qdisc_destroy(struct Qdisc *qdisc);
+void qdisc_put(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 411c40344b77..2096138c4bf6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -920,7 +920,7 @@ static void notify_and_destroy(struct net *net, struct 
sk_buff *skb,
qdisc_notify(net, skb, n, clid, old, new);
 
if (old)
-   qdisc_destroy(old);
+   qdisc_put(old);
 }
 
 /* Graft qdisc "new" to class "classid" of qdisc "parent" or
@@ -973,7 +973,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
qdisc_refcount_inc(new);
 
if (!ingress)
-   qdisc_destroy(old);
+   qdisc_put(old);
}
 
 skip:
@@ -1561,7 +1561,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct 
nlmsghdr *n,
err = qdisc_graft(dev, p, skb, n, clid, q, NULL, extack);
if (err) {
if (q)
-   qdisc_destroy(q);
+   qdisc_put(q);
return err;
}
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index cd49afca9617..d714d3747bcb 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -150,7 +150,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
pr_debug("atm_tc_put: destroying\n");
list_del_init(&flow->list);
pr_debug("atm_tc_put: qdisc %p\n", flow->q);
-   qdisc_destroy(flow->q);
+   qdisc_put(flow->q);
tcf_block_put(flow->block);
if (flow->sock) {
pr_debug("atm_tc_put: f_count %ld\n",
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f42025d53cfe..4dc05409e3fb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1418,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct 
cbq_class *cl)
WARN_ON(cl->filters);
 
tcf_block_put(cl->block);
-   qdisc_destroy(cl->q);
+   qdisc_put(cl->q);
qdisc_put_rtab(cl->R_tab);
gen_kill_estimator(&cl->rate_est);
if (cl != &q->link)
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index e26a24017faa..e689e11b6d0f 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -379,7 +379,7 @@ static void cbs_destroy(struct Qdisc *sch)
cbs_disable_offload(dev, q);
 
if (q->qdisc)
-   qdisc_destroy(q->qdisc);
+   qdisc_put(q->qdisc);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e0b0cf8a9939..cdebaed0f8cf 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -134,7 +134,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, 
u32 parentid,
tca[TCA_RATE]);
if (err) {
NL_SET_ERR_MSG(extack, "Failed to replace estimator");
-   qdisc_destroy(cl->qdisc);
+   qdisc_put(cl->qdisc);

[PATCH net-next v2 10/10] net: sched: use reference counting for tcf blocks on rules update

2018-09-17 Thread Vlad Buslov
In order to remove dependency on rtnl lock on rules update path, always
take reference to block while using it on rules update path. Change
tcf_block_get() error handling to properly release block with reference
counting, instead of just destroying it, in order to accommodate potential
concurrent users.
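
The get/put discipline described above can be sketched in userspace C (a toy
model with a plain C11 atomic standing in for the kernel's refcount_t;
block_create/block_get/block_put are illustrative names, not the cls API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct block {
	atomic_int refcnt;
};

/* Creator holds the first reference, as tcf_block_create() does. */
static struct block *block_create(void)
{
	struct block *b = malloc(sizeof(*b));

	atomic_init(&b->refcnt, 1);
	return b;
}

static void block_get(struct block *b)
{
	atomic_fetch_add(&b->refcnt, 1);
}

/* Returns true when the last reference was dropped and the block freed. */
static bool block_put(struct block *b)
{
	if (atomic_fetch_sub(&b->refcnt, 1) == 1) {
		free(b);
		return true;
	}
	return false;
}
```

The rules update path then pairs every lookup with a put, so a concurrent
deleter only drops its own reference instead of freeing memory that is
still in use.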

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0a7a3ace2da9..6a3eec5dbdf1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -633,7 +633,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, block_index);
+   block = tcf_block_refcnt_get(net, block_index);
if (!block) {
NL_SET_ERR_MSG(extack, "Block of given index was not found");
return ERR_PTR(-EINVAL);
@@ -713,6 +713,14 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
err = -EOPNOTSUPP;
goto errout_qdisc;
}
+
+   /* Always take reference to block in order to support execution
+* of rules update path of cls API without rtnl lock. Caller
+* must release block when it is finished using it. The 'if' branch
+* of this conditional obtains its reference to the block by calling
+* tcf_block_refcnt_get().
+*/
+   refcount_inc(&block->refcnt);
}
 
return block;
@@ -726,6 +734,8 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 
 static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
 {
+   if (!IS_ERR_OR_NULL(block))
+   tcf_block_refcnt_put(block);
tcf_qdisc_put(q, true);
 }
 
@@ -794,21 +804,16 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 {
struct net *net = qdisc_net(q);
struct tcf_block *block = NULL;
-   bool created = false;
int err;
 
-   if (ei->block_index) {
+   if (ei->block_index)
/* block_index not 0 means the shared block is requested */
-   block = tcf_block_lookup(net, ei->block_index);
-   if (block)
-   refcount_inc(&block->refcnt);
-   }
+   block = tcf_block_refcnt_get(net, ei->block_index);
 
if (!block) {
block = tcf_block_create(net, q, ei->block_index, extack);
if (IS_ERR(block))
return PTR_ERR(block);
-   created = true;
if (tcf_block_shared(block)) {
err = tcf_block_insert(block, net, extack);
if (err)
@@ -838,14 +843,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 err_chain0_head_change_cb_add:
tcf_block_owner_del(block, q, ei->binder_type);
 err_block_owner_add:
-   if (created) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, net);
 err_block_insert:
-   kfree_rcu(block, rcu);
-   } else {
-   refcount_dec(&block->refcnt);
-   }
+   tcf_block_refcnt_put(block);
return err;
 }
 EXPORT_SYMBOL(tcf_block_get_ext);
@@ -1739,7 +1738,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -1798,6 +1797,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
}
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   tcf_block_refcnt_put(block);
cb->args[0] = index;
 
 out:
@@ -2062,7 +2063,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -2129,6 +2130,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
index++;
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   tcf_block_refcnt_put(block);
cb->args[0] = index;
 
 out:

Re: [Patch net-next] ipv4: initialize ra_mutex in inet_init_net()

2018-09-17 Thread Kirill Tkhai
On 14.09.2018 23:32, Cong Wang wrote:
> ra_mutex is an IPv4-specific mutex; it is inside struct netns_ipv4,
> but its initialization is in the generic netns code, setup_net().
> 
> Move it to IPv4 specific net init code, inet_init_net().
> 
> Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex")
> Cc: Kirill Tkhai 
> Signed-off-by: Cong Wang 
> ---
>  net/core/net_namespace.c | 1 -
>  net/ipv4/af_inet.c   | 2 ++
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 670c84b1bfc2..b272ccfcbf63 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -308,7 +308,6 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
>   net->user_ns = user_ns;
>   idr_init(&net->netns_ids);
>   spin_lock_init(&net->nsid_lock);
> - mutex_init(&net->ipv4.ra_mutex);
>  
>   list_for_each_entry(ops, &pernet_list, list) {
>   error = ops_init(ops, net);
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 20fda8fb8ffd..57b7bffb93e5 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1817,6 +1817,8 @@ static __net_init int inet_init_net(struct net *net)
>   net->ipv4.sysctl_igmp_llm_reports = 1;
>   net->ipv4.sysctl_igmp_qrv = 2;
>  
> + mutex_init(&net->ipv4.ra_mutex);
> +

In inet_init() the order of registration is:

ip_mr_init();
init_inet_pernet_ops();

This means the ipmr_net_ops pernet operations come before af_inet_ops
in pernet_list. So there is a theoretical possibility that, sometime in
the future, we will have a problem during a failed net initialization.

Say,

setup_net():
ipmr_net_ops->init() returns 0
xxx->init()  returns error
and then we do:
ipmr_net_ops->exit(),

which could touch ra_mutex (theoretically).

Your patch is OK, but since you are doing this, it would be better to also
swap the registration order of ipmr_net_ops and af_inet_ops.
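
The scenario can be modelled in a few lines of userspace C (a toy version
of the pernet_list walk and its reverse rollback; all names here are
illustrative, not the kernel code):

```c
#include <stdbool.h>

struct pernet_ops {
	bool (*init)(void);
	void (*exit)(void);
};

static bool mutex_ready;		/* stands in for ra_mutex being initialized */
static bool touched_uninit_mutex;	/* records the hazard */

static bool ipmr_init(void) { return true; }
static void ipmr_exit(void) { if (!mutex_ready) touched_uninit_mutex = true; }
static bool inet_init(void) { mutex_ready = true; return true; }
static void inet_exit(void) { mutex_ready = false; }
static bool fail_init(void) { return false; }
static void fail_exit(void) { }

/* Toy setup_net(): run init in list order; on failure, run exit in
 * reverse order for the ops that already succeeded. */
static bool setup_net(const struct pernet_ops *list, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (!list[i].init())
			break;
	if (i == n)
		return true;
	while (i-- > 0)
		list[i].exit();
	return false;
}
```

With ipmr registered before inet, a later failure rolls ipmr back while
ra_mutex was never initialized; swapping the order removes the hazard.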

Kirill


[PATCH net 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-17 Thread Jose Abreu
This follows David Miller's advice and tries to fix the coalesce timer in
multi-queue scenarios.

We are now using per-queue coalesce values and a per-queue TX timer.

The coalesce timer default value was changed to 1ms and the default
coalesce frames to 25.

Tested in a B2B setup between XGMAC2 and GMAC5.

Signed-off-by: Jose Abreu 
Fixes: ce736788e8a ("net: stmmac: adding multiple buffers for TX")
Cc: Florian Fainelli 
Cc: Neil Armstrong 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  14 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 233 --
 include/linux/stmmac.h|   1 +
 4 files changed, 146 insertions(+), 106 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 1854f270ad66..b1b305f8f414 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -258,10 +258,10 @@ struct stmmac_safety_stats {
 #define MAX_DMA_RIWT   0xff
 #define MIN_DMA_RIWT   0x20
 /* Tx coalesce parameters */
-#define STMMAC_COAL_TX_TIMER   4
+#define STMMAC_COAL_TX_TIMER   1000
 #define STMMAC_MAX_COAL_TX_TICK10
 #define STMMAC_TX_MAX_FRAMES   256
-#define STMMAC_TX_FRAMES   64
+#define STMMAC_TX_FRAMES   25
 
 /* Packets types */
 enum packets_types {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index c0a855b7ab3b..63e1064b27a2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -48,6 +48,8 @@ struct stmmac_tx_info {
 
 /* Frequently used values are kept adjacent for cache effect */
 struct stmmac_tx_queue {
+   u32 tx_count_frames;
+   struct timer_list txtimer;
u32 queue_index;
struct stmmac_priv *priv_data;
struct dma_extended_desc *dma_etx ____cacheline_aligned_in_smp;
@@ -73,7 +75,14 @@ struct stmmac_rx_queue {
u32 rx_zeroc_thresh;
dma_addr_t dma_rx_phy;
u32 rx_tail_addr;
+};
+
+struct stmmac_channel {
struct napi_struct napi ____cacheline_aligned_in_smp;
+   struct stmmac_priv *priv_data;
+   u32 index;
+   int has_rx;
+   int has_tx;
 };
 
 struct stmmac_tc_entry {
@@ -109,14 +118,12 @@ struct stmmac_pps_cfg {
 
 struct stmmac_priv {
/* Frequently used values are kept adjacent for cache effect */
-   u32 tx_count_frames;
u32 tx_coal_frames;
u32 tx_coal_timer;
 
int tx_coalesce;
int hwts_tx_en;
bool tx_path_in_lpi_mode;
-   struct timer_list txtimer;
bool tso;
 
unsigned int dma_buf_sz;
@@ -137,6 +144,9 @@ struct stmmac_priv {
/* TX Queue */
struct stmmac_tx_queue tx_queue[MTL_MAX_TX_QUEUES];
 
+   /* Generic channel for NAPI */
+   struct stmmac_channel channel[STMMAC_CH_MAX];
+
bool oldlink;
int speed;
int oldduplex;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 9f458bb16f2a..ab9cc0143ff2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -148,12 +148,14 @@ static void stmmac_verify_args(void)
 static void stmmac_disable_all_queues(struct stmmac_priv *priv)
 {
u32 rx_queues_cnt = priv->plat->rx_queues_to_use;
+   u32 tx_queues_cnt = priv->plat->tx_queues_to_use;
+   u32 maxq = max(rx_queues_cnt, tx_queues_cnt);
u32 queue;
 
-   for (queue = 0; queue < rx_queues_cnt; queue++) {
-   struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
+   for (queue = 0; queue < maxq; queue++) {
+   struct stmmac_channel *ch = &priv->channel[queue];
 
-   napi_disable(&rx_q->napi);
+   napi_disable(&ch->napi);
}
 }
 
@@ -164,12 +166,14 @@ static void stmmac_enable_all_queues(struct stmmac_priv *priv)
 static void stmmac_enable_all_queues(struct stmmac_priv *priv)
 {
u32 rx_queues_cnt = priv->plat->rx_queues_to_use;
+   u32 tx_queues_cnt = priv->plat->tx_queues_to_use;
+   u32 maxq = max(rx_queues_cnt, tx_queues_cnt);
u32 queue;
 
-   for (queue = 0; queue < rx_queues_cnt; queue++) {
-   struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
+   for (queue = 0; queue < maxq; queue++) {
+   struct stmmac_channel *ch = &priv->channel[queue];
 
-   napi_enable(&rx_q->napi);
+   napi_enable(&ch->napi);
}
 }
 
@@ -1843,18 +1847,18 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
  * @queue: TX queue index
  * Description: it reclaims the transmit resources after transmission 
completes.
  */
-static 

[PATCH net 2/2] net: stmmac: Fixup the tail addr setting in xmit path

2018-09-17 Thread Jose Abreu
Currently we are always setting the tail address of the descriptor list to
the end of the pre-allocated list.

According to the databook this is not correct. The tail address should
point to the last available descriptor + 1, which means we have to update
the tail address every time we call the xmit function.

This should have no impact on older versions of the MAC, but in newer
versions there are some DMA features which allow the IP to fetch
descriptors in advance and in a non-sequential order, so it's critical
that we set the tail address correctly.
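
The "last descriptor + 1" rule reduces to simple pointer arithmetic (a
sketch; DESC_SIZE and the DMA_TX_SIZE value here are illustrative
stand-ins, not taken from the driver headers):

```c
#include <assert.h>
#include <stdint.h>

#define DESC_SIZE	16u	/* stand-in for sizeof(struct dma_desc) */
#define DMA_TX_SIZE	512u	/* ring length, illustrative */

/* Tail must point one past the most recently prepared descriptor
 * (cur_tx has already advanced past it), not at the fixed end of
 * the pre-allocated ring. */
static uint32_t tx_tail_addr(uint32_t dma_tx_phy, uint32_t cur_tx)
{
	return dma_tx_phy + cur_tx * DESC_SIZE;
}
```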

Signed-off-by: Jose Abreu 
Fixes: f748be531d70 ("stmmac: support new GMAC4")
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ab9cc0143ff2..75896d6ba6e2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2213,8 +2213,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
stmmac_init_tx_chan(priv, priv->ioaddr, priv->plat->dma_cfg,
tx_q->dma_tx_phy, chan);
 
-   tx_q->tx_tail_addr = tx_q->dma_tx_phy +
-   (DMA_TX_SIZE * sizeof(struct dma_desc));
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy;
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
   tx_q->tx_tail_addr, chan);
}
@@ -3003,6 +3002,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
 
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
 
return NETDEV_TX_OK;
@@ -3210,6 +3210,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
stmmac_enable_dma_transmission(priv, priv->ioaddr);
 
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
 
return NETDEV_TX_OK;
-- 
2.7.4




[PATCH net 0/2] net: stmmac: Coalesce and tail addr fixes

2018-09-17 Thread Jose Abreu
A fix for the coalesce timer and a fix for the tail address setting that
impacts XGMAC2 operation.

The series is:
Tested-by: Jerome Brunet 
on a113 s400 board (single queue)

Cc: Florian Fainelli 
Cc: Neil Armstrong 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 

Jose Abreu (2):
  net: stmmac: Rework coalesce timer and fix multi-queue races
  net: stmmac: Fixup the tail addr setting in xmit path

 drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  14 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 238 --
 include/linux/stmmac.h|   1 +
 4 files changed, 149 insertions(+), 108 deletions(-)

-- 
2.7.4




iproute2: Debian 9 No ELF support

2018-09-17 Thread Bo YU

Hello,
I have followed the instructions from:

https://cilium.readthedocs.io/en/latest/bpf/#bpftool

to test an XDP program.
But I cannot enable ELF support.

./configure --prefix=/usr
```output
TC schedulers
ATM no

libc has setns: yes
SELinux support: no
ELF support: no
libmnl support: yes
Berkeley DB: yes
need for strlcpy: yes
libcap support: yes
```
And I have installed libelf-dev:
```output
sudo apt show libelf-dev
Package: libelf-dev
Version: 0.168-1
Priority: optional
Section: libdevel
Source: elfutils
Maintainer: Kurt Roeckx 
Installed-Size: 353 kB
Depends: libelf1 (= 0.168-1)
Conflicts: libelfg0-dev
Homepage: https://sourceware.org/elfutils/
Tag: devel::library, role::devel-lib
```

And gcc version:
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

uname -a:
Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 GNU/Linux

Any help is appreciated.



Re: [PATCH net] netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev

2018-09-17 Thread Pablo Neira Ayuso
On Sun, Sep 16, 2018 at 09:14:42PM -0700, David Ahern wrote:
> Pablo:
> 
> DaveM has this marked as waiting for upstream. Any comment on this patch?

Please resend and Cc netfilter-de...@vger.kernel.org

Thanks David.


[PATCH v2 0/4] net: qcom/emac: add shared mdio bus support

2018-09-17 Thread Wang Dongsheng
The EMAC includes an MDIO controller, and the motherboard has more than
one PHY connected to the MDIO bus. So share the mii_bus with other MAC
devices that have no MDIO bus connected.

Tested: QDF2400 (ACPI), built-in/insmod/rmmod

V2:
 - Separate patch.
 - bindings: s/Since QDF2400 emac/Since emac/

Wang Dongsheng (4):
  net: qcom/emac: split phy_config to mdio bus create and get phy device
  dt-bindings: net: qcom: Add binding for shared mdio bus
  net: qcom/emac: add of shared mdio bus support
  net: qcom/emac: add acpi shared mdio bus support

 .../devicetree/bindings/net/qcom-emac.txt |   4 +
 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 207 ++
 drivers/net/ethernet/qualcomm/emac/emac.c |   8 +-
 3 files changed, 175 insertions(+), 44 deletions(-)

-- 
2.18.0



[PATCH v2 1/4] net: qcom/emac: split phy_config to mdio bus create and get phy device

2018-09-17 Thread Wang Dongsheng
This patch separates emac_mdio_bus_create and emac_get_phydev from
emac_phy_config, and does some code cleanup.

Signed-off-by: Wang Dongsheng 
---
 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 99 +++
 1 file changed, 57 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
index 53dbf1e163a8..2d16c6b9ef49 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
@@ -96,15 +96,14 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val)
return 0;
 }
 
-/* Configure the MDIO bus and connect the external PHY */
-int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt)
+static int emac_mdio_bus_create(struct platform_device *pdev,
+   struct emac_adapter *adpt)
 {
struct device_node *np = pdev->dev.of_node;
struct mii_bus *mii_bus;
int ret;
 
-   /* Create the mii_bus object for talking to the MDIO bus */
-   adpt->mii_bus = mii_bus = devm_mdiobus_alloc(&pdev->dev);
+   mii_bus = devm_mdiobus_alloc(&pdev->dev);
if (!mii_bus)
return -ENOMEM;
 
@@ -115,50 +114,66 @@ int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt)
mii_bus->parent = &pdev->dev;
mii_bus->priv = adpt;
 
-   if (has_acpi_companion(&pdev->dev)) {
-   u32 phy_addr;
-
-   ret = mdiobus_register(mii_bus);
-   if (ret) {
-   dev_err(&pdev->dev, "could not register mdio bus\n");
-   return ret;
-   }
-   ret = device_property_read_u32(&pdev->dev, "phy-channel",
-  &phy_addr);
-   if (ret)
-   /* If we can't read a valid phy address, then assume
-* that there is only one phy on this mdio bus.
-*/
-   adpt->phydev = phy_find_first(mii_bus);
-   else
-   adpt->phydev = mdiobus_get_phy(mii_bus, phy_addr);
-
-   /* of_phy_find_device() claims a reference to the phydev,
-* so we do that here manually as well. When the driver
-* later unloads, it can unilaterally drop the reference
-* without worrying about ACPI vs DT.
-*/
-   if (adpt->phydev)
-   get_device(&adpt->phydev->mdio.dev);
-   } else {
-   struct device_node *phy_np;
+   ret = of_mdiobus_register(mii_bus, has_acpi_companion(&pdev->dev) ?
+ NULL : np);
+   if (ret) {
+   dev_err(&pdev->dev, "Could not register mdio bus\n");
+   return ret;
+   }
 
-   ret = of_mdiobus_register(mii_bus, np);
-   if (ret) {
-   dev_err(&pdev->dev, "could not register mdio bus\n");
-   return ret;
-   }
+   adpt->mii_bus = mii_bus;
+   return 0;
+}
 
+static void emac_get_phydev(struct platform_device *pdev,
+   struct emac_adapter *adpt)
+{
+   struct device_node *np = pdev->dev.of_node;
+   struct mii_bus *bus = adpt->mii_bus;
+   struct device_node *phy_np;
+   u32 phy_addr;
+   int ret;
+
+   if (!has_acpi_companion(&pdev->dev)) {
phy_np = of_parse_phandle(np, "phy-handle", 0);
adpt->phydev = of_phy_find_device(phy_np);
of_node_put(phy_np);
+   return;
}
 
-   if (!adpt->phydev) {
-   dev_err(&pdev->dev, "could not find external phy\n");
-   mdiobus_unregister(mii_bus);
-   return -ENODEV;
-   }
+   ret = device_property_read_u32(&pdev->dev, "phy-channel",
+  &phy_addr);
+   if (ret)
+   /* If we can't read a valid phy address, then assume
+* that there is only one phy on this mdio bus.
+*/
+   adpt->phydev = phy_find_first(bus);
+   else
+   adpt->phydev = mdiobus_get_phy(bus, phy_addr);
+
+   /* of_phy_find_device() claims a reference to the phydev,
+* so we do that here manually as well. When the driver
+* later unloads, it can unilaterally drop the reference
+* without worrying about ACPI vs DT.
+*/
+   if (adpt->phydev)
+   get_device(&adpt->phydev->mdio.dev);
+}
 
-   return 0;
+/* Configure the MDIO bus and connect the external PHY */
+int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt)
+{
+   int ret;
+
+   ret = emac_mdio_bus_create(pdev, adpt);
+   if (ret)
+   return ret;
+
+   emac_get_phydev(pdev, adpt);
+   if (adpt->phydev)
+   return 0;

[PATCH v2 2/4] dt-bindings: net: qcom: Add binding for shared mdio bus

2018-09-17 Thread Wang Dongsheng
This property is copied from "ibm,emac.txt" to describe a shared MDIO bus.
Since the EMAC includes an MDIO controller, if the motherboard has more
than one PHY connected to an MDIO bus, this property will point to the
MAC device that owns the MDIO bus.

Signed-off-by: Wang Dongsheng 
---
V2: s/Since QDF2400 emac/Since emac/
---
 Documentation/devicetree/bindings/net/qcom-emac.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt b/Documentation/devicetree/bindings/net/qcom-emac.txt
index 346e6c7f47b7..50db71771358 100644
--- a/Documentation/devicetree/bindings/net/qcom-emac.txt
+++ b/Documentation/devicetree/bindings/net/qcom-emac.txt
@@ -24,6 +24,9 @@ Internal PHY node:
 The external phy child node:
 - reg : The phy address
 
+Optional properties:
+- mdio-device : Shared MDIO bus.
+
 Example:
 
 FSM9900:
@@ -86,6 +89,7 @@ soc {
reg = <0x0 0x3880 0x0 0x1>,
  <0x0 0x38816000 0x0 0x1000>;
interrupts = <0 256 4>;
+   mdio-device = <&emac1>;
 
clocks = <&gcc 0>, <&gcc 1>, <&gcc 3>, <&gcc 4>, <&gcc 5>,
 <&gcc 6>, <&gcc 7>;
-- 
2.18.0



[PATCH v2 4/4] net: qcom/emac: add acpi shared mdio bus support

2018-09-17 Thread Wang Dongsheng
Parsing _DSD package "mdio-device".

Signed-off-by: Wang Dongsheng 
---
 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
index 4f98f9a0ed54..7a96bcf15d3f 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
@@ -97,6 +97,51 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val)
return 0;
 }
 
+static int acpi_device_match(struct device *dev, void *fwnode)
+{
+   return dev->fwnode == fwnode;
+}
+
+static int emac_acpi_get_shared_bus(struct platform_device *pdev,
+   struct mii_bus **bus)
+{
+   struct fwnode_reference_args args;
+   struct fwnode_handle *fw_node;
+
+   struct device *shared_dev;
+   struct net_device *shared_netdev;
+   struct emac_adapter *shared_adpt;
+   int ret;
+
+   if (!has_acpi_companion(&pdev->dev))
+   return -ENODEV;
+
+   fw_node = acpi_fwnode_handle(ACPI_COMPANION(&pdev->dev));
+   ret = acpi_node_get_property_reference(fw_node, "mdio-device", 0,
+  &args);
+   if (ACPI_FAILURE(ret) || !is_acpi_device_node(args.fwnode)) {
+   dev_err(&pdev->dev, "Missing mdio-device property\n");
+   return -ENODEV;
+   }
+
+   shared_dev = bus_find_device(&platform_bus_type, NULL,
+args.fwnode,
+acpi_device_match);
+   if (!shared_dev)
+   return -EPROBE_DEFER;
+
+   shared_netdev = dev_get_drvdata(shared_dev);
+   if (!shared_netdev)
+   return -EPROBE_DEFER;
+
+   shared_adpt = netdev_priv(shared_netdev);
+   if (!shared_adpt->mii_bus)
+   return -EPROBE_DEFER;
+
+   *bus = shared_adpt->mii_bus;
+   return 0;
+}
+
 static int emac_of_get_shared_bus(struct platform_device *pdev,
  struct mii_bus **bus)
 {
@@ -137,6 +182,12 @@ static int emac_of_get_shared_bus(struct platform_device *pdev,
 static int __do_get_emac_mido_shared_bus(struct platform_device *pdev,
 struct emac_adapter *adpt)
 {
+   int ret;
+
+   ret = emac_acpi_get_shared_bus(pdev, &adpt->mii_bus);
+   if (adpt->mii_bus || ret == -EPROBE_DEFER)
+   return ret;
+
return emac_of_get_shared_bus(pdev, &adpt->mii_bus);
 }
 
-- 
2.18.0



[PATCH v2 3/4] net: qcom/emac: add of shared mdio bus support

2018-09-17 Thread Wang Dongsheng
Share the mii_bus with other MAC devices, because the EMAC includes an
MDIO controller and the motherboard has more than one PHY connected to
the MDIO bus.
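
The teardown rule this patch relies on ("only the bus creator can
unregister the mdio bus") boils down to comparing the device against the
bus's parent pointer (a toy sketch with illustrative struct definitions,
not the kernel types):

```c
#include <assert.h>
#include <stddef.h>

struct device { int id; };
struct mii_bus { struct device *parent; };

/* A consumer that merely borrowed a shared bus must not tear it down;
 * only the device that registered it (bus->parent) may. */
static int may_unregister(const struct device *dev, const struct mii_bus *bus)
{
	return dev == bus->parent;
}
```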

Signed-off-by: Wang Dongsheng 
---
 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 63 ++-
 drivers/net/ethernet/qualcomm/emac/emac.c |  8 ++-
 2 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
index 2d16c6b9ef49..4f98f9a0ed54 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
@@ -13,6 +13,7 @@
 /* Qualcomm Technologies, Inc. EMAC PHY Controller driver.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -96,8 +97,51 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val)
return 0;
 }
 
-static int emac_mdio_bus_create(struct platform_device *pdev,
-   struct emac_adapter *adpt)
+static int emac_of_get_shared_bus(struct platform_device *pdev,
+ struct mii_bus **bus)
+{
+   struct device_node *shared_node;
+   struct platform_device *shared_pdev;
+   struct net_device *shared_netdev;
+   struct emac_adapter *shared_adpt;
+   struct device_node *np = pdev->dev.of_node;
+
+   const phandle *prop;
+
+   prop = of_get_property(np, "mdio-device", NULL);
+   if (!prop) {
+   dev_err(&pdev->dev, "Missing mdio-device property\n");
+   return -ENODEV;
+   }
+
+   shared_node = of_find_node_by_phandle(*prop);
+   if (!shared_node)
+   return -ENODEV;
+
+   shared_pdev = of_find_device_by_node(shared_node);
+   if (!shared_pdev)
+   return -ENODEV;
+
+   shared_netdev = dev_get_drvdata(&shared_pdev->dev);
+   if (!shared_netdev)
+   return -EPROBE_DEFER;
+
+   shared_adpt = netdev_priv(shared_netdev);
+   if (!shared_adpt->mii_bus)
+   return -EPROBE_DEFER;
+
+   *bus = shared_adpt->mii_bus;
+   return 0;
+}
+
+static int __do_get_emac_mido_shared_bus(struct platform_device *pdev,
+struct emac_adapter *adpt)
+{
+   return emac_of_get_shared_bus(pdev, &adpt->mii_bus);
+}
+
+static int __do_emac_mido_bus_create(struct platform_device *pdev,
+struct emac_adapter *adpt)
 {
struct device_node *np = pdev->dev.of_node;
struct mii_bus *mii_bus;
@@ -125,6 +169,17 @@ static int emac_mdio_bus_create(struct platform_device *pdev,
return 0;
 }
 
+static int emac_mdio_bus_create(struct platform_device *pdev,
+   struct emac_adapter *adpt)
+{
+   bool shared_mdio;
+
+   shared_mdio = device_property_read_bool(&pdev->dev, "mdio-device");
+   if (shared_mdio)
+   return __do_get_emac_mido_shared_bus(pdev, adpt);
+   return __do_emac_mido_bus_create(pdev, adpt);
+}
+
 static void emac_get_phydev(struct platform_device *pdev,
struct emac_adapter *adpt)
 {
@@ -174,6 +229,8 @@ int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt)
return 0;
 
dev_err(&pdev->dev, "Could not find external phy\n");
-   mdiobus_unregister(adpt->mii_bus);
+   /* Only the bus creator can unregister mdio bus */
+   if (&pdev->dev == adpt->mii_bus->parent)
+   mdiobus_unregister(adpt->mii_bus);
return -ENODEV;
 }
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 2a0cbc535a2e..6e566b4c5a6b 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -727,7 +727,9 @@ static int emac_probe(struct platform_device *pdev)
netif_napi_del(&adpt->rx_q.napi);
 err_undo_mdiobus:
put_device(&adpt->phydev->mdio.dev);
-   mdiobus_unregister(adpt->mii_bus);
+   /* Only the bus creator can unregister mdio bus */
+   if (&pdev->dev == adpt->mii_bus->parent)
+   mdiobus_unregister(adpt->mii_bus);
 err_undo_clocks:
emac_clks_teardown(adpt);
 err_undo_netdev:
@@ -747,7 +749,9 @@ static int emac_remove(struct platform_device *pdev)
emac_clks_teardown(adpt);
 
put_device(&adpt->phydev->mdio.dev);
-   mdiobus_unregister(adpt->mii_bus);
+   /* Only the bus creator can unregister mdio bus */
+   if (&pdev->dev == adpt->mii_bus->parent)
+   mdiobus_unregister(adpt->mii_bus);
free_netdev(netdev);
 
if (adpt->phy.digital)
-- 
2.18.0



Re: [PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-17 Thread Steffen Klassert
On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> Avoid the socket lookup cost in udp_gro_receive if no socket has a
> gro callback configured.
> 
> Signed-off-by: Willem de Bruijn 

...

> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 4f6aa95a9b12..f44fe328aa0f 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
>  {
>   struct udphdr *uh = udp_gro_udphdr(skb);
>  
> - if (unlikely(!uh))
> + if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
>   goto flush;

If you use udp_encap_needed_key to enable UDP GRO, then a UDP
encapsulation socket will enable it too. Not sure if this is
intentional.

That said, enabling UDP GRO on a UDP encapsulation socket
(ESP in UDP etc.) will fail badly as then encrypted ESP
packets might be merged together. So we somehow should
make sure that this does not happen.

Anyway, this reminds me that we can support GRO for
UDP encapsulation. It just requires separate GRO
callbacks for the different encapsulation types.


Re: [PATCH RFC net-next 1/8] net: phy: Move linkmode helpers to somewhere public

2018-09-17 Thread Maxime Chevallier
On Fri, 14 Sep 2018 23:38:49 +0200
Andrew Lunn  wrote:

>phylink has some useful helpers for working with linkmode bitmaps.
>Move them to their own header so other code can use them.
>
>Signed-off-by: Andrew Lunn 

Reviewed-by: Maxime Chevallier 


Re: [PATCH RFC net-next 2/8] net: phy: Add phydev_warn()

2018-09-17 Thread Maxime Chevallier
On Fri, 14 Sep 2018 23:38:50 +0200
Andrew Lunn  wrote:

>Not all new style LINK_MODE bits can be converted into old style
>SUPPORTED bits. We need to warn when such a conversion is attempted.
>Add a helper for this.
>
>Signed-off-by: Andrew Lunn 

Reviewed-by: Maxime Chevallier 


Re: [PATCH RFC net-next 3/8] net: phy: Add helper to convert MII ADV register to a linkmode

2018-09-17 Thread Maxime Chevallier
On Fri, 14 Sep 2018 23:38:51 +0200
Andrew Lunn  wrote:

>The phy_mii_ioctl can be used to write a value into the MII_ADVERTISE
>register in the PHY. Since this changes the state of the PHY, we need
>to make the same change to phydev->advertising. Add a helper which can
>convert the register value to a linkmode.
>
>Signed-off-by: Andrew Lunn 

Reviewed-by: Maxime Chevallier 


Re: [PATCH RFC net-next 4/8] net: phy: Add helper for advertise to lcl value

2018-09-17 Thread Maxime Chevallier
On Fri, 14 Sep 2018 23:38:52 +0200
Andrew Lunn  wrote:

>Add a helper to convert the local advertising to LCL capabilities,
>which is then used to resolve pause flow control settings.
>
>Signed-off-by: Andrew Lunn 

Reviewed-by: Maxime Chevallier 


Re: [PATCH RFC net-next 5/8] net: phy: Add linkmode equivalents to some of the MII ethtool helpers

2018-09-17 Thread Maxime Chevallier
On Fri, 14 Sep 2018 23:38:53 +0200
Andrew Lunn  wrote:

>Add helpers which take a linkmode rather than a u32 ethtool value for
>advertising settings.
>
>Signed-off-by: Andrew Lunn 

Reviewed-by: Maxime Chevallier 


Re: iproute2: Debian 9 No ELF support

2018-09-17 Thread Daniel Borkmann
On 09/17/2018 10:23 AM, Bo YU wrote:
> Hello,
> I have followed the instructions from:
> 
> https://cilium.readthedocs.io/en/latest/bpf/#bpftool
> 
> to test an XDP program.
> But I cannot enable ELF support.
> 
> ./configure --prefix=/usr
> ```output
> TC schedulers
> ATM    no
> 
> libc has setns: yes
> SELinux support: no
> ELF support: no
> libmnl support: yes
> Berkeley DB: yes
> need for strlcpy: yes
> libcap support: yes
> ```
> And I have installed libelf-dev:
> ```output
> sudo apt show libelf-dev
> Package: libelf-dev
> Version: 0.168-1
> Priority: optional
> Section: libdevel
> Source: elfutils
> Maintainer: Kurt Roeckx 
> Installed-Size: 353 kB
> Depends: libelf1 (= 0.168-1)
> Conflicts: libelfg0-dev
> Homepage: https://sourceware.org/elfutils/
> Tag: devel::library, role::devel-lib
> ```
> 
> And gcc version:
> gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
> 
> uname -a:
> Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 GNU/Linux
> 
> Any help is appreciated.

Debian's official iproute2 packaging lists 'libelf-dev' in its
Build-Depends [0], so having libelf-dev installed should work ...

  [...]
  Build-Depends: bison,
   debhelper (>= 10~),
   flex,
   iptables-dev,
   libatm1-dev,
   libcap-dev,
   libdb-dev,
   libelf-dev,
   libmnl-dev,
   libselinux1-dev,
   linux-libc-dev,
   pkg-config,
   po-debconf,
   zlib1g-dev,
  [...]

Did you run into this one perhaps [1]? Do you have zlib1g-dev installed?

  [0] https://salsa.debian.org/debian/iproute2/blob/master/debian/control
  [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=885071


[PATCH v2 1/2] netlink: add NLA_REJECT policy type

2018-09-17 Thread Johannes Berg
From: Johannes Berg 

In some situations some netlink attributes may be used for output
only (kernel->userspace) or may be reserved for future use. It's
then helpful to be able to prevent userspace from using them in
messages sent to the kernel, since they'd otherwise be ignored and
any future use will become impossible if this happens.

Add NLA_REJECT to the policy which does nothing but reject (with
EINVAL) validation of any messages containing this attribute.
Allow for returning a specific extended ACK error message in the
validation_data pointer.

While at it, clear up the documentation a bit - the NLA_BITFIELD32
documentation was added to the list of len field descriptions.

Also, use NL_SET_BAD_ATTR() in one place where it's open-coded.

The specific case I have in mind now is a shared nested attribute
containing request/response data, and it would be pointless and
potentially confusing to have userspace include response data in
the messages that actually contain a request.
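The reject-with-message semantics described above can be sketched in plain userspace C. This is a simplified model, not the kernel code; toy_policy and toy_validate() are hypothetical stand-ins for struct nla_policy and validate_nla():

```c
#include <assert.h>
#include <string.h>

/* Userspace model of the NLA_REJECT semantics: an attribute whose
 * policy entry is NLA_REJECT always fails validation, and the policy
 * may carry a specific error string (modeling validation_data). */

enum { TOY_NLA_UNSPEC, TOY_NLA_U16, TOY_NLA_REJECT };

struct toy_policy {
	int type;
	const char *reject_msg; /* models pt->validation_data */
};

/* Returns 0 on success, -1 (think -EINVAL) on rejection; on rejection
 * *error_msg is set to the policy's message, if any. */
static int toy_validate(const struct toy_policy *policy, int maxtype,
			int attr_type, const char **error_msg)
{
	if (attr_type <= 0 || attr_type > maxtype)
		return 0; /* unknown attributes are skipped, as in nla_parse() */

	if (policy[attr_type].type == TOY_NLA_REJECT) {
		if (policy[attr_type].reject_msg && error_msg)
			*error_msg = policy[attr_type].reject_msg;
		return -1;
	}
	return 0;
}
```

With NLA_REJECT in a real policy, nla_parse() reports the string via the extended ACK instead of the generic "Attribute failed policy validation" message.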

Signed-off-by: Johannes Berg 
---
v2: preserve behaviour of overwriting the extack message, with
either the generic or the specific one now
---
 include/net/netlink.h | 13 -
 lib/nlattr.c  | 23 ---
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 0c154f98e987..b318b0a9f6c3 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -180,6 +180,7 @@ enum {
NLA_S32,
NLA_S64,
NLA_BITFIELD32,
+   NLA_REJECT,
__NLA_TYPE_MAX,
 };
 
@@ -208,9 +209,19 @@ enum {
  *    NLA_MSECS            Leaving the length field zero will verify the
  *                         given type fits, using it verifies minimum length
  *                         just like "All other"
- *    NLA_BITFIELD32       A 32-bit bitmap/bitselector attribute
+ *    NLA_BITFIELD32       Unused
+ *    NLA_REJECT           Unused
  *    All other            Minimum length of attribute payload
  *
+ * Meaning of `validation_data' field:
+ *    NLA_BITFIELD32       This is a 32-bit bitmap/bitselector attribute and
+ *                         validation data must point to a u32 value of valid
+ *                         flags
+ *    NLA_REJECT           This attribute is always rejected and validation
+ *                         data may point to a string to report as the error
+ *                         instead of the generic one in extended ACK.
+ *    All other            Unused
+ *
  * Example:
  * static const struct nla_policy my_policy[ATTR_MAX+1] = {
  * [ATTR_FOO] = { .type = NLA_U16 },
diff --git a/lib/nlattr.c b/lib/nlattr.c
index e335bcafa9e4..36d74b079151 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -69,7 +69,8 @@ static int validate_nla_bitfield32(const struct nlattr *nla,
 }
 
 static int validate_nla(const struct nlattr *nla, int maxtype,
-   const struct nla_policy *policy)
+   const struct nla_policy *policy,
+   const char **error_msg)
 {
const struct nla_policy *pt;
int minlen = 0, attrlen = nla_len(nla), type = nla_type(nla);
@@ -87,6 +88,11 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
}
 
switch (pt->type) {
+   case NLA_REJECT:
+   if (pt->validation_data && error_msg)
+   *error_msg = pt->validation_data;
+   return -EINVAL;
+
case NLA_FLAG:
if (attrlen > 0)
return -ERANGE;
@@ -180,11 +186,10 @@ int nla_validate(const struct nlattr *head, int len, int maxtype,
int rem;
 
nla_for_each_attr(nla, head, len, rem) {
-   int err = validate_nla(nla, maxtype, policy);
+   int err = validate_nla(nla, maxtype, policy, NULL);
 
if (err < 0) {
-   if (extack)
-   extack->bad_attr = nla;
+   NL_SET_BAD_ATTR(extack, nla);
return err;
}
}
@@ -250,11 +255,15 @@ int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
u16 type = nla_type(nla);
 
if (type > 0 && type <= maxtype) {
+   static const char _msg[] = "Attribute failed policy validation";
+   const char *msg = _msg;
+
if (policy) {
-   err = validate_nla(nla, maxtype, policy);
+   err = validate_nla(nla, maxtype, policy, &msg);
if (err < 0) {
-   NL_SET_ERR_MSG_ATTR(extack, nla,
-   "Attribute failed policy validation");
+   NL_SET_BAD_ATTR(extack, nla);
+   if (extack)
+   exta

[PATCH v2 2/2] netlink: add ethernet address policy types

2018-09-17 Thread Johannes Berg
From: Johannes Berg 

Commonly, ethernet addresses just use a policy of
{ .len = ETH_ALEN }
which leaves userspace free to send more data than it should,
which may hide bugs.

Introduce NLA_EXACT_LEN which checks for exact size, rejecting
the attribute if it's not exactly that length. Also add
NLA_EXACT_LEN_WARN which requires the minimum length and will
warn on longer attributes, for backward compatibility.

Use these to define NLA_POLICY_ETH_ADDR (new strict policy) and
NLA_POLICY_ETH_ADDR_COMPAT (compatible policy with warning);
these are used like this:

static const struct nla_policy [...] = {
[NL_ATTR_NAME] = NLA_POLICY_ETH_ADDR,
...
};
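The length semantics documented in this patch can be modeled in standalone C. This is a hedged sketch, not the kernel implementation; TOY_ETH_ALEN is a stand-in for ETH_ALEN:

```c
#include <assert.h>

#define TOY_ETH_ALEN 6 /* stand-in for ETH_ALEN */

enum len_verdict { LEN_OK = 0, LEN_REJECT = -1, LEN_WARN = 1 };

/* NLA_EXACT_LEN: the attribute payload must be exactly policy_len. */
static enum len_verdict check_exact_len(int attrlen, int policy_len)
{
	return attrlen == policy_len ? LEN_OK : LEN_REJECT;
}

/* NLA_EXACT_LEN_WARN: shorter payloads are rejected; longer ones are
 * accepted but flagged (the kernel logs a ratelimited warning). */
static enum len_verdict check_exact_len_warn(int attrlen, int policy_len)
{
	if (attrlen < policy_len)
		return LEN_REJECT;
	return attrlen == policy_len ? LEN_OK : LEN_WARN;
}
```

NLA_POLICY_ETH_ADDR then simply expands to the strict exact-length check with ETH_ALEN, and NLA_POLICY_ETH_ADDR_COMPAT to the warning variant.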

Signed-off-by: Johannes Berg 
---
v2: add only NLA_EXACT_LEN/NLA_EXACT_LEN_WARN and build on top
of that for ethernet address validation, so it can be extended
for other types (e.g. IPv6 addresses)
---
 include/net/netlink.h | 13 +
 lib/nlattr.c  |  8 +++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index b318b0a9f6c3..318b1ded3833 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -181,6 +181,8 @@ enum {
NLA_S64,
NLA_BITFIELD32,
NLA_REJECT,
+   NLA_EXACT_LEN,
+   NLA_EXACT_LEN_WARN,
__NLA_TYPE_MAX,
 };
 
@@ -211,6 +213,10 @@ enum {
  * just like "All other"
  *NLA_BITFIELD32   Unused
  *NLA_REJECT   Unused
+ *    NLA_EXACT_LEN        Attribute must have exactly this length, otherwise
+ *                         it is rejected.
+ *    NLA_EXACT_LEN_WARN   Attribute should have exactly this length, a warning
+ *                         is logged if it is longer, shorter is rejected.
  *    All other            Minimum length of attribute payload
  *
  * Meaning of `validation_data' field:
@@ -236,6 +242,13 @@ struct nla_policy {
void*validation_data;
 };
 
+#define NLA_POLICY_EXACT_LEN(_len)      { .type = NLA_EXACT_LEN, .len = _len }
+#define NLA_POLICY_EXACT_LEN_WARN(_len) { .type = NLA_EXACT_LEN_WARN, \
+                                          .len = _len }
+
+#define NLA_POLICY_ETH_ADDR             NLA_POLICY_EXACT_LEN(ETH_ALEN)
+#define NLA_POLICY_ETH_ADDR_COMPAT      NLA_POLICY_EXACT_LEN_WARN(ETH_ALEN)
+
 /**
  * struct nl_info - netlink source information
  * @nlh: Netlink message header of original request
diff --git a/lib/nlattr.c b/lib/nlattr.c
index 36d74b079151..bb6fe5ed4ecf 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -82,12 +82,18 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
 
BUG_ON(pt->type > NLA_TYPE_MAX);
 
-   if (nla_attr_len[pt->type] && attrlen != nla_attr_len[pt->type]) {
+   if ((nla_attr_len[pt->type] && attrlen != nla_attr_len[pt->type]) ||
+   (pt->type == NLA_EXACT_LEN_WARN && attrlen != pt->len)) {
pr_warn_ratelimited("netlink: '%s': attribute type %d has an invalid length.\n",
current->comm, type);
}
 
switch (pt->type) {
+   case NLA_EXACT_LEN:
+   if (attrlen != pt->len)
+   return -ERANGE;
+   break;
+
case NLA_REJECT:
if (pt->validation_data && error_msg)
*error_msg = pt->validation_data;
-- 
2.14.4



Re: [PATCH RFC net-next 0/8] Continue towards using linkmode in phylib

2018-09-17 Thread Maxime Chevallier
Hi Andrew,

On Fri, 14 Sep 2018 23:38:48 +0200
Andrew Lunn  wrote:

>These patches contain some further cleanup and helpers, and the first
>real patch towards using linkmode bitmaps in phylink.
>
>It is RFC because I don't like patch #7 and maybe somebody has a
>better idea how to do this. Ideally, we want to initialise a linux
>generic bitmap at compile time.

Thanks for that series. I've reviewed what I feel confident enough to,
I'll be happy to test the runtime "features" listing that I think you
plan to implement.

Maxime 


Re: [RFC PATCH 3/4] udp: implement GRO plain UDP sockets.

2018-09-17 Thread Paolo Abeni
Hi,

On Fri, 2018-09-14 at 09:48 -0700, Eric Dumazet wrote:
> Are you sure the data is actually fully copied to user space ?
> 
> tools/testing/selftests/net/udpgso_bench_rx.c
> 
> uses :
> 
> static char rbuf[ETH_DATA_LEN];
>/* MSG_TRUNC will make return value full datagram length */
>ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
> 
> So you need to change this program.

Thanks for the feedback.

You are right, I need to update udpgso_bench_rx. Making it
unconditionally read up to 64K bytes, I measure:

Before:
udp rx:962 MB/s   685339 calls/s

After:
udp rx:   1344 MB/s22812 calls/s

Top perf offenders for udpgso_bench_rx:
  31.83%  [kernel] [k] copy_user_enhanced_fast_string
   8.90%  [kernel] [k] skb_release_data
   7.97%  [kernel] [k] free_pcppages_bulk
   6.82%  [kernel] [k] copy_page_to_iter
   3.41%  [kernel] [k] skb_copy_datagram_iter
   2.01%  [kernel] [k] free_unref_page
   1.92%  [kernel] [k] __entry_SYSCALL_64_trampoline

Trivial note: with this even UDP sockets would benefit from remote skb
freeing, as the cost of skb_release_data is relevant for the GSO
packets.
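Eric's point about MSG_TRUNC can be demonstrated in a few lines: with MSG_TRUNC, recv() returns the full datagram length even when the receive buffer is smaller, so a fixed ETH_DATA_LEN buffer silently under-reads GRO-sized datagrams. A minimal Linux-only sketch, using an AF_UNIX datagram pair instead of real UDP sockets:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends one datagram of datagram_len bytes over a socketpair and
 * receives it into a bufsize-byte window with MSG_TRUNC set, returning
 * recv()'s result: the full datagram length, even if bufsize is
 * smaller. Sketch only; the real benchmark uses UDP sockets. */
static long recv_len_with_trunc(size_t datagram_len, size_t bufsize)
{
	int sv[2];
	static char payload[64 * 1024];
	static char rbuf[64 * 1024];
	long ret = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], payload, datagram_len, 0) == (long)datagram_len)
		ret = recv(sv[1], rbuf, bufsize, MSG_TRUNC | MSG_DONTWAIT);
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The benchmark fix is then simply to size the receive buffer at UDP's 64 KB maximum instead of ETH_DATA_LEN.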

> Also, GRO reception would mean that userspace can retrieve,
> not only full bytes of X datagrams, but also the gso_size (or length of 
> individual datagrams)
> 
> You can not know the size of the packets in advance, the sender will decide.

Thanks for pointing that out. I guess that implementing something like
cmsg(UDP_SEGMENT) as Willem suggests in the 8/8 patch would do, right?

I can have a look at that _if_ there is interest in this approach,

Cheers,

Paolo



Re: [RFC PATCH 2/4] net: enable UDP gro on demand.

2018-09-17 Thread Paolo Abeni
On Sun, 2018-09-16 at 14:23 -0400, Willem de Bruijn wrote:
> That udp gro implementation is clearly less complete than yours in
> this patchset. The point I wanted to bring up for discussion is not the
> protocol implementation, but the infrastructure for enabling it
> conditionally.

I'm still [trying to] process your patchset ;) So please pardon me
for any obvious interpretation mistakes...

> Assuming cycle cost is comparable, what do you think of  using the
> existing sk offload callbacks to enable this on a per-socket basis?

I have no objection to that, if there are no performance drawbacks.
In my measurements the retpoline cost is relevant for every indirect
call added. Using the existing sk offload approach will require an
additional indirect call per packet compared to the implementation
here.

> As for the protocol-wide knob, I do strongly prefer something that can
> work for all protocols, not just UDP. 

I like the general infrastructure idea. I think there is some agreement
on avoiding the addition of more user-controllable knobs, as we already
have a lot of them. If I read your patch correctly, user-space needs to
enable/disable UDP GRO explicitly via procfs, right?

I tried to look for something that does not require user action.

> I also implemented a version that
> atomically swaps the struct ptr instead of the flag based approach I sent
> for review. I'm fairly agnostic about that point. 

I think/fear security-oriented folks may scream about the somewhat
large deconstification?!

> One subtle issue is that I
> believe we need to keep the gro_complete callbacks enabled, as gro
> packets may be queued for completion when gro_receive gets disabled.

Good point, thanks! I missed that.

Cheers,

Paolo




Re: [PATCH bpf-next] tools/bpf: bpftool: improve output format for bpftool net

2018-09-17 Thread Daniel Borkmann
On 09/14/2018 11:49 PM, Yonghong Song wrote:
> This is a followup patch for Commit f6f3bac08ff9
> ("tools/bpf: bpftool: add net support").
> Some improvements are made for the bpftool net output.
> Specifically, the plain output is more concise so that each
> attachment nicely fits in one line.
> Compared to the previous output, the prog tag is removed
> since it can be easily obtained with the program id.
> Similar to xdp attachments, the device name is added
> to tc_filters attachments.
> 
> The bpf program attached through shared block
> mechanism is supported as well.
>   $ ip link add dev v1 type veth peer name v2
>   $ tc qdisc add dev v1 ingress_block 10 egress_block 20 clsact
>   $ tc qdisc add dev v2 ingress_block 10 egress_block 20 clsact
>   $ tc filter add block 10 protocol ip prio 25 bpf obj bpf_shared.o sec ingress flowid 1:1
>   $ tc filter add block 20 protocol ip prio 30 bpf obj bpf_cyclic.o sec classifier flowid 1:1
>   $ bpftool net
>   xdp [
>   ]
>   tc_filters [
>v2(7) qdisc_clsact_ingress bpf_shared.o:[ingress] id 23
>v2(7) qdisc_clsact_egress bpf_cyclic.o:[classifier] id 24
>v1(8) qdisc_clsact_ingress bpf_shared.o:[ingress] id 23
>v1(8) qdisc_clsact_egress bpf_cyclic.o:[classifier] id 24

Just one minor note for this one here, do we even need the "qdisc_" prefix?
Couldn't it just simply be "clsact/ingress", "clsact/egress", "htb" etc?

>   ]
> 
> The documentation and "bpftool net help" are updated
> to make it clear that current implementation only
> supports xdp and tc attachments. For programs
> attached to cgroups, "bpftool cgroup" can be used
> to dump attachments. For other programs e.g.
> sk_{filter,skb,msg,reuseport} and lwt/seg6,
> iproute2 tools should be used.
> 
> The new output:
>   $ bpftool net
>   xdp [
>eth0(2) id/drv 198

Could we change the "id/{drv,offload,generic} xyz" into e.g. "eth0(2)
{driver,offload,generic} id 198", meaning, the "id xyz" being a child of either
"driver", "offload" or "generic". Reason would be two-fold: i) we can keep the
"id xyz" notion consistent as used under "tc_filters", and ii) it allows to put
further information aside from just the "id" member under "driver", "offload"
or "generic" in the future.

>   ]
>   tc_filters [

Nit: can we use just "tc" for the above? Main use case would be clsact with one
of its two hooks anyway, and the term "filter" is sort of tc historic; while
being correct, bpf progs would do much more than just filtering, and context is
pretty clear anyway from the qdisc that we subsequently dump.

>eth0(2) qdisc_clsact_ingress fbflow_icmp id 335 act [{icmp_action id 336}]
>eth0(2) qdisc_clsact_egress fbflow_egress id 334
>   ]
>   $ bpftool -jp net
>   [{
> "xdp": [{
> "devname": "eth0",
> "ifindex": 2,
> "id/drv": 198
> }
> ],
> "tc_filters": [{
> "devname": "eth0",
> "ifindex": 2,
> "kind": "qdisc_clsact_ingress",
> "name": "fbflow_icmp",
> "id": 335,
> "act": [{
> "name": "icmp_action",
> "id": 336
> }
> ]
> },{
> "devname": "eth0",
> "ifindex": 2,
> "kind": "qdisc_clsact_egress",
> "name": "fbflow_egress",
> "id": 334
> }
> ]
> }
>   ]
> 
> Signed-off-by: Yonghong Song 


Re: [PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-17 Thread Paolo Abeni
On Fri, 2018-09-14 at 13:59 -0400, Willem de Bruijn wrote:
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 4f6aa95a9b12..f44fe328aa0f 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
>  {
>   struct udphdr *uh = udp_gro_udphdr(skb);
>  
> - if (unlikely(!uh))
> + if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
>   goto flush;
>  
>   /* Don't bother verifying checksum if we're going to flush anyway. */

If I read this correctly, once udp_encap_needed_key is enabled, it will
never be turned off, because the tunnel and encap socket shutdown paths
do not cope with udp_encap_needed_key.
Perhaps we should take care of that, too.

Cheers,

Paolo




[PATCH mlx5-next 1/4] net/mlx5: Rename incorrect naming in IFC file

2018-09-17 Thread Leon Romanovsky
From: Mark Bloch 

Remove a trailing underscore from the multicast/unicast names.

Signed-off-by: Mark Bloch 
Reviewed-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/qp.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 2 +-
 include/linux/mlx5/mlx5_ifc.h   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 1f35ecbefffe..8bada4b9 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1279,7 +1279,7 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev,
 
if (dev->rep)
MLX5_SET(tirc, tirc, self_lb_block,
-MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_);
+MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST);
 
err = mlx5_core_create_tir(dev->mdev, in, inlen, &rq->tirn);
 
@@ -1582,7 +1582,7 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 create_tir:
if (dev->rep)
MLX5_SET(tirc, tirc, self_lb_block,
-MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_);
+MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST);
 
err = mlx5_core_create_tir(dev->mdev, in, inlen, &qp->rss_qp.tirn);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index db3278cc052b..3078491cc0d0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -153,7 +153,7 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
 
if (enable_uc_lb)
MLX5_SET(modify_tir_in, in, ctx.self_lb_block,
-MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_);
+MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST);
 
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 3a4a2e0567e9..4c7a1d25d73b 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2559,8 +2559,8 @@ enum {
 };
 
 enum {
-   MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_= 0x1,
-   MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST_  = 0x2,
+   MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST= 0x1,
+   MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
 };
 
 struct mlx5_ifc_tirc_bits {
-- 
2.14.4



[PATCH rdma-next 3/4] RDMA/mlx5: Allow creating RAW ethernet QP with loopback support

2018-09-17 Thread Leon Romanovsky
From: Mark Bloch 

Expose two new flags:
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC

These flags can be used at creation time to allow a QP to receive
loopback traffic (unicast and multicast). We store the flags in the QP
so that the destroy path knows which flags the QP was created with.

Signed-off-by: Mark Bloch 
Reviewed-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  2 +-
 drivers/infiniband/hw/mlx5/qp.c  | 62 
 include/uapi/rdma/mlx5-abi.h |  2 ++
 3 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7b2af7e719c4..b258adb93097 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -440,7 +440,7 @@ struct mlx5_ib_qp {
struct list_headcq_send_list;
struct mlx5_rate_limit  rl;
u32 underlay_qpn;
-   booltunnel_offload_en;
+   u32 flags_en;
/* storage for qp sub type when core qp type is IB_QPT_DRIVER */
enum ib_qp_type qp_sub_type;
 };
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 8bada4b9..428e417e01da 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1258,8 +1258,9 @@ static bool tunnel_offload_supported(struct mlx5_core_dev *dev)
 
 static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev,
struct mlx5_ib_rq *rq, u32 tdn,
-   bool tunnel_offload_en)
+   u32 *qp_flags_en)
 {
+   u8 lb_flag = 0;
u32 *in;
void *tirc;
int inlen;
@@ -1274,12 +1275,21 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev,
MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_DIRECT);
MLX5_SET(tirc, tirc, inline_rqn, rq->base.mqp.qpn);
MLX5_SET(tirc, tirc, transport_domain, tdn);
-   if (tunnel_offload_en)
+   if (*qp_flags_en & MLX5_QP_FLAG_TUNNEL_OFFLOADS)
MLX5_SET(tirc, tirc, tunneled_offload_en, 1);
 
-   if (dev->rep)
-   MLX5_SET(tirc, tirc, self_lb_block,
-MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST);
+   if (*qp_flags_en & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC)
+   lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST;
+
+   if (*qp_flags_en & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC)
+   lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST;
+
+   if (dev->rep) {
+   lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST;
+   *qp_flags_en |= MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC;
+   }
+
+   MLX5_SET(tirc, tirc, self_lb_block, lb_flag);
 
err = mlx5_core_create_tir(dev->mdev, in, inlen, &rq->tirn);
 
@@ -1332,8 +1342,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
goto err_destroy_sq;
 
 
-   err = create_raw_packet_qp_tir(dev, rq, tdn,
-  qp->tunnel_offload_en);
+   err = create_raw_packet_qp_tir(dev, rq, tdn, &qp->flags_en);
if (err)
goto err_destroy_rq;
}
@@ -1410,6 +1419,7 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
u32 tdn = mucontext->tdn;
struct mlx5_ib_create_qp_rss ucmd = {};
size_t required_cmd_sz;
+   u8 lb_flag = 0;
 
if (init_attr->qp_type != IB_QPT_RAW_PACKET)
return -EOPNOTSUPP;
@@ -1444,7 +1454,9 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
return -EOPNOTSUPP;
}
 
-   if (ucmd.flags & ~MLX5_QP_FLAG_TUNNEL_OFFLOADS) {
+   if (ucmd.flags & ~(MLX5_QP_FLAG_TUNNEL_OFFLOADS |
+  MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC |
+  MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC)) {
mlx5_ib_dbg(dev, "invalid flags\n");
return -EOPNOTSUPP;
}
@@ -1461,6 +1473,16 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
return -EOPNOTSUPP;
}
 
+   if (ucmd.flags & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC || dev->rep) {
+   lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST;
+   qp->flags_en |= MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC;
+   }
+
+   if (ucmd.flags & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC) {
+   lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST;
+   qp->flags_en |= MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC;
+   }
+
err = ib_copy_to_udata(udata, &resp, min(udata->outlen, sizeof(resp)));
if (err) {
mlx5_ib_dbg(dev, "copy failed\n");
@@ -

[PATCH rdma-next 4/4] RDMA/mlx5: Enable vport loopback when user context or QP mandate

2018-09-17 Thread Leon Romanovsky
From: Mark Bloch 

A user can create a QP which accepts loopback traffic, but that's not
enough: we also need to enable loopback on the vport. Currently vport
loopback is enabled only when more than one user is using the IB device.
Update the logic to also consider whether a QP which supports loopback
was created, and if so enable vport loopback even if there is only a
single user.
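The net effect of the series is that vport loopback must be on whenever a second transport-domain user exists or at least one loopback-capable QP does. That bookkeeping can be modeled in a few lines of standalone C (hypothetical names; the driver updates this edge-triggered under dev->lb.mutex):

```c
#include <assert.h>

/* Model of the mlx5 vport-loopback bookkeeping after this series:
 * user_td counts transport-domain users, qps counts QPs created with
 * the new loopback flags; enabled mirrors the state pushed via
 * mlx5_nic_vport_update_local_lb(). */
struct lb_model {
	int user_td;
	int qps;
	int enabled;
};

static void lb_update(struct lb_model *s, int td_delta, int qp_delta)
{
	s->user_td += td_delta;
	s->qps += qp_delta;
	/* loopback is needed once a second TD user or any loopback QP exists */
	s->enabled = (s->user_td >= 2 || s->qps >= 1);
}
```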

Signed-off-by: Mark Bloch 
Reviewed-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/main.c| 40 +---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 
 drivers/infiniband/hw/mlx5/qp.c  | 34 +++---
 3 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b64861ba2c42..7d5fcf76466f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1571,28 +1571,44 @@ static void deallocate_uars(struct mlx5_ib_dev *dev,
mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]);
 }
 
-static int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev)
+int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 {
int err = 0;
 
mutex_lock(&dev->lb.mutex);
-   dev->lb.user_td++;
-
-   if (dev->lb.user_td == 2)
-   err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
+   if (td)
+   dev->lb.user_td++;
+   if (qp)
+   dev->lb.qps++;
+
+   if (dev->lb.user_td == 2 ||
+   dev->lb.qps == 1) {
+   if (!dev->lb.enabled) {
+   err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
+   dev->lb.enabled = true;
+   }
+   }
 
mutex_unlock(&dev->lb.mutex);
 
return err;
 }
 
-static void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev)
+void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 {
mutex_lock(&dev->lb.mutex);
-   dev->lb.user_td--;
-
-   if (dev->lb.user_td < 2)
-   mlx5_nic_vport_update_local_lb(dev->mdev, false);
+   if (td)
+   dev->lb.user_td--;
+   if (qp)
+   dev->lb.qps--;
+
+   if (dev->lb.user_td == 1 &&
+   dev->lb.qps == 0) {
+   if (dev->lb.enabled) {
+   mlx5_nic_vport_update_local_lb(dev->mdev, false);
+   dev->lb.enabled = false;
+   }
+   }
 
mutex_unlock(&dev->lb.mutex);
 }
@@ -1613,7 +1629,7 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn)
 !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
return err;
 
-   return mlx5_ib_enable_lb(dev);
+   return mlx5_ib_enable_lb(dev, true, false);
 }
 
 static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn)
@@ -1628,7 +1644,7 @@ static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn)
 !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
return;
 
-   mlx5_ib_disable_lb(dev);
+   mlx5_ib_disable_lb(dev, true, false);
 }
 
 static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index b258adb93097..99c853c56d31 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -882,6 +882,8 @@ struct mlx5_ib_lb_state {
/* protect the user_td */
struct mutexmutex;
u32 user_td;
+   int qps;
+   boolenabled;
 };
 
 struct mlx5_ib_dev {
@@ -1040,6 +1042,8 @@ int mlx5_ib_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *srq_attr);
 int mlx5_ib_destroy_srq(struct ib_srq *srq);
 int mlx5_ib_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
  const struct ib_recv_wr **bad_wr);
+int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp);
+void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp);
 struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd,
struct ib_qp_init_attr *init_attr,
struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 428e417e01da..1f318a47040c 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1256,6 +1266,16 @@ static bool tunnel_offload_supported(struct mlx5_core_dev *dev)
 MLX5_CAP_ETH(dev, tunnel_stateless_geneve_rx));
 }
 
+static void destroy_raw_packet_qp_tir(struct mlx5_ib_dev *dev,
+ struct mlx5_ib_rq *rq,
+ u32 qp_flags_en)
+{
+   if (qp_flags_en & (MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC |
+  MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC))
+ 

[PATCH rdma-next 2/4] RDMA/mlx5: Refactor transport domain bookkeeping logic

2018-09-17 Thread Leon Romanovsky
From: Mark Bloch 

In preparation for enabling loopback on a single user context, move the
logic that enables/disables loopback into separate functions and group
the variables under a single struct.

Signed-off-by: Mark Bloch 
Reviewed-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/main.c| 45 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 10 +---
 2 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 659af370a961..b64861ba2c42 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1571,6 +1571,32 @@ static void deallocate_uars(struct mlx5_ib_dev *dev,
mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]);
 }
 
+static int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev)
+{
+   int err = 0;
+
+   mutex_lock(&dev->lb.mutex);
+   dev->lb.user_td++;
+
+   if (dev->lb.user_td == 2)
+   err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
+
+   mutex_unlock(&dev->lb.mutex);
+
+   return err;
+}
+
+static void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev)
+{
+   mutex_lock(&dev->lb.mutex);
+   dev->lb.user_td--;
+
+   if (dev->lb.user_td < 2)
+   mlx5_nic_vport_update_local_lb(dev->mdev, false);
+
+   mutex_unlock(&dev->lb.mutex);
+}
+
 static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn)
 {
int err;
@@ -1587,14 +1613,7 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn)
 !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
return err;
 
-   mutex_lock(&dev->lb_mutex);
-   dev->user_td++;
-
-   if (dev->user_td == 2)
-   err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
-
-   mutex_unlock(&dev->lb_mutex);
-   return err;
+   return mlx5_ib_enable_lb(dev);
 }
 
 static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn)
@@ -1609,13 +1628,7 @@ static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn)
 !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
return;
 
-   mutex_lock(&dev->lb_mutex);
-   dev->user_td--;
-
-   if (dev->user_td < 2)
-   mlx5_nic_vport_update_local_lb(dev->mdev, false);
-
-   mutex_unlock(&dev->lb_mutex);
+   mlx5_ib_disable_lb(dev);
 }
 
 static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
@@ -5970,7 +5983,7 @@ int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
if ((MLX5_CAP_GEN(dev->mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
(MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) ||
 MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
-   mutex_init(&dev->lb_mutex);
+   mutex_init(&dev->lb.mutex);
 
return 0;
 }
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 6c57872fdc4e..7b2af7e719c4 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -878,6 +878,12 @@ to_mcounters(struct ib_counters *ibcntrs)
 int parse_flow_flow_action(struct mlx5_ib_flow_action *maction,
   bool is_egress,
   struct mlx5_flow_act *action);
+struct mlx5_ib_lb_state {
+   /* protect the user_td */
+   struct mutexmutex;
+   u32 user_td;
+};
+
 struct mlx5_ib_dev {
struct ib_deviceib_dev;
const struct uverbs_object_tree_def *driver_trees[7];
@@ -919,9 +925,7 @@ struct mlx5_ib_dev {
const struct mlx5_ib_profile*profile;
struct mlx5_eswitch_rep *rep;
 
-   /* protect the user_td */
-   struct mutexlb_mutex;
-   u32 user_td;
+   struct mlx5_ib_lb_state lb;
u8  umr_fence;
struct list_headib_dev_list;
u64 sys_image_guid;
-- 
2.14.4



[PATCH rdma-next 0/4] mlx5 vport loopback

2018-09-17 Thread Leon Romanovsky
From: Leon Romanovsky 

Hi,

This is a short series from Mark which extends the handling of loopback
traffic. Originally mlx5 IB dynamically enabled/disabled both unicast
and multicast loopback based on the number of users. However, RAW
ethernet QPs need more granular control.

Thanks

Mark Bloch (4):
  net/mlx5: Rename incorrect naming in IFC file
  RDMA/mlx5: Refactor transport domain bookkeeping logic
  RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
  RDMA/mlx5: Enable vport loopback when user context or QP mandate

 drivers/infiniband/hw/mlx5/main.c  | 61 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h   | 16 +++-
 drivers/infiniband/hw/mlx5/qp.c| 96 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  2 +-
 include/linux/mlx5/mlx5_ifc.h  |  4 +-
 include/uapi/rdma/mlx5-abi.h   |  2 +
 6 files changed, 138 insertions(+), 43 deletions(-)

--
2.14.4



Re: [PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-17 Thread Steffen Klassert
On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> Avoid the socket lookup cost in udp_gro_receive if no socket has a
> gro callback configured.

It would be nice if we could do GRO not just for GRO-configured
sockets, but also for flows that are going to be IPsec transformed
or directly forwarded.

Maybe in case that forwarding is enabled on the receiving device,
inet_gro_receive() could do a route lookup and allow GRO if the
route lookup returned a forwarding route.

For flows that are likely software segmented after that, it would
be worth building packet chains instead of merging the payload.
Packets of the same flow could travel together, and it would save
the cost of the packet merging and segmenting. This could be done
similarly to what I proposed for the list receive case:

https://www.spinics.net/lists/netdev/msg522706.html

How GRO should be done could even be configured
by replacing the net_offload pointer, similar
to what Paolo proposed in his patchset with
the inet_update_offload() function.


[PATCH][net-next] veth: rename pcpu_vstats as pcpu_lstats

2018-09-17 Thread Li RongQing
struct pcpu_vstats and struct pcpu_lstats have the same members and
usage, and pcpu_lstats is used in many files, so rename
pcpu_vstats to pcpu_lstats to remove the duplicate definition.

Signed-off-by: Zhang Yu 
Signed-off-by: Li RongQing 
---
 drivers/net/veth.c| 22 --
 include/linux/netdevice.h |  1 -
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index bc8faf13a731..aeecb5892e26 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -37,12 +37,6 @@
 #define VETH_XDP_TXBIT(0)
 #define VETH_XDP_REDIR BIT(1)
 
-struct pcpu_vstats {
-   u64 packets;
-   u64 bytes;
-   struct u64_stats_sync   syncp;
-};
-
 struct veth_rq {
struct napi_struct  xdp_napi;
struct net_device   *dev;
@@ -217,7 +211,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
 
skb_tx_timestamp(skb);
if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) {
-   struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
+   struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
 
u64_stats_update_begin(&stats->syncp);
stats->bytes += length;
@@ -236,7 +230,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
-static u64 veth_stats_one(struct pcpu_vstats *result, struct net_device *dev)
+static u64 veth_stats_one(struct pcpu_lstats *result, struct net_device *dev)
 {
struct veth_priv *priv = netdev_priv(dev);
int cpu;
@@ -244,7 +238,7 @@ static u64 veth_stats_one(struct pcpu_vstats *result, 
struct net_device *dev)
result->packets = 0;
result->bytes = 0;
for_each_possible_cpu(cpu) {
-   struct pcpu_vstats *stats = per_cpu_ptr(dev->vstats, cpu);
+   struct pcpu_lstats *stats = per_cpu_ptr(dev->lstats, cpu);
u64 packets, bytes;
unsigned int start;
 
@@ -264,7 +258,7 @@ static void veth_get_stats64(struct net_device *dev,
 {
struct veth_priv *priv = netdev_priv(dev);
struct net_device *peer;
-   struct pcpu_vstats one;
+   struct pcpu_lstats one;
 
tot->tx_dropped = veth_stats_one(&one, dev);
tot->tx_bytes = one.bytes;
@@ -830,13 +824,13 @@ static int veth_dev_init(struct net_device *dev)
 {
int err;
 
-   dev->vstats = netdev_alloc_pcpu_stats(struct pcpu_vstats);
-   if (!dev->vstats)
+   dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
+   if (!dev->lstats)
return -ENOMEM;
 
err = veth_alloc_queues(dev);
if (err) {
-   free_percpu(dev->vstats);
+   free_percpu(dev->lstats);
return err;
}
 
@@ -846,7 +840,7 @@ static int veth_dev_init(struct net_device *dev)
 static void veth_dev_free(struct net_device *dev)
 {
veth_free_queues(dev);
-   free_percpu(dev->vstats);
+   free_percpu(dev->lstats);
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index baed5d5088c5..1cbbf77a685f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2000,7 +2000,6 @@ struct net_device {
struct pcpu_lstats __percpu *lstats;
struct pcpu_sw_netstats __percpu*tstats;
struct pcpu_dstats __percpu *dstats;
-   struct pcpu_vstats __percpu *vstats;
};
 
 #if IS_ENABLED(CONFIG_GARP)
-- 
2.16.2



[PATCH mlx5-next 03/25] net/mlx5: Set uid as part of RQ commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of RQ commands so that the firmware can manage the
RQ object in a secure way.

This will enable an RQ that was created by a verbs application
to be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 16 ++--
 include/linux/mlx5/mlx5_ifc.h|  6 +++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 04f72a1cdbcc..0ca68ef54d93 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -540,6 +540,17 @@ int mlx5_core_xrcd_dealloc(struct mlx5_core_dev *dev, u32 
xrcdn)
 }
 EXPORT_SYMBOL_GPL(mlx5_core_xrcd_dealloc);
 
+static void destroy_rq_tracked(struct mlx5_core_dev *dev, u32 rqn, u16 uid)
+{
+   u32 in[MLX5_ST_SZ_DW(destroy_rq_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(destroy_rq_out)] = {0};
+
+   MLX5_SET(destroy_rq_in, in, opcode, MLX5_CMD_OP_DESTROY_RQ);
+   MLX5_SET(destroy_rq_in, in, rqn, rqn);
+   MLX5_SET(destroy_rq_in, in, uid, uid);
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
 int mlx5_core_create_rq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen,
struct mlx5_core_qp *rq)
 {
@@ -550,6 +561,7 @@ int mlx5_core_create_rq_tracked(struct mlx5_core_dev *dev, 
u32 *in, int inlen,
if (err)
return err;
 
+   rq->uid = MLX5_GET(create_rq_in, in, uid);
rq->qpn = rqn;
err = create_resource_common(dev, rq, MLX5_RES_RQ);
if (err)
@@ -558,7 +570,7 @@ int mlx5_core_create_rq_tracked(struct mlx5_core_dev *dev, 
u32 *in, int inlen,
return 0;
 
 err_destroy_rq:
-   mlx5_core_destroy_rq(dev, rq->qpn);
+   destroy_rq_tracked(dev, rq->qpn, rq->uid);
 
return err;
 }
@@ -568,7 +580,7 @@ void mlx5_core_destroy_rq_tracked(struct mlx5_core_dev *dev,
  struct mlx5_core_qp *rq)
 {
destroy_resource_common(dev, rq);
-   mlx5_core_destroy_rq(dev, rq->qpn);
+   destroy_rq_tracked(dev, rq->qpn, rq->uid);
 }
 EXPORT_SYMBOL(mlx5_core_destroy_rq_tracked);
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index e5a0d3ecfaad..01b707666fb4 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -5489,7 +5489,7 @@ enum {
 
 struct mlx5_ifc_modify_rq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6165,7 +6165,7 @@ struct mlx5_ifc_destroy_rq_out_bits {
 
 struct mlx5_ifc_destroy_rq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6848,7 +6848,7 @@ struct mlx5_ifc_create_rq_out_bits {
 
 struct mlx5_ifc_create_rq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
-- 
2.14.4



[PATCH rdma-next 00/24] Extend DEVX functionality

2018-09-17 Thread Leon Romanovsky
From: Leon Romanovsky 

From Yishai,

This series enables the DEVX functionality in a wider scope,
specifically:
- It enables using kernel objects that were created by the verbs
  API in the DEVX flow.
- It enables white-list commands without a DEVX user context.
- It enables the IB link layer under the CAP_NET_RAW capability.
- It exposes the PRM handles for RAW QP (i.e. TIRN, TISN, RQN, SQN)
  to be used later on directly by the DEVX interface.

In general, each object that is created/destroyed/modified via verbs
will be stamped with a UID based on its user context. This is already
done for DEVX object commands.

This will enable the firmware to enforce the usage of kernel objects
from the DEVX flow by validating that the same UID is used and the resources are
really related to the same user.

For example, a CQ created with verbs will be stamped with a UID, and
once it is referenced by a DEVX create QP command, the firmware will
validate that the input CQN really belongs to the UID that issued the
create QP command.

Following the above, all the PRM objects (except the public ones which
are managed by the kernel, e.g. FLOW) will carry a UID in their
create/modify/destroy commands. The detection of UMEM vs. physical
addresses in the relevant commands will be done by the firmware according
to a 'umem valid' bit, as the UID may be used in both cases.

The series also enables white-list commands which don't require a
specific DEVX context; instead, a device UID is used so that
the firmware will mask unprivileged functionality. The IB link layer
is also enabled once the CAP_NET_RAW permission exists.

To enable DEVX commands to later use the RAW QP underlay objects
(e.g. TIRN, RQN, etc.), the UHW output for this case was extended to
return this data when a DEVX context is used.

Thanks

Leon Romanovsky (1):
  net/mlx5: Update mlx5_ifc with DEVX UID bits

Yishai Hadas (24):
  net/mlx5: Set uid as part of CQ commands
  net/mlx5: Set uid as part of QP commands
  net/mlx5: Set uid as part of RQ commands
  net/mlx5: Set uid as part of SQ commands
  net/mlx5: Set uid as part of SRQ commands
  net/mlx5: Set uid as part of DCT commands
  IB/mlx5: Set uid as part of CQ creation
  IB/mlx5: Set uid as part of QP creation
  IB/mlx5: Set uid as part of RQ commands
  IB/mlx5: Set uid as part of SQ commands
  IB/mlx5: Set uid as part of TIR commands
  IB/mlx5: Set uid as part of TIS commands
  IB/mlx5: Set uid as part of RQT commands
  IB/mlx5: Set uid as part of PD commands
  IB/mlx5: Set uid as part of TD commands
  IB/mlx5: Set uid as part of SRQ commands
  IB/mlx5: Set uid as part of DCT commands
  IB/mlx5: Set uid as part of XRCD commands
  IB/mlx5: Set uid as part of MCG commands
  IB/mlx5: Set valid umem bit on DEVX
  IB/mlx5: Expose RAW QP device handles to user space
  IB/mlx5: Manage device uid for DEVX white list commands
  IB/mlx5: Enable DEVX white list commands
  IB/mlx5: Enable DEVX on IB

 drivers/infiniband/hw/mlx5/cmd.c  | 129 ++
 drivers/infiniband/hw/mlx5/cmd.h  |  14 ++
 drivers/infiniband/hw/mlx5/cq.c   |   1 +
 drivers/infiniband/hw/mlx5/devx.c | 182 +++---
 drivers/infiniband/hw/mlx5/main.c |  80 +++
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  15 +--
 drivers/infiniband/hw/mlx5/qp.c   | 141 +++-
 drivers/infiniband/hw/mlx5/srq.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/cq.c  |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  81 
 drivers/net/ethernet/mellanox/mlx5/core/srq.c |  30 -
 include/linux/mlx5/cq.h   |   1 +
 include/linux/mlx5/driver.h   |   1 +
 include/linux/mlx5/mlx5_ifc.h | 135 +++
 include/linux/mlx5/qp.h   |   1 +
 include/linux/mlx5/srq.h  |   1 +
 include/uapi/rdma/mlx5-abi.h  |  13 ++
 17 files changed, 657 insertions(+), 173 deletions(-)

--
2.14.4



[PATCH mlx5-next 01/25] net/mlx5: Set uid as part of CQ commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of CQ commands so that the firmware can manage the CQ
object in a secure way.

This will enable a CQ that was created by a verbs application to be
used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 4 
 include/linux/mlx5/cq.h  | 1 +
 include/linux/mlx5/mlx5_ifc.h| 6 +++---
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index a4179122a279..4b85abb5c9f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -109,6 +109,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq,
cq->cons_index = 0;
cq->arm_sn = 0;
cq->eq = eq;
+   cq->uid = MLX5_GET(create_cq_in, in, uid);
refcount_set(&cq->refcount, 1);
init_completion(&cq->free);
if (!cq->comp)
@@ -144,6 +145,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq,
memset(dout, 0, sizeof(dout));
MLX5_SET(destroy_cq_in, din, opcode, MLX5_CMD_OP_DESTROY_CQ);
MLX5_SET(destroy_cq_in, din, cqn, cq->cqn);
+   MLX5_SET(destroy_cq_in, din, uid, cq->uid);
mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout));
return err;
 }
@@ -165,6 +167,7 @@ int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq)
 
MLX5_SET(destroy_cq_in, in, opcode, MLX5_CMD_OP_DESTROY_CQ);
MLX5_SET(destroy_cq_in, in, cqn, cq->cqn);
+   MLX5_SET(destroy_cq_in, in, uid, cq->uid);
err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
if (err)
return err;
@@ -196,6 +199,7 @@ int mlx5_core_modify_cq(struct mlx5_core_dev *dev, struct 
mlx5_core_cq *cq,
u32 out[MLX5_ST_SZ_DW(modify_cq_out)] = {0};
 
MLX5_SET(modify_cq_in, in, opcode, MLX5_CMD_OP_MODIFY_CQ);
+   MLX5_SET(modify_cq_in, in, uid, cq->uid);
return mlx5_cmd_exec(dev, in, inlen, out, sizeof(out));
 }
 EXPORT_SYMBOL(mlx5_core_modify_cq);
diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h
index 0ef6138eca49..31a750570c38 100644
--- a/include/linux/mlx5/cq.h
+++ b/include/linux/mlx5/cq.h
@@ -61,6 +61,7 @@ struct mlx5_core_cq {
int reset_notify_added;
struct list_headreset_notify;
struct mlx5_eq  *eq;
+   u16 uid;
 };
 
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index a14c4eaff53f..e62a0825d35c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -5630,7 +5630,7 @@ enum {
 
 struct mlx5_ifc_modify_cq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6405,7 +6405,7 @@ struct mlx5_ifc_destroy_cq_out_bits {
 
 struct mlx5_ifc_destroy_cq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -7165,7 +7165,7 @@ struct mlx5_ifc_create_cq_out_bits {
 
 struct mlx5_ifc_create_cq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
-- 
2.14.4



[PATCH mlx5-next 02/25] net/mlx5: Set uid as part of QP commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of QP commands so that the firmware can manage the
QP object in a secure way.

This will enable a QP that was created by a verbs application to
be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 45 +---
 include/linux/mlx5/mlx5_ifc.h| 22 +++---
 include/linux/mlx5/qp.h  |  1 +
 3 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 4ca07bfb6b14..04f72a1cdbcc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -240,6 +240,7 @@ int mlx5_core_create_qp(struct mlx5_core_dev *dev,
if (err)
return err;
 
+   qp->uid = MLX5_GET(create_qp_in, in, uid);
qp->qpn = MLX5_GET(create_qp_out, out, qpn);
mlx5_core_dbg(dev, "qpn = 0x%x\n", qp->qpn);
 
@@ -261,6 +262,7 @@ int mlx5_core_create_qp(struct mlx5_core_dev *dev,
memset(dout, 0, sizeof(dout));
MLX5_SET(destroy_qp_in, din, opcode, MLX5_CMD_OP_DESTROY_QP);
MLX5_SET(destroy_qp_in, din, qpn, qp->qpn);
+   MLX5_SET(destroy_qp_in, din, uid, qp->uid);
mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout));
return err;
 }
@@ -320,6 +322,7 @@ int mlx5_core_destroy_qp(struct mlx5_core_dev *dev,
 
MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
MLX5_SET(destroy_qp_in, in, qpn, qp->qpn);
+   MLX5_SET(destroy_qp_in, in, uid, qp->uid);
err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
if (err)
return err;
@@ -373,7 +376,7 @@ static void mbox_free(struct mbox_info *mbox)
 
 static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn,
u32 opt_param_mask, void *qpc,
-   struct mbox_info *mbox)
+   struct mbox_info *mbox, u16 uid)
 {
mbox->out = NULL;
mbox->in = NULL;
@@ -381,26 +384,32 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev 
*dev, u16 opcode, int qpn,
 #define MBOX_ALLOC(mbox, typ)  \
mbox_alloc(mbox, MLX5_ST_SZ_BYTES(typ##_in), 
MLX5_ST_SZ_BYTES(typ##_out))
 
-#define MOD_QP_IN_SET(typ, in, _opcode, _qpn) \
-   MLX5_SET(typ##_in, in, opcode, _opcode); \
-   MLX5_SET(typ##_in, in, qpn, _qpn)
-
-#define MOD_QP_IN_SET_QPC(typ, in, _opcode, _qpn, _opt_p, _qpc) \
-   MOD_QP_IN_SET(typ, in, _opcode, _qpn); \
-   MLX5_SET(typ##_in, in, opt_param_mask, _opt_p); \
-   memcpy(MLX5_ADDR_OF(typ##_in, in, qpc), _qpc, MLX5_ST_SZ_BYTES(qpc))
+#define MOD_QP_IN_SET(typ, in, _opcode, _qpn, _uid) \
+   do { \
+   MLX5_SET(typ##_in, in, opcode, _opcode); \
+   MLX5_SET(typ##_in, in, qpn, _qpn); \
+   MLX5_SET(typ##_in, in, uid, _uid); \
+   } while (0)
+
+#define MOD_QP_IN_SET_QPC(typ, in, _opcode, _qpn, _opt_p, _qpc, _uid) \
+   do { \
+   MOD_QP_IN_SET(typ, in, _opcode, _qpn, _uid); \
+   MLX5_SET(typ##_in, in, opt_param_mask, _opt_p); \
+   memcpy(MLX5_ADDR_OF(typ##_in, in, qpc), \
+  _qpc, MLX5_ST_SZ_BYTES(qpc)); \
+   } while (0)
 
switch (opcode) {
/* 2RST & 2ERR */
case MLX5_CMD_OP_2RST_QP:
if (MBOX_ALLOC(mbox, qp_2rst))
return -ENOMEM;
-   MOD_QP_IN_SET(qp_2rst, mbox->in, opcode, qpn);
+   MOD_QP_IN_SET(qp_2rst, mbox->in, opcode, qpn, uid);
break;
case MLX5_CMD_OP_2ERR_QP:
if (MBOX_ALLOC(mbox, qp_2err))
return -ENOMEM;
-   MOD_QP_IN_SET(qp_2err, mbox->in, opcode, qpn);
+   MOD_QP_IN_SET(qp_2err, mbox->in, opcode, qpn, uid);
break;
 
/* MODIFY with QPC */
@@ -408,37 +417,37 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev 
*dev, u16 opcode, int qpn,
if (MBOX_ALLOC(mbox, rst2init_qp))
return -ENOMEM;
MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
- opt_param_mask, qpc);
+ opt_param_mask, qpc, uid);
break;
case MLX5_CMD_OP_INIT2RTR_QP:
if (MBOX_ALLOC(mbox, init2rtr_qp))
return -ENOMEM;
MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn,
- opt_param_mask, qpc);
+ opt_param_mask, qpc, uid);
break;
case MLX5_CMD_OP_RTR2RTS_QP:
if (MBOX_ALLOC(mbox, rtr2rts_qp))
return -ENOMEM;
MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn,
- 

[PATCH mlx5-next 06/25] net/mlx5: Set uid as part of DCT commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of DCT commands so that the firmware can manage the
DCT object in a secure way.

This will enable a DCT that was created by a verbs application
to be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 4 
 include/linux/mlx5/mlx5_ifc.h| 6 +++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 9bdb3dc425ce..30f0e5ea7b2c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -211,6 +211,7 @@ int mlx5_core_create_dct(struct mlx5_core_dev *dev,
}
 
qp->qpn = MLX5_GET(create_dct_out, out, dctn);
+   qp->uid = MLX5_GET(create_dct_in, in, uid);
err = create_resource_common(dev, qp, MLX5_RES_DCT);
if (err)
goto err_cmd;
@@ -219,6 +220,7 @@ int mlx5_core_create_dct(struct mlx5_core_dev *dev,
 err_cmd:
MLX5_SET(destroy_dct_in, din, opcode, MLX5_CMD_OP_DESTROY_DCT);
MLX5_SET(destroy_dct_in, din, dctn, qp->qpn);
+   MLX5_SET(destroy_dct_in, din, uid, qp->uid);
mlx5_cmd_exec(dev, (void *)&in, sizeof(din),
  (void *)&out, sizeof(dout));
return err;
@@ -277,6 +279,7 @@ static int mlx5_core_drain_dct(struct mlx5_core_dev *dev,
 
MLX5_SET(drain_dct_in, in, opcode, MLX5_CMD_OP_DRAIN_DCT);
MLX5_SET(drain_dct_in, in, dctn, qp->qpn);
+   MLX5_SET(drain_dct_in, in, uid, qp->uid);
return mlx5_cmd_exec(dev, (void *)&in, sizeof(in),
 (void *)&out, sizeof(out));
 }
@@ -303,6 +306,7 @@ int mlx5_core_destroy_dct(struct mlx5_core_dev *dev,
destroy_resource_common(dev, &dct->mqp);
MLX5_SET(destroy_dct_in, in, opcode, MLX5_CMD_OP_DESTROY_DCT);
MLX5_SET(destroy_dct_in, in, dctn, qp->qpn);
+   MLX5_SET(destroy_dct_in, in, uid, qp->uid);
err = mlx5_cmd_exec(dev, (void *)&in, sizeof(in),
(void *)&out, sizeof(out));
return err;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 5a2f0b02483a..efa4a60431d4 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -5919,7 +5919,7 @@ struct mlx5_ifc_drain_dct_out_bits {
 
 struct mlx5_ifc_drain_dct_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6383,7 +6383,7 @@ struct mlx5_ifc_destroy_dct_out_bits {
 
 struct mlx5_ifc_destroy_dct_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -7139,7 +7139,7 @@ struct mlx5_ifc_create_dct_out_bits {
 
 struct mlx5_ifc_create_dct_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
-- 
2.14.4



[PATCH rdma-next 10/25] IB/mlx5: Set uid as part of RQ commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of RQ commands so that the firmware can manage the
RQ object in a secure way.

The uid for the destroy command is set by mlx5_core.

This will enable an RQ that was created by a verbs application to
be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/qp.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 786db05dfb91..31c69da7ccdf 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1190,7 +1190,7 @@ static size_t get_rq_pas_size(void *qpc)
 
 static int create_raw_packet_qp_rq(struct mlx5_ib_dev *dev,
   struct mlx5_ib_rq *rq, void *qpin,
-  size_t qpinlen)
+  size_t qpinlen, u16 uid)
 {
struct mlx5_ib_qp *mqp = rq->base.container_mibqp;
__be64 *pas;
@@ -1211,6 +1211,7 @@ static int create_raw_packet_qp_rq(struct mlx5_ib_dev 
*dev,
if (!in)
return -ENOMEM;
 
+   MLX5_SET(create_rq_in, in, uid, uid);
rqc = MLX5_ADDR_OF(create_rq_in, in, ctx);
if (!(rq->flags & MLX5_IB_RQ_CVLAN_STRIPPING))
MLX5_SET(rqc, rqc, vsd, 1);
@@ -1328,6 +1329,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
struct mlx5_ib_ucontext *mucontext = to_mucontext(ucontext);
int err;
u32 tdn = mucontext->tdn;
+   u16 uid = mucontext->devx_uid;
 
if (qp->sq.wqe_cnt) {
err = create_raw_packet_qp_tis(dev, qp, sq, tdn);
@@ -1349,7 +1351,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
rq->flags |= MLX5_IB_RQ_CVLAN_STRIPPING;
if (qp->flags & MLX5_IB_QP_PCI_WRITE_END_PADDING)
rq->flags |= MLX5_IB_RQ_PCI_WRITE_END_PADDING;
-   err = create_raw_packet_qp_rq(dev, rq, in, inlen);
+   err = create_raw_packet_qp_rq(dev, rq, in, inlen, uid);
if (err)
goto err_destroy_sq;
 
@@ -2840,7 +2842,8 @@ static int ib_mask_to_mlx5_opt(int ib_mask)
 
 static int modify_raw_packet_qp_rq(struct mlx5_ib_dev *dev,
   struct mlx5_ib_rq *rq, int new_state,
-  const struct mlx5_modify_raw_qp_param 
*raw_qp_param)
+  const struct mlx5_modify_raw_qp_param 
*raw_qp_param,
+  u16 uid)
 {
void *in;
void *rqc;
@@ -2853,6 +2856,7 @@ static int modify_raw_packet_qp_rq(struct mlx5_ib_dev 
*dev,
return -ENOMEM;
 
MLX5_SET(modify_rq_in, in, rq_state, rq->state);
+   MLX5_SET(modify_rq_in, in, uid, uid);
 
rqc = MLX5_ADDR_OF(modify_rq_in, in, ctx);
MLX5_SET(rqc, rqc, state, new_state);
@@ -2957,6 +2961,7 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
u8 tx_affinity)
 {
struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp;
+   u16 uid = to_mucontext(qp->ibqp.uobject->context)->devx_uid;
struct mlx5_ib_rq *rq = &raw_packet_qp->rq;
struct mlx5_ib_sq *sq = &raw_packet_qp->sq;
int modify_rq = !!qp->rq.wqe_cnt;
@@ -3000,7 +3005,8 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
}
 
if (modify_rq) {
-   err =  modify_raw_packet_qp_rq(dev, rq, rq_state, raw_qp_param);
+   err =  modify_raw_packet_qp_rq(dev, rq, rq_state, raw_qp_param,
+  uid);
if (err)
return err;
}
@@ -5407,6 +5413,8 @@ static int  create_rq(struct mlx5_ib_rwq *rwq, struct 
ib_pd *pd,
if (!in)
return -ENOMEM;
 
+   MLX5_SET(create_rq_in, in, uid,
+to_mucontext(pd->uobject->context)->devx_uid);
rqc = MLX5_ADDR_OF(create_rq_in, in, ctx);
MLX5_SET(rqc,  rqc, mem_rq_type,
 MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE);
@@ -5792,6 +5800,8 @@ int mlx5_ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr 
*wq_attr,
if (wq_state == IB_WQS_ERR)
wq_state = MLX5_RQC_STATE_ERR;
MLX5_SET(modify_rq_in, in, rq_state, curr_wq_state);
+   MLX5_SET(modify_rq_in, in, uid,
+to_mucontext(wq->uobject->context)->devx_uid);
MLX5_SET(rqc, rqc, state, wq_state);
 
if (wq_attr_mask & IB_WQ_FLAGS) {
-- 
2.14.4



[PATCH rdma-next 09/25] IB/mlx5: Set uid as part of QP creation

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of QP creation so that the firmware can manage the
QP object in a secure way.

The uid for the destroy and the modify commands is set by mlx5_core.

This will enable a QP that was created by a verbs application to
be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/qp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index e7ebe50ffdb5..786db05dfb91 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -850,6 +850,7 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct 
ib_pd *pd,
goto err_umem;
}
 
+   MLX5_SET(create_qp_in, *in, uid, context->devx_uid);
pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, *in, pas);
if (ubuffer->umem)
mlx5_ib_populate_pas(dev, ubuffer->umem, page_shift, pas, 0);
-- 
2.14.4



[PATCH rdma-next 08/25] IB/mlx5: Set uid as part of CQ creation

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of CQ creation so that the firmware can manage the
CQ object in a secure way.

The uid for the destroy and the modify commands is set by mlx5_core.

This will enable a CQ that was created by a verbs application to
be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 495fa6e651ea..a41519dc8d3a 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -877,6 +877,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct 
ib_udata *udata,
cq->private_flags |= MLX5_IB_CQ_PR_FLAGS_CQE_128_PAD;
}
 
+   MLX5_SET(create_cq_in, *cqb, uid, to_mucontext(context)->devx_uid);
return 0;
 
 err_cqb:
-- 
2.14.4



[PATCH mlx5-next 05/25] net/mlx5: Set uid as part of SRQ commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of SRQ commands so that the firmware can manage the
SRQ object in a secure way.

This will enable an SRQ that was created by a verbs application
to be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/srq.c | 30 ---
 include/linux/mlx5/driver.h   |  1 +
 include/linux/mlx5/mlx5_ifc.h | 22 ++--
 include/linux/mlx5/srq.h  |  1 +
 4 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/srq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
index 23cc337a96c9..216d44ad061a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/srq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
@@ -166,6 +166,7 @@ static int create_srq_cmd(struct mlx5_core_dev *dev, struct 
mlx5_core_srq *srq,
if (!create_in)
return -ENOMEM;
 
+   MLX5_SET(create_srq_in, create_in, uid, in->uid);
srqc = MLX5_ADDR_OF(create_srq_in, create_in, srq_context_entry);
pas = MLX5_ADDR_OF(create_srq_in, create_in, pas);
 
@@ -178,8 +179,10 @@ static int create_srq_cmd(struct mlx5_core_dev *dev, 
struct mlx5_core_srq *srq,
err = mlx5_cmd_exec(dev, create_in, inlen, create_out,
sizeof(create_out));
kvfree(create_in);
-   if (!err)
+   if (!err) {
srq->srqn = MLX5_GET(create_srq_out, create_out, srqn);
+   srq->uid = in->uid;
+   }
 
return err;
 }
@@ -193,6 +196,7 @@ static int destroy_srq_cmd(struct mlx5_core_dev *dev,
MLX5_SET(destroy_srq_in, srq_in, opcode,
 MLX5_CMD_OP_DESTROY_SRQ);
MLX5_SET(destroy_srq_in, srq_in, srqn, srq->srqn);
+   MLX5_SET(destroy_srq_in, srq_in, uid, srq->uid);
 
return mlx5_cmd_exec(dev, srq_in, sizeof(srq_in),
 srq_out, sizeof(srq_out));
@@ -208,6 +212,7 @@ static int arm_srq_cmd(struct mlx5_core_dev *dev, struct 
mlx5_core_srq *srq,
MLX5_SET(arm_rq_in, srq_in, op_mod, MLX5_ARM_RQ_IN_OP_MOD_SRQ);
MLX5_SET(arm_rq_in, srq_in, srq_number, srq->srqn);
MLX5_SET(arm_rq_in, srq_in, lwm,  lwm);
+   MLX5_SET(arm_rq_in, srq_in, uid, srq->uid);
 
return  mlx5_cmd_exec(dev, srq_in, sizeof(srq_in),
  srq_out, sizeof(srq_out));
@@ -260,6 +265,7 @@ static int create_xrc_srq_cmd(struct mlx5_core_dev *dev,
if (!create_in)
return -ENOMEM;
 
+   MLX5_SET(create_xrc_srq_in, create_in, uid, in->uid);
xrc_srqc = MLX5_ADDR_OF(create_xrc_srq_in, create_in,
xrc_srq_context_entry);
pas  = MLX5_ADDR_OF(create_xrc_srq_in, create_in, pas);
@@ -277,6 +283,7 @@ static int create_xrc_srq_cmd(struct mlx5_core_dev *dev,
goto out;
 
srq->srqn = MLX5_GET(create_xrc_srq_out, create_out, xrc_srqn);
+   srq->uid = in->uid;
 out:
kvfree(create_in);
return err;
@@ -291,6 +298,7 @@ static int destroy_xrc_srq_cmd(struct mlx5_core_dev *dev,
MLX5_SET(destroy_xrc_srq_in, xrcsrq_in, opcode,
 MLX5_CMD_OP_DESTROY_XRC_SRQ);
MLX5_SET(destroy_xrc_srq_in, xrcsrq_in, xrc_srqn, srq->srqn);
+   MLX5_SET(destroy_xrc_srq_in, xrcsrq_in, uid, srq->uid);
 
return mlx5_cmd_exec(dev, xrcsrq_in, sizeof(xrcsrq_in),
 xrcsrq_out, sizeof(xrcsrq_out));
@@ -306,6 +314,7 @@ static int arm_xrc_srq_cmd(struct mlx5_core_dev *dev,
MLX5_SET(arm_xrc_srq_in, xrcsrq_in, op_mod,   
MLX5_ARM_XRC_SRQ_IN_OP_MOD_XRC_SRQ);
MLX5_SET(arm_xrc_srq_in, xrcsrq_in, xrc_srqn, srq->srqn);
MLX5_SET(arm_xrc_srq_in, xrcsrq_in, lwm,  lwm);
+   MLX5_SET(arm_xrc_srq_in, xrcsrq_in, uid, srq->uid);
 
return  mlx5_cmd_exec(dev, xrcsrq_in, sizeof(xrcsrq_in),
  xrcsrq_out, sizeof(xrcsrq_out));
@@ -365,10 +374,13 @@ static int create_rmp_cmd(struct mlx5_core_dev *dev, 
struct mlx5_core_srq *srq,
wq = MLX5_ADDR_OF(rmpc, rmpc, wq);
 
MLX5_SET(rmpc, rmpc, state, MLX5_RMPC_STATE_RDY);
+   MLX5_SET(create_rmp_in, create_in, uid, in->uid);
set_wq(wq, in);
memcpy(MLX5_ADDR_OF(rmpc, rmpc, wq.pas), in->pas, pas_size);
 
err = mlx5_core_create_rmp(dev, create_in, inlen, &srq->srqn);
+   if (!err)
+   srq->uid = in->uid;
 
kvfree(create_in);
return err;
@@ -377,7 +389,13 @@ static int create_rmp_cmd(struct mlx5_core_dev *dev, 
struct mlx5_core_srq *srq,
 static int destroy_rmp_cmd(struct mlx5_core_dev *dev,
   struct mlx5_core_srq *srq)
 {
-   return mlx5_core_destroy_rmp(dev, srq->srqn);
+   u32 in[MLX5_ST_SZ_DW(destroy_rmp_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(destroy_rmp_out)] = {0};
+
+   MLX5_SET(destro

[PATCH rdma-next 12/25] IB/mlx5: Set uid as part of TIR commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of TIR commands so that the firmware can manage the
TIR object in a secure way.

This will enable a TIR that was created by a verbs application to
be used by the DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cmd.c | 11 +++
 drivers/infiniband/hw/mlx5/cmd.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c  | 24 
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index c84fef9a8a08..e150ae44e06a 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -197,3 +197,14 @@ int mlx5_cmd_query_ext_ppcnt_counters(struct mlx5_core_dev 
*dev, void *out)
return  mlx5_core_access_reg(dev, in, sz, out, sz, MLX5_REG_PPCNT,
 0, 0);
 }
+
+void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid)
+{
+   u32 in[MLX5_ST_SZ_DW(destroy_tir_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(destroy_tir_out)] = {0};
+
+   MLX5_SET(destroy_tir_in, in, opcode, MLX5_CMD_OP_DESTROY_TIR);
+   MLX5_SET(destroy_tir_in, in, tirn, tirn);
+   MLX5_SET(destroy_tir_in, in, uid, uid);
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 88cbb1c41703..274090a38c4b 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -47,4 +47,5 @@ int mlx5_cmd_modify_cong_params(struct mlx5_core_dev *mdev,
 int mlx5_cmd_alloc_memic(struct mlx5_memic *memic, phys_addr_t *addr,
 u64 length, u32 alignment);
 int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 addr, u64 length);
+void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 24370635008e..07bf5128bee4 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -37,6 +37,7 @@
 #include 
 #include "mlx5_ib.h"
 #include "ib_rep.h"
+#include "cmd.h"
 
 /* not supported currently */
 static int wq_signature;
@@ -1262,17 +1263,19 @@ static bool tunnel_offload_supported(struct 
mlx5_core_dev *dev)
 
 static void destroy_raw_packet_qp_tir(struct mlx5_ib_dev *dev,
  struct mlx5_ib_rq *rq,
- u32 qp_flags_en)
+ u32 qp_flags_en,
+ u16 uid)
 {
if (qp_flags_en & (MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC |
   MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC))
mlx5_ib_disable_lb(dev, false, true);
-   mlx5_core_destroy_tir(dev->mdev, rq->tirn);
+   mlx5_cmd_destroy_tir(dev->mdev, rq->tirn, uid);
 }
 
 static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev,
struct mlx5_ib_rq *rq, u32 tdn,
-   u32 *qp_flags_en)
+   u32 *qp_flags_en,
+   u16 uid)
 {
u8 lb_flag = 0;
u32 *in;
@@ -1285,6 +1288,7 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev 
*dev,
if (!in)
return -ENOMEM;
 
+   MLX5_SET(create_tir_in, in, uid, uid);
tirc = MLX5_ADDR_OF(create_tir_in, in, ctx);
MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_DIRECT);
MLX5_SET(tirc, tirc, inline_rqn, rq->base.mqp.qpn);
@@ -1311,7 +1315,7 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev 
*dev,
err = mlx5_ib_enable_lb(dev, false, true);
 
if (err)
-   destroy_raw_packet_qp_tir(dev, rq, 0);
+   destroy_raw_packet_qp_tir(dev, rq, 0, uid);
}
kvfree(in);
 
@@ -1356,8 +1360,8 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
if (err)
goto err_destroy_sq;
 
-
-   err = create_raw_packet_qp_tir(dev, rq, tdn, &qp->flags_en);
+   err = create_raw_packet_qp_tir(dev, rq, tdn, &qp->flags_en,
+  uid);
if (err)
goto err_destroy_rq;
}
@@ -1385,9 +1389,10 @@ static void destroy_raw_packet_qp(struct mlx5_ib_dev 
*dev,
struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp;
struct mlx5_ib_sq *sq = &raw_packet_qp->sq;
struct mlx5_ib_rq *rq = &raw_packet_qp->rq;
+   u16 uid = to_mucontext(qp->ibqp.uobject->context)->devx_uid;
 
if (qp->rq.wqe_cnt) {
-   destroy_raw_packet_qp_tir(dev, rq, qp->flags_en);
+   destroy_raw_packet_qp_tir(dev, rq, qp->flags_en, uid);
destroy_raw_packet_qp_rq(dev, rq);
}
 

[PATCH mlx5-next 04/25] net/mlx5: Set uid as part of SQ commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of SQ commands so that the firmware can manage the
SQ object in a secure way.

This will enable an SQ that was created by a verbs application to be
used by the DEVX flow when the uid matches.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 16 ++--
 include/linux/mlx5/mlx5_ifc.h|  6 +++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c 
b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 0ca68ef54d93..9bdb3dc425ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -584,6 +584,17 @@ void mlx5_core_destroy_rq_tracked(struct mlx5_core_dev 
*dev,
 }
 EXPORT_SYMBOL(mlx5_core_destroy_rq_tracked);
 
+static void destroy_sq_tracked(struct mlx5_core_dev *dev, u32 sqn, u16 uid)
+{
+   u32 in[MLX5_ST_SZ_DW(destroy_sq_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(destroy_sq_out)] = {0};
+
+   MLX5_SET(destroy_sq_in, in, opcode, MLX5_CMD_OP_DESTROY_SQ);
+   MLX5_SET(destroy_sq_in, in, sqn, sqn);
+   MLX5_SET(destroy_sq_in, in, uid, uid);
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
 int mlx5_core_create_sq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen,
struct mlx5_core_qp *sq)
 {
@@ -594,6 +605,7 @@ int mlx5_core_create_sq_tracked(struct mlx5_core_dev *dev, 
u32 *in, int inlen,
if (err)
return err;
 
+   sq->uid = MLX5_GET(create_sq_in, in, uid);
sq->qpn = sqn;
err = create_resource_common(dev, sq, MLX5_RES_SQ);
if (err)
@@ -602,7 +614,7 @@ int mlx5_core_create_sq_tracked(struct mlx5_core_dev *dev, 
u32 *in, int inlen,
return 0;
 
 err_destroy_sq:
-   mlx5_core_destroy_sq(dev, sq->qpn);
+   destroy_sq_tracked(dev, sq->qpn, sq->uid);
 
return err;
 }
@@ -612,7 +624,7 @@ void mlx5_core_destroy_sq_tracked(struct mlx5_core_dev *dev,
  struct mlx5_core_qp *sq)
 {
destroy_resource_common(dev, sq);
-   mlx5_core_destroy_sq(dev, sq->qpn);
+   destroy_sq_tracked(dev, sq->qpn, sq->uid);
 }
 EXPORT_SYMBOL(mlx5_core_destroy_sq_tracked);
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 01b707666fb4..8151488f6570 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -5382,7 +5382,7 @@ struct mlx5_ifc_modify_sq_out_bits {
 
 struct mlx5_ifc_modify_sq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6097,7 +6097,7 @@ struct mlx5_ifc_destroy_sq_out_bits {
 
 struct mlx5_ifc_destroy_sq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6770,7 +6770,7 @@ struct mlx5_ifc_create_sq_out_bits {
 
 struct mlx5_ifc_create_sq_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
-- 
2.14.4



[PATCH mlx5-next 07/25] net/mlx5: Update mlx5_ifc with DEVX UID bits

2018-09-17 Thread Leon Romanovsky
From: Leon Romanovsky 

Add DEVX information to WQ, SRQ, CQ, TIR, TIS, QP,
RQ, XRCD, PD, MKEY and MCG.

Signed-off-by: Leon Romanovsky 
---
 include/linux/mlx5/mlx5_ifc.h | 67 +++
 1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index efa4a60431d4..0f460fb22c31 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1291,7 +1291,9 @@ struct mlx5_ifc_wq_bits {
u8 reserved_at_118[0x3];
u8 log_wq_sz[0x5];
 
-   u8 reserved_at_120[0x3];
+   u8 dbr_umem_valid[0x1];
+   u8 wq_umem_valid[0x1];
+   u8 reserved_at_122[0x1];
u8 log_hairpin_num_packets[0x5];
u8 reserved_at_128[0x3];
u8 log_hairpin_data_sz[0x5];
@@ -2365,7 +2367,10 @@ struct mlx5_ifc_qpc_bits {
 
u8 dc_access_key[0x40];
 
-   u8 reserved_at_680[0xc0];
+   u8 reserved_at_680[0x3];
+   u8 dbr_umem_valid[0x1];
+
+   u8 reserved_at_684[0xbc];
 };
 
 struct mlx5_ifc_roce_addr_layout_bits {
@@ -2465,7 +2470,7 @@ struct mlx5_ifc_xrc_srqc_bits {
 
u8 wq_signature[0x1];
u8 cont_srq[0x1];
-   u8 reserved_at_22[0x1];
+   u8 dbr_umem_valid[0x1];
u8 rlky[0x1];
u8 basic_cyclic_rcv_wqe[0x1];
u8 log_rq_stride[0x3];
@@ -3129,7 +3134,9 @@ enum {
 
 struct mlx5_ifc_cqc_bits {
u8 status[0x4];
-   u8 reserved_at_4[0x4];
+   u8 reserved_at_4[0x2];
+   u8 dbr_umem_valid[0x1];
+   u8 reserved_at_7[0x1];
u8 cqe_sz[0x3];
u8 cc[0x1];
u8 reserved_at_c[0x1];
@@ -5315,7 +5322,7 @@ struct mlx5_ifc_modify_tis_bitmask_bits {
 
 struct mlx5_ifc_modify_tis_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -5354,7 +5361,7 @@ struct mlx5_ifc_modify_tir_out_bits {
 
 struct mlx5_ifc_modify_tir_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -5455,7 +5462,7 @@ struct mlx5_ifc_rqt_bitmask_bits {
 
 struct mlx5_ifc_modify_rqt_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -5642,7 +5649,10 @@ struct mlx5_ifc_modify_cq_in_bits {
 
struct mlx5_ifc_cqc_bits cq_context;
 
-   u8 reserved_at_280[0x600];
+   u8 reserved_at_280[0x40];
+
+   u8 cq_umem_valid[0x1];
+   u8 reserved_at_2c1[0x5bf];
 
u8 pas[0][0x40];
 };
@@ -5963,7 +5973,7 @@ struct mlx5_ifc_detach_from_mcg_out_bits {
 
 struct mlx5_ifc_detach_from_mcg_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6031,7 +6041,7 @@ struct mlx5_ifc_destroy_tis_out_bits {
 
 struct mlx5_ifc_destroy_tis_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6053,7 +6063,7 @@ struct mlx5_ifc_destroy_tir_out_bits {
 
 struct mlx5_ifc_destroy_tir_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6143,7 +6153,7 @@ struct mlx5_ifc_destroy_rqt_out_bits {
 
 struct mlx5_ifc_destroy_rqt_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6508,7 +6518,7 @@ struct mlx5_ifc_dealloc_xrcd_out_bits {
 
 struct mlx5_ifc_dealloc_xrcd_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6596,7 +6606,7 @@ struct mlx5_ifc_dealloc_pd_out_bits {
 
 struct mlx5_ifc_dealloc_pd_in_bits {
u8 opcode[0x10];
-   u8 reserved_at_10[0x10];
+   u8 uid[0x10];
 
u8 reserved_at_20[0x10];
u8 op_mod[0x10];
@@ -6675,7 +6685,9 @@ struct mlx5_ifc_create_xrc_srq_in_bits {
 
struct mlx5_ifc_xrc_srqc_bits xrc_srq_context_entry;
 
-   u8 reserved_at_280[0x600];
+   u8 reserved_at_280[0x40];
+   u8 xrc_srq_umem_valid[0x1];
+   u8 reserved_at_2c1[0x5bf];

[PATCH rdma-next 19/25] IB/mlx5: Set uid as part of XRCD commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of XRCD commands so that the firmware can manage the
XRCD object in a secure way.

This will enable an XRCD that was created by a verbs application to be
used by the DEVX flow when the uid matches.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cmd.c | 25 +
 drivers/infiniband/hw/mlx5/cmd.h |  2 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c  |  8 ++--
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 9da10fbb7e23..51c39bc77ac7 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -271,3 +271,28 @@ void mlx5_cmd_dealloc_transport_domain(struct 
mlx5_core_dev *dev, u32 tdn,
MLX5_SET(dealloc_transport_domain_in, in, transport_domain, tdn);
mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+
+int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid)
+{
+   u32 out[MLX5_ST_SZ_DW(alloc_xrcd_out)] = {0};
+   u32 in[MLX5_ST_SZ_DW(alloc_xrcd_in)]   = {0};
+   int err;
+
+   MLX5_SET(alloc_xrcd_in, in, opcode, MLX5_CMD_OP_ALLOC_XRCD);
+   MLX5_SET(alloc_xrcd_in, in, uid, uid);
+   err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+   if (!err)
+   *xrcdn = MLX5_GET(alloc_xrcd_out, out, xrcd);
+   return err;
+}
+
+int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid)
+{
+   u32 out[MLX5_ST_SZ_DW(dealloc_xrcd_out)] = {0};
+   u32 in[MLX5_ST_SZ_DW(dealloc_xrcd_in)]   = {0};
+
+   MLX5_SET(dealloc_xrcd_in, in, opcode, MLX5_CMD_OP_DEALLOC_XRCD);
+   MLX5_SET(dealloc_xrcd_in, in, xrcd, xrcdn);
+   MLX5_SET(dealloc_xrcd_in, in, uid, uid);
+   return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 3a1d611216fb..76823e86fd17 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -55,4 +55,6 @@ int mlx5_cmd_alloc_transport_domain(struct mlx5_core_dev 
*dev, u32 *tdn,
u16 uid);
 void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn,
   u16 uid);
+int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid);
+int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 88ea0df71d94..f582bd05c180 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -548,6 +548,7 @@ struct mlx5_ib_srq {
 struct mlx5_ib_xrcd {
struct ib_xrcd  ibxrcd;
u32 xrcdn;
+   u16 uid;
 };
 
 enum mlx5_ib_mtt_access_flags {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 1a3b79405260..00b36b971ffa 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -5341,6 +5341,7 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device 
*ibdev,
struct mlx5_ib_dev *dev = to_mdev(ibdev);
struct mlx5_ib_xrcd *xrcd;
int err;
+   u16 uid;
 
if (!MLX5_CAP_GEN(dev->mdev, xrc))
return ERR_PTR(-ENOSYS);
@@ -5349,12 +5350,14 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device 
*ibdev,
if (!xrcd)
return ERR_PTR(-ENOMEM);
 
-   err = mlx5_core_xrcd_alloc(dev->mdev, &xrcd->xrcdn);
+   uid = context ? to_mucontext(context)->devx_uid : 0;
+   err = mlx5_cmd_xrcd_alloc(dev->mdev, &xrcd->xrcdn, uid);
if (err) {
kfree(xrcd);
return ERR_PTR(-ENOMEM);
}
 
+   xrcd->uid = uid;
return &xrcd->ibxrcd;
 }
 
@@ -5362,9 +5365,10 @@ int mlx5_ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 {
struct mlx5_ib_dev *dev = to_mdev(xrcd->device);
u32 xrcdn = to_mxrcd(xrcd)->xrcdn;
+   u16 uid =  to_mxrcd(xrcd)->uid;
int err;
 
-   err = mlx5_core_xrcd_dealloc(dev->mdev, xrcdn);
+   err = mlx5_cmd_xrcd_dealloc(dev->mdev, xrcdn, uid);
if (err)
mlx5_ib_warn(dev, "failed to dealloc xrcdn 0x%x\n", xrcdn);
 
-- 
2.14.4



[PATCH rdma-next 16/25] IB/mlx5: Set uid as part of TD commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of TD commands so that the firmware can
manage the TD object in a secure way.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cmd.c  | 30 ++
 drivers/infiniband/hw/mlx5/cmd.h  |  4 
 drivers/infiniband/hw/mlx5/main.c | 33 ++---
 3 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 5560346102bd..9da10fbb7e23 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -241,3 +241,33 @@ void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 
pdn, u16 uid)
MLX5_SET(dealloc_pd_in, in, uid, uid);
mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+
+int mlx5_cmd_alloc_transport_domain(struct mlx5_core_dev *dev, u32 *tdn,
+   u16 uid)
+{
+   u32 in[MLX5_ST_SZ_DW(alloc_transport_domain_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+   int err;
+
+   MLX5_SET(alloc_transport_domain_in, in, opcode,
+MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+   MLX5_SET(alloc_transport_domain_in, in, uid, uid);
+
+   err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+   if (!err)
+   *tdn = MLX5_GET(alloc_transport_domain_out, out,
+   transport_domain);
+
+   return err;
+}
+
+void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn,
+  u16 uid)
+{
+   u32 in[MLX5_ST_SZ_DW(dealloc_transport_domain_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(dealloc_transport_domain_out)] = {0};
+
+   MLX5_SET(dealloc_transport_domain_in, in, opcode,
+MLX5_CMD_OP_DEALLOC_TRANSPORT_DOMAIN);
+   MLX5_SET(dealloc_transport_domain_in, in, transport_domain, tdn);
+   MLX5_SET(dealloc_transport_domain_in, in, uid, uid);
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index b47e98b8a53a..3a1d611216fb 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -51,4 +51,8 @@ void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 
tirn, u16 uid);
 void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid);
 void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid);
 void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid);
+int mlx5_cmd_alloc_transport_domain(struct mlx5_core_dev *dev, u32 *tdn,
+   u16 uid);
+void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn,
+  u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 7e6fd5553ab3..c1f94bc09606 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1613,14 +1613,15 @@ void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool 
td, bool qp)
mutex_unlock(&dev->lb.mutex);
 }
 
-static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn)
+static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn,
+ u16 uid)
 {
int err;
 
if (!MLX5_CAP_GEN(dev->mdev, log_max_transport_domain))
return 0;
 
-   err = mlx5_core_alloc_transport_domain(dev->mdev, tdn);
+   err = mlx5_cmd_alloc_transport_domain(dev->mdev, tdn, uid);
if (err)
return err;
 
@@ -1632,12 +1633,13 @@ static int mlx5_ib_alloc_transport_domain(struct 
mlx5_ib_dev *dev, u32 *tdn)
return mlx5_ib_enable_lb(dev, true, false);
 }
 
-static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn)
+static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn,
+u16 uid)
 {
if (!MLX5_CAP_GEN(dev->mdev, log_max_transport_domain))
return;
 
-   mlx5_core_dealloc_transport_domain(dev->mdev, tdn);
+   mlx5_cmd_dealloc_transport_domain(dev->mdev, tdn, uid);
 
if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) ||
(!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) &&
@@ -1756,22 +1758,23 @@ static struct ib_ucontext 
*mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
context->ibucontext.invalidate_range = &mlx5_ib_invalidate_range;
 #endif
 
-   err = mlx5_ib_alloc_transport_domain(dev, &context->tdn);
-   if (err)
-   goto out_uars;
-
if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) {
/* Block DEVX on Infiniband as of SELinux */
if (mlx5_ib_port_link_layer(ibdev, 1) != 
IB_LINK_LAYER_ETHERNET) {
err = -EPERM;
-   goto out_td;
+   goto out_uars;
}
 
err = mlx5_ib_devx_create(dev, context);

[PATCH rdma-next 22/25] IB/mlx5: Expose RAW QP device handles to user space

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Expose RAW QP device handles to user space by extending the UHW part
of mlx5_ib_create_qp_resp.

This data is returned only when a DEVX context is used, which is where
it is applicable.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/qp.c | 38 --
 include/uapi/rdma/mlx5-abi.h| 13 +
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 00b36b971ffa..9a04f8b12a75 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1325,7 +1325,9 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev 
*dev,
 
 static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
u32 *in, size_t inlen,
-   struct ib_pd *pd)
+   struct ib_pd *pd,
+   struct ib_udata *udata,
+   struct mlx5_ib_create_qp_resp *resp)
 {
struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp;
struct mlx5_ib_sq *sq = &raw_packet_qp->sq;
@@ -1346,6 +1348,13 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
if (err)
goto err_destroy_tis;
 
+   if (uid) {
+   resp->tisn = sq->tisn;
+   resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_TISN;
+   resp->sqn = sq->base.mqp.qpn;
+   resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_SQN;
+   }
+
sq->base.container_mibqp = qp;
sq->base.mqp.event = mlx5_ib_qp_event;
}
@@ -1365,13 +1374,26 @@ static int create_raw_packet_qp(struct mlx5_ib_dev 
*dev, struct mlx5_ib_qp *qp,
   uid);
if (err)
goto err_destroy_rq;
+
+   if (uid) {
+   resp->rqn = rq->base.mqp.qpn;
+   resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_RQN;
+   resp->tirn = rq->tirn;
+   resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_TIRN;
+   }
}
 
qp->trans_qp.base.mqp.qpn = qp->sq.wqe_cnt ? sq->base.mqp.qpn :
 rq->base.mqp.qpn;
 
+   err = ib_copy_to_udata(udata, resp, min(udata->outlen, sizeof(*resp)));
+   if (err)
+   goto err_destroy_tir;
+
return 0;
 
+err_destroy_tir:
+   destroy_raw_packet_qp_tir(dev, rq, qp->flags_en, uid);
 err_destroy_rq:
destroy_raw_packet_qp_rq(dev, rq);
 err_destroy_sq:
@@ -1643,12 +1665,23 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev 
*dev, struct mlx5_ib_qp *qp,
if (err)
goto err;
 
+   if (mucontext->devx_uid) {
+   resp.comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_TIRN;
+   resp.tirn = qp->rss_qp.tirn;
+   }
+
+   err = ib_copy_to_udata(udata, &resp, min(udata->outlen, sizeof(resp)));
+   if (err)
+   goto err_copy;
+
kvfree(in);
/* qpn is reserved for that QP */
qp->trans_qp.base.mqp.qpn = 0;
qp->flags |= MLX5_IB_QP_RSS;
return 0;
 
+err_copy:
+   mlx5_cmd_destroy_tir(dev->mdev, qp->rss_qp.tirn, mucontext->devx_uid);
 err:
kvfree(in);
return err;
@@ -2030,7 +2063,8 @@ static int create_qp_common(struct mlx5_ib_dev *dev, 
struct ib_pd *pd,
qp->flags & MLX5_IB_QP_UNDERLAY) {
qp->raw_packet_qp.sq.ubuffer.buf_addr = ucmd.sq_buf_addr;
raw_packet_qp_copy_info(qp, &qp->raw_packet_qp);
-   err = create_raw_packet_qp(dev, qp, in, inlen, pd);
+   err = create_raw_packet_qp(dev, qp, in, inlen, pd, udata,
+  &resp);
} else {
err = mlx5_core_create_qp(dev->mdev, &base->mqp, in, inlen);
}
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index 3ddb31a0bc47..8fa9f90e2bb1 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -352,9 +352,22 @@ struct mlx5_ib_create_qp_rss {
__u32   flags;
 };
 
+enum mlx5_ib_create_qp_resp_mask {
+   MLX5_IB_CREATE_QP_RESP_MASK_TIRN = 1UL << 0,
+   MLX5_IB_CREATE_QP_RESP_MASK_TISN = 1UL << 1,
+   MLX5_IB_CREATE_QP_RESP_MASK_RQN  = 1UL << 2,
+   MLX5_IB_CREATE_QP_RESP_MASK_SQN  = 1UL << 3,
+};
+
 struct mlx5_ib_create_qp_resp {
__u32   bfreg_index;
__u32   reserved;
+   __u32   comp_mask;
+   __u32   tirn;
+   __u32   tisn;
+   __u32   rqn;
+   __u32   sqn;
+   __u32   reserved1;
 };
 
 struct mlx5_ib_alloc_mw {
-- 
2.14.4



[PATCH rdma-next 17/25] IB/mlx5: Set uid as part of SRQ commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of the SRQ create command so that the firmware can
manage the SRQ object in a secure way.

The uid for the destroy and modify commands is set by mlx5_core.

This will enable an SRQ that was created by a verbs application to be
used by the DEVX flow when the uid matches.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/srq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index d359fecf7a5b..6b1cd9ef4e2a 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -144,6 +144,7 @@ static int create_srq_user(struct ib_pd *pd, struct 
mlx5_ib_srq *srq,
 
in->log_page_size = page_shift - MLX5_ADAPTER_PAGE_SHIFT;
in->page_offset = offset;
+   in->uid = to_mucontext(pd->uobject->context)->devx_uid;
if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 &&
in->type != IB_SRQT_BASIC)
in->user_index = uidx;
-- 
2.14.4



[PATCH rdma-next 11/25] IB/mlx5: Set uid as part of SQ commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of SQ commands so that the firmware can manage the
SQ object in a secure way.

The uid for the destroy command is set by mlx5_core.

This will enable an SQ that was created by a verbs application to be
used by the DEVX flow when the uid matches.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/qp.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 31c69da7ccdf..24370635008e 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1088,7 +1088,7 @@ static void destroy_flow_rule_vport_sq(struct mlx5_ib_dev 
*dev,
 
 static int create_raw_packet_qp_sq(struct mlx5_ib_dev *dev,
   struct mlx5_ib_sq *sq, void *qpin,
-  struct ib_pd *pd)
+  struct ib_pd *pd, u16 uid)
 {
struct mlx5_ib_ubuffer *ubuffer = &sq->ubuffer;
__be64 *pas;
@@ -1116,6 +1116,7 @@ static int create_raw_packet_qp_sq(struct mlx5_ib_dev 
*dev,
goto err_umem;
}
 
+   MLX5_SET(create_sq_in, in, uid, uid);
sqc = MLX5_ADDR_OF(create_sq_in, in, ctx);
MLX5_SET(sqc, sqc, flush_in_error_en, 1);
if (MLX5_CAP_ETH(dev->mdev, multi_pkt_send_wqe))
@@ -1336,7 +1337,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
if (err)
return err;
 
-   err = create_raw_packet_qp_sq(dev, sq, in, pd);
+   err = create_raw_packet_qp_sq(dev, sq, in, pd, uid);
if (err)
goto err_destroy_tis;
 
@@ -2885,7 +2886,8 @@ static int modify_raw_packet_qp_rq(struct mlx5_ib_dev 
*dev,
 static int modify_raw_packet_qp_sq(struct mlx5_core_dev *dev,
   struct mlx5_ib_sq *sq,
   int new_state,
-  const struct mlx5_modify_raw_qp_param 
*raw_qp_param)
+  const struct mlx5_modify_raw_qp_param 
*raw_qp_param,
+  u16 uid)
 {
struct mlx5_ib_qp *ibqp = sq->base.container_mibqp;
struct mlx5_rate_limit old_rl = ibqp->rl;
@@ -2902,6 +2904,7 @@ static int modify_raw_packet_qp_sq(struct mlx5_core_dev 
*dev,
if (!in)
return -ENOMEM;
 
+   MLX5_SET(modify_sq_in, in, uid, uid);
MLX5_SET(modify_sq_in, in, sq_state, sq->state);
 
sqc = MLX5_ADDR_OF(modify_sq_in, in, ctx);
@@ -3019,7 +3022,8 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, 
struct mlx5_ib_qp *qp,
return err;
}
 
-   return modify_raw_packet_qp_sq(dev->mdev, sq, sq_state, 
raw_qp_param);
+   return modify_raw_packet_qp_sq(dev->mdev, sq, sq_state,
+  raw_qp_param, uid);
}
 
return 0;
-- 
2.14.4



[PATCH rdma-next 18/25] IB/mlx5: Set uid as part of DCT commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of the DCT create command so that the firmware can
manage the DCT object in a secure way.

The uid for the destroy and drain commands is set by mlx5_core.

This will enable a DCT that was created by a verbs application to be
used by the DEVX flow when the uid matches.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/qp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 8fbf17a885b9..1a3b79405260 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2311,6 +2311,8 @@ static struct ib_qp *mlx5_ib_create_dct(struct ib_pd *pd,
goto err_free;
}
 
+   MLX5_SET(create_dct_in, qp->dct.in, uid,
+to_mucontext(pd->uobject->context)->devx_uid);
dctc = MLX5_ADDR_OF(create_dct_in, qp->dct.in, dct_context_entry);
qp->qp_sub_type = MLX5_IB_QPT_DCT;
MLX5_SET(dctc, dctc, pd, to_mpd(pd)->pdn);
-- 
2.14.4



[PATCH rdma-next 15/25] IB/mlx5: Set uid as part of PD commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of PD commands so that the firmware can manage the
PD object in a secure way.

For example, when a QP is created, its uid must match the uid of the
CQ that it uses.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cmd.c | 10 ++
 drivers/infiniband/hw/mlx5/cmd.h |  1 +
 drivers/infiniband/hw/mlx5/main.c| 16 +---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 347e3912b4bb..5560346102bd 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -231,3 +231,13 @@ void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 
rqtn, u16 uid)
mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
+void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid)
+{
+   u32 out[MLX5_ST_SZ_DW(dealloc_pd_out)] = {0};
+   u32 in[MLX5_ST_SZ_DW(dealloc_pd_in)]   = {0};
+
+   MLX5_SET(dealloc_pd_in, in, opcode, MLX5_CMD_OP_DEALLOC_PD);
+   MLX5_SET(dealloc_pd_in, in, pd, pdn);
+   MLX5_SET(dealloc_pd_in, in, uid, uid);
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 0437190c1b35..b47e98b8a53a 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -50,4 +50,5 @@ int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 
addr, u64 length);
 void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid);
 void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid);
 void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid);
+void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 75851721d1dc..7e6fd5553ab3 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2355,21 +2355,31 @@ static struct ib_pd *mlx5_ib_alloc_pd(struct ib_device 
*ibdev,
struct mlx5_ib_alloc_pd_resp resp;
struct mlx5_ib_pd *pd;
int err;
+   u32 out[MLX5_ST_SZ_DW(alloc_pd_out)] = {0};
+   u32 in[MLX5_ST_SZ_DW(alloc_pd_in)]   = {0};
+   u16 uid;
+
+   uid = context ? to_mucontext(context)->devx_uid : 0;
 
pd = kmalloc(sizeof(*pd), GFP_KERNEL);
if (!pd)
return ERR_PTR(-ENOMEM);
 
-   err = mlx5_core_alloc_pd(to_mdev(ibdev)->mdev, &pd->pdn);
+   MLX5_SET(alloc_pd_in, in, opcode, MLX5_CMD_OP_ALLOC_PD);
+   MLX5_SET(alloc_pd_in, in, uid, uid);
+   err = mlx5_cmd_exec(to_mdev(ibdev)->mdev, in, sizeof(in),
+   out, sizeof(out));
if (err) {
kfree(pd);
return ERR_PTR(err);
}
 
+   pd->pdn = MLX5_GET(alloc_pd_out, out, pd);
+   pd->uid = uid;
if (context) {
resp.pdn = pd->pdn;
if (ib_copy_to_udata(udata, &resp, sizeof(resp))) {
-   mlx5_core_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn);
+   mlx5_cmd_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn, uid);
kfree(pd);
return ERR_PTR(-EFAULT);
}
@@ -2383,7 +2393,7 @@ static int mlx5_ib_dealloc_pd(struct ib_pd *pd)
struct mlx5_ib_dev *mdev = to_mdev(pd->device);
struct mlx5_ib_pd *mpd = to_mpd(pd);
 
-   mlx5_core_dealloc_pd(mdev->mdev, mpd->pdn);
+   mlx5_cmd_dealloc_pd(mdev->mdev, mpd->pdn, mpd->uid);
kfree(mpd);
 
return 0;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2508a401a7d9..88ea0df71d94 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -153,6 +153,7 @@ static inline struct mlx5_ib_ucontext *to_mucontext(struct 
ib_ucontext *ibuconte
 struct mlx5_ib_pd {
struct ib_pdibpd;
u32 pdn;
+   u16 uid;
 };
 
 enum {
-- 
2.14.4



[PATCH rdma-next 13/25] IB/mlx5: Set uid as part of TIS commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of TIS commands so that the firmware can manage the
TIS object in a secure way.

This will enable a TIS that was created by a verbs application to be
used by the DEVX flow when the uid matches.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cmd.c | 12 
 drivers/infiniband/hw/mlx5/cmd.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c  | 27 +--
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index e150ae44e06a..8a3623bbca94 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -208,3 +208,15 @@ void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid)
MLX5_SET(destroy_tir_in, in, uid, uid);
mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+
+void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid)
+{
+   u32 in[MLX5_ST_SZ_DW(destroy_tis_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(destroy_tis_out)] = {0};
+
+   MLX5_SET(destroy_tis_in, in, opcode, MLX5_CMD_OP_DESTROY_TIS);
+   MLX5_SET(destroy_tis_in, in, tisn, tisn);
+   MLX5_SET(destroy_tis_in, in, uid, uid);
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 274090a38c4b..a55e750591e5 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -48,4 +48,5 @@ int mlx5_cmd_alloc_memic(struct mlx5_memic *memic, phys_addr_t *addr,
 u64 length, u32 alignment);
 int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 addr, u64 length);
 void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid);
+void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 07bf5128bee4..5421857f195e 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1062,11 +1062,12 @@ static int is_connected(enum ib_qp_type qp_type)
 
 static int create_raw_packet_qp_tis(struct mlx5_ib_dev *dev,
struct mlx5_ib_qp *qp,
-   struct mlx5_ib_sq *sq, u32 tdn)
+   struct mlx5_ib_sq *sq, u32 tdn, u16 uid)
 {
u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
void *tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
 
+   MLX5_SET(create_tis_in, in, uid, uid);
MLX5_SET(tisc, tisc, transport_domain, tdn);
if (qp->flags & MLX5_IB_QP_UNDERLAY)
MLX5_SET(tisc, tisc, underlay_qpn, qp->underlay_qpn);
@@ -1075,9 +1076,9 @@ static int create_raw_packet_qp_tis(struct mlx5_ib_dev *dev,
 }
 
 static void destroy_raw_packet_qp_tis(struct mlx5_ib_dev *dev,
- struct mlx5_ib_sq *sq)
+ struct mlx5_ib_sq *sq, u16 uid)
 {
-   mlx5_core_destroy_tis(dev->mdev, sq->tisn);
+   mlx5_cmd_destroy_tis(dev->mdev, sq->tisn, uid);
 }
 
 static void destroy_flow_rule_vport_sq(struct mlx5_ib_dev *dev,
@@ -1337,7 +1338,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
u16 uid = mucontext->devx_uid;
 
if (qp->sq.wqe_cnt) {
-   err = create_raw_packet_qp_tis(dev, qp, sq, tdn);
+   err = create_raw_packet_qp_tis(dev, qp, sq, tdn, uid);
if (err)
return err;
 
@@ -1378,7 +1379,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
return err;
destroy_raw_packet_qp_sq(dev, sq);
 err_destroy_tis:
-   destroy_raw_packet_qp_tis(dev, sq);
+   destroy_raw_packet_qp_tis(dev, sq, uid);
 
return err;
 }
@@ -1398,7 +1399,7 @@ static void destroy_raw_packet_qp(struct mlx5_ib_dev *dev,
 
if (qp->sq.wqe_cnt) {
destroy_raw_packet_qp_sq(dev, sq);
-   destroy_raw_packet_qp_tis(dev, sq);
+   destroy_raw_packet_qp_tis(dev, sq, uid);
}
 }
 
@@ -2579,7 +2580,7 @@ static int ib_rate_to_mlx5(struct mlx5_ib_dev *dev, u8 rate)
 }
 
 static int modify_raw_packet_eth_prio(struct mlx5_core_dev *dev,
- struct mlx5_ib_sq *sq, u8 sl)
+ struct mlx5_ib_sq *sq, u8 sl, u16 uid)
 {
void *in;
void *tisc;
@@ -2592,6 +2593,7 @@ static int modify_raw_packet_eth_prio(struct mlx5_core_dev *dev,
return -ENOMEM;
 
MLX5_SET(modify_tis_in, in, bitmask.prio, 1);
+   MLX5_SET(modify_tis_in, in, uid, uid);
 
tisc = MLX5_ADDR_OF(modify_tis_in, in, ctx);
MLX5_SET(tisc, tisc, prio, ((sl & 0x7) << 1));
@@ -2604,7 +2606,8 @@ static int modify_raw_packet_eth_prio(struct mlx5_cor

[PATCH rdma-next 20/25] IB/mlx5: Set uid as part of MCG commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of MCG commands so that the firmware can manage the
MCG object in a secure way.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cmd.c  | 30 ++
 drivers/infiniband/hw/mlx5/cmd.h  |  4 
 drivers/infiniband/hw/mlx5/main.c | 11 +--
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 51c39bc77ac7..ababc5cdbcaa 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -296,3 +296,33 @@ int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid)
MLX5_SET(dealloc_xrcd_in, in, uid, uid);
return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+
+int mlx5_cmd_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid,
+   u32 qpn, u16 uid)
+{
+   u32 out[MLX5_ST_SZ_DW(attach_to_mcg_out)] = {0};
+   u32 in[MLX5_ST_SZ_DW(attach_to_mcg_in)]   = {0};
+   void *gid;
+
+   MLX5_SET(attach_to_mcg_in, in, opcode, MLX5_CMD_OP_ATTACH_TO_MCG);
+   MLX5_SET(attach_to_mcg_in, in, qpn, qpn);
+   MLX5_SET(attach_to_mcg_in, in, uid, uid);
+   gid = MLX5_ADDR_OF(attach_to_mcg_in, in, multicast_gid);
+   memcpy(gid, mgid, sizeof(*mgid));
+   return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+int mlx5_cmd_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid,
+   u32 qpn, u16 uid)
+{
+   u32 out[MLX5_ST_SZ_DW(detach_from_mcg_out)] = {0};
+   u32 in[MLX5_ST_SZ_DW(detach_from_mcg_in)]   = {0};
+   void *gid;
+
+   MLX5_SET(detach_from_mcg_in, in, opcode, MLX5_CMD_OP_DETACH_FROM_MCG);
+   MLX5_SET(detach_from_mcg_in, in, qpn, qpn);
+   MLX5_SET(detach_from_mcg_in, in, uid, uid);
+   gid = MLX5_ADDR_OF(detach_from_mcg_in, in, multicast_gid);
+   memcpy(gid, mgid, sizeof(*mgid));
+   return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 76823e86fd17..7cf364af7c28 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -57,4 +57,8 @@ void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn,
   u16 uid);
 int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid);
 int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid);
+int mlx5_cmd_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid,
+   u32 qpn, u16 uid);
+int mlx5_cmd_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid,
+   u32 qpn, u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index c1f94bc09606..ac2abfc866a6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4139,13 +4139,17 @@ static int mlx5_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
struct mlx5_ib_dev *dev = to_mdev(ibqp->device);
struct mlx5_ib_qp *mqp = to_mqp(ibqp);
int err;
+   u16 uid;
+
+   uid = ibqp->uobject ?
+   to_mucontext(ibqp->uobject->context)->devx_uid : 0;
 
if (mqp->flags & MLX5_IB_QP_UNDERLAY) {
mlx5_ib_dbg(dev, "Attaching a multi cast group to underlay QP is not supported\n");
return -EOPNOTSUPP;
}
 
-   err = mlx5_core_attach_mcg(dev->mdev, gid, ibqp->qp_num);
+   err = mlx5_cmd_attach_mcg(dev->mdev, gid, ibqp->qp_num, uid);
if (err)
mlx5_ib_warn(dev, "failed attaching QPN 0x%x, MGID %pI6\n",
 ibqp->qp_num, gid->raw);
@@ -4157,8 +4161,11 @@ static int mlx5_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 {
struct mlx5_ib_dev *dev = to_mdev(ibqp->device);
int err;
+   u16 uid;
 
-   err = mlx5_core_detach_mcg(dev->mdev, gid, ibqp->qp_num);
+   uid = ibqp->uobject ?
+   to_mucontext(ibqp->uobject->context)->devx_uid : 0;
+   err = mlx5_cmd_detach_mcg(dev->mdev, gid, ibqp->qp_num, uid);
if (err)
mlx5_ib_warn(dev, "failed detaching QPN 0x%x, MGID %pI6\n",
 ibqp->qp_num, gid->raw);
-- 
2.14.4



[PATCH rdma-next 14/25] IB/mlx5: Set uid as part of RQT commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set uid as part of RQT commands so that the firmware can manage the
RQT object in a secure way.

This will allow an RQT created by a verbs application to be used by the
DEVX flow when the uids match.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/cmd.c | 11 +++
 drivers/infiniband/hw/mlx5/cmd.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c  | 11 +--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 8a3623bbca94..347e3912b4bb 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -220,3 +220,14 @@ void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid)
mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
+void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid)
+{
+   u32 in[MLX5_ST_SZ_DW(destroy_rqt_in)]   = {0};
+   u32 out[MLX5_ST_SZ_DW(destroy_rqt_out)] = {0};
+
+   MLX5_SET(destroy_rqt_in, in, opcode, MLX5_CMD_OP_DESTROY_RQT);
+   MLX5_SET(destroy_rqt_in, in, rqtn, rqtn);
+   MLX5_SET(destroy_rqt_in, in, uid, uid);
+   mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index a55e750591e5..0437190c1b35 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -49,4 +49,5 @@ int mlx5_cmd_alloc_memic(struct mlx5_memic *memic, phys_addr_t *addr,
 int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 addr, u64 length);
 void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid);
 void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid);
+void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 5421857f195e..8fbf17a885b9 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -5702,6 +5702,7 @@ struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
int i;
u32 *in;
void *rqtc;
+   u16 uid;
 
if (udata->inlen > 0 &&
!ib_is_udata_cleared(udata, 0,
@@ -5739,6 +5740,10 @@ struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
for (i = 0; i < sz; i++)
MLX5_SET(rqtc, rqtc, rq_num[i], init_attr->ind_tbl[i]->wq_num);
 
+   /* Use the uid from its internal WQ */
+   uid = to_mucontext(init_attr->ind_tbl[0]->uobject->context)->devx_uid;
+   MLX5_SET(create_rqt_in, in, uid, uid);
+
err = mlx5_core_create_rqt(dev->mdev, in, inlen, &rwq_ind_tbl->rqtn);
kvfree(in);
 
@@ -5757,7 +5762,7 @@ struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
return &rwq_ind_tbl->ib_rwq_ind_tbl;
 
 err_copy:
-   mlx5_core_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn);
+   mlx5_cmd_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn, uid);
 err:
kfree(rwq_ind_tbl);
return ERR_PTR(err);
@@ -5767,8 +5772,10 @@ int mlx5_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_tbl)
 {
struct mlx5_ib_rwq_ind_table *rwq_ind_tbl = to_mrwq_ind_table(ib_rwq_ind_tbl);
struct mlx5_ib_dev *dev = to_mdev(ib_rwq_ind_tbl->device);
+   u16 uid;
 
-   mlx5_core_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn);
+   uid = to_mucontext(ib_rwq_ind_tbl->uobject->context)->devx_uid;
+   mlx5_cmd_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn, uid);
 
kfree(rwq_ind_tbl);
return 0;
-- 
2.14.4



[PATCH rdma-next 25/25] IB/mlx5: Enable DEVX on IB

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

IB has additional protections with SELinux that cannot be extended to
the DEVX domain. SELinux can restrict access to pkeys. The first version
of DEVX blocked IB entirely until this could be understood.

Since DEVX requires CAP_NET_RAW, it supersedes the SELinux restriction
and allows userspace to form arbitrary packets with arbitrary pkeys.

Thus we enable IB for DEVX when CAP_NET_RAW is given.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/main.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8cc285c4da8e..c31e57bead8e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1759,12 +1759,6 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 #endif
 
if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) {
-   /* Block DEVX on Infiniband as of SELinux */
-   if (mlx5_ib_port_link_layer(ibdev, 1) != IB_LINK_LAYER_ETHERNET) {
-   err = -EPERM;
-   goto out_uars;
-   }
-
err = mlx5_ib_devx_create(dev);
if (err < 0)
goto out_uars;
-- 
2.14.4



[PATCH rdma-next 23/25] IB/mlx5: Manage device uid for DEVX white list commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Manage a device uid for DEVX white list commands. The uid created for
the device will be used on white list commands when the user did not
supply its own uid.

This enables the firmware to filter out non-privileged functionality
based on the uid.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/devx.c| 12 ++--
 drivers/infiniband/hw/mlx5/main.c| 16 
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 13 +
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 562c7936bbad..97cac57dcb3d 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -45,13 +45,14 @@ static struct mlx5_ib_ucontext *devx_ufile2uctx(struct ib_uverbs_file *file)
return to_mucontext(ib_uverbs_get_ucontext(file));
 }
 
-int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext *context)
+int mlx5_ib_devx_create(struct mlx5_ib_dev *dev)
 {
u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {0};
u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
u64 general_obj_types;
void *hdr;
int err;
+   u16 uid;
 
hdr = MLX5_ADDR_OF(create_uctx_in, in, hdr);
 
@@ -70,19 +71,18 @@ int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext *contex
if (err)
return err;
 
-   context->devx_uid = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
-   return 0;
+   uid = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+   return uid;
 }
 
-void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev,
- struct mlx5_ib_ucontext *context)
+void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid)
 {
u32 in[MLX5_ST_SZ_DW(general_obj_in_cmd_hdr)] = {0};
u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
 
MLX5_SET(general_obj_in_cmd_hdr, in, opcode, MLX5_CMD_OP_DESTROY_GENERAL_OBJECT);
MLX5_SET(general_obj_in_cmd_hdr, in, obj_type, MLX5_OBJ_TYPE_UCTX);
-   MLX5_SET(general_obj_in_cmd_hdr, in, obj_id, context->devx_uid);
+   MLX5_SET(general_obj_in_cmd_hdr, in, obj_id, uid);
 
mlx5_cmd_exec(dev->mdev, in, sizeof(in), out, sizeof(out));
 }
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index ac2abfc866a6..8cc285c4da8e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1765,9 +1765,10 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
goto out_uars;
}
 
-   err = mlx5_ib_devx_create(dev, context);
-   if (err)
+   err = mlx5_ib_devx_create(dev);
+   if (err < 0)
goto out_uars;
+   context->devx_uid = err;
}
 
err = mlx5_ib_alloc_transport_domain(dev, &context->tdn,
@@ -1872,7 +1873,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
mlx5_ib_dealloc_transport_domain(dev, context->tdn, context->devx_uid);
 out_devx:
if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX)
-   mlx5_ib_devx_destroy(dev, context);
+   mlx5_ib_devx_destroy(dev, context->devx_uid);
 
 out_uars:
deallocate_uars(dev, context);
@@ -1899,7 +1900,7 @@ static int mlx5_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
mlx5_ib_dealloc_transport_domain(dev, context->tdn, context->devx_uid);
 
if (context->devx_uid)
-   mlx5_ib_devx_destroy(dev, context);
+   mlx5_ib_devx_destroy(dev, context->devx_uid);
 
deallocate_uars(dev, context);
kfree(bfregi->sys_pages);
@@ -6287,6 +6288,8 @@ void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
profile->stage[stage].cleanup(dev);
}
 
+   if (dev->devx_whitelist_uid)
+   mlx5_ib_devx_destroy(dev, dev->devx_whitelist_uid);
ib_dealloc_device((struct ib_device *)dev);
 }
 
@@ -6295,6 +6298,7 @@ void *__mlx5_ib_add(struct mlx5_ib_dev *dev,
 {
int err;
int i;
+   int uid;
 
printk_once(KERN_INFO "%s", mlx5_version);
 
@@ -6306,6 +6310,10 @@ void *__mlx5_ib_add(struct mlx5_ib_dev *dev,
}
}
 
+   uid = mlx5_ib_devx_create(dev);
+   if (uid > 0)
+   dev->devx_whitelist_uid = uid;
+
dev->profile = profile;
dev->ib_active = true;
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index f582bd05c180..6a0fbd0286ef 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -934,6 +934,7 @@ struct mlx5_ib_dev {
struct list_headib_dev_list;
u64 sys_image_guid;
struct mlx5_memic   memic;
+   u16 devx_whitelist_uid;
 }

[PATCH rdma-next 21/25] IB/mlx5: Set valid umem bit on DEVX

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Set the valid umem bit on DEVX commands that use umem, so that the
firmware enforces use of the umem rather than the 'pas' info.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/devx.c | 95 +++
 1 file changed, 95 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 25dafa4ff6ca..562c7936bbad 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -264,6 +264,97 @@ static int devx_is_valid_obj_id(struct devx_obj *obj, const void *in)
return false;
 }
 
+static void devx_set_umem_valid(const void *in)
+{
+   u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode);
+
+   switch (opcode) {
+   case MLX5_CMD_OP_CREATE_MKEY:
+   MLX5_SET(create_mkey_in, in, mkey_umem_valid, 1);
+   break;
+   case MLX5_CMD_OP_CREATE_CQ:
+   {
+   void *cqc;
+
+   MLX5_SET(create_cq_in, in, cq_umem_valid, 1);
+   cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+   MLX5_SET(cqc, cqc, dbr_umem_valid, 1);
+   break;
+   }
+   case MLX5_CMD_OP_CREATE_QP:
+   {
+   void *qpc;
+
+   qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+   MLX5_SET(qpc, qpc, dbr_umem_valid, 1);
+   MLX5_SET(create_qp_in, in, wq_umem_valid, 1);
+   break;
+   }
+
+   case MLX5_CMD_OP_CREATE_RQ:
+   {
+   void *rqc, *wq;
+
+   rqc = MLX5_ADDR_OF(create_rq_in, in, ctx);
+   wq  = MLX5_ADDR_OF(rqc, rqc, wq);
+   MLX5_SET(wq, wq, dbr_umem_valid, 1);
+   MLX5_SET(wq, wq, wq_umem_valid, 1);
+   break;
+   }
+
+   case MLX5_CMD_OP_CREATE_SQ:
+   {
+   void *sqc, *wq;
+
+   sqc = MLX5_ADDR_OF(create_sq_in, in, ctx);
+   wq = MLX5_ADDR_OF(sqc, sqc, wq);
+   MLX5_SET(wq, wq, dbr_umem_valid, 1);
+   MLX5_SET(wq, wq, wq_umem_valid, 1);
+   break;
+   }
+
+   case MLX5_CMD_OP_MODIFY_CQ:
+   MLX5_SET(modify_cq_in, in, cq_umem_valid, 1);
+   break;
+
+   case MLX5_CMD_OP_CREATE_RMP:
+   {
+   void *rmpc, *wq;
+
+   rmpc = MLX5_ADDR_OF(create_rmp_in, in, ctx);
+   wq = MLX5_ADDR_OF(rmpc, rmpc, wq);
+   MLX5_SET(wq, wq, dbr_umem_valid, 1);
+   MLX5_SET(wq, wq, wq_umem_valid, 1);
+   break;
+   }
+
+   case MLX5_CMD_OP_CREATE_XRQ:
+   {
+   void *xrqc, *wq;
+
+   xrqc = MLX5_ADDR_OF(create_xrq_in, in, xrq_context);
+   wq = MLX5_ADDR_OF(xrqc, xrqc, wq);
+   MLX5_SET(wq, wq, dbr_umem_valid, 1);
+   MLX5_SET(wq, wq, wq_umem_valid, 1);
+   break;
+   }
+
+   case MLX5_CMD_OP_CREATE_XRC_SRQ:
+   {
+   void *xrc_srqc;
+
+   MLX5_SET(create_xrc_srq_in, in, xrc_srq_umem_valid, 1);
+   xrc_srqc = MLX5_ADDR_OF(create_xrc_srq_in, in,
+   xrc_srq_context_entry);
+   MLX5_SET(xrc_srqc, xrc_srqc, dbr_umem_valid, 1);
+   break;
+   }
+
+   default:
+   return;
+   }
+}
+
 static bool devx_is_obj_create_cmd(const void *in)
 {
u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode);
@@ -741,6 +832,8 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
return -ENOMEM;
 
MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid);
+   devx_set_umem_valid(cmd_in);
+
err = mlx5_cmd_exec(dev->mdev, cmd_in,
uverbs_attr_get_len(attrs, MLX5_IB_ATTR_DEVX_OBJ_CREATE_CMD_IN),
cmd_out, cmd_out_len);
@@ -790,6 +883,8 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_MODIFY)(
return PTR_ERR(cmd_out);
 
MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid);
+   devx_set_umem_valid(cmd_in);
+
err = mlx5_cmd_exec(obj->mdev, cmd_in,
uverbs_attr_get_len(attrs, MLX5_IB_ATTR_DEVX_OBJ_MODIFY_CMD_IN),
cmd_out, cmd_out_len);
-- 
2.14.4



[PATCH rdma-next 24/25] IB/mlx5: Enable DEVX white list commands

2018-09-17 Thread Leon Romanovsky
From: Yishai Hadas 

Enable DEVX white list commands without requiring CAP_NET_RAW.

A DEVX uid must come from either the ucontext or the device, so that
the firmware can mask out unprivileged capabilities.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/devx.c | 75 +++
 1 file changed, 60 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 97cac57dcb3d..c11640047f26 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -61,9 +61,6 @@ int mlx5_ib_devx_create(struct mlx5_ib_dev *dev)
!(general_obj_types & MLX5_GENERAL_OBJ_TYPES_CAP_UMEM))
return -EINVAL;
 
-   if (!capable(CAP_NET_RAW))
-   return -EPERM;
-
MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, MLX5_OBJ_TYPE_UCTX);
 
@@ -476,12 +473,49 @@ static bool devx_is_obj_query_cmd(const void *in)
}
 }
 
+static bool devx_is_whitelist_cmd(void *in)
+{
+   u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode);
+
+   switch (opcode) {
+   case MLX5_CMD_OP_QUERY_HCA_CAP:
+   case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT:
+   return true;
+   default:
+   return false;
+   }
+}
+
+static int devx_get_uid(struct mlx5_ib_ucontext *c, void *cmd_in)
+{
+   if (devx_is_whitelist_cmd(cmd_in)) {
+   struct mlx5_ib_dev *dev;
+
+   if (c->devx_uid)
+   return c->devx_uid;
+
+   dev = to_mdev(c->ibucontext.device);
+   if (dev->devx_whitelist_uid)
+   return dev->devx_whitelist_uid;
+
+   return -EOPNOTSUPP;
+   }
+
+   if (!c->devx_uid)
+   return -EINVAL;
+
+   if (!capable(CAP_NET_RAW))
+   return -EPERM;
+
+   return c->devx_uid;
+}
 static bool devx_is_general_cmd(void *in)
 {
u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode);
 
switch (opcode) {
case MLX5_CMD_OP_QUERY_HCA_CAP:
+   case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT:
case MLX5_CMD_OP_QUERY_VPORT_STATE:
case MLX5_CMD_OP_QUERY_ADAPTER:
case MLX5_CMD_OP_QUERY_ISSI:
@@ -589,14 +623,16 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OTHER)(
MLX5_IB_ATTR_DEVX_OTHER_CMD_OUT);
void *cmd_out;
int err;
+   int uid;
 
c = devx_ufile2uctx(file);
if (IS_ERR(c))
return PTR_ERR(c);
dev = to_mdev(c->ibucontext.device);
 
-   if (!c->devx_uid)
-   return -EPERM;
+   uid = devx_get_uid(c, cmd_in);
+   if (uid < 0)
+   return uid;
 
/* Only white list of some general HCA commands are allowed for this method. */
if (!devx_is_general_cmd(cmd_in))
@@ -606,7 +642,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OTHER)(
if (IS_ERR(cmd_out))
return PTR_ERR(cmd_out);
 
-   MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid);
+   MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, uid);
err = mlx5_cmd_exec(dev->mdev, cmd_in,
uverbs_attr_get_len(attrs, MLX5_IB_ATTR_DEVX_OTHER_CMD_IN),
cmd_out, cmd_out_len);
@@ -816,9 +852,11 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
struct mlx5_ib_dev *dev = to_mdev(c->ibucontext.device);
struct devx_obj *obj;
int err;
+   int uid;
 
-   if (!c->devx_uid)
-   return -EPERM;
+   uid = devx_get_uid(c, cmd_in);
+   if (uid < 0)
+   return uid;
 
if (!devx_is_obj_create_cmd(cmd_in))
return -EINVAL;
@@ -831,7 +869,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
if (!obj)
return -ENOMEM;
 
-   MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid);
+   MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, uid);
devx_set_umem_valid(cmd_in);
 
err = mlx5_cmd_exec(dev->mdev, cmd_in,
@@ -868,9 +906,11 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_MODIFY)(
struct devx_obj *obj = uobj->object;
void *cmd_out;
int err;
+   int uid;
 
-   if (!c->devx_uid)
-   return -EPERM;
+   uid = devx_get_uid(c, cmd_in);
+   if (uid < 0)
+   return uid;
 
if (!devx_is_obj_modify_cmd(cmd_in))
return -EINVAL;
@@ -882,7 +922,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_MODIFY)(
if (IS_ERR(cmd_out))
return PTR_ERR(cmd_out);
 
-   MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid);
+   MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, uid);
devx_set_umem_valid(cmd_in);
 
err = mlx5_cmd_exec(obj->md

Re: iproute2: Debian 9 No ELF support

2018-09-17 Thread Bo YU

Hi,
On Mon, Sep 17, 2018 at 11:57:12AM +0200, Daniel Borkmann wrote:

On 09/17/2018 10:23 AM, Bo YU wrote:

Hello,
I have followed the instructions from:

https://cilium.readthedocs.io/en/latest/bpf/#bpftool

to test an XDP program.
But I cannot enable ELF support.

./configure --prefix=/usr
```output
TC schedulers
ATM    no

libc has setns: yes
SELinux support: no
ELF support: no
libmnl support: yes
Berkeley DB: yes
need for strlcpy: yes
libcap support: yes
```
And i have installed libelf-dev :
```output
sudo apt show libelf-dev
Package: libelf-dev
Version: 0.168-1
Priority: optional
Section: libdevel
Source: elfutils
Maintainer: Kurt Roeckx 
Installed-Size: 353 kB
Depends: libelf1 (= 0.168-1)
Conflicts: libelfg0-dev
Homepage: https://sourceware.org/elfutils/
Tag: devel::library, role::devel-lib
```

And gcc version:
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

uname -a:
Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 GNU/Linux

Any help is appreciated.


Debian's official iproute2 packaging build says 'libelf-dev' [0], and having
libelf-dev installed should work ...

 [...]
 Build-Depends: bison,
  debhelper (>= 10~),
  flex,
  iptables-dev,
  libatm1-dev,
  libcap-dev,
  libdb-dev,
  libelf-dev,
  libmnl-dev,
  libselinux1-dev,
  linux-libc-dev,
  pkg-config,
  po-debconf,
  zlib1g-dev,
 [...]

Did you run into this one perhaps [1]? Do you have zlib1g-dev installed?

Yes, you are right. I installed zlib1g-dev with your help, and iproute2 now
enables ELF support.
```output
./configure --prefix=/usr
TC schedulers
ATM no

libc has setns: yes
SELinux support: no
ELF support: yes
libmnl support: yes
Berkeley DB: yes
need for strlcpy: yes
libcap support: yes

```
But [1] had no effect here, right? When I install libelf-dev, it should
also install zlib1g-dev.

Is there any way to update the page [2]?

Thank you, Daniel

[2] https://cilium.readthedocs.io/en/latest/bpf/#bpftool


 [0] https://salsa.debian.org/debian/iproute2/blob/master/debian/control
 [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=885071


Re: iproute2: Debian 9 No ELF support

2018-09-17 Thread Daniel Borkmann
On 09/17/2018 01:46 PM, Bo YU wrote:
> On Mon, Sep 17, 2018 at 11:57:12AM +0200, Daniel Borkmann wrote:
>> On 09/17/2018 10:23 AM, Bo YU wrote:
>>> Hello,
>>> I have followed the instructions from:
>>>
>>> https://cilium.readthedocs.io/en/latest/bpf/#bpftool
>>>
>>> to test xdp program.
>>> But i can not enable elf support.
>>>
>>> ./configure --prefix=/usr
>>> ```output
>>> TC schedulers
>>> ATM    no
>>>
>>> libc has setns: yes
>>> SELinux support: no
>>> ELF support: no
>>> libmnl support: yes
>>> Berkeley DB: yes
>>> need for strlcpy: yes
>>> libcap support: yes
>>> ```
>>> And i have installed libelf-dev :
>>> ```output
>>> sudo apt show libelf-dev
>>> Package: libelf-dev
>>> Version: 0.168-1
>>> Priority: optional
>>> Section: libdevel
>>> Source: elfutils
>>> Maintainer: Kurt Roeckx 
>>> Installed-Size: 353 kB
>>> Depends: libelf1 (= 0.168-1)
>>> Conflicts: libelfg0-dev
>>> Homepage: https://sourceware.org/elfutils/
>>> Tag: devel::library, role::devel-lib
>>> ```
>>>
>>> And gcc version:
>>> gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
>>>
>>> uname -a:
>>> Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 GNU/Linux
>>>
>>> Any help is appreciate.
>>
>> Debian's official iproute2 packaging build says 'libelf-dev' [0], and having
>> libelf-dev installed should work ...
>>
>>  [...]
>>  Build-Depends: bison,
>>   debhelper (>= 10~),
>>   flex,
>>   iptables-dev,
>>   libatm1-dev,
>>   libcap-dev,
>>   libdb-dev,
>>   libelf-dev,
>>   libmnl-dev,
>>   libselinux1-dev,
>>   linux-libc-dev,
>>   pkg-config,
>>   po-debconf,
>>   zlib1g-dev,
>>  [...]
>>
>> Did you ran into this one perhaps [1]? Do you have zlib1g-dev installed?
> Yes,You are right. I install zlib1g-dev with your help,iproute2 enable ELF
> support.
> ```output
> ./configure --prefix=/usr
> TC schedulers
> ATM    no
> 
> libc has setns: yes
> SELinux support: no
> ELF support: yes
> libmnl support: yes
> Berkeley DB: yes
> need for strlcpy: yes
> libcap support: yes
> 
> ```
> But there is no effect after [1], right? When i install libelf-dev,it should
> install zlib1g-dev also.
> 
> Is there any way to update the page [2]?

This bug should be Debian specific, so it would make sense to contact Debian
developers or comment on [1] if it's still not resolved in current versions.

> Thank you, Daniel
> 
> [2] https://cilium.readthedocs.io/en/latest/bpf/#bpftool
>>
>>  [0] https://salsa.debian.org/debian/iproute2/blob/master/debian/control
>>  [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=885071



Re: [PATCH v3 00/30] backport of IP fragmentation fixes

2018-09-17 Thread Greg KH
On Thu, Sep 13, 2018 at 07:58:32AM -0700, Stephen Hemminger wrote:
> Took the set of patches from 4.19 to handle IP fragmentation DoS
> and applied them against 4.14.69.  Most of these are from Eric.
> In a couple case, it required some manual merge conflict resolution.
> 
> Tested normal IP fragmentation with iperf3 and malicious IP fragments
> with fragmentsmack. Under fragmentation attack (700Kpps) the original
> 4.14.69 consumes 97% CPU; with this patch it drops to 5%.

All now queued up, thanks for doing the backport.

greg k-h


Re: [PATCH net-next v3 0/2] net: stmmac: Coalesce and tail addr fixes

2018-09-17 Thread Jose Abreu
Hi Jerome,

On 14-09-2018 16:06, Jerome Brunet wrote:
>
> Looks better this time. Stable so far, with even a small throughput 
> improvement
> on the Tx path.
>
> so for the a113 s400 board (single queue)
> Tested-by: Jerome Brunet 
>

Thanks for testing! I sent out a rebased version against net.

Can you share the throughput improvement in %?

Do you still see the performance drop when tx/rx work at the same
time? I remember that was another issue...

Thanks and Best Regards,
Jose Miguel Abreu


[PATCH net] selftests: pmtu: properly redirect stderr to /dev/null

2018-09-17 Thread Sabrina Dubroca
The cleanup function uses "$CMD 2 > /dev/null", which doesn't actually
send stderr to /dev/null, so when the netns doesn't exist, the error
message is shown. Use "2> /dev/null" instead, so that those messages
disappear, as was intended.

Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test")
Signed-off-by: Sabrina Dubroca 
---
 tools/testing/selftests/net/pmtu.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 32a194e3e07a..0ab9423d009f 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -178,8 +178,8 @@ setup() {
 
 cleanup() {
[ ${cleanup_done} -eq 1 ] && return
-   ip netns del ${NS_A} 2 > /dev/null
-   ip netns del ${NS_B} 2 > /dev/null
+   ip netns del ${NS_A} 2> /dev/null
+   ip netns del ${NS_B} 2> /dev/null
cleanup_done=1
 }
 
-- 
2.19.0
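
The redirection distinction this fix relies on can be demonstrated in a
few lines of POSIX shell (a standalone sketch, independent of the
selftest; the `emit` helper is made up for illustration):

```shell
#!/bin/sh
# "2>file" (no space between the digit and ">") redirects file descriptor 2,
# i.e. stderr. With a space, the "2" becomes an ordinary argument and only
# stdout is redirected, so error messages still escape.
emit() { echo out; echo err >&2; }

# broken form: stdout goes to /dev/null, stderr still leaks out
err_kept=$( { emit 2 > /dev/null; } 2>&1 )
# fixed form: stderr is discarded, stdout is kept
out_kept=$( emit 2> /dev/null )

echo "broken form leaked: $err_kept"   # -> broken form leaked: err
echo "fixed form kept: $out_kept"      # -> fixed form kept: out
```

In the POSIX shell grammar, the digit is part of the redirection operator
only when it immediately precedes `>`; separated by whitespace, it is just
another word passed to the command.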



Re: [PATCH net] selftests: pmtu: properly redirect stderr to /dev/null

2018-09-17 Thread Stefano Brivio
On Mon, 17 Sep 2018 15:30:06 +0200
Sabrina Dubroca  wrote:

> The cleanup function uses "$CMD 2 > /dev/null", which doesn't actually
> send stderr to /dev/null, so when the netns doesn't exist, the error
> message is shown. Use "2> /dev/null" instead, so that those messages
> disappear, as was intended.

Oops, thanks for catching this.

> Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test")
> Signed-off-by: Sabrina Dubroca 

Acked-by: Stefano Brivio 

-- 
Stefano


Re: [RFC PATCH 2/4] net: enable UDP gro on demand.

2018-09-17 Thread Willem de Bruijn
On Mon, Sep 17, 2018 at 6:18 AM Paolo Abeni  wrote:
>
> On Sun, 2018-09-16 at 14:23 -0400, Willem de Bruijn wrote:
> > That udp gro implementation is clearly less complete than yours in
> > this patchset. The point I wanted to bring up for discussion is not the
> > protocol implementation, but the infrastructure for enabling it
> > conditionally.
>
> I'm still [trying to] process your patchset ;) So please pardon me
> for any obvious interpretation mistakes...
>
> > Assuming cycle cost is comparable, what do you think of  using the
> > existing sk offload callbacks to enable this on a per-socket basis?
>
> I have no objection about that, if there are no performance drawbacks.
> In my measures retpoline costs is relevant for every indirect call
> added. Using the existing sk offload approach will require an
> additional indirect call per packet compared to the implementation
> here.

Fair enough. The question is whether it is significant to the real workload.
This is also an issue with GRO processing in general, all those callbacks
as well as the two cacheline lookups to get to each callback.

> > As for the protocol-wide knob, I do strongly prefer something that can
> > work for all protocols, not just UDP.
>
> I like the general infrastructure idea. I think there is some agreement
> in avoiding the addition of more user-controllable knobs, as we already
> have a lot of them. If I read your patch correctly, user-space needs to
> enable/disable UDP GRO explicitly via procfs, right?

No, like other GRO callbacks, the feature is enabled by default.

Patch 7/8 disables the most expensive part behind a static key
until a socket actually registers a GRO callback, whether tunnel
or the new application GRO.

Patch 6/8 makes it possible to disable any protocol completely,
indeed through a sysctl.

> I tried to look for something that does not require user action.
>
> > I also implemented a version that
> > atomically swaps the struct ptr instead of the flag based approach I sent
> > for review. I'm fairly agnostic about that point.
>
> I think/fear security oriented guys may scream for the somewhat large
> deconstification ?!?

Hmm.. yes, interesting point. Since const pointers are a compile time
feature, in practice I don't think that they buy any protection against
callback pointer rewriting. Let me think about that some more.

>
> > One subtle issue is that I
> > believe we need to keep the gro_complete callbacks enabled, as gro
> > packets may be queued for completion when gro_receive gets disabled.
>
> Good point, thanks! I missed that.
>
> Cheers,
>
> Paolo
>
>


Re: [PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-17 Thread Willem de Bruijn
On Mon, Sep 17, 2018 at 5:03 AM Steffen Klassert
 wrote:
>
> On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote:
> > From: Willem de Bruijn 
> >
> > Avoid the socket lookup cost in udp_gro_receive if no socket has a
> > gro callback configured.
> >
> > Signed-off-by: Willem de Bruijn 
>
> ...
>
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index 4f6aa95a9b12..f44fe328aa0f 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct 
> > list_head *head,
> >  {
> >   struct udphdr *uh = udp_gro_udphdr(skb);
> >
> > - if (unlikely(!uh))
> > + if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
> >   goto flush;
>
> If you use udp_encap_needed_key to enable UDP GRO, then a UDP
> encapsulation socket will enable it too. Not sure if this is
> intentional.

Yes. That is already the case to a certain point. The function was
introduced with tunnels and is enabled by tunnels, but so far only
compiles out the encap_rcv() branch in udp_queue_rcv_skb.

With patch 7/8 it also toggles the GRO path. Critically, both are
enabled as soon as a tunnel is registered.

>
> That said, enabling UDP GRO on a UDP encapsulation socket
> (ESP in UDP etc.) will fail badly as then encrypted ESP
> packets might be merged together. So we somehow should
> make sure that this does not happen.

Absolutely. This initial implementation probably breaks UDP tunnels
badly. That needs to be addressed.

>
> Anyway, this reminds me that we can support GRO for
> UDP encapsulation. It just requires separate GRO
> callbacks for the different encapsulation types.


Re: [PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-17 Thread Willem de Bruijn
On Mon, Sep 17, 2018 at 6:24 AM Paolo Abeni  wrote:
>
> On Fri, 2018-09-14 at 13:59 -0400, Willem de Bruijn wrote:
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index 4f6aa95a9b12..f44fe328aa0f 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct 
> > list_head *head,
> >  {
> >   struct udphdr *uh = udp_gro_udphdr(skb);
> >
> > - if (unlikely(!uh))
> > + if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
> >   goto flush;
> >
> >   /* Don't bother verifying checksum if we're going to flush anyway. */
>
> If I read this correctly, once udp_encap_needed_key is enabled, it will
> never be turned off, because the tunnel and encap socket shut down does
> not cope with udp_encap_needed_key.
>
> Perhaps we should take care of that, too.

Agreed. For now I reused what's already there, but I can extend
that with refcounting using static_branch_inc/static_branch_dec.


Re: [PATCH net-next RFC 7/8] udp: gro behind static key

2018-09-17 Thread Willem de Bruijn
On Mon, Sep 17, 2018 at 6:37 AM Steffen Klassert
 wrote:
>
> On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote:
> > From: Willem de Bruijn 
> >
> > Avoid the socket lookup cost in udp_gro_receive if no socket has a
> > gro callback configured.
>
> It would be nice if we could do GRO not just for GRO configured
> sockets, but also for flows that are going to be IPsec transformed
> or directly forwarded.

I thought about that, as we have GSO. An egregious hack enables
GRO for all registered local sockets that support it and for any flow
for which no local socket is registered:

@@ -365,11 +369,13 @@ struct sk_buff *udp_gro_receive(struct list_head
*head, struct sk_buff *skb,
rcu_read_lock();
sk = (*lookup)(skb, uh->source, uh->dest);

-   if (sk && udp_sk(sk)->gro_receive)
-   goto unflush;
-   goto out_unlock;
+   if (!sk)
+   gro_receive_cb = udp_gro_receive_cb;
+   else if (!udp_sk(sk)->gro_receive)
+   goto out_unlock;
+   else
+   gro_receive_cb = udp_sk(sk)->gro_receive;

@@ -392,7 +398,7 @@ struct sk_buff *udp_gro_receive(struct list_head
*head, struct sk_buff *skb,

skb_gro_pull(skb, sizeof(struct udphdr)); /* pull
encapsulating udp header */
skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
-   pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
+   pp = call_gro_receive_sk(gro_receive_cb, sk, head, skb);


But not having a local socket does not imply forwarding path, of course.

>
> Maybe in case that forwarding is enabled on the receiving device,
> inet_gro_receive() could do a route lookup and allow GRO if the
> route lookup returned a forwarding route.

That's a better solution, if the cost is acceptable. We do have to
be careful against increasing per packet cycle cost in this path
given that it's a possible vector for DoS attempts.

> For flows that are likely software segmented after that, it
> would be worth building packet chains instead of merging the
> payload. Packets of the same flow could travel together, but
> it would save the cost of the packet merging and segmenting.

With software GSO that is faster, as it would have to allocate
all the separate segment skbs in skb_segment later. Though
there is some complexity if MTUs differ.

With hardware UDP GSO, having a single skb will be cheaper in
the forwarding path. Using napi_gro_frags, device drivers really
do only end up allocating one skb for the GSO packet.

> This could be done similar to what I proposed for the list
> receive case:
>
> https://www.spinics.net/lists/netdev/msg522706.html
>
> How GRO should be done could be even configured
> by replacing the net_offload pointer similar
> to what Paolo proposed in his patchset with
> the inet_update_offload() function.

Right. The above hack also already has to use two distinct
callback assignments.


[PATCH] net: phy: phylink: fix SFP interface autodetection

2018-09-17 Thread Baruch Siach
When switching to the SFP detected link mode, update the main
link_interface field as well. Otherwise, the link fails to come up when
the configured 'phy-mode' differs from the SFP detected mode.

This fixes 1Gbps SFP module link up on eth3 of the Macchiatobin board that
is configured in the DT to "2500base-x" phy-mode.

Signed-off-by: Baruch Siach 
---
 drivers/net/phy/phylink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 3ba5cf2a8a5f..3ece48c86841 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1631,6 +1631,7 @@ static int phylink_sfp_module_insert(void *upstream,
if (pl->link_an_mode != MLO_AN_INBAND ||
pl->link_config.interface != config.interface) {
pl->link_config.interface = config.interface;
+   pl->link_interface = config.interface;
pl->link_an_mode = MLO_AN_INBAND;
 
changed = true;
-- 
2.18.0



Re: [PATCH v2 2/4] dt-bindings: net: qcom: Add binding for shared mdio bus

2018-09-17 Thread Andrew Lunn
On Mon, Sep 17, 2018 at 04:53:29PM +0800, Wang Dongsheng wrote:
> This property is copied from "ibm,emac.txt" to describe a shared MDIO bus.
> Since the emac includes an MDIO bus, if the motherboard has more than one PHY
> connected to an MDIO bus, this property will point to the MAC device
> that has the MDIO bus.
> 
> Signed-off-by: Wang Dongsheng 
> ---
> V2: s/Since QDF2400 emac/Since emac/
> ---
>  Documentation/devicetree/bindings/net/qcom-emac.txt | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt 
> b/Documentation/devicetree/bindings/net/qcom-emac.txt
> index 346e6c7f47b7..50db71771358 100644
> --- a/Documentation/devicetree/bindings/net/qcom-emac.txt
> +++ b/Documentation/devicetree/bindings/net/qcom-emac.txt
> @@ -24,6 +24,9 @@ Internal PHY node:
>  The external phy child node:
>  - reg : The phy address
>  
> +Optional properties:
> +- mdio-device : Shared MDIO bus.

Hi Dongsheng

I don't see why you need this property. The ethernet interface has a
phy-handle which points to a PHY. That is all you need to find the PHY.

emac0: ethernet@feb2 {
compatible = "qcom,fsm9900-emac";
reg = <0xfeb2 0x1>,
  <0xfeb36000 0x1000>;
interrupts = <76>;

clocks = <&gcc 0>, <&gcc 1>, <&gcc 3>, <&gcc 4>, <&gcc 5>,
<&gcc 6>, <&gcc 7>;
clock-names = "axi_clk", "cfg_ahb_clk", "high_speed_clk",
"mdio_clk", "tx_clk", "rx_clk", "sys_clk";

internal-phy = <&emac_sgmii>;

phy-handle = <&phy0>;

#address-cells = <1>;
#size-cells = <0>;

phy0: ethernet-phy@0 {
reg = <0>;
};

phy1: ethernet-phy@1 {
reg = <1>;
};

pinctrl-names = "default";
pinctrl-0 = <&mdio_pins_a>;
};

emac1: ethernet@3890 {
compatible = "qcom,fsm9900-emac";
...
...

phy-handle = <&phy1>;
};

Andrew


Re: [PATCH net] pppoe: fix reception of frames with no mac header

2018-09-17 Thread David Miller
From: Guillaume Nault 
Date: Fri, 14 Sep 2018 16:28:05 +0200

> pppoe_rcv() needs to look back at the Ethernet header in order to
> lookup the PPPoE session. Therefore we need to ensure that the mac
> header is big enough to contain an Ethernet header. Otherwise
> eth_hdr(skb)->h_source might access invalid data.
 ...
> Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers")
> Reported-by: syzbot+f5f6080811c849739...@syzkaller.appspotmail.com
> Signed-off-by: Guillaume Nault 

Applied and queued up for -stable, thanks.


Re: [PATCH net] bnxt_en: Fix VF mac address regression.

2018-09-17 Thread David Miller
From: Michael Chan 
Date: Fri, 14 Sep 2018 15:41:29 -0400

> The recent commit to always forward the VF MAC address to the PF for
> approval may not work if the PF driver or the firmware is older.  This
> will cause the VF driver to fail during probe:
> 
>   bnxt_en :00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 
> 0xf seq id 0x5 error 0x
>   bnxt_en :00:03.0 (unnamed net_device) (uninitialized): VF MAC address 
> 00:00:17:02:05:d0 not approved by the PF
>   bnxt_en :00:03.0: Unable to initialize mac address.
>   bnxt_en: probe of :00:03.0 failed with error -99
> 
> We fix it by treating the error as fatal only if the VF MAC address is
> locally generated by the VF.
> 
> Fixes: 707e7e966026 ("bnxt_en: Always forward VF MAC address to the PF.")
> Reported-by: Seth Forshee 
> Reported-by: Siwei Liu 
> Signed-off-by: Michael Chan 
> ---
> Please queue this for stable as well.  Thanks.

Applied and queued up for -stable.


Re: [PATCH net] ipv6: fix possible use-after-free in ip6_xmit()

2018-09-17 Thread David Miller
From: Eric Dumazet 
Date: Fri, 14 Sep 2018 12:02:31 -0700

> In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
> we need to call skb_set_owner_w() before consuming original skb,
> otherwise we risk a use-after-free.
> 
> Bring IPv6 in line with what we do in IPv4 to fix this.
> 
> Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
> Signed-off-by: Eric Dumazet 
> Reported-by: syzbot 

Applied and queued up for -stable.


Re: [PATCH v2 0/2] hv_netvsc: associate VF and PV device by serial number

2018-09-17 Thread David Miller
From: Stephen Hemminger 
Date: Fri, 14 Sep 2018 12:54:55 -0700

> The Hyper-V implementation of PCI controller has concept of 32 bit serial 
> number
> (not to be confused with PCI-E serial number).  This value is sent in the 
> protocol
> from the host to indicate SR-IOV VF device is attached to a synthetic NIC.
> 
> Using the serial number (instead of MAC address) to associate the two devices
> avoids lots of potential problems when there are duplicate MAC addresses from
> tunnels or layered devices.
> 
> The patch set is broken into two parts, one is for the PCI controller
> and the other is for the netvsc device. Normally, these go through different
> trees but sending them together here for better review. The PCI changes
> were submitted previously, but the main review comment was "why do you
> need this?". This is why.
> 
> v2 - slot name can be shorter.
>  remove locking when creating pci_slots; see comment for explanation

Series applied, thanks.


Re: [net-next PATCH] tls: async support causes out-of-bounds access in crypto APIs

2018-09-17 Thread David Miller
From: John Fastabend 
Date: Fri, 14 Sep 2018 13:01:46 -0700

> When async support was added it needed to access the sk from the async
> callback to report errors up the stack. The patch tried to use space
> after the aead request struct by directly setting the reqsize field in
> aead_request. This is an internal field that should not be used
> outside the crypto APIs. It is used by the crypto code to define extra
> space for private structures used in the crypto context. Users of the
> API then use crypto_aead_reqsize() and add the returned amount of
> bytes to the end of the request memory allocation before posting the
> request to encrypt/decrypt APIs.
> 
> So this breaks (with general protection fault and KASAN error, if
> enabled) because the request sent to decrypt is shorter than required
> causing the crypto API out-of-bounds errors. Also it seems unlikely the
> sk is even valid by the time it gets to the callback because of memset
> in crypto layer.
> 
> Anyways, fix this by holding the sk in the skb->sk field when the
> callback is set up and because the skb is already passed through to
> the callback handler via void* we can access it in the handler. Then
> in the handler we need to be careful to NULL the pointer again before
> kfree_skb. I added comments on both the setup (in tls_do_decryption)
> and when we clear it from the crypto callback handler
> tls_decrypt_done(). After this selftests pass again and fixes KASAN
> errors/warnings.
> 
> Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls 
> records")
> Signed-off-by: John Fastabend 

Applied, thanks John.


Re: [Patch net-next] ipv4: initialize ra_mutex in inet_init_net()

2018-09-17 Thread David Miller
From: Cong Wang 
Date: Fri, 14 Sep 2018 13:32:42 -0700

> ra_mutex is a IPv4 specific mutex, it is inside struct netns_ipv4,
> but its initialization is in the generic netns code, setup_net().
> 
> Move it to IPv4 specific net init code, inet_init_net().
> 
> Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex")
> Cc: Kirill Tkhai 
> Signed-off-by: Cong Wang 

Please take into consideration Kirill's feedback.

Thank you.


Re: [PATCH net] net: dsa: mv88e6xxx: Fix ATU Miss Violation

2018-09-17 Thread David Miller
From: Andrew Lunn 
Date: Fri, 14 Sep 2018 23:46:12 +0200

> Fix a cut/paste error and a typo which results in ATU miss violations
> not being reported.
> 
> Fixes: 0977644c5005 ("net: dsa: mv88e6xxx: Decode ATU problem interrupt")
> Signed-off-by: Andrew Lunn 

Applied and queued up for -stable.


Re: [PATCH net] tls: fix currently broken MSG_PEEK behavior

2018-09-17 Thread David Miller
From: Daniel Borkmann 
Date: Fri, 14 Sep 2018 23:00:55 +0200

> In kTLS MSG_PEEK behavior is currently failing, strace example:
 ...
> As can be seen from strace, there are two TLS records sent,
> i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
> peeking 'test_read_peektest_read_peektest'. This is clearly
> wrong, and what happens is that given peek cannot call into
> tls_sw_advance_skb() to unpause strparser and proceed with
> the next skb, we end up looping over the current one, copying
> the 'test_read_peek' over and over into the user provided
> buffer.
> 
> Here, we can only peek into the currently held skb (current,
> full TLS record) as otherwise we would end up having to hold
> all the original skb(s) (depending on the peek depth) in a
> separate queue when unpausing strparser to process next
> records, minimally intrusive is to return only up to the
> current record's size (which likely was what c46234ebb4d1
> ("tls: RX path for ktls") originally intended as well). Thus,
> after patch we properly peek the first record:
 ...
> Fixes: c46234ebb4d1 ("tls: RX path for ktls")
> Signed-off-by: Daniel Borkmann 

Applied and queued up for -stable.


Re: [PATCH net-next] net: dsa: gswip: Fix return value check in gswip_probe()

2018-09-17 Thread David Miller
From: Wei Yongjun 
Date: Sat, 15 Sep 2018 01:33:21 +

> In case of error, the function devm_ioremap_resource() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
> 
> Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH net-next] net: lantiq: Fix return value check in xrx200_probe()

2018-09-17 Thread David Miller
From: Wei Yongjun 
Date: Sat, 15 Sep 2018 01:33:50 +

> In case of error, the function devm_ioremap_resource() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
> 
> Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH net-next] net: hns: make function hns_gmac_wait_fifo_clean() static

2018-09-17 Thread David Miller
From: Wei Yongjun 
Date: Sat, 15 Sep 2018 01:42:09 +

> Fixes the following sparse warning:
> 
> drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c:322:5: warning:
>  symbol 'hns_gmac_wait_fifo_clean' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied.


[PATCH net] net/ipv4: defensive cipso option parsing

2018-09-17 Thread Stefan Nuernberger
commit 40413955ee26 ("Cipso: cipso_v4_optptr enter infinite loop") fixed
a possible infinite loop in the IP option parsing of CIPSO. The fix
assumes that ip_options_compile filtered out all zero length options and
that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
While this assumption currently holds true, add explicit checks for zero
length and invalid length options to be safe for the future. Even though
ip_options_compile should have validated the options, the introduction of
new one-byte options can still confuse this code without the additional
checks.

Signed-off-by: Stefan Nuernberger 
Reviewed-by: David Woodhouse 
Reviewed-by: Simon Veith 
Cc: sta...@vger.kernel.org
---
 net/ipv4/cipso_ipv4.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 82178cc69c96..f291b57b8474 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1512,7 +1512,7 @@ static int cipso_v4_parsetag_loc(const struct 
cipso_v4_doi *doi_def,
  *
  * Description:
  * Parse the packet's IP header looking for a CIPSO option.  Returns a pointer
- * to the start of the CIPSO option on success, NULL if one if not found.
+ * to the start of the CIPSO option on success, NULL if one is not found.
  *
  */
 unsigned char *cipso_v4_optptr(const struct sk_buff *skb)
@@ -1522,9 +1522,11 @@ unsigned char *cipso_v4_optptr(const struct sk_buff *skb)
int optlen;
int taglen;
 
-   for (optlen = iph->ihl*4 - sizeof(struct iphdr); optlen > 0; ) {
+   for (optlen = iph->ihl*4 - sizeof(struct iphdr); optlen > 1; ) {
switch (optptr[0]) {
case IPOPT_CIPSO:
+   if (!optptr[1] || optptr[1] > optlen)
+   return NULL;
return optptr;
case IPOPT_END:
return NULL;
@@ -1534,6 +1536,10 @@ unsigned char *cipso_v4_optptr(const struct sk_buff *skb)
default:
taglen = optptr[1];
}
+
+   if (!taglen || taglen > optlen)
+   break;
+
optlen -= taglen;
optptr += taglen;
}
-- 
2.19.0



Re: [PATCH] net: phy: phylink: fix SFP interface autodetection

2018-09-17 Thread Russell King - ARM Linux
On Mon, Sep 17, 2018 at 05:19:57PM +0300, Baruch Siach wrote:
> When switching to the SFP detected link mode, update the main
> link_interface field as well. Otherwise, the link fails to come up when
> the configured 'phy-mode' differs from the SFP detected mode.
> 
> This fixes 1GB SFP module link up on eth3 of the Macchiatobin board that
> is configured in the DT to "2500base-x" phy-mode.

link_interface isn't supposed to track the SFP link mode.  In any case,
this is only used when a PHY is attached.  For a PHY on a SFP,
phylink_connect_phy() should be using link_config.interface and not
link_interface there.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH v3 net-next 07/12] net: ethernet: Add helper to remove a supported link mode

2018-09-17 Thread Simon Horman
On Wed, Sep 12, 2018 at 01:53:14AM +0200, Andrew Lunn wrote:
> Some MAC hardware cannot support a subset of link modes. e.g. often
> 1Gbps Full duplex is supported, but Half duplex is not. Add a helper
> to remove such a link mode.
> 
> Signed-off-by: Andrew Lunn 
> Reviewed-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c |  6 +++---
>  drivers/net/ethernet/cadence/macb_main.c   |  5 ++---
>  drivers/net/ethernet/freescale/fec_main.c  |  3 ++-
>  drivers/net/ethernet/microchip/lan743x_main.c  |  2 +-
>  drivers/net/ethernet/renesas/ravb_main.c   |  3 ++-
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c  | 12 
>  drivers/net/phy/phy_device.c   | 18 ++
>  drivers/net/usb/lan78xx.c  |  2 +-
>  include/linux/phy.h|  1 +
>  9 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> index 078a04dc1182..4831f9de5945 100644

...

> diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
> b/drivers/net/ethernet/renesas/ravb_main.c
> index aff5516b781e..fb2a1125780d 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1074,7 +1074,8 @@ static int ravb_phy_init(struct net_device *ndev)
>   }
>  
>   /* 10BASE is not supported */
> - phydev->supported &= ~PHY_10BT_FEATURES;
> + phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Half_BIT);
> + phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Full_BIT);
>  
>   phy_attached_info(phydev);
>  

...

> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index db1172db1e7c..e9ca83a438b0 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1765,6 +1765,24 @@ int phy_set_max_speed(struct phy_device *phydev, u32 
> max_speed)
>  }
>  EXPORT_SYMBOL(phy_set_max_speed);
>  
> +/**
> + * phy_remove_link_mode - Remove a supported link mode
> + * @phydev: phy_device structure to remove link mode from
> + * @link_mode: Link mode to be removed
> + *
> + * Description: Some MACs don't support all link modes which the PHY
> + * does.  e.g. a 1G MAC often does not support 1000Half. Add a helper
> + * to remove a link mode.
> + */
> +void phy_remove_link_mode(struct phy_device *phydev, u32 link_mode)
> +{
> + WARN_ON(link_mode > 31);
> +
> + phydev->supported &= ~BIT(link_mode);
> + phydev->advertising = phydev->supported;
> +}
> +EXPORT_SYMBOL(phy_remove_link_mode);
> +
>  static void of_set_phy_supported(struct phy_device *phydev)
>  {
>   struct device_node *node = phydev->mdio.dev.of_node;

Hi Andrew,

I believe that for the RAVB the overall effect of this change is that
10-BaseT modes are no longer advertised (although both with and without
this patch they are not supported).

Unfortunately on R-Car Gen3 M3-W (r8a7796) based Salvator-X board
I have observed that this results in the link no longer being negotiated
on one switch (the one I usually use) while it seemed fine on another.


Re: [PATCH net-next 0/5] net: lantiq: Minor fixes for vrx200 and gswip

2018-09-17 Thread David Miller
From: Hauke Mehrtens 
Date: Sat, 15 Sep 2018 14:08:44 +0200

> These are mostly minor fixes to problems addressed in the latest round
> of the review of the original series adding these drivers, which were not
> applied before the patches got merged into net-next.
> In addition it fixes a data bus error on poweroff.

Series applied.


Re: [PATCH v2 net] net: aquantia: memory corruption on jumbo frames

2018-09-17 Thread David Miller
From: Igor Russkikh 
Date: Sat, 15 Sep 2018 18:03:39 +0300

> From: Friedemann Gerold 
> 
> This patch fixes skb_shared area, which will be corrupted
> upon reception of 4K jumbo packets.
> 
> Originally build_skb usage purpose was to reuse page for skb to eliminate
> needs of extra fragments. But that logic does not take into account that
> skb_shared_info should be reserved at the end of skb data area.
> 
> In case packet data consumes all the page (4K), skb_shinfo location
> overflows the page. As a consequence, __build_skb zeroed shinfo data above
> the allocated page, corrupting next page.
> 
> The issue is rarely seen in real life because jumbo are normally larger
> than 4K and that causes another code path to trigger.
> But it is 100% reproducible with a simple scapy packet, like:
> 
> sendp(IP(dst="192.168.100.3") / TCP(dport=443) \
>   / Raw(RandString(size=(4096-40))), iface="enp1s0")
> 
> Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
> 
> Reported-by: Friedemann Gerold 
> Reported-by: Michael Rauch 
> Signed-off-by: Friedemann Gerold 
> Tested-by: Nikita Danilov 
> Signed-off-by: Igor Russkikh 

Applied and queued up for -stable.


Re: [PATCHv2 net-next 1/1] net: rds: use memset to optimize the recv

2018-09-17 Thread David Miller
From: Zhu Yanjun 
Date: Sun, 16 Sep 2018 22:49:30 -0400

> The function rds_inc_init is in the receive path. Using memset can
> optimize the function rds_inc_init.
> The test result:
> 
>  Before:
>  1) + 24.950 us   |rds_inc_init [rds]();
>  After:
>  1) + 10.990 us   |rds_inc_init [rds]();
> 
> Acked-by: Santosh Shilimkar 
> Signed-off-by: Zhu Yanjun 
> ---
> V1->V2: a new patch for net-next

Applied.


Re: [PATCH net-next] liquidio: Add the features to show FEC settings and set FEC settings

2018-09-17 Thread David Miller
From: Felix Manlunas 
Date: Sun, 16 Sep 2018 22:43:32 -0700

> From: Weilin Chang 
> 
> 1. Add functions for get_fecparam and set_fecparam.
> 2. Modify lio_get_link_ksettings to display FEC setting.
> 
> Signed-off-by: Weilin Chang 
> Acked-by: Derek Chickles 
> Signed-off-by: Felix Manlunas 

Applied.


[PATCH] net: emac: fix fixed-link setup for the RTL8363SB switch

2018-09-17 Thread Christian Lamparter
On the Netgear WNDAP620, the emac ethernet isn't receiving or
transmitting any frames from/to the RTL8363SB (identifies itself
as a RTL8367RB).

This is caused by the emac hardware not knowing the forced link
parameters for speed, duplex, pause, etc.

This begs the question of how this was working with the original
driver code, when it was necessary to set the phy_address and
phy_map to 0x. But I guess without access to the old
PPC405/440/460 hardware, it's not possible to know.

Signed-off-by: Christian Lamparter 
---
 drivers/net/ethernet/ibm/emac/core.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c 
b/drivers/net/ethernet/ibm/emac/core.c
index 354c0982847b..3b398ebdb5e6 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -2677,12 +2677,17 @@ static int emac_init_phy(struct emac_instance *dev)
if (of_phy_is_fixed_link(np)) {
int res = emac_dt_mdio_probe(dev);
 
-   if (!res) {
-   res = of_phy_register_fixed_link(np);
-   if (res)
-   mdiobus_unregister(dev->mii_bus);
+   if (res)
+   return res;
+
+   res = of_phy_register_fixed_link(np);
+   dev->phy_dev = of_phy_find_device(np);
+   if (res || !dev->phy_dev) {
+   mdiobus_unregister(dev->mii_bus);
+   return res ? res : -EINVAL;
}
-   return res;
+   emac_adjust_link(dev->ndev);
+   put_device(&dev->phy_dev->mdio.dev);
}
return 0;
}
-- 
2.19.0.rc2


