[PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock
Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with a single global rtnl lock, which removes any possibility of parallelism. This patch set is a third step towards removing the rtnl lock dependency from the TC rules update path. Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was added. Handlers registered with this flag are called without RTNL taken. The end goal is to have the rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) registered with the UNLOCKED flag to allow parallel execution. However, there is no intention to completely remove or split the rtnl lock itself.

This patch set addresses specific problems in the implementation of the classifiers API that prevent its control path from being executed concurrently. Additional changes are required to refactor the classifiers API and individual classifiers for parallel execution. This patch set lays the groundwork to eventually register rule update handlers as rtnl-unlocked by modifying code in the cls API that works with Qdiscs and blocks. A following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its dependencies to allow concurrent execution:
- Extend the Qdisc API with rcu to lookup and take a reference to a Qdisc without relying on the rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take a reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by tcf_block_find().
- Create infrastructure to allow registering Qdiscs with class ops that do not require the caller to hold the rtnl lock.

All three netlink rule update handlers use tcf_block_find() to look up the Qdisc and block, and this patch set introduces additional means of synchronization to substitute the rtnl lock in the cls API.

Some functions in the cls and sch APIs have historic names that no longer clearly describe their intent.
In order not to make this code even more confusing when introducing their concurrency-friendly versions, rename these functions to describe the actual implementation.

Changes from V1 to V2:
- Rebase on latest net-next.
- Patch 8 - remove.
- Patch 9 - fold into patch 11.
- Patch 11:
  - Rename tcf_block_{get|put}() to tcf_block_refcnt_{get|put}().
- Patch 13 - remove.

Vlad Buslov (10):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_refcnt_{get|put}()
  net: sched: use reference counting for tcf blocks on rules update

 include/linux/rtnetlink.h |   6 ++
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  20 +++-
 net/core/rtnetlink.c      |   6 ++
 net/sched/cls_api.c       | 250 +-
 net/sched/sch_api.c       |  24 -
 net/sched/sch_atm.c       |   2 +-
 net/sched/sch_cbq.c       |   2 +-
 net/sched/sch_cbs.c       |   2 +-
 net/sched/sch_drr.c       |   4 +-
 net/sched/sch_dsmark.c    |   2 +-
 net/sched/sch_fifo.c      |   2 +-
 net/sched/sch_generic.c   |  48 +++--
 net/sched/sch_hfsc.c      |   2 +-
 net/sched/sch_htb.c       |   4 +-
 net/sched/sch_mq.c        |   4 +-
 net/sched/sch_mqprio.c    |   4 +-
 net/sched/sch_multiq.c    |   6 +-
 net/sched/sch_netem.c     |   2 +-
 net/sched/sch_prio.c      |   6 +-
 net/sched/sch_qfq.c       |   4 +-
 net/sched/sch_red.c       |   4 +-
 net/sched/sch_sfb.c       |   4 +-
 net/sched/sch_tbf.c       |   4 +-
 24 files changed, 301 insertions(+), 112 deletions(-)

-- 
2.7.5
[PATCH net-next v2 08/10] net: sched: protect block idr with spinlock
Protect block idr access with spinlock, instead of relying on rtnl lock. Take tn->idr_lock spinlock during block insertion and removal. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- net/sched/cls_api.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 58b2d8443f6a..924723fb74f6 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -473,6 +473,7 @@ tcf_chain0_head_change_cb_del(struct tcf_block *block, } struct tcf_net { + spinlock_t idr_lock; /* Protects idr */ struct idr idr; }; @@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, struct net *net, struct netlink_ext_ack *extack) { struct tcf_net *tn = net_generic(net, tcf_net_id); + int err; + + idr_preload(GFP_KERNEL); + spin_lock(&tn->idr_lock); + err = idr_alloc_u32(&tn->idr, block, &block->index, block->index, + GFP_NOWAIT); + spin_unlock(&tn->idr_lock); + idr_preload_end(); - return idr_alloc_u32(&tn->idr, block, &block->index, block->index, -GFP_KERNEL); + return err; } static void tcf_block_remove(struct tcf_block *block, struct net *net) { struct tcf_net *tn = net_generic(net, tcf_net_id); + spin_lock(&tn->idr_lock); idr_remove(&tn->idr, block->index); + spin_unlock(&tn->idr_lock); } static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q, @@ -2285,6 +2295,7 @@ static __net_init int tcf_net_init(struct net *net) { struct tcf_net *tn = net_generic(net, tcf_net_id); + spin_lock_init(&tn->idr_lock); idr_init(&tn->idr); return 0; } -- 2.7.5
[PATCH net-next v2 06/10] net: sched: change tcf block reference counter type to refcount_t
As a preparation for removing rtnl lock dependency from rules update path, change tcf block reference counter type to refcount_t to allow modification by concurrent users. In block put function perform decrement and check reference counter once to accommodate concurrent modification by unlocked users. After this change tcf_chain_put at the end of block put function is called with block->refcnt==0 and will deallocate block after the last chain is released, so there is no need to manually deallocate block in this case. However, if block reference counter reached 0 and there are no chains to release, block must still be deallocated manually. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- include/net/sch_generic.h | 2 +- net/sched/cls_api.c | 59 --- 2 files changed, 36 insertions(+), 25 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 9a295e690efe..45fee65468d0 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -345,7 +345,7 @@ struct tcf_chain { struct tcf_block { struct list_head chain_list; u32 index; /* block index for shared blocks */ - unsigned int refcnt; + refcount_t refcnt; struct net *net; struct Qdisc *q; struct list_head cb_list; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index cfa4a02a6a1a..c3c7d4e2f84c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain) if (!chain->index) block->chain0.chain = NULL; kfree(chain); - if (list_empty(&block->chain_list) && block->refcnt == 0) + if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt)) kfree(block); } @@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q, INIT_LIST_HEAD(&block->owner_list); INIT_LIST_HEAD(&block->chain0.filter_chain_list); - block->refcnt = 1; + refcount_set(&block->refcnt, 1); block->net = net; block->index = block_index; @@ -719,7 +719,7 @@ int tcf_block_get_ext(struct tcf_block 
**p_block, struct Qdisc *q, /* block_index not 0 means the shared block is requested */ block = tcf_block_lookup(net, ei->block_index); if (block) - block->refcnt++; + refcount_inc(&block->refcnt); } if (!block) { @@ -762,7 +762,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, err_block_insert: kfree(block); } else { - block->refcnt--; + refcount_dec(&block->refcnt); } return err; } @@ -802,34 +802,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, tcf_chain0_head_change_cb_del(block, ei); tcf_block_owner_del(block, q, ei->binder_type); - if (block->refcnt == 1) { - if (tcf_block_shared(block)) - tcf_block_remove(block, block->net); - - /* Hold a refcnt for all chains, so that they don't disappear -* while we are iterating. + if (refcount_dec_and_test(&block->refcnt)) { + /* Flushing/putting all chains will cause the block to be +* deallocated when last chain is freed. However, if chain_list +* is empty, block has to be manually deallocated. After block +* reference counter reached 0, it is no longer possible to +* increment it or add new chains to block. */ - list_for_each_entry(chain, &block->chain_list, list) - tcf_chain_hold(chain); + bool free_block = list_empty(&block->chain_list); - list_for_each_entry(chain, &block->chain_list, list) - tcf_chain_flush(chain); - } + if (tcf_block_shared(block)) + tcf_block_remove(block, block->net); - tcf_block_offload_unbind(block, q, ei); + if (!free_block) { + /* Hold a refcnt for all chains, so that they don't +* disappear while we are iterating. +*/ + list_for_each_entry(chain, &block->chain_list, list) + tcf_chain_hold(chain); - if (block->refcnt == 1) { - /* At this point, all the chains should have refcnt >= 1. */ - list_for_each_entry_safe(chain, tmp, &block->chain_list, list) { - tcf_chain_put_explicitly_created(chain); - tcf_chain_put(chain); + list_for_each_entry(chain, &block->chain_list, list) + tcf_chain_flush(chain); } - block->refcnt--
[PATCH net-next v2 07/10] net: sched: implement functions to put and flush all chains
Extract the code that flushes and puts all chains on a tcf block into two standalone functions to be shared with functions that locklessly get/put a reference to the block. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- net/sched/cls_api.c | 55 + 1 file changed, 30 insertions(+), 25 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index c3c7d4e2f84c..58b2d8443f6a 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -538,6 +538,31 @@ static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held) qdisc_put_unlocked(q); } +static void tcf_block_flush_all_chains(struct tcf_block *block) +{ + struct tcf_chain *chain; + + /* Hold a refcnt for all chains, so that they don't disappear +* while we are iterating. +*/ + list_for_each_entry(chain, &block->chain_list, list) + tcf_chain_hold(chain); + + list_for_each_entry(chain, &block->chain_list, list) + tcf_chain_flush(chain); +} + +static void tcf_block_put_all_chains(struct tcf_block *block) +{ + struct tcf_chain *chain, *tmp; + + /* At this point, all the chains should have refcnt >= 1. */ + list_for_each_entry_safe(chain, tmp, &block->chain_list, list) { + tcf_chain_put_explicitly_created(chain); + tcf_chain_put(chain); + } +} + /* Find tcf block. * Set q, parent, cl when appropriate. */ @@ -795,8 +820,6 @@ EXPORT_SYMBOL(tcf_block_get); void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, struct tcf_block_ext_info *ei) { - struct tcf_chain *chain, *tmp; - if (!block) return; tcf_chain0_head_change_cb_del(block, ei); @@ -813,32 +836,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, if (tcf_block_shared(block)) tcf_block_remove(block, block->net); - - if (!free_block) { - /* Hold a refcnt for all chains, so that they don't -* disappear while we are iterating.
-*/ - list_for_each_entry(chain, &block->chain_list, list) - tcf_chain_hold(chain); - - list_for_each_entry(chain, &block->chain_list, list) - tcf_chain_flush(chain); - } - + if (!free_block) + tcf_block_flush_all_chains(block); tcf_block_offload_unbind(block, q, ei); - if (free_block) { + if (free_block) kfree(block); - } else { - /* At this point, all the chains should have -* refcnt >= 1. -*/ - list_for_each_entry_safe(chain, tmp, &block->chain_list, -list) { - tcf_chain_put_explicitly_created(chain); - tcf_chain_put(chain); - } - } + else + tcf_block_put_all_chains(block); } else { tcf_block_offload_unbind(block, q, ei); } -- 2.7.5
[PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function
The rtnl lock is encapsulated in netlink and cannot be accessed by other modules directly. This means that reference-counted objects that rely on the rtnl lock cannot use it with a refcount helper function that atomically decrements the reference counter and takes the mutex. This patch implements a simple wrapper around refcount_dec_and_mutex_lock() that takes the rtnl lock when the reference counter reaches zero. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- include/linux/rtnetlink.h | 1 + net/core/rtnetlink.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 5225832bd6ff..dffbf665c086 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -34,6 +34,7 @@ extern void rtnl_unlock(void); extern int rtnl_trylock(void); extern int rtnl_is_locked(void); extern int rtnl_lock_killable(void); +extern bool refcount_dec_and_rtnl_lock(refcount_t *r); extern wait_queue_head_t netdev_unregistering_wq; extern struct rw_semaphore pernet_ops_rwsem; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index e4ae0319e189..ad9d1493cb27 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -130,6 +130,12 @@ int rtnl_is_locked(void) } EXPORT_SYMBOL(rtnl_is_locked); +bool refcount_dec_and_rtnl_lock(refcount_t *r) +{ + return refcount_dec_and_mutex_lock(r, &rtnl_mutex); +} +EXPORT_SYMBOL(refcount_dec_and_rtnl_lock); + #ifdef CONFIG_PROVE_LOCKING bool lockdep_rtnl_is_held(void) { -- 2.7.5
[PATCH net-next v2 09/10] net: sched: implement tcf_block_refcnt_{get|put}()
Implement get/put functions for blocks that only take/release the reference and perform deallocation. These functions are intended to be used by the unlocked rules update path to always hold a reference to a block while working with it. They rely on the new fine-grained locking mechanisms introduced in previous patches in this set, instead of relying on the global protection provided by the rtnl lock. Extract code that is common with tcf_block_detach_ext() into the common function __tcf_block_put(). Extend tcf_block with rcu to allow safe deallocation when it is accessed concurrently. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- include/net/sch_generic.h | 1 + net/sched/cls_api.c | 74 --- 2 files changed, 51 insertions(+), 24 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 45fee65468d0..931fcdadf64a 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -357,6 +357,7 @@ struct tcf_block { struct tcf_chain *chain; struct list_head filter_chain_list; } chain0; + struct rcu_head rcu; }; static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 924723fb74f6..0a7a3ace2da9 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -241,7 +241,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain) block->chain0.chain = NULL; kfree(chain); if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt)) - kfree(block); + kfree_rcu(block, rcu); } static void tcf_chain_hold(struct tcf_chain *chain) @@ -537,6 +537,19 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index) return idr_find(&tn->idr, block_index); } +static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index) +{ + struct tcf_block *block; + + rcu_read_lock(); + block = tcf_block_lookup(net, block_index); + if (block && !refcount_inc_not_zero(&block->refcnt)) + block = NULL; + rcu_read_unlock(); + + return block; +} + static void
tcf_qdisc_put(struct Qdisc *q, bool rtnl_held) { if (!q) @@ -573,6 +586,40 @@ static void tcf_block_put_all_chains(struct tcf_block *block) } } +static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q, + struct tcf_block_ext_info *ei) +{ + if (refcount_dec_and_test(&block->refcnt)) { + /* Flushing/putting all chains will cause the block to be +* deallocated when last chain is freed. However, if chain_list +* is empty, block has to be manually deallocated. After block +* reference counter reached 0, it is no longer possible to +* increment it or add new chains to block. +*/ + bool free_block = list_empty(&block->chain_list); + + if (tcf_block_shared(block)) + tcf_block_remove(block, block->net); + if (!free_block) + tcf_block_flush_all_chains(block); + + if (q) + tcf_block_offload_unbind(block, q, ei); + + if (free_block) + kfree_rcu(block, rcu); + else + tcf_block_put_all_chains(block); + } else if (q) { + tcf_block_offload_unbind(block, q, ei); + } +} + +static void tcf_block_refcnt_put(struct tcf_block *block) +{ + __tcf_block_put(block, NULL, NULL); +} + /* Find tcf block. * Set q, parent, cl when appropriate. */ @@ -795,7 +842,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, if (tcf_block_shared(block)) tcf_block_remove(block, net); err_block_insert: - kfree(block); + kfree_rcu(block, rcu); } else { refcount_dec(&block->refcnt); } @@ -835,28 +882,7 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, tcf_chain0_head_change_cb_del(block, ei); tcf_block_owner_del(block, q, ei->binder_type); - if (refcount_dec_and_test(&block->refcnt)) { - /* Flushing/putting all chains will cause the block to be -* deallocated when last chain is freed. However, if chain_list -* is empty, block has to be manually deallocated. After block -* reference counter reached 0, it is no longer possible to -* increment it or add new chains to block. 
-*/ - bool free_block = list_empty(&block->chain_list); - - if (tcf_block_shared(block)) - tcf_block_remove(block, block->net); - if (!free_block) - tcf_block_flush_all_chains(block)
[PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu
Currently, Qdisc API functions assume that users have rtnl lock taken. To implement rtnl unlocked classifiers update interface, Qdisc API must be extended with functions that do not require rtnl lock. Extend Qdisc structure with rcu. Implement special version of put function qdisc_put_unlocked() that is called without rtnl lock taken. This function only takes rtnl lock if Qdisc reference counter reached zero and is intended to be used as optimization. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- include/linux/rtnetlink.h | 5 + include/net/pkt_sched.h | 1 + include/net/sch_generic.h | 2 ++ net/sched/sch_api.c | 18 ++ net/sched/sch_generic.c | 25 - 5 files changed, 50 insertions(+), 1 deletion(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index dffbf665c086..d3dff3e41e6c 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -84,6 +84,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev) return rtnl_dereference(dev->ingress_queue); } +static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device *dev) +{ + return rcu_dereference(dev->ingress_queue); +} + struct netdev_queue *dev_ingress_queue_create(struct net_device *dev); #ifdef CONFIG_NET_INGRESS diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 7dc769e5452b..a16fbe9a2a67 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -102,6 +102,7 @@ int qdisc_set_default(const char *id); void qdisc_hash_add(struct Qdisc *q, bool invisible); void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); +struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, struct nlattr *tab, struct netlink_ext_ack *extack); diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index fadb1a4d4ee8..8a0d1523d11b 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ 
-90,6 +90,7 @@ struct Qdisc { struct gnet_stats_queue __percpu *cpu_qstats; int padded; refcount_t refcnt; + struct rcu_head rcu; /* * For performance sake on SMP, we put highly modified fields at the end @@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue, struct Qdisc *qdisc); void qdisc_reset(struct Qdisc *qdisc); void qdisc_put(struct Qdisc *qdisc); +void qdisc_put_unlocked(struct Qdisc *qdisc); void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n, unsigned int len); struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 2096138c4bf6..070bed39155b 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -314,6 +314,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle) return q; } +struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle) +{ + struct Qdisc *q; + struct netdev_queue *nq; + + if (!handle) + return NULL; + q = qdisc_match_from_root(dev->qdisc, handle); + if (q) + goto out; + + nq = dev_ingress_queue_rcu(dev); + if (nq) + q = qdisc_match_from_root(nq->qdisc_sleeping, handle); +out: + return q; +} + static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid) { unsigned long cl; diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 3e7696f3e053..531fac1d2875 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc) kfree((char *) qdisc - qdisc->padded); } +void qdisc_free_cb(struct rcu_head *head) +{ + struct Qdisc *q = container_of(head, struct Qdisc, rcu); + + qdisc_free(q); +} + static void qdisc_destroy(struct Qdisc *qdisc) { const struct Qdisc_ops *ops = qdisc->ops; @@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc) kfree_skb_list(skb); } - qdisc_free(qdisc); + call_rcu(&qdisc->rcu, qdisc_free_cb); } void qdisc_put(struct Qdisc *qdisc) @@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc) } 
EXPORT_SYMBOL(qdisc_put); +/* Version of qdisc_put() that is called with rtnl mutex unlocked. + * Intended to be used as optimization, this function only takes rtnl lock if + * qdisc reference counter reached zero. + */ + +void qdisc_put_unlocked(struct Qdisc *qdisc) +{ + if (qdisc->flags & TCQ_F_BUILTIN || + !refcount_dec_and_rtnl_lock(&qdisc->refcnt)) + return; + + qdisc_destroy(qdisc); + rtnl_unlock(); +} +EXPORT_SYMBOL(qdisc_put_unlocked); + /* Attach
[PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock
As a preparation for removing the rtnl lock dependency from the rules update path, use the Qdisc rcu and reference counting capabilities instead of relying on the rtnl lock while working with Qdiscs. Create a new tcf_block_release() function and use it to free resources taken by tcf_block_find(). Currently this function only releases the Qdisc; it is extended in subsequent patches in this series. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- net/sched/cls_api.c | 88 - 1 file changed, 73 insertions(+), 15 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 1a67af8a6e8c..cfa4a02a6a1a 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -527,6 +527,17 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index) return idr_find(&tn->idr, block_index); } +static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held) +{ + if (!q) + return; + + if (rtnl_held) + qdisc_put(q); + else + qdisc_put_unlocked(q); +} + /* Find tcf block. * Set q, parent, cl when appropriate.
*/ @@ -537,6 +548,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q, struct netlink_ext_ack *extack) { struct tcf_block *block; + int err = 0; if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) { block = tcf_block_lookup(net, block_index); @@ -548,55 +560,91 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q, const struct Qdisc_class_ops *cops; struct net_device *dev; + rcu_read_lock(); + /* Find link */ - dev = __dev_get_by_index(net, ifindex); - if (!dev) + dev = dev_get_by_index_rcu(net, ifindex); + if (!dev) { + rcu_read_unlock(); return ERR_PTR(-ENODEV); + } /* Find qdisc */ if (!*parent) { *q = dev->qdisc; *parent = (*q)->handle; } else { - *q = qdisc_lookup(dev, TC_H_MAJ(*parent)); + *q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent)); if (!*q) { NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists"); - return ERR_PTR(-EINVAL); + err = -EINVAL; + goto errout_rcu; } } + *q = qdisc_refcount_inc_nz(*q); + if (!*q) { + NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists"); + err = -EINVAL; + goto errout_rcu; + } + /* Is it classful? */ cops = (*q)->ops->cl_ops; if (!cops) { NL_SET_ERR_MSG(extack, "Qdisc not classful"); - return ERR_PTR(-EINVAL); + err = -EINVAL; + goto errout_rcu; } if (!cops->tcf_block) { NL_SET_ERR_MSG(extack, "Class doesn't support blocks"); - return ERR_PTR(-EOPNOTSUPP); + err = -EOPNOTSUPP; + goto errout_rcu; } + /* At this point we know that qdisc is not noop_qdisc, +* which means that qdisc holds a reference to net_device +* and we hold a reference to qdisc, so it is safe to release +* rcu read lock. +*/ + rcu_read_unlock(); + /* Do we search for filter, attached to class? 
*/ if (TC_H_MIN(*parent)) { *cl = cops->find(*q, *parent); if (*cl == 0) { NL_SET_ERR_MSG(extack, "Specified class doesn't exist"); - return ERR_PTR(-ENOENT); + err = -ENOENT; + goto errout_qdisc; } } /* And the last stroke */ block = cops->tcf_block(*q, *cl, extack); - if (!block) - return ERR_PTR(-EINVAL); + if (!block) { + err = -EINVAL; + goto errout_qdisc; + } if (tcf_block_shared(block)) { NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters"); - return ERR_PTR(-EOPNOTSUPP); + err = -EOPNOTSUPP; + goto errout_qdisc; } } return blo
[PATCH net-next v2 04/10] net: sched: add helper function to take reference to Qdisc
Implement function to take reference to Qdisc that relies on rcu read lock instead of rtnl mutex. Function only takes reference to Qdisc if reference counter isn't zero. Intended to be used by unlocked cls API. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- include/net/sch_generic.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 8a0d1523d11b..9a295e690efe 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc) refcount_inc(&qdisc->refcnt); } +/* Intended to be used by unlocked users, when concurrent qdisc release is + * possible. + */ + +static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc) +{ + if (qdisc->flags & TCQ_F_BUILTIN) + return qdisc; + if (refcount_inc_not_zero(&qdisc->refcnt)) + return qdisc; + return NULL; +} + static inline bool qdisc_is_running(struct Qdisc *qdisc) { if (qdisc->flags & TCQ_F_NOLOCK) -- 2.7.5
[PATCH net-next v2 02/10] net: sched: rename qdisc_destroy() to qdisc_put()
Current implementation of qdisc_destroy() decrements Qdisc reference counter and only actually destroy Qdisc if reference counter value reached zero. Rename qdisc_destroy() to qdisc_put() in order for it to better describe the way in which this function currently implemented and used. Extract code that deallocates Qdisc into new private qdisc_destroy() function. It is intended to be shared between regular qdisc_put() and its unlocked version that is introduced in next patch in this series. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- include/net/sch_generic.h | 2 +- net/sched/sch_api.c | 6 +++--- net/sched/sch_atm.c | 2 +- net/sched/sch_cbq.c | 2 +- net/sched/sch_cbs.c | 2 +- net/sched/sch_drr.c | 4 ++-- net/sched/sch_dsmark.c| 2 +- net/sched/sch_fifo.c | 2 +- net/sched/sch_generic.c | 23 ++- net/sched/sch_hfsc.c | 2 +- net/sched/sch_htb.c | 4 ++-- net/sched/sch_mq.c| 4 ++-- net/sched/sch_mqprio.c| 4 ++-- net/sched/sch_multiq.c| 6 +++--- net/sched/sch_netem.c | 2 +- net/sched/sch_prio.c | 6 +++--- net/sched/sch_qfq.c | 4 ++-- net/sched/sch_red.c | 4 ++-- net/sched/sch_sfb.c | 4 ++-- net/sched/sch_tbf.c | 4 ++-- 20 files changed, 47 insertions(+), 42 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index d326fd553b58..fadb1a4d4ee8 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -554,7 +554,7 @@ void dev_deactivate_many(struct list_head *head); struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue, struct Qdisc *qdisc); void qdisc_reset(struct Qdisc *qdisc); -void qdisc_destroy(struct Qdisc *qdisc); +void qdisc_put(struct Qdisc *qdisc); void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n, unsigned int len); struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue, diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 411c40344b77..2096138c4bf6 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -920,7 +920,7 @@ static void notify_and_destroy(struct net *net, struct 
sk_buff *skb, qdisc_notify(net, skb, n, clid, old, new); if (old) - qdisc_destroy(old); + qdisc_put(old); } /* Graft qdisc "new" to class "classid" of qdisc "parent" or @@ -973,7 +973,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent, qdisc_refcount_inc(new); if (!ingress) - qdisc_destroy(old); + qdisc_put(old); } skip: @@ -1561,7 +1561,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, err = qdisc_graft(dev, p, skb, n, clid, q, NULL, extack); if (err) { if (q) - qdisc_destroy(q); + qdisc_put(q); return err; } diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c index cd49afca9617..d714d3747bcb 100644 --- a/net/sched/sch_atm.c +++ b/net/sched/sch_atm.c @@ -150,7 +150,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl) pr_debug("atm_tc_put: destroying\n"); list_del_init(&flow->list); pr_debug("atm_tc_put: qdisc %p\n", flow->q); - qdisc_destroy(flow->q); + qdisc_put(flow->q); tcf_block_put(flow->block); if (flow->sock) { pr_debug("atm_tc_put: f_count %ld\n", diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index f42025d53cfe..4dc05409e3fb 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -1418,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl) WARN_ON(cl->filters); tcf_block_put(cl->block); - qdisc_destroy(cl->q); + qdisc_put(cl->q); qdisc_put_rtab(cl->R_tab); gen_kill_estimator(&cl->rate_est); if (cl != &q->link) diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c index e26a24017faa..e689e11b6d0f 100644 --- a/net/sched/sch_cbs.c +++ b/net/sched/sch_cbs.c @@ -379,7 +379,7 @@ static void cbs_destroy(struct Qdisc *sch) cbs_disable_offload(dev, q); if (q->qdisc) - qdisc_destroy(q->qdisc); + qdisc_put(q->qdisc); } static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb) diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c index e0b0cf8a9939..cdebaed0f8cf 100644 --- a/net/sched/sch_drr.c +++ b/net/sched/sch_drr.c @@ -134,7 +134,7 @@ static int 
drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid, tca[TCA_RATE]); if (err) { NL_SET_ERR_MSG(extack, "Failed to replace estimator"); - qdisc_destroy(cl->qdisc); + qdis
[PATCH net-next v2 10/10] net: sched: use reference counting for tcf blocks on rules update
In order to remove dependency on rtnl lock on rules update path, always take reference to block while using it on rules update path. Change tcf_block_get() error handling to properly release block with reference counting, instead of just destroying it, in order to accommodate potential concurrent users. Signed-off-by: Vlad Buslov Acked-by: Jiri Pirko --- net/sched/cls_api.c | 37 - 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 0a7a3ace2da9..6a3eec5dbdf1 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -633,7 +633,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q, int err = 0; if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) { - block = tcf_block_lookup(net, block_index); + block = tcf_block_refcnt_get(net, block_index); if (!block) { NL_SET_ERR_MSG(extack, "Block of given index was not found"); return ERR_PTR(-EINVAL); @@ -713,6 +713,14 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q, err = -EOPNOTSUPP; goto errout_qdisc; } + + /* Always take reference to block in order to support execution +* of rules update path of cls API without rtnl lock. Caller +* must release block when it is finished using it. 'if' block +* of this conditional obtain reference to block by calling +* tcf_block_refcnt_get(). 
+*/ + refcount_inc(&block->refcnt); } return block; @@ -726,6 +734,8 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q, static void tcf_block_release(struct Qdisc *q, struct tcf_block *block) { + if (!IS_ERR_OR_NULL(block)) + tcf_block_refcnt_put(block); tcf_qdisc_put(q, true); } @@ -794,21 +804,16 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, { struct net *net = qdisc_net(q); struct tcf_block *block = NULL; - bool created = false; int err; - if (ei->block_index) { + if (ei->block_index) /* block_index not 0 means the shared block is requested */ - block = tcf_block_lookup(net, ei->block_index); - if (block) - refcount_inc(&block->refcnt); - } + block = tcf_block_refcnt_get(net, ei->block_index); if (!block) { block = tcf_block_create(net, q, ei->block_index, extack); if (IS_ERR(block)) return PTR_ERR(block); - created = true; if (tcf_block_shared(block)) { err = tcf_block_insert(block, net, extack); if (err) @@ -838,14 +843,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, err_chain0_head_change_cb_add: tcf_block_owner_del(block, q, ei->binder_type); err_block_owner_add: - if (created) { - if (tcf_block_shared(block)) - tcf_block_remove(block, net); err_block_insert: - kfree_rcu(block, rcu); - } else { - refcount_dec(&block->refcnt); - } + tcf_block_refcnt_put(block); return err; } EXPORT_SYMBOL(tcf_block_get_ext); @@ -1739,7 +1738,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb) return err; if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) { - block = tcf_block_lookup(net, tcm->tcm_block_index); + block = tcf_block_refcnt_get(net, tcm->tcm_block_index); if (!block) goto out; /* If we work with block index, q is NULL and parent value @@ -1798,6 +1797,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb) } } + if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) + tcf_block_refcnt_put(block); cb->args[0] = index; out: @@ -2062,7 +2063,7 
@@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb) return err; if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) { - block = tcf_block_lookup(net, tcm->tcm_block_index); + block = tcf_block_refcnt_get(net, tcm->tcm_block_index); if (!block) goto out; /* If we work with block index, q is NULL and parent value @@ -2129,6 +2130,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb) index++; } + if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) + tcf_block_refcnt_put(block); cb->args[0] = index; out: -
Re: [Patch net-next] ipv4: initialize ra_mutex in inet_init_net()
On 14.09.2018 23:32, Cong Wang wrote: > ra_mutex is a IPv4 specific mutex, it is inside struct netns_ipv4, > but its initialization is in the generic netns code, setup_net(). > > Move it to IPv4 specific net init code, inet_init_net(). > > Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex") > Cc: Kirill Tkhai > Signed-off-by: Cong Wang > --- > net/core/net_namespace.c | 1 - > net/ipv4/af_inet.c | 2 ++ > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c > index 670c84b1bfc2..b272ccfcbf63 100644 > --- a/net/core/net_namespace.c > +++ b/net/core/net_namespace.c > @@ -308,7 +308,6 @@ static __net_init int setup_net(struct net *net, struct > user_namespace *user_ns) > net->user_ns = user_ns; > idr_init(&net->netns_ids); > spin_lock_init(&net->nsid_lock); > - mutex_init(&net->ipv4.ra_mutex); > > list_for_each_entry(ops, &pernet_list, list) { > error = ops_init(ops, net); > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 20fda8fb8ffd..57b7bffb93e5 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1817,6 +1817,8 @@ static __net_init int inet_init_net(struct net *net) > net->ipv4.sysctl_igmp_llm_reports = 1; > net->ipv4.sysctl_igmp_qrv = 2; > > + mutex_init(&net->ipv4.ra_mutex); > + In inet_init() the order of registration is: ip_mr_init(); init_inet_pernet_ops(); This means ipmr_net_ops pernet operations come before af_inet_ops in pernet_list. So there is a theoretical possibility that, sometime in the future, we will have a problem if net initialization fails. Say, setup_net(): ipmr_net_ops->init() returns 0, xxx->init() returns an error, and then we do ipmr_net_ops->exit(), which could touch ra_mutex (theoretically). Your patch is OK, but while you are at it, we may as well swap the registration order of ipmr_net_ops and af_inet_ops too. Kirill
[PATCH net 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races
This follows David Miller advice and tries to fix coalesce timer in multi-queue scenarios. We are now using per-queue coalesce values and per-queue TX timer. Coalesce timer default values was changed to 1ms and the coalesce frames to 25. Tested in B2B setup between XGMAC2 and GMAC5. Signed-off-by: Jose Abreu Fixes: ce736788e8a ("net: stmmac: adding multiple buffers for TX") Cc: Florian Fainelli Cc: Neil Armstrong Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/common.h | 4 +- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 14 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 233 -- include/linux/stmmac.h| 1 + 4 files changed, 146 insertions(+), 106 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index 1854f270ad66..b1b305f8f414 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -258,10 +258,10 @@ struct stmmac_safety_stats { #define MAX_DMA_RIWT 0xff #define MIN_DMA_RIWT 0x20 /* Tx coalesce parameters */ -#define STMMAC_COAL_TX_TIMER 4 +#define STMMAC_COAL_TX_TIMER 1000 #define STMMAC_MAX_COAL_TX_TICK10 #define STMMAC_TX_MAX_FRAMES 256 -#define STMMAC_TX_FRAMES 64 +#define STMMAC_TX_FRAMES 25 /* Packets types */ enum packets_types { diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index c0a855b7ab3b..63e1064b27a2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -48,6 +48,8 @@ struct stmmac_tx_info { /* Frequently used values are kept adjacent for cache effect */ struct stmmac_tx_queue { + u32 tx_count_frames; + struct timer_list txtimer; u32 queue_index; struct stmmac_priv *priv_data; struct dma_extended_desc *dma_etx cacheline_aligned_in_smp; @@ -73,7 +75,14 @@ struct stmmac_rx_queue { u32 
rx_zeroc_thresh; dma_addr_t dma_rx_phy; u32 rx_tail_addr; +}; + +struct stmmac_channel { struct napi_struct napi cacheline_aligned_in_smp; + struct stmmac_priv *priv_data; + u32 index; + int has_rx; + int has_tx; }; struct stmmac_tc_entry { @@ -109,14 +118,12 @@ struct stmmac_pps_cfg { struct stmmac_priv { /* Frequently used values are kept adjacent for cache effect */ - u32 tx_count_frames; u32 tx_coal_frames; u32 tx_coal_timer; int tx_coalesce; int hwts_tx_en; bool tx_path_in_lpi_mode; - struct timer_list txtimer; bool tso; unsigned int dma_buf_sz; @@ -137,6 +144,9 @@ struct stmmac_priv { /* TX Queue */ struct stmmac_tx_queue tx_queue[MTL_MAX_TX_QUEUES]; + /* Generic channel for NAPI */ + struct stmmac_channel channel[STMMAC_CH_MAX]; + bool oldlink; int speed; int oldduplex; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 9f458bb16f2a..ab9cc0143ff2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -148,12 +148,14 @@ static void stmmac_verify_args(void) static void stmmac_disable_all_queues(struct stmmac_priv *priv) { u32 rx_queues_cnt = priv->plat->rx_queues_to_use; + u32 tx_queues_cnt = priv->plat->tx_queues_to_use; + u32 maxq = max(rx_queues_cnt, tx_queues_cnt); u32 queue; - for (queue = 0; queue < rx_queues_cnt; queue++) { - struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue]; + for (queue = 0; queue < maxq; queue++) { + struct stmmac_channel *ch = &priv->channel[queue]; - napi_disable(&rx_q->napi); + napi_disable(&ch->napi); } } @@ -164,12 +166,14 @@ static void stmmac_disable_all_queues(struct stmmac_priv *priv) static void stmmac_enable_all_queues(struct stmmac_priv *priv) { u32 rx_queues_cnt = priv->plat->rx_queues_to_use; + u32 tx_queues_cnt = priv->plat->tx_queues_to_use; + u32 maxq = max(rx_queues_cnt, tx_queues_cnt); u32 queue; - for (queue = 0; queue < rx_queues_cnt; queue++) { - struct stmmac_rx_queue 
*rx_q = &priv->rx_queue[queue]; + for (queue = 0; queue < maxq; queue++) { + struct stmmac_channel *ch = &priv->channel[queue]; - napi_enable(&rx_q->napi); + napi_enable(&ch->napi); } } @@ -1843,18 +1847,18 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv) * @queue: TX queue index * Description: it reclaims the transmit resources after transmission completes. */ -static
[PATCH net 2/2] net: stmmac: Fixup the tail addr setting in xmit path
Currently we are always setting the tail address of the descriptor list to the end of the pre-allocated list. According to the databook this is not correct. The tail address should point to the last available descriptor + 1, which means we have to update the tail address every time we call the xmit function. This should have no impact on older versions of the MAC, but in newer versions there are some DMA features which allow the IP to fetch descriptors in advance and in a non-sequential order, so it's critical that we set the tail address correctly.

Signed-off-by: Jose Abreu
Fixes: f748be531d70 ("stmmac: support new GMAC4")
Cc: David S. Miller
Cc: Joao Pinto
Cc: Giuseppe Cavallaro
Cc: Alexandre Torgue
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ab9cc0143ff2..75896d6ba6e2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2213,8 +2213,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 		stmmac_init_tx_chan(priv, priv->ioaddr, priv->plat->dma_cfg,
 				    tx_q->dma_tx_phy, chan);
 
-		tx_q->tx_tail_addr = tx_q->dma_tx_phy +
-			    (DMA_TX_SIZE * sizeof(struct dma_desc));
+		tx_q->tx_tail_addr = tx_q->dma_tx_phy;
 		stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr,
 				       chan);
 	}
@@ -3003,6 +3002,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
 
+	tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
 	stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
 
 	return NETDEV_TX_OK;
@@ -3210,6 +3210,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	stmmac_enable_dma_transmission(priv, priv->ioaddr);
 
+	tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
 	stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
 
 	return NETDEV_TX_OK;
-- 
2.7.4
[PATCH net 0/2] net: stmmac: Coalesce and tail addr fixes
This series contains the fix for the coalesce timer and a fix for the tail address setting that impacts XGMAC2 operation. The series is: Tested-by: Jerome Brunet on a113 s400 board (single queue) Cc: Florian Fainelli Cc: Neil Armstrong Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue Jose Abreu (2): net: stmmac: Rework coalesce timer and fix multi-queue races net: stmmac: Fixup the tail addr setting in xmit path drivers/net/ethernet/stmicro/stmmac/common.h | 4 +- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 14 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 238 -- include/linux/stmmac.h| 1 + 4 files changed, 149 insertions(+), 108 deletions(-) -- 2.7.4
iproute2: Debian 9 No ELF support
Hello,

I have followed the instructions from:

https://cilium.readthedocs.io/en/latest/bpf/#bpftool

to test an xdp program. But I cannot enable ELF support.

./configure --prefix=/usr
```output
TC schedulers
 ATM	no

libc has setns: yes
SELinux support: no
ELF support: no
libmnl support: yes
Berkeley DB: yes
need for strlcpy: yes
libcap support: yes
```
And I have installed libelf-dev:
```output
sudo apt show libelf-dev
Package: libelf-dev
Version: 0.168-1
Priority: optional
Section: libdevel
Source: elfutils
Maintainer: Kurt Roeckx
Installed-Size: 353 kB
Depends: libelf1 (= 0.168-1)
Conflicts: libelfg0-dev
Homepage: https://sourceware.org/elfutils/
Tag: devel::library, role::devel-lib
```
And gcc version:
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

uname -a:
Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 GNU/Linux

Any help is appreciated.
Re: [PATCH net] netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev
On Sun, Sep 16, 2018 at 09:14:42PM -0700, David Ahern wrote: > Pablo: > > DaveM has this marked as waiting for upstream. Any comment on this patch? Please resend and Cc netfilter-de...@vger.kernel.org Thanks David.
[PATCH v2 0/4] net: qcom/emac: add shared mdio bus support
The EMAC includes an MDIO controller, and the motherboard has more than one PHY connected to a single MDIO bus. So share that mii_bus with other MAC devices that do not have an MDIO bus connected. Tested: QDF2400 (ACPI), built-in/insmod/rmmod V2: - Separate patch. - bindings: s/Since QDF2400 emac/Since emac/ Wang Dongsheng (4): net: qcom/emac: split phy_config to mdio bus create and get phy device dt-bindings: net: qcom: Add binding for shared mdio bus net: qcom/emac: add of shared mdio bus support net: qcom/emac: add acpi shared mdio bus support .../devicetree/bindings/net/qcom-emac.txt | 4 + drivers/net/ethernet/qualcomm/emac/emac-phy.c | 207 ++ drivers/net/ethernet/qualcomm/emac/emac.c | 8 +- 3 files changed, 175 insertions(+), 44 deletions(-) -- 2.18.0
[PATCH v2 1/4] net: qcom/emac: split phy_config to mdio bus create and get phy device
This patch separate emac_mdio_bus_create and emac_get_phydev from emac_phy_config, and do some codes clean. Signed-off-by: Wang Dongsheng --- drivers/net/ethernet/qualcomm/emac/emac-phy.c | 99 +++ 1 file changed, 57 insertions(+), 42 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c index 53dbf1e163a8..2d16c6b9ef49 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c @@ -96,15 +96,14 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val) return 0; } -/* Configure the MDIO bus and connect the external PHY */ -int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt) +static int emac_mdio_bus_create(struct platform_device *pdev, + struct emac_adapter *adpt) { struct device_node *np = pdev->dev.of_node; struct mii_bus *mii_bus; int ret; - /* Create the mii_bus object for talking to the MDIO bus */ - adpt->mii_bus = mii_bus = devm_mdiobus_alloc(&pdev->dev); + mii_bus = devm_mdiobus_alloc(&pdev->dev); if (!mii_bus) return -ENOMEM; @@ -115,50 +114,66 @@ int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt) mii_bus->parent = &pdev->dev; mii_bus->priv = adpt; - if (has_acpi_companion(&pdev->dev)) { - u32 phy_addr; - - ret = mdiobus_register(mii_bus); - if (ret) { - dev_err(&pdev->dev, "could not register mdio bus\n"); - return ret; - } - ret = device_property_read_u32(&pdev->dev, "phy-channel", - &phy_addr); - if (ret) - /* If we can't read a valid phy address, then assume -* that there is only one phy on this mdio bus. -*/ - adpt->phydev = phy_find_first(mii_bus); - else - adpt->phydev = mdiobus_get_phy(mii_bus, phy_addr); - - /* of_phy_find_device() claims a reference to the phydev, -* so we do that here manually as well. When the driver -* later unloads, it can unilaterally drop the reference -* without worrying about ACPI vs DT. 
-*/ - if (adpt->phydev) - get_device(&adpt->phydev->mdio.dev); - } else { - struct device_node *phy_np; + ret = of_mdiobus_register(mii_bus, has_acpi_companion(&pdev->dev) ? + NULL : np); + if (ret) { + dev_err(&pdev->dev, "Could not register mdio bus\n"); + return ret; + } - ret = of_mdiobus_register(mii_bus, np); - if (ret) { - dev_err(&pdev->dev, "could not register mdio bus\n"); - return ret; - } + adpt->mii_bus = mii_bus; + return 0; +} +static void emac_get_phydev(struct platform_device *pdev, + struct emac_adapter *adpt) +{ + struct device_node *np = pdev->dev.of_node; + struct mii_bus *bus = adpt->mii_bus; + struct device_node *phy_np; + u32 phy_addr; + int ret; + + if (!has_acpi_companion(&pdev->dev)) { phy_np = of_parse_phandle(np, "phy-handle", 0); adpt->phydev = of_phy_find_device(phy_np); of_node_put(phy_np); + return; } - if (!adpt->phydev) { - dev_err(&pdev->dev, "could not find external phy\n"); - mdiobus_unregister(mii_bus); - return -ENODEV; - } + ret = device_property_read_u32(&pdev->dev, "phy-channel", + &phy_addr); + if (ret) + /* If we can't read a valid phy address, then assume +* that there is only one phy on this mdio bus. +*/ + adpt->phydev = phy_find_first(bus); + else + adpt->phydev = mdiobus_get_phy(bus, phy_addr); + + /* of_phy_find_device() claims a reference to the phydev, +* so we do that here manually as well. When the driver +* later unloads, it can unilaterally drop the reference +* without worrying about ACPI vs DT. +*/ + if (adpt->phydev) + get_device(&adpt->phydev->mdio.dev); +} - return 0; +/* Configure the MDIO bus and connect the external PHY */ +int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt) +{ + int ret; + + ret = emac_mdio_bus_create(pdev, adpt); + if (ret) + return ret; + + emac_get_phydev(pdev, adpt); + if (adpt->phydev) + r
[PATCH v2 2/4] dt-bindings: net: qcom: Add binding for shared mdio bus
This property is copied from "ibm,emac.txt" to describe a shared MDIO bus. Since the EMAC includes an MDIO controller, if the motherboard has more than one PHY connected to an MDIO bus, this property will point to the MAC device that owns the MDIO bus. Signed-off-by: Wang Dongsheng --- V2: s/Since QDF2400 emac/Since emac/ --- Documentation/devicetree/bindings/net/qcom-emac.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt b/Documentation/devicetree/bindings/net/qcom-emac.txt index 346e6c7f47b7..50db71771358 100644 --- a/Documentation/devicetree/bindings/net/qcom-emac.txt +++ b/Documentation/devicetree/bindings/net/qcom-emac.txt @@ -24,6 +24,9 @@ Internal PHY node: The external phy child node: - reg : The phy address +Optional properties: +- mdio-device : Shared MDIO bus. + Example: FSM9900: @@ -86,6 +89,7 @@ soc { reg = <0x0 0x3880 0x0 0x1>, <0x0 0x38816000 0x0 0x1000>; interrupts = <0 256 4>; + mdio-device = <&emac1>; clocks = <&gcc 0>, <&gcc 1>, <&gcc 3>, <&gcc 4>, <&gcc 5>, <&gcc 6>, <&gcc 7>; -- 2.18.0
[PATCH v2 4/4] net: qcom/emac: add acpi shared mdio bus support
Parsing _DSD package "mdio-device". Signed-off-by: Wang Dongsheng --- drivers/net/ethernet/qualcomm/emac/emac-phy.c | 51 +++ 1 file changed, 51 insertions(+) diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c index 4f98f9a0ed54..7a96bcf15d3f 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c @@ -97,6 +97,51 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val) return 0; } +static int acpi_device_match(struct device *dev, void *fwnode) +{ + return dev->fwnode == fwnode; +} + +static int emac_acpi_get_shared_bus(struct platform_device *pdev, + struct mii_bus **bus) +{ + struct fwnode_reference_args args; + struct fwnode_handle *fw_node; + + struct device *shared_dev; + struct net_device *shared_netdev; + struct emac_adapter *shared_adpt; + int ret; + + if (!has_acpi_companion(&pdev->dev)) + return -ENODEV; + + fw_node = acpi_fwnode_handle(ACPI_COMPANION(&pdev->dev)); + ret = acpi_node_get_property_reference(fw_node, "mdio-device", 0, + &args); + if (ACPI_FAILURE(ret) || !is_acpi_device_node(args.fwnode)) { + dev_err(&pdev->dev, "Missing mdio-device property\n"); + return -ENODEV; + } + + shared_dev = bus_find_device(&platform_bus_type, NULL, +args.fwnode, +acpi_device_match); + if (!shared_dev) + return -EPROBE_DEFER; + + shared_netdev = dev_get_drvdata(shared_dev); + if (!shared_netdev) + return -EPROBE_DEFER; + + shared_adpt = netdev_priv(shared_netdev); + if (!shared_adpt->mii_bus) + return -EPROBE_DEFER; + + *bus = shared_adpt->mii_bus; + return 0; +} + static int emac_of_get_shared_bus(struct platform_device *pdev, struct mii_bus **bus) { @@ -137,6 +182,12 @@ static int emac_of_get_shared_bus(struct platform_device *pdev, static int __do_get_emac_mido_shared_bus(struct platform_device *pdev, struct emac_adapter *adpt) { + int ret; + + ret = emac_acpi_get_shared_bus(pdev, &adpt->mii_bus); + if (adpt->mii_bus || ret == 
-EPROBE_DEFER) + return ret; + return emac_of_get_shared_bus(pdev, &adpt->mii_bus); } -- 2.18.0
[PATCH v2 3/4] net: qcom/emac: add of shared mdio bus support
Share the mii_bus for others MAC device because EMAC include MDIO, and the motherboard has more than one PHY connected to an MDIO bus. Signed-off-by: Wang Dongsheng --- drivers/net/ethernet/qualcomm/emac/emac-phy.c | 63 ++- drivers/net/ethernet/qualcomm/emac/emac.c | 8 ++- 2 files changed, 66 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c index 2d16c6b9ef49..4f98f9a0ed54 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c @@ -13,6 +13,7 @@ /* Qualcomm Technologies, Inc. EMAC PHY Controller driver. */ +#include #include #include #include @@ -96,8 +97,51 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val) return 0; } -static int emac_mdio_bus_create(struct platform_device *pdev, - struct emac_adapter *adpt) +static int emac_of_get_shared_bus(struct platform_device *pdev, + struct mii_bus **bus) +{ + struct device_node *shared_node; + struct platform_device *shared_pdev; + struct net_device *shared_netdev; + struct emac_adapter *shared_adpt; + struct device_node *np = pdev->dev.of_node; + + const phandle *prop; + + prop = of_get_property(np, "mdio-device", NULL); + if (!prop) { + dev_err(&pdev->dev, "Missing mdio-device property\n"); + return -ENODEV; + } + + shared_node = of_find_node_by_phandle(*prop); + if (!shared_node) + return -ENODEV; + + shared_pdev = of_find_device_by_node(shared_node); + if (!shared_pdev) + return -ENODEV; + + shared_netdev = dev_get_drvdata(&shared_pdev->dev); + if (!shared_netdev) + return -EPROBE_DEFER; + + shared_adpt = netdev_priv(shared_netdev); + if (!shared_adpt->mii_bus) + return -EPROBE_DEFER; + + *bus = shared_adpt->mii_bus; + return 0; +} + +static int __do_get_emac_mido_shared_bus(struct platform_device *pdev, +struct emac_adapter *adpt) +{ + return emac_of_get_shared_bus(pdev, &adpt->mii_bus); +} + +static int __do_emac_mido_bus_create(struct 
platform_device *pdev, +struct emac_adapter *adpt) { struct device_node *np = pdev->dev.of_node; struct mii_bus *mii_bus; @@ -125,6 +169,17 @@ static int emac_mdio_bus_create(struct platform_device *pdev, return 0; } +static int emac_mdio_bus_create(struct platform_device *pdev, + struct emac_adapter *adpt) +{ + bool shared_mdio; + + shared_mdio = device_property_read_bool(&pdev->dev, "mdio-device"); + if (shared_mdio) + return __do_get_emac_mido_shared_bus(pdev, adpt); + return __do_emac_mido_bus_create(pdev, adpt); +} + static void emac_get_phydev(struct platform_device *pdev, struct emac_adapter *adpt) { @@ -174,6 +229,8 @@ int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt) return 0; dev_err(&pdev->dev, "Could not find external phy\n"); - mdiobus_unregister(adpt->mii_bus); + /* Only the bus creator can unregister mdio bus */ + if (&pdev->dev == adpt->mii_bus->parent) + mdiobus_unregister(adpt->mii_bus); return -ENODEV; } diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c index 2a0cbc535a2e..6e566b4c5a6b 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac.c +++ b/drivers/net/ethernet/qualcomm/emac/emac.c @@ -727,7 +727,9 @@ static int emac_probe(struct platform_device *pdev) netif_napi_del(&adpt->rx_q.napi); err_undo_mdiobus: put_device(&adpt->phydev->mdio.dev); - mdiobus_unregister(adpt->mii_bus); + /* Only the bus creator can unregister mdio bus */ + if (&pdev->dev == adpt->mii_bus->parent) + mdiobus_unregister(adpt->mii_bus); err_undo_clocks: emac_clks_teardown(adpt); err_undo_netdev: @@ -747,7 +749,9 @@ static int emac_remove(struct platform_device *pdev) emac_clks_teardown(adpt); put_device(&adpt->phydev->mdio.dev); - mdiobus_unregister(adpt->mii_bus); + /* Only the bus creator can unregister mdio bus */ + if (&pdev->dev == adpt->mii_bus->parent) + mdiobus_unregister(adpt->mii_bus); free_netdev(netdev); if (adpt->phy.digital) -- 2.18.0
Re: [PATCH net-next RFC 7/8] udp: gro behind static key
On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote: > From: Willem de Bruijn > > Avoid the socket lookup cost in udp_gro_receive if no socket has a > gro callback configured. > > Signed-off-by: Willem de Bruijn ... > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index 4f6aa95a9b12..f44fe328aa0f 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head > *head, > { > struct udphdr *uh = udp_gro_udphdr(skb); > > - if (unlikely(!uh)) > + if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key)) > goto flush; If you use udp_encap_needed_key to enable UDP GRO, then a UDP encapsulation socket will enable it too. Not sure if this is intentional. That said, enabling UDP GRO on a UDP encapsulation socket (ESP in UDP etc.) will fail badly as then encrypted ESP packets might be merged together. So we somehow should make sure that this does not happen. Anyway, this reminds me that we can support GRO for UDP encapsulation. It just requires separate GRO callbacks for the different encapsulation types.
Re: [PATCH RFC net-next 1/8] net: phy: Move linkmode helpers to somewhere public
On Fri, 14 Sep 2018 23:38:49 +0200 Andrew Lunn wrote: >phylink has some useful helpers to working with linkmode bitmaps. >Move them to there own header so other code can use them. > >Signed-off-by: Andrew Lunn Reviewed-by: Maxime Chevallier
Re: [PATCH RFC net-next 2/8] net: phy: Add phydev_warn()
On Fri, 14 Sep 2018 23:38:50 +0200 Andrew Lunn wrote: >Not all new style LINK_MODE bits can be converted into old style >SUPPORTED bits. We need to warn when such a conversion is attempted. >Add a helper for this. > >Signed-off-by: Andrew Lunn Reviewed-by: Maxime Chevallier
Re: [PATCH RFC net-next 3/8] net: phy: Add helper to convert MII ADV register to a linkmode
On Fri, 14 Sep 2018 23:38:51 +0200 Andrew Lunn wrote: >The phy_mii_ioctl can be used to write a value into the MII_ADVERTISE >register in the PHY. Since this changes the state of the PHY, we need >to make the same change to phydev->advertising. Add a helper which can >convert the register value to a linkmode. > >Signed-off-by: Andrew Lunn Reviewed-by: Maxime Chevallier
Re: [PATCH RFC net-next 4/8] net: phy: Add helper for advertise to lcl value
On Fri, 14 Sep 2018 23:38:52 +0200 Andrew Lunn wrote: >Add a helper to convert the local advertising to an LCL capabilities, >which is then used to resolve pause flow control settings. > >Signed-off-by: Andrew Lunn Reviewed-by: Maxime Chevallier
Re: [PATCH RFC net-next 5/8] net: phy: Add linkmode equivalents to some of the MII ethtool helpers
On Fri, 14 Sep 2018 23:38:53 +0200 Andrew Lunn wrote: >Add helpers which take a linkmode rather than a u32 ethtool for >advertising settings. > >Signed-off-by: Andrew Lunn Reviewed-by: Maxime Chevallier
Re: iproute2: Debian 9 No ELF support
On 09/17/2018 10:23 AM, Bo YU wrote: > Hello, > I have followed the instructions from: > > https://cilium.readthedocs.io/en/latest/bpf/#bpftool > > to test xdp program. > But i can not enable elf support. > > ./configure --prefix=/usr > ```output > TC schedulers > ATM no > > libc has setns: yes > SELinux support: no > ELF support: no > libmnl support: yes > Berkeley DB: yes > need for strlcpy: yes > libcap support: yes > ``` > And i have installed libelf-dev : > ```output > sudo apt show libelf-dev > Package: libelf-dev > Version: 0.168-1 > Priority: optional > Section: libdevel > Source: elfutils > Maintainer: Kurt Roeckx > Installed-Size: 353 kB > Depends: libelf1 (= 0.168-1) > Conflicts: libelfg0-dev > Homepage: https://sourceware.org/elfutils/ > Tag: devel::library, role::devel-lib > ``` > > And gcc version: > gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) > > uname -a: > Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 GNU/Linux > > Any help is appreciate. Debian's official iproute2 packaging build says 'libelf-dev' [0], and having libelf-dev installed should work ... [...] Build-Depends: bison, debhelper (>= 10~), flex, iptables-dev, libatm1-dev, libcap-dev, libdb-dev, libelf-dev, libmnl-dev, libselinux1-dev, linux-libc-dev, pkg-config, po-debconf, zlib1g-dev, [...] Did you ran into this one perhaps [1]? Do you have zlib1g-dev installed? [0] https://salsa.debian.org/debian/iproute2/blob/master/debian/control [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=885071
[PATCH v2 1/2] netlink: add NLA_REJECT policy type
From: Johannes Berg In some situations some netlink attributes may be used for output only (kernel->userspace) or may be reserved for future use. It's then helpful to be able to prevent userspace from using them in messages sent to the kernel, since they'd otherwise be ignored and any future will become impossible if this happens. Add NLA_REJECT to the policy which does nothing but reject (with EINVAL) validation of any messages containing this attribute. Allow for returning a specific extended ACK error message in the validation_data pointer. While at it clear up the documentation a bit - the NLA_BITFIELD32 documentation was added to the list of len field descriptions. Also, use NL_SET_BAD_ATTR() in one place where it's open-coded. The specific case I have in mind now is a shared nested attribute containing request/response data, and it would be pointless and potentially confusing to have userspace include response data in the messages that actually contain a request. Signed-off-by: Johannes Berg --- v2: preserve behaviour of overwriting the extack message, with either the generic or the specific one now --- include/net/netlink.h | 13 - lib/nlattr.c | 23 --- 2 files changed, 28 insertions(+), 8 deletions(-) diff --git a/include/net/netlink.h b/include/net/netlink.h index 0c154f98e987..b318b0a9f6c3 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -180,6 +180,7 @@ enum { NLA_S32, NLA_S64, NLA_BITFIELD32, + NLA_REJECT, __NLA_TYPE_MAX, }; @@ -208,9 +209,19 @@ enum { *NLA_MSECSLeaving the length field zero will verify the * given type fits, using it verifies minimum length * just like "All other" - *NLA_BITFIELD32 A 32-bit bitmap/bitselector attribute + *NLA_BITFIELD32 Unused + *NLA_REJECT Unused *All otherMinimum length of attribute payload * + * Meaning of `validation_data' field: + *NLA_BITFIELD32 This is a 32-bit bitmap/bitselector attribute and + * validation data must point to a u32 value of valid + * flags + *NLA_REJECT This attribute is always 
rejected and validation data + * may point to a string to report as the error instead + * of the generic one in extended ACK. + *All otherUnused + * * Example: * static const struct nla_policy my_policy[ATTR_MAX+1] = { * [ATTR_FOO] = { .type = NLA_U16 }, diff --git a/lib/nlattr.c b/lib/nlattr.c index e335bcafa9e4..36d74b079151 100644 --- a/lib/nlattr.c +++ b/lib/nlattr.c @@ -69,7 +69,8 @@ static int validate_nla_bitfield32(const struct nlattr *nla, } static int validate_nla(const struct nlattr *nla, int maxtype, - const struct nla_policy *policy) + const struct nla_policy *policy, + const char **error_msg) { const struct nla_policy *pt; int minlen = 0, attrlen = nla_len(nla), type = nla_type(nla); @@ -87,6 +88,11 @@ static int validate_nla(const struct nlattr *nla, int maxtype, } switch (pt->type) { + case NLA_REJECT: + if (pt->validation_data && error_msg) + *error_msg = pt->validation_data; + return -EINVAL; + case NLA_FLAG: if (attrlen > 0) return -ERANGE; @@ -180,11 +186,10 @@ int nla_validate(const struct nlattr *head, int len, int maxtype, int rem; nla_for_each_attr(nla, head, len, rem) { - int err = validate_nla(nla, maxtype, policy); + int err = validate_nla(nla, maxtype, policy, NULL); if (err < 0) { - if (extack) - extack->bad_attr = nla; + NL_SET_BAD_ATTR(extack, nla); return err; } } @@ -250,11 +255,15 @@ int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head, u16 type = nla_type(nla); if (type > 0 && type <= maxtype) { + static const char _msg[] = "Attribute failed policy validation"; + const char *msg = _msg; + if (policy) { - err = validate_nla(nla, maxtype, policy); + err = validate_nla(nla, maxtype, policy, &msg); if (err < 0) { - NL_SET_ERR_MSG_ATTR(extack, nla, - "Attribute failed policy validation"); + NL_SET_BAD_ATTR(extack, nla); + if (extack) + exta
[PATCH v2 2/2] netlink: add ethernet address policy types
From: Johannes Berg

Commonly, ethernet addresses are just using a policy of { .len = ETH_ALEN }, which leaves userspace free to send more data than it should, which may hide bugs.

Introduce NLA_EXACT_LEN which checks for exact size, rejecting the attribute if it's not exactly that length. Also add NLA_EXACT_LEN_WARN which requires the minimum length and will warn on longer attributes, for backward compatibility.

Use these to define NLA_POLICY_ETH_ADDR (new strict policy) and NLA_POLICY_ETH_ADDR_COMPAT (compatible policy with warning); these are used like this:

    static const struct nla_policy [...] = {
        [NL_ATTR_NAME] = NLA_POLICY_ETH_ADDR,
        ...
    };

Signed-off-by: Johannes Berg
---
v2: add only NLA_EXACT_LEN/NLA_EXACT_LEN_WARN and build on top of that for ethernet address validation, so it can be extended for other types (e.g. IPv6 addresses)
---
 include/net/netlink.h | 13 +++++++++++++
 lib/nlattr.c          |  8 +++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index b318b0a9f6c3..318b1ded3833 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -181,6 +181,8 @@ enum {
 	NLA_S64,
 	NLA_BITFIELD32,
 	NLA_REJECT,
+	NLA_EXACT_LEN,
+	NLA_EXACT_LEN_WARN,
 	__NLA_TYPE_MAX,
 };

@@ -211,6 +213,10 @@ enum {
  *                         just like "All other"
  *    NLA_BITFIELD32       Unused
  *    NLA_REJECT           Unused
+ *    NLA_EXACT_LEN        Attribute must have exactly this length, otherwise
+ *                         it is rejected.
+ *    NLA_EXACT_LEN_WARN   Attribute should have exactly this length, a warning
+ *                         is logged if it is longer, shorter is rejected.
  *    All other            Minimum length of attribute payload
  *
  * Meaning of `validation_data' field:
@@ -236,6 +242,13 @@ struct nla_policy {
 	void		*validation_data;
 };

+#define NLA_POLICY_EXACT_LEN(_len)	{ .type = NLA_EXACT_LEN, .len = _len }
+#define NLA_POLICY_EXACT_LEN_WARN(_len)	{ .type = NLA_EXACT_LEN_WARN, \
+					  .len = _len }
+
+#define NLA_POLICY_ETH_ADDR		NLA_POLICY_EXACT_LEN(ETH_ALEN)
+#define NLA_POLICY_ETH_ADDR_COMPAT	NLA_POLICY_EXACT_LEN_WARN(ETH_ALEN)
+
 /**
  * struct nl_info - netlink source information
  * @nlh: Netlink message header of original request
diff --git a/lib/nlattr.c b/lib/nlattr.c
index 36d74b079151..bb6fe5ed4ecf 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -82,12 +82,18 @@ static int validate_nla(const struct nlattr *nla, int maxtype,

 	BUG_ON(pt->type > NLA_TYPE_MAX);

-	if (nla_attr_len[pt->type] && attrlen != nla_attr_len[pt->type]) {
+	if ((nla_attr_len[pt->type] && attrlen != nla_attr_len[pt->type]) ||
+	    (pt->type == NLA_EXACT_LEN_WARN && attrlen != pt->len)) {
 		pr_warn_ratelimited("netlink: '%s': attribute type %d has an invalid length.\n",
 				    current->comm, type);
 	}

 	switch (pt->type) {
+	case NLA_EXACT_LEN:
+		if (attrlen != pt->len)
+			return -ERANGE;
+		break;
+
 	case NLA_REJECT:
 		if (pt->validation_data && error_msg)
 			*error_msg = pt->validation_data;
--
2.14.4
Re: [PATCH RFC net-next 0/8] Continue towards using linkmode in phylib
Hi Andrew, On Fri, 14 Sep 2018 23:38:48 +0200 Andrew Lunn wrote: >These patches contain some further cleanup and helpers, and the first >real patch towards using linkmode bitmaps in phylink. > >It is RFC because i don't like patch #7 and maybe somebody has a >better idea how to do this. Ideally, we want to initialise a linux >generic bitmap at compile time. Thanks for that series. I've reviewed what I feel confident enough to, I'll be happy to test the runtime "features" listing that I think you plan to implement. Maxime
Re: [RFC PATCH 3/4] udp: implement GRO plain UDP sockets.
Hi,

On Fri, 2018-09-14 at 09:48 -0700, Eric Dumazet wrote:
> Are you sure the data is actually fully copied to user space ?
>
> tools/testing/selftests/net/udpgso_bench_rx.c
>
> uses :
>
> static char rbuf[ETH_DATA_LEN];
> /* MSG_TRUNC will make return value full datagram length */
> ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
>
> So you need to change this program.

Thanks for the feedback. You are right, I need to update udpgso_bench_rx. Making it unconditionally read up to 64K bytes, I measure:

Before: udp rx:  962 MB/s  685339 calls/s
After:  udp rx: 1344 MB/s   22812 calls/s

Top perf offenders for udpgso_bench_rx:
  31.83%  [kernel]  [k] copy_user_enhanced_fast_string
   8.90%  [kernel]  [k] skb_release_data
   7.97%  [kernel]  [k] free_pcppages_bulk
   6.82%  [kernel]  [k] copy_page_to_iter
   3.41%  [kernel]  [k] skb_copy_datagram_iter
   2.01%  [kernel]  [k] free_unref_page
   1.92%  [kernel]  [k] __entry_SYSCALL_64_trampoline

Trivial note: with this even UDP sockets would benefit from remote skb freeing, as the cost of skb_release_data is relevant for the GSO packets.

> Also, GRO reception would mean that userspace can retrieve,
> not only full bytes of X datagrams, but also the gso_size (or length of
> individual datagrams)
>
> You can not know the size of the packets in advance, the sender will decide.

Thanks for pointing that out. I guess that implementing something like cmsg(UDP_SEGMENT) as Willem suggests in the 8/8 patch would do, right? I can have a look at that _if_ there is interest in this approach,

Cheers,

Paolo
Re: [RFC PATCH 2/4] net: enable UDP gro on demand.
On Sun, 2018-09-16 at 14:23 -0400, Willem de Bruijn wrote:
> That udp gro implementation is clearly less complete than yours in
> this patchset. The point I wanted to bring up for discussion is not the
> protocol implementation, but the infrastructure for enabling it
> conditionally.

I'm still [trying to] process your patchset ;) So please pardon me for any obvious interpretation mistakes...

> Assuming cycle cost is comparable, what do you think of using the
> existing sk offload callbacks to enable this on a per-socket basis?

I have no objection to that, if there are no performance drawbacks. In my measurements, retpoline cost is relevant for every indirect call added. Using the existing sk offload approach will require an additional indirect call per packet compared to the implementation here.

> As for the protocol-wide knob, I do strongly prefer something that can
> work for all protocols, not just UDP.

I like the general infrastructure idea. I think there is some agreement on avoiding the addition of more user-controllable knobs, as we already have a lot of them. If I read your patch correctly, user space needs to enable/disable UDP GRO explicitly via procfs, right? I tried to look for something that does not require user action.

> I also implemented a version that
> atomically swaps the struct ptr instead of the flag based approach I sent
> for review.

I'm fairly agnostic about that point. I think/fear security-oriented folks may scream at the somewhat large deconstification ?!?

> One subtle issue is that I
> believe we need to keep the gro_complete callbacks enabled, as gro
> packets may be queued for completion when gro_receive gets disabled.

Good point, thanks! I missed that.

Cheers,

Paolo
Re: [PATCH bpf-next] tools/bpf: bpftool: improve output format for bpftool net
On 09/14/2018 11:49 PM, Yonghong Song wrote: > This is a followup patch for Commit f6f3bac08ff9 > ("tools/bpf: bpftool: add net support"). > Some improvements are made for the bpftool net output. > Specially, plain output is more concise such that > per attachment should nicely fit in one line. > Compared to previous output, the prog tag is removed > since it can be easily obtained with program id. > Similar to xdp attachments, the device name is added > to tc_filters attachments. > > The bpf program attached through shared block > mechanism is supported as well. > $ ip link add dev v1 type veth peer name v2 > $ tc qdisc add dev v1 ingress_block 10 egress_block 20 clsact > $ tc qdisc add dev v2 ingress_block 10 egress_block 20 clsact > $ tc filter add block 10 protocol ip prio 25 bpf obj bpf_shared.o sec > ingress flowid 1:1 > $ tc filter add block 20 protocol ip prio 30 bpf obj bpf_cyclic.o sec > classifier flowid 1:1 > $ bpftool net > xdp [ > ] > tc_filters [ >v2(7) qdisc_clsact_ingress bpf_shared.o:[ingress] id 23 >v2(7) qdisc_clsact_egress bpf_cyclic.o:[classifier] id 24 >v1(8) qdisc_clsact_ingress bpf_shared.o:[ingress] id 23 >v1(8) qdisc_clsact_egress bpf_cyclic.o:[classifier] id 24 Just one minor note for this one here, do we even need the "qdisc_" prefix? Couldn't it just simply be "clsact/ingress", "clsact/egress", "htb" etc? > ] > > The documentation and "bpftool net help" are updated > to make it clear that current implementation only > supports xdp and tc attachments. For programs > attached to cgroups, "bpftool cgroup" can be used > to dump attachments. For other programs e.g. > sk_{filter,skb,msg,reuseport} and lwt/seg6, > iproute2 tools should be used. > > The new output: > $ bpftool net > xdp [ >eth0(2) id/drv 198 Could we change the "id/{drv,offload,generic} xyz" into e.g. "eth0(2) {driver,offload,generic} id 198", meaning, the "id xyz" being a child of either "driver", "offload" or "generic". 
Reason would be two-fold: i) we can keep the "id xyz" notion consistent as used under "tc_filters", and ii) it allows to put further information aside from just "id" member under "driver", "offload" or "generic" in the future. > ] > tc_filters [ Nit: can we use just "tc" for the above? Main use case would be clsact with one of its two hooks anyway, and the term "filter" is sort of tc historic; while being correct bpf progs would do much more than just filtering, and context is pretty clear anyway from qdisc that we subsequently dump. >eth0(2) qdisc_clsact_ingress fbflow_icmp id 335 act [{icmp_action id 336}] >eth0(2) qdisc_clsact_egress fbflow_egress id 334 > ] > $ bpftool -jp net > [{ > "xdp": [{ > "devname": "eth0", > "ifindex": 2, > "id/drv": 198 > } > ], > "tc_filters": [{ > "devname": "eth0", > "ifindex": 2, > "kind": "qdisc_clsact_ingress", > "name": "fbflow_icmp", > "id": 335, > "act": [{ > "name": "icmp_action", > "id": 336 > } > ] > },{ > "devname": "eth0", > "ifindex": 2, > "kind": "qdisc_clsact_egress", > "name": "fbflow_egress", > "id": 334 > } > ] > } > ] > > Signed-off-by: Yonghong Song
Re: [PATCH net-next RFC 7/8] udp: gro behind static key
On Fri, 2018-09-14 at 13:59 -0400, Willem de Bruijn wrote: > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index 4f6aa95a9b12..f44fe328aa0f 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head > *head, > { > struct udphdr *uh = udp_gro_udphdr(skb); > > - if (unlikely(!uh)) > + if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key)) > goto flush; > > /* Don't bother verifying checksum if we're going to flush anyway. */ If I read this correctly, once udp_encap_needed_key is enabled, it will never be turned off, because the tunnel and encap socket shut down does not cope with udp_encap_needed_key. Perhaps we should take care of that, too. Cheers, Paolo
[PATCH mlx5-next 1/4] net/mlx5: Rename incorrect naming in IFC file
From: Mark Bloch Remove a trailing underscore from the multicast/unicast names. Signed-off-by: Mark Bloch Reviewed-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/qp.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 2 +- include/linux/mlx5/mlx5_ifc.h | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 1f35ecbefffe..8bada4b9 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1279,7 +1279,7 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev, if (dev->rep) MLX5_SET(tirc, tirc, self_lb_block, -MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_); +MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST); err = mlx5_core_create_tir(dev->mdev, in, inlen, &rq->tirn); @@ -1582,7 +1582,7 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, create_tir: if (dev->rep) MLX5_SET(tirc, tirc, self_lb_block, -MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_); +MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST); err = mlx5_core_create_tir(dev->mdev, in, inlen, &qp->rss_qp.tirn); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index db3278cc052b..3078491cc0d0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -153,7 +153,7 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) if (enable_uc_lb) MLX5_SET(modify_tir_in, in, ctx.self_lb_block, -MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_); +MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST); MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1); diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 3a4a2e0567e9..4c7a1d25d73b 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -2559,8 +2559,8 @@ enum { }; enum { - MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_= 0x1, - 
MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST_ = 0x2, + MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST= 0x1, + MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST = 0x2, }; struct mlx5_ifc_tirc_bits { -- 2.14.4
[PATCH rdma-next 3/4] RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
From: Mark Bloch Expose two new flags: MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC Those flags can be used at creation time in order to allow a QP to be able to receive loopback traffic (unicast and multicast). We store the state in the QP to be used on the destroy path to indicate with which flags the QP was created with. Signed-off-by: Mark Bloch Reviewed-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +- drivers/infiniband/hw/mlx5/qp.c | 62 include/uapi/rdma/mlx5-abi.h | 2 ++ 3 files changed, 52 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 7b2af7e719c4..b258adb93097 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -440,7 +440,7 @@ struct mlx5_ib_qp { struct list_headcq_send_list; struct mlx5_rate_limit rl; u32 underlay_qpn; - booltunnel_offload_en; + u32 flags_en; /* storage for qp sub type when core qp type is IB_QPT_DRIVER */ enum ib_qp_type qp_sub_type; }; diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 8bada4b9..428e417e01da 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1258,8 +1258,9 @@ static bool tunnel_offload_supported(struct mlx5_core_dev *dev) static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_rq *rq, u32 tdn, - bool tunnel_offload_en) + u32 *qp_flags_en) { + u8 lb_flag = 0; u32 *in; void *tirc; int inlen; @@ -1274,12 +1275,21 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev, MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_DIRECT); MLX5_SET(tirc, tirc, inline_rqn, rq->base.mqp.qpn); MLX5_SET(tirc, tirc, transport_domain, tdn); - if (tunnel_offload_en) + if (*qp_flags_en & MLX5_QP_FLAG_TUNNEL_OFFLOADS) MLX5_SET(tirc, tirc, tunneled_offload_en, 1); - if (dev->rep) - MLX5_SET(tirc, tirc, self_lb_block, -MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST); 
+ if (*qp_flags_en & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC) + lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST; + + if (*qp_flags_en & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC) + lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST; + + if (dev->rep) { + lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST; + *qp_flags_en |= MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC; + } + + MLX5_SET(tirc, tirc, self_lb_block, lb_flag); err = mlx5_core_create_tir(dev->mdev, in, inlen, &rq->tirn); @@ -1332,8 +1342,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, goto err_destroy_sq; - err = create_raw_packet_qp_tir(dev, rq, tdn, - qp->tunnel_offload_en); + err = create_raw_packet_qp_tir(dev, rq, tdn, &qp->flags_en); if (err) goto err_destroy_rq; } @@ -1410,6 +1419,7 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, u32 tdn = mucontext->tdn; struct mlx5_ib_create_qp_rss ucmd = {}; size_t required_cmd_sz; + u8 lb_flag = 0; if (init_attr->qp_type != IB_QPT_RAW_PACKET) return -EOPNOTSUPP; @@ -1444,7 +1454,9 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, return -EOPNOTSUPP; } - if (ucmd.flags & ~MLX5_QP_FLAG_TUNNEL_OFFLOADS) { + if (ucmd.flags & ~(MLX5_QP_FLAG_TUNNEL_OFFLOADS | + MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC | + MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC)) { mlx5_ib_dbg(dev, "invalid flags\n"); return -EOPNOTSUPP; } @@ -1461,6 +1473,16 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, return -EOPNOTSUPP; } + if (ucmd.flags & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC || dev->rep) { + lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST; + qp->flags_en |= MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC; + } + + if (ucmd.flags & MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC) { + lb_flag |= MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST; + qp->flags_en |= MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC; + } + err = ib_copy_to_udata(udata, &resp, min(udata->outlen, sizeof(resp))); if (err) { mlx5_ib_dbg(dev, "copy failed\n"); @@ -
[PATCH rdma-next 4/4] RDMA/mlx5: Enable vport loopback when user context or QP mandate
From: Mark Bloch A user can create a QP which can accept loopback traffic, but that's not enough. We need to enable loopback on the vport as well. Currently vport loopback is enabled only when more than 1 users are using the IB device, update the logic to consider whatever a QP which supports loopback was created, if so enable vport loopback even if there is only a single user. Signed-off-by: Mark Bloch Reviewed-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/main.c| 40 +--- drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 drivers/infiniband/hw/mlx5/qp.c | 34 +++--- 3 files changed, 59 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index b64861ba2c42..7d5fcf76466f 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1571,28 +1571,44 @@ static void deallocate_uars(struct mlx5_ib_dev *dev, mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]); } -static int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev) +int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) { int err = 0; mutex_lock(&dev->lb.mutex); - dev->lb.user_td++; - - if (dev->lb.user_td == 2) - err = mlx5_nic_vport_update_local_lb(dev->mdev, true); + if (td) + dev->lb.user_td++; + if (qp) + dev->lb.qps++; + + if (dev->lb.user_td == 2 || + dev->lb.qps == 1) { + if (!dev->lb.enabled) { + err = mlx5_nic_vport_update_local_lb(dev->mdev, true); + dev->lb.enabled = true; + } + } mutex_unlock(&dev->lb.mutex); return err; } -static void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev) +void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) { mutex_lock(&dev->lb.mutex); - dev->lb.user_td--; - - if (dev->lb.user_td < 2) - mlx5_nic_vport_update_local_lb(dev->mdev, false); + if (td) + dev->lb.user_td--; + if (qp) + dev->lb.qps--; + + if (dev->lb.user_td == 1 && + dev->lb.qps == 0) { + if (dev->lb.enabled) { + mlx5_nic_vport_update_local_lb(dev->mdev, false); + dev->lb.enabled = false; 
+ } + } mutex_unlock(&dev->lb.mutex); } @@ -1613,7 +1629,7 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn) !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc))) return err; - return mlx5_ib_enable_lb(dev); + return mlx5_ib_enable_lb(dev, true, false); } static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn) @@ -1628,7 +1644,7 @@ static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn) !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc))) return; - mlx5_ib_disable_lb(dev); + mlx5_ib_disable_lb(dev, true, false); } static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index b258adb93097..99c853c56d31 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -882,6 +882,8 @@ struct mlx5_ib_lb_state { /* protect the user_td */ struct mutexmutex; u32 user_td; + int qps; + boolenabled; }; struct mlx5_ib_dev { @@ -1040,6 +1042,8 @@ int mlx5_ib_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *srq_attr); int mlx5_ib_destroy_srq(struct ib_srq *srq); int mlx5_ib_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr, const struct ib_recv_wr **bad_wr); +int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp); +void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp); struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *init_attr, struct ib_udata *udata); diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 428e417e01da..1f318a47040c 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1256,6 +1256,16 @@ static bool tunnel_offload_supported(struct mlx5_core_dev *dev) MLX5_CAP_ETH(dev, tunnel_stateless_geneve_rx)); } +static void destroy_raw_packet_qp_tir(struct mlx5_ib_dev *dev, + struct mlx5_ib_rq *rq, + u32 qp_flags_en) +{ + if (qp_flags_en & 
(MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC | + MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC)) +
[PATCH rdma-next 2/4] RDMA/mlx5: Refactor transport domain bookkeeping logic
From: Mark Bloch In preparation to enable loopback on a single user context move the logic that enables/disables loopback to separate functions and group variables under a single struct. Signed-off-by: Mark Bloch Reviewed-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/main.c| 45 +++- drivers/infiniband/hw/mlx5/mlx5_ib.h | 10 +--- 2 files changed, 36 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 659af370a961..b64861ba2c42 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1571,6 +1571,32 @@ static void deallocate_uars(struct mlx5_ib_dev *dev, mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]); } +static int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev) +{ + int err = 0; + + mutex_lock(&dev->lb.mutex); + dev->lb.user_td++; + + if (dev->lb.user_td == 2) + err = mlx5_nic_vport_update_local_lb(dev->mdev, true); + + mutex_unlock(&dev->lb.mutex); + + return err; +} + +static void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev) +{ + mutex_lock(&dev->lb.mutex); + dev->lb.user_td--; + + if (dev->lb.user_td < 2) + mlx5_nic_vport_update_local_lb(dev->mdev, false); + + mutex_unlock(&dev->lb.mutex); +} + static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn) { int err; @@ -1587,14 +1613,7 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn) !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc))) return err; - mutex_lock(&dev->lb_mutex); - dev->user_td++; - - if (dev->user_td == 2) - err = mlx5_nic_vport_update_local_lb(dev->mdev, true); - - mutex_unlock(&dev->lb_mutex); - return err; + return mlx5_ib_enable_lb(dev); } static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn) @@ -1609,13 +1628,7 @@ static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn) !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc))) return; - mutex_lock(&dev->lb_mutex); - dev->user_td--; - - 
if (dev->user_td < 2) - mlx5_nic_vport_update_local_lb(dev->mdev, false); - - mutex_unlock(&dev->lb_mutex); + mlx5_ib_disable_lb(dev); } static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, @@ -5970,7 +5983,7 @@ int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev) if ((MLX5_CAP_GEN(dev->mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) && (MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) || MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc))) - mutex_init(&dev->lb_mutex); + mutex_init(&dev->lb.mutex); return 0; } diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 6c57872fdc4e..7b2af7e719c4 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -878,6 +878,12 @@ to_mcounters(struct ib_counters *ibcntrs) int parse_flow_flow_action(struct mlx5_ib_flow_action *maction, bool is_egress, struct mlx5_flow_act *action); +struct mlx5_ib_lb_state { + /* protect the user_td */ + struct mutexmutex; + u32 user_td; +}; + struct mlx5_ib_dev { struct ib_deviceib_dev; const struct uverbs_object_tree_def *driver_trees[7]; @@ -919,9 +925,7 @@ struct mlx5_ib_dev { const struct mlx5_ib_profile*profile; struct mlx5_eswitch_rep *rep; - /* protect the user_td */ - struct mutexlb_mutex; - u32 user_td; + struct mlx5_ib_lb_state lb; u8 umr_fence; struct list_headib_dev_list; u64 sys_image_guid; -- 2.14.4
[PATCH rdma-next 0/4] mlx5 vport loopback
From: Leon Romanovsky

Hi,

This is a short series from Mark which extends the handling of loopback traffic. Originally, mlx5 IB dynamically enabled/disabled both unicast and multicast loopback based on the number of users. However, RAW ethernet QPs need more granular control.

Thanks

Mark Bloch (4):
  net/mlx5: Rename incorrect naming in IFC file
  RDMA/mlx5: Refactor transport domain bookkeeping logic
  RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
  RDMA/mlx5: Enable vport loopback when user context or QP mandate

 drivers/infiniband/hw/mlx5/main.c                  | 61 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h               | 16 +++-
 drivers/infiniband/hw/mlx5/qp.c                    | 96 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c    |  2 +-
 include/linux/mlx5/mlx5_ifc.h                      |  4 +-
 include/uapi/rdma/mlx5-abi.h                       |  2 +
 6 files changed, 138 insertions(+), 43 deletions(-)

--
2.14.4
Re: [PATCH net-next RFC 7/8] udp: gro behind static key
On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn
>
> Avoid the socket lookup cost in udp_gro_receive if no socket has a
> gro callback configured.

It would be nice if we could do GRO not just for GRO-configured sockets, but also for flows that are going to be IPsec transformed or directly forwarded. Maybe in case forwarding is enabled on the receiving device, inet_gro_receive() could do a route lookup and allow GRO if the route lookup returned a forwarding route.

For flows that are likely software segmented after that, it would be worthwhile to build packet chains instead of merging the payload. Packets of the same flow could travel together, but it would save the cost of the packet merging and segmenting. This could be done similarly to what I proposed for the list receive case:

https://www.spinics.net/lists/netdev/msg522706.html

How GRO should be done could even be configured by replacing the net_offload pointer, similar to what Paolo proposed in his patchset with the inet_update_offload() function.
[PATCH][net-next] veth: rename pcpu_vstats as pcpu_lstats
struct pcpu_vstats and struct pcpu_lstats have the same members and usage, and pcpu_lstats is used in many files, so rename pcpu_vstats to pcpu_lstats to remove the duplicate definition.

Signed-off-by: Zhang Yu
Signed-off-by: Li RongQing
---
 drivers/net/veth.c        | 22 ++++++++--------------
 include/linux/netdevice.h |  1 -
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index bc8faf13a731..aeecb5892e26 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -37,12 +37,6 @@
 #define VETH_XDP_TX	BIT(0)
 #define VETH_XDP_REDIR	BIT(1)

-struct pcpu_vstats {
-	u64	packets;
-	u64	bytes;
-	struct u64_stats_sync	syncp;
-};
-
 struct veth_rq {
 	struct napi_struct	xdp_napi;
 	struct net_device	*dev;
@@ -217,7 +211,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) {
-		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
+		struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);

 		u64_stats_update_begin(&stats->syncp);
 		stats->bytes += length;
@@ -236,7 +230,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }

-static u64 veth_stats_one(struct pcpu_vstats *result, struct net_device *dev)
+static u64 veth_stats_one(struct pcpu_lstats *result, struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
 	int cpu;
@@ -244,7 +238,7 @@ static u64 veth_stats_one(struct pcpu_lstats *result, struct net_device *dev)
 	result->packets = 0;
 	result->bytes = 0;
 	for_each_possible_cpu(cpu) {
-		struct pcpu_vstats *stats = per_cpu_ptr(dev->vstats, cpu);
+		struct pcpu_lstats *stats = per_cpu_ptr(dev->lstats, cpu);
 		u64 packets, bytes;
 		unsigned int start;
@@ -264,7 +258,7 @@ static void veth_get_stats64(struct net_device *dev,
 	struct veth_priv *priv = netdev_priv(dev);
 	struct net_device *peer;
-	struct pcpu_vstats one;
+	struct pcpu_lstats one;

 	tot->tx_dropped = veth_stats_one(&one, dev);
 	tot->tx_bytes = one.bytes;
@@ -830,13 +824,13 @@ static int veth_dev_init(struct net_device *dev)
 {
 	int err;

-	dev->vstats = netdev_alloc_pcpu_stats(struct pcpu_vstats);
-	if (!dev->vstats)
+	dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
+	if (!dev->lstats)
 		return -ENOMEM;

 	err = veth_alloc_queues(dev);
 	if (err) {
-		free_percpu(dev->vstats);
+		free_percpu(dev->lstats);
 		return err;
 	}

@@ -846,7 +840,7 @@ static int veth_dev_init(struct net_device *dev)
 static void veth_dev_free(struct net_device *dev)
 {
 	veth_free_queues(dev);
-	free_percpu(dev->vstats);
+	free_percpu(dev->lstats);
 }

 #ifdef CONFIG_NET_POLL_CONTROLLER
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index baed5d5088c5..1cbbf77a685f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2000,7 +2000,6 @@ struct net_device {
 	struct pcpu_lstats __percpu		*lstats;
 	struct pcpu_sw_netstats __percpu	*tstats;
 	struct pcpu_dstats __percpu		*dstats;
-	struct pcpu_vstats __percpu		*vstats;
 };

 #if IS_ENABLED(CONFIG_GARP)
--
2.16.2
[PATCH mlx5-next 03/25] net/mlx5: Set uid as part of RQ commands
From: Yishai Hadas Set uid as part of RQ commands so that the firmware can manage the RQ object in a secured way. That will enable using an RQ that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/qp.c | 16 ++-- include/linux/mlx5/mlx5_ifc.h| 6 +++--- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c index 04f72a1cdbcc..0ca68ef54d93 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c @@ -540,6 +540,17 @@ int mlx5_core_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn) } EXPORT_SYMBOL_GPL(mlx5_core_xrcd_dealloc); +static void destroy_rq_tracked(struct mlx5_core_dev *dev, u32 rqn, u16 uid) +{ + u32 in[MLX5_ST_SZ_DW(destroy_rq_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(destroy_rq_out)] = {0}; + + MLX5_SET(destroy_rq_in, in, opcode, MLX5_CMD_OP_DESTROY_RQ); + MLX5_SET(destroy_rq_in, in, rqn, rqn); + MLX5_SET(destroy_rq_in, in, uid, uid); + mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} + int mlx5_core_create_rq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen, struct mlx5_core_qp *rq) { @@ -550,6 +561,7 @@ int mlx5_core_create_rq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen, if (err) return err; + rq->uid = MLX5_GET(create_rq_in, in, uid); rq->qpn = rqn; err = create_resource_common(dev, rq, MLX5_RES_RQ); if (err) @@ -558,7 +570,7 @@ int mlx5_core_create_rq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen, return 0; err_destroy_rq: - mlx5_core_destroy_rq(dev, rq->qpn); + destroy_rq_tracked(dev, rq->qpn, rq->uid); return err; } @@ -568,7 +580,7 @@ void mlx5_core_destroy_rq_tracked(struct mlx5_core_dev *dev, struct mlx5_core_qp *rq) { destroy_resource_common(dev, rq); - mlx5_core_destroy_rq(dev, rq->qpn); + destroy_rq_tracked(dev, rq->qpn, rq->uid); } 
EXPORT_SYMBOL(mlx5_core_destroy_rq_tracked); diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index e5a0d3ecfaad..01b707666fb4 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -5489,7 +5489,7 @@ enum { struct mlx5_ifc_modify_rq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6165,7 +6165,7 @@ struct mlx5_ifc_destroy_rq_out_bits { struct mlx5_ifc_destroy_rq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6848,7 +6848,7 @@ struct mlx5_ifc_create_rq_out_bits { struct mlx5_ifc_create_rq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; -- 2.14.4
[PATCH rdma-next 00/25] Extend DEVX functionality
From: Leon Romanovsky

From Yishai,

This series extends the DEVX functionality to a wider scope. Specifically:
- It enables using kernel objects that were created by the verbs API in the DEVX flow.
- It enables white list commands without a DEVX user context.
- It enables the IB link layer under the CAP_NET_RAW capability.
- It exposes the PRM handles for a RAW QP (i.e. TIRN, TISN, RQN, SQN) so that they can later be used directly by the DEVX interface.

In general, each object that is created/destroyed/modified via verbs will be stamped with a UID based on its user context, as is already done for DEVX object commands. This enables the firmware to enforce the usage of kernel objects from the DEVX flow, by validating that the same UID is used and that the resources really belong to the same user. For example, if a CQ was created with verbs, it will be stamped with a UID; once it is referenced by a DEVX create QP command, the firmware will validate that the input CQN really belongs to the UID that issued the create QP command.

Accordingly, all the PRM objects (except the public ones that are managed by the kernel, e.g. FLOW) will carry a UID on their create/modify/destroy commands. The detection of UMEM vs. physical addresses in the relevant commands is done by the firmware according to a 'umem valid' bit, as the UID may be used in both cases.

The series also enables white list commands that don't require a specific DEVX context; instead, a device UID is used, so that the firmware will mask unprivileged functionality. The IB link layer is also enabled once the CAP_NET_RAW permission exists. To enable using the RAW QP underlay objects (e.g. TIRN, RQN) later on via DEVX commands, the UHW output for this case was extended to return this data when a DEVX context is used.
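The stamp-and-validate model described in the cover letter can be sketched as a self-contained illustration. This is a conceptual model of the firmware-side check, not mlx5 code; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: every object is stamped with the uid of the
 * context that created it, and a command referring to that object
 * is only accepted when it carries the same uid. */
struct fw_object {
	uint32_t id;
	uint16_t uid;	/* stamped at creation time */
};

static int fw_exec_cmd(const struct fw_object *obj, uint16_t cmd_uid)
{
	/* Firmware validates that the command's uid matches the uid
	 * the referenced object was created with. */
	return obj->uid == cmd_uid ? 0 : -1;	/* -1 stands in for -EPERM */
}
```

In the CQ/QP example above, `obj` would be the verbs-created CQ and `cmd_uid` the uid carried by the DEVX create QP command that references its CQN.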
Thanks

Leon Romanovsky (1):
  net/mlx5: Update mlx5_ifc with DEVX UID bits

Yishai Hadas (24):
  net/mlx5: Set uid as part of CQ commands
  net/mlx5: Set uid as part of QP commands
  net/mlx5: Set uid as part of RQ commands
  net/mlx5: Set uid as part of SQ commands
  net/mlx5: Set uid as part of SRQ commands
  net/mlx5: Set uid as part of DCT commands
  IB/mlx5: Set uid as part of CQ creation
  IB/mlx5: Set uid as part of QP creation
  IB/mlx5: Set uid as part of RQ commands
  IB/mlx5: Set uid as part of SQ commands
  IB/mlx5: Set uid as part of TIR commands
  IB/mlx5: Set uid as part of TIS commands
  IB/mlx5: Set uid as part of RQT commands
  IB/mlx5: Set uid as part of PD commands
  IB/mlx5: Set uid as part of TD commands
  IB/mlx5: Set uid as part of SRQ commands
  IB/mlx5: Set uid as part of DCT commands
  IB/mlx5: Set uid as part of XRCD commands
  IB/mlx5: Set uid as part of MCG commands
  IB/mlx5: Set valid umem bit on DEVX
  IB/mlx5: Expose RAW QP device handles to user space
  IB/mlx5: Manage device uid for DEVX white list commands
  IB/mlx5: Enable DEVX white list commands
  IB/mlx5: Enable DEVX on IB

 drivers/infiniband/hw/mlx5/cmd.c              | 129 ++
 drivers/infiniband/hw/mlx5/cmd.h              |  14 ++
 drivers/infiniband/hw/mlx5/cq.c               |   1 +
 drivers/infiniband/hw/mlx5/devx.c             | 182 +++---
 drivers/infiniband/hw/mlx5/main.c             |  80 +++
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  15 +--
 drivers/infiniband/hw/mlx5/qp.c               | 141 +++-
 drivers/infiniband/hw/mlx5/srq.c              |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/cq.c  |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  81
 drivers/net/ethernet/mellanox/mlx5/core/srq.c |  30 -
 include/linux/mlx5/cq.h                       |   1 +
 include/linux/mlx5/driver.h                   |   1 +
 include/linux/mlx5/mlx5_ifc.h                 | 135 +++
 include/linux/mlx5/qp.h                       |   1 +
 include/linux/mlx5/srq.h                      |   1 +
 include/uapi/rdma/mlx5-abi.h                  |  13 ++
 17 files changed, 657 insertions(+), 173 deletions(-)

-- 
2.14.4
[PATCH mlx5-next 01/25] net/mlx5: Set uid as part of CQ commands
From: Yishai Hadas Set uid as part of CQ commands so that the firmware can manage the CQ object in a secured way. This will enable using a CQ that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/cq.c | 4 include/linux/mlx5/cq.h | 1 + include/linux/mlx5/mlx5_ifc.h| 6 +++--- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c index a4179122a279..4b85abb5c9f7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c @@ -109,6 +109,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, cq->cons_index = 0; cq->arm_sn = 0; cq->eq = eq; + cq->uid = MLX5_GET(create_cq_in, in, uid); refcount_set(&cq->refcount, 1); init_completion(&cq->free); if (!cq->comp) @@ -144,6 +145,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, memset(dout, 0, sizeof(dout)); MLX5_SET(destroy_cq_in, din, opcode, MLX5_CMD_OP_DESTROY_CQ); MLX5_SET(destroy_cq_in, din, cqn, cq->cqn); + MLX5_SET(destroy_cq_in, din, uid, cq->uid); mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout)); return err; } @@ -165,6 +167,7 @@ int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq) MLX5_SET(destroy_cq_in, in, opcode, MLX5_CMD_OP_DESTROY_CQ); MLX5_SET(destroy_cq_in, in, cqn, cq->cqn); + MLX5_SET(destroy_cq_in, in, uid, cq->uid); err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); if (err) return err; @@ -196,6 +199,7 @@ int mlx5_core_modify_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, u32 out[MLX5_ST_SZ_DW(modify_cq_out)] = {0}; MLX5_SET(modify_cq_in, in, opcode, MLX5_CMD_OP_MODIFY_CQ); + MLX5_SET(modify_cq_in, in, uid, cq->uid); return mlx5_cmd_exec(dev, in, inlen, out, sizeof(out)); } EXPORT_SYMBOL(mlx5_core_modify_cq); diff 
--git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h index 0ef6138eca49..31a750570c38 100644 --- a/include/linux/mlx5/cq.h +++ b/include/linux/mlx5/cq.h @@ -61,6 +61,7 @@ struct mlx5_core_cq { int reset_notify_added; struct list_headreset_notify; struct mlx5_eq *eq; + u16 uid; }; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index a14c4eaff53f..e62a0825d35c 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -5630,7 +5630,7 @@ enum { struct mlx5_ifc_modify_cq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6405,7 +6405,7 @@ struct mlx5_ifc_destroy_cq_out_bits { struct mlx5_ifc_destroy_cq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -7165,7 +7165,7 @@ struct mlx5_ifc_create_cq_out_bits { struct mlx5_ifc_create_cq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; -- 2.14.4
[PATCH mlx5-next 02/25] net/mlx5: Set uid as part of QP commands
From: Yishai Hadas Set uid as part of QP commands so that the firmware can manage the QP object in a secured way. That will enable using a QP that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/qp.c | 45 +--- include/linux/mlx5/mlx5_ifc.h| 22 +++--- include/linux/mlx5/qp.h | 1 + 3 files changed, 39 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c index 4ca07bfb6b14..04f72a1cdbcc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c @@ -240,6 +240,7 @@ int mlx5_core_create_qp(struct mlx5_core_dev *dev, if (err) return err; + qp->uid = MLX5_GET(create_qp_in, in, uid); qp->qpn = MLX5_GET(create_qp_out, out, qpn); mlx5_core_dbg(dev, "qpn = 0x%x\n", qp->qpn); @@ -261,6 +262,7 @@ int mlx5_core_create_qp(struct mlx5_core_dev *dev, memset(dout, 0, sizeof(dout)); MLX5_SET(destroy_qp_in, din, opcode, MLX5_CMD_OP_DESTROY_QP); MLX5_SET(destroy_qp_in, din, qpn, qp->qpn); + MLX5_SET(destroy_qp_in, din, uid, qp->uid); mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout)); return err; } @@ -320,6 +322,7 @@ int mlx5_core_destroy_qp(struct mlx5_core_dev *dev, MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP); MLX5_SET(destroy_qp_in, in, qpn, qp->qpn); + MLX5_SET(destroy_qp_in, in, uid, qp->uid); err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); if (err) return err; @@ -373,7 +376,7 @@ static void mbox_free(struct mbox_info *mbox) static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn, u32 opt_param_mask, void *qpc, - struct mbox_info *mbox) + struct mbox_info *mbox, u16 uid) { mbox->out = NULL; mbox->in = NULL; @@ -381,26 +384,32 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn, #define MBOX_ALLOC(mbox, typ) \ 
mbox_alloc(mbox, MLX5_ST_SZ_BYTES(typ##_in), MLX5_ST_SZ_BYTES(typ##_out)) -#define MOD_QP_IN_SET(typ, in, _opcode, _qpn) \ - MLX5_SET(typ##_in, in, opcode, _opcode); \ - MLX5_SET(typ##_in, in, qpn, _qpn) - -#define MOD_QP_IN_SET_QPC(typ, in, _opcode, _qpn, _opt_p, _qpc) \ - MOD_QP_IN_SET(typ, in, _opcode, _qpn); \ - MLX5_SET(typ##_in, in, opt_param_mask, _opt_p); \ - memcpy(MLX5_ADDR_OF(typ##_in, in, qpc), _qpc, MLX5_ST_SZ_BYTES(qpc)) +#define MOD_QP_IN_SET(typ, in, _opcode, _qpn, _uid) \ + do { \ + MLX5_SET(typ##_in, in, opcode, _opcode); \ + MLX5_SET(typ##_in, in, qpn, _qpn); \ + MLX5_SET(typ##_in, in, uid, _uid); \ + } while (0) + +#define MOD_QP_IN_SET_QPC(typ, in, _opcode, _qpn, _opt_p, _qpc, _uid) \ + do { \ + MOD_QP_IN_SET(typ, in, _opcode, _qpn, _uid); \ + MLX5_SET(typ##_in, in, opt_param_mask, _opt_p); \ + memcpy(MLX5_ADDR_OF(typ##_in, in, qpc), \ + _qpc, MLX5_ST_SZ_BYTES(qpc)); \ + } while (0) switch (opcode) { /* 2RST & 2ERR */ case MLX5_CMD_OP_2RST_QP: if (MBOX_ALLOC(mbox, qp_2rst)) return -ENOMEM; - MOD_QP_IN_SET(qp_2rst, mbox->in, opcode, qpn); + MOD_QP_IN_SET(qp_2rst, mbox->in, opcode, qpn, uid); break; case MLX5_CMD_OP_2ERR_QP: if (MBOX_ALLOC(mbox, qp_2err)) return -ENOMEM; - MOD_QP_IN_SET(qp_2err, mbox->in, opcode, qpn); + MOD_QP_IN_SET(qp_2err, mbox->in, opcode, qpn, uid); break; /* MODIFY with QPC */ @@ -408,37 +417,37 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn, if (MBOX_ALLOC(mbox, rst2init_qp)) return -ENOMEM; MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn, - opt_param_mask, qpc); + opt_param_mask, qpc, uid); break; case MLX5_CMD_OP_INIT2RTR_QP: if (MBOX_ALLOC(mbox, init2rtr_qp)) return -ENOMEM; MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn, - opt_param_mask, qpc); + opt_param_mask, qpc, uid); break; case MLX5_CMD_OP_RTR2RTS_QP: if (MBOX_ALLOC(mbox, rtr2rts_qp)) return -ENOMEM; MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn, -
[PATCH mlx5-next 06/25] net/mlx5: Set uid as part of DCT commands
From: Yishai Hadas Set uid as part of DCT commands so that the firmware can manage the DCT object in a secured way. That will enable using a DCT that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/qp.c | 4 include/linux/mlx5/mlx5_ifc.h| 6 +++--- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c index 9bdb3dc425ce..30f0e5ea7b2c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c @@ -211,6 +211,7 @@ int mlx5_core_create_dct(struct mlx5_core_dev *dev, } qp->qpn = MLX5_GET(create_dct_out, out, dctn); + qp->uid = MLX5_GET(create_dct_in, in, uid); err = create_resource_common(dev, qp, MLX5_RES_DCT); if (err) goto err_cmd; @@ -219,6 +220,7 @@ int mlx5_core_create_dct(struct mlx5_core_dev *dev, err_cmd: MLX5_SET(destroy_dct_in, din, opcode, MLX5_CMD_OP_DESTROY_DCT); MLX5_SET(destroy_dct_in, din, dctn, qp->qpn); + MLX5_SET(destroy_dct_in, din, uid, qp->uid); mlx5_cmd_exec(dev, (void *)&in, sizeof(din), (void *)&out, sizeof(dout)); return err; @@ -277,6 +279,7 @@ static int mlx5_core_drain_dct(struct mlx5_core_dev *dev, MLX5_SET(drain_dct_in, in, opcode, MLX5_CMD_OP_DRAIN_DCT); MLX5_SET(drain_dct_in, in, dctn, qp->qpn); + MLX5_SET(drain_dct_in, in, uid, qp->uid); return mlx5_cmd_exec(dev, (void *)&in, sizeof(in), (void *)&out, sizeof(out)); } @@ -303,6 +306,7 @@ int mlx5_core_destroy_dct(struct mlx5_core_dev *dev, destroy_resource_common(dev, &dct->mqp); MLX5_SET(destroy_dct_in, in, opcode, MLX5_CMD_OP_DESTROY_DCT); MLX5_SET(destroy_dct_in, in, dctn, qp->qpn); + MLX5_SET(destroy_dct_in, in, uid, qp->uid); err = mlx5_cmd_exec(dev, (void *)&in, sizeof(in), (void *)&out, sizeof(out)); return err; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 
5a2f0b02483a..efa4a60431d4 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -5919,7 +5919,7 @@ struct mlx5_ifc_drain_dct_out_bits { struct mlx5_ifc_drain_dct_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6383,7 +6383,7 @@ struct mlx5_ifc_destroy_dct_out_bits { struct mlx5_ifc_destroy_dct_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -7139,7 +7139,7 @@ struct mlx5_ifc_create_dct_out_bits { struct mlx5_ifc_create_dct_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; -- 2.14.4
[PATCH rdma-next 10/25] IB/mlx5: Set uid as part of RQ commands
From: Yishai Hadas Set uid as part of RQ commands so that the firmware can manage the RQ object in a secured way. The uid for the destroy command is set by mlx5_core. This will enable using an RQ that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/qp.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 786db05dfb91..31c69da7ccdf 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1190,7 +1190,7 @@ static size_t get_rq_pas_size(void *qpc) static int create_raw_packet_qp_rq(struct mlx5_ib_dev *dev, struct mlx5_ib_rq *rq, void *qpin, - size_t qpinlen) + size_t qpinlen, u16 uid) { struct mlx5_ib_qp *mqp = rq->base.container_mibqp; __be64 *pas; @@ -1211,6 +1211,7 @@ static int create_raw_packet_qp_rq(struct mlx5_ib_dev *dev, if (!in) return -ENOMEM; + MLX5_SET(create_rq_in, in, uid, uid); rqc = MLX5_ADDR_OF(create_rq_in, in, ctx); if (!(rq->flags & MLX5_IB_RQ_CVLAN_STRIPPING)) MLX5_SET(rqc, rqc, vsd, 1); @@ -1328,6 +1329,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, struct mlx5_ib_ucontext *mucontext = to_mucontext(ucontext); int err; u32 tdn = mucontext->tdn; + u16 uid = mucontext->devx_uid; if (qp->sq.wqe_cnt) { err = create_raw_packet_qp_tis(dev, qp, sq, tdn); @@ -1349,7 +1351,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, rq->flags |= MLX5_IB_RQ_CVLAN_STRIPPING; if (qp->flags & MLX5_IB_QP_PCI_WRITE_END_PADDING) rq->flags |= MLX5_IB_RQ_PCI_WRITE_END_PADDING; - err = create_raw_packet_qp_rq(dev, rq, in, inlen); + err = create_raw_packet_qp_rq(dev, rq, in, inlen, uid); if (err) goto err_destroy_sq; @@ -2840,7 +2842,8 @@ static int ib_mask_to_mlx5_opt(int ib_mask) static int modify_raw_packet_qp_rq(struct mlx5_ib_dev *dev, struct mlx5_ib_rq 
*rq, int new_state, - const struct mlx5_modify_raw_qp_param *raw_qp_param) + const struct mlx5_modify_raw_qp_param *raw_qp_param, + u16 uid) { void *in; void *rqc; @@ -2853,6 +2856,7 @@ static int modify_raw_packet_qp_rq(struct mlx5_ib_dev *dev, return -ENOMEM; MLX5_SET(modify_rq_in, in, rq_state, rq->state); + MLX5_SET(modify_rq_in, in, uid, uid); rqc = MLX5_ADDR_OF(modify_rq_in, in, ctx); MLX5_SET(rqc, rqc, state, new_state); @@ -2957,6 +2961,7 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, u8 tx_affinity) { struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp; + u16 uid = to_mucontext(qp->ibqp.uobject->context)->devx_uid; struct mlx5_ib_rq *rq = &raw_packet_qp->rq; struct mlx5_ib_sq *sq = &raw_packet_qp->sq; int modify_rq = !!qp->rq.wqe_cnt; @@ -3000,7 +3005,8 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, } if (modify_rq) { - err = modify_raw_packet_qp_rq(dev, rq, rq_state, raw_qp_param); + err = modify_raw_packet_qp_rq(dev, rq, rq_state, raw_qp_param, + uid); if (err) return err; } @@ -5407,6 +5413,8 @@ static int create_rq(struct mlx5_ib_rwq *rwq, struct ib_pd *pd, if (!in) return -ENOMEM; + MLX5_SET(create_rq_in, in, uid, +to_mucontext(pd->uobject->context)->devx_uid); rqc = MLX5_ADDR_OF(create_rq_in, in, ctx); MLX5_SET(rqc, rqc, mem_rq_type, MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE); @@ -5792,6 +5800,8 @@ int mlx5_ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr, if (wq_state == IB_WQS_ERR) wq_state = MLX5_RQC_STATE_ERR; MLX5_SET(modify_rq_in, in, rq_state, curr_wq_state); + MLX5_SET(modify_rq_in, in, uid, +to_mucontext(wq->uobject->context)->devx_uid); MLX5_SET(rqc, rqc, state, wq_state); if (wq_attr_mask & IB_WQ_FLAGS) { -- 2.14.4
[PATCH rdma-next 09/25] IB/mlx5: Set uid as part of QP creation
From: Yishai Hadas

Set uid as part of QP creation so that the firmware can manage the
QP object in a secure way. The uid for the destroy and the modify
commands is set by mlx5_core.

This will enable using a QP that was created by a verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/qp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index e7ebe50ffdb5..786db05dfb91 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -850,6 +850,7 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 		goto err_umem;
 	}
 
+	MLX5_SET(create_qp_in, *in, uid, context->devx_uid);
 	pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, *in, pas);
 	if (ubuffer->umem)
 		mlx5_ib_populate_pas(dev, ubuffer->umem, page_shift, pas, 0);
-- 
2.14.4
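The IB/mlx5 patches in this series all apply the same uid-selection rule when building commands: objects created on behalf of a user context are stamped with that context's `devx_uid`, while kernel-owned objects keep uid 0. A minimal sketch of that rule, with illustrative names (not the mlx5_ib types):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the user context carrying a DEVX uid;
 * devx_uid is 0 when the context has no DEVX uid. */
struct user_ctx {
	uint16_t devx_uid;
};

static uint16_t object_uid(const struct user_ctx *ctx)
{
	/* Kernel-owned objects have no user context, so they keep uid 0
	 * and the firmware treats them as unrestricted. */
	return ctx ? ctx->devx_uid : 0;
}
```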
[PATCH rdma-next 08/25] IB/mlx5: Set uid as part of CQ creation
From: Yishai Hadas

Set uid as part of CQ creation so that the firmware can manage the
CQ object in a secure way. The uid for the destroy and the modify
commands is set by mlx5_core.

This will enable using a CQ that was created by a verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/cq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 495fa6e651ea..a41519dc8d3a 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -877,6 +877,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 		cq->private_flags |= MLX5_IB_CQ_PR_FLAGS_CQE_128_PAD;
 	}
 
+	MLX5_SET(create_cq_in, *cqb, uid, to_mucontext(context)->devx_uid);
 	return 0;
 
 err_cqb:
-- 
2.14.4
[PATCH mlx5-next 05/25] net/mlx5: Set uid as part of SRQ commands
From: Yishai Hadas Set uid as part of SRQ commands so that the firmware can manage the SRQ object in a secured way. That will enable using an SRQ that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/srq.c | 30 --- include/linux/mlx5/driver.h | 1 + include/linux/mlx5/mlx5_ifc.h | 22 ++-- include/linux/mlx5/srq.h | 1 + 4 files changed, 40 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/srq.c b/drivers/net/ethernet/mellanox/mlx5/core/srq.c index 23cc337a96c9..216d44ad061a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/srq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/srq.c @@ -166,6 +166,7 @@ static int create_srq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, if (!create_in) return -ENOMEM; + MLX5_SET(create_srq_in, create_in, uid, in->uid); srqc = MLX5_ADDR_OF(create_srq_in, create_in, srq_context_entry); pas = MLX5_ADDR_OF(create_srq_in, create_in, pas); @@ -178,8 +179,10 @@ static int create_srq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, err = mlx5_cmd_exec(dev, create_in, inlen, create_out, sizeof(create_out)); kvfree(create_in); - if (!err) + if (!err) { srq->srqn = MLX5_GET(create_srq_out, create_out, srqn); + srq->uid = in->uid; + } return err; } @@ -193,6 +196,7 @@ static int destroy_srq_cmd(struct mlx5_core_dev *dev, MLX5_SET(destroy_srq_in, srq_in, opcode, MLX5_CMD_OP_DESTROY_SRQ); MLX5_SET(destroy_srq_in, srq_in, srqn, srq->srqn); + MLX5_SET(destroy_srq_in, srq_in, uid, srq->uid); return mlx5_cmd_exec(dev, srq_in, sizeof(srq_in), srq_out, sizeof(srq_out)); @@ -208,6 +212,7 @@ static int arm_srq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, MLX5_SET(arm_rq_in, srq_in, op_mod, MLX5_ARM_RQ_IN_OP_MOD_SRQ); MLX5_SET(arm_rq_in, srq_in, srq_number, srq->srqn); MLX5_SET(arm_rq_in, srq_in, lwm, lwm); + MLX5_SET(arm_rq_in, srq_in, uid, srq->uid); 
return mlx5_cmd_exec(dev, srq_in, sizeof(srq_in), srq_out, sizeof(srq_out)); @@ -260,6 +265,7 @@ static int create_xrc_srq_cmd(struct mlx5_core_dev *dev, if (!create_in) return -ENOMEM; + MLX5_SET(create_xrc_srq_in, create_in, uid, in->uid); xrc_srqc = MLX5_ADDR_OF(create_xrc_srq_in, create_in, xrc_srq_context_entry); pas = MLX5_ADDR_OF(create_xrc_srq_in, create_in, pas); @@ -277,6 +283,7 @@ static int create_xrc_srq_cmd(struct mlx5_core_dev *dev, goto out; srq->srqn = MLX5_GET(create_xrc_srq_out, create_out, xrc_srqn); + srq->uid = in->uid; out: kvfree(create_in); return err; @@ -291,6 +298,7 @@ static int destroy_xrc_srq_cmd(struct mlx5_core_dev *dev, MLX5_SET(destroy_xrc_srq_in, xrcsrq_in, opcode, MLX5_CMD_OP_DESTROY_XRC_SRQ); MLX5_SET(destroy_xrc_srq_in, xrcsrq_in, xrc_srqn, srq->srqn); + MLX5_SET(destroy_xrc_srq_in, xrcsrq_in, uid, srq->uid); return mlx5_cmd_exec(dev, xrcsrq_in, sizeof(xrcsrq_in), xrcsrq_out, sizeof(xrcsrq_out)); @@ -306,6 +314,7 @@ static int arm_xrc_srq_cmd(struct mlx5_core_dev *dev, MLX5_SET(arm_xrc_srq_in, xrcsrq_in, op_mod, MLX5_ARM_XRC_SRQ_IN_OP_MOD_XRC_SRQ); MLX5_SET(arm_xrc_srq_in, xrcsrq_in, xrc_srqn, srq->srqn); MLX5_SET(arm_xrc_srq_in, xrcsrq_in, lwm, lwm); + MLX5_SET(arm_xrc_srq_in, xrcsrq_in, uid, srq->uid); return mlx5_cmd_exec(dev, xrcsrq_in, sizeof(xrcsrq_in), xrcsrq_out, sizeof(xrcsrq_out)); @@ -365,10 +374,13 @@ static int create_rmp_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, wq = MLX5_ADDR_OF(rmpc, rmpc, wq); MLX5_SET(rmpc, rmpc, state, MLX5_RMPC_STATE_RDY); + MLX5_SET(create_rmp_in, create_in, uid, in->uid); set_wq(wq, in); memcpy(MLX5_ADDR_OF(rmpc, rmpc, wq.pas), in->pas, pas_size); err = mlx5_core_create_rmp(dev, create_in, inlen, &srq->srqn); + if (!err) + srq->uid = in->uid; kvfree(create_in); return err; @@ -377,7 +389,13 @@ static int create_rmp_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, static int destroy_rmp_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq) { - return 
mlx5_core_destroy_rmp(dev, srq->srqn); + u32 in[MLX5_ST_SZ_DW(destroy_rmp_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(destroy_rmp_out)] = {0}; + + MLX5_SET(destro
[PATCH rdma-next 12/25] IB/mlx5: Set uid as part of TIR commands
From: Yishai Hadas Set uid as part of TIR commands so that the firmware can manage the TIR object in a secured way. That will enable using a TIR that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/cmd.c | 11 +++ drivers/infiniband/hw/mlx5/cmd.h | 1 + drivers/infiniband/hw/mlx5/qp.c | 24 3 files changed, 28 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c index c84fef9a8a08..e150ae44e06a 100644 --- a/drivers/infiniband/hw/mlx5/cmd.c +++ b/drivers/infiniband/hw/mlx5/cmd.c @@ -197,3 +197,14 @@ int mlx5_cmd_query_ext_ppcnt_counters(struct mlx5_core_dev *dev, void *out) return mlx5_core_access_reg(dev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0); } + +void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid) +{ + u32 in[MLX5_ST_SZ_DW(destroy_tir_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(destroy_tir_out)] = {0}; + + MLX5_SET(destroy_tir_in, in, opcode, MLX5_CMD_OP_DESTROY_TIR); + MLX5_SET(destroy_tir_in, in, tirn, tirn); + MLX5_SET(destroy_tir_in, in, uid, uid); + mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h index 88cbb1c41703..274090a38c4b 100644 --- a/drivers/infiniband/hw/mlx5/cmd.h +++ b/drivers/infiniband/hw/mlx5/cmd.h @@ -47,4 +47,5 @@ int mlx5_cmd_modify_cong_params(struct mlx5_core_dev *mdev, int mlx5_cmd_alloc_memic(struct mlx5_memic *memic, phys_addr_t *addr, u64 length, u32 alignment); int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 addr, u64 length); +void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid); #endif /* MLX5_IB_CMD_H */ diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 24370635008e..07bf5128bee4 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -37,6 +37,7 @@ #include 
#include "mlx5_ib.h" #include "ib_rep.h" +#include "cmd.h" /* not supported currently */ static int wq_signature; @@ -1262,17 +1263,19 @@ static bool tunnel_offload_supported(struct mlx5_core_dev *dev) static void destroy_raw_packet_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_rq *rq, - u32 qp_flags_en) + u32 qp_flags_en, + u16 uid) { if (qp_flags_en & (MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC | MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC)) mlx5_ib_disable_lb(dev, false, true); - mlx5_core_destroy_tir(dev->mdev, rq->tirn); + mlx5_cmd_destroy_tir(dev->mdev, rq->tirn, uid); } static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_rq *rq, u32 tdn, - u32 *qp_flags_en) + u32 *qp_flags_en, + u16 uid) { u8 lb_flag = 0; u32 *in; @@ -1285,6 +1288,7 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev, if (!in) return -ENOMEM; + MLX5_SET(create_tir_in, in, uid, uid); tirc = MLX5_ADDR_OF(create_tir_in, in, ctx); MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_DIRECT); MLX5_SET(tirc, tirc, inline_rqn, rq->base.mqp.qpn); @@ -1311,7 +1315,7 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev, err = mlx5_ib_enable_lb(dev, false, true); if (err) - destroy_raw_packet_qp_tir(dev, rq, 0); + destroy_raw_packet_qp_tir(dev, rq, 0, uid); } kvfree(in); @@ -1356,8 +1360,8 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, if (err) goto err_destroy_sq; - - err = create_raw_packet_qp_tir(dev, rq, tdn, &qp->flags_en); + err = create_raw_packet_qp_tir(dev, rq, tdn, &qp->flags_en, + uid); if (err) goto err_destroy_rq; } @@ -1385,9 +1389,10 @@ static void destroy_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp; struct mlx5_ib_sq *sq = &raw_packet_qp->sq; struct mlx5_ib_rq *rq = &raw_packet_qp->rq; + u16 uid = to_mucontext(qp->ibqp.uobject->context)->devx_uid; if (qp->rq.wqe_cnt) { - destroy_raw_packet_qp_tir(dev, rq, qp->flags_en); + destroy_raw_packet_qp_tir(dev, rq, 
qp->flags_en, uid); destroy_raw_packet_qp_rq(dev, rq); }
[PATCH mlx5-next 04/25] net/mlx5: Set uid as part of SQ commands
From: Yishai Hadas Set uid as part of SQ commands so that the firmware can manage the SQ object in a secured way. That will enable using an SQ that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/net/ethernet/mellanox/mlx5/core/qp.c | 16 ++-- include/linux/mlx5/mlx5_ifc.h| 6 +++--- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c index 0ca68ef54d93..9bdb3dc425ce 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c @@ -584,6 +584,17 @@ void mlx5_core_destroy_rq_tracked(struct mlx5_core_dev *dev, } EXPORT_SYMBOL(mlx5_core_destroy_rq_tracked); +static void destroy_sq_tracked(struct mlx5_core_dev *dev, u32 sqn, u16 uid) +{ + u32 in[MLX5_ST_SZ_DW(destroy_sq_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(destroy_sq_out)] = {0}; + + MLX5_SET(destroy_sq_in, in, opcode, MLX5_CMD_OP_DESTROY_SQ); + MLX5_SET(destroy_sq_in, in, sqn, sqn); + MLX5_SET(destroy_sq_in, in, uid, uid); + mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} + int mlx5_core_create_sq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen, struct mlx5_core_qp *sq) { @@ -594,6 +605,7 @@ int mlx5_core_create_sq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen, if (err) return err; + sq->uid = MLX5_GET(create_sq_in, in, uid); sq->qpn = sqn; err = create_resource_common(dev, sq, MLX5_RES_SQ); if (err) @@ -602,7 +614,7 @@ int mlx5_core_create_sq_tracked(struct mlx5_core_dev *dev, u32 *in, int inlen, return 0; err_destroy_sq: - mlx5_core_destroy_sq(dev, sq->qpn); + destroy_sq_tracked(dev, sq->qpn, sq->uid); return err; } @@ -612,7 +624,7 @@ void mlx5_core_destroy_sq_tracked(struct mlx5_core_dev *dev, struct mlx5_core_qp *sq) { destroy_resource_common(dev, sq); - mlx5_core_destroy_sq(dev, sq->qpn); + destroy_sq_tracked(dev, sq->qpn, sq->uid); } 
EXPORT_SYMBOL(mlx5_core_destroy_sq_tracked); diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 01b707666fb4..8151488f6570 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -5382,7 +5382,7 @@ struct mlx5_ifc_modify_sq_out_bits { struct mlx5_ifc_modify_sq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6097,7 +6097,7 @@ struct mlx5_ifc_destroy_sq_out_bits { struct mlx5_ifc_destroy_sq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6770,7 +6770,7 @@ struct mlx5_ifc_create_sq_out_bits { struct mlx5_ifc_create_sq_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; -- 2.14.4
[PATCH mlx5-next 07/25] net/mlx5: Update mlx5_ifc with DEVX UID bits
From: Leon Romanovsky Add DEVX information to WQ, SRQ, CQ, TRI, TIS, QP, RQ, XRCD, PD, MKEY and MCG. Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc.h | 67 +++ 1 file changed, 43 insertions(+), 24 deletions(-) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index efa4a60431d4..0f460fb22c31 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1291,7 +1291,9 @@ struct mlx5_ifc_wq_bits { u8 reserved_at_118[0x3]; u8 log_wq_sz[0x5]; - u8 reserved_at_120[0x3]; + u8 dbr_umem_valid[0x1]; + u8 wq_umem_valid[0x1]; + u8 reserved_at_122[0x1]; u8 log_hairpin_num_packets[0x5]; u8 reserved_at_128[0x3]; u8 log_hairpin_data_sz[0x5]; @@ -2365,7 +2367,10 @@ struct mlx5_ifc_qpc_bits { u8 dc_access_key[0x40]; - u8 reserved_at_680[0xc0]; + u8 reserved_at_680[0x3]; + u8 dbr_umem_valid[0x1]; + + u8 reserved_at_684[0xbc]; }; struct mlx5_ifc_roce_addr_layout_bits { @@ -2465,7 +2470,7 @@ struct mlx5_ifc_xrc_srqc_bits { u8 wq_signature[0x1]; u8 cont_srq[0x1]; - u8 reserved_at_22[0x1]; + u8 dbr_umem_valid[0x1]; u8 rlky[0x1]; u8 basic_cyclic_rcv_wqe[0x1]; u8 log_rq_stride[0x3]; @@ -3129,7 +3134,9 @@ enum { struct mlx5_ifc_cqc_bits { u8 status[0x4]; - u8 reserved_at_4[0x4]; + u8 reserved_at_4[0x2]; + u8 dbr_umem_valid[0x1]; + u8 reserved_at_7[0x1]; u8 cqe_sz[0x3]; u8 cc[0x1]; u8 reserved_at_c[0x1]; @@ -5315,7 +5322,7 @@ struct mlx5_ifc_modify_tis_bitmask_bits { struct mlx5_ifc_modify_tis_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -5354,7 +5361,7 @@ struct mlx5_ifc_modify_tir_out_bits { struct mlx5_ifc_modify_tir_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -5455,7 +5462,7 @@ struct mlx5_ifc_rqt_bitmask_bits { struct mlx5_ifc_modify_rqt_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -5642,7 +5649,10 @@ struct 
mlx5_ifc_modify_cq_in_bits { struct mlx5_ifc_cqc_bits cq_context; - u8 reserved_at_280[0x600]; + u8 reserved_at_280[0x40]; + + u8 cq_umem_valid[0x1]; + u8 reserved_at_2c1[0x5bf]; u8 pas[0][0x40]; }; @@ -5963,7 +5973,7 @@ struct mlx5_ifc_detach_from_mcg_out_bits { struct mlx5_ifc_detach_from_mcg_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6031,7 +6041,7 @@ struct mlx5_ifc_destroy_tis_out_bits { struct mlx5_ifc_destroy_tis_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6053,7 +6063,7 @@ struct mlx5_ifc_destroy_tir_out_bits { struct mlx5_ifc_destroy_tir_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6143,7 +6153,7 @@ struct mlx5_ifc_destroy_rqt_out_bits { struct mlx5_ifc_destroy_rqt_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6508,7 +6518,7 @@ struct mlx5_ifc_dealloc_xrcd_out_bits { struct mlx5_ifc_dealloc_xrcd_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6596,7 +6606,7 @@ struct mlx5_ifc_dealloc_pd_out_bits { struct mlx5_ifc_dealloc_pd_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; @@ -6675,7 +6685,9 @@ struct mlx5_ifc_create_xrc_srq_in_bits { struct mlx5_ifc_xrc_srqc_bits xrc_srq_context_entry; - u8 reserved_at_280[0x600]; + u8 reserved_at_280[0x40]; + u8 xrc_srq_umem_valid[0x1]; + u8 reserve
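The hunks above all follow one rule: each mlx5_ifc struct describes a command layout as a sequence of bit-width fields, so when a named bit such as `dbr_umem_valid` is carved out of a `reserved_at_*` span, the replacement fields must cover exactly the same bits, leaving every later offset unchanged. A minimal sketch of that invariant (illustrative helper and field type, not kernel code):

```c
/* Sketch: model each mlx5_ifc field as (start_bit, width). When a patch
 * splits reserved_at_120[0x3] into dbr_umem_valid[0x1], wq_umem_valid[0x1]
 * and reserved_at_122[0x1], the pieces must be contiguous and tile the
 * original span exactly, or every field after them would shift. */
#include <assert.h>
#include <stddef.h>

struct field { unsigned start, width; };

/* Returns 1 if the n replacement fields exactly tile [start, start+width). */
static int tiles_exactly(const struct field *f, size_t n,
                         unsigned start, unsigned width)
{
    unsigned pos = start;
    for (size_t i = 0; i < n; i++) {
        if (f[i].start != pos)
            return 0;
        pos += f[i].width;
    }
    return pos == start + width;
}
```

For instance, the mlx5_ifc_wq_bits hunk replaces `reserved_at_120[0x3]` with three 1-bit fields at 0x120, 0x121 and 0x122, which tile the original 0x3-bit span.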
[PATCH rdma-next 19/25] IB/mlx5: Set uid as part of XRCD commands
From: Yishai Hadas Set uid as part of XRCD commands so that the firmware can manage the XRCD object in a secured way. That will enable using an XRCD that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/cmd.c | 25 + drivers/infiniband/hw/mlx5/cmd.h | 2 ++ drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 + drivers/infiniband/hw/mlx5/qp.c | 8 ++-- 4 files changed, 34 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c index 9da10fbb7e23..51c39bc77ac7 100644 --- a/drivers/infiniband/hw/mlx5/cmd.c +++ b/drivers/infiniband/hw/mlx5/cmd.c @@ -271,3 +271,28 @@ void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn, MLX5_SET(dealloc_transport_domain_in, in, transport_domain, tdn); mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } + +int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid) +{ + u32 out[MLX5_ST_SZ_DW(alloc_xrcd_out)] = {0}; + u32 in[MLX5_ST_SZ_DW(alloc_xrcd_in)] = {0}; + int err; + + MLX5_SET(alloc_xrcd_in, in, opcode, MLX5_CMD_OP_ALLOC_XRCD); + MLX5_SET(alloc_xrcd_in, in, uid, uid); + err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); + if (!err) + *xrcdn = MLX5_GET(alloc_xrcd_out, out, xrcd); + return err; +} + +int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid) +{ + u32 out[MLX5_ST_SZ_DW(dealloc_xrcd_out)] = {0}; + u32 in[MLX5_ST_SZ_DW(dealloc_xrcd_in)] = {0}; + + MLX5_SET(dealloc_xrcd_in, in, opcode, MLX5_CMD_OP_DEALLOC_XRCD); + MLX5_SET(dealloc_xrcd_in, in, xrcd, xrcdn); + MLX5_SET(dealloc_xrcd_in, in, uid, uid); + return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h index 3a1d611216fb..76823e86fd17 100644 --- a/drivers/infiniband/hw/mlx5/cmd.h +++ b/drivers/infiniband/hw/mlx5/cmd.h @@ -55,4 +55,6 @@ int 
mlx5_cmd_alloc_transport_domain(struct mlx5_core_dev *dev, u32 *tdn, u16 uid); void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn, u16 uid); +int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid); +int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid); #endif /* MLX5_IB_CMD_H */ diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 88ea0df71d94..f582bd05c180 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -548,6 +548,7 @@ struct mlx5_ib_srq { struct mlx5_ib_xrcd { struct ib_xrcd ibxrcd; u32 xrcdn; + u16 uid; }; enum mlx5_ib_mtt_access_flags { diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 1a3b79405260..00b36b971ffa 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -5341,6 +5341,7 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev, struct mlx5_ib_dev *dev = to_mdev(ibdev); struct mlx5_ib_xrcd *xrcd; int err; + u16 uid; if (!MLX5_CAP_GEN(dev->mdev, xrc)) return ERR_PTR(-ENOSYS); @@ -5349,12 +5350,14 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev, if (!xrcd) return ERR_PTR(-ENOMEM); - err = mlx5_core_xrcd_alloc(dev->mdev, &xrcd->xrcdn); + uid = context ? to_mucontext(context)->devx_uid : 0; + err = mlx5_cmd_xrcd_alloc(dev->mdev, &xrcd->xrcdn, uid); if (err) { kfree(xrcd); return ERR_PTR(-ENOMEM); } + xrcd->uid = uid; return &xrcd->ibxrcd; } @@ -5362,9 +5365,10 @@ int mlx5_ib_dealloc_xrcd(struct ib_xrcd *xrcd) { struct mlx5_ib_dev *dev = to_mdev(xrcd->device); u32 xrcdn = to_mxrcd(xrcd)->xrcdn; + u16 uid = to_mxrcd(xrcd)->uid; int err; - err = mlx5_core_xrcd_dealloc(dev->mdev, xrcdn); + err = mlx5_cmd_xrcd_dealloc(dev->mdev, xrcdn, uid); if (err) mlx5_ib_warn(dev, "failed to dealloc xrcdn 0x%x\n", xrcdn); -- 2.14.4
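The `uid = context ? to_mucontext(context)->devx_uid : 0` line above is the convention used throughout the series: an object created on behalf of a user context carries that context's DEVX uid and records it for the matching destroy command, while kernel-owned objects use uid 0. A minimal sketch of that rule with mock types (the real fields live in mlx5_ib.h; names here are illustrative):

```c
/* Sketch of the uid-selection convention, not kernel code. */
#include <assert.h>
#include <stddef.h>

struct mock_ucontext { unsigned short devx_uid; };
struct mock_xrcd { unsigned xrcdn; unsigned short uid; };

/* Kernel-internal objects have no user context and get uid 0; user objects
 * carry the DEVX uid so DEVX commands with the same uid may touch them. */
static unsigned short object_uid(const struct mock_ucontext *ctx)
{
    return ctx ? ctx->devx_uid : 0;
}

/* The uid chosen at create time is stored on the object so the destroy
 * command can be issued with the same uid, mirroring xrcd->uid = uid. */
static void xrcd_init(struct mock_xrcd *x, unsigned xrcdn,
                      const struct mock_ucontext *ctx)
{
    x->xrcdn = xrcdn;
    x->uid = object_uid(ctx);
}
```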
[PATCH rdma-next 16/25] IB/mlx5: Set uid as part of TD commands
From: Yishai Hadas Set uid as part of TD commands so that the firmware can manage the TD object in a secured way. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/cmd.c | 30 ++ drivers/infiniband/hw/mlx5/cmd.h | 4 drivers/infiniband/hw/mlx5/main.c | 33 ++--- 3 files changed, 52 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c index 5560346102bd..9da10fbb7e23 100644 --- a/drivers/infiniband/hw/mlx5/cmd.c +++ b/drivers/infiniband/hw/mlx5/cmd.c @@ -241,3 +241,33 @@ void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid) MLX5_SET(dealloc_pd_in, in, uid, uid); mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } + +int mlx5_cmd_alloc_transport_domain(struct mlx5_core_dev *dev, u32 *tdn, + u16 uid) +{ + u32 in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0}; + int err; + + MLX5_SET(alloc_transport_domain_in, in, opcode, +MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN); + MLX5_SET(alloc_transport_domain_in, in, uid, uid); + + err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); + if (!err) + *tdn = MLX5_GET(alloc_transport_domain_out, out, + transport_domain); + + return err; +} + +void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn, + u16 uid) +{ + u32 in[MLX5_ST_SZ_DW(dealloc_transport_domain_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(dealloc_transport_domain_out)] = {0}; + + MLX5_SET(dealloc_transport_domain_in, in, opcode, +MLX5_CMD_OP_DEALLOC_TRANSPORT_DOMAIN); + MLX5_SET(dealloc_transport_domain_in, in, uid, uid); + MLX5_SET(dealloc_transport_domain_in, in, transport_domain, tdn); + mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h index b47e98b8a53a..3a1d611216fb 100644 --- a/drivers/infiniband/hw/mlx5/cmd.h +++ b/drivers/infiniband/hw/mlx5/cmd.h @@ -51,4 +51,8 @@ void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid); void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn,
u16 uid); void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid); void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid); +int mlx5_cmd_alloc_transport_domain(struct mlx5_core_dev *dev, u32 *tdn, + u16 uid); +void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn, + u16 uid); #endif /* MLX5_IB_CMD_H */ diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 7e6fd5553ab3..c1f94bc09606 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1613,14 +1613,15 @@ void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) mutex_unlock(&dev->lb.mutex); } -static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn) +static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn, + u16 uid) { int err; if (!MLX5_CAP_GEN(dev->mdev, log_max_transport_domain)) return 0; - err = mlx5_core_alloc_transport_domain(dev->mdev, tdn); + err = mlx5_cmd_alloc_transport_domain(dev->mdev, tdn, uid); if (err) return err; @@ -1632,12 +1633,13 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn) return mlx5_ib_enable_lb(dev, true, false); } -static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn) +static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn, +u16 uid) { if (!MLX5_CAP_GEN(dev->mdev, log_max_transport_domain)) return; - mlx5_core_dealloc_transport_domain(dev->mdev, tdn); + mlx5_cmd_dealloc_transport_domain(dev->mdev, tdn, uid); if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) || (!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) && @@ -1756,22 +1758,23 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, context->ibucontext.invalidate_range = &mlx5_ib_invalidate_range; #endif - err = mlx5_ib_alloc_transport_domain(dev, &context->tdn); - if (err) - goto out_uars; - if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) { /* Block DEVX 
on Infiniband as of SELinux */ if (mlx5_ib_port_link_layer(ibdev, 1) != IB_LINK_LAYER_ETHERNET) { err = -EPERM; - goto out_td; + goto out_uars; } err = mlx5_ib_devx_create(dev, conte
[PATCH rdma-next 22/25] IB/mlx5: Expose RAW QP device handles to user space
From: Yishai Hadas Expose RAW QP device handles to user space by extending the UHW part of mlx5_ib_create_qp_resp. This data is returned only when DEVX context is used where it may be applicable. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/qp.c | 38 -- include/uapi/rdma/mlx5-abi.h| 13 + 2 files changed, 49 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 00b36b971ffa..9a04f8b12a75 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1325,7 +1325,9 @@ static int create_raw_packet_qp_tir(struct mlx5_ib_dev *dev, static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, u32 *in, size_t inlen, - struct ib_pd *pd) + struct ib_pd *pd, + struct ib_udata *udata, + struct mlx5_ib_create_qp_resp *resp) { struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp; struct mlx5_ib_sq *sq = &raw_packet_qp->sq; @@ -1346,6 +1348,13 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, if (err) goto err_destroy_tis; + if (uid) { + resp->tisn = sq->tisn; + resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_TISN; + resp->sqn = sq->base.mqp.qpn; + resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_SQN; + } + sq->base.container_mibqp = qp; sq->base.mqp.event = mlx5_ib_qp_event; } @@ -1365,13 +1374,26 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, uid); if (err) goto err_destroy_rq; + + if (uid) { + resp->rqn = rq->base.mqp.qpn; + resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_RQN; + resp->tirn = rq->tirn; + resp->comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_TIRN; + } } qp->trans_qp.base.mqp.qpn = qp->sq.wqe_cnt ? 
sq->base.mqp.qpn : rq->base.mqp.qpn; + err = ib_copy_to_udata(udata, resp, min(udata->outlen, sizeof(*resp))); + if (err) + goto err_destroy_tir; + return 0; +err_destroy_tir: + destroy_raw_packet_qp_tir(dev, rq, qp->flags_en, uid); err_destroy_rq: destroy_raw_packet_qp_rq(dev, rq); err_destroy_sq: @@ -1643,12 +1665,23 @@ static int create_rss_raw_qp_tir(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, if (err) goto err; + if (mucontext->devx_uid) { + resp.comp_mask |= MLX5_IB_CREATE_QP_RESP_MASK_TIRN; + resp.tirn = qp->rss_qp.tirn; + } + + err = ib_copy_to_udata(udata, &resp, min(udata->outlen, sizeof(resp))); + if (err) + goto err_copy; + kvfree(in); /* qpn is reserved for that QP */ qp->trans_qp.base.mqp.qpn = 0; qp->flags |= MLX5_IB_QP_RSS; return 0; +err_copy: + mlx5_cmd_destroy_tir(dev->mdev, qp->rss_qp.tirn, mucontext->devx_uid); err: kvfree(in); return err; @@ -2030,7 +2063,8 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd, qp->flags & MLX5_IB_QP_UNDERLAY) { qp->raw_packet_qp.sq.ubuffer.buf_addr = ucmd.sq_buf_addr; raw_packet_qp_copy_info(qp, &qp->raw_packet_qp); - err = create_raw_packet_qp(dev, qp, in, inlen, pd); + err = create_raw_packet_qp(dev, qp, in, inlen, pd, udata, + &resp); } else { err = mlx5_core_create_qp(dev->mdev, &base->mqp, in, inlen); } diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h index 3ddb31a0bc47..8fa9f90e2bb1 100644 --- a/include/uapi/rdma/mlx5-abi.h +++ b/include/uapi/rdma/mlx5-abi.h @@ -352,9 +352,22 @@ struct mlx5_ib_create_qp_rss { __u32 flags; }; +enum mlx5_ib_create_qp_resp_mask { + MLX5_IB_CREATE_QP_RESP_MASK_TIRN = 1UL << 0, + MLX5_IB_CREATE_QP_RESP_MASK_TISN = 1UL << 1, + MLX5_IB_CREATE_QP_RESP_MASK_RQN = 1UL << 2, + MLX5_IB_CREATE_QP_RESP_MASK_SQN = 1UL << 3, +}; + struct mlx5_ib_create_qp_resp { __u32 bfreg_index; __u32 reserved; + __u32 comp_mask; + __u32 tirn; + __u32 tisn; + __u32 rqn; + __u32 sqn; + __u32 reserved1; }; struct mlx5_ib_alloc_mw { -- 2.14.4
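The uapi change above extends mlx5_ib_create_qp_resp compatibly: each new handle is meaningful only when the kernel sets its comp_mask bit, so old userspace and new kernels keep interoperating. A hedged sketch of how a consumer would honor the mask (mirrored types and constants; the authoritative definitions are in include/uapi/rdma/mlx5-abi.h):

```c
/* Sketch of the comp_mask-gated response pattern, not the uapi itself. */
#include <assert.h>
#include <stdint.h>

enum {
    QP_RESP_MASK_TIRN = 1u << 0,
    QP_RESP_MASK_TISN = 1u << 1,
    QP_RESP_MASK_RQN  = 1u << 2,
    QP_RESP_MASK_SQN  = 1u << 3,
};

struct qp_resp {
    uint32_t comp_mask;
    uint32_t tirn, tisn, rqn, sqn;
};

/* Read tirn only when the kernel set its mask bit; returns 1 and fills
 * *out on success, 0 (leaving *out untouched) when the field is absent. */
static int resp_get_tirn(const struct qp_resp *r, uint32_t *out)
{
    if (!(r->comp_mask & QP_RESP_MASK_TIRN))
        return 0;
    *out = r->tirn;
    return 1;
}
```

The kernel side mirrors this: it fills a handle and sets the corresponding bit only when a DEVX uid is in use, which is why the hunks above guard each assignment with `if (uid)`.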
[PATCH rdma-next 17/25] IB/mlx5: Set uid as part of SRQ commands
From: Yishai Hadas Set uid as part of SRQ create command so that the firmware can manage the SRQ object in a secured way. The uid for the destroy and modify commands is set by mlx5_core. That will enable an SRQ created by a verbs application to be used by the DEVX flow when the uids are equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/srq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index d359fecf7a5b..6b1cd9ef4e2a 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -144,6 +144,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq, in->log_page_size = page_shift - MLX5_ADAPTER_PAGE_SHIFT; in->page_offset = offset; + in->uid = to_mucontext(pd->uobject->context)->devx_uid; if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 && in->type != IB_SRQT_BASIC) in->user_index = uidx; -- 2.14.4
[PATCH rdma-next 11/25] IB/mlx5: Set uid as part of SQ commands
From: Yishai Hadas Set uid as part of SQ commands so that the firmware can manage the SQ object in a secured way. The uid for the destroy command is set by mlx5_core. This will enable using an SQ that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/qp.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 31c69da7ccdf..24370635008e 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1088,7 +1088,7 @@ static void destroy_flow_rule_vport_sq(struct mlx5_ib_dev *dev, static int create_raw_packet_qp_sq(struct mlx5_ib_dev *dev, struct mlx5_ib_sq *sq, void *qpin, - struct ib_pd *pd) + struct ib_pd *pd, u16 uid) { struct mlx5_ib_ubuffer *ubuffer = &sq->ubuffer; __be64 *pas; @@ -1116,6 +1116,7 @@ static int create_raw_packet_qp_sq(struct mlx5_ib_dev *dev, goto err_umem; } + MLX5_SET(create_sq_in, in, uid, uid); sqc = MLX5_ADDR_OF(create_sq_in, in, ctx); MLX5_SET(sqc, sqc, flush_in_error_en, 1); if (MLX5_CAP_ETH(dev->mdev, multi_pkt_send_wqe)) @@ -1336,7 +1337,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, if (err) return err; - err = create_raw_packet_qp_sq(dev, sq, in, pd); + err = create_raw_packet_qp_sq(dev, sq, in, pd, uid); if (err) goto err_destroy_tis; @@ -2885,7 +2886,8 @@ static int modify_raw_packet_qp_rq(struct mlx5_ib_dev *dev, static int modify_raw_packet_qp_sq(struct mlx5_core_dev *dev, struct mlx5_ib_sq *sq, int new_state, - const struct mlx5_modify_raw_qp_param *raw_qp_param) + const struct mlx5_modify_raw_qp_param *raw_qp_param, + u16 uid) { struct mlx5_ib_qp *ibqp = sq->base.container_mibqp; struct mlx5_rate_limit old_rl = ibqp->rl; @@ -2902,6 +2904,7 @@ static int modify_raw_packet_qp_sq(struct mlx5_core_dev *dev, if (!in) return -ENOMEM; + MLX5_SET(modify_sq_in, in, uid, 
uid); MLX5_SET(modify_sq_in, in, sq_state, sq->state); sqc = MLX5_ADDR_OF(modify_sq_in, in, ctx); @@ -3019,7 +3022,8 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, return err; } - return modify_raw_packet_qp_sq(dev->mdev, sq, sq_state, raw_qp_param); + return modify_raw_packet_qp_sq(dev->mdev, sq, sq_state, + raw_qp_param, uid); } return 0; -- 2.14.4
[PATCH rdma-next 18/25] IB/mlx5: Set uid as part of DCT commands
From: Yishai Hadas Set uid as part of DCT create command so that the firmware can manage the DCT object in a secured way. The uid for the destroy and drain commands is set by mlx5_core. That will enable a DCT created by a verbs application to be used by the DEVX flow when the uids are equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/qp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 8fbf17a885b9..1a3b79405260 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -2311,6 +2311,8 @@ static struct ib_qp *mlx5_ib_create_dct(struct ib_pd *pd, goto err_free; } + MLX5_SET(create_dct_in, qp->dct.in, uid, +to_mucontext(pd->uobject->context)->devx_uid); dctc = MLX5_ADDR_OF(create_dct_in, qp->dct.in, dct_context_entry); qp->qp_sub_type = MLX5_IB_QPT_DCT; MLX5_SET(dctc, dctc, pd, to_mpd(pd)->pdn); -- 2.14.4
[PATCH rdma-next 15/25] IB/mlx5: Set uid as part of PD commands
From: Yishai Hadas Set uid as part of PD commands so that the firmware can manage the PD object in a secured way. For example when a QP is created its uid must match the CQ uid which it uses. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/cmd.c | 10 ++ drivers/infiniband/hw/mlx5/cmd.h | 1 + drivers/infiniband/hw/mlx5/main.c| 16 +--- drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 + 4 files changed, 25 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c index 347e3912b4bb..5560346102bd 100644 --- a/drivers/infiniband/hw/mlx5/cmd.c +++ b/drivers/infiniband/hw/mlx5/cmd.c @@ -231,3 +231,13 @@ void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid) mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } +void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid) +{ + u32 out[MLX5_ST_SZ_DW(dealloc_pd_out)] = {0}; + u32 in[MLX5_ST_SZ_DW(dealloc_pd_in)] = {0}; + + MLX5_SET(dealloc_pd_in, in, opcode, MLX5_CMD_OP_DEALLOC_PD); + MLX5_SET(dealloc_pd_in, in, pd, pdn); + MLX5_SET(dealloc_pd_in, in, uid, uid); + mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h index 0437190c1b35..b47e98b8a53a 100644 --- a/drivers/infiniband/hw/mlx5/cmd.h +++ b/drivers/infiniband/hw/mlx5/cmd.h @@ -50,4 +50,5 @@ int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 addr, u64 length); void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid); void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid); void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid); +void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid); #endif /* MLX5_IB_CMD_H */ diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 75851721d1dc..7e6fd5553ab3 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ 
b/drivers/infiniband/hw/mlx5/main.c @@ -2355,21 +2355,31 @@ static struct ib_pd *mlx5_ib_alloc_pd(struct ib_device *ibdev, struct mlx5_ib_alloc_pd_resp resp; struct mlx5_ib_pd *pd; int err; + u32 out[MLX5_ST_SZ_DW(alloc_pd_out)] = {0}; + u32 in[MLX5_ST_SZ_DW(alloc_pd_in)] = {0}; + u16 uid; + + uid = context ? to_mucontext(context)->devx_uid : 0; pd = kmalloc(sizeof(*pd), GFP_KERNEL); if (!pd) return ERR_PTR(-ENOMEM); - err = mlx5_core_alloc_pd(to_mdev(ibdev)->mdev, &pd->pdn); + MLX5_SET(alloc_pd_in, in, opcode, MLX5_CMD_OP_ALLOC_PD); + MLX5_SET(alloc_pd_in, in, uid, uid); + err = mlx5_cmd_exec(to_mdev(ibdev)->mdev, in, sizeof(in), + out, sizeof(out)); if (err) { kfree(pd); return ERR_PTR(err); } + pd->pdn = MLX5_GET(alloc_pd_out, out, pd); + pd->uid = uid; if (context) { resp.pdn = pd->pdn; if (ib_copy_to_udata(udata, &resp, sizeof(resp))) { - mlx5_core_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn); + mlx5_cmd_dealloc_pd(to_mdev(ibdev)->mdev, pd->pdn, uid); kfree(pd); return ERR_PTR(-EFAULT); } @@ -2383,7 +2393,7 @@ static int mlx5_ib_dealloc_pd(struct ib_pd *pd) struct mlx5_ib_dev *mdev = to_mdev(pd->device); struct mlx5_ib_pd *mpd = to_mpd(pd); - mlx5_core_dealloc_pd(mdev->mdev, mpd->pdn); + mlx5_cmd_dealloc_pd(mdev->mdev, mpd->pdn, mpd->uid); kfree(mpd); return 0; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 2508a401a7d9..88ea0df71d94 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -153,6 +153,7 @@ static inline struct mlx5_ib_ucontext *to_mucontext(struct ib_ucontext *ibuconte struct mlx5_ib_pd { struct ib_pdibpd; u32 pdn; + u16 uid; }; enum { -- 2.14.4
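mlx5_ib_alloc_pd above builds the ALLOC_PD command inline with MLX5_SET, which writes a value of a given bit width at a bit offset into a buffer of big-endian dwords; renaming `reserved_at_10[0x10]` to `uid[0x10]` simply gives bits 16..31 of the first dword a meaning. A simplified model of that packing for fields contained in a single dword (assumes a little-endian host and GCC/Clang's `__builtin_bswap32`; the real macros in include/linux/mlx5/device.h handle arbitrary fields and are driven by the mlx5_ifc layouts):

```c
/* Simplified model of MLX5_SET for single-dword fields, not the real macro.
 * bit_off counts from the start of the buffer; fields are MSB-first within
 * each big-endian dword, matching the mlx5_ifc declaration order. */
#include <assert.h>
#include <stdint.h>

static void cmd_set(uint32_t *buf, unsigned bit_off, unsigned width,
                    uint32_t val)
{
    unsigned dw = bit_off / 32;
    unsigned shift = 32 - (bit_off % 32) - width;   /* MSB-first layout */
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    uint32_t host = __builtin_bswap32(buf[dw]);     /* device dwords are BE */

    host = (host & ~(mask << shift)) | ((val & mask) << shift);
    buf[dw] = __builtin_bswap32(host);
}
```

With this model, `opcode[0x10]` occupies bits 0..15 and `uid[0x10]` bits 16..31 of dword 0, so setting both fills the first dword without either clobbering the other (0x800 below is an illustrative opcode value).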
[PATCH rdma-next 13/25] IB/mlx5: Set uid as part of TIS commands
From: Yishai Hadas Set uid as part of TIS commands so that the firmware can manage the TIS object in a secured way. That will enable using a TIS that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/cmd.c | 12 drivers/infiniband/hw/mlx5/cmd.h | 1 + drivers/infiniband/hw/mlx5/qp.c | 27 +-- 3 files changed, 30 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c index e150ae44e06a..8a3623bbca94 100644 --- a/drivers/infiniband/hw/mlx5/cmd.c +++ b/drivers/infiniband/hw/mlx5/cmd.c @@ -208,3 +208,15 @@ void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid) MLX5_SET(destroy_tir_in, in, uid, uid); mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } + +void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid) +{ + u32 in[MLX5_ST_SZ_DW(destroy_tis_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(destroy_tis_out)] = {0}; + + MLX5_SET(destroy_tis_in, in, opcode, MLX5_CMD_OP_DESTROY_TIS); + MLX5_SET(destroy_tis_in, in, tisn, tisn); + MLX5_SET(destroy_tis_in, in, uid, uid); + mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} + diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h index 274090a38c4b..a55e750591e5 100644 --- a/drivers/infiniband/hw/mlx5/cmd.h +++ b/drivers/infiniband/hw/mlx5/cmd.h @@ -48,4 +48,5 @@ int mlx5_cmd_alloc_memic(struct mlx5_memic *memic, phys_addr_t *addr, u64 length, u32 alignment); int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 addr, u64 length); void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid); +void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid); #endif /* MLX5_IB_CMD_H */ diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 07bf5128bee4..5421857f195e 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ 
b/drivers/infiniband/hw/mlx5/qp.c @@ -1062,11 +1062,12 @@ static int is_connected(enum ib_qp_type qp_type) static int create_raw_packet_qp_tis(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, - struct mlx5_ib_sq *sq, u32 tdn) + struct mlx5_ib_sq *sq, u32 tdn, u16 uid) { u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {0}; void *tisc = MLX5_ADDR_OF(create_tis_in, in, ctx); + MLX5_SET(create_tis_in, in, uid, uid); MLX5_SET(tisc, tisc, transport_domain, tdn); if (qp->flags & MLX5_IB_QP_UNDERLAY) MLX5_SET(tisc, tisc, underlay_qpn, qp->underlay_qpn); @@ -1075,9 +1076,9 @@ static int create_raw_packet_qp_tis(struct mlx5_ib_dev *dev, } static void destroy_raw_packet_qp_tis(struct mlx5_ib_dev *dev, - struct mlx5_ib_sq *sq) + struct mlx5_ib_sq *sq, u16 uid) { - mlx5_core_destroy_tis(dev->mdev, sq->tisn); + mlx5_cmd_destroy_tis(dev->mdev, sq->tisn, uid); } static void destroy_flow_rule_vport_sq(struct mlx5_ib_dev *dev, @@ -1337,7 +1338,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, u16 uid = mucontext->devx_uid; if (qp->sq.wqe_cnt) { - err = create_raw_packet_qp_tis(dev, qp, sq, tdn); + err = create_raw_packet_qp_tis(dev, qp, sq, tdn, uid); if (err) return err; @@ -1378,7 +1379,7 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp, return err; destroy_raw_packet_qp_sq(dev, sq); err_destroy_tis: - destroy_raw_packet_qp_tis(dev, sq); + destroy_raw_packet_qp_tis(dev, sq, uid); return err; } @@ -1398,7 +1399,7 @@ static void destroy_raw_packet_qp(struct mlx5_ib_dev *dev, if (qp->sq.wqe_cnt) { destroy_raw_packet_qp_sq(dev, sq); - destroy_raw_packet_qp_tis(dev, sq); + destroy_raw_packet_qp_tis(dev, sq, uid); } } @@ -2579,7 +2580,7 @@ static int ib_rate_to_mlx5(struct mlx5_ib_dev *dev, u8 rate) } static int modify_raw_packet_eth_prio(struct mlx5_core_dev *dev, - struct mlx5_ib_sq *sq, u8 sl) + struct mlx5_ib_sq *sq, u8 sl, u16 uid) { void *in; void *tisc; @@ -2592,6 +2593,7 @@ static int modify_raw_packet_eth_prio(struct 
mlx5_core_dev *dev, return -ENOMEM; MLX5_SET(modify_tis_in, in, bitmask.prio, 1); + MLX5_SET(modify_tis_in, in, uid, uid); tisc = MLX5_ADDR_OF(modify_tis_in, in, ctx); MLX5_SET(tisc, tisc, prio, ((sl & 0x7) << 1)); @@ -2604,7 +2606,8 @@ static int modify_raw_packet_eth_prio(struct mlx5_cor
[PATCH rdma-next 20/25] IB/mlx5: Set uid as part of MCG commands
From: Yishai Hadas Set uid as part of MCG commands so that the firmware can manage the MCG object in a secured way. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/cmd.c | 30 ++ drivers/infiniband/hw/mlx5/cmd.h | 4 drivers/infiniband/hw/mlx5/main.c | 11 +-- 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c index 51c39bc77ac7..ababc5cdbcaa 100644 --- a/drivers/infiniband/hw/mlx5/cmd.c +++ b/drivers/infiniband/hw/mlx5/cmd.c @@ -296,3 +296,33 @@ int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid) MLX5_SET(dealloc_xrcd_in, in, uid, uid); return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } + +int mlx5_cmd_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, + u32 qpn, u16 uid) +{ + u32 out[MLX5_ST_SZ_DW(attach_to_mcg_out)] = {0}; + u32 in[MLX5_ST_SZ_DW(attach_to_mcg_in)] = {0}; + void *gid; + + MLX5_SET(attach_to_mcg_in, in, opcode, MLX5_CMD_OP_ATTACH_TO_MCG); + MLX5_SET(attach_to_mcg_in, in, qpn, qpn); + MLX5_SET(attach_to_mcg_in, in, uid, uid); + gid = MLX5_ADDR_OF(attach_to_mcg_in, in, multicast_gid); + memcpy(gid, mgid, sizeof(*mgid)); + return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} + +int mlx5_cmd_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, + u32 qpn, u16 uid) +{ + u32 out[MLX5_ST_SZ_DW(detach_from_mcg_out)] = {0}; + u32 in[MLX5_ST_SZ_DW(detach_from_mcg_in)] = {0}; + void *gid; + + MLX5_SET(detach_from_mcg_in, in, opcode, MLX5_CMD_OP_DETACH_FROM_MCG); + MLX5_SET(detach_from_mcg_in, in, qpn, qpn); + MLX5_SET(detach_from_mcg_in, in, uid, uid); + gid = MLX5_ADDR_OF(detach_from_mcg_in, in, multicast_gid); + memcpy(gid, mgid, sizeof(*mgid)); + return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h index 76823e86fd17..7cf364af7c28 100644 --- a/drivers/infiniband/hw/mlx5/cmd.h +++ 
b/drivers/infiniband/hw/mlx5/cmd.h @@ -57,4 +57,8 @@ void mlx5_cmd_dealloc_transport_domain(struct mlx5_core_dev *dev, u32 tdn, u16 uid); int mlx5_cmd_xrcd_alloc(struct mlx5_core_dev *dev, u32 *xrcdn, u16 uid); int mlx5_cmd_xrcd_dealloc(struct mlx5_core_dev *dev, u32 xrcdn, u16 uid); +int mlx5_cmd_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, + u32 qpn, u16 uid); +int mlx5_cmd_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, + u32 qpn, u16 uid); #endif /* MLX5_IB_CMD_H */ diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index c1f94bc09606..ac2abfc866a6 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -4139,13 +4139,17 @@ static int mlx5_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) struct mlx5_ib_dev *dev = to_mdev(ibqp->device); struct mlx5_ib_qp *mqp = to_mqp(ibqp); int err; + u16 uid; + + uid = ibqp->uobject ? + to_mucontext(ibqp->uobject->context)->devx_uid : 0; if (mqp->flags & MLX5_IB_QP_UNDERLAY) { mlx5_ib_dbg(dev, "Attaching a multi cast group to underlay QP is not supported\n"); return -EOPNOTSUPP; } - err = mlx5_core_attach_mcg(dev->mdev, gid, ibqp->qp_num); + err = mlx5_cmd_attach_mcg(dev->mdev, gid, ibqp->qp_num, uid); if (err) mlx5_ib_warn(dev, "failed attaching QPN 0x%x, MGID %pI6\n", ibqp->qp_num, gid->raw); @@ -4157,8 +4161,11 @@ static int mlx5_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { struct mlx5_ib_dev *dev = to_mdev(ibqp->device); int err; + u16 uid; - err = mlx5_core_detach_mcg(dev->mdev, gid, ibqp->qp_num); + uid = ibqp->uobject ? + to_mucontext(ibqp->uobject->context)->devx_uid : 0; + err = mlx5_cmd_detach_mcg(dev->mdev, gid, ibqp->qp_num, uid); if (err) mlx5_ib_warn(dev, "failed detaching QPN 0x%x, MGID %pI6\n", ibqp->qp_num, gid->raw); -- 2.14.4
[PATCH rdma-next 14/25] IB/mlx5: Set uid as part of RQT commands
From: Yishai Hadas

Set uid as part of RQT commands so that the firmware can manage the
RQT object in a secure way. That will enable an RQT that was created
by a verbs application to be used by the DEVX flow when the uids are
equal.

Signed-off-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/cmd.c | 11 +++++++++++
 drivers/infiniband/hw/mlx5/cmd.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c  | 11 +++++++---
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 8a3623bbca94..347e3912b4bb 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -220,3 +220,14 @@ void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid)
 	mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_rqt_in)] = {0};
+	u32 out[MLX5_ST_SZ_DW(destroy_rqt_out)] = {0};
+
+	MLX5_SET(destroy_rqt_in, in, opcode, MLX5_CMD_OP_DESTROY_RQT);
+	MLX5_SET(destroy_rqt_in, in, rqtn, rqtn);
+	MLX5_SET(destroy_rqt_in, in, uid, uid);
+	mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index a55e750591e5..0437190c1b35 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -49,4 +49,5 @@ int mlx5_cmd_alloc_memic(struct mlx5_memic *memic, phys_addr_t *addr,
 int mlx5_cmd_dealloc_memic(struct mlx5_memic *memic, u64 addr, u64 length);
 void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid);
 void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid);
+void mlx5_cmd_destroy_rqt(struct mlx5_core_dev *dev, u32 rqtn, u16 uid);
 #endif /* MLX5_IB_CMD_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 5421857f195e..8fbf17a885b9 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -5702,6 +5702,7 @@ struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
 	int i;
 	u32 *in;
 	void *rqtc;
+	u16 uid;

 	if (udata->inlen > 0 &&
 	    !ib_is_udata_cleared(udata, 0,
@@ -5739,6 +5740,10 @@ struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
 	for (i = 0; i < sz; i++)
 		MLX5_SET(rqtc, rqtc, rq_num[i], init_attr->ind_tbl[i]->wq_num);

+	/* Use the uid from its internal WQ */
+	uid = to_mucontext(init_attr->ind_tbl[0]->uobject->context)->devx_uid;
+	MLX5_SET(create_rqt_in, in, uid, uid);
+
 	err = mlx5_core_create_rqt(dev->mdev, in, inlen, &rwq_ind_tbl->rqtn);
 	kvfree(in);
@@ -5757,7 +5762,7 @@ struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
 	return &rwq_ind_tbl->ib_rwq_ind_tbl;

 err_copy:
-	mlx5_core_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn);
+	mlx5_cmd_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn, uid);
 err:
 	kfree(rwq_ind_tbl);
 	return ERR_PTR(err);
@@ -5767,8 +5772,10 @@ int mlx5_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_tbl)
 {
 	struct mlx5_ib_rwq_ind_table *rwq_ind_tbl = to_mrwq_ind_table(ib_rwq_ind_tbl);
 	struct mlx5_ib_dev *dev = to_mdev(ib_rwq_ind_tbl->device);
+	u16 uid;

-	mlx5_core_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn);
+	uid = to_mucontext(ib_rwq_ind_tbl->uobject->context)->devx_uid;
+	mlx5_cmd_destroy_rqt(dev->mdev, rwq_ind_tbl->rqtn, uid);
 	kfree(rwq_ind_tbl);
 	return 0;
-- 
2.14.4
[PATCH rdma-next 25/25] IB/mlx5: Enable DEVX on IB
From: Yishai Hadas

IB has additional protections with SELinux that cannot be extended to
the DEVX domain. SELinux can restrict access to pkeys. The first
version of DEVX blocked IB entirely until this could be understood.

Since DEVX requires CAP_NET_RAW, it supersedes the SELinux restriction
and allows userspace to form arbitrary packets with arbitrary pkeys.
Thus we enable IB for DEVX when CAP_NET_RAW is given.

Signed-off-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/main.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8cc285c4da8e..c31e57bead8e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1759,12 +1759,6 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 #endif

 	if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) {
-		/* Block DEVX on Infiniband as of SELinux */
-		if (mlx5_ib_port_link_layer(ibdev, 1) != IB_LINK_LAYER_ETHERNET) {
-			err = -EPERM;
-			goto out_uars;
-		}
-
 		err = mlx5_ib_devx_create(dev);
 		if (err < 0)
 			goto out_uars;
-- 
2.14.4
[PATCH rdma-next 23/25] IB/mlx5: Manage device uid for DEVX white list commands
From: Yishai Hadas Manage device uid for DEVX white list commands. The created device uid will be used on white list commands if the user didn't supply its own uid. This will enable the firmware to filter out non privileged functionality as of the recognition of the uid. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/devx.c| 12 ++-- drivers/infiniband/hw/mlx5/main.c| 16 drivers/infiniband/hw/mlx5/mlx5_ib.h | 13 + 3 files changed, 23 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index 562c7936bbad..97cac57dcb3d 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -45,13 +45,14 @@ static struct mlx5_ib_ucontext *devx_ufile2uctx(struct ib_uverbs_file *file) return to_mucontext(ib_uverbs_get_ucontext(file)); } -int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext *context) +int mlx5_ib_devx_create(struct mlx5_ib_dev *dev) { u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {0}; u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0}; u64 general_obj_types; void *hdr; int err; + u16 uid; hdr = MLX5_ADDR_OF(create_uctx_in, in, hdr); @@ -70,19 +71,18 @@ int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext *contex if (err) return err; - context->devx_uid = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id); - return 0; + uid = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id); + return uid; } -void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, - struct mlx5_ib_ucontext *context) +void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid) { u32 in[MLX5_ST_SZ_DW(general_obj_in_cmd_hdr)] = {0}; u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0}; MLX5_SET(general_obj_in_cmd_hdr, in, opcode, MLX5_CMD_OP_DESTROY_GENERAL_OBJECT); MLX5_SET(general_obj_in_cmd_hdr, in, obj_type, MLX5_OBJ_TYPE_UCTX); - MLX5_SET(general_obj_in_cmd_hdr, in, obj_id, context->devx_uid); + MLX5_SET(general_obj_in_cmd_hdr, in, obj_id, 
uid); mlx5_cmd_exec(dev->mdev, in, sizeof(in), out, sizeof(out)); } diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index ac2abfc866a6..8cc285c4da8e 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1765,9 +1765,10 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, goto out_uars; } - err = mlx5_ib_devx_create(dev, context); - if (err) + err = mlx5_ib_devx_create(dev); + if (err < 0) goto out_uars; + context->devx_uid = err; } err = mlx5_ib_alloc_transport_domain(dev, &context->tdn, @@ -1872,7 +1873,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, mlx5_ib_dealloc_transport_domain(dev, context->tdn, context->devx_uid); out_devx: if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) - mlx5_ib_devx_destroy(dev, context); + mlx5_ib_devx_destroy(dev, context->devx_uid); out_uars: deallocate_uars(dev, context); @@ -1899,7 +1900,7 @@ static int mlx5_ib_dealloc_ucontext(struct ib_ucontext *ibcontext) mlx5_ib_dealloc_transport_domain(dev, context->tdn, context->devx_uid); if (context->devx_uid) - mlx5_ib_devx_destroy(dev, context); + mlx5_ib_devx_destroy(dev, context->devx_uid); deallocate_uars(dev, context); kfree(bfregi->sys_pages); @@ -6287,6 +6288,8 @@ void __mlx5_ib_remove(struct mlx5_ib_dev *dev, profile->stage[stage].cleanup(dev); } + if (dev->devx_whitelist_uid) + mlx5_ib_devx_destroy(dev, dev->devx_whitelist_uid); ib_dealloc_device((struct ib_device *)dev); } @@ -6295,6 +6298,7 @@ void *__mlx5_ib_add(struct mlx5_ib_dev *dev, { int err; int i; + int uid; printk_once(KERN_INFO "%s", mlx5_version); @@ -6306,6 +6310,10 @@ void *__mlx5_ib_add(struct mlx5_ib_dev *dev, } } + uid = mlx5_ib_devx_create(dev); + if (uid > 0) + dev->devx_whitelist_uid = uid; + dev->profile = profile; dev->ib_active = true; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index f582bd05c180..6a0fbd0286ef 100644 --- 
a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -934,6 +934,7 @@ struct mlx5_ib_dev { struct list_headib_dev_list; u64 sys_image_guid; struct mlx5_memic memic; + u16 devx_whitelist_uid; }
[PATCH rdma-next 21/25] IB/mlx5: Set valid umem bit on DEVX
From: Yishai Hadas Set valid umem bit on DEVX commands that use umem. This will enforce the umem usage by the firmware and not the 'pas' info. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/devx.c | 95 +++ 1 file changed, 95 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index 25dafa4ff6ca..562c7936bbad 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -264,6 +264,97 @@ static int devx_is_valid_obj_id(struct devx_obj *obj, const void *in) return false; } +static void devx_set_umem_valid(const void *in) +{ + u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode); + + switch (opcode) { + case MLX5_CMD_OP_CREATE_MKEY: + MLX5_SET(create_mkey_in, in, mkey_umem_valid, 1); + break; + case MLX5_CMD_OP_CREATE_CQ: + { + void *cqc; + + MLX5_SET(create_cq_in, in, cq_umem_valid, 1); + cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context); + MLX5_SET(cqc, cqc, dbr_umem_valid, 1); + break; + } + case MLX5_CMD_OP_CREATE_QP: + { + void *qpc; + + qpc = MLX5_ADDR_OF(create_qp_in, in, qpc); + MLX5_SET(qpc, qpc, dbr_umem_valid, 1); + MLX5_SET(create_qp_in, in, wq_umem_valid, 1); + break; + } + + case MLX5_CMD_OP_CREATE_RQ: + { + void *rqc, *wq; + + rqc = MLX5_ADDR_OF(create_rq_in, in, ctx); + wq = MLX5_ADDR_OF(rqc, rqc, wq); + MLX5_SET(wq, wq, dbr_umem_valid, 1); + MLX5_SET(wq, wq, wq_umem_valid, 1); + break; + } + + case MLX5_CMD_OP_CREATE_SQ: + { + void *sqc, *wq; + + sqc = MLX5_ADDR_OF(create_sq_in, in, ctx); + wq = MLX5_ADDR_OF(sqc, sqc, wq); + MLX5_SET(wq, wq, dbr_umem_valid, 1); + MLX5_SET(wq, wq, wq_umem_valid, 1); + break; + } + + case MLX5_CMD_OP_MODIFY_CQ: + MLX5_SET(modify_cq_in, in, cq_umem_valid, 1); + break; + + case MLX5_CMD_OP_CREATE_RMP: + { + void *rmpc, *wq; + + rmpc = MLX5_ADDR_OF(create_rmp_in, in, ctx); + wq = MLX5_ADDR_OF(rmpc, rmpc, wq); + MLX5_SET(wq, wq, dbr_umem_valid, 1); + MLX5_SET(wq, wq, wq_umem_valid, 1); + break; + } + + 
case MLX5_CMD_OP_CREATE_XRQ: + { + void *xrqc, *wq; + + xrqc = MLX5_ADDR_OF(create_xrq_in, in, xrq_context); + wq = MLX5_ADDR_OF(xrqc, xrqc, wq); + MLX5_SET(wq, wq, dbr_umem_valid, 1); + MLX5_SET(wq, wq, wq_umem_valid, 1); + break; + } + + case MLX5_CMD_OP_CREATE_XRC_SRQ: + { + void *xrc_srqc; + + MLX5_SET(create_xrc_srq_in, in, xrc_srq_umem_valid, 1); + xrc_srqc = MLX5_ADDR_OF(create_xrc_srq_in, in, + xrc_srq_context_entry); + MLX5_SET(xrc_srqc, xrc_srqc, dbr_umem_valid, 1); + break; + } + + default: + return; + } +} + static bool devx_is_obj_create_cmd(const void *in) { u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode); @@ -741,6 +832,8 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)( return -ENOMEM; MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid); + devx_set_umem_valid(cmd_in); + err = mlx5_cmd_exec(dev->mdev, cmd_in, uverbs_attr_get_len(attrs, MLX5_IB_ATTR_DEVX_OBJ_CREATE_CMD_IN), cmd_out, cmd_out_len); @@ -790,6 +883,8 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_MODIFY)( return PTR_ERR(cmd_out); MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid); + devx_set_umem_valid(cmd_in); + err = mlx5_cmd_exec(obj->mdev, cmd_in, uverbs_attr_get_len(attrs, MLX5_IB_ATTR_DEVX_OBJ_MODIFY_CMD_IN), cmd_out, cmd_out_len); -- 2.14.4
[PATCH rdma-next 24/25] IB/mlx5: Enable DEVX white list commands
From: Yishai Hadas Enable DEVX white list commands without the need for CAP_NET_RAW. DEVX uid must exist from the ucontext or the device so that the firmware will mask unprivileged capabilities. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/devx.c | 75 +++ 1 file changed, 60 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index 97cac57dcb3d..c11640047f26 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -61,9 +61,6 @@ int mlx5_ib_devx_create(struct mlx5_ib_dev *dev) !(general_obj_types & MLX5_GENERAL_OBJ_TYPES_CAP_UMEM)) return -EINVAL; - if (!capable(CAP_NET_RAW)) - return -EPERM; - MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT); MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, MLX5_OBJ_TYPE_UCTX); @@ -476,12 +473,49 @@ static bool devx_is_obj_query_cmd(const void *in) } } +static bool devx_is_whitelist_cmd(void *in) +{ + u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode); + + switch (opcode) { + case MLX5_CMD_OP_QUERY_HCA_CAP: + case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT: + return true; + default: + return false; + } +} + +static int devx_get_uid(struct mlx5_ib_ucontext *c, void *cmd_in) +{ + if (devx_is_whitelist_cmd(cmd_in)) { + struct mlx5_ib_dev *dev; + + if (c->devx_uid) + return c->devx_uid; + + dev = to_mdev(c->ibucontext.device); + if (dev->devx_whitelist_uid) + return dev->devx_whitelist_uid; + + return -EOPNOTSUPP; + } + + if (!c->devx_uid) + return -EINVAL; + + if (!capable(CAP_NET_RAW)) + return -EPERM; + + return c->devx_uid; +} static bool devx_is_general_cmd(void *in) { u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode); switch (opcode) { case MLX5_CMD_OP_QUERY_HCA_CAP: + case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT: case MLX5_CMD_OP_QUERY_VPORT_STATE: case MLX5_CMD_OP_QUERY_ADAPTER: case MLX5_CMD_OP_QUERY_ISSI: @@ -589,14 +623,16 @@ static int 
UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OTHER)( MLX5_IB_ATTR_DEVX_OTHER_CMD_OUT); void *cmd_out; int err; + int uid; c = devx_ufile2uctx(file); if (IS_ERR(c)) return PTR_ERR(c); dev = to_mdev(c->ibucontext.device); - if (!c->devx_uid) - return -EPERM; + uid = devx_get_uid(c, cmd_in); + if (uid < 0) + return uid; /* Only white list of some general HCA commands are allowed for this method. */ if (!devx_is_general_cmd(cmd_in)) @@ -606,7 +642,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OTHER)( if (IS_ERR(cmd_out)) return PTR_ERR(cmd_out); - MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid); + MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, uid); err = mlx5_cmd_exec(dev->mdev, cmd_in, uverbs_attr_get_len(attrs, MLX5_IB_ATTR_DEVX_OTHER_CMD_IN), cmd_out, cmd_out_len); @@ -816,9 +852,11 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)( struct mlx5_ib_dev *dev = to_mdev(c->ibucontext.device); struct devx_obj *obj; int err; + int uid; - if (!c->devx_uid) - return -EPERM; + uid = devx_get_uid(c, cmd_in); + if (uid < 0) + return uid; if (!devx_is_obj_create_cmd(cmd_in)) return -EINVAL; @@ -831,7 +869,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)( if (!obj) return -ENOMEM; - MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid); + MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, uid); devx_set_umem_valid(cmd_in); err = mlx5_cmd_exec(dev->mdev, cmd_in, @@ -868,9 +906,11 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_MODIFY)( struct devx_obj *obj = uobj->object; void *cmd_out; int err; + int uid; - if (!c->devx_uid) - return -EPERM; + uid = devx_get_uid(c, cmd_in); + if (uid < 0) + return uid; if (!devx_is_obj_modify_cmd(cmd_in)) return -EINVAL; @@ -882,7 +922,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_MODIFY)( if (IS_ERR(cmd_out)) return PTR_ERR(cmd_out); - MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, c->devx_uid); + MLX5_SET(general_obj_in_cmd_hdr, cmd_in, uid, uid); devx_set_umem_valid(cmd_in); err = 
mlx5_cmd_exec(obj->md
Re: iproute2: Debian 9 No ELF support
Hi,

On Mon, Sep 17, 2018 at 11:57:12AM +0200, Daniel Borkmann wrote:
> On 09/17/2018 10:23 AM, Bo YU wrote:
> > Hello,
> > I have followed the instructions from:
> >
> > https://cilium.readthedocs.io/en/latest/bpf/#bpftool
> >
> > to test an xdp program, but I can not enable ELF support.
> >
> > ./configure --prefix=/usr
> > ```output
> > TC schedulers
> >  ATM	no
> >
> > libc has setns: yes
> > SELinux support: no
> > ELF support: no
> > libmnl support: yes
> > Berkeley DB: yes
> > need for strlcpy: yes
> > libcap support: yes
> > ```
> > And I have installed libelf-dev:
> > ```output
> > sudo apt show libelf-dev
> > Package: libelf-dev
> > Version: 0.168-1
> > Priority: optional
> > Section: libdevel
> > Source: elfutils
> > Maintainer: Kurt Roeckx
> > Installed-Size: 353 kB
> > Depends: libelf1 (= 0.168-1)
> > Conflicts: libelfg0-dev
> > Homepage: https://sourceware.org/elfutils/
> > Tag: devel::library, role::devel-lib
> > ```
> >
> > And gcc version:
> > gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
> >
> > uname -a:
> > Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 GNU/Linux
> >
> > Any help is appreciated.
>
> Debian's official iproute2 packaging build says 'libelf-dev' [0], and having
> libelf-dev installed should work ...
>
> [...]
> Build-Depends: bison,
>                debhelper (>= 10~),
>                flex,
>                iptables-dev,
>                libatm1-dev,
>                libcap-dev,
>                libdb-dev,
>                libelf-dev,
>                libmnl-dev,
>                libselinux1-dev,
>                linux-libc-dev,
>                pkg-config,
>                po-debconf,
>                zlib1g-dev,
> [...]
>
> Did you run into this one perhaps [1]? Do you have zlib1g-dev installed?

Yes, you are right. After installing zlib1g-dev with your help, iproute2
enables ELF support:

```output
./configure --prefix=/usr
TC schedulers
 ATM	no

libc has setns: yes
SELinux support: no
ELF support: yes
libmnl support: yes
Berkeley DB: yes
need for strlcpy: yes
libcap support: yes
```

But there was no effect from [1], right? When I install libelf-dev, it
should install zlib1g-dev as well.

Is there any way to update the page [2]?

Thank you, Daniel

[2] https://cilium.readthedocs.io/en/latest/bpf/#bpftool
[0] https://salsa.debian.org/debian/iproute2/blob/master/debian/control
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=885071
Re: iproute2: Debian 9 No ELF support
On 09/17/2018 01:46 PM, Bo YU wrote: > On Mon, Sep 17, 2018 at 11:57:12AM +0200, Daniel Borkmann wrote: >> On 09/17/2018 10:23 AM, Bo YU wrote: >>> Hello, >>> I have followed the instructions from: >>> >>> https://cilium.readthedocs.io/en/latest/bpf/#bpftool >>> >>> to test xdp program. >>> But i can not enable elf support. >>> >>> ./configure --prefix=/usr >>> ```output >>> TC schedulers >>> ATM no >>> >>> libc has setns: yes >>> SELinux support: no >>> ELF support: no >>> libmnl support: yes >>> Berkeley DB: yes >>> need for strlcpy: yes >>> libcap support: yes >>> ``` >>> And i have installed libelf-dev : >>> ```output >>> sudo apt show libelf-dev >>> Package: libelf-dev >>> Version: 0.168-1 >>> Priority: optional >>> Section: libdevel >>> Source: elfutils >>> Maintainer: Kurt Roeckx >>> Installed-Size: 353 kB >>> Depends: libelf1 (= 0.168-1) >>> Conflicts: libelfg0-dev >>> Homepage: https://sourceware.org/elfutils/ >>> Tag: devel::library, role::devel-lib >>> ``` >>> >>> And gcc version: >>> gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) >>> >>> uname -a: >>> Linux debian 4.18.0-rc1+ #2 SMP Sun Jun 24 16:53:57 HKT 2018 x86_64 >>> GNU/Linux >>> >>> Any help is appreciate. >> >> Debian's official iproute2 packaging build says 'libelf-dev' [0], and having >> libelf-dev installed should work ... >> >> [...] >> Build-Depends: bison, >> debhelper (>= 10~), >> flex, >> iptables-dev, >> libatm1-dev, >> libcap-dev, >> libdb-dev, >> libelf-dev, >> libmnl-dev, >> libselinux1-dev, >> linux-libc-dev, >> pkg-config, >> po-debconf, >> zlib1g-dev, >> [...] >> >> Did you ran into this one perhaps [1]? Do you have zlib1g-dev installed? > Yes,You are right. I install zlib1g-dev with your help,iproute2 enable ELF > support. 
> ```output > ./configure --prefix=/usr > TC schedulers > ATM no > > libc has setns: yes > SELinux support: no > ELF support: yes > libmnl support: yes > Berkeley DB: yes > need for strlcpy: yes > libcap support: yes > > ``` > But there is no effect after [1], right? When i install libelf-dev,it should > install zlib1g-dev also. > > Is there any way to update the page [2]? This bug should be Debian specific, so it would make sense to contact Debian developers or comment on [1] if it's still not resolved in current versions. > Thank you, Daniel > > [2] https://cilium.readthedocs.io/en/latest/bpf/#bpftool >> >> [0] https://salsa.debian.org/debian/iproute2/blob/master/debian/control >> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=885071
Re: [PATCH v3 00/30] backport of IP fragmentation fixes
On Thu, Sep 13, 2018 at 07:58:32AM -0700, Stephen Hemminger wrote: > Took the set of patches from 4.19 to handle IP fragmentation DoS > and applied them against 4.14.69. Most of these are from Eric. > In a couple case, it required some manual merge conflict resolution. > > Tested normal IP fragmentation with iperf3 and malicious IP fragments > with fragmentsmack. Under fragmentation attack (700Kpps) the original > 4.14.69 consumes 97% CPU; with this patch it drops to 5%. All now queued up, thanks for doing the backport. greg k-h
Re: [PATCH net-next v3 0/2] net: stmmac: Coalesce and tail addr fixes
Hi Jerome,

On 14-09-2018 16:06, Jerome Brunet wrote:
>
> Looks better this time. Stable so far, with even a small throughput
> improvement on the Tx path.
>
> so for the a113 s400 board (single queue)
> Tested-by: Jerome Brunet
>

Thanks for testing! I sent out a rebased version against net.

Can you share what the throughput improvement is in %? Do you still see
the performance drop when tx/rx work at the same time? I remember that
was another issue ...

Thanks and Best Regards,
Jose Miguel Abreu
[PATCH net] selftests: pmtu: properly redirect stderr to /dev/null
The cleanup function uses "$CMD 2 > /dev/null", which doesn't actually
send stderr to /dev/null, so when the netns doesn't exist, the error
message is shown. Use "2> /dev/null" instead, so that those messages
disappear, as was intended.

Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test")
Signed-off-by: Sabrina Dubroca
---
 tools/testing/selftests/net/pmtu.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 32a194e3e07a..0ab9423d009f 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -178,8 +178,8 @@ setup() {
 cleanup() {
 	[ ${cleanup_done} -eq 1 ] && return
-	ip netns del ${NS_A} 2 > /dev/null
-	ip netns del ${NS_B} 2 > /dev/null
+	ip netns del ${NS_A} 2> /dev/null
+	ip netns del ${NS_B} 2> /dev/null
 	cleanup_done=1
 }
-- 
2.19.0
Re: [PATCH net] selftests: pmtu: properly redirect stderr to /dev/null
On Mon, 17 Sep 2018 15:30:06 +0200 Sabrina Dubroca wrote: > The cleanup function uses "$CMD 2 > /dev/null", which doesn't actually > send stderr to /dev/null, so when the netns doesn't exist, the error > message is shown. Use "2> /dev/null" instead, so that those messages > disappear, as was intended. Oops, thanks for catching this. > Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test") > Signed-off-by: Sabrina Dubroca Acked-by: Stefano Brivio -- Stefano
Re: [RFC PATCH 2/4] net: enable UDP gro on demand.
On Mon, Sep 17, 2018 at 6:18 AM Paolo Abeni wrote:
>
> On Sun, 2018-09-16 at 14:23 -0400, Willem de Bruijn wrote:
> > That udp gro implementation is clearly less complete than yours in
> > this patchset. The point I wanted to bring up for discussion is not the
> > protocol implementation, but the infrastructure for enabling it
> > conditionally.
>
> I'm still [trying to] process your patchset ;) So please pardon me
> for any obvious interpretation mistakes...
>
> > Assuming cycle cost is comparable, what do you think of using the
> > existing sk offload callbacks to enable this on a per-socket basis?
>
> I have no objection about that, if there are no performance drawbacks.
> In my measurements, retpoline cost is relevant for every indirect call
> added. Using the existing sk offload approach will require an
> additional indirect call per packet compared to the implementation
> here.

Fair enough. The question is whether it is significant to the real
workload. This is also an issue with GRO processing in general, all
those callbacks as well as the two cacheline lookups to get to each
callback.

> > As for the protocol-wide knob, I do strongly prefer something that can
> > work for all protocols, not just UDP.
>
> I like the general infrastructure idea. I think there is some agreement
> in avoiding the addition of more user-controllable knobs, as we already
> have a lot of them. If I read your patch correctly, user-space needs to
> enable/disable the UDP GRO explicitly via procfs, right?

No, like other GRO callbacks, the feature is enabled by default. Patch
7/8 disables the most expensive part behind a static key until a socket
actually registers a GRO callback, whether tunnel or the new application
GRO. Patch 6/9 makes it possible to disable any protocol completely,
indeed through a sysctl.

> I tried to look for something that does not require user action.
> > I also implemented a version that
> > atomically swaps the struct ptr instead of the flag based approach I sent
> > for review. I'm fairly agnostic about that point.
>
> I think/fear security oriented guys may scream for the somewhat large
> deconstification ?!?

Hmm.. yes, interesting point. Since const pointers are a compile time
feature, in practice I don't think that they buy any protection against
callback pointer rewriting. Let me think about that some more.

> > One subtle issue is that I
> > believe we need to keep the gro_complete callbacks enabled, as gro
> > packets may be queued for completion when gro_receive gets disabled.
>
> Good point, thanks! I missed that.
>
> Cheers,
>
> Paolo
Re: [PATCH net-next RFC 7/8] udp: gro behind static key
On Mon, Sep 17, 2018 at 5:03 AM Steffen Klassert wrote:
>
> On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote:
> > From: Willem de Bruijn
> >
> > Avoid the socket lookup cost in udp_gro_receive if no socket has a
> > gro callback configured.
> >
> > Signed-off-by: Willem de Bruijn
>
> ...
>
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index 4f6aa95a9b12..f44fe328aa0f 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
> >  {
> >  	struct udphdr *uh = udp_gro_udphdr(skb);
> >
> > -	if (unlikely(!uh))
> > +	if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
> >  		goto flush;
>
> If you use udp_encap_needed_key to enable UDP GRO, then a UDP
> encapsulation socket will enable it too. Not sure if this is
> intentional.

Yes. That is already the case to a certain point. The function was
introduced with tunnels and is enabled by tunnels, but so far only
compiles out the encap_rcv() branch in udp_queue_rcv_skb. With patch
7/8 it also toggles the GRO path. Critically, both are enabled as soon
as a tunnel is registered.

> That said, enabling UDP GRO on a UDP encapsulation socket
> (ESP in UDP etc.) will fail badly as then encrypted ESP
> packets might be merged together. So we somehow should
> make sure that this does not happen.

Absolutely. This initial implementation probably breaks UDP tunnels
badly. That needs to be addressed.

> Anyway, this reminds me that we can support GRO for
> UDP encapsulation. It just requires separate GRO
> callbacks for the different encapsulation types.
Re: [PATCH net-next RFC 7/8] udp: gro behind static key
On Mon, Sep 17, 2018 at 6:24 AM Paolo Abeni wrote: > > On Fri, 2018-09-14 at 13:59 -0400, Willem de Bruijn wrote: > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > > index 4f6aa95a9b12..f44fe328aa0f 100644 > > --- a/net/ipv4/udp_offload.c > > +++ b/net/ipv4/udp_offload.c > > @@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct > > list_head *head, > > { > > struct udphdr *uh = udp_gro_udphdr(skb); > > > > - if (unlikely(!uh)) > > + if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key)) > > goto flush; > > > > /* Don't bother verifying checksum if we're going to flush anyway. */ > > If I read this correctly, once udp_encap_needed_key is enabled, it will > never be turned off, because the tunnel and encap socket shut down does > not cope with udp_encap_needed_key. > > Perhaps we should take care of that, too. Agreed. For now I reused what's already there, but I can extend that with refcounting using static_branch_inc/static_branch_dec.
Re: [PATCH net-next RFC 7/8] udp: gro behind static key
On Mon, Sep 17, 2018 at 6:37 AM Steffen Klassert wrote:
>
> On Fri, Sep 14, 2018 at 01:59:40PM -0400, Willem de Bruijn wrote:
> > From: Willem de Bruijn
> >
> > Avoid the socket lookup cost in udp_gro_receive if no socket has a
> > gro callback configured.
>
> It would be nice if we could do GRO not just for GRO configured
> sockets, but also for flows that are going to be IPsec transformed
> or directly forwarded.

I thought about that, as we have GSO. An egregious hack enables GRO for
all registered local sockets that support it and for any flow for which
no local socket is registered:

@@ -365,11 +369,13 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,

 	rcu_read_lock();
 	sk = (*lookup)(skb, uh->source, uh->dest);
-	if (sk && udp_sk(sk)->gro_receive)
-		goto unflush;
-	goto out_unlock;
+	if (!sk)
+		gro_receive_cb = udp_gro_receive_cb;
+	else if (!udp_sk(sk)->gro_receive)
+		goto out_unlock;
+	else
+		gro_receive_cb = udp_sk(sk)->gro_receive;

@@ -392,7 +398,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
 	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
-	pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
+	pp = call_gro_receive_sk(gro_receive_cb, sk, head, skb);

But not having a local socket does not imply forwarding path, of course.

> Maybe in case that forwarding is enabled on the receiving device,
> inet_gro_receive() could do a route lookup and allow GRO if the
> route lookup returned a forwarding route.

That's a better solution, if the cost is acceptable. We do have to be
careful against increasing per packet cycle cost in this path given that
it's a possible vector for DoS attempts.

> For flows that are likely software segmented after that, it
> would be worthwhile to build packet chains instead of merging the
> payload. Packets of the same flow could travel together, but
> it would save the cost of the packet merging and segmenting.

With software GSO that is faster, as it would have to allocate all the
separate segment skbs in skb_segment later. Though there is some
complexity if MTUs differ. With hardware UDP GSO, having a single skb
will be cheaper in the forwarding path. Using napi_gro_frags, device
drivers really do only end up allocating one skb for the GSO packet.

> This could be done similar to what I proposed for the list
> receive case:
>
> https://www.spinics.net/lists/netdev/msg522706.html
>
> How GRO should be done could even be configured
> by replacing the net_offload pointer similar
> to what Paolo proposed in his patchset with
> the inet_update_offload() function.

Right. The above hack also already has to use two distinct callback
assignments.
[PATCH] net: phy: phylink: fix SFP interface autodetection
When switching to the SFP-detected link mode, update the main
link_interface field as well. Otherwise, the link fails to come up when
the configured 'phy-mode' differs from the SFP-detected mode.

This fixes 1GB SFP module link up on eth3 of the Macchiatobin board
that is configured in the DT to "2500base-x" phy-mode.

Signed-off-by: Baruch Siach
---
 drivers/net/phy/phylink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 3ba5cf2a8a5f..3ece48c86841 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1631,6 +1631,7 @@ static int phylink_sfp_module_insert(void *upstream,
 	if (pl->link_an_mode != MLO_AN_INBAND ||
 	    pl->link_config.interface != config.interface) {
 		pl->link_config.interface = config.interface;
+		pl->link_interface = config.interface;
 		pl->link_an_mode = MLO_AN_INBAND;
 		changed = true;
-- 
2.18.0
Re: [PATCH v2 2/4] dt-bindings: net: qcom: Add binding for shared mdio bus
On Mon, Sep 17, 2018 at 04:53:29PM +0800, Wang Dongsheng wrote:
> This property is copied from "ibm,emac.txt" to describe a shared MDIO
> bus. Since the emac includes an MDIO controller, if the motherboard has
> more than one PHY connected to an MDIO bus, this property will point to
> the MAC device that has the MDIO bus.
>
> Signed-off-by: Wang Dongsheng
> ---
> V2: s/Since QDF2400 emac/Since emac/
> ---
>  Documentation/devicetree/bindings/net/qcom-emac.txt | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt b/Documentation/devicetree/bindings/net/qcom-emac.txt
> index 346e6c7f47b7..50db71771358 100644
> --- a/Documentation/devicetree/bindings/net/qcom-emac.txt
> +++ b/Documentation/devicetree/bindings/net/qcom-emac.txt
> @@ -24,6 +24,9 @@ Internal PHY node:
>  The external phy child node:
>  - reg : The phy address
>
> +Optional properties:
> +- mdio-device : Shared MDIO bus.

Hi Dongsheng

I don't see why you need this property. The ethernet interface has a
phy-handle which points to a PHY. That is all you need to find the PHY.

	emac0: ethernet@feb2 {
		compatible = "qcom,fsm9900-emac";
		reg = <0xfeb2 0x1>,
		      <0xfeb36000 0x1000>;
		interrupts = <76>;

		clocks = <&gcc 0>, <&gcc 1>, <&gcc 3>, <&gcc 4>,
			 <&gcc 5>, <&gcc 6>, <&gcc 7>;
		clock-names = "axi_clk", "cfg_ahb_clk", "high_speed_clk",
			      "mdio_clk", "tx_clk", "rx_clk", "sys_clk";

		internal-phy = <&emac_sgmii>;

		phy-handle = <&phy0>;

		#address-cells = <1>;
		#size-cells = <0>;

		phy0: ethernet-phy@0 {
			reg = <0>;
		};

		phy1: ethernet-phy@1 {
			reg = <1>;
		};

		pinctrl-names = "default";
		pinctrl-0 = <&mdio_pins_a>;
	};

	emac1: ethernet@3890 {
		compatible = "qcom,fsm9900-emac";
		...
		...
		phy-handle = <&phy1>;
	};

Andrew
Re: [PATCH net] pppoe: fix reception of frames with no mac header
From: Guillaume Nault Date: Fri, 14 Sep 2018 16:28:05 +0200 > pppoe_rcv() needs to look back at the Ethernet header in order to > lookup the PPPoE session. Therefore we need to ensure that the mac > header is big enough to contain an Ethernet header. Otherwise > eth_hdr(skb)->h_source might access invalid data. ... > Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers") > Reported-by: syzbot+f5f6080811c849739...@syzkaller.appspotmail.com > Signed-off-by: Guillaume Nault Applied and queued up for -stable, thanks.
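The quoted fix reduces to one invariant: before pppoe_rcv() looks back at eth_hdr(skb)->h_source, there must be a full Ethernet header's worth of bytes between the mac header and the current data pointer. A minimal userspace sketch of that bounds check, using simplified stand-in types (not the real skb layout):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types -- illustrative only. */
struct ethhdr_s {
	unsigned char h_dest[6];
	unsigned char h_source[6];
	unsigned short h_proto;
};

struct fake_skb {
	unsigned char *head;
	size_t mac_header;	/* offset of the mac header from head */
	size_t data_offset;	/* offset of the current data pointer */
};

/*
 * The defensive check the fix describes: only dereference the Ethernet
 * header if the space between mac header and data can hold one.
 */
static int mac_header_holds_ethhdr(const struct fake_skb *skb)
{
	return skb->data_offset >= skb->mac_header &&
	       skb->data_offset - skb->mac_header >= sizeof(struct ethhdr_s);
}
```

A frame injected with no mac header set (as in the syzkaller reproducer) fails this check instead of reading invalid memory.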
Re: [PATCH net] bnxt_en: Fix VF mac address regression.
From: Michael Chan Date: Fri, 14 Sep 2018 15:41:29 -0400 > The recent commit to always forward the VF MAC address to the PF for > approval may not work if the PF driver or the firmware is older. This > will cause the VF driver to fail during probe: > > bnxt_en :00:03.0 (unnamed net_device) (uninitialized): hwrm req_type > 0xf seq id 0x5 error 0x > bnxt_en :00:03.0 (unnamed net_device) (uninitialized): VF MAC address > 00:00:17:02:05:d0 not approved by the PF > bnxt_en :00:03.0: Unable to initialize mac address. > bnxt_en: probe of :00:03.0 failed with error -99 > > We fix it by treating the error as fatal only if the VF MAC address is > locally generated by the VF. > > Fixes: 707e7e966026 ("bnxt_en: Always forward VF MAC address to the PF.") > Reported-by: Seth Forshee > Reported-by: Siwei Liu > Signed-off-by: Michael Chan > --- > Please queue this for stable as well. Thanks. Applied and queued up for -stable.
Re: [PATCH net] ipv6: fix possible use-after-free in ip6_xmit()
From: Eric Dumazet Date: Fri, 14 Sep 2018 12:02:31 -0700 > In the unlikely case ip6_xmit() has to call skb_realloc_headroom(), > we need to call skb_set_owner_w() before consuming original skb, > otherwise we risk a use-after-free. > > Bring IPv6 in line with what we do in IPv4 to fix this. > > Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") > Signed-off-by: Eric Dumazet > Reported-by: syzbot Applied and queued up for -stable.
Re: [PATCH v2 0/2] hv_netvsc: associate VF and PV device by serial number
From: Stephen Hemminger
Date: Fri, 14 Sep 2018 12:54:55 -0700

> The Hyper-V implementation of PCI controller has concept of 32 bit
> serial number (not to be confused with PCI-E serial number). This value
> is sent in the protocol from the host to indicate SR-IOV VF device is
> attached to a synthetic NIC.
>
> Using the serial number (instead of MAC address) to associate the two
> devices avoids lots of potential problems when there are duplicate MAC
> addresses from tunnels or layered devices.
>
> The patch set is broken into two parts, one is for the PCI controller
> and the other is for the netvsc device. Normally, these go through
> different trees but sending them together here for better review. The
> PCI changes were submitted previously, but the main review comment was
> "why do you need this?". This is why.
>
> v2 - slot name can be shorter.
>      remove locking when creating pci_slots; see comment for explanation

Series applied, thanks.
Re: [net-next PATCH] tls: async support causes out-of-bounds access in crypto APIs
From: John Fastabend Date: Fri, 14 Sep 2018 13:01:46 -0700 > When async support was added it needed to access the sk from the async > callback to report errors up the stack. The patch tried to use space > after the aead request struct by directly setting the reqsize field in > aead_request. This is an internal field that should not be used > outside the crypto APIs. It is used by the crypto code to define extra > space for private structures used in the crypto context. Users of the > API then use crypto_aead_reqsize() and add the returned amount of > bytes to the end of the request memory allocation before posting the > request to encrypt/decrypt APIs. > > So this breaks (with general protection fault and KASAN error, if > enabled) because the request sent to decrypt is shorter than required > causing the crypto API out-of-bounds errors. Also it seems unlikely the > sk is even valid by the time it gets to the callback because of memset > in crypto layer. > > Anyways, fix this by holding the sk in the skb->sk field when the > callback is set up and because the skb is already passed through to > the callback handler via void* we can access it in the handler. Then > in the handler we need to be careful to NULL the pointer again before > kfree_skb. I added comments on both the setup (in tls_do_decryption) > and when we clear it from the crypto callback handler > tls_decrypt_done(). After this selftests pass again and fixes KASAN > errors/warnings. > > Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls > records") > Signed-off-by: John Fastabend Applied, thanks John.
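The allocation contract John describes can be sketched outside the kernel. The struct and function names below are toy stand-ins, not the real crypto API; the point is only the sizing arithmetic: the caller must allocate sizeof(request) plus the algorithm's reqsize so the crypto core's private context fits in the tail, rather than repurposing the reqsize field itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the kernel structs -- names are illustrative only. */
struct toy_aead_request { int flags; };

struct toy_aead_tfm {
	size_t reqsize;	/* extra bytes the algorithm wants after the request */
};

static size_t toy_aead_reqsize(const struct toy_aead_tfm *tfm)
{
	return tfm->reqsize;
}

/*
 * Correct pattern: size the allocation as request + algorithm context,
 * so writes into the context tail stay inside the allocation. Sizing it
 * as sizeof(request) alone is the out-of-bounds bug being fixed.
 */
static void *toy_aead_request_alloc(struct toy_aead_tfm *tfm)
{
	return calloc(1, sizeof(struct toy_aead_request) + toy_aead_reqsize(tfm));
}

static void toy_use_context(void *req, struct toy_aead_tfm *tfm)
{
	/* The crypto core scribbles over the private tail area. */
	memset((char *)req + sizeof(struct toy_aead_request), 0xab,
	       toy_aead_reqsize(tfm));
}
```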
Re: [Patch net-next] ipv4: initialize ra_mutex in inet_init_net()
From: Cong Wang Date: Fri, 14 Sep 2018 13:32:42 -0700 > ra_mutex is a IPv4 specific mutex, it is inside struct netns_ipv4, > but its initialization is in the generic netns code, setup_net(). > > Move it to IPv4 specific net init code, inet_init_net(). > > Fixes: d9ff3049739e ("net: Replace ip_ra_lock with per-net mutex") > Cc: Kirill Tkhai > Signed-off-by: Cong Wang Please take into consideration Kirill's feedback. Thank you.
Re: [PATCH net] net: dsa: mv88e6xxx: Fix ATU Miss Violation
From: Andrew Lunn Date: Fri, 14 Sep 2018 23:46:12 +0200 > Fix a cut/paste error and a typo which results in ATU miss violations > not being reported. > > Fixes: 0977644c5005 ("net: dsa: mv88e6xxx: Decode ATU problem interrupt") > Signed-off-by: Andrew Lunn Applied and queued up for -stable.
Re: [PATCH net] tls: fix currently broken MSG_PEEK behavior
From: Daniel Borkmann Date: Fri, 14 Sep 2018 23:00:55 +0200 > In kTLS MSG_PEEK behavior is currently failing, strace example: ... > As can be seen from strace, there are two TLS records sent, > i) 'test_read_peek' and ii) '_mult_recs\0' where we end up > peeking 'test_read_peektest_read_peektest'. This is clearly > wrong, and what happens is that given peek cannot call into > tls_sw_advance_skb() to unpause strparser and proceed with > the next skb, we end up looping over the current one, copying > the 'test_read_peek' over and over into the user provided > buffer. > > Here, we can only peek into the currently held skb (current, > full TLS record) as otherwise we would end up having to hold > all the original skb(s) (depending on the peek depth) in a > separate queue when unpausing strparser to process next > records, minimally intrusive is to return only up to the > current record's size (which likely was what c46234ebb4d1 > ("tls: RX path for ktls") originally intended as well). Thus, > after patch we properly peek the first record: ... > Fixes: c46234ebb4d1 ("tls: RX path for ktls") > Signed-off-by: Daniel Borkmann Applied and queued up for -stable.
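The MSG_PEEK contract being restored here is ordinary socket semantics, which can be demonstrated on a plain AF_UNIX stream socket with no kTLS involved: a peek returns the buffered bytes without consuming them, and an oversized peek returns only what exists rather than replicating it.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Demonstrate the MSG_PEEK contract: a peek larger than what is
 * buffered returns only the buffered bytes -- it must not loop and
 * duplicate them, which is exactly the kTLS misbehavior described
 * above. Returns 0 on success, -1 on any violated expectation.
 */
static int peek_demo(void)
{
	int sv[2];
	char buf[64];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	if (write(sv[0], "test_read_peek", 14) != 14)
		return -1;

	/* Oversized peek: only the 14 buffered bytes come back. */
	if (recv(sv[1], buf, sizeof(buf), MSG_PEEK) != 14)
		return -1;
	if (memcmp(buf, "test_read_peek", 14) != 0)
		return -1;

	/* The data is still there for a consuming read, exactly once. */
	if (recv(sv[1], buf, sizeof(buf), 0) != 14)
		return -1;

	close(sv[0]);
	close(sv[1]);
	return 0;
}
```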
Re: [PATCH net-next] net: dsa: gswip: Fix return value check in gswip_probe()
From: Wei Yongjun Date: Sat, 15 Sep 2018 01:33:21 + > In case of error, the function devm_ioremap_resource() returns ERR_PTR() > and never returns NULL. The NULL test in the return value check should > be replaced with IS_ERR(). > > Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") > Signed-off-by: Wei Yongjun Applied.
Re: [PATCH net-next] net: lantiq: Fix return value check in xrx200_probe()
From: Wei Yongjun Date: Sat, 15 Sep 2018 01:33:50 + > In case of error, the function devm_ioremap_resource() returns ERR_PTR() > and never returns NULL. The NULL test in the return value check should > be replaced with IS_ERR(). > > Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver") > Signed-off-by: Wei Yongjun Applied.
Re: [PATCH net-next] net: hns: make function hns_gmac_wait_fifo_clean() static
From: Wei Yongjun Date: Sat, 15 Sep 2018 01:42:09 + > Fixes the following sparse warning: > > drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c:322:5: warning: > symbol 'hns_gmac_wait_fifo_clean' was not declared. Should it be static? > > Signed-off-by: Wei Yongjun Applied.
[PATCH net] net/ipv4: defensive cipso option parsing
commit 40413955ee26 ("Cipso: cipso_v4_optptr enter infinite loop") fixed
a possible infinite loop in the IP option parsing of CIPSO. The fix
assumes that ip_options_compile filtered out all zero length options and
that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
While this assumption currently holds true, add explicit checks for zero
length and invalid length options to be safe for the future. Even though
ip_options_compile should have validated the options, the introduction
of new one-byte options can still confuse this code without the
additional checks.

Signed-off-by: Stefan Nuernberger
Reviewed-by: David Woodhouse
Reviewed-by: Simon Veith
Cc: sta...@vger.kernel.org
---
 net/ipv4/cipso_ipv4.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 82178cc69c96..f291b57b8474 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1512,7 +1512,7 @@ static int cipso_v4_parsetag_loc(const struct cipso_v4_doi *doi_def,
  *
  * Description:
  * Parse the packet's IP header looking for a CIPSO option. Returns a pointer
- * to the start of the CIPSO option on success, NULL if one if not found.
+ * to the start of the CIPSO option on success, NULL if one is not found.
  *
  */
 unsigned char *cipso_v4_optptr(const struct sk_buff *skb)
@@ -1522,9 +1522,11 @@ unsigned char *cipso_v4_optptr(const struct sk_buff *skb)
 	int optlen;
 	int taglen;
 
-	for (optlen = iph->ihl*4 - sizeof(struct iphdr); optlen > 0; ) {
+	for (optlen = iph->ihl*4 - sizeof(struct iphdr); optlen > 1; ) {
 		switch (optptr[0]) {
 		case IPOPT_CIPSO:
+			if (!optptr[1] || optptr[1] > optlen)
+				return NULL;
 			return optptr;
 		case IPOPT_END:
 			return NULL;
@@ -1534,6 +1536,10 @@ unsigned char *cipso_v4_optptr(const struct sk_buff *skb)
 		default:
 			taglen = optptr[1];
 		}
+
+		if (!taglen || taglen > optlen)
+			break;
+
 		optlen -= taglen;
 		optptr += taglen;
 	}
-- 
2.19.0
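The hardened walk in the patch can be exercised in userspace almost verbatim. The sketch below mirrors the patched cipso_v4_optptr() loop over a plain options buffer (the IPOPT_* values match the kernel's definitions; the NOOP case is filled in from the unshown diff context):

```c
#include <stddef.h>

#define IPOPT_END   0
#define IPOPT_NOOP  1
#define IPOPT_CIPSO 134

/*
 * Userspace sketch of the defensive option walk: a zero-length or
 * overlong option terminates the loop instead of spinning forever or
 * walking past the end of the buffer.
 */
static const unsigned char *find_cipso(const unsigned char *optptr, int optlen)
{
	int taglen;

	for (; optlen > 1; ) {
		switch (optptr[0]) {
		case IPOPT_CIPSO:
			if (!optptr[1] || optptr[1] > optlen)
				return NULL;	/* malformed CIPSO length */
			return optptr;
		case IPOPT_END:
			return NULL;
		case IPOPT_NOOP:
			taglen = 1;
			break;
		default:
			taglen = optptr[1];
		}
		if (!taglen || taglen > optlen)
			break;	/* zero or overlong option: stop walking */
		optlen -= taglen;
		optptr += taglen;
	}
	return NULL;
}
```

A hypothetical future one-byte option (type 7 below, with no length octet) is exactly the case the commit message worries about: without the taglen check it would read a bogus length and loop.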
Re: [PATCH] net: phy: phylink: fix SFP interface autodetection
On Mon, Sep 17, 2018 at 05:19:57PM +0300, Baruch Siach wrote: > When the switching to the SFP detected link mode update the main > link_interface field as well. Otherwise, the link fails to come up when > the configured 'phy-mode' defers from the SFP detected mode. > > This fixes 1GB SFP module link up on eth3 of the Macchiatobin board that > is configured in the DT to "2500base-x" phy-mode. link_interface isn't supposed to track the SFP link mode. In any case, this is only used when a PHY is attached. For a PHY on a SFP, phylink_connect_phy() should be using link_config.interface and not link_interface there. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up According to speedtest.net: 13Mbps down 490kbps up
Re: [PATCH v3 net-next 07/12] net: ethernet: Add helper to remove a supported link mode
On Wed, Sep 12, 2018 at 01:53:14AM +0200, Andrew Lunn wrote:
> Some MAC hardware cannot support a subset of link modes. e.g. often
> 1Gbps Full duplex is supported, but Half duplex is not. Add a helper
> to remove such a link mode.
>
> Signed-off-by: Andrew Lunn
> Reviewed-by: Florian Fainelli
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c |  6 +++---
>  drivers/net/ethernet/cadence/macb_main.c       |  5 ++---
>  drivers/net/ethernet/freescale/fec_main.c      |  3 ++-
>  drivers/net/ethernet/microchip/lan743x_main.c  |  2 +-
>  drivers/net/ethernet/renesas/ravb_main.c       |  3 ++-
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c  | 12
>  drivers/net/phy/phy_device.c                   | 18 ++
>  drivers/net/usb/lan78xx.c                      |  2 +-
>  include/linux/phy.h                            |  1 +
>  9 files changed, 38 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> index 078a04dc1182..4831f9de5945 100644
...
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index aff5516b781e..fb2a1125780d 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1074,7 +1074,8 @@ static int ravb_phy_init(struct net_device *ndev)
>  	}
>
>  	/* 10BASE is not supported */
> -	phydev->supported &= ~PHY_10BT_FEATURES;
> +	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Half_BIT);
> +	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Full_BIT);
>
>  	phy_attached_info(phydev);
...
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index db1172db1e7c..e9ca83a438b0 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1765,6 +1765,24 @@ int phy_set_max_speed(struct phy_device *phydev, u32 max_speed)
>  }
>  EXPORT_SYMBOL(phy_set_max_speed);
>
> +/**
> + * phy_remove_link_mode - Remove a supported link mode
> + * @phydev: phy_device structure to remove link mode from
> + * @link_mode: Link mode to be removed
> + *
> + * Description: Some MACs don't support all link modes which the PHY
> + * does. e.g. a 1G MAC often does not support 1000Half. Add a helper
> + * to remove a link mode.
> + */
> +void phy_remove_link_mode(struct phy_device *phydev, u32 link_mode)
> +{
> +	WARN_ON(link_mode > 31);
> +
> +	phydev->supported &= ~BIT(link_mode);
> +	phydev->advertising = phydev->supported;
> +}
> +EXPORT_SYMBOL(phy_remove_link_mode);
> +
>  static void of_set_phy_supported(struct phy_device *phydev)
>  {
>  	struct device_node *node = phydev->mdio.dev.of_node;

Hi Andrew,

I believe that for the RAVB the overall effect of this change is that
10-BaseT modes are no longer advertised (although both with and without
this patch they are not supported).

Unfortunately on R-Car Gen3 M3-W (r8a7796) based Salvator-X board I have
observed that this results in the link no longer being negotiated on one
switch (the one I usually use) while it seemed fine on another.
Re: [PATCH net-next 0/5] net: lantiq: Minor fixes for vrx200 and gswip
From: Hauke Mehrtens
Date: Sat, 15 Sep 2018 14:08:44 +0200

> These are mostly minor fixes to problems addressed in the latest round
> of review of the original series adding these drivers, which were not
> applied before the patches got merged into net-next.
> In addition it fixes a data bus error on poweroff.

Series applied.
Re: [PATCH v2 net] net: aquantia: memory corruption on jumbo frames
From: Igor Russkikh
Date: Sat, 15 Sep 2018 18:03:39 +0300

> From: Friedemann Gerold
>
> This patch fixes the skb_shared area, which will be corrupted
> upon reception of 4K jumbo packets.
>
> Originally build_skb usage purpose was to reuse the page for skb to
> eliminate the need for extra fragments. But that logic does not take
> into account that skb_shared_info should be reserved at the end of the
> skb data area.
>
> In case packet data consumes all the page (4K), the skb_shinfo location
> overflows the page. As a consequence, __build_skb zeroes shinfo data
> above the allocated page, corrupting the next page.
>
> The issue is rarely seen in real life because jumbo frames are normally
> larger than 4K and that causes another code path to trigger.
> But it is 100% reproducible with a simple scapy packet, like:
>
> sendp(IP(dst="192.168.100.3") / TCP(dport=443) \
>     / Raw(RandString(size=(4096-40))), iface="enp1s0")
>
> Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
>
> Reported-by: Friedemann Gerold
> Reported-by: Michael Rauch
> Signed-off-by: Friedemann Gerold
> Tested-by: Nikita Danilov
> Signed-off-by: Igor Russkikh

Applied and queued up for -stable.
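The arithmetic behind the bug: build_skb() expects the caller to reserve an aligned skb_shared_info footprint at the tail of the buffer, so a 4K page cannot carry a full 4K frame. A userspace sketch with assumed sizes (SHINFO_SIZE_APPROX is a rough stand-in; the real struct skb_shared_info footprint varies by architecture and kernel config):

```c
/* Assumed values -- the real ones depend on arch and kernel config. */
#define PAGE_SIZE_APPROX   4096UL
#define SMP_CACHE_BYTES    64UL
#define SHINFO_SIZE_APPROX 320UL	/* rough skb_shared_info footprint */

/* Mimics the kernel's SKB_DATA_ALIGN cacheline rounding. */
#define SKB_DATA_ALIGN_APPROX(x) \
	(((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

/*
 * Largest frame that can share a page with its shared-info area.
 * A 4K jumbo frame exceeds this, so placing shinfo after it lands on
 * the next page -- the corruption the patch fixes.
 */
static unsigned long max_frame_in_page(void)
{
	return PAGE_SIZE_APPROX - SKB_DATA_ALIGN_APPROX(SHINFO_SIZE_APPROX);
}
```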
Re: [PATCHv2 net-next 1/1] net: rds: use memset to optimize the recv
From: Zhu Yanjun Date: Sun, 16 Sep 2018 22:49:30 -0400 > The function rds_inc_init is in recv process. To use memset can optimize > the function rds_inc_init. > The test result: > > Before: > 1) + 24.950 us |rds_inc_init [rds](); > After: > 1) + 10.990 us |rds_inc_init [rds](); > > Acked-by: Santosh Shilimkar > Signed-off-by: Zhu Yanjun > --- > V1->V2: a new patch for net-next Applied.
Re: [PATCH net-next] liquidio: Add the features to show FEC settings and set FEC settings
From: Felix Manlunas Date: Sun, 16 Sep 2018 22:43:32 -0700 > From: Weilin Chang > > 1. Add functions for get_fecparam and set_fecparam. > 2. Modify lio_get_link_ksettings to display FEC setting. > > Signed-off-by: Weilin Chang > Acked-by: Derek Chickles > Signed-off-by: Felix Manlunas Applied.
[PATCH] net: emac: fix fixed-link setup for the RTL8363SB switch
On the Netgear WNDAP620, the emac ethernet isn't receiving or xmitting
any frames from/to the RTL8363SB (identifies itself as a RTL8367RB).

This is caused by the emac hardware not knowing the forced link
parameters for speed, duplex, pause, etc.

This begs the question, how this was working on the original driver
code, when it was necessary to set the phy_address and phy_map to
0x. But I guess without access to the old PPC405/440/460 hardware,
it's not possible to know.

Signed-off-by: Christian Lamparter
---
 drivers/net/ethernet/ibm/emac/core.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 354c0982847b..3b398ebdb5e6 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -2677,12 +2677,17 @@ static int emac_init_phy(struct emac_instance *dev)
 	if (of_phy_is_fixed_link(np)) {
 		int res = emac_dt_mdio_probe(dev);
 
-		if (!res) {
-			res = of_phy_register_fixed_link(np);
-			if (res)
-				mdiobus_unregister(dev->mii_bus);
+		if (res)
+			return res;
+
+		res = of_phy_register_fixed_link(np);
+		dev->phy_dev = of_phy_find_device(np);
+		if (res || !dev->phy_dev) {
+			mdiobus_unregister(dev->mii_bus);
+			return res ? res : -EINVAL;
 		}
-		return res;
+		emac_adjust_link(dev->ndev);
+		put_device(&dev->phy_dev->mdio.dev);
 	}
 	return 0;
 }
-- 
2.19.0.rc2