Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct
On 06/21/2017 02:11 PM, Kalle Valo wrote:
> David Miller writes:
>
>> From: Jia-Ju Bai
>> Date: Mon, 19 Jun 2017 10:48:53 +0800
>>
>>> The driver may sleep under a spin lock, and the function call path is:
>>> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>>>   ioremap --> may sleep
>>>
>>> To fix it, the lock is released before "ioremap", and the lock is
>>> acquired again after this function.
>>>
>>> Signed-off-by: Jia-Ju Bai
>>
>> This style of change you are making is really starting to be a
>> problem.
>>
>> You can't just drop locks like this, especially without explaining
>> why it's ok, and why the mutual exclusion this code was trying to
>> achieve is still going to be OK afterwards.
>>
>> In fact, I see zero analysis of the locking situation here, why
>> it was needed in the first place, and why your change is OK in
>> that context.
>>
>> Any locking change is delicate, and you must put the greatest of
>> care and consideration into it.
>>
>> Just putting "unlock/lock" around the sleeping operation shows a
>> very low level of consideration for the implications of the change
>> you are making.
>>
>> This isn't like making whitespace fixes, sorry...
>
> We already tried to explain this to Jia-Ju during review of a wireless
> patch:
>
> https://patchwork.kernel.org/patch/9756585/
>
> Jia-Ju, you should listen to feedback. If you continue submitting random
> patches like this makes it hard for maintainers to trust your patches
> anymore.

Hi,

I am quite sorry for my incorrect patches, and I will listen carefully to
your advice. In fact, for some bugs and patches which I have reported
before, I have not received the feedback of them, so I resent them a few
days ago, including this patch. Sorry for my mistake again.

Thanks,
Jia-Ju Bai
Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct
David Miller writes:

> From: Jia-Ju Bai
> Date: Mon, 19 Jun 2017 10:48:53 +0800
>
>> The driver may sleep under a spin lock, and the function call path is:
>> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>>   ioremap --> may sleep
>>
>> To fix it, the lock is released before "ioremap", and the lock is
>> acquired again after this function.
>>
>> Signed-off-by: Jia-Ju Bai
>
> This style of change you are making is really starting to be a
> problem.
>
> You can't just drop locks like this, especially without explaining
> why it's ok, and why the mutual exclusion this code was trying to
> achieve is still going to be OK afterwards.
>
> In fact, I see zero analysis of the locking situation here, why
> it was needed in the first place, and why your change is OK in
> that context.
>
> Any locking change is delicate, and you must put the greatest of
> care and consideration into it.
>
> Just putting "unlock/lock" around the sleeping operation shows a
> very low level of consideration for the implications of the change
> you are making.
>
> This isn't like making whitespace fixes, sorry...

We already tried to explain this to Jia-Ju during review of a wireless
patch:

https://patchwork.kernel.org/patch/9756585/

Jia-Ju, you should listen to feedback. If you continue submitting random
patches like this makes it hard for maintainers to trust your patches
anymore.

--
Kalle Valo
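The review comments in this thread object to an unlock/lock pair placed around a sleeping call with no analysis of what the lock protects. As a rough userspace illustration only (pthreads, not the netxen driver; struct window, slow_map() and window_map() are hypothetical names), the sketch below shows the extra work such a change usually needs: after the lock is retaken, the protected state has to be checked again, because another thread may have changed it while the lock was dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state guarded by `lock`; not taken from any driver. */
struct window {
	pthread_mutex_t lock;
	unsigned long base;	/* region that `mapping` currently covers */
	void *mapping;		/* stand-in for an ioremap()ed pointer */
};

void window_init(struct window *w)
{
	assert(w != NULL);
	pthread_mutex_init(&w->lock, NULL);
	w->base = 0;
	w->mapping = NULL;
}

/* Stand-in for a call that may sleep, such as ioremap(). */
static void *slow_map(unsigned long base)
{
	(void)base;
	return malloc(16);
}

/*
 * Make sure `w` maps `base`. The slow call runs with the lock dropped, and
 * the state is checked again once the lock is retaken.
 * Returns 1 if this call installed the mapping, 0 if it was already
 * present, -1 if the mapping could not be created.
 */
int window_map(struct window *w, unsigned long base)
{
	void *fresh;
	int installed = 0;

	pthread_mutex_lock(&w->lock);
	if (w->mapping && w->base == base) {
		pthread_mutex_unlock(&w->lock);
		return 0;
	}
	pthread_mutex_unlock(&w->lock);

	fresh = slow_map(base);		/* may sleep: no lock held here */
	if (!fresh)
		return -1;

	pthread_mutex_lock(&w->lock);
	if (w->mapping && w->base == base) {
		/* Another thread installed it while we slept; drop ours. */
		free(fresh);
	} else {
		/* The mapping pointer is only ever used under the lock. */
		free(w->mapping);
		w->mapping = fresh;
		w->base = base;
		installed = 1;
	}
	pthread_mutex_unlock(&w->lock);
	return installed;
}
```

Whether a re-check like this is sufficient depends entirely on what the original lock was protecting, which is the analysis the reviewers ask for.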
[PATCH rdma-next 03/19] RDMA/netlink: Rename and remove redundant parameter from ibnl_unicast
From: Leon Romanovsky Netlink message header is not needed for unicast reply, hence remove it. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/iwpm_msg.c | 6 +++--- drivers/infiniband/core/iwpm_util.c | 4 ++-- drivers/infiniband/core/netlink.c | 5 ++--- include/rdma/rdma_netlink.h | 4 +--- 4 files changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/core/iwpm_msg.c b/drivers/infiniband/core/iwpm_msg.c index 1fab707b1f68..8f84557d04e3 100644 --- a/drivers/infiniband/core/iwpm_msg.c +++ b/drivers/infiniband/core/iwpm_msg.c @@ -172,7 +172,7 @@ int iwpm_add_mapping(struct iwpm_sa_data *pm_msg, u8 nl_client) goto add_mapping_error; nlmsg_request->req_buffer = pm_msg; - ret = ibnl_unicast(skb, nlh, iwpm_user_pid); + ret = rdma_nl_unicast(skb, iwpm_user_pid); if (ret) { skb = NULL; /* skb is freed in the netlink send-op handling */ iwpm_user_pid = IWPM_PID_UNDEFINED; @@ -248,7 +248,7 @@ int iwpm_add_and_query_mapping(struct iwpm_sa_data *pm_msg, u8 nl_client) goto query_mapping_error; nlmsg_request->req_buffer = pm_msg; - ret = ibnl_unicast(skb, nlh, iwpm_user_pid); + ret = rdma_nl_unicast(skb, iwpm_user_pid); if (ret) { skb = NULL; /* skb is freed in the netlink send-op handling */ err_str = "Unable to send a nlmsg"; @@ -308,7 +308,7 @@ int iwpm_remove_mapping(struct sockaddr_storage *local_addr, u8 nl_client) if (ret) goto remove_mapping_error; - ret = ibnl_unicast(skb, nlh, iwpm_user_pid); + ret = rdma_nl_unicast(skb, iwpm_user_pid); if (ret) { skb = NULL; /* skb is freed in the netlink send-op handling */ iwpm_user_pid = IWPM_PID_UNDEFINED; diff --git a/drivers/infiniband/core/iwpm_util.c b/drivers/infiniband/core/iwpm_util.c index c46442ac71a2..c81c55942626 100644 --- a/drivers/infiniband/core/iwpm_util.c +++ b/drivers/infiniband/core/iwpm_util.c @@ -597,7 +597,7 @@ static int send_mapinfo_num(u32 mapping_num, u8 nl_client, int iwpm_pid) &mapping_num, IWPM_NLA_MAPINFO_SEND_NUM); if (ret) goto mapinfo_num_error; - ret = 
ibnl_unicast(skb, nlh, iwpm_pid); + ret = rdma_nl_unicast(skb, iwpm_pid); if (ret) { skb = NULL; err_str = "Unable to send a nlmsg"; @@ -626,7 +626,7 @@ static int send_nlmsg_done(struct sk_buff *skb, u8 nl_client, int iwpm_pid) return -ENOMEM; } nlh->nlmsg_type = NLMSG_DONE; - ret = ibnl_unicast(skb, (struct nlmsghdr *)skb->data, iwpm_pid); + ret = rdma_nl_unicast(skb, iwpm_pid); if (ret) pr_warn("%s Unable to send a nlmsg\n", __func__); return ret; diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 96057e722123..cd29311078b5 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -241,12 +241,11 @@ static void rdma_nl_rcv(struct sk_buff *skb) mutex_unlock(&rdma_nl_mutex); } -int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh, - __u32 pid) +int rdma_nl_unicast(struct sk_buff *skb, u32 pid) { return nlmsg_unicast(nls, skb, pid); } -EXPORT_SYMBOL(ibnl_unicast); +EXPORT_SYMBOL(rdma_nl_unicast); int ibnl_multicast(struct sk_buff *skb, struct nlmsghdr *nlh, unsigned int group, gfp_t flags) diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index 6932b7acd3a6..9779cd5520d7 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -59,12 +59,10 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh, /** * Send the supplied skb to a specific userspace PID. * @skb: The netlink skb - * @nlh: Header of the netlink message to send * @pid: Userspace netlink process ID * Returns 0 on success or a negative error code. */ -int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh, - __u32 pid); +int rdma_nl_unicast(struct sk_buff *skb, u32 pid); /** * Send the supplied skb to a netlink group. -- 2.13.1
[PATCH rdma-next 15/19] RDMA/netlink: Implement nldev device dumpit callback
From: Leon Romanovsky This patch adds the ability to return all available devices together with their properties. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/nldev.c | 61 - 1 file changed, 60 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index 1d1e4f214874..219b52166c46 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -30,13 +30,72 @@ * POSSIBILITY OF SUCH DAMAGE. */ +#include #include #include "core_priv.h" +static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = { + [RDMA_NLDEV_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, + .len = IB_DEVICE_NAME_MAX - 1}, + [RDMA_NLDEV_ATTR_PORT_INDEX]= { .type = NLA_U32 }, +}; + +static int fill_dev_info(struct sk_buff *msg, struct ib_device *device) +{ + if (nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_NAME, device->name)) + return -EMSGSIZE; + if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, rdma_end_port(device))) + return -EMSGSIZE; + return 0; +} + +static int _nldev_get_dumpit(struct ib_device *device, +struct sk_buff *skb, +struct netlink_callback *cb, +unsigned int idx) +{ + int start = cb->args[0]; + struct nlmsghdr *nlh; + + if (idx < start) + return 0; + + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, + RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET), + 0, NLM_F_MULTI); + + if (fill_dev_info(skb, device)) { + nlmsg_cancel(skb, nlh); + goto out; + } + + nlmsg_end(skb, nlh); + + idx++; + +out: cb->args[0] = idx; + return skb->len; +} + +static int nldev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) +{ + /* +* There is no need to take lock, because +* we are relying on ib_core's lists_rwsem +*/ + return ib_enum_all_devs(_nldev_get_dumpit, skb, cb); +} + +static const struct rdma_nl_cbs nldev_cb_table[] = { + [RDMA_NLDEV_CMD_GET] = { + .dump = nldev_get_dumpit, + }, +}; + void __init nldev_init(void) { - rdma_nl_register(RDMA_NL_NLDEV, NULL); + 
rdma_nl_register(RDMA_NL_NLDEV, nldev_cb_table); } void __exit nldev_exit(void) -- 2.13.1
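The dumpit callback in the patch above stores the index of the next device in cb->args[0], so that a dump which does not fit into one skb can resume where it stopped. A minimal userspace model of that resume logic follows; struct dump_cb, struct out_buf and dump_pass() are hypothetical stand-ins, not kernel types, and the 16-byte buffer is chosen only to force several passes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the resume cursor kept in cb->args[0]. */
struct dump_cb {
	long args[1];		/* args[0]: index of the next entry to emit */
};

/* Hypothetical fixed-size output buffer standing in for the skb. */
struct out_buf {
	char data[16];
	size_t len;
};

/* Append `name` plus its NUL; returns -1 if it does not fit. */
static int put_name(struct out_buf *b, const char *name)
{
	size_t n = strlen(name) + 1;

	if (b->len + n > sizeof(b->data))
		return -1;
	memcpy(b->data + b->len, name, n);
	b->len += n;
	return 0;
}

/*
 * One dump pass: start at the saved index, emit as many entries as fit,
 * and record where to resume. Returns the number of bytes written;
 * 0 means the dump is complete.
 */
size_t dump_pass(const char *const *names, size_t count,
		 struct out_buf *b, struct dump_cb *cb)
{
	size_t idx = (size_t)cb->args[0];

	assert(b != NULL && cb != NULL);
	b->len = 0;
	for (; idx < count; idx++) {
		if (put_name(b, names[idx]))
			break;	/* buffer full: resume here next time */
	}
	cb->args[0] = (long)idx;
	return b->len;
}
```

The caller keeps invoking dump_pass() with the same cb until it returns 0, which mirrors how a dump callback is called repeatedly until it adds nothing to the message.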
[PATCH rdma-next 09/19] RDMA/netlink: Add and implement doit netlink callback
From: Leon Romanovsky The .doit callback is used by netlink core to differentiate between get and set operations. The common convention is to use that callback for command operations (SET, ADD, etc.) and/or for accesses without the NLM_F_DUMP flag. This commit adds the proper declaration and implementation to RDMA netlink. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/netlink.c | 19 ++- include/rdma/rdma_netlink.h | 2 ++ 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 14b64e4d1e06..34f529cc9776 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -75,9 +75,13 @@ static bool is_nl_msg_valid(unsigned int type, unsigned int op) static bool is_nl_valid(unsigned int type, unsigned int op) { - if (!is_nl_msg_valid(type, op) || - !rdma_nl_types[type].cb_table || - !rdma_nl_types[type].cb_table[op].dump) + const struct rdma_nl_cbs *cb_table; + + if (!is_nl_msg_valid(type, op)) + return false; + + cb_table = rdma_nl_types[type].cb_table; + if (!cb_table || (!cb_table[op].dump && !cb_table[op].doit)) return false; return true; } @@ -152,6 +156,7 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, unsigned int op = RDMA_NL_GET_OP(type); struct netlink_callback cb = {}; struct netlink_dump_control c = {}; + int ret; if (!is_nl_valid(index, op)) return -EINVAL; @@ -170,10 +175,14 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, cb.nlh = nlh; cb.dump = rdma_nl_types[index].cb_table[op].dump; return cb.dump(skb, &cb); + } else { + c.dump = rdma_nl_types[index].cb_table[op].dump; + return netlink_dump_start(nls, skb, nlh, &c); } + if (rdma_nl_types[index].cb_table[op].doit) + ret = rdma_nl_types[index].cb_table[op].doit(skb, nlh, extack); + return ret; - c.dump = rdma_nl_types[index].cb_table[op].dump; - return netlink_dump_start(nls, skb, nlh, &c); } /* diff --git a/include/rdma/rdma_netlink.h 
b/include/rdma/rdma_netlink.h index 8feeb899e2b2..d6a481880f41 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -6,6 +6,8 @@ #include struct rdma_nl_cbs { + int (*doit)(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack); int (*dump)(struct sk_buff *skb, struct netlink_callback *nlcb); u8 flags; }; -- 2.13.1
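The commit message above describes the convention that command-style requests go to .doit and dump requests go to .dump, and the patch relaxes the validity check so that an op with either callback is accepted. The userspace sketch below models only that table lookup; struct nl_cbs, op_is_valid(), dispatch() and example_table are hypothetical names, the callbacks take a plain int instead of skb/nlmsghdr arguments, and the dump decision is reduced to a boolean, so this is not the dispatch logic of rdma_nl_rcv_msg() itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-op callback entry. */
struct nl_cbs {
	int (*doit)(int arg);
	int (*dump)(int arg);
};

/* An op is usable if it provides at least one of the two callbacks. */
bool op_is_valid(const struct nl_cbs *table, size_t nops, size_t op)
{
	if (!table || op >= nops)
		return false;
	return table[op].dump || table[op].doit;
}

/* Route dump requests to .dump and every other request to .doit. */
int dispatch(const struct nl_cbs *table, size_t nops, size_t op,
	     bool is_dump, int arg)
{
	if (!op_is_valid(table, nops, op))
		return -EINVAL;
	if (is_dump)
		return table[op].dump ? table[op].dump(arg) : -EINVAL;
	return table[op].doit ? table[op].doit(arg) : -EINVAL;
}

/* Example callbacks and table, for illustration only. */
static int get_one(int arg) { return arg; }
static int get_all(int arg) { return arg * 10; }

const struct nl_cbs example_table[] = {
	[1] = { .doit = get_one, .dump = get_all },
	[2] = { .dump = get_all },
};
```

With this table, op 1 answers both request kinds, op 2 answers only dumps, and op 0 (left empty by the designated initializers) is rejected.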
[PATCH rdma-next 17/19] RDMA/netlink: Add nldev port dumpit implementation
From: Leon Romanovsky This patch implements the query interface to get all ports data for the specific device. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/nldev.c | 60 + 1 file changed, 60 insertions(+) diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index 8d3fd9b58cc1..ded06fa69ccb 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -50,6 +50,16 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device) return 0; } +static int fill_port_info(struct sk_buff *msg, + struct ib_device *device, u32 port) +{ + if (nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_NAME, device->name)) + return -EMSGSIZE; + if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) + return -EMSGSIZE; + return 0; +} + static int nldev_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { @@ -126,11 +136,61 @@ static int nldev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) return ib_enum_all_devs(_nldev_get_dumpit, skb, cb); } +static int nldev_port_get_dumpit(struct sk_buff *skb, +struct netlink_callback *cb) +{ + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; + char name[IB_DEVICE_NAME_MAX]; + struct ib_device *device; + int start = cb->args[0]; + struct nlmsghdr *nlh; + u32 idx = 0; + int err; + u32 p; + + err = nlmsg_parse(cb->nlh, 0, tb, RDMA_NLDEV_ATTR_MAX, + nldev_policy, NULL); + if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME]) + return -EINVAL; + + nla_strlcpy(name, tb[RDMA_NLDEV_ATTR_DEV_NAME], IB_DEVICE_NAME_MAX); + + device = __ib_device_get_by_name(name); + if (!device) + return -EINVAL; + + for (p = rdma_start_port(device); p <= rdma_end_port(device); ++p) { + if (idx < start) { + idx++; + continue; + } + + nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, +RDMA_NLDEV_CMD_PORT_GET), + 0, NLM_F_MULTI); + + if (fill_port_info(skb, device, p)) { + nlmsg_cancel(skb, nlh); + goto out; + } + idx++; + nlmsg_end(skb, 
nlh); + } + +out: cb->args[0] = idx; + return skb->len; +} + static const struct rdma_nl_cbs nldev_cb_table[] = { [RDMA_NLDEV_CMD_GET] = { .doit = nldev_get_doit, .dump = nldev_get_dumpit, }, + [RDMA_NLDEV_CMD_PORT_GET] = { + .dump = nldev_port_get_dumpit, + }, }; void __init nldev_init(void) -- 2.13.1
[PATCH rdma-next 18/19] RDMA/netlink: Implement nldev port doit callback
From: Leon Romanovsky Provide the ability to get information specific to a device and port. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/nldev.c | 45 + 1 file changed, 45 insertions(+) diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index ded06fa69ccb..250965329592 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -136,6 +136,50 @@ static int nldev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) return ib_enum_all_devs(_nldev_get_dumpit, skb, cb); } +static int nldev_port_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; + char name[IB_DEVICE_NAME_MAX]; + struct ib_device *device; + struct sk_buff *msg; + u32 port; + int err; + + err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX, + nldev_policy, extack); + if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] || + !tb[RDMA_NLDEV_ATTR_PORT_INDEX]) + return -EINVAL; + + nla_strlcpy(name, tb[RDMA_NLDEV_ATTR_DEV_NAME], IB_DEVICE_NAME_MAX); + device = __ib_device_get_by_name(name); + if (!device) + return -EINVAL; + + port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]); + if (!rdma_is_port_valid(device, port)) + return -EINVAL; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, + RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET), + 0, 0); + + err = fill_port_info(msg, device, port); + if (err) { + nlmsg_free(msg); + return err; + } + + nlmsg_end(msg, nlh); + + return rdma_nl_unicast(msg, NETLINK_CB(skb).portid); +} + static int nldev_port_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { @@ -189,6 +233,7 @@ static const struct rdma_nl_cbs nldev_cb_table[] = { .dump = nldev_get_dumpit, }, [RDMA_NLDEV_CMD_PORT_GET] = { + .doit = nldev_port_get_doit, .dump = nldev_port_get_dumpit, }, }; -- 2.13.1
[PATCH rdma-next 14/19] RDMA/netlink: Add nldev initialization flows
From: Leon Romanovsky Add nldev init and exit flows to the RDMA/core. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/Makefile| 4 +++- drivers/infiniband/core/core_priv.h | 4 drivers/infiniband/core/device.c| 2 ++ drivers/infiniband/core/nldev.c | 45 + 4 files changed, 54 insertions(+), 1 deletion(-) create mode 100644 drivers/infiniband/core/nldev.c diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 6ebd9ad95010..d260c5ecf656 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -10,7 +10,9 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \ ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \ device.o fmr_pool.o cache.o netlink.o \ roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \ - multicast.o mad.o smi.o agent.o mad_rmpp.o + multicast.o mad.o smi.o agent.o mad_rmpp.o \ + nldev.o + ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index 0137aa1ec471..492c856393c9 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -188,4 +188,8 @@ int ib_nl_handle_ip_res_resp(struct sk_buff *skb, struct netlink_ext_ack *extack); struct ib_device *__ib_device_get_by_name(const char *name); + +/* RDMA device netlink */ +void nldev_init(void); +void nldev_exit(void); #endif /* _CORE_PRIV_H */ diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 7317f203a315..bbcc210731e2 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -1092,6 +1092,7 @@ static int __init ib_core_init(void) goto err_mad; } + nldev_init(); rdma_nl_register(RDMA_NL_LS, ibnl_ls_cb_table); ib_cache_setup(); @@ -1115,6 +1116,7 @@ static int __init ib_core_init(void) static void __exit ib_core_cleanup(void) { 
ib_cache_cleanup(); + nldev_exit(); rdma_nl_unregister(RDMA_NL_LS); ib_sa_cleanup(); ib_mad_cleanup(); diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c new file mode 100644 index ..1d1e4f214874 --- /dev/null +++ b/drivers/infiniband/core/nldev.c @@ -0,0 +1,45 @@ +/* + * Copyright (c) 2017 Mellanox Technologies. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. Neither the names of the copyright holders nor the names of its + *contributors may be used to endorse or promote products derived from + *this software without specific prior written permission. + * + * Alternatively, this software may be distributed under the terms of the + * GNU General Public License ("GPL") version 2 as published by the Free + * Software Foundation. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#include + +#include "core_priv.h" + +void __init nldev_init(void) +{ + rdma_nl_register(RDMA_NL_NLDEV, NULL); +} + +void __exit nldev_exit(void) +{ + rdma_nl_unregister(RDMA_NL_NLDEV); +} -- 2.13.1
[PATCH rdma-next 19/19] RDMA/netlink: Expose device and port capability masks
From: Leon Romanovsky The port capability mask is exposed to user space via sysfs interface, while device capabilities are available for verbs only. This patch provides those capabilities through netlink interface. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/nldev.c | 19 +++ include/uapi/rdma/rdma_netlink.h | 5 + 2 files changed, 24 insertions(+) diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index 250965329592..77135fb29179 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -47,16 +47,35 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device) return -EMSGSIZE; if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, rdma_end_port(device))) return -EMSGSIZE; + + BUILD_BUG_ON(sizeof(device->attrs.device_cap_flags) != sizeof(u64)); + if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_CAP_FLAGS, + device->attrs.device_cap_flags, 0)) + return -EMSGSIZE; + return 0; } static int fill_port_info(struct sk_buff *msg, struct ib_device *device, u32 port) { + struct ib_port_attr attr; + int ret; + if (nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_NAME, device->name)) return -EMSGSIZE; if (nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) return -EMSGSIZE; + + ret = ib_query_port(device, port, &attr); + if (ret) + return ret; + + BUILD_BUG_ON(sizeof(attr.port_cap_flags) > sizeof(u64)); + if (nla_put_u64_64bit(msg, RDMA_NLDEV_ATTR_CAP_FLAGS, + (u64)attr.port_cap_flags, 0)) + return -EMSGSIZE; + return 0; } diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h index bfafd5996d52..1fff898632f4 100644 --- a/include/uapi/rdma/rdma_netlink.h +++ b/include/uapi/rdma/rdma_netlink.h @@ -262,6 +262,11 @@ enum rdma_nldev_attr { */ RDMA_NLDEV_ATTR_PORT_INDEX, /* u32 */ + /* +* Device and port capabilities +*/ + RDMA_NLDEV_ATTR_CAP_FLAGS, /* u64 */ + RDMA_NLDEV_ATTR_MAX }; #endif /* _UAPI_RDMA_NETLINK_H */ -- 2.13.1
[PATCH rdma-next 06/19] RDMA/netlink: Rename netlink callback struct
From: Leon Romanovsky The removal of the RDMA netlink client infrastructure made the old name (ibnl_client_cbs) obsolete. This patch renames the struct to the more appropriate rdma_nl_cbs. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/device.c | 2 +- drivers/infiniband/core/iwcm.c| 2 +- drivers/infiniband/core/netlink.c | 4 ++-- include/rdma/rdma_netlink.h | 4 ++-- 5 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index a4013b0908e2..2af30a19b926 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -4477,7 +4477,7 @@ static int cma_get_id_stats(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; } -static const struct ibnl_client_cbs cma_cb_table[] = { +static const struct rdma_nl_cbs cma_cb_table[] = { [RDMA_NL_RDMA_CM_ID_STATS] = { .dump = cma_get_id_stats}, }; diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 2001dabd1444..5326dcc6ede7 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -1008,7 +1008,7 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, } EXPORT_SYMBOL(ib_get_net_dev_by_params); -static const struct ibnl_client_cbs ibnl_ls_cb_table[] = { +static const struct rdma_nl_cbs ibnl_ls_cb_table[] = { [RDMA_NL_LS_OP_RESOLVE] = { .dump = ib_nl_handle_resolve_resp, .flags = RDMA_NL_ADMIN_PERM, diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c index 8599271d8be6..452a3115e3e6 100644 --- a/drivers/infiniband/core/iwcm.c +++ b/drivers/infiniband/core/iwcm.c @@ -80,7 +80,7 @@ const char *__attribute_const__ iwcm_reject_msg(int reason) } EXPORT_SYMBOL(iwcm_reject_msg); -static struct ibnl_client_cbs iwcm_nl_cb_table[] = { +static struct rdma_nl_cbs iwcm_nl_cb_table[] = { [RDMA_NL_IWPM_REG_PID] = {.dump = iwpm_register_pid_cb}, [RDMA_NL_IWPM_ADD_MAPPING] = {.dump = 
iwpm_add_mapping_cb}, [RDMA_NL_IWPM_QUERY_MAPPING] = {.dump = iwpm_add_and_query_mapping_cb}, diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 1022ce6628ae..14b64e4d1e06 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -43,7 +43,7 @@ static DEFINE_MUTEX(rdma_nl_mutex); static struct sock *nls; static struct { - const struct ibnl_client_cbs *cb_table; + const struct rdma_nl_cbs *cb_table; } rdma_nl_types[RDMA_NL_NUM_CLIENTS]; int rdma_nl_chk_listeners(unsigned int group) @@ -83,7 +83,7 @@ static bool is_nl_valid(unsigned int type, unsigned int op) } void rdma_nl_register(unsigned int index, - const struct ibnl_client_cbs cb_table[]) + const struct rdma_nl_cbs cb_table[]) { mutex_lock(&rdma_nl_mutex); if (!is_nl_msg_valid(index, 0)) { diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index b39e030b3a64..8feeb899e2b2 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -5,7 +5,7 @@ #include #include -struct ibnl_client_cbs { +struct rdma_nl_cbs { int (*dump)(struct sk_buff *skb, struct netlink_callback *nlcb); u8 flags; }; @@ -24,7 +24,7 @@ void rdma_nl_exit(void); * @cb_table: A table for op->callback */ void rdma_nl_register(unsigned int index, - const struct ibnl_client_cbs cb_table[]); + const struct rdma_nl_cbs cb_table[]); /** * Remove a client from IB netlink. -- 2.13.1
[PATCH rdma-next 13/19] RDMA/netlink: Add netlink device definitions to UAPI
From: Leon Romanovsky

Introduce new defines to rdma_netlink.h, so that the RDMA configuration tool can communicate with the RDMA subsystem using the shared defines.

The addition of the new client (NLDEV) revealed that we had exposed the RDMA_NL_I40IW define by mistake: it is not backed by any RDMA netlink client now and will not be in the future. So this patch reuses the value and leaves a comment together with the old definition for those who are using RDMA_NL_I40IW as a replacement for the digit "5".

NLDEV operates on objects. The struct ib_device has two straightforward objects: the device itself and the ports of that device. This leads us to propose the following commands to work on those objects:
 * RDMA_NLDEV_CMD_{GET,SET,NEW,DEL} - works on the ib_device itself
 * RDMA_NLDEV_CMD_PORT_{GET,SET,NEW,DEL} - works on ports of a specific ib_device

Those commands receive/return the device name (RDMA_NLDEV_ATTR_DEV_NAME) and port index (RDMA_NLDEV_ATTR_PORT_INDEX). For device object accesses, RDMA_NLDEV_ATTR_PORT_INDEX returns the maximum number of ports for the specific ib_device; for port accesses, it is the actual port index.

The port index starts from 1 to follow RDMA/core internal semantics and the knobs exposed through sysfs.

Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/netlink.c | 2 +- include/uapi/rdma/rdma_netlink.h | 42 +++ 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 9731c313e9b9..cce9b7af4a3b 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -60,7 +60,7 @@ static bool is_nl_msg_valid(unsigned int type, unsigned int op) RDMA_NL_IWPM_NUM_OPS, 0, RDMA_NL_LS_NUM_OPS, - 0 }; + RDMA_NLDEV_NUM_OPS }; /* * This BUILD_BUG_ON is intended to catch addition of new diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h index 02fe8390c18f..bfafd5996d52 100644 --- a/include/uapi/rdma/rdma_netlink.h +++ b/include/uapi/rdma/rdma_netlink.h @@ -8,7 +8,14 @@ enum { RDMA_NL_IWCM, RDMA_NL_RSVD, RDMA_NL_LS, /* RDMA Local Services */ + /* +* RDMA_NL_I40IW not in use and it was added here by mistake, +* but we need to keep it anyway, because it is UAPI and for +* unknown reasons, someone in the field decided to replace "5" +* with this define. +*/ RDMA_NL_I40IW, + RDMA_NL_NLDEV = RDMA_NL_I40IW, /* RDMA device interface */ RDMA_NL_NUM_CLIENTS }; @@ -222,4 +229,39 @@ struct rdma_nla_ls_gid { __u8gid[16]; }; +enum rdma_nldev_command { + RDMA_NLDEV_CMD_UNSPEC, + + RDMA_NLDEV_CMD_GET, /* can dump */ + RDMA_NLDEV_CMD_SET, + RDMA_NLDEV_CMD_NEW, + RDMA_NLDEV_CMD_DEL, + + RDMA_NLDEV_CMD_PORT_GET, /* can dump */ + RDMA_NLDEV_CMD_PORT_SET, + RDMA_NLDEV_CMD_PORT_NEW, + RDMA_NLDEV_CMD_PORT_DEL, + + RDMA_NLDEV_NUM_OPS +}; + +enum rdma_nldev_attr { + /* don't change the order or add anything between, this is ABI! */ + RDMA_NLDEV_ATTR_UNSPEC, + + /* Identifier for ib_device */ + RDMA_NLDEV_ATTR_DEV_NAME, /* string */ + /* +* Device name together with port index are identifiers +* for port/link properties. 
+* +* For RDMA_NLDEV_CMD_GET command, port index will return number +* of available ports in ib_device, while for port specific operations, +* it will be real port index as it appears in sysfs. Port index follows +* sysfs notation and starts from 1 for the first port. +*/ + RDMA_NLDEV_ATTR_PORT_INDEX, /* u32 */ + + RDMA_NLDEV_ATTR_MAX +}; #endif /* _UAPI_RDMA_NETLINK_H */ -- 2.13.1
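The patches in this series build message types with RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, cmd). The sketch below shows how a client index and a command could combine into one 16-bit type. The enum values follow the definitions in the patch above (NLDEV reuses the value 5), but the bit layout, with the op in the low 10 bits and the client above it, is an assumption made for illustration, and the model_* helpers are hypothetical names rather than the UAPI macros.

```c
#include <stdint.h>

/* The patch reuses the value of RDMA_NL_I40IW, which is 5. */
enum { MODEL_NL_NLDEV = 5 };

/* Same order as enum rdma_nldev_command in the patch. */
enum model_nldev_command {
	MODEL_CMD_UNSPEC,
	MODEL_CMD_GET,
	MODEL_CMD_SET,
	MODEL_CMD_NEW,
	MODEL_CMD_DEL,
	MODEL_CMD_PORT_GET,
	MODEL_CMD_PORT_SET,
	MODEL_CMD_PORT_NEW,
	MODEL_CMD_PORT_DEL,
	MODEL_NUM_OPS
};

/* Assumed layout: op in the low 10 bits, client index above them. */
#define MODEL_OP_BITS 10u

static inline uint16_t model_type(unsigned int client, unsigned int op)
{
	return (uint16_t)((client << MODEL_OP_BITS) + op);
}

static inline unsigned int model_client(uint16_t type)
{
	return type >> MODEL_OP_BITS;
}

static inline unsigned int model_op(uint16_t type)
{
	return type & ((1u << MODEL_OP_BITS) - 1);
}
```

Under that assumed layout, a port GET request to NLDEV would carry type (5 << 10) + 5, and the receiver recovers both halves with the two accessor helpers.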
[PATCH rdma-next 07/19] RDMA/core: Add iterator over ib_devices
From: Leon Romanovsky The upcoming nldev needs to iterate over all IB devices in the system. To avoid exposing the ib_devices list outside device.c, provide an iterator function. The current version is written explicitly for the nldev callback to avoid over-engineering at this stage, but it can easily be extended to other types. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/core_priv.h | 8 drivers/infiniband/core/device.c| 25 + 2 files changed, 33 insertions(+) diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index cb7d372e4bdf..4a150c4be175 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -92,6 +92,14 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter, roce_netdev_callback cb, void *cookie); +typedef int (*nldev_callback)(struct ib_device *device, + struct sk_buff *skb, + struct netlink_callback *cb, + unsigned int idx); + +int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb, +struct netlink_callback *cb); + enum ib_cache_gid_default_mode { IB_CACHE_GID_DEFAULT_MODE_SET, IB_CACHE_GID_DEFAULT_MODE_DELETE diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 5326dcc6ede7..7a799fc90348 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -815,6 +815,31 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter, } /** + * ib_enum_all_devs - enumerate all ib_devices + * @cb: Callback to call for each found ib_device + * + * Enumerates all ib_devices and calls callback() on each device. 
+ */ +int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb, +struct netlink_callback *cb) +{ + struct ib_device *dev; + unsigned int idx = 0; + int ret = 0; + + down_read(&lists_rwsem); + list_for_each_entry(dev, &device_list, core_list) { + ret = nldev_cb(dev, skb, cb, idx); + if (ret) + break; + idx++; + } + + up_read(&lists_rwsem); + return ret; +} + +/** * ib_query_pkey - Get P_Key table entry * @device:Device to query * @port_num:Port number to query -- 2.13.1
[PATCH rdma-next 08/19] RDMA/core: Expose translation from device name to ib_device
From: Leon Romanovsky Provide ability to convert from device name to ib_device for the IB/core users. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/core_priv.h | 1 + drivers/infiniband/core/device.c| 3 +-- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index 4a150c4be175..049ccbfca988 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -184,4 +184,5 @@ int ib_nl_handle_set_timeout(struct sk_buff *skb, int ib_nl_handle_ip_res_resp(struct sk_buff *skb, struct netlink_callback *cb); +struct ib_device *__ib_device_get_by_name(const char *name); #endif /* _CORE_PRIV_H */ diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 7a799fc90348..4ec1b24258de 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -124,7 +124,7 @@ static int ib_device_check_mandatory(struct ib_device *device) return 0; } -static struct ib_device *__ib_device_get_by_name(const char *name) +struct ib_device *__ib_device_get_by_name(const char *name) { struct ib_device *device; @@ -135,7 +135,6 @@ static struct ib_device *__ib_device_get_by_name(const char *name) return NULL; } - static int alloc_name(char *name) { unsigned long *inuse; -- 2.13.1
[PATCH rdma-next 10/19] RDMA/netlink: Reduce indirection access to cb_table
From: Leon Romanovsky Introduce intermediate variable to store access to fields of cb_table. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/netlink.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 34f529cc9776..73c74d1cd2a3 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -156,12 +156,15 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, unsigned int op = RDMA_NL_GET_OP(type); struct netlink_callback cb = {}; struct netlink_dump_control c = {}; + const struct rdma_nl_cbs *cb_table; int ret; if (!is_nl_valid(index, op)) return -EINVAL; - if ((rdma_nl_types[index].cb_table[op].flags & RDMA_NL_ADMIN_PERM) && + cb_table = rdma_nl_types[type].cb_table; + + if ((cb_table[op].flags & RDMA_NL_ADMIN_PERM) && !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; @@ -173,14 +176,14 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, (index == RDMA_NL_LS && op == RDMA_NL_LS_OP_SET_TIMEOUT)) { cb.skb = skb; cb.nlh = nlh; - cb.dump = rdma_nl_types[index].cb_table[op].dump; + cb.dump = cb_table[op].dump; return cb.dump(skb, &cb); } else { - c.dump = rdma_nl_types[index].cb_table[op].dump; + c.dump = cb_table[op].dump; return netlink_dump_start(nls, skb, nlh, &c); } - if (rdma_nl_types[index].cb_table[op].doit) - ret = rdma_nl_types[index].cb_table[op].doit(skb, nlh, extack); + if (cb_table[op].doit) + ret = cb_table[op].doit(skb, nlh, extack); return ret; } -- 2.13.1
[PATCH rdma-next 11/19] RDMA/netlink: Convert LS to doit callback
From: Leon Romanovsky The RDMA_NL_LS protocol does not actually dump anything; it sets data, so it should be handled by a doit callback. This patch converts RDMA_NL_LS to the doit callback, while preserving the IWCM and RDMA_CM flows through netlink_dump_start(). Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/addr.c | 5 ++--- drivers/infiniband/core/core_priv.h | 9 ++--- drivers/infiniband/core/device.c| 6 +++--- drivers/infiniband/core/netlink.c | 28 ++-- drivers/infiniband/core/sa_query.c | 8 5 files changed, 25 insertions(+), 31 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 2cc23a26ce4f..0b283029bc61 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -129,10 +129,9 @@ static void ib_nl_process_good_ip_rsep(const struct nlmsghdr *nlh) } int ib_nl_handle_ip_res_resp(struct sk_buff *skb, -struct netlink_callback *cb) +struct nlmsghdr *nlh, +struct netlink_ext_ack *extack) { - const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh; - if ((nlh->nlmsg_flags & NLM_F_REQUEST) || !(NETLINK_CB(skb).sk)) return -EPERM; diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index 049ccbfca988..0137aa1ec471 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -178,11 +178,14 @@ int ib_sa_init(void); void ib_sa_cleanup(void); int ib_nl_handle_resolve_resp(struct sk_buff *skb, - struct netlink_callback *cb); + struct nlmsghdr *nlh, + struct netlink_ext_ack *extack); int ib_nl_handle_set_timeout(struct sk_buff *skb, -struct netlink_callback *cb); +struct nlmsghdr *nlh, +struct netlink_ext_ack *extack); int ib_nl_handle_ip_res_resp(struct sk_buff *skb, -struct netlink_callback *cb); +struct nlmsghdr *nlh, +struct netlink_ext_ack *extack); struct ib_device *__ib_device_get_by_name(const char *name); #endif /* _CORE_PRIV_H */ diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 
4ec1b24258de..7317f203a315 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -1034,15 +1034,15 @@ EXPORT_SYMBOL(ib_get_net_dev_by_params); static const struct rdma_nl_cbs ibnl_ls_cb_table[] = { [RDMA_NL_LS_OP_RESOLVE] = { - .dump = ib_nl_handle_resolve_resp, + .doit = ib_nl_handle_resolve_resp, .flags = RDMA_NL_ADMIN_PERM, }, [RDMA_NL_LS_OP_SET_TIMEOUT] = { - .dump = ib_nl_handle_set_timeout, + .doit = ib_nl_handle_set_timeout, .flags = RDMA_NL_ADMIN_PERM, }, [RDMA_NL_LS_OP_IP_RESOLVE] = { - .dump = ib_nl_handle_ip_res_resp, + .doit = ib_nl_handle_ip_res_resp, .flags = RDMA_NL_ADMIN_PERM, }, }; diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 73c74d1cd2a3..52fce73be9c1 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -154,38 +154,30 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int type = nlh->nlmsg_type; unsigned int index = RDMA_NL_GET_CLIENT(type); unsigned int op = RDMA_NL_GET_OP(type); - struct netlink_callback cb = {}; - struct netlink_dump_control c = {}; const struct rdma_nl_cbs *cb_table; - int ret; if (!is_nl_valid(index, op)) return -EINVAL; - cb_table = rdma_nl_types[type].cb_table; + cb_table = rdma_nl_types[index].cb_table; if ((cb_table[op].flags & RDMA_NL_ADMIN_PERM) && !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; - /* -* For response or local service set_timeout request, -* there is no need to use netlink_dump_start. 
-*/ - if (!(nlh->nlmsg_flags & NLM_F_REQUEST) || - (index == RDMA_NL_LS && op == RDMA_NL_LS_OP_SET_TIMEOUT)) { - cb.skb = skb; - cb.nlh = nlh; - cb.dump = cb_table[op].dump; - return cb.dump(skb, &cb); - } else { - c.dump = cb_table[op].dump; + /* TODO: Convert IWCM to properly handle doit callbacks */ + if ((nlh->nlmsg_flags & NLM_F_DUMP) || index == RDMA_NL_RDMA_CM || + index == RDMA_NL_IWCM) { + struct netlink_dump_control c = { + .dump = cb_table[op].dump, + }; return netlink_dump_start(nls, skb, nlh,
[PATCH rdma-next 12/19] RDMA/netlink: Update copyright
From: Leon Romanovsky Add Mellanox to the copyright header. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/netlink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 52fce73be9c1..9731c313e9b9 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -1,4 +1,5 @@ /* + * Copyright (c) 2017 Mellanox Technologies Inc. All rights reserved. * Copyright (c) 2010 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two -- 2.13.1
[PATCH rdma-next 04/19] RDMA/netlink: Rename and remove redundant parameter from ibnl_multicast
From: Leon Romanovsky The pointer to netlink header was not used in the ibnl_multicast function, so let's remove it and simplify the function signature. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/addr.c | 2 +- drivers/infiniband/core/iwpm_msg.c | 2 +- drivers/infiniband/core/netlink.c | 5 ++--- drivers/infiniband/core/sa_query.c | 2 +- include/rdma/rdma_netlink.h| 4 +--- 5 files changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 134d8394fca5..ebd0242bab3d 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -185,7 +185,7 @@ static int ib_nl_ip_send_msg(struct rdma_dev_addr *dev_addr, /* Repair the nlmsg header length */ nlmsg_end(skb, nlh); - ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL); + rdma_nl_multicast(skb, RDMA_NL_GROUP_LS, GFP_KERNEL); /* Make the request retry, so when we get the response from userspace * we will have something. diff --git a/drivers/infiniband/core/iwpm_msg.c b/drivers/infiniband/core/iwpm_msg.c index 8f84557d04e3..561f312ea35a 100644 --- a/drivers/infiniband/core/iwpm_msg.c +++ b/drivers/infiniband/core/iwpm_msg.c @@ -103,7 +103,7 @@ int iwpm_register_pid(struct iwpm_dev_data *pm_msg, u8 nl_client) pr_debug("%s: Multicasting a nlmsg (dev = %s ifname = %s iwpm = %s)\n", __func__, pm_msg->dev_name, pm_msg->if_name, iwpm_ulib_name); - ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_IWPM, GFP_KERNEL); + ret = rdma_nl_multicast(skb, RDMA_NL_GROUP_IWPM, GFP_KERNEL); if (ret) { skb = NULL; /* skb is freed in the netlink send-op handling */ iwpm_user_pid = IWPM_PID_UNAVAILABLE; diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index cd29311078b5..89a6219b6ec8 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -247,12 +247,11 @@ int rdma_nl_unicast(struct sk_buff *skb, u32 pid) } EXPORT_SYMBOL(rdma_nl_unicast); -int ibnl_multicast(struct sk_buff *skb, 
struct nlmsghdr *nlh, - unsigned int group, gfp_t flags) +int rdma_nl_multicast(struct sk_buff *skb, unsigned int group, gfp_t flags) { return nlmsg_multicast(nls, skb, 0, group, flags); } -EXPORT_SYMBOL(ibnl_multicast); +EXPORT_SYMBOL(rdma_nl_multicast); int __init rdma_nl_init(void) { diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 6e39a763b220..d890600f1e2d 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -862,7 +862,7 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask) /* Repair the nlmsg header length */ nlmsg_end(skb, nlh); - ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, gfp_mask); + ret = rdma_nl_multicast(skb, RDMA_NL_GROUP_LS, gfp_mask); if (!ret) ret = len; else diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index 9779cd5520d7..145283896417 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -67,13 +67,11 @@ int rdma_nl_unicast(struct sk_buff *skb, u32 pid); /** * Send the supplied skb to a netlink group. * @skb: The netlink skb - * @nlh: Header of the netlink message to send * @group: Netlink group ID * @flags: allocation flags * Returns 0 on success or a negative error code. */ -int ibnl_multicast(struct sk_buff *skb, struct nlmsghdr *nlh, - unsigned int group, gfp_t flags); +int rdma_nl_multicast(struct sk_buff *skb, unsigned int group, gfp_t flags); /** * Check if there are any listeners to the netlink group -- 2.13.1
[PATCH rdma-next 05/19] RDMA/netlink: Simplify and rename ibnl_chk_listeners
From: Leon Romanovsky Reduce the ibnl_chk_listeners function to one line by removing an unneeded comparison. Rename the function to be consistent with the other functions in RDMA netlink. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/addr.c | 2 +- drivers/infiniband/core/netlink.c | 8 +++- drivers/infiniband/core/sa_query.c | 2 +- include/rdma/rdma_netlink.h| 2 +- 4 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index ebd0242bab3d..2cc23a26ce4f 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -324,7 +324,7 @@ static void queue_req(struct addr_req *req) static int ib_nl_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr, const void *daddr, u32 seq, u16 family) { - if (ibnl_chk_listeners(RDMA_NL_GROUP_LS)) + if (rdma_nl_chk_listeners(RDMA_NL_GROUP_LS)) return -EADDRNOTAVAIL; /* We fill in what we can, the response will fill the rest */ diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 89a6219b6ec8..1022ce6628ae 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -46,13 +46,11 @@ static struct { const struct ibnl_client_cbs *cb_table; } rdma_nl_types[RDMA_NL_NUM_CLIENTS]; -int ibnl_chk_listeners(unsigned int group) +int rdma_nl_chk_listeners(unsigned int group) { - if (netlink_has_listeners(nls, group) == 0) - return -1; - return 0; + return (netlink_has_listeners(nls, group)) ? 
0 : -1; } -EXPORT_SYMBOL(ibnl_chk_listeners); +EXPORT_SYMBOL(rdma_nl_chk_listeners); static bool is_nl_msg_valid(unsigned int type, unsigned int op) { diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index d890600f1e2d..c06b7deea4d2 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -1419,7 +1419,7 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) if ((query->flags & IB_SA_ENABLE_LOCAL_SERVICE) && (!(query->flags & IB_SA_QUERY_OPA))) { - if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) { + if (!rdma_nl_chk_listeners(RDMA_NL_GROUP_LS)) { if (!ib_nl_make_request(query, gfp_mask)) return id; } diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index 145283896417..b39e030b3a64 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -78,6 +78,6 @@ int rdma_nl_multicast(struct sk_buff *skb, unsigned int group, gfp_t flags); * @group: the netlink group ID * Returns 0 on success or a negative for no listeners. */ -int ibnl_chk_listeners(unsigned int group); +int rdma_nl_chk_listeners(unsigned int group); #endif /* _RDMA_NETLINK_H */ -- 2.13.1
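Editorial note: the one-line form in this patch is meant to behave exactly like the if/return form it replaces. The sketch below checks that equivalence in userspace; toy_has_listeners() is a hypothetical stand-in for netlink_has_listeners(), assumed only to return non-zero when at least one listener exists.

```c
#include <assert.h>

/* Hypothetical stand-in: non-zero when the group has any listener. */
static int toy_has_listeners(unsigned int listeners)
{
	return listeners != 0;
}

/* Shape of the function before the patch. */
static int chk_listeners_old(unsigned int listeners)
{
	if (toy_has_listeners(listeners) == 0)
		return -1;
	return 0;
}

/* Shape of the function after the patch. */
static int chk_listeners_new(unsigned int listeners)
{
	return toy_has_listeners(listeners) ? 0 : -1;
}

static void chk_listeners_example(void)
{
	unsigned int n;

	for (n = 0; n < 4; n++)
		assert(chk_listeners_old(n) == chk_listeners_new(n));
	assert(chk_listeners_new(0) == -1);	/* nobody listening */
	assert(chk_listeners_new(3) == 0);	/* at least one listener */
}
```

Note the convention: 0 means listeners exist. That is why the callers in the diff read `if (rdma_nl_chk_listeners(...)) return -EADDRNOTAVAIL;` for the failure path and `if (!rdma_nl_chk_listeners(...))` for the success path.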
[PATCH rdma-next 16/19] RDMA/netlink: Add nldev device doit implementation
From: Leon Romanovsky Provide ability to query specific device. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/nldev.c | 40 1 file changed, 40 insertions(+) diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index 219b52166c46..8d3fd9b58cc1 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -50,6 +50,45 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device) return 0; } +static int nldev_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; + char name[IB_DEVICE_NAME_MAX]; + struct ib_device *device; + struct sk_buff *msg; + int err; + + err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX, + nldev_policy, extack); + if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME]) + return -EINVAL; + + nla_strlcpy(name, tb[RDMA_NLDEV_ATTR_DEV_NAME], IB_DEVICE_NAME_MAX); + + device = __ib_device_get_by_name(name); + if (!device) + return -EINVAL; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, + RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET), + 0, 0); + + err = fill_dev_info(msg, device); + if (err) { + nlmsg_free(msg); + return err; + } + + nlmsg_end(msg, nlh); + + return rdma_nl_unicast(msg, NETLINK_CB(skb).portid); +} + static int _nldev_get_dumpit(struct ib_device *device, struct sk_buff *skb, struct netlink_callback *cb, @@ -89,6 +128,7 @@ static int nldev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) static const struct rdma_nl_cbs nldev_cb_table[] = { [RDMA_NLDEV_CMD_GET] = { + .doit = nldev_get_doit, .dump = nldev_get_dumpit, }, }; -- 2.13.1
[PATCH rdma-next 01/19] RDMA/netlink: Add flag to consolidate common handling
From: Leon Romanovsky Add ability to provide flags to control RDMA netlink callbacks and convert addr.c and sa_query.c to be first users of such infrastructure. It allows to move their CAP_NET_ADMIN checks into netlink core. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/addr.c | 3 +-- drivers/infiniband/core/device.c | 12 +--- drivers/infiniband/core/netlink.c | 4 drivers/infiniband/core/sa_query.c | 6 ++ include/rdma/rdma_netlink.h| 6 ++ 5 files changed, 22 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 02971e239a18..134d8394fca5 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -134,8 +134,7 @@ int ib_nl_handle_ip_res_resp(struct sk_buff *skb, const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh; if ((nlh->nlmsg_flags & NLM_F_REQUEST) || - !(NETLINK_CB(skb).sk) || - !netlink_capable(skb, CAP_NET_ADMIN)) + !(NETLINK_CB(skb).sk)) return -EPERM; if (ib_nl_is_good_ip_resp(nlh)) diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 5c70ea49d5ad..2001dabd1444 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -1010,11 +1010,17 @@ EXPORT_SYMBOL(ib_get_net_dev_by_params); static const struct ibnl_client_cbs ibnl_ls_cb_table[] = { [RDMA_NL_LS_OP_RESOLVE] = { - .dump = ib_nl_handle_resolve_resp}, + .dump = ib_nl_handle_resolve_resp, + .flags = RDMA_NL_ADMIN_PERM, + }, [RDMA_NL_LS_OP_SET_TIMEOUT] = { - .dump = ib_nl_handle_set_timeout}, + .dump = ib_nl_handle_set_timeout, + .flags = RDMA_NL_ADMIN_PERM, + }, [RDMA_NL_LS_OP_IP_RESOLVE] = { - .dump = ib_nl_handle_ip_res_resp}, + .dump = ib_nl_handle_ip_res_resp, + .flags = RDMA_NL_ADMIN_PERM, + }, }; static int __init ib_core_init(void) diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 4fa6746a62b1..f0d482009c69 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ 
-171,6 +171,10 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, if (!is_nl_valid(index, op)) return -EINVAL; + if ((rdma_nl_types[index].cb_table[op].flags & RDMA_NL_ADMIN_PERM) && + !netlink_capable(skb, CAP_NET_ADMIN)) + return -EPERM; + /* * For response or local service set_timeout request, * there is no need to use netlink_dump_start. diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index e335b09c022e..6e39a763b220 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -1034,8 +1034,7 @@ int ib_nl_handle_set_timeout(struct sk_buff *skb, int ret; if (!(nlh->nlmsg_flags & NLM_F_REQUEST) || - !(NETLINK_CB(skb).sk) || - !netlink_capable(skb, CAP_NET_ADMIN)) + !(NETLINK_CB(skb).sk)) return -EPERM; ret = nla_parse(tb, LS_NLA_TYPE_MAX - 1, nlmsg_data(nlh), @@ -1110,8 +1109,7 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb, int ret; if ((nlh->nlmsg_flags & NLM_F_REQUEST) || - !(NETLINK_CB(skb).sk) || - !netlink_capable(skb, CAP_NET_ADMIN)) + !(NETLINK_CB(skb).sk)) return -EPERM; spin_lock_irqsave(&ib_nl_request_lock, flags); diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index 761517105a36..6932b7acd3a6 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -7,6 +7,12 @@ struct ibnl_client_cbs { int (*dump)(struct sk_buff *skb, struct netlink_callback *nlcb); + u8 flags; +}; + +enum rdma_nl_flags { + /* Require CAP_NET_ADMIN */ + RDMA_NL_ADMIN_PERM = 1 << 0, }; int rdma_nl_init(void); -- 2.13.1
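Editorial note: the idea of this patch is that a per-operation flag in the callback table lets the dispatcher do the privilege check once, instead of every handler repeating it. A small userspace sketch of that design follows. All toy_* names are hypothetical, and the is_admin argument stands in for the netlink_capable(skb, CAP_NET_ADMIN) test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_nl_flags {
	TOY_NL_ADMIN_PERM = 1 << 0,	/* caller must be privileged */
};

struct toy_nl_cbs {
	int (*handler)(int arg);
	unsigned char flags;
};

static int toy_resolve(int arg) { return arg + 1; }
static int toy_query(int arg) { return arg * 2; }

enum { TOY_OP_RESOLVE, TOY_OP_QUERY, TOY_NUM_OPS };

static const struct toy_nl_cbs toy_cb_table[TOY_NUM_OPS] = {
	[TOY_OP_RESOLVE] = { .handler = toy_resolve, .flags = TOY_NL_ADMIN_PERM },
	[TOY_OP_QUERY]   = { .handler = toy_query },
};

/* One permission check in the dispatcher replaces a check in every handler. */
static int toy_rcv_msg(unsigned int op, int arg, bool is_admin)
{
	if (op >= TOY_NUM_OPS || !toy_cb_table[op].handler)
		return -EINVAL;
	if ((toy_cb_table[op].flags & TOY_NL_ADMIN_PERM) && !is_admin)
		return -EPERM;
	return toy_cb_table[op].handler(arg);
}

static void toy_rcv_example(void)
{
	assert(toy_rcv_msg(TOY_OP_RESOLVE, 1, false) == -EPERM);
	assert(toy_rcv_msg(TOY_OP_RESOLVE, 1, true) == 2);
	assert(toy_rcv_msg(TOY_OP_QUERY, 3, false) == 6);
	assert(toy_rcv_msg(TOY_NUM_OPS, 0, true) == -EINVAL);
}
```

Handlers that need extra conditions (for example the NLM_F_REQUEST tests kept in addr.c and sa_query.c) still check those themselves; only the common capability test moves to the core.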
[PATCH rdma-next 00/19] RDMA Netlink Device Client
The following patch set is an implementation of NLDEV - RDMA netlink device
client. It is based on the already sent patch [1] and patch set [2].

This client is needed to properly integrate the coming RDMAtool [3] into the
iproute2 package, which is based on netlink.

The following patch set can be logically divided into three parts:
 * Cleanup of RDMA netlink interface to handle dumpit/doit callbacks.
 * NLDEV initial implementation
 * Exposing device and capability masks via this interface

The supplementary user space part will follow later on.

Thanks

[1] "Revert "IB/core: Add flow control to the portmapper netlink calls""
https://patchwork.kernel.org/patch/9752865/
[2] [PATCH rdma-next V2 0/5] Refactor RDMA netlink infrastructure
https://www.spinics.net/lists/linux-rdma/msg50945.html
[3] [RFC iproute2 0/8] RDMA tool
https://www.spinics.net/lists/linux-rdma/msg49575.html

Available in the "topic/rdma-netlink" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git

Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/rdma-netlink

CC: Chien Tin Tung
CC: Steve Wise
CC: Stephen Hemminger
CC: Jiri Pirko
CC: Ariel Almog
CC: Linux RDMA
CC: Linux Netdev

Leon Romanovsky (19):
  RDMA/netlink: Add flag to consolidate common handling
  RDMA/netlink: Simplify the put_msg and put_attr
  RDMA/netlink: Rename and remove redundant parameter from ibnl_unicast
  RDMA/netlink: Rename and remove redundant parameter from ibnl_multicast
  RDMA/netlink: Simplify and rename ibnl_chk_listeners
  RDMA/netlink: Rename netlink callback struct
  RDMA/core: Add iterator over ib_devices
  RDMA/core: Expose translation from device name to ib_device
  RDMA/netlink: Add and implement doit netlink callback
  RDMA/netlink: Reduce indirection access to cb_table
  RDMA/netlink: Convert LS to doit callback
  RDMA/netlink: Update copyright
  RDMA/netlink: Add netlink device definitions to UAPI
  RDMA/netlink: Add nldev initialization flows
  RDMA/netlink: Implement nldev device dumpit callback
  RDMA/netlink: Add nldev device doit implementation
  RDMA/netlink: Add nldev port dumpit implementation
  RDMA/netlink: Implement nldev port doit callback
  RDMA/netlink: Expose device and port capability masks

 drivers/infiniband/core/Makefile    |   4 +-
 drivers/infiniband/core/addr.c      |  12 +-
 drivers/infiniband/core/cma.c       |   2 +-
 drivers/infiniband/core/core_priv.h |  22 ++-
 drivers/infiniband/core/device.c    |  44 +-
 drivers/infiniband/core/iwcm.c      |   2 +-
 drivers/infiniband/core/iwpm_msg.c  |   8 +-
 drivers/infiniband/core/iwpm_util.c |   4 +-
 drivers/infiniband/core/netlink.c   |  98 ++---
 drivers/infiniband/core/nldev.c     | 268
 drivers/infiniband/core/sa_query.c  |  18 ++-
 include/rdma/rdma_netlink.h         |  22 +--
 include/uapi/rdma/rdma_netlink.h    |  47 +++
 13 files changed, 454 insertions(+), 97 deletions(-)
 create mode 100644 drivers/infiniband/core/nldev.c

-- 
2.13.1
[PATCH rdma-next 02/19] RDMA/netlink: Simplify the put_msg and put_attr
From: Leon Romanovsky Reuse standard macros to cancel the netlink message in case of error. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/netlink.c | 31 +-- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index f0d482009c69..96057e722123 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -126,36 +126,23 @@ EXPORT_SYMBOL(rdma_nl_unregister); void *ibnl_put_msg(struct sk_buff *skb, struct nlmsghdr **nlh, int seq, int len, int client, int op, int flags) { - unsigned char *prev_tail; - - prev_tail = skb_tail_pointer(skb); - *nlh = nlmsg_put(skb, 0, seq, RDMA_NL_GET_TYPE(client, op), -len, flags); - if (!*nlh) - goto out_nlmsg_trim; - (*nlh)->nlmsg_len = skb_tail_pointer(skb) - prev_tail; + *nlh = nlmsg_put(skb, 0, seq, RDMA_NL_GET_TYPE(client, op), len, flags); + if (!*nlh) { + nlmsg_cancel(skb, *nlh); + return NULL; + } return nlmsg_data(*nlh); - -out_nlmsg_trim: - nlmsg_trim(skb, prev_tail); - return NULL; } EXPORT_SYMBOL(ibnl_put_msg); int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh, int len, void *data, int type) { - unsigned char *prev_tail; - - prev_tail = skb_tail_pointer(skb); - if (nla_put(skb, type, len, data)) - goto nla_put_failure; - nlh->nlmsg_len += skb_tail_pointer(skb) - prev_tail; + if (nla_put(skb, type, len, data)) { + nlmsg_cancel(skb, nlh); + return -EMSGSIZE; + } return 0; - -nla_put_failure: - nlmsg_trim(skb, prev_tail - nlh->nlmsg_len); - return -EMSGSIZE; } EXPORT_SYMBOL(ibnl_put_attr); -- 2.13.1
[PATCH v2] brcmfmac: Fix a memory leak in error handling path in 'brcmf_cfg80211_attach'
If 'wiphy_new()' fails, we leak 'ops'. Add a new label in the error handling path to free it in such a case. Cc: sta...@vger.kernel.org Fixes: 5c22fb85102a7 ("brcmfmac: add wowl gtk rekeying offload support") Signed-off-by: Christophe JAILLET --- v2: Add CC tag Change prefix --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index 2443c71a202f..032d823c53c2 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -6861,7 +6861,7 @@ struct brcmf_cfg80211_info *brcmf_cfg80211_attach(struct brcmf_pub *drvr, wiphy = wiphy_new(ops, sizeof(struct brcmf_cfg80211_info)); if (!wiphy) { brcmf_err("Could not allocate wiphy device\n"); - return NULL; + goto ops_out; } memcpy(wiphy->perm_addr, drvr->mac, ETH_ALEN); set_wiphy_dev(wiphy, busdev); @@ -7012,6 +7012,7 @@ struct brcmf_cfg80211_info *brcmf_cfg80211_attach(struct brcmf_pub *drvr, ifp->vif = NULL; wiphy_out: brcmf_free_wiphy(wiphy); +ops_out: kfree(ops); return NULL; } -- 2.11.0
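Editorial note: the fix adds one more label to a goto-based cleanup ladder, so that a failure before the wiphy exists still frees 'ops'. The userspace sketch below shows the same ladder shape. The toy_* names, the fail_step parameter and the free counter are hypothetical and exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_ops { int dummy; };
struct toy_wiphy { struct toy_ops *ops; };

/* Counts frees so the sketch can show that nothing is leaked. */
static int toy_frees;

static void toy_free(void *p)
{
	if (p) {
		toy_frees++;
		free(p);
	}
}

/* fail_step: 0 = succeed, 1 = fail allocating the wiphy, 2 = fail later. */
static struct toy_wiphy *toy_attach(int fail_step)
{
	struct toy_ops *ops;
	struct toy_wiphy *wiphy;

	ops = malloc(sizeof(*ops));
	if (!ops)
		return NULL;

	wiphy = fail_step == 1 ? NULL : malloc(sizeof(*wiphy));
	if (!wiphy)
		goto ops_out;		/* only ops exists so far */
	wiphy->ops = ops;

	if (fail_step == 2)
		goto wiphy_out;		/* undo wiphy, then fall through to ops */

	return wiphy;

wiphy_out:
	toy_free(wiphy);
ops_out:
	toy_free(ops);
	return NULL;
}

static void toy_attach_example(void)
{
	struct toy_wiphy *w;

	toy_frees = 0;
	assert(toy_attach(1) == NULL && toy_frees == 1);	/* ops freed */
	toy_frees = 0;
	assert(toy_attach(2) == NULL && toy_frees == 2);	/* wiphy, then ops */

	w = toy_attach(0);
	assert(w != NULL);
	toy_free(w->ops);
	toy_free(w);
}
```

Labels are ordered so that each one undoes one more step than the label below it; an early failure jumps further down the ladder and skips the cleanup of things that were never set up.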
Re: [PATCH net 1/2] xfrm6: Fix IPv6 payload_len in xfrm6_transport_finish
On Mon, Jun 19, 2017 at 11:33:20AM +0300, yoss...@mellanox.com wrote: > From: Yossi Kuperman > > IPv6 payload length indicates the size of the payload, including any > extension headers. In xfrm6_transport_finish, ipv6_hdr(skb)->payload_len > is set to the payload size only, regardless of the presence of any > extension headers. > > After ESP GRO transport mode decapsulation, ipv6_rcv trims the packet > according to the wrong payload_len, thus corrupting the packet. > > Set payload_len to account for extension headers as well. > > Fixes: 716062fd4c2f ("[IPSEC]: Merge most of the input path") > Signed-off-by: Yossi Kuperman > --- > net/ipv6/xfrm6_input.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c > index 08a807b..3ef5d91 100644 > --- a/net/ipv6/xfrm6_input.c > +++ b/net/ipv6/xfrm6_input.c > @@ -43,8 +43,8 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async) > return 1; > #endif > > - ipv6_hdr(skb)->payload_len = htons(skb->len); > __skb_push(skb, skb->data - skb_network_header(skb)); > + ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); You mentioned that the bug happens with ESP GRO. Does this bug also happen in the standard codepath? If not, you might better move the above line into the 'if' section below. > > if (xo && (xo->flags & XFRM_GRO)) { > skb_mac_header_rebuild(skb); > -- > 2.8.1
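Editorial note: the arithmetic behind the patched line can be shown in isolation. The IPv6 payload length field counts everything after the fixed 40-byte header, extension headers included. The helper below is a hypothetical userspace sketch in host byte order (the kernel code additionally applies htons()), and it assumes total_len is at least 40.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IPV6_FIXED_HDR_LEN 40u	/* size of the fixed IPv6 header */

/*
 * total_len counts from the start of the IPv6 header, i.e. what skb->len
 * holds once the data pointer has been pushed back to the network header.
 * Precondition: total_len >= TOY_IPV6_FIXED_HDR_LEN.
 */
static uint16_t toy_ipv6_payload_len(size_t total_len)
{
	return (uint16_t)(total_len - TOY_IPV6_FIXED_HDR_LEN);
}

static void toy_payload_len_example(void)
{
	size_t ext_hdr = 8;	/* one 8-byte extension header */
	size_t upper = 100;	/* upper-layer data */
	size_t total = TOY_IPV6_FIXED_HDR_LEN + ext_hdr + upper;

	/* Extension headers are part of the payload length. */
	assert(toy_ipv6_payload_len(total) == 108);
	/* Counting only the upper-layer data, as before the fix, gives 100. */
	assert(upper == 100);
}
```

With the old ordering, the length was taken before the push, so it covered only the data after the extension headers and came out 8 bytes short in this example.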
[PATCH net-next] qede: Fix compilation without QED_RDMA
From: Chad Dupuis When CONFIG_QED_RDMA isn't defined, we'd hit the following: /include/linux/qed/qede_rdma.h:84:19: warning: ‘qede_rdma_dev_add’ used but never defined [enabled by default] static inline int qede_rdma_dev_add(struct qede_dev *dev); Fixes: bbfcd1e8e167 ("qed*: Set rdma generic functions prefix") Signed-off-by: Chad Dupuis Signed-off-by: Yuval Mintz --- include/linux/qed/qede_rdma.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/qed/qede_rdma.h b/include/linux/qed/qede_rdma.h index 1348a16..9904617 100644 --- a/include/linux/qed/qede_rdma.h +++ b/include/linux/qed/qede_rdma.h @@ -81,7 +81,7 @@ void qede_rdma_dev_remove(struct qede_dev *dev); void qede_rdma_event_changeaddr(struct qede_dev *edr); #else -static inline int qede_rdma_dev_add(struct qede_dev *dev); +static inline int qede_rdma_dev_add(struct qede_dev *dev) { return 0; } -- 2.9.4
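Editorial note: the pattern being fixed is the usual "stub when the feature is compiled out" idiom. The stub must have a body, because a bare static inline declaration is never defined anywhere. A minimal sketch follows, with hypothetical names (TOY_RDMA plays the role of CONFIG_QED_RDMA).

```c
#include <assert.h>

struct toy_dev { int id; };

#ifdef TOY_RDMA
/* Feature enabled: the real function lives in another file. */
int toy_rdma_dev_add(struct toy_dev *dev);
#else
/* Feature disabled: an inline stub with a body that reports success. */
static inline int toy_rdma_dev_add(struct toy_dev *dev)
{
	(void)dev;
	return 0;
}
#endif

/* Callers use the function unconditionally and need no #ifdef. */
static int toy_probe(struct toy_dev *dev)
{
	return toy_rdma_dev_add(dev);
}

static void toy_probe_example(void)
{
	struct toy_dev d = { 1 };

	assert(toy_probe(&d) == 0);
}
```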
Re: [RFC 1/2] net-next: fix DSA flow_disection
On 20/06/17 23:52, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
>> On 20/06/17 16:01, Andrew Lunn wrote:
>>> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>>>> RPS and probably other kernel features are currently broken on some
>>>> if not all DSA devices. The root cause of this is that skb_hash will
>>>> call the flow_dissector.
>>>
>>> Hi John
>>>
>>> What is the call path when the flow_dissector is called? I'm wondering
>>> if we can defer this, and call it later, after the tag code has
>>> removed the header.
>>>
>>> Andrew
>
> Hi John
>
> I follow your logic of doing the hash early.
>
> Is there any value in including the DSA header in the hash? That might
> allow frames from different ingress ports to be spread over CPUs?
>
> Andrew

Hi Andrew,

adding the DSA header won't make any difference and would still require a
patch to the flow dissector.

John
Re: [PATCH NET] net/hns:bugfix of ethtool -t phy self_test
Hi, Andrew On 2017/6/21 11:13, Andrew Lunn wrote: > On Wed, Jun 21, 2017 at 10:03:29AM +0800, l00371289 wrote: >> Hi, Andrew >> >> On 2017/6/20 21:27, Andrew Lunn wrote: >>> On Tue, Jun 20, 2017 at 11:05:54AM +0800, l00371289 wrote: hi, Florian On 2017/6/20 5:00, Florian Fainelli wrote: > On 06/16/2017 02:24 AM, Lin Yun Sheng wrote: >> This patch fixes the phy loopback self_test failed issue. when >> Marvell Phy Module is loaded, it will powerdown fiber when doing >> phy loopback self test, which cause phy loopback self_test fail. >> >> Signed-off-by: Lin Yun Sheng >> --- >> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 16 ++-- >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c >> b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c >> index b8fab14..e95795b 100644 >> --- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c >> +++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c >> @@ -288,9 +288,15 @@ static int hns_nic_config_phy_loopback(struct >> phy_device *phy_dev, u8 en) > > The question really is, why is not this properly integrated into the PHY > driver and PHYLIB such that the only thing the Ethernet MAC driver has > to call is a function of the PHY driver putting it in self-test? Do you meaning calling phy_dev->drv->resume and phy_dev->drv->suspend function? >>> >>> No. Florian is saying you should add support for phylib and the >>> drivers to enable/disable loopback. >>> >>> The BMCR loopback bit is pretty much standardised. So you can >>> implement a genphy_loopback(phydev, enable), which most drivers can >>> use. Those that need there own can implement it in there driver. >> >> I tried to add the genphy_loopback support you mentioned, please look >> at it if that is what you mean. If Yes, I will try to send out a new patch. 
>> >> Best Regards >> Yinsheng Lin >> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index 1219eea..54fecad 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -1628,6 +1628,31 @@ static int gen10g_resume(struct phy_device *phydev) >> return 0; >> } >> >> +int genphy_loopback(struct phy_device *phydev, bool enable) >> +{ >> + int value; >> + >> + mutex_lock(&phydev->lock); > > Do you look at the other genphy_ functions? How many take the mutex? only genphy_suspend and genphy_resume take the mutex, I will have to remove the lock taking, right? > >> + if (enable) { >> + value = phy_read(phydev, MII_BMCR); >> + phy_write(phydev, MII_BMCR, value | BMCR_LOOPBACK); >> + } else { >> + value = phy_read(phydev, MII_BMCR); >> + phy_write(phydev, MII_BMCR, value & ~BMCR_LOOPBACK); >> + } >> + >> + mutex_unlock(&phydev->lock); >> + >> + return 0; >> +} >> +EXPORT_SYMBOL(genphy_loopback); >> + >> +static int gen10g_loopback(struct phy_device *phydev, bool enable) >> +{ >> + return 0; >> +} >> + >> static int __set_phy_supported(struct phy_device *phydev, u32 max_speed) >> { >> /* The default values for phydev->supported are provided by the PHY >> @@ -1874,6 +1899,7 @@ void phy_drivers_unregister(struct phy_driver *drv, >> int n) >> .read_status= genphy_read_status, >> .suspend= genphy_suspend, >> .resume = genphy_resume, >> + .set_loopback = genphy_loopback, >> }, { >> .phy_id = 0x, >> .phy_id_mask= 0x, >> @@ -1885,6 +1911,7 @@ void phy_drivers_unregister(struct phy_driver *drv, >> int n) >> .read_status= gen10g_read_status, >> .suspend= gen10g_suspend, >> .resume = gen10g_resume, >> + .set_loopback = gen10g_loopback, >> } }; >> >> static int __init phy_init(void) >> diff --git a/include/linux/phy.h b/include/linux/phy.h >> index e76e4ad..fc7a5c8 100644 >> --- a/include/linux/phy.h >> +++ b/include/linux/phy.h >> @@ -639,6 +639,7 @@ struct phy_driver { >> int (*set_tunable)(struct phy_device *dev, >> struct 
ethtool_tunable *tuna, >> const void *data); >> + int (*set_loopback(struct phy_device *dev, bool enable); > > Does this even compile? It looks to be missing a ) My mistake, I will make sure it will compile before sending it. > > Also, where is the exported function the MAC driver should call? Here is a example: drivers/net/ph/marvell.c marvell_set_loopback(struct phy_device *dev, bool enable) { /* do some device specific setting */ return genphy_loopback(dev, enable); } I don't know if this makes sense or not? Best Regards Yunsheng Lin > > Andrew > > . >
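Editorial note: the review points in this thread (no mutex inside the genphy_ helper, and the missing parenthesis in the callback declaration, which would read `int (*set_loopback)(struct phy_device *dev, bool enable);` once fixed) can be combined with one further assumption of this sketch: that register accessors may fail and their errors should be propagated. The code below is a userspace simulation with a hypothetical toy_phy register file, not the eventual kernel implementation. The loopback bit uses the same value as BMCR_LOOPBACK in linux/mii.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MII_BMCR      0x00		/* basic mode control register */
#define TOY_BMCR_LOOPBACK 0x4000	/* same bit as BMCR_LOOPBACK */

/* Hypothetical PHY with a tiny register file; accesses can fail. */
struct toy_phy {
	uint16_t regs[32];
	bool bus_error;
};

static int toy_phy_read(struct toy_phy *phy, unsigned int reg)
{
	if (phy->bus_error || reg >= 32)
		return -1;
	return phy->regs[reg];
}

static int toy_phy_write(struct toy_phy *phy, unsigned int reg, uint16_t val)
{
	if (phy->bus_error || reg >= 32)
		return -1;
	phy->regs[reg] = val;
	return 0;
}

/* Read-modify-write of the loopback bit; no locking, errors propagated. */
static int toy_phy_loopback(struct toy_phy *phy, bool enable)
{
	int value = toy_phy_read(phy, TOY_MII_BMCR);

	if (value < 0)
		return value;
	if (enable)
		value |= TOY_BMCR_LOOPBACK;
	else
		value &= ~TOY_BMCR_LOOPBACK;
	return toy_phy_write(phy, TOY_MII_BMCR, (uint16_t)value);
}

static void toy_phy_loopback_example(void)
{
	struct toy_phy phy = { { 0 }, false };

	phy.regs[TOY_MII_BMCR] = 0x1140;
	assert(toy_phy_loopback(&phy, true) == 0);
	assert(phy.regs[TOY_MII_BMCR] == 0x5140);	/* bit 14 set */
	assert(toy_phy_loopback(&phy, false) == 0);
	assert(phy.regs[TOY_MII_BMCR] == 0x1140);	/* other bits untouched */

	phy.bus_error = true;
	assert(toy_phy_loopback(&phy, true) < 0);	/* read error is returned */
}
```

A device-specific driver would do its own setup first and then call the generic helper, which is the layering the marvell example in the message describes.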
[PATCH net-next] r8152: correct the definition
Replace VLAN_HLEN and CRC_SIZE with ETH_FCS_LEN.

Signed-off-by: Hayes Wang
---
 drivers/net/usb/r8152.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 8bc4573..6cfffef 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -569,7 +569,6 @@ enum rtl_register_content {
 #define RTL8152_MAX_TX		4
 #define RTL8152_MAX_RX		10
 #define INTBUFSIZE		2
-#define CRC_SIZE		4
 #define TX_ALIGN		4
 #define RX_ALIGN		8
@@ -588,12 +587,13 @@ enum rtl_register_content {
 #define BYTE_EN_END_MASK	0xf0
 #define RTL8153_MAX_PACKET	9216 /* 9K */
-#define RTL8153_MAX_MTU		(RTL8153_MAX_PACKET - VLAN_ETH_HLEN - VLAN_HLEN)
-#define RTL8152_RMS		(VLAN_ETH_FRAME_LEN + VLAN_HLEN)
+#define RTL8153_MAX_MTU		(RTL8153_MAX_PACKET - VLAN_ETH_HLEN - \
+				 ETH_FCS_LEN)
+#define RTL8152_RMS		(VLAN_ETH_FRAME_LEN + ETH_FCS_LEN)
 #define RTL8153_RMS		RTL8153_MAX_PACKET
 #define RTL8152_TX_TIMEOUT	(5 * HZ)
 #define RTL8152_NAPI_WEIGHT	64
-#define rx_reserved_size(x)	((x) + VLAN_ETH_HLEN + CRC_SIZE + \
+#define rx_reserved_size(x)	((x) + VLAN_ETH_HLEN + ETH_FCS_LEN + \
 				 sizeof(struct rx_desc) + RX_ALIGN)
 /* rtl8152 flags */
@@ -770,7 +770,7 @@ static const int multicast_filter_limit = 32;
 static unsigned int agg_buf_sz = 16384;
 #define RTL_LIMITED_TSO_SIZE	(agg_buf_sz - sizeof(struct tx_desc) - \
-				 VLAN_ETH_HLEN - VLAN_HLEN)
+				 VLAN_ETH_HLEN - ETH_FCS_LEN)
 static int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size,
 			 void *data)
@@ -1928,7 +1928,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
 			if (urb->actual_length < len_used)
 				break;
-			pkt_len -= CRC_SIZE;
+			pkt_len -= ETH_FCS_LEN;
 			rx_data += sizeof(struct rx_desc);
 			skb = napi_alloc_skb(napi, pkt_len);
@@ -1952,7 +1952,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
 			}
 find_next_rx:
-			rx_data = rx_agg_align(rx_data + pkt_len + CRC_SIZE);
+			rx_data = rx_agg_align(rx_data + pkt_len + ETH_FCS_LEN);
 			rx_desc = (struct rx_desc *)rx_data;
 			len_used = (int)(rx_data - (u8 *)agg->head);
 			len_used += sizeof(struct rx_desc);
@@ -2242,7 +2242,7 @@ static void set_tx_qlen(struct r8152 *tp)
 {
 	struct net_device *netdev = tp->netdev;
-	tp->tx_qlen = agg_buf_sz / (netdev->mtu + VLAN_ETH_HLEN + VLAN_HLEN +
+	tp->tx_qlen = agg_buf_sz / (netdev->mtu + VLAN_ETH_HLEN + ETH_FCS_LEN +
 				    sizeof(struct tx_desc));
 }
@@ -3439,7 +3439,7 @@ static void r8153_first_init(struct r8152 *tp)
 	rtl_rx_vlan_en(tp, tp->netdev->features & NETIF_F_HW_VLAN_CTAG_RX);
-	ocp_data = tp->netdev->mtu + VLAN_ETH_HLEN + CRC_SIZE;
+	ocp_data = tp->netdev->mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
 	ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, ocp_data);
 	ocp_write_byte(tp, MCU_TYPE_PLA, PLA_MTPS, MTPS_JUMBO);
@@ -3489,7 +3489,7 @@ static void r8153_enter_oob(struct r8152 *tp)
 		usleep_range(1000, 2000);
 	}
-	ocp_data = tp->netdev->mtu + VLAN_ETH_HLEN + CRC_SIZE;
+	ocp_data = tp->netdev->mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
 	ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, ocp_data);
 	switch (tp->version) {
@@ -4957,7 +4957,7 @@ static int rtl8152_change_mtu(struct net_device *dev, int new_mtu)
 	dev->mtu = new_mtu;
 	if (netif_running(dev)) {
-		u32 rms = new_mtu + VLAN_ETH_HLEN + CRC_SIZE;
+		u32 rms = new_mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
 		ocp_write_word(tp, MCU_TYPE_PLA, PLA_RMS, rms);
-- 
2.7.4
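A note on why this rename should not change behaviour: the removed CRC_SIZE, VLAN_HLEN and ETH_FCS_LEN all evaluate to 4 bytes, so every expression touched by the patch keeps its value and only the name describing the trailing 4 bytes changes. A minimal user-space sketch of that arithmetic follows. The header constants are restated locally with the values from the kernel's if_ether.h and if_vlan.h; the rms_*() and max_mtu_*() helpers are hypothetical names introduced only for this illustration and are not part of the driver.

```c
#include <assert.h>

/* Values restated from the kernel headers so the sketch builds in user space. */
#define ETH_HLEN		14	/* Ethernet header without VLAN tag */
#define ETH_DATA_LEN		1500	/* default payload size */
#define ETH_FCS_LEN		4	/* frame check sequence */
#define VLAN_HLEN		4	/* 802.1Q tag */
#define VLAN_ETH_HLEN		(ETH_HLEN + VLAN_HLEN)		/* 18 */
#define VLAN_ETH_FRAME_LEN	(VLAN_ETH_HLEN + ETH_DATA_LEN)	/* 1518 */

#define CRC_SIZE		4	/* the driver-local define the patch removes */
#define RTL8153_MAX_PACKET	9216

/* Receive max size as computed before the patch (hypothetical helper). */
unsigned int rms_old(unsigned int mtu)
{
	assert(mtu <= RTL8153_MAX_PACKET);
	return mtu + VLAN_ETH_HLEN + CRC_SIZE;
}

/* Receive max size as computed after the patch (hypothetical helper). */
unsigned int rms_new(unsigned int mtu)
{
	assert(mtu <= RTL8153_MAX_PACKET);
	return mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
}

/* RTL8153_MAX_MTU before and after the patch. */
unsigned int max_mtu_old(void)
{
	return RTL8153_MAX_PACKET - VLAN_ETH_HLEN - VLAN_HLEN;
}

unsigned int max_mtu_new(void)
{
	return RTL8153_MAX_PACKET - VLAN_ETH_HLEN - ETH_FCS_LEN;
}
```

For a 1500-byte MTU both variants give 1522, and the maximum MTU stays at 9194, which is consistent with the patch being titled a correction of the definition rather than a functional fix.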
Re: [Patch net] ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
On 6/20/17 2:42 PM, Cong Wang wrote:
> In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
> I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
> unfortunately, as reported by jeffy, netdev_wait_allrefs()
> could rebroadcast NETDEV_UNREGISTER event until all refs are
> gone.
>
> We have to add an additional check to avoid this corner case.
> For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
> for dev_change_net_namespace(), dev->reg_state is
> NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.
>
> Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
> Reported-by: jeffy
> Cc: David Ahern
> Signed-off-by: Cong Wang
> ---
>  net/ipv6/route.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Acked-by: David Ahern
Re: [PATCH NET] net/hns:bugfix of ethtool -t phy self_test
On Wed, Jun 21, 2017 at 10:03:29AM +0800, l00371289 wrote:
> Hi, Andrew
>
> On 2017/6/20 21:27, Andrew Lunn wrote:
> > On Tue, Jun 20, 2017 at 11:05:54AM +0800, l00371289 wrote:
> >> hi, Florian
> >>
> >> On 2017/6/20 5:00, Florian Fainelli wrote:
> >>> On 06/16/2017 02:24 AM, Lin Yun Sheng wrote:
> >>>> This patch fixes the phy loopback self_test failed issue. when
> >>>> Marvell Phy Module is loaded, it will powerdown fiber when doing
> >>>> phy loopback self test, which cause phy loopback self_test fail.
> >>>>
> >>>> Signed-off-by: Lin Yun Sheng
> >>>> ---
> >>>>  drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 16 ++++++++++++++--
> >>>>  1 file changed, 14 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
> >>>> index b8fab14..e95795b 100644
> >>>> --- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
> >>>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
> >>>> @@ -288,9 +288,15 @@ static int hns_nic_config_phy_loopback(struct phy_device *phy_dev, u8 en)
> >>>
> >>> The question really is, why is not this properly integrated into the PHY
> >>> driver and PHYLIB such that the only thing the Ethernet MAC driver has
> >>> to call is a function of the PHY driver putting it in self-test?
> >> Do you meaning calling phy_dev->drv->resume and phy_dev->drv->suspend
> >> function?
> >
> > No. Florian is saying you should add support for phylib and the
> > drivers to enable/disable loopback.
> >
> > The BMCR loopback bit is pretty much standardised. So you can
> > implement a genphy_loopback(phydev, enable), which most drivers can
> > use. Those that need there own can implement it in there driver.
>
> I tried to add the genphy_loopback support you mentioned, please look
> at it if that is what you mean. If Yes, I will try to send out a new patch.
>
> Best Regards
> Yinsheng Lin
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 1219eea..54fecad 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1628,6 +1628,31 @@ static int gen10g_resume(struct phy_device *phydev)
>  	return 0;
>  }
>
> +int genphy_loopback(struct phy_device *phydev, bool enable)
> +{
> +	int value;
> +
> +	mutex_lock(&phydev->lock);

Do you look at the other genphy_ functions? How many take the mutex?

> +	if (enable) {
> +		value = phy_read(phydev, MII_BMCR);
> +		phy_write(phydev, MII_BMCR, value | BMCR_LOOPBACK);
> +	} else {
> +		value = phy_read(phydev, MII_BMCR);
> +		phy_write(phydev, MII_BMCR, value & ~BMCR_LOOPBACK);
> +	}
> +
> +	mutex_unlock(&phydev->lock);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(genphy_loopback);
> +
> +static int gen10g_loopback(struct phy_device *phydev, bool enable)
> +{
> +	return 0;
> +}
> +
>  static int __set_phy_supported(struct phy_device *phydev, u32 max_speed)
>  {
>  	/* The default values for phydev->supported are provided by the PHY
> @@ -1874,6 +1899,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n)
>  	.read_status	= genphy_read_status,
>  	.suspend	= genphy_suspend,
>  	.resume		= genphy_resume,
> +	.set_loopback	= genphy_loopback,
>  }, {
>  	.phy_id		= 0x,
>  	.phy_id_mask	= 0x,
> @@ -1885,6 +1911,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n)
>  	.read_status	= gen10g_read_status,
>  	.suspend	= gen10g_suspend,
>  	.resume		= gen10g_resume,
> +	.set_loopback	= gen10g_loopback,
>  } };
>
>  static int __init phy_init(void)
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index e76e4ad..fc7a5c8 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -639,6 +639,7 @@ struct phy_driver {
>  	int (*set_tunable)(struct phy_device *dev,
>  			    struct ethtool_tunable *tuna,
>  			    const void *data);
> +	int (*set_loopback(struct phy_device *dev, bool enable);

Does this even compile? It looks to be missing a )

Also, where is the exported function the MAC driver should call?

	Andrew
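To make the three review points concrete (no mutex inside the generic helper, the missing parenthesis in the callback declaration, and a single entry point for the MAC driver), here is a rough user-space sketch. It is only one possible reading of the comments above and not what was eventually merged: struct mock_phy, struct mock_phy_driver and every mock_* function are hypothetical stand-ins for phylib types and calls, and the MAC-facing wrapper name is invented for the example. Only MII_BMCR (register 0) and BMCR_LOOPBACK (0x4000) are the standard MII values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MII_BMCR	0x00	/* basic mode control register */
#define BMCR_LOOPBACK	0x4000	/* loopback enable bit */

/* Hypothetical stand-in for struct phy_device: just a register file. */
struct mock_phy {
	uint16_t regs[32];
};

/* Hypothetical stand-ins for phy_read()/phy_write(). */
static int mock_phy_read(struct mock_phy *phy, unsigned int reg)
{
	return reg < 32 ? phy->regs[reg] : -1;
}

static int mock_phy_write(struct mock_phy *phy, unsigned int reg, uint16_t val)
{
	if (reg >= 32)
		return -1;
	phy->regs[reg] = val;
	return 0;
}

/* Callback slot with the closing parenthesis after the name in place. */
struct mock_phy_driver {
	int (*set_loopback)(struct mock_phy *phy, bool enable);
};

/*
 * Generic helper: read-modify-write of the BMCR loopback bit.
 * It takes no lock itself and propagates read/write errors
 * instead of always returning 0.
 */
int mock_genphy_loopback(struct mock_phy *phy, bool enable)
{
	int value;

	assert(phy != NULL);

	value = mock_phy_read(phy, MII_BMCR);
	if (value < 0)
		return value;

	if (enable)
		value |= BMCR_LOOPBACK;
	else
		value &= ~BMCR_LOOPBACK;

	return mock_phy_write(phy, MII_BMCR, (uint16_t)value);
}

/*
 * Hypothetical single entry point for a MAC driver. In the kernel this
 * is where locking would live and where an unsupported driver would
 * report an error code such as -EOPNOTSUPP.
 */
int mock_phy_loopback(const struct mock_phy_driver *drv,
		      struct mock_phy *phy, bool enable)
{
	if (!drv || !drv->set_loopback)
		return -1;
	return drv->set_loopback(phy, enable);
}
```

With this split a MAC driver would call only the wrapper, and a PHY driver that needs something beyond the BMCR bit would supply its own set_loopback callback.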
Re: [Patch net] ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
Hi Cong Wang, oh, oops, i did misread. also, Tested-by: Jeffy Chen On 06/21/2017 11:01 AM, jeffy wrote: Hi Cong Wang, i don't know much about net core, maybe i'm misreading the code...but On 06/21/2017 02:42 AM, Cong Wang wrote: In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired, unfortunately, as reported by jeffy, netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event until all refs are gone. We have to add an additional check to avoid this corner case. For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED, for dev_change_net_namespace(), dev->reg_state is NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED. i saw we are calling NETDEV_REGISTER in these cases: 1/ register_netdevice_notifier: the paired unregister would be: a) normal unregister: rollback_registered_many b) the error path: register_netdevice_notifier->rollback jump label 2/ register_netdevice: the paired unregister would both be rollback_registered_many for normal/error cases 3/ dev_change_net_namespace: the paired unregister is the one right before the register notify i think we are handling all register notifies, but only unregister notify from rollback_registered_many now? 
you are checking for NETREG_UNREGISTERED, so only filter out the unregistered after rollback_registered_many, so that indeed covers all cases :) Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") Reported-by: jeffy Cc: David Ahern Signed-off-by: Cong Wang --- net/ipv6/route.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7cebd95..322bd62 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3722,7 +3722,11 @@ static int ip6_route_dev_notify(struct notifier_block *this, net->ipv6.ip6_blk_hole_entry->dst.dev = dev; net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev); #endif - } else if (event == NETDEV_UNREGISTER) { + } else if (event == NETDEV_UNREGISTER && +dev->reg_state != NETREG_UNREGISTERED) { +/* NETDEV_UNREGISTER could be fired for multiple times by + * netdev_wait_allrefs(). Make sure we only call this once. + */ in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev); #ifdef CONFIG_IPV6_MULTIPLE_TABLES in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev);
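The effect of the extra reg_state check can be modelled in a few lines of user-space C. This is a simplified sketch under stated assumptions, not the kernel code: struct mock_dev and mock_route_dev_notify() are hypothetical, and a single counter stands in for the in6_dev references that ip6_route_dev_notify() takes on NETDEV_REGISTER and drops on NETDEV_UNREGISTER.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified copies of the states and events named in the thread. */
enum mock_reg_state {
	NETREG_REGISTERED,
	NETREG_UNREGISTERING,
	NETREG_UNREGISTERED,
};

enum mock_event {
	NETDEV_REGISTER,
	NETDEV_UNREGISTER,
};

/* Hypothetical device: one counter models the held in6_dev references. */
struct mock_dev {
	enum mock_reg_state reg_state;
	int idev_refs;
};

/*
 * Take a reference on REGISTER, drop it on UNREGISTER, but ignore any
 * UNREGISTER that arrives once the device is already in
 * NETREG_UNREGISTERED, which is the state seen for rebroadcast events.
 */
void mock_route_dev_notify(struct mock_dev *dev, enum mock_event event)
{
	assert(dev != NULL);

	if (event == NETDEV_REGISTER)
		dev->idev_refs++;
	else if (event == NETDEV_UNREGISTER &&
		 dev->reg_state != NETREG_UNREGISTERED)
		dev->idev_refs--;
}
```

Without the reg_state test, each rebroadcast would drop the counter again and drive it negative, which corresponds to the unbalanced in6_dev_put() calls the patch is meant to prevent.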
Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+
On 6/20/17 5:41 PM, Ben Greear wrote:
> On 06/20/2017 11:05 AM, Michal Kubecek wrote:
>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
>>> On 06/14/2017 03:25 PM, David Ahern wrote:
>>>> On 6/14/17 4:23 PM, Ben Greear wrote:
>>>>> On 06/13/2017 07:27 PM, David Ahern wrote:
>>>>>
>>>>>> Let's try a targeted debug patch. See attached
>>>>>
>>>>> I had to change it to pr_err so it would go to our serial console
>>>>> since the system locked hard on crash,
>>>>> and that appears to be enough to change the timing where we can no
>>>>> longer reproduce the problem.
>>>>
>>>> ok, let's figure out which one is doing that. There are 3 debug
>>>> statements. I suspect fib6_del_route is the one setting the state to
>>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and
>>>> fib6_walk_continue and try again?
>>>
>>> We cannot reproduce with just that one printf in the kernel either. It
>>> must change the timing too much to trigger the bug.
>>
>> You might try trace_printk() which should have less impact (don't forget
>> to enable /proc/sys/kernel/ftrace_dump_on_oops).
>
> We cannot reproduce with trace_printk() either.

I think that suggests the walker state is set to FWS_U in fib6_del_route, and it is the FWS_U case in fib6_walk_continue that triggers the fault -- the null parent (pn = fn->parent). So we have the 2 areas of code that are interacting.

I'm on a road trip through the end of this week with little time to focus on this problem. I'll get back to you with another suggestion when I can.
Re: [Patch net] ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
Hi Cong Wang, i don't know much about net core, maybe i'm misreading the code...but On 06/21/2017 02:42 AM, Cong Wang wrote: In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired, unfortunately, as reported by jeffy, netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event until all refs are gone. We have to add an additional check to avoid this corner case. For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED, for dev_change_net_namespace(), dev->reg_state is NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED. i saw we are calling NETDEV_REGISTER in these cases: 1/ register_netdevice_notifier: the paired unregister would be: a) normal unregister: rollback_registered_many b) the error path: register_netdevice_notifier->rollback jump label 2/ register_netdevice: the paired unregister would both be rollback_registered_many for normal/error cases 3/ dev_change_net_namespace: the paired unregister is the one right before the register notify i think we are handling all register notifies, but only unregister notify from rollback_registered_many now? Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") Reported-by: jeffy Cc: David Ahern Signed-off-by: Cong Wang --- net/ipv6/route.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7cebd95..322bd62 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3722,7 +3722,11 @@ static int ip6_route_dev_notify(struct notifier_block *this, net->ipv6.ip6_blk_hole_entry->dst.dev = dev; net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev); #endif -} else if (event == NETDEV_UNREGISTER) { +} else if (event == NETDEV_UNREGISTER && + dev->reg_state != NETREG_UNREGISTERED) { + /* NETDEV_UNREGISTER could be fired for multiple times by +* netdev_wait_allrefs(). Make sure we only call this once. 
+*/ in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev); #ifdef CONFIG_IPV6_MULTIPLE_TABLES in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev);
Re: [PATCH] liquidio: stop using huge static buffer, save 4096k in .data
From: David Miller
Date: Tue, 20 Jun 2017 21:17:13 -0400

> From: Felix Manlunas
> Date: Tue, 20 Jun 2017 13:51:25 -0700
>
> > From: Derek Chickles
> > Date: Tue, 20 Jun 2017 13:15:34 -0700
> >
> >> > From: David Miller [mailto:da...@davemloft.net]
> >> > Sent: Tuesday, June 20, 2017 12:22 PM
> >> >
> >> > From: Denys Vlasenko
> >> > Date: Mon, 19 Jun 2017 21:50:52 +0200
> >> >
> >> > > Only compile-tested - I don't have the hardware.
> >> > >
> >> > > From code inspection, octeon_pci_write_core_mem() appears to be safe wrt
> >> > > unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned
> >> > > anyway.
> >> > >
> >> > > Signed-off-by: Denys Vlasenko
> >> >
> >> > Looks good to me but I'll let one of the liquidio guys review this first
> >> > before I apply it.
> >>
> >> Felix is going to try this out this week to confirm. Let's wait for his
> >> ack.
> >
> > This patch works. I tested it with a LiquidIO II adapter.
> >
> > ACK
>
> Please ACK patches in the standard way which is in the form of:
>
> 	Acked-by: David S. Miller
>
> This tag is recognized by tools and in particular the patchwork
> site where networking patches are maintained, automatically
> including your ACK into the patch I apply.

Acked-by: Felix Manlunas
Re: [PATCH] net: intel: e1000e: add check on e1e_wphy() return value
Gustavo,

The return value of ret_val seems used to check if the access to PHY/NVM got its semaphore, generally speaking, it is needed for every PHY access of this driver.

Reviewed-by: Ethan Zhao

On Wed, Jun 21, 2017 at 5:22 AM, Gustavo A. R. Silva wrote:
> Check return value from call to e1e_wphy(). This value is being
> checked during previous calls to function e1e_wphy() and it seems
> a check was missing here.
>
> Addresses-Coverity-ID: 1226905
> Signed-off-by: Gustavo A. R. Silva
> ---
>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index 68ea8b4..d6d4ed7 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -2437,6 +2437,8 @@ static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
>  		if (hw->phy.revision < 2) {
>  			e1000e_phy_sw_reset(hw);
>  			ret_val = e1e_wphy(hw, MII_BMCR, 0x3140);
> +			if (ret_val)
> +				return ret_val;
>  		}
>  	}
>
> --
> 2.5.0
>
[PATCH] PATCH v3 Convert multiple netdev_info messages to netdev_dbg
The bond_options.c file contains multiple netdev_info messages that clutter kernel output. This patches replaces these with netdev_dbg messages and adds a netdev_dbg for packets for slave. Signed-off-by: Michael J Dilmore Suggested-by: Joe Perches --- drivers/net/bonding/bond_options.c | 54 +++--- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index e3a9af6..9110e5b 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -722,13 +722,13 @@ static int bond_option_mode_set(struct bonding *bond, { if (!bond_mode_uses_arp(newval->value) && bond->params.arp_interval) { netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n", - newval->string); + newval->string); /* disable arp monitoring */ bond->params.arp_interval = 0; /* set miimon to default value */ bond->params.miimon = BOND_DEFAULT_MIIMON; netdev_dbg(bond->dev, "Setting MII monitoring interval to %d\n", - bond->params.miimon); + bond->params.miimon); } /* don't cache arp_validate between modes */ @@ -783,12 +783,12 @@ static int bond_option_active_slave_set(struct bonding *bond, if (new_active == old_active) { /* do nothing */ netdev_dbg(bond->dev, "%s is already the current active slave\n", - new_active->dev->name); + new_active->dev->name); } else { if (old_active && (new_active->link == BOND_LINK_UP) && bond_slave_is_up(new_active)) { netdev_dbg(bond->dev, "Setting %s as active slave\n", - new_active->dev->name); + new_active->dev->name); bond_change_active_slave(bond, new_active); } else { netdev_err(bond->dev, "Could not set %s as active slave; either %s is down or the link is down\n", @@ -811,14 +811,14 @@ static int bond_option_miimon_set(struct bonding *bond, const struct bond_opt_value *newval) { netdev_dbg(bond->dev, "Setting MII monitoring interval to %llu\n", - newval->value); + newval->value); bond->params.miimon = newval->value; if 
(bond->params.updelay) netdev_dbg(bond->dev, "Note: Updating updelay (to %d) since it is a multiple of the miimon value\n", - bond->params.updelay * bond->params.miimon); + bond->params.updelay * bond->params.miimon); if (bond->params.downdelay) netdev_dbg(bond->dev, "Note: Updating downdelay (to %d) since it is a multiple of the miimon value\n", - bond->params.downdelay * bond->params.miimon); + bond->params.downdelay * bond->params.miimon); if (newval->value && bond->params.arp_interval) { netdev_dbg(bond->dev, "MII monitoring cannot be used with ARP monitoring - disabling ARP monitoring...\n"); bond->params.arp_interval = 0; @@ -863,7 +863,7 @@ static int bond_option_updelay_set(struct bonding *bond, } bond->params.updelay = value / bond->params.miimon; netdev_dbg(bond->dev, "Setting up delay to %d\n", - bond->params.updelay * bond->params.miimon); + bond->params.updelay * bond->params.miimon); return 0; } @@ -885,7 +885,7 @@ static int bond_option_downdelay_set(struct bonding *bond, } bond->params.downdelay = value / bond->params.miimon; netdev_dbg(bond->dev, "Setting down delay to %d\n", - bond->params.downdelay * bond->params.miimon); + bond->params.downdelay * bond->params.miimon); return 0; } @@ -894,7 +894,7 @@ static int bond_option_use_carrier_set(struct bonding *bond, const struct bond_opt_value *newval) { netdev_dbg(bond->dev, "Setting use_carrier to %llu\n", - newval->value); + newval->value); bond->params.use_carrier = newval->value; return 0; @@ -908,7 +908,7 @@ static int bond_option_arp_interval_set(struct bonding *bond, const struct bond_opt_value *newval) { netdev_dbg(bond->dev, "Setting ARP monitoring interval to %llu\n", - newval->value); + newval->value); bond->params.arp_interv
Re: [PATCH NET] net/hns:bugfix of ethtool -t phy self_test
Hi, Andrew On 2017/6/20 21:28, Andrew Lunn wrote: The question really is, why is not this properly integrated into the PHY driver and PHYLIB such that the only thing the Ethernet MAC driver has to call is a function of the PHY driver putting it in self-test? >>> >>> This whole driver pokes various PHY registers, rather than use >>> phylib. And it does so without taking the PHY lock. >> I will consider using phylib as much as possible, thanks. >> >> It also assumes it >>> is a Marvell PHY and i don't see anywhere it actually verifies this. >> When it said Marvell Phy , I meant Marvell Phy with fibre support. >> I will send anther patch to only setting bit in Fiber Control when >> it is a Marvell Phy with fibre support. > > There is a lot more broken than just that. > > You really should remove all code which is accessing the PHY, and add > support to phylib and the drivers for what you need. > > Andrew After adding genphy_loopback support, I will try it. Thanks for pointing out. Best Regards Yunsheng Lin
[PATCH net-next] Add a tcp_filter hook before handle ack packet
From: Chenbo Feng

Currently in both the ipv4 and ipv6 code paths, the ack packet received when sk is in TCP_NEW_SYN_RECV state is not filtered by the socket filter or cgroup filter, since it is handled from tcp_child_process and never reaches the tcp_filter inside tcp_v4_rcv or tcp_v6_rcv. Adding a tcp_filter hook here makes sure all ingress tcp packets can be correctly filtered.

Signed-off-by: Chenbo Feng
---
 net/ipv4/tcp_ipv4.c | 2 ++
 net/ipv6/tcp_ipv6.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1dc8c44..ca3afb0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1675,6 +1675,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
 		}
 		if (nsk == sk) {
 			reqsk_put(req);
+		} else if (tcp_filter(sk, skb)) {
+			goto discard_and_relse;
 		} else if (tcp_child_process(sk, nsk, skb)) {
 			tcp_v4_send_reset(nsk, skb);
 			goto discard_and_relse;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 84ad502..565d89b 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1451,6 +1451,8 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 		if (nsk == sk) {
 			reqsk_put(req);
 			tcp_v6_restore_cb(skb);
+		} else if (tcp_filter(sk, skb)) {
+			goto discard_and_relse;
 		} else if (tcp_child_process(sk, nsk, skb)) {
 			tcp_v6_send_reset(nsk, skb);
 			goto discard_and_relse;
-- 
2.7.4
Re: [PATCH NET] net/hns:bugfix of ethtool -t phy self_test
Hi, Andrew On 2017/6/20 21:27, Andrew Lunn wrote: > On Tue, Jun 20, 2017 at 11:05:54AM +0800, l00371289 wrote: >> hi, Florian >> >> On 2017/6/20 5:00, Florian Fainelli wrote: >>> On 06/16/2017 02:24 AM, Lin Yun Sheng wrote: This patch fixes the phy loopback self_test failed issue. when Marvell Phy Module is loaded, it will powerdown fiber when doing phy loopback self test, which cause phy loopback self_test fail. Signed-off-by: Lin Yun Sheng --- drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c index b8fab14..e95795b 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c @@ -288,9 +288,15 @@ static int hns_nic_config_phy_loopback(struct phy_device *phy_dev, u8 en) >>> >>> The question really is, why is not this properly integrated into the PHY >>> driver and PHYLIB such that the only thing the Ethernet MAC driver has >>> to call is a function of the PHY driver putting it in self-test? >> Do you meaning calling phy_dev->drv->resume and phy_dev->drv->suspend >> function? > > No. Florian is saying you should add support for phylib and the > drivers to enable/disable loopback. > > The BMCR loopback bit is pretty much standardised. So you can > implement a genphy_loopback(phydev, enable), which most drivers can > use. Those that need there own can implement it in there driver. I tried to add the genphy_loopback support you mentioned, please look at it if that is what you mean. If Yes, I will try to send out a new patch. 
Best Regards Yinsheng Lin diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 1219eea..54fecad 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1628,6 +1628,31 @@ static int gen10g_resume(struct phy_device *phydev) return 0; } +int genphy_loopback(struct phy_device *phydev, bool enable) +{ + int value; + + mutex_lock(&phydev->lock); + + if (enable) { + value = phy_read(phydev, MII_BMCR); + phy_write(phydev, MII_BMCR, value | BMCR_LOOPBACK); + } else { + value = phy_read(phydev, MII_BMCR); + phy_write(phydev, MII_BMCR, value & ~BMCR_LOOPBACK); + } + + mutex_unlock(&phydev->lock); + + return 0; +} +EXPORT_SYMBOL(genphy_loopback); + +static int gen10g_loopback(struct phy_device *phydev, bool enable) +{ + return 0; +} + static int __set_phy_supported(struct phy_device *phydev, u32 max_speed) { /* The default values for phydev->supported are provided by the PHY @@ -1874,6 +1899,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n) .read_status= genphy_read_status, .suspend= genphy_suspend, .resume = genphy_resume, + .set_loopback = genphy_loopback, }, { .phy_id = 0x, .phy_id_mask= 0x, @@ -1885,6 +1911,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n) .read_status= gen10g_read_status, .suspend= gen10g_suspend, .resume = gen10g_resume, + .set_loopback = gen10g_loopback, } }; static int __init phy_init(void) diff --git a/include/linux/phy.h b/include/linux/phy.h index e76e4ad..fc7a5c8 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -639,6 +639,7 @@ struct phy_driver { int (*set_tunable)(struct phy_device *dev, struct ethtool_tunable *tuna, const void *data); + int (*set_loopback(struct phy_device *dev, bool enable); }; #define to_phy_driver(d) container_of(to_mdio_common_driver(d), \ struct phy_driver, mdiodrv)
linux-next: manual merge of the net-next tree with the pci tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: drivers/net/wireless/marvell/mwifiex/pcie.c between commit: c336cc0ee4eb ("PCI: Split ->reset_notify() method into ->reset_prepare() and ->reset_done()") from the pci tree and commit: 68efd0386988 ("mwifiex: pcie: stop setting/clearing 'surprise_removed'") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc drivers/net/wireless/marvell/mwifiex/pcie.c index 279adf124fc9,b53ecf1eddda.. --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@@ -361,48 -359,35 +361,46 @@@ static void mwifiex_pcie_reset_prepare( } mwifiex_dbg(adapter, INFO, - "%s: vendor=0x%4.04x device=0x%4.04x rev=%d %s\n", - __func__, pdev->vendor, pdev->device, - pdev->revision, - prepare ? "Pre-FLR" : "Post-FLR"); - - if (prepare) { - /* Kernel would be performing FLR after this notification. - * Cleanup all software without cleaning anything related to - * PCIe and HW. - */ - mwifiex_shutdown_sw(adapter); - clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags); - clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags); - } else { - /* Kernel stores and restores PCIe function context before and - * after performing FLR respectively. 
Reconfigure the software - * and firmware including firmware redownload - */ - ret = mwifiex_reinit_sw(adapter); - if (ret) { - dev_err(&pdev->dev, "reinit failed: %d\n", ret); - return; - } - } + "%s: vendor=0x%4.04x device=0x%4.04x rev=%d Pre-FLR\n", + __func__, pdev->vendor, pdev->device, pdev->revision); + + mwifiex_shutdown_sw(adapter); - adapter->surprise_removed = true; + clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags); + clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags); mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__); } -static const struct pci_error_handlers mwifiex_pcie_err_handler[] = { - { .reset_notify = mwifiex_pcie_reset_notify, }, +/* + * Kernel stores and restores PCIe function context before and after performing + * FLR respectively. Reconfigure the software and firmware including firmware + * redownload. + */ +static void mwifiex_pcie_reset_done(struct pci_dev *pdev) +{ + struct pcie_service_card *card = pci_get_drvdata(pdev); + struct mwifiex_adapter *adapter = card->adapter; + int ret; + + if (!adapter) { + dev_err(&pdev->dev, "%s: adapter structure is not valid\n", + __func__); + return; + } + + mwifiex_dbg(adapter, INFO, + "%s: vendor=0x%4.04x device=0x%4.04x rev=%d Post-FLR\n", + __func__, pdev->vendor, pdev->device, pdev->revision); + - adapter->surprise_removed = false; + ret = mwifiex_reinit_sw(adapter); + if (ret) + dev_err(&pdev->dev, "reinit failed: %d\n", ret); + else + mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__); +} + +static const struct pci_error_handlers mwifiex_pcie_err_handler = { + .reset_prepare = mwifiex_pcie_reset_prepare, + .reset_done = mwifiex_pcie_reset_done, }; #ifdef CONFIG_PM_SLEEP
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: net/core/rtnetlink.c between commit: db833d40ad32 ("rtnetlink: add IFLA_GROUP to ifla_policy") from the net tree and commit: 3d3ea5af5c0b ("rtnl: Add support for netdev event to link messages") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc net/core/rtnetlink.c index 467a2f4510a7,3aa57848a895.. --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@@ -1469,7 -1519,7 +1520,8 @@@ static const struct nla_policy ifla_pol [IFLA_LINK_NETNSID] = { .type = NLA_S32 }, [IFLA_PROTO_DOWN] = { .type = NLA_U8 }, [IFLA_XDP] = { .type = NLA_NESTED }, + [IFLA_GROUP]= { .type = NLA_U32 }, + [IFLA_EVENT]= { .type = NLA_U32 }, }; static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
[net-next 05/15] i40e: use dev_dbg instead of dev_info when warning about missing routine
From: Jacob Keller When searching for the vf_capability client routine, dev_info() was used, instead of the normal dev_dbg(). This causes the message to be displayed at standard log levels which can cause administrators to worry. Avoid this by using dev_dbg instead. Copyright updated to 2017. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_client.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c index 36f694ccdc09..1b1e2acbd07f 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_client.c +++ b/drivers/net/ethernet/intel/i40e/i40e_client.c @@ -1,7 +1,7 @@ /*** * * Intel Ethernet Controller XL710 Family Linux Driver - * Copyright(c) 2013 - 2015 Intel Corporation. + * Copyright(c) 2013 - 2017 Intel Corporation. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -273,8 +273,8 @@ int i40e_vf_client_capable(struct i40e_pf *pf, u32 vf_id) if (!cdev || !cdev->client) goto out; if (!cdev->client->ops || !cdev->client->ops->vf_capable) { - dev_info(&pf->pdev->dev, -"Cannot locate client instance VF capability routine\n"); + dev_dbg(&pf->pdev->dev, + "Cannot locate client instance VF capability routine\n"); goto out; } if (!test_bit(__I40E_CLIENT_INSTANCE_OPENED, &cdev->state)) -- 2.12.2
[net-next 08/15] i40e: Support firmware CEE DCB UP to TC map re-definition
From: Greg Bowers Changes parsing of FW 4.33 AQ command Get CEE DCBX OPER CFG (0x0A07). Change is required because FW now creates the oper_prio_tc nibbles reversed from those in the CEE Priority Group sub-TLV. This change will only apply to FW 4.33 as future FW versions will use a different function to parse the CEE data. Signed-off-by: Greg Bowers Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_dcb.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c index bf1d67e184f7..55079fe3ed63 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c +++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c @@ -620,14 +620,17 @@ static void i40e_cee_to_dcb_v1_config( /* CEE PG data to ETS config */ dcbcfg->etscfg.maxtcs = cee_cfg->oper_num_tc; + /* Note that the FW creates the oper_prio_tc nibbles reversed +* from those in the CEE Priority Group sub-TLV. +*/ for (i = 0; i < 4; i++) { tc = (u8)((cee_cfg->oper_prio_tc[i] & -I40E_CEE_PGID_PRIO_1_MASK) >> -I40E_CEE_PGID_PRIO_1_SHIFT); - dcbcfg->etscfg.prioritytable[i*2] = tc; - tc = (u8)((cee_cfg->oper_prio_tc[i] & I40E_CEE_PGID_PRIO_0_MASK) >> I40E_CEE_PGID_PRIO_0_SHIFT); + dcbcfg->etscfg.prioritytable[i * 2] = tc; + tc = (u8)((cee_cfg->oper_prio_tc[i] & +I40E_CEE_PGID_PRIO_1_MASK) >> +I40E_CEE_PGID_PRIO_1_SHIFT); dcbcfg->etscfg.prioritytable[i*2 + 1] = tc; } -- 2.12.2
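As an illustration of the nibble ordering this patch settles on, the sketch below unpacks four oper_prio_tc bytes into an eight-entry priority table, taking the low nibble for the even priority and the high nibble for the odd one. The PRIO_0/PRIO_1 masks and shifts are assumed values (low nibble and high nibble respectively) chosen for the example, and mock_cee_prio_to_table() is a hypothetical helper rather than driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: PRIO_0 in the low nibble, PRIO_1 in the high nibble. */
#define PRIO_0_SHIFT	0
#define PRIO_0_MASK	(0xFu << PRIO_0_SHIFT)
#define PRIO_1_SHIFT	4
#define PRIO_1_MASK	(0xFu << PRIO_1_SHIFT)

/*
 * Unpack 4 bytes of per-priority TC assignments into 8 table entries,
 * in the order used after the patch: entry 2*i comes from the PRIO_0
 * field and entry 2*i + 1 from the PRIO_1 field of byte i.
 */
void mock_cee_prio_to_table(const uint8_t oper_prio_tc[4], uint8_t table[8])
{
	size_t i;

	assert(oper_prio_tc != NULL && table != NULL);

	for (i = 0; i < 4; i++) {
		table[i * 2] = (uint8_t)((oper_prio_tc[i] & PRIO_0_MASK) >>
					 PRIO_0_SHIFT);
		table[i * 2 + 1] = (uint8_t)((oper_prio_tc[i] & PRIO_1_MASK) >>
					     PRIO_1_SHIFT);
	}
}
```

Before the patch the two assignments were the other way round, so with this assumed layout each pair of adjacent priorities would have had its TC values swapped.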
[net-next 13/15] i40e: clear only cause_ena bit
From: Shannon Nelson When disabling interrupts, we should only be clearing the CAUSE_ENA bit, not clearing the whole register. Clearing the whole register sets the NEXTQ_IDX field to 0 instead of 0x7ff which can confuse the Firmware in some reset sequences. Signed-off-by: Shannon Nelson Signed-off-by: Mitch Williams Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index b743eca879d5..5d82ff54c7b0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3588,14 +3588,24 @@ static void i40e_vsi_disable_irq(struct i40e_vsi *vsi) int base = vsi->base_vector; int i; + /* disable interrupt causation from each queue */ for (i = 0; i < vsi->num_queue_pairs; i++) { - wr32(hw, I40E_QINT_TQCTL(vsi->tx_rings[i]->reg_idx), 0); - wr32(hw, I40E_QINT_RQCTL(vsi->rx_rings[i]->reg_idx), 0); + u32 val; + + val = rd32(hw, I40E_QINT_TQCTL(vsi->tx_rings[i]->reg_idx)); + val &= ~I40E_QINT_TQCTL_CAUSE_ENA_MASK; + wr32(hw, I40E_QINT_TQCTL(vsi->tx_rings[i]->reg_idx), val); + + val = rd32(hw, I40E_QINT_RQCTL(vsi->rx_rings[i]->reg_idx)); + val &= ~I40E_QINT_RQCTL_CAUSE_ENA_MASK; + wr32(hw, I40E_QINT_RQCTL(vsi->rx_rings[i]->reg_idx), val); + if (!i40e_enabled_xdp_vsi(vsi)) continue; wr32(hw, I40E_QINT_TQCTL(vsi->xdp_rings[i]->reg_idx), 0); } + /* disable each interrupt */ if (pf->flags & I40E_FLAG_MSIX_ENABLED) { for (i = vsi->base_vector; i < (vsi->num_q_vectors + vsi->base_vector); i++) -- 2.12.2
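The read-modify-write pattern in the patch above can be tried outside the kernel with a fake register file. This is only a sketch: fake_regs, the local rd32()/wr32() helpers, and the bit positions chosen for CAUSE_ENA and NEXTQ_INDX are assumptions for illustration, not the driver's register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, chosen only so the example has something to
 * preserve; the real register layout lives in the driver headers.
 */
#define CAUSE_ENA_MASK   (1u << 30)
#define NEXTQ_INDX_SHIFT 16
#define NEXTQ_INDX_MASK  (0x7FFu << NEXTQ_INDX_SHIFT)

#define NUM_FAKE_REGS 4
static uint32_t fake_regs[NUM_FAKE_REGS];	/* stands in for device registers */

static uint32_t rd32(unsigned int reg)
{
	assert(reg < NUM_FAKE_REGS);
	return fake_regs[reg];
}

static void wr32(unsigned int reg, uint32_t val)
{
	assert(reg < NUM_FAKE_REGS);
	fake_regs[reg] = val;
}

/* Clear only the cause-enable bit; every other field keeps its value. */
void disable_cause(unsigned int reg)
{
	uint32_t val = rd32(reg);

	val &= ~CAUSE_ENA_MASK;
	wr32(reg, val);
}
```

Writing 0 instead would also zero the next-queue index field, which is the side effect the commit message says can confuse the firmware.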
[net-next 07/15] i40e: Fix potential out of bound array access
From: Sudheer Mogilappagari This is a fix for the static code analysis issue where dcbcfg->numapps could be greater than the size of the array (i.e. dcbcfg->app[I40E_DCBX_MAX_APPS]). The fix makes sure that the array is not accessed past the size of the array (i.e. I40E_DCBX_MAX_APPS). Copyright updated to 2017. Signed-off-by: Sudheer Mogilappagari Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_dcb.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c index 0fab3a9b51d9..bf1d67e184f7 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c +++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c @@ -1,7 +1,7 @@ /*** * * Intel Ethernet Controller XL710 Family Linux Driver - * Copyright(c) 2013 - 2014 Intel Corporation. + * Copyright(c) 2013 - 2017 Intel Corporation. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -390,6 +390,8 @@ static void i40e_parse_cee_app_tlv(struct i40e_cee_feat_tlv *tlv, if (!dcbcfg->numapps) return; + if (dcbcfg->numapps > I40E_DCBX_MAX_APPS) + dcbcfg->numapps = I40E_DCBX_MAX_APPS; for (i = 0; i < dcbcfg->numapps; i++) { u8 up, selector; -- 2.12.2
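As a rough illustration of the clamp in the patch above, the following sketch bounds an externally supplied element count before looping over a fixed-size array. MAX_APPS, struct dcb_config, struct app_entry and clamp_and_reset() are hypothetical names, and the array size is an arbitrary assumption rather than the value of I40E_DCBX_MAX_APPS.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_APPS 32	/* hypothetical capacity, for illustration only */

struct app_entry {
	unsigned char priority;
};

struct dcb_config {
	unsigned int numapps;	/* count taken from untrusted TLV data */
	struct app_entry app[MAX_APPS];
};

/* Clamp the count to the array capacity, then walk the entries.
 * Returns how many entries were visited.
 */
unsigned int clamp_and_reset(struct dcb_config *cfg)
{
	unsigned int visited = 0;

	assert(cfg != NULL);

	if (cfg->numapps > MAX_APPS)
		cfg->numapps = MAX_APPS;

	for (unsigned int i = 0; i < cfg->numapps; i++) {
		cfg->app[i].priority = 0;
		visited++;
	}
	return visited;
}
```

Clamping the stored count, rather than only the loop bound, also protects any later code that loops over numapps.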
[net-next 15/15] i40e: don't hold RTNL lock for the entire reset
From: Jacob Keller We recently refactored i40e_do_reset() and its friends to be able to hold the RTNL lock only for the portions that actually need to be protected. However, a separate refactoring added several new callers of these functions during the PCIe error recovery and suspend/resume cycles. When merging the changes together, it was not noticed that we could reduce the RTNL scope by letting the reset function handle the lock itself, as previously it was not possible. Fix this by replacing these call sites to indicate that the reset function should handle its own lock. This enables multiple PFs to reset or resume simultaneously without serializing the resets via the RTNL lock. The end result is that on systems with lots of PFs and VFs the resets don't stall waiting for each other to finish. It is probable that we can also do the same for i40e_do_reset_safe, but this author did not research that change carefully enough to be confident. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c | 27 +++ 1 file changed, 7 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index c4328b4bec95..2db93d3f6d23 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -6566,9 +6566,7 @@ static void i40e_reset_subtask(struct i40e_pf *pf) if (reset_flags && !test_bit(__I40E_DOWN, pf->state) && !test_bit(__I40E_CONFIG_BUSY, pf->state)) { - rtnl_lock(); - i40e_do_reset(pf, reset_flags, true); - rtnl_unlock(); + i40e_do_reset(pf, reset_flags, false); } } @@ -11906,11 +11904,8 @@ static pci_ers_result_t i40e_pci_error_detected(struct pci_dev *pdev, } /* shutdown all operations */ - if (!test_bit(__I40E_SUSPENDED, pf->state)) { - rtnl_lock(); - i40e_prep_for_reset(pf, true); - rtnl_unlock(); - } + if (!test_bit(__I40E_SUSPENDED, pf->state)) + i40e_prep_for_reset(pf, false); 
/* Request a slot reset */ return PCI_ERS_RESULT_NEED_RESET; @@ -11976,9 +11971,7 @@ static void i40e_pci_error_resume(struct pci_dev *pdev) if (test_bit(__I40E_SUSPENDED, pf->state)) return; - rtnl_lock(); - i40e_handle_reset_warning(pf, true); - rtnl_unlock(); + i40e_handle_reset_warning(pf, false); } /** @@ -12058,9 +12051,7 @@ static void i40e_shutdown(struct pci_dev *pdev) if (pf->wol_en && (pf->flags & I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE)) i40e_enable_mc_magic_wake(pf); - rtnl_lock(); - i40e_prep_for_reset(pf, true); - rtnl_unlock(); + i40e_prep_for_reset(pf, false); wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0)); @@ -12092,9 +12083,7 @@ static int i40e_suspend(struct pci_dev *pdev, pm_message_t state) if (pf->wol_en && (pf->flags & I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE)) i40e_enable_mc_magic_wake(pf); - rtnl_lock(); - i40e_prep_for_reset(pf, true); - rtnl_unlock(); + i40e_prep_for_reset(pf, false); wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0)); wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0)); @@ -12140,9 +12129,7 @@ static int i40e_resume(struct pci_dev *pdev) /* handling the reset will rebuild the device state */ if (test_and_clear_bit(__I40E_SUSPENDED, pf->state)) { clear_bit(__I40E_DOWN, pf->state); - rtnl_lock(); - i40e_reset_and_rebuild(pf, false, true); - rtnl_unlock(); + i40e_reset_and_rebuild(pf, false, false); } return 0; -- 2.12.2
[net-next 12/15] i40e: fix disabling overflow promiscuous mode
From: Alan Brady There exists a bug in which the driver does not correctly exit overflow promiscuous mode. This can occur if "too many" mac filters are added, putting the driver into overflow promiscuous mode, and the filters are then removed. When the failed filters are removed, the driver reports exiting overflow promiscuous mode which is correct, however traffic continues to be received as if in promiscuous mode still. The bug occurs because the conditional for toggling promiscuous mode was set to only execute when promiscuous mode was enabled and not when it was disabled as well. This patch fixes the conditional to correctly execute when promiscuous mode is toggled and not just enabled. Without this patch, the driver is unable to correctly exit overflow promiscuous mode. Signed-off-by: Alan Brady Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 8af6420826d1..b743eca879d5 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -2281,9 +2281,8 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) i40e_aq_str(hw, hw->aq.asq_last_status)); } } - if ((changed_flags & IFF_PROMISC) || - (promisc_changed && -test_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state))) { + + if ((changed_flags & IFF_PROMISC) || promisc_changed) { bool cur_promisc; cur_promisc = (!!(vsi->current_netdev_flags & IFF_PROMISC) || -- 2.12.2
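The effect of the changed conditional in the patch above can be checked with plain booleans. This is a simplified model, not driver code: old_condition() and new_condition() are hypothetical helpers, and their parameters stand in for (changed_flags & IFF_PROMISC), promisc_changed and the __I40E_VSI_OVERFLOW_PROMISC state bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-patch: a promiscuous change caused by filter overflow is only acted
 * on while the overflow state bit is still set.
 */
bool old_condition(bool iff_promisc_changed, bool promisc_changed,
		   bool overflow_promisc_set)
{
	return iff_promisc_changed ||
	       (promisc_changed && overflow_promisc_set);
}

/* Post-patch: any promiscuous change is acted on, whether the driver is
 * entering or leaving overflow promiscuous mode.
 */
bool new_condition(bool iff_promisc_changed, bool promisc_changed)
{
	return iff_promisc_changed || promisc_changed;
}
```

The interesting case is leaving overflow mode, where promisc_changed is true but the overflow bit is clear: the old condition evaluates to false and the hardware is never told to stop, while the new one evaluates to true.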
[net-next 10/15] i40e: genericize the partition bandwidth control
From: Shannon Nelson Partition bandwidth control is not in just one form of MFP (multi-function partitioning), so make the code more generic and be sure to nudge the Tx scheduler for all MFP. Copyright updated to 2017. Signed-off-by: Shannon Nelson Signed-off-by: Mitch Williams Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e.h | 13 + drivers/net/ethernet/intel/i40e/i40e_main.c | 41 ++--- 2 files changed, 26 insertions(+), 28 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 4250ab55a9f1..76395e695007 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -1,7 +1,7 @@ /*** * * Intel Ethernet Controller XL710 Family Linux Driver - * Copyright(c) 2013 - 2016 Intel Corporation. + * Copyright(c) 2013 - 2017 Intel Corporation. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -516,9 +516,8 @@ struct i40e_pf { bool ptp_tx; bool ptp_rx; u16 rss_table_size; /* HW RSS table size */ - /* These are only valid in NPAR modes */ - u32 npar_max_bw; - u32 npar_min_bw; + u32 max_bw; + u32 min_bw; u32 ioremap_len; u32 fd_inv; @@ -971,9 +970,9 @@ int i40e_ptp_get_ts_config(struct i40e_pf *pf, struct ifreq *ifr); void i40e_ptp_init(struct i40e_pf *pf); void i40e_ptp_stop(struct i40e_pf *pf); int i40e_is_vsi_uplink_mode_veb(struct i40e_vsi *vsi); -i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf); -i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf); -i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf); +i40e_status i40e_get_partition_bw_setting(struct i40e_pf *pf); +i40e_status i40e_set_partition_bw_setting(struct i40e_pf *pf); +i40e_status i40e_commit_partition_bw_setting(struct i40e_pf *pf); void i40e_print_link_message(struct i40e_vsi *vsi, bool isup); static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi) diff 
--git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 7e415bb5a7dc..8d7bd85933bb 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -1,7 +1,7 @@ /*** * * Intel Ethernet Controller XL710 Family Linux Driver - * Copyright(c) 2013 - 2016 Intel Corporation. + * Copyright(c) 2013 - 2017 Intel Corporation. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -8740,10 +8740,10 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count) } /** - * i40e_get_npar_bw_setting - Retrieve BW settings for this PF partition + * i40e_get_partition_bw_setting - Retrieve BW settings for this PF partition * @pf: board private structure **/ -i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf) +i40e_status i40e_get_partition_bw_setting(struct i40e_pf *pf) { i40e_status status; bool min_valid, max_valid; @@ -8754,27 +8754,27 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf) if (!status) { if (min_valid) - pf->npar_min_bw = min_bw; + pf->min_bw = min_bw; if (max_valid) - pf->npar_max_bw = max_bw; + pf->max_bw = max_bw; } return status; } /** - * i40e_set_npar_bw_setting - Set BW settings for this PF partition + * i40e_set_partition_bw_setting - Set BW settings for this PF partition * @pf: board private structure **/ -i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf) +i40e_status i40e_set_partition_bw_setting(struct i40e_pf *pf) { struct i40e_aqc_configure_partition_bw_data bw_data; i40e_status status; /* Set the valid bit for this PF */ bw_data.pf_valid_bits = cpu_to_le16(BIT(pf->hw.pf_id)); - bw_data.max_bw[pf->hw.pf_id] = pf->npar_max_bw & I40E_ALT_BW_VALUE_MASK; - bw_data.min_bw[pf->hw.pf_id] = pf->npar_min_bw & I40E_ALT_BW_VALUE_MASK; + bw_data.max_bw[pf->hw.pf_id] = pf->max_bw & I40E_ALT_BW_VALUE_MASK; + bw_data.min_bw[pf->hw.pf_id] = pf->min_bw & 
I40E_ALT_BW_VALUE_MASK; /* Set the new bandwidths */ status = i40e_aq_configure_partition_bw(&pf->hw, &bw_data, NULL); @@ -8783,10 +8783,10 @@ i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf) } /** - * i40e_commit_npar_bw_setting - Commit BW settings for this PF partition + * i40e_commit_partition_bw_setting - Commit BW settings for this PF partition * @pf: board private structure **/ -i40e_status i40e_commit_npar_bw_sett
[net-next 04/15] i40e/i40evf: update WOL and I40E_AQC_ADDR_VALID_MASK flags
From: Alice Michael Update a few flags related to FW interactions. Copyright updated to 2017. Signed-off-by: Alice Michael Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 4 ++-- drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h index 5eb04114e13f..5d5f422cbae5 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h @@ -1,7 +1,7 @@ /*** * * Intel Ethernet Controller XL710 Family Linux Driver - * Copyright(c) 2013 - 2016 Intel Corporation. + * Copyright(c) 2013 - 2017 Intel Corporation. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -531,7 +531,7 @@ struct i40e_aqc_mac_address_read { #define I40E_AQC_PORT_ADDR_VALID 0x40 #define I40E_AQC_WOL_ADDR_VALID0x80 #define I40E_AQC_MC_MAG_EN_VALID 0x100 -#define I40E_AQC_ADDR_VALID_MASK 0x1F0 +#define I40E_AQC_ADDR_VALID_MASK 0x3F0 u8 reserved[6]; __le32 addr_high; __le32 addr_low; diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h index 91d8786d386d..83e63e55c4b4 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h +++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h @@ -1,7 +1,7 @@ /*** * * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver - * Copyright(c) 2013 - 2016 Intel Corporation. + * Copyright(c) 2013 - 2017 Intel Corporation. 
* * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -528,7 +528,7 @@ struct i40e_aqc_mac_address_read { #define I40E_AQC_PORT_ADDR_VALID 0x40 #define I40E_AQC_WOL_ADDR_VALID0x80 #define I40E_AQC_MC_MAG_EN_VALID 0x100 -#define I40E_AQC_ADDR_VALID_MASK 0x1F0 +#define I40E_AQC_ADDR_VALID_MASK 0x3F0 u8 reserved[6]; __le32 addr_high; __le32 addr_low; @@ -586,6 +586,7 @@ struct i40e_aqc_set_wol_filter { __le16 cmd_flags; #define I40E_AQC_SET_WOL_FILTER0x8000 #define I40E_AQC_SET_WOL_FILTER_NO_TCO_WOL 0x4000 +#define I40E_AQC_SET_WOL_FILTER_WOL_PRESERVE_ON_PFR0x2000 #define I40E_AQC_SET_WOL_FILTER_ACTION_CLEAR 0 #define I40E_AQC_SET_WOL_FILTER_ACTION_SET 1 __le16 valid_flags; -- 2.12.2
[net-next 03/15] i40evf: assign num_active_queues inside i40evf_alloc_queues
From: Jacob Keller The variable num_active_queues represents the number of active queues we have for the device. We assign this pretty early in i40evf_init_subtask. Several code locations are written with loops over the tx_rings and rx_rings structures, which don't get allocated until i40evf_alloc_queues, and which get freed by i40evf_free_queues. These call sites were written under the assumption that tx_rings and rx_rings would always be allocated at least when num_active_queues is non-zero. Let's fix this by moving the assignment into the function where we allocate queues. We'll use a temporary variable for storage so that we don't assign the value in the adapter structure until after the rings have been set up. Finally, when we free the queues, we'll clear the value to ensure that we do not loop over the rings memory that no longer exists. This resolves a possible NULL pointer dereference in i40evf_get_ethtool_stats which could occur if the VF fails to recover from a reset, and then a user requests statistics.
Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40evf/i40evf_main.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c index 3a3ca965b242..7c213a347909 100644 --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c @@ -1198,6 +1198,7 @@ static void i40evf_free_queues(struct i40evf_adapter *adapter) { if (!adapter->vsi_res) return; + adapter->num_active_queues = 0; kfree(adapter->tx_rings); adapter->tx_rings = NULL; kfree(adapter->rx_rings); @@ -1214,18 +1215,22 @@ static void i40evf_free_queues(struct i40evf_adapter *adapter) **/ static int i40evf_alloc_queues(struct i40evf_adapter *adapter) { - int i; + int i, num_active_queues; + + num_active_queues = min_t(int, + adapter->vsi_res->num_queue_pairs, + (int)(num_online_cpus())); - adapter->tx_rings = kcalloc(adapter->num_active_queues, + adapter->tx_rings = kcalloc(num_active_queues, sizeof(struct i40e_ring), GFP_KERNEL); if (!adapter->tx_rings) goto err_out; - adapter->rx_rings = kcalloc(adapter->num_active_queues, + adapter->rx_rings = kcalloc(num_active_queues, sizeof(struct i40e_ring), GFP_KERNEL); if (!adapter->rx_rings) goto err_out; - for (i = 0; i < adapter->num_active_queues; i++) { + for (i = 0; i < num_active_queues; i++) { struct i40e_ring *tx_ring; struct i40e_ring *rx_ring; @@ -1247,6 +1252,8 @@ static int i40evf_alloc_queues(struct i40evf_adapter *adapter) rx_ring->rx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF); } + adapter->num_active_queues = num_active_queues; + return 0; err_out: @@ -2636,9 +2643,6 @@ static void i40evf_init_task(struct work_struct *work) adapter->watchdog_timer.data = (unsigned long)adapter; mod_timer(&adapter->watchdog_timer, jiffies + 1); - adapter->num_active_queues = min_t(int, - adapter->vsi_res->num_queue_pairs, - 
(int)(num_online_cpus())); adapter->tx_desc_count = I40EVF_DEFAULT_TXD; adapter->rx_desc_count = I40EVF_DEFAULT_RXD; err = i40evf_init_interrupt_scheme(adapter); -- 2.12.2
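The invariant this patch establishes (the count is non-zero only while the ring arrays exist) can be sketched in userspace C. The structure and function names below are simplified stand-ins rather than the driver's, calloc()/free() replace kcalloc()/kfree(), and the ring contents are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct ring {
	int count;
};

struct adapter {
	int num_active_queues;	/* non-zero only while the rings exist */
	struct ring *tx_rings;
	struct ring *rx_rings;
};

/* Clear the count first so no reader loops over freed memory. */
void free_queues(struct adapter *a)
{
	a->num_active_queues = 0;
	free(a->tx_rings);
	a->tx_rings = NULL;
	free(a->rx_rings);
	a->rx_rings = NULL;
}

/* Allocate using a local count; publish it only after setup succeeds. */
int alloc_queues(struct adapter *a, int wanted)
{
	int i, num_active_queues = wanted;

	a->tx_rings = calloc((size_t)num_active_queues, sizeof(*a->tx_rings));
	if (!a->tx_rings)
		goto err_out;
	a->rx_rings = calloc((size_t)num_active_queues, sizeof(*a->rx_rings));
	if (!a->rx_rings)
		goto err_out;

	for (i = 0; i < num_active_queues; i++) {
		a->tx_rings[i].count = 512;
		a->rx_rings[i].count = 512;
	}

	a->num_active_queues = num_active_queues;
	return 0;

err_out:
	free_queues(a);
	return -1;
}

/* A reader in the style of a statistics callback: safe after free_queues()
 * because the loop bound is zero whenever the arrays are gone.
 */
int total_tx_count(const struct adapter *a)
{
	int i, total = 0;

	for (i = 0; i < a->num_active_queues; i++)
		total += a->tx_rings[i].count;
	return total;
}
```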
[net-next 01/15] i40e: add XDP support for pass and drop actions
From: Björn Töpel This commit adds basic XDP support for i40e derived NICs. All XDP actions will end up in XDP_DROP. Signed-off-by: Björn Töpel Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e.h | 7 ++ drivers/net/ethernet/intel/i40e/i40e_main.c | 87 +++ drivers/net/ethernet/intel/i40e/i40e_txrx.c | 130 +--- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 1 + 4 files changed, 194 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 395ca94faf80..d3195b29d53c 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -645,6 +645,8 @@ struct i40e_vsi { u16 max_frame; u16 rx_buf_len; + struct bpf_prog *xdp_prog; + /* List of q_vectors allocated to this VSI */ struct i40e_q_vector **q_vectors; int num_q_vectors; @@ -972,4 +974,9 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf); i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf); i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf); void i40e_print_link_message(struct i40e_vsi *vsi, bool isup); + +static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi) +{ + return !!vsi->xdp_prog; +} #endif /* _I40E_H_ */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 98fb644a580e..89bbe32a5934 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -27,6 +27,7 @@ #include #include #include +#include /* Local includes */ #include "i40e.h" @@ -2396,6 +2397,18 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf) } /** + * i40e_max_xdp_frame_size - returns the maximum allowed frame size for XDP + * @vsi: the vsi + **/ +static int i40e_max_xdp_frame_size(struct i40e_vsi *vsi) +{ + if (PAGE_SIZE >= 8192 || (vsi->back->flags & I40E_FLAG_LEGACY_RX)) + return I40E_RXBUFFER_2048; + else + return I40E_RXBUFFER_3072; +} + +/** * 
i40e_change_mtu - NDO callback to change the Maximum Transfer Unit * @netdev: network interface device structure * @new_mtu: new value for maximum frame size @@ -2408,6 +2421,13 @@ static int i40e_change_mtu(struct net_device *netdev, int new_mtu) struct i40e_vsi *vsi = np->vsi; struct i40e_pf *pf = vsi->back; + if (i40e_enabled_xdp_vsi(vsi)) { + int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN; + + if (frame_size > i40e_max_xdp_frame_size(vsi)) + return -EINVAL; + } + netdev_info(netdev, "changing MTU from %d to %d\n", netdev->mtu, new_mtu); netdev->mtu = new_mtu; @@ -9311,6 +9331,72 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb, return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK); } +/** + * i40e_xdp_setup - add/remove an XDP program + * @vsi: VSI to changed + * @prog: XDP program + **/ +static int i40e_xdp_setup(struct i40e_vsi *vsi, + struct bpf_prog *prog) +{ + int frame_size = vsi->netdev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN; + struct i40e_pf *pf = vsi->back; + struct bpf_prog *old_prog; + bool need_reset; + int i; + + /* Don't allow frames that span over multiple buffers */ + if (frame_size > vsi->rx_buf_len) + return -EINVAL; + + if (!i40e_enabled_xdp_vsi(vsi) && !prog) + return 0; + + /* When turning XDP on->off/off->on we reset and rebuild the rings. 
*/ + need_reset = (i40e_enabled_xdp_vsi(vsi) != !!prog); + + if (need_reset) + i40e_prep_for_reset(pf, true); + + old_prog = xchg(&vsi->xdp_prog, prog); + + if (need_reset) + i40e_reset_and_rebuild(pf, true, true); + + for (i = 0; i < vsi->num_queue_pairs; i++) + WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog); + + if (old_prog) + bpf_prog_put(old_prog); + + return 0; +} + +/** + * i40e_xdp - implements ndo_xdp for i40e + * @dev: netdevice + * @xdp: XDP command + **/ +static int i40e_xdp(struct net_device *dev, + struct netdev_xdp *xdp) +{ + struct i40e_netdev_priv *np = netdev_priv(dev); + struct i40e_vsi *vsi = np->vsi; + + if (vsi->type != I40E_VSI_MAIN) + return -EINVAL; + + switch (xdp->command) { + case XDP_SETUP_PROG: + return i40e_xdp_setup(vsi, xdp->prog); + case XDP_QUERY_PROG: + xdp->prog_attached = i40e_enabled_xdp_vsi(vsi); + return 0; + default: + return -EINVAL; + } +} + static const struct net_device_ops i40e_netdev_ops = { .ndo_open = i40e_open, .ndo_stop
[net-next 09/15] i40e: Add message for unsupported MFP mode
From: Carolyn Wyborny This patch adds a check and message if the device is in MFP mode as changing RSS input set is not supported in MFP mode. Signed-off-by: Carolyn Wyborny Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index 9d3233c2c9cd..9692a5294fa3 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -2711,6 +2711,12 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, struct ethtool_rxnfc *nfc) u8 flow_pctype = 0; u64 i_set, i_setc; + if (pf->flags & I40E_FLAG_MFP_ENABLED) { + dev_err(&pf->pdev->dev, + "Change of RSS hash input set is not supported when MFP mode is enabled\n"); + return -EOPNOTSUPP; + } + /* RSS does not support anything other than hashing * to queues on src and dst IPs and ports */ -- 2.12.2
[net-next 14/15] i40e: Handle PE_CRITERR properly with IWARP enabled
From: Catherine Sullivan When IWARP is enabled, we weren't clearing the PE_CRITERR, just logging it and removing it from the mask. We need to do a core reset (CORER) to reset the PE_CRITERR register, so set the bit for that as we handle the interrupt. We should also be checking for the error against the PFINT_ICR0 register, and only need to clear it in the value getting written to PFINT_ICR0_ENA. Signed-off-by: Catherine Sullivan Signed-off-by: Mitch Williams Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 5d82ff54c7b0..c4328b4bec95 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3684,10 +3684,10 @@ static irqreturn_t i40e_intr(int irq, void *data) pf->sw_int_count++; if ((pf->flags & I40E_FLAG_IWARP_ENABLED) && - (ena_mask & I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK)) { + (icr0 & I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK)) { ena_mask &= ~I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK; - icr0 &= ~I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK; dev_dbg(&pf->pdev->dev, "cleared PE_CRITERR\n"); + set_bit(__I40E_CORE_RESET_REQUESTED, pf->state); } /* only q0 is used in MSI/Legacy mode, and none are used in MSIX */ -- 2.12.2
[net-next 02/15] i40e: add support for XDP_TX action
From: Björn Töpel This patch adds proper XDP_TX action support. For each Tx ring, an additional XDP Tx ring is allocated and setup. This version does the DMA mapping in the fast-path, which will penalize performance for IOMMU enabled systems. Further, debugfs support is not wired up for the XDP Tx rings. Signed-off-by: Björn Töpel Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e.h | 1 + drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 42 +++- drivers/net/ethernet/intel/i40e/i40e_main.c| 299 +++-- drivers/net/ethernet/intel/i40e/i40e_txrx.c| 118 +- drivers/net/ethernet/intel/i40e/i40e_txrx.h| 11 + 5 files changed, 384 insertions(+), 87 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index d3195b29d53c..4250ab55a9f1 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -629,6 +629,7 @@ struct i40e_vsi { /* These are containers of ring pointers, allocated at run-time */ struct i40e_ring **rx_rings; struct i40e_ring **tx_rings; + struct i40e_ring **xdp_rings; /* XDP Tx rings */ u32 active_filters; u32 promisc_threshold; diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c index 3d58762efbc0..9d3233c2c9cd 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c @@ -1299,6 +1299,17 @@ static void i40e_get_ringparam(struct net_device *netdev, ring->rx_jumbo_pending = 0; } +static bool i40e_active_tx_ring_index(struct i40e_vsi *vsi, u16 index) +{ + if (i40e_enabled_xdp_vsi(vsi)) { + return index < vsi->num_queue_pairs || + (index >= vsi->alloc_queue_pairs && +index < vsi->alloc_queue_pairs + vsi->num_queue_pairs); + } + + return index < vsi->num_queue_pairs; +} + static int i40e_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring) { @@ -1308,6 +1319,7 @@ static int i40e_set_ringparam(struct 
net_device *netdev, struct i40e_vsi *vsi = np->vsi; struct i40e_pf *pf = vsi->back; u32 new_rx_count, new_tx_count; + u16 tx_alloc_queue_pairs; int timeout = 50; int i, err = 0; @@ -1345,6 +1357,8 @@ static int i40e_set_ringparam(struct net_device *netdev, for (i = 0; i < vsi->num_queue_pairs; i++) { vsi->tx_rings[i]->count = new_tx_count; vsi->rx_rings[i]->count = new_rx_count; + if (i40e_enabled_xdp_vsi(vsi)) + vsi->xdp_rings[i]->count = new_tx_count; } goto done; } @@ -1354,20 +1368,24 @@ static int i40e_set_ringparam(struct net_device *netdev, * to the Tx and Rx ring structs. */ - /* alloc updated Tx resources */ + /* alloc updated Tx and XDP Tx resources */ + tx_alloc_queue_pairs = vsi->alloc_queue_pairs * + (i40e_enabled_xdp_vsi(vsi) ? 2 : 1); if (new_tx_count != vsi->tx_rings[0]->count) { netdev_info(netdev, "Changing Tx descriptor count from %d to %d.\n", vsi->tx_rings[0]->count, new_tx_count); - tx_rings = kcalloc(vsi->alloc_queue_pairs, + tx_rings = kcalloc(tx_alloc_queue_pairs, sizeof(struct i40e_ring), GFP_KERNEL); if (!tx_rings) { err = -ENOMEM; goto done; } - for (i = 0; i < vsi->num_queue_pairs; i++) { - /* clone ring and setup updated count */ + for (i = 0; i < tx_alloc_queue_pairs; i++) { + if (!i40e_active_tx_ring_index(vsi, i)) + continue; + tx_rings[i] = *vsi->tx_rings[i]; tx_rings[i].count = new_tx_count; /* the desc and bi pointers will be reallocated in the @@ -1379,6 +1397,8 @@ static int i40e_set_ringparam(struct net_device *netdev, if (err) { while (i) { i--; + if (!i40e_active_tx_ring_index(vsi, i)) + continue; i40e_free_tx_resources(&tx_rings[i]); } kfree(tx_rings); @@ -1446,9 +1466,11 @@ static int i40e_set_ringparam(struct net_device *netdev, i40e_down(vsi); if (tx_rings) { - for (i = 0; i < vsi->num_queue_pairs; i++) { -
[net-next 11/15] i40e: Add support for OEM firmware version
From: Filip Sadowski This patch adds support for OEM firmware version. If an OEM-specific adapter is detected, ethtool reports the OEM product version in the firmware version string instead of the etrack id. Signed-off-by: Filip Sadowski Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e.h | 48 - drivers/net/ethernet/intel/i40e/i40e_main.c | 47 2 files changed, 81 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 76395e695007..d616f698e155 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -103,6 +103,12 @@ (I40E_AQ_PHY_DEBUG_DISABLE_LINK_FW | \ I40E_AQ_PHY_DEBUG_DISABLE_ALL_LINK_FW) +#define I40E_OEM_EETRACK_ID0x +#define I40E_OEM_GEN_SHIFT 24 +#define I40E_OEM_SNAP_MASK 0x00ff +#define I40E_OEM_SNAP_SHIFT16 +#define I40E_OEM_RELEASE_MASK 0x + /* The values in here are decimal coded as hex as is the case in the NVM map*/ #define I40E_CURRENT_NVM_VERSION_HI0x2 #define I40E_CURRENT_NVM_VERSION_LO0x40 @@ -734,22 +740,36 @@ static inline char *i40e_nvm_version_str(struct i40e_hw *hw) { static char buf[32]; u32 full_ver; - u8 ver, patch; - u16 build; full_ver = hw->nvm.oem_ver; - ver = (u8)(full_ver >> I40E_OEM_VER_SHIFT); - build = (u16)((full_ver >> I40E_OEM_VER_BUILD_SHIFT) & -I40E_OEM_VER_BUILD_MASK); - patch = (u8)(full_ver & I40E_OEM_VER_PATCH_MASK); - - snprintf(buf, sizeof(buf), -"%x.%02x 0x%x %d.%d.%d", -(hw->nvm.version & I40E_NVM_VERSION_HI_MASK) >> - I40E_NVM_VERSION_HI_SHIFT, -(hw->nvm.version & I40E_NVM_VERSION_LO_MASK) >> - I40E_NVM_VERSION_LO_SHIFT, -hw->nvm.eetrack, ver, build, patch); + + if (hw->nvm.eetrack == I40E_OEM_EETRACK_ID) { + u8 gen, snap; + u16 release; + + gen = (u8)(full_ver >> I40E_OEM_GEN_SHIFT); + snap = (u8)((full_ver & I40E_OEM_SNAP_MASK) >> + I40E_OEM_SNAP_SHIFT); + release = (u16)(full_ver & I40E_OEM_RELEASE_MASK); + + snprintf(buf, sizeof(buf), "%x.%x.%x", gen, snap, release); 
+ } else { + u8 ver, patch; + u16 build; + + ver = (u8)(full_ver >> I40E_OEM_VER_SHIFT); + build = (u16)((full_ver >> I40E_OEM_VER_BUILD_SHIFT) & +I40E_OEM_VER_BUILD_MASK); + patch = (u8)(full_ver & I40E_OEM_VER_PATCH_MASK); + + snprintf(buf, sizeof(buf), +"%x.%02x 0x%x %d.%d.%d", +(hw->nvm.version & I40E_NVM_VERSION_HI_MASK) >> + I40E_NVM_VERSION_HI_SHIFT, +(hw->nvm.version & I40E_NVM_VERSION_LO_MASK) >> + I40E_NVM_VERSION_LO_SHIFT, +hw->nvm.eetrack, ver, build, patch); + } return buf; } diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 8d7bd85933bb..8af6420826d1 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -7108,6 +7108,51 @@ static void i40e_send_version(struct i40e_pf *pf) } /** + * i40e_get_oem_version - get OEM specific version information + * @hw: pointer to the hardware structure + **/ +static void i40e_get_oem_version(struct i40e_hw *hw) +{ + u16 block_offset = 0x; + u16 block_length = 0; + u16 capabilities = 0; + u16 gen_snap = 0; + u16 release = 0; + +#define I40E_SR_NVM_OEM_VERSION_PTR0x1B +#define I40E_NVM_OEM_LENGTH_OFFSET 0x00 +#define I40E_NVM_OEM_CAPABILITIES_OFFSET 0x01 +#define I40E_NVM_OEM_GEN_OFFSET0x02 +#define I40E_NVM_OEM_RELEASE_OFFSET0x03 +#define I40E_NVM_OEM_CAPABILITIES_MASK 0x000F +#define I40E_NVM_OEM_LENGTH3 + + /* Check if pointer to OEM version block is valid. */ + i40e_read_nvm_word(hw, I40E_SR_NVM_OEM_VERSION_PTR, &block_offset); + if (block_offset == 0x) + return; + + /* Check if OEM version block has correct length. */ + i40e_read_nvm_word(hw, block_offset + I40E_NVM_OEM_LENGTH_OFFSET, + &block_length); + if (block_length < I40E_NVM_OEM_LENGTH) + return; + + /* Check if OEM version format is as expected. */ + i40e_read_nvm_word(hw, block_offset + I40E_NVM_OEM_CAPABILITIES_OFFSET, + &capabilities); + if ((capabilities & I40E_NVM_OEM_CAPABILITIES_MASK) != 0) +
[net-next 06/15] i40e: comment that udp_port must be in host byte order
From: Jacob Keller The firmware expects the port number passed when setting up the UDP tunnel configuration to be in Little Endian format. The i40e_aq_add_udp_tunnel command byte swaps the value from host order to Little Endian. Since commit fe0b0cd97b4f ("i40e: send correct port number to AdminQ when enabling UDP tunnels") we've correctly sent the value in host order. Let's also add a comment to the function explaining that it must be in host order, as the port numbers are commonly stored as Big Endian values. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_common.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c index cbad4eba7ae7..8e082a946411 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_common.c +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c @@ -3614,11 +3614,15 @@ i40e_status i40e_aq_get_cee_dcb_config(struct i40e_hw *hw, /** * i40e_aq_add_udp_tunnel * @hw: pointer to the hw struct - * @udp_port: the UDP port to add + * @udp_port: the UDP port to add in Host byte order * @header_len: length of the tunneling header length in DWords * @protocol_index: protocol index type * @filter_index: pointer to filter index * @cmd_details: pointer to command details structure or NULL + * + * Note: Firmware expects the udp_port value to be in Little Endian format, + * and this function will call cpu_to_le16 to convert from Host byte order to + * Little Endian order. **/ i40e_status i40e_aq_add_udp_tunnel(struct i40e_hw *hw, u16 udp_port, u8 protocol_index, -- 2.12.2
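For readers following the byte-order argument, here is a minimal userspace sketch of the two conversions involved. It is not driver code: the helper names port_from_be_bytes() and port_to_le_bytes() are made up for illustration, and the byte-wise encoding stands in for what cpu_to_le16() does inside i40e_aq_add_udp_tunnel(). The point is only that a port taken from a packet header is Big Endian, the caller must pass it in host order, and the Little Endian layout is produced in one place.

```c
#include <stdint.h>

/* Hypothetical helper: read a port stored in network (Big Endian) order,
 * as it appears in a packet header, into a host-order value. */
uint16_t port_from_be_bytes(const uint8_t be[2])
{
	return (uint16_t)((be[0] << 8) | be[1]);
}

/* Hypothetical helper: lay out a host-order port in the Little Endian
 * format the firmware expects. In the driver this step is cpu_to_le16(). */
void port_to_le_bytes(uint16_t host_port, uint8_t le[2])
{
	le[0] = (uint8_t)(host_port & 0xff);	/* low byte first */
	le[1] = (uint8_t)(host_port >> 8);	/* high byte second */
}
```

Passing a Big Endian value straight into the second helper would swap the bytes on a Little Endian host, which is presumably the kind of mistake the new comment is meant to prevent.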
[net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-06-20
This series contains updates to i40e and i40evf only.

Björn adds additional XDP support for i40e, by adding pass and drop actions and XDP_TX action support.

Jake fixes a possible NULL pointer dereference in i40evf_get_ethtool_stats() which could occur if the VF fails to recover from a reset, and then a user requests statistics. Changed the use of dev_info() to dev_dbg() for vf_capability client routine so that the standard log is not spammed with this information which "might" cause administrators to worry. Also added more code comments to help explain why udp_port has to be in host byte order and to avoid future changes which may cause this to break. Fixed the holding of the RTNL lock for the entire reset routine, reduced the scope so that the reset function will handle its own lock, so that we do not have to wrap every reference to i40e_do_reset() with RTNL lock/unlock.

Alice updates flags related to firmware interactions for WoL and admin queue command address with the correct value.

Sudheer makes a fix to ensure that the array is not accessed past the size of the array.

Greg fixes the parsing of firmware 4.33 admin queue command "Get CEE DCBX PER CFG" because the firmware now creates the oper_prio_tc nibbles reversed from those in the CDD Priority Group sub-TLV.

Carolyn adds a check and message to let users know that when in MFP mode, changing RSS hash input set is not supported.

Shannon makes the partition bandwidth control more generic since it is not in just one form of multi-function partitioning (MFP). Also fixes a bug which was causing the firmware confusion in some reset sequences, when we were disabling interrupts and we were clearing the whole register. Instead we should only be clearing the CAUSE_ENA bit when disabling interrupts.

Filip adds support for OEM firmware version, so that if an OEM specific adapter is detected, ethtool reports the OEM product version in the firmware version string instead of etrack id.

Alan fixes a bug where the driver was not correctly exiting overflow promiscuous mode, which can happen if "too many" MAC filters are added, putting the driver into overflow promiscuous mode, and the filters are then removed. The bug occurs because the conditional for toggling promiscuous mode was only being executed when enabled and not when it was disabled.

The following are changes since commit f5c306470ed0a8f03ba7017f397da2555b5800d4:
  Merge tag 'mlx5-updates-2017-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (1):
  i40e: fix disabling overflow promiscuous mode

Alice Michael (1):
  i40e/i40evf: update WOL and I40E_AQC_ADDR_VALID_MASK flags

Björn Töpel (2):
  i40e: add XDP support for pass and drop actions
  i40e: add support for XDP_TX action

Carolyn Wyborny (1):
  i40e: Add message for unsupported MFP mode

Catherine Sullivan (1):
  i40e: Handle PE_CRITERR properly with IWARP enabled

Filip Sadowski (1):
  i40e: Add support for OEM firmware version

Greg Bowers (1):
  i40e: Support firmware CEE DCB UP to TC map re-definition

Jacob Keller (4):
  i40evf: assign num_active_queues inside i40evf_alloc_queues
  i40e: use dev_dbg instead of dev_info when warning about missing routine
  i40e: comment that udp_port must be in host byte order
  i40e: don't hold RTNL lock for the entire reset

Shannon Nelson (2):
  i40e: genericize the partition bandwidth control
  i40e: clear only cause_ena bit

Sudheer Mogilappagari (1):
  i40e: Fix potential out of bound array access

 drivers/net/ethernet/intel/i40e/i40e.h             |  69 ++-
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |   4 +-
 drivers/net/ethernet/intel/i40e/i40e_client.c      |   6 +-
 drivers/net/ethernet/intel/i40e/i40e_common.c      |   6 +-
 drivers/net/ethernet/intel/i40e/i40e_dcb.c         |  15 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  48 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 524 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 244 --
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |  12 +
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |   5 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  18 +-
 11 files changed, 746 insertions(+), 205 deletions(-)

-- 
2.12.2
Re: [PATCH] liquidio: stop using huge static buffer, save 4096k in .data
From: Felix Manlunas
Date: Tue, 20 Jun 2017 13:51:25 -0700

> From: Derek Chickles
> Date: Tue, 20 Jun 2017 13:15:34 -0700
>
>> > From: David Miller [mailto:da...@davemloft.net]
>> > Sent: Tuesday, June 20, 2017 12:22 PM
>> >
>> > From: Denys Vlasenko
>> > Date: Mon, 19 Jun 2017 21:50:52 +0200
>> >
>> > > Only compile-tested - I don't have the hardware.
>> > >
>> > > From code inspection, octeon_pci_write_core_mem() appears to be safe wrt
>> > > unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned
>> > > anyway.
>> > >
>> > > Signed-off-by: Denys Vlasenko
>> >
>> > Looks good to me but I'll let one of the liquidio guys review this first
>> > before I apply it.
>>
>> Felix is going to try this out this week to confirm. Let's wait for his ack.
>
> This patch works. I tested it with a LiquidIO II adapter.
>
> ACK

Please ACK patches in the standard way which is in the form of:

Acked-by: David S. Miller

This tag is recognized by tools and in particular the patchwork site
where networking patches are maintained, automatically including your
ACK into the patch I apply.
Re: [PATCH] [PATCH v2 net-next] bonding: Convert multiple netdev_info messages to netdev_dbg
On Tue, 2017-06-20 at 23:05 +0100, Michael J Dilmore wrote: > The bond_options.c file contains several netdev_info messages that clutter > kernel output. This patch changes all netdev_info messages > to netdev_dbg and adds a netdev debug for the packets per slave parameter. Hey Michael. You should realign the multiple-line statements to the open parentheses. cheers, Joe > diff --git a/drivers/net/bonding/bond_options.c > b/drivers/net/bonding/bond_options.c [] > @@ -721,13 +721,13 @@ static int bond_option_mode_set(struct bonding *bond, > const struct bond_opt_value *newval) > { > if (!bond_mode_uses_arp(newval->value) && bond->params.arp_interval) { > - netdev_info(bond->dev, "%s mode is incompatible with arp > monitoring, start mii monitoring\n", > + netdev_dbg(bond->dev, "%s mode is incompatible with arp > monitoring, start mii monitoring\n", > newval->string); Now all these are not aligned properly. etc...
Re: [PATCH 00/51] rtc: stop using rtc deprecated functions
On Wed, Jun 21, 2017 at 12:00:30AM +0200, Thomas Gleixner wrote:
> Yes, but there are still quite some issues to solve there:
>
> 1) How do you tell the system that it should apply the offset in the
>    first place, i.e at boot time before NTP or any other mechanism can
>    correct it?
>
> 2) Deal with creative vendors who have their own idea about the 'start
>    of the epoch'
>
> 3) Add the information of wraparound time to the rtc device which
>    needs to be filled in for each device. That way the rtc_***
>    accessor functions can deal with them whether they wrap in 2038 or
>    2100 or whatever.
>
> #3 is the simplest problem of them :)

Well, if there's additional non-volatile storage, you can store
additional information in there, but you still need the RTC subsystem
to be aware that the hardware is only 32-bit capable.

You'd also still need something along the lines I detailed (redefining
what dates the past 32-bit values indicate) to cope with the RTC being
set backwards after the machine thinks (possibly incorrectly) that the
date has jumped forwards.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
[PATCH] [PATCH v2 net-next] bonding: Convert multiple netdev_info messages to netdev_dbg
The bond_options.c file contains several netdev_info messages that clutter kernel output. This patch changes all netdev_info messages to netdev_dbg and adds a netdev debug for the packets per slave parameter. Suggested-by: Joe Perches Signed-off-by: Michael J Dilmore --- drivers/net/bonding/bond_options.c | 79 +++--- 1 file changed, 40 insertions(+), 39 deletions(-) diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 1bcbb89..e3a9af6 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -721,13 +721,13 @@ static int bond_option_mode_set(struct bonding *bond, const struct bond_opt_value *newval) { if (!bond_mode_uses_arp(newval->value) && bond->params.arp_interval) { - netdev_info(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n", + netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n", newval->string); /* disable arp monitoring */ bond->params.arp_interval = 0; /* set miimon to default value */ bond->params.miimon = BOND_DEFAULT_MIIMON; - netdev_info(bond->dev, "Setting MII monitoring interval to %d\n", + netdev_dbg(bond->dev, "Setting MII monitoring interval to %d\n", bond->params.miimon); } @@ -771,7 +771,7 @@ static int bond_option_active_slave_set(struct bonding *bond, block_netpoll_tx(); /* check to see if we are clearing active */ if (!slave_dev) { - netdev_info(bond->dev, "Clearing current active slave\n"); + netdev_dbg(bond->dev, "Clearing current active slave\n"); RCU_INIT_POINTER(bond->curr_active_slave, NULL); bond_select_active_slave(bond); } else { @@ -782,12 +782,12 @@ static int bond_option_active_slave_set(struct bonding *bond, if (new_active == old_active) { /* do nothing */ - netdev_info(bond->dev, "%s is already the current active slave\n", + netdev_dbg(bond->dev, "%s is already the current active slave\n", new_active->dev->name); } else { if (old_active && (new_active->link == BOND_LINK_UP) && 
bond_slave_is_up(new_active)) { - netdev_info(bond->dev, "Setting %s as active slave\n", + netdev_dbg(bond->dev, "Setting %s as active slave\n", new_active->dev->name); bond_change_active_slave(bond, new_active); } else { @@ -810,17 +810,17 @@ static int bond_option_active_slave_set(struct bonding *bond, static int bond_option_miimon_set(struct bonding *bond, const struct bond_opt_value *newval) { - netdev_info(bond->dev, "Setting MII monitoring interval to %llu\n", + netdev_dbg(bond->dev, "Setting MII monitoring interval to %llu\n", newval->value); bond->params.miimon = newval->value; if (bond->params.updelay) - netdev_info(bond->dev, "Note: Updating updelay (to %d) since it is a multiple of the miimon value\n", + netdev_dbg(bond->dev, "Note: Updating updelay (to %d) since it is a multiple of the miimon value\n", bond->params.updelay * bond->params.miimon); if (bond->params.downdelay) - netdev_info(bond->dev, "Note: Updating downdelay (to %d) since it is a multiple of the miimon value\n", + netdev_dbg(bond->dev, "Note: Updating downdelay (to %d) since it is a multiple of the miimon value\n", bond->params.downdelay * bond->params.miimon); if (newval->value && bond->params.arp_interval) { - netdev_info(bond->dev, "MII monitoring cannot be used with ARP monitoring - disabling ARP monitoring...\n"); + netdev_dbg(bond->dev, "MII monitoring cannot be used with ARP monitoring - disabling ARP monitoring...\n"); bond->params.arp_interval = 0; if (bond->params.arp_validate) bond->params.arp_validate = BOND_ARP_VALIDATE_NONE; @@ -862,7 +862,7 @@ static int bond_option_updelay_set(struct bonding *bond, bond->params.miimon); } bond->params.updelay = value / bond->params.miimon; - netdev_info(bond->dev, "Setting up delay to %d\n", + netdev_dbg(bond->dev, "Setting up delay to %d\n", bond->params.updelay * bond->params.miimon); return 0; @@ -884,7 +884,7 @@ static int bond_option_downdel
Re: [PATCH 00/51] rtc: stop using rtc deprecated functions
Hi!

> >> > This is it.
> >> > https://patchwork.kernel.org/patch/6219401/
> >>
> >> Thanks.
> >>
> >> Yes, that's argument against changing rtc _drivers_ for hardware that
> >> can not do better than 32bit. For generic code (such as 44/51 sysfs,
> >> 51/51 suspend test), the change still makes sense.
>
> What I had in mind when writing those patches was to remove the limitations
> coming from those functions usage, even more since they been marked has
> deprecated.
>
> I agree that will change nothing of hardware limitation but at least
> the limit will
> not come from the framework.

> > Yes, we agree on that but I won't cherry pick working patches from a 51
> > patches series.

Well, it would actually be nice for you to do the cherry picking.
That's something maintainers do, because it is hard for contributors
to guess a maintainer's taste.

Anyway, it looks like someone should go through all the RTC drivers
and document the limitations of each driver (the date in the future
when the hardware ceases to be useful). If Benjamin has time to do
that, I guess that removes all the objections to the series.

Regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH 00/51] rtc: stop using rtc deprecated functions
On Tue, 20 Jun 2017, Alexandre Belloni wrote: > On 20/06/2017 at 22:15:36 +0100, Russell King - ARM Linux wrote: > > On Tue, Jun 20, 2017 at 05:07:46PM +0200, Benjamin Gaignard wrote: > > > 2017-06-20 15:48 GMT+02:00 Alexandre Belloni > > > : > > > >> Yes, that's argument against changing rtc _drivers_ for hardware that > > > >> can not do better than 32bit. For generic code (such as 44/51 sysfs, > > > >> 51/51 suspend test), the change still makes sense. > > > > > > What I had in mind when writing those patches was to remove the > > > limitations > > > coming from those functions usage, even more since they been marked has > > > deprecated. > > > > I'd say that they should not be marked as deprecated. They're entirely > > appropriate for use with hardware that only supports a 32-bit > > representation of time. > > > > It's entirely reasonable to fix the ones that use other representations > > that exceed that, but for those which do not, we need to keep using the > > 32-bit versions. Doing so actually gives us _more_ flexibility in the > > future. > > > > Consider that at the moment, we define the 32-bit RTC representation to > > start at a well known epoch. We _could_ decide that when it wraps to > > 0x8000 seconds, we'll define the lower 0x4000 seconds to mean > > dates in the future - and keep rolling that forward each time we cross > > another 0x4000 seconds. Unless someone invents a real time machine, > > we shouldn't need to set a modern RTC back to 1970. > > > > I agree with that but not the android guys. They seem to mandate an RTC > that can store time from 01/01/1970. I don't know much more than that > because they never cared to explain why that was actually necessary > (apart from a laconic "this will result in a bad user experience") > > I think tglx had a plan for offsetting the time at some point so 32-bit > platform can pass 2038 properly. 
Yes, but there are still quite some issues to solve there:

  1) How do you tell the system that it should apply the offset in the
     first place, i.e at boot time before NTP or any other mechanism can
     correct it?

  2) Deal with creative vendors who have their own idea about the 'start
     of the epoch'

  3) Add the information of wraparound time to the rtc device which
     needs to be filled in for each device. That way the rtc_***
     accessor functions can deal with them whether they wrap in 2038 or
     2100 or whatever.

#3 is the simplest problem of them :)

> My opinion is that as long as userspace is not ready to handle those
> dates, it doesn't really matter because it is quite unlikely that
> anything will be able to continue running anyway.

That's a different story. Making the kernel y2038 ready in general is a
good thing. Whether userspace will be ready by then or not is completely
irrelevant.

Thanks,

	tglx
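To make the rolling-window idea from this thread concrete, here is a small userspace sketch of one way a 32-bit seconds counter could be widened to 64 bits. It is an illustration only, not anything the RTC core does: the function name rtc32_expand() and the window_start parameter (the earliest time the system is willing to believe, for example a build date or a value kept in non-volatile storage) are assumptions made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: interpret a raw 32-bit RTC value as the unique
 * 64-bit time in [window_start, window_start + 2^32) that matches it
 * modulo 2^32. Values "below" the window start are therefore read as
 * dates after the counter has wrapped.
 */
int64_t rtc32_expand(uint32_t raw, int64_t window_start)
{
	/* Distance from the window start, computed modulo 2^32. */
	uint32_t delta = raw - (uint32_t)window_start;

	return window_start + (int64_t)delta;
}
```

With a window start 100 seconds before the 32-bit wrap, a raw reading of 50 comes back as 2^32 + 50 rather than 50 seconds after the epoch. The sketch does not address the problem raised above of the RTC being set backwards after the window has moved forward.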
Re: [RFC 1/2] net-next: fix DSA flow_disection
> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
>
>
> On 20/06/17 16:01, Andrew Lunn wrote:
> >On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
> >>RPS and probably other kernel features are currently broken on some if not
> >>all DSA devices. The root cause of this that skb_hash will call the
> >>flow_disector.
> >Hi John
> >
> >What is the call path when the flow_disector is called? I'm wondering
> >if we can defer this, and call it later, after the tag code has
> >removed the header.
> >
> > Andrew

Hi John

I follow your logic of doing the hash early.

Is there any value in including the DSA header in the hash? That might
allow frames from different ingress ports to be spread over CPUs?

	Andrew
Re: [PATCH] net: phy: smsc: fix buffer overflow in memcpy
On Tue, Jun 20, 2017 at 10:40:46PM +0200, Arnd Bergmann wrote:
> The memcpy annotation triggers for a fixed-length buffer copy:
>
> In file included from /git/arm-soc/arch/arm64/include/asm/processor.h:30:0,
>                  from /git/arm-soc/arch/arm64/include/asm/spinlock.h:21,
>                  from /git/arm-soc/include/linux/spinlock.h:87,
>                  from /git/arm-soc/include/linux/seqlock.h:35,
>                  from /git/arm-soc/include/linux/time.h:5,
>                  from /git/arm-soc/include/linux/stat.h:21,
>                  from /git/arm-soc/include/linux/module.h:10,
>                  from /git/arm-soc/drivers/net/phy/smsc.c:20:
> In function 'memcpy',
>     inlined from 'smsc_get_strings' at
> /git/arm-soc/drivers/net/phy/smsc.c:166:3:
> /git/arm-soc/include/linux/string.h:309:4: error: call to '__read_overflow2'
> declared with attribute error: detected read beyond size of object passed as
> 2nd parameter
>
> Using strncpy instead of memcpy should do the right thing here.

Hi Arnd

You will find this pattern in a number of phy drivers:

bcm-phy-lib.c:	memcpy(data + i * ETH_GSTRING_LEN,
marvell.c:	memcpy(data + i * ETH_GSTRING_LEN,
micrel.c:	memcpy(data + i * ETH_GSTRING_LEN,
smsc.c:		memcpy(data + i * ETH_GSTRING_LEN,

They probably all need the same fix.

	Andrew
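For anyone wondering why memcpy trips the check while strncpy does not, here is a minimal userspace sketch of the pattern. It is not the driver code: the hw_stat table and get_strings() are hypothetical stand-ins, and GSTRING_LEN simply mirrors the 32-byte ETH_GSTRING_LEN. A memcpy of GSTRING_LEN bytes from a short string literal reads past the end of the literal, whereas strncpy stops reading at the terminating NUL and zero-fills the rest of the slot.

```c
#include <string.h>

#define GSTRING_LEN 32	/* same size as the kernel's ETH_GSTRING_LEN */

/* Hypothetical stat table; only the name matters for this sketch. */
struct hw_stat {
	const char *string;
};

static const struct hw_stat hw_stats[] = {
	{ "phy_symbol_errors" },
	{ "phy_rx_errors" },
};

#define NUM_STATS (sizeof(hw_stats) / sizeof(hw_stats[0]))

/* Fill one GSTRING_LEN-sized slot per stat name. */
void get_strings(char *data)
{
	size_t i;

	for (i = 0; i < NUM_STATS; i++)
		/* Reads only up to the NUL, then pads the slot with zeros. */
		strncpy(data + i * GSTRING_LEN, hw_stats[i].string,
			GSTRING_LEN);
}
```

Note that strncpy leaves a slot unterminated when the name is GSTRING_LEN characters or longer, so this sketch assumes every name is shorter than that.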
Re: [PATCH v2] net/phy: micrel: configure intterupts after autoneg workaround
On 06/20/2017 10:48 AM, Zach Brown wrote: > The commit ("net/phy: micrel: Add workaround for bad autoneg") fixes an > autoneg failure case by resetting the hardware. This turns off > intterupts. Things will work themselves out if the phy polls, as it will > figure out it's state during a poll. However if the phy uses only > intterupts, the phy will stall, since interrupts are off. This patch > fixes the issue by calling config_intr after resetting the phy. > > Fixes: d2fd719bcb0e ("net/phy: micrel: Add workaround for bad autoneg ") > Signed-off-by: Zach Brown Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH v2] net/phy: micrel: configure intterupts after autoneg workaround
On Tue, Jun 20, 2017 at 12:48:11PM -0500, Zach Brown wrote: > The commit ("net/phy: micrel: Add workaround for bad autoneg") fixes an > autoneg failure case by resetting the hardware. This turns off > intterupts. Things will work themselves out if the phy polls, as it will > figure out it's state during a poll. However if the phy uses only > intterupts, the phy will stall, since interrupts are off. This patch > fixes the issue by calling config_intr after resetting the phy. > > Fixes: d2fd719bcb0e ("net/phy: micrel: Add workaround for bad autoneg ") > Signed-off-by: Zach Brown Reviewed-by: Andrew Lunn Andrew
Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+
On 06/20/2017 11:05 AM, Michal Kubecek wrote:
On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
On 06/14/2017 03:25 PM, David Ahern wrote:
On 6/14/17 4:23 PM, Ben Greear wrote:
On 06/13/2017 07:27 PM, David Ahern wrote:

Let's try a targeted debug patch. See attached

I had to change it to pr_err so it would go to our serial console
since the system locked hard on crash, and that appears to be enough
to change the timing where we can no longer reproduce the problem.

ok, let's figure out which one is doing that. There are 3 debug
statements. I suspect fib6_del_route is the one setting the state to
FWS_U. Can you remove the debug prints in fib6_repair_tree and
fib6_walk_continue and try again?

We cannot reproduce with just that one printf in the kernel either. It
must change the timing too much to trigger the bug.

You might try trace_printk() which should have less impact (don't forget
to enable /proc/sys/kernel/ftrace_dump_on_oops).

We cannot reproduce with trace_printk() either.

Thanks,
Ben

Michal Kubecek

-- 
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
[PATCH net-next v2 06/12] nfp: add stats and xmit helpers for representors
Provide helpers for stats and xmit on representor netdevs. Parts based on work by Bert van Leeuwen, Benjamin LaHaise and Jakub Kicinski. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 199 +- drivers/net/ethernet/netronome/nfp/nfp_net_repr.h | 28 +++ 2 files changed, 226 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c index 8e02f843ae92..44adcc5df11e 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -32,15 +32,198 @@ */ #include +#include #include #include #include "nfpcore/nfp_cpp.h" #include "nfp_app.h" #include "nfp_main.h" +#include "nfp_net_ctrl.h" #include "nfp_net_repr.h" #include "nfp_port.h" +static void +nfp_repr_inc_tx_stats(struct net_device *netdev, unsigned int len, + int tx_status) +{ + struct nfp_repr *repr = netdev_priv(netdev); + struct nfp_repr_pcpu_stats *stats; + + if (unlikely(tx_status != NET_XMIT_SUCCESS && +tx_status != NET_XMIT_CN)) { + this_cpu_inc(repr->stats->tx_drops); + return; + } + + stats = this_cpu_ptr(repr->stats); + u64_stats_update_begin(&stats->syncp); + stats->tx_packets++; + stats->tx_bytes += len; + u64_stats_update_end(&stats->syncp); +} + +void nfp_repr_inc_rx_stats(struct net_device *netdev, unsigned int len) +{ + struct nfp_repr *repr = netdev_priv(netdev); + struct nfp_repr_pcpu_stats *stats; + + stats = this_cpu_ptr(repr->stats); + u64_stats_update_begin(&stats->syncp); + stats->rx_packets++; + stats->rx_bytes += len; + u64_stats_update_end(&stats->syncp); +} + +static void +nfp_repr_phy_port_get_stats64(const struct nfp_app *app, u8 phy_port, + struct rtnl_link_stats64 *stats) +{ + u8 __iomem *mem; + + mem = app->pf->mac_stats_mem + phy_port * NFP_MAC_STATS_SIZE; + + /* TX and RX stats are flipped as we are returning the stats as seen +* at the switch port corresponding to the phys port. 
+*/ + stats->tx_packets = readq(mem + NFP_MAC_STATS_RX_FRAMES_RECEIVED_OK); + stats->tx_bytes = readq(mem + NFP_MAC_STATS_RX_IN_OCTETS); + stats->tx_dropped = readq(mem + NFP_MAC_STATS_RX_IN_ERRORS); + + stats->rx_packets = readq(mem + NFP_MAC_STATS_TX_FRAMES_TRANSMITTED_OK); + stats->rx_bytes = readq(mem + NFP_MAC_STATS_TX_OUT_OCTETS); + stats->rx_dropped = readq(mem + NFP_MAC_STATS_TX_OUT_ERRORS); +} + +static void +nfp_repr_vf_get_stats64(const struct nfp_app *app, u8 vf, + struct rtnl_link_stats64 *stats) +{ + u8 __iomem *mem; + + mem = app->pf->vf_cfg_mem + vf * NFP_NET_CFG_BAR_SZ; + + /* TX and RX stats are flipped as we are returning the stats as seen +* at the switch port corresponding to the VF. +*/ + stats->tx_packets = readq(mem + NFP_NET_CFG_STATS_RX_FRAMES); + stats->tx_bytes = readq(mem + NFP_NET_CFG_STATS_RX_OCTETS); + stats->tx_dropped = readq(mem + NFP_NET_CFG_STATS_RX_DISCARDS); + + stats->rx_packets = readq(mem + NFP_NET_CFG_STATS_TX_FRAMES); + stats->rx_bytes = readq(mem + NFP_NET_CFG_STATS_TX_OCTETS); + stats->rx_dropped = readq(mem + NFP_NET_CFG_STATS_TX_DISCARDS); +} + +static void +nfp_repr_pf_get_stats64(const struct nfp_app *app, u8 pf, + struct rtnl_link_stats64 *stats) +{ + u8 __iomem *mem; + + if (pf) + return; + + mem = nfp_cpp_area_iomem(app->pf->data_vnic_bar); + + stats->tx_packets = readq(mem + NFP_NET_CFG_STATS_RX_FRAMES); + stats->tx_bytes = readq(mem + NFP_NET_CFG_STATS_RX_OCTETS); + stats->tx_dropped = readq(mem + NFP_NET_CFG_STATS_RX_DISCARDS); + + stats->rx_packets = readq(mem + NFP_NET_CFG_STATS_TX_FRAMES); + stats->rx_bytes = readq(mem + NFP_NET_CFG_STATS_TX_OCTETS); + stats->rx_dropped = readq(mem + NFP_NET_CFG_STATS_TX_DISCARDS); +} + +void +nfp_repr_get_stats64(const struct nfp_app *app, enum nfp_repr_type type, +u8 port, struct rtnl_link_stats64 *stats) +{ + switch (type) { + case NFP_REPR_TYPE_PHYS_PORT: + nfp_repr_phy_port_get_stats64(app, port, stats); + break; + case NFP_REPR_TYPE_PF: + nfp_repr_pf_get_stats64(app, 
port, stats); + break; + case NFP_REPR_TYPE_VF: + nfp_repr_vf_get_stats64(app, port, stats); + default: + break; + } +} + +bool +nfp_repr_has_offload_stats(const struct net_device *dev, int attr_id) +{ + switch (attr_id) { + case IFLA_OFFLOAD_XSTATS_CPU_HIT: + return true; + } + + return false; +} + +static int +nfp_repr_get_hos
[PATCH net-next v2 12/12] nfp: add VF and PF representors to flower app
Initialise VF and PF representors in flower app. Based in part on work by Benjamin LaHaise, Bert van Leeuwen and Jakub Kicinski. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/flower/main.c | 86 +++- 1 file changed, 84 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c index d1d905727c54..582b7be3e219 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/main.c +++ b/drivers/net/ethernet/netronome/nfp/flower/main.c @@ -149,15 +149,81 @@ static const struct net_device_ops nfp_flower_repr_netdev_ops = { .ndo_get_offload_stats = nfp_repr_get_offload_stats, }; +static void nfp_flower_sriov_disable(struct nfp_app *app) +{ + nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_VF); +} + +static int +nfp_flower_spawn_vnic_reprs(struct nfp_app *app, + enum nfp_flower_cmsg_port_vnic_type vnic_type, + enum nfp_repr_type repr_type, unsigned int cnt) +{ + u8 nfp_pcie = nfp_cppcore_pcie_unit(app->pf->cpp); + struct nfp_flower_priv *priv = app->priv; + struct nfp_reprs *reprs, *old_reprs; + const u8 queue = 0; + int i, err; + + reprs = nfp_reprs_alloc(cnt); + if (!reprs) + return -ENOMEM; + + for (i = 0; i < cnt; i++) { + u32 port_id; + + reprs->reprs[i] = nfp_repr_alloc(app); + if (!reprs->reprs[i]) { + err = -ENOMEM; + goto err_reprs_clean; + } + + SET_NETDEV_DEV(reprs->reprs[i], &priv->nn->pdev->dev); + eth_hw_addr_inherit(reprs->reprs[i], priv->nn->dp.netdev); + + port_id = nfp_flower_cmsg_pcie_port(nfp_pcie, vnic_type, + i, queue); + err = nfp_repr_init(app, reprs->reprs[i], + &nfp_flower_repr_netdev_ops, + port_id, NULL, priv->nn->dp.netdev); + if (err) + goto err_reprs_clean; + + nfp_info(app->cpp, "%s%d Representor(%s) created\n", +repr_type == NFP_REPR_TYPE_PF ? 
"PF" : "VF", i, +reprs->reprs[i]->name); + } + + old_reprs = nfp_app_reprs_set(app, repr_type, reprs); + if (IS_ERR(old_reprs)) { + err = PTR_ERR(old_reprs); + goto err_reprs_clean; + } + + return 0; +err_reprs_clean: + nfp_reprs_clean_and_free(reprs); + return err; +} + +static int nfp_flower_sriov_enable(struct nfp_app *app, int num_vfs) +{ + return nfp_flower_spawn_vnic_reprs(app, + NFP_FLOWER_CMSG_PORT_VNIC_TYPE_VF, + NFP_REPR_TYPE_VF, num_vfs); +} + static void nfp_flower_stop(struct nfp_app *app) { + nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PF); nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PHYS_PORT); + } -static int nfp_flower_start(struct nfp_app *app) +static int +nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct nfp_flower_priv *priv) { struct nfp_eth_table *eth_tbl = app->pf->eth_tbl; - struct nfp_flower_priv *priv = app->priv; struct nfp_reprs *reprs, *old_reprs; unsigned int i; int err; @@ -218,6 +284,19 @@ static int nfp_flower_start(struct nfp_app *app) return err; } +static int nfp_flower_start(struct nfp_app *app) +{ + int err; + + err = nfp_flower_spawn_phy_reprs(app, app->priv); + if (err) + return err; + + return nfp_flower_spawn_vnic_reprs(app, + NFP_FLOWER_CMSG_PORT_VNIC_TYPE_PF, + NFP_REPR_TYPE_PF, 1); +} + static void nfp_flower_vnic_clean(struct nfp_app *app, struct nfp_net *nn) { kfree(app->priv); @@ -289,6 +368,9 @@ const struct nfp_app_type app_flower = { .ctrl_msg_rx= nfp_flower_cmsg_rx, + .sriov_enable = nfp_flower_sriov_enable, + .sriov_disable = nfp_flower_sriov_disable, + .eswitch_mode_get = eswitch_mode_get, .repr_get = nfp_flower_repr_get, }; -- 2.1.4
[PATCH net-next v2 11/12] nfp: add flower app
Add app for flower offload. At this point the PF netdev and phys port representor netdevs are initialised. Follow-up work will add support for VF and PF representors and beyond that offloading the flower classifier. Based in part on work by Benjamin LaHaise and Bert van Leeuwen. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/Makefile | 1 + drivers/net/ethernet/netronome/nfp/flower/main.c | 294 +++ drivers/net/ethernet/netronome/nfp/nfp_app.c | 1 + drivers/net/ethernet/netronome/nfp/nfp_app.h | 4 + 4 files changed, 300 insertions(+) create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.c diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile index e14f62863add..10b556b2c59d 100644 --- a/drivers/net/ethernet/netronome/nfp/Makefile +++ b/drivers/net/ethernet/netronome/nfp/Makefile @@ -28,6 +28,7 @@ nfp-objs := \ bpf/main.o \ bpf/offload.o \ flower/cmsg.o \ + flower/main.o \ nic/main.o ifeq ($(CONFIG_BPF_SYSCALL),y) diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c new file mode 100644 index ..d1d905727c54 --- /dev/null +++ b/drivers/net/ethernet/netronome/nfp/flower/main.c @@ -0,0 +1,294 @@ +/* + * Copyright (C) 2017 Netronome Systems, Inc. + * + * This software is dual licensed under the GNU General License Version 2, + * June 1991 as shown in the file COPYING in the top-level directory of this + * source tree or the BSD 2-Clause License provided below. You have the + * option to license this software under the complete terms of either license. + * + * The BSD 2-Clause License: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * 2. 
Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include + +#include "../nfpcore/nfp_cpp.h" +#include "../nfpcore/nfp_nsp.h" +#include "../nfp_app.h" +#include "../nfp_main.h" +#include "../nfp_net.h" +#include "../nfp_net_repr.h" +#include "../nfp_port.h" +#include "./cmsg.h" + +/** + * struct nfp_flower_priv - Flower APP per-vNIC priv data + * @nn: Pointer to vNIC + */ +struct nfp_flower_priv { + struct nfp_net *nn; +}; + +static const char *nfp_flower_extra_cap(struct nfp_app *app, struct nfp_net *nn) +{ + return "FLOWER"; +} + +static enum devlink_eswitch_mode eswitch_mode_get(struct nfp_app *app) +{ + return DEVLINK_ESWITCH_MODE_SWITCHDEV; +} + +static enum nfp_repr_type +nfp_flower_repr_get_type_and_port(struct nfp_app *app, u32 port_id, u8 *port) +{ + switch (FIELD_GET(NFP_FLOWER_CMSG_PORT_TYPE, port_id)) { + case NFP_FLOWER_CMSG_PORT_TYPE_PHYS_PORT: + *port = FIELD_GET(NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM, + port_id); + return NFP_REPR_TYPE_PHYS_PORT; + + case NFP_FLOWER_CMSG_PORT_TYPE_PCIE_PORT: + *port = FIELD_GET(NFP_FLOWER_CMSG_PORT_VNIC, port_id); + if (FIELD_GET(NFP_FLOWER_CMSG_PORT_VNIC_TYPE, port_id) == + NFP_FLOWER_CMSG_PORT_VNIC_TYPE_PF) + return NFP_REPR_TYPE_PF; + else + return NFP_REPR_TYPE_VF; + } + + return NFP_FLOWER_CMSG_PORT_TYPE_UNSPEC; +} + 
+static struct net_device * +nfp_flower_repr_get(struct nfp_app *app, u32 port_id) +{ + enum nfp_repr_type repr_type; + struct nfp_reprs *reprs; + u8 port = 0; + + repr_type = nfp_flower_repr_get_type_and_port(app, port_id, &port); + + reprs = rcu_dereference(app->reprs[repr_type]); + if (!reprs) + return NULL; + + if (port >= reprs->num_reprs) + return NULL; + + return reprs->reprs[port]; +} + +static void +nfp_flower_repr_netdev_get_stats64(struct net_device *netdev, + str
[PATCH net-next v2 08/12] nfp: provide nfp_port to of nfp_net_get_mac_addr()
Provide port rather than vNIC as parameter of nfp_net_get_mac_addr. This is to allow this function to be used by representor netdevs where a vNIC may have more than one physical port none of which are associated with the vNIC. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/nfp_app_nic.c | 2 +- drivers/net/ethernet/netronome/nfp/nfp_main.h | 3 ++- drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 25 +++ 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c index 7b966bd3d214..c11a6c34e217 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c @@ -69,7 +69,7 @@ int nfp_app_nic_vnic_init(struct nfp_app *app, struct nfp_net *nn, if (err) return err < 0 ? err : 0; - nfp_net_get_mac_addr(app->pf, nn, id); + nfp_net_get_mac_addr(app->pf, nn->port, id); return 0; } diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h index aa69d4101eb9..edc14dc78674 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h @@ -58,6 +58,7 @@ struct nfp_hwinfo; struct nfp_mip; struct nfp_net; struct nfp_nsp_identify; +struct nfp_port; struct nfp_rtsym_table; /** @@ -147,7 +148,7 @@ void nfp_hwmon_unregister(struct nfp_pf *pf); struct nfp_eth_table_port * nfp_net_find_port(struct nfp_eth_table *eth_tbl, unsigned int id); void -nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id); +nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_port *port, unsigned int id); bool nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index eb87e1c08bb1..e16a5fa92279 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -135,25 +135,24 @@ static u8 __iomem *nfp_net_map_area(struct nfp_cpp *cpp, /** * nfp_net_get_mac_addr() - Get the MAC address. * @pf: NFP PF handle - * @nn: NFP Network structure + * @port: NFP port structure * @id: NFP port id * * First try to get the MAC address from NSP ETH table. If that * fails try HWInfo. As a last resort generate a random address. */ void -nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id) +nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_port *port, unsigned int id) { struct nfp_eth_table_port *eth_port; - struct nfp_net_dp *dp = &nn->dp; u8 mac_addr[ETH_ALEN]; const char *mac_str; char name[32]; - eth_port = __nfp_port_get_eth_port(nn->port); + eth_port = __nfp_port_get_eth_port(port); if (eth_port) { - ether_addr_copy(dp->netdev->dev_addr, eth_port->mac_addr); - ether_addr_copy(dp->netdev->perm_addr, eth_port->mac_addr); + ether_addr_copy(port->netdev->dev_addr, eth_port->mac_addr); + ether_addr_copy(port->netdev->perm_addr, eth_port->mac_addr); return; } @@ -161,22 +160,22 @@ nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id) mac_str = nfp_hwinfo_lookup(pf->hwinfo, name); if (!mac_str) { - dev_warn(dp->dev, "Can't lookup MAC address. Generate\n"); - eth_hw_addr_random(dp->netdev); + nfp_warn(pf->cpp, "Can't lookup MAC address. Generate\n"); + eth_hw_addr_random(port->netdev); return; } if (sscanf(mac_str, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx", &mac_addr[0], &mac_addr[1], &mac_addr[2], &mac_addr[3], &mac_addr[4], &mac_addr[5]) != 6) { - dev_warn(dp->dev, -"Can't parse MAC address (%s). Generate.\n", mac_str); - eth_hw_addr_random(dp->netdev); + nfp_warn(pf->cpp, "Can't parse MAC address (%s). 
Generate.\n", +mac_str); + eth_hw_addr_random(port->netdev); return; } - ether_addr_copy(dp->netdev->dev_addr, mac_addr); - ether_addr_copy(dp->netdev->perm_addr, mac_addr); + ether_addr_copy(port->netdev->dev_addr, mac_addr); + ether_addr_copy(port->netdev->perm_addr, mac_addr); } struct nfp_eth_table_port * -- 2.1.4
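
The fallback order described in the kernel-doc comment above (a parsed address if one is available, otherwise a random one) can be sketched in user space. This is only an illustration, not the driver code: parse_mac(), random_mac() and get_mac_addr() are hypothetical names, rand() stands in for the kernel's random source, and the NSP ETH table lookup step is left out. The bit handling in random_mac() assumes the usual convention of a locally administered, unicast address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* Parse "aa:bb:cc:dd:ee:ff"; true only if all six octets were read. */
static bool parse_mac(const char *mac_str, uint8_t out[ETH_ALEN])
{
	if (!mac_str)
		return false;

	return sscanf(mac_str, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
		      &out[0], &out[1], &out[2],
		      &out[3], &out[4], &out[5]) == ETH_ALEN;
}

/* Hypothetical stand-in for a random address generator. */
static void random_mac(uint8_t out[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		out[i] = (uint8_t)(rand() & 0xff);

	out[0] &= 0xfe;	/* clear the multicast bit */
	out[0] |= 0x02;	/* set the locally administered bit */
}

/*
 * Returns true if the address came from mac_str, false if a random
 * address had to be generated (lookup failed or string did not parse).
 */
static bool get_mac_addr(const char *mac_str, uint8_t out[ETH_ALEN])
{
	if (parse_mac(mac_str, out))
		return true;

	random_mac(out);
	return false;
}
```

A caller would pass the string returned by the lookup (possibly NULL) and log a warning when get_mac_addr() returns false, mirroring the two warning paths in the patch.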
[PATCH net-next v2 10/12] nfp: add support for control messages for flower app
In preparation for adding a new flower app - targeted at offloading the flower classifier - provide support for control message that it will use to communicate with the NFP. Based in part on work by Bert van Leeuwen. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/Makefile | 1 + drivers/net/ethernet/netronome/nfp/flower/cmsg.c | 159 +++ drivers/net/ethernet/netronome/nfp/flower/cmsg.h | 116 + drivers/net/ethernet/netronome/nfp/nfp_app.c | 5 +- drivers/net/ethernet/netronome/nfp/nfp_app.h | 3 +- 5 files changed, 281 insertions(+), 3 deletions(-) create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.c create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.h diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile index a401113035f5..e14f62863add 100644 --- a/drivers/net/ethernet/netronome/nfp/Makefile +++ b/drivers/net/ethernet/netronome/nfp/Makefile @@ -27,6 +27,7 @@ nfp-objs := \ nfp_port.o \ bpf/main.o \ bpf/offload.o \ + flower/cmsg.o \ nic/main.o ifeq ($(CONFIG_BPF_SYSCALL),y) diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c new file mode 100644 index ..326f17eeaccf --- /dev/null +++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c @@ -0,0 +1,159 @@ +/* + * Copyright (C) 2015-2017 Netronome Systems, Inc. + * + * This software is dual licensed under the GNU General License Version 2, + * June 1991 as shown in the file COPYING in the top-level directory of this + * source tree or the BSD 2-Clause License provided below. You have the + * option to license this software under the complete terms of either license. + * + * The BSD 2-Clause License: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. 
Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include + +#include "../nfpcore/nfp_cpp.h" +#include "../nfp_net_repr.h" +#include "./cmsg.h" + +#define nfp_flower_cmsg_warn(app, fmt, args...) 
\ + do {\ + if (net_ratelimit())\ + nfp_warn((app)->cpp, fmt, ## args); \ + } while (0) + +static struct nfp_flower_cmsg_hdr * +nfp_flower_cmsg_get_hdr(struct sk_buff *skb) +{ + return (struct nfp_flower_cmsg_hdr *)skb->data; +} + +static void *nfp_flower_cmsg_get_data(struct sk_buff *skb) +{ + return (unsigned char *)skb->data + NFP_FLOWER_CMSG_HLEN; +} + +static struct sk_buff * +nfp_flower_cmsg_alloc(struct nfp_app *app, unsigned int size, + enum nfp_flower_cmsg_type_port type) +{ + struct nfp_flower_cmsg_hdr *ch; + struct sk_buff *skb; + + size += NFP_FLOWER_CMSG_HLEN; + + skb = nfp_app_ctrl_msg_alloc(app, size, GFP_KERNEL); + if (!skb) + return NULL; + + ch = nfp_flower_cmsg_get_hdr(skb); + ch->pad = 0; + ch->version = NFP_FLOWER_CMSG_VER1; + ch->type = type; + skb_put(skb, size); + + return skb; +} + +int nfp_flower_cmsg_portmod(struct net_device *netdev) +{ + struct nfp_repr *repr = netdev_priv(netdev); + struct nfp_flower_cmsg_portmod *msg; + struct sk_buff *skb; + + skb = nfp_flower_cmsg_alloc(repr->app, sizeof(*msg), + NFP_FLOWER_CMSG_TYPE_PORT_MOD); + if (!skb) + return -ENOMEM; + + msg = nfp_flower_cmsg_get_data(skb); + msg->portnum = cpu_to_be32(repr->dst->u.port_info.port_id); + msg->reserved = 0; + msg->info = netif_carrier_ok(netdev); + msg->mtu = cpu_to_be16(netdev->mtu); + + nfp_ctrl_tx(repr->app->ctrl, skb); + + return 0; +} + +static void +nfp_flower_cmsg_portmod
[PATCH net-next v2 09/12] nfp: add support for tx/rx with metadata portid
Allow tx/rx with metadata port id. This will be used for tx/rx of representor netdevs acting as upper-devices while a pf netdev acts as a lower-device. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/nfp_net.h | 1 + .../net/ethernet/netronome/nfp/nfp_net_common.c| 57 +++--- 2 files changed, 52 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h index 02fd8d4e253c..96c8ea476c05 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h @@ -318,6 +318,7 @@ struct nfp_meta_parsed { u8 csum_type; u32 hash; u32 mark; + u32 portid; __wsum csum; }; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 2b1ae666..49b8bc937ad8 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -755,6 +755,26 @@ static void nfp_net_tx_xmit_more_flush(struct nfp_net_tx_ring *tx_ring) tx_ring->wr_ptr_add = 0; } +static int nfp_net_prep_port_id(struct sk_buff *skb) +{ + struct metadata_dst *md_dst = skb_metadata_dst(skb); + unsigned char *data; + + if (likely(!md_dst)) + return 0; + if (unlikely(md_dst->type != METADATA_HW_PORT_MUX)) + return 0; + + if (unlikely(skb_cow_head(skb, 8))) + return -ENOMEM; + + data = skb_push(skb, 8); + put_unaligned_be32(NFP_NET_META_PORTID, data); + put_unaligned_be32(md_dst->u.port_info.port_id, data + 4); + + return 8; +} + /** * nfp_net_tx() - Main transmit entry point * @skb:SKB to transmit @@ -767,6 +787,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev) struct nfp_net *nn = netdev_priv(netdev); const struct skb_frag_struct *frag; struct nfp_net_tx_desc *txd, txdg; + int f, nr_frags, wr_idx, md_bytes; struct nfp_net_tx_ring *tx_ring; struct nfp_net_r_vector *r_vec; struct nfp_net_tx_buf *txbuf; @@ -774,8 +795,6 @@ 
static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev) struct nfp_net_dp *dp; dma_addr_t dma_addr; unsigned int fsize; - int f, nr_frags; - int wr_idx; u16 qidx; dp = &nn->dp; @@ -797,6 +816,13 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev) return NETDEV_TX_BUSY; } + md_bytes = nfp_net_prep_port_id(skb); + if (unlikely(md_bytes < 0)) { + nfp_net_tx_xmit_more_flush(tx_ring); + dev_kfree_skb_any(skb); + return NETDEV_TX_OK; + } + /* Start with the head skbuf */ dma_addr = dma_map_single(dp->dev, skb->data, skb_headlen(skb), DMA_TO_DEVICE); @@ -815,7 +841,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev) /* Build TX descriptor */ txd = &tx_ring->txds[wr_idx]; - txd->offset_eop = (nr_frags == 0) ? PCIE_DESC_TX_EOP : 0; + txd->offset_eop = (nr_frags ? 0 : PCIE_DESC_TX_EOP) | md_bytes; txd->dma_len = cpu_to_le16(skb_headlen(skb)); nfp_desc_set_dma_addr(txd, dma_addr); txd->data_len = cpu_to_le16(skb->len); @@ -855,7 +881,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev) *txd = txdg; txd->dma_len = cpu_to_le16(fsize); nfp_desc_set_dma_addr(txd, dma_addr); - txd->offset_eop = + txd->offset_eop |= (f == nr_frags - 1) ? PCIE_DESC_TX_EOP : 0; } @@ -1450,6 +1476,10 @@ nfp_net_parse_meta(struct net_device *netdev, struct nfp_meta_parsed *meta, meta->mark = get_unaligned_be32(data); data += 4; break; + case NFP_NET_META_PORTID: + meta->portid = get_unaligned_be32(data); + data += 4; + break; case NFP_NET_META_CSUM: meta->csum_type = CHECKSUM_COMPLETE; meta->csum = @@ -1594,6 +1624,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget) struct nfp_net_rx_buf *rxbuf; struct nfp_net_rx_desc *rxd; struct nfp_meta_parsed meta; + struct net_device *netdev; dma_addr_t new_dma_addr; void *new_frag; @@ -1672,7 +1703,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget) } if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF && -
[PATCH net-next v2 07/12] nfp: app callbacks for SRIOV
Add app-callbacks for app-specific initialisation of SRIOV. Disabling SRIOV is brought forward in nfp_pci_remove() so that nfp_app_sriov_disable is called while the app still exists. This is intended to be used to implement representor netdevs for virtual ports. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/nfp_app.h | 18 drivers/net/ethernet/netronome/nfp/nfp_main.c | 42 +++ 2 files changed, 55 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index af023a0491e7..ff2d43615808 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -75,6 +75,8 @@ extern const struct nfp_app_type app_bpf; * @tc_busy: TC HW offload busy (rules loaded) * @xdp_offload:offload an XDP program * @eswitch_mode_get:get SR-IOV eswitch mode + * @sriov_enable: app-specific sriov initialisation + * @sriov_disable: app-specific sriov clean-up * @repr_get: get representor netdev */ struct nfp_app_type { @@ -102,6 +104,9 @@ struct nfp_app_type { int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn, struct bpf_prog *prog); + int (*sriov_enable)(struct nfp_app *app, int num_vfs); + void (*sriov_disable)(struct nfp_app *app); + enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app); struct net_device *(*repr_get)(struct nfp_app *app, u32 id); }; @@ -237,6 +242,19 @@ static inline int nfp_app_eswitch_mode_get(struct nfp_app *app, u16 *mode) return 0; } +static inline int nfp_app_sriov_enable(struct nfp_app *app, int num_vfs) +{ + if (!app || !app->type->sriov_enable) + return -EOPNOTSUPP; + return app->type->sriov_enable(app, num_vfs); +} + +static inline void nfp_app_sriov_disable(struct nfp_app *app) +{ + if (app && app->type->sriov_disable) + app->type->sriov_disable(app); +} + static inline struct net_device *nfp_app_repr_get(struct nfp_app *app, u32 id) { if (unlikely(!app || 
!app->type->repr_get)) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c index 4e59dcb78c36..748e54cc885e 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c @@ -54,6 +54,7 @@ #include "nfpcore/nfp6000_pcie.h" +#include "nfp_app.h" #include "nfp_main.h" #include "nfp_net.h" @@ -97,28 +98,45 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs) struct nfp_pf *pf = pci_get_drvdata(pdev); int err; + mutex_lock(&pf->lock); + if (num_vfs > pf->limit_vfs) { nfp_info(pf->cpp, "Firmware limits number of VFs to %u\n", pf->limit_vfs); - return -EINVAL; + err = -EINVAL; + goto err_unlock; + } + + err = nfp_app_sriov_enable(pf->app, num_vfs); + if (err) { + dev_warn(&pdev->dev, "App specific PCI sriov configuration failed: %d\n", +err); + goto err_unlock; } err = pci_enable_sriov(pdev, num_vfs); if (err) { dev_warn(&pdev->dev, "Failed to enable PCI sriov: %d\n", err); - return err; + goto err_app_sriov_disable; } pf->num_vfs = num_vfs; dev_dbg(&pdev->dev, "Created %d VFs.\n", pf->num_vfs); + mutex_unlock(&pf->lock); return num_vfs; + +err_app_sriov_disable: + nfp_app_sriov_disable(pf->app); +err_unlock: + mutex_unlock(&pf->lock); + return err; #endif return 0; } -static int nfp_pcie_sriov_disable(struct pci_dev *pdev) +static int __nfp_pcie_sriov_disable(struct pci_dev *pdev) { #ifdef CONFIG_PCI_IOV struct nfp_pf *pf = pci_get_drvdata(pdev); @@ -132,6 +150,8 @@ static int nfp_pcie_sriov_disable(struct pci_dev *pdev) return -EPERM; } + nfp_app_sriov_disable(pf->app); + pf->num_vfs = 0; pci_disable_sriov(pdev); @@ -140,6 +160,18 @@ static int nfp_pcie_sriov_disable(struct pci_dev *pdev) return 0; } +static int nfp_pcie_sriov_disable(struct pci_dev *pdev) +{ + struct nfp_pf *pf = pci_get_drvdata(pdev); + int err; + + mutex_lock(&pf->lock); + err = __nfp_pcie_sriov_disable(pdev); + mutex_unlock(&pf->lock); + + return err; +} + static int 
nfp_pcie_sriov_configure(struct pci_dev *pdev, int num_vfs) { if (num_vfs == 0) @@ -431,11 +463,11 @@ static void nfp_pci_remove(struct pci_dev *pdev) devlink = priv_to_devlink(pf); - nfp_net_pci_remove(pf); - nfp_pcie_sriov_disable(pdev); pci_sriov_set_totalvfs(pf->pdev, 0); + nfp_net_pci_remove(pf); + devlink_unre
[PATCH net-next v2 00/12] nfp: add flower app with representors
Hi,

this series adds a flower app to the NFP driver. It initialises four types of netdevs:

* PF netdev - lower-device for communication of packets to device
* PF representor netdev
* VF representor netdevs
* Phys port representor netdevs

The PF netdev acts as a lower-device which sends and receives packets to and from the firmware. The representors act as upper-devices. For TX, representors attach a metadata dst to the skb, which the PF netdev uses to prepend metadata to the packet before forwarding it to the firmware. On RX, the PF netdev looks up the representor based on the prepended metadata received from the firmware and forwards the skb to the representor after removing the metadata.

Control queues are used to send and receive control messages which are used to communicate configuration information with the firmware. These are in a separate vNIC from the queues belonging to the PF netdev. The control queues are not exposed to user-space via a netdev or any other means.

As the name implies this app is targeted at providing offload of TC flower. That will be added by follow-up work. This patchset focuses on adding phys port and VF representor netdevs to which flower classifiers may be attached.
Changes since v1:
* Correct port_id endianness annotations
* Make nfp_repr_*_get_stats64() static
* Include for readq() on 32-bit systems

Jakub Kicinski (3):
  net: store port/representator id in metadata_dst
  nfp: devlink add support for getting eswitch mode
  nfp: move physical port init into a helper

Simon Horman (9):
  nfp: map mac_stats and vf_cfg BARs
  nfp: general representor implementation
  nfp: add stats and xmit helpers for representors
  nfp: app callbacks for SRIOV
  nfp: provide nfp_port to of nfp_net_get_mac_addr()
  nfp: add support for tx/rx with metadata portid
  nfp: add support for control messages for flower app
  nfp: add flower app
  nfp: add VF and PF representors to flower app

 drivers/net/ethernet/netronome/nfp/Makefile | 3 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c | 159 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h | 116 +++
 drivers/net/ethernet/netronome/nfp/flower/main.c | 376 +
 drivers/net/ethernet/netronome/nfp/nfp_app.c | 26 +-
 drivers/net/ethernet/netronome/nfp/nfp_app.h | 58 +++-
 drivers/net/ethernet/netronome/nfp/nfp_app_nic.c | 25 +-
 drivers/net/ethernet/netronome/nfp/nfp_devlink.c | 18 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 42 ++-
 drivers/net/ethernet/netronome/nfp/nfp_main.h | 11 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h | 1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c | 57 +++-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 141 +---
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 353 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h | 120 +++
 drivers/net/ethernet/netronome/nfp/nfp_port.c | 25 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.h | 63
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 2 +
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 5 +-
 include/net/dst_metadata.h | 41 ++-
 net/core/dst.c | 15 +-
 net/core/filter.c | 1 +
 net/ipv4/ip_tunnel_core.c | 6 +-
 net/openvswitch/flow_netlink.c | 4 +-
 24 files changed, 1575 insertions(+), 93 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h

-- 
2.1.4
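
The TX/RX flow described in the cover letter (the lower device prepends port metadata before handing the packet to the firmware, and strips it again on receive) can be sketched in plain user-space C. This is a rough illustration under stated assumptions, not the driver's implementation: struct frame is a hypothetical stand-in for an skb with headroom, META_PORTID_TYPE is a made-up tag value standing in for NFP_NET_META_PORTID, and the symmetric 8-byte layout on receive is assumed for simplicity; the firmware-defined RX format may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_PORTID_TYPE 5u	/* hypothetical tag, not the real constant */
#define META_LEN 8		/* 4-byte type tag + 4-byte port id */

/* Hypothetical buffer with headroom in front of the packet data. */
struct frame {
	uint8_t buf[256];
	size_t head;	/* offset of the first valid byte */
	size_t len;	/* number of valid bytes starting at head */
};

static void put_be32(uint32_t v, uint8_t *p)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* TX side: prepend tag and port id; returns bytes added or -1. */
static int frame_push_port_id(struct frame *f, uint32_t port_id)
{
	if (f->head < META_LEN)
		return -1;	/* not enough headroom */

	f->head -= META_LEN;
	f->len += META_LEN;
	put_be32(META_PORTID_TYPE, f->buf + f->head);
	put_be32(port_id, f->buf + f->head + 4);

	return META_LEN;
}

/* RX side: read the port id and strip the metadata; 0 on success. */
static int frame_pull_port_id(struct frame *f, uint32_t *port_id)
{
	if (f->len < META_LEN ||
	    get_be32(f->buf + f->head) != META_PORTID_TYPE)
		return -1;

	*port_id = get_be32(f->buf + f->head + 4);
	f->head += META_LEN;
	f->len -= META_LEN;

	return 0;
}
```

In this sketch the upper device only has to supply a port id; the lower device owns the wire format, which is why the representors themselves can stay queueless.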
[PATCH net-next v2 01/12] net: store port/representator id in metadata_dst
From: Jakub Kicinski

Switches and modern SR-IOV enabled NICs may multiplex traffic from port representators and control messages over a single set of hardware queues. Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today unless we have a way of attaching metadata to skbs at the upper device. Because a single set of queues is used for many netdevs, reliably stopping the TC/sched queues of all of them is impossible, and the lower device has to retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on the fastpath.

This patch attempts to enable port/representative devs to attach metadata to skbs which carry the port id. This way representatives can be queueless and all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representative interfaces will have metadata attached and will subsequently be queued to the lower device for transmission. The lower device should recognize the metadata and translate it to a HW-specific format, which is most likely either a special header inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer along with the port id, so that if TC decides to redirect or mirror, the new netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs today.
Signed-off-by: Jakub Kicinski Signed-off-by: Sridhar Samudrala Signed-off-by: Simon Horman --- include/net/dst_metadata.h | 41 - net/core/dst.c | 15 ++- net/core/filter.c | 1 + net/ipv4/ip_tunnel_core.c | 6 -- net/openvswitch/flow_netlink.c | 4 +++- 5 files changed, 50 insertions(+), 17 deletions(-) diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h index 701fc814d0af..a803129a4849 100644 --- a/include/net/dst_metadata.h +++ b/include/net/dst_metadata.h @@ -5,10 +5,22 @@ #include #include +enum metadata_type { + METADATA_IP_TUNNEL, + METADATA_HW_PORT_MUX, +}; + +struct hw_port_info { + struct net_device *lower_dev; + u32 port_id; +}; + struct metadata_dst { struct dst_entrydst; + enum metadata_type type; union { struct ip_tunnel_info tun_info; + struct hw_port_info port_info; } u; }; @@ -27,7 +39,7 @@ static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb) struct metadata_dst *md_dst = skb_metadata_dst(skb); struct dst_entry *dst; - if (md_dst) + if (md_dst && md_dst->type == METADATA_IP_TUNNEL) return &md_dst->u.tun_info; dst = skb_dst(skb); @@ -55,22 +67,33 @@ static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a, a = (const struct metadata_dst *) skb_dst(skb_a); b = (const struct metadata_dst *) skb_dst(skb_b); - if (!a != !b || a->u.tun_info.options_len != b->u.tun_info.options_len) + if (!a != !b || a->type != b->type) return 1; - return memcmp(&a->u.tun_info, &b->u.tun_info, - sizeof(a->u.tun_info) + a->u.tun_info.options_len); + switch (a->type) { + case METADATA_HW_PORT_MUX: + return memcmp(&a->u.port_info, &b->u.port_info, + sizeof(a->u.port_info)); + case METADATA_IP_TUNNEL: + return memcmp(&a->u.tun_info, &b->u.tun_info, + sizeof(a->u.tun_info) + +a->u.tun_info.options_len); + default: + return 1; + } } void metadata_dst_free(struct metadata_dst *); -struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags); -struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t flags); +struct 
metadata_dst *metadata_dst_alloc(u8 optslen, enum metadata_type type, + gfp_t flags); +struct metadata_dst __percpu * +metadata_dst_alloc_percpu(u8 optslen, enum metadata_type type, gfp_t flags); static inline struct metadata_dst *tun_rx_dst(int md_size) { struct metadata_dst *tun_dst; - tun_dst = metadata_dst_alloc(md_size, GFP_ATOMIC); + tun_dst = metadata_dst_alloc(md_size, METADATA_IP_TUNNEL, GFP_ATOMIC); if (!tun_dst) return NULL; @@ -85,11 +108,11 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) int md_size; struct metadata_dst *new_md; - if (!md_dst) + if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) return ERR_PTR(-EINVAL); md_size = md_dst->u.tun_info.options_len; - new_md = metadata_dst_alloc(md_size, GFP_ATOMIC); + new_md = metadata_dst_alloc(md_size, METADATA_IP_TUNNEL, GFP_ATOMIC); if (!new_md)
[PATCH net-next v2 05/12] nfp: general representor implementation
Provide infrastructure to create and destroy representors of a given type. Parts based on work by Bert van Leeuwen, Benjamin LaHaise, and Jakub Kicinski. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/Makefile | 1 + drivers/net/ethernet/netronome/nfp/nfp_app.c | 20 +++ drivers/net/ethernet/netronome/nfp/nfp_app.h | 18 +++ drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 156 ++ drivers/net/ethernet/netronome/nfp/nfp_net_repr.h | 92 + 5 files changed, 287 insertions(+) create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile index 5ad9a557f06a..a401113035f5 100644 --- a/drivers/net/ethernet/netronome/nfp/Makefile +++ b/drivers/net/ethernet/netronome/nfp/Makefile @@ -22,6 +22,7 @@ nfp-objs := \ nfp_net_common.o \ nfp_net_ethtool.o \ nfp_net_main.o \ + nfp_net_repr.o \ nfp_netvf_main.o \ nfp_port.o \ bpf/main.o \ diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c index 396b93f54823..c9ccb0f94604 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c @@ -38,6 +38,7 @@ #include "nfpcore/nfp_nffw.h" #include "nfp_app.h" #include "nfp_main.h" +#include "nfp_net_repr.h" static const struct nfp_app_type *apps[] = { &app_nic, @@ -68,6 +69,25 @@ struct sk_buff *nfp_app_ctrl_msg_alloc(struct nfp_app *app, unsigned int size) return skb; } +struct nfp_reprs * +nfp_app_reprs_set(struct nfp_app *app, enum nfp_repr_type type, + struct nfp_reprs *reprs) +{ + struct nfp_reprs *old; + + old = rcu_dereference_protected(app->reprs[type], + lockdep_is_held(&app->pf->lock)); + if (reprs && old) { + old = ERR_PTR(-EBUSY); + goto exit_unlock; + } + + rcu_assign_pointer(app->reprs[type], reprs); + +exit_unlock: + return old; +} + struct nfp_app 
*nfp_app_alloc(struct nfp_pf *pf, enum nfp_app_id id) { struct nfp_app *app; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index 0fee14ffa081..af023a0491e7 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -36,6 +36,8 @@ #include +#include "nfp_net_repr.h" + struct bpf_prog; struct net_device; struct pci_dev; @@ -73,6 +75,7 @@ extern const struct nfp_app_type app_bpf; * @tc_busy: TC HW offload busy (rules loaded) * @xdp_offload:offload an XDP program * @eswitch_mode_get:get SR-IOV eswitch mode + * @repr_get: get representor netdev */ struct nfp_app_type { enum nfp_app_id id; @@ -100,6 +103,7 @@ struct nfp_app_type { struct bpf_prog *prog); enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app); + struct net_device *(*repr_get)(struct nfp_app *app, u32 id); }; /** @@ -108,6 +112,7 @@ struct nfp_app_type { * @pf:backpointer to NFP PF structure * @cpp: pointer to the CPP handle * @ctrl: pointer to ctrl vNIC struct + * @reprs: array of pointers to representors * @type: pointer to const application ops and info */ struct nfp_app { @@ -116,6 +121,7 @@ struct nfp_app { struct nfp_cpp *cpp; struct nfp_net *ctrl; + struct nfp_reprs __rcu *reprs[NFP_REPR_TYPE_MAX + 1]; const struct nfp_app_type *type; }; @@ -231,6 +237,18 @@ static inline int nfp_app_eswitch_mode_get(struct nfp_app *app, u16 *mode) return 0; } +static inline struct net_device *nfp_app_repr_get(struct nfp_app *app, u32 id) +{ + if (unlikely(!app || !app->type->repr_get)) + return NULL; + + return app->type->repr_get(app, id); +} + +struct nfp_reprs * +nfp_app_reprs_set(struct nfp_app *app, enum nfp_repr_type type, + struct nfp_reprs *reprs); + const char *nfp_app_mip_name(struct nfp_app *app); struct sk_buff *nfp_app_ctrl_msg_alloc(struct nfp_app *app, unsigned int size); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c new file mode 100644 index ..8e02f843ae92 --- /dev/null +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -0,0 +1,156 @@ +/* + * Copyright (C) 2017 Netronome Systems, Inc. + * + * This software is dual licensed under the GNU General License Version 2, + * June 1991 as shown in the file COPYING in the top-level directory of this + * source tree or the BSD 2-Clause License provided below. You have the + * option to license this software under the comp
[PATCH net-next v2 04/12] nfp: map mac_stats and vf_cfg BARs
If present, map mac_stats and vf_cfg BARs. These will be used by representor netdevs to read statistics for phys port and vf representors. Also provide defines describing the layout of the mac_stats area. Similar defines are already present for the vf_cfg area. Based in part on work by Jakub Kicinski. Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- drivers/net/ethernet/netronome/nfp/nfp_main.h | 8 ++ drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 116 +++-- drivers/net/ethernet/netronome/nfp/nfp_port.h | 60 +++ .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 2 + .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 5 +- 5 files changed, 161 insertions(+), 30 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h index 88724f8d0dcd..aa69d4101eb9 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h @@ -68,6 +68,10 @@ struct nfp_rtsym_table; * @data_vnic_bar: Pointer to the CPP area for the data vNICs' BARs * @ctrl_vnic_bar: Pointer to the CPP area for the ctrl vNIC's BAR * @qc_area: Pointer to the CPP area for the queues + * @mac_stats_bar: Pointer to the CPP area for the MAC stats + * @mac_stats_mem: Pointer to mapped MAC stats area + * @vf_cfg_bar:Pointer to the CPP area for the VF configuration BAR + * @vf_cfg_mem:Pointer to mapped VF configuration area * @irq_entries: Array of MSI-X entries for all vNICs * @limit_vfs: Number of VFs supported by firmware (~0 for PCI limit) * @num_vfs: Number of SR-IOV VFs enabled @@ -97,6 +101,10 @@ struct nfp_pf { struct nfp_cpp_area *data_vnic_bar; struct nfp_cpp_area *ctrl_vnic_bar; struct nfp_cpp_area *qc_area; + struct nfp_cpp_area *mac_stats_bar; + u8 __iomem *mac_stats_mem; + struct nfp_cpp_area *vf_cfg_bar; + u8 __iomem *vf_cfg_mem; struct msix_entry *irq_entries; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index
bc2bc0886176..eb87e1c08bb1 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -570,6 +570,79 @@ static void nfp_net_pf_app_stop(struct nfp_pf *pf) nfp_net_pf_app_stop_ctrl(pf); } +static void nfp_net_pci_unmap_mem(struct nfp_pf *pf) +{ + if (pf->vf_cfg_bar) + nfp_cpp_area_release_free(pf->vf_cfg_bar); + if (pf->mac_stats_bar) + nfp_cpp_area_release_free(pf->mac_stats_bar); + nfp_cpp_area_release_free(pf->qc_area); + nfp_cpp_area_release_free(pf->data_vnic_bar); +} + +static int nfp_net_pci_map_mem(struct nfp_pf *pf) +{ + u32 ctrl_bar_sz; + u8 __iomem *mem; + int err; + + ctrl_bar_sz = pf->max_data_vnics * NFP_PF_CSR_SLICE_SIZE; + mem = nfp_net_pf_map_rtsym(pf, "net.ctrl", "_pf%d_net_bar0", + ctrl_bar_sz, &pf->data_vnic_bar); + if (IS_ERR(mem)) { + err = PTR_ERR(mem); + if (!pf->fw_loaded && err == -ENOENT) + err = -EPROBE_DEFER; + return err; + } + + pf->mac_stats_mem = nfp_net_pf_map_rtsym(pf, "net.macstats", +"_mac_stats", +NFP_MAC_STATS_SIZE * +(pf->eth_tbl->max_index + 1), +&pf->mac_stats_bar); + if (IS_ERR(pf->mac_stats_mem)) { + if (PTR_ERR(pf->mac_stats_mem) != -ENOENT) { + err = PTR_ERR(pf->mac_stats_mem); + goto err_unmap_ctrl; + } + pf->mac_stats_mem = NULL; + } + + pf->vf_cfg_mem = nfp_net_pf_map_rtsym(pf, "net.vfcfg", + "_pf%d_net_vf_bar", + NFP_NET_CFG_BAR_SZ * + pf->limit_vfs, &pf->vf_cfg_bar); + if (IS_ERR(pf->vf_cfg_mem)) { + if (PTR_ERR(pf->vf_cfg_mem) != -ENOENT) { + err = PTR_ERR(pf->vf_cfg_mem); + goto err_unmap_mac_stats; + } + pf->vf_cfg_mem = NULL; + } + + mem = nfp_net_map_area(pf->cpp, "net.qc", 0, 0, + NFP_PCIE_QUEUE(0), NFP_QCP_QUEUE_AREA_SZ, + &pf->qc_area); + if (IS_ERR(mem)) { + nfp_err(pf->cpp, "Failed to map Queue Controller area.\n"); + err = PTR_ERR(mem); + goto err_unmap_vf_cfg; + } + + return 0; + +err_unmap_vf_cfg: + if (pf->vf_cfg_bar) + nfp_cpp_area_release_free(pf->vf_c
[PATCH net-next v2 03/12] nfp: move physical port init into a helper
From: Jakub Kicinski Move MAC/PHY port init into a helper to make it easier to reuse it in the representor code. Signed-off-by: Jakub Kicinski Signed-off-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_app_nic.c | 23 ++ drivers/net/ethernet/netronome/nfp/nfp_port.c| 25 drivers/net/ethernet/netronome/nfp/nfp_port.h| 3 +++ 3 files changed, 34 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c index 83c65e6291ee..7b966bd3d214 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c @@ -42,6 +42,8 @@ static int nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app, struct nfp_net *nn, unsigned int id) { + int err; + if (!pf->eth_tbl) return 0; @@ -49,26 +51,13 @@ nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app, if (IS_ERR(nn->port)) return PTR_ERR(nn->port); - nn->port->eth_id = id; - nn->port->eth_port = nfp_net_find_port(pf->eth_tbl, id); - - /* Check if vNIC has external port associated and cfg is OK */ - if (!nn->port->eth_port) { - nfp_err(app->cpp, - "NSP port entries don't match vNICs (no entry for port #%d)\n", - id); + err = nfp_port_init_phy_port(pf, app, nn->port, id); + if (err) { nfp_port_free(nn->port); - return -EINVAL; - } - if (nn->port->eth_port->override_changed) { - nfp_warn(app->cpp, -"Config changed for port #%d, reboot required before port will be operational\n", -id); - nn->port->type = NFP_PORT_INVALID; - return 1; + return err; } - return 0; + return nn->port->type == NFP_PORT_INVALID; } int nfp_app_nic_vnic_init(struct nfp_app *app, struct nfp_net *nn, diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c b/drivers/net/ethernet/netronome/nfp/nfp_port.c index a17410ac01ab..19bceeb82225 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_port.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c @@ -33,6 +33,7 @@ #include +#include 
"nfpcore/nfp_cpp.h" #include "nfpcore/nfp_nsp.h" #include "nfp_app.h" #include "nfp_main.h" @@ -112,6 +113,30 @@ nfp_port_get_phys_port_name(struct net_device *netdev, char *name, size_t len) return 0; } +int nfp_port_init_phy_port(struct nfp_pf *pf, struct nfp_app *app, + struct nfp_port *port, unsigned int id) +{ + port->eth_id = id; + port->eth_port = nfp_net_find_port(pf->eth_tbl, id); + + /* Check if vNIC has external port associated and cfg is OK */ + if (!port->eth_port) { + nfp_err(app->cpp, + "NSP port entries don't match vNICs (no entry for port #%d)\n", + id); + return -EINVAL; + } + if (port->eth_port->override_changed) { + nfp_warn(app->cpp, +"Config changed for port #%d, reboot required before port will be operational\n", +id); + port->type = NFP_PORT_INVALID; + return 0; + } + + return 0; +} + struct nfp_port * nfp_port_alloc(struct nfp_app *app, enum nfp_port_type type, struct net_device *netdev) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h index 4d1a9b3fed41..fb28c7071987 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_port.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h @@ -104,6 +104,9 @@ nfp_port_alloc(struct nfp_app *app, enum nfp_port_type type, struct net_device *netdev); void nfp_port_free(struct nfp_port *port); +int nfp_port_init_phy_port(struct nfp_pf *pf, struct nfp_app *app, + struct nfp_port *port, unsigned int id); + int nfp_net_refresh_eth_port(struct nfp_port *port); void nfp_net_refresh_port_table(struct nfp_port *port); int nfp_net_refresh_port_table_sync(struct nfp_pf *pf); -- 2.1.4
[PATCH net-next v2 02/12] nfp: devlink add support for getting eswitch mode
From: Jakub Kicinski Add app callback for reporting eswitch mode. Non-SRIOV apps should not implement this callback, nfp_app code will then respond with -EOPNOTSUPP. Signed-off-by: Jakub Kicinski Signed-off-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_app.h | 15 +++ drivers/net/ethernet/netronome/nfp/nfp_devlink.c | 18 ++ 2 files changed, 33 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index f5e373fa8c3b..0fee14ffa081 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -34,6 +34,8 @@ #ifndef _NFP_APP_H #define _NFP_APP_H 1 +#include + struct bpf_prog; struct net_device; struct pci_dev; @@ -70,6 +72,7 @@ extern const struct nfp_app_type app_bpf; * @setup_tc: setup TC ndo * @tc_busy: TC HW offload busy (rules loaded) * @xdp_offload:offload an XDP program + * @eswitch_mode_get:get SR-IOV eswitch mode */ struct nfp_app_type { enum nfp_app_id id; @@ -95,6 +98,8 @@ struct nfp_app_type { bool (*tc_busy)(struct nfp_app *app, struct nfp_net *nn); int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn, struct bpf_prog *prog); + + enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app); }; /** @@ -216,6 +221,16 @@ static inline void nfp_app_ctrl_rx(struct nfp_app *app, struct sk_buff *skb) app->type->ctrl_msg_rx(app, skb); } +static inline int nfp_app_eswitch_mode_get(struct nfp_app *app, u16 *mode) +{ + if (!app->type->eswitch_mode_get) + return -EOPNOTSUPP; + + *mode = app->type->eswitch_mode_get(app); + + return 0; +} + const char *nfp_app_mip_name(struct nfp_app *app); struct sk_buff *nfp_app_ctrl_msg_alloc(struct nfp_app *app, unsigned int size); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c index 2609a0f28e81..6c9f29c2e975 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c @@ 
-149,9 +149,27 @@ nfp_devlink_port_unsplit(struct devlink *devlink, unsigned int port_index) return ret; } +static int nfp_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode) +{ + struct nfp_pf *pf = devlink_priv(devlink); + int ret; + + mutex_lock(&pf->lock); + if (!pf->app) { + ret = -EBUSY; + goto out; + } + ret = nfp_app_eswitch_mode_get(pf->app, mode); +out: + mutex_unlock(&pf->lock); + + return ret; +} + const struct devlink_ops nfp_devlink_ops = { .port_split = nfp_devlink_port_split, .port_unsplit = nfp_devlink_port_unsplit, + .eswitch_mode_get = nfp_devlink_eswitch_mode_get, }; int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port) -- 2.1.4
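The optional-callback pattern used by nfp_app_eswitch_mode_get() can be tried out in userspace. What follows is only a minimal sketch under stated assumptions: every sketch_ name is hypothetical and stands in for the driver types, and unsigned short replaces the kernel's u16. It shows that an app type which leaves the callback NULL gets -EOPNOTSUPP and the caller's mode is left untouched.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; only what the sketch needs. */
enum sketch_eswitch_mode { SKETCH_MODE_LEGACY, SKETCH_MODE_SWITCHDEV };

struct sketch_app;

struct sketch_app_type {
	/* Optional: app types without SR-IOV support leave this NULL. */
	enum sketch_eswitch_mode (*eswitch_mode_get)(struct sketch_app *app);
};

struct sketch_app {
	const struct sketch_app_type *type;
};

/* Same shape as nfp_app_eswitch_mode_get(): no callback means "not supported". */
static int sketch_app_eswitch_mode_get(struct sketch_app *app,
				       unsigned short *mode)
{
	if (!app->type->eswitch_mode_get)
		return -EOPNOTSUPP;

	*mode = app->type->eswitch_mode_get(app);

	return 0;
}

/* Example callback for an app type that does report a mode. */
static enum sketch_eswitch_mode sketch_switchdev_mode(struct sketch_app *app)
{
	(void)app;
	return SKETCH_MODE_SWITCHDEV;
}
```

Keeping the NULL check in one inline wrapper means callers such as the devlink op never have to test the function pointer themselves.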
Re: [PATCH 00/51] rtc: stop using rtc deprecated functions
On 20/06/2017 at 22:15:36 +0100, Russell King - ARM Linux wrote:
> On Tue, Jun 20, 2017 at 05:07:46PM +0200, Benjamin Gaignard wrote:
> > 2017-06-20 15:48 GMT+02:00 Alexandre Belloni:
> > >> Yes, that's argument against changing rtc _drivers_ for hardware that
> > >> can not do better than 32bit. For generic code (such as 44/51 sysfs,
> > >> 51/51 suspend test), the change still makes sense.
> >
> > What I had in mind when writing those patches was to remove the limitations
> > coming from those functions usage, even more since they have been marked as
> > deprecated.
>
> I'd say that they should not be marked as deprecated. They're entirely
> appropriate for use with hardware that only supports a 32-bit
> representation of time.
>
> It's entirely reasonable to fix the ones that use other representations
> that exceed that, but for those which do not, we need to keep using the
> 32-bit versions. Doing so actually gives us _more_ flexibility in the
> future.
>
> Consider that at the moment, we define the 32-bit RTC representation to
> start at a well known epoch. We _could_ decide that when it wraps to
> 0x80000000 seconds, we'll define the lower 0x40000000 seconds to mean
> dates in the future - and keep rolling that forward each time we cross
> another 0x40000000 seconds. Unless someone invents a real time machine,
> we shouldn't need to set a modern RTC back to 1970.
>

I agree with that, but the Android guys do not. They seem to mandate an
RTC that can store time from 01/01/1970. I don't know much more than
that because they never cared to explain why that was actually necessary
(apart from a laconic "this will result in a bad user experience").

I think tglx had a plan for offsetting the time at some point so that
32-bit platforms can get past 2038 properly.

My opinion is that as long as userspace is not ready to handle those
dates, it doesn't really matter because it is quite unlikely that
anything will be able to continue running anyway.

--
Alexandre Belloni, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
Re: [PATCH net-next v3 4/4] ip6mr: add netlink notifications on mrt6msg cache reports
On 20/06/17 23:54, Julien Gomes wrote:
> Add Netlink notifications on cache reports in ip6mr, in addition to the
> existing mrt6msg sent to mroute6_sk.
> Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R.
>
> MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
> same data as their equivalent fields in the mrt6msg header.
> PKT attribute is the packet sent to mroute6_sk, without the added
> mrt6msg header.
>
> Suggested-by: Ryan Halbrook
> Signed-off-by: Julien Gomes
> ---
>  include/uapi/linux/mroute6.h | 12
>  net/ipv6/ip6mr.c             | 71 ++--
>  2 files changed, 81 insertions(+), 2 deletions(-)
>

Reviewed-by: Nikolay Aleksandrov
Re: [PATCH net-next v3 3/4] ipmr: add netlink notifications on igmpmsg cache reports
On 20/06/17 23:54, Julien Gomes wrote:
> Add Netlink notifications on cache reports in ipmr, in addition to the
> existing igmpmsg sent to mroute_sk.
> Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R.
>
> MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
> same data as their equivalent fields in the igmpmsg header.
> PKT attribute is the packet sent to mroute_sk, without the added igmpmsg
> header.
>
> Suggested-by: Ryan Halbrook
> Signed-off-by: Julien Gomes
> ---
>  include/uapi/linux/mroute.h | 12
>  net/ipv4/ipmr.c             | 69 +++--
>  2 files changed, 79 insertions(+), 2 deletions(-)
>

Thanks,

Reviewed-by: Nikolay Aleksandrov
Re: [PATCH net-next v3 07/15] bpf: Add setsockopt helper function to bpf
On Mon, Jun 19, 2017 at 11:00 PM, Lawrence Brakmo wrote:
> Added support for calling a subset of socket setsockopts from
> BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
> than making the changes to call the socket setsockopt function because
> the changes required would have been larger.
>
> @@ -2671,6 +2672,69 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
>  	.arg1_type = ARG_PTR_TO_CTX,
>  };
>
> +BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> +	   int, level, int, optname, char *, optval, int, optlen)
> +{
> +	struct sock *sk = bpf_sock->sk;
> +	int ret = 0;
> +	int val;
> +
> +	if (bpf_sock->is_req_sock)
> +		return -EINVAL;
> +
> +	if (level == SOL_SOCKET) {
> +		/* Only some socketops are supported */
> +		val = *((int *)optval);
> +
> +		switch (optname) {
> +		case SO_RCVBUF:
> +			sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> +			sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
> +			break;
> +		case SO_SNDBUF:
> +			sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
> +			sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
> +			break;
> +		case SO_MAX_PACING_RATE:
> +			sk->sk_max_pacing_rate = val;
> +			sk->sk_pacing_rate = min(sk->sk_pacing_rate,
> +						 sk->sk_max_pacing_rate);
> +			break;
> +		case SO_PRIORITY:
> +			sk->sk_priority = val;
> +			break;
> +		case SO_RCVLOWAT:
> +			if (val < 0)
> +				val = INT_MAX;
> +			sk->sk_rcvlowat = val ? : 1;
> +			break;
> +		case SO_MARK:
> +			sk->sk_mark = val;
> +			break;

Isn't the socket lock required when manipulating these fields? It's
not obvious that the lock is held from every bpf hook point that could
trigger this function...
[PATCH] net: intel: e1000e: add check on e1e_wphy() return value
Check return value from call to e1e_wphy(). This value is being checked
during previous calls to function e1e_wphy() and it seems a check was
missing here.

Addresses-Coverity-ID: 1226905
Signed-off-by: Gustavo A. R. Silva
---
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 68ea8b4..d6d4ed7 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -2437,6 +2437,8 @@ static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
 		if (hw->phy.revision < 2) {
 			e1000e_phy_sw_reset(hw);
 			ret_val = e1e_wphy(hw, MII_BMCR, 0x3140);
+			if (ret_val)
+				return ret_val;
 		}
 	}
 
--
2.5.0
Re: [PATCH 00/51] rtc: stop using rtc deprecated functions
On Tue, Jun 20, 2017 at 05:07:46PM +0200, Benjamin Gaignard wrote:
> 2017-06-20 15:48 GMT+02:00 Alexandre Belloni:
> >> Yes, that's argument against changing rtc _drivers_ for hardware that
> >> can not do better than 32bit. For generic code (such as 44/51 sysfs,
> >> 51/51 suspend test), the change still makes sense.
>
> What I had in mind when writing those patches was to remove the limitations
> coming from those functions usage, even more since they have been marked as
> deprecated.

I'd say that they should not be marked as deprecated. They're entirely
appropriate for use with hardware that only supports a 32-bit
representation of time.

It's entirely reasonable to fix the ones that use other representations
that exceed that, but for those which do not, we need to keep using the
32-bit versions. Doing so actually gives us _more_ flexibility in the
future.

Consider that at the moment, we define the 32-bit RTC representation to
start at a well known epoch. We _could_ decide that when it wraps to
0x80000000 seconds, we'll define the lower 0x40000000 seconds to mean
dates in the future - and keep rolling that forward each time we cross
another 0x40000000 seconds. Unless someone invents a real time machine,
we shouldn't need to set a modern RTC back to 1970.

If we convert the 32-bit counter RTC drivers to use 64-bit conversions,
then we're completely stuffed, because the lower 32-bits will always be
relative to the epoch, and we can't change that without breaking the
64-bit users.

So, keep the 32-bit conversion functions, do not deprecate them, and
think about the future possibilities.

I really think this "get rid of 32-bit time representations" is a much
too narrow focus on the wrong problem. You can't ever fix 32-bit time
representations by just adding additional zeros into the MSB bits.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
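The rolling-window idea in the message above can be written down as plain arithmetic. This is a minimal sketch, not existing kernel code: the name rtc32_widen, its parameters, and the assumption that the current window start is stored somewhere outside the RTC are all hypothetical. It picks the one 64-bit time inside [window_start, window_start + 2^32) whose low 32 bits equal the raw counter.

```c
#include <stdint.h>

/*
 * Hypothetical helper: widen a raw 32-bit RTC counter to a 64-bit time.
 * The result is the unique value t with
 *   window_start <= t < window_start + 2^32  and  (uint32_t)t == raw.
 */
static int64_t rtc32_widen(uint32_t raw, int64_t window_start)
{
	uint32_t base = (uint32_t)window_start;	/* low 32 bits of the window start */
	uint32_t delta = raw - base;		/* unsigned distance, wraps modulo 2^32 */

	return window_start + (int64_t)delta;
}
```

With window_start at 0 the counter is read as-is. Assuming the window start has been moved forward to 0x40000000 (the thresholds here are assumed to be quarter and half of the 32-bit range), a raw reading of 0x10 is taken to be 0x100000010, a date after the first wrap, while 0x90000000 still maps to itself.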
[PATCH net-next v3 1/4] rtnetlink: add NEWCACHEREPORT message type
New NEWCACHEREPORT message type to be used for cache reports sent
via Netlink, effectively allowing splitting cache report reception from
mroute programming.

Suggested-by: Ryan Halbrook
Signed-off-by: Julien Gomes
Reviewed-by: Nikolay Aleksandrov
---
 include/uapi/linux/rtnetlink.h | 3 +++
 security/selinux/nlmsgtab.c    | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 564790e854f7..cd1afb900929 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -146,6 +146,9 @@ enum {
 	RTM_GETSTATS = 94,
 #define RTM_GETSTATS RTM_GETSTATS
 
+	RTM_NEWCACHEREPORT = 96,
+#define RTM_NEWCACHEREPORT RTM_NEWCACHEREPORT
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 5aeaf30b7a13..7b7433a1a34c 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -79,6 +79,7 @@ static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_GETNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ },
 	{ RTM_NEWSTATS, NETLINK_ROUTE_SOCKET__NLMSG_READ },
 	{ RTM_GETSTATS, NETLINK_ROUTE_SOCKET__NLMSG_READ },
+	{ RTM_NEWCACHEREPORT, NETLINK_ROUTE_SOCKET__NLMSG_READ },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -158,7 +159,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 	switch (sclass) {
 	case SECCLASS_NETLINK_ROUTE_SOCKET:
 		/* RTM_MAX always point to RTM_SET, ie RTM_NEWxxx + 3 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWSTATS + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_NEWCACHEREPORT + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;
--
2.13.1
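The BUILD_BUG_ON() update in the patch above rests on a small piece of arithmetic that can be checked in isolation. The following is a sketch only: the SKETCH_ names are local to this example, and the numeric values are copied from the hunk. With the new type at 96, the enum terminator becomes 97, and rounding up to a multiple of four and subtracting one gives 99, which is 96 + 3.

```c
/* Values copied from the patch; the SKETCH_ names exist only in this example. */
enum {
	SKETCH_RTM_GETSTATS = 94,
	SKETCH_RTM_NEWCACHEREPORT = 96,
	SKETCH_RTM_END,			/* plays the role of __RTM_MAX, so 97 */
};

/* Same expression as the RTM_MAX define: round up to a multiple of 4, minus 1. */
#define SKETCH_RTM_MAX (((SKETCH_RTM_END + 3) & ~3) - 1)

/* Compile-time check mirroring the BUILD_BUG_ON() in selinux_nlmsg_lookup(). */
_Static_assert(SKETCH_RTM_MAX == SKETCH_RTM_NEWCACHEREPORT + 3,
	       "RTM_MAX should be the last NEW message type plus 3");

/* Runtime accessor so the value can also be inspected or asserted on. */
static int sketch_rtm_max(void)
{
	return SKETCH_RTM_MAX;
}
```

This is also why the new type takes the value 96 and not 95: the comment in nlmsgtab.c expects RTM_MAX to be a NEW type plus 3, so each NEW type has to sit on a multiple of four.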
[PATCH net-next v3 0/4] ipmr/ip6mr: add Netlink notifications on cache reports
Currently, all ipmr/ip6mr cache reports are sent through the
mroute/mroute6 socket only.
This forces the use of a single socket for mroute programming, cache
reports and, regarding ipmr, IGMP messages without Router Alert option
reception.

The present patches aim to send Netlink notifications in addition to
the existing igmpmsg/mrt6msg, to give user programs a way to handle
cache reports in parallel with multiple sockets other than the
mroute/mroute6 socket.

Changes in v2:
  - Changed attributes naming from {IPMRA,IP6MRA}_CACHEREPORTA_* to
    {IPMRA,IP6MRA}_CREPORT_*
  - Improved packet data copy to handle non-linear packets in
    ipmr/ip6mr cache report Netlink notification creation
  - Added two rtnetlink groups with restricted-binding
  - Changed cache report notified groups from RTNLGRP_{IPV4,IPV6}_MROUTE
    to the new restricted groups in ipmr/ip6mr

Changes in v3:
  - Put message size calculation for {igmp,mrt6}msg_netlink_event in
    separate functions
  - Increased vif id attributes size from u8 to u32

Julien Gomes (4):
  rtnetlink: add NEWCACHEREPORT message type
  rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute
  ipmr: add netlink notifications on igmpmsg cache reports
  ip6mr: add netlink notifications on mrt6msg cache reports

 include/uapi/linux/mroute.h    | 12 +++
 include/uapi/linux/mroute6.h   | 12 +++
 include/uapi/linux/rtnetlink.h |  7 +
 net/core/rtnetlink.c           | 13
 net/ipv4/ipmr.c                | 69 ++--
 net/ipv6/ip6mr.c               | 71 --
 security/selinux/nlmsgtab.c    |  3 +-
 7 files changed, 182 insertions(+), 5 deletions(-)

--
2.13.1
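The message size calculation mentioned in the v3 changes can be reproduced in userspace to see what buffer size a given payload leads to. This is a sketch under stated assumptions: the sketch_ and SKETCH_ names are hypothetical stand-ins for the kernel's NLMSG_ALIGN() and nla_total_size(), and the terms follow igmpmsg_netlink_msgsize() from patch 3/4. It relies on netlink aligning to 4 bytes and on the attribute header being 4 bytes long.

```c
#include <stddef.h>

/* Netlink pads message headers and attributes to 4 bytes. */
#define SKETCH_ALIGN4(len)	(((len) + 3u) & ~(size_t)3u)
/* Size of the attribute header: two 16-bit fields (length and type). */
#define SKETCH_NLA_HDRLEN	4u

/* Userspace stand-in for nla_total_size(): header plus payload, padded. */
static size_t sketch_nla_total_size(size_t payload)
{
	return SKETCH_ALIGN4(SKETCH_NLA_HDRLEN + payload);
}

/* Follows igmpmsg_netlink_msgsize() from the ipmr patch, term by term. */
static size_t sketch_igmpmsg_msgsize(size_t payloadlen)
{
	return SKETCH_ALIGN4(1)			/* struct rtgenmsg is one byte */
		+ sketch_nla_total_size(1)	/* IPMRA_CREPORT_MSGTYPE */
		+ sketch_nla_total_size(4)	/* IPMRA_CREPORT_VIF_ID */
		+ sketch_nla_total_size(4)	/* IPMRA_CREPORT_SRC_ADDR */
		+ sketch_nla_total_size(4)	/* IPMRA_CREPORT_DST_ADDR */
		+ sketch_nla_total_size(payloadlen); /* IPMRA_CREPORT_PKT */
}
```

For a 100-byte packet this gives 4 + 4 * 8 + 104 = 140 bytes. The netlink message header itself is not part of this sum; in the patches it is accounted for by nlmsg_new().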
[PATCH net-next v3 2/4] rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute
Add RTNLGRP_{IPV4,IPV6}_MROUTE_R as two new restricted groups for the
NETLINK_ROUTE family.
Binding to these groups specifically requires CAP_NET_ADMIN to allow
multicast of sensitive messages (e.g. mroute cache reports).

Suggested-by: Nikolay Aleksandrov
Signed-off-by: Julien Gomes
Signed-off-by: Nikolay Aleksandrov
---
 include/uapi/linux/rtnetlink.h |  4
 net/core/rtnetlink.c           | 13 +
 2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cd1afb900929..d148505010a7 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -669,6 +669,10 @@ enum rtnetlink_groups {
 #define RTNLGRP_NSID RTNLGRP_NSID
 	RTNLGRP_MPLS_NETCONF,
 #define RTNLGRP_MPLS_NETCONF RTNLGRP_MPLS_NETCONF
+	RTNLGRP_IPV4_MROUTE_R,
+#define RTNLGRP_IPV4_MROUTE_R RTNLGRP_IPV4_MROUTE_R
+	RTNLGRP_IPV6_MROUTE_R,
+#define RTNLGRP_IPV6_MROUTE_R RTNLGRP_IPV6_MROUTE_R
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3aa57848a895..4aefa5a2625f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4218,6 +4218,18 @@ static void rtnetlink_rcv(struct sk_buff *skb)
 	rtnl_unlock();
 }
 
+static int rtnetlink_bind(struct net *net, int group)
+{
+	switch (group) {
+	case RTNLGRP_IPV4_MROUTE_R:
+	case RTNLGRP_IPV6_MROUTE_R:
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+			return -EPERM;
+		break;
+	}
+	return 0;
+}
+
 static int rtnetlink_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
@@ -4252,6 +4264,7 @@ static int __net_init rtnetlink_net_init(struct net *net)
 		.input = rtnetlink_rcv,
 		.cb_mutex = &rtnl_mutex,
 		.flags = NL_CFG_F_NONROOT_RECV,
+		.bind = rtnetlink_bind,
 	};
 
 	sk = netlink_kernel_create(net, NETLINK_ROUTE, &cfg);
--
2.13.1
[PATCH net-next v3 4/4] ip6mr: add netlink notifications on mrt6msg cache reports
Add Netlink notifications on cache reports in ip6mr, in addition to the existing mrt6msg sent to mroute6_sk. Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R. MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the same data as their equivalent fields in the mrt6msg header. PKT attribute is the packet sent to mroute6_sk, without the added mrt6msg header. Suggested-by: Ryan Halbrook Signed-off-by: Julien Gomes --- include/uapi/linux/mroute6.h | 12 net/ipv6/ip6mr.c | 71 ++-- 2 files changed, 81 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/mroute6.h b/include/uapi/linux/mroute6.h index ed5721148768..e4746816c855 100644 --- a/include/uapi/linux/mroute6.h +++ b/include/uapi/linux/mroute6.h @@ -133,4 +133,16 @@ struct mrt6msg { struct in6_addr im6_src, im6_dst; }; +/* ip6mr netlink cache report attributes */ +enum { + IP6MRA_CREPORT_UNSPEC, + IP6MRA_CREPORT_MSGTYPE, + IP6MRA_CREPORT_MIF_ID, + IP6MRA_CREPORT_SRC_ADDR, + IP6MRA_CREPORT_DST_ADDR, + IP6MRA_CREPORT_PKT, + __IP6MRA_CREPORT_MAX +}; +#define IP6MRA_CREPORT_MAX (__IP6MRA_CREPORT_MAX - 1) + #endif /* _UAPI__LINUX_MROUTE6_H */ diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index b0e2bf1f4212..7454850f2098 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -116,6 +116,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb, struct mfc6_cache *c, struct rtmsg *rtm); static void mr6_netlink_event(struct mr6_table *mrt, struct mfc6_cache *mfc, int cmd); +static void mrt6msg_netlink_event(struct mr6_table *mrt, struct sk_buff *pkt); static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb); static void mroute_clean_tables(struct mr6_table *mrt, bool all); @@ -1125,8 +1126,7 @@ static void ip6mr_cache_resolve(struct net *net, struct mr6_table *mrt, } /* - * Bounce a cache query up to pim6sd. We could use netlink for this but pim6sd - * expects the following bizarre scheme. + * Bounce a cache query up to pim6sd and netlink. 
* * Called under mrt_lock. */ @@ -1208,6 +1208,8 @@ static int ip6mr_cache_report(struct mr6_table *mrt, struct sk_buff *pkt, return -EINVAL; } + mrt6msg_netlink_event(mrt, skb); + /* * Deliver to user space multicast routing algorithms */ @@ -2457,6 +2459,71 @@ static void mr6_netlink_event(struct mr6_table *mrt, struct mfc6_cache *mfc, rtnl_set_sk_err(net, RTNLGRP_IPV6_MROUTE, err); } +static size_t mrt6msg_netlink_msgsize(size_t payloadlen) +{ + size_t len = + NLMSG_ALIGN(sizeof(struct rtgenmsg)) + + nla_total_size(1) /* IP6MRA_CREPORT_MSGTYPE */ + + nla_total_size(4) /* IP6MRA_CREPORT_MIF_ID */ + /* IP6MRA_CREPORT_SRC_ADDR */ + + nla_total_size(sizeof(struct in6_addr)) + /* IP6MRA_CREPORT_DST_ADDR */ + + nla_total_size(sizeof(struct in6_addr)) + /* IP6MRA_CREPORT_PKT */ + + nla_total_size(payloadlen) + ; + + return len; +} + +static void mrt6msg_netlink_event(struct mr6_table *mrt, struct sk_buff *pkt) +{ + struct net *net = read_pnet(&mrt->net); + struct nlmsghdr *nlh; + struct rtgenmsg *rtgenm; + struct mrt6msg *msg; + struct sk_buff *skb; + struct nlattr *nla; + int payloadlen; + + payloadlen = pkt->len - sizeof(struct mrt6msg); + msg = (struct mrt6msg *)skb_transport_header(pkt); + + skb = nlmsg_new(mrt6msg_netlink_msgsize(payloadlen), GFP_ATOMIC); + if (!skb) + goto errout; + + nlh = nlmsg_put(skb, 0, 0, RTM_NEWCACHEREPORT, + sizeof(struct rtgenmsg), 0); + if (!nlh) + goto errout; + rtgenm = nlmsg_data(nlh); + rtgenm->rtgen_family = RTNL_FAMILY_IP6MR; + if (nla_put_u8(skb, IP6MRA_CREPORT_MSGTYPE, msg->im6_msgtype) || + nla_put_u32(skb, IP6MRA_CREPORT_MIF_ID, msg->im6_mif) || + nla_put_in6_addr(skb, IP6MRA_CREPORT_SRC_ADDR, +&msg->im6_src) || + nla_put_in6_addr(skb, IP6MRA_CREPORT_DST_ADDR, +&msg->im6_dst)) + goto nla_put_failure; + + nla = nla_reserve(skb, IP6MRA_CREPORT_PKT, payloadlen); + if (!nla || skb_copy_bits(pkt, sizeof(struct mrt6msg), + nla_data(nla), payloadlen)) + goto nla_put_failure; + + nlmsg_end(skb, nlh); + + rtnl_notify(skb, net, 0, 
RTNLGRP_IPV6_MROUTE_R, NULL, GFP_ATOMIC); + return; + +nla_put_failure: + nlmsg_cancel(skb, nlh); +errout: + k
[PATCH net-next v3 3/4] ipmr: add netlink notifications on igmpmsg cache reports
Add Netlink notifications on cache reports in ipmr, in addition to the existing igmpmsg sent to mroute_sk. Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R. MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the same data as their equivalent fields in the igmpmsg header. PKT attribute is the packet sent to mroute_sk, without the added igmpmsg header. Suggested-by: Ryan Halbrook Signed-off-by: Julien Gomes --- include/uapi/linux/mroute.h | 12 net/ipv4/ipmr.c | 69 +++-- 2 files changed, 79 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h index f904367c0cee..e8e5041dea8e 100644 --- a/include/uapi/linux/mroute.h +++ b/include/uapi/linux/mroute.h @@ -152,6 +152,18 @@ enum { }; #define IPMRA_VIFA_MAX (__IPMRA_VIFA_MAX - 1) +/* ipmr netlink cache report attributes */ +enum { + IPMRA_CREPORT_UNSPEC, + IPMRA_CREPORT_MSGTYPE, + IPMRA_CREPORT_VIF_ID, + IPMRA_CREPORT_SRC_ADDR, + IPMRA_CREPORT_DST_ADDR, + IPMRA_CREPORT_PKT, + __IPMRA_CREPORT_MAX +}; +#define IPMRA_CREPORT_MAX (__IPMRA_CREPORT_MAX - 1) + /* That's all usermode folks */ #define MFC_ASSERT_THRESH (3*HZ) /* Maximal freq. of asserts */ diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 3e7454aa49e8..a1d521be612b 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -109,6 +109,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb, struct mfc_cache *c, struct rtmsg *rtm); static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc, int cmd); +static void igmpmsg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt); static void mroute_clean_tables(struct mr_table *mrt, bool all); static void ipmr_expire_process(unsigned long arg); @@ -995,8 +996,7 @@ static void ipmr_cache_resolve(struct net *net, struct mr_table *mrt, } } -/* Bounce a cache query up to mrouted. We could use netlink for this but mrouted - * expects the following bizarre scheme. 
+/* Bounce a cache query up to mrouted and netlink. * * Called under mrt_lock. */ @@ -1062,6 +1062,8 @@ static int ipmr_cache_report(struct mr_table *mrt, return -EINVAL; } + igmpmsg_netlink_event(mrt, skb); + /* Deliver to mrouted */ ret = sock_queue_rcv_skb(mroute_sk, skb); rcu_read_unlock(); @@ -2341,6 +2343,69 @@ static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc, rtnl_set_sk_err(net, RTNLGRP_IPV4_MROUTE, err); } +static size_t igmpmsg_netlink_msgsize(size_t payloadlen) +{ + size_t len = + NLMSG_ALIGN(sizeof(struct rtgenmsg)) + + nla_total_size(1) /* IPMRA_CREPORT_MSGTYPE */ + + nla_total_size(4) /* IPMRA_CREPORT_VIF_ID */ + + nla_total_size(4) /* IPMRA_CREPORT_SRC_ADDR */ + + nla_total_size(4) /* IPMRA_CREPORT_DST_ADDR */ + /* IPMRA_CREPORT_PKT */ + + nla_total_size(payloadlen) + ; + + return len; +} + +static void igmpmsg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt) +{ + struct net *net = read_pnet(&mrt->net); + struct nlmsghdr *nlh; + struct rtgenmsg *rtgenm; + struct igmpmsg *msg; + struct sk_buff *skb; + struct nlattr *nla; + int payloadlen; + + payloadlen = pkt->len - sizeof(struct igmpmsg); + msg = (struct igmpmsg *)skb_network_header(pkt); + + skb = nlmsg_new(igmpmsg_netlink_msgsize(payloadlen), GFP_ATOMIC); + if (!skb) + goto errout; + + nlh = nlmsg_put(skb, 0, 0, RTM_NEWCACHEREPORT, + sizeof(struct rtgenmsg), 0); + if (!nlh) + goto errout; + rtgenm = nlmsg_data(nlh); + rtgenm->rtgen_family = RTNL_FAMILY_IPMR; + if (nla_put_u8(skb, IPMRA_CREPORT_MSGTYPE, msg->im_msgtype) || + nla_put_u32(skb, IPMRA_CREPORT_VIF_ID, msg->im_vif) || + nla_put_in_addr(skb, IPMRA_CREPORT_SRC_ADDR, + msg->im_src.s_addr) || + nla_put_in_addr(skb, IPMRA_CREPORT_DST_ADDR, + msg->im_dst.s_addr)) + goto nla_put_failure; + + nla = nla_reserve(skb, IPMRA_CREPORT_PKT, payloadlen); + if (!nla || skb_copy_bits(pkt, sizeof(struct igmpmsg), + nla_data(nla), payloadlen)) + goto nla_put_failure; + + nlmsg_end(skb, nlh); + + rtnl_notify(skb, net, 0, 
RTNLGRP_IPV4_MROUTE_R, NULL, GFP_ATOMIC); + return; + +nla_put_failure: + nlmsg_cancel(skb, nlh); +errout: + kfree_skb(skb); + rtnl_set_sk_err(net, RTNLGRP_IPV4_MROUTE_R, -ENOBUFS); +} + static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_c
Re: [PATCH] liquidio: stop using huge static buffer, save 4096k in .data
From: Derek Chickles
Date: Tue, 20 Jun 2017 13:15:34 -0700

> > From: David Miller [mailto:da...@davemloft.net]
> > Sent: Tuesday, June 20, 2017 12:22 PM
> >
> > From: Denys Vlasenko
> > Date: Mon, 19 Jun 2017 21:50:52 +0200
> >
> > > Only compile-tested - I don't have the hardware.
> > >
> > > From code inspection, octeon_pci_write_core_mem() appears to be safe wrt
> > > unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned
> > > anyway.
> > >
> > > Signed-off-by: Denys Vlasenko
> >
> > Looks good to me but I'll let one of the liquidio guys review this first
> > before I apply it.
>
> Felix is going to try this out this week to confirm. Let's wait for his ack.

This patch works. I tested it with a LiquidIO II adapter.

ACK
Re: [PATCH v1 1/2] dt-binding: ptp: add bindings document for dte based ptp clock
Hi Rob,

On 17-06-18 07:04 AM, Rob Herring wrote:
> On Mon, Jun 12, 2017 at 01:26:00PM -0700, Arun Parameswaran wrote:
> > Add device tree binding documentation for the Broadcom DTE
> > PTP clock driver.
> >
> > Signed-off-by: Arun Parameswaran
> > ---
> >  Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt | 13 +
> >  1 file changed, 13 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt
> >
> > diff --git a/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt b/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt
> > new file mode 100644
> > index 000..07590bc
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt
> > @@ -0,0 +1,13 @@
> > +* Broadcom Digital Timing Engine(DTE) based PTP clock driver
>
> Bindings describe h/w, not drivers.
>
> > +
> > +Required properties:
> > +- compatible: should be "brcm,ptp-dte"
>
> Looks too generic. You need SoC specific compatible strings.

Rob, could you please help me understand the use of adding SoC specific
compatible strings. I still don't get it.

It's my understanding that the SoC compatibility string is to future
proof against bugs/incompatibilities between different versions of the
hardware block due to integration issues or any other reason. You can
then compare in your driver because the strings were already used in
the dtb.

That would make sense if you can't already differentiate what SoC you
are running on. But the SoC is already specified in the root of the
device tree in the compatible string? Why can't you just use
of_machine_is_compatible inside your driver when needed?

Please explain what I'm missing. I see other drivers already following
the of_machine_is_compatible approach and it makes more sense to me
than adding SoC specific compatible strings into every driver.

Regards,
Scott
Re: [PATCH net-next 06/12] nfp: add stats and xmit helpers for representors
On Wed, Jun 21, 2017 at 01:15:05AM +0800, kbuild test robot wrote:
> Hi Simon,
>
> [auto build test ERROR on net-next/master]
>
> url:    https://github.com/0day-ci/linux/commits/Simon-Horman/nfp-add-flower-app-with-representors/20170620-233831
> config: arm-allmodconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=arm

It seems that I forgot to add #include
I will do so in v2.

> All errors (new ones prefixed by >>):
>
>    drivers/net//ethernet/netronome/nfp/nfp_net_repr.c: In function 'nfp_repr_phy_port_get_stats64':
> >> drivers/net//ethernet/netronome/nfp/nfp_net_repr.c:88:22: error: implicit declaration of function 'readq' [-Werror=implicit-function-declaration]
>      stats->tx_packets = readq(mem + NFP_MAC_STATS_RX_FRAMES_RECEIVED_OK);
>                          ^
>    cc1: some warnings being treated as errors
>
> vim +/readq +88 drivers/net//ethernet/netronome/nfp/nfp_net_repr.c
>
>     72          stats->rx_packets++;
>     73          stats->rx_bytes += len;
>     74          u64_stats_update_end(&stats->syncp);
>     75  }
>     76
>     77  void
>     78  nfp_repr_phy_port_get_stats64(const struct nfp_app *app, u8 phy_port,
>     79                                struct rtnl_link_stats64 *stats)
>     80  {
>     81          u8 __iomem *mem;
>     82
>     83          mem = app->pf->mac_stats_mem + phy_port * NFP_MAC_STATS_SIZE;
>     84
>     85          /* TX and RX stats are flipped as we are returning the stats as seen
>     86           * at the switch port corresponding to the phys port.
>     87           */
>   > 88          stats->tx_packets = readq(mem + NFP_MAC_STATS_RX_FRAMES_RECEIVED_OK);
>     89          stats->tx_bytes = readq(mem + NFP_MAC_STATS_RX_IN_OCTETS);
>     90          stats->tx_dropped = readq(mem + NFP_MAC_STATS_RX_IN_ERRORS);
>     91
>     92          stats->rx_packets = readq(mem + NFP_MAC_STATS_TX_FRAMES_TRANSMITTED_OK);
>     93          stats->rx_bytes = readq(mem + NFP_MAC_STATS_TX_OUT_OCTETS);
>     94          stats->rx_dropped = readq(mem + NFP_MAC_STATS_TX_OUT_ERRORS);
>     95  }
>     96
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH] net: phy: smsc: fix buffer overflow in memcpy
The memcpy annotation triggers for a fixed-length buffer copy:

In file included from /git/arm-soc/arch/arm64/include/asm/processor.h:30:0,
                 from /git/arm-soc/arch/arm64/include/asm/spinlock.h:21,
                 from /git/arm-soc/include/linux/spinlock.h:87,
                 from /git/arm-soc/include/linux/seqlock.h:35,
                 from /git/arm-soc/include/linux/time.h:5,
                 from /git/arm-soc/include/linux/stat.h:21,
                 from /git/arm-soc/include/linux/module.h:10,
                 from /git/arm-soc/drivers/net/phy/smsc.c:20:
In function 'memcpy',
    inlined from 'smsc_get_strings' at /git/arm-soc/drivers/net/phy/smsc.c:166:3:
/git/arm-soc/include/linux/string.h:309:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter

Using strncpy instead of memcpy should do the right thing here.

Fixes: 030a89028db0 ("net: phy: smsc: Implement PHY statistics")
Signed-off-by: Arnd Bergmann
---
 drivers/net/phy/smsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index 1b8204be064c..2306bfae057f 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -163,7 +163,7 @@ static void smsc_get_strings(struct phy_device *phydev, u8 *data)
         int i;

         for (i = 0; i < ARRAY_SIZE(smsc_hw_stats); i++) {
-                memcpy(data + i * ETH_GSTRING_LEN,
+                strncpy(data + i * ETH_GSTRING_LEN,
                        smsc_hw_stats[i].string, ETH_GSTRING_LEN);
         }
 }
--
2.9.0
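As a rough userspace illustration of why the swap helps (a sketch under stated assumptions, not the driver itself): each statistics name is a string literal shorter than the slot it is copied into, so a fixed-size memcpy() reads past the end of the literal, whereas strncpy() stops reading at the terminating NUL and zero-fills the rest of the slot. `GSTRING_LEN`, the `hw_stat` table, its two names and `get_strings()` below are hypothetical stand-ins for the kernel's ETH_GSTRING_LEN, smsc_hw_stats[] and smsc_get_strings().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's ETH_GSTRING_LEN, redefined for userspace. */
#define GSTRING_LEN 32

/* Hypothetical stand-in for the driver's statistics table. */
struct hw_stat {
	const char *string;
};

static const struct hw_stat hw_stats[] = {
	{ "rx_symbol_errors" },		/* illustrative names only */
	{ "link_down_events" },
};

#define N_STATS (sizeof(hw_stats) / sizeof(hw_stats[0]))

/*
 * Fill `data` (N_STATS * GSTRING_LEN bytes) with one name per slot.
 * strncpy() reads the source only up to its NUL and pads the remainder
 * of the slot with zeros, so it never reads beyond a short literal the
 * way memcpy(dst, src, GSTRING_LEN) would.
 */
static void get_strings(unsigned char *data)
{
	size_t i;

	assert(data != NULL);

	for (i = 0; i < N_STATS; i++)
		strncpy((char *)data + i * GSTRING_LEN,
			hw_stats[i].string, GSTRING_LEN);
}
```

One caveat worth keeping in mind: strncpy() does not NUL-terminate the destination when the source is GSTRING_LEN bytes or longer, so this pattern relies on every name being shorter than the slot.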
RE: [PATCH] liquidio: stop using huge static buffer, save 4096k in .data
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, June 20, 2017 12:22 PM
>
> From: Denys Vlasenko
> Date: Mon, 19 Jun 2017 21:50:52 +0200
>
> > Only compile-tested - I don't have the hardware.
> >
> > From code inspection, octeon_pci_write_core_mem() appears to be safe wrt
> > unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned
> > anyway.
> >
> > Signed-off-by: Denys Vlasenko
>
> Looks good to me but I'll let one of the liquidio guys review this first
> before I apply it.

Felix is going to try this out this week to confirm. Let's wait for his ack.
[net-next PATCH] tcp: md5: hide unused variable
Changing from a memcpy to per-member comparison left the
size variable unused:

net/ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup':
net/ipv4/tcp_ipv4.c:910:15: error: unused variable 'size' [-Werror=unused-variable]

This does not show up when CONFIG_IPV6 is enabled, but the variable
can be removed either way, along with the now unused assignment.

Fixes: 6797318e623d ("tcp: md5: add an address prefix for key lookup")
Signed-off-by: Arnd Bergmann
---
 net/ipv4/tcp_ipv4.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index bf407f3e20dd..e20bcf0061af 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -907,7 +907,6 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
 {
         const struct tcp_sock *tp = tcp_sk(sk);
         struct tcp_md5sig_key *key;
-        unsigned int size = sizeof(struct in_addr);
         const struct tcp_md5sig_info *md5sig;
         __be32 mask;
         struct tcp_md5sig_key *best_match = NULL;
@@ -918,10 +917,7 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
                                        lockdep_sock_is_held(sk));
         if (!md5sig)
                 return NULL;
-#if IS_ENABLED(CONFIG_IPV6)
-        if (family == AF_INET6)
-                size = sizeof(struct in6_addr);
-#endif
+
         hlist_for_each_entry_rcu(key, &md5sig->head, node) {
                 if (key->family != family)
                         continue;
--
2.9.0
[GIT] Networking
1) Fix refcounting wrt. timers which hold onto inet6 address objects,
   from Xin Long.

2) Fix an ancient bug in wireless wext ioctls, from Johannes Berg.

3) Firmware handling fixes in brcm80211 driver, from Arend Van Spriel.

4) Several mlx5 driver fixes (firmware readiness, timestamp cap
   reporting, devlink command validity checking, tc offloading, etc.)
   From Eli Cohen, Maor Dickman, Chris Mi, and Or Gerlitz.

5) Fix dst leak in IP/IP6 tunnels, from Haishuang Yan.

6) Fix dst refcount bug in decnet, from Wei Wang.

7) Netdev can be double freed in register_vlan_device(). Fix from
   Gao Feng.

8) Don't allow object to be destroyed while it is being dumped in SCTP,
   from Xin Long.

9) Fix dpaa_eth build when modular, from Madalin Bucur.

10) Fix throw route leaks, from Serhey Popovych.

11) IFLA_GROUP missing from if_nlmsg_size() and ifla_policy[] table,
    also from Serhey Popovych.

12) Fix premature TX SKB free in stmmac, from Niklas Cassel.

Please pull, thanks a lot!

The following changes since commit a090bd4ff8387c409732a8e059fbf264ea0bdd56:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-06-15 18:09:47 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to b4846fc3c8559649277e3e4e6b5cec5348a8d208:

  igmp: add a missing spin_lock_init() (2017-06-20 15:51:57 -0400)

Arend Van Spriel (5):
      brcmfmac: add parameter to pass error code in firmware callback
      brcmfmac: use firmware callback upon failure to load
      brcmfmac: unbind all devices upon failure in firmware callback
      brcmfmac: fix brcmf_fws_add_interface() for USB devices
      brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2()

Chris Mi (1):
      net/mlx5e: Fix min inline value for VF rep SQs

David Howells (1):
      rxrpc: Fix several cases where a padded len isn't checked in ticket decode

David S. Miller (4):
      Merge tag 'mlx5-fixes-2017-06-14' of git://git.kernel.org/.../saeed/linux
      Merge tag 'mac80211-for-davem-2017-06-16' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'net-fix-loadable-module-for-DPAA-Ethernet'
      Merge tag 'wireless-drivers-for-davem-2017-06-20' of git://git.kernel.org/.../kvalo/wireless-drivers

Edward Cree (1):
      sfc: remove duplicate up_write on VF filter_sem

Eli Cohen (1):
      net/mlx5: Wait for FW readiness before initializing command interface

Gao Feng (1):
      net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev

Haishuang Yan (3):
      ip_tunnel: fix potential issue in ip_tunnel_rcv
      ip6_tunnel: fix potential issue in __ip6_tnl_rcv
      ip6_tunnel: Correct tos value in collect_md mode

Johannes Berg (3):
      wireless: wext: remove ndo_do_ioctl fallback
      wireless: wext: use struct iwreq earlier in the call chain
      dev_ioctl: copy only the smaller struct iwreq for wext

Krzysztof Kozlowski (1):
      dt-bindings: net: sms911x: Add missing optional VDD regulators

Lin Yun Sheng (1):
      net/hns:bugfix of ethtool -t phy self_test

Madalin Bucur (2):
      fsl/fman: propagate dma_ops
      dpaa_eth: reuse the dma_ops provided by the FMan MAC device

Maor Dickman (1):
      net/mlx5e: Fix timestamping capabilities reporting

Niklas Cassel (1):
      net: stmmac: free an skb first when there are no longer any descriptors using it

Or Gerlitz (3):
      net/mlx5: Properly check applicability of devlink eswitch commands
      net/mlx5e: Remove TC header re-write offloading of ip tos
      net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it

Raju Rangoju (1):
      cxgb4: notify uP to route ctrlq compl to rdma rspq

Sebastian Siewior (1):
      net/core: remove explicit do_softirq() from busy_poll_stop()

Serhey Popovych (3):
      fib_rules: Resolve goto rules target on delete
      ipv6: Do not leak throw route references
      rtnetlink: add IFLA_GROUP to ifla_policy

WANG Cong (1):
      igmp: add a missing spin_lock_init()

Wei Wang (1):
      decnet: always not take dst->__refcnt when inserting dst into hash table

Xin Long (3):
      ipv6: fix calling in6_ifa_hold incorrectly for dad work
      sctp: return next obj by passing pos + 1 into sctp_transport_get_idx
      sctp: ensure ep is not destroyed before doing the dump

xypron.g...@gmx.de (1):
      Doc: net: dsa: b53: update location of referenced dsa.txt

 Documentation/devicetree/bindings/net/dsa/b53.txt    |  2 +-
 Documentation/devicetree/bindings/net/smsc911x.txt   |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c      | 10 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c       |  2 +-
 drivers/net/ethernet/freescale/fman/mac.c            |  2 ++
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c     | 16 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  8 +++