Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time
On Mon, Oct 09, 2017 at 10:25:25PM +0200, Phil Sutter wrote: > Hi Stephen, > > On Mon, Oct 02, 2017 at 10:37:08AM -0700, Stephen Hemminger wrote: > > On Thu, 28 Sep 2017 21:33:46 +0800 > > Hangbin Liu wrote: > > > > > From: Hangbin Liu > > > > > > This is an update for 460c03f3f3cc ("iplink: double the buffer size also > > > in > > > iplink_get()"). After update, we will not need to double the buffer size > > > every time when VFs number increased. > > > > > > With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the > > > length parameter. > > > > > > With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable > > > answer to avoid overwrite data in nlh, because it may has more info after > > > nlh. also this will avoid nlh buffer not enough issue. > > > > > > We need to free answer after using. > > > > > > Signed-off-by: Hangbin Liu > > > Signed-off-by: Phil Sutter > > > --- > > > > Most of the uses of rtnl_talk() don't need to this peek and dynamic sizing. > > Can only those places that need that be targeted? > > We could probably do that, by having a buffer on stack in __rtnl_talk() > which will be used instead of the allocated one if 'answer' is NULL. Or > maybe even introduce a dedicated API call for the dynamically allocated > receive buffer. But I really doubt that's feasible: AFAICT, that stack > buffer still needs to be reasonably sized since the reply might be > larger than the request (reusing the request buffer would be the most > simple way to tackle this), also there is support for extack which may > bloat the response to arbitrary size. Hangbin has shown in his benchmark > that the overhead of the second syscall is negligible, so why care about > that and increase code complexity even further? > > Not saying it's not possible, but I just doubt it's worth the effort. Agreed. Current code is based on the assumption that we can estimate the maximum reply length in advance and the reason for this series is that this assumption turned out to be wrong. I'm afraid that if we replace it by an assumption that we can estimate the maximum reply length for most requests with only few exceptions, it's only matter of time for us to be proven wrong again. Michal Kubecek
[Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
We need a real-time notification for tcp retransmission for monitoring. Of course we could use ftrace to dynamically instrument this kernel function too, however we can't retrieve the connection information at the same time, for example perf-tools [1] reads /proc/net/tcp for socket details, which is slow when we have a lots of connections. Therefore, this patch adds a tracepoint for tcp_retransmit_skb() and exposes src/dst IP addresses and ports of the connection. This also makes it easier to integrate into perf. Note, I expose both IPv4 and IPv6 addresses at the same time: for a IPv4 socket, v4 mapped address is used as IPv6 addresses, for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel. Perhaps there are other interfaces to use (for example netlink), but tracepoint is the quickest way I can think of. 1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans Cc: Eric Dumazet Cc: Yuchung Cheng Cc: Neal Cardwell Signed-off-by: Cong Wang --- include/trace/events/tcp.h | 63 ++ net/core/net-traces.c | 1 + net/ipv4/tcp_output.c | 3 +++ 3 files changed, 67 insertions(+) create mode 100644 include/trace/events/tcp.h diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h new file mode 100644 index ..cb22acc8aacd --- /dev/null +++ b/include/trace/events/tcp.h @@ -0,0 +1,63 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM tcp + +#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_TCP_H + +#include +#include +#include +#include + +TRACE_EVENT(tcp_retransmit_skb, + + TP_PROTO(const struct sock *sk, struct sk_buff *skb, int segs), + + TP_ARGS(sk, skb, segs), + + TP_STRUCT__entry( + __field(__u16, sport) + __field(__u16, dport) + __array(__u8, saddr, 4) + __array(__u8, daddr, 4) + __array(__u8, saddr_v6, 16) + __array(__u8, daddr_v6, 16) + ), + + TP_fast_assign( + struct ipv6_pinfo *np = inet6_sk(sk); + struct inet_sock *inet = inet_sk(sk); + struct in6_addr *pin6; + __be32 *p32; + + __entry->sport = ntohs(inet->inet_sport); + __entry->dport = ntohs(inet->inet_dport); + + p32 = (__be32 *) __entry->saddr; + *p32 = inet->inet_saddr; + + p32 = (__be32 *) __entry->daddr; + *p32 = inet->inet_daddr; + + if (np) { + pin6 = (struct in6_addr *)__entry->saddr_v6; + *pin6 = np->saddr; + pin6 = (struct in6_addr *)__entry->daddr_v6; + *pin6 = *(np->daddr_cache); + } else { + pin6 = (struct in6_addr *)__entry->saddr_v6; + ipv6_addr_set_v4mapped(inet->inet_saddr, pin6); + pin6 = (struct in6_addr *)__entry->daddr_v6; + ipv6_addr_set_v4mapped(inet->inet_daddr, pin6); + } + ), + + TP_printk("sport=%hu, dport=%hu, saddr=%pI4, daddr=%pI4, saddrv6=%pI6, daddrv6=%pI6", + __entry->sport, __entry->dport, __entry->saddr, __entry->daddr, + __entry->saddr_v6, __entry->daddr_v6) +); + +#endif /* _TRACE_TCP_H */ + +/* This part must be outside protection */ +#include diff --git a/net/core/net-traces.c b/net/core/net-traces.c index 1132820c8e62..f4e4fa2db505 100644 --- a/net/core/net-traces.c +++ b/net/core/net-traces.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #if IS_ENABLED(CONFIG_IPV6) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 696b0a168f16..e6d6e1393578 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -42,6 +42,8 @@ #include #include +#include + /* People can turn this off for buggy TCP's found in printers etc. */ int sysctl_tcp_retrans_collapse __read_mostly = 1; @@ -2899,6 +2901,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) if (!tp->retrans_stamp) tp->retrans_stamp = tcp_skb_timestamp(skb); + trace_tcp_retransmit_skb(sk, skb, segs); } if (tp->undo_retrans < 0) -- 2.13.0
Re: [net-next 2/3] ip_gre: fix erspan tunnel mtu calculation
On Tue, Oct 10, 2017 at 4:47 AM, William Tu wrote: > Remove the unnecessary -4 and +4 bytes at mtu and headroom calculation. > In addition, erspan uses fixed 8-byte gre header, so add ERSPAN_GREHDR_LEN > macro for better readability. > > Now tunnel->hlen = grehdr(8) + erspanhdr(8) = 16 byte. > The mtu should be ETH_DATA_LEN - 16 - iph(20) = 1464. > After the ip_tunnel_bind_dev(), the mtu is adjusted to > 1464 - 14 (dev->hard_header_len) = 1450. > The maximum skb->len the erspan tunnel can carry without > being truncated is 1450 + 14 = 1464 byte. > > Signed-off-by: William Tu > Cc: Xin Long > --- > include/net/erspan.h | 1 + > net/ipv4/ip_gre.c| 11 +-- > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/include/net/erspan.h b/include/net/erspan.h > index ca94fc86865e..e28294e248d0 100644 > --- a/include/net/erspan.h > +++ b/include/net/erspan.h > @@ -28,6 +28,7 @@ > */ > > #define ERSPAN_VERSION 0x1 > +#define ERSPAN_GREHDR_LEN 8/* ERSPAN has fixed 8-byte GRE header */ > > #define VER_MASK 0xf000 > #define VLAN_MASK 0x0fff > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 286065c35959..6e6e4c4811cc 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -569,8 +569,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct > net_device *dev, > > key = &tun_info->key; > > - /* ERSPAN has fixed 8 byte GRE header */ > - tunnel_hlen = 8 + sizeof(struct erspanhdr); > + tunnel_hlen = ERSPAN_GREHDR_LEN + sizeof(struct erspanhdr); > > rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen); > if (!rt) > @@ -591,7 +590,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct > net_device *dev, > erspan_build_header(skb, tunnel_id_to_key32(key->tun_id), > ntohl(md->index), truncate); > > - gre_build_header(skb, 8, TUNNEL_SEQ, > + gre_build_header(skb, ERSPAN_GREHDR_LEN, TUNNEL_SEQ, > htons(ETH_P_ERSPAN), 0, htonl(tunnel->o_seqno++)); > > df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0; > @@ -1242,14 +1241,14 @@ static int erspan_tunnel_init(struct net_device *dev) > struct ip_tunnel *tunnel = netdev_priv(dev); > int t_hlen; > > - tunnel->tun_hlen = 8; > + tunnel->tun_hlen = ERSPAN_GREHDR_LEN; > tunnel->parms.iph.protocol = IPPROTO_GRE; > tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen + >sizeof(struct erspanhdr); > t_hlen = tunnel->hlen + sizeof(struct iphdr); > > - dev->needed_headroom = LL_MAX_HEADER + t_hlen + 4; > - dev->mtu = ETH_DATA_LEN - t_hlen - 4; > + dev->needed_headroom = LL_MAX_HEADER + t_hlen; > + dev->mtu = ETH_DATA_LEN - t_hlen; 1. I guess '+4-4' stuff was copied from __gre_tunnel_init(), I'm thinking it may be there for some reason. 2. 'dev->needed_headroom =' and 'dev->mtu =' are really needed ? As I've seen both will be updated in .newlink: ipgre_newlink() -> ip_tunnel_newlink() -> ip_tunnel_bind_dev() > dev->features |= GRE_FEATURES; > dev->hw_features|= GRE_FEATURES; > dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; > -- > 2.7.4 >
Re: [PATCH] net: ftgmac100: Request clock and set speed
On Tue, Oct 10, 2017 at 2:34 PM, Florian Fainelli wrote: > > > On 10/09/2017 09:49 PM, Joel Stanley wrote: >> According to the ASPEED datasheet, gigabit speeds require a clock of >> 100MHz or higher. Other speeds require 25MHz or higher. >> >> Signed-off-by: Joel Stanley >> --- > >> @@ -161,6 +170,9 @@ static int ftgmac100_reset_and_config_mac(struct >> ftgmac100 *priv) >> break; >> } >> >> + if (freq && priv->clk) >> + clk_set_rate(priv->clk, freq); > > Checking for priv->clk should not be necessary all public clk APIs can > deal with a NULL clock pointer. The intention was to set ->clk to NULL to indicate that there was no clk, and therefore there is no reason to attempt to set the rate or call prepare/unprepare. If the we prefer to call the clk apis unconditionally I will send a v2 with the checks removed. Thanks for the review. Cheers, Joel
Re: [PATCH] net: ftgmac100: Request clock and set speed
On 10/09/2017 09:49 PM, Joel Stanley wrote: > According to the ASPEED datasheet, gigabit speeds require a clock of > 100MHz or higher. Other speeds require 25MHz or higher. > > Signed-off-by: Joel Stanley > --- > @@ -161,6 +170,9 @@ static int ftgmac100_reset_and_config_mac(struct > ftgmac100 *priv) > break; > } > > + if (freq && priv->clk) > + clk_set_rate(priv->clk, freq); Checking for priv->clk should not be necessary all public clk APIs can deal with a NULL clock pointer. > + > /* (Re)initialize the queue pointers */ > priv->rx_pointer = 0; > priv->tx_clean_pointer = 0; > @@ -1775,6 +1787,13 @@ static int ftgmac100_probe(struct platform_device > *pdev) > priv->dev = &pdev->dev; > INIT_WORK(&priv->reset_task, ftgmac100_reset_task); > > + /* Enable clock if present */ > + priv->clk = devm_clk_get(&pdev->dev, NULL); > + if (!IS_ERR(priv->clk)) > + clk_prepare_enable(priv->clk); > + else > + priv->clk = NULL; Same here. > + > /* map io memory */ > priv->res = request_mem_region(res->start, resource_size(res), > dev_name(&pdev->dev)); > @@ -1883,6 +1902,9 @@ static int ftgmac100_remove(struct platform_device > *pdev) > > unregister_netdev(netdev); > > + if (priv->clk) > + clk_disable_unprepare(priv->clk); And here. > + > /* There's a small chance the reset task will have been re-queued, >* during stop, make sure it's gone before we free the structure. >*/ > -- Florian
Re: [PATCH] net: dsa: mv88e6xxx: rework in-chip bridging
On 09/10/17 14:00, Florian Fainelli wrote: > Le 10/08/17 à 20:23, Greg Ungerer a écrit : >> On 07/10/17 13:04, Florian Fainelli wrote: >>> Le 10/03/17 à 23:20, Greg Ungerer a écrit : On Wed, Mar 29, 2017 at 04:30:16PM -0400, Vivien Didelot wrote: > All ports -- internal and external, for chips featuring a PVT -- have a > mask restricting to which internal ports a frame is allowed to egress. > > Now that DSA exposes the number of ports and their bridge devices, it is > possible to extract the code generating the VLAN map and make it generic > so that it can be shared later with the cross-chip bridging code. This patch changes the behavior of interfaces on startup if they are not part of a bridge. I have a board with a Marvell 6350 switch with a device tree that sets up the 5 ports as lan1, lan2, lan3, lan4, wan. With kernels before this patch (so linux-4.12 and older) after system startup I could do: ifconfig lan1 192.168.0.1 And then ping out that interface with no problems. After this patch is applied (effects linux-4.13 and newer) then the ping fails: PING 192.168.0.22 (192.168.0.22) 56(84) bytes of data. From 192.168.0.1 icmp_seq=1 Destination Host Unreachable From 192.168.0.1 icmp_seq=2 Destination Host Unreachable From 192.168.0.1 icmp_seq=3 Destination Host Unreachable If I incorporate an interface into a bridge then it all works ok. So simply: brctl addbr br0 brctl addif br0 lan1 ifconfig lan1 up ifconfig br0 192.168.0.1 Then pings out work as expected. And if I now remove that lan1 interface from the bridge and use it alone again then it will now work ok: ifconfig br0 down brctl delif br0 lan1 ifconfig lan1 192.168.0.1 And that now pings ok. I fixed this with the attached patch. It is probably not the correct approach, but it does restore the older behavior. What do you think? >>> >>> This is strange, the dsa_switch_tree and its associated dsa_switch >>> instances should be fully setup by the time ops->setup() is running in >>> your driver but your patch suggests this may not be happening? >> >> That is what I am seeing, yep. >> >> >>> Are you using the new style Device Tree binding or the old style Device >>> Tree binding out of curiosity? >> >> This is my device tree fragment for the switch: >> >> dsa@0 { >> compatible = "marvell,dsa"; >> #address-cells = <2>; >> #size-cells = <0>; >> >> dsa,ethernet = <ð0>; >> dsa,mii-bus = <&mdio>; >> >> switch@0 { >> #address-cells = <1>; >> #size-cells = <0>; >> reg = <0x11 0>; >> >> port@0 { >> reg = <0>; >> label = "lan1"; >> }; >> port@1 { >> reg = <1>; >> label = "lan2"; >> }; >> port@2 { >> reg = <2>; >> label = "lan3"; >> }; >> port@3 { >> reg = <3>; >> label = "lan4"; >> }; >> port@4 { >> reg = <4>; >> label = "wan"; >> }; >> port@5 { >> reg = <5>; >> label = "cpu"; >> }; >> }; >> }; >> >> The board I am using is based around an Marvell Armada 370. This device tree >> setup looks pretty similar to the other Marvell boards using marvell,dsa. > > This is the old Device Tree binding which goes through an unfortunately > different code path while initializing all the dsa_switch_tree and > dsa_switch structures, while we should definitively look into fixing > this, would you mind trying to update your board using something similar > to this commit: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cb2ec8cad8c82cd7cfd19edcacd846861d6e703 Ok, converted the devicetree to use this form of dsa setup. Does not show the problem, on first boot lan1 works properly. > This would make you go through net/dsa/dsa2.c which is what most of us > usually test. In the meantime we should probably start issuing warning > messages when people use the old Device Tree binding to encourage them > to migrate other. Yeah, I had not noticed that the devicetree dsa setup had changed, I have been using my devicetree for quite a few k
[PATCH] net: ftgmac100: Request clock and set speed
According to the ASPEED datasheet, gigabit speeds require a clock of 100MHz or higher. Other speeds require 25MHz or higher. Signed-off-by: Joel Stanley --- drivers/net/ethernet/faraday/ftgmac100.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index 9ed8e4b81530..870ebd857978 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -21,6 +21,7 @@ #define pr_fmt(fmt)KBUILD_MODNAME ": " fmt +#include #include #include #include @@ -59,6 +60,9 @@ /* Min number of tx ring entries before stopping queue */ #define TX_THRESHOLD (MAX_SKB_FRAGS + 1) +#define FTGMAC_100MHZ 1 +#define FTGMAC_25MHZ 2500 + struct ftgmac100 { /* Registers */ struct resource *res; @@ -96,6 +100,7 @@ struct ftgmac100 { struct napi_struct napi; struct work_struct reset_task; struct mii_bus *mii_bus; + struct clk *clk; /* Link management */ int cur_speed; @@ -142,18 +147,22 @@ static int ftgmac100_reset_mac(struct ftgmac100 *priv, u32 maccr) static int ftgmac100_reset_and_config_mac(struct ftgmac100 *priv) { u32 maccr = 0; + int freq = 0; switch (priv->cur_speed) { case SPEED_10: case 0: /* no link */ + freq = FTGMAC_25MHZ; break; case SPEED_100: maccr |= FTGMAC100_MACCR_FAST_MODE; + freq = FTGMAC_25MHZ; break; case SPEED_1000: maccr |= FTGMAC100_MACCR_GIGA_MODE; + freq = FTGMAC_100MHZ; break; default: netdev_err(priv->netdev, "Unknown speed %d !\n", @@ -161,6 +170,9 @@ static int ftgmac100_reset_and_config_mac(struct ftgmac100 *priv) break; } + if (freq && priv->clk) + clk_set_rate(priv->clk, freq); + /* (Re)initialize the queue pointers */ priv->rx_pointer = 0; priv->tx_clean_pointer = 0; @@ -1775,6 +1787,13 @@ static int ftgmac100_probe(struct platform_device *pdev) priv->dev = &pdev->dev; INIT_WORK(&priv->reset_task, ftgmac100_reset_task); + /* Enable clock if present */ + priv->clk = devm_clk_get(&pdev->dev, NULL); + if (!IS_ERR(priv->clk)) + clk_prepare_enable(priv->clk); + else + priv->clk = NULL; + /* map io memory */ priv->res = request_mem_region(res->start, resource_size(res), dev_name(&pdev->dev)); @@ -1883,6 +1902,9 @@ static int ftgmac100_remove(struct platform_device *pdev) unregister_netdev(netdev); + if (priv->clk) + clk_disable_unprepare(priv->clk); + /* There's a small chance the reset task will have been re-queued, * during stop, make sure it's gone before we free the structure. */ -- 2.14.1
Re: [PATCH net-next] openvswitch: add ct_clear action
On Fri, Oct 6, 2017 at 9:44 AM, Eric Garver wrote: > This adds a ct_clear action for clearing conntrack state. ct_clear is > currently implemented in OVS userspace, but is not backed by an action > in the kernel datapath. This is useful for flows that may modify a > packet tuple after a ct lookup has already occurred. > > Signed-off-by: Eric Garver Patch mostly looks good. I have following comments. > --- > include/uapi/linux/openvswitch.h | 2 ++ > net/openvswitch/actions.c| 5 + > net/openvswitch/conntrack.c | 12 > net/openvswitch/conntrack.h | 7 +++ > net/openvswitch/flow_netlink.c | 5 + > 5 files changed, 31 insertions(+) > > diff --git a/include/uapi/linux/openvswitch.h > b/include/uapi/linux/openvswitch.h > index 156ee4cab82e..1b6e510e2cc6 100644 > --- a/include/uapi/linux/openvswitch.h > +++ b/include/uapi/linux/openvswitch.h > @@ -806,6 +806,7 @@ struct ovs_action_push_eth { > * packet. > * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the > * packet. > + * @OVS_ACTION_ATTR_CT_CLEAR: Clear conntrack state from the packet. > * > * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not > all > * fields within a header are modifiable, e.g. the IPv4 protocol and fragment > @@ -835,6 +836,7 @@ enum ovs_action_attr { > OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */ > OVS_ACTION_ATTR_PUSH_ETH, /* struct ovs_action_push_eth. */ > OVS_ACTION_ATTR_POP_ETH, /* No argument. */ > + OVS_ACTION_ATTR_CT_CLEAR, /* No argument. */ > > __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted >* from userspace. */ > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index a54a556fcdb5..db9c7f2e662b 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -1203,6 +1203,10 @@ static int do_execute_actions(struct datapath *dp, > struct sk_buff *skb, > return err == -EINPROGRESS ? 0 : err; > break; > > + case OVS_ACTION_ATTR_CT_CLEAR: > + err = ovs_ct_clear(skb, key); > + break; > + > case OVS_ACTION_ATTR_PUSH_ETH: > err = push_eth(skb, key, nla_data(a)); > break; > @@ -1210,6 +1214,7 @@ static int do_execute_actions(struct datapath *dp, > struct sk_buff *skb, > case OVS_ACTION_ATTR_POP_ETH: > err = pop_eth(skb, key); > break; > + > } Unrelated change. > > if (unlikely(err)) { > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > index d558e882ca0c..f9b73c726ad7 100644 > --- a/net/openvswitch/conntrack.c > +++ b/net/openvswitch/conntrack.c > @@ -1129,6 +1129,18 @@ int ovs_ct_execute(struct net *net, struct sk_buff > *skb, > return err; > } > > +int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key) > +{ > + if (skb_nfct(skb)) { > + nf_conntrack_put(skb_nfct(skb)); > + nf_ct_set(skb, NULL, 0); Can the new conntract state be appropriate? may be IP_CT_UNTRACKED? > + } > + > + ovs_ct_fill_key(skb, key); > + I do not see need to refill the key if there is no skb-nf-ct. > + return 0; > +} > + > static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char > *name, > const struct sw_flow_key *key, bool log) > {
[PATCH iproute2] iplink: new option to set neigh suppression on a bridge port
From: Roopa Prabhu neigh suppression can be used to suppress arp and nd flood to bridge ports. It maps to the recently added kernel support for bridge port flag IFLA_BRPORT_NEIGH_SUPPRESS. Signed-off-by: Roopa Prabhu --- bridge/link.c| 13 + ip/iplink_bridge_slave.c | 8 man/man8/bridge.8| 4 3 files changed, 25 insertions(+) diff --git a/bridge/link.c b/bridge/link.c index 93472ad..d3a211e 100644 --- a/bridge/link.c +++ b/bridge/link.c @@ -198,6 +198,9 @@ int print_linkinfo(const struct sockaddr_nl *who, if (prtb[IFLA_BRPORT_MCAST_FLOOD]) print_onoff(fp, "mcast_flood", rta_getattr_u8(prtb[IFLA_BRPORT_MCAST_FLOOD])); + if (prtb[IFLA_BRPORT_NEIGH_SUPPRESS]) + print_onoff(fp, "neigh_suppress", + rta_getattr_u8(prtb[IFLA_BRPORT_NEIGH_SUPPRESS])); } } else print_portstate(fp, rta_getattr_u8(tb[IFLA_PROTINFO])); @@ -266,6 +269,7 @@ static int brlink_modify(int argc, char **argv) .ifm.ifi_family = PF_BRIDGE, }; char *d = NULL; + __s8 neigh_suppress = -1; __s8 learning = -1; __s8 learning_sync = -1; __s8 flood = -1; @@ -355,6 +359,11 @@ static int brlink_modify(int argc, char **argv) flags |= BRIDGE_FLAGS_SELF; } else if (strcmp(*argv, "master") == 0) { flags |= BRIDGE_FLAGS_MASTER; + } else if (strcmp(*argv, "neigh_suppress") == 0) { + NEXT_ARG(); + if (!on_off("neigh_suppress", &neigh_suppress, + *argv)) + return -1; } else { usage(); } @@ -407,6 +416,10 @@ static int brlink_modify(int argc, char **argv) if (state >= 0) addattr8(&req.n, sizeof(req), IFLA_BRPORT_STATE, state); + if (neigh_suppress != -1) + addattr8(&req.n, sizeof(req), IFLA_BRPORT_NEIGH_SUPPRESS, +neigh_suppress); + addattr_nest_end(&req.n, nest); /* IFLA_AF_SPEC nested attribute. Contains IFLA_BRIDGE_FLAGS that diff --git a/ip/iplink_bridge_slave.c b/ip/iplink_bridge_slave.c index 80272b0..fdf8e89 100644 --- a/ip/iplink_bridge_slave.c +++ b/ip/iplink_bridge_slave.c @@ -238,6 +238,10 @@ static void bridge_slave_print_opt(struct link_util *lu, FILE *f, if (tb[IFLA_BRPORT_MCAST_FLOOD]) _print_onoff(f, "mcast_flood", "mcast_flood", rta_getattr_u8(tb[IFLA_BRPORT_MCAST_FLOOD])); + + if (tb[IFLA_BRPORT_NEIGH_SUPPRESS]) + _print_onoff(f, "neigh_suppress", "neigh_suppress", +rta_getattr_u8(tb[IFLA_BRPORT_NEIGH_SUPPRESS])); } static void bridge_slave_parse_on_off(char *arg_name, char *arg_val, @@ -328,6 +332,10 @@ static int bridge_slave_parse_opt(struct link_util *lu, int argc, char **argv, NEXT_ARG(); bridge_slave_parse_on_off("mcast_fast_leave", *argv, n, IFLA_BRPORT_FAST_LEAVE); + } else if (matches(*argv, "neigh_suppress") == 0) { + NEXT_ARG(); + bridge_slave_parse_on_off("neigh_suppress", *argv, n, + IFLA_BRPORT_NEIGH_SUPPRESS); } else if (matches(*argv, "help") == 0) { explain(); return -1; diff --git a/man/man8/bridge.8 b/man/man8/bridge.8 index 9c5f855..fdba0fe 100644 --- a/man/man8/bridge.8 +++ b/man/man8/bridge.8 @@ -323,6 +323,10 @@ switch. Controls whether a given port will be flooded with multicast traffic for which there is no MDB entry. By default this flag is on. .TP +.BR "neigh_suppress on " or " neigh_suppress off " +Controls whether neigh discovery (arp and nd) proxy and suppression is enabled on the port. By default this flag is off. + +.TP .BI self link setting is configured on specified physical device -- 2.1.4
Re: [PATCH 1/3] net/atm: Delete an error message for a failed memory allocation in five functions
On Mon, 2017-10-09 at 22:49 +0200, SF Markus Elfring wrote: > Omit extra messages for a memory allocation failure in these functions. [] > diff --git a/net/atm/mpc.c b/net/atm/mpc.c > @@ -184,10 +184,8 @@ struct atm_mpoa_qos *atm_mpoa_add_qos(__be32 dst_ip, > struct atm_qos *qos) > } > > entry = kmalloc(sizeof(struct atm_mpoa_qos), GFP_KERNEL); > - if (entry == NULL) { > - pr_info("mpoa: out of memory\n"); > + if (!entry) Unspecified change. Make sure your changelog is comprehensive. > return entry; > - } > entry->ipaddr = dst_ip; > entry->qos = *qos; > @@ -473,10 +471,8 @@ static const uint8_t *copy_macs(struct mpoa_client *mpc, > kfree(mpc->mps_macs); > mpc->number_of_mps_macs = 0; > mpc->mps_macs = kmalloc(num_macs * ETH_ALEN, GFP_KERNEL); > - if (mpc->mps_macs == NULL) { > - pr_info("(%s) out of mem\n", mpc->dev->name); > + if (!mpc->mps_macs) > return NULL; etc...
Re: [PATCH net-next 0/2] net: defer cgroups init to accept()
From: Eric Dumazet Date: Sun, 8 Oct 2017 21:47:49 -0700 > On Sun, Oct 8, 2017 at 9:44 PM, Eric Dumazet wrote: >> After TCP 3WHS became lockless, we should not attempt cgroup games >> from sk_clone_lock() since listener/cgroup might be already gone. >> >> Move this business to inet_csk_accept() where we have >> the guarantee both parent and child exist. >> >> Many thanks to John Sperbeck for spotting these issues >> >> Eric Dumazet (2): >> net: memcontrol: defer call to mem_cgroup_sk_alloc() >> net: defer call to cgroup_sk_alloc() > > This was based on net tree, but I used the wrong script, and thus this > has the [PATCH net-next] tag. > > Sorry for the confusion, but I guess this also can be applied to > net-next since this is not a recent regression. Series applied to 'net', thanks.
Re: [PATCHv2] Add a driver for Renesas uPD60620 and uPD60620A PHYs
From: Bernd Edlinger Date: Sun, 8 Oct 2017 13:40:08 + > Signed-off-by: Bernd Edlinger Applied to net-next, thanks.
Re: [PATCH net-next] openvswitch: add ct_clear action
From: Eric Garver Date: Fri, 6 Oct 2017 12:44:26 -0400 > This adds a ct_clear action for clearing conntrack state. ct_clear is > currently implemented in OVS userspace, but is not backed by an action > in the kernel datapath. This is useful for flows that may modify a > packet tuple after a ct lookup has already occurred. > > Signed-off-by: Eric Garver Pravin et al., this needs your review. Thanks.
Re: [PATCH RFC 0/3] tun zerocopy stats
From: Willem de Bruijn Date: Fri, 6 Oct 2017 18:25:13 -0400 > From: Willem de Bruijn > > Add zerocopy transfer statistics to the vhost_net/tun zerocopy path. > > I've been using this to verify recent changes to zerocopy tuning [1]. > Sharing more widely, as it may be useful in similar future work. > > Use ethtool stats as interface, as these are defined per device > driver and can easily be extended. > > Make the zerocopy release callback take an extra hop through the tun > driver to allow the driver to increment its counters. > > Care must be taken to avoid adding an alloc/free to this hot path. > Since the caller already must allocate a ubuf_info, make it allocate > two at a time and grant one to the tun device. > > 1/3: introduce ethtool stats (`ethtool -S $DEV`) for tun devices > 2/3: add zerocopy tx and tx_err counters > 3/3: convert vhost_net to pass a pair of ubuf_info to tun > > [1] http://patchwork.ozlabs.org/patch/822613/ This looks mostly fine to me, but I don't know enough about how vhost and tap interact to tell whether this makes sense to upstream. What are the runtime costs for these new statistics?
Re: [PATCH net-next v2] vhost_net: do not stall on zerocopy depletion
From: Willem de Bruijn Date: Fri, 6 Oct 2017 13:22:31 -0400 > From: Willem de Bruijn > > Vhost-net has a hard limit on the number of zerocopy skbs in flight. > When reached, transmission stalls. Stalls cause latency, as well as > head-of-line blocking of other flows that do not use zerocopy. > > Instead of stalling, revert to copy-based transmission. > > Tested by sending two udp flows from guest to host, one with payload > of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The > large flow is redirected to a netem instance with 1MBps rate limit > and deep 1000 entry queue. > > modprobe ifb > ip link set dev ifb0 up > tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit > > tc qdisc add dev tap0 ingress > tc filter add dev tap0 parent : protocol ip \ > u32 match ip dport 8000 0x \ > action mirred egress redirect dev ifb0 > > Before the delay, both flows process around 80K pps. With the delay, > before this patch, both process around 400. After this patch, the > large flow is still rate limited, while the small reverts to its > original rate. See also discussion in the first link, below. > > Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to > send at 100% zerocopy. > > The limit in vhost_exceeds_maxpend must be carefully chosen. With > vq->num >> 1, the flows remain correlated. This value happens to > correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller > fractions and ensure correctness also for much smaller values of > vq->num, by testing the min() of both explicitly. See also the > discussion in the second link below. > > Changes > v1 -> v2 > - replaced min with typed min_t > - avoid unnecessary whitespace change > > Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3thtxa6bnzkygp3tgvpjbp...@mail.gmail.com > Link:http://lkml.kernel.org/r/20170819064129.27272-1-...@klaipeden.com > Signed-off-by: Willem de Bruijn Applied, thanks Willem.
Re: [PATCHv2 net-next] openvswitch: Add erspan tunnel support.
From: William Tu Date: Wed, 4 Oct 2017 17:03:12 -0700 > Add erspan netlink interface for OVS. > > Signed-off-by: William Tu > Cc: Pravin B Shelar > --- > v1->v2: remove unnecessary compat code. Applied, thanks.
Re: [PATCH net-next] once: switch to new jump label API
From: Eric Biggers Date: Mon, 9 Oct 2017 14:30:52 -0700 > From: Eric Biggers > > Switch the DO_ONCE() macro from the deprecated jump label API to the new > one. The new one is more readable, and for DO_ONCE() it also makes the > generated code more icache-friendly: now the one-time initialization > code is placed out-of-line at the jump target, rather than at the inline > fallthrough case. > > Acked-by: Hannes Frederic Sowa > Signed-off-by: Eric Biggers Applied, thank you.
Re: [PATCH net-next] ipv6: use rcu_dereference_bh() in ipv6_route_seq_next()
From: Wei Wang Date: Mon, 9 Oct 2017 17:17:26 -0700 > From: Wei Wang > > This patch replaces rcu_deference() with rcu_dereference_bh() in > ipv6_route_seq_next() to avoid the following warning: ... > Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in > fib6_table") > Reported-by: Xiaolong Ye > Signed-off-by: Wei Wang > Acked-by: Eric Dumazet Applied, thanks.
Re: [PATCH 1/1] xdp: Sample xdp program implementing ip forward
Sorry for the late reply. I will include the suggested changes in the next revision of the patch. Please see inline for clarifications and questions. Thanks, Christina From: Daniel Borkmann Sent: Tuesday, October 3, 2017 9:24 PM To: Jacob, Christina; netdev@vger.kernel.org Cc: linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; alexei.starovoi...@gmail.com Subject: Re: [PATCH 1/1] xdp: Sample xdp program implementing ip forward >On 10/03/2017 09:37 AM, cjacob wrote: >> Implements port to port forwarding with route table and arp table >> lookup for ipv4 packets using bpf_redirect helper function and >> lpm_trie map. >> >> Signed-off-by: cjacob > >Thanks for the patch, just few minor comments below! > >Note, should be full name, e.g.: > > Signed-off-by: Christina Jacob > >Also you From: only shows 'cjacob' as can be seen from the cover letter >as well, so perhaps check your git settings to make that full name: > > cjacob (1): > xdp: Sample xdp program implementing ip forward > >If there's one single patch, then cover letter is not needed, only >for >1 sets. > >[...] >> +#define KBUILD_MODNAME "foo" >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include "bpf_helpers.h" >> +#include >> +#include >> + >> +struct trie_value { >> + __u8 prefix[4]; >> + long value; >> + int gw; >> + int ifindex; >> + int metric; >> +}; >> + >> +union key_4 { >> + u32 b32[2]; >> + u8 b8[8]; >> +}; >> + >> +struct arp_entry { >> + int dst; >> + long mac; >> +}; >> + >> +struct direct_map { >> + long mac; >> + int ifindex; >> + struct arp_entry arp; >> +}; >> + >> +/* Map for trie implementation*/ >> +struct bpf_map_def SEC("maps") lpm_map = { >> + .type = BPF_MAP_TYPE_LPM_TRIE, >> + .key_size = 8, >> + .value_size = >> + sizeof(struct trie_value), > >(Nit: there are couple of such breaks throughout the patch, can we > just use single line for such cases where reasonable?) > >> + .max_entries = 50, >> + .map_flags = BPF_F_NO_PREALLOC, >> +}; >> + >> +/* Map for counter*/ >> +struct bpf_map_def SEC("maps") rxcnt = { >> + .type = BPF_MAP_TYPE_PERCPU_ARRAY, >> + .key_size = sizeof(u32), >> + .value_size = sizeof(long), >> + .max_entries = 256, >> +}; >> + >> +/* Map for ARP table*/ >> +struct bpf_map_def SEC("maps") arp_table = { >> + .type = BPF_MAP_TYPE_HASH, >> + .key_size = sizeof(int), >> + .value_size = sizeof(long), > >Perhaps these should be proper structs here, such that it >becomes easier to read/handle later on lookup. > I am not clear about this. I am defining a ebpf map. I did not understand what structure you are refering to Am I missing something here?. >> + .max_entries = 50, >> +}; >> + >> +/* Map to keep the exact match entries in the route table*/ >> +struct bpf_map_def SEC("maps") exact_match = { >> + .type = BPF_MAP_TYPE_HASH, >> + .key_size = sizeof(int), >> + .value_size = sizeof(struct direct_map), >> + .max_entries = 50, >> +}; >> + >> +/** >> + * Function to set source and destination mac of the packet >> + */ >> +static inline void set_src_dst_mac(void *data, void *src, void *dst) >> +{ >> + unsigned short *p = data; >> + unsigned short *dest = dst; >> + unsigned short *source = src; >> + >> + p[3] = source[0]; >> + p[4] = source[1]; >> + p[5] = source[2]; >> + p[0] = dest[0]; >> + p[1] = dest[1]; >> + p[2] = dest[2]; > >You could just use __builtin_memcpy() given length is >constant anyway, so LLVM will do the inlining. > >> +} >> + >> +/** >> + * Parse IPV4 packet to get SRC, DST IP and protocol >> + */ >> +static inline int parse_ipv4(void *data, u64 nh_off, void *data_end, >> + unsigned int *src, unsigned int *dest) >> +{ >> + struct iphdr *iph = data + nh_off; >> + >> + if (iph + 1 > data_end) >> + return 0; >> + *src = (unsigned int)iph->saddr; >> + *dest = (unsigned int)iph->daddr; > >Why not stay with __be32 types? > >> + return iph->protocol; >> +} >> + >> +SEC("xdp3") >> +int xdp_prog3(struct xdp_md *ctx) >> +{ >> + void *data_end = (void *)(long)ctx->data_end; >> + void *data = (void *)(long)ctx->data; >> + struct ethhdr *eth = data; >> + int rc = XDP_DROP, forward_to; >> + long *value; >> + struct trie_value *prefix_value; >> + long *dest_mac = NULL, *src_mac = NULL; >> + u16 h_proto; >> + u64 nh_off; >> + u32 ipproto; >> + union key_4 key4; >> + >> + nh_off = sizeof(*eth); >> + if (data + nh_off > data_end) >> + return rc; >> + >> + h_proto = eth->h_proto; >> + >> + if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) { >> + struct vlan_hdr *vhdr; >> + >> + vhdr = data + nh_off; >> + nh_off += sizeof(struct vlan_hdr); >> +
Re: [PATCH net-next RFC 4/9] net: dsa: mv88e6xxx: add support for event capture
On Mon, Oct 09, 2017 at 04:08:50PM -0600, Levi Pearson wrote: > Another issue related to this is that while the free-running counter > in the hardware can't be easily adjusted, the periodic event generator > *can* be finely adjusted (via picosecond and sub-picosecond > accumulators) to correct for drift between the local clock and the PTP > grandmaster time. So to be semantically correct, this needs to be both > started at the right time *and* it needs to have the periodic > corrections made so that the fine correction parameters in the > hardware keep it adjusted to be synchronous with PTP grandmaster time. So if the accumulators are safe to adjust on the fly, then the adjfine() method will have to program them with every adjustment. Thanks, Richard
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 04:23:57PM -0700, Alexei Starovoitov wrote: >On Mon, Oct 09, 2017 at 11:15:40PM +, Levin, Alexander (Sasha Levin) wrote: >> On Mon, Oct 09, 2017 at 04:06:20PM -0700, Alexei Starovoitov wrote: >> >On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) >> >wrote: >> >> On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: >> >> >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: >> >> >> Fix BUG() calls to use BUG_ON(conditional) macros. >> >> >> >> >> >> This was found using make coccicheck M=net/core on linux next >> >> >> tag next-2017092 >> >> >> >> >> >> Signed-off-by: Tim Hansen >> >> >> --- >> >> >> net/core/skbuff.c | 15 ++- >> >> >> 1 file changed, 6 insertions(+), 9 deletions(-) >> >> >> >> >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> >> >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 >> >> >> --- a/net/core/skbuff.c >> >> >> +++ b/net/core/skbuff.c >> >> >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff >> >> >> *skb, gfp_t gfp_mask) >> >> >>/* Set the tail pointer and length */ >> >> >>skb_put(n, skb->len); >> >> >> >> >> >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + >> >> >> skb->len)) >> >> >> - BUG(); >> >> >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + >> >> >> skb->len)); >> >> > >> >> >I'm concerned with this change. >> >> >1. Calling non-trivial bit of code inside the macro is a poor coding >> >> >style (imo) >> >> >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and >> >> >implementation >> >> >of BUG and BUG_ON look quite different. >> >> >> >> For these archs, wouldn't it then be more efficient to use BUG_ON rather >> >> than BUG()? >> > >> >why more efficient? any data to prove that? >> >> Just guessing. >> >> Either way, is there a particular reason for not using BUG_ON() here >> besides that it's implementation is "quite different"? >> >> >I'm pointing that the change is not equivalent and >> >this code has been around forever (pre-git days), so I see >> >no reason to risk changing it. >> >> Do you know that BUG_ON() is broken on any archs? >> >> If not, "this code has been around forever" is really not an excuse to >> not touch code. >> >> If BUG_ON() behavior is broken somewhere, then it needs to get fixed. > >no idea whether it's broken. My main objection is #1. >imo it's a very poor coding style to put functions with >side-effects into macros. Especially debug/bug/warn-like. This, however, seems to be an accepted coding style in the net/ subsys. Look a few lines lower in the very same file to find: BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len)); Side effects ahoy ;) >For example llvm has DEBUG() macro and everything inside >will disappear depending on compilation flags. >I wouldn't be surprised if somebody for the name >of security (to avoid crash on BUG_ON) will replace >BUG/BUG_ON with some other implementation or nop >and will have real bugs, since skb_copy_bits() is somehow >not called or called in different context. This was already discussed, with the conclusion that BUG() can never be disabled, since, as you described, it'll lead to very catastrophic results. See i.e.: commit b06dd879f5db33c1d7f5ab516ea671627f99c0c9 Author: Josh Triplett Date: Mon Apr 7 15:39:14 2014 -0700 x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUG Anyway, as you said, this boils down to coding style nitpicking. I guess that my only comment here would be that it shpid go one way or the other and we document the decision somewhere (either per subsys, or for the entire tree). -- Thanks, Sasha
Re: [net-next 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-09
From: Jeff Kirsher Date: Mon, 9 Oct 2017 15:38:27 -0700 > This series contains updates to i40e and i40evf only. Pulled, thanks Jeff.
Re: [PATCH] timer: Remove meaningless .data/.function assignments
From: Kees Cook Date: Mon, 9 Oct 2017 17:10:32 -0700 > Several timer users needlessly reset their .function/.data fields during > their timer callback, but nothing else changes them. Some users do not > use their .data field at all. Each instance is removed here. > > Cc: Krzysztof Halasa > Cc: Aditya Shankar > Cc: Ganesh Krishna > Cc: Greg Kroah-Hartman > Cc: Jens Axboe > Cc: netdev@vger.kernel.org > Cc: linux-wirel...@vger.kernel.org > Cc: de...@driverdev.osuosl.org > Signed-off-by: Kees Cook > Acked-by: Greg Kroah-Hartman # for staging > Acked-by: Krzysztof Halasa # for wan/hdlc* > Acked-by: Jens Axboe # for amiflop > --- > This should go via the timer/core tree, please. It's been acked by each > of the maintainers. Thanks! For networking bits: Acked-by: David S. Miller
[PATCH RFC tip/core/rcu 03/15] drivers/net/ethernet/qlogic/qed: Fix __qed_spq_block() ordering
The __qed_spq_block() function expects an smp_read_barrier_depends() to order a prior READ_ONCE() against a later load that does not depend on the prior READ_ONCE(), an expectation that can fail to be met. This commit therefore replaces the READ_ONCE() with smp_load_acquire() and removes the smp_read_barrier_depends(). Signed-off-by: Paul E. McKenney Cc: Ariel Elior Cc: Cc: --- drivers/net/ethernet/qlogic/qed/qed_spq.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c index be48d9abd001..c1237ec58b6c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_spq.c +++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c @@ -97,9 +97,7 @@ static int __qed_spq_block(struct qed_hwfn *p_hwfn, while (iter_cnt--) { /* Validate we receive completion update */ - if (READ_ONCE(comp_done->done) == 1) { - /* Read updated FW return value */ - smp_read_barrier_depends(); + if (smp_load_acquire(&comp_done->done) == 1) { /* ^^^ */ if (p_fw_ret) *p_fw_ret = comp_done->fw_return_code; return 0; -- 2.5.2
[PATCH RFC tip/core/rcu 14/15] netfilter: Remove now-redundant smp_read_barrier_depends()
READ_ONCE() now implies smp_read_barrier_depends(), which means that the instances in arpt_do_table(), ipt_do_table(), and ip6t_do_table() are now redundant. This commit removes them and adjusts the comments. Signed-off-by: Paul E. McKenney Cc: Pablo Neira Ayuso Cc: Jozsef Kadlecsik Cc: Florian Westphal Cc: "David S. Miller" Cc: Cc: Cc: --- net/ipv4/netfilter/arp_tables.c | 7 +-- net/ipv4/netfilter/ip_tables.c | 7 +-- net/ipv6/netfilter/ip6_tables.c | 7 +-- 3 files changed, 3 insertions(+), 18 deletions(-) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 9e2770fd00be..d555b3b31c49 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -202,13 +202,8 @@ unsigned int arpt_do_table(struct sk_buff *skb, local_bh_disable(); addend = xt_write_recseq_begin(); - private = table->private; + private = READ_ONCE(table->private); /* Address dependency. */ cpu = smp_processor_id(); - /* -* Ensure we load private-> members after we've fetched the base -* pointer. -*/ - smp_read_barrier_depends(); table_base = private->entries; jumpstack = (struct arpt_entry **)private->jumpstack[cpu]; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 39286e543ee6..f63752bec442 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -260,13 +260,8 @@ ipt_do_table(struct sk_buff *skb, WARN_ON(!(table->valid_hooks & (1 << hook))); local_bh_disable(); addend = xt_write_recseq_begin(); - private = table->private; + private = READ_ONCE(table->private); /* Address dependency. */ cpu= smp_processor_id(); - /* -* Ensure we load private-> members after we've fetched the base -* pointer. -*/ - smp_read_barrier_depends(); table_base = private->entries; jumpstack = (struct ipt_entry **)private->jumpstack[cpu]; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 01bd3ee5ebc6..52afcab9b0d6 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -282,12 +282,7 @@ ip6t_do_table(struct sk_buff *skb, local_bh_disable(); addend = xt_write_recseq_begin(); - private = table->private; - /* -* Ensure we load private-> members after we've fetched the base -* pointer. -*/ - smp_read_barrier_depends(); + private = READ_ONCE(table->private); /* Address dependency. */ cpu= smp_processor_id(); table_base = private->entries; jumpstack = (struct ip6t_entry **)private->jumpstack[cpu]; -- 2.5.2
[PATCH net-next] ipv6: use rcu_dereference_bh() in ipv6_route_seq_next()
From: Wei Wang This patch replaces rcu_deference() with rcu_dereference_bh() in ipv6_route_seq_next() to avoid the following warning: [ 19.431685] WARNING: suspicious RCU usage [ 19.433451] 4.14.0-rc3-00914-g66f5d6c #118 Not tainted [ 19.435509] - [ 19.437267] net/ipv6/ip6_fib.c:2259 suspicious rcu_dereference_check() usage! [ 19.440790] [ 19.440790] other info that might help us debug this: [ 19.440790] [ 19.444734] [ 19.444734] rcu_scheduler_active = 2, debug_locks = 1 [ 19.447757] 2 locks held by odhcpd/3720: [ 19.449480] #0: (&p->lock){+.+.}, at: [] seq_read+0x3c/0x333 [ 19.452720] #1: (rcu_read_lock_bh){}, at: [] ipv6_route_seq_start+0x5/0xfd [ 19.456323] [ 19.456323] stack backtrace: [ 19.458812] CPU: 0 PID: 3720 Comm: odhcpd Not tainted 4.14.0-rc3-00914-g66f5d6c #118 [ 19.462042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 19.465414] Call Trace: [ 19.466788] dump_stack+0x86/0xc0 [ 19.468358] lockdep_rcu_suspicious+0xea/0xf3 [ 19.470183] ipv6_route_seq_next+0x71/0x164 [ 19.471963] seq_read+0x244/0x333 [ 19.473522] proc_reg_read+0x48/0x67 [ 19.475152] ? proc_reg_write+0x67/0x67 [ 19.476862] __vfs_read+0x26/0x10b [ 19.478463] ? __might_fault+0x37/0x84 [ 19.480148] vfs_read+0xba/0x146 [ 19.481690] SyS_read+0x51/0x8e [ 19.483197] do_int80_syscall_32+0x66/0x15a [ 19.484969] entry_INT80_compat+0x32/0x50 [ 19.486707] RIP: 0023:0xf7f0be8e [ 19.488244] RSP: 002b:ffa75d04 EFLAGS: 0246 ORIG_RAX: 0003 [ 19.491431] RAX: ffda RBX: 0009 RCX: 08056068 [ 19.493886] RDX: 1000 RSI: 08056008 RDI: 1000 [ 19.496331] RBP: 01ff R08: R09: [ 19.498768] R10: R11: R12: [ 19.501217] R13: R14: R15: Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Reported-by: Xiaolong Ye Signed-off-by: Wei Wang Acked-by: Eric Dumazet --- net/ipv6/ip6_fib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 52a29ba32928..c2ecd5ec638a 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -2262,7 +2262,7 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos) if (!v) goto iter_table; - n = rcu_dereference(((struct rt6_info *)v)->dst.rt6_next); + n = rcu_dereference_bh(((struct rt6_info *)v)->dst.rt6_next); if (n) { ++*pos; return n; -- 2.14.2.920.gcf0c67979c-goog
[PATCH] timer: Remove meaningless .data/.function assignments
Several timer users needlessly reset their .function/.data fields during their timer callback, but nothing else changes them. Some users do not use their .data field at all. Each instance is removed here. Cc: Krzysztof Halasa Cc: Aditya Shankar Cc: Ganesh Krishna Cc: Greg Kroah-Hartman Cc: Jens Axboe Cc: netdev@vger.kernel.org Cc: linux-wirel...@vger.kernel.org Cc: de...@driverdev.osuosl.org Signed-off-by: Kees Cook Acked-by: Greg Kroah-Hartman # for staging Acked-by: Krzysztof Halasa # for wan/hdlc* Acked-by: Jens Axboe # for amiflop --- This should go via the timer/core tree, please. It's been acked by each of the maintainers. Thanks! --- drivers/block/amiflop.c | 3 +-- drivers/net/wan/hdlc_cisco.c | 2 -- drivers/net/wan/hdlc_fr.c | 2 -- drivers/staging/wilc1000/wilc_wfi_cfgoperations.c | 4 +--- 4 files changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/block/amiflop.c b/drivers/block/amiflop.c index 49908c74bfcb..4e3fb9f104af 100644 --- a/drivers/block/amiflop.c +++ b/drivers/block/amiflop.c @@ -323,7 +323,7 @@ static void fd_deselect (int drive) } -static void motor_on_callback(unsigned long nr) +static void motor_on_callback(unsigned long ignored) { if (!(ciaa.pra & DSKRDY) || --on_attempts == 0) { complete_all(&motor_on_completion); @@ -344,7 +344,6 @@ static int fd_motor_on(int nr) fd_select(nr); reinit_completion(&motor_on_completion); - motor_on_timer.data = nr; mod_timer(&motor_on_timer, jiffies + HZ/2); on_attempts = 10; diff --git a/drivers/net/wan/hdlc_cisco.c b/drivers/net/wan/hdlc_cisco.c index a408abc25512..f4b0ab34f048 100644 --- a/drivers/net/wan/hdlc_cisco.c +++ b/drivers/net/wan/hdlc_cisco.c @@ -276,8 +276,6 @@ static void cisco_timer(unsigned long arg) spin_unlock(&st->lock); st->timer.expires = jiffies + st->settings.interval * HZ; - st->timer.function = cisco_timer; - st->timer.data = arg; add_timer(&st->timer); } diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c index 78596e42a3f3..07f265fa2826 100644 --- a/drivers/net/wan/hdlc_fr.c +++ b/drivers/net/wan/hdlc_fr.c @@ -644,8 +644,6 @@ static void fr_timer(unsigned long arg) state(hdlc)->settings.t391 * HZ; } - state(hdlc)->timer.function = fr_timer; - state(hdlc)->timer.data = arg; add_timer(&state(hdlc)->timer); } diff --git a/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c b/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c index ac5aaafa461c..60f088babf27 100644 --- a/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c +++ b/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c @@ -266,7 +266,7 @@ static void update_scan_time(void) last_scanned_shadow[i].time_scan = jiffies; } -static void remove_network_from_shadow(unsigned long arg) +static void remove_network_from_shadow(unsigned long unused) { unsigned long now = jiffies; int i, j; @@ -287,7 +287,6 @@ static void remove_network_from_shadow(unsigned long arg) } if (last_scanned_cnt != 0) { - hAgingTimer.data = arg; mod_timer(&hAgingTimer, jiffies + msecs_to_jiffies(AGING_TIME)); } } @@ -304,7 +303,6 @@ static int is_network_in_shadow(struct network_info *pstrNetworkInfo, int i; if (last_scanned_cnt == 0) { - hAgingTimer.data = (unsigned long)user_void; mod_timer(&hAgingTimer, jiffies + msecs_to_jiffies(AGING_TIME)); state = -1; } else { -- 2.7.4 -- Kees Cook Pixel Security
ATENCIÓN
ATENCIÓN; Su buzón ha superado el límite de almacenamiento, que es de 5 GB definidos por el administrador, quien actualmente está ejecutando en 10.9GB, no puede ser capaz de enviar o recibir correo nuevo hasta que vuelva a validar su buzón de correo electrónico. Para revalidar su buzón de correo, envíe la siguiente información a continuación: nombre: Nombre de usuario: contraseña: Confirmar contraseña: E-mail: teléfono: Si usted no puede revalidar su buzón, el buzón se deshabilitará! Disculpa las molestias. Código de verificación: es: 006524 Correo Soporte Técnico © 2017 ¡gracias Sistemas administrador (null)
Re: [net-next 00/10][pull request] 10GbE Intel Wired LAN Driver Updates 2017-10-09
From: Jeff Kirsher Date: Mon, 9 Oct 2017 11:39:50 -0700 > This series contains updates to ixgbe only. Pulled, thanks Jeff.
Re: [PATCH net-next v2 1/5] bpf: Add file mode configuration into bpf maps
On Mon, Oct 9, 2017 at 4:07 PM, Alexei Starovoitov wrote: > On Mon, Oct 09, 2017 at 03:20:24PM -0700, Chenbo Feng wrote: >> From: Chenbo Feng >> >> Introduce the map read/write flags to the eBPF syscalls that returns the >> map fd. The flags is used to set up the file mode when construct a new >> file descriptor for bpf maps. To not break the backward capability, the >> f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise >> it should be O_RDONLY or O_WRONLY. When the userspace want to modify or >> read the map content, it will check the file mode to see if it is >> allowed to make the change. >> >> Signed-off-by: Chenbo Feng >> Acked-by: Alexei Starovoitov >> --- >> include/linux/bpf.h | 6 ++-- >> include/uapi/linux/bpf.h | 6 >> kernel/bpf/arraymap.c| 7 +++-- >> kernel/bpf/devmap.c | 5 ++- >> kernel/bpf/hashtab.c | 5 +-- >> kernel/bpf/inode.c | 15 ++--- >> kernel/bpf/lpm_trie.c| 3 +- >> kernel/bpf/sockmap.c | 5 ++- >> kernel/bpf/stackmap.c| 5 ++- >> kernel/bpf/syscall.c | 80 >> +++- >> 10 files changed, 114 insertions(+), 23 deletions(-) >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index bc7da2ddfcaf..0e9ca2555d7f 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -308,11 +308,11 @@ void bpf_map_area_free(void *base); >> >> extern int sysctl_unprivileged_bpf_disabled; >> >> -int bpf_map_new_fd(struct bpf_map *map); >> +int bpf_map_new_fd(struct bpf_map *map, int flags); >> int bpf_prog_new_fd(struct bpf_prog *prog); >> >> int bpf_obj_pin_user(u32 ufd, const char __user *pathname); >> -int bpf_obj_get_user(const char __user *pathname); >> +int bpf_obj_get_user(const char __user *pathname, int flags); >> >> int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value); >> int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value); >> @@ -331,6 +331,8 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, >> struct file *map_file, >> void *key, void *value, u64 map_flags); >> int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value); >> >> +int bpf_get_file_flag(int flags); >> + >> /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and >> * forced to use 'long' read/writes to try to atomically copy long counters. >> * Best-effort only. No barriers here, since it _will_ race with concurrent >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 6db9e1d679cd..9cb50a228c39 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -217,6 +217,10 @@ enum bpf_attach_type { >> >> #define BPF_OBJ_NAME_LEN 16U >> >> +/* Flags for accessing BPF object */ >> +#define BPF_F_RDONLY (1U << 3) >> +#define BPF_F_WRONLY (1U << 4) >> + >> union bpf_attr { >> struct { /* anonymous struct used by BPF_MAP_CREATE command */ >> __u32 map_type; /* one of enum bpf_map_type */ >> @@ -259,6 +263,7 @@ union bpf_attr { >> struct { /* anonymous struct used by BPF_OBJ_* commands */ >> __aligned_u64 pathname; >> __u32 bpf_fd; >> + __u32 file_flags; >> }; >> >> struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ >> @@ -286,6 +291,7 @@ union bpf_attr { >> __u32 map_id; >> }; >> __u32 next_id; >> + __u32 open_flags; >> }; >> >> struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ >> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c >> index 68d866628be0..f869e48ef2f6 100644 >> --- a/kernel/bpf/arraymap.c >> +++ b/kernel/bpf/arraymap.c >> @@ -19,6 +19,9 @@ >> >> #include "map_in_map.h" >> >> +#define ARRAY_CREATE_FLAG_MASK \ >> + (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) >> + >> static void bpf_array_free_percpu(struct bpf_array *array) >> { >> int i; >> @@ -56,8 +59,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr >> *attr) >> >> /* check sanity of attributes */ >> if (attr->max_entries == 0 || attr->key_size != 4 || >> - attr->value_size == 0 || attr->map_flags & ~BPF_F_NUMA_NODE || >> - (percpu && numa_node != NUMA_NO_NODE)) >> + attr->value_size == 0 || attr->map_flags & >> + ~ARRAY_CREATE_FLAG_MASK || (percpu && numa_node != NUMA_NO_NODE)) > > that's very non-standard way of breaking lines. > Did you run checkpatch ? did it complain? > Will fix in next revision, checkpatch didn't say anything about this0 error and 0 warning for this patch series.
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 11:15:40PM +, Levin, Alexander (Sasha Levin) wrote: > On Mon, Oct 09, 2017 at 04:06:20PM -0700, Alexei Starovoitov wrote: > >On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) > >wrote: > >> On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: > >> >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: > >> >> Fix BUG() calls to use BUG_ON(conditional) macros. > >> >> > >> >> This was found using make coccicheck M=net/core on linux next > >> >> tag next-2017092 > >> >> > >> >> Signed-off-by: Tim Hansen > >> >> --- > >> >> net/core/skbuff.c | 15 ++- > >> >> 1 file changed, 6 insertions(+), 9 deletions(-) > >> >> > >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >> >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 > >> >> --- a/net/core/skbuff.c > >> >> +++ b/net/core/skbuff.c > >> >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff > >> >> *skb, gfp_t gfp_mask) > >> >> /* Set the tail pointer and length */ > >> >> skb_put(n, skb->len); > >> >> > >> >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + > >> >> skb->len)) > >> >> - BUG(); > >> >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + > >> >> skb->len)); > >> > > >> >I'm concerned with this change. > >> >1. Calling non-trivial bit of code inside the macro is a poor coding > >> >style (imo) > >> >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and > >> >implementation > >> >of BUG and BUG_ON look quite different. > >> > >> For these archs, wouldn't it then be more efficient to use BUG_ON rather > >> than BUG()? > > > >why more efficient? any data to prove that? > > Just guessing. > > Either way, is there a particular reason for not using BUG_ON() here > besides that it's implementation is "quite different"? > > >I'm pointing that the change is not equivalent and > >this code has been around forever (pre-git days), so I see > >no reason to risk changing it. > > Do you know that BUG_ON() is broken on any archs? > > If not, "this code has been around forever" is really not an excuse to > not touch code. > > If BUG_ON() behavior is broken somewhere, then it needs to get fixed. no idea whether it's broken. My main objection is #1. imo it's a very poor coding style to put functions with side-effects into macros. Especially debug/bug/warn-like. For example llvm has DEBUG() macro and everything inside will disappear depending on compilation flags. I wouldn't be surprised if somebody for the name of security (to avoid crash on BUG_ON) will replace BUG/BUG_ON with some other implementation or nop and will have real bugs, since skb_copy_bits() is somehow not called or called in different context.
Re: linux-next: manual merge of the drivers-x86 tree with the net-next tree
On Mon, Oct 09, 2017 at 08:56:34PM +0100, Mark Brown wrote: > On Mon, Oct 09, 2017 at 10:43:01PM +0300, Mika Westerberg wrote: > > > If possible, I would rather move this chapter to be before "Networking > > over Thunderbolt cable". Reason is that it then follows NVM flashing > > chapter which is typically where you need to force power in the first > > place. > Agreed. > I guess that's something best sorted out either in the relevant trees or > during the merge window? I'm not sure how we would deal with it in the trees. Best to note this during the merge window - whichever goes in second. Test merge will identify the merge conflict, and we can include a note to Linus on the preference. -- Darren Hart VMware Open Source Technology Center
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
From: Alexei Starovoitov Date: Mon, 9 Oct 2017 16:06:20 -0700 >> For these archs, wouldn't it then be more efficient to use BUG_ON >> rather than BUG()? > > why more efficient? any data to prove that? It can completely eliminate a branch. For example on powerpc if you use BUG() then the code generated is: test condition branch_not_true 1f unconditional_trap 1: Whereas with BUG_ON() it's just: test condition trap_if_true Which is a lot better even when the branches in the first case are well predicted.
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 04:06:20PM -0700, Alexei Starovoitov wrote: >On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) wrote: >> On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: >> >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: >> >> Fix BUG() calls to use BUG_ON(conditional) macros. >> >> >> >> This was found using make coccicheck M=net/core on linux next >> >> tag next-2017092 >> >> >> >> Signed-off-by: Tim Hansen >> >> --- >> >> net/core/skbuff.c | 15 ++- >> >> 1 file changed, 6 insertions(+), 9 deletions(-) >> >> >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 >> >> --- a/net/core/skbuff.c >> >> +++ b/net/core/skbuff.c >> >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, >> >> gfp_t gfp_mask) >> >> /* Set the tail pointer and length */ >> >> skb_put(n, skb->len); >> >> >> >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)) >> >> - BUG(); >> >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)); >> > >> >I'm concerned with this change. >> >1. Calling non-trivial bit of code inside the macro is a poor coding style >> >(imo) >> >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and >> >implementation >> >of BUG and BUG_ON look quite different. >> >> For these archs, wouldn't it then be more efficient to use BUG_ON rather >> than BUG()? > >why more efficient? any data to prove that? Just guessing. Either way, is there a particular reason for not using BUG_ON() here besides that it's implementation is "quite different"? >I'm pointing that the change is not equivalent and >this code has been around forever (pre-git days), so I see >no reason to risk changing it. Do you know that BUG_ON() is broken on any archs? If not, "this code has been around forever" is really not an excuse to not touch code. If BUG_ON() behavior is broken somewhere, then it needs to get fixed. -- Thanks, Sasha
[GIT] Networking
1) Fix object leak on IPSEC offload failure, from Steffen Klassert. 2) Fix range checks in ipset address range addition operations, from Jozsef Kadlecsik. 3) Fix pernet ops unregistration order in ipset, from Florian Westphal. 4) Add missing netlink attribute policy for nl80211 packet pattern attrs, from Peng Xu. 5) Fix PPP device destruction race, from Guillaume Nault. 6) Write marks get lost when BPF verifier processes R1=R2 register assignments, causing incorrect liveness information and less state pruning. Fix from Alexei Starovoitov. 7) Fix blockhole routes so that they are marked dead and therefore not cached in sockets, otherwise IPSEC stops working. From Steffen Klassert. 8) Fix broadcast handling of UDP socket early demux, from Paolo Abeni. Please pull, thanks a lot! The following changes since commit 7a92616c0bac849e790283723b36c399668a1d9f: Merge tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2017-10-05 15:51:37 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to fdfbad3256918fc5736d68384331d2dbf45ccbd6: cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan (2017-10-09 16:03:32 -0700) Aleksander Morgado (1): cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan Alexei Starovoitov (1): bpf: fix liveness marking Alexey Kodanev (2): vti: fix NULL dereference in xfrm_input() gso: fix payload length when gso_size is zero Artem Savkov (2): xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock netfilter: ebtables: fix race condition in frame_filter_net_init() Arvind Yadav (1): netfilter: nf_tables: Release memory obtained by kasprintf Axel Beckert (1): doc: Fix typo "8023.ad" in bonding documentation Dan Carpenter (1): selftests/net: rxtimestamp: Fix an off by one David S. Miller (4): Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec Merge tag 'mac80211-for-davem-2017-10-09' of git://git.kernel.org/.../jberg/mac80211 Merge branch '10GbE' of git://git.kernel.org/.../jkirsher/net-queue Merge git://git.kernel.org/.../pablo/nf Ding Tianhong (2): Revert commit 1a8b6d76dc5b ("net:add one common config...") net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Eric Dumazet (1): netfilter: x_tables: avoid stack-out-of-bounds read in xt_copy_counters_from_user Florian Westphal (1): netfilter: ipset: pernet ops must be unregistered last Guillaume Nault (1): ppp: fix race in ppp device destruction Gustavo A. R. Silva (1): net: thunderx: mark expected switch fall-throughs in nicvf_main() Ido Schimmel (1): mlxsw: spectrum_router: Avoid expensive lookup during route removal Jason A. Donenfeld (1): netlink: do not set cb_running if dump's start() errs JingPiao Chen (1): netfilter: nf_tables: fix update chain error John Fastabend (1): ixgbe: incorrect XDP ring accounting in ethtool tx_frame param Jon Maloy (2): tipc: correct initialization of skb list tipc: Unclone message at secondary destination lookup Jozsef Kadlecsik (1): netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses Lin Zhang (1): netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook Mark D Rustad (1): ixgbe: Return error when getting PHY address if PHY access is not supported Matteo Croce (1): ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real Pablo Neira Ayuso (1): netfilter: nf_tables: do not dump chain counters if not enabled Paolo Abeni (1): udp: fix bcast packet reception Peng Xu (1): nl80211: Define policy for packet pattern attributes Ross Lagerwall (1): netfilter: ipset: Fix race between dump and swap Sabrina Dubroca (1): ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register Shmulik Ladkani (1): netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Steffen Klassert (4): xfrm: Fix deletion of offloaded SAs on failure. xfrm: Fix negative device refcount on offload failure. ipv6: Fix traffic triggered IPsec connections. ipv4: Fix traffic triggered IPsec connections. Subash Abhinov Kasiviswanathan (1): netfilter: xt_socket: Restore mark from full sockets only Vadim Fedorenko (1): netfilter: ipvs: full-functionality option for ECN encapsulation in tunnel Documentation/networking/bonding.txt | 2 +- arch/Kconfig | 3 --- arch/sparc/Kconfig| 1 - drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 ++ drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c| 22 -- drivers/net/et
Re: [PATCH net-next v2 1/5] bpf: Add file mode configuration into bpf maps
On Mon, Oct 09, 2017 at 03:20:24PM -0700, Chenbo Feng wrote: > From: Chenbo Feng > > Introduce the map read/write flags to the eBPF syscalls that returns the > map fd. The flags is used to set up the file mode when construct a new > file descriptor for bpf maps. To not break the backward capability, the > f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise > it should be O_RDONLY or O_WRONLY. When the userspace want to modify or > read the map content, it will check the file mode to see if it is > allowed to make the change. > > Signed-off-by: Chenbo Feng > Acked-by: Alexei Starovoitov > --- > include/linux/bpf.h | 6 ++-- > include/uapi/linux/bpf.h | 6 > kernel/bpf/arraymap.c| 7 +++-- > kernel/bpf/devmap.c | 5 ++- > kernel/bpf/hashtab.c | 5 +-- > kernel/bpf/inode.c | 15 ++--- > kernel/bpf/lpm_trie.c| 3 +- > kernel/bpf/sockmap.c | 5 ++- > kernel/bpf/stackmap.c| 5 ++- > kernel/bpf/syscall.c | 80 > +++- > 10 files changed, 114 insertions(+), 23 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index bc7da2ddfcaf..0e9ca2555d7f 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -308,11 +308,11 @@ void bpf_map_area_free(void *base); > > extern int sysctl_unprivileged_bpf_disabled; > > -int bpf_map_new_fd(struct bpf_map *map); > +int bpf_map_new_fd(struct bpf_map *map, int flags); > int bpf_prog_new_fd(struct bpf_prog *prog); > > int bpf_obj_pin_user(u32 ufd, const char __user *pathname); > -int bpf_obj_get_user(const char __user *pathname); > +int bpf_obj_get_user(const char __user *pathname, int flags); > > int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value); > int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value); > @@ -331,6 +331,8 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, > struct file *map_file, > void *key, void *value, u64 map_flags); > int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value); > > +int bpf_get_file_flag(int flags); > + > /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and > * forced to use 'long' read/writes to try to atomically copy long counters. > * Best-effort only. No barriers here, since it _will_ race with concurrent > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 6db9e1d679cd..9cb50a228c39 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -217,6 +217,10 @@ enum bpf_attach_type { > > #define BPF_OBJ_NAME_LEN 16U > > +/* Flags for accessing BPF object */ > +#define BPF_F_RDONLY (1U << 3) > +#define BPF_F_WRONLY (1U << 4) > + > union bpf_attr { > struct { /* anonymous struct used by BPF_MAP_CREATE command */ > __u32 map_type; /* one of enum bpf_map_type */ > @@ -259,6 +263,7 @@ union bpf_attr { > struct { /* anonymous struct used by BPF_OBJ_* commands */ > __aligned_u64 pathname; > __u32 bpf_fd; > + __u32 file_flags; > }; > > struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ > @@ -286,6 +291,7 @@ union bpf_attr { > __u32 map_id; > }; > __u32 next_id; > + __u32 open_flags; > }; > > struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > index 68d866628be0..f869e48ef2f6 100644 > --- a/kernel/bpf/arraymap.c > +++ b/kernel/bpf/arraymap.c > @@ -19,6 +19,9 @@ > > #include "map_in_map.h" > > +#define ARRAY_CREATE_FLAG_MASK \ > + (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) > + > static void bpf_array_free_percpu(struct bpf_array *array) > { > int i; > @@ -56,8 +59,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr) > > /* check sanity of attributes */ > if (attr->max_entries == 0 || attr->key_size != 4 || > - attr->value_size == 0 || attr->map_flags & ~BPF_F_NUMA_NODE || > - (percpu && numa_node != NUMA_NO_NODE)) > + attr->value_size == 0 || attr->map_flags & > + ~ARRAY_CREATE_FLAG_MASK || (percpu && numa_node != NUMA_NO_NODE)) that's very non-standard way of breaking lines. Did you run checkpatch ? did it complain?
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) wrote: > On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: > >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: > >> Fix BUG() calls to use BUG_ON(conditional) macros. > >> > >> This was found using make coccicheck M=net/core on linux next > >> tag next-2017092 > >> > >> Signed-off-by: Tim Hansen > >> --- > >> net/core/skbuff.c | 15 ++- > >> 1 file changed, 6 insertions(+), 9 deletions(-) > >> > >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 > >> --- a/net/core/skbuff.c > >> +++ b/net/core/skbuff.c > >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, > >> gfp_t gfp_mask) > >>/* Set the tail pointer and length */ > >>skb_put(n, skb->len); > >> > >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)) > >> - BUG(); > >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)); > > > >I'm concerned with this change. > >1. Calling non-trivial bit of code inside the macro is a poor coding style > >(imo) > >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and > >implementation > >of BUG and BUG_ON look quite different. > > For these archs, wouldn't it then be more efficient to use BUG_ON rather than > BUG()? why more efficient? any data to prove that? I'm pointing that the change is not equivalent and this code has been around forever (pre-git days), so I see no reason to risk changing it.
Re: linux-next: manual merge of the cgroup tree with the net-next tree
On Mon, Oct 09, 2017 at 07:38:36PM +0100, Mark Brown wrote: > Hi Tejun, > > Today's linux-next merge of the cgroup tree got a conflict in: > > kernel/cgroup/cgroup.c > > between commit: > > 324bda9e6c5ad ("bpf: multi program support for cgroup+bpf") > > from the net-next tree and commit: > > 041cd640b2f3c ("cgroup: Implement cgroup2 basic CPU usage accounting") > > from the cgroup tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > > diff --cc kernel/cgroup/cgroup.c > index 00f5b358aeac,c3421ee0d230.. > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@@ -4765,8 -4785,9 +4788,11 @@@ static struct cgroup *cgroup_create(str > > return cgrp; > > +out_idr_free: > +cgroup_idr_remove(&root->cgroup_idr, cgrp->id); > + out_stat_exit: > + if (cgroup_on_dfl(parent)) > + cgroup_stat_exit(cgrp); thanks. I did the same merge conflict resolution for our combined tree.
Re: [PATCH] cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan
From: Aleksander Morgado Date: Mon, 9 Oct 2017 14:05:12 +0200 > The u-blox TOBY-L2 is a LTE Cat 4 module with HSPA+ and 2G fallback. > This module allows switching to different USB profiles with the > 'AT+UUSBCONF' command, and provides a ECM network interface when the > 'AT+UUSBCONF=2' profile is selected. > > The u-blox SARA-U2 is a HSPA module with 2G fallback. The default USB > configuration includes a ECM network interface. > > Both these modules are controlled via AT commands through one of the > TTYs exposed. Connecting these modules may be done just by activating > the desired PDP context with 'AT+CGACT=1,' and then running DHCP > on the ECM interface. > > Signed-off-by: Aleksander Morgado Applied, thank you.
[net-next 02/14] i40e: use the safe hash table iterator when deleting mac filters
From: Lihong Yang This patch replaces hash_for_each function with hash_for_each_safe when calling __i40e_del_filter. The hash_for_each_safe function is the right one to use when iterating over a hash table to safely remove a hash entry. Otherwise, incorrect values may be read from freed memory. Detected by CoverityScan, CID 1402048 Read from pointer after free Signed-off-by: Lihong Yang Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 04568137e029..c062d74d21f3 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -2883,6 +2883,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) struct i40e_mac_filter *f; struct i40e_vf *vf; int ret = 0; + struct hlist_node *h; int bkt; /* validate the request */ @@ -2921,7 +2922,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) /* Delete all the filters for this VSI - we're going to kill it * anyway. */ - hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) + hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) __i40e_del_filter(vsi, f); spin_unlock_bh(&vsi->mac_filter_hash_lock); -- 2.14.2
[net-next 01/14] i40e: fix flags declaration
From: Jacob Keller Since we don't yet have more than 32 flags, we'll use a u32 for both the hw_features and flag field. Should we gain more flags in the future, we may need to convert to a u64 or separate flags out into two fields. This was overlooked in the previous commit 2781de2134c4 ("i40e/i40evf: organize and re-number feature flags"), where the feature flag was not converted form u64 to u32. Signed-off-by: Jacob Keller Reviewed-by: Mitch Williams Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 18c453a3e728..7baf6d8a84dd 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -424,7 +424,7 @@ struct i40e_pf { #define I40E_HW_PORT_ID_VALID BIT(17) #define I40E_HW_RESTART_AUTONEGBIT(18) - u64 flags; + u32 flags; #define I40E_FLAG_RX_CSUM_ENABLED BIT(0) #define I40E_FLAG_MSI_ENABLED BIT(1) #define I40E_FLAG_MSIX_ENABLED BIT(2) -- 2.14.2
[net-next 05/14] i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts
From: Jacob Keller In the past we changed driver behavior to not clear the PBA when re-enabling interrupts. This change was motivated by the flawed belief that clearing the PBA would cause a lost interrupt if a receive interrupt occurred while interrupts were disabled. According to empirical testing this isn't the case. Additionally, the data sheet specifically says that we should set the CLEARPBA bit when re-enabling interrupts in a polling setup. This reverts commit 40d72a509862 ("i40e/i40evf: don't lose interrupts") Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e.h | 5 + drivers/net/ethernet/intel/i40e/i40e_main.c| 11 +-- drivers/net/ethernet/intel/i40e/i40e_txrx.c| 6 ++ drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +- drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 4 +--- 5 files changed, 10 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 7baf6d8a84dd..8139b4ee1dc3 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -949,9 +949,6 @@ static inline void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector) struct i40e_hw *hw = &pf->hw; u32 val; - /* definitely clear the PBA here, as this function is meant to -* clean out all previous interrupts AND enable the interrupt -*/ val = I40E_PFINT_DYN_CTLN_INTENA_MASK | I40E_PFINT_DYN_CTLN_CLEARPBA_MASK | (I40E_ITR_NONE << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT); @@ -960,7 +957,7 @@ static inline void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector) } void i40e_irq_dynamic_disable_icr0(struct i40e_pf *pf); -void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf, bool clearpba); +void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf); int i40e_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd); int i40e_open(struct net_device *netdev); int i40e_close(struct net_device *netdev); diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index d4b0cc36afb1..00a83afb02e9 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3403,15 +3403,14 @@ void i40e_irq_dynamic_disable_icr0(struct i40e_pf *pf) /** * i40e_irq_dynamic_enable_icr0 - Enable default interrupt generation for icr0 * @pf: board private structure - * @clearpba: true when all pending interrupt events should be cleared **/ -void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf, bool clearpba) +void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf) { struct i40e_hw *hw = &pf->hw; u32 val; val = I40E_PFINT_DYN_CTL0_INTENA_MASK | - (clearpba ? I40E_PFINT_DYN_CTL0_CLEARPBA_MASK : 0) | + I40E_PFINT_DYN_CTL0_CLEARPBA_MASK | (I40E_ITR_NONE << I40E_PFINT_DYN_CTL0_ITR_INDX_SHIFT); wr32(hw, I40E_PFINT_DYN_CTL0, val); @@ -3597,7 +3596,7 @@ static int i40e_vsi_enable_irq(struct i40e_vsi *vsi) for (i = 0; i < vsi->num_q_vectors; i++) i40e_irq_dynamic_enable(vsi, i); } else { - i40e_irq_dynamic_enable_icr0(pf, true); + i40e_irq_dynamic_enable_icr0(pf); } i40e_flush(&pf->hw); @@ -3746,7 +3745,7 @@ static irqreturn_t i40e_intr(int irq, void *data) wr32(hw, I40E_PFINT_ICR0_ENA, ena_mask); if (!test_bit(__I40E_DOWN, pf->state)) { i40e_service_event_schedule(pf); - i40e_irq_dynamic_enable_icr0(pf, false); + i40e_irq_dynamic_enable_icr0(pf); } return ret; @@ -8455,7 +8454,7 @@ static int i40e_setup_misc_vector(struct i40e_pf *pf) i40e_flush(hw); - i40e_irq_dynamic_enable_icr0(pf, true); + i40e_irq_dynamic_enable_icr0(pf); return err; } diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 3bd176606c09..616abf79253e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -2202,9 +2202,7 @@ static u32 i40e_buildreg_itr(const int type, const u16 itr) u32 val; val = I40E_PFINT_DYN_CTLN_INTENA_MASK | - /* Don't clear PBA because that can cause lost interrupts that - * came in while we were cleaning/polling - */ + I40E_PFINT_DYN_CTLN_CLEARPBA_MASK | (type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) | (itr << I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT); @@ -2241,7 +2239,7 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi, /* If we don't have MSIX, then we only need to re-enable icr0 */ if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED)) { -
[net-next 03/14] i40evf: fix mac filter removal timing issue
From: Alan Brady Due to the asynchronous nature in which mac filters are added and deleted, there exists a bug in which filters are erroneously removed if removed then added again quickly. The events are as such: - filter marked for removal - same filter is re-added before watchdog that cleans up filters - we skip re-adding the filter because we have it already in the list - watchdog filter cleanup kicks off and filter is removed So when we were re-adding the same filter, it didn't actually get added because it already existed in the list, but was marked for removal and had yet to actually be removed. This patch fixes the issue by making sure that when adding a filter, if we find it already existing in our list, make sure it is not marked to be removed. Signed-off-by: Alan Brady Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c index 1d2fc898b664..f62d9565c7b5 100644 --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c @@ -880,6 +880,8 @@ i40evf_mac_filter *i40evf_add_filter(struct i40evf_adapter *adapter, list_add_tail(&f->list, &adapter->mac_filter_list); f->add = true; adapter->aq_required |= I40EVF_FLAG_AQ_ADD_MAC_FILTER; + } else { + f->remove = false; } clear_bit(__I40EVF_IN_CRITICAL_TASK, &adapter->crit_section); -- 2.14.2
[net-next 07/14] i40e/i40evf: bump tail only in multiples of 8
From: Jacob Keller Hardware only fetches descriptors on cachelines of 8, essentially ignoring the lower 3 bits of the tail register. Thus, it is pointless to bump tail by an unaligned access as the hardware will ignore some of the new descriptors we allocated. Thus, it's ideal if we can ensure tail writes are always aligned to 8. At first, it seems like we'd already do this, since we allocate descriptors in batches which are a multiple of 8. Since we'd always increment by a multiple of 8, it seems like the value should always be aligned. However, this ignores allocation failures. If we fail to allocate a buffer, our tail register will become unaligned. Once it has become unaligned it will essentially be stuck unaligned until a buffer allocation happens to fail at the exact amount necessary to re-align it. We can do better, by simply rounding down the number of buffers we're about to allocate (cleaned_count) such that "next_to_clean + cleaned_count" is rounded to the nearest multiple of 8. We do this by calculating how far off that value is and subtracting it from the cleaned_count. This essentially defers allocation of buffers if they're going to be ignored by hardware anyways, and re-aligns our next_to_use and tail values after a failure to allocate a descriptor. This calculation ensures that we always align the tail writes in a way the hardware expects and don't unnecessarily allocate buffers which won't be fetched immediately. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 9 + drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 9 + 2 files changed, 18 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 616abf79253e..a23306f04e00 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1372,6 +1372,15 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count) union i40e_rx_desc *rx_desc; struct i40e_rx_buffer *bi; + /* Hardware only fetches new descriptors in cache lines of 8, +* essentially ignoring the lower 3 bits of the tail register. We want +* to ensure our tail writes are aligned to avoid unnecessary work. We +* can't simply round down the cleaned count, since we might fail to +* allocate some buffers. What we really want is to ensure that +* next_to_used + cleaned_count produces an aligned value. +*/ + cleaned_count -= (ntu + cleaned_count) & 0x7; + /* do nothing if no valid netdev defined */ if (!rx_ring->netdev || !cleaned_count) return false; diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c index fe817e2b6fef..6806ada11490 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c @@ -711,6 +711,15 @@ bool i40evf_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count) union i40e_rx_desc *rx_desc; struct i40e_rx_buffer *bi; + /* Hardware only fetches new descriptors in cache lines of 8, +* essentially ignoring the lower 3 bits of the tail register. We want +* to ensure our tail writes are aligned to avoid unnecessary work. We +* can't simply round down the cleaned count, since we might fail to +* allocate some buffers. What we really want is to ensure that +* next_to_used + cleaned_count produces an aligned value. +*/ + cleaned_count -= (ntu + cleaned_count) & 0x7; + /* do nothing if no valid netdev defined */ if (!rx_ring->netdev || !cleaned_count) return false; -- 2.14.2
[net-next 11/14] i40e: Retry AQC GetPhyAbilities to overcome I2CRead hangs
From: Jayaprakash Shanmugam - When the I2C is busy, the PHY reads are delayed. The firmware will return EGAIN in these cases with an expectation that the SW will trigger the reads again - This patch retries the operation for a maximum period of 500ms Signed-off-by: Jayaprakash Shanmugam Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_common.c | 42 ++- drivers/net/ethernet/intel/i40e/i40e_type.h | 3 ++ drivers/net/ethernet/intel/i40evf/i40e_type.h | 3 ++ 3 files changed, 35 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c index 60542beda7ad..53aad378d49c 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_common.c +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c @@ -1567,30 +1567,46 @@ i40e_status i40e_aq_get_phy_capabilities(struct i40e_hw *hw, struct i40e_aq_desc desc; i40e_status status; u16 abilities_size = sizeof(struct i40e_aq_get_phy_abilities_resp); + u16 max_delay = I40E_MAX_PHY_TIMEOUT, total_delay = 0; if (!abilities) return I40E_ERR_PARAM; - i40e_fill_default_direct_cmd_desc(&desc, - i40e_aqc_opc_get_phy_abilities); + do { + i40e_fill_default_direct_cmd_desc(&desc, + i40e_aqc_opc_get_phy_abilities); - desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_BUF); - if (abilities_size > I40E_AQ_LARGE_BUF) - desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_LB); + desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_BUF); + if (abilities_size > I40E_AQ_LARGE_BUF) + desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_LB); - if (qualified_modules) - desc.params.external.param0 |= + if (qualified_modules) + desc.params.external.param0 |= cpu_to_le32(I40E_AQ_PHY_REPORT_QUALIFIED_MODULES); - if (report_init) - desc.params.external.param0 |= + if (report_init) + desc.params.external.param0 |= cpu_to_le32(I40E_AQ_PHY_REPORT_INITIAL_VALUES); - status = i40e_asq_send_command(hw, &desc, abilities, abilities_size, - cmd_details); + status = i40e_asq_send_command(hw, &desc, abilities, + abilities_size, cmd_details); - if (hw->aq.asq_last_status == I40E_AQ_RC_EIO) - status = I40E_ERR_UNKNOWN_PHY; + if (status) + break; + + if (hw->aq.asq_last_status == I40E_AQ_RC_EIO) { + status = I40E_ERR_UNKNOWN_PHY; + break; + } else if (hw->aq.asq_last_status == I40E_AQ_RC_EAGAIN) { + usleep_range(1000, 2000); + total_delay++; + status = I40E_ERR_TIMEOUT; + } + } while ((hw->aq.asq_last_status != I40E_AQ_RC_OK) && +(total_delay < max_delay)); + + if (status) + return status; if (report_init) { if (hw->mac.type == I40E_MAC_XL710 && diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h index 4b32b1d38a66..0410fcbdbb94 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_type.h +++ b/drivers/net/ethernet/intel/i40e/i40e_type.h @@ -46,6 +46,9 @@ /* Max default timeout in ms, */ #define I40E_MAX_NVM_TIMEOUT 18000 +/* Max timeout in ms for the phy to respond */ +#define I40E_MAX_PHY_TIMEOUT 500 + /* Switch from ms to the 1usec global time (this is the GTIME resolution) */ #define I40E_MS_TO_GTIME(time) ((time) * 1000) diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h index 9364b67fff9c..213b773dfad6 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_type.h +++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h @@ -46,6 +46,9 @@ /* Max default timeout in ms, */ #define I40E_MAX_NVM_TIMEOUT 18000 +/* Max timeout in ms for the phy to respond */ +#define I40E_MAX_PHY_TIMEOUT 500 + /* Switch from ms to the 1usec global time (this is the GTIME resolution) */ #define I40E_MS_TO_GTIME(time) ((time) * 1000) -- 2.14.2
[net-next 13/14] i40e: fix a typo
From: Rami Rosen This patch fixes a typo in i40e_vsi_alloc_arrays() documentation. The first parameter name should be "vsi" instead of "type". Signed-off-by: Rami Rosen Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index b26f615bed5a..4de52001a2b9 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -7688,7 +7688,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi) /** * i40e_vsi_alloc_arrays - Allocate queue and vector pointer arrays for the vsi - * @type: VSI pointer + * @vsi: VSI pointer * @alloc_qvectors: a bool to specify if q_vectors need to be allocated. * * On error: returns error code (negative) -- 2.14.2
[net-next 04/14] i40e/i40evf: fix incorrect default ITR values on driver load
From: Jacob Keller The ITR register expects to be programmed in units of 2 microseconds. Because of this, all of the drivers I40E_ITR_* constants are in terms of this 2 microsecond register. Unfortunately, the rx_itr_default value is expected to be programmed in microseconds. Effectively the driver defaults to an ITR value of half the expected value (in terms of minimum microseconds between interrupts). Fix this by changing the default values to be calculated using ITR_REG_TO_USEC macro which indicates that we're converting from the register units into microseconds. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++-- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 6 -- drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 6 -- drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 ++-- 4 files changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 60b11fdeca2d..d4b0cc36afb1 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -8983,8 +8983,8 @@ static int i40e_sw_init(struct i40e_pf *pf) I40E_FLAG_MSIX_ENABLED; /* Set default ITR */ - pf->rx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF; - pf->tx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF; + pf->rx_itr_default = I40E_ITR_RX_DEF; + pf->tx_itr_default = I40E_ITR_TX_DEF; /* Depending on PF configurations, it is possible that the RSS * maximum might end up larger than the available queues diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index a4e3e665a1a1..c3156aa3f709 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -38,8 +38,10 @@ #define I40E_ITR_8K0x003E #define I40E_ITR_4K0x007A #define I40E_MAX_INTRL 0x3B/* reg uses 4 usec resolution */ -#define I40E_ITR_RX_DEFI40E_ITR_20K -#define I40E_ITR_TX_DEFI40E_ITR_20K +#define I40E_ITR_RX_DEF(ITR_REG_TO_USEC(I40E_ITR_20K) | \ + I40E_ITR_DYNAMIC) +#define I40E_ITR_TX_DEF(ITR_REG_TO_USEC(I40E_ITR_20K) | \ + I40E_ITR_DYNAMIC) #define I40E_ITR_DYNAMIC 0x8000 /* use top bit as a flag */ #define I40E_MIN_INT_RATE 250 /* ~= 100 / (I40E_MAX_ITR * 2) */ #define I40E_MAX_INT_RATE 50 /* == 100 / (I40E_MIN_ITR * 2) */ diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h index d8ca802a71a9..8f9830d7649a 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h @@ -38,8 +38,10 @@ #define I40E_ITR_8K0x003E #define I40E_ITR_4K0x007A #define I40E_MAX_INTRL 0x3B/* reg uses 4 usec resolution */ -#define I40E_ITR_RX_DEFI40E_ITR_20K -#define I40E_ITR_TX_DEFI40E_ITR_20K +#define I40E_ITR_RX_DEF(ITR_REG_TO_USEC(I40E_ITR_20K) | \ + I40E_ITR_DYNAMIC) +#define I40E_ITR_TX_DEF(ITR_REG_TO_USEC(I40E_ITR_20K) | \ + I40E_ITR_DYNAMIC) #define I40E_ITR_DYNAMIC 0x8000 /* use top bit as a flag */ #define I40E_MIN_INT_RATE 250 /* ~= 100 / (I40E_MAX_ITR * 2) */ #define I40E_MAX_INT_RATE 50 /* == 100 / (I40E_MIN_ITR * 2) */ diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c index f62d9565c7b5..5bcbd46e2f6c 100644 --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c @@ -1223,7 +1223,7 @@ static int i40evf_alloc_queues(struct i40evf_adapter *adapter) tx_ring->netdev = adapter->netdev; tx_ring->dev = &adapter->pdev->dev; tx_ring->count = adapter->tx_desc_count; - tx_ring->tx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF); + tx_ring->tx_itr_setting = I40E_ITR_TX_DEF; if (adapter->flags & I40EVF_FLAG_WB_ON_ITR_CAPABLE) tx_ring->flags |= I40E_TXR_FLAGS_WB_ON_ITR; @@ -1232,7 +1232,7 @@ static int i40evf_alloc_queues(struct i40evf_adapter *adapter) rx_ring->netdev = adapter->netdev; rx_ring->dev = &adapter->pdev->dev; rx_ring->count = adapter->rx_desc_count; - rx_ring->rx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF); + rx_ring->rx_itr_setting = I40E_ITR_RX_DEF; } adapter->num_active_queues = num_active_queues; -- 2.14.2
[net-next 08/14] i40e/i40evf: bundle more descriptors when allocating buffers
From: Jacob Keller Double the number of descriptors we'll bundle into one tail bump when receiving. Empirical testing has shown that we reduce CPU utilization and don't appear to reduce throughput or packet rate. 32 seems to be the sweet spot, as it's half the default polling budget, so we'd essentially reduce from 4 tail writes when polling down to 2. Increasing this up to 64 appears to have negative impacts as it may become possible that we don't bump the tail each time we get polled, which could cause a long delay between returning descriptors to the hardware. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 +- drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index c3156aa3f709..ff57ae451524 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -208,7 +208,7 @@ static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc, } /* How many Rx Buffers do we bundle into one write to the hardware ? */ -#define I40E_RX_BUFFER_WRITE 16 /* Must be power of 2 */ +#define I40E_RX_BUFFER_WRITE 32 /* Must be power of 2 */ #define I40E_RX_INCREMENT(r, i) \ do {\ (i)++; \ diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h index 8f9830d7649a..8d26c85d12e1 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h @@ -191,7 +191,7 @@ static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc, } /* How many Rx Buffers do we bundle into one write to the hardware ? */ -#define I40E_RX_BUFFER_WRITE 16 /* Must be power of 2 */ +#define I40E_RX_BUFFER_WRITE 32 /* Must be power of 2 */ #define I40E_RX_INCREMENT(r, i) \ do {\ (i)++; \ -- 2.14.2
[net-next 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-09
This series contains updates to i40e and i40evf only. Jake fixes missed flag conversion from u64 to u32. Fixes a deafult ITR value issue where the driver defaults to an ITR value of half the expected value (in terms of minimum microseconds between interrupts). So fix this by changing the default values to be calculated using the ITR_REG_TO_USEC() macro which indicates that we are converting from the register units into microseconds. Updates the drivers to bump the tail in increments of 8 and double the number of descriptors we will bundle into one tail bump when receiving. With the recent kernel support for enabling XPS and QoS at the same time, we no longer need to worry about the number of traffic classes when enabling XPS. Lihong converts the use of hash_for_each() to hash_for_each_safe() to safely remove a hash entry. Adds a check for the return value for find_first_bit() in the case that it returns the size passed to search. Alan fixes a bug in which filters are erroneously removed if they are removed and then added again. So make sure that when adding a filter, if we find it already existed in our list, make sure it is not marked to be removed. Jayaprakash adds the retrying of PHY reads when the I2C is busy for a maximum period of 500ms. Rami fixes code comment typo. Stefano Brivio simplifies the code by removing the use of a local return code variable and simply return the results of the read function. The following are changes since commit 2e997d8b12d2933d7640bb3a43af8eb6857a73af: Merge branch 'ipv6-addrlabel-avoid-dirtying-ip6addrlbl_entry' and are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE Alan Brady (1): i40evf: fix mac filter removal timing issue Jacob Keller (7): i40e: fix flags declaration i40e/i40evf: fix incorrect default ITR values on driver load i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts i40e: reduce lrxqthresh from 2 to 1 i40e/i40evf: bump tail only in multiples of 8 i40e/i40evf: bundle more descriptors when allocating buffers i40e: allow XPS with QoS enabled Jayaprakash Shanmugam (1): i40e: Retry AQC GetPhyAbilities to overcome I2CRead hangs Lihong Yang (3): i40e: use the safe hash table iterator when deleting mac filters i40e: add check for return from find_first_bit call i40e: use a local variable instead of calculating multiple times Rami Rosen (1): i40e: fix a typo Stefano Brivio (1): i40e: Avoid some useless variables and initializers in NVM functions drivers/net/ethernet/intel/i40e/i40e.h | 7 ++-- drivers/net/ethernet/intel/i40e/i40e_common.c | 42 +++--- drivers/net/ethernet/intel/i40e/i40e_main.c| 36 --- drivers/net/ethernet/intel/i40e/i40e_nvm.c | 20 --- drivers/net/ethernet/intel/i40e/i40e_txrx.c| 15 +--- drivers/net/ethernet/intel/i40e/i40e_txrx.h| 8 +++-- drivers/net/ethernet/intel/i40e/i40e_type.h| 3 ++ drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 27 +++--- drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 +-- drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 8 +++-- drivers/net/ethernet/intel/i40evf/i40e_type.h | 3 ++ drivers/net/ethernet/intel/i40evf/i40evf_main.c| 6 ++-- 12 files changed, 107 insertions(+), 81 deletions(-) -- 2.14.2
[net-next 06/14] i40e: reduce lrxqthresh from 2 to 1
From: Jacob Keller The lrxq thresh value tells hardware to immediately interrupt when there are fewer than N*64 packets left in the ring. Counter intuitively, empirical testing has shown that decreasing this value from 2 to 1, and thus changing from an immediate interrupt at fewer than 128 descriptors down to 64 descriptors causes a small increase in the maximum total packets per second we can receive. This increase occurs even when we're polling with interrupts masked, as the hardware must still handle interrupts internally even if we've disabled them in software. Also reduce the value for any VFs we allocate. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c| 2 +- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 00a83afb02e9..74875ddaeb33 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3030,7 +3030,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) if (hw->revision_id == 0) rx_ctx.lrxqthresh = 0; else - rx_ctx.lrxqthresh = 2; + rx_ctx.lrxqthresh = 1; rx_ctx.crcstrip = 1; rx_ctx.l2tsel = 1; /* this controls whether VLAN is stripped from inner headers */ diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 10298956a81b..83727906a386 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -639,7 +639,7 @@ static int i40e_config_vsi_rx_queue(struct i40e_vf *vf, u16 vsi_id, rx_ctx.dsize = 1; /* default values */ - rx_ctx.lrxqthresh = 2; + rx_ctx.lrxqthresh = 1; rx_ctx.crcstrip = 1; rx_ctx.prefena = 1; rx_ctx.l2tsel = 1; -- 2.14.2
[net-next 14/14] i40e: Avoid some useless variables and initializers in NVM functions
From: Stefano Brivio Fixes: 09f79fd49d94 ("i40e: avoid NVM acquire deadlock during NVM update") Signed-off-by: Stefano Brivio Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_nvm.c | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c index 57505b1df98d..151d9cfb6ea4 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c +++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c @@ -311,13 +311,10 @@ static i40e_status i40e_read_nvm_word_aq(struct i40e_hw *hw, u16 offset, static i40e_status __i40e_read_nvm_word(struct i40e_hw *hw, u16 offset, u16 *data) { - i40e_status ret_code = 0; - if (hw->flags & I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE) - ret_code = i40e_read_nvm_word_aq(hw, offset, data); - else - ret_code = i40e_read_nvm_word_srctl(hw, offset, data); - return ret_code; + return i40e_read_nvm_word_aq(hw, offset, data); + + return i40e_read_nvm_word_srctl(hw, offset, data); } /** @@ -331,7 +328,7 @@ static i40e_status __i40e_read_nvm_word(struct i40e_hw *hw, i40e_status i40e_read_nvm_word(struct i40e_hw *hw, u16 offset, u16 *data) { - i40e_status ret_code = 0; + i40e_status ret_code; ret_code = i40e_acquire_nvm(hw, I40E_RESOURCE_READ); if (ret_code) @@ -446,13 +443,10 @@ static i40e_status __i40e_read_nvm_buffer(struct i40e_hw *hw, u16 offset, u16 *words, u16 *data) { - i40e_status ret_code = 0; - if (hw->flags & I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE) - ret_code = i40e_read_nvm_buffer_aq(hw, offset, words, data); - else - ret_code = i40e_read_nvm_buffer_srctl(hw, offset, words, data); - return ret_code; + return i40e_read_nvm_buffer_aq(hw, offset, words, data); + + return i40e_read_nvm_buffer_srctl(hw, offset, words, data); } /** -- 2.14.2
[net-next 12/14] i40e: use a local variable instead of calculating multiple times
From: Lihong Yang The computed result of I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES is used more than three times in function i40e_config_irq_link_list. Simply declare a local variable to store it to improve readability. Signed-off-by: Lihong Yang Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 125dcd1d2233..0c4fa225c7be 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -273,7 +273,7 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id, struct i40e_hw *hw = &pf->hw; u16 vsi_queue_id, pf_queue_id; enum i40e_queue_type qtype; - u16 next_q, vector_id; + u16 next_q, vector_id, size; u32 reg, reg_idx; u16 itr_idx = 0; @@ -303,11 +303,9 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id, vsi_queue_id + 1)); } - next_q = find_first_bit(&linklistmap, - (I40E_MAX_VSI_QP * -I40E_VIRTCHNL_SUPPORTED_QTYPES)); - if (unlikely(next_q == (I40E_MAX_VSI_QP * - I40E_VIRTCHNL_SUPPORTED_QTYPES))) + size = I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES; + next_q = find_first_bit(&linklistmap, size); + if (unlikely(next_q == size)) goto irq_list_done; vsi_queue_id = next_q / I40E_VIRTCHNL_SUPPORTED_QTYPES; @@ -317,7 +315,7 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id, wr32(hw, reg_idx, reg); - while (next_q < (I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES)) { + while (next_q < size) { switch (qtype) { case I40E_QUEUE_TYPE_RX: reg_idx = I40E_QINT_RQCTL(pf_queue_id); @@ -331,12 +329,8 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id, break; } - next_q = find_next_bit(&linklistmap, - (I40E_MAX_VSI_QP * - I40E_VIRTCHNL_SUPPORTED_QTYPES), - next_q + 1); - if (next_q < - (I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES)) { + next_q = find_next_bit(&linklistmap, size, next_q + 1); + if (next_q < size) { vsi_queue_id = next_q / I40E_VIRTCHNL_SUPPORTED_QTYPES; qtype = next_q % I40E_VIRTCHNL_SUPPORTED_QTYPES; pf_queue_id = i40e_vc_get_pf_queue_id(vf, vsi_id, -- 2.14.2
[net-next 09/14] i40e: allow XPS with QoS enabled
From: Jacob Keller Recently, the kernel gained support for enabling XPS and QoS at the same time. Thus, we no longer need to worry about the number of traffic classes when enabling XPS. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_main.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 74875ddaeb33..b26f615bed5a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -2879,23 +2879,18 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi) **/ static void i40e_config_xps_tx_ring(struct i40e_ring *ring) { - struct i40e_vsi *vsi = ring->vsi; int cpu; if (!ring->q_vector || !ring->netdev) return; - if ((vsi->tc_config.numtc <= 1) && - !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, ring->state)) { - cpu = cpumask_local_spread(ring->q_vector->v_idx, -1); - netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu), - ring->queue_index); - } + /* We only initialize XPS once, so as not to overwrite user settings */ + if (test_and_set_bit(__I40E_TX_XPS_INIT_DONE, ring->state)) + return; - /* schedule our worker thread which will take care of -* applying the new filter changes -*/ - i40e_service_event_schedule(vsi->back); + cpu = cpumask_local_spread(ring->q_vector->v_idx, -1); + netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu), + ring->queue_index); } /** -- 2.14.2
[net-next 10/14] i40e: add check for return from find_first_bit call
From: Lihong Yang The find_first_bit function will return the size passed to search if the first set bit is not found. This patch adds the check in case that happens as the return value would be used as the index in an array and that would have caused the out-of-bounds access. Detected by CoverityScan, CID 1295969 Out-of-bounds access Signed-off-by: Lihong Yang Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 83727906a386..125dcd1d2233 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -306,6 +306,10 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id, next_q = find_first_bit(&linklistmap, (I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES)); + if (unlikely(next_q == (I40E_MAX_VSI_QP * + I40E_VIRTCHNL_SUPPORTED_QTYPES))) + goto irq_list_done; + vsi_queue_id = next_q / I40E_VIRTCHNL_SUPPORTED_QTYPES; qtype = next_q % I40E_VIRTCHNL_SUPPORTED_QTYPES; pf_queue_id = i40e_vc_get_pf_queue_id(vf, vsi_id, vsi_queue_id); -- 2.14.2
[PATCH net-next v2 2/5] bpf: Add tests for eBPF file mode
From: Chenbo Feng Two related tests are added into bpf selftest to test read only map and write only map. The tests verified the read only and write only flags are working on hash maps. Signed-off-by: Chenbo Feng --- tools/testing/selftests/bpf/test_maps.c | 48 + 1 file changed, 48 insertions(+) diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c index fe3a443a1102..896f23cfe918 100644 --- a/tools/testing/selftests/bpf/test_maps.c +++ b/tools/testing/selftests/bpf/test_maps.c @@ -1033,6 +1033,51 @@ static void test_map_parallel(void) assert(bpf_map_get_next_key(fd, &key, &key) == -1 && errno == ENOENT); } +static void test_map_rdonly(void) +{ + int i, fd, key = 0, value = 0; + + fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), + MAP_SIZE, map_flags | BPF_F_RDONLY); + if (fd < 0) { + printf("Failed to create map for read only test '%s'!\n", + strerror(errno)); + exit(1); + } + + key = 1; + value = 1234; + /* Insert key=1 element. */ + assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == -1 && + errno == EPERM); + + /* Check that key=2 is not found. */ + assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == ENOENT); + assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == ENOENT); +} + +static void test_map_wronly(void) +{ + int i, fd, key = 0, value = 0; + + fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), + MAP_SIZE, map_flags | BPF_F_WRONLY); + if (fd < 0) { + printf("Failed to create map for read only test '%s'!\n", + strerror(errno)); + exit(1); + } + + key = 1; + value = 1234; + /* Insert key=1 element. */ + assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == 0) + + /* Check that key=2 is not found. */ + assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == EPERM); + assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == EPERM); +} + static void run_all_tests(void) { test_hashmap(0, NULL); @@ -1050,6 +1095,9 @@ static void run_all_tests(void) test_map_large(); test_map_parallel(); test_map_stress(); + + test_map_rdonly(); + test_map_wronly(); } int main(void) -- 2.14.2.920.gcf0c67979c-goog
[PATCH net-next v2 4/5] selinux: bpf: Add selinux check for eBPF syscall operations
From: Chenbo Feng Implement the actual checks introduced to eBPF related syscalls. This implementation use the security field inside bpf object to store a sid that identify the bpf object. And when processes try to access the object, selinux will check if processes have the right privileges. The creation of eBPF object are also checked at the general bpf check hook and new cmd introduced to eBPF domain can also be checked there. Signed-off-by: Chenbo Feng Acked-by: Alexei Starovoitov --- security/selinux/hooks.c| 111 security/selinux/include/classmap.h | 2 + security/selinux/include/objsec.h | 4 ++ 3 files changed, 117 insertions(+) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index f5d304736852..41aba4e3d57c 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -85,6 +85,7 @@ #include #include #include +#include #include "avc.h" #include "objsec.h" @@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void *ib_sec) } #endif +#ifdef CONFIG_BPF_SYSCALL +static int selinux_bpf(int cmd, union bpf_attr *attr, +unsigned int size) +{ + u32 sid = current_sid(); + int ret; + + switch (cmd) { + case BPF_MAP_CREATE: + ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP, BPF_MAP__CREATE, + NULL); + break; + case BPF_PROG_LOAD: + ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG, BPF_PROG__LOAD, + NULL); + break; + default: + ret = 0; + break; + } + + return ret; +} + +static u32 bpf_map_fmode_to_av(fmode_t fmode) +{ + u32 av = 0; + + if (f_mode & FMODE_READ) + av |= BPF_MAP__READ; + if (f_mode & FMODE_WRITE) + av |= BPF_MAP__WRITE; + return av; +} + +static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode) +{ + u32 sid = current_sid(); + struct bpf_security_struct *bpfsec; + + bpfsec = map->security; + return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP, + bpf_map_fmode_to_av(fmode), NULL); +} + +static int selinux_bpf_prog(struct bpf_prog *prog) +{ + u32 sid = current_sid(); + struct bpf_security_struct *bpfsec; + + bpfsec = prog->aux->security; + return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG, + BPF_PROG__USE, NULL); +} + +static int selinux_bpf_map_alloc(struct bpf_map *map) +{ + struct bpf_security_struct *bpfsec; + + bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL); + if (!bpfsec) + return -ENOMEM; + + bpfsec->sid = current_sid(); + map->security = bpfsec; + + return 0; +} + +static void selinux_bpf_map_free(struct bpf_map *map) +{ + struct bpf_security_struct *bpfsec = map->security; + + map->security = NULL; + kfree(bpfsec); +} + +static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux) +{ + struct bpf_security_struct *bpfsec; + + bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL); + if (!bpfsec) + return -ENOMEM; + + bpfsec->sid = current_sid(); + aux->security = bpfsec; + + return 0; +} + +static void selinux_bpf_prog_free(struct bpf_prog_aux *aux) +{ + struct bpf_security_struct *bpfsec = aux->security; + + aux->security = NULL; + kfree(bpfsec); +} +#endif + static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(binder_set_context_mgr, selinux_binder_set_context_mgr), LSM_HOOK_INIT(binder_transaction, selinux_binder_transaction), @@ -6471,6 +6572,16 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match), LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free), #endif + +#ifdef CONFIG_BPF_SYSCALL + LSM_HOOK_INIT(bpf, selinux_bpf), + LSM_HOOK_INIT(bpf_map, selinux_bpf_map), + LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog), + LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc), + LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc), + LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free), + LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free), +#endif }; static __init int selinux_init(void) diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index 35ffb29a69cb..7253c5eea59c 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -237,6 +237,8 @@ struct security_class_mapping secclass_map[] = { { "access", NULL } }, { "infiniband_endport", { "manage_subnet", NULL } }, + { "bpf_map", {"create", "read", "write"} }, + { "bpf_prog", {"load", "use"} }
[PATCH net-next v2 3/5] security: bpf: Add LSM hooks for bpf object related syscall
From: Chenbo Feng Introduce several LSM hooks for the syscalls that will allow the userspace to access to eBPF object such as eBPF programs and eBPF maps. The security check is aimed to enforce a per object security protection for eBPF object so only processes with the right priviliges can read/write to a specific map or use a specific eBPF program. Besides that, a general security hook is added before the multiplexer of bpf syscall to check the cmd and the attribute used for the command. The actual security module can decide which command need to be checked and how the cmd should be checked. Signed-off-by: Chenbo Feng --- include/linux/bpf.h | 6 ++ include/linux/lsm_hooks.h | 54 +++ include/linux/security.h | 45 +++ kernel/bpf/syscall.c | 28 ++-- security/security.c | 32 5 files changed, 163 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0e9ca2555d7f..225740688ab7 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -57,6 +57,9 @@ struct bpf_map { atomic_t usercnt; struct bpf_map *inner_map_meta; char name[BPF_OBJ_NAME_LEN]; +#ifdef CONFIG_SECURITY + void *security; +#endif }; /* function argument constraints */ @@ -190,6 +193,9 @@ struct bpf_prog_aux { struct user_struct *user; u64 load_time; /* ns since boottime */ char name[BPF_OBJ_NAME_LEN]; +#ifdef CONFIG_SECURITY + void *security; +#endif union { struct work_struct work; struct rcu_head rcu; diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index c9258124e417..7161d8e7ee79 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1351,6 +1351,40 @@ * @inode we wish to get the security context of. * @ctx is a pointer in which to place the allocated security context. * @ctxlen points to the place to put the length of @ctx. + * + * Security hooks for using the eBPF maps and programs functionalities through + * eBPF syscalls. + * + * @bpf: + * Do a initial check for all bpf syscalls after the attribute is copied + * into the kernel. The actual security module can implement their own + * rules to check the specific cmd they need. + * + * @bpf_map: + * Do a check when the kernel generate and return a file descriptor for + * eBPF maps. + * + * @map: bpf map that we want to access + * @mask: the access flags + * + * @bpf_prog: + * Do a check when the kernel generate and return a file descriptor for + * eBPF programs. + * + * @prog: bpf prog that userspace want to use. + * + * @bpf_map_alloc_security: + * Initialize the security field inside bpf map. + * + * @bpf_map_free_security: + * Clean up the security information stored inside bpf map. + * + * @bpf_prog_alloc_security: + * Initialize the security field inside bpf program. + * + * @bpf_prog_free_security: + * Clean up the security information stored inside bpf prog. + * */ union security_list_options { int (*binder_set_context_mgr)(struct task_struct *mgr); @@ -1682,6 +1716,17 @@ union security_list_options { struct audit_context *actx); void (*audit_rule_free)(void *lsmrule); #endif /* CONFIG_AUDIT */ + +#ifdef CONFIG_BPF_SYSCALL + int (*bpf)(int cmd, union bpf_attr *attr, +unsigned int size); + int (*bpf_map)(struct bpf_map *map, fmode_t fmode); + int (*bpf_prog)(struct bpf_prog *prog); + int (*bpf_map_alloc_security)(struct bpf_map *map); + void (*bpf_map_free_security)(struct bpf_map *map); + int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux); + void (*bpf_prog_free_security)(struct bpf_prog_aux *aux); +#endif /* CONFIG_BPF_SYSCALL */ }; struct security_hook_heads { @@ -1901,6 +1946,15 @@ struct security_hook_heads { struct list_head audit_rule_match; struct list_head audit_rule_free; #endif /* CONFIG_AUDIT */ +#ifdef CONFIG_BPF_SYSCALL + struct list_head bpf; + struct list_head bpf_map; + struct list_head bpf_prog; + struct list_head bpf_map_alloc_security; + struct list_head bpf_map_free_security; + struct list_head bpf_prog_alloc_security; + struct list_head bpf_prog_free_security; +#endif /* CONFIG_BPF_SYSCALL */ } __randomize_layout; /* diff --git a/include/linux/security.h b/include/linux/security.h index ce6265960d6c..18800b0911e5 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -31,6 +31,7 @@ #include #include #include +#include struct linux_binprm; struct cred; @@ -1730,6 +1731,50 @@ static inline void securityfs_remove(struct dentry *dentry) #endif +#ifdef CONFIG_BPF_SYSCALL +#ifdef CONFIG_SECURITY +extern int security_bpf(int cmd, uni
[PATCH net-next v2 5/5] selinux: bpf: Add addtional check for bpf object file receive
From: Chenbo Feng Introduce a bpf object related check when sending and receiving files through unix domain socket as well as binder. It checks if the receiving process have privilege to read/write the bpf map or use the bpf program. This check is necessary because the bpf maps and programs are using a anonymous inode as their shared inode so the normal way of checking the files and sockets when passing between processes cannot work properly on eBPF object. This check only works when the BPF_SYSCALL is configured. The information stored inside the file security struct is the same as the information in bpf object security struct. Signed-off-by: Chenbo Feng --- include/linux/bpf.h | 3 +++ include/linux/lsm_hooks.h | 17 + include/linux/security.h | 9 +++ kernel/bpf/syscall.c | 4 ++-- security/security.c | 8 +++ security/selinux/hooks.c | 61 +++ 6 files changed, 100 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 225740688ab7..81d6c01b8825 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -285,6 +285,9 @@ int bpf_prog_array_copy_to_user(struct bpf_prog_array __rcu *progs, #ifdef CONFIG_BPF_SYSCALL DECLARE_PER_CPU(int, bpf_prog_active); +extern const struct file_operations bpf_map_fops; +extern const struct file_operations bpf_prog_fops; + #define BPF_PROG_TYPE(_id, _ops) \ extern const struct bpf_verifier_ops _ops; #define BPF_MAP_TYPE(_id, _ops) \ diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 7161d8e7ee79..517dea60b87b 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1385,6 +1385,19 @@ * @bpf_prog_free_security: * Clean up the security information stored inside bpf prog. * + * @bpf_map_file: + * When creating a bpf map fd, set up the file security information with + * the bpf security information stored in the map struct. So when the map + * fd is passed between processes, the security module can directly read + * the security information from file security struct rather than the bpf + * security struct. + * + * @bpf_prog_file: + * When creating a bpf prog fd, set up the file security information with + * the bpf security information stored in the prog struct. So when the prog + * fd is passed between processes, the security module can directly read + * the security information from file security struct rather than the bpf + * security struct. */ union security_list_options { int (*binder_set_context_mgr)(struct task_struct *mgr); @@ -1726,6 +1739,8 @@ union security_list_options { void (*bpf_map_free_security)(struct bpf_map *map); int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux); void (*bpf_prog_free_security)(struct bpf_prog_aux *aux); + void (*bpf_map_file)(struct bpf_map *map, struct file *file); + void (*bpf_prog_file)(struct bpf_prog_aux *aux, struct file *file); #endif /* CONFIG_BPF_SYSCALL */ }; @@ -1954,6 +1969,8 @@ struct security_hook_heads { struct list_head bpf_map_free_security; struct list_head bpf_prog_alloc_security; struct list_head bpf_prog_free_security; + struct list_head bpf_map_file; + struct list_head bpf_prog_file; #endif /* CONFIG_BPF_SYSCALL */ } __randomize_layout; diff --git a/include/linux/security.h b/include/linux/security.h index 18800b0911e5..57573b794e2d 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1740,6 +1740,8 @@ extern int security_bpf_map_alloc(struct bpf_map *map); extern void security_bpf_map_free(struct bpf_map *map); extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux); extern void security_bpf_prog_free(struct bpf_prog_aux *aux); +extern void security_bpf_map_file(struct bpf_map *map, struct file *file); +extern void security_bpf_prog_file(struct bpf_prog_aux *aux, struct file *file); #else static inline int security_bpf(int cmd, union bpf_attr *attr, unsigned int size) @@ -1772,6 +1774,13 @@ static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux) static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) { } + +static inline void security_bpf_map_file(struct bpf_map *map, struct file *file) +{ } + +static inline void security_bpf_prog_file(struct bpf_prog_aux *aux, + struct file *file) +{ } #endif /* CONFIG_SECURITY */ #endif /* CONFIG_BPF_SYSCALL */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 1cf31ddd7616..b144181d3f3a 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -313,7 +313,7 @@ static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf, return -EINVAL; } -static const struct file_operations bpf_map_fops = { +const struct file_operations bpf_map_fops = { #ifdef CONFIG_PROC_FS
[PATCH net-next v2 1/5] bpf: Add file mode configuration into bpf maps
From: Chenbo Feng Introduce the map read/write flags to the eBPF syscalls that returns the map fd. The flags is used to set up the file mode when construct a new file descriptor for bpf maps. To not break the backward capability, the f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise it should be O_RDONLY or O_WRONLY. When the userspace want to modify or read the map content, it will check the file mode to see if it is allowed to make the change. Signed-off-by: Chenbo Feng Acked-by: Alexei Starovoitov --- include/linux/bpf.h | 6 ++-- include/uapi/linux/bpf.h | 6 kernel/bpf/arraymap.c| 7 +++-- kernel/bpf/devmap.c | 5 ++- kernel/bpf/hashtab.c | 5 +-- kernel/bpf/inode.c | 15 ++--- kernel/bpf/lpm_trie.c| 3 +- kernel/bpf/sockmap.c | 5 ++- kernel/bpf/stackmap.c| 5 ++- kernel/bpf/syscall.c | 80 +++- 10 files changed, 114 insertions(+), 23 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index bc7da2ddfcaf..0e9ca2555d7f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -308,11 +308,11 @@ void bpf_map_area_free(void *base); extern int sysctl_unprivileged_bpf_disabled; -int bpf_map_new_fd(struct bpf_map *map); +int bpf_map_new_fd(struct bpf_map *map, int flags); int bpf_prog_new_fd(struct bpf_prog *prog); int bpf_obj_pin_user(u32 ufd, const char __user *pathname); -int bpf_obj_get_user(const char __user *pathname); +int bpf_obj_get_user(const char __user *pathname, int flags); int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value); int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value); @@ -331,6 +331,8 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file, void *key, void *value, u64 map_flags); int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value); +int bpf_get_file_flag(int flags); + /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and * forced to use 'long' read/writes to try to atomically copy long counters. * Best-effort only. No barriers here, since it _will_ race with concurrent diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 6db9e1d679cd..9cb50a228c39 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -217,6 +217,10 @@ enum bpf_attach_type { #define BPF_OBJ_NAME_LEN 16U +/* Flags for accessing BPF object */ +#define BPF_F_RDONLY (1U << 3) +#define BPF_F_WRONLY (1U << 4) + union bpf_attr { struct { /* anonymous struct used by BPF_MAP_CREATE command */ __u32 map_type; /* one of enum bpf_map_type */ @@ -259,6 +263,7 @@ union bpf_attr { struct { /* anonymous struct used by BPF_OBJ_* commands */ __aligned_u64 pathname; __u32 bpf_fd; + __u32 file_flags; }; struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ @@ -286,6 +291,7 @@ union bpf_attr { __u32 map_id; }; __u32 next_id; + __u32 open_flags; }; struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */ diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 68d866628be0..f869e48ef2f6 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -19,6 +19,9 @@ #include "map_in_map.h" +#define ARRAY_CREATE_FLAG_MASK \ + (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) + static void bpf_array_free_percpu(struct bpf_array *array) { int i; @@ -56,8 +59,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr) /* check sanity of attributes */ if (attr->max_entries == 0 || attr->key_size != 4 || - attr->value_size == 0 || attr->map_flags & ~BPF_F_NUMA_NODE || - (percpu && numa_node != NUMA_NO_NODE)) + attr->value_size == 0 || attr->map_flags & + ~ARRAY_CREATE_FLAG_MASK || (percpu && numa_node != NUMA_NO_NODE)) return ERR_PTR(-EINVAL); if (attr->value_size > KMALLOC_MAX_SIZE) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index e093d9a2c4dd..e5d3de7cff2e 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -50,6 +50,9 @@ #include #include +#define DEV_CREATE_FLAG_MASK \ + (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) + struct bpf_dtab_netdev { struct net_device *dev; struct bpf_dtab *dtab; @@ -80,7 +83,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr) /* check sanity of attributes */ if (attr->max_entries == 0 || attr->key_size != 4 || - attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE) + attr->value_size != 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK) return ERR_PTR(-EINVA
[PATCH net-next v2 0/5] bpf: security: New file mode and LSM hooks for eBPF object permission control
From: Chenbo Feng Much like files and sockets, eBPF objects are accessed, controlled, and shared via a file descriptor (FD). Unlike files and sockets, the existing mechanism for eBPF object access control is very limited. Currently there are two options for granting accessing to eBPF operations: grant access to all processes, or only CAP_SYS_ADMIN processes. The CAP_SYS_ADMIN-only mode is not ideal because most users do not have this capability and granting a user CAP_SYS_ADMIN grants too many other security-sensitive permissions. It also unnecessarily allows all CAP_SYS_ADMIN processes access to eBPF functionality. Allowing all processes to access to eBPF objects is also undesirable since it has potential to allow unprivileged processes to consume kernel memory, and opens up attack surface to the kernel. Adding LSM hooks maintains the status quo for systems which do not use an LSM, preserving compatibility with userspace, while allowing security modules to choose how best to handle permissions on eBPF objects. Here is a possible use case for the lsm hooks with selinux module: The network-control daemon (netd) creates and loads an eBPF object for network packet filtering and analysis. It passes the object FD to an unprivileged network monitor app (netmonitor), which is not allowed to create, modify or load eBPF objects, but is allowed to read the traffic stats from the map. Selinux could use these hooks to grant the following permissions: allow netd self:bpf_map { create read write}; allow netmonitor netd:fd use; allow netmonitor netd:bpf_map read; In this patch series, A file mode is added to bpf map to store the accessing mode. With this file mode flags, the map can be obtained read only, write only or read and write. With the help of this file mode, several security hooks can be added to the eBPF syscall implementations to do permissions checks. These LSM hooks are mainly focused on checking the process privileges before it obtains the fd for a specific bpf object. No matter from a file location or from a eBPF id. Besides that, a general check hook is also implemented at the start of bpf syscalls so that each security module can have their own implementation on the reset of bpf object related functionalities. In order to store the ownership and security information about eBPF maps, a security field pointer is added to the struct bpf_map. And the last two patch set are implementation of selinux check on these hooks introduced, plus an additional check when eBPF object is passed between processes using unix socket as well as binder IPC. Change since V1: - Whitelist the new bpf flags in the map allocate check. - Added bpf selftest for the new flags. - Added two new security hooks for copying the security information from the bpf object security struct to file security struct - Simplified the checking action when bpf fd is passed between processes. Chenbo Feng (5): bpf: Add file mode configuration into bpf maps bpf: Add tests for eBPF file mode security: bpf: Add LSM hooks for bpf object related syscall selinux: bpf: Add selinux check for eBPF syscall operations selinux: bpf: Add addtional check for bpf object file receive include/linux/bpf.h | 15 ++- include/linux/lsm_hooks.h | 71 + include/linux/security.h| 54 ++ include/uapi/linux/bpf.h| 6 ++ kernel/bpf/arraymap.c | 7 +- kernel/bpf/devmap.c | 5 +- kernel/bpf/hashtab.c| 5 +- kernel/bpf/inode.c | 15 ++- kernel/bpf/lpm_trie.c | 3 +- kernel/bpf/sockmap.c| 5 +- kernel/bpf/stackmap.c | 5 +- kernel/bpf/syscall.c| 112 ++--- security/security.c | 40 security/selinux/hooks.c| 172 security/selinux/include/classmap.h | 2 + security/selinux/include/objsec.h | 4 + tools/testing/selftests/bpf/test_maps.c | 48 + 17 files changed, 542 insertions(+), 27 deletions(-) -- 2.14.2.920.gcf0c67979c-goog
Re: [PATCH net-next RFC 4/9] net: dsa: mv88e6xxx: add support for event capture
On Sun, Oct 8, 2017 at 9:06 AM, Richard Cochran wrote: >> +static int mv88e6xxx_ptp_enable_perout(struct mv88e6xxx_chip *chip, >> +struct ptp_clock_request *rq, int on) >> +{ >> + struct timespec ts; >> + u64 ns; >> + int pin; >> + int err; >> + >> + pin = ptp_find_pin(chip->ptp_clock, PTP_PF_PEROUT, rq->extts.index); >> + >> + if (pin < 0) >> + return -EBUSY; >> + >> + ts.tv_sec = rq->perout.period.sec; >> + ts.tv_nsec = rq->perout.period.nsec; >> + ns = timespec_to_ns(&ts); >> + >> + if (ns > U32_MAX) >> + return -ERANGE; >> + >> + mutex_lock(&chip->reg_lock); >> + >> + err = mv88e6xxx_config_periodic_trig(chip, (u32)ns, 0); > > Here you ignore the phase of the signal given in the trq->perout.start > field. That is not what the user expects. For periodic outputs where > the phase cannot be set, we really would need a new ioctl. > > However, in this case, you should just drop this functionality. I > understand that this works with your adjustable external oscillator, > but we cannot support that in mainline (at least, not yet). I've been working with this patchset and just came across this limitation as well. The periodic timer output is the basis of the Qbv t0 timer in devices that support Qbv, and setting this up with the correct start time is pretty important in that context. The hardware does support setting a start time, but it must be specified according to the cycle count of the free-running timer rather than a nanosecond value. I think this can be worked out from the values stored in the timecounter struct and I'm writing some code for it now, but if you've already written something I'd be happy to integrate that instead. Another issue related to this is that while the free-running counter in the hardware can't be easily adjusted, the periodic event generator *can* be finely adjusted (via picosecond and sub-picosecond accumulators) to correct for drift between the local clock and the PTP grandmaster time. So to be semantically correct, this needs to be both started at the right time *and* it needs to have the periodic corrections made so that the fine correction parameters in the hardware keep it adjusted to be synchronous with PTP grandmaster time. So, taking this functionality out in the first pass seems like a good move for Brandon to take, but I'm working on a complete implementation for it. I think I've got a handle on how to do it, but if you have any suggestions, I'm open to them. Levi
RE: [PATCH RFC v1 0/3] Support for tap user-space access with veth interfaces
Hello, Just a reminder for feedback. Please let me know your comments. > -Original Message- > From: Grandhi, Sainath > Sent: Wednesday, September 06, 2017 5:34 PM > To: netdev@vger.kernel.org > Cc: da...@davemloft.net; Grandhi, Sainath > Subject: [PATCH RFC v1 0/3] Support for tap user-space access with veth > interfaces > > From: Sainath Grandhi > > This patchset adds a tap device driver for veth virtual network interface. > With this implementation, tap character interface can be added only to the > peer > veth interface. Adding tap interface to veth is for usecases that forwards > packets between host and VMs. This eliminates the need for an additional > software bridge. This can be extended to create both the peer interfaces as > tap > interfaces. These patches are a step in that direction. > > Sainath Grandhi (3): > net: Adding API to parse IFLA_LINKINFO attribute > net: Abstracting out common routines from veth for use by vethtap > vethtap: veth based tap driver > > drivers/net/Kconfig | 1 + > drivers/net/Makefile| 2 + > drivers/net/{veth.c => veth_main.c} | 80 ++--- > drivers/net/vethtap.c | 216 > > include/linux/if_veth.h | 13 +++ > include/net/rtnetlink.h | 3 + > net/core/rtnetlink.c| 8 ++ > 7 files changed, 308 insertions(+), 15 deletions(-) rename > drivers/net/{veth.c => > veth_main.c} (89%) create mode 100644 drivers/net/vethtap.c create mode > 100644 include/linux/if_veth.h > > -- > 2.7.4
[PATCH net-next] once: switch to new jump label API
From: Eric Biggers Switch the DO_ONCE() macro from the deprecated jump label API to the new one. The new one is more readable, and for DO_ONCE() it also makes the generated code more icache-friendly: now the one-time initialization code is placed out-of-line at the jump target, rather than at the inline fallthrough case. Acked-by: Hannes Frederic Sowa Signed-off-by: Eric Biggers --- include/linux/once.h | 6 +++--- lib/once.c | 8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/linux/once.h b/include/linux/once.h index 9c98aaa87cbc..724724918e8b 100644 --- a/include/linux/once.h +++ b/include/linux/once.h @@ -5,7 +5,7 @@ #include bool __do_once_start(bool *done, unsigned long *flags); -void __do_once_done(bool *done, struct static_key *once_key, +void __do_once_done(bool *done, struct static_key_true *once_key, unsigned long *flags); /* Call a function exactly once. The idea of DO_ONCE() is to perform @@ -38,8 +38,8 @@ void __do_once_done(bool *done, struct static_key *once_key, ({ \ bool ___ret = false; \ static bool ___done = false; \ - static struct static_key ___once_key = STATIC_KEY_INIT_TRUE; \ - if (static_key_true(&___once_key)) { \ + static DEFINE_STATIC_KEY_TRUE(___once_key); \ + if (static_branch_unlikely(&___once_key)) { \ unsigned long ___flags; \ ___ret = __do_once_start(&___done, &___flags); \ if (unlikely(___ret)) { \ diff --git a/lib/once.c b/lib/once.c index 05c8604627eb..831c5a6b0bb2 100644 --- a/lib/once.c +++ b/lib/once.c @@ -5,7 +5,7 @@ struct once_work { struct work_struct work; - struct static_key *key; + struct static_key_true *key; }; static void once_deferred(struct work_struct *w) @@ -14,11 +14,11 @@ static void once_deferred(struct work_struct *w) work = container_of(w, struct once_work, work); BUG_ON(!static_key_enabled(work->key)); - static_key_slow_dec(work->key); + static_branch_disable(work->key); kfree(work); } -static void once_disable_jump(struct static_key *key) +static void once_disable_jump(struct static_key_true *key) { struct once_work *w; @@ -51,7 +51,7 @@ bool __do_once_start(bool *done, unsigned long *flags) } EXPORT_SYMBOL(__do_once_start); -void __do_once_done(bool *done, struct static_key *once_key, +void __do_once_done(bool *done, struct static_key_true *once_key, unsigned long *flags) __releases(once_lock) { -- 2.14.2.920.gcf0c67979c-goog
Re: [PATCH] once: switch to new jump label API
From: Daniel Borkmann Date: Mon, 09 Oct 2017 23:14:42 +0200 > Given original code was accepted against net-next tree as major > users of the api are networking related anyway, it should be fine > here as well to route through this tree. Maybe resend the patch with > a [PATCH net-next] in the subject line (as usually done) to make the > targeted tree more clear. Indeed, please do, sorry for not replying.
Re: [PATCH net] net: enable interface alias removal via rtnl
On 10/9/17 9:25 AM, Nicolas Dichtel wrote: > Le 09/10/2017 à 16:02, David Ahern a écrit : >> On 10/9/17 2:23 AM, Nicolas Dichtel wrote: >>> Le 06/10/2017 à 22:10, Oliver Hartkopp a écrit : On 10/06/2017 08:18 PM, David Ahern wrote: > On 10/5/17 4:19 AM, Nicolas Dichtel wrote: >> IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length >> of >> the attribute is 1 ("\0"). However, to remove an alias, the attribute >> length must be 0 (see dev_set_alias()). > > why not add a check in dev_set_alias that if len is 1 and the 1 > character is '\0' it means remove the alias? >>> Because it requires an iproute2 patch. iproute2 doesn't send the '\0'. With >>> the >>> command 'ip link set dummy0 alias ""', the attribute length is 0. >> >> iproute2 needs the feature for 0-len strings or perhaps a 'noalias' option. > iproute2 needs nothing ... > >> >> You can reset the alias using the sysfs file. Given that there is a >> workaround for existing kernels and userspace, upstream can get fixed >> without changing the UAPI. >> > I don't get the point with the UAPI. What will be broken? never mind; I see the error of my ways. > I don't see why allowing an attribute with no data is a problem. > I remember the problem now. I made a patch back in March 2016 that adjusted the policy validation to allow 0-length string. I never sent it and forgot about it until today. You changing ifla_policy to NLA_BINARY is achieving the same thing. I think a comment above the policy line is warranted that clarifies IFLA_IFALIAS is a string but to allow a 0-length string to remove the alias NLA_BINARY is used for policy validation. Comparing the validation done for NLA_STRING vs NLA_BINARY it does change the behavior for 256-character strings.
Re: [PATCH] once: switch to new jump label API
On 10/09/2017 10:27 PM, Eric Biggers wrote: On Fri, Sep 15, 2017 at 09:07:51PM -0700, Eric Biggers wrote: On Tue, Aug 22, 2017 at 02:44:41PM -0400, Hannes Frederic Sowa wrote: Eric Biggers writes: From: Eric Biggers Switch the DO_ONCE() macro from the deprecated jump label API to the new one. The new one is more readable, and for DO_ONCE() it also makes the generated code more icache-friendly: now the one-time initialization code is placed out-of-line at the jump target, rather than at the inline fallthrough case. Signed-off-by: Eric Biggers Acked-by: Hannes Frederic Sowa Great! Who though is the maintainer for this code? It seems it was originally taken by David Miller through the networking tree. David, are you taking further patches to the "once" functions, or should I be trying to get this into -mm, or somewhere else? Eric Ping. Given original code was accepted against net-next tree as major users of the api are networking related anyway, it should be fine here as well to route through this tree. Maybe resend the patch with a [PATCH net-next] in the subject line (as usually done) to make the targeted tree more clear.
Re: [PATCH v6 05/11] dt-bindings: net: dwmac-sun8i: update documentation about integrated PHY
On Sun, Oct 08, 2017 at 06:33:40PM +, Corentin Labbe wrote: > On Thu, Sep 28, 2017 at 09:37:08AM +0200, Corentin Labbe wrote: > > On Wed, Sep 27, 2017 at 04:02:10PM +0200, Andrew Lunn wrote: > > > Hi Corentin > > > > > > > +Required properties for the mdio-mux node: > > > > + - compatible = "mdio-mux" > > > > > > This is too generic. Please add a more specific compatible for this > > > particular mux. You can keep "mdio-mux", since that is what the MDIO > > > subsystem will look for. > > > > > > > I will add allwinner,sun8i-h3-mdio-mux > > > > > > +Required properties of the integrated phy node: > > > > - clocks: a phandle to the reference clock for the EPHY > > > > - resets: a phandle to the reset control for the EPHY > > > > +- phy-is-integrated > > > > > > So the last thing you said is that the mux is not the problem > > > here. Something else is locking up. Did you discover what? > > > > > > I really would like phy-is-integrated to go away. > > > > > > > I have found the problem: by enabling ephy clk/reset the timeout does not > > occur anymore. > > So we could remove phy-is-integrated by: > > Moving internal phy clk/reset handling in mdio_mux_syscon_switch_fn() > > But this means: > > - getting internalphy node always by manually get > > internal_mdio/internal_phy (and not by the given phyhandle) > > - doing some unnecessary tasks (enable/scan/disable) when external_phy is > > needed > > > > Regards > > Hello all > > Below is the current patch, as you can read, it does not use anymore the > phy-is-integrated property. > So now, the mdio-mux must always enable the internal mdio when switch_fn ask > for it and so reset MAC and so need to enable ephy clk/reset. > But for this I need a reference to thoses clock and reset. (this is done in > get_ephy_nodes) > The current version set those clock in mdio-mux node, and as you can see it > is already ugly (lots of get next node), > if the clk/rst nodes were as it should be, in phy nodes, it will be more bad. > > So, since the MAC have a dependency on thoses clk/rst nodes for > doing reset(), I seek a proper way to get references on it. > > OR do you agree that putting ephy clk/rst in emac is acceptable ? Why not just parsing the DT child nodes looking for resets and clocks properties? The usual PHY don't have that. Maxime -- Maxime Ripard, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com signature.asc Description: PGP signature
Re: [net-next V5 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
On 10/09/2017 07:59 PM, Jesper Dangaard Brouer wrote: [...] +static void *cpu_map_lookup_elem(struct bpf_map *map, void *key) +{ + struct bpf_cpu_map_entry *rcpu = + __cpu_map_lookup_elem(map, *(u32 *)key); + + return rcpu ? &rcpu->qsize : NULL; I still think from my prior email/comment that we should use per-cpu scratch buffer here. Would be nice to keep the guarantee that noone can modify it, it's just a tiny change. Well, it's no-longer really needed, right(?), as this patchset update, change that bpf-side cannot invoke this. The userspace-side reading this will get a copy. Ah sorry, you're right, the related change happens in later patch, I missed that; would be good to avoid a split in future or other option is to forbid usage initially in check_map_func_compatibility() by bailing out in BPF_MAP_TYPE_CPUMAP case unconditionally, and then enabling it for BPF_FUNC_redirect_map in next step, such that should someone accidentally only backport this patch, we don't allow for unintended misuse. Thanks, Daniel
[PATCH 3/3] net/atm: Adjust 121 checks for null pointers
From: Markus Elfring Date: Mon, 9 Oct 2017 22:22:45 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The script “checkpatch.pl” pointed information out like the following. Comparison to NULL could be written … Thus fix the affected source code places. Signed-off-by: Markus Elfring --- net/atm/br2684.c | 17 net/atm/clip.c| 2 +- net/atm/lec.c | 38 - net/atm/mpc.c | 114 +- net/atm/mpoa_caches.c | 63 ++-- net/atm/pppoatm.c | 11 +++-- net/atm/proc.c| 2 +- 7 files changed, 123 insertions(+), 124 deletions(-) diff --git a/net/atm/br2684.c b/net/atm/br2684.c index f5b601c01f38..d3f8ec90556a 100644 --- a/net/atm/br2684.c +++ b/net/atm/br2684.c @@ -212,7 +212,7 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev, struct sk_buff *skb2 = skb_realloc_headroom(skb, minheadroom); brvcc->copies_needed++; dev_kfree_skb(skb); - if (skb2 == NULL) { + if (!skb2) { brvcc->copies_failed++; return 0; } @@ -299,7 +299,7 @@ static netdev_tx_t br2684_start_xmit(struct sk_buff *skb, pr_debug("skb_dst(skb)=%p\n", skb_dst(skb)); read_lock(&devs_lock); brvcc = pick_outgoing_vcc(skb, brdev); - if (brvcc == NULL) { + if (!brvcc) { pr_debug("no vcc attached to dev %s\n", dev->name); dev->stats.tx_errors++; dev->stats.tx_carrier_errors++; @@ -372,13 +372,13 @@ static int br2684_setfilt(struct atm_vcc *atmvcc, void __user * arg) struct br2684_dev *brdev; read_lock(&devs_lock); brdev = BRPRIV(br2684_find_dev(&fs.ifspec)); - if (brdev == NULL || list_empty(&brdev->brvccs) || + if (!brdev || list_empty(&brdev->brvccs) || brdev->brvccs.next != brdev->brvccs.prev) /* >1 VCC */ brvcc = NULL; else brvcc = list_entry_brvcc(brdev->brvccs.next); read_unlock(&devs_lock); - if (brvcc == NULL) + if (!brvcc) return -ESRCH; } else brvcc = BR2684_VCC(atmvcc); @@ -427,8 +427,7 @@ static void br2684_push(struct atm_vcc *atmvcc, struct sk_buff *skb) struct br2684_dev *brdev = BRPRIV(net_dev); pr_debug("\n"); - - if (unlikely(skb == NULL)) { + if (unlikely(!skb)) { /* skb==NULL means VCC is being destroyed */ br2684_close_vcc(brvcc); if (list_empty(&brdev->brvccs)) { @@ -550,13 +549,13 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg) atomic_set(&brvcc->qspace, 2); write_lock_irq(&devs_lock); net_dev = br2684_find_dev(&be.ifspec); - if (net_dev == NULL) { + if (!net_dev) { pr_err("tried to attach to non-existent device\n"); err = -ENXIO; goto error; } brdev = BRPRIV(net_dev); - if (atmvcc->push == NULL) { + if (!atmvcc->push) { err = -EBADFD; goto error; } @@ -839,7 +838,7 @@ static int __init br2684_init(void) #ifdef CONFIG_PROC_FS struct proc_dir_entry *p; p = proc_create("br2684", 0, atm_proc_root, &br2684_proc_ops); - if (p == NULL) + if (!p) return -ENOMEM; #endif register_atm_ioctl(&br2684_ioctl_ops); diff --git a/net/atm/clip.c b/net/atm/clip.c index 041d519b8771..b45bfcb6cc1b 100644 --- a/net/atm/clip.c +++ b/net/atm/clip.c @@ -809,7 +809,7 @@ static void *clip_seq_vcc_walk(struct clip_seq_state *state, struct clip_vcc *vcc = state->vcc; vcc = clip_seq_next_vcc(e, vcc); - if (vcc && pos != NULL) { + if (vcc && pos) { while (*pos) { vcc = clip_seq_next_vcc(e, vcc); if (!vcc) diff --git a/net/atm/lec.c b/net/atm/lec.c index 74a794602412..4f94c6ed893c 100644 --- a/net/atm/lec.c +++ b/net/atm/lec.c @@ -139,7 +139,7 @@ static void lec_handle_bridge(struct sk_buff *skb, struct net_device *dev) struct atmlec_msg *mesg; skb2 = alloc_skb(sizeof(struct atmlec_msg), GFP_ATOMIC); - if (skb2 == NULL) + if (!skb2) return; skb2->len = sizeof(struct atmlec_msg); mesg = (struct atmlec_msg *)skb2->data; @@ -264,7 +264,7 @@ static netdev_tx_t lec_start_xmit(struct sk_buff *skb, min_frame_size - skb->truesize, GFP_ATOMIC); dev_kfree_skb(skb); -
[PATCH 2/3] net/atm: Improve a size determination in 12 functions
From: Markus Elfring Date: Mon, 9 Oct 2017 22:00:19 +0200 Replace the specification of data structures by pointer dereferences as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- net/atm/addr.c| 2 +- net/atm/br2684.c | 2 +- net/atm/clip.c| 4 ++-- net/atm/lec.c | 6 +++--- net/atm/mpc.c | 4 ++-- net/atm/mpoa_caches.c | 4 ++-- net/atm/signaling.c | 2 +- 7 files changed, 12 insertions(+), 12 deletions(-) diff --git a/net/atm/addr.c b/net/atm/addr.c index dcda35c66f15..048e6d192818 100644 --- a/net/atm/addr.c +++ b/net/atm/addr.c @@ -86,7 +86,7 @@ int atm_add_addr(struct atm_dev *dev, const struct sockaddr_atmsvc *addr, return -EEXIST; } } - this = kmalloc(sizeof(struct atm_dev_addr), GFP_ATOMIC); + this = kmalloc(sizeof(*this), GFP_ATOMIC); if (!this) { spin_unlock_irqrestore(&dev->lock, flags); return -ENOMEM; diff --git a/net/atm/br2684.c b/net/atm/br2684.c index 4e96f902..f5b601c01f38 100644 --- a/net/atm/br2684.c +++ b/net/atm/br2684.c @@ -538,7 +538,7 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg) if (copy_from_user(&be, arg, sizeof be)) return -EFAULT; - brvcc = kzalloc(sizeof(struct br2684_vcc), GFP_KERNEL); + brvcc = kzalloc(sizeof(*brvcc), GFP_KERNEL); if (!brvcc) return -ENOMEM; /* diff --git a/net/atm/clip.c b/net/atm/clip.c index 65f706e4344c..041d519b8771 100644 --- a/net/atm/clip.c +++ b/net/atm/clip.c @@ -60,7 +60,7 @@ static int to_atmarpd(enum atmarp_ctrl_type type, int itf, __be32 ip) skb = alloc_skb(sizeof(struct atmarp_ctrl), GFP_ATOMIC); if (!skb) return -ENOMEM; - ctrl = skb_put(skb, sizeof(struct atmarp_ctrl)); + ctrl = skb_put(skb, sizeof(*ctrl)); ctrl->type = type; ctrl->itf_num = itf; ctrl->ip = ip; @@ -418,7 +418,7 @@ static int clip_mkip(struct atm_vcc *vcc, int timeout) if (!vcc->push) return -EBADFD; - clip_vcc = kmalloc(sizeof(struct clip_vcc), GFP_KERNEL); + clip_vcc = kmalloc(sizeof(*clip_vcc), GFP_KERNEL); if (!clip_vcc) return -ENOMEM; pr_debug("%p vcc %p\n", clip_vcc, vcc); diff --git a/net/atm/lec.c b/net/atm/lec.c index f5be0b931978..74a794602412 100644 --- a/net/atm/lec.c +++ b/net/atm/lec.c @@ -690,7 +690,7 @@ static int lec_vcc_attach(struct atm_vcc *vcc, void __user *arg) if (ioc_data.dev_num < 0 || ioc_data.dev_num >= MAX_LEC_ITF || !dev_lec[ioc_data.dev_num]) return -EINVAL; - vpriv = kmalloc(sizeof(struct lec_vcc_priv), GFP_KERNEL); + vpriv = kmalloc(sizeof(*vpriv), GFP_KERNEL); if (!vpriv) return -ENOMEM; vpriv->xoff = 0; @@ -1552,7 +1552,7 @@ static struct lec_arp_table *make_entry(struct lec_priv *priv, { struct lec_arp_table *to_return; - to_return = kzalloc(sizeof(struct lec_arp_table), GFP_ATOMIC); + to_return = kzalloc(sizeof(*to_return), GFP_ATOMIC); if (!to_return) return NULL; @@ -2155,7 +2155,7 @@ static int lec_mcast_make(struct lec_priv *priv, struct atm_vcc *vcc) struct lec_vcc_priv *vpriv; int err = 0; - vpriv = kmalloc(sizeof(struct lec_vcc_priv), GFP_KERNEL); + vpriv = kmalloc(sizeof(*vpriv), GFP_KERNEL); if (!vpriv) return -ENOMEM; vpriv->xoff = 0; diff --git a/net/atm/mpc.c b/net/atm/mpc.c index dd57d05b5dcc..d6729d797107 100644 --- a/net/atm/mpc.c +++ b/net/atm/mpc.c @@ -183,7 +183,7 @@ struct atm_mpoa_qos *atm_mpoa_add_qos(__be32 dst_ip, struct atm_qos *qos) return entry; } - entry = kmalloc(sizeof(struct atm_mpoa_qos), GFP_KERNEL); + entry = kmalloc(sizeof(*entry), GFP_KERNEL); if (!entry) return entry; @@ -279,7 +279,7 @@ static struct mpoa_client *alloc_mpc(void) { struct mpoa_client *mpc; - mpc = kzalloc(sizeof(struct mpoa_client), GFP_KERNEL); + mpc = kzalloc(sizeof(*mpc), GFP_KERNEL); if (mpc == NULL) return NULL; rwlock_init(&mpc->ingress_lock); diff --git a/net/atm/mpoa_caches.c b/net/atm/mpoa_caches.c index 7495b42d59eb..23f36e5a20ee 100644 --- a/net/atm/mpoa_caches.c +++ b/net/atm/mpoa_caches.c @@ -96,7 +96,7 @@ static in_cache_entry *in_cache_get_by_vcc(struct atm_vcc *vcc, static in_cache_entry *in_cache_add_entry(__be32 dst_ip, struct mpoa_client *client) { - in_cache_entry *entry = kzalloc(sizeof(in_cache_entry), GFP_KERNEL); + in_cache_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL); i
[PATCH 1/3] net/atm: Delete an error message for a failed memory allocation in five functions
From: Markus Elfring Date: Mon, 9 Oct 2017 21:34:35 +0200 Omit extra messages for a memory allocation failure in these functions. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- net/atm/lec.c | 5 ++--- net/atm/mpc.c | 8 ++-- net/atm/mpoa_caches.c | 8 ++-- 3 files changed, 6 insertions(+), 15 deletions(-) diff --git a/net/atm/lec.c b/net/atm/lec.c index a3d93a1bb133..f5be0b931978 100644 --- a/net/atm/lec.c +++ b/net/atm/lec.c @@ -1553,10 +1553,9 @@ static struct lec_arp_table *make_entry(struct lec_priv *priv, struct lec_arp_table *to_return; to_return = kzalloc(sizeof(struct lec_arp_table), GFP_ATOMIC); - if (!to_return) { - pr_info("LEC: Arp entry kmalloc failed\n"); + if (!to_return) return NULL; - } + ether_addr_copy(to_return->mac_addr, mac_addr); INIT_HLIST_NODE(&to_return->next); setup_timer(&to_return->timer, lec_arp_expire_arp, diff --git a/net/atm/mpc.c b/net/atm/mpc.c index 5677147209e8..dd57d05b5dcc 100644 --- a/net/atm/mpc.c +++ b/net/atm/mpc.c @@ -184,10 +184,8 @@ struct atm_mpoa_qos *atm_mpoa_add_qos(__be32 dst_ip, struct atm_qos *qos) } entry = kmalloc(sizeof(struct atm_mpoa_qos), GFP_KERNEL); - if (entry == NULL) { - pr_info("mpoa: out of memory\n"); + if (!entry) return entry; - } entry->ipaddr = dst_ip; entry->qos = *qos; @@ -473,10 +471,8 @@ static const uint8_t *copy_macs(struct mpoa_client *mpc, kfree(mpc->mps_macs); mpc->number_of_mps_macs = 0; mpc->mps_macs = kmalloc(num_macs * ETH_ALEN, GFP_KERNEL); - if (mpc->mps_macs == NULL) { - pr_info("(%s) out of mem\n", mpc->dev->name); + if (!mpc->mps_macs) return NULL; - } } ether_addr_copy(mpc->mps_macs, router_mac); tlvs += 20; if (device_type == MPS_AND_MPC) tlvs += 20; diff --git a/net/atm/mpoa_caches.c b/net/atm/mpoa_caches.c index 4ccaa16b1eb1..7495b42d59eb 100644 --- a/net/atm/mpoa_caches.c +++ b/net/atm/mpoa_caches.c @@ -98,10 +98,8 @@ static in_cache_entry *in_cache_add_entry(__be32 dst_ip, { in_cache_entry *entry = kzalloc(sizeof(in_cache_entry), GFP_KERNEL); - if (entry == NULL) { - pr_info("mpoa: mpoa_caches.c: new_in_cache_entry: out of memory\n"); + if (!entry) return NULL; - } dprintk("adding an ingress entry, ip = %pI4\n", &dst_ip); @@ -460,10 +458,8 @@ static eg_cache_entry *eg_cache_add_entry(struct k_message *msg, { eg_cache_entry *entry = kzalloc(sizeof(eg_cache_entry), GFP_KERNEL); - if (entry == NULL) { - pr_info("out of memory\n"); + if (!entry) return NULL; - } dprintk("adding an egress entry, ip = %pI4, this should be our IP\n", &msg->content.eg_info.eg_dst_ip); -- 2.14.2
[PATCH 0/3] net-ATM: Adjustments for several function implementations
From: Markus Elfring Date: Mon, 9 Oct 2017 22:30:33 +0200 Some update suggestions were taken into account from static source code analysis. Markus Elfring (3): Delete an error message for a failed memory allocation in five functions Improve a size determination in 12 functions Adjust 121 checks for null pointers net/atm/addr.c| 2 +- net/atm/br2684.c | 19 net/atm/clip.c| 6 +-- net/atm/lec.c | 49 ++-- net/atm/mpc.c | 126 -- net/atm/mpoa_caches.c | 75 +++--- net/atm/pppoatm.c | 11 ++--- net/atm/proc.c| 2 +- net/atm/signaling.c | 2 +- 9 files changed, 141 insertions(+), 151 deletions(-) -- 2.14.2
[net-next 3/3] ip_gre: cache the device mtu hard_header_len calc
The patch introduces ip_tunnel->ether_mtu fields to cache the value of dev->mtu + dev->hard_header_len. This avoids the arithmetic operation on every packet. Signed-off-by: William Tu Cc: David Laight --- include/net/ip_tunnels.h | 1 + net/ipv4/ip_gre.c| 8 net/ipv4/ip_tunnel.c | 3 +++ 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index b41a1e057fce..19565be26e13 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -117,6 +117,7 @@ struct ip_tunnel { /* This field used only by ERSPAN */ u32 index; /* ERSPAN type II index */ + unsigned intether_mtu; /* The mtu including the ether hdr */ struct dst_cache dst_cache; diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 6e6e4c4811cc..994b8ddea0b1 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -578,8 +578,8 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev, if (gre_handle_offloads(skb, false)) goto err_free_rt; - if (skb->len > dev->mtu + dev->hard_header_len) { - pskb_trim(skb, dev->mtu + dev->hard_header_len); + if (skb->len > tunnel->ether_mtu) { + pskb_trim(skb, tunnel->ether_mtu); truncate = true; } @@ -730,8 +730,8 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb, if (skb_cow_head(skb, dev->needed_headroom)) goto free_skb; - if (skb->len > dev->mtu + dev->hard_header_len) { - pskb_trim(skb, dev->mtu + dev->hard_header_len); + if (skb->len > tunnel->ether_mtu) { + pskb_trim(skb, tunnel->ether_mtu); truncate = true; } diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index fe6fee728ce4..859af5b86802 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -348,6 +348,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev) dev->needed_headroom = t_hlen + hlen; mtu -= (dev->hard_header_len + t_hlen); + tunnel->ether_mtu = mtu + dev->hard_header_len; if (mtu < 68) mtu = 68; @@ -952,6 +953,7 @@ int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict) } dev->mtu = new_mtu; + tunnel->ether_mtu = new_mtu + dev->hard_header_len; return 0; } EXPORT_SYMBOL_GPL(__ip_tunnel_change_mtu); @@ -1183,6 +1185,7 @@ int ip_tunnel_init(struct net_device *dev) tunnel->dev = dev; tunnel->net = dev_net(dev); + tunnel->ether_mtu = dev->mtu + dev->hard_header_len; strcpy(tunnel->parms.name, dev->name); iph->version= 4; iph->ihl= 5; -- 2.7.4
[net-next 2/3] ip_gre: fix erspan tunnel mtu calculation
Remove the unnecessary -4 and +4 bytes at mtu and headroom calculation. In addition, erspan uses fixed 8-byte gre header, so add ERSPAN_GREHDR_LEN macro for better readability. Now tunnel->hlen = grehdr(8) + erspanhdr(8) = 16 byte. The mtu should be ETH_DATA_LEN - 16 - iph(20) = 1464. After the ip_tunnel_bind_dev(), the mtu is adjusted to 1464 - 14 (dev->hard_header_len) = 1450. The maximum skb->len the erspan tunnel can carry without being truncated is 1450 + 14 = 1464 byte. Signed-off-by: William Tu Cc: Xin Long --- include/net/erspan.h | 1 + net/ipv4/ip_gre.c| 11 +-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/net/erspan.h b/include/net/erspan.h index ca94fc86865e..e28294e248d0 100644 --- a/include/net/erspan.h +++ b/include/net/erspan.h @@ -28,6 +28,7 @@ */ #define ERSPAN_VERSION 0x1 +#define ERSPAN_GREHDR_LEN 8/* ERSPAN has fixed 8-byte GRE header */ #define VER_MASK 0xf000 #define VLAN_MASK 0x0fff diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 286065c35959..6e6e4c4811cc 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -569,8 +569,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev, key = &tun_info->key; - /* ERSPAN has fixed 8 byte GRE header */ - tunnel_hlen = 8 + sizeof(struct erspanhdr); + tunnel_hlen = ERSPAN_GREHDR_LEN + sizeof(struct erspanhdr); rt = prepare_fb_xmit(skb, dev, &fl, tunnel_hlen); if (!rt) @@ -591,7 +590,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev, erspan_build_header(skb, tunnel_id_to_key32(key->tun_id), ntohl(md->index), truncate); - gre_build_header(skb, 8, TUNNEL_SEQ, + gre_build_header(skb, ERSPAN_GREHDR_LEN, TUNNEL_SEQ, htons(ETH_P_ERSPAN), 0, htonl(tunnel->o_seqno++)); df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0; @@ -1242,14 +1241,14 @@ static int erspan_tunnel_init(struct net_device *dev) struct ip_tunnel *tunnel = netdev_priv(dev); int t_hlen; - tunnel->tun_hlen = 8; + tunnel->tun_hlen = ERSPAN_GREHDR_LEN; tunnel->parms.iph.protocol = IPPROTO_GRE; tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen + sizeof(struct erspanhdr); t_hlen = tunnel->hlen + sizeof(struct iphdr); - dev->needed_headroom = LL_MAX_HEADER + t_hlen + 4; - dev->mtu = ETH_DATA_LEN - t_hlen - 4; + dev->needed_headroom = LL_MAX_HEADER + t_hlen; + dev->mtu = ETH_DATA_LEN - t_hlen; dev->features |= GRE_FEATURES; dev->hw_features|= GRE_FEATURES; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; -- 2.7.4
[net-next 1/3] ip_gre: fix mtu and headroom size
The ip_gre_calc_hlen() already counts gre's base header len (4-byte) plus optional header's len, so tunnel->hlen has the entire gre headeri + options len. Thus, remove the -4 and +4 when calculating the needed_headroom and mtu. Fixes: 4565e9919cda ("gre: Setup and TX path for gre/UDP foo-over-udp encapsulation") Signed-off-by: William Tu Cc: Tom Herbert --- net/ipv4/ip_gre.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index c105a315b1a3..286065c35959 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -947,8 +947,8 @@ static void __gre_tunnel_init(struct net_device *dev) t_hlen = tunnel->hlen + sizeof(struct iphdr); - dev->needed_headroom= LL_MAX_HEADER + t_hlen + 4; - dev->mtu= ETH_DATA_LEN - t_hlen - 4; + dev->needed_headroom= LL_MAX_HEADER + t_hlen; + dev->mtu= ETH_DATA_LEN - t_hlen; dev->features |= GRE_FEATURES; dev->hw_features|= GRE_FEATURES; -- 2.7.4
[net-next 0/3] ip_gre: a bunch of fixes for mtu
The first two patches are to fix some issues for mtu and needed_headroom length calculation from the gre and erspan tunnel header. The last path tries to avoid arithmetic operation for every packet when checking for erspan truncate. William Tu (3): ip_gre: fix mtu and headroom size ip_gre: fix erspan tunnel mtu calculation ip_gre: cache the device mtu hard_header_len calc include/net/erspan.h | 1 + include/net/ip_tunnels.h | 1 + net/ipv4/ip_gre.c| 23 +++ net/ipv4/ip_tunnel.c | 3 +++ 4 files changed, 16 insertions(+), 12 deletions(-) -- 2.7.4
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: >> Fix BUG() calls to use BUG_ON(conditional) macros. >> >> This was found using make coccicheck M=net/core on linux next >> tag next-2017092 >> >> Signed-off-by: Tim Hansen >> --- >> net/core/skbuff.c | 15 ++- >> 1 file changed, 6 insertions(+), 9 deletions(-) >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 >> --- a/net/core/skbuff.c >> +++ b/net/core/skbuff.c >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, >> gfp_t gfp_mask) >> /* Set the tail pointer and length */ >> skb_put(n, skb->len); >> >> -if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)) >> -BUG(); >> +BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)); > >I'm concerned with this change. >1. Calling non-trivial bit of code inside the macro is a poor coding style >(imo) >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and >implementation >of BUG and BUG_ON look quite different. For these archs, wouldn't it then be more efficient to use BUG_ON rather than BUG()? -- Thanks, Sasha
Re: [PATCH] once: switch to new jump label API
On Fri, Sep 15, 2017 at 09:07:51PM -0700, Eric Biggers wrote: > On Tue, Aug 22, 2017 at 02:44:41PM -0400, Hannes Frederic Sowa wrote: > > Eric Biggers writes: > > > > > From: Eric Biggers > > > > > > Switch the DO_ONCE() macro from the deprecated jump label API to the new > > > one. The new one is more readable, and for DO_ONCE() it also makes the > > > generated code more icache-friendly: now the one-time initialization > > > code is placed out-of-line at the jump target, rather than at the inline > > > fallthrough case. > > > > > > Signed-off-by: Eric Biggers > > > > Acked-by: Hannes Frederic Sowa > > > Thanks! > > Great! Who though is the maintainer for this code? It seems it was > originally > taken by David Miller through the networking tree. David, are you taking > further patches to the "once" functions, or should I be trying to get this > into > -mm, or somewhere else? > > Eric Ping.
Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time
Hi Stephen, On Mon, Oct 02, 2017 at 10:37:08AM -0700, Stephen Hemminger wrote: > On Thu, 28 Sep 2017 21:33:46 +0800 > Hangbin Liu wrote: > > > From: Hangbin Liu > > > > This is an update for 460c03f3f3cc ("iplink: double the buffer size also in > > iplink_get()"). After update, we will not need to double the buffer size > > every time when VFs number increased. > > > > With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the > > length parameter. > > > > With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable > > answer to avoid overwrite data in nlh, because it may has more info after > > nlh. also this will avoid nlh buffer not enough issue. > > > > We need to free answer after using. > > > > Signed-off-by: Hangbin Liu > > Signed-off-by: Phil Sutter > > --- > > Most of the uses of rtnl_talk() don't need to this peek and dynamic sizing. > Can only those places that need that be targeted? We could probably do that, by having a buffer on stack in __rtnl_talk() which will be used instead of the allocated one if 'answer' is NULL. Or maybe even introduce a dedicated API call for the dynamically allocated receive buffer. But I really doubt that's feasible: AFAICT, that stack buffer still needs to be reasonably sized since the reply might be larger than the request (reusing the request buffer would be the most simple way to tackle this), also there is support for extack which may bloat the response to arbitrary size. Hangbin has shown in his benchmark that the overhead of the second syscall is negligible, so why care about that and increase code complexity even further? Not saying it's not possible, but I just doubt it's worth the effort. Cheers, Phil
RE: [net-next 14/15] i40e: ignore skb->xmit_more when deciding to set RS bit
> -Original Message- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Alexander Duyck > Sent: Monday, October 09, 2017 9:28 AM > To: David Laight > Cc: Kirsher, Jeffrey T ; da...@davemloft.net; > Keller, Jacob E ; netdev@vger.kernel.org; > nhor...@redhat.com; sassm...@redhat.com; jogre...@redhat.com > Subject: Re: [net-next 14/15] i40e: ignore skb->xmit_more when deciding to set > RS bit > > On Mon, Oct 9, 2017 at 2:07 AM, David Laight wrote: > > From: Jeff Kirsher > >> Sent: 06 October 2017 18:57 > >> From: Jacob Keller > >> > >> Since commit 6a7fded776a7 ("i40e: Fix RS bit update in Tx path and > >> disable force WB workaround") we've tried to "optimize" setting the > >> RS bit based around skb->xmit_more. This same logic was refactored > >> in commit 1dc8b538795f ("i40e: Reorder logic for coalescing RS bits"), > >> but ultimately was not functionally changed. > > > > I've tried to understand the above, but without definitions of WB > > and RS and the '4-ness' of something it is quite difficult. > > > > I THINK this is all about telling the hardware there is a packet > > in the ring to transmit? > > > > I don't understand the 4-ness. Linux requires that the hardware > > be notified of a single packet transmit, and that the 'transmit > > complete' also be notified in 'a timely manner' for a single packet. > > So to clarify some of this. The RS is short for Report Status. It > tells the hardware that when it has completed the descriptor it should > trigger a write back, aka WB. > > The 4-ness is because each descriptor is 16 bytes, so if we write back > once every 4 descriptors that is one 64B cache line which is much > better for most platforms performance wise. > > > skb->xmit_more ought to be usable to optimise both of these > > (assuming it isn't a lie). > > Actually it leads to issues if we hold off for too long. If we are > busy dequeueing a long list, and the Tx queue has gone empty then we > are stalling Tx without need to do so. We need to be regularly > notifying the device that it has buffers available to transmit which > is what xmit_more is good for. However in addition the hardware needs > to notify us regularly that there are buffers ready to clean which it > isn't necessarily so useful for, especially if are filling the entire > ring and only providing one notification to the driver that there are > buffers ready since we can defer the Tx interrupt for too long. > > What Jake is trying to fix here is to prevent us from generating what > looks like a square wave in terms of throughput. Essentially the > current behavior that is being seen is that all the buffers in the > ring either belong to software, or belong to hardware. It is best for > us to have this split half and half so that the hardware has buffers > to transmit and software has buffers to clean and enqueue new buffers > onto. > Yes this sums it up pretty well. The test case which found this bug is outlined in the description. Essentially we could see up to dozens or even almost a hundred packets in a row bunched up if we had a bunch of virtual devices layered on top of the network driver. This led to us not indicating to get status from the hardware about finished packets, which results in an increased delay to finish Tx. > > The driver would need to ensure a ring full of messages isn't > > generated without either wakeup - but that might be 128 frames. > > That is what is happening here. Basically we are guaranteeing > writebacks are occurring on a regular basis instead of in large > bursts. > > > FWIW on the systems I have PCIe writes are relatively cheap > > (reads are slow). So different counts would be appropriate > > for delaying doorbell writes and requesting completion interrupts. > > > > David > > The doorbell writes are still being delayed the same amount with this > patch. The only thing that was split out was the device write backs > are now occurring on a more regular basis instead of being held off > until a much larger set is handled. > Exactly. The tail bumps are still functioning the same, but it's the hardware report status requests which are not delayed. Note that there's some further quirks in how our hardware performs write backs which exacerbate the problem because of how the cachelines are setup so that the delay is even worse than we would expect. Thanks, Jake > - Alex
Re: [PATCH v1 RFC 6/7] Add MIB counter reading support
On 10/06/2017 01:33 PM, tristram...@microchip.com wrote: > From: Tristram Ha > > Add MIB counter reading support. > Rename ksz_9477_reg.h to ksz9477_reg.h for consistency as the product > name is always KSZ. > Header file ksz_priv.h no longer contains any chip specific data. You should probably explain in the commit message that you are adding a timer that reads counters every 30 seconds to avoid overflows, as this is a pretty important piece of information. > +static void ksz9477_phy_setup(struct ksz_device *dev, int port, > + struct phy_device *phy) > +{ > + if (port < dev->phy_port_cnt) { > + /* SUPPORTED_Asym_Pause and SUPPORTED_Pause can be removed to > + * disable flow control when rate limiting is used. > + */ > + phy->advertising = phy->supported; > + } > +} This appears to be an unrelated change here that you would want to put in a separate patch. > + > static void ksz9477_port_setup(struct ksz_device *dev, int port, bool > cpu_port) > { > u8 data8; > @@ -1159,6 +1198,8 @@ static int ksz9477_setup(struct dsa_switch *ds) > /* start switch */ > ksz_cfg(dev, REG_SW_OPERATION, SW_START, true); > > + ksz_init_mib_timer(dev); > + > return 0; > } > > @@ -1168,6 +1209,7 @@ static int ksz9477_setup(struct dsa_switch *ds) > .set_addr = ksz9477_set_addr, > .phy_read = ksz9477_phy_read16, > .phy_write = ksz9477_phy_write16, > + .adjust_link= ksz_adjust_link, > .port_enable= ksz_enable_port, > .port_disable = ksz_disable_port, > .get_strings= ksz9477_get_strings, > @@ -1289,6 +1331,7 @@ static int ksz9477_switch_init(struct ksz_device *dev) > if (!dev->ports) > return -ENOMEM; > for (i = 0; i < dev->mib_port_cnt; i++) { > + mutex_init(&dev->ports[i].mib.cnt_mutex); > dev->ports[i].mib.counters = > devm_kzalloc(dev->dev, >sizeof(u64) * > @@ -1310,7 +1353,12 @@ static void ksz9477_switch_exit(struct ksz_device *dev) > .get_port_addr = ksz9477_get_port_addr, > .cfg_port_member = ksz9477_cfg_port_member, > .flush_dyn_mac_table = ksz9477_flush_dyn_mac_table, > + .phy_setup = ksz9477_phy_setup, > .port_setup = ksz9477_port_setup, > + .r_mib_cnt = ksz9477_r_mib_cnt, > + .r_mib_pkt = ksz9477_r_mib_pkt, > + .freeze_mib = ksz9477_freeze_mib, > + .port_init_cnt = ksz9477_port_init_cnt, > .shutdown = ksz9477_reset_switch, > .detect = ksz9477_switch_detect, > .init = ksz9477_switch_init, > diff --git a/drivers/net/dsa/microchip/ksz_9477_reg.h > b/drivers/net/dsa/microchip/ksz9477_reg.h > similarity index 100% > rename from drivers/net/dsa/microchip/ksz_9477_reg.h > rename to drivers/net/dsa/microchip/ksz9477_reg.h > diff --git a/drivers/net/dsa/microchip/ksz_common.c > b/drivers/net/dsa/microchip/ksz_common.c > index 1c9c4c5..885b3e3 100644 > --- a/drivers/net/dsa/microchip/ksz_common.c > +++ b/drivers/net/dsa/microchip/ksz_common.c > @@ -50,6 +50,81 @@ void ksz_update_port_member(struct ksz_device *dev, int > port) > } > } > > +static void port_r_cnt(struct ksz_device *dev, int port) > +{ > + struct ksz_port_mib *mib = &dev->ports[port].mib; > + u64 *dropped; > + > + /* Some ports may not have MIB counters before SWITCH_COUNTER_NUM. */ > + while (mib->cnt_ptr < dev->reg_mib_cnt) { > + dev->dev_ops->r_mib_cnt(dev, port, mib->cnt_ptr, > + &mib->counters[mib->cnt_ptr]); > + ++mib->cnt_ptr; > + } > + > + /* last one in storage */ > + dropped = &mib->counters[dev->mib_cnt]; > + > + /* Some ports may not have MIB counters after SWITCH_COUNTER_NUM. */ > + while (mib->cnt_ptr < dev->mib_cnt) { > + dev->dev_ops->r_mib_pkt(dev, port, mib->cnt_ptr, > + dropped, &mib->counters[mib->cnt_ptr]); > + ++mib->cnt_ptr; > + } > + mib->cnt_ptr = 0; > +} > + > +static void ksz_mib_read_work(struct work_struct *work) > +{ > + struct ksz_device *dev = > + container_of(work, struct ksz_device, mib_read); > + struct ksz_port *p; > + struct ksz_port_mib *mib; > + int i; > + > + for (i = 0; i < dev->mib_port_cnt; i++) { > + p = &dev->ports[i]; > + if (!p->on) > + continue; > + mib = &p->mib; > + mutex_lock(&mib->cnt_mutex); > + > + /* read only dropped counters when link is not up */ > + if (p->link_down) > + p->link_down = 0; > + else if (!p->link_up) > + mib->cnt_ptr = dev->reg_mib_cnt; Can't you just check the ports' PHY device here instead of caching that (which can get out of
Re: [PATCH v1 RFC 5/7] Break KSZ9477 DSA driver into two files
On 10/06/2017 01:33 PM, tristram...@microchip.com wrote: > From: Tristram Ha > > Break KSZ9477 DSA driver into two files in preparation to add more KSZ > switch drivers. > Add common functions in ksz_common.h so that other KSZ switch drivers > can access code in ksz_common.c. > Add ksz_spi.h for common functions used by KSZ switch SPI drivers. > > Signed-off-by: Tristram Ha Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH v1 RFC 0/7] Modify KSZ9477 DSA driver in preparation to add other KSZ switch drivers
On 10/06/2017 01:32 PM, tristram...@microchip.com wrote: > From: Tristram Ha > > This series of patches is to modify the original KSZ9477 DSA driver so > that other KSZ switch drivers can be added and use the common code. > > There are several steps to accomplish this achievement. First is to > rename some function names with a prefix to indicate chip specific > function. Second is to move common code into header that can be shared. > Last is to modify tag_ksz.c so that it can handle many tail tag formats > used by different KSZ switch drivers. > > ksz_common.c will contain the common code used by all KSZ switch drivers. > ksz9477.c will contain KSZ9477 code from the original ksz_common.c. > ksz9477_spi.c is renamed from ksz_spi.c. > ksz9477_reg.h is renamed from ksz_9477_reg.h. > ksz_common.h is added to provide common code access to KSZ switch > drivers. > ksz_spi.h is added to provide common SPI access functions to KSZ SPI > drivers. Most of this looks fine, and it is probably time to move away from the RFC state and do a formal patch submission with the Reviewed-by and Acked-by tags you just collected. Can you also make sure you prefix your patches with: net: dsa: to be consistent with other submissions done to that subsystem, e.g: net: dsa: microchip: Replace license with GPL Thank you! > > v1 > - Each patch in the set is self-contained > - Use ksz9477 prefix to indicate KSZ9477 specific code > - Simplify MIB counter reading code > - Switch driver code is not accessed from tag_ksz.c > > Tristram Ha (7): > Replace license with GPL. > Clean up code according to patch check suggestions. > Rename some functions with ksz9477 prefix to separate chip specific > code from common code. > Rename ksz_spi.c to ksz9477_spi.c and update Kconfig in preparation to > add more KSZ switch drivers. > Break KSZ9477 DSA driver into two files in preparation to add more KSZ > switch drivers. Add common functions in ksz_common.h so that other > KSZ switch drivers can access code in ksz_common.c. Add ksz_spi.h > for common functions used by KSZ switch SPI drivers. > Add MIB counter reading support. Rename ksz_9477_reg.h to > ksz9477_reg.h for consistency as the product name is always KSZ. > Header file ksz_priv.h no longer contains any chip specific data. > Modify tag_ksz.c so that tail tag code can be used by other KSZ switch > drivers. > > drivers/net/dsa/microchip/Kconfig | 14 +- > drivers/net/dsa/microchip/Makefile |4 +- > drivers/net/dsa/microchip/ksz9477.c| 1376 > > .../microchip/{ksz_9477_reg.h => ksz9477_reg.h}| 23 +- > drivers/net/dsa/microchip/ksz9477_spi.c| 188 +++ > drivers/net/dsa/microchip/ksz_common.c | 1210 - > drivers/net/dsa/microchip/ksz_common.h | 234 > drivers/net/dsa/microchip/ksz_priv.h | 258 ++-- > drivers/net/dsa/microchip/ksz_spi.c| 216 --- > drivers/net/dsa/microchip/ksz_spi.h| 82 ++ > include/net/dsa.h |2 +- > net/dsa/Kconfig|4 + > net/dsa/dsa.c |4 +- > net/dsa/dsa_priv.h |2 +- > net/dsa/tag_ksz.c | 107 +- > 15 files changed, 2331 insertions(+), 1393 deletions(-) > create mode 100644 drivers/net/dsa/microchip/ksz9477.c > rename drivers/net/dsa/microchip/{ksz_9477_reg.h => ksz9477_reg.h} (98%) > create mode 100644 drivers/net/dsa/microchip/ksz9477_spi.c > create mode 100644 drivers/net/dsa/microchip/ksz_common.h > delete mode 100644 drivers/net/dsa/microchip/ksz_spi.c > create mode 100644 drivers/net/dsa/microchip/ksz_spi.h > -- Florian
RE: [PATCH v1 RFC 1/7] Replace license with GPL
> Subject: [PATCH v1 RFC 1/7] Replace license with GPL > > From: Tristram Ha > > Replace license with GPL. > > Signed-off-by: Tristram Ha Reviewed-by: Woojung Huh
Re: linux-next: manual merge of the drivers-x86 tree with the net-next tree
On Mon, Oct 09, 2017 at 10:43:01PM +0300, Mika Westerberg wrote: > If possible, I would rather move this chapter to be before "Networking > over Thunderbolt cable". Reason is that it then follows NVM flashing > chapter which is typically where you need to force power in the first > place. I guess that's something best sorted out either in the relevant trees or during the merge window? signature.asc Description: PGP signature
Re: [PATCH v1 RFC 4/7] Rename ksz_spi.c to ksz9477_spi.c
On 10/06/2017 01:33 PM, tristram...@microchip.com wrote: > From: Tristram Ha > > Rename ksz_spi.c to ksz9477_spi.c and update Kconfig in preparation to add > more KSZ switch drivers. > > Signed-off-by: Tristram Ha Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH v1 RFC 3/7] Rename some functions with ksz9477 prefix
On 10/06/2017 01:33 PM, tristram...@microchip.com wrote: > From: Tristram Ha > > Rename some functions with ksz9477 prefix to separate chip specific code > from common code. > > Signed-off-by: Tristram Ha Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH v1 RFC 1/7] Replace license with GPL
On Fri 2017-10-06 13:32:59, tristram...@microchip.com wrote: > From: Tristram Ha > > Replace license with GPL. > > Signed-off-by: Tristram Ha Acked-by: Pavel Machek -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH v1 RFC 2/7] Clean up code according to patch check suggestions
On Fri 2017-10-06 13:33:00, tristram...@microchip.com wrote: > From: Tristram Ha > > Clean up code according to patch check suggestions. > > Signed-off-by: Tristram Ha Reviewed-by: Pavel Machek -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH v1 RFC 3/7] Rename some functions with ksz9477 prefix
On Fri 2017-10-06 13:33:01, tristram...@microchip.com wrote: > From: Tristram Ha > > Rename some functions with ksz9477 prefix to separate chip specific code > from common code. > > Signed-off-by: Tristram Ha Reviewed-by: Pavel Machek -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH v1 RFC 4/7] Rename ksz_spi.c to ksz9477_spi.c
On Fri 2017-10-06 13:33:02, tristram...@microchip.com wrote: > From: Tristram Ha > > Rename ksz_spi.c to ksz9477_spi.c and update Kconfig in preparation to add > more KSZ switch drivers. > > Signed-off-by: Tristram Ha This will ask people question they already answered, but I guess that's ok. Reviewed-by: Pavel Machek -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH v1 RFC 2/7] Clean up code according to patch check suggestions
On 10/06/2017 01:33 PM, tristram...@microchip.com wrote: > From: Tristram Ha > > Clean up code according to patch check suggestions. > > Signed-off-by: Tristram Ha Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH v1 RFC 6/7] Add MIB counter reading support
Hi! > From: Tristram Ha > > Add MIB counter reading support. > Rename ksz_9477_reg.h to ksz9477_reg.h for consistency as the product > name is always KSZ. > Header file ksz_priv.h no longer contains any chip specific data. Nothing obviously wrong here, but if you are doing another iteration, it would be nice to separate "Rename ksz_9477_reg.h to ksz9477_reg.h" from the code changes. Best regards, > + timeout = 1000; > + do { > + ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4, > + &data); > + usleep_range(1, 10); > + if (!(data & MIB_COUNTER_READ)) > + break; > + } while (timeout-- > 0); 1000 iterations, 1usec each, so 1msec, but you allow up to 10 msec. Interesting. > + /* failed to read MIB. get out of loop */ > + if (!timeout) { > + dev_dbg(dev->dev, "Failed to get MIB\n"); > + return; > + } Are you sure this works? AFAICT "timeout" will underflow, so !timeout will not trigger. > -static void ksz_get_ethtool_stats(struct dsa_switch *ds, int port, > - uint64_t *buf) > -{ > - struct ksz_device *dev = ds->priv; > - int i; > - u32 data; > - int timeout; > - > - mutex_lock(&dev->stats_mutex); > - > - for (i = 0; i < TOTAL_SWITCH_COUNTER_NUM; i++) { > - data = MIB_COUNTER_READ; > - data |= ((mib_names[i].index & 0xFF) << MIB_COUNTER_INDEX_S); > - ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, data); > - > - timeout = 1000; > - do { > - ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4, > - &data); > - usleep_range(1, 10); > - if (!(data & MIB_COUNTER_READ)) > - break; > - } while (timeout-- > 0); > - > - /* failed to read MIB. get out of loop */ > - if (!timeout) { > - dev_dbg(dev->dev, "Failed to get MIB\n"); > - break; > - } Hmm. Bug was there already... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH v1 RFC 5/7] Break KSZ9477 DSA driver into two files
On Fri 2017-10-06 13:33:03, tristram...@microchip.com wrote: > From: Tristram Ha > > Break KSZ9477 DSA driver into two files in preparation to add more KSZ > switch drivers. > Add common functions in ksz_common.h so that other KSZ switch drivers > can access code in ksz_common.c. > Add ksz_spi.h for common functions used by KSZ switch SPI drivers. > > Signed-off-by: Tristram Ha Reviewed-by: Pavel Machek -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: linux-next: manual merge of the drivers-x86 tree with the net-next tree
On Mon, Oct 09, 2017 at 06:56:33PM +0100, Mark Brown wrote: > +Networking over Thunderbolt cable > +- > +Thunderbolt technology allows software communication across two hosts > +connected by a Thunderbolt cable. > + > +It is possible to tunnel any kind of traffic over Thunderbolt link but > +currently we only support Apple ThunderboltIP protocol. > + > +If the other host is running Windows or macOS only thing you need to > +do is to connect Thunderbolt cable between the two hosts, the > +``thunderbolt-net`` is loaded automatically. If the other host is also > +Linux you should load ``thunderbolt-net`` manually on one host (it does > +not matter which one):: > + > + # modprobe thunderbolt-net > + > +This triggers module load on the other host automatically. If the driver > +is built-in to the kernel image, there is no need to do anything. > + > +The driver will create one virtual ethernet interface per Thunderbolt > +port which are named like ``thunderbolt0`` and so on. From this point > +you can either use standard userspace tools like ``ifconfig`` to > +configure the interface or let your GUI to handle it automatically. > ++ > + Forcing power > + - > + Many OEMs include a method that can be used to force the power of a > + thunderbolt controller to an "On" state even if nothing is connected. > + If supported by your machine this will be exposed by the WMI bus with > + a sysfs attribute called "force_power". > + > + For example the intel-wmi-thunderbolt driver exposes this attribute in: > + > /sys/devices/platform/PNP0C14:00/wmi_bus/wmi_bus-PNP0C14:00/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power > + > + To force the power to on, write 1 to this attribute file. > + To disable force power, write 0 to this attribute file. > + > + Note: it's currently not possible to query the force power state of a > platform. If possible, I would rather move this chapter to be before "Networking over Thunderbolt cable". Reason is that it then follows NVM flashing chapter which is typically where you need to force power in the first place.
[PATCH] wimax/i2400m: Remove VLAIS
From: Behan Webster Convert Variable Length Array in Struct (VLAIS) to valid C by converting local struct definition to use a flexible array. The structure is only used to define a cast of a buffer so the size of the struct is not used to allocate storage. Signed-off-by: Behan Webster Signed-off-by: Mark Charebois Suggested-by: Arnd Bergmann Signed-off-by: Matthias Kaehlcke --- drivers/net/wimax/i2400m/fw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wimax/i2400m/fw.c b/drivers/net/wimax/i2400m/fw.c index c9c711dcd0e6..a89b5685e68b 100644 --- a/drivers/net/wimax/i2400m/fw.c +++ b/drivers/net/wimax/i2400m/fw.c @@ -652,7 +652,7 @@ static int i2400m_download_chunk(struct i2400m *i2400m, const void *chunk, struct device *dev = i2400m_dev(i2400m); struct { struct i2400m_bootrom_header cmd; - u8 cmd_payload[chunk_len]; + u8 cmd_payload[]; } __packed *buf; struct i2400m_bootrom_header ack; -- 2.14.2.920.gcf0c67979c-goog
Re: [PATCH v1 RFC 1/7] Replace license with GPL
On 10/09/2017 11:40 AM, tristram...@microchip.com wrote: >> From: tristram...@microchip.com >>> Sent: 06 October 2017 21:33 >>> Replace license with GPL. >> >> Don't you need permission from all the people who have updated >> the files in order to make this change? >> >> David > > I am a little confused by your comment. The 4 original KSZ9477 DSA > driver files were written by Woojung at Microchip Technology Inc. > There was a complaint the "AS IS" license is not exactly GPL. > > It should be submitted formally to net-next instead of a RFC, but it > is probably pointless to do that when there is no code change. > > I am hoping these drastic changes of KSZ9477 driver can be accepted > so that the patches can be submitted formally to net-next. What David is saying is that you need to take into account every one who has been making changes to the Microchip driver, because anytime someones submits a patch that gets accepted, their "voice" now counts: Fortunately, the list seems to be fairly limited: git shortlog -sne drivers/net/dsa/microchip 5 Arkadi Sharshevsky 1 Woojung Huh So seeking contributor approval for a license change should be fairly simple. -- Florian
Re: [PATCH v4 1/2] net: phy: DP83822 initial driver submission
On 10/09/2017 11:59 AM, Dan Murphy wrote: > Add support for the TI DP83822 10/100Mbit ethernet phy. > > The DP83822 provides flexibility to connect to a MAC through a > standard MII, RMII or RGMII interface. > > In addition the DP83822 needs to be removed from the DP83848 driver > as the WoL support is added here for this device. > > Datasheet: > http://www.ti.com/product/DP83822I/datasheet > > Signed-off-by: Dan Murphy > --- > > v4 - Squash DP83822 removal patch into this patch - > https://www.mail-archive.com/netdev@vger.kernel.org/msg192424.html > > v3 - Fixed WoL indication bit and removed mutex for suspend/resume - > https://www.mail-archive.com/netdev@vger.kernel.org/msg191891.html and > https://www.mail-archive.com/netdev@vger.kernel.org/msg191665.html > > v2 - Updated per comments. Removed unnessary parantheis, called > genphy_suspend/ > resume routines and then performing WoL changes, reworked sopass storage and > reduced > the number of phy reads, and moved WOL_SECURE_ON - > https://www.mail-archive.com/netdev@vger.kernel.org/msg191392.html > > drivers/net/phy/Kconfig | 5 + > drivers/net/phy/Makefile | 1 + > drivers/net/phy/dp83822.c | 302 > ++ > drivers/net/phy/dp83848.c | 3 - > 4 files changed, 308 insertions(+), 3 deletions(-) > create mode 100644 drivers/net/phy/dp83822.c > > diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig > index cd931cf9dcc2..8e78a482e09e 100644 > --- a/drivers/net/phy/Kconfig > +++ b/drivers/net/phy/Kconfig > @@ -277,6 +277,11 @@ config DAVICOM_PHY > ---help--- > Currently supports dm9161e and dm9131 > > +config DP83822_PHY > + tristate "Texas Instruments DP83822 PHY" > + ---help--- > + Supports the DP83822 PHY. > + > config DP83848_PHY > tristate "Texas Instruments DP83848 PHY" > ---help--- > diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile > index 416df92fbf4f..df3b82ba8550 100644 > --- a/drivers/net/phy/Makefile > +++ b/drivers/net/phy/Makefile > @@ -55,6 +55,7 @@ obj-$(CONFIG_CICADA_PHY)+= cicada.o > obj-$(CONFIG_CORTINA_PHY)+= cortina.o > obj-$(CONFIG_DAVICOM_PHY)+= davicom.o > obj-$(CONFIG_DP83640_PHY)+= dp83640.o > +obj-$(CONFIG_DP83822_PHY)+= dp83822.o > obj-$(CONFIG_DP83848_PHY)+= dp83848.o > obj-$(CONFIG_DP83867_PHY)+= dp83867.o > obj-$(CONFIG_FIXED_PHY) += fixed_phy.o > diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c > new file mode 100644 > index ..de196dbc46cd > --- /dev/null > +++ b/drivers/net/phy/dp83822.c > @@ -0,0 +1,302 @@ > +/* > + * Driver for the Texas Instruments DP83822 PHY > + * > + * Copyright (C) 2017 Texas Instruments Inc. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define DP83822_PHY_ID 0x2000a240 > +#define DP83822_DEVADDR 0x1f > + > +#define MII_DP83822_MISR10x12 > +#define MII_DP83822_MISR20x13 > +#define MII_DP83822_RESET_CTRL 0x1f > + > +#define DP83822_HW_RESET BIT(15) > +#define DP83822_SW_RESET BIT(14) > + > +/* MISR1 bits */ > +#define DP83822_RX_ERR_HF_INT_EN BIT(0) > +#define DP83822_FALSE_CARRIER_HF_INT_EN BIT(1) > +#define DP83822_ANEG_COMPLETE_INT_EN BIT(2) > +#define DP83822_DUP_MODE_CHANGE_INT_EN BIT(3) > +#define DP83822_SPEED_CHANGED_INT_EN BIT(4) > +#define DP83822_LINK_STAT_INT_EN BIT(5) > +#define DP83822_ENERGY_DET_INT_ENBIT(6) > +#define DP83822_LINK_QUAL_INT_EN BIT(7) > + > +/* MISR2 bits */ > +#define DP83822_JABBER_DET_INT_ENBIT(0) > +#define DP83822_WOL_PKT_INT_EN BIT(1) > +#define DP83822_SLEEP_MODE_INT_ENBIT(2) > +#define DP83822_MDI_XOVER_INT_EN BIT(3) > +#define DP83822_LB_FIFO_INT_EN BIT(4) > +#define DP83822_PAGE_RX_INT_EN BIT(5) > +#define DP83822_ANEG_ERR_INT_EN BIT(6) > +#define DP83822_EEE_ERROR_CHANGE_INT_EN BIT(7) > + > +/* INT_STAT1 bits */ > +#define DP83822_WOL_INT_EN BIT(4) > +#define DP83822_WOL_INT_STAT BIT(12) > + > +#define MII_DP83822_RXSOP1 0x04a5 > +#define MII_DP83822_RXSOP2 0x04a6 > +#define MII_DP83822_RXSOP3 0x04a7 > + > +/* WoL Registers */ > +#define MII_DP83822_WOL_CFG 0x04a0 > +#define MII_DP83822_WOL_STAT0x04a1 > +#define MII_DP83822_WOL_DA1 0x04a2 > +#define MII_DP83822_WOL_DA2 0x04a3 > +#define M
Re: [PATCH net-next 1/1] [net] bonding: Add NUMA notice
On Sat, Oct 7, 2017 at 6:20 PM, David Miller wrote: > From: Patrick Talbert > Date: Thu, 5 Oct 2017 16:23:45 -0400 > >> Network performance can suffer when a load balancing bond uses slave >> interfaces which are in different NUMA domains. >> >> This compares the NUMA domain of a newly enslaved interface against any >> existing enslaved interfaces and prints a warning if they do not match. >> >> Signed-off-by: Patrick Talbert > > This is a bit over the top, and doesn't even handle cases where > the device has no specific NUMA node (-1). Hello David, The motivation was simply to have a notification in place if the interface to-be-enslaved does not match the existing one(s). We have seen performance issues with bonding which were eventually tracked down to this mismatched NUMA domain issue. I thought it was worth having the *mild* warning because it can have a decent impact yet is probably not an obvious thing to check for most users. Though I now see your point. If the bonded interfaces are VLANs, and their underlying physical interfaces happen to be in different NUMA domains, then my check will not know as the VLAN interface numa_node member will be -1 no matter what. That's probably a pretty common setup but adding the logic to check for it probably isn't worth it. Patrick
RE: [PATCH v1 RFC 1/7] Replace license with GPL
> From: tristram...@microchip.com > > Sent: 06 October 2017 21:33 > > Replace license with GPL. > > Don't you need permission from all the people who have updated > the files in order to make this change? > > David I am a little confused by your comment. The 4 original KSZ9477 DSA driver files were written by Woojung at Microchip Technology Inc. There was a complaint the "AS IS" license is not exactly GPL. It should be submitted formally to net-next instead of a RFC, but it is probably pointless to do that when there is no code change. I am hoping these drastic changes of KSZ9477 driver can be accepted so that the patches can be submitted formally to net-next.