[patch net 2/3] mlxsw: switchx2: Fix memory leak at skb reallocation
From: Arkadi Sharshevsky

During transmission the skb is checked for headroom in order to add
vendor specific header. In case the skb needs to be re-allocated,
skb_realloc_headroom() is called to make a private copy of the original,
but doesn't release it. Current code assumes that the original skb is
released during reallocation and only releases it at the error path
which causes a memory leak.

Fix this by adding the original skb release to the main path.

Fixes: d003462a50de ("mlxsw: Simplify mlxsw_sx_port_xmit function")
Signed-off-by: Arkadi Sharshevsky
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index 150ccf5..2e88115e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -345,6 +345,7 @@ static netdev_tx_t mlxsw_sx_port_xmit(struct sk_buff *skb,
 			dev_kfree_skb_any(skb_orig);
 			return NETDEV_TX_OK;
 		}
+		dev_consume_skb_any(skb_orig);
 	}
 	mlxsw_sx_txhdr_construct(skb, &tx_info);
 	/* TX header is consumed by HW on the way so we shouldn't count its
-- 
2.7.4
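The ownership rule behind this fix can be illustrated outside the kernel. What follows is a minimal user-space sketch, not driver code: `struct buf`, `buf_alloc()`, `buf_free()`, `buf_realloc_headroom()` and `xmit_prepare()` are hypothetical stand-ins for `struct sk_buff`, `skb_realloc_headroom()` and the headroom check in the xmit path. It assumes only what the commit message states, namely that reallocation returns a private copy and leaves the original for the caller to release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: payload plus spare headroom. */
struct buf {
	size_t headroom;
	size_t len;
	unsigned char *data;
};

static int live_bufs;	/* buffers allocated and not yet freed */

static struct buf *buf_alloc(size_t headroom, size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = calloc(1, len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->headroom = headroom;
	b->len = len;
	live_bufs++;
	return b;
}

static void buf_free(struct buf *b)
{
	assert(b != NULL);
	free(b->data);
	free(b);
	live_bufs--;
}

/* Mirrors the behaviour described in the commit message: returns a
 * private copy with more headroom and does NOT release the original.
 */
static struct buf *buf_realloc_headroom(const struct buf *orig, size_t headroom)
{
	struct buf *copy = buf_alloc(headroom, orig->len);

	if (copy)
		memcpy(copy->data, orig->data, orig->len);
	return copy;
}

/* Returns the buffer to transmit, or NULL if the packet was dropped.
 * Whenever a copy is attempted, the original is released on both paths.
 */
static struct buf *xmit_prepare(struct buf *b, size_t needed_headroom)
{
	if (b->headroom < needed_headroom) {
		struct buf *orig = b;

		b = buf_realloc_headroom(orig, needed_headroom);
		if (!b) {
			buf_free(orig);	/* error path: drop the packet */
			return NULL;
		}
		buf_free(orig);		/* main path: the release the fix adds */
	}
	return b;
}
```

Without the second `buf_free(orig)`, `live_bufs` would grow by one for every packet that needs more headroom, which is the leak the patch describes.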
[patch net 3/3] mlxsw: pci: Fix EQE structure definition
From: Elad Raz

The event_data starts from address 0x00-0x0C and not from 0x08-0x014.
This leads to duplication with other fields in the Event Queue Element
such as sub-type, cqn and owner.

Fixes: eda6500a987a0 ("mlxsw: Add PCI bus implementation")
Signed-off-by: Elad Raz
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/pci_hw.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
index d147ddd..0af3338 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
@@ -209,21 +209,21 @@ MLXSW_ITEM32(pci, eqe, owner, 0x0C, 0, 1);
 /* pci_eqe_cmd_token
  * Command completion event - token
  */
-MLXSW_ITEM32(pci, eqe, cmd_token, 0x08, 16, 16);
+MLXSW_ITEM32(pci, eqe, cmd_token, 0x00, 16, 16);

 /* pci_eqe_cmd_status
  * Command completion event - status
  */
-MLXSW_ITEM32(pci, eqe, cmd_status, 0x08, 0, 8);
+MLXSW_ITEM32(pci, eqe, cmd_status, 0x00, 0, 8);

 /* pci_eqe_cmd_out_param_h
  * Command completion event - output parameter - higher part
  */
-MLXSW_ITEM32(pci, eqe, cmd_out_param_h, 0x0C, 0, 32);
+MLXSW_ITEM32(pci, eqe, cmd_out_param_h, 0x04, 0, 32);

 /* pci_eqe_cmd_out_param_l
  * Command completion event - output parameter - lower part
  */
-MLXSW_ITEM32(pci, eqe, cmd_out_param_l, 0x10, 0, 32);
+MLXSW_ITEM32(pci, eqe, cmd_out_param_l, 0x08, 0, 32);

 #endif
-- 
2.7.4
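To make the effect of the offset change concrete, here is a small user-space sketch of how a field defined by (byte offset, bit shift, bit size) can be read from a raw 16-byte element. It is not the mlxsw item code: `word_be32()` and `field_get32()` are hypothetical helpers, and the sketch assumes the element is laid out as big-endian 32-bit words. The accessors use the corrected offsets from the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit word at a byte offset within a raw element. */
static uint32_t word_be32(const uint8_t *buf, size_t offset)
{
	return ((uint32_t)buf[offset] << 24) |
	       ((uint32_t)buf[offset + 1] << 16) |
	       ((uint32_t)buf[offset + 2] << 8) |
	       (uint32_t)buf[offset + 3];
}

/* Hypothetical helper: extract `size` bits starting at bit `shift`
 * from the 32-bit word located at byte `offset`.
 */
static uint32_t field_get32(const uint8_t *buf, size_t offset,
			    unsigned int shift, unsigned int size)
{
	uint32_t mask = size >= 32 ? 0xffffffffu : ((1u << size) - 1);

	assert(shift < 32 && size >= 1 && shift + size <= 32);
	return (word_be32(buf, offset) >> shift) & mask;
}

/* Accessors with the offsets after the fix. */
static uint32_t eqe_cmd_token(const uint8_t *eqe)
{
	return field_get32(eqe, 0x00, 16, 16);
}

static uint32_t eqe_cmd_status(const uint8_t *eqe)
{
	return field_get32(eqe, 0x00, 0, 8);
}

static uint32_t eqe_cmd_out_param_h(const uint8_t *eqe)
{
	return field_get32(eqe, 0x04, 0, 32);
}

static uint32_t eqe_cmd_out_param_l(const uint8_t *eqe)
{
	return field_get32(eqe, 0x08, 0, 32);
}

static uint32_t eqe_owner(const uint8_t *eqe)
{
	return field_get32(eqe, 0x0C, 0, 1);
}
```

With the old definitions, `cmd_out_param_h` was read from 0x0C, the same word that holds `owner`, and `cmd_out_param_l` from 0x10, which lies past the word holding `owner`.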
[patch net 0/3] mlxsw: Couple of fixes
From: Jiri Pirko

Couple of simple fixes from Arkadi and Elad. Please queue these up for
stable. Thanks.

Arkadi Sharshevsky (2):
  mlxsw: spectrum: Fix memory leak at skb reallocation
  mlxsw: switchx2: Fix memory leak at skb reallocation

Elad Raz (1):
  mlxsw: pci: Fix EQE structure definition

 drivers/net/ethernet/mellanox/mlxsw/pci_hw.h   | 8 ++++----
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 1 +
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

-- 
2.7.4
[patch net 1/3] mlxsw: spectrum: Fix memory leak at skb reallocation
From: Arkadi Sharshevsky

During transmission the skb is checked for headroom in order to add
vendor specific header. In case the skb needs to be re-allocated,
skb_realloc_headroom() is called to make a private copy of the original,
but doesn't release it. Current code assumes that the original skb is
released during reallocation and only releases it at the error path
which causes a memory leak.

Fix this by adding the original skb release to the main path.

Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Arkadi Sharshevsky
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index d768c7b..003093a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -684,6 +684,7 @@ static netdev_tx_t mlxsw_sp_port_xmit(struct sk_buff *skb,
 			dev_kfree_skb_any(skb_orig);
 			return NETDEV_TX_OK;
 		}
+		dev_consume_skb_any(skb_orig);
 	}

 	if (eth_skb_pad(skb)) {
-- 
2.7.4
[PATCH iproute2/net-next 2/2] tc: flower: Support matching ARP
Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol arp parent : flower indev eth0 \
	arp_op request arp_sip 10.0.0.1 action drop

tc filter add dev eth0 protocol rarp parent : flower indev eth0 \
	arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop

Signed-off-by: Simon Horman
---
 man/man8/tc-flower.8 |  41 +-
 tc/f_flower.c        | 208 +++
 2 files changed, 232 insertions(+), 17 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 5904a9ecafdf..2dd2c5e6e4a5 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -34,7 +34,13 @@ flower \- flow based traffic control filter
 .BR dst_ip " | " src_ip " } "
 .IR PREFIX " | { "
 .BR dst_port " | " src_port " } "
-.IR port_number " } | "
+.IR port_number " } | { "
+.BR arp_tip " | " arp_sip " } "
+.IR PREFIX " | "
+.BR arp_op " { " request " | " reply " | "
+.IR OP " } | { "
+.BR arp_tha " | " arp_sha " } "
+.IR MASKED_LLADDR " | "
 .B enc_key_id
 .IR KEY-ID " | {"
 .BR enc_dst_ip " | " enc_src_ip " } { "
@@ -131,6 +137,36 @@ Match on ICMP type or code. Only available for
 .BR ip_proto " values " icmp " and " icmpv6
 which have to be specified in beforehand.
 .TP
+.BI arp_tip " PREFIX"
+.TQ
+.BI arp_sip " PREFIX"
+Match on ARP or RARP sender or target IP address.
+.I PREFIX
+must be a valid IPv4 address optionally followed by a slash and the prefix
+length. If the prefix is missing, \fBtc\fR assumes a full-length host
+match.
+.TP
+.BI arp_op " ARP_OP"
+Match on ARP or RARP operation.
+.I ARP_OP
+may be
+.BR request ", " reply
+or an integer value 0, 1 or 2. A mask may be optionally provided to limit
+the bits of the operation which are matched. A mask is provided by
+following the address with a slash and then the mask. It may be provided as
+an unsigned 8 bit value representing a bitwise mask. If the mask is missing
+then a match on all bits is assumed.
+.TP
+.BI arp_sha " MASKED_LLADDR"
+.TQ
+.BI arp_tha " MASKED_LLADDR"
+Match on ARP or RARP sender or target MAC address. A mask may be optionally
+provided to limit the bits of the address which are matched. A mask is
+provided by following the address with a slash and then the mask. It may be
+provided in LLADDR format, in which case it is a bitwise mask, or as a
+number of high bits to match. If the mask is missing then a match on all
+bits is assumed.
+.TP
 .BI enc_key_id " NUMBER"
 .TQ
 .BI enc_dst_ip " PREFIX"
@@ -152,7 +188,8 @@ As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR, \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency, layer three matches
-(\fBip_proto\fR, \fBdst_ip\fR and \fBsrc_ip\fR)
+(\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
+\fBarp_op\fR, \fBarp_tha\fR and \fBarp_sha\fR)
 depend on the
 .B protocol
 option of tc filter, layer four port matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 99f5f8163ee0..d301db36a549 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -54,6 +55,11 @@ static void explain(void)
 		" src_port PORT-NUMBER |\n"
 		" type ICMP-TYPE |\n"
 		" code ICMP-CODE |\n"
+		" arp_tip PREFIX |\n"
+		" arp_sip PREFIX |\n"
+		" arp_op [ request | reply | OP ] |\n"
+		" arp_tha MASKED-LLADDR |\n"
+		" arp_sha MASKED-LLADDR |\n"
 		" enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
 		" enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
 		" enc_key_id [ KEY-ID ] |\n"
@@ -192,27 +198,16 @@ err:
 	return -1;
 }

-static int flower_parse_ip_addr(char *str, __be16 eth_type,
-				int addr4_type, int mask4_type,
-				int addr6_type, int mask6_type,
-				struct nlmsghdr *n)
+static int __flower_parse_ip_addr(char *str, int family,
+				  int addr4_type, int mask4_type,
+				  int addr6_type, int mask6_type,
+				  struct nlmsghdr *n)
 {
 	int ret;
 	inet_prefix addr;
-	int family;
 	int bits;
 	int i;

-	if (eth_type == htons(ETH_P_IP)) {
-		family = AF_INET;
-	} else if (eth_type == htons(ETH_P_IPV6)) {
-		family = AF_INET6;
-	} else if (!eth_typ
[PATCH iproute2/net-next 1/2] tc: flower: update headers for TCA_FLOWER_KEY_ARP*
Present in net-next.

Signed-off-by: Simon Horman
---
 include/linux/pkt_cls.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index a081efbd61a2..1e5e1ddfdaca 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -416,6 +416,17 @@ enum {
 	TCA_FLOWER_KEY_ICMPV6_TYPE,	/* u8 */
 	TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,/* u8 */

+	TCA_FLOWER_KEY_ARP_SIP,		/* be32 */
+	TCA_FLOWER_KEY_ARP_SIP_MASK,	/* be32 */
+	TCA_FLOWER_KEY_ARP_TIP,		/* be32 */
+	TCA_FLOWER_KEY_ARP_TIP_MASK,	/* be32 */
+	TCA_FLOWER_KEY_ARP_OP,		/* u8 */
+	TCA_FLOWER_KEY_ARP_OP_MASK,	/* u8 */
+	TCA_FLOWER_KEY_ARP_SHA,		/* ETH_ALEN */
+	TCA_FLOWER_KEY_ARP_SHA_MASK,	/* ETH_ALEN */
+	TCA_FLOWER_KEY_ARP_THA,		/* ETH_ALEN */
+	TCA_FLOWER_KEY_ARP_THA_MASK,	/* ETH_ALEN */
+
 	__TCA_FLOWER_MAX,
 };
-- 
2.7.0.rc3.207.g0ac5344
[PATCH iproute2/net-next 0/2] net/sched: cls_flower: Support matching ARP
Add support for matching on ARP operation, and hardware and protocol
addresses for Ethernet hardware and IPv4 protocol addresses.

Changes since RFC:
* Drop RFC designation; kernel patches are present in net-next

Simon Horman (2):
  tc: flower: update headers for TCA_FLOWER_KEY_ARP*
  tc: flower: Support matching ARP

 include/linux/pkt_cls.h |  11 +++
 man/man8/tc-flower.8    |  41 +-
 tc/f_flower.c           | 208
 3 files changed, 243 insertions(+), 17 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344
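The hardware-address matches in this series accept a mask given as a number of high bits, as in the `52:54:3f:00:00:00/24` example. The following user-space sketch shows what that form means when expanded to a bitwise mask. It is only an illustration and not the parser in tc/f_flower.c: `MAC_LEN`, `mac_mask_from_bits()` and `mac_match()` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6	/* hypothetical name; same value as ETH_ALEN */

/* Build a bitwise mask with the `bits` highest bits set.
 * Returns 0 on success, -1 if `bits` exceeds the address width.
 */
static int mac_mask_from_bits(unsigned int bits, uint8_t mask[MAC_LEN])
{
	unsigned int i;

	if (bits > MAC_LEN * 8)
		return -1;

	memset(mask, 0, MAC_LEN);
	for (i = 0; i < MAC_LEN && bits > 0; i++) {
		unsigned int n = bits >= 8 ? 8 : bits;

		/* keep the top n bits of this byte */
		mask[i] = (uint8_t)(0xff << (8 - n));
		bits -= n;
	}
	return 0;
}

/* Return 1 if addr equals key in every bit selected by mask, else 0. */
static int mac_match(const uint8_t addr[MAC_LEN], const uint8_t key[MAC_LEN],
		     const uint8_t mask[MAC_LEN])
{
	unsigned int i;

	for (i = 0; i < MAC_LEN; i++) {
		if ((addr[i] ^ key[i]) & mask[i])
			return 0;
	}
	return 1;
}
```

Under these assumptions, `/24` expands to `ff:ff:ff:00:00:00`, so the filter in the example matches any target hardware address that begins with `52:54:3f`.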
Re: [PATCH] can: Fix kernel panic at security_sock_rcv_skb
On 01/12/2017 07:33 AM, Liu ShuoX wrote:
> From: Zhang Yanmin
>
> The patch is for fix the below kernel panic:
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [] selinux_socket_sock_rcv_skb+0x65/0x2a0
>
> Call Trace:
>  [] security_sock_rcv_skb+0x4c/0x60
>  [] sk_filter+0x41/0x210
>  [] sock_queue_rcv_skb+0x53/0x3a0
>  [] raw_rcv+0x2a3/0x3c0
>  [] can_rcv_filter+0x12b/0x370
>  [] can_receive+0xd9/0x120
>  [] can_rcv+0xab/0x100
>  [] __netif_receive_skb_core+0xd8c/0x11f0
>  [] __netif_receive_skb+0x24/0xb0
>  [] process_backlog+0x127/0x280
>  [] net_rx_action+0x33b/0x4f0
>  [] __do_softirq+0x184/0x440
>  [] do_softirq_own_stack+0x1c/0x30
>  [] do_softirq.part.18+0x3b/0x40
>  [] do_softirq+0x1d/0x20
>  [] netif_rx_ni+0xe5/0x110
>  [] slcan_receive_buf+0x507/0x520
>  [] flush_to_ldisc+0x21c/0x230
>  [] process_one_work+0x24f/0x670
>  [] worker_thread+0x9d/0x6f0
>  [] ? rescuer_thread+0x480/0x480
>  [] kthread+0x12c/0x150
>  [] ret_from_fork+0x3f/0x70
>
> The sk dereferenced in panic has been released. After the rcu_call in
> can_rx_unregister, receiver was protected by RCU but inner data was
> not, then later sk will be freed while other CPU is still using it.
> We need wait here to make sure sk referenced via receiver was safe.
>
> => security_sk_free
> => sk_destruct
> => __sk_free
> => sk_free
> => raw_release
> => sock_release
> => sock_close
> => __fput
> => fput
> => task_work_run
> => exit_to_usermode_loop
> => syscall_return_slowpath
> => int_ret_from_sys_call
>
> Signed-off-by: Zhang Yanmin
> Signed-off-by: He, Bo
> Signed-off-by: Liu Shuo A
> ---
>  net/can/af_can.c | 14 ++++++++------
>  net/can/af_can.h |  1 -
>  2 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 1108079..fcbe971 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -517,10 +517,8 @@ int can_rx_register(struct net_device *dev, canid_t can_id, canid_t mask,
>  /*
>   * can_rx_delete_receiver - rcu callback for single receiver entry removal
>   */
> -static void can_rx_delete_receiver(struct rcu_head *rp)
> +static void can_rx_delete_receiver(struct receiver *r)
>  {
> -	struct receiver *r = container_of(rp, struct receiver, rcu);
> -
>  	kmem_cache_free(rcv_cache, r);
>  }
>
> @@ -595,9 +593,13 @@ void can_rx_unregister(struct net_device *dev, canid_t can_id, canid_t mask,
>   out:
>  	spin_unlock(&can_rcvlists_lock);
>
> -	/* schedule the receiver item for deletion */
> -	if (r)
> -		call_rcu(&r->rcu, can_rx_delete_receiver);
> +	/* synchronize_rcu to wait until a grace period has elapsed, to make
> +	 * sure all receiver's sk dereferenced by others.
> +	 */
> +	if (r) {
> +		synchronize_rcu();
> +		can_rx_delete_receiver(r);

Nitpick: When can_rx_delete_receiver() just contains
kmem_cache_free(rcv_cache, r), then the function definition should be
removed.

But my main concern is:

The reason why can_rx_delete_receiver() was introduced was the need to
remove a huge number of receivers with can_rx_unregister().

When you call synchronize_rcu() after each receiver removal this would
potentially lead to a big performance issue when e.g. closing CAN_RAW
sockets with a high number of receivers.

So the idea was to remove/unlink the receiver hlist_del_rcu(&r->list)
and also kmem_cache_free(rcv_cache, r) by some rcu mechanism - so that
all elements are cleaned up by rcu at a later point.

Is it possible that the problems emerge due to hlist_del_rcu(&r->list)
and you accidentally fix it with your introduced synchronize_rcu()?

Regards,
Oliver

> +	}
>  }
>  EXPORT_SYMBOL(can_rx_unregister);
>
> diff --git a/net/can/af_can.h b/net/can/af_can.h
> index fca0fe9..a0cbf83 100644
> --- a/net/can/af_can.h
> +++ b/net/can/af_can.h
> @@ -50,7 +50,6 @@
>
>  struct receiver {
>  	struct hlist_node list;
> -	struct rcu_head rcu;
>  	canid_t can_id;
>  	canid_t mask;
>  	unsigned long matches;
Re: [PATCH net-next 2/2] bpf: allow b/h/w/dw access for bpf's cb in ctx
Hi Daniel,

2017-01-12 (02:21 +0100) ~ Daniel Borkmann
> When structs are used to store temporary state in cb[] buffer that is
> used with programs and among tail calls, then the generated code will
> not always access the buffer in bpf_w chunks. We can ease programming
> of it and let this act more natural by allowing for aligned b/h/w/dw
> sized access for cb[] ctx member. Various test cases are attached as
> well for the selftest suite. Potentially, this can also be reused for
> other program types to pass data around.
>
> Signed-off-by: Daniel Borkmann
> Acked-by: Alexei Starovoitov
> ---
>  kernel/bpf/verifier.c                       |   8 +-
>  net/core/filter.c                           |  41 ++-
>  tools/testing/selftests/bpf/test_verifier.c | 442 +++-
>  3 files changed, 478 insertions(+), 13 deletions(-)
>

[...]

> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index 9bb4534..f664bed 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -859,15 +859,451 @@ struct test_val {

[...]

> +	{
> +		"check cb access: doulbe, oob 5",
> +		.insns = {
> +			BPF_MOV64_IMM(BPF_REG_0, 0),
> +			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
> +				    offsetof(struct __sk_buff, cb[4]) + 8),
> +			BPF_EXIT_INSN(),
> +		},
> +		.errstr = "invalid bpf_context access",
> +		.result = REJECT,
> +	},

Nitpicking: typo ("doulbe").

Regards,
Quentin
Re: [PATCH] [net] net/mlx5e: fix another -Wmaybe-uninitialized warning
On 1/11/2017 11:14 PM, Arnd Bergmann wrote:
> @@ -666,14 +666,15 @@ static int mlx5e_route_lookup_ipv4(struct mlx5e_priv *priv,
>  	struct rtable *rt;
>  	struct neighbour *n = NULL;
>  	int ttl;
> +	int ret;
> +
> +	if (!IS_ENABLED(CONFIG_INET))
> +		return -EOPNOTSUPP;
>
> -#if IS_ENABLED(CONFIG_INET)
>  	rt = ip_route_output_key(dev_net(mirred_dev), fl4);
> -	if (IS_ERR(rt))
> -		return PTR_ERR(rt);
> -#else
> -	return -EOPNOTSUPP;
> -#endif
> +	ret = PTR_ERR_OR_ZERO(rt);
> +	if (ret)
> +		return ret;

but this means that if we got NULL from ip_route_output_key, we will
return success (0) here which is wrong.
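The behaviour the reviewer refers to can be checked in isolation. The sketch below uses simplified user-space re-implementations of the kernel's error-pointer helpers (`is_err()`, `ptr_err()`, `err_ptr()` and `ptr_err_or_zero()` are hypothetical lower-case stand-ins for `IS_ERR()`, `PTR_ERR()`, `ERR_PTR()` and `PTR_ERR_OR_ZERO()`). It only demonstrates what the helper returns for a NULL pointer; whether `ip_route_output_key()` can actually return NULL is not established in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* A pointer is an "error pointer" if it falls in the top 4095 values
 * of the address space, where negative errno codes are encoded.
 */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value stored in an error pointer. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Error code for an error pointer, 0 for anything else, NULL included. */
static int ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}
```

Since NULL is not in the error-pointer range, `ptr_err_or_zero(NULL)` is 0, so a caller that must also reject NULL needs a separate check.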
Re: [PATCH iproute2 1/3] sr: add header files for SR-IPv6
On 01/10/2017 07:33 PM, Stephen Hemminger wrote:
> I get all headers from sanitized kernel headers generated by
> $ make headers_install
> but the segmentation stuff is missing.
>
> When you added segment routing headers you forgot to export them.
> Please send a patch to include/uapi/linux/Kbuild, after that is merged
> I will pick them up.
>
> Also this patch is only for net-next.

Oops! Will do that, thanks

David
Re: [PATCH] wext: handle NULL exta data in iwe_stream_add_point better
On Wed, 2017-01-11 at 21:39 +0100, Arnd Bergmann wrote:
> On Wednesday, January 11, 2017 4:06:17 PM CET Johannes Berg wrote:
> >
> > Applied. Also fixed the typo in the subject :)
>
> Thanks! Unfortunately I now got another warning for the same
> function, and though I would have expected the patch to fix it, that
> did not work:

I've come to expect better of you (i.e. testing your own patches) ;-)

Come to think of it, I'm thinking I should drop this patch and the
driver should just use iwe_stream_add_event() instead? It'll be
somewhat tricky to get the length correct though.

Alternatively, perhaps we should just uninline all the crap and then
the compiler can't bother us :)

johannes
[PATCH/RFC net] ravb: Remove Rx overflow log messages
From: Kazuya Mizuguchi

Remove Rx overflow log messages as in an environment where logging
results in network traffic logging may cause further overflows.

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Kazuya Mizuguchi
[simon: reworked changelog]
Signed-off-by: Simon Horman
---
 drivers/net/ethernet/renesas/ravb_main.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 92d7692c840d..5e5ad978eab9 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -926,14 +926,10 @@ static int ravb_poll(struct napi_struct *napi, int budget)
 	/* Receive error message handling */
 	priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors;
 	priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors;
-	if (priv->rx_over_errors != ndev->stats.rx_over_errors) {
+	if (priv->rx_over_errors != ndev->stats.rx_over_errors)
 		ndev->stats.rx_over_errors = priv->rx_over_errors;
-		netif_err(priv, rx_err, ndev, "Receive Descriptor Empty\n");
-	}
-	if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors) {
+	if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors)
 		ndev->stats.rx_fifo_errors = priv->rx_fifo_errors;
-		netif_err(priv, rx_err, ndev, "Receive FIFO Overflow\n");
-	}
 out:
 	return budget - quota;
 }
-- 
2.7.0.rc3.207.g0ac5344
Re: [Patch net] atm: remove an unnecessary loop
On Wed 11-01-17 21:02:01, Cong Wang wrote:
> alloc_tx() is already inside a wait loop for a successful skb
> allocation, this loop inside alloc_tx() is quite unnecessary
> and actually problematic.

I am not familiar with this code at all but vcc_sendmsg seems to be one
of those cases where open coding __GFP_NOFAIL semantic makes sense as
there is an allocation fallback strategy implemented.

> Signed-off-by: Cong Wang

I cannot give my reviewed-by because I am not familiar with the code but
this looks like an improvement to me.

> ---
>  net/atm/common.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/net/atm/common.c b/net/atm/common.c
> index a3ca922..7ec3bbc 100644
> --- a/net/atm/common.c
> +++ b/net/atm/common.c
> @@ -72,10 +72,11 @@ static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
>  			 sk_wmem_alloc_get(sk), size, sk->sk_sndbuf);
>  		return NULL;
>  	}
> -	while (!(skb = alloc_skb(size, GFP_KERNEL)))
> -		schedule();
> -	pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
> -	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
> +	skb = alloc_skb(size, GFP_KERNEL);
> +	if (skb) {
> +		pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
> +		atomic_add(skb->truesize, &sk->sk_wmem_alloc);
> +	}
>  	return skb;
>  }
>
> --
> 2.5.5

-- 
Michal Hocko
SUSE Labs
Re: wl1251 & mac address & calibration data
Hi!

> >> But overwriting that one file is not possible as it next update of
> >> linux-firmware package will overwrite it back. It break any normal usage
> >> of package management.
> >>
> >> Also it is ridiculously broken by design if some "boot" files needs to
> >> be overwritten to initialize hardware properly. To not break booting you
> >> need to overwrite that file before first boot. But without booting
> >> device you cannot read calibration data. So some hack with autoreboot
> >> after boot is needed.
>
> Providing the calibration data via Device Tree is the proper way to
> solve this. Yes yes, I know N900 doesn't support it but that's a
> deficiency in N900, not Linux.

Linux has to work with whatever hardware provides. You may not like
N900 design, but we have to support it, anyway.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] [v2] net: qcom/emac: grab a reference to the phydev on ACPI systems
On Wed, Jan 11, 2017 at 04:45:51PM -0600, Timur Tabi wrote:
> Commit 6ffe1c4cd0a7 ("net: qcom/emac: fix of_node and phydev leaks")
> fixed the problem with reference leaks on phydev, but the fix is
> device-tree specific. When the driver unloads, the reference is
> dropped only on DT systems.
>
> Instead, it's cleaner if we grab a reference on ACPI systems.
> When the driver unloads, we can drop the reference without having
> to check whether we're on a DT system.
>
> Signed-off-by: Timur Tabi

Reviewed-by: Johan Hovold
Re: [PATCH/RFC v2 net-next] ravb: unmap descriptors when freeing rings
On Fri, Jan 06, 2017 at 10:02:36PM +0300, Sergei Shtylyov wrote:
> Hello!
>
> On 01/05/2017 01:43 PM, Simon Horman wrote:
>
> >From: Kazuya Mizuguchi
> >
> >"swiotlb buffer is full" errors occur after repeated initialisation of a
> >device - f.e. suspend/resume or ip link set up/down. This is because memory
> >mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
> >is not released. Resolve this problem by unmapping descriptors when
> >freeing rings.
> >
> >Note, ravb_tx_free() is moved but not otherwise modified by this patch.
> >
> >Signed-off-by: Kazuya Mizuguchi
> >[simon: reworked]
> >Signed-off-by: Simon Horman
> >--
> >v1 [Kazuya Mizuguchi]
> >
> >v2 [Simon Horman]
> >* As suggested by Sergei Shtylyov
> >  - Use dma_mapping_error() and rx_desc->ds_cc when unmapping RX descriptors;
> >    this is consistent with the way that they are mapped
> >  - Use ravb_tx_free() to clear TX descriptors
>
>    Not sure that was good idea (sorry)... ravb_tx_ring() only unmaps the
> transmitted buffers, while we need to unmap everything...
>
> >* Reduce scope of new local variable
> >---
> > drivers/net/ethernet/renesas/ravb_main.c | 89 ++--
> > 1 file changed, 51 insertions(+), 38 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >index 92d7692c840d..1797c48e3176 100644
> >--- a/drivers/net/ethernet/renesas/ravb_main.c
> >+++ b/drivers/net/ethernet/renesas/ravb_main.c
> >@@ -179,6 +179,44 @@ static struct mdiobb_ops bb_ops = {
> > 	.get_mdio_data = ravb_get_mdio_data,
> > };
> >
> >+/* Free TX skb function for AVB-IP */
> >+static int ravb_tx_free(struct net_device *ndev, int q)
> >+{
> >+	struct ravb_private *priv = netdev_priv(ndev);
> >+	struct net_device_stats *stats = &priv->stats[q];
> >+	struct ravb_tx_desc *desc;
> >+	int free_num = 0;
> >+	int entry;
> >+	u32 size;
> >+
> >+	for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) {
> >+		entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] *
> >+					     NUM_TX_DESC);
> >+		desc = &priv->tx_ring[q][entry];
> >+		if (desc->die_dt != DT_FEMPTY)
>
>    Here, it stop once an untransmitted buffer is encountered...

Yes, I see that now.

I wonder if we should:

a) parametrise ravb_tx_free() so it may either clear all transmitted
   buffers (current behaviour) or all buffers (new behaviour).
b) provide a different version of this loop in ravb_ring_free()

What are your thoughts?

> >+			break;
> >+		/* Descriptor type must be checked before all other reads */
> >+		dma_rmb();
> >+		size = le16_to_cpu(desc->ds_tagl) & TX_DS;
> >+		/* Free the original skb. */
> >+		if (priv->tx_skb[q][entry / NUM_TX_DESC]) {
> >+			dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
> >+					 size, DMA_TO_DEVICE);
> >+			/* Last packet descriptor? */
> >+			if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) {
> >+				entry /= NUM_TX_DESC;
> >+				dev_kfree_skb_any(priv->tx_skb[q][entry]);
> >+				priv->tx_skb[q][entry] = NULL;
> >+				stats->tx_packets++;
> >+			}
> >+			free_num++;
> >+		}
> >+		stats->tx_bytes += size;
> >+		desc->die_dt = DT_EEMPTY;
> >+	}
> >+	return free_num;
> >+}
> >+
> > /* Free skb's and DMA buffers for Ethernet AVB */
> > static void ravb_ring_free(struct net_device *ndev, int q)
> > {
> >@@ -207,6 +245,18 @@ static void ravb_ring_free(struct net_device *ndev, int q)
> > 	priv->tx_align[q] = NULL;
> >
> > 	if (priv->rx_ring[q]) {
> >+		for (i = 0; i < priv->num_rx_ring[q]; i++) {
> >+			struct ravb_ex_rx_desc *rx_desc = &priv->rx_ring[q][i];
> >+
> >+			if (!dma_mapping_error(ndev->dev.parent,
> >+					       rx_desc->dptr)) {
>
>    You forgot le32_to_cpu() here, we can't use the raw descriptor fields.

Thanks, I will fix that.

> >+				dma_unmap_single(ndev->dev.parent,
> >+						 le32_to_cpu(rx_desc->dptr),
> >+						 PKT_BUF_SZ,
> >+						 DMA_FROM_DEVICE);
> >+				rx_desc->ds_cc = cpu_to_le16(0);
>
>    You don't check it anyway, not sure what that buys...

Thanks, I will see about dropping that.
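Option (a) from this reply can be sketched in user space as follows. This is only an illustration of the control flow under discussion, not the ravb driver: `struct tx_ring`, `enum desc_state`, `tx_free()` and the `free_txed_only` flag are hypothetical names, and the DMA unmapping and skb freeing are reduced to a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor states standing in for the die_dt values. */
enum desc_state { DESC_EMPTY, DESC_PENDING, DESC_DONE };

/* Hypothetical, simplified TX ring. */
struct tx_ring {
	enum desc_state *desc;
	size_t num;		/* number of descriptors in the ring */
	unsigned int cur;	/* next descriptor to be used */
	unsigned int dirty;	/* oldest descriptor not yet reclaimed */
};

/* Reclaim descriptors between dirty and cur.
 * With free_txed_only set, stop at the first descriptor the hardware
 * has not completed (the behaviour used on the completion path).
 * With it clear, reclaim everything (what ring teardown needs).
 * Returns the number of descriptors reclaimed.
 */
static int tx_free(struct tx_ring *r, bool free_txed_only)
{
	int free_num = 0;

	for (; r->cur != r->dirty; r->dirty++) {
		enum desc_state *d = &r->desc[r->dirty % r->num];

		if (free_txed_only && *d != DESC_DONE)
			break;
		/* a real driver would unmap the buffer and free the skb here */
		*d = DESC_EMPTY;
		free_num++;
	}
	return free_num;
}
```

The completion path would call `tx_free(ring, true)` and ring teardown `tx_free(ring, false)`, so one loop serves both callers.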
Re: [PATCH] wext: handle NULL exta data in iwe_stream_add_point better
> Come to think of it, I'm thinking I should drop this patch and the
> driver should just use iwe_stream_add_event() instead? It'll be
> somewhat tricky to get the length correct though.

No, turns out that's basically impossible with all the compat etc.
stuff here.

johannes
Re: [PATCH] wext: handle NULL exta data in iwe_stream_add_point better
On Wed, 2017-01-11 at 21:39 +0100, Arnd Bergmann wrote:
> On Wednesday, January 11, 2017 4:06:17 PM CET Johannes Berg wrote:
> >
> > Applied. Also fixed the typo in the subject :)
>
> Thanks! Unfortunately I now got another warning for the same
> function, and though I would have expected the patch to fix it, that
> did not work:
>
> In file included from /git/arm-soc/drivers/net/wireless/intersil/prism54/islpci_dev.h:27:0,
>                  from /git/arm-soc/drivers/net/wireless/intersil/prism54/isl_ioctl.h:24,
>                  from /git/arm-soc/drivers/net/wireless/intersil/prism54/isl_ioctl.c:32:
> /git/arm-soc/drivers/net/wireless/intersil/prism54/isl_ioctl.c: In function 'prism54_get_scan':
> /git/arm-soc/include/net/iw_handler.h:560:4: error: argument 2 null where non-null expected [-Werror=nonnull]
>     memcpy(stream + point_len, extra, iwe->u.data.length);

And I realized only now that this was a different place ...

I've just added the check you suggested - spent way too much time
already on this old crap :)

johannes
Re: [PATCH] wext: handle NULL exta data in iwe_stream_add_point better
On Thursday, January 12, 2017 10:16:00 AM CET Johannes Berg wrote:
> And I realized only now that this was a different place ...

Right, it was a few hundred randconfigs later after I had confirmed
that the first patch fixed all the configurations that were broken
at first.

> I've just added the check you suggested - spent way too much time
> already on this old crap

Ok, thanks! Let's hope it doesn't come back once more. I'm still
trying to categorize the newly added warnings in gcc-7. A number of
very useful warnings got added, but some of them are rather noisy
and find both a number of real bugs and false positives. The NULL
check had only a few findings that all seemed worth fixing.

	Arnd
Re: [PATCH v2 2/2] stmmac: rename it to synopsys
Hi Florian,

At 9:14 PM on 1/11/2017, Florian Fainelli wrote:
> On 01/10/2017 06:52 AM, Joao Pinto wrote:
>> This patch renames stmicro/stmmac to synopsys/ since it is a standard
>> ethernet software package regarding synopsys ethernet controllers, supporting
>> the majority of Synopsys Ethernet IPs. The config IDs remain the same, for
>> retro-compatibility, only the description was changed.
>
> Do we really have to do this? ST Micro were the first to upstream
> support for a Synopsys IP, and it was later on identified as being
> "stmicro" instead of "synopsys" (during the big driver move under
> drivers/net/ethernet) whichever came first in the driver essentially "wins".
>
> As mentioned before, although git is able to track renames, git log does
> not automatically have --follow, so it can be hard for people to track
> down the (new) history of the driver.
>
> Personally, I don't see much value in doing this rename, especially when
> all the driver internal structures are still going to be named with
> stmmac (and please don't even think about doing a s/stmmac/snps/ inside
> the driver ;)).
>
> My 2 cents.
>

First of all, I am suggesting an alternative way of organizing the code, and
that's it, I have no second intentions about anything :).

Please don't see this as a take-over or an attempt to erase STMicro from the
credits, please... it makes no sense. You can leave the STMicro license and all
the credits, that is fine by me and I insist on it. But let's name it something
that makes sense... let's call it dwc (designware controllers), I am totally
open to suggestions.

I don't understand the hostility of some comments, honestly. The easiest way is
to keep things like they are today, and believe me I have a lot of things to
do, like adding the support of multi-queues / multi-channels to stmmac, so I am
not suggesting this because it is fun.

I am suggesting this because it is what I am used to seeing in other
subsystems. USB has dwc2 and dwc3 folders that clearly identify that they are
Designware (synopsys) extensions to USB 2.0 and 3.0. The author of dwc3 was
Texas Instruments, and they did not name it ti/usb.

For example I use an AXS101 Development board that does not have a stmicro SoC
but has a Designware Ethernet IP in it, so it uses stmicro/stmmac. For me it is
confusing.

Let's not name it synopsys, for me that is totally fine, but naming it
stmicro/stmmac is not the right way because it seems like it is a driver just
for stmicro products, which it is not; it is for products that use Designware
Ethernet IPs.

I am volunteering to do this work, let's discuss this.

Thanks,
Joao
[PATCH] synopsys: remove dwc_eth_qos driver
This driver is no longer necessary since it was merged into stmmac.

Acked-by: Lars Persson
Signed-off-by: Joao Pinto
---
 MAINTAINERS                                 |    7 -
 arch/arm/configs/multi_v7_defconfig         |    3 +-
 drivers/net/ethernet/Kconfig                |    1 -
 drivers/net/ethernet/Makefile               |    1 -
 drivers/net/ethernet/synopsys/Kconfig       |   27 -
 drivers/net/ethernet/synopsys/Makefile      |    5 -
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 2996 ---
 7 files changed, 2 insertions(+), 3038 deletions(-)
 delete mode 100644 drivers/net/ethernet/synopsys/Kconfig
 delete mode 100644 drivers/net/ethernet/synopsys/Makefile
 delete mode 100644 drivers/net/ethernet/synopsys/dwc_eth_qos.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c8df0e1..acfb0a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10865,13 +10865,6 @@ F:	include/linux/dma/dw.h
 F:	include/linux/platform_data/dma-dw.h
 F:	drivers/dma/dw/

-SYNOPSYS DESIGNWARE ETHERNET QOS 4.10a driver
-M:	Lars Persson
-L:	netdev@vger.kernel.org
-S:	Supported
-F:	Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
-F:	drivers/net/ethernet/synopsys/dwc_eth_qos.c
-
 SYNOPSYS DESIGNWARE I2C DRIVER
 M:	Jarkko Nikula
 R:	Andy Shevchenko
diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index b01a438..64f4419 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -253,7 +253,8 @@ CONFIG_R8169=y
 CONFIG_SH_ETH=y
 CONFIG_SMSC911X=y
 CONFIG_STMMAC_ETH=y
-CONFIG_SYNOPSYS_DWC_ETH_QOS=y
+CONFIG_STMMAC_PLATFORM=y
+CONFIG_DWMAC_DWC_QOS_ETH=y
 CONFIG_TI_CPSW=y
 CONFIG_XILINX_EMACLITE=y
 CONFIG_AT803X_PHY=y
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index e4c28fe..afc07d4 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,7 +170,6 @@ source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
-source "drivers/net/ethernet/synopsys/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
 source "drivers/net/ethernet/ti/Kconfig"
 source "drivers/net/ethernet/tile/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 24330f4..e7861a8 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -81,7 +81,6 @@ obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
-obj-$(CONFIG_NET_VENDOR_SYNOPSYS) += synopsys/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
 obj-$(CONFIG_NET_VENDOR_TI) += ti/
 obj-$(CONFIG_TILE_NET) += tile/
diff --git a/drivers/net/ethernet/synopsys/Kconfig b/drivers/net/ethernet/synopsys/Kconfig
deleted file mode 100644
index 8276ee5..000
--- a/drivers/net/ethernet/synopsys/Kconfig
+++ /dev/null
@@ -1,27 +0,0 @@
-#
-# Synopsys network device configuration
-#
-
-config NET_VENDOR_SYNOPSYS
-	bool "Synopsys devices"
-	default y
-	---help---
-	  If you have a network (Ethernet) device belonging to this class, say Y.
-
-	  Note that the answer to this question doesn't directly affect the
-	  kernel: saying N will just cause the configurator to skip all
-	  the questions about Synopsys devices. If you say Y, you will be asked
-	  for your specific device in the following questions.
-
-if NET_VENDOR_SYNOPSYS
-
-config SYNOPSYS_DWC_ETH_QOS
-	tristate "Sypnopsys DWC Ethernet QOS v4.10a support"
-	select PHYLIB
-	select CRC32
-	select MII
-	depends on OF && HAS_DMA
-	---help---
-	  This driver supports the DWC Ethernet QoS from Synopsys
-
-endif # NET_VENDOR_SYNOPSYS
diff --git a/drivers/net/ethernet/synopsys/Makefile b/drivers/net/ethernet/synopsys/Makefile
deleted file mode 100644
index 7a37572..000
--- a/drivers/net/ethernet/synopsys/Makefile
+++ /dev/null
@@ -1,5 +0,0 @@
-#
-# Makefile for the Synopsys network device drivers.
-#
-
-obj-$(CONFIG_SYNOPSYS_DWC_ETH_QOS) += dwc_eth_qos.o
diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
deleted file mode 100644
index 467dcc5..000
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ /dev/null
@@ -1,2996 +0,0 @@
-/* Synopsys DWC Ethernet Quality-of-Service v4.10a linux driver
- *
- * This is a driver for the Synopsys DWC Ethernet QoS IP version 4.10a (GMAC).
- * This version introduced a lot of changes which breaks backwards
- * compatibility the non-QoS IP from Synopsys (used in the ST Micro drivers).
- * Some fields differ between version 4.00a and 4.10a, mainly the interrupt
- * bit fields. The driver could be made compatible with 4.00, if all relevant
- * HW erratas are handled.
- *
- * The GMAC is highly configurable at synthesis time. This driver
Re: [PATCH v2 2/2] stmmac: rename it to synopsys
Hi Joao On 01/12/2017 10:43 AM, Joao Pinto wrote: Hi Florian, At 9:14 PM on 1/11/2017, Florian Fainelli wrote: On 01/10/2017 06:52 AM, Joao Pinto wrote: This patch renames stmicro/stmmac to synopsys/ since it is a standard ethernet software package regarding synopsys ethernet controllers, supporting the majority of Synopsys Ethernet IPs. The config IDs remain the same, for retro-compatibility, only the description was changed. Do we really have to do this? ST Micro were the first to upstream support for a Synopsys IP, and it was later on identified as being "stmicro" instead of "synopsys" (during the big driver move under drivers/net/ethernet) whichever came first in the driver essentially "wins". As mentioned before, although git is able to track renames, git log does not automatically have --follow, so it can be hard for people to track down the (new) history of the driver. Personally, I don't see much value in doing this rename, especially when all the driver internal structures are still going to be named with stmmac (and please don't even think about doing a s/stmmac/snps/ inside the driver ;)). My 2 cents. First of all, I am suggesting an alternative way of organizing the code, and that's it, I have no ulterior motives about anything :). Please don't see this as a take-over or an attempt to erase STMicro from the credits, please... it makes no sense. You can leave the STMicro license and all the credits, fine by me, and I insist on it. But let's name it something that makes sense... let's call it dwc (designware controllers), I am totally open to suggestions. I don't understand the hostility of some comments, honestly. The easiest way is to keep things like they are today, and believe me I have a lot of things to do, like adding the support of multi-queues / multi-channels to stmmac, so I am not suggesting this because it is fun. I am suggesting this because it is what I am used to seeing in other subsystems. 
USB has dwc2 and dwc3 folders that clearly identify that they are Designware (synopsys) extensions to the USB 2.0 and 3.0. The author of dwc3 was Texas Instruments, and they did not name it ti/usb. For example I use an AXS101 Development board that does not have a stmicro SoC but has a Designware Ethernet IP in it, so uses stmicro/stmmac. For me it is confusing. Let's not name it synopsys, for me it is totally fine, but naming it stmicro/stmmac is not the right way because it seems like it is a driver just for stmicro products, which it is not; it is for products that use Designware Ethernet IPs. I am volunteering to do this work, let's discuss this. For me it makes no sense to rename only the folder (stmicro/stmmac by synopsys) and keep stmmac* inside a synopsys folder (that is very confusing). If you propose that, you have to change everything. BUT doing that, we will lose all the stmmac driver history and we don't want that. Thanks, Joao
Re: [PATCH] [net] net/mlx5e: fix another -Wmaybe-uninitialized warning
On Thursday, January 12, 2017 10:30:24 AM CET Or Gerlitz wrote: > On 1/11/2017 11:14 PM, Arnd Bergmann wrote: > > @@ -666,14 +666,15 @@ static int mlx5e_route_lookup_ipv4(struct mlx5e_priv > > *priv, > > struct rtable *rt; > > struct neighbour *n = NULL; > > int ttl; > > + int ret; > > + > > + if (!IS_ENABLED(CONFIG_INET)) > > + return -EOPNOTSUPP; > > > > -#if IS_ENABLED(CONFIG_INET) > > rt = ip_route_output_key(dev_net(mirred_dev), fl4); > > - if (IS_ERR(rt)) > > - return PTR_ERR(rt); > > -#else > > - return -EOPNOTSUPP; > > -#endif > > + ret = PTR_ERR_OR_ZERO(rt); > > + if (ret) > > + return ret; > > but this means that if we got NULL from ip_route_output_key, we will > return success (0) here which is wrong. I don't think so: if 'rt' is NULL or a valid pointer, then 'ret' is zero and we will not return here. Arnd
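For readers following the NULL question above, here is a minimal userspace sketch of the pointer-error convention under discussion. The sketch_* helpers are hypothetical stand-ins that only imitate the idea behind the kernel's IS_ERR()/PTR_ERR()/PTR_ERR_OR_ZERO(); they are not the kernel implementation. It illustrates Arnd's point that both NULL and a valid pointer map to 0, so the early "if (ret) return ret;" is not taken in either case.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: errors are encoded as small negative values stored in the
 * pointer itself, i.e. in the top SKETCH_MAX_ERRNO addresses. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)error;
}

/* True only for pointers in the reserved error range. */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers yield their errno; NULL and valid pointers yield 0. */
static inline int sketch_ptr_err_or_zero(const void *ptr)
{
	return sketch_is_err(ptr) ? (int)sketch_ptr_err(ptr) : 0;
}
```

Whether a NULL result can actually reach this point in the driver is a separate question that the sketch does not answer.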
Re: net/atm: warning in alloc_tx/__might_sleep
On Wed, 2017-01-11 at 20:36 -0800, Cong Wang wrote: > On Wed, Jan 11, 2017 at 11:46 AM, Michal Hocko wrote: > > On Wed 11-01-17 20:45:25, Michal Hocko wrote: > >> On Wed 11-01-17 09:37:06, Chas Williams wrote: > >> > On Mon, 2017-01-09 at 18:20 +0100, Andrey Konovalov wrote: > >> > > Hi! > >> > > > >> > > I've got the following error report while running the syzkaller fuzzer. > >> > > > >> > > On commit a121103c922847ba5010819a3f250f1f7fc84ab8 (4.10-rc3). > >> > > > >> > > A reproducer is attached. > >> > > > >> > > [ cut here ] > >> > > WARNING: CPU: 0 PID: 4114 at kernel/sched/core.c:7737 > >> > > __might_sleep+0x149/0x1a0 > >> > > do not call blocking ops when !TASK_RUNNING; state=1 set at > >> > > [] prepare_to_wait+0x182/0x530 > >> > > Modules linked in: > >> > > CPU: 0 PID: 4114 Comm: a.out Not tainted 4.10.0-rc3+ #59 > >> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs > >> > > 01/01/2011 > >> > > Call Trace: > >> > > __dump_stack lib/dump_stack.c:15 > >> > > dump_stack+0x292/0x398 lib/dump_stack.c:51 > >> > > __warn+0x19f/0x1e0 kernel/panic.c:547 > >> > > warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:562 > >> > > __might_sleep+0x149/0x1a0 kernel/sched/core.c:7732 > >> > > slab_pre_alloc_hook mm/slab.h:408 > >> > > slab_alloc_node mm/slub.c:2634 > >> > > kmem_cache_alloc_node+0x14a/0x280 mm/slub.c:2744 > >> > > __alloc_skb+0x10f/0x800 net/core/skbuff.c:219 > >> > > alloc_skb ./include/linux/skbuff.h:926 > >> > > alloc_tx net/atm/common.c:75 > >> > > >> > This is likely alloc_skb(..., GFP_KERNEL) in alloc_tx(). The simplest > >> > fix for this would be simply to switch this GFP_ATOMIC. See if this is > >> > any better. 
> >> > > >> > diff --git a/net/atm/common.c b/net/atm/common.c > >> > index a3ca922..d84220c 100644 > >> > --- a/net/atm/common.c > >> > +++ b/net/atm/common.c > >> > @@ -72,7 +72,7 @@ static struct sk_buff *alloc_tx(struct atm_vcc *vcc, > >> > unsigned int size) > >> > sk_wmem_alloc_get(sk), size, sk->sk_sndbuf); > >> > return NULL; > >> > } > >> > - while (!(skb = alloc_skb(size, GFP_KERNEL))) > >> > + while (!(skb = alloc_skb(size, GFP_ATOMIC))) > >> > schedule(); > >> > pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize); > >> > atomic_add(skb->truesize, &sk->sk_wmem_alloc); > >> > >> Blee, this code is just horrendous. But the "fix" is obviously broken! > >> schedule() is just a noop if you do not change the task state and what > >> you are just asking for is a never failing non sleeping allocation - aka > >> a busy loop in the kernel! > > > > And btw. this while loop should be really turned into GFP_KERNEL | > > __GFP_NOFAIL with and explanation why this allocation cannot possibly > > fail. > > I think a nested loop is quite unnecessary, probably due to the code itself > is pretty old. The alloc_tx() is in the outer loop, the alloc_skb() is > in the inner > loop, both seem to wait for a successful GFP allocation. The inner one > is even more unnecessary. > > Of course, I am not surprised MM may already have a mechanism to do > the similar logic. > > There maybe some reason ATM needs such a logic, although other proto > could handle skb allocation failure quite well in ->sendmsg(). I can't think of any particular reason that it needs this loop here. I suspect that the loop for alloc_tx() predates the wait logic in ->sendmsg() and that the original looping was in alloc_tx() initially and was simply never removed. Changes here would date back to before the git conversion.
Re: [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()
Bjorn Andersson wrote: > The correct include file for getting errno constants and ERR_PTR() is > linux/err.h, rather than linux/errno.h, so fix the include. > > Fixes: e8b123e60084 ("soc: qcom: smem_state: Add stubs for disabled > smem_state") > Acked-by: Andy Gross > Signed-off-by: Bjorn Andersson 5 patches applied to ath-next branch of ath.git, thanks. 6c0b2e833f14 soc: qcom: smem_state: Fix include for ERR_PTR() f303a9311065 wcn36xx: Transition driver to SMD client 886039036c20 wcn36xx: Implement firmware assisted scan 43efa3c0f241 wcn36xx: Implement print_reg indication d53628882255 wcn36xx: Don't use the destroyed hal_mutex -- https://patchwork.kernel.org/patch/9429045/ Documentation about submitting wireless patches and checking status from patchwork: https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
[PATCH net-next v2 1/2] bpf: pass original insn directly to convert_ctx_access
Currently, when calling convert_ctx_access() callback for the various program types, we pass in insn->dst_reg, insn->src_reg, insn->off from the original instruction. This information is needed to rewrite the instruction that is based on the user ctx structure into a kernel representation for the ctx. As we'd like to allow access size beyond just BPF_W, we'd need also insn->code for that in order to decode the original access size. Given that, lets just pass insn directly to the convert_ctx_access() callback and work on that to not clutter the callback with even more arguments we need to pass when everything is already contained in insn. So lets go through that once, no functional change. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- include/linux/bpf.h | 7 ++- kernel/bpf/verifier.c| 3 +- kernel/trace/bpf_trace.c | 15 ++--- net/core/filter.c| 139 +-- 4 files changed, 87 insertions(+), 77 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 94ea8d2..f8c3560 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -161,9 +161,10 @@ struct bpf_verifier_ops { enum bpf_reg_type *reg_type); int (*gen_prologue)(struct bpf_insn *insn, bool direct_write, const struct bpf_prog *prog); - u32 (*convert_ctx_access)(enum bpf_access_type type, int dst_reg, - int src_reg, int ctx_off, - struct bpf_insn *insn, struct bpf_prog *prog); + u32 (*convert_ctx_access)(enum bpf_access_type type, + const struct bpf_insn *src, + struct bpf_insn *dst, + struct bpf_prog *prog); }; struct bpf_prog_type_list { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2efdc91..df7e472 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3177,8 +3177,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) if (env->insn_aux_data[i].ptr_type != PTR_TO_CTX) continue; - cnt = ops->convert_ctx_access(type, insn->dst_reg, insn->src_reg, - insn->off, insn_buf, env->prog); + cnt = ops->convert_ctx_access(type, insn, insn_buf, 
env->prog); if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) { verbose("bpf verifier is misconfigured\n"); return -EINVAL; diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index f883c43..1860e7f 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -572,28 +572,29 @@ static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type return true; } -static u32 pe_prog_convert_ctx_access(enum bpf_access_type type, int dst_reg, - int src_reg, int ctx_off, +static u32 pe_prog_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, struct bpf_insn *insn_buf, struct bpf_prog *prog) { struct bpf_insn *insn = insn_buf; - switch (ctx_off) { + switch (si->off) { case offsetof(struct bpf_perf_event_data, sample_period): BUILD_BUG_ON(FIELD_SIZEOF(struct perf_sample_data, period) != sizeof(u64)); *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_perf_event_data_kern, - data), dst_reg, src_reg, + data), si->dst_reg, si->src_reg, offsetof(struct bpf_perf_event_data_kern, data)); - *insn++ = BPF_LDX_MEM(BPF_DW, dst_reg, dst_reg, + *insn++ = BPF_LDX_MEM(BPF_DW, si->dst_reg, si->dst_reg, offsetof(struct perf_sample_data, period)); break; default: *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_perf_event_data_kern, - regs), dst_reg, src_reg, + regs), si->dst_reg, si->src_reg, offsetof(struct bpf_perf_event_data_kern, regs)); - *insn++ = BPF_LDX_MEM(BPF_SIZEOF(long), dst_reg, dst_reg, ctx_off); + *insn++ = BPF_LDX_MEM(BPF_SIZEOF(long), si->dst_reg, si->dst_reg, + si->off); break; } diff --git a/net/core/filter.c b/net/core/filter.c index f4d16a9..8cfbdef 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2972,32 +2972,33 @@ void bpf_warn_invalid_xdp_action(u32 act) } EXPORT_SYMBOL_GPL(bpf_warn_invalid
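As a rough illustration of why the callback wants the whole instruction, the access width can be recovered from the opcode byte alone. The sketch below is a standalone userspace model: the SKETCH_* constants copy the size and class encodings from the BPF uapi headers, while struct sketch_insn and sketch_access_bytes() are hypothetical names used only for this example.

```c
#include <stdint.h>

/* Instruction class, size and mode fields of the BPF opcode byte. */
#define SKETCH_BPF_LDX	0x01
#define SKETCH_BPF_STX	0x03
#define SKETCH_BPF_W	0x00
#define SKETCH_BPF_H	0x08
#define SKETCH_BPF_B	0x10
#define SKETCH_BPF_DW	0x18
#define SKETCH_BPF_MEM	0x60
#define SKETCH_BPF_SIZE(code)	((code) & 0x18)

/* Hypothetical, trimmed-down instruction: only the fields used here. */
struct sketch_insn {
	uint8_t code;
	int16_t off;
};

/* Return the access width in bytes encoded in the opcode's size field. */
static int sketch_access_bytes(const struct sketch_insn *si)
{
	switch (SKETCH_BPF_SIZE(si->code)) {
	case SKETCH_BPF_B:
		return 1;
	case SKETCH_BPF_H:
		return 2;
	case SKETCH_BPF_W:
		return 4;
	case SKETCH_BPF_DW:
		return 8;
	}
	return 0; /* not reached: the two-bit size field has four values */
}
```

With only dst_reg, src_reg and off passed in, a callback has no way to tell a one-byte load from an eight-byte one, which is the limitation the changelog describes.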
[PATCH net-next v2 2/2] bpf: allow b/h/w/dw access for bpf's cb in ctx
When structs are used to store temporary state in cb[] buffer that is used with programs and among tail calls, then the generated code will not always access the buffer in bpf_w chunks. We can ease programming of it and let this act more natural by allowing for aligned b/h/w/dw sized access for cb[] ctx member. Various test cases are attached as well for the selftest suite. Potentially, this can also be reused for other program types to pass data around. Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 8 +- net/core/filter.c | 41 ++- tools/testing/selftests/bpf/test_verifier.c | 442 +++- 3 files changed, 478 insertions(+), 13 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index df7e472..d60e12c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3165,10 +3165,14 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) insn = env->prog->insnsi + delta; for (i = 0; i < insn_cnt; i++, insn++) { - if (insn->code == (BPF_LDX | BPF_MEM | BPF_W) || + if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) || + insn->code == (BPF_LDX | BPF_MEM | BPF_H) || + insn->code == (BPF_LDX | BPF_MEM | BPF_W) || insn->code == (BPF_LDX | BPF_MEM | BPF_DW)) type = BPF_READ; - else if (insn->code == (BPF_STX | BPF_MEM | BPF_W) || + else if (insn->code == (BPF_STX | BPF_MEM | BPF_B) || +insn->code == (BPF_STX | BPF_MEM | BPF_H) || +insn->code == (BPF_STX | BPF_MEM | BPF_W) || insn->code == (BPF_STX | BPF_MEM | BPF_DW)) type = BPF_WRITE; else diff --git a/net/core/filter.c b/net/core/filter.c index 8cfbdef..9038386 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2776,11 +2776,33 @@ static bool __is_valid_access(int off, int size) { if (off < 0 || off >= sizeof(struct __sk_buff)) return false; + /* The verifier guarantees that size > 0. */ if (off % size != 0) return false; - if (size != sizeof(__u32)) - return false; + + switch (off) { + case offsetof(struct __sk_buff, cb[0]) ... 
+offsetof(struct __sk_buff, cb[4]) + sizeof(__u32) - 1: + if (size == sizeof(__u16) && + off > offsetof(struct __sk_buff, cb[4]) + sizeof(__u16)) + return false; + if (size == sizeof(__u32) && + off > offsetof(struct __sk_buff, cb[4])) + return false; + if (size == sizeof(__u64) && + off > offsetof(struct __sk_buff, cb[2])) + return false; + if (size != sizeof(__u8) && + size != sizeof(__u16) && + size != sizeof(__u32) && + size != sizeof(__u64)) + return false; + break; + default: + if (size != sizeof(__u32)) + return false; + } return true; } @@ -2799,7 +2821,7 @@ static bool sk_filter_is_valid_access(int off, int size, if (type == BPF_WRITE) { switch (off) { case offsetof(struct __sk_buff, cb[0]) ... -offsetof(struct __sk_buff, cb[4]): +offsetof(struct __sk_buff, cb[4]) + sizeof(__u32) - 1: break; default: return false; @@ -2823,7 +2845,7 @@ static bool lwt_is_valid_access(int off, int size, case offsetof(struct __sk_buff, mark): case offsetof(struct __sk_buff, priority): case offsetof(struct __sk_buff, cb[0]) ... -offsetof(struct __sk_buff, cb[4]): +offsetof(struct __sk_buff, cb[4]) + sizeof(__u32) - 1: break; default: return false; @@ -2915,7 +2937,7 @@ static bool tc_cls_act_is_valid_access(int off, int size, case offsetof(struct __sk_buff, tc_index): case offsetof(struct __sk_buff, priority): case offsetof(struct __sk_buff, cb[0]) ... -offsetof(struct __sk_buff, cb[4]): +offsetof(struct __sk_buff, cb[4]) + sizeof(__u32) - 1: case offsetof(struct __sk_buff, tc_classid): break; default: @@ -3066,8 +3088,11 @@ static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, si->dst_reg, si->src_reg, insn); case offsetof(struct __sk_buff, cb[0]) ... -offsetof(struct __sk_buff, cb[4]): +offseto
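The size and offset rules in __is_valid_access() above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: struct sketch_ctx is a hypothetical layout (twelve 32-bit fields, then cb[5], then one more field) standing in for struct __sk_buff, and the per-size upper bounds of the patch are folded into a single "access must end inside cb[]" test, which should be equivalent once the alignment check has passed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context layout; only the position of cb[] matters here. */
struct sketch_ctx {
	uint32_t before[12];
	uint32_t cb[5];
	uint32_t after;
};

#define SKETCH_CB_START	offsetof(struct sketch_ctx, cb)
#define SKETCH_CB_END	(SKETCH_CB_START + 5 * sizeof(uint32_t))

/* Aligned 1/2/4/8-byte access inside cb[], 4-byte access elsewhere. */
static bool sketch_is_valid_access(int off, int size)
{
	if (off < 0 || (size_t)off >= sizeof(struct sketch_ctx))
		return false;
	if (size <= 0 || off % size != 0)
		return false;

	if ((size_t)off >= SKETCH_CB_START && (size_t)off < SKETCH_CB_END) {
		if (size != 1 && size != 2 && size != 4 && size != 8)
			return false;
		/* The access must not run past the last byte of cb[4]. */
		return (size_t)off + (size_t)size <= SKETCH_CB_END;
	}

	return size == (int)sizeof(uint32_t);
}
```

In this model an 8-byte access at cb[4] is rejected because it would extend past the end of the array, while byte and half-word accesses anywhere inside cb[] are accepted as long as they are naturally aligned.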
[PATCH net-next v2 0/2] More flexible BPF cb access
This patch improves BPF's cb access by allowing b/h/w/dw access variants on it. For details, please see individual patches. Thanks! v1 -> v2: - Fix typo in test case description spotted by Quentin - Rest as-is Daniel Borkmann (2): bpf: pass original insn directly to convert_ctx_access bpf: allow b/h/w/dw access for bpf's cb in ctx include/linux/bpf.h | 7 +- kernel/bpf/verifier.c | 11 +- kernel/trace/bpf_trace.c| 15 +- net/core/filter.c | 176 ++- tools/testing/selftests/bpf/test_verifier.c | 442 +++- 5 files changed, 563 insertions(+), 88 deletions(-) -- 1.9.3
Re: net: wireless: ath: wil6210: constify cfg80211_ops structures
Bhumika Goyal wrote: > cfg80211_ops structures are only passed as an argument to the function > wiphy_new. This argument is of type const, so cfg80211_ops structures > having this property can be declared as const. > Done using Coccinelle > > @r1 disable optional_qualifier @ > identifier i; > position p; > @@ > static struct cfg80211_ops i@p = {...}; > > @ok1@ > identifier r1.i; > position p; > @@ > wiphy_new(&i@p,...) > > @bad@ > position p!={r1.p,ok1.p}; > identifier r1.i; > @@ > i@p > > @depends on !bad disable optional_qualifier@ > identifier r1.i; > @@ > +const > struct cfg80211_ops i; > > File size before: > text data bss dec hex filename > 18133 6632 0 24765 60bd wireless/ath/wil6210/cfg80211.o > > File size after: > text data bss dec hex filename > 18933 5832 0 24765 60bd wireless/ath/wil6210/cfg80211.o > > Signed-off-by: Bhumika Goyal Patch applied to ath-next branch of ath.git, thanks. b59eb96181e7 wil6210: constify cfg80211_ops structures -- https://patchwork.kernel.org/patch/9479127/ Documentation about submitting wireless patches and checking status from patchwork: https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
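The pattern being constified can be shown in miniature with a userspace sketch. All sketch_* names are hypothetical; the only point is that an ops table which is never written after initialisation, and is only handed to a function taking a const pointer, can itself be declared const.

```c
/* Hypothetical ops table with a single callback. */
struct sketch_ops {
	int (*scan)(int channel);
};

static int sketch_scan(int channel)
{
	return channel + 1;
}

/* const: the table is placed in read-only data instead of .data. */
static const struct sketch_ops sketch_cfg_ops = {
	.scan = sketch_scan,
};

/* Stand-in for a consumer that takes the ops by const pointer. */
static int sketch_wiphy_new(const struct sketch_ops *ops, int channel)
{
	return ops->scan(channel);
}
```

This matches the size output quoted in the patch: 800 bytes move from the data column to the text column, since size(1) in its default format counts read-only data under text, and the total is unchanged.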
Re: [1/2] ath9k: ar9002_mac: kill off ACCESS_ONCE()
Mark Rutland wrote: > For several reasons, it is desirable to use {READ,WRITE}_ONCE() in > preference to ACCESS_ONCE(), and new code is expected to use one of the > former. So far, there's been no reason to change most existing uses of > ACCESS_ONCE(), as these aren't currently harmful. > > However, for some new features (e.g. KTSAN / Kernel Thread Sanitizer), > it is necessary to instrument reads and writes separately, which is not > possible with ACCESS_ONCE(). This distinction is critical to correct > operation. > > It's possible to transform the bulk of kernel code using the Coccinelle > script below. However, for some files (including the ath9k ar9002 mac > driver), this mangles the formatting. As a preparatory step, this patch > converts the driver to use {READ,WRITE}_ONCE() without said mangling. > > > virtual patch > > @ depends on patch @ > expression E1, E2; > @@ > > - ACCESS_ONCE(E1) = E2 > + WRITE_ONCE(E1, E2) > > @ depends on patch @ > expression E; > @@ > > - ACCESS_ONCE(E) > + READ_ONCE(E) > > > Signed-off-by: Mark Rutland > Cc: ath9k-de...@qca.qualcomm.com > Cc: Kalle Valo > Cc: linux-wirel...@vger.kernel.org > Cc: ath9k-de...@lists.ath9k.org > Cc: netdev@vger.kernel.org 2 patches applied to ath-next branch of ath.git, thanks. d5a3a76a9cb8 ath9k: ar9002_mac: kill off ACCESS_ONCE() 50f3818196f5 ath9k: ar9003_mac: kill off ACCESS_ONCE() -- https://patchwork.kernel.org/patch/9489799/ Documentation about submitting wireless patches and checking status from patchwork: https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
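To make the read/write distinction concrete, here is a simplified userspace sketch of the two accessors. The SKETCH_* macros are an assumption for illustration only: they use a volatile cast on scalar lvalues and rely on the GNU __typeof__ extension, whereas the kernel's real READ_ONCE()/WRITE_ONCE() are more elaborate. struct sketch_desc is likewise hypothetical.

```c
#include <stdint.h>

/* Simplified stand-ins: one volatile load or one volatile store. */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) \
	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical descriptor with a status word updated by hardware. */
struct sketch_desc {
	uint32_t status;
};

/* A read site: previously spelled ACCESS_ONCE(d->status). */
static uint32_t sketch_poll_status(struct sketch_desc *d)
{
	return SKETCH_READ_ONCE(d->status);
}

/* A write site: previously spelled ACCESS_ONCE(d->status) = 0. */
static void sketch_clear_status(struct sketch_desc *d)
{
	SKETCH_WRITE_ONCE(d->status, 0);
}
```

Because loads and stores now go through different macros, a tool can hook each kind of access separately, which a single ACCESS_ONCE() lvalue does not allow.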
Re: ath9k: fix spelling mistake: "meaurement" -> "measurement"
Colin Ian King wrote: > From: Colin Ian King > > Trivial fix to spelling mistake in ath_err message > > Signed-off-by: Colin Ian King Patch applied to ath-next branch of ath.git, thanks. 714ee339ff90 ath9k: fix spelling mistake: "meaurement" -> "measurement" -- https://patchwork.kernel.org/patch/9492191/ Documentation about submitting wireless patches and checking status from patchwork: https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH v2 2/2] stmmac: rename it to synopsys
Hi Alex, good morning! At 10:11 AM on 1/12/2017, Alexandre Torgue wrote: >> >> Let's not name it synopsys, for me it is totally fine, but naming it >> stmicro/stmmac is not the right way because it seems like it is a driver just >> for stmicro products, which it is not; it is for products that use Designware >> Ethernet >> IPs. >> >> I am volunteering to do this work, let's discuss this. > > For me it makes no sense to rename only the folder (stmicro/stmmac by synopsys) and > keep stmmac* inside a synopsys folder (that is very confusing). If you propose > that, you have to change everything. > > BUT doing that, we will lose all the stmmac driver history and we don't want that. Totally understand your point. Do you agree on this approach? Rename "stmicro" to "dwc" (designware controllers) and leave stmmac as it is today. This small change is enough from my point of view and solves the problems you refer to. We would have net/ethernet/dwc/stmmac/. I can also rename the dwmac4 files and functions to eqos, since soon we will have a new eqos version. dwmac4.h -> eqos.h dwmac4_core.c -> eqos_core.c dwmac4_descs.c -> eqos_descs.c dwmac4_descs.h -> eqos_descs.h dwmac4_dma.c -> eqos_dma.c dwmac4_dma.h -> eqos_dma.h dwmac4_lib.c -> eqos_lib.c What do you think about this approach? Thanks, Joao > > > >> >> Thanks, >> Joao >> >>
Re: [PATCH net-next 2/2] bpf: allow b/h/w/dw access for bpf's cb in ctx
On 01/12/2017 09:25 AM, Quentin Monnet wrote: 2017-01-12 (02:21 +0100) ~ Daniel Borkmann [...] diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 9bb4534..f664bed 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -859,15 +859,451 @@ struct test_val { [...] + { + "check cb access: doulbe, oob 5", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, + offsetof(struct __sk_buff, cb[4]) + 8), + BPF_EXIT_INSN(), + }, + .errstr = "invalid bpf_context access", + .result = REJECT, + }, Nitpicking: typo ("doulbe"). Thanks for spotting, I've sent out a v2.
Re: [PATCH net-next] secure_seq: fix sparse errors
Nice catch, thanks. Reviewed-by: Jason A. Donenfeld
Re: [PATCH 9/9] treewide: Inline ib_dma_map_*() functions
Reviewed-by: Sagi Grimberg
Re: [PATCH/RFC net] ravb: Remove Rx overflow log messages
On 01/12/2017 11:41 AM, Simon Horman wrote: From: Kazuya Mizuguchi Remove Rx overflow log messages as in an environment where logging results in network traffic logging may cause further overflows. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Kazuya Mizuguchi [simon: reworked changelog] Signed-off-by: Simon Horman Acked-by: Sergei Shtylyov MBR, Sergei
Re: [PATCH/RFC v2 net-next] ravb: unmap descriptors when freeing rings
On 01/12/2017 12:11 PM, Simon Horman wrote: From: Kazuya Mizuguchi "swiotlb buffer is full" errors occur after repeated initialisation of a device - f.e. suspend/resume or ip link set up/down. This is because memory mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit() is not released. Resolve this problem by unmapping descriptors when freeing rings. Note, ravb_tx_free() is moved but not otherwise modified by this patch. Signed-off-by: Kazuya Mizuguchi [simon: reworked] Signed-off-by: Simon Horman -- v1 [Kazuya Mizuguchi] v2 [Simon Horman] * As suggested by Sergei Shtylyov - Use dma_mapping_error() and rx_desc->ds_cc when unmapping RX descriptors; this is consistent with the way that they are mapped - Use ravb_tx_free() to clear TX descriptors Not sure that was good idea (sorry)... ravb_tx_ring() only unmaps the transmitted buffers, while we need to unmap everything... * Reduce scope of new local variable --- drivers/net/ethernet/renesas/ravb_main.c | 89 ++-- 1 file changed, 51 insertions(+), 38 deletions(-) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 92d7692c840d..1797c48e3176 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -179,6 +179,44 @@ static struct mdiobb_ops bb_ops = { .get_mdio_data = ravb_get_mdio_data, }; +/* Free TX skb function for AVB-IP */ +static int ravb_tx_free(struct net_device *ndev, int q) +{ + struct ravb_private *priv = netdev_priv(ndev); + struct net_device_stats *stats = &priv->stats[q]; + struct ravb_tx_desc *desc; + int free_num = 0; + int entry; + u32 size; + + for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) { + entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] * +NUM_TX_DESC); + desc = &priv->tx_ring[q][entry]; + if (desc->die_dt != DT_FEMPTY) Here, it stop once an untransmitted buffer is encountered... Yes, I see that now. 
I wonder if we should: a) parameterise ravb_tx_free() so it may either clear all transmitted buffers (current behaviour) or all buffers (new behaviour). b) provide a different version of this loop in ravb_ring_free(). What are your thoughts? I'm voting for (b). [...] MBR, Sergei
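For comparison with (b), option (a) could look roughly like the userspace model below. Everything here is hypothetical: the ring, the descriptor states and the free_txed_only flag are stand-ins, and clearing the "mapped" flag takes the place of the real DMA unmap and skb free. It is a sketch of the idea, not a proposed patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 4

/* Hypothetical descriptor states: FEMPTY means "transmitted". */
enum sketch_desc_state { SKETCH_DT_FEMPTY, SKETCH_DT_FSINGLE };

struct sketch_tx_ring {
	enum sketch_desc_state die_dt[SKETCH_RING_SIZE];
	bool mapped[SKETCH_RING_SIZE];
	uint32_t cur_tx;
	uint32_t dirty_tx;
};

/* Free transmitted entries only, or every outstanding entry. */
static int sketch_tx_free(struct sketch_tx_ring *ring, bool free_txed_only)
{
	int free_num = 0;

	for (; ring->cur_tx - ring->dirty_tx > 0; ring->dirty_tx++) {
		uint32_t entry = ring->dirty_tx % SKETCH_RING_SIZE;
		bool txed = ring->die_dt[entry] == SKETCH_DT_FEMPTY;

		/* Normal completion path: stop at the first pending entry. */
		if (free_txed_only && !txed)
			break;

		if (ring->mapped[entry]) {
			ring->mapped[entry] = false; /* stands in for unmap */
			free_num++;
		}
		ring->die_dt[entry] = SKETCH_DT_FEMPTY;
	}
	return free_num;
}
```

The completion path would pass true and keep today's behaviour, while ring teardown would pass false so that untransmitted buffers are released as well.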
[PATCH net] ravb: Remove Rx overflow log messages
From: Kazuya Mizuguchi Remove Rx overflow log messages as in an environment where logging results in network traffic logging may cause further overflows. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Kazuya Mizuguchi [simon: reworked changelog] Signed-off-by: Simon Horman Acked-by: Sergei Shtylyov --- Changes since RFC: * Added Ack from Sergei --- drivers/net/ethernet/renesas/ravb_main.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 92d7692c840d..5e5ad978eab9 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -926,14 +926,10 @@ static int ravb_poll(struct napi_struct *napi, int budget) /* Receive error message handling */ priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors; priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors; - if (priv->rx_over_errors != ndev->stats.rx_over_errors) { + if (priv->rx_over_errors != ndev->stats.rx_over_errors) ndev->stats.rx_over_errors = priv->rx_over_errors; - netif_err(priv, rx_err, ndev, "Receive Descriptor Empty\n"); - } - if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors) { + if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors) ndev->stats.rx_fifo_errors = priv->rx_fifo_errors; - netif_err(priv, rx_err, ndev, "Receive FIFO Overflow\n"); - } out: return budget - quota; } -- 2.7.0.rc3.207.g0ac5344
Re: [PATCHv3 2/6] sh_eth: add generic wake-on-lan support via magic packet
Hello! On 01/09/2017 06:34 PM, Niklas Söderlund wrote: Add generic functionality to support Wake-on-LAN using MagicPacket which are supported by at least a few versions of sh_eth. Only add functionality for WoL, no specific sh_eth versions are marked to support WoL yet. WoL is enabled in the suspend callback by setting MagicPacket detection and disabling all interrupts except MagicPacket. In the resume path the driver needs to reset the hardware to rearm the WoL logic, this prevents the driver from simply restoring the registers and to take advantage of that sh_eth was not suspended to reduce resume time. To reset the hardware the driver closes and reopens the device just like it would do in a normal suspend/resume scenario without WoL enabled, but it both closes and opens the device in the resume callback since the device needs to be open for WoL to work. One quirk needed for WoL is that the module clock needs to be prevented from being switched off by Runtime PM. To keep the clock alive the suspend callback needs to call clk_enable() directly to increase the usage count of the clock. Then when Runtime PM decreases the clock usage count it won't reach 0 and be switched off. Signed-off-by: Niklas Söderlund --- drivers/net/ethernet/renesas/sh_eth.c | 114 +++--- drivers/net/ethernet/renesas/sh_eth.h | 3 + 2 files changed, 109 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index 8a784dce45fa..542c92b57b35 100644 --- a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -1552,6 +1552,8 @@ static void sh_eth_emac_interrupt(struct net_device *ndev) sh_eth_rcv_snd_enable(ndev); } } + if (felic_stat & ECSR_MPD) + pm_wakeup_event(&mdp->pdev->dev, 0); Hum, seeing a corner case: if we're ignoring the link interrupt (and it does occur along with ECSR.MPD), we'll return and miss this check. It would have been preferable to add this code above the ECSR.LCHNG handler... [...] 
@@ -3150,15 +3189,67 @@ static int sh_eth_drv_remove(struct platform_device *pdev) #ifdef CONFIG_PM #ifdef CONFIG_PM_SLEEP +static int sh_eth_wol_setup(struct net_device *ndev) +{ + struct sh_eth_private *mdp = netdev_priv(ndev); + + /* Only allow ECI interrupts */ + synchronize_irq(ndev->irq); + napi_disable(&mdp->napi); + sh_eth_write(ndev, DMAC_M_ECI, EESIPR); + + /* Enable MagicPacket */ + sh_eth_modify(ndev, ECMR, 0, ECMR_MPDE); I'd prefer sh_eth_modify(ndev, ECMR, ECMR_MPDE, ECMR_MPDE) to be consistent with my other code... [...] MBR, Sergei
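On the last remark, the two spellings should leave the same value in the register. The sketch below assumes that sh_eth_modify(ndev, reg, clear, set) is a plain read-modify-write of the form (old & ~clear) | set; that shape, the sketch_modify() helper and the SKETCH_ECMR_MPDE bit position are all assumptions made for illustration and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical bit position for the MagicPacket enable flag. */
#define SKETCH_ECMR_MPDE (UINT32_C(1) << 9)

/* Assumed shape of the helper: clear some bits, then set some bits. */
static uint32_t sketch_modify(uint32_t old, uint32_t clear, uint32_t set)
{
	return (old & ~clear) | set;
}

/* Spelling used in the patch: nothing cleared, MPDE set. */
static uint32_t sketch_enable_mpde_a(uint32_t ecmr)
{
	return sketch_modify(ecmr, 0, SKETCH_ECMR_MPDE);
}

/* Spelling preferred in the review: MPDE cleared, then set. */
static uint32_t sketch_enable_mpde_b(uint32_t ecmr)
{
	return sketch_modify(ecmr, SKETCH_ECMR_MPDE, SKETCH_ECMR_MPDE);
}
```

Under that assumption the request is about consistency with the surrounding code rather than about behaviour.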
To netlink or not to netlink, that is the question
Hey folks, A few months ago I switched away from using netlink in wireguard, preferring instead to use ioctl. I had come up against limitations in rtnetlink, and ioctl presented a straightforward, hard-to-screw-up alternative. The very simple API is documented here: https://git.zx2c4.com/WireGuard/tree/src/uapi.h This works well, and I'm reluctant to change it, but as I do more complicated things, and as kernel submission time looms nearer, I'm kept up at night by the notion that maybe I ought to give netlink another chance. But how? For each wireguard interface, there are three types of structures for userspace to configure. There is one wgdevice for each interface. Each wgdevice has a variable number (up to 2^16) of wgpeers. Each wgpeer has a variable number (up to 2^16) of wgipmasks. I'd like an interface to get and set all of these at once, atomically. Presently, with the ioctl, I just have a simple get ioctl and a simple set ioctl. The set one passes a user space pointer, which is read incrementally in kernel space. The get one will first return how much userspace should allocate, and then when called again will write incrementally into a provided userspace buffer up to a passed-in maximum number of bytes. Very basic, I'm quite happy. When I had tried to do this previously with netlink, I did it by defining changelink and fill_info in rtnl_link_ops. For changelink, I iterated through the netlink objects, and for fill_info, I filled in the skb with netlink objects. This was a bit more complex but basically worked. Except netlink skbs have a maximum size and are buffered, which means things broke entirely when trying to read or write lots of wgpeers or lots of wgipmasks. So, the meager interfaces afforded to me by rtnl_link_ops are insufficient. Doing anything beyond this, either by registering new rtnetlink messages, or by using generic netlink, seemed overwhelmingly complex and undesirable. So I'm wondering -- is there a good way to be doing this with netlink? 
Or am I right to stay with ioctl? Thanks, Jason
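For readers following along, the two-step "get" described above (ask for the size first, then fetch into a caller-supplied buffer) can be sketched in plain userspace C. This is only an illustration of the calling convention, not the WireGuard API: demo_config, demo_get() and demo_fetch() are hypothetical names, and an in-memory blob stands in for the kernel side.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the configuration held by the kernel. */
static const unsigned char demo_config[] = "wgdevice+wgpeers+wgipmasks";

/*
 * Hypothetical "get" call modelled on the description above:
 *  - with buf == NULL it only reports how many bytes are needed;
 *  - otherwise it copies at most max bytes and returns the count copied.
 */
static size_t demo_get(unsigned char *buf, size_t max)
{
	size_t need = sizeof(demo_config);

	if (buf == NULL)
		return need;
	if (max > need)
		max = need;
	memcpy(buf, demo_config, max);
	return max;
}

/* Caller side: query the size, allocate, then fetch. NULL on failure. */
static unsigned char *demo_fetch(size_t *len_out)
{
	size_t need = demo_get(NULL, 0);
	unsigned char *buf = malloc(need);

	if (buf == NULL)
		return NULL;
	*len_out = demo_get(buf, need);
	return buf;
}
```

A real implementation would presumably also have to cope with the configuration growing between the two calls; the sketch ignores that.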
Re: [PATCH] can: Fix kernel panic at security_sock_rcv_skb
On Thu, 2017-01-12 at 09:22 +0100, Oliver Hartkopp wrote: > > On 01/12/2017 07:33 AM, Liu ShuoX wrote: > > From: Zhang Yanmin > > > > The patch is for fix the below kernel panic: > > BUG: unable to handle kernel NULL pointer dereference at (null) > > IP: [] selinux_socket_sock_rcv_skb+0x65/0x2a0 > > > > Call Trace: > > > > [] security_sock_rcv_skb+0x4c/0x60 > > [] sk_filter+0x41/0x210 > > [] sock_queue_rcv_skb+0x53/0x3a0 > > [] raw_rcv+0x2a3/0x3c0 > > [] can_rcv_filter+0x12b/0x370 > > [] can_receive+0xd9/0x120 > > [] can_rcv+0xab/0x100 > > [] __netif_receive_skb_core+0xd8c/0x11f0 > > [] __netif_receive_skb+0x24/0xb0 > > [] process_backlog+0x127/0x280 > > [] net_rx_action+0x33b/0x4f0 > > [] __do_softirq+0x184/0x440 > > [] do_softirq_own_stack+0x1c/0x30 > > > > [] do_softirq.part.18+0x3b/0x40 > > [] do_softirq+0x1d/0x20 > > [] netif_rx_ni+0xe5/0x110 > > [] slcan_receive_buf+0x507/0x520 > > [] flush_to_ldisc+0x21c/0x230 > > [] process_one_work+0x24f/0x670 > > [] worker_thread+0x9d/0x6f0 > > [] ? rescuer_thread+0x480/0x480 > > [] kthread+0x12c/0x150 > > [] ret_from_fork+0x3f/0x70 > > > > The sk dereferenced in panic has been released. After the rcu_call in > > can_rx_unregister, receiver was protected by RCU but inner data was > > not, then later sk will be freed while other CPU is still using it. > > We need wait here to make sure sk referenced via receiver was safe. 
> > > > => security_sk_free > > => sk_destruct > > => __sk_free > > => sk_free > > => raw_release > > => sock_release > > => sock_close > > => __fput > > => fput > > => task_work_run > > => exit_to_usermode_loop > > => syscall_return_slowpath > > => int_ret_from_sys_call > > > > Signed-off-by: Zhang Yanmin > > Signed-off-by: He, Bo > > Signed-off-by: Liu Shuo A > > --- > > net/can/af_can.c | 14 -- > > net/can/af_can.h | 1 - > > 2 files changed, 8 insertions(+), 7 deletions(-) > > > > diff --git a/net/can/af_can.c b/net/can/af_can.c > > index 1108079..fcbe971 100644 > > --- a/net/can/af_can.c > > +++ b/net/can/af_can.c > > @@ -517,10 +517,8 @@ int can_rx_register(struct net_device *dev, canid_t > > can_id, canid_t mask, > > /* > > * can_rx_delete_receiver - rcu callback for single receiver entry removal > > */ > > -static void can_rx_delete_receiver(struct rcu_head *rp) > > +static void can_rx_delete_receiver(struct receiver *r) > > { > > - struct receiver *r = container_of(rp, struct receiver, rcu); > > - > > kmem_cache_free(rcv_cache, r); > > } > > > > @@ -595,9 +593,13 @@ void can_rx_unregister(struct net_device *dev, canid_t > > can_id, canid_t mask, > > out: > > spin_unlock(&can_rcvlists_lock); > > > > - /* schedule the receiver item for deletion */ > > - if (r) > > - call_rcu(&r->rcu, can_rx_delete_receiver); > > + /* synchronize_rcu to wait until a grace period has elapsed, to make > > +* sure all receiver's sk dereferenced by others. > > +*/ > > + if (r) { > > + synchronize_rcu(); > > + can_rx_delete_receiver(r); > > Nitpick: When can_rx_delete_receiver() just contains > kmem_cache_free(rcv_cache, r), then the function definition should be > removed. > > But my main concern is: > > The reason why can_rx_delete_receiver() was introduced was the need to > remove a huge number of receivers with can_rx_unregister(). > > When you call synchronize_rcu() after each receiver removal this would > potentially lead to a big performance issue when e.g. 
closing CAN_RAW > sockets with a high number of receivers. > > So the idea was to remove/unlink the receiver hlist_del_rcu(&r->list) > and also kmem_cache_free(rcv_cache, r) by some rcu mechanism - so that > all elements are cleaned up by rcu at a later point. > > Is it possible that the problems emerge due to hlist_del_rcu(&r->list) > and you accidently fix it with your introduced synchronize_rcu()? I agree this patch does not fix the root cause. The main problem seems that the sockets themselves are not RCU protected. If CAN uses RCU for delivery, then sockets should be freed only after one RCU grace period. On recent kernels, following patch could help : diff --git a/net/can/af_can.c b/net/can/af_can.c index 1108079d934f8383a599d7997b08100fca0465e9..353beaefee7ea3631eb429b011604906b964465e 100644 --- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -189,6 +189,7 @@ static int can_create(struct net *net, struct socket *sock, int protocol, sock_init_data(sock, sk); sk->sk_destruct = can_sock_destruct; + sock_set_flag(sk, SOCK_RCU_FREE); if (sk->sk_prot->init) err = sk->sk_prot->init(sk); For older kernels, the following could be used : net/can/af_can.c | 13 ++--- net/can/af_can.h |3 ++- net/can/bcm.c|4 ++-- net/can/gw.c |2 +- net/can/raw.c|4 ++-- 5 files changed, 17 insertions(+), 9 deletions(-) diff --git a/net/can/af_can.c b/net/can/af_can.c index 1108079d934f8383a599d
Re: [PATCH net-next] net/mlx5e: Support bpf_xdp_adjust_head()
On Thu, Jan 12, 2017 at 4:09 AM, Martin KaFai Lau wrote: > This patch adds bpf_xdp_adjust_head() support to mlx5e. Hi Martin, Thanks for the patch ! you can find some comments below. > > 1. rx_headroom is added to struct mlx5e_rq. It uses >an existing 4 byte hole in the struct. > 2. The adjusted data length is checked against >MLX5E_XDP_MIN_INLINE and MLX5E_SW2HW_MTU(rq->netdev->mtu). > 3. The macro MLX5E_SW2HW_MTU is moved from en_main.c to en.h. >MLX5E_HW2SW_MTU is also moved to en.h for symmetric reason >but it is not a must. > > Signed-off-by: Martin KaFai Lau > --- > drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 ++ > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 +++ > drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 63 > ++- > 3 files changed, 51 insertions(+), 34 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h > b/drivers/net/ethernet/mellanox/mlx5/core/en.h > index a473cea10c16..0d9dd860a295 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h > @@ -51,6 +51,9 @@ > > #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) > > +#define MLX5E_HW2SW_MTU(hwmtu) ((hwmtu) - (ETH_HLEN + VLAN_HLEN + > ETH_FCS_LEN)) > +#define MLX5E_SW2HW_MTU(swmtu) ((swmtu) + (ETH_HLEN + VLAN_HLEN + > ETH_FCS_LEN)) > + > #define MLX5E_MAX_NUM_TC 8 > > #define MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE0x6 > @@ -369,6 +372,7 @@ struct mlx5e_rq { > > unsigned long state; > intix; > + u16rx_headroom; > > struct mlx5e_rx_am am; /* Adaptive Moderation */ > struct bpf_prog *xdp_prog; > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > index f74ba73c55c7..aba3691e0919 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > @@ -343,9 +343,6 @@ static void mlx5e_disable_async_events(struct mlx5e_priv > *priv) > 
synchronize_irq(mlx5_get_msix_vec(priv->mdev, MLX5_EQ_VEC_ASYNC)); > } > > -#define MLX5E_HW2SW_MTU(hwmtu) (hwmtu - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)) > -#define MLX5E_SW2HW_MTU(swmtu) (swmtu + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)) > - > static inline int mlx5e_get_wqe_mtt_sz(void) > { > /* UMR copies MTTs in units of MLX5_UMR_MTT_ALIGNMENT bytes. > @@ -534,9 +531,13 @@ static int mlx5e_create_rq(struct mlx5e_channel *c, > goto err_rq_wq_destroy; > } > > - rq->buff.map_dir = DMA_FROM_DEVICE; > - if (rq->xdp_prog) > + if (rq->xdp_prog) { > rq->buff.map_dir = DMA_BIDIRECTIONAL; > + rq->rx_headroom = XDP_PACKET_HEADROOM; > + } else { > + rq->buff.map_dir = DMA_FROM_DEVICE; > + rq->rx_headroom = MLX5_RX_HEADROOM; > + } > > switch (priv->params.rq_wq_type) { > case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: > @@ -586,7 +587,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c, > byte_count = rq->buff.wqe_sz; > > /* calc the required page order */ > - frag_sz = MLX5_RX_HEADROOM + > + frag_sz = rq->rx_headroom + > byte_count /* packet data */ + > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > frag_sz = SKB_DATA_ALIGN(frag_sz); > @@ -3153,11 +3154,6 @@ static int mlx5e_xdp_set(struct net_device *netdev, > struct bpf_prog *prog) > bool reset, was_opened; > int i; > > - if (prog && prog->xdp_adjust_head) { > - netdev_err(netdev, "Does not support > bpf_xdp_adjust_head()\n"); > - return -EOPNOTSUPP; > - } > - > mutex_lock(&priv->state_lock); > > if ((netdev->features & NETIF_F_LRO) && prog) { > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c > index 0e2fb3ed1790..914e00132e08 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c > @@ -264,7 +264,7 @@ int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct > mlx5e_rx_wqe *wqe, u16 ix) > if (unlikely(mlx5e_page_alloc_mapped(rq, di))) > return -ENOMEM; > > - wqe->data.addr = cpu_to_be64(di->addr + 
MLX5_RX_HEADROOM); > + wqe->data.addr = cpu_to_be64(di->addr + rq->rx_headroom); > return 0; > } > > @@ -646,8 +646,7 @@ static inline void mlx5e_xmit_xdp_doorbell(struct > mlx5e_sq *sq) > > static inline void mlx5e_xmit_xdp_frame(struct mlx5e_rq *rq, > struct mlx5e_dma_info *di, > - unsigned int data_offset, > - int len) > +
[PATCH net-next] tools: psock_lib: harden socket filter used by psock tests
The filter added by sock_setfilter is intended to only permit packets matching the pattern set up by create_payload(), but we only check the ip_len, and a single test-character in the IP packet to ensure this condition. Harden the filter by adding additional constraints so that we only permit UDP/IPv4 packets that meet the ip_len and test-character requirements. Include the bpf_asm src as a comment, in case this needs to be enhanced in the future Signed-off-by: Sowmini Varadhan --- tools/testing/selftests/net/psock_lib.h | 39 +- 1 files changed, 32 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h index 24bc7ec..a77da88 100644 --- a/tools/testing/selftests/net/psock_lib.h +++ b/tools/testing/selftests/net/psock_lib.h @@ -40,14 +40,39 @@ static __maybe_unused void sock_setfilter(int fd, int lvl, int optnum) { + /* the filter below checks for all of the following conditions that +* are based on the contents of create_payload() +* ether type 0x800 and +* ip proto udp and +* skb->len == DATA_LEN and +* udp[38] == 'a' or udp[38] == 'b' +* It can be generated from the following bpf_asm input: +* ldh [12] +* jne #0x800, drop; ETH_P_IP +* ldb [23] +* jneq #17, drop ; IPPROTO_UDP +* ld len ; ld skb->len +* jlt #100, drop ; DATA_LEN +* ldb [80] +* jeq #97, pass ; DATA_CHAR +* jne #98, drop ; DATA_CHAR_1 +* pass: +*ret #-1 +* drop: +*ret #0 +*/ struct sock_filter bpf_filter[] = { - { 0x80, 0, 0, 0x }, /* LD pktlen*/ - { 0x35, 0, 4, DATA_LEN }, /* JGE DATA_LEN [f goto nomatch]*/ - { 0x30, 0, 0, 0x0050 }, /* LD ip[80]*/ - { 0x15, 1, 0, DATA_CHAR }, /* JEQ DATA_CHAR [t goto match]*/ - { 0x15, 0, 1, DATA_CHAR_1}, /* JEQ DATA_CHAR_1 [t goto match]*/ - { 0x06, 0, 0, 0x0060 }, /* RET match */ - { 0x06, 0, 0, 0x }, /* RET no match */ + { 0x28, 0, 0, 0x000c }, + { 0x15, 0, 8, 0x0800 }, + { 0x30, 0, 0, 0x0017 }, + { 0x15, 0, 6, 0x0011 }, + { 0x80, 0, 0, 00 }, + { 0x35, 0, 4, 0x0064 }, + { 0x30, 0, 0, 0x0050 }, + 
{ 0x15, 1, 0, 0x0061 }, + { 0x15, 0, 1, 0x0062 }, + { 0x06, 0, 0, 0x }, + { 0x06, 0, 0, 00 }, }; struct sock_fprog bpf_prog; -- 1.7.1
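As a reading aid, the conditions listed in the bpf_asm comment of the patch above can be restated as an ordinary C predicate. This is a sketch for checking one's understanding, not part of the patch: psock_filter_match() is a hypothetical name, and the constants are the values shown in the bpf_asm listing (100, 97 and 98). It follows the instructions as written (jlt #100, i.e. frames shorter than 100 bytes are dropped).

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_LEN    100  /* from "jlt #100, drop" */
#define DATA_CHAR   'a'  /* 97 */
#define DATA_CHAR_1 'b'  /* 98 */

/* Returns 1 if the frame would be accepted, 0 if it would be dropped. */
static int psock_filter_match(const uint8_t *pkt, size_t len)
{
	/* ldh [12]; jne #0x800, drop  (EtherType must be ETH_P_IP) */
	if (len < 14 || (((unsigned)pkt[12] << 8) | pkt[13]) != 0x0800)
		return 0;
	/* ldb [23]; jneq #17, drop  (IP protocol must be UDP) */
	if (len < 24 || pkt[23] != 17)
		return 0;
	/* ld len; jlt #100, drop */
	if (len < DATA_LEN)
		return 0;
	/* ldb [80]; jeq #97, pass; jne #98, drop */
	return pkt[80] == DATA_CHAR || pkt[80] == DATA_CHAR_1;
}
```

The explicit length checks mirror the classic BPF behaviour of rejecting a packet when a load falls outside it.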
Re: [PATCH 8/9] IB: Convert ib_dma_*_coherent() argument type from u64 into dma_addr_t
On Tue, Jan 10, 2017 at 04:56:47PM -0800, Bart Van Assche wrote: > This patch does not change any functionality. > > Signed-off-by: Bart Van Assche > Cc: David S. Miller > Cc: linux-r...@vger.kernel.org > Cc: netdev@vger.kernel.org > Cc: rds-de...@oss.oracle.com > --- > include/rdma/ib_verbs.h | 11 +++ > net/rds/ib.h| 6 +++--- > 2 files changed, 6 insertions(+), 11 deletions(-) > Thanks, Reviewed-by: Leon Romanovsky
Re: [PATCH/RFC v2 net-next] ravb: unmap descriptors when freeing rings
On Thu, Jan 12, 2017 at 03:03:05PM +0300, Sergei Shtylyov wrote: > On 01/12/2017 12:11 PM, Simon Horman wrote: ... > >> Here, it stop once an untransmitted buffer is encountered... > > > >Yes, I see that now. > > > >I wonder if we should: > > > >a) paramatise ravb_tx_free() so it may either clear all transmitted buffers > > (current behaviour) or all buffers (new behaviour). > >b) provide a different version of this loop in ravb_ring_free() > > > >What are your thoughts? > >I'm voting for (b). Ok, something like this? @@ -215,6 +225,30 @@ static void ravb_ring_free(struct net_device *ndev, int q) } if (priv->tx_ring[q]) { + for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) { + struct ravb_tx_desc *desc; + int entry; + + entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] * +NUM_TX_DESC); + desc = &priv->tx_ring[q][entry]; + + /* Free the original skb. */ + if (priv->tx_skb[q][entry / NUM_TX_DESC]) { + u32 size = le16_to_cpu(desc->ds_tagl) & TX_DS; + + dma_unmap_single(ndev->dev.parent, +le32_to_cpu(desc->dptr), +size, DMA_TO_DEVICE); + /* Last packet descriptor? */ + if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) { + entry /= NUM_TX_DESC; + dev_kfree_skb_any(priv->tx_skb[q][entry]); + priv->tx_skb[q][entry] = NULL; + } + } + } + ring_size = sizeof(struct ravb_tx_desc) * (priv->num_tx_ring[q] * NUM_TX_DESC + 1); dma_free_coherent(ndev->dev.parent, ring_size, priv->tx_ring[q],
Re: [PATCH v3 net-next 4/4] syncookies: use SipHash in place of SHA1
On Sun, 2017-01-08 at 13:54 +0100, Jason A. Donenfeld wrote:
> SHA1 is slower and less secure than SipHash, and so replacing syncookie
> generation with SipHash makes natural sense. Some BSDs have been doing
> this for several years in fact.
>
> The speedup should be similar -- and even more impressive -- to the
> speedup from the sequence number fix in this series.

I confirm a nice speedup under SYNFLOOD.

sha_transform() used to consume ~12 % of cpu cycles, while the siphash_2u64() only uses ~1.9 %

Depending on the setup, gain is about 9 %

 4.48%  [kernel]  [k] ipt_do_table
 4.39%  [kernel]  [k] fib_table_lookup
 3.90%  [kernel]  [k] __netif_receive_skb_core
 3.76%  [kernel]  [k] fib_rules_lookup
 3.15%  [kernel]  [k] __inet_lookup_established
 3.11%  [kernel]  [k] tcp_conn_request
 2.51%  [kernel]  [k] tcp_v4_rcv
 2.42%  [kernel]  [k] tcp_make_synack
 2.22%  [kernel]  [k] nf_iterate
 2.16%  [kernel]  [k] ip_rcv
 1.92%  [kernel]  [k] siphash_2u64
 1.76%  [kernel]  [k] __ip_route_output_key
 1.73%  [kernel]  [k] mlx4_en_process_rx_cq
 1.68%  [kernel]  [k] memcpy_erms
 1.59%  [kernel]  [k] __alloc_skb
 1.49%  [kernel]  [k] __dev_queue_xmit
 1.48%  [kernel]  [k] kmem_cache_alloc
 1.38%  [kernel]  [k] __local_bh_enable_ip
 1.36%  [kernel]  [k] kmem_cache_free
 1.21%  [kernel]  [k] ___cache_free
 1.09%  [kernel]  [k] __build_skb
 1.07%  [kernel]  [k] inet_reqsk_alloc
 1.04%  [kernel]  [k] kfree
 1.04%  [kernel]  [k] ip_build_and_send_pkt
 1.04%  [kernel]  [k] inet_gro_receive
 1.01%  [kernel]  [k] fib_validate_source
 0.98%  [kernel]  [k] tcp_openreq_init_rwin
 0.98%  [kernel]  [k] inet_csk_route_req
 0.97%  [kernel]  [k] fib_get_table
 0.96%  [kernel]  [k] ip_finish_output2
 0.94%  [kernel]  [k] tcp_v4_do_rcv
 0.91%  [kernel]  [k] ip_local_deliver_finish
 0.91%  [kernel]  [k] netif_skb_features
 0.91%  [kernel]  [k] dev_hard_start_xmit
Re: [PATCH 9/9] treewide: Inline ib_dma_map_*() functions
On Tue, Jan 10, 2017 at 04:56:48PM -0800, Bart Van Assche wrote: > Almost all changes in this patch except the removal of local variables > that became superfluous and the actual removal of the ib_dma_map_*() > functions have been generated as follows: > > git grep -lE 'ib_(sg_|)dma_' | > xargs -d\\n \ > sed -i -e > 's/\([^[:alnum:]_]\)ib_dma_\([^(]*\)(\&\([^,]\+\),/\1dma_\2(\3.dma_device,/g' > \ >-e > 's/\([^[:alnum:]_]\)ib_dma_\([^(]*\)(\([^,]\+\),/\1dma_\2(\3->dma_device,/g' \ > -e 's/ib_sg_dma_\(len\|address\)(\([^,]\+\), /sg_dma_\1(/g' > > Signed-off-by: Bart Van Assche > Reviewed-by: Christoph Hellwig > Cc: Andreas Dilger > Cc: Anna Schumaker > Cc: David S. Miller > Cc: Eric Van Hensbergen > Cc: James Simmons > Cc: Latchesar Ionkov > Cc: Oleg Drokin > Cc: Ron Minnich > Cc: Trond Myklebust > Cc: de...@driverdev.osuosl.org > Cc: linux-...@vger.kernel.org > Cc: linux-n...@lists.infradead.org > Cc: linux-r...@vger.kernel.org > Cc: lustre-de...@lists.lustre.org > Cc: netdev@vger.kernel.org > Cc: rds-de...@oss.oracle.com > Cc: target-de...@vger.kernel.org > Cc: v9fs-develo...@lists.sourceforge.net > --- > drivers/infiniband/core/mad.c | 28 +-- > drivers/infiniband/core/rw.c | 30 ++- > drivers/infiniband/core/umem.c | 4 +- > drivers/infiniband/core/umem_odp.c | 6 +- > drivers/infiniband/hw/mlx4/cq.c| 2 +- > drivers/infiniband/hw/mlx4/mad.c | 28 +-- > drivers/infiniband/hw/mlx4/mr.c| 4 +- > drivers/infiniband/hw/mlx4/qp.c| 10 +- > drivers/infiniband/hw/mlx5/mr.c| 4 +- For mlx5 and mlx4 parts. Acked-by: Leon Romanovsky Thanks
[PATCH net] mld: do not remove mld souce list info when set link down
This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp souce list..."). In mld_del_delrec(), we will restore back all source filter info instead of flush them. Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since we should not remove source list info when set link down. Remove igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in ipv6_mc_down(). Also clear all source info after igmp6_group_dropped() instead of in it because ipv6_mc_down() will call igmp6_group_dropped(). Signed-off-by: Hangbin Liu --- net/ipv6/mcast.c | 51 ++- 1 file changed, 30 insertions(+), 21 deletions(-) diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 14a3903..7139fff 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -81,7 +81,7 @@ static void mld_gq_timer_expire(unsigned long data); static void mld_ifc_timer_expire(unsigned long data); static void mld_ifc_event(struct inet6_dev *idev); static void mld_add_delrec(struct inet6_dev *idev, struct ifmcaddr6 *pmc); -static void mld_del_delrec(struct inet6_dev *idev, const struct in6_addr *addr); +static void mld_del_delrec(struct inet6_dev *idev, struct ifmcaddr6 *pmc); static void mld_clear_delrec(struct inet6_dev *idev); static bool mld_in_v1_mode(const struct inet6_dev *idev); static int sf_setstate(struct ifmcaddr6 *pmc); @@ -692,9 +692,9 @@ static void igmp6_group_dropped(struct ifmcaddr6 *mc) dev_mc_del(dev, buf); } - if (mc->mca_flags & MAF_NOREPORT) - goto done; spin_unlock_bh(&mc->mca_lock); + if (mc->mca_flags & MAF_NOREPORT) + return; if (!mc->idev->dead) igmp6_leave_group(mc); @@ -702,8 +702,6 @@ static void igmp6_group_dropped(struct ifmcaddr6 *mc) spin_lock_bh(&mc->mca_lock); if (del_timer(&mc->mca_timer)) atomic_dec(&mc->mca_refcnt); -done: - ip6_mc_clear_src(mc); spin_unlock_bh(&mc->mca_lock); } @@ -748,10 +746,11 @@ static void mld_add_delrec(struct inet6_dev *idev, struct ifmcaddr6 *im) spin_unlock_bh(&idev->mc_lock); } -static void mld_del_delrec(struct 
inet6_dev *idev, const struct in6_addr *pmca) +static void mld_del_delrec(struct inet6_dev *idev, struct ifmcaddr6 *im) { struct ifmcaddr6 *pmc, *pmc_prev; - struct ip6_sf_list *psf, *psf_next; + struct ip6_sf_list *psf; + struct in6_addr *pmca = &im->mca_addr; spin_lock_bh(&idev->mc_lock); pmc_prev = NULL; @@ -768,14 +767,20 @@ static void mld_del_delrec(struct inet6_dev *idev, const struct in6_addr *pmca) } spin_unlock_bh(&idev->mc_lock); + spin_lock_bh(&im->mca_lock); if (pmc) { - for (psf = pmc->mca_tomb; psf; psf = psf_next) { - psf_next = psf->sf_next; - kfree(psf); + im->idev = pmc->idev; + im->mca_crcount = idev->mc_qrv; + im->mca_sfmode = pmc->mca_sfmode; + if (pmc->mca_sfmode == MCAST_INCLUDE) { + im->mca_tomb = pmc->mca_tomb; + im->mca_sources = pmc->mca_sources; + for (psf = im->mca_sources; psf; psf = psf->sf_next) + psf->sf_crcount = im->mca_crcount; } in6_dev_put(pmc->idev); - kfree(pmc); } + spin_unlock_bh(&im->mca_lock); } static void mld_clear_delrec(struct inet6_dev *idev) @@ -904,7 +909,7 @@ int ipv6_dev_mc_inc(struct net_device *dev, const struct in6_addr *addr) mca_get(mc); write_unlock_bh(&idev->lock); - mld_del_delrec(idev, &mc->mca_addr); + mld_del_delrec(idev, mc); igmp6_group_added(mc); ma_put(mc); return 0; @@ -927,6 +932,7 @@ int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr) write_unlock_bh(&idev->lock); igmp6_group_dropped(ma); + ip6_mc_clear_src(ma); ma_put(ma); return 0; @@ -2501,15 +2507,17 @@ void ipv6_mc_down(struct inet6_dev *idev) /* Withdraw multicast list */ read_lock_bh(&idev->lock); - mld_ifc_stop_timer(idev); - mld_gq_stop_timer(idev); - mld_dad_stop_timer(idev); for (i = idev->mc_list; i; i = i->next) igmp6_group_dropped(i); - read_unlock_bh(&idev->lock); - mld_clear_delrec(idev); + /* Should stop timer after group drop. 
or we will +* start timer again in mld_ifc_event() +*/ + mld_ifc_stop_timer(idev); + mld_gq_stop_timer(idev); + mld_dad_stop_timer(idev); + read_unlock_bh(&idev->lock); } static void ipv6_mc_reset(struct inet6_dev *idev) @@ -2531,8 +2539,10 @@ void ipv6_mc_
Re: [PATCH/RFC v2 net-next] ravb: unmap descriptors when freeing rings
Hi, On 12.01.2017 10:11, Simon Horman wrote: + + for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) { BTW: How can this work correctly when cur_tx wraps and dirty_tx is greater? Regards, Lino
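On the wrap question: if cur_tx and dirty_tx are free-running unsigned 32-bit counters (an assumption here, their types are not shown in this thread), the difference is computed modulo 2^32 and still gives the number of outstanding descriptors after cur_tx has wrapped. A minimal sketch, with tx_pending() as a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Number of descriptors queued but not yet reclaimed. Unsigned
 * subtraction is performed modulo 2^32, so the result stays correct
 * when cur_tx has wrapped past zero and dirty_tx has not.
 */
static uint32_t tx_pending(uint32_t cur_tx, uint32_t dirty_tx)
{
	return cur_tx - dirty_tx;
}
```

This only holds while fewer than 2^32 descriptors are outstanding and the subtraction is evaluated in the unsigned 32-bit type; with signed or wider arithmetic the reasoning would not apply.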
[PATCH net-next] IPsec: do not ignore crypto err in ah input
ah input processing uses the asynchronous hash crypto API which supplies an error code as part of the operation completion but the error code was being ignored. Treat a crypto API error indication as a verification failure. While a crypto API reported error would almost certainly result in a memcmp of the digest failing anyway and thus the security risk seems minor, performing a memory compare on what might be uninitialized memory is wrong. Signed-off-by: Gilad Ben-Yossef --- The change was boot tested on Arm64 but I did not exercise the specific error code path in question. net/ipv4/ah4.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index f2a7102..22377c8 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -270,6 +270,9 @@ static void ah_input_done(struct crypto_async_request *base, int err) int ihl = ip_hdrlen(skb); int ah_hlen = (ah->hdrlen + 2) << 2; + if (err) + goto out; + work_iph = AH_SKB_CB(skb)->tmp; auth_data = ah_tmp_auth(work_iph, ihl); icv = ah_tmp_icv(ahp->ahash, auth_data, ahp->icv_trunc_len); -- 2.1.4
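The ordering argued for in this patch (look at the completion status before touching the digest) can be illustrated outside the kernel. The following is a userspace sketch under stated assumptions, not the ah4.c code: demo_check_icv() is a hypothetical helper, and the return values 0 and -1 are placeholders rather than kernel error codes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 0 when the integrity check value verifies, -1 otherwise.
 * err is the status reported by the (hypothetical) hash operation.
 */
static int demo_check_icv(int err, const unsigned char *computed,
			  const unsigned char *received, size_t len)
{
	/* A failed hash may leave `computed` unwritten: do not compare it. */
	if (err)
		return -1;
	return memcmp(computed, received, len) ? -1 : 0;
}
```

The point of the early return is that a reported error is treated as a verification failure even if the buffers happen to compare equal.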
[PATCH net-next] cdc-ether: usbnet_cdc_zte_status() can be static
From: Wei Yongjun Fixes the following sparse warning: drivers/net/usb/cdc_ether.c:469:6: warning: symbol 'usbnet_cdc_zte_status' was not declared. Should it be static? Signed-off-by: Wei Yongjun --- drivers/net/usb/cdc_ether.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index fe7b288..620ba8e 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -466,7 +466,7 @@ static int usbnet_cdc_zte_rx_fixup(struct usbnet *dev, struct sk_buff *skb) * connected. This causes the link state to be incorrect. Work around this by * always setting the state to off, then on. */ -void usbnet_cdc_zte_status(struct usbnet *dev, struct urb *urb) +static void usbnet_cdc_zte_status(struct usbnet *dev, struct urb *urb) { struct usb_cdc_notification *event;
[PATCH iproute2 v4 1/4] ifstat: Includes reorder
Reorder the includes order in misc/ifstat.c to match convention. Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index 92d67b0..5bcbcc8 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -28,12 +28,12 @@ #include #include -#include -#include #include #include -#include +#include "libnetlink.h" +#include "json_writer.h" +#include "SNAPSHOT.h" int dump_zeros; int reset_history; -- 2.4.3
[PATCH iproute2 v4 4/4] ifstat: Add "sw only" extended statistics to ifstat
Add support for extended statistics of the SW-only type, which count only the packets that went via the CPU (useful for systems with forwarding offload). They are read from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT. They are available under the name 'cpu_hits' (or any prefix of it, such as 'cpu' or simply 'c'). For example: ifstat -x c Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index 3478f0a..5b6a36b 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -730,7 +730,8 @@ static void xstat_usage(void) { fprintf(stderr, "Usage: ifstat supported xstats:\n" -" 64bits default stats, with 64 bits support\n"); +" 64bits default stats, with 64 bits support\n" +" cpu_hits Counts only packets that went via the CPU.\n"); } struct extended_stats_options_t { @@ -745,6 +746,7 @@ struct extended_stats_options_t { */ static const struct extended_stats_options_t extended_stats_options[] = { {"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE}, + {"cpu_hits", IFLA_STATS_LINK_OFFLOAD_XSTATS, IFLA_OFFLOAD_XSTATS_CPU_HIT}, }; static const char *get_filter_type(const char *name) -- 2.4.3
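The name matching described in this series (an xstat may be given by its full name or by a shortened form, and the first matching table entry wins) can be sketched as a first-match prefix lookup. This is an illustration only, not the code from misc/ifstat.c: demo_xstat_names and demo_xstat_lookup() are hypothetical, and rejecting an empty name is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Table order matters: the first entry that matches is picked. */
static const char *const demo_xstat_names[] = { "64bits", "cpu_hits" };

/* Return the index of the first entry that `name` is a prefix of, or -1. */
static int demo_xstat_lookup(const char *name)
{
	size_t n = strlen(name);
	size_t count = sizeof(demo_xstat_names) / sizeof(demo_xstat_names[0]);
	size_t i;

	if (n == 0)
		return -1; /* assumption: an empty name matches nothing */
	for (i = 0; i < count; i++) {
		if (strncmp(name, demo_xstat_names[i], n) == 0)
			return (int)i;
	}
	return -1;
}
```

With this table, "c", "cpu" and "cpu_hits" all select the second entry, and "64" or "64bit" select the first.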
[PATCH iproute2 v4 0/4] update ifstat for new stats
Previously stats were gotten by RTM_GETLINK, which returns 32-bit statistics and supports only one type of stats. Lately, a new method to get stats was added: RTM_GETSTATS. It supports choosing the stats type, and the basic stats were changed from 32-bit to 64-bit.

This patchset adds to ifstat the ability to get extended stats by this method. It adds two types of extended stats:
64bits - the same as the "normal" stats but get the stats from the cpu in 64 bits based struct.
cpu_hits - for packets that hit cpu.

---
v3->v4:
- patch 2/4:
  - change xstat name read to avoid redundant copy.
  - delete extra line
- patch 4/4:
  - change xstat name.

v2->v3:
- patch 1/4:
  - add a new patch to reorder includes in misc/ifstat.c
- patch 2/4: (previously 1/3)
  - fix typos.
  - change error print to use fprintf.

v1->v2:
- change from using RTM_GETSTATS always to using it only for extended stats.
- Add 64bits extended stats type.

Nogah Frankel (4):
  ifstat: Includes reorder
  ifstat: Add extended statistics to ifstat
  ifstat: Add 64 bits based stats to extended statistics
  ifstat: Add "sw only" extended statistics to ifstat

 misc/ifstat.c | 170 +++---
 1 file changed, 152 insertions(+), 18 deletions(-)

-- 
2.4.3
[PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to extended statistics
The default stats for ifstat are 32 bits based. The kernel supports 64 bits based stats. (They are returned in struct rtnl_link_stats64 which is an exact copy of struct rtnl_link_stats, in which the "normal" stats are returned, but with fields of u64 instead of u32). This patch adds them as extended stats. They are read with filter type IFLA_STATS_LINK_64 and no sub type. They are available under the name 64bits (or any prefix of it, such as "64"). For example: ifstat -x 64bit Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index 9467119..3478f0a 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -729,7 +729,8 @@ static int verify_forging(int fd) static void xstat_usage(void) { fprintf(stderr, -"Usage: ifstat supported xstats:\n"); +"Usage: ifstat supported xstats:\n" +" 64bits default stats, with 64 bits support\n"); } struct extended_stats_options_t { @@ -743,6 +744,7 @@ struct extended_stats_options_t { * Name length must be under 64 chars. */ static const struct extended_stats_options_t extended_stats_options[] = { + {"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE}, }; static const char *get_filter_type(const char *name) -- 2.4.3
[PATCH iproute2 v4 2/4] ifstat: Add extended statistics to ifstat
Extended stats are part of the RTM_GETSTATS method. This patch adds them to ifstat. While extended stats can come in many forms, we support only the rtnl_link_stats64 struct for them (which is the 64 bits version of struct rtnl_link_stats). We support stats in the main nesting level, or one lower. The extension can be called by its name or any shorten of it. If there is more than one matched, the first one will be picked. To get the extended stats the flag -x is used. Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 160 -- 1 file changed, 145 insertions(+), 15 deletions(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index 5bcbcc8..9467119 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -34,6 +34,7 @@ #include "libnetlink.h" #include "json_writer.h" #include "SNAPSHOT.h" +#include "utils.h" int dump_zeros; int reset_history; @@ -48,17 +49,21 @@ int pretty; double W; char **patterns; int npatterns; +bool is_extended; +int filter_type; +int sub_type; char info_source[128]; int source_mismatch; #define MAXS (sizeof(struct rtnl_link_stats)/sizeof(__u32)) +#define NO_SUB_TYPE 0x struct ifstat_ent { struct ifstat_ent *next; char*name; int ifindex; - unsigned long long val[MAXS]; + __u64 val[MAXS]; double rate[MAXS]; __u32 ival[MAXS]; }; @@ -106,6 +111,48 @@ static int match(const char *id) return 0; } +static int get_nlmsg_extended(const struct sockaddr_nl *who, + struct nlmsghdr *m, void *arg) +{ + struct if_stats_msg *ifsm = NLMSG_DATA(m); + struct rtattr *tb[IFLA_STATS_MAX+1]; + int len = m->nlmsg_len; + struct ifstat_ent *n; + + if (m->nlmsg_type != RTM_NEWSTATS) + return 0; + + len -= NLMSG_LENGTH(sizeof(*ifsm)); + if (len < 0) + return -1; + + parse_rtattr(tb, IFLA_STATS_MAX, IFLA_STATS_RTA(ifsm), len); + if (tb[filter_type] == NULL) + return 0; + + n = malloc(sizeof(*n)); + if (!n) + abort(); + + n->ifindex = ifsm->ifindex; + n->name = strdup(ll_index_to_name(ifsm->ifindex)); + + if (sub_type == NO_SUB_TYPE) { + memcpy(&n->val, 
RTA_DATA(tb[filter_type]), sizeof(n->val)); + } else { + struct rtattr *attr; + + attr = parse_rtattr_one_nested(sub_type, tb[filter_type]); + if (attr == NULL) + return 0; + memcpy(&n->val, RTA_DATA(attr), sizeof(n->val)); + } + memset(&n->rate, 0, sizeof(n->rate)); + n->next = kern_db; + kern_db = n; + return 0; +} + static int get_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *m, void *arg) { @@ -147,18 +194,34 @@ static void load_info(void) { struct ifstat_ent *db, *n; struct rtnl_handle rth; + __u32 filter_mask; if (rtnl_open(&rth, 0) < 0) exit(1); - if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) { - perror("Cannot send dump request"); - exit(1); - } + if (is_extended) { + ll_init_map(&rth); + filter_mask = IFLA_STATS_FILTER_BIT(filter_type); + if (rtnl_wilddump_stats_req_filter(&rth, AF_UNSPEC, RTM_GETSTATS, + filter_mask) < 0) { + perror("Cannot send dump request"); + exit(1); + } - if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) { - fprintf(stderr, "Dump terminated\n"); - exit(1); + if (rtnl_dump_filter(&rth, get_nlmsg_extended, NULL) < 0) { + fprintf(stderr, "Dump terminated\n"); + exit(1); + } + } else { + if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) { + perror("Cannot send dump request"); + exit(1); + } + + if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) { + fprintf(stderr, "Dump terminated\n"); + exit(1); + } } rtnl_close(&rth); @@ -553,10 +616,17 @@ static void update_db(int interval) } for (i = 0; i < MAXS; i++) { double sample; - unsigned long incr = h1->ival[i] - n->ival[i]; + __u64 incr; + + if (is_extended) { + incr = h1->val[i] - n->val[i]; +
Setting link down or up in software
Hello, I'm wondering what are the semantics of calling ip link set dev eth0 down I was expecting that to somehow instruct the device's ethernet driver to shut everything down, have the PHY tell the peer that it's going away, maybe even put the PHY in some low-power mode, etc. But it doesn't seem to be doing any of that on my HW. So what exactly is it supposed to do? And on top of that, I am seeing random occurrences of nb8800 26000.ethernet eth0: Link is Down Sometimes it is printed immediately. Sometimes it is printed as soon as I run "ip link set dev eth0 up" (?!) Sometimes it is not printed at all. I find this erratic behavior very confusing. Is it the symptom of some deeper bug? Regards.
[PATCH/RFC net] ravb: do not use zero-length alignment DMA request
From: Masaru Nagai

Due to alignment requirements of the hardware, transmissions are split into two DMA requests: a small padding request of 0 - 4 bytes in length, followed by a request for the rest of the packet.

In the case of IP packets the first request will never be zero due to the way that the stack aligns buffers for IP packets. However, for non-IP packets it may be zero.

In this case it has been reported that timeouts occur, presumably because transmission stops at the first zero-length DMA request and thus the packet is not transmitted. However, in my environment a BUG is triggered as follows:

[ 20.381417] [ cut here ]
[ 20.386054] kernel BUG at lib/swiotlb.c:495!
[ 20.390324] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 20.395805] Modules linked in:
[ 20.398862] CPU: 0 PID: 2089 Comm: mz Not tainted 4.10.0-rc3-1-gf13ad2db193f #162
[ 20.406689] Hardware name: Renesas Salvator-X board based on r8a7796 (DT)
[ 20.413474] task: 80063b1f1900 task.stack: 80063a71c000
[ 20.419404] PC is at swiotlb_tbl_map_single+0x178/0x2ec
[ 20.424625] LR is at map_single+0x4c/0x98
[ 20.428629] pc : [] lr : [] pstate: 81c5
[ 20.436019] sp : 80063a71f9b0
[ 20.439327] x29: 80063a71f9b0 x28: 80063a20d500
[ 20.444636] x27: 08ed5000 x26:
[ 20.449944] x25: 00067abe2adc x24:
[ 20.455252] x23: 0020 x22: 0001
[ 20.460559] x21: 00175ffe x20: 80063b2a0010
[ 20.465866] x19: x18: cae6fb20
[ 20.471173] x17: a09ba018 x16: 087c8b70
[ 20.476480] x15: a084f588 x14: a09cfa14
[ 20.481787] x13: cae87ff0 x12: 0063abe2
[ 20.487098] x11: 08096360 x10: 80063abe2adc
[ 20.492407] x9 : x8 :
[ 20.497718] x7 : x6 : 08ed50d0
[ 20.503028] x5 : x4 : 0001
[ 20.508338] x3 : x2 : 00067abe2adc
[ 20.513648] x1 : bafff000 x0 :
[ 20.518958]
[ 20.520446] Process mz (pid: 2089, stack limit = 0x80063a71c000)
[ 20.526798] Stack: (0x80063a71f9b0 to 0x80063a72)
[ 20.532543] f9a0: 80063a71fa30 0839c680
[ 20.540374] f9c0: 80063b2a0010 80063b2a0010 0001
[ 20.548204] f9e0: 006e 80063b23c000 80063b23c000
[ 20.556034] fa00: 80063b23c000 80063a20d500 00013b1f1900
[ 20.563864] fa20: 80063ffd18e0 80063b2a0010 80063a71fa60 0839cd10
[ 20.571694] fa40: 80063b2a0010 80063ffd18e0 00067abe2adc
[ 20.579524] fa60: 80063a71fa90 08096380 80063b2a0010
[ 20.587353] fa80: 0001 80063a71fac0 0864f770
[ 20.595184] faa0: 80063b23caf0 0140
[ 20.603014] fac0: 80063a71fb60 087e6498 80063a20d500 80063b23c000
[ 20.610843] fae0: 08daeaf0 08daeb00
[ 20.618673] fb00: 80063a71fc0c 08da7000 80063b23c090 80063a44f000
[ 20.626503] fb20: 08daeb00 80063a71fc0c 08da7000
[ 20.634333] fb40: 80063b23c090 80060037 087e63d8
[ 20.642163] fb60: 80063a71fbc0 08807510 80063a692400 80063a20d500
[ 20.649993] fb80: 80063a44f000 80063b23c000 80063a69249c
[ 20.657823] fba0: 80063a087800 80063b23c000 80063a20d500
[ 20.665653] fbc0: 80063a71fc10 087e67dc 80063a20d500 80063a692400
[ 20.673483] fbe0: 80063b23c000 80063a44f000 80063a69249c
[ 20.681312] fc00: 80063a5f1a10 00103a087800 80063a71fc70 087e6b24
[ 20.689142] fc20: 80063a5f1a80 80063a71fde8 000f 05ea
[ 20.696972] fc40: 80063a5f1a10 000f 0887fbd0
[ 20.704802] fc60: fff43a5f1a80 80063a71fc80 08880240
[ 20.712632] fc80: 80063a71fd90 087c7a34 80063afc7180
[ 20.720462] fca0: cae6fe18 0014 6000 0015
[ 20.728292] fcc0: 0123 00ce 088d2000 80063b1f1900
[ 20.736122] fce0: 8933 08e7cb80 80063a71fd80 087c50a4
[ 20.743951] fd00: 8933 08e7cb80 08e7cb80 001e
[ 20.751781] fd20: 80063a71fe4c 0300 0123
[ 20.759611] fd40: 80063b1f 000e 0300
[ 20.767441] fd60: 0
[PATCH net-next 0/2] net/smc: fix typo and clc-bug
Dave,

I received 2 bug reports for my new AF_SMC-code. Here are the fixes for them.

Thanks, Ursula

Ursula Braun (2):
  smc-typo-in-core-sock
  smc-macaddr-len

 net/core/sock.c   |  2 +-
 net/smc/smc_clc.c | 10 --
 net/smc/smc_ib.h  |  4 +++-
 3 files changed, 8 insertions(+), 8 deletions(-)

--
2.8.4
[PATCH net-next 2/2] smc: ETH_ALEN as memcpy length for mac addresses
When creating an SMC connection, there is a CLC (connection layer control) handshake to prepare for RDMA traffic. The corresponding code is part of commit 0cfdd8f92cac ("smc: connection and link group creation"). Mac addresses to be exchanged in the handshake are copied with a wrong length of 12 instead of 6 bytes. Following code overwrites the wrongly copied code, but nevertheless the correct length should already be used for the preceding mac address copying. Use ETH_ALEN for the memcpy length with mac addresses. Signed-off-by: Ursula Braun Fixes: 0cfdd8f92cac ("smc: connection and link group creation") Reported-by: Dan Carpenter --- net/smc/smc_clc.c | 10 -- net/smc/smc_ib.h | 4 +++- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c index e1e684c..cc6b6f8 100644 --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -10,6 +10,7 @@ */ #include +#include #include #include @@ -151,8 +152,7 @@ int smc_clc_send_proposal(struct smc_sock *smc, pclc.hdr.version = SMC_CLC_V1; /* SMC version */ memcpy(pclc.lcl.id_for_peer, local_systemid, sizeof(local_systemid)); memcpy(&pclc.lcl.gid, &smcibdev->gid[ibport - 1], SMC_GID_SIZE); - memcpy(&pclc.lcl.mac, &smcibdev->mac[ibport - 1], - sizeof(smcibdev->mac[ibport - 1])); + memcpy(&pclc.lcl.mac, &smcibdev->mac[ibport - 1], ETH_ALEN); /* determine subnet and mask from internal TCP socket */ rc = smc_netinfo_by_tcpsk(smc->clcsock, &pclc.outgoing_subnet, @@ -199,8 +199,7 @@ int smc_clc_send_confirm(struct smc_sock *smc) memcpy(cclc.lcl.id_for_peer, local_systemid, sizeof(local_systemid)); memcpy(&cclc.lcl.gid, &link->smcibdev->gid[link->ibport - 1], SMC_GID_SIZE); - memcpy(&cclc.lcl.mac, &link->smcibdev->mac[link->ibport - 1], - sizeof(link->smcibdev->mac)); + memcpy(&cclc.lcl.mac, &link->smcibdev->mac[link->ibport - 1], ETH_ALEN); hton24(cclc.qpn, link->roce_qp->qp_num); cclc.rmb_rkey = htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey); @@ -252,8 +251,7 @@ int 
smc_clc_send_accept(struct smc_sock *new_smc, int srv_first_contact) memcpy(aclc.lcl.id_for_peer, local_systemid, sizeof(local_systemid)); memcpy(&aclc.lcl.gid, &link->smcibdev->gid[link->ibport - 1], SMC_GID_SIZE); - memcpy(&aclc.lcl.mac, link->smcibdev->mac[link->ibport - 1], - sizeof(link->smcibdev->mac[link->ibport - 1])); + memcpy(&aclc.lcl.mac, link->smcibdev->mac[link->ibport - 1], ETH_ALEN); hton24(aclc.qpn, link->roce_qp->qp_num); aclc.rmb_rkey = htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey); diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h index 3fe2d55..a95f74b 100644 --- a/net/smc/smc_ib.h +++ b/net/smc/smc_ib.h @@ -11,6 +11,7 @@ #ifndef _SMC_IB_H #define _SMC_IB_H +#include #include #define SMC_MAX_PORTS 2 /* Max # of ports */ @@ -34,7 +35,8 @@ struct smc_ib_device {/* ib-device infos for smc */ struct ib_cq*roce_cq_recv; /* recv completion queue */ struct tasklet_struct send_tasklet; /* called by send cq handler */ struct tasklet_struct recv_tasklet; /* called by recv cq handler */ - charmac[SMC_MAX_PORTS][6]; /* mac address per port*/ + charmac[SMC_MAX_PORTS][ETH_ALEN]; + /* mac address per port*/ union ib_gidgid[SMC_MAX_PORTS]; /* gid per port */ u8 initialized : 1; /* ib dev CQ, evthdl done */ struct work_struct port_event_work; -- 2.8.4
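The length mix-up described in the commit message can be reproduced outside the kernel. The following is only a minimal userspace sketch, not the SMC code itself: `struct demo_ib_device`, `DEMO_MAX_PORTS` and the `demo_*` helpers are hypothetical stand-ins, and `ETH_ALEN` is defined locally with the value 6 used by the kernel. It shows that `sizeof` on the whole per-port table yields 12, while one row (or `ETH_ALEN`) yields the intended 6.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6		/* defined locally; same value as the kernel's ETH_ALEN */
#define DEMO_MAX_PORTS 2	/* hypothetical stand-in for SMC_MAX_PORTS */

/* Hypothetical, reduced version of a device struct with one MAC per port. */
struct demo_ib_device {
	char mac[DEMO_MAX_PORTS][ETH_ALEN];
};

/* Size of the whole table: DEMO_MAX_PORTS * ETH_ALEN bytes. */
static size_t demo_sizeof_table(void)
{
	return sizeof(((struct demo_ib_device *)0)->mac);
}

/* Size of a single row, i.e. one MAC address. */
static size_t demo_sizeof_row(void)
{
	return sizeof(((struct demo_ib_device *)0)->mac[0]);
}

/* Copy exactly one MAC address for a 1-based port number. */
static void demo_copy_mac(unsigned char *dst,
			  const struct demo_ib_device *dev, int ibport)
{
	assert(ibport >= 1 && ibport <= DEMO_MAX_PORTS);
	memcpy(dst, dev->mac[ibport - 1], ETH_ALEN);
}
```

Using the named constant for both the array dimension and the copy length, as the patch does, keeps the two from drifting apart if the struct layout changes later.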
[PATCH net-next 1/2] net: fix AF_SMC related typo
When introducing the new socket family AF_SMC in commit ac7138746e14 ("smc: establish new socket family"), a typo in af_family_clock_key_strings has slipped in. This patch repairs it.

Signed-off-by: Ursula Braun
Fixes: ac7138746e14 ("smc: establish new socket family")
Reported-by: Andrew Morton
---
 net/core/sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index dbbdc4f..8b35debf 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -256,7 +256,7 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET" ,
   "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG" ,
   "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_KCM" ,
-  "clock-AF_QIPCRTR", "closck-AF_smc", "clock-AF_MAX"
+  "clock-AF_QIPCRTR", "clock-AF_SMC" , "clock-AF_MAX"
 };

 /*
--
2.8.4
Re: [PATCH v4 01/13] net: ethernet: aquantia: Make and configuration files.
From: Alexander Loktionov
Date: Wed, 11 Jan 2017 19:53:05 -0800

> @@ -0,0 +1,19 @@
> +/*
> + * aQuantia Corporation Network Driver
> + * Copyright (C) 2014-2017 aQuantia Corporation. All rights reserved
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + */
> +
> +#ifndef VER_H
> +#define VER_H
> +
> +#define NIC_MAJOR_DRIVER_VERSION 1
> +#define NIC_MINOR_DRIVER_VERSION 5
> +#define NIC_BUILD_DRIVER_VERSION 339
> +#define NIC_REVISION_DRIVER_VERSION 0
> +
> +#endif /* VER_H */
> +

Please do not add empty lines at the end of files, GIT even warns about this.

Please audit your entire submission for this problem.
Re: [PATCH net-next] net: core: Make netif_wake_subqueue a wrapper
From: Florian Fainelli
Date: Wed, 11 Jan 2017 21:13:02 -0800

> netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue()
> does, so make it call it directly after looking up the queue from the index.
>
> Signed-off-by: Florian Fainelli

Looks good, applied.
Re: [PATCH v2 net-next] Introduce a sysctl that modifies the value of PROT_SOCK.
From: Krister Johansen
Date: Wed, 11 Jan 2017 22:52:25 -0800

> Add net.ipv4.ip_unprotected_port_start, which is a per namespace sysctl
> that denotes the first unprotected inet port in the namespace. To
> disable all protected ports set this to zero. It also checks for
> overlap with the local port range. The protected and local range may
> not overlap.
>
> The use case for this change is to allow containerized processes to bind
> to privileged ports, but prevent them from ever being allowed to modify
> their container's network configuration. The latter is accomplished by
> ensuring that the network namespace is not a child of the user
> namespace. This modification was needed to allow the container manager
> to disable a namespace's privileged port restrictions without exposing
> control of the network namespace to processes in the user namespace.
>
> Signed-off-by: Krister Johansen

This is what CAP_NET_BIND_SERVICE is for, and why it is a separate network privilege. Please use it.
[iproute PATCH] tc: m_xt: Fix segfault with iptables-1.6.0
Said iptables version introduced struct xtables_globals field 'compat_rev', a function pointer. Initializing it is mandatory as libxtables calls it without existence check. Without this, tc segfaults when using the xt action like so:

| tc filter add dev d0 parent : u32 match u32 0 0 \
|         action xt -j MARK --set-mark 20

Signed-off-by: Phil Sutter
---
 tc/m_xt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tc/m_xt.c b/tc/m_xt.c
index dbb54981462ee..57ed40d7aa3a8 100644
--- a/tc/m_xt.c
+++ b/tc/m_xt.c
@@ -77,6 +77,9 @@ static struct xtables_globals tcipt_globals = {
 	.orig_opts = original_opts,
 	.opts = original_opts,
 	.exit_err = NULL,
+#if (XTABLES_VERSION_CODE >= 11)
+	.compat_rev = xtables_compatible_revision,
+#endif
 };

 /*
--
2.11.0
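For readers unfamiliar with this failure mode, here is a reduced illustration of why a zero-initialized callback member is fatal when the library calls it unconditionally. This is a sketch only and does not use libxtables: `struct demo_globals`, `demo_compat_rev()`, `demo_lookup()` and `demo_globals_usable()` are hypothetical names, and the callback signature is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a globals struct with a callback. */
struct demo_globals {
	const char *program_name;
	int (*compat_rev)(const char *name, unsigned char rev, int opt);
};

/* Hypothetical callback; always reports the revision as compatible. */
static int demo_compat_rev(const char *name, unsigned char rev, int opt)
{
	(void)name;
	(void)rev;
	(void)opt;
	return 1;
}

/*
 * Library-side caller that, like the behaviour described in the commit
 * message, invokes the callback without checking it for NULL. Calling
 * this with compat_rev left unset dereferences a NULL function pointer.
 */
static int demo_lookup(const struct demo_globals *g, const char *name)
{
	return g->compat_rev(name, 0, 0);
}

/* Caller-side sanity check: is the mandatory callback filled in? */
static int demo_globals_usable(const struct demo_globals *g)
{
	return g->compat_rev != NULL;
}
```

Members omitted from a designated initializer are zero-initialized in C, so a struct that compiled fine against an older header silently gains a NULL callback once the header adds the new field.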
Re: [PATCH net-next] cxgb4: Initialize mbox lock and list for mgmt dev
From: Ganesh Goudar
Date: Thu, 12 Jan 2017 12:23:21 +0530

> Initialize mbox lock and list for mgmt dev to avoid NULL pointer
> dereference when cxgb_set_vf_mac is called.
>
> And also allocate memory for private data while allocating mgmt
> netdev.
>
> Signed-off-by: Ganesh Goudar

Applied.
Re: [patch net 0/3] mlxsw: Couple of fixes
From: Jiri Pirko
Date: Thu, 12 Jan 2017 09:10:36 +0100

> Couple of simple fixes from Arkadi and Elad.
>
> Please queue these up for stable. Thanks.

Series applied and queued up for -stable, thanks.
Re: [PATCH net-next] tools: psock_lib: harden socket filter used by psock tests
On 01/12/2017 02:10 PM, Sowmini Varadhan wrote:
> The filter added by sock_setfilter is intended to only permit
> packets matching the pattern set up by create_payload(), but
> we only check the ip_len, and a single test-character in
> the IP packet to ensure this condition.
>
> Harden the filter by adding additional constraints so that we only
> permit UDP/IPv4 packets that meet the ip_len and test-character
> requirements. Include the bpf_asm src as a comment, in case this
> needs to be enhanced in the future.
>
> Signed-off-by: Sowmini Varadhan

LGTM, thanks!

Acked-by: Daniel Borkmann
Re: [PATCH v2 net-next] Introduce a sysctl that modifies the value of PROT_SOCK.
On Wed, 2017-01-11 at 22:52 -0800, Krister Johansen wrote:
> Add net.ipv4.ip_unprotected_port_start, which is a per namespace sysctl
> that denotes the first unprotected inet port in the namespace. To
> disable all protected ports set this to zero. It also checks for
> overlap with the local port range. The protected and local range may
> not overlap.
>
> The use case for this change is to allow containerized processes to bind
> to privileged ports, but prevent them from ever being allowed to modify
> their container's network configuration. The latter is accomplished by
> ensuring that the network namespace is not a child of the user
> namespace. This modification was needed to allow the container manager
> to disable a namespace's privileged port restrictions without exposing
> control of the network namespace to processes in the user namespace.
>
> Signed-off-by: Krister Johansen
> ---
>  include/net/ip.h               | 10 +
>  include/net/netns/ipv4.h       | 1 +
>  net/ipv4/af_inet.c             | 5 -
>  net/ipv4/sysctl_net_ipv4.c     | 50 +-
>  net/ipv6/af_inet6.c            | 3 ++-
>  net/netfilter/ipvs/ip_vs_ctl.c | 7 +++---
>  net/sctp/socket.c              | 10 +
>  security/selinux/hooks.c       | 3 ++-

Adding a new sysctl without documentation is generally not accepted.

Please take a look at Documentation/networking/ip-sysctl.txt

BTW, sticking to 'unprivileged' ports might be better than 'unprotected', which is vague.
[PATCH net-next] lwt_bpf: bpf_lwt_prog_cmp() can be static
From: Wei Yongjun

Fixes the following sparse warning:

net/core/lwt_bpf.c:355:5: warning:
 symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static?

Signed-off-by: Wei Yongjun
---
 net/core/lwt_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 71bb3e2..40ef8ae 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -352,7 +352,7 @@ static int bpf_encap_nlsize(struct lwtunnel_state *lwtstate)
 	       0;
 }

-int bpf_lwt_prog_cmp(struct bpf_lwt_prog *a, struct bpf_lwt_prog *b)
+static int bpf_lwt_prog_cmp(struct bpf_lwt_prog *a, struct bpf_lwt_prog *b)
 {
 	/* FIXME:
 	 * The LWT state is currently rebuilt for delete requests which
Re: [PATCH net-next] lwt_bpf: bpf_lwt_prog_cmp() can be static
On 01/12/2017 03:39 PM, Wei Yongjun wrote:
> From: Wei Yongjun
>
> Fixes the following sparse warning:
>
> net/core/lwt_bpf.c:355:5: warning:
>  symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static?
>
> Signed-off-by: Wei Yongjun

Acked-by: Daniel Borkmann
[PATCH RESEND net-next 11/12] s390/qeth: issue STARTLAN as first IPA command
From: Julian Wiedmann STARTLAN needs to be the first IPA command after MPC initialization completes. So move the qeth_send_startlan() call from the layer disciplines into the core path, right after the MPC handshake. While at it, replace the magic LAN OFFLINE return code with the existing enum. Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter Reviewed-by: Ursula Braun --- drivers/s390/net/qeth_core.h | 1 - drivers/s390/net/qeth_core_main.c | 21 + drivers/s390/net/qeth_l2_main.c | 15 --- drivers/s390/net/qeth_l3_main.c | 15 --- 4 files changed, 17 insertions(+), 35 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 774ae51..e7addea 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -913,7 +913,6 @@ void qeth_clear_thread_running_bit(struct qeth_card *, unsigned long); int qeth_core_hardsetup_card(struct qeth_card *); void qeth_print_status_message(struct qeth_card *); int qeth_init_qdio_queues(struct qeth_card *); -int qeth_send_startlan(struct qeth_card *); int qeth_send_ipa_cmd(struct qeth_card *, struct qeth_cmd_buffer *, int (*reply_cb) (struct qeth_card *, struct qeth_reply *, unsigned long), diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index ca8309f..315d8a2 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -2944,7 +2944,7 @@ int qeth_send_ipa_cmd(struct qeth_card *card, struct qeth_cmd_buffer *iob, } EXPORT_SYMBOL_GPL(qeth_send_ipa_cmd); -int qeth_send_startlan(struct qeth_card *card) +static int qeth_send_startlan(struct qeth_card *card) { int rc; struct qeth_cmd_buffer *iob; @@ -2957,7 +2957,6 @@ int qeth_send_startlan(struct qeth_card *card) rc = qeth_send_ipa_cmd(card, iob, NULL, NULL); return rc; } -EXPORT_SYMBOL_GPL(qeth_send_startlan); static int qeth_default_setadapterparms_cb(struct qeth_card *card, struct qeth_reply *reply, unsigned long data) @@ -5087,6 +5086,20 @@ int 
qeth_core_hardsetup_card(struct qeth_card *card) goto out; } + rc = qeth_send_startlan(card); + if (rc) { + QETH_DBF_TEXT_(SETUP, 2, "6err%d", rc); + if (rc == IPA_RC_LAN_OFFLINE) { + dev_warn(&card->gdev->dev, + "The LAN is offline\n"); + card->lan_online = 0; + } else { + rc = -ENODEV; + goto out; + } + } else + card->lan_online = 1; + card->options.ipa4.supported_funcs = 0; card->options.ipa6.supported_funcs = 0; card->options.adp.supported_funcs = 0; @@ -5098,14 +5111,14 @@ int qeth_core_hardsetup_card(struct qeth_card *card) if (qeth_is_supported(card, IPA_SETADAPTERPARMS)) { rc = qeth_query_setadapterparms(card); if (rc < 0) { - QETH_DBF_TEXT_(SETUP, 2, "6err%d", rc); + QETH_DBF_TEXT_(SETUP, 2, "7err%d", rc); goto out; } } if (qeth_adp_supported(card, IPA_SETADP_SET_DIAG_ASSIST)) { rc = qeth_query_setdiagass(card); if (rc < 0) { - QETH_DBF_TEXT_(SETUP, 2, "7err%d", rc); + QETH_DBF_TEXT_(SETUP, 2, "8err%d", rc); goto out; } } diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index c298759c..bea4833 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -1177,21 +1177,6 @@ static int __qeth_l2_set_online(struct ccwgroup_device *gdev, int recovery_mode) /* softsetup */ QETH_DBF_TEXT(SETUP, 2, "softsetp"); - rc = qeth_send_startlan(card); - if (rc) { - QETH_DBF_TEXT_(SETUP, 2, "1err%d", rc); - if (rc == 0xe080) { - dev_warn(&card->gdev->dev, - "The LAN is offline\n"); - card->lan_online = 0; - goto contin; - } - rc = -ENODEV; - goto out_remove; - } else - card->lan_online = 1; - -contin: if ((card->info.type == QETH_CARD_TYPE_OSD) || (card->info.type == QETH_CARD_TYPE_OSX)) { rc = qeth_l2_start_ipassists(card); diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index ac37d05..06d0add 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -3227,21 +3227,6 @@ static int __qeth_l3_set_online(struct ccwgroup_device *gdev, int recovery_mode) /* 
softsetup */ QETH_DBF_TEXT(SETUP, 2, "softsetp"); - rc = qeth_send_
[PATCH RESEND net-next 02/12] s390/qeth: test RX/TX checksum offload reply
From: Thomas Richter Turning on receive and/or transmit checksum offload support on the OSA card requires 2 commands: 1. start command which replies with available features 2. enable command to turn on selected features. The current version does not check the reply of the start command and simply uses the returned value to enable offload features. When the start command returns zero, this leads to a situation where no checksum offload is turned on by the hardware. Even worse no error indication is returned. The Linux kernel assumes the OSA card performs RX/TX checksum offload, but the hardware does not perform any checksum verification at all. This patch checks the return of the start and enable command responses from the hardware and turns off checksum offloading if the commands fails or does not respond with the correct bit setting. Signed-off-by: Thomas Richter Reviewed-by: Julian Wiedmann Reviewed-by: Ursula Braun --- drivers/s390/net/qeth_core_main.c | 13 + drivers/s390/net/qeth_core_mpc.h | 10 ++ 2 files changed, 23 insertions(+) diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 5ab80ea..49b813f 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -6104,11 +6104,19 @@ static int qeth_ipa_checksum_run_cmd(struct qeth_card *card, static int qeth_send_checksum_on(struct qeth_card *card, int cstype) { + const __u32 required_features = QETH_IPA_CHECKSUM_IP_HDR | + QETH_IPA_CHECKSUM_UDP | + QETH_IPA_CHECKSUM_TCP; struct qeth_checksum_cmd chksum_cb; int rc; rc = qeth_ipa_checksum_run_cmd(card, cstype, IPA_CMD_ASS_START, 0, &chksum_cb); + if (!rc) { + if ((required_features & chksum_cb.supported) != + required_features) + rc = -EIO; + } if (rc) { qeth_send_simple_setassparms(card, cstype, IPA_CMD_ASS_STOP, 0); dev_warn(&card->gdev->dev, @@ -6118,6 +6126,11 @@ static int qeth_send_checksum_on(struct qeth_card *card, int cstype) } rc = qeth_ipa_checksum_run_cmd(card, cstype, IPA_CMD_ASS_ENABLE, 
chksum_cb.supported, &chksum_cb); + if (!rc) { + if ((required_features & chksum_cb.enabled) != + required_features) + rc = -EIO; + } if (rc) { qeth_send_simple_setassparms(card, cstype, IPA_CMD_ASS_STOP, 0); dev_warn(&card->gdev->dev, diff --git a/drivers/s390/net/qeth_core_mpc.h b/drivers/s390/net/qeth_core_mpc.h index f54ea72..bc69d0a 100644 --- a/drivers/s390/net/qeth_core_mpc.h +++ b/drivers/s390/net/qeth_core_mpc.h @@ -352,6 +352,16 @@ struct qeth_arp_query_info { char *udata; }; +/* IPA set assist segmentation bit definitions for receive and + * transmit checksum offloading. + */ +enum qeth_ipa_checksum_bits { + QETH_IPA_CHECKSUM_IP_HDR= 0x0002, + QETH_IPA_CHECKSUM_UDP = 0x0008, + QETH_IPA_CHECKSUM_TCP = 0x0010, + QETH_IPA_CHECKSUM_LP2LP = 0x0020 +}; + /* IPA Assist checksum offload reply layout. */ struct qeth_checksum_cmd { __u32 supported; -- 2.8.4
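The check added to both the start and the enable reply is a plain "all required bits present" mask test. A minimal standalone sketch follows; the bit values are copied from the enum in the patch, but the `DEMO_*` names and `demo_has_required()` are hypothetical and not part of the qeth driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as in the patch's qeth_ipa_checksum_bits; names are local. */
enum demo_checksum_bits {
	DEMO_CHECKSUM_IP_HDR = 0x0002,
	DEMO_CHECKSUM_UDP    = 0x0008,
	DEMO_CHECKSUM_TCP    = 0x0010,
	DEMO_CHECKSUM_LP2LP  = 0x0020
};

/*
 * Return 1 only if every required feature bit is set in the mask the
 * hardware reported. Extra bits (e.g. LP2LP) are ignored, a single
 * missing bit fails the check.
 */
static int demo_has_required(uint32_t reported)
{
	const uint32_t required = DEMO_CHECKSUM_IP_HDR |
				  DEMO_CHECKSUM_UDP |
				  DEMO_CHECKSUM_TCP;

	return (required & reported) == required;
}
```

Comparing against the full `required` mask, rather than testing for a non-zero result, is what catches the all-zero reply described in the commit message as well as partial support.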
[PATCH RESEND net-next 10/12] s390/qeth: shuffle MAC management functions around
From: Julian Wiedmann Move all MAC utility functions in one place, and drop the forward declarations. Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter --- drivers/s390/net/qeth_l2_main.c | 129 1 file changed, 63 insertions(+), 66 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index d456740..c298759c 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -27,9 +27,6 @@ static int qeth_l2_set_offline(struct ccwgroup_device *); static int qeth_l2_stop(struct net_device *); -static int qeth_l2_send_delmac(struct qeth_card *, __u8 *); -static int qeth_l2_send_setdelmac(struct qeth_card *, __u8 *, - enum qeth_ipa_cmds); static void qeth_l2_set_rx_mode(struct net_device *); static int qeth_l2_recover(void *); static void qeth_bridgeport_query_support(struct qeth_card *card); @@ -165,6 +162,64 @@ static int qeth_setdel_makerc(struct qeth_card *card, int retcode) return rc; } +static int qeth_l2_send_setdelmac(struct qeth_card *card, __u8 *mac, + enum qeth_ipa_cmds ipacmd) +{ + struct qeth_ipa_cmd *cmd; + struct qeth_cmd_buffer *iob; + + QETH_CARD_TEXT(card, 2, "L2sdmac"); + iob = qeth_get_ipacmd_buffer(card, ipacmd, QETH_PROT_IPV4); + if (!iob) + return -ENOMEM; + cmd = (struct qeth_ipa_cmd *)(iob->data+IPA_PDU_HEADER_SIZE); + cmd->data.setdelmac.mac_length = OSA_ADDR_LEN; + memcpy(&cmd->data.setdelmac.mac, mac, OSA_ADDR_LEN); + return qeth_setdel_makerc(card, qeth_send_ipa_cmd(card, iob, + NULL, NULL)); +} + +static int qeth_l2_send_setmac(struct qeth_card *card, __u8 *mac) +{ + int rc; + + QETH_CARD_TEXT(card, 2, "L2Setmac"); + rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_SETVMAC); + if (rc == 0) { + card->info.mac_bits |= QETH_LAYER2_MAC_REGISTERED; + memcpy(card->dev->dev_addr, mac, OSA_ADDR_LEN); + dev_info(&card->gdev->dev, + "MAC address %pM successfully registered on device %s\n", + card->dev->dev_addr, card->dev->name); + } else { + card->info.mac_bits &= 
~QETH_LAYER2_MAC_REGISTERED; + switch (rc) { + case -EEXIST: + dev_warn(&card->gdev->dev, + "MAC address %pM already exists\n", mac); + break; + case -EPERM: + dev_warn(&card->gdev->dev, + "MAC address %pM is not authorized\n", mac); + break; + } + } + return rc; +} + +static int qeth_l2_send_delmac(struct qeth_card *card, __u8 *mac) +{ + int rc; + + QETH_CARD_TEXT(card, 2, "L2Delmac"); + if (!(card->info.mac_bits & QETH_LAYER2_MAC_REGISTERED)) + return 0; + rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_DELVMAC); + if (rc == 0) + card->info.mac_bits &= ~QETH_LAYER2_MAC_REGISTERED; + return rc; +} + static int qeth_l2_send_setgroupmac(struct qeth_card *card, __u8 *mac) { int rc; @@ -193,11 +248,6 @@ static int qeth_l2_send_delgroupmac(struct qeth_card *card, __u8 *mac) return rc; } -static inline u32 qeth_l2_mac_hash(const u8 *addr) -{ - return get_unaligned((u32 *)(&addr[2])); -} - static int qeth_l2_write_mac(struct qeth_card *card, struct qeth_mac *mac) { if (mac->is_uc) { @@ -232,6 +282,11 @@ static void qeth_l2_del_all_macs(struct qeth_card *card) spin_unlock_bh(&card->mclock); } +static inline u32 qeth_l2_mac_hash(const u8 *addr) +{ + return get_unaligned((u32 *)(&addr[2])); +} + static inline int qeth_l2_get_cast_type(struct qeth_card *card, struct sk_buff *skb) { @@ -572,64 +627,6 @@ static int qeth_l2_poll(struct napi_struct *napi, int budget) return work_done; } -static int qeth_l2_send_setdelmac(struct qeth_card *card, __u8 *mac, - enum qeth_ipa_cmds ipacmd) -{ - struct qeth_ipa_cmd *cmd; - struct qeth_cmd_buffer *iob; - - QETH_CARD_TEXT(card, 2, "L2sdmac"); - iob = qeth_get_ipacmd_buffer(card, ipacmd, QETH_PROT_IPV4); - if (!iob) - return -ENOMEM; - cmd = (struct qeth_ipa_cmd *)(iob->data+IPA_PDU_HEADER_SIZE); - cmd->data.setdelmac.mac_length = OSA_ADDR_LEN; - memcpy(&cmd->data.setdelmac.mac, mac, OSA_ADDR_LEN); - return qeth_setdel_makerc(card, qeth_send_ipa_cmd(card, iob, - NULL, NULL)); -} - -static int qeth_l2_send_setmac(struct qeth_card *card, 
__u8 *mac) -{ - int rc; - - QETH_CARD_TEXT(card, 2, "L2Setmac"); - rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_SETVMAC); - if (rc == 0) { -
[PATCH RESEND net-next 06/12] s390/qeth: drop qeth_l2_del_all_macs() parameter
From: Julian Wiedmann The only caller passes del = 0, so remove both the parameter and the code that handles != 0. Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter Acked-by: Ursula Braun --- drivers/s390/net/qeth_l2_main.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 9c921c28..3025f56 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -216,7 +216,7 @@ static int qeth_l2_write_mac(struct qeth_card *card, struct qeth_mac *mac) return rc; } -static void qeth_l2_del_all_macs(struct qeth_card *card, int del) +static void qeth_l2_del_all_macs(struct qeth_card *card) { struct qeth_mac *mac; struct hlist_node *tmp; @@ -224,13 +224,6 @@ static void qeth_l2_del_all_macs(struct qeth_card *card, int del) spin_lock_bh(&card->mclock); hash_for_each_safe(card->mac_htable, i, tmp, mac, hnode) { - if (del) { - if (mac->is_uc) - qeth_l2_send_setdelmac(card, mac->mac_addr, - IPA_CMD_DELVMAC); - else - qeth_l2_send_delgroupmac(card, mac->mac_addr); - } hash_del(&mac->hnode); kfree(mac); } @@ -425,7 +418,7 @@ static void qeth_l2_stop_card(struct qeth_card *card, int recovery_mode) card->state = CARD_STATE_SOFTSETUP; } if (card->state == CARD_STATE_SOFTSETUP) { - qeth_l2_del_all_macs(card, 0); + qeth_l2_del_all_macs(card); qeth_clear_ipacmd_list(card); card->state = CARD_STATE_HARDSETUP; } -- 2.8.4
[PATCH RESEND net-next 12/12] s390/qeth: fix retrieval of vipa and proxy-arp addresses
qeth devices in layer3 mode need a separate handling of vipa and proxy-arp addresses. vipa and proxy-arp addresses processed by qeth can be read from userspace. Introduced with commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") the retrieval of vipa and proxy-arp addresses is broken, if more than one vipa or proxy-arp address are set. The qeth code used local variable "int i" for 2 different purposes. This patch now spends 2 separate local variables of type "int". While touching these functions hash_for_each_safe() is converted to hash_for_each(), since there is no removal of hash entries. Signed-off-by: Ursula Braun Reviewed-by: Julian Wiedmann Reference-ID: RQM 3524 --- drivers/s390/net/qeth_l3_sys.c | 30 -- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c index 3cd4d9f..05e9471 100644 --- a/drivers/s390/net/qeth_l3_sys.c +++ b/drivers/s390/net/qeth_l3_sys.c @@ -689,15 +689,15 @@ static ssize_t qeth_l3_dev_vipa_add_show(char *buf, struct qeth_card *card, enum qeth_prot_versions proto) { struct qeth_ipaddr *ipaddr; - struct hlist_node *tmp; char addr_str[40]; + int str_len = 0; int entry_len; /* length of 1 entry string, differs between v4 and v6 */ - int i = 0; + int i; entry_len = (proto == QETH_PROT_IPV4)? 12 : 40; entry_len += 2; /* \n + terminator */ spin_lock_bh(&card->ip_lock); - hash_for_each_safe(card->ip_htable, i, tmp, ipaddr, hnode) { + hash_for_each(card->ip_htable, i, ipaddr, hnode) { if (ipaddr->proto != proto) continue; if (ipaddr->type != QETH_IP_TYPE_VIPA) @@ -705,16 +705,17 @@ static ssize_t qeth_l3_dev_vipa_add_show(char *buf, struct qeth_card *card, /* String must not be longer than PAGE_SIZE. So we check if * string length gets near PAGE_SIZE. 
Then we can savely display * the next IPv6 address (worst case, compared to IPv4) */ - if ((PAGE_SIZE - i) <= entry_len) + if ((PAGE_SIZE - str_len) <= entry_len) break; qeth_l3_ipaddr_to_string(proto, (const u8 *)&ipaddr->u, addr_str); - i += snprintf(buf + i, PAGE_SIZE - i, "%s\n", addr_str); + str_len += snprintf(buf + str_len, PAGE_SIZE - str_len, "%s\n", + addr_str); } spin_unlock_bh(&card->ip_lock); - i += snprintf(buf + i, PAGE_SIZE - i, "\n"); + str_len += snprintf(buf + str_len, PAGE_SIZE - str_len, "\n"); - return i; + return str_len; } static ssize_t qeth_l3_dev_vipa_add4_show(struct device *dev, @@ -851,15 +852,15 @@ static ssize_t qeth_l3_dev_rxip_add_show(char *buf, struct qeth_card *card, enum qeth_prot_versions proto) { struct qeth_ipaddr *ipaddr; - struct hlist_node *tmp; char addr_str[40]; + int str_len = 0; int entry_len; /* length of 1 entry string, differs between v4 and v6 */ - int i = 0; + int i; entry_len = (proto == QETH_PROT_IPV4)? 12 : 40; entry_len += 2; /* \n + terminator */ spin_lock_bh(&card->ip_lock); - hash_for_each_safe(card->ip_htable, i, tmp, ipaddr, hnode) { + hash_for_each(card->ip_htable, i, ipaddr, hnode) { if (ipaddr->proto != proto) continue; if (ipaddr->type != QETH_IP_TYPE_RXIP) @@ -867,16 +868,17 @@ static ssize_t qeth_l3_dev_rxip_add_show(char *buf, struct qeth_card *card, /* String must not be longer than PAGE_SIZE. So we check if * string length gets near PAGE_SIZE. 
Then we can savely display * the next IPv6 address (worst case, compared to IPv4) */ - if ((PAGE_SIZE - i) <= entry_len) + if ((PAGE_SIZE - str_len) <= entry_len) break; qeth_l3_ipaddr_to_string(proto, (const u8 *)&ipaddr->u, addr_str); - i += snprintf(buf + i, PAGE_SIZE - i, "%s\n", addr_str); + str_len += snprintf(buf + str_len, PAGE_SIZE - str_len, "%s\n", + addr_str); } spin_unlock_bh(&card->ip_lock); - i += snprintf(buf + i, PAGE_SIZE - i, "\n"); + str_len += snprintf(buf + str_len, PAGE_SIZE - str_len, "\n"); - return i; + return str_len; } static ssize_t qeth_l3_dev_rxip_add4_show(struct device *dev, -- 2.8.4
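The pattern the patch moves to, one variable for iteration and a separate one for the accumulated output length, can be sketched in userspace as follows. This is an illustration under assumptions: `demo_show()` is a hypothetical helper, a plain array replaces the kernel hash table, and each address string is assumed to be shorter than `entry_len`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Append up to n address strings to buf, one per line, followed by a
 * final empty line. 'i' is used only to walk the input; 'str_len' is
 * used only as the write offset, so neither disturbs the other.
 * Returns the number of characters written (excluding the terminator).
 */
static int demo_show(char *buf, size_t buf_size,
		     const char *const *addrs, int n, int entry_len)
{
	int str_len = 0;
	int i;

	for (i = 0; i < n; i++) {
		/* Stop while there is still room for a worst-case entry. */
		if ((int)buf_size - str_len <= entry_len)
			break;
		str_len += snprintf(buf + str_len, buf_size - str_len,
				    "%s\n", addrs[i]);
	}
	str_len += snprintf(buf + str_len, buf_size - str_len, "\n");

	return str_len;
}
```

If a single variable served both as iterator and as write offset, every `snprintf()` would also advance the iteration, which fits the symptom in the commit message that retrieval breaks as soon as more than one address is set.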
[PATCH RESEND net-next 09/12] s390/qeth: extract qeth_l2_remove_mac()
From: Julian Wiedmann This matches qeth_l2_write_mac(). Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter --- drivers/s390/net/qeth_l2_main.c | 27 +-- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 074fc62..d456740 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -200,16 +200,22 @@ static inline u32 qeth_l2_mac_hash(const u8 *addr) static int qeth_l2_write_mac(struct qeth_card *card, struct qeth_mac *mac) { - - int rc; - if (mac->is_uc) { - rc = qeth_l2_send_setdelmac(card, mac->mac_addr, + return qeth_l2_send_setdelmac(card, mac->mac_addr, IPA_CMD_SETVMAC); } else { - rc = qeth_l2_send_setgroupmac(card, mac->mac_addr); + return qeth_l2_send_setgroupmac(card, mac->mac_addr); + } +} + +static int qeth_l2_remove_mac(struct qeth_card *card, struct qeth_mac *mac) +{ + if (mac->is_uc) { + return qeth_l2_send_setdelmac(card, mac->mac_addr, + IPA_CMD_DELVMAC); + } else { + return qeth_l2_send_delgroupmac(card, mac->mac_addr); } - return rc; } static void qeth_l2_del_all_macs(struct qeth_card *card) @@ -782,14 +788,7 @@ static void qeth_l2_set_rx_mode(struct net_device *dev) hash_for_each_safe(card->mac_htable, i, tmp, mac, hnode) { if (mac->disp_flag == QETH_DISP_ADDR_DELETE) { - if (!mac->is_uc) - rc = qeth_l2_send_delgroupmac(card, - mac->mac_addr); - else { - rc = qeth_l2_send_setdelmac(card, mac->mac_addr, - IPA_CMD_DELVMAC); - } - + qeth_l2_remove_mac(card, mac); hash_del(&mac->hnode); kfree(mac); -- 2.8.4
[PATCH RESEND net-next 00/12] s390: qeth patches
Hi Dave, yesterday I came up with 13 qeth patches. Since you have not been happy with the 13th patch, I want to make sure that at least the remaining 12 qeth patches can be applied to net-next. Here is the resend of them. Thanks, Ursula Julian Wiedmann (8): s390/qeth: Allow reading hsuid in state DOWN s390/qeth: Remove QETH_IP_HEADER_SIZE s390/qeth: drop qeth_l2_del_all_macs() parameter s390/qeth: don't convert return code twice s390/qeth: consolidate errno translation s390/qeth: extract qeth_l2_remove_mac() s390/qeth: shuffle MAC management functions around s390/qeth: issue STARTLAN as first IPA command Thomas Richter (3): s390/qeth: rework RX/TX checksum offload s390/qeth: test RX/TX checksum offload reply s390/qeth: display warning for OSA3 RX/TX checksum offloading Ursula Braun (1): s390/qeth: fix retrieval of vipa and proxy-arp addresses drivers/s390/net/qeth_core.h | 5 - drivers/s390/net/qeth_core_main.c | 135 --- drivers/s390/net/qeth_core_mpc.h | 17 drivers/s390/net/qeth_l2_main.c | 189 -- drivers/s390/net/qeth_l3_main.c | 15 --- drivers/s390/net/qeth_l3_sys.c| 33 --- 6 files changed, 212 insertions(+), 182 deletions(-) -- 2.8.4
[PATCH RESEND net-next 08/12] s390/qeth: consolidate errno translation
From: Julian Wiedmann Consolidate errno handling for MAC management: Instead of doing this in every caller, do it in one place. Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter Suggested-by: Ursula Braun --- drivers/s390/net/qeth_l2_main.c | 20 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 38fae10..074fc62 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -170,8 +170,7 @@ static int qeth_l2_send_setgroupmac(struct qeth_card *card, __u8 *mac) int rc; QETH_CARD_TEXT(card, 2, "L2Sgmac"); - rc = qeth_setdel_makerc(card, qeth_l2_send_setdelmac(card, mac, - IPA_CMD_SETGMAC)); + rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_SETGMAC); if (rc == -EEXIST) QETH_DBF_MESSAGE(2, "Group MAC %pM already existing on %s\n", mac, QETH_CARD_IFNAME(card)); @@ -186,8 +185,7 @@ static int qeth_l2_send_delgroupmac(struct qeth_card *card, __u8 *mac) int rc; QETH_CARD_TEXT(card, 2, "L2Dgmac"); - rc = qeth_setdel_makerc(card, qeth_l2_send_setdelmac(card, mac, - IPA_CMD_DELGMAC)); + rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_DELGMAC); if (rc) QETH_DBF_MESSAGE(2, "Could not delete group MAC %pM on %s: %d\n", @@ -206,9 +204,8 @@ static int qeth_l2_write_mac(struct qeth_card *card, struct qeth_mac *mac) int rc; if (mac->is_uc) { - rc = qeth_setdel_makerc(card, - qeth_l2_send_setdelmac(card, mac->mac_addr, - IPA_CMD_SETVMAC)); + rc = qeth_l2_send_setdelmac(card, mac->mac_addr, + IPA_CMD_SETVMAC); } else { rc = qeth_l2_send_setgroupmac(card, mac->mac_addr); } @@ -582,7 +579,8 @@ static int qeth_l2_send_setdelmac(struct qeth_card *card, __u8 *mac, cmd = (struct qeth_ipa_cmd *)(iob->data+IPA_PDU_HEADER_SIZE); cmd->data.setdelmac.mac_length = OSA_ADDR_LEN; memcpy(&cmd->data.setdelmac.mac, mac, OSA_ADDR_LEN); - return qeth_send_ipa_cmd(card, iob, NULL, NULL); + return qeth_setdel_makerc(card, qeth_send_ipa_cmd(card, iob, + NULL, NULL)); } static int 
qeth_l2_send_setmac(struct qeth_card *card, __u8 *mac) @@ -590,8 +588,7 @@ static int qeth_l2_send_setmac(struct qeth_card *card, __u8 *mac) int rc; QETH_CARD_TEXT(card, 2, "L2Setmac"); - rc = qeth_setdel_makerc(card, qeth_l2_send_setdelmac(card, mac, - IPA_CMD_SETVMAC)); + rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_SETVMAC); if (rc == 0) { card->info.mac_bits |= QETH_LAYER2_MAC_REGISTERED; memcpy(card->dev->dev_addr, mac, OSA_ADDR_LEN); @@ -621,8 +618,7 @@ static int qeth_l2_send_delmac(struct qeth_card *card, __u8 *mac) QETH_CARD_TEXT(card, 2, "L2Delmac"); if (!(card->info.mac_bits & QETH_LAYER2_MAC_REGISTERED)) return 0; - rc = qeth_setdel_makerc(card, qeth_l2_send_setdelmac(card, mac, - IPA_CMD_DELVMAC)); + rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_DELVMAC); if (rc == 0) card->info.mac_bits &= ~QETH_LAYER2_MAC_REGISTERED; return rc; -- 2.8.4
[PATCH RESEND net-next 04/12] s390/qeth: Allow reading hsuid in state DOWN
From: Julian Wiedmann Accessing the current hsuid via card->options.hsuid is perfectly fine, even when the card is DOWN. Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter Acked-by: Ursula Braun --- drivers/s390/net/qeth_l3_sys.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c index 0e00a5c..3cd4d9f 100644 --- a/drivers/s390/net/qeth_l3_sys.c +++ b/drivers/s390/net/qeth_l3_sys.c @@ -250,9 +250,6 @@ static ssize_t qeth_l3_dev_hsuid_show(struct device *dev, if (card->info.type != QETH_CARD_TYPE_IQD) return -EPERM; - if (card->state == CARD_STATE_DOWN) - return -EPERM; - memcpy(tmp_hsuid, card->options.hsuid, sizeof(tmp_hsuid)); EBCASC(tmp_hsuid, 8); return sprintf(buf, "%s\n", tmp_hsuid); -- 2.8.4
[PATCH RESEND net-next 05/12] s390/qeth: Remove QETH_IP_HEADER_SIZE
From: Julian Wiedmann Remove unused define QETH_IP_HEADER_SIZE. Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter Acked-by: Ursula Braun --- drivers/s390/net/qeth_core.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 41e4665..774ae51 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -281,8 +281,6 @@ static inline int qeth_is_ipa_enabled(struct qeth_ipa_info *ipa, #define QETH_HIGH_WATERMARK_PACK 5 #define QETH_WATERMARK_PACK_FUZZ 1 -#define QETH_IP_HEADER_SIZE 40 - /* large receive scatter gather copy break */ #define QETH_RX_SG_CB (PAGE_SIZE >> 1) #define QETH_RX_PULL_LEN 256 -- 2.8.4
[PATCH RESEND net-next 07/12] s390/qeth: don't convert return code twice
From: Julian Wiedmann qeth_l2_send_setgroupmac() already translates the return code, so calling qeth_setdel_makerc() a second time only produces garbage. Signed-off-by: Julian Wiedmann Reviewed-by: Thomas Richter Reviewed-by: Ursula Braun --- drivers/s390/net/qeth_l2_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 3025f56..38fae10 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -210,8 +210,7 @@ static int qeth_l2_write_mac(struct qeth_card *card, struct qeth_mac *mac) qeth_l2_send_setdelmac(card, mac->mac_addr, IPA_CMD_SETVMAC)); } else { - rc = qeth_setdel_makerc(card, - qeth_l2_send_setgroupmac(card, mac->mac_addr)); + rc = qeth_l2_send_setgroupmac(card, mac->mac_addr); } return rc; } -- 2.8.4
[PATCH RESEND net-next 01/12] s390/qeth: rework RX/TX checksum offload
From: Thomas Richter Rework the RX/TX checksum offloading command sequence to use the provided function call back mechanims to return card data to the device driver. Signed-off-by: Thomas Richter Reviewed-by: Julian Wiedmann Reviewed-by: Ursula Braun --- drivers/s390/net/qeth_core.h | 2 - drivers/s390/net/qeth_core_main.c | 96 ++- drivers/s390/net/qeth_core_mpc.h | 7 +++ 3 files changed, 72 insertions(+), 33 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 6d4b68c4..41e4665 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -674,8 +674,6 @@ struct qeth_card_info { int broadcast_capable; int unique_id; struct qeth_card_blkt blkt; - __u32 csum_mask; - __u32 tx_csum_mask; enum qeth_ipa_promisc_modes promisc_mode; __u32 diagass_support; __u32 hwtrap; diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index e335583..5ab80ea 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -5289,18 +5289,6 @@ int qeth_setassparms_cb(struct qeth_card *card, if (cmd->hdr.prot_version == QETH_PROT_IPV6) card->options.ipa6.enabled_funcs = cmd->hdr.ipa_enabled; } - if (cmd->data.setassparms.hdr.assist_no == IPA_INBOUND_CHECKSUM && - cmd->data.setassparms.hdr.command_code == IPA_CMD_ASS_START) { - card->info.csum_mask = cmd->data.setassparms.data.flags_32bit; - QETH_CARD_TEXT_(card, 3, "csum:%d", card->info.csum_mask); - } - if (cmd->data.setassparms.hdr.assist_no == IPA_OUTBOUND_CHECKSUM && - cmd->data.setassparms.hdr.command_code == IPA_CMD_ASS_START) { - card->info.tx_csum_mask = - cmd->data.setassparms.data.flags_32bit; - QETH_CARD_TEXT_(card, 3, "tcsu:%d", card->info.tx_csum_mask); - } - return 0; } EXPORT_SYMBOL_GPL(qeth_setassparms_cb); @@ -6060,23 +6048,78 @@ int qeth_core_ethtool_get_settings(struct net_device *netdev, } EXPORT_SYMBOL_GPL(qeth_core_ethtool_get_settings); +/* Callback to handle checksum offload command reply from OSA card. 
+ * Verify that required features have been enabled on the card. + * Return error in hdr->return_code as this value is checked by caller. + * + * Always returns zero to indicate no further messages from the OSA card. + */ +static int qeth_ipa_checksum_run_cmd_cb(struct qeth_card *card, + struct qeth_reply *reply, + unsigned long data) +{ + struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data; + struct qeth_checksum_cmd *chksum_cb = + (struct qeth_checksum_cmd *)reply->param; + + QETH_CARD_TEXT(card, 4, "chkdoccb"); + if (cmd->hdr.return_code) + return 0; + + memset(chksum_cb, 0, sizeof(*chksum_cb)); + if (cmd->data.setassparms.hdr.command_code == IPA_CMD_ASS_START) { + chksum_cb->supported = + cmd->data.setassparms.data.chksum.supported; + QETH_CARD_TEXT_(card, 3, "strt:%x", chksum_cb->supported); + } + if (cmd->data.setassparms.hdr.command_code == IPA_CMD_ASS_ENABLE) { + chksum_cb->supported = + cmd->data.setassparms.data.chksum.supported; + chksum_cb->enabled = + cmd->data.setassparms.data.chksum.enabled; + QETH_CARD_TEXT_(card, 3, "supp:%x", chksum_cb->supported); + QETH_CARD_TEXT_(card, 3, "enab:%x", chksum_cb->enabled); + } + return 0; +} + +/* Send command to OSA card and check results. */ +static int qeth_ipa_checksum_run_cmd(struct qeth_card *card, +enum qeth_ipa_funcs ipa_func, +__u16 cmd_code, long data, +struct qeth_checksum_cmd *chksum_cb) +{ + struct qeth_cmd_buffer *iob; + int rc = -ENOMEM; + + QETH_CARD_TEXT(card, 4, "chkdocmd"); + iob = qeth_get_setassparms_cmd(card, ipa_func, cmd_code, + sizeof(__u32), QETH_PROT_IPV4); + if (iob) + rc = qeth_send_setassparms(card, iob, sizeof(__u32), data, + qeth_ipa_checksum_run_cmd_cb, + chksum_cb); + return rc; +} + static int qeth_send_checksum_on(struct qeth_card *card, int cstype) { - long rxtx_arg; + struct qeth_checksum_cmd chksum_cb; int rc; - rc = qeth_send_simple_setassparms(card, cstype, IPA_CMD_ASS_START, 0); + rc = qeth_ipa_checksum_run_cmd(card, cstype, IPA_CMD_ASS_START, 0, + &chksum_cb);
[PATCH net-next] ipmr: improve hash scalability
Recently we started using ipmr with thousands of entries and easily hit soft lockups on smaller devices. The reason is that the hash function uses the high order bits from the src and dst, but those don't change in many common cases, also the hash table is only 64 elements so with thousands it doesn't scale at all.

This patch migrates the hash table to rhashtable, and in particular the rhl interface which allows for duplicate elements to be chained because of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple duplicate entries to be added with different interfaces (IMO wrong, but it's been in for a long time).

And here are some results from tests I've run in a VM:

mr_table size (default, allocated for all namespaces):
Before          After
49304 bytes     2400 bytes

Add 65000 routes (the diff is much larger on smaller devices):
Before          After
1m42s           58s

Forwarding 256 byte packets with 65000 routes (test done in a VM):
Before                  After
3 Mbps / ~1465 pps      122 Mbps / ~59000 pps

As a bonus we no longer see the soft lockups on smaller devices which showed up even with 2000 entries before.
Signed-off-by: Nikolay Aleksandrov --- include/linux/mroute.h | 57 --- net/ipv4/ipmr.c| 255 +++-- 2 files changed, 182 insertions(+), 130 deletions(-) diff --git a/include/linux/mroute.h b/include/linux/mroute.h index f019b62f27b5..d7f63339ef0b 100644 --- a/include/linux/mroute.h +++ b/include/linux/mroute.h @@ -3,6 +3,7 @@ #include #include +#include #include #include @@ -60,7 +61,6 @@ struct vif_device { #define VIFF_STATIC 0x8000 #define VIF_EXISTS(_mrt, _idx) ((_mrt)->vif_table[_idx].dev != NULL) -#define MFC_LINES 64 struct mr_table { struct list_headlist; @@ -69,8 +69,9 @@ struct mr_table { struct sock __rcu *mroute_sk; struct timer_list ipmr_expire_timer; struct list_headmfc_unres_queue; - struct list_headmfc_cache_array[MFC_LINES]; struct vif_device vif_table[MAXVIFS]; + struct rhltable mfc_hash; + struct list_headmfc_cache_list; int maxvif; atomic_tcache_resolve_queue_len; boolmroute_do_assert; @@ -85,17 +86,48 @@ enum { MFC_STATIC = BIT(0), }; +struct mfc_cache_cmp_arg { + __be32 mfc_mcastgrp; + __be32 mfc_origin; +}; + +/** + * struct mfc_cache - multicast routing entries + * @mnode: rhashtable list + * @mfc_mcastgrp: destination multicast group address + * @mfc_origin: source address + * @cmparg: used for rhashtable comparisons + * @mfc_parent: source interface (iif) + * @mfc_flags: entry flags + * @expires: unresolved entry expire time + * @unresolved: unresolved cached skbs + * @last_assert: time of last assert + * @minvif: minimum VIF id + * @maxvif: maximum VIF id + * @bytes: bytes that have passed for this entry + * @pkt: packets that have passed for this entry + * @wrong_if: number of wrong source interface hits + * @lastuse: time of last use of the group (traffic or update) + * @ttls: OIF TTL threshold array + * @list: global entry list + * @rcu: used for entry destruction + */ struct mfc_cache { - struct list_head list; - __be32 mfc_mcastgrp;/* Group the entry belongs to */ - __be32 mfc_origin; /* Source of packet */ - vifi_t mfc_parent; /* 
Source interface */ - int mfc_flags; /* Flags on line */ + struct rhlist_head mnode; + union { + struct { + __be32 mfc_mcastgrp; + __be32 mfc_origin; + }; + struct mfc_cache_cmp_arg cmparg; + }; + vifi_t mfc_parent; + int mfc_flags; union { struct { unsigned long expires; - struct sk_buff_head unresolved; /* Unresolved buffers */ + struct sk_buff_head unresolved; } unres; struct { unsigned long last_assert; @@ -105,18 +137,13 @@ struct mfc_cache { unsigned long pkt; unsigned long wrong_if; unsigned long lastuse; - unsigned char ttls[MAXVIFS];/* TTL thresholds */ + unsigned char ttls[MAXVIFS]; } res; } mfc_un; + struct list_head list; struct rcu_head rcu; }; -#ifdef __BIG_ENDIAN -#define MFC_HASH(a,b) (__force u32)(__be32)a)>>24)^(((__force u32)(__be32)b)>>26))&(MFC_LINES-1)) -#else -#define MFC_HASH(a,b) __force u32)(__be32)a)^(((__force
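The scalability problem described in the commit message (a 64-bucket table indexed only by high-order address bits) can be illustrated with a small userspace sketch. It is modelled on the shape of the big-endian MFC_HASH variant removed above, with addresses held as host-order integers; ex_old_style_hash() and ex_buckets_used() are hypothetical names, and the sketch is not the kernel code.

```c
#include <stdint.h>

#define EX_MFC_LINES 64	/* table size the patch removes */

/*
 * Hash in the style of the old macro: only the top octet of the group and
 * the top six bits of the origin contribute to the bucket index.
 */
static unsigned int ex_old_style_hash(uint32_t group, uint32_t origin)
{
	return ((group >> 24) ^ (origin >> 26)) & (EX_MFC_LINES - 1);
}

/* Count how many distinct buckets a run of consecutive groups occupies. */
static unsigned int ex_buckets_used(uint32_t first_group, unsigned int count,
				    uint32_t origin)
{
	unsigned char seen[EX_MFC_LINES] = { 0 };
	unsigned int used = 0;
	unsigned int i;

	for (i = 0; i < count; i++) {
		unsigned int h = ex_old_style_hash(first_group + i, origin);

		if (!seen[h]) {
			seen[h] = 1;
			used++;
		}
	}
	return used;
}
```

Under these assumptions, 65000 consecutive groups starting at 239.1.0.0 (0xEF010000) from a single source all land in one bucket, so every lookup degenerates into a linear list walk.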
[PATCH RESEND net-next 03/12] s390/qeth: display warning for OSA3 RX/TX checksum offloading
From: Thomas Richter When RX/TX checksum offloading is turned on and the adapter is an OSA 3 card in layer 3 mode, the checksum offloading is only performed when both peers use different adapters. If both peers share an OSA 3 card, communication is a memory copy and checksum offloading is not performed. This patch adds a warning to inform the administrator. OSA 3 in layer 2 mode does not offer the RX/TX checksum offload feature. Signed-off-by: Thomas Richter Reviewed-by: Julian Wiedmann Reviewed-by: Ursula Braun --- drivers/s390/net/qeth_core_main.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 49b813f..ca8309f 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -6116,6 +6116,11 @@ static int qeth_send_checksum_on(struct qeth_card *card, int cstype) if ((required_features & chksum_cb.supported) != required_features) rc = -EIO; + else if (!(QETH_IPA_CHECKSUM_LP2LP & chksum_cb.supported) && +cstype == IPA_INBOUND_CHECKSUM) + dev_warn(&card->gdev->dev, +"Hardware checksumming is performed only if %s and its peer use different OSA Express 3 ports\n", +QETH_CARD_IFNAME(card)); } if (rc) { qeth_send_simple_setassparms(card, cstype, IPA_CMD_ASS_STOP, 0); -- 2.8.4
Re: [PATCH net-next 0/2] net/smc: fix typo and clc-bug
From: Ursula Braun Date: Thu, 12 Jan 2017 14:57:13 +0100 > I received 2 bug reports for my new AF_SMC-code. Here are the fixes for them. Series applied, thanks.
Re: [PATCH net-next v2 0/2] More flexible BPF cb access
From: Daniel Borkmann Date: Thu, 12 Jan 2017 11:51:31 +0100 > This patch improves BPF's cb access by allowing b/h/w/dw > access variants on it. For details, please see individual > patches. Series applied, thanks.
Re: [PATCH net-next] lwt_bpf: bpf_lwt_prog_cmp() can be static
From: Wei Yongjun Date: Thu, 12 Jan 2017 14:39:28 + > From: Wei Yongjun > > Fixes the following sparse warning: > > net/core/lwt_bpf.c:355:5: warning: > symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static? > > Signed-off-by: Wei Yongjun Applied.
Re: [PATCH RESEND net-next 00/12] s390: qeth patches
From: Ursula Braun Date: Thu, 12 Jan 2017 15:48:31 +0100 > yesterday I came up with 13 qeth patches. Since you have not been > happy with the 13th patch, I want to make sure that at least the > remaining 12 qeth patches can be applied to net-next. Here is the > resend of them. Series applied.
Re: Setting link down or up in software
On 12/01/2017 14:05, Mason wrote: > I'm wondering what are the semantics of calling > > ip link set dev eth0 down > > I was expecting that to somehow instruct the device's ethernet driver > to shut everything down, have the PHY tell the peer that it's going > away, maybe even put the PHY in some low-power mode, etc. > > But it doesn't seem to be doing any of that on my HW. > > So what exactly is it supposed to do? > > > And on top of that, I am seeing random occurrences of > > nb8800 26000.ethernet eth0: Link is Down > > Sometimes it is printed immediately. > Sometimes it is printed as soon as I run "ip link set dev eth0 up" (?!) > Sometimes it is not printed at all. > > I find this erratic behavior very confusing. > > Is it the symptom of some deeper bug? Here's an example of "Link is Down" printed when I set link up: At [ 62.750220] I run ip link set dev eth0 down Then leave the system idle for 10 minutes. At [ 646.263041] I run ip link set dev eth0 up At [ 647.364079] it prints "Link is Down" At [ 649.417434] it prints "Link is Up - 1Gbps/Full - flow control rx/tx" I think whether I set up the PHY to use interrupts or polling does have an influence on the weirdness I observe. AFAICT, changing the interface flags is done in dev_change_flags which calls __dev_change_flags and __dev_notify_flags Is one of these supposed to call the device driver through a callback at some point? How/when is the phy_state_machine notified of the change in interface flags? Regards.
Re: [PATCH net-next] net: ipv6: put autoconf routes into per-interface tables
On 1/9/17 7:01 PM, Lorenzo Colitti wrote: > On Sun, Jan 8, 2017 at 1:24 PM, David Ahern wrote: >> Why not use the VRF capability then? create a VRF and assign the interface >> to it. End result is the same -- separate tables and the need to use a >> bind-to-device API to hit those routes. > > Requiring that VRFs for this creates additional complexity, because > each network now requires its own VRF. That means that the connection > manager must create the VRF before the interface comes up and receives > the RA. > > In some cases this might not be possible. For example, consider a tun > interface that's created by a different process such as a VPN client. > In this case the connection manager doesn't know the interface name, > and the VPN client doesn't know to create the VRF, so if the tun > interface gets an RA after the tun is created but Have you looked at adding basic l3mdev capabilities to tun? in this case just l3mdev_fib_table needs be implemented. On interface create push down a table id and set the IFF_L3MDEV_MASTER flag.
Correct method for initializing Pause and Asymmetrical Pause support in phy drivers
Hello netdev list, I am currently investigating a problem related to Ethernet auto-negotiation of Pause and Asymmetrical Pause capabilities. TL;DR: I am using a Picozed system-on-module with a Xilinx Gigabit Ethernet MAC and a Marvell PHY. It does not appear to be advertising support for Pause and Asym Pause, which seems strange to me given that this is relatively recent hardware. I suspect that may be due to a problem in the way phydev->supported is initialized in drivers/net/phy/marvell.c. I am trying to confirm what the proper method is to initialize phydev->supported such that it advertises SUPPORTED_Pause and SUPPORTED_Asym_Pause. Adding these flags to (phy_driver).features seems to work, but I would like to confirm with people who are more knowledgeable than me in this regard. Read on for details about what I have observed and tried so far. # The System # The application I am working on uses Avnet's Picozed 7020 System-on-Module (SOM), which contains: * An on-chip MAC (on a Xilinx Zynq 7000 chip) * A Marvell 1512 Alaska PHY. * A daughtercard that provides the actual RJ45 connector. The Zynq is running a Xilinx fork of Linux. I am working with the following drivers: The MAC is the built-in Gigabit Ethernet MAC on a Xilinx Zynq 7000 chip. It uses the xemacps.c driver, which can be found on Xilinx's official Linux fork: * RAW: https://raw.githubusercontent.com/Xilinx/linux-xlnx/master/drivers/net/ethernet/xilinx/xilinx_emacps.c * GITHUB: https://github.com/Xilinx/linux-xlnx/blob/master/drivers/net/ethernet/xilinx/xilinx_emacps.c The PHY is a Marvell 1512 Alaska device that comes on the Picozed 7020 SOM that is the heart of the application I am working on. The version of the driver I am using for this PHY can be found here (note that this version is slightly different (older?) 
to the mainline Linux repo): * RAW: https://raw.githubusercontent.com/Xilinx/linux-xlnx/master/drivers/net/phy/marvell.c * GITHUB: https://github.com/Xilinx/linux-xlnx/blob/master/drivers/net/phy/marvell.c # The Problem: No Flow Control via Auto-Negotiation # I have noticed that when I connect to the Picozed using a PC, the connection speed and duplex are negotiated correctly, and the signal integrity is good. However, its autonegotiation capabilities do not report support for the Pause or Asymmetrical Pause capabilities. Flow control is thus disabled as a result. I verified this by dumping the Link Partner capabilities PHY register on my PC. If I connect another regular PC or a smart switch (on which I have enabled flow control) to my PC, flow control capability is reported and thus Pause frames are enabled. Based on the Zynq's Technical Reference Manual, the MAC supports Pause frames and Asymmetrical Pause, so I am working with the assumption that these features should be advertised, contrary to what I am seeing. I spent most of the day looking at the PHY abstraction layer, the marvell driver, and the xemacps driver to figure out where the missing flow control capability information needed to be added. I also looked at the phy.txt where I found the following: "Now just make sure that phydev->supported and phydev->advertising have any values pruned from them which don't make sense for your controller a 10/100 controller may be connected to a gigabit capable PHY, so you would need to mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions for these bitfields. Note that you should not SET any bits, or the PHY may get put into an unsupported state." Source: https://raw.githubusercontent.com/Xilinx/linux-xlnx/master/Documentation/networking/phy.txt So my understanding is as follows: 1. The PHY driver sets all of the flags for the capabilities it supports in phydev->supported. 2. 
The MAC driver then prunes the capabilities it does not support from phydev->supported to see what can be safely advertised. In xemacps.c, the following code appears to be performing this pruning, by removing all capabilities other than PHY_GBIT_FEATURES and the Flow Control capability bits. phydev->supported &= (PHY_GBIT_FEATURES | SUPPORTED_Pause | SUPPORTED_Asym_Pause); So, if the PHY had advertised that it supported Flow Control / Pause Frames, these capabilities would have been preserved. However, with some variable dumping to dmesg, I can see that SUPPORTED_Pause and SUPPORTED_Asym_Pause are not present in phydev->supported. What I observe is that phydev->supported has a value of 0x02ff, and we would expect bits 13 and 14 to be set, resulting in 0x62ff. I dug a bit deeper to see how the PHY driver populates the value of phydev->supported before it gets passed to the MAC for pruning. I found that it comes from the .features field in the phy_driver structs defined at the bottom of the marvell.c file. In the version I am using, this field only contains PHY_GBIT_FEATURES, which explains why the SUPPORTED_Pause and SUPPORTED_Asym_Pause flags are not set. { .phy_id = MARVELL_PHY_ID_88E1510, .phy_id_mask
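The mask arithmetic described in this message can be checked with a short C sketch. The bit positions (13 for Pause, 14 for Asym_Pause) and the value 0x02ff are taken from the text above; the EX_* macros and ex_mac_prune() are hypothetical stand-ins, not the definitions from include/linux/ethtool.h or the xemacps driver.

```c
#include <stdint.h>

/* Bit positions as stated in the message: Pause = bit 13, Asym_Pause = bit 14 */
#define EX_SUPPORTED_PAUSE	(UINT32_C(1) << 13)
#define EX_SUPPORTED_ASYM_PAUSE	(UINT32_C(1) << 14)

/* Observed phydev->supported value, used here as a stand-in for PHY_GBIT_FEATURES */
#define EX_GBIT_FEATURES	UINT32_C(0x02ff)

/* Mirrors the pruning step: keep only what the MAC says it can handle. */
static uint32_t ex_mac_prune(uint32_t phy_supported)
{
	return phy_supported & (EX_GBIT_FEATURES | EX_SUPPORTED_PAUSE |
				EX_SUPPORTED_ASYM_PAUSE);
}
```

Since the pruning is a bitwise AND, it can only clear bits. If the PHY driver never sets the two pause bits, the result stays 0x02ff; if it does set them, the result is 0x62ff, which matches the value expected in the message.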
Re: [PATCH] [net] net/mlx5e: fix another -Wmaybe-uninitialized warning
On 1/11/2017 11:14 PM, Arnd Bergmann wrote:
> As found by Olof's build bot, today's mainline kernel gained a harmless warning about a potential uninitialized variable reference:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'parse_tc_fdb_actions':
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:769:13: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:811:21: note: 'out_dev' was declared here
>
> This was introduced through the addition of an 'IS_ERR/PTR_ERR' pair that gcc is unfortunately unable to completely figure out. Replacing it with PTR_ERR_OR_ZERO makes the code more understandable to gcc so it no longer warns.

can you elaborate on this a little further?

> Hadar Hen Zion already attempted to fix the warning earlier by adding fake initializations, but that ended up just making the code worse without fully addressing all warnings, so I'm reverting it now that it is no longer needed.

ok, so if your approach eliminates the warning on out_dev and also on the variables for which Hadar added the faked initializers, I guess we should be fine with this change (saw your reply on my other comment), just another question:

> In order to avoid pulling a variable declaration into the #ifdef, I'm removing it in favor of a more readable 'if()' statement here that has the same effect.

When I build here without CONFIG_INET in my system, the build goes fine with this approach. However, we're pretty sure that in the past we got 0-day report from the kbuild test robot where he was unhappy that we make the ip_route_output_key call without being wrapped with that #if IS_ENABLED(CONFIG_INET) -- so, we don't want to go there again... thoughts?

Or.
Re: [PATCH net] ravb: Remove Rx overflow log messages
From: Simon Horman Date: Thu, 12 Jan 2017 13:21:06 +0100 > From: Kazuya Mizuguchi > > Remove Rx overflow log messages as in an environment where logging results > in network traffic logging may cause further overflows. > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > Signed-off-by: Kazuya Mizuguchi > [simon: reworked changelog] > Signed-off-by: Simon Horman > Acked-by: Sergei Shtylyov Applied, thanks.
Re: [PATCH v2 2/2] stmmac: rename it to synopsys
I don't understand at all why it is so important to change the name of these files nor the directory they live in. What bonafide benefit will users receive if we do this? The only clear part is the downside, which is that it is going to make it painful to browse source history and backport bug fixes. Please, let's not do this. Thanks.
Re: [PATCH] synopsys: remove dwc_eth_qos driver
There is no driver named "synopsys", therefore "synopsys: " is not an appropriate subsystem prefix. Subsystem prefixes have to refer to real names in the source tree in one form or another. I know you guys really want to rename the stmmac driver to that, but no agreement has occurred on that issue and therefore assuming it has happened is not appropriate.
Re: Setting link down or up in software
> Here's an example of "Link is Down" printed when I set link up: > > At [ 62.750220] I run ip link set dev eth0 down > Then leave the system idle for 10 minutes. > At [ 646.263041] I run ip link set dev eth0 up > At [ 647.364079] it prints "Link is Down" > At [ 649.417434] it prints "Link is Up - 1Gbps/Full - flow control rx/tx" Purely a guess, but when you up the interface, it starts auto negotiation. That often involves resetting the PHY. If the PHY has already once completed autoneg, e.g. because of the boot loader, it will be initially UP. The reset will put it DOWN, and then once autoneg is complete, it will be Up again. Pure guess. Go read the code and see if I'm right. Andrew
[PATCH] net: thunderx: acpi: fix LMAC initialization
While probing BGX we request the appropriate QLM for its configuration and get the LMAC count from that request. Then, while reading the configured MAC values from the SSDT table, we need to save them in the proper mapping: BGX[i]->lmac[j].mac = to later provide for initialization stuff. In order to fill this mapping properly we need to add an LMAC index to be used during ACPI initialization, since at this moment bgx->lmac_count already contains the actual value. Signed-off-by: Vadim Lomovtsev --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index be30ad0..a3f4f83 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -47,8 +47,9 @@ struct lmac { struct bgx { u8 bgx_id; struct lmaclmac[MAX_LMAC_PER_BGX]; - int lmac_count; + u8 lmac_count; u8 max_lmac; + u8 acpi_lmac_idx; void __iomem*reg_base; struct pci_dev *pdev; boolis_dlm; @@ -1073,13 +1074,13 @@ static acpi_status bgx_acpi_register_phy(acpi_handle handle, if (acpi_bus_get_device(handle, &adev)) goto out; - acpi_get_mac_address(dev, adev, bgx->lmac[bgx->lmac_count].mac); + acpi_get_mac_address(dev, adev, bgx->lmac[bgx->acpi_lmac_idx].mac); - SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, dev); + SET_NETDEV_DEV(&bgx->lmac[bgx->acpi_lmac_idx].netdev, dev); - bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; + bgx->lmac[bgx->acpi_lmac_idx].lmacid = bgx->acpi_lmac_idx; + bgx->acpi_lmac_idx++; /* move to next LMAC */ out: - bgx->lmac_count++; return AE_OK; } -- 1.8.3.1
Re: [PATCH net-next 1/4] siphash: add cryptographically secure PRF
Eric Biggers wrote: > Hi Jason, just a few comments: > > On Fri, Jan 06, 2017 at 09:10:52PM +0100, Jason A. Donenfeld wrote: >> +#define SIPHASH_ALIGNMENT __alignof__(u64) >> +typedef u64 siphash_key_t[2]; > > I was confused by all the functions passing siphash_key_t "by value" until I > saw > that it's actually typedefed to u64[2]. Have you considered making it a > struct > instead, something like this? > > typedef struct { >u64 v[2]; > } siphash_key_t; If it's just an 128-bit value then we have u128 in crypto/b128ops.h that could be generalised for this. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
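The confusion Eric describes comes from how C treats array typedefs in parameter lists: a parameter of array type is adjusted to a pointer, so the key is never actually copied. A minimal userspace sketch of the difference follows; the ex_* types and functions are hypothetical and only mirror the two typedef shapes quoted above, they are not the siphash API.

```c
#include <stdint.h>

typedef uint64_t ex_key_array_t[2];		/* shape of the typedef in the patch */
typedef struct { uint64_t v[2]; } ex_key_struct_t;	/* shape suggested in the review */

/* Looks like pass-by-value, but k is really a uint64_t pointer to the caller's key. */
static void ex_clobber_array(ex_key_array_t k)
{
	k[0] = 0;
}

/* A struct parameter is copied, so the caller's key is left untouched. */
static void ex_clobber_struct(ex_key_struct_t k)
{
	k.v[0] = 0;
	(void)k;
}

static uint64_t ex_array_after_call(void)
{
	ex_key_array_t key = { 1, 2 };

	ex_clobber_array(key);
	return key[0];		/* the callee wrote through the pointer */
}

static uint64_t ex_struct_after_call(void)
{
	ex_key_struct_t key = { { 1, 2 } };

	ex_clobber_struct(key);
	return key.v[0];	/* unchanged, the callee only modified its copy */
}
```

Wrapping the array in a struct makes the calling convention visible at the call site: callers either pass the struct by value or explicitly pass a pointer to it.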