Re: [PATCH] w90p910_ether: include linux/interrupt.h
From: Arnd Bergmann
Date: Tue, 12 Sep 2017 14:31:48 +0200

> A randconfig build caused a compile failure:
>
> drivers/net/ethernet/nuvoton/w90p910_ether.c: In function 'w90p910_ether_close':
> drivers/net/ethernet/nuvoton/w90p910_ether.c:580:2: error: implicit declaration of function 'free_irq'; did you mean 'free_uid'? [-Werror=implicit-function-declaration]
>
> Adding the correct include fixes the problem.
>
> Signed-off-by: Arnd Bergmann

Applied.
Re: [PATCH net] net: bonding: fix tlb_dynamic_lb default value
From: Nikolay Aleksandrov
Date: Tue, 12 Sep 2017 15:10:05 +0300

> Commit 8b426dc54cf4 ("bonding: remove hardcoded value") changed the
> default value for tlb_dynamic_lb which led to either broken ALB mode
> (since tlb_dynamic_lb can be changed only in TLB) or setting TLB mode
> with tlb_dynamic_lb equal to 0.
> The first issue was recently fixed by setting tlb_dynamic_lb to 1 always
> when switching to ALB mode, but the default value is still wrong and
> we'll enter TLB mode with tlb_dynamic_lb equal to 0 if the mode is
> changed via netlink or sysfs. In order to restore the previous behaviour
> and default value simply remove the mode check around the default param
> initialization for tlb_dynamic_lb which will always set it to 1 as
> before.
>
> Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
> Signed-off-by: Nikolay Aleksandrov

Applied and queued up for -stable, thanks.
Re: [PATCH] ipv4: Namespaceify tcp_fastopen knob
From: Haishuang Yan
Date: Tue, 12 Sep 2017 18:30:57 +0800

> Different namespace application might require enable TCP Fast Open
> feature independently of the host.
>
> Reported-by: Luca BRUNO
> Signed-off-by: Haishuang Yan
 ...
> diff --git a/samples/bpf/test_ipip.sh b/samples/bpf/test_ipip.sh
> index 1969254..7bbc521 100755
> --- a/samples/bpf/test_ipip.sh
> +++ b/samples/bpf/test_ipip.sh
> @@ -173,6 +173,8 @@ function cleanup {
>  cleanup
>  echo "Testing IP tunnels..."
>  test_ipip
> +sleep 1
>  test_ipip6
> +sleep 1
>  test_ip6ip6
>  echo "*** PASS ***"

This seems like a completely unrelated change.
RE: [PATCH v3] iproute2: add support for GRE ignore-df knob
Guys, thanks heaps for this, much appreciated!

Cheers.
Mike

-----Original Message-----
From: Philip Prindeville [mailto:phil...@redfish-solutions.com]
Sent: Friday, 21 July 2017 10:35 AM
To: Stephen Hemminger
Cc: netdev@vger.kernel.org; Michele Lucini
Subject: Re: [PATCH v3] iproute2: add support for GRE ignore-df knob

> On Jul 20, 2017, at 6:26 PM, Stephen Hemminger wrote:
>
> On Thu, 20 Jul 2017 13:06:10 -0600
> "Philip Prindeville" wrote:
>
>> From: Philip Prindeville
>>
>> In the presence of firewalls which improperly block ICMP Unreachable
>> (including Fragmentation Required) messages, Path MTU Discovery is
>> prevented from working.
>>
>> The workaround is to handle IPv4 payloads opaquely, ignoring the DF
>> bit.
>>
>> Kernel commit 22a59be8b7693eb2d0897a9638f5991f2f8e4ddd ("net: ipv4:
>> Add ability to have GRE ignore DF bit in IPv4 payloads") is
>> complemented by this user-space changeset which exposes control of
>> this setting.
>>
>> Reviewed-by: Stephen Hemminger
>> Signed-off-by: Philip Prindeville
>
> Applied, thanks Philip

Thanks! Sorry I didn’t realize that the first submission a year ago hadn’t been applied and it took me this long to redux and resubmit it.

Michele: hopefully this comes out in your distro-of-choice fairly soon. Like I said, I thought this had already been rolled in.

-Philip
Re: [PATCH v4 2/2] ip6_tunnel: fix ip6 tunnel lookup in collect_md mode
From: Haishuang Yan
Date: Tue, 12 Sep 2017 17:47:57 +0800

> In collect_md mode, if the tun dev is down, it still can call
> __ip6_tnl_rcv to receive packets, and the rx statistics increase
> improperly.
>
> When the md tunnel is down, it's not necessary to increase RX drops
> for the tunnel device, packets would be received on fallback tunnel,
> and the RX drops on fallback device will be increased as expected.
>
> Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
> Cc: Alexei Starovoitov
> Signed-off-by: Haishuang Yan

Applied.
Re: [PATCH v4 1/2] ip_tunnel: fix ip tunnel lookup in collect_md mode
From: Haishuang Yan
Date: Tue, 12 Sep 2017 17:47:56 +0800

> In collect_md mode, if the tun dev is down, it still can call
> ip_tunnel_rcv to receive packets, and the rx statistics increase
> improperly.
>
> When the md tunnel is down, it's not necessary to increase RX drops
> for the tunnel device, packets would be received on fallback tunnel,
> and the RX drops on fallback device will be increased as expected.
>
> Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
> Cc: Pravin B Shelar
> Signed-off-by: Haishuang Yan

Applied.
Re: [patch net] mlxsw: spectrum: Prevent mirred-related crash on removal
From: Jiri Pirko
Date: Tue, 12 Sep 2017 08:50:53 +0200

> From: Yuval Mintz
>
> When removing the offloading of mirred actions under
> matchall classifiers, mlxsw would find the destination port
> associated with the offloaded action and utilize it for undoing
> the configuration.
>
> Depending on the order by which ports are removed, it's possible that
> the destination port would get removed before the source port.
> In such a scenario, when actions would be flushed for the source port
> mlxsw would perform an illegal dereference as the destination port is
> no longer listed.
>
> Since the only item necessary for undoing the configuration on the
> destination side is the port-id and that in turn is already maintained
> by mlxsw on the source-port, simply stop trying to access the
> destination port and use the port-id directly instead.
>
> Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
> Signed-off-by: Yuval Mintz
> Signed-off-by: Jiri Pirko

Applied and queued up for -stable, thanks.
Re: [Patch net v3 0/3] net_sched: fix filter chain reference counting
From: Cong Wang
Date: Mon, 11 Sep 2017 16:33:29 -0700

> This patchset fixes tc filter chain reference counting and nasty race
> conditions with RCU callbacks. Please see each patch for details.

Series applied, thanks Cong.
Re: [PATCH v2] openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()'
From: Christophe JAILLET
Date: Mon, 11 Sep 2017 21:56:20 +0200

> All other error handling paths in this function go through the 'error'
> label. This one should do the same.
>
> Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
> Signed-off-by: Christophe JAILLET

Applied.
Re: [PATCH net] tcp/dccp: remove reqsk_put() from inet_child_forget()
From: Eric Dumazet
Date: Mon, 11 Sep 2017 15:58:38 -0700

> From: Eric Dumazet
>
> Back in linux-4.4, I inadvertently put a call to reqsk_put() in
> inet_child_forget(), forgetting it could be called from two different
> points.
>
> In the case it is called from inet_csk_reqsk_queue_add(), we want to
> keep the reference on the request socket, since it is released later by
> the caller (tcp_v{4|6}_rcv())
>
> This bug never showed up because atomic_dec_and_test() was not signaling
> the underflow, and SLAB_DESTROY_BY_RCU semantic for request sockets
> prevented the request to be put in quarantine.
>
> Recent conversion of socket refcount from atomic_t to refcount_t finally
> exposed the bug.
>
> So move the reqsk_put() to inet_csk_listen_stop() to fix this.
>
> Thanks to Shankara Pailoor for using syzkaller and providing
> a nice set of .config and C repro.
 ...
> Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase")
> Signed-off-by: Eric Dumazet
> Reported-by: Shankara Pailoor
> Tested-by: Shankara Pailoor

Applied and queued up for -stable. Thanks.
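
The silent underflow described in this commit message can be illustrated in user space. What follows is only a minimal sketch using C11 atomics, not kernel code: `plain_dec_and_test` and `checked_dec_and_test` are hypothetical names that roughly imitate an unchecked decrement and a refcount_t-style checked decrement.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an unchecked decrement: returns true only
 * when the counter reaches exactly zero, and reports nothing when it
 * is decremented past zero. */
static bool plain_dec_and_test(atomic_int *cnt)
{
	return atomic_fetch_sub(cnt, 1) == 1;
}

/* Hypothetical stand-in for a checked decrement: refuses to go below
 * zero and reports the attempt through *underflow. */
static bool checked_dec_and_test(atomic_int *cnt, bool *underflow)
{
	int old = atomic_load(cnt);

	do {
		if (old <= 0) {
			*underflow = true;
			return false;
		}
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(cnt, &old, old - 1));

	*underflow = false;
	return old == 1;
}
```

With a counter that starts at 1, a second call to the unchecked variant leaves the counter at -1 without any indication, while the checked variant keeps it at 0 and flags the extra put. That matches the message's point that the extra reqsk_put() only became visible after the refcount_t conversion.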
Re: [PATCH v2 net] smsc95xx: Configure pause time to 0xffff when tx flow control enabled
From:
Date: Mon, 11 Sep 2017 17:43:11 +

> From: Nisar Sayed
>
> Configure pause time to 0xffff when tx flow control enabled
>
> Set pause time to 0xffff in the pause frame to indicate the
> partner to stop sending the packets. When RX buffer frees up,
> the device sends pause frame with pause time zero for partner to
> resume transmission.
>
> Fixes: 2f7ca802bdae ("Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver")
> Signed-off-by: Nisar Sayed

Applied.
Re: Regression in throughput between kvm guests over virtual bridge
On 2017-09-13 01:56, Matthew Rosato wrote:
> We are seeing a regression for a subset of workloads across KVM guests
> over a virtual bridge between host kernel 4.12 and 4.13. Bisecting
> points to c67df11f "vhost_net: try batch dequing from skb array"
>
> In the regressed environment, we are running 4 kvm guests, 2 running as
> uperf servers and 2 running as uperf clients, all on a single host.
> They are connected via a virtual bridge. The uperf client profile looks
> like:
>
> So, 1 tcp streaming instance per client. When upgrading the host kernel
> from 4.12->4.13, we see about a 30% drop in throughput for this
> scenario. After the bisect, I further verified that reverting c67df11f
> on 4.13 "fixes" the throughput for this scenario.
>
> On the other hand, if we increase the load by upping the number of
> streaming instances to 50 (nprocs="50") or even 10, we see instead a
> ~10% increase in throughput when upgrading host from 4.12->4.13.
>
> So it may be the issue is specific to "light load" scenarios. I would
> expect some overhead for the batching, but 30% seems significant... Any
> thoughts on what might be happening here?

Hi, thanks for the bisecting. Will try to see if I can reproduce.

Various factors could have an impact on stream performance. If possible, could you collect the #pkts and average packet size during the test? And if your guest version is above 4.12, could you please retry with napi_tx=true?

Thanks
Re: [PATCH] datapath: Fix an error handling path in 'ovs_nla_init_match_and_action()'
On Tue, Sep 12, 2017 at 3:20 AM, Christophe JAILLET wrote:
> All other error handling paths in this function go through the 'error'
> label. This one should do the same.
>
> Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
> Signed-off-by: Christophe JAILLET
> ---
> I think that the comment above the function could be improved. It looks
> like the commit log which has introduced this function.
>
> I'm also not sure that commit 9cc9a5cb176c is of any help. It is
> supposed to remove a warning, and I guess it does. But
> 'ovs_nla_init_match_and_action()'
> is called unconditionally from 'ovs_flow_cmd_set()'. So even if the stack
> used by each function is reduced, the overall stack should be the same, if
> not larger.
>
> So this commit sounds like adding a bug where the code was fine and states
> to fix an issue but, at the best, only hides it.
>
> Instead of fixing the code with the proposed patch, reverting the initial
> commit could also be considered.
> ---
>  net/openvswitch/datapath.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 76cf273a56c7..c3aec6227c91 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1112,7 +1112,8 @@ static int ovs_nla_init_match_and_action(struct net *net,
>                 if (!a[OVS_FLOW_ATTR_KEY]) {
>                         OVS_NLERR(log,
>                                   "Flow key attribute not present in set flow.");
> -                       return -EINVAL;
> +                       error = -EINVAL;
> +                       goto error;

Thanks for your report. But I really don't understand. In 'ovs_nla_init_match_and_action', we only init 'match' when OVS_FLOW_ATTR_KEY is set. If 'OVS_FLOW_ATTR_ACTIONS' is set, but not 'OVS_FLOW_ATTR_KEY', we can return directly because the match is not inited yet, and it is unnecessary to set its mask to NULL. Then ovs_flow_cmd_set can act on the returned value.

>                 }
>
>                 *acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], key,
> --
> 2.11.0
>
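
The question in this exchange is whether an early exit must pass through the shared cleanup label. Below is a minimal sketch in plain C of that shape. It is not the Open vSwitch code: `struct match_sketch`, `init_match_and_action_sketch` and its boolean parameters are hypothetical stand-ins chosen only to mirror the discussion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the match object discussed above. */
struct match_sketch {
	int *mask;
};

/* The caller is assumed to pass a zeroed struct match_sketch. */
static int init_match_and_action_sketch(bool have_key, bool have_actions,
					 bool actions_ok,
					 struct match_sketch *m)
{
	int error;

	if (have_key) {
		/* The match is only set up when a key is present. */
		m->mask = calloc(1, sizeof(*m->mask));
		if (!m->mask)
			return -ENOMEM;
	}

	if (have_actions) {
		if (!have_key) {
			/* Nothing was set up on this path, so a direct
			 * return and a jump to the label leave the
			 * same state behind. */
			error = -EINVAL;
			goto error;
		}
		if (!actions_ok) {
			error = -EINVAL;
			goto error;
		}
	}
	return 0;

error:
	/* Single exit path: undo the setup and clear the pointer. */
	free(m->mask);
	m->mask = NULL;
	return error;
}
```

In this sketch both positions are visible: routing every failure through the label keeps the function uniform, while the reviewer's point holds that the missing-key path has nothing to undo.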
Re: [PATCH v2] geneve: Fix setting ttl value in collect metadata mode
On Tue, Sep 12, 2017 at 12:05 AM, Haishuang Yan wrote:
> Similar to vxlan/ipip tunnel, if key->tos is zero in collect metadata
> mode, tos should also fallback to ip{4,6}_dst_hoplimit.
>
> Signed-off-by: Haishuang Yan
>
> ---
> Changes since v2:
>   * Make the commit message more clearer.
> ---
>  drivers/net/geneve.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index f640407..d52a65f 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -834,11 +834,10 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
>         sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
>         if (geneve->collect_md) {
>                 tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
> -               ttl = key->ttl;
>         } else {
>                 tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, ip_hdr(skb), skb);
> -               ttl = key->ttl ? : ip4_dst_hoplimit(&rt->dst);
>         }
> +       ttl = key->ttl ? : ip4_dst_hoplimit(&rt->dst);
>         df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
>

This changes the user API of Geneve collect-metadata mode. I do not see a good reason for this. Why can't the user set the right TTL for the flow?
Re: [PATCH v4 1/2] ip_tunnel: fix ip tunnel lookup in collect_md mode
On Tue, Sep 12, 2017 at 2:47 AM, Haishuang Yan wrote:
> In collect_md mode, if the tun dev is down, it still can call
> ip_tunnel_rcv to receive packets, and the rx statistics increase
> improperly.
>
> When the md tunnel is down, it's not necessary to increase RX drops
> for the tunnel device, packets would be received on fallback tunnel,
> and the RX drops on fallback device will be increased as expected.
>
> Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
> Cc: Pravin B Shelar
> Signed-off-by: Haishuang Yan

Acked-by: Pravin B Shelar
Memory leaks in conntrack
Hello,

While testing my TC filter patches (so not related to conntrack), the following memory leaks showed up:

unreferenced object 0x9b19ba551228 (size 128):
  comm "chronyd", pid 338, jiffies 4294910829 (age 53.188s)
  hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
    00 00 00 00 18 00 00 30 00 00 00 00 00 00 00 00  ...0
  backtrace:
    [] create_object+0x169/0x2aa
    [] kmemleak_alloc+0x25/0x41
    [] slab_post_alloc_hook+0x44/0x65
    [] __kmalloc_track_caller+0x113/0x146
    [] __krealloc+0x4a/0x69
    [] nf_ct_ext_add+0xe1/0x145
    [] init_conntrack+0x1f7/0x36e
    [] nf_conntrack_in+0x1d3/0x326
    [] ipv4_conntrack_local+0x4d/0x50
    [] nf_hook_slow+0x3c/0x9b
    [] nf_hook.constprop.40+0xbe/0xd8
    [] __ip_local_out+0xb3/0xbf
    [] ip_local_out+0x1c/0x36
    [] ip_send_skb+0x19/0x3d
    [] udp_send_skb+0x17e/0x1df
    [] udp_sendmsg+0x5a2/0x77c
unreferenced object 0x9b19a69b3340 (size 336):
  comm "chronyd", pid 338, jiffies 4294910868 (age 53.032s)
  hex dump (first 32 bytes):
    01 00 00 00 5a 5a 5a 5a 00 00 00 00 ad 4e ad de  .N..
    ff ff ff ff 5a 5a 5a 5a ff ff ff ff ff ff ff ff
  backtrace:
    [] create_object+0x169/0x2aa
    [] kmemleak_alloc+0x25/0x41
    [] slab_post_alloc_hook+0x44/0x65
    [] kmem_cache_alloc+0xd7/0x1f1
    [] __nf_conntrack_alloc+0xa2/0x146
    [] init_conntrack+0xb2/0x36e
    [] nf_conntrack_in+0x1d3/0x326
    [] ipv4_conntrack_local+0x4d/0x50
    [] nf_hook_slow+0x3c/0x9b
    [] nf_hook.constprop.40+0xbe/0xd8
    [] __ip_local_out+0xb3/0xbf
    [] ip_local_out+0x1c/0x36
    [] ip_send_skb+0x19/0x3d
    [] udp_send_skb+0x17e/0x1df
    [] udp_sendmsg+0x5a2/0x77c
    [] inet_sendmsg+0x37/0x5e

This seems new because I have never seen it before. I don't touch chronyd in my VM, so I have no idea why it sends out UDP packets; my guess is that it is some periodic packet.

I don't think I use conntrack either, since /proc/net/ip_conntrack does not exist.

Here is the related config of my kernel:

$ grep CONNTRACK .config
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
# CONFIG_NF_CONNTRACK_TIMEOUT is not set
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_AMANDA=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_H323=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_BROADCAST=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=y
CONFIG_NF_CONNTRACK_SNMP=y
CONFIG_NF_CONNTRACK_PPTP=y
CONFIG_NF_CONNTRACK_SANE=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NF_CONNTRACK_TFTP=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NF_CONNTRACK_IPV4=y
CONFIG_NF_CONNTRACK_IPV6=y

Please let me know if you need any other information.

Thanks.
Re: 319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression
First I’m super sorry for the top post, I’m at plumbers and I forgot to upload my muttrc to my new cloud instance, so I’m screwed using outlook.

I have a completely untested, uncompiled patch that I think will fix the problem, would you mind giving it a go?

Thanks,
Josef

On 9/12/17, 3:36 PM, "Laura Abbott" wrote:

    Hi,

    Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1432684
    of a regression with automatic spice port assignment. The libvirt team
    reduced this to the attached test case run as follows:

    In a separate terminal, qemu-kvm -vnc 127.0.0.1:0 to grab port 5900.
    Then do this:

    $ gcc bind-collision.c && ./a.out
    bind: Address already in use
    AF_INET check failed.
    $ gcc -D CHECK_IPV6 bind-collision.c && ./a.out
    AF_INET6 success
    AF_INET success
    $ gcc bind-collision.c && ./a.out
    AF_INET success

    Bisection showed this behavior to be caused by

    commit 319554f284dda9f2737d09df82ba3610bd8ddea3
    Author: Josef Bacik
    Date: Thu Jan 19 17:47:46 2017 -0500

        inet: don't use sk_v6_rcv_saddr directly

        When comparing two sockets we need to use inet6_rcv_saddr so we get a NULL
        sk_v6_rcv_saddr if the socket isn't AF_INET6, otherwise our comparison
        function can be wrong.

        Fixes: 637bc8b ("inet: reset tb->fastreuseport when adding a reuseport sk")
        Signed-off-by: Josef Bacik
        Signed-off-by: David S. Miller

    And reverting fixed both the standalone test case and the spice issue.
    Any ideas?

    Thanks,
    Laura

0001-net-set-tb-fast_sk_family.patch
Description: 0001-net-set-tb-fast_sk_family.patch
Re: [RFC PATCH] net: Introduce a socket option to enable picking tx queue based on rx queue.
On Tue, Sep 12, 2017 at 3:31 PM, Samudrala, Sridhar wrote:
>
>
> On 9/12/2017 8:47 AM, Eric Dumazet wrote:
>>
>> On Mon, 2017-09-11 at 23:27 -0700, Samudrala, Sridhar wrote:
>>>
>>> On 9/11/2017 8:53 PM, Eric Dumazet wrote:
>>>> On Mon, 2017-09-11 at 20:12 -0700, Tom Herbert wrote:
>>>>> Two ints in sock_common for this purpose is quite expensive and the
>>>>> use case for this is limited-- even if a RX->TX queue mapping were
>>>>> introduced to eliminate the queue pair assumption this still won't
>>>>> help if the receive and transmit interfaces are different for the
>>>>> connection. I think we really need to see some very compelling results
>>>>> to be able to justify this.
>>>
>>> Will try to collect and post some perf data with symmetric queue
>>> configuration.
>>>
>>>> Yes, this is unreasonable cost.
>>>>
>>>> XPS should really cover the case already.
>>>
>>> Eric,
>>>
>>> Can you clarify how XPS covers the RX-> TX queue mapping case?
>>> Is it possible to configure XPS to select TX queue based on the RX queue
>>> of a flow?
>>> IIUC, it is based on the CPU of the thread doing the transmit OR based
>>> on skb->priority to TC mapping?
>>> It may be possible to get this effect if the threads are pinned to a
>>> core, but if the app threads are
>>> freely moving, i am not sure how XPS can be configured to select the TX
>>> queue based on the RX queue of a flow.
>>
>> If application is freely moving, how NIC can properly select the RX
>> queue so that packets are coming to the appropriate queue ?
>
> The RX queue is selected via RSS and we don't want to move the flow based on
> where the thread is running.

Unless flow director is enabled on the Intel device... This was, I believe, one of the first attempts to introduce a queue pair notion to general purpose NICs. The idea was that the device records the TX queue for a flow and then uses that to determine receive queue in a symmetric fashion. aRFS is similar, but was under SW control how the mapping is done. As Eric mentioned there are scalability issues with these mechanisms, but we also found that flow director can easily reorder packets whenever the thread moves.

>>
>>
>> This is called aRFS, and it does not scale to millions of flows.
>> We tried in the past, and this went nowhere really, since the setup cost
>> is prohibitive and DDOS vulnerable.
>>
>> XPS will follow the thread, since selection is done on current cpu.
>>
>> The problem is RX side. If application is free to migrate, then special
>> support (aRFS) is needed from the hardware.
>
> This may be true if most of the rx processing is happening in the interrupt
> context.
> But with busy polling, i think we don't need aRFS as a thread should be
> able to poll
> any queue irrespective of where it is running.

It's not just a problem with interrupt processing, in general we like to have all receive processing and subsequent transmit of a reply to be done on one CPU. Silo'ing is good for performance and parallelism. This can sometimes be relaxed in situations where CPUs share a cache so crossing CPUs is not costly.

>>
>>
>> At least for passive connections, we already have all the support in the
>> kernel so that you can have one thread per NIC queue, dealing with
>> sockets that have incoming packets all received on one NIC RX queue.
>> (And of course all TX packets will use the symmetric TX queue)
>>
>> SO_REUSEPORT plus appropriate BPF filter can achieve that.
>>
>> Say you have 32 queues, 32 cpus.
>>
>> Simply use 32 listeners, 32 threads (or 32 pools of threads)
>
> Yes. This will work if each thread is pinned to a core associated with the
> RX interrupt.
> It may not be possible to pin the threads to a core.
> Instead we want to associate a thread to a queue and do all the RX and TX
> completion
> of a queue in the same thread context via busy polling.
>

When that happens it's possible for RX to be done on the completely wrong CPU which we know is suboptimal. However, this shouldn't negatively affect TX side since XPS will just use the queue appropriate for running CPU. Like Eric said, this is really a receive problem more than a transmit problem. Keeping them as independent paths seems to be a good approach.

Tom
Re: [PATCH] datapath: Fix an error handling path in 'ovs_nla_init_match_and_action()'
On 09/11/2017 12:20 PM, Christophe JAILLET wrote:
> All other error handling paths in this function go through the 'error'
> label. This one should do the same.
>
> Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
> Signed-off-by: Christophe JAILLET
> ---
> I think that the comment above the function could be improved. It looks
> like the commit log which has introduced this function.
>
> I'm also not sure that commit 9cc9a5cb176c is of any help. It is
> supposed to remove a warning, and I guess it does. But
> 'ovs_nla_init_match_and_action()'
> is called unconditionally from 'ovs_flow_cmd_set()'. So even if the stack
> used by each function is reduced, the overall stack should be the same, if
> not larger.
>
> So this commit sounds like adding a bug where the code was fine and states
> to fix an issue but, at the best, only hides it.

Having a large stack frame isn't really a bug per se. But the Linux kernel warns about stack frames that are too large so reordering the code to get the warning to go away seems fine to me.

> Instead of fixing the code with the proposed patch, reverting the initial
> commit could also be considered.

Then the warning will come back.

- Greg

> ---
>  net/openvswitch/datapath.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 76cf273a56c7..c3aec6227c91 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1112,7 +1112,8 @@ static int ovs_nla_init_match_and_action(struct net *net,
>                 if (!a[OVS_FLOW_ATTR_KEY]) {
>                         OVS_NLERR(log,
>                                   "Flow key attribute not present in set flow.");
> -                       return -EINVAL;
> +                       error = -EINVAL;
> +                       goto error;
>                 }
>
>                 *acts = get_flow_actions(net, a[OVS_FLOW_ATTR_ACTIONS], key,
319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression
Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1432684 of a regression with automatic spice port assignment. The libvirt team reduced this to the attached test case run as follows:

In a separate terminal, qemu-kvm -vnc 127.0.0.1:0 to grab port 5900. Then do this:

$ gcc bind-collision.c && ./a.out
bind: Address already in use
AF_INET check failed.
$ gcc -D CHECK_IPV6 bind-collision.c && ./a.out
AF_INET6 success
AF_INET success
$ gcc bind-collision.c && ./a.out
AF_INET success

Bisection showed this behavior to be caused by

commit 319554f284dda9f2737d09df82ba3610bd8ddea3
Author: Josef Bacik
Date: Thu Jan 19 17:47:46 2017 -0500

    inet: don't use sk_v6_rcv_saddr directly

    When comparing two sockets we need to use inet6_rcv_saddr so we get a NULL
    sk_v6_rcv_saddr if the socket isn't AF_INET6, otherwise our comparison
    function can be wrong.

    Fixes: 637bc8b ("inet: reset tb->fastreuseport when adding a reuseport sk")
    Signed-off-by: Josef Bacik
    Signed-off-by: David S. Miller

And reverting fixed both the standalone test case and the spice issue. Any ideas?

Thanks,
Laura

#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Reproducer for https://bugzilla.redhat.com/show_bug.cgi?id=1432684
   Simply do something like: qemu-kvm -vnc 127.0.0.1:0 */

#define PORT 5900

int check_port(int family)
{
    int fd = -1;
    int reuseaddr = 1;
    int v6only = 1;
    int addrlen;
    int ret = -1;
    bool ipv6 = false;
    struct sockaddr *addr;
    struct sockaddr_in6 addr6 = {
        .sin6_family = AF_INET6,
        .sin6_port = htons(PORT),
        .sin6_addr = in6addr_any
    };
    struct sockaddr_in addr4 = {
        .sin_family = AF_INET,
        .sin_port = htons(PORT),
        .sin_addr.s_addr = htonl(INADDR_ANY)
    };

    /* Pick the wildcard address that matches the requested family. */
    if (family == AF_INET6) {
        addr = (struct sockaddr*)&addr6;
        addrlen = sizeof(addr6);
        ipv6 = true;
    } else if (family == AF_INET) {
        addr = (struct sockaddr*)&addr4;
        addrlen = sizeof(addr4);
    } else {
        printf("Unknown family\n");
        goto out;
    }

    if ((fd = socket(family, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        goto out;
    }

    /* Keep the IPv6 socket from also claiming the IPv4 port. */
    if (ipv6 && setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
                           (void*)&v6only, sizeof(v6only)) < 0) {
        perror("setsockopt IPV6_V6ONLY");
        goto out;
    }

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                   &reuseaddr, sizeof(reuseaddr)) < 0) {
        perror("setsockopt SO_REUSEADDR");
        goto out;
    }

    /* The bind is expected to fail while qemu-kvm holds the port. */
    if (bind(fd, addr, addrlen) < 0) {
        perror("bind");
        goto out;
    }

    ret = 0;

out:
    close(fd);
    return ret;
}

int main(void)
{
#ifdef CHECK_IPV6
    if (check_port(AF_INET6) < 0) {
        printf("AF_INET6 check failed.\n");
        return -1;
    }
    printf("AF_INET6 success\n");
#endif

    if (check_port(AF_INET) < 0) {
        printf("AF_INET check failed.\n");
        return -1;
    }
    printf("AF_INET success\n");

    return 0;
}
Re: [PATCH] ieee802154: fix gcc-4.9 warnings
On Tue, 2017-09-12 at 12:16 +0200, Arnd Bergmann wrote:
> All older compiler versions up to gcc-4.9 produce these
> harmless warnings:
>
> drivers/net/ieee802154/ca8210.c: In function 'ca8210_skb_tx':
> drivers/net/ieee802154/ca8210.c:1947:9: warning: missing braces around initializer [-Wmissing-braces]
>
> This changes the syntax to something that works on all versions
> without warnings.
>
> Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
[]
> diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
[]
> @@ -1944,7 +1944,7 @@ static int ca8210_skb_tx(
>  )
>  {
>         int status;
> -       struct ieee802154_hdr header = { 0 };
> +       struct ieee802154_hdr header = { };
>         struct secspec secspec;
>         unsigned int mac_len;

Presumably gcc does this because the first member of struct ieee802154_hdr is another struct.

I wonder if "struct foo bar = { 0 };" should be discouraged by checkpatch.

Right now it's about 4:3 in favor of

	struct foo bar = {};
over
	struct foo bar = { 0 };

$ git grep -E "struct\s+\w+\s+\w+\s*=\s*\{\s*0\s*\}\s*[,;]" | wc -l
826
$ git grep -E "struct\s+\w+\s+\w+\s*=\s*\{\s*\}\s*[,;]" | wc -l
990

There are many instances on multiple lines too. The git grep above doesn't span multiple lines.
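
The initializer forms discussed in this message can be put side by side in a small sketch. The types below are hypothetical (`inner_sketch`, `outer_sketch`) and are not the ieee802154 structures; they only reproduce the shape where the first member is itself a struct, which appears to be what makes older gcc warn on `{ 0 }`.

```c
#include <stdint.h>

/* Hypothetical types: the first member of the outer struct is itself
 * a struct, so "{ 0 }" initializes a nested aggregate without braces. */
struct inner_sketch {
	uint8_t type;
	uint8_t flags;
};

struct outer_sketch {
	struct inner_sketch fc;
	uint8_t seq;
};

/* Fully braced form: standard C, with braces around the nested
 * initializer that -Wmissing-braces asks for. */
static struct outer_sketch zeroed_braced(void)
{
	struct outer_sketch h = { { 0 } };

	return h;
}

/* Empty-brace form, as in the patch: a GNU C extension before C23. */
static struct outer_sketch zeroed_empty(void)
{
	struct outer_sketch h = { };

	return h;
}
```

Both helpers return a struct whose members are all zero. The empty-brace form has the advantage that it does not depend on what the first member happens to be.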
Re: [RFC PATCH] net: Introduce a socket option to enable picking tx queue based on rx queue.
On 9/12/2017 8:47 AM, Eric Dumazet wrote:
> On Mon, 2017-09-11 at 23:27 -0700, Samudrala, Sridhar wrote:
>> On 9/11/2017 8:53 PM, Eric Dumazet wrote:
>>> On Mon, 2017-09-11 at 20:12 -0700, Tom Herbert wrote:
>>>> Two ints in sock_common for this purpose is quite expensive and the
>>>> use case for this is limited-- even if a RX->TX queue mapping were
>>>> introduced to eliminate the queue pair assumption this still won't
>>>> help if the receive and transmit interfaces are different for the
>>>> connection. I think we really need to see some very compelling results
>>>> to be able to justify this.
>>
>> Will try to collect and post some perf data with symmetric queue
>> configuration.
>>
>>> Yes, this is unreasonable cost.
>>>
>>> XPS should really cover the case already.
>>
>> Eric,
>>
>> Can you clarify how XPS covers the RX-> TX queue mapping case?
>> Is it possible to configure XPS to select TX queue based on the RX queue
>> of a flow?
>> IIUC, it is based on the CPU of the thread doing the transmit OR based
>> on skb->priority to TC mapping?
>> It may be possible to get this effect if the threads are pinned to a
>> core, but if the app threads are freely moving, i am not sure how XPS
>> can be configured to select the TX queue based on the RX queue of a flow.
>
> If application is freely moving, how NIC can properly select the RX
> queue so that packets are coming to the appropriate queue ?

The RX queue is selected via RSS and we don't want to move the flow based on where the thread is running.

> This is called aRFS, and it does not scale to millions of flows.
> We tried in the past, and this went nowhere really, since the setup cost
> is prohibitive and DDOS vulnerable.
>
> XPS will follow the thread, since selection is done on current cpu.
>
> The problem is RX side. If application is free to migrate, then special
> support (aRFS) is needed from the hardware.

This may be true if most of the rx processing is happening in the interrupt context. But with busy polling, i think we don't need aRFS as a thread should be able to poll any queue irrespective of where it is running.

> At least for passive connections, we already have all the support in the
> kernel so that you can have one thread per NIC queue, dealing with
> sockets that have incoming packets all received on one NIC RX queue.
> (And of course all TX packets will use the symmetric TX queue)
>
> SO_REUSEPORT plus appropriate BPF filter can achieve that.
>
> Say you have 32 queues, 32 cpus.
>
> Simply use 32 listeners, 32 threads (or 32 pools of threads)

Yes. This will work if each thread is pinned to a core associated with the RX interrupt. It may not be possible to pin the threads to a core. Instead we want to associate a thread to a queue and do all the RX and TX completion of a queue in the same thread context via busy polling.

Thanks
Sridhar
Re: [PATCH 2/2] net: qcom/emac: add software control for pause frame mode
On 08/01/2017 04:37 PM, Timur Tabi wrote:
> The EMAC has the option of sending only a single pause frame when
> flow control is enabled and the RX queue is full. Although sending
> only one pause frame has little value, this would allow admins to
> enable automatic flow control without having to worry about the EMAC
> flooding nearby switches with pause frames if the kernel hangs.
>
> The option is enabled by using the single-pause-mode private flag.
>
> Signed-off-by: Timur Tabi

Dave,

I don't see this patch in net-next. Can you pick it up for 4.14?

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
On Tue, Sep 12, 2017 at 2:36 PM, Jiri Pirko wrote:
> Tue, Sep 12, 2017 at 11:10:22PM CEST, xiyou.wangc...@gmail.com wrote:
>>On Tue, Sep 12, 2017 at 3:40 AM, Jiri Pirko wrote:
>>> This patch helps:
>>
>>Looks good to me. Please feel free to submit a formal patch.
>
> Okay, I will send the patch to you formally so you can add it as a first
> patch of your patchset.

I can carry it by myself if it fits to this patchset. However, I believe it should be independent since it has to be backported much further than this patchset. I don't know why no one triggered the crash before call_rcu() was introduced there.

Anyway, I believe you should submit your patch alone, either before or after this patchset, there should be no conflict.
Re: [PATCH net] net: systemport: Fix 64-bit stats deadlock
On 09/12/2017 02:38 PM, Eric Dumazet wrote:
> On Tue, 2017-09-12 at 13:14 -0700, Florian Fainelli wrote:
>> We can enter a deadlock situation because there is no sufficient protection
>> when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
>> contexts running in softirq, this can lead to the following lockdep splat and
>> actual deadlock was experienced as well with an iperf session in the
>> background
>> and a while loop doing ifconfig + ethtool.
>
>> So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
>> since it does not appear to be useful for anything. No inconsistency was
>> observed with either ifconfig or ethtool, global TX counts equal the sum of
>> per-queue TX counts on a 32-bit architecture.
>>
>> Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
>> Signed-off-by: Florian Fainelli
>> ---
>> drivers/net/ethernet/broadcom/bcmsysport.c | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c
>> b/drivers/net/ethernet/broadcom/bcmsysport.c
>> index a6572b51435a..c3c53f6cd9e6 100644
>> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
>> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
>> @@ -1735,11 +1735,8 @@ static void bcm_sysport_get_stats64(struct net_device
>> *dev,
>> stats->tx_packets += tx_packets;
>> }
>>
>> -/* lockless update tx_bytes and tx_packets */
>> -u64_stats_update_begin(>syncp);
>
> Yes, this u64_stats_update_begin()/u64_stats_update_end() is bogus
>
> But why do we even write on tx_bytes/tx_packets here ???

That's for the ethtool -S netdev stats copy (that's on me, I added that in the driver initial version), so yes, not very robust...

>
> Seems very wrong anyway.
>
> (ethtool -S does not call bcm_sysport_get_stats64() to refresh them )

Yes that might actually be the simplest way to get this fixed.

>
>> stats64->tx_bytes = stats->tx_bytes;
>> stats64->tx_packets = stats->tx_packets;
>> -u64_stats_update_end(>syncp);
>>
>> do {
>> start = u64_stats_fetch_begin_irq(>syncp);
>
>

--
Florian
Re: [PATCH net] net: systemport: Fix 64-bit stats deadlock
On Tue, 2017-09-12 at 13:14 -0700, Florian Fainelli wrote:
> We can enter a deadlock situation because there is no sufficient protection
> when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
> contexts running in softirq, this can lead to the following lockdep splat and
> actual deadlock was experienced as well with an iperf session in the
> background
> and a while loop doing ifconfig + ethtool.

> So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
> since it does not appear to be useful for anything. No inconsistency was
> observed with either ifconfig or ethtool, global TX counts equal the sum of
> per-queue TX counts on a 32-bit architecture.
>
> Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
> Signed-off-by: Florian Fainelli
> ---
> drivers/net/ethernet/broadcom/bcmsysport.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c
> b/drivers/net/ethernet/broadcom/bcmsysport.c
> index a6572b51435a..c3c53f6cd9e6 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -1735,11 +1735,8 @@ static void bcm_sysport_get_stats64(struct net_device
> *dev,
> stats->tx_packets += tx_packets;
> }
>
> - /* lockless update tx_bytes and tx_packets */
> - u64_stats_update_begin(>syncp);

Yes, this u64_stats_update_begin()/u64_stats_update_end() is bogus

But why do we even write on tx_bytes/tx_packets here ???

Seems very wrong anyway.

(ethtool -S does not call bcm_sysport_get_stats64() to refresh them )

> stats64->tx_bytes = stats->tx_bytes;
> stats64->tx_packets = stats->tx_packets;
> - u64_stats_update_end(>syncp);
>
> do {
> start = u64_stats_fetch_begin_irq(>syncp);
Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
Tue, Sep 12, 2017 at 11:10:22PM CEST, xiyou.wangc...@gmail.com wrote:
>On Tue, Sep 12, 2017 at 3:40 AM, Jiri Pirko wrote:
>> Tue, Sep 12, 2017 at 11:42:15AM CEST, j...@resnulli.us wrote:
>>>Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangc...@gmail.com wrote:
>>>>gen estimator has been rewritten in commit 1c0d32fde5bd
>>>>("net_sched: gen_estimator: complete rewrite of rate estimators"),
>>>>the caller is no longer needed to wait for a grace period.
>>>>So this patch gets rid of it.
>>>>
>>>>This also completely closes a race condition between action free
>>>>path and filter chain add/remove path for the following patch.
>>>>Because otherwise the nested RCU callback can't be caught by
>>>>rcu_barrier().
>>>>
>>>>Please see also the comments in code.
>>>
>>>Looks like this is causing a null pointer dereference bug for me, 100%
>>>of the time. Just add and remove any rule with action and you get:
>>>
>>
>> [...]
>>
>>>
>>>Looks like you need to save owner of the module before you call
>>>__tcf_idr_release so you can later on use it for module_put
>
>Why do you believe it is this patch introduces the bug?
>
>That code has been there since the beginning of git history:
>
>+ for (a = act; a; a = act) {
>+ if (a->ops && a->ops->cleanup) {
>+ DPRINTK("tcf_action_destroy destroying %p next %p\n",
>+ a, a->next);
>+ if (a->ops->cleanup(a, bind) == ACT_P_DELETED)
>+ module_put(a->ops->owner);
>+ act = act->next;
>
>Seems to be a very old one. The reason why it exposes, I guess,
>is call_rcu() somehow delays the free after module_put().

Yeah, looks like the race was just hard to hit. However with your patch, it is very easy to hit.

>
>
>>
>> This patch helps:
>
>Looks good to me. Please feel free to submit a formal patch.

Okay, I will send the patch to you formally so you can add it as a first patch of your patchset.
Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
On Tue, Sep 12, 2017 at 3:40 AM, Jiri Pirko wrote:
> Tue, Sep 12, 2017 at 11:42:15AM CEST, j...@resnulli.us wrote:
>>Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangc...@gmail.com wrote:
>>>gen estimator has been rewritten in commit 1c0d32fde5bd
>>>("net_sched: gen_estimator: complete rewrite of rate estimators"),
>>>the caller is no longer needed to wait for a grace period.
>>>So this patch gets rid of it.
>>>
>>>This also completely closes a race condition between action free
>>>path and filter chain add/remove path for the following patch.
>>>Because otherwise the nested RCU callback can't be caught by
>>>rcu_barrier().
>>>
>>>Please see also the comments in code.
>>
>>Looks like this is causing a null pointer dereference bug for me, 100%
>>of the time. Just add and remove any rule with action and you get:
>>
>
> [...]
>
>>
>>Looks like you need to save owner of the module before you call
>>__tcf_idr_release so you can later on use it for module_put

Why do you believe it is this patch introduces the bug?

That code has been there since the beginning of git history:

+ for (a = act; a; a = act) {
+ if (a->ops && a->ops->cleanup) {
+ DPRINTK("tcf_action_destroy destroying %p next %p\n",
+ a, a->next);
+ if (a->ops->cleanup(a, bind) == ACT_P_DELETED)
+ module_put(a->ops->owner);
+ act = act->next;

Seems to be a very old one. The reason why it exposes, I guess, is call_rcu() somehow delays the free after module_put().

>
> This patch helps:

Looks good to me. Please feel free to submit a formal patch.
Re: Can libpcap filter on vlan tags when vlans are hardware-accelerated?
On Tue, Sep 12, 2017 at 11:54:43AM -0700, Ben Greear wrote:
> It does not appear to work on Fedora-26, and I'm curious if someone
> knows what needs doing to get this support working?

It's rather complicated. The "vlan" and "vlan " filters didn't handle the case when vlan information is passed in metadata until commit 04660eb1e561 ("Use BPF extensions in compiled filters"), i.e. libpcap 1.7.0. Unfortunately that commit made libpcap always check only metadata for the first outermost vlan tag so that it broke the case when vlan information is passed in packet itself (which is less frequent today).

To handle both cases correctly, you would need libpcap with commits d739b068ac29 ("Make VLAN filter handle both metadata and inline tags") and 7c7a19fbd9af ("Fix logic of combined VLAN test") and also the optimizer fix from

https://github.com/the-tcpdump-group/libpcap/pull/582/commits/075015a3d17a

(without it the filters generate incorrect BPF in some cases unless the optimizer is disabled). As far as I can see, these commits are not in any release yet.

Michal Kubecek
Re: [PATCH/RFC net-next 2/2] net/sched: allow flower to match tunnel options
On Tue, Sep 12, 2017 at 5:20 PM, Simon Horman wrote:
> Allow matching on options in tunnel headers.
> This makes use of existing tunnel metadata support.

Simon,

This patch is about matching on tunnel options, right? but

> Options are a bytestring of up to 256 bytes.
> Tunnel implementations may support less or more options,
> or no options at all.
>
> # ip link add name geneve0 type geneve dstport 0 external
> # tc qdisc add dev eth0 ingress
> # tc qdisc del dev eth0 ingress; tc qdisc add dev eth0 ingress
> # tc filter add dev eth0 protocol ip parent : \
> flower indev eth0 \
> ip_proto udp \
> action tunnel_key \
> set src_ip 10.0.99.192 \
> dst_ip 10.0.99.193 \
> dst_port 4789 \
> id 11 \
> opts 0102800100800022 \
> action mirred egress redirect dev geneve0

the example here is on how to use tunnel options in the tunnel set key actions..

And the other way around in the other patch... the patch is about the tunnel key set action and the example shows how to match that in flower...

I guess you want to swap the relevant of the change log.

Anyway, is there any human readable/understandable representation of these options? e.g what does 0102800100800022 means for geneve?
[PATCH net] net: systemport: Fix 64-bit stats deadlock
We can enter a deadlock situation because there is no sufficient protection when ndo_get_stats64() runs in process context to guard against RX or TX NAPI contexts running in softirq, this can lead to the following lockdep splat and actual deadlock was experienced as well with an iperf session in the background and a while loop doing ifconfig + ethtool.

[5.780350]
[5.784679] WARNING: inconsistent lock state
[5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted
[5.794561]
[5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[5.810175] (>seq#2){+.?...}, at: [] bcm_sysport_tx_reclaim+0x30/0x54
[5.818327] {SOFTIRQ-ON-W} state was registered at:
[5.823278] bcm_sysport_get_stats64+0x17c/0x258
[5.828053] dev_get_stats+0x38/0xac
[5.831776] rtnl_fill_stats+0x30/0x118
[5.835761] rtnl_fill_ifinfo+0x538/0xe24
[5.839921] rtmsg_ifinfo_build_skb+0x6c/0xd8
[5.844430] rtmsg_ifinfo_event.part.5+0x14/0x44
[5.849201] rtmsg_ifinfo+0x20/0x28
[5.852837] register_netdevice+0x628/0x6b8
[5.857171] register_netdev+0x14/0x24
[5.861051] bcm_sysport_probe+0x30c/0x438
[5.865280] platform_drv_probe+0x50/0xb0
[5.869418] driver_probe_device+0x2e8/0x450
[5.873817] __driver_attach+0x104/0x120
[5.877871] bus_for_each_dev+0x7c/0xc0
[5.881834] bus_add_driver+0x1b0/0x270
[5.885797] driver_register+0x78/0xf4
[5.889675] do_one_initcall+0x54/0x190
[5.893646] kernel_init_freeable+0x144/0x1d0
[5.898135] kernel_init+0x8/0x110
[5.901665] ret_from_fork+0x14/0x2c
[5.905363] irq event stamp: 24263
[5.908804] hardirqs last enabled at (24262): [] net_rx_action+0xc4/0x4e4
[5.916624] hardirqs last disabled at (24263): [] _raw_spin_lock_irqsave+0x1c/0x98
[5.925143] softirqs last enabled at (24258): [] irq_enter+0x84/0x98
[5.932524] softirqs last disabled at (24259): [] irq_exit+0x108/0x16c
[5.939985]
[5.939985] other info that might help us debug this:
[5.946576] Possible unsafe locking scenario:
[5.946576]
[5.952556]        CPU0
[5.955031]
[5.957506] lock(>seq#2);
[5.960955]
[5.963604] lock(>seq#2);
[5.967227]
[5.967227] *** DEADLOCK ***
[5.967227]
[5.973222] 1 lock held by swapper/0/0:
[5.977092] #0: (&(>lock)->rlock){..-...}, at: [] bcm_sysport_tx_reclaim+0x20/0x54

So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64() since it does not appear to be useful for anything. No inconsistency was observed with either ifconfig or ethtool, global TX counts equal the sum of per-queue TX counts on a 32-bit architecture.

Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index a6572b51435a..c3c53f6cd9e6 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1735,11 +1735,8 @@ static void bcm_sysport_get_stats64(struct net_device *dev,
 		stats->tx_packets += tx_packets;
 	}
 
-	/* lockless update tx_bytes and tx_packets */
-	u64_stats_update_begin(>syncp);
 	stats64->tx_bytes = stats->tx_bytes;
 	stats64->tx_packets = stats->tx_packets;
-	u64_stats_update_end(>syncp);
 
 	do {
 		start = u64_stats_fetch_begin_irq(>syncp);
-- 
1.9.1
[PATCH] net: vrf: avoid gcc-4.6 warning
When building an allmodconfig kernel with gcc-4.6, we get a rather odd warning:

drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]

I have no idea what this warning is even trying to say, but it does seem like a false positive. Reordering the initialization to match the structure definition gets rid of the warning, and might also avoid whatever gcc thinks is wrong here.

Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: Arnd Bergmann
---
 drivers/net/vrf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7e19051f3230..9b243e6f3008 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -957,12 +957,12 @@ static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
 	struct flowi6 fl6 = {
+		.flowi6_iif = ifindex,
+		.flowi6_mark= skb->mark,
+		.flowi6_proto = iph->nexthdr,
 		.daddr = iph->daddr,
 		.saddr = iph->saddr,
 		.flowlabel = ip6_flowinfo(iph),
-		.flowi6_mark= skb->mark,
-		.flowi6_proto = iph->nexthdr,
-		.flowi6_iif = ifindex,
 	};
 	struct net *net = dev_net(vrf_dev);
 	struct rt6_info *rt6;
-- 
2.9.0
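A note on why the reordering above should be behavior-neutral: with C99 designated initializers, each named member receives its value no matter where it appears in the list. The sketch below is only an illustration under that assumption. `struct flow_demo` and the three helper functions are hypothetical names, not the kernel's `struct flowi6` or anything from `vrf.c`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct flowi6; field names are illustrative. */
struct flow_demo {
	int iif;
	unsigned int mark;
	unsigned char proto;
	unsigned int daddr;
	unsigned int saddr;
};

/* Designated initializers listed in declaration order. */
struct flow_demo flow_in_order(int iif, unsigned int mark, unsigned char proto)
{
	struct flow_demo f = {
		.iif   = iif,
		.mark  = mark,
		.proto = proto,
		.daddr = 0x0a000001u,
		.saddr = 0x0a000002u,
	};
	return f;
}

/* Same values, listed in a different order. */
struct flow_demo flow_reordered(int iif, unsigned int mark, unsigned char proto)
{
	struct flow_demo f = {
		.daddr = 0x0a000001u,
		.saddr = 0x0a000002u,
		.mark  = mark,
		.proto = proto,
		.iif   = iif,
	};
	return f;
}

/* Field-by-field comparison; memcmp() would also compare padding bytes. */
bool flow_equal(const struct flow_demo *a, const struct flow_demo *b)
{
	return a->iif == b->iif && a->mark == b->mark && a->proto == b->proto &&
	       a->daddr == b->daddr && a->saddr == b->saddr;
}
```

Calling `flow_in_order(3, 7, 58)` and `flow_reordered(3, 7, 58)` and comparing the results with `flow_equal()` should report equality, which is why the patch can be read as a pure warning workaround.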
[PATCH] ravb: document R8A77970 bindings
R-Car V3M (R8A77970) SoC also has the R-Car gen3 compatible EtherAVB device, so document the SoC specific bindings.

Signed-off-by: Sergei Shtylyov

---
The patch is against DaveM's 'net-next.git' repo but I wouldn't mind if it's applied to 'net.git' instead. :-)

 Documentation/devicetree/bindings/net/renesas,ravb.txt |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: net-next/Documentation/devicetree/bindings/net/renesas,ravb.txt
===
--- net-next.orig/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ net-next/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -17,6 +17,7 @@ Required properties:
       - "renesas,etheravb-r8a7795" for the R8A7795 SoC.
       - "renesas,etheravb-r8a7796" for the R8A7796 SoC.
+      - "renesas,etheravb-r8a77970" for the R8A77970 SoC.
       - "renesas,etheravb-rcar-gen3" as a fallback for the above R-Car Gen3 devices.
@@ -40,7 +41,7 @@ Optional properties:
 - interrupt-parent: the phandle for the interrupt controller that services interrupts for this device.
 - interrupt-names: A list of interrupt names.
-  For the R8A779[56] SoCs this property is mandatory;
+  For the R-Car Gen 3 SoCs this property is mandatory;
   it should include one entry per channel, named "ch%u", where %u is the channel number ranging from 0 to 24.
   For other SoCs this property is optional; if present
Re: Can libpcap filter on vlan tags when vlans are hardware-accelerated?
On 09/12/2017 11:54 AM, Ben Greear wrote:
> It does not appear to work on Fedora-26, and I'm curious if someone
> knows what needs doing to get this support working?
>
> Thanks,
> Ben

Gah, I spoke too soon. system-test guy says it works on cmd-line, but not when we try to make it work in another way...could be local bug, I'll poke at this more.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Can libpcap filter on vlan tags when vlans are hardware-accelerated?
It does not appear to work on Fedora-26, and I'm curious if someone knows what needs doing to get this support working?

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Regression in throughput between kvm guests over virtual bridge
We are seeing a regression for a subset of workloads across KVM guests over a virtual bridge between host kernel 4.12 and 4.13. Bisecting points to c67df11f "vhost_net: try batch dequing from skb array"

In the regressed environment, we are running 4 kvm guests, 2 running as uperf servers and 2 running as uperf clients, all on a single host. They are connected via a virtual bridge. The uperf client profile looks like:

So, 1 tcp streaming instance per client. When upgrading the host kernel from 4.12->4.13, we see about a 30% drop in throughput for this scenario. After the bisect, I further verified that reverting c67df11f on 4.13 "fixes" the throughput for this scenario.

On the other hand, if we increase the load by upping the number of streaming instances to 50 (nprocs="50") or even 10, we see instead a ~10% increase in throughput when upgrading host from 4.12->4.13.

So it may be the issue is specific to "light load" scenarios. I would expect some overhead for the batching, but 30% seems significant... Any thoughts on what might be happening here?
Re: [PATCH net] net: bonding: fix tlb_dynamic_lb default value
On Tue, Sep 12, 2017 at 5:10 AM, Nikolay Aleksandrov wrote:
> Commit 8b426dc54cf4 ("bonding: remove hardcoded value") changed the
> default value for tlb_dynamic_lb which lead to either broken ALB mode
> (since tlb_dynamic_lb can be changed only in TLB) or setting TLB mode
> with tlb_dynamic_lb equal to 0.
> The first issue was recently fixed by setting tlb_dynamic_lb to 1 always
> when switching to ALB mode, but the default value is still wrong and
> we'll enter TLB mode with tlb_dynamic_lb equal to 0 if the mode is
> changed via netlink or sysfs. In order to restore the previous behaviour
> and default value simply remove the mode check around the default param
> initialization for tlb_dynamic_lb which will always set it to 1 as
> before.
>
> Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
> Signed-off-by: Nikolay Aleksandrov

Acked-by: Mahesh Bandewar

> ---
> drivers/net/bonding/bond_main.c | 17 +++--
> 1 file changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index fc63992ab0e0..c99dc59d729b 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4289,7 +4289,7 @@ static int bond_check_params(struct bond_params *params)
> int bond_mode = BOND_MODE_ROUNDROBIN;
> int xmit_hashtype = BOND_XMIT_POLICY_LAYER2;
> int lacp_fast = 0;
> - int tlb_dynamic_lb = 0;
> + int tlb_dynamic_lb;
>
> /* Convert string parameters. */
> if (mode) {
> @@ -4601,16 +4601,13 @@ static int bond_check_params(struct bond_params
> *params)
> }
> ad_user_port_key = valptr->value;
>
> - if ((bond_mode == BOND_MODE_TLB) || (bond_mode == BOND_MODE_ALB)) {
> - bond_opt_initstr(, "default");
> - valptr = bond_opt_parse(bond_opt_get(BOND_OPT_TLB_DYNAMIC_LB),
> - );
> - if (!valptr) {
> - pr_err("Error: No tlb_dynamic_lb default value");
> - return -EINVAL;
> - }
> - tlb_dynamic_lb = valptr->value;
> + bond_opt_initstr(, "default");
> + valptr = bond_opt_parse(bond_opt_get(BOND_OPT_TLB_DYNAMIC_LB),
> );
> + if (!valptr) {
> + pr_err("Error: No tlb_dynamic_lb default value");
> + return -EINVAL;
> }
> + tlb_dynamic_lb = valptr->value;
>
> if (lp_interval == 0) {
> pr_warn("Warning: ip_interval must be between 1 and %d, so it
> was reset to %d\n",
> --
> 2.1.4
>
[PATCH] VSOCK: fix uapi/linux/vm_sockets.h incomplete types
This patch fixes the following compiler errors when userspace applications use the vm_sockets.h header:

include/uapi/linux/vm_sockets.h:148:32: error: invalid application of ‘sizeof’ to incomplete type ‘struct sockaddr’
  unsigned char svm_zero[sizeof(struct sockaddr) -
                                ^~
include/uapi/linux/vm_sockets.h:149:18: error: ‘sa_family_t’ undeclared here (not in a function)
  sizeof(sa_family_t) -
         ^~~

Two issues:

1. In the kernel struct sockaddr comes in via but in userspace is required.

2. struct sockaddr_vm has a __kernel_sa_family_t field so let's be consistent and use the same type for the sizeof(sa_family_t) calculation.

Currently userspace applications work around this broken header by first including . In the kernel there is no compiler error because provides everything. It's worth fixing the header file though.

Cc: Jorgen Hansen
Signed-off-by: Stefan Hajnoczi
---
 include/uapi/linux/vm_sockets.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index b4ed5d895699..4ae5c625ac56 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -18,6 +18,10 @@
 
 #include
 
+#ifndef __KERNEL__
+#include /* struct sockaddr */
+#endif
+
 /* Option name for STREAM socket buffer size. Use as the option name in
  * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
  * specifies the size of the buffer underlying a vSockets STREAM socket.
@@ -146,7 +150,7 @@ struct sockaddr_vm {
 	unsigned int svm_port;
 	unsigned int svm_cid;
 	unsigned char svm_zero[sizeof(struct sockaddr) -
-			       sizeof(sa_family_t) -
+			       sizeof(__kernel_sa_family_t) -
 			       sizeof(unsigned short) -
 			       sizeof(unsigned int) -
 			       sizeof(unsigned int)];
 };
-- 
2.13.5
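The svm_zero arithmetic in the hunk above pads the structure out to the size of struct sockaddr. A minimal userspace sketch of the same calculation follows, assuming a Linux system where `<sys/socket.h>` supplies `struct sockaddr` and `sa_family_t`. `struct sockaddr_vm_demo` and the two helper functions are hypothetical names that mirror the layout for illustration; they are not the real uapi header.

```c
#include <stddef.h>
#include <sys/socket.h>  /* struct sockaddr and sa_family_t in userspace */

/* Hypothetical mirror of struct sockaddr_vm, for illustration only. */
struct sockaddr_vm_demo {
	sa_family_t    svm_family;
	unsigned short svm_reserved1;
	unsigned int   svm_port;
	unsigned int   svm_cid;
	/* Padding so the whole struct is as large as struct sockaddr. */
	unsigned char  svm_zero[sizeof(struct sockaddr) -
				sizeof(sa_family_t) -
				sizeof(unsigned short) -
				sizeof(unsigned int) -
				sizeof(unsigned int)];
};

/* Total size of the demo structure. */
size_t vm_demo_size(void)
{
	return sizeof(struct sockaddr_vm_demo);
}

/* Number of padding bytes computed by the svm_zero expression. */
size_t vm_demo_padding(void)
{
	return sizeof(((struct sockaddr_vm_demo *)0)->svm_zero);
}
```

Without the `<sys/socket.h>` include, both `sizeof(struct sockaddr)` and `sa_family_t` are unknown to the compiler, which matches the two errors quoted in the commit message.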
Re: [RFC PATCH] net: Introduce a socket option to enable picking tx queue based on rx queue.
On Mon, 2017-09-11 at 23:27 -0700, Samudrala, Sridhar wrote:
>
> On 9/11/2017 8:53 PM, Eric Dumazet wrote:
> > On Mon, 2017-09-11 at 20:12 -0700, Tom Herbert wrote:
> >
> >> Two ints in sock_common for this purpose is quite expensive and the
> >> use case for this is limited-- even if a RX->TX queue mapping were
> >> introduced to eliminate the queue pair assumption this still won't
> >> help if the receive and transmit interfaces are different for the
> >> connection. I think we really need to see some very compelling results
> >> to be able to justify this.
> Will try to collect and post some perf data with symmetric queue
> configuration.
>
> > Yes, this is unreasonable cost.
> >
> > XPS should really cover the case already.
> >
> Eric,
>
> Can you clarify how XPS covers the RX-> TX queue mapping case?
> Is it possible to configure XPS to select TX queue based on the RX queue
> of a flow?
> IIUC, it is based on the CPU of the thread doing the transmit OR based
> on skb->priority to TC mapping?
> It may be possible to get this effect if the the threads are pinned to a
> core, but if the app threads are
> freely moving, i am not sure how XPS can be configured to select the TX
> queue based on the RX queue of a flow.

If application is freely moving, how NIC can properly select the RX queue so that packets are coming to the appropriate queue ?

This is called aRFS, and it does not scale to millions of flows. We tried in the past, and this went nowhere really, since the setup cost is prohibitive and DDOS vulnerable.

XPS will follow the thread, since selection is done on current cpu.

The problem is RX side. If application is free to migrate, then special support (aRFS) is needed from the hardware.

At least for passive connections, we already have all the support in the kernel so that you can have one thread per NIC queue, dealing with sockets that have incoming packets all received on one NIC RX queue. (And of course all TX packets will use the symmetric TX queue)

SO_REUSEPORT plus appropriate BPF filter can achieve that.

Say you have 32 queues, 32 cpus.

Simply use 32 listeners, 32 threads (or 32 pools of threads)
[PATCH net] ipv6: fix net.ipv6.conf.all interface DAD handlers
Currently, writing into net.ipv6.conf.all.{accept_dad,use_optimistic,optimistic_dad} has no effect. Fix handling of these flags by:

- using the maximum of global and per-interface values for the accept_dad flag. That is, if at least one of the two values is non-zero, enable DAD on the interface. If at least one value is set to 2, enable DAD and disable IPv6 operation on the interface if MAC-based link-local address was found

- using the logical OR of global and per-interface values for the optimistic_dad flag. If at least one of them is set to one, optimistic duplicate address detection (RFC 4429) is enabled on the interface

- using the logical OR of global and per-interface values for the use_optimistic flag. If at least one of them is set to one, optimistic addresses won't be marked as deprecated during source address selection on the interface.

While at it, as we're modifying the prototype for ipv6_use_optimistic_addr(), drop inline, and let the compiler decide.

Fixes: 7fd2561e4ebd ("net: ipv6: Add a sysctl to make optimistic addresses useful candidates")
Signed-off-by: Matteo Croce
---
 Documentation/networking/ip-sysctl.txt | 18 ++
 net/ipv6/addrconf.c                    | 27 ---
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b3345d0fe0a6..77f4de59dc9c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1680,6 +1680,9 @@ accept_dad - INTEGER
 	2: Enable DAD, and disable IPv6 operation if MAC-based duplicate
 	   link-local address has been found.
 
+	DAD operation and mode on a given interface will be selected according
+	to the maximum value of conf/{all,interface}/accept_dad.
+
 force_tllao - BOOLEAN
 	Enable sending the target link-layer address option even when
 	responding to a unicast neighbor solicitation.
@@ -1727,16 +1730,23 @@ suppress_frag_ndisc - INTEGER
 
 optimistic_dad - BOOLEAN
 	Whether to perform Optimistic Duplicate Address Detection (RFC 4429).
-		0: disabled (default)
-		1: enabled
+	0: disabled (default)
+	1: enabled
+
+	Optimistic Duplicate Address Detection for the interface will be enabled
+	if at least one of conf/{all,interface}/optimistic_dad is set to 1,
+	it will be disabled otherwise.
 
 use_optimistic - BOOLEAN
 	If enabled, do not classify optimistic addresses as deprecated during
 	source address selection. Preferred addresses will still be chosen
 	before optimistic addresses, subject to other ranking in the source
 	address selection algorithm.
-		0: disabled (default)
-		1: enabled
+	0: disabled (default)
+	1: enabled
+
+	This will be enabled if at least one of
+	conf/{all,interface}/use_optimistic is set to 1, disabled otherwise.
 
 stable_secret - IPv6 address
 	This IPv6 address will be used as a secret to generate IPv6
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c2e2a78787ec..774d8794248a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1399,10 +1399,18 @@ static inline int ipv6_saddr_preferred(int type)
 	return 0;
 }
 
-static inline bool ipv6_use_optimistic_addr(struct inet6_dev *idev)
+static bool ipv6_use_optimistic_addr(struct net *net,
+				     struct inet6_dev *idev)
 {
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
-	return idev && idev->cnf.optimistic_dad && idev->cnf.use_optimistic;
+	if (!idev)
+		return false;
+	if (!net->ipv6.devconf_all->optimistic_dad && !idev->cnf.optimistic_dad)
+		return false;
+	if (!net->ipv6.devconf_all->use_optimistic && !idev->cnf.use_optimistic)
+		return false;
+
+	return true;
 #else
 	return false;
 #endif
@@ -1472,7 +1480,7 @@ static int ipv6_get_saddr_eval(struct net *net,
 		/* Rule 3: Avoid deprecated and optimistic addresses */
 		u8 avoid = IFA_F_DEPRECATED;
 
-		if (!ipv6_use_optimistic_addr(score->ifa->idev))
+		if (!ipv6_use_optimistic_addr(net, score->ifa->idev))
 			avoid |= IFA_F_OPTIMISTIC;
 		ret = ipv6_saddr_preferred(score->addr_type) ||
 		      !(score->ifa->flags & avoid);
@@ -2460,7 +2468,8 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
 		int max_addresses = in6_dev->cnf.max_addresses;
 
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
-		if (in6_dev->cnf.optimistic_dad &&
+		if ((net->ipv6.devconf_all->optimistic_dad ||
+		     in6_dev->cnf.optimistic_dad) &&
 		    !net->ipv6.devconf_all->forwarding && sllao)
 			addr_flags |= IFA_F_OPTIMISTIC;
 #endif
@@
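The combination rules described in the commit message (maximum for accept_dad, logical OR for the two optimistic flags) can be sketched outside the kernel as follows. `struct dad_conf` and the three helper functions are hypothetical names used only for this illustration; they mirror the sysctl names, not the kernel's `struct ipv6_devconf`.

```c
#include <stdbool.h>

/* Hypothetical settings for one scope ("all" or a single interface). */
struct dad_conf {
	int accept_dad;      /* 0, 1 or 2 */
	int optimistic_dad;  /* 0 or 1 */
	int use_optimistic;  /* 0 or 1 */
};

/* accept_dad: the larger of the global and per-interface values wins. */
int effective_accept_dad(const struct dad_conf *all, const struct dad_conf *ifc)
{
	return all->accept_dad > ifc->accept_dad ? all->accept_dad
						 : ifc->accept_dad;
}

/* optimistic_dad: enabled if either scope enables it. */
bool effective_optimistic_dad(const struct dad_conf *all,
			      const struct dad_conf *ifc)
{
	return all->optimistic_dad || ifc->optimistic_dad;
}

/* Follows the patched ipv6_use_optimistic_addr(): both flags must be
 * enabled, each of them in at least one of the two scopes. */
bool use_optimistic_addr(const struct dad_conf *all, const struct dad_conf *ifc)
{
	if (!all->optimistic_dad && !ifc->optimistic_dad)
		return false;
	if (!all->use_optimistic && !ifc->use_optimistic)
		return false;
	return true;
}
```

For example, with conf/all/accept_dad set to 2 and the interface value left at 1, `effective_accept_dad()` yields 2, so under this rule the interface would get the stricter mode.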
Re: [PATCH] tcp: TCP_USER_TIMEOUT can not work in tcp_probe_timer()
On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote:
> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote:
> > Hi,
> >
> > In the scenario, tcp server side IP changed, and at that memont,
> > userspace application still send data continuously;
> > tcp_send_head(sk)'s timestamp always be refreshed.
> >
> > Here is the packetdrill script:
> >
> >    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> >   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> >   +0 bind(3, ..., ...) = 0
> >   +0 listen(3, 1) = 0
> >
> >   +0 < S 0:0(0) win 0
> >   +0 > S. 0:0(0) ack 1
> >
> >  +.1 < . 1:1(0) ack 1 win 65530
> >   +0 accept(3, ..., ...) = 4
> >
> >   +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
> >   +0 write(4, ..., 24) = 24
> >   +0 > P. 1:25(24) ack 1 win 229
> >  +.1 < . 1:1(0) ack 25 win 65530
> >
> > //change the ipaddress
> >   +1 `ifconfig tun0 192.168.0.10/16`
> >
> >   +1 write(4, ..., 24) = 24
> >   +1 write(4, ..., 24) = 24
> >   +1 write(4, ..., 24) = 24
> >   +1 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >   +3 write(4, ..., 24) = 24
> >
> >   +0 `ifconfig tun0 192.168.0.1/16`
> >   +0 < . 1:1(0) ack 1 win 1000
> >   +0 write(4, ..., 24) = -1
> >
> >
>
> This has nothing to do with the code patch you have changed.
>
> How have you tested your patch exactly ?
>

lpaa23:~# ss -toenmi src :8080
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 144 192.168.134.161:8080 192.0.2.1:51165 timer:(persist,8.262ms,5) ino:182083 sk:3 <->
	 skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12 segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 lastrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps app_limited busy:11346ms rcv_space:29200 notsent:144 minrtt:100.043

This is the typical RTO timer, not zero window probe.
Re: [PATCH] tcp: TCP_USER_TIMEOUT can not work in tcp_probe_timer()
On Tue, 2017-09-12 at 14:08 +0800, liujian wrote: > Hi, > > In the scenario, tcp server side IP changed, and at that memont, > userspace application still send data continuously; > tcp_send_head(sk)'s timestamp always be refreshed. > > Here is the packetdrill script: > >0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 >+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 >+0 bind(3, ..., ...) = 0 >+0 listen(3, 1) = 0 > >+0 < S 0:0(0) win 0 >+0 > S. 0:0(0) ack 1 > > +.1 < . 1:1(0) ack 1 win 65530 >+0 accept(3, ..., ...) = 4 > >+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 >+0 write(4, ..., 24) = 24 >+0 > P. 1:25(24) ack 1 win 229 >+.1 < . 1:1(0) ack 25 win 65530 > > //change the ipaddress >+1 `ifconfig tun0 192.168.0.10/16` > >+1 write(4, ..., 24) = 24 >+1 write(4, ..., 24) = 24 >+1 write(4, ..., 24) = 24 >+1 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 >+3 write(4, ..., 24) = 24 > >+0 `ifconfig tun0 192.168.0.1/16` >+0 < . 1:1(0) ack 1 win 1000 >+0 write(4, ..., 24) = -1 > > This has nothing to do with the code patch you have changed. How have you tested your patch exactly ? 
> [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt > test.pkt:50: runtime error in write call: Expected result -1 but got 24 with > errno 2 (No such file or directory) > > real 1m11.364s > user 0m0.028s > sys 0m0.106s > > [root@localhost ~]# netstat -toen > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State > User Inode Timer > tcp 0 504 192.168.0.1:8080 192.0.2.1:33993 > ESTABLISHED 0 45453 probe (22.38/0/7) > > since the script didn't wait for enough time, here only got 7 probes. > > On 2017/9/11 23:22, Eric Dumazet wrote: > > On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote: > > > >> You can see we got only 3 probes, not 4. > > > > Here is complete packetdrill test showing that code behaves as expected. > > > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > >+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > >+0 bind(3, ..., ...) = 0 > >+0 listen(3, 1) = 0 > > > >+0 < S 0:0(0) win 0 > >+0 > S. 0:0(0) ack 1 > > > > // Client advertises a zero receive window, so we can't send. > > +.1 < . 1:1(0) ack 1 win 0 > >+0 accept(3, ..., ...) = 4 > > > >+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 > >+0 write(4, ..., 2920) = 2920 > > > > // Window probes are scheduled just like RTOs. > > +.3~+.31 > . 0:0(0) ack 1 > > +.6~+.62 > . 0:0(0) ack 1 > > +1.2~+1.24 > . 0:0(0) ack 1 > > > > // Peer opens its window too late ! > >+3 < . 1:1(0) ack 1 win 1000 > >+0 > R 1:1(0) > > > > > > > > . > > >
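For readers following the scripts above without packetdrill at hand, the setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) step can be sketched in plain C roughly as follows. This is only a minimal illustration, assuming a Linux system where TCP_USER_TIMEOUT is exposed by <netinet/tcp.h>; the helper names set_user_timeout() and get_user_timeout() are made up for this note and are not part of any patch in this thread.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set TCP_USER_TIMEOUT (in milliseconds) on a TCP
 * socket. Returns 0 on success, -1 on error (errno set by setsockopt).
 */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/*
 * Hypothetical helper: read the current TCP_USER_TIMEOUT back.
 * Returns the value in milliseconds, or -1 on error.
 */
long get_user_timeout(int fd)
{
	unsigned int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &val, &len) < 0)
		return -1;
	return (long)val;
}
```

Calling set_user_timeout(fd, 3000) on the accepted socket would correspond to the 3000 ms value used in both scripts; how the kernel then applies it to RTO versus probe timers is exactly what the thread is debating.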
[iproute PATCH] ipaddress: Fix segfault in 'addr showdump'
Obviously, 'addr showdump' feature wasn't adjusted to json output support. As a consequence, calls to print_string() in print_addrinfo() tried to dereference a NULL FILE pointer. Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output") Signed-off-by: Phil Sutter--- ip/ipaddress.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/ip/ipaddress.c b/ip/ipaddress.c index 9797145023966..ee6c9f588e7ba 100644 --- a/ip/ipaddress.c +++ b/ip/ipaddress.c @@ -1801,17 +1801,31 @@ static int show_handler(const struct sockaddr_nl *nl, { struct ifaddrmsg *ifa = NLMSG_DATA(n); - printf("if%d:\n", ifa->ifa_index); + open_json_object(NULL); + print_int(PRINT_ANY, "index", "if%d:\n", ifa->ifa_index); print_addrinfo(NULL, n, stdout); + close_json_object(); return 0; } static int ipaddr_showdump(void) { + int err; + if (ipadd_dump_check_magic()) exit(-1); - exit(rtnl_from_file(stdin, _handler, NULL)); + new_json_obj(json, stdout); + open_json_object(NULL); + open_json_array(PRINT_JSON, "addr_info"); + + err = rtnl_from_file(stdin, _handler, NULL); + + close_json_array(PRINT_JSON, NULL); + close_json_object(); + delete_json_obj(); + + exit(err); } static int restore_handler(const struct sockaddr_nl *nl, -- 2.13.1
Re: [PATCH] ieee802154: fix gcc-4.9 warnings
Hi Arnd, > All older compiler versions up to gcc-4.9 produce these > harmless warnings: > > drivers/net/ieee802154/ca8210.c: In function 'ca8210_skb_tx': > drivers/net/ieee802154/ca8210.c:1947:9: warning: missing braces around > initializer [-Wmissing-braces] > > This changes the syntax to something that works on all versions > without warnings. > > Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver") > Signed-off-by: Arnd Bergmann> --- > drivers/net/ieee802154/ca8210.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) patch has been applied to bluetooth-next tree. Regards Marcel
[PATCH/RFC net-next 2/2] net/sched: allow flower to match tunnel options
Allow matching on options in tunnel headers. This makes use of existing tunnel metadata support. Options are a bytestring of up to 256 bytes. Tunnel implementations may support less or more options, or no options at all. # ip link add name geneve0 type geneve dstport 0 external # tc qdisc add dev eth0 ingress # tc qdisc del dev eth0 ingress; tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent : \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 4789 \ id 11 \ opts 0102800100800022 \ action mirred egress redirect dev geneve0 Signed-off-by: Simon HormanReviewed-by: Jakub Kicinski --- include/net/flow_dissector.h | 13 + include/uapi/linux/pkt_cls.h | 3 +++ net/sched/cls_flower.c | 35 ++- 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index fc3dce730a6b..43f98bf0b349 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -183,6 +183,18 @@ struct flow_dissector_key_ip { __u8ttl; }; +/** + * struct flow_dissector_key_enc_opts: + * @data: data + * @len: len + */ +struct flow_dissector_key_enc_opts { + u8 data[256]; /* Using IP_TUNNEL_OPTS_MAX is desired here +* but seems difficult to #include +*/ + u8 len; +}; + enum flow_dissector_key_id { FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */ FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */ @@ -205,6 +217,7 @@ enum flow_dissector_key_id { FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */ FLOW_DISSECTOR_KEY_TCP, /* struct flow_dissector_key_tcp */ FLOW_DISSECTOR_KEY_IP, /* struct flow_dissector_key_ip */ + FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */ FLOW_DISSECTOR_KEY_MAX, }; diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index d5e2bf68d0d4..7a09a28f21e0 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -467,6 +467,9 @@ 
enum { TCA_FLOWER_KEY_IP_TTL, /* u8 */ TCA_FLOWER_KEY_IP_TTL_MASK, /* u8 */ + TCA_FLOWER_KEY_ENC_OPTS, + TCA_FLOWER_KEY_ENC_OPTS_MASK, + __TCA_FLOWER_MAX, }; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 1a267e77c6de..2a8364ef4fd5 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -51,6 +51,7 @@ struct fl_flow_key { struct flow_dissector_key_mpls mpls; struct flow_dissector_key_tcp tcp; struct flow_dissector_key_ip ip; + struct flow_dissector_key_enc_opts enc_opts; } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */ struct fl_flow_mask_range { @@ -181,6 +182,11 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp, skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id); skb_key.enc_tp.src = key->tp_src; skb_key.enc_tp.dst = key->tp_dst; + + if (info->options_len) { + skb_key.enc_opts.len = info->options_len; + ip_tunnel_info_opts_get(skb_key.enc_opts.data, info); + } } skb_key.indev_ifindex = skb->skb_iif; @@ -421,6 +427,8 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = { [TCA_FLOWER_KEY_IP_TOS_MASK]= { .type = NLA_U8 }, [TCA_FLOWER_KEY_IP_TTL] = { .type = NLA_U8 }, [TCA_FLOWER_KEY_IP_TTL_MASK]= { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ENC_OPTS] = { .type = NLA_BINARY }, + [TCA_FLOWER_KEY_ENC_OPTS_MASK] = { .type = NLA_BINARY }, }; static void fl_set_key_val(struct nlattr **tb, @@ -712,6 +720,26 @@ static int fl_set_key(struct net *net, struct nlattr **tb, >enc_tp.dst, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK, sizeof(key->enc_tp.dst)); + if (tb[TCA_FLOWER_KEY_ENC_OPTS]) { + key->enc_opts.len = nla_len(tb[TCA_FLOWER_KEY_ENC_OPTS]); + + if (key->enc_opts.len > sizeof(key->enc_opts.data)) + return -EINVAL; + + /* enc_opts is variable length. +* If present ensure the value and mask are the same length. 
+*/ + if (tb[TCA_FLOWER_KEY_ENC_OPTS_MASK] && + nla_len(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]) != key->enc_opts.len) + return -EINVAL; + + mask->enc_opts.len = key->enc_opts.len; + fl_set_key_val(tb, key->enc_opts.data, TCA_FLOWER_KEY_ENC_OPTS, + mask->enc_opts.data, +
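The length handling in the hunk above (reject oversized values, require value and mask to have the same length) can be sketched in userspace C along these lines. This is an illustration only, not the patch's code: struct enc_opts, OPTS_MAX and set_enc_opts() are hypothetical names, and falling back to an all-ones mask when no mask is supplied is an assumption of the sketch. The sketch also checks the length before storing it and keeps it in a size_t, so that a 256-byte value is representable.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define OPTS_MAX 256	/* illustrative; mirrors the 256-byte limit in the text */

/* Hypothetical container for an options bytestring. */
struct enc_opts {
	unsigned char data[OPTS_MAX];
	size_t len;
};

/*
 * Store a value and an optional mask.
 * mask_src may be NULL; the sketch then assumes an exact match
 * (all-ones mask). Returns 0 on success or -EINVAL.
 */
int set_enc_opts(struct enc_opts *key, struct enc_opts *mask,
		 const unsigned char *val_src, size_t val_len,
		 const unsigned char *mask_src, size_t mask_len)
{
	/* Validate before storing anything. */
	if (val_len > sizeof(key->data))
		return -EINVAL;

	/* If a mask is present it must be as long as the value. */
	if (mask_src && mask_len != val_len)
		return -EINVAL;

	memcpy(key->data, val_src, val_len);
	key->len = val_len;

	if (mask_src)
		memcpy(mask->data, mask_src, val_len);
	else
		memset(mask->data, 0xff, val_len);
	mask->len = val_len;

	return 0;
}
```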
[PATCH/RFC net-next 1/2] net/sched: add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action. Options are a bitwise maskable bytestring of up to 256 bytes. Tunnel implementations may support less or more options, or no options at all. e.g. # ip link add name geneve0 type geneve dstport 0 external # tc qdisc del dev geneve0 ingress # tc filter add dev geneve0 protocol ip parent : \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ enc_opts 0102800100800020/fff0 \ ip_proto udp \ action mirred egress redirect dev eth1 Signed-off-by: Simon HormanReviewed-by: Jakub Kicinski --- include/uapi/linux/tc_act/tc_tunnel_key.h | 1 + net/sched/act_tunnel_key.c| 26 +- 2 files changed, 22 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h b/include/uapi/linux/tc_act/tc_tunnel_key.h index afcd4be953e2..e0cb1121d132 100644 --- a/include/uapi/linux/tc_act/tc_tunnel_key.h +++ b/include/uapi/linux/tc_act/tc_tunnel_key.h @@ -35,6 +35,7 @@ enum { TCA_TUNNEL_KEY_PAD, TCA_TUNNEL_KEY_ENC_DST_PORT,/* be16 */ TCA_TUNNEL_KEY_NO_CSUM, /* u8 */ + TCA_TUNNEL_KEY_ENC_OPTS, __TCA_TUNNEL_KEY_MAX, }; diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c index 30c96274c638..77b5890a48b9 100644 --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -66,6 +66,7 @@ static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = { [TCA_TUNNEL_KEY_ENC_KEY_ID] = { .type = NLA_U32 }, [TCA_TUNNEL_KEY_ENC_DST_PORT] = {.type = NLA_U16}, [TCA_TUNNEL_KEY_NO_CSUM] = { .type = NLA_U8 }, + [TCA_TUNNEL_KEY_ENC_OPTS] = { .type = NLA_BINARY }, }; static int tunnel_key_init(struct net *net, struct nlattr *nla, @@ -81,9 +82,11 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, struct tcf_tunnel_key *t; bool exists = false; __be16 dst_port = 0; + int opts_len = 0; __be64 key_id; __be16 flags; int ret = 0; + u8 *opts; int err; if (!nla) @@ -121,6 +124,11 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, if 
(tb[TCA_TUNNEL_KEY_ENC_DST_PORT]) dst_port = nla_get_be16(tb[TCA_TUNNEL_KEY_ENC_DST_PORT]); + if (tb[TCA_TUNNEL_KEY_ENC_OPTS]) { + opts = nla_data(tb[TCA_TUNNEL_KEY_ENC_OPTS]); + opts_len = nla_len(tb[TCA_TUNNEL_KEY_ENC_OPTS]); + } + if (tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC] && tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]) { __be32 saddr; @@ -131,7 +139,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, metadata = __ip_tun_set_dst(saddr, daddr, 0, 0, dst_port, flags, - key_id, 0); + key_id, opts_len); } else if (tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC] && tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]) { struct in6_addr saddr; @@ -142,9 +150,13 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, metadata = __ipv6_tun_set_dst(, , 0, 0, dst_port, 0, flags, - key_id, 0); + key_id, opts_len); } + if (opts_len) + ip_tunnel_info_opts_set(>u.tun_info, + opts, opts_len); + if (!metadata) { ret = -EINVAL; goto err_out; @@ -264,8 +276,9 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a, goto nla_put_failure; if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET) { - struct ip_tunnel_key *key = - >tcft_enc_metadata->u.tun_info.key; + struct ip_tunnel_info *info = + >tcft_enc_metadata->u.tun_info; + struct ip_tunnel_key *key = >key; __be32 key_id = tunnel_id_to_key32(key->tun_id); if (nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_KEY_ID, key_id) || @@ -273,7 +286,10 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a, >tcft_enc_metadata->u.tun_info) || nla_put_be16(skb, TCA_TUNNEL_KEY_ENC_DST_PORT, key->tp_dst) || nla_put_u8(skb, TCA_TUNNEL_KEY_NO_CSUM, - !(key->tun_flags & TUNNEL_CSUM))) +
[PATCH/RFC net-next 0/2] net/sched: support tunnel options in cls_flower and act_tunnel_key
Allow the flower classifier to match on tunnel options and the tunnel key action to set them. Tunnel options are a bytestring of up to 256 bytes. The flower classifier matches them with an optional bitwise mask. Tunnel implementations may support more or fewer options, or none at all. Simon Horman (2): net/sched: add tunnel option support to act_tunnel_key net/sched: allow flower to match tunnel options include/net/flow_dissector.h | 13 include/uapi/linux/pkt_cls.h | 3 +++ include/uapi/linux/tc_act/tc_tunnel_key.h | 1 + net/sched/act_tunnel_key.c| 26 ++- net/sched/cls_flower.c| 35 ++- 5 files changed, 72 insertions(+), 6 deletions(-) -- 2.1.4
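To make "bitwise mask" concrete: a masked match on a bytestring compares only the bits selected by the mask. The following is a minimal userspace sketch, not code from the series; opts_match() is a hypothetical name, and all three buffers are assumed to have the same length.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return true when the packet's option bytes equal
 * the key in every bit position where the mask has a 1, i.e.
 * (pkt & mask) == (key & mask) for each byte.
 */
bool opts_match(const unsigned char *pkt, const unsigned char *key,
		const unsigned char *mask, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		/* XOR exposes differing bits; the mask keeps only relevant ones. */
		if ((pkt[i] ^ key[i]) & mask[i])
			return false;
	}
	return true;
}
```

With a key ending in 0x20 and a mask ending in 0xf0, a packet option ending in 0x22 would still match, because only the upper nibble of the last byte is compared.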
[PATCH iproute2/net-next] tc: flower: support for matching MPLS labels
From: Benjamin LaHaiseThis patch adds support to the iproute2 tc filter command for matching MPLS labels in the flower classifier. The ability to match the Time To Live, Bottom Of Stack, Traffic Control and Label fields are added as options to the flower filter. e.g.: tc filter add dev eth0 protocol 0x8847 parent : \ flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \ action drop Signed-off-by: Benjamin LaHaise Signed-off-by: Simon Horman Reviewed-by: Jakub Kicinski --- v1 [Simon Horman] - added flower_print_opt portion to code - added example to changelog - revised manpage changes v0 [Benjamin LaHaise] --- man/man8/tc-flower.8 | 37 +++-- tc/f_flower.c| 92 2 files changed, 127 insertions(+), 2 deletions(-) diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8 index be46f0278b4f..88a23f544133 100644 --- a/man/man8/tc-flower.8 +++ b/man/man8/tc-flower.8 @@ -29,6 +29,14 @@ flower \- flow based traffic control filter .IR PRIORITY " | " .BR vlan_ethtype " { " ipv4 " | " ipv6 " | " .IR ETH_TYPE " } | " +.B mpls_label +.IR LABEL " | " +.B mpls_tc +.IR TC " | " +.B mpls_bos +.IR BOS " | " +.B mpls_ttl +.IR TTL " | " .BR ip_proto " { " tcp " | " udp " | " sctp " | " icmp " | " icmpv6 " | " .IR IP_PROTO " } | " .B ip_tos @@ -119,6 +127,29 @@ may be either .BR ipv4 ", " ipv6 or an unsigned 16bit value in hexadecimal format. .TP +.BI mpls_label " LABEL" +Match the label id in the outermost MPLS label stack entry. +.I LABEL +is an unsigned 20 bit value in decimal format. +.TP +.BI mpls_tc " TC" +Match on the MPLS TC field, which is typically used for packet priority, +in the outermost MPLS label stack entry. +.I TC +is an unsigned 3 bit value in decimal format. +.TP +.BI mpls_bos " BOS" +Match on the MPLS Bottom Of Stack field in the outermost MPLS label stack +entry. +.I BOS +is a 1 bit value in decimal format. +.TP +.BI mpls_ttl " TTL" +Match on the MPLS Time To Live field in the outermost MPLS label stack +entry. +.I TTL +is an unsigned 8 bit value in decimal format. 
+.TP .BI ip_proto " IP_PROTO" Match on layer four protocol. .I IP_PROTO @@ -226,8 +257,10 @@ to match on fragmented packets or not respectively. As stated above where applicable, matches of a certain layer implicitly depend on the matches of the next lower layer. Precisely, layer one and two matches (\fBindev\fR, \fBdst_mac\fR and \fBsrc_mac\fR) -have no dependency, layer three matches -(\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR, +have no dependency, +MPLS and layer three matches +(\fBmpls_label\fR, \fBmpls_tc\fR, \fBmpls_bos\fR, \fBmpls_ttl\fR, +\fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR, \fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR) depend on the .B protocol diff --git a/tc/f_flower.c b/tc/f_flower.c index 934832e2bbe9..8c4bfb0d339e 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "utils.h" #include "tc_util.h" @@ -55,6 +56,10 @@ static void explain(void) " ip_proto [tcp | udp | sctp | icmp | icmpv6 | IP-PROTO ] |\n" " ip_tos MASKED-IP_TOS |\n" " ip_ttl MASKED-IP_TTL |\n" + " mpls_label LABEL |\n" + " mpls_tc TC |\n" + " mpls_bos BOS |\n" + " mpls_ttl TTL |\n" " dst_ip PREFIX |\n" " src_ip PREFIX |\n" " dst_port PORT-NUMBER |\n" @@ -672,6 +677,70 @@ static int flower_parse_opt(struct filter_util *qu, char *handle, _ethtype, n); if (ret < 0) return -1; + } else if (matches(*argv, "mpls_label") == 0) { + __u32 label; + + NEXT_ARG(); + if (eth_type != htons(ETH_P_MPLS_UC) && + eth_type != htons(ETH_P_MPLS_MC)) { + fprintf(stderr, + "Can't set \"mpls_label\" if ethertype isn't MPLS\n"); + return -1; + } + ret = get_u32(, *argv, 10); + if (ret < 0 || label & ~(MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)) { + fprintf(stderr, "Illegal \"mpls_label\"\n"); + return -1; + } + addattr32(n, MAX_MSG, TCA_FLOWER_KEY_MPLS_LABEL,
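The manpage text above gives the widths of the four fields (label 20 bits, TC 3 bits, BOS 1 bit, TTL 8 bits). As a rough illustration of how those fit into one 32-bit label stack entry, here is a small C sketch following the RFC 3032 layout. The LSE_* constants and function names are defined locally for this note and are assumptions of the sketch, not the identifiers used by the patch; values are handled in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions within a 32-bit MPLS label stack entry (RFC 3032). */
#define LSE_LABEL_SHIFT 12
#define LSE_TC_SHIFT    9
#define LSE_BOS_SHIFT   8

/* Largest value each field can hold. */
#define LSE_LABEL_MAX   0xFFFFFu	/* 20 bits */
#define LSE_TC_MAX      0x7u		/* 3 bits */
#define LSE_BOS_MAX     0x1u		/* 1 bit */
#define LSE_TTL_MAX     0xFFu		/* 8 bits */

/* Range check comparable to rejecting an out-of-range command line value. */
bool mpls_fields_valid(uint32_t label, uint32_t tc, uint32_t bos, uint32_t ttl)
{
	return label <= LSE_LABEL_MAX && tc <= LSE_TC_MAX &&
	       bos <= LSE_BOS_MAX && ttl <= LSE_TTL_MAX;
}

/* Combine the four fields into one entry (host byte order). */
uint32_t mpls_lse_pack(uint32_t label, uint32_t tc, uint32_t bos, uint32_t ttl)
{
	return (label << LSE_LABEL_SHIFT) | (tc << LSE_TC_SHIFT) |
	       (bos << LSE_BOS_SHIFT) | ttl;
}

/* Field accessors. */
uint32_t mpls_lse_label(uint32_t lse) { return lse >> LSE_LABEL_SHIFT; }
uint32_t mpls_lse_tc(uint32_t lse)    { return (lse >> LSE_TC_SHIFT) & LSE_TC_MAX; }
uint32_t mpls_lse_bos(uint32_t lse)   { return (lse >> LSE_BOS_SHIFT) & LSE_BOS_MAX; }
uint32_t mpls_lse_ttl(uint32_t lse)   { return lse & LSE_TTL_MAX; }
```

For the changelog example (mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0), mpls_lse_pack(1, 2, 0, 3) yields 0x1403 in this layout.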
Re: [patch net] mlxsw: spectrum: Prevent mirred-related crash on removal
On Tue, Sep 12, 2017 at 03:15:50PM +0200, Jiri Pirko wrote: > Tue, Sep 12, 2017 at 03:05:06PM CEST, and...@lunn.ch wrote: > >On Tue, Sep 12, 2017 at 08:50:53AM +0200, Jiri Pirko wrote: > >> From: Yuval Mintz> > > >Hi Jiri, Yuval > > > >s/mirred/mirrored/g > > Actually, the name of the tc action is indeed "mirred". :-( Andrew
Re: [patch net] mlxsw: spectrum: Prevent mirred-related crash on removal
Tue, Sep 12, 2017 at 03:05:06PM CEST, and...@lunn.ch wrote: >On Tue, Sep 12, 2017 at 08:50:53AM +0200, Jiri Pirko wrote: >> From: Yuval Mintz> >Hi Jiri, Yuval > >s/mirred/mirrored/g Actually, the name of the tc action is indeed "mirred". See net/sched/act_mirred.c
[PATCH/RFC net-next] ravb: RX checksum offload
Add support for RX checksum offload. This is enabled by default and may be disabled and re-enabled using ethtool: # ethtool -K eth0 rx off # ethtool -K eth0 rx on The RAVB provides a simple checksumming scheme which appears to be completely compatible with CHECKSUM_COMPLETE: a 1's complement sum of all packet data after the L2 header is appended to packet data; this may be trivially read by the driver and used to update the skb accordingly. In terms of performance throughput is close to gigabit line-rate both with and without RX checksum offload enabled. Perf output, however, appears to indicate that significantly less time is spent in do_csum(). This is as expected. Test results with RX checksum offload enabled: # /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162 MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo enable_enobufs failed: getprotobyname Recv SendSend Socket Socket Message Elapsed Size SizeSize Time Throughput bytes bytes bytessecs.10^6bits/sec 87380 16384 1638410.00 938.78 [ perf record: Woken up 14 times to write data ] [ perf record: Captured and wrote 3.524 MB /run/perf.data (~153957 samples) ] Summary of output of perf report: 19.49% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 9.88% ksoftirqd/0 [kernel.kallsyms] [k] __pi_memcpy 7.33% ksoftirqd/0 [kernel.kallsyms] [k] skb_put 7.00% ksoftirqd/0 [kernel.kallsyms] [k] ravb_poll 3.89% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive 3.65% netperf [kernel.kallsyms] [k] __arch_copy_to_user 3.43% swapper [kernel.kallsyms] [k] arch_cpu_idle 2.77% swapper [kernel.kallsyms] [k] tick_nohz_idle_enter 1.85% ksoftirqd/0 [kernel.kallsyms] [k] __netdev_alloc_skb 1.80% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irq 1.64% ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.79 1.62% ksoftirqd/0 [kernel.kallsyms] [k] __pi___inval_cache_range Test results without RX checksum offload enabled: # /usr/bin/perf_3.16 
record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162 MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo enable_enobufs failed: getprotobyname Recv SendSend Socket Socket Message Elapsed Size SizeSize Time Throughput bytes bytes bytessecs.10^6bits/sec 87380 16384 1638410.00 941.09 [ perf record: Woken up 14 times to write data ] [ perf record: Captured and wrote 3.411 MB /run/perf.data (~149040 samples) ] Summary of output of perf report: 17.50%ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 10.60%ksoftirqd/0 [kernel.kallsyms] [k] __pi_memcpy 7.91%ksoftirqd/0 [kernel.kallsyms] [k] skb_put 6.95%ksoftirqd/0 [kernel.kallsyms] [k] do_csum 6.22%ksoftirqd/0 [kernel.kallsyms] [k] ravb_poll 3.84%ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive 2.53%netperf [kernel.kallsyms] [k] __arch_copy_to_user 2.53%swapper [kernel.kallsyms] [k] arch_cpu_idle 2.27%swapper [kernel.kallsyms] [k] tick_nohz_idle_enter 1.90%ksoftirqd/0 [kernel.kallsyms] [k] __pi___inval_cache_range 1.90%ksoftirqd/0 [kernel.kallsyms] [k] __netdev_alloc_skb 1.52%ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.79 Above results collected on an R-Car Gen 3 Salvator-X/r8a7796 ES1.0. Also tested on a R-Car Gen 3 Salvator-X/r8a7795 ES1.0. By inspection this also appears to be compatible with the ravb found on R-Car Gen 2 SoCs, however, this patch is currently untested on such hardware. 
Signed-off-by: Simon Horman--- drivers/net/ethernet/renesas/ravb_main.c | 58 +++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index fdf30bfa403b..7c6438cd7de7 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -403,8 +403,9 @@ static void ravb_emac_init(struct net_device *ndev) /* Receive frame limit set register */ ravb_write(ndev, ndev->mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN, RFLR); - /* PAUSE prohibition */ + /* EMAC Mode: PAUSE prohibition; Duplex; RX Checksum; TX; RX */ ravb_write(ndev, ECMR_ZPF | (priv->duplex ? ECMR_DM : 0) | + (ndev->features & NETIF_F_RXCSUM ? ECMR_RCSC : 0) | ECMR_TE | ECMR_RE, ECMR); ravb_set_rate(ndev); @@ -520,6 +521,19 @@ static void ravb_get_tx_tstamp(struct net_device *ndev) } } +static void ravb_rx_csum(struct sk_buff *skb) +{ + u8 *hw_csum; + + /* The hardware
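As background for the "1's complement sum" mentioned in the changelog above, here is a minimal C sketch of such a sum over a byte buffer, in the style of RFC 1071. It is not the driver's code: ones_complement_sum() is a hypothetical helper, the data is assumed to be summed as big-endian 16-bit words, and how the hardware-provided value is laid out and handed to the skb is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 16-bit one's complement sum of a buffer, treating
 * it as a sequence of big-endian 16-bit words. An odd trailing byte is
 * padded with a zero low byte. The result is the folded sum, not its
 * bitwise complement.
 */
uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	/* Add up all complete 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* Odd length: the last byte is the high half of a final word. */
	if (i < len)
		sum += (uint32_t)data[i] << 8;

	/* Fold carries back into the low 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

Software that has to compute this over every received packet is what shows up as do_csum() in the second perf report; with the offload enabled, that work is done by the hardware instead.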
Re: [patch net] mlxsw: spectrum: Prevent mirred-related crash on removal
On Tue, Sep 12, 2017 at 08:50:53AM +0200, Jiri Pirko wrote: > From: Yuval MintzHi Jiri, Yuval s/mirred/mirrored/g Andrew
RE: [PATCH v2 net 1/3] lan78xx: Fix for eeprom read/write when device auto suspend
> > From: Nisar Sayed> > > > Fix for eeprom read/write when device auto suspend > > > > Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to > > 10/100/1000 Ethernet device driver") > > Signed-off-by: Nisar Sayed > > --- > > drivers/net/usb/lan78xx.c | 22 ++ > > 1 file changed, 18 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c > > index b99a7fb..baf91c7 100644 > > --- a/drivers/net/usb/lan78xx.c > > +++ b/drivers/net/usb/lan78xx.c > > @@ -1265,30 +1265,44 @@ static int lan78xx_ethtool_get_eeprom(struct > net_device *netdev, > > struct ethtool_eeprom *ee, u8 *data) { > > struct lan78xx_net *dev = netdev_priv(netdev); > > + int ret = -EINVAL; > > + > > + if (usb_autopm_get_interface(dev->intf) < 0) > > + return ret; > > Hi Nisar > > It is better to do > >ret = usb_autopm_get_interface(dev->intf); >if (ret) > return ret; > > i.e. use the error code usb_autopm_get_interface() gives you. > > > ee->magic = LAN78XX_EEPROM_MAGIC; > > > > - return lan78xx_read_raw_eeprom(dev, ee->offset, ee->len, data); > > + ret = lan78xx_read_raw_eeprom(dev, ee->offset, ee->len, data); > > + > > + usb_autopm_put_interface(dev->intf); > > + > > + return ret; > > } > > > > static int lan78xx_ethtool_set_eeprom(struct net_device *netdev, > > struct ethtool_eeprom *ee, u8 *data) { > > struct lan78xx_net *dev = netdev_priv(netdev); > > + int ret = -EINVAL; > > + > > + if (usb_autopm_get_interface(dev->intf) < 0) > > + return ret; > > Same here. > > Andrew Thanks Andrew, will update it. - Nisar
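The shape of the review suggestion (take a power-management reference, propagate its real error code, do the work, drop the reference) can be shown in isolation with stand-ins. Everything in this sketch is hypothetical: struct fake_dev and the fake_* functions only imitate the get/put pairing and are not the USB core API.

```c
#include <errno.h>

/* Hypothetical stand-in for a device with power-management hooks. */
struct fake_dev {
	int pm_get_result;	/* what the "get" hook should return */
	int pm_refs;		/* outstanding power-management references */
	int io_result;		/* what the guarded operation should return */
};

/* Imitates a "get" that either fails with an errno or takes a reference. */
static int fake_pm_get(struct fake_dev *d)
{
	if (d->pm_get_result < 0)
		return d->pm_get_result;
	d->pm_refs++;
	return 0;
}

/* Imitates the matching "put". */
static void fake_pm_put(struct fake_dev *d)
{
	d->pm_refs--;
}

/* Imitates the operation that needs the device awake. */
static int fake_io(struct fake_dev *d)
{
	return d->io_result;
}

/* The pattern under discussion. */
int guarded_io(struct fake_dev *d)
{
	int ret;

	ret = fake_pm_get(d);
	if (ret)
		return ret;	/* pass on the real error, not a fixed -EINVAL */

	ret = fake_io(d);

	fake_pm_put(d);		/* always balance the successful get */

	return ret;
}
```

The point of the suggestion is visible in the early return: a caller sees, say, -ENODEV from the "get" step rather than a generic -EINVAL, while the reference count stays balanced on every path.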
Re: [PATCH net-next v2 1/2] net: phy: realtek: rename RTL8211F_PAGE_SELECT to RTL821x_PAGE_SELECT
On Tue, Sep 12, 2017 at 06:54:35PM +0900, Kunihiko Hayashi wrote: > This renames the definition of page select register from > RTL8211F_PAGE_SELECT to RTL821x_PAGE_SELECT to use it across models. > > Signed-off-by: Kunihiko HayashiReviewed-by: Andrew Lunn Andrew
[PATCH] w90p910_ether: include linux/interrupt.h
A randconfig build caused a compile failure: drivers/net/ethernet/nuvoton/w90p910_ether.c: In function 'w90p910_ether_close': drivers/net/ethernet/nuvoton/w90p910_ether.c:580:2: error: implicit declaration of function 'free_irq'; did you mean 'free_uid'? [-Werror=implicit-function-declaration] Adding the correct include fixes the problem. Signed-off-by: Arnd Bergmann--- drivers/net/ethernet/nuvoton/w90p910_ether.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/nuvoton/w90p910_ether.c b/drivers/net/ethernet/nuvoton/w90p910_ether.c index 89ab786da25f..4a67c55aa9f1 100644 --- a/drivers/net/ethernet/nuvoton/w90p910_ether.c +++ b/drivers/net/ethernet/nuvoton/w90p910_ether.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include -- 2.9.0
[PATCH net] net: bonding: fix tlb_dynamic_lb default value
Commit 8b426dc54cf4 ("bonding: remove hardcoded value") changed the default value for tlb_dynamic_lb which led to either broken ALB mode (since tlb_dynamic_lb can be changed only in TLB) or setting TLB mode with tlb_dynamic_lb equal to 0. The first issue was recently fixed by setting tlb_dynamic_lb to 1 always when switching to ALB mode, but the default value is still wrong and we'll enter TLB mode with tlb_dynamic_lb equal to 0 if the mode is changed via netlink or sysfs. In order to restore the previous behaviour and default value simply remove the mode check around the default param initialization for tlb_dynamic_lb which will always set it to 1 as before. Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value") Signed-off-by: Nikolay Aleksandrov--- drivers/net/bonding/bond_main.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index fc63992ab0e0..c99dc59d729b 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4289,7 +4289,7 @@ static int bond_check_params(struct bond_params *params) int bond_mode = BOND_MODE_ROUNDROBIN; int xmit_hashtype = BOND_XMIT_POLICY_LAYER2; int lacp_fast = 0; - int tlb_dynamic_lb = 0; + int tlb_dynamic_lb; /* Convert string parameters. 
*/ if (mode) { @@ -4601,16 +4601,13 @@ static int bond_check_params(struct bond_params *params) } ad_user_port_key = valptr->value; - if ((bond_mode == BOND_MODE_TLB) || (bond_mode == BOND_MODE_ALB)) { - bond_opt_initstr(, "default"); - valptr = bond_opt_parse(bond_opt_get(BOND_OPT_TLB_DYNAMIC_LB), - ); - if (!valptr) { - pr_err("Error: No tlb_dynamic_lb default value"); - return -EINVAL; - } - tlb_dynamic_lb = valptr->value; + bond_opt_initstr(, "default"); + valptr = bond_opt_parse(bond_opt_get(BOND_OPT_TLB_DYNAMIC_LB), ); + if (!valptr) { + pr_err("Error: No tlb_dynamic_lb default value"); + return -EINVAL; } + tlb_dynamic_lb = valptr->value; if (lp_interval == 0) { pr_warn("Warning: ip_interval must be between 1 and %d, so it was reset to %d\n", -- 2.1.4
Re: ipset losing entries on its own
Can somebody shed more light on this? How is it possible (without a bug) that, for exactly the same set of IPs, the IPSET HASHSIZE sometimes remains at 1024 and sometimes increases to 2048? As a workaround I am now setting HASHSIZE to 16384 when the IPSET is created, and so far (more than 4 days) the issue has not recurred. But this needs to be addressed, right?
Re: Subject: [PATCH] vxlan: only reduce known arp boardcast request to support, virtual IP
On Tue, 12 Sep 2017 11:26:49 +0800, oc wrote: > The purpose of vxlan arp reduce feature is to reply the broadcast > arp request in vtep instead of sending it out to save traffic. > The current implementation drops arp packet, if the ip cannot be > found in neigh table. In the case of virtual IP address, user > defines IP address without management from SDN controller. The IP > address does not exist in neigh table, so the arp broadcast request > from a client can not be sent to the server who owns the virtual IP > address. > > This patch allows the arp request to be sent out if: > 1. not arp broadcast request > 2. cannot be found in neigh table > 3. arp record status is not NUD_CONNECTED > > The user defined of virtual IP address works while arp reduce still > suppress the arp broadcast for IP address managed by SDN controller > with this patch. Your patch is whitespace damaged, does not conform to the kernel coding style and the email does not have your full name in the From header. As for the patch itself, you're changing existing functionality that people may depend on and thus a new config option is needed to enable the behavior. Jiri
[PATCH] vti: fix NULL dereference in xfrm_input()
Can be reproduced with LTP tests: # icmp-uni-vti.sh -p ah -a sha256 -m tunnel -S fffe -k 1 -s 10 IPv4: RIP: 0010:xfrm_input+0x7f9/0x870 ... Call Trace: vti_input+0xaa/0x110 [ip_vti] ? skb_free_head+0x21/0x40 vti_rcv+0x33/0x40 [ip_vti] xfrm4_ah_rcv+0x33/0x60 ip_local_deliver_finish+0x94/0x1e0 ip_local_deliver+0x6f/0xe0 ? ip_route_input_noref+0x28/0x50 ... # icmp-uni-vti.sh -6 -p ah -a sha256 -m tunnel -S fffe -k 1 -s 10 IPv6: RIP: 0010:xfrm_input+0x7f9/0x870 ... Call Trace: xfrm6_rcv_tnl+0x3c/0x40 vti6_rcv+0xd5/0xe0 [ip6_vti] xfrm6_ah_rcv+0x33/0x60 ip6_input_finish+0xee/0x460 ip6_input+0x3f/0xb0 ip6_rcv_finish+0x45/0xa0 ipv6_rcv+0x34b/0x540 xfrm_input() invokes xfrm_rcv_cb() -> vti_rcv_cb(), the last callback might call skb_scrub_packet(), which in turn can reset secpath. Fix it by adding a check that skb->sp is not NULL. Fixes: 7e9e9202bccc ("xfrm: Clear RX SKB secpath xfrm_offload") Signed-off-by: Alexey Kodanev--- net/xfrm/xfrm_input.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 2515cd2..8ac9d32 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -429,7 +429,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) nf_reset(skb); if (decaps) { - skb->sp->olen = 0; + if (skb->sp) + skb->sp->olen = 0; skb_dst_drop(skb); gro_cells_receive(_cells, skb); return 0; @@ -440,7 +441,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) err = x->inner_mode->afinfo->transport_finish(skb, xfrm_gro || async); if (xfrm_gro) { - skb->sp->olen = 0; + if (skb->sp) + skb->sp->olen = 0; skb_dst_drop(skb); gro_cells_receive(_cells, skb); return err; -- 1.7.1
[PATCH] qed: remove unnecessary call to memset
A call to memset() to zero memory immediately after allocating it with kzalloc() is unnecessary, as kzalloc() already returns zero-filled memory. Semantic patch used to resolve this issue: @@ expression e,e2; constant c; statement S; @@ e = kzalloc(e2, c); if(e == NULL) S - memset(e, 0, e2); Signed-off-by: Himanshu Jha--- drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c index eaca457..8f6ccc0 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c @@ -1244,7 +1244,6 @@ int qed_dcbx_get_config_params(struct qed_hwfn *p_hwfn, if (!dcbx_info) return -ENOMEM; - memset(dcbx_info, 0, sizeof(*dcbx_info)); rc = qed_dcbx_query_params(p_hwfn, dcbx_info, QED_DCBX_OPERATIONAL_MIB); if (rc) { kfree(dcbx_info); -- 2.7.4
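The same redundancy exists in userspace with calloc(), which makes for an easy way to see the point outside the kernel. The following is only an analogy, assuming calloc() as the stand-in for kzalloc(); struct params, params_alloc() and all_zero() are hypothetical names for this note.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure standing in for the driver's info block. */
struct params {
	int a;
	long b;
	char name[16];
};

/*
 * Allocate a zero-filled structure. calloc() already guarantees all
 * bytes are zero, so a following memset(p, 0, sizeof(*p)) would be
 * redundant. Returns NULL on allocation failure.
 */
struct params *params_alloc(void)
{
	return calloc(1, sizeof(struct params));
}

/* Return 1 when every byte in the buffer is zero, 0 otherwise. */
int all_zero(const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```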
Re: broken vlan support on Realtek RTL8111/8168/8411 rev 9
Hi Francois > ethtool -K eth0 rxvlan off Thank you, that did the trick, vlan tags are now correctly passed on and not set to vlan 0 with rxvlan turned off. > For my reward, please send a complete dmesg where the messages from > the vanilla r8169 module appear for the rev 09 card (r81..f ?). I > won't dig it right now. Yes, the 'f' variant: [1.035203] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded [1.035262] r8169 :03:00.0: setting latency timer to 64 [1.035348] r8169 :03:00.0: irq 41 for MSI/MSI-X [1.035634] r8169 :03:00.0: eth0: RTL8168f/8111f at 0xc9c28000, c8:60:00:dd:f8:6c, XID 08000800 IRQ 41 [1.035637] r8169 :03:00.0: eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko] [ 10.403921] r8169 :03:00.0: firmware: agent loaded rtl_nic/rtl8168f-1.fw into memory Linux pulsar 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux 624831688e25aa47fa84c30c045fcae3 /lib/firmware/rtl_nic/rtl8168f-1.fw Firmware Bug or Hardware Problem? -Benoît Panizzon- -- ImproWare AG - Leiter Commerce Kunden, Zurlindenstrasse 29, CH-4133 Pratteln, Schweiz, Tel +41 61 826 93 00, Fax +41 61 826 93 01, Web http://www.imp.ch
Re: [Patch net v3 3/3] net_sched: carefully handle tcf_block_put()
Tue, Sep 12, 2017 at 01:33:32AM CEST, xiyou.wangc...@gmail.com wrote: >As pointed out by Jiri, there is still a race condition between >tcf_block_put() and tcf_chain_destroy() in a RCU callback. There >is no way to make it correct without proper locking or synchronization, >because both operate on a shared list. > >Locking is hard, because the only lock we can pick here is a spinlock, >however, in tc_dump_tfilter() we iterate this list with a sleeping >function called (tcf_chain_dump()), which makes using a lock to protect >chain_list almost impossible. > >Jiri suggested the idea of holding a refcnt before flushing, this works >because it guarantees us there would be no parallel tcf_chain_destroy() >during the loop, therefore the race condition is gone. But we have to >be very careful with proper synchronization with RCU callbacks. > >Suggested-by: Jiri Pirko>Cc: Jamal Hadi Salim >Signed-off-by: Cong Wang Acked-by: Jiri Pirko Thanks!
Re: [Patch net v3 2/3] net_sched: fix reference counting of tc filter chain
Tue, Sep 12, 2017 at 01:33:31AM CEST, xiyou.wangc...@gmail.com wrote:
>This patch fixes the following ugliness of tc filter chain refcnt:
>
>a) tp proto should hold a refcnt to the chain too. This significantly
>   simplifies the logic.
>
>b) Chain 0 is no longer special, it is created with refcnt=1 like any
>   other chains. All the ugliness in tcf_chain_put() can be gone!
>
>c) No need to handle the flushing oddly, because block still holds
>   chain 0, it can not be released, this guarantees block is the last
>   user.
>
>d) The race condition with RCU callbacks is easier to handle with just
>   a rcu_barrier(). Much easier to understand, nothing to hide. Thanks
>   to the previous patch. Please see also the comments in code.
>
>e) Make the code understandable by humans, much less error-prone.
>
>Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is
>called multiple times")
>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>Cc: Jiri Pirko
>Cc: Jamal Hadi Salim
>Signed-off-by: Cong Wang

Looking good to me. Thanks!

Acked-by: Jiri Pirko
Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
Tue, Sep 12, 2017 at 11:42:15AM CEST, j...@resnulli.us wrote:
>Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangc...@gmail.com wrote:
>>gen estimator has been rewritten in commit 1c0d32fde5bd
>>("net_sched: gen_estimator: complete rewrite of rate estimators"),
>>the caller is no longer needed to wait for a grace period.
>>So this patch gets rid of it.
>>
>>This also completely closes a race condition between action free
>>path and filter chain add/remove path for the following patch.
>>Because otherwise the nested RCU callback can't be caught by
>>rcu_barrier().
>>
>>Please see also the comments in code.
>
>Looks like this is causing a null pointer dereference bug for me, 100%
>of the time. Just add and remove any rule with action and you get:
>
>[...]
>
>Looks like you need to save owner of the module before you call
>__tcf_idr_release so you can later on use it for module_put

This patch helps:

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index fcd7dc7..de73e71 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -514,13 +514,15 @@ EXPORT_SYMBOL(tcf_action_exec);

 int tcf_action_destroy(struct list_head *actions, int bind)
 {
+	const struct tc_action_ops *ops;
 	struct tc_action *a, *tmp;
 	int ret = 0;

 	list_for_each_entry_safe(a, tmp, actions, list) {
+		ops = a->ops;
 		ret = __tcf_idr_release(a, bind, true);
 		if (ret == ACT_P_DELETED)
-			module_put(a->ops->owner);
+			module_put(ops->owner);
 		else if (ret < 0)
 			return ret;
 	}
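The fix in this mail follows a general pattern: copy out whatever is still
needed from an object before calling a function that may free it. Below is a
minimal userspace sketch of that pattern. The names struct ops, struct action,
release(), module_put_stub() and new_action() are hypothetical stand-ins
invented for illustration, not the kernel API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for tc_action_ops / tc_action. */
struct ops { int owner_refs; };
struct action { struct ops *ops; int refcnt; };

enum { RELEASED_KEPT = 0, RELEASED_DELETED = 1 };

/* Drops one reference; frees the action when the count reaches zero. */
static int release(struct action *a)
{
	if (--a->refcnt > 0)
		return RELEASED_KEPT;
	free(a);
	return RELEASED_DELETED;
}

/* Stand-in for module_put(): drops one reference on the owner. */
static void module_put_stub(struct ops *ops)
{
	ops->owner_refs--;
}

/* Save a->ops first: once release() reports DELETED, 'a' must not be used. */
static int destroy_action(struct action *a)
{
	struct ops *ops;
	int ret;

	assert(a != NULL);
	ops = a->ops;
	ret = release(a);
	if (ret == RELEASED_DELETED)
		module_put_stub(ops);
	return ret;
}

/* Allocates an action holding one reference on itself and on its owner. */
static struct action *new_action(struct ops *ops)
{
	struct action *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	a->ops = ops;
	a->refcnt = 1;
	ops->owner_refs++;
	return a;
}
```

Reading a->ops after release() in this sketch would be the same
use-after-free that KASAN reported for the original code.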
[PATCH] ipv4: Namespaceify tcp_fastopen knob
Different namespace application might require enable TCP Fast Open feature independently of the host. Reported-by: Luca BRUNOSigned-off-by: Haishuang Yan --- include/net/netns/ipv4.h | 2 ++ include/net/tcp.h | 1 - net/ipv4/af_inet.c | 7 --- net/ipv4/sysctl_net_ipv4.c | 42 +- net/ipv4/tcp.c | 4 ++-- net/ipv4/tcp_fastopen.c| 13 ++--- net/ipv4/tcp_ipv4.c| 2 ++ samples/bpf/test_ipip.sh | 2 ++ 8 files changed, 39 insertions(+), 34 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 305e031..ea0953b 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -128,6 +128,8 @@ struct netns_ipv4 { struct inet_timewait_death_row tcp_death_row; int sysctl_max_syn_backlog; int sysctl_tcp_max_orphans; + int sysctl_tcp_fastopen; + unsigned int sysctl_tcp_fastopen_blackhole_timeout; #ifdef CONFIG_NET_L3_MASTER_DEV int sysctl_udp_l3mdev_accept; diff --git a/include/net/tcp.h b/include/net/tcp.h index ac2d998..e4cc0dd 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -240,7 +240,6 @@ /* sysctl variables for tcp */ -extern int sysctl_tcp_fastopen; extern int sysctl_tcp_retrans_collapse; extern int sysctl_tcp_stdurg; extern int sysctl_tcp_rfc1337; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index e31108e..309b849 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -195,7 +195,7 @@ int inet_listen(struct socket *sock, int backlog) { struct sock *sk = sock->sk; unsigned char old_state; - int err; + int err, tcp_fastopen; lock_sock(sk); @@ -217,8 +217,9 @@ int inet_listen(struct socket *sock, int backlog) * because the socket was in TCP_LISTEN state previously but * was shutdown() rather than close(). 
*/ - if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) && - (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) && + tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen; + if ((tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) && + (tcp_fastopen & TFO_SERVER_ENABLE) && !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) { fastopen_queue_tune(sk, backlog); tcp_fastopen_init_key_once(true); diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 4f26c8d3..30ebeb9 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -394,27 +394,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl, .proc_handler = proc_dointvec }, { - .procname = "tcp_fastopen", - .data = _tcp_fastopen, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { - .procname = "tcp_fastopen_key", - .mode = 0600, - .maxlen = ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10), - .proc_handler = proc_tcp_fastopen_key, - }, - { - .procname = "tcp_fastopen_blackhole_timeout_sec", - .data = _tcp_fastopen_blackhole_timeout, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_tfo_blackhole_detect_timeout, - .extra1 = , - }, - { .procname = "tcp_abort_on_overflow", .data = _tcp_abort_on_overflow, .maxlen = sizeof(int), @@ -1085,6 +1064,27 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl, .mode = 0644, .proc_handler = proc_dointvec }, + { + .procname = "tcp_fastopen", + .data = _net.ipv4.sysctl_tcp_fastopen, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .procname = "tcp_fastopen_key", + .mode = 0600, + .maxlen = ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10), + .proc_handler = proc_tcp_fastopen_key, + }, + { + .procname = "tcp_fastopen_blackhole_timeout_sec", + .data = _net.ipv4.sysctl_tcp_fastopen_blackhole_timeout, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_tfo_blackhole_detect_timeout, + .extra1 = , + }, #ifdef CONFIG_IP_ROUTE_MULTIPATH { .procname =
Re: [PATCH] ieee802154: fix gcc-4.9 warnings
Hello.

On 09/12/2017 12:16 PM, Arnd Bergmann wrote:
> All older compiler versions up to gcc-4.9 produce these
> harmless warnings:
>
> drivers/net/ieee802154/ca8210.c: In function 'ca8210_skb_tx':
> drivers/net/ieee802154/ca8210.c:1947:9: warning: missing braces around initializer [-Wmissing-braces]
>
> This changes the syntax to something that works on all versions
> without warnings.
>
> Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
> Signed-off-by: Arnd Bergmann
> ---
>  drivers/net/ieee802154/ca8210.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
> index 24a1eabbbc9d..e6b8ce81a6c3 100644
> --- a/drivers/net/ieee802154/ca8210.c
> +++ b/drivers/net/ieee802154/ca8210.c
> @@ -1944,7 +1944,7 @@ static int ca8210_skb_tx(
>  )
>  {
>  	int status;
> -	struct ieee802154_hdr header = { 0 };
> +	struct ieee802154_hdr header = { };
>  	struct secspec secspec;
>  	unsigned int mac_len;

Acked-by: Stefan Schmidt

regards
Stefan Schmidt
[PATCH] ieee802154: fix gcc-4.9 warnings
All older compiler versions up to gcc-4.9 produce these
harmless warnings:

drivers/net/ieee802154/ca8210.c: In function 'ca8210_skb_tx':
drivers/net/ieee802154/ca8210.c:1947:9: warning: missing braces around initializer [-Wmissing-braces]

This changes the syntax to something that works on all versions
without warnings.

Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Signed-off-by: Arnd Bergmann
---
 drivers/net/ieee802154/ca8210.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 24a1eabbbc9d..e6b8ce81a6c3 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -1944,7 +1944,7 @@ static int ca8210_skb_tx(
 )
 {
 	int status;
-	struct ieee802154_hdr header = { 0 };
+	struct ieee802154_hdr header = { };
 	struct secspec secspec;
 	unsigned int mac_len;

--
2.9.0
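For context, older gcc tends to emit -Wmissing-braces when the first member of
a struct is itself an aggregate and the initializer is a bare { 0 }. The
sketch below uses a hypothetical header layout (struct frame_ctl and
struct hdr are made up for illustration) to show the two warning-free
spellings. Note that the empty-brace form is a GNU C extension, also accepted
by C23, so it assumes a gcc-compatible compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header whose first member is itself a struct. */
struct frame_ctl { uint8_t type; uint8_t flags; };
struct hdr { struct frame_ctl fc; uint8_t seq; uint16_t pan_id; };

/* Empty braces: GNU C extension (standard in C23); zeroes every member. */
static struct hdr make_zeroed_hdr(void)
{
	struct hdr h = { };

	return h;
}

/* Fully braced form, valid in every revision of the C standard. */
static struct hdr make_zeroed_hdr_portable(void)
{
	struct hdr h = { { 0 } };

	return h;
}
```

Both functions return a header with all members set to zero; they differ only
in which compilers accept them without a diagnostic.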
[PATCH net-next v2 1/2] net: phy: realtek: rename RTL8211F_PAGE_SELECT to RTL821x_PAGE_SELECT
This renames the definition of page select register from
RTL8211F_PAGE_SELECT to RTL821x_PAGE_SELECT to use it across models.

Signed-off-by: Kunihiko Hayashi
---
Changes since v1:
- new patch in this series
---
 drivers/net/phy/realtek.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 9cbe645..99c3297 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -22,11 +22,11 @@
 #define RTL821x_INER		0x12
 #define RTL821x_INER_INIT	0x6400
 #define RTL821x_INSR		0x13
+#define RTL821x_PAGE_SELECT	0x1f
 #define RTL8211E_INER_LINK_STATUS 0x400

 #define RTL8211F_INER_LINK_STATUS 0x0010
 #define RTL8211F_INSR		0x1d
-#define RTL8211F_PAGE_SELECT	0x1f
 #define RTL8211F_TX_DELAY	0x100

 MODULE_DESCRIPTION("Realtek PHY driver");
@@ -46,10 +46,10 @@ static int rtl8211f_ack_interrupt(struct phy_device *phydev)
 {
 	int err;

-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0xa43);
+	phy_write(phydev, RTL821x_PAGE_SELECT, 0xa43);
 	err = phy_read(phydev, RTL8211F_INSR);
 	/* restore to default page 0 */
-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+	phy_write(phydev, RTL821x_PAGE_SELECT, 0x0);

 	return (err < 0) ? err : 0;
 }
@@ -102,7 +102,7 @@ static int rtl8211f_config_init(struct phy_device *phydev)
 	if (ret < 0)
 		return ret;

-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0xd08);
+	phy_write(phydev, RTL821x_PAGE_SELECT, 0xd08);
 	reg = phy_read(phydev, 0x11);

 	/* enable TX-delay for rgmii-id and rgmii-txid, otherwise disable it */
@@ -114,7 +114,7 @@ static int rtl8211f_config_init(struct phy_device *phydev)

 	phy_write(phydev, 0x11, reg);
 	/* restore to default page 0 */
-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+	phy_write(phydev, RTL821x_PAGE_SELECT, 0x0);

 	return 0;
 }
--
2.7.4
[PATCH net-next v2 2/2] net: phy: realtek: add RTL8201F phy-id and functions
From: Jassi BrarAdd RTL8201F phy-id and the related functions to the driver. The original patch is as follows: https://patchwork.kernel.org/patch/2538341/ Signed-off-by: Jongsung Kim Signed-off-by: Jassi Brar Signed-off-by: Kunihiko Hayashi Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli --- Changes since v1: - use RTL821x_PAGE_SELECT instead of defining RTL8201F_PAGE_SELECT --- drivers/net/phy/realtek.c | 44 1 file changed, 44 insertions(+) diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c index 99c3297..d4670ec 100644 --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -29,10 +29,22 @@ #define RTL8211F_INSR 0x1d #define RTL8211F_TX_DELAY 0x100 +#define RTL8201F_ISR 0x1e +#define RTL8201F_IER 0x13 + MODULE_DESCRIPTION("Realtek PHY driver"); MODULE_AUTHOR("Johnson Leung"); MODULE_LICENSE("GPL"); +static int rtl8201_ack_interrupt(struct phy_device *phydev) +{ + int err; + + err = phy_read(phydev, RTL8201F_ISR); + + return (err < 0) ? err : 0; +} + static int rtl821x_ack_interrupt(struct phy_device *phydev) { int err; @@ -54,6 +66,25 @@ static int rtl8211f_ack_interrupt(struct phy_device *phydev) return (err < 0) ? 
err : 0; } +static int rtl8201_config_intr(struct phy_device *phydev) +{ + int err; + + /* switch to page 7 */ + phy_write(phydev, RTL821x_PAGE_SELECT, 0x7); + + if (phydev->interrupts == PHY_INTERRUPT_ENABLED) + err = phy_write(phydev, RTL8201F_IER, + BIT(13) | BIT(12) | BIT(11)); + else + err = phy_write(phydev, RTL8201F_IER, 0); + + /* restore to default page 0 */ + phy_write(phydev, RTL821x_PAGE_SELECT, 0x0); + + return err; +} + static int rtl8211b_config_intr(struct phy_device *phydev) { int err; @@ -129,6 +160,18 @@ static struct phy_driver realtek_drvs[] = { .config_aneg= _config_aneg, .read_status= _read_status, }, { + .phy_id = 0x001cc816, + .name = "RTL8201F 10/100Mbps Ethernet", + .phy_id_mask= 0x001f, + .features = PHY_BASIC_FEATURES, + .flags = PHY_HAS_INTERRUPT, + .config_aneg= _config_aneg, + .read_status= _read_status, + .ack_interrupt = _ack_interrupt, + .config_intr= _config_intr, + .suspend= genphy_suspend, + .resume = genphy_resume, + }, { .phy_id = 0x001cc912, .name = "RTL8211B Gigabit Ethernet", .phy_id_mask= 0x001f, @@ -181,6 +224,7 @@ static struct phy_driver realtek_drvs[] = { module_phy_driver(realtek_drvs); static struct mdio_device_id __maybe_unused realtek_tbl[] = { + { 0x001cc816, 0x001f }, { 0x001cc912, 0x001f }, { 0x001cc914, 0x001f }, { 0x001cc915, 0x001f }, -- 2.7.4
[PATCH v4 2/2] ip6_tunnel: fix ip6 tunnel lookup in collect_md mode
In collect_md mode, if the tun dev is down, it can still call
__ip6_tnl_rcv to receive packets, and the rx statistics increase
improperly.

When the md tunnel is down, it's not necessary to increase RX drops
for the tunnel device; packets would be received on the fallback tunnel,
and the RX drops on the fallback device will be increased as expected.

Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Cc: Alexei Starovoitov
Signed-off-by: Haishuang Yan
---
Change since v4:
 * Make the commit message clearer
 * Fix wrong recipient address
---
 net/ipv6/ip6_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 10a693a..ae73164 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -171,7 +171,7 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
 	}

 	t = rcu_dereference(ip6n->collect_md_tun);
-	if (t)
+	if (t && t->dev->flags & IFF_UP)
 		return t;

 	t = rcu_dereference(ip6n->tnls_wc[0]);
--
1.8.3.1
[PATCH v4 1/2] ip_tunnel: fix ip tunnel lookup in collect_md mode
In collect_md mode, if the tun dev is down, it can still call
ip_tunnel_rcv to receive packets, and the rx statistics increase
improperly.

When the md tunnel is down, it's not necessary to increase RX drops
for the tunnel device; packets would be received on the fallback tunnel,
and the RX drops on the fallback device will be increased as expected.

Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
Cc: Pravin B Shelar
Signed-off-by: Haishuang Yan
---
Change since v4:
 * Make the commit message clearer.
 * Fix wrong recipient address
---
 net/ipv4/ip_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e1856bf..e9805ad 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -176,7 +176,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 		return cand;

 	t = rcu_dereference(itn->collect_md_tun);
-	if (t)
+	if (t && t->dev->flags & IFF_UP)
 		return t;

 	if (itn->fb_tunnel_dev && itn->fb_tunnel_dev->flags & IFF_UP)
--
1.8.3.1
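The lookup order these two patches aim for can be sketched in plain C as
follows. This is only an illustration under assumed types: struct dev,
struct tunnel, IFF_UP_FLAG and lookup() are hypothetical stand-ins, and RCU is
left out entirely. The condition in the patch is written without inner
parentheses, which works because bitwise '&' binds tighter than '&&' in C.

```c
#include <assert.h>
#include <stddef.h>

#define IFF_UP_FLAG 0x1	/* stand-in for IFF_UP */

struct dev { unsigned int flags; };
struct tunnel { struct dev *dev; };

/*
 * Prefer the collect_md tunnel only when its device is up; otherwise use
 * the fallback tunnel if that one is up. The inner parentheses are only
 * for readability, since '&' already binds tighter than '&&'.
 */
static struct tunnel *lookup(struct tunnel *md, struct tunnel *fallback)
{
	if (md && (md->dev->flags & IFF_UP_FLAG))
		return md;
	if (fallback && (fallback->dev->flags & IFF_UP_FLAG))
		return fallback;
	return NULL;
}
```

With a down collect_md device the lookup now falls through, so any drop would
be accounted on the fallback device, as the commit message describes.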
Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangc...@gmail.com wrote: >gen estimator has been rewritten in commit 1c0d32fde5bd >("net_sched: gen_estimator: complete rewrite of rate estimators"), >the caller is no longer needed to wait for a grace period. >So this patch gets rid of it. > >This also completely closes a race condition between action free >path and filter chain add/remove path for the following patch. >Because otherwise the nested RCU callback can't be caught by >rcu_barrier(). > >Please see also the comments in code. Looks like this is causing a null pointer dereference bug for me, 100% of the time. Just add and remove any rule with action and you get: [ 598.599825] BUG: unable to handle kernel NULL pointer dereference at 0030 [ 598.607782] IP: tcf_action_destroy+0xc0/0x140 [ 598.612231] PGD 0 P4D 0 [ 598.614797] Oops: [#1] SMP KASAN [ 598.618525] Modules linked in: act_gact cls_flower sch_ingress rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl x86_pkg_temp_thermal coretemp mlxsw_spectrum kvm_intel mlxfw kvm parman bridge sunrpc irqbypass iTCO_wdt iTCO_vendor_support stp crct10dif_pclmul llc crc32_pclmul crc32c_intel mlxsw_pci ghash_clmulni_intel mlxsw_core i2c_i801 e1000e pcspkr ptp tpm_tis mei_me pps_core mei tpm_tis_core lpc_ich tpm shpchp video [ 598.659010] CPU: 1 PID: 758 Comm: bash Tainted: GB 4.13.0jiri+ #70 [ 598.666509] Hardware name: Mellanox Technologies Ltd. 
Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016 [ 598.677630] task: 880371624bc0 task.stack: 880387808000 [ 598.683648] RIP: 0010:tcf_action_destroy+0xc0/0x140 [ 598.688617] RSP: 0018:88038d107cb8 EFLAGS: 00010282 [ 598.693922] RAX: RBX: 88038d107d28 RCX: 820b80e0 [ 598.701184] RDX: RSI: 0008 RDI: 0030 [ 598.708405] RBP: 88038d107ce8 R08: 0001 R09: 0001 [ 598.715607] R10: 88038d107b27 R11: fbfff0bcf36c R12: [ 598.722816] R13: 88038d107d38 R14: 88036bf75650 R15: 0001 [ 598.730047] FS: 7f398050b700() GS:88038d10() knlGS: [ 598.738253] CS: 0010 DS: ES: CR0: 80050033 [ 598.744086] CR2: 0030 CR3: 000371ac4001 CR4: 001606e0 [ 598.751328] Call Trace: [ 598.753809] [ 598.755871] tcf_exts_destroy+0x17f/0x260 [ 598.775969] fl_destroy_filter+0x1d/0x30 [cls_flower] [ 598.781069] rcu_process_callbacks+0x6b2/0xe00 Kasan says: [ 597.503005] BUG: KASAN: use-after-free in tcf_action_destroy+0xad/0x140 [ 597.509751] Read of size 8 at addr 88036bf75640 by task bash/758 [ 597.516222] [ 597.517761] CPU: 1 PID: 758 Comm: bash Not tainted 4.13.0jiri+ #70 [ 597.524075] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016 [ 597.535132] Call Trace: [ 597.537630] [ 597.539718] dump_stack+0xd5/0x150 [ 597.554853] print_address_description+0x86/0x410 [ 597.559667] kasan_report+0x181/0x4c0 [ 597.583360] tcf_action_destroy+0xad/0x140 [ 597.587551] tcf_exts_destroy+0x17f/0x260 Ubsan says: [ 598.184033] UBSAN: Undefined behaviour in net/sched/act_api.c:523:4 [ 598.190409] member access within null pointer of type 'const struct tc_action_ops' [ 598.198076] CPU: 1 PID: 758 Comm: bash Tainted: GB 4.13.0jiri+ #70 [ 598.205570] Hardware name: Mellanox Technologies Ltd. 
Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016 [ 598.216669] Call Trace: [ 598.219157] [ 598.221245] dump_stack+0xd5/0x150 [ 598.228703] ubsan_epilogue+0xd/0x4e [ 598.232333] __ubsan_handle_type_mismatch+0xf2/0x293 [ 598.252880] tcf_action_destroy+0x115/0x140 [ 598.257151] tcf_exts_destroy+0x17f/0x260 [ 598.277336] fl_destroy_filter+0x1d/0x30 [cls_flower] [ 598.282472] rcu_process_callbacks+0x6b2/0xe00 Looks like you need to save owner of the module before you call __tcf_idr_release so you can later on use it for module_put
RE: [PATCH] tipc: Use bsearch library function
From: David Miller
> Sent: 11 September 2017 22:30
> From: Thomas Meyer
> Date: Sat, 9 Sep 2017 05:18:19 +0200
>
> > @@ -168,6 +169,18 @@ static struct name_seq *tipc_nameseq_create(u32 type, struct hlist_head *seq_hea
> >  	return nseq;
> >  }
> >
> > +static int nameseq_find_subseq_cmp(const void *key, const void *elt)
> > +{
> > +	u32 instance = *(u32 *)key;
> > +	struct sub_seq *sseq = (struct sub_seq *)elt;
>
> Please order local variables from longest to shortest (ie. reverse
> christmas tree).

You probably just need to remove the unnecessary cast of 'void *'.
Although adding the 'const' qualifier will make it wrong again.

You probably ought to make the 'key' a structure - even if it only
contains a single u32.
Casting pointers to numeric types is often wrong.

	David
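One way to read the suggestion about making the key a structure is sketched
below. It uses the userspace bsearch() from <stdlib.h> rather than the
kernel's helper, and struct sub_seq, struct sub_seq_key, sub_seq_cmp() and
find_subseq() are simplified, hypothetical versions written for illustration,
not the TIPC code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sub-sequence covering the inclusive range [lower, upper]. */
struct sub_seq { uint32_t lower; uint32_t upper; };

/* Key wrapped in a struct, so the comparator never casts to a bare u32. */
struct sub_seq_key { uint32_t instance; };

/* const void * converts implicitly, so no casts are needed here. */
static int sub_seq_cmp(const void *key, const void *elt)
{
	const struct sub_seq_key *k = key;
	const struct sub_seq *sseq = elt;

	if (k->instance < sseq->lower)
		return -1;
	if (k->instance > sseq->upper)
		return 1;
	return 0;
}

/* 'seqs' must be sorted by range and the ranges must not overlap. */
static const struct sub_seq *find_subseq(const struct sub_seq *seqs, size_t n,
					 uint32_t instance)
{
	struct sub_seq_key key = { .instance = instance };

	assert(seqs != NULL || n == 0);
	return bsearch(&key, seqs, n, sizeof(*seqs), sub_seq_cmp);
}
```

Because both comparator arguments stay const-qualified pointers to named
types, adding or removing 'const' later does not require touching any casts.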
Re: [PATCH net-next 2/3] net: ethernet: socionext: add AVE ethernet driver
Hi Andrew,

On Mon, 11 Sep 2017 14:00:09 +0200 Andrew Lunn wrote:

> > > > +static irqreturn_t ave_interrupt(int irq, void *netdev)
> > > > +{
> > > > +	struct net_device *ndev = (struct net_device *)netdev;
> > > > +	struct ave_private *priv = netdev_priv(ndev);
> > > > +	u32 gimr_val, gisr_val;
> > > > +
> > > > +	gimr_val = ave_irq_disable_all(ndev);
> > > > +
> > > > +	/* get interrupt status */
> > > > +	gisr_val = ave_r32(ndev, AVE_GISR);
> > > > +
> > > > +	/* PHY */
> > > > +	if (gisr_val & AVE_GI_PHY) {
> > > > +		ave_w32(ndev, AVE_GISR, AVE_GI_PHY);
> > > > +		if (priv->internal_phy_interrupt)
> > > > +			phy_mac_interrupt(ndev->phydev, ndev->phydev->link);
> > >
> > > Humm. I don't think this is correct. You are supposed to give it the
> > > new link state, not the old.
> > >
> > > What does a PHY interrupt mean here?
> >
> > In the general case, I think PHY events like changing link state are transmitted
> > to CPU as interrupt via interrupt controller, then PHY driver itself can handle
> > the interrupt.
> >
> > And in this case, PHY events are transmitted to MAC as one of its interrupt factor,
> > then I thought that MAC driver had to tell the events to PHY.
>
> Could this be in-band SGMII signalling from the PHY to the MAC? Does
> the documentation give examples of when this interrupt will happen?
>
> 	Andrew

Unfortunately this MAC doesn't support SGMII.
And there aren't any examples of when this interrupt will happen.

This interrupt happens after ave_open() is called and link is established.
However, I found that auto negotiation didn't start when this interrupt
wasn't handled. Although ave_init() calls phy_start_aneg(), it doesn't make
sense because the phy driver hasn't started yet.

And according to Florian's comment in ave_init(),

> +	phy_start_interrupts(phydev);
> +
> +	phy_start_aneg(phydev);
>
> No, no, phy_start() would take care of all of that correctly for you,
> you don't have to do it just there because ave_open() eventually
> calls phy_start() for you, so just drop these two calls.

When moving phy_start_aneg() to ave_open(), the handler doesn't need to call
phy_mac_interrupt() and I can remove it from the handler.

---
Best Regards,
Kunihiko Hayashi
scheduling while atomic from vmci_transport_recv_stream_cb in 3.16 kernels
Hi, we are seeing the following splat with Debian 3.16 stable kernel BUG: scheduling while atomic: MATLAB/26771/0x0100 Modules linked in: veeamsnap(O) hmac cbc cts nfsv4 dns_resolver rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc vmw_vso$ CPU: 0 PID: 26771 Comm: MATLAB Tainted: G O 3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u3 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015 88315c1e4c20 8150db3f 88193f803dc8 8150acdf 815103a2 00012f00 8819423dbfd8 00012f00 88315c1e4c20 88193f803dc8 88193f803d50 88193f803dc0 Call Trace: [] ? dump_stack+0x41/0x51 [] ? __schedule_bug+0x48/0x55 [] ? __schedule+0x5d2/0x700 [] ? schedule_timeout+0x229/0x2a0 [] ? select_task_rq_fair+0x390/0x700 [] ? check_preempt_wakeup+0x120/0x1d0 [] ? wait_for_completion+0xa8/0x120 [] ? wake_up_state+0x10/0x10 [] ? call_rcu_bh+0x20/0x20 [] ? wait_rcu_gp+0x4b/0x60 [] ? ftrace_raw_output_rcu_utilization+0x40/0x40 [] ? vmci_event_unsubscribe+0x75/0xb0 [vmw_vmci] [] ? vmci_transport_destruct+0x1d/0xe0 [vmw_vsock_vmci_transport] [] ? vsock_sk_destruct+0x13/0x60 [vsock] [] ? __sk_free+0x1a/0x130 [] ? vmci_transport_recv_stream_cb+0x1e8/0x2d0 [vmw_vsock_vmci_transport] [] ? vmci_datagram_invoke_guest_handler+0xaa/0xd0 [vmw_vmci] [] ? vmci_dispatch_dgs+0xc1/0x200 [vmw_vmci] [] ? tasklet_action+0xf4/0x100 [] ? __do_softirq+0xf1/0x290 [] ? irq_exit+0x95/0xa0 [] ? do_IRQ+0x52/0xe0 [] ? common_interrupt+0x6d/0x6d AFAICS this has been fixed by 4ef7ea9195ea ("VSOCK: sock_put wasn't safe to call in interrupt context") but this patch hasn't been backported to stable trees. It applies cleanly on top of 3.16 stable tree but I am not familiar with the code to send the backport to the stable maintainer directly. Could you double check that the patch below (just a blind cherry-pick) is correct and it doesn't need additional patches on top? Ben could you take this to your stable 3.16 branch if the patch is correct? 
I am CCing Sasha for 4.1 stable tree as well. I haven't checked whether pre 3.16 kernels are affected as well. --- commit fac774c40b5c512113b6373cad498f35bee7a409 Author: Jorgen HansenDate: Wed Oct 21 04:53:56 2015 -0700 VSOCK: sock_put wasn't safe to call in interrupt context commit 4ef7ea9195ea73262cd9730fb54e1eb726da157b upstream. In the vsock vmci_transport driver, sock_put wasn't safe to call in interrupt context, since that may call the vsock destructor which in turn calls several functions that should only be called from process context. This change defers the callling of these functions to a worker thread. All these functions were deallocation of resources related to the transport itself. Furthermore, an unused callback was removed to simplify the cleanup. Multiple customers have been hitting this issue when using VMware tools on vSphere 2015. Also added a version to the vmci transport module (starting from 1.0.2.0-k since up until now it appears that this module was sharing version with vsock that is currently at 1.0.1.0-k). Reviewed-by: Aditya Asarwade Reviewed-by: Thomas Hellstrom Signed-off-by: Jorgen Hansen Signed-off-by: David S. 
Miller diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index 9bb63ffec4f2..aed136d27b01 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -40,13 +40,11 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg); static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg); -static void vmci_transport_peer_attach_cb(u32 sub_id, - const struct vmci_event_data *ed, - void *client_data); static void vmci_transport_peer_detach_cb(u32 sub_id, const struct vmci_event_data *ed, void *client_data); static void vmci_transport_recv_pkt_work(struct work_struct *work); +static void vmci_transport_cleanup(struct work_struct *work); static int vmci_transport_recv_listen(struct sock *sk, struct vmci_transport_packet *pkt); static int vmci_transport_recv_connecting_server( @@ -75,6 +73,10 @@ struct vmci_transport_recv_pkt_info { struct vmci_transport_packet pkt; }; +static LIST_HEAD(vmci_transport_cleanup_list); +static DEFINE_SPINLOCK(vmci_transport_cleanup_lock); +static DECLARE_WORK(vmci_transport_cleanup_work, vmci_transport_cleanup); + static struct vmci_handle vmci_transport_stream_handle = { VMCI_INVALID_ID,
Re: [PATCH iproute2 1/2] lib/libnetlink: re malloc buff if size is not enough
On Tue, Sep 12, 2017 at 10:47:48AM +0200, Michal Kubecek wrote:
> On Mon, Sep 11, 2017 at 03:19:55PM +0800, Hangbin Liu wrote:
> > On Fri, Sep 08, 2017 at 04:51:13PM +0200, Phil Sutter wrote:
> > > Regarding Michal's concern about reentrancy, maybe we should go into a
> > > different direction and make rtnl_recvmsg() return a newly allocated
> > > buffer which the caller has to free.
> >
> > Hmm... But we could not free the buf in __rtnl_talk(). Because in
> > __rtnl_talk() we assign the answer with the buf address and return to
> > caller.
> >
> > 	for (h = (struct nlmsghdr *)buf; status >= sizeof(*h); ) {
> > 		[...]
> > 		if (answer) {
> > 			*answer= h;
> > 			return 0;
> > 		}
> > 	}
> >
> > And the caller will keep use it in later code. Since there are plenty of
> > functions called rtnl_talk. I think it would be much more complex to free
> > the buffer every time.
> >
> >
> > Hi Michal,
> >
> > Would you like to tell me more about your concern with reentrancy? It's
> > looks arpd doesn't call rtnl_talk() or rtnl_dump_filter_l().
>
> I checked again and arpd indeed isn't a problem. It doesn't seem to call
> any of the two functions (directly or indirectly) and while it's linked
> with "-lpthread", it's not really multithreaded.
>
> But my concern was rather about other potential users of libnetlink
> (i.e. those which are not part of iproute2). I must admit, though, that
> I'm not sure if libnetlink code is reentrant as of now. (And people are
> discouraged from using it in its own manual page.)
>
> That being said, I still like Phil's idea for a different reason. While
> investigating the issue with "ip link show dev eth ..." which led me to
> commit 6599162b958e ("iplink: check for message truncation in
> iplink_get()"), I quickly peeked at some other callers of rtnl_talk()
> and I'm afraid there may be others which wouldn't handle truncated
> message correctly. I assume the maxlen argument was always chosen to be
> sufficient for any expected messages but as the example of iplink_get()
> shows, messages returned by kernel may grow over time.
>
> That's why I like the idea of __rtnl_talk() returning a pointer to newly
> allocated buffer (of sufficient size) rather than copying the response
> into a buffer provided by caller and potentially truncating it.

I'm sorry, I managed to forget that your patch 2 does already address
this problem. But the fact that any caller must keep in mind that he
must not call the same function again until the previous response is no
longer needed still feels like a trap. It's something you need to keep
in mind (where "you" in fact means any future contributor) and it's easy
to forget.

That's why I prefer the reentrant functions like strerror_r() or
localtime_r() even in code which is not intended to be multithreaded.
Getting an object which is "mine" to do with whatever I want until I no
longer need it feels like a cleaner interface to me.

Michal Kubecek
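The interface trade-off discussed in this thread can be illustrated with a
small userspace sketch. reply_static() and reply_alloc() are hypothetical
functions made up for this example and have nothing to do with the actual
libnetlink code; they only contrast a shared static buffer with a
caller-owned allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Non-reentrant style: every call overwrites the same static buffer. */
static const char *reply_static(const char *msg)
{
	static char buf[64];

	assert(msg != NULL);
	strncpy(buf, msg, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	return buf;
}

/*
 * Caller-owned style: each call returns a fresh buffer sized to fit the
 * message, so nothing is truncated and earlier replies stay valid.
 * The caller must free() the result. Returns NULL if allocation fails.
 */
static char *reply_alloc(const char *msg)
{
	size_t len;
	char *buf;

	assert(msg != NULL);
	len = strlen(msg) + 1;
	buf = malloc(len);
	if (buf)
		memcpy(buf, msg, len);
	return buf;
}
```

With reply_static(), a second call silently changes what the first returned
pointer refers to, and long messages are cut at the buffer size. With
reply_alloc(), the cost is that every caller has to remember to free().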
Re: [PATCH iproute2 1/2] lib/libnetlink: re malloc buff if size is not enough
On Mon, Sep 11, 2017 at 03:19:55PM +0800, Hangbin Liu wrote:
> On Fri, Sep 08, 2017 at 04:51:13PM +0200, Phil Sutter wrote:
> > Regarding Michal's concern about reentrancy, maybe we should go into a
> > different direction and make rtnl_recvmsg() return a newly allocated
> > buffer which the caller has to free.
>
> Hmm... But we could not free the buf in __rtnl_talk(). Because in
> __rtnl_talk() we assign the answer with the buf address and return to caller.
>
> 	for (h = (struct nlmsghdr *)buf; status >= sizeof(*h); ) {
> 		[...]
> 		if (answer) {
> 			*answer= h;
> 			return 0;
> 		}
> 	}
>
> And the caller will keep use it in later code. Since there are plenty of
> functions called rtnl_talk. I think it would be much more complex to free
> the buffer every time.
>
>
> Hi Michal,
>
> Would you like to tell me more about your concern with reentrancy? It's looks
> arpd doesn't call rtnl_talk() or rtnl_dump_filter_l().

I checked again and arpd indeed isn't a problem. It doesn't seem to call
any of the two functions (directly or indirectly) and while it's linked
with "-lpthread", it's not really multithreaded.

But my concern was rather about other potential users of libnetlink
(i.e. those which are not part of iproute2). I must admit, though, that
I'm not sure if libnetlink code is reentrant as of now. (And people are
discouraged from using it in its own manual page.)

That being said, I still like Phil's idea for a different reason. While
investigating the issue with "ip link show dev eth ..." which led me to
commit 6599162b958e ("iplink: check for message truncation in
iplink_get()"), I quickly peeked at some other callers of rtnl_talk()
and I'm afraid there may be others which wouldn't handle truncated
message correctly. I assume the maxlen argument was always chosen to be
sufficient for any expected messages but as the example of iplink_get()
shows, messages returned by kernel may grow over time.

That's why I like the idea of __rtnl_talk() returning a pointer to newly
allocated buffer (of sufficient size) rather than copying the response
into a buffer provided by caller and potentially truncating it.

Michal Kubecek
Re: [PATCH v5 10/10] net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs
On Mon, Sep 11, 2017 at 10:19:20PM +0200, Andrew Lunn wrote:
> > Even with CLK_BUS_EPHY/RST_BUS_EPHY enabled, the MAC reset timeout.
> > So no the CLK/RST are really for the PHY.
>
> Thanks for trying that.
>
> You said it was probably during scanning of the bus it times out. What
> address is causing the timeout? 0 or 1? If the internal bus can only
> have one PHY on it, maybe we need to set bus->phy_mask to 0x1?
>

I have added a trace at the beginning and end of stmmac_mdio_read():

[ 18.145451] libphy: stmmac: probed
[ 18.148398] libphy: mdio_mux: probed
[ 18.148650] dwmac-sun8i 1c3.ethernet: Switch mux to internal PHY
[ 18.248751] dwmac-sun8i 1c3.ethernet: EMAC reset timeout
[ 18.249297] libphy: mdio_mux: probed
[ 18.249362] dwmac-sun8i 1c3.ethernet: Switch mux to external PHY
[ 18.249391] stmmac_mdio_read 0 2
[ 18.249598] stmmac_mdio_read 0 2 1c
[ 18.249623] stmmac_mdio_read 0 3
[ 18.249811] stmmac_mdio_read 0 3 c915
[ 20.737271] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[ 31.294868] stmmac_mdio_read 0 0
[ 31.295311] stmmac_mdio_read 0 0 1140

It seems that the timeout is unrelated to the MDIO bus.

Regards
[PATCH v2] geneve: Fix setting ttl value in collect metadata mode
Similar to vxlan/ipip tunnel, if key->ttl is zero in collect metadata
mode, ttl should also fall back to ip{4,6}_dst_hoplimit.

Signed-off-by: Haishuang Yan
---
Changes since v2:
  * Make the commit message clearer.
---
 drivers/net/geneve.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index f640407..d52a65f 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -834,11 +834,10 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
 	if (geneve->collect_md) {
 		tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
-		ttl = key->ttl;
 	} else {
 		tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, ip_hdr(skb), skb);
-		ttl = key->ttl ? : ip4_dst_hoplimit(&rt->dst);
 	}
+	ttl = key->ttl ? : ip4_dst_hoplimit(&rt->dst);
 	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
 
 	err = geneve_build_skb(&rt->dst, skb, info, xnet, sizeof(struct iphdr));
@@ -873,12 +872,11 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
 	if (geneve->collect_md) {
 		prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
-		ttl = key->ttl;
 	} else {
 		prio = ip_tunnel_ecn_encap(ip6_tclass(fl6.flowlabel),
 					   ip_hdr(skb), skb);
-		ttl = key->ttl ? : ip6_dst_hoplimit(dst);
 	}
+	ttl = key->ttl ? : ip6_dst_hoplimit(dst);
 	err = geneve_build_skb(dst, skb, info, xnet, sizeof(struct ipv6hdr));
 	if (unlikely(err))
 		return err;
-- 
1.8.3.1
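The effect of moving the assignment out of the if/else can be shown with a
tiny standalone sketch. pick_ttl() and its parameters are hypothetical
stand-ins: key_ttl plays the role of key->ttl and route_hoplimit the value
that ip{4,6}_dst_hoplimit() would return. The patch uses the GNU "a ? : b"
shorthand; the sketch spells it out in standard C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration of the fallback applied in both modes after
 * the patch: a TTL of zero in the tunnel key means "not set", so the
 * route's hop limit is used instead.
 */
static uint8_t pick_ttl(uint8_t key_ttl, uint8_t route_hoplimit)
{
	return key_ttl ? key_ttl : route_hoplimit;
}
```

Before the patch, collect metadata mode behaved like "return key_ttl;", so
a key with ttl 0 would have been sent with a TTL of zero.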
[patch net] mlxsw: spectrum: Prevent mirred-related crash on removal
From: Yuval Mintz

When removing the offloading of mirred actions under matchall
classifiers, mlxsw would find the destination port associated with the
offloaded action and utilize it for undoing the configuration.

Depending on the order by which ports are removed, it's possible that
the destination port would get removed before the source port. In such
a scenario, when actions would be flushed for the source port mlxsw
would perform an illegal dereference as the destination port is no
longer listed.

Since the only item necessary for undoing the configuration on the
destination side is the port-id and that in turn is already maintained
by mlxsw on the source-port, simply stop trying to access the
destination port and use the port-id directly instead.

Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Yuval Mintz
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index e080459..696b99e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -575,15 +575,14 @@ static void mlxsw_sp_span_entry_destroy(struct mlxsw_sp *mlxsw_sp,
 }
 
 static struct mlxsw_sp_span_entry *
-mlxsw_sp_span_entry_find(struct mlxsw_sp_port *port)
+mlxsw_sp_span_entry_find(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
-	struct mlxsw_sp *mlxsw_sp = port->mlxsw_sp;
 	int i;
 
 	for (i = 0; i < mlxsw_sp->span.entries_count; i++) {
 		struct mlxsw_sp_span_entry *curr = &mlxsw_sp->span.entries[i];
 
-		if (curr->used && curr->local_port == port->local_port)
+		if (curr->used && curr->local_port == local_port)
 			return curr;
 	}
 	return NULL;
@@ -594,7 +593,8 @@ static struct mlxsw_sp_span_entry
 {
 	struct mlxsw_sp_span_entry *span_entry;
 
-	span_entry = mlxsw_sp_span_entry_find(port);
+	span_entry = mlxsw_sp_span_entry_find(port->mlxsw_sp,
+					      port->local_port);
 	if (span_entry) {
 		/* Already exists, just take a reference */
 		span_entry->ref_count++;
@@ -783,12 +783,13 @@ static int mlxsw_sp_span_mirror_add(struct mlxsw_sp_port *from,
 }
 
 static void mlxsw_sp_span_mirror_remove(struct mlxsw_sp_port *from,
-					struct mlxsw_sp_port *to,
+					u8 destination_port,
 					enum mlxsw_sp_span_type type)
 {
 	struct mlxsw_sp_span_entry *span_entry;
 
-	span_entry = mlxsw_sp_span_entry_find(to);
+	span_entry = mlxsw_sp_span_entry_find(from->mlxsw_sp,
+					      destination_port);
 	if (!span_entry) {
 		netdev_err(from->dev, "no span entry found\n");
 		return;
@@ -1563,14 +1564,12 @@ static void
 mlxsw_sp_port_del_cls_matchall_mirror(struct mlxsw_sp_port *mlxsw_sp_port,
 				      struct mlxsw_sp_port_mall_mirror_tc_entry *mirror)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	enum mlxsw_sp_span_type span_type;
-	struct mlxsw_sp_port *to_port;
 
-	to_port = mlxsw_sp->ports[mirror->to_local_port];
 	span_type = mirror->ingress ?
 			MLXSW_SP_SPAN_INGRESS : MLXSW_SP_SPAN_EGRESS;
-	mlxsw_sp_span_mirror_remove(mlxsw_sp_port, to_port, span_type);
+	mlxsw_sp_span_mirror_remove(mlxsw_sp_port, mirror->to_local_port,
+				    span_type);
 }
 
 static int
-- 
2.9.3
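The core of the fix, looking the entry up by port-id instead of going
through a port structure that may already be gone, can be sketched outside
the driver roughly as follows. The struct and function names here are
simplified, hypothetical stand-ins for the mlxsw ones and carry only the
fields needed for the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct mlxsw_sp_span_entry. */
struct span_entry {
	bool used;
	uint8_t local_port;
	int ref_count;
};

/*
 * Find the in-use entry for a destination port by its id alone. No port
 * object is dereferenced, so the lookup stays valid even if the
 * destination port has already been removed.
 */
static struct span_entry *span_entry_find(struct span_entry *entries,
					  size_t count, uint8_t local_port)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (entries[i].used && entries[i].local_port == local_port)
			return &entries[i];
	}
	return NULL;
}
```

The removal path then only needs the source port (for the table) and the
stored to_local_port value, which is what the last hunk of the patch passes
down.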
Re: [RFC PATCH] net: Introduce a socket option to enable picking tx queue based on rx queue.
On 9/11/2017 8:53 PM, Eric Dumazet wrote:
> On Mon, 2017-09-11 at 20:12 -0700, Tom Herbert wrote:
>> Two ints in sock_common for this purpose is quite expensive and the
>> use case for this is limited-- even if a RX->TX queue mapping were
>> introduced to eliminate the queue pair assumption this still won't
>> help if the receive and transmit interfaces are different for the
>> connection. I think we really need to see some very compelling
>> results to be able to justify this.

Will try to collect and post some perf data with symmetric queue
configuration.

> Yes, this is unreasonable cost.
>
> XPS should really cover the case already.

Eric,

Can you clarify how XPS covers the RX->TX queue mapping case?
Is it possible to configure XPS to select the TX queue based on the RX
queue of a flow?
IIUC, it is based on the CPU of the thread doing the transmit OR based on
the skb->priority to TC mapping?
It may be possible to get this effect if the threads are pinned to a
core, but if the app threads are freely moving, I am not sure how XPS can
be configured to select the TX queue based on the RX queue of a flow.

Thanks
Sridhar
Re: [PATCH] tcp: TCP_USER_TIMEOUT can not work in tcp_probe_timer()
Hi,

In this scenario, the IP address on the TCP server side changes while the
userspace application keeps sending data, so the timestamp of
tcp_send_head(sk) is always refreshed.

Here is the packetdrill script:

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 0
+0 > S. 0:0(0) ack 1
+.1 < . 1:1(0) ack 1 win 65530
+0 accept(3, ..., ...) = 4

+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
+0 write(4, ..., 24) = 24
+0 > P. 1:25(24) ack 1 win 229
+.1 < . 1:1(0) ack 25 win 65530

//change the ipaddress
+1 `ifconfig tun0 192.168.0.10/16`

+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24

+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24
+3 write(4, ..., 24) = 24

+0 `ifconfig tun0 192.168.0.1/16`
+0 < . 1:1(0) ack 1 win 1000
+0 write(4, ..., 24) = -1

[root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt
test.pkt:50: runtime error in write call: Expected result -1 but got 24 with errno 2 (No such file or directory)

real	1m11.364s
user	0m0.028s
sys	0m0.106s

[root@localhost ~]# netstat -toen
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address      Foreign Address    State        User  Inode  Timer
tcp        0    504 192.168.0.1:8080   192.0.2.1:33993    ESTABLISHED  0     45453  probe (22.38/0/7)

Since the script didn't wait long enough, we only got 7 probes here.
On 2017/9/11 23:22, Eric Dumazet wrote:
> On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote:
>
>> You can see we got only 3 probes, not 4.
>
> Here is complete packetdrill test showing that code behaves as expected.
>
>     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
>
>    +0 < S 0:0(0) win 0
>    +0 > S. 0:0(0) ack 1
>
> // Client advertises a zero receive window, so we can't send.
>   +.1 < . 1:1(0) ack 1 win 0
>    +0 accept(3, ..., ...) = 4
>
>    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
>    +0 write(4, ..., 2920) = 2920
>
> // Window probes are scheduled just like RTOs.
> +.3~+.31 > . 0:0(0) ack 1
> +.6~+.62 > . 0:0(0) ack 1
> +1.2~+1.24 > . 0:0(0) ack 1
>
> // Peer opens its window too late !
>    +3 < . 1:1(0) ack 1 win 1000
>    +0 > R 1:1(0)
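The "3 probes, not 4" in the quoted test follows from simple arithmetic,
which the sketch below spells out. It is a deliberately simplified model
and not the kernel's tcp_probe_timer() logic: it assumes the first probe
fires after first_interval_ms, each later interval doubles, and a probe is
only sent if the timer fires within the TCP_USER_TIMEOUT value. The
function name and parameters are hypothetical.

```c
#include <assert.h>

/*
 * Simplified model (an assumption, not kernel code): count the window
 * probes that fit before the user timeout when the probe interval
 * doubles after every probe.
 */
static unsigned int count_probes(unsigned int first_interval_ms,
				 unsigned int user_timeout_ms)
{
	unsigned int elapsed = 0;
	unsigned int interval = first_interval_ms;
	unsigned int probes = 0;

	/* The interval > 0 check also stops the loop on a zero interval. */
	while (interval > 0 && elapsed + interval <= user_timeout_ms) {
		elapsed += interval;	/* timer fires, probe is sent */
		probes++;
		interval *= 2;		/* exponential backoff */
	}
	return probes;
}
```

With the values from the test (first probe after 300 ms, user timeout of
3000 ms) the timers fire at 0.3 s, 0.9 s and 2.1 s; the next one would fire
at 4.5 s, past the 3 s limit, so the model gives 3 probes.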