Re: [PATCH 08/22] crypto: chcr: Make use of the new sg_map helper function
On Fri, Apr 14, 2017 at 3:35 AM, Logan Gunthorpe wrote: > The get_page in this area looks *highly* suspect due to there being no > corresponding put_page. However, I've left that as is to avoid breaking > things. chcr driver will post the request to LLD driver cxgb4 and put_page is implemented there. it will no harm. Any how we have removed the below code from driver. http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg24561.html After this merge we can ignore your patch. Thanks > > I've also removed the KMAP_ATOMIC_ARGS check as it appears to be dead > code that dates back to when it was first committed... > > Signed-off-by: Logan Gunthorpe > --- > drivers/crypto/chelsio/chcr_algo.c | 28 +++- > 1 file changed, 15 insertions(+), 13 deletions(-) > > diff --git a/drivers/crypto/chelsio/chcr_algo.c > b/drivers/crypto/chelsio/chcr_algo.c > index 41bc7f4..a993d1d 100644 > --- a/drivers/crypto/chelsio/chcr_algo.c > +++ b/drivers/crypto/chelsio/chcr_algo.c > @@ -1489,22 +1489,21 @@ static struct sk_buff *create_authenc_wr(struct > aead_request *req, > return ERR_PTR(-EINVAL); > } > > -static void aes_gcm_empty_pld_pad(struct scatterlist *sg, > - unsigned short offset) > +static int aes_gcm_empty_pld_pad(struct scatterlist *sg, > +unsigned short offset) > { > - struct page *spage; > unsigned char *addr; > > - spage = sg_page(sg); > - get_page(spage); /* so that it is not freed by NIC */ > -#ifdef KMAP_ATOMIC_ARGS > - addr = kmap_atomic(spage, KM_SOFTIRQ0); > -#else > - addr = kmap_atomic(spage); > -#endif > - memset(addr + sg->offset, 0, offset + 1); > + get_page(sg_page(sg)); /* so that it is not freed by NIC */ > + > + addr = sg_map(sg, SG_KMAP_ATOMIC); > + if (IS_ERR(addr)) > + return PTR_ERR(addr); > + > + memset(addr, 0, offset + 1); > + sg_unmap(sg, addr, SG_KMAP_ATOMIC); > > - kunmap_atomic(addr); > + return 0; > } > > static int set_msg_len(u8 *block, unsigned int msglen, int csize) > @@ -1940,7 +1939,10 @@ static struct sk_buff *create_gcm_wr(struct > aead_request *req, > if (req->cryptlen) { > write_sg_to_skb(skb, &frags, src, req->cryptlen); > } else { > - aes_gcm_empty_pld_pad(req->dst, authsize - 1); > + err = aes_gcm_empty_pld_pad(req->dst, authsize - 1); > + if (err) > + goto dstmap_fail; > + > write_sg_to_skb(skb, &frags, reqctx->dst, crypt_len); > > } > -- > 2.1.4 >
Re: [PATCH net-next 1/8] rtnetlink: Do not generate notifications for MTU events
+Nicolas On 4/14/17 7:51 PM, Vlad Yasevich wrote: > On 04/10/2017 11:49 AM, David Ahern wrote: >> On 4/10/17 9:39 AM, Vlad Yasevich wrote: >>> OK, so this will work for the events that are generated as a result of >>> device state change >>> (like mtu, address, and others). >>> >>> However, the original event data may be needed for other events that may be >>> of use to userspace like NETDEV_NOTIFY_PEERS and NETDEV_RESEND_IGMP >>> (possibly others...) >> >> sure. My objection is to multiple messages with identical content. >> >> I think the rtnetlink_event message is unique for those 2 netdev events, >> so no objections if it has value. >> > > So, I've been looking at adding a bitmap and collecting all modification, > however > I ran into an interesting issue in do_setlink. > > Currently the notifications from do_setlink() don't appear to work as one > would > expect and it's somewhat confusing upon deeper inspection. > > We have 2 values DO_SETLINK_MODIFIED (1) and DO_SETLINK_NOTIFY (3). These 2 > 'attempt' > to do different jobs, but really fail at it. The function will generate > notifications > regardless of which of the above values is used. > > Those changes were done in commit ba9989069f4e426b1e0ed7018eacc9e1ba607095 > (cc Nicolas > just in case he remembers the history) > > I am not sure which changes should really trigger a call > netdev_state_change(), thus this > message. Right now, all changes done in this function trigger them. If > that's how that > should function, that I can simplify the code. If not, then some of the > changes may > require us to export the event to the user. > > To use the dreaded NETDEV_CHANGEMTU event as an example, we used to generate > 3 messages > (PRECHANGEMTU, CHANGEMTU, and a message from netdev_state_change). With > recent changes, > we now only generate a message from netdev_state_change. However, mtu change > is tagged > with DO_SETLINK_MODIFIED which doesn't include the notify bit. So, should > there be a > NETDEV_CHANGE event associated with this change and a rtnl message (as it is > now) or not? > It's unclear. > > -vlad >
Re: [PATCH net-next 1/8] rtnetlink: Do not generate notifications for MTU events
On 04/10/2017 11:49 AM, David Ahern wrote: > On 4/10/17 9:39 AM, Vlad Yasevich wrote: >> OK, so this will work for the events that are generated as a result of >> device state change >> (like mtu, address, and others). >> >> However, the original event data may be needed for other events that may be >> of use to userspace like NETDEV_NOTIFY_PEERS and NETDEV_RESEND_IGMP >> (possibly others...) > > sure. My objection is to multiple messages with identical content. > > I think the rtnetlink_event message is unique for those 2 netdev events, > so no objections if it has value. > So, I've been looking at adding a bitmap and collecting all modification, however I ran into an interesting issue in do_setlink. Currently the notifications from do_setlink() don't appear to work as one would expect and it's somewhat confusing upon deeper inspection. We have 2 values DO_SETLINK_MODIFIED (1) and DO_SETLINK_NOTIFY (3). These 2 'attempt' to do different jobs, but really fail at it. The function will generate notifications regardless of which of the above values is used. Those changes were done in commit ba9989069f4e426b1e0ed7018eacc9e1ba607095 (cc Nicolas just in case he remembers the history) I am not sure which changes should really trigger a call netdev_state_change(), thus this message. Right now, all changes done in this function trigger them. If that's how that should function, that I can simplify the code. If not, then some of the changes may require us to export the event to the user. To use the dreaded NETDEV_CHANGEMTU event as an example, we used to generate 3 messages (PRECHANGEMTU, CHANGEMTU, and a message from netdev_state_change). With recent changes, we now only generate a message from netdev_state_change. However, mtu change is tagged with DO_SETLINK_MODIFIED which doesn't include the notify bit. So, should there be a NETDEV_CHANGE event associated with this change and a rtnl message (as it is now) or not? It's unclear. -vlad
Re: [PATCH v3 net-next RFC] Generic XDP
On Fri, 14 Apr 2017 17:46:44 -0700, Alexei Starovoitov wrote: > On Fri, Apr 14, 2017 at 03:30:32PM -0700, Jakub Kicinski wrote: > > On Fri, 14 Apr 2017 12:28:15 -0700, Alexei Starovoitov wrote: > > > On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote: > > > > > > > > > > We are consistently finding that there is this real need to > > > > > communicate XDP capabilities, or somehow verify that the needs > > > > > of an XDP program can be satisfied by a given implementation. > > > > > > > > I fully agree that we need some way to express capabilities[1] > > > > > > > > > Maximum headroom is just one. > > > > > > I don't like the idea of asking program author to explain capabilities > > > to the kernel. Right now the verifier already understands more about > > > the program than human does. If the verifier cannot deduct from the > > > insns what program will be doing, it's pretty much guarantee > > > that program author has no idea either. > > > If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD, > > > the users will just pass something like 64 or 128 whereas the program > > > might only be doing IPIP encap and that will cause kernel to > > > provide extra headroom for no good reason or reject the program > > > whereas it could have run just fine. > > > > > > So I very much agree with this part: > > > > ... or somehow verify that the needs > > > > of an XDP program can be satisfied by a given implementation. > > > > > > we already have cb_access, dst_needed, xdp_adjust_head flags > > > that verifier discovers in the program. > > > For headroom we need one more. The verifier can see > > > the constant passed into bpf_xdp_adjust_head(). > > > It's trickier if it's a variable, but the verifier can estimate min/max > > > of the variable already and worst case it can say that it > > > will be XDP_PACKET_HEADROOM. > > > If the program is doing variable length bpf_xdp_adjust_head(), > > > the human has no idea how much they need and cannot tell kernel anyway. > > > > If I understand correctly that the current proposal is to: > > > > 1. Assume program needs 256 (XDP_PACKET_HEADROOM) of prepend. > > 2. Make verifier try to lower that through byte code analysis. > > 3. Make the load fail if (prog->max_prepend > driver->xdp_max_prepend). > > no. the hard fail will be wrong, since verifier is conservative. > As i showed in the example with variable ip hdr length. > > > We may expect most users won't care and we may want to still depend on > > the verifier to analyze the program and enable optimizations, but > > depending on the verifier for proving prepend lengths is scary. Or are > > we just discussion the optimizations here and not the guarantees about > > headroom availability? > > yes. I was thinking as an optimization. If verifier can see that > prog needs only 20 bytes of headroom or no headroom (like netronome > already does) than the ring can be configured differently for hopefully > better performance. > But that's probably bad idea, since I was assuming that such ring > reconfiguration can be made fast, which is unlikely. > If it takes seconds to setup a ring, then drivers should just > assume XDP_PACKET_HEADROOM, since at that time the program > properties are unknown and in the future other programs will be loaded. > Take a look at our current setup with slots for xdp_dump, ddos, lb > programs. Only root program is attached at the begining. > If the driver configures the ring for such empty program that would break > dynamic addition of lb prog. > The driver must not interrupt the traffic when user adds another > prog to prog array. In such case XDP side doesn't even know that > prog array is used. It's all happening purely on bpf side. > > > Or worse not do that, use Qlogic, Broadcom, Mellanox or Netronome cards > > and then be in for a nasty surprise when they try to load the same > > program for Intel parts :( > > are you talking hw offloaded prog or normal in-driver xdp? In-driver, mostly. As you said earlier in the thread we expect Alex's/Intel's patches to provide "only" 192B of headroom. For the NFP offload there are trade-offs to full 256B offset. I think we could just fallback to the host, but if user hints that the headroom is big perhaps we would be better off not even trying to offload. > > I agree that trying to express capabilities to user space is quickly > > going to become more confusing than helpful. However, I think here we > > need the opposite - we need an opt-in way of program requesting > > guarantees about the datapath. > > what is an example of what you proposing? How will it look from user pov? This is roughly what I had in mind. From user's perspective it would probably make sense to tie it to the program itself, so: bpf_helpers.h: #define __XDP_ATTR \ __attribute__((section(ELF_SECTION_XDP_ATTRS), used)) #define XDP_GUARANTEE_HEADROOM(val) \ __u32 xdp_gua
Re: [PATCH net-next v2] Add uid and cookie bpf helper to cg_skb_func_proto
On Fri, Apr 14, 2017 at 06:25:26PM -0700, Chenbo Feng wrote: > From: Chenbo Feng > > BPF helper functions get_socket_cookie and get_socket_uid can be > used for network traffic classifications, among others. Expose > them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of > commit 8f917bba0042 ("bpf: pass sk to helper functions") the > required skb->sk function is available at both cgroup bpf ingress > and egress hooks. With these two new helper, cg_skb_func_proto is > effectively the same as sk_filter_func_proto. > > Change since V1: > Instead of add the helper to cg_skb_func_proto, redirect the > cg_skb_func_proto to sk_filter_func_proto since all helper function > in sk_filter_func_proto are applicable to cg_skb_func_proto now. > > Signed-off-by: Chenbo Feng Thanks! Acked-by: Alexei Starovoitov
[PATCH net-next v2] Add uid and cookie bpf helper to cg_skb_func_proto
From: Chenbo Feng BPF helper functions get_socket_cookie and get_socket_uid can be used for network traffic classifications, among others. Expose them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of commit 8f917bba0042 ("bpf: pass sk to helper functions") the required skb->sk function is available at both cgroup bpf ingress and egress hooks. With these two new helper, cg_skb_func_proto is effectively the same as sk_filter_func_proto. Change since V1: Instead of add the helper to cg_skb_func_proto, redirect the cg_skb_func_proto to sk_filter_func_proto since all helper function in sk_filter_func_proto are applicable to cg_skb_func_proto now. Signed-off-by: Chenbo Feng --- net/core/filter.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index ce2a19d..19be954 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2766,12 +2766,7 @@ xdp_func_proto(enum bpf_func_id func_id) static const struct bpf_func_proto * cg_skb_func_proto(enum bpf_func_id func_id) { - switch (func_id) { - case BPF_FUNC_skb_load_bytes: - return &bpf_skb_load_bytes_proto; - default: - return bpf_base_func_proto(func_id); - } + return sk_filter_func_proto(func_id); } static const struct bpf_func_proto * -- 2.7.4
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: > + > + switch (act) { > + case XDP_TX: > + __skb_push(skb, skb->mac_len); s/skb->mac_len/mac_len/ > + HARD_TX_UNLOCK(dev, txq); > + if (free_skb) { > + trace_xdp_exception(dev, xdp_prog, XDP_TX); > + kfree_skb(skb); nice that you didn't forget to add trace_xdp_exception in this path :) Overall looks good to me and other than the minor nit in tx, i think it should work for programs already used with in-driver xdp. I'll test it next week unless people beat me to it.
Re: [PATCH v3 net-next RFC] Generic XDP
On Fri, Apr 14, 2017 at 03:30:32PM -0700, Jakub Kicinski wrote: > On Fri, 14 Apr 2017 12:28:15 -0700, Alexei Starovoitov wrote: > > On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote: > > > > > > > > We are consistently finding that there is this real need to > > > > communicate XDP capabilities, or somehow verify that the needs > > > > of an XDP program can be satisfied by a given implementation. > > > > > > I fully agree that we need some way to express capabilities[1] > > > > > > > Maximum headroom is just one. > > > > I don't like the idea of asking program author to explain capabilities > > to the kernel. Right now the verifier already understands more about > > the program than human does. If the verifier cannot deduct from the > > insns what program will be doing, it's pretty much guarantee > > that program author has no idea either. > > If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD, > > the users will just pass something like 64 or 128 whereas the program > > might only be doing IPIP encap and that will cause kernel to > > provide extra headroom for no good reason or reject the program > > whereas it could have run just fine. > > > > So I very much agree with this part: > > > ... or somehow verify that the needs > > > of an XDP program can be satisfied by a given implementation. > > > > we already have cb_access, dst_needed, xdp_adjust_head flags > > that verifier discovers in the program. > > For headroom we need one more. The verifier can see > > the constant passed into bpf_xdp_adjust_head(). > > It's trickier if it's a variable, but the verifier can estimate min/max > > of the variable already and worst case it can say that it > > will be XDP_PACKET_HEADROOM. > > If the program is doing variable length bpf_xdp_adjust_head(), > > the human has no idea how much they need and cannot tell kernel anyway. > > If I understand correctly that the current proposal is to: > > 1. Assume program needs 256 (XDP_PACKET_HEADROOM) of prepend. > 2. Make verifier try to lower that through byte code analysis. > 3. Make the load fail if (prog->max_prepend > driver->xdp_max_prepend). no. the hard fail will be wrong, since verifier is conservative. As i showed in the example with variable ip hdr length. > We may expect most users won't care and we may want to still depend on > the verifier to analyze the program and enable optimizations, but > depending on the verifier for proving prepend lengths is scary. Or are > we just discussion the optimizations here and not the guarantees about > headroom availability? yes. I was thinking as an optimization. If verifier can see that prog needs only 20 bytes of headroom or no headroom (like netronome already does) than the ring can be configured differently for hopefully better performance. But that's probably bad idea, since I was assuming that such ring reconfiguration can be made fast, which is unlikely. If it takes seconds to setup a ring, then drivers should just assume XDP_PACKET_HEADROOM, since at that time the program properties are unknown and in the future other programs will be loaded. Take a look at our current setup with slots for xdp_dump, ddos, lb programs. Only root program is attached at the begining. If the driver configures the ring for such empty program that would break dynamic addition of lb prog. The driver must not interrupt the traffic when user adds another prog to prog array. In such case XDP side doesn't even know that prog array is used. It's all happening purely on bpf side. > Or worse not do that, use Qlogic, Broadcom, Mellanox or Netronome cards > and then be in for a nasty surprise when they try to load the same > program for Intel parts :( are you talking hw offloaded prog or normal in-driver xdp? > I agree that trying to express capabilities to user space is quickly > going to become more confusing than helpful. However, I think here we > need the opposite - we need an opt-in way of program requesting > guarantees about the datapath. what is an example of what you proposing? How will it look from user pov?
Re: ipv6 udp early demux breaks udp_l3mdev_accept=0
On 4/14/17 6:38 PM, Subash Abhinov Kasiviswanathan wrote: > On 2017-04-14 16:23, David Ahern wrote: >> Subash: >> >> My understanding of early demux is that it should only match connected >> sockets (src/sport + dst/dport are all set). The ipv6 udp early demux >> code added in 5425077d73e0c is doing a generic socket lookup which does >> not require an exact match and accordingly breaks udp_l3mdev_accept=0. >> Please fix. >> >> David > > Hi David > > Would it be possible to disable udp_early_demux for your case temporarily. > I'll look into this. > It should be fixed for 4.12. Basically, __udp6_lib_demux_lookup needs to be more like __udp4_lib_demux_lookup
Re: ipv6 udp early demux breaks udp_l3mdev_accept=0
On 2017-04-14 16:23, David Ahern wrote: Subash: My understanding of early demux is that it should only match connected sockets (src/sport + dst/dport are all set). The ipv6 udp early demux code added in 5425077d73e0c is doing a generic socket lookup which does not require an exact match and accordingly breaks udp_l3mdev_accept=0. Please fix. David Hi David Would it be possible to disable udp_early_demux for your case temporarily. I'll look into this. -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH net-next] Add uid and cookie bpf helper to cg_skb_func_proto
On 04/15/2017 02:07 AM, Alexei Starovoitov wrote: On Fri, Apr 14, 2017 at 04:12:14PM -0700, Chenbo Feng wrote: From: Chenbo Feng BPF helper functions get_socket_cookie and get_socket_uid can be used for network traffic classifications, among others. Expose them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of commit 8f917bba0042 ("bpf: pass sk to helper functions") the required skb->sk function is available at both cgroup bpf ingress and egress hooks. Signed-off-by: Chenbo Feng Thanks for follow up. Another alternative is to do cg_skb_func_proto(enum bpf_func_id func_id) { return sk_filter_func_proto(func_id); } I think all socket filter helpers are applicable to cg_skb too. Yeah, both will effectively be the same at that point: static const struct bpf_func_proto * sk_filter_func_proto(enum bpf_func_id func_id) { switch (func_id) { case BPF_FUNC_skb_load_bytes: return &bpf_skb_load_bytes_proto; case BPF_FUNC_get_socket_cookie: return &bpf_get_socket_cookie_proto; case BPF_FUNC_get_socket_uid: return &bpf_get_socket_uid_proto; default: return bpf_base_func_proto(func_id); } } And with the two additions: static const struct bpf_func_proto * cg_skb_func_proto(enum bpf_func_id func_id) { switch (func_id) { case BPF_FUNC_skb_load_bytes: return &bpf_skb_load_bytes_proto; + case BPF_FUNC_get_socket_cookie: + return &bpf_get_socket_cookie_proto; + case BPF_FUNC_get_socket_uid: + return &bpf_get_socket_uid_proto; default: return bpf_base_func_proto(func_id); } }
Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface
Mahesh Bandewar (महेश बंडेवार) wrote: >On Fri, Apr 14, 2017 at 12:30 PM, Jay Vosburgh > wrote: >> >> >> Chonggang Li wrote: >> >> >Previously link-local packets excluding LACP (which are handled by >> >the recv_probe) received on bond slave interfaces are delivered to >> >stack with bond-master device with RX_HANDLER_ANOTHER, however all >> >link-local packets are link specific and should be delivered with >> >exact same link/dev. >> >> In what case is the current behavior a problem (my guess would >> be something related to LLDP)? What if, e.g., the bond is a bridge >> port, will STP frames no longer propagate to the bridge? >> >When a link-local frame made appear to arrive on bond-master, it >looses value since 'the info' about link on which it arrived is lost. This information should really be in the commit message, along with something about the LLDP issue being solved. >So idea behind this is to pass the frame to the stack with the link on >which it arrived without involving the bonding-master. The same will >happen for the STP frames too. Do you see problem with this approach? My look through the STP code suggests not, but I'm far from an expert there. I'm just concerned that this change will cause some obscure regression in something that depends on the current behavior. >> Also, I think the description would be better if it mentioned >> specifically that the patch is changing how skb->dev is set for link >> local frames (bond->dev vs receiving interface), e.g., >> >> "[...] however all link-local packets are link specific and >> should be delivered with skb->dev set to the original device." >> >yes, makes sense. > >> >Signed-off-by: Chonggang Li >> >Signed-off-by: Mahesh Bandewar >> >Signed-off-by: Maciej Żenczykowski >> >--- >> > drivers/net/bonding/bond_main.c | 3 +++ >> > 1 file changed, 3 insertions(+) >> > >> >diff --git a/drivers/net/bonding/bond_main.c >> >b/drivers/net/bonding/bond_main.c >> >index 01e4a69af421..aeca3d8541b9 100644 >> >--- a/drivers/net/bonding/bond_main.c >> >+++ b/drivers/net/bonding/bond_main.c >> >@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct >> >sk_buff **pskb) >> > } >> > } >> > >> >+ /* link-local packets should not be passed to master interface */ >> >+ if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) >> >+ return RX_HANDLER_PASS; >> >> Since this returns _PASS and not _EXACT, the packet will go >> through the ptype_base packet handlers, so is the comment strictly >> correct? >> >The stack does not have all link-local specific packet-type handlers >registered by default so returning _EXACT would find nothing and >packet will be dropped, right? So returning _PASS will deliver packet >to the stack with skb->dev as the link on which packet arrived and >stack can take whatever (default) action it has to take. Fair enough; I do think the comment would be better phrased as something like "don't change skb->dev of link local frames"; on first reading, I thought it meant the frames would be dropped. -J >> > if (bond_should_deliver_exact_match(skb, slave, bond)) >> > return RX_HANDLER_EXACT; >> > >> >-- >> >2.12.2.762.g0e3151a226-goog >> > >> --- -Jay Vosburgh, jay.vosbu...@canonical.com
Re: [PATCH net-next] Add uid and cookie bpf helper to cg_skb_func_proto
On Fri, Apr 14, 2017 at 04:12:14PM -0700, Chenbo Feng wrote: > From: Chenbo Feng > > BPF helper functions get_socket_cookie and get_socket_uid can be > used for network traffic classifications, among others. Expose > them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of > commit 8f917bba0042 ("bpf: pass sk to helper functions") the required > skb->sk function is available at both cgroup bpf ingress and egress > hooks. > > Signed-off-by: Chenbo Feng Thanks for follow up. Another alternative is to do cg_skb_func_proto(enum bpf_func_id func_id) { return sk_filter_func_proto(func_id); } I think all socket filter helpers are applicable to cg_skb too.
[PATCH net-next] drivers: net: xgene-v2: Extend ethtool statistics
This patch adds extended statistics reporting to ethtool. In summary, this patch, - adds ethtool.h with the statistics register definitions - adds 'struct xge_gstrings_extd_stats' to gather extended stats - modifies xge_get_strings(), get_sset_count() and get_ethtool_stats() accordingly - moves 'struct xge_gstrings_stats' to ethtool.h Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene-v2/ethtool.c | 82 ++--- drivers/net/ethernet/apm/xgene-v2/ethtool.h | 82 + drivers/net/ethernet/apm/xgene-v2/mac.h | 3 -- drivers/net/ethernet/apm/xgene-v2/main.h| 2 +- 4 files changed, 159 insertions(+), 10 deletions(-) create mode 100644 drivers/net/ethernet/apm/xgene-v2/ethtool.h diff --git a/drivers/net/ethernet/apm/xgene-v2/ethtool.c b/drivers/net/ethernet/apm/xgene-v2/ethtool.c index 0c426f5..421b1dd 100644 --- a/drivers/net/ethernet/apm/xgene-v2/ethtool.c +++ b/drivers/net/ethernet/apm/xgene-v2/ethtool.c @@ -21,12 +21,13 @@ #include "main.h" -struct xge_gstrings_stats { - char name[ETH_GSTRING_LEN]; - int offset; -}; - #define XGE_STAT(m){ #m, offsetof(struct xge_pdata, stats.m) } +#define XGE_EXTD_STAT(m, n)\ + { \ + #m, \ + n, \ + 0 \ + } static const struct xge_gstrings_stats gstrings_stats[] = { XGE_STAT(rx_packets), @@ -36,7 +37,66 @@ struct xge_gstrings_stats { XGE_STAT(rx_errors) }; +static struct xge_gstrings_extd_stats gstrings_extd_stats[] = { + XGE_EXTD_STAT(tx_rx_64b_frame_cntr, TR64), + XGE_EXTD_STAT(tx_rx_127b_frame_cntr, TR127), + XGE_EXTD_STAT(tx_rx_255b_frame_cntr, TR255), + XGE_EXTD_STAT(tx_rx_511b_frame_cntr, TR511), + XGE_EXTD_STAT(tx_rx_1023b_frame_cntr, TR1K), + XGE_EXTD_STAT(tx_rx_1518b_frame_cntr, TRMAX), + XGE_EXTD_STAT(tx_rx_1522b_frame_cntr, TRMGV), + XGE_EXTD_STAT(rx_byte_cntr, RBYT), + XGE_EXTD_STAT(rx_pkt_cntr, RPKT), + XGE_EXTD_STAT(rx_fcs_error_cntr, RFCS), + XGE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA), + XGE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA), + XGE_EXTD_STAT(rx_ctrl_frame_pkt_cntr, RXCF), + XGE_EXTD_STAT(rx_pause_frame_pkt_cntr, RXPF), + XGE_EXTD_STAT(rx_unk_opcode_cntr, RXUO), + XGE_EXTD_STAT(rx_align_err_cntr, RALN), + XGE_EXTD_STAT(rx_frame_len_err_cntr, RFLR), + XGE_EXTD_STAT(rx_code_err_cntr, RCDE), + XGE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE), + XGE_EXTD_STAT(rx_undersize_pkt_cntr, RUND), + XGE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR), + XGE_EXTD_STAT(rx_fragments_cntr, RFRG), + XGE_EXTD_STAT(rx_jabber_cntr, RJBR), + XGE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP), + XGE_EXTD_STAT(tx_byte_cntr, TBYT), + XGE_EXTD_STAT(tx_pkt_cntr, TPKT), + XGE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA), + XGE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA), + XGE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF), + XGE_EXTD_STAT(tx_defer_pkt_cntr, TDFR), + XGE_EXTD_STAT(tx_excv_defer_pkt_cntr, TEDF), + XGE_EXTD_STAT(tx_single_col_pkt_cntr, TSCL), + XGE_EXTD_STAT(tx_multi_col_pkt_cntr, TMCL), + XGE_EXTD_STAT(tx_late_col_pkt_cntr, TLCL), + XGE_EXTD_STAT(tx_excv_col_pkt_cntr, TXCL), + XGE_EXTD_STAT(tx_total_col_cntr, TNCL), + XGE_EXTD_STAT(tx_pause_frames_hnrd_cntr, TPFH), + XGE_EXTD_STAT(tx_drop_frame_cntr, TDRP), + XGE_EXTD_STAT(tx_jabber_frame_cntr, TJBR), + XGE_EXTD_STAT(tx_fcs_error_cntr, TFCS), + XGE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF), + XGE_EXTD_STAT(tx_oversize_frame_cntr, TOVR), + XGE_EXTD_STAT(tx_undersize_frame_cntr, TUND), + XGE_EXTD_STAT(tx_fragments_cntr, TFRG) +}; + #define XGE_STATS_LEN ARRAY_SIZE(gstrings_stats) +#define XGE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) + +static void xge_mac_get_extd_stats(struct xge_pdata *pdata) +{ + u32 data; + int i; + + for (i = 0; i < XGE_EXTD_STATS_LEN; i++) { + data = xge_rd_csr(pdata, gstrings_extd_stats[i].addr); + gstrings_extd_stats[i].value += data; + } +} static void xge_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -62,6 +122,11 @@ static void xge_get_strings(struct net_device *ndev, u32 stringset, u8 *data) memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN); p += ETH_GSTRING_LEN; } + + for (i = 0; i < XGE_EXTD_STATS_LEN; i++) { + memcpy(p, gstrings_extd_stats[i].name, ETH_GSTRING_LEN); + p += ETH_GSTRING_LEN; + } } static int xge_get_sset_count(struct net_device
[PATCH net-next] Add uid and cookie bpf helper to cg_skb_func_proto
From: Chenbo Feng BPF helper functions get_socket_cookie and get_socket_uid can be used for network traffic classifications, among others. Expose them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of commit 8f917bba0042 ("bpf: pass sk to helper functions") the required skb->sk function is available at both cgroup bpf ingress and egress hooks. Signed-off-by: Chenbo Feng --- net/core/filter.c | 4 1 file changed, 4 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index ce2a19d..b6db9e330 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2769,6 +2769,10 @@ cg_skb_func_proto(enum bpf_func_id func_id) switch (func_id) { case BPF_FUNC_skb_load_bytes: return &bpf_skb_load_bytes_proto; + case BPF_FUNC_get_socket_cookie: + return &bpf_get_socket_cookie_proto; + case BPF_FUNC_get_socket_uid: + return &bpf_get_socket_uid_proto; default: return bpf_base_func_proto(func_id); } -- 2.7.4
[PATCH iproute2 v2] ip vrf: Add command name next to pid
'ip vrf pids' is used to list processes bound to a vrf, but it only shows the pid leaving a lot of work for the user. Add the command name to the output. With this patch you get the more user friendly: $ ip vrf pids mgmt 1121 ntpd 1418 gdm-session-wor 1488 gnome-session 1491 dbus-launch 1492 dbus-daemon 1565 sshd ... Signed-off-by: David Ahern --- v2 - changed get_comm to get_command_name with size_t for len type as requested by Stephen include/utils.h | 1 + ip/ipvrf.c | 24 ++-- lib/fs.c| 42 ++ 3 files changed, 57 insertions(+), 10 deletions(-) diff --git a/include/utils.h b/include/utils.h index 22369e0b4e03..8c12e1e2a60c 100644 --- a/include/utils.h +++ b/include/utils.h @@ -260,5 +260,6 @@ int get_real_family(int rtm_type, int rtm_family); int cmd_exec(const char *cmd, char **argv, bool do_fork); int make_path(const char *path, mode_t mode); char *find_cgroup2_mount(void); +int get_command_name(const char *pid, char *comm, size_t len); #endif /* __UTILS_H__ */ diff --git a/ip/ipvrf.c b/ip/ipvrf.c index 5e204a9ebbb1..0f611b44b78a 100644 --- a/ip/ipvrf.c +++ b/ip/ipvrf.c @@ -111,27 +111,31 @@ static void read_cgroup_pids(const char *base_path, char *name) { char path[PATH_MAX]; char buf[4096]; - ssize_t n; - int fd; + FILE *fp; if (snprintf(path, sizeof(path), "%s/vrf/%s%s", base_path, name, CGRP_PROC_FILE) >= sizeof(path)) return; - fd = open(path, O_RDONLY); - if (fd < 0) + fp = fopen(path, "r"); + if (!fp) return; /* no cgroup file, nothing to show */ /* dump contents (pids) of cgroup.procs */ - while (1) { - n = read(fd, buf, sizeof(buf) - 1); - if (n <= 0) - break; + while (fgets(buf, sizeof(buf), fp)) { + char *nl, comm[32]; - printf("%s", buf); + nl = strchr(buf, '\n'); + if (nl) + *nl = '\0'; + + if (get_command_name(buf, comm, sizeof(comm))) + strcpy(comm, ""); + + printf("%5s %s\n", buf, comm); } - close(fd); + fclose(fp); } /* recurse path looking for PATH[/NETNS]/vrf/NAME */ diff --git a/lib/fs.c b/lib/fs.c index 12a4657a0bc9..c59ac564581d 100644 --- a/lib/fs.c +++ b/lib/fs.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -149,3 +150,44 @@ int make_path(const char *path, mode_t mode) return rc; } + +int get_command_name(const char *pid, char *comm, size_t len) +{ + char path[PATH_MAX]; + char line[128]; + FILE *fp; + + if (snprintf(path, sizeof(path), +"/proc/%s/status", pid) >= sizeof(path)) { + return -1; + } + + fp = fopen(path, "r"); + if (!fp) + return -1; + + comm[0] = '\0'; + while (fgets(line, sizeof(line), fp)) { + char *nl, *name; + + name = strstr(line, "Name:"); + if (!name) + continue; + + name += 5; + while (isspace(*name)) + name++; + + nl = strchr(name, '\n'); + if (nl) + *nl = '\0'; + + strncpy(comm, name, len - 1); + comm[len - 1] = '\0'; + break; + } + + fclose(fp); + + return 0; +} -- 2.1.4
Re: [PATCH iproute2] ip vrf: Add command name next to pid
On 4/14/17 4:57 PM, Stephen Hemminger wrote: > Overall, this looks like a good idea. > >> + >> +int get_comm(char *pid, char *comm, int len) > > > Please use more glibc style function signature, like: > > int get_comm(const char *pid, char *comm, size_t len) > > And maybe even > int get_command_name(pid_t pid, char *comm, size_t len) > I'd pref to keep as char *. No sense converting from string to pid_t to string. The vrf code is reading a cgroups file; the other potential user is netns which is also reading pid from a file.
Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation
+https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/\ +dataservices/tree/rmnetctl Don't split URL better to have long line. diff --git a/drivers/net/rmnet/Kconfig b/drivers/net/rmnet/Kconfig Since this is Qualcomm and Ethernet specific, maybe better to put in drivers/net/ethernet/qualcom/rmnet +config RMNET_DEBUG Please use network device standard debug mechanism. netif_msg_XXX + buffloc += snprintf(&buffer[buffloc], sizeof(buffer) - buffloc, + " %02x", skb->data[i]); If you really have to do this. Use hex_dump_bytes API. + skb->mac_header = 0; + skb->mac_len = 0; +} Why not use sbk_set_mac_header(skb, 0)? +static inline struct rmnet_phys_ep_conf_s *_rmnet_get_phys_ep_config + (struct net_device *dev) awkward line break. dev could be const + config = (struct rmnet_phys_ep_conf_s *) + rcu_dereference(dev->rx_handler_data); Please don't directly reference rx_handler. There is already functions like netdev_is_rx_handler_busy() to abstract that API. + * @epconfig:endpoint configuration structure to set + */ You are using docbook format here, but this is not a docbook comment. ie. /** * function - This is a docbook comment * @dev: this is a param */ Plus these are static functions so there is no point in documentating internal API with docbook. + ASSERT_RTNL(); + + if (!dev || config_id < RMNET_LOCAL_LOGICAL_ENDPOINT || + config_id >= RMNET_MAX_LOGICAL_EP) + return -EINVAL; For internal API's you should validate parmeters at the external entry point not in each call. Otherwise you have a multitude of impossible error checks and dead code paths. + .next = 0, + .priority = 0 +}; Don't initialize fields that are not used or should be zero. Hi Stephen I'll make these changes. Could just be: static inline int _rmnet_is_physical_endpoint_associated(const struct net_device *dev) { rx_handler_func_t *rx_handler = rcu_dereference(dev->rx_handler); return rx_handler == rmet_rx_handler; } But standard practice is to use ndo_ops to identify self in network drivers. I.e return dev->netdev_ops == &rmnet_device_ops; The netdevice which is associated is the physical (real_dev). This device can vary based on hardware and will have its own netdev_ops associated with it. rmnet_device_ops applies to the rmnet devices only. I'll use the first option you have suggested. + if (!data[IFLA_RMNET_MUX_ID]) + return -EINVAL; So you are inventing private link netlink attributes. Why? Why can't you use device switch, bridge, or other master/slave model. I would like to eventually add more configuration options for the ingress & egress data formats as well as the endpoint configuration (VND mode vs BRIDGE mode). I was able to re-use IFLA_VLAN_ID for IFLA_RMNET_MUX_ID and test but it wasn't sufficient for the additional parameters. I'll check the bridge attributes and try to reuse it. -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [iproute2 net-next v2 0/3] ip netconf improvements
On Thu, 23 Mar 2017 19:51:19 -0700 David Ahern wrote: > Currently, ip netconf only shows data for ipv4 and ipv6 for dumps > and just ipv4 for device requests. Improve the user experience by > using the new kernel patch to dump all address families that have > registered. For example, if mpls_router module is loaded then mpls > values are displayed along with ipv4 and ipv6. > > If the new feature is not supported (new iproute2 on older kernel) > the kernel returns the nlmsg error EOPNOTSUPP which can be trapped > and fallback to existing behavior. > > v2 > - fixed index conversion in patch 3 per nicholas' comment > > David Ahern (3): > netlink: Add flag to suppress print of nlmsg error > ip netconf: Show all address families by default in dumps > ip netconf: show all families on dev request > > include/libnetlink.h | 1 + > ip/ipnetconf.c | 36 +--- > lib/libnetlink.c | 3 ++- > 3 files changed, 28 insertions(+), 12 deletions(-) > Sure applied, just wanted to make sure there was nothing interrelated with recent spate of changes to netlink notifications.
Re: [patch iproute2/net-next repost] devlink: Add support for pipeline debug (dpipe)
On Thu, 13 Apr 2017 11:30:27 +0200 Jiri Pirko wrote: > We actually took this code from teamdctl (at least it was an influence). > Devlink style is so much different in every aspect from the rest of the > iproute2 suite. And I did it on purpose, because it is much nicer and > easier to read. I would like to continue on this and don't do things in > the was the existing tools do. I don't see any problem with that. No. That is road to ruin. Every package is free to use what ever style it wants. But don't crossover please.
Re: [PATCH iproute2] ip vrf: Add command name next to pid
Overall, this looks like a good idea. > + > +int get_comm(char *pid, char *comm, int len) Please use more glibc style function signature, like: int get_comm(const char *pid, char *comm, size_t len) And maybe even int get_command_name(pid_t pid, char *comm, size_t len)
[PATCH] net: natsemi: ns83820: add checks for dma mapping error
The driver does not check if mapping dma memory succeed. The patch adds the checks and failure handling. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov --- drivers/net/ethernet/natsemi/ns83820.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/natsemi/ns83820.c b/drivers/net/ethernet/natsemi/ns83820.c index 729095db3e08..7d6d692ebb92 100644 --- a/drivers/net/ethernet/natsemi/ns83820.c +++ b/drivers/net/ethernet/natsemi/ns83820.c @@ -534,14 +534,19 @@ static inline int ns83820_add_rx_skb(struct ns83820 *dev, struct sk_buff *skb) ); #endif + buf = pci_map_single(dev->pci_dev, skb->data, +REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE); + if (pci_dma_mapping_error(dev->pci_dev, buf)) { + kfree_skb(skb); + return 1; + } + sg = dev->rx_info.descs + (next_empty * DESC_SIZE); BUG_ON(NULL != dev->rx_info.skbs[next_empty]); dev->rx_info.skbs[next_empty] = skb; dev->rx_info.next_empty = (next_empty + 1) % NR_RX_DESC; cmdsts = REAL_RX_BUF_SIZE | CMDSTS_INTR; - buf = pci_map_single(dev->pci_dev, skb->data, -REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE); build_rx_desc(dev, sg, 0, buf, cmdsts, 0); /* update link of previous rx */ if (likely(next_empty != dev->rx_info.next_rx)) @@ -1136,6 +1141,10 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff *skb, if (nr_frags) len -= skb->data_len; buf = pci_map_single(dev->pci_dev, skb->data, len, PCI_DMA_TODEVICE); + if (pci_dma_mapping_error(dev->pci_dev, buf)) { + dev_kfree_skb_any(skb); + return NETDEV_TX_OK; + } first_desc = dev->tx_descs + (free_idx * DESC_SIZE); -- 2.7.4
Re: [PATCH next 2/5] bonding: initialize work-queues during creation of bond
On 8 March 2017 at 10:55, Mahesh Bandewar wrote: > From: Mahesh Bandewar > > Initializing work-queues every time ifup operation performed is unnecessary > and can be performed only once when the port is created. > > Signed-off-by: Mahesh Bandewar > --- > drivers/net/bonding/bond_main.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 619f0c65f18a..1329110ed85f 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -3270,8 +3270,6 @@ static int bond_open(struct net_device *bond_dev) > } > } > > - bond_work_init_all(bond); > - > if (bond_is_lb(bond)) { > /* bond_alb_initialize must be called before the timer > * is started. > @@ -4691,6 +4689,8 @@ int bond_create(struct net *net, const char *name) > > netif_carrier_off(bond_dev); > > + bond_work_init_all(bond); > + > rtnl_unlock(); > if (res < 0) > bond_destructor(bond_dev); > -- Hi Mahesh, I've noticed that this patch breaks bonding within namespaces if you're not careful to perform device cleanup correctly. Here's my repro script, you can run on any net-next with this patch and you'll start seeing some weird behaviour: ip netns add foo ip li add veth0 type veth peer name veth0+ netns foo ip li add veth1 type veth peer name veth1+ netns foo ip netns exec foo ip li add bond0 type bond ip netns exec foo ip li set dev veth0+ master bond0 ip netns exec foo ip li set dev veth1+ master bond0 ip netns exec foo ip addr add dev bond0 192.168.0.1/24 ip netns exec foo ip li set dev bond0 up ip li del dev veth0 ip li del dev veth1 The second to last command segfaults, last command hangs. rtnl is now permanently locked. It's not a problem if you take bond0 down before deleting veths, or delete bond0 before deleting veths. If you delete either end of the veth pair as per above, either inside or outside the namespace, it hits this problem. Here's some kernel logs: [ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link [ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link [ 1281.193863] bond0: Releasing backup interface veth0+ [ 1281.193866] bond0: the permanent HWaddr of veth0+ - 16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of veth0+ to a different address to avoid conflicts [ 1281.193867] [ cut here ] [ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511 __queue_delayed_work+0x13f/0x150 [ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid hid mptspi mptscsih e1000 mptbase ahci libahci [ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: GW 4.10.0-bisect-bond-v0.14 #37 [ 1281.193906] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 [ 1281.193906] Call Trace: [ 1281.193912] dump_stack+0x63/0x89 [ 1281.193915] __warn+0xd1/0xf0 [ 1281.193917] warn_slowpath_null+0x1d/0x20 [ 1281.193918] __queue_delayed_work+0x13f/0x150 [ 1281.193920] queue_delayed_work_on+0x27/0x40 [ 1281.193929] bond_change_active_slave+0x25b/0x670 [bonding] [ 1281.193932] ? synchronize_rcu_expedited+0x27/0x30 [ 1281.193935] __bond_release_one+0x489/0x510 [bonding] [ 1281.193939] ? addrconf_notify+0x1b7/0xab0 [ 1281.193942] bond_netdev_event+0x2c5/0x2e0 [bonding] [ 1281.193944] ? netconsole_netdev_event+0x124/0x190 [netconsole] [ 1281.193947] notifier_call_chain+0x49/0x70 [ 1281.193948] raw_notifier_call_chain+0x16/0x20 [ 1281.193950] call_netdevice_notifiers_info+0x35/0x60 [ 1281.193951] rollback_registered_many+0x23b/0x3e0 [ 1281.193953] unregister_netdevice_many+0x24/0xd0 [ 1281.193955] rtnl_delete_link+0x3c/0x50 [ 1281.193956] rtnl_dellink+0x8d/0x1b0 [ 1281.193960] rtnetlink_rcv_msg+0x95/0x220 [ 1281.193962] ? __kmalloc_node_track_caller+0x35/0x280 [ 1281.193964] ? __netlink_lookup+0xf1/0x110 [ 1281.193966] ? rtnl_newlink+0x830/0x830 [ 1281.193967] netlink_rcv_skb+0xa7/0xc0 [ 1281.193969] rtnetlink_rcv+0x28/0x30 [ 1281.193970] netlink_unicast+0x15b/0x210 [ 1281.193971] netlink_sendmsg+0x319/0x390 [ 1281.193974] sock_sendmsg+0x38/0x50 [ 1281.193975] ___sys_sendmsg+0x25c/0x270 [ 1281.193978] ? mem_cgroup_commit_charge+0x76/0xf0 [ 1281.193981] ? page_add_new_anon_rmap+0x89/0xc0 [ 1281.193984] ? lru_cache_add_active_or_unevictable+0x35/0xb0 [ 1281.193985] ? __handle_mm_fault+0x4e9/0x1170 [ 1281.193987] __sys_sendmsg+0x45/0x80 [ 1281.193989] S
Re: [PATCH iproute2 net-next 2/3] iproute: add support for SR-IPv6 lwtunnel encapsulation
On Fri, 14 Apr 2017 14:36:22 +0200 David Lebrun wrote: > + if (!tb[SEG6_IPTUNNEL_SRH]) > + return Bad indentation?
Re: [PATCH iproute2 net-next 1/3] ip: add ip sr command to control SR-IPv6 internal structures
On Fri, 14 Apr 2017 14:36:21 +0200 David Lebrun wrote: > +static struct { > + int cmd; Why not unsigned? you only assign positive values > + struct in6_addr addr; > + __u32 keyid; > + char *pass; Why not const char *? or do you free the value on exit? > + __u8 alg_id; > +} opts; > +
Re: [PATCH v3 net-next RFC] Generic XDP
On Fri, 14 Apr 2017 12:28:15 -0700, Alexei Starovoitov wrote: > On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote: > > > > > > We are consistently finding that there is this real need to > > > communicate XDP capabilities, or somehow verify that the needs > > > of an XDP program can be satisfied by a given implementation. > > > > I fully agree that we need some way to express capabilities[1] > > > > > Maximum headroom is just one. > > I don't like the idea of asking program author to explain capabilities > to the kernel. Right now the verifier already understands more about > the program than human does. If the verifier cannot deduct from the > insns what program will be doing, it's pretty much guarantee > that program author has no idea either. > If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD, > the users will just pass something like 64 or 128 whereas the program > might only be doing IPIP encap and that will cause kernel to > provide extra headroom for no good reason or reject the program > whereas it could have run just fine. > > So I very much agree with this part: > > ... or somehow verify that the needs > > of an XDP program can be satisfied by a given implementation. > > we already have cb_access, dst_needed, xdp_adjust_head flags > that verifier discovers in the program. > For headroom we need one more. The verifier can see > the constant passed into bpf_xdp_adjust_head(). > It's trickier if it's a variable, but the verifier can estimate min/max > of the variable already and worst case it can say that it > will be XDP_PACKET_HEADROOM. > If the program is doing variable length bpf_xdp_adjust_head(), > the human has no idea how much they need and cannot tell kernel anyway. If I understand correctly that the current proposal is to: 1. Assume program needs 256 (XDP_PACKET_HEADROOM) of prepend. 2. Make verifier try to lower that through byte code analysis. 3. Make the load fail if (prog->max_prepend > driver->xdp_max_prepend). Would that not cause unnecessary load failures? I'm worried it would force users to do things like this: struct map_value *val; if (val->prep_len > 64) /* Prove to verifier this is short */ return XDP_DROP; if (xdp_adjust_head(xdp, -val->prep_len)) ... Or worse not do that, use Qlogic, Broadcom, Mellanox or Netronome cards and then be in for a nasty surprise when they try to load the same program for Intel parts :( I agree that trying to express capabilities to user space is quickly going to become more confusing than helpful. However, I think here we need the opposite - we need an opt-in way of program requesting guarantees about the datapath. We may expect most users won't care and we may want to still depend on the verifier to analyze the program and enable optimizations, but depending on the verifier for proving prepend lengths is scary. Or are we just discussion the optimizations here and not the guarantees about headroom availability?
ipv6 udp early demux breaks udp_l3mdev_accept=0
Subash: My understanding of early demux is that it should only match connected sockets (src/sport + dst/dport are all set). The ipv6 udp early demux code added in 5425077d73e0c is doing a generic socket lookup which does not require an exact match and accordingly breaks udp_l3mdev_accept=0. Please fix. David
Re: [PATCH v3 net-next RFC] Generic XDP
On 04/14/2017 09:28 PM, Alexei Starovoitov wrote: On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote: We are consistently finding that there is this real need to communicate XDP capabilities, or somehow verify that the needs of an XDP program can be satisfied by a given implementation. I fully agree that we need some way to express capabilities[1] Maximum headroom is just one. I don't like the idea of asking program author to explain capabilities to the kernel. Right now the verifier already understands more about the program than human does. If the verifier cannot deduct from the insns what program will be doing, it's pretty much guarantee that program author has no idea either. If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD, the users will just pass something like 64 or 128 whereas the program might only be doing IPIP encap and that will cause kernel to provide extra headroom for no good reason or reject the program whereas it could have run just fine. Fully agree, such an extension is likely to be used wrongly or with some default size as we have right now with XDP_PACKET_HEADROOM to cover most use cases in order to not bother the user to deal with this resp. not to complicate things more. So I very much agree with this part: ... or somehow verify that the needs of an XDP program can be satisfied by a given implementation. we already have cb_access, dst_needed, xdp_adjust_head flags that verifier discovers in the program. For headroom we need one more. The verifier can see the constant passed into bpf_xdp_adjust_head(). It's trickier if it's a variable, but the verifier can estimate min/max of the variable already and worst case it can say that it will be XDP_PACKET_HEADROOM. +1, we should try hard to reuse such information from the verifier to determine specific requirements the program has, and check against them at prog attach time. This works okay so far for the already mentioned bits in struct bpf_prog, and could be further extended. If the program is doing variable length bpf_xdp_adjust_head(), the human has no idea how much they need and cannot tell kernel anyway.
Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation
On Fri, 14 Apr 2017 15:57:22 -0600 Subash Abhinov Kasiviswanathan wrote: > >> + rmnet_kfree_skb > >> + (skb, > >> + RMNET_STATS_SKBFREE_INGRESS_NOT_EXPECT_MAPD); > > > > very odd formatting. Please fix. > > > Checkpatch complains if it is over 80 chars in a line, so I had to do > this. > I'll change to a single line. Either ignore checkpatch, it is only a dumb script; or better yet refactor the code so it isn't indented so much.
Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation
No, please use standard facilities in order to do debug logging. +static struct rmnet_logical_ep_conf_s *_rmnet_get_logical_ep Dont use "_" at the start of a name without purpose +static int rmnet_associate_network_device(struct net_device *dev) I would expect to see here you making connection between real_dev and rmnet dev. I don't see such thing. Name of the function is misleading. + + config = kmalloc(sizeof(*config), GFP_ATOMIC); kzalloc, and you don't have to zero the memory. +static struct notifier_block rmnet_dev_notifier = { You should add "__read_mostly" + .notifier_call = rmnet_config_notify_cb, + .next = 0, + .priority = 0 This initialization of 0s is not needed. +/* rmnet_vnd_is_vnd() gives mux_id + 1, so subtract 1 to get the correct mux_id + */ Fix this comment format. +void rmnet_print_packet(const struct sk_buff *skb, const char *dev, char dir) No reason to have this function. One can use P_ALL tap to get skbs to userspace. +/* rmnet_bridge_handler() - Bridge related functionality + */ Fix the comment format (you have it on multiple places) +static rx_handler_result_t rmnet_bridge_handler + (struct sk_buff *skb, struct rmnet_logical_ep_conf_s *ep) The formatting is incorrect: static rx_handler_result_t rmnet_bridge_handler(struct sk_buff *skb, struct rmnet_logical_ep_conf_s *ep) + config = (struct rmnet_phys_ep_conf_s *) + rcu_dereference(skb->dev->rx_handler_data); + + if (!config) { Cannot happen. Please remove this. + .ndo_set_mac_address = 0, + .ndo_validate_addr = 0, These are NULL by default. No need to init. + if ((id < 0) || (id >= RMNET_MAX_VND) || !rmnet_devices[id]) Unneeded inner "()"s. I see you have it on multiple places. + + dev_conf = (struct rmnet_vnd_private_s *)netdev_priv(dev); The typecast is not needed since netdev_priv is void*. You have it all over the code. +#define ETH_P_MAP 0xDA1A /* Multiplexing and Aggregation Protocol +* NOT AN OFFICIALLY REGISTERED ID ] Please push this and ARPHRD_RAWIP as separate patches, to increase the visibility. +enum { + IFLA_RMNET_UNSPEC, + IFLA_RMNET_MUX_ID, + __IFLA_RMNET_MAX, +}; This belongs to include/uapi/linux/if_link.h Please see IFLA_BOND_* as example +#define RMNET_INIT_OK 0 +#define RMNET_INIT_ERROR 1 Please use common error codes (0/-ENOMEM/-EINVAL/...) + static u32 rmnet_mod_mask = X Don't use this custom helpers. Use existing loggign facilities. + pr_notice("[RMNET:LOW] %s(): " fmt "\n", __func__, \ + ##__VA_ARGS__); \ + } while (0) These look scarry. Please use netdev_err, dev_err and others instead. +unsigned int rmnet_log_module_mask; +module_param(rmnet_log_module_mask, uint, 0644); +MODULE_PARM_DESC(rmnet_log_module_mask, "Logging module mask"); No module options please. + config = (struct rmnet_phys_ep_conf_s *) + rcu_dereference(skb->dev->rx_handler_data); This is certainly a misuse of dev->rx_handler_data. Dev private of a function arg to carry the pointer around. +struct net_device *rmnet_devices[RMNET_MAX_VND]; Avoid this global variable. Hi Jiri I'll make these changes. + if (!data[IFLA_RMNET_MUX_ID]) + return -EINVAL; + + mux_id = nla_get_u16(data[IFLA_VLAN_ID]); This is a bug I believe... ^^^ I'm pretty sure that you did not test this code. Since both IFLA_VLAN_ID and IFLA_RMNET_MUX_ID are 1 from enum, this actually works. I was using VLAN as a reference, so I missed to change this to IFLA_RMNET_MUX_ID. + rmnet_kfree_skb + (skb, +RMNET_STATS_SKBFREE_INGRESS_NOT_EXPECT_MAPD); very odd formatting. Please fix. Checkpatch complains if it is over 80 chars in a line, so I had to do this. I'll change to a single line. -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface
On Fri, Apr 14, 2017 at 12:30 PM, Jay Vosburgh wrote: > > > Chonggang Li wrote: > > >Previously link-local packets excluding LACP (which are handled by > >the recv_probe) received on bond slave interfaces are delivered to > >stack with bond-master device with RX_HANDLER_ANOTHER, however all > >link-local packets are link specific and should be delivered with > >exact same link/dev. > > In what case is the current behavior a problem (my guess would > be something related to LLDP)? What if, e.g., the bond is a bridge > port, will STP frames no longer propagate to the bridge? > When a link-local frame made appear to arrive on bond-master, it looses value since 'the info' about link on which it arrived is lost. So idea behind this is to pass the frame to the stack with the link on which it arrived without involving the bonding-master. The same will happen for the STP frames too. Do you see problem with this approach? > Also, I think the description would be better if it mentioned > specifically that the patch is changing how skb->dev is set for link > local frames (bond->dev vs receiving interface), e.g., > > "[...] however all link-local packets are link specific and > should be delivered with skb->dev set to the original device." > yes, makes sense. > >Signed-off-by: Chonggang Li > >Signed-off-by: Mahesh Bandewar > >Signed-off-by: Maciej Żenczykowski > >--- > > drivers/net/bonding/bond_main.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > >diff --git a/drivers/net/bonding/bond_main.c > >b/drivers/net/bonding/bond_main.c > >index 01e4a69af421..aeca3d8541b9 100644 > >--- a/drivers/net/bonding/bond_main.c > >+++ b/drivers/net/bonding/bond_main.c > >@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct > >sk_buff **pskb) > > } > > } > > > >+ /* link-local packets should not be passed to master interface */ > >+ if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) > >+ return RX_HANDLER_PASS; > > Since this returns _PASS and not _EXACT, the packet will go > through the ptype_base packet handlers, so is the comment strictly > correct? > The stack does not have all link-local specific packet-type handlers registered by default so returning _EXACT would find nothing and packet will be dropped, right? So returning _PASS will deliver packet to the stack with skb->dev as the link on which packet arrived and stack can take whatever (default) action it has to take. > -J > > > if (bond_should_deliver_exact_match(skb, slave, bond)) > > return RX_HANDLER_EXACT; > > > >-- > >2.12.2.762.g0e3151a226-goog > > > > --- > -Jay Vosburgh, jay.vosbu...@canonical.com
Re: [PATCH net] ipv6: drop non loopback packets claiming to originate from ::1
On Fri, Apr 14, 2017, at 20:22, Florian Westphal wrote: > We lack a saddr check for ::1. This causes security issues e.g. with acls > permitting connections from ::1 because of assumption that these > originate > from local machine. > > Assuming a source address of ::1 is local seems reasonable. > RFC4291 doesn't allow such a source address either, so drop such packets. > > Reported-by: Eric Dumazet > Signed-off-by: Florian Westphal Acked-by: Hannes Frederic Sowa Thanks!
[PATCH net-next 2/2] hv_netvsc: change netvsc device default duplex to FULL
From: Simon Xiao The netvsc device supports full duplex by default. This warnings in log from bonding device which did not like seeing UNKNOWN duplex. Signed-off-by: Simon Xiao --- drivers/net/hyperv/netvsc_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 76489a06ef3b..fad5a57765b9 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -815,7 +815,7 @@ static void netvsc_init_settings(struct net_device *dev) struct net_device_context *ndc = netdev_priv(dev); ndc->speed = SPEED_UNKNOWN; - ndc->duplex = DUPLEX_UNKNOWN; + ndc->duplex = DUPLEX_FULL; } static int netvsc_get_link_ksettings(struct net_device *dev, -- 2.11.0
[PATCH net-next 1/2] netvsc: fix RCU warning in get_stats
The statistics functionis called with RTNL held during probe but with RCU held during access from /proc and elsewhere. This is safe so update the lockdep annotation. Signed-off-by: Stephen Hemminger --- drivers/net/hyperv/netvsc_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index f24c2891dd0c..76489a06ef3b 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -897,7 +897,7 @@ static void netvsc_get_stats64(struct net_device *net, struct rtnl_link_stats64 *t) { struct net_device_context *ndev_ctx = netdev_priv(net); - struct netvsc_device *nvdev = rcu_dereference(ndev_ctx->nvdev); + struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev); int i; if (!nvdev) -- 2.11.0
4.10.9 nok with realtek wlan, atom
On Thu, Feb 9, 2017 at 9:09 PM, Larry Finger wrote: > On 02/09/2017 01:43 PM, Bjorn Helgaas wrote: >> >> [+cc rtl8192ce folks in case they've seen this] >> >> On Thu, Feb 09, 2017 at 03:45:01PM +0100, rupert THURNER wrote: >>> >>> hi, >>> >>> not technical expert enough, i just wanted to give a short user >>> feedback. for realtek wlan on atom, kernels up to 4.9.5 are ok, and >>> kernel 4.10.0-rc7-g926af6273fc6 (arch linux-git version numbering) as >>> well. kernels 4.9.6, 4.9.7, and 4.9.8 fail, i.e. connection to a WLAN >>> hotspot is possible then drops, or connecting to wlan fails >>> alltogether. >> >> >> Thanks very much for your report, and sorry for the inconvenience. >> >> v4.10-rc7 works, so I guess we don't need to worry about fixing v4.10. >> >> But the stable kernels v4.9.6, v4.9.7, and v4.9.8 are broken, so we >> need to figure out why and make sure we fix the v4.9 stable series. >> >> I can't tell yet whether this is PCI-related or not. If it is, >> 4922a6a5cfa7 ("PCI: Enumerate switches below PCI-to-PCIe bridges") >> appeared in v4.9.6, and there is a known issue with that. The issue >> should be fixed by 610c2b7ff8f6 ("PCI/ASPM: Handle PCI-to-PCIe bridges >> as roots of PCIe hierarchies"), which appeared in v4.9.9, so I guess >> the first thing to do would be to test v4.9.9. >> >> If it's not fixed in v4.9.9, can you share the complete dmesg log >> (output of "dmesg" command) and "lspci -vv" output for v4.9.5 (last >> known working version) and v4.9.6 (first known broken version)? On >> v4.9.6, collect the dmesg output after the failure occurs. >> >>> 24: PCI 300.0: 0282 WLAN controller >>> [Created at pci.366] >>> Model: "Realtek RTL8188CE 802.11b/g/n WiFi Adapter" >>> Device: pci 0x8176 "RTL8188CE 802.11b/g/n WiFi Adapter" >>> Revision: 0x01 >>> Driver: "rtl8192ce" >>> Driver Modules: "rtl8192ce" >>> Device File: wlp3s0 >>> Features: WLAN > > > It would be helpful if someone were to bisect this issue. The only issue > that comes to mind was fixed in commit 52f5631a4c05 ("rtlwifi: rtl8192ce: > Fix loading of incorrect firmware") which is in 4.10-rc7 and will be > backported to 4.9. > > The above issue is one that could not be reproduced on my hardware, thus it > took Jurij Smakov to find the fix. Without his bisection of the problem, who > knows how long it would have taken to find my edit mistake. larry, using the newest kernel 4.10.8 the network connection dropps again irregular. # dmesg [0.00] Linux version 4.10.9-1-ARCH (builduser@tobias) (gcc version 6.3.1 20170306 (GCC) ) #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST 2017 [ 11.933373] rtl8192ce: rtl8192ce: Power Save off (module option) [ 11.933396] rtl8192ce: Using firmware rtlwifi/rtl8192cfw.bin [ 11.978307] ieee80211 phy0: Selected rate control algorithm 'rtl_rc' [ 11.978945] rtlwifi: rtlwifi: wireless switch is on # lspci -vv Subsystem: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter Kernel driver in use: rtl8192ce best, rupert
Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface
> In what case is the current behavior a problem (my guess would > be something related to LLDP)? LLDP is indeed the case we're trying to solve here. Listen on one socket and get LLDP for all interfaces in the system. > What if, e.g., the bond is a bridge > port, will STP frames no longer propagate to the bridge? That's an interesting question. I don't actually know how this should work. Should this change perhaps only apply to packets we would otherwise RX_HANDLER_EXACT? ie. only affect link local packets on inactive slaves? but continue reparenting link local packets on active slaves? That would seem a little inconsistent... but less of a change.
Re: [PATCH block-tree] net: off by one in inet6_pton()
On 04/13/2017 01:42 PM, Dan Carpenter wrote: > If "scope_len" is sizeof(scope_id) then we would put the NUL terminator > one space beyond the end of the buffer. Added, thanks Dan. -- Jens Axboe
Re: [PATCH v3 net-next RFC] Generic XDP
On Thu, Apr 13, 2017 at 11:38:10AM -0400, David Miller wrote: > From: Johannes Berg > Date: Thu, 13 Apr 2017 08:10:56 +0200 > > > On Wed, 2017-04-12 at 21:20 -0700, Alexei Starovoitov wrote: > > > >> > +if (skb_linearize(skb)) > >> > +goto do_drop; > >> > >> when we discussed supporting jumbo frames in XDP, the idea > >> was that the program would need to look at first 3+k bytes only > >> and the rest of the packet will be in non-contiguous pages. > >> If we do that, it means that XDP program would have to assume > >> that the packet is more than [data, data_end] and this range > >> only covers linear part. > >> If that's the future, we don't need to linearize the skb here > >> and can let the program access headlen only. > > > > I'm not sure how you think that would work - at least with our (wifi) > > driver, the headlen should be maybe ETH_HLEN or so at this point. We'd > > let the program know that it can only look at so much, but then the > > program can't do anything at all with those frames. At some point then > > we go back to bpf_skb_load_bytes() being necessary in one form or > > another, no? > > Agreed, this is completely unusable. Some wired ethernet drivers do the > same exact thing. ahh. i thought all drivers do at least copy-break (256) bytes or copy get_headlen or build_skb the whole thing. Since wireless does eth_hlen, then yeah, skb_linearize() is the only way. I guess any driver that would care about XDP performance would either implement in-driver XDP or make sure that skb_linearize() doesn't happen in generic XDP by doing build_skb() with the whole packet. The driver can be smart and avoid doing copy-break if generic XDP is on.
Re: [PATCH net-next] bonding: do not pass link-local packets to master-interface
Chonggang Li wrote: >Previously link-local packets excluding LACP (which are handled by >the recv_probe) received on bond slave interfaces are delivered to >stack with bond-master device with RX_HANDLER_ANOTHER, however all >link-local packets are link specific and should be delivered with >exact same link/dev. In what case is the current behavior a problem (my guess would be something related to LLDP)? What if, e.g., the bond is a bridge port, will STP frames no longer propagate to the bridge? Also, I think the description would be better if it mentioned specifically that the patch is changing how skb->dev is set for link local frames (bond->dev vs receiving interface), e.g., "[...] however all link-local packets are link specific and should be delivered with skb->dev set to the original device." >Signed-off-by: Chonggang Li >Signed-off-by: Mahesh Bandewar >Signed-off-by: Maciej Żenczykowski >--- > drivers/net/bonding/bond_main.c | 3 +++ > 1 file changed, 3 insertions(+) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 01e4a69af421..aeca3d8541b9 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct >sk_buff **pskb) > } > } > >+ /* link-local packets should not be passed to master interface */ >+ if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) >+ return RX_HANDLER_PASS; Since this returns _PASS and not _EXACT, the packet will go through the ptype_base packet handlers, so is the comment strictly correct? -J > if (bond_should_deliver_exact_match(skb, slave, bond)) > return RX_HANDLER_EXACT; > >-- >2.12.2.762.g0e3151a226-goog > --- -Jay Vosburgh, jay.vosbu...@canonical.com
Re: [PATCH v3 net-next RFC] Generic XDP
On Fri, Apr 14, 2017 at 11:05:25AM +0200, Jesper Dangaard Brouer wrote: > > > > We are consistently finding that there is this real need to > > communicate XDP capabilities, or somehow verify that the needs > > of an XDP program can be satisfied by a given implementation. > > I fully agree that we need some way to express capabilities[1] > > > Maximum headroom is just one. I don't like the idea of asking program author to explain capabilities to the kernel. Right now the verifier already understands more about the program than human does. If the verifier cannot deduct from the insns what program will be doing, it's pretty much guarantee that program author has no idea either. If we add 'required_headroom' as an extra flag to BPF_PROG_LOAD, the users will just pass something like 64 or 128 whereas the program might only be doing IPIP encap and that will cause kernel to provide extra headroom for no good reason or reject the program whereas it could have run just fine. So I very much agree with this part: > ... or somehow verify that the needs > of an XDP program can be satisfied by a given implementation. we already have cb_access, dst_needed, xdp_adjust_head flags that verifier discovers in the program. For headroom we need one more. The verifier can see the constant passed into bpf_xdp_adjust_head(). It's trickier if it's a variable, but the verifier can estimate min/max of the variable already and worst case it can say that it will be XDP_PACKET_HEADROOM. If the program is doing variable length bpf_xdp_adjust_head(), the human has no idea how much they need and cannot tell kernel anyway.
export pcie_flr and remove copies of it in drivers V2
Hi all, this exports the PCI layer pcie_flr helper, and removes various opencoded copies of it. Changes since V1: - rebase on top of the pci/virtualization branch - fixed the probe case in __pci_dev_reset - added ACKs from Bjorn
[PATCH 2/7] PCI: call pcie_flr from reset_intel_82599_sfp_virtfn
The 82599 quirk contained an outdated copy of the FLR code. Signed-off-by: Christoph Hellwig Acked-by: Bjorn Helgaas --- drivers/pci/quirks.c | 16 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 823271b13d12..8195ca294ee5 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3641,19 +3641,11 @@ static int reset_intel_82599_sfp_virtfn(struct pci_dev *dev, int probe) * * The 82599 supports FLR on VFs, but FLR support is reported only * in the PF DEVCAP (sec 9.3.10.4), not in the VF DEVCAP (sec 9.5). -* Therefore, we can't use pcie_flr(), which checks the VF DEVCAP. +* Thus we must call pcie_flr directly without first checking if it is +* supported. */ - - if (probe) - return 0; - - if (!pci_wait_for_pending_transaction(dev)) - dev_err(&dev->dev, "transaction is not cleared; proceeding with reset anyway\n"); - - pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR); - - msleep(100); - + if (!probe) + pcie_flr(dev); return 0; } -- 2.11.0
[PATCH 1/7] PCI: export pcie_flr
Currently we opencode the FLR sequence in lots of place, export a core helper instead. We split out the probing for FLR support as all the non-core callers already know their hardware. Note that in the new pci_has_flr function the quirk check has been moved before the capability check as there is no point in reading the capability in this case. Signed-off-by: Christoph Hellwig Acked-by: Bjorn Helgaas --- drivers/pci/pci.c | 39 --- include/linux/pci.h | 1 + 2 files changed, 29 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index bef14777bb30..957a11a6a840 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3773,27 +3773,41 @@ static void pci_flr_wait(struct pci_dev *dev) (i - 1) * 100); } -static int pcie_flr(struct pci_dev *dev, int probe) +/** + * pcie_has_flr - check if a device supports function level resets + * @dev: device to check + * + * Returns true if the device advertises support for PCIe function level + * resets. + */ +static bool pcie_has_flr(struct pci_dev *dev) { u32 cap; - pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap); - if (!(cap & PCI_EXP_DEVCAP_FLR)) - return -ENOTTY; - if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET) - return -ENOTTY; + return false; - if (probe) - return 0; + pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap); + return cap & PCI_EXP_DEVCAP_FLR; +} +/** + * pcie_flr - initiate a PCIe function level reset + * @dev: device to reset + * + * Initiate a function level reset on @dev. The caller should ensure the + * device supports FLR before calling this function, e.g. by using the + * pcie_has_flr() helper. + */ +void pcie_flr(struct pci_dev *dev) +{ if (!pci_wait_for_pending_transaction(dev)) dev_err(&dev->dev, "timed out waiting for pending transaction; performing function level reset anyway\n"); pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR); pci_flr_wait(dev); - return 0; } +EXPORT_SYMBOL_GPL(pcie_flr); static int pci_af_flr(struct pci_dev *dev, int probe) { @@ -3977,9 +3991,12 @@ static int __pci_dev_reset(struct pci_dev *dev, int probe) if (rc != -ENOTTY) goto done; - rc = pcie_flr(dev, probe); - if (rc != -ENOTTY) + if (pcie_has_flr(dev)) { + if (!probe) + pcie_flr(dev); + rc = 0; goto done; + } rc = pci_af_flr(dev, probe); if (rc != -ENOTTY) diff --git a/include/linux/pci.h b/include/linux/pci.h index 20e1865233a4..60162f51227a 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1054,6 +1054,7 @@ int pcie_get_mps(struct pci_dev *dev); int pcie_set_mps(struct pci_dev *dev, int mps); int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed, enum pcie_link_width *width); +void pcie_flr(struct pci_dev *dev); int __pci_reset_function(struct pci_dev *dev); int __pci_reset_function_locked(struct pci_dev *dev); int pci_reset_function(struct pci_dev *dev); -- 2.11.0
Re: [PATCH -next] net: phy: test the right variable in phy_write_mmd()
On Fri, Apr 14, 2017 at 10:10:41PM +0300, Dan Carpenter wrote: > This is a copy and paste buglet. We meant to test for ->write_mmd but > we test for ->read_mmd. > > Fixes: 1ee6b9bc6206 ("net: phy: make phy_(read|write)_mmd() generic MMD > accessors") > Signed-off-by: Dan Carpenter Reviewed-by: Andrew Lunn Andrew
[no subject]
Things seem to be settling down as far as networking is concerned, let's hope this trend continues... 1) Add iov_iter_revert() and use it to fix the behavior of skb_copy_datagram_msg() et al., from Al Viro. 2) Fix the protocol used in the synthetic SKB we cons up for the purposes of doing a simulated route lookup for RTM_GETROUTE requests. From Florian Larysch. 3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong Wang. 4) Don't call netdev_change_features with the team lock held, from Xin Long. 5) Revert TCP F-RTO extension to catch more spurious timeouts because it interacts very badly with some middle-boxes. From Yuchung Cheng. 6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from Guillaume Nault. 7) ctnetlink uses bit positions where it should be using bit masks, fix from Liping Zhang. 8) Missing RCU locking in netfilter helper code, from Gao Feng. 9) Avoid double frees and use-after-frees in tcp_disconnect(), from Eric Dumazet. 10) Don't do a changelink before we register the netdevice in bridging, from Ido Schimmel. 11) Lock the ipv6 device address list properly, from Rabin Vincent. Please pull, thanks a lot! The following changes since commit ea6b1720ce25f92f7a17b2e0c2b653d20773d10a: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-05 20:17:38 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to f4c13c8ec56e70eeff3e365e0c5fcdad16845b32: Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf (2017-04-14 10:47:13 -0400) Al Viro (2): [iov_iter] new privimitive: iov_iter_revert() make skb_copy_datagram_msg() et.al. preserve ->msg_iter on error Daniele Palmas (1): drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201 David S. Miller (5): Merge branch 'for-davem' of git://git.kernel.org/.../viro/vfs Merge branch 'l2tp-sockopt-errors' Merge tag 'linux-can-fixes-for-4.12-20170404' of git://git.kernel.org/.../mkl/linux-can Merge branch 'bridge-register-netdev-before-changelink' Merge git://git.kernel.org/.../pablo/nf Eric Dumazet (2): netfilter: xt_TCPMSS: add more sanity tests on tcph->doff tcp: clear saved_syn in tcp_disconnect() Florian Larysch (1): net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given Gao Feng (3): net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage Geert Uytterhoeven (1): can: rcar_can: Do not print virtual addresses Guillaume Nault (2): l2tp: don't mask errors in pppol2tp_setsockopt() l2tp: don't mask errors in pppol2tp_getsockopt() Ido Schimmel (2): bridge: implement missing ndo_uninit() bridge: netlink: register netdevice before executing changelink Johannes Berg (2): bpf: reference may_access_skb() from __bpf_prog_run() net: xdp: don't export dev_change_xdp_fd() Liping Zhang (6): netfilter: ctnetlink: using bit to represent the ct event netfilter: ctnetlink: make it safer when checking the ct helper name netfilter: make it safer during the inet6_dev->addr_list traversal netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL netfilter: nf_ct_expect: use proper RCU list traversal/update APIs netfilter: nft_hash: do not dump the auto generated seed Markus Marb (1): can: ifi: use correct register to read rx status Oliver Neukum (1): usbnet: make sure no NULL pointer is passed through Rabin Vincent (1): ipv6: Fix idev->addr_list corruption WANG Cong (1): net_sched: check noop_qdisc before qdisc_hash_add() Xin Long (2): sctp: listen on the sock only when it's state is listening or closed team: call netdev_change_features out of team lock Yuchung Cheng (1): tcp: restrict F-RTO to work-around broken middle-boxes drivers/net/can/ifi_canfd/ifi_canfd.c | 2 +- drivers/net/can/rcar/rcar_can.c | 3 +-- drivers/net/team/team.c | 19 +++ drivers/net/usb/qmi_wwan.c| 2 +- drivers/net/usb/usbnet.c | 19 +++ include/linux/uio.h | 6 +- kernel/bpf/core.c | 12 ++-- lib/iov_iter.c| 63 +++ net/bridge/br_device.c| 20 +++- net/bridge/br_if.c| 1 - net/bridge/br_multicast.c | 7 +-- net/bridge/br_netlink.c | 7 +-- net/bridge/br_private.h | 5 + net/core/datagram.c | 23 ++- net/c
[PATCH 3/7] PCI: call pcie_flr from reset_chelsio_generic_dev
Instead of copy & pasting and old version of the code. Signed-off-by: Christoph Hellwig Acked-by: Bjorn Helgaas --- drivers/pci/quirks.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 8195ca294ee5..715ed8c08fa3 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3750,20 +3750,7 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe) PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); - /* -* Start of pcie_flr() code sequence. This reset code is a copy of -* the guts of pcie_flr() because that's not an exported function. -*/ - - if (!pci_wait_for_pending_transaction(dev)) - dev_err(&dev->dev, "transaction is not cleared; proceeding with reset anyway\n"); - - pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR); - msleep(100); - - /* -* End of pcie_flr() code sequence. -*/ + pcie_flr(dev); /* * Restore the configuration information (BAR values, etc.) including -- 2.11.0
[PATCH 6/7] crypto: qat: use pcie_flr instead of duplicating it
Signed-off-by: Christoph Hellwig --- drivers/crypto/qat/qat_common/adf_aer.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/crypto/qat/qat_common/adf_aer.c b/drivers/crypto/qat/qat_common/adf_aer.c index 2839fccdd84b..d3e25c37dc33 100644 --- a/drivers/crypto/qat/qat_common/adf_aer.c +++ b/drivers/crypto/qat/qat_common/adf_aer.c @@ -109,20 +109,7 @@ EXPORT_SYMBOL_GPL(adf_reset_sbr); void adf_reset_flr(struct adf_accel_dev *accel_dev) { - struct pci_dev *pdev = accel_to_pci_dev(accel_dev); - u16 control = 0; - int pos = 0; - - dev_info(&GET_DEV(accel_dev), "Function level reset\n"); - pos = pci_pcie_cap(pdev); - if (!pos) { - dev_err(&GET_DEV(accel_dev), "Restart device failed\n"); - return; - } - pci_read_config_word(pdev, pos + PCI_EXP_DEVCTL, &control); - control |= PCI_EXP_DEVCTL_BCR_FLR; - pci_write_config_word(pdev, pos + PCI_EXP_DEVCTL, control); - msleep(100); + pcie_flr(accel_to_pci_dev(accel_dev)); } EXPORT_SYMBOL_GPL(adf_reset_flr); -- 2.11.0
[PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it
Signed-off-by: Christoph Hellwig --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 ++-- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index a7a430a7be2c..543ddde5f8e2 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -7112,18 +7112,6 @@ static void ixgbe_watchdog_flush_tx(struct ixgbe_adapter *adapter) } #ifdef CONFIG_PCI_IOV -static inline void ixgbe_issue_vf_flr(struct ixgbe_adapter *adapter, - struct pci_dev *vfdev) -{ - if (!pci_wait_for_pending_transaction(vfdev)) - e_dev_warn("Issuing VFLR with pending transactions\n"); - - e_dev_err("Issuing VFLR for VF %s\n", pci_name(vfdev)); - pcie_capability_set_word(vfdev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR); - - msleep(100); -} - static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter) { struct ixgbe_hw *hw = &adapter->hw; @@ -7156,7 +7144,7 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter) pci_read_config_word(vfdev, PCI_STATUS, &status_reg); if (status_reg != IXGBE_FAILED_READ_CFG_WORD && status_reg & PCI_STATUS_REC_MASTER_ABORT) - ixgbe_issue_vf_flr(adapter, vfdev); + pcie_flr(vfdev); } } @@ -10244,7 +10232,7 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev, * VFLR. Just clean up the AER in that case. */ if (vfdev) { - ixgbe_issue_vf_flr(adapter, vfdev); + pcie_flr(vfdev); /* Free device reference count */ pci_dev_put(vfdev); } -- 2.11.0
[PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it
Signed-off-by: Christoph Hellwig --- drivers/infiniband/hw/hfi1/chip.c | 4 ++-- drivers/infiniband/hw/hfi1/hfi.h | 1 - drivers/infiniband/hw/hfi1/pcie.c | 30 -- 3 files changed, 2 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c index 121a4c920f1b..d037f72e4d96 100644 --- a/drivers/infiniband/hw/hfi1/chip.c +++ b/drivers/infiniband/hw/hfi1/chip.c @@ -13610,14 +13610,14 @@ static void init_chip(struct hfi1_devdata *dd) dd_dev_info(dd, "Resetting CSRs with FLR\n"); /* do the FLR, the DC reset will remain */ - hfi1_pcie_flr(dd); + pcie_flr(dd->pcidev); /* restore command and BARs */ restore_pci_variables(dd); if (is_ax(dd)) { dd_dev_info(dd, "Resetting CSRs with FLR\n"); - hfi1_pcie_flr(dd); + pcie_flr(dd->pcidev); restore_pci_variables(dd); } } else { diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index 0808e3c3ba39..40d7559fa723 100644 --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1764,7 +1764,6 @@ int hfi1_pcie_init(struct pci_dev *, const struct pci_device_id *); void hfi1_pcie_cleanup(struct pci_dev *); int hfi1_pcie_ddinit(struct hfi1_devdata *, struct pci_dev *); void hfi1_pcie_ddcleanup(struct hfi1_devdata *); -void hfi1_pcie_flr(struct hfi1_devdata *); int pcie_speeds(struct hfi1_devdata *); void request_msix(struct hfi1_devdata *, u32 *, struct hfi1_msix_entry *); void hfi1_enable_intx(struct pci_dev *); diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c index 0829fce06172..c81556e84831 100644 --- a/drivers/infiniband/hw/hfi1/pcie.c +++ b/drivers/infiniband/hw/hfi1/pcie.c @@ -240,36 +240,6 @@ void hfi1_pcie_ddcleanup(struct hfi1_devdata *dd) iounmap(dd->piobase); } -/* - * Do a Function Level Reset (FLR) on the device. - * Based on static function drivers/pci/pci.c:pcie_flr(). - */ -void hfi1_pcie_flr(struct hfi1_devdata *dd) -{ - int i; - u16 status; - - /* no need to check for the capability - we know the device has it */ - - /* wait for Transaction Pending bit to clear, at most a few ms */ - for (i = 0; i < 4; i++) { - if (i) - msleep((1 << (i - 1)) * 100); - - pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVSTA, &status); - if (!(status & PCI_EXP_DEVSTA_TRPND)) - goto clear; - } - - dd_dev_err(dd, "Transaction Pending bit is not clearing, proceeding with reset anyway\n"); - -clear: - pcie_capability_set_word(dd->pcidev, PCI_EXP_DEVCTL, -PCI_EXP_DEVCTL_BCR_FLR); - /* PCIe spec requires the function to be back within 100ms */ - msleep(100); -} - static void msix_setup(struct hfi1_devdata *dd, int pos, u32 *msixcnt, struct hfi1_msix_entry *hfi1_msix_entry) { -- 2.11.0
[PATCH 7/7] liquidio: use pcie_flr instead of duplicating it
Signed-off-by: Christoph Hellwig --- drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index 9d5e03502c76..afdbf7fa016e 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -869,20 +869,7 @@ static void octeon_pci_flr(struct octeon_device *oct) pci_write_config_word(oct->pci_dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE); - /* Wait for Transaction Pending bit clean */ - msleep(100); - pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA, &status); - if (status & PCI_EXP_DEVSTA_TRPND) { - dev_info(&oct->pci_dev->dev, "Function reset incomplete after 100ms, sleeping for 5 seconds\n"); - ssleep(5); - pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA, - &status); - if (status & PCI_EXP_DEVSTA_TRPND) - dev_info(&oct->pci_dev->dev, "Function reset still incomplete after 5s, reset anyway\n"); - } - pcie_capability_set_word(oct->pci_dev, PCI_EXP_DEVCTL, -PCI_EXP_DEVCTL_BCR_FLR); - mdelay(100); + pcie_flr(oct->pci_dev); pci_cfg_access_unlock(oct->pci_dev); -- 2.11.0
[PATCH -next] net: phy: test the right variable in phy_write_mmd()
This is a copy and paste buglet. We meant to test for ->write_mmd but we test for ->read_mmd. Fixes: 1ee6b9bc6206 ("net: phy: make phy_(read|write)_mmd() generic MMD accessors") Signed-off-by: Dan Carpenter diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c index 357a4d0d7641..6739b738bbaf 100644 --- a/drivers/net/phy/phy-core.c +++ b/drivers/net/phy/phy-core.c @@ -76,7 +76,7 @@ int phy_write_mmd(struct phy_device *phydev, int devad, u32 regnum, u16 val) if (regnum > (u16)~0 || devad > 32) return -EINVAL; - if (phydev->drv->read_mmd) { + if (phydev->drv->write_mmd) { ret = phydev->drv->write_mmd(phydev, devad, regnum, val); } else if (phydev->is_c45) { u32 addr = MII_ADDR_C45 | (devad << 16) | (regnum & 0x);
Re: [PATCH net] ipv6: drop non loopback packets claiming to originate from ::1
On Fri, 2017-04-14 at 20:22 +0200, Florian Westphal wrote: > We lack a saddr check for ::1. This causes security issues e.g. with acls > permitting connections from ::1 because of assumption that these originate > from local machine. > > Assuming a source address of ::1 is local seems reasonable. > RFC4291 doesn't allow such a source address either, so drop such packets. > > Reported-by: Eric Dumazet > Signed-off-by: Florian Westphal > --- > net/ipv6/ip6_input.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) Acked-by: Eric Dumazet
Re: [PATCH v3 net-next RFC] Generic XDP
On Fri, Apr 14, 2017 at 10:07:40AM +0200, Johannes Berg wrote: > On Thu, 2017-04-13 at 16:01 -0400, David Miller wrote: > > From: Johannes Berg > > Date: Thu, 13 Apr 2017 21:22:21 +0200 > > > > > OTOH, it might depend on the frame data itself, if the program does > > > something like > > > > > > xdp->data[xdp->data[0] & 0xf] > > > > > > (read or write, doesn't really matter) so then the verifier would > > have > > > to take the maximum possible value there into account. > > > > I am not well versed enough with the verifier to understand exactly > > how and to what extent SKB accesses are validated by the verifier. > > > > My, perhaps mistaken, impression is that access range validation is > > still at least partially done at run time. > > I think you're right for SKB accesses, but I'm pretty sure that for XDP > the verifier checks that the program can't possibly access outside of > [xdp->data, xdp->data_end], see compare_ptrs_to_packet(). > > This checks that comparisons to data_end are all there, i.e. that the > program verifies it may access some bytes before actually doing so. > However, the program could start with > > if (xdp->data_end < xdp->data + 1024) > return DROP; > [...] > > and then the verifier would consider it safe. > > Still, it has to track down into the [...] code to actually understand > that it now didn't try to access xdp->data+1025, and as such it should > be able to determine the maximum desired offset. > > However, I'm coming to realize that may not necessarily mean that the > program really *needs* to access that data. > > For example, a hypothetical wifi program might want to recalculate and > compare the CRC checksum (for whatever reason, say a driver/hardware > bug). This would require accessing the last 4 bytes of the packet, > which may not be present. The program could, however, accept that > sometimes this isn't possible, and simply accept frames when it can't > see the last 4 bytes (or if the last 4 bytes aren't the CRC because > that isn't reported now, but whatever, I'm handwaving anyway.) > > So perhaps this isn't really a good idea. The program should probably > separately say how much data it really *needs* there, and then perhaps > a warning could be emitted if it never accesses the data that it > advertises as needing (i.e. if it says "I want 1024 bytes" but then > can't possibly read more than 14) The above descrption is correct. Also note that the offset within the packet can be a variable as well, since that's the only way to do variable length parsing (like ip options), so the verifier cannot tell how far into the packet the program is going to parse. The program author cannot tell that either! Consider samples/bpf/parse_varlen.c: ihl_len = iph->ihl * 4; if (iph->protocol == IPPROTO_IPIP) { iph = data + nh_off + ihl_len; if (iph + 1 > data_end) return 0; 'ihl' is 4 bit, so 2nd ip hdr may start at sizeof(eth) + 64 bytes, but we cannot ask the driver to always provide these many bytes. Technically we can add a feature to verifier to calculate the max range into the packet that the program may look at, but it will be large. Like if the program parses IPIP the range will be 64(first ip) + 64(second ip) + 64 (if it looks at tcp options). If the program parses TLVs then the range will be 64k, so I'm not sure what would be the value of such verifier feature.
Re: [RFC 1/3] bpf/wireless: add wifimon program type
On Wed, Apr 12, 2017 at 05:30:40PM +0200, Johannes Berg wrote: > On Wed, 2017-04-12 at 11:19 -0400, David Miller wrote: > > > > > Instead it may make more sense to just have a "wifi_info(skb, > > info)" > > > function you can call, e.g. with a structure that has various > > fields > > > and flags to see which you care about. > > > > I would advise against this, let the parsing part use __sk_buff and > > therefore have maximum flexibility. You really cannot predict the > > future, so in my opinion it might be unwise to constrain at this > > point. > > So my point with the wifi_info() function to call from the BPF program > was just that putting something that's not already in struct sk_buff > into __sk_buff doesn't really make a lot of sense - we still have to > actually build it somewhere/somehow without knowing if it's going to be > needed by the program. We already have things like prog->cb_access and > prog->dst_needed now, but I'm not sure we want to blow that up much. > > At the same time, technically I *do* have the information in skb->cb, > but the format thereof is something I really wouldn't want to let > trickle into the ABI. Therefore, I think if somebody needs something > like the bitrate, we should have a wifi_info() function that can return > the bitrate (among other things) without having to pre-build the data. > We can still cache it in mac80211 if multiple programs are there, dunno > if that makes sense. > > Nevertheless, I think if I don't use __sk_buff as the program argument > then things get really messy, so I'll stick to that, and avoid adding > anything to it as much as possible. so today only 'len' field is needed, but the plan to add wifi specific stuff to bpf context? If so, adding these very wifi specific fields to __sk_buff will confuse other program types, so new program type (like you did) and new 'struct bpf_wifi_md' are probably cleaner. Physically the R1 register to bpf program will still be 'struct sk_buff', but from bpf program side it will be: struct bpf_wifi_md { __u32 len; __u32 wifi_things; }; At the same time if most of the __sk_buff fields can be useful to wifimon, then just use it as-is (without restricting to 'len' only) and add wifi specific fields to it with a comment.
Re: eBPF - little-endian load instructions?
On Thu, Apr 13, 2017 at 07:58:45AM +0200, Johannes Berg wrote: > On Wed, 2017-04-12 at 20:08 -0700, Alexei Starovoitov wrote: > > > it's really llvm bug that i need fix. It's plain broken > > to generate what effectively is nop insn for march=bpfeb > > My only excuse that when that code was written llvm had only > > march=bpfel. > > bpfeb was added much later. > > So I'm confused now. Is bpf intended to be endian-independent or not? > It sounded at first like it was, even if I have a hard time imagining > how that would even work. bpf takes endianness of the cpu it runs on. if we said that bpf 32-bit load insn is little endian, it would have screwed up all big-endian archs and the other way around. For both classic and extended the corresponding struct sock_filter and bpf_insn are encoded in target cpu endianness. The front-end (clang) needs to run in target endianness as well otherwise the bcc tracing scripts will be accessing garbage. In other words you cannot take either classic or extended bpf program generated for x86 and run it on big-endian. Though you can compile on x86 and run on Arm. Hence two targets for llvm -march=bpfeb and -march=bpfel If you compile for -march=bpfeb and try to load on x86, the instruction stream will be rejected by the verifier in early stages, since branch insns and all immediate values will be wrong.
Re: [PATCH 3/4] net: macb: Add hardware PTP support
On Thu, Apr 13, 2017 at 02:39:23PM +0100, Rafal Ozieblo wrote: > This patch is based on original Harini's patch and Andrei's patch, > implemented in a separate file to ease the review/maintanance > and integration with other platforms. Please see if you can break this patch into 2 parts: 1. SO_TIMESTAMPING 2. PHC support > This driver does support GEM-GXL: "This driver supports GEM-GXL:" > - HW time stamp on the PTP Ethernet packets are received using the > SO_TIMESTAMPING API. Where timers are obtained from the dma buffer > descriptors This text is poor. No "timers" are obtained but rather time stamps. Also, second sentence is not a sentence. (An english sentence has a noun and a verb.) > diff --git a/drivers/net/ethernet/cadence/macb.c > b/drivers/net/ethernet/cadence/macb.c > index 59d459b..603bac1 100644 > --- a/drivers/net/ethernet/cadence/macb.c > +++ b/drivers/net/ethernet/cadence/macb.c > @@ -826,6 +826,15 @@ static void macb_tx_interrupt(struct macb_queue *queue) > > /* First, update TX stats if needed */ > if (skb) { > +#ifdef CONFIG_MACB_USE_HWSTAMP No need for ifdef here. Instead, let gem_ptp_do_txstamp() return -1. > + if (gem_ptp_do_txstamp(queue, skb, desc) == 0) { > + /* skb now belongs to timestamp buffer > + * and will be removed later > + */ > + tx_skb->skb = NULL; > + schedule_work(&queue->tx_ts_task); > + } > +#endif > netdev_vdbg(bp->dev, "skb %u (data %p) TX > complete\n", > macb_tx_ring_wrap(bp, tail), > skb->data); > @@ -992,6 +1001,10 @@ static int gem_rx(struct macb *bp, int budget) > bp->dev->stats.rx_packets++; > bp->dev->stats.rx_bytes += skb->len; > > +#ifdef CONFIG_MACB_USE_HWSTAMP No ifdef needed. > + gem_ptp_do_rxstamp(bp, skb, desc); > +#endif > + > #if defined(DEBUG) && defined(VERBOSE_DEBUG) > netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n", > skb->len, skb->csum); > @@ -1314,6 +1327,11 @@ static irqreturn_t macb_interrupt(int irq, void > *dev_id) > queue_writel(queue, ISR, MACB_BIT(HRESP)); > } > > +#ifdef CONFIG_MACB_USE_HWSTAMP > + if (status & MACB_PTP_INT_MASK) Can't you use IS_ENABLED(CONFIG_MACB_USE_HWSTAMP) here? > + macb_ptp_int(queue, status); > +#endif > + > status = queue_readl(queue, ISR); > } > > @@ -1643,8 +1661,10 @@ static int macb_start_xmit(struct sk_buff *skb, struct > net_device *dev) > > /* Make newly initialized descriptor visible to hardware */ > wmb(); > - > - skb_tx_timestamp(skb); > +#ifdef CONFIG_MACB_USE_HWSTAMP > + if (!bp->ptp_hw_support) > +#endif > + skb_tx_timestamp(skb); This is wrong. You should call skb_tx_timestamp() unconditionally, but be sure to set SKBTX_IN_PROGRESS when appropriate. > diff --git a/drivers/net/ethernet/cadence/macb.h > b/drivers/net/ethernet/cadence/macb.h > index 2606970..5295045 100644 > --- a/drivers/net/ethernet/cadence/macb.h > +++ b/drivers/net/ethernet/cadence/macb.h > @@ -11,6 +11,9 @@ > #define _MACB_H > > #include > +#include You don't need to include ptp_clock.h. > +#include > +#include > > #if defined(CONFIG_ARCH_DMA_ADDR_T_64BIT) || defined(CONFIG_MACB_USE_HWSTAMP) > #define MACB_EXT_DESC ... > @@ -527,6 +595,8 @@ > #define queue_readl(queue, reg) > (queue)->bp->macb_reg_readl((queue)->bp, (queue)->reg) > #define queue_writel(queue, reg, value) > (queue)->bp->macb_reg_writel((queue)->bp, (queue)->reg, (value)) > > +#define PTP_TS_BUFFER_SIZE 128 /* must be power of 2 */ > + > /* Conditional GEM/MACB macros. These perform the operation to the correct > * register dependent on whether the device is a GEM or a MACB. For > registers > * and bitfields that are common across both devices, use macb_{read,write}l > @@ -889,6 +959,18 @@ struct macb_config { > int jumbo_max_len; > }; > > +#ifdef CONFIG_MACB_USE_HWSTAMP No need for ifdef here. > +struct tsu_incr { > + u32 sub_ns; > + u32 ns; > +}; > + > +struct gem_tx_ts { > + struct sk_buff *skb; > + struct macb_dma_desc_ptp desc_ptp; > +}; > +#endif > + > struct macb_queue { > struct macb *bp; > int irq; ... > diff --git a/drivers/net/ethernet/cadence/macb_ptp.c > b/drivers/net/ethernet/cadence/macb_ptp.c > new file mode 100755 > index 000..72a79c4 > --- /dev/null > +++ b/drivers/net/ethernet/cadence/macb_ptp.c > @@ -0,0 +1,724 @@ > +/** > + * 1588 PTP support for Cadence G
Re: [Intel-wired-lan] [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
On 4/14/2017 9:47 AM, Alexander Duyck wrote: On Wed, Mar 29, 2017 at 5:22 PM, Sridhar Samudrala wrote: In switchdev mode, broadcasts from VFs are received by the PF and passed to corresponding port representor netdev. Any frames sent via port netdevs are sent as directed transmits to the corresponding VFs. To enable directed transmit, skb metadata dst is used to pass the port id and the frame is requeued to call the PFs transmit routine. VF id is used as port id for VFs and PF port id is defined as I40_MAIN_VSI_PORT_ID. Small script to demonstrate inter VF and PF to VF pings in switchdev mode. PF: p4p1, VFs: p4p1_0,p4p1_1 VF Port Reps:p4p1-vf0, p4p1-vf1 PF Port rep: p4p1-pf # rmmod i40e; modprobe i40e # devlink dev eswitch set pci/:05:00.0 mode switchdev # echo 2 > /sys/class/net/p4p1/device/sriov_numvfs # ip link set p4p1 vf 0 mac 00:11:22:33:44:55 # ip link set p4p1 vf 1 mac 00:11:22:33:44:56 # rmmod i40evf; modprobe i40evf /* Create 2 namespaces and move the VFs to the corresponding ns */ # ip netns add ns0 # ip link set p4p1_0 netns ns0 # ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0 # ip netns exec ns0 ip link set p4p1_0 up # ip netns add ns1 # ip link set p4p1_1 netns ns1 # ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1 # ip netns exec ns1 ip link set p4p1_1 up /* bring up pf and port netdevs */ # ip addr add 192.168.1.1/24 dev p4p1 # ip link set p4p1 up # ip link set p4p1-vf0 up # ip link set p4p1-vf1 up # ip link set p4p1-pf up # ip netns exec ns0 ping -c3 192.168.1.11 /* VF0 -> VF1 */ # ip netns exec ns1 ping -c3 192.168.1.10 /* VF1 -> VF0 */ # ping -c3 192.168.1.10 /* PF -> VF0 */ # ping -c3 192.168.1.11 /* PF -> VF1 */ /* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */ # ip netns exec ns0 ping -c1 -W1 192.168.1.200 /* VF1 -> IP in same subnet - broadcasts will be seen on p4p1-vf1 & p4p1*/ # ip netns exec ns0 ping -c1 -W1 192.168.1.200 /* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */ # ping -I p4p1-vf0 -c1 -W1 192.168.1.200 /* port rep VF1 -> IP in same subnet - broadcasts will be seen on p4p1_1 */ # ping -I p4p1-vf1 -c1 -W1 192.168.1.200 Signed-off-by: Sridhar Samudrala --- drivers/net/ethernet/intel/i40e/i40e.h | 4 + drivers/net/ethernet/intel/i40e/i40e_main.c| 27 +++- drivers/net/ethernet/intel/i40e/i40e_txrx.c| 148 - drivers/net/ethernet/intel/i40e/i40e_txrx.h| 2 + drivers/net/ethernet/intel/i40e/i40e_type.h| 3 + drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 8 +- 6 files changed, 184 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index ebffca0..86d2510 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring, } /** + * i40e_handle_lpbk_skb - Update skb->dev of a loopback frame + * @rx_ring: rx ring in play + * @skb: packet to send up + **/ +static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff *skb) +{ + struct i40e_q_vector *q_vector = rx_ring->q_vector; + struct i40e_pf *pf = rx_ring->vsi->back; + struct sk_buff *nskb; + struct i40e_vf *vf; + struct ethhdr *eth; + int vf_id; + + if ((skb->pkt_type != PACKET_BROADCAST) && + (skb->pkt_type != PACKET_MULTICAST) && + (skb->pkt_type != PACKET_OTHERHOST)) + return; + + eth = (struct ethhdr *)skb_mac_header(skb); + + /* If a loopback packet is received in switchdev mode, clone the skb +* and pass it to the corresponding port netdev based on the source MAC. +*/ + for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) { + vf = &pf->vf[vf_id]; + if (ether_addr_equal(eth->h_source, +vf->default_lan_addr.addr)) { + nskb = skb_clone(skb, GFP_ATOMIC); + if (!nskb) + break; + nskb->offload_fwd_mark = 1; So this line is causing build errors when switchdev is not enabled. This whole function should probably be wrapped in a check to see if switchdev support is enabled or not. Yes. will fix it in the next revision. Thanks Sridhar
[PATCH net] ipv6: drop non loopback packets claiming to originate from ::1
We lack a saddr check for ::1. This causes security issues e.g. with acls permitting connections from ::1 because of assumption that these originate from local machine. Assuming a source address of ::1 is local seems reasonable. RFC4291 doesn't allow such a source address either, so drop such packets. Reported-by: Eric Dumazet Signed-off-by: Florian Westphal --- net/ipv6/ip6_input.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index aacfb4bce153..c45b12b4431c 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -122,11 +122,14 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs)); /* * RFC4291 2.5.3 +* The loopback address must not be used as the source address in IPv6 +* packets that are sent outside of a single node. [..] * A packet received on an interface with a destination address * of loopback must be dropped. */ - if (!(dev->flags & IFF_LOOPBACK) && - ipv6_addr_loopback(&hdr->daddr)) + if ((ipv6_addr_loopback(&hdr->saddr) || +ipv6_addr_loopback(&hdr->daddr)) && +!(dev->flags & IFF_LOOPBACK)) goto err; /* RFC4291 Errata ID: 3480 -- 2.10.2
pull request: bluetooth-next 2017-04-14
Hi Dave, Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12 kernel. - Many fixes to 6LoWPAN, in particular for BLE - New CA8210 IEEE 802.15.4 device driver (accounting for most of the lines of code added in this pull request) - Added Nokia Bluetooth (UART) HCI driver - Some serdev & TTY changes that are dependencies for the Nokia driver (with acks from relevant maintainers and an agreement that these come through the bluetooth tree) - Support for new Intel Bluetooth device - Various other minor cleanups/fixes here and there Please let me know if there are any issues pulling. Thanks. Johan --- The following changes since commit d92be7a41ef15463eb816a4a2d42bf094b56dfce: net: make struct net_device::min_header_len 8-bit (2017-04-12 13:59:21 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream for you to fetch changes up to 019aa56b7df8a796b2c01a56269a370ad3442ec7: arm64: dts: hikey: add WL1835 Bluetooth device node (2017-04-13 19:22:53 +0200) Alexander Aring (2): 6lowpan: iphc: override l2 packet information ipv6: addrconf: fix 48 bit 6lowpan autoconfiguration Andy Shevchenko (1): Bluetooth: hci_bcm: Support platform enumeration Colin Ian King (4): Bluetooth: btmrvl: fix spelling mistake: "unregester" -> "unregister" Bluetooth: fix assignments on error variable err 6lowpan: fix assignment of peer_addr ieee802154: ca8210: Add checks for kmalloc allocation failures Dean Jenkins (2): Bluetooth: Handle bt_accept_enqueue() socket atomically Bluetooth: Avoid bt_accept_unlink() double unlinking Elena Reshetova (1): Bluetooth: convert rfcomm_dlc.refcnt from atomic_t to refcount_t Gabriel (1): Bluetooth: Added support for Rivet Networks Killer 1535 Geliang Tang (1): Bluetooth: bluecard: use setup_timer Harry Morris (3): ieee802154: Add CA8210 IEEE 802.15.4 device driver ieee802154: Add device tree documentation for CA8210 ieee802154: Add entry in MAINTAINTERS for CA8210 driver Jeffy Chen (2): Bluetooth: btusb: wake system up when receives a wake irq Bluetooth: btmrvl: wake system up when receives a wake irq Johan Hovold (2): Bluetooth: hci_bcm: add missing tty-device sanity check Bluetooth: hci_intel: add missing tty-device sanity check John Keeping (1): Bluetooth: hci_bcm: Fix clock (un)prepare Jonas Holmberg (1): Bluetooth: Change initial min and max interval Larry Finger (1): Bluetooth: btrtl: Change message for missing config file Luiz Augusto von Dentz (10): 6lowpan: Use netdev addr_len to determine lladdr len 6lowpan: Fix IID format for Bluetooth Bluetooth: 6lowpan: Remove unnecessary peer lookup Bluetooth: 6lowpan: Print errors during recv_pkt Bluetooth: L2CAP: Don't return -EAGAIN if out of credits 6lowpan: Don't set IFF_NO_QUEUE Bluetooth: 6lowpan: Don't drop packets when run out of credits Bluetooth: 6lowpan: Use netif APIs to flow control Bluetooth: L2CAP: Add l2cap_le_flowctl_send Bluetooth: 6lowpan: Set tx_queue_len to DEFAULT_TX_QUEUE_LEN Marcel Holtmann (1): Bluetooth: btusb: Add support for Intel Bluetooth devices 9160/9260 [8087:0025] Marcin Kraglak (1): Bluetooth: L2CAP: Fix L2CAP_CR_SCID_IN_USE value Michael Scott (2): Bluetooth: 6lowpan: fix delay work init in add_peer_chan() Bluetooth: 6lowpan: fix use after free in chan_suspend/resume Patrik Flykt (3): bluetooth: Set 6 byte device addresses 6lowpan: Set MAC address length according to LOWPAN_LLTYPE bluetooth: Do not set IFF_POINTOPOINT Rob Herring (5): Bluetooth: hci_uart: add serdev driver support library dt-bindings: net: Add TI WiLink shared transport binding bluetooth: hci_uart: remove unused hci_uart_init_tty bluetooth: hci_uart: add LL protocol serdev driver support arm64: dts: hikey: add WL1835 Bluetooth device node Sebastian Reichel (9): tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init serdev: add serdev_device_wait_until_sent serdev: implement get/set tiocm serdev: add helpers for cts and rts handling Bluetooth: hci_uart: add support for word alignment Bluetooth: hci_serdev: do not open device in hci open Bluetooth: hci_serdev: allow modular drivers dt-bindings: net: bluetooth: Add nokia-bluetooth Bluetooth: add nokia driver Tedd Ho-Jeong An (3): Bluetooth: Use switch statement for Intel hardware variants Bluetooth: hci_intel: Fix firmware file name to use hw_variant Bluetooth: hci_intel: Add support Intel Bluetooth device 9160/9260 for UART Xinming Hu (2): Bluetooth: btmrvl: disable platform wakeup interrupt in suspend failure path Bluetooth: btmrvl: remove unnecessary
[PATCH net-next] bonding: do not pass link-local packets to master-interface
Previously link-local packets excluding LACP (which are handled by the recv_probe) received on bond slave interfaces are delivered to stack with bond-master device with RX_HANDLER_ANOTHER, however all link-local packets are link specific and should be delivered with exact same link/dev. Signed-off-by: Chonggang Li Signed-off-by: Mahesh Bandewar Signed-off-by: Maciej Żenczykowski --- drivers/net/bonding/bond_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 01e4a69af421..aeca3d8541b9 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb) } } + /* link-local packets should not be passed to master interface */ + if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) + return RX_HANDLER_PASS; if (bond_should_deliver_exact_match(skb, slave, bond)) return RX_HANDLER_EXACT; -- 2.12.2.762.g0e3151a226-goog
Re: [PATCH net-next v2 6/6] vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses
On Fri, 14 Apr 2017 18:44:46 +0200 Matthias Schiffer wrote: > As link-local addresses are only valid for a single interface, we can allow > to use the same VNI for multiple independent VXLANs, as long as the used > interfaces are distinct. This way, VXLANs can always be used as a drop-in > replacement for VLANs with greater ID space. > > This also extends VNI lookup to respect the ifindex when link-local IPv6 > addresses are used, so using the same VNI on multiple interfaces can > actually work. > > Signed-off-by: Matthias Schiffer Why does this have to be IPv6 specific? What about the case where VXLAN is not bound to an interface? If that is used then that could be a problem.
Re: [PATCH net-next v2 4/6] vxlan: check valid combinations of address scopes
On Fri, 14 Apr 2017 18:44:44 +0200 Matthias Schiffer wrote: > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c > index 07f89b037681..95a71546e8f2 100644 > --- a/drivers/net/vxlan.c > +++ b/drivers/net/vxlan.c > @@ -2881,11 +2881,39 @@ static int vxlan_config_validate(struct net *src_net, > struct vxlan_config *conf, > if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family) > return -EINVAL; > > + if (vxlan_addr_multicast(&conf->saddr)) > + return -EINVAL; > + > if (conf->saddr.sa.sa_family == AF_INET6) { > if (!IS_ENABLED(CONFIG_IPV6)) > return -EPFNOSUPPORT; > use_ipv6 = true; > conf->flags |= VXLAN_F_IPV6; > + > + if (!(conf->flags & VXLAN_F_COLLECT_METADATA)) { > + int local_type = > + ipv6_addr_type(&conf->saddr.sin6.sin6_addr); > + int remote_type = > + ipv6_addr_type(&conf->remote_ip.sin6.sin6_addr); > + > + if (local_type & IPV6_ADDR_LINKLOCAL) { > + if (!(remote_type & IPV6_ADDR_LINKLOCAL) && > + (remote_type != IPV6_ADDR_ANY)) { > + pr_info("invalid combination of address > scopes\n"); It is always helpful to include device if possible in error message. netdev_notice(old->dev, " invalid combination of address scopes\n"); Also vxlan is good candidate for extended netlink error reporting.
[PATCH net-next 1/6] bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU
test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU. It just happens to work when PERCPU_FREE_TARGET == 16. This patch: 1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU 2) Add test_lru_sanity6 to test list rotation for the BPF_F_NO_COMMON_LRU map. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- tools/testing/selftests/bpf/test_lru_map.c | 70 +++--- 1 file changed, 65 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c index 00b0aff56e2e..04b2242937cc 100644 --- a/tools/testing/selftests/bpf/test_lru_map.c +++ b/tools/testing/selftests/bpf/test_lru_map.c @@ -387,6 +387,10 @@ static void test_lru_sanity3(int map_type, int map_flags, unsigned int tgt_free) unsigned int map_size; int next_cpu = 0; + if (map_flags & BPF_F_NO_COMMON_LRU) + /* This test is only applicable to common LRU list */ + return; + printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type, map_flags); @@ -396,11 +400,7 @@ static void test_lru_sanity3(int map_type, int map_flags, unsigned int tgt_free) assert(batch_size * 2 == tgt_free); map_size = tgt_free * 2; - if (map_flags & BPF_F_NO_COMMON_LRU) - lru_map_fd = create_map(map_type, map_flags, - map_size * nr_cpus); - else - lru_map_fd = create_map(map_type, map_flags, map_size); + lru_map_fd = create_map(map_type, map_flags, map_size); assert(lru_map_fd != -1); expected_map_fd = create_map(BPF_MAP_TYPE_HASH, 0, map_size); @@ -566,6 +566,65 @@ static void test_lru_sanity5(int map_type, int map_flags) printf("Pass\n"); } +/* Test list rotation for BPF_F_NO_COMMON_LRU map */ +static void test_lru_sanity6(int map_type, int map_flags, int tgt_free) +{ + int lru_map_fd, expected_map_fd; + unsigned long long key, value[nr_cpus]; + unsigned int map_size = tgt_free * 2; + int next_cpu = 0; + + if (!(map_flags & BPF_F_NO_COMMON_LRU)) + return; + + printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type, + map_flags); + + assert(sched_next_online(0, &next_cpu) != -1); + + expected_map_fd = create_map(BPF_MAP_TYPE_HASH, 0, map_size); + assert(expected_map_fd != -1); + + lru_map_fd = create_map(map_type, map_flags, map_size * nr_cpus); + assert(lru_map_fd != -1); + + value[0] = 1234; + + for (key = 1; key <= tgt_free; key++) { + assert(!bpf_map_update_elem(lru_map_fd, &key, value, + BPF_NOEXIST)); + assert(!bpf_map_update_elem(expected_map_fd, &key, value, + BPF_NOEXIST)); + } + + for (; key <= tgt_free * 2; key++) { + unsigned long long stable_key; + + /* Make ref bit sticky for key: [1, tgt_free] */ + for (stable_key = 1; stable_key <= tgt_free; stable_key++) { + /* Mark the ref bit */ + assert(!bpf_map_lookup_elem(lru_map_fd, &stable_key, + value)); + } + assert(!bpf_map_update_elem(lru_map_fd, &key, value, + BPF_NOEXIST)); + } + + for (; key <= tgt_free * 3; key++) { + assert(!bpf_map_update_elem(lru_map_fd, &key, value, + BPF_NOEXIST)); + assert(!bpf_map_update_elem(expected_map_fd, &key, value, + BPF_NOEXIST)); + } + + assert(map_equal(lru_map_fd, expected_map_fd)); + + close(expected_map_fd); + close(lru_map_fd); + + printf("Pass\n"); +} + int main(int argc, char **argv) { struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; @@ -593,6 +652,7 @@ int main(int argc, char **argv) test_lru_sanity3(map_types[t], map_flags[f], tgt_free); test_lru_sanity4(map_types[t], map_flags[f], tgt_free); test_lru_sanity5(map_types[t], map_flags[f]); + test_lru_sanity6(map_types[t], map_flags[f], tgt_free); printf("\n"); } -- 2.9.3
[PATCH net-next 6/6] bpf: lru: Add map-in-map LRU example
This patch adds a map-in-map LRU example. If we know only a subset of cores will use the LRU, we can allocate a common LRU list per targeting core and store it into an array-of-hashs. It allows using the common LRU map with map-update performance comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory on the unused cores that we know they will never access the LRU map. BPF_F_NO_COMMON_LRU: > map_perf_test 32 8 1000 1000 | awk '{sum += $3}END{print sum}' 9234314 (9.23M/s) map-in-map LRU: > map_perf_test 512 8 126 8000 | awk '{sum += $3}END{print sum}' 9962743 (9.96M/s) Notes that the max_entries for the map-in-map LRU test is 126 which is the max_entries for each inner LRU map. 8 processes have been started, so 8 * 126 = 1008 (~10M) which is close to what is used in the BPF_F_NO_COMMON_LRU test. Signed-off-by: Martin KaFai Lau --- samples/bpf/map_perf_test_kern.c | 34 -- samples/bpf/map_perf_test_user.c | 62 2 files changed, 93 insertions(+), 3 deletions(-) diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c index 404ed53b8a53..245165817fbe 100644 --- a/samples/bpf/map_perf_test_kern.c +++ b/samples/bpf/map_perf_test_kern.c @@ -11,6 +11,7 @@ #include "bpf_helpers.h" #define MAX_ENTRIES 1000 +#define MAX_NR_CPUS 1024 struct bpf_map_def SEC("maps") hash_map = { .type = BPF_MAP_TYPE_HASH, @@ -34,6 +35,19 @@ struct bpf_map_def SEC("maps") nocommon_lru_hash_map = { .map_flags = BPF_F_NO_COMMON_LRU, }; +struct bpf_map_def SEC("maps") inner_lru_hash_map = { + .type = BPF_MAP_TYPE_LRU_HASH, + .key_size = sizeof(u32), + .value_size = sizeof(long), + .max_entries = MAX_ENTRIES, +}; + +struct bpf_map_def SEC("maps") array_of_lru_hashs = { + .type = BPF_MAP_TYPE_ARRAY_OF_MAPS, + .key_size = sizeof(u32), + .max_entries = MAX_NR_CPUS, +}; + struct bpf_map_def SEC("maps") percpu_hash_map = { .type = BPF_MAP_TYPE_PERCPU_HASH, .key_size = sizeof(u32), @@ -154,13 +168,27 @@ int stress_lru_hmap_alloc(struct pt_regs *ctx) test_case = dst6[7]; - if (test_case == 0) + if (test_case == 0) { ret = bpf_map_update_elem(&lru_hash_map, &key, &val, BPF_ANY); - else if (test_case == 1) + } else if (test_case == 1) { ret = bpf_map_update_elem(&nocommon_lru_hash_map, &key, &val, BPF_ANY); - else + } else if (test_case == 2) { + void *nolocal_lru_map; + int cpu = bpf_get_smp_processor_id(); + + nolocal_lru_map = bpf_map_lookup_elem(&array_of_lru_hashs, + &cpu); + if (!nolocal_lru_map) { + ret = -ENOENT; + goto done; + } + + ret = bpf_map_update_elem(nolocal_lru_map, &key, &val, + BPF_ANY); + } else { ret = -EINVAL; + } done: if (ret) diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c index 2a12f48b5c6d..6ac778153315 100644 --- a/samples/bpf/map_perf_test_user.c +++ b/samples/bpf/map_perf_test_user.c @@ -25,6 +25,7 @@ #include "bpf_load.h" #define TEST_BIT(t) (1U << (t)) +#define MAX_NR_CPUS 1024 static __u64 time_get_ns(void) { @@ -44,6 +45,7 @@ enum test_type { LPM_KMALLOC, HASH_LOOKUP, ARRAY_LOOKUP, + INNER_LRU_HASH_PREALLOC, NR_TESTS, }; @@ -57,10 +59,14 @@ const char *test_map_names[NR_TESTS] = { [LPM_KMALLOC] = "lpm_trie_map_alloc", [HASH_LOOKUP] = "hash_map", [ARRAY_LOOKUP] = "array_map", + [INNER_LRU_HASH_PREALLOC] = "inner_lru_hash_map", }; static int test_flags = ~0; static uint32_t num_map_entries; +static uint32_t inner_lru_hash_size; +static int inner_lru_hash_idx = -1; +static int array_of_lru_hashs_idx = -1; static uint32_t max_cnt = 100; static int check_test_flags(enum test_type t) @@ -82,11 +88,42 @@ static void test_hash_prealloc(int cpu) static void do_test_lru(enum test_type test, int cpu) { + static int inner_lru_map_fds[MAX_NR_CPUS]; + struct sockaddr_in6 in6 = { .sin6_family = AF_INET6 }; const char *test_name; __u64 start_time; int i, ret; + if (test == INNER_LRU_HASH_PREALLOC) { + int outer_fd = map_fd[array_of_lru_hashs_idx]; + + assert(cpu < MAX_NR_CPUS); + + if (cpu) { + inner_lru_map_fds[cpu] = + bpf_create_map(BPF_MAP_TYPE_LRU_HASH, + sizeof(uint32_t), sizeof(long), + inner_lru_hash_size, 0); + if (inner_lru_map_fds[cpu] == -1) { +
[PATCH net-next 0/6] bpf: LRU performance and test-program improvements
The first 4 patches make a few improvements to the LRU tests. Patch 5/6 is to improve the performance of BPF_F_NO_COMMON_LRU map. Patch 6/6 adds an example in using LRU map with map-in-map. Martin KaFai Lau (6): bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU bpf: lru: Cleanup test_lru_map.c bpf: lru: Refactor LRU map tests in map_perf_test bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 bpf: lru: Add map-in-map LRU example kernel/bpf/bpf_lru_list.c | 2 +- samples/bpf/bpf_load.c | 114 ++--- samples/bpf/bpf_load.h | 13 ++ samples/bpf/map_perf_test_kern.c | 75 +++-- samples/bpf/map_perf_test_user.c | 247 + tools/testing/selftests/bpf/test_lru_map.c | 104 6 files changed, 428 insertions(+), 127 deletions(-) -- 2.9.3
[PATCH net-next 3/6] bpf: lru: Refactor LRU map tests in map_perf_test
One more LRU test will be added later in this patch series. In this patch, we first move all existing LRU map tests into a single syscall (connect) first so that the future new LRU test can be added without hunting another syscall. One of the map name is also changed from percpu_lru_hash_map to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- samples/bpf/map_perf_test_kern.c | 43 +++- samples/bpf/map_perf_test_user.c | 53 +++- 2 files changed, 67 insertions(+), 29 deletions(-) diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c index 9da2a3441b0a..404ed53b8a53 100644 --- a/samples/bpf/map_perf_test_kern.c +++ b/samples/bpf/map_perf_test_kern.c @@ -26,7 +26,7 @@ struct bpf_map_def SEC("maps") lru_hash_map = { .max_entries = 1, }; -struct bpf_map_def SEC("maps") percpu_lru_hash_map = { +struct bpf_map_def SEC("maps") nocommon_lru_hash_map = { .type = BPF_MAP_TYPE_LRU_HASH, .key_size = sizeof(u32), .value_size = sizeof(long), @@ -100,6 +100,7 @@ int stress_percpu_hmap(struct pt_regs *ctx) bpf_map_delete_elem(&percpu_hash_map, &key); return 0; } + SEC("kprobe/sys_getgid") int stress_hmap_alloc(struct pt_regs *ctx) { @@ -128,24 +129,42 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx) return 0; } -SEC("kprobe/sys_getpid") +SEC("kprobe/sys_connect") int stress_lru_hmap_alloc(struct pt_regs *ctx) { - u32 key = bpf_get_prandom_u32(); + struct sockaddr_in6 *in6; + u16 test_case, dst6[8]; + int addrlen, ret; + char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%d\n"; long val = 1; + u32 key = bpf_get_prandom_u32(); - bpf_map_update_elem(&lru_hash_map, &key, &val, BPF_ANY); + in6 = (struct sockaddr_in6 *)PT_REGS_PARM2(ctx); + addrlen = (int)PT_REGS_PARM3(ctx); - return 0; -} + if (addrlen != sizeof(*in6)) + return 0; -SEC("kprobe/sys_getppid") -int stress_percpu_lru_hmap_alloc(struct pt_regs *ctx) -{ - u32 key = bpf_get_prandom_u32(); - long val = 1; + ret = bpf_probe_read(dst6, sizeof(dst6), &in6->sin6_addr); + if (ret) + goto done; + + if (dst6[0] != 0xdead || dst6[1] != 0xbeef) + return 0; + + test_case = dst6[7]; + + if (test_case == 0) + ret = bpf_map_update_elem(&lru_hash_map, &key, &val, BPF_ANY); + else if (test_case == 1) + ret = bpf_map_update_elem(&nocommon_lru_hash_map, &key, &val, + BPF_ANY); + else + ret = -EINVAL; - bpf_map_update_elem(&percpu_lru_hash_map, &key, &val, BPF_ANY); +done: + if (ret) + bpf_trace_printk(fmt, sizeof(fmt), ret); return 0; } diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c index e29ff318a793..51cb8f238aa2 100644 --- a/samples/bpf/map_perf_test_user.c +++ b/samples/bpf/map_perf_test_user.c @@ -18,6 +18,9 @@ #include #include #include +#include +#include + #include "libbpf.h" #include "bpf_load.h" @@ -36,7 +39,7 @@ static __u64 time_get_ns(void) #define HASH_KMALLOC (1 << 2) #define PERCPU_HASH_KMALLOC(1 << 3) #define LRU_HASH_PREALLOC (1 << 4) -#define PERCPU_LRU_HASH_PREALLOC (1 << 5) +#define NOCOMMON_LRU_HASH_PREALLOC (1 << 5) #define LPM_KMALLOC(1 << 6) #define HASH_LOOKUP(1 << 7) #define ARRAY_LOOKUP (1 << 8) @@ -55,28 +58,44 @@ static void test_hash_prealloc(int cpu) cpu, MAX_CNT * 10ll / (time_get_ns() - start_time)); } -static void test_lru_hash_prealloc(int cpu) +static void do_test_lru(int lru_test_flag, int cpu) { + struct sockaddr_in6 in6 = { .sin6_family = AF_INET6 }; + const char *test_name; __u64 start_time; - int i; + int i, ret; + + in6.sin6_addr.s6_addr16[0] = 0xdead; + in6.sin6_addr.s6_addr16[1] = 0xbeef; + + if (lru_test_flag & LRU_HASH_PREALLOC) { + test_name = "lru_hash_map_perf"; + in6.sin6_addr.s6_addr16[7] = 0; + } else if (lru_test_flag & NOCOMMON_LRU_HASH_PREALLOC) { + test_name = "nocommon_lru_hash_map_perf"; + in6.sin6_addr.s6_addr16[7] = 1; + } else { + assert(0); + } start_time = time_get_ns(); - for (i = 0; i < MAX_CNT; i++) - syscall(__NR_getpid); - printf("%d:lru_hash_map_perf pre-alloc %lld events per sec\n", - cpu, MAX_CNT * 10ll / (time_get_ns() - start_time)); + for (i = 0; i < MAX_CNT; i++) { + ret = connect(-1, (const struct sockaddr *)&in6, sizeof(in6)); + assert(ret == -1 && errno == EBADF); +
[PATCH net-next 5/6] bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
After doing map_perf_test with a much bigger BPF_F_NO_COMMON_LRU map, the perf report shows a lot of time spent in rotating the inactive list (i.e. __bpf_lru_list_rotate_inactive): > map_perf_test 32 8 1 100 | awk '{sum += $3}END{print sum}' 19644783 (19M/s) > map_perf_test 32 8 1000 1000 | awk '{sum += $3}END{print sum}' 6283930 (6.28M/s) By inactive, it usually means the element is not in cache. Hence, there is a need to tune the PERCPU_NR_SCANS value. This patch finds a better number of elements to scan during each list rotation. The PERCPU_NR_SCANS (which is defined the same as PERCPU_FREE_TARGET) decreases from 16 elements to 4 elements. This change only affects the BPF_F_NO_COMMON_LRU map. The test_lru_dist does not show meaningful difference between 16 and 4. Our production L4 load balancer which uses the LRU map for conntrack-ing also shows little change in cache hit rate. Since both benchmark and production data show no cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4. We can consider making it configurable if we find a usecase later that shows another value works better and/or use a different rotation strategy. After this change: > map_perf_test 32 8 1000 1000 | awk '{sum += $3}END{print sum}' 9240324 (9.2M/s) i.e. 6.28M/s -> 9.2M/s The test_lru_dist has not shown meaningful difference: > test_lru_dist zipf.100k.a1_01.out 4000 1: nr_misses: 31575 (Before) vs 31566 (After) > test_lru_dist zipf.100k.a0_01.out 4 1 nr_misses: 67036 (Before) vs 67031 (After) Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/bpf/bpf_lru_list.c | 2 +- tools/testing/selftests/bpf/test_lru_map.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c index f62d1d56f41d..e6ef4401a138 100644 --- a/kernel/bpf/bpf_lru_list.c +++ b/kernel/bpf/bpf_lru_list.c @@ -13,7 +13,7 @@ #define LOCAL_FREE_TARGET (128) #define LOCAL_NR_SCANS LOCAL_FREE_TARGET -#define PERCPU_FREE_TARGET (16) +#define PERCPU_FREE_TARGET (4) #define PERCPU_NR_SCANSPERCPU_FREE_TARGET /* Helpers to get the local list index */ diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c index 87c05e5bdfdd..8c10c9180c1a 100644 --- a/tools/testing/selftests/bpf/test_lru_map.c +++ b/tools/testing/selftests/bpf/test_lru_map.c @@ -22,7 +22,7 @@ #include "bpf_util.h" #define LOCAL_FREE_TARGET (128) -#define PERCPU_FREE_TARGET (16) +#define PERCPU_FREE_TARGET (4) static int nr_cpus; -- 2.9.3
[PATCH net-next 4/6] bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def
The current bpf_map_def is statically defined during compile time. This patch allows the *_user.c program to change it during runtime. It is done by adding load_bpf_file_fixup_map() which takes a callback. The callback will be called before creating each map so that it has a chance to modify the bpf_map_def. The current usecase is to change max_entries in map_perf_test. It is interesting to test with a much bigger map size in some cases (e.g. the following patch on bpf_lru_map.c). However, it is hard to find one size to fit all testing environment. Hence, it is handy to take the max_entries as a cmdline arg and then configure the bpf_map_def during runtime. This patch adds two cmdline args. One is to configure the map's max_entries. Another is to configure the max_cnt which controls how many times a syscall is called. Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- samples/bpf/bpf_load.c | 114 +- samples/bpf/bpf_load.h | 13 samples/bpf/map_perf_test_user.c | 148 --- 3 files changed, 201 insertions(+), 74 deletions(-) diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c index dcdce1270d38..0d449d8032d1 100644 --- a/samples/bpf/bpf_load.c +++ b/samples/bpf/bpf_load.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "libbpf.h" #include "bpf_load.h" #include "perf-sys.h" @@ -37,15 +38,6 @@ int event_fd[MAX_PROGS]; int prog_cnt; int prog_array_fd = -1; -struct bpf_map_def { - unsigned int type; - unsigned int key_size; - unsigned int value_size; - unsigned int max_entries; - unsigned int map_flags; - unsigned int inner_map_idx; -}; - static int populate_prog_array(const char *event, int prog_fd) { int ind = atoi(event), err; @@ -193,11 +185,14 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size) return 0; } -static int load_maps(struct bpf_map_def *maps, int len) +static int load_maps(struct bpf_map_def *maps, int len, +const char **map_names, fixup_map_cb fixup_map) { int i; for (i = 0; i < len / sizeof(struct bpf_map_def); i++) { + if (fixup_map) + fixup_map(&maps[i], map_names[i], i); if (maps[i].type == BPF_MAP_TYPE_ARRAY_OF_MAPS || maps[i].type == BPF_MAP_TYPE_HASH_OF_MAPS) { @@ -280,14 +275,64 @@ static int parse_relo_and_apply(Elf_Data *data, Elf_Data *symbols, return 0; } -int load_bpf_file(char *path) +static int cmp_symbols(const void *l, const void *r) +{ + const GElf_Sym *lsym = (const GElf_Sym *)l; + const GElf_Sym *rsym = (const GElf_Sym *)r; + + if (lsym->st_value < rsym->st_value) + return -1; + else if (lsym->st_value > rsym->st_value) + return 1; + else + return 0; +} + +static int get_sorted_map_names(Elf *elf, Elf_Data *symbols, int maps_shndx, + int strtabidx, char **map_names) +{ + GElf_Sym map_symbols[MAX_MAPS]; + int i, nr_maps = 0; + + for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) { + assert(nr_maps < MAX_MAPS); + if (!gelf_getsym(symbols, i, &map_symbols[nr_maps])) + continue; + if (map_symbols[nr_maps].st_shndx != maps_shndx) + continue; + nr_maps++; + } + + qsort(map_symbols, nr_maps, sizeof(GElf_Sym), cmp_symbols); + + for (i = 0; i < nr_maps; i++) { + char *map_name; + + map_name = elf_strptr(elf, strtabidx, map_symbols[i].st_name); + if (!map_name) { + printf("cannot get map symbol\n"); + return 1; + } + + map_names[i] = strdup(map_name); + if (!map_names[i]) { + printf("strdup(%s): %s(%d)\n", map_name, + strerror(errno), errno); + return 1; + } + } + + return 0; +} + +static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map) { - int fd, i; + int fd, i, ret, maps_shndx = -1, strtabidx = -1; Elf *elf; GElf_Ehdr ehdr; GElf_Shdr shdr, shdr_prog; - Elf_Data *data, *data_prog, *symbols = NULL; - char *shname, *shname_prog; + Elf_Data *data, *data_prog, *data_maps = NULL, *symbols = NULL; + char *shname, *shname_prog, *map_names[MAX_MAPS] = { NULL }; /* reset global variables */ kern_version = 0; @@ -335,14 +380,33 @@ int load_bpf_file(char *path) } memcpy(&kern_version, data->d_buf, sizeof(int)); } else if (strcmp(shname, "maps") == 0) { - processed_sec
[PATCH net-next 2/6] bpf: lru: Cleanup test_lru_map.c
This patch does the following cleanup on test_lru_map.c 1) Fix indentation (Replace spaces by tabs) 2) Remove redundant BPF_F_NO_COMMON_LRU test 3) Simplify some comments Signed-off-by: Martin KaFai Lau Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- tools/testing/selftests/bpf/test_lru_map.c | 32 +- 1 file changed, 9 insertions(+), 23 deletions(-) diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c index 04b2242937cc..87c05e5bdfdd 100644 --- a/tools/testing/selftests/bpf/test_lru_map.c +++ b/tools/testing/selftests/bpf/test_lru_map.c @@ -191,12 +191,7 @@ static void test_lru_sanity1(int map_type, int map_flags, unsigned int tgt_free) int next_cpu = 0; if (map_flags & BPF_F_NO_COMMON_LRU) - /* Ther percpu lru list (i.e each cpu has its own LRU -* list) does not have a local free list. Hence, -* it will only free old nodes till there is no free -* from the LRU list. Hence, this test does not apply -* to BPF_F_NO_COMMON_LRU -*/ + /* This test is only applicable to common LRU list */ return; printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type, @@ -227,7 +222,7 @@ static void test_lru_sanity1(int map_type, int map_flags, unsigned int tgt_free) for (key = 1; key < end_key; key++) { assert(!bpf_map_lookup_elem(lru_map_fd, &key, value)); assert(!bpf_map_update_elem(expected_map_fd, &key, value, - BPF_NOEXIST)); + BPF_NOEXIST)); } /* Insert 1+tgt_free to 2*tgt_free @@ -273,12 +268,7 @@ static void test_lru_sanity2(int map_type, int map_flags, unsigned int tgt_free) int next_cpu = 0; if (map_flags & BPF_F_NO_COMMON_LRU) - /* Ther percpu lru list (i.e each cpu has its own LRU -* list) does not have a local free list. Hence, -* it will only free old nodes till there is no free -* from the LRU list. Hence, this test does not apply -* to BPF_F_NO_COMMON_LRU -*/ + /* This test is only applicable to common LRU list */ return; printf("%s (map_type:%d map_flags:0x%X): ", __func__, map_type, @@ -290,11 +280,7 @@ static void test_lru_sanity2(int map_type, int map_flags, unsigned int tgt_free) assert(batch_size * 2 == tgt_free); map_size = tgt_free + batch_size; - if (map_flags & BPF_F_NO_COMMON_LRU) - lru_map_fd = create_map(map_type, map_flags, - map_size * nr_cpus); - else - lru_map_fd = create_map(map_type, map_flags, map_size); + lru_map_fd = create_map(map_type, map_flags, map_size); assert(lru_map_fd != -1); expected_map_fd = create_map(BPF_MAP_TYPE_HASH, 0, map_size); @@ -341,7 +327,7 @@ static void test_lru_sanity2(int map_type, int map_flags, unsigned int tgt_free) assert(!bpf_map_lookup_elem(lru_map_fd, &key, value)); assert(value[0] == 4321); assert(!bpf_map_update_elem(expected_map_fd, &key, value, - BPF_NOEXIST)); + BPF_NOEXIST)); } value[0] = 1234; @@ -361,7 +347,7 @@ static void test_lru_sanity2(int map_type, int map_flags, unsigned int tgt_free) assert(!bpf_map_update_elem(lru_map_fd, &key, value, BPF_NOEXIST)); assert(!bpf_map_update_elem(expected_map_fd, &key, value, - BPF_NOEXIST)); + BPF_NOEXIST)); } assert(map_equal(lru_map_fd, expected_map_fd)); @@ -419,7 +405,7 @@ static void test_lru_sanity3(int map_type, int map_flags, unsigned int tgt_free) for (key = 1; key < end_key; key++) { assert(!bpf_map_lookup_elem(lru_map_fd, &key, value)); assert(!bpf_map_update_elem(expected_map_fd, &key, value, - BPF_NOEXIST)); + BPF_NOEXIST)); } /* Add 1+2*tgt_free to tgt_free*5/2 @@ -431,7 +417,7 @@ static void test_lru_sanity3(int map_type, int map_flags, unsigned int tgt_free) assert(!bpf_map_update_elem(lru_map_fd, &key, value, BPF_NOEXIST)); assert(!bpf_map_update_elem(expected_map_fd, &key, value, - BPF_NOEXIST)); + BPF_NOEXIST)); } assert(map_equal(lru_map_fd, expected_map_fd)); @@ -491,7 +477,7 @@ static v
Re: [PATCH net-next v2 4/6] vxlan: check valid combinations of address scopes
On 04/14/2017 07:44 PM, Matthias Schiffer wrote: * Multicast addresses are never valid as local address * Link-local IPv6 unicast addresses may only be used as remote when the local address is link-local as well * Don't allow link-local IPv6 local/remote addresses without interface We also store in the flags field if link-local addresses are used for the follow-up patches that actually make VXLAN over link-local IPv6 work. Signed-off-by: Matthias Schiffer --- v2: was "vxlan: don't allow link-local IPv6 local/remote addresses without interface" before. v2 does a lot more checks and adds the VXLAN_F_IPV6_LINKLOCAL flag. drivers/net/vxlan.c | 35 +++ include/net/vxlan.h | 2 ++ 2 files changed, 37 insertions(+) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 07f89b037681..95a71546e8f2 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2881,11 +2881,39 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf, if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family) return -EINVAL; + if (vxlan_addr_multicast(&conf->saddr)) + return -EINVAL; + if (conf->saddr.sa.sa_family == AF_INET6) { if (!IS_ENABLED(CONFIG_IPV6)) return -EPFNOSUPPORT; use_ipv6 = true; conf->flags |= VXLAN_F_IPV6; + + if (!(conf->flags & VXLAN_F_COLLECT_METADATA)) { + int local_type = + ipv6_addr_type(&conf->saddr.sin6.sin6_addr); + int remote_type = + ipv6_addr_type(&conf->remote_ip.sin6.sin6_addr); + + if (local_type & IPV6_ADDR_LINKLOCAL) { + if (!(remote_type & IPV6_ADDR_LINKLOCAL) && + (remote_type != IPV6_ADDR_ANY)) { + pr_info("invalid combination of address scopes\n"); Maybe pr_err()? + return -EINVAL; + } + + conf->flags |= VXLAN_F_IPV6_LINKLOCAL; + } else { + if (remote_type == + (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL)) { + pr_info("invalid combination of address scopes\n"); Here as well... + return -EINVAL; + } + + conf->flags &= ~VXLAN_F_IPV6_LINKLOCAL; + } + } } if (conf->label && !use_ipv6) { [...] MBR, Sergei
Re: [Intel-wired-lan] [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
On Wed, Mar 29, 2017 at 5:22 PM, Sridhar Samudrala wrote: > In switchdev mode, broadcasts from VFs are received by the PF and passed > to corresponding port representor netdev. > Any frames sent via port netdevs are sent as directed transmits to the > corresponding VFs. To enable directed transmit, skb metadata dst is used > to pass the port id and the frame is requeued to call the PFs transmit > routine. VF id is used as port id for VFs and PF port id is defined as > I40_MAIN_VSI_PORT_ID. > > Small script to demonstrate inter VF and PF to VF pings in switchdev mode. > PF: p4p1, VFs: p4p1_0,p4p1_1 VF Port Reps:p4p1-vf0, p4p1-vf1 > PF Port rep: p4p1-pf > > # rmmod i40e; modprobe i40e > # devlink dev eswitch set pci/:05:00.0 mode switchdev > # echo 2 > /sys/class/net/p4p1/device/sriov_numvfs > # ip link set p4p1 vf 0 mac 00:11:22:33:44:55 > # ip link set p4p1 vf 1 mac 00:11:22:33:44:56 > # rmmod i40evf; modprobe i40evf > > /* Create 2 namespaces and move the VFs to the corresponding ns */ > # ip netns add ns0 > # ip link set p4p1_0 netns ns0 > # ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0 > # ip netns exec ns0 ip link set p4p1_0 up > # ip netns add ns1 > # ip link set p4p1_1 netns ns1 > # ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1 > # ip netns exec ns1 ip link set p4p1_1 up > > /* bring up pf and port netdevs */ > # ip addr add 192.168.1.1/24 dev p4p1 > # ip link set p4p1 up > # ip link set p4p1-vf0 up > # ip link set p4p1-vf1 up > # ip link set p4p1-pf up > > # ip netns exec ns0 ping -c3 192.168.1.11 /* VF0 -> VF1 */ > # ip netns exec ns1 ping -c3 192.168.1.10 /* VF1 -> VF0 */ > # ping -c3 192.168.1.10 /* PF -> VF0 */ > # ping -c3 192.168.1.11 /* PF -> VF1 */ > > /* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */ > # ip netns exec ns0 ping -c1 -W1 192.168.1.200 > /* VF1 -> IP in same subnet - broadcasts will be seen on p4p1-vf1 & p4p1*/ > # ip netns exec ns0 ping -c1 -W1 192.168.1.200 > /* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */ > # ping -I p4p1-vf0 -c1 -W1 192.168.1.200 > /* port rep VF1 -> IP in same subnet - broadcasts will be seen on p4p1_1 */ > # ping -I p4p1-vf1 -c1 -W1 192.168.1.200 > > Signed-off-by: Sridhar Samudrala > --- > drivers/net/ethernet/intel/i40e/i40e.h | 4 + > drivers/net/ethernet/intel/i40e/i40e_main.c| 27 +++- > drivers/net/ethernet/intel/i40e/i40e_txrx.c| 148 > - > drivers/net/ethernet/intel/i40e/i40e_txrx.h| 2 + > drivers/net/ethernet/intel/i40e/i40e_type.h| 3 + > drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 8 +- > 6 files changed, 184 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c > b/drivers/net/ethernet/intel/i40e/i40e_txrx.c > index ebffca0..86d2510 100644 > --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c > +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c > @@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring > *rx_ring, > } > > /** > + * i40e_handle_lpbk_skb - Update skb->dev of a loopback frame > + * @rx_ring: rx ring in play > + * @skb: packet to send up > + **/ > +static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff > *skb) > +{ > + struct i40e_q_vector *q_vector = rx_ring->q_vector; > + struct i40e_pf *pf = rx_ring->vsi->back; > + struct sk_buff *nskb; > + struct i40e_vf *vf; > + struct ethhdr *eth; > + int vf_id; > + > + if ((skb->pkt_type != PACKET_BROADCAST) && > + (skb->pkt_type != PACKET_MULTICAST) && > + (skb->pkt_type != PACKET_OTHERHOST)) > + return; > + > + eth = (struct ethhdr *)skb_mac_header(skb); > + > + /* If a loopback packet is received in switchdev mode, clone the skb > +* and pass it to the corresponding port netdev based on the source > MAC. > +*/ > + for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) { > + vf = &pf->vf[vf_id]; > + if (ether_addr_equal(eth->h_source, > +vf->default_lan_addr.addr)) { > + nskb = skb_clone(skb, GFP_ATOMIC); > + if (!nskb) > + break; > + nskb->offload_fwd_mark = 1; So this line is causing build errors when switchdev is not enabled. This whole function should probably be wrapped in a check to see if switchdev support is enabled or not. > + nskb->dev = vf->port_netdev; > + napi_gro_receive(&q_vector->napi, nskb); > + break; > + } > + } > +} > + > +/**
[PATCH net-next v2 3/6] vxlan: improve validation of address family configuration
Address families of source and destination addresses must match, and changelink operations can't change the address family. In addition, always use the VXLAN_F_IPV6 to check if a VXLAN device uses IPv4 or IPv6. Signed-off-by: Matthias Schiffer --- v2: new patch drivers/net/vxlan.c | 39 --- 1 file changed, 28 insertions(+), 11 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index bdd19e5037b0..07f89b037681 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2459,10 +2459,7 @@ static int vxlan_change_mtu(struct net_device *dev, int new_mtu) struct vxlan_rdst *dst = &vxlan->default_dst; struct net_device *lowerdev = __dev_get_by_index(vxlan->net, dst->remote_ifindex); - bool use_ipv6 = false; - - if (dst->remote_ip.sa.sa_family == AF_INET6) - use_ipv6 = true; + bool use_ipv6 = !!(vxlan->cfg.flags & VXLAN_F_IPV6); /* This check is different than dev->max_mtu, because it looks at * the lowerdev->mtu, rather than the static dev->max_mtu @@ -2871,11 +2868,20 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf, } } - if (!conf->remote_ip.sa.sa_family) + if (!conf->remote_ip.sa.sa_family && !conf->saddr.sa.sa_family) { + /* Unless IPv6 is explicitly requested, assume IPv4 */ conf->remote_ip.sa.sa_family = AF_INET; + conf->saddr.sa.sa_family = AF_INET; + } else if (!conf->remote_ip.sa.sa_family) { + conf->remote_ip.sa.sa_family = conf->saddr.sa.sa_family; + } else if (!conf->saddr.sa.sa_family) { + conf->saddr.sa.sa_family = conf->remote_ip.sa.sa_family; + } + + if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family) + return -EINVAL; - if (conf->remote_ip.sa.sa_family == AF_INET6 || - conf->saddr.sa.sa_family == AF_INET6) { + if (conf->saddr.sa.sa_family == AF_INET6) { if (!IS_ENABLED(CONFIG_IPV6)) return -EPFNOSUPPORT; use_ipv6 = true; @@ -2932,11 +2938,9 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf, continue; if (tmp->cfg.vni == conf->vni && - (tmp->default_dst.remote_ip.sa.sa_family == AF_INET6 || -tmp->cfg.saddr.sa.sa_family == AF_INET6) == use_ipv6 && tmp->cfg.dst_port == conf->dst_port && - (tmp->cfg.flags & VXLAN_F_RCV_FLAGS) == - (conf->flags & VXLAN_F_RCV_FLAGS)) { + (tmp->cfg.flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6)) == + (conf->flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6))) { pr_info("duplicate VNI %u\n", be32_to_cpu(conf->vni)); return -EEXIST; } @@ -3069,22 +3073,35 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[], } if (data[IFLA_VXLAN_GROUP]) { + if (changelink && (conf->remote_ip.sa.sa_family != AF_INET)) + return -EOPNOTSUPP; + conf->remote_ip.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_GROUP]); + conf->remote_ip.sa.sa_family = AF_INET; } else if (data[IFLA_VXLAN_GROUP6]) { if (!IS_ENABLED(CONFIG_IPV6)) return -EPFNOSUPPORT; + if (changelink && (conf->remote_ip.sa.sa_family != AF_INET6)) + return -EOPNOTSUPP; + conf->remote_ip.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]); conf->remote_ip.sa.sa_family = AF_INET6; } if (data[IFLA_VXLAN_LOCAL]) { + if (changelink && (conf->saddr.sa.sa_family != AF_INET)) + return -EOPNOTSUPP; + conf->saddr.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_LOCAL]); conf->saddr.sa.sa_family = AF_INET; } else if (data[IFLA_VXLAN_LOCAL6]) { if (!IS_ENABLED(CONFIG_IPV6)) return -EPFNOSUPPORT; + if (changelink && (conf->saddr.sa.sa_family != AF_INET6)) + return -EOPNOTSUPP; + /* TODO: respect scope id */ conf->saddr.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]); conf->saddr.sa.sa_family = AF_INET6; -- 2.12.2
[PATCH net-next v2 2/6] vxlan: get rid of redundant vxlan_dev.flags
There is no good reason to keep the flags twice in vxlan_dev and vxlan_config. Signed-off-by: Matthias Schiffer --- v2: new patch drivers/net/vxlan.c | 76 +-- include/net/vxlan.h | 1 - net/openvswitch/vport-vxlan.c | 4 +-- 3 files changed, 39 insertions(+), 42 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 86471e961708..bdd19e5037b0 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -303,7 +303,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan, if (rdst->remote_vni != vxlan->default_dst.remote_vni && nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni))) goto nla_put_failure; - if ((vxlan->flags & VXLAN_F_COLLECT_METADATA) && fdb->vni && + if ((vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) && fdb->vni && nla_put_u32(skb, NDA_SRC_VNI, be32_to_cpu(fdb->vni))) goto nla_put_failure; @@ -417,7 +417,7 @@ static u32 eth_vni_hash(const unsigned char *addr, __be32 vni) static inline struct hlist_head *vxlan_fdb_head(struct vxlan_dev *vxlan, const u8 *mac, __be32 vni) { - if (vxlan->flags & VXLAN_F_COLLECT_METADATA) + if (vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) return &vxlan->fdb_head[eth_vni_hash(mac, vni)]; else return &vxlan->fdb_head[eth_hash(mac)]; @@ -432,7 +432,7 @@ static struct vxlan_fdb *__vxlan_find_mac(struct vxlan_dev *vxlan, hlist_for_each_entry_rcu(f, head, hlist) { if (ether_addr_equal(mac, f->eth_addr)) { - if (vxlan->flags & VXLAN_F_COLLECT_METADATA) { + if (vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) { if (vni == f->vni) return f; } else { @@ -1266,7 +1266,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan, #endif } - if ((vxlan->flags & VXLAN_F_LEARN) && + if ((vxlan->cfg.flags & VXLAN_F_LEARN) && vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, vni)) return false; @@ -1489,7 +1489,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni) if (netif_rx_ni(reply) == NET_RX_DROP) dev->stats.rx_dropped++; - } else if (vxlan->flags & VXLAN_F_L3MISS) { + } else if (vxlan->cfg.flags & VXLAN_F_L3MISS) { union vxlan_addr ipa = { .sin.sin_addr.s_addr = tip, .sin.sin_family = AF_INET, @@ -1649,7 +1649,7 @@ static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni) if (netif_rx_ni(reply) == NET_RX_DROP) dev->stats.rx_dropped++; - } else if (vxlan->flags & VXLAN_F_L3MISS) { + } else if (vxlan->cfg.flags & VXLAN_F_L3MISS) { union vxlan_addr ipa = { .sin6.sin6_addr = msg->target, .sin6.sin6_family = AF_INET6, @@ -1682,7 +1682,7 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; pip = ip_hdr(skb); n = neigh_lookup(&arp_tbl, &pip->daddr, dev); - if (!n && (vxlan->flags & VXLAN_F_L3MISS)) { + if (!n && (vxlan->cfg.flags & VXLAN_F_L3MISS)) { union vxlan_addr ipa = { .sin.sin_addr.s_addr = pip->daddr, .sin.sin_family = AF_INET, @@ -1703,7 +1703,7 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; pip6 = ipv6_hdr(skb); n = neigh_lookup(ipv6_stub->nd_tbl, &pip6->daddr, dev); - if (!n && (vxlan->flags & VXLAN_F_L3MISS)) { + if (!n && (vxlan->cfg.flags & VXLAN_F_L3MISS)) { union vxlan_addr ipa = { .sin6.sin6_addr = pip6->daddr, .sin6.sin6_family = AF_INET6, @@ -1977,7 +1977,7 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, #endif } - if (dst_vxlan->flags & VXLAN_F_LEARN) + if (dst_vxlan->cfg.flags & VXLAN_F_LEARN) vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source, vni); u64_stats_update_begin(&tx_stats->syncp); @@ -2015,7 +2015,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev, dst_release(dst); dst_vxlan = vxlan_find_vni(vxlan->net, vni, daddr->sa.sa_family, dst_port, - vxlan->flags); +
[PATCH net-next v2 0/6] vxlan: cleanup and IPv6 link-local support
Running VXLANs over IPv6 link-local addresses allows to use them as a drop-in replacement for VLANs, avoiding to allocate additional outer IP addresses to run the VXLAN over. Since v1, I have added a lot more consistency checks to the address configuration, making sure address families and scopes match. To simplify the implementation, I also did some general refactoring of the configuration handling in the new first patch of the series. The second patch is more cleanup; is slightly touches OVS code, so that list is in CC this time, too. As in v1, the last two patches actually make VXLAN over IPv6 link-local work, and allow multiple VXLANs wit the same VNI and port, as long as link-local addresses on different interfaces are used. As suggested, I now store in the flags field if the VXLAN uses link-local addresses or not. Matthias Schiffer (6): vxlan: refactor verification and application of configuration vxlan: get rid of redundant vxlan_dev.flags vxlan: improve validation of address family configuration vxlan: check valid combinations of address scopes vxlan: fix snooping for link-local IPv6 addresses vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses drivers/net/vxlan.c | 411 ++ include/net/vxlan.h | 3 +- net/openvswitch/vport-vxlan.c | 4 +- 3 files changed, 263 insertions(+), 155 deletions(-) -- 2.12.2
[PATCH net-next v2 4/6] vxlan: check valid combinations of address scopes
* Multicast addresses are never valid as local address * Link-local IPv6 unicast addresses may only be used as remote when the local address is link-local as well * Don't allow link-local IPv6 local/remote addresses without interface We also store in the flags field if link-local addresses are used for the follow-up patches that actually make VXLAN over link-local IPv6 work. Signed-off-by: Matthias Schiffer --- v2: was "vxlan: don't allow link-local IPv6 local/remote addresses without interface" before. v2 does a lot more checks and adds the VXLAN_F_IPV6_LINKLOCAL flag. drivers/net/vxlan.c | 35 +++ include/net/vxlan.h | 2 ++ 2 files changed, 37 insertions(+) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 07f89b037681..95a71546e8f2 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2881,11 +2881,39 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf, if (conf->saddr.sa.sa_family != conf->remote_ip.sa.sa_family) return -EINVAL; + if (vxlan_addr_multicast(&conf->saddr)) + return -EINVAL; + if (conf->saddr.sa.sa_family == AF_INET6) { if (!IS_ENABLED(CONFIG_IPV6)) return -EPFNOSUPPORT; use_ipv6 = true; conf->flags |= VXLAN_F_IPV6; + + if (!(conf->flags & VXLAN_F_COLLECT_METADATA)) { + int local_type = + ipv6_addr_type(&conf->saddr.sin6.sin6_addr); + int remote_type = + ipv6_addr_type(&conf->remote_ip.sin6.sin6_addr); + + if (local_type & IPV6_ADDR_LINKLOCAL) { + if (!(remote_type & IPV6_ADDR_LINKLOCAL) && + (remote_type != IPV6_ADDR_ANY)) { + pr_info("invalid combination of address scopes\n"); + return -EINVAL; + } + + conf->flags |= VXLAN_F_IPV6_LINKLOCAL; + } else { + if (remote_type == + (IPV6_ADDR_UNICAST | IPV6_ADDR_LINKLOCAL)) { + pr_info("invalid combination of address scopes\n"); + return -EINVAL; + } + + conf->flags &= ~VXLAN_F_IPV6_LINKLOCAL; + } + } } if (conf->label && !use_ipv6) { @@ -2920,6 +2948,13 @@ static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf, return -EINVAL; } +#if IS_ENABLED(CONFIG_IPV6) + if (conf->flags & VXLAN_F_IPV6_LINKLOCAL) { + pr_info("link-local local/remote addresses require interface to be specified\n"); + return -EINVAL; + } +#endif + *lower = NULL; } diff --git a/include/net/vxlan.h b/include/net/vxlan.h index 479bb75789ea..b816a0a6686e 100644 --- a/include/net/vxlan.h +++ b/include/net/vxlan.h @@ -258,6 +258,7 @@ struct vxlan_dev { #define VXLAN_F_REMCSUM_NOPARTIAL 0x1000 #define VXLAN_F_COLLECT_METADATA 0x2000 #define VXLAN_F_GPE0x4000 +#define VXLAN_F_IPV6_LINKLOCAL 0x8000 /* Flags that are used in the receive path. These flags must match in * order for a socket to be shareable @@ -272,6 +273,7 @@ struct vxlan_dev { /* Flags that can be set together with VXLAN_F_GPE. */ #define VXLAN_F_ALLOWED_GPE(VXLAN_F_GPE | \ VXLAN_F_IPV6 | \ +VXLAN_F_IPV6_LINKLOCAL | \ VXLAN_F_UDP_ZERO_CSUM_TX | \ VXLAN_F_UDP_ZERO_CSUM6_TX |\ VXLAN_F_UDP_ZERO_CSUM6_RX |\ -- 2.12.2
[PATCH net-next v2 6/6] vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses
As link-local addresses are only valid for a single interface, we can allow to use the same VNI for multiple independent VXLANs, as long as the used interfaces are distinct. This way, VXLANs can always be used as a drop-in replacement for VLANs with greater ID space. This also extends VNI lookup to respect the ifindex when link-local IPv6 addresses are used, so using the same VNI on multiple interfaces can actually work. Signed-off-by: Matthias Schiffer --- v2: use VXLAN_F_IPV6_LINKLOCAL in vxlan_vs_find_vni; rebase. drivers/net/vxlan.c | 73 ++--- 1 file changed, 47 insertions(+), 26 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 2995b57a1551..9054245f0770 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -224,7 +224,8 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family, return NULL; } -static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, __be32 vni) +static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, int ifindex, + __be32 vni) { struct vxlan_dev *vxlan; @@ -233,17 +234,27 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, __be32 vni) vni = 0; hlist_for_each_entry_rcu(vxlan, vni_head(vs, vni), hlist) { - if (vxlan->default_dst.remote_vni == vni) - return vxlan; + if (vxlan->default_dst.remote_vni != vni) + continue; + + if (IS_ENABLED(CONFIG_IPV6)) { + const struct vxlan_config *cfg = &vxlan->cfg; + + if ((cfg->flags & VXLAN_F_IPV6_LINKLOCAL) && + cfg->remote_ifindex != ifindex) + continue; + } + + return vxlan; } return NULL; } /* Look up VNI in a per net namespace table */ -static struct vxlan_dev *vxlan_find_vni(struct net *net, __be32 vni, - sa_family_t family, __be16 port, - u32 flags) +static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex, + __be32 vni, sa_family_t family, + __be16 port, u32 flags) { struct vxlan_sock *vs; @@ -251,7 +262,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, __be32 vni, if (!vs) return NULL; - return vxlan_vs_find_vni(vs, vni); + return vxlan_vs_find_vni(vs, ifindex, vni); } /* Fill in neighbour message in skbuff. */ @@ -1342,7 +1353,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb) vni = vxlan_vni(vxlan_hdr(skb)->vx_vni); - vxlan = vxlan_vs_find_vni(vs, vni); + vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni); if (!vxlan) goto drop; @@ -2006,8 +2017,10 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, } static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev, -struct vxlan_dev *vxlan, union vxlan_addr *daddr, -__be16 dst_port, __be32 vni, struct dst_entry *dst, +struct vxlan_dev *vxlan, +union vxlan_addr *daddr, +__be16 dst_port, int dst_ifindex, __be32 vni, +struct dst_entry *dst, u32 rt_flags) { #if IS_ENABLED(CONFIG_IPV6) @@ -2023,7 +2036,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev, struct vxlan_dev *dst_vxlan; dst_release(dst); - dst_vxlan = vxlan_find_vni(vxlan->net, vni, + dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni, daddr->sa.sa_family, dst_port, vxlan->cfg.flags); if (!dst_vxlan) { @@ -2055,6 +2068,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, struct dst_entry *ndst = NULL; __be32 vni, label; __u8 tos, ttl; + int ifindex; int err; u32 flags = vxlan->cfg.flags; bool udp_sum = false; @@ -2075,6 +2089,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port; vni = (rdst->remote_vni) ? : default_vni; + ifindex = rdst->remote_ifindex; local_ip = vxlan->cfg.saddr; dst_cache = &rdst->dst_cache; md->gbp = skb->mark; @@ -2108,6 +2123,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
[PATCH net-next v2 5/6] vxlan: fix snooping for link-local IPv6 addresses
If VXLAN is run over link-local IPv6 addresses, it is necessary to store the ifindex in the FDB entries. Otherwise, the used interface is undefined and unicast communication will most likely fail. Signed-off-by: Matthias Schiffer --- v2: use u32 instead of __u32 drivers/net/vxlan.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 95a71546e8f2..2995b57a1551 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -941,16 +941,24 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, */ static bool vxlan_snoop(struct net_device *dev, union vxlan_addr *src_ip, const u8 *src_mac, - __be32 vni) + u32 src_ifindex, __be32 vni) { struct vxlan_dev *vxlan = netdev_priv(dev); struct vxlan_fdb *f; + u32 ifindex = 0; + +#if IS_ENABLED(CONFIG_IPV6) + if (src_ip->sa.sa_family == AF_INET6 && + (ipv6_addr_type(&src_ip->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL)) + ifindex = src_ifindex; +#endif f = vxlan_find_mac(vxlan, src_mac, vni); if (likely(f)) { struct vxlan_rdst *rdst = first_remote_rcu(f); - if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip))) + if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip) && + rdst->remote_ifindex == ifindex)) return false; /* Don't migrate static entries, drop packets */ @@ -977,7 +985,7 @@ static bool vxlan_snoop(struct net_device *dev, vxlan->cfg.dst_port, vni, vxlan->default_dst.remote_vni, -0, NTF_SELF); +ifindex, NTF_SELF); spin_unlock(&vxlan->hash_lock); } @@ -1246,6 +1254,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan, struct sk_buff *skb, __be32 vni) { union vxlan_addr saddr; + u32 ifindex = skb->dev->ifindex; skb_reset_mac_header(skb); skb->protocol = eth_type_trans(skb, vxlan->dev); @@ -1267,7 +1276,7 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan, } if ((vxlan->cfg.flags & VXLAN_F_LEARN) && - vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, vni)) + vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source, ifindex, vni)) return false; return true; @@ -1978,7 +1987,8 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, } if (dst_vxlan->cfg.flags & VXLAN_F_LEARN) - vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source, vni); + vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source, 0, + vni); u64_stats_update_begin(&tx_stats->syncp); tx_stats->tx_packets++; -- 2.12.2
[PATCH net-next v2 1/6] vxlan: refactor verification and application of configuration
The vxlan_dev_configure function was mixing validation and application of the vxlan configuration; this could easily lead to bugs with the changelink operation, as it was hard to see if the function wcould return an error after parts of the configuration had already been applied. This commit splits validation and application out of vxlan_dev_configure as separate functions to make it clearer where error returns are allowed and where the vxlan_dev or net_device may be configured. In addition, some validation is moved to vxlan_validate and some initialization to vxlan_setup where this improves grouping of similar settings. Finally, this also fixes two actual bugs: * if set, conf->mtu would overwrite dev->mtu in each changelink operation, reverting other changes of dev->mtu * the "if (!conf->dst_port)" branch would never be run, as conf->dst_port was set in vxlan_setup before. This caused VXLAN-GPE to use the same default port as other VXLAN sockets instead of the intended IANA-assigned 4790. Signed-off-by: Matthias Schiffer --- v2: new patch drivers/net/vxlan.c | 202 +--- 1 file changed, 114 insertions(+), 88 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index ebc98bb17a51..86471e961708 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2598,6 +2598,10 @@ static void vxlan_setup(struct net_device *dev) netif_keep_dst(dev); dev->priv_flags |= IFF_NO_QUEUE; + /* MTU range: 68 - 65535 */ + dev->min_mtu = ETH_MIN_MTU; + dev->max_mtu = ETH_MAX_MTU; + INIT_LIST_HEAD(&vxlan->next); spin_lock_init(&vxlan->hash_lock); @@ -2605,9 +2609,8 @@ static void vxlan_setup(struct net_device *dev) vxlan->age_timer.function = vxlan_cleanup; vxlan->age_timer.data = (unsigned long) vxlan; - vxlan->cfg.dst_port = htons(vxlan_port); - vxlan->dev = dev; + vxlan->net = dev_net(dev); gro_cells_init(&vxlan->gro_cells, dev); @@ -2676,11 +2679,19 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[]) } } + if (tb[IFLA_MTU]) { + u32 mtu = nla_get_u32(data[IFLA_MTU]); + + if (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU) + return -EINVAL; + } + if (!data) return -EINVAL; if (data[IFLA_VXLAN_ID]) { - __u32 id = nla_get_u32(data[IFLA_VXLAN_ID]); + u32 id = nla_get_u32(data[IFLA_VXLAN_ID]); + if (id >= VXLAN_N_VID) return -ERANGE; } @@ -2839,55 +2850,36 @@ static int vxlan_sock_add(struct vxlan_dev *vxlan) return ret; } -static int vxlan_dev_configure(struct net *src_net, struct net_device *dev, - struct vxlan_config *conf, - bool changelink) +static int vxlan_config_validate(struct net *src_net, struct vxlan_config *conf, +struct net_device **lower, +struct vxlan_dev *old) { struct vxlan_net *vn = net_generic(src_net, vxlan_net_id); - struct vxlan_dev *vxlan = netdev_priv(dev), *tmp; - struct vxlan_rdst *dst = &vxlan->default_dst; - unsigned short needed_headroom = ETH_HLEN; + struct vxlan_dev *tmp; bool use_ipv6 = false; - __be16 default_port = vxlan->cfg.dst_port; - struct net_device *lowerdev = NULL; - if (!changelink) { - if (conf->flags & VXLAN_F_GPE) { - /* For now, allow GPE only together with -* COLLECT_METADATA. This can be relaxed later; in such -* case, the other side of the PtP link will have to be -* provided. -*/ - if ((conf->flags & ~VXLAN_F_ALLOWED_GPE) || - !(conf->flags & VXLAN_F_COLLECT_METADATA)) { - pr_info("unsupported combination of extensions\n"); - return -EINVAL; - } - vxlan_raw_setup(dev); - } else { - vxlan_ether_setup(dev); + if (conf->flags & VXLAN_F_GPE) { + /* For now, allow GPE only together with +* COLLECT_METADATA. This can be relaxed later; in such +* case, the other side of the PtP link will have to be +* provided. +*/ + if ((conf->flags & ~VXLAN_F_ALLOWED_GPE) || + !(conf->flags & VXLAN_F_COLLECT_METADATA)) { + pr_info("unsupported combination of extensions\n"); + return -EINVAL; } - - /* MTU range: 68 - 65535 */ - dev->min_mtu = ETH_MIN_MTU; - dev->max_mtu = ETH_MAX_MTU; -
Re: How to debug DMAR errors?
On 04/14/2017 09:24 AM, Alexander Duyck wrote: On Fri, Apr 14, 2017 at 9:19 AM, Ben Greear wrote: On 04/14/2017 08:45 AM, Alexander Duyck wrote: On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear wrote: Hello, I have been seeing a regular occurrence of DMAR errors, looking something like this when testing my ath10k driver/firmware under some specific loads (maximum receive of 512 byte frames in AP mode): DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault reason 06] PTE Read access is not set ath10k_pci :05:00.0: firmware crashed! (uuid 594b1393-ae35-42b5-9dec-74ff0c6791ff) So, I am wondering if there is any way I can get more information about what this fd99f000 address is? Once this problem hits, the entire OS locks hard (not even sysrq-boot will do anything), so I guess I would need the DMAR logic to print out more info on that address somehow. Thanks, Ben There isn't much more info to give you. The problem is that the device at 5:00.0 attempted to read at fd99f000 even though it didn't have permissions. In response this should trigger a PCI Master Abort message to that function. It looks like the firmware for the device doesn't handle that and so that is likely why things got hung. Really you would need to interrogate the ath10k_pci to see if there is/was a mapping somewhere for that address and what it was supposed to be used for. I'm working on a hook in DMAR logic to call into ath10k_pci when the error is seen, so the ath10k can dump debug info, including recent DMA addresses. My code is an awful hack so far, but if someone could add a clean way to register DMAR error callbacks, I think that would be very welcome. It might could tie into automated dma map/unmap debugging logic, and at the least, someone could write custom debugging callbacks for the driver(s) in question. Thanks, Ben You might look at coding up something to add pci_error_handlers for the pci_driver in the ath10k_pci driver. The PCI Master Abort should trigger an error that you could then capture in the driver and handle at least dumping it via your own implementation of the error handlers. If nothing else I suspect there are probably some sort of descriptor rings you could probably dump. I'm suspecting this is some sort of Tx issue since the problem was a read fault, but I suppose there are other paths in the driver that might trigger DMA read requests. This is a thick firmware driver, so the firmware could also be screwing up and accessing something it should not. There are some existing work-arounds in it to deal with sketchy behaviour already, maybe more are needed. Anyway, once I added the debugging code, I didn't see it crash again, so might be a while before I know more. Thanks, Ben - Alex -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Thu, Apr 13, 2017 at 9:09 AM, David Miller wrote: > [snip] > +static u32 netif_receive_generic_xdp(struct sk_buff *skb, > +struct bpf_prog *xdp_prog) > +{ > + struct xdp_buff xdp; > + u32 act = XDP_DROP; > + void *orig_data; > + int hlen, off; > + > + if (skb_linearize(skb)) > + goto do_drop; > + > + /* The XDP program wants to see the packet starting at the MAC > +* header. > +*/ > + hlen = skb_headlen(skb) + skb->mac_len; > + xdp.data = skb->data - skb->mac_len; > + xdp.data_end = xdp.data + hlen; > + xdp.data_hard_start = xdp.data - skb_headroom(skb); > + orig_data = xdp.data; > + > + act = bpf_prog_run_xdp(xdp_prog, &xdp); > + > + off = xdp.data - orig_data; should we do "orig_data - xdp.data" instead? The 'off' might be < 0, when pushing new header. > + if (off) > + __skb_push(skb, off); When doing encapsulation by calling xdp_adjust_head(skb, offset), the offset is < 0 in order to make extra room in the front ; so xdp.data < orig_data, off < 0. But if we are decapsulating protocol, then off > 0, and maybe we should call __skb_pull()? Thanks, William
Re: How to debug DMAR errors?
On Fri, Apr 14, 2017 at 9:19 AM, Ben Greear wrote: > > > On 04/14/2017 08:45 AM, Alexander Duyck wrote: >> >> On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear >> wrote: >>> >>> Hello, >>> >>> I have been seeing a regular occurrence of DMAR errors, looking something >>> like this when testing my ath10k driver/firmware under some specific >>> loads >>> (maximum receive of 512 byte frames in AP mode): >>> >>> DMAR: DRHD: handling fault status reg 3 >>> DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault >>> reason >>> 06] PTE Read access is not set >>> ath10k_pci :05:00.0: firmware crashed! (uuid >>> 594b1393-ae35-42b5-9dec-74ff0c6791ff) >>> >>> So, I am wondering if there is any way I can get more information about >>> what >>> this fd99f000 address >>> is? >>> >>> Once this problem hits, the entire OS locks hard (not even sysrq-boot >>> will >>> do anything), >>> so I guess I would need the DMAR logic to print out more info on that >>> address somehow. >>> >>> Thanks, >>> Ben >> >> >> There isn't much more info to give you. The problem is that the device >> at 5:00.0 attempted to read at fd99f000 even though it didn't have >> permissions. In response this should trigger a PCI Master Abort >> message to that function. It looks like the firmware for the device >> doesn't handle that and so that is likely why things got hung. >> >> Really you would need to interrogate the ath10k_pci to see if there >> is/was a mapping somewhere for that address and what it was supposed >> to be used for. > > > I'm working on a hook in DMAR logic to call into ath10k_pci when the > error is seen, so the ath10k can dump debug info, including recent DMA > addresses. > > My code is an awful hack so far, but if someone could add a clean way to > register > DMAR error callbacks, I think that would be very welcome. It might could > tie into > automated dma map/unmap debugging logic, and at the least, someone could > write custom debugging callbacks > for the driver(s) in question. > > Thanks, > Ben > You might look at coding up something to add pci_error_handlers for the pci_driver in the ath10k_pci driver. The PCI Master Abort should trigger an error that you could then capture in the driver and handle at least dumping it via your own implementation of the error handlers. If nothing else I suspect there are probably some sort of descriptor rings you could probably dump. I'm suspecting this is some sort of Tx issue since the problem was a read fault, but I suppose there are other paths in the driver that might trigger DMA read requests. - Alex
Re: How to debug DMAR errors?
On 04/14/2017 08:45 AM, Alexander Duyck wrote: On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear wrote: Hello, I have been seeing a regular occurrence of DMAR errors, looking something like this when testing my ath10k driver/firmware under some specific loads (maximum receive of 512 byte frames in AP mode): DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault reason 06] PTE Read access is not set ath10k_pci :05:00.0: firmware crashed! (uuid 594b1393-ae35-42b5-9dec-74ff0c6791ff) So, I am wondering if there is any way I can get more information about what this fd99f000 address is? Once this problem hits, the entire OS locks hard (not even sysrq-boot will do anything), so I guess I would need the DMAR logic to print out more info on that address somehow. Thanks, Ben There isn't much more info to give you. The problem is that the device at 5:00.0 attempted to read at fd99f000 even though it didn't have permissions. In response this should trigger a PCI Master Abort message to that function. It looks like the firmware for the device doesn't handle that and so that is likely why things got hung. Really you would need to interrogate the ath10k_pci to see if there is/was a mapping somewhere for that address and what it was supposed to be used for. I'm working on a hook in DMAR logic to call into ath10k_pci when the error is seen, so the ath10k can dump debug info, including recent DMA addresses. My code is an awful hack so far, but if someone could add a clean way to register DMAR error callbacks, I think that would be very welcome. It might could tie into automated dma map/unmap debugging logic, and at the least, someone could write custom debugging callbacks for the driver(s) in question. Thanks, Ben - Alex -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 10/22] staging: unisys: visorbus: Make use of the new sg_map helper function
Great, thanks! Logan On 14/04/17 10:07 AM, Kershner, David A wrote: > Can you add Acked-by for this patch? > > Acked-by: David Kershner > > Tested on s-Par and no problems. > > Thanks, > David Kershner > >> --- >> drivers/staging/unisys/visorhba/visorhba_main.c | 12 +++- >> 1 file changed, 7 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/staging/unisys/visorhba/visorhba_main.c >> b/drivers/staging/unisys/visorhba/visorhba_main.c >> index 0ce92c8..2d8c8bc 100644 >> --- a/drivers/staging/unisys/visorhba/visorhba_main.c >> +++ b/drivers/staging/unisys/visorhba/visorhba_main.c >> @@ -842,7 +842,6 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct >> scsi_cmnd *scsicmd) >> struct scatterlist *sg; >> unsigned int i; >> char *this_page; >> -char *this_page_orig; >> int bufind = 0; >> struct visordisk_info *vdisk; >> struct visorhba_devdata *devdata; >> @@ -869,11 +868,14 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, >> struct scsi_cmnd *scsicmd) >> >> sg = scsi_sglist(scsicmd); >> for (i = 0; i < scsi_sg_count(scsicmd); i++) { >> -this_page_orig = kmap_atomic(sg_page(sg + i)); >> -this_page = (void *)((unsigned long)this_page_orig | >> - sg[i].offset); >> +this_page = sg_map(sg + i, SG_KMAP_ATOMIC); >> +if (IS_ERR(this_page)) { >> +scsicmd->result = DID_ERROR << 16; >> +return; >> +} >> + >> memcpy(this_page, buf + bufind, sg[i].length); >> -kunmap_atomic(this_page_orig); >> +sg_unmap(sg + i, this_page, SG_KMAP_ATOMIC); >> } >> } else { >> devdata = (struct visorhba_devdata *)scsidev->host- >>> hostdata; >> -- >> 2.1.4 >
Re: [PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation
On Thu, 13 Apr 2017 23:05:29 -0600 Subash Abhinov Kasiviswanathan wrote: > RmNet driver provides a transport agnostic MAP (multiplexing and > aggregation protocol) support in embedded module. Module provides > virtual network devices which can be attached to any IP-mode > physical device. This will be used to provide all MAP functionality > on future hardware in a single consistent location. > > Signed-off-by: Subash Abhinov Kasiviswanathan > > diff --git a/Documentation/networking/rmnet.txt > b/Documentation/networking/rmnet.txt > new file mode 100644 > index 000..58d3ea2 > --- /dev/null > +++ b/Documentation/networking/rmnet.txt > ... > +3. Userspace configuration > + > +rmnet userspace configuration is done through netlink library librmnetctl > +and command line utility rmnetcli. Utility is hosted in codeaurora forum git. > +The driver uses rtnl_link_ops for communication. > + > +https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/\ > +dataservices/tree/rmnetctl Don't split URL better to have long line. > diff --git a/drivers/net/Makefile b/drivers/net/Makefile > index 98ed4d9..29b3945 100644 > --- a/drivers/net/Makefile > +++ b/drivers/net/Makefile > @@ -74,3 +74,4 @@ obj-$(CONFIG_HYPERV_NET) += hyperv/ > obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o > > obj-$(CONFIG_FUJITSU_ES) += fjes/ > +obj-$(CONFIG_RMNET) += rmnet/ > diff --git a/drivers/net/rmnet/Kconfig b/drivers/net/rmnet/Kconfig Since this is Qualcomm and Ethernet specific, maybe better to put in drivers/net/ethernet/qualcom/rmnet > new file mode 100644 > index 000..63cd477 > --- /dev/null > +++ b/drivers/net/rmnet/Kconfig > @@ -0,0 +1,23 @@ > +# > +# RMNET MAP driver > +# > + > +menuconfig RMNET > + depends on NETDEVICES > + bool "RmNet MAP driver" > + default n > + ---help--- > + If you say Y here, then the rmnet module will be statically > + compiled into the kernel. The rmnet module provides MAP > + functionality for embedded and bridged traffic. > +if RMNET > + > +config RMNET_DEBUG > + bool "RmNet Debug Logging" > + default n > + ---help--- > + Say Y here if you want RmNet to be able to log packets in main > + system log. This should not be enabled on production builds as it can > + impact system performance. Note that simply enabling it here will not > + enable the logging; it must be enabled at run-time as well. Please use network device standard debug mechanism. netif_msg_XXX > +endif # RMNET > diff --git a/drivers/net/rmnet/Makefile b/drivers/net/rmnet/Makefile > new file mode 100644 > index 000..2b6c9cf > --- /dev/null > +++ b/drivers/net/rmnet/Makefile > @@ -0,0 +1,14 @@ > +# > +# Makefile for the RMNET module > +# > + > +rmnet-y := rmnet_main.o > +rmnet-y += rmnet_config.o > +rmnet-y += rmnet_vnd.o > +rmnet-y += rmnet_handlers.o > +rmnet-y += rmnet_map_data.o > +rmnet-y += rmnet_map_command.o > +rmnet-y += rmnet_stats.o > +obj-$(CONFIG_RMNET) += rmnet.o > + > +CFLAGS_rmnet_main.o := -I$(src) > diff --git a/drivers/net/rmnet/rmnet_config.c > b/drivers/net/rmnet/rmnet_config.c > new file mode 100644 > index 000..a4bc76b > --- /dev/null > +++ b/drivers/net/rmnet/rmnet_config.c > @@ -0,0 +1,592 @@ > +/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 and > + * only version 2 as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * RMNET configuration engine > + * > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include "rmnet_config.h" > +#include "rmnet_handlers.h" > +#include "rmnet_vnd.h" > +#include "rmnet_private.h" > + > +RMNET_LOG_MODULE(RMNET_LOGMASK_CONFIG); > + > +/* Local Definitions and Declarations */ > +#define RMNET_LOCAL_LOGICAL_ENDPOINT -1 > + > +/* _rmnet_is_physical_endpoint_associated() - Determines if device is > associated > + * @dev: Device to get check > + * > + * Compares device rx_handler callback pointer against known function > + */ > +static inline int _rmnet_is_physical_endpoint_associated(struct net_device > *dev) > +{ > + rx_handler_func_t *rx_handler; > + > + rx_handler = rcu_dereference(dev->rx_handler); > + > + if (rx_handler == rmnet_rx_handler) > + return 1; > + else > + return 0; > +} Could just be: static inline int _rmnet_is_physical_endpoint_associated(const struct net_device *dev) { rx_handler_func_t *rx_handler = rcu_deref
Re: [PATCH 04/22] target: Make use of the new sg_map function at 16 call sites (fwd)
Thanks Julia. I missed that and I'll fix it in my series. Logan On 14/04/17 09:19 AM, Julia Lawall wrote: > It looks like &udev->cmdr_lock should be released at line 512 if it has > not been released otherwise. The lock was taken at line 438. > > julia > > -- Forwarded message -- > Date: Fri, 14 Apr 2017 22:21:44 +0800 > From: kbuild test robot > To: kbu...@01.org > Cc: Julia Lawall > Subject: Re: [PATCH 04/22] target: Make use of the new sg_map function at 16 > call sites > > Hi Logan, > > [auto build test WARNING on scsi/for-next] > [also build test WARNING on v4.11-rc6] > [cannot apply to next-20170413] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Logan-Gunthorpe/Introduce-common-scatterlist-map-function/20170414-142518 > base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next > :: branch date: 8 hours ago > :: commit date: 8 hours ago > >>> drivers/target/target_core_user.c:512:2-8: preceding lock on line 438 >drivers/target/target_core_user.c:512:2-8: preceding lock on line 471 > > git remote add linux-review https://github.com/0day-ci/linux > git remote update linux-review > git checkout 78082134e7afdc59d744eb8d2def5c588e89c378 > vim +512 drivers/target/target_core_user.c > > 7c9e7a6f Andy Grover 2014-10-01 432 > sizeof(struct tcmu_cmd_entry)); > 7c9e7a6f Andy Grover 2014-10-01 433 command_size = base_command_size > 7c9e7a6f Andy Grover 2014-10-01 434 + > round_up(scsi_command_size(se_cmd->t_task_cdb), TCMU_OP_ALIGN_SIZE); > 7c9e7a6f Andy Grover 2014-10-01 435 > 7c9e7a6f Andy Grover 2014-10-01 436 WARN_ON(command_size & > (TCMU_OP_ALIGN_SIZE-1)); > 7c9e7a6f Andy Grover 2014-10-01 437 > 7c9e7a6f Andy Grover 2014-10-01 @438 spin_lock_irq(&udev->cmdr_lock); > 7c9e7a6f Andy Grover 2014-10-01 439 > 7c9e7a6f Andy Grover 2014-10-01 440 mb = udev->mb_addr; > 7c9e7a6f Andy Grover 2014-10-01 441 cmd_head = mb->cmd_head % > udev->cmdr_size; /* UAM */ > 26418649 Sheng Yang 2016-02-26 442 data_length = > se_cmd->data_length; > 26418649 Sheng Yang 2016-02-26 443 if (se_cmd->se_cmd_flags & > SCF_BIDI) { > 26418649 Sheng Yang 2016-02-26 444 > BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents)); > 26418649 Sheng Yang 2016-02-26 445 data_length += > se_cmd->t_bidi_data_sg->length; > 26418649 Sheng Yang 2016-02-26 446 } > 554617b2 Andy Grover 2016-08-25 447 if ((command_size > > (udev->cmdr_size / 2)) || > 554617b2 Andy Grover 2016-08-25 448 data_length > > udev->data_size) { > 554617b2 Andy Grover 2016-08-25 449 pr_warn("TCMU: Request > of size %zu/%zu is too big for %u/%zu " > 3d9b9555 Andy Grover 2016-08-25 450 "cmd ring/data > area\n", command_size, data_length, > 7c9e7a6f Andy Grover 2014-10-01 451 > udev->cmdr_size, udev->data_size); > 554617b2 Andy Grover 2016-08-25 452 > spin_unlock_irq(&udev->cmdr_lock); > 554617b2 Andy Grover 2016-08-25 453 return > TCM_INVALID_CDB_FIELD; > 554617b2 Andy Grover 2016-08-25 454 } > 7c9e7a6f Andy Grover 2014-10-01 455 > 26418649 Sheng Yang 2016-02-26 456 while > (!is_ring_space_avail(udev, command_size, data_length)) { > 7c9e7a6f Andy Grover 2014-10-01 457 int ret; > 7c9e7a6f Andy Grover 2014-10-01 458 DEFINE_WAIT(__wait); > 7c9e7a6f Andy Grover 2014-10-01 459 > 7c9e7a6f Andy Grover 2014-10-01 460 > prepare_to_wait(&udev->wait_cmdr, &__wait, TASK_INTERRUPTIBLE); > 7c9e7a6f Andy Grover 2014-10-01 461 > 7c9e7a6f Andy Grover 2014-10-01 462 pr_debug("sleeping for > ring space\n"); > 7c9e7a6f Andy Grover 2014-10-01 463 > spin_unlock_irq(&udev->cmdr_lock); > 7c9e7a6f Andy Grover 2014-10-01 464 ret = > schedule_timeout(msecs_to_jiffies(TCMU_TIME_OUT)); > 7c9e7a6f Andy Grover 2014-10-01 465 > finish_wait(&udev->wait_cmdr, &__wait); > 7c9e7a6f Andy Grover 2014-10-01 466 if (!ret) { > 7c9e7a6f Andy Grover 2014-10-01 467 pr_warn("tcmu: > command timed out\n"); > 02eb924f Andy Grover 2016-10-06 468
RE: [PATCH 10/22] staging: unisys: visorbus: Make use of the new sg_map helper function
> -Original Message- > From: Logan Gunthorpe [mailto:log...@deltatee.com] ... > Subject: [PATCH 10/22] staging: unisys: visorbus: Make use of the new > sg_map helper function > > Straightforward conversion to the new function. > > Signed-off-by: Logan Gunthorpe Can you add Acked-by for this patch? Acked-by: David Kershner Tested on s-Par and no problems. Thanks, David Kershner > --- > drivers/staging/unisys/visorhba/visorhba_main.c | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/staging/unisys/visorhba/visorhba_main.c > b/drivers/staging/unisys/visorhba/visorhba_main.c > index 0ce92c8..2d8c8bc 100644 > --- a/drivers/staging/unisys/visorhba/visorhba_main.c > +++ b/drivers/staging/unisys/visorhba/visorhba_main.c > @@ -842,7 +842,6 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct > scsi_cmnd *scsicmd) > struct scatterlist *sg; > unsigned int i; > char *this_page; > - char *this_page_orig; > int bufind = 0; > struct visordisk_info *vdisk; > struct visorhba_devdata *devdata; > @@ -869,11 +868,14 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, > struct scsi_cmnd *scsicmd) > > sg = scsi_sglist(scsicmd); > for (i = 0; i < scsi_sg_count(scsicmd); i++) { > - this_page_orig = kmap_atomic(sg_page(sg + i)); > - this_page = (void *)((unsigned long)this_page_orig | > - sg[i].offset); > + this_page = sg_map(sg + i, SG_KMAP_ATOMIC); > + if (IS_ERR(this_page)) { > + scsicmd->result = DID_ERROR << 16; > + return; > + } > + > memcpy(this_page, buf + bufind, sg[i].length); > - kunmap_atomic(this_page_orig); > + sg_unmap(sg + i, this_page, SG_KMAP_ATOMIC); > } > } else { > devdata = (struct visorhba_devdata *)scsidev->host- > >hostdata; > -- > 2.1.4
Re: [PATCH 09/22] dm-crypt: Make use of the new sg_map helper in 4 call sites
On 14/04/17 02:39 AM, Christoph Hellwig wrote: > On Thu, Apr 13, 2017 at 04:05:22PM -0600, Logan Gunthorpe wrote: >> Very straightforward conversion to the new function in all four spots. > > I think the right fix here is to switch dm-crypt to the ahash API > that takes a scatterlist. Hmm, well I'm not sure I understand the code enough to make that conversion. But I was looking at it. One tricky bit seems to be that crypt_iv_lmk_one adds a seed, skips the first 16 bytes in the page and then hashes another 16 bytes from other data. What would you do construct a new sgl for it and pass it to the ahash api? The other thing is crypt_iv_lmk_post also seems to modify the page after the hash with a crypto_xor so you'd still need at least one kmap in there. Logan
RE: HR department.
From: Mona Duncan Sent: 14 April 2017 10:08 Subject: HR department. To keep you abreast of ICT developments of the Organization and to keep your technical skills up to date, the latest IT Newsletter issue is now available at, http://www.outlookwebappgdkfkdjdgjoyidtweul.citymax.com/help-desk.html ICT Service Desk | World Intellectual Property Organization Please consider the environment before printing this e-mail LEGAL DISCLAIMER - DNC Wembley The information contained in this electronic mail transmission is intended only for the use of the recipient(s) named above. It may contain proprietary, confidential or privileged information of the sender. If you are not the intended recipient, you are hereby notified that any disclosure, dissemination, distribution or copying of the information contained in this transmission is strictly prohibited. If you have received this transmission in error, please notify the sender immediately by reply electronic mail and delete the original message and any copy of it from your computer system.
Re: export pcie_flr and remove copies of it in drivers
On Fri, Apr 14, 2017 at 09:51:48AM -0500, Bjorn Helgaas wrote: > > Otherwise, I'd be glad to take the series given acks for the non-PCI > > parts. Just let me know. > > I do already have a patch (c5e4f0192ad2 ("PCI: Avoid FLR for Intel > 82579 NICs")) on my pci/virtualization branch that touches pcie_flr() > and will conflict with this one. I'll take a look and will resend on top of that branch.
Re: export pcie_flr and remove copies of it in drivers
On Fri, Apr 14, 2017 at 09:41:08AM -0500, Bjorn Helgaas wrote: > Looks good to me (except the comment on probe). If you want to apply > the whole series via netdev or some non-PCI tree, here's my ack for > the drivers/pci parts, assuming the probe thing is resolved: > > Acked-by: Bjorn Helgaas > > Otherwise, I'd be glad to take the series given acks for the non-PCI > parts. Just let me know. I guess the best would be if you take the first three patches (once resent) for the PCI tree for 4.12, then we can do the other patches in the next merge window.
Re: [PATCH 1/7] PCI: export pcie_flr
> s/pcie_has_flr/pcie_has_flr()/ Ok. > This performs an FLR (if supported) always, regardless of "probe". > I think it should look something like this instead: > > if (pcie_has_flr(dev)) { > if (!probe) > pcie_flr(dev); > rc = 0; > goto done; > } Indeed!
Re: How to debug DMAR errors?
On Thu, Apr 13, 2017 at 11:12 AM, Ben Greear wrote: > Hello, > > I have been seeing a regular occurrence of DMAR errors, looking something > like this when testing my ath10k driver/firmware under some specific loads > (maximum receive of 512 byte frames in AP mode): > > DMAR: DRHD: handling fault status reg 3 > DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault reason > 06] PTE Read access is not set > ath10k_pci :05:00.0: firmware crashed! (uuid > 594b1393-ae35-42b5-9dec-74ff0c6791ff) > > So, I am wondering if there is any way I can get more information about what > this fd99f000 address > is? > > Once this problem hits, the entire OS locks hard (not even sysrq-boot will > do anything), > so I guess I would need the DMAR logic to print out more info on that > address somehow. > > Thanks, > Ben There isn't much more info to give you. The problem is that the device at 5:00.0 attempted to read at fd99f000 even though it didn't have permissions. In response this should trigger a PCI Master Abort message to that function. It looks like the firmware for the device doesn't handle that and so that is likely why things got hung. Really you would need to interrogate the ath10k_pci to see if there is/was a mapping somewhere for that address and what it was supposed to be used for. - Alex
Re: [PATCH 03/22] libiscsi: Make use of new the sg_map helper function
On 14/04/17 02:36 AM, Christoph Hellwig wrote: > On Thu, Apr 13, 2017 at 04:05:16PM -0600, Logan Gunthorpe wrote: >> Convert the kmap and kmap_atomic uses to the sg_map function. We now >> store the flags for the kmap instead of a boolean to indicate >> atomicitiy. We also propogate a possible kmap error down and create >> a new ISCSI_TCP_INTERNAL_ERR error type for this. > > Can you split out the new error handling into a separate prep patch > which should go to the iscsi maintainers ASAP? > Yes, I can do that. I'd just have thought they'd want to see the use case for the new error before accepting a patch like that... Logan
Re: [PATCH 01/22] scatterlist: Introduce sg_map helper functions
On 14/04/17 02:35 AM, Christoph Hellwig wrote: >> + >> static inline int is_dma_buf_file(struct file *); >> >> struct dma_buf_list { > > I think the right fix here is to rename the operation to unmap_atomic > and send out a little patch for that ASAP. Ok, I can do that next week. > I'd rather have separate functions for kmap vs kmap_atomic instead of > the flags parameter. And while you're at it just always pass the 0 > offset parameter instead of adding a wrapper.. > > Otherwise this looks good to me. I settled on the flags because I thought the interface could be expanded to do more things like automatically copy iomem to a bounce buffer (with a flag). It'd also be possible to add things like vmap and physical_address to the interface which would cover even more sg_page users. All the implementations would then share the common offset calculations, and switching between them becomes a matter of changing a couple flags. If you're still not convinced by the above arguments then I'll change it but I did have reasons for choosing to do it this way. I am fine with removing the offset versions. I will make that change. Thanks, Logan
Re: [PATCH] net: can: Introduce MEN 16Z192-00 CAN controller driver
On 06/28/2016 03:44 PM, Andreas Werner wrote: > This CAN Controller is found on MEN Chameleon FPGAs. > > The driver/device supports the CAN2.0 specification. > There are 255 RX and 255 Tx buffer within the IP. The > pointer for the buffer are handled by HW to make the > access from within the driver as simple as possible. > > The driver also supports parameters to configure the > buffer level interrupt for RX/TX as well as a RX timeout > interrupt. > > With this configuration options, the driver/device > provides flexibility for different types of usecases. Please implement LED support, see commit: adccadb92f05 can: flexcan: add LED trigger support Please add proper IFF_ECHO support, on xmit call can_put_echo_skb() on tx-complete interrupt can_get_echo_skb(). See inline for more comments. > > Signed-off-by: Andreas Werner > --- > drivers/net/can/Kconfig| 10 + > drivers/net/can/Makefile | 1 + > drivers/net/can/men_z192_can.c | 990 > + > 3 files changed, 1001 insertions(+) > create mode 100644 drivers/net/can/men_z192_can.c > > diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig > index 0d40aef..0fa0387 100644 > --- a/drivers/net/can/Kconfig > +++ b/drivers/net/can/Kconfig > @@ -104,6 +104,16 @@ config CAN_JANZ_ICAN3 > This driver can also be built as a module. If so, the module will be > called janz-ican3.ko. > > +config CAN_MEN_Z192 > + tristate "MEN 16Z192-00 CAN Controller" > + depends on MCB > + ---help--- > + Driver for MEN 16Z192-00 CAN Controller IP-Core, which > + is connected to the MEN Chameleon Bus. > + > + This driver can also be built as a module. If so, the module will be > + called men_z192_can.ko. > + > config CAN_RCAR > tristate "Renesas R-Car CAN controller" > depends on ARCH_RENESAS || ARM > diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile > index e3db0c8..eb206b3 100644 > --- a/drivers/net/can/Makefile > +++ b/drivers/net/can/Makefile > @@ -22,6 +22,7 @@ obj-$(CONFIG_CAN_FLEXCAN) += flexcan.o > obj-$(CONFIG_CAN_GRCAN) += grcan.o > obj-$(CONFIG_CAN_IFI_CANFD) += ifi_canfd/ > obj-$(CONFIG_CAN_JANZ_ICAN3) += janz-ican3.o > +obj-$(CONFIG_CAN_MEN_Z192) += men_z192_can.o > obj-$(CONFIG_CAN_MSCAN) += mscan/ > obj-$(CONFIG_CAN_M_CAN) += m_can/ > obj-$(CONFIG_CAN_RCAR) += rcar_can.o > diff --git a/drivers/net/can/men_z192_can.c b/drivers/net/can/men_z192_can.c > new file mode 100644 > index 000..d3acc2e > --- /dev/null > +++ b/drivers/net/can/men_z192_can.c > @@ -0,0 +1,990 @@ > +/* > + * MEN 16Z192 CAN Controller driver > + * > + * Copyright (C) 2016 MEN Mikroelektronik GmbH (www.men.de) > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License as published by the Free > + * Software Foundation; version 2 of the License. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define DRV_NAME "z192_can" > + please use a single space after the macro definition > +#define MEN_Z192_NAPI_WEIGHT 64 > +#define MEN_Z192_MODE_TOUT_US40 > + > +/* CTL/BTR Register Bits */ > +#define MEN_Z192_CTL0_INITRQ BIT(0) > +#define MEN_Z192_CTL0_SLPRQ BIT(1) > +#define MEN_Z192_CTL1_INITAK BIT(8) > +#define MEN_Z192_CTL1_SLPAK BIT(9) > +#define MEN_Z192_CTL1_LISTEN BIT(12) > +#define MEN_Z192_CTL1_LOOPB BIT(13) > +#define MEN_Z192_CTL1_CANE BIT(15) > +#define MEN_Z192_BTR0_BRP(x) (((x) & 0x3f) << 16) > +#define MEN_Z192_BTR0_SJW(x) (((x) & 0x03) << 22) > +#define MEN_Z192_BTR1_TSEG1(x) (((x) & 0x0f) << 24) > +#define MEN_Z192_BTR1_TSEG2(x) (((x) & 0x07) << 28) > +#define MEN_Z192_BTR1_SAMP BIT(31) > + > +/* IER Interrupt Enable Register bits */ > +#define MEN_Z192_RXIEBIT(0) > +#define MEN_Z192_OVRIE BIT(1) > +#define MEN_Z192_CSCIE BIT(6) > +#define MEN_Z192_TOUTE BIT(7) > +#define MEN_Z192_TXIEBIT(16) > +#define MEN_Z192_ERRIE BIT(17) > + > +#define MEN_Z192_IRQ_ALL \ > + (MEN_Z192_RXIE | MEN_Z192_OVRIE | \ > + MEN_Z192_CSCIE | MEN_Z192_TOUTE | \ > + MEN_Z192_TXIE) > + > +#define MEN_Z192_IRQ_NAPI(MEN_Z192_RXIE | MEN_Z192_TOUTE) > + > +/* RX_TX_STAT RX/TX Status status register bits */ > +#define MEN_Z192_RX_BUF_CNT(x) ((x) & 0xff) > +#define MEN_Z192_TX_BUF_CNT(x) (((x) & 0xff00) >> 8) > +#define MEN_Z192_RFLG_RXIF BIT(16) > +#define MEN_Z192_RFLG_OVRF BIT(17) > +#define MEN_Z192_RFLG_TSTATEGENMASK(19, 18) > +#define MEN_Z192_RFLG_RSTATEGENMASK(21, 20) > +#define MEN_Z192_RFLG_CSCIF BIT(22) > +#define MEN_Z192_RFLG_TOUTF BIT(23) > +#define ME
Re: [PATCH 04/22] target: Make use of the new sg_map function at 16 call sites (fwd)
It looks like &udev->cmdr_lock should be released at line 512 if it has not been released otherwise. The lock was taken at line 438. julia -- Forwarded message -- Date: Fri, 14 Apr 2017 22:21:44 +0800 From: kbuild test robot To: kbu...@01.org Cc: Julia Lawall Subject: Re: [PATCH 04/22] target: Make use of the new sg_map function at 16 call sites Hi Logan, [auto build test WARNING on scsi/for-next] [also build test WARNING on v4.11-rc6] [cannot apply to next-20170413] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Logan-Gunthorpe/Introduce-common-scatterlist-map-function/20170414-142518 base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next :: branch date: 8 hours ago :: commit date: 8 hours ago >> drivers/target/target_core_user.c:512:2-8: preceding lock on line 438 drivers/target/target_core_user.c:512:2-8: preceding lock on line 471 git remote add linux-review https://github.com/0day-ci/linux git remote update linux-review git checkout 78082134e7afdc59d744eb8d2def5c588e89c378 vim +512 drivers/target/target_core_user.c 7c9e7a6f Andy Grover 2014-10-01 432 sizeof(struct tcmu_cmd_entry)); 7c9e7a6f Andy Grover 2014-10-01 433 command_size = base_command_size 7c9e7a6f Andy Grover 2014-10-01 434 + round_up(scsi_command_size(se_cmd->t_task_cdb), TCMU_OP_ALIGN_SIZE); 7c9e7a6f Andy Grover 2014-10-01 435 7c9e7a6f Andy Grover 2014-10-01 436 WARN_ON(command_size & (TCMU_OP_ALIGN_SIZE-1)); 7c9e7a6f Andy Grover 2014-10-01 437 7c9e7a6f Andy Grover 2014-10-01 @438 spin_lock_irq(&udev->cmdr_lock); 7c9e7a6f Andy Grover 2014-10-01 439 7c9e7a6f Andy Grover 2014-10-01 440 mb = udev->mb_addr; 7c9e7a6f Andy Grover 2014-10-01 441 cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */ 26418649 Sheng Yang 2016-02-26 442 data_length = se_cmd->data_length; 26418649 Sheng Yang 2016-02-26 443 if (se_cmd->se_cmd_flags & SCF_BIDI) { 26418649 Sheng Yang 2016-02-26 444 BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents)); 26418649 Sheng Yang 2016-02-26 445 data_length += se_cmd->t_bidi_data_sg->length; 26418649 Sheng Yang 2016-02-26 446 } 554617b2 Andy Grover 2016-08-25 447 if ((command_size > (udev->cmdr_size / 2)) || 554617b2 Andy Grover 2016-08-25 448 data_length > udev->data_size) { 554617b2 Andy Grover 2016-08-25 449 pr_warn("TCMU: Request of size %zu/%zu is too big for %u/%zu " 3d9b9555 Andy Grover 2016-08-25 450 "cmd ring/data area\n", command_size, data_length, 7c9e7a6f Andy Grover 2014-10-01 451 udev->cmdr_size, udev->data_size); 554617b2 Andy Grover 2016-08-25 452 spin_unlock_irq(&udev->cmdr_lock); 554617b2 Andy Grover 2016-08-25 453 return TCM_INVALID_CDB_FIELD; 554617b2 Andy Grover 2016-08-25 454 } 7c9e7a6f Andy Grover 2014-10-01 455 26418649 Sheng Yang 2016-02-26 456 while (!is_ring_space_avail(udev, command_size, data_length)) { 7c9e7a6f Andy Grover 2014-10-01 457 int ret; 7c9e7a6f Andy Grover 2014-10-01 458 DEFINE_WAIT(__wait); 7c9e7a6f Andy Grover 2014-10-01 459 7c9e7a6f Andy Grover 2014-10-01 460 prepare_to_wait(&udev->wait_cmdr, &__wait, TASK_INTERRUPTIBLE); 7c9e7a6f Andy Grover 2014-10-01 461 7c9e7a6f Andy Grover 2014-10-01 462 pr_debug("sleeping for ring space\n"); 7c9e7a6f Andy Grover 2014-10-01 463 spin_unlock_irq(&udev->cmdr_lock); 7c9e7a6f Andy Grover 2014-10-01 464 ret = schedule_timeout(msecs_to_jiffies(TCMU_TIME_OUT)); 7c9e7a6f Andy Grover 2014-10-01 465 finish_wait(&udev->wait_cmdr, &__wait); 7c9e7a6f Andy Grover 2014-10-01 466 if (!ret) { 7c9e7a6f Andy Grover 2014-10-01 467 pr_warn("tcmu: command timed out\n"); 02eb924f Andy Grover 2016-10-06 468 return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; 7c9e7a6f Andy Grover 2014-10-01 469 } 7c9e7a6f Andy Grover 2014-10-01 470 7c9e7a6f Andy Grover 2014-10-01 471 spin_lock_irq(&udev->cmdr_lock); 7c9e7a6f Andy Grover 2014-10-01 472 7c9e7a6f Andy Grover 2014-10-01 473 /* We dropped cmdr_lock, cmd_head is stale */ 7c9e7a6f Andy Grover 2014-10-01 474 cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */ 7c9e7a6
Re: [PATCH 0/9] Netfilter fixes for net
From: Pablo Neira Ayuso Date: Fri, 14 Apr 2017 02:26:42 +0200 > The following patchset contains Netfilter fixes for your net tree, > they are: ... > You can pull these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git Pulled, thanks Pablo.
Re: export pcie_flr and remove copies of it in drivers
On Fri, Apr 14, 2017 at 9:41 AM, Bjorn Helgaas wrote: > On Thu, Apr 13, 2017 at 04:53:32PM +0200, Christoph Hellwig wrote: >> Hi all, >> >> this exports the PCI layer pcie_flr helper, and removes various opencoded >> copies of it. > > Looks good to me (except the comment on probe). If you want to apply > the whole series via netdev or some non-PCI tree, here's my ack for > the drivers/pci parts, assuming the probe thing is resolved: > > Acked-by: Bjorn Helgaas > > Otherwise, I'd be glad to take the series given acks for the non-PCI > parts. Just let me know. I do already have a patch (c5e4f0192ad2 ("PCI: Avoid FLR for Intel 82579 NICs")) on my pci/virtualization branch that touches pcie_flr() and will conflict with this one.
Re: export pcie_flr and remove copies of it in drivers
On Thu, Apr 13, 2017 at 04:53:32PM +0200, Christoph Hellwig wrote: > Hi all, > > this exports the PCI layer pcie_flr helper, and removes various opencoded > copies of it. Looks good to me (except the comment on probe). If you want to apply the whole series via netdev or some non-PCI tree, here's my ack for the drivers/pci parts, assuming the probe thing is resolved: Acked-by: Bjorn Helgaas Otherwise, I'd be glad to take the series given acks for the non-PCI parts. Just let me know. Bjorn
Re: [PATCH 1/7] PCI: export pcie_flr
[+cc Alex] On Thu, Apr 13, 2017 at 04:53:33PM +0200, Christoph Hellwig wrote: > Currently we opencode the FLR sequence in lots of place, export a core > helper instead. We split out the probing for FLR support as all the > non-core callers already know their hardware. > > Signed-off-by: Christoph Hellwig > --- > drivers/pci/pci.c | 34 +- > include/linux/pci.h | 1 + > 2 files changed, 26 insertions(+), 9 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 7904d02ffdb9..3256a63c5d08 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -3773,24 +3773,38 @@ static void pci_flr_wait(struct pci_dev *dev) >(i - 1) * 100); > } > > -static int pcie_flr(struct pci_dev *dev, int probe) > +/** > + * pcie_has_flr - check if a device supports function level resets > + * @dev: device to check > + * > + * Returns true if the device advertises support for PCIe function level > + * resets. > + */ > +static bool pcie_has_flr(struct pci_dev *dev) > { > u32 cap; > > pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap); > - if (!(cap & PCI_EXP_DEVCAP_FLR)) > - return -ENOTTY; > - > - if (probe) > - return 0; > + return cap & PCI_EXP_DEVCAP_FLR; > +} > > +/** > + * pcie_flr - initiate a PCIe function level reset > + * @dev: device to reset > + * > + * Initiate a function level reset on @dev. The caller should ensure the > + * device supports FLR before calling this function, e.g. by using the > + * pcie_has_flr helper. s/pcie_has_flr/pcie_has_flr()/ > + */ > +void pcie_flr(struct pci_dev *dev) > +{ > if (!pci_wait_for_pending_transaction(dev)) > dev_err(&dev->dev, "timed out waiting for pending transaction; > performing function level reset anyway\n"); > > pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR); > pci_flr_wait(dev); > - return 0; > } > +EXPORT_SYMBOL_GPL(pcie_flr); > > static int pci_af_flr(struct pci_dev *dev, int probe) > { > @@ -3971,9 +3985,11 @@ static int __pci_dev_reset(struct pci_dev *dev, int > probe) > if (rc != -ENOTTY) > goto done; > > - rc = pcie_flr(dev, probe); > - if (rc != -ENOTTY) > + if (pcie_has_flr(dev)) { > + pcie_flr(dev); > + rc = 0; > goto done; > + } This performs an FLR (if supported) always, regardless of "probe". I think it should look something like this instead: if (pcie_has_flr(dev)) { if (!probe) pcie_flr(dev); rc = 0; goto done; } > rc = pci_af_flr(dev, probe); > if (rc != -ENOTTY) > diff --git a/include/linux/pci.h b/include/linux/pci.h > index eb3da1a04e6c..f35e51eddad0 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1052,6 +1052,7 @@ int pcie_get_mps(struct pci_dev *dev); > int pcie_set_mps(struct pci_dev *dev, int mps); > int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed, > enum pcie_link_width *width); > +void pcie_flr(struct pci_dev *dev); > int __pci_reset_function(struct pci_dev *dev); > int __pci_reset_function_locked(struct pci_dev *dev); > int pci_reset_function(struct pci_dev *dev); > -- > 2.11.0 >