Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev
Hi Moritz, [auto build test WARNING on net-next/master] [also build test WARNING on v4.12 next-20170713] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Moritz-Fischer/dt-bindings-net-Add-bindings-for-National-Instruments-XGE-netdev/20170714-125718 config: ia64-allmodconfig (attached as .config) compiler: ia64-linux-gcc (GCC) 6.2.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=ia64 All warnings (new ones prefixed by >>): In file included from include/linux/if_ether.h:23:0, from include/linux/etherdevice.h:25, from drivers/net/ethernet/ni/nixge.c:14: drivers/net/ethernet/ni/nixge.c: In function 'nixge_dma_bd_release': >> drivers/net/ethernet/ni/nixge.c:241:17: warning: cast to pointer from >> integer of different size [-Wint-to-pointer-cast] dev_kfree_skb((struct sk_buff *) ^ include/linux/skbuff.h:977:38: note: in definition of macro 'dev_kfree_skb' #define dev_kfree_skb(a) consume_skb(a) ^ drivers/net/ethernet/ni/nixge.c: In function 'nixge_dma_bd_init': >> drivers/net/ethernet/ni/nixge.c:299:35: warning: cast from pointer to >> integer of different size [-Wpointer-to-int-cast] priv->rx_bd_v[i].sw_id_offset = (u32)skb; ^ drivers/net/ethernet/ni/nixge.c: In function 'nixge_start_xmit_done': drivers/net/ethernet/ni/nixge.c:452:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] dev_kfree_skb_irq((struct sk_buff *)cur_p->app4); ^ drivers/net/ethernet/ni/nixge.c: In function 'nixge_recv': drivers/net/ethernet/ni/nixge.c:546:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] skb = (struct sk_buff *)(cur_p->sw_id_offset); ^ drivers/net/ethernet/ni/nixge.c:577:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] cur_p->sw_id_offset = (u32)new_skb; ^ drivers/net/ethernet/ni/nixge.c: In function 'nixge_dma_err_handler': drivers/net/ethernet/ni/nixge.c:687:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] dev_kfree_skb_irq((struct sk_buff *)cur_p->app4); ^ vim +241 drivers/net/ethernet/ni/nixge.c 228 229 #define nixge_ctrl_poll_timeout(priv, addr, val, cond, sleep_us, timeout_us) \ 230 readl_poll_timeout((priv)->ctrl_regs + (addr), (val), cond, \ 231 (sleep_us), (timeout_us)) 232 233 static void nixge_dma_bd_release(struct net_device *ndev) 234 { 235 int i; 236 struct nixge_priv *priv = netdev_priv(ndev); 237 238 for (i = 0; i < RX_BD_NUM; i++) { 239 dma_unmap_single(ndev->dev.parent, priv->rx_bd_v[i].phys, 240 priv->max_frm_size, DMA_FROM_DEVICE); > 241 dev_kfree_skb((struct sk_buff *) 242(priv->rx_bd_v[i].sw_id_offset)); 243 } 244 245 if (priv->rx_bd_v) { 246 dma_free_coherent(ndev->dev.parent, 247sizeof(*priv->rx_bd_v) * RX_BD_NUM, 248priv->rx_bd_v, 249priv->rx_bd_p); 250 } 251 if (priv->tx_bd_v) { 252 dma_free_coherent(ndev->dev.parent, 253sizeof(*priv->tx_bd_v) * TX_BD_NUM, 254priv->tx_bd_v, 255priv->tx_bd_p); 256 } 257 } 258 259 static int nixge_dma_bd_init(struct net_device *ndev) 260 { 261 u32 cr; 262 int i; 263 struct sk_buff *skb; 264 struct nixge_priv *priv = netdev_priv(ndev); 265 266 /* Reset the indexes which are used for accessing the BDs */ 267 priv->tx_bd_ci = 0; 268 priv->tx_bd_tail = 0; 269 priv->rx_bd_ci = 0; 270 271 /* Allocate the Tx and Rx buffer descriptors. */ 272 priv->tx_bd_v = dma_zalloc_coherent(ndev->dev.parent, 273sizeof(*priv->tx_bd_v) * TX_BD_NUM, 274&priv-
Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev
Hi Andrew, On Thu, Jul 13, 2017 at 6:34 PM, Andrew Lunn wrote: >> > > + /* not sure if this is the correct way of dealing with this ... */ >> > > + ndev->phydev->supported &= ~(SUPPORTED_Autoneg); >> > > + ndev->phydev->advertising = ndev->phydev->supported; >> > > + ndev->phydev->autoneg = AUTONEG_DISABLE; >> > >> > What are you trying to achieve? >> >> Basically can't do Autoneg, I'll need to take a closer look. > > Hi Moritz > > What i actually think you mean, is it can only do 1Gbps. So you could > autoneg, but only advertise 1Gbps. Look at masking out > PHY_10BT_FEATURES and PHY_100BT_FEATURES. It does either 1Gbps or 10Gbps (over SFP+), depending which bitstream is loaded into the FPGA. In the current setup I could also just have two different compatible strings, since neither setup supports the other rate, but that might change. It seems getting rid of that part (the default values) now works, too. I'll need to take a closer look tomorrow (and I need to retest with 1Gbps) > > Take a look at: > > http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/renesas/ravb_main.c#L1045 Will do. Thanks for feedback, Moritz
[PATCHv2 1/1] mlx4_en: remove unnecessary returned value check
The function __mlx4_zone_remove_one_entry always returns zero. So it is not necessary to check it. Cc: Joe Jin Cc: Junxiao Bi Signed-off-by: Zhu Yanjun Reviewed-by: Yuval Shaia --- Change from v1 to v2: Initialization is moved to variable declaration. drivers/net/ethernet/mellanox/mlx4/alloc.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c index 249a458..710d6c6 100644 --- a/drivers/net/ethernet/mellanox/mlx4/alloc.c +++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c @@ -283,7 +283,7 @@ int mlx4_zone_add_one(struct mlx4_zone_allocator *zone_alloc, } /* Should be called under a lock */ -static int __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry) +static void __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry) { struct mlx4_zone_allocator *zone_alloc = entry->allocator; @@ -315,8 +315,6 @@ static int __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry) } zone_alloc->mask = mask; } - - return 0; } void mlx4_zone_allocator_destroy(struct mlx4_zone_allocator *zone_alloc) @@ -457,7 +455,7 @@ struct mlx4_bitmap *mlx4_zone_get_bitmap(struct mlx4_zone_allocator *zones, u32 int mlx4_zone_remove_one(struct mlx4_zone_allocator *zones, u32 uid) { struct mlx4_zone_entry *zone; - int res; + int res = 0; spin_lock(&zones->lock); @@ -468,7 +466,7 @@ int mlx4_zone_remove_one(struct mlx4_zone_allocator *zones, u32 uid) goto out; } - res = __mlx4_zone_remove_one_entry(zone); + __mlx4_zone_remove_one_entry(zone); out: spin_unlock(&zones->lock); -- 2.7.4
[git pull] vfs.git network field-by-field copyin patches
[My apologies for late pull requests - I hoped to be back to normal connectivity by yesterday evening. Spent the night sitting in Logan instead...] This part of misc.compat queue was held back for review from networking folks and since davem has jus ACKed those... The following changes since commit 0d0606060baefdb13d3d80dba1b4c816b0676e16: mqueue: move compat syscalls to native ones (2017-07-04 13:13:49 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git misc.compat for you to fetch changes up to f8f8a727eab1c5b78c3703a461565b042979cc79: get_compat_bpf_fprog(): don't copyin field-by-field (2017-07-04 13:14:34 -0400) Al Viro (3): copy_msghdr_from_user(): get rid of field-by-field copyin get_compat_msghdr(): get rid of field-by-field copyin get_compat_bpf_fprog(): don't copyin field-by-field net/compat.c | 49 +++-- net/socket.c | 31 ++- 2 files changed, 37 insertions(+), 43 deletions(-)
Re: [RFC] get_compat_msghdr(): get rid of field-by-field copyin
From: Al Viro Date: Fri, 14 Jul 2017 02:37:50 +0100 > On Tue, Jul 11, 2017 at 08:25:14PM -0700, David Miller wrote: > >> looks harmless, or if there is a bug in there I can't see it. >> >> But whatever it is, that same problem could be hiding in some of these >> other transformations as well. >> >> I think the bug might be that we are corrupting the user's stack >> somehow. But the two user copies in that commit look perfectly fine >> to my eyes. >> >> There shouldn't be any padding in that compat_rlimit structure, so >> it's not like we're copying extra bytes. Well, we'd be exposing >> kernel stack memory if that were the case. > > There isn't any padding in compat_rlimit; unfortunately, it was > mistakenly declared as struct rlimit instead. Which, of course, > has different member sizes - otherwise we wouldn't have needed > a compat syscall there in the first place. > > It was harder to spot since I combined move and a transformation > into one commit. Shouldn't have done so... Had those been two > separate commits, the bug would've stood out immediately. Shouldn't > be the case here... Ok, I'll ack these patches then: Acked-by: David S. Miller
Re: [PATCH][net-next] svcrdma: fix an incorrect check on -E2BIG and -EINVAL
On Thu, Jul 13, 2017 at 01:53:10PM -0400, Chuck Lever wrote: > > > On Jul 13, 2017, at 1:51 PM, Colin King wrote: > > > > From: Colin Ian King > > > > The current check will always be true and will always jump to > > err1, this looks dubious to me. I believe && should be used > > instead of ||. > > > > Detected by CoverityScan, CID#1450120 ("Logically Dead Code") > > > > Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow") > > Signed-off-by: Colin Ian King > > Reviewed-by: Chuck Lever > > Dan reported this today, and I have a similar patch in my > "pending for-rc" series. This one works too. Applied and merged upstream, thanks. --b.
Re: [RFC] get_compat_msghdr(): get rid of field-by-field copyin
On Tue, Jul 11, 2017 at 08:25:14PM -0700, David Miller wrote: > looks harmless, or if there is a bug in there I can't see it. > > But whatever it is, that same problem could be hiding in some of these > other transformations as well. > > I think the bug might be that we are corrupting the user's stack > somehow. But the two user copies in that commit look perfectly fine > to my eyes. > > There shouldn't be any padding in that compat_rlimit structure, so > it's not like we're copying extra bytes. Well, we'd be exposing > kernel stack memory if that were the case. There isn't any padding in compat_rlimit; unfortunately, it was mistakenly declared as struct rlimit instead. Which, of course, has different member sizes - otherwise we wouldn't have needed a compat syscall there in the first place. It was harder to spot since I combined move and a transformation into one commit. Shouldn't have done so... Had those been two separate commits, the bug would've stood out immediately. Shouldn't be the case here... > Color me stumped, but cautious about ACK'ing these networking > compat changes.
Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev
> > > + /* not sure if this is the correct way of dealing with this ... */ > > > + ndev->phydev->supported &= ~(SUPPORTED_Autoneg); > > > + ndev->phydev->advertising = ndev->phydev->supported; > > > + ndev->phydev->autoneg = AUTONEG_DISABLE; > > > > What are you trying to achieve? > > Basically can't do Autoneg, I'll need to take a closer look. Hi Moritz What i actually think you mean, is it can only do 1Gbps. So you could autoneg, but only advertise 1Gbps. Look at masking out PHY_10BT_FEATURES and PHY_100BT_FEATURES. Take a look at: http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/renesas/ravb_main.c#L1045 It might actually make sense to add a phy_set_min_speed(), a mirror to phy_set_max_speed(). Andrew
Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported
On 2017/7/14 5:09, Sinan Kaya wrote: > On 7/13/2017 10:21 AM, Ding Tianhong wrote: >> static void pci_configure_relaxed_ordering(struct pci_dev *dev) >> +{ >> +/* We should not alter the relaxed ordering bit for the VF */ >> +if (dev->is_virtfn) >> +return; >> + >> +/* If the releaxed ordering enable bit is not set, do nothing. */ >> +if (!pcie_relaxed_ordering_supported(dev)) >> +return; >> + >> +if (pci_dev_should_disable_relaxed_ordering(dev)) { >> +pcie_clear_relaxed_ordering(dev); >> +dev_info(&dev->dev, "Disable Relaxed Ordering\n"); >> +} >> +} > > I couldn't find anywhere where you actually enable the relaxed ordering > like the subject suggests. > There is no code to enable the PCIe Relaxed Ordering bit in the configuration space, it is only be enable by default according to the PCIe Standard Specification, what we do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root Complex. Thanks Ding
Re: [PATCH net] net: bridge: fix dest lookup when vlan proto doesn't match
On 2017/07/13 22:09, Nikolay Aleksandrov wrote: > With 802.1ad support the vlan_ingress code started checking for vlan > protocol mismatch which causes the current tag to be inserted and the > bridge vlan protocol & pvid to be set. The vlan tag insertion changes > the skb mac_header and thus the lookup mac dest pointer which was loaded > prior to calling br_allowed_ingress in br_handle_frame_finish is VLAN_HLEN > bytes off now, pointing to the last two bytes of the destination mac and > the first four of the source mac causing lookups to always fail and > broadcasting all such packets to all ports. Same thing happens for locally > originated packets when passing via br_dev_xmit. So load the dest pointer > after the vlan checks and possible skb change. > > Fixes: 8580e2117c06 ("bridge: Prepare for 802.1ad vlan filtering support") > Reported-by: Anitha Narasimha Murthy > Signed-off-by: Nikolay Aleksandrov Oops. Thank you for fixing it. Acked-by: Toshiaki Makita
Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device
Hi Harald, > On 7/13/17, 12:35 AM, "Harald Welte" wrote: > > > +static int gtp_dev_open(struct net_device *dev) > > +{ > > + struct gtp_dev *gtp = netdev_priv(dev); > > + struct net *net = gtp->net; > > + struct socket *sock1u; > > + struct socket *sock0; > > + struct udp_tunnel_sock_cfg tunnel_cfg; > > + struct udp_port_cfg udp_conf; > > + int err; > > + > > + memset(&udp_conf, 0, sizeof(udp_conf)); > > + > > + udp_conf.family = AF_INET; > > + udp_conf.local_ip.s_addr = htonl(INADDR_ANY); > > + udp_conf.local_udp_port = htons(GTP1U_PORT); > > + > > + err = udp_sock_create(gtp->net, &udp_conf, &sock1u); > > + if (err < 0) > > + return err; > > + > > + udp_conf.local_udp_port = htons(GTP0_PORT); > > + err = udp_sock_create(gtp->net, &udp_conf, &sock0); > > + if (err < 0) > > + return err; > > you're unconditionally binding to both GTP0 and GTP1 UDP ports. This is > done selectively based on netlink attributes in the existing "normal" > non-OVS kernel code, i.e. the control is left to the user. > > Is this function is only called/used in the context of OVS? If so, > since you explicitly implement only GTPv1, why bind to GTPv0 port? > I had doubts on how this flow-based GTPv1 code path should fit in, which is why the GTPv0 and the GTPv1 code pieces are mixed in my changes. Should I explicitly claim that the flow-based change is for GTPv1 only? > > + setup_udp_tunnel_sock(net, sock1u, &tunnel_cfg); > > even here, you're only setting up the v1 and not v0. > same reason as above. > > + /* Assume largest header, ie. GTPv0. */ > > + dev->needed_headroom= LL_MAX_HEADER + > > + sizeof(struct iphdr) + > > + sizeof(struct udphdr) + > > + sizeof(struct gtp0_header); > > ... and here you're using headroom for a GTPv0 header, despite (I think) > only supporting GTPv1 from this configuration? Yes, only GTPv1 is supported. > > > + err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free?? > > I think that question about when to free needs to be resolved before any > merge. Did you check that it persists even after the device is > closed/removed? I didn't investigate it. What do you mean by persist? Thanks -Jiannan
Re: [PATCH net-next v1 1/3] gtp: refactor to support flow-based gtp encap and decap
Hi Harald, > On 7/13/17, 12:26 AM, "Harald Welte" wrote: >· > > static inline void gtp_set_pktinfo_ipv4(struct gtp_pktinfo *pktinfo, > > struct sock *sk, struct iphdr *iph, > > - struct pdp_ctx *pctx, struct rtable *rt, > > - struct flowi4 *fl4, > > + struct rtable *rt, struct flowi4 *fl4, > > struct net_device *dev) > > { > > [...] > > + __be32 tun_id; >· > you are breaking GTPv0 functionality here. GTPv0 has 64 bit tunnel > identifiers, and this function is called both from GTPv1 and GTPv0 > context. >· > This makes me wonder how you did verify that your changes do not break > the existing operation with both GTPv0 and GTPv1? >· Good catch. I only fully tested the GTPv1 path against oai-cn. Will fix this and test the GTPv0 path as well. I had doubts on how this flow-based GTPv1 code path should fit in, which is why the GTPv0 and the GTPv1 code pieces are mixed in my changes. Should I explicitly claim that the flow-based change is for GTPv1 only? > > + // flow-based GTP1U encap > > + info = skb_tunnel_info(skb); > > + if (gtp->collect_md && info && ntohs(info->key.tp_dst) == GTP1U_PORT) { >· > I think it's typically safe to assume that GTP is only operated on > standard ports, but it is something you chould/should think about, i.e. > whether you want that kind of restriction. In the existing use case, we > have the v0/v1 information stored in the per-pdp context structure. >· The reason I’m checking GTP1U_PORT here is to filter GTP1U traffic. It possible to pass a port number from ovs into the gtp module. I will investigate how to support programmable port. > > + tun_id = htonl(pctx->u.v1.o_tei); >· > here is where you're assuming GTPv1 in two ways from code that is called > from both v0 and v1. > * you're dereferencing a v1 specific element in the pctx union > * you're storing the result in a 32bit variable >· Right, will fix this for GTPv0. > > gtp = netdev_priv(dev); > > + gtp->net = src_net; >· > Isn't this a generic change that's independent of your work on OVS GTP? It is meant to be OVS independent. What makes it not? Should I leave this field un-initialized? Thanks -Jiannan
Re: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments XGE netdev
Hi Yuan, On Thu, Jul 13, 2017 at 5:33 PM, YUAN Linyu wrote: > > >> -Original Message- >> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] >> On Behalf Of Moritz Fischer >> Sent: Friday, July 14, 2017 5:22 AM >> To: netdev@vger.kernel.org >> Cc: devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; >> da...@davemloft.net; mark.rutl...@arm.com; robh...@kernel.org; Moritz >> Fischer >> Subject: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments >> XGE netdev >> >> This adds bindings for the NI XGE 1G/10G network device. >> >> Signed-off-by: Moritz Fischer >> --- >> Documentation/devicetree/bindings/net/nixge.c | 32 >> +++ >> 1 file changed, 32 insertions(+) >> create mode 100644 Documentation/devicetree/bindings/net/nixge.c > > It should be a text file, nixge.txt You are absolutely right ... I need to have my head examined. > >> >> diff --git a/Documentation/devicetree/bindings/net/nixge.c >> b/Documentation/devicetree/bindings/net/nixge.c >> new file mode 100644 >> index 000..9fff5a7
RE: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments XGE netdev
> -Original Message- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Moritz Fischer > Sent: Friday, July 14, 2017 5:22 AM > To: netdev@vger.kernel.org > Cc: devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; > da...@davemloft.net; mark.rutl...@arm.com; robh...@kernel.org; Moritz > Fischer > Subject: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments > XGE netdev > > This adds bindings for the NI XGE 1G/10G network device. > > Signed-off-by: Moritz Fischer > --- > Documentation/devicetree/bindings/net/nixge.c | 32 > +++ > 1 file changed, 32 insertions(+) > create mode 100644 Documentation/devicetree/bindings/net/nixge.c It should be a text file, nixge.txt > > diff --git a/Documentation/devicetree/bindings/net/nixge.c > b/Documentation/devicetree/bindings/net/nixge.c > new file mode 100644 > index 000..9fff5a7
Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev
Hi Andrew, thanks for the quick response. On Fri, Jul 14, 2017 at 12:36:36AM +0200, Andrew Lunn wrote: > > +++ b/drivers/net/ethernet/ni/nixge.c > > @@ -0,0 +1,1246 @@ > > +/* > > + * Copyright (c) 2016-2017, National Instruments Corp. > > + * > > + * Network Driver for Ettus Research XGE MAC > > + * > > + * This is largely based on the Xilinx AXI Ethernet Driver, > > + * and uses the same DMA engine in the FPGA > > Hi Moritz > > Is the DMA code the same as in the AXI driver? Should it be pulled out > into a library and shared? Mostly, I'll see what I can do. At least the register definitions and common structures can be pulled out into a common header file. > > > +struct nixge_priv { > > + struct net_device *ndev; > > + struct device *dev; > > + > > + /* Connection to PHY device */ > > + struct phy_device *phy_dev; > > + phy_interface_t phy_interface; > > > + /* protecting link parameters */ > > + spinlock_t lock; > > + int link; > > + int speed; > > + int duplex; > > All these seem to be pointless. They are set, but never used. Will fix. > > > + > > +static inline void nixge_dma_write_reg(struct nixge_priv *priv, off_t > > offset, > > + u32 val) > > Please leave it up to the compile to inline. Will fix. > > > +static void __nixge_device_reset(struct nixge_priv *priv, off_t offset) > > +{ > > + u32 timeout; > > + /* Reset Axi DMA. This would reset NIXGE Ethernet core as well. > > +* The reset process of Axi DMA takes a while to complete as all > > +* pending commands/transfers will be flushed or completed during > > +* this reset process. > > +*/ > > + nixge_dma_write_reg(priv, offset, XAXIDMA_CR_RESET_MASK); > > + timeout = DELAY_OF_ONE_MILLISEC; > > + while (nixge_dma_read_reg(priv, offset) & XAXIDMA_CR_RESET_MASK) { > > + udelay(1); > > There is a link between the 1 and the value of DELAY_OF_ONE_MILLISEC. > It would be good to try to link these two together. D'oh ... Seems like a good candidate for iopoll ... > > > + if (--timeout == 0) { > > + netdev_err(priv->ndev, "%s: DMA reset timeout!\n", > > + __func__); > > + break; > > + } > > + } > > +} > > + > > > +static void nixge_handle_link_change(struct net_device *ndev) > > +{ > > + struct nixge_priv *priv = netdev_priv(ndev); > > + struct phy_device *phydev = ndev->phydev; > > + unsigned long flags; > > + int status_change = 0; > > + > > + spin_lock_irqsave(&priv->lock, flags); > > + > > + if (phydev->link != priv->link) { > > + if (!phydev->link) { > > + priv->speed = 0; > > + priv->duplex = -1; > > + } > > + priv->link = phydev->link; > > + > > + status_change = 1; > > + } > > + > > + spin_unlock_irqrestore(&priv->lock, flags); > > + > > + if (status_change) { > > + if (phydev->link) { > > + netif_carrier_on(ndev); > > + netdev_info(ndev, "link up (%d/%s)\n", > > + phydev->speed, > > + phydev->duplex == DUPLEX_FULL ? > > + "Full" : "Half"); > > + } else { > > + netif_carrier_off(ndev); > > + netdev_info(ndev, "link down\n"); > > + } > > phy_print_status() should be used. Will do. > > Also, the phylib will handle netif_carrier_off/on for you. Good to know :) > > > +static int nixge_open(struct net_device *ndev) > > +{ > > + struct nixge_priv *priv = netdev_priv(ndev); > > + int ret; > > + > > + nixge_device_reset(ndev); > > + > > + /* start netif carrier down */ > > + netif_carrier_off(ndev); > > + > > + if (!ndev->phydev) > > + netdev_err(ndev, "no phy, phy_start() failed\n"); > > Not really correct. You don't call phy_start(). And phy_start() cannot > indicate a failure, it is a void function. Will fix. > > It would be a lot better to bail out if there is no phy. Probably > during probe. Yeah. > > > +static s32 __nixge_set_mac_address(struct net_device *ndev, const void > > *addr) > > +{ > > + struct nixge_priv *priv = netdev_priv(ndev); > > + > > + if (addr) > > + memcpy(ndev->dev_addr, addr, ETH_ALEN); > > + if (!is_valid_ether_addr(ndev->dev_addr)) > > + eth_random_addr(ndev->dev_addr); > > Messy. I would change this. Make addr mandatory. If it is invalid, > return an error. That will make nixge_net_set_mac_address() do the > right thing. When called from nixge_probe() should verify what it gets > from the nvmem, and if it is invalid, pass a random MAC address. Will fix. > > > + > > + nixge_ctrl_write_reg(priv, NIXGE_REG_MAC_LSB, > > +(ndev->dev_addr[2]) << 24 | > > +(ndev->dev_addr[3] << 16) | > > +(ndev->dev_addr[4]
Re: [PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
[[ Sorry for the Double Send: I forgot to switch to Plain Text. Have I mentioned how much I hate modern Web-based email agents? :-) -- Casey ]] Yeah, I think this works for now. We'll stumble over what to do when we want to mix upstream TLPs without Relaxed Ordering Attributes directed at problematic Root Complexes, and Peer-to-Peer TLPs with Relaxed Ordering Attributes ... or vice versa depending on which target PCIe Device has issues with Relaxed Ordering. Thanks for all the work! Casey
Re: [PATCH] datapath: Fix for force/commit action failures
On 13 July 2017 at 15:38, Greg Rose wrote: > On 07/13/2017 11:03 AM, Joe Stringer wrote: >> >> On 13 July 2017 at 11:01, Greg Rose wrote: >>> >>> On 07/13/2017 10:46 AM, Joe Stringer wrote: On 13 July 2017 at 09:25, Greg Rose wrote: > > > When there is an established connection in direction A->B, it is > possible to receive a packet on port B which then executes > ct(commit,force) without first performing ct() - ie, a lookup. > In this case, we would expect that this packet can delete the existing > entry so that we can commit a connection with direction B->A. However, > currently we only perform a check in skb_nfct_cached() for whether > OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a > lookup previously occurred. In the above scenario, a lookup has not > occurred but we should still be able to statelessly look up the > existing entry and potentially delete the entry if it is in the > opposite direction. > > This patch extends the check to also hint that if the action has the > force flag set, then we will lookup the existing entry so that the > force check at the end of skb_nfct_cached has the ability to delete > the connection. > > CC: d...@openvswitch.org > CC: Pravin Shalar > Signed-off-by: Joe Stringer > Signed-off-by: Greg Rose > --- A couple more administrative notes, on netdev the module name in the patch subject for openvswitch is "openvswitch" rather than datapath; >>> >>> >>> >>> Right you are. >>> and patches rather than having just "PATCH" as the subject prefix should state the tree. In this case, it's a bugfix so it should be "PATCH net". >>> >>> >>> >>> I knew that... forgot the format patch option to add it. Net-next >>> is closed so that would be mandatory. >>> >>> Furthermore, if you're able to figure out which commit introduced the issue (I believe it's introduced by the force commit patch), then you should place the "Fixes: " tag. I can give you some pointers off-list on how to do this (git blame and some basic formatting of the targeted patch should do the trick - this tag expects a 12-digit hash). For reference, I ended up looking it up during review, this is the line you'd add: Fixes: dd41d33f0b03 ("openvswitch: Add force commit.") >>> >>> >>> >>> Oh, thanks! >>> >>> >net/openvswitch/conntrack.c | 12 >1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > index 08679eb..9041cf8 100644 > --- a/net/openvswitch/conntrack.c > +++ b/net/openvswitch/conntrack.c > @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, > ct = nf_ct_get(skb, &ctinfo); > /* If no ct, check if we have evidence that an existing > conntrack entry >* might be found for this skb. This happens when we lose a > skb->_nfct > -* due to an upcall. If the connection was not confirmed, it > is > not > -* cached and needs to be run through conntrack again. > +* due to an upcall, or if the direction is being forced. If > the > +* connection was not confirmed, it is not cached and needs to > be > run > +* through conntrack again. >*/ > - if (!ct && key->ct_state & OVS_CS_F_TRACKED && > + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && > !(key->ct_state & OVS_CS_F_INVALID) && > - key->ct_zone == info->zone.id) { > +key->ct_zone == info->zone.id) || > +(!key->ct_state && info->force)) { > ct = ovs_ct_find_existing(net, &info->zone, > info->family, skb, > !!(key->ct_state >& OVS_CS_F_NAT_MASK)); > if (ct) > nf_ct_get(skb, &ctinfo); > + else > + return false; > } > if (!ct) > return false; I was just wondering if this has the potential to prevent nf_conntrack_in() from being called at all in this case, which is also not quite right. In the original case of (!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll refer to as "ct_executed", we explicitly want to avoid running nf_conntrack_in() if we already ran it, because the connection tracker doesn't expect to see the same packet twice (there's also things like stats/accounting, and potentially L4 state machines that could get messed up by multiple calls). By the time the info->force and direction check happen
Re: [PATCH net-next v1 0/3] Flow Based GTP Tunneling
Hi Harald and Jeo, Thank you for the code review. They are really helpful! > On 7/13/17, 11:14 AM, "Joe Stringer" wrote: > > On 13 July 2017 at 00:12, Harald Welte wrote: >> I'm not familiar with the details here, but does this imply that you >> are matching on the outer (transport layer) source IP address? If so, >> please note this is in violation of the 3GPP specifications for GTP, >> which require explicitly that the TEID is the *only* criteria for >> matching an encapsulated packet to the tunnel. Basically anyone can >> send you an encapsulated packet from any random source IP, just as long >> as the TEID matches a tunnel, it will be decapsulated. >> >> This is [presumably] in order to take care of mobility, as the >> subscribers' phone moves around different MME/S-GW/SGSN, each having >> different source IP addresses. > > I think that this will be hard to avoid; the several existing tunnel > implementations that OVS plugs into all allow matching on the outer > addresses/ports, I don't see a good way to restrict it without > introducing completely new metadata_dst paths for GTP. I'd prefer not > to introduce something like this if we can avoid it; several tunnels > currently share all of the same metadata_dst code and that's proven > sufficient for all cases so far. If someone wishes to implement the > 3GPP standard correctly then they should not create matches like this. > In quite a few cases, OVS tends to take the approach that we give the > user the tools to do what they need to do, but if they wish to shoot > themselves in the foot then that's up to them. We can of course work > towards ensuring the OVS userspace guides users in the right direction > though. The flow listed out here is an example for the nature of the match that can be performed, but the actual match rule that is programmed by the control plane should wildcard the tun_src as noted by Harald. It is the control plane’s responsibility to enforce the 3GPP Specifications, e.g. creating flow rules to make TEID the only criteria for matching an encapsulated packet to the tunnel. I will provide a 3GPP compliant example in the next version, thank Harald for pointing it out. -Jiannan
[PATCH] zd1211rw: fix spelling mistake 'hybernate' -> 'hibernate'
From: Colin Ian King Trivial fix to spelling mistake in PDEBUG debug message. Signed-off-by: Colin Ian King --- drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c b/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c index a93f657a41c7..d4e512f50945 100644 --- a/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c +++ b/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c @@ -61,7 +61,7 @@ static void dump_regwrite(u32 rw) switch (reg) { case 0: - PDEBUG("reg0 CFG1 ref_sel %d hybernate %d rf_vco_reg_en %d" + PDEBUG("reg0 CFG1 ref_sel %d hibernate %d rf_vco_reg_en %d" " if_vco_reg_en %d if_vga_en %d", bits(rw, 14, 15), bit(rw, 3), bit(rw, 2), bit(rw, 1), bit(rw, 0)); -- 2.11.0
Re: [PATCH] datapath: Fix for force/commit action failures
On 07/13/2017 11:03 AM, Joe Stringer wrote: On 13 July 2017 at 11:01, Greg Rose wrote: On 07/13/2017 10:46 AM, Joe Stringer wrote: On 13 July 2017 at 09:25, Greg Rose wrote: When there is an established connection in direction A->B, it is possible to receive a packet on port B which then executes ct(commit,force) without first performing ct() - ie, a lookup. In this case, we would expect that this packet can delete the existing entry so that we can commit a connection with direction B->A. However, currently we only perform a check in skb_nfct_cached() for whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a lookup previously occurred. In the above scenario, a lookup has not occurred but we should still be able to statelessly look up the existing entry and potentially delete the entry if it is in the opposite direction. This patch extends the check to also hint that if the action has the force flag set, then we will lookup the existing entry so that the force check at the end of skb_nfct_cached has the ability to delete the connection. CC: d...@openvswitch.org CC: Pravin Shalar Signed-off-by: Joe Stringer Signed-off-by: Greg Rose --- A couple more administrative notes, on netdev the module name in the patch subject for openvswitch is "openvswitch" rather than datapath; Right you are. and patches rather than having just "PATCH" as the subject prefix should state the tree. In this case, it's a bugfix so it should be "PATCH net". I knew that... forgot the format patch option to add it. Net-next is closed so that would be mandatory. Furthermore, if you're able to figure out which commit introduced the issue (I believe it's introduced by the force commit patch), then you should place the "Fixes: " tag. I can give you some pointers off-list on how to do this (git blame and some basic formatting of the targeted patch should do the trick - this tag expects a 12-digit hash). For reference, I ended up looking it up during review, this is the line you'd add: Fixes: dd41d33f0b03 ("openvswitch: Add force commit.") Oh, thanks! net/openvswitch/conntrack.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 08679eb..9041cf8 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, ct = nf_ct_get(skb, &ctinfo); /* If no ct, check if we have evidence that an existing conntrack entry * might be found for this skb. This happens when we lose a skb->_nfct -* due to an upcall. If the connection was not confirmed, it is not -* cached and needs to be run through conntrack again. +* due to an upcall, or if the direction is being forced. If the +* connection was not confirmed, it is not cached and needs to be run +* through conntrack again. */ - if (!ct && key->ct_state & OVS_CS_F_TRACKED && + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_INVALID) && - key->ct_zone == info->zone.id) { +key->ct_zone == info->zone.id) || +(!key->ct_state && info->force)) { ct = ovs_ct_find_existing(net, &info->zone, info->family, skb, !!(key->ct_state & OVS_CS_F_NAT_MASK)); if (ct) nf_ct_get(skb, &ctinfo); + else + return false; } if (!ct) return false; I was just wondering if this has the potential to prevent nf_conntrack_in() from being called at all in this case, which is also not quite right. In the original case of (!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll refer to as "ct_executed", we explicitly want to avoid running nf_conntrack_in() if we already ran it, because the connection tracker doesn't expect to see the same packet twice (there's also things like stats/accounting, and potentially L4 state machines that could get messed up by multiple calls). By the time the info->force and direction check happens at the end of the function, "ct_executed" is implied to be true. However, in this new case, ct_executed is actually false - because there was no ct() before the ct(force,commit). In this case, we only want to look up the existing entry to see if it should be deleted; if it should not be deleted, then we still haven't yet done the nf_conntrack_in() call so we should return false and the caller, __ovs_ct_lookup() should make the call to nf_conntrack_in(). What I mean is something like the following incremental on your patch: diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 9041cf8b822f..98783f114824 100644 --- a/net/openvswitch/
Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev
> +++ b/drivers/net/ethernet/ni/nixge.c > @@ -0,0 +1,1246 @@ > +/* > + * Copyright (c) 2016-2017, National Instruments Corp. > + * > + * Network Driver for Ettus Research XGE MAC > + * > + * This is largely based on the Xilinx AXI Ethernet Driver, > + * and uses the same DMA engine in the FPGA Hi Moritz Is the DMA code the same as in the AXI driver? Should it be pulled out into a library and shared? > +struct nixge_priv { > + struct net_device *ndev; > + struct device *dev; > + > + /* Connection to PHY device */ > + struct phy_device *phy_dev; > + phy_interface_t phy_interface; > + /* protecting link parameters */ > + spinlock_t lock; > + int link; > + int speed; > + int duplex; All these seem to be pointless. They are set, but never used. > + > +static inline void nixge_dma_write_reg(struct nixge_priv *priv, off_t offset, > +u32 val) Please leave it up to the compile to inline. > +static void __nixge_device_reset(struct nixge_priv *priv, off_t offset) > +{ > + u32 timeout; > + /* Reset Axi DMA. This would reset NIXGE Ethernet core as well. > + * The reset process of Axi DMA takes a while to complete as all > + * pending commands/transfers will be flushed or completed during > + * this reset process. > + */ > + nixge_dma_write_reg(priv, offset, XAXIDMA_CR_RESET_MASK); > + timeout = DELAY_OF_ONE_MILLISEC; > + while (nixge_dma_read_reg(priv, offset) & XAXIDMA_CR_RESET_MASK) { > + udelay(1); There is a link between the 1 and the value of DELAY_OF_ONE_MILLISEC. It would be good to try to link these two together. > + if (--timeout == 0) { > + netdev_err(priv->ndev, "%s: DMA reset timeout!\n", > +__func__); > + break; > + } > + } > +} > + > +static void nixge_handle_link_change(struct net_device *ndev) > +{ > + struct nixge_priv *priv = netdev_priv(ndev); > + struct phy_device *phydev = ndev->phydev; > + unsigned long flags; > + int status_change = 0; > + > + spin_lock_irqsave(&priv->lock, flags); > + > + if (phydev->link != priv->link) { > + if (!phydev->link) { > + priv->speed = 0; > + priv->duplex = -1; > + } > + priv->link = phydev->link; > + > + status_change = 1; > + } > + > + spin_unlock_irqrestore(&priv->lock, flags); > + > + if (status_change) { > + if (phydev->link) { > + netif_carrier_on(ndev); > + netdev_info(ndev, "link up (%d/%s)\n", > + phydev->speed, > + phydev->duplex == DUPLEX_FULL ? > + "Full" : "Half"); > + } else { > + netif_carrier_off(ndev); > + netdev_info(ndev, "link down\n"); > + } phy_print_status() should be used. Also, the phylib will handle netif_carrier_off/on for you. > +static int nixge_open(struct net_device *ndev) > +{ > + struct nixge_priv *priv = netdev_priv(ndev); > + int ret; > + > + nixge_device_reset(ndev); > + > + /* start netif carrier down */ > + netif_carrier_off(ndev); > + > + if (!ndev->phydev) > + netdev_err(ndev, "no phy, phy_start() failed\n"); Not really correct. You don't call phy_start(). And phy_start() cannot indicate a failure, it is a void function. It would be a lot better to bail out if there is no phy. Probably during probe. > +static s32 __nixge_set_mac_address(struct net_device *ndev, const void *addr) > +{ > + struct nixge_priv *priv = netdev_priv(ndev); > + > + if (addr) > + memcpy(ndev->dev_addr, addr, ETH_ALEN); > + if (!is_valid_ether_addr(ndev->dev_addr)) > + eth_random_addr(ndev->dev_addr); Messy. I would change this. Make addr mandatory. If it is invalid, return an error. That will make nixge_net_set_mac_address() do the right thing. When called from nixge_probe() should verify what it gets from the nvmem, and if it is invalid, pass a random MAC address. > + > + nixge_ctrl_write_reg(priv, NIXGE_REG_MAC_LSB, > + (ndev->dev_addr[2]) << 24 | > + (ndev->dev_addr[3] << 16) | > + (ndev->dev_addr[4] << 8) | > + (ndev->dev_addr[5] << 0)); > + > + nixge_ctrl_write_reg(priv, NIXGE_REG_MAC_MSB, > + (ndev->dev_addr[1] | (ndev->dev_addr[0] << 8))); > + > + return 0; > +} > + > +static void nixge_ethtools_get_drvinfo(struct net_device *ndev, > +struct ethtool_drvinfo *ed) > +{ > + strlcpy(ed->driver, "nixge", sizeof(ed->driver)); > + strlcpy(ed->version, "1.00a", sizeof(ed->version)); Driver version is pretty poin
Re: [PATCH] igb: fix unused igb_deliver_wake_packet() warning when CONFIG_PM=n
On Wed, 2017-07-12 at 18:23 -0300, Fabio Estevam wrote: > On Wed, Jul 12, 2017 at 6:09 PM, Dave Hansen > wrote: > > > > From: Dave Hansen > > > > I'm seeing warnings on kernel configurations where CONFIG_PM is > > disabled. It happens in 4.12, at least: > > > > drivers/ethernet/intel/igb/igb_main.c:7988:13: warning: > > 'igb_deliver_wake_packet' defined but not used [-Wunused-function] > > > > This is because igb_deliver_wake_packet() is defined outside of > > the #ifdef", but is used only a single time within the #ifdef in > > igb_resume(). Fix it by moving igb_deliver_wake_packet() next to > > igb_resume() inside the #ifdef. > > > > The diff ends up looking a bit funky here. It *looks* like > > igb_suspend() is getting moved, but that's an artifact of how > > 'diff' sees the changes. > > > > Cc: Jeff Kirsher > > Cc: intel-wired-...@lists.osuosl.org > > Cc: netdev@vger.kernel.org > > Signed-off-by: Dave Hansen > > Seems to be fixed by the following commit in linux-next: > > commit 000ba1f2ebf0d6f93b9ae6cfbe5417e66f1b8e8c > Author: Arnd Bergmann > Date: Thu Apr 27 21:09:52 2017 +0200 > > igb: mark PM functions as __maybe_unused > > The new wake function is only used by the suspend/resume handlers > that > are defined in inside of an #ifdef, which can cause this harmless > warning: > > drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: > 'igb_deliver_wake_packet' defined but not used [-Wunused-function] > > Removing the #ifdef, instead using a __maybe_unused annotation > simplifies the code and avoids the warning. > > Fixes: b90fa8763560 ("igb: Enable reading of wake up packet") > Signed-off-by: Arnd Bergmann > Tested-by: Aaron Brown > Signed-off-by: Jeff Kirsher Yeah, this is already fixed in my tree. Dropping this patch. signature.asc Description: This is a digitally signed message part
[PATCH 1/2] dt-bindings: net: Add bindings for National Instruments XGE netdev
This adds bindings for the NI XGE 1G/10G network device. Signed-off-by: Moritz Fischer --- Documentation/devicetree/bindings/net/nixge.c | 32 +++ 1 file changed, 32 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/nixge.c diff --git a/Documentation/devicetree/bindings/net/nixge.c b/Documentation/devicetree/bindings/net/nixge.c new file mode 100644 index 000..9fff5a7 --- /dev/null +++ b/Documentation/devicetree/bindings/net/nixge.c @@ -0,0 +1,32 @@ +* NI XGE Ethernet controller + +Required properties: +- compatible: Should be "ni,xge-enet-2.00" +- reg: Address and length of the register set for the device +- interrupts: Should contain tx and rx interrupt +- interrupt-names: Should be "rx-irq" and "tx-irq" +- phy-mode: See ethernet.txt file in the same directory. +- nvmem-cells: Phandle of nvmem cell containing the mac address +- nvmem-cell-names: Should be "address" + +Examples (10G generic PHY): + nixge0: ethernet@4000 { + compatible = "ni,xge-enet-2.00"; + reg = <0x4000 0x6000>; + + nvmem-cells = <ð1_addr>; + nvmem-cell-names = "address"; + + interrupts = <0 29 4>, <0 30 4>; + interrupt-names = "rx-irq", "tx-irq"; + interrupt-parent = <&intc>; + + phy-mode = "xgmii"; + phy-handle = <ðernet_phy1>; + + ethernet_phy1: ethernet-phy@4 { + compatible = "ethernet-phy-ieee802.3-c45"; + reg = <4>; + devices = <0xa>; + }; + }; -- 2.7.4
[PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev
Add support for the National Instruments XGE 1/10G network device. It uses the EEPROM on the board via NVMEM. Signed-off-by: Moritz Fischer --- drivers/net/ethernet/Kconfig |1 + drivers/net/ethernet/Makefile|1 + drivers/net/ethernet/ni/Kconfig | 26 + drivers/net/ethernet/ni/Makefile |1 + drivers/net/ethernet/ni/nixge.c | 1246 ++ 5 files changed, 1275 insertions(+) create mode 100644 drivers/net/ethernet/ni/Kconfig create mode 100644 drivers/net/ethernet/ni/Makefile create mode 100644 drivers/net/ethernet/ni/nixge.c diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig index edae15ac..2021806 100644 --- a/drivers/net/ethernet/Kconfig +++ b/drivers/net/ethernet/Kconfig @@ -127,6 +127,7 @@ config FEALNX source "drivers/net/ethernet/natsemi/Kconfig" source "drivers/net/ethernet/netronome/Kconfig" +source "drivers/net/ethernet/ni/Kconfig" source "drivers/net/ethernet/8390/Kconfig" config NET_NETX diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile index bf7f450..68f49f7 100644 --- a/drivers/net/ethernet/Makefile +++ b/drivers/net/ethernet/Makefile @@ -58,6 +58,7 @@ obj-$(CONFIG_NET_VENDOR_MYRI) += myricom/ obj-$(CONFIG_FEALNX) += fealnx.o obj-$(CONFIG_NET_VENDOR_NATSEMI) += natsemi/ obj-$(CONFIG_NET_VENDOR_NETRONOME) += netronome/ +obj-$(CONFIG_NET_VENDOR_NI) += ni/ obj-$(CONFIG_NET_NETX) += netx-eth.o obj-$(CONFIG_NET_VENDOR_NUVOTON) += nuvoton/ obj-$(CONFIG_NET_VENDOR_NVIDIA) += nvidia/ diff --git a/drivers/net/ethernet/ni/Kconfig b/drivers/net/ethernet/ni/Kconfig new file mode 100644 index 000..a74ffeb --- /dev/null +++ b/drivers/net/ethernet/ni/Kconfig @@ -0,0 +1,26 @@ +# +# National Instuments network device configuration +# + +config NET_VENDOR_NI + bool "National Instruments Devices" + default y + ---help--- + If you have a network (Ethernet) device belonging to this class, say Y. + + Note that the answer to this question doesn't directly affect the + kernel: saying N will just cause the configurator to skip all + the questions about National Instrument devices. + If you say Y, you will be asked for your specific device in the + following questions. + +if NET_VENDOR_NI + +config NI_XGE_MANAGEMENT_ENET + tristate "National Instruments XGE management enet support" + select PHYLIB + ---help--- + Simple LAN device for debug or management purposes. Can + support either 10G or 1G PHYs via SFP+ ports. + +endif diff --git a/drivers/net/ethernet/ni/Makefile b/drivers/net/ethernet/ni/Makefile new file mode 100644 index 000..99c6646 --- /dev/null +++ b/drivers/net/ethernet/ni/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_NI_XGE_MANAGEMENT_ENET) += nixge.o diff --git a/drivers/net/ethernet/ni/nixge.c b/drivers/net/ethernet/ni/nixge.c new file mode 100644 index 000..85b213c --- /dev/null +++ b/drivers/net/ethernet/ni/nixge.c @@ -0,0 +1,1246 @@ +/* + * Copyright (c) 2016-2017, National Instruments Corp. + * + * Network Driver for Ettus Research XGE MAC + * + * This is largely based on the Xilinx AXI Ethernet Driver, + * and uses the same DMA engine in the FPGA + * + * SPDX-License-Identifier: GPL-2.0 + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define TX_BD_NUM 64 +#define RX_BD_NUM 128 + +#define DELAY_OF_ONE_MILLISEC 1000 + +/* Axi DMA Register definitions */ + +#define XAXIDMA_TX_CR_OFFSET 0x /* Channel control */ +#define XAXIDMA_TX_SR_OFFSET 0x0004 /* Status */ +#define XAXIDMA_TX_CDESC_OFFSET0x0008 /* Current descriptor pointer */ +#define XAXIDMA_TX_TDESC_OFFSET0x0010 /* Tail descriptor pointer */ + +#define XAXIDMA_RX_CR_OFFSET 0x0030 /* Channel control */ +#define XAXIDMA_RX_SR_OFFSET 0x0034 /* Status */ +#define XAXIDMA_RX_CDESC_OFFSET0x0038 /* Current descriptor pointer */ +#define XAXIDMA_RX_TDESC_OFFSET0x0040 /* Tail descriptor pointer */ + +#define XAXIDMA_CR_RUNSTOP_MASK0x0001 /* Start/stop DMA channel */ +#define XAXIDMA_CR_RESET_MASK 0x0004 /* Reset DMA engine */ + +#define XAXIDMA_BD_NDESC_OFFSET0x00 /* Next descriptor pointer */ +#define XAXIDMA_BD_BUFA_OFFSET 0x08 /* Buffer address */ +#define XAXIDMA_BD_CTRL_LEN_OFFSET 0x18 /* Control/buffer length */ +#define XAXIDMA_BD_STS_OFFSET 0x1C /* Status */ +#define XAXIDMA_BD_USR0_OFFSET 0x20 /* User IP specific word0 */ +#define XAXIDMA_BD_USR1_OFFSET 0x24 /* User IP specific word1 */ +#define XAXIDMA_BD_USR2_OFFSET 0x28 /* User IP specific word2 */ +#define XAXIDMA_BD_USR3_OFFSET 0x2C /* User IP specific word3 */ +#define XAXIDMA_BD_USR4_OFFSET 0x30 /* User IP speci
Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported
On 7/13/2017 10:21 AM, Ding Tianhong wrote: > static void pci_configure_relaxed_ordering(struct pci_dev *dev) > +{ > + /* We should not alter the relaxed ordering bit for the VF */ > + if (dev->is_virtfn) > + return; > + > + /* If the releaxed ordering enable bit is not set, do nothing. */ > + if (!pcie_relaxed_ordering_supported(dev)) > + return; > + > + if (pci_dev_should_disable_relaxed_ordering(dev)) { > + pcie_clear_relaxed_ordering(dev); > + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); > + } > +} I couldn't find anywhere where you actually enable the relaxed ordering like the subject suggests. -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule
On Thu, Jul 13, 2017 at 02:39:10PM -0600, David Ahern wrote: > On 7/13/17 2:33 PM, Ido Schimmel wrote: > > Remains where? It's not clear to me how you concluded mlxsw is at fault. > > My setup is running net-next with the refcount patches and I didn't > > observe this. > > Create a VRF. BTW, this didn't show up on my dev branch as I've patches that introduce IPv6 support where I move the rules notifications to core, after the refcount is set to 1 and just before the netlink notification is sent. https://github.com/idosch/linux/commit/7b17a21b1d71fc9a1969080e5fdcb90f376b73b2
Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule
On Thu, Jul 13, 2017 at 02:39:10PM -0600, David Ahern wrote: > On 7/13/17 2:33 PM, Ido Schimmel wrote: > > Remains where? It's not clear to me how you concluded mlxsw is at fault. > > My setup is running net-next with the refcount patches and I didn't > > observe this. > > Create a VRF. Yea, I wasn't running VRFs with the refcount patches. Reproduced this on my system now. Thanks for the fix.
[PATCH] [for 4.13] net: qcom/emac: fix double free of SGMII IRQ during shutdown
If the interface is not up, then don't try to close it during a shutdown. This avoids possible double free of the IRQ, which can happen during a shutdown. Fixes: 03eb3eb4d4d5 ("net: qcom/emac: add shutdown function") Signed-off-by: Timur Tabi --- drivers/net/ethernet/qualcomm/emac/emac.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c index 746d94e..60850bf 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac.c +++ b/drivers/net/ethernet/qualcomm/emac/emac.c @@ -766,11 +766,13 @@ static void emac_shutdown(struct platform_device *pdev) struct emac_adapter *adpt = netdev_priv(netdev); struct emac_sgmii *sgmii = &adpt->phy; - /* Closing the SGMII turns off its interrupts */ - sgmii->close(adpt); + if (netdev->flags & IFF_UP) { + /* Closing the SGMII turns off its interrupts */ + sgmii->close(adpt); - /* Resetting the MAC turns off all DMA and its interrupts */ - emac_mac_reset(adpt); + /* Resetting the MAC turns off all DMA and its interrupts */ + emac_mac_reset(adpt); + } } static struct platform_driver emac_platform_driver = { -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()
From: Alexander Potapenko Date: Thu, 13 Jul 2017 21:28:39 +0200 > On Thu, Jul 13, 2017 at 8:32 PM, David Miller wrote: >> struct sctp_paramhdr { >> __be16 type; >> __be16 length; >> }; >> >> typedef struct sctp_errhdr { >> __be16 cause; >> __be16 length; >> __u8 variable[0]; >> } sctp_errhdr_t; ... >> Something like: >> >> pos.v + offsetof(pos.v, length) + sizeof(pos.v->length) <= (void *) >> chunk + end > > Do we need to bother about truncated structures? Shouldn't it be > enough to check that there's at least sizeof(struct sctp_paramhdr) > bytes left then? With the zero length array at the end, it's arguable what the "size" of such a thing is. That's why I tried to be explicit with the length field.
Re: [PATCH net] net: set fib rule refcount after malloc
From: David Ahern Date: Thu, 13 Jul 2017 13:36:40 -0700 > The configure callback of fib_rules_ops can change the refcnt of a > fib rule. For instance, mlxsw takes a refcnt when adding the processing > of the rule to a work queue. Thus the rule refcnt can not be reset to > to 1 afterwards. Move the refcnt setting to after the allocation. > > Fixes: 5361e209dd30 ("net: avoid one splat in fib_nl_delrule()") > Signed-off-by: David Ahern Can someone keep a scoreboard of how many bugs and arbitrary unnecessary changes get introduced because of this new refcount_t business rather than get fixed? I'd really like to see how that pans out over the long term especially with all of the backporting pain this facility is going to cause. Anyways, David I will apply this, thanks.
Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule
On 7/13/17 2:33 PM, Ido Schimmel wrote: > Remains where? It's not clear to me how you concluded mlxsw is at fault. > My setup is running net-next with the refcount patches and I didn't > observe this. Create a VRF. see latest patch. mlxsw releasing the refcnt on the rule was the victim; eric's patch to fix a delete was setting the refcnt to 1 after mlxsw bumped it.
[PATCH net] net: set fib rule refcount after malloc
The configure callback of fib_rules_ops can change the refcnt of a fib rule. For instance, mlxsw takes a refcnt when adding the processing of the rule to a work queue. Thus the rule refcnt can not be reset to to 1 afterwards. Move the refcnt setting to after the allocation. Fixes: 5361e209dd30 ("net: avoid one splat in fib_nl_delrule()") Signed-off-by: David Ahern --- net/core/fib_rules.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index a0093e1b0235..fdcb1bcd2afa 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -400,6 +400,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh, err = -ENOMEM; goto errout; } + refcount_set(&rule->refcnt, 1); rule->fr_net = net; rule->pref = tb[FRA_PRIORITY] ? nla_get_u32(tb[FRA_PRIORITY]) @@ -517,8 +518,6 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh, last = r; } - refcount_set(&rule->refcnt, 1); - if (last) list_add_rcu(&rule->list, &last->list); else -- 2.1.4
Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule
On Thu, Jul 13, 2017 at 01:46:15PM -0600, David Ahern wrote: > On 7/13/17 1:11 PM, David Ahern wrote: > > Since mlxsw is not doing a get on the rule to increase the ref count, it > > should not be doing a put. > > upon further review, mlxsw is doing a get on the rule > > Problem remains, but this is not the right fix. Remains where? It's not clear to me how you concluded mlxsw is at fault. My setup is running net-next with the refcount patches and I didn't observe this. If current trace isn't enough to pinpoint the problem, can you try to reproduce with a KASAN enabled kernel?
Re: [PATCH net] libceph: osdmap: Fix some NULL dereferences
On Thu, Jul 13, 2017 at 06:13:22PM +0200, Ilya Dryomov wrote: > > Hi Dan, > > I applied osdmap_apply_incremental() hunk and fixed a similar bug in > osdmap_decode() (it's a not NULL deref, that's why smatch didn't catch > it). > No... :/ Smatch complained about it, but I somehow marked it as done yesterday without sending a patch. Thanks for catching that. The Smatch check is just looking for places where we do ERR_PTR(0) or ERR_PTR(valid_ptr). That can be intentional, so there is some hand reviewing involved. regards, dan carpenter
Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule
On 7/13/17 1:11 PM, David Ahern wrote: > Since mlxsw is not doing a get on the rule to increase the ref count, it > should not be doing a put. upon further review, mlxsw is doing a get on the rule Problem remains, but this is not the right fix.
Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()
On Thu, Jul 13, 2017 at 8:32 PM, David Miller wrote: > From: Alexander Potapenko > Date: Thu, 13 Jul 2017 20:10:34 +0200 > >> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h >> index a9519a06a23b..f13632ee33f0 100644 >> --- a/include/net/sctp/sctp.h >> +++ b/include/net/sctp/sctp.h >> @@ -469,6 +469,7 @@ _sctp_walk_params((pos), (chunk), >> ntohs((chunk)->chunk_hdr.length), member) >> >> #define _sctp_walk_params(pos, chunk, end, member)\ >> for (pos.v = chunk->member;\ >> + pos.v < (void *)chunk + end &&\ >> pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\ >> ntohs(pos.p->length) >= sizeof(struct sctp_paramhdr);\ >> pos.v += SCTP_PAD4(ntohs(pos.p->length))) >> @@ -479,6 +480,7 @@ _sctp_walk_errors((err), (chunk_hdr), >> ntohs((chunk_hdr)->length)) >> #define _sctp_walk_errors(err, chunk_hdr, end)\ >> for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \ >> sizeof(struct sctp_chunkhdr));\ >> + (void *)err <= (void *)chunk_hdr + end &&\ >> (void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\ >> ntohs(err->length) >= sizeof(sctp_errhdr_t); \ >> err = (sctp_errhdr_t *)((void *)err + SCTP_PAD4(ntohs(err->length > > Even with the "err < ..." fixed in the second hunk, I still think you need > to tweak these checks some more. > > What is necessary is that the first two members of sctp_paramhdr or > sctp_errhdr are in the range ptr to end. > > struct sctp_paramhdr { > __be16 type; > __be16 length; > }; > > typedef struct sctp_errhdr { > __be16 cause; > __be16 length; > __u8 variable[0]; > } sctp_errhdr_t; > > so that we can legally dereference ->length. > > But that is not what your new check is doing. Your new check is only > making sure there is at least one byte there. > > I guess it's not easy to write a clean test that doesn't hardcode the > offset of the length field and it's size. > > Something like: > > pos.v + offsetof(pos.v, length) + sizeof(pos.v->length) <= (void *) > chunk + end Do we need to bother about truncated structures? Shouldn't it be enough to check that there's at least sizeof(struct sctp_paramhdr) bytes left then? > and > > (void *)err + offsetof(err, length) + sizeof(err->length) <= (void *) > chunk_hdr + end > > And yeah, that isn't exactly concise nor pretty... -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg
[PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule
The recent conversion to refcount_t, 717d1e993ad8 ("net: convert fib_rule.refcnt from atomic_t to refcount_t"), and subsequent fix by Eric, 5361e209dd30 ("net: avoid one splat in fib_nl_delrule()"), exposed a bug in mlxsw. The driver is doing a put on fib rules after processing it from the notifier. This triggers a BUG on: [ 104.444889] BUG: unable to handle kernel NULL pointer dereference at 0010 [ 104.452821] IP: fib_rules_lookup+0x39/0x170 [ 104.457056] PGD 409395067 [ 104.457057] P4D 409395067 [ 104.459783] PUD 408c23067 [ 104.462507] PMD 0 ... [ 104.519750] CPU: 1 PID: 900 Comm: vrf Tainted: GW 4.12.0-rc7+ #51 [ 104.527133] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015 [ 104.537084] task: 880401454380 task.stack: c97c [ 104.543029] RIP: 0010:fib_rules_lookup+0x39/0x170 [ 104.547784] RSP: :88041dd039d8 EFLAGS: 00010207 [ 104.553053] RAX: d8e1b910 RBX: RCX: 0002 [ 104.560264] RDX: fff5 RSI: RDI: 880408d80f30 [ 104.567461] RBP: 88041dd03a08 R08: 001d R09: [ 104.574699] R10: R11: 0006 R12: 88040b160cc0 [ 104.581916] R13: 88041dd03a18 R14: 88040b160d40 R15: 88041dd03aa0 [ 104.589130] FS: 7f44b0edf700() GS:88041dd0() knlGS: [ 104.597330] CS: 0010 DS: ES: CR0: 80050033 [ 104.603151] CR2: 0010 CR3: 000408ca8000 CR4: 001406e0 [ 104.610371] Call Trace: [ 104.612839] [ 104.614872] __fib_lookup+0x54/0x90 [ 104.618406] fib_validate_source+0x31d/0x570 [ 104.622731] ? fib_rules_lookup+0x131/0x170 [ 104.626975] ? __fib_lookup+0x54/0x90 [ 104.630685] ip_route_input_rcu+0xbcf/0xd30 Since mlxsw is not doing a get on the rule to increase the ref count, it should not be doing a put. Fixes: 5d7bfd141924a("ipv4: fib_rules: Dump FIB rules when registering FIB notifier") Signed-off-by: David Ahern --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 383fef5a8e24..b0fb8e5e83c9 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -2844,7 +2844,6 @@ static void mlxsw_sp_router_fib_event_work(struct work_struct *work) rule = fib_work->fr_info.rule; if (!fib4_rule_default(rule) && !rule->l3mdev) mlxsw_sp_router_fib4_abort(mlxsw_sp); - fib_rule_put(rule); break; case FIB_EVENT_NH_ADD: /* fall through */ case FIB_EVENT_NH_DEL: -- 2.1.4
[PATCH 3/3] Documentation: devicetree: net: optional idm regs for bgmac
Specifying IDM register space in DT is not mendatory for SoCs where firmware takes care of IDM operations. This patch updates BGMAC driver's DT binding documentation indicating the same. Signed-off-by: Abhishek Shah Reviewed-by: Ray Jui Reviewed-by: Oza Oza Reviewed-by: Scott Branden --- Documentation/devicetree/bindings/net/brcm,amac.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/net/brcm,amac.txt b/Documentation/devicetree/bindings/net/brcm,amac.txt index 2fefa1a..ad16c1f 100644 --- a/Documentation/devicetree/bindings/net/brcm,amac.txt +++ b/Documentation/devicetree/bindings/net/brcm,amac.txt @@ -11,6 +11,7 @@ Required properties: - reg-names: Names of the registers. "amac_base":Address and length of the GMAC registers "idm_base": Address and length of the GMAC IDM registers + (required for NSP and Northstar2) "nicpm_base": Address and length of the NIC Port Manager registers (required for Northstar2) - interrupts: Interrupt number -- 2.7.4
[PATCH 2/3] net: ethernet: bgmac: Make IDM register space optional
IDM operations are usually one time ops and should be done in firmware itself. Driver is not supposed to touch IDM registers. However, for some SoCs', driver is performing IDM read/writes. So this patch masks IDM operations in case firmware is taking care of IDM operations. Signed-off-by: Abhishek Shah Reviewed-by: Oza Oza Reviewed-by: Ray Jui Reviewed-by: Scott Branden --- drivers/net/ethernet/broadcom/bgmac-platform.c | 19 --- drivers/net/ethernet/broadcom/bgmac.c | 70 +++--- drivers/net/ethernet/broadcom/bgmac.h | 1 + 3 files changed, 55 insertions(+), 35 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c b/drivers/net/ethernet/broadcom/bgmac-platform.c index 1ca75de..d937083d 100644 --- a/drivers/net/ethernet/broadcom/bgmac-platform.c +++ b/drivers/net/ethernet/broadcom/bgmac-platform.c @@ -55,6 +55,9 @@ static void platform_bgmac_idm_write(struct bgmac *bgmac, u16 offset, u32 value) static bool platform_bgmac_clk_enabled(struct bgmac *bgmac) { + if (!bgmac->plat.idm_base) + return true; + if ((bgmac_idm_read(bgmac, BCMA_IOCTL) & BGMAC_CLK_EN) != BGMAC_CLK_EN) return false; if (bgmac_idm_read(bgmac, BCMA_RESET_CTL) & BCMA_RESET_CTL_RESET) @@ -66,6 +69,9 @@ static void platform_bgmac_clk_enable(struct bgmac *bgmac, u32 flags) { u32 val; + if (!bgmac->plat.idm_base) + return; + /* The Reset Control register only contains a single bit to show if the * controller is currently in reset. Do a sanity check here, just in * case the bootloader happened to leave the device in reset. @@ -180,6 +186,7 @@ static int bgmac_probe(struct platform_device *pdev) bgmac->feature_flags |= BGMAC_FEAT_CMDCFG_SR_REV4; bgmac->feature_flags |= BGMAC_FEAT_TX_MASK_SETUP; bgmac->feature_flags |= BGMAC_FEAT_RX_MASK_SETUP; + bgmac->feature_flags |= BGMAC_FEAT_IDM_MASK; bgmac->dev = &pdev->dev; bgmac->dma_dev = &pdev->dev; @@ -207,15 +214,13 @@ static int bgmac_probe(struct platform_device *pdev) return PTR_ERR(bgmac->plat.base); regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "idm_base"); - if (!regs) { - dev_err(&pdev->dev, "Unable to obtain idm resource\n"); - return -EINVAL; + if (regs) { + bgmac->plat.idm_base = devm_ioremap_resource(&pdev->dev, regs); + if (IS_ERR(bgmac->plat.idm_base)) + return PTR_ERR(bgmac->plat.idm_base); + bgmac->feature_flags &= ~BGMAC_FEAT_IDM_MASK; } - bgmac->plat.idm_base = devm_ioremap_resource(&pdev->dev, regs); - if (IS_ERR(bgmac->plat.idm_base)) - return PTR_ERR(bgmac->plat.idm_base); - regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "nicpm_base"); if (regs) { bgmac->plat.nicpm_base = devm_ioremap_resource(&pdev->dev, diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c index ba4d2e1..48d672b 100644 --- a/drivers/net/ethernet/broadcom/bgmac.c +++ b/drivers/net/ethernet/broadcom/bgmac.c @@ -622,9 +622,11 @@ static int bgmac_dma_alloc(struct bgmac *bgmac) BUILD_BUG_ON(BGMAC_MAX_TX_RINGS > ARRAY_SIZE(ring_base)); BUILD_BUG_ON(BGMAC_MAX_RX_RINGS > ARRAY_SIZE(ring_base)); - if (!(bgmac_idm_read(bgmac, BCMA_IOST) & BCMA_IOST_DMA64)) { - dev_err(bgmac->dev, "Core does not report 64-bit DMA\n"); - return -ENOTSUPP; + if (!(bgmac->feature_flags & BGMAC_FEAT_IDM_MASK)) { + if (!(bgmac_idm_read(bgmac, BCMA_IOST) & BCMA_IOST_DMA64)) { + dev_err(bgmac->dev, "Core does not report 64-bit DMA\n"); + return -ENOTSUPP; + } } for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) { @@ -855,9 +857,11 @@ static void bgmac_mac_speed(struct bgmac *bgmac) static void bgmac_miiconfig(struct bgmac *bgmac) { if (bgmac->feature_flags & BGMAC_FEAT_FORCE_SPEED_2500) { - bgmac_idm_write(bgmac, BCMA_IOCTL, - bgmac_idm_read(bgmac, BCMA_IOCTL) | 0x40 | - BGMAC_BCMA_IOCTL_SW_CLKEN); + if (!(bgmac->feature_flags & BGMAC_FEAT_IDM_MASK)) { + bgmac_idm_write(bgmac, BCMA_IOCTL, + bgmac_idm_read(bgmac, BCMA_IOCTL) | + 0x40 | BGMAC_BCMA_IOCTL_SW_CLKEN); + } bgmac->mac_speed = SPEED_2500; bgmac->mac_duplex = DUPLEX_FULL; bgmac_mac_speed(bgmac); @@ -874,11 +878,36 @@ static void bgmac_miiconfig(struct bgmac *bgmac) } } +static void bgmac_chip_reset_idm_config(struct bgmac *bgmac) +{ + u32 iost; + + iost = bgmac_idm_read(bgmac, BCMA_IOST); +
[PATCH 1/3] net: ethernet: bgmac: Remove unnecessary 'return' from platform_bgmac_idm_write
Return type for idm register write callback should be void as 'writel' API is used for write operation. However, there no need to have 'return' in this function. Signed-off-by: Abhishek Shah Reviewed-by: Oza Oza Reviewed-by: Ray Jui Reviewed-by: Scott Branden --- drivers/net/ethernet/broadcom/bgmac-platform.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c b/drivers/net/ethernet/broadcom/bgmac-platform.c index 73aca97..1ca75de 100644 --- a/drivers/net/ethernet/broadcom/bgmac-platform.c +++ b/drivers/net/ethernet/broadcom/bgmac-platform.c @@ -50,7 +50,7 @@ static u32 platform_bgmac_idm_read(struct bgmac *bgmac, u16 offset) static void platform_bgmac_idm_write(struct bgmac *bgmac, u16 offset, u32 value) { - return writel(value, bgmac->plat.idm_base + offset); + writel(value, bgmac->plat.idm_base + offset); } static bool platform_bgmac_clk_enabled(struct bgmac *bgmac) -- 2.7.4
[PATCH 0/3] Extend BGMAC driver for Stingray SoC
The patchset extends Broadcom BGMAC driver for Broadcom Stingray SoC. This patchset is based on Linux-4.12 and tested on NS2 and Stingray. Abhishek Shah (3): net: ethernet: bgmac: Remove unnecessary 'return' from platform_bgmac_idm_write net: ethernet: bgmac: Make IDM register space optional in BGMAC driver Documentation: devicetree: net: optional idm regs for bgmac .../devicetree/bindings/net/brcm,amac.txt | 1 + drivers/net/ethernet/broadcom/bgmac-platform.c | 21 --- drivers/net/ethernet/broadcom/bgmac.c | 70 +- drivers/net/ethernet/broadcom/bgmac.h | 1 + 4 files changed, 57 insertions(+), 36 deletions(-) -- 2.7.4
Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()
From: Alexander Potapenko Date: Thu, 13 Jul 2017 20:10:34 +0200 > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > index a9519a06a23b..f13632ee33f0 100644 > --- a/include/net/sctp/sctp.h > +++ b/include/net/sctp/sctp.h > @@ -469,6 +469,7 @@ _sctp_walk_params((pos), (chunk), > ntohs((chunk)->chunk_hdr.length), member) > > #define _sctp_walk_params(pos, chunk, end, member)\ > for (pos.v = chunk->member;\ > + pos.v < (void *)chunk + end &&\ > pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\ > ntohs(pos.p->length) >= sizeof(struct sctp_paramhdr);\ > pos.v += SCTP_PAD4(ntohs(pos.p->length))) > @@ -479,6 +480,7 @@ _sctp_walk_errors((err), (chunk_hdr), > ntohs((chunk_hdr)->length)) > #define _sctp_walk_errors(err, chunk_hdr, end)\ > for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \ > sizeof(struct sctp_chunkhdr));\ > + (void *)err <= (void *)chunk_hdr + end &&\ > (void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\ > ntohs(err->length) >= sizeof(sctp_errhdr_t); \ > err = (sctp_errhdr_t *)((void *)err + SCTP_PAD4(ntohs(err->length Even with the "err < ..." fixed in the second hunk, I still think you need to tweak these checks some more. What is necessary is that the first two members of sctp_paramhdr or sctp_errhdr are in the range ptr to end. struct sctp_paramhdr { __be16 type; __be16 length; }; typedef struct sctp_errhdr { __be16 cause; __be16 length; __u8 variable[0]; } sctp_errhdr_t; so that we can legally dereference ->length. But that is not what your new check is doing. Your new check is only making sure there is at least one byte there. I guess it's not easy to write a clean test that doesn't hardcode the offset of the length field and it's size. Something like: pos.v + offsetof(pos.v, length) + sizeof(pos.v->length) <= (void *) chunk + end and (void *)err + offsetof(err, length) + sizeof(err->length) <= (void *) chunk_hdr + end And yeah, that isn't exactly concise nor pretty...
RE: [PATCH] smsc95xx: use ethtool_op_get_ts_info()
> -Original Message- > From: Petr Kulhavy [mailto:br...@jikos.cz] > Sent: Thursday, July 13, 2017 1:41 PM > To: steve.glendinn...@shawell.net; UNGLinuxDriver > Cc: netdev@vger.kernel.org; linux-...@vger.kernel.org; Petr Kulhavy > Subject: [PATCH] smsc95xx: use ethtool_op_get_ts_info() > > This change enables the use of SW timestamping on Raspberry PI. > > smsc95xx uses the usbnet transmit function usbnet_start_xmit(), which > implements software timestamping. However the > SOF_TIMESTAMPING_TX_SOFTWARE > capability was missing and only SOF_TIMESTAMPING_RX_SOFTWARE was > announced. > By using ethtool_op_get_ts_info() as get_ts_info() also the > SOF_TIMESTAMPING_TX_SOFTWARE is announced. > > Signed-off-by: Petr Kulhavy Reviewed-by: Woojung Huh Thanks
Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants
From: John Fastabend Date: Thu, 13 Jul 2017 10:00:15 -0700 > On 07/13/2017 09:16 AM, Jesper Dangaard Brouer wrote: >> On Thu, 13 Jul 2017 13:14:30 +0200 >> Jesper Dangaard Brouer wrote: >> >>> I'm still getting crashes (but much harder to provoke), but I figured >>> out why. We sort of missed one case, where map_to_flush gets set, when >>> the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then >>> forgets to call xdp_do_flush_map, if all packets in that NAPI cycle >>> failed. We could blame the driver, but yhe clean solution is making >>> sure, that we don't set map_to_flush when the __bpf_tx_xdp() call >>> fails. It should also handle the other case I fixed I'll cleanup >>> my PoC-fix patch, test it and provide it here. >> >> I changed flow in the function to be: > > > Great, I'll merge this, the other couple fixes, and the bitops optimization > and > hopefully then we are set. I'll post a v2 and we can do some final checks. I am so looking forward to merging this, great work everyone.
Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()
On Thu, Jul 13, 2017 at 8:14 PM, Alexander Potapenko wrote: > On Thu, Jul 13, 2017 at 8:10 PM, Alexander Potapenko > wrote: >> If the iterator (|pos.p| or |err|) has already reached the end of >> chunk, we shouldn't access iterator->length. >> >> This bug has been detected by KMSAN. For the following pair of system >> calls: >> >> socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3 >> sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0), >> inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, >> sin6_scope_id=0}, 28) = 1 >> >> the tool has reported a use of uninitialized memory: >> >> == >> BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0 >> CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >> 01/01/2011 >> Call Trace: >> >>__dump_stack lib/dump_stack.c:16 >>dump_stack+0x172/0x1c0 lib/dump_stack.c:52 >>kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 >>__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 >>__sctp_rcv_init_lookup net/sctp/input.c:1074 >>__sctp_rcv_lookup_harder net/sctp/input.c:1233 >>__sctp_rcv_lookup net/sctp/input.c:1255 >>sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170 >>sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984 >>ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 >>NF_HOOK ./include/linux/netfilter.h:257 >>ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 >>dst_input ./include/net/dst.h:492 >>ip6_rcv_finish net/ipv6/ip6_input.c:69 >>NF_HOOK ./include/linux/netfilter.h:257 >>ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 >>__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 >>__netif_receive_skb net/core/dev.c:4246 >>process_backlog+0x667/0xba0 net/core/dev.c:4866 >>napi_poll net/core/dev.c:5268 >>net_rx_action+0xc95/0x1590 net/core/dev.c:5333 >>__do_softirq+0x485/0x942 kernel/softirq.c:284 >>do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902 >> >>do_softirq kernel/softirq.c:328 >>__local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181 >>local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31 >>rcu_read_unlock_bh ./include/linux/rcupdate.h:931 >>ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124 >>ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149 >>NF_HOOK_COND ./include/linux/netfilter.h:246 >>ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163 >>dst_output ./include/net/dst.h:486 >>NF_HOOK ./include/linux/netfilter.h:257 >>ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261 >>sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225 >>sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632 >>sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885 >>sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750 >>sctp_side_effects net/sctp/sm_sideeffect.c:1773 >>sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147 >>sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88 >>sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954 >>inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762 >>sock_sendmsg_nosec net/socket.c:633 >>sock_sendmsg net/socket.c:643 >>SYSC_sendto+0x608/0x710 net/socket.c:1696 >>SyS_sendto+0x8a/0xb0 net/socket.c:1664 >>do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285 >>entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246 >> RIP: 0033:0x401133 >> RSP: 002b:7fff6d99cd38 EFLAGS: 0246 ORIG_RAX: 002c >> RAX: ffda RBX: 004002b0 RCX: 00401133 >> RDX: 0001 RSI: 00494088 RDI: 0003 >> RBP: 7fff6d99cd90 R08: 7fff6d99cd50 R09: 001c >> R10: 0001 R11: 0246 R12: >> R13: 004063d0 R14: 00406460 R15: >> origin: >>save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 >>kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 >>kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 >>kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211 >>slab_alloc_node mm/slub.c:2743 >>__kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351 >>__kmalloc_reserve net/core/skbuff.c:138 >>__alloc_skb+0x26b/0x840 net/core/skbuff.c:231 >>alloc_skb ./include/linux/skbuff.h:933 >>sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570 >>sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885 >>sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750 >>sctp_side_effects net/sctp/sm_sideeffect.c:1773 >>sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147 >>sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88 >>sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954 >>inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762 >>sock_sendmsg_nosec net/s
Re: [PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
On Thu, Jul 13, 2017 at 11:14 AM, Alexander Duyck wrote: > On Thu, Jul 13, 2017 at 7:21 AM, Ding Tianhong > wrote: >> From: Casey Leedom >> >> cxgb4 Ethernet driver now queries PCIe configuration space to determine >> if it can send TLPs to it with the Relaxed Ordering Attribute set. >> >> Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability >> Device Control[Relaxed Ordering Enable] at probe routine, to make sure >> the driver will not send the Relaxed Ordering TLPs to the Root Complex which >> could not deal the Relaxed Ordering TLPs. >> >> Signed-off-by: Casey Leedom >> Signed-off-by: Ding Tianhong > > Ding, > > You can probably just drop this patch. If I am understanding Casey > correctly just the fact that the relaxed ordering enable bit is > cleared in the configuration should be enough to do this for the > device automatically. > > - Alex Actually I take that back. I hadn't caught the most recent parts of the thread. If this is good for Casey then this works for me. - Alex
Re: [PATCH net-next v1 0/3] Flow Based GTP Tunneling
On 13 July 2017 at 00:12, Harald Welte wrote: > hi Jiannan, > > net-next si closed, as it has been pointed out already by Joe. > > On Wed, Jul 12, 2017 at 05:44:52PM -0700, Jiannan Ouyang wrote: >> ovs-ofctl add-flow br0 >> "in_port=2,tun_src=192.168.60.141,tun_id=123, \ >> actions=set_field:02:00:00:00:00:00->eth_src, \ >> set_field:ff:ff:ff:ff:ff:ff->eth_dst,LOCAL" > > I'm not familiar with the details here, but does this imply that you > are matching on the outer (transport layer) source IP address? If so, > please note this is in violation of the 3GPP specifications for GTP, > which require explicitly that the TEID is the *only* criteria for > matching an encapsulated packet to the tunnel. Basically anyone can > send you an encapsulated packet from any random source IP, just as long > as the TEID matches a tunnel, it will be decapsulated. > > This is [presumably] in order to take care of mobility, as the > subscribers' phone moves around different MME/S-GW/SGSN, each having > different source IP addresses. I think that this will be hard to avoid; the several existing tunnel implementations that OVS plugs into all allow matching on the outer addresses/ports, I don't see a good way to restrict it without introducing completely new metadata_dst paths for GTP. I'd prefer not to introduce something like this if we can avoid it; several tunnels currently share all of the same metadata_dst code and that's proven sufficient for all cases so far. If someone wishes to implement the 3GPP standard correctly then they should not create matches like this. In quite a few cases, OVS tends to take the approach that we give the user the tools to do what they need to do, but if they wish to shoot themselves in the foot then that's up to them. We can of course work towards ensuring the OVS userspace guides users in the right direction though.
Re: [PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
On Thu, Jul 13, 2017 at 7:21 AM, Ding Tianhong wrote: > From: Casey Leedom > > cxgb4 Ethernet driver now queries PCIe configuration space to determine > if it can send TLPs to it with the Relaxed Ordering Attribute set. > > Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability > Device Control[Relaxed Ordering Enable] at probe routine, to make sure > the driver will not send the Relaxed Ordering TLPs to the Root Complex which > could not deal the Relaxed Ordering TLPs. > > Signed-off-by: Casey Leedom > Signed-off-by: Ding Tianhong Ding, You can probably just drop this patch. If I am understanding Casey correctly just the fact that the relaxed ordering enable bit is cleared in the configuration should be enough to do this for the device automatically. - Alex > --- > drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 + > drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +-- > drivers/net/ethernet/chelsio/cxgb4/sge.c| 5 +++-- > 3 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h > b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h > index ef4be78..09ea62e 100644 > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h > @@ -529,6 +529,7 @@ enum { /* adapter flags */ > USING_SOFT_PARAMS = (1 << 6), > MASTER_PF = (1 << 7), > FW_OFLD_CONN = (1 << 9), > + ROOT_NO_RELAXED_ORDERING = (1 << 10), > }; > > enum { > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c > b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c > index e403fa1..391e484 100644 > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c > @@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device > *dev) > dev->name, adap->params.vpd.id, adap->name, buf); > } > > -static void enable_pcie_relaxed_ordering(struct pci_dev *dev) > -{ > - pcie_capability_set_word(dev, PCI_EXP_DEVCTL, > PCI_EXP_DEVCTL_RELAX_EN); > -} > - > /* > * Free the following resources: > * - memory used for tables > @@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct > pci_device_id *ent) > } > > pci_enable_pcie_error_reporting(pdev); > - enable_pcie_relaxed_ordering(pdev); > pci_set_master(pdev); > pci_save_state(pdev); > > @@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct > pci_device_id *ent) > adapter->msg_enable = DFLT_MSG_ENABLE; > memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map)); > > + /* If possible, we use PCIe Relaxed Ordering Attribute to deliver > +* Ingress Packet Data to Free List Buffers in order to allow for > +* chipset performance optimizations between the Root Complex and > +* Memory Controllers. (Messages to the associated Ingress Queue > +* notifying new Packet Placement in the Free Lists Buffers will be > +* send without the Relaxed Ordering Attribute thus guaranteeing that > +* all preceding PCIe Transaction Layer Packets will be processed > +* first.) But some Root Complexes have various issues with Upstream > +* Transaction Layer Packets with the Relaxed Ordering Attribute set. > +* The PCIe devices which under the Root Complexes will be cleared the > +* Relaxed Ordering bit in the configuration space, So we check our > +* PCIe configuration space to see if it's flagged with advice against > +* using Relaxed Ordering. > +*/ > + if (!pcie_relaxed_ordering_supported(pdev)) > + adapter->flags |= ROOT_NO_RELAXED_ORDERING; > + > spin_lock_init(&adapter->stats_lock); > spin_lock_init(&adapter->tid_release_lock); > spin_lock_init(&adapter->win0_lock); > diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c > b/drivers/net/ethernet/chelsio/cxgb4/sge.c > index ede1220..4ef68f6 100644 > --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c > +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c > @@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct > sge_rspq *iq, bool fwevtq, > struct fw_iq_cmd c; > struct sge *s = &adap->sge; > struct port_info *pi = netdev_priv(dev); > + int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING); > > /* Size needs to be multiple of 16, including status entry. */ > iq->size = roundup(iq->size, 16); > @@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct > sge_rspq *iq, bool fwevtq, > > flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc); > c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F | > -FW_IQ_CMD_FL0FETCHRO_F | > -
Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()
On Thu, Jul 13, 2017 at 8:10 PM, Alexander Potapenko wrote: > If the iterator (|pos.p| or |err|) has already reached the end of > chunk, we shouldn't access iterator->length. > > This bug has been detected by KMSAN. For the following pair of system > calls: > > socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3 > sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0), > inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, > sin6_scope_id=0}, 28) = 1 > > the tool has reported a use of uninitialized memory: > > == > BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0 > CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs > 01/01/2011 > Call Trace: > >__dump_stack lib/dump_stack.c:16 >dump_stack+0x172/0x1c0 lib/dump_stack.c:52 >kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 >__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 >__sctp_rcv_init_lookup net/sctp/input.c:1074 >__sctp_rcv_lookup_harder net/sctp/input.c:1233 >__sctp_rcv_lookup net/sctp/input.c:1255 >sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170 >sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984 >ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 >NF_HOOK ./include/linux/netfilter.h:257 >ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 >dst_input ./include/net/dst.h:492 >ip6_rcv_finish net/ipv6/ip6_input.c:69 >NF_HOOK ./include/linux/netfilter.h:257 >ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 >__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 >__netif_receive_skb net/core/dev.c:4246 >process_backlog+0x667/0xba0 net/core/dev.c:4866 >napi_poll net/core/dev.c:5268 >net_rx_action+0xc95/0x1590 net/core/dev.c:5333 >__do_softirq+0x485/0x942 kernel/softirq.c:284 >do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902 > >do_softirq kernel/softirq.c:328 >__local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181 >local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31 >rcu_read_unlock_bh ./include/linux/rcupdate.h:931 >ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124 >ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149 >NF_HOOK_COND ./include/linux/netfilter.h:246 >ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163 >dst_output ./include/net/dst.h:486 >NF_HOOK ./include/linux/netfilter.h:257 >ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261 >sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225 >sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632 >sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885 >sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750 >sctp_side_effects net/sctp/sm_sideeffect.c:1773 >sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147 >sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88 >sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954 >inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762 >sock_sendmsg_nosec net/socket.c:633 >sock_sendmsg net/socket.c:643 >SYSC_sendto+0x608/0x710 net/socket.c:1696 >SyS_sendto+0x8a/0xb0 net/socket.c:1664 >do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285 >entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246 > RIP: 0033:0x401133 > RSP: 002b:7fff6d99cd38 EFLAGS: 0246 ORIG_RAX: 002c > RAX: ffda RBX: 004002b0 RCX: 00401133 > RDX: 0001 RSI: 00494088 RDI: 0003 > RBP: 7fff6d99cd90 R08: 7fff6d99cd50 R09: 001c > R10: 0001 R11: 0246 R12: > R13: 004063d0 R14: 00406460 R15: > origin: >save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 >kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 >kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 >kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211 >slab_alloc_node mm/slub.c:2743 >__kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351 >__kmalloc_reserve net/core/skbuff.c:138 >__alloc_skb+0x26b/0x840 net/core/skbuff.c:231 >alloc_skb ./include/linux/skbuff.h:933 >sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570 >sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885 >sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750 >sctp_side_effects net/sctp/sm_sideeffect.c:1773 >sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147 >sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88 >sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954 >inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762 >sock_sendmsg_nosec net/socket.c:633 >sock_sendmsg net/socket.c:643 >SYSC_sendto+0x608/0x710 net/socket.c:1696 >SyS_sendto+0x8a/0xb0 net/socket.c:1664 >do_syscall_64+0xe6/
[PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()
If the iterator (|pos.p| or |err|) has already reached the end of chunk, we shouldn't access iterator->length. This bug has been detected by KMSAN. For the following pair of system calls: socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3 sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 1 the tool has reported a use of uninitialized memory: == BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0 CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x172/0x1c0 lib/dump_stack.c:52 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 __sctp_rcv_init_lookup net/sctp/input.c:1074 __sctp_rcv_lookup_harder net/sctp/input.c:1233 __sctp_rcv_lookup net/sctp/input.c:1255 sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170 sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902 do_softirq kernel/softirq.c:328 __local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181 local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31 rcu_read_unlock_bh ./include/linux/rcupdate.h:931 ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124 ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149 NF_HOOK_COND ./include/linux/netfilter.h:246 ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163 dst_output ./include/net/dst.h:486 NF_HOOK ./include/linux/netfilter.h:257 ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261 sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225 sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632 sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885 sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750 sctp_side_effects net/sctp/sm_sideeffect.c:1773 sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147 sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88 sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954 inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 sock_sendmsg net/socket.c:643 SYSC_sendto+0x608/0x710 net/socket.c:1696 SyS_sendto+0x8a/0xb0 net/socket.c:1664 do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246 RIP: 0033:0x401133 RSP: 002b:7fff6d99cd38 EFLAGS: 0246 ORIG_RAX: 002c RAX: ffda RBX: 004002b0 RCX: 00401133 RDX: 0001 RSI: 00494088 RDI: 0003 RBP: 7fff6d99cd90 R08: 7fff6d99cd50 R09: 001c R10: 0001 R11: 0246 R12: R13: 004063d0 R14: 00406460 R15: origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211 slab_alloc_node mm/slub.c:2743 __kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351 __kmalloc_reserve net/core/skbuff.c:138 __alloc_skb+0x26b/0x840 net/core/skbuff.c:231 alloc_skb ./include/linux/skbuff.h:933 sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570 sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885 sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750 sctp_side_effects net/sctp/sm_sideeffect.c:1773 sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147 sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88 sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954 inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 sock_sendmsg net/socket.c:643 SYSC_sendto+0x608/0x710 net/socket.c:1696 SyS_sendto+0x8a/0xb0 net/socket.c:1664 do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246 == Signed-off-by: Alexander Potapenko --- include/net/sctp/sctp.h | 2 ++ 1 file changed,
Re: [PATCH] datapath: Fix for force/commit action failures
On 13 July 2017 at 11:01, Greg Rose wrote: > On 07/13/2017 10:46 AM, Joe Stringer wrote: >> >> On 13 July 2017 at 09:25, Greg Rose wrote: >>> >>> When there is an established connection in direction A->B, it is >>> possible to receive a packet on port B which then executes >>> ct(commit,force) without first performing ct() - ie, a lookup. >>> In this case, we would expect that this packet can delete the existing >>> entry so that we can commit a connection with direction B->A. However, >>> currently we only perform a check in skb_nfct_cached() for whether >>> OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a >>> lookup previously occurred. In the above scenario, a lookup has not >>> occurred but we should still be able to statelessly look up the >>> existing entry and potentially delete the entry if it is in the >>> opposite direction. >>> >>> This patch extends the check to also hint that if the action has the >>> force flag set, then we will lookup the existing entry so that the >>> force check at the end of skb_nfct_cached has the ability to delete >>> the connection. >>> >>> CC: d...@openvswitch.org >>> CC: Pravin Shalar >>> Signed-off-by: Joe Stringer >>> Signed-off-by: Greg Rose >>> --- >> >> >> A couple more administrative notes, on netdev the module name in the >> patch subject for openvswitch is "openvswitch" rather than datapath; > > > Right you are. > >> and patches rather than having just "PATCH" as the subject prefix >> should state the tree. In this case, it's a bugfix so it should be >> "PATCH net". > > > I knew that... forgot the format patch option to add it. Net-next > is closed so that would be mandatory. > > Furthermore, if you're able to figure out which commit >> >> introduced the issue (I believe it's introduced by the force commit >> patch), then you should place the "Fixes: " tag. I can give you some >> pointers off-list on how to do this (git blame and some basic >> formatting of the targeted patch should do the trick - this tag >> expects a 12-digit hash). >> >> For reference, I ended up looking it up during review, this is the >> line you'd add: >> Fixes: dd41d33f0b03 ("openvswitch: Add force commit.") > > > Oh, thanks! > > >> >>> net/openvswitch/conntrack.c | 12 >>> 1 file changed, 8 insertions(+), 4 deletions(-) >>> >>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c >>> index 08679eb..9041cf8 100644 >>> --- a/net/openvswitch/conntrack.c >>> +++ b/net/openvswitch/conntrack.c >>> @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, >>> ct = nf_ct_get(skb, &ctinfo); >>> /* If no ct, check if we have evidence that an existing >>> conntrack entry >>> * might be found for this skb. This happens when we lose a >>> skb->_nfct >>> -* due to an upcall. If the connection was not confirmed, it is >>> not >>> -* cached and needs to be run through conntrack again. >>> +* due to an upcall, or if the direction is being forced. If the >>> +* connection was not confirmed, it is not cached and needs to be >>> run >>> +* through conntrack again. >>> */ >>> - if (!ct && key->ct_state & OVS_CS_F_TRACKED && >>> + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && >>> !(key->ct_state & OVS_CS_F_INVALID) && >>> - key->ct_zone == info->zone.id) { >>> +key->ct_zone == info->zone.id) || >>> +(!key->ct_state && info->force)) { >>> ct = ovs_ct_find_existing(net, &info->zone, >>> info->family, skb, >>>!!(key->ct_state >>> & OVS_CS_F_NAT_MASK)); >>> if (ct) >>> nf_ct_get(skb, &ctinfo); >>> + else >>> + return false; >>> } >>> if (!ct) >>> return false; >> >> >> I was just wondering if this has the potential to prevent >> nf_conntrack_in() from being called at all in this case, which is also >> not quite right. In the original case of (!ct && (key->ct_state & >> OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll >> refer to as "ct_executed", we explicitly want to avoid running >> nf_conntrack_in() if we already ran it, because the connection tracker >> doesn't expect to see the same packet twice (there's also things like >> stats/accounting, and potentially L4 state machines that could get >> messed up by multiple calls). By the time the info->force and >> direction check happens at the end of the function, "ct_executed" is >> implied to be true. However, in this new case, ct_executed is actually >> false - because there was no ct() before the ct(force,commit). In this >> case, we only want to look up the existing entry to see if it should >> be deleted; if it should not be deleted, then we still haven't yet >> done the nf_conntrack_in() call so we should retur
Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device
On 12 July 2017 at 17:44, Jiannan Ouyang wrote: > Add the gtp_create_flow_based_dev() interface to create flow-based gtp > net_device, which sets gtp->collect_md. Under flow-based mode, UDP sockets are > created and maintained in kernel. > > Signed-off-by: Jiannan Ouyang > --- > +static int gtp_configure(struct net *net, struct net_device *dev, > +const struct ip_tunnel_info *info) > +{ > + struct gtp_net *gn = net_generic(net, gtp_net_id); > + struct gtp_dev *gtp = netdev_priv(dev); > + int err; > + > + gtp->net = net; > + gtp->dev = dev; > + > + if (gtp_find_flow_based_dev(net)) > + return -EBUSY; > + > + dev->netdev_ops = >p_netdev_ops; > + dev->priv_destructor= free_netdev; > + > + dev->hard_header_len = 0; > + dev->addr_len = 0; > + > + /* Zero header length. */ > + dev->type = ARPHRD_NONE; > + dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST; > + > + dev->priv_flags |= IFF_NO_QUEUE; > + dev->features |= NETIF_F_LLTX; > + netif_keep_dst(dev); > + > + /* Assume largest header, ie. GTPv0. */ > + dev->needed_headroom= LL_MAX_HEADER + > + sizeof(struct iphdr) + > + sizeof(struct udphdr) + > + sizeof(struct gtp0_header); > + > + dst_cache_reset(>p->info.dst_cache); > + gtp->info = *info; > + gtp->collect_md = true; > + > + err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free?? > + if (err < 0) { > + pr_err("Error gtp_hashtable_new"); > + return err; > + } Firstly, if this succeeds then the subsequent register_netdevice() call below could still fail, in which case the error handling there free this hashtable. Don't forget that even if this whole function succeeds, the caller may still fail to set up the device and in those error cases everything needs to be freed/cleaned up as well. Then in terms of the overall lifecycle, I guess it's a matter of 1) If there is a hashtable allocated then it must be freed on device close; 2) If the device can be reconfigured, and there is already one of these allocated then it would need to be freed upon reconfiguration. Cheers, Joe
Re: [PATCH] datapath: Fix for force/commit action failures
On 07/13/2017 10:46 AM, Joe Stringer wrote: On 13 July 2017 at 09:25, Greg Rose wrote: When there is an established connection in direction A->B, it is possible to receive a packet on port B which then executes ct(commit,force) without first performing ct() - ie, a lookup. In this case, we would expect that this packet can delete the existing entry so that we can commit a connection with direction B->A. However, currently we only perform a check in skb_nfct_cached() for whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a lookup previously occurred. In the above scenario, a lookup has not occurred but we should still be able to statelessly look up the existing entry and potentially delete the entry if it is in the opposite direction. This patch extends the check to also hint that if the action has the force flag set, then we will lookup the existing entry so that the force check at the end of skb_nfct_cached has the ability to delete the connection. CC: d...@openvswitch.org CC: Pravin Shalar Signed-off-by: Joe Stringer Signed-off-by: Greg Rose --- A couple more administrative notes, on netdev the module name in the patch subject for openvswitch is "openvswitch" rather than datapath; Right you are. and patches rather than having just "PATCH" as the subject prefix should state the tree. In this case, it's a bugfix so it should be "PATCH net". I knew that... forgot the format patch option to add it. Net-next is closed so that would be mandatory. Furthermore, if you're able to figure out which commit introduced the issue (I believe it's introduced by the force commit patch), then you should place the "Fixes: " tag. I can give you some pointers off-list on how to do this (git blame and some basic formatting of the targeted patch should do the trick - this tag expects a 12-digit hash). For reference, I ended up looking it up during review, this is the line you'd add: Fixes: dd41d33f0b03 ("openvswitch: Add force commit.") Oh, thanks! net/openvswitch/conntrack.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 08679eb..9041cf8 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, ct = nf_ct_get(skb, &ctinfo); /* If no ct, check if we have evidence that an existing conntrack entry * might be found for this skb. This happens when we lose a skb->_nfct -* due to an upcall. If the connection was not confirmed, it is not -* cached and needs to be run through conntrack again. +* due to an upcall, or if the direction is being forced. If the +* connection was not confirmed, it is not cached and needs to be run +* through conntrack again. */ - if (!ct && key->ct_state & OVS_CS_F_TRACKED && + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_INVALID) && - key->ct_zone == info->zone.id) { +key->ct_zone == info->zone.id) || +(!key->ct_state && info->force)) { ct = ovs_ct_find_existing(net, &info->zone, info->family, skb, !!(key->ct_state & OVS_CS_F_NAT_MASK)); if (ct) nf_ct_get(skb, &ctinfo); + else + return false; } if (!ct) return false; I was just wondering if this has the potential to prevent nf_conntrack_in() from being called at all in this case, which is also not quite right. In the original case of (!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll refer to as "ct_executed", we explicitly want to avoid running nf_conntrack_in() if we already ran it, because the connection tracker doesn't expect to see the same packet twice (there's also things like stats/accounting, and potentially L4 state machines that could get messed up by multiple calls). By the time the info->force and direction check happens at the end of the function, "ct_executed" is implied to be true. However, in this new case, ct_executed is actually false - because there was no ct() before the ct(force,commit). In this case, we only want to look up the existing entry to see if it should be deleted; if it should not be deleted, then we still haven't yet done the nf_conntrack_in() call so we should return false and the caller, __ovs_ct_lookup() should make the call to nf_conntrack_in(). What I mean is something like the following incremental on your patch: diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 9041cf8b822f..98783f114824 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -637,6 +637,7 @@ static bool skb_nfct_cached(struct net *net, {
[RFC net 1/2] net: set skb hash for IP6 TCP reset packet
From: Shaohua Li Please see below tcpdump output: 21:00:48.109122 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [S], cksum 0x0529 (incorrect -> 0xf56c), seq 3282214508, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 0,nop,wscale 7], length 0 21:00:48.109381 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [S.], cksum 0x0529 (incorrect -> 0x49ad), seq 1923801573, ack 3282214509, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 2500903437,nop,wscale 7], length 0 21:00:48.109548 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [.], cksum 0x0521 (incorrect -> 0x1bdf), seq 1, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0 21:00:48.109823 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 62) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [P.], cksum 0x053f (incorrect -> 0xb8b1), seq 1:31, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 30 21:00:48.109910 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [.], cksum 0x0521 (incorrect -> 0x1bc1), seq 1, ack 31, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0 21:00:48.110043 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [P.], cksum 0x0539 (incorrect -> 0xb726), seq 1:25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 24 21:00:48.110173 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [.], cksum 0x0521 (incorrect -> 0x1ba7), seq 31, ack 25, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0 21:00:48.110211 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [F.], cksum 0x0521 (incorrect -> 0x1ba7), seq 25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 0 21:00:48.151099 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [.], cksum 0x0521 (incorrect -> 0x1ba6), seq 31, ack 26, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0 21:00:49.110524 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [P.], cksum 0x0539 (incorrect -> 0xb324), seq 31:55, ack 26, win 342, options [nop,nop,TS val 2500904438 ecr 2500903438], length 24 21:00:49.110637 IP6 (flowlabel 0xb34d5, hlim 64, next-header TCP (6) payload length: 20) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [R], cksum 0x0515 (incorrect -> 0x668c), seq 1923801599, win 0, length 0 The tcp reset packet has a different flowlabel, which causes our router doesn't correctly close tcp connection. The reason is the normal packet gets the skb->hash from sk->sk_txhash, which is generated randomly. ip6_make_flowlabel then uses the hash to create a flowlabel. The reset packet doesn't get assigned a hash, so the flowlabel is calculated with flowi6. The solution is to save the hash value for timeout sock and use it for reset packet. Signed-off-by: Shaohua Li --- include/net/inet_timewait_sock.h | 10 ++ net/ipv4/inet_timewait_sock.c| 1 + net/ipv6/tcp_ipv6.c | 7 +++ 3 files changed, 18 insertions(+) diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h index 6a75d67..b568224 100644 --- a/include/net/inet_timewait_sock.h +++ b/include/net/inet_timewait_sock.h @@ -77,6 +77,7 @@ struct inet_timewait_sock { tw_pad : 2,/* 2 bits hole */ tw_tos : 8; kmemcheck_bitfield_end(flags); + __u32 tw_txhash; struct timer_list tw_timer; struct inet_bind_bucket *tw_tb; }; @@ -87,6 +88,15 @@ static inline struct inet_timewait_sock *inet_twsk(const struct sock *sk) return (struct inet_timewait_sock *)sk; } +static inline void skb_set_hash_from_twsk(struct sk_buff *skb, + struct inet_timewait_sock *twsk) +{ + if (twsk->tw_txhash) { + skb->l4_hash = 1; + skb->hash = twsk->tw_txhash; + } +} + void inet_twsk_free(struct inet_timewait_sock *tw); void inet_twsk_put(struct inet_timewait_sock *tw); diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 5b03915..6f460e7 100644
[RFC net 0/2] fix ipv6 tcp reset packet flowlabel issues
From: Shaohua Li Please see below tcpdump output: 21:00:48.109122 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [S], cksum 0x0529 (incorrect -> 0xf56c), seq 3282214508, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 0,nop,wscale 7], length 0 21:00:48.109381 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [S.], cksum 0x0529 (incorrect -> 0x49ad), seq 1923801573, ack 3282214509, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 2500903437,nop,wscale 7], length 0 21:00:48.109548 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [.], cksum 0x0521 (incorrect -> 0x1bdf), seq 1, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0 21:00:48.109823 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 62) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [P.], cksum 0x053f (incorrect -> 0xb8b1), seq 1:31, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 30 21:00:48.109910 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [.], cksum 0x0521 (incorrect -> 0x1bc1), seq 1, ack 31, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0 21:00:48.110043 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [P.], cksum 0x0539 (incorrect -> 0xb726), seq 1:25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 24 21:00:48.110173 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [.], cksum 0x0521 (incorrect -> 0x1ba7), seq 31, ack 25, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0 21:00:48.110211 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [F.], cksum 0x0521 (incorrect -> 0x1ba7), seq 25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 0 21:00:48.151099 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [.], cksum 0x0521 (incorrect -> 0x1ba6), seq 31, ack 26, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0 21:00:49.110524 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags [P.], cksum 0x0539 (incorrect -> 0xb324), seq 31:55, ack 26, win 342, options [nop,nop,TS val 2500904438 ecr 2500903438], length 24 21:00:49.110637 IP6 (flowlabel 0xb34d5, hlim 64, next-header TCP (6) payload length: 20) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags [R], cksum 0x0515 (incorrect -> 0x668c), seq 1923801599, win 0, length 0 The flowlabel of reset packet (0xb34d5) and that of normal packet (0xd827f) are different. This causes our router doesn't correctly close tcp connection. The patches try to fix the issue. Thanks, Shaohua Shaohua Li (2): net: set skb hash for IP6 TCP reset packet net: tcp_v6_send_reset should set flowlabel include/net/inet_timewait_sock.h | 10 ++ net/ipv4/inet_timewait_sock.c| 1 + net/ipv6/tcp_ipv6.c | 25 - 3 files changed, 35 insertions(+), 1 deletion(-) -- 2.9.3
[RFC net 2/2] net: tcp_v6_send_reset should set flowlabel
From: Shaohua Li Currently tcp_v6_send_reset ignores user defined flowlabel, so the reset packet doesn't include the flowlabel info. Signed-off-by: Shaohua Li --- net/ipv6/tcp_ipv6.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 2e656d2..eaf4b8e 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -898,6 +898,8 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb) struct sock *sk1 = NULL; #endif int oif; + u8 tclass = 0; + __be32 flowlabel = 0; if (th->rst) return; @@ -946,7 +948,21 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb) (th->doff << 2); oif = sk ? sk->sk_bound_dev_if : 0; - tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1, 0, 0); + if (sk) { + if (sk_fullsock(sk)) { + struct ipv6_pinfo *np = inet6_sk(sk); + + tclass = np->tclass; + flowlabel = np->flow_label & IPV6_FLOWLABEL_MASK; + } else { + struct inet_timewait_sock *tw = inet_twsk(sk); + + tclass = tw->tw_tclass; + flowlabel = cpu_to_be32(tw->tw_flowlabel); + } + } + tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1, + tclass, flowlabel); #ifdef CONFIG_TCP_MD5SIG out: -- 2.9.3
Re: [PATCH][net-next] svcrdma: fix an incorrect check on -E2BIG and -EINVAL
> On Jul 13, 2017, at 1:51 PM, Colin King wrote: > > From: Colin Ian King > > The current check will always be true and will always jump to > err1, this looks dubious to me. I believe && should be used > instead of ||. > > Detected by CoverityScan, CID#1450120 ("Logically Dead Code") > > Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow") > Signed-off-by: Colin Ian King Reviewed-by: Chuck Lever Dan reported this today, and I have a similar patch in my "pending for-rc" series. This one works too. > --- > net/sunrpc/xprtrdma/svc_rdma_sendto.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c > b/net/sunrpc/xprtrdma/svc_rdma_sendto.c > index 19fd01e4b690..7c3a211e0e9a 100644 > --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c > +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c > @@ -684,7 +684,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) > return 0; > > err2: > - if (ret != -E2BIG || ret != -EINVAL) > + if (ret != -E2BIG && ret != -EINVAL) > goto err1; > > ret = svc_rdma_post_recv(rdma, GFP_KERNEL); > -- > 2.11.0 > -- Chuck Lever
[PATCH][net-next] svcrdma: fix an incorrect check on -E2BIG and -EINVAL
From: Colin Ian King The current check will always be true and will always jump to err1, this looks dubious to me. I believe && should be used instead of ||. Detected by CoverityScan, CID#1450120 ("Logically Dead Code") Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow") Signed-off-by: Colin Ian King --- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index 19fd01e4b690..7c3a211e0e9a 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -684,7 +684,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) return 0; err2: - if (ret != -E2BIG || ret != -EINVAL) + if (ret != -E2BIG && ret != -EINVAL) goto err1; ret = svc_rdma_post_recv(rdma, GFP_KERNEL); -- 2.11.0
Re: [PATCH] datapath: Fix for force/commit action failures
On 13 July 2017 at 09:25, Greg Rose wrote: > When there is an established connection in direction A->B, it is > possible to receive a packet on port B which then executes > ct(commit,force) without first performing ct() - ie, a lookup. > In this case, we would expect that this packet can delete the existing > entry so that we can commit a connection with direction B->A. However, > currently we only perform a check in skb_nfct_cached() for whether > OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a > lookup previously occurred. In the above scenario, a lookup has not > occurred but we should still be able to statelessly look up the > existing entry and potentially delete the entry if it is in the > opposite direction. > > This patch extends the check to also hint that if the action has the > force flag set, then we will lookup the existing entry so that the > force check at the end of skb_nfct_cached has the ability to delete > the connection. > > CC: d...@openvswitch.org > CC: Pravin Shalar > Signed-off-by: Joe Stringer > Signed-off-by: Greg Rose > --- A couple more administrative notes, on netdev the module name in the patch subject for openvswitch is "openvswitch" rather than datapath; and patches rather than having just "PATCH" as the subject prefix should state the tree. In this case, it's a bugfix so it should be "PATCH net". Furthermore, if you're able to figure out which commit introduced the issue (I believe it's introduced by the force commit patch), then you should place the "Fixes: " tag. I can give you some pointers off-list on how to do this (git blame and some basic formatting of the targeted patch should do the trick - this tag expects a 12-digit hash). For reference, I ended up looking it up during review, this is the line you'd add: Fixes: dd41d33f0b03 ("openvswitch: Add force commit.") > net/openvswitch/conntrack.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c > index 08679eb..9041cf8 100644 > --- a/net/openvswitch/conntrack.c > +++ b/net/openvswitch/conntrack.c > @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, > ct = nf_ct_get(skb, &ctinfo); > /* If no ct, check if we have evidence that an existing conntrack > entry > * might be found for this skb. This happens when we lose a > skb->_nfct > -* due to an upcall. If the connection was not confirmed, it is not > -* cached and needs to be run through conntrack again. > +* due to an upcall, or if the direction is being forced. If the > +* connection was not confirmed, it is not cached and needs to be run > +* through conntrack again. > */ > - if (!ct && key->ct_state & OVS_CS_F_TRACKED && > + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && > !(key->ct_state & OVS_CS_F_INVALID) && > - key->ct_zone == info->zone.id) { > +key->ct_zone == info->zone.id) || > +(!key->ct_state && info->force)) { > ct = ovs_ct_find_existing(net, &info->zone, info->family, skb, > !!(key->ct_state > & OVS_CS_F_NAT_MASK)); > if (ct) > nf_ct_get(skb, &ctinfo); > + else > + return false; > } > if (!ct) > return false; I was just wondering if this has the potential to prevent nf_conntrack_in() from being called at all in this case, which is also not quite right. In the original case of (!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll refer to as "ct_executed", we explicitly want to avoid running nf_conntrack_in() if we already ran it, because the connection tracker doesn't expect to see the same packet twice (there's also things like stats/accounting, and potentially L4 state machines that could get messed up by multiple calls). By the time the info->force and direction check happens at the end of the function, "ct_executed" is implied to be true. However, in this new case, ct_executed is actually false - because there was no ct() before the ct(force,commit). In this case, we only want to look up the existing entry to see if it should be deleted; if it should not be deleted, then we still haven't yet done the nf_conntrack_in() call so we should return false and the caller, __ovs_ct_lookup() should make the call to nf_conntrack_in(). What I mean is something like the following incremental on your patch: diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 9041cf8b822f..98783f114824 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -637,6 +637,7 @@ static bool skb_nfct_cached(struct net *net, { enum ip_conntrack_info ctinfo; struct nf_conn *ct; + bool ct_executed;
[PATCH] smsc95xx: use ethtool_op_get_ts_info()
This change enables the use of SW timestamping on Raspberry PI. smsc95xx uses the usbnet transmit function usbnet_start_xmit(), which implements software timestamping. However the SOF_TIMESTAMPING_TX_SOFTWARE capability was missing and only SOF_TIMESTAMPING_RX_SOFTWARE was announced. By using ethtool_op_get_ts_info() as get_ts_info() also the SOF_TIMESTAMPING_TX_SOFTWARE is announced. Signed-off-by: Petr Kulhavy --- drivers/net/usb/smsc95xx.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index 2dfca96..340c134 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -898,6 +898,7 @@ static const struct ethtool_ops smsc95xx_ethtool_ops = { .set_wol= smsc95xx_ethtool_set_wol, .get_link_ksettings = smsc95xx_get_link_ksettings, .set_link_ksettings = smsc95xx_set_link_ksettings, + .get_ts_info= ethtool_op_get_ts_info, }; static int smsc95xx_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd) -- 2.7.4
Re: [ovs-dev] [PATCH] datapath: Fix for force/commit action failures
On 07/13/2017 10:08 AM, Darrell Ball wrote: On 7/13/17, 9:25 AM, "ovs-dev-boun...@openvswitch.org on behalf of Greg Rose" wrote: When there is an established connection in direction A->B, it is possible to receive a packet on port B which then executes ct(commit,force) without first performing ct() - ie, a lookup. In this case, we would expect that this packet can delete the existing entry so that we can commit a connection with direction B->A. However, currently we only perform a check in skb_nfct_cached() for whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a lookup previously occurred. In the above scenario, a lookup has not occurred but we should still be able to statelessly look up the existing entry and potentially delete the entry if it is in the opposite direction. This patch extends the check to also hint that if the action has the force flag set, then we will lookup the existing entry so that the force check at the end of skb_nfct_cached has the ability to delete the connection. CC: d...@openvswitch.org CC: Pravin Shalar Signed-off-by: Joe Stringer Signed-off-by: Greg Rose --- net/openvswitch/conntrack.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 08679eb..9041cf8 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, ct = nf_ct_get(skb, &ctinfo); /* If no ct, check if we have evidence that an existing conntrack entry * might be found for this skb. This happens when we lose a skb->_nfct - * due to an upcall. If the connection was not confirmed, it is not - * cached and needs to be run through conntrack again. + * due to an upcall, or if the direction is being forced. If the + * connection was not confirmed, it is not cached and needs to be run + * through conntrack again. */ - if (!ct && key->ct_state & OVS_CS_F_TRACKED && + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_INVALID) && - key->ct_zone == info->zone.id) { + key->ct_zone == info->zone.id) || + (!key->ct_state && info->force)) { ct = ovs_ct_find_existing(net, &info->zone, info->family, skb, !!(key->ct_state & OVS_CS_F_NAT_MASK)); if (ct) nf_ct_get(skb, &ctinfo); + else + return false; the above else case looks redundant since it maps to the following check if (!ct) return false; which services a superset of the code flow. Sure, we can let it fall through... missed that. After waiting for more review I'll send a V2 patch. Thanks for the review Darrell - Greg } if (!ct) return false; -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=5da6ykiSQJBoJXmpq6iVknmPAc5JDzlVngmp8j_xcXA&s=PsX-njQ_hFqy8P77KNEyX0i7u165p2Wrbg4uAYqfbGs&e=
[PATCH net 1/1] net sched actions: rename act_get_notify() to tcf_get_notify()
Make name consistent with other TC event notification routines, such as tcf_add_notify() and tcf_del_notify() Signed-off-by: Roman Mashak --- net/sched/act_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index aed6cf2..f2e9ed3 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -835,7 +835,7 @@ static int tca_get_fill(struct sk_buff *skb, struct list_head *actions, } static int -act_get_notify(struct net *net, u32 portid, struct nlmsghdr *n, +tcf_get_notify(struct net *net, u32 portid, struct nlmsghdr *n, struct list_head *actions, int event) { struct sk_buff *skb; @@ -1018,7 +1018,7 @@ static int tca_action_flush(struct net *net, struct nlattr *nla, } if (event == RTM_GETACTION) - ret = act_get_notify(net, portid, n, &actions, event); + ret = tcf_get_notify(net, portid, n, &actions, event); else { /* delete */ ret = tcf_del_notify(net, n, &actions, portid); if (ret) -- 1.9.1
Re: [ovs-dev] [PATCH] datapath: Fix for force/commit action failures
On 7/13/17, 9:25 AM, "ovs-dev-boun...@openvswitch.org on behalf of Greg Rose" wrote: When there is an established connection in direction A->B, it is possible to receive a packet on port B which then executes ct(commit,force) without first performing ct() - ie, a lookup. In this case, we would expect that this packet can delete the existing entry so that we can commit a connection with direction B->A. However, currently we only perform a check in skb_nfct_cached() for whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a lookup previously occurred. In the above scenario, a lookup has not occurred but we should still be able to statelessly look up the existing entry and potentially delete the entry if it is in the opposite direction. This patch extends the check to also hint that if the action has the force flag set, then we will lookup the existing entry so that the force check at the end of skb_nfct_cached has the ability to delete the connection. CC: d...@openvswitch.org CC: Pravin Shalar Signed-off-by: Joe Stringer Signed-off-by: Greg Rose --- net/openvswitch/conntrack.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 08679eb..9041cf8 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, ct = nf_ct_get(skb, &ctinfo); /* If no ct, check if we have evidence that an existing conntrack entry * might be found for this skb. This happens when we lose a skb->_nfct -* due to an upcall. If the connection was not confirmed, it is not -* cached and needs to be run through conntrack again. +* due to an upcall, or if the direction is being forced. If the +* connection was not confirmed, it is not cached and needs to be run +* through conntrack again. */ - if (!ct && key->ct_state & OVS_CS_F_TRACKED && + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_INVALID) && - key->ct_zone == info->zone.id) { +key->ct_zone == info->zone.id) || +(!key->ct_state && info->force)) { ct = ovs_ct_find_existing(net, &info->zone, info->family, skb, !!(key->ct_state & OVS_CS_F_NAT_MASK)); if (ct) nf_ct_get(skb, &ctinfo); + else + return false; the above else case looks redundant since it maps to the following check if (!ct) return false; which services a superset of the code flow. } if (!ct) return false; -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=5da6ykiSQJBoJXmpq6iVknmPAc5JDzlVngmp8j_xcXA&s=PsX-njQ_hFqy8P77KNEyX0i7u165p2Wrbg4uAYqfbGs&e=
Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants
On 07/13/2017 09:16 AM, Jesper Dangaard Brouer wrote: > On Thu, 13 Jul 2017 13:14:30 +0200 > Jesper Dangaard Brouer wrote: > >> I'm still getting crashes (but much harder to provoke), but I figured >> out why. We sort of missed one case, where map_to_flush gets set, when >> the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then >> forgets to call xdp_do_flush_map, if all packets in that NAPI cycle >> failed. We could blame the driver, but yhe clean solution is making >> sure, that we don't set map_to_flush when the __bpf_tx_xdp() call >> fails. It should also handle the other case I fixed I'll cleanup >> my PoC-fix patch, test it and provide it here. > > I changed flow in the function to be: Great, I'll merge this, the other couple fixes, and the bitops optimization and hopefully then we are set. I'll post a v2 and we can do some final checks. Thanks! John > > int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, > struct bpf_prog *xdp_prog) > { > struct redirect_info *ri = this_cpu_ptr(&redirect_info); > struct bpf_map *map = ri->map; > u32 index = ri->ifindex; > struct net_device *fwd; > int err = -EINVAL; > > ri->ifindex = 0; > ri->map = NULL; > > fwd = __dev_map_lookup_elem(map, index); > if (!fwd) > goto out; > > if (ri->map_to_flush && (ri->map_to_flush != map)) > xdp_do_flush_map(); > > err = __bpf_tx_xdp(fwd, map, xdp, index); > if (likely(!err)) > ri->map_to_flush = map; > > out: > trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT); > return err; > } > > > The diff is: > > diff --git a/net/core/filter.c b/net/core/filter.c > index 4ca895d6ed51..c50a7ec2cdab 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2483,26 +2483,25 @@ int xdp_do_redirect_map(struct net_device *dev, > struct xdp_buff *xdp, > struct bpf_map *map = ri->map; > u32 index = ri->ifindex; > struct net_device *fwd; > + int err = -EINVAL; > + > + ri->ifindex = 0; > + ri->map = NULL; > > fwd = __dev_map_lookup_elem(map, index); > if (!fwd) > goto out; > > - ri->ifindex = 0; > - > if (ri->map_to_flush && (ri->map_to_flush != map)) > xdp_do_flush_map(); > > - ri->map_to_flush = map; > - ri->map = NULL; > + err = __bpf_tx_xdp(fwd, map, xdp, index); > + if (likely(!err)) > + ri->map_to_flush = map; > > - trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT); > - > - return __bpf_tx_xdp(fwd, map, xdp, index); > out: > - ri->ifindex = 0; > - ri->map = NULL; > - return -EINVAL; > + trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT); > + return err; > } > > int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, >
[PATCH net] net/packet: Fix Tx queue selection for AF_PACKET
When PACKET_QDISC_BYPASS is not used, Tx queue selection will be done before the packet is enqueued, taking into account any mappings set by a queuing discipline such as mqprio without hardware offloading. This selection may be affected by a previously saved queue_mapping, either on the Rx path, or done before the packet reaches the device, as it's currently the case for AF_PACKET. In order for queue selection to work as expected when using traffic control, there can't be another selection done before that point is reached, so move the call to packet_pick_tx_queue to packet_direct_xmit, leaving the default xmit path as it was before PACKET_QDISC_BYPASS was introduced. A forward declaration of packet_pick_tx_queue() is introduced to avoid the need to reorder the functions within the file. Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by: Iván Briano --- net/packet/af_packet.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index e3beb28..008bb34 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -214,6 +214,7 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *, static void prb_fill_vlan_info(struct tpacket_kbdq_core *, struct tpacket3_hdr *); static void packet_flush_mclist(struct sock *sk); +static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb); struct packet_skb_cb { union { @@ -260,6 +261,7 @@ static int packet_direct_xmit(struct sk_buff *skb) if (skb != orig_skb) goto drop; + packet_pick_tx_queue(dev, skb); txq = skb_get_tx_queue(dev, skb); local_bh_disable(); @@ -2747,8 +2749,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) goto tpacket_error; } - packet_pick_tx_queue(dev, skb); - skb->destructor = tpacket_destruct_skb; __packet_set_status(po, ph, TP_STATUS_SENDING); packet_inc_pending(&po->tx_ring); @@ -2931,8 +2931,6 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) skb->priority = sk->sk_priority; skb->mark = sockc.mark; - packet_pick_tx_queue(dev, skb); - if (po->has_vnet_hdr) { err = virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le()); if (err) -- 2.9.4
Re: [PATCH] net: broadcom: bnx2x: make a couple of const arrays static
From: Colin King Date: Tue, 11 Jul 2017 11:52:23 +0100 > From: Colin Ian King > > Don't populate various tables on the stack but make them static const. > Makes the object code smaller by nearly 200 bytes: > > Before: >text data bss dec hex filename > 113468 11200 0 124668 1e6fc bnx2x_ethtool.o > > After: >text data bss dec hex filename > 113129 11344 0 124473 1e639 bnx2x_ethtool.o > > Signed-off-by: Colin Ian King Applied.
Re: [PATCH] dccp: make const array error_code static
From: Colin King Date: Thu, 13 Jul 2017 12:22:24 +0100 > From: Colin Ian King > > Don't populate array error_code on the stack but make it static. Makes > the object code smaller by almost 250 bytes: > > Before: >text data bss dec hex filename > 10366 983 0 113492c55 net/dccp/input.o > > After: >text data bss dec hex filename > 10161 1039 0 112002bc0 net/dccp/input.o > > Signed-off-by: Colin Ian King Applied.
Re: [PATCH] net: stmmac: make const array route_possibilities static
From: Colin King Date: Tue, 11 Jul 2017 12:18:48 +0100 > From: Colin Ian King > > Don't populate array route_possibilities on the stack but make it > static const. Makes the object code a little smaller by 85 bytes: > > Before: >text data bss dec hex filename >9901 2448 0 12349303d dwmac4_core.o > > After: >text data bss dec hex filename >9760 2504 0 122642fe8 dwmac4_core.o > > Signed-off-by: Colin Ian King Applied.
Re: [PATCH] netlink: correctly document nla_put_u64_64bit()
From: Rolf Eike Beer Date: Thu, 13 Jul 2017 16:50:24 +0200 > From 90bda8d1bc2a0c5d283e4e0a4b19812a4cce72bd Mon Sep 17 00:00:00 2001 > From: Rolf Eike Beer > Date: Thu, 13 Jul 2017 16:46:43 +0200 > Subject: [PATCH] netlink: correctly document nla_put_u64_64bit() > > Signed-off-by: Rolf Eike Beer Applied.
[PATCH] datapath: Fix for force/commit action failures
When there is an established connection in direction A->B, it is possible to receive a packet on port B which then executes ct(commit,force) without first performing ct() - ie, a lookup. In this case, we would expect that this packet can delete the existing entry so that we can commit a connection with direction B->A. However, currently we only perform a check in skb_nfct_cached() for whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a lookup previously occurred. In the above scenario, a lookup has not occurred but we should still be able to statelessly look up the existing entry and potentially delete the entry if it is in the opposite direction. This patch extends the check to also hint that if the action has the force flag set, then we will lookup the existing entry so that the force check at the end of skb_nfct_cached has the ability to delete the connection. CC: d...@openvswitch.org CC: Pravin Shalar Signed-off-by: Joe Stringer Signed-off-by: Greg Rose --- net/openvswitch/conntrack.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 08679eb..9041cf8 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net, ct = nf_ct_get(skb, &ctinfo); /* If no ct, check if we have evidence that an existing conntrack entry * might be found for this skb. This happens when we lose a skb->_nfct -* due to an upcall. If the connection was not confirmed, it is not -* cached and needs to be run through conntrack again. +* due to an upcall, or if the direction is being forced. If the +* connection was not confirmed, it is not cached and needs to be run +* through conntrack again. */ - if (!ct && key->ct_state & OVS_CS_F_TRACKED && + if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_INVALID) && - key->ct_zone == info->zone.id) { +key->ct_zone == info->zone.id) || +(!key->ct_state && info->force)) { ct = ovs_ct_find_existing(net, &info->zone, info->family, skb, !!(key->ct_state & OVS_CS_F_NAT_MASK)); if (ct) nf_ct_get(skb, &ctinfo); + else + return false; } if (!ct) return false; -- 1.8.3.1
Re: [PATCH net] cxgb4: add new T5 pci device id's
From: Ganesh Goudar Date: Thu, 13 Jul 2017 18:45:07 +0530 > Add 0x50a3 and 0x50a4 T5 device id's > > Signed-off-by: Ganesh Goudar Applied.
Re: [PATCH net] xgene: Don't fail probe, if there is no clk resource for SGMII interfaces
From: Thomas Bogendoerfer Date: Thu, 13 Jul 2017 10:57:40 +0200 > From: Thomas Bogendoerfer > > This change fixes following problem > > [1.827940] xgene-enet: probe of 1f210030.ethernet failed with error -2 > > which leads to a missing ethernet interface (reproducable at least on > Gigabyte MP30-AR0 and APM Mustang systems). > > The check for a valid clk resource fails, because DT doesn't provide a > clock for sgenet1. But the driver doesn't use this clk, if the ethernet > port is connected via SGMII. Therefore this patch avoids probing for clk > on SGMII interfaces. Applied, thanks. > Fixes: 9aea7779b764 drivers: net: xgene: Fix crash on DT systems Please put the commit header text inside of parenthesis and double quotes, like this: Fixes: 9aea7779b764 ("drivers: net: xgene: Fix crash on DT systems") I fixed it up for you this time.
Re: [PATCH] bpf: fix return in bpf_skb_adjust_net
From: Kefeng Wang Date: Thu, 13 Jul 2017 14:27:58 +0800 > The bpf_skb_adjust_net() ignores the return value of bpf_skb_net_shrink/grow, > and always return 0, fix it by return 'ret'. > > Signed-off-by: Kefeng Wang Applied, thanks.
Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants
On Thu, 13 Jul 2017 13:14:30 +0200 Jesper Dangaard Brouer wrote: > I'm still getting crashes (but much harder to provoke), but I figured > out why. We sort of missed one case, where map_to_flush gets set, when > the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then > forgets to call xdp_do_flush_map, if all packets in that NAPI cycle > failed. We could blame the driver, but yhe clean solution is making > sure, that we don't set map_to_flush when the __bpf_tx_xdp() call > fails. It should also handle the other case I fixed I'll cleanup > my PoC-fix patch, test it and provide it here. I changed flow in the function to be: int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { struct redirect_info *ri = this_cpu_ptr(&redirect_info); struct bpf_map *map = ri->map; u32 index = ri->ifindex; struct net_device *fwd; int err = -EINVAL; ri->ifindex = 0; ri->map = NULL; fwd = __dev_map_lookup_elem(map, index); if (!fwd) goto out; if (ri->map_to_flush && (ri->map_to_flush != map)) xdp_do_flush_map(); err = __bpf_tx_xdp(fwd, map, xdp, index); if (likely(!err)) ri->map_to_flush = map; out: trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT); return err; } The diff is: diff --git a/net/core/filter.c b/net/core/filter.c index 4ca895d6ed51..c50a7ec2cdab 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2483,26 +2483,25 @@ int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, struct bpf_map *map = ri->map; u32 index = ri->ifindex; struct net_device *fwd; + int err = -EINVAL; + + ri->ifindex = 0; + ri->map = NULL; fwd = __dev_map_lookup_elem(map, index); if (!fwd) goto out; - ri->ifindex = 0; - if (ri->map_to_flush && (ri->map_to_flush != map)) xdp_do_flush_map(); - ri->map_to_flush = map; - ri->map = NULL; + err = __bpf_tx_xdp(fwd, map, xdp, index); + if (likely(!err)) + ri->map_to_flush = map; - trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT); - - return __bpf_tx_xdp(fwd, map, xdp, index); out: - ri->ifindex = 0; - ri->map = NULL; - return -EINVAL; + trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT); + return err; } int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH net] libceph: osdmap: Fix some NULL dereferences
On Thu, Jul 13, 2017 at 9:45 AM, Dan Carpenter wrote: > There are hidden gotos in the ceph_decode_* macros. We need to set the > "err" variable on these error paths otherwise we end up returning > ERR_PTR(0) which is NULL. It causes NULL dereferences in the callers. > > Fixes: 278b1d709c6a ("libceph: ceph_decode_skip_* helpers") > Signed-off-by: Dan Carpenter > > diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c > index 864789c5974e..c7521a847ef7 100644 > --- a/net/ceph/osdmap.c > +++ b/net/ceph/osdmap.c > @@ -510,6 +510,7 @@ static struct crush_map *crush_decode(void *pbyval, void > *end) > } > } > > + err = -EINVAL; > ceph_decode_skip_map(p, end, 32, string, bad); /* type_map */ > ceph_decode_skip_map(p, end, 32, string, bad); /* name_map */ > ceph_decode_skip_map(p, end, 32, string, bad); /* rule_name_map */ > @@ -1825,9 +1826,9 @@ struct ceph_osdmap *osdmap_apply_incremental(void **p, > void *end, > if (struct_v >= 3) { > /* new_erasure_code_profiles */ > ceph_decode_skip_map_of_map(p, end, string, string, string, > - bad); > + e_inval); > /* old_erasure_code_profiles */ > - ceph_decode_skip_set(p, end, string, bad); > + ceph_decode_skip_set(p, end, string, e_inval); > } > > if (struct_v >= 4) { Hi Dan, I applied osdmap_apply_incremental() hunk and fixed a similar bug in osdmap_decode() (it's a not NULL deref, that's why smatch didn't catch it). I posted a separate patch for crush_decode(). This is the second time in two months you are fixing it, so I wanted something more future-proof. See the end result at https://github.com/ceph/ceph-client/commits/testing. Thanks, Ilya
Re: [BUG]: NULL ptr dereference in unix_stream_sendmsg+0x1c1/0x380
Sorry, I've forgot to change the subject as the first time it fired at xlog_cil_push(), but all reproductions fire on unix_stream_sendmsg(). On 07/13/2017 06:16 PM, Dmitry Safonov wrote: Hello, We run CRIU tests on linux-next tree and today we found this issue. CRIU tests are the set of small programs to check checkpoint/restore of different primitives (files, sockets, signals, pipes, etc). https://github.com/xemul/criu/tree/master/test Each test is executed three times: without namespaces, in a set of all namespaces except userns, in a set of all namespaces. When a test passed the preparation tests, it sends a signal to an executer, and then the executer dumps and restores tests processes, and sends a signal to the test back to check that everything are restored correctly. = Run zdtm/transition/unix_sock in ns == Start test ./unix_sock --pidfile=unix_sock.pid --outfile=unix_sock.out --filename=unix_sock.test Run criu dump [ 57.647284] writing to auto_msgmni has no effect [ 60.730380] criu (2023) used greatest stack depth: 11808 bytes left Run criu restore [ 60.993529] BUG: unable to handle kernel NULL pointer dereference at (null) [ 60.994221] IP: skb_queue_tail+0x2e/0x50 [ 60.994589] PGD 71070067 [ 60.994590] P4D 71070067 [ 60.994854] PUD 71071067 [ 60.995102] PMD 0 [ 60.995352] [ 60.995694] Oops: 0002 [#1] SMP [ 60.996004] CPU: 0 PID: 2053 Comm: unix_sock Not tainted 4.12.0-next-20170713 #6 [ 60.996706] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 [ 60.997657] task: 880074748c80 task.stack: c9594000 [ 60.998208] RIP: 0010:skb_queue_tail+0x2e/0x50 [ 60.998614] RSP: 0018:c9597cf8 EFLAGS: 00010046 [ 60.999132] RAX: 0246 RBX: 88006f3fa0c8 RCX: [ 60.999797] RDX: RSI: 0246 RDI: 88006f3fa0dc [ 61.000455] RBP: c9597d10 R08: c9597e50 R09: [ 61.001114] R10: 880072daea00 R11: 88007d002d80 R12: 880072daea00 [ 61.001772] R13: 88006f3fa0dc R14: 88006f3fa000 R15: 0001 [ 61.002451] FS: () GS:88007fc0(0063) knlGS:f7f7b380 [ 61.003198] CS: 0010 DS: 002b ES: 002b CR0: 80050033 [ 61.003735] CR2: CR3: 7106f000 CR4: 06f0 [ 61.004393] DR0: DR1: DR2: [ 61.005050] DR3: DR6: fffe0ff0 DR7: 0400 [ 61.005717] Call Trace: [ 61.005952] unix_stream_sendmsg+0x1c1/0x380 [ 61.006345] sock_sendmsg+0x33/0x40 [ 61.006667] sock_write_iter+0x7d/0xc0 [ 61.007032] __vfs_write+0xcd/0x120 [ 61.007353] vfs_write+0xac/0x1a0 [ 61.007677] SyS_write+0x41/0xa0 [ 61.007996] do_fast_syscall_32+0x8b/0x15c [ 61.008371] entry_SYSENTER_compat+0x4c/0x5b [ 61.008781] RIP: 0023:0xf7f7faf9 [ 61.009082] RSP: 002b:fffd62f8 EFLAGS: 0246 ORIG_RAX: 0004 [ 61.009811] RAX: ffda RBX: 0004 RCX: fffd6738 [ 61.010453] RDX: 03e8 RSI: fffd63b8 RDI: fffd6749 [ 61.06] RBP: fffd6b38 R08: R09: [ 61.011795] R10: R11: R12: [ 61.012378] R13: R14: R15: [ 61.013027] Code: e5 41 55 4c 8d 6f 14 41 54 53 48 89 fb 4c 89 ef 49 89 f4 e8 85 d3 21 00 48 8b 53 08 49 89 1c 24 4c 89 ef 48 89 c6 49 89 54 24 08 <4c> 89 22 83 43 10 01 4c 89 63 08 e8 22 d4 21 00 5b 41 5c 41 5d [ 61.014778] RIP: skb_queue_tail+0x2e/0x50 RSP: c9597cf8 [ 61.015333] CR2: [ 61.015639] ---[ end trace efd0a4201d4b29fc ]--- The bug is easily (5/5) reproduced on next-20170713 with the following: git clone https://github.com/xemul/criu.git cd criu && git checkout criu-dev COMPAT_TEST=y make -j5 zdtm for i in `seq 1 2`; do ./test/zdtm.py run -t zdtm/transition/unix_sock -f ns ; done -- Dmitry
[BUG]: NULL ptr dereference in xlog_cil_push+0x274/0x430
Hello, We run CRIU tests on linux-next tree and today we found this issue. CRIU tests are the set of small programs to check checkpoint/restore of different primitives (files, sockets, signals, pipes, etc). https://github.com/xemul/criu/tree/master/test Each test is executed three times: without namespaces, in a set of all namespaces except userns, in a set of all namespaces. When a test passed the preparation tests, it sends a signal to an executer, and then the executer dumps and restores tests processes, and sends a signal to the test back to check that everything are restored correctly. = Run zdtm/transition/unix_sock in ns == Start test ./unix_sock --pidfile=unix_sock.pid --outfile=unix_sock.out --filename=unix_sock.test Run criu dump [ 57.647284] writing to auto_msgmni has no effect [ 60.730380] criu (2023) used greatest stack depth: 11808 bytes left Run criu restore [ 60.993529] BUG: unable to handle kernel NULL pointer dereference at (null) [ 60.994221] IP: skb_queue_tail+0x2e/0x50 [ 60.994589] PGD 71070067 [ 60.994590] P4D 71070067 [ 60.994854] PUD 71071067 [ 60.995102] PMD 0 [ 60.995352] [ 60.995694] Oops: 0002 [#1] SMP [ 60.996004] CPU: 0 PID: 2053 Comm: unix_sock Not tainted 4.12.0-next-20170713 #6 [ 60.996706] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 [ 60.997657] task: 880074748c80 task.stack: c9594000 [ 60.998208] RIP: 0010:skb_queue_tail+0x2e/0x50 [ 60.998614] RSP: 0018:c9597cf8 EFLAGS: 00010046 [ 60.999132] RAX: 0246 RBX: 88006f3fa0c8 RCX: [ 60.999797] RDX: RSI: 0246 RDI: 88006f3fa0dc [ 61.000455] RBP: c9597d10 R08: c9597e50 R09: [ 61.001114] R10: 880072daea00 R11: 88007d002d80 R12: 880072daea00 [ 61.001772] R13: 88006f3fa0dc R14: 88006f3fa000 R15: 0001 [ 61.002451] FS: () GS:88007fc0(0063) knlGS:f7f7b380 [ 61.003198] CS: 0010 DS: 002b ES: 002b CR0: 80050033 [ 61.003735] CR2: CR3: 7106f000 CR4: 06f0 [ 61.004393] DR0: DR1: DR2: [ 61.005050] DR3: DR6: fffe0ff0 DR7: 0400 [ 61.005717] Call Trace: [ 61.005952] unix_stream_sendmsg+0x1c1/0x380 [ 61.006345] sock_sendmsg+0x33/0x40 [ 61.006667] sock_write_iter+0x7d/0xc0 [ 61.007032] __vfs_write+0xcd/0x120 [ 61.007353] vfs_write+0xac/0x1a0 [ 61.007677] SyS_write+0x41/0xa0 [ 61.007996] do_fast_syscall_32+0x8b/0x15c [ 61.008371] entry_SYSENTER_compat+0x4c/0x5b [ 61.008781] RIP: 0023:0xf7f7faf9 [ 61.009082] RSP: 002b:fffd62f8 EFLAGS: 0246 ORIG_RAX: 0004 [ 61.009811] RAX: ffda RBX: 0004 RCX: fffd6738 [ 61.010453] RDX: 03e8 RSI: fffd63b8 RDI: fffd6749 [ 61.06] RBP: fffd6b38 R08: R09: [ 61.011795] R10: R11: R12: [ 61.012378] R13: R14: R15: [ 61.013027] Code: e5 41 55 4c 8d 6f 14 41 54 53 48 89 fb 4c 89 ef 49 89 f4 e8 85 d3 21 00 48 8b 53 08 49 89 1c 24 4c 89 ef 48 89 c6 49 89 54 24 08 <4c> 89 22 83 43 10 01 4c 89 63 08 e8 22 d4 21 00 5b 41 5c 41 5d [ 61.014778] RIP: skb_queue_tail+0x2e/0x50 RSP: c9597cf8 [ 61.015333] CR2: [ 61.015639] ---[ end trace efd0a4201d4b29fc ]--- The bug is easily (5/5) reproduced on next-20170713 with the following: git clone https://github.com/xemul/criu.git cd criu && git checkout criu-dev COMPAT_TEST=y make -j5 zdtm for i in `seq 1 2`; do ./test/zdtm.py run -t zdtm/transition/unix_sock -f ns ; done -- Dmitry
[PATCH] netlink: correctly document nla_put_u64_64bit()
>From 90bda8d1bc2a0c5d283e4e0a4b19812a4cce72bd Mon Sep 17 00:00:00 2001 From: Rolf Eike Beer Date: Thu, 13 Jul 2017 16:46:43 +0200 Subject: [PATCH] netlink: correctly document nla_put_u64_64bit() Signed-off-by: Rolf Eike Beer --- include/net/netlink.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/net/netlink.h b/include/net/netlink.h index 01709172b3d3..ef8e6c3a80a6 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -98,8 +98,8 @@ * nla_put_u8(skb, type, value) add u8 attribute to skb * nla_put_u16(skb, type, value) add u16 attribute to skb * nla_put_u32(skb, type, value) add u32 attribute to skb - * nla_put_u64_64bits(skb, type, - * value, padattr) add u64 attribute to skb + * nla_put_u64_64bit(skb, type, + * value, padattr) add u64 attribute to skb * nla_put_s8(skb, type, value) add s8 attribute to skb * nla_put_s16(skb, type, value) add s16 attribute to skb * nla_put_s32(skb, type, value) add s32 attribute to skb -- 2.13.2 -- Rolf Eike Beer, emlix GmbH, http://www.emlix.com Fon +49 551 30664-0, Fax +49 551 30664-11 Bertha-von-Suttner-Str. 9, 37085 Göttingen, Germany Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160 Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055 emlix – smart embedded open source
Re: sctp refcount bug.
On Thu, Jul 13, 2017 at 11:38:34AM -0300, Marcelo Ricardo Leitner wrote: > On Thu, Jul 13, 2017 at 10:36:39AM -0400, Dave Jones wrote: > > Hit this on Linus' current tree. > > > > > > refcount_t: underflow; use-after-free. > > Any tips on how to reproduce this? Only seen it once so far. Will see if I can narrow it down if it reproduces. It took ~12 hours of fuzzing to find overnight. Dave
Re: sctp refcount bug.
On Thu, Jul 13, 2017 at 10:36:39AM -0400, Dave Jones wrote: > Hit this on Linus' current tree. > > > refcount_t: underflow; use-after-free. Any tips on how to reproduce this? Marcelo
sctp refcount bug.
Hit this on Linus' current tree. refcount_t: underflow; use-after-free. [ cut here ] WARNING: CPU: 2 PID: 14455 at lib/refcount.c:186 refcount_sub_and_test+0x45/0x50 CPU: 2 PID: 14455 Comm: trinity-c46 Tainted: G D 4.12.0-think+ #11 task: 8804fc71b8c0 task.stack: c90002328000 RIP: 0010:refcount_sub_and_test+0x45/0x50 RSP: 0018:c9000232ba58 EFLAGS: 00010282 RAX: 0026 RBX: 88001db1d1c0 RCX: RDX: RSI: 88050a3ccca8 RDI: 88050a3ccca8 RBP: c9000232ba58 R08: R09: 0001 R10: c9000232ba88 R11: R12: 88000d3f9b40 R13: 880456948008 R14: 880456948870 R15: c9000232bd10 FS: 7f79b1032700() GS:88050a20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 008436726348 CR3: 00022cc87000 CR4: 001406e0 DR0: 7f731068f000 DR1: 7f2d83eb9000 DR2: 7f302340e000 DR3: DR6: 0ff0 DR7: 0600 Call Trace: sctp_wfree+0x5d/0x190 [sctp] skb_release_head_state+0x64/0xc0 skb_release_all+0x12/0x30 consume_skb+0x50/0x170 sctp_chunk_put+0x59/0x80 [sctp] sctp_chunk_free+0x26/0x30 [sctp] __sctp_outq_teardown+0x1d8/0x270 [sctp] sctp_outq_free+0xe/0x10 [sctp] sctp_association_free+0x92/0x220 [sctp] sctp_do_sm+0x12a6/0x1920 [sctp] ? __get_user_4+0x18/0x20 ? no_context+0x3f/0x360 ? lock_acquire+0xe7/0x1e0 ? skb_dequeue+0x1d/0x70 sctp_primitive_SHUTDOWN+0x33/0x40 [sctp] sctp_close+0x26e/0x2a0 [sctp] inet_release+0x3c/0x60 sock_release+0x1f/0x80 sock_close+0x12/0x20 __fput+0xf8/0x200 fput+0xe/0x10 task_work_run+0x85/0xc0 exit_to_usermode_loop+0xa8/0xb0 do_syscall_64+0x151/0x190 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f79b095b1e9 RSP: 002b:7ffc5eca3088 EFLAGS: 0246 ORIG_RAX: 0120 RAX: fff2 RBX: 0120 RCX: 7f79b095b1e9 RDX: 006e RSI: 008436738120 RDI: 0130 RBP: 7ffc5eca3130 R08: R09: 0ff0 R10: 00080800 R11: 0246 R12: 0002 R13: 7f79b0ee9058 R14: 7f79b1032698 R15: 7f79b0ee9000 Code: 75 e6 85 d2 0f 94 c0 c3 31 c0 c3 80 3d ce 95 bc 00 00 75 f4 55 48 c7 c7 00 d9 ee 81 48 89 e5 c6 05 ba 95 bc 00 01 e8 fc 2c c0 ff <0f> ff 31 c0 5d c3 0f 1f 44 00 00 55 48 89 fe bf 01 00 00 00 48 ---[ end trace 19b7bd878c0f56fd ]--- [ cut here ] WARNING: CPU: 2 PID: 14455 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x1b8/0x1f0 CPU: 2 PID: 14455 Comm: trinity-c46 Tainted: G D W 4.12.0-think+ #11 task: 8804fc71b8c0 task.stack: c90002328000 RIP: 0010:inet_sock_destruct+0x1b8/0x1f0 RSP: 0018:c9000232bcf8 EFLAGS: 00010286 RAX: RBX: 88000d3f9b40 RCX: RDX: fd00 RSI: 0300 RDI: 88000d3f9ca8 RBP: c9000232bd08 R08: R09: R10: R11: R12: 88000d3f9ca8 R13: 88000d3f9b40 R14: 88000d3f9bc8 R15: 8801836e21d0 FS: 7f79b1032700() GS:88050a20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 3b9ab732 CR3: 00022cc87000 CR4: 001406e0 DR0: 7f731068f000 DR1: 7f2d83eb9000 DR2: 7f302340e000 DR3: DR6: 0ff0 DR7: 0600 Call Trace: sctp_destruct_sock+0x25/0x30 [sctp] __sk_destruct+0x28/0x230 sk_destruct+0x20/0x30 __sk_free+0x43/0xa0 sk_free+0x25/0x30 sctp_close+0x218/0x2a0 [sctp] inet_release+0x3c/0x60 sock_release+0x1f/0x80 sock_close+0x12/0x20 __fput+0xf8/0x200 fput+0xe/0x10 task_work_run+0x85/0xc0 exit_to_usermode_loop+0xa8/0xb0 do_syscall_64+0x151/0x190 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f79b095b1e9 RSP: 002b:7ffc5eca3088 EFLAGS: 0246 ORIG_RAX: 0120 RAX: fff2 RBX: 0120 RCX: 7f79b095b1e9 RDX: 006e RSI: 008436738120 RDI: 0130 RBP: 7ffc5eca3130 R08: R09: 0ff0 R10: 00080800 R11: 0246 R12: 0002 R13: 7f79b0ee9058 R14: 7f79b1032698 R15: 7f79b0ee9000 Code: df e8 bd 5f f4 ff e9 07 ff ff ff 0f ff 8b 83 8c 02 00 00 85 c0 0f 84 2d ff ff ff 0f ff 8b 93 88 02 00 00 85 d2 0f 84 2b ff ff ff <0f> ff 8b 83 40 02 00 00 85 c0 0f 84 29 ff ff ff 0f ff e9 22 ff ---[ end trace 19b7bd878c0f56fe ]--- [ cut here ] WARNING: CPU: 2 PID: 14455 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c8/0x1f0 CPU: 2 PID: 14455 Comm: trinity-c46 Tainted: G D W 4.12.0-think+ #11 task: 8804fc71b8c0 task.stack: c90002328000 RIP: 0010:inet_sock_destruct+0x1c8/0x1f0 RSP: 0018:c9000232bcf8 EFLAGS: 00010206 RAX: 0300 RBX: 88000d3f9b40 RCX: RDX: fd00 RSI: 0300 RDI: fff
[PATCH v7 1/3] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
From: Casey Leedom The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed Ordering Attribute should not be used on Transaction Layer Packets destined for the PCIe End Node so flagged. Initially flagged this way are Intel E5-26xx Root Complex Ports which suffer from a Flow Control Credit Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which don't obey PCIe 3.0 ordering rules which can lead to Data Corruption. Signed-off-by: Casey Leedom Signed-off-by: Ding Tianhong --- drivers/pci/quirks.c | 38 ++ include/linux/pci.h | 2 ++ 2 files changed, 40 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 6967c6b..1e1cdbe 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4016,6 +4016,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev) quirk_tw686x_class); /* + * Some devices have problems with Transaction Layer Packets with the Relaxed + * Ordering Attribute set. Such devices should mark themselves and other + * Device Drivers should check before sending TLPs with RO set. + */ +static void quirk_relaxedordering_disable(struct pci_dev *dev) +{ + dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING; +} + +/* + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can + * cause performance problems with Upstream Transaction Layer Packets with + * Relaxed Ordering set. + */ +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8, + quirk_relaxedordering_disable); +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8, + quirk_relaxedordering_disable); +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8, + quirk_relaxedordering_disable); + +/* + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex + * where Upstream Transaction Layer Packets with the Relaxed Ordering + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering + * set. This is a violation of the PCIe 3.0 Transaction Ordering Rules + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0 + * November 10, 2010). As a result, on this platform we can't use Relaxed + * Ordering for Upstream TLPs. + */ +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8, + quirk_relaxedordering_disable); +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8, + quirk_relaxedordering_disable); +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8, + quirk_relaxedordering_disable); + +/* * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same * values for the Attribute as were supplied in the header of the * corresponding Request, except as explicitly allowed when IDO is used." diff --git a/include/linux/pci.h b/include/linux/pci.h index 4869e66..412ec1c 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -188,6 +188,8 @@ enum pci_dev_flags { * the direct_complete optimization. */ PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11), + /* Don't use Relaxed Ordering for TLPs directed at this device */ + PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12), }; enum pci_irq_reroute_variant { -- 1.8.3.1
[PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported
The PCIe Device Control Register use the bit 4 to indicate that whether the device is permitted to enable relaxed ordering or not. But relaxed ordering is not safe for some platform which could only use strong write ordering, so devices are allowed (but not required) to enable relaxed ordering bit by default. If a PCIe device didn't enable the relaxed ordering attribute default, we should not do anything in the PCIe configuration, otherwise we should check if any of the devices above us do not support relaxed ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on the result if we get a return that indicate that the relaxed ordering is not supported we should update our device to disable relaxed ordering in configuration space. If the device above us doesn't exist or isn't the PCIe device, we shouldn't do anything and skip updating relaxed ordering because we are probably running in a guest machine. Signed-off-by: Ding Tianhong --- drivers/pci/pci.c | 29 + drivers/pci/probe.c | 37 + include/linux/pci.h | 2 ++ 3 files changed, 68 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d88edf5..7a6b32f 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) EXPORT_SYMBOL(pcie_set_mps); /** + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit + * @dev: PCI device to query + * + * If possible clear relaxed ordering + */ +int pcie_clear_relaxed_ordering(struct pci_dev *dev) +{ + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, + PCI_EXP_DEVCTL_RELAX_EN); +} +EXPORT_SYMBOL(pcie_clear_relaxed_ordering); + +/** + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support + * @dev: PCI device to query + * + * Returns true if the device support relaxed ordering attribute. + */ +bool pcie_relaxed_ordering_supported(struct pci_dev *dev) +{ + u16 v; + + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v); + + return !!(v & PCI_EXP_DEVCTL_RELAX_EN); +} +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); + +/** * pcie_get_minimum_link - determine minimum link settings of a PCI device * @dev: PCI device to query * @speed: storage for minimum speed diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index c31310d..48df012 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev) PCI_EXP_DEVCTL_EXT_TAG); } +/** + * pci_dev_should_disable_relaxed_ordering - check if the PCI device + * should disable the relaxed ordering attribute. + * @dev: PCI device + * + * Return true if any of the PCI devices above us do not support + * relaxed ordering. + */ +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev) +{ + while (dev) { + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) + return true; + + dev = dev->bus->self; + } + + return false; +} + +static void pci_configure_relaxed_ordering(struct pci_dev *dev) +{ + /* We should not alter the relaxed ordering bit for the VF */ + if (dev->is_virtfn) + return; + + /* If the releaxed ordering enable bit is not set, do nothing. */ + if (!pcie_relaxed_ordering_supported(dev)) + return; + + if (pci_dev_should_disable_relaxed_ordering(dev)) { + pcie_clear_relaxed_ordering(dev); + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); + } +} + static void pci_configure_device(struct pci_dev *dev) { struct hotplug_params hpp; @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev) pci_configure_mps(dev); pci_configure_extended_tags(dev); + pci_configure_relaxed_ordering(dev); memset(&hpp, 0, sizeof(hpp)); ret = pci_get_hp_params(dev, &hpp); diff --git a/include/linux/pci.h b/include/linux/pci.h index 412ec1c..3aa23a2 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev, void pci_pme_wakeup_bus(struct pci_bus *bus); void pci_d3cold_enable(struct pci_dev *dev); void pci_d3cold_disable(struct pci_dev *dev); +int pcie_clear_relaxed_ordering(struct pci_dev *dev); +bool pcie_relaxed_ordering_supported(struct pci_dev *dev); /* PCI Virtual Channel */ int pci_save_vc_state(struct pci_dev *dev); -- 1.8.3.1
[PATCH v7 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
Some devices have problems with Transaction Layer Packets with the Relaxed Ordering Attribute set. This patch set adds a new PCIe Device Flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known devices with Relaxed Ordering issues, and a use of this new flag by the cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex Ports. It's been years since I've submitted kernel.org patches, I appolgise for the almost certain submission errors. v2: Alexander point out that the v1 was only a part of the whole solution, some platform which has some issues could use the new flag to indicate that it is not safe to enable relaxed ordering attribute, then we need to clear the relaxed ordering enable bits in the PCI configuration when initializing the device. So add a new second patch to modify the PCI initialization code to clear the relaxed ordering enable bit in the event that the root complex doesn't want relaxed ordering enabled. The third patch was base on the v1's second patch and only be changed to query the relaxed ordering enable bit in the PCI configuration space to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes set. This version didn't plan to drop the defines for Intel Drivers to use the new checking way to enable relaxed ordering because it is not the hardest part of the moment, we could fix it in next patchset when this patches reach the goal. v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration, If a PCIe device didn't enable the relaxed ordering attribute default, we should not do anything in the PCIe configuration, otherwise we should check if any of the devices above us do not support relaxed ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on the result if we get a return that indicate that the relaxed ordering is not supported we should update our device to disable relaxed ordering in configuration space. If the device above us doesn't exist or isn't the PCIe device, we shouldn't do anything and skip updating relaxed ordering because we are probably running in a guest. v4: Rename the functions pcie_get_relaxed_ordering and pcie_disable_relaxed_ordering according John's suggestion, and modify the description, use the true/false as the return value. We shouldn't enable relaxed ordering attribute by the setting in the root complex configuration space for PCIe device, so fix it for cxgb4. Fix some format issues. v5: Removed the unnecessary code for some function which only return the bool value, and add the check for VF device. Make this patch set base on 4.12-rc5. v6: Fix the logic error in the need to enable the relaxed ordering attribute for cxgb4. v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed Ordering Enable] in PCI Probe() routine, this will break our current solution for some platform which has problematic when enable the relaxed ordering attribute. According to the latest recommendations, remove the enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer scene, but we agree to leave this problem until we really trigger it. Make this patch set base on 4.12 release version. Casey Leedom (2): PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Ding Tianhong (1): PCI: Enable PCIe Relaxed Ordering if supported drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 + drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 ++ drivers/net/ethernet/chelsio/cxgb4/sge.c| 5 +-- drivers/pci/pci.c | 32 +++ drivers/pci/probe.c | 41 + drivers/pci/quirks.c| 38 +++ include/linux/pci.h | 4 +++ 7 files changed, 136 insertions(+), 2 deletions(-) -- 1.9.0
[PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
From: Casey Leedom cxgb4 Ethernet driver now queries PCIe configuration space to determine if it can send TLPs to it with the Relaxed Ordering Attribute set. Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability Device Control[Relaxed Ordering Enable] at probe routine, to make sure the driver will not send the Relaxed Ordering TLPs to the Root Complex which could not deal the Relaxed Ordering TLPs. Signed-off-by: Casey Leedom Signed-off-by: Ding Tianhong --- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 + drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +-- drivers/net/ethernet/chelsio/cxgb4/sge.c| 5 +++-- 3 files changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index ef4be78..09ea62e 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -529,6 +529,7 @@ enum { /* adapter flags */ USING_SOFT_PARAMS = (1 << 6), MASTER_PF = (1 << 7), FW_OFLD_CONN = (1 << 9), + ROOT_NO_RELAXED_ORDERING = (1 << 10), }; enum { diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index e403fa1..391e484 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device *dev) dev->name, adap->params.vpd.id, adap->name, buf); } -static void enable_pcie_relaxed_ordering(struct pci_dev *dev) -{ - pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN); -} - /* * Free the following resources: * - memory used for tables @@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) } pci_enable_pcie_error_reporting(pdev); - enable_pcie_relaxed_ordering(pdev); pci_set_master(pdev); pci_save_state(pdev); @@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) adapter->msg_enable = DFLT_MSG_ENABLE; memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map)); + /* If possible, we use PCIe Relaxed Ordering Attribute to deliver +* Ingress Packet Data to Free List Buffers in order to allow for +* chipset performance optimizations between the Root Complex and +* Memory Controllers. (Messages to the associated Ingress Queue +* notifying new Packet Placement in the Free Lists Buffers will be +* send without the Relaxed Ordering Attribute thus guaranteeing that +* all preceding PCIe Transaction Layer Packets will be processed +* first.) But some Root Complexes have various issues with Upstream +* Transaction Layer Packets with the Relaxed Ordering Attribute set. +* The PCIe devices which under the Root Complexes will be cleared the +* Relaxed Ordering bit in the configuration space, So we check our +* PCIe configuration space to see if it's flagged with advice against +* using Relaxed Ordering. +*/ + if (!pcie_relaxed_ordering_supported(pdev)) + adapter->flags |= ROOT_NO_RELAXED_ORDERING; + spin_lock_init(&adapter->stats_lock); spin_lock_init(&adapter->tid_release_lock); spin_lock_init(&adapter->win0_lock); diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c index ede1220..4ef68f6 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c @@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq, struct fw_iq_cmd c; struct sge *s = &adap->sge; struct port_info *pi = netdev_priv(dev); + int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING); /* Size needs to be multiple of 16, including status entry. */ iq->size = roundup(iq->size, 16); @@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq, flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc); c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F | -FW_IQ_CMD_FL0FETCHRO_F | -FW_IQ_CMD_FL0DATARO_F | +FW_IQ_CMD_FL0FETCHRO_V(relaxed) | +FW_IQ_CMD_FL0DATARO_V(relaxed) | FW_IQ_CMD_FL0PADEN_F); if (cong >= 0) c.iqns_to_fl0congen |= -- 1.8.3.1
Re: [PATCH net 20/20] net/hinic: Add ethtool and stats
Hi Andrew, We will separate the patches when we will resubmit. Aviad On 7/12/2017 6:43 PM, Andrew Lunn wrote: > On Wed, Jul 12, 2017 at 10:17:26PM +0800, Aviad Krawczyk wrote: > > Hi Avaid > >> + >> +static void hinic_tx_timeout(struct net_device *netdev) >> +{ >> +struct hinic_dev *nic_dev = netdev_priv(netdev); >> + >> +netif_err(nic_dev, drv, netdev, "Tx timeout\n"); >> +} >> + >> +#ifdef CONFIG_NET_POLL_CONTROLLER >> +static void hinic_netpoll(struct net_device *netdev) >> +{ >> +struct hinic_dev *nic_dev = netdev_priv(netdev); >> +struct hinic_hwdev *hwdev = nic_dev->hwdev; >> +int i, num_qps = hinic_hwdev_num_qps(hwdev); >> + >> +for (i = 0; i < num_qps; i++) { >> +struct hinic_txq *txq = &nic_dev->txqs[i]; >> +struct hinic_rxq *rxq = &nic_dev->rxqs[i]; >> + >> +napi_schedule(&txq->napi); >> +napi_schedule(&rxq->napi); >> +} >> +} >> +#endif > > This has nothing to do with ethtool support. Separate patch please. > > Andrew > > . >
[PATCH net] cxgb4: add new T5 pci device id's
Add 0x50a3 and 0x50a4 T5 device id's Signed-off-by: Ganesh Goudar --- drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h index 99987d8..aa28299 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h @@ -174,6 +174,8 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN CH_PCI_ID_TABLE_FENTRY(0x50a0), /* Custom T540-CR */ CH_PCI_ID_TABLE_FENTRY(0x50a1), /* Custom T540-CR */ CH_PCI_ID_TABLE_FENTRY(0x50a2), /* Custom T540-KR4 */ + CH_PCI_ID_TABLE_FENTRY(0x50a3), /* Custom T580-KR4 */ + CH_PCI_ID_TABLE_FENTRY(0x50a4), /* Custom 2x T540-CR */ /* T6 adapters: */ -- 2.1.0
Re: [PATCH net 01/20] net/hinic: Initialize hw interface
Hi Andrew, Now is the merge window and we need to resubmit. We will fix it when we will resubmit. The version number was used for the ethtool information and module version and will be removed. devm_kzalloc - will be used for the allocation of the memory at initialization. We used pr_ for messages that point on errors or information in module, and dev_ for errors in device and netif_ for errors in network device operations. netif_ will be used for network device ops and dev_ for the others. Aviad On 7/12/2017 6:29 PM, Andrew Lunn wrote: >> + >> +#define HINIC_DRV_NAME "HiNIC" >> +#define HINIC_DRV_VERSION "1.0" > > Hi Aviad > > Please don't add a driver version. There was a discussion about this > recently, how pointless it is. > >> +/** >> + * hinic_init_hwdev - Initialize the NIC HW >> + * @hwdev: the NIC HW device that is returned from the initialization >> + * @pdev: the NIC pci device >> + * >> + * Return 0 - Success, negative - Failure >> + * >> + * Initialize the NIC HW device and return a pointer to it in the first arg >> + **/ >> +int hinic_init_hwdev(struct hinic_hwdev **hwdev, struct pci_dev *pdev) >> +{ >> +struct hinic_pfhwdev *pfhwdev; >> +struct hinic_hwif *hwif; >> +int err; >> + >> +hwif = kzalloc(sizeof(*hwif), GFP_KERNEL); > > Using the devm_ functions makes your cleanup code simpler when > handling memory. > >> +/** >> + * nic_dev_init - Initialize the NIC device >> + * @pdev: the NIC pci device >> + * >> + * Return 0 - Success, negative - Failure >> + **/ >> +static int nic_dev_init(struct pci_dev *pdev) >> +{ >> +struct hinic_dev *nic_dev; >> +struct net_device *netdev; >> +struct hinic_hwdev *hwdev; >> +int err, num_qps; >> + >> +err = hinic_init_hwdev(&hwdev, pdev); >> +if (err) { >> +dev_err(&pdev->dev, "Failed to initialize HW device\n"); >> +return err; >> +} >> + >> +num_qps = hinic_hwdev_num_qps(hwdev); >> +if (num_qps <= 0) { >> +dev_err(&pdev->dev, "Invalid number of QPS\n"); >> +err = -EINVAL; >> +goto num_qps_err; >> +} >> + >> +netdev = alloc_etherdev_mq(sizeof(*nic_dev), num_qps); >> +if (!netdev) { >> +pr_err("Failed to allocate Ethernet device\n"); > > Above you used dev_err, here you used pr_err(). Please be consistent. > >> +err = -ENOMEM; >> +goto alloc_etherdev_err; >> +} >> + >> +netdev->netdev_ops = &hinic_netdev_ops; >> + >> +nic_dev = (struct hinic_dev *)netdev_priv(netdev); >> +nic_dev->hwdev = hwdev; >> +nic_dev->netdev = netdev; >> +nic_dev->msg_enable = MSG_ENABLE_DEFAULT; >> + >> +pci_set_drvdata(pdev, netdev); >> + >> +netif_carrier_off(netdev); >> + >> +err = register_netdev(netdev); >> +if (err) { >> +netif_err(nic_dev, probe, netdev, "Failed to register >> netdev\n"); > > probably not a good idea to use netif_err, if register_netdev just > failed. dev_err() would be better. > > > . >
[PATCH net v2] cxgb4: ptp_clock_register() returns error pointers
Check ptp_clock_register() return not only for NULL but also for error pointers, and also nullify adapter->ptp_clock if ptp_clock_register() fails. Fixes: 9c33e4208bce ("cxgb4: Add PTP Hardware Clock (PHC) support") Reported-by: Dan Carpenter Cc: Richard Cochran Signed-off-by: Ganesh Goudar --- v2: nullifying adapter->ptp_clock if ptp_clock_register() fails. --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c index 50517cf..9f9d6ca 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c @@ -441,7 +441,8 @@ void cxgb4_ptp_init(struct adapter *adapter) adapter->ptp_clock = ptp_clock_register(&adapter->ptp_clock_info, &adapter->pdev->dev); - if (!adapter->ptp_clock) { + if (IS_ERR_OR_NULL(adapter->ptp_clock)) { + adapter->ptp_clock = NULL; dev_err(adapter->pdev_dev, "PTP %s Clock registration has failed\n", __func__); return; -- 2.1.0
[PATCH net] net: bridge: fix dest lookup when vlan proto doesn't match
With 802.1ad support the vlan_ingress code started checking for vlan protocol mismatch which causes the current tag to be inserted and the bridge vlan protocol & pvid to be set. The vlan tag insertion changes the skb mac_header and thus the lookup mac dest pointer which was loaded prior to calling br_allowed_ingress in br_handle_frame_finish is VLAN_HLEN bytes off now, pointing to the last two bytes of the destination mac and the first four of the source mac causing lookups to always fail and broadcasting all such packets to all ports. Same thing happens for locally originated packets when passing via br_dev_xmit. So load the dest pointer after the vlan checks and possible skb change. Fixes: 8580e2117c06 ("bridge: Prepare for 802.1ad vlan filtering support") Reported-by: Anitha Narasimha Murthy Signed-off-by: Nikolay Aleksandrov --- net/bridge/br_device.c | 3 ++- net/bridge/br_input.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c index f0f3447e8aa4..861ae2a165f4 100644 --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -34,11 +34,11 @@ static struct lock_class_key bridge_netdev_addr_lock_key; netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev) { struct net_bridge *br = netdev_priv(dev); - const unsigned char *dest = skb->data; struct net_bridge_fdb_entry *dst; struct net_bridge_mdb_entry *mdst; struct pcpu_sw_netstats *brstats = this_cpu_ptr(br->stats); const struct nf_br_ops *nf_ops; + const unsigned char *dest; u16 vid = 0; rcu_read_lock(); @@ -61,6 +61,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev) if (!br_allowed_ingress(br, br_vlan_group_rcu(br), skb, &vid)) goto out; + dest = eth_hdr(skb)->h_dest; if (is_broadcast_ether_addr(dest)) { br_flood(br, skb, BR_PKT_BROADCAST, false, true); } else if (is_multicast_ether_addr(dest)) { diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 013f2290bfa5..7637f58c1226 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -131,11 +131,11 @@ static void br_do_proxy_arp(struct sk_buff *skb, struct net_bridge *br, int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { struct net_bridge_port *p = br_port_get_rcu(skb->dev); - const unsigned char *dest = eth_hdr(skb)->h_dest; enum br_pkt_type pkt_type = BR_PKT_UNICAST; struct net_bridge_fdb_entry *dst = NULL; struct net_bridge_mdb_entry *mdst; bool local_rcv, mcast_hit = false; + const unsigned char *dest; struct net_bridge *br; u16 vid = 0; @@ -153,6 +153,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, false); local_rcv = !!(br->dev->flags & IFF_PROMISC); + dest = eth_hdr(skb)->h_dest; if (is_multicast_ether_addr(dest)) { /* by definition the broadcast is also a multicast address */ if (is_broadcast_ether_addr(dest)) { -- 2.1.4
Re: net-next STATUS page
On Thursday 2017-07-13 13:32, Saeed Mahameed wrote: >>> Therefore, in order to avoid any and all confusion I have created this web site: http://vger.kernel.org/~davem/net-next.html > > You will need an image processing algorithm to determine whether it is open or > close, i have some good neural network algorithms if you are interested :) That's not going to win any hacker prize. A much more resource-saving approach is to just strcasestr(s, "closed") in the raw text. …at least while the page maintainer remains good-hearted.
Re: net-next STATUS page
On 7/13/2017 11:56 AM, Jiri Pirko wrote: Tue, Jul 11, 2017 at 08:58:41PM CEST, step...@networkplumber.org wrote: On Tue, 11 Jul 2017 07:24:16 -0700 (PDT) David Miller wrote: It has gotten to the point that even casually walking around Faro, Portugal last week, random German tourists would stop me in the street and ask if net-next was open or not. Therefore, in order to avoid any and all confusion I have created this web site: https://emea01.safelinks.protection.outlook.com/?url=http:%2F%2Fvger.kernel.org%2F~davem%2Fnet-next.html&data=02%7C01%7Csaeedm%40mellanox.com%7Caf800791df264ccd4eb008d4c9cd20ca%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636355330266033104&sdata=frov8YFnlybI1n3hacYLWJZzYbWL7mk4kWIS6XlG2Vg%3D&reserved=0 Please refer to it if you are ever in doubt. Thanks! Where is the app? There would be good to have git-send-email hook to check this :) You will need an image processing algorithm to determine whether it is open or close, i have some good neural network algorithms if you are interested :)
[PATCH] dccp: make const array error_code static
From: Colin Ian King Don't populate array error_code on the stack but make it static. Makes the object code smaller by almost 250 bytes: Before: textdata bss dec hex filename 10366 983 0 113492c55 net/dccp/input.o After: textdata bss dec hex filename 101611039 0 112002bc0 net/dccp/input.o Signed-off-by: Colin Ian King --- net/dccp/input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/dccp/input.c b/net/dccp/input.c index 4a05d7876850..fa6be9750bb4 100644 --- a/net/dccp/input.c +++ b/net/dccp/input.c @@ -126,7 +126,7 @@ static int dccp_rcv_closereq(struct sock *sk, struct sk_buff *skb) static u16 dccp_reset_code_convert(const u8 code) { - const u16 error_code[] = { + static const u16 error_code[] = { [DCCP_RESET_CODE_CLOSED] = 0, /* normal termination */ [DCCP_RESET_CODE_UNSPECIFIED]= 0, /* nothing known */ [DCCP_RESET_CODE_ABORTED]= ECONNRESET, -- 2.11.0
Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants
On Tue, 11 Jul 2017 11:26:54 -0700 John Fastabend wrote: > On 07/11/2017 07:23 AM, Jesper Dangaard Brouer wrote: > > On Mon, 10 Jul 2017 17:59:17 -0700 > > John Fastabend wrote: > > > >> On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote: > >>> On Sat, 8 Jul 2017 21:06:17 +0200 > >>> Jesper Dangaard Brouer wrote: > >>> > On Sat, 08 Jul 2017 10:46:18 +0100 (WEST) > David Miller wrote: > > > From: John Fastabend > > Date: Fri, 07 Jul 2017 10:48:36 -0700 > > > >> On 07/07/2017 10:34 AM, John Fastabend wrote: > >>> This series adds two new XDP helper routines bpf_redirect() and > >>> bpf_redirect_map(). The first variant bpf_redirect() is meant > >>> to be used the same way it is currently being used by the cls_bpf > >>> classifier. An xdp packet will be redirected immediately when this > >>> is called. > >> > >> Also other than the typo in the title there ;) I'm going to CC > >> the driver maintainers working on XDP (makes for a long CC list but) > >> because we would want to try and get support in as many as possible in > >> the next merge window. > >> > >> For this rev I just implemented on ixgbe because I wrote the > >> original XDP support there. I'll volunteer to do virtio as well. > >> > > > > I went over this series a few times and it looks great to me. > > You didn't even give me some coding style issues to pick on :-) > > We (Daniel, Andy and I) have been reviewing and improving on this > patchset the last couple of weeks ;-). We had some stability issues, > which is why it wasn't published earlier. My plan is to test this > latest patchset again, Monday and Tuesday. I'll try to assess stability > and provide some performance numbers. > >>> > >>> > >>> Damn, I though it was stable, I have been running a lot of performance > >>> tests, and then this just happened :-( > >> > >> Thanks, I'll take a look through the code and see if I can come up with > >> why this might happen. I haven't hit it on my tests yet though. > > > > I've figured out why this happens, and I have a fix, see patch below > > with some comments with questions. > > > > Awesome! > > > The problem is that we can leak map_to_flush in an error path, the fix: > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 2ccd6ff09493..7f1f48668dcf 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -2497,11 +2497,14 @@ int xdp_do_redirect_map(struct net_device *dev, > > struct xdp_buff *xdp, > > ri->map = NULL; > > > > trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT); > > - > > + // Q: Should we also trace "goto out" (failed lookup)? > > + //like bpf_warn_invalid_xdp_redirect(); > > Maybe another trace event? trace_xdp_redirect_failed() > > > return __bpf_tx_xdp(fwd, map, xdp, index); > > out: > > ri->ifindex = 0; > > - ri->map = NULL; > > + // XXX: here we could leak ri->map_to_flush, which could be > > + // picked up later by xdp_do_flush_map() > > + xdp_do_flush_map(); /* Clears ri->map_to_flush + ri->map */ > > +1 > > ah map lookup failed and we need to do the flush nice catch. I'm still getting crashes (but much harder to provoke), but I figured out why. We sort of missed one case, where map_to_flush gets set, when the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then forgets to call xdp_do_flush_map, if all packets in that NAPI cycle failed. We could blame the driver, but yhe clean solution is making sure, that we don't set map_to_flush when the __bpf_tx_xdp() call fails. It should also handle the other case I fixed I'll cleanup my PoC-fix patch, test it and provide it here. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
[PATCH net] net: hns: add acpi function of xge led control
From: LiuJian The current code only support DT method to control xge led. This patch is the implementation of acpi method to control xge led. Signed-off-by: LiuJian Reviewed-by: John Garry Reviewed-by: Yunsheng Lin Reviewed-by: Daode Huang --- drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 3 +- drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 58 +- 2 files changed, 58 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c index ff864a1..a37166e 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c @@ -776,8 +776,9 @@ void hns_ae_update_led_status(struct hnae_handle *handle) assert(handle); mac_cb = hns_get_mac_cb(handle); - if (!mac_cb->cpld_ctrl) + if (mac_cb->media_type != HNAE_MEDIA_TYPE_FIBER) return; + hns_set_led_opt(mac_cb); } diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c index 7a8addd..408b63f 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c @@ -53,6 +53,34 @@ static u32 dsaf_read_sub(struct dsaf_device *dsaf_dev, u32 reg) return ret; } +static void hns_dsaf_acpi_ledctrl_by_port(struct hns_mac_cb *mac_cb, u8 op_type, + u32 link, u32 port, u32 act) +{ + union acpi_object *obj; + union acpi_object obj_args[3], argv4; + + obj_args[0].integer.type = ACPI_TYPE_INTEGER; + obj_args[0].integer.value = link; + obj_args[1].integer.type = ACPI_TYPE_INTEGER; + obj_args[1].integer.value = port; + obj_args[2].integer.type = ACPI_TYPE_INTEGER; + obj_args[2].integer.value = act; + + argv4.type = ACPI_TYPE_PACKAGE; + argv4.package.count = 3; + argv4.package.elements = obj_args; + + obj = acpi_evaluate_dsm(ACPI_HANDLE(mac_cb->dev), + &hns_dsaf_acpi_dsm_guid, 0, op_type, &argv4); + if (!obj) { + dev_warn(mac_cb->dev, "ledctrl fail, link:%d port:%d act:%d!\n", +link, port, act); + return; + } + + ACPI_FREE(obj); +} + static void hns_cpld_set_led(struct hns_mac_cb *mac_cb, int link_status, u16 speed, int data) { @@ -93,6 +121,18 @@ static void hns_cpld_set_led(struct hns_mac_cb *mac_cb, int link_status, } } +static void hns_cpld_set_led_acpi(struct hns_mac_cb *mac_cb, int link_status, +u16 speed, int data) +{ + if (!mac_cb) { + pr_err("cpld_led_set mac_cb is null!\n"); + return; + } + + hns_dsaf_acpi_ledctrl_by_port(mac_cb, HNS_OP_LED_SET_FUNC, + link_status, mac_cb->mac_id, data); +} + static void cpld_led_reset(struct hns_mac_cb *mac_cb) { if (!mac_cb || !mac_cb->cpld_ctrl) @@ -103,6 +143,20 @@ static void cpld_led_reset(struct hns_mac_cb *mac_cb) mac_cb->cpld_led_value = CPLD_LED_DEFAULT_VALUE; } +static void cpld_led_reset_acpi(struct hns_mac_cb *mac_cb) +{ + if (!mac_cb) { + pr_err("cpld_led_reset mac_cb is null!\n"); + return; + } + + if (mac_cb->media_type != HNAE_MEDIA_TYPE_FIBER) +return; + + hns_dsaf_acpi_ledctrl_by_port(mac_cb, HNS_OP_LED_SET_FUNC, + 0, mac_cb->mac_id, 0); +} + static int cpld_set_led_id(struct hns_mac_cb *mac_cb, enum hnae_led_state status) { @@ -604,8 +658,8 @@ struct dsaf_misc_op *hns_misc_op_get(struct dsaf_device *dsaf_dev) misc_op->cfg_serdes_loopback = hns_mac_config_sds_loopback; } else if (is_acpi_node(dsaf_dev->dev->fwnode)) { - misc_op->cpld_set_led = hns_cpld_set_led; - misc_op->cpld_reset_led = cpld_led_reset; + misc_op->cpld_set_led = hns_cpld_set_led_acpi; + misc_op->cpld_reset_led = cpld_led_reset_acpi; misc_op->cpld_set_led_id = cpld_set_led_id; misc_op->dsaf_reset = hns_dsaf_rst_acpi; -- 2.5.0
[PATCH net] xgene: Don't fail probe, if there is no clk resource for SGMII interfaces
From: Thomas Bogendoerfer This change fixes following problem [1.827940] xgene-enet: probe of 1f210030.ethernet failed with error -2 which leads to a missing ethernet interface (reproducable at least on Gigabyte MP30-AR0 and APM Mustang systems). The check for a valid clk resource fails, because DT doesn't provide a clock for sgenet1. But the driver doesn't use this clk, if the ethernet port is connected via SGMII. Therefore this patch avoids probing for clk on SGMII interfaces. Fixes: 9aea7779b764 drivers: net: xgene: Fix crash on DT systems Signed-off-by: Thomas Bogendoerfer --- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index d3906f6b01bd..86058a9f3417 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -1785,16 +1785,18 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata) xgene_enet_gpiod_get(pdata); - pdata->clk = devm_clk_get(&pdev->dev, NULL); - if (IS_ERR(pdata->clk)) { - /* Abort if the clock is defined but couldn't be retrived. -* Always abort if the clock is missing on DT system as -* the driver can't cope with this case. -*/ - if (PTR_ERR(pdata->clk) != -ENOENT || dev->of_node) - return PTR_ERR(pdata->clk); - /* Firmware may have set up the clock already. */ - dev_info(dev, "clocks have been setup already\n"); + if (pdata->phy_mode != PHY_INTERFACE_MODE_SGMII) { + pdata->clk = devm_clk_get(&pdev->dev, NULL); + if (IS_ERR(pdata->clk)) { + /* Abort if the clock is defined but couldn't be +* retrived. Always abort if the clock is missing on +* DT system as the driver can't cope with this case. +*/ + if (PTR_ERR(pdata->clk) != -ENOENT || dev->of_node) + return PTR_ERR(pdata->clk); + /* Firmware may have set up the clock already. */ + dev_info(dev, "clocks have been setup already\n"); + } } if (pdata->phy_mode != PHY_INTERFACE_MODE_XGMII) -- 2.12.3
Re: [PATCH] bpf: fix return in bpf_skb_adjust_net
On 07/13/2017 08:27 AM, Kefeng Wang wrote: The bpf_skb_adjust_net() ignores the return value of bpf_skb_net_shrink/grow, and always return 0, fix it by return 'ret'. Signed-off-by: Kefeng Wang Sigh, yep of course. Thanks! Acked-by: Daniel Borkmann
Re: net-next STATUS page
Tue, Jul 11, 2017 at 08:58:41PM CEST, step...@networkplumber.org wrote: >On Tue, 11 Jul 2017 07:24:16 -0700 (PDT) >David Miller wrote: > >> It has gotten to the point that even casually walking around >> Faro, Portugal last week, random German tourists would stop >> me in the street and ask if net-next was open or not. >> >> Therefore, in order to avoid any and all confusion I have created this >> web site: >> >> http://vger.kernel.org/~davem/net-next.html >> >> Please refer to it if you are ever in doubt. >> >> Thanks! > >Where is the app? There would be good to have git-send-email hook to check this :)
[PATCH net] libceph: osdmap: Fix some NULL dereferences
There are hidden gotos in the ceph_decode_* macros. We need to set the "err" variable on these error paths otherwise we end up returning ERR_PTR(0) which is NULL. It causes NULL dereferences in the callers. Fixes: 278b1d709c6a ("libceph: ceph_decode_skip_* helpers") Signed-off-by: Dan Carpenter diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c index 864789c5974e..c7521a847ef7 100644 --- a/net/ceph/osdmap.c +++ b/net/ceph/osdmap.c @@ -510,6 +510,7 @@ static struct crush_map *crush_decode(void *pbyval, void *end) } } + err = -EINVAL; ceph_decode_skip_map(p, end, 32, string, bad); /* type_map */ ceph_decode_skip_map(p, end, 32, string, bad); /* name_map */ ceph_decode_skip_map(p, end, 32, string, bad); /* rule_name_map */ @@ -1825,9 +1826,9 @@ struct ceph_osdmap *osdmap_apply_incremental(void **p, void *end, if (struct_v >= 3) { /* new_erasure_code_profiles */ ceph_decode_skip_map_of_map(p, end, string, string, string, - bad); + e_inval); /* old_erasure_code_profiles */ - ceph_decode_skip_set(p, end, string, bad); + ceph_decode_skip_set(p, end, string, e_inval); } if (struct_v >= 4) {
Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device
Hi Jiannan, please keep in mind I have zero clue about OVS, so I cannot really comment on what is customary in that context and what not. However, some general review from the GTP point of view below: On Wed, Jul 12, 2017 at 05:44:54PM -0700, Jiannan Ouyang wrote: > Add the gtp_create_flow_based_dev() interface to create flow-based gtp > net_device, which sets gtp->collect_md. Under flow-based mode, UDP sockets are > created and maintained in kernel. > > Signed-off-by: Jiannan Ouyang > --- > drivers/net/gtp.c | 213 > +- > include/net/gtp.h | 8 ++ > 2 files changed, 217 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c > index 5a7b504..09712c9 100644 > --- a/drivers/net/gtp.c > +++ b/drivers/net/gtp.c > @@ -642,9 +642,94 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, > struct net_device *dev) > return NETDEV_TX_OK; > } > > +static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize); > +static void gtp_hashtable_free(struct gtp_dev *gtp); > +static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]); > + > +static int gtp_change_mtu(struct net_device *dev, int new_mtu, bool strict) > +{ > + int max_mtu = IP_MAX_MTU - dev->hard_header_len - sizeof(struct iphdr) > + - sizeof(struct udphdr) - sizeof(struct gtp1_header); > + > + if (new_mtu < ETH_MIN_MTU) > + return -EINVAL; > + > + if (new_mtu > max_mtu) { > + if (strict) > + return -EINVAL; > + > + new_mtu = max_mtu; > + } > + > + dev->mtu = new_mtu; > + return 0; > +} > + > +static int gtp_dev_open(struct net_device *dev) > +{ > + struct gtp_dev *gtp = netdev_priv(dev); > + struct net *net = gtp->net; > + struct socket *sock1u; > + struct socket *sock0; > + struct udp_tunnel_sock_cfg tunnel_cfg; > + struct udp_port_cfg udp_conf; > + int err; > + > + memset(&udp_conf, 0, sizeof(udp_conf)); > + > + udp_conf.family = AF_INET; > + udp_conf.local_ip.s_addr = htonl(INADDR_ANY); > + udp_conf.local_udp_port = htons(GTP1U_PORT); > + > + err = udp_sock_create(gtp->net, &udp_conf, &sock1u); > + if (err < 0) > + return err; > + > + udp_conf.local_udp_port = htons(GTP0_PORT); > + err = udp_sock_create(gtp->net, &udp_conf, &sock0); > + if (err < 0) > + return err; you're unconditionally binding to both GTP0 and GTP1 UDP ports. This is done selectively based on netlink attributes in the existing "normal" non-OVS kernel code, i.e. the control is left to the user. Is this function is only called/used in the context of OVS? If so, since you explicitly implement only GTPv1, why bind to GTPv0 port? > + setup_udp_tunnel_sock(net, sock1u, &tunnel_cfg); even here, you're only setting up the v1 and not v0. > + /* Assume largest header, ie. GTPv0. */ > + dev->needed_headroom= LL_MAX_HEADER + > + sizeof(struct iphdr) + > + sizeof(struct udphdr) + > + sizeof(struct gtp0_header); ... and here you're using headroom for a GTPv0 header, despite (I think) only supporting GTPv1 from this configuration? > + err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free?? I think that question about when to free needs to be resolved before any merge. Did you check that it persists even after the device is closed/removed? -- - Harald Weltehttp://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)