Re: [PATCH] net: thunderx: Switch to pci_alloc_irq_vectors
On Wed, Apr 5, 2017 at 11:57 AM, Christoph Hellwig wrote: > On Tue, Apr 04, 2017 at 02:59:06PM +0530, dev.srinivas...@gmail.com wrote: >> From: Thanneeru Srinivasulu >> >> Remove deprecated pci_enable_msix API in favour of its >> successor pci_alloc_irq_vectors. >> >> Signed-off-by: Thanneeru Srinivasulu >> Signed-off-by: Sunil Goutham > > Looks good. > > Are you fine with me queueing this up together with the other > pci_enable_msix() removal patches? No issues, thanks.
Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts
On Tue, 2017-04-04 at 23:02 -0700, Florian Fainelli wrote: > We don't necessarily have a phydev attached when using NC-SI, so it was > > easier to have the core code path not have to go fishing for those > > settings in different places based on whether we're using NC-SI or not. > > Oh right, I missed that part. Is there a reason why NC-SI does not have > a PHY device attached? If not, could you somehow model the link using a > fixed PHY (which appears to Linux as a normal phy_device) just to keep > things simple. Hrm ... maybe another day if you don't mind ;-) First, NC-SI isn't really a PHY; it's a cross-over RMII connection to another NIC. Now we could make it a phydev using a "fixed" PHY I suppose, that just "represents" the other end. That would be a way to do it. It would need to have the link permanently up however (see below). That said I do want to tackle making it some kind of pseudo-PHY that actually reflects the state of the remote end (especially the link state, i.e. up/down). However there are a couple of issues to tackle if we do that. Well mostly one annoying one: NC-SI needs to talk to the remote NIC via specific Ethernet frames. With the current link watch code however, if we reflect the remote link to the local NIC link via netif_carrier_on/off, we end up deactivating the device on link off and thus preventing the NC-SI stack from talking to the peer NIC at all. I thought a while ago we could add some dev flag to prevent the link watch from doing that, but never got to look into it myself and apparently neither did Gavin. So yes, those are worthwhile improvements and I can probably tackle them once I've unpiled a dozen other train wrecks from my plate ;) However I'd like to not block this series further since it's not actually making things any worse than they are. > > > - the need to reset the HW during link changes is just ... well too bad > > > > Yup but there's little choice. The HW wants it.
I don't see any real > > point in optimizing that path mind you. Losing a few packets around > > a link change isn't going to hurt and it keeps the code a lot simpler > > by having a single "re-init" path. > > I was just merely trying to say nicely: what a nicely broken piece of HW > (there were other adjectives coming to mind), and I do understand the pain. :-) At least I got a register spec (and little more) :-) It looks like those Aspeed BMCs are the only game in town for BMC chips these days and they use that "interesting" IP block from Faraday so this is probably here to stay, at least for a while. Another "interesting" attribute of that piece of c^Hhw is its handling of receive descriptors. It doesn't "count" how many are free. It has to constantly "read" the head descriptor in the RX ring to check the own bit. So you have to set up a HW timer for the chip to go "poll" on your memory. It's pretty insane. At least for TX there's an MMIO you can poke to tell it to go fetch more. There's sort-of one for RX but it doesn't seem to do what you would expect, or I did something wrong when playing with it. It's not like it would have been hard to have a counter, which is incremented by writing a value to a register so Linux can "provide" descriptors by writing the number freed in there. So the chip never really knows how many free descriptors it has, which also means it cannot do flow control based on that, only on the FIFO threshold. With a 2K-only FIFO that's interesting. Anyway, it sort-of works. Without my patches I maxed out at about 80Mbit/s iperf on a gigabit link with the AST2500 eval board (ARM11 800MHz base). With my patches I get to about 400Mbit/s. Cheers. Ben.
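To make the descriptor-ownership complaint above concrete, here is a small user-space model in C. It is a sketch under stated assumptions, not ftgmac100 code: struct rx_desc, struct rx_ring, RING_ENTRIES and the nic_*/driver_* helpers are all hypothetical. In the own-bit model the "NIC" only learns that a slot is free by reading the head descriptor on each poll tick; in the counter model the driver tells it how many slots it freed, so the free count is known without touching descriptor memory.

```c
#include <stdbool.h>

#define RING_ENTRIES 8 /* hypothetical ring size, a power of two */

/* Hypothetical descriptor: only the ownership bit is modelled. */
struct rx_desc {
	bool hw_owned; /* true: slot is free for the NIC to fill */
};

struct rx_ring {
	struct rx_desc desc[RING_ENTRIES];
	unsigned int hw_head;  /* next slot the NIC will use */
	unsigned int free_cnt; /* used only by the counter model */
};

/* Own-bit model: with no count of free slots, the NIC must read the
 * head descriptor on every poll tick to find out whether it may use it. */
static bool nic_poll_own_bit(struct rx_ring *r)
{
	struct rx_desc *d = &r->desc[r->hw_head];

	if (!d->hw_owned)
		return false;        /* nothing free as far as the NIC can tell */
	d->hw_owned = false;         /* "fill" the slot and hand it to the CPU */
	r->hw_head = (r->hw_head + 1) & (RING_ENTRIES - 1);
	return true;
}

/* Counter model: the driver writes how many slots it just freed... */
static void driver_doorbell(struct rx_ring *r, unsigned int freed)
{
	r->free_cnt += freed;
}

/* ...so the NIC knows its free count without reading descriptors. */
static bool nic_receive_counter(struct rx_ring *r)
{
	if (r->free_cnt == 0)
		return false;        /* a natural point to apply flow control */
	r->free_cnt--;
	r->hw_head = (r->hw_head + 1) & (RING_ENTRIES - 1);
	return true;
}
```

With a counter, the device could also base flow control on the number of free descriptors instead of only on its FIFO threshold.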
Re: [PATCH] net: thunderx: Switch to pci_alloc_irq_vectors
On Tue, Apr 04, 2017 at 02:59:06PM +0530, dev.srinivas...@gmail.com wrote: > From: Thanneeru Srinivasulu > > Remove deprecated pci_enable_msix API in favour of its > successor pci_alloc_irq_vectors. > > Signed-off-by: Thanneeru Srinivasulu > Signed-off-by: Sunil Goutham Looks good. Are you fine with me queueing this up together with the other pci_enable_msix() removal patches?
Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage
On Tue, 2017-04-04 at 22:25 -0700, Cong Wang wrote: > On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith wrote: > > - while (some_qdisc_is_busy(dev)) > > - yield(); > > + swait_event_timeout(swait, > > !some_qdisc_is_busy(dev), 1); > > } > > I don't see why this is an improvement even if I don't care about the > hardcoded timeout for now... Why the scheduler can make a better > decision with swait_event_timeout() than with cond_resched()? Because sleeping gets you out of the way? There is no other decision the scheduler can make while a SCHED_FIFO task is trying to yield when it is the one and only task at its priority. The scheduler is doing exactly what it is supposed to do; the problem is that people calling yield() tend to think it does something it does not do, which is why it is decorated with "if you think you want yield(), think again". Yes, yield semantics suck rocks, basically don't exist. Hop in your time machine and slap whoever you find claiming responsibility :) -Mike
Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts
On 04/04/2017 10:53 PM, Benjamin Herrenschmidt wrote: > On Tue, 2017-04-04 at 21:21 -0700, Florian Fainelli wrote: >> >> This looks pretty good to me, two minor things: >> >> - most drivers keep track of the old status/duplex/pause/link variables >> instead of the current one which is already available within struct >> phy_device, any particular reason for not doing like the other drivers? > > We don't necessarily have a phydev attached when using NC-SI, so it was > easier to have the core code path not have to go fishing for those > settings in different places based on whether we're using NC-SI or not. Oh right, I missed that part. Is there a reason why NC-SI does not have a PHY device attached? If not, could you somehow model the link using a fixed PHY (which appears to Linux as a normal phy_device) just to keep things simple. > >> - the need to reset the HW during link changes is just ... well too bad > > Yup but there's little choice. The HW wants it. I don't see any real > point in optimizing that path mind you. Losing a few packets around > a link change isn't going to hurt and it keeps the code a lot simpler > by having a single "re-init" path. I was just merely trying to say nicely: what a nicely broken piece of HW (there were other adjectives coming to mind), and I do understand the pain. > >> With that: >> >>> Reviewed-by: Florian Fainelli > > Thanks ! > > I'll post batch 2 in the next couple of days which tackles the RX path. Cool, looking forward to that! -- Florian
Re: [Patch net] net_sched: replace yield() with cond_resched()
On Tue, 2017-04-04 at 22:19 -0700, Cong Wang wrote: > On Tue, Apr 4, 2017 at 8:55 PM, Mike Galbraith wrote: > > That won't help, cond_resched() has the same impact upon a lone > > SCHED_FIFO task as yield() does.. none. > > Hmm? In the comment you quote: > > * If you want to use yield() to wait for something, use wait_event(). > * If you want to use yield() to be 'nice' for others, use cond_resched(). > > So if cond_resched() doesn't help, why this misleading comment? This is not an "oh, let's be nice, guys" thing, it's a perfect match of... * while (!event) * yield(); </copy/paste> ...a "get off the CPU until this happens" thing. With nobody to yield the CPU to, some_qdisc_is_busy() will remain true forevermore. > I picked the latter one, because the former is harder to implement > properly (at least for -net) we need qdisc's to notify this waiter once > they finish transmitting packets, which means we probably need > a per-netdevice wait struct. Yup, that's why I merely notified the net-fu masters of the lurking spinner. I met it because I sometimes run most kthreads at prio 1, some prioritized, and kworkers at prio 2 (never mind why, but they're excellent reasons). -Mike
Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts
On Tue, 2017-04-04 at 21:21 -0700, Florian Fainelli wrote: > > This looks pretty good to me, two minor things: > > - most drivers keep track of the old status/duplex/pause/link variables > instead of the current one which is already available within struct > phy_device, any particular reason for not doing like the other drivers? We don't necessarily have a phydev attached when using NC-SI, so it was easier to have the core code path not have to go fishing for those settings in different places based on whether we're using NC-SI or not. > - the need to reset the HW during link changes is just ... well too bad Yup but there's little choice. The HW wants it. I don't see any real point in optimizing that path mind you. Losing a few packets around a link change isn't going to hurt and it keeps the code a lot simpler by having a single "re-init" path. > With that: > > > Reviewed-by: Florian Fainelli Thanks ! I'll post batch 2 in the next couple of days which tackles the RX path. Cheers, Ben.
Re: [PATCH] net: phy: broadcom: Add support for the BCM54210E
On Wed, Apr 5, 2017 at 3:17 PM, Florian Fainelli wrote: > > > On 04/04/2017 10:33 PM, Joel Stanley wrote: >> This device is a single-port RGMII 10/100/1000BASE-T PHY with EEE & WOL. > > This looks good, although Rafal did beat you to it: > > 0fc9ae107669760c2a8658cb5b5876dbe525e08d ("net: phy: broadcom: add > support for BCM54210E") Even better! Thank you. Cheers, Joel
Re: [PATCH] net: phy: broadcom: Add support for the BCM54210E
On 04/04/2017 10:33 PM, Joel Stanley wrote: > This device is a single-port RGMII 10/100/1000BASE-T PHY with EEE & WOL. This looks good, although Rafal did beat you to it: 0fc9ae107669760c2a8658cb5b5876dbe525e08d ("net: phy: broadcom: add support for BCM54210E") -- Florian
[PATCH] net: phy: broadcom: Add support for the BCM54210E
This device is a single-port RGMII 10/100/1000BASE-T PHY with EEE & WOL. Signed-off-by: Joel Stanley --- drivers/net/phy/broadcom.c | 13 + include/linux/brcmphy.h| 2 ++ 2 files changed, 15 insertions(+) diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c index 9cd8b27d1292..3df826323129 100644 --- a/drivers/net/phy/broadcom.c +++ b/drivers/net/phy/broadcom.c @@ -703,6 +703,18 @@ static struct phy_driver broadcom_drivers[] = { .read_status= genphy_read_status, .ack_interrupt = brcm_fet_ack_interrupt, .config_intr= brcm_fet_config_intr, +}, { + .phy_id = PHY_ID_BCM54210E, + .phy_id_mask= 0xfff0, + .name = "Broadcom BCM54210E", + .features = PHY_GBIT_FEATURES | + SUPPORTED_Pause | SUPPORTED_Asym_Pause, + .flags = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT, + .config_init= bcm54xx_config_init, + .config_aneg= genphy_config_aneg, + .read_status= genphy_read_status, + .ack_interrupt = bcm_phy_ack_intr, + .config_intr= bcm_phy_config_intr, } }; module_phy_driver(broadcom_drivers); @@ -723,6 +735,7 @@ static struct mdio_device_id __maybe_unused broadcom_tbl[] = { { PHY_ID_BCM57780, 0xfff0 }, { PHY_ID_BCMAC131, 0xfff0 }, { PHY_ID_BCM5241, 0xfff0 }, + { PHY_ID_BCM54210E, 0xfff0}, { } }; diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h index 55e517130311..53106b9c89f1 100644 --- a/include/linux/brcmphy.h +++ b/include/linux/brcmphy.h @@ -40,6 +40,8 @@ #define PHY_ID_BCM_CYGNUS 0xae025200 +#define PHY_ID_BCM54210E 0x600d84a0 + #define PHY_BCM_OUI_MASK 0xfc00 #define PHY_BCM_OUI_1 0x00206000 #define PHY_BCM_OUI_2 0x0143bc00 -- 2.11.0
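For reference, phylib binds a driver when the PHY's ID and the driver's phy_id agree on every bit kept by phy_id_mask, which lets one table entry cover several silicon revisions. A minimal sketch of that comparison follows; the mask value 0xfffffff0 (ignore the low four revision bits of a 32-bit ID) is an assumption for illustration, and phy_id_matches() is a hypothetical helper, not the phylib function itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID value taken from the patch above. */
#define PHY_ID_BCM54210E 0x600d84a0u

/* Assumption: a 32-bit mask that ignores only the low 4 revision bits. */
#define BCM_REV_MASK 0xfffffff0u

/* Hypothetical helper: a driver entry matches when both IDs agree on
 * every bit the mask keeps. */
static bool phy_id_matches(uint32_t phy_id, uint32_t drv_id, uint32_t mask)
{
	return (phy_id & mask) == (drv_id & mask);
}
```

A revision 3 part (0x600d84a3) would still match the entry, while a different model number in bits 4 and up would not.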
Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage
On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith wrote: > - while (some_qdisc_is_busy(dev)) > - yield(); > + swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1); > } I don't see why this is an improvement even if I don't care about the hardcoded timeout for now... Why the scheduler can make a better decision with swait_event_timeout() than with cond_resched()?
Re: [Patch net] net_sched: replace yield() with cond_resched()
On Tue, Apr 4, 2017 at 8:55 PM, Mike Galbraith wrote: > On Tue, 2017-04-04 at 18:52 -0700, Cong Wang wrote: > >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c >> index 1a2f9e9..4725d2f 100644 >> --- a/net/sched/sch_generic.c >> +++ b/net/sched/sch_generic.c >> @@ -925,7 +925,7 @@ void dev_deactivate_many(struct list_head *head) >> /* Wait for outstanding qdisc_run calls. */ >> list_for_each_entry(dev, head, close_list) >> while (some_qdisc_is_busy(dev)) >> - yield(); >> + cond_resched(); >> } > > That won't help, cond_resched() has the same impact upon a lone > SCHED_FIFO task as yield() does.. none. Hmm? In the comment you quote: * If you want to use yield() to wait for something, use wait_event(). * If you want to use yield() to be 'nice' for others, use cond_resched(). So if cond_resched() doesn't help, why this misleading comment? I picked the latter one, because the former is harder to implement properly (at least for -net) we need qdisc's to notify this waiter once they finish transmitting packets, which means we probably need a per-netdevice wait struct.
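The "former" option described here, where transmitters notify a waiter instead of the waiter spinning, can be sketched in user space with a mutex and condition variable standing in for a per-device wait queue. This is an analogy under stated assumptions, not proposed kernel code: struct dev_state, busy_runs and the tx_run_*/wait_until_idle() helpers are hypothetical. The key property is that the waiter sleeps until the last in-flight run signals it, so it no longer depends on some other task getting the CPU while it spins.

```c
#include <pthread.h>

/* Hypothetical per-device state: a count of in-flight transmit runs
 * plus a wait object that the transmit side signals. */
struct dev_state {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	int busy_runs;
};

static void dev_state_init(struct dev_state *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->idle, NULL);
	d->busy_runs = 0;
}

static void tx_run_begin(struct dev_state *d)
{
	pthread_mutex_lock(&d->lock);
	d->busy_runs++;
	pthread_mutex_unlock(&d->lock);
}

/* Transmit side: the notification that a yield() loop lacks. */
static void tx_run_end(struct dev_state *d)
{
	pthread_mutex_lock(&d->lock);
	if (--d->busy_runs == 0)
		pthread_cond_broadcast(&d->idle);
	pthread_mutex_unlock(&d->lock);
}

/* Deactivation side: sleeps instead of spinning, so it behaves the
 * same even when the waiter is the highest-priority runnable task. */
static void wait_until_idle(struct dev_state *d)
{
	pthread_mutex_lock(&d->lock);
	while (d->busy_runs > 0)
		pthread_cond_wait(&d->idle, &d->lock);
	pthread_mutex_unlock(&d->lock);
}
```

The cost Cong points out is visible here too: every transmit path has to call the equivalent of tx_run_end(), which is the per-device plumbing a real fix would need.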
[PATCH] af_unix: Use designated initializers
Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, and the initializer fixes were extracted from grsecurity. In this case, NULL initialize with { } instead of undesignated NULLs. Signed-off-by: Kees Cook --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 928691c43408..6a7fe7660551 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -996,7 +996,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) unsigned int hash; struct unix_address *addr; struct hlist_head *list; - struct path path = { NULL, NULL }; + struct path path = { }; err = -EINVAL; if (sunaddr->sun_family != AF_UNIX) -- 2.7.4 -- Kees Cook Pixel Security
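Why designated initializers matter for structure randomization: a positional initializer assigns values by declaration order, so if the members are reordered the same source silently fills the wrong fields, while a designated initializer names each field. A small C sketch follows; struct two_ptrs is a hypothetical stand-in for a two-pointer struct such as struct path. The kernel's empty { } initializer is a GNU extension (standardized only in C23), so the portable sketch uses { 0 } for the all-zero case.

```c
#include <stddef.h>

/* Hypothetical stand-in for a two-pointer struct such as struct path. */
struct two_ptrs {
	void *mnt;
	void *dentry;
};

/* Positional: correct only while the declaration order stays fixed,
 * which structure-layout randomization is free to change. */
static struct two_ptrs positional(void *m, void *d)
{
	struct two_ptrs p = { m, d };
	return p;
}

/* Designated: each value is bound to a field name, so reordering the
 * members does not change the meaning. */
static struct two_ptrs designated(void *m, void *d)
{
	struct two_ptrs p = { .mnt = m, .dentry = d };
	return p;
}

/* Zero-initialize every member without naming any of them. */
static struct two_ptrs zeroed(void)
{
	struct two_ptrs p = { 0 };
	return p;
}
```

Both forms give the same result today; only the designated and all-zero forms stay correct if the field order changes.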
Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts
Hi Benjamin, On 04/04/17 at 19:28, Benjamin Herrenschmidt wrote: > This is version 2 of the first batch of updates to the > ftgmac100 driver. > > Essentially: > > - A few misc cleanups > - Fixing link speed & duplex handling (including dealing with >an Aspeed requirement to double reset the controller when >the speed changes) > - And addition of a reset task workqueue which will be used >for delaying the re-initialization of the controller > - Fixing a number of issues with how interrupts and NAPI >are dealt with. > > Subsequent batches will rework and improve the rx path, the > tx path, and add a bunch of features and fixes. > > Version 2 addresses some review comments to patches 5 and 10 > (see version history in the respective emails). > This looks pretty good to me, two minor things: - most drivers keep track of the old status/duplex/pause/link variables instead of the current one which is already available within struct phy_device, any particular reason for not doing like the other drivers? - the need to reset the HW during link changes is just ... well too bad With that: Reviewed-by: Florian Fainelli -- Florian
Re: pull-request: wireless-drivers-next 2017-04-03
David Miller writes: > From: Kalle Valo > Date: Tue, 04 Apr 2017 20:48:35 +0300 > >> David Miller writes: >> >>> From: Kalle Valo >>> Date: Mon, 03 Apr 2017 14:26:10 +0300 >>> here are a few really small fixes. I'm hoping this will be the last pull request for 4.11. Please let me know if there are any problems. >>> >>> Pulled, thanks. >>> >>> But I will warn you, you say fixes, but your Subject line and >>> GIT tag says "-next" so I pulled it into net-next. >> >> Sorry, I used the wrong pull request template and that's why I had the >> wrong subject in this pull request. So actually this was supposed to be >> for net, not net-next. Any chance you could also pull this to net so >> that we can still get the fixes to 4.11? > > Sure, done. Great, thank you. And sorry for this, I need to be more careful when sending the pull requests. -- Kalle Valo
Re: [Patch net] net_sched: replace yield() with cond_resched()
On Tue, 2017-04-04 at 18:52 -0700, Cong Wang wrote: > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index 1a2f9e9..4725d2f 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -925,7 +925,7 @@ void dev_deactivate_many(struct list_head *head) > /* Wait for outstanding qdisc_run calls. */ > list_for_each_entry(dev, head, close_list) > while (some_qdisc_is_busy(dev)) > - yield(); > + cond_resched(); > } That won't help, cond_resched() has the same impact upon a lone SCHED_FIFO task as yield() does.. none. -Mike
Re: [PATCH] ebpf: verify the output of the JIT
Hi Kees, On Tue, Apr 04, 2017 at 03:17:57PM -0700, Kees Cook wrote: > On Tue, Apr 4, 2017 at 3:08 PM, Tycho Andersen wrote: > > The goal of this patch is to protect the JIT against an attacker with a > > write-in-memory primitive. The JIT allocates a buffer which will eventually > > be marked +x, so we need to make sure that what was written to this buffer > > is what was intended. > > > > We achieve this by building a hash of the instruction buffer as > > instructions are emitted and then comparing that to a hash at the end of > > the JIT compile after the buffer has been marked read-only. > > > > Signed-off-by: Tycho Andersen > > CC: Daniel Borkmann > > CC: Alexei Starovoitov > > CC: Kees Cook > > CC: Mickaël Salaün > > Cool! This closes the race condition on producing the JIT vs going > read-only. I wonder if it might be possible to make this a more > generic interface to the BPF which would allocate the hash, provide > the update callback during emit, and then do the hash check itself at > the end of bpf_jit_binary_lock_ro()? Yes, probably so. I can look into that for the next version. Tycho
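A minimal sketch of the scheme described in the patch: fold each instruction's intended bytes into a running hash as it is emitted, then re-hash the finished image (in the patch, after it has been marked read-only) and compare. The hash choice is an assumption, since the text above does not name one; 64-bit FNV-1a is used purely to show where hashing happens and is not a cryptographic hash. struct jit_ctx and the jit_* helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical choice of hash: 64-bit FNV-1a. */
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

struct jit_ctx {
	uint8_t *image; /* buffer that will later become executable */
	size_t len;     /* bytes emitted so far */
	uint64_t hash;  /* running hash of the bytes we meant to emit */
};

static uint64_t fnv1a(uint64_t h, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

static void jit_init(struct jit_ctx *ctx, uint8_t *buf)
{
	ctx->image = buf;
	ctx->len = 0;
	ctx->hash = FNV_OFFSET;
}

/* Copy an instruction into the image (caller guarantees space) and hash
 * the source bytes, not the buffer, so a concurrent write to the buffer
 * cannot also "fix up" the expected hash. */
static void jit_emit(struct jit_ctx *ctx, const uint8_t *insn, size_t n)
{
	for (size_t i = 0; i < n; i++)
		ctx->image[ctx->len + i] = insn[i];
	ctx->hash = fnv1a(ctx->hash, insn, n);
	ctx->len += n;
}

/* Run once the image can no longer change: re-hash what is actually
 * in memory and compare with what was intended. */
static bool jit_verify(const struct jit_ctx *ctx)
{
	return fnv1a(FNV_OFFSET, ctx->image, ctx->len) == ctx->hash;
}
```

Because FNV-1a processes bytes one at a time, hashing instruction by instruction gives the same value as hashing the whole image at once, which is what makes the final comparison meaningful.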
[PATCH v2 02/13] ftgmac100: Remove "banner" comments
The divisions they represent are not particularly meaningful and things are going to be moving around with upcoming changes making these comments more a burden than anything else. Signed-off-by: Benjamin Herrenschmidt --- drivers/net/ethernet/faraday/ftgmac100.c | 42 1 file changed, 42 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index bf7b1c0..6501aa7 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -44,9 +44,6 @@ #define MAX_PKT_SIZE 1518 #define RX_BUF_SIZE PAGE_SIZE /* must be smaller than 0x3fff */ -/** - * private data - */ struct ftgmac100_descs { struct ftgmac100_rxdes rxdes[RX_QUEUE_ENTRIES]; struct ftgmac100_txdes txdes[TX_QUEUE_ENTRIES]; @@ -86,9 +83,6 @@ struct ftgmac100 { static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv, struct ftgmac100_rxdes *rxdes, gfp_t gfp); -/** - * internal functions (hardware register access) - */ static void ftgmac100_set_rx_ring_base(struct ftgmac100 *priv, dma_addr_t addr) { iowrite32(addr, priv->base + FTGMAC100_OFFSET_RXR_BADR); @@ -243,9 +237,6 @@ static void ftgmac100_stop_hw(struct ftgmac100 *priv) iowrite32(0, priv->base + FTGMAC100_OFFSET_MACCR); } -/** - * internal functions (receive descriptor) - */ static bool ftgmac100_rxdes_first_segment(struct ftgmac100_rxdes *rxdes) { return rxdes->rxdes0 & cpu_to_le32(FTGMAC100_RXDES0_FRS); @@ -370,9 +361,6 @@ static struct page *ftgmac100_rxdes_get_page(struct ftgmac100 *priv, return *ftgmac100_rxdes_page_slot(priv, rxdes); } -/** - * internal functions (receive) - */ static int ftgmac100_next_rx_pointer(int pointer) { return (pointer + 1) & (RX_QUEUE_ENTRIES - 1); @@ -557,9 +545,6 @@ static bool ftgmac100_rx_packet(struct ftgmac100 *priv, int *processed) return true; } -/** - * internal functions (transmit descriptor) - */ static void ftgmac100_txdes_reset(const struct ftgmac100 *priv, struct ftgmac100_txdes *txdes) { @@ -653,9 +638,6 @@ static 
struct sk_buff *ftgmac100_txdes_get_skb(struct ftgmac100_txdes *txdes) return (struct sk_buff *)txdes->txdes2; } -/** - * internal functions (transmit) - */ static int ftgmac100_next_tx_pointer(int pointer) { return (pointer + 1) & (TX_QUEUE_ENTRIES - 1); @@ -771,9 +753,6 @@ static int ftgmac100_xmit(struct ftgmac100 *priv, struct sk_buff *skb, return NETDEV_TX_OK; } -/** - * internal functions (buffer) - */ static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv, struct ftgmac100_rxdes *rxdes, gfp_t gfp) { @@ -865,9 +844,6 @@ static int ftgmac100_alloc_buffers(struct ftgmac100 *priv) return -ENOMEM; } -/** - * internal functions (mdio) - */ static void ftgmac100_adjust_link(struct net_device *netdev) { struct ftgmac100 *priv = netdev_priv(netdev); @@ -917,9 +893,6 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv) return 0; } -/** - * struct mii_bus functions - */ static int ftgmac100_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum) { struct net_device *netdev = bus->priv; @@ -991,9 +964,6 @@ static int ftgmac100_mdiobus_write(struct mii_bus *bus, int phy_addr, return -EIO; } -/** - * struct ethtool_ops functions - ***
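The ftgmac100_next_rx_pointer() and ftgmac100_next_tx_pointer() helpers visible in the diff context wrap the ring index with a mask rather than a modulo, which equals a modulo only when the queue size is a power of two. A sketch that turns that requirement into a compile-time check; the sizes 256 and 512 are placeholders, not necessarily the driver's values, and the helper names are simplified.

```c
/* Placeholder sizes; the driver's actual values are not shown above. */
#define RX_QUEUE_ENTRIES 256
#define TX_QUEUE_ENTRIES 512

/* (x + 1) & (N - 1) equals (x + 1) % N only when N is a power of two,
 * so refuse to build with any other size. */
_Static_assert((RX_QUEUE_ENTRIES & (RX_QUEUE_ENTRIES - 1)) == 0,
	       "RX ring size must be a power of two");
_Static_assert((TX_QUEUE_ENTRIES & (TX_QUEUE_ENTRIES - 1)) == 0,
	       "TX ring size must be a power of two");

static int next_rx_pointer(int pointer)
{
	return (pointer + 1) & (RX_QUEUE_ENTRIES - 1);
}

static int next_tx_pointer(int pointer)
{
	return (pointer + 1) & (TX_QUEUE_ENTRIES - 1);
}
```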
Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage
On Tue, 2017-04-04 at 15:39 -0700, Cong Wang wrote: > Thanks for the report! Looks like a quick solution here is to replace > this yield() with cond_resched(), it is harder to really wait for > all qdisc's to transmit all packets. No, cond_resched() won't help. What I did is below, but I suspect net wizards will do something better. --- net/sched/sch_generic.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -901,6 +902,7 @@ static bool some_qdisc_is_busy(struct ne */ void dev_deactivate_many(struct list_head *head) { + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(swait); struct net_device *dev; bool sync_needed = false; @@ -924,8 +926,7 @@ void dev_deactivate_many(struct list_hea /* Wait for outstanding qdisc_run calls. */ list_for_each_entry(dev, head, close_list) - while (some_qdisc_is_busy(dev)) - yield(); + swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1); } void dev_deactivate(struct net_device *dev)
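The patch above replaces the yield() loop with swait_event_timeout() and a one-jiffy timeout, so the deactivating task sleeps and rechecks rather than spinning. Below is a user-space analogue of that sleep-and-recheck idea, assuming POSIX nanosleep() in place of the kernel wait and a caller-supplied busy() predicate; wait_not_busy() and countdown_busy() are hypothetical. As in the patch, nothing wakes the sleeper early (the on-stack swait queue is never signalled), so each recheck happens only after the sleep expires.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <time.h>

/* Check busy(arg) up to max_tries times, sleeping sleep_ns (< 1e9)
 * between checks instead of yielding, so lower-priority work can get
 * the CPU. Returns true if the condition cleared in time. */
static bool wait_not_busy(bool (*busy)(void *), void *arg,
			  long sleep_ns, int max_tries)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };

	for (int i = 0; i < max_tries; i++) {
		if (!busy(arg))
			return true;
		nanosleep(&ts, NULL);
	}
	return !busy(arg);
}

/* Hypothetical predicate for the usage example: reports "busy" for as
 * many checks as the counter it points at holds. */
static bool countdown_busy(void *arg)
{
	int *left = arg;

	if (*left > 0) {
		(*left)--;
		return true;
	}
	return false;
}
```

The bounded form mirrors the hardcoded timeout Cong questions: it never spins, but if the condition does not clear it simply gives up after the last check, whereas a notifier-based wait would not need a timeout at all.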
Re: net/ipv4: use-after-free in ipv4_mtu
On Tue, 2017-04-04 at 18:11 -0700, Cong Wang wrote: > On Tue, Apr 4, 2017 at 11:51 AM, Eric Dumazet wrote: > > On Tue, Apr 4, 2017 at 7:50 AM, Andrey Konovalov > > wrote: > >> > >> Hi, > >> > >> I've got the following error report while fuzzing the kernel with > >> syzkaller. > >> > >> On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5). > >> > >> Unfortunately it's not reproducible. > >> > >> == > >> BUG: KASAN: use-after-free in dst_metric_raw include/net/dst.h:176 > >> [inline] at addr 88003d6a965c > >> BUG: KASAN: use-after-free in ipv4_mtu+0x3f2/0x4b0 > >> net/ipv4/route.c:1270 at addr 88003d6a965c > >> Read of size 4 by task syz-executor3/20611 > >> CPU: 3 PID: 20611 Comm: syz-executor3 Not tainted 4.11.0-rc5+ #199 > >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs > >> 01/01/2011 > >> Call Trace: > >> __dump_stack lib/dump_stack.c:16 [inline] > >> dump_stack+0x292/0x398 lib/dump_stack.c:52 > >> kasan_object_err+0x1c/0x70 mm/kasan/report.c:164 > >> print_address_description mm/kasan/report.c:202 [inline] > >> kasan_report_error mm/kasan/report.c:291 [inline] > >> kasan_report+0x252/0x510 mm/kasan/report.c:347 > >> __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367 > >> dst_metric_raw include/net/dst.h:176 [inline] > >> ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270 > >> dst_mtu include/net/dst.h:221 [inline] > >> do_ip_getsockopt+0x71d/0x2290 net/ipv4/ip_sockglue.c:1433 > >> ip_getsockopt+0x90/0x230 net/ipv4/ip_sockglue.c:1578 > >> tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3131 > >> sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2709 > >> SYSC_getsockopt net/socket.c:1829 [inline] > >> SyS_getsockopt+0x252/0x390 net/socket.c:1811 > >> entry_SYSCALL_64_fastpath+0x1f/0xc2 > >> RIP: 0033:0x4458d9 > >> RSP: 002b:7fe87f452b58 EFLAGS: 0286 ORIG_RAX: 0037 > >> RAX: ffda RBX: 0005 RCX: 004458d9 > >> RDX: 000e RSI: RDI: 0005 > >> RBP: 006e0020 R08: 20db6000 R09: > >> R10: 207e8000 R11: 0286 R12: 00708150 > >> R13: 20db8000 R14: 1000 
R15: 0003 > >> Object at 88003d6a9658, in cache kmalloc-64 size: 64 > >> Allocated: > >> PID = 20110 > >> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 > >> save_stack+0x43/0xd0 mm/kasan/kasan.c:513 > >> set_track mm/kasan/kasan.c:525 [inline] > >> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616 > >> kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745 > >> kmalloc include/linux/slab.h:490 [inline] > >> kzalloc include/linux/slab.h:663 [inline] > >> fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040 > >> fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221 > >> ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597 > >> inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882 > >> sctp: [Deprecated]: syz-executor0 (pid 20638) Use of int in max_burst > >> socket option. > >> Use struct sctp_assoc_value instead > >> sock_do_ioctl+0x65/0xb0 net/socket.c:906 > >> sock_ioctl+0x28f/0x440 net/socket.c:1004 > >> vfs_ioctl fs/ioctl.c:45 [inline] > >> do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685 > >> SYSC_ioctl fs/ioctl.c:700 [inline] > >> SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 > >> entry_SYSCALL_64_fastpath+0x1f/0xc2 > >> Freed: > >> PID = 4439 > >> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 > >> save_stack+0x43/0xd0 mm/kasan/kasan.c:513 > >> set_track mm/kasan/kasan.c:525 [inline] > >> kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589 > >> slab_free_hook mm/slub.c:1357 [inline] > >> slab_free_freelist_hook mm/slub.c:1379 [inline] > >> slab_free mm/slub.c:2961 [inline] > >> kfree+0xe8/0x2b0 mm/slub.c:3882 > >> free_fib_info_rcu+0x4ba/0x5e0 net/ipv4/fib_semantics.c:218 > >> __rcu_reclaim kernel/rcu/rcu.h:118 [inline] > >> rcu_do_batch.isra.64+0x947/0xcc0 kernel/rcu/tree.c:2879 > >> invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline] > >> __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline] > >> rcu_process_callbacks+0x2cc/0xb90 kernel/rcu/tree.c:3126 > >> __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 > >> Memory state around the buggy address: > >> 88003d6a9500: 
fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > >> 88003d6a9580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > >> >88003d6a9600: fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb > >> ^ > >> 88003d6a9680: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc > >> 88003d6a9700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > >> == > > > > Thanks for the report Andrey > > > > Looking at fib->fib_metrics, I fail to understand how the following can > > work : > > > > dst_init_metrics(&rt->dst, fi->fib_metrics, true); > > > > In the cases fi->
[PATCH v2 09/13] ftgmac100: Move the bulk of inits to a separate function
The link monitoring and error handling code will have to redo the ring inits and HW setup so move the code out of ftgmac100_open() into a dedicated function. This forces a bit of re-ordering of ftgmac100_open() but nothing dramatic. Signed-off-by: Benjamin Herrenschmidt --- drivers/net/ethernet/faraday/ftgmac100.c | 71 +++- 1 file changed, 42 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index fdb8638..36f2905 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -1109,10 +1109,35 @@ static int ftgmac100_poll(struct napi_struct *napi, int budget) return rx; } +static int ftgmac100_init_all(struct ftgmac100 *priv, bool ignore_alloc_err) +{ + int err = 0; + + /* Re-init descriptors (adjust queue sizes) */ + ftgmac100_init_rings(priv); + + /* Realloc rx descriptors */ + err = ftgmac100_alloc_rx_buffers(priv); + if (err && !ignore_alloc_err) + return err; + + /* Reinit and restart HW */ + ftgmac100_init_hw(priv); + ftgmac100_start_hw(priv); + + /* Re-enable the device */ + napi_enable(&priv->napi); + netif_start_queue(priv->netdev); + + /* Enable all interrupts */ + iowrite32(priv->int_mask_all, priv->base + FTGMAC100_OFFSET_IER); + + return err; +} + static int ftgmac100_open(struct net_device *netdev) { struct ftgmac100 *priv = netdev_priv(netdev); - unsigned int status; int err; /* Allocate ring buffers */ @@ -1122,13 +1147,6 @@ static int ftgmac100_open(struct net_device *netdev) return err; } - /* Initialize the rings */ - ftgmac100_init_rings(priv); - - /* Allocate receive buffers */ - if (ftgmac100_alloc_rx_buffers(priv)) - goto err_alloc; - /* When using NC-SI we force the speed to 100Mbit/s full duplex, * * Otherwise we leave it set to 0 (no link), the link @@ -1162,26 +1180,21 @@ static int ftgmac100_open(struct net_device *netdev) goto err_irq; } - ftgmac100_init_hw(priv); - ftgmac100_start_hw(priv); - - /* Clear stale 
interrupts */ - status = ioread32(priv->base + FTGMAC100_OFFSET_ISR); - iowrite32(status, priv->base + FTGMAC100_OFFSET_ISR); + /* Start things up */ + err = ftgmac100_init_all(priv, false); + if (err) { + netdev_err(netdev, "Failed to allocate packet buffers\n"); + goto err_alloc; + } - if (netdev->phydev) + if (netdev->phydev) { + /* If we have a PHY, start polling */ phy_start(netdev->phydev); - else if (priv->use_ncsi) + } else if (priv->use_ncsi) { + /* If using NC-SI, set our carrier on and start the stack */ netif_carrier_on(netdev); - napi_enable(&priv->napi); - netif_start_queue(netdev); - - /* enable all interrupts */ - iowrite32(priv->int_mask_all, priv->base + FTGMAC100_OFFSET_IER); - - /* Start the NCSI device */ - if (priv->use_ncsi) { + /* Start the NCSI device */ err = ncsi_start_dev(priv->ndev); if (err) goto err_ncsi; @@ -1189,16 +1202,16 @@ static int ftgmac100_open(struct net_device *netdev) return 0; -err_ncsi: + err_ncsi: napi_disable(&priv->napi); netif_stop_queue(netdev); + err_alloc: + ftgmac100_free_buffers(priv); free_irq(netdev->irq, netdev); -err_irq: + err_irq: netif_napi_del(&priv->napi); -err_hw: -err_alloc: + err_hw: iowrite32(0, priv->base + FTGMAC100_OFFSET_IER); - ftgmac100_free_buffers(priv); ftgmac100_free_rings(priv); return err; } -- 2.9.3
[PATCH v2 08/13] ftgmac100: Request the interrupt only after HW is reset
The interrupt isn't shared, so this will keep it masked until we have the HW in a known sane state. Signed-off-by: Benjamin Herrenschmidt --- drivers/net/ethernet/faraday/ftgmac100.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index bb444d2..fdb8638 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -1129,12 +1129,6 @@ static int ftgmac100_open(struct net_device *netdev) if (ftgmac100_alloc_rx_buffers(priv)) goto err_alloc; - err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev); - if (err) { - netdev_err(netdev, "failed to request irq %d\n", netdev->irq); - goto err_irq; - } - /* When using NC-SI we force the speed to 100Mbit/s full duplex, * * Otherwise we leave it set to 0 (no link), the link @@ -1161,6 +1155,13 @@ static int ftgmac100_open(struct net_device *netdev) /* Initialize NAPI */ netif_napi_add(netdev, &priv->napi, ftgmac100_poll, 64); + /* Grab our interrupt */ + err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev); + if (err) { + netdev_err(netdev, "failed to request irq %d\n", netdev->irq); + goto err_irq; + } + ftgmac100_init_hw(priv); ftgmac100_start_hw(priv); @@ -1191,12 +1192,12 @@ static int ftgmac100_open(struct net_device *netdev) err_ncsi: napi_disable(&priv->napi); netif_stop_queue(netdev); - netif_napi_del(&priv->napi); - iowrite32(0, priv->base + FTGMAC100_OFFSET_IER); -err_hw: free_irq(netdev->irq, netdev); err_irq: + netif_napi_del(&priv->napi); +err_hw: err_alloc: + iowrite32(0, priv->base + FTGMAC100_OFFSET_IER); ftgmac100_free_buffers(priv); ftgmac100_free_rings(priv); return err; -- 2.9.3
[PATCH v2 03/13] ftgmac100: Reorder struct fields and comment
Reorder the fields in struct ftgmac100 in slightly more logical groups. Will make more sense as I add/remove some. No code change.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/net/ethernet/faraday/ftgmac100.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 6501aa7..02e0534 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -50,34 +50,39 @@ struct ftgmac100_descs {
 };
 
 struct ftgmac100 {
+	/* Registers */
 	struct resource *res;
 	void __iomem *base;
 
 	struct ftgmac100_descs *descs;
 	dma_addr_t descs_dma_addr;
 
+	/* Rx ring */
 	struct page *rx_pages[RX_QUEUE_ENTRIES];
-
 	unsigned int rx_pointer;
+	u32 rxdes0_edorr_mask;
+
+	/* Tx ring */
 	unsigned int tx_clean_pointer;
 	unsigned int tx_pointer;
 	unsigned int tx_pending;
-
+	u32 txdes0_edotr_mask;
 	spinlock_t tx_lock;
 
+	/* Component structures */
 	struct net_device *netdev;
 	struct device *dev;
 	struct ncsi_dev *ndev;
 	struct napi_struct napi;
-
 	struct mii_bus *mii_bus;
+
+	/* Link management */
 	int old_speed;
-	int int_mask_all;
 	bool use_ncsi;
-	bool enabled;
 
-	u32 rxdes0_edorr_mask;
-	u32 txdes0_edotr_mask;
+	/* Misc */
+	int int_mask_all;
+	bool enabled;
 };
 
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
-- 
2.9.3
[PATCH v2 12/13] ftgmac100: Remove useless tests in interrupt handler
The interrupt is neither enabled nor registered when the interface isn't running (regardless of whether we use NC-SI or not), so the test isn't useful.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/net/ethernet/faraday/ftgmac100.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 3adfb92..4fa138b 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1047,14 +1047,9 @@ static irqreturn_t ftgmac100_interrupt(int irq, void *dev_id)
 	struct net_device *netdev = dev_id;
 	struct ftgmac100 *priv = netdev_priv(netdev);
 
-	/* When running in NCSI mode, the interface should be ready for
-	 * receiving or transmitting NCSI packets before it's opened.
-	 */
-	if (likely(priv->use_ncsi || netif_running(netdev))) {
-		/* Disable interrupts for polling */
-		iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
-		napi_schedule(&priv->napi);
-	}
+	/* Disable interrupts for polling */
+	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
+	napi_schedule(&priv->napi);
 
 	return IRQ_HANDLED;
 }
-- 
2.9.3
[PATCH v2 10/13] ftgmac100: Add a reset task and use it for link changes
Link speed changes require a full HW reset. This isn't done properly at the moment. It will involve delays and thus isn't suitable to do from the link poll callback. So let's create a reset_task that we can queue up when the link changes. It will be useful for various cases of error handling as well. Signed-off-by: Benjamin Herrenschmidt -- v2. Fix lock ordering Add mdio_bus mutex --- drivers/net/ethernet/faraday/ftgmac100.c | 87 +++- 1 file changed, 74 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index 36f2905..61f02bf 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -74,6 +74,7 @@ struct ftgmac100 { struct device *dev; struct ncsi_dev *ndev; struct napi_struct napi; + struct work_struct reset_task; struct mii_bus *mii_bus; /* Link management */ @@ -872,7 +873,6 @@ static void ftgmac100_adjust_link(struct net_device *netdev) struct ftgmac100 *priv = netdev_priv(netdev); struct phy_device *phydev = netdev->phydev; int new_speed; - int ier; /* We store "no link" as speed 0 */ if (!phydev->link) @@ -897,20 +897,11 @@ static void ftgmac100_adjust_link(struct net_device *netdev) if (!new_speed) return; - ier = ioread32(priv->base + FTGMAC100_OFFSET_IER); - - /* disable all interrupts */ + /* Disable all interrupts */ iowrite32(0, priv->base + FTGMAC100_OFFSET_IER); - netif_stop_queue(netdev); - ftgmac100_stop_hw(priv); - - netif_start_queue(netdev); - ftgmac100_init_hw(priv); - ftgmac100_start_hw(priv); - - /* re-enable interrupts */ - iowrite32(ier, priv->base + FTGMAC100_OFFSET_IER); + /* Reset the adapter asynchronously */ + schedule_work(&priv->reset_task); } static int ftgmac100_mii_probe(struct ftgmac100 *priv) @@ -1135,6 +1126,61 @@ static int ftgmac100_init_all(struct ftgmac100 *priv, bool ignore_alloc_err) return err; } +static void ftgmac100_reset_task(struct work_struct *work) +{ + struct ftgmac100 *priv = 
container_of(work, struct ftgmac100, + reset_task); + struct net_device *netdev = priv->netdev; + int err; + + netdev_dbg(netdev, "Resetting NIC...\n"); + + /* Lock the world */ + rtnl_lock(); + if (netdev->phydev) + mutex_lock(&netdev->phydev->lock); + if (priv->mii_bus) + mutex_lock(&priv->mii_bus->mdio_lock); + + + /* Check if the interface is still up */ + if (!netif_running(netdev)) + goto bail; + + /* Stop the network stack */ + netif_trans_update(netdev); + napi_disable(&priv->napi); + netif_tx_disable(netdev); + + /* Stop and reset the MAC */ + ftgmac100_stop_hw(priv); + err = ftgmac100_reset_hw(priv); + if (err) { + /* Not much we can do ... it might come back... */ + netdev_err(netdev, "attempting to continue...\n"); + } + + /* Free all rx and tx buffers */ + ftgmac100_free_buffers(priv); + + /* The ring pointers have been reset in HW, reflect this here */ + priv->rx_pointer = 0; + priv->tx_clean_pointer = 0; + priv->tx_pointer = 0; + priv->tx_pending = 0; + + /* Setup everything again and restart chip */ + ftgmac100_init_all(priv, true); + + netdev_dbg(netdev, "Reset done !\n"); + bail: + if (priv->mii_bus) + mutex_unlock(&priv->mii_bus->mdio_lock); + if (netdev->phydev) + mutex_unlock(&netdev->phydev->lock); + rtnl_unlock(); +} + static int ftgmac100_open(struct net_device *netdev) { struct ftgmac100 *priv = netdev_priv(netdev); @@ -1220,6 +1266,14 @@ static int ftgmac100_stop(struct net_device *netdev) { struct ftgmac100 *priv = netdev_priv(netdev); + /* Note about the reset task: We are called with the rtnl lock +* held, so we are synchronized against the core of the reset +* task. We must not try to synchronously cancel it otherwise +* we can deadlock. But since it will test for netif_running() +* which has already been cleared by the net core, we don't +* anything special to do. 
+*/ + /* disable all interrupts */ iowrite32(0, priv->base + FTGMAC100_OFFSET_IER); @@ -1395,6 +1449,7 @@ static int ftgmac100_probe(struct platform_device *pdev) priv = netdev_priv(netdev); priv->netdev = netdev; priv->dev = &pdev->dev; + INIT_WORK(&priv->reset_task, ftgmac100_reset_task); spin_lock_init(&priv->tx_lock); @@ -1498,6 +1553,12 @@ static int ftgmac100_remove(struct platform_device *pdev) priv = netdev_priv(netdev); unregiste
[PATCH v2 07/13] ftgmac100: Move napi_add/del to open/close
Rather than probe/remove.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/net/ethernet/faraday/ftgmac100.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 0d0576f..bb444d2 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1158,6 +1158,9 @@ static int ftgmac100_open(struct net_device *netdev)
 	if (err)
 		goto err_hw;
 
+	/* Initialize NAPI */
+	netif_napi_add(netdev, &priv->napi, ftgmac100_poll, 64);
+
 	ftgmac100_init_hw(priv);
 	ftgmac100_start_hw(priv);
 
@@ -1188,6 +1191,7 @@ static int ftgmac100_open(struct net_device *netdev)
 err_ncsi:
 	napi_disable(&priv->napi);
 	netif_stop_queue(netdev);
+	netif_napi_del(&priv->napi);
 	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 err_hw:
 	free_irq(netdev->irq, netdev);
@@ -1207,6 +1211,7 @@ static int ftgmac100_stop(struct net_device *netdev)
 
 	netif_stop_queue(netdev);
 	napi_disable(&priv->napi);
+	netif_napi_del(&priv->napi);
 	if (netdev->phydev)
 		phy_stop(netdev->phydev);
 	else if (priv->use_ncsi)
@@ -1379,9 +1384,6 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
 	spin_lock_init(&priv->tx_lock);
 
-	/* initialize NAPI */
-	netif_napi_add(netdev, &priv->napi, ftgmac100_poll, 64);
-
 	/* map io memory */
 	priv->res = request_mem_region(res->start, resource_size(res),
 				       dev_name(&pdev->dev));
-- 
2.9.3
[PATCH v2 11/13] ftgmac100: Rework MAC reset and init
The HW requires a full MAC reset when changing the speed. Additionally the Aspeed documentation spells out that the MAC needs to be reset twice with a 10us interval. We thus move the speed setting and top level reset code into a new ftgmac100_reset_and_config_mac() function which handles both. Move the ring pointers initialization there too in order to reflect the HW change. Also reduce the timeout for the MAC reset as it shouldn't take more than 300 clock cycles according to the doc. Signed-off-by: Benjamin Herrenschmidt --- drivers/net/ethernet/faraday/ftgmac100.c | 98 +++- 1 file changed, 59 insertions(+), 39 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index 61f02bf..3adfb92 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -112,27 +112,64 @@ static void ftgmac100_txdma_normal_prio_start_polling(struct ftgmac100 *priv) iowrite32(1, priv->base + FTGMAC100_OFFSET_NPTXPD); } -static int ftgmac100_reset_hw(struct ftgmac100 *priv) +static int ftgmac100_reset_mac(struct ftgmac100 *priv, u32 maccr) { struct net_device *netdev = priv->netdev; int i; /* NOTE: reset clears all registers */ - iowrite32(FTGMAC100_MACCR_SW_RST, priv->base + FTGMAC100_OFFSET_MACCR); - for (i = 0; i < 5; i++) { + iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR); + iowrite32(maccr | FTGMAC100_MACCR_SW_RST, + priv->base + FTGMAC100_OFFSET_MACCR); + for (i = 0; i < 50; i++) { unsigned int maccr; maccr = ioread32(priv->base + FTGMAC100_OFFSET_MACCR); if (!(maccr & FTGMAC100_MACCR_SW_RST)) return 0; - udelay(1000); + udelay(1); } - netdev_err(netdev, "software reset failed\n"); + netdev_err(netdev, "Hardware reset failed\n"); return -EIO; } +static int ftgmac100_reset_and_config_mac(struct ftgmac100 *priv) +{ + u32 maccr = 0; + + switch (priv->cur_speed) { + case SPEED_10: + case 0: /* no link */ + break; + + case SPEED_100: + maccr |= FTGMAC100_MACCR_FAST_MODE; + break; + + 
case SPEED_1000: + maccr |= FTGMAC100_MACCR_GIGA_MODE; + break; + default: + netdev_err(priv->netdev, "Unknown speed %d !\n", + priv->cur_speed); + break; + } + + /* (Re)initialize the queue pointers */ + priv->rx_pointer = 0; + priv->tx_clean_pointer = 0; + priv->tx_pointer = 0; + priv->tx_pending = 0; + + /* The doc says reset twice with 10us interval */ + if (ftgmac100_reset_mac(priv, maccr)) + return -EIO; + usleep_range(10, 1000); + return ftgmac100_reset_mac(priv, maccr); +} + static void ftgmac100_set_mac(struct ftgmac100 *priv, const unsigned char *mac) { unsigned int maddr = mac[0] << 8 | mac[1]; @@ -208,35 +245,28 @@ static void ftgmac100_init_hw(struct ftgmac100 *priv) ftgmac100_set_mac(priv, priv->netdev->dev_addr); } -#define MACCR_ENABLE_ALL (FTGMAC100_MACCR_TXDMA_EN | \ -FTGMAC100_MACCR_RXDMA_EN | \ -FTGMAC100_MACCR_TXMAC_EN | \ -FTGMAC100_MACCR_RXMAC_EN | \ -FTGMAC100_MACCR_CRC_APD| \ -FTGMAC100_MACCR_RX_RUNT| \ -FTGMAC100_MACCR_RX_BROADPKT) - static void ftgmac100_start_hw(struct ftgmac100 *priv) { - int maccr = MACCR_ENABLE_ALL; + u32 maccr = ioread32(priv->base + FTGMAC100_OFFSET_MACCR); - switch (priv->cur_speed) { - default: - case 10: - break; + /* Keep the original GMAC and FAST bits */ + maccr &= (FTGMAC100_MACCR_FAST_MODE | FTGMAC100_MACCR_GIGA_MODE); - case 100: - maccr |= FTGMAC100_MACCR_FAST_MODE; - break; - - case 1000: - maccr |= FTGMAC100_MACCR_GIGA_MODE; - break; - } + /* Add all the main enable bits */ + maccr |= FTGMAC100_MACCR_TXDMA_EN | +FTGMAC100_MACCR_RXDMA_EN | +FTGMAC100_MACCR_TXMAC_EN | +FTGMAC100_MACCR_RXMAC_EN | +FTGMAC100_MACCR_CRC_APD| +FTGMAC100_MACCR_PHY_LINK_LEVEL | +FTGMAC100_MACCR_RX_RUNT| +FTGMAC100_MACCR_RX_BROADPKT; + /* Add other bits as needed */ if (priv->cur_duplex == DUPLEX_FULL) maccr |= FTGMAC100_MACCR_FULLDUP; + /* Hit the HW */ iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR); } @@ -1154,7 +118
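A userspace sketch of the bounded reset poll described above may help. It is a model under stated assumptions, not the driver code: MACCR is a plain variable whose reset bit clears on its own after a configurable number of reads, the mock_* names and bit values are hypothetical placeholders, and the udelay(1) between polls and the ~10us pause between the two resets are only noted in comments because there is no hardware to wait for.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; the real definitions
 * live in drivers/net/ethernet/faraday/ftgmac100.h. */
#define MOCK_MACCR_SW_RST    (1u << 31)
#define MOCK_MACCR_FAST_MODE (1u << 19)
#define MOCK_MACCR_GIGA_MODE (1u << 9)
#define MOCK_RESET_POLLS     50 /* same loop bound as the patch (50 x 1us) */

/* Fake MAC: the SW_RST bit clears itself after reset_latency reads. */
struct mock_mac {
	uint32_t maccr;
	int reset_latency;     /* reads that still see SW_RST set */
	int reads_until_clear;
	int reset_count;       /* number of resets issued */
};

static void mock_write_maccr(struct mock_mac *m, uint32_t val)
{
	m->maccr = val;
	if (val & MOCK_MACCR_SW_RST) {
		m->reset_count++;
		m->reads_until_clear = m->reset_latency;
	}
}

static uint32_t mock_read_maccr(struct mock_mac *m)
{
	if (m->maccr & MOCK_MACCR_SW_RST) {
		if (m->reads_until_clear > 0)
			m->reads_until_clear--;
		else
			m->maccr &= ~MOCK_MACCR_SW_RST; /* "hardware" is done */
	}
	return m->maccr;
}

/* Write the mode bits, then the same bits plus SW_RST, and poll a
 * bounded number of times for SW_RST to clear. */
static int mock_reset_mac(struct mock_mac *m, uint32_t maccr)
{
	int i;

	mock_write_maccr(m, maccr);
	mock_write_maccr(m, maccr | MOCK_MACCR_SW_RST);
	for (i = 0; i < MOCK_RESET_POLLS; i++) {
		if (!(mock_read_maccr(m) & MOCK_MACCR_SW_RST))
			return 0;
		/* the driver does udelay(1) here */
	}
	return -EIO;
}

/* Pick mode bits from the speed (0 = no link), then reset twice. */
static int mock_reset_and_config_mac(struct mock_mac *m, int speed)
{
	uint32_t maccr = 0;

	switch (speed) {
	case 100:
		maccr |= MOCK_MACCR_FAST_MODE;
		break;
	case 1000:
		maccr |= MOCK_MACCR_GIGA_MODE;
		break;
	default: /* 10 Mbit/s, no link, or unknown: no mode bits */
		break;
	}

	if (mock_reset_mac(m, maccr))
		return -EIO;
	/* the driver sleeps ~10us here between the two resets */
	return mock_reset_mac(m, maccr);
}
```

For example, a mock MAC with reset_latency = 3 configured for 100 Mbit/s succeeds after two resets and ends up with only the FAST_MODE bit set, while reset_latency = 1000 exhausts the poll budget on the first reset and returns -EIO.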
[PATCH v2 13/13] ftgmac100: Rework NAPI & interrupts handling
First, don't look at the interrupt status in the poll loop to decide what to poll. It's wrong. If we have run out of budget, we may still have RX packets to unqueue but no more RX interrupt pending. So instead move the code looking at the interrupt status into the interrupt handler where it belongs. That avoids a slow MMIO read in the NAPI fast path. We keep the abnormal interrupts enabled while NAPI is scheduled. While at it, actually do something useful in the "error" cases: On AHB bus error, trigger the new reset task, that's about all we can do. On RX packet fifo or descriptor overflows, we need to restart the MAC after having freed things up. So set a flag that NAPI will see and use to perform that restart after harvesting the RX ring. Finally, we shouldn't complete NAPI if there are still outgoing packets that will need harvesting. Waiting for more interrupts is less efficient than letting NAPI run a while longer while the queue drains. Signed-off-by: Benjamin Herrenschmidt --- drivers/net/ethernet/faraday/ftgmac100.c | 137 +-- drivers/net/ethernet/faraday/ftgmac100.h | 14 2 files changed, 90 insertions(+), 61 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index 4fa138b..88dab5f 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -83,7 +83,7 @@ struct ftgmac100 { bool use_ncsi; /* Misc */ - int int_mask_all; + bool need_mac_restart; }; static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv, @@ -1046,10 +1046,49 @@ static irqreturn_t ftgmac100_interrupt(int irq, void *dev_id) { struct net_device *netdev = dev_id; struct ftgmac100 *priv = netdev_priv(netdev); + unsigned int status, new_mask = FTGMAC100_INT_BAD; - /* Disable interrupts for polling */ - iowrite32(0, priv->base + FTGMAC100_OFFSET_IER); - napi_schedule(&priv->napi); + /* Fetch and clear interrupt bits, process abnormal ones */ + status = ioread32(priv->base + FTGMAC100_OFFSET_ISR); 
+ iowrite32(status, priv->base + FTGMAC100_OFFSET_ISR); + if (unlikely(status & FTGMAC100_INT_BAD)) { + + /* RX buffer unavailable */ + if (status & FTGMAC100_INT_NO_RXBUF) + netdev->stats.rx_over_errors++; + + /* received packet lost due to RX FIFO full */ + if (status & FTGMAC100_INT_RPKT_LOST) + netdev->stats.rx_fifo_errors++; + + /* sent packet lost due to excessive TX collision */ + if (status & FTGMAC100_INT_XPKT_LOST) + netdev->stats.tx_fifo_errors++; + + /* AHB error -> Reset the chip */ + if (status & FTGMAC100_INT_AHB_ERR) { + if (net_ratelimit()) + netdev_warn(netdev, + "AHB bus error ! Resetting chip.\n"); + iowrite32(0, priv->base + FTGMAC100_OFFSET_IER); + schedule_work(&priv->reset_task); + return IRQ_HANDLED; + } + + /* We may need to restart the MAC after such errors, delay +* this until after we have freed some Rx buffers though +*/ + priv->need_mac_restart = true; + + /* Disable those errors until we restart */ + new_mask &= ~status; + } + + /* Only enable "bad" interrupts while NAPI is on */ + iowrite32(new_mask, priv->base + FTGMAC100_OFFSET_IER); + + /* Schedule NAPI bh */ + napi_schedule_irqoff(&priv->napi); return IRQ_HANDLED; } @@ -1057,68 +1096,51 @@ static irqreturn_t ftgmac100_interrupt(int irq, void *dev_id) static int ftgmac100_poll(struct napi_struct *napi, int budget) { struct ftgmac100 *priv = container_of(napi, struct ftgmac100, napi); - struct net_device *netdev = priv->netdev; - unsigned int status; - bool completed = true; + bool more, completed = true; int rx = 0; - status = ioread32(priv->base + FTGMAC100_OFFSET_ISR); - iowrite32(status, priv->base + FTGMAC100_OFFSET_ISR); - - if (status & (FTGMAC100_INT_RPKT_BUF | FTGMAC100_INT_NO_RXBUF)) { - /* -* FTGMAC100_INT_RPKT_BUF: -* RX DMA has received packets into RX buffer successfully -* -* FTGMAC100_INT_NO_RXBUF: -* RX buffer unavailable -*/ - bool retry; + ftgmac100_tx_complete(priv); - do { - retry = ftgmac100_rx_packet(priv, &rx); - } while (retry && rx < budget); + do { + 
more = ftgmac100_rx_packet(priv, &rx); + } while (more && rx < budget); - if (re
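The poll policy described in this patch (reclaim TX, harvest RX up to the budget, perform any deferred MAC restart, and keep polling while TX completions are outstanding) can be modelled in userspace. This is a simplified sketch, not the patch's code: the mock_* names are hypothetical, the rings are just counters, "unmasking interrupts" is a flag, and the re-check the real driver would need after clearing latched interrupts is omitted. Returning the full budget means "poll me again", per the usual NAPI convention.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state: the rings are reduced to counters. */
struct mock_napi_nic {
	int rx_queued;         /* received packets waiting in the RX ring */
	int tx_in_flight;      /* sent packets not yet reclaimed */
	bool need_mac_restart; /* set by the IRQ handler on RX overflow */
	int mac_restarts;
	bool irq_enabled;      /* normal RX/TX interrupts unmasked */
};

/* Reclaim up to 4 TX completions per call (arbitrary illustrative limit). */
static void mock_tx_complete(struct mock_napi_nic *n)
{
	n->tx_in_flight -= n->tx_in_flight > 4 ? 4 : n->tx_in_flight;
}

/* Harvest one RX packet; returns false once the ring is empty. */
static bool mock_rx_packet(struct mock_napi_nic *n, int *processed)
{
	if (n->rx_queued == 0)
		return false;
	n->rx_queued--;
	(*processed)++;
	return true;
}

static int mock_poll(struct mock_napi_nic *n, int budget)
{
	int rx = 0;
	bool more;

	/* No interrupt status read here: just do the work */
	mock_tx_complete(n);

	do {
		more = mock_rx_packet(n, &rx);
	} while (more && rx < budget);

	/* Deferred restart, now that RX buffers have been freed up */
	if (n->need_mac_restart) {
		n->need_mac_restart = false;
		n->mac_restarts++;
	}

	/* Out of budget, or TX still draining: stay scheduled */
	if ((more && rx == budget) || n->tx_in_flight > 0)
		return budget;

	/* Done: complete NAPI and unmask the normal interrupts */
	n->irq_enabled = true;
	return rx;
}
```

With 10 queued RX packets and a budget of 4, three polls return 4, 4 and 2, and only the last one unmasks interrupts; with 6 TX packets in flight and nothing to receive, the first poll still returns the full budget because TX has not fully drained.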
[PATCH v2 06/13] ftgmac100: Split ring alloc, init and rx buffer alloc
Currently, a single function is used to allocate the rings themselves, initialize them, populate the rx ring, and allocate the rx buffers. The same happens on free. This splits them into separate functions. This will be useful when properly implementing re-initialization on link changes and error handling when the rings will be repopulated but not freed. Signed-off-by: Benjamin Herrenschmidt --- drivers/net/ethernet/faraday/ftgmac100.c | 68 ++-- 1 file changed, 47 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index cc6e971..0d0576f 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -792,6 +792,7 @@ static void ftgmac100_free_buffers(struct ftgmac100 *priv) { int i; + /* Free all RX buffers */ for (i = 0; i < RX_QUEUE_ENTRIES; i++) { struct ftgmac100_rxdes *rxdes = &priv->descs->rxdes[i]; struct page *page = ftgmac100_rxdes_get_page(priv, rxdes); @@ -804,6 +805,7 @@ static void ftgmac100_free_buffers(struct ftgmac100 *priv) __free_page(page); } + /* Free all TX buffers */ for (i = 0; i < TX_QUEUE_ENTRIES; i++) { struct ftgmac100_txdes *txdes = &priv->descs->txdes[i]; struct sk_buff *skb = ftgmac100_txdes_get_skb(txdes); @@ -815,40 +817,54 @@ static void ftgmac100_free_buffers(struct ftgmac100 *priv) dma_unmap_single(priv->dev, map, skb_headlen(skb), DMA_TO_DEVICE); kfree_skb(skb); } - - dma_free_coherent(priv->dev, sizeof(struct ftgmac100_descs), - priv->descs, priv->descs_dma_addr); } -static int ftgmac100_alloc_buffers(struct ftgmac100 *priv) +static void ftgmac100_free_rings(struct ftgmac100 *priv) { - int i; + /* Free descriptors */ + if (priv->descs) + dma_free_coherent(priv->dev, sizeof(struct ftgmac100_descs), + priv->descs, priv->descs_dma_addr); +} +static int ftgmac100_alloc_rings(struct ftgmac100 *priv) +{ + /* Allocate descriptors */ priv->descs = dma_zalloc_coherent(priv->dev, sizeof(struct ftgmac100_descs), 
&priv->descs_dma_addr, GFP_KERNEL); if (!priv->descs) return -ENOMEM; - /* initialize RX ring */ - ftgmac100_rxdes_set_end_of_ring(priv, - &priv->descs->rxdes[RX_QUEUE_ENTRIES - 1]); + return 0; +} + +static void ftgmac100_init_rings(struct ftgmac100 *priv) +{ + int i; + + /* Initialize RX ring */ + for (i = 0; i < RX_QUEUE_ENTRIES; i++) + priv->descs->rxdes[i].rxdes0 = 0; + ftgmac100_rxdes_set_end_of_ring(priv, &priv->descs->rxdes[i - 1]); + + /* Initialize TX ring */ + for (i = 0; i < TX_QUEUE_ENTRIES; i++) + priv->descs->txdes[i].txdes0 = 0; + ftgmac100_txdes_set_end_of_ring(priv, &priv->descs->txdes[i -1]); +} + +static int ftgmac100_alloc_rx_buffers(struct ftgmac100 *priv) +{ + int i; for (i = 0; i < RX_QUEUE_ENTRIES; i++) { struct ftgmac100_rxdes *rxdes = &priv->descs->rxdes[i]; if (ftgmac100_alloc_rx_page(priv, rxdes, GFP_KERNEL)) - goto err; + return -ENOMEM; } - - /* initialize TX ring */ - ftgmac100_txdes_set_end_of_ring(priv, - &priv->descs->txdes[TX_QUEUE_ENTRIES - 1]); return 0; - -err: - ftgmac100_free_buffers(priv); - return -ENOMEM; } static void ftgmac100_adjust_link(struct net_device *netdev) @@ -1099,12 +1115,20 @@ static int ftgmac100_open(struct net_device *netdev) unsigned int status; int err; - err = ftgmac100_alloc_buffers(priv); + /* Allocate ring buffers */ + err = ftgmac100_alloc_rings(priv); if (err) { - netdev_err(netdev, "failed to allocate buffers\n"); - goto err_alloc; + netdev_err(netdev, "Failed to allocate descriptors\n"); + return err; } + /* Initialize the rings */ + ftgmac100_init_rings(priv); + + /* Allocate receive buffers */ + if (ftgmac100_alloc_rx_buffers(priv)) + goto err_alloc; + err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev); if (err) { netdev_err(netdev, "failed to request irq %d\n", netdev->irq); @@ -1168,8 +1192,9 @@ static int ftgmac100_open(struct net_device *netdev) err_hw: free_irq(netdev->irq, netdev); err_irq: - ftgmac100_free_buffers(priv); err_alloc: + 
ftgmac100_free_buffers(priv); + ftgmac100_free_rings(priv); return err; } @@ -1190,6
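The point of the split is that descriptor memory is allocated once per open, while re-initialization (clearing every descriptor and re-marking the last one as end-of-ring) can be repeated after a reset without reallocating. A minimal userspace sketch of that separation, using calloc() in place of dma_zalloc_coherent(); MOCK_RING_ENTRIES, the mock_* names and the end-of-ring bit value are hypothetical placeholders:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_RING_ENTRIES 8          /* stands in for RX_QUEUE_ENTRIES */
#define MOCK_EDORR        (1u << 30) /* placeholder end-of-ring bit */

struct mock_rxdes {
	uint32_t rxdes0;
};

struct mock_ring {
	struct mock_rxdes *descs;
};

/* Allocation only: done once, when the interface is opened. */
static int mock_alloc_ring(struct mock_ring *r)
{
	r->descs = calloc(MOCK_RING_ENTRIES, sizeof(*r->descs));
	return r->descs ? 0 : -1;
}

/* Initialization only: safe to repeat (e.g. after a reset) without
 * reallocating. Clears every descriptor, then marks the last one. */
static void mock_init_ring(struct mock_ring *r)
{
	int i;

	for (i = 0; i < MOCK_RING_ENTRIES; i++)
		r->descs[i].rxdes0 = 0;
	r->descs[i - 1].rxdes0 |= MOCK_EDORR;
}

/* Release only: free(NULL) is harmless, much like the descs check in
 * ftgmac100_free_rings(). */
static void mock_free_ring(struct mock_ring *r)
{
	free(r->descs);
	r->descs = NULL;
}
```

Calling mock_init_ring() twice on the same allocation leaves the same pointer, clears any stale descriptor contents, and keeps exactly one end-of-ring marker on the last entry.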
[PATCH v2 05/13] ftgmac100: Cleanup speed/duplex tracking and fix duplex config
Keep track of both the current speed and duplex settings instead of only speed and properly apply the duplex setting to the HW. This reworks the adjust_link() function to also avoid trying to reconfigure the HW when there is no link and to display the link state to the user. Signed-off-by: Benjamin Herrenschmidt -- v2. Use phy_print_status() Only bail out on link down *after* updating priv->cur_speed --- drivers/net/ethernet/faraday/ftgmac100.c | 52 +++- 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index cc2271b..cc6e971 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -77,7 +77,8 @@ struct ftgmac100 { struct mii_bus *mii_bus; /* Link management */ - int old_speed; + int cur_speed; + int cur_duplex; bool use_ncsi; /* Misc */ @@ -210,16 +211,15 @@ static void ftgmac100_init_hw(struct ftgmac100 *priv) FTGMAC100_MACCR_RXDMA_EN | \ FTGMAC100_MACCR_TXMAC_EN | \ FTGMAC100_MACCR_RXMAC_EN | \ -FTGMAC100_MACCR_FULLDUP| \ FTGMAC100_MACCR_CRC_APD| \ FTGMAC100_MACCR_RX_RUNT| \ FTGMAC100_MACCR_RX_BROADPKT) -static void ftgmac100_start_hw(struct ftgmac100 *priv, int speed) +static void ftgmac100_start_hw(struct ftgmac100 *priv) { int maccr = MACCR_ENABLE_ALL; - switch (speed) { + switch (priv->cur_speed) { default: case 10: break; @@ -233,6 +233,9 @@ static void ftgmac100_start_hw(struct ftgmac100 *priv, int speed) break; } + if (priv->cur_duplex == DUPLEX_FULL) + maccr |= FTGMAC100_MACCR_FULLDUP; + iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR); } @@ -852,12 +855,31 @@ static void ftgmac100_adjust_link(struct net_device *netdev) { struct ftgmac100 *priv = netdev_priv(netdev); struct phy_device *phydev = netdev->phydev; + int new_speed; int ier; - if (phydev->speed == priv->old_speed) + /* We store "no link" as speed 0 */ + if (!phydev->link) + new_speed = 0; + else + new_speed = phydev->speed; + + if (phydev->speed == 
priv->cur_speed && + phydev->duplex == priv->cur_duplex) return; - priv->old_speed = phydev->speed; + /* Print status if we have a link or we had one and just lost it, +* don't print otherwise. +*/ + if (new_speed || priv->cur_speed) + phy_print_status(phydev); + + priv->cur_speed = new_speed; + priv->cur_duplex = phydev->duplex; + + /* Link is down, do nothing else */ + if (!new_speed) + return; ier = ioread32(priv->base + FTGMAC100_OFFSET_IER); @@ -869,7 +891,7 @@ static void ftgmac100_adjust_link(struct net_device *netdev) netif_start_queue(netdev); ftgmac100_init_hw(priv); - ftgmac100_start_hw(priv, phydev->speed); + ftgmac100_start_hw(priv); /* re-enable interrupts */ iowrite32(ier, priv->base + FTGMAC100_OFFSET_IER); @@ -1089,6 +,20 @@ static int ftgmac100_open(struct net_device *netdev) goto err_irq; } + /* When using NC-SI we force the speed to 100Mbit/s full duplex, +* +* Otherwise we leave it set to 0 (no link), the link +* message from the PHY layer will handle setting it up to +* something else if needed. +*/ + if (priv->use_ncsi) { + priv->cur_duplex = DUPLEX_FULL; + priv->cur_speed = SPEED_100; + } else { + priv->cur_duplex = 0; + priv->cur_speed = 0; + } + priv->rx_pointer = 0; priv->tx_clean_pointer = 0; priv->tx_pointer = 0; @@ -1099,7 +1135,7 @@ static int ftgmac100_open(struct net_device *netdev) goto err_hw; ftgmac100_init_hw(priv); - ftgmac100_start_hw(priv, priv->use_ncsi ? 100 : 10); + ftgmac100_start_hw(priv); /* Clear stale interrupts */ status = ioread32(priv->base + FTGMAC100_OFFSET_ISR); -- 2.9.3
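The link-change bookkeeping in this patch reduces to a small decision function. The sketch below is a userspace model under stated assumptions: struct mock_phy stands in for struct phy_device, and the two counters stand in for phy_print_status() and the hardware reconfiguration. It compares the normalized new_speed with cur_speed; the patch itself compares phydev->speed.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the fields of struct phy_device used here. */
struct mock_phy {
	bool link;
	int speed;  /* Mbit/s */
	int duplex; /* placeholder: 0 = half, 1 = full */
};

/* Stand-in for the link-tracking part of struct ftgmac100. */
struct mock_link_priv {
	int cur_speed;     /* 0 means "no link" */
	int cur_duplex;
	int status_prints; /* counts calls that would print the link state */
	int hw_restarts;   /* counts calls that would reconfigure the MAC */
};

static void mock_adjust_link(struct mock_link_priv *p,
			     const struct mock_phy *phy)
{
	/* Store "no link" as speed 0 */
	int new_speed = phy->link ? phy->speed : 0;

	/* Nothing changed: leave the hardware alone */
	if (new_speed == p->cur_speed && phy->duplex == p->cur_duplex)
		return;

	/* Report only if we have a link, or just lost one */
	if (new_speed || p->cur_speed)
		p->status_prints++;

	p->cur_speed = new_speed;
	p->cur_duplex = phy->duplex;

	/* Link is down: nothing else to do */
	if (!new_speed)
		return;

	p->hw_restarts++;
}
```

Starting with no link, a down event prints nothing, a link-up at 1000 Mbit/s prints once and reconfigures once, a repeated identical event does nothing, and a subsequent link loss prints again without touching the hardware.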
[PATCH v2 01/13] ftgmac100: Use netdev->irq instead of private copy
There's already a placeholder for the irq in struct net_device, so use it.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/net/ethernet/faraday/ftgmac100.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 928b0df..bf7b1c0 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -55,7 +55,6 @@ struct ftgmac100_descs {
 struct ftgmac100 {
 	struct resource *res;
 	void __iomem *base;
-	int irq;
 
 	struct ftgmac100_descs *descs;
 	dma_addr_t descs_dma_addr;
@@ -1119,9 +1118,9 @@ static int ftgmac100_open(struct net_device *netdev)
 		goto err_alloc;
 	}
 
-	err = request_irq(priv->irq, ftgmac100_interrupt, 0, netdev->name, netdev);
+	err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev);
 	if (err) {
-		netdev_err(netdev, "failed to request irq %d\n", priv->irq);
+		netdev_err(netdev, "failed to request irq %d\n", netdev->irq);
 		goto err_irq;
 	}
 
@@ -1168,7 +1167,7 @@ static int ftgmac100_open(struct net_device *netdev)
 	netif_stop_queue(netdev);
 	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 err_hw:
-	free_irq(priv->irq, netdev);
+	free_irq(netdev->irq, netdev);
 err_irq:
 	ftgmac100_free_buffers(priv);
 err_alloc:
@@ -1194,7 +1193,7 @@ static int ftgmac100_stop(struct net_device *netdev)
 		ncsi_stop_dev(priv->ndev);
 
 	ftgmac100_stop_hw(priv);
-	free_irq(priv->irq, netdev);
+	free_irq(netdev->irq, netdev);
 	ftgmac100_free_buffers(priv);
 
 	return 0;
@@ -1381,7 +1380,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
 		goto err_ioremap;
 	}
 
-	priv->irq = irq;
+	netdev->irq = irq;
 
 	/* MAC address from chip or random one */
 	ftgmac100_setup_mac(priv);
@@ -1438,7 +1437,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
 		goto err_register_netdev;
 	}
 
-	netdev_info(netdev, "irq %d, mapped at %p\n", priv->irq, priv->base);
+	netdev_info(netdev, "irq %d, mapped at %p\n", netdev->irq, priv->base);
 
 	return 0;
 
-- 
2.9.3
[PATCH v2 04/13] ftgmac100: Remove "enabled" flags
It's not used in any meaningful way.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/net/ethernet/faraday/ftgmac100.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 02e0534..cc2271b 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -82,7 +82,6 @@ struct ftgmac100 {
 
 	/* Misc */
 	int int_mask_all;
-	bool enabled;
 };
 
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
@@ -1124,8 +1123,6 @@ static int ftgmac100_open(struct net_device *netdev)
 		goto err_ncsi;
 	}
 
-	priv->enabled = true;
-
 	return 0;
 
 err_ncsi:
@@ -1144,11 +1141,7 @@ static int ftgmac100_stop(struct net_device *netdev)
 {
 	struct ftgmac100 *priv = netdev_priv(netdev);
 
-	if (!priv->enabled)
-		return 0;
-
 	/* disable all interrupts */
-	priv->enabled = false;
 	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 
 	netif_stop_queue(netdev);
-- 
2.9.3
[PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts
This is version 2 of the first batch of updates to the ftgmac100 driver. Essentially:

 - A few misc cleanups
 - Fixing link speed & duplex handling (including dealing with an Aspeed requirement to reset the controller twice when the speed changes)
 - Adding a reset task, run from a workqueue, which will be used to defer re-initialization of the controller
 - Fixing a number of issues with how interrupts and NAPI are dealt with

Subsequent batches will rework and improve the rx path and the tx path, and add a bunch of features and fixes.

Version 2 addresses some review comments on patches 5 and 10 (see the version history in the respective emails).
[PATCH net-next] liquidio: fix Octeon core watchdog timeout false alarm
Detection of Octeon core watchdog timeouts is flawed and susceptible to false alarms. Refactor by removing the detection code and, in its place, leveraging existing code that monitors for an indication from the NIC firmware that an Octeon core crashed; expand the meaning of that indication to "an Octeon core crashed or its watchdog timer expired". Detection of watchdog timeouts is now delegated to an exception handler in the NIC firmware, which is free of false alarms.

Also, if there's an Octeon core crash or watchdog timeout:

(1) Disable VF Ethernet links.

(2) Decrement the module refcount by an amount equal to the number of active VFs of the NIC whose Octeon core crashed or had a watchdog timeout. The refcount will continue to reflect the active VFs of other liquidio NIC(s) (if present) whose Octeon cores are faultless.

Item (2) is needed to avoid the case of not being able to unload the driver because the module refcount is stuck at some non-zero number. There is code that, in normal cases, decrements the refcount upon receiving a message from the firmware that a VF driver was unloaded. But in exceptional cases like an Octeon core crash or watchdog timeout, arrival of that particular message from the firmware might be unreliable. That normal-case code is changed to not touch the refcount in the exceptional case, to avoid contention over the refcount with the liquidio_watchdog kernel thread, which will carry out item (2).
Signed-off-by: Felix Manlunas Signed-off-by: Derek Chickles --- drivers/net/ethernet/cavium/liquidio/lio_main.c| 178 - .../net/ethernet/cavium/liquidio/octeon_device.h | 2 + .../net/ethernet/cavium/liquidio/octeon_network.h | 4 - 3 files changed, 107 insertions(+), 77 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index a8426d3..fa673a1 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -173,6 +173,8 @@ static int liquidio_stop(struct net_device *netdev); static void liquidio_remove(struct pci_dev *pdev); static int liquidio_probe(struct pci_dev *pdev, const struct pci_device_id *ent); +static int liquidio_set_vf_link_state(struct net_device *netdev, int vfidx, + int linkstate); static struct handshake handshake[MAX_OCTEON_DEVICES]; static struct completion first_stage; @@ -1199,97 +1201,122 @@ static int octeon_setup_interrupt(struct octeon_device *oct) return 0; } +static struct octeon_device *get_other_octeon_device(struct octeon_device *oct) +{ + struct octeon_device *other_oct; + + other_oct = lio_get_device(oct->octeon_id + 1); + + if (other_oct && other_oct->pci_dev) { + int oct_busnum, other_oct_busnum; + + oct_busnum = oct->pci_dev->bus->number; + other_oct_busnum = other_oct->pci_dev->bus->number; + + if (oct_busnum == other_oct_busnum) { + int oct_slot, other_oct_slot; + + oct_slot = PCI_SLOT(oct->pci_dev->devfn); + other_oct_slot = PCI_SLOT(other_oct->pci_dev->devfn); + + if (oct_slot == other_oct_slot) + return other_oct; + } + } + + return NULL; +} + +static void disable_all_vf_links(struct octeon_device *oct) +{ + struct net_device *netdev; + int max_vfs, vf, i; + + if (!oct) + return; + + max_vfs = oct->sriov_info.max_vfs; + + for (i = 0; i < oct->ifcount; i++) { + netdev = oct->props[i].netdev; + if (!netdev) + continue; + + for (vf = 0; vf < max_vfs; vf++) + liquidio_set_vf_link_state(netdev, vf, + 
IFLA_VF_LINK_STATE_DISABLE); + } +} + static int liquidio_watchdog(void *param) { - u64 wdog; - u16 mask_of_stuck_cores = 0; - u16 mask_of_crashed_cores = 0; - int core_num; - u8 core_is_stuck[LIO_MAX_CORES]; - u8 core_crashed[LIO_MAX_CORES]; + bool err_msg_was_printed[LIO_MAX_CORES]; + u16 mask_of_crashed_or_stuck_cores = 0; + bool all_vf_links_are_disabled = false; struct octeon_device *oct = param; + struct octeon_device *other_oct; +#ifdef CONFIG_MODULE_UNLOAD + long refcount, vfs_referencing_pf; + u64 vfs_mask1, vfs_mask2; +#endif + int core; - memset(core_is_stuck, 0, sizeof(core_is_stuck)); - memset(core_crashed, 0, sizeof(core_crashed)); + memset(err_msg_was_printed, 0, sizeof(err_msg_was_printed)); while (!kthread_should_stop()) { - mask_of_crashed_cores = + /* sleep for a couple of seconds so that we don't hog the CPU */ + set_current_stat
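The refcount hand-off in item (2) can be summarized with a small model: the normal unload path stops touching the count once a crash is flagged, and the watchdog drops all of the crashed NIC's references exactly once. This is a hypothetical userspace sketch, not the liquidio code; a plain counter stands in for the module refcount, and the mock_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the module refcount shared by all liquidio NICs. */
static long mock_module_refcount;

struct mock_lio_nic {
	int active_vfs;     /* VFs currently holding a module reference */
	bool crashed;       /* FW reported a core crash or watchdog expiry */
	bool refs_released; /* watchdog already dropped this NIC's refs */
};

/* Normal path: a VF driver was loaded. */
static void mock_vf_loaded(struct mock_lio_nic *n)
{
	n->active_vfs++;
	mock_module_refcount++;
}

/* Normal path: FW reports that a VF driver was unloaded. Once the NIC
 * has crashed, the watchdog owns the refcount, so do nothing. */
static void mock_vf_unloaded(struct mock_lio_nic *n)
{
	if (n->crashed || n->active_vfs == 0)
		return;
	n->active_vfs--;
	mock_module_refcount--;
}

/* Watchdog path: drop every reference held by a crashed NIC's VFs,
 * exactly once, leaving other NICs' references untouched. */
static void mock_watchdog_check(struct mock_lio_nic *n)
{
	if (!n->crashed || n->refs_released)
		return;
	mock_module_refcount -= n->active_vfs;
	n->refs_released = true;
}
```

With two VFs on a crashed NIC and one on a healthy NIC, the watchdog brings the count from 3 to 1; a late "VF unloaded" message for the crashed NIC is then ignored, so the count cannot be decremented twice.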
[Patch net] net_sched: replace yield() with cond_resched()
yield() should be rendered dead, according to Mike. It is hard to wait properly for all qdiscs to transmit all their packets, so just keep the original logic. Reported-by: Mike Galbraith Signed-off-by: Cong Wang --- net/sched/sch_generic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 1a2f9e9..4725d2f 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -925,7 +925,7 @@ void dev_deactivate_many(struct list_head *head) /* Wait for outstanding qdisc_run calls. */ list_for_each_entry(dev, head, close_list) while (some_qdisc_is_busy(dev)) - yield(); + cond_resched(); } void dev_deactivate(struct net_device *dev) -- 2.5.5
[Patch net] net_sched: check noop_qdisc before qdisc_hash_add()
Dmitry reported a crash when injecting faults in attach_one_default_qdisc(): dev->qdisc is still noop_qdisc at that point, and the check before qdisc_hash_add() fails to catch it because it only tests for NULL. We should test against noop_qdisc instead, since it is the default qdisc at this point. Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable") Reported-by: Dmitry Vyukov Cc: Jiri Kosina Signed-off-by: Cong Wang --- net/sched/sch_generic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index b052b27..1a2f9e9 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -794,7 +794,7 @@ static void attach_default_qdiscs(struct net_device *dev) } } #ifdef CONFIG_NET_SCHED - if (dev->qdisc) + if (dev->qdisc != &noop_qdisc) qdisc_hash_add(dev->qdisc); #endif } -- 2.5.5
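The NULL test misses this case because noop_qdisc is a real, statically allocated object used as the default, so a pointer to it is never NULL. Below is a small userspace model of the two checks. Every type and function here (`struct qdisc`, `struct netdev`, `qdisc_hash_add()`, `attach_buggy()`, `attach_fixed()`) is a hypothetical stand-in, not the kernel code itself.

```c
/* Hypothetical model: a qdisc object plus a statically allocated
 * placeholder playing the role of noop_qdisc. */
struct qdisc {
	const char *id;
};

static struct qdisc noop_qdisc = { "noop" };

struct netdev {
	struct qdisc *qdisc; /* never NULL: starts out as &noop_qdisc */
};

static int hashed; /* counts qdiscs "added to the hash table" */

static void qdisc_hash_add(struct qdisc *q)
{
	(void)q;
	hashed++;
}

/* Old check: the placeholder is a real object, so it passes a NULL test
 * and gets hashed by mistake. */
static void attach_buggy(struct netdev *dev)
{
	if (dev->qdisc)
		qdisc_hash_add(dev->qdisc);
}

/* New check: compare against the sentinel object itself. */
static void attach_fixed(struct netdev *dev)
{
	if (dev->qdisc != &noop_qdisc)
		qdisc_hash_add(dev->qdisc);
}
```

The general point is that when a "no object" state is represented by a sentinel, the guard has to test for that sentinel, not for NULL.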
[PATCH net-next] net: usbnet: Remove unused driver_name variable
With GCC 6.3, we can get the following warning: drivers/net/usb/usbnet.c:85:19: warning: 'driver_name' defined but not used [-Wunused-const-variable=] static const char driver_name [] = "usbnet"; ^~~ Signed-off-by: Florian Fainelli --- drivers/net/usb/usbnet.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 9890656af735..1cc945cbeaa3 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -82,8 +82,6 @@ // randomly generated ethernet address static u8 node_id [ETH_ALEN]; -static const char driver_name [] = "usbnet"; - /* use ethtool to change the level for any given device */ static int msg_level = -1; module_param (msg_level, int, 0); -- 2.9.3
Re: net/ipv4: use-after-free in ipv4_mtu
On Tue, Apr 4, 2017 at 11:51 AM, Eric Dumazet wrote: > On Tue, Apr 4, 2017 at 7:50 AM, Andrey Konovalov > wrote: >> >> Hi, >> >> I've got the following error report while fuzzing the kernel with syzkaller. >> >> On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5). >> >> Unfortunately it's not reproducible. >> >> == >> BUG: KASAN: use-after-free in dst_metric_raw include/net/dst.h:176 >> [inline] at addr 88003d6a965c >> BUG: KASAN: use-after-free in ipv4_mtu+0x3f2/0x4b0 >> net/ipv4/route.c:1270 at addr 88003d6a965c >> Read of size 4 by task syz-executor3/20611 >> CPU: 3 PID: 20611 Comm: syz-executor3 Not tainted 4.11.0-rc5+ #199 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> Call Trace: >> __dump_stack lib/dump_stack.c:16 [inline] >> dump_stack+0x292/0x398 lib/dump_stack.c:52 >> kasan_object_err+0x1c/0x70 mm/kasan/report.c:164 >> print_address_description mm/kasan/report.c:202 [inline] >> kasan_report_error mm/kasan/report.c:291 [inline] >> kasan_report+0x252/0x510 mm/kasan/report.c:347 >> __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367 >> dst_metric_raw include/net/dst.h:176 [inline] >> ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270 >> dst_mtu include/net/dst.h:221 [inline] >> do_ip_getsockopt+0x71d/0x2290 net/ipv4/ip_sockglue.c:1433 >> ip_getsockopt+0x90/0x230 net/ipv4/ip_sockglue.c:1578 >> tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3131 >> sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2709 >> SYSC_getsockopt net/socket.c:1829 [inline] >> SyS_getsockopt+0x252/0x390 net/socket.c:1811 >> entry_SYSCALL_64_fastpath+0x1f/0xc2 >> RIP: 0033:0x4458d9 >> RSP: 002b:7fe87f452b58 EFLAGS: 0286 ORIG_RAX: 0037 >> RAX: ffda RBX: 0005 RCX: 004458d9 >> RDX: 000e RSI: RDI: 0005 >> RBP: 006e0020 R08: 20db6000 R09: >> R10: 207e8000 R11: 0286 R12: 00708150 >> R13: 20db8000 R14: 1000 R15: 0003 >> Object at 88003d6a9658, in cache kmalloc-64 size: 64 >> Allocated: >> PID = 20110 >> save_stack_trace+0x16/0x20 
arch/x86/kernel/stacktrace.c:59 >> save_stack+0x43/0xd0 mm/kasan/kasan.c:513 >> set_track mm/kasan/kasan.c:525 [inline] >> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616 >> kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745 >> kmalloc include/linux/slab.h:490 [inline] >> kzalloc include/linux/slab.h:663 [inline] >> fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040 >> fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221 >> ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597 >> inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882 >> sctp: [Deprecated]: syz-executor0 (pid 20638) Use of int in max_burst >> socket option. >> Use struct sctp_assoc_value instead >> sock_do_ioctl+0x65/0xb0 net/socket.c:906 >> sock_ioctl+0x28f/0x440 net/socket.c:1004 >> vfs_ioctl fs/ioctl.c:45 [inline] >> do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685 >> SYSC_ioctl fs/ioctl.c:700 [inline] >> SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 >> entry_SYSCALL_64_fastpath+0x1f/0xc2 >> Freed: >> PID = 4439 >> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 >> save_stack+0x43/0xd0 mm/kasan/kasan.c:513 >> set_track mm/kasan/kasan.c:525 [inline] >> kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589 >> slab_free_hook mm/slub.c:1357 [inline] >> slab_free_freelist_hook mm/slub.c:1379 [inline] >> slab_free mm/slub.c:2961 [inline] >> kfree+0xe8/0x2b0 mm/slub.c:3882 >> free_fib_info_rcu+0x4ba/0x5e0 net/ipv4/fib_semantics.c:218 >> __rcu_reclaim kernel/rcu/rcu.h:118 [inline] >> rcu_do_batch.isra.64+0x947/0xcc0 kernel/rcu/tree.c:2879 >> invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline] >> __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline] >> rcu_process_callbacks+0x2cc/0xb90 kernel/rcu/tree.c:3126 >> __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 >> Memory state around the buggy address: >> 88003d6a9500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> 88003d6a9580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> >88003d6a9600: fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb >> ^ >> 88003d6a9680: fb fb fb fc 
fc fc fc fc fc fc fc fc fc fc fc fc >> 88003d6a9700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> == > > Thanks for the report Andrey > > Looking at fib->fib_metrics, I fail to understand how the following can work : > > dst_init_metrics(&rt->dst, fi->fib_metrics, true); > > In the cases fi->fib_metrics is _not_ dst_default_metrics, > fi->fib_metrics can be freed when the fib is deleted, > while dst(s) have still the 'read only pointer'. > > RCU grace period before fi->fib_metrics freeing does not help. > > Without refcounts, it looks like we need to copy the
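Eric's concern is that each dst keeps a read-only pointer into fi->fib_metrics that can outlive the fib_info. His message is cut off after "Without refcounts, it looks like we need to copy the". Below is a minimal userspace sketch of the refcount option only. The names (`struct metrics`, `metrics_alloc()`, `metrics_get()`, `metrics_put()`, `NUM_METRICS`) are hypothetical, and this is not the fix the kernel eventually adopted.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_METRICS 16 /* hypothetical size for the sketch */

/* Hypothetical refcounted metrics block shared by a fib entry and the
 * dst entries that point at it. */
struct metrics {
	atomic_int refcnt;
	uint32_t vals[NUM_METRICS];
};

static struct metrics *metrics_alloc(void)
{
	struct metrics *m = calloc(1, sizeof(*m));

	if (m)
		atomic_init(&m->refcnt, 1); /* reference held by the creator */
	return m;
}

/* Each user (e.g. a dst) takes its own reference before storing the pointer. */
static struct metrics *metrics_get(struct metrics *m)
{
	atomic_fetch_add(&m->refcnt, 1);
	return m;
}

/* Drops one reference; frees the block and returns 1 on the last put. */
static int metrics_put(struct metrics *m)
{
	if (atomic_fetch_sub(&m->refcnt, 1) == 1) {
		free(m);
		return 1;
	}
	return 0;
}
```

Deleting the fib entry then only drops its own reference. The memory stays valid until the last dst releases it, which is the lifetime guarantee an RCU grace period alone does not give here.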
[PATCH net-next 14/14] nfp: add support for .set_link_ksettings()
Support setting link speed and autonegotiation through the set_link_ksettings() ethtool op. If the port is reconfigured in an incompatible way and a reboot is required, the netdev will get unregistered and will not come back until the user reboots the system. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 46 ++ 1 file changed, 46 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index 963d6dd97cec..3328041ec290 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -237,6 +237,51 @@ nfp_net_get_link_ksettings(struct net_device *netdev, return 0; } +static int +nfp_net_set_link_ksettings(struct net_device *netdev, + const struct ethtool_link_ksettings *cmd) +{ + struct nfp_net *nn = netdev_priv(netdev); + struct nfp_nsp *nsp; + int err; + + if (!nn->eth_port) + return -EOPNOTSUPP; + + if (netif_running(netdev)) { + nn_warn(nn, "Changing settings not allowed on an active interface. It may cause the port to be disabled until reboot.\n"); + return -EBUSY; + } + + nsp = nfp_eth_config_start(nn->cpp, nn->eth_port->index); + if (IS_ERR(nsp)) + return PTR_ERR(nsp); + + err = __nfp_eth_set_aneg(nsp, cmd->base.autoneg == AUTONEG_ENABLE ?
+NFP_ANEG_AUTO : NFP_ANEG_DISABLED); + if (err) + goto err_bad_set; + if (cmd->base.speed != SPEED_UNKNOWN) { + u32 speed = cmd->base.speed / nn->eth_port->lanes; + + err = __nfp_eth_set_speed(nsp, speed); + if (err) + goto err_bad_set; + } + + err = nfp_eth_config_commit_end(nsp); + if (err > 0) + return 0; /* no change */ + + nfp_net_refresh_port_config(nn); + + return err; + +err_bad_set: + nfp_eth_config_cleanup_end(nsp); + return err; +} + static void nfp_net_get_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring) { @@ -879,6 +924,7 @@ static const struct ethtool_ops nfp_net_ethtool_ops = { .get_channels = nfp_net_get_channels, .set_channels = nfp_net_set_channels, .get_link_ksettings = nfp_net_get_link_ksettings, + .set_link_ksettings = nfp_net_set_link_ksettings, }; void nfp_net_set_ethtool_ops(struct net_device *netdev) -- 2.11.0
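In nfp_net_set_link_ksettings() the speed requested through ethtool applies to the whole port, but __nfp_eth_set_speed() is passed cmd->base.speed / nn->eth_port->lanes, i.e. a per-lane speed. Below is a sketch of that arithmetic using a hypothetical helper, `per_lane_speed()`. The zero-lanes guard is an addition of this sketch, not something the patch checks. In <linux/ethtool.h> SPEED_UNKNOWN is -1, which reads back as 0xffffffff through the u32 speed field, so the sketch defines a local copy in that form.

```c
#include <stdint.h>

/* Local copy of the ethtool sentinel as seen through a u32 speed field. */
#define SPEED_UNKNOWN ((uint32_t)-1)

/* Hypothetical helper mirroring the division in the patch: ethtool speaks
 * aggregate port speed (Mb/s), the NSP is programmed per lane.
 * Returns 0 when no speed change should be made. */
static uint32_t per_lane_speed(uint32_t port_speed, unsigned int lanes)
{
	if (port_speed == SPEED_UNKNOWN || lanes == 0)
		return 0;
	return port_speed / lanes;
}
```

For example, asking for 100000 Mb/s on a 4-lane port programs 25000 Mb/s per lane, while a single-lane 25G port is programmed unchanged.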
[PATCH net-next 06/14] nfp: report link speed from NSP
On the PF prefer the link speed value provided by the NSP. Refresh port table if needed. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index d3cec0d4a978..0fdc14e7b576 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -49,6 +49,7 @@ #include #include "nfpcore/nfp.h" +#include "nfpcore/nfp_nsp_eth.h" #include "nfp_net_ctrl.h" #include "nfp_net.h" @@ -205,6 +206,16 @@ nfp_net_get_link_ksettings(struct net_device *netdev, if (!netif_carrier_ok(netdev)) return 0; + /* Use link speed from ETH table if available, otherwise try the BAR */ + if (nn->eth_port && nfp_net_link_changed_read_clear(nn)) + nfp_net_refresh_port_config(nn); + /* Separate if - on FW error the port could've disappeared from table */ + if (nn->eth_port) { + cmd->base.speed = nn->eth_port->speed; + cmd->base.duplex = DUPLEX_FULL; + return 0; + } + sts = nn_readl(nn, NFP_NET_CFG_STS); ls = FIELD_GET(NFP_NET_CFG_STS_LINK_RATE, sts); -- 2.11.0
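The decision order in this hunk is: refresh the eth table if a link change was seen, then check for the eth port again, since a firmware error during the refresh can drop the port from the table, and only then fall back to the BAR. Below is a hedged userspace model of that order. `struct port_info`, `refresh_port_config()`, `report_speed()` and the `fw_drops_port` flag are all hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two speed sources. */
struct port_info {
	bool has_eth_port;  /* PF with an NSP eth table entry */
	bool link_changed;  /* link events seen since the last refresh */
	bool fw_drops_port; /* simulate a FW error removing the entry on refresh */
	uint32_t nsp_speed; /* speed from the NSP eth table */
	uint32_t bar_speed; /* speed decoded from the control BAR */
	int refreshes;
};

static void refresh_port_config(struct port_info *p)
{
	p->refreshes++;
	if (p->fw_drops_port)
		p->has_eth_port = false;
}

static uint32_t report_speed(struct port_info *p)
{
	if (p->has_eth_port && p->link_changed) {
		p->link_changed = false;
		refresh_port_config(p);
	}
	/* Separate check: the refresh above may have removed the port. */
	if (p->has_eth_port)
		return p->nsp_speed;
	return p->bar_speed; /* VFs, or a PF whose entry vanished */
}
```

Keeping the two checks separate, as the patch comment says, is what lets the last case fall through to the BAR instead of dereferencing a port that no longer exists.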
[PATCH net-next 03/14] nfp: add mutex protection for the port list
We will want to unregister netdevs after their port got reconfigured. For that we need to make sure manipulations of port list from the port reconfiguration flow will not race with driver's .remove() callback. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_main.c | 3 +-- drivers/net/ethernet/netronome/nfp/nfp_main.h | 3 +++ drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 19 +-- 3 files changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c index dedac720fb29..96266796fd09 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c @@ -385,8 +385,7 @@ static void nfp_pci_remove(struct pci_dev *pdev) { struct nfp_pf *pf = pci_get_drvdata(pdev); - if (!list_empty(&pf->ports)) - nfp_net_pci_remove(pf); + nfp_net_pci_remove(pf); nfp_pcie_sriov_disable(pdev); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h index bb15a5724bf7..b7ceec9a5783 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h @@ -42,6 +42,7 @@ #include #include #include +#include #include struct dentry; @@ -67,6 +68,7 @@ struct nfp_eth_table; * @num_ports: Number of adapter ports app firmware supports * @num_netdevs: Number of netdevs spawned * @ports: Linked list of port structures (struct nfp_net) + * @port_lock: Protects @ports, @num_ports, @num_netdevs */ struct nfp_pf { struct pci_dev *pdev; @@ -92,6 +94,7 @@ struct nfp_pf { unsigned int num_netdevs; struct list_head ports; + struct mutex port_lock; }; extern struct pci_driver nfp_netvf_pci_driver; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index 1644954f52cd..4d602b1ddc90 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -481,17 +481,22 @@ int nfp_net_pci_probe(struct nfp_pf *pf) int stride; int err; + mutex_init(&pf->port_lock); + /* Verify that the board has completed initialization */ if (!nfp_is_ready(pf->cpp)) { nfp_err(pf->cpp, "NFP is not ready for NIC operation.\n"); return -EINVAL; } + mutex_lock(&pf->port_lock); pf->num_ports = nfp_net_pf_get_num_ports(pf); ctrl_bar = nfp_net_pf_map_ctrl_bar(pf); - if (!ctrl_bar) - return pf->fw_loaded ? -EINVAL : -EPROBE_DEFER; + if (!ctrl_bar) { + err = pf->fw_loaded ? -EINVAL : -EPROBE_DEFER; + goto err_unlock; + } nfp_net_get_fw_version(&fw_ver, ctrl_bar); if (fw_ver.resv || fw_ver.class != NFP_NET_CFG_VERSION_CLASS_GENERIC) { @@ -565,6 +570,8 @@ int nfp_net_pci_probe(struct nfp_pf *pf) if (err) goto err_clean_ddir; + mutex_unlock(&pf->port_lock); + return 0; err_clean_ddir: @@ -574,6 +581,8 @@ int nfp_net_pci_probe(struct nfp_pf *pf) nfp_cpp_area_release_free(pf->tx_area); err_ctrl_unmap: nfp_cpp_area_release_free(pf->ctrl_area); +err_unlock: + mutex_unlock(&pf->port_lock); return err; } @@ -581,6 +590,10 @@ void nfp_net_pci_remove(struct nfp_pf *pf) { struct nfp_net *nn; + mutex_lock(&pf->port_lock); + if (list_empty(&pf->ports)) + goto out; + list_for_each_entry(nn, &pf->ports, port_list) { nfp_net_debugfs_dir_clean(&nn->debugfs_dir); @@ -597,4 +610,6 @@ void nfp_net_pci_remove(struct nfp_pf *pf) nfp_cpp_area_release_free(pf->rx_area); nfp_cpp_area_release_free(pf->tx_area); nfp_cpp_area_release_free(pf->ctrl_area); +out: + mutex_unlock(&pf->port_lock); } -- 2.11.0
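The point of the patch is that the emptiness check moves from the unlocked caller in nfp_pci_remove() into nfp_net_pci_remove(), under port_lock. That way a concurrent reconfiguration path taking the same lock cannot change the list between the check and the teardown. Below is a userspace model using a pthread mutex. `struct pf`, `struct port`, `pf_add_port()` and `pf_remove_all()` are hypothetical names, not the driver's.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model of the port list in struct nfp_pf. */
struct port {
	struct port *next;
};

struct pf {
	pthread_mutex_t port_lock; /* protects ports and num_netdevs */
	struct port *ports;
	unsigned int num_netdevs;
};

static void pf_init(struct pf *pf)
{
	pthread_mutex_init(&pf->port_lock, NULL);
	pf->ports = NULL;
	pf->num_netdevs = 0;
}

/* Probe/reconfiguration side: the list only changes under the lock. */
static void pf_add_port(struct pf *pf, struct port *p)
{
	pthread_mutex_lock(&pf->port_lock);
	p->next = pf->ports;
	pf->ports = p;
	pf->num_netdevs++;
	pthread_mutex_unlock(&pf->port_lock);
}

/* Remove side: the list is inspected and emptied while holding the same
 * lock, so an empty list is handled here rather than by an unlocked
 * caller check. Returns the number of ports torn down. */
static unsigned int pf_remove_all(struct pf *pf)
{
	unsigned int n = 0;

	pthread_mutex_lock(&pf->port_lock);
	while (pf->ports) {
		pf->ports = pf->ports->next;
		n++;
	}
	pf->num_netdevs = 0;
	pthread_mutex_unlock(&pf->port_lock);
	return n;
}
```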
[PATCH net-next 01/14] nfp: add support for .get_link_ksettings()
Read link speed from the BAR. This provides very basic information and works for both PFs and VFs. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h | 13 ++ .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 49 ++ 2 files changed, 62 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h index 71d86171b4ee..d04ccc9f6116 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h @@ -177,6 +177,19 @@ #define NFP_NET_CFG_VERSION_MINOR(x) (((x) & 0xff) << 0) #define NFP_NET_CFG_STS 0x0034 #define NFP_NET_CFG_STS_LINK (0x1 << 0) /* Link up or down */ +/* Link rate */ +#define NFP_NET_CFG_STS_LINK_RATE_SHIFT 1 +#define NFP_NET_CFG_STS_LINK_RATE_MASK 0xF +#define NFP_NET_CFG_STS_LINK_RATE \ + (NFP_NET_CFG_STS_LINK_RATE_MASK << NFP_NET_CFG_STS_LINK_RATE_SHIFT) +#define NFP_NET_CFG_STS_LINK_RATE_UNSUPPORTED 0 +#define NFP_NET_CFG_STS_LINK_RATE_UNKNOWN 1 +#define NFP_NET_CFG_STS_LINK_RATE_1G 2 +#define NFP_NET_CFG_STS_LINK_RATE_10G 3 +#define NFP_NET_CFG_STS_LINK_RATE_25G 4 +#define NFP_NET_CFG_STS_LINK_RATE_40G 5 +#define NFP_NET_CFG_STS_LINK_RATE_50G 6 +#define NFP_NET_CFG_STS_LINK_RATE_100G 7 #define NFP_NET_CFG_CAP 0x0038 #define NFP_NET_CFG_MAX_TXRINGS 0x003c #define NFP_NET_CFG_MAX_RXRINGS 0x0040 diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index ed22a813e579..d3cec0d4a978 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -173,6 +173,54 @@ static void nfp_net_get_drvinfo(struct net_device *netdev, drvinfo->regdump_len = NFP_NET_CFG_BAR_SZ; } +/** + * nfp_net_get_link_ksettings - Get Link Speed settings + * @netdev: network interface device structure + * @cmd: ethtool command + * + * Reports speed settings based on info in the BAR
provided by the fw. + */ +static int +nfp_net_get_link_ksettings(struct net_device *netdev, + struct ethtool_link_ksettings *cmd) +{ + static const u32 ls_to_ethtool[] = { + [NFP_NET_CFG_STS_LINK_RATE_UNSUPPORTED] = 0, + [NFP_NET_CFG_STS_LINK_RATE_UNKNOWN] = SPEED_UNKNOWN, + [NFP_NET_CFG_STS_LINK_RATE_1G] = SPEED_1000, + [NFP_NET_CFG_STS_LINK_RATE_10G] = SPEED_10000, + [NFP_NET_CFG_STS_LINK_RATE_25G] = SPEED_25000, + [NFP_NET_CFG_STS_LINK_RATE_40G] = SPEED_40000, + [NFP_NET_CFG_STS_LINK_RATE_50G] = SPEED_50000, + [NFP_NET_CFG_STS_LINK_RATE_100G] = SPEED_100000, + }; + struct nfp_net *nn = netdev_priv(netdev); + u32 sts, ls; + + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE); + cmd->base.port = PORT_OTHER; + cmd->base.speed = SPEED_UNKNOWN; + cmd->base.duplex = DUPLEX_UNKNOWN; + + if (!netif_carrier_ok(netdev)) + return 0; + + sts = nn_readl(nn, NFP_NET_CFG_STS); + + ls = FIELD_GET(NFP_NET_CFG_STS_LINK_RATE, sts); + if (ls == NFP_NET_CFG_STS_LINK_RATE_UNSUPPORTED) + return -EOPNOTSUPP; + + if (ls == NFP_NET_CFG_STS_LINK_RATE_UNKNOWN || + ls >= ARRAY_SIZE(ls_to_ethtool)) + return 0; + + cmd->base.speed = ls_to_ethtool[sts]; + cmd->base.duplex = DUPLEX_FULL; + + return 0; +} + static void nfp_net_get_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring) { @@ -814,6 +862,7 @@ static const struct ethtool_ops nfp_net_ethtool_ops = { .set_coalesce = nfp_net_set_coalesce, .get_channels = nfp_net_get_channels, .set_channels = nfp_net_set_channels, + .get_link_ksettings = nfp_net_get_link_ksettings, }; void nfp_net_set_ethtool_ops(struct net_device *netdev) -- 2.11.0
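FIELD_GET(NFP_NET_CFG_STS_LINK_RATE, sts) amounts to a shift and a mask using the SHIFT (1) and MASK (0xF) values defined in this patch, i.e. bits 4:1 of the status word. Below is a userspace sketch of the decode with hypothetical local names (`STS_LINK_RATE_SHIFT`, `ls_to_mbps`, `sts_to_speed()`). Speeds are in Mb/s, like ethtool's SPEED_* constants. The lookup has to be indexed by the extracted field `ls`, not by the raw register value.

```c
#include <stdint.h>

/* Field layout from the patch: a 4-bit link rate at bits 4:1. */
#define STS_LINK_RATE_SHIFT 1
#define STS_LINK_RATE_MASK  0xF

/* Index is the NFP_NET_CFG_STS_LINK_RATE_* value:
 * 0 = unsupported, 1 = unknown, then 1G, 10G, 25G, 40G, 50G, 100G. */
static const uint32_t ls_to_mbps[] = {
	0, (uint32_t)-1, 1000, 10000, 25000, 40000, 50000, 100000,
};

/* Returns the speed for a raw status word; (uint32_t)-1 means unknown,
 * 0 means the firmware does not report a rate. */
static uint32_t sts_to_speed(uint32_t sts)
{
	uint32_t ls = (sts >> STS_LINK_RATE_SHIFT) & STS_LINK_RATE_MASK;

	/* Bounds-check and index with the extracted field. */
	if (ls >= sizeof(ls_to_mbps) / sizeof(ls_to_mbps[0]))
		return (uint32_t)-1;
	return ls_to_mbps[ls];
}
```

For example, a status word of 0x7 (link-up bit set, rate field 3) decodes to 10000. A raw-value index of 7 would instead have selected the 100G entry.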
[PATCH net-next 10/14] nfp: allow multi-stage NSP configuration
NSP commands may be slow to respond, so we should avoid issuing one command per item when the user requests changes to multiple parameters, for instance with an ethtool .set_settings() command. Introduce a way for internal NSP code to carry state in the NSP structure, and add start/finish calls that perform the initialization and kick off the configuration request, with potentially many parameters being modified in between. nfp_eth_set_mod_enable() will make use of the new code internally; other "set" functions will follow. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h | 8 ++ .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 43 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 4 + .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 114 +++-- 4 files changed, 138 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h index 778bd9424d5d..8afef7593f13 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h @@ -52,6 +52,14 @@ const char *nfp_hwinfo_lookup(struct nfp_cpp *cpp, const char *lookup); struct nfp_nsp; +struct nfp_cpp *nfp_nsp_cpp(struct nfp_nsp *state); +bool nfp_nsp_config_modified(struct nfp_nsp *state); +void nfp_nsp_config_set_modified(struct nfp_nsp *state, bool modified); +void *nfp_nsp_config_entries(struct nfp_nsp *state); +unsigned int nfp_nsp_config_idx(struct nfp_nsp *state); +void nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries, + unsigned int idx); +void nfp_nsp_config_clear_state(struct nfp_nsp *state); int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size); int nfp_nsp_write_eth_table(struct nfp_nsp *state, const void *buf, unsigned int size); diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c index 6482831282b2..225d07815375 100644 ---
a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c @@ -104,8 +104,51 @@ struct nfp_nsp { u16 major; u16 minor; } ver; + + /* Eth table config state */ + bool modified; + unsigned int idx; + void *entries; }; +struct nfp_cpp *nfp_nsp_cpp(struct nfp_nsp *state) +{ + return state->cpp; +} + +bool nfp_nsp_config_modified(struct nfp_nsp *state) +{ + return state->modified; +} + +void nfp_nsp_config_set_modified(struct nfp_nsp *state, bool modified) +{ + state->modified = modified; +} + +void *nfp_nsp_config_entries(struct nfp_nsp *state) +{ + return state->entries; +} + +unsigned int nfp_nsp_config_idx(struct nfp_nsp *state) +{ + return state->idx; +} + +void +nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries, unsigned int idx) +{ + state->entries = entries; + state->idx = idx; +} + +void nfp_nsp_config_clear_state(struct nfp_nsp *state) +{ + state->entries = NULL; + state->idx = 0; +} + static int nfp_nsp_check(struct nfp_nsp *state) { struct nfp_cpp *cpp = state->cpp; diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h index e3baec32..c452ad311993 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h @@ -136,4 +136,8 @@ struct nfp_eth_table * __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp); int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable); +struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx); +int nfp_eth_config_commit_end(struct nfp_nsp *nsp); +void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp); + #endif diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c index 837de15ed720..55d8e073ccbd 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c +++ 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c @@ -268,63 +268,115 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp) return NULL; } -/** - * nfp_eth_set_mod_enable() - set PHY module enable control bit - * @cpp: NFP CPP handle - * @idx: NFP chip-wide port index - * @enable:Desired state - * - * Enable or disable PHY module (this usually means setting the TX lanes - * disable bits). - * - * Return: 0 or -ERRNO. - */ -int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable) +struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx) { struct eth_table_entry *entries; struct nfp_nsp *nsp; - u64 reg; int ret; entries = kzalloc(NSP_ETH_TABLE_SIZE, GFP_KER
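The start/modify/commit flow can be pictured as follows. Setters only touch a cached copy of the entry and mark it modified. The commit issues a single (here simulated) NSP write, and only if something actually changed. This is a hypothetical model: `struct cfg_state`, `cfg_start()`, `cfg_set_bit()` and `cfg_commit()` are invented names. The return convention (positive means "nothing written") follows the caller in the set_link_ksettings patch, which treats `err > 0` from nfp_eth_config_commit_end() as "no change".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one port's cached config during a transaction. */
struct cfg_state {
	uint64_t control; /* cached control word */
	bool modified;    /* any setter changed the cache? */
	int writes;       /* number of simulated NSP write commands */
};

static void cfg_start(struct cfg_state *s, uint64_t current_control)
{
	s->control = current_control;
	s->modified = false;
}

/* A setter: updates the cache and flags a change only if the value moves. */
static void cfg_set_bit(struct cfg_state *s, uint64_t bit, bool on)
{
	uint64_t next = on ? (s->control | bit) : (s->control & ~bit);

	if (next != s->control) {
		s->control = next;
		s->modified = true;
	}
}

/* Returns 1 when there was nothing to write, 0 after one batched write. */
static int cfg_commit(struct cfg_state *s)
{
	if (!s->modified)
		return 1;
	s->writes++;
	s->modified = false;
	return 0;
}
```

Several parameters set between cfg_start() and cfg_commit() cost one write instead of one slow NSP command each, which is the motivation given in the commit message.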
[PATCH net-next 04/14] nfp: track link state changes
For caching link settings - remember if we have seen link events since the last time the eth_port information was refreshed. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_net.h| 6 +- drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 14 ++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h index 8e04aa0e6e87..91e963b5104f 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h @@ -523,7 +523,8 @@ struct nfp_net_dp { * @reconfig_sync_present: Some thread is performing synchronous reconfig * @reconfig_timer:Timer for async reading of reconfig results * @link_up:Is the link up? - * @link_status_lock: Protects @link_up and ensures atomicity with BAR reading + * @link_changed: Has link state changes since last port refresh? + * @link_status_lock: Protects @link_* and ensures atomicity with BAR reading * @rx_coalesce_usecs: RX interrupt moderation usecs delay parameter * @rx_coalesce_max_frames: RX interrupt moderation frame count parameter * @tx_coalesce_usecs: TX interrupt moderation usecs delay parameter @@ -580,6 +581,7 @@ struct nfp_net { u32 me_freq_mhz; bool link_up; + bool link_changed; spinlock_t link_status_lock; spinlock_t reconfig_lock; @@ -810,6 +812,8 @@ nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry *irq_entries, struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn); int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new); +bool nfp_net_link_changed_read_clear(struct nfp_net *nn); + #ifdef CONFIG_NFP_DEBUG void nfp_net_debugfs_create(void); void nfp_net_debugfs_destroy(void); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 6944a3202a45..8664815f45ce 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -376,6 +376,19 @@ static irqreturn_t nfp_net_irq_rxtx(int irq, void *data) return IRQ_HANDLED; } +bool nfp_net_link_changed_read_clear(struct nfp_net *nn) +{ + unsigned long flags; + bool ret; + + spin_lock_irqsave(&nn->link_status_lock, flags); + ret = nn->link_changed; + nn->link_changed = false; + spin_unlock_irqrestore(&nn->link_status_lock, flags); + + return ret; +} + /** * nfp_net_read_link_status() - Reread link status from control BAR * @nn: NFP Network structure @@ -395,6 +408,7 @@ static void nfp_net_read_link_status(struct nfp_net *nn) goto out; nn->link_up = link_up; + nn->link_changed = true; if (nn->link_up) { netif_carrier_on(nn->dp.netdev); -- 2.11.0
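nfp_net_link_changed_read_clear() returns the flag and clears it in a single critical section, so a link event cannot be lost between the read and the clear. The sketch below swaps in a C11 `atomic_exchange()` for the patch's spinlock. That gives the same read-and-clear guarantee for one flag, but not the wider atomicity with BAR reads that link_status_lock also provides. `note_link_change()` and `link_changed_read_clear()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the link-status path, consumed by the port-refresh path. */
static atomic_bool link_changed;

/* Called whenever the link state flips. */
static void note_link_change(void)
{
	atomic_store(&link_changed, true);
}

/* Returns whether any change was seen since the last call, clearing the
 * flag in the same atomic step. */
static bool link_changed_read_clear(void)
{
	return atomic_exchange(&link_changed, false);
}
```

Several changes between two reads collapse into one "changed" result. That is sufficient here, because the consumer only needs to know whether the cached eth_port data is stale.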
[PATCH net-next 11/14] nfp: turn NSP port entry into a union
Make NSP port structure a union to simplify accessing the fields from generic macros. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 38 ++ 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c index 55d8e073ccbd..ca5c041e64a4 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c @@ -49,7 +49,7 @@ #define NSP_ETH_NBI_PORT_COUNT 24 #define NSP_ETH_MAX_COUNT (2 * NSP_ETH_NBI_PORT_COUNT) #define NSP_ETH_TABLE_SIZE (NSP_ETH_MAX_COUNT *\ -sizeof(struct eth_table_entry)) +sizeof(union eth_table_entry)) #define NSP_ETH_PORT_LANES GENMASK_ULL(3, 0) #define NSP_ETH_PORT_INDEX GENMASK_ULL(15, 8) @@ -71,6 +71,15 @@ #define NSP_ETH_CTRL_TX_ENABLED BIT_ULL(2) #define NSP_ETH_CTRL_RX_ENABLED BIT_ULL(3) +enum nfp_eth_raw { + NSP_ETH_RAW_PORT = 0, + NSP_ETH_RAW_STATE, + NSP_ETH_RAW_MAC, + NSP_ETH_RAW_CONTROL, + + NSP_ETH_NUM_RAW +}; + enum nfp_eth_rate { RATE_INVALID = 0, RATE_10M, @@ -80,12 +89,15 @@ enum nfp_eth_rate { RATE_25G, }; -struct eth_table_entry { - __le64 port; - __le64 state; - u8 mac_addr[6]; - u8 resv[2]; - __le64 control; +union eth_table_entry { + struct { + __le64 port; + __le64 state; + u8 mac_addr[6]; + u8 resv[2]; + __le64 control; + }; + __le64 raw[NSP_ETH_NUM_RAW]; }; static unsigned int nfp_eth_rate(enum nfp_eth_rate rate) @@ -114,7 +126,7 @@ static void nfp_eth_copy_mac_reverse(u8 *dst, const u8 *src) } static void -nfp_eth_port_translate(struct nfp_nsp *nsp, const struct eth_table_entry *src, +nfp_eth_port_translate(struct nfp_nsp *nsp, const union eth_table_entry *src, unsigned int index, struct nfp_eth_table_port *dst) { unsigned int rate; @@ -216,7 +228,7 @@ struct nfp_eth_table *nfp_eth_read_ports(struct nfp_cpp *cpp) struct nfp_eth_table * __nfp_eth_read_ports(struct nfp_cpp *cpp, struct
nfp_nsp *nsp) { - struct eth_table_entry *entries; + union eth_table_entry *entries; struct nfp_eth_table *table; int i, j, ret, cnt = 0; @@ -270,7 +282,7 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp) struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx) { - struct eth_table_entry *entries; + union eth_table_entry *entries; struct nfp_nsp *nsp; int ret; @@ -307,7 +319,7 @@ struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx) void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp) { - struct eth_table_entry *entries = nfp_nsp_config_entries(nsp); + union eth_table_entry *entries = nfp_nsp_config_entries(nsp); nfp_nsp_config_set_modified(nsp, false); nfp_nsp_config_clear_state(nsp); @@ -331,7 +343,7 @@ void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp) */ int nfp_eth_config_commit_end(struct nfp_nsp *nsp) { - struct eth_table_entry *entries = nfp_nsp_config_entries(nsp); + union eth_table_entry *entries = nfp_nsp_config_entries(nsp); int ret = 1; if (nfp_nsp_config_modified(nsp)) { @@ -357,7 +369,7 @@ int nfp_eth_config_commit_end(struct nfp_nsp *nsp) */ int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable) { - struct eth_table_entry *entries; + union eth_table_entry *entries; struct nfp_nsp *nsp; u64 reg; -- 2.11.0
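Overlaying the named fields with a `raw[]` array lets one generic helper address any 64-bit word of the entry by index. The layout adds up exactly: port (8) + state (8) + mac_addr/resv (6 + 2) + control (8) = 32 bytes = 4 × u64. Below is a userspace sketch of the same shape, using plain `uint64_t` instead of `__le64`, so it ignores endianness conversion. `union entry`, the `RAW_*` names, `entry_clear()` and `entry_set_bit()` are hypothetical. The anonymous struct inside the union needs C11.

```c
#include <stdint.h>
#include <string.h>

enum { RAW_PORT, RAW_STATE, RAW_MAC, RAW_CONTROL, NUM_RAW };

/* Same shape as union eth_table_entry, with host-endian uint64_t. */
union entry {
	struct {
		uint64_t port;
		uint64_t state;
		uint8_t mac_addr[6];
		uint8_t resv[2];
		uint64_t control;
	};
	uint64_t raw[NUM_RAW];
};

static void entry_clear(union entry *e)
{
	memset(e, 0, sizeof(*e));
}

/* Generic accessor: any word is reachable by index, so a single helper
 * (or macro) can serve every field instead of one per named member. */
static void entry_set_bit(union entry *e, unsigned int word, uint64_t bit)
{
	e->raw[word] |= bit;
}
```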
[PATCH net-next 12/14] nfp: add extended error messages
Allow NSP to set option code even when error is reported. This provides a way for NSP to give user more precise information about why command failed. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 37 +- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c index 225d07815375..96bb5f6bd87b 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c @@ -97,6 +97,13 @@ enum nfp_nsp_cmd { __MAX_SPCODE, }; +static const struct { + int code; + const char *msg; +} nsp_errors[] = { + { 0, "success" } /* placeholder to avoid warnings */ +}; + struct nfp_nsp { struct nfp_cpp *cpp; struct nfp_resource *res; @@ -149,6 +156,18 @@ void nfp_nsp_config_clear_state(struct nfp_nsp *state) state->idx = 0; } +static void nfp_nsp_print_extended_error(struct nfp_nsp *state, u32 ret_val) +{ + int i; + + if (!ret_val) + return; + + for (i = 0; i < ARRAY_SIZE(nsp_errors); i++) + if (ret_val == nsp_errors[i].code) + nfp_err(state->cpp, "err msg: %s\n", nsp_errors[i].msg); +} + static int nfp_nsp_check(struct nfp_nsp *state) { struct nfp_cpp *cpp = state->cpp; @@ -282,7 +301,7 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg, static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option, u32 buff_cpp, u64 buff_addr) { - u64 reg, nsp_base, nsp_buffer, nsp_status, nsp_command; + u64 reg, ret_val, nsp_base, nsp_buffer, nsp_status, nsp_command; struct nfp_cpp *cpp = state->cpp; u32 nsp_cpp; int err; @@ -335,18 +354,20 @@ static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option, return err; } + err = nfp_cpp_readq(cpp, nsp_cpp, nsp_command, &ret_val); + if (err < 0) + return err; + ret_val = FIELD_GET(NSP_COMMAND_OPTION, ret_val); + err = FIELD_GET(NSP_STATUS_RESULT, reg); if (err) { - nfp_warn(cpp, "Result (error) code set: %d 
command: %d\n", - -err, code); + nfp_warn(cpp, "Result (error) code set: %d (%d) command: %d\n", + -err, (int)ret_val, code); + nfp_nsp_print_extended_error(state, ret_val); return -err; } - err = nfp_cpp_readq(cpp, nsp_cpp, nsp_command, &reg); - if (err < 0) - return err; - - return FIELD_GET(NSP_COMMAND_OPTION, reg); + return ret_val; } static int nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 code, u32 option, -- 2.11.0
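The extended-error path is a table lookup keyed by the option value the NSP returns. A value of 0 means "no extra information". Below is a userspace sketch of the lookup with a hypothetical `nsp_error_msg()` that returns the message instead of printing it. The codes and strings are copied from the nsp_errors[] table as filled in by "[PATCH net-next 13/14] nfp: NSP backend for link configuration operations".

```c
#include <stddef.h>

/* Codes and messages as listed in patch 13/14 of this series. */
static const struct {
	int code;
	const char *msg;
} nsp_errors[] = {
	{ 6010, "could not map to phy for port" },
	{ 6011, "not an allowed rate/lanes for port" },
	{ 6012, "not an allowed rate/lanes for port" },
	{ 6013, "high/low error, change other port first" },
	{ 6014, "config not found in flash" },
};

/* Returns the message for @ret_val, or NULL if it is 0 or not listed. */
static const char *nsp_error_msg(unsigned int ret_val)
{
	size_t i;

	if (!ret_val)
		return NULL;
	for (i = 0; i < sizeof(nsp_errors) / sizeof(nsp_errors[0]); i++)
		if ((int)ret_val == nsp_errors[i].code)
			return nsp_errors[i].msg;
	return NULL;
}
```

Unknown codes fall through silently, which matches nfp_nsp_print_extended_error(): it prints nothing beyond the generic "Result (error) code set" warning.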
[PATCH net-next 13/14] nfp: NSP backend for link configuration operations
Add NSP backend for upcoming link configuration operations. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 6 +- .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 7 + .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 180 +++-- 3 files changed, 179 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c index 96bb5f6bd87b..4635f42e15b0 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c @@ -101,7 +101,11 @@ static const struct { int code; const char *msg; } nsp_errors[] = { - { 0, "success" } /* placeholder to avoid warnings */ + { 6010, "could not map to phy for port" }, + { 6011, "not an allowed rate/lanes for port" }, + { 6012, "not an allowed rate/lanes for port" }, + { 6013, "high/low error, change other port first" }, + { 6014, "config not found in flash" }, }; struct nfp_nsp { diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h index c452ad311993..7d34ff145fd7 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h @@ -134,10 +134,17 @@ struct nfp_eth_table { struct nfp_eth_table *nfp_eth_read_ports(struct nfp_cpp *cpp); struct nfp_eth_table * __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp); + int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable); +int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, + bool configed); struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx); int nfp_eth_config_commit_end(struct nfp_nsp *nsp); void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp); +int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode); +int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed); +int 
__nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes); + #endif diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c index ca5c041e64a4..639438d8313a 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c @@ -58,6 +58,7 @@ #define NSP_ETH_PORT_LANES_MASK cpu_to_le64(NSP_ETH_PORT_LANES) +#define NSP_ETH_STATE_CONFIGURED BIT_ULL(0) #define NSP_ETH_STATE_ENABLED BIT_ULL(1) #define NSP_ETH_STATE_TX_ENABLED BIT_ULL(2) #define NSP_ETH_STATE_RX_ENABLED BIT_ULL(3) @@ -67,9 +68,13 @@ #define NSP_ETH_STATE_OVRD_CHNG BIT_ULL(22) #define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23) +#define NSP_ETH_CTRL_CONFIGURED BIT_ULL(0) #define NSP_ETH_CTRL_ENABLED BIT_ULL(1) #define NSP_ETH_CTRL_TX_ENABLED BIT_ULL(2) #define NSP_ETH_CTRL_RX_ENABLED BIT_ULL(3) +#define NSP_ETH_CTRL_SET_RATE BIT_ULL(4) +#define NSP_ETH_CTRL_SET_LANES BIT_ULL(5) +#define NSP_ETH_CTRL_SET_ANEG BIT_ULL(6) enum nfp_eth_raw { NSP_ETH_RAW_PORT = 0, @@ -100,21 +105,38 @@ union eth_table_entry { __le64 raw[NSP_ETH_NUM_RAW]; }; -static unsigned int nfp_eth_rate(enum nfp_eth_rate rate) +static const struct { + enum nfp_eth_rate rate; + unsigned int speed; +} nsp_eth_rate_tbl[] = { + { RATE_INVALID, 0, }, + { RATE_10M, SPEED_10, }, + { RATE_100M, SPEED_100, }, + { RATE_1G, SPEED_1000, }, + { RATE_10G, SPEED_10000, }, + { RATE_25G, SPEED_25000, }, +}; + +static unsigned int nfp_eth_rate2speed(enum nfp_eth_rate rate) { - unsigned int rate_xlate[] = { - [RATE_INVALID] = 0, - [RATE_10M] = SPEED_10, - [RATE_100M] = SPEED_100, - [RATE_1G] = SPEED_1000, - [RATE_10G] = SPEED_10000, - [RATE_25G] = SPEED_25000, - }; + int i; - if (rate >= ARRAY_SIZE(rate_xlate)) - return 0; + for (i = 0; i < ARRAY_SIZE(nsp_eth_rate_tbl); i++) + if (nsp_eth_rate_tbl[i].rate == rate) + return nsp_eth_rate_tbl[i].speed; + + return 0; +} + +static unsigned int nfp_eth_speed2rate(unsigned int speed) +{ + int
i; + + for (i = 0; i < ARRAY_SIZE(nsp_eth_rate_tbl); i++) + if (nsp_eth_rate_tbl[i].speed == speed) + return nsp_eth_rate_tbl[i].rate; - return rate_xlate[rate]; + return RATE_INVALID; } static void nfp_eth_copy_mac_reverse(u8 *dst, const u8 *src) @@ -145,7 +167,7 @@ nfp_eth_port_translate(str
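The patch replaces an index-by-rate array with one table that is scanned in both directions. A minimal userspace sketch of that idea follows; the SPEED_* values mirror the kernel's Mb/s constants, while the numeric values of the rate codes and the function names rate2speed()/speed2rate() are assumptions for illustration, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's SPEED_* values (Mb/s). */
#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000
#define SPEED_10000	10000
#define SPEED_25000	25000

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Rate codes named as in the patch; their numeric values are assumed. */
enum nfp_eth_rate {
	RATE_INVALID = 0,
	RATE_10M,
	RATE_100M,
	RATE_1G,
	RATE_10G,
	RATE_25G,
};

/* A single table drives both directions, so they cannot drift apart. */
static const struct {
	enum nfp_eth_rate rate;
	unsigned int speed;
} rate_tbl[] = {
	{ RATE_INVALID,	0 },
	{ RATE_10M,	SPEED_10 },
	{ RATE_100M,	SPEED_100 },
	{ RATE_1G,	SPEED_1000 },
	{ RATE_10G,	SPEED_10000 },
	{ RATE_25G,	SPEED_25000 },
};

/* Firmware rate code -> ethtool speed; 0 for codes not in the table. */
static unsigned int rate2speed(enum nfp_eth_rate rate)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(rate_tbl); i++)
		if (rate_tbl[i].rate == rate)
			return rate_tbl[i].speed;
	return 0;
}

/* ethtool speed -> firmware rate code; RATE_INVALID if unsupported. */
static enum nfp_eth_rate speed2rate(unsigned int speed)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(rate_tbl); i++)
		if (rate_tbl[i].speed == speed)
			return rate_tbl[i].rate;
	return RATE_INVALID;
}
```

Unlike the old designated-initializer array, a linear scan does not assume the rate codes are dense or ordered. With six entries, the cost of the scan is negligible.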
[PATCH net-next 08/14] nfp: report port type in ethtool
Service process firmware provides us with information about media and interface (SFP module) plugged in, translate that to Linux's PORT_* defines and report via ethtool. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 1 + .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 21 +++ .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h | 24 ++ 3 files changed, 46 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index 563ced3c99e1..3b2a09821a59 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -215,6 +215,7 @@ nfp_net_get_link_ksettings(struct net_device *netdev, nfp_net_refresh_port_config(nn); /* Separate if - on FW error the port could've disappeared from table */ if (nn->eth_port) { + cmd->base.port = nn->eth_port->port_type; cmd->base.speed = nn->eth_port->speed; cmd->base.duplex = DUPLEX_FULL; return 0; diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c index dcb1bc81e554..07b4ded01514 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c @@ -62,6 +62,8 @@ #define NSP_ETH_STATE_TX_ENABLED BIT_ULL(2) #define NSP_ETH_STATE_RX_ENABLED BIT_ULL(3) #define NSP_ETH_STATE_RATE GENMASK_ULL(11, 8) +#define NSP_ETH_STATE_INTERFACEGENMASK_ULL(19, 12) +#define NSP_ETH_STATE_MEDIAGENMASK_ULL(21, 20) #define NSP_ETH_STATE_OVRD_CHNGBIT_ULL(22) #define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23) @@ -134,6 +136,9 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const struct eth_table_entry *src, rate = nfp_eth_rate(FIELD_GET(NSP_ETH_STATE_RATE, state)); dst->speed = dst->lanes * rate; + dst->interface = FIELD_GET(NSP_ETH_STATE_INTERFACE, state); + dst->media = FIELD_GET(NSP_ETH_STATE_MEDIA, state); + 
nfp_eth_copy_mac_reverse(dst->mac_addr, src->mac_addr); dst->label_port = FIELD_GET(NSP_ETH_PORT_PHYLABEL, port); @@ -170,6 +175,20 @@ nfp_eth_mark_split_ports(struct nfp_cpp *cpp, struct nfp_eth_table *table) } } +static void +nfp_eth_calc_port_type(struct nfp_cpp *cpp, struct nfp_eth_table_port *entry) +{ + if (entry->interface == NFP_INTERFACE_NONE) { + entry->port_type = PORT_NONE; + return; + } + + if (entry->media == NFP_MEDIA_FIBRE) + entry->port_type = PORT_FIBRE; + else + entry->port_type = PORT_DA; +} + /** * nfp_eth_read_ports() - retrieve port information * @cpp: NFP CPP handle @@ -237,6 +256,8 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp) &table->ports[j++]); nfp_eth_mark_split_ports(cpp, table); + for (i = 0; i < table->count; i++) + nfp_eth_calc_port_type(cpp, &table->ports[i]); kfree(entries); diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h index 6b3e954e70b3..57eb3cfa6a0a 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h @@ -37,6 +37,22 @@ #include #include +enum nfp_eth_interface { + NFP_INTERFACE_NONE = 0, + NFP_INTERFACE_SFP = 1, + NFP_INTERFACE_SFPP = 10, + NFP_INTERFACE_SFP28 = 28, + NFP_INTERFACE_QSFP = 40, + NFP_INTERFACE_CXP = 100, + NFP_INTERFACE_QSFP28= 112, +}; + +enum nfp_eth_media { + NFP_MEDIA_DAC_PASSIVE = 0, + NFP_MEDIA_DAC_ACTIVE, + NFP_MEDIA_FIBRE, +}; + enum nfp_eth_aneg { NFP_ANEG_AUTO = 0, NFP_ANEG_SEARCH, @@ -56,6 +72,8 @@ enum nfp_eth_aneg { * @base: first channel index (within NBI) * @lanes: number of channels * @speed: interface speed (in Mbps) + * @interface: interface (module) plugged in + * @media: media type of the @interface * @aneg: auto negotiation mode * @mac_addr: interface MAC address * @label_port:port id @@ -65,6 +83,7 @@ enum nfp_eth_aneg { * @rx_enabled:is RX enabled? * @override_changed: is media reconfig pending? 
* + * @port_type: one of %PORT_* defines for ethtool * @is_split: is interface part of a split port */ struct nfp_eth_table { @@ -77,6 +96,9 @@ struct nfp_eth_table { unsigned int lanes; unsigned int speed; + unsigned int interface; + enum nfp_eth_media media; + enum nfp_eth_aneg aneg;
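The port type is derived in two steps: extract the interface and media fields from the NSP state word, then apply the same rule as nfp_eth_calc_port_type(). A userspace sketch follows. GENMASK_ULL() and the PORT_* values are re-declared locally, with the values as in <linux/ethtool.h>. field_get() is a hypothetical portable replacement for FIELD_GET() that divides by the mask's lowest set bit instead of shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace version of the kernel's GENMASK_ULL(h, l): bits h..l set. */
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define NSP_ETH_STATE_INTERFACE	GENMASK_ULL(19, 12)
#define NSP_ETH_STATE_MEDIA	GENMASK_ULL(21, 20)

/* ethtool port types, values as in <linux/ethtool.h>. */
#define PORT_FIBRE	0x03
#define PORT_DA		0x05
#define PORT_NONE	0xef

enum { NFP_INTERFACE_NONE = 0 };
enum { NFP_MEDIA_DAC_PASSIVE = 0, NFP_MEDIA_DAC_ACTIVE, NFP_MEDIA_FIBRE };

/* Hypothetical FIELD_GET() stand-in: (mask & -mask) is the lowest set bit,
 * and dividing by that power of two equals shifting down to bit 0. */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) / (mask & -mask);
}

/* Same decision as nfp_eth_calc_port_type(): no module means PORT_NONE,
 * fibre media means PORT_FIBRE, anything else is direct attach copper. */
static int calc_port_type(uint64_t state)
{
	uint64_t interface = field_get(NSP_ETH_STATE_INTERFACE, state);
	uint64_t media = field_get(NSP_ETH_STATE_MEDIA, state);

	if (interface == NFP_INTERFACE_NONE)
		return PORT_NONE;
	return media == NFP_MEDIA_FIBRE ? PORT_FIBRE : PORT_DA;
}
```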
[PATCH net-next 00/14] nfp: ethtool link settings
Hi! This series adds support for getting and setting link settings via the (moderately) new ethtool ksettings ops. The first patch introduces minimal speed and duplex reporting using the information provided directly in PCI BAR0 memory. The next few changes deal with the need to refresh the port state read from the service process, and patch 6 finally uses that information to provide link speed and duplex. Patches 7 and 8 add auto-negotiation and port type reporting. The remaining changes add set support for speed and auto-negotiation. An upcoming series will also add port splitting support via devlink. Quite a bit of churn in this series is caused by the fact that port speed and split changes currently require a reboot to take effect in most cases. The current service process code cannot reinitialize the MAC after the chip has been passing traffic. To make sure the user is aware of this limitation, we refuse the configuration unless the netdev is down, print a warning to the logs, and, if the configuration was performed but has not taken effect yet, unregister the netdev. The service process keeps a sticky "reboot needed" bit, so reloading the driver will not bring the netdev back. Note that the helper in patch 13 is marked __always_inline because the FIELD_* macros require their parameters to be known at compile time. I hope that is OK.
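The cover letter describes a patch 13 helper that rewrites one field of a port's state and requests the change from the firmware. In the kernel it must be __always_inline so FIELD_* sees constant masks. The sketch below is a hypothetical userspace illustration of that read-modify-write, not the patch's code. Because it computes the shift from the mask at run time, it needs no inlining. The struct port_cfg layout and set_bit_config() are assumptions. The RATE mask and SET_RATE bit match the defines in patch 13.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSP_ETH_STATE_RATE	0xf00ULL	/* GENMASK_ULL(11, 8) */
#define NSP_ETH_CTRL_SET_RATE	(1ULL << 4)	/* BIT_ULL(4) */

/* Hypothetical image of one port's state and control words. */
struct port_cfg {
	uint64_t state;		/* current settings */
	uint64_t control;	/* "apply this field" request bits */
};

/* Runtime-mask versions of FIELD_GET()/FIELD_PREP(): the lowest set bit
 * of the mask (mask & -mask) gives the field's scale factor. */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) / (mask & -mask);
}

static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val * (mask & -mask)) & mask;
}

/* Set one field; only raise ctrl_bit if the value actually changes.
 * Returns false if val does not fit in the field. */
static bool set_bit_config(struct port_cfg *cfg, uint64_t mask,
			   uint64_t val, uint64_t ctrl_bit)
{
	if (val > field_get(mask, mask))
		return false;
	if (field_get(mask, cfg->state) == val)
		return true;	/* unchanged, nothing for firmware to do */

	cfg->state = (cfg->state & ~mask) | field_prep(mask, val);
	cfg->control |= ctrl_bit;
	return true;
}
```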
Jakub Kicinski (14):
  nfp: add support for .get_link_ksettings()
  nfp: don't spawn netdevs for reconfigured ports
  nfp: add mutex protection for the port list
  nfp: track link state changes
  nfp: add port state refresh
  nfp: report link speed from NSP
  nfp: report auto-negotiation in ethtool
  nfp: report port type in ethtool
  nfp: separate high level and low level NSP headers
  nfp: allow multi-stage NSP configuration
  nfp: turn NSP port entry into a union
  nfp: add extended error messages
  nfp: NSP backend for link configuration operations
  nfp: add support for .set_link_ksettings()

 drivers/net/ethernet/netronome/nfp/nfp_main.c | 5 +-
 drivers/net/ethernet/netronome/nfp/nfp_main.h | 11 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h | 7 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c | 16 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h | 13 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 111 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 187 ---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h | 20 +-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 85 -
 .../nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h} | 68 +++-
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 359 +
 11 files changed, 754 insertions(+), 128 deletions(-)
 rename drivers/net/ethernet/netronome/nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h} (62%)

-- 
2.11.0
[PATCH net-next 05/14] nfp: add port state refresh
We will need a way of refreshing port state for link settings get/set. For get we need to refresh port speed and type. When settings are changed the reconfiguration may require reboot before it's effective. Unregister netdevs affected by reconfiguration from a workqueue. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_main.h | 3 + drivers/net/ethernet/netronome/nfp/nfp_net.h | 1 + drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 89 +-- 3 files changed, 85 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h index b7ceec9a5783..b57de047b002 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h @@ -44,6 +44,7 @@ #include #include #include +#include struct dentry; struct pci_dev; @@ -69,6 +70,7 @@ struct nfp_eth_table; * @num_netdevs: Number of netdevs spawned * @ports: Linked list of port structures (struct nfp_net) * @port_lock: Protects @ports, @num_ports, @num_netdevs + * @port_refresh_work: Work entry for taking netdevs out */ struct nfp_pf { struct pci_dev *pdev; @@ -94,6 +96,7 @@ struct nfp_pf { unsigned int num_netdevs; struct list_head ports; + struct work_struct port_refresh_work; struct mutex port_lock; }; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h index 91e963b5104f..052db9208fbb 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h @@ -813,6 +813,7 @@ struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn); int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new); bool nfp_net_link_changed_read_clear(struct nfp_net *nn); +void nfp_net_refresh_port_config(struct nfp_net *nn); #ifdef CONFIG_NFP_DEBUG void nfp_net_debugfs_create(void); diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index 4d602b1ddc90..8e975c36877c 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "nfpcore/nfp.h" #include "nfpcore/nfp_cpp.h" @@ -468,6 +469,82 @@ nfp_net_pf_spawn_netdevs(struct nfp_pf *pf, return err; } +static void nfp_net_pci_remove_finish(struct nfp_pf *pf) +{ + nfp_net_debugfs_dir_clean(&pf->ddir); + + nfp_net_irqs_disable(pf->pdev); + kfree(pf->irq_entries); + + nfp_cpp_area_release_free(pf->rx_area); + nfp_cpp_area_release_free(pf->tx_area); + nfp_cpp_area_release_free(pf->ctrl_area); +} + +static void nfp_net_refresh_netdevs(struct work_struct *work) +{ + struct nfp_pf *pf = container_of(work, struct nfp_pf, +port_refresh_work); + struct nfp_net *nn, *next; + + mutex_lock(&pf->port_lock); + + /* Check for nfp_net_pci_remove() racing against us */ + if (list_empty(&pf->ports)) + goto out; + + list_for_each_entry_safe(nn, next, &pf->ports, port_list) { + if (!nn->eth_port) { + nfp_warn(pf->cpp, "Warning: port %d not present after reconfig\n", +nn->eth_port->eth_index); + continue; + } + if (!nn->eth_port->override_changed) + continue; + + nn_warn(nn, "Port config changed, unregistering. 
Reboot required before port will be operational again.\n"); + + nfp_net_debugfs_dir_clean(&nn->debugfs_dir); + nfp_net_netdev_clean(nn->dp.netdev); + + list_del(&nn->port_list); + pf->num_netdevs--; + nfp_net_netdev_free(nn); + } + + if (list_empty(&pf->ports)) + nfp_net_pci_remove_finish(pf); +out: + mutex_unlock(&pf->port_lock); +} + +void nfp_net_refresh_port_config(struct nfp_net *nn) +{ + struct nfp_pf *pf = pci_get_drvdata(nn->pdev); + struct nfp_eth_table *old_table; + + ASSERT_RTNL(); + + old_table = pf->eth_tbl; + + list_for_each_entry(nn, &pf->ports, port_list) + nfp_net_link_changed_read_clear(nn); + + pf->eth_tbl = nfp_eth_read_ports(pf->cpp); + if (!pf->eth_tbl) { + pf->eth_tbl = old_table; + nfp_err(pf->cpp, "Error refreshing port config!\n"); + return; + } + + list_for_each_entry(nn, &pf->ports, port_list) + nn->eth_port = nfp_net_find_port(pf, nn->eth_port->eth_index); + + kfree(old_table); + + schedule_work(&pf->port_refresh_work); +} + /* * PCI device functions */ @@ -481,6 +558,7 @@ int nfp_net_pci_probe(struct nfp_pf *p
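nfp_net_refresh_port_config() follows a common pattern: read a fresh table, keep the old one if the read fails, re-point every port into the new table, and only then free the old one. The ordering matters because each lookup reads eth_index through a pointer into the old table. Below is a hypothetical userspace sketch of that pattern. The struct names and refresh() are illustrative. Unlike the patch, it also skips ports whose entry is already NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct eth_port {
	unsigned int eth_index;
	bool override_changed;		/* reconfig pending, reboot needed */
};

struct eth_table {
	size_t count;
	struct eth_port ports[];	/* flexible array member */
};

/* Hypothetical per-netdev state holding a pointer into the table. */
struct netdev_port {
	struct eth_port *eth_port;
};

static struct eth_port *find_port(struct eth_table *tbl, unsigned int idx)
{
	for (size_t i = 0; tbl && i < tbl->count; i++)
		if (tbl->ports[i].eth_index == idx)
			return &tbl->ports[i];
	return NULL;
}

/* Swap in new_tbl (NULL if the firmware read failed) and return the table
 * now in use. Ports are re-pointed while old_tbl is still valid. */
static struct eth_table *refresh(struct eth_table *old_tbl,
				 struct eth_table *new_tbl,
				 struct netdev_port *ports, size_t nports)
{
	if (!new_tbl)
		return old_tbl;		/* stale data beats no data */

	for (size_t i = 0; i < nports; i++) {
		if (!ports[i].eth_port)
			continue;	/* index already lost, avoid NULL deref */
		ports[i].eth_port = find_port(new_tbl,
					      ports[i].eth_port->eth_index);
	}

	free(old_tbl);			/* safe: nothing points into it now */
	return new_tbl;
}
```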
[PATCH net-next 02/14] nfp: don't spawn netdevs for reconfigured ports
After port reconfiguration (port split, media type change) firmware will continue to report old configuration until reboot. NSP will inform us that reconfiguration is pending. To avoid user confusion refuse to spawn netdevs until the new configuration is applied (reboot). We need to split the netdev to eth_table port matching from MAC search and move it earlier in the probe() flow. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_main.h | 5 +- drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 79 +- .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 12 +++- .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h | 3 + 4 files changed, 62 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h index 39105d0435e9..bb15a5724bf7 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h @@ -64,7 +64,8 @@ struct nfp_eth_table; * @fw_loaded: Is the firmware loaded? * @eth_tbl: NSP ETH table * @ddir: Per-device debugfs directory - * @num_ports: Number of adapter ports + * @num_ports: Number of adapter ports app firmware supports + * @num_netdevs: Number of netdevs spawned * @ports: Linked list of port structures (struct nfp_net) */ struct nfp_pf { @@ -88,6 +89,8 @@ struct nfp_pf { struct dentry *ddir; unsigned int num_ports; + unsigned int num_netdevs; + struct list_head ports; }; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index 2025cb7c6d90..1644954f52cd 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -129,14 +129,29 @@ static u8 __iomem *nfp_net_map_area(struct nfp_cpp *cpp, return (u8 __iomem *)ERR_PTR(err); } +/** + * nfp_net_get_mac_addr() - Get the MAC address. 
+ * @nn: NFP Network structure + * @cpp: NFP CPP handle + * @id: NFP port id + * + * First try to get the MAC address from NSP ETH table. If that + * fails try HWInfo. As a last resort generate a random address. + */ static void -nfp_net_get_mac_addr_hwinfo(struct nfp_net_dp *dp, struct nfp_cpp *cpp, - unsigned int id) +nfp_net_get_mac_addr(struct nfp_net *nn, struct nfp_cpp *cpp, unsigned int id) { + struct nfp_net_dp *dp = &nn->dp; u8 mac_addr[ETH_ALEN]; const char *mac_str; char name[32]; + if (nn->eth_port) { + ether_addr_copy(dp->netdev->dev_addr, nn->eth_port->mac_addr); + ether_addr_copy(dp->netdev->perm_addr, nn->eth_port->mac_addr); + return; + } + snprintf(name, sizeof(name), "eth%d.mac", id); mac_str = nfp_hwinfo_lookup(cpp, name); @@ -159,32 +174,16 @@ nfp_net_get_mac_addr_hwinfo(struct nfp_net_dp *dp, struct nfp_cpp *cpp, ether_addr_copy(dp->netdev->perm_addr, mac_addr); } -/** - * nfp_net_get_mac_addr() - Get the MAC address. - * @nn: NFP Network structure - * @pf: NFP PF device structure - * @id: NFP port id - * - * First try to get the MAC address from NSP ETH table. If that - * fails try HWInfo. As a last resort generate a random address. 
- */ -static void -nfp_net_get_mac_addr(struct nfp_net *nn, struct nfp_pf *pf, unsigned int id) +static struct nfp_eth_table_port * +nfp_net_find_port(struct nfp_pf *pf, unsigned int id) { int i; for (i = 0; pf->eth_tbl && i < pf->eth_tbl->count; i++) - if (pf->eth_tbl->ports[i].eth_index == id) { - const u8 *mac_addr = pf->eth_tbl->ports[i].mac_addr; - - nn->eth_port = &pf->eth_tbl->ports[i]; + if (pf->eth_tbl->ports[i].eth_index == id) + return &pf->eth_tbl->ports[i]; - ether_addr_copy(nn->dp.netdev->dev_addr, mac_addr); - ether_addr_copy(nn->dp.netdev->perm_addr, mac_addr); - return; - } - - nfp_net_get_mac_addr_hwinfo(&nn->dp, pf->cpp, id); + return NULL; } static unsigned int nfp_net_pf_get_num_ports(struct nfp_pf *pf) @@ -283,6 +282,7 @@ static void nfp_net_pf_free_netdevs(struct nfp_pf *pf) while (!list_empty(&pf->ports)) { nn = list_first_entry(&pf->ports, struct nfp_net, port_list); list_del(&nn->port_list); + pf->num_netdevs--; nfp_net_netdev_free(nn); } @@ -291,7 +291,8 @@ static void nfp_net_pf_free_netdevs(struct nfp_pf *pf) static struct nfp_net * nfp_net_pf_alloc_port_netdev(struct nfp_pf *pf, void __iomem *ctrl_bar, void __iomem *tx_bar, void __iomem *rx_bar, -
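The new nfp_net_get_mac_addr() comment gives a fixed precedence: NSP ETH table first, then HWInfo, then a random address. A hypothetical userspace sketch of that order follows. pick_mac() and parse_mac() are illustrative names, and rand() stands in for the kernel's random address helper. The random address is made unicast and locally administered, which is the usual convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Parse "aa:bb:cc:dd:ee:ff"; rejects NULL, short input and trailing junk. */
static bool parse_mac(const char *str, uint8_t mac[ETH_ALEN])
{
	unsigned int b[ETH_ALEN];
	char extra;

	if (!str || sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x%c", &b[0], &b[1],
			   &b[2], &b[3], &b[4], &b[5], &extra) != ETH_ALEN)
		return false;
	for (int i = 0; i < ETH_ALEN; i++)
		mac[i] = (uint8_t)b[i];
	return true;
}

/* Returns which source was used: 0 = ETH table, 1 = HWInfo, 2 = random. */
static int pick_mac(const uint8_t *eth_tbl_mac, const char *hwinfo_str,
		    uint8_t out[ETH_ALEN])
{
	if (eth_tbl_mac) {
		memcpy(out, eth_tbl_mac, ETH_ALEN);
		return 0;
	}
	if (parse_mac(hwinfo_str, out))
		return 1;

	for (int i = 0; i < ETH_ALEN; i++)
		out[i] = (uint8_t)rand();
	out[0] &= 0xfe;		/* clear the multicast bit */
	out[0] |= 0x02;		/* set the locally administered bit */
	return 2;
}
```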
[PATCH net-next 09/14] nfp: separate high level and low level NSP headers
We will soon add more NSP commands and structure definitions. Move all high-level NSP header contents to a common nfp_nsp.h file. Right now it mostly boils down to renaming nfp_nsp_eth.h and moving some functions from nfp.h there. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_main.c | 2 +- drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +- drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 2 +- drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 2 +- drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h | 12 ++-- drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 1 + .../nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h} | 19 ++- .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 2 +- 8 files changed, 22 insertions(+), 20 deletions(-) rename drivers/net/ethernet/netronome/nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h} (89%) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c index 96266796fd09..bea2a1a6c211 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c @@ -48,7 +48,7 @@ #include "nfpcore/nfp.h" #include "nfpcore/nfp_cpp.h" #include "nfpcore/nfp_nffw.h" -#include "nfpcore/nfp_nsp_eth.h" +#include "nfpcore/nfp_nsp.h" #include "nfpcore/nfp6000_pcie.h" diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 8664815f45ce..e2197160e4dc 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -67,7 +67,7 @@ #include #include -#include "nfpcore/nfp_nsp_eth.h" +#include "nfpcore/nfp_nsp.h" #include "nfp_net_ctrl.h" #include "nfp_net.h" diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index 3b2a09821a59..963d6dd97cec 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -49,7 +49,7 @@ #include #include "nfpcore/nfp.h" -#include "nfpcore/nfp_nsp_eth.h" +#include "nfpcore/nfp_nsp.h" #include "nfp_net_ctrl.h" #include "nfp_net.h" diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index 8e975c36877c..3e1f97e88710 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -52,7 +52,7 @@ #include "nfpcore/nfp.h" #include "nfpcore/nfp_cpp.h" #include "nfpcore/nfp_nffw.h" -#include "nfpcore/nfp_nsp_eth.h" +#include "nfpcore/nfp_nsp.h" #include "nfpcore/nfp6000_pcie.h" #include "nfp_net_ctrl.h" diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h index f7ca8e374923..778bd9424d5d 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h @@ -48,18 +48,10 @@ const char *nfp_hwinfo_lookup(struct nfp_cpp *cpp, const char *lookup); -/* Implemented in nfp_nsp.c */ +/* Implemented in nfp_nsp.c, low level functions */ struct nfp_nsp; -struct firmware; - -struct nfp_nsp *nfp_nsp_open(struct nfp_cpp *cpp); -void nfp_nsp_close(struct nfp_nsp *state); -u16 nfp_nsp_get_abi_ver_major(struct nfp_nsp *state); -u16 nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state); -int nfp_nsp_wait(struct nfp_nsp *state); -int nfp_nsp_device_soft_reset(struct nfp_nsp *state); -int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw); + int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size); int nfp_nsp_write_eth_table(struct nfp_nsp *state, const void *buf, unsigned int size); diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c index 17822ae4a17f..6482831282b2 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c +++ 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c @@ -49,6 +49,7 @@ #include "nfp.h" #include "nfp_cpp.h" +#include "nfp_nsp.h" /* Offsets relative to the CSR base */ #define NSP_STATUS 0x00 diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h similarity index 89% rename from drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h rename to drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h index 57eb3cfa6a0a..e3baec32 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h @@ -31,12 +31,24 @@ * SOFTWARE. */ -#ifndef NSP_NSP_ETH_H -#define NSP_NSP_ETH_H 1 +#ifndef NSP_NSP_H +#define NSP_NSP_H 1 #include #include +struct firmware; +struct nfp_cpp; +struct nfp_nsp; + +struct nfp_nsp *nfp_nsp_open(struct
[PATCH net-next 07/14] nfp: report auto-negotiation in ethtool
NSP ABI version 0.17 is exposing the autonegotiation settings. Report whether autoneg is on via ethtool. Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 4 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 2 ++ drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h | 11 +++ 3 files changed, 17 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index 0fdc14e7b576..563ced3c99e1 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -203,6 +203,10 @@ nfp_net_get_link_ksettings(struct net_device *netdev, cmd->base.speed = SPEED_UNKNOWN; cmd->base.duplex = DUPLEX_UNKNOWN; + if (nn->eth_port) + cmd->base.autoneg = nn->eth_port->aneg != NFP_ANEG_DISABLED ? + AUTONEG_ENABLE : AUTONEG_DISABLE; + if (!netif_carrier_ok(netdev)) return 0; diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c index 932772fbd27e..dcb1bc81e554 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c @@ -63,6 +63,7 @@ #define NSP_ETH_STATE_RX_ENABLED BIT_ULL(3) #define NSP_ETH_STATE_RATE GENMASK_ULL(11, 8) #define NSP_ETH_STATE_OVRD_CHNGBIT_ULL(22) +#define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23) #define NSP_ETH_CTRL_ENABLED BIT_ULL(1) #define NSP_ETH_CTRL_TX_ENABLEDBIT_ULL(2) @@ -142,6 +143,7 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const struct eth_table_entry *src, return; dst->override_changed = FIELD_GET(NSP_ETH_STATE_OVRD_CHNG, state); + dst->aneg = FIELD_GET(NSP_ETH_STATE_ANEG, state); } static void diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h index 6838741fadd7..6b3e954e70b3 100644 --- 
a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h @@ -37,6 +37,14 @@ #include #include +enum nfp_eth_aneg { + NFP_ANEG_AUTO = 0, + NFP_ANEG_SEARCH, + NFP_ANEG_25G_CONSORTIUM, + NFP_ANEG_25G_IEEE, + NFP_ANEG_DISABLED, +}; + /** * struct nfp_eth_table - ETH table information * @count: number of table entries @@ -48,6 +56,7 @@ * @base: first channel index (within NBI) * @lanes: number of channels * @speed: interface speed (in Mbps) + * @aneg: auto negotiation mode * @mac_addr: interface MAC address * @label_port:port id * @label_subport: id of interface within port (for split ports) @@ -68,6 +77,8 @@ struct nfp_eth_table { unsigned int lanes; unsigned int speed; + enum nfp_eth_aneg aneg; + u8 mac_addr[ETH_ALEN]; u8 label_port; -- 2.11.0
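The ethtool report collapses the five NSP auto-negotiation modes into on/off: anything except NFP_ANEG_DISABLED counts as enabled. Here is a small userspace sketch of the field decode and that mapping. The enum is copied from the patch and AUTONEG_* values are as in <linux/ethtool.h>. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum nfp_eth_aneg {
	NFP_ANEG_AUTO = 0,
	NFP_ANEG_SEARCH,
	NFP_ANEG_25G_CONSORTIUM,
	NFP_ANEG_25G_IEEE,
	NFP_ANEG_DISABLED,
};

#define AUTONEG_DISABLE	0x00
#define AUTONEG_ENABLE	0x01

#define NSP_ETH_STATE_ANEG_SHIFT 23
#define NSP_ETH_STATE_ANEG (0x7ULL << NSP_ETH_STATE_ANEG_SHIFT) /* GENMASK_ULL(25, 23) */

/* Decode bits 25..23 of the state word. */
static enum nfp_eth_aneg state_to_aneg(uint64_t state)
{
	return (enum nfp_eth_aneg)((state & NSP_ETH_STATE_ANEG) >>
				   NSP_ETH_STATE_ANEG_SHIFT);
}

/* Every mode other than DISABLED is reported as autoneg on. */
static int aneg_to_ethtool(enum nfp_eth_aneg aneg)
{
	return aneg != NFP_ANEG_DISABLED ? AUTONEG_ENABLE : AUTONEG_DISABLE;
}
```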
[PATCH net] nfp: fix potential use after free on xdp prog
We should unregister the net_device first, before we give back our reference on xdp_prog. Otherwise xdp_prog may be freed before .ndo_stop() disabled the datapath. Found by code inspection. Fixes: ecd63a0217d5 ("nfp: add XDP support in the driver") Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman --- Just a heads up - this will cause a merge conflict since nn->netdev member got moved to nn->dp.netdev in net-next. drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 9179a99563af..a41377e26c07 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -3275,9 +3275,10 @@ void nfp_net_netdev_clean(struct net_device *netdev) { struct nfp_net *nn = netdev_priv(netdev); + unregister_netdev(nn->netdev); + if (nn->xdp_prog) bpf_prog_put(nn->xdp_prog); if (nn->bpf_offload_xdp) nfp_net_xdp_offload(nn, NULL); - unregister_netdev(nn->netdev); } -- 2.11.0
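The fix is purely about ordering: stop the last user of an object before dropping the reference that keeps it alive. The hypothetical userspace model below stands in struct prog for struct bpf_prog and dev_unregister() for unregister_netdev(). The assert documents the invariant the patch restores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted program, standing in for struct bpf_prog. */
struct prog {
	int refcnt;
};

struct dev {
	struct prog *xdp_prog;
	bool running;		/* datapath may still dereference xdp_prog */
};

/* Drop one reference; frees the program when the count reaches zero. */
static void prog_put(struct prog *p)
{
	if (--p->refcnt == 0)
		free(p);
}

/* Stand-in for unregister_netdev(), which ends up calling .ndo_stop(). */
static void dev_unregister(struct dev *d)
{
	d->running = false;
}

/* Order from the patch: quiesce the datapath, then release the prog. */
static void dev_clean(struct dev *d)
{
	dev_unregister(d);

	if (d->xdp_prog) {
		assert(!d->running);	/* no one can still be using it */
		prog_put(d->xdp_prog);
		d->xdp_prog = NULL;
	}
}
```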
RE: [PATCH] i40e: limit client interface to X722 hardware
> -----Original Message----- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On > Behalf Of Stefan Assmann > Sent: Tuesday, April 04, 2017 12:52 PM > To: Or Gerlitz > Cc: intel-wired-...@lists.osuosl.org; Linux Netdev List > ; David Miller ; Kirsher, > Jeffrey T > Subject: Re: [PATCH] i40e: limit client interface to X722 hardware > > On 04.04.2017 18:56, Or Gerlitz wrote: > > On Tue, Apr 4, 2017 at 5:34 PM, Stefan Assmann wrote: > >> The client interface is meant for X722 iWARP support. Modprobing i40iw > >> on systems with X710/XL710 NICs currently may crash the system. > > > > just curious may or crash? and why? > > The backtrace I got was not really conclusive. The code is not meant to > be run on that hardware so I didn't bother to dig deeper. > > Stefan The i40iw module can't easily determine which hardware it's loaded on, so it assumes that we (i40e, that is) have handed it a handle to valid hardware. When the interface is opened, it starts reading and writing registers that don't exist on X710/XL710. -Mitch
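Since i40iw trusts whatever handle i40e gives it, the hardware check has to live on the i40e side, before the client interface is offered at all. The sketch below is a hypothetical illustration of such a gate. The enum, struct and client_register() are invented for the example and are not the i40e API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical MAC types; names are illustrative only. */
enum mac_type { MAC_XL710, MAC_X722 };

struct pf {
	enum mac_type mac_type;
	bool client_registered;
};

/* Offer the iWARP client interface only on hardware that has the
 * registers the client will touch; everything else is refused up front. */
static int client_register(struct pf *pf)
{
	if (pf->mac_type != MAC_X722)
		return -EOPNOTSUPP;

	pf->client_registered = true;
	return 0;
}
```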
Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage
On Sat, Apr 1, 2017 at 9:28 PM, Mike Galbraith wrote: > Greetings network wizards, > > Quoting kernel/sched/core.c: > /** > * yield - yield the current processor to other threads. > * > * Do not ever use this function, there's a 99% chance you're doing it wrong. > * > * The scheduler is at all times free to pick the calling task as the most > * eligible task to run, if removing the yield() call from your code breaks > * it, its already broken. > * > * Typical broken usage is: > * > * while (!event) > * yield(); > * > * where one assumes that yield() will let 'the other' process run that will > * make event true. If the current task is a SCHED_FIFO task that will never > * happen. Never use yield() as a progress guarantee!! > * > * If you want to use yield() to wait for something, use wait_event(). > * If you want to use yield() to be 'nice' for others, use cond_resched(). > * If you still want to use yield(), do not! > */ > > Livelock can be triggered by setting kworkers to SCHED_FIFO, then > suspend/resume.. you come back from sleepy-land with a spinning > kworker. For whatever reason, I can only do that with an enterprise > like config, my standard config refuses to play, but no matter, it's > "Typical broken usage". > > (yield() should be rendered dead) Thanks for the report! A quick solution here looks like replacing this yield() with cond_resched(); really waiting for all qdiscs to transmit all their packets is harder.
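The quoted comment recommends wait_event() over a `while (!event) yield();` loop. The closest userspace analogue is a condition variable. The waiter sleeps until signalled, so progress no longer depends on the scheduler choosing the other thread. This is only an illustration of the principle, assuming POSIX threads; it is not the qdisc fix, which the reply suggests should be cond_resched() for now.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

#define EVENT_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Producer side: set the condition under the lock, then wake waiters. */
static void event_signal(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->done = true;
	pthread_cond_broadcast(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}

/* Waiter side: sleeps instead of spinning on sched_yield(). The loop
 * re-checks the condition to handle spurious wakeups. */
static void event_wait(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	while (!ev->done)
		pthread_cond_wait(&ev->cond, &ev->lock);
	pthread_mutex_unlock(&ev->lock);
}

static void *worker(void *arg)
{
	event_signal(arg);
	return NULL;
}
```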
Re: [PATCH] ebpf: verify the output of the JIT
On Tue, Apr 4, 2017 at 3:08 PM, Tycho Andersen wrote: > The goal of this patch is to protect the JIT against an attacker with a > write-in-memory primitive. The JIT allocates a buffer which will eventually > be marked +x, so we need to make sure that what was written to this buffer > is what was intended. > > We achieve this by building a hash of the instruction buffer as > instructions are emitted and then comparing that to a hash at the end of > the JIT compile after the buffer has been marked read-only. > > Signed-off-by: Tycho Andersen > CC: Daniel Borkmann > CC: Alexei Starovoitov > CC: Kees Cook > CC: Mickaël Salaün Cool! This closes the race condition on producing the JIT vs going read-only. I wonder if it might be possible to make this a more generic interface to the BPF core, which would allocate the hash, provide the update callback during emit, and then do the hash check itself at the end of bpf_jit_binary_lock_ro()? -Kees > --- > arch/x86/Kconfig | 11 > arch/x86/net/bpf_jit_comp.c | 147 > > 2 files changed, 147 insertions(+), 11 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index cc98d5a..7b2db2c 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2789,6 +2789,17 @@ config X86_DMA_REMAP > > source "net/Kconfig" > > +config EBPF_JIT_HASH_OUTPUT > + def_bool y > + depends on HAVE_EBPF_JIT > + depends on BPF_JIT > + select CRYPTO_SHA256 > + ---help--- > + Enables a double check of the JIT's output after it is marked > read-only to > + ensure that it matches what the JIT generated. > + > + Note, only applies when /proc/sys/net/core/bpf_jit_harden > 0.
> + > source "drivers/Kconfig" > > source "drivers/firmware/Kconfig" > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c > index 32322ce..be1271e 100644 > --- a/arch/x86/net/bpf_jit_comp.c > +++ b/arch/x86/net/bpf_jit_comp.c > @@ -13,9 +13,15 @@ > #include > #include > #include > +#include > +#include > > int bpf_jit_enable __read_mostly; > > +#ifdef CONFIG_EBPF_JIT_HASH_OUTPUT > +struct crypto_shash *tfm __read_mostly; > +#endif > + > /* > * assembly code in arch/x86/net/bpf_jit.S > */ > @@ -25,7 +31,8 @@ extern u8 sk_load_byte_positive_offset[]; > extern u8 sk_load_word_negative_offset[], sk_load_half_negative_offset[]; > extern u8 sk_load_byte_negative_offset[]; > > -static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len) > +static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len, > +struct shash_desc *hash) > { > if (len == 1) > *ptr = bytes; > @@ -35,11 +42,15 @@ static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len) > *(u32 *)ptr = bytes; > barrier(); > } > + > + if (IS_ENABLED(CONFIG_EBPF_JIT_HASH_OUTPUT) && hash) > + crypto_shash_update(hash, (u8 *) &bytes, len); > + > return ptr + len; > } > > #define EMIT(bytes, len) \ > - do { prog = emit_code(prog, bytes, len); cnt += len; } while (0) > + do { prog = emit_code(prog, bytes, len, hash); cnt += len; } while (0) > > #define EMIT1(b1) EMIT(b1, 1) > #define EMIT2(b1, b2) EMIT((b1) + ((b2) << 8), 2) > @@ -206,7 +217,7 @@ struct jit_context { > /* emit x64 prologue code for BPF program and check it's size. 
> * bpf_tail_call helper will skip it while jumping into another program > */ > -static void emit_prologue(u8 **pprog) > +static void emit_prologue(u8 **pprog, struct shash_desc *hash) > { > u8 *prog = *pprog; > int cnt = 0; > @@ -264,7 +275,7 @@ static void emit_prologue(u8 **pprog) > * goto *(prog->bpf_func + prologue_size); > * out: > */ > -static void emit_bpf_tail_call(u8 **pprog) > +static void emit_bpf_tail_call(u8 **pprog, struct shash_desc *hash) > { > u8 *prog = *pprog; > int label1, label2, label3; > @@ -328,7 +339,7 @@ static void emit_bpf_tail_call(u8 **pprog) > } > > > -static void emit_load_skb_data_hlen(u8 **pprog) > +static void emit_load_skb_data_hlen(u8 **pprog, struct shash_desc *hash) > { > u8 *prog = *pprog; > int cnt = 0; > @@ -348,7 +359,8 @@ static void emit_load_skb_data_hlen(u8 **pprog) > } > > static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, > - int oldproglen, struct jit_context *ctx) > + int oldproglen, struct jit_context *ctx, > + struct shash_desc *hash) > { > struct bpf_insn *insn = bpf_prog->insnsi; > int insn_cnt = bpf_prog->len; > @@ -360,10 +372,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int > *addrs, u8 *image, > int proglen = 0; > u8 *prog = temp; > > - emit_prologue(&prog); > + emit_prologue(&prog, hash); > > if (seen_ld_abs) > - emit_load_skb_data_hlen(&prog); > + emit_load_skb_dat
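The idea in Tycho's patch (hash every byte as it is emitted, then re-hash the finished image once it is read-only and compare) can be illustrated in user space. This is a minimal sketch, not the patch's code: FNV-1a, a non-cryptographic hash, stands in for the kernel's SHA-256 `crypto_shash`, and `struct jit_buf`, `emit()` and `verify_image()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, used only as a stand-in for SHA-256 (NOT collision resistant). */
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

static uint64_t fnv1a_update(uint64_t h, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

/* Hypothetical JIT output: the image plus a running hash of what we meant to write. */
struct jit_buf {
	uint8_t image[64];
	size_t len;
	uint64_t hash;
};

static void jit_init(struct jit_buf *b)
{
	memset(b, 0, sizeof(*b));
	b->hash = FNV_OFFSET;
}

/* Emit 1, 2 or 4 bytes little-endian; hash the intended bytes, not the buffer. */
static void emit(struct jit_buf *b, uint32_t bytes, unsigned int len)
{
	uint8_t tmp[4];

	assert(len == 1 || len == 2 || len == 4);
	assert(b->len + len <= sizeof(b->image));
	for (unsigned int i = 0; i < len; i++)
		tmp[i] = (uint8_t)(bytes >> (8 * i));
	memcpy(b->image + b->len, tmp, len);
	b->hash = fnv1a_update(b->hash, tmp, len);
	b->len += len;
}

/* Once the image is "read-only", re-hash it and compare with the running hash. */
static bool verify_image(const struct jit_buf *b)
{
	return fnv1a_update(FNV_OFFSET, b->image, b->len) == b->hash;
}
```

Any single modified byte in the image is detected here because each FNV-1a step is a bijection on the running state; against a deliberate attacker only a cryptographic hash, as the patch uses, prevents crafting a colliding image.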
[PATCH net-next 04/12] bnxt_en: Add ethtool get_wol method.
Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 16
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 6903a87..2b94704 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1,6 +1,7 @@
 /* Broadcom NetXtreme-C/E network driver.
  *
  * Copyright (c) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2016-2017 Broadcom Limited
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -832,6 +833,20 @@ static void bnxt_get_drvinfo(struct net_device *dev,
 	kfree(pkglog);
 }
 
+static void bnxt_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+	struct bnxt *bp = netdev_priv(dev);
+
+	wol->supported = 0;
+	wol->wolopts = 0;
+	memset(&wol->sopass, 0, sizeof(wol->sopass));
+	if (bp->flags & BNXT_FLAG_WOL_CAP) {
+		wol->supported = WAKE_MAGIC;
+		if (bp->wol)
+			wol->wolopts = WAKE_MAGIC;
+	}
+}
+
 u32 _bnxt_fw_to_ethtool_adv_spds(u16 fw_speeds, u8 fw_pause)
 {
 	u32 speed_mask = 0;
@@ -2134,6 +2149,7 @@ static int bnxt_set_phys_id(struct net_device *dev,
 	.get_pauseparam		= bnxt_get_pauseparam,
 	.set_pauseparam		= bnxt_set_pauseparam,
 	.get_drvinfo		= bnxt_get_drvinfo,
+	.get_wol		= bnxt_get_wol,
 	.get_coalesce		= bnxt_get_coalesce,
 	.set_coalesce		= bnxt_set_coalesce,
 	.get_msglevel		= bnxt_get_msglevel,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h
index ed1e555..2762171 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h
@@ -1,6 +1,7 @@
 /* Broadcom NetXtreme-C/E network driver.
  *
  * Copyright (c) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2016-2017 Broadcom Limited
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
-- 
1.8.3.1
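`bnxt_get_wol()` reports Magic Packet as the only supported wake mode, and only when firmware advertised the capability. A user-space sketch of the same decision follows; `WAKE_MAGIC` is copied from `<linux/ethtool.h>` (it is the mode ethtool prints as `g`), while `FLAG_WOL_CAP`, `struct wol_report` and `get_wol()` are hypothetical local stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAKE_MAGIC   (1u << 5)   /* same bit as in <linux/ethtool.h> */
#define FLAG_WOL_CAP 0x4000u     /* mirrors BNXT_FLAG_WOL_CAP from the patch */

struct wol_report {
	uint32_t supported;   /* modes the device can wake on */
	uint32_t wolopts;     /* modes currently enabled */
};

/* Report Magic Packet only if capable; "enabled" reflects the installed filter. */
static struct wol_report get_wol(uint32_t dev_flags, bool wol_enabled)
{
	struct wol_report r = { 0, 0 };

	if (dev_flags & FLAG_WOL_CAP) {
		r.supported = WAKE_MAGIC;
		if (wol_enabled)
			r.wolopts = WAKE_MAGIC;
	}
	return r;
}
```

Zeroing both fields first, as the driver does, means a device without the capability reports no wake modes at all rather than stale values.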
[PATCH net-next 05/12] bnxt_en: Add ethtool set_wol method.
And add functions to set and free magic packet filter. Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 32 +++ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 ++ drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 26 ++ 3 files changed, 60 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 10a9cda..e432d0a 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -5842,6 +5842,38 @@ static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp) return 0; } +int bnxt_hwrm_alloc_wol_fltr(struct bnxt *bp) +{ + struct hwrm_wol_filter_alloc_input req = {0}; + struct hwrm_wol_filter_alloc_output *resp = bp->hwrm_cmd_resp_addr; + int rc; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_WOL_FILTER_ALLOC, -1, -1); + req.port_id = cpu_to_le16(bp->pf.port_id); + req.wol_type = WOL_FILTER_ALLOC_REQ_WOL_TYPE_MAGICPKT; + req.enables = cpu_to_le32(WOL_FILTER_ALLOC_REQ_ENABLES_MAC_ADDRESS); + memcpy(req.mac_address, bp->dev->dev_addr, ETH_ALEN); + mutex_lock(&bp->hwrm_cmd_lock); + rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + if (!rc) + bp->wol_filter_id = resp->wol_filter_id; + mutex_unlock(&bp->hwrm_cmd_lock); + return rc; +} + +int bnxt_hwrm_free_wol_fltr(struct bnxt *bp) +{ + struct hwrm_wol_filter_free_input req = {0}; + int rc; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_WOL_FILTER_FREE, -1, -1); + req.port_id = cpu_to_le16(bp->pf.port_id); + req.enables = cpu_to_le32(WOL_FILTER_FREE_REQ_ENABLES_WOL_FILTER_ID); + req.wol_filter_id = bp->wol_filter_id; + rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + return rc; +} + static u16 bnxt_hwrm_get_wol_fltrs(struct bnxt *bp, u16 handle) { struct hwrm_wol_filter_qcfg_input req = {0}; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 02de812..aba25ba 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -1242,6 +1242,8 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap, void bnxt_tx_enable(struct bnxt *bp); int bnxt_hwrm_set_pause(struct bnxt *); int bnxt_hwrm_set_link_setting(struct bnxt *, bool, bool); +int bnxt_hwrm_alloc_wol_fltr(struct bnxt *bp); +int bnxt_hwrm_free_wol_fltr(struct bnxt *bp); int bnxt_hwrm_fw_set_time(struct bnxt *); int bnxt_open_nic(struct bnxt *, bool, bool); int bnxt_close_nic(struct bnxt *, bool, bool); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 2b94704..84cd4ca 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -847,6 +847,31 @@ static void bnxt_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) } } +static int bnxt_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) +{ + struct bnxt *bp = netdev_priv(dev); + + if (wol->wolopts & ~WAKE_MAGIC) + return -EINVAL; + + if (wol->wolopts & WAKE_MAGIC) { + if (!(bp->flags & BNXT_FLAG_WOL_CAP)) + return -EINVAL; + if (!bp->wol) { + if (bnxt_hwrm_alloc_wol_fltr(bp)) + return -EBUSY; + bp->wol = 1; + } + } else { + if (bp->wol) { + if (bnxt_hwrm_free_wol_fltr(bp)) + return -EBUSY; + bp->wol = 0; + } + } + return 0; +} + u32 _bnxt_fw_to_ethtool_adv_spds(u16 fw_speeds, u8 fw_pause) { u32 speed_mask = 0; @@ -2150,6 +2175,7 @@ static int bnxt_set_phys_id(struct net_device *dev, .set_pauseparam = bnxt_set_pauseparam, .get_drvinfo= bnxt_get_drvinfo, .get_wol= bnxt_get_wol, + .set_wol= bnxt_set_wol, .get_coalesce = bnxt_get_coalesce, .set_coalesce = bnxt_set_coalesce, .get_msglevel = bnxt_get_msglevel, -- 1.8.3.1
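`bnxt_set_wol()` is a small state machine: reject any mode other than Magic Packet, allocate a firmware filter on the off-to-on transition, free it on on-to-off, and do nothing when the state does not change. The sketch below reproduces that logic against a mock firmware; `struct nic`, `fw_alloc_filter()` and `fw_free_filter()` are hypothetical stand-ins for the driver state and the `HWRM_WOL_FILTER_ALLOC`/`HWRM_WOL_FILTER_FREE` requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define WAKE_MAGIC (1u << 5)   /* same bit as in <linux/ethtool.h> */

struct nic {
	bool wol_cap;   /* firmware reported Magic Packet support */
	bool wol;       /* a Magic Packet filter is installed */
	bool fw_fail;   /* make the mock firmware requests fail */
	int fw_calls;   /* number of firmware requests issued */
};

/* Mock firmware requests: 0 on success, non-zero on failure. */
static int fw_alloc_filter(struct nic *n) { n->fw_calls++; return n->fw_fail; }
static int fw_free_filter(struct nic *n)  { n->fw_calls++; return n->fw_fail; }

static int set_wol(struct nic *n, uint32_t wolopts)
{
	if (wolopts & ~WAKE_MAGIC)
		return -EINVAL;              /* only Magic Packet is supported */

	if (wolopts & WAKE_MAGIC) {
		if (!n->wol_cap)
			return -EINVAL;
		if (!n->wol) {               /* off -> on */
			if (fw_alloc_filter(n))
				return -EBUSY;
			n->wol = true;
		}
	} else if (n->wol) {                 /* on -> off */
		if (fw_free_filter(n))
			return -EBUSY;
		n->wol = false;
	}
	return 0;
}
```

Because the state only changes after the firmware call succeeds, a failed request leaves `wol` consistent with what is actually programmed, and repeating `ethtool -s <dev> wol g` issues no extra firmware command.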
[PATCH net-next 01/12] bnxt_en: Update firmware interface spec to 1.7.6.2.
Features added include WoL and selftest. Signed-off-by: Deepak Khungar Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 325 +--- drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 8 +- drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 1 + 3 files changed, 297 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h index 6e275c2..7dc71bb 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h @@ -11,19 +11,21 @@ #ifndef BNXT_HSI_H #define BNXT_HSI_H -/* HSI and HWRM Specification 1.7.0 */ +/* HSI and HWRM Specification 1.7.6 */ #define HWRM_VERSION_MAJOR 1 #define HWRM_VERSION_MINOR 7 -#define HWRM_VERSION_UPDATE0 +#define HWRM_VERSION_UPDATE6 -#define HWRM_VERSION_STR "1.7.0" +#define HWRM_VERSION_RSVD 2 /* non-zero means beta version */ + +#define HWRM_VERSION_STR "1.7.6.2" /* * Following is the signature for HWRM message field that indicates not * applicable (All F's). Need to cast it the size of the field if needed. */ #define HWRM_NA_SIGNATURE ((__le32)(-1)) #define HWRM_MAX_REQ_LEN(128) /* hwrm_func_buf_rgtr */ -#define HWRM_MAX_RESP_LEN(176) /* hwrm_func_qstats */ +#define HWRM_MAX_RESP_LEN(248) /* hwrm_selftest_qlist */ #define HW_HASH_INDEX_SIZE 0x80/* 7 bit indirection table index. 
*/ #define HW_HASH_KEY_SIZE 40 #define HWRM_RESP_VALID_KEY 1 /* valid key for HWRM response */ @@ -571,9 +573,10 @@ struct hwrm_ver_get_output { __le16 max_req_win_len; __le16 max_resp_len; __le16 def_req_timeout; + u8 init_pending; + #define VER_GET_RESP_INIT_PENDING_DEV_NOT_RDY 0x1UL u8 unused_0; u8 unused_1; - u8 unused_2; u8 valid; }; @@ -809,6 +812,8 @@ struct hwrm_func_qcfg_output { #define FUNC_QCFG_RESP_FLAGS_OOB_WOL_BMP_ENABLED0x2UL #define FUNC_QCFG_RESP_FLAGS_FW_DCBX_AGENT_ENABLED 0x4UL #define FUNC_QCFG_RESP_FLAGS_STD_TX_RING_MODE_ENABLED 0x8UL + #define FUNC_QCFG_RESP_FLAGS_FW_LLDP_AGENT_ENABLED 0x10UL + #define FUNC_QCFG_RESP_FLAGS_MULTI_HOST 0x20UL u8 mac_address[6]; __le16 pci_id; __le16 alloc_rsscos_ctx; @@ -827,10 +832,12 @@ struct hwrm_func_qcfg_output { #define FUNC_QCFG_RESP_PORT_PARTITION_TYPE_NPAR1_5 0x3UL #define FUNC_QCFG_RESP_PORT_PARTITION_TYPE_NPAR2_0 0x4UL #define FUNC_QCFG_RESP_PORT_PARTITION_TYPE_UNKNOWN 0xffUL - u8 unused_0; + u8 port_pf_cnt; + #define FUNC_QCFG_RESP_PORT_PF_CNT_UNAVAIL 0x0UL __le16 dflt_vnic_id; - u8 unused_1; - u8 unused_2; + u8 host_cnt; + #define FUNC_QCFG_RESP_HOST_CNT_UNAVAIL0x0UL + u8 unused_0; __le32 min_bw; #define FUNC_QCFG_RESP_MIN_BW_BW_VALUE_MASK 0xfffUL #define FUNC_QCFG_RESP_MIN_BW_BW_VALUE_SFT 0 @@ -867,12 +874,12 @@ struct hwrm_func_qcfg_output { #define FUNC_QCFG_RESP_EVB_MODE_NO_EVB 0x0UL #define FUNC_QCFG_RESP_EVB_MODE_VEB0x1UL #define FUNC_QCFG_RESP_EVB_MODE_VEPA 0x2UL - u8 unused_3; + u8 unused_1; __le16 alloc_vfs; __le32 alloc_mcast_filters; __le32 alloc_hw_ring_grps; __le16 alloc_sp_tx_rings; - u8 unused_4; + u8 unused_2; u8 valid; }; @@ -888,16 +895,13 @@ struct hwrm_func_cfg_input { u8 unused_0; u8 unused_1; __le32 flags; - #define FUNC_CFG_REQ_FLAGS_PROM_MODE0x1UL - #define FUNC_CFG_REQ_FLAGS_SRC_MAC_ADDR_CHECK 0x2UL - #define FUNC_CFG_REQ_FLAGS_SRC_IP_ADDR_CHECK0x4UL - #define FUNC_CFG_REQ_FLAGS_VLAN_PRI_MATCH 0x8UL - #define FUNC_CFG_REQ_FLAGS_DFLT_PRI_NOMATCH 0x10UL - #define 
FUNC_CFG_REQ_FLAGS_DISABLE_PAUSE0x20UL - #define FUNC_CFG_REQ_FLAGS_DISABLE_STP 0x40UL - #define FUNC_CFG_REQ_FLAGS_DISABLE_LLDP 0x80UL - #define FUNC_CFG_REQ_FLAGS_DISABLE_PTPV20x100UL - #define FUNC_CFG_REQ_FLAGS_STD_TX_RING_MODE 0x200UL + #define FUNC_CFG_REQ_FLAGS_SRC_MAC_ADDR_CHECK_DISABLE 0x1UL + #define FUNC_CFG_REQ_FLAGS_SRC_MAC_ADDR_CHECK_ENABLE 0x2UL + #define FUNC_CFG_REQ_FLAGS_RSVD_MASK0x1fcUL + #define FUNC_CFG_REQ_FLAGS_RSVD_SFT 2 + #define FUNC_CFG_REQ_FLAGS_STD_TX_RING_MODE_ENABLE 0x200UL + #define FUNC_CFG_REQ_FLAGS_STD_TX_RING_MODE_DISABLE 0x400UL + #defi
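The interface bump now describes the spec version with four components: the major/minor/update macros plus the new `HWRM_VERSION_RSVD`, where non-zero marks a beta, all mirrored in `HWRM_VERSION_STR`. One common way to compare such versions is to pack the components into a single integer. The sketch assumes a one-byte-per-component layout, which is an illustration and not something this patch defines; `hwrm_ver_pack()` and `hwrm_ver_str()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HWRM_VERSION_MAJOR  1
#define HWRM_VERSION_MINOR  7
#define HWRM_VERSION_UPDATE 6
#define HWRM_VERSION_RSVD   2   /* non-zero means beta version */

/* Hypothetical packing: one byte per component, most significant first. */
static uint32_t hwrm_ver_pack(uint8_t maj, uint8_t min, uint8_t upd, uint8_t rsvd)
{
	return (uint32_t)maj << 24 | (uint32_t)min << 16 |
	       (uint32_t)upd << 8 | rsvd;
}

/* Render a packed value in the same dotted form as HWRM_VERSION_STR. */
static void hwrm_ver_str(uint32_t v, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u.%u", (unsigned int)(v >> 24),
		 (unsigned int)((v >> 16) & 0xff),
		 (unsigned int)((v >> 8) & 0xff), (unsigned int)(v & 0xff));
}
```

With this layout, plain integer comparison orders versions correctly, so a check such as "firmware is at least 1.7.6" becomes a single `>=`.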
[PATCH net-next 10/12] bnxt_en: Add interrupt test to ethtool -t selftest.
Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 32 ++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index dde3e21..848ecf2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2178,6 +2178,29 @@ static int bnxt_set_phys_id(struct net_device *dev,
 	return rc;
 }
 
+static int bnxt_hwrm_selftest_irq(struct bnxt *bp, u16 cmpl_ring)
+{
+	struct hwrm_selftest_irq_input req = {0};
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_SELFTEST_IRQ, cmpl_ring, -1);
+	return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
+static int bnxt_test_irq(struct bnxt *bp)
+{
+	int i;
+
+	for (i = 0; i < bp->cp_nr_rings; i++) {
+		u16 cmpl_ring = bp->grp_info[i].cp_fw_ring_id;
+		int rc;
+
+		rc = bnxt_hwrm_selftest_irq(bp, cmpl_ring);
+		if (rc)
+			return rc;
+	}
+	return 0;
+}
+
 static int bnxt_hwrm_mac_loopback(struct bnxt *bp, bool enable)
 {
 	struct hwrm_port_mac_cfg_input req = {0};
@@ -2366,9 +2389,10 @@ static int bnxt_run_fw_tests(struct bnxt *bp, u8 test_mask, u8 *test_results)
 	return rc;
 }
 
-#define BNXT_DRV_TESTS			2
+#define BNXT_DRV_TESTS			3
 #define BNXT_MACLPBK_TEST_IDX		(bp->num_tests - BNXT_DRV_TESTS)
 #define BNXT_PHYLPBK_TEST_IDX		(BNXT_MACLPBK_TEST_IDX + 1)
+#define BNXT_IRQ_TEST_IDX		(BNXT_MACLPBK_TEST_IDX + 2)
 
 static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest,
 			   u64 *buf)
@@ -2437,6 +2461,10 @@ static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest,
 		bnxt_half_close_nic(bp);
 		bnxt_open_nic(bp, false, true);
 	}
+	if (bnxt_test_irq(bp)) {
+		buf[BNXT_IRQ_TEST_IDX] = 1;
+		etest->flags |= ETH_TEST_FL_FAILED;
+	}
 	for (i = 0; i < bp->num_tests - BNXT_DRV_TESTS; i++) {
 		u8 bit_val = 1 << i;
 
@@ -2484,6 +2512,8 @@ void bnxt_ethtool_init(struct bnxt *bp)
 			strcpy(str, "Mac loopback test (offline)");
 		} else if (i == BNXT_PHYLPBK_TEST_IDX) {
 			strcpy(str, "Phy loopback test (offline)");
+		} else if (i == BNXT_IRQ_TEST_IDX) {
+			strcpy(str, "Interrupt_test (offline)");
 		} else {
 			strlcpy(str, fw_str, ETH_GSTRING_LEN);
 			strncat(str, " test", ETH_GSTRING_LEN - strlen(str));
-- 
1.8.3.1
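The result array puts the firmware tests first and the driver tests (MAC loopback, PHY loopback, interrupt) at the end, so every driver-test index is derived from `num_tests - BNXT_DRV_TESTS`. A sketch of that layout plus the per-ring loop of `bnxt_test_irq()`, where the hypothetical callback `ring_irq_ok()` replaces the `HWRM_SELFTEST_IRQ` request and `-1` replaces the firmware error code:

```c
#include <assert.h>
#include <stdbool.h>

#define DRV_TESTS 3   /* MAC loopback, PHY loopback, interrupt (as in the patch) */

/* Firmware tests occupy [0, num_tests - DRV_TESTS); driver tests follow. */
static int maclpbk_idx(int num_tests) { return num_tests - DRV_TESTS; }
static int phylpbk_idx(int num_tests) { return maclpbk_idx(num_tests) + 1; }
static int irq_idx(int num_tests)     { return maclpbk_idx(num_tests) + 2; }

/*
 * Like bnxt_test_irq(): ask each completion ring to raise an interrupt and
 * stop at the first failure. ring_irq_ok() is a hypothetical callback.
 */
static int test_irq(int nr_rings, bool (*ring_irq_ok)(int ring), int *failed_ring)
{
	for (int i = 0; i < nr_rings; i++) {
		if (!ring_irq_ok(i)) {
			*failed_ring = i;
			return -1;
		}
	}
	return 0;
}

/* Example callbacks for the mock. */
static bool all_ok(int ring) { (void)ring; return true; }
static bool ring2_bad(int ring) { return ring != 2; }
```

With two firmware tests, for example, `num_tests` is 5 and the interrupt test lands at index 4, the last slot reported by `ethtool -t`.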
[PATCH net-next 09/12] bnxt_en: Add PHY loopback to ethtool self-test.
It is necessary to disable autoneg before enabling PHY loopback, otherwise link won't come up. Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 62 ++- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index ecb4417..dde3e21 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -2192,6 +2192,54 @@ static int bnxt_hwrm_mac_loopback(struct bnxt *bp, bool enable) return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); } +static int bnxt_disable_an_for_lpbk(struct bnxt *bp, + struct hwrm_port_phy_cfg_input *req) +{ + struct bnxt_link_info *link_info = &bp->link_info; + u16 fw_advertising = link_info->advertising; + u16 fw_speed; + int rc; + + if (!link_info->autoneg) + return 0; + + fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_1GB; + if (netif_carrier_ok(bp->dev)) + fw_speed = bp->link_info.link_speed; + else if (fw_advertising & BNXT_LINK_SPEED_MSK_10GB) + fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_10GB; + else if (fw_advertising & BNXT_LINK_SPEED_MSK_25GB) + fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_25GB; + else if (fw_advertising & BNXT_LINK_SPEED_MSK_40GB) + fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_40GB; + else if (fw_advertising & BNXT_LINK_SPEED_MSK_50GB) + fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_50GB; + + req->force_link_speed = cpu_to_le16(fw_speed); + req->flags |= cpu_to_le32(PORT_PHY_CFG_REQ_FLAGS_FORCE | + PORT_PHY_CFG_REQ_FLAGS_RESET_PHY); + rc = hwrm_send_message(bp, req, sizeof(*req), HWRM_CMD_TIMEOUT); + req->flags = 0; + req->force_link_speed = cpu_to_le16(0); + return rc; +} + +static int bnxt_hwrm_phy_loopback(struct bnxt *bp, bool enable) +{ + struct hwrm_port_phy_cfg_input req = {0}; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_PHY_CFG, -1, -1); + + if (enable) { + bnxt_disable_an_for_lpbk(bp, &req); + 
req.lpbk = PORT_PHY_CFG_REQ_LPBK_LOCAL; + } else { + req.lpbk = PORT_PHY_CFG_REQ_LPBK_NONE; + } + req.enables = cpu_to_le32(PORT_PHY_CFG_REQ_ENABLES_LPBK); + return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); +} + static int bnxt_rx_loopback(struct bnxt *bp, struct bnxt_napi *bnapi, u32 raw_cons, int pkt_size) { @@ -2318,8 +2366,9 @@ static int bnxt_run_fw_tests(struct bnxt *bp, u8 test_mask, u8 *test_results) return rc; } -#define BNXT_DRV_TESTS 1 +#define BNXT_DRV_TESTS 2 #define BNXT_MACLPBK_TEST_IDX (bp->num_tests - BNXT_DRV_TESTS) +#define BNXT_PHYLPBK_TEST_IDX (BNXT_MACLPBK_TEST_IDX + 1) static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest, u64 *buf) @@ -2377,8 +2426,15 @@ static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest, else buf[BNXT_MACLPBK_TEST_IDX] = 0; - bnxt_half_close_nic(bp); bnxt_hwrm_mac_loopback(bp, false); + bnxt_hwrm_phy_loopback(bp, true); + msleep(1000); + if (bnxt_run_loopback(bp)) { + buf[BNXT_PHYLPBK_TEST_IDX] = 1; + etest->flags |= ETH_TEST_FL_FAILED; + } + bnxt_hwrm_phy_loopback(bp, false); + bnxt_half_close_nic(bp); bnxt_open_nic(bp, false, true); } for (i = 0; i < bp->num_tests - BNXT_DRV_TESTS; i++) { @@ -2426,6 +2482,8 @@ void bnxt_ethtool_init(struct bnxt *bp) if (i == BNXT_MACLPBK_TEST_IDX) { strcpy(str, "Mac loopback test (offline)"); + } else if (i == BNXT_PHYLPBK_TEST_IDX) { + strcpy(str, "Phy loopback test (offline)"); } else { strlcpy(str, fw_str, ETH_GSTRING_LEN); strncat(str, " test", ETH_GSTRING_LEN - strlen(str)); -- 1.8.3.1
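Because autoneg must be off for PHY loopback, `bnxt_disable_an_for_lpbk()` picks a speed to force: the current link speed if the carrier is up, otherwise the first of 10G, 25G, 40G and 50G found in the advertised mask, with 1G as the fallback. The selection logic alone, sketched with hypothetical mask bits and speeds in Mb/s in place of the driver's `BNXT_LINK_SPEED_MSK_*` and `PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_*` encodings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical advertised-speed bits (the driver uses BNXT_LINK_SPEED_MSK_*). */
#define ADV_10G (1u << 0)
#define ADV_25G (1u << 1)
#define ADV_40G (1u << 2)
#define ADV_50G (1u << 3)

/* Speed (Mb/s) to force before enabling PHY loopback. */
static uint32_t lpbk_force_speed(bool carrier_ok, uint32_t cur_speed, uint32_t adv)
{
	if (carrier_ok)
		return cur_speed;   /* keep the speed the link already runs at */
	if (adv & ADV_10G)
		return 10000;
	if (adv & ADV_25G)
		return 25000;
	if (adv & ADV_40G)
		return 40000;
	if (adv & ADV_50G)
		return 50000;
	return 1000;                /* default, as in the patch */
}
```

The order of the `else if` chain means that, with no carrier, the lowest advertised speed among those four wins.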
[PATCH net-next 11/12] bnxt_en: Use short TX BDs for the XDP TX ring.
No offload is performed on the XDP_TX ring so we can use the short TX BDs. This has the effect of doubling the size of the XDP TX ring so that it now matches the size of the rx ring by default.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 8b27137..9dae327 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -23,7 +23,6 @@ void bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
 		   dma_addr_t mapping, u32 len, u16 rx_prod)
 {
 	struct bnxt_sw_tx_bd *tx_buf;
-	struct tx_bd_ext *txbd1;
 	struct tx_bd *txbd;
 	u32 flags;
 	u16 prod;
@@ -33,23 +32,13 @@ void bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
 	tx_buf->rx_prod = rx_prod;
 
 	txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
-	flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
-		(2 << TX_BD_FLAGS_BD_CNT_SHIFT) | TX_BD_FLAGS_COAL_NOW |
+	flags = (len << TX_BD_LEN_SHIFT) | (1 << TX_BD_FLAGS_BD_CNT_SHIFT) |
 		TX_BD_FLAGS_PACKET_END | bnxt_lhint_arr[len >> 9];
 	txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
 	txbd->tx_bd_opaque = prod;
 	txbd->tx_bd_haddr = cpu_to_le64(mapping);
 
 	prod = NEXT_TX(prod);
-	txbd1 = (struct tx_bd_ext *)
-		&txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
-
-	txbd1->tx_bd_hsize_lflags = cpu_to_le32(0);
-	txbd1->tx_bd_mss = cpu_to_le32(0);
-	txbd1->tx_bd_cfa_action = cpu_to_le32(0);
-	txbd1->tx_bd_cfa_meta = cpu_to_le32(0);
-
-	prod = NEXT_TX(prod);
 	txr->tx_prod = prod;
 }
 
@@ -66,7 +55,6 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 	for (i = 0; i < nr_pkts; i++) {
 		last_tx_cons = tx_cons;
 		tx_cons = NEXT_TX(tx_cons);
-		tx_cons = NEXT_TX(tx_cons);
 	}
 	txr->tx_cons = tx_cons;
 	if (bnxt_tx_avail(bp, txr) == bp->tx_ring_size) {
@@ -133,7 +121,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		return false;
 
 	case XDP_TX:
-		if (tx_avail < 2) {
+		if (tx_avail < 1) {
 			trace_xdp_exception(bp->dev, xdp_prog, act);
 			bnxt_reuse_rx_data(rxr, cons, page);
 			return true;
-- 
1.8.3.1
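The gain from short BDs is plain slot arithmetic: a long TX BD occupies two descriptor slots (`struct tx_bd` plus `struct tx_bd_ext`) and a short one occupies one, so the same ring holds twice as many XDP_TX packets, the `tx_avail` threshold drops from 2 to 1, and the completion loop advances `tx_cons` once per packet. A sketch, with 512 slots as an arbitrary example ring size and hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>

/* Descriptor slots consumed per XDP_TX packet before and after the patch. */
#define LONG_BD_SLOTS  2   /* struct tx_bd + struct tx_bd_ext */
#define SHORT_BD_SLOTS 1   /* struct tx_bd only */

/* How many packets fit in a ring of the given size. */
static unsigned int ring_pkts(unsigned int ring_slots, unsigned int slots_per_pkt)
{
	return ring_slots / slots_per_pkt;
}

/* Like the XDP_TX check: transmit only if one packet's worth of slots is free. */
static bool can_xmit(unsigned int tx_avail, unsigned int slots_per_pkt)
{
	return tx_avail >= slots_per_pkt;
}
```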
[PATCH net-next 12/12] bnxt_en: Cap the msix vector with the max completion rings.
The current code enables up to the maximum MSIX vectors in the PCIE config space without considering the max completion rings available. An MSIX vector is only useful when it has an associated completion ring, so it is better to cap it.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9d71c19..43b7342 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5183,9 +5183,10 @@ static unsigned int bnxt_get_max_func_irqs(struct bnxt *bp)
 {
 #if defined(CONFIG_BNXT_SRIOV)
 	if (BNXT_VF(bp))
-		return bp->vf.max_irqs;
+		return min_t(unsigned int, bp->vf.max_irqs,
+			     bp->vf.max_cp_rings);
 #endif
-	return bp->pf.max_irqs;
+	return min_t(unsigned int, bp->pf.max_irqs, bp->pf.max_cp_rings);
 }
 
 void bnxt_set_max_func_irqs(struct bnxt *bp, unsigned int max_irqs)
-- 
1.8.3.1
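The cap is a plain minimum of the two firmware-reported limits, applied to both the PF and the VF case. A sketch with a hypothetical `struct func_limits` and example numbers:

```c
#include <assert.h>

/* Hypothetical per-function limits as reported by firmware. */
struct func_limits {
	unsigned int max_irqs;      /* MSI-X vectors in PCIe config space */
	unsigned int max_cp_rings;  /* completion rings available */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Usable vectors: never more than there are completion rings to service them. */
static unsigned int max_func_irqs(const struct func_limits *l)
{
	return min_uint(l->max_irqs, l->max_cp_rings);
}
```

The patch uses `min_t(unsigned int, ...)`, which casts both operands to the named type before comparing and so avoids mixed-type comparison warnings.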
[PATCH net-next 08/12] bnxt_en: Add ethtool mac loopback self test.
The mac loopback self test operates in polling mode. To support that, we need to add functions to open and close the NIC half way. The half open mode allows the rings to operate without IRQ and NAPI. We use the XDP transmit function to send the loopback packet. Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 37 + drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 + drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 165 -- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 4 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 2 + 5 files changed, 199 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 7b72ba9..9d71c19 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -6097,6 +6097,43 @@ int bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) return rc; } +/* rtnl_lock held, open the NIC half way by allocating all resources, but + * NAPI, IRQ, and TX are not enabled. This is mainly used for offline + * self tests. + */ +int bnxt_half_open_nic(struct bnxt *bp) +{ + int rc = 0; + + rc = bnxt_alloc_mem(bp, false); + if (rc) { + netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc); + goto half_open_err; + } + rc = bnxt_init_nic(bp, false); + if (rc) { + netdev_err(bp->dev, "bnxt_init_nic err: %x\n", rc); + goto half_open_err; + } + return 0; + +half_open_err: + bnxt_free_skbs(bp); + bnxt_free_mem(bp, false); + dev_close(bp->dev); + return rc; +} + +/* rtnl_lock held, this call can only be made after a previous successful + * call to bnxt_half_open_nic(). 
+ */ +void bnxt_half_close_nic(struct bnxt *bp) +{ + bnxt_hwrm_resource_free(bp, false, false); + bnxt_free_skbs(bp); + bnxt_free_mem(bp, false); +} + static int bnxt_open(struct net_device *dev) { struct bnxt *bp = netdev_priv(dev); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 4affaac..c9a1688 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -1255,6 +1255,8 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap, int bnxt_hwrm_free_wol_fltr(struct bnxt *bp); int bnxt_hwrm_fw_set_time(struct bnxt *); int bnxt_open_nic(struct bnxt *, bool, bool); +int bnxt_half_open_nic(struct bnxt *bp); +void bnxt_half_close_nic(struct bnxt *bp); int bnxt_close_nic(struct bnxt *, bool, bool); int bnxt_reserve_rings(struct bnxt *bp, int tx, int rx, int tcs, int tx_xdp); int bnxt_setup_mq_tc(struct net_device *dev, u8 tc); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 711d7fd..ecb4417 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -18,6 +18,7 @@ #include #include "bnxt_hsi.h" #include "bnxt.h" +#include "bnxt_xdp.h" #include "bnxt_ethtool.h" #include "bnxt_nvm_defs.h" /* NVRAM content constant and structure defs */ #include "bnxt_fw_hdr.h" /* Firmware hdr constant and structure defs */ @@ -2177,6 +2178,130 @@ static int bnxt_set_phys_id(struct net_device *dev, return rc; } +static int bnxt_hwrm_mac_loopback(struct bnxt *bp, bool enable) +{ + struct hwrm_port_mac_cfg_input req = {0}; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_MAC_CFG, -1, -1); + + req.enables = cpu_to_le32(PORT_MAC_CFG_REQ_ENABLES_LPBK); + if (enable) + req.lpbk = PORT_MAC_CFG_REQ_LPBK_LOCAL; + else + req.lpbk = PORT_MAC_CFG_REQ_LPBK_NONE; + return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); +} + +static int 
bnxt_rx_loopback(struct bnxt *bp, struct bnxt_napi *bnapi, + u32 raw_cons, int pkt_size) +{ + struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring; + struct bnxt_rx_ring_info *rxr = bnapi->rx_ring; + struct bnxt_sw_rx_bd *rx_buf; + struct rx_cmp *rxcmp; + u16 cp_cons, cons; + u8 *data; + u32 len; + int i; + + cp_cons = RING_CMP(raw_cons); + rxcmp = (struct rx_cmp *) + &cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)]; + cons = rxcmp->rx_cmp_opaque; + rx_buf = &rxr->rx_buf_ring[cons]; + data = rx_buf->data_ptr; + len = le32_to_cpu(rxcmp->rx_cmp_len_flags_type) >> RX_CMP_LEN_SHIFT; + if (len != pkt_size) + return -EIO; + i = ETH_ALEN; + if (!ether_addr_equal(data + i, bnapi->bp->dev->dev_addr)) + return -EIO; + i += ETH_ALEN; + for ( ; i < pkt_size; i++) { + if (data[i] != (u8)(i & 0xff)) + ret
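`bnxt_rx_loopback()` accepts the looped-back frame only if its length matches, the source MAC equals the port's own address, and every byte after the two MAC fields carries the pattern `i & 0xff`. Below is a user-space sketch of a matching builder and checker. The builder is inferred from the checker because the transmit side is cut off in this excerpt, and the broadcast destination address is an assumption; `build_lpbk_frame()` and `check_lpbk_frame()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* dst = broadcast (assumption), src = own MAC, payload byte i = i & 0xff. */
static void build_lpbk_frame(uint8_t *data, size_t pkt_size, const uint8_t *own_mac)
{
	size_t i;

	memset(data, 0xff, ETH_ALEN);
	memcpy(data + ETH_ALEN, own_mac, ETH_ALEN);
	for (i = 2 * ETH_ALEN; i < pkt_size; i++)
		data[i] = (uint8_t)(i & 0xff);
}

/* Same checks as bnxt_rx_loopback(): length, source MAC, payload pattern. */
static bool check_lpbk_frame(const uint8_t *data, size_t len, size_t pkt_size,
			     const uint8_t *own_mac)
{
	size_t i;

	if (len != pkt_size)
		return false;
	if (memcmp(data + ETH_ALEN, own_mac, ETH_ALEN) != 0)
		return false;
	for (i = 2 * ETH_ALEN; i < pkt_size; i++)
		if (data[i] != (uint8_t)(i & 0xff))
			return false;
	return true;
}
```

A position-dependent pattern catches corruption or byte shifts anywhere in the payload, which a constant fill would miss.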
[PATCH net-next 02/12] bnxt_en: Add basic WoL infrastructure.
Add code to driver probe function to check if the device is WoL capable and if Magic packet WoL filter is currently set. Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 43 +++ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 +++ 2 files changed, 47 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 174ec8f..70cc313 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4532,6 +4532,9 @@ static int bnxt_hwrm_func_qcaps(struct bnxt *bp) pf->max_tx_wm_flows = le32_to_cpu(resp->max_tx_wm_flows); pf->max_rx_em_flows = le32_to_cpu(resp->max_rx_em_flows); pf->max_rx_wm_flows = le32_to_cpu(resp->max_rx_wm_flows); + if (resp->flags & + cpu_to_le32(FUNC_QCAPS_RESP_FLAGS_WOL_MAGICPKT_SUPPORTED)) + bp->flags |= BNXT_FLAG_WOL_CAP; } else { #ifdef CONFIG_BNXT_SRIOV struct bnxt_vf_info *vf = &bp->vf; @@ -5839,6 +5842,44 @@ static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp) return 0; } +static u16 bnxt_hwrm_get_wol_fltrs(struct bnxt *bp, u16 handle) +{ + struct hwrm_wol_filter_qcfg_input req = {0}; + struct hwrm_wol_filter_qcfg_output *resp = bp->hwrm_cmd_resp_addr; + u16 next_handle = 0; + int rc; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_WOL_FILTER_QCFG, -1, -1); + req.port_id = cpu_to_le16(bp->pf.port_id); + req.handle = cpu_to_le16(handle); + mutex_lock(&bp->hwrm_cmd_lock); + rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + if (!rc) { + next_handle = le16_to_cpu(resp->next_handle); + if (next_handle != 0) { + if (resp->wol_type == + WOL_FILTER_ALLOC_REQ_WOL_TYPE_MAGICPKT) { + bp->wol = 1; + bp->wol_filter_id = resp->wol_filter_id; + } + } + } + mutex_unlock(&bp->hwrm_cmd_lock); + return next_handle; +} + +static void bnxt_get_wol_settings(struct bnxt *bp) +{ + u16 handle = 0; + + if (!BNXT_PF(bp) || !(bp->flags & BNXT_FLAG_WOL_CAP)) + return; + + do { + handle = bnxt_hwrm_get_wol_fltrs(bp, handle); + } while 
(handle && handle != 0xffff); +} + static bool bnxt_eee_config_ok(struct bnxt *bp) { struct ethtool_eee *eee = &bp->eee; @@ -7575,6 +7616,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (rc) goto init_err_pci_clean; + bnxt_get_wol_settings(bp); + rc = register_netdev(dev); if (rc) goto init_err_clr_int; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 3cb0777..02de812 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -989,6 +989,7 @@ struct bnxt { #define BNXT_FLAG_UDP_RSS_CAP 0x800 #define BNXT_FLAG_EEE_CAP 0x1000 #define BNXT_FLAG_NEW_RSS_CAP 0x2000 + #define BNXT_FLAG_WOL_CAP 0x4000 #define BNXT_FLAG_ROCEV1_CAP 0x8000 #define BNXT_FLAG_ROCEV2_CAP 0x10000 #define BNXT_FLAG_ROCE_CAP (BNXT_FLAG_ROCEV1_CAP | \ @@ -1180,6 +1181,9 @@ struct bnxt { u32 lpi_tmr_lo; u32 lpi_tmr_hi; + u8 wol_filter_id; + u8 wol; + u8 num_leds; struct bnxt_led_info leds[BNXT_MAX_LED]; -- 1.8.3.1
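`bnxt_get_wol_settings()` walks the firmware's WoL filter list with a handle cursor: each `HWRM_WOL_FILTER_QCFG` response describes one filter and returns the next handle, and the walk stops at 0 or `0xffff`. A sketch against a mock filter table follows. The mock's handle semantics (0 for "no filter here", `0xffff` after the last entry) and the names `struct fw_filter`, `query_filter()` and `WOL_TYPE_*` are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WOL_TYPE_MAGICPKT 0        /* hypothetical encodings for the mock */
#define WOL_TYPE_OTHER    1
#define HANDLE_END        0xffff

struct fw_filter {
	uint8_t type;
	uint8_t id;
};

struct wol_state {
	bool wol;
	uint8_t filter_id;
};

/*
 * Mock query: report the filter at @handle and return the next handle.
 * Returns 0 when there is no filter at @handle, HANDLE_END after the last one.
 */
static uint16_t query_filter(const struct fw_filter *tbl, uint16_t n,
			     uint16_t handle, struct fw_filter *out)
{
	if (handle >= n)
		return 0;
	*out = tbl[handle];
	return handle + 1 < n ? handle + 1 : HANDLE_END;
}

/* Same loop shape as bnxt_get_wol_settings() in the patch. */
static void get_wol_settings(const struct fw_filter *tbl, uint16_t n,
			     struct wol_state *st)
{
	uint16_t handle = 0;

	do {
		struct fw_filter f;
		uint16_t next = query_filter(tbl, n, handle, &f);

		/* Like the patch, only trust the response when next != 0. */
		if (next != 0 && f.type == WOL_TYPE_MAGICPKT) {
			st->wol = true;
			st->filter_id = f.id;
		}
		handle = next;
	} while (handle && handle != HANDLE_END);
}
```

Running this at probe time lets the driver pick up a Magic Packet filter that is already programmed, so `get_wol` reports it without the user having to re-enable it.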
[PATCH net-next 00/12] bnxt_en: Updates for net-next.
Main changes are to add WoL and selftest features, optimize XDP_TX by using short BDs, and to cap the usage of MSIX.

Michael Chan (12):
  bnxt_en: Update firmware interface spec to 1.7.6.2.
  bnxt_en: Add basic WoL infrastructure.
  bnxt_en: Add pci shutdown method.
  bnxt_en: Add ethtool get_wol method.
  bnxt_en: Add ethtool set_wol method.
  bnxt_en: Add suspend/resume callbacks.
  bnxt_en: Add basic ethtool -t selftest support.
  bnxt_en: Add ethtool mac loopback self test.
  bnxt_en: Add PHY loopback to ethtool self-test.
  bnxt_en: Add interrupt test to ethtool -t selftest.
  bnxt_en: Use short TX BDs for the XDP TX ring.
  bnxt_en: Cap the msix vector with the max completion rings.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 207 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  21 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 413 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h |   3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h     | 325 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   |   8 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h   |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c     |  20 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h     |   2 +
 9 files changed, 942 insertions(+), 58 deletions(-)

-- 
1.8.3.1
[PATCH net-next 07/12] bnxt_en: Add basic ethtool -t selftest support.
Add the basic infrastructure and only firmware tests initially. Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 + drivers/net/ethernet/broadcom/bnxt/bnxt.h | 13 ++- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 136 +- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h | 2 + 4 files changed, 150 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 4e77bbf..7b72ba9 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -7281,6 +7281,7 @@ static void bnxt_remove_one(struct pci_dev *pdev) bnxt_clear_int_mode(bp); bnxt_hwrm_func_drv_unrgtr(bp); bnxt_free_hwrm_resources(bp); + bnxt_ethtool_free(bp); bnxt_dcb_free(bp); kfree(bp->edev); bp->edev = NULL; @@ -7603,6 +7604,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) bnxt_hwrm_func_qcfg(bp); bnxt_hwrm_port_led_qcaps(bp); + bnxt_ethtool_init(bp); bnxt_set_rx_skb_mode(bp, false); bnxt_set_tpa_flags(bp); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index aba25ba..4affaac 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -426,8 +426,6 @@ struct rx_tpa_end_cmp_ext { #define BNXT_MIN_PKT_SIZE 52 -#define BNXT_NUM_TESTS(bp) 0 - #define BNXT_DEFAULT_RX_RING_SIZE 511 #define BNXT_DEFAULT_TX_RING_SIZE 511 @@ -911,6 +909,14 @@ struct bnxt_led_info { __le16 led_color_caps; }; +#define BNXT_MAX_TEST 8 + +struct bnxt_test_info { + u8 offline_mask; + u16 timeout; + char string[BNXT_MAX_TEST][ETH_GSTRING_LEN]; +}; + #define BNXT_GRCPF_REG_WINDOW_BASE_OUT 0x400 #define BNXT_CAG_REG_LEGACY_INT_STATUS 0x4014 #define BNXT_CAG_REG_BASE 0x30 @@ -1181,6 +1187,9 @@ struct bnxt { u32 lpi_tmr_lo; u32 lpi_tmr_hi; + u8 num_tests; + struct bnxt_test_info *test_info; + u8 wol_filter_id; u8 wol; diff --git 
a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 84cd4ca..711d7fd 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -210,6 +210,10 @@ static int bnxt_get_sset_count(struct net_device *dev, int sset) return num_stats; } + case ETH_SS_TEST: + if (!bp->num_tests) + return -EOPNOTSUPP; + return bp->num_tests; default: return -EOPNOTSUPP; } @@ -307,6 +311,11 @@ static void bnxt_get_strings(struct net_device *dev, u32 stringset, u8 *buf) } } break; + case ETH_SS_TEST: + if (bp->num_tests) + memcpy(buf, bp->test_info->string, + bp->num_tests * ETH_GSTRING_LEN); + break; default: netdev_err(bp->dev, "bnxt_get_strings invalid request %x\n", stringset); @@ -825,7 +834,7 @@ static void bnxt_get_drvinfo(struct net_device *dev, sizeof(info->fw_version)); strlcpy(info->bus_info, pci_name(bp->pdev), sizeof(info->bus_info)); info->n_stats = BNXT_NUM_STATS * bp->cp_nr_rings; - info->testinfo_len = BNXT_NUM_TESTS(bp); + info->testinfo_len = bp->num_tests; /* TODO CHIMP_FW: eeprom dump details */ info->eedump_len = 0; /* TODO CHIMP FW: reg dump details */ @@ -2168,6 +2177,130 @@ static int bnxt_set_phys_id(struct net_device *dev, return rc; } +static int bnxt_run_fw_tests(struct bnxt *bp, u8 test_mask, u8 *test_results) +{ + struct hwrm_selftest_exec_output *resp = bp->hwrm_cmd_resp_addr; + struct hwrm_selftest_exec_input req = {0}; + int rc; + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_SELFTEST_EXEC, -1, -1); + mutex_lock(&bp->hwrm_cmd_lock); + resp->test_success = 0; + req.flags = test_mask; + rc = _hwrm_send_message(bp, &req, sizeof(req), bp->test_info->timeout); + *test_results = resp->test_success; + mutex_unlock(&bp->hwrm_cmd_lock); + return rc; +} + +#define BNXT_DRV_TESTS 0 + +static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest, + u64 *buf) +{ + struct bnxt *bp = netdev_priv(dev); + bool offline = false; + u8 test_results 
= 0; + u8 test_mask = 0; + int rc, i; + + if (!bp->num_tests || !BNXT_SINGLE_PF(bp)) + return; + mem
[PATCH net-next 06/12] bnxt_en: Add suspend/resume callbacks.
Add suspend/resume callbacks using the newer dev_pm_ops method. Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 57 +++ 1 file changed, 57 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index e432d0a..4e77bbf 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -7703,6 +7703,62 @@ static void bnxt_shutdown(struct pci_dev *pdev) rtnl_unlock(); } +#ifdef CONFIG_PM_SLEEP +static int bnxt_suspend(struct device *device) +{ + struct pci_dev *pdev = to_pci_dev(device); + struct net_device *dev = pci_get_drvdata(pdev); + struct bnxt *bp = netdev_priv(dev); + int rc = 0; + + rtnl_lock(); + if (netif_running(dev)) { + netif_device_detach(dev); + rc = bnxt_close(dev); + } + bnxt_hwrm_func_drv_unrgtr(bp); + rtnl_unlock(); + return rc; +} + +static int bnxt_resume(struct device *device) +{ + struct pci_dev *pdev = to_pci_dev(device); + struct net_device *dev = pci_get_drvdata(pdev); + struct bnxt *bp = netdev_priv(dev); + int rc = 0; + + rtnl_lock(); + if (bnxt_hwrm_ver_get(bp) || bnxt_hwrm_func_drv_rgtr(bp)) { + rc = -ENODEV; + goto resume_exit; + } + rc = bnxt_hwrm_func_reset(bp); + if (rc) { + rc = -EBUSY; + goto resume_exit; + } + bnxt_get_wol_settings(bp); + if (netif_running(dev)) { + rc = bnxt_open(dev); + if (!rc) + netif_device_attach(dev); + } + +resume_exit: + rtnl_unlock(); + return rc; +} + +static SIMPLE_DEV_PM_OPS(bnxt_pm_ops, bnxt_suspend, bnxt_resume); +#define BNXT_PM_OPS (&bnxt_pm_ops) + +#else + +#define BNXT_PM_OPS NULL + +#endif /* CONFIG_PM_SLEEP */ + /** * bnxt_io_error_detected - called when PCI error is detected * @pdev: Pointer to PCI device @@ -7820,6 +7876,7 @@ static void bnxt_io_resume(struct pci_dev *pdev) .probe = bnxt_init_one, .remove = bnxt_remove_one, .shutdown = bnxt_shutdown, + .driver.pm = BNXT_PM_OPS, .err_handler= &bnxt_err_handler, #if defined(CONFIG_BNXT_SRIOV) .sriov_configure = 
bnxt_sriov_configure, -- 1.8.3.1
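The ordering used by bnxt_suspend()/bnxt_resume() and the BNXT_PM_OPS fallback can be modelled in userspace. This is a minimal sketch with hypothetical mock types (mock_netdev, mock_pm_ops, MOCK_PM_SLEEP). It leaves out the HWRM unregister/register/reset steps and is not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock of a net device: the flags stand in for
 * netif_running() / netif_device_present() state, not kernel structs. */
struct mock_netdev {
	bool running;   /* interface administratively up */
	bool attached;  /* device visible to the stack */
	bool hw_open;   /* rings/IRQs allocated */
};

static int mock_close(struct mock_netdev *dev) { dev->hw_open = false; return 0; }
static int mock_open(struct mock_netdev *dev)  { dev->hw_open = true;  return 0; }

/* Same ordering as bnxt_suspend(): detach from the stack, then close. */
static int mock_suspend(struct mock_netdev *dev)
{
	int rc = 0;

	if (dev->running) {
		dev->attached = false;
		rc = mock_close(dev);
	}
	return rc;
}

/* Same ordering as bnxt_resume(): reopen, reattach only on success. */
static int mock_resume(struct mock_netdev *dev)
{
	int rc = 0;

	if (dev->running) {
		rc = mock_open(dev);
		if (!rc)
			dev->attached = true;
	}
	return rc;
}

struct mock_pm_ops {
	int (*suspend)(struct mock_netdev *dev);
	int (*resume)(struct mock_netdev *dev);
};

/* Compile-time pattern like BNXT_PM_OPS: a real ops table when sleep
 * support is configured, NULL otherwise. MOCK_PM_SLEEP is hypothetical. */
#define MOCK_PM_SLEEP 1
#if MOCK_PM_SLEEP
static const struct mock_pm_ops mock_pm_ops = { mock_suspend, mock_resume };
#define MOCK_PM_OPS (&mock_pm_ops)
#else
#define MOCK_PM_OPS NULL
#endif
```

Keeping the pointer NULL when sleep support is compiled out lets the driver structure assign `.driver.pm` unconditionally. Note that the real SIMPLE_DEV_PM_OPS also reuses the same pair for the hibernation callbacks, which this sketch ignores.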
[PATCH net-next 03/12] bnxt_en: Add pci shutdown method.
Add pci shutdown method to put device in the proper WoL and power state. Signed-off-by: Michael Chan --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 70cc313..10a9cda 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -7617,6 +7617,10 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) goto init_err_pci_clean; bnxt_get_wol_settings(bp); + if (bp->flags & BNXT_FLAG_WOL_CAP) + device_set_wakeup_enable(&pdev->dev, bp->wol); + else + device_set_wakeup_capable(&pdev->dev, false); rc = register_netdev(dev); if (rc) @@ -7641,6 +7645,32 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) return rc; } +static void bnxt_shutdown(struct pci_dev *pdev) +{ + struct net_device *dev = pci_get_drvdata(pdev); + struct bnxt *bp; + + if (!dev) + return; + + rtnl_lock(); + bp = netdev_priv(dev); + if (!bp) + goto shutdown_exit; + + if (netif_running(dev)) + dev_close(dev); + + if (system_state == SYSTEM_POWER_OFF) { + bnxt_clear_int_mode(bp); + pci_wake_from_d3(pdev, bp->wol); + pci_set_power_state(pdev, PCI_D3hot); + } + +shutdown_exit: + rtnl_unlock(); +} + /** * bnxt_io_error_detected - called when PCI error is detected * @pdev: Pointer to PCI device @@ -7757,6 +7787,7 @@ static void bnxt_io_resume(struct pci_dev *pdev) .id_table = bnxt_pci_tbl, .probe = bnxt_init_one, .remove = bnxt_remove_one, + .shutdown = bnxt_shutdown, .err_handler= &bnxt_err_handler, #if defined(CONFIG_BNXT_SRIOV) .sriov_configure = bnxt_sriov_configure, -- 1.8.3.1
[PATCH] ebpf: verify the output of the JIT
The goal of this patch is to protect the JIT against an attacker with a write-in-memory primitive. The JIT allocates a buffer which will eventually be marked +x, so we need to make sure that what was written to this buffer is what was intended. We achieve this by building a hash of the instruction buffer as instructions are emitted and then comparing that to a hash of the buffer computed at the end of the JIT compile, after the buffer has been marked read-only. Signed-off-by: Tycho Andersen CC: Daniel Borkmann CC: Alexei Starovoitov CC: Kees Cook CC: Mickaël Salaün --- arch/x86/Kconfig| 11 arch/x86/net/bpf_jit_comp.c | 147 2 files changed, 147 insertions(+), 11 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index cc98d5a..7b2db2c 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2789,6 +2789,17 @@ config X86_DMA_REMAP source "net/Kconfig" +config EBPF_JIT_HASH_OUTPUT + def_bool y + depends on HAVE_EBPF_JIT + depends on BPF_JIT + select CRYPTO_SHA256 + ---help--- + Enables a double check of the JIT's output after it is marked read-only to + ensure that it matches what the JIT generated. + + Note, only applies when /proc/sys/net/core/bpf_jit_harden > 0.
+ source "drivers/Kconfig" source "drivers/firmware/Kconfig" diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 32322ce..be1271e 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -13,9 +13,15 @@ #include #include #include +#include +#include int bpf_jit_enable __read_mostly; +#ifdef CONFIG_EBPF_JIT_HASH_OUTPUT +struct crypto_shash *tfm __read_mostly; +#endif + /* * assembly code in arch/x86/net/bpf_jit.S */ @@ -25,7 +31,8 @@ extern u8 sk_load_byte_positive_offset[]; extern u8 sk_load_word_negative_offset[], sk_load_half_negative_offset[]; extern u8 sk_load_byte_negative_offset[]; -static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len) +static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len, +struct shash_desc *hash) { if (len == 1) *ptr = bytes; @@ -35,11 +42,15 @@ static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len) *(u32 *)ptr = bytes; barrier(); } + + if (IS_ENABLED(CONFIG_EBPF_JIT_HASH_OUTPUT) && hash) + crypto_shash_update(hash, (u8 *) &bytes, len); + return ptr + len; } #define EMIT(bytes, len) \ - do { prog = emit_code(prog, bytes, len); cnt += len; } while (0) + do { prog = emit_code(prog, bytes, len, hash); cnt += len; } while (0) #define EMIT1(b1) EMIT(b1, 1) #define EMIT2(b1, b2) EMIT((b1) + ((b2) << 8), 2) @@ -206,7 +217,7 @@ struct jit_context { /* emit x64 prologue code for BPF program and check it's size. 
* bpf_tail_call helper will skip it while jumping into another program */ -static void emit_prologue(u8 **pprog) +static void emit_prologue(u8 **pprog, struct shash_desc *hash) { u8 *prog = *pprog; int cnt = 0; @@ -264,7 +275,7 @@ static void emit_prologue(u8 **pprog) * goto *(prog->bpf_func + prologue_size); * out: */ -static void emit_bpf_tail_call(u8 **pprog) +static void emit_bpf_tail_call(u8 **pprog, struct shash_desc *hash) { u8 *prog = *pprog; int label1, label2, label3; @@ -328,7 +339,7 @@ static void emit_bpf_tail_call(u8 **pprog) } -static void emit_load_skb_data_hlen(u8 **pprog) +static void emit_load_skb_data_hlen(u8 **pprog, struct shash_desc *hash) { u8 *prog = *pprog; int cnt = 0; @@ -348,7 +359,8 @@ static void emit_load_skb_data_hlen(u8 **pprog) } static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, - int oldproglen, struct jit_context *ctx) + int oldproglen, struct jit_context *ctx, + struct shash_desc *hash) { struct bpf_insn *insn = bpf_prog->insnsi; int insn_cnt = bpf_prog->len; @@ -360,10 +372,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, int proglen = 0; u8 *prog = temp; - emit_prologue(&prog); + emit_prologue(&prog, hash); if (seen_ld_abs) - emit_load_skb_data_hlen(&prog); + emit_load_skb_data_hlen(&prog, hash); for (i = 0; i < insn_cnt; i++, insn++) { const s32 imm32 = insn->imm; @@ -875,7 +887,7 @@ xadd: if (is_imm8(insn->off)) if (seen_ld_abs) { if (reload_skb_data) { EMIT1(0x5F); /* pop %rdi */ - emit_load_skb_data_hlen(&prog); + emit_load_skb_data_hlen(&prog, hash); } else { EMIT2(0x41, 0x59); /* pop %r9 */
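The hash-while-emitting scheme from the commit message can be illustrated in userspace. This is a minimal sketch, not the patch's code: FNV-1a is swapped in for SHA-256 only to stay dependency-free (it offers no protection against a deliberate attacker), and emit(), struct emitter and verify_image() are hypothetical names. Like emit_code() on x86, emit() stores the low `len` bytes of its argument in little-endian order and feeds exactly those bytes to the running hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a parameters; a stand-in for the patch's SHA-256. */
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

static uint64_t fnv1a_update(uint64_t h, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

struct emitter {
	uint8_t *ptr;      /* next free byte in the image buffer */
	uint64_t running;  /* hash of every byte emitted so far */
};

/* Store the low `len` bytes of `bytes` (little-endian, as on x86) and
 * update the running hash with the same bytes. */
static void emit(struct emitter *e, uint32_t bytes, unsigned int len)
{
	uint8_t tmp[4];

	assert(len >= 1 && len <= 4);
	for (unsigned int i = 0; i < len; i++)
		tmp[i] = (uint8_t)(bytes >> (8 * i));
	memcpy(e->ptr, tmp, len);
	e->running = fnv1a_update(e->running, tmp, len);
	e->ptr += len;
}

/* Rehash the finished image and compare with the hash accumulated
 * during emission; returns 1 if they match. */
static int verify_image(const uint8_t *image, size_t len, uint64_t expected)
{
	return fnv1a_update(FNV_OFFSET, image, len) == expected;
}
```

For example, emitting `0x55` (push %rbp) and then `0xe58948` with length 3 (mov %rsp,%rbp, bytes 48 89 e5) yields a 4-byte image that verifies; flipping any single byte afterwards makes verify_image() fail. As the commit message describes, the real check runs only after the buffer has been marked read-only.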
Re: [iproute PATCH 0/4] Smaller link type help review
On Tue, 28 Mar 2017 23:19:35 +0200 Phil Sutter wrote: > This series addresses some minor nits with link type specific help > texts: > > * Unify coding style of print_help() callbacks (or the functions they > call. > > * Unify output as much as possible for a common look and feel. > > * Make sure there's type specific help for each type listed in 'ip link > help'. > > Phil Sutter (4): > ip: link: bond: Fix whitespace in help text > ip: link: macvlan: Add newline to help output > ip: link: Unify link type help functions a bit > ip: link: Add missing link type help texts > > ip/Makefile | 3 ++- > ip/iplink_bond.c| 2 +- > ip/iplink_dummy.c | 16 > ip/iplink_geneve.c | 28 ++-- > ip/iplink_ifb.c | 16 > ip/iplink_ipoib.c | 4 +++- > ip/iplink_macvlan.c | 1 + > ip/iplink_nlmon.c | 16 > ip/iplink_team.c| 25 + > ip/iplink_vcan.c| 16 > ip/iplink_vlan.c| 15 +-- > ip/iplink_vxlan.c | 44 +--- > ip/link_gre.c | 36 +++- > ip/link_gre6.c | 47 --- > ip/link_ip6tnl.c| 46 +++--- > ip/link_iptnl.c | 38 ++ > ip/link_vti.c | 17 + > 17 files changed, 265 insertions(+), 105 deletions(-) > create mode 100644 ip/iplink_dummy.c > create mode 100644 ip/iplink_ifb.c > create mode 100644 ip/iplink_nlmon.c > create mode 100644 ip/iplink_team.c > create mode 100644 ip/iplink_vcan.c > All 4 Applied
Re: [PATCH net] sctp: get sock from transport in sctp_transport_update_pmtu
On Tue, Apr 04, 2017 at 01:39:55PM +0800, Xin Long wrote: > This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU > updates to accomodate route invalidation."). As t->asoc can't be NULL > in sctp_transport_update_pmtu, it could get sk from asoc, and no need > to pass sk into that function. > > It is also to remove some duplicated codes from that function. > > Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner > --- > include/net/sctp/sctp.h| 5 ++--- > include/net/sctp/structs.h | 6 +++--- > net/sctp/associola.c | 6 +++--- > net/sctp/input.c | 4 ++-- > net/sctp/output.c | 4 ++-- > net/sctp/socket.c | 6 +++--- > net/sctp/transport.c | 19 +++ > 7 files changed, 22 insertions(+), 28 deletions(-) > > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > index d75caa7..069582e 100644 > --- a/include/net/sctp/sctp.h > +++ b/include/net/sctp/sctp.h > @@ -448,10 +448,9 @@ static inline int sctp_frag_point(const struct > sctp_association *asoc, int pmtu) > return frag; > } > > -static inline void sctp_assoc_pending_pmtu(struct sock *sk, struct > sctp_association *asoc) > +static inline void sctp_assoc_pending_pmtu(struct sctp_association *asoc) > { > - > - sctp_assoc_sync_pmtu(sk, asoc); > + sctp_assoc_sync_pmtu(asoc); > asoc->pmtu_pending = 0; > } > > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h > index a127b7c..138f861 100644 > --- a/include/net/sctp/structs.h > +++ b/include/net/sctp/structs.h > @@ -952,8 +952,8 @@ void sctp_transport_lower_cwnd(struct sctp_transport *, > sctp_lower_cwnd_t); > void sctp_transport_burst_limited(struct sctp_transport *); > void sctp_transport_burst_reset(struct sctp_transport *); > unsigned long sctp_transport_timeout(struct sctp_transport *); > -void sctp_transport_reset(struct sctp_transport *); > -void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32); > +void sctp_transport_reset(struct sctp_transport *t); > +void sctp_transport_update_pmtu(struct 
sctp_transport *t, u32 pmtu); > void sctp_transport_immediate_rtx(struct sctp_transport *); > void sctp_transport_dst_release(struct sctp_transport *t); > void sctp_transport_dst_confirm(struct sctp_transport *t); > @@ -1954,7 +1954,7 @@ void sctp_assoc_update(struct sctp_association *old, > > __u32 sctp_association_get_next_tsn(struct sctp_association *); > > -void sctp_assoc_sync_pmtu(struct sock *, struct sctp_association *); > +void sctp_assoc_sync_pmtu(struct sctp_association *asoc); > void sctp_assoc_rwnd_increase(struct sctp_association *, unsigned int); > void sctp_assoc_rwnd_decrease(struct sctp_association *, unsigned int); > void sctp_assoc_set_primary(struct sctp_association *, > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > index 0b26df5..a9708da 100644 > --- a/net/sctp/associola.c > +++ b/net/sctp/associola.c > @@ -1412,7 +1412,7 @@ sctp_assoc_choose_alter_transport(struct > sctp_association *asoc, > /* Update the association's pmtu and frag_point by going through all the > * transports. This routine is called when a transport's PMTU has changed. 
> */ > -void sctp_assoc_sync_pmtu(struct sock *sk, struct sctp_association *asoc) > +void sctp_assoc_sync_pmtu(struct sctp_association *asoc) > { > struct sctp_transport *t; > __u32 pmtu = 0; > @@ -1424,8 +1424,8 @@ void sctp_assoc_sync_pmtu(struct sock *sk, struct > sctp_association *asoc) > list_for_each_entry(t, &asoc->peer.transport_addr_list, > transports) { > if (t->pmtu_pending && t->dst) { > - sctp_transport_update_pmtu(sk, t, > - > SCTP_TRUNC4(dst_mtu(t->dst))); > + sctp_transport_update_pmtu( > + t, SCTP_TRUNC4(dst_mtu(t->dst))); > t->pmtu_pending = 0; > } > if (!pmtu || (t->pathmtu < pmtu)) > diff --git a/net/sctp/input.c b/net/sctp/input.c > index 2a28ab2..0e06a27 100644 > --- a/net/sctp/input.c > +++ b/net/sctp/input.c > @@ -401,10 +401,10 @@ void sctp_icmp_frag_needed(struct sock *sk, struct > sctp_association *asoc, > > if (t->param_flags & SPP_PMTUD_ENABLE) { > /* Update transports view of the MTU */ > - sctp_transport_update_pmtu(sk, t, pmtu); > + sctp_transport_update_pmtu(t, pmtu); > > /* Update association pmtu. */ > - sctp_assoc_sync_pmtu(sk, asoc); > + sctp_assoc_sync_pmtu(asoc); > } > > /* Retransmit with the new pmtu setting. > diff --git a/net/sctp/output.c b/net/sctp/output.c > index ec4d50a..1409a87 100644 > --- a/net/sctp/output.c > +++ b/net/sctp/output.c > @@ -105,10 +105,10 @@ void sctp_packet_config(struct sctp_packet *packet, > __u32 vtag, > if (!sctp_transport_dst_check(tp)) { >
Re: [PATCH iproute2] man: ip-link.8: document bridge options
On Tue, 28 Mar 2017 17:56:48 +0200 Sabrina Dubroca wrote: > Signed-off-by: Phil Sutter > Signed-off-by: Sabrina Dubroca Applied
Re: [PATCH iproute2 1/1] tc: print skbedit action when dumping actions.
On Wed, 22 Mar 2017 14:00:31 -0400 Roman Mashak wrote: > Signed-off-by: Roman Mashak Makes sense. Applied
Re: [PATCH iproute2] man: fix man page warnings
On Sun, 26 Mar 2017 21:11:14 +0200 Alexander Alemayhu wrote: > While generating PDFs from the man pages, I saw the warning below from > several files. Compared the tc-matchall.8 with bridge.8 and used .RI > instead of .R. It should have no effect on the man page rendering. > > `R' is a string (producing the registered sign), not a macro. > > Signed-off-by: Alexander Alemayhu Applied
Re: [PATCH] ss: replace all zero characters in a unix name to '@'
On Sat, 1 Apr 2017 04:31:57 +0300 Andrei Vagin wrote: > From: Andrei Vagin > > A name of an abstract socket can contain zero characters. > Now we replace only the first character. If a name contains more > than one zero character, the ss tool shows only a part of the name: > u_str UNCONN00 @1931097 * 0 > > the output with this patch: > u_str UNCONN00 @@zdtm-./sk-unix-unconn-23/@ 1931097 * 0 > > Signed-off-by: Andrei Vagin This patch duplicates changes that are already in the current version. commit 878dadc79d247aa37b67fb30608e58ef1f9ab9ff Author: Isaac Boukris Date: Sat Oct 29 22:20:19 2016 +0300 iproute2: ss: escape all null bytes in abstract unix domain socket Abstract unix domain socket may embed null characters, these should be translated to '@' when printed by ss the same way the null prefix is currently being translated. Signed-off-by: Isaac Boukris
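Escaping every NUL in an abstract socket name, not just the leading one, comes down to a loop over the full name length. A minimal sketch: escape_abstract_name() is a hypothetical helper, not the function in ss, and it assumes the caller passes the name length taken from the socket address, because strlen() would stop at the first NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every NUL byte in an abstract socket name with '@' so the
 * whole name can be printed. `len` must be the address length, not
 * strlen(name). */
static void escape_abstract_name(char *name, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (name[i] == '\0')
			name[i] = '@';
}
```

With only the first byte rewritten, a name such as "\0zdtm-\0x" would still print truncated at the second NUL. Escaping all of them gives "@zdtm-@x".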
Re: [iproute2 net-next v2 0/3] ip netconf improvements
On Tue, 4 Apr 2017 17:07:31 -0400 David Ahern wrote: > On 3/23/17 10:51 PM, David Ahern wrote: > > Currently, ip netconf only shows data for ipv4 and ipv6 for dumps > > and just ipv4 for device requests. Improve the user experience by > > using the new kernel patch to dump all address families that have > > registered. For example, if mpls_router module is loaded then mpls > > values are displayed along with ipv4 and ipv6. > > > > If the new feature is not supported (new iproute2 on older kernel) > > the kernel returns the nlmsg error EOPNOTSUPP which can be trapped > > and fallback to existing behavior. > > > > v2 > > - fixed index conversion in patch 3 per nicholas' comment > > > > David Ahern (3): > > netlink: Add flag to suppress print of nlmsg error > > ip netconf: Show all address families by default in dumps > > ip netconf: show all families on dev request > > > > include/libnetlink.h | 1 + > > ip/ipnetconf.c | 36 +--- > > lib/libnetlink.c | 3 ++- > > 3 files changed, 28 insertions(+), 12 deletions(-) > > > > Hi Stephen: any comments? are you ok with this change? I was holding off until all the upstream commits went through. Other than that fine.
[PATCH net-next] bonding: attempt to better support longer hw addresses
People are using bonding over Infiniband IPoIB connections, and who knows what else. Infiniband has a hardware address length of 20 octets (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32. Various places in the bonding code are currently hard-wired to 6 octets (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides, only alb is currently possible on Infiniband links anyway, due to commit 1533e7731522, so the alb code is where most of the changes are. One major component of this change is the addition of a bond_hw_addr_copy function that takes a length argument, instead of using ether_addr_copy everywhere that hardware addresses need to be copied about. The other major component of this change is converting the bonding code from using struct sockaddr for address storage to struct sockaddr_storage, as the former has an address storage space of only 14 octets, while the latter has 128 bytes minus the family field, which is necessary to support bonding over devices with up to MAX_ADDR_LEN-octet hardware addresses. Additionally, this probably fixes some memory corruption issues in the current code, where it's possible to write an Infiniband hardware address into a sockaddr declared on the stack.
Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet hardware address now: $ cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active) Primary Slave: mlx4_ib0 (primary_reselect always) Currently Active Slave: mlx4_ib0 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 100 Down Delay (ms): 100 Slave Interface: mlx4_ib0 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01 Slave queue ID: 0 Slave Interface: mlx4_ib1 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02 Slave queue ID: 0 Also tested with a standard 1Gbps NIC bonding setup (with a mix of e1000 and e1000e cards), running LNST's bonding tests. CC: Jay Vosburgh CC: Veaceslav Falico CC: Andy Gospodarek CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson --- drivers/net/bonding/bond_alb.c| 88 +++ drivers/net/bonding/bond_main.c | 73 ++-- drivers/net/bonding/bond_procfs.c | 3 +- include/net/bonding.h | 12 +- 4 files changed, 108 insertions(+), 68 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index c80b023092dd..7d7a3cec149a 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -687,7 +687,8 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond) /* the arp must be sent on the selected rx channel */ tx_slave = rlb_choose_channel(skb, bond); if (tx_slave) - ether_addr_copy(arp->mac_src, tx_slave->dev->dev_addr); + bond_hw_addr_copy(arp->mac_src, tx_slave->dev->dev_addr, + tx_slave->dev->addr_len); netdev_dbg(bond->dev, "Server sent ARP Reply packet\n"); } else if (arp->op_code == htons(ARPOP_REQUEST)) { /* Create an entry in the rx_hashtbl for this client as a @@ -1017,22 +1018,23 @@ static void 
alb_send_learning_packets(struct slave *slave, u8 mac_addr[], rcu_read_unlock(); } -static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[]) +static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[], + unsigned int len) { struct net_device *dev = slave->dev; - struct sockaddr s_addr; + struct sockaddr_storage ss; if (BOND_MODE(slave->bond) == BOND_MODE_TLB) { - memcpy(dev->dev_addr, addr, dev->addr_len); + memcpy(dev->dev_addr, addr, len); return 0; } /* for rlb each slave must have a unique hw mac addresses so that * each slave will receive packets destined to a different mac */ - memcpy(s_addr.sa_data, addr, dev->addr_len); - s_addr.sa_family = dev->type; - if (dev_set_mac_address(dev, &s_addr)) { + memcpy(ss.__data, addr, len); + ss.ss_family = dev->type; + if (dev_set_mac_address(dev, (struct sockaddr *)&ss)) { netdev_err(slave->bond->dev, "dev_set_mac_address of dev %s failed! ALB mode requires that the base driver support setting the hw address also when the network device's interface is open\n", dev->name); return -EOPNOTSUPP; @@ -1046,11 +1048,14 @@ static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[]) */ static void alb_swap_mac_addr(struct slave *slave1, struct slave *slave2) {
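You can check in userspace that struct sockaddr is too small for a 20-octet IPoIB address, and see what a length-aware copy looks like. A minimal sketch: hw_addr_copy(), HW_MAX_ADDR_LEN and the fits_in_*() helpers are hypothetical illustrations of the idea behind bond_hw_addr_copy() and the switch to struct sockaddr_storage, not the bonding code itself. The sizes checked are those of the C library's definitions (14-byte sa_data on Linux).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define HW_MAX_ADDR_LEN 32  /* mirrors the kernel's MAX_ADDR_LEN */

/* Copy exactly `len` octets instead of assuming ETH_ALEN (6). */
static void hw_addr_copy(unsigned char *dst, const unsigned char *src,
			 size_t len)
{
	assert(len <= HW_MAX_ADDR_LEN);
	memcpy(dst, src, len);
}

/* Does a `len`-octet address fit in struct sockaddr's sa_data? */
static int fits_in_sockaddr(size_t len)
{
	return len <= sizeof(((struct sockaddr *)0)->sa_data);
}

/* Does it fit in struct sockaddr_storage after the family field? */
static int fits_in_sockaddr_storage(size_t len)
{
	return len <= sizeof(struct sockaddr_storage) - sizeof(sa_family_t);
}
```

A 6-octet Ethernet address fits in sa_data, but a 20-octet Infiniband address does not, so copying one into a stack sockaddr overruns it. sockaddr_storage has room for anything up to MAX_ADDR_LEN.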
[PATCH net 2/2] tcp: fix reordering SNMP under-counting
Currently the reordering SNMP counters only increase if a connection sees a higher degree than it has previously seen. Reordering whose degree is not greater than the default system threshold is ignored. This significantly under-counts the number of reordering events and falsely conveys that reordering is rare on the network. This patch properly and faithfully records the number of reordering events detected by the TCP stack, just like the comment says "this exciting event is worth to be remembered". Note that even so, TCP still under-estimates the actual number of reordering events, because TCP requires TS options or certain packet sequences to detect reordering (i.e. ACKing a never-retransmitted sequence in recovery or disordered state). Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell Signed-off-by: Soheil Hassas Yeganeh --- net/ipv4/tcp_input.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a75c48f62e27..5bfe17fc8064 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -874,22 +874,11 @@ static void tcp_update_reordering(struct sock *sk, const int metric, const int ts) { struct tcp_sock *tp = tcp_sk(sk); - if (metric > tp->reordering) { - int mib_idx; + int mib_idx; + if (metric > tp->reordering) { tp->reordering = min(sysctl_tcp_max_reordering, metric); - /* This exciting event is worth to be remembered. 8) */ - if (ts) - mib_idx = LINUX_MIB_TCPTSREORDER; - else if (tcp_is_reno(tp)) - mib_idx = LINUX_MIB_TCPRENOREORDER; - else if (tcp_is_fack(tp)) - mib_idx = LINUX_MIB_TCPFACKREORDER; - else - mib_idx = LINUX_MIB_TCPSACKREORDER; - - NET_INC_STATS(sock_net(sk), mib_idx); #if FASTRETRANS_DEBUG > 1 pr_debug("Disorder%d %d %u f%u s%u rr%d\n", tp->rx_opt.sack_ok, inet_csk(sk)->icsk_ca_state, @@ -902,6 +891,18 @@ static void tcp_update_reordering(struct sock *sk, const int metric, } tp->rack.reord = 1; + + /* This exciting event is worth to be remembered.
8) */ + if (ts) + mib_idx = LINUX_MIB_TCPTSREORDER; + else if (tcp_is_reno(tp)) + mib_idx = LINUX_MIB_TCPRENOREORDER; + else if (tcp_is_fack(tp)) + mib_idx = LINUX_MIB_TCPFACKREORDER; + else + mib_idx = LINUX_MIB_TCPSACKREORDER; + + NET_INC_STATS(sock_net(sk), mib_idx); } /* This must be called before lost_out is incremented */ -- 2.12.2.715.g7642488e1d-goog
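The behavioural change in tcp_update_reordering() is only about where the counter increment sits. A hypothetical simulation shows how many events each placement records for the same sequence of detected reordering degrees. struct reo_state and update_reordering() are illustrations, not kernel code, and the starting threshold of 3 and the cap of 300 are assumed values.

```c
#include <assert.h>

/* `reordering` starts at the per-connection threshold; each call
 * reports the reordering degree (metric) just detected. */
struct reo_state {
	int reordering;         /* models tp->reordering */
	int max_reordering;     /* models sysctl_tcp_max_reordering */
	unsigned long mib_old;  /* pre-patch: bumped only when metric grows */
	unsigned long mib_new;  /* post-patch: bumped on every detection */
};

static void update_reordering(struct reo_state *s, int metric)
{
	if (metric > s->reordering) {
		s->reordering = metric < s->max_reordering ?
				metric : s->max_reordering;
		s->mib_old++;   /* old placement, inside the if */
	}
	s->mib_new++;           /* new placement, unconditional */
}
```

With degrees 2, 5, 4, 5, 10 and a starting threshold of 3, the old placement counts only the two increases (to 5 and to 10). The new one counts all five detections.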
[PATCH net 1/2] tcp: fix lost retransmit SNMP under-counting
The lost retransmit SNMP stat under-counts retransmissions that use segment offloading, because it is incremented once per skb rather than once per segment. This patch fixes that so all retransmission-related SNMP counters are consistent. Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell Signed-off-by: Soheil Hassas Yeganeh --- net/ipv4/tcp_recovery.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index 4ecb38ae8504..d8acbd9f477a 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -12,7 +12,8 @@ static void tcp_rack_mark_skb_lost(struct sock *sk, struct sk_buff *skb) /* Account for retransmits that are lost again */ TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS; tp->retrans_out -= tcp_skb_pcount(skb); - NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT); + NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT, + tcp_skb_pcount(skb)); } } -- 2.12.2.715.g7642488e1d-goog
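The fix in tcp_rack_mark_skb_lost() matters because one TSO/GSO skb can carry several segments, and retrans_out is already adjusted by tcp_skb_pcount(). A hypothetical model (struct mock_skb, struct lost_counters) contrasts the old "+1 per skb" accounting with the new "+pcount per skb" accounting:

```c
#include <assert.h>

/* Hypothetical model: one offloaded skb covers `pcount` segments. */
struct mock_skb { int pcount; };

struct lost_counters {
	unsigned long retrans_out_dec;  /* segments removed from retrans_out */
	unsigned long mib_per_skb;      /* old: NET_INC_STATS, +1 per skb */
	unsigned long mib_per_segment;  /* new: NET_ADD_STATS, +pcount */
};

static void mark_retrans_lost(struct lost_counters *c,
			      const struct mock_skb *skb)
{
	assert(skb->pcount > 0);
	c->retrans_out_dec += (unsigned long)skb->pcount;
	c->mib_per_skb += 1;
	c->mib_per_segment += (unsigned long)skb->pcount;
}
```

For skbs carrying 1, 4 and 10 segments, the old counter reports 3 lost retransmits while retrans_out drops by 15. The new counter also reports 15, which is the consistency the patch is after.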
Re: net/sctp: list double add warning in sctp_endpoint_add_asoc
On Wed, Apr 05, 2017 at 01:29:19AM +0800, Xin Long wrote: > On Tue, Apr 4, 2017 at 9:28 PM, Andrey Konovalov > wrote: > > Hi, > > > > I've got the following error report while fuzzing the kernel with syzkaller. > > > > On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5). > > > > A reproducer and .config are attached. > The script is pretty hard to reproduce the issue in my env. I didn't try running it but I also found the reproducer very complicated to follow. Do you have any plans on having some PoC optimizer, so we can have a more readable code? strace is handy for filtering the noise, yes, but sometimes it doesn't cut it. > But there seems a case to cause a use-after-free when out of snd_buf. > > the case is like: > --- > one thread: another thread: > sctp_rcv hold asoc (hold transport) > enqueue the chunk to backlog queue > [refcnt=2] > > sctp_close free assoc > [refcnt=1] > > sctp_sendmsg find asoc > but not hold it > > out of snd_buf > hold asoc, schedule out > [refcnt = 2] > > process backlog and put asoc/transport > [refcnt=1] > > schedule in, put asoc > [refcnt=0] <--- destroyed > > sctp_sendmsg continue It shouldn't be continuing here because sctp_wait_for_sndbuf and sctp_wait_for_connect functions are checking if the asoc is dead already when it schedules in, even though sctp_wait_for_connect return value is ignored and sctp_sendmsg() simply returns after that. Or the checks for dead asocs in there aren't enough somehow. > using asoc, panic
Re: [RFC net-next] bpf: taint loading !is_gpl programs
On 04/04/2017 08:33 PM, Aaron Conole wrote: The eBPF framework is used for more than just socket level filtering. It can also provide tracing, and even change the way packets coming into the system look. Most of the eBPF callable symbols are available to non-gpl programs, and this includes helper functions which modify packets. This allows proprietary eBPF code to link to the kernel and make decisions which can negatively impact network performance. Since the sources for these programs are only available under a proprietary license, it seems better to treat them the same as other proprietary modules: set the system taint flag. An exemption is made for socket-level filters, since they do not really impact networking for the whole kernel. Signed-off-by: Aaron Conole --- kernel/bpf/syscall.c | 5 + 1 file changed, 5 insertions(+) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index ab0cf4c4..1255b51 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -860,6 +860,11 @@ static int bpf_prog_load(union bpf_attr *attr) bpf_prog_kallsyms_add(prog); trace_bpf_prog_load(prog, err); + if (type != BPF_PROG_TYPE_SOCKET_FILTER && !is_gpl && !(err < 0)) { + if (!test_taint(TAINT_PROPRIETARY_MODULE)) + pr_warn("bpf license '%s' taints kernel.\n", license); + add_taint(TAINT_PROPRIETARY_MODULE, LOCKDEP_STILL_OK); + } return err; free_used_maps: Nacked-by: Daniel Borkmann This proposal is completely unreasonable; the purpose of the .gpl_only flags, as agreed upon from the beginning, is that some of the helpers are only available if the program is loaded as GPL, e.g. bpf_ktime_get_ns(), bpf_probe_read(), bpf_probe_write_user(), bpf_trace_printk(), bpf_skb_event_output(), etc. Now, when moving from one kernel version to the next, existing programs would suddenly taint the kernel, which by itself is unacceptable.
There are also many other subsystems that can modify packets, or negatively affect system performance if misconfigured, and which in addition *don't require* a hard capable(CAP_SYS_ADMIN) restriction as such eBPF programs already do; should we perhaps taint them as well? Plus, tracing programs are attached to passively monitor system performance, not even modifying data structures ... The current purpose of .gpl_only is fine as-is, and there's work in progress on a generic dump mechanism that works with all program types to improve the introspection aspect, if that's what you're after. Starting to taint is, in a way, breaking existing applications, and this is not acceptable.
Re: [iproute2 net-next v2 0/3] ip netconf improvements
On 3/23/17 10:51 PM, David Ahern wrote: > Currently, ip netconf only shows data for ipv4 and ipv6 for dumps > and just ipv4 for device requests. Improve the user experience by > using the new kernel patch to dump all address families that have > registered. For example, if mpls_router module is loaded then mpls > values are displayed along with ipv4 and ipv6. > > If the new feature is not supported (new iproute2 on older kernel) > the kernel returns the nlmsg error EOPNOTSUPP which can be trapped > and fallback to existing behavior. > > v2 > - fixed index conversion in patch 3 per nicholas' comment > > David Ahern (3): > netlink: Add flag to suppress print of nlmsg error > ip netconf: Show all address families by default in dumps > ip netconf: show all families on dev request > > include/libnetlink.h | 1 + > ip/ipnetconf.c | 36 +--- > lib/libnetlink.c | 3 ++- > 3 files changed, 28 insertions(+), 12 deletions(-) > Hi Stephen: any comments? are you ok with this change?
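The fallback described in the cover letter can be sketched independently of libnetlink: ask for all families, and if the kernel answers EOPNOTSUPP, revert to the old ipv4 + ipv6 requests. This illustration rests on assumptions. dump_fn, netconf_dump_all() and the two mock kernels are hypothetical and do not reflect ip/ipnetconf.c's actual structure or its netlink message handling.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical stand-in for issuing one RTM_GETNETCONF dump for
 * `family`; returns 0 or the negative errno from the nlmsg error. */
typedef int (*dump_fn)(int family, void *ctx);

/* Try a single AF_UNSPEC dump (all registered families); if the kernel
 * rejects it with EOPNOTSUPP, fall back to separate ipv4 and ipv6 dumps. */
static int netconf_dump_all(dump_fn dump, void *ctx)
{
	int err = dump(AF_UNSPEC, ctx);

	if (err != -EOPNOTSUPP)
		return err;
	err = dump(AF_INET, ctx);
	if (err)
		return err;
	return dump(AF_INET6, ctx);
}

/* Records which families were requested, for the mocks below. */
struct dump_log { int families[4]; int n; };

static void record(struct dump_log *rec, int family)
{
	if (rec->n < 4)
		rec->families[rec->n++] = family;
}

/* Mock "new kernel": accepts an AF_UNSPEC dump. */
static int mock_new_kernel(int family, void *ctx)
{
	record(ctx, family);
	return 0;
}

/* Mock "old kernel": rejects AF_UNSPEC dumps with EOPNOTSUPP. */
static int mock_old_kernel(int family, void *ctx)
{
	record(ctx, family);
	return family == AF_UNSPEC ? -EOPNOTSUPP : 0;
}
```

Against the "new kernel" mock only the AF_UNSPEC request is issued. Against the "old kernel" mock the AF_UNSPEC request fails with EOPNOTSUPP and is followed by AF_INET and AF_INET6 requests. In the real series, the new netlink flag keeps that expected error from being printed.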
Fw: [Bug 195169] New: ip_route_input_noref panic
Begin forwarded message: Date: Fri, 31 Mar 2017 02:54:55 + From: bugzilla-dae...@bugzilla.kernel.org To: step...@networkplumber.org Subject: [Bug 195169] New: ip_route_input_noref panic https://bugzilla.kernel.org/show_bug.cgi?id=195169 Bug ID: 195169 Summary: ip_route_input_noref panic Product: Networking Version: 2.5 Kernel Version: 3.10.103 Hardware: All OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: IPV4 Assignee: step...@networkplumber.org Reporter: panweipi...@163.com Regression: No Created attachment 255655 --> https://bugzilla.kernel.org/attachment.cgi?id=255655&action=edit ip_route_input_noref We hit a kernel panic on kernel 3.10.103. Since we do not configure kdump, I can only take a picture after it hangs.
Re: [PATCH] i40e: limit client interface to X722 hardware
On 04.04.2017 18:56, Or Gerlitz wrote: > On Tue, Apr 4, 2017 at 5:34 PM, Stefan Assmann wrote: >> The client interface is meant for X722 iWARP support. Modprobing i40iw >> on systems with X710/XL710 NICs currently may crash the system. > > just curious may or crash? and why? The backtrace I got was not really conclusive. The code is not meant to be run on that hardware so I didn't bother to dig deeper. Stefan
[PATCH] i40e: only register client on iWarp-capable devices
The client interface is only intended for use on devices that support iWarp. Only register the PF with the client interface if this is the case. This fixes a panic when loading i40iw on X710 devices. Signed-off-by: Mitch Williams Reported-by: Stefan Assmann --- drivers/net/ethernet/intel/i40e/i40e_main.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 87d99fa..5e0e44e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -11828,10 +11828,12 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent) round_jiffies(jiffies + pf->service_timer_period)); /* add this PF to client device list and launch a client service task */ - err = i40e_lan_add_device(pf); - if (err) - dev_info(&pdev->dev, "Failed to add PF to client API service list: %d\n", -err); + if (pf->flags & I40E_FLAG_IWARP_ENABLED) { + err = i40e_lan_add_device(pf); + if (err) + dev_info(&pdev->dev, "Failed to add PF to client API service list: %d\n", -err); + } #define PCI_SPEED_SIZE 8 #define PCI_WIDTH_SIZE 8 @@ -12013,10 +12015,11 @@ static void i40e_remove(struct pci_dev *pdev) i40e_vsi_release(pf->vsi[pf->lan_vsi]); /* remove attached clients */ - ret_code = i40e_lan_del_device(pf); - if (ret_code) { - dev_warn(&pdev->dev, "Failed to delete client device: %d\n", -ret_code); + if (pf->flags & I40E_FLAG_IWARP_ENABLED) { + ret_code = i40e_lan_del_device(pf); + if (ret_code) + dev_warn(&pdev->dev, "Failed to delete client device: %d\n", -ret_code); } /* shutdown and destroy the HMC */ -- 2.7.4
Re: [PATCH 4/4] net: stmmac: adding multiple napi mechanism
On Tue, Apr 04, 2017 at 06:54:27PM +0100, Joao Pinto wrote: [...] > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c [...] > @@ -1259,7 +1317,6 @@ static int init_dma_tx_desc_rings(struct net_device > *dev) > /* TX INITIALIZATION */ > for (i = 0; i < DMA_TX_SIZE; i++) { > struct dma_desc *p; > - > if (priv->extend_desc) > p = &((tx_q->dma_etx + i)->basic); > else I think checkpatch would complain about this now because we're supposed to separate variable declarations from code by a single blank line. > - netif_napi_add(ndev, &priv->napi, stmmac_poll, 64); > + ret = alloc_dma_desc_resources(priv); > + if (ret < 0) { > + netdev_err(priv->dev, "%s: DMA descriptors allocation failed\n", > +__func__); > + goto init_dma_error; > + } > + > + ret = init_dma_desc_rings(priv->dev, GFP_KERNEL); > + if (ret < 0) { > + netdev_err(priv->dev, "%s: DMA descriptors initialization > failed\n", > +__func__); > + goto init_dma_error; > + } > + > + for (queue = 0; queue < priv->plat->rx_queues_to_use; queue++) { > + struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue]; > + > + netif_napi_add(ndev, &rx_q->napi, stmmac_poll, > +(8 * priv->plat->rx_queues_to_use)); > + } Why is this moving to ->probe() now? This works on Tegra186, so: Reviewed-by: Thierry Reding signature.asc Description: PGP signature
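For reference, the layout checkpatch.pl asks for (it warns about a missing blank line after declarations) looks like this; the descriptor types below are simplified, hypothetical stand-ins for the stmmac ones, not the driver's real definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical descriptor layouts (not the real stmmac ones). */
struct dma_desc {
	unsigned int des0;
};

struct dma_extended_desc {
	struct dma_desc basic;
	unsigned int des4;
};

static struct dma_desc *tx_desc_at(bool extend_desc,
				   struct dma_extended_desc *etx,
				   struct dma_desc *tx, size_t i)
{
	struct dma_desc *p;

	/* One blank line separates the declaration above from this code. */
	assert(extend_desc ? etx != NULL : tx != NULL);
	if (extend_desc)
		p = &(etx + i)->basic;
	else
		p = tx + i;

	return p;
}
```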
Re: [PATCH 2/4] net: stmmac: adding multiple buffers for rx
On Tue, Apr 04, 2017 at 06:54:25PM +0100, Joao Pinto wrote: [...] > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c [...] > @@ -3402,6 +3474,9 @@ static irqreturn_t stmmac_interrupt(int irq, void > *dev_id) > > if (priv->synopsys_id >= DWMAC_CORE_4_00) { > for (queue = 0; queue < queues_count; queue++) { > + struct stmmac_rx_queue *rx_q = > + &priv->rx_queue[queue]; Found one more: the indentation here looks wrong. I think it's more idiomatic to indent by at least a tab in such cases. > + > status |= > priv->hw->mac->host_mtl_irq_status(priv->hw, > queue); This is becoming quite unwieldy because of the indentation levels. Maybe this could be split out into a separate function. Could be a separate patch, though. Thierry
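One way to do the split suggested above, sketched in userspace with hypothetical, simplified types (`struct priv`, `hw_status[]`, `read_mtl_irq_status()` and the `pending` counter are illustrative, not stmmac fields or functions). Moving the per-queue work into a helper keeps the loop in the interrupt handler to a single line and removes the need to wrap the declaration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES 8

/* Hypothetical, simplified stand-ins; not the real stmmac types. */
struct rx_queue {
	uint32_t pending;	/* illustrative per-queue bookkeeping */
};

struct priv {
	uint32_t queues_count;
	uint32_t hw_status[MAX_QUEUES];	/* simulated per-queue MTL status */
	struct rx_queue rx_queue[MAX_QUEUES];
};

/* Stand-in for priv->hw->mac->host_mtl_irq_status(). */
static uint32_t read_mtl_irq_status(const struct priv *priv, uint32_t queue)
{
	return priv->hw_status[queue];
}

/* All per-queue work lives here, at a comfortable indentation level. */
static uint32_t handle_queue_irq(struct priv *priv, uint32_t queue)
{
	struct rx_queue *rx_q = &priv->rx_queue[queue];
	uint32_t status = read_mtl_irq_status(priv, queue);

	if (status)
		rx_q->pending++;

	return status;
}

/* The loop in the interrupt handler shrinks to one line per queue. */
static uint32_t handle_mtl_irqs(struct priv *priv)
{
	uint32_t status = 0;
	uint32_t queue;

	assert(priv->queues_count <= MAX_QUEUES);
	for (queue = 0; queue < priv->queues_count; queue++)
		status |= handle_queue_irq(priv, queue);

	return status;
}
```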
Re: [PATCH 3/4] net: stmmac: adding multiple buffers for TX
On Tue, Apr 04, 2017 at 06:54:26PM +0100, Joao Pinto wrote: > This patch adds the structure stmmac_tx_queue which contains > tx queues specific data (previously in stmmac_priv). > > Signed-off-by: Joao Pinto > --- > drivers/net/ethernet/stmicro/stmmac/chain_mode.c | 38 +- > drivers/net/ethernet/stmicro/stmmac/ring_mode.c | 46 +- > drivers/net/ethernet/stmicro/stmmac/stmmac.h | 26 +- > drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 521 > +- > 4 files changed, 375 insertions(+), 256 deletions(-) Looks good to me: Reviewed-by: Thierry Reding And works fine on Tegra186, so: Tested-by: Thierry Reding
Re: [PATCH 2/4] net: stmmac: adding multiple buffers for rx
One more nit: subject should say "... for RX" for consistency with patch 3/4. Thierry
Re: [PATCH 2/4] net: stmmac: adding multiple buffers for rx
On Tue, Apr 04, 2017 at 06:54:25PM +0100, Joao Pinto wrote: [...] > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c [...] > static void stmmac_display_rx_rings(struct stmmac_priv *priv) > { > + u32 rx_cnt = priv->plat->rx_queues_to_use; > void *head_rx; > + u32 queue; > > - if (priv->extend_desc) > - head_rx = (void *)priv->dma_erx; > - else > - head_rx = (void *)priv->dma_rx; > + /* Display RX rings */ > + for (queue = 0; queue < rx_cnt; queue++) { > + struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue]; > > - /* Display RX ring */ > - priv->hw->desc->display_ring(head_rx, DMA_RX_SIZE, true); > + pr_info("\tRX Queue %d rings\n", queue); Nit: %u is the right specifier for unsigned integers. > @@ -1107,46 +1135,65 @@ static int init_dma_rx_desc_rings(struct net_device > *dev, gfp_t flags) [...] > err_init_rx_buffers: > - while (--i >= 0) > - stmmac_free_rx_buffers(priv, i); > + while (queue-- >= 0) { Why are you switching to postfix decrement here? Not only is it inconsistent with the prefix decrement below, I think this also gives you a wrong result. Consider what happens if queue == 0. The condition evaluates to true, but within the loop the queue variable will wrap to ~0 and probably crash stmmac_free_rx_buffers(). Other than that, this looks fine, so with the above fixed: Reviewed-by: Thierry Reding Also works on Tegra186, so: Tested-by: Thierry Reding
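The underlying trap is that, with an unsigned counter, any `x >= 0` test is always true, whichever decrement form is used. Below is a minimal sketch of one underflow-safe unwind. It assumes u32 counters (as `queue` is in `stmmac_display_rx_rings()` above) and uses a hypothetical `allocated[][]` table and `free_rx_buffer()` helper in place of the real `stmmac_init_rx_buffers()`/`stmmac_free_rx_buffers()`.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QUEUES 4
#define RING_SIZE 8

/* Hypothetical bookkeeping: allocated[q][i] is 1 while buffer i of
 * queue q is held. Not the real stmmac data structures. */
static int allocated[NUM_QUEUES][RING_SIZE];

static void free_rx_buffer(uint32_t queue, uint32_t i)
{
	assert(queue < NUM_QUEUES && i < RING_SIZE);	/* catches a wrap to ~0 */
	assert(allocated[queue][i]);
	allocated[queue][i] = 0;
}

/* Fill every ring, pretending allocation fails at (fail_q, fail_i).
 * On failure, release everything allocated so far, in reverse order. */
static int init_rx_rings(uint32_t fail_q, uint32_t fail_i)
{
	uint32_t queue, i;

	for (queue = 0; queue < NUM_QUEUES; queue++) {
		for (i = 0; i < RING_SIZE; i++) {
			if (queue == fail_q && i == fail_i)
				goto err_init_rx_buffers;
			allocated[queue][i] = 1;
		}
	}
	return 0;

err_init_rx_buffers:
	for (;;) {
		/* Frees i-1 down to 0; i wraps afterwards but is not reused. */
		while (i-- > 0)
			free_rx_buffer(queue, i);
		if (queue == 0)	/* test before decrementing: no wrap */
			break;
		queue--;
		i = RING_SIZE;	/* earlier queues were fully populated */
	}
	return -1;
}
```

Checking `queue == 0` before decrementing is what prevents the wrap to ~0 described in the review; switching the loop variable to a signed type would be the other common fix.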
Re: [PATCH 1/4] net: stmmac: break some functions into RX and TX scopes
On Tue, Apr 04, 2017 at 06:54:24PM +0100, Joao Pinto wrote: > This patch breaks several functions into RX and TX scopes, which > will be useful when adding multiple buffers mechanism. > > Signed-off-by: Joao Pinto > --- > drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 350 > +- > 1 file changed, 268 insertions(+), 82 deletions(-) A couple of small nits below. > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c [...] > @@ -924,16 +941,16 @@ static int stmmac_set_bfsize(int mtu, int bufsize) > } > > /** > - * stmmac_clear_descriptors - clear descriptors > + * stmmac_clear_rx_descriptors - clear RX descriptors > * @priv: driver private structure > - * Description: this function is called to clear the tx and rx descriptors > + * Description: this function is called to clear the rx descriptors You seem to be transitioning to "RX" and "TX" everywhere, maybe do the same in this comment for consistency? Also, on a general note: there's no need for "Description:" here. The kerneldoc format mandates that you leave a blank line after the block of parameter descriptions, and the paragraph that follows becomes the description. I know that these are static functions and are therefore not parsed by kerneldoc, but since you already use the syntax anyway, you might as well get it right. > * in case of both basic and extended descriptors are used. > */ > -static void stmmac_clear_descriptors(struct stmmac_priv *priv) > +static void stmmac_clear_rx_descriptors(struct stmmac_priv *priv) > { > int i; This could be unsigned. 
> > - /* Clear the Rx/Tx descriptors */ > + /* Clear the RX descriptors */ > for (i = 0; i < DMA_RX_SIZE; i++) > if (priv->extend_desc) > priv->hw->desc->init_rx_desc(&priv->dma_erx[i].basic, > @@ -943,6 +960,19 @@ static void stmmac_clear_descriptors(struct stmmac_priv > *priv) > priv->hw->desc->init_rx_desc(&priv->dma_rx[i], >priv->use_riwt, priv->mode, >(i == DMA_RX_SIZE - 1)); > +} > + > +/** > + * stmmac_clear_tx_descriptors - clear tx descriptors > + * @priv: driver private structure > + * Description: this function is called to clear the tx descriptors > + * in case of both basic and extended descriptors are used. > + */ > +static void stmmac_clear_tx_descriptors(struct stmmac_priv *priv) > +{ > + int i; Same here. There are a couple of other such occurrences throughout the file. This already exists in many places in the driver, so I don't think this needs to be changed. Or at least it could be a follow-up patch. > + > + /* Clear the TX descriptors */ > for (i = 0; i < DMA_TX_SIZE; i++) > if (priv->extend_desc) > priv->hw->desc->init_tx_desc(&priv->dma_etx[i].basic, > @@ -955,6 +985,21 @@ static void stmmac_clear_descriptors(struct stmmac_priv > *priv) > } > > /** > + * stmmac_clear_descriptors - clear descriptors > + * @priv: driver private structure > + * Description: this function is called to clear the tx and rx descriptors > + * in case of both basic and extended descriptors are used. > + */ > +static void stmmac_clear_descriptors(struct stmmac_priv *priv) > +{ > + /* Clear the RX descriptors */ > + stmmac_clear_rx_descriptors(priv); > + > + /* Clear the TX descriptors */ > + stmmac_clear_tx_descriptors(priv); > +} > + > +/** > * stmmac_init_rx_buffers - init the RX descriptor buffer. 
> * @priv: driver private structure > * @p: descriptor pointer > @@ -996,6 +1041,11 @@ static int stmmac_init_rx_buffers(struct stmmac_priv > *priv, struct dma_desc *p, > return 0; > } > > +/** > + * stmmac_free_rx_buffers - free RX dma buffers > + * @priv: private structure > + * @i: buffer index. If this operates on a single buffer, as specified by the buffer index, maybe this should be named singular stmmac_free_rx_buffer()? > + */ > static void stmmac_free_rx_buffers(struct stmmac_priv *priv, int i) The index could be unsigned. > { > if (priv->rx_skbuff[i]) { > @@ -1007,14 +1057,42 @@ static void stmmac_free_rx_buffers(struct stmmac_priv > *priv, int i) > } > > /** > - * init_dma_desc_rings - init the RX/TX descriptor rings > + * stmmac_free_tx_buffers - free RX dma buffers > + * @priv: private structure > + * @i: buffer index. > + */ > +static void stmmac_free_tx_buffers(struct stmmac_priv *priv, int i) > +{ > + if (priv->tx_skbuff_dma[i].buf) { > + if (priv->tx_skbuff_dma[i].map_as_page) > + dma_unmap_page(priv->device, > +priv->tx_skbuff_dma[i].buf, > +priv->tx_skbuff_dma[i].len, > +DMA_TO_DEVICE); > + else > + dma_unmap_single(priv->dev
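Here is a sketch of the comment layout described above, applied to a simplified, hypothetical RX clear routine. `struct dma_desc` and its `own`/`last` fields are illustrative and do not match the stmmac descriptor format. The parameter block is followed by an empty comment line, the next paragraph becomes the description, and the index is unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_RX_SIZE 4

/* Hypothetical, simplified RX descriptor; not the real stmmac layout. */
struct dma_desc {
	unsigned int own;
	bool last;
};

/**
 * clear_rx_descriptors - clear RX descriptors
 * @ring: descriptor ring to clear
 *
 * Called to clear the RX descriptors. In kerneldoc, this paragraph after
 * the empty line is the description, so no "Description:" label is needed.
 */
static void clear_rx_descriptors(struct dma_desc *ring)
{
	unsigned int i;	/* an index is never negative */

	assert(ring != NULL);
	for (i = 0; i < DMA_RX_SIZE; i++) {
		ring[i].own = 1;	/* hypothetical: hand back to hardware */
		ring[i].last = (i == DMA_RX_SIZE - 1);
	}
}
```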
Re: net/ipv4: use-after-free in ipv4_mtu
On Tue, Apr 4, 2017 at 7:50 AM, Andrey Konovalov wrote: > > Hi, > > I've got the following error report while fuzzing the kernel with syzkaller. > > On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5). > > Unfortunately it's not reproducible. > > == > BUG: KASAN: use-after-free in dst_metric_raw include/net/dst.h:176 > [inline] at addr 88003d6a965c > BUG: KASAN: use-after-free in ipv4_mtu+0x3f2/0x4b0 > net/ipv4/route.c:1270 at addr 88003d6a965c > Read of size 4 by task syz-executor3/20611 > CPU: 3 PID: 20611 Comm: syz-executor3 Not tainted 4.11.0-rc5+ #199 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:16 [inline] > dump_stack+0x292/0x398 lib/dump_stack.c:52 > kasan_object_err+0x1c/0x70 mm/kasan/report.c:164 > print_address_description mm/kasan/report.c:202 [inline] > kasan_report_error mm/kasan/report.c:291 [inline] > kasan_report+0x252/0x510 mm/kasan/report.c:347 > __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367 > dst_metric_raw include/net/dst.h:176 [inline] > ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270 > dst_mtu include/net/dst.h:221 [inline] > do_ip_getsockopt+0x71d/0x2290 net/ipv4/ip_sockglue.c:1433 > ip_getsockopt+0x90/0x230 net/ipv4/ip_sockglue.c:1578 > tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3131 > sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2709 > SYSC_getsockopt net/socket.c:1829 [inline] > SyS_getsockopt+0x252/0x390 net/socket.c:1811 > entry_SYSCALL_64_fastpath+0x1f/0xc2 > RIP: 0033:0x4458d9 > RSP: 002b:7fe87f452b58 EFLAGS: 0286 ORIG_RAX: 0037 > RAX: ffda RBX: 0005 RCX: 004458d9 > RDX: 000e RSI: RDI: 0005 > RBP: 006e0020 R08: 20db6000 R09: > R10: 207e8000 R11: 0286 R12: 00708150 > R13: 20db8000 R14: 1000 R15: 0003 > Object at 88003d6a9658, in cache kmalloc-64 size: 64 > Allocated: > PID = 20110 > save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 > save_stack+0x43/0xd0 mm/kasan/kasan.c:513 > set_track mm/kasan/kasan.c:525 [inline] > 
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616 > kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745 > kmalloc include/linux/slab.h:490 [inline] > kzalloc include/linux/slab.h:663 [inline] > fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040 > fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221 > ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597 > inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882 > sctp: [Deprecated]: syz-executor0 (pid 20638) Use of int in max_burst > socket option. > Use struct sctp_assoc_value instead > sock_do_ioctl+0x65/0xb0 net/socket.c:906 > sock_ioctl+0x28f/0x440 net/socket.c:1004 > vfs_ioctl fs/ioctl.c:45 [inline] > do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685 > SYSC_ioctl fs/ioctl.c:700 [inline] > SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 > entry_SYSCALL_64_fastpath+0x1f/0xc2 > Freed: > PID = 4439 > save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 > save_stack+0x43/0xd0 mm/kasan/kasan.c:513 > set_track mm/kasan/kasan.c:525 [inline] > kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589 > slab_free_hook mm/slub.c:1357 [inline] > slab_free_freelist_hook mm/slub.c:1379 [inline] > slab_free mm/slub.c:2961 [inline] > kfree+0xe8/0x2b0 mm/slub.c:3882 > free_fib_info_rcu+0x4ba/0x5e0 net/ipv4/fib_semantics.c:218 > __rcu_reclaim kernel/rcu/rcu.h:118 [inline] > rcu_do_batch.isra.64+0x947/0xcc0 kernel/rcu/tree.c:2879 > invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline] > __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline] > rcu_process_callbacks+0x2cc/0xb90 kernel/rcu/tree.c:3126 > __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 > Memory state around the buggy address: > 88003d6a9500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > 88003d6a9580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > >88003d6a9600: fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb > ^ > 88003d6a9680: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc > 88003d6a9700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > == Thanks for the report Andrey Looking at fib->fib_metrics, 
I fail to understand how the following can work: dst_init_metrics(&rt->dst, fi->fib_metrics, true); When fi->fib_metrics is _not_ dst_default_metrics, it can be freed when the fib is deleted while dst(s) still hold the read-only pointer. The RCU grace period before fi->fib_metrics is freed does not help, since a cached dst can outlive that grace period. Without refcounts, it looks like we need to copy the fib_metrics.
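Of the two options mentioned (refcounting or copying), the refcount approach can be sketched in userspace as below. Everything in it is hypothetical (`struct metrics`, `metrics_get()`/`metrics_put()`, `dst_attach_metrics()`), and it is not presented as the fix that was eventually merged. The counter is a plain integer that is only safe in single-threaded code; kernel code would need an atomic refcount. Each dst takes its own reference, so deleting the route drops only the route's reference.

```c
#include <assert.h>
#include <stdlib.h>

#define METRICS_MAX 16

/* Hypothetical refcounted metrics block (userspace model, not kernel code).
 * The plain counter is only safe single-threaded. */
struct metrics {
	unsigned int refcnt;
	unsigned int vals[METRICS_MAX];
};

struct route { struct metrics *m; };		/* owner, e.g. a fib_info */
struct cached_dst { struct metrics *m; };	/* user of the same block */

static struct metrics *metrics_alloc(void)
{
	struct metrics *m = calloc(1, sizeof(*m));

	if (m)
		m->refcnt = 1;	/* reference held by the creator */
	return m;
}

static void metrics_get(struct metrics *m)
{
	m->refcnt++;
}

static void metrics_put(struct metrics *m)
{
	assert(m->refcnt > 0);
	if (--m->refcnt == 0)
		free(m);
}

/* Unlike storing a bare pointer, the dst takes its own reference. */
static void dst_attach_metrics(struct cached_dst *dst, struct metrics *m)
{
	metrics_get(m);
	dst->m = m;
}

static void dst_detach_metrics(struct cached_dst *dst)
{
	metrics_put(dst->m);
	dst->m = NULL;
}

static void route_delete(struct route *rt)
{
	metrics_put(rt->m);	/* drops only the route's own reference */
	rt->m = NULL;
}
```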
Re: pull-request: wireless-drivers-next 2017-04-03
From: Kalle Valo Date: Tue, 04 Apr 2017 20:48:35 +0300 > David Miller writes: > >> From: Kalle Valo >> Date: Mon, 03 Apr 2017 14:26:10 +0300 >> >>> here are a few really small fixes. I'm hoping this will be the last pull request >>> for 4.11. >>> >>> Please let me know if there are any problems. >> >> Pulled, thanks. >> >> But I will warn you, you say fixes, but your Subject line and >> GIT tag say "-next" so I pulled it into net-next. > > Sorry, I used the wrong pull request template and that's why I had the > wrong subject in this pull request. So actually this was supposed to be > for net, not net-next. Any chance you could also pull this to net so > that we can still get the fixes to 4.11? Sure, done.
[RFC net-next] bpf: taint loading !is_gpl programs
The eBPF framework is used for more than just socket level filtering. It can also provide tracing, and even change the way packets coming into the system look. Most of the eBPF callable symbols are available to non-gpl programs, and this includes helper functions which modify packets. This allows proprietary eBPF code to link to the kernel and make decisions which can negatively impact network performance. Since the sources for these programs are only available under a proprietary license, it seems better to treat them the same as other proprietary modules: set the system taint flag. An exemption is made for socket-level filters, since they do not really impact networking for the whole kernel. Signed-off-by: Aaron Conole --- kernel/bpf/syscall.c | 5 + 1 file changed, 5 insertions(+) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index ab0cf4c4..1255b51 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -860,6 +860,11 @@ static int bpf_prog_load(union bpf_attr *attr) bpf_prog_kallsyms_add(prog); trace_bpf_prog_load(prog, err); + if (type != BPF_PROG_TYPE_SOCKET_FILTER && !is_gpl && !(err < 0)) { + if (!test_taint(TAINT_PROPRIETARY_MODULE)) + pr_warn("bpf license '%s' taints kernel.\n", license); + add_taint(TAINT_PROPRIETARY_MODULE, LOCKDEP_STILL_OK); + } return err; free_used_maps: -- 2.9.3
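The following is a userspace model of the check the patch adds. The names are hypothetical stand-ins for `test_taint()`, `add_taint()` and the BPF program types. In the patch, `!(err < 0)` is simply `err >= 0`. Because the warning is gated on the taint bit, only the first proprietary load prints it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; the patch uses test_taint()/add_taint()
 * with TAINT_PROPRIETARY_MODULE and the real BPF_PROG_TYPE_* values. */
enum prog_type { PROG_SOCKET_FILTER, PROG_SCHED_CLS, PROG_KPROBE };

#define TAINT_PROPRIETARY (1UL << 0)

static unsigned long taint_mask;
static int warnings_printed;

static void maybe_taint(enum prog_type type, bool is_gpl, int err,
			const char *license)
{
	/* Exempt: socket filters, GPL programs, and failed loads. */
	if (type == PROG_SOCKET_FILTER || is_gpl || err < 0)
		return;

	assert(license != NULL);
	/* Warn only on the first taint; the taint bit itself is sticky. */
	if (!(taint_mask & TAINT_PROPRIETARY)) {
		fprintf(stderr, "bpf license '%s' taints kernel.\n", license);
		warnings_printed++;
	}
	taint_mask |= TAINT_PROPRIETARY;
}
```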
Re: [PATCH v4 2/2] can: spi: hi311x: Add Holt HI-311x CAN driver
On 04/04/2017 11:34 AM, Marc Kleine-Budde wrote: > On 03/24/2017 06:20 PM, Akshay Bhat wrote: >> Hi Marc, >> >> On 03/17/2017 05:10 PM, Akshay Bhat wrote: >>> This patch adds support for the Holt HI-311x CAN controller. The HI311x >>> CAN controller is capable of transmitting and receiving standard data >>> frames, extended data frames and remote frames. The HI311x interfaces >>> with the host over SPI. >>> >>> Datasheet: www.holtic.com/documents/371-hi-3110_v-rev-jpdf.do >>> >>> Signed-off-by: Akshay Bhat >>> --- >>> >> >> If there are no further review comments, can this series be applied to >> can-next or does it need to wait for the next kernel release cycle (4.13)? > > The driver doesn't check if the workqueue allocation is successful, > I've squashed this patch: > Thanks Marc, appreciate it. The squashed patch looks good.
Re: [PATCH] net: netfilter: Use seq_puts()/seq_putc() where possible
On Wed, Mar 29, 2017 at 03:25:17AM +0530, simran singhal wrote: > For string without format specifiers, use seq_puts(). For > seq_printf("\n"), use seq_putc('\n'). > > Signed-off-by: simran singhal > --- > net/netfilter/ipvs/ip_vs_ctl.c | 8 Simran, I would be happy to pick up the IPVS version if it was posted as a separate patch. Alternatively, Pablo, if you would like to take this patch feel free to add: Acked-by: Simon Horman
Re: [PATCH] net: netfilter: Replace explicit NULL comparison with ! operator
On Tue, Apr 04, 2017 at 01:41:11PM -0400, Simon Horman wrote: > On Wed, Mar 29, 2017 at 03:45:01PM +0530, Arushi Singhal wrote: > > Replace explicit NULL comparison with ! operator to simplify code. > > > > Signed-off-by: Arushi Singhal > > --- > > net/netfilter/ipvs/ip_vs_ctl.c | 8 ++--- > > net/netfilter/ipvs/ip_vs_proto.c | 8 ++--- > > I count 18 instances of "!= NULL" in net/netfilter/ipvs/ip_vs_proto but this > patch only seems to update 8 of them. I would prefer to fix all or none of > them. Agreed. Please address all instances and resubmit.
[PATCH 0/4 net-next] net: stmmac: adding multiple buffers
This patch set adds multiple buffers to stmmac, split into smaller patches to make debugging problems easier. I kindly ask people to test this series on their hardware to check that everything is functional. Thank you. Joao Pinto (4): net: stmmac: break some functions into RX and TX scopes net: stmmac: adding multiple buffers for rx net: stmmac: adding multiple buffers for TX net: stmmac: adding multiple napi mechanism drivers/net/ethernet/stmicro/stmmac/chain_mode.c | 45 +- drivers/net/ethernet/stmicro/stmmac/ring_mode.c | 46 +- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 49 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1295 ++--- 4 files changed, 969 insertions(+), 466 deletions(-) -- 2.9.3