date:20170404

Re: [PATCH] net: thunderx: Switch to pci_alloc_irq_vectors

2017-04-04 Thread Thanneeru Srinivasulu

On Wed, Apr 5, 2017 at 11:57 AM, Christoph Hellwig  wrote:
> On Tue, Apr 04, 2017 at 02:59:06PM +0530, dev.srinivas...@gmail.com wrote:
>> From: Thanneeru Srinivasulu 
>>
>> Remove deprecated pci_enable_msix API in favour of its
>> successor pci_alloc_irq_vectors.
>>
>> Signed-off-by: Thanneeru Srinivasulu 
>> Signed-off-by: Sunil Goutham 
>
> Looks good.
>
> Are you fine with me queueing this up together with the other
> pci_enable_msix() removal patches?

No issues, thanks.

Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts

2017-04-04 Thread Benjamin Herrenschmidt

On Tue, 2017-04-04 at 23:02 -0700, Florian Fainelli wrote:

> > We don't necessarily have a phydev attached when using NC-SI, so it was
> > easier to have the core code path not have to go fishing for those
> > settings in different places based on whether we're using NC-SI or not.
> 
> Oh right, I missed that part. Is there a reason why NC-SI does not have
> a PHY device attached? If not, could you somehow model the link using a
> fixed PHY (which appears to Linux as a normal phy_device) just to keep
> things simple.

Hrm ... maybe another day if you don't mind ;-)

First, NC-SI isn't really a PHY, it's a cross-over RMII connection
to another NIC.

Now we could make it a phydev using a "fixed" PHY I suppose, that just
"represents" the other end. That would be a way to do it. It would need
to have the link permanently up however (see below).

That said I do want to tackle making it some kind of pseudo-PHY that
actually reflects the state of the remote end (especially the link
state, ie. up/down).

However there are a couple of issues to tackle if we do that. Well
mostly one annoying one:

NC-SI needs to talk to the remote NIC via specific ethernet frames.

With the current link watch code however, if we reflect the remote link
to the local NIC link via netif_carrier_on/off, we end up deactivating
the device on link off and thus preventing the NC-SI stack from talking
to the peer NIC at all.

I thought a while ago we could add some dev flag to prevent the link
watch from doing that, but never got to look into it myself and
apparently neither did Gavin.

So yes, those are worthwhile improvements and I can probably tackle
them once I've unpiled a dozen other train wrecks from my plate ;)
However I'd like to not block this series further since it's not
actually making things any worse than they are.

> > > - the need to reset the HW during link changes is just ... well too bad
> > 
> > Yup but there's little choice. The HW wants it. I don't see any real
> > point in optimizing that path mind you. Losing a few packets around
> > a link change isn't going to hurt and it keeps the code a lot simpler
> > by having a single "re-init" path.
> 
> I was just merely trying to say nicely: what a nicely broken piece of HW
> (there were other adjectives coming to mind), and I do understand the pain.

:-) At least I got a register spec (and little more) :-)

It looks like those Aspeed BMCs are the only game in town for BMC chips
these days and they use that "interesting" IP block from Faraday so
this is probably here to stay, at least for a while.

Another "interesting" attribute of that piece of c^Hhw is its handling
of receive descriptors.

It doesn't "count" how many are free. It has to constantly "read" the
head descriptor in the RX ring to check the own bit. So you have to
setup a HW timer for the chip to go "poll" on your memory. It's pretty
insane. At least for TX there's an MMIO you can poke to tell it to go
fetch more. There's sort-of one for RX but it doesn't seem to do what
you would expect, or I did something wrong when playing with it.

It's not like it would have been hard to have a counter, which is
incremented by writing a value to a register so Linux can "provide"
descriptors by writing the number freed in there.

So the chip never really knows how many free descriptors it has which
also means it cannot do flow control based on that, only on the FIFO
threshold. With only a 2K FIFO that's... interesting.

Anyway, it sort-of works. Without my patches I maxed out at about
80Mbit/s iperf on a gigabit link with the AST2500 eval board (ARM11
800MHz base). With my patches I get to about 400Mbit/s.

Cheers.
Ben.
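
For illustration, the "counter register" scheme described above can be modelled in a few lines of plain C. This is only a userspace sketch of the idea, not FTGMAC100 behaviour: the structure, the function names, the ring size and the low-watermark parameter are all hypothetical. The driver "posts" freed descriptors by adding to a count, and the device side always knows exactly how many buffers it owns, so it could base flow control on that number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_RING_ENTRIES 8u /* hypothetical ring size, power of two */

/* Hypothetical device-side state; not the FTGMAC100 register layout. */
struct rx_credit_model {
	uint32_t free_count; /* descriptors currently owned by the NIC */
	uint32_t head;       /* next descriptor the NIC will fill */
};

/* Driver side: models a write of 'n' to a "descriptors added" register. */
static void rx_credit_post(struct rx_credit_model *m, uint32_t n)
{
	assert(m->free_count + n <= RX_RING_ENTRIES);
	m->free_count += n;
}

/* Device side: consume one descriptor for an incoming frame, if any. */
static bool rx_credit_receive(struct rx_credit_model *m)
{
	if (m->free_count == 0)
		return false; /* no buffer available: the frame would be dropped */
	m->free_count--;
	m->head = (m->head + 1) & (RX_RING_ENTRIES - 1);
	return true;
}

/* Device side: with an exact count, a pause decision needs no memory poll. */
static bool rx_credit_should_pause(const struct rx_credit_model *m,
				   uint32_t low_watermark)
{
	return m->free_count <= low_watermark;
}
```

The point of the model is that rx_credit_receive() never has to read descriptor memory to learn whether a buffer is free, which is exactly what the own-bit polling described above cannot avoid.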

Re: [PATCH] net: thunderx: Switch to pci_alloc_irq_vectors

2017-04-04 Thread Christoph Hellwig

On Tue, Apr 04, 2017 at 02:59:06PM +0530, dev.srinivas...@gmail.com wrote:
> From: Thanneeru Srinivasulu 
> 
> Remove deprecated pci_enable_msix API in favour of its
> successor pci_alloc_irq_vectors.
> 
> Signed-off-by: Thanneeru Srinivasulu 
> Signed-off-by: Sunil Goutham 

Looks good.

Are you fine with me queueing this up together with the other
pci_enable_msix() removal patches?

Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage

2017-04-04 Thread Mike Galbraith

On Tue, 2017-04-04 at 22:25 -0700, Cong Wang wrote:
> On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith  wrote:
> > -   while (some_qdisc_is_busy(dev))
> > -   yield();
> > +   swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1);
> >  }
> 
> I don't see why this is an improvement even if I don't care about the
> hardcoded timeout for now... Why the scheduler can make a better
> decision with swait_event_timeout() than with cond_resched()?

Because sleeping gets you out of the way?  There is no other decision
the scheduler can make while a SCHED_FIFO task is trying to yield when
it is the one and only task at its priority.  The scheduler is doing
exactly what it is supposed to do, problem is people calling yield()
tend to think it does something it does not do, which is why it is
decorated with "if you think you want yield(), think again"

Yes, yield semantics suck rocks, basically don't exist.  Hop in your
time machine and slap whoever you find claiming responsibility :)

-Mike
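
A userspace analogy may make the distinction concrete. The sketch below is not kernel code and does not use the kernel's swait API. It assumes POSIX nanosleep(), and the names wait_while_busy() and countdown_busy() are hypothetical. It polls a predicate but blocks between checks, so the caller actually leaves the CPU instead of spinning on a yield that has nobody to yield to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate type standing in for some_qdisc_is_busy(). */
typedef bool (*busy_fn)(void *ctx);

/*
 * Poll 'busy', sleeping between checks so the caller leaves the CPU.
 * Returns the number of sleeps taken, or -1 if max_sleeps was exhausted
 * while the predicate was still reporting busy.
 */
static int wait_while_busy(busy_fn busy, void *ctx, long sleep_ns,
			   int max_sleeps)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };
	int sleeps = 0;

	assert(busy != NULL);
	while (busy(ctx)) {
		if (sleeps == max_sleeps)
			return -1;
		nanosleep(&ts, NULL); /* blocks, so other tasks can run */
		sleeps++;
	}
	return sleeps;
}

/* Example predicate: reports busy until its counter reaches zero. */
static bool countdown_busy(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

Sleeping on a timer is the crude variant. A proper wait queue with an explicit wake-up from whoever clears the busy condition avoids both the latency and the repeated polling.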

Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts

2017-04-04 Thread Florian Fainelli



On 04/04/2017 10:53 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-04-04 at 21:21 -0700, Florian Fainelli wrote:
>>
>> This looks pretty good to me, two minor things:
>>
>> - most drivers keep track of the old status/duplex/pause/link variables
>> instead of the current one which is already available within struct
>> phy_device, any particular reason for not doing like the other drivers?
> 
> We don't necessarily have a phydev attached when using NC-SI, so it was
> easier to have the core code path not have to go fishing for those
> settings in different places based on whether we're using NC-SI or not.

Oh right, I missed that part. Is there a reason why NC-SI does not have
a PHY device attached? If not, could you somehow model the link using a
fixed PHY (which appears to Linux as a normal phy_device) just to keep
things simple.

> 
>> - the need to reset the HW during link changes is just ... well too bad
> 
> Yup but there's little choice. The HW wants it. I don't see any real
> point in optimizing that path mind you. Losing a few packets around
> a link change isn't going to hurt and it keeps the code a lot simpler
> by having a single "re-init" path.

I was just merely trying to say nicely: what a nicely broken piece of HW
(there were other adjectives coming to mind), and I do understand the pain.

> 
>> With that:
>>
>>> Reviewed-by: Florian Fainelli 
> 
> Thanks !
> 
> I'll post batch 2 in the next couple of days which tackles the RX path.

Cool, looking forward to that!
-- 
Florian

Re: [Patch net] net_sched: replace yield() with cond_resched()

2017-04-04 Thread Mike Galbraith

On Tue, 2017-04-04 at 22:19 -0700, Cong Wang wrote:
> On Tue, Apr 4, 2017 at 8:55 PM, Mike Galbraith  wrote:

> > That won't help, cond_resched() has the same impact upon a lone
> > SCHED_FIFO task as yield() does.. none.
> 
> Hmm? In the comment you quote:
> 
>  * If you want to use yield() to wait for something, use wait_event().
>  * If you want to use yield() to be 'nice' for others, use cond_resched().
> 
> So if cond_resched() doesn't help, why this misleading comment?

This is not an oh let's be nice guys thing, it's a perfect match of...

 * while (!event)
 *  yield();
</copy/paste>

..get off the CPU until this happens thing.  With nobody to yield the
CPU to, some_qdisc_is_busy() will remain true forever more.

> I picked the latter one, because the former is harder to implement
> properly (at least for -net) we need qdisc's to notify this waiter once
> they finish transmitting packets, which means we probably need
> a per-netdevice wait struct.

Yup, why I merely notified net-fu masters of lurking spinner.  I met it
because I sometimes run most kthreads at prio 1, some prioritized, and
kworkers at prio 2.  (never mind why, but they're excellent reasons)

-Mike

Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts

2017-04-04 Thread Benjamin Herrenschmidt

On Tue, 2017-04-04 at 21:21 -0700, Florian Fainelli wrote:
> 
> This looks pretty good to me, two minor things:
> 
> - most drivers keep track of the old status/duplex/pause/link variables
> instead of the current one which is already available within struct
> phy_device, any particular reason for not doing like the other drivers?

We don't necessarily have a phydev attached when using NC-SI, so it was
easier to have the core code path not have to go fishing for those
settings in different places based on whether we're using NC-SI or not.

> - the need to reset the HW during link changes is just ... well too bad

Yup but there's little choice. The HW wants it. I don't see any real
point in optimizing that path mind you. Losing a few packets around
a link change isn't going to hurt and it keeps the code a lot simpler
by having a single "re-init" path.

> With that:
> 
> > Reviewed-by: Florian Fainelli 

Thanks !

I'll post batch 2 in the next couple of days which tackles the RX path.

Cheers,
Ben.

Re: [PATCH] net: phy: broadcom: Add support for the BCM54210E

2017-04-04 Thread Joel Stanley

On Wed, Apr 5, 2017 at 3:17 PM, Florian Fainelli  wrote:
>
>
> On 04/04/2017 10:33 PM, Joel Stanley wrote:
>> This device is a single-port RGMII 10/100/1000BASE-T PHY with EEE & WOL.
>
> This looks good, although Rafal did beat you to it:
>
> 0fc9ae107669760c2a8658cb5b5876dbe525e08d ("net: phy: broadcom: add
> support for BCM54210E")

Even better! Thank you.

Cheers,

Joel

Re: [PATCH] net: phy: broadcom: Add support for the BCM54210E

2017-04-04 Thread Florian Fainelli



On 04/04/2017 10:33 PM, Joel Stanley wrote:
> This device is a single-port RGMII 10/100/1000BASE-T PHY with EEE & WOL.

This looks good, although Rafal did beat you to it:

0fc9ae107669760c2a8658cb5b5876dbe525e08d ("net: phy: broadcom: add
support for BCM54210E")
-- 
Florian

[PATCH] net: phy: broadcom: Add support for the BCM54210E

2017-04-04 Thread Joel Stanley

This device is a single-port RGMII 10/100/1000BASE-T PHY with EEE & WOL.

Signed-off-by: Joel Stanley 
---
 drivers/net/phy/broadcom.c | 13 +
 include/linux/brcmphy.h|  2 ++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 9cd8b27d1292..3df826323129 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -703,6 +703,18 @@ static struct phy_driver broadcom_drivers[] = {
.read_status= genphy_read_status,
.ack_interrupt  = brcm_fet_ack_interrupt,
.config_intr= brcm_fet_config_intr,
+}, {
+   .phy_id = PHY_ID_BCM54210E,
+   .phy_id_mask= 0xfff0,
+   .name   = "Broadcom BCM54210E",
+   .features   = PHY_GBIT_FEATURES |
+   SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+   .flags  = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
+   .config_init= bcm54xx_config_init,
+   .config_aneg= genphy_config_aneg,
+   .read_status= genphy_read_status,
+   .ack_interrupt  = bcm_phy_ack_intr,
+   .config_intr= bcm_phy_config_intr,
 } };
 
 module_phy_driver(broadcom_drivers);
@@ -723,6 +735,7 @@ static struct mdio_device_id __maybe_unused broadcom_tbl[] = {
{ PHY_ID_BCM57780, 0xfff0 },
{ PHY_ID_BCMAC131, 0xfff0 },
{ PHY_ID_BCM5241, 0xfff0 },
+   { PHY_ID_BCM54210E, 0xfff0},
{ }
 };
 
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index 55e517130311..53106b9c89f1 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -40,6 +40,8 @@
 
 #define PHY_ID_BCM_CYGNUS  0xae025200
 
+#define PHY_ID_BCM54210E   0x600d84a0
+
 #define PHY_BCM_OUI_MASK   0xfc00
 #define PHY_BCM_OUI_1  0x00206000
 #define PHY_BCM_OUI_2  0x0143bc00
-- 
2.11.0

Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage

2017-04-04 Thread Cong Wang

On Tue, Apr 4, 2017 at 8:20 PM, Mike Galbraith  wrote:
> -   while (some_qdisc_is_busy(dev))
> -   yield();
> +   swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1);
>  }

I don't see why this is an improvement even if I don't care about the
hardcoded timeout for now... Why the scheduler can make a better
decision with swait_event_timeout() than with cond_resched()?

Re: [Patch net] net_sched: replace yield() with cond_resched()

2017-04-04 Thread Cong Wang

On Tue, Apr 4, 2017 at 8:55 PM, Mike Galbraith  wrote:
> On Tue, 2017-04-04 at 18:52 -0700, Cong Wang wrote:
>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 1a2f9e9..4725d2f 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -925,7 +925,7 @@ void dev_deactivate_many(struct list_head *head)
>>   /* Wait for outstanding qdisc_run calls. */
>>   list_for_each_entry(dev, head, close_list)
>>   while (some_qdisc_is_busy(dev))
>> - yield();
>> + cond_resched();
>>  }
>
> That won't help, cond_resched() has the same impact upon a lone
> SCHED_FIFO task as yield() does.. none.

Hmm? In the comment you quote:

 * If you want to use yield() to wait for something, use wait_event().
 * If you want to use yield() to be 'nice' for others, use cond_resched().

So if cond_resched() doesn't help, why this misleading comment?

I picked the latter one, because the former is harder to implement
properly (at least for -net) we need qdisc's to notify this waiter once
they finish transmitting packets, which means we probably need
a per-netdevice wait struct.

[PATCH] af_unix: Use designated initializers

2017-04-04 Thread Kees Cook

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, and the initializer fixes
were extracted from grsecurity. In this case, NULL initialize with { }
instead of undesignated NULLs.

Signed-off-by: Kees Cook 
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 928691c43408..6a7fe7660551 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -996,7 +996,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
unsigned int hash;
struct unix_address *addr;
struct hlist_head *list;
-   struct path path = { NULL, NULL };
+   struct path path = { };
 
err = -EINVAL;
if (sunaddr->sun_family != AF_UNIX)
-- 
2.7.4


-- 
Kees Cook
Pixel Security
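
As a small standalone illustration of why the initializer style matters, the sketch below uses a hypothetical two-pointer structure, demo_path, as a stand-in for the kernel's struct path. The function names are likewise made up for the example. A positional initializer depends on the declared member order, which is what layout randomization would change. A designated or empty initializer does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a two-pointer structure such as struct path. */
struct demo_path {
	void *mnt;
	void *dentry;
};

/* Positional: silently depends on the declared order of the members. */
static struct demo_path demo_path_positional(void)
{
	struct demo_path p = { NULL, NULL };
	return p;
}

/* Designated: each value is bound to a member name, so order is irrelevant. */
static struct demo_path demo_path_designated(void)
{
	struct demo_path p = { .dentry = NULL, .mnt = NULL };
	return p;
}

/*
 * Zero-initialized: "{ 0 }" is the long-standing ISO C spelling. The patch
 * above uses the empty "{ }" form, which is a GNU extension in older C.
 */
static struct demo_path demo_path_zeroed(void)
{
	struct demo_path p = { 0 };
	return p;
}
```

All three produce the same all-NULL object here. The difference only shows up once the member order can no longer be assumed.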

Re: [PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts

2017-04-04 Thread Florian Fainelli

Salut Benjamin,

Le 04/04/17 à 19:28, Benjamin Herrenschmidt a écrit :
> This is version 2 of the first batch of updates to the
> ftgmac100 driver.
> 
> Essentially:
> 
>  - A few misc cleanups
>  - Fixing link speed & duplex handling (including dealing with
>an Aspeed requirement to double reset the controller when
>the speed changes)
>  - And addition of a reset task workqueue which will be used
>for delaying the re-initialization of the controller
>  - Fixing a number of issues with how interrupts and NAPI
>are dealt with.
> 
> Subsequent batches will rework and improve the rx path, the
> tx path, and add a bunch of features and fixes.
> 
> Version 2 addresses some review comments to patches 5 and 10
> (see version history in the respective emails).
> 

This looks pretty good to me, two minor things:

- most drivers keep track of the old status/duplex/pause/link variables
instead of the current one which is already available within struct
phy_device, any particular reason for not doing like the other drivers?

- the need to reset the HW during link changes is just ... well too bad

With that:

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: pull-request: wireless-drivers-next 2017-04-03

2017-04-04 Thread Kalle Valo

David Miller  writes:

> From: Kalle Valo 
> Date: Tue, 04 Apr 2017 20:48:35 +0300
>
>> David Miller  writes:
>> 
>>> From: Kalle Valo 
>>> Date: Mon, 03 Apr 2017 14:26:10 +0300
>>>
>>>> here few really small fixes. I'm hoping this to be the last pull request
>>>> for 4.11.
>>>>
>>>> Please let me know if there are any problems.
>>>
>>> Pulled, thanks.
>>>
>>> But I will warn you, you say fixes, but your Subject line and
>>> GIT tag says "-next" so I pulled it into net-next.
>> 
>> Sorry, I used the wrong pull request template and that's why I had the
>> wrong subject in this pull request. So actually this was supposed to be
>> for net, not net-next. Any chance you could also pull this to net so
>> that we can still get the fixes to 4.11?
>
> Sure, done.

Great, thank you. And sorry for this, I need to be more careful when
sending the pull requests.

-- 
Kalle Valo

Re: [Patch net] net_sched: replace yield() with cond_resched()

2017-04-04 Thread Mike Galbraith

On Tue, 2017-04-04 at 18:52 -0700, Cong Wang wrote:

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 1a2f9e9..4725d2f 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -925,7 +925,7 @@ void dev_deactivate_many(struct list_head *head)
>   /* Wait for outstanding qdisc_run calls. */
>   list_for_each_entry(dev, head, close_list)
>   while (some_qdisc_is_busy(dev))
> - yield();
> + cond_resched();
>  }

That won't help, cond_resched() has the same impact upon a lone
SCHED_FIFO task as yield() does.. none.

-Mike

Re: [PATCH] ebpf: verify the output of the JIT

2017-04-04 Thread Tycho Andersen

Hi Kees,

On Tue, Apr 04, 2017 at 03:17:57PM -0700, Kees Cook wrote:
> On Tue, Apr 4, 2017 at 3:08 PM, Tycho Andersen  wrote:
> > The goal of this patch is to protect the JIT against an attacker with a
> > write-in-memory primitive. The JIT allocates a buffer which will eventually
> > be marked +x, so we need to make sure that what was written to this buffer
> > is what was intended.
> >
> > We achieve this by building a hash of the instruction buffer as
> > instructions are emitted and then comparing that to a hash at the end of
> > the JIT compile after the buffer has been marked read-only.
> >
> > Signed-off-by: Tycho Andersen 
> > CC: Daniel Borkmann 
> > CC: Alexei Starovoitov 
> > CC: Kees Cook 
> > CC: Mickaël Salaün 
> 
> Cool! This closes the race condition on producing the JIT vs going
> read-only. I wonder if it might be possible to make this a more
> generic interface to BPF which would allocate the hash, provide
> the update callback during emit, and then do the hash check itself at
> the end of bpf_jit_binary_lock_ro()?

Yes, probably so. I can look into that for the next version.

Tycho
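
The hash-while-emitting idea quoted above can be sketched in userspace C. This is not the patch's code: the emit_ctx structure and function names are hypothetical, and FNV-1a is used only because it is short. It is not a cryptographic hash, so a real implementation would presumably want something stronger. The running hash is updated as each byte is emitted, and a final pass over the finished buffer must reproduce it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u /* 32-bit FNV-1a offset basis */
#define FNV_PRIME  16777619u   /* 32-bit FNV prime */

/* Hypothetical emitter state; the real patch's structures are not shown. */
struct emit_ctx {
	uint8_t *buf;
	size_t len;
	size_t cap;
	uint32_t hash; /* running hash of every byte emitted so far */
};

static void emit_init(struct emit_ctx *c, uint8_t *buf, size_t cap)
{
	c->buf = buf;
	c->len = 0;
	c->cap = cap;
	c->hash = FNV_OFFSET;
}

static uint32_t fnv1a_step(uint32_t h, uint8_t byte)
{
	return (h ^ byte) * FNV_PRIME;
}

/* Append one byte and fold it into the running hash. */
static bool emit_byte(struct emit_ctx *c, uint8_t byte)
{
	if (c->len == c->cap)
		return false;
	c->buf[c->len++] = byte;
	c->hash = fnv1a_step(c->hash, byte);
	return true;
}

/* After the buffer is final: rehash its contents and compare. */
static bool emit_verify(const struct emit_ctx *c)
{
	uint32_t h = FNV_OFFSET;

	for (size_t i = 0; i < c->len; i++)
		h = fnv1a_step(h, c->buf[i]);
	return h == c->hash;
}
```

In this model a write to the buffer between emit_byte() and emit_verify() changes the recomputed hash, so the check fails. The running hash itself would of course need to live somewhere the attacker cannot reach.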

[PATCH v2 02/13] ftgmac100: Remove "banner" comments

2017-04-04 Thread Benjamin Herrenschmidt

The divisions they represent are not particularly meaningful
and things are going to be moving around with upcoming changes
making these comments more a burden than anything else.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 42 
 1 file changed, 42 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index bf7b1c0..6501aa7 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -44,9 +44,6 @@
 #define MAX_PKT_SIZE   1518
 #define RX_BUF_SIZE    PAGE_SIZE   /* must be smaller than 0x3fff */
 
-/**
- * private data
- */
 struct ftgmac100_descs {
struct ftgmac100_rxdes rxdes[RX_QUEUE_ENTRIES];
struct ftgmac100_txdes txdes[TX_QUEUE_ENTRIES];
@@ -86,9 +83,6 @@ struct ftgmac100 {
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
   struct ftgmac100_rxdes *rxdes, gfp_t gfp);
 
-/**
- * internal functions (hardware register access)
- */
 static void ftgmac100_set_rx_ring_base(struct ftgmac100 *priv, dma_addr_t addr)
 {
iowrite32(addr, priv->base + FTGMAC100_OFFSET_RXR_BADR);
@@ -243,9 +237,6 @@ static void ftgmac100_stop_hw(struct ftgmac100 *priv)
iowrite32(0, priv->base + FTGMAC100_OFFSET_MACCR);
 }
 
-/**
- * internal functions (receive descriptor)
- */
 static bool ftgmac100_rxdes_first_segment(struct ftgmac100_rxdes *rxdes)
 {
return rxdes->rxdes0 & cpu_to_le32(FTGMAC100_RXDES0_FRS);
@@ -370,9 +361,6 @@ static struct page *ftgmac100_rxdes_get_page(struct ftgmac100 *priv,
return *ftgmac100_rxdes_page_slot(priv, rxdes);
 }
 
-/**
- * internal functions (receive)
- */
 static int ftgmac100_next_rx_pointer(int pointer)
 {
return (pointer + 1) & (RX_QUEUE_ENTRIES - 1);
@@ -557,9 +545,6 @@ static bool ftgmac100_rx_packet(struct ftgmac100 *priv, int *processed)
return true;
 }
 
-/**
- * internal functions (transmit descriptor)
- */
 static void ftgmac100_txdes_reset(const struct ftgmac100 *priv,
  struct ftgmac100_txdes *txdes)
 {
@@ -653,9 +638,6 @@ static struct sk_buff *ftgmac100_txdes_get_skb(struct ftgmac100_txdes *txdes)
return (struct sk_buff *)txdes->txdes2;
 }
 
-/**
- * internal functions (transmit)
- */
 static int ftgmac100_next_tx_pointer(int pointer)
 {
return (pointer + 1) & (TX_QUEUE_ENTRIES - 1);
@@ -771,9 +753,6 @@ static int ftgmac100_xmit(struct ftgmac100 *priv, struct sk_buff *skb,
return NETDEV_TX_OK;
 }
 
-/**
- * internal functions (buffer)
- */
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
   struct ftgmac100_rxdes *rxdes, gfp_t gfp)
 {
@@ -865,9 +844,6 @@ static int ftgmac100_alloc_buffers(struct ftgmac100 *priv)
return -ENOMEM;
 }
 
-/**
- * internal functions (mdio)
- */
 static void ftgmac100_adjust_link(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
@@ -917,9 +893,6 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv)
return 0;
 }
 
-/**
- * struct mii_bus functions
- */
 static int ftgmac100_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
 {
struct net_device *netdev = bus->priv;
@@ -991,9 +964,6 @@ static int ftgmac100_mdiobus_write(struct mii_bus *bus, int phy_addr,
return -EIO;
 }
 
-/**
- * struct ethtool_ops functions
- ***

Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage

2017-04-04 Thread Mike Galbraith

On Tue, 2017-04-04 at 15:39 -0700, Cong Wang wrote:

> Thanks for the report! Looks like a quick solution here is to replace
> this yield() with cond_resched(), it is harder to really wait for
> all qdisc's to transmit all packets.

No, cond_resched() won't help.  What I did is below, but I suspect net
wizards will do something better.

---
 net/sched/sch_generic.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -901,6 +902,7 @@ static bool some_qdisc_is_busy(struct ne
  */
 void dev_deactivate_many(struct list_head *head)
 {
+   DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(swait);
struct net_device *dev;
bool sync_needed = false;
 
@@ -924,8 +926,7 @@ void dev_deactivate_many(struct list_hea
 
/* Wait for outstanding qdisc_run calls. */
list_for_each_entry(dev, head, close_list)
-   while (some_qdisc_is_busy(dev))
-   yield();
+   swait_event_timeout(swait, !some_qdisc_is_busy(dev), 1);
 }
 
 void dev_deactivate(struct net_device *dev)

Re: net/ipv4: use-after-free in ipv4_mtu

2017-04-04 Thread Eric Dumazet

On Tue, 2017-04-04 at 18:11 -0700, Cong Wang wrote:
> On Tue, Apr 4, 2017 at 11:51 AM, Eric Dumazet  wrote:
> > On Tue, Apr 4, 2017 at 7:50 AM, Andrey Konovalov wrote:
> >>
> >> Hi,
> >>
> >> I've got the following error report while fuzzing the kernel with syzkaller.
> >>
> >> On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5).
> >>
> >> Unfortunately it's not reproducible.
> >>
> >> ==
> >> BUG: KASAN: use-after-free in dst_metric_raw include/net/dst.h:176
> >> [inline] at addr 88003d6a965c
> >> BUG: KASAN: use-after-free in ipv4_mtu+0x3f2/0x4b0
> >> net/ipv4/route.c:1270 at addr 88003d6a965c
> >> Read of size 4 by task syz-executor3/20611
> >> CPU: 3 PID: 20611 Comm: syz-executor3 Not tainted 4.11.0-rc5+ #199
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> Call Trace:
> >>  __dump_stack lib/dump_stack.c:16 [inline]
> >>  dump_stack+0x292/0x398 lib/dump_stack.c:52
> >>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
> >>  print_address_description mm/kasan/report.c:202 [inline]
> >>  kasan_report_error mm/kasan/report.c:291 [inline]
> >>  kasan_report+0x252/0x510 mm/kasan/report.c:347
> >>  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367
> >>  dst_metric_raw include/net/dst.h:176 [inline]
> >>  ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270
> >>  dst_mtu include/net/dst.h:221 [inline]
> >>  do_ip_getsockopt+0x71d/0x2290 net/ipv4/ip_sockglue.c:1433
> >>  ip_getsockopt+0x90/0x230 net/ipv4/ip_sockglue.c:1578
> >>  tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3131
> >>  sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2709
> >>  SYSC_getsockopt net/socket.c:1829 [inline]
> >>  SyS_getsockopt+0x252/0x390 net/socket.c:1811
> >>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> >> RIP: 0033:0x4458d9
> >> RSP: 002b:7fe87f452b58 EFLAGS: 0286 ORIG_RAX: 0037
> >> RAX: ffda RBX: 0005 RCX: 004458d9
> >> RDX: 000e RSI:  RDI: 0005
> >> RBP: 006e0020 R08: 20db6000 R09: 
> >> R10: 207e8000 R11: 0286 R12: 00708150
> >> R13: 20db8000 R14: 1000 R15: 0003
> >> Object at 88003d6a9658, in cache kmalloc-64 size: 64
> >> Allocated:
> >> PID = 20110
> >>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> >>  set_track mm/kasan/kasan.c:525 [inline]
> >>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
> >>  kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
> >>  kmalloc include/linux/slab.h:490 [inline]
> >>  kzalloc include/linux/slab.h:663 [inline]
> >>  fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040
> >>  fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221
> >>  ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597
> >>  inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882
> >> sctp: [Deprecated]: syz-executor0 (pid 20638) Use of int in max_burst
> >> socket option.
> >> Use struct sctp_assoc_value instead
> >>  sock_do_ioctl+0x65/0xb0 net/socket.c:906
> >>  sock_ioctl+0x28f/0x440 net/socket.c:1004
> >>  vfs_ioctl fs/ioctl.c:45 [inline]
> >>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
> >>  SYSC_ioctl fs/ioctl.c:700 [inline]
> >>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
> >>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> >> Freed:
> >> PID = 4439
> >>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> >>  set_track mm/kasan/kasan.c:525 [inline]
> >>  kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
> >>  slab_free_hook mm/slub.c:1357 [inline]
> >>  slab_free_freelist_hook mm/slub.c:1379 [inline]
> >>  slab_free mm/slub.c:2961 [inline]
> >>  kfree+0xe8/0x2b0 mm/slub.c:3882
> >>  free_fib_info_rcu+0x4ba/0x5e0 net/ipv4/fib_semantics.c:218
> >>  __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
> >>  rcu_do_batch.isra.64+0x947/0xcc0 kernel/rcu/tree.c:2879
> >>  invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline]
> >>  __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline]
> >>  rcu_process_callbacks+0x2cc/0xb90 kernel/rcu/tree.c:3126
> >>  __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
> >> Memory state around the buggy address:
> >>  88003d6a9500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >>  88003d6a9580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >> >88003d6a9600: fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb
> >> ^
> >>  88003d6a9680: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
> >>  88003d6a9700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >> ==
> >
> > Thanks for the report Andrey
> >
> > Looking at fib->fib_metrics, I fail to understand how the following can work:
> >
> > dst_init_metrics(&rt->dst, fi->fib_metrics, true);
> >
> > In the cases fi->

[PATCH v2 09/13] ftgmac100: Move the bulk of inits to a separate function

2017-04-04 Thread Benjamin Herrenschmidt

The link monitoring and error handling code will have to
redo the ring inits and HW setup so move the code out of
ftgmac100_open() into a dedicated function.

This forces a bit of re-ordering of ftgmac100_open() but
nothing dramatic.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 71 +++-
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index fdb8638..36f2905 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1109,10 +1109,35 @@ static int ftgmac100_poll(struct napi_struct *napi, int budget)
return rx;
 }
 
+static int ftgmac100_init_all(struct ftgmac100 *priv, bool ignore_alloc_err)
+{
+   int err = 0;
+
+   /* Re-init descriptors (adjust queue sizes) */
+   ftgmac100_init_rings(priv);
+
+   /* Realloc rx descriptors */
+   err = ftgmac100_alloc_rx_buffers(priv);
+   if (err && !ignore_alloc_err)
+   return err;
+
+   /* Reinit and restart HW */
+   ftgmac100_init_hw(priv);
+   ftgmac100_start_hw(priv);
+
+   /* Re-enable the device */
+   napi_enable(&priv->napi);
+   netif_start_queue(priv->netdev);
+
+   /* Enable all interrupts */
+   iowrite32(priv->int_mask_all, priv->base + FTGMAC100_OFFSET_IER);
+
+   return err;
+}
+
 static int ftgmac100_open(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
-   unsigned int status;
int err;
 
/* Allocate ring buffers  */
@@ -1122,13 +1147,6 @@ static int ftgmac100_open(struct net_device *netdev)
return err;
}
 
-   /* Initialize the rings */
-   ftgmac100_init_rings(priv);
-
-   /* Allocate receive buffers */
-   if (ftgmac100_alloc_rx_buffers(priv))
-   goto err_alloc;
-
/* When using NC-SI we force the speed to 100Mbit/s full duplex,
 *
 * Otherwise we leave it set to 0 (no link), the link
@@ -1162,26 +1180,21 @@ static int ftgmac100_open(struct net_device *netdev)
goto err_irq;
}
 
-   ftgmac100_init_hw(priv);
-   ftgmac100_start_hw(priv);
-
-   /* Clear stale interrupts */
-   status = ioread32(priv->base + FTGMAC100_OFFSET_ISR);
-   iowrite32(status, priv->base + FTGMAC100_OFFSET_ISR);
+   /* Start things up */
+   err = ftgmac100_init_all(priv, false);
+   if (err) {
+   netdev_err(netdev, "Failed to allocate packet buffers\n");
+   goto err_alloc;
+   }
 
-   if (netdev->phydev)
+   if (netdev->phydev) {
+   /* If we have a PHY, start polling */
phy_start(netdev->phydev);
-   else if (priv->use_ncsi)
+   } else if (priv->use_ncsi) {
+   /* If using NC-SI, set our carrier on and start the stack */
netif_carrier_on(netdev);
 
-   napi_enable(&priv->napi);
-   netif_start_queue(netdev);
-
-   /* enable all interrupts */
-   iowrite32(priv->int_mask_all, priv->base + FTGMAC100_OFFSET_IER);
-
-   /* Start the NCSI device */
-   if (priv->use_ncsi) {
+   /* Start the NCSI device */
err = ncsi_start_dev(priv->ndev);
if (err)
goto err_ncsi;
@@ -1189,16 +1202,16 @@ static int ftgmac100_open(struct net_device *netdev)
 
return 0;
 
-err_ncsi:
+ err_ncsi:
napi_disable(&priv->napi);
netif_stop_queue(netdev);
+ err_alloc:
+   ftgmac100_free_buffers(priv);
free_irq(netdev->irq, netdev);
-err_irq:
+ err_irq:
netif_napi_del(&priv->napi);
-err_hw:
-err_alloc:
+ err_hw:
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
-   ftgmac100_free_buffers(priv);
ftgmac100_free_rings(priv);
return err;
 }
-- 
2.9.3

[PATCH v2 08/13] ftgmac100: Request the interrupt only after HW is reset

2017-04-04 Thread Benjamin Herrenschmidt

The interrupt isn't shared, so this will keep it masked
until we have the HW in a known sane state.
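
(Illustration only, not part of the patch.) A minimal userspace sketch of
the ordering described above, assuming a hypothetical fake_open() whose
numbered steps stand in for ring allocation, HW reset, IRQ request and
start-up. None of these names exist in the driver; the point is only that
the IRQ is taken after the reset and released in reverse order on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device; not the driver's real structure. */
struct fake_dev {
	int fail_at;          /* step number that should fail, 0 = none */
	int rings, napi, irq; /* 1 while the modelled resource is held */
};

/* Pretend to run step n; fails only if the test asked for it. */
static int step(const struct fake_dev *d, int n)
{
	return d->fail_at == n ? -1 : 0;
}

static int fake_open(struct fake_dev *d)
{
	int err;

	assert(d != NULL);

	err = step(d, 1);     /* allocate rings */
	if (err)
		return err;
	d->rings = 1;

	err = step(d, 2);     /* reset HW; IRQ not requested yet */
	if (err)
		goto err_hw;

	d->napi = 1;          /* register the poll context */

	err = step(d, 3);     /* only now request the IRQ */
	if (err)
		goto err_irq;
	d->irq = 1;

	err = step(d, 4);     /* start the HW */
	if (err)
		goto err_start;
	return 0;

err_start:
	d->irq = 0;           /* release in reverse order of acquisition */
err_irq:
	d->napi = 0;
err_hw:
	d->rings = 0;
	return err;
}
```

Each label undoes exactly the steps that completed before the failure,
which is the same shape as the error path in the hunk below.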

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index bb444d2..fdb8638 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1129,12 +1129,6 @@ static int ftgmac100_open(struct net_device *netdev)
if (ftgmac100_alloc_rx_buffers(priv))
goto err_alloc;
 
-   err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev);
-   if (err) {
-   netdev_err(netdev, "failed to request irq %d\n", netdev->irq);
-   goto err_irq;
-   }
-
/* When using NC-SI we force the speed to 100Mbit/s full duplex,
 *
 * Otherwise we leave it set to 0 (no link), the link
@@ -1161,6 +1155,13 @@ static int ftgmac100_open(struct net_device *netdev)
/* Initialize NAPI */
netif_napi_add(netdev, &priv->napi, ftgmac100_poll, 64);
 
+   /* Grab our interrupt */
+   err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev);
+   if (err) {
+   netdev_err(netdev, "failed to request irq %d\n", netdev->irq);
+   goto err_irq;
+   }
+
ftgmac100_init_hw(priv);
ftgmac100_start_hw(priv);
 
@@ -1191,12 +1192,12 @@ static int ftgmac100_open(struct net_device *netdev)
 err_ncsi:
napi_disable(&priv->napi);
netif_stop_queue(netdev);
-   netif_napi_del(&priv->napi);
-   iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
-err_hw:
free_irq(netdev->irq, netdev);
 err_irq:
+   netif_napi_del(&priv->napi);
+err_hw:
 err_alloc:
+   iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
ftgmac100_free_buffers(priv);
ftgmac100_free_rings(priv);
return err;
-- 
2.9.3

[PATCH v2 03/13] ftgmac100: Reorder struct fields and comment

2017-04-04 Thread Benjamin Herrenschmidt

Reorder the fields in struct ftgmac100 into slightly more logical
groups. This will make more sense as I add/remove some.

No code change.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 6501aa7..02e0534 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -50,34 +50,39 @@ struct ftgmac100_descs {
 };
 
 struct ftgmac100 {
+   /* Registers */
struct resource *res;
void __iomem *base;
 
struct ftgmac100_descs *descs;
dma_addr_t descs_dma_addr;
 
+   /* Rx ring */
struct page *rx_pages[RX_QUEUE_ENTRIES];
-
unsigned int rx_pointer;
+   u32 rxdes0_edorr_mask;
+
+   /* Tx ring */
unsigned int tx_clean_pointer;
unsigned int tx_pointer;
unsigned int tx_pending;
-
+   u32 txdes0_edotr_mask;
spinlock_t tx_lock;
 
+   /* Component structures */
struct net_device *netdev;
struct device *dev;
struct ncsi_dev *ndev;
struct napi_struct napi;
-
struct mii_bus *mii_bus;
+
+   /* Link management */
int old_speed;
-   int int_mask_all;
bool use_ncsi;
-   bool enabled;
 
-   u32 rxdes0_edorr_mask;
-   u32 txdes0_edotr_mask;
+   /* Misc */
+   int int_mask_all;
+   bool enabled;
 };
 
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
-- 
2.9.3

[PATCH v2 12/13] ftgmac100: Remove useless tests in interrupt handler

2017-04-04 Thread Benjamin Herrenschmidt

The interrupt is neither enabled nor registered when the interface
isn't running (regardless of whether we use NC-SI or not), so the
test isn't useful.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 3adfb92..4fa138b 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1047,14 +1047,9 @@ static irqreturn_t ftgmac100_interrupt(int irq, void *dev_id)
struct net_device *netdev = dev_id;
struct ftgmac100 *priv = netdev_priv(netdev);
 
-   /* When running in NCSI mode, the interface should be ready for
-* receiving or transmitting NCSI packets before it's opened.
-*/
-   if (likely(priv->use_ncsi || netif_running(netdev))) {
-   /* Disable interrupts for polling */
-   iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
-   napi_schedule(&priv->napi);
-   }
+   /* Disable interrupts for polling */
+   iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
+   napi_schedule(&priv->napi);
 
return IRQ_HANDLED;
 }
-- 
2.9.3

[PATCH v2 10/13] ftgmac100: Add a reset task and use it for link changes

2017-04-04 Thread Benjamin Herrenschmidt

Link speed changes require a full HW reset. This isn't done
properly at the moment. It will involve delays and thus isn't
suitable to do from the link poll callback.

So let's create a reset_task that we can queue up when the
link changes. It will be useful for various cases of error
handling as well.
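
(Illustration only, not part of the patch.) A single-threaded userspace
model of the idea above: the link callback only masks interrupts and
queues work, and the deferred task re-checks that the interface is still
up before resetting. All names (model_nic, MODEL_IRQ_ALL, ...) are
hypothetical, and the locking the real task takes is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IRQ_ALL 0xffu   /* hypothetical "all interrupts" mask */

struct model_nic {
	bool running;          /* stands in for netif_running() */
	bool work_queued;      /* stands in for a pending work item */
	unsigned int irq_mask; /* stands in for the IER register */
	int resets;            /* number of full resets performed */
};

/* Called from the (modelled) link poll path: must not sleep or delay. */
static void model_link_change(struct model_nic *nic)
{
	assert(nic != NULL);
	nic->irq_mask = 0;        /* mask everything */
	nic->work_queued = true;  /* defer the slow part */
}

/* Runs later in process context, where delays are acceptable. */
static void model_reset_task(struct model_nic *nic)
{
	assert(nic != NULL);
	if (!nic->work_queued)
		return;
	nic->work_queued = false;

	/* The interface may have been closed since the work was queued. */
	if (!nic->running)
		return;

	nic->resets++;                 /* stop, reset, re-init */
	nic->irq_mask = MODEL_IRQ_ALL; /* restart with interrupts on */
}
```

The re-check of "running" inside the task is what lets the stop path
avoid cancelling the work synchronously.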

Signed-off-by: Benjamin Herrenschmidt 
--

v2. Fix lock ordering
Add mdio_bus mutex
---
 drivers/net/ethernet/faraday/ftgmac100.c | 87 +++-
 1 file changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 36f2905..61f02bf 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -74,6 +74,7 @@ struct ftgmac100 {
struct device *dev;
struct ncsi_dev *ndev;
struct napi_struct napi;
+   struct work_struct reset_task;
struct mii_bus *mii_bus;
 
/* Link management */
@@ -872,7 +873,6 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
struct ftgmac100 *priv = netdev_priv(netdev);
struct phy_device *phydev = netdev->phydev;
int new_speed;
-   int ier;
 
/* We store "no link" as speed 0 */
if (!phydev->link)
@@ -897,20 +897,11 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
if (!new_speed)
return;
 
-   ier = ioread32(priv->base + FTGMAC100_OFFSET_IER);
-
-   /* disable all interrupts */
+   /* Disable all interrupts */
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 
-   netif_stop_queue(netdev);
-   ftgmac100_stop_hw(priv);
-
-   netif_start_queue(netdev);
-   ftgmac100_init_hw(priv);
-   ftgmac100_start_hw(priv);
-
-   /* re-enable interrupts */
-   iowrite32(ier, priv->base + FTGMAC100_OFFSET_IER);
+   /* Reset the adapter asynchronously */
+   schedule_work(&priv->reset_task);
 }
 
 static int ftgmac100_mii_probe(struct ftgmac100 *priv)
@@ -1135,6 +1126,61 @@ static int ftgmac100_init_all(struct ftgmac100 *priv, bool ignore_alloc_err)
return err;
 }
 
+static void ftgmac100_reset_task(struct work_struct *work)
+{
+   struct ftgmac100 *priv = container_of(work, struct ftgmac100,
+ reset_task);
+   struct net_device *netdev = priv->netdev;
+   int err;
+
+   netdev_dbg(netdev, "Resetting NIC...\n");
+
+   /* Lock the world */
+   rtnl_lock();
+   if (netdev->phydev)
+   mutex_lock(&netdev->phydev->lock);
+   if (priv->mii_bus)
+   mutex_lock(&priv->mii_bus->mdio_lock);
+
+
+   /* Check if the interface is still up */
+   if (!netif_running(netdev))
+   goto bail;
+
+   /* Stop the network stack */
+   netif_trans_update(netdev);
+   napi_disable(&priv->napi);
+   netif_tx_disable(netdev);
+
+   /* Stop and reset the MAC */
+   ftgmac100_stop_hw(priv);
+   err = ftgmac100_reset_hw(priv);
+   if (err) {
+   /* Not much we can do ... it might come back... */
+   netdev_err(netdev, "attempting to continue...\n");
+   }
+
+   /* Free all rx and tx buffers */
+   ftgmac100_free_buffers(priv);
+
+   /* The ring pointers have been reset in HW, reflect this here */
+   priv->rx_pointer = 0;
+   priv->tx_clean_pointer = 0;
+   priv->tx_pointer = 0;
+   priv->tx_pending = 0;
+
+   /* Setup everything again and restart chip */
+   ftgmac100_init_all(priv, true);
+
+   netdev_dbg(netdev, "Reset done !\n");
+ bail:
+   if (priv->mii_bus)
+   mutex_unlock(&priv->mii_bus->mdio_lock);
+   if (netdev->phydev)
+   mutex_unlock(&netdev->phydev->lock);
+   rtnl_unlock();
+}
+
 static int ftgmac100_open(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
@@ -1220,6 +1266,14 @@ static int ftgmac100_stop(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
 
+   /* Note about the reset task: We are called with the rtnl lock
+* held, so we are synchronized against the core of the reset
+* task. We must not try to synchronously cancel it otherwise
+* we can deadlock. But since it will test for netif_running()
+* which has already been cleared by the net core, we don't
+* have anything special to do.
+*/
+
/* disable all interrupts */
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 
@@ -1395,6 +1449,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
priv = netdev_priv(netdev);
priv->netdev = netdev;
priv->dev = &pdev->dev;
+   INIT_WORK(&priv->reset_task, ftgmac100_reset_task);
 
spin_lock_init(&priv->tx_lock);
 
@@ -1498,6 +1553,12 @@ static int ftgmac100_remove(struct platform_device *pdev)
priv = netdev_priv(netdev);
 
unregiste

[PATCH v2 07/13] ftgmac100: Move napi_add/del to open/close

2017-04-04 Thread Benjamin Herrenschmidt

Do this rather than in probe/remove.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 0d0576f..bb444d2 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1158,6 +1158,9 @@ static int ftgmac100_open(struct net_device *netdev)
if (err)
goto err_hw;
 
+   /* Initialize NAPI */
+   netif_napi_add(netdev, &priv->napi, ftgmac100_poll, 64);
+
ftgmac100_init_hw(priv);
ftgmac100_start_hw(priv);
 
@@ -1188,6 +1191,7 @@ static int ftgmac100_open(struct net_device *netdev)
 err_ncsi:
napi_disable(&priv->napi);
netif_stop_queue(netdev);
+   netif_napi_del(&priv->napi);
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 err_hw:
free_irq(netdev->irq, netdev);
@@ -1207,6 +1211,7 @@ static int ftgmac100_stop(struct net_device *netdev)
 
netif_stop_queue(netdev);
napi_disable(&priv->napi);
+   netif_napi_del(&priv->napi);
if (netdev->phydev)
phy_stop(netdev->phydev);
else if (priv->use_ncsi)
@@ -1379,9 +1384,6 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
spin_lock_init(&priv->tx_lock);
 
-   /* initialize NAPI */
-   netif_napi_add(netdev, &priv->napi, ftgmac100_poll, 64);
-
/* map io memory */
priv->res = request_mem_region(res->start, resource_size(res),
   dev_name(&pdev->dev));
-- 
2.9.3

[PATCH v2 11/13] ftgmac100: Rework MAC reset and init

2017-04-04 Thread Benjamin Herrenschmidt

The HW requires a full MAC reset when changing the speed.

Additionally the Aspeed documentation spells out that the
MAC needs to be reset twice with a 10us interval.

We thus move the speed setting and top level reset code
into a new ftgmac100_reset_and_config_mac() function which
handles both. Move the ring pointers initialization there
too in order to reflect the HW change.

Also reduce the timeout for the MAC reset as it shouldn't
take more than 300 clock cycles according to the doc.
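
(Illustration only, not part of the patch.) A userspace sketch of a
bounded reset poll done twice, run against a simulated register. The
sim_* names, the SIM_SW_RST bit position and the self-clearing behaviour
are assumptions made for the model; delays are left as comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_SW_RST (1u << 31)   /* hypothetical reset bit */

struct sim_mac {
	uint32_t maccr;
	int polls_until_clear; /* reads before the bit self-clears; <0 = never */
	int polls_left;
	int resets_seen;       /* number of writes that set the reset bit */
};

static void sim_write(struct sim_mac *m, uint32_t v)
{
	m->maccr = v;
	if (v & SIM_SW_RST) {
		m->polls_left = m->polls_until_clear;
		m->resets_seen++;
	}
}

static uint32_t sim_read(struct sim_mac *m)
{
	if ((m->maccr & SIM_SW_RST) && m->polls_left >= 0) {
		if (m->polls_left == 0)
			m->maccr &= ~SIM_SW_RST; /* reset finished */
		else
			m->polls_left--;
	}
	return m->maccr;
}

/* One reset: write mode bits, set the reset bit, poll at most 50 times. */
static int sim_reset_once(struct sim_mac *m, uint32_t mode_bits)
{
	assert(m != NULL);
	sim_write(m, mode_bits);
	sim_write(m, mode_bits | SIM_SW_RST);
	for (int i = 0; i < 50; i++) {
		if (!(sim_read(m) & SIM_SW_RST))
			return 0;
		/* a driver would wait about 1us here */
	}
	return -1;
}

/* Two resets back to back; a driver would wait about 10us in between. */
static int sim_reset_twice(struct sim_mac *m, uint32_t mode_bits)
{
	if (sim_reset_once(m, mode_bits))
		return -1;
	return sim_reset_once(m, mode_bits);
}
```

If the first reset times out the second one is never attempted, matching
the early return in ftgmac100_reset_and_config_mac() below.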

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 98 +++-
 1 file changed, 59 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 61f02bf..3adfb92 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -112,27 +112,64 @@ static void ftgmac100_txdma_normal_prio_start_polling(struct ftgmac100 *priv)
iowrite32(1, priv->base + FTGMAC100_OFFSET_NPTXPD);
 }
 
-static int ftgmac100_reset_hw(struct ftgmac100 *priv)
+static int ftgmac100_reset_mac(struct ftgmac100 *priv, u32 maccr)
 {
struct net_device *netdev = priv->netdev;
int i;
 
/* NOTE: reset clears all registers */
-   iowrite32(FTGMAC100_MACCR_SW_RST, priv->base + FTGMAC100_OFFSET_MACCR);
-   for (i = 0; i < 5; i++) {
+   iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR);
+   iowrite32(maccr | FTGMAC100_MACCR_SW_RST,
+ priv->base + FTGMAC100_OFFSET_MACCR);
+   for (i = 0; i < 50; i++) {
unsigned int maccr;
 
maccr = ioread32(priv->base + FTGMAC100_OFFSET_MACCR);
if (!(maccr & FTGMAC100_MACCR_SW_RST))
return 0;
 
-   udelay(1000);
+   udelay(1);
}
 
-   netdev_err(netdev, "software reset failed\n");
+   netdev_err(netdev, "Hardware reset failed\n");
return -EIO;
 }
 
+static int ftgmac100_reset_and_config_mac(struct ftgmac100 *priv)
+{
+   u32 maccr = 0;
+
+   switch (priv->cur_speed) {
+   case SPEED_10:
+   case 0: /* no link */
+   break;
+
+   case SPEED_100:
+   maccr |= FTGMAC100_MACCR_FAST_MODE;
+   break;
+
+   case SPEED_1000:
+   maccr |= FTGMAC100_MACCR_GIGA_MODE;
+   break;
+   default:
+   netdev_err(priv->netdev, "Unknown speed %d !\n",
+  priv->cur_speed);
+   break;
+   }
+
+   /* (Re)initialize the queue pointers */
+   priv->rx_pointer = 0;
+   priv->tx_clean_pointer = 0;
+   priv->tx_pointer = 0;
+   priv->tx_pending = 0;
+
+   /* The doc says reset twice with 10us interval */
+   if (ftgmac100_reset_mac(priv, maccr))
+   return -EIO;
+   usleep_range(10, 1000);
+   return ftgmac100_reset_mac(priv, maccr);
+}
+
 static void ftgmac100_set_mac(struct ftgmac100 *priv, const unsigned char *mac)
 {
unsigned int maddr = mac[0] << 8 | mac[1];
@@ -208,35 +245,28 @@ static void ftgmac100_init_hw(struct ftgmac100 *priv)
ftgmac100_set_mac(priv, priv->netdev->dev_addr);
 }
 
-#define MACCR_ENABLE_ALL   (FTGMAC100_MACCR_TXDMA_EN   | \
-FTGMAC100_MACCR_RXDMA_EN   | \
-FTGMAC100_MACCR_TXMAC_EN   | \
-FTGMAC100_MACCR_RXMAC_EN   | \
-FTGMAC100_MACCR_CRC_APD| \
-FTGMAC100_MACCR_RX_RUNT| \
-FTGMAC100_MACCR_RX_BROADPKT)
-
 static void ftgmac100_start_hw(struct ftgmac100 *priv)
 {
-   int maccr = MACCR_ENABLE_ALL;
+   u32 maccr = ioread32(priv->base + FTGMAC100_OFFSET_MACCR);
 
-   switch (priv->cur_speed) {
-   default:
-   case 10:
-   break;
+   /* Keep the original GMAC and FAST bits */
+   maccr &= (FTGMAC100_MACCR_FAST_MODE | FTGMAC100_MACCR_GIGA_MODE);
 
-   case 100:
-   maccr |= FTGMAC100_MACCR_FAST_MODE;
-   break;
-
-   case 1000:
-   maccr |= FTGMAC100_MACCR_GIGA_MODE;
-   break;
-   }
+   /* Add all the main enable bits */
+   maccr |= FTGMAC100_MACCR_TXDMA_EN   |
+FTGMAC100_MACCR_RXDMA_EN   |
+FTGMAC100_MACCR_TXMAC_EN   |
+FTGMAC100_MACCR_RXMAC_EN   |
+FTGMAC100_MACCR_CRC_APD|
+FTGMAC100_MACCR_PHY_LINK_LEVEL |
+FTGMAC100_MACCR_RX_RUNT|
+FTGMAC100_MACCR_RX_BROADPKT;
 
+   /* Add other bits as needed */
if (priv->cur_duplex == DUPLEX_FULL)
maccr |= FTGMAC100_MACCR_FULLDUP;
 
+   /* Hit the HW */
iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR);
 }
 
@@ -1154,7 +118

[PATCH v2 13/13] ftgmac100: Rework NAPI & interrupts handling

2017-04-04 Thread Benjamin Herrenschmidt

First, don't look at the interrupt status in the poll loop
to decide what to poll. It's wrong. If we have run out of
budget, we may still have RX packets to dequeue but no more
RX interrupt pending.

So instead move the code looking at the interrupt status
into the interrupt handler where it belongs. That avoids a slow
MMIO read in the NAPI fast path. We keep the abnormal interrupts
enabled while NAPI is scheduled.

While at it, actually do something useful in the "error" cases:

On AHB bus error, trigger the new reset task, that's about all
we can do. On RX packet fifo or descriptor overflows, we need
to restart the MAC after having freed things up. So set a flag
that NAPI will see and use to perform that restart after
harvesting the RX ring.

Finally, we shouldn't complete NAPI if there are still outgoing
packets that will need harvesting. Waiting for more interrupts
is less efficient than letting NAPI run a while longer while
the queue drains.
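
(Illustration only, not part of the patch.) A userspace model of the
polling policy described above: what to do is decided from ring state,
not from an interrupt status read, and polling is only declared complete
when RX is drained under budget and no TX packets remain to harvest. The
model_rings structure and its counters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_rings {
	int rx_ready;     /* received packets waiting to be processed */
	int tx_done;      /* sent packets waiting to be reclaimed */
	int tx_in_flight; /* queued packets the HW has not sent yet */
};

/*
 * Returns the number of RX packets processed. *complete is set to true
 * only when it would be safe to stop polling and unmask interrupts.
 */
static int model_poll(struct model_rings *r, int budget, bool *complete)
{
	int rx = 0;

	assert(r != NULL && complete != NULL && budget > 0);

	/* Reclaim everything the HW has finished sending. */
	r->tx_done = 0;

	/* Process RX up to the budget, regardless of interrupt status. */
	while (r->rx_ready > 0 && rx < budget) {
		r->rx_ready--;
		rx++;
	}

	/* Keep polling if RX hit the budget or TX still needs harvesting. */
	*complete = rx < budget && r->rx_ready == 0 && r->tx_in_flight == 0;
	return rx;
}
```

With 100 packets ready and a budget of 64, the first call processes 64
and stays scheduled; the second processes the remaining 36 and completes.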

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 137 +--
 drivers/net/ethernet/faraday/ftgmac100.h |  14 
 2 files changed, 90 insertions(+), 61 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 4fa138b..88dab5f 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -83,7 +83,7 @@ struct ftgmac100 {
bool use_ncsi;
 
/* Misc */
-   int int_mask_all;
+   bool need_mac_restart;
 };
 
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
@@ -1046,10 +1046,49 @@ static irqreturn_t ftgmac100_interrupt(int irq, void *dev_id)
 {
struct net_device *netdev = dev_id;
struct ftgmac100 *priv = netdev_priv(netdev);
+   unsigned int status, new_mask = FTGMAC100_INT_BAD;
 
-   /* Disable interrupts for polling */
-   iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
-   napi_schedule(&priv->napi);
+   /* Fetch and clear interrupt bits, process abnormal ones */
+   status = ioread32(priv->base + FTGMAC100_OFFSET_ISR);
+   iowrite32(status, priv->base + FTGMAC100_OFFSET_ISR);
+   if (unlikely(status & FTGMAC100_INT_BAD)) {
+
+   /* RX buffer unavailable */
+   if (status & FTGMAC100_INT_NO_RXBUF)
+   netdev->stats.rx_over_errors++;
+
+   /* received packet lost due to RX FIFO full */
+   if (status & FTGMAC100_INT_RPKT_LOST)
+   netdev->stats.rx_fifo_errors++;
+
+   /* sent packet lost due to excessive TX collision */
+   if (status & FTGMAC100_INT_XPKT_LOST)
+   netdev->stats.tx_fifo_errors++;
+
+   /* AHB error -> Reset the chip */
+   if (status & FTGMAC100_INT_AHB_ERR) {
+   if (net_ratelimit())
+   netdev_warn(netdev,
+  "AHB bus error ! Resetting chip.\n");
+   iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
+   schedule_work(&priv->reset_task);
+   return IRQ_HANDLED;
+   }
+
+   /* We may need to restart the MAC after such errors, delay
+* this until after we have freed some Rx buffers though
+*/
+   priv->need_mac_restart = true;
+
+   /* Disable those errors until we restart */
+   new_mask &= ~status;
+   }
+
+   /* Only enable "bad" interrupts while NAPI is on */
+   iowrite32(new_mask, priv->base + FTGMAC100_OFFSET_IER);
+
+   /* Schedule NAPI bh */
+   napi_schedule_irqoff(&priv->napi);
 
return IRQ_HANDLED;
 }
@@ -1057,68 +1096,51 @@ static irqreturn_t ftgmac100_interrupt(int irq, void *dev_id)
 static int ftgmac100_poll(struct napi_struct *napi, int budget)
 {
struct ftgmac100 *priv = container_of(napi, struct ftgmac100, napi);
-   struct net_device *netdev = priv->netdev;
-   unsigned int status;
-   bool completed = true;
+   bool more, completed = true;
int rx = 0;
 
-   status = ioread32(priv->base + FTGMAC100_OFFSET_ISR);
-   iowrite32(status, priv->base + FTGMAC100_OFFSET_ISR);
-
-   if (status & (FTGMAC100_INT_RPKT_BUF | FTGMAC100_INT_NO_RXBUF)) {
-   /*
-* FTGMAC100_INT_RPKT_BUF:
-*  RX DMA has received packets into RX buffer successfully
-*
-* FTGMAC100_INT_NO_RXBUF:
-*  RX buffer unavailable
-*/
-   bool retry;
+   ftgmac100_tx_complete(priv);
 
-   do {
-   retry = ftgmac100_rx_packet(priv, &rx);
-   } while (retry && rx < budget);
+   do {
+   more = ftgmac100_rx_packet(priv, &rx);
+   } while (more && rx < budget);
 
-   if (re

[PATCH v2 06/13] ftgmac100: Split ring alloc, init and rx buffer alloc

2017-04-04 Thread Benjamin Herrenschmidt

Currently, a single function is used to allocate the rings
themselves, initialize them, populate the rx ring, and
allocate the rx buffers. The same happens on free.

This splits them into separate functions. This will be
useful when properly implementing re-initialization on
link changes and error handling, where the rings will be
repopulated but not freed.
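
(Illustration only, not part of the patch.) A userspace sketch of the
split described above, assuming a hypothetical one-word descriptor and an
end-of-ring flag: allocation, initialization and freeing are separate, so
a ring can be re-initialized in place without being reallocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_ENTRIES     8
#define MODEL_END_OF_RING (1u << 30)   /* hypothetical flag bit */

struct model_desc { uint32_t des0; };
struct model_ring { struct model_desc *descs; };

/* Allocate zeroed descriptor memory only. */
static int model_alloc_ring(struct model_ring *r)
{
	assert(r != NULL);
	r->descs = calloc(MODEL_ENTRIES, sizeof(*r->descs));
	return r->descs ? 0 : -1;
}

/* (Re)initialize descriptors; safe to call again after a reset. */
static void model_init_ring(struct model_ring *r)
{
	int i;

	assert(r != NULL && r->descs != NULL);
	for (i = 0; i < MODEL_ENTRIES; i++)
		r->descs[i].des0 = 0;
	/* i == MODEL_ENTRIES here, so i - 1 is the last descriptor. */
	r->descs[i - 1].des0 |= MODEL_END_OF_RING;
}

/* Release descriptor memory; tolerates a ring that was never allocated. */
static void model_free_ring(struct model_ring *r)
{
	assert(r != NULL);
	free(r->descs);
	r->descs = NULL;
}
```

A reset path would call only model_init_ring(), while open and close
would pair model_alloc_ring() with model_free_ring().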

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 68 ++--
 1 file changed, 47 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index cc6e971..0d0576f 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -792,6 +792,7 @@ static void ftgmac100_free_buffers(struct ftgmac100 *priv)
 {
int i;
 
+   /* Free all RX buffers */
for (i = 0; i < RX_QUEUE_ENTRIES; i++) {
struct ftgmac100_rxdes *rxdes = &priv->descs->rxdes[i];
struct page *page = ftgmac100_rxdes_get_page(priv, rxdes);
@@ -804,6 +805,7 @@ static void ftgmac100_free_buffers(struct ftgmac100 *priv)
__free_page(page);
}
 
+   /* Free all TX buffers */
for (i = 0; i < TX_QUEUE_ENTRIES; i++) {
struct ftgmac100_txdes *txdes = &priv->descs->txdes[i];
struct sk_buff *skb = ftgmac100_txdes_get_skb(txdes);
@@ -815,40 +817,54 @@ static void ftgmac100_free_buffers(struct ftgmac100 *priv)
dma_unmap_single(priv->dev, map, skb_headlen(skb), DMA_TO_DEVICE);
kfree_skb(skb);
}
-
-   dma_free_coherent(priv->dev, sizeof(struct ftgmac100_descs),
- priv->descs, priv->descs_dma_addr);
 }
 
-static int ftgmac100_alloc_buffers(struct ftgmac100 *priv)
+static void ftgmac100_free_rings(struct ftgmac100 *priv)
 {
-   int i;
+   /* Free descriptors */
+   if (priv->descs)
+   dma_free_coherent(priv->dev, sizeof(struct ftgmac100_descs),
+ priv->descs, priv->descs_dma_addr);
+}
 
+static int ftgmac100_alloc_rings(struct ftgmac100 *priv)
+{
+   /* Allocate descriptors */
priv->descs = dma_zalloc_coherent(priv->dev,
  sizeof(struct ftgmac100_descs),
  &priv->descs_dma_addr, GFP_KERNEL);
if (!priv->descs)
return -ENOMEM;
 
-   /* initialize RX ring */
-   ftgmac100_rxdes_set_end_of_ring(priv,
-   &priv->descs->rxdes[RX_QUEUE_ENTRIES - 1]);
+   return 0;
+}
+
+static void ftgmac100_init_rings(struct ftgmac100 *priv)
+{
+   int i;
+
+   /* Initialize RX ring */
+   for (i = 0; i < RX_QUEUE_ENTRIES; i++)
+   priv->descs->rxdes[i].rxdes0 = 0;
+   ftgmac100_rxdes_set_end_of_ring(priv, &priv->descs->rxdes[i - 1]);
+
+   /* Initialize TX ring */
+   for (i = 0; i < TX_QUEUE_ENTRIES; i++)
+   priv->descs->txdes[i].txdes0 = 0;
+   ftgmac100_txdes_set_end_of_ring(priv, &priv->descs->txdes[i -1]);
+}
+
+static int ftgmac100_alloc_rx_buffers(struct ftgmac100 *priv)
+{
+   int i;
 
for (i = 0; i < RX_QUEUE_ENTRIES; i++) {
struct ftgmac100_rxdes *rxdes = &priv->descs->rxdes[i];
 
if (ftgmac100_alloc_rx_page(priv, rxdes, GFP_KERNEL))
-   goto err;
+   return -ENOMEM;
}
-
-   /* initialize TX ring */
-   ftgmac100_txdes_set_end_of_ring(priv,
-   &priv->descs->txdes[TX_QUEUE_ENTRIES - 1]);
return 0;
-
-err:
-   ftgmac100_free_buffers(priv);
-   return -ENOMEM;
 }
 
 static void ftgmac100_adjust_link(struct net_device *netdev)
@@ -1099,12 +1115,20 @@ static int ftgmac100_open(struct net_device *netdev)
unsigned int status;
int err;
 
-   err = ftgmac100_alloc_buffers(priv);
+   /* Allocate ring buffers  */
+   err = ftgmac100_alloc_rings(priv);
if (err) {
-   netdev_err(netdev, "failed to allocate buffers\n");
-   goto err_alloc;
+   netdev_err(netdev, "Failed to allocate descriptors\n");
+   return err;
}
 
+   /* Initialize the rings */
+   ftgmac100_init_rings(priv);
+
+   /* Allocate receive buffers */
+   if (ftgmac100_alloc_rx_buffers(priv))
+   goto err_alloc;
+
err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev);
if (err) {
netdev_err(netdev, "failed to request irq %d\n", netdev->irq);
@@ -1168,8 +1192,9 @@ static int ftgmac100_open(struct net_device *netdev)
 err_hw:
free_irq(netdev->irq, netdev);
 err_irq:
-   ftgmac100_free_buffers(priv);
 err_alloc:
+   ftgmac100_free_buffers(priv);
+   ftgmac100_free_rings(priv);
return err;
 }
 
@@ -1190,6

[PATCH v2 05/13] ftgmac100: Cleanup speed/duplex tracking and fix duplex config

2017-04-04 Thread Benjamin Herrenschmidt

Keep track of both the current speed and duplex settings
instead of only speed and properly apply the duplex setting
to the HW.

This reworks the adjust_link() function to also avoid trying
to reconfigure the HW when there is no link and to display
the link state to the user.
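
(Illustration only, not part of the patch.) A userspace model of the
decision logic described above: "no link" is stored as speed 0, nothing
happens when speed and duplex are unchanged, and the HW is only
reconfigured when a link is actually present. The names are hypothetical,
and this sketch compares against the derived new_speed, which is an
assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_link {
	int cur_speed;  /* 0 means "no link" */
	int cur_duplex;
};

enum model_action {
	MODEL_NOTHING,     /* state unchanged */
	MODEL_LINK_DOWN,   /* state recorded, HW left alone */
	MODEL_RECONFIGURE, /* state recorded, HW must be reprogrammed */
};

static enum model_action model_adjust_link(struct model_link *l, bool link_up,
					   int speed, int duplex)
{
	int new_speed = link_up ? speed : 0;

	assert(l != NULL);

	if (new_speed == l->cur_speed && duplex == l->cur_duplex)
		return MODEL_NOTHING;

	/* Record the new state before deciding whether to touch the HW. */
	l->cur_speed = new_speed;
	l->cur_duplex = duplex;

	if (!new_speed)
		return MODEL_LINK_DOWN;
	return MODEL_RECONFIGURE;
}
```

Recording the state before the link-down bail-out corresponds to the v2
note below about updating priv->cur_speed first.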

Signed-off-by: Benjamin Herrenschmidt 
--

v2. Use phy_print_status()
Only bail out on link down *after* updating priv->cur_speed
---
 drivers/net/ethernet/faraday/ftgmac100.c | 52 +++-
 1 file changed, 44 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index cc2271b..cc6e971 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -77,7 +77,8 @@ struct ftgmac100 {
struct mii_bus *mii_bus;
 
/* Link management */
-   int old_speed;
+   int cur_speed;
+   int cur_duplex;
bool use_ncsi;
 
/* Misc */
@@ -210,16 +211,15 @@ static void ftgmac100_init_hw(struct ftgmac100 *priv)
 FTGMAC100_MACCR_RXDMA_EN   | \
 FTGMAC100_MACCR_TXMAC_EN   | \
 FTGMAC100_MACCR_RXMAC_EN   | \
-FTGMAC100_MACCR_FULLDUP| \
 FTGMAC100_MACCR_CRC_APD| \
 FTGMAC100_MACCR_RX_RUNT| \
 FTGMAC100_MACCR_RX_BROADPKT)
 
-static void ftgmac100_start_hw(struct ftgmac100 *priv, int speed)
+static void ftgmac100_start_hw(struct ftgmac100 *priv)
 {
int maccr = MACCR_ENABLE_ALL;
 
-   switch (speed) {
+   switch (priv->cur_speed) {
default:
case 10:
break;
@@ -233,6 +233,9 @@ static void ftgmac100_start_hw(struct ftgmac100 *priv, int speed)
break;
}
 
+   if (priv->cur_duplex == DUPLEX_FULL)
+   maccr |= FTGMAC100_MACCR_FULLDUP;
+
iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR);
 }
 
@@ -852,12 +855,31 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
struct phy_device *phydev = netdev->phydev;
+   int new_speed;
int ier;
 
-   if (phydev->speed == priv->old_speed)
+   /* We store "no link" as speed 0 */
+   if (!phydev->link)
+   new_speed = 0;
+   else
+   new_speed = phydev->speed;
+
+   if (phydev->speed == priv->cur_speed &&
+   phydev->duplex == priv->cur_duplex)
return;
 
-   priv->old_speed = phydev->speed;
+   /* Print status if we have a link or we had one and just lost it,
+* don't print otherwise.
+*/
+   if (new_speed || priv->cur_speed)
+   phy_print_status(phydev);
+
+   priv->cur_speed = new_speed;
+   priv->cur_duplex = phydev->duplex;
+
+   /* Link is down, do nothing else */
+   if (!new_speed)
+   return;
 
ier = ioread32(priv->base + FTGMAC100_OFFSET_IER);
 
@@ -869,7 +891,7 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
 
netif_start_queue(netdev);
ftgmac100_init_hw(priv);
-   ftgmac100_start_hw(priv, phydev->speed);
+   ftgmac100_start_hw(priv);
 
/* re-enable interrupts */
iowrite32(ier, priv->base + FTGMAC100_OFFSET_IER);
@@ -1089,6 +,20 @@ static int ftgmac100_open(struct net_device *netdev)
goto err_irq;
}
 
+   /* When using NC-SI we force the speed to 100Mbit/s full duplex,
+*
+* Otherwise we leave it set to 0 (no link), the link
+* message from the PHY layer will handle setting it up to
+* something else if needed.
+*/
+   if (priv->use_ncsi) {
+   priv->cur_duplex = DUPLEX_FULL;
+   priv->cur_speed = SPEED_100;
+   } else {
+   priv->cur_duplex = 0;
+   priv->cur_speed = 0;
+   }
+
priv->rx_pointer = 0;
priv->tx_clean_pointer = 0;
priv->tx_pointer = 0;
@@ -1099,7 +1135,7 @@ static int ftgmac100_open(struct net_device *netdev)
goto err_hw;
 
ftgmac100_init_hw(priv);
-   ftgmac100_start_hw(priv, priv->use_ncsi ? 100 : 10);
+   ftgmac100_start_hw(priv);
 
/* Clear stale interrupts */
status = ioread32(priv->base + FTGMAC100_OFFSET_ISR);
-- 
2.9.3

[PATCH v2 01/13] ftgmac100: Use netdev->irq instead of private copy

2017-04-04 Thread Benjamin Herrenschmidt

There's a placeholder already for the irq, use it

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 928b0df..bf7b1c0 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -55,7 +55,6 @@ struct ftgmac100_descs {
 struct ftgmac100 {
struct resource *res;
void __iomem *base;
-   int irq;
 
struct ftgmac100_descs *descs;
dma_addr_t descs_dma_addr;
@@ -1119,9 +1118,9 @@ static int ftgmac100_open(struct net_device *netdev)
goto err_alloc;
}
 
-   err = request_irq(priv->irq, ftgmac100_interrupt, 0, netdev->name, netdev);
+   err = request_irq(netdev->irq, ftgmac100_interrupt, 0, netdev->name, netdev);
if (err) {
-   netdev_err(netdev, "failed to request irq %d\n", priv->irq);
+   netdev_err(netdev, "failed to request irq %d\n", netdev->irq);
goto err_irq;
}
 
@@ -1168,7 +1167,7 @@ static int ftgmac100_open(struct net_device *netdev)
netif_stop_queue(netdev);
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 err_hw:
-   free_irq(priv->irq, netdev);
+   free_irq(netdev->irq, netdev);
 err_irq:
ftgmac100_free_buffers(priv);
 err_alloc:
@@ -1194,7 +1193,7 @@ static int ftgmac100_stop(struct net_device *netdev)
ncsi_stop_dev(priv->ndev);
 
ftgmac100_stop_hw(priv);
-   free_irq(priv->irq, netdev);
+   free_irq(netdev->irq, netdev);
ftgmac100_free_buffers(priv);
 
return 0;
@@ -1381,7 +1380,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
goto err_ioremap;
}
 
-   priv->irq = irq;
+   netdev->irq = irq;
 
/* MAC address from chip or random one */
ftgmac100_setup_mac(priv);
@@ -1438,7 +1437,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
goto err_register_netdev;
}
 
-   netdev_info(netdev, "irq %d, mapped at %p\n", priv->irq, priv->base);
+   netdev_info(netdev, "irq %d, mapped at %p\n", netdev->irq, priv->base);
 
return 0;
 
-- 
2.9.3

[PATCH v2 04/13] ftgmac100: Remove "enabled" flags

2017-04-04 Thread Benjamin Herrenschmidt

It's not used in any meaningful way

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 02e0534..cc2271b 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -82,7 +82,6 @@ struct ftgmac100 {
 
/* Misc */
int int_mask_all;
-   bool enabled;
 };
 
 static int ftgmac100_alloc_rx_page(struct ftgmac100 *priv,
@@ -1124,8 +1123,6 @@ static int ftgmac100_open(struct net_device *netdev)
goto err_ncsi;
}
 
-   priv->enabled = true;
-
return 0;
 
 err_ncsi:
@@ -1144,11 +1141,7 @@ static int ftgmac100_stop(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
 
-   if (!priv->enabled)
-   return 0;
-
/* disable all interrupts */
-   priv->enabled = false;
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 
netif_stop_queue(netdev);
-- 
2.9.3

[PATCH v2 00/13] ftgmac100: Rework batch 1 - Link & Interrupts

2017-04-04 Thread Benjamin Herrenschmidt

This is version 2 of the first batch of updates to the
ftgmac100 driver.

Essentially:

 - A few misc cleanups
 - Fixing link speed & duplex handling (including dealing with
   an Aspeed requirement to double reset the controller when
   the speed changes)
 - Addition of a reset task work item, which will be used
   for delaying the re-initialization of the controller
 - Fixing a number of issues with how interrupts and NAPI
   are dealt with.

Subsequent batches will rework and improve the rx path, the
tx path, and add a bunch of features and fixes.

Version 2 addresses some review comments to patches 5 and 10
(see version history in the respective emails).

[PATCH net-next] liquidio: fix Octeon core watchdog timeout false alarm

2017-04-04 Thread Felix Manlunas

Detection of watchdog timeout of Octeon cores is flawed and susceptible to
false alarms.  Refactor by removing the detection code, and in its place,
leverage existing code that monitors for an indication from the NIC
firmware that an Octeon core crashed; expand the meaning of the indication
to "an Octeon core crashed or its watchdog timer expired".  Detection of
watchdog timeout is now delegated to an exception handler in the NIC
firmware; this is free of false alarms.

Also if there's an Octeon core crash or watchdog timeout:
(1) Disable VF Ethernet links.
(2) Decrement the module refcount by an amount equal to the number of
active VFs of the NIC whose Octeon core crashed or had a watchdog
timeout.  The refcount will continue to reflect the active VFs of
other liquidio NIC(s) (if present) whose Octeon cores are faultless.

Item (2) is needed to avoid the case of not being able to unload the driver
because the module refcount is stuck at some non-zero number.  There is
code that, in normal cases, decrements the refcount upon receiving a
message from the firmware that a VF driver was unloaded.  But in
exceptional cases like an Octeon core crash or watchdog timeout, arrival of
that particular message from the firmware might be unreliable.  That normal
case code is changed to not touch the refcount in the exceptional case to
avoid contention (over the refcount) with the liquidio_watchdog kernel
thread, which will carry out item (2).

Signed-off-by: Felix Manlunas 
Signed-off-by: Derek Chickles 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 178 -
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   2 +
 .../net/ethernet/cavium/liquidio/octeon_network.h  |   4 -
 3 files changed, 107 insertions(+), 77 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index a8426d3..fa673a1 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -173,6 +173,8 @@ static int liquidio_stop(struct net_device *netdev);
 static void liquidio_remove(struct pci_dev *pdev);
 static int liquidio_probe(struct pci_dev *pdev,
  const struct pci_device_id *ent);
+static int liquidio_set_vf_link_state(struct net_device *netdev, int vfidx,
+ int linkstate);
 
 static struct handshake handshake[MAX_OCTEON_DEVICES];
 static struct completion first_stage;
@@ -1199,97 +1201,122 @@ static int octeon_setup_interrupt(struct octeon_device *oct)
return 0;
 }
 
+static struct octeon_device *get_other_octeon_device(struct octeon_device *oct)
+{
+   struct octeon_device *other_oct;
+
+   other_oct = lio_get_device(oct->octeon_id + 1);
+
+   if (other_oct && other_oct->pci_dev) {
+   int oct_busnum, other_oct_busnum;
+
+   oct_busnum = oct->pci_dev->bus->number;
+   other_oct_busnum = other_oct->pci_dev->bus->number;
+
+   if (oct_busnum == other_oct_busnum) {
+   int oct_slot, other_oct_slot;
+
+   oct_slot = PCI_SLOT(oct->pci_dev->devfn);
+   other_oct_slot = PCI_SLOT(other_oct->pci_dev->devfn);
+
+   if (oct_slot == other_oct_slot)
+   return other_oct;
+   }
+   }
+
+   return NULL;
+}
+
+static void disable_all_vf_links(struct octeon_device *oct)
+{
+   struct net_device *netdev;
+   int max_vfs, vf, i;
+
+   if (!oct)
+   return;
+
+   max_vfs = oct->sriov_info.max_vfs;
+
+   for (i = 0; i < oct->ifcount; i++) {
+   netdev = oct->props[i].netdev;
+   if (!netdev)
+   continue;
+
+   for (vf = 0; vf < max_vfs; vf++)
+   liquidio_set_vf_link_state(netdev, vf,
+  IFLA_VF_LINK_STATE_DISABLE);
+   }
+}
+
 static int liquidio_watchdog(void *param)
 {
-   u64 wdog;
-   u16 mask_of_stuck_cores = 0;
-   u16 mask_of_crashed_cores = 0;
-   int core_num;
-   u8 core_is_stuck[LIO_MAX_CORES];
-   u8 core_crashed[LIO_MAX_CORES];
+   bool err_msg_was_printed[LIO_MAX_CORES];
+   u16 mask_of_crashed_or_stuck_cores = 0;
+   bool all_vf_links_are_disabled = false;
struct octeon_device *oct = param;
+   struct octeon_device *other_oct;
+#ifdef CONFIG_MODULE_UNLOAD
+   long refcount, vfs_referencing_pf;
+   u64 vfs_mask1, vfs_mask2;
+#endif
+   int core;
 
-   memset(core_is_stuck, 0, sizeof(core_is_stuck));
-   memset(core_crashed, 0, sizeof(core_crashed));
+   memset(err_msg_was_printed, 0, sizeof(err_msg_was_printed));
 
while (!kthread_should_stop()) {
-   mask_of_crashed_cores =
+   /* sleep for a couple of seconds so that we don't hog the CPU */
+   set_current_stat

[Patch net] net_sched: replace yield() with cond_resched()

2017-04-04 Thread Cong Wang

yield() should be rendered dead, according to Mike.

It is hard to wait properly for all qdiscs to transmit
all packets. So just keep the original logic.

Reported-by: Mike Galbraith 
Signed-off-by: Cong Wang 
---
 net/sched/sch_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 1a2f9e9..4725d2f 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -925,7 +925,7 @@ void dev_deactivate_many(struct list_head *head)
/* Wait for outstanding qdisc_run calls. */
list_for_each_entry(dev, head, close_list)
while (some_qdisc_is_busy(dev))
-   yield();
+   cond_resched();
 }
 
 void dev_deactivate(struct net_device *dev)
-- 
2.5.5

[Patch net] net_sched: check noop_qdisc before qdisc_hash_add()

2017-04-04 Thread Cong Wang

Dmitry reported a crash when injecting faults in
attach_one_default_qdisc(). In that case dev->qdisc is still
&noop_qdisc, and the check before qdisc_hash_add() fails
to catch it because it tests against NULL. We should test
against noop_qdisc instead, since it is the default qdisc
at this point.

Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Dmitry Vyukov 
Cc: Jiri Kosina 
Signed-off-by: Cong Wang 
---
 net/sched/sch_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index b052b27..1a2f9e9 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -794,7 +794,7 @@ static void attach_default_qdiscs(struct net_device *dev)
}
}
 #ifdef CONFIG_NET_SCHED
-   if (dev->qdisc)
+   if (dev->qdisc != &noop_qdisc)
qdisc_hash_add(dev->qdisc);
 #endif
 }
-- 
2.5.5
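
The reasoning in the noop_qdisc patch above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct qdisc`, `should_hash_old()` and `should_hash_new()` are hypothetical stand-ins, not kernel definitions. The point is that a pointer to a static placeholder object is never NULL, so only a comparison against the placeholder's address can detect it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct Qdisc (assumption). */
struct qdisc {
	const char *name;
};

/* Static sentinel playing the role of noop_qdisc. */
static struct qdisc noop_qdisc = { .name = "noop" };

/* Old check: true for any non-NULL pointer, including the sentinel. */
static bool should_hash_old(const struct qdisc *q)
{
	return q != NULL;
}

/* New check: rejects the sentinel, which is the default before attach. */
static bool should_hash_new(const struct qdisc *q)
{
	return q != &noop_qdisc;
}
```

With these helpers, `should_hash_old(&noop_qdisc)` is true while `should_hash_new(&noop_qdisc)` is false, which mirrors the one-line change in the diff.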

[PATCH net-next] net: usbnet: Remove unused driver_name variable

2017-04-04 Thread Florian Fainelli

With GCC 6.3, we can get the following warning:

drivers/net/usb/usbnet.c:85:19: warning: 'driver_name' defined but not
used [-Wunused-const-variable=]
 static const char driver_name [] = "usbnet";
   ^~~

Signed-off-by: Florian Fainelli 
---
 drivers/net/usb/usbnet.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 9890656af735..1cc945cbeaa3 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -82,8 +82,6 @@
 // randomly generated ethernet address
 static u8  node_id [ETH_ALEN];
 
-static const char driver_name [] = "usbnet";
-
 /* use ethtool to change the level for any given device */
 static int msg_level = -1;
 module_param (msg_level, int, 0);
-- 
2.9.3

Re: net/ipv4: use-after-free in ipv4_mtu

2017-04-04 Thread Cong Wang

On Tue, Apr 4, 2017 at 11:51 AM, Eric Dumazet  wrote:
> On Tue, Apr 4, 2017 at 7:50 AM, Andrey Konovalov  
> wrote:
>>
>> Hi,
>>
>> I've got the following error report while fuzzing the kernel with syzkaller.
>>
>> On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5).
>>
>> Unfortunately it's not reproducible.
>>
>> ==
>> BUG: KASAN: use-after-free in dst_metric_raw include/net/dst.h:176
>> [inline] at addr 88003d6a965c
>> BUG: KASAN: use-after-free in ipv4_mtu+0x3f2/0x4b0
>> net/ipv4/route.c:1270 at addr 88003d6a965c
>> Read of size 4 by task syz-executor3/20611
>> CPU: 3 PID: 20611 Comm: syz-executor3 Not tainted 4.11.0-rc5+ #199
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x292/0x398 lib/dump_stack.c:52
>>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
>>  print_address_description mm/kasan/report.c:202 [inline]
>>  kasan_report_error mm/kasan/report.c:291 [inline]
>>  kasan_report+0x252/0x510 mm/kasan/report.c:347
>>  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367
>>  dst_metric_raw include/net/dst.h:176 [inline]
>>  ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270
>>  dst_mtu include/net/dst.h:221 [inline]
>>  do_ip_getsockopt+0x71d/0x2290 net/ipv4/ip_sockglue.c:1433
>>  ip_getsockopt+0x90/0x230 net/ipv4/ip_sockglue.c:1578
>>  tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3131
>>  sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2709
>>  SYSC_getsockopt net/socket.c:1829 [inline]
>>  SyS_getsockopt+0x252/0x390 net/socket.c:1811
>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> RIP: 0033:0x4458d9
>> RSP: 002b:7fe87f452b58 EFLAGS: 0286 ORIG_RAX: 0037
>> RAX: ffda RBX: 0005 RCX: 004458d9
>> RDX: 000e RSI:  RDI: 0005
>> RBP: 006e0020 R08: 20db6000 R09: 
>> R10: 207e8000 R11: 0286 R12: 00708150
>> R13: 20db8000 R14: 1000 R15: 0003
>> Object at 88003d6a9658, in cache kmalloc-64 size: 64
>> Allocated:
>> PID = 20110
>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>>  set_track mm/kasan/kasan.c:525 [inline]
>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
>>  kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
>>  kmalloc include/linux/slab.h:490 [inline]
>>  kzalloc include/linux/slab.h:663 [inline]
>>  fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040
>>  fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221
>>  ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597
>>  inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882
>> sctp: [Deprecated]: syz-executor0 (pid 20638) Use of int in max_burst
>> socket option.
>> Use struct sctp_assoc_value instead
>>  sock_do_ioctl+0x65/0xb0 net/socket.c:906
>>  sock_ioctl+0x28f/0x440 net/socket.c:1004
>>  vfs_ioctl fs/ioctl.c:45 [inline]
>>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
>>  SYSC_ioctl fs/ioctl.c:700 [inline]
>>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> Freed:
>> PID = 4439
>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>>  set_track mm/kasan/kasan.c:525 [inline]
>>  kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
>>  slab_free_hook mm/slub.c:1357 [inline]
>>  slab_free_freelist_hook mm/slub.c:1379 [inline]
>>  slab_free mm/slub.c:2961 [inline]
>>  kfree+0xe8/0x2b0 mm/slub.c:3882
>>  free_fib_info_rcu+0x4ba/0x5e0 net/ipv4/fib_semantics.c:218
>>  __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
>>  rcu_do_batch.isra.64+0x947/0xcc0 kernel/rcu/tree.c:2879
>>  invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline]
>>  __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline]
>>  rcu_process_callbacks+0x2cc/0xb90 kernel/rcu/tree.c:3126
>>  __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
>> Memory state around the buggy address:
>>  88003d6a9500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>  88003d6a9580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>> >88003d6a9600: fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb
>> ^
>>  88003d6a9680: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>>  88003d6a9700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>> ==
>
> Thanks for the report Andrey
>
> Looking at fib->fib_metrics, I fail to understand how the following can work :
>
> dst_init_metrics(&rt->dst, fi->fib_metrics, true);
>
> In the cases fi->fib_metrics is _not_ dst_default_metrics,
> fi->fib_metrics can be freed when the fib is deleted,
> while dst(s) have still the 'read only pointer'.
>
> RCU grace period before fi->fib_metrics freeing does not help.
>
> Without refcounts, it looks like we need to copy the

[PATCH net-next 14/14] nfp: add support for .set_link_ksettings()

2017-04-04 Thread Jakub Kicinski

Support setting link speed and autonegotiation through the
set_link_ksettings() ethtool op.  If the port is reconfigured
in an incompatible way and a reboot is required, the netdev will get
unregistered and not come back until the user reboots the system.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 963d6dd97cec..3328041ec290 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -237,6 +237,51 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
return 0;
 }
 
+static int
+nfp_net_set_link_ksettings(struct net_device *netdev,
+  const struct ethtool_link_ksettings *cmd)
+{
+   struct nfp_net *nn = netdev_priv(netdev);
+   struct nfp_nsp *nsp;
+   int err;
+
+   if (!nn->eth_port)
+   return -EOPNOTSUPP;
+
+   if (netif_running(netdev)) {
+   nn_warn(nn, "Changing settings not allowed on an active interface. It may cause the port to be disabled until reboot.\n");
+   return -EBUSY;
+   }
+
+   nsp = nfp_eth_config_start(nn->cpp, nn->eth_port->index);
+   if (IS_ERR(nsp))
+   return PTR_ERR(nsp);
+
+   err = __nfp_eth_set_aneg(nsp, cmd->base.autoneg == AUTONEG_ENABLE ?
+NFP_ANEG_AUTO : NFP_ANEG_DISABLED);
+   if (err)
+   goto err_bad_set;
+   if (cmd->base.speed != SPEED_UNKNOWN) {
+   u32 speed = cmd->base.speed / nn->eth_port->lanes;
+
+   err = __nfp_eth_set_speed(nsp, speed);
+   if (err)
+   goto err_bad_set;
+   }
+
+   err = nfp_eth_config_commit_end(nsp);
+   if (err > 0)
+   return 0; /* no change */
+
+   nfp_net_refresh_port_config(nn);
+
+   return err;
+
+err_bad_set:
+   nfp_eth_config_cleanup_end(nsp);
+   return err;
+}
+
 static void nfp_net_get_ringparam(struct net_device *netdev,
  struct ethtool_ringparam *ring)
 {
@@ -879,6 +924,7 @@ static const struct ethtool_ops nfp_net_ethtool_ops = {
.get_channels   = nfp_net_get_channels,
.set_channels   = nfp_net_set_channels,
.get_link_ksettings = nfp_net_get_link_ksettings,
+   .set_link_ksettings = nfp_net_set_link_ksettings,
 };
 
 void nfp_net_set_ethtool_ops(struct net_device *netdev)
-- 
2.11.0

[PATCH net-next 06/14] nfp: report link speed from NSP

2017-04-04 Thread Jakub Kicinski

On the PF prefer the link speed value provided by the NSP.
Refresh port table if needed.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index d3cec0d4a978..0fdc14e7b576 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -49,6 +49,7 @@
 #include 
 
 #include "nfpcore/nfp.h"
+#include "nfpcore/nfp_nsp_eth.h"
 #include "nfp_net_ctrl.h"
 #include "nfp_net.h"
 
@@ -205,6 +206,16 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
if (!netif_carrier_ok(netdev))
return 0;
 
+   /* Use link speed from ETH table if available, otherwise try the BAR */
+   if (nn->eth_port && nfp_net_link_changed_read_clear(nn))
+   nfp_net_refresh_port_config(nn);
+   /* Separate if - on FW error the port could've disappeared from table */
+   if (nn->eth_port) {
+   cmd->base.speed = nn->eth_port->speed;
+   cmd->base.duplex = DUPLEX_FULL;
+   return 0;
+   }
+
sts = nn_readl(nn, NFP_NET_CFG_STS);
 
ls = FIELD_GET(NFP_NET_CFG_STS_LINK_RATE, sts);
-- 
2.11.0

[PATCH net-next 03/14] nfp: add mutex protection for the port list

2017-04-04 Thread Jakub Kicinski

We will want to unregister netdevs after their port got reconfigured.
For that we need to make sure manipulations of port list from the
port reconfiguration flow will not race with driver's .remove()
callback.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c |  3 +--
 drivers/net/ethernet/netronome/nfp/nfp_main.h |  3 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 19 +--
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index dedac720fb29..96266796fd09 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -385,8 +385,7 @@ static void nfp_pci_remove(struct pci_dev *pdev)
 {
struct nfp_pf *pf = pci_get_drvdata(pdev);
 
-   if (!list_empty(&pf->ports))
-   nfp_net_pci_remove(pf);
+   nfp_net_pci_remove(pf);
 
nfp_pcie_sriov_disable(pdev);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index bb15a5724bf7..b7ceec9a5783 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct dentry;
@@ -67,6 +68,7 @@ struct nfp_eth_table;
  * @num_ports: Number of adapter ports app firmware supports
  * @num_netdevs:   Number of netdevs spawned
  * @ports: Linked list of port structures (struct nfp_net)
+ * @port_lock: Protects @ports, @num_ports, @num_netdevs
  */
 struct nfp_pf {
struct pci_dev *pdev;
@@ -92,6 +94,7 @@ struct nfp_pf {
unsigned int num_netdevs;
 
struct list_head ports;
+   struct mutex port_lock;
 };
 
 extern struct pci_driver nfp_netvf_pci_driver;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 1644954f52cd..4d602b1ddc90 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -481,17 +481,22 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
int stride;
int err;
 
+   mutex_init(&pf->port_lock);
+
/* Verify that the board has completed initialization */
if (!nfp_is_ready(pf->cpp)) {
nfp_err(pf->cpp, "NFP is not ready for NIC operation.\n");
return -EINVAL;
}
 
+   mutex_lock(&pf->port_lock);
pf->num_ports = nfp_net_pf_get_num_ports(pf);
 
ctrl_bar = nfp_net_pf_map_ctrl_bar(pf);
-   if (!ctrl_bar)
-   return pf->fw_loaded ? -EINVAL : -EPROBE_DEFER;
+   if (!ctrl_bar) {
+   err = pf->fw_loaded ? -EINVAL : -EPROBE_DEFER;
+   goto err_unlock;
+   }
 
nfp_net_get_fw_version(&fw_ver, ctrl_bar);
if (fw_ver.resv || fw_ver.class != NFP_NET_CFG_VERSION_CLASS_GENERIC) {
@@ -565,6 +570,8 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
if (err)
goto err_clean_ddir;
 
+   mutex_unlock(&pf->port_lock);
+
return 0;
 
 err_clean_ddir:
@@ -574,6 +581,8 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
nfp_cpp_area_release_free(pf->tx_area);
 err_ctrl_unmap:
nfp_cpp_area_release_free(pf->ctrl_area);
+err_unlock:
+   mutex_unlock(&pf->port_lock);
return err;
 }
 
@@ -581,6 +590,10 @@ void nfp_net_pci_remove(struct nfp_pf *pf)
 {
struct nfp_net *nn;
 
+   mutex_lock(&pf->port_lock);
+   if (list_empty(&pf->ports))
+   goto out;
+
list_for_each_entry(nn, &pf->ports, port_list) {
nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
 
@@ -597,4 +610,6 @@ void nfp_net_pci_remove(struct nfp_pf *pf)
nfp_cpp_area_release_free(pf->rx_area);
nfp_cpp_area_release_free(pf->tx_area);
nfp_cpp_area_release_free(pf->ctrl_area);
+out:
+   mutex_unlock(&pf->port_lock);
 }
-- 
2.11.0
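
The locking shape used in the port list patch above (take the lock once, route every later exit through a label that unlocks) can be sketched in userspace. This is a minimal illustration, assuming POSIX threads in place of the kernel mutex API; `struct pf`, `pf_init()`, `pf_probe()` and `pf_remove()` are hypothetical names, not driver functions.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct nfp_pf (assumption). */
struct pf {
	pthread_mutex_t port_lock; /* protects num_ports */
	unsigned int num_ports;
};

static void pf_init(struct pf *pf)
{
	pthread_mutex_init(&pf->port_lock, NULL);
	pf->num_ports = 0;
}

/* Once the lock is held, every error exit goes through err_unlock. */
static int pf_probe(struct pf *pf, unsigned int detected_ports)
{
	int err;

	pthread_mutex_lock(&pf->port_lock);
	if (detected_ports == 0) {
		err = -EINVAL;
		goto err_unlock;
	}
	pf->num_ports = detected_ports;
	pthread_mutex_unlock(&pf->port_lock);
	return 0;

err_unlock:
	pthread_mutex_unlock(&pf->port_lock);
	return err;
}

/* The "nothing to do" test happens under the lock, not before it. */
static void pf_remove(struct pf *pf)
{
	pthread_mutex_lock(&pf->port_lock);
	if (pf->num_ports == 0)
		goto out;
	pf->num_ports = 0;
out:
	pthread_mutex_unlock(&pf->port_lock);
}
```

Moving the emptiness check inside the locked region is the design choice that matters here: a check made before taking the lock could observe a list that another flow is in the middle of changing.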

[PATCH net-next 01/14] nfp: add support for .get_link_ksettings()

2017-04-04 Thread Jakub Kicinski

Read link speed from the BAR.  This provides very basic information
and works for both PFs and VFs.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  | 13 ++
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 49 ++
 2 files changed, 62 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
index 71d86171b4ee..d04ccc9f6116 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
@@ -177,6 +177,19 @@
 #define   NFP_NET_CFG_VERSION_MINOR(x) (((x) & 0xff) <<  0)
 #define NFP_NET_CFG_STS 0x0034
 #define   NFP_NET_CFG_STS_LINK (0x1 << 0) /* Link up or down */
+/* Link rate */
+#define   NFP_NET_CFG_STS_LINK_RATE_SHIFT 1
+#define   NFP_NET_CFG_STS_LINK_RATE_MASK  0xF
+#define   NFP_NET_CFG_STS_LINK_RATE   \
+   (NFP_NET_CFG_STS_LINK_RATE_MASK << NFP_NET_CFG_STS_LINK_RATE_SHIFT)
+#define   NFP_NET_CFG_STS_LINK_RATE_UNSUPPORTED   0
+#define   NFP_NET_CFG_STS_LINK_RATE_UNKNOWN   1
+#define   NFP_NET_CFG_STS_LINK_RATE_1G    2
+#define   NFP_NET_CFG_STS_LINK_RATE_10G   3
+#define   NFP_NET_CFG_STS_LINK_RATE_25G   4
+#define   NFP_NET_CFG_STS_LINK_RATE_40G   5
+#define   NFP_NET_CFG_STS_LINK_RATE_50G   6
+#define   NFP_NET_CFG_STS_LINK_RATE_100G  7
 #define NFP_NET_CFG_CAP 0x0038
 #define NFP_NET_CFG_MAX_TXRINGS 0x003c
 #define NFP_NET_CFG_MAX_RXRINGS 0x0040
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index ed22a813e579..d3cec0d4a978 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -173,6 +173,54 @@ static void nfp_net_get_drvinfo(struct net_device *netdev,
drvinfo->regdump_len = NFP_NET_CFG_BAR_SZ;
 }
 
+/**
+ * nfp_net_get_link_ksettings - Get Link Speed settings
+ * @netdev:network interface device structure
+ * @cmd:   ethtool command
+ *
+ * Reports speed settings based on info in the BAR provided by the fw.
+ */
+static int
+nfp_net_get_link_ksettings(struct net_device *netdev,
+  struct ethtool_link_ksettings *cmd)
+{
+   static const u32 ls_to_ethtool[] = {
+   [NFP_NET_CFG_STS_LINK_RATE_UNSUPPORTED] = 0,
+   [NFP_NET_CFG_STS_LINK_RATE_UNKNOWN] = SPEED_UNKNOWN,
+   [NFP_NET_CFG_STS_LINK_RATE_1G]  = SPEED_1000,
+   [NFP_NET_CFG_STS_LINK_RATE_10G] = SPEED_10000,
+   [NFP_NET_CFG_STS_LINK_RATE_25G] = SPEED_25000,
+   [NFP_NET_CFG_STS_LINK_RATE_40G] = SPEED_40000,
+   [NFP_NET_CFG_STS_LINK_RATE_50G] = SPEED_50000,
+   [NFP_NET_CFG_STS_LINK_RATE_100G]= SPEED_100000,
+   };
+   struct nfp_net *nn = netdev_priv(netdev);
+   u32 sts, ls;
+
+   ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
+   cmd->base.port = PORT_OTHER;
+   cmd->base.speed = SPEED_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
+
+   if (!netif_carrier_ok(netdev))
+   return 0;
+
+   sts = nn_readl(nn, NFP_NET_CFG_STS);
+
+   ls = FIELD_GET(NFP_NET_CFG_STS_LINK_RATE, sts);
+   if (ls == NFP_NET_CFG_STS_LINK_RATE_UNSUPPORTED)
+   return -EOPNOTSUPP;
+
+   if (ls == NFP_NET_CFG_STS_LINK_RATE_UNKNOWN ||
+   ls >= ARRAY_SIZE(ls_to_ethtool))
+   return 0;
+
+   cmd->base.speed = ls_to_ethtool[ls];
+   cmd->base.duplex = DUPLEX_FULL;
+
+   return 0;
+}
+
 static void nfp_net_get_ringparam(struct net_device *netdev,
  struct ethtool_ringparam *ring)
 {
@@ -814,6 +862,7 @@ static const struct ethtool_ops nfp_net_ethtool_ops = {
.set_coalesce   = nfp_net_set_coalesce,
.get_channels   = nfp_net_get_channels,
.set_channels   = nfp_net_set_channels,
+   .get_link_ksettings = nfp_net_get_link_ksettings,
 };
 
 void nfp_net_set_ethtool_ops(struct net_device *netdev)
-- 
2.11.0
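
The decode step in the get_link_ksettings patch above can be sketched as a standalone function: extract the 4-bit link-rate field from the status word, bounds-check it, and index the speed table with the decoded field rather than the raw register value. The shift, mask and rate codes are taken from the header hunk; the names `sts_to_mbps()`, `rate_to_mbps` and `SPEED_UNKNOWN_SKETCH` are assumptions made for this illustration, and the unsupported case simply yields 0 here instead of an error code.

```c
#include <stddef.h>
#include <stdint.h>

/* Field layout from the patch: 4-bit link rate starting at bit 1. */
#define STS_LINK_RATE_SHIFT 1
#define STS_LINK_RATE_MASK  0xFu

enum {
	RATE_UNSUPPORTED, RATE_UNKNOWN, RATE_1G, RATE_10G,
	RATE_25G, RATE_40G, RATE_50G, RATE_100G
};

/* Placeholder for "speed not known"; not ethtool's constant. */
#define SPEED_UNKNOWN_SKETCH (-1)

/* Speeds in Mb/s, indexed by the decoded link-rate code. */
static const int rate_to_mbps[] = {
	[RATE_UNSUPPORTED] = 0,
	[RATE_UNKNOWN]     = SPEED_UNKNOWN_SKETCH,
	[RATE_1G]          = 1000,
	[RATE_10G]         = 10000,
	[RATE_25G]         = 25000,
	[RATE_40G]         = 40000,
	[RATE_50G]         = 50000,
	[RATE_100G]        = 100000,
};

static int sts_to_mbps(uint32_t sts)
{
	/* Bit 0 is the link up/down flag, so shift it away first. */
	uint32_t ls = (sts >> STS_LINK_RATE_SHIFT) & STS_LINK_RATE_MASK;

	if (ls == RATE_UNKNOWN ||
	    ls >= sizeof(rate_to_mbps) / sizeof(rate_to_mbps[0]))
		return SPEED_UNKNOWN_SKETCH;

	return rate_to_mbps[ls];
}
```

For example, a status word with the link bit set and rate code 3 (`(RATE_10G << 1) | 1`) decodes to 10000 Mb/s, while a rate code outside the table falls back to the unknown value.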

[PATCH net-next 10/14] nfp: allow multi-stage NSP configuration

2017-04-04 Thread Jakub Kicinski

NSP commands may be slow to respond, so we should try to avoid doing
a command-per-item when the user requests to change multiple parameters,
for instance with an ethtool .set_settings() command.

Introduce a way for internal NSP code to carry state in the NSP structure
and add start/finish calls to perform the initialization and kick off
of the configuration request, with potentially many parameters being
modified in between.

nfp_eth_set_mod_enable() will make use of the new code internally,
other "set" functions to follow.
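
The start/modify/finish flow described above can be sketched as follows. This is a userspace illustration only: `struct port_cfg`, `device_table`, `slow_commands` and the `cfg_*()` functions are hypothetical and do not reflect the NSP's real table layout or API. The sketch keeps a working copy of the table in a session object, tracks whether anything really changed, and issues a single write at the end; it returns 1 from the finish call when nothing changed, matching how the callers in this series treat a positive return value.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PORTS 4

/* Hypothetical per-port settings; not the NSP's real layout. */
struct port_cfg {
	unsigned int speed;
	bool aneg;
};

static struct port_cfg device_table[NUM_PORTS]; /* pretend device state */
static unsigned int slow_commands;              /* writes actually issued */

/* State carried between the start and finish calls. */
struct cfg_session {
	struct port_cfg *entries; /* working copy of the whole table */
	unsigned int idx;         /* port being edited */
	bool modified;            /* set only when a value really changes */
};

static struct cfg_session *cfg_start(unsigned int idx)
{
	struct cfg_session *s;

	if (idx >= NUM_PORTS)
		return NULL;
	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	s->entries = malloc(sizeof(device_table));
	if (!s->entries) {
		free(s);
		return NULL;
	}
	memcpy(s->entries, device_table, sizeof(device_table));
	s->idx = idx;
	return s;
}

static void cfg_set_speed(struct cfg_session *s, unsigned int speed)
{
	if (s->entries[s->idx].speed == speed)
		return;
	s->entries[s->idx].speed = speed;
	s->modified = true;
}

static void cfg_set_aneg(struct cfg_session *s, bool aneg)
{
	if (s->entries[s->idx].aneg == aneg)
		return;
	s->entries[s->idx].aneg = aneg;
	s->modified = true;
}

/* Returns 0 after writing, 1 if nothing changed; frees the session. */
static int cfg_commit_end(struct cfg_session *s)
{
	int ret = 1;

	if (s->modified) {
		memcpy(device_table, s->entries, sizeof(device_table));
		slow_commands++;
		ret = 0;
	}
	free(s->entries);
	free(s);
	return ret;
}
```

Changing both speed and autonegotiation in one session costs one slow command instead of two, and a session that sets values to what they already are costs none.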

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |   8 ++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  43 
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |   4 +
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 114 +++--
 4 files changed, 138 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
index 778bd9424d5d..8afef7593f13 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
@@ -52,6 +52,14 @@ const char *nfp_hwinfo_lookup(struct nfp_cpp *cpp, const char *lookup);
 
 struct nfp_nsp;
 
+struct nfp_cpp *nfp_nsp_cpp(struct nfp_nsp *state);
+bool nfp_nsp_config_modified(struct nfp_nsp *state);
+void nfp_nsp_config_set_modified(struct nfp_nsp *state, bool modified);
+void *nfp_nsp_config_entries(struct nfp_nsp *state);
+unsigned int nfp_nsp_config_idx(struct nfp_nsp *state);
+void nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries,
+ unsigned int idx);
+void nfp_nsp_config_clear_state(struct nfp_nsp *state);
 int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size);
 int nfp_nsp_write_eth_table(struct nfp_nsp *state,
const void *buf, unsigned int size);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 6482831282b2..225d07815375 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -104,8 +104,51 @@ struct nfp_nsp {
u16 major;
u16 minor;
} ver;
+
+   /* Eth table config state */
+   bool modified;
+   unsigned int idx;
+   void *entries;
 };
 
+struct nfp_cpp *nfp_nsp_cpp(struct nfp_nsp *state)
+{
+   return state->cpp;
+}
+
+bool nfp_nsp_config_modified(struct nfp_nsp *state)
+{
+   return state->modified;
+}
+
+void nfp_nsp_config_set_modified(struct nfp_nsp *state, bool modified)
+{
+   state->modified = modified;
+}
+
+void *nfp_nsp_config_entries(struct nfp_nsp *state)
+{
+   return state->entries;
+}
+
+unsigned int nfp_nsp_config_idx(struct nfp_nsp *state)
+{
+   return state->idx;
+}
+
+void
+nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries, unsigned int idx)
+{
+   state->entries = entries;
+   state->idx = idx;
+}
+
+void nfp_nsp_config_clear_state(struct nfp_nsp *state)
+{
+   state->entries = NULL;
+   state->idx = 0;
+}
+
 static int nfp_nsp_check(struct nfp_nsp *state)
 {
struct nfp_cpp *cpp = state->cpp;
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index e3baec32..c452ad311993 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -136,4 +136,8 @@ struct nfp_eth_table *
 __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp);
 int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable);
 
+struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx);
+int nfp_eth_config_commit_end(struct nfp_nsp *nsp);
+void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index 837de15ed720..55d8e073ccbd 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -268,63 +268,115 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp)
return NULL;
 }
 
-/**
- * nfp_eth_set_mod_enable() - set PHY module enable control bit
- * @cpp:   NFP CPP handle
- * @idx:   NFP chip-wide port index
- * @enable:Desired state
- *
- * Enable or disable PHY module (this usually means setting the TX lanes
- * disable bits).
- *
- * Return: 0 or -ERRNO.
- */
-int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable)
+struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx)
 {
struct eth_table_entry *entries;
struct nfp_nsp *nsp;
-   u64 reg;
int ret;
 
entries = kzalloc(NSP_ETH_TABLE_SIZE, GFP_KER

[PATCH net-next 04/14] nfp: track link state changes

2017-04-04 Thread Jakub Kicinski

For caching link settings - remember if we have seen link events
since the last time the eth_port information was refreshed.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h|  6 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 14 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 8e04aa0e6e87..91e963b5104f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -523,7 +523,8 @@ struct nfp_net_dp {
  * @reconfig_sync_present:  Some thread is performing synchronous reconfig
  * @reconfig_timer:Timer for async reading of reconfig results
  * @link_up:Is the link up?
- * @link_status_lock:  Protects @link_up and ensures atomicity with BAR reading
+ * @link_changed:  Has link state changed since last port refresh?
+ * @link_status_lock:  Protects @link_* and ensures atomicity with BAR reading
  * @rx_coalesce_usecs:  RX interrupt moderation usecs delay parameter
  * @rx_coalesce_max_frames: RX interrupt moderation frame count parameter
  * @tx_coalesce_usecs:  TX interrupt moderation usecs delay parameter
@@ -580,6 +581,7 @@ struct nfp_net {
u32 me_freq_mhz;
 
bool link_up;
+   bool link_changed;
spinlock_t link_status_lock;
 
spinlock_t reconfig_lock;
@@ -810,6 +812,8 @@ nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry *irq_entries,
 struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
 int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new);
 
+bool nfp_net_link_changed_read_clear(struct nfp_net *nn);
+
 #ifdef CONFIG_NFP_DEBUG
 void nfp_net_debugfs_create(void);
 void nfp_net_debugfs_destroy(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 6944a3202a45..8664815f45ce 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -376,6 +376,19 @@ static irqreturn_t nfp_net_irq_rxtx(int irq, void *data)
return IRQ_HANDLED;
 }
 
+bool nfp_net_link_changed_read_clear(struct nfp_net *nn)
+{
+   unsigned long flags;
+   bool ret;
+
+   spin_lock_irqsave(&nn->link_status_lock, flags);
+   ret = nn->link_changed;
+   nn->link_changed = false;
+   spin_unlock_irqrestore(&nn->link_status_lock, flags);
+
+   return ret;
+}
+
 /**
  * nfp_net_read_link_status() - Reread link status from control BAR
  * @nn:   NFP Network structure
@@ -395,6 +408,7 @@ static void nfp_net_read_link_status(struct nfp_net *nn)
goto out;
 
nn->link_up = link_up;
+   nn->link_changed = true;
 
if (nn->link_up) {
netif_carrier_on(nn->dp.netdev);
-- 
2.11.0
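
The read-and-clear helper added in the patch above follows a common pattern: the writer sets a "changed" flag whenever the state flips, and the reader fetches and resets that flag inside one critical section so no event is lost or counted twice. A minimal userspace sketch, assuming a POSIX mutex in place of the driver's spinlock; `struct link_state` and the `link_*()` functions are hypothetical names for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

struct link_state {
	pthread_mutex_t lock; /* stands in for link_status_lock */
	bool link_up;
	bool link_changed;    /* seen a transition since the last refresh? */
};

static void link_state_init(struct link_state *ls)
{
	pthread_mutex_init(&ls->lock, NULL);
	ls->link_up = false;
	ls->link_changed = false;
}

/* Writer side: record a transition and mark cached settings stale. */
static void link_update(struct link_state *ls, bool up)
{
	pthread_mutex_lock(&ls->lock);
	if (ls->link_up != up) {
		ls->link_up = up;
		ls->link_changed = true;
	}
	pthread_mutex_unlock(&ls->lock);
}

/* Reader side: return the flag and reset it in one critical section. */
static bool link_changed_read_clear(struct link_state *ls)
{
	bool ret;

	pthread_mutex_lock(&ls->lock);
	ret = ls->link_changed;
	ls->link_changed = false;
	pthread_mutex_unlock(&ls->lock);

	return ret;
}
```

A caller that caches port settings would refresh them only when `link_changed_read_clear()` returns true, which is how the flag is meant to be consumed elsewhere in this series.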

[PATCH net-next 11/14] nfp: turn NSP port entry into a union

2017-04-04 Thread Jakub Kicinski

Make NSP port structure a union to simplify accessing the fields
from generic macros.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 38 ++
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index 55d8e073ccbd..ca5c041e64a4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -49,7 +49,7 @@
 #define NSP_ETH_NBI_PORT_COUNT 24
 #define NSP_ETH_MAX_COUNT  (2 * NSP_ETH_NBI_PORT_COUNT)
 #define NSP_ETH_TABLE_SIZE (NSP_ETH_MAX_COUNT *\
-sizeof(struct eth_table_entry))
+sizeof(union eth_table_entry))
 
 #define NSP_ETH_PORT_LANES GENMASK_ULL(3, 0)
 #define NSP_ETH_PORT_INDEX GENMASK_ULL(15, 8)
@@ -71,6 +71,15 @@
 #define NSP_ETH_CTRL_TX_ENABLED BIT_ULL(2)
 #define NSP_ETH_CTRL_RX_ENABLED BIT_ULL(3)
 
+enum nfp_eth_raw {
+   NSP_ETH_RAW_PORT = 0,
+   NSP_ETH_RAW_STATE,
+   NSP_ETH_RAW_MAC,
+   NSP_ETH_RAW_CONTROL,
+
+   NSP_ETH_NUM_RAW
+};
+
 enum nfp_eth_rate {
RATE_INVALID = 0,
RATE_10M,
@@ -80,12 +89,15 @@ enum nfp_eth_rate {
RATE_25G,
 };
 
-struct eth_table_entry {
-   __le64 port;
-   __le64 state;
-   u8 mac_addr[6];
-   u8 resv[2];
-   __le64 control;
+union eth_table_entry {
+   struct {
+   __le64 port;
+   __le64 state;
+   u8 mac_addr[6];
+   u8 resv[2];
+   __le64 control;
+   };
+   __le64 raw[NSP_ETH_NUM_RAW];
 };
 
 static unsigned int nfp_eth_rate(enum nfp_eth_rate rate)
@@ -114,7 +126,7 @@ static void nfp_eth_copy_mac_reverse(u8 *dst, const u8 *src)
 }
 
 static void
-nfp_eth_port_translate(struct nfp_nsp *nsp, const struct eth_table_entry *src,
+nfp_eth_port_translate(struct nfp_nsp *nsp, const union eth_table_entry *src,
   unsigned int index, struct nfp_eth_table_port *dst)
 {
unsigned int rate;
@@ -216,7 +228,7 @@ struct nfp_eth_table *nfp_eth_read_ports(struct nfp_cpp *cpp)
 struct nfp_eth_table *
 __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp)
 {
-   struct eth_table_entry *entries;
+   union eth_table_entry *entries;
struct nfp_eth_table *table;
int i, j, ret, cnt = 0;
 
@@ -270,7 +282,7 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp)
 
 struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx)
 {
-   struct eth_table_entry *entries;
+   union eth_table_entry *entries;
struct nfp_nsp *nsp;
int ret;
 
@@ -307,7 +319,7 @@ struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx)
 
 void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp)
 {
-   struct eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+   union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
 
nfp_nsp_config_set_modified(nsp, false);
nfp_nsp_config_clear_state(nsp);
@@ -331,7 +343,7 @@ void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp)
  */
 int nfp_eth_config_commit_end(struct nfp_nsp *nsp)
 {
-   struct eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+   union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
int ret = 1;
 
if (nfp_nsp_config_modified(nsp)) {
@@ -357,7 +369,7 @@ int nfp_eth_config_commit_end(struct nfp_nsp *nsp)
  */
 int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable)
 {
-   struct eth_table_entry *entries;
+   union eth_table_entry *entries;
struct nfp_nsp *nsp;
u64 reg;
 
-- 
2.11.0
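
For readers following the series, here is a minimal userspace sketch of what the union buys: any 64-bit word of an entry can be reached by index, so one generic helper can update the port, state or control word. It assumes uint64_t in place of __le64 (no endianness handling), and eth_entry_set_bits() is a hypothetical helper, not a function from the driver.

```c
#include <stdint.h>

/* Indices into the raw view; mirrors enum nfp_eth_raw from the patch. */
enum eth_raw_idx {
	ETH_RAW_PORT = 0,
	ETH_RAW_STATE,
	ETH_RAW_MAC,
	ETH_RAW_CONTROL,

	ETH_NUM_RAW
};

/* Userspace stand-in for union eth_table_entry (uint64_t replaces __le64). */
union eth_entry {
	struct {
		uint64_t port;
		uint64_t state;
		uint8_t mac_addr[6];
		uint8_t resv[2];
		uint64_t control;
	};
	uint64_t raw[ETH_NUM_RAW];
};

/* Both views must cover exactly the same bytes. */
_Static_assert(sizeof(union eth_entry) == ETH_NUM_RAW * sizeof(uint64_t),
	       "named and raw views differ in size");

/* Hypothetical helper: update masked bits in the word selected by idx. */
static void eth_entry_set_bits(union eth_entry *e, enum eth_raw_idx idx,
			       uint64_t mask, uint64_t val)
{
	e->raw[idx] = (e->raw[idx] & ~mask) | (val & mask);
}
```

With the plain struct, the same helper would need a switch over field names; with the union the caller only passes an index such as ETH_RAW_STATE.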

[PATCH net-next 12/14] nfp: add extended error messages

2017-04-04 Thread Jakub Kicinski

Allow NSP to set the option code even when an error is reported.  This
gives NSP a way to tell the user more precisely why a command
failed.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   | 37 +-
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 225d07815375..96bb5f6bd87b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -97,6 +97,13 @@ enum nfp_nsp_cmd {
__MAX_SPCODE,
 };
 
+static const struct {
+   int code;
+   const char *msg;
+} nsp_errors[] = {
+   { 0, "success" } /* placeholder to avoid warnings */
+};
+
 struct nfp_nsp {
struct nfp_cpp *cpp;
struct nfp_resource *res;
@@ -149,6 +156,18 @@ void nfp_nsp_config_clear_state(struct nfp_nsp *state)
state->idx = 0;
 }
 
+static void nfp_nsp_print_extended_error(struct nfp_nsp *state, u32 ret_val)
+{
+   int i;
+
+   if (!ret_val)
+   return;
+
+   for (i = 0; i < ARRAY_SIZE(nsp_errors); i++)
+   if (ret_val == nsp_errors[i].code)
+   nfp_err(state->cpp, "err msg: %s\n", nsp_errors[i].msg);
+}
+
 static int nfp_nsp_check(struct nfp_nsp *state)
 {
struct nfp_cpp *cpp = state->cpp;
@@ -282,7 +301,7 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg,
 static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option,
   u32 buff_cpp, u64 buff_addr)
 {
-   u64 reg, nsp_base, nsp_buffer, nsp_status, nsp_command;
+   u64 reg, ret_val, nsp_base, nsp_buffer, nsp_status, nsp_command;
struct nfp_cpp *cpp = state->cpp;
u32 nsp_cpp;
int err;
@@ -335,18 +354,20 @@ static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option,
return err;
}
 
+   err = nfp_cpp_readq(cpp, nsp_cpp, nsp_command, &ret_val);
+   if (err < 0)
+   return err;
+   ret_val = FIELD_GET(NSP_COMMAND_OPTION, ret_val);
+
err = FIELD_GET(NSP_STATUS_RESULT, reg);
if (err) {
-   nfp_warn(cpp, "Result (error) code set: %d command: %d\n",
--err, code);
+   nfp_warn(cpp, "Result (error) code set: %d (%d) command: %d\n",
+-err, (int)ret_val, code);
+   nfp_nsp_print_extended_error(state, ret_val);
return -err;
}
 
-   err = nfp_cpp_readq(cpp, nsp_cpp, nsp_command, &reg);
-   if (err < 0)
-   return err;
-
-   return FIELD_GET(NSP_COMMAND_OPTION, reg);
+   return ret_val;
 }
 
 static int nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 code, u32 option,
-- 
2.11.0
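
A minimal userspace sketch of the lookup this patch introduces, for illustration only. The codes and messages are the ones the next patch in the series (13/14) puts into nsp_errors[]. Returning the string instead of printing it is a simplification made here, and nsp_extended_error() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Extended error codes and messages, as listed in patch 13/14. */
static const struct {
	uint32_t code;
	const char *msg;
} nsp_errors[] = {
	{ 6010, "could not map to phy for port" },
	{ 6011, "not an allowed rate/lanes for port" },
	{ 6012, "not an allowed rate/lanes for port" },
	{ 6013, "high/low error, change other port first" },
	{ 6014, "config not found in flash" },
};

/* Return the message for an extended code, or NULL if there is none. */
static const char *nsp_extended_error(uint32_t ret_val)
{
	size_t i;

	if (!ret_val)
		return NULL;

	for (i = 0; i < ARRAY_SIZE(nsp_errors); i++)
		if (ret_val == nsp_errors[i].code)
			return nsp_errors[i].msg;

	return NULL;
}
```

Unknown codes simply produce no extra message, which matches the patch: the numeric value is still printed by the generic warning.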

[PATCH net-next 13/14] nfp: NSP backend for link configuration operations

2017-04-04 Thread Jakub Kicinski

Add NSP backend for upcoming link configuration operations.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |   6 +-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |   7 +
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 180 +++--
 3 files changed, 179 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 96bb5f6bd87b..4635f42e15b0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -101,7 +101,11 @@ static const struct {
int code;
const char *msg;
 } nsp_errors[] = {
-   { 0, "success" } /* placeholder to avoid warnings */
+   { 6010, "could not map to phy for port" },
+   { 6011, "not an allowed rate/lanes for port" },
+   { 6012, "not an allowed rate/lanes for port" },
+   { 6013, "high/low error, change other port first" },
+   { 6014, "config not found in flash" },
 };
 
 struct nfp_nsp {
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index c452ad311993..7d34ff145fd7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -134,10 +134,17 @@ struct nfp_eth_table {
 struct nfp_eth_table *nfp_eth_read_ports(struct nfp_cpp *cpp);
 struct nfp_eth_table *
 __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp);
+
 int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable);
+int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx,
+  bool configed);
 
 struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx);
 int nfp_eth_config_commit_end(struct nfp_nsp *nsp);
 void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp);
 
+int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode);
+int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed);
+int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index ca5c041e64a4..639438d8313a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -58,6 +58,7 @@
 
 #define NSP_ETH_PORT_LANES_MASK cpu_to_le64(NSP_ETH_PORT_LANES)
 
+#define NSP_ETH_STATE_CONFIGURED   BIT_ULL(0)
 #define NSP_ETH_STATE_ENABLED  BIT_ULL(1)
 #define NSP_ETH_STATE_TX_ENABLED   BIT_ULL(2)
 #define NSP_ETH_STATE_RX_ENABLED   BIT_ULL(3)
@@ -67,9 +68,13 @@
 #define NSP_ETH_STATE_OVRD_CHNG BIT_ULL(22)
 #define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23)
 
+#define NSP_ETH_CTRL_CONFIGURED BIT_ULL(0)
 #define NSP_ETH_CTRL_ENABLED   BIT_ULL(1)
 #define NSP_ETH_CTRL_TX_ENABLED BIT_ULL(2)
 #define NSP_ETH_CTRL_RX_ENABLED BIT_ULL(3)
+#define NSP_ETH_CTRL_SET_RATE  BIT_ULL(4)
+#define NSP_ETH_CTRL_SET_LANES BIT_ULL(5)
+#define NSP_ETH_CTRL_SET_ANEG  BIT_ULL(6)
 
 enum nfp_eth_raw {
NSP_ETH_RAW_PORT = 0,
@@ -100,21 +105,38 @@ union eth_table_entry {
__le64 raw[NSP_ETH_NUM_RAW];
 };
 
-static unsigned int nfp_eth_rate(enum nfp_eth_rate rate)
+static const struct {
+   enum nfp_eth_rate rate;
+   unsigned int speed;
+} nsp_eth_rate_tbl[] = {
+   { RATE_INVALID, 0, },
+   { RATE_10M, SPEED_10, },
+   { RATE_100M,SPEED_100, },
+   { RATE_1G,  SPEED_1000, },
+   { RATE_10G, SPEED_10000, },
+   { RATE_25G, SPEED_25000, },
+};
+
+static unsigned int nfp_eth_rate2speed(enum nfp_eth_rate rate)
 {
-   unsigned int rate_xlate[] = {
-   [RATE_INVALID]  = 0,
-   [RATE_10M]  = SPEED_10,
-   [RATE_100M] = SPEED_100,
-   [RATE_1G]   = SPEED_1000,
-   [RATE_10G]  = SPEED_10000,
-   [RATE_25G]  = SPEED_25000,
-   };
+   int i;
 
-   if (rate >= ARRAY_SIZE(rate_xlate))
-   return 0;
+   for (i = 0; i < ARRAY_SIZE(nsp_eth_rate_tbl); i++)
+   if (nsp_eth_rate_tbl[i].rate == rate)
+   return nsp_eth_rate_tbl[i].speed;
+
+   return 0;
+}
+
+static unsigned int nfp_eth_speed2rate(unsigned int speed)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(nsp_eth_rate_tbl); i++)
+   if (nsp_eth_rate_tbl[i].speed == speed)
+   return nsp_eth_rate_tbl[i].rate;
 
-   return rate_xlate[rate];
+   return RATE_INVALID;
 }
 
 static void nfp_eth_copy_mac_reverse(u8 *dst, const u8 *src)
@@ -145,7 +167,7 @@ nfp_eth_port_translate(str
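
The switch from an indexed array to a table of pairs is what makes the reverse lookup possible. A minimal userspace sketch of both directions follows. The speeds are written as plain Mbps numbers rather than ethtool's SPEED_* defines so that it builds standalone, and the names are shortened stand-ins for the driver's.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum eth_rate {
	RATE_INVALID = 0,
	RATE_10M,
	RATE_100M,
	RATE_1G,
	RATE_10G,
	RATE_25G,
};

/* One table serves both directions; speeds are in Mbps. */
static const struct {
	enum eth_rate rate;
	unsigned int speed;
} rate_tbl[] = {
	{ RATE_INVALID,	0 },
	{ RATE_10M,	10 },
	{ RATE_100M,	100 },
	{ RATE_1G,	1000 },
	{ RATE_10G,	10000 },
	{ RATE_25G,	25000 },
};

/* Firmware rate code -> speed in Mbps, 0 if the code is unknown. */
static unsigned int rate2speed(enum eth_rate rate)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(rate_tbl); i++)
		if (rate_tbl[i].rate == rate)
			return rate_tbl[i].speed;

	return 0;
}

/* Speed in Mbps -> firmware rate code, RATE_INVALID if unsupported. */
static enum eth_rate speed2rate(unsigned int speed)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(rate_tbl); i++)
		if (rate_tbl[i].speed == speed)
			return rate_tbl[i].rate;

	return RATE_INVALID;
}
```

A linear scan is fine at this size, and an unsupported speed such as 40000 falls out naturally as RATE_INVALID.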

[PATCH net-next 08/14] nfp: report port type in ethtool

2017-04-04 Thread Jakub Kicinski

Service process firmware provides us with information about the media
and interface (SFP module) plugged in.  Translate that to Linux's
PORT_* defines and report it via ethtool.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  1 +
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 21 +++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h   | 24 ++
 3 files changed, 46 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 563ced3c99e1..3b2a09821a59 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -215,6 +215,7 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
nfp_net_refresh_port_config(nn);
/* Separate if - on FW error the port could've disappeared from table */
if (nn->eth_port) {
+   cmd->base.port = nn->eth_port->port_type;
cmd->base.speed = nn->eth_port->speed;
cmd->base.duplex = DUPLEX_FULL;
return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index dcb1bc81e554..07b4ded01514 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -62,6 +62,8 @@
 #define NSP_ETH_STATE_TX_ENABLED   BIT_ULL(2)
 #define NSP_ETH_STATE_RX_ENABLED   BIT_ULL(3)
 #define NSP_ETH_STATE_RATE GENMASK_ULL(11, 8)
+#define NSP_ETH_STATE_INTERFACE GENMASK_ULL(19, 12)
+#define NSP_ETH_STATE_MEDIA GENMASK_ULL(21, 20)
 #define NSP_ETH_STATE_OVRD_CHNG BIT_ULL(22)
 #define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23)
 
@@ -134,6 +136,9 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const struct eth_table_entry *src,
rate = nfp_eth_rate(FIELD_GET(NSP_ETH_STATE_RATE, state));
dst->speed = dst->lanes * rate;
 
+   dst->interface = FIELD_GET(NSP_ETH_STATE_INTERFACE, state);
+   dst->media = FIELD_GET(NSP_ETH_STATE_MEDIA, state);
+
nfp_eth_copy_mac_reverse(dst->mac_addr, src->mac_addr);
 
dst->label_port = FIELD_GET(NSP_ETH_PORT_PHYLABEL, port);
@@ -170,6 +175,20 @@ nfp_eth_mark_split_ports(struct nfp_cpp *cpp, struct nfp_eth_table *table)
}
 }
 
+static void
+nfp_eth_calc_port_type(struct nfp_cpp *cpp, struct nfp_eth_table_port *entry)
+{
+   if (entry->interface == NFP_INTERFACE_NONE) {
+   entry->port_type = PORT_NONE;
+   return;
+   }
+
+   if (entry->media == NFP_MEDIA_FIBRE)
+   entry->port_type = PORT_FIBRE;
+   else
+   entry->port_type = PORT_DA;
+}
+
 /**
  * nfp_eth_read_ports() - retrieve port information
  * @cpp:   NFP CPP handle
@@ -237,6 +256,8 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp)
   &table->ports[j++]);
 
nfp_eth_mark_split_ports(cpp, table);
+   for (i = 0; i < table->count; i++)
+   nfp_eth_calc_port_type(cpp, &table->ports[i]);
 
kfree(entries);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
index 6b3e954e70b3..57eb3cfa6a0a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
@@ -37,6 +37,22 @@
 #include 
 #include 
 
+enum nfp_eth_interface {
+   NFP_INTERFACE_NONE  = 0,
+   NFP_INTERFACE_SFP   = 1,
+   NFP_INTERFACE_SFPP  = 10,
+   NFP_INTERFACE_SFP28 = 28,
+   NFP_INTERFACE_QSFP  = 40,
+   NFP_INTERFACE_CXP   = 100,
+   NFP_INTERFACE_QSFP28= 112,
+};
+
+enum nfp_eth_media {
+   NFP_MEDIA_DAC_PASSIVE = 0,
+   NFP_MEDIA_DAC_ACTIVE,
+   NFP_MEDIA_FIBRE,
+};
+
 enum nfp_eth_aneg {
NFP_ANEG_AUTO = 0,
NFP_ANEG_SEARCH,
@@ -56,6 +72,8 @@ enum nfp_eth_aneg {
  * @base:  first channel index (within NBI)
  * @lanes: number of channels
  * @speed: interface speed (in Mbps)
+ * @interface: interface (module) plugged in
+ * @media: media type of the @interface
  * @aneg:  auto negotiation mode
  * @mac_addr:  interface MAC address
  * @label_port:port id
@@ -65,6 +83,7 @@ enum nfp_eth_aneg {
  * @rx_enabled:is RX enabled?
  * @override_changed: is media reconfig pending?
  *
+ * @port_type: one of %PORT_* defines for ethtool
  * @is_split:  is interface part of a split port
  */
 struct nfp_eth_table {
@@ -77,6 +96,9 @@ struct nfp_eth_table {
unsigned int lanes;
unsigned int speed;
 
+   unsigned int interface;
+   enum nfp_eth_media media;
+
enum nfp_eth_aneg aneg;
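
The decision made by nfp_eth_calc_port_type() can be sketched in userspace as below. The interface and media values are copied from the enums in this patch. The enum port_type is a local stand-in invented for this sketch; the driver stores ethtool's PORT_NONE, PORT_FIBRE and PORT_DA instead.

```c
/* Module types as reported by the service process (values from the patch). */
enum eth_interface {
	IF_NONE		= 0,
	IF_SFP		= 1,
	IF_SFPP		= 10,
	IF_SFP28	= 28,
	IF_QSFP		= 40,
	IF_CXP		= 100,
	IF_QSFP28	= 112,
};

enum eth_media {
	MEDIA_DAC_PASSIVE = 0,
	MEDIA_DAC_ACTIVE,
	MEDIA_FIBRE,
};

/* Local stand-in for ethtool's PORT_* defines (assumption of this sketch). */
enum port_type {
	PT_NONE,
	PT_FIBRE,
	PT_DA,
};

/* No module plugged in wins over media; any non-fibre media is direct attach. */
static enum port_type calc_port_type(enum eth_interface interface,
				     enum eth_media media)
{
	if (interface == IF_NONE)
		return PT_NONE;

	return media == MEDIA_FIBRE ? PT_FIBRE : PT_DA;
}
```

Note that the interface check comes first, so a stale media value on an empty cage is never reported.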

[PATCH net-next 00/14] nfp: ethtool link settings

2017-04-04 Thread Jakub Kicinski

Hi!

This series adds support for getting and setting link settings
via the (moderately) new ethtool ksettings ops.

First patch introduces minimal speed and duplex reporting using
the information directly provided in PCI BAR0 memory.

Next few changes deal with the need to refresh port state read
from the service process and patch 6 finally uses that information
to provide link speed and duplex.  Patches 7 and 8 add auto 
negotiation and port type reporting.

Remaining changes provide the set support for speed and auto
negotiation.  An upcoming series will also add port splitting
support via devlink.

Quite a bit of churn in this series is caused by the fact that
port speed and split changes currently require a reboot to take
effect in most cases.  The current service process code is not
capable of performing MAC reinitialization after the chip has been
passing traffic.  To make sure the user is aware of this limitation
we refuse the configuration unless the netdev is down, print a
warning to the logs, and, if the configuration was performed but
did not take effect, unregister the netdev.  The service process has
a "reboot needed" sticky bit, so reloading the driver will not bring
the netdev back.

Note that there is a helper in patch 13 which is marked as
__always_inline, because the FIELD_* macros require the parameters
to be known at compilation time.  I hope that is OK.
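
To illustrate the point about constant masks, here is a simplified userspace sketch of field get/prep macros. These MY_* macros are stand-ins written for this example and are not the kernel's FIELD_GET()/FIELD_PREP(). They derive the shift from the mask itself, which is the reason a helper using such macros wants its mask argument to be a compile-time constant.

```c
#include <stdint.h>

/* Contiguous mask covering bits h..l of a 64-bit word. */
#define MY_GENMASK_ULL(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Lowest set bit of the mask, used in place of an explicit shift. */
#define MY_FIELD_LSB(mask)	((mask) & ~((mask) - 1))

/* Extract or insert a field; the compiler folds the arithmetic
 * to shifts when the mask is a constant.
 */
#define MY_FIELD_GET(mask, reg)	(((reg) & (mask)) / MY_FIELD_LSB(mask))
#define MY_FIELD_PREP(mask, val) \
	(((uint64_t)(val) * MY_FIELD_LSB(mask)) & (mask))

/* Field layouts taken from the NSP_ETH_STATE_* defines in the series. */
#define STATE_RATE	MY_GENMASK_ULL(11, 8)
#define STATE_ANEG	MY_GENMASK_ULL(25, 23)

/* Only possible because the mask is known at compile time. */
_Static_assert(MY_FIELD_LSB(STATE_ANEG) == (1ULL << 23),
	       "ANEG field must start at bit 23");

/* Replace the rate field of a state word, leaving other bits alone. */
static inline uint64_t state_set_rate(uint64_t reg, unsigned int rate)
{
	reg &= ~STATE_RATE;
	reg |= MY_FIELD_PREP(STATE_RATE, rate);
	return reg;
}
```

If the mask arrived as a run-time function parameter the static check above could not be evaluated, which is why the helper in patch 13 is forced inline.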


Jakub Kicinski (14):
  nfp: add support for .get_link_ksettings()
  nfp: don't spawn netdevs for reconfigured ports
  nfp: add mutex protection for the port list
  nfp: track link state changes
  nfp: add port state refresh
  nfp: report link speed from NSP
  nfp: report auto-negotiation in ethtool
  nfp: report port type in ethtool
  nfp: separate high level and low level NSP headers
  nfp: allow multi-stage NSP configuration
  nfp: turn NSP port entry into a union
  nfp: add extended error messages
  nfp: NSP backend for link configuration operations
  nfp: add support for .set_link_ksettings()

 drivers/net/ethernet/netronome/nfp/nfp_main.c  |   5 +-
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |  11 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   7 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  16 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |  13 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 111 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 187 ---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |  20 +-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  85 -
 .../nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h}   |  68 +++-
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 359 +
 11 files changed, 754 insertions(+), 128 deletions(-)
 rename drivers/net/ethernet/netronome/nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h} (62%)

-- 
2.11.0

[PATCH net-next 05/14] nfp: add port state refresh

2017-04-04 Thread Jakub Kicinski

We will need a way of refreshing port state for link settings
get/set.  For get we need to refresh port speed and type.

When settings are changed the reconfiguration may require a
reboot before it takes effect.  Unregister the netdevs affected
by the reconfiguration from a workqueue.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.h |  3 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |  1 +
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 89 +--
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index b7ceec9a5783..b57de047b002 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct dentry;
 struct pci_dev;
@@ -69,6 +70,7 @@ struct nfp_eth_table;
  * @num_netdevs:   Number of netdevs spawned
  * @ports: Linked list of port structures (struct nfp_net)
  * @port_lock: Protects @ports, @num_ports, @num_netdevs
+ * @port_refresh_work: Work entry for taking netdevs out
  */
 struct nfp_pf {
struct pci_dev *pdev;
@@ -94,6 +96,7 @@ struct nfp_pf {
unsigned int num_netdevs;
 
struct list_head ports;
+   struct work_struct port_refresh_work;
struct mutex port_lock;
 };
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 91e963b5104f..052db9208fbb 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -813,6 +813,7 @@ struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
 int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new);
 
 bool nfp_net_link_changed_read_clear(struct nfp_net *nn);
+void nfp_net_refresh_port_config(struct nfp_net *nn);
 
 #ifdef CONFIG_NFP_DEBUG
 void nfp_net_debugfs_create(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 4d602b1ddc90..8e975c36877c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "nfpcore/nfp.h"
 #include "nfpcore/nfp_cpp.h"
@@ -468,6 +469,82 @@ nfp_net_pf_spawn_netdevs(struct nfp_pf *pf,
return err;
 }
 
+static void nfp_net_pci_remove_finish(struct nfp_pf *pf)
+{
+   nfp_net_debugfs_dir_clean(&pf->ddir);
+
+   nfp_net_irqs_disable(pf->pdev);
+   kfree(pf->irq_entries);
+
+   nfp_cpp_area_release_free(pf->rx_area);
+   nfp_cpp_area_release_free(pf->tx_area);
+   nfp_cpp_area_release_free(pf->ctrl_area);
+}
+
+static void nfp_net_refresh_netdevs(struct work_struct *work)
+{
+   struct nfp_pf *pf = container_of(work, struct nfp_pf,
+port_refresh_work);
+   struct nfp_net *nn, *next;
+
+   mutex_lock(&pf->port_lock);
+
+   /* Check for nfp_net_pci_remove() racing against us */
+   if (list_empty(&pf->ports))
+   goto out;
+
+   list_for_each_entry_safe(nn, next, &pf->ports, port_list) {
+   if (!nn->eth_port) {
+   nfp_warn(pf->cpp, "Warning: port %d not present after reconfig\n",
+nn->eth_port->eth_index);
+   continue;
+   }
+   if (!nn->eth_port->override_changed)
+   continue;
+
+   nn_warn(nn, "Port config changed, unregistering. Reboot required before port will be operational again.\n");
+
+   nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
+   nfp_net_netdev_clean(nn->dp.netdev);
+
+   list_del(&nn->port_list);
+   pf->num_netdevs--;
+   nfp_net_netdev_free(nn);
+   }
+
+   if (list_empty(&pf->ports))
+   nfp_net_pci_remove_finish(pf);
+out:
+   mutex_unlock(&pf->port_lock);
+}
+
+void nfp_net_refresh_port_config(struct nfp_net *nn)
+{
+   struct nfp_pf *pf = pci_get_drvdata(nn->pdev);
+   struct nfp_eth_table *old_table;
+
+   ASSERT_RTNL();
+
+   old_table = pf->eth_tbl;
+
+   list_for_each_entry(nn, &pf->ports, port_list)
+   nfp_net_link_changed_read_clear(nn);
+
+   pf->eth_tbl = nfp_eth_read_ports(pf->cpp);
+   if (!pf->eth_tbl) {
+   pf->eth_tbl = old_table;
+   nfp_err(pf->cpp, "Error refreshing port config!\n");
+   return;
+   }
+
+   list_for_each_entry(nn, &pf->ports, port_list)
+   nn->eth_port = nfp_net_find_port(pf, nn->eth_port->eth_index);
+
+   kfree(old_table);
+
+   schedule_work(&pf->port_refresh_work);
+}
+
 /*
  * PCI device functions
  */
@@ -481,6 +558,7 @@ int nfp_net_pci_probe(struct nfp_pf *p
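
The pruning loop in nfp_net_refresh_netdevs() removes entries from the list it is walking, which is why it uses list_for_each_entry_safe(). Below is a userspace sketch of the same idea. It deliberately swaps in a different technique, a singly linked list walked through a pointer-to-pointer, because the kernel list macros are not available outside the kernel. struct port and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a port whose config may have changed. */
struct port {
	int index;
	bool override_changed;
	struct port *next;
};

/* Push a new port at the head of the list; returns NULL on allocation failure. */
static struct port *port_push(struct port **head, int index, bool changed)
{
	struct port *p = malloc(sizeof(*p));

	if (!p)
		return NULL;

	p->index = index;
	p->override_changed = changed;
	p->next = *head;
	*head = p;
	return p;
}

/* Unlink and free every port flagged as changed; returns how many went away. */
static unsigned int prune_changed_ports(struct port **head)
{
	struct port **link = head;
	unsigned int removed = 0;

	while (*link) {
		struct port *cur = *link;

		if (!cur->override_changed) {
			link = &cur->next;
			continue;
		}

		/* Unlink before freeing so the walk never touches freed memory. */
		*link = cur->next;
		free(cur);
		removed++;
	}

	return removed;
}
```

The driver additionally holds port_lock around the walk and finishes PCI teardown once the list is empty; neither is modelled here.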

[PATCH net-next 02/14] nfp: don't spawn netdevs for reconfigured ports

2017-04-04 Thread Jakub Kicinski

After port reconfiguration (port split, media type change)
firmware will continue to report old configuration until
reboot.  NSP will inform us that reconfiguration is pending.
To avoid user confusion refuse to spawn netdevs until the
new configuration is applied (reboot).

We need to split the netdev to eth_table port matching from
MAC search and move it earlier in the probe() flow.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |  5 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 79 +-
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 12 +++-
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h   |  3 +
 4 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index 39105d0435e9..bb15a5724bf7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -64,7 +64,8 @@ struct nfp_eth_table;
  * @fw_loaded: Is the firmware loaded?
  * @eth_tbl:   NSP ETH table
  * @ddir:  Per-device debugfs directory
- * @num_ports: Number of adapter ports
+ * @num_ports: Number of adapter ports app firmware supports
+ * @num_netdevs:   Number of netdevs spawned
  * @ports: Linked list of port structures (struct nfp_net)
  */
 struct nfp_pf {
@@ -88,6 +89,8 @@ struct nfp_pf {
struct dentry *ddir;
 
unsigned int num_ports;
+   unsigned int num_netdevs;
+
struct list_head ports;
 };
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 2025cb7c6d90..1644954f52cd 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -129,14 +129,29 @@ static u8 __iomem *nfp_net_map_area(struct nfp_cpp *cpp,
return (u8 __iomem *)ERR_PTR(err);
 }
 
+/**
+ * nfp_net_get_mac_addr() - Get the MAC address.
+ * @nn:   NFP Network structure
+ * @cpp:  NFP CPP handle
+ * @id:  NFP port id
+ *
+ * First try to get the MAC address from NSP ETH table. If that
+ * fails try HWInfo.  As a last resort generate a random address.
+ */
 static void
-nfp_net_get_mac_addr_hwinfo(struct nfp_net_dp *dp, struct nfp_cpp *cpp,
-   unsigned int id)
+nfp_net_get_mac_addr(struct nfp_net *nn, struct nfp_cpp *cpp, unsigned int id)
 {
+   struct nfp_net_dp *dp = &nn->dp;
u8 mac_addr[ETH_ALEN];
const char *mac_str;
char name[32];
 
+   if (nn->eth_port) {
+   ether_addr_copy(dp->netdev->dev_addr, nn->eth_port->mac_addr);
+   ether_addr_copy(dp->netdev->perm_addr, nn->eth_port->mac_addr);
+   return;
+   }
+
snprintf(name, sizeof(name), "eth%d.mac", id);
 
mac_str = nfp_hwinfo_lookup(cpp, name);
@@ -159,32 +174,16 @@ nfp_net_get_mac_addr_hwinfo(struct nfp_net_dp *dp, struct nfp_cpp *cpp,
ether_addr_copy(dp->netdev->perm_addr, mac_addr);
 }
 
-/**
- * nfp_net_get_mac_addr() - Get the MAC address.
- * @nn:   NFP Network structure
- * @pf:  NFP PF device structure
- * @id:  NFP port id
- *
- * First try to get the MAC address from NSP ETH table. If that
- * fails try HWInfo.  As a last resort generate a random address.
- */
-static void
-nfp_net_get_mac_addr(struct nfp_net *nn, struct nfp_pf *pf, unsigned int id)
+static struct nfp_eth_table_port *
+nfp_net_find_port(struct nfp_pf *pf, unsigned int id)
 {
int i;
 
for (i = 0; pf->eth_tbl && i < pf->eth_tbl->count; i++)
-   if (pf->eth_tbl->ports[i].eth_index == id) {
-   const u8 *mac_addr = pf->eth_tbl->ports[i].mac_addr;
-
-   nn->eth_port = &pf->eth_tbl->ports[i];
+   if (pf->eth_tbl->ports[i].eth_index == id)
+   return &pf->eth_tbl->ports[i];
 
-   ether_addr_copy(nn->dp.netdev->dev_addr, mac_addr);
-   ether_addr_copy(nn->dp.netdev->perm_addr, mac_addr);
-   return;
-   }
-
-   nfp_net_get_mac_addr_hwinfo(&nn->dp, pf->cpp, id);
+   return NULL;
 }
 
 static unsigned int nfp_net_pf_get_num_ports(struct nfp_pf *pf)
@@ -283,6 +282,7 @@ static void nfp_net_pf_free_netdevs(struct nfp_pf *pf)
while (!list_empty(&pf->ports)) {
nn = list_first_entry(&pf->ports, struct nfp_net, port_list);
list_del(&nn->port_list);
+   pf->num_netdevs--;
 
nfp_net_netdev_free(nn);
}
@@ -291,7 +291,8 @@ static void nfp_net_pf_free_netdevs(struct nfp_pf *pf)
 static struct nfp_net *
 nfp_net_pf_alloc_port_netdev(struct nfp_pf *pf, void __iomem *ctrl_bar,
 void __iomem *tx_bar, void __iomem *rx_bar,
-
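
The split described in the commit message (match the port first, fetch the MAC later) can be sketched in userspace as follows. The struct layouts are trimmed-down stand-ins, and mac_from_table() is a hypothetical helper that only covers the first step of the fallback chain; the HWInfo and random-address steps are left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Trimmed-down stand-ins for the ETH table structures. */
struct eth_port {
	unsigned int eth_index;
	uint8_t mac_addr[MAC_LEN];
};

struct eth_table {
	unsigned int count;
	struct eth_port ports[4];
};

/* Pure lookup: find the table entry for a port id, NULL if absent. */
static const struct eth_port *find_port(const struct eth_table *tbl,
					unsigned int id)
{
	unsigned int i;

	for (i = 0; tbl && i < tbl->count; i++)
		if (tbl->ports[i].eth_index == id)
			return &tbl->ports[i];

	return NULL;
}

/* Copy the MAC from a matched entry; returns 0 when the caller must
 * fall back to another source.
 */
static int mac_from_table(const struct eth_port *port, uint8_t *dst)
{
	if (!port)
		return 0;

	memcpy(dst, port->mac_addr, MAC_LEN);
	return 1;
}
```

Keeping the lookup free of side effects is what lets probe() run it early and decide whether to spawn a netdev at all.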

[PATCH net-next 09/14] nfp: separate high level and low level NSP headers

2017-04-04 Thread Jakub Kicinski

We will soon add more NSP commands and structure definitions.
Move all high-level NSP header contents to a common nfp_nsp.h file.
Right now it mostly boils down to renaming nfp_nsp_eth.h and
moving some functions from nfp.h there.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c   |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c  |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c |  2 +-
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h  | 12 ++--
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c  |  1 +
 .../nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h}  | 19 ++-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c  |  2 +-
 8 files changed, 22 insertions(+), 20 deletions(-)
 rename drivers/net/ethernet/netronome/nfp/nfpcore/{nfp_nsp_eth.h => nfp_nsp.h} (89%)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 96266796fd09..bea2a1a6c211 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -48,7 +48,7 @@
 #include "nfpcore/nfp.h"
 #include "nfpcore/nfp_cpp.h"
 #include "nfpcore/nfp_nffw.h"
-#include "nfpcore/nfp_nsp_eth.h"
+#include "nfpcore/nfp_nsp.h"
 
 #include "nfpcore/nfp6000_pcie.h"
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 8664815f45ce..e2197160e4dc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -67,7 +67,7 @@
 #include 
 #include 
 
-#include "nfpcore/nfp_nsp_eth.h"
+#include "nfpcore/nfp_nsp.h"
 #include "nfp_net_ctrl.h"
 #include "nfp_net.h"
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 3b2a09821a59..963d6dd97cec 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -49,7 +49,7 @@
 #include 
 
 #include "nfpcore/nfp.h"
-#include "nfpcore/nfp_nsp_eth.h"
+#include "nfpcore/nfp_nsp.h"
 #include "nfp_net_ctrl.h"
 #include "nfp_net.h"
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 8e975c36877c..3e1f97e88710 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -52,7 +52,7 @@
 #include "nfpcore/nfp.h"
 #include "nfpcore/nfp_cpp.h"
 #include "nfpcore/nfp_nffw.h"
-#include "nfpcore/nfp_nsp_eth.h"
+#include "nfpcore/nfp_nsp.h"
 #include "nfpcore/nfp6000_pcie.h"
 
 #include "nfp_net_ctrl.h"
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
index f7ca8e374923..778bd9424d5d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
@@ -48,18 +48,10 @@
 
 const char *nfp_hwinfo_lookup(struct nfp_cpp *cpp, const char *lookup);
 
-/* Implemented in nfp_nsp.c */
+/* Implemented in nfp_nsp.c, low level functions */
 
 struct nfp_nsp;
-struct firmware;
-
-struct nfp_nsp *nfp_nsp_open(struct nfp_cpp *cpp);
-void nfp_nsp_close(struct nfp_nsp *state);
-u16 nfp_nsp_get_abi_ver_major(struct nfp_nsp *state);
-u16 nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state);
-int nfp_nsp_wait(struct nfp_nsp *state);
-int nfp_nsp_device_soft_reset(struct nfp_nsp *state);
-int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw);
+
 int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size);
 int nfp_nsp_write_eth_table(struct nfp_nsp *state,
const void *buf, unsigned int size);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 17822ae4a17f..6482831282b2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -49,6 +49,7 @@
 
 #include "nfp.h"
 #include "nfp_cpp.h"
+#include "nfp_nsp.h"
 
 /* Offsets relative to the CSR base */
 #define NSP_STATUS 0x00
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
similarity index 89%
rename from drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
rename to drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index 57eb3cfa6a0a..e3baec32 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -31,12 +31,24 @@
  * SOFTWARE.
  */
 
-#ifndef NSP_NSP_ETH_H
-#define NSP_NSP_ETH_H 1
+#ifndef NSP_NSP_H
+#define NSP_NSP_H 1
 
 #include 
 #include 
 
+struct firmware;
+struct nfp_cpp;
+struct nfp_nsp;
+
+struct nfp_nsp *nfp_nsp_open(struct

[PATCH net-next 07/14] nfp: report auto-negotiation in ethtool

2017-04-04 Thread Jakub Kicinski

NSP ABI version 0.17 exposes the autonegotiation settings.
Report whether autoneg is on via ethtool.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c |  4 
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c |  2 ++
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h | 11 +++
 3 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 0fdc14e7b576..563ced3c99e1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -203,6 +203,10 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
cmd->base.speed = SPEED_UNKNOWN;
cmd->base.duplex = DUPLEX_UNKNOWN;
 
+   if (nn->eth_port)
+   cmd->base.autoneg = nn->eth_port->aneg != NFP_ANEG_DISABLED ?
+   AUTONEG_ENABLE : AUTONEG_DISABLE;
+
if (!netif_carrier_ok(netdev))
return 0;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index 932772fbd27e..dcb1bc81e554 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -63,6 +63,7 @@
 #define NSP_ETH_STATE_RX_ENABLED   BIT_ULL(3)
 #define NSP_ETH_STATE_RATE GENMASK_ULL(11, 8)
 #define NSP_ETH_STATE_OVRD_CHNG    BIT_ULL(22)
+#define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23)
 
 #define NSP_ETH_CTRL_ENABLED   BIT_ULL(1)
 #define NSP_ETH_CTRL_TX_ENABLED    BIT_ULL(2)
@@ -142,6 +143,7 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const struct 
eth_table_entry *src,
return;
 
dst->override_changed = FIELD_GET(NSP_ETH_STATE_OVRD_CHNG, state);
+   dst->aneg = FIELD_GET(NSP_ETH_STATE_ANEG, state);
 }
 
 static void
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
index 6838741fadd7..6b3e954e70b3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.h
@@ -37,6 +37,14 @@
 #include 
 #include 
 
+enum nfp_eth_aneg {
+   NFP_ANEG_AUTO = 0,
+   NFP_ANEG_SEARCH,
+   NFP_ANEG_25G_CONSORTIUM,
+   NFP_ANEG_25G_IEEE,
+   NFP_ANEG_DISABLED,
+};
+
 /**
  * struct nfp_eth_table - ETH table information
  * @count: number of table entries
@@ -48,6 +56,7 @@
  * @base:  first channel index (within NBI)
  * @lanes: number of channels
  * @speed: interface speed (in Mbps)
+ * @aneg:  auto negotiation mode
  * @mac_addr:  interface MAC address
  * @label_port: port id
  * @label_subport:  id of interface within port (for split ports)
@@ -68,6 +77,8 @@ struct nfp_eth_table {
unsigned int lanes;
unsigned int speed;
 
+   enum nfp_eth_aneg aneg;
+
u8 mac_addr[ETH_ALEN];
 
u8 label_port;
-- 
2.11.0
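
The field extraction in the patch above can be tried outside the kernel. What follows is a minimal user-space sketch, not the driver code: the DEMO_* macros and demo_* functions are hypothetical stand-ins for GENMASK_ULL(), FIELD_GET() and the nfp structures, and only the bit positions (25..23) and the enum order are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK_ULL(): bits h..l set. */
#define DEMO_GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Mirrors NSP_ETH_STATE_ANEG from the patch: bits 25..23 of the state word. */
#define DEMO_ETH_STATE_ANEG	DEMO_GENMASK_ULL(25, 23)

/* Same order as enum nfp_eth_aneg in the patch. */
enum demo_aneg {
	DEMO_ANEG_AUTO = 0,
	DEMO_ANEG_SEARCH,
	DEMO_ANEG_25G_CONSORTIUM,
	DEMO_ANEG_25G_IEEE,
	DEMO_ANEG_DISABLED,
};

/* Stand-in for FIELD_GET(): mask, then shift down by the mask's lowest bit. */
static uint64_t demo_field_get(uint64_t mask, uint64_t reg)
{
	uint64_t lowest_bit = mask & (~mask + 1);	/* isolate lowest set bit */

	assert(mask != 0);
	return (reg & mask) / lowest_bit;
}

static enum demo_aneg demo_decode_aneg(uint64_t state)
{
	return (enum demo_aneg)demo_field_get(DEMO_ETH_STATE_ANEG, state);
}

/* Returns 1 where the patch would report AUTONEG_ENABLE, 0 otherwise. */
static int demo_autoneg_enabled(uint64_t state)
{
	return demo_decode_aneg(state) != DEMO_ANEG_DISABLED;
}
```

Note that, as in the patch, every mode other than "disabled" (including the two 25G modes) is reported as autoneg enabled.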

[PATCH net] nfp: fix potential use after free on xdp prog

2017-04-04 Thread Jakub Kicinski

We should unregister the net_device first, before we give back
our reference on xdp_prog.  Otherwise xdp_prog may be freed
before .ndo_stop() disabled the datapath.  Found by code inspection.

Fixes: ecd63a0217d5 ("nfp: add XDP support in the driver")
Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
Just a heads up - this will cause a merge conflict since nn->netdev
member got moved to nn->dp.netdev in net-next.

 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9179a99563af..a41377e26c07 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3275,9 +3275,10 @@ void nfp_net_netdev_clean(struct net_device *netdev)
 {
struct nfp_net *nn = netdev_priv(netdev);
 
+   unregister_netdev(nn->netdev);
+
if (nn->xdp_prog)
bpf_prog_put(nn->xdp_prog);
if (nn->bpf_offload_xdp)
nfp_net_xdp_offload(nn, NULL);
-   unregister_netdev(nn->netdev);
 }
-- 
2.11.0

RE: [PATCH] i40e: limit client interface to X722 hardware

2017-04-04 Thread Williams, Mitch A



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
> Behalf Of Stefan Assmann
> Sent: Tuesday, April 04, 2017 12:52 PM
> To: Or Gerlitz 
> Cc: intel-wired-...@lists.osuosl.org; Linux Netdev List
> ; David Miller ; Kirsher,
> Jeffrey T 
> Subject: Re: [PATCH] i40e: limit client interface to X722 hardware
> 
> On 04.04.2017 18:56, Or Gerlitz wrote:
> > On Tue, Apr 4, 2017 at 5:34 PM, Stefan Assmann  wrote:
> >> The client interface is meant for X722 iWARP support. Modprobing i40iw
> >> on systems with X710/XL710 NICs currently may crash the system.
> >
> > > Just curious: may crash, or does crash? And why?
> 
> The backtrace I got was not really conclusive. The code is not meant to
> be run on that hardware so I didn't bother to dig deeper.
> 
>   Stefan

The i40iw module can't easily determine which hardware it's loaded on. So it 
assumes that we (i40e, that is) have handed it a handle to valid hardware. When 
the interface is opened, it starts reading and writing registers that are 
nonexistent on X710/XL710.

-Mitch

Re: net/sched: latent livelock in dev_deactivate_many() due to yield() usage

2017-04-04 Thread Cong Wang

On Sat, Apr 1, 2017 at 9:28 PM, Mike Galbraith  wrote:
> Greetings network wizards,
>
> Quoting kernel/sched/core.c:
> /**
>  * yield - yield the current processor to other threads.
>  *
>  * Do not ever use this function, there's a 99% chance you're doing it wrong.
>  *
>  * The scheduler is at all times free to pick the calling task as the most
>  * eligible task to run, if removing the yield() call from your code breaks
>  * it, its already broken.
>  *
>  * Typical broken usage is:
>  *
>  * while (!event)
>  *  yield();
>  *
>  * where one assumes that yield() will let 'the other' process run that will
>  * make event true. If the current task is a SCHED_FIFO task that will never
>  * happen. Never use yield() as a progress guarantee!!
>  *
>  * If you want to use yield() to wait for something, use wait_event().
>  * If you want to use yield() to be 'nice' for others, use cond_resched().
>  * If you still want to use yield(), do not!
>  */
>
> Livelock can be triggered by setting kworkers to SCHED_FIFO, then
> suspend/resume.. you come back from sleepy-land with a spinning
> kworker.  For whatever reason, I can only do that with an enterprise
> like config, my standard config refuses to play, but no matter, it's
> "Typical broken usage".
>
> (yield() should be rendered dead)


Thanks for the report! Looks like a quick solution here is to replace
this yield() with cond_resched(); it is harder to really wait for
all qdiscs to transmit all packets.

Re: [PATCH] ebpf: verify the output of the JIT

2017-04-04 Thread Kees Cook

On Tue, Apr 4, 2017 at 3:08 PM, Tycho Andersen  wrote:
> The goal of this patch is to protect the JIT against an attacker with a
> write-in-memory primitive. The JIT allocates a buffer which will eventually
> be marked +x, so we need to make sure that what was written to this buffer
> is what was intended.
>
> We achieve this by building a hash of the instruction buffer as
> instructions are emitted and then comparing that to a hash at the end of
> the JIT compile after the buffer has been marked read-only.
>
> Signed-off-by: Tycho Andersen 
> CC: Daniel Borkmann 
> CC: Alexei Starovoitov 
> CC: Kees Cook 
> CC: Mickaël Salaün 

Cool! This closes the race condition on producing the JIT vs going
read-only. I wonder if it might be possible to make this a more
generic interface to BPF which would allocate the hash, provide
the update callback during emit, and then do the hash check itself at
the end of bpf_jit_binary_lock_ro()?

-Kees

> ---
>  arch/x86/Kconfig|  11 
>  arch/x86/net/bpf_jit_comp.c | 147 
> 
>  2 files changed, 147 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cc98d5a..7b2db2c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2789,6 +2789,17 @@ config X86_DMA_REMAP
>
>  source "net/Kconfig"
>
> +config EBPF_JIT_HASH_OUTPUT
> +   def_bool y
> +   depends on HAVE_EBPF_JIT
> +   depends on BPF_JIT
> +   select CRYPTO_SHA256
> +   ---help---
> + Enables a double check of the JIT's output after it is marked 
> read-only to
> + ensure that it matches what the JIT generated.
> +
> + Note, only applies when /proc/sys/net/core/bpf_jit_harden > 0.
> +
>  source "drivers/Kconfig"
>
>  source "drivers/firmware/Kconfig"
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 32322ce..be1271e 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -13,9 +13,15 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  int bpf_jit_enable __read_mostly;
>
> +#ifdef CONFIG_EBPF_JIT_HASH_OUTPUT
> +struct crypto_shash *tfm __read_mostly;
> +#endif
> +
>  /*
>   * assembly code in arch/x86/net/bpf_jit.S
>   */
> @@ -25,7 +31,8 @@ extern u8 sk_load_byte_positive_offset[];
>  extern u8 sk_load_word_negative_offset[], sk_load_half_negative_offset[];
>  extern u8 sk_load_byte_negative_offset[];
>
> -static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
> +static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len,
> +struct shash_desc *hash)
>  {
> if (len == 1)
> *ptr = bytes;
> @@ -35,11 +42,15 @@ static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
> *(u32 *)ptr = bytes;
> barrier();
> }
> +
> +   if (IS_ENABLED(CONFIG_EBPF_JIT_HASH_OUTPUT) && hash)
> +   crypto_shash_update(hash, (u8 *) &bytes, len);
> +
> return ptr + len;
>  }
>
>  #define EMIT(bytes, len) \
> -   do { prog = emit_code(prog, bytes, len); cnt += len; } while (0)
> +   do { prog = emit_code(prog, bytes, len, hash); cnt += len; } while (0)
>
>  #define EMIT1(b1)  EMIT(b1, 1)
>  #define EMIT2(b1, b2)  EMIT((b1) + ((b2) << 8), 2)
> @@ -206,7 +217,7 @@ struct jit_context {
>  /* emit x64 prologue code for BPF program and check it's size.
>   * bpf_tail_call helper will skip it while jumping into another program
>   */
> -static void emit_prologue(u8 **pprog)
> +static void emit_prologue(u8 **pprog, struct shash_desc *hash)
>  {
> u8 *prog = *pprog;
> int cnt = 0;
> @@ -264,7 +275,7 @@ static void emit_prologue(u8 **pprog)
>   *   goto *(prog->bpf_func + prologue_size);
>   * out:
>   */
> -static void emit_bpf_tail_call(u8 **pprog)
> +static void emit_bpf_tail_call(u8 **pprog, struct shash_desc *hash)
>  {
> u8 *prog = *pprog;
> int label1, label2, label3;
> @@ -328,7 +339,7 @@ static void emit_bpf_tail_call(u8 **pprog)
>  }
>
>
> -static void emit_load_skb_data_hlen(u8 **pprog)
> +static void emit_load_skb_data_hlen(u8 **pprog, struct shash_desc *hash)
>  {
> u8 *prog = *pprog;
> int cnt = 0;
> @@ -348,7 +359,8 @@ static void emit_load_skb_data_hlen(u8 **pprog)
>  }
>
>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
> - int oldproglen, struct jit_context *ctx)
> + int oldproglen, struct jit_context *ctx,
> + struct shash_desc *hash)
>  {
> struct bpf_insn *insn = bpf_prog->insnsi;
> int insn_cnt = bpf_prog->len;
> @@ -360,10 +372,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int 
> *addrs, u8 *image,
> int proglen = 0;
> u8 *prog = temp;
>
> -   emit_prologue(&prog);
> +   emit_prologue(&prog, hash);
>
> if (seen_ld_abs)
> -   emit_load_skb_data_hlen(&prog);
> +   emit_load_skb_dat

[PATCH net-next 04/12] bnxt_en: Add ethtool get_wol method.

2017-04-04 Thread Michael Chan

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 16 
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 6903a87..2b94704 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1,6 +1,7 @@
 /* Broadcom NetXtreme-C/E network driver.
  *
  * Copyright (c) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2016-2017 Broadcom Limited
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -832,6 +833,20 @@ static void bnxt_get_drvinfo(struct net_device *dev,
kfree(pkglog);
 }
 
+static void bnxt_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   struct bnxt *bp = netdev_priv(dev);
+
+   wol->supported = 0;
+   wol->wolopts = 0;
+   memset(&wol->sopass, 0, sizeof(wol->sopass));
+   if (bp->flags & BNXT_FLAG_WOL_CAP) {
+   wol->supported = WAKE_MAGIC;
+   if (bp->wol)
+   wol->wolopts = WAKE_MAGIC;
+   }
+}
+
 u32 _bnxt_fw_to_ethtool_adv_spds(u16 fw_speeds, u8 fw_pause)
 {
u32 speed_mask = 0;
@@ -2134,6 +2149,7 @@ static int bnxt_set_phys_id(struct net_device *dev,
.get_pauseparam = bnxt_get_pauseparam,
.set_pauseparam = bnxt_set_pauseparam,
.get_drvinfo    = bnxt_get_drvinfo,
+   .get_wol        = bnxt_get_wol,
.get_coalesce   = bnxt_get_coalesce,
.set_coalesce   = bnxt_set_coalesce,
.get_msglevel   = bnxt_get_msglevel,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h
index ed1e555..2762171 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h
@@ -1,6 +1,7 @@
 /* Broadcom NetXtreme-C/E network driver.
  *
  * Copyright (c) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2016-2017 Broadcom Limited
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
-- 
1.8.3.1

[PATCH net-next 05/12] bnxt_en: Add ethtool set_wol method.

2017-04-04 Thread Michael Chan

And add functions to set and free magic packet filter.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 32 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  2 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 26 ++
 3 files changed, 60 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 10a9cda..e432d0a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5842,6 +5842,38 @@ static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp)
return 0;
 }
 
+int bnxt_hwrm_alloc_wol_fltr(struct bnxt *bp)
+{
+   struct hwrm_wol_filter_alloc_input req = {0};
+   struct hwrm_wol_filter_alloc_output *resp = bp->hwrm_cmd_resp_addr;
+   int rc;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_WOL_FILTER_ALLOC, -1, -1);
+   req.port_id = cpu_to_le16(bp->pf.port_id);
+   req.wol_type = WOL_FILTER_ALLOC_REQ_WOL_TYPE_MAGICPKT;
+   req.enables = cpu_to_le32(WOL_FILTER_ALLOC_REQ_ENABLES_MAC_ADDRESS);
+   memcpy(req.mac_address, bp->dev->dev_addr, ETH_ALEN);
+   mutex_lock(&bp->hwrm_cmd_lock);
+   rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+   if (!rc)
+   bp->wol_filter_id = resp->wol_filter_id;
+   mutex_unlock(&bp->hwrm_cmd_lock);
+   return rc;
+}
+
+int bnxt_hwrm_free_wol_fltr(struct bnxt *bp)
+{
+   struct hwrm_wol_filter_free_input req = {0};
+   int rc;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_WOL_FILTER_FREE, -1, -1);
+   req.port_id = cpu_to_le16(bp->pf.port_id);
+   req.enables = cpu_to_le32(WOL_FILTER_FREE_REQ_ENABLES_WOL_FILTER_ID);
+   req.wol_filter_id = bp->wol_filter_id;
+   rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+   return rc;
+}
+
 static u16 bnxt_hwrm_get_wol_fltrs(struct bnxt *bp, u16 handle)
 {
struct hwrm_wol_filter_qcfg_input req = {0};
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 02de812..aba25ba 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1242,6 +1242,8 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, 
unsigned long *bmap,
 void bnxt_tx_enable(struct bnxt *bp);
 int bnxt_hwrm_set_pause(struct bnxt *);
 int bnxt_hwrm_set_link_setting(struct bnxt *, bool, bool);
+int bnxt_hwrm_alloc_wol_fltr(struct bnxt *bp);
+int bnxt_hwrm_free_wol_fltr(struct bnxt *bp);
 int bnxt_hwrm_fw_set_time(struct bnxt *);
 int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_close_nic(struct bnxt *, bool, bool);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 2b94704..84cd4ca 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -847,6 +847,31 @@ static void bnxt_get_wol(struct net_device *dev, struct 
ethtool_wolinfo *wol)
}
 }
 
+static int bnxt_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   struct bnxt *bp = netdev_priv(dev);
+
+   if (wol->wolopts & ~WAKE_MAGIC)
+   return -EINVAL;
+
+   if (wol->wolopts & WAKE_MAGIC) {
+   if (!(bp->flags & BNXT_FLAG_WOL_CAP))
+   return -EINVAL;
+   if (!bp->wol) {
+   if (bnxt_hwrm_alloc_wol_fltr(bp))
+   return -EBUSY;
+   bp->wol = 1;
+   }
+   } else {
+   if (bp->wol) {
+   if (bnxt_hwrm_free_wol_fltr(bp))
+   return -EBUSY;
+   bp->wol = 0;
+   }
+   }
+   return 0;
+}
+
 u32 _bnxt_fw_to_ethtool_adv_spds(u16 fw_speeds, u8 fw_pause)
 {
u32 speed_mask = 0;
@@ -2150,6 +2175,7 @@ static int bnxt_set_phys_id(struct net_device *dev,
.set_pauseparam = bnxt_set_pauseparam,
.get_drvinfo    = bnxt_get_drvinfo,
.get_wol        = bnxt_get_wol,
+   .set_wol        = bnxt_set_wol,
.get_coalesce   = bnxt_get_coalesce,
.set_coalesce   = bnxt_set_coalesce,
.get_msglevel   = bnxt_get_msglevel,
-- 
1.8.3.1
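
The option checking in bnxt_set_wol() above can be modelled in user space. This is only a sketch under stated assumptions: struct demo_nic, demo_fw_filter_cmd() and DEMO_WAKE_MAGIC are hypothetical stand-ins for struct bnxt, the HWRM filter alloc/free calls and WAKE_MAGIC, and the fake firmware call simply succeeds or fails on demand.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for WAKE_MAGIC; the value here is only illustrative. */
#define DEMO_WAKE_MAGIC	(1u << 5)

/* Hypothetical model of the driver state touched by bnxt_set_wol(). */
struct demo_nic {
	int wol_capable;	/* models the BNXT_FLAG_WOL_CAP flag */
	int wol_enabled;	/* models bp->wol */
	int fw_fails;		/* when set, the fake firmware call fails */
};

/* Fake firmware request standing in for the filter alloc/free calls. */
static int demo_fw_filter_cmd(const struct demo_nic *nic)
{
	return nic->fw_fails ? -EIO : 0;
}

static int demo_set_wol(struct demo_nic *nic, uint32_t wolopts)
{
	/* Reject any wake option other than magic packet. */
	if (wolopts & ~DEMO_WAKE_MAGIC)
		return -EINVAL;

	if (wolopts & DEMO_WAKE_MAGIC) {
		if (!nic->wol_capable)
			return -EINVAL;
		/* Only allocate a filter when WoL is not already on. */
		if (!nic->wol_enabled) {
			if (demo_fw_filter_cmd(nic))
				return -EBUSY;
			nic->wol_enabled = 1;
		}
	} else if (nic->wol_enabled) {
		/* Only free the filter when WoL is currently on. */
		if (demo_fw_filter_cmd(nic))
			return -EBUSY;
		nic->wol_enabled = 0;
	}
	return 0;
}
```

As in the patch, the state flag changes only after the firmware request succeeds, so a failed request leaves the previous setting in place.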

[PATCH net-next 01/12] bnxt_en: Update firmware interface spec to 1.7.6.2.

2017-04-04 Thread Michael Chan

Features added include WoL and selftest.

Signed-off-by: Deepak Khungar 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h   | 325 +---
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c |   8 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |   1 +
 3 files changed, 297 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
index 6e275c2..7dc71bb 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
@@ -11,19 +11,21 @@
 #ifndef BNXT_HSI_H
 #define BNXT_HSI_H
 
-/* HSI and HWRM Specification 1.7.0 */
+/* HSI and HWRM Specification 1.7.6 */
 #define HWRM_VERSION_MAJOR 1
 #define HWRM_VERSION_MINOR 7
-#define HWRM_VERSION_UPDATE 0
+#define HWRM_VERSION_UPDATE 6
 
-#define HWRM_VERSION_STR   "1.7.0"
+#define HWRM_VERSION_RSVD  2 /* non-zero means beta version */
+
+#define HWRM_VERSION_STR   "1.7.6.2"
 /*
  * Following is the signature for HWRM message field that indicates not
  * applicable (All F's). Need to cast it the size of the field if needed.
  */
 #define HWRM_NA_SIGNATURE  ((__le32)(-1))
 #define HWRM_MAX_REQ_LEN (128)  /* hwrm_func_buf_rgtr */
-#define HWRM_MAX_RESP_LEN (176)  /* hwrm_func_qstats */
+#define HWRM_MAX_RESP_LEN (248)  /* hwrm_selftest_qlist */
 #define HW_HASH_INDEX_SIZE  0x80 /* 7 bit indirection table index. */
 #define HW_HASH_KEY_SIZE   40
 #define HWRM_RESP_VALID_KEY  1 /* valid key for HWRM response */
@@ -571,9 +573,10 @@ struct hwrm_ver_get_output {
__le16 max_req_win_len;
__le16 max_resp_len;
__le16 def_req_timeout;
+   u8 init_pending;
+   #define VER_GET_RESP_INIT_PENDING_DEV_NOT_RDY   0x1UL
u8 unused_0;
u8 unused_1;
-   u8 unused_2;
u8 valid;
 };
 
@@ -809,6 +812,8 @@ struct hwrm_func_qcfg_output {
#define FUNC_QCFG_RESP_FLAGS_OOB_WOL_BMP_ENABLED    0x2UL
#define FUNC_QCFG_RESP_FLAGS_FW_DCBX_AGENT_ENABLED  0x4UL
#define FUNC_QCFG_RESP_FLAGS_STD_TX_RING_MODE_ENABLED  0x8UL
+   #define FUNC_QCFG_RESP_FLAGS_FW_LLDP_AGENT_ENABLED  0x10UL
+   #define FUNC_QCFG_RESP_FLAGS_MULTI_HOST 0x20UL
u8 mac_address[6];
__le16 pci_id;
__le16 alloc_rsscos_ctx;
@@ -827,10 +832,12 @@ struct hwrm_func_qcfg_output {
#define FUNC_QCFG_RESP_PORT_PARTITION_TYPE_NPAR1_5 0x3UL
#define FUNC_QCFG_RESP_PORT_PARTITION_TYPE_NPAR2_0 0x4UL
#define FUNC_QCFG_RESP_PORT_PARTITION_TYPE_UNKNOWN 0xffUL
-   u8 unused_0;
+   u8 port_pf_cnt;
+   #define FUNC_QCFG_RESP_PORT_PF_CNT_UNAVAIL 0x0UL
__le16 dflt_vnic_id;
-   u8 unused_1;
-   u8 unused_2;
+   u8 host_cnt;
+   #define FUNC_QCFG_RESP_HOST_CNT_UNAVAIL 0x0UL
+   u8 unused_0;
__le32 min_bw;
#define FUNC_QCFG_RESP_MIN_BW_BW_VALUE_MASK 0xfffUL
#define FUNC_QCFG_RESP_MIN_BW_BW_VALUE_SFT  0
@@ -867,12 +874,12 @@ struct hwrm_func_qcfg_output {
#define FUNC_QCFG_RESP_EVB_MODE_NO_EVB 0x0UL
#define FUNC_QCFG_RESP_EVB_MODE_VEB 0x1UL
#define FUNC_QCFG_RESP_EVB_MODE_VEPA   0x2UL
-   u8 unused_3;
+   u8 unused_1;
__le16 alloc_vfs;
__le32 alloc_mcast_filters;
__le32 alloc_hw_ring_grps;
__le16 alloc_sp_tx_rings;
-   u8 unused_4;
+   u8 unused_2;
u8 valid;
 };
 
@@ -888,16 +895,13 @@ struct hwrm_func_cfg_input {
u8 unused_0;
u8 unused_1;
__le32 flags;
-   #define FUNC_CFG_REQ_FLAGS_PROM_MODE 0x1UL
-   #define FUNC_CFG_REQ_FLAGS_SRC_MAC_ADDR_CHECK   0x2UL
-   #define FUNC_CFG_REQ_FLAGS_SRC_IP_ADDR_CHECK 0x4UL
-   #define FUNC_CFG_REQ_FLAGS_VLAN_PRI_MATCH   0x8UL
-   #define FUNC_CFG_REQ_FLAGS_DFLT_PRI_NOMATCH 0x10UL
-   #define FUNC_CFG_REQ_FLAGS_DISABLE_PAUSE 0x20UL
-   #define FUNC_CFG_REQ_FLAGS_DISABLE_STP  0x40UL
-   #define FUNC_CFG_REQ_FLAGS_DISABLE_LLDP 0x80UL
-   #define FUNC_CFG_REQ_FLAGS_DISABLE_PTPV2 0x100UL
-   #define FUNC_CFG_REQ_FLAGS_STD_TX_RING_MODE 0x200UL
+   #define FUNC_CFG_REQ_FLAGS_SRC_MAC_ADDR_CHECK_DISABLE  0x1UL
+   #define FUNC_CFG_REQ_FLAGS_SRC_MAC_ADDR_CHECK_ENABLE   0x2UL
+   #define FUNC_CFG_REQ_FLAGS_RSVD_MASK 0x1fcUL
+   #define FUNC_CFG_REQ_FLAGS_RSVD_SFT 2
+   #define FUNC_CFG_REQ_FLAGS_STD_TX_RING_MODE_ENABLE  0x200UL
+   #define FUNC_CFG_REQ_FLAGS_STD_TX_RING_MODE_DISABLE 0x400UL
+   #defi

[PATCH net-next 10/12] bnxt_en: Add interrupt test to ethtool -t selftest.

2017-04-04 Thread Michael Chan

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 32 ++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index dde3e21..848ecf2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2178,6 +2178,29 @@ static int bnxt_set_phys_id(struct net_device *dev,
return rc;
 }
 
+static int bnxt_hwrm_selftest_irq(struct bnxt *bp, u16 cmpl_ring)
+{
+   struct hwrm_selftest_irq_input req = {0};
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_SELFTEST_IRQ, cmpl_ring, -1);
+   return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
+static int bnxt_test_irq(struct bnxt *bp)
+{
+   int i;
+
+   for (i = 0; i < bp->cp_nr_rings; i++) {
+   u16 cmpl_ring = bp->grp_info[i].cp_fw_ring_id;
+   int rc;
+
+   rc = bnxt_hwrm_selftest_irq(bp, cmpl_ring);
+   if (rc)
+   return rc;
+   }
+   return 0;
+}
+
 static int bnxt_hwrm_mac_loopback(struct bnxt *bp, bool enable)
 {
struct hwrm_port_mac_cfg_input req = {0};
@@ -2366,9 +2389,10 @@ static int bnxt_run_fw_tests(struct bnxt *bp, u8 
test_mask, u8 *test_results)
return rc;
 }
 
-#define BNXT_DRV_TESTS 2
+#define BNXT_DRV_TESTS 3
 #define BNXT_MACLPBK_TEST_IDX  (bp->num_tests - BNXT_DRV_TESTS)
 #define BNXT_PHYLPBK_TEST_IDX  (BNXT_MACLPBK_TEST_IDX + 1)
+#define BNXT_IRQ_TEST_IDX  (BNXT_MACLPBK_TEST_IDX + 2)
 
 static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest,
   u64 *buf)
@@ -2437,6 +2461,10 @@ static void bnxt_self_test(struct net_device *dev, 
struct ethtool_test *etest,
bnxt_half_close_nic(bp);
bnxt_open_nic(bp, false, true);
}
+   if (bnxt_test_irq(bp)) {
+   buf[BNXT_IRQ_TEST_IDX] = 1;
+   etest->flags |= ETH_TEST_FL_FAILED;
+   }
for (i = 0; i < bp->num_tests - BNXT_DRV_TESTS; i++) {
u8 bit_val = 1 << i;
 
@@ -2484,6 +2512,8 @@ void bnxt_ethtool_init(struct bnxt *bp)
strcpy(str, "Mac loopback test (offline)");
} else if (i == BNXT_PHYLPBK_TEST_IDX) {
strcpy(str, "Phy loopback test (offline)");
+   } else if (i == BNXT_IRQ_TEST_IDX) {
+   strcpy(str, "Interrupt_test (offline)");
} else {
strlcpy(str, fw_str, ETH_GSTRING_LEN);
strncat(str, " test", ETH_GSTRING_LEN - strlen(str));
-- 
1.8.3.1
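
The index macros in the patch above place the driver-run tests in the last BNXT_DRV_TESTS slots of the result array, after the firmware-reported tests. A small user-space sketch of that layout follows; the demo_* helpers and the returned labels are hypothetical and only the arithmetic is taken from the patch.

```c
#include <assert.h>
#include <string.h>

#define DEMO_DRV_TESTS	3	/* MAC loopback, PHY loopback, interrupt */

/* Driver tests occupy the last DEMO_DRV_TESTS slots of the result array. */
static int demo_maclpbk_idx(int num_tests)
{
	return num_tests - DEMO_DRV_TESTS;
}

static int demo_phylpbk_idx(int num_tests)
{
	return demo_maclpbk_idx(num_tests) + 1;
}

static int demo_irq_idx(int num_tests)
{
	return demo_maclpbk_idx(num_tests) + 2;
}

/* Classify slot i; anything before the driver tests is a firmware test. */
static const char *demo_test_kind(int num_tests, int i)
{
	if (i == demo_maclpbk_idx(num_tests))
		return "mac-loopback";
	if (i == demo_phylpbk_idx(num_tests))
		return "phy-loopback";
	if (i == demo_irq_idx(num_tests))
		return "interrupt";
	return "firmware";
}
```

With, say, eight tests in total, slots 0 to 4 would be firmware tests and slots 5, 6 and 7 the MAC loopback, PHY loopback and interrupt tests.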

[PATCH net-next 09/12] bnxt_en: Add PHY loopback to ethtool self-test.

2017-04-04 Thread Michael Chan

It is necessary to disable autoneg before enabling PHY loopback,
otherwise link won't come up.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 62 ++-
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index ecb4417..dde3e21 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2192,6 +2192,54 @@ static int bnxt_hwrm_mac_loopback(struct bnxt *bp, bool 
enable)
return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 }
 
+static int bnxt_disable_an_for_lpbk(struct bnxt *bp,
+   struct hwrm_port_phy_cfg_input *req)
+{
+   struct bnxt_link_info *link_info = &bp->link_info;
+   u16 fw_advertising = link_info->advertising;
+   u16 fw_speed;
+   int rc;
+
+   if (!link_info->autoneg)
+   return 0;
+
+   fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_1GB;
+   if (netif_carrier_ok(bp->dev))
+   fw_speed = bp->link_info.link_speed;
+   else if (fw_advertising & BNXT_LINK_SPEED_MSK_10GB)
+   fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_10GB;
+   else if (fw_advertising & BNXT_LINK_SPEED_MSK_25GB)
+   fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_25GB;
+   else if (fw_advertising & BNXT_LINK_SPEED_MSK_40GB)
+   fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_40GB;
+   else if (fw_advertising & BNXT_LINK_SPEED_MSK_50GB)
+   fw_speed = PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_50GB;
+
+   req->force_link_speed = cpu_to_le16(fw_speed);
+   req->flags |= cpu_to_le32(PORT_PHY_CFG_REQ_FLAGS_FORCE |
+ PORT_PHY_CFG_REQ_FLAGS_RESET_PHY);
+   rc = hwrm_send_message(bp, req, sizeof(*req), HWRM_CMD_TIMEOUT);
+   req->flags = 0;
+   req->force_link_speed = cpu_to_le16(0);
+   return rc;
+}
+
+static int bnxt_hwrm_phy_loopback(struct bnxt *bp, bool enable)
+{
+   struct hwrm_port_phy_cfg_input req = {0};
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_PHY_CFG, -1, -1);
+
+   if (enable) {
+   bnxt_disable_an_for_lpbk(bp, &req);
+   req.lpbk = PORT_PHY_CFG_REQ_LPBK_LOCAL;
+   } else {
+   req.lpbk = PORT_PHY_CFG_REQ_LPBK_NONE;
+   }
+   req.enables = cpu_to_le32(PORT_PHY_CFG_REQ_ENABLES_LPBK);
+   return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
 static int bnxt_rx_loopback(struct bnxt *bp, struct bnxt_napi *bnapi,
u32 raw_cons, int pkt_size)
 {
@@ -2318,8 +2366,9 @@ static int bnxt_run_fw_tests(struct bnxt *bp, u8 
test_mask, u8 *test_results)
return rc;
 }
 
-#define BNXT_DRV_TESTS 1
+#define BNXT_DRV_TESTS 2
 #define BNXT_MACLPBK_TEST_IDX  (bp->num_tests - BNXT_DRV_TESTS)
+#define BNXT_PHYLPBK_TEST_IDX  (BNXT_MACLPBK_TEST_IDX + 1)
 
 static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest,
   u64 *buf)
@@ -2377,8 +2426,15 @@ static void bnxt_self_test(struct net_device *dev, 
struct ethtool_test *etest,
else
buf[BNXT_MACLPBK_TEST_IDX] = 0;
 
-   bnxt_half_close_nic(bp);
bnxt_hwrm_mac_loopback(bp, false);
+   bnxt_hwrm_phy_loopback(bp, true);
+   msleep(1000);
+   if (bnxt_run_loopback(bp)) {
+   buf[BNXT_PHYLPBK_TEST_IDX] = 1;
+   etest->flags |= ETH_TEST_FL_FAILED;
+   }
+   bnxt_hwrm_phy_loopback(bp, false);
+   bnxt_half_close_nic(bp);
bnxt_open_nic(bp, false, true);
}
for (i = 0; i < bp->num_tests - BNXT_DRV_TESTS; i++) {
@@ -2426,6 +2482,8 @@ void bnxt_ethtool_init(struct bnxt *bp)
 
if (i == BNXT_MACLPBK_TEST_IDX) {
strcpy(str, "Mac loopback test (offline)");
+   } else if (i == BNXT_PHYLPBK_TEST_IDX) {
+   strcpy(str, "Phy loopback test (offline)");
} else {
strlcpy(str, fw_str, ETH_GSTRING_LEN);
strncat(str, " test", ETH_GSTRING_LEN - strlen(str));
-- 
1.8.3.1
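
The speed choice made in bnxt_disable_an_for_lpbk() above is a simple priority chain: keep the current speed if the link is up, otherwise take the first advertised speed in the order 10G, 25G, 40G, 50G, falling back to 1G. A user-space sketch of just that chain is below. The DEMO_ADV_* bit values and the Mbps return values are assumptions for illustration; the real BNXT_LINK_SPEED_MSK_* and PORT_PHY_CFG_REQ_FORCE_LINK_SPEED_* constants differ, and the early return when autoneg is already off is omitted.

```c
#include <assert.h>

/* Hypothetical advertising bits, not the driver's real mask values. */
#define DEMO_ADV_10GB	(1u << 0)
#define DEMO_ADV_25GB	(1u << 1)
#define DEMO_ADV_40GB	(1u << 2)
#define DEMO_ADV_50GB	(1u << 3)

/*
 * Pick the speed (in Mbps) to force before enabling PHY loopback,
 * following the same priority order as the patch.
 */
static unsigned int demo_forced_speed_mbps(int carrier_ok,
					   unsigned int cur_speed_mbps,
					   unsigned int advertising)
{
	if (carrier_ok)
		return cur_speed_mbps;	/* link is up: keep its speed */
	if (advertising & DEMO_ADV_10GB)
		return 10000;
	if (advertising & DEMO_ADV_25GB)
		return 25000;
	if (advertising & DEMO_ADV_40GB)
		return 40000;
	if (advertising & DEMO_ADV_50GB)
		return 50000;
	return 1000;			/* default when nothing matches */
}
```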

[PATCH net-next 11/12] bnxt_en: Use short TX BDs for the XDP TX ring.

2017-04-04 Thread Michael Chan

No offload is performed on the XDP_TX ring so we can use the short TX
BDs.  This has the effect of doubling the size of the XDP TX ring so
that it now matches the size of the rx ring by default.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 8b27137..9dae327 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -23,7 +23,6 @@ void bnxt_xmit_xdp(struct bnxt *bp, struct bnxt_tx_ring_info 
*txr,
   dma_addr_t mapping, u32 len, u16 rx_prod)
 {
struct bnxt_sw_tx_bd *tx_buf;
-   struct tx_bd_ext *txbd1;
struct tx_bd *txbd;
u32 flags;
u16 prod;
@@ -33,23 +32,13 @@ void bnxt_xmit_xdp(struct bnxt *bp, struct 
bnxt_tx_ring_info *txr,
tx_buf->rx_prod = rx_prod;
 
txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
-   flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
-   (2 << TX_BD_FLAGS_BD_CNT_SHIFT) | TX_BD_FLAGS_COAL_NOW |
+   flags = (len << TX_BD_LEN_SHIFT) | (1 << TX_BD_FLAGS_BD_CNT_SHIFT) |
TX_BD_FLAGS_PACKET_END | bnxt_lhint_arr[len >> 9];
txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
txbd->tx_bd_opaque = prod;
txbd->tx_bd_haddr = cpu_to_le64(mapping);
 
prod = NEXT_TX(prod);
-   txbd1 = (struct tx_bd_ext *)
-   &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
-
-   txbd1->tx_bd_hsize_lflags = cpu_to_le32(0);
-   txbd1->tx_bd_mss = cpu_to_le32(0);
-   txbd1->tx_bd_cfa_action = cpu_to_le32(0);
-   txbd1->tx_bd_cfa_meta = cpu_to_le32(0);
-
-   prod = NEXT_TX(prod);
txr->tx_prod = prod;
 }
 
@@ -66,7 +55,6 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi 
*bnapi, int nr_pkts)
for (i = 0; i < nr_pkts; i++) {
last_tx_cons = tx_cons;
tx_cons = NEXT_TX(tx_cons);
-   tx_cons = NEXT_TX(tx_cons);
}
txr->tx_cons = tx_cons;
if (bnxt_tx_avail(bp, txr) == bp->tx_ring_size) {
@@ -133,7 +121,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info 
*rxr, u16 cons,
return false;
 
case XDP_TX:
-   if (tx_avail < 2) {
+   if (tx_avail < 1) {
trace_xdp_exception(bp->dev, xdp_prog, act);
bnxt_reuse_rx_data(rxr, cons, page);
return true;
-- 
1.8.3.1
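
The "doubling" mentioned in the commit message above follows from the descriptor count per packet: a long TX BD takes two ring slots, a short one takes one. The arithmetic can be sketched as follows; the helper names and the ring size of 512 used in the note below are hypothetical, and the wrap-around assumes a power-of-two ring.

```c
#include <assert.h>

/* Packets that fit in a ring of ring_bds descriptors at bds_per_pkt each. */
static unsigned int demo_ring_capacity(unsigned int ring_bds,
				       unsigned int bds_per_pkt)
{
	return ring_bds / bds_per_pkt;
}

/*
 * Advance a producer or consumer index by one packet.
 * ring_bds is assumed to be a power of two so masking wraps the index.
 */
static unsigned int demo_advance(unsigned int idx, unsigned int ring_bds,
				 unsigned int bds_per_pkt)
{
	return (idx + bds_per_pkt) & (ring_bds - 1);
}
```

For a 512-entry ring, two BDs per packet leave room for 256 packets, while one BD per packet leaves room for 512. This is also why the patch drops the second NEXT_TX() step on both the producer and consumer sides and lowers the availability check from 2 to 1.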

[PATCH net-next 12/12] bnxt_en: Cap the msix vector with the max completion rings.

2017-04-04 Thread Michael Chan

The current code enables up to the maximum MSIX vectors in the PCIE
config space without considering the max completion rings available.
An MSIX vector is only useful when it has an associated completion
ring, so it is better to cap it.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9d71c19..43b7342 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5183,9 +5183,10 @@ static unsigned int bnxt_get_max_func_irqs(struct bnxt 
*bp)
 {
 #if defined(CONFIG_BNXT_SRIOV)
if (BNXT_VF(bp))
-   return bp->vf.max_irqs;
+   return min_t(unsigned int, bp->vf.max_irqs,
+bp->vf.max_cp_rings);
 #endif
-   return bp->pf.max_irqs;
+   return min_t(unsigned int, bp->pf.max_irqs, bp->pf.max_cp_rings);
 }
 
 void bnxt_set_max_func_irqs(struct bnxt *bp, unsigned int max_irqs)
-- 
1.8.3.1

[PATCH net-next 08/12] bnxt_en: Add ethtool mac loopback self test.

2017-04-04 Thread Michael Chan

The mac loopback self test operates in polling mode.  To support that,
we need to add functions to open and close the NIC half way.  The half
open mode allows the rings to operate without IRQ and NAPI.  We
use the XDP transmit function to send the loopback packet.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  37 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 165 --
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +
 5 files changed, 199 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7b72ba9..9d71c19 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6097,6 +6097,43 @@ int bnxt_open_nic(struct bnxt *bp, bool irq_re_init, 
bool link_re_init)
return rc;
 }
 
+/* rtnl_lock held, open the NIC half way by allocating all resources, but
+ * NAPI, IRQ, and TX are not enabled.  This is mainly used for offline
+ * self tests.
+ */
+int bnxt_half_open_nic(struct bnxt *bp)
+{
+   int rc = 0;
+
+   rc = bnxt_alloc_mem(bp, false);
+   if (rc) {
+   netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc);
+   goto half_open_err;
+   }
+   rc = bnxt_init_nic(bp, false);
+   if (rc) {
+   netdev_err(bp->dev, "bnxt_init_nic err: %x\n", rc);
+   goto half_open_err;
+   }
+   return 0;
+
+half_open_err:
+   bnxt_free_skbs(bp);
+   bnxt_free_mem(bp, false);
+   dev_close(bp->dev);
+   return rc;
+}
+
+/* rtnl_lock held, this call can only be made after a previous successful
+ * call to bnxt_half_open_nic().
+ */
+void bnxt_half_close_nic(struct bnxt *bp)
+{
+   bnxt_hwrm_resource_free(bp, false, false);
+   bnxt_free_skbs(bp);
+   bnxt_free_mem(bp, false);
+}
+
 static int bnxt_open(struct net_device *dev)
 {
struct bnxt *bp = netdev_priv(dev);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 4affaac..c9a1688 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1255,6 +1255,8 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap,
 int bnxt_hwrm_free_wol_fltr(struct bnxt *bp);
 int bnxt_hwrm_fw_set_time(struct bnxt *);
 int bnxt_open_nic(struct bnxt *, bool, bool);
+int bnxt_half_open_nic(struct bnxt *bp);
+void bnxt_half_close_nic(struct bnxt *bp);
 int bnxt_close_nic(struct bnxt *, bool, bool);
 int bnxt_reserve_rings(struct bnxt *bp, int tx, int rx, int tcs, int tx_xdp);
 int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 711d7fd..ecb4417 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -18,6 +18,7 @@
 #include 
 #include "bnxt_hsi.h"
 #include "bnxt.h"
+#include "bnxt_xdp.h"
 #include "bnxt_ethtool.h"
 #include "bnxt_nvm_defs.h" /* NVRAM content constant and structure defs */
 #include "bnxt_fw_hdr.h"   /* Firmware hdr constant and structure defs */
@@ -2177,6 +2178,130 @@ static int bnxt_set_phys_id(struct net_device *dev,
return rc;
 }
 
+static int bnxt_hwrm_mac_loopback(struct bnxt *bp, bool enable)
+{
+   struct hwrm_port_mac_cfg_input req = {0};
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_MAC_CFG, -1, -1);
+
+   req.enables = cpu_to_le32(PORT_MAC_CFG_REQ_ENABLES_LPBK);
+   if (enable)
+   req.lpbk = PORT_MAC_CFG_REQ_LPBK_LOCAL;
+   else
+   req.lpbk = PORT_MAC_CFG_REQ_LPBK_NONE;
+   return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
+static int bnxt_rx_loopback(struct bnxt *bp, struct bnxt_napi *bnapi,
+   u32 raw_cons, int pkt_size)
+{
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
+   struct bnxt_sw_rx_bd *rx_buf;
+   struct rx_cmp *rxcmp;
+   u16 cp_cons, cons;
+   u8 *data;
+   u32 len;
+   int i;
+
+   cp_cons = RING_CMP(raw_cons);
+   rxcmp = (struct rx_cmp *)
+   &cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
+   cons = rxcmp->rx_cmp_opaque;
+   rx_buf = &rxr->rx_buf_ring[cons];
+   data = rx_buf->data_ptr;
+   len = le32_to_cpu(rxcmp->rx_cmp_len_flags_type) >> RX_CMP_LEN_SHIFT;
+   if (len != pkt_size)
+   return -EIO;
+   i = ETH_ALEN;
+   if (!ether_addr_equal(data + i, bnapi->bp->dev->dev_addr))
+   return -EIO;
+   i += ETH_ALEN;
+   for (  ; i < pkt_size; i++) {
+   if (data[i] != (u8)(i & 0xff))
+   ret

[PATCH net-next 02/12] bnxt_en: Add basic WoL infrastructure.

2017-04-04 Thread Michael Chan

Add code to driver probe function to check if the device is WoL capable
and if Magic packet WoL filter is currently set.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 43 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 +++
 2 files changed, 47 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 174ec8f..70cc313 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4532,6 +4532,9 @@ static int bnxt_hwrm_func_qcaps(struct bnxt *bp)
pf->max_tx_wm_flows = le32_to_cpu(resp->max_tx_wm_flows);
pf->max_rx_em_flows = le32_to_cpu(resp->max_rx_em_flows);
pf->max_rx_wm_flows = le32_to_cpu(resp->max_rx_wm_flows);
+   if (resp->flags &
+   cpu_to_le32(FUNC_QCAPS_RESP_FLAGS_WOL_MAGICPKT_SUPPORTED))
+   bp->flags |= BNXT_FLAG_WOL_CAP;
} else {
 #ifdef CONFIG_BNXT_SRIOV
struct bnxt_vf_info *vf = &bp->vf;
@@ -5839,6 +5842,44 @@ static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp)
return 0;
 }
 
+static u16 bnxt_hwrm_get_wol_fltrs(struct bnxt *bp, u16 handle)
+{
+   struct hwrm_wol_filter_qcfg_input req = {0};
+   struct hwrm_wol_filter_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+   u16 next_handle = 0;
+   int rc;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_WOL_FILTER_QCFG, -1, -1);
+   req.port_id = cpu_to_le16(bp->pf.port_id);
+   req.handle = cpu_to_le16(handle);
+   mutex_lock(&bp->hwrm_cmd_lock);
+   rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+   if (!rc) {
+   next_handle = le16_to_cpu(resp->next_handle);
+   if (next_handle != 0) {
+   if (resp->wol_type ==
+   WOL_FILTER_ALLOC_REQ_WOL_TYPE_MAGICPKT) {
+   bp->wol = 1;
+   bp->wol_filter_id = resp->wol_filter_id;
+   }
+   }
+   }
+   mutex_unlock(&bp->hwrm_cmd_lock);
+   return next_handle;
+}
+
+static void bnxt_get_wol_settings(struct bnxt *bp)
+{
+   u16 handle = 0;
+
+   if (!BNXT_PF(bp) || !(bp->flags & BNXT_FLAG_WOL_CAP))
+   return;
+
+   do {
+   handle = bnxt_hwrm_get_wol_fltrs(bp, handle);
+   } while (handle && handle != 0x);
+}
+
 static bool bnxt_eee_config_ok(struct bnxt *bp)
 {
struct ethtool_eee *eee = &bp->eee;
@@ -7575,6 +7616,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (rc)
goto init_err_pci_clean;
 
+   bnxt_get_wol_settings(bp);
+
rc = register_netdev(dev);
if (rc)
goto init_err_clr_int;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3cb0777..02de812 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -989,6 +989,7 @@ struct bnxt {
#define BNXT_FLAG_UDP_RSS_CAP   0x800
#define BNXT_FLAG_EEE_CAP   0x1000
#define BNXT_FLAG_NEW_RSS_CAP   0x2000
+   #define BNXT_FLAG_WOL_CAP   0x4000
#define BNXT_FLAG_ROCEV1_CAP    0x8000
#define BNXT_FLAG_ROCEV2_CAP    0x1
#define BNXT_FLAG_ROCE_CAP  (BNXT_FLAG_ROCEV1_CAP | \
@@ -1180,6 +1181,9 @@ struct bnxt {
u32 lpi_tmr_lo;
u32 lpi_tmr_hi;
 
+   u8  wol_filter_id;
+   u8  wol;
+
u8  num_leds;
struct bnxt_led_info    leds[BNXT_MAX_LED];
 
-- 
1.8.3.1

[PATCH net-next 00/12] bnxt_en: Updates for net-next.

2017-04-04 Thread Michael Chan

Main changes are to add WoL and selftest features, optimize XDP_TX by
using short BDs, and to cap the usage of MSIX.

Michael Chan (12):
  bnxt_en: Update firmware interface spec to 1.7.6.2.
  bnxt_en: Add basic WoL infrastructure.
  bnxt_en: Add pci shutdown method.
  bnxt_en: Add ethtool get_wol method.
  bnxt_en: Add ethtool set_wol method.
  bnxt_en: Add suspend/resume callbacks.
  bnxt_en: Add basic ethtool -t selftest support.
  bnxt_en: Add ethtool mac loopback self test.
  bnxt_en: Add PHY loopback to ethtool self-test.
  bnxt_en: Add interrupt test to ethtool -t selftest.
  bnxt_en: Use short TX BDs for the XDP TX ring.
  bnxt_en: Cap the msix vector with the max completion rings.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 207 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  21 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 413 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h |   3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 325 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   |   8 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h   |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  20 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +
 9 files changed, 942 insertions(+), 58 deletions(-)

-- 
1.8.3.1

[PATCH net-next 07/12] bnxt_en: Add basic ethtool -t selftest support.

2017-04-04 Thread Michael Chan

Add the basic infrastructure and only firmware tests initially.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  13 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 136 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.h |   2 +
 4 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4e77bbf..7b72ba9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7281,6 +7281,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
bnxt_clear_int_mode(bp);
bnxt_hwrm_func_drv_unrgtr(bp);
bnxt_free_hwrm_resources(bp);
+   bnxt_ethtool_free(bp);
bnxt_dcb_free(bp);
kfree(bp->edev);
bp->edev = NULL;
@@ -7603,6 +7604,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
bnxt_hwrm_func_qcfg(bp);
bnxt_hwrm_port_led_qcaps(bp);
+   bnxt_ethtool_init(bp);
 
bnxt_set_rx_skb_mode(bp, false);
bnxt_set_tpa_flags(bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index aba25ba..4affaac 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -426,8 +426,6 @@ struct rx_tpa_end_cmp_ext {
 
 #define BNXT_MIN_PKT_SIZE  52
 
-#define BNXT_NUM_TESTS(bp) 0
-
 #define BNXT_DEFAULT_RX_RING_SIZE  511
 #define BNXT_DEFAULT_TX_RING_SIZE  511
 
@@ -911,6 +909,14 @@ struct bnxt_led_info {
__le16  led_color_caps;
 };
 
+#define BNXT_MAX_TEST  8
+
+struct bnxt_test_info {
+   u8 offline_mask;
+   u16 timeout;
+   char string[BNXT_MAX_TEST][ETH_GSTRING_LEN];
+};
+
 #define BNXT_GRCPF_REG_WINDOW_BASE_OUT 0x400
 #define BNXT_CAG_REG_LEGACY_INT_STATUS 0x4014
 #define BNXT_CAG_REG_BASE  0x30
@@ -1181,6 +1187,9 @@ struct bnxt {
u32 lpi_tmr_lo;
u32 lpi_tmr_hi;
 
+   u8  num_tests;
+   struct bnxt_test_info   *test_info;
+
u8  wol_filter_id;
u8  wol;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 84cd4ca..711d7fd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -210,6 +210,10 @@ static int bnxt_get_sset_count(struct net_device *dev, int sset)
 
return num_stats;
}
+   case ETH_SS_TEST:
+   if (!bp->num_tests)
+   return -EOPNOTSUPP;
+   return bp->num_tests;
default:
return -EOPNOTSUPP;
}
@@ -307,6 +311,11 @@ static void bnxt_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
}
}
break;
+   case ETH_SS_TEST:
+   if (bp->num_tests)
+   memcpy(buf, bp->test_info->string,
+  bp->num_tests * ETH_GSTRING_LEN);
+   break;
default:
netdev_err(bp->dev, "bnxt_get_strings invalid request %x\n",
   stringset);
@@ -825,7 +834,7 @@ static void bnxt_get_drvinfo(struct net_device *dev,
sizeof(info->fw_version));
strlcpy(info->bus_info, pci_name(bp->pdev), sizeof(info->bus_info));
info->n_stats = BNXT_NUM_STATS * bp->cp_nr_rings;
-   info->testinfo_len = BNXT_NUM_TESTS(bp);
+   info->testinfo_len = bp->num_tests;
/* TODO CHIMP_FW: eeprom dump details */
info->eedump_len = 0;
/* TODO CHIMP FW: reg dump details */
@@ -2168,6 +2177,130 @@ static int bnxt_set_phys_id(struct net_device *dev,
return rc;
 }
 
+static int bnxt_run_fw_tests(struct bnxt *bp, u8 test_mask, u8 *test_results)
+{
+   struct hwrm_selftest_exec_output *resp = bp->hwrm_cmd_resp_addr;
+   struct hwrm_selftest_exec_input req = {0};
+   int rc;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_SELFTEST_EXEC, -1, -1);
+   mutex_lock(&bp->hwrm_cmd_lock);
+   resp->test_success = 0;
+   req.flags = test_mask;
+   rc = _hwrm_send_message(bp, &req, sizeof(req), bp->test_info->timeout);
+   *test_results = resp->test_success;
+   mutex_unlock(&bp->hwrm_cmd_lock);
+   return rc;
+}
+
+#define BNXT_DRV_TESTS 0
+
+static void bnxt_self_test(struct net_device *dev, struct ethtool_test *etest,
+  u64 *buf)
+{
+   struct bnxt *bp = netdev_priv(dev);
+   bool offline = false;
+   u8 test_results = 0;
+   u8 test_mask = 0;
+   int rc, i;
+
+   if (!bp->num_tests || !BNXT_SINGLE_PF(bp))
+   return;
+   mem

[PATCH net-next 06/12] bnxt_en: Add suspend/resume callbacks.

2017-04-04 Thread Michael Chan

Add suspend/resume callbacks using the newer dev_pm_ops method.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 57 +++
 1 file changed, 57 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index e432d0a..4e77bbf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7703,6 +7703,62 @@ static void bnxt_shutdown(struct pci_dev *pdev)
rtnl_unlock();
 }
 
+#ifdef CONFIG_PM_SLEEP
+static int bnxt_suspend(struct device *device)
+{
+   struct pci_dev *pdev = to_pci_dev(device);
+   struct net_device *dev = pci_get_drvdata(pdev);
+   struct bnxt *bp = netdev_priv(dev);
+   int rc = 0;
+
+   rtnl_lock();
+   if (netif_running(dev)) {
+   netif_device_detach(dev);
+   rc = bnxt_close(dev);
+   }
+   bnxt_hwrm_func_drv_unrgtr(bp);
+   rtnl_unlock();
+   return rc;
+}
+
+static int bnxt_resume(struct device *device)
+{
+   struct pci_dev *pdev = to_pci_dev(device);
+   struct net_device *dev = pci_get_drvdata(pdev);
+   struct bnxt *bp = netdev_priv(dev);
+   int rc = 0;
+
+   rtnl_lock();
+   if (bnxt_hwrm_ver_get(bp) || bnxt_hwrm_func_drv_rgtr(bp)) {
+   rc = -ENODEV;
+   goto resume_exit;
+   }
+   rc = bnxt_hwrm_func_reset(bp);
+   if (rc) {
+   rc = -EBUSY;
+   goto resume_exit;
+   }
+   bnxt_get_wol_settings(bp);
+   if (netif_running(dev)) {
+   rc = bnxt_open(dev);
+   if (!rc)
+   netif_device_attach(dev);
+   }
+
+resume_exit:
+   rtnl_unlock();
+   return rc;
+}
+
+static SIMPLE_DEV_PM_OPS(bnxt_pm_ops, bnxt_suspend, bnxt_resume);
+#define BNXT_PM_OPS (&bnxt_pm_ops)
+
+#else
+
+#define BNXT_PM_OPS NULL
+
+#endif /* CONFIG_PM_SLEEP */
+
 /**
  * bnxt_io_error_detected - called when PCI error is detected
  * @pdev: Pointer to PCI device
@@ -7820,6 +7876,7 @@ static void bnxt_io_resume(struct pci_dev *pdev)
.probe  = bnxt_init_one,
.remove = bnxt_remove_one,
.shutdown   = bnxt_shutdown,
+   .driver.pm  = BNXT_PM_OPS,
.err_handler= &bnxt_err_handler,
 #if defined(CONFIG_BNXT_SRIOV)
.sriov_configure = bnxt_sriov_configure,
-- 
1.8.3.1

[PATCH net-next 03/12] bnxt_en: Add pci shutdown method.

2017-04-04 Thread Michael Chan

Add pci shutdown method to put device in the proper WoL and power state.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 70cc313..10a9cda 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7617,6 +7617,10 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
goto init_err_pci_clean;
 
bnxt_get_wol_settings(bp);
+   if (bp->flags & BNXT_FLAG_WOL_CAP)
+   device_set_wakeup_enable(&pdev->dev, bp->wol);
+   else
+   device_set_wakeup_capable(&pdev->dev, false);
 
rc = register_netdev(dev);
if (rc)
@@ -7641,6 +7645,32 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
return rc;
 }
 
+static void bnxt_shutdown(struct pci_dev *pdev)
+{
+   struct net_device *dev = pci_get_drvdata(pdev);
+   struct bnxt *bp;
+
+   if (!dev)
+   return;
+
+   rtnl_lock();
+   bp = netdev_priv(dev);
+   if (!bp)
+   goto shutdown_exit;
+
+   if (netif_running(dev))
+   dev_close(dev);
+
+   if (system_state == SYSTEM_POWER_OFF) {
+   bnxt_clear_int_mode(bp);
+   pci_wake_from_d3(pdev, bp->wol);
+   pci_set_power_state(pdev, PCI_D3hot);
+   }
+
+shutdown_exit:
+   rtnl_unlock();
+}
+
 /**
  * bnxt_io_error_detected - called when PCI error is detected
  * @pdev: Pointer to PCI device
@@ -7757,6 +7787,7 @@ static void bnxt_io_resume(struct pci_dev *pdev)
.id_table   = bnxt_pci_tbl,
.probe  = bnxt_init_one,
.remove = bnxt_remove_one,
+   .shutdown   = bnxt_shutdown,
.err_handler= &bnxt_err_handler,
 #if defined(CONFIG_BNXT_SRIOV)
.sriov_configure = bnxt_sriov_configure,
-- 
1.8.3.1

[PATCH] ebpf: verify the output of the JIT

2017-04-04 Thread Tycho Andersen

The goal of this patch is to protect the JIT against an attacker with a
write-in-memory primitive. The JIT allocates a buffer which will eventually
be marked +x, so we need to make sure that what was written to this buffer
is what was intended.

We achieve this by building a hash of the instruction buffer as
instructions are emitted and then comparing that to a hash at the end of
the JIT compile after the buffer has been marked read-only.
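
To make the idea concrete, here is a small user-space sketch of the scheme. It is not the patch's implementation: the patch uses the kernel crypto_shash API with SHA-256, while this sketch substitutes a non-cryptographic FNV-1a hash purely to stay self-contained, and all sketch_* names are hypothetical. The key point it tries to show is that the running hash is fed from the intended value, not from the buffer, so a later rehash of the buffer detects any out-of-band write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct shash_desc; FNV-1a is NOT cryptographically secure. */
struct sketch_hash {
	uint64_t state;
};

static void sketch_hash_init(struct sketch_hash *h)
{
	h->state = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */
}

static void sketch_hash_update(struct sketch_hash *h, const uint8_t *p,
			       size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		h->state ^= p[i];
		h->state *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
}

/* Emit the low 'len' bytes of 'bytes' (little-endian) and fold the
 * intended bytes, taken from a local copy, into the running hash.
 */
static uint8_t *sketch_emit(uint8_t *ptr, uint32_t bytes, unsigned int len,
			    struct sketch_hash *h)
{
	uint8_t tmp[4];
	unsigned int i;

	assert(len == 1 || len == 2 || len == 4);
	for (i = 0; i < len; i++)
		tmp[i] = (uint8_t)(bytes >> (8 * i));
	memcpy(ptr, tmp, len);
	if (h)
		sketch_hash_update(h, tmp, len);
	return ptr + len;
}

/* After the image is final (read-only in the real patch), rehash the
 * buffer and compare. Returns 1 on match, 0 on mismatch.
 */
static int sketch_verify(const uint8_t *image, size_t len,
			 const struct sketch_hash *expected)
{
	struct sketch_hash h;

	sketch_hash_init(&h);
	sketch_hash_update(&h, image, len);
	return h.state == expected->state;
}
```

Verifying only after the buffer is read-only matters: a check made while the buffer is still writable leaves a window in which the attacker can modify it after the comparison.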

Signed-off-by: Tycho Andersen 
CC: Daniel Borkmann 
CC: Alexei Starovoitov 
CC: Kees Cook 
CC: Mickaël Salaün 
---
 arch/x86/Kconfig|  11 
 arch/x86/net/bpf_jit_comp.c | 147 
 2 files changed, 147 insertions(+), 11 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a..7b2db2c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2789,6 +2789,17 @@ config X86_DMA_REMAP
 
 source "net/Kconfig"
 
+config EBPF_JIT_HASH_OUTPUT
+   def_bool y
+   depends on HAVE_EBPF_JIT
+   depends on BPF_JIT
+   select CRYPTO_SHA256
+   ---help---
+ Enables a double check of the JIT's output after it is marked read-only to
+ ensure that it matches what the JIT generated.
+
+ Note, only applies when /proc/sys/net/core/bpf_jit_harden > 0.
+
 source "drivers/Kconfig"
 
 source "drivers/firmware/Kconfig"
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 32322ce..be1271e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -13,9 +13,15 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 int bpf_jit_enable __read_mostly;
 
+#ifdef CONFIG_EBPF_JIT_HASH_OUTPUT
+struct crypto_shash *tfm __read_mostly;
+#endif
+
 /*
  * assembly code in arch/x86/net/bpf_jit.S
  */
@@ -25,7 +31,8 @@ extern u8 sk_load_byte_positive_offset[];
 extern u8 sk_load_word_negative_offset[], sk_load_half_negative_offset[];
 extern u8 sk_load_byte_negative_offset[];
 
-static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
+static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len,
+struct shash_desc *hash)
 {
if (len == 1)
*ptr = bytes;
@@ -35,11 +42,15 @@ static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
*(u32 *)ptr = bytes;
barrier();
}
+
+   if (IS_ENABLED(CONFIG_EBPF_JIT_HASH_OUTPUT) && hash)
+   crypto_shash_update(hash, (u8 *) &bytes, len);
+
return ptr + len;
 }
 
 #define EMIT(bytes, len) \
-   do { prog = emit_code(prog, bytes, len); cnt += len; } while (0)
+   do { prog = emit_code(prog, bytes, len, hash); cnt += len; } while (0)
 
 #define EMIT1(b1)  EMIT(b1, 1)
 #define EMIT2(b1, b2)  EMIT((b1) + ((b2) << 8), 2)
@@ -206,7 +217,7 @@ struct jit_context {
 /* emit x64 prologue code for BPF program and check it's size.
  * bpf_tail_call helper will skip it while jumping into another program
  */
-static void emit_prologue(u8 **pprog)
+static void emit_prologue(u8 **pprog, struct shash_desc *hash)
 {
u8 *prog = *pprog;
int cnt = 0;
@@ -264,7 +275,7 @@ static void emit_prologue(u8 **pprog)
  *   goto *(prog->bpf_func + prologue_size);
  * out:
  */
-static void emit_bpf_tail_call(u8 **pprog)
+static void emit_bpf_tail_call(u8 **pprog, struct shash_desc *hash)
 {
u8 *prog = *pprog;
int label1, label2, label3;
@@ -328,7 +339,7 @@ static void emit_bpf_tail_call(u8 **pprog)
 }
 
 
-static void emit_load_skb_data_hlen(u8 **pprog)
+static void emit_load_skb_data_hlen(u8 **pprog, struct shash_desc *hash)
 {
u8 *prog = *pprog;
int cnt = 0;
@@ -348,7 +359,8 @@ static void emit_load_skb_data_hlen(u8 **pprog)
 }
 
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
- int oldproglen, struct jit_context *ctx)
+ int oldproglen, struct jit_context *ctx,
+ struct shash_desc *hash)
 {
struct bpf_insn *insn = bpf_prog->insnsi;
int insn_cnt = bpf_prog->len;
@@ -360,10 +372,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
int proglen = 0;
u8 *prog = temp;
 
-   emit_prologue(&prog);
+   emit_prologue(&prog, hash);
 
if (seen_ld_abs)
-   emit_load_skb_data_hlen(&prog);
+   emit_load_skb_data_hlen(&prog, hash);
 
for (i = 0; i < insn_cnt; i++, insn++) {
const s32 imm32 = insn->imm;
@@ -875,7 +887,7 @@ xadd:   if (is_imm8(insn->off))
if (seen_ld_abs) {
if (reload_skb_data) {
EMIT1(0x5F); /* pop %rdi */
-   emit_load_skb_data_hlen(&prog);
+   emit_load_skb_data_hlen(&prog, hash);
} else {
EMIT2(0x41, 0x59); /* pop %r9 */

Re: [iproute PATCH 0/4] Smaller link type help review

2017-04-04 Thread Stephen Hemminger

On Tue, 28 Mar 2017 23:19:35 +0200
Phil Sutter  wrote:

> This series addresses some minor nits with link type specific help
> texts:
> 
> * Unify coding style of print_help() callbacks (or the functions they
>   call).
> 
> * Unify output as much as possible for a common look and feel.
> 
> * Make sure there's type specific help for each type listed in 'ip link
>   help'.
> 
> Phil Sutter (4):
>   ip: link: bond: Fix whitespace in help text
>   ip: link: macvlan: Add newline to help output
>   ip: link: Unify link type help functions a bit
>   ip: link: Add missing link type help texts
> 
>  ip/Makefile |  3 ++-
>  ip/iplink_bond.c|  2 +-
>  ip/iplink_dummy.c   | 16 
>  ip/iplink_geneve.c  | 28 ++--
>  ip/iplink_ifb.c | 16 
>  ip/iplink_ipoib.c   |  4 +++-
>  ip/iplink_macvlan.c |  1 +
>  ip/iplink_nlmon.c   | 16 
>  ip/iplink_team.c| 25 +
>  ip/iplink_vcan.c| 16 
>  ip/iplink_vlan.c| 15 +--
>  ip/iplink_vxlan.c   | 44 +---
>  ip/link_gre.c   | 36 +++-
>  ip/link_gre6.c  | 47 ---
>  ip/link_ip6tnl.c| 46 +++---
>  ip/link_iptnl.c | 38 ++
>  ip/link_vti.c   | 17 +
>  17 files changed, 265 insertions(+), 105 deletions(-)
>  create mode 100644 ip/iplink_dummy.c
>  create mode 100644 ip/iplink_ifb.c
>  create mode 100644 ip/iplink_nlmon.c
>  create mode 100644 ip/iplink_team.c
>  create mode 100644 ip/iplink_vcan.c
> 

All 4 Applied

Re: [PATCH net] sctp: get sock from transport in sctp_transport_update_pmtu

2017-04-04 Thread Marcelo Ricardo Leitner

On Tue, Apr 04, 2017 at 01:39:55PM +0800, Xin Long wrote:
> This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU
> updates to accomodate route invalidation."). As t->asoc can't be NULL
> in sctp_transport_update_pmtu, it could get sk from asoc, and no need
> to pass sk into that function.
> 
> It is also to remove some duplicated codes from that function.
> 
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  include/net/sctp/sctp.h|  5 ++---
>  include/net/sctp/structs.h |  6 +++---
>  net/sctp/associola.c   |  6 +++---
>  net/sctp/input.c   |  4 ++--
>  net/sctp/output.c  |  4 ++--
>  net/sctp/socket.c  |  6 +++---
>  net/sctp/transport.c   | 19 +++
>  7 files changed, 22 insertions(+), 28 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index d75caa7..069582e 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -448,10 +448,9 @@ static inline int sctp_frag_point(const struct sctp_association *asoc, int pmtu)
>   return frag;
>  }
>  
> -static inline void sctp_assoc_pending_pmtu(struct sock *sk, struct sctp_association *asoc)
> +static inline void sctp_assoc_pending_pmtu(struct sctp_association *asoc)
>  {
> -
> - sctp_assoc_sync_pmtu(sk, asoc);
> + sctp_assoc_sync_pmtu(asoc);
>   asoc->pmtu_pending = 0;
>  }
>  
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index a127b7c..138f861 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -952,8 +952,8 @@ void sctp_transport_lower_cwnd(struct sctp_transport *, sctp_lower_cwnd_t);
>  void sctp_transport_burst_limited(struct sctp_transport *);
>  void sctp_transport_burst_reset(struct sctp_transport *);
>  unsigned long sctp_transport_timeout(struct sctp_transport *);
> -void sctp_transport_reset(struct sctp_transport *);
> -void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
> +void sctp_transport_reset(struct sctp_transport *t);
> +void sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu);
>  void sctp_transport_immediate_rtx(struct sctp_transport *);
>  void sctp_transport_dst_release(struct sctp_transport *t);
>  void sctp_transport_dst_confirm(struct sctp_transport *t);
> @@ -1954,7 +1954,7 @@ void sctp_assoc_update(struct sctp_association *old,
>  
>  __u32 sctp_association_get_next_tsn(struct sctp_association *);
>  
> -void sctp_assoc_sync_pmtu(struct sock *, struct sctp_association *);
> +void sctp_assoc_sync_pmtu(struct sctp_association *asoc);
>  void sctp_assoc_rwnd_increase(struct sctp_association *, unsigned int);
>  void sctp_assoc_rwnd_decrease(struct sctp_association *, unsigned int);
>  void sctp_assoc_set_primary(struct sctp_association *,
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 0b26df5..a9708da 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1412,7 +1412,7 @@ sctp_assoc_choose_alter_transport(struct sctp_association *asoc,
>  /* Update the association's pmtu and frag_point by going through all the
>   * transports. This routine is called when a transport's PMTU has changed.
>   */
> -void sctp_assoc_sync_pmtu(struct sock *sk, struct sctp_association *asoc)
> +void sctp_assoc_sync_pmtu(struct sctp_association *asoc)
>  {
>   struct sctp_transport *t;
>   __u32 pmtu = 0;
> @@ -1424,8 +1424,8 @@ void sctp_assoc_sync_pmtu(struct sock *sk, struct sctp_association *asoc)
>   list_for_each_entry(t, &asoc->peer.transport_addr_list,
>   transports) {
>   if (t->pmtu_pending && t->dst) {
> - sctp_transport_update_pmtu(sk, t,
> - SCTP_TRUNC4(dst_mtu(t->dst)));
> + sctp_transport_update_pmtu(
> + t, SCTP_TRUNC4(dst_mtu(t->dst)));
>   t->pmtu_pending = 0;
>   }
>   if (!pmtu || (t->pathmtu < pmtu))
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index 2a28ab2..0e06a27 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -401,10 +401,10 @@ void sctp_icmp_frag_needed(struct sock *sk, struct sctp_association *asoc,
>  
>   if (t->param_flags & SPP_PMTUD_ENABLE) {
>   /* Update transports view of the MTU */
> - sctp_transport_update_pmtu(sk, t, pmtu);
> + sctp_transport_update_pmtu(t, pmtu);
>  
>   /* Update association pmtu. */
> - sctp_assoc_sync_pmtu(sk, asoc);
> + sctp_assoc_sync_pmtu(asoc);
>   }
>  
>   /* Retransmit with the new pmtu setting.
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index ec4d50a..1409a87 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -105,10 +105,10 @@ void sctp_packet_config(struct sctp_packet *packet, __u32 vtag,
>   if (!sctp_transport_dst_check(tp)) {
>

Re: [PATCH iproute2] man: ip-link.8: document bridge options

2017-04-04 Thread Stephen Hemminger

On Tue, 28 Mar 2017 17:56:48 +0200
Sabrina Dubroca  wrote:

> Signed-off-by: Phil Sutter 
> Signed-off-by: Sabrina Dubroca 

Applied

Re: [PATCH iproute2 1/1] tc: print skbedit action when dumping actions.

2017-04-04 Thread Stephen Hemminger

On Wed, 22 Mar 2017 14:00:31 -0400
Roman Mashak  wrote:

> Signed-off-by: Roman Mashak 

Makes sense. Applied

Re: [PATCH iproute2] man: fix man page warnings

2017-04-04 Thread Stephen Hemminger

On Sun, 26 Mar 2017 21:11:14 +0200
Alexander Alemayhu  wrote:

> While generating PDFs from the man pages, I saw the warning below from
> several files. Compared the tc-matchall.8 with bridge.8 and used .RI
> instead of .R. It should have no effect on the man page rendering.
> 
> `R' is a string (producing the registered sign), not a macro.
> 
> Signed-off-by: Alexander Alemayhu 

Applied

Re: [PATCH] ss: replace all zero characters in a unix name to '@'

2017-04-04 Thread Stephen Hemminger

On Sat,  1 Apr 2017 04:31:57 +0300
Andrei Vagin  wrote:

> From: Andrei Vagin 
> 
> A name of an abstract socket can contain zero characters.
> Now we replace only the first character. If a name contains more
> than one zero character, the ss tool shows only a part of the name:
> u_str  UNCONN00 @1931097   * 0
> 
> the output with this patch:
> u_str  UNCONN00 @@zdtm-./sk-unix-unconn-23/@ 1931097   * 0
> 
> Signed-off-by: Andrei Vagin 

This patch duplicates changes that are already in the current version.

commit 878dadc79d247aa37b67fb30608e58ef1f9ab9ff
Author: Isaac Boukris 
Date:   Sat Oct 29 22:20:19 2016 +0300

iproute2: ss: escape all null bytes in abstract unix domain socket

Abstract unix domain socket may embed null characters,
these should be translated to '@' when printed by ss the
same way the null prefix is currently being translated.

Signed-off-by: Isaac Boukris
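
For readers unfamiliar with the issue, the translation described in that commit can be sketched roughly as follows. This is not the code from ss; the function name and signature are hypothetical. The point is that an abstract socket name is a byte string with an explicit length, so the loop must run over the length rather than stop at the first NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Copy an abstract unix socket name of 'len' bytes into 'out', replacing
 * every NUL byte (leading or embedded) with '@', then NUL-terminate.
 * 'out' must have room for len + 1 bytes.
 */
static void sketch_escape_abstract_name(char *out, size_t out_size,
					const char *name, size_t len)
{
	size_t i;

	assert(out_size > len);
	for (i = 0; i < len; i++)
		out[i] = name[i] ? name[i] : '@';
	out[len] = '\0';
}
```

Replacing only the first byte and then printing with "%s" would truncate the name at the first embedded NUL, which is the behaviour both patches set out to fix.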

Re: [iproute2 net-next v2 0/3] ip netconf improvements

2017-04-04 Thread Stephen Hemminger

On Tue, 4 Apr 2017 17:07:31 -0400
David Ahern  wrote:

> On 3/23/17 10:51 PM, David Ahern wrote:
> > Currently, ip netconf only shows data for ipv4 and ipv6 for dumps
> > and just ipv4 for device requests. Improve the user experience by
> > using the new kernel patch to dump all address families that have
> > registered. For example, if mpls_router module is loaded then mpls
> > values are displayed along with ipv4 and ipv6.
> > 
> > If the new feature is not supported (new iproute2 on older kernel)
> > the kernel returns the nlmsg error EOPNOTSUPP which can be trapped
> > and fallback to existing behavior.
> > 
> > v2
> > - fixed index conversion in patch 3 per nicholas' comment
> > 
> > David Ahern (3):
> >   netlink: Add flag to suppress print of nlmsg error
> >   ip netconf: Show all address families by default in dumps
> >   ip netconf: show all families on dev request
> > 
> >  include/libnetlink.h |  1 +
> >  ip/ipnetconf.c   | 36 +---
> >  lib/libnetlink.c |  3 ++-
> >  3 files changed, 28 insertions(+), 12 deletions(-)
> >   
> 
> Hi Stephen: any comments? are you ok with this change?

I was holding off until all the upstream commits went through. Other than
that fine.

[PATCH net-next] bonding: attempt to better support longer hw addresses

2017-04-04 Thread Jarod Wilson

People are using bonding over Infiniband IPoIB connections, and who knows
what else. Infiniband has a hardware address length of 20 octets
(INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32.
Various places in the bonding code are currently hard-wired to 6 octets
(ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides,
only alb is currently possible on Infiniband links anyway, due
to commit 1533e7731522, so the alb code is where most of the changes are.

One major component of this change is the addition of a bond_hw_addr_copy
function that takes a length argument, instead of using ether_addr_copy
everywhere that hardware addresses need to be copied about. The other
major component of this change is converting the bonding code from using
struct sockaddr for address storage to struct sockaddr_storage, as the
former has an address storage space of only 14, while the latter is 128
minus a few, which is necessary to support bonding over devices with up to
MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes
up some memory corruption issues with the current code, where it's
possible to write an infiniband hardware address into a sockaddr declared
on the stack.
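
The size argument can be illustrated with a short user-space sketch. The helper below is only a guess at the shape of bond_hw_addr_copy (the real one lives in include/net/bonding.h and is not shown in this excerpt), and the sketch_* names and constants are hypothetical. The sizes assume the usual Linux/glibc definitions, where struct sockaddr has a 14-byte sa_data.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_ETH_ALEN		6
#define SKETCH_INFINIBAND_ALEN	20
#define SKETCH_MAX_ADDR_LEN	32

/* Length-aware copy, in contrast to ether_addr_copy(), which always
 * copies exactly 6 octets.
 */
static void sketch_hw_addr_copy(unsigned char *dst, const unsigned char *src,
				unsigned int len)
{
	assert(len <= SKETCH_MAX_ADDR_LEN);
	memcpy(dst, src, len);
}

/* Does a hardware address of 'len' octets fit in struct sockaddr? */
static int sketch_fits_in_sockaddr(unsigned int len)
{
	return len <= sizeof(((struct sockaddr *)0)->sa_data);
}

/* Does it fit in struct sockaddr_storage after the family field? */
static int sketch_fits_in_sockaddr_storage(unsigned int len)
{
	return len <= sizeof(struct sockaddr_storage) - sizeof(sa_family_t);
}
```

With these definitions a 6-octet Ethernet address fits in either structure, while a 20-octet Infiniband address overflows sa_data, which is the stack corruption risk the paragraph above mentions.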

Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet
hardware address now:

$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: mlx4_ib0 (primary_reselect always)
Currently Active Slave: mlx4_ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100

Slave Interface: mlx4_ib0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr:
80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01
Slave queue ID: 0

Slave Interface: mlx4_ib1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr:
80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02
Slave queue ID: 0

Also tested with a standard 1Gbps NIC bonding setup (with a mix of
e1000 and e1000e cards), running LNST's bonding tests.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_alb.c| 88 +++
 drivers/net/bonding/bond_main.c   | 73 ++--
 drivers/net/bonding/bond_procfs.c |  3 +-
 include/net/bonding.h | 12 +-
 4 files changed, 108 insertions(+), 68 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c80b023092dd..7d7a3cec149a 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -687,7 +687,8 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
/* the arp must be sent on the selected rx channel */
tx_slave = rlb_choose_channel(skb, bond);
if (tx_slave)
-   ether_addr_copy(arp->mac_src, tx_slave->dev->dev_addr);
+   bond_hw_addr_copy(arp->mac_src, tx_slave->dev->dev_addr,
+ tx_slave->dev->addr_len);
netdev_dbg(bond->dev, "Server sent ARP Reply packet\n");
} else if (arp->op_code == htons(ARPOP_REQUEST)) {
/* Create an entry in the rx_hashtbl for this client as a
@@ -1017,22 +1018,23 @@ static void alb_send_learning_packets(struct slave *slave, u8 mac_addr[],
rcu_read_unlock();
 }
 
-static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[])
+static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[],
+ unsigned int len)
 {
struct net_device *dev = slave->dev;
-   struct sockaddr s_addr;
+   struct sockaddr_storage ss;
 
if (BOND_MODE(slave->bond) == BOND_MODE_TLB) {
-   memcpy(dev->dev_addr, addr, dev->addr_len);
+   memcpy(dev->dev_addr, addr, len);
return 0;
}
 
/* for rlb each slave must have a unique hw mac addresses so that
 * each slave will receive packets destined to a different mac
 */
-   memcpy(s_addr.sa_data, addr, dev->addr_len);
-   s_addr.sa_family = dev->type;
-   if (dev_set_mac_address(dev, &s_addr)) {
+   memcpy(ss.__data, addr, len);
+   ss.ss_family = dev->type;
+   if (dev_set_mac_address(dev, (struct sockaddr *)&ss)) {
netdev_err(slave->bond->dev, "dev_set_mac_address of dev %s failed! ALB mode requires that the base driver support setting the hw address also when the network device's interface is open\n",
   dev->name);
return -EOPNOTSUPP;
@@ -1046,11 +1048,14 @@ static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[])
  */
 static void alb_swap_mac_addr(struct slave *slave1, struct slave *slave2)
 {
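
For readers following the address-size argument in the commit message
above, here is a minimal user-space sketch, not the kernel code. It
assumes a POSIX <sys/socket.h>; the demo_* functions and DEMO_* constants
are hypothetical stand-ins for bond_hw_addr_copy(), ETH_ALEN and the
20-octet IPoIB address length seen in the /proc output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define DEMO_ETH_ALEN 6   /* Ethernet hardware address length */
#define DEMO_IB_ALEN  20  /* IPoIB hardware address length, as in the output above */

/* Copy a hardware address of an explicit length (illustrative helper). */
static void demo_hw_addr_copy(unsigned char *dst, const unsigned char *src,
			      size_t len)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, len);
}

/* Return 1 if len octets fit in the sa_data member of struct sockaddr. */
static int demo_fits_sockaddr(size_t len)
{
	return len <= sizeof(((struct sockaddr *)0)->sa_data);
}

/* Return 1 if len octets fit in struct sockaddr_storage after the family. */
static int demo_fits_sockaddr_storage(size_t len)
{
	return len <= sizeof(struct sockaddr_storage) - sizeof(sa_family_t);
}
```

On a typical Linux system demo_fits_sockaddr(DEMO_IB_ALEN) should be
false while demo_fits_sockaddr_storage(DEMO_IB_ALEN) is true, which is
the reason given for moving to struct sockaddr_storage.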

[PATCH net 2/2] tcp: fix reordering SNMP under-counting

2017-04-04 Thread Yuchung Cheng

Currently the reordering SNMP counters only increase if a connection
sees a higher degree than it has previously seen. It ignores events whose
reordering degree is not greater than the default system threshold.
This significantly under-counts the number of reordering events
and falsely conveys that reordering is rare on the network.

This patch properly and faithfully records the number of reordering
events detected by the TCP stack, just like the comment says "this
exciting event is worth to be remembered". Note that even so TCP
still under-estimates the actual reordering events because TCP
requires TS options or certain packet sequences to detect reordering
(i.e. ACKing a never-retransmitted sequence in recovery or disordered
 state).

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_input.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a75c48f62e27..5bfe17fc8064 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -874,22 +874,11 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
  const int ts)
 {
struct tcp_sock *tp = tcp_sk(sk);
-   if (metric > tp->reordering) {
-   int mib_idx;
+   int mib_idx;
 
+   if (metric > tp->reordering) {
tp->reordering = min(sysctl_tcp_max_reordering, metric);
 
-   /* This exciting event is worth to be remembered. 8) */
-   if (ts)
-   mib_idx = LINUX_MIB_TCPTSREORDER;
-   else if (tcp_is_reno(tp))
-   mib_idx = LINUX_MIB_TCPRENOREORDER;
-   else if (tcp_is_fack(tp))
-   mib_idx = LINUX_MIB_TCPFACKREORDER;
-   else
-   mib_idx = LINUX_MIB_TCPSACKREORDER;
-
-   NET_INC_STATS(sock_net(sk), mib_idx);
 #if FASTRETRANS_DEBUG > 1
pr_debug("Disorder%d %d %u f%u s%u rr%d\n",
 tp->rx_opt.sack_ok, inet_csk(sk)->icsk_ca_state,
@@ -902,6 +891,18 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
}
 
tp->rack.reord = 1;
+
+   /* This exciting event is worth to be remembered. 8) */
+   if (ts)
+   mib_idx = LINUX_MIB_TCPTSREORDER;
+   else if (tcp_is_reno(tp))
+   mib_idx = LINUX_MIB_TCPRENOREORDER;
+   else if (tcp_is_fack(tp))
+   mib_idx = LINUX_MIB_TCPFACKREORDER;
+   else
+   mib_idx = LINUX_MIB_TCPSACKREORDER;
+
+   NET_INC_STATS(sock_net(sk), mib_idx);
 }
 
 /* This must be called before lost_out is incremented */
-- 
2.12.2.715.g7642488e1d-goog
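
To illustrate the difference in counting described in the commit message,
here is a small user-space model. It is only a sketch: the demo_* names
are hypothetical, the metrics are made-up inputs, and the clamp to
sysctl_tcp_max_reordering is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Old behaviour: an event is counted only when its metric exceeds the
 * connection's current reordering degree.
 */
static unsigned int demo_count_old(const int *metrics, size_t n, int reordering)
{
	unsigned int events = 0;

	assert(metrics != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (metrics[i] > reordering) {
			reordering = metrics[i];
			events++;
		}
	}
	return events;
}

/* New behaviour: the degree is still raised only on a new maximum, but
 * every detected event is counted.
 */
static unsigned int demo_count_new(const int *metrics, size_t n, int reordering)
{
	unsigned int events = 0;

	assert(metrics != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (metrics[i] > reordering)
			reordering = metrics[i];
		events++;
	}
	return events;
}
```

With metrics {5, 3, 4, 7} and a starting degree of 3, the old scheme
counts two events and the new one counts all four.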

[PATCH net 1/2] tcp: fix lost retransmit SNMP under-counting

2017-04-04 Thread Yuchung Cheng

The lost retransmit SNMP stat is under-counting retransmissions
that use segment offloading. This patch fixes that so all
retransmission-related SNMP counters are consistent.

Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_recovery.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 4ecb38ae8504..d8acbd9f477a 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -12,7 +12,8 @@ static void tcp_rack_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
/* Account for retransmits that are lost again */
TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
tp->retrans_out -= tcp_skb_pcount(skb);
-   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT);
+   NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPLOSTRETRANSMIT,
+ tcp_skb_pcount(skb));
}
 }
 
-- 
2.12.2.715.g7642488e1d-goog
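
A rough user-space illustration of why the per-skb increment under-counts
when an skb carries several segments. This is a sketch only: the demo_*
helpers are hypothetical, and the pcounts array stands in for what
tcp_skb_pcount() would return for each lost skb.

```c
#include <assert.h>
#include <stddef.h>

/* One count per lost skb, as with NET_INC_STATS(). */
static unsigned long demo_lost_inc(const unsigned int *pcounts, size_t n)
{
	unsigned long stat = 0;

	assert(pcounts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		stat += 1;
	return stat;
}

/* One count per lost segment, as with NET_ADD_STATS(..., pcount). */
static unsigned long demo_lost_add(const unsigned int *pcounts, size_t n)
{
	unsigned long stat = 0;

	assert(pcounts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		stat += pcounts[i];
	return stat;
}
```

For three lost skbs carrying 1, 4 and 2 segments, the first helper
reports 3 while the second reports 7.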

Re: net/sctp: list double add warning in sctp_endpoint_add_asoc

2017-04-04 Thread Marcelo Ricardo Leitner

On Wed, Apr 05, 2017 at 01:29:19AM +0800, Xin Long wrote:
> On Tue, Apr 4, 2017 at 9:28 PM, Andrey Konovalov  
> wrote:
> > Hi,
> >
> > I've got the following error report while fuzzing the kernel with syzkaller.
> >
> > On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5).
> >
> > A reproducer and .config are attached.
> The script is pretty hard to reproduce the issue in my env.

I didn't try running it but I also found the reproducer very complicated
to follow. Do you have any plans for some kind of PoC optimizer, so we can
have more readable code?
strace is handy for filtering the noise, yes, but sometimes it doesn't
cut it.

> But there seems a case to cause a use-after-free when out of snd_buf.
> 
> the case is like:
> ---
> one thread:   another thread:
>   sctp_rcv hold asoc (hold transport)
>   enqueue the chunk to backlog queue
>   [refcnt=2]
> 
> sctp_close free assoc
> [refcnt=1]
> 
> sctp_sendmsg find asoc
> but not hold it
> 
> out of snd_buf
> hold asoc, schedule out
> [refcnt = 2]
> 
>   process backlog and put asoc/transport
>   [refcnt=1]
> 
> schedule in, put asoc
> [refcnt=0] <--- destroyed
> 
> sctp_sendmsg continue

It shouldn't be continuing here because sctp_wait_for_sndbuf and
sctp_wait_for_connect functions are checking if the asoc is dead
already when it schedules in, even though sctp_wait_for_connect return
value is ignored and sctp_sendmsg() simply returns after that.
Or the checks for dead asocs in there aren't enough somehow.

> using asoc, panic

Re: [RFC net-next] bpf: taint loading !is_gpl programs

2017-04-04 Thread Daniel Borkmann


On 04/04/2017 08:33 PM, Aaron Conole wrote:

> The eBPF framework is used for more than just socket level filtering.  It
> can also provide tracing, and even change the way packets coming into the
> system look.  Most of the eBPF callable symbols are available to non-gpl
> programs, and this includes helper functions which modify packets.  This
> allows proprietary eBPF code to link to the kernel and make decisions
> which can negatively impact network performance.
>
> Since the sources for these programs are only available under a proprietary
> license, it seems better to treat them the same as other proprietary
> modules: set the system taint flag.  An exemption is made for socket-level
> filters, since they do not really impact networking for the whole kernel.
>
> Signed-off-by: Aaron Conole 
> ---
>   kernel/bpf/syscall.c | 5 +
>   1 file changed, 5 insertions(+)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ab0cf4c4..1255b51 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -860,6 +860,11 @@ static int bpf_prog_load(union bpf_attr *attr)
>
> bpf_prog_kallsyms_add(prog);
> trace_bpf_prog_load(prog, err);
> +   if (type != BPF_PROG_TYPE_SOCKET_FILTER && !is_gpl && !(err < 0)) {
> +   if (!test_taint(TAINT_PROPRIETARY_MODULE))
> +   pr_warn("bpf license '%s' taints kernel.\n", license);
> +   add_taint(TAINT_PROPRIETARY_MODULE, LOCKDEP_STILL_OK);
> +   }
> return err;
>
>   free_used_maps:



Nacked-by: Daniel Borkmann 

This proposal is completely unreasonable; the purpose of the .gpl_only
flags, agreed upon since the beginning, is that some of the helpers
are only available if the program is loaded as gpl, e.g. bpf_ktime_get_ns(),
bpf_probe_read(), bpf_probe_write_user(), bpf_trace_printk(),
bpf_skb_event_output(), etc. Now, suddenly switching from one kernel
version to another, existing programs would all of a sudden taint the
kernel, which by itself is unacceptable. There are also many other
subsystems that can modify packets, or affect system performance
negatively if configured wrongly and which in addition *don't require* a
hard capable(CAP_SYS_ADMIN) restriction like such eBPF programs already
do; perhaps we should taint them as well? Plus tracing programs are
attached to passively monitor system performance, not even modifying
data structures ... The current purpose of .gpl_only is fine as-is, and
there's work in progress for a generic dump mechanism that works with
all program types to improve the introspection aspect if that's what you're
after. Starting to taint is, in a way, breaking existing applications
and this is not acceptable.

Re: [iproute2 net-next v2 0/3] ip netconf improvements

2017-04-04 Thread David Ahern

On 3/23/17 10:51 PM, David Ahern wrote:
> Currently, ip netconf only shows data for ipv4 and ipv6 for dumps
> and just ipv4 for device requests. Improve the user experience by
> using the new kernel patch to dump all address families that have
> registered. For example, if mpls_router module is loaded then mpls
> values are displayed along with ipv4 and ipv6.
> 
> If the new feature is not supported (new iproute2 on older kernel)
> the kernel returns the nlmsg error EOPNOTSUPP which can be trapped
> and fallback to existing behavior.
> 
> v2
> - fixed index conversion in patch 3 per nicholas' comment
> 
> David Ahern (3):
>   netlink: Add flag to suppress print of nlmsg error
>   ip netconf: Show all address families by default in dumps
>   ip netconf: show all families on dev request
> 
>  include/libnetlink.h |  1 +
>  ip/ipnetconf.c   | 36 +---
>  lib/libnetlink.c |  3 ++-
>  3 files changed, 28 insertions(+), 12 deletions(-)
> 

Hi Stephen: any comments? Are you ok with this change?

Fw: [Bug 195169] New: ip_route_input_noref panic

2017-04-04 Thread Stephen Hemminger



Begin forwarded message:

Date: Fri, 31 Mar 2017 02:54:55 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 195169] New: ip_route_input_noref panic


https://bugzilla.kernel.org/show_bug.cgi?id=195169

Bug ID: 195169
   Summary: ip_route_input_noref panic
   Product: Networking
   Version: 2.5
Kernel Version: 3.10.103
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: IPV4
  Assignee: step...@networkplumber.org
  Reporter: panweipi...@163.com
Regression: No

Created attachment 255655
  --> https://bugzilla.kernel.org/attachment.cgi?id=255655&action=edit  
ip_route_input_noref

We hit a kernel panic on kernel 3.10.103.

Since we do not configure kdump, I can only take a picture after it hangs.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [PATCH] i40e: limit client interface to X722 hardware

2017-04-04 Thread Stefan Assmann

On 04.04.2017 18:56, Or Gerlitz wrote:
> On Tue, Apr 4, 2017 at 5:34 PM, Stefan Assmann  wrote:
>> The client interface is meant for X722 iWARP support. Modprobing i40iw
>> on systems with X710/XL710 NICs currently may crash the system.
> 
> just curious may or crash? and why?

The backtrace I got was not really conclusive. The code is not meant to
be run on that hardware so I didn't bother to dig deeper.

  Stefan

[PATCH] i40e: only register client on iWarp-capable devices

2017-04-04 Thread Mitch Williams

The client interface is only intended for use on devices that support
iWarp. Only register with the client if this is the case.

This fixes a panic when loading i40iw on X710 devices.

Signed-off-by: Mitch Williams 
Reported-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 87d99fa..5e0e44e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11828,10 +11828,12 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
  round_jiffies(jiffies + pf->service_timer_period));
 
/* add this PF to client device list and launch a client service task */
-   err = i40e_lan_add_device(pf);
-   if (err)
-   dev_info(&pdev->dev, "Failed to add PF to client API service list: %d\n",
-err);
+   if (pf->flags & I40E_FLAG_IWARP_ENABLED) {
+   err = i40e_lan_add_device(pf);
+   if (err)
+   dev_info(&pdev->dev, "Failed to add PF to client API service list: %d\n",
+err);
+   }
 
 #define PCI_SPEED_SIZE 8
 #define PCI_WIDTH_SIZE 8
@@ -12013,10 +12015,11 @@ static void i40e_remove(struct pci_dev *pdev)
i40e_vsi_release(pf->vsi[pf->lan_vsi]);
 
/* remove attached clients */
-   ret_code = i40e_lan_del_device(pf);
-   if (ret_code) {
-   dev_warn(&pdev->dev, "Failed to delete client device: %d\n",
-ret_code);
+   if (pf->flags & I40E_FLAG_IWARP_ENABLED) {
+   ret_code = i40e_lan_del_device(pf);
+   if (ret_code)
+   dev_warn(&pdev->dev, "Failed to delete client device: %d\n",
+ret_code);
}
 
/* shutdown and destroy the HMC */
-- 
2.7.4
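
The shape of the change above, reduced to a user-space sketch: both the
registration and the removal path are guarded by the same capability
flag, so a device that never registered is never asked to unregister.
The demo_* names and DEMO_FLAG_IWARP_ENABLED are hypothetical stand-ins
for the driver's pf structure, I40E_FLAG_IWARP_ENABLED,
i40e_lan_add_device() and i40e_lan_del_device().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FLAG_IWARP_ENABLED 0x1u  /* stand-in for I40E_FLAG_IWARP_ENABLED */

struct demo_pf {
	unsigned int flags;
	int registered;  /* 1 while the PF is on the client device list */
};

/* Probe path: register with the client only on iWarp-capable devices. */
static void demo_probe(struct demo_pf *pf)
{
	assert(pf != NULL);
	if (pf->flags & DEMO_FLAG_IWARP_ENABLED)
		pf->registered = 1;  /* stands in for i40e_lan_add_device() */
}

/* Remove path: mirror the same check before unregistering. */
static void demo_remove(struct demo_pf *pf)
{
	assert(pf != NULL);
	if (pf->flags & DEMO_FLAG_IWARP_ENABLED)
		pf->registered = 0;  /* stands in for i40e_lan_del_device() */
}
```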

Re: [PATCH 4/4] net: stmmac: adding multiple napi mechanism

2017-04-04 Thread Thierry Reding

On Tue, Apr 04, 2017 at 06:54:27PM +0100, Joao Pinto wrote:
[...]
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
[...]
> @@ -1259,7 +1317,6 @@ static int init_dma_tx_desc_rings(struct net_device 
> *dev)
>   /* TX INITIALIZATION */
>   for (i = 0; i < DMA_TX_SIZE; i++) {
>   struct dma_desc *p;
> -
>   if (priv->extend_desc)
>   p = &((tx_q->dma_etx + i)->basic);
>   else

I think checkpatch would complain about this now because we're supposed
to separate variable declarations from code by a single blank line.

> - netif_napi_add(ndev, &priv->napi, stmmac_poll, 64);
> + ret = alloc_dma_desc_resources(priv);
> + if (ret < 0) {
> + netdev_err(priv->dev, "%s: DMA descriptors allocation failed\n",
> +__func__);
> + goto init_dma_error;
> + }
> +
> + ret = init_dma_desc_rings(priv->dev, GFP_KERNEL);
> + if (ret < 0) {
> + netdev_err(priv->dev, "%s: DMA descriptors initialization 
> failed\n",
> +__func__);
> + goto init_dma_error;
> + }
> +
> + for (queue = 0; queue < priv->plat->rx_queues_to_use; queue++) {
> + struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
> +
> + netif_napi_add(ndev, &rx_q->napi, stmmac_poll,
> +(8 * priv->plat->rx_queues_to_use));
> + }

Why is this moving to ->probe() now?

This works on Tegra186, so:

Reviewed-by: Thierry Reding 



Re: [PATCH 2/4] net: stmmac: adding multiple buffers for rx

2017-04-04 Thread Thierry Reding

On Tue, Apr 04, 2017 at 06:54:25PM +0100, Joao Pinto wrote:
[...]
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
[...]
> @@ -3402,6 +3474,9 @@ static irqreturn_t stmmac_interrupt(int irq, void 
> *dev_id)
>  
>   if (priv->synopsys_id >= DWMAC_CORE_4_00) {
>   for (queue = 0; queue < queues_count; queue++) {
> + struct stmmac_rx_queue *rx_q =
> + &priv->rx_queue[queue];

Found one more: the indentation here looks wrong. I think it's more
idiomatic to indent by at least a tab in such cases.

> +
>   status |=
>   priv->hw->mac->host_mtl_irq_status(priv->hw,
>  queue);

This is becoming quite unwieldy because of the indentation levels. Maybe
this could be split out into a separate function. Could be a separate
patch, though.

Thierry


Re: [PATCH 3/4] net: stmmac: adding multiple buffers for TX

2017-04-04 Thread Thierry Reding

On Tue, Apr 04, 2017 at 06:54:26PM +0100, Joao Pinto wrote:
> This patch adds the structure stmmac_tx_queue which contains
> tx queues specific data (previously in stmmac_priv).
> 
> Signed-off-by: Joao Pinto 
> ---
>  drivers/net/ethernet/stmicro/stmmac/chain_mode.c  |  38 +-
>  drivers/net/ethernet/stmicro/stmmac/ring_mode.c   |  46 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  26 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 521 
> +-
>  4 files changed, 375 insertions(+), 256 deletions(-)

Looks good to me:

Reviewed-by: Thierry Reding 

And works fine on Tegra186, so:

Tested-by: Thierry Reding 



Re: [PATCH 2/4] net: stmmac: adding multiple buffers for rx

2017-04-04 Thread Thierry Reding

One more nit: subject should say "... for RX" for consistency with patch
3/4.

Thierry



Re: [PATCH 2/4] net: stmmac: adding multiple buffers for rx

2017-04-04 Thread Thierry Reding

On Tue, Apr 04, 2017 at 06:54:25PM +0100, Joao Pinto wrote:
[...]
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
[...]
>  static void stmmac_display_rx_rings(struct stmmac_priv *priv)
>  {
> + u32 rx_cnt = priv->plat->rx_queues_to_use;
>   void *head_rx;
> + u32 queue;
>  
> - if (priv->extend_desc)
> - head_rx = (void *)priv->dma_erx;
> - else
> - head_rx = (void *)priv->dma_rx;
> + /* Display RX rings */
> + for (queue = 0; queue < rx_cnt; queue++) {
> + struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
>  
> - /* Display RX ring */
> - priv->hw->desc->display_ring(head_rx, DMA_RX_SIZE, true);
> + pr_info("\tRX Queue %d rings\n", queue);

Nit: %u is the right specifier for unsigned integers.
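
As a small user-space illustration of the specifier point (a sketch, with
a hypothetical demo_format_queue() standing in for the pr_info() call):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a queue index held in an unsigned 32-bit variable. The cast keeps
 * the argument type matched to %u even where uint32_t is not unsigned int.
 */
static int demo_format_queue(char *buf, size_t len, uint32_t queue)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "RX Queue %u rings", (unsigned int)queue);
}
```

With %u the full unsigned range prints as expected; with %d, values above
INT_MAX are not guaranteed to print correctly.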

> @@ -1107,46 +1135,65 @@ static int init_dma_rx_desc_rings(struct net_device 
> *dev, gfp_t flags)
[...]
>  err_init_rx_buffers:
> - while (--i >= 0)
> - stmmac_free_rx_buffers(priv, i);
> + while (queue-- >= 0) {

Why are you switching to postfix decrement here? Not only is it
inconsistent with the prefix decrement below, I think this also gives
you a wrong result. Consider what happens if queue == 0. The condition
evaluates to true, but within the loop the queue variable will wrap to
~0 and probably crash stmmac_free_rx_buffers().
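
To make the wrap concrete, here is a user-space sketch. It assumes the
queue counter is an unsigned 32-bit value, as the "wrap to ~0" remark
suggests; the demo_* names are hypothetical. Whether the queue that failed
also needs cleaning up depends on driver state not shown in the hunk, so
this only unwinds the queues below it.

```c
#include <assert.h>
#include <stdint.h>

/* Value an unsigned 32-bit counter holds after decrementing from zero. */
static uint32_t demo_wrap_from_zero(void)
{
	uint32_t queue = 0;

	queue--;  /* well-defined for unsigned types: wraps to UINT32_MAX */
	return queue;
}

/* Unwind queues failed - 1 down to 0, recording the order visited.
 * Testing "> 0" before the decrement means the body never sees the
 * wrapped value, and the loop terminates for failed == 0.
 */
static unsigned int demo_unwind(uint32_t failed, uint32_t *visited,
				unsigned int cap)
{
	uint32_t queue = failed;
	unsigned int n = 0;

	while (queue-- > 0) {
		assert(n < cap);
		visited[n++] = queue;
	}
	return n;
}
```

Note that with an unsigned counter a ">= 0" test is always true, so the
loop as posted would not terminate on its own.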

Other than that, this looks fine, so with the above fixed:

Reviewed-by: Thierry Reding 

Also works on Tegra186, so:

Tested-by: Thierry Reding 


Re: [PATCH 1/4] net: stmmac: break some functions into RX and TX scopes

2017-04-04 Thread Thierry Reding

On Tue, Apr 04, 2017 at 06:54:24PM +0100, Joao Pinto wrote:
> This patch breaks several functions into RX and TX scopes, which
> will be useful when adding multiple buffers mechanism.
> 
> Signed-off-by: Joao Pinto 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 350 
> +-
>  1 file changed, 268 insertions(+), 82 deletions(-)

A couple of small nits below.

> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
[...]
> @@ -924,16 +941,16 @@ static int stmmac_set_bfsize(int mtu, int bufsize)
>  }
>  
>  /**
> - * stmmac_clear_descriptors - clear descriptors
> + * stmmac_clear_rx_descriptors - clear RX descriptors
>   * @priv: driver private structure
> - * Description: this function is called to clear the tx and rx descriptors
> + * Description: this function is called to clear the rx descriptors

You seem to be transitioning to "RX" and "TX" everywhere, maybe do the
same in this comment for consistency?

Also, on a general note: there's no need for "Description:" here. The
kerneldoc format mandates that you leave a blank line after the block
of parameter descriptions, and the paragraph that follows becomes the
description. I know that these are static functions and are therefore
not parsed by kerneldoc, but since you already use the syntax anyway,
you might as well get it right.
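
For reference, a comment in that layout might look like the sketch below.
The function and struct demo_desc are hypothetical, written as user-space
C only so that the example compiles; the point is the comment layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor type, for illustration only. */
struct demo_desc {
	unsigned int des0;
	unsigned int des1;
};

/**
 * demo_clear_rx_descriptors - clear RX descriptors
 * @ring: descriptor ring to clear
 * @count: number of descriptors in @ring
 *
 * Zero every descriptor in @ring so that the ring starts out in a known
 * state. This paragraph, separated from the parameters by a blank comment
 * line, is the description; no "Description:" prefix is needed.
 */
static void demo_clear_rx_descriptors(struct demo_desc *ring, size_t count)
{
	assert(ring != NULL || count == 0);
	if (count)
		memset(ring, 0, count * sizeof(*ring));
}
```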

>   * in case of both basic and extended descriptors are used.
>   */
> -static void stmmac_clear_descriptors(struct stmmac_priv *priv)
> +static void stmmac_clear_rx_descriptors(struct stmmac_priv *priv)
>  {
>   int i;

This could be unsigned.

>  
> - /* Clear the Rx/Tx descriptors */
> + /* Clear the RX descriptors */
>   for (i = 0; i < DMA_RX_SIZE; i++)
>   if (priv->extend_desc)
>   priv->hw->desc->init_rx_desc(&priv->dma_erx[i].basic,
> @@ -943,6 +960,19 @@ static void stmmac_clear_descriptors(struct stmmac_priv 
> *priv)
>   priv->hw->desc->init_rx_desc(&priv->dma_rx[i],
>priv->use_riwt, priv->mode,
>(i == DMA_RX_SIZE - 1));
> +}
> +
> +/**
> + * stmmac_clear_tx_descriptors - clear tx descriptors
> + * @priv: driver private structure
> + * Description: this function is called to clear the tx descriptors
> + * in case of both basic and extended descriptors are used.
> + */
> +static void stmmac_clear_tx_descriptors(struct stmmac_priv *priv)
> +{
> + int i;

Same here. There are a couple of other such occurrences throughout the
file. This already exists in many places in the driver, so I don't think
this needs to be changed. Or at least it could be a follow-up patch.

> +
> + /* Clear the TX descriptors */
>   for (i = 0; i < DMA_TX_SIZE; i++)
>   if (priv->extend_desc)
>   priv->hw->desc->init_tx_desc(&priv->dma_etx[i].basic,
> @@ -955,6 +985,21 @@ static void stmmac_clear_descriptors(struct stmmac_priv 
> *priv)
>  }
>  
>  /**
> + * stmmac_clear_descriptors - clear descriptors
> + * @priv: driver private structure
> + * Description: this function is called to clear the tx and rx descriptors
> + * in case of both basic and extended descriptors are used.
> + */
> +static void stmmac_clear_descriptors(struct stmmac_priv *priv)
> +{
> + /* Clear the RX descriptors */
> + stmmac_clear_rx_descriptors(priv);
> +
> + /* Clear the TX descriptors */
> + stmmac_clear_tx_descriptors(priv);
> +}
> +
> +/**
>   * stmmac_init_rx_buffers - init the RX descriptor buffer.
>   * @priv: driver private structure
>   * @p: descriptor pointer
> @@ -996,6 +1041,11 @@ static int stmmac_init_rx_buffers(struct stmmac_priv 
> *priv, struct dma_desc *p,
>   return 0;
>  }
>  
> +/**
> + * stmmac_free_rx_buffers - free RX dma buffers
> + * @priv: private structure
> + * @i: buffer index.

If this operates on a single buffer, as specified by the buffer index,
maybe this should be named singular stmmac_free_rx_buffer()?

> + */
>  static void stmmac_free_rx_buffers(struct stmmac_priv *priv, int i)

The index could be unsigned.

>  {
>   if (priv->rx_skbuff[i]) {
> @@ -1007,14 +1057,42 @@ static void stmmac_free_rx_buffers(struct stmmac_priv 
> *priv, int i)
>  }
>  
>  /**
> - * init_dma_desc_rings - init the RX/TX descriptor rings
> + * stmmac_free_tx_buffers - free RX dma buffers
> + * @priv: private structure
> + * @i: buffer index.
> + */
> +static void stmmac_free_tx_buffers(struct stmmac_priv *priv, int i)
> +{
> + if (priv->tx_skbuff_dma[i].buf) {
> + if (priv->tx_skbuff_dma[i].map_as_page)
> + dma_unmap_page(priv->device,
> +priv->tx_skbuff_dma[i].buf,
> +priv->tx_skbuff_dma[i].len,
> +DMA_TO_DEVICE);
> + else
> + dma_unmap_single(priv->dev

Re: net/ipv4: use-after-free in ipv4_mtu

2017-04-04 Thread Eric Dumazet

On Tue, Apr 4, 2017 at 7:50 AM, Andrey Konovalov  wrote:
>
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5).
>
> Unfortunately it's not reproducible.
>
> ==
> BUG: KASAN: use-after-free in dst_metric_raw include/net/dst.h:176
> [inline] at addr 88003d6a965c
> BUG: KASAN: use-after-free in ipv4_mtu+0x3f2/0x4b0
> net/ipv4/route.c:1270 at addr 88003d6a965c
> Read of size 4 by task syz-executor3/20611
> CPU: 3 PID: 20611 Comm: syz-executor3 Not tainted 4.11.0-rc5+ #199
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x292/0x398 lib/dump_stack.c:52
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
>  print_address_description mm/kasan/report.c:202 [inline]
>  kasan_report_error mm/kasan/report.c:291 [inline]
>  kasan_report+0x252/0x510 mm/kasan/report.c:347
>  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367
>  dst_metric_raw include/net/dst.h:176 [inline]
>  ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270
>  dst_mtu include/net/dst.h:221 [inline]
>  do_ip_getsockopt+0x71d/0x2290 net/ipv4/ip_sockglue.c:1433
>  ip_getsockopt+0x90/0x230 net/ipv4/ip_sockglue.c:1578
>  tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3131
>  sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2709
>  SYSC_getsockopt net/socket.c:1829 [inline]
>  SyS_getsockopt+0x252/0x390 net/socket.c:1811
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x4458d9
> RSP: 002b:7fe87f452b58 EFLAGS: 0286 ORIG_RAX: 0037
> RAX: ffda RBX: 0005 RCX: 004458d9
> RDX: 000e RSI:  RDI: 0005
> RBP: 006e0020 R08: 20db6000 R09: 
> R10: 207e8000 R11: 0286 R12: 00708150
> R13: 20db8000 R14: 1000 R15: 0003
> Object at 88003d6a9658, in cache kmalloc-64 size: 64
> Allocated:
> PID = 20110
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>  set_track mm/kasan/kasan.c:525 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
>  kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
>  kmalloc include/linux/slab.h:490 [inline]
>  kzalloc include/linux/slab.h:663 [inline]
>  fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040
>  fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221
>  ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597
>  inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882
> sctp: [Deprecated]: syz-executor0 (pid 20638) Use of int in max_burst
> socket option.
> Use struct sctp_assoc_value instead
>  sock_do_ioctl+0x65/0xb0 net/socket.c:906
>  sock_ioctl+0x28f/0x440 net/socket.c:1004
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Freed:
> PID = 4439
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>  set_track mm/kasan/kasan.c:525 [inline]
>  kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
>  slab_free_hook mm/slub.c:1357 [inline]
>  slab_free_freelist_hook mm/slub.c:1379 [inline]
>  slab_free mm/slub.c:2961 [inline]
>  kfree+0xe8/0x2b0 mm/slub.c:3882
>  free_fib_info_rcu+0x4ba/0x5e0 net/ipv4/fib_semantics.c:218
>  __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
>  rcu_do_batch.isra.64+0x947/0xcc0 kernel/rcu/tree.c:2879
>  invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline]
>  __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline]
>  rcu_process_callbacks+0x2cc/0xb90 kernel/rcu/tree.c:3126
>  __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
> Memory state around the buggy address:
>  88003d6a9500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  88003d6a9580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >88003d6a9600: fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb
> ^
>  88003d6a9680: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>  88003d6a9700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ==

Thanks for the report Andrey

Looking at fib->fib_metrics, I fail to understand how the following can work:

dst_init_metrics(&rt->dst, fi->fib_metrics, true);

In the cases where fi->fib_metrics is _not_ dst_default_metrics,
fi->fib_metrics can be freed when the fib is deleted,
while dst(s) still have the 'read only pointer'.

RCU grace period before fi->fib_metrics freeing does not help.

Without refcounts, it looks like we need to copy the fib_metrics.

Re: pull-request: wireless-drivers-next 2017-04-03

2017-04-04 Thread David Miller

From: Kalle Valo 
Date: Tue, 04 Apr 2017 20:48:35 +0300

> David Miller  writes:
> 
>> From: Kalle Valo 
>> Date: Mon, 03 Apr 2017 14:26:10 +0300
>>
>>> here few really small fixes. I'm hoping this to be the last pull request
>>> for 4.11.
>>> 
>>> Please let me if there are any problems.
>>
>> Pulled, thanks.
>>
>> But I will warn you, you say fixes, but your Subject line and
>> GIT tag says "-next" so I pulled it into net-next.
> 
> Sorry, I used the wrong pull request template and that's why I had the
> wrong subject in this pull request. So actually this was supposed to be
> for net, not net-next. Any chance you could also pull this to net so
> that we can still get the fixes to 4.11?

Sure, done.

[RFC net-next] bpf: taint loading !is_gpl programs

2017-04-04 Thread Aaron Conole

The eBPF framework is used for more than just socket level filtering.  It
can also provide tracing, and even change the way packets coming into the
system look.  Most of the eBPF callable symbols are available to non-gpl
programs, and this includes helper functions which modify packets.  This
allows proprietary eBPF code to link to the kernel and make decisions
which can negatively impact network performance.

Since the sources for these programs are only available under a proprietary
license, it seems better to treat them the same as other proprietary
modules: set the system taint flag.  An exemption is made for socket-level
filters, since they do not really impact networking for the whole kernel.

Signed-off-by: Aaron Conole 
---
 kernel/bpf/syscall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ab0cf4c4..1255b51 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -860,6 +860,11 @@ static int bpf_prog_load(union bpf_attr *attr)
 
bpf_prog_kallsyms_add(prog);
trace_bpf_prog_load(prog, err);
+   if (type != BPF_PROG_TYPE_SOCKET_FILTER && !is_gpl && !(err < 0)) {
+   if (!test_taint(TAINT_PROPRIETARY_MODULE))
+   pr_warn("bpf license '%s' taints kernel.\n", license);
+   add_taint(TAINT_PROPRIETARY_MODULE, LOCKDEP_STILL_OK);
+   }
return err;
 
 free_used_maps:
-- 
2.9.3

Re: [PATCH v4 2/2] can: spi: hi311x: Add Holt HI-311x CAN driver

2017-04-04 Thread Akshay Bhat



On 04/04/2017 11:34 AM, Marc Kleine-Budde wrote:
> On 03/24/2017 06:20 PM, Akshay Bhat wrote:
>> Hi Marc,
>>
>> On 03/17/2017 05:10 PM, Akshay Bhat wrote:
>>> This patch adds support for the Holt HI-311x CAN controller. The HI311x
>>> CAN controller is capable of transmitting and receiving standard data
>>> frames, extended data frames and remote frames. The HI311x interfaces
>>> with the host over SPI.
>>>
>>> Datasheet: www.holtic.com/documents/371-hi-3110_v-rev-jpdf.do
>>>
>>> Signed-off-by: Akshay Bhat 
>>> ---
>>>
>>
>> If there are no further review comments can this series be applied to
>> can-next or does it need to wait for the next kernel release cycle (4.13)?
> 
> The driver doesn't check if the workqueue allocation is successfull,
> I've squashed this patch:
> 

Thanks Marc, appreciate it. The squashed patch looks good.

Re: [PATCH] net: netfilter: Use seq_puts()/seq_putc() where possible

2017-04-04 Thread Simon Horman

On Wed, Mar 29, 2017 at 03:25:17AM +0530, simran singhal wrote:
> For string without format specifiers, use seq_puts(). For
> seq_printf("\n"), use seq_putc('\n').
> 
> Signed-off-by: simran singhal 
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c  | 8 

Simran, I would be happy to pick up the IPVS version if it was posted as a
separate patch.

Alternatively, Pablo, if you would like to take this patch feel free to add:

Acked-by: Simon Horman
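
The substitution described in the quoted patch can be illustrated with a user-space analogy. This is a sketch only: fputs()/fputc() stand in for the kernel's seq_puts()/seq_putc(), fprintf() for seq_printf(), and the function name and header string are made up for the example.

```c
#include <stdio.h>

/* User-space analogy (assumption): FILE * plays the role of the
 * kernel's struct seq_file. */
static void show_header(FILE *out)
{
	/* Before: fprintf(out, "Prot LocalAddress:Port"); fprintf(out, "\n");
	 * Neither string has a format specifier, so parsing them as format
	 * strings buys nothing. */
	fputs("Prot LocalAddress:Port", out);	/* plain string: puts variant */
	fputc('\n', out);			/* single character: putc variant */
}
```

The output is identical either way; the puts/putc forms simply skip the format-string processing.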

Re: [PATCH] net: netfilter: Replace explicit NULL comparison with ! operator

2017-04-04 Thread Pablo Neira Ayuso

On Tue, Apr 04, 2017 at 01:41:11PM -0400, Simon Horman wrote:
> On Wed, Mar 29, 2017 at 03:45:01PM +0530, Arushi Singhal wrote:
> > Replace explicit NULL comparison with ! operator to simplify code.
> > 
> > Signed-off-by: Arushi Singhal 
> > ---
> >  net/netfilter/ipvs/ip_vs_ctl.c |  8 ++---
> >  net/netfilter/ipvs/ip_vs_proto.c   |  8 ++---
> 
> I count 18 instances of "!= NULL" in net/netfilter/ipvs/ip_vs_proto but this
> patch only seems to update 8 of them. I would prefer to fix all or none of
> them.

Agreed.

Please address all instances and resubmit.
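
As a reminder of what the requested cleanup looks like, here is a minimal sketch of the two spellings side by side. The struct and function names are hypothetical and do not come from ip_vs_proto.c; the point is only that both forms evaluate identically, so the change is purely stylistic.

```c
#include <stddef.h>

/* Hypothetical type, used only for illustration. */
struct proto {
	const char *name;
};

/* Explicit comparisons against NULL. */
static int usable_explicit(const struct proto *pp)
{
	if (pp == NULL)
		return 0;
	return pp->name != NULL;
}

/* Same logic with the ! operator and implicit truth tests. */
static int usable_implicit(const struct proto *pp)
{
	if (!pp)
		return 0;
	return !!pp->name;
}
```

Because the two functions are equivalent, converting only some instances in a file leaves it inconsistent without any functional gain, which is the concern raised in the review above.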

[PATCH 0/4 net-next] net: stmmac: adding multiple buffers

2017-04-04 Thread Joao Pinto

This patch set adds multiple-buffer support to stmmac. It is split into
smaller patches to make debugging any problems easier.

I kindly ask people to test this patch set on their hardware to check
that everything still works. Thank you.

Joao Pinto (4):
  net: stmmac: break some functions into RX and TX scopes
  net: stmmac: adding multiple buffers for rx
  net: stmmac: adding multiple buffers for TX
  net: stmmac: adding multiple napi mechanism

 drivers/net/ethernet/stmicro/stmmac/chain_mode.c  |   45 +-
 drivers/net/ethernet/stmicro/stmmac/ring_mode.c   |   46 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |   49 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1295 ++---
 4 files changed, 969 insertions(+), 466 deletions(-)

-- 
2.9.3
