[PATCH v2] net: mvpp2: initialize port of_node pointer

2018-08-28 Thread Baruch Siach
Without a valid of_node in struct device we can't find the mvpp2 port
device by its DT node. Specifically, this breaks
of_find_net_device_by_node().

For example, the Armada 8040 based Clearfog GT-8K uses a Marvell 88E6141
switch connected to the &cp1_eth2 port:

&cp1_mdio {
	...

	switch0: switch0@4 {
		compatible = "marvell,mv88e6085";
		...

		ports {
			...

			port@5 {
				reg = <5>;
				label = "cpu";
				ethernet = <&cp1_eth2>;
			};
		};
	};
};

Without this patch, dsa_register_switch() returns -EPROBE_DEFER because
of_find_net_device_by_node() can't find the net device for the &cp1_eth2
device node.

Reviewed-by: Andrew Lunn 
Signed-off-by: Baruch Siach 
---
v2: Expand the commit log as suggested by Andrew
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 32d785b616e1..28500417843e 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4803,6 +4803,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 	dev->min_mtu = ETH_MIN_MTU;
 	/* 9704 == 9728 - 20 and rounding to 8 */
 	dev->max_mtu = MVPP2_BM_JUMBO_PKT_SIZE;
+	dev->dev.of_node = port_node;
 
 	/* Phylink isn't used w/ ACPI as of now */
 	if (port_node) {
-- 
2.18.0
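
Why the lookup fails can be illustrated with a small userspace model. This is a simplified sketch, assuming the matcher behind of_find_net_device_by_node() accepts a net device when either its parent's of_node or its own of_node equals the requested node; `node_model`, `dev_model`, `node_match()` and `find_by_node()` are hypothetical stand-ins, not kernel types. Since the mvpp2 ports presumably all share one parent (the packet processor's platform device), only the per-port dev->dev.of_node can single out &cp1_eth2.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct device_node and struct device. */
struct node_model {
	const char *name;
};

struct dev_model {
	const struct dev_model *parent;
	const struct node_model *of_node;
};

/* Assumed matching rule: the parent's of_node or the device's own of_node. */
static int node_match(const struct dev_model *dev, const struct node_model *np)
{
	if (dev->parent && dev->parent->of_node == np)
		return 1;
	return dev->of_node == np;
}

/* Linear search standing in for the walk over registered net devices. */
static const struct dev_model *find_by_node(const struct dev_model *const *devs,
					    size_t n, const struct node_model *np)
{
	for (size_t i = 0; i < n; i++)
		if (node_match(devs[i], np))
			return devs[i];
	return NULL;
}
```

With the port's of_node left NULL, a lookup for the port node finds nothing, because the parent carries the controller node instead; once of_node is set to the port node, the same lookup succeeds.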



Re: bpfilter causes a leftover kernel process

2018-08-28 Thread Alexei Starovoitov
On Tue, Aug 28, 2018 at 01:23:38PM +0200, Olivier Brunel wrote:
> On Mon, 27 Aug 2018 20:35:02 -0700
> Alexei Starovoitov  wrote:
> 
> > I'm also running Arch Linux in my VM, but I'm not able to reproduce
> > umount issue. I'm guessing it's somehow related to non-static build
> > and libc.so being busy with old systemd.
> 
> Oh, I mentioned it in a previous draft of my original mail but it
> seems it got lost in rewrites, I don't actually use systemd. Not that
> it should matter here though.
> 
> 
> > Typical shutdown should have done:
> > [   73.498022] shutdown[1]: Sending SIGTERM to remaining processes...
> > [   73.505501] shutdown[1]: Sending SIGKILL to remaining processes...
> > [   73.512783] shutdown[1]: Unmounting file systems.
> > And at the time of umount / no processes are alive other than systemd.
> 
> Yeah, I have a similar thing happening on shutdown, except that we're
> talking about a kernel thread here, so that process is ignored by the
> mentioned killing spree as a result, thus leaving that process running.

it's not a kernel thread and sounds like there is a bug in your pid 1
that is worth fixing.



Re: [PATCH bpf] bpf: fix several offset tests in bpf_msg_pull_data

2018-08-28 Thread Alexei Starovoitov
On Tue, Aug 28, 2018 at 04:15:35PM +0200, Daniel Borkmann wrote:
> While recently going over bpf_msg_pull_data(), I noticed three
> issues which are fixed here:
> 
> 1) When we attempt to find the first scatterlist element (sge)
>    for the start offset, we add len to the offset before we check
>    for start < offset + len, whereas it should come after, when
>    we iterate to the next sge to accumulate the offsets. For
>    example, given a start offset of 12 and an sge length of 8
>    for the first sge in the list, we would determine this sge to
>    be the first sge, thinking it covers the first 16 bytes where
>    start is located, whereas start sits in a subsequent sge, so
>    we would end up pulling in the wrong data.
> 
> 2) After figuring out the starting sge, we have a short-cut test
>    in !msg->sg_copy[i] && bytes <= len. This checks whether we
>    can skip making the page at the sge private and just exit by
>    updating msg->data and msg->data_end. However, the length
>    test is not fully correct. bytes <= len checks whether the
>    requested bytes (end - start offsets) fit into the sge's
>    length. What is missing is that start need not be aligned to
>    the sge length, so the start offset into the sge must be
>    accounted for on top of the requested bytes, as otherwise we
>    can access the sge out of bounds. For example, the sge could
>    have a length of 8 and the request a length of 8, but at a
>    start offset of 4, so we would also need to pull in 4 bytes of
>    the next sge. When we jump to the out label, we set msg->data
>    to sg_virt(&sg[i]) + start - offset and msg->data_end to
>    msg->data + bytes, which would be out of bounds.
> 
> 3) The subsequent bytes < copy test for finding the last sge has
>    the same issue as in point 2), but it also tests for less than
>    rather than less than or equal to. That is, if the sge length
>    is 8 and the requested bytes are 8, with start aligned to the
>    sge, we would unnecessarily pull in the next sge as well to
>    make it private.
> 
> Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
> Signed-off-by: Daniel Borkmann 
> Acked-by: John Fastabend 

Applied to bpf tree, Thanks
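
The offset arithmetic in points 1) to 3) can be checked with a small userspace model. This is a sketch, assuming the scatterlist is reduced to an array of entry lengths; `first_sge()`, `first_sge_buggy()` and `fits_in_sge()` are hypothetical helpers, not the code in bpf_msg_pull_data().

```c
#include <stdbool.h>
#include <stddef.h>

/* Fixed ordering: test whether start falls inside entry i before adding
 * entry i's length to the running offset. Returns the index of the entry
 * containing byte `start` (its starting offset goes to *off), or -1. */
static int first_sge(const unsigned int *lens, size_t n, unsigned int start,
		     unsigned int *off)
{
	unsigned int offset = 0;

	for (size_t i = 0; i < n; i++) {
		if (start < offset + lens[i]) {
			*off = offset;
			return (int)i;
		}
		offset += lens[i];
	}
	return -1;
}

/* Pre-fix ordering from point 1): accumulate first, then test, so each
 * entry appears to extend len bytes past its real end. */
static int first_sge_buggy(const unsigned int *lens, size_t n,
			   unsigned int start)
{
	unsigned int offset = 0;

	for (size_t i = 0; i < n; i++) {
		offset += lens[i];
		if (start < offset + lens[i])
			return (int)i;
	}
	return -1;
}

/* Points 2) and 3): a request of `bytes` at `start` fits in an entry that
 * begins at `off` with length `len` only if the position inside the entry
 * plus the request length stays within it (note: <=, not <). */
static bool fits_in_sge(unsigned int start, unsigned int off,
			unsigned int len, unsigned int bytes)
{
	return (start - off) + bytes <= len;
}
```

With three 8-byte entries and a start of 12, the fixed search returns entry 1 (starting at offset 8), while the pre-fix ordering stops at entry 0. An 8-byte request at offset 4 into an 8-byte entry does not fit, while an aligned 8-byte request does.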



[PATCH net-next v1] selftests/tls: Add test for recv(PEEK) spanning across multiple records

2018-08-28 Thread Vakul Garg
Add a test case that receives multiple records with a single recvmsg()
operation with MSG_PEEK set.
---
 tools/testing/selftests/net/tls.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index b3ebf2646e52..07daff076ce0 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -502,6 +502,28 @@ TEST_F(tls, recv_peek_multiple)
 	EXPECT_EQ(memcmp(test_str, buf, send_len), 0);
 }
 
+TEST_F(tls, recv_peek_large_buf_mult_recs)
+{
+	char const *test_str = "test_read_peek_mult_recs";
+	char const *test_str_first = "test_read_peek";
+	char const *test_str_second = "_mult_recs";
+	int len;
+	char buf[64];
+
+	len = strlen(test_str_first);
+	EXPECT_EQ(send(self->fd, test_str_first, len, 0), len);
+
+	len = strlen(test_str_second) + 1;
+	EXPECT_EQ(send(self->fd, test_str_second, len, 0), len);
+
+	len = sizeof(buf);
+	memset(buf, 0, len);
+	EXPECT_NE(recv(self->cfd, buf, len, MSG_PEEK), -1);
+
+	len = strlen(test_str) + 1;
+	EXPECT_EQ(memcmp(test_str, buf, len), 0);
+}
+
 TEST_F(tls, pollin)
 {
 	char const *test_str = "test_poll";
-- 
2.13.6



[PATCH net-next v2] net/tls: Add support for async decryption of tls records

2018-08-28 Thread Vakul Garg
When tls records are decrypted using asynchronous accelerators such as
the NXP CAAM engine, the crypto APIs return -EINPROGRESS. Presently, on
getting -EINPROGRESS, tls record processing stops until the crypto
accelerator finishes and returns the result. This incurs a context
switch and is not an efficient way of using the crypto accelerators.
Crypto accelerators work efficiently when they are queued with multiple
crypto jobs without having to wait for the previous ones to complete.

The patch submits multiple crypto requests without waiting for the
previous ones to complete. This has been implemented for records which
are decrypted in zero-copy mode. At the end of recvmsg(), we wait for
all the asynchronous decryption requests to complete.

The references to records which have been sent for async decryption are
dropped. For cases where record decryption is not possible in zero-copy
mode, asynchronous decryption is not used and we wait for the
decryption crypto API to complete.

For crypto requests executing asynchronously, the memory for the
aead_request, sglists, skb, etc. is freed from the decryption completion
handler. The decryption completion handler wakes up the sleeping user
context once recvmsg() has flagged that it has finished sending all the
decryption requests and no more decryption requests are pending.

Signed-off-by: Vakul Garg 
Reviewed-by: Dave Watson 
---

Changes since v1:
- Simplified recvmsg() to drop the reference to the skb when it was
  submitted for async decryption.
- Modified tls_sw_advance_skb() to handle the case when the input skb
  is NULL.

 include/net/tls.h |   6 +++
 net/tls/tls_sw.c  | 134 --
 2 files changed, 127 insertions(+), 13 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d5c683e8bb22..cd0a65bd92f9 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,6 +124,12 @@ struct tls_sw_context_rx {
 	struct sk_buff *recv_pkt;
 	u8 control;
 	bool decrypted;
+	atomic_t decrypt_pending;
+	bool async_notify;
+};
+
+struct decrypt_req_ctx {
+	struct sock *sk;
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 52fbe727d7c1..9503e5a4c27e 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -43,12 +43,50 @@
 
 #define MAX_IV_SIZE		TLS_CIPHER_AES_GCM_128_IV_SIZE
 
+static void tls_decrypt_done(struct crypto_async_request *req, int err)
+{
+	struct aead_request *aead_req = (struct aead_request *)req;
+	struct decrypt_req_ctx *req_ctx =
+		(struct decrypt_req_ctx *)(aead_req + 1);
+
+	struct scatterlist *sgout = aead_req->dst;
+
+	struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
+	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	int pending = atomic_dec_return(&ctx->decrypt_pending);
+	struct scatterlist *sg;
+	unsigned int pages;
+
+	/* Propagate if there was an err */
+	if (err) {
+		ctx->async_wait.err = err;
+		tls_err_abort(req_ctx->sk, err);
+	}
+
+	/* Release the skb, pages and memory allocated for crypto req */
+	kfree_skb(req->data);
+
+	/* Skip the first S/G entry as it points to AAD */
+	for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
+		if (!sg)
+			break;
+		put_page(sg_page(sg));
+	}
+
+	kfree(aead_req);
+
+	if (!pending && READ_ONCE(ctx->async_notify))
+		complete(&ctx->async_wait.completion);
+}
+
 static int tls_do_decryption(struct sock *sk,
+			     struct sk_buff *skb,
 			     struct scatterlist *sgin,
 			     struct scatterlist *sgout,
 			     char *iv_recv,
 			     size_t data_len,
-			     struct aead_request *aead_req)
+			     struct aead_request *aead_req,
+			     bool async)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -59,10 +97,34 @@ static int tls_do_decryption(struct sock *sk,
 	aead_request_set_crypt(aead_req, sgin, sgout,
 			       data_len + tls_ctx->rx.tag_size,
 			       (u8 *)iv_recv);
-	aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-				  crypto_req_done, &ctx->async_wait);
 
-	ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
+	if (async) {
+		struct decrypt_req_ctx *req_ctx;
+
+		req_ctx = (struct decrypt_req_ctx *)(aead_req + 1);
+		req_ctx->sk = sk;
+
+		aead_request_set_callback(aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG,
+

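
The submit-many, wait-once scheme described in this patch can be modeled in userspace C. This is a minimal sketch, assuming POSIX threads stand in for the accelerator's completion context; `rx_model`, `completion_model`, `decrypt_done()` and `recv_records()` are hypothetical names. Instead of the patch's `async_notify` flag, this sketch swaps in a different technique: the submitter holds one extra "bias" count on `decrypt_pending`, so the counter cannot reach zero until submission has finished, and only the request that drops it to zero needs to wake the waiter.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_RECORDS 64

/* Minimal one-shot completion built from a mutex and a condition variable. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_signal(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Sleep until signalled, then consume the signal so it can be reused. */
static void completion_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	c->done = false;
	pthread_mutex_unlock(&c->lock);
}

struct rx_model {
	atomic_int decrypt_pending;  /* in-flight requests + 1 submitter bias */
	atomic_int records_done;     /* stand-in for decrypted records */
	struct completion_model async_done;
};

static void rx_model_init(struct rx_model *ctx)
{
	atomic_init(&ctx->decrypt_pending, 1);
	atomic_init(&ctx->records_done, 0);
	pthread_mutex_init(&ctx->async_done.lock, NULL);
	pthread_cond_init(&ctx->async_done.cond, NULL);
	ctx->async_done.done = false;
}

/* Plays the role of the completion callback: one call per request. */
static void *decrypt_done(void *arg)
{
	struct rx_model *ctx = arg;

	atomic_fetch_add(&ctx->records_done, 1);
	/* Reaching zero means the bias is gone and this was the last request. */
	if (atomic_fetch_sub(&ctx->decrypt_pending, 1) == 1)
		completion_signal(&ctx->async_done);
	return NULL;
}

/* Plays the role of recvmsg(): queue every record, then wait once. */
static int recv_records(struct rx_model *ctx, int nrec)
{
	pthread_t tid[MAX_RECORDS];
	bool started[MAX_RECORDS];
	int done;

	if (nrec < 0 || nrec > MAX_RECORDS)
		return -1;

	for (int i = 0; i < nrec; i++) {
		atomic_fetch_add(&ctx->decrypt_pending, 1);
		started[i] = pthread_create(&tid[i], NULL, decrypt_done, ctx) == 0;
		if (!started[i])
			decrypt_done(ctx);  /* fall back to synchronous completion */
	}

	/* Drop the bias; sleep only if requests are still in flight. */
	if (atomic_fetch_sub(&ctx->decrypt_pending, 1) != 1)
		completion_wait(&ctx->async_done);
	done = atomic_load(&ctx->records_done);

	for (int i = 0; i < nrec; i++)
		if (started[i])
			pthread_join(tid[i], NULL);  /* demo thread cleanup only */
	atomic_store(&ctx->decrypt_pending, 1);  /* restore the bias */
	return done;
}
```

Because every request bumps `records_done` before dropping its count, the value read right after the wait already reflects all records queued in that call. The joins exist only to clean up the demo threads.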
Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Nicholas Piggin
On Tue, 28 Aug 2018 18:09:09 +0200
Ard Biesheuvel  wrote:

> On 28 August 2018 at 15:56, Ard Biesheuvel  wrote:
> > Hello Andreas, Nick,
> >
> > On 28 August 2018 at 06:06, Nicholas Piggin  
> > wrote:  
> >> On Mon, 27 Aug 2018 19:11:01 +0200
> >> Andreas Schwab  wrote:
> >>  
> >>> I'm getting this Oops when running iptables -F OUTPUT:
> >>>
> >>> [   91.139409] Unable to handle kernel paging request for data at address 
> >>> 0xd001fff12f34
> >>> [   91.139414] Faulting instruction address: 0xd16a5718
> >>> [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> >>> [   91.139426] BE SMP NR_CPUS=2 PowerMac
> >>> [   91.139434] Modules linked in: iptable_filter ip_tables x_tables 
> >>> bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet 
> >>> snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus 
> >>> snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device 
> >>> snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore 
> >>> firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod 
> >>> ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot 
> >>> dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
> >>> [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> >>> [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
> >>> c06f560c
> >>> [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  
> >>> (4.19.0-rc1)
> >>> [   91.139534] MSR:  9200b032   CR: 
> >>> 84002484  XER: 2000
> >>> [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0
> >>> GPR00: d16a569c c001fa5778f0 d16b0400 
> >>> GPR04: 0002  8001fa46418e c001fa0d05c8
> >>> GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8
> >>> GPR12: c06f560c c780  
> >>> GPR16: 11635010 3fffa1b7aa68  
> >>> GPR20: 0003 10013918 116350c0 c0b88990
> >>> GPR24: c0b88ba4  d001fff12f34 
> >>> GPR28: d16b8000 c001fa20f400 c001fa20f440 
> >>> [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
> >>> [ip_tables]
> >>> [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
> >>> [ip_tables]
> >>> [   91.139638] Call Trace:
> >>> [   91.139645] [c001fa5778f0] [d16a569c] 
> >>> .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> >>> [   91.139655] [c001fa5779b0] [d16a5b54] 
> >>> .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> >>> [   91.139666] [c001fa577aa0] [c06233e0] 
> >>> .nf_getsockopt+0x68/0x88
> >>> [   91.139674] [c001fa577b40] [c0631608] 
> >>> .ip_getsockopt+0xbc/0x128
> >>> [   91.139682] [c001fa577bf0] [c065adf4] 
> >>> .raw_getsockopt+0x18/0x5c
> >>> [   91.139690] [c001fa577c60] [c05b5f60] 
> >>> .sock_common_getsockopt+0x2c/0x40
> >>> [   91.139697] [c001fa577cd0] [c05b3394] 
> >>> .__sys_getsockopt+0xa4/0xd0
> >>> [   91.139704] [c001fa577d80] [c05b5ab0] 
> >>> .__se_sys_socketcall+0x238/0x2b4
> >>> [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
> >>> [   91.139716] Instruction dump:
> >>> [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
> >>> 419d000c 393e0060
> >>> [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
> >>> 41e20010 7c210b78
> >>> [   91.139752] ---[ end trace f5d1d5431651845d ]---  
> >>
> >> This is due to 7290d58095 ("module: use relative references for
> >> __ksymtab entries"). This part of kernel/module.c -
> >>
> >>/* Divert to percpu allocation if a percpu var. */
> >>if (sym[i].st_shndx == info->index.pcpu)
> >>secbase = (unsigned long)mod_percpu(mod);
> >>else
> >>secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
> >>sym[i].st_value += secbase;
> >>
> >> Causes the distance to the target to exceed 32-bits on powerpc, so
> >> it doesn't fit in a rel32 reloc. Not sure how other archs cope.
> >>  
> >
> > Apologies for the breakage. It does indeed appear to affect all
> > architectures, and I'm a bit puzzled why you are the first one to spot
> > it.
> >
> > I will try to find a clean way to special case the per-CPU variable
> > __ksymtab references in the generic module code, and if that is too
> > cumbersome, we can switch to 64-bit relative references (or rather,
> > native word size relative references) instead. Or revert the whole
> > thing ...  
> 
> OK, after a bit of digging, and confirming that the arm64
> implementation works as expected (its module loader actually detects
> overflows of the 32-bit place relative relocations, so the problem
> definitely does not occur there), I think I found the explanation why
>

Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Nicholas Piggin
On Wed, 29 Aug 2018 13:28:27 +1000
Nicholas Piggin  wrote:

> On Tue, 28 Aug 2018 14:06:32 +1000
> Nicholas Piggin  wrote:
> 
> > On Mon, 27 Aug 2018 19:11:01 +0200
> > Andreas Schwab  wrote:
> >   
> > > I'm getting this Oops when running iptables -F OUTPUT:
> > > 
> > > [   91.139409] Unable to handle kernel paging request for data at address 
> > > 0xd001fff12f34
> > > [   91.139414] Faulting instruction address: 0xd16a5718
> > > [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [   91.139426] BE SMP NR_CPUS=2 PowerMac
> > > [   91.139434] Modules linked in: iptable_filter ip_tables x_tables 
> > > bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet 
> > > snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus 
> > > snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device 
> > > snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore 
> > > firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod 
> > > ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot 
> > > dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
> > > [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> > > [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
> > > c06f560c
> > > [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  
> > > (4.19.0-rc1)
> > > [   91.139534] MSR:  9200b032   CR: 
> > > 84002484  XER: 2000
> > > [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0 
> > > GPR00: d16a569c c001fa5778f0 d16b0400 
> > > GPR04: 0002  8001fa46418e c001fa0d05c8 
> > > GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8 
> > > GPR12: c06f560c c780  
> > > GPR16: 11635010 3fffa1b7aa68  
> > > GPR20: 0003 10013918 116350c0 c0b88990 
> > > GPR24: c0b88ba4  d001fff12f34 
> > > GPR28: d16b8000 c001fa20f400 c001fa20f440 
> > > [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
> > > [ip_tables]
> > > [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
> > > [ip_tables]
> > > [   91.139638] Call Trace:
> > > [   91.139645] [c001fa5778f0] [d16a569c] 
> > > .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> > > [   91.139655] [c001fa5779b0] [d16a5b54] 
> > > .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> > > [   91.139666] [c001fa577aa0] [c06233e0] 
> > > .nf_getsockopt+0x68/0x88
> > > [   91.139674] [c001fa577b40] [c0631608] 
> > > .ip_getsockopt+0xbc/0x128
> > > [   91.139682] [c001fa577bf0] [c065adf4] 
> > > .raw_getsockopt+0x18/0x5c
> > > [   91.139690] [c001fa577c60] [c05b5f60] 
> > > .sock_common_getsockopt+0x2c/0x40
> > > [   91.139697] [c001fa577cd0] [c05b3394] 
> > > .__sys_getsockopt+0xa4/0xd0
> > > [   91.139704] [c001fa577d80] [c05b5ab0] 
> > > .__se_sys_socketcall+0x238/0x2b4
> > > [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
> > > [   91.139716] Instruction dump:
> > > [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
> > > 419d000c 393e0060 
> > > [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
> > > 41e20010 7c210b78 
> > > [   91.139752] ---[ end trace f5d1d5431651845d ]---
> > 
> > This is due to 7290d58095 ("module: use relative references for
> > __ksymtab entries"). This part of kernel/module.c -
> > 
> >/* Divert to percpu allocation if a percpu var. */
> >if (sym[i].st_shndx == info->index.pcpu)
> >secbase = (unsigned long)mod_percpu(mod);
> >else
> >secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
> >sym[i].st_value += secbase;
> > 
> > Causes the distance to the target to exceed 32-bits on powerpc, so
> > it doesn't fit in a rel32 reloc. Not sure how other archs cope.  
> 
> Any progress on this one? I had a bit of a look but can't see a really
> trivial fix and don't have a lot of time to work on it. Maybe use 64
> bit relative offsets for per-cpu exports, or better might be apply the
> per-cpu fixup when linking against the symbol rather than when writing
> the module symbol table.
> 
> Until then I'd like to just remove HAVE_ARCH_PREL32_RELOCATIONS from
> powerpc/Kconfig, but if other archs are going to have issues too, we
> could just revert
> 
> 271ca788774aa ("arch: enable relative relocations for arm64, power and x86")
> 
> arm64, x86 -- can the distance between your module percpu data link
> location -> module percpu runtime allocation location exceed 31 bits?

[Sorry ignore this, I missed some mail, will reply

[PATCH][net-next] vxlan: reduce dirty cache line in vxlan_find_mac

2018-08-28 Thread Li RongQing
vxlan_find_mac() unconditionally sets f->used for every packet. This
causes a cache miss for every packet, since the remote, hlist and used
fields of vxlan_fdb share the same cache line, which is accessed when
sending every packet.

So set f->used only if it is not already equal to jiffies, to reduce
the number of times the cache line is dirtied. This gives a 3% speed-up
with small packets.

Signed-off-by: Zhang Yu 
Signed-off-by: Li RongQing 
---
 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ababba37d735..e5d236595206 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -464,7 +464,7 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 	struct vxlan_fdb *f;
 
 	f = __vxlan_find_mac(vxlan, mac, vni);
-	if (f)
+	if (f && f->used != jiffies)
 		f->used = jiffies;
 
 	return f;
-- 
2.16.2
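
The check-before-store idea can be modeled in plain C. This is a minimal sketch, assuming a simplified `fdb_entry` with only the `used` field and a counter standing in for cache-line dirtying; `touch_always()`, `touch_if_changed()` and `simulate()` are hypothetical names. It counts stores rather than measuring cache behavior, so it says nothing about the 3% figure itself.

```c
/* Hypothetical stand-in for struct vxlan_fdb: only the field being written. */
struct fdb_entry {
	unsigned long used;
};

/* Counts stores to f->used, i.e. how often the cache line would be dirtied. */
static unsigned long stores;

/* Before the patch: store on every lookup. */
static void touch_always(struct fdb_entry *f, unsigned long now)
{
	f->used = now;
	stores++;
}

/* After the patch: read first, and store only when the value changes. */
static void touch_if_changed(struct fdb_entry *f, unsigned long now)
{
	if (f->used != now) {
		f->used = now;
		stores++;
	}
}

/* Simulate npkts lookups spread evenly over nticks "jiffies" (starting at 1). */
static unsigned long simulate(void (*touch)(struct fdb_entry *, unsigned long),
			      unsigned long npkts, unsigned long nticks)
{
	struct fdb_entry f = { .used = 0 };

	stores = 0;
	for (unsigned long i = 0; i < npkts; i++)
		touch(&f, i * nticks / npkts + 1);
	return stores;
}
```

With 1000 packets spread over 10 jiffies, `touch_always()` performs 1000 stores while `touch_if_changed()` performs 10, one per tick.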



Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Nicholas Piggin
On Tue, 28 Aug 2018 14:06:32 +1000
Nicholas Piggin  wrote:

> On Mon, 27 Aug 2018 19:11:01 +0200
> Andreas Schwab  wrote:
> 
> > I'm getting this Oops when running iptables -F OUTPUT:
> > 
> > [   91.139409] Unable to handle kernel paging request for data at address 
> > 0xd001fff12f34
> > [   91.139414] Faulting instruction address: 0xd16a5718
> > [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> > [   91.139426] BE SMP NR_CPUS=2 PowerMac
> > [   91.139434] Modules linked in: iptable_filter ip_tables x_tables 
> > bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet 
> > snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus 
> > snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device 
> > snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore 
> > firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod 
> > ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot dm_bufio 
> > dm_mirror dm_region_hash dm_log dm_mod sata_svw
> > [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> > [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
> > c06f560c
> > [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  (4.19.0-rc1)
> > [   91.139534] MSR:  9200b032   CR: 
> > 84002484  XER: 2000
> > [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0 
> > GPR00: d16a569c c001fa5778f0 d16b0400  
> > GPR04: 0002  8001fa46418e c001fa0d05c8 
> > GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8 
> > GPR12: c06f560c c780   
> > GPR16: 11635010 3fffa1b7aa68   
> > GPR20: 0003 10013918 116350c0 c0b88990 
> > GPR24: c0b88ba4  d001fff12f34  
> > GPR28: d16b8000 c001fa20f400 c001fa20f440  
> > [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
> > [ip_tables]
> > [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
> > [ip_tables]
> > [   91.139638] Call Trace:
> > [   91.139645] [c001fa5778f0] [d16a569c] 
> > .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> > [   91.139655] [c001fa5779b0] [d16a5b54] 
> > .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> > [   91.139666] [c001fa577aa0] [c06233e0] 
> > .nf_getsockopt+0x68/0x88
> > [   91.139674] [c001fa577b40] [c0631608] 
> > .ip_getsockopt+0xbc/0x128
> > [   91.139682] [c001fa577bf0] [c065adf4] 
> > .raw_getsockopt+0x18/0x5c
> > [   91.139690] [c001fa577c60] [c05b5f60] 
> > .sock_common_getsockopt+0x2c/0x40
> > [   91.139697] [c001fa577cd0] [c05b3394] 
> > .__sys_getsockopt+0xa4/0xd0
> > [   91.139704] [c001fa577d80] [c05b5ab0] 
> > .__se_sys_socketcall+0x238/0x2b4
> > [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
> > [   91.139716] Instruction dump:
> > [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
> > 419d000c 393e0060 
> > [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
> > 41e20010 7c210b78 
> > [   91.139752] ---[ end trace f5d1d5431651845d ]---  
> 
> This is due to 7290d58095 ("module: use relative references for
> __ksymtab entries"). This part of kernel/module.c -
> 
>/* Divert to percpu allocation if a percpu var. */
>if (sym[i].st_shndx == info->index.pcpu)
>secbase = (unsigned long)mod_percpu(mod);
>else
>secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
>sym[i].st_value += secbase;
> 
> Causes the distance to the target to exceed 32-bits on powerpc, so
> it doesn't fit in a rel32 reloc. Not sure how other archs cope.

Any progress on this one? I had a bit of a look but can't see a really
trivial fix and don't have a lot of time to work on it. Maybe use 64
bit relative offsets for per-cpu exports, or better might be apply the
per-cpu fixup when linking against the symbol rather than when writing
the module symbol table.

Until then I'd like to just remove HAVE_ARCH_PREL32_RELOCATIONS from
powerpc/Kconfig, but if other archs are going to have issues too, we
could just revert

271ca788774aa ("arch: enable relative relocations for arm64, power and x86")

arm64, x86 -- can the distance between your module percpu data link
location -> module percpu runtime allocation location exceed 31 bits?

Thanks,
Nick
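
The overflow described above can be checked mechanically. This is a minimal sketch, assuming a place-relative 32-bit reference stores `target - place` in an `int32_t`; `fits_prel32()`, `prel32_to_addr()` and `prel32_truncate()` are hypothetical helpers, and the example addresses in the test are illustrative, loosely modeled on the `c...`/`d...` prefixes visible in the oops. It shows the kind of range check that catches the problem (Ard notes the arm64 module loader detects such overflows) and what goes wrong when the offset is silently truncated instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* A place-relative 32-bit reference stores (target - place) in an int32_t.
 * It is only usable if the 64-bit difference fits in that signed range. */
static bool fits_prel32(uint64_t place, uint64_t target)
{
	int64_t delta = (int64_t)(target - place);

	return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Resolve a stored 32-bit offset back into an absolute address. */
static uint64_t prel32_to_addr(uint64_t place, int32_t off)
{
	return place + (uint64_t)(int64_t)off;
}

/* What happens without a range check: only the low 32 bits survive. */
static int32_t prel32_truncate(uint64_t place, uint64_t target)
{
	return (int32_t)(uint32_t)(target - place);
}
```

A nearby target round-trips correctly, while a target in a distant region fails the range check and, if truncated anyway, resolves to an unrelated address near the place instead.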


[PATCH] neighbour: confirm neigh entries when ARP packet is received

2018-08-28 Thread Vasily Khoruzhick
Update the 'confirmed' timestamp when an ARP packet is received. This
shouldn't affect the locktime logic, and the entry can be confirmed by
any higher-layer protocol anyway. Thus it makes no sense not to confirm
it when an ARP packet is received.

Fixes: 77d7123342 ("neighbour: update neigh timestamps iff update is effective")

Signed-off-by: Vasily Khoruzhick 
---
 net/core/neighbour.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index aa19d86937af..901418ef70ea 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1180,6 +1180,9 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		lladdr = neigh->ha;
 	}
 
+	if (new & NUD_CONNECTED)
+		neigh->confirmed = jiffies;
+
 	/* If entry was valid and address is not changed,
 	   do not change entry state, if new one is STALE.
 	 */
@@ -1205,11 +1208,8 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 	 * neighbour entry. Otherwise we risk to move the locktime window with
 	 * noop updates and ignore relevant ARP updates.
 	 */
-	if (new != old || lladdr != neigh->ha) {
-		if (new & NUD_CONNECTED)
-			neigh->confirmed = jiffies;
+	if (new != old || lladdr != neigh->ha)
 		neigh->updated = jiffies;
-	}
 
 	if (new != old) {
 		neigh_del_timer(neigh);
-- 
2.18.0
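
The before/after timestamp logic can be compared in a small model. This is a sketch, assuming a stripped-down `neigh_model` in place of struct neighbour and an integer in place of the hardware address; `update_before()` and `update_after()` are hypothetical names, and the NUD_* values mirror the kernel's neighbour state flags. A no-op update to a connected state now refreshes `confirmed` while leaving `updated`, which the patch's comment ties to the locktime window, untouched.

```c
/* NUD_* values as in the kernel's neighbour state flags. */
#define NUD_REACHABLE 0x02
#define NUD_STALE     0x04
#define NUD_NOARP     0x40
#define NUD_PERMANENT 0x80
#define NUD_CONNECTED (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE)

/* Hypothetical stand-in for struct neighbour. */
struct neigh_model {
	unsigned char nud_state;
	unsigned long confirmed;
	unsigned long updated;
	int lladdr;  /* stand-in for the hardware address */
};

/* Before the patch: both timestamps move only on an effective update. */
static void update_before(struct neigh_model *n, unsigned char new,
			  int lladdr, unsigned long now)
{
	if (new != n->nud_state || lladdr != n->lladdr) {
		if (new & NUD_CONNECTED)
			n->confirmed = now;
		n->updated = now;
	}
	n->nud_state = new;
	n->lladdr = lladdr;
}

/* After the patch: any update into a connected state confirms the entry,
 * while `updated` still requires a real change. */
static void update_after(struct neigh_model *n, unsigned char new,
			 int lladdr, unsigned long now)
{
	if (new & NUD_CONNECTED)
		n->confirmed = now;
	if (new != n->nud_state || lladdr != n->lladdr)
		n->updated = now;
	n->nud_state = new;
	n->lladdr = lladdr;
}
```

Repeating a REACHABLE update with the same address leaves both timestamps alone in the old logic, but refreshes only `confirmed` in the new one.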



[PATCH net-next 1/4] liquidio: improve soft command handling

2018-08-28 Thread Felix Manlunas
1. Set LIO_SC_MAX_TMO_MS as the maximum timeout value for a soft command
   (sc). All sc's use this value as a hard timeout value. Add expiry_time
   to struct octeon_soft_command to keep the hard timeout value. The
   fields wait_time and timeout in struct octeon_soft_command will be
   made obsolete in the last patch of this series.
2. Add processing of synchronous sc's to the sc response thread
   lio_process_ordered_list(). The memory allocated for a synchronous sc
   will be freed back to the sc pool by lio_process_ordered_list().
3. Add two response lists for lio_process_ordered_list() to process the
   storage allocated for sc's:
   The OCTEON_DONE_SC_LIST response list keeps all sc's which will be
   freed to the pool after their requestors have finished processing the
   responses.
   The OCTEON_ZOMBIE_SC_LIST response list keeps all sc's which have hit
   the LIO_SC_MAX_TMO_MS timeout.
   When an sc hits the hard timeout, lio_process_ordered_list() will
   recheck its status 1 ms later. If the status has not been updated by
   the firmware by then, the sc will be moved from the
   OCTEON_DONE_SC_LIST response list to the OCTEON_ZOMBIE_SC_LIST
   response list. The sc's in the OCTEON_ZOMBIE_SC_LIST response list
   will be freed when the driver is unloaded.

Signed-off-by: Weilin Chang 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  31 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |  34 +-
 .../net/ethernet/cavium/liquidio/octeon_config.h   |   2 +-
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |  11 ++
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |   3 +-
 .../net/ethernet/cavium/liquidio/request_manager.c | 114 +++--
 .../ethernet/cavium/liquidio/response_manager.c|  82 +--
 .../ethernet/cavium/liquidio/response_manager.h|   4 +-
 8 files changed, 232 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 6fb13fa..6663749 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -1037,12 +1037,12 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 
 		/* fallthrough */
 	case OCT_DEV_IO_QUEUES_DONE:
-		if (wait_for_pending_requests(oct))
-			dev_err(&oct->pci_dev->dev, "There were pending requests\n");
-
 		if (lio_wait_for_instr_fetch(oct))
 			dev_err(&oct->pci_dev->dev, "IQ had pending instructions\n");
 
+		if (wait_for_pending_requests(oct))
+			dev_err(&oct->pci_dev->dev, "There were pending requests\n");
+
 		/* Disable the input and output queues now. No more packets will
 		 * arrive from Octeon, but we should wait for all packet
 		 * processing to finish.
@@ -1052,6 +1052,31 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 		if (lio_wait_for_oq_pkts(oct))
 			dev_err(&oct->pci_dev->dev, "OQ had pending packets\n");
 
+		/* Force all requests waiting to be fetched by OCTEON to
+		 * complete.
+		 */
+		for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct); i++) {
+			struct octeon_instr_queue *iq;
+
+			if (!(oct->io_qmask.iq & BIT_ULL(i)))
+				continue;
+			iq = oct->instr_queue[i];
+
+			if (atomic_read(&iq->instr_pending)) {
+				spin_lock_bh(&iq->lock);
+				iq->fill_cnt = 0;
+				iq->octeon_read_index = iq->host_write_index;
+				iq->stats.instr_processed +=
+					atomic_read(&iq->instr_pending);
+				lio_process_iq_request_list(oct, iq, 0);
+				spin_unlock_bh(&iq->lock);
+			}
+		}
+
+		lio_process_ordered_list(oct, 1);
+		octeon_free_sc_done_list(oct);
+		octeon_free_sc_zombie_list(oct);
+
 		/* fallthrough */
 	case OCT_DEV_INTR_SET_DONE:
 		/* Disable interrupts  */
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index b778357..59c2dd9 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -471,12 +471,12 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 	case OCT_DEV_HOST_OK:
 		/* fallthrough */
 	case OCT_DEV_IO_QUEUES_DONE:
-		if (wait_for_pending_requests(oct))
-			dev_err(&oct->pci_dev->dev, "There were pending requests\n");
-
 		if (lio_wait_for_instr_fetch(oct))
[PATCH net-next 2/4] liquidio: make soft command calls synchronous

2018-08-28 Thread Felix Manlunas
1. Add wait_for_sc_completion_timeout() for waiting for the response and
   handling common response errors.
2. Send sc's synchronously: remove the unused callback function and
   context structure; use wait_for_sc_completion_timeout() to wait for
   the response.

Signed-off-by: Weilin Chang 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c| 134 ++---
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  42 ++-
 drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c  |  42 ++-
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |  66 ++
 .../net/ethernet/cavium/liquidio/octeon_network.h  |   6 -
 5 files changed, 129 insertions(+), 161 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c 
b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 8093c5e..822ce0f 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -1333,8 +1333,6 @@ octnet_nic_stats_callback(struct octeon_device *oct_dev,
struct octeon_soft_command *sc = (struct octeon_soft_command *)ptr;
struct oct_nic_stats_resp *resp =
(struct oct_nic_stats_resp *)sc->virtrptr;
-   struct oct_nic_stats_ctrl *ctrl =
-   (struct oct_nic_stats_ctrl *)sc->ctxptr;
struct nic_rx_stats *rsp_rstats = &resp->stats.fromwire;
struct nic_tx_stats *rsp_tstats = &resp->stats.fromhost;
struct nic_rx_stats *rstats = &oct_dev->link_stats.fromwire;
@@ -1424,7 +1422,6 @@ octnet_nic_stats_callback(struct octeon_device *oct_dev,
} else {
resp->status = -1;
}
-   complete(&ctrl->complete);
 }
 
 int octnet_get_link_stats(struct net_device *netdev)
@@ -1432,7 +1429,6 @@ int octnet_get_link_stats(struct net_device *netdev)
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct_dev = lio->oct_dev;
struct octeon_soft_command *sc;
-   struct oct_nic_stats_ctrl *ctrl;
struct oct_nic_stats_resp *resp;
int retval;
 
@@ -1441,7 +1437,7 @@ int octnet_get_link_stats(struct net_device *netdev)
octeon_alloc_soft_command(oct_dev,
  0,
  sizeof(struct oct_nic_stats_resp),
- sizeof(struct octnic_ctrl_pkt));
+ 0);
 
if (!sc)
return -ENOMEM;
@@ -1449,66 +1445,39 @@ int octnet_get_link_stats(struct net_device *netdev)
resp = (struct oct_nic_stats_resp *)sc->virtrptr;
memset(resp, 0, sizeof(struct oct_nic_stats_resp));
 
-   ctrl = (struct oct_nic_stats_ctrl *)sc->ctxptr;
-   memset(ctrl, 0, sizeof(struct oct_nic_stats_ctrl));
-   ctrl->netdev = netdev;
-   init_completion(&ctrl->complete);
+   init_completion(&sc->complete);
+   sc->sc_status = OCTEON_REQUEST_PENDING;
 
sc->iq_no = lio->linfo.txpciq[0].s.q_no;
 
octeon_prepare_soft_command(oct_dev, sc, OPCODE_NIC,
OPCODE_NIC_PORT_STATS, 0, 0, 0);
 
-   sc->callback = octnet_nic_stats_callback;
-   sc->callback_arg = sc;
-   sc->wait_time = 500;/*in milli seconds*/
-
retval = octeon_send_soft_command(oct_dev, sc);
if (retval == IQ_SEND_FAILED) {
octeon_free_soft_command(oct_dev, sc);
return -EINVAL;
}
 
-   wait_for_completion_timeout(&ctrl->complete, msecs_to_jiffies(1000));
-
-   if (resp->status != 1) {
-   octeon_free_soft_command(oct_dev, sc);
-
-   return -EINVAL;
+   retval = wait_for_sc_completion_timeout(oct_dev, sc,
+   (2 * LIO_SC_MAX_TMO_MS));
+   if (retval)  {
+   dev_err(&oct_dev->pci_dev->dev, "sc OPCODE_NIC_PORT_STATS command failed\n");
+   return retval;
}
 
-   octeon_free_soft_command(oct_dev, sc);
+   octnet_nic_stats_callback(oct_dev, sc->sc_status, sc);
+   WRITE_ONCE(sc->caller_is_done, true);
 
return 0;
 }
 
-static void liquidio_nic_seapi_ctl_callback(struct octeon_device *oct,
-   u32 status,
-   void *buf)
-{
-   struct liquidio_nic_seapi_ctl_context *ctx;
-   struct octeon_soft_command *sc = buf;
-
-   ctx = sc->ctxptr;
-
-   oct = lio_get_device(ctx->octeon_id);
-   if (status) {
-   dev_err(&oct->pci_dev->dev, "%s: instruction failed. Status: %llx\n",
-   __func__,
-   CVM_CAST64(status));
-   }
-   ctx->status = status;
-   complete(&ctx->complete);
-}
-
 int liquidio_set_speed(struct lio *lio, int speed)
 {
-   struct liquidio_nic_seapi_ctl_context *ctx;
struct octeon_device *oct = lio->oct_dev;
struct oct_nic_seapi_resp *resp;
struct octeon_so

[PATCH net-next 3/4] liquidio: change octnic_ctrl_pkt to do synchronous soft commands

2018-08-28 Thread Felix Manlunas
1. Change struct octnic_ctrl_pkt to support synchronous operation.
2. Change the code that uses struct octnic_ctrl_pkt to send sc's
   synchronously.

Signed-off-by: Weilin Chang 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c| 15 ++---
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 21 ---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 69 ++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 38 
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  | 56 --
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  |  9 +--
 6 files changed, 98 insertions(+), 110 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 822ce0f..27b3655 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -198,14 +198,15 @@ int liquidio_set_feature(struct net_device *netdev, int cmd, u16 param1)
nctrl.ncmd.s.cmd = cmd;
nctrl.ncmd.s.param1 = param1;
nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
-   nctrl.wait_time = 100;
nctrl.netpndev = (u64)netdev;
nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
 
ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
-   if (ret < 0) {
+   if (ret) {
dev_err(&oct->pci_dev->dev, "Feature change failed in core (ret: 0x%x)\n",
ret);
+   if (ret > 0)
+   ret = -EIO;
}
return ret;
 }
@@ -285,15 +286,7 @@ void liquidio_link_ctrl_cmd_completion(void *nctrl_ptr)
struct octeon_device *oct = lio->oct_dev;
u8 *mac;
 
-   if (nctrl->completion && nctrl->response_code) {
-   /* Signal whoever is interested that the response code from the
-* firmware has arrived.
-*/
-   WRITE_ONCE(*nctrl->response_code, nctrl->status);
-   complete(nctrl->completion);
-   }
-
-   if (nctrl->status)
+   if (nctrl->sc_status)
return;
 
switch (nctrl->ncmd.s.cmd) {
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index 8e05afd..d374c44 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -472,12 +472,11 @@ lio_send_queue_count_update(struct net_device *netdev, uint32_t num_queues)
nctrl.ncmd.s.param1 = num_queues;
nctrl.ncmd.s.param2 = num_queues;
nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
-   nctrl.wait_time = 100;
nctrl.netpndev = (u64)netdev;
nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
 
ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
-   if (ret < 0) {
+   if (ret) {
dev_err(&oct->pci_dev->dev, "Failed to send Queue reset command (ret: 0x%x)\n",
ret);
return -1;
@@ -708,13 +707,13 @@ static int octnet_gpio_access(struct net_device *netdev, int addr, int val)
nctrl.ncmd.s.param1 = addr;
nctrl.ncmd.s.param2 = val;
nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
-   nctrl.wait_time = 100;
nctrl.netpndev = (u64)netdev;
nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
 
ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
-   if (ret < 0) {
-   dev_err(&oct->pci_dev->dev, "Failed to configure gpio value\n");
+   if (ret) {
+   dev_err(&oct->pci_dev->dev,
+   "Failed to configure gpio value, ret=%d\n", ret);
return -EINVAL;
}
 
@@ -734,13 +733,13 @@ static int octnet_id_active(struct net_device *netdev, int val)
nctrl.ncmd.s.cmd = OCTNET_CMD_ID_ACTIVE;
nctrl.ncmd.s.param1 = val;
nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
-   nctrl.wait_time = 100;
nctrl.netpndev = (u64)netdev;
nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
 
ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
-   if (ret < 0) {
-   dev_err(&oct->pci_dev->dev, "Failed to configure gpio value\n");
+   if (ret) {
+   dev_err(&oct->pci_dev->dev,
+   "Failed to configure gpio value, ret=%d\n", ret);
return -EINVAL;
}
 
@@ -1412,7 +1411,6 @@ lio_set_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause)
nctrl.ncmd.u64 = 0;
nctrl.ncmd.s.cmd = OCTNET_CMD_SET_FLOW_CTL;
nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
-   nctrl.wait_time = 100;
nctrl.netpndev = (u64)netdev;
nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
 
@@ -1433,8 +1431,9 @@ lio_set_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause)
}
 
ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctr

[PATCH net-next 4/4] liquidio: remove obsolete functions and data structures

2018-08-28 Thread Felix Manlunas
1. Remove unused functions and data structures.
2. Send the remaining soft commands synchronously.

Signed-off-by: Weilin Chang 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c|  83 +---
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 235 ++---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 165 +--
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 122 ---
 drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c  |   5 +-
 .../net/ethernet/cavium/liquidio/octeon_config.h   |   1 +
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |   3 -
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |  42 
 .../net/ethernet/cavium/liquidio/octeon_network.h  |  10 -
 9 files changed, 176 insertions(+), 490 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 27b3655..30b4a60 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -32,38 +32,6 @@
 #define OCTNIC_MAX_SG  MAX_SKB_FRAGS
 
 /**
- * \brief Callback for getting interface configuration
- * @param status status of request
- * @param buf pointer to resp structure
- */
-void lio_if_cfg_callback(struct octeon_device *oct,
-u32 status __attribute__((unused)), void *buf)
-{
-   struct octeon_soft_command *sc = (struct octeon_soft_command *)buf;
-   struct liquidio_if_cfg_context *ctx;
-   struct liquidio_if_cfg_resp *resp;
-
-   resp = (struct liquidio_if_cfg_resp *)sc->virtrptr;
-   ctx = (struct liquidio_if_cfg_context *)sc->ctxptr;
-
-   oct = lio_get_device(ctx->octeon_id);
-   if (resp->status)
-   dev_err(&oct->pci_dev->dev, "nic if cfg instruction failed. Status: %llx\n",
-   CVM_CAST64(resp->status));
-   WRITE_ONCE(ctx->cond, 1);
-
-   snprintf(oct->fw_info.liquidio_firmware_version, 32, "%s",
-resp->cfg_info.liquidio_firmware_version);
-
-   /* This barrier is required to be sure that the response has been
-* written fully before waking up the handler
-*/
-   wmb();
-
-   wake_up_interruptible(&ctx->wc);
-}
-
-/**
  * \brief Delete gather lists
  * @param lio per-network private data
  */
@@ -1211,30 +1179,6 @@ int octeon_setup_interrupt(struct octeon_device *oct, u32 num_ioqs)
return 0;
 }
 
-static void liquidio_change_mtu_completion(struct octeon_device *oct,
-  u32 status, void *buf)
-{
-   struct octeon_soft_command *sc = (struct octeon_soft_command *)buf;
-   struct liquidio_if_cfg_context *ctx;
-
-   ctx  = (struct liquidio_if_cfg_context *)sc->ctxptr;
-
-   if (status) {
-   dev_err(&oct->pci_dev->dev, "MTU change failed. Status: %llx\n",
-   CVM_CAST64(status));
-   WRITE_ONCE(ctx->cond, LIO_CHANGE_MTU_FAIL);
-   } else {
-   WRITE_ONCE(ctx->cond, LIO_CHANGE_MTU_SUCCESS);
-   }
-
-   /* This barrier is required to be sure that the response has been
-* written fully before waking up the handler
-*/
-   wmb();
-
-   wake_up_interruptible(&ctx->wc);
-}
-
 /**
  * \brief Net device change_mtu
  * @param netdev network device
@@ -1243,22 +1187,17 @@ int liquidio_change_mtu(struct net_device *netdev, int new_mtu)
 {
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct = lio->oct_dev;
-   struct liquidio_if_cfg_context *ctx;
struct octeon_soft_command *sc;
union octnet_cmd *ncmd;
-   int ctx_size;
int ret = 0;
 
-   ctx_size = sizeof(struct liquidio_if_cfg_context);
sc = (struct octeon_soft_command *)
-   octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE, 16, ctx_size);
+   octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE, 16, 0);
 
ncmd = (union octnet_cmd *)sc->virtdptr;
-   ctx  = (struct liquidio_if_cfg_context *)sc->ctxptr;
 
-   WRITE_ONCE(ctx->cond, 0);
-   ctx->octeon_id = lio_get_device_id(oct);
-   init_waitqueue_head(&ctx->wc);
+   init_completion(&sc->complete);
+   sc->sc_status = OCTEON_REQUEST_PENDING;
 
ncmd->u64 = 0;
ncmd->s.cmd = OCTNET_CMD_CHANGE_MTU;
@@ -1271,28 +1210,28 @@ int liquidio_change_mtu(struct net_device *netdev, int new_mtu)
octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
OPCODE_NIC_CMD, 0, 0, 0);
 
-   sc->callback = liquidio_change_mtu_completion;
-   sc->callback_arg = sc;
-   sc->wait_time = 100;
-
ret = octeon_send_soft_command(oct, sc);
if (ret == IQ_SEND_FAILED) {
netif_info(lio, rx_err, lio->netdev, "Failed to change MTU\n");
+   octeon_free_soft_command(oct, sc);
return -EINVAL;
}
/* Sleep on a wait queue till

[PATCH net-next 0/4] liquidio: improve soft command/response handling

2018-08-28 Thread Felix Manlunas
From: Weilin Chang 

Change soft command handling to fix a possible race condition in which
the response to a soft command is handled after that soft command has
already been freed by a caller that timed out waiting for it.

Weilin Chang (4):
  liquidio: improve soft command handling
  liquidio: make soft command calls synchronous
  liquidio: change octnic_ctrl_pkt to do synchronous soft commands
  liquidio: remove obsolete functions and data structures

 drivers/net/ethernet/cavium/liquidio/lio_core.c| 232 
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 256 ++---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 307 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 194 ++---
 drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c  |  47 ++--
 .../net/ethernet/cavium/liquidio/octeon_config.h   |   3 +-
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |  12 +-
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |  94 ---
 .../net/ethernet/cavium/liquidio/octeon_network.h  |  16 --
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |  59 ++--
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  |   9 +-
 .../net/ethernet/cavium/liquidio/request_manager.c | 114 ++--
 .../ethernet/cavium/liquidio/response_manager.c|  82 +-
 .../ethernet/cavium/liquidio/response_manager.h|   4 +-
 14 files changed, 627 insertions(+), 802 deletions(-)

-- 
2.9.0
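The race described above is an ownership problem: two parties (the caller and the response path) both hold the same soft command, and the caller used to free it on timeout. In the octnet_get_link_stats() hunk of patch 2/4, the caller no longer frees the soft command after a successful wait; it sets sc->caller_is_done and leaves the free to driver code not shown here. The underlying rule, "whoever finishes second frees", can be illustrated with a minimal user-space sketch. struct cmd, cmd_release() and the RESP_DONE/CALLER_DONE bits are hypothetical and are not the liquidio implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define RESP_DONE   0x1u	/* set by the (hypothetical) response handler */
#define CALLER_DONE 0x2u	/* set by the requester, even after a timeout */

/* Hypothetical stand-in for a soft command shared by both sides. */
struct cmd {
	atomic_uint done;
	int status;
};

/*
 * Mark one side as finished with the object. Whichever side marks itself
 * second frees it, so neither side frees memory the other may still use,
 * provided each side stops touching c after calling cmd_release().
 * Returns true if this call freed the object.
 */
static bool cmd_release(struct cmd *c, unsigned int my_bit)
{
	unsigned int other = (RESP_DONE | CALLER_DONE) & ~my_bit;
	unsigned int prev;

	assert(my_bit == RESP_DONE || my_bit == CALLER_DONE);
	prev = atomic_fetch_or(&c->done, my_bit);
	if (prev == other) {	/* the other side already let go */
		free(c);
		return true;
	}
	return false;
}
```

Either completion order frees the object exactly once: if the caller times out first, the late response performs the free, and if the response arrives first, the caller does.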



[PATCH can-next] can: ucan: remove set but not used variable 'udev'

2018-08-28 Thread YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/can/usb/ucan.c: In function 'ucan_disconnect':
drivers/net/can/usb/ucan.c:1578:21: warning:
 variable 'udev' set but not used [-Wunused-but-set-variable]
  struct usb_device *udev;

Signed-off-by: YueHaibing 
---
 drivers/net/can/usb/ucan.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/can/usb/ucan.c b/drivers/net/can/usb/ucan.c
index 0678a38..c9fd83e 100644
--- a/drivers/net/can/usb/ucan.c
+++ b/drivers/net/can/usb/ucan.c
@@ -1575,11 +1575,8 @@ static int ucan_probe(struct usb_interface *intf,
 /* disconnect the device */
 static void ucan_disconnect(struct usb_interface *intf)
 {
-   struct usb_device *udev;
struct ucan_priv *up = usb_get_intfdata(intf);
 
-   udev = interface_to_usbdev(intf);
-
usb_set_intfdata(intf, NULL);
 
if (up) {



Re: [PATCH v2 iproute2-next 2/5] bridge: colorize output and use JSON print library

2018-08-28 Thread Roopa Prabhu
On Sat, Jul 14, 2018 at 6:41 PM, Roopa Prabhu  wrote:
> On Tue, Feb 20, 2018 at 11:24 AM, Stephen Hemminger
>  wrote:
>> From: Stephen Hemminger 
>>
>> Use new functions from json_print to simplify code.
>> Provide standard flag for colorizing output.
>>
>> The shortened -c flag is ambiguous: it could mean color or
>> compressvlan; it is now changed to mean color for consistency
>> with other iproute2 commands.
>>
>> Signed-off-by: Stephen Hemminger 
>> ---

[snip]

>
> Stephen, this seems to have broken both json and non-json output.
>
> Here is some output before and after the patch (same thing for tunnelshow):
>
> before:
> $bridge vlan show
> port      vlan ids
> hostbond4  1000
>  1001 PVID Egress Untagged
>  1002
>  1003
>  1004
>
> hostbond3  1000 PVID Egress Untagged
>  1001
>  1002
>  1003
>  1004
>
> bridge   1 PVID Egress Untagged
>  1000
>  1001
>  1002
>  1003
>  1004
>
> vxlan0   1 PVID Egress Untagged
>  1000
>  1001
>  1002
>  1003
>  1004
>
>
> $ bridge -j -c vlan show
> {
> "hostbond4": [{
> "vlan": 1000
> },{
> "vlan": 1001,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1002,
> "vlanEnd": 1004
> }
> ],
> "hostbond3": [{
> "vlan": 1000,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1001,
> "vlanEnd": 1004
> }
> ],
> "bridge": [{
> "vlan": 1,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1000,
> "vlanEnd": 1004
> }
> ],
> "vxlan0": [{
> "vlan": 1,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1000,
> "vlanEnd": 1004
> }
> ]
> }
>
>
> after:
> 
>
> $bridge vlan show
> port      vlan ids
> hostbond4
>  10001001 PVID untagged  100210031004
> hostbond3
>  1000 PVID untagged  1001100210031004
> bridge
>  1 PVID untagged 10001001100210031004
> vxlan0
>  1 PVID untagged 10001001100210031004
>
> $bridge -j -c vlan show
> ["hostbond4","vlan":[{"vlan":1000},{"vlan":1001,"pvid":null,"untagged":null},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"hostbond3","vlan":[{"vlan":1000,"pvid":null,"untagged":null},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"bridge","vlan":[{"vlan":1,"pvid":null,"untagged":null},{"vlan":1000},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"vxlan0","vlan":[{"vlan":1,"pvid":null,"untagged":null},{"vlan":1000},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}]]


Stephen, ping again...

I was trying to fix it, but it's not trivial enough for the time I
have right now.
If this cannot be fixed soon, please revert the patch, as it has
completely broken the JSON output.

Thanks.
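In the "before" JSON, runs of consecutive VLAN IDs that share the same flags are reported as a single vlan/vlanEnd entry; that output was produced with -c, which selected compressvlan before this patch. The grouping itself can be sketched client-side in C purely for illustration. struct vlan_entry, struct vlan_range, group_vlans() and the VLAN_F_* flag values are hypothetical and are neither iproute2 nor kernel code.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch only. */
#define VLAN_F_PVID     0x1u
#define VLAN_F_UNTAGGED 0x2u

struct vlan_entry {
	unsigned short vid;
	unsigned int flags;
};

struct vlan_range {
	unsigned short start, end;	/* start == end for a single VLAN */
	unsigned int flags;
};

/*
 * Merge sorted, unique entries into runs of consecutive VIDs that carry
 * identical flags. Writes at most max_out ranges and returns how many.
 */
static int group_vlans(const struct vlan_entry *v, int n,
		       struct vlan_range *out, int max_out)
{
	int runs = 0;

	for (int k = 1; k < n; k++)
		assert(v[k].vid > v[k - 1].vid);	/* input must be sorted */

	for (int i = 0; i < n && runs < max_out; ) {
		int j = i;

		/* extend the run while VIDs stay consecutive and flags match */
		while (j + 1 < n && v[j + 1].vid == v[j].vid + 1 &&
		       v[j + 1].flags == v[i].flags)
			j++;
		out[runs].start = v[i].vid;
		out[runs].end = v[j].vid;
		out[runs].flags = v[i].flags;
		runs++;
		i = j + 1;
	}
	return runs;
}
```

Applied to the hostbond4 VLANs above, this yields 1000, 1001 (PVID, Egress Untagged) and 1002-1004, matching the three objects in the "before" JSON.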


[PATCH can-next] can: ucan: remove duplicated include from ucan.c

2018-08-28 Thread YueHaibing
Remove duplicated include.

Signed-off-by: YueHaibing 
---
 drivers/net/can/usb/ucan.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/can/usb/ucan.c b/drivers/net/can/usb/ucan.c
index 0678a38..c6f4b41 100644
--- a/drivers/net/can/usb/ucan.c
+++ b/drivers/net/can/usb/ucan.c
@@ -35,10 +35,6 @@
 #include 
 #include 
 
-#include 
-#include 
-#include 
-
 #define UCAN_DRIVER_NAME "ucan"
 #define UCAN_MAX_RX_URBS 8
 /* the CAN controller needs a while to enable/disable the bus */



Re: [net-next 00/13][pull request] 10GbE Intel Wired LAN Driver Updates 2018-08-28

2018-08-28 Thread David Miller
From: Jeff Kirsher 
Date: Tue, 28 Aug 2018 14:35:44 -0700

> This series contains updates to ixgbe and ixgbevf only.
 ...

Pulled.


Re: [PATCH net-next 00/15] nfp: add NFP5000 support

2018-08-28 Thread David Miller
From: Jakub Kicinski 
Date: Tue, 28 Aug 2018 13:20:32 -0700

> This series broadly speaking adds support for NFP5000 and
> related products.
 ...

Series applied, thanks Jakub.


Re: [net-next 00/15][pull request] 100GbE Intel Wired LAN Driver Updates 2018-08-28

2018-08-28 Thread David Miller
From: Jeff Kirsher 
Date: Tue, 28 Aug 2018 12:03:58 -0700

> This series contains new features and implementation updates for the
> ice driver.
 ...

Pulled.


[PATCH net-next,v5] net/tls: Calculate nsg for zerocopy path without skb_cow_data.

2018-08-28 Thread Doron Roberts-Kedes
decrypt_skb fails if the number of sg elements required to map it
is greater than MAX_SKB_FRAGS. nsg must always be calculated, but
skb_cow_data adds unnecessary memcpy's for the zerocopy case.

The new function skb_nsg calculates the number of scatterlist elements
required to map the skb without the extra overhead of skb_cow_data.
This patch reduces memcpy by 50% on my encrypted NBD benchmarks.

Reported-by: Vakul Garg 
Reviewed-by: Vakul Garg 
Tested-by: Vakul Garg 
Signed-off-by: Doron Roberts-Kedes 
---
 net/tls/tls_sw.c | 80 +++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 52fbe727d7c1..4ba62cd00a94 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -43,6 +43,82 @@
 
 #define MAX_IV_SIZE		TLS_CIPHER_AES_GCM_128_IV_SIZE
 
+static int __skb_nsg(struct sk_buff *skb, int offset, int len,
+		     unsigned int recursion_level)
+{
+	int start = skb_headlen(skb);
+	int i, chunk = start - offset;
+	struct sk_buff *frag_iter;
+	int elt = 0;
+
+	if (unlikely(recursion_level >= 24))
+		return -EMSGSIZE;
+
+	if (chunk > 0) {
+		if (chunk > len)
+			chunk = len;
+		elt++;
+		len -= chunk;
+		if (len == 0)
+			return elt;
+		offset += chunk;
+	}
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		int end;
+
+		WARN_ON(start > offset + len);
+
+		end = start + skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		chunk = end - offset;
+		if (chunk > 0) {
+			if (chunk > len)
+				chunk = len;
+			elt++;
+			len -= chunk;
+			if (len == 0)
+				return elt;
+			offset += chunk;
+		}
+		start = end;
+	}
+
+	if (unlikely(skb_has_frag_list(skb))) {
+		skb_walk_frags(skb, frag_iter) {
+			int end, ret;
+
+			WARN_ON(start > offset + len);
+
+			end = start + frag_iter->len;
+			chunk = end - offset;
+			if (chunk > 0) {
+				if (chunk > len)
+					chunk = len;
+				ret = __skb_nsg(frag_iter, offset - start, chunk,
+						recursion_level + 1);
+				if (unlikely(ret < 0))
+					return ret;
+				elt += ret;
+				len -= chunk;
+				if (len == 0)
+					return elt;
+				offset += chunk;
+			}
+			start = end;
+		}
+	}
+	BUG_ON(len);
+	return elt;
+}
+
+/* Return the number of scatterlist elements required to completely map the
+ * skb, or -EMSGSIZE if the recursion depth is exceeded.
+ */
+static int skb_nsg(struct sk_buff *skb, int offset, int len)
+{
+	return __skb_nsg(skb, offset, len, 0);
+}
+
 static int tls_do_decryption(struct sock *sk,
 struct scatterlist *sgin,
 struct scatterlist *sgout,
@@ -678,12 +754,14 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
else
n_sgout = sg_nents(out_sg);
+   n_sgin = skb_nsg(skb, rxm->offset + tls_ctx->rx.prepend_size,
+rxm->full_len - tls_ctx->rx.prepend_size);
} else {
n_sgout = 0;
*zc = false;
+   n_sgin = skb_cow_data(skb, 0, &unused);
}
 
-   n_sgin = skb_cow_data(skb, 0, &unused);
if (n_sgin < 1)
return -EBADMSG;
 
-- 
2.17.1
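To make the counting concrete, here is a user-space model of the walk __skb_nsg() performs: it counts how many contiguous pieces (the linear head, each page fragment, and recursively each frag_list member) the byte range [offset, offset + len) touches, without copying any data. struct buf and count_sg() are hypothetical stand-ins for struct sk_buff and __skb_nsg(); the WARN_ON() checks are omitted and -1 stands in for -EMSGSIZE.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of an skb: linear head, page fragments, and an
 * optional frag_list of child buffers. len is the total byte count.
 */
struct buf {
	int headlen;
	int nr_frags;
	const int *frag_size;
	int nr_children;
	const struct buf *children;
	int len;
};

/* Count the pieces touched by [offset, offset + len), like __skb_nsg(). */
static int count_sg(const struct buf *b, int offset, int len, unsigned int depth)
{
	int start = b->headlen;
	int chunk = start - offset;
	int elt = 0;

	if (depth >= 24)
		return -1;	/* the kernel returns -EMSGSIZE */

	/* Linear head */
	if (chunk > 0) {
		if (chunk > len)
			chunk = len;
		elt++;
		len -= chunk;
		if (len == 0)
			return elt;
		offset += chunk;
	}

	/* Page fragments: one element per fragment the range touches */
	for (int i = 0; i < b->nr_frags; i++) {
		int end = start + b->frag_size[i];

		chunk = end - offset;
		if (chunk > 0) {
			if (chunk > len)
				chunk = len;
			elt++;
			len -= chunk;
			if (len == 0)
				return elt;
			offset += chunk;
		}
		start = end;
	}

	/* frag_list members: recurse with an offset relative to the child */
	for (int i = 0; i < b->nr_children; i++) {
		const struct buf *c = &b->children[i];
		int end = start + c->len;

		chunk = end - offset;
		if (chunk > 0) {
			int ret;

			if (chunk > len)
				chunk = len;
			ret = count_sg(c, offset - start, chunk, depth + 1);
			if (ret < 0)
				return ret;
			elt += ret;
			len -= chunk;
			if (len == 0)
				return elt;
			offset += chunk;
		}
		start = end;
	}

	assert(len == 0);	/* BUG_ON(len) in the kernel version */
	return elt;
}
```

For a buffer with a 100-byte head and two 50-byte fragments, the full range maps to 3 elements, while bytes 120..169 touch only the two fragments and map to 2.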



Re: [Patch iproute2] ss: add UNIX_DIAG_VFS and UNIX_DIAG_ICONS for unix sockets

2018-08-28 Thread Cong Wang
On Mon, Aug 27, 2018 at 3:27 PM Stephen Hemminger
 wrote:
>
> On Mon, 27 Aug 2018 14:46:52 -0700
> Cong Wang  wrote:
>
> > UNIX_DIAG_VFS and UNIX_DIAG_ICONS are never used by ss,
> > make them available in ss -e output.
> >
> > Cc: Stephen Hemminger 
> > Signed-off-by: Cong Wang 
> > ---
> >  misc/ss.c | 25 +
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/misc/ss.c b/misc/ss.c
> > index 41e7762b..d28bc1ec 100644
> > --- a/misc/ss.c
> > +++ b/misc/ss.c
> > @@ -16,6 +16,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
>
> Why is this included, it isn't on my system.

It is for major() and minor().

$ find /usr/include/ -name sysmacros.h
/usr/include/bits/sysmacros.h
/usr/include/sys/sysmacros.h
$ rpm -qf /usr/include/sys/sysmacros.h
glibc-headers-2.26-28.fc27.x86_64

So you are not using glibc? Or should iproute2 build with C libraries other than glibc?

>
> >  #include 
> >  #include 
> >  #include 
> > @@ -3604,6 +3605,28 @@ static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh,
> >   out(" %c-%c",
> >   mask & 1 ? '-' : '<', mask & 2 ? '-' : '>');
> >   }
> > + if (tb[UNIX_DIAG_VFS]) {
> > + struct unix_diag_vfs uv;
> > +
> > + memcpy(&uv, RTA_DATA(tb[UNIX_DIAG_VFS]), sizeof(uv));
>
> Copy here is unnecessary, you can just do:
> const struct unix_diag_vfs *uv
> = RTA_DATA(tb[UNIX_DIAG_VFS]);


Oh, good point!


>
> > + out(" ino:%u dev:%u/%u", uv.udiag_vfs_ino, major(uv.udiag_vfs_dev),
> > +  minor(uv.udiag_vfs_dev));
> > + }
> > + if (tb[UNIX_DIAG_ICONS]) {
> > + int len = RTA_PAYLOAD(tb[UNIX_DIAG_ICONS]);
> > + __u32 *peers = malloc(len);
> > + int i;
>
> Ditto, allocation and copy are not necessary, just reference the data.
>

Sure, will update.

Thanks.
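Stephen's suggestion is to read the attribute payload in place rather than copying it. A minimal user-space sketch of that idea for the UNIX_DIAG_ICONS peer list, assuming the payload is an array of 32-bit inode numbers (as the __u32 *peers in the patch implies); payload and len stand in for RTA_DATA() and RTA_PAYLOAD(), and format_peers() with its " peers:" output format is hypothetical, since the patch's own output line is cut off above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format each peer inode directly from the attribute payload:
 * no malloc() and no memcpy(). Returns the snprintf()-style length.
 */
static int format_peers(char *buf, size_t size, const void *payload, int len)
{
	const uint32_t *peers = payload;	/* read in place */
	size_t n, i;
	int off;

	assert(len >= 0 && len % sizeof(*peers) == 0);
	n = (size_t)len / sizeof(*peers);

	off = snprintf(buf, size, " peers:");
	for (i = 0; i < n && off >= 0 && (size_t)off < size; i++)
		off += snprintf(buf + off, size - off, " %u",
				(unsigned int)peers[i]);
	return off;
}
```

Netlink attributes are padded to 4-byte boundaries (RTA_ALIGNTO), so reading a 32-bit array through a pointer into the receive buffer is safe as long as the buffer itself is suitably aligned.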


[net-next 02/13] ixgbevf: VF2VF TCP RSS

2018-08-28 Thread Jeff Kirsher
From: Sebastian Basierski 

During VF-to-VF communication with RSS, the RSS type was recognized
incorrectly and the RSS hash was not calculated as it should have been,
so packets were distributed across queues by accident.
This commit fixes that behaviour so the RSS type is recognized correctly.

Signed-off-by: Sebastian Basierski 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index d86446d202d5..15deac07fd92 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3849,6 +3849,10 @@ static void ixgbevf_tx_csum(struct ixgbevf_ring *tx_ring,
skb_checksum_help(skb);
goto no_csum;
}
+
+   if (first->protocol == htons(ETH_P_IP))
+   type_tucmd |= IXGBE_ADVTXD_TUCMD_IPV4;
+
/* update TX checksum flag */
first->tx_flags |= IXGBE_TX_FLAGS_CSUM;
vlan_macip_lens = skb_checksum_start_offset(skb) -
-- 
2.17.1



[net-next 03/13] ixgbe: don't clear IPsec sa counters on HW clearing

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

The software SA record counters should not be cleared when clearing
the hardware tables; doing so leaves the counters out of sync after
a driver reset.

Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index da4322e4daed..e515246d0bce 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -113,7 +113,6 @@ static void ixgbe_ipsec_set_rx_ip(struct ixgbe_hw *hw, u16 idx, __be32 addr[])
  **/
 static void ixgbe_ipsec_clear_hw_tables(struct ixgbe_adapter *adapter)
 {
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
struct ixgbe_hw *hw = &adapter->hw;
u32 buf[4] = {0, 0, 0, 0};
u16 idx;
@@ -132,9 +131,6 @@ static void ixgbe_ipsec_clear_hw_tables(struct ixgbe_adapter *adapter)
ixgbe_ipsec_set_tx_sa(hw, idx, buf, 0);
ixgbe_ipsec_set_rx_sa(hw, idx, 0, buf, 0, 0, 0);
}
-
-   ipsec->num_rx_sa = 0;
-   ipsec->num_tx_sa = 0;
 }
 
 /**
-- 
2.17.1



[net-next 01/13] ixgbe: firmware recovery mode

2018-08-28 Thread Jeff Kirsher
From: Sebastian Basierski 

Add a check for FW NVM recovery mode during driver initialization and in
the service task. If in recovery mode, log a message and unregister the
device.

Signed-off-by: Sebastian Basierski 
Tested-by: Don Buchholz 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ixgbe/ixgbe_common.c   | 11 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 41 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h |  4 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 15 +++
 4 files changed, 71 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 0bd1294ba517..970f71d5da04 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -3484,6 +3484,17 @@ void ixgbe_set_vlan_anti_spoofing(struct ixgbe_hw *hw, bool enable, int vf)
IXGBE_WRITE_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg), pfvfspoof);
 }
 
+/**
+ * ixgbe_fw_recovery_mode - Check if in FW NVM recovery mode
+ * @hw: pointer to hardware structure
+ */
+bool ixgbe_fw_recovery_mode(struct ixgbe_hw *hw)
+{
+   if (hw->mac.ops.fw_recovery_mode)
+   return hw->mac.ops.fw_recovery_mode(hw);
+   return false;
+}
+
 /**
  *  ixgbe_get_device_caps_generic - Get additional device capabilities
  *  @hw: pointer to hardware structure
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9a23d33a47ed..604282f03d23 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7774,6 +7774,33 @@ static void ixgbe_reset_subtask(struct ixgbe_adapter *adapter)
rtnl_unlock();
 }
 
+/**
+ * ixgbe_check_fw_error - Check firmware for errors
+ * @adapter: the adapter private structure
+ *
+ * Check firmware errors in register FWSM
+ */
+static bool ixgbe_check_fw_error(struct ixgbe_adapter *adapter)
+{
+   struct ixgbe_hw *hw = &adapter->hw;
+   u32 fwsm;
+
+   /* read fwsm.ext_err_ind register and log errors */
+   fwsm = IXGBE_READ_REG(hw, IXGBE_FWSM(hw));
+
+   if (fwsm & IXGBE_FWSM_EXT_ERR_IND_MASK ||
+   !(fwsm & IXGBE_FWSM_FW_VAL_BIT))
+   e_dev_warn("Warning firmware error detected FWSM: 0x%08X\n",
+  fwsm);
+
+   if (hw->mac.ops.fw_recovery_mode && hw->mac.ops.fw_recovery_mode(hw)) {
+   e_dev_err("Firmware recovery mode detected. Limiting functionality. Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode.\n");
+   return true;
+   }
+
+   return false;
+}
+
 /**
  * ixgbe_service_task - manages and runs subtasks
  * @work: pointer to work_struct containing our data
@@ -7792,6 +7819,15 @@ static void ixgbe_service_task(struct work_struct *work)
ixgbe_service_event_complete(adapter);
return;
}
+   if (ixgbe_check_fw_error(adapter)) {
+   if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
+   rtnl_lock();
+   unregister_netdev(adapter->netdev);
+   rtnl_unlock();
+   }
+   ixgbe_service_event_complete(adapter);
+   return;
+   }
if (adapter->flags2 & IXGBE_FLAG2_UDP_TUN_REREG_NEEDED) {
rtnl_lock();
adapter->flags2 &= ~IXGBE_FLAG2_UDP_TUN_REREG_NEEDED;
@@ -10716,6 +10752,11 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
netdev->features |= NETIF_F_LRO;
 
+   if (ixgbe_check_fw_error(adapter)) {
+   err = -EIO;
+   goto err_sw_init;
+   }
+
/* make sure the EEPROM is good */
if (hw->eeprom.ops.validate_checksum(hw, NULL) < 0) {
e_dev_err("The EEPROM Checksum Is Not Valid\n");
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 41bcbb337e83..84f2dba39e36 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -924,6 +924,9 @@ struct ixgbe_nvm_version {
 /* Firmware Semaphore Register */
 #define IXGBE_FWSM_MODE_MASK   0xE
 #define IXGBE_FWSM_FW_MODE_PT  0x4
+#define IXGBE_FWSM_FW_NVM_RECOVERY_MODE	BIT(5)
+#define IXGBE_FWSM_EXT_ERR_IND_MASK	0x01F8
+#define IXGBE_FWSM_FW_VAL_BIT  BIT(15)
 
 /* ARC Subsystem registers */
 #define IXGBE_HICR  0x15F00
@@ -3461,6 +3464,7 @@ struct ixgbe_mac_operations {
  const char *);
s32 (*get_thermal_sensor_data)(struct ixgbe_hw *);
s32 (*init_thermal_sensor_thresh)(struct ixgbe_hw *hw);
+   bool (*fw_recovery_mode)(struct ixgbe_hw *hw);
void (*disable_rx)(struct ixgbe_hw *hw);
void (*enable_rx)(struct ixgbe_hw *hw);
void (
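In this patch, ixgbe_check_fw_error() logs a warning when any bit of IXGBE_FWSM_EXT_ERR_IND_MASK is set or IXGBE_FWSM_FW_VAL_BIT is clear, and it returns true (failing probe with -EIO, or unregistering the netdev from the service task unless the adapter is already down) only when the MAC's fw_recovery_mode callback reports recovery mode. A small user-space sketch of the FWSM decoding, using the bit definitions added to ixgbe_type.h; fwsm_has_error() and fwsm_in_recovery() are hypothetical helpers, and the recovery test is an assumption based on the bit name because the X550 callback is not shown in the excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* FWSM bit definitions as added to ixgbe_type.h by this patch */
#define IXGBE_FWSM_FW_NVM_RECOVERY_MODE	BIT(5)
#define IXGBE_FWSM_EXT_ERR_IND_MASK	0x01F8u
#define IXGBE_FWSM_FW_VAL_BIT		BIT(15)

/* Bit 5 lies inside the 0x01F8 (bits 3-8) error-indication mask. */
static_assert(IXGBE_FWSM_FW_NVM_RECOVERY_MODE & IXGBE_FWSM_EXT_ERR_IND_MASK,
	      "recovery bit overlaps the error-indication field");

/* Same condition that makes ixgbe_check_fw_error() print its warning. */
static bool fwsm_has_error(uint32_t fwsm)
{
	return (fwsm & IXGBE_FWSM_EXT_ERR_IND_MASK) ||
	       !(fwsm & IXGBE_FWSM_FW_VAL_BIT);
}

/* Assumed recovery-mode test: tests only the NVM recovery bit. */
static bool fwsm_in_recovery(uint32_t fwsm)
{
	return fwsm & IXGBE_FWSM_FW_NVM_RECOVERY_MODE;
}
```

If the real recovery check does test that bit, a device in recovery mode would also trip the FWSM warning before the recovery-mode error is logged, as the 0x8020 case in the sketch shows.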

[net-next 08/13] ixgbe: add VF IPsec offload request message handling

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Add add and delete messages for IPsec offload requests from
the VF.  These call into the IPsec functions that translate
the message buffer into a usable IPsec offload.

These new messages bump the mbox API version to 1.4.

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  | 19 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |  5 +
 .../net/ethernet/intel/ixgbe/ixgbe_sriov.c| 17 -
 3 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 89e709ce1947..5c6fd42e90ed 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -1004,15 +1004,24 @@ void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
struct sk_buff *skb);
 int ixgbe_ipsec_tx(struct ixgbe_ring *tx_ring, struct ixgbe_tx_buffer *first,
   struct ixgbe_ipsec_tx_data *itd);
+void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf);
+int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *mbuf, u32 vf);
+int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter, u32 *mbuf, u32 vf);
 #else
-static inline void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter) { };
-static inline void ixgbe_stop_ipsec_offload(struct ixgbe_adapter *adapter) { };
-static inline void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter) { };
+static inline void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter) { }
+static inline void ixgbe_stop_ipsec_offload(struct ixgbe_adapter *adapter) { }
+static inline void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter) { }
 static inline void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
  union ixgbe_adv_rx_desc *rx_desc,
- struct sk_buff *skb) { };
+ struct sk_buff *skb) { }
 static inline int ixgbe_ipsec_tx(struct ixgbe_ring *tx_ring,
 struct ixgbe_tx_buffer *first,
-struct ixgbe_ipsec_tx_data *itd) { return 0; };
+struct ixgbe_ipsec_tx_data *itd) { return 0; }
+static inline void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter,
+   u32 vf) { }
+static inline int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter,
+   u32 *mbuf, u32 vf) { return -EACCES; }
+static inline int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter,
+   u32 *mbuf, u32 vf) { return -EACCES; }
 #endif /* CONFIG_XFRM_OFFLOAD */
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
index e085b6520dac..a148534d7256 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
@@ -50,6 +50,7 @@ enum ixgbe_pfvf_api_rev {
ixgbe_mbox_api_11,  /* API version 1.1, linux/freebsd VF driver */
ixgbe_mbox_api_12,  /* API version 1.2, linux/freebsd VF driver */
ixgbe_mbox_api_13,  /* API version 1.3, linux/freebsd VF driver */
+   ixgbe_mbox_api_14,  /* API version 1.4, linux/freebsd VF driver */
/* This value should always be last */
ixgbe_mbox_api_unknown, /* indicates that API version is not known */
 };
@@ -80,6 +81,10 @@ enum ixgbe_pfvf_api_rev {
 
 #define IXGBE_VF_UPDATE_XCAST_MODE 0x0c
 
+/* mailbox API, version 1.4 VF requests */
+#define IXGBE_VF_IPSEC_ADD 0x0d
+#define IXGBE_VF_IPSEC_DEL 0x0e
+
 /* length of permanent address message returned from PF */
 #define IXGBE_VF_PERMADDR_MSG_LEN 4
 /* word in permanent address message with the current multicast type */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 3c6f01c41b78..af25a8fffeb8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -496,6 +496,7 @@ static s32 ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
case ixgbe_mbox_api_11:
case ixgbe_mbox_api_12:
case ixgbe_mbox_api_13:
+   case ixgbe_mbox_api_14:
/* Version 1.1 supports jumbo frames on VFs if PF has
 * jumbo frames enabled which means legacy VFs are
 * disabled
@@ -728,6 +729,9 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
/* reset multicast table array for vf */
adapter->vfinfo[vf].num_vf_mc_hashes = 0;
 
+   /* clear any ipsec table info */
+   ixgbe_ipsec_vf_clear(adapter, vf);
+
/* Flush and reset the mta with the new values */
ixgbe_set_rx_mode(ada

[net-next 09/13] ixgbevf: add defines for IPsec offload request

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Fix up the register definitions for using IPsec offloads and
add the new mailbox message IDs.
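The following minimal sketch shows how the new receive-side definitions fit together when classifying a write-back Rx descriptor. The SECP status bit marks a packet the security engine processed, and the packet-type bits indicate whether it was ESP or AH over IPv4 or IPv6. The bit values are restated in full 32-bit form from the kernel descriptor headers. The `demo_*` names are hypothetical, and the descriptor words are assumed to be in host byte order already (the driver works on `__le16`/`__le32` fields).

```c
#include <stdint.h>

/* Bit values as in the ixgbe/ixgbevf descriptor definitions. */
#define DEMO_RXDADV_STAT_SECP         0x00020000u /* IPsec/MACsec pkt found */
#define DEMO_RXDADV_PKTTYPE_IPV4      0x00000010u /* IPv4 hdr present */
#define DEMO_RXDADV_PKTTYPE_IPV6      0x00000040u /* IPv6 hdr present */
#define DEMO_RXDADV_PKTTYPE_IPSEC_ESP 0x00001000u /* IPsec ESP */
#define DEMO_RXDADV_PKTTYPE_IPSEC_AH  0x00002000u /* IPsec AH */

enum demo_rx_ipsec {
	DEMO_RX_NOT_IPSEC,
	DEMO_RX_ESP_V4,
	DEMO_RX_ESP_V6,
	DEMO_RX_AH_V4,
	DEMO_RX_AH_V6,
};

/* staterr: descriptor status/error word; pkt_info: packet-type word. */
static enum demo_rx_ipsec demo_classify_rx(uint32_t staterr, uint16_t pkt_info)
{
	int v4 = (pkt_info & DEMO_RXDADV_PKTTYPE_IPV4) != 0;
	int v6 = (pkt_info & DEMO_RXDADV_PKTTYPE_IPV6) != 0;

	/* The hardware did not flag this as a security packet at all. */
	if (!(staterr & DEMO_RXDADV_STAT_SECP))
		return DEMO_RX_NOT_IPSEC;

	if (pkt_info & DEMO_RXDADV_PKTTYPE_IPSEC_ESP) {
		if (v4)
			return DEMO_RX_ESP_V4;
		if (v6)
			return DEMO_RX_ESP_V6;
	} else if (pkt_info & DEMO_RXDADV_PKTTYPE_IPSEC_AH) {
		if (v4)
			return DEMO_RX_AH_V4;
		if (v6)
			return DEMO_RX_AH_V6;
	}
	/* e.g. MACsec, or an L3 type this sketch does not handle */
	return DEMO_RX_NOT_IPSEC;
}
```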

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/defines.h | 8 
 drivers/net/ethernet/intel/ixgbevf/mbx.h | 5 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 700d8eb2f6f8..dd9cd4541d7a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -133,9 +133,14 @@ typedef u32 ixgbe_link_speed;
 #define IXGBE_RXDADV_STAT_FCSTAT_NODDP 0x00000010 /* 01: Ctxt w/o DDP */
 #define IXGBE_RXDADV_STAT_FCSTAT_FCPRSP 0x00000020 /* 10: Recv. FCP_RSP */
 #define IXGBE_RXDADV_STAT_FCSTAT_DDP   0x00000030 /* 11: Ctxt w/ DDP */
+#define IXGBE_RXDADV_STAT_SECP 0x00020000 /* IPsec/MACsec pkt found */
 
 #define IXGBE_RXDADV_RSSTYPE_MASK  0x0000000F
 #define IXGBE_RXDADV_PKTTYPE_MASK  0x0000FFF0
+#define IXGBE_RXDADV_PKTTYPE_IPV4  0x00000010 /* IPv4 hdr present */
+#define IXGBE_RXDADV_PKTTYPE_IPV6  0x00000040 /* IPv6 hdr present */
+#define IXGBE_RXDADV_PKTTYPE_IPSEC_ESP 0x00001000 /* IPSec ESP */
+#define IXGBE_RXDADV_PKTTYPE_IPSEC_AH  0x00002000 /* IPSec AH */
 #define IXGBE_RXDADV_PKTTYPE_MASK_EX   0x0001FFF0
 #define IXGBE_RXDADV_HDRBUFLEN_MASK 0x00007FE0
 #define IXGBE_RXDADV_RSCCNT_MASK   0x001E0000
@@ -250,9 +255,12 @@ struct ixgbe_adv_tx_context_desc {
 #define IXGBE_ADVTXD_TUCMD_L4T_UDP 0x00000000  /* L4 Packet TYPE of UDP */
 #define IXGBE_ADVTXD_TUCMD_L4T_TCP 0x00000800  /* L4 Packet TYPE of TCP */
 #define IXGBE_ADVTXD_TUCMD_L4T_SCTP 0x00001000  /* L4 Packet TYPE of SCTP */
+#define IXGBE_ADVTXD_TUCMD_IPSEC_TYPE_ESP   0x00002000 /* IPSec Type ESP */
+#define IXGBE_ADVTXD_TUCMD_IPSEC_ENCRYPT_EN 0x00004000 /* ESP Encrypt Enable */
 #define IXGBE_ADVTXD_IDX_SHIFT 4 /* Adv desc Index shift */
 #define IXGBE_ADVTXD_CC 0x00000080 /* Check Context */
 #define IXGBE_ADVTXD_POPTS_SHIFT   8  /* Adv desc POPTS shift */
+#define IXGBE_ADVTXD_POPTS_IPSEC   0x00000400 /* IPSec offload request */
 #define IXGBE_ADVTXD_POPTS_IXSM (IXGBE_TXD_POPTS_IXSM << \
 IXGBE_ADVTXD_POPTS_SHIFT)
 #define IXGBE_ADVTXD_POPTS_TXSM (IXGBE_TXD_POPTS_TXSM << \
diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.h b/drivers/net/ethernet/intel/ixgbevf/mbx.h
index bfd9ae150808..853796c8ef0e 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.h
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.h
@@ -62,6 +62,7 @@ enum ixgbe_pfvf_api_rev {
ixgbe_mbox_api_11,  /* API version 1.1, linux/freebsd VF driver */
ixgbe_mbox_api_12,  /* API version 1.2, linux/freebsd VF driver */
ixgbe_mbox_api_13,  /* API version 1.3, linux/freebsd VF driver */
+   ixgbe_mbox_api_14,  /* API version 1.4, linux/freebsd VF driver */
/* This value should always be last */
ixgbe_mbox_api_unknown, /* indicates that API version is not known */
 };
@@ -92,6 +93,10 @@ enum ixgbe_pfvf_api_rev {
 
 #define IXGBE_VF_UPDATE_XCAST_MODE 0x0c
 
+/* mailbox API, version 1.4 VF requests */
+#define IXGBE_VF_IPSEC_ADD 0x0d
+#define IXGBE_VF_IPSEC_DEL 0x0e
+
 /* length of permanent address message returned from PF */
 #define IXGBE_VF_PERMADDR_MSG_LEN  4
 /* word in permanent address message with the current multicast type */
-- 
2.17.1



[net-next 06/13] ixgbe: add VF IPsec management

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Add functions to translate VF IPsec offload add and delete requests
into something the existing code can work with.
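The per-VF cleanup in `ixgbe_ipsec_vf_clear()` walks a table and releases only the entries tagged as VF-owned for the VF being reset. Below is a reduced, self-contained sketch of that walk. The table layout, the `DEMO_MODE_VF` value and the table size are hypothetical stand-ins for `IXGBE_RXTXMOD_VF` and `IXGBE_IPSEC_MAX_SA_COUNT`. Deletion here just frees the slot instead of calling `ixgbe_ipsec_del_sa()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_SA  8    /* hypothetical table size */
#define DEMO_MODE_VF 0x1u /* hypothetical "owned by a VF" mode bit */

struct demo_sa {
	bool used;
	uint32_t mode;
	uint32_t vf; /* owning VF index, meaningful only with DEMO_MODE_VF */
};

struct demo_tbl {
	struct demo_sa sa[DEMO_MAX_SA];
	int num_sa; /* number of used entries */
};

/* Free one slot and keep the count in step with the table. */
static void demo_del_sa(struct demo_tbl *t, int i)
{
	t->sa[i].used = false;
	t->sa[i].mode = 0;
	t->num_sa--;
}

/* Drop every entry the given VF created; PF-owned entries survive.
 * Like the driver loop, stop early once the table is empty.
 */
static void demo_vf_clear(struct demo_tbl *t, uint32_t vf)
{
	for (int i = 0; i < DEMO_MAX_SA && t->num_sa; i++) {
		if (!t->sa[i].used)
			continue;
		if ((t->sa[i].mode & DEMO_MODE_VF) && t->sa[i].vf == vf)
			demo_del_sa(t, i);
	}
}
```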

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 256 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.h|  13 +
 2 files changed, 260 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 3afb1fe766cd..80108e12ab86 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -8,6 +8,8 @@
 #define IXGBE_IPSEC_KEY_BITS  160
 static const char aes_gcm_name[] = "rfc4106(gcm(aes))";
 
+static void ixgbe_ipsec_del_sa(struct xfrm_state *xs);
+
 /**
  * ixgbe_ipsec_set_tx_sa - set the Tx SA registers
  * @hw: hw specific details
@@ -289,6 +291,13 @@ static void ixgbe_ipsec_start_engine(struct ixgbe_adapter *adapter)
 /**
  * ixgbe_ipsec_restore - restore the ipsec HW settings after a reset
  * @adapter: board private structure
+ *
+ * Reload the HW tables from the SW tables after they've been bashed
+ * by a chip reset.
+ *
+ * Any VF entries are removed from the SW and HW tables since either
+ * (a) the VF also gets reset on PF reset and will ask again for the
+ * offloads, or (b) the VF has been removed by a change in the num_vfs.
  **/
 void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
 {
@@ -306,16 +315,24 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
 
/* reload the Rx and Tx keys */
for (i = 0; i < IXGBE_IPSEC_MAX_SA_COUNT; i++) {
-   struct rx_sa *rsa = &ipsec->rx_tbl[i];
-   struct tx_sa *tsa = &ipsec->tx_tbl[i];
-
-   if (rsa->used)
-   ixgbe_ipsec_set_rx_sa(hw, i, rsa->xs->id.spi,
- rsa->key, rsa->salt,
- rsa->mode, rsa->iptbl_ind);
+   struct rx_sa *r = &ipsec->rx_tbl[i];
+   struct tx_sa *t = &ipsec->tx_tbl[i];
+
+   if (r->used) {
+   if (r->mode & IXGBE_RXTXMOD_VF)
+   ixgbe_ipsec_del_sa(r->xs);
+   else
+   ixgbe_ipsec_set_rx_sa(hw, i, r->xs->id.spi,
+ r->key, r->salt,
+ r->mode, r->iptbl_ind);
+   }
 
-   if (tsa->used)
-   ixgbe_ipsec_set_tx_sa(hw, i, tsa->key, tsa->salt);
+   if (t->used) {
+   if (t->mode & IXGBE_RXTXMOD_VF)
+   ixgbe_ipsec_del_sa(t->xs);
+   else
+   ixgbe_ipsec_set_tx_sa(hw, i, t->key, t->salt);
+   }
}
 
/* reload the IP addrs */
@@ -381,6 +398,8 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
rcu_read_lock();
hash_for_each_possible_rcu(ipsec->rx_sa_list, rsa, hlist,
   (__force u32)spi) {
+   if (rsa->mode & IXGBE_RXTXMOD_VF)
+   continue;
if (spi == rsa->xs->id.spi &&
((ip4 && *daddr == rsa->xs->id.daddr.a4) ||
  (!ip4 && !memcmp(daddr, &rsa->xs->id.daddr.a6,
@@ -808,6 +827,225 @@ static const struct xfrmdev_ops ixgbe_xfrmdev_ops = {
.xdo_dev_offload_ok = ixgbe_ipsec_offload_ok,
 };
 
+/**
+ * ixgbe_ipsec_vf_clear - clear the tables of data for a VF
+ * @adapter: board private structure
+ * @vf: VF id to be removed
+ **/
+void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_ipsec *ipsec = adapter->ipsec;
+   int i;
+
+   /* search rx sa table */
+   for (i = 0; i < IXGBE_IPSEC_MAX_SA_COUNT && ipsec->num_rx_sa; i++) {
+   if (!ipsec->rx_tbl[i].used)
+   continue;
+   if (ipsec->rx_tbl[i].mode & IXGBE_RXTXMOD_VF &&
+   ipsec->rx_tbl[i].vf == vf)
+   ixgbe_ipsec_del_sa(ipsec->rx_tbl[i].xs);
+   }
+
+   /* search tx sa table */
+   for (i = 0; i < IXGBE_IPSEC_MAX_SA_COUNT && ipsec->num_tx_sa; i++) {
+   if (!ipsec->tx_tbl[i].used)
+   continue;
+   if (ipsec->tx_tbl[i].mode & IXGBE_RXTXMOD_VF &&
+   ipsec->tx_tbl[i].vf == vf)
+   ixgbe_ipsec_del_sa(ipsec->tx_tbl[i].xs);
+   }
+}
+
+/**
+ * ixgbe_ipsec_vf_add_sa - translate VF request to SA add
+ * @adapter: board private structure
+ * @msgbuf: The message buffer
+ * @vf: the VF index
+ *
+ * Make up a new xs and algorithm info from the data sent by the VF.
+ * We only need to sketch in just enough to set up the HW offload.
+ * Put the resulting offload_handle int

[net-next 04/13] ixgbe: reload IPsec IP table after sa tables

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Restore the IPsec hardware IP table after reloading the SA tables.
This doesn't make much difference now, but will matter when we add
support for VF IPsec offloads.

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index e515246d0bce..434065109b8d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -301,14 +301,6 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
ixgbe_ipsec_clear_hw_tables(adapter);
ixgbe_ipsec_start_engine(adapter);
 
-   /* reload the IP addrs */
-   for (i = 0; i < IXGBE_IPSEC_MAX_RX_IP_COUNT; i++) {
-   struct rx_ip_sa *ipsa = &ipsec->ip_tbl[i];
-
-   if (ipsa->used)
-   ixgbe_ipsec_set_rx_ip(hw, i, ipsa->ipaddr);
-   }
-
/* reload the Rx and Tx keys */
for (i = 0; i < IXGBE_IPSEC_MAX_SA_COUNT; i++) {
struct rx_sa *rsa = &ipsec->rx_tbl[i];
@@ -322,6 +314,14 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
if (tsa->used)
ixgbe_ipsec_set_tx_sa(hw, i, tsa->key, tsa->salt);
}
+
+   /* reload the IP addrs */
+   for (i = 0; i < IXGBE_IPSEC_MAX_RX_IP_COUNT; i++) {
+   struct rx_ip_sa *ipsa = &ipsec->ip_tbl[i];
+
+   if (ipsa->used)
+   ixgbe_ipsec_set_rx_ip(hw, i, ipsa->ipaddr);
+   }
 }
 
 /**
-- 
2.17.1



[net-next 13/13] ixgbe: fix the return value for unsupported VF offload

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

When failing the request because we can't support that offload,
reporting EOPNOTSUPP makes much more sense than ENXIO.

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 0a1c8bf3f74f..fd1b0546fd67 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -894,7 +894,7 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 * device, so block these requests for now.
 */
if (!(sam->flags & XFRM_OFFLOAD_INBOUND)) {
-   err = -ENXIO;
+   err = -EOPNOTSUPP;
goto err_out;
}
 
-- 
2.17.1



[net-next 10/13] ixgbevf: add VF IPsec offload code

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Add the IPsec offload support code.  This is based on the similar
code in ixgbe, but instead of writing the SA registers, the VF asks
the PF to set up the offload by sending the offload information to the
PF via the standard mailbox.
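On the VF side, the exchange consists of a request packed into 32-bit mailbox words, followed by a two-word reply. The sketch below relies on several assumptions. The `demo_sa_msg` layout is hypothetical and may not match the real `struct sa_mbx_msg` from the new ipsec.h. `DEMO_MSGTYPE_NACK` restates the `IXGBE_VT_MSGTYPE_NACK` bit. No real mailbox is involved. The sketch only shows the framing and how the reply is interpreted in `ixgbevf_ipsec_set_pf_sa()`, and it counts the message length in 32-bit words.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_VF_IPSEC_ADD  0x0d        /* request ID from mbx.h */
#define DEMO_MSGTYPE_NACK  0x40000000u /* restates IXGBE_VT_MSGTYPE_NACK */

/* Hypothetical payload layout, for illustration only. */
struct demo_sa_msg {
	uint32_t flags;   /* xso.flags, e.g. XFRM_OFFLOAD_INBOUND */
	uint32_t spi;     /* copied as-is, network byte order */
	uint8_t  proto;
	uint8_t  family;
	uint16_t pad;
	uint32_t addr[4]; /* IPv4 uses addr[0] only */
	uint32_t key[5];  /* 128-bit key followed by a 32-bit salt */
};

/* Word 0 carries the request ID and the payload starts at word 1.
 * Returns the number of 32-bit words to post, or 0 if it won't fit.
 */
static size_t demo_pack_add(uint32_t *buf, size_t nwords,
			    const struct demo_sa_msg *sam)
{
	size_t words = 1 + (sizeof(*sam) + 3) / 4;

	if (words > nwords)
		return 0;
	memset(buf, 0, nwords * sizeof(*buf));
	buf[0] = DEMO_VF_IPSEC_ADD;
	memcpy(&buf[1], sam, sizeof(*sam));
	return words;
}

/* Reply word 1 holds the PF's offload handle or a negative errno.
 * A NACK that carries a non-negative value is still a failure.
 */
static int demo_parse_add_reply(const uint32_t reply[2])
{
	int ret = (int)reply[1];

	if ((reply[0] & DEMO_MSGTYPE_NACK) && ret >= 0)
		ret = -1;
	return ret;
}
```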

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/Makefile  |   1 +
 drivers/net/ethernet/intel/ixgbevf/ipsec.c   | 673 +++
 drivers/net/ethernet/intel/ixgbevf/ipsec.h   |  66 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h |   8 +
 4 files changed, 748 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/ixgbevf/ipsec.c
 create mode 100644 drivers/net/ethernet/intel/ixgbevf/ipsec.h

diff --git a/drivers/net/ethernet/intel/ixgbevf/Makefile b/drivers/net/ethernet/intel/ixgbevf/Makefile
index aba1e6a37a6a..297d0f0858b5 100644
--- a/drivers/net/ethernet/intel/ixgbevf/Makefile
+++ b/drivers/net/ethernet/intel/ixgbevf/Makefile
@@ -10,4 +10,5 @@ ixgbevf-objs := vf.o \
 mbx.o \
 ethtool.o \
 ixgbevf_main.o
+ixgbevf-$(CONFIG_XFRM_OFFLOAD) += ipsec.o
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
new file mode 100644
index 000000000000..997cea675a37
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -0,0 +1,673 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2018 Oracle and/or its affiliates. All rights reserved. */
+
+#include "ixgbevf.h"
+#include 
+#include 
+
+#define IXGBE_IPSEC_KEY_BITS  160
+static const char aes_gcm_name[] = "rfc4106(gcm(aes))";
+
+/**
+ * ixgbevf_ipsec_set_pf_sa - ask the PF to set up an SA
+ * @adapter: board private structure
+ * @xs: xfrm info to be sent to the PF
+ *
+ * Returns: positive offload handle from the PF, or negative error code
+ **/
+static int ixgbevf_ipsec_set_pf_sa(struct ixgbevf_adapter *adapter,
+  struct xfrm_state *xs)
+{
+   u32 msgbuf[IXGBE_VFMAILBOX_SIZE] = { 0 };
+   struct ixgbe_hw *hw = &adapter->hw;
+   struct sa_mbx_msg *sam;
+   u16 msglen;
+   int ret;
+
+   /* send the important bits to the PF */
+   sam = (struct sa_mbx_msg *)(&msgbuf[1]);
+   sam->flags = xs->xso.flags;
+   sam->spi = xs->id.spi;
+   sam->proto = xs->id.proto;
+   sam->family = xs->props.family;
+
+   if (xs->props.family == AF_INET6)
+   memcpy(sam->addr, &xs->id.daddr.a6, sizeof(xs->id.daddr.a6));
+   else
+   memcpy(sam->addr, &xs->id.daddr.a4, sizeof(xs->id.daddr.a4));
+   memcpy(sam->key, xs->aead->alg_key, sizeof(sam->key));
+
+   msgbuf[0] = IXGBE_VF_IPSEC_ADD;
+   msglen = sizeof(*sam) + sizeof(msgbuf[0]);
+
+   spin_lock_bh(&adapter->mbx_lock);
+
+   ret = hw->mbx.ops.write_posted(hw, msgbuf, msglen);
+   if (ret)
+   goto out;
+
+   msglen = sizeof(msgbuf[0]) * 2;
+   ret = hw->mbx.ops.read_posted(hw, msgbuf, msglen);
+   if (ret)
+   goto out;
+
+   ret = (int)msgbuf[1];
+   if (msgbuf[0] & IXGBE_VT_MSGTYPE_NACK && ret >= 0)
+   ret = -1;
+
+out:
+   spin_unlock_bh(&adapter->mbx_lock);
+
+   return ret;
+}
+
+/**
+ * ixgbevf_ipsec_del_pf_sa - ask the PF to delete an SA
+ * @adapter: board private structure
+ * @pfsa: sa index returned from PF when created, -1 for all
+ *
+ * Returns: 0 on success, or negative error code
+ **/
+static int ixgbevf_ipsec_del_pf_sa(struct ixgbevf_adapter *adapter, int pfsa)
+{
+   struct ixgbe_hw *hw = &adapter->hw;
+   u32 msgbuf[2];
+   int err;
+
+   memset(msgbuf, 0, sizeof(msgbuf));
+   msgbuf[0] = IXGBE_VF_IPSEC_DEL;
+   msgbuf[1] = (u32)pfsa;
+
+   spin_lock_bh(&adapter->mbx_lock);
+
+   err = hw->mbx.ops.write_posted(hw, msgbuf, sizeof(msgbuf));
+   if (err)
+   goto out;
+
+   err = hw->mbx.ops.read_posted(hw, msgbuf, sizeof(msgbuf));
+   if (err)
+   goto out;
+
+out:
+   spin_unlock_bh(&adapter->mbx_lock);
+   return err;
+}
+
+/**
+ * ixgbevf_ipsec_restore - restore the IPsec HW settings after a reset
+ * @adapter: board private structure
+ *
+ * Reload the HW tables from the SW tables after they've been bashed
+ * by a chip reset.  While we're here, make sure any stale VF data is
+ * removed, since we go through reset when num_vfs changes.
+ **/
+void ixgbevf_ipsec_restore(struct ixgbevf_adapter *adapter)
+{
+   struct ixgbevf_ipsec *ipsec = adapter->ipsec;
+   struct net_device *netdev = adapter->netdev;
+   int i;
+
+   if (!(adapter->netdev->features & NETIF_F_HW_ESP))
+   return;
+
+   /* reload the Rx and Tx keys */
+   for (i = 0; i < IXGBE_IPSEC_MAX_SA_COUNT; i++) {
+   struct rx_sa *r = &ipsec->rx_tbl[i];
+   struct tx_sa *t = &ipsec->tx_tbl[i];
+   int ret;
+
+   if (r->used) {
+   

[net-next 11/13] ixgbevf: enable VF IPsec offload operations

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Add the IPsec initialization into the driver startup and
add the Rx and Tx processing hooks.

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/defines.h  |  2 +-
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  |  2 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  | 25 +++
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 74 ++-
 drivers/net/ethernet/intel/ixgbevf/vf.c   |  4 +
 5 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index dd9cd4541d7a..6bace746eaac 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -234,7 +234,7 @@ union ixgbe_adv_rx_desc {
 /* Context descriptors */
 struct ixgbe_adv_tx_context_desc {
__le32 vlan_macip_lens;
-   __le32 seqnum_seed;
+   __le32 fceof_saidx;
__le32 type_tucmd_mlhl;
__le32 mss_l4len_idx;
 };
diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 631c91046f39..5399787e07af 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -55,6 +55,8 @@ static struct ixgbe_stats ixgbevf_gstrings_stats[] = {
IXGBEVF_STAT("alloc_rx_page", alloc_rx_page),
IXGBEVF_STAT("alloc_rx_page_failed", alloc_rx_page_failed),
IXGBEVF_STAT("alloc_rx_buff_failed", alloc_rx_buff_failed),
+   IXGBEVF_STAT("tx_ipsec", tx_ipsec),
+   IXGBEVF_STAT("rx_ipsec", rx_ipsec),
 };
 
 #define IXGBEVF_QUEUE_STATS_LEN ( \
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 172637e2f2e6..e399e1c0c54a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -459,6 +459,31 @@ int ethtool_ioctl(struct ifreq *ifr);
 
 extern void ixgbevf_write_eitr(struct ixgbevf_q_vector *q_vector);
 
+#ifdef CONFIG_XFRM_OFFLOAD
+void ixgbevf_init_ipsec_offload(struct ixgbevf_adapter *adapter);
+void ixgbevf_stop_ipsec_offload(struct ixgbevf_adapter *adapter);
+void ixgbevf_ipsec_restore(struct ixgbevf_adapter *adapter);
+void ixgbevf_ipsec_rx(struct ixgbevf_ring *rx_ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct sk_buff *skb);
+int ixgbevf_ipsec_tx(struct ixgbevf_ring *tx_ring,
+struct ixgbevf_tx_buffer *first,
+struct ixgbevf_ipsec_tx_data *itd);
+#else
+static inline void ixgbevf_init_ipsec_offload(struct ixgbevf_adapter *adapter)
+{ }
+static inline void ixgbevf_stop_ipsec_offload(struct ixgbevf_adapter *adapter)
+{ }
+static inline void ixgbevf_ipsec_restore(struct ixgbevf_adapter *adapter) { }
+static inline void ixgbevf_ipsec_rx(struct ixgbevf_ring *rx_ring,
+   union ixgbe_adv_rx_desc *rx_desc,
+   struct sk_buff *skb) { }
+static inline int ixgbevf_ipsec_tx(struct ixgbevf_ring *tx_ring,
+  struct ixgbevf_tx_buffer *first,
+  struct ixgbevf_ipsec_tx_data *itd)
+{ return 0; }
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 void ixgbe_napi_add_all(struct ixgbevf_adapter *adapter);
 void ixgbe_napi_del_all(struct ixgbevf_adapter *adapter);
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 15deac07fd92..17e23f609d74 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -40,7 +40,7 @@ static const char ixgbevf_driver_string[] =
 #define DRV_VERSION "4.1.0-k"
 const char ixgbevf_driver_version[] = DRV_VERSION;
 static char ixgbevf_copyright[] =
-   "Copyright (c) 2009 - 2015 Intel Corporation.";
+   "Copyright (c) 2009 - 2018 Intel Corporation.";
 
 static const struct ixgbevf_info *ixgbevf_info_tbl[] = {
[board_82599_vf]= &ixgbevf_82599_vf_info,
@@ -268,7 +268,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
struct ixgbevf_adapter *adapter = q_vector->adapter;
struct ixgbevf_tx_buffer *tx_buffer;
union ixgbe_adv_tx_desc *tx_desc;
-   unsigned int total_bytes = 0, total_packets = 0;
+   unsigned int total_bytes = 0, total_packets = 0, total_ipsec = 0;
unsigned int budget = tx_ring->count / 2;
unsigned int i = tx_ring->next_to_clean;
 
@@ -299,6 +299,8 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
/* update the statistics for this packet */
total_bytes += tx_buffer->bytecount;
total_packets += tx_buffer->gso_segs;
+   if (tx_buffer->tx_flags & IXGBE_TX_FLAGS_IPSEC)
+   total_ipsec++;
 
/* free the skb */

[net-next 03/13] ixgbe: don't clear IPsec sa counters on HW clearing

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

The software SA record counters should not be cleared when clearing
the hardware tables.  This causes the counters to be out of sync
after a driver reset.
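The reasoning behind this fix can be illustrated with a toy model. The software table, including its count, is the source of truth that a restore replays into hardware, so clearing the hardware must not modify it. This is a hypothetical sketch: `demo_ipsec` and its fields are illustrations, not the ixgbe structures.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_SA 4

struct demo_ipsec {
	bool used[DEMO_MAX_SA];           /* software table: survives a reset */
	unsigned int sw_key[DEMO_MAX_SA]; /* software copy of each key */
	unsigned int hw_key[DEMO_MAX_SA]; /* stand-in for the HW SA registers */
	int num_sa;                       /* software count of used entries */
};

/* Wipe only the "hardware"; the software record and its count stay intact. */
static void demo_clear_hw_tables(struct demo_ipsec *s)
{
	memset(s->hw_key, 0, sizeof(s->hw_key));
}

/* After a reset, reload the hardware from the software table. */
static void demo_restore(struct demo_ipsec *s)
{
	demo_clear_hw_tables(s);
	for (int i = 0; i < DEMO_MAX_SA; i++)
		if (s->used[i])
			s->hw_key[i] = s->sw_key[i];
}
```

If `demo_clear_hw_tables()` also zeroed `num_sa`, the count would read 0 after a restore while two entries were still marked used. That is the kind of mismatch the commit message describes.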

Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index da4322e4daed..e515246d0bce 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -113,7 +113,6 @@ static void ixgbe_ipsec_set_rx_ip(struct ixgbe_hw *hw, u16 idx, __be32 addr[])
  **/
 static void ixgbe_ipsec_clear_hw_tables(struct ixgbe_adapter *adapter)
 {
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
struct ixgbe_hw *hw = &adapter->hw;
u32 buf[4] = {0, 0, 0, 0};
u16 idx;
@@ -132,9 +131,6 @@ static void ixgbe_ipsec_clear_hw_tables(struct ixgbe_adapter *adapter)
ixgbe_ipsec_set_tx_sa(hw, idx, buf, 0);
ixgbe_ipsec_set_rx_sa(hw, idx, 0, buf, 0, 0, 0);
}
-
-   ipsec->num_rx_sa = 0;
-   ipsec->num_tx_sa = 0;
 }
 
 /**
-- 
2.17.1



[net-next 07/13] ixgbe: add VF IPsec offload enable flag

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Add a private flag to expressly enable support for VF IPsec offload.
The VF will have to be "trusted" in order to use the hardware offload,
but because of the general concerns of managing VF access, we want to
be sure the user is specifically enabling the feature.

This is likely a candidate for becoming a netdev feature flag.
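The resulting admission check combines two independent conditions. The following is a minimal sketch of that check. `demo_vf_may_add_sa` is a hypothetical helper, and the flag bit mirrors `IXGBE_FLAG2_VF_IPSEC_ENABLED` from the ixgbe.h hunk of this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors IXGBE_FLAG2_VF_IPSEC_ENABLED, i.e. BIT(18). */
#define DEMO_FLAG2_VF_IPSEC_ENABLED (1u << 18)

/* Both conditions must hold before a VF request is parsed at all:
 * the VF is trusted, and the administrator turned on "vf-ipsec".
 */
static int demo_vf_may_add_sa(bool vf_trusted, unsigned int flags2)
{
	if (!vf_trusted || !(flags2 & DEMO_FLAG2_VF_IPSEC_ENABLED))
		return -EACCES;
	return 0;
}
```

On a real system, both conditions would be set from userspace, for example with `ip link set dev <pf> vf <n> trust on` and `ethtool --set-priv-flags <pf> vf-ipsec on`. The flag name comes from this patch.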

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 9 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c   | 3 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4fc906c6166b..89e709ce1947 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -605,6 +605,7 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG2_EEE_ENABLED BIT(15)
 #define IXGBE_FLAG2_RX_LEGACY  BIT(16)
 #define IXGBE_FLAG2_IPSEC_ENABLED  BIT(17)
+#define IXGBE_FLAG2_VF_IPSEC_ENABLED   BIT(18)
 
/* Tx fast path data */
int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index e5a8461fe6a9..732b1e6ecc43 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -136,6 +136,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = {
 static const char ixgbe_priv_flags_strings[][ETH_GSTRING_LEN] = {
 #define IXGBE_PRIV_FLAGS_LEGACY_RX BIT(0)
"legacy-rx",
+#define IXGBE_PRIV_FLAGS_VF_IPSEC_EN   BIT(1)
+   "vf-ipsec",
 };
 
 #define IXGBE_PRIV_FLAGS_STR_LEN ARRAY_SIZE(ixgbe_priv_flags_strings)
@@ -3409,6 +3411,9 @@ static u32 ixgbe_get_priv_flags(struct net_device *netdev)
if (adapter->flags2 & IXGBE_FLAG2_RX_LEGACY)
priv_flags |= IXGBE_PRIV_FLAGS_LEGACY_RX;
 
+   if (adapter->flags2 & IXGBE_FLAG2_VF_IPSEC_ENABLED)
+   priv_flags |= IXGBE_PRIV_FLAGS_VF_IPSEC_EN;
+
return priv_flags;
 }
 
@@ -3421,6 +3426,10 @@ static int ixgbe_set_priv_flags(struct net_device *netdev, u32 priv_flags)
if (priv_flags & IXGBE_PRIV_FLAGS_LEGACY_RX)
flags2 |= IXGBE_FLAG2_RX_LEGACY;
 
+   flags2 &= ~IXGBE_FLAG2_VF_IPSEC_ENABLED;
+   if (priv_flags & IXGBE_PRIV_FLAGS_VF_IPSEC_EN)
+   flags2 |= IXGBE_FLAG2_VF_IPSEC_ENABLED;
+
if (flags2 != adapter->flags2) {
adapter->flags2 = flags2;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 80108e12ab86..ecd01fade960 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -880,7 +880,8 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
int err;
 
sam = (struct sa_mbx_msg *)(&msgbuf[1]);
-   if (!adapter->vfinfo[vf].trusted) {
+   if (!adapter->vfinfo[vf].trusted ||
+   !(adapter->flags2 & IXGBE_FLAG2_VF_IPSEC_ENABLED)) {
e_warn(drv, "VF %d attempted to add an IPsec SA\n", vf);
err = -EACCES;
goto err_out;
-- 
2.17.1



[net-next 12/13] ixgbe: disallow IPsec Tx offload when in SR-IOV mode

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

There seems to be a problem in the x540's internal switch wherein, if SR-IOV
mode is enabled and an offloaded IPsec packet is sent to a local VF,
the packet is silently dropped.  This may never be a problem, as it is
somewhat of a corner case, but if someone happens to be using IPsec offload
from the PF to a VF that just happens to get migrated to the local box,
communication will mysteriously fail.

Not good.

A simple way to protect against this is to not allow any IPsec offloads
for outgoing packets when num_vfs != 0.  This doesn't help any offloads that
were created before SR-IOV was enabled, but we'll get to that later.
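The check added here reduces to a single condition on the Tx path. The following is a hypothetical distillation (`demo_add_sa_check` is not a driver function). Like the patch, it only guards SAs added after SR-IOV is enabled, which matches the caveat above about offloads created earlier.

```c
#include <errno.h>
#include <stdbool.h>

/* Reduced decision from ixgbe_ipsec_add_sa(): Rx SAs are still accepted,
 * but Tx SAs are refused while any VFs exist.
 */
static int demo_add_sa_check(bool inbound, unsigned int num_vfs)
{
	if (!inbound && num_vfs)
		return -EOPNOTSUPP;
	return 0;
}
```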

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index ecd01fade960..0a1c8bf3f74f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -693,6 +693,9 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
} else {
struct tx_sa tsa;
 
+   if (adapter->num_vfs)
+   return -EOPNOTSUPP;
+
/* find the first unused index */
ret = ixgbe_ipsec_find_empty_idx(ipsec, false);
if (ret < 0) {
-- 
2.17.1



[net-next 00/13][pull request] 10GbE Intel Wired LAN Driver Updates 2018-08-28

2018-08-28 Thread Jeff Kirsher
This series contains updates to ixgbe and ixgbevf only.

Sebastian adds support for firmware NVM recovery mode, which logs a
message when errors are detected and unregisters the device.  He also
fixed RSS type recognition for VF-to-VF communication.

Shannon Nelson implements IPsec hardware offload for VF devices in
Intel's 10GbE x540 family of Ethernet devices.

The IPsec HW offload feature has been in the x540/Niantic family of
network devices since their release in 2009, but there was no Linux
kernel support for the offload until 2017.  After the XFRM code added
support for the offload last year, the HW offload was added to the ixgbe
PF driver.

Since the related x540 VF device uses the same setup as the PF for implementing
the offload, adding the feature to the ixgbevf driver seemed like a good idea.
In this case, the PF owns the device registers, so the VF simply packages
up the request information into a VF<->PF message and the PF does the
device configuration.

The following are changes since commit 050cdc6c9501abcd64720b8cc3e7941efee9547d:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Sebastian Basierski (2):
  ixgbe: firmware recovery mode
  ixgbevf: VF2VF TCP RSS

Shannon Nelson (11):
  ixgbe: don't clear IPsec sa counters on HW clearing
  ixgbe: reload IPsec IP table after sa tables
  ixgbe: prep IPsec constants for later use
  ixgbe: add VF IPsec management
  ixgbe: add VF IPsec offload enable flag
  ixgbe: add VF IPsec offload request message handling
  ixgbevf: add defines for IPsec offload request
  ixgbevf: add VF IPsec offload code
  ixgbevf: enable VF IPsec offload operations
  ixgbe: disallow IPsec Tx offload when in SR-IOV mode
  ixgbe: fix the return value for unsupported VF offload

 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  20 +-
 .../net/ethernet/intel/ixgbe/ixgbe_common.c   |  11 +
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |   9 +
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 282 +++-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.h|  13 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  41 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |   5 +
 .../net/ethernet/intel/ixgbe/ixgbe_sriov.c|  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h |   4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c |  15 +
 drivers/net/ethernet/intel/ixgbevf/Makefile   |   1 +
 drivers/net/ethernet/intel/ixgbevf/defines.h  |  10 +-
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  |   2 +
 drivers/net/ethernet/intel/ixgbevf/ipsec.c| 673 ++
 .../{ixgbe/ixgbe_ipsec.h => ixgbevf/ipsec.h}  |  40 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  33 +
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  78 +-
 drivers/net/ethernet/intel/ixgbevf/mbx.h  |   5 +
 drivers/net/ethernet/intel/ixgbevf/vf.c   |   4 +
 19 files changed, 1193 insertions(+), 70 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbevf/ipsec.c
 copy drivers/net/ethernet/intel/{ixgbe/ixgbe_ipsec.h => ixgbevf/ipsec.h} (59%)

-- 
2.17.1



[net-next 05/13] ixgbe: prep IPsec constants for later use

2018-08-28 Thread Jeff Kirsher
From: Shannon Nelson 

Pull out a couple of values from a function so they can be used
elsewhere later.
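The 160-bit figure is a 128-bit AES-GCM key followed by a 32-bit salt. This is why the same constant can also express the 128-bit case by subtracting the salt width. Below is a self-contained sketch of the split performed in `ixgbe_ipsec_parse_proto_keys()`. The `demo_parse_key` name and its out-parameters are hypothetical. The salt is copied with `memcpy` rather than through the driver's `((u32 *)key_data)[4]` cast, which avoids an unaligned access in portable C.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IPSEC_KEY_BITS 160 /* same value as IXGBE_IPSEC_KEY_BITS */

/* key_data holds key_bits / 8 bytes of rfc4106(gcm(aes)) key material.
 * On success the 16-byte AES key is copied to key[] and the trailing
 * 32-bit salt (or 0 if none was supplied) is stored in *salt.
 */
static int demo_parse_key(const uint8_t *key_data, int key_bits,
			  uint32_t key[4], uint32_t *salt)
{
	if (key_bits == DEMO_IPSEC_KEY_BITS) {
		/* bytes 16..19 are the salt, i.e. the fifth 32-bit word */
		memcpy(salt, key_data + 16, sizeof(*salt));
	} else if (key_bits == DEMO_IPSEC_KEY_BITS - (int)(sizeof(*salt) * 8)) {
		*salt = 0; /* plain 128-bit key, no salt supplied */
	} else {
		return -EINVAL;
	}
	memcpy(key, key_data, 16);
	return 0;
}
```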

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 434065109b8d..3afb1fe766cd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -5,6 +5,9 @@
 #include 
 #include 
 
+#define IXGBE_IPSEC_KEY_BITS  160
+static const char aes_gcm_name[] = "rfc4106(gcm(aes))";
+
 /**
  * ixgbe_ipsec_set_tx_sa - set the Tx SA registers
  * @hw: hw specific details
@@ -407,7 +410,6 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
struct net_device *dev = xs->xso.dev;
unsigned char *key_data;
char *alg_name = NULL;
-   const char aes_gcm_name[] = "rfc4106(gcm(aes))";
int key_len;
 
if (!xs->aead) {
@@ -435,9 +437,9 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 * we don't need to do any byteswapping.
 * 160 accounts for 16 byte key and 4 byte salt
 */
-   if (key_len == 160) {
+   if (key_len == IXGBE_IPSEC_KEY_BITS) {
*mysalt = ((u32 *)key_data)[4];
-   } else if (key_len != 128) {
+   } else if (key_len != (IXGBE_IPSEC_KEY_BITS - (sizeof(*mysalt) * 8))) {
netdev_err(dev, "IPsec hw offload only supports keys up to 128 bits with a 32 bit salt\n");
return -EINVAL;
} else {
-- 
2.17.1
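The new constant encodes a small piece of arithmetic. An rfc4106(gcm(aes)) key blob is 160 bits: a 128-bit AES key followed by a 32-bit salt. The salt-less length the driver still accepts is therefore 160 - 32 = 128 bits, which is what `IXGBE_IPSEC_KEY_BITS - (sizeof(*mysalt) * 8)` evaluates to. The sketch below is a hypothetical userspace helper that mirrors the check; `split_gcm_key()` is not driver code. Treating a 128-bit key as having a zero salt is an assumption of the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Same value as the patch: a 128-bit AES-GCM key plus a 32-bit salt. */
#define IXGBE_IPSEC_KEY_BITS 160

/*
 * Hypothetical helper, not driver code. key_len is in bits, as in the
 * driver. Returns 0 on success and -1 for an unsupported key length.
 */
static int split_gcm_key(const uint8_t *key_data, int key_len,
			 uint8_t key[16], uint32_t *salt)
{
	/* 32 bits of salt, so the salt-less length works out to 128 */
	const int salt_bits = (int)(sizeof(*salt) * 8);

	if (key_len == IXGBE_IPSEC_KEY_BITS) {
		/* Bytes 16..19 are the fifth 32-bit word, the same bytes the
		 * driver reads via ((u32 *)key_data)[4]; memcpy avoids an
		 * unaligned load. */
		memcpy(salt, key_data + 16, sizeof(*salt));
	} else if (key_len == IXGBE_IPSEC_KEY_BITS - salt_bits) {
		/* Assumption for this sketch: no salt means a zero salt. */
		*salt = 0;
	} else {
		return -1;
	}
	memcpy(key, key_data, 16);
	return 0;
}
```

Deriving 128 from the 160-bit constant and the salt width keeps the two accepted lengths tied together, which is presumably why the patch avoids a second literal.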



[PATCH 1/3] IB/ipoib: Use dev_port to expose network interface port numbers

2018-08-28 Thread Arseny Maslennikov
Some InfiniBand network devices have multiple ports on the same PCI
function. This initializes the `dev_port' sysfs field of those
network interfaces with their port number.

The use of `dev_id' was considered correct until Linux 3.15,
when another field, `dev_port', was defined for this particular
purpose and `dev_id' was reserved for distinguishing stacked ifaces
(e.g. VLANs) with the same hardware address as their parent device.

Similar fixes to net/mlx4_en and many other drivers, which started
exporting this information through `dev_id' before 3.15, were accepted
into the kernel 4 years ago.
See commit 76a066f2a2a0268b565459c417b59724b5a3197b
("net/mlx4_en: Expose port number through sysfs").

Signed-off-by: Arseny Maslennikov 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index e3d28f9ad9c0..fcd69273de91 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1881,6 +1881,7 @@ static int ipoib_parent_init(struct net_device *ndev)
 
SET_NETDEV_DEV(priv->dev, priv->ca->dev.parent);
priv->dev->dev_id = priv->port - 1;
+   priv->dev->dev_port = priv->port - 1;
 
return 0;
 }
-- 
2.18.0
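With this change, the IPoIB interface of IB port N reports `dev_port` = N - 1, so port 1 shows up as 0 and port 2 as 1. Userspace then reads that value as a decimal string, normally terminated by a newline. The sketch below parses such a buffer strictly. `parse_dev_port()` and `ib_port_from_dev_port()` are hypothetical helper names.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse the contents of a dev_port sysfs file, e.g. "1\n".
 * Returns 0 and stores the value on success, -1 on malformed input. */
static int parse_dev_port(const char *buf, unsigned long *port)
{
	char *end;
	unsigned long v;

	/* strtoul() would accept leading spaces and a '-' sign; the sysfs
	 * value has neither, so insist on a leading digit. */
	if (buf[0] < '0' || buf[0] > '9')
		return -1;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (errno != 0)
		return -1;

	/* Accept one optional trailing newline and nothing else. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	*port = v;
	return 0;
}

/* IPoIB sets dev_port = IB port number - 1 (IB ports are 1-based). */
static unsigned long ib_port_from_dev_port(unsigned long dev_port)
{
	return dev_port + 1;
}
```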



[PATCH 2/3] IB/ipoib: Stop using dev_id to expose port numbers

2018-08-28 Thread Arseny Maslennikov
Some InfiniBand network devices have multiple ports on the same PCI
function. Prior to this series, the kernel erroneously used the `dev_id'
sysfs field of those network interfaces to convey the port number to
userspace.

`dev_id' is currently reserved for distinguishing stacked ifaces
(e.g. VLANs) with the same hardware address as their parent device.

Similar fixes to net/mlx4_en and many other drivers, which started
exporting this information through `dev_id' before 3.15, were accepted
into the kernel 4 years ago.
See commit 76a066f2a2a0268b565459c417b59724b5a3197b
("net/mlx4_en: Expose port number through sysfs").

This commit is separate from the previous one because we may wish to
preserve backwards compatibility with userspace that already depends
on `dev_id' being different.

Signed-off-by: Arseny Maslennikov 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index fcd69273de91..ba16a63ee303 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1880,7 +1880,6 @@ static int ipoib_parent_init(struct net_device *ndev)
   sizeof(union ib_gid));
 
SET_NETDEV_DEV(priv->dev, priv->ca->dev.parent);
-   priv->dev->dev_id = priv->port - 1;
priv->dev->dev_port = priv->port - 1;
 
return 0;
-- 
2.18.0



[PATCH 0/3] IB/ipoib: Use dev_port to disambiguate port numbers

2018-08-28 Thread Arseny Maslennikov
Pre-3.15 userspace had trouble distinguishing different ports
of a NIC on a single PCI bus/device/function. To solve this,
a sysfs field `dev_port' was introduced quite a while ago
(commit v3.14-rc3-739-g3f85944fe207), and some relevant device
drivers were fixed to use it, but IPoIB was not among them.

For some reason the convention was never documented in the kernel, but
userspace adopted it immediately (notably udev [1][2] and biosdevname [3]).

Patch 3/3 documents the sysfs field, which is why I'm CC-ing netdev.

This series was tested on and applies to 4.19-rc1.

[1] https://lists.freedesktop.org/archives/systemd-devel/2014-June/020788.html
[2] https://lists.freedesktop.org/archives/systemd-devel/2014-July/020804.html
[3] https://github.com/CloudAutomationNTools/biosdevname/blob/c795d51dd93a5309652f0d635f12a3ecfabfaa72/src/eths.c#L38

Arseny Maslennikov (3):
  IB/ipoib: Use dev_port to expose network interface port numbers
  IB/ipoib: Stop using dev_id to expose port numbers
  Documentation/ABI: document /sys/class/net/*/dev_port

 Documentation/ABI/testing/sysfs-class-net | 10 ++
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

-- 
2.18.0



[PATCH 3/3] Documentation/ABI: document /sys/class/net/*/dev_port

2018-08-28 Thread Arseny Maslennikov
The sysfs field was introduced 4 years ago along with fixes to various
drivers that erroneously used `dev_id' to expose port numbers, but it
was never properly documented anywhere.
See commit v3.14-rc3-739-g3f85944fe207.

Signed-off-by: Arseny Maslennikov 
---
 Documentation/ABI/testing/sysfs-class-net | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index 2f1788111cd9..1593d8997ade 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -91,6 +91,16 @@ Description:
stacked (e.g: VLAN interfaces) but still have the same MAC
address as their parent device.
 
+What:  /sys/class/net//dev_port
+Date:  February 2014
+KernelVersion: 3.15
+Contact:   netdev@vger.kernel.org
+Description:
+   Indicates the port number of this network device, formatted
+   as a decimal value. Some NICs have multiple independent ports
+   on the same PCI bus, device and function. This field allows
+   userspace to distinguish the respective interfaces.
+
 What:  /sys/class/net//dormant
 Date:  March 2006
 KernelVersion: 2.6.17
-- 
2.18.0
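The documented attribute is a plain decimal file at `/sys/class/net/<iface>/dev_port`. The sketch below shows how a userspace tool might read it for a given interface name. `dev_port_path()` and `read_dev_port()` are hypothetical helpers. Rejecting names containing '/' is a defensive choice of the sketch, not something the ABI text specifies.

```c
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/net/<ifname>/dev_port" into buf.
 * Returns 0 on success, -1 on a bad name or if buf is too small. */
static int dev_port_path(char *buf, size_t len, const char *ifname)
{
	int n;

	/* Reject empty names and anything that could leave the directory. */
	if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
		return -1;

	n = snprintf(buf, len, "/sys/class/net/%s/dev_port", ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns the port index (>= 0), or -1 if the interface does not exist,
 * the attribute is missing (e.g. an older kernel), or parsing fails. */
static long read_dev_port(const char *ifname)
{
	char path[256];
	long port = -1;
	FILE *f;

	if (dev_port_path(path, sizeof(path), ifname) != 0)
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	if (fscanf(f, "%ld", &port) != 1 || port < 0)
		port = -1;
	fclose(f);
	return port;
}
```

Two interfaces that share a PCI bus/device/function can then be told apart by comparing their `read_dev_port()` results.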



[PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute

2018-08-28 Thread Marcel Holtmann
Until now, the name set via SET_NETDEV_DEVTYPE only ended up in the
uevent sysfs file as DEVTYPE= information. To avoid race conditions
between netlink messages and reads from sysfs, add the same string as
a new IFLA_DEVTYPE attribute in RTM_NEWLINK messages.

For network management daemons that have to classify ARPHRD_ETHER network
devices into different types (like Wireless LAN, Bluetooth etc.), this
avoids the extra round trip to sysfs and parsing of the uevent file.

Signed-off-by: Marcel Holtmann 
---
 include/uapi/linux/if_link.h |  2 ++
 net/core/rtnetlink.c | 12 
 2 files changed, 14 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 43391e2d1153..781294972bb4 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -166,6 +166,8 @@ enum {
IFLA_NEW_IFINDEX,
IFLA_MIN_MTU,
IFLA_MAX_MTU,
+   IFLA_DEVTYPE,   /* Name value from SET_NETDEV_DEVTYPE */
+#define IFLA_DEVTYPE IFLA_DEVTYPE
__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 24431e578310..bd288710f9bf 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -970,6 +970,14 @@ static size_t rtnl_xdp_size(void)
return xdp_size;
 }
 
+static size_t rtnl_devtype_size(const struct net_device *dev)
+{
+   if (!dev->dev.type || !dev->dev.type->name)
+   return 0;
+
+   return nla_total_size(strlen(dev->dev.type->name) + 1);
+}
+
 static noinline size_t if_nlmsg_size(const struct net_device *dev,
 u32 ext_filter_mask)
 {
@@ -1017,6 +1025,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
   + nla_total_size(4)  /* IFLA_CARRIER_DOWN_COUNT */
   + nla_total_size(4)  /* IFLA_MIN_MTU */
   + nla_total_size(4)  /* IFLA_MAX_MTU */
+  + rtnl_devtype_size(dev) /* IFLA_DEVTYPE */
   + 0;
 }
 
@@ -1679,6 +1688,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
nla_put_s32(skb, IFLA_NEW_IFINDEX, new_ifindex) < 0)
goto nla_put_failure;
 
+   if (dev->dev.type && dev->dev.type->name)
+   nla_put_string(skb, IFLA_DEVTYPE, dev->dev.type->name);
 
rcu_read_lock();
if (rtnl_fill_link_af(skb, dev, ext_filter_mask))
@@ -1738,6 +1749,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_CARRIER_DOWN_COUNT] = { .type = NLA_U32 },
[IFLA_MIN_MTU]  = { .type = NLA_U32 },
[IFLA_MAX_MTU]  = { .type = NLA_U32 },
+   [IFLA_DEVTYPE]  = { .type = NLA_STRING },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
-- 
2.14.4
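On the receiving side, a daemon would walk the attributes following `struct ifinfomsg` in an RTM_NEWLINK message and pick out the string. The sketch below does that walk by hand. `struct attr_hdr` and `find_string_attr()` are hypothetical names; the header layout mirrors `struct nlattr` (16-bit length including the 4-byte header, 16-bit type, payload padded to 4 bytes). IFLA_DEVTYPE's numeric value depends on its position in the uapi enum, so the sketch takes the attribute type as a parameter instead of assuming one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr in <linux/netlink.h>. */
struct attr_hdr {
	uint16_t len;	/* header + payload, excluding padding */
	uint16_t type;
};

#define ATTR_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN	ATTR_ALIGN(sizeof(struct attr_hdr))
#define ATTR_TYPE_MASK	0x3fff	/* drop NLA_F_NESTED / NLA_F_NET_BYTEORDER */

/* Return the NUL-terminated string payload of attribute `type`, or NULL
 * if it is absent or malformed. */
static const char *find_string_attr(const void *buf, size_t len, uint16_t type)
{
	const unsigned char *p = buf;

	while (len >= ATTR_HDRLEN) {
		struct attr_hdr h;
		size_t payload;

		memcpy(&h, p, sizeof(h));	/* avoid unaligned access */
		if (h.len < ATTR_HDRLEN || h.len > len)
			return NULL;
		payload = h.len - ATTR_HDRLEN;

		if ((h.type & ATTR_TYPE_MASK) == type) {
			/* NLA_STRING payloads from nla_put_string() end in '\0'. */
			if (payload == 0 || p[ATTR_HDRLEN + payload - 1] != '\0')
				return NULL;
			return (const char *)(p + ATTR_HDRLEN);
		}

		/* Step over payload and padding; stop at the end of the run. */
		if (ATTR_ALIGN(h.len) >= len)
			break;
		p += ATTR_ALIGN(h.len);
		len -= ATTR_ALIGN(h.len);
	}
	return NULL;
}
```

The padded length used to advance is also what `nla_total_size()` accounts for on the kernel side when sizing the message.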



[PATCH net-next 03/15] nfp: interpret extended FW load result codes

2018-08-28 Thread Jakub Kicinski
To make FW distribution easier, the NFP can now automatically
select between FW stored in flash and FW loaded by the
kernel.

If the FW loading policy is set to auto, it compares the
versions of the FW from the host and from the flash and loads
the newer one.  If the FW types don't match (e.g. one advanced
application vs another), the FW from the host takes precedence,
unless one of them is the basic NIC firmware, in which case
the non-basic-NIC FW is selected.

This automatic selection mechanism requires that we inform the
user of the verdict.  Print a message to the logs explaining
the decision and the reason.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_cpp.h  |  3 +
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.c  | 85 ++-
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index c338d539fa96..3b5182143ec7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -56,6 +56,9 @@
dev_info(nfp_cpp_device(cpp)->parent, NFP_SUBSYS ": " fmt, ## args)
 #define nfp_dbg(cpp, fmt, args...) \
dev_dbg(nfp_cpp_device(cpp)->parent, NFP_SUBSYS ": " fmt, ## args)
+#define nfp_printk(level, cpp, fmt, args...) \
+   dev_printk(level, nfp_cpp_device(cpp)->parent,  \
+  NFP_SUBSYS ": " fmt, ## args)
 
 #define PCI_64BIT_BAR_COUNT 3
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 0cdaa1fd6bcf..9eb7b5a91bb1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -87,6 +87,9 @@
 #define NSP_CODE_MAJOR GENMASK(15, 12)
 #define NSP_CODE_MINOR GENMASK(11, 0)
 
+#define NFP_FW_LOAD_RET_MAJOR  GENMASK(15, 8)
+#define NFP_FW_LOAD_RET_MINOR  GENMASK(23, 16)
+
 enum nfp_nsp_cmd {
SPCODE_NOOP = 0, /* No operation */
SPCODE_SOFT_RESET   = 1, /* Soft reset the NFP */
@@ -135,6 +138,7 @@ struct nfp_nsp {
  * @option:NFP SP Command Argument
  * @buff_cpp:  NFP SP Buffer CPP Address info
  * @buff_addr: NFP SP Buffer Host address
+ * @error_cb:  Callback for interpreting option if error occurred
  */
 struct nfp_nsp_command_arg {
u16 code;
@@ -142,6 +146,7 @@ struct nfp_nsp_command_arg {
u32 option;
u32 buff_cpp;
u64 buff_addr;
+   void (*error_cb)(struct nfp_nsp *state, u32 ret_val);
 };
 
 /**
@@ -401,7 +406,10 @@ __nfp_nsp_command(struct nfp_nsp *state, const struct nfp_nsp_command_arg *arg)
if (err) {
nfp_warn(cpp, "Result (error) code set: %d (%d) command: %d\n",
 -err, (int)ret_val, arg->code);
-   nfp_nsp_print_extended_error(state, ret_val);
+   if (arg->error_cb)
+   arg->error_cb(state, ret_val);
+   else
+   nfp_nsp_print_extended_error(state, ret_val);
return -err;
}
 
@@ -530,18 +538,78 @@ int nfp_nsp_mac_reinit(struct nfp_nsp *state)
return nfp_nsp_command(state, SPCODE_MAC_INIT);
 }
 
+static void nfp_nsp_load_fw_extended_msg(struct nfp_nsp *state, u32 ret_val)
+{
+   static const char * const major_msg[] = {
+   /* 0 */ "Firmware from driver loaded",
+   /* 1 */ "Firmware from flash loaded",
+   /* 2 */ "Firmware loading failure",
+   };
+   static const char * const minor_msg[] = {
+   /*  0 */ "",
+   /*  1 */ "no named partition on flash",
+   /*  2 */ "error reading from flash",
+   /*  3 */ "can not deflate",
+   /*  4 */ "not a trusted file",
+   /*  5 */ "can not parse FW file",
+   /*  6 */ "MIP not found in FW file",
+   /*  7 */ "null firmware name in MIP",
+   /*  8 */ "FW version none",
+   /*  9 */ "FW build number none",
+   /* 10 */ "no FW selection policy HWInfo key found",
+   /* 11 */ "static FW selection policy",
+   /* 12 */ "FW version has precedence",
+   /* 13 */ "different FW application load requested",
+   /* 14 */ "development build",
+   };
+   unsigned int major, minor;
+   const char *level;
+
+   major = FIELD_GET(NFP_FW_LOAD_RET_MAJOR, ret_val);
+   minor = FIELD_GET(NFP_FW_LOAD_RET_MINOR, ret_val);
+
+   if (!nfp_nsp_has_stored_fw_load(state))
+   return;
+
+   /* Lower the message level in legacy case */
+   if (major == 0 && (minor == 0 || minor == 10))
+   level = KERN_DEBUG;
+   else if (major == 2)
+   level = KERN_ERR;
+   else
+   level = KERN_INFO;
+
+   if (major >= ARRAY_SIZE(

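The extended result packs two small fields into `ret_val`: the major code in bits 15:8 (`NFP_FW_LOAD_RET_MAJOR`) and the minor code in bits 23:16 (`NFP_FW_LOAD_RET_MINOR`). The sketch below is a userspace stand-in for the kernel's `FIELD_GET()`/`GENMASK()` on those two fields, plus the same log-level choice the patch makes. The helper names are hypothetical. For example, `ret_val = 0x0C0100` decodes to major 1 ("Firmware from flash loaded") and minor 12 ("FW version has precedence"), which is logged at info level.

```c
#include <stdint.h>

/* Equivalent to FIELD_GET(GENMASK(15, 8), ret_val). */
static unsigned int fw_load_major(uint32_t ret_val)
{
	return (ret_val >> 8) & 0xff;
}

/* Equivalent to FIELD_GET(GENMASK(23, 16), ret_val). */
static unsigned int fw_load_minor(uint32_t ret_val)
{
	return (ret_val >> 16) & 0xff;
}

enum fw_msg_level { FW_MSG_DEBUG, FW_MSG_INFO, FW_MSG_ERR };

/* Same selection as nfp_nsp_load_fw_extended_msg(). */
static enum fw_msg_level fw_load_msg_level(unsigned int major, unsigned int minor)
{
	/* Legacy case: driver FW loaded, minor 0 (no detail) or
	 * 10 (no FW selection policy HWInfo key found). */
	if (major == 0 && (minor == 0 || minor == 10))
		return FW_MSG_DEBUG;
	if (major == 2)		/* "Firmware loading failure" */
		return FW_MSG_ERR;
	return FW_MSG_INFO;
}
```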
[PATCH net-next 01/15] nfp: encapsulate NSP command arguments into structs

2018-08-28 Thread Jakub Kicinski
There is already a fair number of arguments to the nfp_nsp_command()
family of functions.  Encapsulate them into structures to make
adding new ones easier.  No functional changes.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.c  | 205 --
 1 file changed, 136 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 2abee0fe3a7c..e1a14f4e5e71 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -127,6 +127,38 @@ struct nfp_nsp {
void *entries;
 };
 
+/**
+ * struct nfp_nsp_command_arg - NFP command argument structure
+ * @code:  NFP SP Command Code
+ * @timeout_sec:Timeout value to wait for completion in seconds
+ * @option:NFP SP Command Argument
+ * @buff_cpp:  NFP SP Buffer CPP Address info
+ * @buff_addr: NFP SP Buffer Host address
+ */
+struct nfp_nsp_command_arg {
+   u16 code;
+   unsigned int timeout_sec;
+   u32 option;
+   u32 buff_cpp;
+   u64 buff_addr;
+};
+
+/**
+ * struct nfp_nsp_command_buf_arg - NFP command with buffer argument structure
+ * @arg:   NFP command argument structure
+ * @in_buf:Buffer with data for input
+ * @in_size:   Size of @in_buf
+ * @out_buf:   Buffer for output data
+ * @out_size:  Size of @out_buf
+ */
+struct nfp_nsp_command_buf_arg {
+   struct nfp_nsp_command_arg arg;
+   const void *in_buf;
+   unsigned int in_size;
+   void *out_buf;
+   unsigned int out_size;
+};
+
 struct nfp_cpp *nfp_nsp_cpp(struct nfp_nsp *state)
 {
return state->cpp;
@@ -291,11 +323,7 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg, u32 nsp_cpp, u64 addr,
 /**
  * __nfp_nsp_command() - Execute a command on the NFP Service Processor
  * @state: NFP SP state
- * @code:  NFP SP Command Code
- * @option:NFP SP Command Argument
- * @buff_cpp:  NFP SP Buffer CPP Address info
- * @buff_addr: NFP SP Buffer Host address
- * @timeout_sec:Timeout value to wait for completion in seconds
+ * @arg:   NFP command argument structure
  *
  * Return: 0 for success with no result
  *
@@ -308,8 +336,7 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg, u32 nsp_cpp, u64 addr,
  * -ETIMEDOUT if the NSP took longer than @timeout_sec seconds to complete
  */
 static int
-__nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option, u32 buff_cpp,
- u64 buff_addr, u32 timeout_sec)
+__nfp_nsp_command(struct nfp_nsp *state, const struct nfp_nsp_command_arg *arg)
 {
u64 reg, ret_val, nsp_base, nsp_buffer, nsp_status, nsp_command;
struct nfp_cpp *cpp = state->cpp;
@@ -326,22 +353,22 @@ __nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option, u32 buff_cpp,
if (err)
return err;
 
-   if (!FIELD_FIT(NSP_BUFFER_CPP, buff_cpp >> 8) ||
-   !FIELD_FIT(NSP_BUFFER_ADDRESS, buff_addr)) {
+   if (!FIELD_FIT(NSP_BUFFER_CPP, arg->buff_cpp >> 8) ||
+   !FIELD_FIT(NSP_BUFFER_ADDRESS, arg->buff_addr)) {
nfp_err(cpp, "Host buffer out of reach %08x %016llx\n",
-   buff_cpp, buff_addr);
+   arg->buff_cpp, arg->buff_addr);
return -EINVAL;
}
 
err = nfp_cpp_writeq(cpp, nsp_cpp, nsp_buffer,
-FIELD_PREP(NSP_BUFFER_CPP, buff_cpp >> 8) |
-FIELD_PREP(NSP_BUFFER_ADDRESS, buff_addr));
+FIELD_PREP(NSP_BUFFER_CPP, arg->buff_cpp >> 8) |
+FIELD_PREP(NSP_BUFFER_ADDRESS, arg->buff_addr));
if (err < 0)
return err;
 
err = nfp_cpp_writeq(cpp, nsp_cpp, nsp_command,
-FIELD_PREP(NSP_COMMAND_OPTION, option) |
-FIELD_PREP(NSP_COMMAND_CODE, code) |
+FIELD_PREP(NSP_COMMAND_OPTION, arg->option) |
+FIELD_PREP(NSP_COMMAND_CODE, arg->code) |
 FIELD_PREP(NSP_COMMAND_START, 1));
if (err < 0)
return err;
@@ -351,16 +378,16 @@ __nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option, u32 buff_cpp,
   NSP_COMMAND_START, 0, NFP_NSP_TIMEOUT_DEFAULT);
if (err) {
nfp_err(cpp, "Error %d waiting for code 0x%04x to start\n",
-   err, code);
+   err, arg->code);
return err;
}
 
/* Wait for NSP_STATUS_BUSY to go to 0 */
err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_status, NSP_STATUS_BUSY,
-  0, timeout_sec);
+  0, arg->timeout_sec ?: NFP_NSP_TIMEOUT_DEFAULT);
if (err) {
nfp_err(cpp, "Error %d waiting for code 0x%04x to complete\n",
-

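Bundling the arguments into a struct pays off with C designated initializers. Fields a caller does not name are zero-initialized. A zero `timeout_sec` can then mean "use the default", as in the `arg->timeout_sec ?: NFP_NSP_TIMEOUT_DEFAULT` expression in the patch. This also lets a later patch add a member (such as `error_cb`) without touching existing callers. The sketch below uses hypothetical names. The 30-second default is an assumption standing in for `NFP_NSP_TIMEOUT_DEFAULT`.

```c
#include <stdint.h>

/* Assumed value for this sketch; the driver uses NFP_NSP_TIMEOUT_DEFAULT. */
#define NSP_TIMEOUT_DEFAULT_SEC 30

struct nsp_cmd_arg {
	uint16_t code;
	unsigned int timeout_sec;	/* 0 means "use the default" */
	uint32_t option;
	uint32_t buff_cpp;
	uint64_t buff_addr;
};

static unsigned int nsp_effective_timeout(const struct nsp_cmd_arg *arg)
{
	/* Portable spelling of the GNU "arg->timeout_sec ?: default" */
	return arg->timeout_sec ? arg->timeout_sec : NSP_TIMEOUT_DEFAULT_SEC;
}
```

A caller that only cares about the command code can write `struct nsp_cmd_arg arg = { .code = 16 };` and every other field, including the timeout, starts out as zero.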
[PATCH net-next 14/15] nfp: support access to absolute RTsyms

2018-08-28 Thread Jakub Kicinski
Add support in nfpcore for reading absolute RTsyms.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_nffw.h | 13 +++---
 .../netronome/nfp/nfpcore/nfp_rtsym.c | 42 +--
 2 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
index 04700278d00d..8d2cbdf4d517 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
@@ -61,10 +61,12 @@ void nfp_mip_strtab(const struct nfp_mip *mip, u32 *addr, u32 *size);
 
 /* Implemented in nfp_rtsym.c */
 
-#define NFP_RTSYM_TYPE_NONE    0
-#define NFP_RTSYM_TYPE_OBJECT  1
-#define NFP_RTSYM_TYPE_FUNCTION 2
-#define NFP_RTSYM_TYPE_ABS 3
+enum nfp_rtsym_type {
+   NFP_RTSYM_TYPE_NONE = 0,
+   NFP_RTSYM_TYPE_OBJECT   = 1,
+   NFP_RTSYM_TYPE_FUNCTION = 2,
+   NFP_RTSYM_TYPE_ABS  = 3,
+};
 
 #define NFP_RTSYM_TARGET_NONE  0
 #define NFP_RTSYM_TARGET_LMEM  -1
@@ -83,7 +85,7 @@ struct nfp_rtsym {
const char *name;
u64 addr;
u64 size;
-   int type;
+   enum nfp_rtsym_type type;
int target;
int domain;
 };
@@ -98,6 +100,7 @@ const struct nfp_rtsym *nfp_rtsym_get(struct nfp_rtsym_table *rtbl, int idx);
 const struct nfp_rtsym *
 nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name);
 
+u64 nfp_rtsym_size(const struct nfp_rtsym *rtsym);
 int __nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
 u8 action, u8 token, u64 off, void *buf, size_t len);
 int nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 28e5ed0bb31d..108ce8c5e68e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -233,10 +233,32 @@ nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name)
return NULL;
 }
 
+u64 nfp_rtsym_size(const struct nfp_rtsym *sym)
+{
+   switch (sym->type) {
+   case NFP_RTSYM_TYPE_NONE:
+   pr_err("rtsym type NONE\n");
+   return 0;
+   default:
+   pr_warn("Unknown rtsym type: %d\n", sym->type);
+   /* fall through */
+   case NFP_RTSYM_TYPE_OBJECT:
+   case NFP_RTSYM_TYPE_FUNCTION:
+   return sym->size;
+   case NFP_RTSYM_TYPE_ABS:
+   return sizeof(u64);
+   }
+}
+
 static int
 nfp_rtsym_to_dest(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
  u8 action, u8 token, u64 off, u32 *cpp_id, u64 *addr)
 {
+   if (sym->type != NFP_RTSYM_TYPE_OBJECT) {
+   nfp_err(cpp, "Direct access attempt to non-object rtsym\n");
+   return -EINVAL;
+   }
+
*addr = sym->addr + off;
 
if (sym->target == NFP_RTSYM_TARGET_EMU_CACHE) {
@@ -266,6 +288,15 @@ int __nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
u64 addr;
int err;
 
+   if (sym->type == NFP_RTSYM_TYPE_ABS) {
+   __le64 tmp = cpu_to_le64(sym->addr);
+
+   len = min(len, sizeof(tmp));
+   memcpy(buf, &tmp, len);
+
+   return len;
+   }
+
err = nfp_rtsym_to_dest(cpp, sym, action, token, off, &cpp_id, &addr);
if (err)
return err;
@@ -306,6 +337,9 @@ int __nfp_rtsym_readq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
u64 addr;
int err;
 
+   if (sym->type == NFP_RTSYM_TYPE_ABS)
+   return sym->addr;
+
err = nfp_rtsym_to_dest(cpp, sym, action, token, off, &cpp_id, &addr);
if (err)
return err;
@@ -405,7 +439,7 @@ u64 nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
goto exit;
}
 
-   switch (sym->size) {
+   switch (nfp_rtsym_size(sym)) {
case 4:
err = nfp_rtsym_readl(rtbl->cpp, sym, 0, &val32);
val = val32;
@@ -416,7 +450,7 @@ u64 nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
default:
nfp_err(rtbl->cpp,
"rtsym '%s' unsupported or non-scalar size: %lld\n",
-   name, sym->size);
+   name, nfp_rtsym_size(sym));
err = -EINVAL;
break;
}
@@ -452,7 +486,7 @@ int nfp_rtsym_write_le(struct nfp_rtsym_table *rtbl, const char *name,
if (!sym)
return -ENOENT;
 
-   switch (sym->size) {
+   switch (nfp_rtsym_size(sym)) {
case 4:
err = nfp_rtsym_writel(rtbl->cpp, sym, 0, value);
break;
@@ -462,7 +496,7 @@ int nfp_rtsym_wri

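An absolute RTsym has no backing memory: its value is the symbol's `addr` field itself. That is why `nfp_rtsym_size()` reports `sizeof(u64)` for it, and why the read path copies a little-endian encoding of `addr` clamped to the requested length. The sketch below mirrors that behaviour in userspace. The struct, enum and function names are hypothetical stand-ins, and the byte-by-byte encoding replaces `cpu_to_le64()` so the result does not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum rtsym_type { RTSYM_NONE = 0, RTSYM_OBJECT = 1, RTSYM_FUNCTION = 2, RTSYM_ABS = 3 };

struct rtsym {
	enum rtsym_type type;
	uint64_t addr;	/* for RTSYM_ABS this *is* the value */
	uint64_t size;
};

static uint64_t rtsym_size(const struct rtsym *sym)
{
	switch (sym->type) {
	case RTSYM_NONE:
		return 0;
	case RTSYM_ABS:
		return sizeof(uint64_t);
	default:	/* objects, functions (and unknown types, as in the patch) */
		return sym->size;
	}
}

/* Copy the absolute value as little-endian bytes, clamped to len.
 * Returns the number of bytes copied. */
static size_t rtsym_read_abs(const struct rtsym *sym, void *buf, size_t len)
{
	unsigned char tmp[8];

	for (int i = 0; i < 8; i++)
		tmp[i] = (unsigned char)(sym->addr >> (8 * i));
	if (len > sizeof(tmp))
		len = sizeof(tmp);
	memcpy(buf, tmp, len);
	return len;
}
```

Routing every size check through one helper is what lets the follow-up patch convert callers from `sym->size` without each of them special-casing absolute symbols.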
[PATCH net-next 05/15] nfp: abm: look up MAC addresses via management FW

2018-08-28 Thread Jakub Kicinski
In multi-host scenarios, the management FW may allocate MAC addresses
at runtime, so we have to use the indirect lookup to find them.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 drivers/net/ethernet/netronome/nfp/abm/main.c | 34 ++-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c
index b84a6c2d387b..305ac07dc1e7 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -540,8 +540,9 @@ nfp_abm_vnic_set_mac(struct nfp_pf *pf, struct nfp_abm *abm, struct nfp_net *nn,
 {
struct nfp_eth_table_port *eth_port = &pf->eth_tbl->ports[id];
u8 mac_addr[ETH_ALEN];
-   const char *mac_str;
-   char name[32];
+   struct nfp_nsp *nsp;
+   char hwinfo[32];
+   int err;
 
if (id > pf->eth_tbl->count) {
nfp_warn(pf->cpp, "No entry for persistent MAC address\n");
@@ -549,22 +550,37 @@ nfp_abm_vnic_set_mac(struct nfp_pf *pf, struct nfp_abm *abm, struct nfp_net *nn,
return;
}
 
-   snprintf(name, sizeof(name), "eth%u.mac.pf%u",
+   snprintf(hwinfo, sizeof(hwinfo), "eth%u.mac.pf%u",
 eth_port->eth_index, abm->pf_id);
 
-   mac_str = nfp_hwinfo_lookup(pf->hwinfo, name);
-   if (!mac_str) {
-   nfp_warn(pf->cpp, "Can't lookup persistent MAC address (%s)\n",
-name);
+   nsp = nfp_nsp_open(pf->cpp);
+   if (IS_ERR(nsp)) {
+   nfp_warn(pf->cpp, "Failed to access the NSP for persistent MAC address: %ld\n",
+PTR_ERR(nsp));
+   eth_hw_addr_random(nn->dp.netdev);
+   return;
+   }
+
+   if (!nfp_nsp_has_hwinfo_lookup(nsp)) {
+   nfp_warn(pf->cpp, "NSP doesn't support PF MAC generation\n");
+   eth_hw_addr_random(nn->dp.netdev);
+   return;
+   }
+
+   err = nfp_nsp_hwinfo_lookup(nsp, hwinfo, sizeof(hwinfo));
+   nfp_nsp_close(nsp);
+   if (err) {
+   nfp_warn(pf->cpp, "Reading persistent MAC address failed: %d\n",
+err);
eth_hw_addr_random(nn->dp.netdev);
return;
}
 
-   if (sscanf(mac_str, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
+   if (sscanf(hwinfo, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
   &mac_addr[0], &mac_addr[1], &mac_addr[2],
   &mac_addr[3], &mac_addr[4], &mac_addr[5]) != 6) {
nfp_warn(pf->cpp, "Can't parse persistent MAC address (%s)\n",
-mac_str);
+hwinfo);
eth_hw_addr_random(nn->dp.netdev);
return;
}
-- 
2.17.1
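Whichever way the HWinfo string is obtained, the driver ends up parsing a colon-separated MAC address with `sscanf()` and falling back to a random address on failure. The sketch below isolates that parsing step. `parse_mac()` is a hypothetical helper. The trailing `%c` check, which rejects extra characters after the sixth octet, is an addition of the sketch and makes it stricter than the driver.

```c
#include <stdio.h>

/* Parse "aa:bb:cc:dd:ee:ff" into mac[]. Returns 0 on success, -1 otherwise. */
static int parse_mac(const char *s, unsigned char mac[6])
{
	char trailing;

	/* sscanf() returns the number of successful conversions: exactly 6
	 * means six octets and nothing after them, 7 means trailing junk. */
	if (sscanf(s, "%2hhx:%2hhx:%2hhx:%2hhx:%2hhx:%2hhx%c",
		   &mac[0], &mac[1], &mac[2], &mac[3], &mac[4], &mac[5],
		   &trailing) != 6)
		return -1;
	return 0;
}
```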



[PATCH net-next 02/15] nfp: attempt FW load from flash

2018-08-28 Thread Jakub Kicinski
The flash may contain a default NFP application FW.  This application
can either be put there by the user (with ethtool -f) or shipped
with the card.  If no FW is found on the file system, attempt to load
the app FW stored in flash.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c| 6 --
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 6 ++
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 6 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 4a540c5e27fe..61c22c2935d4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -441,8 +441,11 @@ nfp_fw_load(struct pci_dev *pdev, struct nfp_pf *pf, struct nfp_nsp *nsp)
}
 
fw = nfp_net_fw_find(pdev, pf);
-   if (!fw)
+   if (!fw) {
+   if (nfp_nsp_has_stored_fw_load(nsp))
+   nfp_nsp_load_stored_fw(nsp);
return 0;
+   }
 
dev_info(&pdev->dev, "Soft-reset, loading FW image\n");
err = nfp_nsp_device_soft_reset(nsp);
@@ -453,7 +456,6 @@ nfp_fw_load(struct pci_dev *pdev, struct nfp_pf *pf, struct nfp_nsp *nsp)
}
 
err = nfp_nsp_load_fw(nsp, fw);
-
if (err < 0) {
dev_err(&pdev->dev, "FW loading failed: %d\n", err);
goto exit_release_fw;
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index e1a14f4e5e71..0cdaa1fd6bcf 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -100,6 +100,7 @@ enum nfp_nsp_cmd {
SPCODE_NSP_WRITE_FLASH  = 11, /* Load and flash image from buffer */
SPCODE_NSP_SENSORS  = 12, /* Read NSP sensor(s) */
SPCODE_NSP_IDENTIFY = 13, /* Read NSP version */
+   SPCODE_FW_STORED= 16, /* If no FW loaded, load flash app FW */
 };
 
 static const struct {
@@ -618,3 +619,8 @@ int nfp_nsp_read_sensors(struct nfp_nsp *state, unsigned int sensor_mask,
 
return nfp_nsp_command_buf(state, &sensors);
 }
+
+int nfp_nsp_load_stored_fw(struct nfp_nsp *state)
+{
+   return nfp_nsp_command(state, SPCODE_FW_STORED);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index f23d9e06f097..65f2d4a6de02 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -50,12 +50,18 @@ int nfp_nsp_device_soft_reset(struct nfp_nsp *state);
 int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw);
 int nfp_nsp_write_flash(struct nfp_nsp *state, const struct firmware *fw);
 int nfp_nsp_mac_reinit(struct nfp_nsp *state);
+int nfp_nsp_load_stored_fw(struct nfp_nsp *state);
 
 static inline bool nfp_nsp_has_mac_reinit(struct nfp_nsp *state)
 {
return nfp_nsp_get_abi_ver_minor(state) > 20;
 }
 
+static inline bool nfp_nsp_has_stored_fw_load(struct nfp_nsp *state)
+{
+   return nfp_nsp_get_abi_ver_minor(state) > 23;
+}
+
 enum nfp_eth_interface {
NFP_INTERFACE_NONE  = 0,
NFP_INTERFACE_SFP   = 1,
-- 
2.17.1
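After this patch, `nfp_fw_load()` prefers FW found on the host file system. Only when none is found, and the NSP ABI minor version is above 23, does it ask the NSP to load the app FW stored in flash (`SPCODE_FW_STORED`). The sketch below restates that decision as a pure function so the gating is easy to see. `pick_fw_source()` and the enum are hypothetical names; the driver itself does not return such a value.

```c
#include <stdbool.h>

/* Same gate as nfp_nsp_has_stored_fw_load() in the patch. */
static bool nsp_has_stored_fw_load(unsigned int abi_minor)
{
	return abi_minor > 23;
}

enum fw_source { FW_SRC_NONE, FW_SRC_HOST, FW_SRC_FLASH };

static enum fw_source pick_fw_source(bool host_fw_found, unsigned int abi_minor)
{
	if (host_fw_found)
		return FW_SRC_HOST;	/* soft reset, then nfp_nsp_load_fw() */
	if (nsp_has_stored_fw_load(abi_minor))
		return FW_SRC_FLASH;	/* nfp_nsp_load_stored_fw() */
	return FW_SRC_NONE;		/* leave the device as it is */
}
```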



[PATCH net-next 00/15] nfp: add NFP5000 support

2018-08-28 Thread Jakub Kicinski
Hi!

Broadly speaking, this series adds support for the NFP5000 and
related products.

First we add support for loading FW from flash.  The management
processor must be able to provide extended log messages when FW
is loaded.  This is needed when the FW selection policy is to compare
the FW on disk with the FW in flash and load the newer one: the user
should be told which FW was selected.

We use this opportunity to add extended errors for normal FW loading
as well.

Next we add support for requesting HW information from the management
processor.  Up until now the driver read the HWinfo as it appears in
card memory, but in some cases the management processor has
additional information or generates the entries dynamically, so
occasionally we have to consult it.  We use this to look up MAC
addresses for PCIe netdevs.

Next comes the actual NFP5000 support patch, along with a small dose
of PCIe init refactoring.

The remaining patches add support for reading RTsym types we didn't
need before: symbols explicitly placed in the external memory unit's
(EMU) cache, and absolute symbols.

This part begins with a patch that moves the logic which figures out
the correct bit offsets to device probe, to avoid redoing the
calculation for each access.  The second patch adds error messages
for easier troubleshooting.  The next patch adds helpers which take
care of address conversions to reach into the EMU cache.
Subsequently, users are migrated from the raw CPP API to the new RTsym
helpers.  Finally, we add support for reading absolute symbols.


Jakub Kicinski (15):
  nfp: encapsulate NSP command arguments into structs
  nfp: attempt FW load from flash
  nfp: interpret extended FW load result codes
  nfp: add support for indirect HWinfo lookup
  nfp: abm: look up MAC addresses via management FW
  nfp: add support for NFP5000
  nfp: refactor the per-chip PCIe config
  nfp: save the MU locality field offset
  nfp: add basic errors messages to target logic
  nfp: add RTsym access helpers
  nfp: pass cpp_id to nfp_cpp_map_area()
  nfp: convert existing RTsym helpers to full target decoding
  nfp: convert all RTsym users to use new read/write helpers
  nfp: support access to absolute RTsyms
  nfp: make RTsym users handle absolute symbols correctly

 drivers/net/ethernet/netronome/nfp/abm/ctrl.c |  32 +-
 drivers/net/ethernet/netronome/nfp/abm/main.c |  34 +-
 drivers/net/ethernet/netronome/nfp/nfp_main.c |  44 +--
 .../netronome/nfp/nfp_net_debugdump.c |  50 +--
 .../net/ethernet/netronome/nfp/nfp_net_main.c |   8 +-
 .../netronome/nfp/nfpcore/nfp6000_pcie.c  |  50 ++-
 .../ethernet/netronome/nfp/nfpcore/nfp_cpp.h  |  12 +-
 .../netronome/nfp/nfpcore/nfp_cppcore.c   |  36 ++
 .../netronome/nfp/nfpcore/nfp_cpplib.c|  12 +-
 .../ethernet/netronome/nfp/nfpcore/nfp_nffw.c |  32 +-
 .../ethernet/netronome/nfp/nfpcore/nfp_nffw.h |  38 +-
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.c  | 330 ++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.h  |  12 +
 .../netronome/nfp/nfpcore/nfp_rtsym.c | 216 +++-
 .../netronome/nfp/nfpcore/nfp_target.c|  12 +-
 15 files changed, 682 insertions(+), 236 deletions(-)

-- 
2.17.1



[PATCH net-next 15/15] nfp: make RTsym users handle absolute symbols correctly

2018-08-28 Thread Jakub Kicinski
Make RTsym users access the size via the nfp_rtsym_size() helper,
which takes care of the special handling of absolute symbols.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 drivers/net/ethernet/netronome/nfp/abm/ctrl.c |  4 +-
 drivers/net/ethernet/netronome/nfp/nfp_main.c |  6 +--
 .../netronome/nfp/nfp_net_debugdump.c | 43 +++
 3 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
index 53fb40aa83db..5b06f07c78cd 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
@@ -280,10 +280,10 @@ nfp_abm_ctrl_find_rtsym(struct nfp_pf *pf, const char *name, unsigned int size)
nfp_err(pf->cpp, "Symbol '%s' not found\n", name);
return ERR_PTR(-ENOENT);
}
-   if (sym->size != size) {
+   if (nfp_rtsym_size(sym) != size) {
nfp_err(pf->cpp,
"Symbol '%s' wrong size: expected %u got %llu\n",
-   name, size, sym->size);
+   name, size, nfp_rtsym_size(sym));
return ERR_PTR(-EINVAL);
}
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 61b4b2055784..9474a4eed8ce 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -124,7 +124,7 @@ int nfp_mbox_cmd(struct nfp_pf *pf, u32 cmd, void *in_data, u64 in_length,
if (!pf->mbox)
return -EOPNOTSUPP;
 
-   max_data_sz = pf->mbox->size - NFP_MBOX_SYM_MIN_SIZE;
+   max_data_sz = nfp_rtsym_size(pf->mbox) - NFP_MBOX_SYM_MIN_SIZE;
 
/* Check if cmd field is clear */
err = nfp_rtsym_readl(pf->cpp, pf->mbox, NFP_MBOX_CMD, &val);
@@ -566,9 +566,9 @@ static int nfp_pf_find_rtsyms(struct nfp_pf *pf)
/* Optional per-PCI PF mailbox */
snprintf(pf_symbol, sizeof(pf_symbol), NFP_MBOX_SYM_NAME, pf_id);
pf->mbox = nfp_rtsym_lookup(pf->rtbl, pf_symbol);
-   if (pf->mbox && pf->mbox->size < NFP_MBOX_SYM_MIN_SIZE) {
+   if (pf->mbox && nfp_rtsym_size(pf->mbox) < NFP_MBOX_SYM_MIN_SIZE) {
nfp_err(pf->cpp, "PF mailbox symbol too small: %llu < %d\n",
-   pf->mbox->size, NFP_MBOX_SYM_MIN_SIZE);
+   nfp_rtsym_size(pf->mbox), NFP_MBOX_SYM_MIN_SIZE);
return -EINVAL;
}
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c b/drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c
index 242c9363e9e8..b6b897840ac5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c
@@ -188,21 +188,21 @@ nfp_net_dump_load_dumpspec(struct nfp_cpp *cpp, struct nfp_rtsym_table *rtbl)
const struct nfp_rtsym *specsym;
struct nfp_dumpspec *dumpspec;
int bytes_read;
+   u64 sym_size;
 
specsym = nfp_rtsym_lookup(rtbl, NFP_DUMP_SPEC_RTSYM);
if (!specsym)
return NULL;
+   sym_size = nfp_rtsym_size(specsym);
 
/* expected size of this buffer is in the order of tens of kilobytes */
-   dumpspec = vmalloc(sizeof(*dumpspec) + specsym->size);
+   dumpspec = vmalloc(sizeof(*dumpspec) + sym_size);
if (!dumpspec)
return NULL;
+   dumpspec->size = sym_size;
 
-   dumpspec->size = specsym->size;
-
-   bytes_read = nfp_rtsym_read(cpp, specsym, 0, dumpspec->data,
-   specsym->size);
-   if (bytes_read != specsym->size) {
+   bytes_read = nfp_rtsym_read(cpp, specsym, 0, dumpspec->data, sym_size);
+   if (bytes_read != sym_size) {
vfree(dumpspec);
nfp_warn(cpp, "Debug dump specification read failed.\n");
return NULL;
@@ -262,7 +262,6 @@ nfp_calc_rtsym_dump_sz(struct nfp_pf *pf, struct nfp_dump_tl *spec)
struct nfp_dumpspec_rtsym *spec_rtsym;
const struct nfp_rtsym *sym;
u32 tl_len, key_len;
-   u32 size;
 
spec_rtsym = (struct nfp_dumpspec_rtsym *)spec;
tl_len = be32_to_cpu(spec->length);
@@ -274,13 +273,8 @@ nfp_calc_rtsym_dump_sz(struct nfp_pf *pf, struct nfp_dump_tl *spec)
if (!sym)
return nfp_dump_error_tlv_size(spec);
 
-   if (sym->type == NFP_RTSYM_TYPE_ABS)
-   size = sizeof(sym->addr);
-   else
-   size = sym->size;
-
return ALIGN8(offsetof(struct nfp_dump_rtsym, rtsym) + key_len + 1) +
-  ALIGN8(size);
+  ALIGN8(nfp_rtsym_size(sym));
 }
 
 static int
@@ -652,11 +646,7 @@ nfp_dump_single_rtsym(struct nfp_pf *pf, struct nfp_dumpspec_rtsym *spec,
if (!sym)
return nfp_dump_error_tlv(&spec->tl, -ENOENT, dump);
 
-   if (sym->type == NFP_RTSYM_TYPE_ABS)
-   sym_

[PATCH net-next 13/15] nfp: convert all RTsym users to use new read/write helpers

2018-08-28 Thread Jakub Kicinski
Convert all users of RTsym to the new set of helpers which
handle all targets correctly.
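A sketch of what changes for callers. The new helpers take an offset relative to the symbol instead of an absolute CPP address, so nfp_abm_ctrl_stat() only computes the per-queue offset. The function names below are hypothetical. The arithmetic follows the removed and added lines in abm/ctrl.c, and the base-address addition mirrors `*addr = sym->addr + off` in nfp_rtsym_to_dest().

```c
#include <stdint.h>

/* Hypothetical helper: offset of queue i's counter inside the stats
 * symbol, as computed for sym_offset in nfp_abm_ctrl_stat(). The sum is
 * widened to 64 bits before multiplying. */
static uint64_t q_stat_offset(unsigned int queue_base, unsigned int i,
			      unsigned int stride, unsigned int offset)
{
	return (uint64_t)(queue_base + i) * stride + offset;
}

/* Before this patch the caller added the symbol base itself; the rtsym
 * helpers now do that internally when resolving the destination. */
static uint64_t old_absolute_addr(uint64_t sym_addr, unsigned int queue_base,
				  unsigned int i, unsigned int stride,
				  unsigned int offset)
{
	return sym_addr + q_stat_offset(queue_base, i, stride, offset);
}
```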

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 drivers/net/ethernet/netronome/nfp/abm/ctrl.c | 28 ++-
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 28 ---
 .../netronome/nfp/nfp_net_debugdump.c | 13 ++---
 3 files changed, 23 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
index b157ccd8c80f..53fb40aa83db 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
@@ -55,30 +55,21 @@
 #define NFP_QMSTAT_DROP 16
 #define NFP_QMSTAT_ECN 24
 
-static unsigned long long
-nfp_abm_q_lvl_thrs(struct nfp_abm_link *alink, unsigned int queue)
-{
-   return alink->abm->q_lvls->addr +
-   (alink->queue_base + queue) * NFP_QLVL_STRIDE + NFP_QLVL_THRS;
-}
-
 static int
 nfp_abm_ctrl_stat(struct nfp_abm_link *alink, const struct nfp_rtsym *sym,
  unsigned int stride, unsigned int offset, unsigned int i,
  bool is_u64, u64 *res)
 {
struct nfp_cpp *cpp = alink->abm->app->cpp;
-   u32 val32, mur;
-   u64 val, addr;
+   u64 val, sym_offset;
+   u32 val32;
int err;
 
-   mur = NFP_CPP_ATOMIC_RD(sym->target, sym->domain);
-
-   addr = sym->addr + (alink->queue_base + i) * stride + offset;
+   sym_offset = (alink->queue_base + i) * stride + offset;
if (is_u64)
-   err = nfp_cpp_readq(cpp, mur, addr, &val);
+   err = __nfp_rtsym_readq(cpp, sym, 3, 0, sym_offset, &val);
else
-   err = nfp_cpp_readl(cpp, mur, addr, &val32);
+   err = __nfp_rtsym_readl(cpp, sym, 3, 0, sym_offset, &val32);
if (err) {
nfp_err(cpp,
"RED offload reading stat failed on vNIC %d queue %d\n",
@@ -114,13 +105,12 @@ nfp_abm_ctrl_stat_all(struct nfp_abm_link *alink, const struct nfp_rtsym *sym,
 int nfp_abm_ctrl_set_q_lvl(struct nfp_abm_link *alink, unsigned int i, u32 val)
 {
struct nfp_cpp *cpp = alink->abm->app->cpp;
-   u32 muw;
+   u64 sym_offset;
int err;
 
-   muw = NFP_CPP_ATOMIC_WR(alink->abm->q_lvls->target,
-   alink->abm->q_lvls->domain);
-
-   err = nfp_cpp_writel(cpp, muw, nfp_abm_q_lvl_thrs(alink, i), val);
+   sym_offset = (alink->queue_base + i) * NFP_QLVL_STRIDE + NFP_QLVL_THRS;
+   err = __nfp_rtsym_writel(cpp, alink->abm->q_lvls, 4, 0,
+sym_offset, val);
if (err) {
nfp_err(cpp, "RED offload setting level failed on vNIC %d queue %d\n",
alink->id, i);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index b0f1c313fee0..61b4b2055784 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -116,23 +116,18 @@ nfp_pf_map_rtsym(struct nfp_pf *pf, const char *name, const char *sym_fmt,
 int nfp_mbox_cmd(struct nfp_pf *pf, u32 cmd, void *in_data, u64 in_length,
 void *out_data, u64 out_length)
 {
-   unsigned long long addr;
unsigned long err_at;
u64 max_data_sz;
u32 val = 0;
-   u32 cpp_id;
int n, err;
 
if (!pf->mbox)
return -EOPNOTSUPP;
 
-   cpp_id = NFP_CPP_ISLAND_ID(pf->mbox->target, NFP_CPP_ACTION_RW, 0,
-  pf->mbox->domain);
-   addr = pf->mbox->addr;
max_data_sz = pf->mbox->size - NFP_MBOX_SYM_MIN_SIZE;
 
/* Check if cmd field is clear */
-   err = nfp_cpp_readl(pf->cpp, cpp_id, addr + NFP_MBOX_CMD, &val);
+   err = nfp_rtsym_readl(pf->cpp, pf->mbox, NFP_MBOX_CMD, &val);
if (err || val) {
nfp_warn(pf->cpp, "failed to issue command (%u): %u, err: %d\n",
 cmd, val, err);
@@ -140,30 +135,29 @@ int nfp_mbox_cmd(struct nfp_pf *pf, u32 cmd, void *in_data, u64 in_length,
}
 
in_length = min(in_length, max_data_sz);
-   n = nfp_cpp_write(pf->cpp, cpp_id, addr + NFP_MBOX_DATA,
- in_data, in_length);
+   n = nfp_rtsym_write(pf->cpp, pf->mbox, NFP_MBOX_DATA, in_data,
+   in_length);
if (n != in_length)
return -EIO;
/* Write data_len and wipe reserved */
-   err = nfp_cpp_writeq(pf->cpp, cpp_id, addr + NFP_MBOX_DATA_LEN,
-in_length);
+   err = nfp_rtsym_writeq(pf->cpp, pf->mbox, NFP_MBOX_DATA_LEN, in_length);
if (err)
return err;
 
/* Read back for ordering */
-   err = nfp_cpp_readl(pf->cpp, cpp_id, addr + NFP_MBOX_DATA_LEN, &val);
+   err = nfp_rtsym_readl(pf->cpp, pf->mbox, NFP_MBOX_DATA_LEN, &val);
  

[PATCH net-next 08/15] nfp: save the MU locality field offset

2018-08-28 Thread Jakub Kicinski
We will soon need the MU locality field offset much more
often than just for decoding the MIP address.  Save it in nfp_cpp
for quick access.  Note that the target config is already available
in nfp_cpp, so there is no need to do the XPB read.
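For reference, here is a sketch of the field extraction that moves into nfp_cpp_set_mu_locality_lsb(). It uses the bit positions from the NFP_IMB_TGTADDRESSMODECFG_* macros in this patch. The struct and function names are hypothetical. The sketch stops at decoding, because the mapping from (mode, addr40) to a bit offset done by nfp_cppat_mu_locality_lsb() is not shown in this patch.

```c
#include <stdint.h>

/* Bit layout from the NFP_IMB_TGTADDRESSMODECFG_* macros:
 * bits 15:13 hold the mode, bit 12 selects 40-bit addressing. */
#define IMB_MODE_of(x)		(((x) >> 13) & 0x7)
#define IMB_ADDRMODE_40BIT	(1u << 12)

struct imb_cppat_fields {
	unsigned int mode;
	unsigned int addr40;
};

/* Decode the MU entry of the IMB CPP address translation table. */
static struct imb_cppat_fields imb_cppat_decode(uint32_t imbcppat)
{
	struct imb_cppat_fields f;

	f.mode = IMB_MODE_of(imbcppat);
	f.addr40 = !!(imbcppat & IMB_ADDRMODE_40BIT);
	return f;
}
```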

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_cpp.h  |  1 +
 .../netronome/nfp/nfpcore/nfp_cppcore.c   | 36 +++
 .../ethernet/netronome/nfp/nfpcore/nfp_nffw.c | 32 +
 3 files changed, 38 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index af19fe9f4934..991b8ed7e036 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -233,6 +233,7 @@ void nfp_cpp_free(struct nfp_cpp *cpp);
 u32 nfp_cpp_model(struct nfp_cpp *cpp);
 u16 nfp_cpp_interface(struct nfp_cpp *cpp);
 int nfp_cpp_serial(struct nfp_cpp *cpp, const u8 **serial);
+unsigned int nfp_cpp_mu_locality_lsb(struct nfp_cpp *cpp);
 
 struct nfp_cpp_area *nfp_cpp_area_alloc_with_name(struct nfp_cpp *cpp,
  u32 cpp_id,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index 73de57a09800..f7e1d79e735f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -75,6 +75,7 @@ struct nfp_cpp_resource {
  * @interface: chip interface id we are using to reach it
  * @serial:chip serial number
  * @imb_cat_table: CPP Mapping Table
+ * @mu_locality_lsb:   MU access type bit offset
  *
  * Following fields use explicit locking:
  * @resource_list: NFP CPP resource list
@@ -100,6 +101,7 @@ struct nfp_cpp {
wait_queue_head_t waitq;
 
u32 imb_cat_table[16];
+   unsigned int mu_locality_lsb;
 
struct mutex area_cache_mutex;
struct list_head area_cache_list;
@@ -266,6 +268,34 @@ int nfp_cpp_serial(struct nfp_cpp *cpp, const u8 **serial)
return sizeof(cpp->serial);
 }
 
+#define NFP_IMB_TGTADDRESSMODECFG_MODE_of(_x)  (((_x) >> 13) & 0x7)
+#define NFP_IMB_TGTADDRESSMODECFG_ADDRMODE BIT(12)
+#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_32_BIT 0
+#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_40_BIT BIT(12)
+
+static int nfp_cpp_set_mu_locality_lsb(struct nfp_cpp *cpp)
+{
+   unsigned int mode, addr40;
+   u32 imbcppat;
+   int res;
+
+   imbcppat = cpp->imb_cat_table[NFP_CPP_TARGET_MU];
+   mode = NFP_IMB_TGTADDRESSMODECFG_MODE_of(imbcppat);
+   addr40 = !!(imbcppat & NFP_IMB_TGTADDRESSMODECFG_ADDRMODE);
+
+   res = nfp_cppat_mu_locality_lsb(mode, addr40);
+   if (res < 0)
+   return res;
+   cpp->mu_locality_lsb = res;
+
+   return 0;
+}
+
+unsigned int nfp_cpp_mu_locality_lsb(struct nfp_cpp *cpp)
+{
+   return cpp->mu_locality_lsb;
+}
+
 /**
  * nfp_cpp_area_alloc_with_name() - allocate a new CPP area
  * @cpp:   CPP device handle
@@ -1241,6 +1271,12 @@ nfp_cpp_from_operations(const struct nfp_cpp_operations *ops,
nfp_cpp_readl(cpp, arm, NFP_ARM_GCSR + NFP_ARM_GCSR_SOFTMODEL3,
  &mask[1]);
 
+   err = nfp_cpp_set_mu_locality_lsb(cpp);
+   if (err < 0) {
+   dev_err(parent, "Can't calculate MU locality bit offset\n");
+   goto err_out;
+   }
+
dev_info(cpp->dev.parent, "Model: 0x%08x, SN: %pM, Ifc: 0x%04x\n",
 nfp_cpp_model(cpp), cpp->serial, nfp_cpp_interface(cpp));
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c
index 40510860341b..a164fbc85cd3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c
@@ -156,29 +156,6 @@ static u64 nffw_fwinfo_mip_offset_get(const struct nffw_fwinfo *fi)
return (mip_off_hi & 0xFF) << 32 | le32_to_cpu(fi->mip_offset_lo);
 }
 
-#define NFP_IMB_TGTADDRESSMODECFG_MODE_of(_x)  (((_x) >> 13) & 0x7)
-#define NFP_IMB_TGTADDRESSMODECFG_ADDRMODE BIT(12)
-#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_32_BIT 0
-#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_40_BIT BIT(12)
-
-static int nfp_mip_mu_locality_lsb(struct nfp_cpp *cpp)
-{
-   unsigned int mode, addr40;
-   u32 xpbaddr, imbcppat;
-   int err;
-
-   /* Hardcoded XPB IMB Base, island 0 */
-   xpbaddr = 0x000a + NFP_CPP_TARGET_MU * 4;
-   err = nfp_xpb_readl(cpp, xpbaddr, &imbcppat);
-   if (err < 0)
-   return err;
-
-   mode = NFP_IMB_TGTADDRESSMODECFG_MODE_of(imbcppat);
-   addr40 = !!(imbcppat & NFP_IMB_TGTADDRESSMODECFG_ADDRMODE);
-
-   return nfp_cppat_mu_locality_lsb(mode, addr40);
-}
-
 static unsigned int
 nffw_res_

[PATCH net-next 12/15] nfp: convert existing RTsym helpers to full target decoding

2018-08-28 Thread Jakub Kicinski
Make nfp_rtsym_{read,write}_le() and nfp_rtsym_map() use the new
target resolution helpers to allow accessing in-cache symbols.
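nfp_rtsym_read_le() still dispatches on the symbol size and only accepts 4- and 8-byte symbols. The following hypothetical model captures that contract. Instead of going through CPP, it reads a little-endian value from a plain byte buffer:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the symbol's backing memory is a byte buffer
 * holding a little-endian value. Like the switch (sym->size) in
 * nfp_rtsym_read_le(), only 4- and 8-byte symbols are accepted. */
static int read_le_sketch(const uint8_t *mem, size_t size, uint64_t *val)
{
	uint64_t v = 0;
	size_t i;

	if (size != 4 && size != 8)
		return -EINVAL;

	/* Assemble the value least significant byte first. */
	for (i = 0; i < size; i++)
		v |= (uint64_t)mem[i] << (8 * i);
	*val = v;
	return 0;
}
```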

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 .../netronome/nfp/nfpcore/nfp_rtsym.c | 28 +--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 4d98905c0651..28e5ed0bb31d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -395,7 +395,7 @@ u64 nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
  int *error)
 {
const struct nfp_rtsym *sym;
-   u32 val32, id;
+   u32 val32;
u64 val;
int err;
 
@@ -405,15 +405,13 @@ u64 nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
goto exit;
}
 
-   id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0, sym->domain);
-
switch (sym->size) {
case 4:
-   err = nfp_cpp_readl(rtbl->cpp, id, sym->addr, &val32);
+   err = nfp_rtsym_readl(rtbl->cpp, sym, 0, &val32);
val = val32;
break;
case 8:
-   err = nfp_cpp_readq(rtbl->cpp, id, sym->addr, &val);
+   err = nfp_rtsym_readq(rtbl->cpp, sym, 0, &val);
break;
default:
nfp_err(rtbl->cpp,
@@ -449,20 +447,17 @@ int nfp_rtsym_write_le(struct nfp_rtsym_table *rtbl, const char *name,
 {
const struct nfp_rtsym *sym;
int err;
-   u32 id;
 
sym = nfp_rtsym_lookup(rtbl, name);
if (!sym)
return -ENOENT;
 
-   id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0, sym->domain);
-
switch (sym->size) {
case 4:
-   err = nfp_cpp_writel(rtbl->cpp, id, sym->addr, value);
+   err = nfp_rtsym_writel(rtbl->cpp, sym, 0, value);
break;
case 8:
-   err = nfp_cpp_writeq(rtbl->cpp, id, sym->addr, value);
+   err = nfp_rtsym_writeq(rtbl->cpp, sym, 0, value);
break;
default:
nfp_err(rtbl->cpp,
@@ -482,21 +477,26 @@ nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name, const char *id,
const struct nfp_rtsym *sym;
u8 __iomem *mem;
u32 cpp_id;
+   u64 addr;
+   int err;
 
sym = nfp_rtsym_lookup(rtbl, name);
if (!sym)
return (u8 __iomem *)ERR_PTR(-ENOENT);
 
-   cpp_id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0,
-  sym->domain);
+   err = nfp_rtsym_to_dest(rtbl->cpp, sym, NFP_CPP_ACTION_RW, 0, 0,
+   &cpp_id, &addr);
+   if (err) {
+   nfp_err(rtbl->cpp, "Symbol %s mapping failed\n", name);
+   return (u8 __iomem *)ERR_PTR(err);
+   }
 
if (sym->size < min_size) {
nfp_err(rtbl->cpp, "Symbol %s too small\n", name);
return (u8 __iomem *)ERR_PTR(-EINVAL);
}
 
-   mem = nfp_cpp_map_area(rtbl->cpp, id, cpp_id, sym->addr,
-  sym->size, area);
+   mem = nfp_cpp_map_area(rtbl->cpp, id, cpp_id, addr, sym->size, area);
if (IS_ERR(mem)) {
nfp_err(rtbl->cpp, "Failed to map symbol %s: %ld\n",
name, PTR_ERR(mem));
-- 
2.17.1



[PATCH net-next 04/15] nfp: add support for indirect HWinfo lookup

2018-08-28 Thread Jakub Kicinski
Management FW can adjust some of the information in the HWinfo table
at runtime.  In some cases reading the table directly will not yield
correct results.  Add an NSP command for looking up information.
Up until now we weren't making use of any of the values which may
get adjusted.
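Here is a user-space sketch of the checks nfp_nsp_hwinfo_lookup() applies around the NSP call. The request size is clamped to the 12-bit NFP_HWINFO_LOOKUP_SIZE field. The same buffer carries the lookup key in and the value out, so the reply is rejected if it is not NUL-terminated. The helper names are hypothetical. memchr() is used as an equivalent of the strnlen(buf, size) == size test.

```c
#include <errno.h>
#include <string.h>

/* NFP_HWINFO_LOOKUP_SIZE is GENMASK(11, 0), i.e. 0xfff (4095). */
#define HWINFO_LOOKUP_SIZE_SKETCH 0xfffu

/* Clamp the caller's buffer size, like min_t(u32, size, ...) in the patch. */
static unsigned int hwinfo_clamp_size(unsigned int size)
{
	return size < HWINFO_LOOKUP_SIZE_SKETCH ? size : HWINFO_LOOKUP_SIZE_SKETCH;
}

/* Reject a reply with no NUL in the first size bytes; equivalent to
 * the strnlen(buf, size) == size check in nfp_nsp_hwinfo_lookup(). */
static int hwinfo_check_reply(const char *buf, unsigned int size)
{
	if (!memchr(buf, '\0', size))
		return -EINVAL;
	return 0;
}
```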

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.c  | 38 +++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.h  |  6 +++
 2 files changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 9eb7b5a91bb1..bf593a6b26a1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -90,6 +90,8 @@
 #define NFP_FW_LOAD_RET_MAJOR  GENMASK(15, 8)
 #define NFP_FW_LOAD_RET_MINOR  GENMASK(23, 16)
 
+#define NFP_HWINFO_LOOKUP_SIZE GENMASK(11, 0)
+
 enum nfp_nsp_cmd {
SPCODE_NOOP = 0, /* No operation */
SPCODE_SOFT_RESET   = 1, /* Soft reset the NFP */
@@ -104,6 +106,7 @@ enum nfp_nsp_cmd {
SPCODE_NSP_SENSORS  = 12, /* Read NSP sensor(s) */
SPCODE_NSP_IDENTIFY = 13, /* Read NSP version */
SPCODE_FW_STORED= 16, /* If no FW loaded, load flash app FW */
+   SPCODE_HWINFO_LOOKUP= 17, /* Lookup HWinfo with overwrites etc. */
 };
 
 static const struct {
@@ -703,3 +706,38 @@ int nfp_nsp_load_stored_fw(struct nfp_nsp *state)
nfp_nsp_load_fw_extended_msg(state, ret);
return 0;
 }
+
+static int
+__nfp_nsp_hwinfo_lookup(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+   struct nfp_nsp_command_buf_arg hwinfo_lookup = {
+   {
+   .code   = SPCODE_HWINFO_LOOKUP,
+   .option = size,
+   },
+   .in_buf = buf,
+   .in_size= size,
+   .out_buf= buf,
+   .out_size   = size,
+   };
+
+   return nfp_nsp_command_buf(state, &hwinfo_lookup);
+}
+
+int nfp_nsp_hwinfo_lookup(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+   int err;
+
+   size = min_t(u32, size, NFP_HWINFO_LOOKUP_SIZE);
+
+   err = __nfp_nsp_hwinfo_lookup(state, buf, size);
+   if (err)
+   return err;
+
+   if (strnlen(buf, size) == size) {
+   nfp_err(state->cpp, "NSP HWinfo value not NULL-terminated\n");
+   return -EINVAL;
+   }
+
+   return 0;
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index 65f2d4a6de02..bd6c9071c8e9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -51,6 +51,7 @@ int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw);
 int nfp_nsp_write_flash(struct nfp_nsp *state, const struct firmware *fw);
 int nfp_nsp_mac_reinit(struct nfp_nsp *state);
 int nfp_nsp_load_stored_fw(struct nfp_nsp *state);
+int nfp_nsp_hwinfo_lookup(struct nfp_nsp *state, void *buf, unsigned int size);
 
 static inline bool nfp_nsp_has_mac_reinit(struct nfp_nsp *state)
 {
@@ -62,6 +63,11 @@ static inline bool nfp_nsp_has_stored_fw_load(struct nfp_nsp *state)
return nfp_nsp_get_abi_ver_minor(state) > 23;
 }
 
+static inline bool nfp_nsp_has_hwinfo_lookup(struct nfp_nsp *state)
+{
+   return nfp_nsp_get_abi_ver_minor(state) > 24;
+}
+
 enum nfp_eth_interface {
NFP_INTERFACE_NONE  = 0,
NFP_INTERFACE_SFP   = 1,
-- 
2.17.1



[PATCH net-next 06/15] nfp: add support for NFP5000

2018-08-28 Thread Jakub Kicinski
Add NFP5000 to the supported chips.  The chip is backward compatible
with NFP4000 and NFP6000, so the core PCIe code needs to handle it
the same way as the 4k and 6k.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 4 
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 61c22c2935d4..b0f1c313fee0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -68,6 +68,10 @@ static const struct pci_device_id nfp_pci_device_ids[] = {
  PCI_VENDOR_ID_NETRONOME, PCI_ANY_ID,
  PCI_ANY_ID, 0,
},
+   { PCI_VENDOR_ID_NETRONOME, PCI_DEVICE_ID_NETRONOME_NFP5000,
+ PCI_VENDOR_ID_NETRONOME, PCI_ANY_ID,
+ PCI_ANY_ID, 0,
+   },
{ PCI_VENDOR_ID_NETRONOME, PCI_DEVICE_ID_NETRONOME_NFP4000,
  PCI_VENDOR_ID_NETRONOME, PCI_ANY_ID,
  PCI_ANY_ID, 0,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index c8d0b1016a64..6ef5ac2d0827 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -652,6 +652,7 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
nfp->expl.data = bar->iomem + NFP_PCIE_SRAM + 0x1000;
 
if (nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP4000 ||
+   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP5000 ||
nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP6000) {
nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(0);
} else {
@@ -663,6 +664,7 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
}
 
if (nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP4000 ||
+   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP5000 ||
nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP6000)
expl_groups = 4;
else
@@ -1327,7 +1329,7 @@ struct nfp_cpp *nfp_cpp_from_nfp6000_pcie(struct pci_dev *pdev)
 
/*  Finished with card initialization. */
dev_info(&pdev->dev,
-"Netronome Flow Processor NFP4000/NFP6000 PCIe Card Probe\n");
+"Netronome Flow Processor NFP4000/NFP5000/NFP6000 PCIe Card Probe\n");
pcie_print_link_status(pdev);
 
nfp = kzalloc(sizeof(*nfp), GFP_KERNEL);
-- 
2.17.1



[PATCH net-next 09/15] nfp: add basic errors messages to target logic

2018-08-28 Thread Jakub Kicinski
Add error prints to the CPP target encoding/decoding logic;
otherwise it's quite hard to pinpoint why read or write
operations fail.
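A small user-space illustration of the pr_fmt() pattern this patch adopts. In the kernel, pr_fmt() has to be defined before the printk helpers are pulled in. Every pr_err() in the file then carries the "NFP target: " prefix. pr_err_buf() is a hypothetical stand-in that formats into a buffer so the result can be inspected. The range check is the one from nfp_target_cpp().

```c
#include <errno.h>
#include <stdio.h>

/* Same prefix the patch installs; string-literal concatenation glues it
 * onto every format string passed through pr_fmt(). */
#define pr_fmt(fmt) "NFP target: " fmt

/* Hypothetical stand-in for pr_err() that writes into buf instead. */
#define pr_err_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), __VA_ARGS__)

/* Same range check as nfp_target_cpp(): valid CPP targets are 0..15. */
static int check_target(int target, char *msg, size_t len)
{
	if (target < 0 || target >= 16) {
		pr_err_buf(msg, len, "Invalid CPP target: %d\n", target);
		return -EINVAL;
	}
	return 0;
}
```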

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 .../net/ethernet/netronome/nfp/nfpcore/nfp_target.c  | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
index 4ea1e585d945..f691c6587c76 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c
@@ -39,7 +39,11 @@
  *  Francois H. Theron 
  */
 
+#define pr_fmt(fmt)   "NFP target: " fmt
+
 #include 
+#include 
+#include 
 
 #include "nfp_cpp.h"
 
@@ -733,8 +737,10 @@ int nfp_target_cpp(u32 cpp_island_id, u64 cpp_island_address,
u32 imb;
int err;
 
-   if (target < 0 || target >= 16)
+   if (target < 0 || target >= 16) {
+   pr_err("Invalid CPP target: %d\n", target);
return -EINVAL;
+   }
 
if (island == 0) {
/* Already translated */
@@ -753,8 +759,10 @@ int nfp_target_cpp(u32 cpp_island_id, u64 cpp_island_address,
err = nfp_cppat_addr_encode(cpp_target_address, island, target,
((imb >> 13) & 7), ((imb >> 12) & 1),
((imb >> 6)  & 0x3f), ((imb >> 0)  & 0x3f));
-   if (err)
+   if (err) {
+   pr_err("Can't encode CPP address: %d\n", err);
return err;
+   }
 
*cpp_target_id = NFP_CPP_ID(target,
NFP_CPP_ID_ACTION_of(cpp_island_id),
-- 
2.17.1



[PATCH net-next 11/15] nfp: pass cpp_id to nfp_cpp_map_area()

2018-08-28 Thread Jakub Kicinski
Align nfp_cpp_map_area() with other CPP-level APIs and pass the
encoded cpp_id/dest rather than the target, action, domain tuple.
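After this change callers encode the CPP ID themselves, as nfp_net_pci_map_mem() now does with NFP_CPP_ISLAND_ID(0, NFP_CPP_ACTION_RW, 0, 0). The sketch below packs the same four fields. The bit layout and the RW action value are assumptions about nfp_cpp.h and are not taken from this patch. Treat them as illustrative only.

```c
#include <stdint.h>

/* Assumed to mirror NFP_CPP_ISLAND_ID() in nfp_cpp.h: target in bits
 * 30:24, token in 23:16, action in 15:8, island in 7:0. */
static uint32_t cpp_island_id(uint32_t target, uint32_t action,
			      uint32_t token, uint32_t island)
{
	return ((target & 0x7f) << 24) | ((token & 0xff) << 16) |
	       ((action & 0xff) << 8) | (island & 0xff);
}

/* Assumed value of NFP_CPP_ACTION_RW (the wildcard read/write action). */
#define CPP_ACTION_RW_SKETCH 32
```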

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c|  8 
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h |  4 ++--
 .../net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c  | 12 
 .../net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c   |  8 ++--
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 28516ee8..0b1ac9c234d1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -470,8 +470,8 @@ static void nfp_net_pci_unmap_mem(struct nfp_pf *pf)
 
 static int nfp_net_pci_map_mem(struct nfp_pf *pf)
 {
+   u32 min_size, cpp_id;
u8 __iomem *mem;
-   u32 min_size;
int err;
 
min_size = pf->max_data_vnics * NFP_PF_CSR_SLICE_SIZE;
@@ -519,9 +519,9 @@ static int nfp_net_pci_map_mem(struct nfp_pf *pf)
pf->vfcfg_tbl2 = NULL;
}
 
-   mem = nfp_cpp_map_area(pf->cpp, "net.qc", 0, 0,
-  NFP_PCIE_QUEUE(0), NFP_QCP_QUEUE_AREA_SZ,
-  &pf->qc_area);
+   cpp_id = NFP_CPP_ISLAND_ID(0, NFP_CPP_ACTION_RW, 0, 0);
+   mem = nfp_cpp_map_area(pf->cpp, "net.qc", cpp_id, NFP_PCIE_QUEUE(0),
+  NFP_QCP_QUEUE_AREA_SZ, &pf->qc_area);
if (IS_ERR(mem)) {
nfp_err(pf->cpp, "Failed to map Queue Controller area.\n");
err = PTR_ERR(mem);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index 991b8ed7e036..123e29cba6d1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -294,8 +294,8 @@ int nfp_cpp_writeq(struct nfp_cpp *cpp, u32 cpp_id,
   unsigned long long address, u64 value);
 
 u8 __iomem *
-nfp_cpp_map_area(struct nfp_cpp *cpp, const char *name, int domain, int target,
-u64 addr, unsigned long size, struct nfp_cpp_area **area);
+nfp_cpp_map_area(struct nfp_cpp *cpp, const char *name, u32 cpp_id, u64 addr,
+unsigned long size, struct nfp_cpp_area **area);
 
 struct nfp_cpp_mutex;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c
index 20bad05e2e92..03fcde5fa137 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c
@@ -294,8 +294,7 @@ int nfp_cpp_explicit_write(struct nfp_cpp *cpp, u32 cpp_id, u64 addr,
  * nfp_cpp_map_area() - Helper function to map an area
  * @cpp:NFP CPP handler
  * @name:   Name for the area
- * @domain: CPP domain
- * @target: CPP target
+ * @cpp_id: CPP ID for operation
  * @addr:   CPP address
  * @size:   Size of the area
  * @area:   Area handle (output)
@@ -306,15 +305,12 @@ int nfp_cpp_explicit_write(struct nfp_cpp *cpp, u32 cpp_id, u64 addr,
  * Return: Pointer to memory mapped area or ERR_PTR
  */
 u8 __iomem *
-nfp_cpp_map_area(struct nfp_cpp *cpp, const char *name, int domain, int target,
-u64 addr, unsigned long size, struct nfp_cpp_area **area)
+nfp_cpp_map_area(struct nfp_cpp *cpp, const char *name, u32 cpp_id, u64 addr,
+unsigned long size, struct nfp_cpp_area **area)
 {
u8 __iomem *res;
-   u32 dest;
 
-   dest = NFP_CPP_ISLAND_ID(target, NFP_CPP_ACTION_RW, 0, domain);
-
-   *area = nfp_cpp_area_alloc_acquire(cpp, name, dest, addr, size);
+   *area = nfp_cpp_area_alloc_acquire(cpp, name, cpp_id, addr, size);
if (!*area)
goto err_eio;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 1c0b1b11b69f..4d98905c0651 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -481,18 +481,22 @@ nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name, const char *id,
 {
const struct nfp_rtsym *sym;
u8 __iomem *mem;
+   u32 cpp_id;
 
sym = nfp_rtsym_lookup(rtbl, name);
if (!sym)
return (u8 __iomem *)ERR_PTR(-ENOENT);
 
+   cpp_id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0,
+  sym->domain);
+
if (sym->size < min_size) {
nfp_err(rtbl->cpp, "Symbol %s too small\n", name);
return (u8 __iomem *)ERR_PTR(-EINVAL);
}
 
-   mem = nfp_cpp_map_area(rtbl->cpp, id, sym->domain, sym->target,
-  sym->addr, sym->size, area);
+   mem = nfp_cpp_map_area(rtbl->cpp, id, cpp_i

[PATCH net-next 07/15] nfp: refactor the per-chip PCIe config

2018-08-28 Thread Jakub Kicinski
Use a switch statement instead of if chains for code that depends
on the chip version.  While at it, make sure we fail for unknown
chip revisions.
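The resulting per-chip policy can be summarised in a stand-alone sketch. The 0x3800 ID comes from this patch. The 0x4000/0x5000/0x6000 values are assumed to match the corresponding PCI_DEVICE_ID_NETRONOME_NFP* definitions. Both helper names are hypothetical.

```c
#include <errno.h>

/* 0x3800 is defined in this patch; the others are assumed values. */
#define DEV_NFP3800 0x3800
#define DEV_NFP4000 0x4000
#define DEV_NFP5000 0x5000
#define DEV_NFP6000 0x6000

/* Mirrors the second switch in enable_bars(): number of explicit BAR
 * groups, or -EINVAL for a chip the driver does not know. */
static int expl_groups_for(unsigned short device)
{
	switch (device) {
	case DEV_NFP3800:
		return 1;
	case DEV_NFP4000:
	case DEV_NFP5000:
	case DEV_NFP6000:
		return 4;
	default:
		return -EINVAL;
	}
}

/* Index passed to NFP_PCIE_BAR() for the CSR window: the NFP3800 uses
 * its PCI function number (devfn & 7), the larger chips always use 0. */
static int csr_bar_index(unsigned short device, unsigned int devfn)
{
	switch (device) {
	case DEV_NFP3800:
		return devfn & 7;
	case DEV_NFP4000:
	case DEV_NFP5000:
	case DEV_NFP6000:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Returning an error from the default case is what makes an unknown device ID fail the probe instead of silently taking the NFP3800 path, which the old else-branch did.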

Signed-off-by: Jakub Kicinski 
---
 .../netronome/nfp/nfpcore/nfp6000_pcie.c  | 50 ++-
 .../ethernet/netronome/nfp/nfpcore/nfp_cpp.h  |  4 ++
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 6ef5ac2d0827..fd63d83bdea5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -138,6 +138,7 @@
 
 /* The number of explicit BARs to reserve.
  * Minimum is 0, maximum is 4 on the NFP6000.
+ * The NFP3800 can have only one per PF.
  */
 #define NFP_PCIE_EXPLICIT_BARS 2
 
@@ -589,8 +590,8 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
NFP_PCIE_BAR_PCIE2CPP_MapType_EXPLICIT3),
};
char status_msg[196] = {};
+   int i, err, bars_free;
struct nfp_bar *bar;
-   int i, bars_free;
int expl_groups;
char *msg, *end;
 
@@ -643,6 +644,8 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
bar->iomem = ioremap_nocache(nfp_bar_resource_start(bar),
 nfp_bar_resource_len(bar));
if (bar->iomem) {
+   int pf;
+
msg += snprintf(msg, end - msg, "0.0: General/MSI-X SRAM, ");
atomic_inc(&bar->refcnt);
bars_free--;
@@ -651,24 +654,40 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
 
nfp->expl.data = bar->iomem + NFP_PCIE_SRAM + 0x1000;
 
-   if (nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP4000 ||
-   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP5000 ||
-   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP6000) {
-   nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(0);
-   } else {
-   int pf = nfp->pdev->devfn & 7;
-
+   switch (nfp->pdev->device) {
+   case PCI_DEVICE_ID_NETRONOME_NFP3800:
+   pf = nfp->pdev->devfn & 7;
nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(pf);
+   break;
+   case PCI_DEVICE_ID_NETRONOME_NFP4000:
+   case PCI_DEVICE_ID_NETRONOME_NFP5000:
+   case PCI_DEVICE_ID_NETRONOME_NFP6000:
+   nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(0);
+   break;
+   default:
+   dev_err(nfp->dev, "Unsupported device ID: %04hx!\n",
+   nfp->pdev->device);
+   err = -EINVAL;
+   goto err_unmap_bar0;
}
nfp->iomem.em = bar->iomem + NFP_PCIE_EM;
}
 
-   if (nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP4000 ||
-   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP5000 ||
-   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP6000)
-   expl_groups = 4;
-   else
+   switch (nfp->pdev->device) {
+   case PCI_DEVICE_ID_NETRONOME_NFP3800:
expl_groups = 1;
+   break;
+   case PCI_DEVICE_ID_NETRONOME_NFP4000:
+   case PCI_DEVICE_ID_NETRONOME_NFP5000:
+   case PCI_DEVICE_ID_NETRONOME_NFP6000:
+   expl_groups = 4;
+   break;
+   default:
+   dev_err(nfp->dev, "Unsupported device ID: %04hx!\n",
+   nfp->pdev->device);
+   err = -EINVAL;
+   goto err_unmap_bar0;
+   }
 
/* Configure, and lock, BAR0.1 for PCIe XPB (MSI-X PBA) */
bar = &nfp->bar[1];
@@ -713,6 +732,11 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
dev_info(nfp->dev, "%sfree: %d/%d\n", status_msg, bars_free, nfp->bars);
 
return 0;
+
+err_unmap_bar0:
+   if (nfp->bar[0].iomem)
+   iounmap(nfp->bar[0].iomem);
+   return err;
 }
 
 static void disable_bars(struct nfp6000_pcie *nfp)
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index 3b5182143ec7..af19fe9f4934 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -62,6 +62,10 @@
 
 #define PCI_64BIT_BAR_COUNT 3
 
+/* NFP hardware vendor/device ids.
+ */
+#define PCI_DEVICE_ID_NETRONOME_NFP3800 0x3800
+
 #define NFP_CPP_NUM_TARGETS 16
 /* Max size of area it should be safe to request */
 #define NFP_CPP_SAFE_AREA_SIZE SZ_2M
-- 
2.17.1



[PATCH net-next 10/15] nfp: add RTsym access helpers

2018-08-28 Thread Jakub Kicinski
RTsyms may have special encodings for more complex symbol types.
Examples include symbols placed directly in the external memory
unit's cache, constants, or local memory.  Add a set of helpers
which will check for those special encodings and handle them
correctly.

For now only add direct cache accesses; we don't need to access
the other ones in the foreseeable future.
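A sketch of the address rewrite used for symbols in the EMU cache. nfp_rtsym_to_dest() starts from sym->addr + off and clears the access-type field located at nfp_cpp_mu_locality_lsb(). The excerpt is cut off before the new type is written, so setting a "direct" code here is an assumption. The mask and code values are also assumptions.

```c
#include <stdint.h>

/* Hypothetical values for a 2-bit access-type field and its "direct"
 * code; the patch's NFP_MU_ADDR_ACCESS_TYPE_* values are not shown. */
#define ACCESS_TYPE_MASK_SKETCH		3ULL
#define ACCESS_TYPE_DIRECT_SKETCH	2ULL

/* Replace the access-type field that starts at bit locality_off. */
static uint64_t mu_addr_set_access(uint64_t addr, unsigned int locality_off,
				   uint64_t type)
{
	addr &= ~(ACCESS_TYPE_MASK_SKETCH << locality_off);
	addr |= (type & ACCESS_TYPE_MASK_SKETCH) << locality_off;
	return addr;
}
```

Because locality_off is computed once at probe time and cached in nfp_cpp, a rewrite like this costs only a couple of bit operations per access.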

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_nffw.h |  25 +++
 .../netronome/nfp/nfpcore/nfp_rtsym.c | 146 ++
 2 files changed, 171 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
index df599d5b6bb3..04700278d00d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
@@ -98,6 +98,31 @@ const struct nfp_rtsym *nfp_rtsym_get(struct nfp_rtsym_table *rtbl, int idx);
 const struct nfp_rtsym *
 nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name);
 
+int __nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+u8 action, u8 token, u64 off, void *buf, size_t len);
+int nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
+  void *buf, size_t len);
+int __nfp_rtsym_readl(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+ u8 action, u8 token, u64 off, u32 *value);
+int nfp_rtsym_readl(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
+   u32 *value);
+int __nfp_rtsym_readq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+ u8 action, u8 token, u64 off, u64 *value);
+int nfp_rtsym_readq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
+   u64 *value);
+int __nfp_rtsym_write(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+ u8 action, u8 token, u64 off, void *buf, size_t len);
+int nfp_rtsym_write(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
+   void *buf, size_t len);
+int __nfp_rtsym_writel(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+  u8 action, u8 token, u64 off, u32 value);
+int nfp_rtsym_writel(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
+u32 value);
+int __nfp_rtsym_writeq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+  u8 action, u8 token, u64 off, u64 value);
+int nfp_rtsym_writeq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
+u64 value);
+
 u64 nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
  int *error);
 int nfp_rtsym_write_le(struct nfp_rtsym_table *rtbl, const char *name,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 9e34216578da..1c0b1b11b69f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -233,6 +233,152 @@ nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name)
return NULL;
 }
 
+static int
+nfp_rtsym_to_dest(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+ u8 action, u8 token, u64 off, u32 *cpp_id, u64 *addr)
+{
+   *addr = sym->addr + off;
+
+   if (sym->target == NFP_RTSYM_TARGET_EMU_CACHE) {
+   int locality_off = nfp_cpp_mu_locality_lsb(cpp);
+
+   *addr &= ~(NFP_MU_ADDR_ACCESS_TYPE_MASK << locality_off);
+   *addr |= NFP_MU_ADDR_ACCESS_TYPE_DIRECT << locality_off;
+
+   *cpp_id = NFP_CPP_ISLAND_ID(NFP_CPP_TARGET_MU, action, token,
+   sym->domain);
+   } else if (sym->target < 0) {
+   nfp_err(cpp, "Unhandled RTsym target encoding: %d\n",
+   sym->target);
+   return -EINVAL;
+   } else {
+   *cpp_id = NFP_CPP_ISLAND_ID(sym->target, action, token,
+   sym->domain);
+   }
+
+   return 0;
+}
+
+int __nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+u8 action, u8 token, u64 off, void *buf, size_t len)
+{
+   u32 cpp_id;
+   u64 addr;
+   int err;
+
+   err = nfp_rtsym_to_dest(cpp, sym, action, token, off, &cpp_id, &addr);
+   if (err)
+   return err;
+
+   return nfp_cpp_read(cpp, cpp_id, addr, buf, len);
+}
+
+int nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
+  void *buf, size_t len)
+{
+   return __nfp_rtsym_read(cpp, sym, NFP_CPP_ACTION_RW, 0, off, buf, len);
+}
+
+int __nfp_rtsym_readl(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
+ u8 action, u8 token, u64 off, u32 *value)
+{
+   u32 cpp_id;
+   u64 addr;
+   int err;
+
+   err = nfp_rts

Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-28 Thread Willem de Bruijn
On Mon, Jul 30, 2018 at 2:06 AM Jason Wang  wrote:
>
>
>
> On 2018年07月25日 08:17, Jon Olson wrote:
> > On Tue, Jul 24, 2018 at 3:46 PM Michael S. Tsirkin  wrote:
> >> On Tue, Jul 24, 2018 at 06:31:54PM -0400, Willem de Bruijn wrote:
> >>> On Tue, Jul 24, 2018 at 6:23 PM Michael S. Tsirkin  wrote:
>  On Tue, Jul 24, 2018 at 04:52:53PM -0400, Willem de Bruijn wrote:
> > > From the above linked patch, I understand that there are yet
> > other special cases in production, such as a hard cap on #tx queues to
> > 32 regardless of number of vcpus.
>  I don't think upstream kernels have this limit - we can
>  now use vmalloc for higher number of queues.
> >>> Yes. That patch* mentioned it as a Google Compute Engine imposed
> >>> limit. It is exactly such cloud provider imposed rules that I'm
> >>> concerned about working around in upstream drivers.
> >>>
> >>> * for reference, I mean https://patchwork.ozlabs.org/patch/725249/
> >> Yea. Why does GCE do it btw?
> > There are a few reasons for the limit, some historical, some current.
> >
> > Historically we did this because of a kernel limit on the number of
> > TAP queues (in Montreal I thought this limit was 32). To my chagrin,
> > the limit upstream at the time we did it was actually eight. We had
> > increased the limit from eight to 32 internally, and it appears that
> > upstream it has since been increased to 256. We no longer
> > use TAP for networking, so that constraint no longer applies for us,
> > but when looking at removing/raising the limit we discovered no
> > workloads that clearly benefited from lifting it, and it also placed
> > more pressure on our virtual networking stack particularly on the Tx
> > side. We left it as-is.
> >
> > In terms of current reasons there are really two. One is memory usage.
> > As you know, virtio-net uses rx/tx pairs, so there's an expectation
> > that the guest will have an Rx queue for every Tx queue. We run our
> > individual virtqueues fairly deep (4096 entries) to give guests a wide
> > time window for re-posting Rx buffers and avoiding starvation on
> > packet delivery. Filling an Rx vring with max-sized mergeable buffers
> > (4096 bytes) is 16MB of GFP_ATOMIC allocations. At 32 queues this can
> > be up to 512MB of memory posted for network buffers. Scaling this to
> > the largest VM GCE offers today (160 VCPUs -- n1-ultramem-160) keeping
> > all of the Rx rings full would (in the large average Rx packet size
> > case) consume up to 2.5 GB(!) of guest RAM. Now, those VMs have 3.8T
> > of RAM available, but I don't believe we've observed a situation where
> > they would have benefited from having 2.5 gigs of buffers posted for
> > incoming network traffic :)
>
> We can work to have async txq and rxq instead of pairs if there's a
> strong requirement.
>
> >
> > The second reason is interrupt related -- as I mentioned above, we
> > have found no workloads that clearly benefit from so many queues, but
> > we have found workloads that degrade. In particular workloads that do
> > a lot of small packet processing but which aren't extremely latency
> > sensitive can achieve higher PPS by taking fewer interrupts across
> > fewer VCPUs due to better batching (this also incurs higher latency,
> > but at the limit the "busy" cores end up suppressing most interrupts
> > and spending most of their cycles farming out work). Memcache is a
> > good example here, particularly if the latency targets for request
> > completion are in the ~milliseconds range (rather than the
> > microseconds we typically strive for with TCP_RR-style workloads).
> >
> > All of that said, we haven't been forthcoming with data (and
> > unfortunately I don't have it handy in a useful form, otherwise I'd
> > simply post it here), so I understand the hesitation to simply run
> > with napi_tx across the board. As Willem said, this patch seemed like
> > the least disruptive way to allow us to continue down the road of
> > "universal" NAPI Tx and to hopefully get data across enough workloads
> > (with VMs small, large, and absurdly large :) to present a compelling
> > argument in one direction or another. As far as I know there aren't
> > currently any NAPI related ethtool commands (based on a quick perusal
> > of ethtool.h)
>
> As I suggested before, maybe we can (ab)use tx-frames-irq.

I forgot to respond to this originally, but I agree.

How about something like the snippet below? It would be simpler to
reason about if we only allowed switching while the device is down, but
napi does not strictly require that.

+static int virtnet_set_coalesce(struct net_device *dev,
+   struct ethtool_coalesce *ec)
+{
+   const u32 tx_coalesce_napi_mask = (1 << 16);
+   const struct ethtool_coalesce ec_default = {
+   .cmd = ETHTOOL_SCOALESCE,
+   .rx_max_coalesced_frames = 1,
+   .tx_max_coalesced_frames = 1,
+   };
+   struct virtnet_info *vi = netdev_priv(dev);
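
Jon Olson's Rx ring memory estimate quoted earlier in this thread follows from simple multiplication: ring depth times buffer size per queue, times the number of queues. The sketch below assumes, as in his worst case, that every descriptor holds one 4096-byte mergeable buffer; demo_rx_posted_bytes() and DEMO_MIB are illustrative names, not virtio_net code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MIB (UINT64_C(1) << 20)

/*
 * Worst-case bytes posted to Rx rings when every descriptor holds one
 * max-sized mergeable buffer: entries per ring * buffer size * queues.
 */
static uint64_t demo_rx_posted_bytes(uint64_t ring_entries, uint64_t buf_bytes,
				     uint64_t num_queues)
{
	uint64_t per_queue = ring_entries * buf_bytes;

	/* guard against silent wrap-around for absurd inputs */
	assert(buf_bytes == 0 || per_queue / buf_bytes == ring_entries);
	assert(num_queues == 0 ||
	       per_queue * num_queues / num_queues == per_queue);
	return per_queue * num_queues;
}
```

With 4096 entries of 4096 bytes this gives 16 MiB per queue, 512 MiB for 32 queues, and 2560 MiB (the "2.5 GB" figure) for 160 queues on a 160-vCPU VM.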

[net-next 05/15] ice: Code optimization for ice_fill_sw_rule()

2018-08-28 Thread Jeff Kirsher
From: Zhenning Xiao 

Use the buffer in the s_rule structure directly instead of building the
header in a local array eth_hdr[DUMMY_ETH_HDR_LEN] and copying it over.

Signed-off-by: Zhenning Xiao 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_switch.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)
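
A minimal, self-contained sketch of the in-place approach: the dummy header is copied straight into the rule's header buffer and the EtherType and destination MAC are patched at fixed offsets, so no trailing memcpy() is needed. The header length, offsets, and all demo_* names are hypothetical stand-ins for dummy_eth_header, ICE_ETH_*_OFFSET and s_rule->pdata.lkup_tx_rx.hdr. The EtherType is stored byte by byte in big-endian order rather than through a (__be16 *) cast, to avoid alignment assumptions in portable C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for dummy_eth_header and ICE_ETH_*_OFFSET. */
#define DEMO_HDR_LEN            16
#define DEMO_ETH_DA_OFFSET      0
#define DEMO_ETH_ETHTYPE_OFFSET 12

static const uint8_t demo_dummy_eth_header[DEMO_HDR_LEN] = { 0 };

struct demo_rule {
	uint16_t hdr_len;
	uint8_t hdr[DEMO_HDR_LEN]; /* stands in for s_rule->pdata.lkup_tx_rx.hdr */
};

/* Store v at p in network (big-endian) byte order. */
static void demo_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Build the header directly inside the rule: no local copy, no final memcpy. */
static void demo_fill_rule(struct demo_rule *r, const uint8_t *daddr,
			   uint16_t ethertype)
{
	uint8_t *eth_hdr = r->hdr;
	uint16_t eth_hdr_sz = sizeof(demo_dummy_eth_header);

	assert(eth_hdr_sz <= sizeof(r->hdr));
	memcpy(eth_hdr, demo_dummy_eth_header, eth_hdr_sz);
	demo_put_be16(eth_hdr + DEMO_ETH_ETHTYPE_OFFSET, ethertype);
	if (daddr)
		memcpy(eth_hdr + DEMO_ETH_DA_OFFSET, daddr, 6);
	r->hdr_len = eth_hdr_sz;
}
```

Passing a NULL daddr leaves the dummy destination address in place, matching the `if (daddr)` check in ice_fill_sw_rule().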

diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index 6b7ec2ae5ad6..d8b18cabc3a8 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -464,8 +464,9 @@ ice_fill_sw_rule(struct ice_hw *hw, struct ice_fltr_info *f_info,
 struct ice_aqc_sw_rules_elem *s_rule, enum ice_adminq_opc opc)
 {
u16 vlan_id = ICE_MAX_VLAN_ID + 1;
-   u8 eth_hdr[DUMMY_ETH_HDR_LEN];
void *daddr = NULL;
+   u16 eth_hdr_sz;
+   u8 *eth_hdr;
u32 act = 0;
__be16 *off;
 
@@ -477,8 +478,11 @@ ice_fill_sw_rule(struct ice_hw *hw, struct ice_fltr_info *f_info,
return;
}
 
+   eth_hdr_sz = sizeof(dummy_eth_header);
+   eth_hdr = s_rule->pdata.lkup_tx_rx.hdr;
+
/* initialize the ether header with a dummy header */
-   memcpy(eth_hdr, dummy_eth_header, sizeof(dummy_eth_header));
+   memcpy(eth_hdr, dummy_eth_header, eth_hdr_sz);
ice_fill_sw_info(hw, f_info);
 
switch (f_info->fltr_act) {
@@ -536,7 +540,7 @@ ice_fill_sw_rule(struct ice_hw *hw, struct ice_fltr_info *f_info,
daddr = f_info->l_data.ethertype_mac.mac_addr;
/* fall-through */
case ICE_SW_LKUP_ETHERTYPE:
-   off = (__be16 *)&eth_hdr[ICE_ETH_ETHTYPE_OFFSET];
+   off = (__be16 *)(eth_hdr + ICE_ETH_ETHTYPE_OFFSET);
*off = cpu_to_be16(f_info->l_data.ethertype_mac.ethertype);
break;
case ICE_SW_LKUP_MAC_VLAN:
@@ -563,18 +567,16 @@ ice_fill_sw_rule(struct ice_hw *hw, struct ice_fltr_info *f_info,
s_rule->pdata.lkup_tx_rx.act = cpu_to_le32(act);
 
if (daddr)
-   ether_addr_copy(&eth_hdr[ICE_ETH_DA_OFFSET], daddr);
+   ether_addr_copy(eth_hdr + ICE_ETH_DA_OFFSET, daddr);
 
if (!(vlan_id > ICE_MAX_VLAN_ID)) {
-   off = (__be16 *)&eth_hdr[ICE_ETH_VLAN_TCI_OFFSET];
+   off = (__be16 *)(eth_hdr + ICE_ETH_VLAN_TCI_OFFSET);
*off = cpu_to_be16(vlan_id);
}
 
/* Create the switch rule with the final dummy Ethernet header */
if (opc != ice_aqc_opc_update_sw_rules)
-   s_rule->pdata.lkup_tx_rx.hdr_len = cpu_to_le16(sizeof(eth_hdr));
-
-   memcpy(s_rule->pdata.lkup_tx_rx.hdr, eth_hdr, sizeof(eth_hdr));
+   s_rule->pdata.lkup_tx_rx.hdr_len = cpu_to_le16(eth_hdr_sz);
 }
 
 /**
-- 
2.17.1



[net-next 04/15] ice: Prevent control queue operations during reset

2018-08-28 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

Once reset is issued, the driver loses all control queue interfaces.
Exercising control queue operations during reset is incorrect and
may result in long timeouts.

This patch introduces a new field 'reset_ongoing' in the hw structure.
This is set to 1 by the core driver when it receives a reset interrupt.
ice_sq_send_cmd checks reset_ongoing before actually issuing the control
queue operation. If a reset is in progress, it returns a soft error code
(ICE_ERR_RESET_ONGOING) to the caller. The caller may or may not have to
take any action based on this return. Once the driver knows that the
reset is done, it has to set reset_ongoing back to 0. This will allow
control queue operations to be posted to the hardware again.

This "bail out" logic was added to ice_sq_send_cmd (a fairly low-level
function) specifically so that one solution in one place applies to all
types of control queues.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_controlq.c |  3 ++
 drivers/net/ethernet/intel/ice/ice_main.c | 34 ---
 drivers/net/ethernet/intel/ice/ice_status.h   |  1 +
 drivers/net/ethernet/intel/ice/ice_type.h |  1 +
 4 files changed, 34 insertions(+), 5 deletions(-)
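
A minimal sketch of the bail-out pattern described above, assuming a single flag and ignoring the locking and interrupt context of the real driver: the lowest-level send routine refuses to post while a reset is in progress and returns a soft error, so every control queue type inherits the check. The demo_* names and status values are hypothetical; only the reset_ongoing idea comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the pattern; names and values are illustrative. */
enum demo_status { DEMO_OK = 0, DEMO_ERR_RESET_ONGOING = -1 };

struct demo_hw {
	bool reset_ongoing; /* true from the reset interrupt until hw is out of reset */
	int cmds_posted;    /* commands actually handed to "hardware" */
};

/* Lowest-level send path shared by every control queue type. */
static enum demo_status demo_sq_send_cmd(struct demo_hw *hw)
{
	assert(hw != NULL);
	if (hw->reset_ongoing)
		return DEMO_ERR_RESET_ONGOING; /* soft error: nothing is posted */
	hw->cmds_posted++;
	return DEMO_OK;
}

/* Reset interrupt: stop posting control queue commands. */
static void demo_reset_irq(struct demo_hw *hw)
{
	hw->reset_ongoing = true;
}

/* Hardware confirmed out of reset: allow commands again before rebuilding. */
static void demo_reset_done(struct demo_hw *hw)
{
	hw->reset_ongoing = false;
}
```

Callers such as the Tx queue disable path can then treat DEMO_ERR_RESET_ONGOING as an expected outcome rather than a failure, which is the approach the ice_vsi_stop_tx_rings() hunk takes.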

diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.c b/drivers/net/ethernet/intel/ice/ice_controlq.c
index 62be72fdc8f3..1fe026a65d75 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.c
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.c
@@ -806,6 +806,9 @@ ice_sq_send_cmd(struct ice_hw *hw, struct ice_ctl_q_info *cq,
u16 retval = 0;
u32 val = 0;
 
+   /* if reset is in progress return a soft error */
+   if (hw->reset_ongoing)
+   return ICE_ERR_RESET_ONGOING;
mutex_lock(&cq->sq_lock);
 
cq->sq_last_status = ICE_AQ_RC_OK;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f1e80eed2fd6..014a2f3ea76c 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -535,10 +535,13 @@ static void ice_reset_subtask(struct ice_pf *pf)
ice_prepare_for_reset(pf);
 
/* make sure we are ready to rebuild */
-   if (ice_check_reset(&pf->hw))
+   if (ice_check_reset(&pf->hw)) {
set_bit(__ICE_RESET_FAILED, pf->state);
-   else
+   } else {
+   /* done with reset. start rebuild */
+   pf->hw.reset_ongoing = false;
ice_rebuild(pf);
+   }
clear_bit(__ICE_RESET_RECOVERY_PENDING, pf->state);
goto unlock;
}
@@ -1754,7 +1757,8 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 * We also make note of which reset happened so that peer
 * devices/drivers can be informed.
 */
-   if (!test_bit(__ICE_RESET_RECOVERY_PENDING, pf->state)) {
+   if (!test_and_set_bit(__ICE_RESET_RECOVERY_PENDING,
+ pf->state)) {
if (reset == ICE_RESET_CORER)
set_bit(__ICE_CORER_RECV, pf->state);
else if (reset == ICE_RESET_GLOBR)
@@ -1762,7 +1766,20 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
else
set_bit(__ICE_EMPR_RECV, pf->state);
 
-   set_bit(__ICE_RESET_RECOVERY_PENDING, pf->state);
+   /* There are a couple of different bits at play here.
+* hw->reset_ongoing indicates whether the hardware is
+* in reset. This is set to true when a reset interrupt
+* is received and set back to false after the driver
+* has determined that the hardware is out of reset.
+*
+* __ICE_RESET_RECOVERY_PENDING in pf->state indicates
+* that a post reset rebuild is required before the
+* driver is operational again. This is set above.
+*
+* As this is the start of the reset/rebuild cycle, set
+* both to indicate that.
+*/
+   hw->reset_ongoing = true;
}
}
 
@@ -4185,7 +4202,14 @@ static int ice_vsi_stop_tx_rings(struct ice_vsi *vsi)
}
status = ice_dis_vsi_txq(vsi->port_info, vsi->num_txq, q_ids, q_teids,
 NULL);
-   if (status) {
+   /* if the disable queue command was exercised during an active reset
+* flow, ICE_ERR_RESET_ONGOING is returned. This is not an error as

[net-next 07/15] ice: Refactor VSI allocation, deletion and rebuild flow

2018-08-28 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch refactors aspects of the VSI allocation, deletion and rebuild
flow. Some of the more noteworthy changes are described below.

1) On reset, all switch filters applied in the hardware are lost. In
   the rebuild flow, only MAC and broadcast filters are being restored.
   Instead, use a new function ice_replay_all_fltr to restore all the
   filters that were previously added. To do this, remove calls to
   ice_remove_vsi_fltr to prevent cleaning out the internal bookkeeping
   structures that ice_replay_all_fltr uses to replay filters.

2) Introduce a new state bit __ICE_PREPARED_FOR_RESET to distinguish the
   PF that requested the reset (and consequently prepared for it) from
   the rest of the PFs. These other PFs will prepare for reset only
   when they receive an interrupt from the firmware.

3) Use new functions ice_add_vsi and ice_free_vsi to create and destroy
   VSIs respectively. These functions accept a handle to uniquely
   identify a VSI. This same handle is required to rebuild the VSI post
   reset. To prevent confusion, the existing ice_vsi_add was renamed to
   ice_vsi_init.

4) Enhance ice_vsi_setup for the upcoming SR-IOV changes and expose a
   new wrapper function ice_pf_vsi_setup to create PF VSIs. Rework the
   error handling path in ice_setup_pf_sw.

5) Introduce a new function ice_vsi_release_all to release all PF VSIs.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h  |   2 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_common.c   |   2 +
 drivers/net/ethernet/intel/ice/ice_main.c | 371 +++---
 drivers/net/ethernet/intel/ice/ice_switch.c   | 353 +++--
 drivers/net/ethernet/intel/ice/ice_switch.h   |  14 +-
 drivers/net/ethernet/intel/ice/ice_type.h |   8 +-
 7 files changed, 580 insertions(+), 171 deletions(-)
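
A hypothetical sketch of how a "prepared for reset" bit lets the requesting PF and the other PFs share one prepare path: whichever event arrives first (the local request or the firmware interrupt) does the teardown, and later events are no-ops. The demo_* names are illustrative, and clearing the bit once the rebuild completes is an assumption; that part of the patch is not included in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of per-PF reset bookkeeping; not the ice driver API. */
struct demo_pf {
	bool prepared_for_reset; /* plays the role of __ICE_PREPARED_FOR_RESET */
	int prepare_count;       /* how many times teardown actually ran */
};

/* Tear down at most once per reset cycle, whichever path gets here first. */
static void demo_prepare_for_reset(struct demo_pf *pf)
{
	assert(pf != NULL);
	if (pf->prepared_for_reset)
		return;
	pf->prepare_count++;
	pf->prepared_for_reset = true;
}

/* The PF that requests the reset prepares before triggering it. */
static void demo_request_reset(struct demo_pf *pf)
{
	demo_prepare_for_reset(pf);
}

/* Every PF sees the firmware reset interrupt; only unprepared ones act. */
static void demo_reset_interrupt(struct demo_pf *pf)
{
	demo_prepare_for_reset(pf);
}

/* Assumed: the bit is cleared once the rebuild completes. */
static void demo_rebuild_done(struct demo_pf *pf)
{
	pf->prepared_for_reset = false;
}
```

The sketch is single-threaded; in the driver the equivalent check has to be safe against the interrupt path running concurrently with the reset request.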

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 868f4a1d0f72..e17030db0bee 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -62,6 +62,7 @@ extern const char ice_drv_ver[];
 #define ICE_RES_VALID_BIT  0x8000
 #define ICE_RES_MISC_VEC_ID  (ICE_RES_VALID_BIT - 1)
 #define ICE_INVAL_Q_INDEX  0x
+#define ICE_INVAL_VFID 256
 
 #define ICE_VSIQF_HKEY_ARRAY_SIZE  ((VSIQF_HKEY_MAX_INDEX + 1) *   4)
 
@@ -122,6 +123,7 @@ struct ice_sw {
 enum ice_state {
__ICE_DOWN,
__ICE_NEEDS_RESTART,
+   __ICE_PREPARED_FOR_RESET,   /* set by driver when prepared */
__ICE_RESET_RECOVERY_PENDING,   /* set by driver when reset starts */
__ICE_PFR_REQ,  /* set by driver and peers */
__ICE_CORER_REQ,/* set by driver and peers */
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 87b304db9cad..55e8275ce2ee 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1253,6 +1253,7 @@ struct ice_aq_desc {
struct ice_aqc_add_txqs add_txqs;
struct ice_aqc_dis_txqs dis_txqs;
struct ice_aqc_add_get_update_free_vsi vsi_cmd;
+   struct ice_aqc_add_update_free_vsi_resp add_update_free_vsi_res;
struct ice_aqc_alloc_free_res_cmd sw_res_ctrl;
struct ice_aqc_set_event_mask set_event_mask;
struct ice_aqc_get_link_status get_link_status;
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 4c6b1038dc5f..b2bb42def038 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -711,6 +711,8 @@ enum ice_status ice_reset(struct ice_hw *hw, enum ice_reset_req req)
ice_debug(hw, ICE_DBG_INIT, "GlobalR requested\n");
val = GLGEN_RTRIG_GLOBR_M;
break;
+   default:
+   return ICE_ERR_PARAM;
}
 
val |= rd32(hw, GLGEN_RTRIG);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 014a2f3ea76c..1ef63bf98cd8 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -32,6 +32,7 @@ static const struct net_device_ops ice_netdev_ops;
 static void ice_pf_dis_all_vsi(struct ice_pf *pf);
 static void ice_rebuild(struct ice_pf *pf);
 static int ice_vsi_release(struct ice_vsi *vsi);
+static void ice_vsi_release_all(struct ice_pf *pf);
 static void ice_update_vsi_stats(struct ice_vsi *vsi);
 static void ice_update_pf_stats(struct ice_pf *pf);
 
@@ -456,23 +457,13 @@ static void
 ice_prepare_for_reset(struct ice_pf *pf)
 {
struct ice_hw *hw = &pf->hw;
-   u32 v;
-
-   ice_for_each_vsi(pf, v)
-   i

[net-next 09/15] ice: Clean up register file

2018-08-28 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch cleans up the existing register definitions.

1) Several instances of long define names used in the BIT() and ICE_M()
   macros were replaced with the actual values they represent. As a
   result, some shift defines (ending in _S) that were used only to
   create bitmasks were removed completely.

2) Apply more consistent tab spacing.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_hw_autogen.h   | 417 --
 1 file changed, 188 insertions(+), 229 deletions(-)
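
A small check that the cleanup is value-preserving: a mask built from a dedicated _S shift define equals the same mask with the shift written inline. The local BIT() and ICE_M() definitions are assumed to match the driver's helpers (plain left shifts), and the OLD_/NEW_ names are illustrative copies of the PF_FW_ARQLEN_* pairs shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies, assumed to match the driver's helpers (plain left shifts). */
#define BIT(n)      (UINT32_C(1) << (n))
#define ICE_M(m, s) ((uint32_t)(m) << (s))

/* Old style: shift defines that exist only to build masks. */
#define OLD_ARQLEN_ARQLEN_S 0
#define OLD_ARQLEN_ARQLEN_M ICE_M(0x3FF, OLD_ARQLEN_ARQLEN_S)
#define OLD_ARQLEN_ARQVFE_S 28
#define OLD_ARQLEN_ARQVFE_M BIT(OLD_ARQLEN_ARQVFE_S)

/* New style, as in the patch: the shift is written inline. */
#define NEW_ARQLEN_ARQLEN_M ICE_M(0x3FF, 0)
#define NEW_ARQLEN_ARQVFE_M BIT(28)

/* The rewrite must not change any mask value. */
static_assert(OLD_ARQLEN_ARQLEN_M == NEW_ARQLEN_ARQLEN_M, "ARQLEN mask changed");
static_assert(OLD_ARQLEN_ARQVFE_M == NEW_ARQLEN_ARQVFE_M, "ARQVFE mask changed");

/* Extract the queue length field from a raw ARQLEN register value. */
static uint32_t demo_arq_len(uint32_t reg)
{
	return reg & NEW_ARQLEN_ARQLEN_M;
}
```

Compile-time assertions like these are a cheap way to confirm a mechanical header cleanup before dropping the old defines.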

diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 6076fc87df9d..067ca26a1d94 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -6,251 +6,210 @@
 #ifndef _ICE_HW_AUTOGEN_H_
 #define _ICE_HW_AUTOGEN_H_
 
-#define QTX_COMM_DBELL(_DBQM)  (0x002C + ((_DBQM) * 4))
-#define PF_FW_ARQBAH   0x00080180
-#define PF_FW_ARQBAL   0x00080080
-#define PF_FW_ARQH 0x00080380
-#define PF_FW_ARQH_ARQH_S  0
-#define PF_FW_ARQH_ARQH_M  ICE_M(0x3FF, PF_FW_ARQH_ARQH_S)
-#define PF_FW_ARQLEN   0x00080280
-#define PF_FW_ARQLEN_ARQLEN_S  0
-#define PF_FW_ARQLEN_ARQLEN_M  ICE_M(0x3FF, PF_FW_ARQLEN_ARQLEN_S)
-#define PF_FW_ARQLEN_ARQVFE_S  28
-#define PF_FW_ARQLEN_ARQVFE_M  BIT(PF_FW_ARQLEN_ARQVFE_S)
-#define PF_FW_ARQLEN_ARQOVFL_S 29
-#define PF_FW_ARQLEN_ARQOVFL_M BIT(PF_FW_ARQLEN_ARQOVFL_S)
-#define PF_FW_ARQLEN_ARQCRIT_S 30
-#define PF_FW_ARQLEN_ARQCRIT_M BIT(PF_FW_ARQLEN_ARQCRIT_S)
-#define PF_FW_ARQLEN_ARQENABLE_S   31
-#define PF_FW_ARQLEN_ARQENABLE_M   BIT(PF_FW_ARQLEN_ARQENABLE_S)
-#define PF_FW_ARQT 0x00080480
-#define PF_FW_ATQBAH   0x00080100
-#define PF_FW_ATQBAL   0x0008
-#define PF_FW_ATQH 0x00080300
-#define PF_FW_ATQH_ATQH_S  0
-#define PF_FW_ATQH_ATQH_M  ICE_M(0x3FF, PF_FW_ATQH_ATQH_S)
-#define PF_FW_ATQLEN   0x00080200
-#define PF_FW_ATQLEN_ATQLEN_S  0
-#define PF_FW_ATQLEN_ATQLEN_M  ICE_M(0x3FF, PF_FW_ATQLEN_ATQLEN_S)
-#define PF_FW_ATQLEN_ATQVFE_S  28
-#define PF_FW_ATQLEN_ATQVFE_M  BIT(PF_FW_ATQLEN_ATQVFE_S)
-#define PF_FW_ATQLEN_ATQOVFL_S 29
-#define PF_FW_ATQLEN_ATQOVFL_M BIT(PF_FW_ATQLEN_ATQOVFL_S)
-#define PF_FW_ATQLEN_ATQCRIT_S 30
-#define PF_FW_ATQLEN_ATQCRIT_M BIT(PF_FW_ATQLEN_ATQCRIT_S)
-#define PF_FW_ATQLEN_ATQENABLE_S   31
-#define PF_FW_ATQLEN_ATQENABLE_M   BIT(PF_FW_ATQLEN_ATQENABLE_S)
-#define PF_FW_ATQT 0x00080400
-
+#define QTX_COMM_DBELL(_DBQM)  (0x002C + ((_DBQM) * 4))
+#define PF_FW_ARQBAH   0x00080180
+#define PF_FW_ARQBAL   0x00080080
+#define PF_FW_ARQH 0x00080380
+#define PF_FW_ARQH_ARQH_M  ICE_M(0x3FF, 0)
+#define PF_FW_ARQLEN   0x00080280
+#define PF_FW_ARQLEN_ARQLEN_M  ICE_M(0x3FF, 0)
+#define PF_FW_ARQLEN_ARQVFE_M  BIT(28)
+#define PF_FW_ARQLEN_ARQOVFL_M BIT(29)
+#define PF_FW_ARQLEN_ARQCRIT_M BIT(30)
+#define PF_FW_ARQLEN_ARQENABLE_M   BIT(31)
+#define PF_FW_ARQT 0x00080480
+#define PF_FW_ATQBAH   0x00080100
+#define PF_FW_ATQBAL   0x0008
+#define PF_FW_ATQH 0x00080300
+#define PF_FW_ATQH_ATQH_M  ICE_M(0x3FF, 0)
+#define PF_FW_ATQLEN   0x00080200
+#define PF_FW_ATQLEN_ATQLEN_M  ICE_M(0x3FF, 0)
+#define PF_FW_ATQLEN_ATQVFE_M  BIT(28)
+#define PF_FW_ATQLEN_ATQOVFL_M BIT(29)
+#define PF_FW_ATQLEN_ATQCRIT_M BIT(30)
+#define PF_FW_ATQLEN_ATQENABLE_M   BIT(31)
+#define PF_FW_ATQT 0x00080400
 #define GLFLXP_RXDID_FLAGS(_i, _j) (0x0045D000 + ((_i) * 4 + (_j) * 256))
 #define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_S  0
-#define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_M  ICE_M(0x3F, GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_S)
+#define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_M  ICE_M(0x3F, 0)
 #define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_1_S  8
-#define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_1_M  ICE_M(0x3F, GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_1_S)
+#define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_1_M  ICE_M(0x3F, 8)
 #define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_2_S  16
-#define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_2_M  ICE_M(0x3F, GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_2_S)
+#define GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_2_M  ICE_M(0x3F, 16)
 #define GLFLXP_RXDID_FLAGS_FLEXI

[net-next 11/15] ice: Implement ice_bridge_getlink and ice_bridge_setlink

2018-08-28 Thread Jeff Kirsher
From: Md Fahad Iqbal Polash 

ice_bridge_getlink returns the current bridge mode using
ndo_dflt_bridge_getlink and the mode parameter available in
first_switch->bridge_mode.

ice_bridge_setlink is invoked when the bridge mode needs to be
changed. The new mode is carried in a netlink message, which is
parsed in this function. If the mode has to be changed,
switch_flags is set appropriately (ALLOW_LB is set for VEB mode
and cleared for VEPA mode) and ice_aq_update_vsi is called. The
unicast switch filter rules are also updated.

Signed-off-by: Md Fahad Iqbal Polash 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c   | 140 +++-
 drivers/net/ethernet/intel/ice/ice_switch.c |  41 ++
 drivers/net/ethernet/intel/ice/ice_switch.h |   1 +
 3 files changed, 181 insertions(+), 1 deletion(-)
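
A minimal sketch of the sw_flags update described above: VEB sets the loopback flag so the embedded switch can forward traffic between local VSIs, while VEPA clears it so traffic is sent to the adjacent external switch. The enum values and flag constant are hypothetical stand-ins for BRIDGE_MODE_* and ICE_AQ_VSI_SW_FLAG_ALLOW_LB, and the unicast filter rule update is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for BRIDGE_MODE_* and ICE_AQ_VSI_SW_FLAG_ALLOW_LB. */
enum demo_bridge_mode { DEMO_BRIDGE_MODE_VEB, DEMO_BRIDGE_MODE_VEPA };
#define DEMO_SW_FLAG_ALLOW_LB 0x20u

static_assert(DEMO_SW_FLAG_ALLOW_LB <= 0xFFu, "flag must fit in u8 sw_flags");

/*
 * VEB: set loopback so the embedded switch forwards between local VSIs.
 * VEPA: clear it so traffic goes out to the adjacent external switch.
 * All other sw_flags bits are preserved.
 */
static uint8_t demo_sw_flags_for_mode(uint8_t sw_flags,
				      enum demo_bridge_mode mode)
{
	if (mode == DEMO_BRIDGE_MODE_VEB)
		return (uint8_t)(sw_flags | DEMO_SW_FLAG_ALLOW_LB);
	return (uint8_t)(sw_flags & ~DEMO_SW_FLAG_ALLOW_LB);
}
```

Keeping the other bits intact mirrors how ice_vsi_update_bridge_mode() starts from a copy of vsi->info and only toggles the loopback flag.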

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index fccecb6fa618..cbeae1355593 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3599,7 +3599,11 @@ static int ice_probe(struct pci_dev *pdev,
goto err_msix_misc_unroll;
}
 
-   pf->first_sw->bridge_mode = BRIDGE_MODE_VEB;
+   if (hw->evb_veb)
+   pf->first_sw->bridge_mode = BRIDGE_MODE_VEB;
+   else
+   pf->first_sw->bridge_mode = BRIDGE_MODE_VEPA;
+
pf->first_sw->pf = pf;
 
/* record the sw_id available for later use */
@@ -5695,6 +5699,138 @@ int ice_get_rss(struct ice_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size)
return 0;
 }
 
+/**
+ * ice_bridge_getlink - Get the hardware bridge mode
+ * @skb: skb buff
+ * @pid: process id
+ * @seq: RTNL message seq
+ * @dev: the netdev being configured
+ * @filter_mask: filter mask passed in
+ * @nlflags: netlink flags passed in
+ *
+ * Return the bridge mode (VEB/VEPA)
+ */
+static int
+ice_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+  struct net_device *dev, u32 filter_mask, int nlflags)
+{
+   struct ice_netdev_priv *np = netdev_priv(dev);
+   struct ice_vsi *vsi = np->vsi;
+   struct ice_pf *pf = vsi->back;
+   u16 bmode;
+
+   bmode = pf->first_sw->bridge_mode;
+
+   return ndo_dflt_bridge_getlink(skb, pid, seq, dev, bmode, 0, 0, nlflags,
+  filter_mask, NULL);
+}
+
+/**
+ * ice_vsi_update_bridge_mode - Update VSI for switching bridge mode (VEB/VEPA)
+ * @vsi: Pointer to VSI structure
+ * @bmode: Hardware bridge mode (VEB/VEPA)
+ *
+ * Returns 0 on success, negative on failure
+ */
+static int ice_vsi_update_bridge_mode(struct ice_vsi *vsi, u16 bmode)
+{
+   struct device *dev = &vsi->back->pdev->dev;
+   struct ice_aqc_vsi_props *vsi_props;
+   struct ice_hw *hw = &vsi->back->hw;
+   struct ice_vsi_ctx ctxt = { 0 };
+   enum ice_status status;
+
+   vsi_props = &vsi->info;
+   ctxt.info = vsi->info;
+
+   if (bmode == BRIDGE_MODE_VEB)
+   /* change from VEPA to VEB mode */
+   ctxt.info.sw_flags |= ICE_AQ_VSI_SW_FLAG_ALLOW_LB;
+   else
+   /* change from VEB to VEPA mode */
+   ctxt.info.sw_flags &= ~ICE_AQ_VSI_SW_FLAG_ALLOW_LB;
+   ctxt.vsi_num = vsi->vsi_num;
+   ctxt.info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_SW_VALID);
+   status = ice_aq_update_vsi(hw, &ctxt, NULL);
+   if (status) {
+   dev_err(dev, "update VSI for bridge mode failed, bmode = %d err %d aq_err %d\n",
+   bmode, status, hw->adminq.sq_last_status);
+   return -EIO;
+   }
+   /* Update sw flags for book keeping */
+   vsi_props->sw_flags = ctxt.info.sw_flags;
+
+   return 0;
+}
+
+/**
+ * ice_bridge_setlink - Set the hardware bridge mode
+ * @dev: the netdev being configured
+ * @nlh: RTNL message
+ * @flags: bridge setlink flags
+ *
+ * Sets the bridge mode (VEB/VEPA) of the switch to which the netdev (VSI) is
+ * hooked up. Iterates through the PF VSI list and sets the loopback mode (if
+ * not already set) for all VSIs connected to this switch, and also updates the
+ * unicast switch filter rules for the corresponding switch of the netdev.
+ */
+static int
+ice_bridge_setlink(struct net_device *dev, struct nlmsghdr *nlh,
+  u16 __always_unused flags)
+{
+   struct ice_netdev_priv *np = netdev_priv(dev);
+   struct ice_pf *pf = np->vsi->back;
+   struct nlattr *attr, *br_spec;
+   struct ice_hw *hw = &pf->hw;
+   enum ice_status status;
+   struct ice_sw *pf_sw;
+   int rem, v, err = 0;
+
+   pf_sw = pf->first_sw;
+   /* find the attribute in the netlink message */
+   br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
+
+   nla_for_each_nested(attr, br_spec, rem) {
+   __u16 mode;
+
+   if (nla_type(attr)

[net-next 02/15] ice: Updates to Tx scheduler code

2018-08-28 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

1) The maximum number of device nodes is a global value shared by the
   whole device. The add element AQ command fails if there is no space
   to add new nodes, so the check for max nodes isn't required. Remove
   ice_sched_get_num_nodes_per_layer and ice_sched_val_max_nodes.

2) In ice_sched_add_elems, set default node's CIR/EIR bandwidth weight.

3) Fix the default scheduler topology buffer size: the firmware expects
   a 4KB buffer at all times and will error out if a buffer of any other
   size is provided.

4) In the latest spec, max children per node per layer is replaced by
   max sibling group size. Now it provides the max children of the below
   layer node, not the current layer node.

5) Fix some newline/whitespace issues for consistency.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   5 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |   7 +
 drivers/net/ethernet/intel/ice/ice_sched.c| 161 ++
 drivers/net/ethernet/intel/ice/ice_type.h |   2 +
 4 files changed, 61 insertions(+), 114 deletions(-)
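
A sketch of how per-layer child limits could be derived under one reading of point 4: the max sibling group size reported for layer i + 1 bounds the children of a node in layer i, and leaf nodes have no children. This mapping, DEMO_MAX_LAYERS, and the demo_* names are assumptions for illustration (values are in host order here, whereas the firmware reports __le16); the hunk that actually fills hw->max_children[] is not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_LAYERS 9 /* hypothetical upper bound on Tx scheduler layers */

/*
 * Assumed mapping: the max sibling group size reported for layer i + 1
 * bounds the number of children of a node in layer i; the last (leaf)
 * layer has none.
 */
static void demo_fill_max_children(const uint16_t *sibl_grp_sz,
				   unsigned int num_layers,
				   uint16_t *max_children)
{
	unsigned int i;

	assert(num_layers > 0 && num_layers <= DEMO_MAX_LAYERS);
	for (i = 0; i + 1 < num_layers; i++)
		max_children[i] = sibl_grp_sz[i + 1];
	max_children[num_layers - 1] = 0; /* leaf nodes have no children */
}
```

With per-layer limits precomputed this way, the node allocation paths can size their children arrays from hw->max_children[layer] directly, as the ice_sched_add_node() hunk does.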

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index a0614f472658..9a33fb95c0ea 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -771,9 +771,8 @@ struct ice_aqc_layer_props {
u8 chunk_size;
__le16 max_device_nodes;
__le16 max_pf_nodes;
-   u8 rsvd0[2];
-   __le16 max_shared_rate_lmtr;
-   __le16 max_children;
+   u8 rsvd0[4];
+   __le16 max_sibl_grp_sz;
__le16 max_cir_rl_profiles;
__le16 max_eir_rl_profiles;
__le16 max_srl_profiles;
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 53cbfd942d03..b315655eab27 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -527,6 +527,13 @@ enum ice_status ice_init_hw(struct ice_hw *hw)
if (status)
goto err_unroll_sched;
 
+   /* need a valid SW entry point to build a Tx tree */
+   if (!hw->sw_entry_point_layer) {
+   ice_debug(hw, ICE_DBG_SCHED, "invalid sw entry point\n");
+   status = ICE_ERR_CFG;
+   goto err_unroll_sched;
+   }
+
status = ice_init_fltr_mgmt_struct(hw);
if (status)
goto err_unroll_sched;
diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
index eeae199469b6..9b7b50554952 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -17,7 +17,6 @@ ice_sched_add_root_node(struct ice_port_info *pi,
 {
struct ice_sched_node *root;
struct ice_hw *hw;
-   u16 max_children;
 
if (!pi)
return ICE_ERR_PARAM;
@@ -28,8 +27,8 @@ ice_sched_add_root_node(struct ice_port_info *pi,
if (!root)
return ICE_ERR_NO_MEMORY;
 
-   max_children = le16_to_cpu(hw->layer_info[0].max_children);
-   root->children = devm_kcalloc(ice_hw_to_dev(hw), max_children,
+   /* coverity[suspicious_sizeof] */
+   root->children = devm_kcalloc(ice_hw_to_dev(hw), hw->max_children[0],
  sizeof(*root), GFP_KERNEL);
if (!root->children) {
devm_kfree(ice_hw_to_dev(hw), root);
@@ -100,7 +99,6 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer,
struct ice_sched_node *parent;
struct ice_sched_node *node;
struct ice_hw *hw;
-   u16 max_children;
 
if (!pi)
return ICE_ERR_PARAM;
@@ -120,9 +118,10 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer,
node = devm_kzalloc(ice_hw_to_dev(hw), sizeof(*node), GFP_KERNEL);
if (!node)
return ICE_ERR_NO_MEMORY;
-   max_children = le16_to_cpu(hw->layer_info[layer].max_children);
-   if (max_children) {
-   node->children = devm_kcalloc(ice_hw_to_dev(hw), max_children,
+   if (hw->max_children[layer]) {
+   /* coverity[suspicious_sizeof] */
+   node->children = devm_kcalloc(ice_hw_to_dev(hw),
+ hw->max_children[layer],
  sizeof(*node), GFP_KERNEL);
if (!node->children) {
devm_kfree(ice_hw_to_dev(hw), node);
@@ -192,14 +191,17 @@ ice_sched_remove_elems(struct ice_hw *hw, struct ice_sched_node *parent,
buf = devm_kzalloc(ice_hw_to_dev(hw), buf_size, GFP_KERNEL);
if (!buf)
return ICE_ERR_NO_MEMORY;
+
buf->hdr.parent_teid = parent->info.node_teid;
buf->hdr.num_elems = cpu_to_le16(num_nodes);
for (i = 0; i < num_nodes; i++)
b

[net-next 08/15] ice: Implement handlers for ethtool PHY/link operations

2018-08-28 Thread Jeff Kirsher
From: Chinh Cao 

This patch implements handlers for ethtool get_link_ksettings and
set_link_ksettings. Helper functions used by these handlers are also
introduced in this patch.
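The contract of ice_get_link_speed_based_on_phy_type() below (exactly one phy_type_low bit set, otherwise "unknown") can be illustrated outside the driver. This is a minimal sketch only: the bit positions and speed codes are hypothetical placeholders, not the ICE_PHY_TYPE_LOW_* or ICE_AQ_LINK_SPEED_* values; only the one-bit-or-unknown behavior mirrors the patch.

```c
#include <stdint.h>

/* Hypothetical PHY-type bits; the real ICE_PHY_TYPE_LOW_* values differ. */
#define PHY_LOW_100BASE_TX	(1ULL << 0)
#define PHY_LOW_1000BASE_T	(1ULL << 1)
#define PHY_LOW_10GBASE_T	(1ULL << 2)
#define PHY_LOW_25GBASE_CR	(1ULL << 3)

/* Hypothetical speed codes standing in for ICE_AQ_LINK_SPEED_*. */
enum link_speed {
	LINK_SPEED_UNKNOWN = 0,
	LINK_SPEED_100MB,
	LINK_SPEED_1000MB,
	LINK_SPEED_10GB,
	LINK_SPEED_25GB,
};

/*
 * Map a phy_type_low value with exactly one bit set to a link speed.
 * A value with no bits or several bits set matches no case label and
 * falls through to LINK_SPEED_UNKNOWN, as the helper's comment describes.
 */
static enum link_speed speed_from_phy_type_low(uint64_t phy_type_low)
{
	switch (phy_type_low) {
	case PHY_LOW_100BASE_TX:
		return LINK_SPEED_100MB;
	case PHY_LOW_1000BASE_T:
		return LINK_SPEED_1000MB;
	case PHY_LOW_10GBASE_T:
		return LINK_SPEED_10GB;
	case PHY_LOW_25GBASE_CR:
		return LINK_SPEED_25GB;
	default:
		return LINK_SPEED_UNKNOWN;
	}
}
```

Because the switch compares the whole 64-bit value, a caller holding a multi-bit capability mask would have to pass one bit at a time.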

Signed-off-by: Chinh Cao 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   8 +-
 drivers/net/ethernet/intel/ice/ice_common.c   | 121 ++-
 drivers/net/ethernet/intel/ice/ice_common.h   |  14 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  | 801 +-
 4 files changed, 891 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 55e8275ce2ee..3dadb2b01b5c 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -920,9 +920,11 @@ struct ice_aqc_set_phy_cfg_data {
u8 caps;
 #define ICE_AQ_PHY_ENA_TX_PAUSE_ABILITY BIT(0)
 #define ICE_AQ_PHY_ENA_RX_PAUSE_ABILITY BIT(1)
-#define ICE_AQ_PHY_ENA_LOW_POWER   BIT(2)
-#define ICE_AQ_PHY_ENA_LINK BIT(3)
-#define ICE_AQ_PHY_ENA_ATOMIC_LINK BIT(5)
+#define ICE_AQ_PHY_ENA_LOW_POWER   BIT(2)
+#define ICE_AQ_PHY_ENA_LINK BIT(3)
+#define ICE_AQ_PHY_ENA_AUTO_LINK_UPDT  BIT(5)
+#define ICE_AQ_PHY_ENA_LESM BIT(6)
+#define ICE_AQ_PHY_ENA_AUTO_FEC BIT(7)
u8 low_power_ctrl;
__le16 eee_cap; /* Value from ice_aqc_get_phy_caps */
__le16 eeer_value;
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index b2bb42def038..52c2bf4f108e 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -125,7 +125,7 @@ ice_aq_manage_mac_read(struct ice_hw *hw, void *buf, u16 buf_size,
  *
  * Returns the various PHY capabilities supported on the Port (0x0600)
  */
-static enum ice_status
+enum ice_status
 ice_aq_get_phy_caps(struct ice_port_info *pi, bool qual_mods, u8 report_mode,
struct ice_aqc_get_phy_caps_data *pcaps,
struct ice_sq_cd *cd)
@@ -1408,6 +1408,110 @@ void ice_clear_pxe_mode(struct ice_hw *hw)
ice_aq_clear_pxe_mode(hw);
 }
 
+/**
+ * ice_get_link_speed_based_on_phy_type - returns link speed
+ * @phy_type_low: lower part of phy_type
+ *
+ * This helper function will convert a phy_type_low to its corresponding link
+ * speed.
+ * Note: In the structure of phy_type_low, there should be one bit set, as
+ * this function will convert one phy type to its speed.
+ * If no bit gets set, ICE_LINK_SPEED_UNKNOWN will be returned
+ * If more than one bit gets set, ICE_LINK_SPEED_UNKNOWN will be returned
+ */
+static u16
+ice_get_link_speed_based_on_phy_type(u64 phy_type_low)
+{
+   u16 speed_phy_type_low = ICE_AQ_LINK_SPEED_UNKNOWN;
+
+   switch (phy_type_low) {
+   case ICE_PHY_TYPE_LOW_100BASE_TX:
+   case ICE_PHY_TYPE_LOW_100M_SGMII:
+   speed_phy_type_low = ICE_AQ_LINK_SPEED_100MB;
+   break;
+   case ICE_PHY_TYPE_LOW_1000BASE_T:
+   case ICE_PHY_TYPE_LOW_1000BASE_SX:
+   case ICE_PHY_TYPE_LOW_1000BASE_LX:
+   case ICE_PHY_TYPE_LOW_1000BASE_KX:
+   case ICE_PHY_TYPE_LOW_1G_SGMII:
+   speed_phy_type_low = ICE_AQ_LINK_SPEED_1000MB;
+   break;
+   case ICE_PHY_TYPE_LOW_2500BASE_T:
+   case ICE_PHY_TYPE_LOW_2500BASE_X:
+   case ICE_PHY_TYPE_LOW_2500BASE_KX:
+   speed_phy_type_low = ICE_AQ_LINK_SPEED_2500MB;
+   break;
+   case ICE_PHY_TYPE_LOW_5GBASE_T:
+   case ICE_PHY_TYPE_LOW_5GBASE_KR:
+   speed_phy_type_low = ICE_AQ_LINK_SPEED_5GB;
+   break;
+   case ICE_PHY_TYPE_LOW_10GBASE_T:
+   case ICE_PHY_TYPE_LOW_10G_SFI_DA:
+   case ICE_PHY_TYPE_LOW_10GBASE_SR:
+   case ICE_PHY_TYPE_LOW_10GBASE_LR:
+   case ICE_PHY_TYPE_LOW_10GBASE_KR_CR1:
+   case ICE_PHY_TYPE_LOW_10G_SFI_AOC_ACC:
+   case ICE_PHY_TYPE_LOW_10G_SFI_C2C:
+   speed_phy_type_low = ICE_AQ_LINK_SPEED_10GB;
+   break;
+   case ICE_PHY_TYPE_LOW_25GBASE_T:
+   case ICE_PHY_TYPE_LOW_25GBASE_CR:
+   case ICE_PHY_TYPE_LOW_25GBASE_CR_S:
+   case ICE_PHY_TYPE_LOW_25GBASE_CR1:
+   case ICE_PHY_TYPE_LOW_25GBASE_SR:
+   case ICE_PHY_TYPE_LOW_25GBASE_LR:
+   case ICE_PHY_TYPE_LOW_25GBASE_KR:
+   case ICE_PHY_TYPE_LOW_25GBASE_KR_S:
+   case ICE_PHY_TYPE_LOW_25GBASE_KR1:
+   case ICE_PHY_TYPE_LOW_25G_AUI_AOC_ACC:
+   case ICE_PHY_TYPE_LOW_25G_AUI_C2C:
+   speed_phy_type_low = ICE_AQ_LINK_SPEED_25GB;
+   break;
+   case ICE_PHY_TYPE_LOW_40GBASE_CR4:
+   case ICE_PHY_TYPE_LOW_40GBASE_SR4:
+   case ICE_PHY_TYPE_LOW_40GBASE_LR4:
+   case ICE_PHY_TYPE_LOW_40GBASE_KR4:
+   case ICE_PHY_TYPE_LOW_40G_XLAUI_AOC_ACC:
+   case 

[net-next 14/15] ice: Introduce SERVICE_DIS flag and service routine functions

2018-08-28 Thread Jeff Kirsher
From: Akeem G Abodunrin 

This patch introduces SERVICE_DIS flag to use for stopping service task.
This flag will be checked before scheduling new tasks. Also add new
functions ice_service_task_stop to stop service task.
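The gating this patch adds to ice_service_task_schedule() and ice_service_task_stop() can be modeled in user space with C11 atomics. This is a minimal sketch under stated assumptions: the SVC_* bits are hypothetical stand-ins for __ICE_SERVICE_SCHED, __ICE_SERVICE_DIS and __ICE_NEEDS_RESTART, a counter replaces queue_work(), and the timer/work cancellation is only noted in a comment.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the ice_state bits used by the patch. */
enum svc_bit {
	SVC_SCHED		= 1u << 0,	/* like __ICE_SERVICE_SCHED */
	SVC_DIS			= 1u << 1,	/* like __ICE_SERVICE_DIS */
	SVC_NEEDS_RESTART	= 1u << 2,	/* like __ICE_NEEDS_RESTART */
};

struct svc {
	atomic_uint state;
	int queued;		/* counts simulated queue_work() calls */
};

/* Same condition order as ice_service_task_schedule(). */
static void svc_schedule(struct svc *s)
{
	if (atomic_load(&s->state) & SVC_DIS)
		return;
	/* test_and_set_bit() equivalent: the old value says if SCHED was set */
	if (atomic_fetch_or(&s->state, SVC_SCHED) & SVC_SCHED)
		return;
	if (atomic_load(&s->state) & SVC_NEEDS_RESTART)
		return;
	s->queued++;
}

/* Like ice_service_task_complete(): allow the next schedule. */
static void svc_complete(struct svc *s)
{
	atomic_fetch_and(&s->state, ~(unsigned int)SVC_SCHED);
}

/* Like ice_service_task_stop(): block new work, then clear SCHED. */
static void svc_stop(struct svc *s)
{
	atomic_fetch_or(&s->state, SVC_DIS);
	/* the driver also calls del_timer_sync() and cancel_work_sync() here */
	atomic_fetch_and(&s->state, ~(unsigned int)SVC_SCHED);
}
```

Setting DIS before cancelling the timer and work is what keeps a late timer callback from queueing the task again while it is being torn down.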

Signed-off-by: Akeem G Abodunrin 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h  |  1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 34 ++-
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 6f44a850c4b2..9cf233d085d8 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -138,6 +138,7 @@ enum ice_state {
__ICE_FLTR_OVERFLOW_PROMISC,
__ICE_CFG_BUSY,
__ICE_SERVICE_SCHED,
+   __ICE_SERVICE_DIS,
__ICE_STATE_NBITS   /* must be last */
 };
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 46d8e2275647..b1c4dfbdeeb3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1107,7 +1107,7 @@ static void ice_clean_adminq_subtask(struct ice_pf *pf)
  */
 static void ice_service_task_schedule(struct ice_pf *pf)
 {
-   if (!test_bit(__ICE_DOWN, pf->state) &&
+   if (!test_bit(__ICE_SERVICE_DIS, pf->state) &&
!test_and_set_bit(__ICE_SERVICE_SCHED, pf->state) &&
!test_bit(__ICE_NEEDS_RESTART, pf->state))
queue_work(ice_wq, &pf->serv_task);
@@ -1126,6 +1126,22 @@ static void ice_service_task_complete(struct ice_pf *pf)
clear_bit(__ICE_SERVICE_SCHED, pf->state);
 }
 
+/**
+ * ice_service_task_stop - stop service task and cancel works
+ * @pf: board private structure
+ */
+static void ice_service_task_stop(struct ice_pf *pf)
+{
+   set_bit(__ICE_SERVICE_DIS, pf->state);
+
+   if (pf->serv_tmr.function)
+   del_timer_sync(&pf->serv_tmr);
+   if (pf->serv_task.func)
+   cancel_work_sync(&pf->serv_task);
+
+   clear_bit(__ICE_SERVICE_SCHED, pf->state);
+}
+
 /**
  * ice_service_timer - timer callback to schedule service task
  * @t: pointer to timer_list
@@ -3389,10 +3405,7 @@ static void ice_determine_q_usage(struct ice_pf *pf)
  */
 static void ice_deinit_pf(struct ice_pf *pf)
 {
-   if (pf->serv_tmr.function)
-   del_timer_sync(&pf->serv_tmr);
-   if (pf->serv_task.func)
-   cancel_work_sync(&pf->serv_task);
+   ice_service_task_stop(pf);
mutex_destroy(&pf->sw_mutex);
mutex_destroy(&pf->avail_q_mutex);
 }
@@ -3599,6 +3612,8 @@ static int ice_probe(struct pci_dev *pdev,
pf->pdev = pdev;
pci_set_drvdata(pdev, pf);
set_bit(__ICE_DOWN, pf->state);
+   /* Disable service task until DOWN bit is cleared */
+   set_bit(__ICE_SERVICE_DIS, pf->state);
 
hw = &pf->hw;
hw->hw_addr = pcim_iomap_table(pdev)[ICE_BAR0];
@@ -3656,6 +3671,9 @@ static int ice_probe(struct pci_dev *pdev,
goto err_init_interrupt_unroll;
}
 
+   /* Driver is mostly up */
+   clear_bit(__ICE_DOWN, pf->state);
+
/* In case of MSIX we are going to setup the misc vector right here
 * to handle admin queue events etc. In case of legacy and MSI
 * the misc functionality and queue processing is combined in
@@ -3695,8 +3713,7 @@ static int ice_probe(struct pci_dev *pdev,
goto err_alloc_sw_unroll;
}
 
-   /* Driver is mostly up */
-   clear_bit(__ICE_DOWN, pf->state);
+   clear_bit(__ICE_SERVICE_DIS, pf->state);
 
/* since everything is good, start the service timer */
mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
@@ -3710,6 +3727,7 @@ static int ice_probe(struct pci_dev *pdev,
return 0;
 
 err_alloc_sw_unroll:
+   set_bit(__ICE_SERVICE_DIS, pf->state);
set_bit(__ICE_DOWN, pf->state);
devm_kfree(&pf->pdev->dev, pf->first_sw);
 err_msix_misc_unroll:
@@ -3737,6 +3755,7 @@ static void ice_remove(struct pci_dev *pdev)
return;
 
set_bit(__ICE_DOWN, pf->state);
+   ice_service_task_stop(pf);
 
ice_vsi_release_all(pf);
ice_free_irq_msix_misc(pf);
@@ -5996,6 +6015,7 @@ static void ice_tx_timeout(struct net_device *netdev)
netdev_err(netdev, "tx_timeout recovery unsuccessful, device is in unrecoverable state.\n");
set_bit(__ICE_DOWN, pf->state);
set_bit(__ICE_NEEDS_RESTART, vsi->state);
+   set_bit(__ICE_SERVICE_DIS, pf->state);
break;
}
 
-- 
2.17.1



[net-next 10/15] ice: Add support for Tx hang, Tx timeout and malicious driver detection

2018-08-28 Thread Jeff Kirsher
From: Sudheer Mogilappagari 

When a malicious operation is detected, the firmware triggers an
interrupt, which is then picked up by the service task (specifically by
ice_handle_mdd_event). A reset is scheduled if required.

Tx hang detection works in a similar way, except the logic here monitors
the VSI's Tx queues and tries to revive them if stalled. If the hang is
not resolved, the kernel eventually calls ndo_tx_timeout, which is
handled by ice_tx_timeout.
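As an illustration of what the MDD handling has to pull out of a detection register, the GL_MDET_TX_PQM field layout added to ice_hw_autogen.h by this patch can be decoded as below. This is a sketch, not the driver's ice_handle_mdd_event(): it assumes ICE_M(m, s) expands to ((m) << (s)), redefines BIT() for user space, and the struct mdd_event type is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins; assumes ICE_M(m, s) is ((m) << (s)). */
#define BIT(n)		(1UL << (n))
#define ICE_M(m, s)	((m) << (s))

/* Field layout taken from the GL_MDET_TX_PQM definitions in this patch. */
#define GL_MDET_TX_PQM_PF_NUM_S		0
#define GL_MDET_TX_PQM_PF_NUM_M		ICE_M(0x7, 0)
#define GL_MDET_TX_PQM_VF_NUM_S		4
#define GL_MDET_TX_PQM_VF_NUM_M		ICE_M(0xFF, 4)
#define GL_MDET_TX_PQM_QNUM_S		12
#define GL_MDET_TX_PQM_QNUM_M		ICE_M(0x3FFF, 12)
#define GL_MDET_TX_PQM_MAL_TYPE_S	26
#define GL_MDET_TX_PQM_MAL_TYPE_M	ICE_M(0x1F, 26)
#define GL_MDET_TX_PQM_VALID_M		BIT(31)

/* Hypothetical container for the decoded fields. */
struct mdd_event {
	bool valid;
	uint8_t pf_num;
	uint8_t vf_num;
	uint16_t queue;
	uint8_t mal_type;
};

/* Extract each field with its mask, then shift it down to bit 0. */
static struct mdd_event decode_gl_mdet_tx_pqm(uint32_t reg)
{
	struct mdd_event ev;

	ev.valid = (reg & GL_MDET_TX_PQM_VALID_M) != 0;
	ev.pf_num = (reg & GL_MDET_TX_PQM_PF_NUM_M) >> GL_MDET_TX_PQM_PF_NUM_S;
	ev.vf_num = (reg & GL_MDET_TX_PQM_VF_NUM_M) >> GL_MDET_TX_PQM_VF_NUM_S;
	ev.queue = (reg & GL_MDET_TX_PQM_QNUM_M) >> GL_MDET_TX_PQM_QNUM_S;
	ev.mal_type = (reg & GL_MDET_TX_PQM_MAL_TYPE_M) >>
		      GL_MDET_TX_PQM_MAL_TYPE_S;
	return ev;
}
```

The GL_MDET_RX and GL_MDET_TX_TCLAN registers use a different field order (queue number in the low bits), so each would need its own mask/shift set.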

Signed-off-by: Sudheer Mogilappagari 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h  |   4 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |  39 +++
 drivers/net/ethernet/intel/ice/ice_main.c | 286 ++
 drivers/net/ethernet/intel/ice/ice_txrx.c |   1 +
 drivers/net/ethernet/intel/ice/ice_txrx.h |   1 +
 5 files changed, 331 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index e17030db0bee..6f44a850c4b2 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -134,6 +134,7 @@ enum ice_state {
__ICE_SUSPENDED, /* set on module remove path */
__ICE_RESET_FAILED, /* set by reset/rebuild */
__ICE_ADMINQ_EVENT_PENDING,
+   __ICE_MDD_EVENT_PENDING,
__ICE_FLTR_OVERFLOW_PROMISC,
__ICE_CFG_BUSY,
__ICE_SERVICE_SCHED,
@@ -272,6 +273,9 @@ struct ice_pf {
struct ice_hw_port_stats stats_prev;
struct ice_hw hw;
u8 stat_prev_loaded; /* has previous stats been loaded */
+   u32 tx_timeout_count;
+   unsigned long tx_timeout_last_recovery;
+   u32 tx_timeout_recovery_level;
char int_name[ICE_INT_NAME_STR_LEN];
 };
 
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 067ca26a1d94..88f11498804b 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -123,6 +123,45 @@
 #define QRX_CTRL_QENA_STAT_M   BIT(2)
 #define QRX_ITR(_QRX)  (0x00292000 + ((_QRX) * 4))
 #define QRX_TAIL(_QRX) (0x0029 + ((_QRX) * 4))
+#define QRX_TAIL_MAX_INDEX 2047
+#define QRX_TAIL_TAIL_S 0
+#define QRX_TAIL_TAIL_M ICE_M(0x1FFF, 0)
+#define GL_MDET_RX 0x00294C00
+#define GL_MDET_RX_QNUM_S  0
+#define GL_MDET_RX_QNUM_M  ICE_M(0x7FFF, 0)
+#define GL_MDET_RX_VF_NUM_S 15
+#define GL_MDET_RX_VF_NUM_M ICE_M(0xFF, 15)
+#define GL_MDET_RX_PF_NUM_S 23
+#define GL_MDET_RX_PF_NUM_M ICE_M(0x7, 23)
+#define GL_MDET_RX_MAL_TYPE_S  26
+#define GL_MDET_RX_MAL_TYPE_M  ICE_M(0x1F, 26)
+#define GL_MDET_RX_VALID_M BIT(31)
+#define GL_MDET_TX_PQM 0x002D2E00
+#define GL_MDET_TX_PQM_PF_NUM_S 0
+#define GL_MDET_TX_PQM_PF_NUM_M ICE_M(0x7, 0)
+#define GL_MDET_TX_PQM_VF_NUM_S 4
+#define GL_MDET_TX_PQM_VF_NUM_M ICE_M(0xFF, 4)
+#define GL_MDET_TX_PQM_QNUM_S  12
+#define GL_MDET_TX_PQM_QNUM_M  ICE_M(0x3FFF, 12)
+#define GL_MDET_TX_PQM_MAL_TYPE_S  26
+#define GL_MDET_TX_PQM_MAL_TYPE_M  ICE_M(0x1F, 26)
+#define GL_MDET_TX_PQM_VALID_M BIT(31)
+#define GL_MDET_TX_TCLAN   0x000FC068
+#define GL_MDET_TX_TCLAN_QNUM_S 0
+#define GL_MDET_TX_TCLAN_QNUM_M ICE_M(0x7FFF, 0)
+#define GL_MDET_TX_TCLAN_VF_NUM_S  15
+#define GL_MDET_TX_TCLAN_VF_NUM_M  ICE_M(0xFF, 15)
+#define GL_MDET_TX_TCLAN_PF_NUM_S  23
+#define GL_MDET_TX_TCLAN_PF_NUM_M  ICE_M(0x7, 23)
+#define GL_MDET_TX_TCLAN_MAL_TYPE_S 26
+#define GL_MDET_TX_TCLAN_MAL_TYPE_M ICE_M(0x1F, 26)
+#define GL_MDET_TX_TCLAN_VALID_M   BIT(31)
+#define PF_MDET_RX 0x00294280
+#define PF_MDET_RX_VALID_M BIT(0)
+#define PF_MDET_TX_PQM 0x002D2C80
+#define PF_MDET_TX_PQM_VALID_M BIT(0)
+#define PF_MDET_TX_TCLAN   0x000FC000
+#define PF_MDET_TX_TCLAN_VALID_M   BIT(0)
 #define GLNVM_FLA  0x000B6108
 #define GLNVM_FLA_LOCKED_M BIT(6)
 #define GLNVM_GENS 0x000B6100
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 1ef63bf98cd8..fccecb6fa618 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/i

[net-next 15/15] ice: Fix and update driver version string

2018-08-28 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

Remove the "ice" prefix for the driver version string and bump version
to 0.7.1-k.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b1c4dfbdeeb3..1b49a605d094 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -7,7 +7,7 @@
 
 #include "ice.h"
 
-#define DRV_VERSION "ice-0.7.0-k"
+#define DRV_VERSION "0.7.1-k"
 #define DRV_SUMMARY "Intel(R) Ethernet Connection E800 Series Linux Driver"
 const char ice_drv_ver[] = DRV_VERSION;
 static const char ice_driver_string[] = DRV_SUMMARY;
-- 
2.17.1



[net-next 06/15] ice: Refactor switch rule management structures and functions

2018-08-28 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch is an adaptation of the work originally done by Grishma
Kotecha  that in summary refactors the
switch filtering logic in the driver. More specifically,
 - Update the recipe structure to also store a list of rules
 - Update the existing code for recipes like MAC, VLAN, ethtype etc. to
   use a list head that is attached to the switch recipe structure
 - Add a common function to search for a rule entry and add a new rule
   entry. Update the code to use this new function.
 - Refactor the rem_handle_vsi_list function to simplify the logic
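The idea of keeping one rule list per recipe and funneling lookups and insertions through a single "search, else add" helper can be sketched as follows. Everything here is a hypothetical simplified model (MAC-only keys, a singly linked list, no per-recipe filt_rule_lock mutex), not the ice_sw_recipe or ice_fltr_mgmt_list_entry layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup types; the driver indexes recipes by ICE_SW_LKUP_*. */
enum lkup_type { LKUP_MAC, LKUP_VLAN, LKUP_ETHERTYPE, LKUP_LAST };

struct rule {
	uint8_t mac[6];		/* only MAC-style keys are modeled */
	uint16_t vsi;
	struct rule *next;
};

struct recipe {
	uint8_t root_rid;
	struct rule *rules;	/* per-recipe rule list */
};

static void init_recipes(struct recipe *r)
{
	for (int i = 0; i < LKUP_LAST; i++) {
		r[i].root_rid = (uint8_t)i;
		r[i].rules = NULL;
	}
}

static struct rule *find_rule(struct recipe *r, const uint8_t mac[6])
{
	for (struct rule *it = r->rules; it; it = it->next)
		if (!memcmp(it->mac, mac, 6))
			return it;
	return NULL;
}

/* Common entry point: reuse a matching rule or prepend a new one. */
static struct rule *find_or_add_rule(struct recipe *r, const uint8_t mac[6],
				     uint16_t vsi, bool *added)
{
	struct rule *it = find_rule(r, mac);

	*added = false;
	if (it)
		return it;
	it = calloc(1, sizeof(*it));
	if (!it)
		return NULL;
	memcpy(it->mac, mac, 6);
	it->vsi = vsi;
	it->next = r->rules;
	r->rules = it;
	*added = true;
	return it;
}

static void free_recipes(struct recipe *r)
{
	for (int i = 0; i < LKUP_LAST; i++) {
		struct rule *it = r[i].rules;

		while (it) {
			struct rule *next = it->next;

			free(it);
			it = next;
		}
		r[i].rules = NULL;
	}
}
```

With one helper per operation, MAC, VLAN and ethertype filters share the same search and insert path instead of each owning a separate list and lock.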

CC: Shannon Nelson 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  36 +-
 drivers/net/ethernet/intel/ice/ice_switch.c   | 967 --
 drivers/net/ethernet/intel/ice/ice_switch.h   |  35 +-
 drivers/net/ethernet/intel/ice/ice_type.h |  13 +-
 5 files changed, 500 insertions(+), 553 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 9a33fb95c0ea..87b304db9cad 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -443,6 +443,8 @@ struct ice_aqc_vsi_props {
u8 reserved[24];
 };
 
+#define ICE_MAX_NUM_RECIPES 64
+
 /* Add/Update/Remove/Get switch rules (indirect 0x02A0, 0x02A1, 0x02A2, 0x02A3)
  */
 struct ice_aqc_sw_rules {
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 2a1e13576ce2..4c6b1038dc5f 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -388,20 +388,7 @@ static enum ice_status ice_init_fltr_mgmt_struct(struct ice_hw *hw)
 
INIT_LIST_HEAD(&sw->vsi_list_map_head);
 
-   mutex_init(&sw->mac_list_lock);
-   INIT_LIST_HEAD(&sw->mac_list_head);
-
-   mutex_init(&sw->vlan_list_lock);
-   INIT_LIST_HEAD(&sw->vlan_list_head);
-
-   mutex_init(&sw->eth_m_list_lock);
-   INIT_LIST_HEAD(&sw->eth_m_list_head);
-
-   mutex_init(&sw->promisc_list_lock);
-   INIT_LIST_HEAD(&sw->promisc_list_head);
-
-   mutex_init(&sw->mac_vlan_list_lock);
-   INIT_LIST_HEAD(&sw->mac_vlan_list_head);
+   ice_init_def_sw_recp(hw);
 
return 0;
 }
@@ -415,19 +402,28 @@ static void ice_cleanup_fltr_mgmt_struct(struct ice_hw *hw)
struct ice_switch_info *sw = hw->switch_info;
struct ice_vsi_list_map_info *v_pos_map;
struct ice_vsi_list_map_info *v_tmp_map;
+   struct ice_sw_recipe *recps;
+   u8 i;
 
list_for_each_entry_safe(v_pos_map, v_tmp_map, &sw->vsi_list_map_head,
 list_entry) {
list_del(&v_pos_map->list_entry);
devm_kfree(ice_hw_to_dev(hw), v_pos_map);
}
+   recps = hw->switch_info->recp_list;
+   for (i = 0; i < ICE_SW_LKUP_LAST; i++) {
+   struct ice_fltr_mgmt_list_entry *lst_itr, *tmp_entry;
+
+   recps[i].root_rid = i;
+   mutex_destroy(&recps[i].filt_rule_lock);
+   list_for_each_entry_safe(lst_itr, tmp_entry,
+&recps[i].filt_rules, list_entry) {
+   list_del(&lst_itr->list_entry);
+   devm_kfree(ice_hw_to_dev(hw), lst_itr);
+   }
+   }
 
-   mutex_destroy(&sw->mac_list_lock);
-   mutex_destroy(&sw->vlan_list_lock);
-   mutex_destroy(&sw->eth_m_list_lock);
-   mutex_destroy(&sw->promisc_list_lock);
-   mutex_destroy(&sw->mac_vlan_list_lock);
-
+   devm_kfree(ice_hw_to_dev(hw), sw->recp_list);
devm_kfree(ice_hw_to_dev(hw), sw);
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index d8b18cabc3a8..2693bebef977 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -85,6 +85,35 @@ ice_aq_alloc_free_res(struct ice_hw *hw, u16 num_entries,
return ice_aq_send_cmd(hw, &desc, buf, buf_size, cd);
 }
 
+/**
+ * ice_init_def_sw_recp - initialize the recipe book keeping tables
+ * @hw: pointer to the hw struct
+ *
+ * Allocate memory for the entire recipe table and initialize the structures/
+ * entries corresponding to basic recipes.
+ */
+enum ice_status
+ice_init_def_sw_recp(struct ice_hw *hw)
+{
+   struct ice_sw_recipe *recps;
+   u8 i;
+
+   recps = devm_kcalloc(ice_hw_to_dev(hw), ICE_MAX_NUM_RECIPES,
+sizeof(struct ice_sw_recipe), GFP_KERNEL);
+   if (!recps)
+   return ICE_ERR_NO_MEMORY;
+
+   for (i = 0; i < ICE_SW_LKUP_LAST; i++) {
+   recps[i].root_rid = i;
+   INIT_LIST_HEAD(&recps[i].filt_rules);
+   mutex_init(&recps[i].filt_rule_lock);
+   }
+
+  

[net-next 00/15][pull request] 100GbE Intel Wired LAN Driver Updates 2018-08-28

2018-08-28 Thread Jeff Kirsher
This series contains new features and implementation updates for the
ice driver.

Anirudh reworks the current flex programming logic to add support for
a second flex descriptor profile.  Updates the transmit scheduler
code to handle changes to the spec; specifically, the firmware expects
a 4KB buffer at all times, so the default scheduler topology buffer
size is fixed.  Also, the maximum children per node per layer is
replaced by the maximum sibling group size.  Adds a check to ensure a
reset is not in progress before exercising a control queue operation.
Refactors the switch rule management functions and structures to
simplify the logic and to add a common function to search for a rule
entry and add a new rule entry.  Refactors the VSI allocation, deletion
and rebuild flow so that on reset we can restore all the filters that
were previously added.  Does some spring cleaning of define names and
macros.

Dan updates the admin queue command for requesting resource ownership
to the latest specification by adding new enums and correcting the lock
resource IDs.

Zhenning optimizes the driver by using the existing buffer in a
structure directly versus a local array.

Chinh implements ethtool handlers for getting and setting link settings.

Sudheer implements transmit hang/timeout detection and malicious driver
detection in the driver.

Md Fahad Iqbal implements the get and set bridge mode operations.

Hieu adds the ability for firmware logging during initialization.

Brett updates the driver to only enable VSI transmit and receive pruning
when VLAN 0 is active, and when VLAN 0 is removed/not active, pruning is
disabled.

Akeem adds a flag to use for stopping the service task.

The following are changes since commit 050cdc6c9501abcd64720b8cc3e7941efee9547d:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Akeem G Abodunrin (1):
  ice: Introduce SERVICE_DIS flag and service routine functions

Anirudh Venkataramanan (7):
  ice: Rework flex descriptor programming
  ice: Updates to Tx scheduler code
  ice: Prevent control queue operations during reset
  ice: Refactor switch rule management structures and functions
  ice: Refactor VSI allocation, deletion and rebuild flow
  ice: Clean up register file
  ice: Fix and update driver version string

Brett Creeley (1):
  ice: Enable VSI Rx/Tx pruning only when VLAN 0 is active

Chinh Cao (1):
  ice: Implement handlers for ethtool PHY/link operations

Dan Nowlin (1):
  ice: Update request resource command to latest specification

Hieu Tran (1):
  ice: Enable firmware logging during device initialization.

Md Fahad Iqbal Polash (1):
  ice: Implement ice_bridge_getlink and ice_bridge_setlink

Sudheer Mogilappagari (1):
  ice: Add support for Tx hang, Tx timeout and malicious driver
detection

Zhenning Xiao (1):
  ice: Code optimization for ice_fill_sw_rule()

 drivers/net/ethernet/intel/ice/ice.h  |7 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   99 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |  525 +-
 drivers/net/ethernet/intel/ice/ice_common.h   |   17 +-
 drivers/net/ethernet/intel/ice/ice_controlq.c |3 +
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  801 -
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |  456 +++---
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h|   24 +-
 drivers/net/ethernet/intel/ice/ice_main.c |  964 +--
 drivers/net/ethernet/intel/ice/ice_nvm.c  |2 +-
 drivers/net/ethernet/intel/ice/ice_sched.c|  161 +-
 drivers/net/ethernet/intel/ice/ice_status.h   |1 +
 drivers/net/ethernet/intel/ice/ice_switch.c   | 1459 ++---
 drivers/net/ethernet/intel/ice/ice_switch.h   |   50 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c |1 +
 drivers/net/ethernet/intel/ice/ice_txrx.h |1 +
 drivers/net/ethernet/intel/ice/ice_type.h |   52 +-
 17 files changed, 3375 insertions(+), 1248 deletions(-)

-- 
2.17.1



[net-next 13/15] ice: Enable VSI Rx/Tx pruning only when VLAN 0 is active

2018-08-28 Thread Jeff Kirsher
From: Brett Creeley 

VLAN pruning is not valid when VLAN 0 is not active. If VLAN
pruning is enabled and VLAN 0 is not active (8021q driver not loaded),
then normal, non-VLAN traffic will not pass.

TX/RX VLAN pruning is enabled when VLAN 0 is added to the
active_vlan bitmap, and it is disabled when VLAN 0 is removed from the
active_vlan bitmap.

So, only enable VLAN pruning when VLAN 0 is active. Setting RX VLAN
pruning causes the switch to drop received VLAN packets when there
are no matching VLAN ids in the associated VSI's switch filters. Setting
TX pruning makes it so the switch will not send out any packets with
VLAN tags that don't match the associated VSI's switch filters.
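The Rx and Tx pruning rules described above reduce to a membership test against the VSI's VLAN filter set. The model below is a hypothetical sketch of that decision, not switch firmware behavior; it treats untagged frames as VID 0, which matches the comment in ice_vlan_rx_add_vid() that the VLAN 0 filter is what keeps untagged packets flowing once the prune list applies to all packets.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VLAN_N_VID 4096

/* Hypothetical model of a VSI's VLAN filter set and pruning flags. */
struct vsi_model {
	uint64_t vlan_filters[MODEL_VLAN_N_VID / 64];
	bool rx_prune;
	bool tx_prune;
};

static void vsi_add_vlan(struct vsi_model *v, uint16_t vid)
{
	v->vlan_filters[vid / 64] |= 1ULL << (vid % 64);
}

static bool vsi_has_vlan(const struct vsi_model *v, uint16_t vid)
{
	return (v->vlan_filters[vid / 64] >> (vid % 64)) & 1;
}

/* Rx: with pruning on, a frame is kept only if its VID has a filter. */
static bool rx_accept(const struct vsi_model *v, uint16_t vid)
{
	return !v->rx_prune || vsi_has_vlan(v, vid);
}

/* Tx: with pruning on, a frame is sent only if its VID has a filter. */
static bool tx_allow(const struct vsi_model *v, uint16_t vid)
{
	return !v->tx_prune || vsi_has_vlan(v, vid);
}
```

In this model, the PF0 example in the next paragraph corresponds to filters for VLANs 0, 10 and 11, where tx_allow() rejects VID 8; in hardware that rejection is what surfaces as an MDD event.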

With this patch, if the VF or PF tries to send a VLAN tagged packet with
a VLAN tag that it does not have a pruning rule for, it will trigger an
MDD event. For example, if PF0 has VLAN10 and VLAN11 interfaces and
scapy is used to send a packet with VLAN8, then the MDD is triggered.

Also make ice_vsi_kill_vlan return a value which the caller can check
before updating VLAN related data structures (counts, pruning bits, etc.).
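The control flow this enables, where pruning is toggled on VLAN 0 transitions and bookkeeping is updated only after the hardware call succeeds, can be sketched as follows. This is a hypothetical model: struct vlan_state, the hw_fail test hook and the simulated filter calls are invented for illustration, and the exact ordering inside the driver's ice_vlan_rx_kill_vid() is not shown in this excerpt.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one VSI. */
struct vlan_state {
	bool pruning;
	unsigned int num_vlans;
	bool hw_fail;		/* test hook: make simulated HW calls fail */
};

/* Hypothetical stand-in for a switch filter update; 0 or -EIO. */
static int hw_vlan_filter_op(struct vlan_state *s, uint16_t vid)
{
	(void)vid;
	return s->hw_fail ? -EIO : 0;
}

static int add_vid(struct vlan_state *s, uint16_t vid)
{
	int ret = hw_vlan_filter_op(s, vid);

	if (ret)
		return ret;
	if (vid == 0)
		s->pruning = true;	/* enable pruning when VLAN 0 is added */
	s->num_vlans++;
	return 0;
}

static int kill_vid(struct vlan_state *s, uint16_t vid)
{
	int ret = hw_vlan_filter_op(s, vid);

	if (ret)
		return ret;		/* leave counts and pruning untouched */
	s->num_vlans--;
	if (vid == 0)
		s->pruning = false;	/* disable pruning once VLAN 0 is gone */
	return 0;
}
```

Returning the error before touching num_vlans or pruning is the point of making ice_vsi_kill_vlan() return a value: software state never claims a VLAN is gone while its filter is still installed.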

Signed-off-by: Brett Creeley 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c | 94 ---
 1 file changed, 85 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f04e124bca8c..46d8e2275647 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -349,6 +349,63 @@ static bool ice_vsi_fltr_changed(struct ice_vsi *vsi)
   test_bit(ICE_VSI_FLAG_VLAN_FLTR_CHANGED, vsi->flags);
 }
 
+/**
+ * ice_cfg_vlan_pruning - enable or disable VLAN pruning on the VSI
+ * @vsi: VSI to enable or disable VLAN pruning on
+ * @ena: set to true to enable VLAN pruning and false to disable it
+ *
+ * returns 0 if VSI is updated, negative otherwise
+ */
+static int ice_cfg_vlan_pruning(struct ice_vsi *vsi, bool ena)
+{
+   struct ice_vsi_ctx *ctxt;
+   struct device *dev;
+   int status;
+
+   if (!vsi)
+   return -EINVAL;
+
+   dev = &vsi->back->pdev->dev;
+   ctxt = devm_kzalloc(dev, sizeof(*ctxt), GFP_KERNEL);
+   if (!ctxt)
+   return -ENOMEM;
+
+   ctxt->info = vsi->info;
+
+   if (ena) {
+   ctxt->info.sec_flags |=
+   ICE_AQ_VSI_SEC_TX_VLAN_PRUNE_ENA <<
+   ICE_AQ_VSI_SEC_TX_PRUNE_ENA_S;
+   ctxt->info.sw_flags2 |= ICE_AQ_VSI_SW_FLAG_RX_VLAN_PRUNE_ENA;
+   } else {
+   ctxt->info.sec_flags &=
+   ~(ICE_AQ_VSI_SEC_TX_VLAN_PRUNE_ENA <<
+ ICE_AQ_VSI_SEC_TX_PRUNE_ENA_S);
+   ctxt->info.sw_flags2 &= ~ICE_AQ_VSI_SW_FLAG_RX_VLAN_PRUNE_ENA;
+   }
+
+   ctxt->info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_SECURITY_VALID |
+   ICE_AQ_VSI_PROP_SW_VALID);
+   ctxt->vsi_num = vsi->vsi_num;
+   status = ice_aq_update_vsi(&vsi->back->hw, ctxt, NULL);
+   if (status) {
+   netdev_err(vsi->netdev, "%sabling VLAN pruning on VSI %d failed, err = %d, aq_err = %d\n",
+  ena ? "Ena" : "Dis", vsi->vsi_num, status,
+  vsi->back->hw.adminq.sq_last_status);
+   goto err_out;
+   }
+
+   vsi->info.sec_flags = ctxt->info.sec_flags;
+   vsi->info.sw_flags2 = ctxt->info.sw_flags2;
+
+   devm_kfree(dev, ctxt);
+   return 0;
+
+err_out:
+   devm_kfree(dev, ctxt);
+   return -EIO;
+}
+
 /**
  * ice_vsi_sync_fltr - Update the VSI filter list to the HW
  * @vsi: ptr to the VSI
@@ -3126,7 +3183,7 @@ static int ice_vlan_rx_add_vid(struct net_device *netdev,
 {
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
-   int ret = 0;
+   int ret;
 
if (vid >= VLAN_N_VID) {
netdev_err(netdev, "VLAN id requested %d is out of range %d\n",
@@ -3137,6 +3194,13 @@ static int ice_vlan_rx_add_vid(struct net_device *netdev,
if (vsi->info.pvid)
return -EINVAL;
 
+   /* Enable VLAN pruning when VLAN 0 is added */
+   if (unlikely(!vid)) {
+   ret = ice_cfg_vlan_pruning(vsi, true);
+   if (ret)
+   return ret;
+   }
+
/* Add all VLAN ids including 0 to the switch filter. VLAN id 0 is
 * needed to continue allowing all untagged packets since VLAN prune
 * list is applied to all packets by the switch
@@ -3153,16 +3217,19 @@ static int ice_vlan_rx_add_vid(struct net_device *netdev,
  * ice_vsi_kill_vlan - Remove VSI membership for a given VLAN
  * @vsi: the VSI being configured
  * @vid: VLAN id to be removed
+ *
+ * Returns 0 on success and negative on failure
  */
-static void ice_vsi_kill_vl

[net-next 03/15] ice: Update request resource command to latest specification

2018-08-28 Thread Jeff Kirsher
From: Dan Nowlin 

Align Request Resource Ownership AQ command (0x0008) to the latest
specification. This includes:

- Correcting the resource IDs for the Global Cfg and Change locks.
- new enum ICE_CHANGE_LOCK_RES_ID
- new enum ICE_GLOBAL_CFG_LOCK_RES_ID
- Altering the flow for Global Config Lock to allow only the first PF to
  download the package.
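The three outcomes of a Global Config Lock request, as described in the updated ice_aq_req_res() comment, can be sketched as a decode step plus a caller decision. The status codes and result names below are hypothetical placeholders for ICE_AQ_RES_GLBL_* and the ice_status values; as in the patch, an unrecognized firmware status is treated like a failure with a zero timeout.

```c
#include <stdint.h>

/* Hypothetical codes; the real ICE_AQ_RES_GLBL_* values live in the driver. */
enum glbl_resp { GLBL_SUCCESS, GLBL_IN_PROG, GLBL_DONE };
enum lock_result { LOCK_ACQUIRED, LOCK_FAIL, LOCK_NO_WORK };

/*
 * Decode the status field of a Global Config Lock response.
 * SUCCESS:  lock held, timeout is how long this PF may hold it.
 * IN_PROG:  another owner holds it, timeout is its remaining hold time.
 * DONE:     another PF already downloaded the package.
 * Anything else is an invalid response: fail with a zero timeout.
 */
static enum lock_result decode_glbl_lock(uint16_t status,
					 uint32_t resp_timeout,
					 uint32_t *timeout)
{
	*timeout = 0;
	switch (status) {
	case GLBL_SUCCESS:
		*timeout = resp_timeout;
		return LOCK_ACQUIRED;
	case GLBL_IN_PROG:
		*timeout = resp_timeout;
		return LOCK_FAIL;
	case GLBL_DONE:
		return LOCK_NO_WORK;
	default:
		return LOCK_FAIL;
	}
}

/* 1: download the package, 0: skip the download, -1: fail to load. */
static int pkg_download_action(enum lock_result r)
{
	switch (r) {
	case LOCK_ACQUIRED:
		return 1;
	case LOCK_NO_WORK:
		return 0;
	default:
		return -1;
	}
}
```

Only the first PF to get LOCK_ACQUIRED performs the download; later PFs see the "done" status and continue loading without repeating it.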

Signed-off-by: Dan Nowlin 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_common.c | 75 -
 drivers/net/ethernet/intel/ice/ice_common.h |  2 +-
 drivers/net/ethernet/intel/ice/ice_nvm.c|  2 +-
 drivers/net/ethernet/intel/ice/ice_type.h   |  9 ++-
 4 files changed, 67 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index b315655eab27..2a1e13576ce2 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -967,7 +967,22 @@ enum ice_status ice_aq_q_shutdown(struct ice_hw *hw, bool unloading)
  * @timeout: the maximum time in ms that the driver may hold the resource
  * @cd: pointer to command details structure or NULL
  *
- * requests common resource using the admin queue commands (0x0008)
+ * Requests common resource using the admin queue commands (0x0008).
+ * When attempting to acquire the Global Config Lock, the driver can
+ * learn of three states:
+ *  1) ICE_SUCCESS -acquired lock, and can perform download package
+ *  2) ICE_ERR_AQ_ERROR -   did not get lock, driver should fail to load
+ *  3) ICE_ERR_AQ_NO_WORK - did not get lock, but another driver has
+ *  successfully downloaded the package; the driver does
+ *  not have to download the package and can continue
+ *  loading
+ *
+ * Note that if the caller is in an acquire lock, perform action, release lock
+ * phase of operation, it is possible that the FW may detect a timeout and issue
+ * a CORER. In this case, the driver will receive a CORER interrupt and will
+ * have to determine its cause. The calling thread that is handling this flow
+ * will likely get an error propagated back to it indicating the Download
+ * Package, Update Package or the Release Resource AQ commands timed out.
  */
 static enum ice_status
 ice_aq_req_res(struct ice_hw *hw, enum ice_aq_res_ids res,
@@ -985,13 +1000,43 @@ ice_aq_req_res(struct ice_hw *hw, enum ice_aq_res_ids res,
cmd_resp->res_id = cpu_to_le16(res);
cmd_resp->access_type = cpu_to_le16(access);
cmd_resp->res_number = cpu_to_le32(sdp_number);
+   cmd_resp->timeout = cpu_to_le32(*timeout);
+   *timeout = 0;
 
status = ice_aq_send_cmd(hw, &desc, NULL, 0, cd);
+
/* The completion specifies the maximum time in ms that the driver
 * may hold the resource in the Timeout field.
-* If the resource is held by someone else, the command completes with
-* busy return value and the timeout field indicates the maximum time
-* the current owner of the resource has to free it.
+*/
+
+   /* Global config lock response utilizes an additional status field.
+*
+* If the Global config lock resource is held by some other driver, the
+* command completes with ICE_AQ_RES_GLBL_IN_PROG in the status field
+* and the timeout field indicates the maximum time the current owner
+* of the resource has to free it.
+*/
+   if (res == ICE_GLOBAL_CFG_LOCK_RES_ID) {
+   if (le16_to_cpu(cmd_resp->status) == ICE_AQ_RES_GLBL_SUCCESS) {
+   *timeout = le32_to_cpu(cmd_resp->timeout);
+   return 0;
+   } else if (le16_to_cpu(cmd_resp->status) ==
+  ICE_AQ_RES_GLBL_IN_PROG) {
+   *timeout = le32_to_cpu(cmd_resp->timeout);
+   return ICE_ERR_AQ_ERROR;
+   } else if (le16_to_cpu(cmd_resp->status) ==
+  ICE_AQ_RES_GLBL_DONE) {
+   return ICE_ERR_AQ_NO_WORK;
+   }
+
+   /* invalid FW response, force a timeout immediately */
+   *timeout = 0;
+   return ICE_ERR_AQ_ERROR;
+   }
+
+   /* If the resource is held by some other driver, the command completes
+* with a busy return value and the timeout field indicates the maximum
+* time the current owner of the resource has to free it.
 */
if (!status || hw->adminq.sq_last_status == ICE_AQ_RC_EBUSY)
*timeout = le32_to_cpu(cmd_resp->timeout);
@@ -1030,30 +1075,28 @@ ice_aq_release_res(struct ice_hw *hw, enum ice_aq_res_ids res, u8 sdp_number,
  * @hw: pointer to the HW structure
  * @res: resource id
  * @access: access type (read or write)
+ * @timeout: timeout in milliseconds
  *
  * This function will attemp
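
The Global config lock branch added to ice_aq_req_res() turns the firmware's status field into a return code plus a timeout. A minimal userspace sketch of that decision table, assuming placeholder numeric values for the ICE_AQ_RES_GLBL_* statuses (the real ones live in the driver headers) and using hypothetical names (decode_glbl_lock, enum glbl_lock_result) that are not driver API:

```c
#include <stdint.h>

/* Placeholder values standing in for ICE_AQ_RES_GLBL_SUCCESS, _IN_PROG and
 * _DONE; the real constants are defined in the driver, not here. */
enum glbl_status {
	GLBL_SUCCESS = 0,
	GLBL_IN_PROG = 1,
	GLBL_DONE = 2,
};

/* Hypothetical outcomes mirroring 0, ICE_ERR_AQ_ERROR, ICE_ERR_AQ_NO_WORK. */
enum glbl_lock_result {
	LOCK_ACQUIRED,	/* we own the lock for at most *timeout ms */
	LOCK_BUSY,	/* another driver owns it for at most *timeout ms */
	LOCK_NO_WORK,	/* another driver already completed the work */
	LOCK_BAD_FW,	/* unrecognised status: do not wait at all */
};

/* Same branch structure as the ICE_GLOBAL_CFG_LOCK_RES_ID path: *timeout
 * starts at 0 and is filled in only for the SUCCESS and IN_PROG statuses. */
static enum glbl_lock_result decode_glbl_lock(uint16_t status,
					      uint32_t fw_timeout,
					      uint32_t *timeout)
{
	*timeout = 0;

	switch (status) {
	case GLBL_SUCCESS:
		*timeout = fw_timeout;
		return LOCK_ACQUIRED;
	case GLBL_IN_PROG:
		*timeout = fw_timeout;
		return LOCK_BUSY;
	case GLBL_DONE:
		return LOCK_NO_WORK;
	default:
		return LOCK_BAD_FW;
	}
}
```

A caller would presumably retry on LOCK_BUSY for up to the returned timeout and skip its own work on LOCK_NO_WORK; resources other than the global config lock keep the older EBUSY-based handling.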

[net-next 12/15] ice: Enable firmware logging during device initialization.

2018-08-28 Thread Jeff Kirsher
From: Hieu Tran 

To enable FW logging, the "cq_en" and "uart_en" enable bits of the
"fw_log" element in struct ice_hw need to be set according to
user-provided parameters during driver loading. To select which
FW log events are emitted, the "cfg" elements of the corresponding FW
modules in the "evnts" array member of "fw_log" need to be configured.

Signed-off-by: Hieu Tran 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  83 
 drivers/net/ethernet/intel/ice/ice_common.c   | 182 +-
 drivers/net/ethernet/intel/ice/ice_common.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_main.c |   3 +
 drivers/net/ethernet/intel/ice/ice_type.h |  19 ++
 5 files changed, 286 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 3dadb2b01b5c..f8dfd675486c 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1206,6 +1206,84 @@ struct ice_aqc_dis_txq {
struct ice_aqc_dis_txq_item qgrps[1];
 };
 
+/* Configure Firmware Logging Command (indirect 0xFF09)
+ * Logging Information Read Response (indirect 0xFF10)
+ * Note: The 0xFF10 command has no input parameters.
+ */
+struct ice_aqc_fw_logging {
+   u8 log_ctrl;
+#define ICE_AQC_FW_LOG_AQ_EN   BIT(0)
+#define ICE_AQC_FW_LOG_UART_EN BIT(1)
+   u8 rsvd0;
+   u8 log_ctrl_valid; /* Not used by 0xFF10 Response */
+#define ICE_AQC_FW_LOG_AQ_VALID    BIT(0)
+#define ICE_AQC_FW_LOG_UART_VALID  BIT(1)
+   u8 rsvd1[5];
+   __le32 addr_high;
+   __le32 addr_low;
+};
+
+enum ice_aqc_fw_logging_mod {
+   ICE_AQC_FW_LOG_ID_GENERAL = 0,
+   ICE_AQC_FW_LOG_ID_CTRL,
+   ICE_AQC_FW_LOG_ID_LINK,
+   ICE_AQC_FW_LOG_ID_LINK_TOPO,
+   ICE_AQC_FW_LOG_ID_DNL,
+   ICE_AQC_FW_LOG_ID_I2C,
+   ICE_AQC_FW_LOG_ID_SDP,
+   ICE_AQC_FW_LOG_ID_MDIO,
+   ICE_AQC_FW_LOG_ID_ADMINQ,
+   ICE_AQC_FW_LOG_ID_HDMA,
+   ICE_AQC_FW_LOG_ID_LLDP,
+   ICE_AQC_FW_LOG_ID_DCBX,
+   ICE_AQC_FW_LOG_ID_DCB,
+   ICE_AQC_FW_LOG_ID_NETPROXY,
+   ICE_AQC_FW_LOG_ID_NVM,
+   ICE_AQC_FW_LOG_ID_AUTH,
+   ICE_AQC_FW_LOG_ID_VPD,
+   ICE_AQC_FW_LOG_ID_IOSF,
+   ICE_AQC_FW_LOG_ID_PARSER,
+   ICE_AQC_FW_LOG_ID_SW,
+   ICE_AQC_FW_LOG_ID_SCHEDULER,
+   ICE_AQC_FW_LOG_ID_TXQ,
+   ICE_AQC_FW_LOG_ID_RSVD,
+   ICE_AQC_FW_LOG_ID_POST,
+   ICE_AQC_FW_LOG_ID_WATCHDOG,
+   ICE_AQC_FW_LOG_ID_TASK_DISPATCH,
+   ICE_AQC_FW_LOG_ID_MNG,
+   ICE_AQC_FW_LOG_ID_MAX,
+};
+
+/* This is the buffer for both of the logging commands.
+ * The entry array size depends on the datalen parameter in the descriptor.
+ * There will be a total of datalen / 2 entries.
+ */
+struct ice_aqc_fw_logging_data {
+   __le16 entry[1];
+#define ICE_AQC_FW_LOG_ID_S    0
+#define ICE_AQC_FW_LOG_ID_M    (0xFFF << ICE_AQC_FW_LOG_ID_S)
+
+#define ICE_AQC_FW_LOG_CONF_SUCCESS    0   /* Used by response */
+#define ICE_AQC_FW_LOG_CONF_BAD_INDX   BIT(12) /* Used by response */
+
+#define ICE_AQC_FW_LOG_EN_S    12
+#define ICE_AQC_FW_LOG_EN_M    (0xF << ICE_AQC_FW_LOG_EN_S)
+#define ICE_AQC_FW_LOG_INFO_EN BIT(12) /* Used by command */
+#define ICE_AQC_FW_LOG_INIT_EN BIT(13) /* Used by command */
+#define ICE_AQC_FW_LOG_FLOW_EN BIT(14) /* Used by command */
+#define ICE_AQC_FW_LOG_ERR_EN  BIT(15) /* Used by command */
+};
+
+/* Get/Clear FW Log (indirect 0xFF11) */
+struct ice_aqc_get_clear_fw_log {
+   u8 flags;
+#define ICE_AQC_FW_LOG_CLEAR   BIT(0)
+#define ICE_AQC_FW_LOG_MORE_DATA_AVAIL BIT(1)
+   u8 rsvd1[7];
+   __le32 addr_high;
+   __le32 addr_low;
+};
+
 /**
  * struct ice_aq_desc - Admin Queue (AQ) descriptor
  * @flags: ICE_AQ_FLAG_* flags
@@ -1256,6 +1334,8 @@ struct ice_aq_desc {
struct ice_aqc_dis_txqs dis_txqs;
struct ice_aqc_add_get_update_free_vsi vsi_cmd;
struct ice_aqc_add_update_free_vsi_resp add_update_free_vsi_res;
+   struct ice_aqc_fw_logging fw_logging;
+   struct ice_aqc_get_clear_fw_log get_clear_fw_log;
struct ice_aqc_alloc_free_res_cmd sw_res_ctrl;
struct ice_aqc_set_event_mask set_event_mask;
struct ice_aqc_get_link_status get_link_status;
@@ -1353,6 +1433,9 @@ enum ice_adminq_opc {
/* TX queue handling commands/events */
ice_aqc_opc_add_txqs= 0x0C30,
ice_aqc_opc_dis_txqs= 0x0C31,
+
+   /* debug commands */
+   ice_aqc_opc_fw_logging  = 0xFF09,
 };
 
 #endif /* _ICE_ADMINQ_CMD_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c 
b/drivers/net/ethernet/intel/ice/ice_c
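
Each 16-bit entry in struct ice_aqc_fw_logging_data carries a module ID in bits 0-11 and the per-event enable bits in bits 12-15, and a buffer of datalen bytes holds datalen / 2 entries. A minimal sketch of building such a buffer, assuming one entry per module in enum ice_aqc_fw_logging_mod order; fw_log_entry() and fw_log_fill() are hypothetical helpers, and the driver itself would also convert each entry with cpu_to_le16():

```c
#include <stddef.h>
#include <stdint.h>

/* Field layout copied from struct ice_aqc_fw_logging_data. */
#define FW_LOG_ID_S	0
#define FW_LOG_ID_M	(0xFFFu << FW_LOG_ID_S)
#define FW_LOG_EN_S	12
#define FW_LOG_EN_M	(0xFu << FW_LOG_EN_S)
#define FW_LOG_INFO_EN	(1u << 12)
#define FW_LOG_INIT_EN	(1u << 13)
#define FW_LOG_FLOW_EN	(1u << 14)
#define FW_LOG_ERR_EN	(1u << 15)

/* Pack one CPU-order entry: module ID in the low 12 bits, enables on top. */
static uint16_t fw_log_entry(uint16_t module_id, uint16_t en_bits)
{
	return (uint16_t)(((module_id << FW_LOG_ID_S) & FW_LOG_ID_M) |
			  (en_bits & FW_LOG_EN_M));
}

/* Fill one entry per module and return the resulting datalen in bytes
 * (two bytes per entry, hence "datalen / 2 entries"). */
static size_t fw_log_fill(uint16_t *buf, size_t nmods, uint16_t en_bits)
{
	size_t i;

	for (i = 0; i < nmods; i++)
		buf[i] = fw_log_entry((uint16_t)i, en_bits);
	return nmods * sizeof(buf[0]);
}
```

With ICE_AQC_FW_LOG_ID_MAX equal to 27 in the enum above, enabling every module would need a 54-byte buffer.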

[net-next 01/15] ice: Rework flex descriptor programming

2018-08-28 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

The driver can support two flex descriptor profiles, ICE_RXDID_FLEX_NIC
and ICE_RXDID_FLEX_NIC_2. This patch reworks the current flex programming
logic to add support for the latter profile.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_common.c   | 102 ++
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h|  24 +++--
 2 files changed, 92 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 661beea6af79..53cbfd942d03 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -7,16 +7,16 @@
 
 #define ICE_PF_RESET_WAIT_COUNT    200
 
-#define ICE_NIC_FLX_ENTRY(hw, mdid, idx) \
-   wr32((hw), GLFLXP_RXDID_FLX_WRD_##idx(ICE_RXDID_FLEX_NIC), \
+#define ICE_PROG_FLEX_ENTRY(hw, rxdid, mdid, idx) \
+   wr32((hw), GLFLXP_RXDID_FLX_WRD_##idx(rxdid), \
 ((ICE_RX_OPC_MDID << \
   GLFLXP_RXDID_FLX_WRD_##idx##_RXDID_OPCODE_S) & \
  GLFLXP_RXDID_FLX_WRD_##idx##_RXDID_OPCODE_M) | \
 (((mdid) << GLFLXP_RXDID_FLX_WRD_##idx##_PROT_MDID_S) & \
  GLFLXP_RXDID_FLX_WRD_##idx##_PROT_MDID_M))
 
-#define ICE_NIC_FLX_FLG_ENTRY(hw, flg_0, flg_1, flg_2, flg_3, idx) \
-   wr32((hw), GLFLXP_RXDID_FLAGS(ICE_RXDID_FLEX_NIC, idx), \
+#define ICE_PROG_FLG_ENTRY(hw, rxdid, flg_0, flg_1, flg_2, flg_3, idx) \
+   wr32((hw), GLFLXP_RXDID_FLAGS(rxdid, idx), \
 (((flg_0) << GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_S) & \
  GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_M) | \
 (((flg_1) << GLFLXP_RXDID_FLAGS_FLEXIFLAG_4N_1_S) & \
@@ -290,30 +290,85 @@ ice_aq_get_link_info(struct ice_port_info *pi, bool ena_lse,
 }
 
 /**
- * ice_init_flex_parser - initialize rx flex parser
+ * ice_init_flex_flags
  * @hw: pointer to the hardware structure
+ * @prof_id: Rx Descriptor Builder profile ID
  *
- * Function to initialize flex descriptors
+ * Function to initialize Rx flex flags
  */
-static void ice_init_flex_parser(struct ice_hw *hw)
+static void ice_init_flex_flags(struct ice_hw *hw, enum ice_rxdid prof_id)
 {
u8 idx = 0;
 
-   ICE_NIC_FLX_ENTRY(hw, ICE_RX_MDID_HASH_LOW, 0);
-   ICE_NIC_FLX_ENTRY(hw, ICE_RX_MDID_HASH_HIGH, 1);
-   ICE_NIC_FLX_ENTRY(hw, ICE_RX_MDID_FLOW_ID_LOWER, 2);
-   ICE_NIC_FLX_ENTRY(hw, ICE_RX_MDID_FLOW_ID_HIGH, 3);
-   ICE_NIC_FLX_FLG_ENTRY(hw, ICE_RXFLG_PKT_FRG, ICE_RXFLG_UDP_GRE,
- ICE_RXFLG_PKT_DSI, ICE_RXFLG_FIN, idx++);
-   ICE_NIC_FLX_FLG_ENTRY(hw, ICE_RXFLG_SYN, ICE_RXFLG_RST,
- ICE_RXFLG_PKT_DSI, ICE_RXFLG_PKT_DSI, idx++);
-   ICE_NIC_FLX_FLG_ENTRY(hw, ICE_RXFLG_PKT_DSI, ICE_RXFLG_PKT_DSI,
- ICE_RXFLG_EVLAN_x8100, ICE_RXFLG_EVLAN_x9100,
- idx++);
-   ICE_NIC_FLX_FLG_ENTRY(hw, ICE_RXFLG_VLAN_x8100, ICE_RXFLG_TNL_VLAN,
- ICE_RXFLG_TNL_MAC, ICE_RXFLG_TNL0, idx++);
-   ICE_NIC_FLX_FLG_ENTRY(hw, ICE_RXFLG_TNL1, ICE_RXFLG_TNL2,
- ICE_RXFLG_PKT_DSI, ICE_RXFLG_PKT_DSI, idx);
+   /* Flex-flag fields (0-2) are programmed with FLG64 bits with layout:
+* flexiflags0[5:0] - TCP flags, is_packet_fragmented, is_packet_UDP_GRE
+* flexiflags1[3:0] - Not used for flag programming
+* flexiflags2[7:0] - Tunnel and VLAN types
+* 2 invalid fields in last index
+*/
+   switch (prof_id) {
+   /* Rx flex flags are currently programmed for the NIC profiles only.
+* Different flag bit programming configurations can be added per
+* profile as needed.
+*/
+   case ICE_RXDID_FLEX_NIC:
+   case ICE_RXDID_FLEX_NIC_2:
+   ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_PKT_FRG,
+  ICE_RXFLG_UDP_GRE, ICE_RXFLG_PKT_DSI,
+  ICE_RXFLG_FIN, idx++);
+   /* flex flag 1 is not used for flexi-flag programming, skipping
+* these four FLG64 bits.
+*/
+   ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_SYN, ICE_RXFLG_RST,
+  ICE_RXFLG_PKT_DSI, ICE_RXFLG_PKT_DSI, idx++);
+   ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_PKT_DSI,
+  ICE_RXFLG_PKT_DSI, ICE_RXFLG_EVLAN_x8100,
+  ICE_RXFLG_EVLAN_x9100, idx++);
+   ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_VLAN_x8100,
+  ICE_RXFLG_TNL_VLAN, ICE_RXFLG_TNL_MAC,
+  ICE_RXFLG_TNL0, idx++);
+   ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_TNL1, ICE_RXFLG_TNL2,
+  ICE_RXFLG_PKT_DSI, ICE_RXFLG_PKT_DSI, idx);
+   break;
+
+   d
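
Both reworked macros compute a register value the same way: shift each field into position and mask it so an out-of-range value cannot spill into a neighbouring field; the new rxdid argument only selects which register instance is written, so one macro serves ICE_RXDID_FLEX_NIC and ICE_RXDID_FLEX_NIC_2. A minimal sketch of the value computation, assuming hypothetical field positions (an 8-bit MDID at bit 0, a 2-bit opcode at bit 30, four 6-bit flag slots at bits 0/8/16/24) rather than the real GLFLXP_RXDID_* constants:

```c
#include <stdint.h>

/* Hypothetical field positions; the real _S/_M constants come from the
 * driver's register definitions and may differ. */
#define FLX_WRD_PROT_MDID_S	0
#define FLX_WRD_PROT_MDID_M	(0xFFu << FLX_WRD_PROT_MDID_S)
#define FLX_WRD_OPCODE_S	30
#define FLX_WRD_OPCODE_M	(0x3u << FLX_WRD_OPCODE_S)
#define FLEXIFLAG_M		0x3Fu

/* Same shape as ICE_PROG_FLEX_ENTRY's value argument. */
static uint32_t flex_word_val(uint32_t opcode, uint32_t mdid)
{
	return ((opcode << FLX_WRD_OPCODE_S) & FLX_WRD_OPCODE_M) |
	       ((mdid << FLX_WRD_PROT_MDID_S) & FLX_WRD_PROT_MDID_M);
}

/* Same shape as ICE_PROG_FLG_ENTRY: four flag indices packed per register. */
static uint32_t flex_flags_val(uint32_t f0, uint32_t f1, uint32_t f2,
			       uint32_t f3)
{
	return (f0 & FLEXIFLAG_M) |
	       ((f1 & FLEXIFLAG_M) << 8) |
	       ((f2 & FLEXIFLAG_M) << 16) |
	       ((f3 & FLEXIFLAG_M) << 24);
}
```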

Re: phys_port_id in switchdev mode?

2018-08-28 Thread Jakub Kicinski
Ugh, CC: netdev..

On Tue, 28 Aug 2018 20:05:39 +0200, Jakub Kicinski wrote:
> Hi!
> 
> I wonder if we can use phys_port_id in switchdev to group together
> interfaces of a single PCI PF?  Here is the problem:
> 
> With a mix of PF and VF interfaces it gets increasingly difficult to
> figure out which one corresponds to which PF.  We can identify which
> *representor* is which, by means of phys_port_name and devlink
> flavours.  But if the actual VF/PF interfaces are also present on the
> same host, it gets confusing when one tries to identify the PF they
> came from.  Generally one has to resort to matching between PCI DBDF of
> the PF and VFs or read relevant info out of ethtool -i.
> 
> In a multi-host scenario this is particularly painful, as there seems to
> be no immediately obvious way to match PCI interface ID of a card (0,
> 1, 2, 3, 4...) to the DBDF we have connected.
> 
> Another angle to this is legacy SR-IOV NDOs.  User space picks a netdev
> from /sys/bus/pci/$VF_DBDF/physfn/net/ to run the NDOs on in a somewhat
> random manner, which means we have to provide those for all devices with
> link to the PF (all reprs).  And we have to link them (a) because it's
> right (tm) and (b) to get correct naming.  The only reliable way to make
> user space (libvirt) choose the repr it should run the NDOs on (which is
> IMHO the corresponding PF repr) is to set phys_port_id on actual VFs,
> VF reprs, PFs and PF reprs to a value corresponding to the *PCI PF*,
> not the external/Ethernet port when in switchdev mode.  User space
> should understand phys_port_id in this context, given it was originally
> introduced for matching VFs to ports.
> 
> I hope this explanation makes sense, and is correct.  Please point out
> errors in my understanding, any comments would be appreciated! :)
> 
> Jiri?  Or?  Others?



[PATCH net-next 1/2] ip: fail fast on IP defrag errors

2018-08-28 Thread Peter Oskolkov
The current behavior of IP defragmentation is inconsistent:
- some overlapping/wrong length fragments are dropped without
  affecting the queue;
- most overlapping fragments cause the whole frag queue to be dropped.

This patch brings consistency: if a bad fragment is detected,
the whole frag queue is dropped. Two major benefits:
- fail fast: corrupted frag queues are cleared immediately, instead of
  by timeout;
- testing of overlapping fragments is now much easier: any kind of
  random fragment length mutation now leads to the frag queue being
  discarded (IP packet dropped); before this patch, some overlaps were
  "corrected", with tests not seeing expected packet drops.

Note that in one case (see "if (end&7)" conditional) the current
behavior is preserved as there are concerns that this could be
legitimate padding.

Signed-off-by: Peter Oskolkov 
Reviewed-by: Eric Dumazet 
Reviewed-by: Willem de Bruijn 
---
 net/ipv4/ip_fragment.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 88281fbce88c..330f62353b11 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -382,7 +382,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 */
if (end < qp->q.len ||
((qp->q.flags & INET_FRAG_LAST_IN) && end != qp->q.len))
-   goto err;
+   goto discard_qp;
qp->q.flags |= INET_FRAG_LAST_IN;
qp->q.len = end;
} else {
@@ -394,20 +394,20 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
if (end > qp->q.len) {
/* Some bits beyond end -> corruption. */
if (qp->q.flags & INET_FRAG_LAST_IN)
-   goto err;
+   goto discard_qp;
qp->q.len = end;
}
}
if (end == offset)
-   goto err;
+   goto discard_qp;
 
err = -ENOMEM;
if (!pskb_pull(skb, skb_network_offset(skb) + ihl))
-   goto err;
+   goto discard_qp;
 
err = pskb_trim_rcsum(skb, end - offset);
if (err)
-   goto err;
+   goto discard_qp;
 
/* Note : skb->rbnode and skb->dev share the same location. */
dev = skb->dev;
@@ -423,6 +423,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 * We do the same here for IPv4 (and increment an snmp counter).
 */
 
+   err = -EINVAL;
/* Find out where to put this fragment.  */
prev_tail = qp->q.fragments_tail;
if (!prev_tail)
@@ -431,7 +432,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
/* This is the common case: skb goes to the end. */
/* Detect and discard overlaps. */
if (offset < prev_tail->ip_defrag_offset + prev_tail->len)
-   goto discard_qp;
+   goto overlap;
if (offset == prev_tail->ip_defrag_offset + prev_tail->len)
ip4_frag_append_to_last_run(&qp->q, skb);
else
@@ -450,7 +451,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
FRAG_CB(skb1)->frag_run_len)
rbn = &parent->rb_right;
else /* Found an overlap with skb1. */
-   goto discard_qp;
+   goto overlap;
} while (*rbn);
/* Here we have parent properly set, and rbn pointing to
 * one of its NULL left/right children. Insert skb.
@@ -487,16 +488,18 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
skb->_skb_refdst = 0UL;
err = ip_frag_reasm(qp, skb, prev_tail, dev);
skb->_skb_refdst = orefdst;
+   if (err)
+   inet_frag_kill(&qp->q);
return err;
}
 
skb_dst_drop(skb);
return -EINPROGRESS;
 
+overlap:
+   __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
 discard_qp:
inet_frag_kill(&qp->q);
-   err = -EINVAL;
-   __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
 err:
kfree_skb(skb);
return err;
-- 
2.19.0.rc0.228.g281dcd1b4d0-goog
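
The fail-fast rule can be illustrated with a deliberately simplified userspace model: a queue of byte ranges where any zero-length or overlapping fragment kills the whole queue, and only the overlap path bumps the overlap counter (mirroring the new overlap: label falling through to discard_qp). This is not the kernel's rbtree-based ip_frag_queue(); struct frag_queue, frag_queue_add() and MAX_FRAGS are hypothetical, and the "if (end&7)" padding case is not modeled:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAGS 16	/* capacity of this toy model only */

struct frag_queue {
	int off[MAX_FRAGS];	/* fragment start, in payload bytes */
	int end[MAX_FRAGS];	/* one past the fragment's last byte */
	size_t n;		/* fragments currently queued */
	bool killed;		/* like inet_frag_kill() having run */
	unsigned long overlaps;	/* like IPSTATS_MIB_REASM_OVERLAPS */
};

/* Returns 0 if the fragment was queued, -1 if it (and the queue) was dropped. */
static int frag_queue_add(struct frag_queue *q, int offset, int end)
{
	size_t i;

	if (q->killed)
		return -1;
	if (end == offset)
		goto discard_qp;	/* zero-length fragment */
	if (q->n == MAX_FRAGS)
		return -1;		/* model limit, not kernel behaviour */

	for (i = 0; i < q->n; i++)
		if (offset < q->end[i] && q->off[i] < end)
			goto overlap;

	q->off[q->n] = offset;
	q->end[q->n] = end;
	q->n++;
	return 0;

overlap:
	q->overlaps++;
discard_qp:
	q->killed = true;
	q->n = 0;		/* every queued fragment is dropped */
	return -1;
}
```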



[PATCH net-next 2/2] selftests/net: add ip_defrag selftest

2018-08-28 Thread Peter Oskolkov
This test creates a raw IPv4 socket, fragments a largish UDP
datagram and sends the fragments out of order.

It then repeats this in a loop with different message and fragment
lengths.

Finally, it does the same with overlapping fragments; with overlapping
fragments the expectation is that recv() times out.

Tested:

root@# time ./ip_defrag.sh
ipv4 defrag
PASS
ipv4 defrag with overlaps
PASS

real    1m7.679s
user    0m0.628s
sys     0m2.242s

A similar test for IPv6 is to follow.
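
The test's own fragmentation code is cut off in this excerpt, so the following is only a sketch of the standard RFC 791 arithmetic it has to perform: every non-final fragment carries a multiple of 8 bytes of IP payload (which, for UDP, includes the 8-byte UDP header), the offset field counts 8-byte units, and MF is set on all but the last fragment. The helper names are hypothetical:

```c
#include <stdint.h>

#define IP4_MF (1u << 13)	/* same value as the selftest's IP4_MF */

/* Host-order frag_off value for a fragment starting at byte `offset`
 * of the IP payload (offset must be a multiple of 8). */
static uint16_t ip4_frag_off(int offset, int is_last)
{
	return (uint16_t)((unsigned int)(offset >> 3) | (is_last ? 0u : IP4_MF));
}

/* Fragments needed when each non-final one carries max_frag_len rounded
 * down to a multiple of 8 (assumes max_frag_len >= 8). */
static int ip4_num_frags(int payload_len, int max_frag_len)
{
	int step = max_frag_len & ~7;

	return (payload_len + step - 1) / step;
}

/* Payload length of fragment k; only the last one may be shorter. */
static int ip4_frag_len(int payload_len, int max_frag_len, int k)
{
	int step = max_frag_len & ~7;
	int left = payload_len - k * step;

	return left < step ? left : step;
}
```

For example, 1000 bytes of UDP payload plus the 8-byte header (1008 bytes) with max_frag_len = 500 gives three fragments of 496, 496 and 16 bytes.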

Signed-off-by: Peter Oskolkov 
Reviewed-by: Willem de Bruijn 
---
 tools/testing/selftests/net/.gitignore   |   2 +
 tools/testing/selftests/net/Makefile |   4 +-
 tools/testing/selftests/net/ip_defrag.c  | 313 +++
 tools/testing/selftests/net/ip_defrag.sh |  29 +++
 4 files changed, 346 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/net/ip_defrag.c
 create mode 100755 tools/testing/selftests/net/ip_defrag.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 78b24cf76f40..2836e0cf2d81 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -14,3 +14,5 @@ udpgso_bench_rx
 udpgso_bench_tx
 tcp_inq
 tls
+ip_defrag
+
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 9cca68e440a0..cccdb2295567 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -5,13 +5,13 @@ CFLAGS =  -Wall -Wl,--no-as-needed -O2 -g
 CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
-TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh
+TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
 TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd
-TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
+TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx ip_defrag
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
 
diff --git a/tools/testing/selftests/net/ip_defrag.c b/tools/testing/selftests/net/ip_defrag.c
new file mode 100644
index 0000000..55fdcdc78eef
--- /dev/null
+++ b/tools/testing/selftests/net/ip_defrag.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static bool            cfg_do_ipv4;
+static bool            cfg_do_ipv6;
+static bool            cfg_verbose;
+static bool            cfg_overlap;
+static unsigned short  cfg_port = 9000;
+
+const struct in_addr addr4 = { .s_addr = __constant_htonl(INADDR_LOOPBACK + 2) };
+
+#define IP4_HLEN   (sizeof(struct iphdr))
+#define IP6_HLEN   (sizeof(struct ip6_hdr))
+#define UDP_HLEN   (sizeof(struct udphdr))
+
+static int msg_len;
+static int max_frag_len;
+
+#define MSG_LEN_MAX6   /* Max UDP payload length. */
+
+#define IP4_MF (1u << 13)  /* IPv4 MF flag. */
+
+static uint8_t udp_payload[MSG_LEN_MAX];
+static uint8_t ip_frame[IP_MAXPACKET];
+static uint16_t ip_id = 0xabcd;
+static int msg_counter;
+static int frag_counter;
+static unsigned int seed;
+
+/* Receive a UDP packet. Validate it matches udp_payload. */
+static void recv_validate_udp(int fd_udp)
+{
+   ssize_t ret;
+   static uint8_t recv_buff[MSG_LEN_MAX];
+
+   ret = recv(fd_udp, recv_buff, msg_len, 0);
+   msg_counter++;
+
+   if (cfg_overlap) {
+   if (ret != -1)
+   error(1, 0, "recv: expected timeout; got %d; seed = %u",
+   (int)ret, seed);
+   if (errno != ETIMEDOUT && errno != EAGAIN)
+   error(1, errno, "recv: expected timeout: %d; seed = %u",
+errno, seed);
+   return;  /* OK */
+   }
+
+   if (ret == -1)
+   error(1, errno, "recv: msg_len = %d max_frag_len = %d",
+   msg_len, max_frag_len);
+   if (ret != msg_len)
+   error(1, 0, "recv: wrong size: %d vs %d", (int)ret, msg_len);
+   if (memcmp(udp_payload, recv_buff, msg_len))
+   error(1, 0, "recv: wrong data");
+}
+
+static uint32_t raw_checksum(uint8_t *buf, int len, uint32_t sum)
+{
+   int i;
+
+   for (i = 0; i < (len & ~1U); i += 2) {
+   sum += (u_int16_t)ntohs(*((u_int16_t *)(buf + i)));
+   if (sum > 0xffff)
+   sum -= 0xffff;
+   }
+
+   if (i < len) {
+   sum += buf[i] << 8;
+   if (sum > 0xffff)
+   sum -= 0xffff;
+   }
+
+   return sum
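
raw_checksum() accumulates a one's-complement sum of big-endian 16-bit words, folding any carry out of bit 15 back in by subtracting 0xffff, and pads an odd trailing byte with zero. A standalone sketch of the same RFC 1071 technique plus the final complement step (the selftest's own finishing code is cut off here, so ones_sum() and inet_csum() are illustrative names, not the test's):

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words with end-around carry. */
static uint32_t ones_sum(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
		if (sum > 0xffff)
			sum -= 0xffff;
	}
	if (i < len) {			/* odd trailing byte, zero-padded */
		sum += (uint32_t)buf[i] << 8;
		if (sum > 0xffff)
			sum -= 0xffff;
	}
	return sum;
}

/* Internet checksum: complement of the folded sum. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	return (uint16_t)~ones_sum(buf, len, 0);
}
```

Running inet_csum() over a header that already contains a correct checksum yields 0, which is how a receiver verifies it.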

[PATCH net-next] liquidio: fix race condition in instruction completion processing

2018-08-28 Thread Felix Manlunas
From: Rick Farrington 

In lio_enable_irq, the pkt_in_done count register was being cleared to
zero.  However, there could be some completed instructions which were not
yet processed due to budget and limit constraints.
So, only write this register with the number of actual completions
that were processed.

Signed-off-by: Rick Farrington 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/octeon_device.c   | 5 +++--
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   | 2 ++
 drivers/net/ethernet/cavium/liquidio/request_manager.c | 2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index f878a55..d0ed6c4 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -1450,8 +1450,9 @@ void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq)
}
if (iq) {
spin_lock_bh(&iq->lock);
-   writel(iq->pkt_in_done, iq->inst_cnt_reg);
-   iq->pkt_in_done = 0;
+   writel(iq->pkts_processed, iq->inst_cnt_reg);
+   iq->pkt_in_done -= iq->pkts_processed;
+   iq->pkts_processed = 0;
/* this write needs to be flushed before we release the lock */
mmiowb();
spin_unlock_bh(&iq->lock);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_iq.h b/drivers/net/ethernet/cavium/liquidio/octeon_iq.h
index 2327062..aecd0d3 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_iq.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_iq.h
@@ -94,6 +94,8 @@ struct octeon_instr_queue {
 
u32 pkt_in_done;
 
+   u32 pkts_processed;
+
/** A spinlock to protect access to the input ring.*/
spinlock_t iq_flush_running_lock;
 
diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index 8f746e1..f943aa7 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -123,6 +123,7 @@ int octeon_init_instr_queue(struct octeon_device *oct,
iq->do_auto_flush = 1;
iq->db_timeout = (u32)conf->db_timeout;
atomic_set(&iq->instr_pending, 0);
+   iq->pkts_processed = 0;
 
/* Initialize the spinlock for this instruction queue */
spin_lock_init(&iq->lock);
@@ -495,6 +496,7 @@ static inline void __copy_cmd_into_iq(struct octeon_instr_queue *iq,
lio_process_iq_request_list(oct, iq, 0);
 
if (inst_processed) {
+   iq->pkts_processed += inst_processed;
atomic_sub(inst_processed, &iq->instr_pending);
iq->stats.instr_processed += inst_processed;
}
-- 
1.8.3.1
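
The race is easiest to see as counter bookkeeping. A toy model, assuming (as the old code's "write pkt_in_done, then zero it" pattern suggests) that writing N to the instruction-count register subtracts N from it; struct iq_model and the helpers are hypothetical and ignore locking and the mmiowb() flush:

```c
#include <stdint.h>

struct iq_model {
	uint32_t hw_cnt;		/* value of the instruction-count register */
	uint32_t pkt_in_done;		/* completions read from hardware */
	uint32_t pkts_processed;	/* completions actually handled */
};

/* Assumption: writing N to the count register subtracts N from it. */
static void iq_ack(struct iq_model *iq, uint32_t n)
{
	iq->hw_cnt -= n;
}

/* Pre-patch: acknowledge everything observed, processed or not. */
static void enable_irq_old(struct iq_model *iq)
{
	iq_ack(iq, iq->pkt_in_done);
	iq->pkt_in_done = 0;
}

/* Patched: acknowledge only what was processed; the remainder stays
 * pending in both the register and pkt_in_done. */
static void enable_irq_new(struct iq_model *iq)
{
	iq_ack(iq, iq->pkts_processed);
	iq->pkt_in_done -= iq->pkts_processed;
	iq->pkts_processed = 0;
}
```

With 10 completions observed but only 6 processed (budget exhausted), the old path zeroes both counters and the remaining 4 are never handled, while the new path leaves 4 pending.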



[PATCH net-next] liquidio: remove unnecessary delay when processing IQ responses

2018-08-28 Thread Felix Manlunas
From: Rick Farrington 

Signed-off-by: Rick Farrington 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/request_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index 8f746e1..0a06fbb 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -459,7 +459,7 @@ static inline void __copy_cmd_into_iq(struct octeon_instr_queue *iq,
 
if (atomic_read(&oct->response_list
[OCTEON_ORDERED_SC_LIST].pending_req_count))
-   queue_delayed_work(cwq->wq, &cwq->wk.work, msecs_to_jiffies(1));
+   queue_work(cwq->wq, &cwq->wk.work.work);
 
return inst_count;
 }
-- 
1.8.3.1



Re: [PATCH bpf-next 01/11] xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPY

2018-08-28 Thread Björn Töpel
On Tue, 28 Aug 2018 at 16:11, Jesper Dangaard Brouer wrote:
>
> On Tue, 28 Aug 2018 14:44:25 +0200
> Björn Töpel  wrote:
>
> > From: Björn Töpel 
> >
> > This commit adds proper MEM_TYPE_ZERO_COPY support for
> > convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an
> > xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a
> > MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might
> > make sense to implement a more sophisticated thread-safe alloc/free
> > scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is
> > required in the fast-path.
>
> This is going to be slow. Especially the dev_alloc_page() call, which
> for small frames is likely going to be slower than the data copy.
> I guess this is a good first step, but I do hope we will circle back and
> optimize this later.  (It would also be quite easy to use
> MEM_TYPE_PAGE_POOL instead to get page recycling in devmap redirect case).
>

Yes, slow. :-( Still, I think this is a good starting point; a page
pool can then be introduced in a later, performance-oriented series to
make XDP faster for the AF_XDP scenario.

But I'm definitely on your side here; this needs to be addressed, but
not now IMO.


And thanks for spending time on the series!
Björn

> I would have liked the MEM_TYPE_ZERO_COPY frame to travel one level
> deeper into the redirect-core code, allowing devmap to send these
> frames without a copy, and allowing cpumap to do the dev_alloc_page() call
> (+copy) on the remote CPU.
>
>
> > Signed-off-by: Björn Töpel 
> > ---
> >  include/net/xdp.h |  5 +++--
> >  net/core/xdp.c| 39 +++
> >  2 files changed, 42 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 76b95256c266..0d5c6fb4b2e2 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -91,6 +91,8 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
> >   frame->dev_rx = NULL;
> >  }
> >
> > +struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
> > +
> >  /* Convert xdp_buff to xdp_frame */
> >  static inline
> >  struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
> > @@ -99,9 +101,8 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
> >   int metasize;
> >   int headroom;
> >
> > - /* TODO: implement clone, copy, use "native" MEM_TYPE */
> >   if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
> > - return NULL;
> > + return xdp_convert_zc_to_xdp_frame(xdp);
> >
> >   /* Assure headroom is available for storing info */
> >   headroom = xdp->data - xdp->data_hard_start;
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 89b6785cef2a..be6cb2f0e722 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -398,3 +398,42 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> >   info->flags = bpf->flags;
> >  }
> >  EXPORT_SYMBOL_GPL(xdp_attachment_setup);
> > +
> > +struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
> > +{
> > + unsigned int metasize, headroom, totsize;
> > + void *addr, *data_to_copy;
> > + struct xdp_frame *xdpf;
> > + struct page *page;
> > +
> > + /* Clone into a MEM_TYPE_PAGE_ORDER0 xdp_frame. */
> > + metasize = xdp_data_meta_unsupported(xdp) ? 0 :
> > +xdp->data - xdp->data_meta;
> > + headroom = xdp->data - xdp->data_hard_start;
> > + totsize = xdp->data_end - xdp->data + metasize;
> > +
> > + if (sizeof(*xdpf) + totsize > PAGE_SIZE)
> > + return NULL;
> > +
> > + page = dev_alloc_page();
> > + if (!page)
> > + return NULL;
> > +
> > + addr = page_to_virt(page);
> > + xdpf = addr;
> > + memset(xdpf, 0, sizeof(*xdpf));
> > +
> > + addr += sizeof(*xdpf);
> > + data_to_copy = metasize ? xdp->data_meta : xdp->data;
> > + memcpy(addr, data_to_copy, totsize);
> > +
> > + xdpf->data = addr + metasize;
> > + xdpf->len = totsize - metasize;
> > + xdpf->headroom = 0;
> > + xdpf->metasize = metasize;
> > + xdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
> > +
> > + xdp_return_buff(xdp);
> > + return xdpf;
> > +}
> > +EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame);
>
>
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer


Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Andreas Schwab
On Aug 28 2018, Ard Biesheuvel  wrote:

> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 6a501b25dd85..57d09d5ceb1a 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -779,7 +779,6 @@ EXPORT_SYMBOL(__per_cpu_offset);
>
>  void __init setup_per_cpu_areas(void)
>  {
> -   const size_t dyn_size = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE;
> size_t atom_size;
> unsigned long delta;
> unsigned int cpu;
> @@ -795,7 +794,9 @@ void __init setup_per_cpu_areas(void)
> else
> atom_size = 1 << 20;
>
> -   rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
> +   rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
> +   PERCPU_DYNAMIC_RESERVE,
> +   atom_size, pcpu_cpu_distance,
> pcpu_fc_alloc, pcpu_fc_free);
> if (rc < 0)
> panic("cannot initialize percpu area (err=%d)", rc);

That didn't help.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [Intel-wired-lan] [PATCH] i40e: report correct statistics when XDP is enabled

2018-08-28 Thread Björn Töpel

On 2018-08-28 19:00, Paul Menzel wrote:

Dear Björn,


On 08/24/18 16:00, Jesper Dangaard Brouer wrote:

On Fri, 24 Aug 2018 13:21:59 +0200
Björn Töpel  wrote:


When XDP is enabled, the driver will report incorrect
statistics. Received frames will be reported as transmitted frames.

This commits fixes the i40e implementation of ndo_get_stats64 (struct


Should you send a v2, then please use singular for *commit*:

This commit ….


net_device_ops), so that iproute2 will report correct statistics
(e.g. when running "ip -stats link show dev eth0") even when XDP is
enabled.


In the future, it would be great if you could describe your fix in the
commit message too. For example, why the if statement needs to move up.



Thanks for the review, Paul. I'll address your comments if we end up
with a v2.


Björn


Reported-by: Jesper Dangaard Brouer 
Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action")


Stable candidate:
  $ git describe --contains 74608d17fe29
  v4.13-rc1~157^2~128^2~13


Signed-off-by: Björn Töpel 


It works for me:

Tested-by: Jesper Dangaard Brouer 

I'm explicitly _not_ ACK'ing the patch, as I think your code changes
below make it harder to follow whether a TX or RX ring is getting
updated. But it is 100% up to the driver maintainers to say if this is
acceptable from a maintenance PoV.


---
  drivers/net/ethernet/intel/i40e/i40e_main.c | 24 +++--
  1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e40c023cc7b6..7c122dd3faa1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -425,9 +425,9 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
  struct rtnl_link_stats64 *stats)
  {
struct i40e_netdev_priv *np = netdev_priv(netdev);
-   struct i40e_ring *tx_ring, *rx_ring;
struct i40e_vsi *vsi = np->vsi;
struct rtnl_link_stats64 *vsi_stats = i40e_get_vsi_stats_struct(vsi);
+   struct i40e_ring *ring;
int i;
  
  	if (test_bit(__I40E_VSI_DOWN, vsi->state))
@@ -441,24 +441,26 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
u64 bytes, packets;
unsigned int start;
  
-   tx_ring = READ_ONCE(vsi->tx_rings[i]);
-   if (!tx_ring)
+   ring = READ_ONCE(vsi->tx_rings[i]);
+   if (!ring)
continue;
-   i40e_get_netdev_stats_struct_tx(tx_ring, stats);
+   i40e_get_netdev_stats_struct_tx(ring, stats);
  
-   rx_ring = &tx_ring[1];
+   if (i40e_enabled_xdp_vsi(vsi)) {
+   ring++;
+   i40e_get_netdev_stats_struct_tx(ring, stats);
+   }
  
+   ring++;
do {
-   start = u64_stats_fetch_begin_irq(&rx_ring->syncp);
-   packets = rx_ring->stats.packets;
-   bytes   = rx_ring->stats.bytes;
-   } while (u64_stats_fetch_retry_irq(&rx_ring->syncp, start));
+   start   = u64_stats_fetch_begin_irq(&ring->syncp);
+   packets = ring->stats.packets;
+   bytes   = ring->stats.bytes;
+   } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
  
stats->rx_packets += packets;
stats->rx_bytes   += bytes;
  
-   if (i40e_enabled_xdp_vsi(vsi))
-   i40e_get_netdev_stats_struct_tx(&rx_ring[1], stats);
}
rcu_read_unlock();
  



Kind regards,

Paul
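
The bug and the fix both follow from how the rings of one queue pair sit in memory. The patch's ring++ walk implies a contiguous layout of Tx ring, then the XDP Tx ring (only when XDP is enabled), then the Rx ring; the old code assumed Rx was always at tx_ring[1]. A toy model of both walks under that assumed layout, with hypothetical struct names and the u64_stats retry loop omitted:

```c
#include <stdbool.h>
#include <stdint.h>

struct ring_model {
	uint64_t packets;
	uint64_t bytes;
};

struct stats_model {
	uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
};

/* Old walk: assumes Rx is always at tx_ring[1]. */
static void sum_queue_pair_old(const struct ring_model *tx_ring, bool xdp,
			       struct stats_model *s)
{
	const struct ring_model *rx_ring = &tx_ring[1];

	s->tx_packets += tx_ring->packets;
	s->tx_bytes += tx_ring->bytes;
	s->rx_packets += rx_ring->packets;	/* XDP Tx ring when XDP is on */
	s->rx_bytes += rx_ring->bytes;
	if (xdp) {
		s->tx_packets += rx_ring[1].packets;	/* really the Rx ring */
		s->tx_bytes += rx_ring[1].bytes;
	}
}

/* Patched walk: Tx, optional XDP Tx, then Rx. */
static void sum_queue_pair(const struct ring_model *ring, bool xdp,
			   struct stats_model *s)
{
	s->tx_packets += ring->packets;
	s->tx_bytes += ring->bytes;
	if (xdp) {
		ring++;				/* XDP Tx ring counts as transmit */
		s->tx_packets += ring->packets;
		s->tx_bytes += ring->bytes;
	}
	ring++;					/* Rx ring */
	s->rx_packets += ring->packets;
	s->rx_bytes += ring->bytes;
}
```

With XDP on, the old walk reports the Rx ring's packets as transmitted, which matches the symptom described in the commit message.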



Re: [Intel-wired-lan] [PATCH] i40e: report correct statistics when XDP is enabled

2018-08-28 Thread Paul Menzel
Dear Björn,


On 08/24/18 16:00, Jesper Dangaard Brouer wrote:
> On Fri, 24 Aug 2018 13:21:59 +0200
> Björn Töpel  wrote:
> 
>> When XDP is enabled, the driver will report incorrect
>> statistics. Received frames will be reported as transmitted frames.
>>
>> This commits fixes the i40e implementation of ndo_get_stats64 (struct

Should you send a v2, then please use singular for *commit*:

This commit ….

>> net_device_ops), so that iproute2 will report correct statistics
>> (e.g. when running "ip -stats link show dev eth0") even when XDP is
>> enabled.

In the future, it’d be great if you could describe your fix in the
commit message too; for example, why the if statement needs to move up.
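To make the ring walk concrete: the patch's `ring++` steps imply that each queue pair's rings sit next to each other as TX, then XDP TX (only when XDP is enabled), then RX, whereas the old code assumed TX, RX, XDP TX. The userspace sketch below models only that walk; `struct demo_ring`, `struct demo_totals`, `demo_get_stats()` and the layout itself are assumptions for illustration, not i40e code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i40e_ring: only a packet counter. */
struct demo_ring {
	unsigned long long packets;
};

struct demo_totals {
	unsigned long long tx_packets;
	unsigned long long rx_packets;
};

/* Walks each queue pair assuming its rings are contiguous in the order
 * TX, XDP TX (only when XDP is enabled), RX, as the patch's ring++ steps
 * imply. NULL entries are skipped, like the READ_ONCE()/continue check. */
static void demo_get_stats(const struct demo_ring *const *tx_rings, size_t n,
			   bool xdp_enabled, struct demo_totals *out)
{
	size_t i;

	out->tx_packets = 0;
	out->rx_packets = 0;
	for (i = 0; i < n; i++) {
		const struct demo_ring *ring = tx_rings[i];

		if (!ring)
			continue;
		out->tx_packets += ring->packets;	/* regular TX ring */
		if (xdp_enabled) {
			ring++;
			out->tx_packets += ring->packets;	/* XDP TX ring */
		}
		ring++;
		out->rx_packets += ring->packets;	/* RX ring comes last */
	}
}
```

Under the old TX, RX, XDP TX assumption, the real RX ring in the third slot would have been read as an XDP TX ring, so received frames ended up in the transmit counters, which matches the symptom in the commit message.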

>> Reported-by: Jesper Dangaard Brouer 
>> Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action")
> 
> Stable candidate:
>  $ git describe --contains 74608d17fe29
>  v4.13-rc1~157^2~128^2~13
> 
>> Signed-off-by: Björn Töpel 
> 
> It works for me:
> 
> Tested-by: Jesper Dangaard Brouer 
> 
> I'm explicitly _not_ ACK'ing the patch, as I think your code changes
> below make it harder to follow whether a TX or RX ring is getting
> updated. But it is 100% up to the driver maintainers to say if this is
> acceptable from a maintenance PoV.
> 
>> ---
>>  drivers/net/ethernet/intel/i40e/i40e_main.c | 24 +++++++++++++-----------
>>  1 file changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> index e40c023cc7b6..7c122dd3faa1 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> @@ -425,9 +425,9 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
>>  					 struct rtnl_link_stats64 *stats)
>>  {
>>  	struct i40e_netdev_priv *np = netdev_priv(netdev);
>> -	struct i40e_ring *tx_ring, *rx_ring;
>>  	struct i40e_vsi *vsi = np->vsi;
>>  	struct rtnl_link_stats64 *vsi_stats = i40e_get_vsi_stats_struct(vsi);
>> +	struct i40e_ring *ring;
>>  	int i;
>>  
>>  	if (test_bit(__I40E_VSI_DOWN, vsi->state))
>> @@ -441,24 +441,26 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
>>  		u64 bytes, packets;
>>  		unsigned int start;
>>  
>> -		tx_ring = READ_ONCE(vsi->tx_rings[i]);
>> -		if (!tx_ring)
>> +		ring = READ_ONCE(vsi->tx_rings[i]);
>> +		if (!ring)
>>  			continue;
>> -		i40e_get_netdev_stats_struct_tx(tx_ring, stats);
>> +		i40e_get_netdev_stats_struct_tx(ring, stats);
>>  
>> -		rx_ring = &tx_ring[1];
>> +		if (i40e_enabled_xdp_vsi(vsi)) {
>> +			ring++;
>> +			i40e_get_netdev_stats_struct_tx(ring, stats);
>> +		}
>>  
>> +		ring++;
>>  		do {
>> -			start = u64_stats_fetch_begin_irq(&rx_ring->syncp);
>> -			packets = rx_ring->stats.packets;
>> -			bytes   = rx_ring->stats.bytes;
>> -		} while (u64_stats_fetch_retry_irq(&rx_ring->syncp, start));
>> +			start   = u64_stats_fetch_begin_irq(&ring->syncp);
>> +			packets = ring->stats.packets;
>> +			bytes   = ring->stats.bytes;
>> +		} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
>>  
>>  		stats->rx_packets += packets;
>>  		stats->rx_bytes   += bytes;
>>  
>> -		if (i40e_enabled_xdp_vsi(vsi))
>> -			i40e_get_netdev_stats_struct_tx(&rx_ring[1], stats);
>>  	}
>>  	rcu_read_unlock();
>>  


Kind regards,

Paul





Re: [PATCH net 1/2] net_sched: reject unknown tcfa_action values

2018-08-28 Thread Cong Wang
On Tue, Aug 28, 2018 at 7:25 AM Paolo Abeni  wrote:
>
> +int tcf_action_destroy_one(struct tc_action *a, int bind)
> +{
> +   struct tc_action *actions[] = { a, NULL };
> +
> +   return tcf_action_destroy(actions, bind);
> +}

Make it static.


> +
>  static int tcf_action_put(struct tc_action *p)
>  {
> return __tcf_action_put(p, false);
> @@ -881,17 +888,16 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN)) {
>  		err = tcf_action_goto_chain_init(a, tp);
>  		if (err) {
> -			struct tc_action *actions[] = { a, NULL };
> -
> -			tcf_action_destroy(actions, bind);
>  			NL_SET_ERR_MSG(extack, "Failed to init TC action chain");
> +			tcf_action_destroy_one(a, bind);
>  			return ERR_PTR(err);
>  		}
>  	}
>
>  	if (!tcf_action_valid(a->tcfa_action)) {
>  		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");


You need to adjust this extack too.



> -   a->tcfa_action = TC_ACT_UNSPEC;
> +   tcf_action_destroy_one(a, bind);
> +   return ERR_PTR(-EINVAL);
> }

Thanks.
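The "make it static" suggestion fits because the helper only exists to spare callers in act_api.c from building the `{ a, NULL }` array themselves. A userspace model of that wrapper, assuming (as the NULL terminator suggests) that tcf_action_destroy() stops walking at the first NULL slot; `struct demo_action`, `DEMO_MAX_ACTIONS` and both functions are hypothetical, and the count returned here stands in for the kernel's error code.

```c
#include <stddef.h>

/* Hypothetical bound on the walk, standing in for the kernel's array limit. */
#define DEMO_MAX_ACTIONS 32

struct demo_action {
	int destroyed;
};

/* Destroys every action up to the first NULL slot; returns how many. */
static int demo_destroy(struct demo_action *actions[])
{
	int i;

	for (i = 0; i < DEMO_MAX_ACTIONS && actions[i]; i++)
		actions[i]->destroyed = 1;
	return i;
}

/* Single-action wrapper: a two-slot array on the stack whose NULL
 * terminator stops the walk after one element. File-local, hence static. */
static int demo_destroy_one(struct demo_action *a)
{
	struct demo_action *actions[] = { a, NULL };

	return demo_destroy(actions);
}
```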


[PATCH net-next] net: thunderbolt: Convert to use SPDX identifier

2018-08-28 Thread Mika Westerberg
This gets rid of the licence boilerplate in favor of an SPDX identifier,
which takes only a single-line comment.

Signed-off-by: Mika Westerberg 
---
 drivers/net/thunderbolt.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/thunderbolt.c b/drivers/net/thunderbolt.c
index e0d6760f3219..c48c3a1eb1f8 100644
--- a/drivers/net/thunderbolt.c
+++ b/drivers/net/thunderbolt.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * Networking over Thunderbolt cable using Apple ThunderboltIP protocol
  *
@@ -5,10 +6,6 @@
  * Authors: Amir Levy 
  *  Michael Jamet 
  *  Mika Westerberg 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
  */
 
 #include 
-- 
2.18.0



Re: [PATCH net] net/sched: act_pedit: fix dump of extended layered op

2018-08-28 Thread Cong Wang
On Mon, Aug 27, 2018 at 1:56 PM Davide Caratti  wrote:
>
> in the (rare) case of failure in nla_nest_start(), missing NULL checks in
> tcf_pedit_key_ex_dump() can make the following command
>
>  # tc action add action pedit ex munge ip ttl set 64
>
> dereference a NULL pointer:
>
>  BUG: unable to handle kernel NULL pointer dereference at 
>  PGD 80007d1cd067 P4D 80007d1cd067 PUD 7acd3067 PMD 0
>  Oops: 0002 [#1] SMP PTI
>  CPU: 0 PID: 3336 Comm: tc Tainted: GE 4.18.0.pedit+ #425
>  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  RIP: 0010:tcf_pedit_dump+0x19d/0x358 [act_pedit]
>  Code: be 02 00 00 00 48 89 df 66 89 44 24 20 e8 9b b1 fd e0 85 c0 75 46 8b 83 c8 00 00 00 49 83 c5 08 48 03 83 d0 00 00 00 4d 39 f5 <66> 89 04 25 00 00 00 00 0f 84 81 01 00 00 41 8b 45 00 48 8d 4c 24
>  RSP: 0018:b5d4004478a8 EFLAGS: 00010246
>  RAX: 8880fcda2070 RBX: 8880fadd2900 RCX: 
>  RDX: 0002 RSI: b5d4004478ca RDI: 8880fcda206e
>  RBP: 8880fb9cb900 R08: 0008 R09: 8880fcda206e
>  R10: 8880fadd2900 R11:  R12: 8880fd26cf40
>  R13: 8880fc957430 R14: 8880fc957430 R15: 8880fb9cb988
>  FS:  7f75a537a740() GS:8880fda0() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2:  CR3: 7a2fa005 CR4: 001606f0
>  Call Trace:
>   ? __nla_reserve+0x38/0x50
>   tcf_action_dump_1+0xd2/0x130
>   tcf_action_dump+0x6a/0xf0
>   tca_get_fill.constprop.31+0xa3/0x120
>   tcf_action_add+0xd1/0x170
>   tc_ctl_action+0x137/0x150
>   rtnetlink_rcv_msg+0x263/0x2d0
>   ? _cond_resched+0x15/0x40
>   ? rtnl_calcit.isra.30+0x110/0x110
>   netlink_rcv_skb+0x4d/0x130
>   netlink_unicast+0x1a3/0x250
>   netlink_sendmsg+0x2ae/0x3a0
>   sock_sendmsg+0x36/0x40
>   ___sys_sendmsg+0x26f/0x2d0
>   ? do_wp_page+0x8e/0x5f0
>   ? handle_pte_fault+0x6c3/0xf50
>   ? __handle_mm_fault+0x38e/0x520
>   ? __sys_sendmsg+0x5e/0xa0
>   __sys_sendmsg+0x5e/0xa0
>   do_syscall_64+0x5b/0x180
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  RIP: 0033:0x7f75a4583ba0
>  Code: c3 48 8b 05 f2 62 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d fd c3 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
>  RSP: 002b:7fff60ee7418 EFLAGS: 0246 ORIG_RAX: 002e
>  RAX: ffda RBX: 7fff60ee7540 RCX: 7f75a4583ba0
>  RDX:  RSI: 7fff60ee7490 RDI: 0003
>  RBP: 5b842d3e R08: 0002 R09: 
>  R10: 7fff60ee6ea0 R11: 0246 R12: 
>  R13: 7fff60ee7554 R14: 0001 R15: 0066c100
>  Modules linked in: act_pedit(E) ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul ext4 crc32_pclmul mbcache ghash_clmulni_intel jbd2 pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer cryptd glue_helper snd joydev pcspkr soundcore virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk virtio_console failover qxl crc32c_intel drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_pci libata virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_pedit]
>  CR2: 
>
> Like it's done for other TC actions, give up dumping pedit rules and return
> an error if nla_nest_start() returns NULL.

Looks good to me,

Acked-by: Cong Wang 

While you are at it, please fix act_tunnel_key too.

Thanks.
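The pedit fix follows the usual dump rule: if reserving a nested attribute fails, stop, trim the message back to where this dump started and return an error instead of writing through a NULL pointer. A userspace model with a fixed-size buffer standing in for the skb; `struct demo_msg`, `demo_nest_start()` and `demo_dump_keys()` are hypothetical, and `-EMSGSIZE` is our choice of error code for the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size message buffer standing in for an skb. */
struct demo_msg {
	unsigned char buf[32];
	size_t len;
};

/* Reserves hdr bytes for a nested attribute header; returns NULL when the
 * message is full, which is the case the original code did not handle. */
static unsigned char *demo_nest_start(struct demo_msg *m, size_t hdr)
{
	unsigned char *p;

	if (m->len + hdr > sizeof(m->buf))
		return NULL;
	p = m->buf + m->len;
	m->len += hdr;
	return p;
}

/* Dumps nkeys keys of 8 bytes each; on any reservation failure, trims the
 * message back to its starting length and reports an error. */
static int demo_dump_keys(struct demo_msg *m, size_t nkeys)
{
	size_t start = m->len;
	size_t i;

	for (i = 0; i < nkeys; i++) {
		unsigned char *nest = demo_nest_start(m, 8);

		if (!nest) {
			m->len = start;	/* discard the partial dump */
			return -EMSGSIZE;
		}
		memset(nest, 0, 8);	/* safe: nest is known to be non-NULL */
	}
	return 0;
}
```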


[bpf-next PATCH 1/2] bpf: sockmap test remove shutdown() calls

2018-08-28 Thread John Fastabend
Currently, we do a shutdown(sk, SHUT_RDWR) on both peer sockets and
a shutdown on the sender as well. However, this is incorrect and can
occasionally cause issues with unlucky timing. First, peer1 or peer2
may still be in use, depending on the test and timing. Second, we
should only be closing the read side and/or the write side, depending
on whether the test is receiving or sending.

But none of this is really needed, so just remove the shutdown calls.

Signed-off-by: John Fastabend 
---
 tools/testing/selftests/bpf/test_sockmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 0c7d9e5..a0e77c6 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -469,8 +469,6 @@ static int sendmsg_test(struct sockmap_options *opt)
fprintf(stderr,
"msg_loop_rx: iov_count %i iov_buf %i cnt %i err %i\n",
iov_count, iov_buf, cnt, err);
-   shutdown(p2, SHUT_RDWR);
-   shutdown(p1, SHUT_RDWR);
if (s.end.tv_sec - s.start.tv_sec) {
sent_Bps = sentBps(s);
recvd_Bps = recvdBps(s);
@@ -500,7 +498,6 @@ static int sendmsg_test(struct sockmap_options *opt)
fprintf(stderr,
"msg_loop_tx: iov_count %i iov_buf %i cnt %i err %i\n",
iov_count, iov_buf, cnt, err);
-   shutdown(c1, SHUT_RDWR);
if (s.end.tv_sec - s.start.tv_sec) {
sent_Bps = sentBps(s);
recvd_Bps = recvdBps(s);
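The commit message contrasts a full SHUT_RDWR with closing only the read side or only the write side. As a reminder of shutdown(2)'s half-close semantics, here is a small standalone sketch; it uses a UNIX-domain socketpair for simplicity rather than the sockets test_sockmap sets up, and `demo_half_close()` is a hypothetical helper.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Half-closes the write side of one end of a socketpair and reports what
 * each end then observes. Returns 0 on success, -1 on error. */
static int demo_half_close(ssize_t *peer_read_ret, ssize_t *self_read_ret)
{
	int sv[2];
	char c;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	/* Stop sending on sv[0]; its receive side stays open. */
	if (shutdown(sv[0], SHUT_WR) < 0)
		goto fail;
	/* The peer sees end-of-file: read() returns 0. */
	*peer_read_ret = read(sv[1], &c, 1);
	/* The peer can still send, and sv[0] can still receive it. */
	if (write(sv[1], "x", 1) != 1)
		goto fail;
	*self_read_ret = read(sv[0], &c, 1);
	close(sv[0]);
	close(sv[1]);
	return 0;
fail:
	close(sv[0]);
	close(sv[1]);
	return -1;
}
```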



[bpf-next PATCH 0/2] bpf: test_sockmap updates

2018-08-28 Thread John Fastabend
Two small test sockmap updates for bpf-next. These help me run some
additional tests with test_sockmap.

---

John Fastabend (2):
  bpf: sockmap test remove shutdown() calls
  bpf: use --cgroup in test_suite if supplied


 tools/testing/selftests/bpf/test_sockmap.c |   56 
 1 file changed, 31 insertions(+), 25 deletions(-)

--
Signature


[bpf-next PATCH 2/2] bpf: use --cgroup in test_suite if supplied

2018-08-28 Thread John Fastabend
If the user supplies a --cgroup value in the arguments when running
the test_suite, go ahead and run the self tests there. I use this
to test with multiple cgroup users.

Signed-off-by: John Fastabend 
---
 tools/testing/selftests/bpf/test_sockmap.c |   53 
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index a0e77c6..ac7de38 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1345,9 +1345,9 @@ static int populate_progs(char *bpf_file)
return 0;
 }
 
-static int __test_suite(char *bpf_file)
+static int __test_suite(int cg_fd, char *bpf_file)
 {
-   int cg_fd, err;
+   int err, cleanup = cg_fd;
 
err = populate_progs(bpf_file);
if (err < 0) {
@@ -1355,22 +1355,24 @@ static int __test_suite(char *bpf_file)
return err;
}
 
-   if (setup_cgroup_environment()) {
-   fprintf(stderr, "ERROR: cgroup env failed\n");
-   return -EINVAL;
-   }
-
-   cg_fd = create_and_get_cgroup(CG_PATH);
if (cg_fd < 0) {
-   fprintf(stderr,
-   "ERROR: (%i) open cg path failed: %s\n",
-   cg_fd, optarg);
-   return cg_fd;
-   }
+   if (setup_cgroup_environment()) {
+   fprintf(stderr, "ERROR: cgroup env failed\n");
+   return -EINVAL;
+   }
+
+   cg_fd = create_and_get_cgroup(CG_PATH);
+   if (cg_fd < 0) {
+   fprintf(stderr,
+   "ERROR: (%i) open cg path failed: %s\n",
+   cg_fd, optarg);
+   return cg_fd;
+   }
 
-   if (join_cgroup(CG_PATH)) {
-   fprintf(stderr, "ERROR: failed to join cgroup\n");
-   return -EINVAL;
+   if (join_cgroup(CG_PATH)) {
+   fprintf(stderr, "ERROR: failed to join cgroup\n");
+   return -EINVAL;
+   }
}
 
/* Tests basic commands and APIs with range of iov values */
@@ -1391,20 +1393,24 @@ static int __test_suite(char *bpf_file)
 
 out:
printf("Summary: %i PASSED %i FAILED\n", passed, failed);
-   cleanup_cgroup_environment();
-   close(cg_fd);
+   if (cleanup < 0) {
+   cleanup_cgroup_environment();
+   close(cg_fd);
+   }
return err;
 }
 
-static int test_suite(void)
+static int test_suite(int cg_fd)
 {
int err;
 
-   err = __test_suite(BPF_SOCKMAP_FILENAME);
+   err = __test_suite(cg_fd, BPF_SOCKMAP_FILENAME);
if (err)
goto out;
-   err = __test_suite(BPF_SOCKHASH_FILENAME);
+   err = __test_suite(cg_fd, BPF_SOCKHASH_FILENAME);
 out:
+   if (cg_fd > -1)
+   close(cg_fd);
return err;
 }
 
@@ -1417,7 +1423,7 @@ int main(int argc, char **argv)
int test = PING_PONG;
 
if (argc < 2)
-   return test_suite();
+   return test_suite(-1);
 
while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
  long_options, &longindex)) != -1) {
@@ -1483,6 +1489,9 @@ int main(int argc, char **argv)
}
}
 
+   if (argc <= 3 && cg_fd)
+   return test_suite(cg_fd);
+
if (!cg_fd) {
fprintf(stderr, "%s requires cgroup option: --cgroup \n",
argv[0]);
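The patch passes the caller's cgroup fd into __test_suite() and uses `cleanup = cg_fd` so that the function tears down only the environment it created itself, i.e. when it was handed -1. A minimal sketch of that ownership rule; `demo_suite()` is hypothetical and opening /dev/null stands in for create_and_get_cgroup().

```c
#include <fcntl.h>
#include <unistd.h>

/* Runs a (stub) suite against cg_fd. When cg_fd < 0 it creates its own
 * descriptor and closes it afterwards; a caller-supplied descriptor is
 * used but left open, since the caller owns it. *created reports which
 * case applied. Returns 0 on success, -1 on error. */
static int demo_suite(int cg_fd, int *created)
{
	int own = cg_fd < 0;	/* mirrors "cleanup = cg_fd; if (cleanup < 0)" */

	*created = 0;
	if (own) {
		cg_fd = open("/dev/null", O_RDONLY);	/* stand-in for the cgroup fd */
		if (cg_fd < 0)
			return -1;
		*created = 1;
	}

	/* ... the tests would attach programs to cg_fd here ... */

	if (own)
		close(cg_fd);
	return 0;
}
```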



Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Ard Biesheuvel
On 28 August 2018 at 15:56, Ard Biesheuvel  wrote:
> Hello Andreas, Nick,
>
> On 28 August 2018 at 06:06, Nicholas Piggin  wrote:
>> On Mon, 27 Aug 2018 19:11:01 +0200
>> Andreas Schwab  wrote:
>>
>>> I'm getting this Oops when running iptables -F OUTPUT:
>>>
>>> [   91.139409] Unable to handle kernel paging request for data at address 0xd001fff12f34
>>> [   91.139414] Faulting instruction address: 0xd16a5718
>>> [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [   91.139426] BE SMP NR_CPUS=2 PowerMac
>>> [   91.139434] Modules linked in: iptable_filter ip_tables x_tables bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
>>> [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
>>> [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: c06f560c
>>> [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  (4.19.0-rc1)
>>> [   91.139534] MSR:  9200b032   CR: 84002484  XER: 2000
>>> [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0
>>> GPR00: d16a569c c001fa5778f0 d16b0400
>>> GPR04: 0002  8001fa46418e c001fa0d05c8
>>> GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8
>>> GPR12: c06f560c c780
>>> GPR16: 11635010 3fffa1b7aa68
>>> GPR20: 0003 10013918 116350c0 c0b88990
>>> GPR24: c0b88ba4  d001fff12f34
>>> GPR28: d16b8000 c001fa20f400 c001fa20f440
>>> [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 [ip_tables]
>>> [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 [ip_tables]
>>> [   91.139638] Call Trace:
>>> [   91.139645] [c001fa5778f0] [d16a569c] .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
>>> [   91.139655] [c001fa5779b0] [d16a5b54] .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
>>> [   91.139666] [c001fa577aa0] [c06233e0] .nf_getsockopt+0x68/0x88
>>> [   91.139674] [c001fa577b40] [c0631608] .ip_getsockopt+0xbc/0x128
>>> [   91.139682] [c001fa577bf0] [c065adf4] .raw_getsockopt+0x18/0x5c
>>> [   91.139690] [c001fa577c60] [c05b5f60] .sock_common_getsockopt+0x2c/0x40
>>> [   91.139697] [c001fa577cd0] [c05b3394] .__sys_getsockopt+0xa4/0xd0
>>> [   91.139704] [c001fa577d80] [c05b5ab0] .__se_sys_socketcall+0x238/0x2b4
>>> [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
>>> [   91.139716] Instruction dump:
>>> [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 419d000c 393e0060
>>> [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 41e20010 7c210b78
>>> [   91.139752] ---[ end trace f5d1d5431651845d ]---
>>
>> This is due to 7290d58095 ("module: use relative references for
>> __ksymtab entries"). This part of kernel/module.c -
>>
>>	/* Divert to percpu allocation if a percpu var. */
>>	if (sym[i].st_shndx == info->index.pcpu)
>>		secbase = (unsigned long)mod_percpu(mod);
>>	else
>>		secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
>>	sym[i].st_value += secbase;
>>
>> Causes the distance to the target to exceed 32-bits on powerpc, so
>> it doesn't fit in a rel32 reloc. Not sure how other archs cope.
>>
>
> Apologies for the breakage. It does indeed appear to affect all
> architectures, and I'm a bit puzzled why you are the first one to spot
> it.
>
> I will try to find a clean way to special case the per-CPU variable
> __ksymtab references in the generic module code, and if that is too
> cumbersome, we can switch to 64-bit relative references (or rather,
> native word size relative references) instead. Or revert the whole
> thing ...

OK, after a bit of digging, and confirming that the arm64
implementation works as expected (its module loader actually detects
overflows of the 32-bit place relative relocations, so the problem
definitely does not occur there), I think I found the explanation why
this occurs on powerpc and not on x86 or arm64.
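The diagnosis above comes down to a range check: a 32-bit place-relative reference can only encode a target whose distance from the referencing location fits in a signed 32-bit value (roughly ±2 GiB). Once `st_value` is rebased onto the per-CPU area, that distance can fall outside the range. A minimal sketch of the check; the helper name `fits_rel32()` is ours, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if target can be encoded as a 32-bit place-relative offset
 * from place, i.e. if target - place fits in an int32_t. The subtraction is
 * done in uint64_t and reinterpreted as signed, so targets below place
 * yield negative distances. */
static bool fits_rel32(uint64_t place, uint64_t target)
{
	int64_t delta = (int64_t)(target - place);

	return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

A loader that performs this check before writing the offset can refuse the module instead of silently storing a truncated value.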

Could you please check whether this change makes the issue go away?
(whitespace damage courtesy of Gmail)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6a501b25dd85..57d09d5ceb1a 1006

Re: [PATCH net] sctp: hold transport before accessing its asoc in sctp_transport_get_next

2018-08-28 Thread Xin Long
On Mon, Aug 27, 2018 at 9:08 PM Neil Horman  wrote:
>
> On Mon, Aug 27, 2018 at 06:38:31PM +0800, Xin Long wrote:
> > As Marcelo noticed, in sctp_transport_get_next, it is iterating over
> > transports but then also accessing the association directly, without
> > checking any refcnts before that, which can cause an use-after-free
> > Read.
> >
> > So fix it by holding transport before accessing the association. With
> > that, sctp_transport_hold calls can be removed in the later places.
> >
> > Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
> > Reported-by: syzbot+fe62a0c9aa6a85c6d...@syzkaller.appspotmail.com
> > Signed-off-by: Xin Long 
> > ---
> >  net/sctp/proc.c   |  4 
> >  net/sctp/socket.c | 22 +++---
> >  2 files changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> > index ef5c9a8..4d6f1c8 100644
> > --- a/net/sctp/proc.c
> > +++ b/net/sctp/proc.c
> > @@ -264,8 +264,6 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
> >   }
> >
> >   transport = (struct sctp_transport *)v;
> > - if (!sctp_transport_hold(transport))
> > - return 0;
> >   assoc = transport->asoc;
> >   epb = &assoc->base;
> >   sk = epb->sk;
> > @@ -322,8 +320,6 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
> >   }
> >
> >   transport = (struct sctp_transport *)v;
> > - if (!sctp_transport_hold(transport))
> > - return 0;
> >   assoc = transport->asoc;
> >
> >   list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index e96b15a..aa76586 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -5005,9 +5005,14 @@ struct sctp_transport *sctp_transport_get_next(struct net *net,
> >   break;
> >   }
> >
> > + if (!sctp_transport_hold(t))
> > + continue;
> > +
> >   if (net_eq(sock_net(t->asoc->base.sk), net) &&
> >   t->asoc->peer.primary_path == t)
> >   break;
> > +
> > + sctp_transport_put(t);
> >   }
> >
> >   return t;
> > @@ -5017,13 +5022,18 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
> > struct rhashtable_iter *iter,
> > int pos)
> >  {
> > - void *obj = SEQ_START_TOKEN;
> > + struct sctp_transport *t;
> >
> > - while (pos && (obj = sctp_transport_get_next(net, iter)) &&
> > -!IS_ERR(obj))
> > - pos--;
> > + if (!pos)
> > + return SEQ_START_TOKEN;
> >
> > - return obj;
> > + while ((t = sctp_transport_get_next(net, iter)) && !IS_ERR(t)) {
> > + if (!--pos)
> > + break;
> > + sctp_transport_put(t);
> > + }
> > +
> > + return t;
> >  }
> >
> >  int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> > @@ -5082,8 +5092,6 @@ int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
> >
> >   tsp = sctp_transport_get_idx(net, &hti, *pos + 1);
> >   for (; !IS_ERR_OR_NULL(tsp); tsp = sctp_transport_get_next(net, &hti)) {
> > - if (!sctp_transport_hold(tsp))
> > - continue;
> >   ret = cb(tsp, p);
> >   if (ret)
> >   break;
> > --
> > 2.1.0
> >
> >
> Acked-by: Neil Horman 
>
> Additionally, it's not germane to this particular fix, but why are we still
> using that pos variable in sctp_transport_get_idx?  With the conversion to
> rhashtables, it doesn't seem particularly useful anymore.
For proc, it seems so, since hti is saved in seq->private.
But for diag, "hti" in sctp_for_each_transport() is a local variable.
Where do you think we could save it?
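The core of the sctp fix is a reference-counting rule: take a reference on the transport before looking at its association, drop it for every entry that is skipped, and hand the caller exactly one reference for the entry returned. A userspace model of that rule; the array walk, `struct demo_transport` and the refcount_inc_not_zero()-style `demo_hold()` are simplifications, not the sctp rhashtable code.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_transport {
	int refcnt;	/* 0 means the object is already being freed */
	bool primary;	/* stands in for asoc->peer.primary_path == t */
};

static bool demo_hold(struct demo_transport *t)
{
	if (t->refcnt == 0)
		return false;	/* like a failed refcount_inc_not_zero() */
	t->refcnt++;
	return true;
}

static void demo_put(struct demo_transport *t)
{
	t->refcnt--;
}

/* Next primary transport at or after *cursor, returned with a reference held. */
static struct demo_transport *demo_get_next(struct demo_transport *tbl,
					    size_t n, size_t *cursor)
{
	while (*cursor < n) {
		struct demo_transport *t = &tbl[(*cursor)++];

		if (!demo_hold(t))
			continue;	/* never touch a dying transport */
		if (t->primary)
			return t;	/* caller now owns this reference */
		demo_put(t);		/* not interesting: drop it again */
	}
	return NULL;
}

/* pos-th (1-based) primary transport, held; skipped ones are released. */
static struct demo_transport *demo_get_idx(struct demo_transport *tbl,
					   size_t n, size_t pos)
{
	size_t cursor = 0;
	struct demo_transport *t;

	if (!pos)
		return NULL;	/* the real code returns SEQ_START_TOKEN here */
	while ((t = demo_get_next(tbl, n, &cursor)) != NULL) {
		if (!--pos)
			break;
		demo_put(t);
	}
	return t;
}
```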


net-next is OPEN...

2018-08-28 Thread David Miller


You know the drill...

http://vger.kernel.org/~davem/net-next.html


Re: [PATCH] iprule: Fix destination prefix output

2018-08-28 Thread Luca Boccassi
On Tue, 2018-08-28 at 16:27 +0200, Stefan Bader wrote:
> When adding support for JSON output the new code for printing
> the destination prefix adds a stray blank character before
> the bitmask. This causes some user-space parsing to fail.
> 
> Current output:
>   ...: from x.x.x.x/l to y.y.y.y /l
> Previous output:
>   ...: from x.x.x.x/l to y.y.y.y/l
> 
> Fixes: 0dd4ccc5 "iprule: add json support"
> Signed-off-by: Stefan Bader 
> ---
>  ip/iprule.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ip/iprule.c b/ip/iprule.c
> index 8b94214..744d6d8 100644
> --- a/ip/iprule.c
> +++ b/ip/iprule.c
> @@ -239,7 +239,7 @@ int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
>  
>   print_string(PRINT_FP, NULL, "to ", NULL);
>   print_color_string(PRINT_ANY, ifa_family_color(frh->family),
> -    "dst", "%s ", dst);
> +    "dst", "%s", dst);
>   if (frh->dst_len != host_len)
>   print_uint(PRINT_ANY, "dstlen", "/%u ", frh->dst_len);
>   else

Acked-by: Luca Boccassi 

-- 
Kind regards,
Luca Boccassi



Re: [PATCH net 0/3] ipv6: fix error path of inet6_init()

2018-08-28 Thread Xin Long



- Original Message -
> The error path of inet6_init() can trigger multiple kernel panics,
> mostly due to wrong ordering of cleanups. This series fixes those
> issues.
> 
> Sabrina Dubroca (3):
>   ipv6: fix cleanup ordering for ip6_mr failure
>   ipv6: fix cleanup ordering for pingv6 registration
>   net: rtnl: return early from rtnl_unregister_all when protocol isn't
> registered
> 
>  net/core/rtnetlink.c |  4 
>  net/ipv6/af_inet6.c  | 10 +-
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> --
> 2.18.0
> 
> 
Series Reviewed-by: Xin Long 
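The series is about cleanup ordering on an init error path. The usual kernel idiom is a goto ladder that undoes completed steps in reverse order of setup. A generic sketch; steps A, B, C and `demo_init()` are hypothetical and do not reproduce the actual inet6_init() sequence.

```c
#include <stddef.h>

/* Hypothetical three-step init. fail_at selects which step fails
 * (0 = none). The letters written to undo[] record the teardown order;
 * undo must have room for at least 3 chars. */
static int demo_init(int fail_at, char *undo)
{
	size_t n = 0;

	/* step A */
	if (fail_at == 1)
		goto out;	/* nothing was set up yet */
	/* step B */
	if (fail_at == 2)
		goto undo_a;
	/* step C */
	if (fail_at == 3)
		goto undo_b;
	undo[0] = '\0';
	return 0;

undo_b:
	undo[n++] = 'B';	/* undo the most recent step first */
undo_a:
	undo[n++] = 'A';
out:
	undo[n] = '\0';
	return -1;
}
```

Jumping to the wrong label, or placing the labels in setup order instead of reverse order, is exactly the kind of mistake that turns a failed init into a crash.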


[PATCH] iprule: Fix destination prefix output

2018-08-28 Thread Stefan Bader
When adding support for JSON output, the new code for printing
the destination prefix adds a stray blank character before
the bitmask. This causes some user-space parsing to fail.

Current output:
  ...: from x.x.x.x/l to y.y.y.y /l
Previous output:
  ...: from x.x.x.x/l to y.y.y.y/l

Fixes: 0dd4ccc5 "iprule: add json support"
Signed-off-by: Stefan Bader 
---
 ip/iprule.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index 8b94214..744d6d8 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -239,7 +239,7 @@ int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 
print_string(PRINT_FP, NULL, "to ", NULL);
print_color_string(PRINT_ANY, ifa_family_color(frh->family),
-  "dst", "%s ", dst);
+  "dst", "%s", dst);
if (frh->dst_len != host_len)
print_uint(PRINT_ANY, "dstlen", "/%u ", frh->dst_len);
else
-- 
2.7.4
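The fix only concerns where the space goes in the format strings. A small userspace model of the two print calls; `demo_print_dst()` is a hypothetical helper that writes into a buffer instead of using iproute2's print_*() functions, and it always prints the prefix length, whereas print_rule() does so only when dst_len differs from the host length.

```c
#include <stdio.h>
#include <string.h>

/* Models "to ", then the address, then "/%u ". With buggy != 0 the
 * address uses the old "%s " format, leaving a stray space before the
 * prefix length. size must be at least 1. */
static void demo_print_dst(char *buf, size_t size, const char *dst,
			   unsigned int dst_len, int buggy)
{
	size_t n;

	snprintf(buf, size, "to ");
	n = strlen(buf);
	if (buggy)
		snprintf(buf + n, size - n, "%s ", dst);
	else
		snprintf(buf + n, size - n, "%s", dst);
	n = strlen(buf);
	snprintf(buf + n, size - n, "/%u ", dst_len);
}
```

With dst "192.0.2.1" and a length of 24, the fixed format yields "to 192.0.2.1/24 " and the old one "to 192.0.2.1 /24 ".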



[PATCH net 2/2] tc-testing: add test-cases for numeric and invalid control action

2018-08-28 Thread Paolo Abeni
Only the police action allows us to specify an arbitrary numeric value
for the control action. This change introduces an explicit test case
for the above feature and then leverages it to test the kernel behavior
for invalid control actions (rejection).

Signed-off-by: Paolo Abeni 
---
 .../tc-testing/tc-tests/actions/police.json   | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
index f03763d81617..30f9b54bd666 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
@@ -312,6 +312,54 @@
             "$TC actions flush action police"
         ]
     },
+    {
+        "id": "6aaf",
+        "name": "Add police actions with conform-exceed control pass/pipe [with numeric values]",
+        "category": [
+            "actions",
+            "police"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action police",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action police rate 3mbit burst 250k conform-exceed 0/3 index 1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions get action police index 1",
+        "matchPattern": "action order [0-9]*:  police 0x1 rate 3Mbit burst 250Kb mtu 2Kb action pass/pipe",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action police"
+        ]
+    },
+    {
+        "id": "29b1",
+        "name": "Add police actions with conform-exceed control /drop",
+        "category": [
+            "actions",
+            "police"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action police",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action police rate 3mbit burst 250k conform-exceed 10/drop index 1",
+        "expExitCode": "255",
+        "verifyCmd": "$TC actions ls action police",
+        "matchPattern": "action order [0-9]*:  police 0x1 rate 3Mbit burst 250Kb mtu 2Kb action ",
+        "matchCount": "0",
+        "teardown": [
+            "$TC actions flush action police"
+        ]
+    },
     {
         "id": "c26f",
         "name": "Add police action with invalid peakrate value",
2.17.1



[PATCH net 1/2] net_sched: reject unknown tcfa_action values

2018-08-28 Thread Paolo Abeni
After commit 802bfb19152c ("net/sched: user-space can't set
unknown tcfa_action values"), unknown tcfa_action values are
converted to TC_ACT_UNSPEC, but the common agreement is to reject
such configurations instead.

This change also introduces a helper to simplify the destruction
of a single action, avoiding code duplication.

Fixes: 802bfb19152c ("net/sched: user-space can't set unknown tcfa_action values")
Signed-off-by: Paolo Abeni 
---
 net/sched/act_api.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index db83dac1e7f4..8614f2c282e8 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -662,6 +662,13 @@ int tcf_action_destroy(struct tc_action *actions[], int bind)
return ret;
 }
 
+int tcf_action_destroy_one(struct tc_action *a, int bind)
+{
+   struct tc_action *actions[] = { a, NULL };
+
+   return tcf_action_destroy(actions, bind);
+}
+
 static int tcf_action_put(struct tc_action *p)
 {
return __tcf_action_put(p, false);
@@ -881,17 +888,16 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN)) {
 		err = tcf_action_goto_chain_init(a, tp);
 		if (err) {
-			struct tc_action *actions[] = { a, NULL };
-
-			tcf_action_destroy(actions, bind);
 			NL_SET_ERR_MSG(extack, "Failed to init TC action chain");
+			tcf_action_destroy_one(a, bind);
 			return ERR_PTR(err);
 		}
 	}
 
 	if (!tcf_action_valid(a->tcfa_action)) {
 		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");
-		a->tcfa_action = TC_ACT_UNSPEC;
+		tcf_action_destroy_one(a, bind);
+		return ERR_PTR(-EINVAL);
 	}
 
return a;
-- 
2.17.1
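For readers unfamiliar with the tcfa_action encoding: plain verdicts are small integers, while extended opcodes such as goto chain live in the upper bits. A userspace sketch of a validity check in that spirit; the constant values are assumed to mirror include/uapi/linux/pkt_cls.h, and `demo_action_valid()` is our own simplification, not the kernel's tcf_action_valid(). With it, the numeric 0/3 (pass/pipe) used by the new tdc test is accepted and 10 is rejected.

```c
#include <stdbool.h>

/* Values assumed to mirror include/uapi/linux/pkt_cls.h. */
#define DEMO_ACT_UNSPEC		(-1)
#define DEMO_ACT_OK		0
#define DEMO_ACT_PIPE		3
#define DEMO_ACT_TRAP		8	/* highest plain verdict */
#define DEMO_ACT_VALUE_MAX	DEMO_ACT_TRAP
#define DEMO_EXT_SHIFT		28
#define DEMO_EXT_VAL_MASK	((1 << DEMO_EXT_SHIFT) - 1)
#define DEMO_ACT_JUMP		(1 << DEMO_EXT_SHIFT)
#define DEMO_ACT_GOTO_CHAIN	(2 << DEMO_EXT_SHIFT)

/* Accepts UNSPEC, the plain verdicts 0..VALUE_MAX, and the two known
 * extended opcodes (with any value in the low bits). Anything else is
 * rejected rather than silently rewritten. */
static bool demo_action_valid(int action)
{
	int opcode;

	if (action == DEMO_ACT_UNSPEC)
		return true;
	if (action < 0)
		return false;
	opcode = action & ~DEMO_EXT_VAL_MASK;
	if (opcode == 0)
		return action <= DEMO_ACT_VALUE_MAX;
	return opcode == DEMO_ACT_JUMP || opcode == DEMO_ACT_GOTO_CHAIN;
}
```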



[PATCH net 0/2] net_sched: reject unknown tcfa_action values

2018-08-28 Thread Paolo Abeni
As agreed some time ago, this changeset rejects unknown tcfa_action values
instead of changing such values under the hood.

A tdc test is included to verify the new behavior.

Paolo Abeni (2):
  net_sched: reject unknown tcfa_action values
  tc-testing: add test-cases for numeric and invalid control action

 net/sched/act_api.c   | 14 --
 .../tc-testing/tc-tests/actions/police.json   | 48 +++
 2 files changed, 58 insertions(+), 4 deletions(-)

-- 
2.17.1



  1   2   >