Re: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list skb

2018-03-19 Thread Yonghong Song



On 3/19/18 10:30 PM, Yuan, Linyu (NSB - CN/Shanghai) wrote:




-Original Message-
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
On Behalf Of Yonghong Song
Sent: Tuesday, March 20, 2018 1:16 PM
To: eduma...@google.com; a...@fb.com; dan...@iogearbox.net;
dipt...@fb.com; netdev@vger.kernel.org
Cc: kernel-t...@fb.com
Subject: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list
skb


while (pos < offset + len) {
if (i >= nfrags) {
-   BUG_ON(skb_headlen(list_skb));
+			if (skb_headlen(list_skb) && check_list_skb == list_skb) {

This causes the next BUG_ON to always be false.


The idea is that in this branch we do not do list_skb = 
list_skb->next, so we update check_list_skb instead. The next time

control reaches here, list_skb may still be the same, but check_list_skb
is not, so we proceed to process list_skb->frags in the else branch.

In the else branch, we have
   list_skb = list_skb->next;
   check_list_skb = list_skb;

So when the current frags have been processed and we advance to the 
next list_skb, list_skb will again be equal to check_list_skb and its head will be processed first.


It is a little bit convoluted. Please let me know if you have a better idea.


+			} else {
+				BUG_ON(skb_headlen(list_skb) && check_list_skb == list_skb);

Just going by the code logic, there is no need for this BUG_ON, right?


Oh, yes, we do not need this. Will remove in the next version.



-   i = 0;
-   nfrags = skb_shinfo(list_skb)->nr_frags;
-   frag = skb_shinfo(list_skb)->frags;
-   frag_skb = list_skb;
+   i = 0;
+   nfrags = skb_shinfo(list_skb)->nr_frags;
+   frag = skb_shinfo(list_skb)->frags;
+   frag_skb = list_skb;

-   BUG_ON(!nfrags);
+   BUG_ON(!nfrags);

-   if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
-   skb_zerocopy_clone(nskb, frag_skb,
-  GFP_ATOMIC))
-   goto err;
+				if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+				    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+					goto err;

-   list_skb = list_skb->next;
+   list_skb = list_skb->next;
+   check_list_skb = list_skb;
+   }
}

if (unlikely(skb_shinfo(nskb)->nr_frags >=
--
2.9.5




Re: [bpf-next PATCH v3 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data

2018-03-19 Thread John Fastabend
On 03/19/2018 01:24 PM, Alexei Starovoitov wrote:
> On Sun, Mar 18, 2018 at 12:57:25PM -0700, John Fastabend wrote:
>> Currently, if a bpf sk msg program is run the program
>> can only parse data that the (start,end) pointers already
>> consumed. For sendmsg hooks this is likely the first
>> scatterlist element. For sendpage this will be the range
>> (0,0) because the data is shared with userspace and by
>> default we want to avoid allowing userspace to modify
>> data while (or after) the BPF verdict is being decided.
>>
>> To support pulling in additional bytes for parsing use
>> a new helper bpf_sk_msg_pull(start, end, flags) which
>> works similar to cls tc logic. This helper will attempt
>> to point the data start pointer at 'start' bytes offset
>> into msg and data end pointer at 'end' bytes offset into
>> message.
>>
>> After basic sanity checks to ensure 'start' <= 'end' and
>> 'end' <= msg_length there are a few cases we need to
>> handle.
>>
>> First the sendmsg hook has already copied the data from
>> userspace and has exclusive access to it. Therefore, it
>> is not necessary to copy the data. However, it may
>> be required. After finding the scatterlist element with
>> 'start' offset byte in it, there are two cases. One: the
>> range (start,end) is entirely contained in the sg element
>> and is already linear. All that is needed is to update the
>> data pointers, no allocate/copy is needed. The other case
>> is (start, end) crosses sg element boundaries. In this
>> case we allocate a block of size 'end - start' and copy
>> the data to linearize it.
>>
>> Next, the sendpage hook has not copied any data in its initial
>> state, so the data pointers are (0,0). In this case we
>> handle it similar to the above sendmsg case except the
>> allocation/copy must always happen. Then when sending
>> the data we have possibly three memory regions that
>> need to be sent, (0, start - 1), (start, end), and
>> (end + 1, msg_length). This is required to ensure any
>> writes by the BPF program are correctly transmitted.
>>
>> Lastly this operation will invalidate any previous
>> data checks so BPF programs will have to revalidate
>> pointers after making this BPF call.
>>
>> Signed-off-by: John Fastabend 
> ..
>> +
>> +page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
>> +if (unlikely(!page))
>> +return -ENOMEM;
> 
> I think that's fine. Just curious what order do you see in practice?

At the moment I'm mostly reading headers so this only
happens when a header is split across multiple scatterlist
elements. In these cases a copy size of less than 4k is good
enough.

Some of the nginx configurations I have use a max sendfile
size of 128kb. So these are larger, but unless we look
at the payload we can avoid reading/writing this. If
it becomes commonplace we could look at optimizing it.
Should be doable without changing the user facing API.

> 
> Acked-by: Alexei Starovoitov 
>


RE: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list skb

2018-03-19 Thread Yuan, Linyu (NSB - CN/Shanghai)


> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Yonghong Song
> Sent: Tuesday, March 20, 2018 1:16 PM
> To: eduma...@google.com; a...@fb.com; dan...@iogearbox.net;
> dipt...@fb.com; netdev@vger.kernel.org
> Cc: kernel-t...@fb.com
> Subject: [PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list
> skb
> 
> 
>   while (pos < offset + len) {
>   if (i >= nfrags) {
> - BUG_ON(skb_headlen(list_skb));
> +			if (skb_headlen(list_skb) && check_list_skb == list_skb) {
This causes the next BUG_ON to always be false.
> + } else {
> +				BUG_ON(skb_headlen(list_skb) && check_list_skb == list_skb);
Just going by the code logic, there is no need for this BUG_ON, right?
> 
> - i = 0;
> - nfrags = skb_shinfo(list_skb)->nr_frags;
> - frag = skb_shinfo(list_skb)->frags;
> - frag_skb = list_skb;
> + i = 0;
> + nfrags = skb_shinfo(list_skb)->nr_frags;
> + frag = skb_shinfo(list_skb)->frags;
> + frag_skb = list_skb;
> 
> - BUG_ON(!nfrags);
> + BUG_ON(!nfrags);
> 
> - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> - skb_zerocopy_clone(nskb, frag_skb,
> -GFP_ATOMIC))
> - goto err;
> +					if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +					    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> +						goto err;
> 
> - list_skb = list_skb->next;
> + list_skb = list_skb->next;
> + check_list_skb = list_skb;
> + }
>   }
> 
>   if (unlikely(skb_shinfo(nskb)->nr_frags >=
> --
> 2.9.5



[PATCH net-next 1/2] net: permit skb_segment on head_frag frag_list skb

2018-03-19 Thread Yonghong Song
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.

3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3473 netdev_features_t features)
3474 {
3475 struct sk_buff *segs = NULL;
3476 struct sk_buff *tail = NULL;
...
3665 while (pos < offset + len) {
3666 if (i >= nfrags) {
3667 BUG_ON(skb_headlen(list_skb));
3668
3669 i = 0;
3670 nfrags = skb_shinfo(list_skb)->nr_frags;
3671 frag = skb_shinfo(list_skb)->frags;
3672 frag_skb = list_skb;
...

call stack:
...
 #1 [883ffef03558] __crash_kexec at 8110c525
 #2 [883ffef03620] crash_kexec at 8110d5cc
 #3 [883ffef03640] oops_end at 8101d7e7
 #4 [883ffef03668] die at 8101deb2
 #5 [883ffef03698] do_trap at 8101a700
 #6 [883ffef036e8] do_error_trap at 8101abfe
 #7 [883ffef037a0] do_invalid_op at 8101acd0
 #8 [883ffef037b0] invalid_op at 81a00bab
[exception RIP: skb_segment+3044]
RIP: 817e4dd4  RSP: 883ffef03860  RFLAGS: 00010216
RAX: 2bf6  RBX: 883feb7aaa00  RCX: 0011
RDX: 883fb87910c0  RSI: 0011  RDI: 883feb7ab500
RBP: 883ffef03928   R8: 2ce2   R9: 27da
R10: 01ea  R11: 2d82  R12: 883f90a1ee80
R13: 883fb8791120  R14: 883feb7abc00  R15: 2ce2
ORIG_RAX:   CS: 0010  SS: 0018
 #9 [883ffef03930] tcp_gso_segment at 818713e7
---  ---
...

The triggering input skb has the following properties:
list_skb = skb->frag_list;
skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb
if its headlen (list_skb->len - list_skb->data_len) is not 0.

This patch addresses the issue by handling the skb_headlen(list_skb) != 0
case properly if list_skb->head_frag is true, which is expected in
most cases. A one-element frag array is created for the list_skb head
and processed before list_skb->frags are processed.

Reported-by: Diptanu Gon Choudhury 
Signed-off-by: Yonghong Song 
---
 net/core/skbuff.c | 42 ++
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 715c134..983f62a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
struct sk_buff *segs = NULL;
struct sk_buff *tail = NULL;
struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
-   skb_frag_t *frag = skb_shinfo(head_skb)->frags;
+   skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag;
unsigned int mss = skb_shinfo(head_skb)->gso_size;
unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
+   struct sk_buff *check_list_skb = list_skb;
struct sk_buff *frag_skb = head_skb;
unsigned int offset = doffset;
unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
@@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 
nskb = skb_clone(list_skb, GFP_ATOMIC);
list_skb = list_skb->next;
+   check_list_skb = list_skb;
 
if (unlikely(!nskb))
goto err;
@@ -3664,21 +3666,37 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 
while (pos < offset + len) {
if (i >= nfrags) {
-   BUG_ON(skb_headlen(list_skb));
+			if (skb_headlen(list_skb) && check_list_skb == list_skb) {
+				struct page *page;
+
+				BUG_ON(!list_skb->head_frag);
+
+				i = 0;
+				nfrags = 1;
+				page = virt_to_head_page(list_skb->head);
+				head_frag.page.p = page;
+				head_frag.page_offset = list_skb->data -
+					(unsigned char *)page_address(page);
+				head_frag.size = skb_headlen(list_skb);
+				frag = &head_frag;
+				check_list_skb = list_skb->next;
+			} else {
+   

[PATCH net-next 0/2] net: permit skb_segment on head_frag frag_list skb

2018-03-19 Thread Yonghong Song
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.
 ...
3665 while (pos < offset + len) {
3666 if (i >= nfrags) {
3667 BUG_ON(skb_headlen(list_skb));
 ...

The triggering input skb has the following properties:
list_skb = skb->frag_list;
skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb
if its headlen (list_skb->len - list_skb->data_len) is not 0.

Patch #1 provides a simple solution to avoid BUG_ON. If
list_skb->head_frag is true, its page-backed frag will
be processed before the list_skb->frags.
Patch #2 provides a test case in test_bpf module which
constructs a skb and calls skb_segment() directly. The test
case is able to trigger the BUG_ON without Patch #1.

Yonghong Song (2):
  net: permit skb_segment on head_frag frag_list skb
  net: bpf: add a test for skb_segment in test_bpf module

 lib/test_bpf.c| 71 ++-
 net/core/skbuff.c | 42 ++--
 2 files changed, 100 insertions(+), 13 deletions(-)

-- 
2.9.5



[PATCH net-next 2/2] net: bpf: add a test for skb_segment in test_bpf module

2018-03-19 Thread Yonghong Song
Without the previous commit,
"modprobe test_bpf" will have the following errors:
...
[   98.149165] [ cut here ]
[   98.159362] kernel BUG at net/core/skbuff.c:3667!
[   98.169756] invalid opcode:  [#1] SMP PTI
[   98.179370] Modules linked in:
[   98.179371]  test_bpf(+)
...
which triggers the bug the previous commit intends to fix.

The skbs are constructed to mimic what mlx5 may generate.
The packet size/header may not mimic real cases in production. But
the processing flow is similar.

Signed-off-by: Yonghong Song 
---
 lib/test_bpf.c | 71 +-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 2efb213..045d7d3 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6574,6 +6574,72 @@ static bool exclude_test(int test_id)
return test_id < test_range[0] || test_id > test_range[1];
 }
 
+static struct sk_buff *build_test_skb(void *page)
+{
+   u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN;
+   struct sk_buff *skb[2];
+   int i, data_size = 8;
+
+   for (i = 0; i < 2; i++) {
+   /* this will set skb[i]->head_frag */
+   skb[i] = build_skb(page, headroom);
+   if (!skb[i])
+   return NULL;
+
+   skb_reserve(skb[i], headroom);
+   skb_put(skb[i], data_size);
+   skb[i]->protocol = htons(ETH_P_IP);
+   skb_reset_network_header(skb[i]);
+   skb_set_mac_header(skb[i], -ETH_HLEN);
+
+   skb_add_rx_frag(skb[i], skb_shinfo(skb[i])->nr_frags,
+   page, 0, 64, 64);
+   // skb: skb_headlen(skb[i]): 8, skb[i]->head_frag = 1
+   }
+
+   /* setup shinfo */
+   skb_shinfo(skb[0])->gso_size = 1448;
+   skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4;
+   skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY;
+   skb_shinfo(skb[0])->gso_segs = 0;
+   skb_shinfo(skb[0])->frag_list = skb[1];
+
+   /* adjust skb[0]'s len */
+   skb[0]->len += skb[1]->len;
+   skb[0]->data_len += skb[1]->data_len;
+   skb[0]->truesize += skb[1]->truesize;
+
+   return skb[0];
+}
+
+static __init int test_skb_segment(void)
+{
+   netdev_features_t features;
+   struct sk_buff *skb;
+   void *page;
+   int ret = -1;
+
+   page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+   if (!page) {
+   pr_info("%s: failed to get_free_page!", __func__);
+   return ret;
+   }
+
+	features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM |
+		   NETIF_F_IPV6_CSUM;
+   features |= NETIF_F_RXCSUM;
+   skb = build_test_skb(page);
+   if (!skb) {
+   pr_info("%s: failed to build_test_skb", __func__);
+   } else if (skb_segment(skb, features)) {
+   ret = 0;
+   pr_info("%s: success in skb_segment!", __func__);
+   } else {
+   pr_info("%s: failed in skb_segment!", __func__);
+   }
+   free_page((unsigned long)page);
+   return ret;
+}
+
 static __init int test_bpf(void)
 {
int i, err_cnt = 0, pass_cnt = 0;
@@ -6632,8 +6698,11 @@ static int __init test_bpf_init(void)
return ret;
 
ret = test_bpf();
-
destroy_bpf_tests();
+   if (ret)
+   return ret;
+
+   ret = test_skb_segment();
return ret;
 }
 
-- 
2.9.5



RE: [PATCH] bnx2x: fix spelling mistake: "registeration" -> "registration"

2018-03-19 Thread Kalluru, Sudarsana
-Original Message-
From: Colin King [mailto:colin.k...@canonical.com] 
Sent: 19 March 2018 20:03
To: Elior, Ariel ; Dept-Eng Everest Linux L2 
; netdev@vger.kernel.org
Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCH] bnx2x: fix spelling mistake: "registeration" -> "registration"

From: Colin Ian King 

Trivial fix to spelling mistake in BNX2X_ERR error message text

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fc9af4aadb..b8388e93520a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13913,7 +13913,7 @@ static void bnx2x_register_phc(struct bnx2x *bp)
bp->ptp_clock = ptp_clock_register(&bp->ptp_clock_info, &bp->pdev->dev);
if (IS_ERR(bp->ptp_clock)) {
bp->ptp_clock = NULL;
-   BNX2X_ERR("PTP clock registeration failed\n");
+   BNX2X_ERR("PTP clock registration failed\n");
}
 }
 
-- 
2.15.1

Acked-by: Sudarsana Kalluru 


Re: [PATCH net-next v2 2/2] dt: bindings: add new dt entries for brcmfmac

2018-03-19 Thread Alexey Roslyakov
Arend,
I use RK3288-firefly, bcm4339 (ap6335).

Regards,
  Alex

On 20 March 2018 at 06:16, Arend van Spriel
 wrote:
> + Uffe
>
> On 3/19/2018 6:55 PM, Florian Fainelli wrote:
>>
>> On 03/19/2018 07:10 AM, Alexey Roslyakov wrote:
>>>
>>> Hi Arend,
>>> I appreciate your response. In my opinion, it has nothing to do with
>>> SDIO host, because it defines "quirks" in the driver itself.
>>
>>
>> It is not clear to me from your patch series whether the problem is that:
>>
>> - the SDIO device has a specific alignment requirements, which would be
>> either a SDIO device driver limitation/issue or maybe the underlying
>> hardware device/firmware requiring that
>>
>> - the SDIO host controller used is not capable of coping nicely with
>> these said limitations
>>
>> It seems to me like what you are doing here is a) applicable to possibly
>> more SDIO devices and host combinations, and b) should likely be done at
>> the layer between the host and device, such that it is available to more
>> combinations.
>
>
> Indeed. That was my thought exactly and I can not imagine Uffe would push
> back on that reasoning.
>
>>> If I get it right, you mean something like this:
>>>
>>> mmc3: mmc@1c12000 {
>>> ...
>>>  broken-sg-support;
>>>  sd-head-align = 4;
>>>  sd-sgentry-align = 512;
>>>
>>>  brcmf: wifi@1 {
>>>  ...
>>>  };
>>> };
>>>
>>> Where should the dt bindings documentation for these entries reside?
>>> In the generic MMC bindings? Well, this is a very special case, and
>>> the linux-mmc maintainers are unlikely to accept these changes.
>>> Also, extra kernel code modifications might be required, which could
>>> make a quite trivial change much more complex.
>>
>>
>> If the MMC maintainers are not copied on this patch series, it will
>> likely be hard for them to identify this patch series and chime in...
>
>
> The main question is whether this is indeed a "very special case" as Alexey
> claims it to be or that it is likely to be applicable to other device and
> host combinations as you are suggesting.
>
> If these properties are imposed by the host or host controller it would make
> sense to have these in the mmc bindings.
>
>>>
 Also I am not sure if the broken-sg-support is still needed. We added
 that for omap_hsmmc, but that has since changed to scatter-gather emulation
 so it might not be needed anymore.
>>>
>>>
>>> I've experienced the problem with rk3288 (dw-mmc host) and sdio
>>> settings like above solved it.
>>> Frankly, I haven't investigated any deeper which one of the settings
>>> helped in my case yet...
>>> I will try to get rid of broken-sg-support first and let you know if
>>> it does make any difference.
>
>
> Are you using some Chromebook? I have some lying around here so I could also
> look into it. What broadcom chipset do you have?
>
> Regards,
> Arend
>
>
>>> All the best,
>>>Alex.
>>>
>>> On 19 March 2018 at 16:31, Arend van Spriel
>>>  wrote:

 On 3/19/2018 2:40 AM, Alexey Roslyakov wrote:
>
>
> In case the host has higher alignment requirements for SG items, allow
> setting device-specific alignments for scatterlist items.
>
> Signed-off-by: Alexey Roslyakov 
> ---
>Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
> | 5
> +
>1 file changed, 5 insertions(+)
>
> diff --git
> a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
> b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
> index 86602f264dce..187b8c1b52a7 100644
> ---
> a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
> +++
> b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
> @@ -17,6 +17,11 @@ Optional properties:
>  When not specified the device will use in-band SDIO
> interrupts.
> - interrupt-names : name of the out-of-band interrupt, which must
> be
> set
>  to "host-wake".
> + - brcm,broken-sg-support : boolean flag to indicate that the SDIO
> host
> +   controller has higher align requirement than 32 bytes for each
> +   scatterlist item.
> + - brcm,sd-head-align : alignment requirement for start of data
> buffer.
> + - brcm,sd-sgentry-align : length alignment requirement for each sg
> entry.



 Hi Alexey,

 Thanks for the patch. However, the problem with these is that they are
 characterizing the host controller and not the wireless device. So from
 device tree perspective , which is to describe the hardware, these
 properties should be SDIO host controller properties. Also I am not sure
 if
 the broken-sg-support is still needed. We added that for omap_hsmmc, but
 that has since changed to scatter-gather emulation so it might not be
 needed
 

Re: [PATCH 13/36] fs: introduce new ->get_poll_head and ->poll_mask methods

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:20PM -0800, Christoph Hellwig wrote:
> ->get_poll_head returns the waitqueue that the poll operation is going
> to sleep on.  Note that this means we can only use a single waitqueue
> for the poll, unlike some current drivers that use two waitqueues for
> different events.  But now that we have keyed wakeups and heavily use
> those for poll there aren't that many good reason left to keep the
> multiple waitqueues, and if there are any ->poll is still around, the
> driver just won't support aio poll.
> 
> Signed-off-by: Christoph Hellwig 

I've been wondering, how does a regular filesystem connect with this?
Also, does anything implement get_poll_head?  It looks to me like an aio
poll provider has to provide both...

--D

> ---
>  Documentation/filesystems/Locking |  7 ++-
>  Documentation/filesystems/vfs.txt | 13 +
>  fs/select.c   | 28 
>  include/linux/fs.h|  2 ++
>  include/linux/poll.h  | 27 +++
>  5 files changed, 72 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/filesystems/Locking 
> b/Documentation/filesystems/Locking
> index 220bba28f72b..6d227f9d7bd9 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -440,6 +440,8 @@ prototypes:
>   ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>   int (*iterate) (struct file *, struct dir_context *);
>   __poll_t (*poll) (struct file *, struct poll_table_struct *);
> + struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
> + __poll_t (*poll_mask) (struct file *, __poll_t);
>   long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>   long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>   int (*mmap) (struct file *, struct vm_area_struct *);
> @@ -470,7 +472,7 @@ prototypes:
>  };
>  
>  locking rules:
> - All may block.
> + All except for ->poll_mask may block.
>  
>  ->llseek() locking has moved from llseek to the individual llseek
>  implementations.  If your fs is not using generic_file_llseek, you
> @@ -498,6 +500,9 @@ in sys_read() and friends.
>  the lease within the individual filesystem to record the result of the
>  operation
>  
> +->poll_mask can be called with or without the waitqueue lock for the 
> waitqueue
> +returned from ->get_poll_head.
> +
>  --- dquot_operations ---
>  prototypes:
>   int (*write_dquot) (struct dquot *);
> diff --git a/Documentation/filesystems/vfs.txt 
> b/Documentation/filesystems/vfs.txt
> index f608180ad59d..50ee13563271 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -857,6 +857,8 @@ struct file_operations {
>   ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>   int (*iterate) (struct file *, struct dir_context *);
>   __poll_t (*poll) (struct file *, struct poll_table_struct *);
> + struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
> + __poll_t (*poll_mask) (struct file *, __poll_t);
>   long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>   long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>   int (*mmap) (struct file *, struct vm_area_struct *);
> @@ -901,6 +903,17 @@ otherwise noted.
>   activity on this file and (optionally) go to sleep until there
>   is activity. Called by the select(2) and poll(2) system calls
>  
> +  get_poll_head: Returns the struct wait_queue_head that poll, select,
> +  epoll or aio poll should wait on in case this instance only has single
> +  waitqueue.  Can return NULL to indicate polling is not supported,
> +  or a POLL* value using the POLL_TO_PTR helper in case a grave error
> +  occurred and ->poll_mask shall not be called.
> +
> +  poll_mask: return the mask of POLL* values describing the file descriptor
> +  state.  Called either before going to sleep on the waitqueue returned by
> +  get_poll_head, or after it has been woken.  If ->get_poll_head and
> +  ->poll_mask are implemented, ->poll does not need to be implemented.
> +
>unlocked_ioctl: called by the ioctl(2) system call.
>  
>compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
> diff --git a/fs/select.c b/fs/select.c
> index ba91103707ea..cc270d7f6192 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -34,6 +34,34 @@
>  
>  #include 
>  
> +__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
> +{
> + unsigned int events = poll_requested_events(pt);
> + struct wait_queue_head *head;
> +
> + if (unlikely(!file_can_poll(file)))
> + return DEFAULT_POLLMASK;
> +
> + if (file->f_op->poll)
> + return file->f_op->poll(file, pt);
> +
> + /*
> +  * Only get the poll head and do the first mask check if we are actually
> +  * 

Re: [PATCH] mlx5: Remove call to ida_pre_get

2018-03-19 Thread Saeed Mahameed
On Thu, 2018-03-15 at 18:30 -0700, Matthew Wilcox wrote:
> On Thu, Mar 15, 2018 at 11:58:07PM +, Saeed Mahameed wrote:
> > On Wed, 2018-03-14 at 19:57 -0700, Matthew Wilcox wrote:
> > > From: Matthew Wilcox 
> > > 
> > > The mlx5 driver calls ida_pre_get() in a loop for no readily
> > > apparent
> > > reason.  The driver uses ida_simple_get() which will call
> > > ida_pre_get()
> > > by itself and there's no need to use ida_pre_get() unless using
> > > ida_get_new().
> > > 
> > 
> > Hi Matthew,
> > 
> > Is this is causing any issues ? or just a simple cleanup ?
> 
> I'm removing the API.  At the end of this cleanup, there will be no
> more
> preallocation; instead we will rely on the slab allocator not
> sucking.
> 

OK, seems reasonable; I am fine with this.

> > Adding Maor, the author of this change,
> > 
> > I believe the idea is to speed up insert_fte (which calls
> > ida_simple_get) since insert_fte runs under the FTE write
> > semaphore,
> > in this case if ida_pre_get was successful before taking the
> > semaphore
> > for all the FTE nodes in the loop, this will be a huge win for
> > ida_simple_get which will immediately return success without even
> > trying to allocate.
> 
> I think that's misguided.  The IDA allocator is only going to
> allocate
> memory once in every 1024 allocations.  Also, it does try to
> allocate,
> even if there are preallocated nodes.  So you're just wasting time,
> unfortunately.
> 

Well, just by looking at the code you can tell that 
two consecutive calls to ida_pre_get will result in only one allocation,
due to the "if (!this_cpu_read(ida_bitmap))" check.

But I didn't dig into the details or go through the whole of
ida_get_new_above, so I will count on your judgment here.

Still, I would like to wait for input from Maor, the author.
I will ping him today.

Thanks,
Saeed.

Re: [PATCH 07/36] aio: add delayed cancel support

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:14PM -0800, Christoph Hellwig wrote:
> The upcoming aio poll support would like to be able to complete the
> iocb inline from the cancellation context, but that would cause
> a lock order reversal.  Add support for optionally moving the cancelation
> outside the context lock to avoid this reversal.

I started to wonder which lock order reversal the commit message refers
to?

I think the reason for adding delayed cancellations is that we want to
be able to call io_cancel -> kiocb_cancel -> aio_poll_cancel ->
aio_complete without double locking ctx_lock?

--D

> Signed-off-by: Christoph Hellwig 
> Acked-by: Jeff Moyer 
> ---
>  fs/aio.c | 49 ++---
>  1 file changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 0b6394b4e528..9d7d6e4cde87 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -170,6 +170,10 @@ struct aio_kiocb {
>   struct list_headki_list;/* the aio core uses this
>* for cancellation */
>  
> + unsigned intflags;  /* protected by ctx->ctx_lock */
> +#define AIO_IOCB_DELAYED_CANCEL  (1 << 0)
> +#define AIO_IOCB_CANCELLED   (1 << 1)
> +
>   /*
>* If the aio_resfd field of the userspace iocb is not zero,
>* this is the underlying eventfd context to deliver events to.
> @@ -536,9 +540,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned 
> int nr_events)
>  #define AIO_EVENTS_FIRST_PAGE((PAGE_SIZE - sizeof(struct aio_ring)) 
> / sizeof(struct io_event))
>  #define AIO_EVENTS_OFFSET(AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
>  
> -void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
> +static void __kiocb_set_cancel_fn(struct aio_kiocb *req,
> + kiocb_cancel_fn *cancel, unsigned int iocb_flags)
>  {
> - struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
>   struct kioctx *ctx = req->ki_ctx;
>   unsigned long flags;
>  
> @@ -548,8 +552,15 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, 
> kiocb_cancel_fn *cancel)
>   spin_lock_irqsave(&ctx->ctx_lock, flags);
>   list_add_tail(&req->ki_list, &ctx->active_reqs);
>   req->ki_cancel = cancel;
> + req->flags |= iocb_flags;
>   spin_unlock_irqrestore(&ctx->ctx_lock, flags);
>  }
> +
> +void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
> +{
> + return __kiocb_set_cancel_fn(container_of(iocb, struct aio_kiocb, rw),
> + cancel, 0);
> +}
>  EXPORT_SYMBOL(kiocb_set_cancel_fn);
>  
>  /*
> @@ -603,17 +614,27 @@ static void free_ioctx_users(struct percpu_ref *ref)
>  {
>   struct kioctx *ctx = container_of(ref, struct kioctx, users);
>   struct aio_kiocb *req;
> + LIST_HEAD(list);
>  
>   spin_lock_irq(&ctx->ctx_lock);
> -
>   while (!list_empty(&ctx->active_reqs)) {
>   req = list_first_entry(&ctx->active_reqs,
>  struct aio_kiocb, ki_list);
> - kiocb_cancel(req);
> - }
> 
> + if (req->flags & AIO_IOCB_DELAYED_CANCEL) {
> + req->flags |= AIO_IOCB_CANCELLED;
> + list_move_tail(&req->ki_list, &list);
> + } else {
> + kiocb_cancel(req);
> + }
> + }
>   spin_unlock_irq(&ctx->ctx_lock);
> 
> + while (!list_empty(&list)) {
> + req = list_first_entry(&list, struct aio_kiocb, ki_list);
> + kiocb_cancel(req);
> + }
> +
>   percpu_ref_kill(&ctx->reqs);
>   percpu_ref_put(&ctx->reqs);
>  }
> @@ -1785,15 +1806,22 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, 
> struct iocb __user *, iocb,
>   if (unlikely(!ctx))
>   return -EINVAL;
>  
> - spin_lock_irq(&ctx->ctx_lock);
> + ret = -EINVAL;
>  
> + spin_lock_irq(&ctx->ctx_lock);
>   kiocb = lookup_kiocb(ctx, iocb, key);
> + if (kiocb) {
> + if (kiocb->flags & AIO_IOCB_DELAYED_CANCEL) {
> + kiocb->flags |= AIO_IOCB_CANCELLED;
> + } else {
> + ret = kiocb_cancel(kiocb);
> + kiocb = NULL;
> + }
> + }
> + spin_unlock_irq(&ctx->ctx_lock);
> +
>   if (kiocb)
>   ret = kiocb_cancel(kiocb);
> - else
> - ret = -EINVAL;
> -
> - spin_unlock_irq(&ctx->ctx_lock);
>  
>   if (!ret) {
>   /*
> @@ -1805,7 +1833,6 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, 
> struct iocb __user *, iocb,
>   }
>  
>   percpu_ref_put(&ctx->users);
> -
>   return ret;
>  }
>  
> -- 
> 2.14.2
> 


Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-19 Thread Arnd Bergmann
On Tue, Mar 20, 2018 at 7:29 AM, Linus Torvalds
 wrote:
> On Mon, Mar 19, 2018 at 2:43 AM, David Laight  wrote:
>>
>> Is it necessary to have the full checks for old versions of gcc?
>>
>> Even -Wvla could be predicated on very recent gcc - since we aren't
>> worried about whether gcc decides to generate a vla, but whether
>> the source requests one.
>
> You are correct. We could just ignore the issue with old gcc versions,
> and disable -Wvla rather than worry about it.

This version might also be an option:

diff --git a/Makefile b/Makefile
index 37fc475a2b92..49dd9f0fb76c 100644
--- a/Makefile
+++ b/Makefile
@@ -687,7 +687,8 @@ KBUILD_CFLAGS += $(call cc-option,-fno-reorder-blocks,) \
 endif

 ifneq ($(CONFIG_FRAME_WARN),0)
-KBUILD_CFLAGS += $(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN})
+KBUILD_CFLAGS += $(call cc-option,-Wstack-usage=${CONFIG_FRAME_WARN}, \
+   -$(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN}))
 endif

 # This selects the stack protector compiler flag. Testing it is delayed

With -Wstack-usage=, we should get a similar warning to -Wvla for frames
that contain real VLAs, but not when there is a VLA that ends up with a
compile-time constant size. -Wstack-usage= was introduced in gcc-4.7, so
on older versions it falls back to -Wframe-larger-than=.

An example output would be

security/integrity/ima/ima_crypto.c: In function 'ima_calc_buffer_hash':
security/integrity/ima/ima_crypto.c:616:5: error: stack usage might be
unbounded [-Werror=stack-usage=]

Arnd


[PATCH net-next 05/14] net: Add TLS TX offload features

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

This patch adds a netdev feature to configure TLS TX offloads.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 include/linux/netdev_features.h | 2 ++
 net/core/ethtool.c  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index db84c516bcfb..18dc34202080 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -77,6 +77,7 @@ enum {
NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload 
*/
NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */
NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
+   NETIF_F_HW_TLS_TX_BIT,  /* Hardware TLS TX offload */
 
NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */
 
@@ -145,6 +146,7 @@ enum {
 #define NETIF_F_HW_ESP __NETIF_F(HW_ESP)
 #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM)
 #defineNETIF_F_RX_UDP_TUNNEL_PORT  __NETIF_F(RX_UDP_TUNNEL_PORT)
+#define NETIF_F_HW_TLS_TX  __NETIF_F(HW_TLS_TX)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 157cd9efa4be..9f07f9fe39ca 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -107,6 +107,7 @@ static const char 
netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_HW_ESP_BIT] =   "esp-hw-offload",
[NETIF_F_HW_ESP_TX_CSUM_BIT] =   "esp-tx-csum-hw-offload",
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] =   "rx-udp_tunnel-port-offload",
+   [NETIF_F_HW_TLS_TX_BIT] ="tls-hw-tx-offload",
 };
 
 static const char
-- 
2.14.3



[PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

This patch adds a generic infrastructure to offload TLS crypto to
network devices. It enables the kernel TLS socket to skip encryption
and authentication operations on the transmit side of the data path,
leaving those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to
the TCP layer just as the SW KTLS implementation does, using the same
API. TCP segmentation is mostly unaffected. Currently the only
exception is that we prevent mixed SKBs where only part of the payload
requires offload. In the future we are likely to add a similar
restriction following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS records": these
records contain plaintext instead of ciphertext, and placeholder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus, given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the TLS NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is acked;
this is done through the new icsk_clean_acked callback.

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant placeholders.
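The sequence-to-record mapping in point (2) can be sketched in plain C. This is a simplified stand-in for the list_head/retransmit_hint machinery the patch adds (a singly linked list, no locking, hypothetical names), not the kernel code:

```c
#include <stddef.h>

/* Simplified stand-in for the tls_record_info list in this patch. */
struct tls_record_info {
	struct tls_record_info *next;
	unsigned int end_seq;	/* TCP seq just past the record's last byte */
	int len;
};

/* Find the record covering TCP sequence number 'seq'. Comparisons use
 * signed differences so TCP sequence wrap-around is handled.
 */
static struct tls_record_info *
tls_find_record(struct tls_record_info *head, unsigned int seq)
{
	struct tls_record_info *rec;

	for (rec = head; rec; rec = rec->next) {
		/* record covers [end_seq - len, end_seq) */
		if ((int)(rec->end_seq - seq) > 0 &&
		    (int)(seq - (rec->end_seq - rec->len)) >= 0)
			return rec;
	}
	return NULL;	/* already acked, or not yet queued */
}
```

The real code additionally caches the last hit (retransmit_hint) because retransmissions tend to touch the same record repeatedly.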

The responsibility for handling out-of-order packets (i.e. TCP
retransmission, qdisc drops) falls on the netdev driver.

The netdev driver keeps track of the expected TCP SN from the NIC's
perspective.  If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN, the
driver calls the TLS layer to obtain the TLS record that includes the
TCP sequence number of the packet. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.
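The expected-SN bookkeeping described above boils down to something like the following sketch (all names illustrative; the real logic lives in the mlx5 driver patches later in the series):

```c
#include <stdbool.h>

struct tls_tx_state {
	unsigned int expected_seq;	/* next TCP seq the NIC expects */
};

/* Returns true if the packet can be sent with the plain offload
 * indication; false means the packet is out-of-order (retransmission
 * or qdisc drop) and the caller must first post a context
 * reconstruction work entry on the same send queue.
 */
static bool tls_tx_in_order(struct tls_tx_state *st,
			    unsigned int skb_seq, unsigned int skb_len)
{
	if (skb_seq != st->expected_seq)
		return false;
	st->expected_seq = skb_seq + skb_len;	/* advance expectation */
	return true;
}
```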

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 include/net/tls.h |  70 +++-
 net/tls/Kconfig   |  10 +
 net/tls/Makefile  |   2 +
 net/tls/tls_device.c  | 804 ++
 net/tls/tls_device_fallback.c | 419 ++
 net/tls/tls_main.c|  33 +-
 6 files changed, 1331 insertions(+), 7 deletions(-)
 create mode 100644 net/tls/tls_device.c
 create mode 100644 net/tls/tls_device_fallback.c

diff --git a/include/net/tls.h b/include/net/tls.h
index 4913430ab807..ab98a6dc4929 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -77,6 +77,37 @@ struct tls_sw_context {
struct scatterlist sg_aead_out[2];
 };
 
+struct tls_record_info {
+   struct list_head list;
+   u32 end_seq;
+   int len;
+   int num_frags;
+   skb_frag_t frags[MAX_SKB_FRAGS];
+};
+
+struct tls_offload_context {
+   struct crypto_aead *aead_send;
+   spinlock_t lock;/* protects records list */
+   struct list_head records_list;
+   struct tls_record_info *open_record;
+   struct tls_record_info *retransmit_hint;
+   u64 hint_record_sn;
+   u64 unacked_record_sn;
+
+   struct scatterlist sg_tx_data[MAX_SKB_FRAGS];
+   void (*sk_destruct)(struct sock *sk);
+   u8 driver_state[];
+   /* The TLS layer reserves room for driver specific state
+* Currently the belief is that there is not enough
+* driver specific state to justify another layer of indirection
+*/
+#define TLS_DRIVER_STATE_SIZE (max_t(size_t, 8, sizeof(void *)))
+};
+
+#define TLS_OFFLOAD_CONTEXT_SIZE   
\
+   (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
+TLS_DRIVER_STATE_SIZE)
+
 enum {
TLS_PENDING_CLOSED_RECORD
 };
@@ -87,6 +118,10 @@ struct tls_context {
struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128;
};
 
+   struct list_head list;
+   struct net_device *netdev;
+   

[PATCH net-next 00/14] TLS offload, netdev & MLX5 support

2018-03-19 Thread Saeed Mahameed
Hi Dave,

The following series from Ilya and Boris provides TLS TX inline crypto
offload.

Boris says:
===
This series adds a generic infrastructure to offload TLS crypto to
network devices. It enables the kernel TLS socket to skip encryption and
authentication operations on the transmit side of the data path, leaving
those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to the
TCP layer just as the SW KTLS implementation does, using the same API.
TCP segmentation is mostly unaffected. Currently the only exception is
that we prevent mixed SKBs where only part of the payload requires
offload. In the future we are likely to add a similar restriction
following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS records": these
records contain plaintext instead of ciphertext, and placeholder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus, given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the TLS NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is acked;
this is done through the new icsk_clean_acked callback.

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant placeholders.

The responsibility for handling out-of-order packets (i.e. TCP
retransmission, qdisc drops) falls on the netdev driver.

The netdev driver keeps track of the expected TCP SN from the NIC's
perspective.  If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN, the
driver calls the TLS layer to obtain the TLS record that includes the
TCP sequence number of the packet. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.

The expected TCP SN is accessed without a lock, under the assumption
that TCP doesn't transmit SKBs from different TX queues concurrently.

We assume that packets are not rerouted to a different network device.

Paper: https://www.netdevconf.org/1.2/papers/netdevconf-TLS.pdf

===

The series is based on latest net-next:
c314c7ba4038 ("Merge branch '40GbE' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue")

Thanks,
Saeed.

--- 

Boris Pismenny (2):
  MAINTAINERS: Update mlx5 innova driver maintainers
  MAINTAINERS: Update TLS maintainers

Ilya Lesokhin (12):
  tcp: Add clean acked data hook
  net: Rename and export copy_skb_header
  net: Add Software fallback infrastructure for socket dependent
offloads
  net: Add TLS offload netdev ops
  net: Add TLS TX offload features
  net/tls: Add generic NIC offload infrastructure
  net/tls: Support TLS device offload with IPv6
  net/mlx5e: Move defines out of ipsec code
  net/mlx5: Accel, Add TLS tx offload interface
  net/mlx5e: TLS, Add Innova TLS TX support
  net/mlx5e: TLS, Add Innova TLS TX offload data path
  net/mlx5e: TLS, Add error statistics

 MAINTAINERS|  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|  11 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   6 +-
 .../net/ethernet/mellanox/mlx5/core/accel/tls.c|  71 ++
 .../net/ethernet/mellanox/mlx5/core/accel/tls.h|  86 +++
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  21 +
 .../mellanox/mlx5/core/en_accel/en_accel.h |  72 ++
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h   |   3 -
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 197 +
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h |  87 +++
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 278 +++
 .../mellanox/mlx5/core/en_accel/tls_rxtx.h |  50 ++
 .../mellanox/mlx5/core/en_accel/tls_stats.c|  89 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |  32 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   9 +
 

[PATCH v4 05/17] ixgbevf: keep writel() closer to wmb()

2018-03-19 Thread Sinan Kaya
Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().
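The ordering this patch makes easier to audit (descriptor stores, barrier, then the tail/doorbell store) has a rough userspace analogy in C11 atomics. This is illustrative only; wmb()/writel() have MMIO semantics these atomics do not, and the names are made up:

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t desc[4];	/* stand-in for the descriptor ring */
static atomic_uint tail;	/* stand-in for the tail register */

/* Publish a descriptor, then advance the tail: the fence plays the
 * role of wmb(), the tail store the role of writel(). Keeping the
 * two textually adjacent makes the pairing obvious to reviewers.
 */
static void ring_publish(uint32_t v, uint32_t i)
{
	desc[i] = v;					/* fill descriptor */
	atomic_thread_fence(memory_order_release);	/* ~wmb() */
	atomic_store_explicit(&tail, i,
			      memory_order_relaxed);	/* ~writel() */
}
```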

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  | 5 -
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f695242..11e893e 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -244,11 +244,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring 
*ring)
return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
-   writel(value, ring->tail);
-}
-
 #define IXGBEVF_RX_DESC(R, i)  \
(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBEVF_TX_DESC(R, i)  \
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 9b3d43d..6bf778a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -659,7 +659,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring 
*rx_ring,
 * such as IA-64).
 */
wmb();
-   ixgbevf_write_tail(rx_ring, i);
+   writel(i, rx_ring->tail);
}
 }
 
@@ -3644,7 +3644,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
tx_ring->next_to_use = i;
 
/* notify HW of packet */
-   ixgbevf_write_tail(tx_ring, i);
+   writel(i, tx_ring->tail);
 
return;
 dma_error:
-- 
2.7.4



[PATCH net-next 10/14] net/mlx5e: TLS, Add Innova TLS TX support

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

Add NETIF_F_HW_TLS_TX capability and expose tlsdev_ops to work with the
TLS generic NIC offload infrastructure.
The NETIF_F_HW_TLS_TX capability will be added in the next patch.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|  11 ++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 173 +
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h |  65 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   3 +
 5 files changed, 254 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 25deaa5a534c..6befd2c381b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -85,3 +85,14 @@ config MLX5_EN_IPSEC
  Build support for IPsec cryptography-offload accelaration in the NIC.
  Note: Support for hardware with this capability needs to be selected
  for this option to become available.
+
+config MLX5_EN_TLS
+   bool "TLS cryptography-offload acceleration"
+   depends on MLX5_CORE_EN
+   depends on TLS_DEVICE
+   depends on MLX5_ACCEL
+   default n
+   ---help---
+ Build support for TLS cryptography-offload acceleration in the NIC.
+ Note: Support for hardware with this capability needs to be selected
+ for this option to become available.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9989e5265a45..50872ed30c0b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -28,4 +28,6 @@ mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o 
ipoib/ethtool.o ipoib/ipoib
 mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \
en_accel/ipsec_stats.o
 
+mlx5_core-$(CONFIG_MLX5_EN_TLS) +=  en_accel/tls.o
+
 CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
new file mode 100644
index ..38d88108a55a
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -0,0 +1,173 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include "en_accel/tls.h"
+#include "accel/tls.h"
+
+static void mlx5e_tls_set_ipv4_flow(void *flow, struct sock *sk)
+{
+   struct inet_sock *inet = inet_sk(sk);
+
+   MLX5_SET(tls_flow, flow, ipv6, 0);
+   memcpy(MLX5_ADDR_OF(tls_flow, flow, dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
+  &inet->inet_daddr, MLX5_FLD_SZ_BYTES(ipv4_layout, ipv4));
+   memcpy(MLX5_ADDR_OF(tls_flow, flow, src_ipv4_src_ipv6.ipv4_layout.ipv4),
+  &inet->inet_rcv_saddr, MLX5_FLD_SZ_BYTES(ipv4_layout, ipv4));
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static void mlx5e_tls_set_ipv6_flow(void *flow, struct sock *sk)
+{
+   struct ipv6_pinfo *np = inet6_sk(sk);
+
+   MLX5_SET(tls_flow, flow, ipv6, 1);
+   memcpy(MLX5_ADDR_OF(tls_flow, flow, dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+  &sk->sk_v6_daddr, MLX5_FLD_SZ_BYTES(ipv6_layout, 

[PATCH net-next 14/14] MAINTAINERS: Update TLS maintainers

2018-03-19 Thread Saeed Mahameed
From: Boris Pismenny 

Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cd4067ccf959..285ea4e6c580 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9711,7 +9711,7 @@ F:net/netfilter/xt_CONNSECMARK.c
 F: net/netfilter/xt_SECMARK.c
 
 NETWORKING [TLS]
-M: Ilya Lesokhin 
+M: Boris Pismenny 
 M: Aviad Yehezkel 
 M: Dave Watson 
 L: netdev@vger.kernel.org
-- 
2.14.3



[PATCH net-next 13/14] MAINTAINERS: Update mlx5 innova driver maintainers

2018-03-19 Thread Saeed Mahameed
From: Boris Pismenny 

Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 MAINTAINERS | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 214c9bca232a..cd4067ccf959 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8913,26 +8913,17 @@ W:  http://www.mellanox.com
 Q: http://patchwork.ozlabs.org/project/netdev/list/
 F: drivers/net/ethernet/mellanox/mlx5/core/en_*
 
-MELLANOX ETHERNET INNOVA DRIVER
-M: Ilan Tayari 
-R: Boris Pismenny 
+MELLANOX ETHERNET INNOVA DRIVERS
+M: Boris Pismenny 
 L: netdev@vger.kernel.org
 S: Supported
 W: http://www.mellanox.com
 Q: http://patchwork.ozlabs.org/project/netdev/list/
+F: drivers/net/ethernet/mellanox/mlx5/core/en_accel/*
+F: drivers/net/ethernet/mellanox/mlx5/core/accel/*
 F: drivers/net/ethernet/mellanox/mlx5/core/fpga/*
 F: include/linux/mlx5/mlx5_ifc_fpga.h
 
-MELLANOX ETHERNET INNOVA IPSEC DRIVER
-M: Ilan Tayari 
-R: Boris Pismenny 
-L: netdev@vger.kernel.org
-S: Supported
-W: http://www.mellanox.com
-Q: http://patchwork.ozlabs.org/project/netdev/list/
-F: drivers/net/ethernet/mellanox/mlx5/core/en_ipsec/*
-F: drivers/net/ethernet/mellanox/mlx5/core/ipsec*
-
 MELLANOX ETHERNET SWITCH DRIVERS
 M: Jiri Pirko 
 M: Ido Schimmel 
-- 
2.14.3



[PATCH net-next 12/14] net/mlx5e: TLS, Add error statistics

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

Add statistics for rare TLS related errors.
Since the errors are rare we have a counter per netdev
rather than per SQ.
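The per-netdev counter idea can be sketched with C11 atomics standing in for the kernel's atomic64_t (names mirror the mlx5e_tls_sw_stats struct added below, but this is an illustrative sketch, not the driver code):

```c
#include <stdatomic.h>

/* Rare-error counters shared by all SQs of one netdev: one atomic per
 * device is enough and avoids dedicating a counter cacheline to every
 * queue for events that almost never happen.
 */
struct tls_sw_stats {
	atomic_ulong tx_tls_drop_metadata;
	atomic_ulong tx_tls_drop_resync_alloc;
};

static void tls_count_metadata_drop(struct tls_sw_stats *s)
{
	/* relaxed is fine: statistics only, no ordering required */
	atomic_fetch_add_explicit(&s->tx_tls_drop_metadata, 1,
				  memory_order_relaxed);
}
```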

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  3 +
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 22 ++
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h | 22 ++
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 24 +++---
 .../mellanox/mlx5/core/en_accel/tls_stats.c| 89 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  4 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 22 ++
 8 files changed, 178 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_stats.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index ec785f589666..a7135f5d5cf6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -28,6 +28,6 @@ mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o 
ipoib/ethtool.o ipoib/ipoib
 mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \
en_accel/ipsec_stats.o
 
-mlx5_core-$(CONFIG_MLX5_EN_TLS) +=  en_accel/tls.o en_accel/tls_rxtx.o
+mlx5_core-$(CONFIG_MLX5_EN_TLS) +=  en_accel/tls.o en_accel/tls_rxtx.o 
en_accel/tls_stats.o
 
 CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7d8696fca826..d397be0b5885 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -795,6 +795,9 @@ struct mlx5e_priv {
 #ifdef CONFIG_MLX5_EN_IPSEC
struct mlx5e_ipsec*ipsec;
 #endif
+#ifdef CONFIG_MLX5_EN_TLS
+   struct mlx5e_tls  *tls;
+#endif
 };
 
 struct mlx5e_profile {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index aa6981c98bdc..d167845271c3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -173,3 +173,25 @@ void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
netdev->hw_features |= NETIF_F_HW_TLS_TX;
	netdev->tlsdev_ops = &mlx5e_tls_ops;
 }
+
+int mlx5e_tls_init(struct mlx5e_priv *priv)
+{
+   struct mlx5e_tls *tls = kzalloc(sizeof(*tls), GFP_KERNEL);
+
+   if (!tls)
+   return -ENOMEM;
+
+   priv->tls = tls;
+   return 0;
+}
+
+void mlx5e_tls_cleanup(struct mlx5e_priv *priv)
+{
+   struct mlx5e_tls *tls = priv->tls;
+
+   if (!tls)
+   return;
+
+   kfree(tls);
+   priv->tls = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index f7216b9b98e2..b6162178f621 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -38,6 +38,17 @@
 #include 
 #include "en.h"
 
+struct mlx5e_tls_sw_stats {
+   atomic64_t tx_tls_drop_metadata;
+   atomic64_t tx_tls_drop_resync_alloc;
+   atomic64_t tx_tls_drop_no_sync_data;
+   atomic64_t tx_tls_drop_bypass_required;
+};
+
+struct mlx5e_tls {
+   struct mlx5e_tls_sw_stats sw_stats;
+};
+
 struct mlx5e_tls_offload_context {
struct tls_offload_context base;
u32 expected_seq;
@@ -55,10 +66,21 @@ mlx5e_get_tls_tx_context(struct tls_context *tls_ctx)
 }
 
 void mlx5e_tls_build_netdev(struct mlx5e_priv *priv);
+int mlx5e_tls_init(struct mlx5e_priv *priv);
+void mlx5e_tls_cleanup(struct mlx5e_priv *priv);
+
+int mlx5e_tls_get_count(struct mlx5e_priv *priv);
+int mlx5e_tls_get_strings(struct mlx5e_priv *priv, uint8_t *data);
+int mlx5e_tls_get_stats(struct mlx5e_priv *priv, u64 *data);
 
 #else
 
 static inline void mlx5e_tls_build_netdev(struct mlx5e_priv *priv) { }
+static inline int mlx5e_tls_init(struct mlx5e_priv *priv) { return 0; }
+static inline void mlx5e_tls_cleanup(struct mlx5e_priv *priv) { }
+static inline int mlx5e_tls_get_count(struct mlx5e_priv *priv) { return 0; }
+static inline int mlx5e_tls_get_strings(struct mlx5e_priv *priv, uint8_t 
*data) { return 0; }
+static inline int mlx5e_tls_get_stats(struct mlx5e_priv *priv, u64 *data) { 
return 0; }
 
 #endif
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index 49e8d455ebc3..ad2790fb5966 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -164,7 +164,8 @@ static struct sk_buff *
 

[PATCH net-next 11/14] net/mlx5e: TLS, Add Innova TLS TX offload data path

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

Implement the TLS tx offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

Special metadata ethertype is used to pass information to
the hardware.
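The "special metadata ethertype" idea can be sketched as splicing a small pseudo-header in after the Ethernet header so the NIC can tie the frame to a TLS context. The ethertype value and field layout below are illustrative, not the Innova wire format:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define ETH_HLEN	14
#define TLS_MD_ETYPE	0x8CE4	/* hypothetical metadata ethertype */
#define TLS_MD_LEN	8

/* Insert an 8-byte metadata header after the Ethernet header.
 * Returns the new frame length, or 0 if the buffer is too small.
 */
static size_t tls_insert_metadata(uint8_t *frame, size_t len, size_t cap,
				  uint32_t swid)
{
	if (len < ETH_HLEN || len + TLS_MD_LEN > cap)
		return 0;

	/* make room for the metadata after the Ethernet header */
	memmove(frame + ETH_HLEN + TLS_MD_LEN, frame + ETH_HLEN,
		len - ETH_HLEN);

	frame[ETH_HLEN + 0] = TLS_MD_ETYPE >> 8;   /* big-endian etype */
	frame[ETH_HLEN + 1] = TLS_MD_ETYPE & 0xff;
	frame[ETH_HLEN + 2] = 0;		   /* syndrome/flags */
	frame[ETH_HLEN + 3] = swid >> 16;	   /* 24-bit context id */
	frame[ETH_HLEN + 4] = swid >> 8;
	frame[ETH_HLEN + 5] = swid & 0xff;
	frame[ETH_HLEN + 6] = 0;		   /* reserved */
	frame[ETH_HLEN + 7] = 0;
	return len + TLS_MD_LEN;
}
```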

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  15 ++
 .../mellanox/mlx5/core/en_accel/en_accel.h |  72 ++
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c |   2 +
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 272 +
 .../mellanox/mlx5/core/en_accel/tls_rxtx.h |  50 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |  10 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   9 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|  37 +--
 10 files changed, 455 insertions(+), 16 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 50872ed30c0b..ec785f589666 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -28,6 +28,6 @@ mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o 
ipoib/ethtool.o ipoib/ipoib
 mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \
en_accel/ipsec_stats.o
 
-mlx5_core-$(CONFIG_MLX5_EN_TLS) +=  en_accel/tls.o
+mlx5_core-$(CONFIG_MLX5_EN_TLS) +=  en_accel/tls.o en_accel/tls_rxtx.o
 
 CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6660986285bf..7d8696fca826 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -340,6 +340,7 @@ struct mlx5e_sq_dma {
 enum {
MLX5E_SQ_STATE_ENABLED,
MLX5E_SQ_STATE_IPSEC,
+   MLX5E_SQ_STATE_TLS,
 };
 
 struct mlx5e_sq_wqe_info {
@@ -824,6 +825,8 @@ void mlx5e_build_ptys2ethtool_map(void);
 u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
   void *accel_priv, select_queue_fallback_t fallback);
 netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev);
+netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
+ struct mlx5e_tx_wqe *wqe, u16 pi);
 
 void mlx5e_completion_event(struct mlx5_core_cq *mcq);
 void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event);
@@ -929,6 +932,18 @@ static inline bool mlx5e_tunnel_inner_ft_supported(struct 
mlx5_core_dev *mdev)
MLX5_CAP_FLOWTABLE_NIC_RX(mdev, 
ft_field_support.inner_ip_version));
 }
 
+static inline void mlx5e_sq_fetch_wqe(struct mlx5e_txqsq *sq,
+ struct mlx5e_tx_wqe **wqe,
+ u16 *pi)
+{
+   struct mlx5_wq_cyc *wq;
+
	wq = &sq->wq;
+   *pi = sq->pc & wq->sz_m1;
+   *wqe = mlx5_wq_cyc_get_wqe(wq, *pi);
+   memset(*wqe, 0, sizeof(**wqe));
+}
+
 static inline
 struct mlx5e_tx_wqe *mlx5e_post_nop(struct mlx5_wq_cyc *wq, u32 sqn, u16 *pc)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
new file mode 100644
index ..68fcb40a2847
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A 

[PATCH net-next 09/14] net/mlx5: Accel, Add TLS tx offload interface

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

Add routines for manipulating TLS TX offload contexts.

In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Add implementation for Innova TLS (FPGA-based) hardware.

These routines will be used by the TLS offload support in a later patch

mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
to work directly with mlx5_core rather than Innova FPGA or other mlx5
acceleration providers.

In the future, when IPSec/TLS or any other acceleration gets integrated
into ConnectX chip, mlx5/accel layer will provide the integrated
acceleration, rather than the Innova one.
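The layering described here can be sketched as a provider ops table behind one entry point: ULPs call the accel layer, which routes to whichever provider backs the device (Innova FPGA today, integrated crypto later). All names are stand-ins, not the mlx5 API:

```c
#include <stddef.h>

struct tls_ops {
	int  (*add_ctx)(void *hw, const void *crypto_info, unsigned int *swid);
	void (*del_ctx)(void *hw, unsigned int swid);
};

struct accel_dev {
	const struct tls_ops *provider;	/* FPGA today, ConnectX later */
	void *hw;
};

/* The single entry point ULPs see; it knows nothing about FPGAs. */
static int accel_tls_add_ctx(struct accel_dev *adev,
			     const void *crypto_info, unsigned int *swid)
{
	return adev->provider->add_ctx(adev->hw, crypto_info, swid);
}

/* Demo provider standing in for the FPGA path: a real one would send
 * an add-context command over the SBU connection and wait for the HW
 * response carrying the stream id.
 */
static int demo_add_ctx(void *hw, const void *crypto_info, unsigned int *swid)
{
	(void)hw; (void)crypto_info;
	*swid = 1;	/* pretend the HW handed back stream id 1 */
	return 0;
}

static void demo_del_ctx(void *hw, unsigned int swid)
{
	(void)hw; (void)swid;
}

static const struct tls_ops demo_ops = {
	.add_ctx = demo_add_ctx,
	.del_ctx = demo_del_ctx,
};
```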

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 .../net/ethernet/mellanox/mlx5/core/accel/tls.c|  71 +++
 .../net/ethernet/mellanox/mlx5/core/accel/tls.h|  86 
 .../net/ethernet/mellanox/mlx5/core/fpga/core.h|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 563 +
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h |  68 +++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  11 +
 include/linux/mlx5/mlx5_ifc.h  |  16 -
 include/linux/mlx5/mlx5_ifc_fpga.h |  77 +++
 9 files changed, 879 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index c805769d92a9..9989e5265a45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -8,10 +8,10 @@ mlx5_core-y :=main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
diag/fs_tracepoint.o
 
-mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o
+mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
 
 mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o 
\
-   fpga/ipsec.o
+   fpga/ipsec.o fpga/tls.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
new file mode 100644
index ..77ac19f38cbe
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "accel/tls.h"
+#include "mlx5_core.h"
+#include "fpga/tls.h"
+
+int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
+  struct tls_crypto_info *crypto_info,
+  u32 start_offload_tcp_sn, u32 *p_swid)
+{
+   return mlx5_fpga_tls_add_tx_flow(mdev, flow, crypto_info,
+start_offload_tcp_sn, p_swid);
+}
+
+void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid)
+{
+   mlx5_fpga_tls_del_tx_flow(mdev, swid, GFP_KERNEL);
+}

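The accel/tls.c shim above is almost pure delegation: ULP-facing entry points forward to whichever acceleration provider is present (Innova FPGA today, ConnectX-native later). Below is a hedged userspace sketch of that "middle layer" idea; all names here (accel_provider, add_tx_flow, and so on) are illustrative stand-ins, not the real mlx5 API.

```c
#include <assert.h>

struct tls_flow { int swid; };

/* provider ops table: today Innova FPGA, later ConnectX-native crypto */
struct accel_provider {
    int  (*add_tx_flow)(struct tls_flow *flow, int start_sn);
    void (*del_tx_flow)(int swid);
};

static int fpga_adds, fpga_dels;

static int fpga_add_tx_flow(struct tls_flow *flow, int start_sn)
{
    (void)start_sn;
    flow->swid = ++fpga_adds;   /* hand back a software id for the context */
    return 0;
}

static void fpga_del_tx_flow(int swid) { (void)swid; fpga_dels++; }

static const struct accel_provider fpga_provider = {
    .add_tx_flow = fpga_add_tx_flow,
    .del_tx_flow = fpga_del_tx_flow,
};

/* which provider is active; the accel layer hides this from mlx5e */
static const struct accel_provider *provider = &fpga_provider;

/* accel-layer entry points: what the ethernet driver would call */
int accel_tls_add_tx_flow(struct tls_flow *flow, int start_sn)
{
    return provider->add_tx_flow(flow, start_sn);
}

void accel_tls_del_tx_flow(int swid)
{
    provider->del_tx_flow(swid);
}
```

Swapping in a future ConnectX-native provider would then only mean pointing `provider` at a different ops table, leaving callers untouched.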
[PATCH net-next 08/14] net/mlx5e: Move defines out of ipsec code

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

The defines are not IPSEC specific.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 3 ---
 drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 5 +
 drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h   | 2 ++
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 4c9360b25532..6660986285bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -53,6 +53,9 @@
 #include "mlx5_core.h"
 #include "en_stats.h"
 
+#define MLX5E_METADATA_ETHER_TYPE (0x8CE4)
+#define MLX5E_METADATA_ETHER_LEN 8
+
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
 #define MLX5E_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index 1198fc1eba4c..93bf10e6508c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -45,9 +45,6 @@
 #define MLX5E_IPSEC_SADB_RX_BITS 10
 #define MLX5E_IPSEC_ESN_SCOPE_MID 0x80000000L
 
-#define MLX5E_METADATA_ETHER_TYPE (0x8CE4)
-#define MLX5E_METADATA_ETHER_LEN 8
-
 struct mlx5e_priv;
 
 struct mlx5e_ipsec_sw_stats {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
index 4f1568528738..a6b672840e34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
@@ -43,9 +43,6 @@
 #include "fpga/sdk.h"
 #include "fpga/core.h"
 
-#define SBU_QP_QUEUE_SIZE 8
-#define MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC   (60 * 1000)
-
 enum mlx5_fpga_ipsec_cmd_status {
MLX5_FPGA_IPSEC_CMD_PENDING,
MLX5_FPGA_IPSEC_CMD_SEND_FAIL,
@@ -258,7 +255,7 @@ static int mlx5_fpga_ipsec_cmd_wait(void *ctx)
 {
struct mlx5_fpga_ipsec_cmd_context *context = ctx;
unsigned long timeout =
-   msecs_to_jiffies(MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC);
+   msecs_to_jiffies(MLX5_FPGA_CMD_TIMEOUT_MSEC);
int res;
 
	res = wait_for_completion_timeout(&context->complete, timeout);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h
index baa537e54a49..a0573cc2fc9b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h
@@ -41,6 +41,8 @@
  * DOC: Innova SDK
  * This header defines the in-kernel API for Innova FPGA client drivers.
  */
+#define SBU_QP_QUEUE_SIZE 8
+#define MLX5_FPGA_CMD_TIMEOUT_MSEC (60 * 1000)
 
 enum mlx5_fpga_access_type {
MLX5_FPGA_ACCESS_TYPE_I2C = 0x0,
-- 
2.14.3



[PATCH net-next 01/14] tcp: Add clean acked data hook

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

Called when a TCP segment is acknowledged.
It can be used by application protocols that hold additional
metadata associated with the stream data.

This is required by TLS device offload to release
metadata associated with acknowledged TLS records.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 include/net/inet_connection_sock.h | 2 ++
 net/ipv4/tcp_input.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/inet_connection_sock.h 
b/include/net/inet_connection_sock.h
index b68fea022a82..2ab6667275df 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -77,6 +77,7 @@ struct inet_connection_sock_af_ops {
  * @icsk_af_ops   Operations which are AF_INET{4,6} specific
  * @icsk_ulp_ops  Pluggable ULP control hook
  * @icsk_ulp_data ULP private data
+ * @icsk_clean_acked  Clean acked data hook
  * @icsk_listen_portaddr_node  hash to the portaddr listener hashtable
  * @icsk_ca_state:Congestion control state
  * @icsk_retransmits: Number of unrecovered [RTO] timeouts
@@ -102,6 +103,7 @@ struct inet_connection_sock {
const struct inet_connection_sock_af_ops *icsk_af_ops;
const struct tcp_ulp_ops  *icsk_ulp_ops;
void  *icsk_ulp_data;
+   void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
struct hlist_node icsk_listen_portaddr_node;
unsigned int  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
__u8  icsk_ca_state:6,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 451ef3012636..9854ecae7245 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3542,6 +3542,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff 
*skb, int flag)
if (after(ack, prior_snd_una)) {
flag |= FLAG_SND_UNA_ADVANCED;
icsk->icsk_retransmits = 0;
+   if (icsk->icsk_clean_acked)
+   icsk->icsk_clean_acked(sk, ack);
}
 
prior_fack = tcp_is_sack(tp) ? tcp_highest_sack_seq(tp) : tp->snd_una;
-- 
2.14.3
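The two-line tcp_ack() change above can be modeled in userspace as follows. This is an illustrative sketch with simplified stand-in types (fake_sock, fake_tcp_ack, a fake TLS cleaner), not kernel code: the hook fires only when it is installed and the ACK actually advances the left edge of the send window.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

struct fake_sock {
    uint32_t snd_una;   /* left edge of the send window */
    void (*clean_acked)(struct fake_sock *sk, uint32_t acked_seq);
};

static uint32_t last_cleaned;

/* stand-in for a ULP callback: a TLS layer would free the metadata of
 * records that are now fully acknowledged */
static void tls_clean_acked(struct fake_sock *sk, uint32_t acked_seq)
{
    (void)sk;
    last_cleaned = acked_seq;
}

/* mirrors the tcp_ack() hunk: call the optional hook only when the ACK
 * advances snd_una (signed wrap-safe comparison, like after()) */
static void fake_tcp_ack(struct fake_sock *sk, uint32_t ack)
{
    if ((int32_t)(ack - sk->snd_una) > 0) {
        if (sk->clean_acked)
            sk->clean_acked(sk, ack);
        sk->snd_una = ack;
    }
}
```

A socket with no hook installed pays only the cost of one NULL check per window-advancing ACK, which is why the hook lives directly in inet_connection_sock rather than behind an indirection layer.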



[PATCH net-next 03/14] net: Add Software fallback infrastructure for socket dependent offloads

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

With socket-dependent offloads we rely on the netdev to transform
the transmitted packets before sending them to the wire.
When a packet from an offloaded socket is rerouted to a different
device, we need to detect it and do the transformation in software.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 include/net/sock.h | 21 +
 net/Kconfig|  4 
 net/core/dev.c |  4 
 3 files changed, 29 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index b9624581d639..92a0e0c54ac1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -481,6 +481,11 @@ struct sock {
void(*sk_error_report)(struct sock *sk);
int (*sk_backlog_rcv)(struct sock *sk,
  struct sk_buff *skb);
+#ifdef CONFIG_SOCK_VALIDATE_XMIT
+   struct sk_buff* (*sk_validate_xmit_skb)(struct sock *sk,
+   struct net_device *dev,
+   struct sk_buff *skb);
+#endif
void(*sk_destruct)(struct sock *sk);
struct sock_reuseport __rcu *sk_reuseport_cb;
struct rcu_head sk_rcu;
@@ -2323,6 +2328,22 @@ static inline bool sk_fullsock(const struct sock *sk)
return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
 }
 
+/* Checks if this SKB belongs to an HW offloaded socket
+ * and whether any SW fallbacks are required based on dev.
+ */
+static inline struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb,
+  struct net_device *dev)
+{
+#ifdef CONFIG_SOCK_VALIDATE_XMIT
+   struct sock *sk = skb->sk;
+
+   if (sk && sk_fullsock(sk) && sk->sk_validate_xmit_skb)
+   skb = sk->sk_validate_xmit_skb(sk, dev, skb);
+#endif
+
+   return skb;
+}
+
 /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV
  * SYNACK messages can be attached to either ones (depending on SYNCOOKIE)
  */
diff --git a/net/Kconfig b/net/Kconfig
index 0428f12c25c2..fe84cfe3260e 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -407,6 +407,10 @@ config GRO_CELLS
bool
default n
 
+config SOCK_VALIDATE_XMIT
+   bool
+   default n
+
 config NET_DEVLINK
tristate "Network physical/parent device Netlink interface"
help
diff --git a/net/core/dev.c b/net/core/dev.c
index d8887cc38e7b..244a4c7ab266 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3086,6 +3086,10 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff 
*skb, struct net_device
if (unlikely(!skb))
goto out_null;
 
+   skb = sk_validate_xmit_skb(skb, dev);
+   if (unlikely(!skb))
+   goto out_null;
+
if (netif_needs_gso(skb, features)) {
struct sk_buff *segs;
 
-- 
2.14.3
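The sk_validate_xmit_skb() idea above can be sketched as a small userspace model: just before handing a packet to a device, the stack asks the owning socket whether the packet may go out as-is; a socket offloaded to a different device applies the software fallback. All names below (fake_dev, tls_validate_xmit, and so on) are illustrative stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev { int has_tls_offload; };
struct fake_skb;

struct fake_sock {
    /* mirrors sk_validate_xmit_skb: NULL for ordinary sockets */
    struct fake_skb *(*validate_xmit)(struct fake_sock *sk,
                                      struct fake_dev *dev,
                                      struct fake_skb *skb);
};

struct fake_skb {
    struct fake_sock *sk;
    int sw_encrypted;
};

/* ULP callback: if the egress device cannot do the offload, perform the
 * transformation (here just a flag) in software */
static struct fake_skb *tls_validate_xmit(struct fake_sock *sk,
                                          struct fake_dev *dev,
                                          struct fake_skb *skb)
{
    (void)sk;
    if (!dev->has_tls_offload)
        skb->sw_encrypted = 1;
    return skb;
}

/* mirrors the validate_xmit_skb() hunk in net/core/dev.c */
static struct fake_skb *validate_xmit_skb(struct fake_skb *skb,
                                          struct fake_dev *dev)
{
    struct fake_sock *sk = skb->sk;

    if (sk && sk->validate_xmit)
        skb = sk->validate_xmit(sk, dev, skb);
    return skb;
}
```

As in the patch, the common case (no callback installed, or CONFIG_SOCK_VALIDATE_XMIT off) reduces to a pointer check, so non-offloaded traffic is unaffected.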



[PATCH net-next 07/14] net/tls: Support TLS device offload with IPv6

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

Previously get_netdev_for_sock worked only with IPv4.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 net/tls/tls_device.c | 49 -
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index c0d4e11a4286..6d4d4d513b84 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -37,6 +37,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -101,13 +106,55 @@ static void tls_device_queue_ctx_destruction(struct 
tls_context *ctx)
	spin_unlock_irqrestore(&tls_device_lock, flags);
 }
 
+static inline struct net_device *ipv6_get_netdev(struct sock *sk)
+{
+   struct net_device *dev = NULL;
+#if IS_ENABLED(CONFIG_IPV6)
+   struct inet_sock *inet = inet_sk(sk);
+   struct ipv6_pinfo *np = inet6_sk(sk);
+   struct flowi6 _fl6, *fl6 = &_fl6;
+   struct dst_entry *dst;
+
+   memset(fl6, 0, sizeof(*fl6));
+   fl6->flowi6_proto = sk->sk_protocol;
+   fl6->daddr = sk->sk_v6_daddr;
+   fl6->saddr = np->saddr;
+   fl6->flowlabel = np->flow_label;
+   IP6_ECN_flow_xmit(sk, fl6->flowlabel);
+   fl6->flowi6_oif = sk->sk_bound_dev_if;
+   fl6->flowi6_mark = sk->sk_mark;
+   fl6->fl6_sport = inet->inet_sport;
+   fl6->fl6_dport = inet->inet_dport;
+   fl6->flowi6_uid = sk->sk_uid;
+   security_sk_classify_flow(sk, flowi6_to_flowi(fl6));
+
+   if (ipv6_stub->ipv6_dst_lookup(sock_net(sk), sk, &dst, fl6) < 0)
+   return NULL;
+
+   dev = dst->dev;
+   dev_hold(dev);
+   dst_release(dst);
+
+#endif
+   return dev;
+}
+
 /* We assume that the socket is already connected */
 static struct net_device *get_netdev_for_sock(struct sock *sk)
 {
struct inet_sock *inet = inet_sk(sk);
struct net_device *netdev = NULL;
 
-   netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif);
+   if (sk->sk_family == AF_INET)
+   netdev = dev_get_by_index(sock_net(sk),
+ inet->cork.fl.flowi_oif);
+   else if (sk->sk_family == AF_INET6) {
+   netdev = ipv6_get_netdev(sk);
+   if (!netdev && !sk->sk_ipv6only &&
+   ipv6_addr_type(&sk->sk_v6_daddr) == IPV6_ADDR_MAPPED)
+   netdev = dev_get_by_index(sock_net(sk),
+ inet->cork.fl.flowi_oif);
+   }
 
return netdev;
 }
-- 
2.14.3
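The family dispatch added to get_netdev_for_sock() can be sketched with plain integers standing in for devices and route lookups: AF_INET uses the cached flow oif, while AF_INET6 tries a v6 route lookup first and falls back to the v4 path for v4-mapped addresses on dual-stack sockets. All names here are simplified stand-ins for the kernel structures.

```c
#include <assert.h>

enum { AF_INET_ = 4, AF_INET6_ = 6 };

struct fake_sk {
    int family;
    int ipv6only;            /* IPV6_V6ONLY socket option */
    int daddr_is_v4_mapped;  /* peer address is ::ffff:a.b.c.d */
    int cork_oif;            /* ifindex cached in inet->cork.fl */
    int v6_route_ifindex;    /* result of the v6 lookup; 0 = lookup failed */
};

/* stand-in for dev_get_by_index(): ifindex doubles as the "device" */
static int dev_by_index(int ifindex) { return ifindex; }

static int get_netdev_for_sock(const struct fake_sk *sk)
{
    if (sk->family == AF_INET_)
        return dev_by_index(sk->cork_oif);
    if (sk->family == AF_INET6_) {
        int dev = sk->v6_route_ifindex;

        /* v6 lookup failed: a dual-stack socket talking to a v4-mapped
         * peer can still resolve via the v4 path */
        if (!dev && !sk->ipv6only && sk->daddr_is_v4_mapped)
            dev = dev_by_index(sk->cork_oif);
        return dev;
    }
    return 0;
}
```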



[PATCH net-next 02/14] net: Rename and export copy_skb_header

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

copy_skb_header is renamed to skb_copy_header and
exported. Exposing this function gives more flexibility
in copying SKBs.
skb_copy and skb_copy_expand do not give enough control
over which parts are copied.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
 include/linux/skbuff.h | 1 +
 net/core/skbuff.c  | 9 +
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d8340e6e8814..dc0f81277723 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1031,6 +1031,7 @@ static inline struct sk_buff *alloc_skb_fclone(unsigned 
int size,
 struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src);
 int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask);
 struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority);
+void skb_copy_header(struct sk_buff *new, const struct sk_buff *old);
 struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t priority);
 struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom,
   gfp_t gfp_mask, bool fclone);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 715c13495ba6..9ae1812fb705 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1304,7 +1304,7 @@ static void skb_headers_offset_update(struct sk_buff 
*skb, int off)
skb->inner_mac_header += off;
 }
 
-static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
+void skb_copy_header(struct sk_buff *new, const struct sk_buff *old)
 {
__copy_skb_header(new, old);
 
@@ -1312,6 +1312,7 @@ static void copy_skb_header(struct sk_buff *new, const 
struct sk_buff *old)
skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
 }
+EXPORT_SYMBOL(skb_copy_header);
 
 static inline int skb_alloc_rx_flag(const struct sk_buff *skb)
 {
@@ -1354,7 +1355,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t 
gfp_mask)
 
BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len));
 
-   copy_skb_header(n, skb);
+   skb_copy_header(n, skb);
return n;
 }
 EXPORT_SYMBOL(skb_copy);
@@ -1418,7 +1419,7 @@ struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, 
int headroom,
skb_clone_fraglist(n);
}
 
-   copy_skb_header(n, skb);
+   skb_copy_header(n, skb);
 out:
return n;
 }
@@ -1598,7 +1599,7 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
BUG_ON(skb_copy_bits(skb, -head_copy_len, n->head + head_copy_off,
 skb->len + head_copy_len));
 
-   copy_skb_header(n, skb);
+   skb_copy_header(n, skb);
 
skb_headers_offset_update(n, newheadroom - oldheadroom);
 
-- 
2.14.3



[PATCH net-next 04/14] net: Add TLS offload netdev ops

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

Add new netdev ops to add and delete a TLS context.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 include/linux/netdevice.h | 24 
 1 file changed, 24 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 913b1cc882cf..e1fef7bb6ed4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -864,6 +864,26 @@ struct xfrmdev_ops {
 };
 #endif
 
+#if IS_ENABLED(CONFIG_TLS_DEVICE)
+enum tls_offload_ctx_dir {
+   TLS_OFFLOAD_CTX_DIR_RX,
+   TLS_OFFLOAD_CTX_DIR_TX,
+};
+
+struct tls_crypto_info;
+struct tls_context;
+
+struct tlsdev_ops {
+   int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
+  enum tls_offload_ctx_dir direction,
+  struct tls_crypto_info *crypto_info,
+  u32 start_offload_tcp_sn);
+   void (*tls_dev_del)(struct net_device *netdev,
+   struct tls_context *ctx,
+   enum tls_offload_ctx_dir direction);
+};
+#endif
+
 struct dev_ifalias {
struct rcu_head rcuhead;
char ifalias[];
@@ -1748,6 +1768,10 @@ struct net_device {
const struct xfrmdev_ops *xfrmdev_ops;
 #endif
 
+#if IS_ENABLED(CONFIG_TLS_DEVICE)
+   const struct tlsdev_ops *tlsdev_ops;
+#endif
+
const struct header_ops *header_ops;
 
unsigned intflags;
-- 
2.14.3



[PATCH v4 08/17] drivers: net: cxgb: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/chelsio/cxgb/sge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb/sge.c 
b/drivers/net/ethernet/chelsio/cxgb/sge.c
index 30de26e..57891bd6 100644
--- a/drivers/net/ethernet/chelsio/cxgb/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb/sge.c
@@ -495,7 +495,7 @@ static struct sk_buff *sched_skb(struct sge *sge, struct 
sk_buff *skb,
 static inline void doorbell_pio(struct adapter *adapter, u32 val)
 {
wmb();
-   writel(val, adapter->regs + A_SG_DOORBELL);
+   writel_relaxed(val, adapter->regs + A_SG_DOORBELL);
 }
 
 /*
-- 
2.7.4
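The cost the whole series targets can be illustrated by modeling barriers as a counter: with an explicit wmb() already in place, a following writel() (which embeds its own barrier on architectures such as arm64) issues a redundant second barrier, while writel_relaxed() keeps the ordering established by wmb() and drops the duplicate. wmb_(), writel_() and writel_relaxed_() below are toy stand-ins, not the kernel accessors.

```c
#include <assert.h>
#include <stdint.h>

static int barriers;        /* how many barrier instructions were issued */
static uint32_t mmio_reg;   /* stand-in for the doorbell register */

static void wmb_(void) { barriers++; }

/* writel() carries its own barrier on some architectures */
static void writel_(uint32_t v, uint32_t *a)         { barriers++; *a = v; }
static void writel_relaxed_(uint32_t v, uint32_t *a) { *a = v; }

/* before the patch: wmb() + writel() -> two barriers per doorbell ring */
static int ring_doorbell_before(uint32_t val)
{
    int b = barriers;

    wmb_();
    writel_(val, &mmio_reg);
    return barriers - b;
}

/* after the patch: wmb() + writel_relaxed() -> one barrier, and the
 * explicit wmb() still orders the descriptor writes before the doorbell */
static int ring_doorbell_after(uint32_t val)
{
    int b = barriers;

    wmb_();
    writel_relaxed_(val, &mmio_reg);
    return barriers - b;
}
```

The key precondition, which every patch in the series states, is that an explicit barrier already precedes the write; dropping to the relaxed accessor without one would be unsafe.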



[PATCH v4 09/17] net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c 
b/drivers/net/ethernet/qlogic/qla3xxx.c
index 9e5264d..0e71b74 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -1858,8 +1858,8 @@ static void ql_update_small_bufq_prod_index(struct 
ql3_adapter *qdev)
qdev->small_buf_release_cnt -= 8;
}
wmb();
-   writel(qdev->small_buf_q_producer_index,
-   &port_regs->CommonRegs.rxSmallQProducerIndex);
+   writel_relaxed(qdev->small_buf_q_producer_index,
+  &port_regs->CommonRegs.rxSmallQProducerIndex);
}
 }
 
-- 
2.7.4



[PATCH v4 03/17] igbvf: eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c 
b/drivers/net/ethernet/intel/igbvf/netdev.c
index 4214c15..edb1c34 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring 
*rx_ring,
 * such as IA-64).
*/
wmb();
-   writel(i, adapter->hw.hw_addr + rx_ring->tail);
+   writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
}
 }
 
@@ -2297,7 +2297,7 @@ static inline void igbvf_tx_queue_adv(struct 
igbvf_adapter *adapter,
 
tx_ring->buffer_info[first].next_to_watch = tx_desc;
tx_ring->next_to_use = i;
-   writel(i, adapter->hw.hw_addr + tx_ring->tail);
+   writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
/* we need this if more than one processor can write to our tail
 * at a time, it synchronizes IO on IA64/Altix systems
 */
-- 
2.7.4



[PATCH v4 06/17] ixgbevf: eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 6bf778a..774b2a6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -659,7 +659,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring 
*rx_ring,
 * such as IA-64).
 */
wmb();
-   writel(i, rx_ring->tail);
+   writel_relaxed(i, rx_ring->tail);
}
 }
 
@@ -3644,7 +3644,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
tx_ring->next_to_use = i;
 
/* notify HW of packet */
-   writel(i, tx_ring->tail);
+   writel_relaxed(i, tx_ring->tail);
 
return;
 dma_error:
-- 
2.7.4



[PATCH v4 02/17] ixgbe: eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0da5aa2..58ed70f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1692,7 +1692,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, 
u16 cleaned_count)
 * such as IA-64).
 */
wmb();
-   writel(i, rx_ring->tail);
+   writel_relaxed(i, rx_ring->tail);
}
 }
 
@@ -2453,7 +2453,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector 
*q_vector,
 * know there are new descriptors to fetch.
 */
wmb();
-   writel(ring->next_to_use, ring->tail);
+   writel_relaxed(ring->next_to_use, ring->tail);
 
xdp_do_flush_map();
}
@@ -8078,7 +8078,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-   writel(i, tx_ring->tail);
+   writel_relaxed(i, tx_ring->tail);
 
/* we need this if more than one processor can write to our tail
 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10014,7 +10014,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 * are new descriptors to fetch.
 */
wmb();
-   writel(ring->next_to_use, ring->tail);
+   writel_relaxed(ring->next_to_use, ring->tail);
 
return;
 }
-- 
2.7.4



[PATCH v4 14/17] net: qlge: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Create a new wrapper function with a relaxed write operator. Use the new
wrapper when a write follows a wmb().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/qlogic/qlge/qlge.h  | 18 ++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |  2 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h 
b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 84ac50f..1465986 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -2185,6 +2185,24 @@ static inline void ql_write_db_reg(u32 val, void __iomem 
*addr)
 }
 
 /*
+ * Doorbell Registers:
+ * Doorbell registers are virtual registers in the PCI memory space.
+ * The space is allocated by the chip during PCI initialization.  The
+ * device driver finds the doorbell address in BAR 3 in PCI config space.
+ * The registers are used to control outbound and inbound queues. For
+ * example, the producer index for an outbound queue.  Each queue uses
+ * 1 4k chunk of memory.  The lower half of the space is for outbound
+ * queues. The upper half is for inbound queues.
+ * Caller has to guarantee ordering.
+ */
+static inline void ql_write_db_reg_relaxed(u32 val, void __iomem *addr)
+{
+   writel_relaxed(val, addr);
+   mmiowb();
+}
+
+
+/*
  * Shadow Registers:
  * Outbound queues have a consumer index that is maintained by the chip.
  * Inbound queues have a producer index that is maintained by the chip.
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c 
b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 50038d9..c222b7c 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2700,7 +2700,7 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct 
net_device *ndev)
tx_ring->prod_idx = 0;
wmb();
 
-   ql_write_db_reg(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
+   ql_write_db_reg_relaxed(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
netif_printk(qdev, tx_queued, KERN_DEBUG, qdev->ndev,
 "tx queued, slot %d, len %d\n",
 tx_ring->prod_idx, skb->len);
-- 
2.7.4



[PATCH v4 11/17] bnx2x: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h   |  9 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 21 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c  |  2 +-
 5 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 352beff..ac38db9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -166,6 +166,12 @@ do {   \
 #define REG_RD8(bp, offset)readb(REG_ADDR(bp, offset))
 #define REG_RD16(bp, offset)   readw(REG_ADDR(bp, offset))
 
+#define REG_WR_RELAXED(bp, offset, val)writel_relaxed((u32)val,\
+  REG_ADDR(bp, offset))
+
+#define REG_WR16_RELAXED(bp, offset, val) \
+   writew_relaxed((u16)val, REG_ADDR(bp, offset))
+
 #define REG_WR(bp, offset, val)writel((u32)val, REG_ADDR(bp, 
offset))
 #define REG_WR8(bp, offset, val)   writeb((u8)val, REG_ADDR(bp, offset))
 #define REG_WR16(bp, offset, val)  writew((u16)val, REG_ADDR(bp, offset))
@@ -760,7 +766,8 @@ struct bnx2x_fastpath {
 #endif
 #define DOORBELL(bp, cid, val) \
do { \
-   writel((u32)(val), bp->doorbells + (bp->db_size * (cid))); \
+   writel_relaxed((u32)(val),\
+   bp->doorbells + (bp->db_size * (cid))); \
} while (0)
 
 /* TX CSUM helpers */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index a5265e1..a8ce5c5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -522,8 +522,8 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
wmb();
 
for (i = 0; i < sizeof(rx_prods)/4; i++)
-   REG_WR(bp, fp->ustorm_rx_prods_offset + i*4,
-  ((u32 *)&rx_prods)[i]);
+   REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
+  ((u32 *)&rx_prods)[i]);
 
mmiowb(); /* keep prod updates ordered */
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fc9af..2dea1b6 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -1608,8 +1608,8 @@ static void bnx2x_hc_int_enable(struct bnx2x *bp)
} else
val = 0x;
 
-   REG_WR(bp, HC_REG_TRAILING_EDGE_0 + port*8, val);
-   REG_WR(bp, HC_REG_LEADING_EDGE_0 + port*8, val);
+   REG_WR_RELAXED(bp, HC_REG_TRAILING_EDGE_0 + port * 8, val);
+   REG_WR_RELAXED(bp, HC_REG_LEADING_EDGE_0 + port * 8, val);
}
 
/* Make sure that interrupts are indeed enabled from here on */
@@ -1672,8 +1672,8 @@ static void bnx2x_igu_int_enable(struct bnx2x *bp)
} else
val = 0x;
 
-   REG_WR(bp, IGU_REG_TRAILING_EDGE_LATCH, val);
-   REG_WR(bp, IGU_REG_LEADING_EDGE_LATCH, val);
+   REG_WR_RELAXED(bp, IGU_REG_TRAILING_EDGE_LATCH, val);
+   REG_WR_RELAXED(bp, IGU_REG_LEADING_EDGE_LATCH, val);
 
/* Make sure that interrupts are indeed enabled from here on */
mmiowb();
@@ -3817,8 +3817,8 @@ static void bnx2x_sp_prod_update(struct bnx2x *bp)
 */
mb();
 
-   REG_WR16(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
-bp->spq_prod_idx);
+   REG_WR16_RELAXED(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
+bp->spq_prod_idx);
mmiowb();
 }
 
@@ -7761,7 +7761,7 @@ void bnx2x_igu_clear_sb_gen(struct bnx2x *bp, u8 func, u8 
idu_sb_id, bool is_pf)
barrier();
DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
  ctl, igu_addr_ctl);
-   REG_WR(bp, igu_addr_ctl, ctl);
+   REG_WR_RELAXED(bp, igu_addr_ctl, ctl);
mmiowb();
barrier();
 
@@ -9720,13 +9720,14 @@ static void bnx2x_process_kill_chip_reset(struct bnx2x 
*bp, bool global)
barrier();
mmiowb();
 
-   REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_2_SET,
-  reset_mask2 & (~stay_reset2));
+   REG_WR_RELAXED(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_2_SET,
+  reset_mask2 & (~stay_reset2));
 
barrier();
 

[PATCH v4 15/17] bnxt_en: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up with the CPU observing two barriers back to back before
executing the register write.

Create a new wrapper function with a relaxed write operator. Use the new
wrapper when a write follows a wmb().

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 11 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  2 +-
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1500243..befb538 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1922,7 +1922,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct 
bnxt_napi *bnapi, int budget)
/* Sync BD data before updating doorbell */
wmb();
 
-   bnxt_db_write(bp, db, DB_KEY_TX | prod);
+   bnxt_db_write_relaxed(bp, db, DB_KEY_TX | prod);
}
 
cpr->cp_raw_cons = raw_cons;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1989c47..4c0d048 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1402,11 +1402,20 @@ static inline u32 bnxt_tx_avail(struct bnxt *bp, struct 
bnxt_tx_ring_info *txr)
 }
 
 /* For TX and RX ring doorbells */
+static inline void bnxt_db_write_relaxed(struct bnxt *bp, void __iomem *db,
+u32 val)
+{
+   writel_relaxed(val, db);
+   if (bp->flags & BNXT_FLAG_DOUBLE_DB)
+   writel_relaxed(val, db);
+}
+
+/* For TX and RX ring doorbells */
 static inline void bnxt_db_write(struct bnxt *bp, void __iomem *db, u32 val)
 {
writel(val, db);
if (bp->flags & BNXT_FLAG_DOUBLE_DB)
-   writel(val, db);
+   writel_relaxed(val, db);
 }
 
 extern const u16 bnxt_lhint_arr[];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 1801582..a1b1060 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2403,7 +2403,7 @@ static int bnxt_run_loopback(struct bnxt *bp)
/* Sync BD data before updating doorbell */
wmb();
 
-   bnxt_db_write(bp, txr->tx_doorbell, DB_KEY_TX | txr->tx_prod);
+   bnxt_db_write_relaxed(bp, txr->tx_doorbell, DB_KEY_TX | txr->tx_prod);
rc = bnxt_poll_loopback(bp, pkt_size);
 
dma_unmap_single(&bp->pdev->dev, map, pkt_size, PCI_DMA_TODEVICE);
-- 
2.7.4



[PATCH v4 10/17] qlcnic: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
Acked-by: Manish Chopra 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index 46b0372..97c146e7 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -478,7 +478,7 @@ irqreturn_t qlcnic_83xx_clear_legacy_intr(struct 
qlcnic_adapter *adapter)
wmb();
 
/* clear the interrupt trigger control register */
-   writel(0, adapter->isr_int_vec);
+   writel_relaxed(0, adapter->isr_int_vec);
intr_val = readl(adapter->isr_int_vec);
do {
intr_val = readl(adapter->tgt_status_reg);
-- 
2.7.4



[PATCH v4 12/17] net: cxgb4/cxgb4vf: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Create a new wrapper function with a relaxed write operator, and use the
new wrapper when a write follows a wmb().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  6 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 13 +++--
 drivers/net/ethernet/chelsio/cxgb4/sge.c| 12 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c  |  2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h  | 14 ++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c  | 18 ++
 6 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 9040e13..6bde0b9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1202,6 +1202,12 @@ static inline void t4_write_reg(struct adapter *adap, 
u32 reg_addr, u32 val)
writel(val, adap->regs + reg_addr);
 }
 
+static inline void t4_write_reg_relaxed(struct adapter *adap, u32 reg_addr,
+   u32 val)
+{
+   writel_relaxed(val, adap->regs + reg_addr);
+}
+
 #ifndef readq
 static inline u64 readq(const volatile void __iomem *addr)
 {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 7b452e8..276472d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -1723,8 +1723,8 @@ int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, 
u16 pidx,
else
val = PIDX_T5_V(delta);
wmb();
-   t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
-QID_V(qid) | val);
+   t4_write_reg_relaxed(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
+QID_V(qid) | val);
}
 out:
return ret;
@@ -1902,8 +1902,9 @@ static void enable_txq_db(struct adapter *adap, struct 
sge_txq *q)
 * are committed before we tell HW about them.
 */
wmb();
-   t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
-QID_V(q->cntxt_id) | PIDX_V(q->db_pidx_inc));
+   t4_write_reg_relaxed(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
+QID_V(q->cntxt_id) |
+   PIDX_V(q->db_pidx_inc));
q->db_pidx_inc = 0;
}
q->db_disabled = 0;
@@ -2003,8 +2004,8 @@ static void sync_txq_pidx(struct adapter *adap, struct 
sge_txq *q)
else
val = PIDX_T5_V(delta);
wmb();
-   t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
-QID_V(q->cntxt_id) | val);
+   t4_write_reg_relaxed(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
+QID_V(q->cntxt_id) | val);
}
 out:
q->db_disabled = 0;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c 
b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 6e310a0..7388aac 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -530,11 +530,11 @@ static inline void ring_fl_db(struct adapter *adap, 
struct sge_fl *q)
 * mechanism.
 */
if (unlikely(q->bar2_addr == NULL)) {
-   t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
-val | QID_V(q->cntxt_id));
+   t4_write_reg_relaxed(adap, MYPF_REG(SGE_PF_KDOORBELL_A),
+val | QID_V(q->cntxt_id));
} else {
-   writel(val | QID_V(q->bar2_qid),
-  q->bar2_addr + SGE_UDB_KDOORBELL);
+   writel_relaxed(val | QID_V(q->bar2_qid),
+  q->bar2_addr + SGE_UDB_KDOORBELL);
 
/* This Write memory Barrier will force the write to
 * the User Doorbell area to be flushed.
@@ -986,8 +986,8 @@ inline void cxgb4_ring_tx_db(struct adapter *adap, struct 
sge_txq *q, int n)
  (q->bar2_addr + SGE_UDB_WCDOORBELL),
  wr);
} else {
-   writel(val | QID_V(q->bar2_qid),
-  q->bar2_addr + SGE_UDB_KDOORBELL);
+   writel_relaxed(val | QID_V(q->bar2_qid),
+  q->bar2_addr + SGE_UDB_KDOORBELL);
}
 
/* This Write Memory Barrier will force the write to 

[PATCH v4 16/17] qed/qede: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Create a new wrapper function with a relaxed write operator, and use the
new wrapper when a write follows a wmb().

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/qlogic/qed/qed.h   |  5 -
 drivers/net/ethernet/qlogic/qed/qed_hw.c| 12 
 drivers/net/ethernet/qlogic/qed/qed_hw.h| 14 ++
 drivers/net/ethernet/qlogic/qed/qed_int.c   |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c|  2 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c   |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_vf.c|  7 ---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  2 +-
 drivers/net/ethernet/qlogic/qede/qede_fp.c  |  4 ++--
 drivers/net/ethernet/qlogic/qlge/qlge.h |  1 -
 include/linux/qed/qed_if.h  | 17 +
 11 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
b/drivers/net/ethernet/qlogic/qed/qed.h
index 6948855..241077f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -818,12 +818,15 @@ u16 qed_get_cm_pq_idx_vf(struct qed_hwfn *p_hwfn, u16 vf);
(cdev->regview) + \
 (offset))
 
+#define REG_WR_RELAXED(cdev, offset, val)  \
+   writel_relaxed((u32)val, REG_ADDR(cdev, offset))
+
 #define REG_RD(cdev, offset)readl(REG_ADDR(cdev, offset))
 #define REG_WR(cdev, offset, val)   writel((u32)val, REG_ADDR(cdev, 
offset))
 #define REG_WR16(cdev, offset, val) writew((u16)val, REG_ADDR(cdev, 
offset))
 
 #define DOORBELL(cdev, db_addr, val)\
-   writel((u32)val, (void __iomem *)((u8 __iomem *)\
+   writel_relaxed((u32)val, (void __iomem *)((u8 __iomem *)\
  (cdev->doorbells) + (db_addr)))
 
 /* Prototypes */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.c 
b/drivers/net/ethernet/qlogic/qed/qed_hw.c
index fca2dbd..1d76121 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hw.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_hw.c
@@ -222,6 +222,18 @@ struct qed_ptt *qed_get_reserved_ptt(struct qed_hwfn 
*p_hwfn,
return &p_hwfn->p_ptt_pool->ptts[ptt_idx];
 }
 
+void qed_wr_relaxed(struct qed_hwfn *p_hwfn,
+   struct qed_ptt *p_ptt,
+   u32 hw_addr, u32 val)
+{
+   u32 bar_addr = qed_set_ptt(p_hwfn, p_ptt, hw_addr);
+
+   REG_WR_RELAXED(p_hwfn, bar_addr, val);
+   DP_VERBOSE(p_hwfn, NETIF_MSG_HW,
+  "bar_addr 0x%x, hw_addr 0x%x, val 0x%x\n",
+  bar_addr, hw_addr, val);
+}
+
 void qed_wr(struct qed_hwfn *p_hwfn,
struct qed_ptt *p_ptt,
u32 hw_addr, u32 val)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.h 
b/drivers/net/ethernet/qlogic/qed/qed_hw.h
index 8db2839..bb4f5ff 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hw.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hw.h
@@ -152,6 +152,20 @@ struct qed_ptt *qed_get_reserved_ptt(struct qed_hwfn 
*p_hwfn,
 enum reserved_ptts ptt_idx);
 
 /**
+ * @brief qed_wr_relaxed - Write value to BAR using the given ptt
+ *No ordering guarantee.
+ *
+ * @param p_hwfn
+ * @param p_ptt
+ * @param val
+ * @param hw_addr
+ */
+void qed_wr_relaxed(struct qed_hwfn *p_hwfn,
+   struct qed_ptt *p_ptt,
+   u32 hw_addr,
+   u32 val);
+
+/**
  * @brief qed_wr - Write value to BAR using the given ptt
  *
  * @param p_hwfn
diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c 
b/drivers/net/ethernet/qlogic/qed/qed_int.c
index d3eabcf..5f09253 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -1747,7 +1747,7 @@ static void qed_int_igu_cleanup_sb(struct qed_hwfn 
*p_hwfn,
 
barrier();
 
-   qed_wr(p_hwfn, p_ptt, IGU_REG_COMMAND_REG_CTRL, cmd_ctrl);
+   qed_wr_relaxed(p_hwfn, p_ptt, IGU_REG_COMMAND_REG_CTRL, cmd_ctrl);
 
/* Flush the write to IGU */
mmiowb();
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 893ef08..7f3f923b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -921,7 +921,7 @@ qed_eth_pf_rx_queue_start(struct qed_hwfn *p_hwfn,
 
/* Init the rcq, rx bd and rx sge (if valid) producers to 0 */
__internal_ram_wr(p_hwfn, *pp_prod, sizeof(u32),
- (u32 *)(&init_prod_val));
+ (u32 *)(&init_prod_val), false);
 
return 

[PATCH v4 17/17] net: ena: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes barrier() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Create a new wrapper function with a relaxed write operator, and use the
new wrapper when a write follows a barrier().

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/amazon/ena/ena_com.c |  6 --
 drivers/net/ethernet/amazon/ena/ena_eth_com.h | 22 --
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  4 ++--
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
b/drivers/net/ethernet/amazon/ena/ena_com.c
index bf2de52..b6e628f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -631,7 +631,8 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev 
*ena_dev, u16 offset)
 */
wmb();
 
-   writel(mmio_read_reg, ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
+   writel_relaxed(mmio_read_reg,
+  ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
 
for (i = 0; i < timeout; i++) {
if (read_resp->req_id == mmio_read->seq_num)
@@ -1826,7 +1827,8 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, 
void *data)
 
/* write the aenq doorbell after all AENQ descriptors were read */
mb();
-   writel((u32)aenq->head, dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
+   writel_relaxed((u32)aenq->head,
+  dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
 }
 
 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h 
b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
index 2f76572..09ef7cd 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
@@ -107,7 +107,8 @@ static inline int ena_com_sq_empty_space(struct 
ena_com_io_sq *io_sq)
return io_sq->q_depth - 1 - cnt;
 }
 
-static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
+static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq,
+   bool relaxed)
 {
u16 tail;
 
@@ -116,7 +117,24 @@ static inline int ena_com_write_sq_doorbell(struct 
ena_com_io_sq *io_sq)
pr_debug("write submission queue doorbell for queue: %d tail: %d\n",
 io_sq->qid, tail);
 
-   writel(tail, io_sq->db_addr);
+   if (relaxed)
+   writel_relaxed(tail, io_sq->db_addr);
+   else
+   writel(tail, io_sq->db_addr);
+
+   return 0;
+}
+
+static inline int ena_com_write_sq_doorbell_rel(struct ena_com_io_sq *io_sq)
+{
+   u16 tail;
+
+   tail = io_sq->tail;
+
+   pr_debug("write submission queue doorbell for queue: %d tail: %d\n",
+io_sq->qid, tail);
+
+   writel_relaxed(tail, io_sq->db_addr);
 
return 0;
 }
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6975150..0530201 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -556,7 +556,7 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 
num)
 * issue a doorbell
 */
wmb();
-   ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq);
+   ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq, true);
}
 
rx_ring->next_to_use = next_to_use;
@@ -2151,7 +2151,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, 
struct net_device *dev)
 
if (netif_xmit_stopped(txq) || !skb->xmit_more) {
/* trigger the dma engine */
-   ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq);
+   ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq, false);
u64_stats_update_begin(&tx_ring->syncp);
tx_ring->tx_stats.doorbells++;
u64_stats_update_end(&tx_ring->syncp);
-- 
2.7.4



[PATCH v4 04/17] igb: eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index b88fae7..82aea92 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5671,7 +5671,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-   writel(i, tx_ring->tail);
+   writel_relaxed(i, tx_ring->tail);
 
/* we need this if more than one processor can write to our tail
 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 
cleaned_count)
 * such as IA-64).
 */
wmb();
-   writel(i, rx_ring->tail);
+   writel_relaxed(i, rx_ring->tail);
}
 }
 
-- 
2.7.4



[PATCH v4 13/17] net: cxgb3: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Create a new wrapper function with a relaxed write operator, and use the
new wrapper when a write follows a wmb().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/chelsio/cxgb3/adapter.h |  7 +++
 drivers/net/ethernet/chelsio/cxgb3/sge.c | 19 ++-
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/adapter.h 
b/drivers/net/ethernet/chelsio/cxgb3/adapter.h
index 087ff0f..0e21e66 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb3/adapter.h
@@ -281,6 +281,13 @@ static inline void t3_write_reg(struct adapter *adapter, 
u32 reg_addr, u32 val)
writel(val, adapter->regs + reg_addr);
 }
 
+static inline void t3_write_reg_relaxed(struct adapter *adapter, u32 reg_addr,
+   u32 val)
+{
+   CH_DBG(adapter, MMIO, "setting register 0x%x to 0x%x\n", reg_addr, val);
+   writel_relaxed(val, adapter->regs + reg_addr);
+}
+
 static inline struct port_info *adap2pinfo(struct adapter *adap, int idx)
 {
return netdev_priv(adap->port[idx]);
diff --git a/drivers/net/ethernet/chelsio/cxgb3/sge.c 
b/drivers/net/ethernet/chelsio/cxgb3/sge.c
index e988caa..0baab06 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/sge.c
@@ -487,7 +487,8 @@ static inline void ring_fl_db(struct adapter *adap, struct 
sge_fl *q)
if (q->pend_cred >= q->credits / 4) {
q->pend_cred = 0;
wmb();
-   t3_write_reg(adap, A_SG_KDOORBELL, V_EGRCNTX(q->cntxt_id));
+   t3_write_reg_relaxed(adap, A_SG_KDOORBELL,
+V_EGRCNTX(q->cntxt_id));
}
 }
 
@@ -1058,8 +1059,8 @@ static inline void check_ring_tx_db(struct adapter *adap, 
struct sge_txq *q)
}
 #else
wmb();  /* write descriptors before telling HW */
-   t3_write_reg(adap, A_SG_KDOORBELL,
-F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
+   t3_write_reg_relaxed(adap, A_SG_KDOORBELL,
+F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 #endif
 }
 
@@ -1510,8 +1511,8 @@ static int ctrl_xmit(struct adapter *adap, struct sge_txq 
*q,
}
spin_unlock(&q->lock);
wmb();
-   t3_write_reg(adap, A_SG_KDOORBELL,
-F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
+   t3_write_reg_relaxed(adap, A_SG_KDOORBELL,
+F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
return NET_XMIT_SUCCESS;
 }
 
@@ -1554,8 +1555,8 @@ static void restart_ctrlq(unsigned long data)
 
spin_unlock(&q->lock);
wmb();
-   t3_write_reg(qs->adap, A_SG_KDOORBELL,
-F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
+   t3_write_reg_relaxed(qs->adap, A_SG_KDOORBELL,
+F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 }
 
 /*
@@ -1793,8 +1794,8 @@ again:reclaim_completed_tx(adap, q, TX_RECLAIM_CHUNK);
 #endif
wmb();
if (likely(written))
-   t3_write_reg(adap, A_SG_KDOORBELL,
-F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
+   t3_write_reg_relaxed(adap, A_SG_KDOORBELL,
+F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 }
 
 /**
-- 
2.7.4



[PATCH v4 01/17] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
Reviewed-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 8 
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..9455869 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -185,7 +185,7 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter 
*fdir_data,
/* Mark the data descriptor to be watched */
first->next_to_watch = tx_desc;
 
-   writel(tx_ring->next_to_use, tx_ring->tail);
+   writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
return 0;
 
 dma_fail:
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring 
*rx_ring, u32 val)
 * such as IA-64).
 */
wmb();
-   writel(val, rx_ring->tail);
+   writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2258,7 +2258,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, 
int budget)
 */
wmb();
 
-   writel(xdp_ring->next_to_use, xdp_ring->tail);
+   writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
}
 
rx_ring->skb = skb;
@@ -3286,7 +3286,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, 
struct sk_buff *skb,
 
/* notify HW of packet */
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-   writel(i, tx_ring->tail);
+   writel_relaxed(i, tx_ring->tail);
 
/* we need this if more than one processor can write to our tail
 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..56eea20 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring 
*rx_ring, u32 val)
 * such as IA-64).
 */
wmb();
-   writel(val, rx_ring->tail);
+   writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2243,7 +2243,7 @@ static inline void i40evf_tx_map(struct i40e_ring 
*tx_ring, struct sk_buff *skb,
 
/* notify HW of packet */
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-   writel(i, tx_ring->tail);
+   writel_relaxed(i, tx_ring->tail);
 
/* we need this if more than one processor can write to our tail
 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4



[PATCH v4 07/17] fm10k: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya 
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 8e12aae..eebef01 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -179,7 +179,7 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 
cleaned_count)
wmb();
 
/* notify hardware of new descriptors */
-   writel(i, rx_ring->tail);
+   writel_relaxed(i, rx_ring->tail);
}
 }
 
@@ -1054,7 +1054,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 
/* notify HW of packet */
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-   writel(i, tx_ring->tail);
+   writel_relaxed(i, tx_ring->tail);
 
/* we need this if more than one processor can write to our tail
 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4



[PATCH v4 00/17] netdev: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Sinan Kaya
Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up with the CPU observing two barriers back to back before it
executes the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each directory
under drivers/.
I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We have used the relaxed API heavily on ARM for a long time, but it did
not exist on other architectures. For this reason, weakly-ordered
architectures have been paying a double-barrier penalty in order to use
the common drivers.

Now that the relaxed API is present on all architectures, we can go and
scrub all drivers to see what needs to change and what can remain.

We start with the most-used ones and hope to increase the coverage over
time. It will take a while to cover all drivers.

Feel free to apply patches individually.

Changes since v3:
- https://www.spinics.net/lists/arm-kernel/msg641851.html
- group patches together into subsystems net:... 
- collect reviewed and tested bys
- scrub barrier()


Sinan Kaya (17):
  i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  ixgbe: eliminate duplicate barriers on weakly-ordered archs
  igbvf: eliminate duplicate barriers on weakly-ordered archs
  igb: eliminate duplicate barriers on weakly-ordered archs
  ixgbevf: keep writel() closer to wmb()
  ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  fm10k: Eliminate duplicate barriers on weakly-ordered archs
  drivers: net: cxgb: Eliminate duplicate barriers on weakly-ordered
archs
  net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs
  qlcnic: Eliminate duplicate barriers on weakly-ordered archs
  bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  net: cxgb4/cxgb4vf: Eliminate duplicate barriers on weakly-ordered
archs
  net: cxgb3: Eliminate duplicate barriers on weakly-ordered archs
  net: qlge: Eliminate duplicate barriers on weakly-ordered archs
  bnxt_en: Eliminate duplicate barriers on weakly-ordered archs
  qed/qede: Eliminate duplicate barriers on weakly-ordered archs
  net: ena: Eliminate duplicate barriers on weakly-ordered archs

 drivers/net/ethernet/amazon/ena/ena_com.c  |  6 --
 drivers/net/ethernet/amazon/ena/ena_eth_com.h  | 22 --
 drivers/net/ethernet/amazon/ena/ena_netdev.c   |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h|  9 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h|  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 21 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  | 11 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  2 +-
 drivers/net/ethernet/chelsio/cxgb/sge.c|  2 +-
 drivers/net/ethernet/chelsio/cxgb3/adapter.h   |  7 +++
 drivers/net/ethernet/chelsio/cxgb3/sge.c   | 19 ++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  6 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 13 +++--
 drivers/net/ethernet/chelsio/cxgb4/sge.c   | 12 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |  2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h | 14 ++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 18 ++
 drivers/net/ethernet/intel/fm10k/fm10k_main.c  |  4 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|  8 
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  |  4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c  |  4 ++--
 drivers/net/ethernet/intel/igbvf/netdev.c  |  4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  8 
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h   |  5 -
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  |  4 ++--
 drivers/net/ethernet/qlogic/qed/qed.h  |  5 -
 drivers/net/ethernet/qlogic/qed/qed_hw.c   | 12 
 drivers/net/ethernet/qlogic/qed/qed_hw.h   | 14 ++
 drivers/net/ethernet/qlogic/qed/qed_int.c  |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c   |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c  |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_vf.c   |  7 ---
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c|  2 +-
 drivers/net/ethernet/qlogic/qede/qede_fp.c |  4 ++--
 drivers/net/ethernet/qlogic/qla3xxx.c  |  4 ++--
 .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c|  2 +-
 drivers/net/ethernet/qlogic/qlge/qlge.h| 17 +
 

Re: [PATCH 12/36] fs: add new vfs_poll and file_can_poll helpers

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:19PM -0800, Christoph Hellwig wrote:
> These abstract out calls to the poll method in preparation for changes
> in how we poll.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/staging/comedi/drivers/serial2002.c |  4 ++--
>  drivers/vfio/virqfd.c   |  2 +-
>  drivers/vhost/vhost.c   |  2 +-
>  fs/eventpoll.c  |  5 ++---
>  fs/select.c | 23 ---
>  include/linux/poll.h| 12 
>  mm/memcontrol.c |  2 +-

For the fs/include/mm changes,
Reviewed-by: Darrick J. Wong 

The other conversions look fine to me too but I've never looked at them
before. :)

--D

>  net/9p/trans_fd.c   | 18 --
>  virt/kvm/eventfd.c  |  2 +-
>  9 files changed, 32 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/staging/comedi/drivers/serial2002.c 
> b/drivers/staging/comedi/drivers/serial2002.c
> index b3f3b4a201af..5471b2212a62 100644
> --- a/drivers/staging/comedi/drivers/serial2002.c
> +++ b/drivers/staging/comedi/drivers/serial2002.c
> @@ -113,7 +113,7 @@ static void serial2002_tty_read_poll_wait(struct file *f, 
> int timeout)
>   long elapsed;
>   __poll_t mask;
>  
> - mask = f->f_op->poll(f, );
> + mask = vfs_poll(f, );
>   if (mask & (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN |
>   EPOLLHUP | EPOLLERR)) {
>   break;
> @@ -136,7 +136,7 @@ static int serial2002_tty_read(struct file *f, int 
> timeout)
>  
>   result = -1;
>   if (!IS_ERR(f)) {
> - if (f->f_op->poll) {
> + if (file_can_poll(f)) {
>   serial2002_tty_read_poll_wait(f, timeout);
>  
>   if (kernel_read(f, , 1, ) == 1)
> diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
> index 085700f1be10..2a1be859ee71 100644
> --- a/drivers/vfio/virqfd.c
> +++ b/drivers/vfio/virqfd.c
> @@ -166,7 +166,7 @@ int vfio_virqfd_enable(void *opaque,
>   init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
>   init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
>  
> - events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
> + events = vfs_poll(irqfd.file, &virqfd->pt);
>  
>   /*
>* Check if there was an event already pending on the eventfd
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 1b3e8d2d5c8b..4d27e288bb1d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -208,7 +208,7 @@ int vhost_poll_start(struct vhost_poll *poll, struct file 
> *file)
>   if (poll->wqh)
>   return 0;
>  
> - mask = file->f_op->poll(file, &poll->table);
> + mask = vfs_poll(file, &poll->table);
>   if (mask)
>   vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
>   if (mask & EPOLLERR) {
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 0f3494ed3ed0..2bebae5a38cf 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -884,8 +884,7 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
> poll_table *pt,
>  
>   pt->_key = epi->event.events;
>   if (!is_file_epoll(epi->ffd.file))
> - return epi->ffd.file->f_op->poll(epi->ffd.file, pt) &
> -epi->event.events;
> + return vfs_poll(epi->ffd.file, pt) & epi->event.events;
>  
>   ep = epi->ffd.file->private_data;
>   poll_wait(epi->ffd.file, &ep->poll_wait, pt);
> @@ -2020,7 +2019,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
>  
>   /* The target file descriptor must support poll */
>   error = -EPERM;
> - if (!tf.file->f_op->poll)
> + if (!file_can_poll(tf.file))
>   goto error_tgt_fput;
>  
>   /* Check if EPOLLWAKEUP is allowed */
> diff --git a/fs/select.c b/fs/select.c
> index c6c504a814f9..ba91103707ea 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -502,14 +502,10 @@ static int do_select(int n, fd_set_bits *fds, struct 
> timespec64 *end_time)
>   continue;
>   f = fdget(i);
>   if (f.file) {
> - const struct file_operations *f_op;
> - f_op = f.file->f_op;
> - mask = DEFAULT_POLLMASK;
> - if (f_op->poll) {
> - wait_key_set(wait, in, out,
> -  bit, busy_flag);
> - mask = (*f_op->poll)(f.file, 
> wait);
> - }
> + wait_key_set(wait, in, out, bit,
> +  busy_flag);
> + 

Re: [bpf-next V3 PATCH 00/15] XDP redirect memory return API

2018-03-19 Thread Jason Wang



On 2018-03-19 18:10, Jesper Dangaard Brouer wrote:

On Fri, 16 Mar 2018 17:04:17 +0800
Jason Wang  wrote:


Looks like the series forget to register memory model for tun and
virtio-net.

Well, no.  It is actually not strictly necessary to invoke
xdp_rxq_info_reg_mem_model() because enum MEM_TYPE_PAGE_SHARED == 0.
And if not passing an allocator pointer to the call, then a mem_id is
not registered... and __xdp_return_frame() skips the rhashtable_lookup.


I see.



I designed the API this way, because I want to support later adding an
allocator even for the refcnt scheme MEM_TYPE_PAGE_SHARED.  (As it
would be a performance optimization to return the pages to the
originating RX-CPU, and move the page refcnt dec back to that orig CPU).

I did add an xdp_rxq_info_reg_mem_model() call to ixgbe, for human
programmer "documentation" even though it isn't strictly necessary.  I
guess I could add similar calls to tun and virtio_net, and then we
avoid any implicit assumptions. It also makes it more clear that
XDP_REDIRECT support uses the memory model return API.



Yes, please do it or add a comment somewhere.

Thanks


Re: [bpf-next V2 PATCH 10/15] xdp: rhashtable with allocator ID to pointer mapping

2018-03-19 Thread Jason Wang



On 2018-03-19 17:48, Jesper Dangaard Brouer wrote:

On Fri, 16 Mar 2018 16:45:30 +0800
Jason Wang  wrote:


On 2018-03-10 00:07, Jesper Dangaard Brouer wrote:

On Fri, 9 Mar 2018 21:07:36 +0800
Jason Wang  wrote:
  

Use the IDA infrastructure for getting a cyclic increasing ID number,
that is used for keeping track of each registered allocator per
RX-queue xdp_rxq_info.

Signed-off-by: Jesper Dangaard Brouer

A stupid question is, can we manage to unify this ID with NAPI id?

Sorry I don't understand the question?

I mean, can we associate the page pool pointer with the napi_struct, record
the NAPI id in xdp_mem_info, and do the lookup through the NAPI id?

No. The driver can unreg/reg a new XDP memory model,

Is there an actual use case for this?

I believe this is the common use case.  When attaching an XDP/bpf prog,
the driver usually wants to change the RX-ring memory model
(a different performance trade-off).


Right, but a single driver should only have one XDP memory model. (Or 
do you want all drivers to use this generic allocator?)



When detaching XDP, the driver wants to change back to the old memory
model. During this process, I believe, the NAPI-ID remains the same
(right?).


Yes, but we can change the allocator pointer in the NAPI struct in this 
case too.


Thanks




without reloading the NAPI and generating a new NAPI id.
  




Re: [PATCH 11/36] fs: update documentation for __poll_t

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:18PM -0800, Christoph Hellwig wrote:

No commit message... "Update documentation to match the headers"?

--D

> Signed-off-by: Christoph Hellwig 
> ---
>  Documentation/filesystems/Locking | 2 +-
>  Documentation/filesystems/vfs.txt | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/Locking 
> b/Documentation/filesystems/Locking
> index 75d2d57e2c44..220bba28f72b 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -439,7 +439,7 @@ prototypes:
>   ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
>   ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>   int (*iterate) (struct file *, struct dir_context *);
> - unsigned int (*poll) (struct file *, struct poll_table_struct *);
> + __poll_t (*poll) (struct file *, struct poll_table_struct *);
>   long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>   long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>   int (*mmap) (struct file *, struct vm_area_struct *);
> diff --git a/Documentation/filesystems/vfs.txt 
> b/Documentation/filesystems/vfs.txt
> index 5fd325df59e2..f608180ad59d 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -856,7 +856,7 @@ struct file_operations {
>   ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
>   ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
>   int (*iterate) (struct file *, struct dir_context *);
> - unsigned int (*poll) (struct file *, struct poll_table_struct *);
> + __poll_t (*poll) (struct file *, struct poll_table_struct *);
>   long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>   long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>   int (*mmap) (struct file *, struct vm_area_struct *);
> -- 
> 2.14.2
> 


Re: [PATCH 10/36] fs: cleanup do_pollfd

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:17PM -0800, Christoph Hellwig wrote:
> Use straight-line code with failure-handling gotos instead of a lot
> of nested conditionals.
> 
> Signed-off-by: Christoph Hellwig 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/select.c | 48 +++-
>  1 file changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index 686de7b3a1db..c6c504a814f9 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -806,34 +806,32 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, 
> poll_table *pwait,
>bool *can_busy_poll,
>__poll_t busy_flag)
>  {
> - __poll_t mask;
> - int fd;
> -
> - mask = 0;
> - fd = pollfd->fd;
> - if (fd >= 0) {
> - struct fd f = fdget(fd);
> - mask = EPOLLNVAL;
> - if (f.file) {
> - /* userland u16 ->events contains POLL... bitmap */
> - __poll_t filter = demangle_poll(pollfd->events) |
> - EPOLLERR | EPOLLHUP;
> - mask = DEFAULT_POLLMASK;
> - if (f.file->f_op->poll) {
> - pwait->_key = filter;
> - pwait->_key |= busy_flag;
> - mask = f.file->f_op->poll(f.file, pwait);
> - if (mask & busy_flag)
> - *can_busy_poll = true;
> - }
> - /* Mask out unneeded events. */
> - mask &= filter;
> - fdput(f);
> - }
> + int fd = pollfd->fd;
> + __poll_t mask = 0, filter;
> + struct fd f;
> +
> + if (fd < 0)
> + goto out;
> + mask = EPOLLNVAL;
> + f = fdget(fd);
> + if (!f.file)
> + goto out;
> +
> + /* userland u16 ->events contains POLL... bitmap */
> + filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
> + mask = DEFAULT_POLLMASK;
> + if (f.file->f_op->poll) {
> + pwait->_key = filter | busy_flag;
> + mask = f.file->f_op->poll(f.file, pwait);
> + if (mask & busy_flag)
> + *can_busy_poll = true;
>   }
> + mask &= filter; /* Mask out unneeded events. */
> + fdput(f);
> +
> +out:
>   /* ... and so does ->revents */
>   pollfd->revents = mangle_poll(mask);
> -
>   return mask;
>  }
>  
> -- 
> 2.14.2
> 


Re: [PATCH 09/36] fs: unexport poll_schedule_timeout

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:16PM -0800, Christoph Hellwig wrote:
> No users outside of select.c.
> 
> Signed-off-by: Christoph Hellwig 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/select.c  | 3 +--
>  include/linux/poll.h | 2 --
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index b6c36254028a..686de7b3a1db 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -233,7 +233,7 @@ static void __pollwait(struct file *filp, 
> wait_queue_head_t *wait_address,
>   add_wait_queue(wait_address, >wait);
>  }
>  
> -int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
> +static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
> ktime_t *expires, unsigned long slack)
>  {
>   int rc = -EINTR;
> @@ -258,7 +258,6 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int 
> state,
>  
>   return rc;
>  }
> -EXPORT_SYMBOL(poll_schedule_timeout);
>  
>  /**
>   * poll_select_set_timeout - helper function to setup the timeout value
> diff --git a/include/linux/poll.h b/include/linux/poll.h
> index f45ebd017eaa..a3576da63377 100644
> --- a/include/linux/poll.h
> +++ b/include/linux/poll.h
> @@ -96,8 +96,6 @@ struct poll_wqueues {
>  
>  extern void poll_initwait(struct poll_wqueues *pwq);
>  extern void poll_freewait(struct poll_wqueues *pwq);
> -extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
> -  ktime_t *expires, unsigned long slack);
>  extern u64 select_estimate_accuracy(struct timespec64 *tv);
>  
>  #define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1)
> -- 
> 2.14.2
> 


Re: [PATCH 08/36] aio: implement io_pgetevents

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:15PM -0800, Christoph Hellwig wrote:
> This is the io_getevents equivalent of ppoll/pselect and allows one to
> properly mix signals and aio completions (especially with IOCB_CMD_POLL)
> and atomically executes the following sequence:
> 
>   sigset_t origmask;
> 
>   pthread_sigmask(SIG_SETMASK, , );
>   ret = io_getevents(ctx, min_nr, nr, events, timeout);
>   pthread_sigmask(SIG_SETMASK, , NULL);
> 
> Note that unlike many other signal related calls we do not pass a sigmask
> size, as that would get us to 7 arguments, which aren't easily supported
> by the syscall infrastructure.  It seems a lot less painful to just add a
> new syscall variant in the unlikely case we're going to increase the
> sigset size.

I'm assuming there's a proposed manpage update for this somewhere? :)

--D

> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  fs/aio.c   | 114 
> ++---
>  include/linux/compat.h |   7 ++
>  include/linux/syscalls.h   |   6 ++
>  include/uapi/asm-generic/unistd.h  |   4 +-
>  include/uapi/linux/aio_abi.h   |   6 ++
>  kernel/sys_ni.c|   2 +
>  8 files changed, 130 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
> b/arch/x86/entry/syscalls/syscall_32.tbl
> index 448ac2161112..5997c3e9ac3e 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -391,3 +391,4 @@
>  382  i386pkey_free   sys_pkey_free
>  383  i386statx   sys_statx
>  384  i386arch_prctl  sys_arch_prctl  
> compat_sys_arch_prctl
> +385  i386io_pgetevents   sys_io_pgetevents   
> compat_sys_io_pgetevents
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
> b/arch/x86/entry/syscalls/syscall_64.tbl
> index 5aef183e2f85..e995cd2b4e65 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -339,6 +339,7 @@
>  330  common  pkey_alloc  sys_pkey_alloc
>  331  common  pkey_free   sys_pkey_free
>  332  common  statx   sys_statx
> +333  common  io_pgetevents   sys_io_pgetevents
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/aio.c b/fs/aio.c
> index 9d7d6e4cde87..da87cbf7c67a 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1291,10 +1291,6 @@ static long read_events(struct kioctx *ctx, long 
> min_nr, long nr,
>   wait_event_interruptible_hrtimeout(ctx->wait,
>   aio_read_events(ctx, min_nr, nr, event, ),
>   until);
> -
> - if (!ret && signal_pending(current))
> - ret = -EINTR;
> -
>   return ret;
>  }
>  
> @@ -1874,13 +1870,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
>   struct timespec __user *, timeout)
>  {
>   struct timespec64   ts;
> + int ret;
> +
> + if (timeout && unlikely(get_timespec64(, timeout)))
> + return -EFAULT;
> +
> + ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
> + if (!ret && signal_pending(current))
> + ret = -EINTR;
> + return ret;
> +}
> +
> +SYSCALL_DEFINE6(io_pgetevents,
> + aio_context_t, ctx_id,
> + long, min_nr,
> + long, nr,
> + struct io_event __user *, events,
> + struct timespec __user *, timeout,
> + const struct __aio_sigset __user *, usig)
> +{
> + struct __aio_sigset ksig = { NULL, };
> + sigset_tksigmask, sigsaved;
> + struct timespec64   ts;
> + int ret;
> +
> + if (timeout && unlikely(get_timespec64(, timeout)))
> + return -EFAULT;
>  
> - if (timeout) {
> - if (unlikely(get_timespec64(, timeout)))
> + if (usig && copy_from_user(, usig, sizeof(ksig)))
> + return -EFAULT;
> +
> + if (ksig.sigmask) {
> + if (ksig.sigsetsize != sizeof(sigset_t))
> + return -EINVAL;
> + if (copy_from_user(, ksig.sigmask, sizeof(ksigmask)))
>   return -EFAULT;
> + sigdelsetmask(, sigmask(SIGKILL) | sigmask(SIGSTOP));
> + sigprocmask(SIG_SETMASK, , );
> + }
> +
> + ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
> + if (signal_pending(current)) {
> + if (ksig.sigmask) {
> + current->saved_sigmask = sigsaved;
> + set_restore_sigmask();
> + }
> +
> + if (!ret)
> + ret = -ERESTARTNOHAND;
> + } else {
> + if (ksig.sigmask)
> +   

Re: [PATCH bpf] bpf: fix crash due to inode i_op mismatch with clang/llvm

2018-03-19 Thread Linus Torvalds
On Mon, Mar 19, 2018 at 6:50 PM, Linus Torvalds
 wrote:
>
> Add it to everything. If it's an invalid optimization, it shouldn't be on.

IOW, why isn't this just something like

  diff --git a/Makefile b/Makefile
  index d65e2e229017..01abedc2e79f 100644
  --- a/Makefile
  +++ b/Makefile
  @@ -826,6 +826,9 @@ KBUILD_CFLAGS += $(call cc-disable-warning, pointer-sign)
   # disable invalid "can't wrap" optimizations for signed / pointers
   KBUILD_CFLAGS+= $(call cc-option,-fno-strict-overflow)

  +# disable invalid optimization on clang
  +KBUILD_CFLAGS   += $(call cc-option,-fno-merge-all-constants)
  +
   # Make sure -fstack-check isn't enabled (like gentoo apparently did)
   KBUILD_CFLAGS  += $(call cc-option,-fno-stack-check,)

(whitespace-damaged, but you get the gist of it).

We disable some optimizations that are technically _valid_, because
they are too dangerous and a bad idea.

Disabling an optimization that isn't valid EVEN IN THEORY is an
absolute no-brainer, particularly if it has already shown itself to
cause problems.

We have other situations where we generate multiple static structures
and expect them to be unique. I'm not sure any of them would trigger
the clang rules, but the clang rules are obviously complete garbage
anyway, so who knows?

That optimization seems to truly be pure and utter garbage. Clang can
even *see* the address comparison happening in that file.

Some clang person needs to be publicly shamed for enabling this kind
of garbage by default, particularly since they apparently _knew_ it
was invalid.

  Linus


Re: [PATCH bpf] bpf: fix crash due to inode i_op mismatch with clang/llvm

2018-03-19 Thread Linus Torvalds
On Mon, Mar 19, 2018 at 6:17 PM, Daniel Borkmann  wrote:
>
> Reason for this miscompilation is that clang has the more aggressive
> -fmerge-all-constants enabled by default. In fact, clang source code
> has an 'insightful' comment about it in its source code under
> lib/AST/ExprConstant.cpp:
>
>   // Pointers with different bases cannot represent the same object.
>   // (Note that clang defaults to -fmerge-all-constants, which can
>   // lead to inconsistent results for comparisons involving the address
>   // of a constant; this generally doesn't matter in practice.)
>
> gcc on the other hand does not enable -fmerge-all-constants by default
> and *explicitly* states in its option description that using this
> flag results in non-conforming behavior, quote from man gcc:
>
>   Languages like C or C++ require each variable, including multiple
>   instances of the same variable in recursive calls, to have distinct
>   locations, so using this option results in non-conforming behavior.
>
> Given there are users with clang/LLVM out there today that triggered
> this, fix this mess by explicitly adding -fno-merge-all-constants to
> inode.o as CFLAGS via Kbuild system.

Oh, please do *NOT* add it to just that one file.

Add it to everything. If it's an invalid optimization, it shouldn't be on.

And even if it happens to trigger in only that one file, then
disabling it globally is just the safe thing to do.

What is the code generation difference if you just enable it globally?
I would certainly _hope_ that it's not noticeable, but if it's
noticeable that would certainly imply that it's very dangerous
somewhere else too!

Linus


[PATCH v2 2/3] net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b

2018-03-19 Thread Kevin Hao
The Ethernet on mpc8315erdb is broken since commit b6b5e8a69118
("gianfar: Disable EEE autoneg by default"). The reason is that
even though the rtl8211b doesn't support MMD extended register
access, it does return some random values if we try to access the
MMD registers via the indirect method. This makes it seem that
EEE is supported by this phy device. And the subsequent writes to
the MMD registers do cause the phy to malfunction. So use the dummy
stubs for the MMD register access to fix this issue.

Fixes: b6b5e8a69118 ("gianfar: Disable EEE autoneg by default")
Signed-off-by: Kevin Hao 
---
 drivers/net/phy/realtek.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index ee3ca4a2f12b..9f48ecf9c627 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -172,6 +172,8 @@ static struct phy_driver realtek_drvs[] = {
.flags  = PHY_HAS_INTERRUPT,
.ack_interrupt  = _ack_interrupt,
.config_intr= _config_intr,
+   .read_mmd   = _read_mmd_unsupported,
+   .write_mmd  = _write_mmd_unsupported,
}, {
.phy_id = 0x001cc914,
.name   = "RTL8211DN Gigabit Ethernet",
-- 
2.9.3



[PATCH v2 3/3] net: phy: micrel: Use the general dummy stubs for MMD register access

2018-03-19 Thread Kevin Hao
General dummy stubs for MMD register access were introduced.
Use them here for code reuse.

Signed-off-by: Kevin Hao 
---
 drivers/net/phy/micrel.c | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 49be85afbea9..f41b224a9cdb 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -635,25 +635,6 @@ static int ksz8873mll_config_aneg(struct phy_device 
*phydev)
return 0;
 }
 
-/* This routine returns -1 as an indication to the caller that the
- * Micrel ksz9021 10/100/1000 PHY does not support standard IEEE
- * MMD extended PHY registers.
- */
-static int
-ksz9021_rd_mmd_phyreg(struct phy_device *phydev, int devad, u16 regnum)
-{
-   return -1;
-}
-
-/* This routine does nothing since the Micrel ksz9021 does not support
- * standard IEEE MMD extended PHY registers.
- */
-static int
-ksz9021_wr_mmd_phyreg(struct phy_device *phydev, int devad, u16 regnum, u16 
val)
-{
-   return -1;
-}
-
 static int kszphy_get_sset_count(struct phy_device *phydev)
 {
return ARRAY_SIZE(kszphy_hw_stats);
@@ -946,8 +927,8 @@ static struct phy_driver ksphy_driver[] = {
.get_stats  = kszphy_get_stats,
.suspend= genphy_suspend,
.resume = genphy_resume,
-   .read_mmd   = ksz9021_rd_mmd_phyreg,
-   .write_mmd  = ksz9021_wr_mmd_phyreg,
+   .read_mmd   = genphy_read_mmd_unsupported,
+   .write_mmd  = genphy_write_mmd_unsupported,
 }, {
.phy_id = PHY_ID_KSZ9031,
.phy_id_mask= MICREL_PHY_ID_MASK,
-- 
2.9.3



[PATCH v2 1/3] net: phy: Add general dummy stubs for MMD register access

2018-03-19 Thread Kevin Hao
Some phy devices don't support MMD extended register access, but
reading or writing the MMD registers via the indirect method still has
side effects on them. So introduce general dummy stubs for MMD register
access which these devices can use to avoid such side effects.

Fixes: b6b5e8a69118 ("gianfar: Disable EEE autoneg by default")
Signed-off-by: Kevin Hao 
---
 drivers/net/phy/phy_device.c | 17 +
 include/linux/phy.h  |  4 
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index b285323327c4..b070f8fd66fe 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1666,6 +1666,23 @@ int genphy_config_init(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(genphy_config_init);
 
+/* This is used for the phy device which doesn't support the MMD extended
+ * register access, but it does have side effect when we are trying to access
+ * the MMD register via indirect method.
+ */
+int genphy_read_mmd_unsupported(struct phy_device *phdev, int devad, u16 
regnum)
+{
+   return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL(genphy_read_mmd_unsupported);
+
+int genphy_write_mmd_unsupported(struct phy_device *phdev, int devnum,
+u16 regnum, u16 val)
+{
+   return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL(genphy_write_mmd_unsupported);
+
 int genphy_suspend(struct phy_device *phydev)
 {
return phy_set_bits(phydev, MII_BMCR, BMCR_PDOWN);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 68127b002c3d..f0b5870a6d40 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -984,6 +984,10 @@ static inline int genphy_no_soft_reset(struct phy_device 
*phydev)
 {
return 0;
 }
+int genphy_read_mmd_unsupported(struct phy_device *phdev, int devad,
+   u16 regnum);
+int genphy_write_mmd_unsupported(struct phy_device *phdev, int devnum,
+u16 regnum, u16 val);
 
 /* Clause 45 PHY */
 int genphy_c45_restart_aneg(struct phy_device *phydev);
-- 
2.9.3



[PATCH v2 0/3] net: phy: Add general dummy stubs for MMD register access

2018-03-19 Thread Kevin Hao
v2:
As suggested by Andrew:
  - Add general dummy stubs
  - Also use that for the micrel phy

This patch series fix the Ethernet broken on the mpc8315erdb board introduced
by commit b6b5e8a69118 ("gianfar: Disable EEE autoneg by default").

Kevin Hao (3):
  net: phy: Add general dummy stubs for MMD register access
  net: phy: realtek: Use the dummy stubs for MMD register access for
rtl8211b
  net: phy: micrel: Use the general dummy stubs for MMD register access

 drivers/net/phy/micrel.c | 23 ++-
 drivers/net/phy/phy_device.c | 17 +
 drivers/net/phy/realtek.c|  2 ++
 include/linux/phy.h  |  4 
 4 files changed, 25 insertions(+), 21 deletions(-)

-- 
2.9.3



[PATCH bpf] bpf: fix crash due to inode i_op mismatch with clang/llvm

2018-03-19 Thread Daniel Borkmann
Prasad reported that he has seen crashes with netd on Android with
arm64 in the form of (note, the taint is unrelated):

  [ 4134.721483] Unable to handle kernel paging request at virtual address 
80001
  [ 4134.820925] Mem abort info:
  [ 4134.901283]   Exception class = DABT (current EL), IL = 32 bits
  [ 4135.016736]   SET = 0, FnV = 0
  [ 4135.119820]   EA = 0, S1PTW = 0
  [ 4135.201431] Data abort info:
  [ 4135.301388]   ISV = 0, ISS = 0x0021
  [ 4135.359599]   CM = 0, WnR = 0
  [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffe39b946000
  [ 4135.499757] [00080001] *pgd=, *pud=
  [ 4135.660725] Internal error: Oops: 9621 [#1] PREEMPT SMP
  [ 4135.674610] Modules linked in:
  [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S  W   4.14.19+ 
#1
  [ 4135.716188] task: ffe39f4aa380 task.stack: ff801d4e
  [ 4135.731599] PC is at bpf_prog_add+0x20/0x68
  [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
  [ 4135.751788] pc : [] lr : [] pstate: 
60400145
  [ 4135.769062] sp : ff801d4e3ce0
  [...]
  [ 4136.258315] Process netd (pid: 1260, stack limit = 0xff801d4e)
  [ 4136.273746] Call trace:
  [...]
  [ 4136.442494] 3ca0: ff94ab7ad584 60400145 ffe3a01bf8f8 
0006
  [ 4136.460936] 3cc0: 0080 ff94ab844204 ff801d4e3cf0 
ff94ab7ad584
  [ 4136.479241] [] bpf_prog_add+0x20/0x68
  [ 4136.491767] [] bpf_prog_inc+0x20/0x2c
  [ 4136.504536] [] bpf_obj_get_user+0x204/0x22c
  [ 4136.518746] [] SyS_bpf+0x5a8/0x1a88

Android's netd was basically pinning the uid cookie BPF map in BPF
fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it
again, resulting in the above panic. The issue is that the map was
wrongly identified as a prog.

Above kernel was compiled with clang 4.0.3, and it turns out that
clang decided to merge the bpf_prog_iops and bpf_map_iops into a
single memory location, such that the two i_ops could then not be
distinguished anymore.

Reason for this miscompilation is that clang has the more aggressive
-fmerge-all-constants enabled by default. In fact, clang source code
has an 'insightful' comment about it in its source code under
lib/AST/ExprConstant.cpp:

  // Pointers with different bases cannot represent the same object.
  // (Note that clang defaults to -fmerge-all-constants, which can
  // lead to inconsistent results for comparisons involving the address
  // of a constant; this generally doesn't matter in practice.)

gcc on the other hand does not enable -fmerge-all-constants by default
and *explicitly* states in its option description that using this
flag results in non-conforming behavior, quote from man gcc:

  Languages like C or C++ require each variable, including multiple
  instances of the same variable in recursive calls, to have distinct
  locations, so using this option results in non-conforming behavior.

Given there are users with clang/LLVM out there today that triggered
this, fix this mess by explicitly adding -fno-merge-all-constants to
inode.o as CFLAGS via Kbuild system. Also add a BUILD_BUG_ON() to
bail out when the two are the same address. Potentially we might want
to go even further and do this for the whole kernel as next step,
although given 4.16-rc6, it may be more suited to start out with this
in 4.17.

Reported-by: Prasad Sodagudi 
Signed-off-by: Daniel Borkmann 
Tested-by: Prasad Sodagudi 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/Makefile | 7 +++
 kernel/bpf/inode.c  | 1 +
 2 files changed, 8 insertions(+)

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index a713fd2..8950241 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,6 +1,13 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y := core.o
 
+# Mainly clang workaround that sets this by default. This
+# cannot be used here at all, otherwise it horribly breaks
+# inode ops for maps/progs. gcc does not set this flag by
+# default.
+CFLAGS_REMOVE_inode.o := -fmerge-all-constants
+CFLAGS_inode.o:= $(call cc-option,-fno-merge-all-constants)
+
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o 
bpf_lru_list.o lpm_trie.o map_in_map.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 81e2f69..ad95360 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -527,6 +527,7 @@ static int __init bpf_init(void)
if (ret)
sysfs_remove_mount_point(fs_kobj, "bpf");
 
+   BUILD_BUG_ON(_prog_iops == _map_iops);
return ret;
 }
 fs_initcall(bpf_init);
-- 
2.9.5



Re: [PATCH v2 0/2] net: phy: relax error checking when creating sysfs link netdev->phydev

2018-03-19 Thread David Miller
From: Grygorii Strashko 
Date: Fri, 16 Mar 2018 17:08:33 -0500

> Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per
> one netdevice, as result such drivers will produce warning during system
> boot and fail to connect second phy to netdevice when PHYLIB framework
> will try to create sysfs link netdev->phydev for second PHY
> in phy_attach_direct(), because sysfs link with the same name has been
> created already for the first PHY.
> As a result, the second CPSW external port will become unusable.
> This regression was introduced by commits:
> 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for 
> attached_dev/phydev"
> a3995460491d ("net: phy: Relax error checking on sysfs_create_link()"
> 
> Patch 1: exports sysfs_create_link_nowarn() function as preparation for Patch 
> 2.
> Patch 2: relaxes error checking when PHYLIB framework is creating sysfs
> link netdev->phydev in phy_attach_direct(), suppresses warning by using
> sysfs_create_link_nowarn() and adds error message instead, so links creation
> failure is not fatal any more and system can continue working,
> which fixes TI CPSW issue and makes boot logs accessible
> in case of NFS boot, for example.

Series applied and queued up for -stable, thanks.


Re: [PATCH] net: gemini: fix memory leak

2018-03-19 Thread Linus Walleij
On Mon, Mar 19, 2018 at 7:40 AM, Igor Pylypiv  wrote:

> cppcheck report:
> [drivers/net/ethernet/cortina/gemini.c:543]: (error) Memory leak: skb_tab
>
> Signed-off-by: Igor Pylypiv 

Acked-by: Linus Walleij 

Yours,
Linus Walleij


[PATCH bpf-next] bpf: skip unnecessary capability check

2018-03-19 Thread Chenbo Feng
From: Chenbo Feng 

The current check statement in the BPF syscall does a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path triggers unnecessary security hooks for capability checking
and causes false alarms about an unprivileged process trying to get
CAP_SYS_ADMIN access. This can be resolved by simply switching the order
of the statements; CAP_SYS_ADMIN is not required anyway if unprivileged
bpf syscalls are allowed.

Signed-off-by: Chenbo Feng 
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e24aa3241387..43f95d190eea 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1845,7 +1845,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, 
uattr, unsigned int, siz
union bpf_attr attr = {};
int err;
 
-   if (!capable(CAP_SYS_ADMIN) && sysctl_unprivileged_bpf_disabled)
+   if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
return -EPERM;
 
err = check_uarg_tail_zero(uattr, sizeof(attr), size);
-- 
2.16.2.804.g6dcf76e118-goog



Re: [PATCH 06/36] aio: delete iocbs from the active_reqs list in kiocb_cancel

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:13PM -0800, Christoph Hellwig wrote:
> Once we cancel an iocb there is no reason to keep it on the active_reqs
> list, given that the list is only used to look for cancelation candidates.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jeff Moyer 

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/aio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 2d40cf5dd4ec..0b6394b4e528 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -561,6 +561,8 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
>  {
>   kiocb_cancel_fn *cancel = kiocb->ki_cancel;
>  
> + list_del_init(>ki_list);
> +
>   if (!cancel)
>   return -EINVAL;
>   kiocb->ki_cancel = NULL;
> @@ -607,8 +609,6 @@ static void free_ioctx_users(struct percpu_ref *ref)
>   while (!list_empty(>active_reqs)) {
>   req = list_first_entry(>active_reqs,
>  struct aio_kiocb, ki_list);
> -
> - list_del_init(>ki_list);
>   kiocb_cancel(req);
>   }
>  
> -- 
> 2.14.2
> 


Re: [PATCH 05/36] aio: simplify cancellation

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:12PM -0800, Christoph Hellwig wrote:
> With the current aio code there is no need for the magic KIOCB_CANCELLED
> value, as a cancelation just kicks the driver to queue the completion
> ASAP, with all actual completion handling done in another thread. Given
> that both the completion path and cancelation take the context lock there
> is no need for magic cmpxchg loops either.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jeff Moyer 
> ---
>  fs/aio.c | 37 +
>  1 file changed, 9 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index c32c315f05b5..2d40cf5dd4ec 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -156,19 +156,6 @@ struct kioctx {
>   unsignedid;
>  };
>  
> -/*
> - * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been 
> either
> - * cancelled or completed (this makes a certain amount of sense because
> - * successful cancellation - io_cancel() - does deliver the completion to
> - * userspace).
> - *
> - * And since most things don't implement kiocb cancellation and we'd really 
> like
> - * kiocb completion to be lockless when possible, we use ki_cancel to
> - * synchronize cancellation and completion - we only set it to 
> KIOCB_CANCELLED
> - * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
> - */
> -#define KIOCB_CANCELLED  ((void *) (~0ULL))
> -
>  struct aio_kiocb {
>   union {
>   struct kiocbrw;
> @@ -565,24 +552,18 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>  }
>  EXPORT_SYMBOL(kiocb_set_cancel_fn);
>  
> +/*
> + * Only cancel if there ws a ki_cancel function to start with, and we
> + * are the one how managed to clear it (to protect against simulatinious

"...are the one who managed to clear it (to protect against simultaneous
cancel calls)." ?

Really only complaining because who/how are both English words...

Reviewed-by: Darrick J. Wong 

--D

> + * cancel calls).
> + */
>  static int kiocb_cancel(struct aio_kiocb *kiocb)
>  {
> - kiocb_cancel_fn *old, *cancel;
> -
> - /*
> -  * Don't want to set kiocb->ki_cancel = KIOCB_CANCELLED unless it
> -  * actually has a cancel function, hence the cmpxchg()
> -  */
> -
> - cancel = READ_ONCE(kiocb->ki_cancel);
> - do {
> - if (!cancel || cancel == KIOCB_CANCELLED)
> - return -EINVAL;
> -
> - old = cancel;
> - cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
> - } while (cancel != old);
> + kiocb_cancel_fn *cancel = kiocb->ki_cancel;
>  
> + if (!cancel)
> + return -EINVAL;
> + kiocb->ki_cancel = NULL;
> + return cancel(&kiocb->rw);
>  }
>  
> -- 
> 2.14.2
> 


Re: [PATCH 04/36] aio: sanitize ki_list handling

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:11PM -0800, Christoph Hellwig wrote:
> Instead of handcoded non-null checks always initialize ki_list to an
> empty list and use list_empty / list_empty_careful on it.  While we're
> at it also error out on a double call to kiocb_set_cancel_fn instead
> of ignoring it.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jeff Moyer 

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/aio.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 6295fc00f104..c32c315f05b5 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -555,13 +555,12 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>   struct kioctx *ctx = req->ki_ctx;
>   unsigned long flags;
>  
> - spin_lock_irqsave(&ctx->ctx_lock, flags);
> -
> - if (!req->ki_list.next)
> - list_add(&req->ki_list, &ctx->active_reqs);
> + if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
> + return;
>  
> + spin_lock_irqsave(&ctx->ctx_lock, flags);
> + list_add_tail(&req->ki_list, &ctx->active_reqs);
>   req->ki_cancel = cancel;
> -
>   spin_unlock_irqrestore(&ctx->ctx_lock, flags);
>  }
>  EXPORT_SYMBOL(kiocb_set_cancel_fn);
> @@ -1034,7 +1033,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
>   goto out_put;
>  
>   percpu_ref_get(&ctx->reqs);
> -
> + INIT_LIST_HEAD(&req->ki_list);
>   req->ki_ctx = ctx;
>   return req;
>  out_put:
> @@ -1080,7 +1079,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
>   unsigned tail, pos, head;
>   unsigned long   flags;
>  
> - if (iocb->ki_list.next) {
> + if (!list_empty_careful(&iocb->ki_list)) {
>   unsigned long flags;
>  
>   spin_lock_irqsave(&ctx->ctx_lock, flags);
> -- 
> 2.14.2
> 


Re: [PATCH 03/36] aio: refactor read/write iocb setup

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:10PM -0800, Christoph Hellwig wrote:
> Don't reference the kiocb structure from the common aio code, and move
> any use of it into helper specific to the read/write path.  This is in
> preparation for aio_poll support that wants to use the space for different
> fields.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jeff Moyer 

Looks straightforward enough to me,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/aio.c | 171 
> ---
>  1 file changed, 97 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 41fc8ce6bc7f..6295fc00f104 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -170,7 +170,9 @@ struct kioctx {
>  #define KIOCB_CANCELLED  ((void *) (~0ULL))
>  
>  struct aio_kiocb {
> - struct kiocbcommon;
> + union {
> + struct kiocbrw;
> + };
>  
>   struct kioctx   *ki_ctx;
>   kiocb_cancel_fn *ki_cancel;
> @@ -549,7 +551,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
>  
>  void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
>  {
> - struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, common);
> + struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
>   struct kioctx *ctx = req->ki_ctx;
>   unsigned long flags;
>  
> @@ -582,7 +584,7 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
> - cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
>   } while (cancel != old);
>  
> - return cancel(&kiocb->common);
> + return cancel(&kiocb->rw);
>  }
>  
>  static void free_ioctx(struct work_struct *work)
> @@ -1040,15 +1042,6 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
>   return NULL;
>  }
>  
> -static void kiocb_free(struct aio_kiocb *req)
> -{
> - if (req->common.ki_filp)
> - fput(req->common.ki_filp);
> - if (req->ki_eventfd != NULL)
> - eventfd_ctx_put(req->ki_eventfd);
> - kmem_cache_free(kiocb_cachep, req);
> -}
> -
>  static struct kioctx *lookup_ioctx(unsigned long ctx_id)
>  {
>   struct aio_ring __user *ring  = (void __user *)ctx_id;
> @@ -1079,29 +1072,14 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
>  /* aio_complete
>   *   Called when the io request on the given iocb is complete.
>   */
> -static void aio_complete(struct kiocb *kiocb, long res, long res2)
> +static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
>  {
> - struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, common);
>   struct kioctx   *ctx = iocb->ki_ctx;
>   struct aio_ring *ring;
>   struct io_event *ev_page, *event;
>   unsigned tail, pos, head;
>   unsigned long   flags;
>  
> - BUG_ON(is_sync_kiocb(kiocb));
> -
> - if (kiocb->ki_flags & IOCB_WRITE) {
> - struct file *file = kiocb->ki_filp;
> -
> - /*
> -  * Tell lockdep we inherited freeze protection from submission
> -  * thread.
> -  */
> - if (S_ISREG(file_inode(file)->i_mode))
> - __sb_writers_acquired(file_inode(file)->i_sb, 
> SB_FREEZE_WRITE);
> - file_end_write(file);
> - }
> -
>   if (iocb->ki_list.next) {
>   unsigned long flags;
>  
> @@ -1163,11 +1141,12 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
>* eventfd. The eventfd_signal() function is safe to be called
>* from IRQ context.
>*/
> - if (iocb->ki_eventfd != NULL)
> + if (iocb->ki_eventfd) {
>   eventfd_signal(iocb->ki_eventfd, 1);
> + eventfd_ctx_put(iocb->ki_eventfd);
> + }
>  
> - /* everything turned out well, dispose of the aiocb. */
> - kiocb_free(iocb);
> + kmem_cache_free(kiocb_cachep, iocb);
>  
>   /*
>* We have to order our ring_info tail store above and test
> @@ -1430,6 +1409,47 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
>   return -EINVAL;
>  }
>  
> +static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
> +{
> + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
> +
> + WARN_ON_ONCE(is_sync_kiocb(kiocb));
> +
> + if (kiocb->ki_flags & IOCB_WRITE) {
> + struct inode *inode = file_inode(kiocb->ki_filp);
> +
> + /*
> +  * Tell lockdep we inherited freeze protection from submission
> +  * thread.
> +  */
> + if (S_ISREG(inode->i_mode))
> + __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
> + file_end_write(kiocb->ki_filp);
> + }
> +
> + fput(kiocb->ki_filp);
> + aio_complete(iocb, res, res2);
> +}
> +
> +static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
> +{
> + int ret;
> +
> + 

Re: [PATCH 02/36] aio: remove an outdated comment in aio_complete

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:09PM -0800, Christoph Hellwig wrote:
> These days we don't treat sync iocbs special in the aio completion code as
> they never use it.  Remove the old comment, and move the BUG_ON for a sync
> iocb to the top of the function.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jeff Moyer 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/aio.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 03d59593912d..41fc8ce6bc7f 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1088,6 +1088,8 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
>   unsigned tail, pos, head;
>   unsigned long   flags;
>  
> + BUG_ON(is_sync_kiocb(kiocb));
> +
>   if (kiocb->ki_flags & IOCB_WRITE) {
>   struct file *file = kiocb->ki_filp;
>  
> @@ -1100,15 +1102,6 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
>   file_end_write(file);
>   }
>  
> - /*
> -  * Special case handling for sync iocbs:
> -  *  - events go directly into the iocb for fast handling
> -  *  - the sync task with the iocb in its stack holds the single iocb
> -  *ref, no other paths have a way to get another ref
> -  *  - the sync task helpfully left a reference to itself in the iocb
> -  */
> - BUG_ON(is_sync_kiocb(kiocb));
> -
>   if (iocb->ki_list.next) {
>   unsigned long flags;
>  
> -- 
> 2.14.2
> 


Re: [PATCH 01/36] aio: don't print the page size at boot time

2018-03-19 Thread Darrick J. Wong
On Mon, Mar 05, 2018 at 01:27:08PM -0800, Christoph Hellwig wrote:
> The page size is in no way related to the aio code, and printing it in
> the (debug) dmesg at every boot serves no purpose.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jeff Moyer 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/aio.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index a062d75109cb..03d59593912d 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -264,9 +264,6 @@ static int __init aio_setup(void)
>  
>   kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
>   kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
> -
> - pr_debug("sizeof(struct page) = %zu\n", sizeof(struct page));
> -
>   return 0;
>  }
>  __initcall(aio_setup);
> -- 
> 2.14.2
> 


Re: interdependencies with cxgb4 and iw_cxgb4

2018-03-19 Thread David Miller
From: Steve Wise 
Date: Mon, 19 Mar 2018 14:50:57 -0500

> Let me ask a dumb question:  Why cannot one of the maintainers pull the
> commit from the other maintainer's git repo directly?  IE why have this
> third trusted/signed git repo that has to be on k.o, from which both
> maintainers pull?  If one of you can pull it in via a patch series,
> like you do for all other patches, and then notify the other
> maintainer to pull it from the first maintainers' repo if the series
> meets the requirements that it needs to be in both maintainers'
> repositories?  This avoids adding more staging git repos on k.o.  But
> probably I'm missing something...

Tree A may not want all of tree B's changes, and vice versa.


Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-19 Thread Linus Torvalds
On Mon, Mar 19, 2018 at 2:43 AM, David Laight  wrote:
>
> Is it necessary to have the full checks for old versions of gcc?
>
> Even -Wvla could be predicated on very recent gcc - since we aren't
> worried about whether gcc decides to generate a vla, but whether
> the source requests one.

You are correct. We could just ignore the issue with old gcc versions,
and disable -Wvla rather than worry about it.

  Linus


[PATCH bpf-next] bpf, doc: add description wrt native/bpf clang target and pointer size

2018-03-19 Thread Daniel Borkmann
As this recently came up on netdev [0], lets add it to the BPF devel doc.

  [0] https://www.spinics.net/lists/netdev/msg489612.html

Signed-off-by: Daniel Borkmann 
---
 Documentation/bpf/bpf_devel_QA.txt | 12 
 1 file changed, 12 insertions(+)

diff --git a/Documentation/bpf/bpf_devel_QA.txt 
b/Documentation/bpf/bpf_devel_QA.txt
index 84cbb30..1a0b704 100644
--- a/Documentation/bpf/bpf_devel_QA.txt
+++ b/Documentation/bpf/bpf_devel_QA.txt
@@ -539,6 +539,18 @@ A: Although LLVM IR generation and optimization try to stay architecture
The clang option "-fno-jump-tables" can be used to disable
switch table generation.
 
+ - For clang -target bpf, it is guaranteed that pointer or long /
+   unsigned long types will always have a width of 64 bit, no matter
+   whether underlying clang binary or default target (or kernel) is
+   32 bit. However, when native clang target is used, then it will
+   compile these types based on the underlying architecture's conventions,
+   meaning in case of 32 bit architecture, pointer or long / unsigned
+   long types e.g. in BPF context structure will have width of 32 bit
+   while the BPF LLVM back end still operates in 64 bit. The native
+   target is mostly needed in tracing for the case of walking pt_regs
+   or other kernel structures where CPU's register width matters.
+   Otherwise, clang -target bpf is generally recommended.
+
You should use default target when:
 
  - Your program includes a header file, e.g., ptrace.h, which eventually
-- 
2.9.5



Re: [PATCH net-next v2 2/2] dt: bindings: add new dt entries for brcmfmac

2018-03-19 Thread Arend van Spriel

+ Uffe

On 3/19/2018 6:55 PM, Florian Fainelli wrote:

On 03/19/2018 07:10 AM, Alexey Roslyakov wrote:

Hi Arend,
I appreciate your response. In my opinion, it has nothing to do with
SDIO host, because it defines "quirks" in the driver itself.


It is not clear to me from your patch series whether the problem is that:

- the SDIO device has a specific alignment requirements, which would be
either a SDIO device driver limitation/issue or maybe the underlying
hardware device/firmware requiring that

- the SDIO host controller used is not capable of coping nicely with
these said limitations

It seems to me like what you are doing here is a) applicable to possibly
more SDIO devices and host combinations, and b) should likely be done at
the layer between the host and device, such that it is available to more
combinations.


Indeed. That was my thought exactly and I can not imagine Uffe would 
push back on that reasoning.



If I get it right, you mean something like this:

mmc3: mmc@1c12000 {
...
 broken-sg-support;
 sd-head-align = 4;
 sd-sgentry-align = 512;

 brcmf: wifi@1 {
 ...
 };
};

Where dt: bindings documentation for these entries should reside?
In generic MMC bindings? Well, this is the very special case and
mmc-linux maintainer is unlikely to accept these changes.
Also, extra kernel code modification might be required. It could make
quite trivial change much more complex.


If the MMC maintainers are not copied on this patch series, it will
likely be hard for them to identify this patch series and chime in...


The main question is whether this is indeed a "very special case" as 
Alexey claims it to be or that it is likely to be applicable to other 
device and host combinations as you are suggesting.


If these properties are imposed by the host or host controller it would 
make sense to have these in the mmc bindings.





Also I am not sure if the broken-sg-support is still needed. We added that for 
omap_hsmmc, but that has since changed to scatter-gather emulation so it might 
not be needed anymore.


I've experienced the problem with rk3288 (dw-mmc host) and sdio
settings like above solved it.
Frankly, I haven't investigated any deeper which one of the settings
helped in my case yet...
I will try to get rid of broken-sg-support first and let you know if
it does make any difference.


Are you using some chromebook. I have some lying around here so I could 
also look into it. What broadcom chipset do you have?


Regards,
Arend


All the best,
   Alex.

On 19 March 2018 at 16:31, Arend van Spriel
 wrote:

On 3/19/2018 2:40 AM, Alexey Roslyakov wrote:


In case the host has higher alignment requirements for SG items, allow
setting device-specific aligns for scatterlist items.

Signed-off-by: Alexey Roslyakov 
---
   Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt | 5 +
   1 file changed, 5 insertions(+)

diff --git
a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
index 86602f264dce..187b8c1b52a7 100644
--- a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
+++ b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
@@ -17,6 +17,11 @@ Optional properties:
 When not specified the device will use in-band SDIO interrupts.
- interrupt-names : name of the out-of-band interrupt, which must be
set
 to "host-wake".
+ - brcm,broken-sg-support : boolean flag to indicate that the SDIO host
+   controller has higher align requirement than 32 bytes for each
+   scatterlist item.
+ - brcm,sd-head-align : alignment requirement for start of data buffer.
+ - brcm,sd-sgentry-align : length alignment requirement for each sg entry.



Hi Alexey,

Thanks for the patch. However, the problem with these is that they are
characterizing the host controller and not the wireless device. So from
device tree perspective , which is to describe the hardware, these
properties should be SDIO host controller properties. Also I am not sure if
the broken-sg-support is still needed. We added that for omap_hsmmc, but
that has since changed to scatter-gather emulation so it might not be needed
anymore.

Regards,
Arend











Re: [PATCH 11/16] treewide: simplify Kconfig dependencies for removed archs

2018-03-19 Thread Alexandre Belloni
On 14/03/2018 at 15:43:46 +0100, Arnd Bergmann wrote:
> A lot of Kconfig symbols have architecture specific dependencies.
> In those cases that depend on architectures we have already removed,
> they can be omitted.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  block/bounce.c   |  2 +-
>  drivers/ide/Kconfig  |  2 +-
>  drivers/ide/ide-generic.c| 12 +---
>  drivers/input/joystick/analog.c  |  2 +-
>  drivers/isdn/hisax/Kconfig   | 10 +-
>  drivers/net/ethernet/davicom/Kconfig |  2 +-
>  drivers/net/ethernet/smsc/Kconfig|  6 +++---
>  drivers/net/wireless/cisco/Kconfig   |  2 +-
>  drivers/pwm/Kconfig  |  2 +-
>  drivers/rtc/Kconfig  |  2 +-

Acked-by: Alexandre Belloni 

>  drivers/spi/Kconfig  |  4 ++--
>  drivers/usb/musb/Kconfig |  2 +-
>  drivers/video/console/Kconfig|  3 +--
>  drivers/watchdog/Kconfig |  6 --
>  drivers/watchdog/Makefile|  6 --
>  fs/Kconfig.binfmt|  5 ++---
>  fs/minix/Kconfig |  2 +-
>  include/linux/ide.h  |  7 +--
>  init/Kconfig |  5 ++---
>  lib/Kconfig.debug| 13 +
>  lib/test_user_copy.c |  2 --
>  mm/Kconfig   |  7 ---
>  mm/percpu.c  |  4 
>  23 files changed, 31 insertions(+), 77 deletions(-)
> 
> diff --git a/block/bounce.c b/block/bounce.c
> index 6a3e68292273..dd0b93f2a871 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -31,7 +31,7 @@
>  static struct bio_set *bounce_bio_set, *bounce_bio_split;
>  static mempool_t *page_pool, *isa_page_pool;
>  
> -#if defined(CONFIG_HIGHMEM) || defined(CONFIG_NEED_BOUNCE_POOL)
> +#if defined(CONFIG_HIGHMEM)
>  static __init int init_emergency_pool(void)
>  {
>  #if defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTPLUG)
> diff --git a/drivers/ide/Kconfig b/drivers/ide/Kconfig
> index cf1fb3fb5d26..901b8833847f 100644
> --- a/drivers/ide/Kconfig
> +++ b/drivers/ide/Kconfig
> @@ -200,7 +200,7 @@ comment "IDE chipset support/bugfixes"
>  
>  config IDE_GENERIC
>   tristate "generic/default IDE chipset support"
> - depends on ALPHA || X86 || IA64 || M32R || MIPS || ARCH_RPC
> + depends on ALPHA || X86 || IA64 || MIPS || ARCH_RPC
>   default ARM && ARCH_RPC
>   help
> This is the generic IDE driver.  This driver attaches to the
> diff --git a/drivers/ide/ide-generic.c b/drivers/ide/ide-generic.c
> index 54d7c4685d23..80c0d69b83ac 100644
> --- a/drivers/ide/ide-generic.c
> +++ b/drivers/ide/ide-generic.c
> @@ -13,13 +13,10 @@
>  #include 
>  #include 
>  
> -/* FIXME: convert arm and m32r to use ide_platform host driver */
> +/* FIXME: convert arm to use ide_platform host driver */
>  #ifdef CONFIG_ARM
>  #include 
>  #endif
> -#ifdef CONFIG_M32R
> -#include 
> -#endif
>  
>  #define DRV_NAME "ide_generic"
>  
> @@ -35,13 +32,6 @@ static const struct ide_port_info ide_generic_port_info = {
>  #ifdef CONFIG_ARM
>  static const u16 legacy_bases[] = { 0x1f0 };
>  static const int legacy_irqs[]  = { IRQ_HARDDISK };
> -#elif defined(CONFIG_PLAT_M32700UT) || defined(CONFIG_PLAT_MAPPI2) || \
> -  defined(CONFIG_PLAT_OPSPUT)
> -static const u16 legacy_bases[] = { 0x1f0 };
> -static const int legacy_irqs[]  = { PLD_IRQ_CFIREQ };
> -#elif defined(CONFIG_PLAT_MAPPI3)
> -static const u16 legacy_bases[] = { 0x1f0, 0x170 };
> -static const int legacy_irqs[]  = { PLD_IRQ_CFIREQ, PLD_IRQ_IDEIREQ };
>  #elif defined(CONFIG_ALPHA)
>  static const u16 legacy_bases[] = { 0x1f0, 0x170, 0x1e8, 0x168 };
>  static const int legacy_irqs[]  = { 14, 15, 11, 10 };
> diff --git a/drivers/input/joystick/analog.c b/drivers/input/joystick/analog.c
> index be1b4921f22a..eefac7978f93 100644
> --- a/drivers/input/joystick/analog.c
> +++ b/drivers/input/joystick/analog.c
> @@ -163,7 +163,7 @@ static unsigned int get_time_pit(void)
>  #define GET_TIME(x)  do { x = (unsigned int)rdtsc(); } while (0)
>  #define DELTA(x,y)   ((y)-(x))
>  #define TIME_NAME"TSC"
> -#elif defined(__alpha__) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_RISCV) || defined(CONFIG_TILE)
> +#elif defined(__alpha__) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_RISCV)
>  #define GET_TIME(x)  do { x = get_cycles(); } while (0)
>  #define DELTA(x,y)   ((y)-(x))
>  #define TIME_NAME"get_cycles"
> diff --git a/drivers/isdn/hisax/Kconfig b/drivers/isdn/hisax/Kconfig
> index eb83d94ab4fe..38cfc8baae19 100644
> --- a/drivers/isdn/hisax/Kconfig
> +++ b/drivers/isdn/hisax/Kconfig
> @@ -109,7 +109,7 @@ config HISAX_16_3
>  
>  config HISAX_TELESPCI
>   bool "Teles PCI"
> - depends on PCI && (BROKEN || !(SPARC || PPC || PARISC || M68K || (MIPS 
> && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && 

Re: NULL pointer dereferences with 4.14.27

2018-03-19 Thread Holger Hoffstätte

(CC: davem, soheil & gregkh)

On 03/17/18 20:12, Holger Hoffstätte wrote:
> On 03/17/18 19:41, Carlos Carvalho wrote:
>> I've put 4.14.27 this morning in this machine and in about 2h it started
>> showing null dereferences identical to the following one. There were several 
>> of
>> them, with about 1/2h of interval. Strangely it continued to work and I saw 
>> no
>> other anomalies. I've just reverted to 4.14.26.
>>
>> It only happened in this machine, which has a net traffic of several Gb/s and
>> thousands of simultaneous connections.
>>
>> Mar 17 13:29:21 sagres kernel: : BUG: unable to handle kernel NULL pointer 
>> dereference at 0038
>> Mar 17 13:29:21 sagres kernel: : IP: tcp_push+0x4e/0xe7
>> Mar 17 13:29:21 sagres kernel: : PGD 0 P4D 0 
>> Mar 17 13:29:21 sagres kernel: : Oops: 0002 [#1] SMP PTI
>> Mar 17 13:29:21 sagres kernel: : CPU: 55 PID: 2658 Comm: apache2 Not tainted 
>> 4.14.27 #4
(snip)
> 
> Fixed by: https://www.spinics.net/lists/netdev/msg489445.html
> 
> -h
> 

This patch is in the netdev patchwork at 
https://patchwork.ozlabs.org/patch/886324/
but has been marked as "not applicable" without further queued/rejected comment
from Dave, so I believe it became a victim of email lossage.
As the patch says it doesn't apply to anything older than 4.14, but it has been
tested & reported by several people as fixing the problem, and indeed works
fine. Since GregKH only accepts net patches from Dave I wanted to make sure
it got queued up for 4.14.

Thanks,
Holger


[PATCH AUTOSEL for 4.9 005/281] x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()

2018-03-19 Thread Sasha Levin
From: Josh Poimboeuf 

[ Upstream commit 42fc6c6cb1662ba2fa727dd01c9473c63be4e3b6 ]

Andrey Konovalov reported the following warning while fuzzing the kernel
with syzkaller:

  WARNING: kernel stack regs at 8800686869f8 in a.out:4933 has bad 'bp' value c3fc855a10167ec0

The unwinder dump revealed that RBP had a bad value when an interrupt
occurred in csum_partial_copy_generic().

That function saves RBP on the stack and then overwrites it, using it as
a scratch register.  That's problematic because it breaks stack traces
if an interrupt occurs in the middle of the function.

Replace the usage of RBP with another callee-saved register (R15) so
stack traces are no longer affected.

Reported-by: Andrey Konovalov 
Tested-by: Andrey Konovalov 
Signed-off-by: Josh Poimboeuf 
Cc: Cong Wang 
Cc: David S . Miller 
Cc: Dmitry Vyukov 
Cc: Eric Dumazet 
Cc: Kostya Serebryany 
Cc: Linus Torvalds 
Cc: Marcelo Ricardo Leitner 
Cc: Neil Horman 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vlad Yasevich 
Cc: linux-s...@vger.kernel.org
Cc: netdev 
Cc: syzkaller 
Link: 
http://lkml.kernel.org/r/4b03a961efda5ec9bfe46b7b9c9ad72d1efad343.1493909486.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 arch/x86/lib/csum-copy_64.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 7e48807b2fa1..45a53dfe1859 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -55,7 +55,7 @@ ENTRY(csum_partial_copy_generic)
movq  %r12, 3*8(%rsp)
movq  %r14, 4*8(%rsp)
movq  %r13, 5*8(%rsp)
-   movq  %rbp, 6*8(%rsp)
+   movq  %r15, 6*8(%rsp)
 
movq  %r8, (%rsp)
movq  %r9, 1*8(%rsp)
@@ -74,7 +74,7 @@ ENTRY(csum_partial_copy_generic)
/* main loop. clear in 64 byte blocks */
/* r9: zero, r8: temp2, rbx: temp1, rax: sum, rcx: saved length */
/* r11: temp3, rdx: temp4, r12 loopcnt */
-   /* r10: temp5, rbp: temp6, r14 temp7, r13 temp8 */
+   /* r10: temp5, r15: temp6, r14 temp7, r13 temp8 */
.p2align 4
 .Lloop:
source
@@ -89,7 +89,7 @@ ENTRY(csum_partial_copy_generic)
source
movq  32(%rdi), %r10
source
-   movq  40(%rdi), %rbp
+   movq  40(%rdi), %r15
source
movq  48(%rdi), %r14
source
@@ -103,7 +103,7 @@ ENTRY(csum_partial_copy_generic)
adcq  %r11, %rax
adcq  %rdx, %rax
adcq  %r10, %rax
-   adcq  %rbp, %rax
+   adcq  %r15, %rax
adcq  %r14, %rax
adcq  %r13, %rax
 
@@ -121,7 +121,7 @@ ENTRY(csum_partial_copy_generic)
dest
movq %r10, 32(%rsi)
dest
-   movq %rbp, 40(%rsi)
+   movq %r15, 40(%rsi)
dest
movq %r14, 48(%rsi)
dest
@@ -203,7 +203,7 @@ ENTRY(csum_partial_copy_generic)
movq 3*8(%rsp), %r12
movq 4*8(%rsp), %r14
movq 5*8(%rsp), %r13
-   movq 6*8(%rsp), %rbp
+   movq 6*8(%rsp), %r15
addq $7*8, %rsp
ret
 
-- 
2.14.1


Re: recursive static routes

2018-03-19 Thread Saku Ytti
Hey,

> you want per-packet overhead instead of deferring the overhead event
> based updates? network events tend to be much less frequent than
> sending/forwarding packets

Depending on performance cost and complexity cost of options.

-- 
  ++ytti


[PATCH net-next 1/2] net: dsa: mv88e6xxx: Use the DT IRQ trigger mode

2018-03-19 Thread Andrew Lunn
By calling request_threaded_irq() with the flag IRQF_TRIGGER_FALLING
we override the trigger mode provided in device tree. And the
interrupt is actually active low, which is what all the current device
tree descriptions use.

Suggested-by: Uwe Kleine-König 
Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index fe46b40195fa..84e6febaf881 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -425,7 +425,7 @@ static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 
err = request_threaded_irq(chip->irq, NULL,
   mv88e6xxx_g1_irq_thread_fn,
-  IRQF_ONESHOT | IRQF_TRIGGER_FALLING,
+  IRQF_ONESHOT,
   dev_name(chip->dev), chip);
if (err)
mv88e6xxx_g1_irq_free_common(chip);
-- 
2.16.2



[PATCH net-next 2/2] net: dsa: mv88e6xxx: Call the common IRQ free code

2018-03-19 Thread Andrew Lunn
When free'ing the polled IRQs, call the common irq free code.
Otherwise the interrupts are left registered, and when we come to load
the driver a second time, we get an Oops.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 84e6febaf881..85de118c4838 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -467,6 +467,8 @@ static int mv88e6xxx_irq_poll_setup(struct mv88e6xxx_chip *chip)
 
 static void mv88e6xxx_irq_poll_free(struct mv88e6xxx_chip *chip)
 {
+   mv88e6xxx_g1_irq_free_common(chip);
+
   kthread_cancel_delayed_work_sync(&chip->irq_poll_work);
kthread_destroy_worker(chip->kworker);
 }
-- 
2.16.2



[PATCH net-next 0/2] Fixes to allow mv88e6xxx module to be reloaded

2018-03-19 Thread Andrew Lunn
As reported by Uwe Kleine-König, the interrupt trigger is first
configured by DT and then reconfigured to edge. This results in a
failure on EPROBE_DEFER, or if the module is unloaded and reloaded.

A second crash happens on module reload due to a missing call to the
common IRQ free code when using polled interrupts.

With these fixes in place, it becomes possible to load and unload the
kernel modules a few times without it crashing.

Andrew Lunn (2):
  net: dsa: mv88e6xxx: Use the DT IRQ trigger mode
  net: dsa: mv88e6xxx: Call the common IRQ free code

 drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.16.2



[PATCH v3 iproute2 1/1] tc: fix conversion types when printing actions unsigned values

2018-03-19 Thread Roman Mashak
v3:
   fixed conversion in connmark missed out in first version
v2:
   fixed coding style

Signed-off-by: Roman Mashak 
---
 tc/m_action.c | 2 +-
 tc/m_connmark.c   | 2 +-
 tc/m_gact.c   | 2 +-
 tc/m_ife.c| 2 +-
 tc/m_pedit.c  | 2 +-
 tc/m_sample.c | 6 +++---
 tc/m_tunnel_key.c | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 148f1372d414..244c1ec00f28 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -408,7 +408,7 @@ int print_action(const struct sockaddr_nl *who,
if (tb[TCA_ROOT_COUNT])
tot_acts = RTA_DATA(tb[TCA_ROOT_COUNT]);
 
-   fprintf(fp, "total acts %d\n", tot_acts ? *tot_acts:0);
+   fprintf(fp, "total acts %u\n", tot_acts ? *tot_acts : 0);
if (tb[TCA_ACT_TAB] == NULL) {
if (n->nlmsg_type != RTM_GETACTION)
fprintf(stderr, "print_action: NULL kind\n");
diff --git a/tc/m_connmark.c b/tc/m_connmark.c
index 37d718541549..7c4ba7ae5301 100644
--- a/tc/m_connmark.c
+++ b/tc/m_connmark.c
@@ -121,7 +121,7 @@ static int print_connmark(struct action_util *au, FILE *f, struct rtattr *arg)
 
ci = RTA_DATA(tb[TCA_CONNMARK_PARMS]);
 
-   fprintf(f, " connmark zone %d\n", ci->zone);
+   fprintf(f, " connmark zone %u\n", ci->zone);
fprintf(f, "\t index %u ref %d bind %d", ci->index,
ci->refcnt, ci->bindcnt);
 
diff --git a/tc/m_gact.c b/tc/m_gact.c
index 16c4413f4217..52022415db48 100644
--- a/tc/m_gact.c
+++ b/tc/m_gact.c
@@ -194,7 +194,7 @@ print_gact(struct action_util *au, FILE *f, struct rtattr *arg)
print_string(PRINT_ANY, "random_type", "\n\t random type %s",
 prob_n2a(pp->ptype));
print_action_control(f, " ", pp->paction, " ");
-   print_int(PRINT_ANY, "val", "val %d", pp->pval);
+   print_int(PRINT_ANY, "val", "val %u", pp->pval);
close_json_object();
 #endif
print_uint(PRINT_ANY, "index", "\n\t index %u", p->index);
diff --git a/tc/m_ife.c b/tc/m_ife.c
index 205efc9f1d9a..e1dbd3a79649 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -280,7 +280,7 @@ static int print_ife(struct action_util *au, FILE *f, struct rtattr *arg)
if (len) {
mtcindex = rta_getattr_u16(metalist[IFE_META_TCINDEX]);
-   fprintf(f, "use tcindex %d ", mtcindex);
+   fprintf(f, "use tcindex %u ", mtcindex);
} else
fprintf(f, "allow tcindex ");
}
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 26549eeea899..151dfe1a230a 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -817,7 +817,7 @@ int print_pedit(struct action_util *au, FILE *f, struct rtattr *arg)
(unsigned int)ntohl(key->mask));
}
} else {
-   fprintf(f, "\npedit %x keys %d is not LEGIT", sel->index,
+   fprintf(f, "\npedit %x keys %u is not LEGIT", sel->index,
sel->nkeys);
}
 
diff --git a/tc/m_sample.c b/tc/m_sample.c
index 01763cb4c356..d42a6a327965 100644
--- a/tc/m_sample.c
+++ b/tc/m_sample.c
@@ -155,17 +155,17 @@ static int print_sample(struct action_util *au, FILE *f, struct rtattr *arg)
}
p = RTA_DATA(tb[TCA_SAMPLE_PARMS]);
 
-   fprintf(f, "sample rate 1/%d group %d",
+   fprintf(f, "sample rate 1/%u group %u",
rta_getattr_u32(tb[TCA_SAMPLE_RATE]),
rta_getattr_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]));
 
if (tb[TCA_SAMPLE_TRUNC_SIZE])
-   fprintf(f, " trunc_size %d",
+   fprintf(f, " trunc_size %u",
rta_getattr_u32(tb[TCA_SAMPLE_TRUNC_SIZE]));
 
print_action_control(f, " ", p->action, "");
 
-   fprintf(f, "\n\tindex %d ref %d bind %d", p->index, p->refcnt,
+   fprintf(f, "\n\tindex %u ref %d bind %d", p->index, p->refcnt,
p->bindcnt);
 
if (show_stats) {
diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
index 1cdd03560c35..dd8f8e8c635b 100644
--- a/tc/m_tunnel_key.c
+++ b/tc/m_tunnel_key.c
@@ -292,7 +292,7 @@ static int print_tunnel_key(struct action_util *au, FILE 
*f, struct rtattr *arg)
}
print_action_control(f, " ", parm->action, "");
 
-   fprintf(f, "\n\tindex %d ref %d bind %d", parm->index, parm->refcnt,
+   fprintf(f, "\n\tindex %u ref %d bind %d", parm->index, parm->refcnt,
parm->bindcnt);
 
if (show_stats) {
-- 
2.7.4



Re: [PATCH v2 iproute2 1/1] tc: fix conversion types when printing actions unsigned values

2018-03-19 Thread Roman Mashak
Please disregard this patch, I will send v3.

On Mon, Mar 19, 2018 at 2:13 PM, Roman Mashak  wrote:
> v2:
>fixed coding style
>
> Signed-off-by: Roman Mashak 
> ---
>  tc/m_action.c | 2 +-
>  tc/m_gact.c   | 2 +-
>  tc/m_ife.c| 2 +-
>  tc/m_pedit.c  | 2 +-
>  tc/m_sample.c | 6 +++---
>  tc/m_tunnel_key.c | 2 +-
>  6 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/tc/m_action.c b/tc/m_action.c
> index 148f1372d414..244c1ec00f28 100644
> --- a/tc/m_action.c
> +++ b/tc/m_action.c
> @@ -408,7 +408,7 @@ int print_action(const struct sockaddr_nl *who,
> if (tb[TCA_ROOT_COUNT])
> tot_acts = RTA_DATA(tb[TCA_ROOT_COUNT]);
>
> -   fprintf(fp, "total acts %d\n", tot_acts ? *tot_acts:0);
> +   fprintf(fp, "total acts %u\n", tot_acts ? *tot_acts : 0);
> if (tb[TCA_ACT_TAB] == NULL) {
> if (n->nlmsg_type != RTM_GETACTION)
> fprintf(stderr, "print_action: NULL kind\n");
> diff --git a/tc/m_gact.c b/tc/m_gact.c
> index 16c4413f4217..52022415db48 100644
> --- a/tc/m_gact.c
> +++ b/tc/m_gact.c
> @@ -194,7 +194,7 @@ print_gact(struct action_util *au, FILE *f, struct rtattr 
> *arg)
> print_string(PRINT_ANY, "random_type", "\n\t random type %s",
>  prob_n2a(pp->ptype));
> print_action_control(f, " ", pp->paction, " ");
> -   print_int(PRINT_ANY, "val", "val %d", pp->pval);
> +   print_int(PRINT_ANY, "val", "val %u", pp->pval);
> close_json_object();
>  #endif
> print_uint(PRINT_ANY, "index", "\n\t index %u", p->index);
> diff --git a/tc/m_ife.c b/tc/m_ife.c
> index 205efc9f1d9a..e1dbd3a79649 100644
> --- a/tc/m_ife.c
> +++ b/tc/m_ife.c
> @@ -280,7 +280,7 @@ static int print_ife(struct action_util *au, FILE *f, 
> struct rtattr *arg)
> if (len) {
> mtcindex =
> 
> rta_getattr_u16(metalist[IFE_META_TCINDEX]);
> -   fprintf(f, "use tcindex %d ", mtcindex);
> +   fprintf(f, "use tcindex %u ", mtcindex);
> } else
> fprintf(f, "allow tcindex ");
> }
> diff --git a/tc/m_pedit.c b/tc/m_pedit.c
> index 26549eeea899..151dfe1a230a 100644
> --- a/tc/m_pedit.c
> +++ b/tc/m_pedit.c
> @@ -817,7 +817,7 @@ int print_pedit(struct action_util *au, FILE *f, struct 
> rtattr *arg)
> (unsigned int)ntohl(key->mask));
> }
> } else {
> -   fprintf(f, "\npedit %x keys %d is not LEGIT", sel->index,
> +   fprintf(f, "\npedit %x keys %u is not LEGIT", sel->index,
> sel->nkeys);
> }
>
> diff --git a/tc/m_sample.c b/tc/m_sample.c
> index 01763cb4c356..d42a6a327965 100644
> --- a/tc/m_sample.c
> +++ b/tc/m_sample.c
> @@ -155,17 +155,17 @@ static int print_sample(struct action_util *au, FILE 
> *f, struct rtattr *arg)
> }
> p = RTA_DATA(tb[TCA_SAMPLE_PARMS]);
>
> -   fprintf(f, "sample rate 1/%d group %d",
> +   fprintf(f, "sample rate 1/%u group %u",
> rta_getattr_u32(tb[TCA_SAMPLE_RATE]),
> rta_getattr_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]));
>
> if (tb[TCA_SAMPLE_TRUNC_SIZE])
> -   fprintf(f, " trunc_size %d",
> +   fprintf(f, " trunc_size %u",
> rta_getattr_u32(tb[TCA_SAMPLE_TRUNC_SIZE]));
>
> print_action_control(f, " ", p->action, "");
>
> -   fprintf(f, "\n\tindex %d ref %d bind %d", p->index, p->refcnt,
> +   fprintf(f, "\n\tindex %u ref %d bind %d", p->index, p->refcnt,
> p->bindcnt);
>
> if (show_stats) {
> diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
> index 1cdd03560c35..dd8f8e8c635b 100644
> --- a/tc/m_tunnel_key.c
> +++ b/tc/m_tunnel_key.c
> @@ -292,7 +292,7 @@ static int print_tunnel_key(struct action_util *au, FILE 
> *f, struct rtattr *arg)
> }
> print_action_control(f, " ", parm->action, "");
>
> -   fprintf(f, "\n\tindex %d ref %d bind %d", parm->index, parm->refcnt,
> +   fprintf(f, "\n\tindex %u ref %d bind %d", parm->index, parm->refcnt,
> parm->bindcnt);
>
> if (show_stats) {
> --
> 2.7.4
>



-- 
Roman Mashak
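[Editor's note: for readers wondering what the %d-to-%u change in this series fixes: the fields being printed are unsigned, and for values above INT_MAX the %d conversion reinterprets the bits as a negative number. A minimal standalone sketch (hypothetical helper, not iproute2 code) makes the difference visible:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the same 32-bit value with the old (%d) and the fixed (%u)
 * conversion; for values above INT_MAX only %u prints correctly. */
static void fmt_both(unsigned int v, char *as_d, char *as_u, size_t n)
{
	snprintf(as_d, n, "%d", (int)v); /* old conversion: wraps negative */
	snprintf(as_u, n, "%u", v);      /* fixed conversion */
}
```

A counter such as a 1/4294967295 sample rate would print as "1/-1" with the old conversion.]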


Re: [PATCH net-next 3/4] net: dsa: Plug in PHYLINK support

2018-03-19 Thread Russell King - ARM Linux
On Mon, Mar 19, 2018 at 10:59:55AM -0700, Florian Fainelli wrote:
> Hi Andrew,
> 
> On 03/18/2018 12:19 PM, Andrew Lunn wrote:
> >> +static int dsa_slave_nway_reset(struct net_device *dev)
> >> +{
> >> +  struct dsa_port *dp = dsa_slave_to_port(dev);
> >> +
> >> +  return phylink_ethtool_nway_reset(dp->pl);
> >> +}
> > 
> > Hi Florian
> > 
> > I've seen in one of Russells trees a patch to put a phylink into
> > net_device. That would make a generic slave_nway_reset() possible, and
> > a few others as well. Maybe it makes sense to pull in that patch?
> 
> To make this generic, we would have to have net_device carry a reference
> to a phylink instance, which I would rather not do. Were you possibly
> referring to this patch set:
> 
> http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=phy&id=4eda3b76573473d811bc80a6f0e5a2e06dd76bf6
> 
> in which case I think it was discussed and rejected (that was my
> recollection).

Unfortunately, that rejection kind'a prevents SFP support on a PHY,
which is why I'm not pushing the 3310 patches (I don't have any other
solution to this problem at the moment as PHYLIB gets in the way of
knowing what state the network interface is in.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


[PATCH 3.18 34/68] MIPS: BPF: Quit clobbering callee saved registers in JIT code.

2018-03-19 Thread Greg Kroah-Hartman
3.18-stable review patch.  If anyone has any objections, please let me know.

--

From: David Daney 


[ Upstream commit 1ef0910cfd681f0bd0b81f8809935b2006e9cfb9 ]

If bpf_needs_clear_a() returns true, only actually clear it if it is
ever used.  If it is not used, we don't save and restore it, so the
clearing has the nasty side effect of clobbering caller state.

Also, don't emit stack pointer adjustment instructions if the
adjustment amount is zero.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15745/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/mips/net/bpf_jit.c |   16 
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -562,7 +562,8 @@ static void save_bpf_jit_regs(struct jit
u32 sflags, tmp_flags;
 
/* Adjust the stack pointer */
-   emit_stack_offset(-align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(-align_sp(offset), ctx);
 
if (ctx->flags & SEEN_CALL) {
/* Argument save area */
@@ -641,7 +642,8 @@ static void restore_bpf_jit_regs(struct
emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
 
/* Restore the sp and discard the scrach memory */
-   emit_stack_offset(align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(align_sp(offset), ctx);
 }
 
 static unsigned int get_stack_depth(struct jit_ctx *ctx)
@@ -689,8 +691,14 @@ static void build_prologue(struct jit_ct
if (ctx->flags & SEEN_X)
emit_jit_reg_move(r_X, r_zero, ctx);
 
-   /* Do not leak kernel data to userspace */
-   if (bpf_needs_clear_a(&ctx->skf->insns[0]))
+   /*
+* Do not leak kernel data to userspace, we only need to clear
+* r_A if it is ever used.  In fact if it is never used, we
+* will not save/restore it, so clearing it in this case would
+* corrupt the state of the caller.
+*/
+   if (bpf_needs_clear_a(&ctx->skf->insns[0]) &&
+   (ctx->flags & SEEN_A))
emit_jit_reg_move(r_A, r_zero, ctx);
 }
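[Editor's note: the two guards this patch adds can be read in isolation. A tiny userspace sketch (flag value and function names invented here, not the kernel's) of the decisions:

```c
#include <assert.h>

#define SEEN_A 0x1 /* assumed flag bit, for illustration only */

/* Clear r_A only if the program needs a cleared A *and* actually uses
 * register A; an unused r_A is never saved/restored, so clearing it
 * would corrupt the caller's register state. */
static int should_clear_a(int needs_clear_a, unsigned int ctx_flags)
{
	return needs_clear_a && (ctx_flags & SEEN_A) ? 1 : 0;
}

/* Emit stack-pointer adjustment instructions only for non-zero sizes. */
static int should_emit_stack_adjust(int offset)
{
	return offset != 0;
}
```
]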
 




Re: [RFC net-next] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool

2018-03-19 Thread Andrew Lunn
On Mon, Mar 19, 2018 at 12:20:32PM -0700, Florian Fainelli wrote:
> On 12/17/2017 06:48 AM, Russell King wrote:
> > Provide a pointer to the SFP bus in struct net_device, so that the
> > ethtool module EEPROM methods can access the SFP directly, rather
> > than needing every user to provide a hook for it.
> > 
> > Signed-off-by: Russell King 
> > ---
> > Questions:
> > 1. Is it worth adding a pointer to struct net_device for these two
> >methods, rather than having multiple duplicate veneers to vector
> >the ethtool module EEPROM ioctls through to the SFP bus layer?
> 
> Considering the negative diffstat and the fact that it solves real
> problems for you, I would say yes.

We have also received a bunch of patches removing the phydev pointer
for driver private structures and making use of the net_device one. It
would be nice to avoid the same with phylink.

  Andrew


Re: [PATCH net-next 3/4] net: dsa: Plug in PHYLINK support

2018-03-19 Thread Florian Fainelli
On 03/19/2018 11:09 AM, Russell King - ARM Linux wrote:
> On Mon, Mar 19, 2018 at 10:59:55AM -0700, Florian Fainelli wrote:
>> Hi Andrew,
>>
>> On 03/18/2018 12:19 PM, Andrew Lunn wrote:
 +static int dsa_slave_nway_reset(struct net_device *dev)
 +{
 +  struct dsa_port *dp = dsa_slave_to_port(dev);
 +
 +  return phylink_ethtool_nway_reset(dp->pl);
 +}
>>>
>>> Hi Florian
>>>
>>> I've seen in one of Russells trees a patch to put a phylink into
>>> net_device. That would make a generic slave_nway_reset() possible, and
>>> a few others as well. Maybe it makes sense to pull in that patch?
>>
>> To make this generic, we would have to have net_device carry a reference
>> to a phylink instance, which I would rather not do. Were you possibly
>> referring to this patch set:
>>
>> http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=phy&id=4eda3b76573473d811bc80a6f0e5a2e06dd76bf6
>>
>> in which case I think it was discussed and rejected (that was my
>> recollection).
> 
> Unfortunately, that rejection kind'a prevents SFP support on a PHY,
> which is why I'm not pushing the 3310 patches (I don't have any other
> solution to this problem at the moment as PHYLIB gets in the way of
> knowing what state the network interface is in.)

I don't remember the basis on which this was rejected, and since then, I
had many sleepless nights which don't help with long term memory :) Can
you refresh the context?
-- 
Florian


Re: [bpf-next PATCH v3 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data

2018-03-19 Thread Alexei Starovoitov
On Sun, Mar 18, 2018 at 12:57:25PM -0700, John Fastabend wrote:
> Currently, if a bpf sk msg program is run the program
> can only parse data that the (start,end) pointers already
> consumed. For sendmsg hooks this is likely the first
> scatterlist element. For sendpage this will be the range
> (0,0) because the data is shared with userspace and by
> default we want to avoid allowing userspace to modify
> data while (or after) BPF verdict is being decided.
> 
> To support pulling in additional bytes for parsing use
> a new helper bpf_sk_msg_pull(start, end, flags) which
> works similarly to the cls tc logic. This helper will attempt
> to point the data start pointer at 'start' bytes offset
> into msg and data end pointer at 'end' bytes offset into
> message.
> 
> After basic sanity checks to ensure 'start' <= 'end' and
> 'end' <= msg_length there are a few cases we need to
> handle.
> 
> First the sendmsg hook has already copied the data from
> userspace and has exclusive access to it. Therefore, it
> is not necessary to copy the data. However, it may
> be required. After finding the scatterlist element with
> 'start' offset byte in it there are two cases. One the
> range (start,end) is entirely contained in the sg element
> and is already linear. All that is needed is to update the
> data pointers, no allocate/copy is needed. The other case
> is (start, end) crosses sg element boundaries. In this
> case we allocate a block of size 'end - start' and copy
> the data to linearize it.
> 
> Next sendpage hook has not copied any data in initial
> state so that data pointers are (0,0). In this case we
> handle it similar to the above sendmsg case except the
> allocation/copy must always happen. Then when sending
> the data we have possibly three memory regions that
> need to be sent, (0, start - 1), (start, end), and
> (end + 1, msg_length). This is required to ensure any
> writes by the BPF program are correctly transmitted.
> 
> Lastly this operation will invalidate any previous
> data checks so BPF programs will have to revalidate
> pointers after making this BPF call.
> 
> Signed-off-by: John Fastabend 
..
> +
> + page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
> + if (unlikely(!page))
> + return -ENOMEM;

I think that's fine. Just curious what order do you see in practice?

Acked-by: Alexei Starovoitov 
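[Editor's note: the copy/no-copy decision described in the commit message can be sketched out of context: given scatterlist element boundaries, the requested (start, end) range only needs a linearizing copy when it crosses an element boundary. A simplified model (the real code walks a ring of elements and handles offsets):

```c
#include <assert.h>

/* One scatterlist element covering message bytes [base, base + len). */
struct sg_elem {
	unsigned int base;
	unsigned int len;
};

/* Return 1 if [start, end) lies entirely inside one element, so a
 * pointer update suffices; 0 if it crosses elements, in which case
 * 'end - start' bytes are allocated and copied to linearize. */
static int range_is_linear(const struct sg_elem *sg, int nelem,
			   unsigned int start, unsigned int end)
{
	int i;

	for (i = 0; i < nelem; i++) {
		unsigned int lo = sg[i].base;
		unsigned int hi = sg[i].base + sg[i].len;

		if (start >= lo && end <= hi)
			return 1;
	}
	return 0;
}
```
]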



[PATCH v2 iproute2 1/1] tc: fix conversion types when printing actions unsigned values

2018-03-19 Thread Roman Mashak
v2:
   fixed coding style

Signed-off-by: Roman Mashak 
---
 tc/m_action.c | 2 +-
 tc/m_gact.c   | 2 +-
 tc/m_ife.c| 2 +-
 tc/m_pedit.c  | 2 +-
 tc/m_sample.c | 6 +++---
 tc/m_tunnel_key.c | 2 +-
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 148f1372d414..244c1ec00f28 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -408,7 +408,7 @@ int print_action(const struct sockaddr_nl *who,
if (tb[TCA_ROOT_COUNT])
tot_acts = RTA_DATA(tb[TCA_ROOT_COUNT]);
 
-   fprintf(fp, "total acts %d\n", tot_acts ? *tot_acts:0);
+   fprintf(fp, "total acts %u\n", tot_acts ? *tot_acts : 0);
if (tb[TCA_ACT_TAB] == NULL) {
if (n->nlmsg_type != RTM_GETACTION)
fprintf(stderr, "print_action: NULL kind\n");
diff --git a/tc/m_gact.c b/tc/m_gact.c
index 16c4413f4217..52022415db48 100644
--- a/tc/m_gact.c
+++ b/tc/m_gact.c
@@ -194,7 +194,7 @@ print_gact(struct action_util *au, FILE *f, struct rtattr 
*arg)
print_string(PRINT_ANY, "random_type", "\n\t random type %s",
 prob_n2a(pp->ptype));
print_action_control(f, " ", pp->paction, " ");
-   print_int(PRINT_ANY, "val", "val %d", pp->pval);
+   print_int(PRINT_ANY, "val", "val %u", pp->pval);
close_json_object();
 #endif
print_uint(PRINT_ANY, "index", "\n\t index %u", p->index);
diff --git a/tc/m_ife.c b/tc/m_ife.c
index 205efc9f1d9a..e1dbd3a79649 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -280,7 +280,7 @@ static int print_ife(struct action_util *au, FILE *f, 
struct rtattr *arg)
if (len) {
mtcindex =

rta_getattr_u16(metalist[IFE_META_TCINDEX]);
-   fprintf(f, "use tcindex %d ", mtcindex);
+   fprintf(f, "use tcindex %u ", mtcindex);
} else
fprintf(f, "allow tcindex ");
}
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 26549eeea899..151dfe1a230a 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -817,7 +817,7 @@ int print_pedit(struct action_util *au, FILE *f, struct 
rtattr *arg)
(unsigned int)ntohl(key->mask));
}
} else {
-   fprintf(f, "\npedit %x keys %d is not LEGIT", sel->index,
+   fprintf(f, "\npedit %x keys %u is not LEGIT", sel->index,
sel->nkeys);
}
 
diff --git a/tc/m_sample.c b/tc/m_sample.c
index 01763cb4c356..d42a6a327965 100644
--- a/tc/m_sample.c
+++ b/tc/m_sample.c
@@ -155,17 +155,17 @@ static int print_sample(struct action_util *au, FILE *f, 
struct rtattr *arg)
}
p = RTA_DATA(tb[TCA_SAMPLE_PARMS]);
 
-   fprintf(f, "sample rate 1/%d group %d",
+   fprintf(f, "sample rate 1/%u group %u",
rta_getattr_u32(tb[TCA_SAMPLE_RATE]),
rta_getattr_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]));
 
if (tb[TCA_SAMPLE_TRUNC_SIZE])
-   fprintf(f, " trunc_size %d",
+   fprintf(f, " trunc_size %u",
rta_getattr_u32(tb[TCA_SAMPLE_TRUNC_SIZE]));
 
print_action_control(f, " ", p->action, "");
 
-   fprintf(f, "\n\tindex %d ref %d bind %d", p->index, p->refcnt,
+   fprintf(f, "\n\tindex %u ref %d bind %d", p->index, p->refcnt,
p->bindcnt);
 
if (show_stats) {
diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
index 1cdd03560c35..dd8f8e8c635b 100644
--- a/tc/m_tunnel_key.c
+++ b/tc/m_tunnel_key.c
@@ -292,7 +292,7 @@ static int print_tunnel_key(struct action_util *au, FILE 
*f, struct rtattr *arg)
}
print_action_control(f, " ", parm->action, "");
 
-   fprintf(f, "\n\tindex %d ref %d bind %d", parm->index, parm->refcnt,
+   fprintf(f, "\n\tindex %u ref %d bind %d", parm->index, parm->refcnt,
parm->bindcnt);
 
if (show_stats) {
-- 
2.7.4



[PATCH 4.4 062/134] MIPS: BPF: Quit clobbering callee saved registers in JIT code.

2018-03-19 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: David Daney 


[ Upstream commit 1ef0910cfd681f0bd0b81f8809935b2006e9cfb9 ]

If bpf_needs_clear_a() returns true, only actually clear it if it is
ever used.  If it is not used, we don't save and restore it, so the
clearing has the nasty side effect of clobbering caller state.

Also, don't emit stack pointer adjustment instructions if the
adjustment amount is zero.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15745/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/mips/net/bpf_jit.c |   16 
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -527,7 +527,8 @@ static void save_bpf_jit_regs(struct jit
u32 sflags, tmp_flags;
 
/* Adjust the stack pointer */
-   emit_stack_offset(-align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(-align_sp(offset), ctx);
 
tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
/* sflags is essentially a bitmap */
@@ -579,7 +580,8 @@ static void restore_bpf_jit_regs(struct
emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
 
/* Restore the sp and discard the scrach memory */
-   emit_stack_offset(align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(align_sp(offset), ctx);
 }
 
 static unsigned int get_stack_depth(struct jit_ctx *ctx)
@@ -626,8 +628,14 @@ static void build_prologue(struct jit_ct
if (ctx->flags & SEEN_X)
emit_jit_reg_move(r_X, r_zero, ctx);
 
-   /* Do not leak kernel data to userspace */
-   if (bpf_needs_clear_a(&ctx->skf->insns[0]))
+   /*
+* Do not leak kernel data to userspace, we only need to clear
+* r_A if it is ever used.  In fact if it is never used, we
+* will not save/restore it, so clearing it in this case would
+* corrupt the state of the caller.
+*/
+   if (bpf_needs_clear_a(&ctx->skf->insns[0]) &&
+   (ctx->flags & SEEN_A))
emit_jit_reg_move(r_A, r_zero, ctx);
 }
 




Re: [PATCH iproute2 1/1] tc: fix conversion types when printing actions unsigned values

2018-03-19 Thread Roman Mashak
Stephen Hemminger  writes:

> On Mon, 19 Mar 2018 13:50:07 -0400
> Roman Mashak  wrote:
>
>> Signed-off-by: Roman Mashak 
>> ---
>>  tc/m_action.c | 2 +-
>>  tc/m_gact.c   | 2 +-
>>  tc/m_ife.c| 2 +-
>>  tc/m_pedit.c  | 2 +-
>>  tc/m_sample.c | 6 +++---
>>  tc/m_tunnel_key.c | 2 +-
>>  6 files changed, 8 insertions(+), 8 deletions(-)
>> 
>> diff --git a/tc/m_action.c b/tc/m_action.c
>> index 148f1372d414..85c9d44c7e50 100644
>> --- a/tc/m_action.c
>> +++ b/tc/m_action.c
>> @@ -408,7 +408,7 @@ int print_action(const struct sockaddr_nl *who,
>>  if (tb[TCA_ROOT_COUNT])
>>  tot_acts = RTA_DATA(tb[TCA_ROOT_COUNT]);
>>  
>> -fprintf(fp, "total acts %d\n", tot_acts ? *tot_acts:0);
>> +fprintf(fp, "total acts %u\n", tot_acts ? *tot_acts:0);
>
> Please add spaces around : in trigraph.
>
> When fixing code, it has to pass style checkers.

Thanks, I will send v2.


[PATCH 4.4 063/134] MIPS: BPF: Fix multiple problems in JIT skb access helpers.

2018-03-19 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: David Daney 


[ Upstream commit a81507c79f4ae9a0f9fb1054b59b62a090620dd9 ]

o Socket data is unsigned, so use unsigned accessor instructions.

o Fix path result pointer generation arithmetic.

o Fix half-word byte swapping code for unsigned semantics.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15747/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/mips/net/bpf_jit_asm.S |   23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

--- a/arch/mips/net/bpf_jit_asm.S
+++ b/arch/mips/net/bpf_jit_asm.S
@@ -90,18 +90,14 @@ FEXPORT(sk_load_half_positive)
is_offset_in_header(2, half)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   .setreorder
-   lh  $r_A, 0(t1)
-   .setnoreorder
+   lhu $r_A, 0(t1)
 #ifdef CONFIG_CPU_LITTLE_ENDIAN
 # if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
-   wsbht0, $r_A
-   seh $r_A, t0
+   wsbh$r_A, $r_A
 # else
-   sll t0, $r_A, 24
-   andit1, $r_A, 0xff00
-   sra t0, t0, 16
-   srl t1, t1, 8
+   sll t0, $r_A, 8
+   srl t1, $r_A, 8
+   andit0, t0, 0xff00
or  $r_A, t0, t1
 # endif
 #endif
@@ -115,7 +111,7 @@ FEXPORT(sk_load_byte_positive)
is_offset_in_header(1, byte)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   lb  $r_A, 0(t1)
+   lbu $r_A, 0(t1)
jr  $r_ra
 move   $r_ret, zero
END(sk_load_byte)
@@ -139,6 +135,11 @@ FEXPORT(sk_load_byte_positive)
  * (void *to) is returned in r_s0
  *
  */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define DS_OFFSET(SIZE) (4 * SZREG)
+#else
+#define DS_OFFSET(SIZE) ((4 * SZREG) + (4 - SIZE))
+#endif
 #define bpf_slow_path_common(SIZE) \
/* Quick check. Are we within reasonable boundaries? */ \
LONG_ADDIU  $r_s1, $r_skb_len, -SIZE;   \
@@ -150,7 +151,7 @@ FEXPORT(sk_load_byte_positive)
PTR_LA  t0, skb_copy_bits;  \
PTR_S   $r_ra, (5 * SZREG)($r_sp);  \
/* Assign low slot to a2 */ \
-   movea2, $r_sp;  \
+   PTR_ADDIU   a2, $r_sp, DS_OFFSET(SIZE); \
jalrt0; \
/* Reset our destination slot (DS but it's ok) */   \
 INT_S  zero, (4 * SZREG)($r_sp);   \
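[Editor's note: the half-word swap fix above has a compact C equivalent. Because the value is now loaded zero-extended (lhu), a plain 16-bit byte swap with no sign extension is all that is needed; this models the replacement sequence `sll t0, $r_A, 8; srl t1, $r_A, 8; andi t0, t0, 0xff00; or` for inputs that fit in 16 bits:

```c
#include <assert.h>

/* Byte-swap a zero-extended half-word (a <= 0xffff, as lhu guarantees).
 * Mirrors the fixed little-endian sequence in the patch. */
static unsigned int swap16u(unsigned int a)
{
	unsigned int t0 = (a << 8) & 0xff00u; /* low byte to high position */
	unsigned int t1 = a >> 8;             /* high byte to low position */

	return t0 | t1;
}
```
]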




Re: [bpf-next PATCH v3 00/18] bpf,sockmap: sendmsg/sendfile ULP

2018-03-19 Thread Daniel Borkmann
On 03/18/2018 08:56 PM, John Fastabend wrote:
> This series adds a BPF hook for sendmsg and sendfile by using
> the ULP infrastructure and sockmap. A simple pseudocode example
> would be,
[...]

Series applied to bpf-next, thanks John!


RE: [PATCH v3 11/18] qlcnic: Eliminate duplicate barriers on weakly-ordered archs

2018-03-19 Thread Chopra, Manish
> -Original Message-
> From: Sinan Kaya [mailto:ok...@codeaurora.org]
> Sent: Friday, March 16, 2018 9:46 PM
> To: netdev@vger.kernel.org; ti...@codeaurora.org; sulr...@codeaurora.org
> Cc: linux-arm-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> Sinan Kaya ; Patil, Harish ;
> Chopra, Manish ; Dept-GE Linux NIC Dev  gelinuxnic...@cavium.com>; linux-ker...@vger.kernel.org
> Subject: [PATCH v3 11/18] qlcnic: Eliminate duplicate barriers on 
> weakly-ordered
> archs
> 
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up with the CPU observing two barriers back to back before
> executing the register write.
> 
> Since the code already has an explicit barrier call, change writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> index 46b0372..97c146e7 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> @@ -478,7 +478,7 @@ irqreturn_t qlcnic_83xx_clear_legacy_intr(struct
> qlcnic_adapter *adapter)
>   wmb();
> 
>   /* clear the interrupt trigger control register */
> - writel(0, adapter->isr_int_vec);
> + writel_relaxed(0, adapter->isr_int_vec);
>   intr_val = readl(adapter->isr_int_vec);
>   do {
>   intr_val = readl(adapter->tgt_status_reg);
> --
> 2.7.4

Acked-by: Manish Chopra 

Thanks.


Re: recursive static routes

2018-03-19 Thread David Ahern
On 3/19/18 12:58 PM, Saku Ytti wrote:
> Hey David,
> 
>> The Linux stack does not flatten routes when inserting into the FIB.
>> Recursion is expected to be done a routing daemon such as bgp which will
>> be able to handle updates as the network changes.
> 
> Are you saying that routing protocol would observe the next-hop
> change, then update the Linux kernel route to reflect that?

yes

> 
> Wouldn't that add another layer of state and the implied delays of
> maintaining and updating the state?
> 
> Is it not practical to do lookup per-packet to recurse until egress
> rewrite information is found? So literally no state in memory anywhere
> saying 0/0 next-hop is 192.0.2.42/22:22:22:22:22:22, it would always
> have to walk the FIB to find it.
> 

You want per-packet overhead instead of deferring the overhead to
event-based updates? Network events tend to be much less frequent than
sending/forwarding packets.
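[Editor's note: to make the trade-off concrete, a routing daemon flattens the recursive next-hop once at update time, while per-packet recursion would re-run a walk like the toy resolver below (entirely made-up table layout) for every forwarded packet:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy RIB entry: destination -> next-hop, plus whether the next-hop is
 * directly connected (i.e. egress rewrite info is available). */
struct rib_entry {
	const char *dst;
	const char *via;
	int direct;
};

/* Walk next-hops until a directly connected one is found. A daemon runs
 * this once per network event and installs the final next-hop into the
 * kernel FIB, instead of paying this cost per packet. */
static const char *resolve(const struct rib_entry *rib, int n,
			   const char *dst)
{
	int hops, i;

	for (hops = 0; hops < n; hops++) {
		for (i = 0; i < n; i++) {
			if (strcmp(rib[i].dst, dst) != 0)
				continue;
			if (rib[i].direct)
				return rib[i].via;
			dst = rib[i].via; /* recurse on the next-hop */
			break;
		}
		if (i == n)
			return NULL; /* no matching route */
	}
	return NULL; /* recursion too deep or next-hop loop */
}
```
]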


Re: [bpf-next PATCH v3 07/18] bpf: sockmap, add msg_cork_bytes() helper

2018-03-19 Thread John Fastabend
On 03/19/2018 09:30 AM, Alexei Starovoitov wrote:
> On Sun, Mar 18, 2018 at 12:57:20PM -0700, John Fastabend wrote:
>> In cases where a specific number of bytes is needed before a
>> verdict can be assigned, even if the data spans multiple sendmsg
>> or sendfile calls, the BPF program may use msg_cork_bytes().
>>
>> The extreme case is a user can call sendmsg repeatedly with
>> 1-byte msg segments. Obviously, this is bad for performance but
>> is still valid. If the BPF program needs N bytes to validate
>> a header it can use msg_cork_bytes to specify N bytes and the
>> BPF program will not be called again until N bytes have been
>> accumulated. The infrastructure will attempt to coalesce data
>> if possible so in many cases (most my use cases at least) the
>> data will be in a single scatterlist element with data pointers
>> pointing to start/end of the element. However, this is dependent
>> on available memory so is not guaranteed. So BPF programs must
>> validate data pointer ranges, but this is the case anyways to
>> convince the verifier the accesses are valid.
>>
>> Signed-off-by: John Fastabend 
>> ---
>>  include/uapi/linux/bpf.h |3 ++-
>>  net/core/filter.c|   16 
>>  2 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index a557a2a..1765cfb 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -792,7 +792,8 @@ struct bpf_stack_build_id {
>>  FN(override_return),\
>>  FN(sock_ops_cb_flags_set),  \
>>  FN(msg_redirect_map),   \
>> -FN(msg_apply_bytes),
>> +FN(msg_apply_bytes),\
>> +FN(msg_cork_bytes),
>>  
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>   * function eBPF program intends to call
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 17d6775..0c9daf6 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -1942,6 +1942,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff 
>> *msg)
>>  .arg2_type  = ARG_ANYTHING,
>>  };
>>  
>> +BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg_buff *, msg, u32, bytes)
>> +{
>> +msg->cork_bytes = bytes;
>> +return 0;
>> +}
> 
> my understanding that setting it here and in the other helper *_bytes to zero
> will be effectively a nop. Right?
> 

Correct, setting cork_bytes or apply_bytes to zero is just a nop.

> Acked-by: Alexei Starovoitov 
> 
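[Editor's note: the cork semantics discussed above can be modelled without any BPF machinery: bytes accumulate across sendmsg calls and the verdict program only runs once the corked amount is reached. Illustrative only; the struct and names are invented:

```c
#include <assert.h>

struct cork_state {
	unsigned int need; /* bytes set via msg_cork_bytes(); 0 = corking off */
	unsigned int have; /* bytes accumulated so far */
};

/* Offer 'len' more bytes. Return 1 when the verdict program should run
 * (enough data, or corking disabled), 0 while still accumulating. */
static int cork_offer(struct cork_state *s, unsigned int len)
{
	s->have += len;
	if (s->need && s->have < s->need)
		return 0;
	s->have = 0; /* the verdict consumes the accumulated data */
	return 1;
}
```

This also shows why setting cork_bytes to zero is a nop: with need == 0 every offer runs the verdict immediately.]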



Re: get_user_pages returning 0 (was Re: kernel BUG at drivers/vhost/vhost.c:LINE!)

2018-03-19 Thread Dmitry Vyukov
On Mon, Mar 19, 2018 at 4:29 PM, David Sterba  wrote:
> On Mon, Mar 19, 2018 at 05:09:28PM +0200,  Michael S. Tsirkin  wrote:
>> Hello!
>> The following code triggered by syzbot
>>
>> r = get_user_pages_fast(log, 1, 1, &page);
>> if (r < 0)
>> return r;
>> BUG_ON(r != 1);
>>
>> Just looking at get_user_pages_fast's documentation this seems
>> impossible - it is supposed to only ever return # of pages
>> pinned or errno.
>>
>> However, poking at code, I see at least one path that might cause this:
>>
>> ret = faultin_page(tsk, vma, start, &foll_flags,
>> nonblocking);
>> switch (ret) {
>> case 0:
>> goto retry;
>> case -EFAULT:
>> case -ENOMEM:
>> case -EHWPOISON:
>> return i ? i : ret;
>> case -EBUSY:
>> return i;
>>
>> which originally comes from:
>>
>> commit 53a7706d5ed8f1a53ba062b318773160cc476dde
>> Author: Michel Lespinasse 
>> Date:   Thu Jan 13 15:46:14 2011 -0800
>>
>> mlock: do not hold mmap_sem for extended periods of time
>>
>> __get_user_pages gets a new 'nonblocking' parameter to signal that the
>> caller is prepared to re-acquire mmap_sem and retry the operation if
>> needed.  This is used to split off long operations if they are going to
>> block on a disk transfer, or when we detect contention on the mmap_sem.
>>
>> [a...@linux-foundation.org: remove ref to rwsem_is_contended()]
>> Signed-off-by: Michel Lespinasse 
>> Cc: Hugh Dickins 
>> Cc: Rik van Riel 
>> Cc: Peter Zijlstra 
>> Cc: Nick Piggin 
>> Cc: KOSAKI Motohiro 
>> Cc: Ingo Molnar 
>> Cc: "H. Peter Anvin" 
>> Cc: Thomas Gleixner 
>> Cc: David Howells 
>> Signed-off-by: Andrew Morton 
>> Signed-off-by: Linus Torvalds 
>>
>> I started looking into this; if anyone has any feedback meanwhile,
>> that would be appreciated.
>>
>> In particular I don't really see why would this trigger
>> on commit 8f5fd927c3a7576d57248a2d7a0861c3f2795973:
>>
>> Merge: 8757ae2 093e037
>> Author: Linus Torvalds 
>> Date:   Fri Mar 16 13:37:42 2018 -0700
>>
>> Merge tag 'for-4.16-rc5-tag' of 
>> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
>>
>> is btrfs used on these systems?
>
> There were 3 patches pulled by that tag, none of them is even remotely
> related to the reported bug, AFAICS. If there's some impact, it must be
> indirect, obvious bugs like NULL pointer would exhibit in a different
> way and leave at least some trace in the stacks.

That is just a commit on which the bug was hit. It's provided so that
developers can make sense out of line numbers and check if the tree
includes/not includes a particular commit, etc. It's not that that
commit introduced the bug.
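[Editor's note: the suspect path quoted earlier in the thread reduces to its return-value logic. With zero pages pinned so far (i == 0), the -EBUSY arm returns 0, which violates the documented "number of pages pinned or -errno" contract. A simplified sketch, not the kernel function:

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value, defined here for non-Linux builds */
#endif

/* i = pages pinned before the fault; fault_ret = faultin_page() result.
 * Mirrors only the switch statement quoted in the mail; the retry arm
 * uses a sentinel here instead of the real code's goto. */
static long gup_fault_result(long i, int fault_ret)
{
	switch (fault_ret) {
	case 0:
		return -1000; /* sentinel for "goto retry" */
	case -EFAULT:
	case -ENOMEM:
	case -EHWPOISON:
		return i ? i : fault_ret;
	case -EBUSY:
		return i; /* 0 when nothing was pinned yet: the suspect case */
	}
	return i;
}
```
]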


Re: interdependencies with cxgb4 and iw_cxgb4

2018-03-19 Thread Steve Wise



On 3/16/2018 11:21 AM, David Miller wrote:

From: "Steve Wise" 
Date: Wed, 14 Mar 2018 10:31:24 -0500


This issue has also been dealt with for Mellanox drivers, I believe.  I take
it the solution for them was a k.o. signed repo, that they maintain, where
both linux-rdma and netdev take PRs from for commits that are needed in both
repos.   Then these are reconciled when both repos are merged into Linus'
repo. (I hope my understanding of this is correct)

For Chelsio, this is perhaps a possibility, but I'm wondering if there is a
simpler solution?  A few other options we've been discussing include:

1) submit the cxgb4-only changes to netdev in release cycle X, and then only
submit the iw_cxgb4 (or other upper drivers) changes that use them in
release cycle X+1.  The pro of this is simplicity.  The con is timeliness -
it takes 2 release cycles to get the feature upstream.

2) run the entire series through one maintainer's repo (with all
maintainers' ACK on the content and plan, of course), and ensuring no
conflicting commits are submitted for the rest of that release cycle.  I'm
not really sure that this is feasible given anyone could create commits for
upstream drivers.  So how could Chelsio really control this?

Do you have any suggestions on how we should proceed?

I think the Mellanox setup is working well currently.

If the changes get pulled into both the rdma and networking tree, then it
all gets resolved cleanly no matter which of rdma or networking goes into
Linus's tree first during the merge window.

It doesn't have the delay issues of suggestion #1, and I think avoiding
conflicts in situation #2 is next to impossible.

In fact, such conflict problems are how we arrived at the approach
Mellanox is using in the first place.



Thanks Dave.

Let me ask a dumb question: why can't one of the maintainers pull the 
commits from the other maintainer's git repo directly?  I.e., why have this 
third trusted/signed git repo on k.o, from which both maintainers pull?  
If one of you pulls a series in via patches, like you do for all other 
patches, you could then notify the other maintainer to pull it from your 
repo whenever the series needs to be in both maintainers' repositories.  
This avoids adding more staging git repos on k.o.  But probably I'm 
missing something...


Steve.




[PATCH 4.9 129/241] MIPS: BPF: Quit clobbering callee saved registers in JIT code.

2018-03-19 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: David Daney 


[ Upstream commit 1ef0910cfd681f0bd0b81f8809935b2006e9cfb9 ]

If bpf_needs_clear_a() returns true, only actually clear it if it is
ever used.  If it is not used, we don't save and restore it, so the
clearing has the nasty side effect of clobbering caller state.

Also, don't emit stack pointer adjustment instructions if the
adjustment amount is zero.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15745/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/mips/net/bpf_jit.c |   16 
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -526,7 +526,8 @@ static void save_bpf_jit_regs(struct jit
u32 sflags, tmp_flags;
 
/* Adjust the stack pointer */
-   emit_stack_offset(-align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(-align_sp(offset), ctx);
 
tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
/* sflags is essentially a bitmap */
@@ -578,7 +579,8 @@ static void restore_bpf_jit_regs(struct
emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
 
/* Restore the sp and discard the scrach memory */
-   emit_stack_offset(align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(align_sp(offset), ctx);
 }
 
 static unsigned int get_stack_depth(struct jit_ctx *ctx)
@@ -625,8 +627,14 @@ static void build_prologue(struct jit_ct
if (ctx->flags & SEEN_X)
emit_jit_reg_move(r_X, r_zero, ctx);
 
-   /* Do not leak kernel data to userspace */
-   if (bpf_needs_clear_a(&ctx->skf->insns[0]))
+   /*
+* Do not leak kernel data to userspace, we only need to clear
+* r_A if it is ever used.  In fact if it is never used, we
+* will not save/restore it, so clearing it in this case would
+* corrupt the state of the caller.
+*/
+   if (bpf_needs_clear_a(&ctx->skf->insns[0]) &&
+   (ctx->flags & SEEN_A))
emit_jit_reg_move(r_A, r_zero, ctx);
 }
 




[PATCH 4.9 130/241] MIPS: BPF: Fix multiple problems in JIT skb access helpers.

2018-03-19 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: David Daney 


[ Upstream commit a81507c79f4ae9a0f9fb1054b59b62a090620dd9 ]

 o Socket data is unsigned, so use unsigned accessor instructions.

 o Fix path result pointer generation arithmetic.

 o Fix half-word byte swapping code for unsigned semantics.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15747/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/mips/net/bpf_jit_asm.S |   23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

--- a/arch/mips/net/bpf_jit_asm.S
+++ b/arch/mips/net/bpf_jit_asm.S
@@ -90,18 +90,14 @@ FEXPORT(sk_load_half_positive)
is_offset_in_header(2, half)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   .set    reorder
-   lh  $r_A, 0(t1)
-   .set    noreorder
+   lhu $r_A, 0(t1)
 #ifdef CONFIG_CPU_LITTLE_ENDIAN
 # if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
-   wsbh    t0, $r_A
-   seh $r_A, t0
+   wsbh$r_A, $r_A
 # else
-   sll t0, $r_A, 24
-   andi    t1, $r_A, 0xff00
-   sra t0, t0, 16
-   srl t1, t1, 8
+   sll t0, $r_A, 8
+   srl t1, $r_A, 8
+   andi    t0, t0, 0xff00
or  $r_A, t0, t1
 # endif
 #endif
@@ -115,7 +111,7 @@ FEXPORT(sk_load_byte_positive)
is_offset_in_header(1, byte)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   lb  $r_A, 0(t1)
+   lbu $r_A, 0(t1)
jr  $r_ra
 move   $r_ret, zero
END(sk_load_byte)
@@ -139,6 +135,11 @@ FEXPORT(sk_load_byte_positive)
  * (void *to) is returned in r_s0
  *
  */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define DS_OFFSET(SIZE) (4 * SZREG)
+#else
+#define DS_OFFSET(SIZE) ((4 * SZREG) + (4 - SIZE))
+#endif
 #define bpf_slow_path_common(SIZE) \
/* Quick check. Are we within reasonable boundaries? */ \
LONG_ADDIU  $r_s1, $r_skb_len, -SIZE;   \
@@ -150,7 +151,7 @@ FEXPORT(sk_load_byte_positive)
PTR_LA  t0, skb_copy_bits;  \
PTR_S   $r_ra, (5 * SZREG)($r_sp);  \
/* Assign low slot to a2 */ \
-   move    a2, $r_sp;  \
+   PTR_ADDIU   a2, $r_sp, DS_OFFSET(SIZE); \
    jalr    t0; \
/* Reset our destination slot (DS but it's ok) */   \
 INT_S  zero, (4 * SZREG)($r_sp);   \




Re: [RFC net-next] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool

2018-03-19 Thread Florian Fainelli
On 12/17/2017 06:48 AM, Russell King wrote:
> Provide a pointer to the SFP bus in struct net_device, so that the
> ethtool module EEPROM methods can access the SFP directly, rather
> than needing every user to provide a hook for it.
> 
> Signed-off-by: Russell King 
> ---
> Questions:
> 1. Is it worth adding a pointer to struct net_device for these two
>methods, rather than having multiple duplicate veneers to vector
>the ethtool module EEPROM ioctls through to the SFP bus layer?

Considering the negative diffstat and the fact that it solves real
problems for you, I would say yes.

> 
> 2. Should this allow network/phy drivers to override the default -
>the code is currently structured to allow phy drivers to override
>network drivers implementations, which seems the wrong way around.

This would be nice, but at this point, I would defer until we have all
major PHYLINK, SFP et al. users merged in tree so we have a good
understanding and view of the different possible combinations that might
exist. Then we can try to define an interface allowing network drivers
more flexibility.

If that seems like an appropriate course of action, do you mind
resubmitting this as non-RFC?


> 
>  drivers/net/phy/phylink.c | 28 
>  drivers/net/phy/sfp-bus.c |  6 ++
>  include/linux/netdevice.h |  3 +++
>  include/linux/phylink.h   |  3 ---
>  net/core/ethtool.c|  7 +++
>  5 files changed, 12 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index db5d5726ced9..0f59d7149a61 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -1247,34 +1247,6 @@ int phylink_ethtool_set_pauseparam(struct phylink *pl,
>  }
>  EXPORT_SYMBOL_GPL(phylink_ethtool_set_pauseparam);
>  
> -int phylink_ethtool_get_module_info(struct phylink *pl,
> - struct ethtool_modinfo *modinfo)
> -{
> - int ret = -EOPNOTSUPP;
> -
> - WARN_ON(!lockdep_rtnl_is_held());
> -
> - if (pl->sfp_bus)
> - ret = sfp_get_module_info(pl->sfp_bus, modinfo);
> -
> - return ret;
> -}
> -EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_info);
> -
> -int phylink_ethtool_get_module_eeprom(struct phylink *pl,
> -   struct ethtool_eeprom *ee, u8 *buf)
> -{
> - int ret = -EOPNOTSUPP;
> -
> - WARN_ON(!lockdep_rtnl_is_held());
> -
> - if (pl->sfp_bus)
> - ret = sfp_get_module_eeprom(pl->sfp_bus, ee, buf);
> -
> - return ret;
> -}
> -EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_eeprom);
> -
>  /**
>   * phylink_ethtool_get_eee_err() - read the energy efficient ethernet error
>   *   counter
> diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
> index 1356dba0d9d3..4d61099b1357 100644
> --- a/drivers/net/phy/sfp-bus.c
> +++ b/drivers/net/phy/sfp-bus.c
> @@ -321,6 +321,7 @@ static int sfp_register_bus(struct sfp_bus *bus)
>   }
>   if (bus->started)
>   bus->socket_ops->start(bus->sfp);
> + bus->netdev->sfp_bus = bus;
>   bus->registered = true;
>   return 0;
>  }
> @@ -335,6 +336,7 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
>   if (bus->phydev && ops && ops->disconnect_phy)
>   ops->disconnect_phy(bus->upstream);
>   }
> + bus->netdev->sfp_bus = NULL;
>   bus->registered = false;
>  }
>  
> @@ -350,8 +352,6 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
>   */
>  int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo)
>  {
> - if (!bus->registered)
> - return -ENOIOCTLCMD;
>   return bus->socket_ops->module_info(bus->sfp, modinfo);
>  }
>  EXPORT_SYMBOL_GPL(sfp_get_module_info);
> @@ -370,8 +370,6 @@ EXPORT_SYMBOL_GPL(sfp_get_module_info);
>  int sfp_get_module_eeprom(struct sfp_bus *bus, struct ethtool_eeprom *ee,
> u8 *data)
>  {
> - if (!bus->registered)
> - return -ENOIOCTLCMD;
>   return bus->socket_ops->module_eeprom(bus->sfp, ee, data);
>  }
>  EXPORT_SYMBOL_GPL(sfp_get_module_eeprom);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ef789e1d679e..99a0a155c319 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -57,6 +57,7 @@ struct device;
>  struct phy_device;
>  struct dsa_port;
>  
> +struct sfp_bus;
>  /* 802.11 specific */
>  struct wireless_dev;
>  /* 802.15.4 specific */
> @@ -1644,6 +1645,7 @@ enum netdev_priv_flags {
>   *   @priomap:   XXX: need comments on this one
>   *   @phydev:Physical device may attach itself
>   *   for hardware timestamping
> + *   @sfp_bus:   attached  sfp_bus structure.
>   *
>   *   @qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock
>   *   @qdisc_running_key: lockdep class annotating Qdisc->running 

Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2018-03-19

2018-03-19 Thread David Miller
From: Jeff Kirsher 
Date: Mon, 19 Mar 2018 10:56:50 -0700

> This series contains updates to i40e and i40evf only.

Pulled, thanks Jeff.

