[net PATCH] i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet

2016-03-29 Thread Alexander Duyck
This patch addresses a bug introduced based on my interpretation of the
XL710 datasheet.  Specifically section 8.4.1 states that "A single transmit
packet may span up to 8 buffers (up to 8 data descriptors per packet
including both the header and payload buffers)."  It later goes on to say
that each segment of a TSO obeys the previous rule; however, it then refers
to the TSO header and the segment payload buffers.

I believe the actual limit for fragments with TSO, when the skbuff has
payload data in the head portion of the buffer, is only 7 fragments,
as the skb->data portion counts as 2 buffers: one for the TSO header,
and one for a segment payload buffer.

Fixes: 2d37490b82af ("i40e/i40evf: Rewrite logic for 8 descriptor per packet check")
Signed-off-by: Alexander Duyck 
---

This patch has been sanity checked only.  I cannot yet guarantee it
resolves the original issue that was reported.  I'll try to get a
reproduction environment set up tomorrow, but I don't know how long
that will take.

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   40 ++---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |   40 ++---
 2 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5d5fa5359a1d..97437f04d99d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2597,12 +2597,17 @@ int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size)
 }
 
 /**
- * __i40e_chk_linearize - Check if there are more than 8 fragments per packet
+ * __i40e_chk_linearize - Check if there are more than 8 buffers per packet
  * @skb:  send buffer
  *
- * Note: Our HW can't scatter-gather more than 8 fragments to build
- * a packet on the wire and so we need to figure out the cases where we
- * need to linearize the skb.
+ * Note: Our HW can't DMA more than 8 buffers to build a packet on the wire
+ * and so we need to figure out the cases where we need to linearize the skb.
+ *
+ * For TSO we need to count the TSO header and segment payload separately.
+ * As such we need to check cases where we have 7 fragments or more as we
+ * can potentially require 9 DMA transactions, 1 for the TSO header, 1 for
+ * the segment payload in the first descriptor, and another 7 for the
+ * fragments.
  **/
 bool __i40e_chk_linearize(struct sk_buff *skb)
 {
@@ -2614,18 +2619,17 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
if (unlikely(!gso_size))
return true;
 
-   /* no need to check if number of frags is less than 8 */
+   /* no need to check if number of frags is less than 7 */
nr_frags = skb_shinfo(skb)->nr_frags;
-   if (nr_frags < I40E_MAX_BUFFER_TXD)
+   if (nr_frags < (I40E_MAX_BUFFER_TXD - 1))
return false;
 
/* We need to walk through the list and validate that each group
 * of 6 fragments totals at least gso_size.  However we don't need
-* to perform such validation on the first or last 6 since the first
-* 6 cannot inherit any data from a descriptor before them, and the
-* last 6 cannot inherit any data from a descriptor after them.
+* to perform such validation on the last 6 since the last 6 cannot
+* inherit any data from a descriptor after them.
 */
-   nr_frags -= I40E_MAX_BUFFER_TXD - 1;
+   nr_frags -= I40E_MAX_BUFFER_TXD - 2;
frag = &skb_shinfo(skb)->frags[0];
 
/* Initialize size to the negative value of gso_size minus 1.  We
@@ -2636,19 +2640,19 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
 */
sum = 1 - gso_size;
 
-   /* Add size of frags 1 through 5 to create our initial sum */
-   sum += skb_frag_size(++frag);
-   sum += skb_frag_size(++frag);
-   sum += skb_frag_size(++frag);
-   sum += skb_frag_size(++frag);
-   sum += skb_frag_size(++frag);
+   /* Add size of frags 0 through 4 to create our initial sum */
+   sum += skb_frag_size(frag++);
+   sum += skb_frag_size(frag++);
+   sum += skb_frag_size(frag++);
+   sum += skb_frag_size(frag++);
+   sum += skb_frag_size(frag++);
 
/* Walk through fragments adding latest fragment, testing it, and
 * then removing stale fragments from the sum.
 */
stale = &skb_shinfo(skb)->frags[0];
for (;;) {
-   sum += skb_frag_size(++frag);
+   sum += skb_frag_size(frag++);
 
/* if sum is negative we failed to make sufficient progress */
if (sum < 0)
@@ -2658,7 +2662,7 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
if (!--nr_frags)
break;
 
-   sum -= skb_frag_size(++stale);
+   sum -= skb_frag_size(stale++);
}
 
return false;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/dri
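For illustration, here is a hedged, standalone Python model of this check (not the driver code; the frag sizes and gso_size are taken from Jesse's report in the "Rewrite logic for 8 descriptor per packet check" thread). `start=1` mimics the pre-patch walk that begins at frag 1, and `start=0` the patched walk that begins at frag 0:

```python
I40E_MAX_BUFFER_TXD = 8  # hardware: at most 8 buffers per packet on the wire

def needs_linearize(frags, gso_size, start):
    """Model of __i40e_chk_linearize; start=1 is pre-patch, start=0 post-patch.

    With start=0 the skb head counts as two buffers (TSO header plus the
    first chunk of payload), so the first payload frag joins the window and
    the check effectively triggers at 7 fragments instead of 8.
    """
    nr_frags = len(frags)
    if nr_frags < I40E_MAX_BUFFER_TXD - 1 + start:
        return False                    # short frag lists can never overrun
    nr_frags -= I40E_MAX_BUFFER_TXD - 2 + start
    total = 1 - gso_size                # window must cover >= one segment
    i = start
    for _ in range(5):                  # prime the window with 5 frags
        total += frags[i]
        i += 1
    stale = start
    while True:
        total += frags[i]               # slide the 6-frag window forward
        i += 1
        if total < 0:                   # 6 frags hold less than gso_size
            return True
        nr_frags -= 1
        if nr_frags == 0:
            return False
        total -= frags[stale]           # retire the oldest frag
        stale += 1

# frag sizes and gso_size from the reported rds-stress skb
frags = [256, 48, 256, 48, 256, 48, 4096, 4096, 48, 4096, 4096, 48,
         4096, 4096, 48, 4096, 4096]
print(needs_linearize(frags, 1448, start=1))  # False: old check lets it pass
print(needs_linearize(frags, 1448, start=0))  # True: patched check linearizes
```

Under the old parameters the reported skb passes the check, consistent with the regression report; with the patched parameters it is flagged for linearization.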

Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk

2016-03-29 Thread Yang Yingliang



On 2016/3/30 13:34, Eric Dumazet wrote:

On Tue, 2016-03-29 at 22:25 -0700, Eric Dumazet wrote:

On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:

When task A holds the sock lock in tcp_sendmsg and lots of packets
arrive, the packets are added to the backlog queue. They are handled
in release_sock, called from tcp_sendmsg. When the sk_backlog is
detached from the sk, its length does not decrease until all the
packets in the backlog queue are handled. This may lead to new
packets being dropped because the length is too big. So set the
length to 0 immediately after the queue is detached from the sk.

Signed-off-by: Yang Yingliang 
---
  net/core/sock.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 47fc8bb..108be05 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)

do {
sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
+   sk->sk_backlog.len = 0;
bh_unlock_sock(sk);

do {


Certainly not.

Have you really missed the comment ?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871


I do not believe the case you describe can happen, unless a misbehaving
driver cooks a fat skb (with skb->truesize being far bigger than
skb->len)



And also make sure you backported
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da882c1f2ecadb0ed582628ec1585e36b137c0f0


Sorry, I made a mistake. I am very sure my kernel has these two patches.
And I can still see some packet drops on 10Gb Ethernet.

# netstat -s | grep -i backlog
TCPBacklogDrop: 4135
# netstat -s | grep -i backlog
TCPBacklogDrop: 4167


Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk

2016-03-29 Thread Yang Yingliang



On 2016/3/30 13:25, Eric Dumazet wrote:

On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:

When task A holds the sock lock in tcp_sendmsg and lots of packets
arrive, the packets are added to the backlog queue. They are handled
in release_sock, called from tcp_sendmsg. When the sk_backlog is
detached from the sk, its length does not decrease until all the
packets in the backlog queue are handled. This may lead to new
packets being dropped because the length is too big. So set the
length to 0 immediately after the queue is detached from the sk.

Signed-off-by: Yang Yingliang 
---
  net/core/sock.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 47fc8bb..108be05 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)

do {
sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
+   sk->sk_backlog.len = 0;
bh_unlock_sock(sk);

do {


Certainly not.

Have you really missed the comment ?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871


My kernel is 4.1 LTS; it seems it doesn't have this patch. I will try
this patch later.


Thanks
Yang



I do not believe the case you describe can happen, unless a misbehaving
driver cooks a fat skb (with skb->truesize being far bigger than
skb->len)


Re: am335x: no multicast reception over VLAN

2016-03-29 Thread Mugunthan V N
On Tuesday 29 March 2016 06:14 PM, Grygorii Strashko wrote:
> On 03/29/2016 03:35 PM, Yegor Yefremov wrote:
>> On Tue, Mar 29, 2016 at 1:05 PM, Grygorii Strashko
>>  wrote:
>>> On 03/29/2016 08:21 AM, Yegor Yefremov wrote:
 Hi Mugunthan,

 On Tue, Mar 29, 2016 at 6:00 AM, Mugunthan V N  wrote:
> Hi Yegor
>
> On Wednesday 16 March 2016 08:35 PM, Yegor Yefremov wrote:
>> I have an am335x based board using CPSW in Dual EMAC mode. Without
>> VLAN IDs I can receive and send multicast packets [1]. When I create
>> VLAN ID:
>>
>> ip link add link eth1 name eth1.100 type vlan id 100
>> ip addr add 192.168.100.2/24 brd 192.168.100.255 dev eth1.100
>> route add -net 224.0.0.0 netmask 224.0.0.0 eth1.100
>>
>> I can successfully send multicast packets, but not receive them. On
>> the other side of the Ethernet cable I've used Pandaboard. Pandaboard
>> could both receive and send multicast packets via VLAN.
>
> Are you trying multicast tx/rx on eth1 or eth1.100?

 I'm trying multicast tx/rx on eth1.100.

 eth1 has no problems.

>>>
>>> it'd be nice if you would be able to post here the output from these commands:
>>> # switch-config -d [git://git.ti.com/switch-config/switch-config.git v4.1]
>>> # ifconfig -a
>>> # tcpdump -e -f -Q in -i eth0
>>> # tcpdump -e -f -Q in -i eth0.100
>>
>> Which kernel/branch do you want me to test?
>>
>> git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git and ti-rt-linux-4.1.y?
>>
>> So far I was using vanilla kernel.
> 
> Your branch (but better a 4.5 kernel (or 4.4)).
> Once you're done with the configuration, run cmds 1 & 2,
> and when you run your use case, run cmds 2 & 3 on the receiver side (grab
> ~5-10 packets).
> Then stop the test and run cmd 1 again.
> 
> After all could you provide your console output here, pls.
> 
> 

To use command 1, you need the TI kernel [1], as it won't build with a
vanilla kernel.

[1]: git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git ti-linux-4.1.y

Regards
Mugunthan V N


Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk

2016-03-29 Thread Eric Dumazet
On Tue, 2016-03-29 at 22:25 -0700, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
> > When task A holds the sock lock in tcp_sendmsg and lots of packets
> > arrive, the packets are added to the backlog queue. They are handled
> > in release_sock, called from tcp_sendmsg. When the sk_backlog is
> > detached from the sk, its length does not decrease until all the
> > packets in the backlog queue are handled. This may lead to new
> > packets being dropped because the length is too big. So set the
> > length to 0 immediately after the queue is detached from the sk.
> > 
> > Signed-off-by: Yang Yingliang 
> > ---
> >  net/core/sock.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 47fc8bb..108be05 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
> >  
> > do {
> > sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
> > +   sk->sk_backlog.len = 0;
> > bh_unlock_sock(sk);
> >  
> > do {
> 
> Certainly not.
> 
> Have you really missed the comment ?
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871
> 
> 
> I do not believe the case you describe can happen, unless a misbehaving
> driver cooks a fat skb (with skb->truesize being far bigger than
> skb->len)
> 

And also make sure you backported
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da882c1f2ecadb0ed582628ec1585e36b137c0f0


Re: [net-next 02/16] i40e/i40evf: Rewrite logic for 8 descriptor per packet check

2016-03-29 Thread Jeff Kirsher
On Tue, 2016-03-29 at 22:19 -0700, Alexander Duyck wrote:
> Minor correction here.  This is 72 hex, which is 114 in decimal.  The
> interesting bit I believe is the fact that we have both header and
> payload data in the same descriptor.
> 
> You should probably check with your hardware team on this but I have
> a
> working theory on the issue.  I'm thinking that because both header
> and payload are coming out of the same descriptor this must count as
> 2
> descriptors and not 1 when we compute the usage for the first
> descriptor.  As such I do probably need to start testing at frag 0
> because the first payload can actually come out of the header region
> if the first descriptor includes both payload and data.
> 
> I'll try to submit a patch for it tonight.  If you can test it
> tomorrow I would appreciate it.  Otherwise I will try setting up an
> environment to try and reproduce the issue tomorrow.

If you get it submitted tonight, I can have it ready in my tree for
testing for Jesse and the team when they get into the office tomorrow.



Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk

2016-03-29 Thread Eric Dumazet
On Wed, 2016-03-30 at 13:16 +0800, Yang Yingliang wrote:
> When task A holds the sock lock in tcp_sendmsg and lots of packets
> arrive, the packets are added to the backlog queue. They are handled
> in release_sock, called from tcp_sendmsg. When the sk_backlog is
> detached from the sk, its length does not decrease until all the
> packets in the backlog queue are handled. This may lead to new
> packets being dropped because the length is too big. So set the
> length to 0 immediately after the queue is detached from the sk.
> 
> Signed-off-by: Yang Yingliang 
> ---
>  net/core/sock.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 47fc8bb..108be05 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
>  
>   do {
>   sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
> + sk->sk_backlog.len = 0;
>   bh_unlock_sock(sk);
>  
>   do {

Certainly not.

Have you really missed the comment ?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871


I do not believe the case you describe can happen, unless a misbehaving
driver cooks a fat skb (with skb->truesize being far bigger than
skb->len)
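To make the objection concrete, here is a hedged, self-contained Python model (illustrative sizes, not kernel code) of the backlog accounting: `sk_backlog.len` is what bounds the memory pinned by queued-but-unprocessed skbs, so zeroing it at detach time would let a softirq flood queue a second limit's worth of data while the first is still held:

```python
LIMIT = 16384                          # stand-in for the rcvbuf-based limit

class Sock:
    def __init__(self):
        self.backlog = []              # skbs queued while the socket is owned
        self.len = 0                   # accounted truesize of the backlog

    def sk_add_backlog(self, truesize):
        # mirrors sk_rcvqueues_full(): drop once accounted len hits the limit
        if self.len + truesize > LIMIT:
            return False               # shows up as TCPBacklogDrop
        self.backlog.append(truesize)
        self.len += truesize
        return True

def release_sock(sk, arrivals, early_reset):
    """Detach and drain the backlog while `arrivals` land mid-processing."""
    queue, sk.backlog = sk.backlog, []
    if early_reset:
        sk.len = 0                     # the proposed change
    held = sum(queue)                  # memory still pinned by the old skbs
    peak = held
    for truesize in arrivals:          # softirq input while we drain
        sk.sk_add_backlog(truesize)
        peak = max(peak, held + sum(sk.backlog))
    sk.len = 0                         # the kernel resets len once drained
    return peak

def make_full():
    sk = Sock()
    for _ in range(4):                 # fill the backlog up to LIMIT
        sk.sk_add_backlog(4096)
    return sk

print(release_sock(make_full(), [4096] * 8, early_reset=False))  # 16384
print(release_sock(make_full(), [4096] * 8, early_reset=True))   # 32768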


Re: [net-next 02/16] i40e/i40evf: Rewrite logic for 8 descriptor per packet check

2016-03-29 Thread Alexander Duyck
On Tue, Mar 29, 2016 at 7:20 PM, Alexander Duyck
 wrote:
> On Tue, Mar 29, 2016 at 5:34 PM, Jesse Brandeburg
>  wrote:
>> stupid gmail... sent again to the list, sorry for duplicate (but
>> formatted better this time)
>>
>>
>> Hey Alex, this patch appears to have caused a regression (probably both
>> i40e/i40evf).  Easily reproducible running rds-stress (see the thread titled
>> "i40e card Tx resets", the middle post by sowmini has the repro steps,
>> ignore the pktgen discussion in the thread.)
>
> I'll see about setting up rds tomorrow.  It should be pretty straight forward.
>
>> I've spent some time trying to figure out the right fix, but keep getting
>> stuck in the complicated logic of the function, so I'm sending this in the
>> hope you'll take a look.
>>
>> This is (one example of) the skb that doesn't get linearized:
>> skb_print_bits:  > headlen=114 datalen=33824 data=238 mac=238 nh=252
>> h=272 gso=1448 frag=17
>
> So the code as I had written it up should be verifying we use no more
> than 8 descriptors as that was what the data sheet states.  You might
> want to request a spec update if the hardware only supports 7 data
> descriptors per segment for a TSO as that isn't what is in the
> documentation.
>
>> skb_print_bits: frag[0]: len: 256
>> skb_print_bits: frag[1]: len: 48
>> skb_print_bits: frag[2]: len: 256
>> skb_print_bits: frag[3]: len: 48
>> skb_print_bits: frag[4]: len: 256
>> skb_print_bits: frag[5]: len: 48
>> skb_print_bits: frag[6]: len: 4096
>>
>> This descriptor ^^^ is the 8th, I believe the hardware mechanism faults on
>> this input.
>
> Is this something that was previously being linearized?  It doesn't
> seem like it would be.  We are only 8 descriptors in and that 7th
> fragment should have flushed the count back to 1 with a remainder.
>
> If you are wanting to trigger a linearize with the updated code you
> would just need to update 2 lines.  Replace "I40E_MAX_BUFFER_TXD - 1"
> with just "I40E_MAX_BUFFER_TXD" in the line where we subtract that
> value from nr_frags.  Then remove one of the "sum +=
> skb_frag_size(++frag);" lines.  That should update the code so that
> you only evaluate 5 buffers at a time instead of 6.  It should force
> things to coalesce if we would span 8 or more descriptors instead of 9
> or more which is what the code currently does based on what was
> required in the EAS.
>
>> I added a print of the sum at each point it is subtracted or added to after
>> initially being set, sum7/8 are in the for loop.
>>
>> skb_print_bits: frag[7]: len: 4096
>> skb_print_bits: frag[8]: len: 48
>> skb_print_bits: frag[9]: len: 4096
>> skb_print_bits: frag[10]: len: 4096
>> skb_print_bits: frag[11]: len: 48
>> skb_print_bits: frag[12]: len: 4096
>> skb_print_bits: frag[13]: len: 4096
>> skb_print_bits: frag[14]: len: 48
>> skb_print_bits: frag[15]: len: 4096
>> skb_print_bits: frag[16]: len: 4096
>> __i40e_chk_linearize: sum1: -1399
>> __i40e_chk_linearize: sum2: -1143
>> __i40e_chk_linearize: sum3: -1095
>> __i40e_chk_linearize: sum4: -839
>> __i40e_chk_linearize: sum5: -791
>> __i40e_chk_linearize: sum7: 3305
>> __i40e_chk_linearize: sum8: 3257
>> __i40e_chk_linearize: sum7: 7353
>> __i40e_chk_linearize: sum8: 7097
>> __i40e_chk_linearize: sum7: 7145
>> __i40e_chk_linearize: sum8: 7097
>> __i40e_chk_linearize: sum7: 11193
>> __i40e_chk_linearize: sum8: 10937
>> __i40e_chk_linearize: sum7: 15033
>> __i40e_chk_linearize: sum8: 14985
>> __i40e_chk_linearize: sum7: 15033
>>
>> This was the descriptors generated from the above:
>> d[054] = 0x 0x16a021140011
>> d[055] = 0x000bfbaa40ee 0x01ca02871640
>
> len 72

Minor correction here.  This is 72 hex, which is 114 in decimal.  The
interesting bit I believe is the fact that we have both header and
payload data in the same descriptor.

You should probably check with your hardware team on this but I have a
working theory on the issue.  I'm thinking that because both header
and payload are coming out of the same descriptor this must count as 2
descriptors and not 1 when we compute the usage for the first
descriptor.  As such I do probably need to start testing at frag 0
because the first payload can actually come out of the header region
if the first descriptor includes both payload and data.

I'll try to submit a patch for it tonight.  If you can test it
tomorrow I would appreciate it.  Otherwise I will try setting up an
environment to try and reproduce the issue tomorrow.

- Alex
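Under that working theory the arithmetic is easy to check. A hedged sketch in Python (frag sizes from the dump above; the few payload bytes carried in the head are ignored since they don't change the count here):

```python
gso_size = 1448          # MSS from the skb dump
frags = [256, 48, 256, 48, 256, 48, 4096]   # frags feeding segment 1

bufs = 2                 # head descriptor = TSO header + head payload buffers
payload = 0
for size in frags:       # walk frags until the first segment is covered
    bufs += 1
    payload += size
    if payload >= gso_size:
        break

print(bufs)              # 9: one more than the 8-buffer hardware limit
```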


Re: [PATCH 06/16] wcn36xx: Fetch private sta data from sta entry instead of from vif

2016-03-29 Thread Kalle Valo
Bjorn Andersson  writes:

>> All error/warnings (new ones prefixed by >>):
>> 
>>drivers/net/wireless/ath/wcn36xx/main.c: In function 'wcn36xx_set_key':
>> >> drivers/net/wireless/ath/wcn36xx/main.c:389:9: error: implicit 
>> >> declaration of function 'wcn36xx_sta_to_priv' 
>> >> [-Werror=implicit-function-declaration]
>>  struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
>> ^
>> >> drivers/net/wireless/ath/wcn36xx/main.c:389:33: warning: initialization 
>> >> makes pointer from integer without a cast
>>  struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
>> ^
>>cc1: some warnings being treated as errors
>
> This should have been reordered with patch 7, that introduces this
> helper function. Do you want me to resend, or can you apply the patches
> out of order?

It's better that you resend the whole patchset as v2.

-- 
Kalle Valo


[PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk

2016-03-29 Thread Yang Yingliang
When task A holds the sock lock in tcp_sendmsg and lots of packets
arrive, the packets are added to the backlog queue. They are handled
in release_sock, called from tcp_sendmsg. When the sk_backlog is
detached from the sk, its length does not decrease until all the
packets in the backlog queue are handled. This may lead to new
packets being dropped because the length is too big. So set the
length to 0 immediately after the queue is detached from the sk.

Signed-off-by: Yang Yingliang 
---
 net/core/sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 47fc8bb..108be05 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1933,6 +1933,7 @@ static void __release_sock(struct sock *sk)
 
do {
sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
+   sk->sk_backlog.len = 0;
bh_unlock_sock(sk);
 
do {
-- 
2.5.0


Re: [PATCH] ethernet: mvneta: Support netpoll

2016-03-29 Thread Ezequiel Garcia
Hi David,

On 30 March 2016 at 00:09, David Miller  wrote:
> From: Ezequiel Garcia 
> Date: Mon, 28 Mar 2016 17:41:18 -0300
>
>> +/* Polled functionality used by netconsole and others in non interrupt mode */
>> +static void mvneta_poll_controller(struct net_device *dev)
>> +{
>> + struct mvneta_port *pp = netdev_priv(dev);
>> +
>> + on_each_cpu(mvneta_percpu_poll_controller, pp, false);
>> +}
>
> This doesn't work.
>
> netpoll may be invoked from any context whatsoever, even hardware
> interrupt handlers or contexts where cpu interrupts are disabled.
>
> smp_call_function() and thus on_each_cpu() may not be called with
> disabled interrupts or from a hardware interrupt handler or from
> a bottom half handler, all of which are valid situations where
> netpoll may occur since printk's can occur anywhere.

Well, I hated the idea of using on_each_cpu here, but wasn't sure
if there was a real reason that forbade it.

Thanks a lot for the feedback.
-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar


Re: [PATCH net] bpf: make padding in bpf_tunnel_key explicit

2016-03-29 Thread David Miller
From: Daniel Borkmann 
Date: Wed, 30 Mar 2016 00:02:00 +0200

> Make the 2 byte padding in struct bpf_tunnel_key between tunnel_ttl
> and tunnel_label members explicit. No issue has been observed, and
> gcc/llvm does padding for the old struct already, where tunnel_label
> was not yet present, so the current code works, but since it's part
> of uapi, make sure we don't introduce holes in structs.
> 
> Therefore, add tunnel_ext that we can use generically in future
> (f.e. to flag OAM messages for backends, etc). Also add the offset
> to the compat tests to be sure, should some compilers not pad the
> tail of the old version of bpf_tunnel_key.
> 
> Fixes: 4018ab1875e0 ("bpf: support flow label for bpf_skb_{set, 
> get}_tunnel_key")
> Signed-off-by: Daniel Borkmann 

Applied, thanks.


Re: [PATCH] net: fec: stop the "rcv is not +last, " error messages

2016-03-29 Thread Greg Ungerer
Hi Troy,

Commit 55cd48c8 ('net: fec: stop the "rcv is not +last, " error
messages') adds a write to a register that is not present in all
implementations of the FEC hardware module. None of the ColdFire
SoC parts with the FEC module have the FTRL (0x1b0) register.

Does this need a quirk flag to gate access to this register?
Or can you piggyback on the FEC_QUIRK_HAS_RACC flag?

Regards
Greg



Re: [PATCH net] net: ipv4: Multipath needs to handle unreachable nexthops

2016-03-29 Thread David Ahern

On 3/25/16 4:05 AM, Julian Anastasov wrote:


Hello,

On Thu, 24 Mar 2016, David Ahern wrote:


On 3/24/16 4:33 PM, Julian Anastasov wrote:

But for multipath routes we can also consider the
nexthops as "alternatives", so it depends on how one uses
the multipath mechanism. The ability to fallback to
another nexthop assumes one connection is allowed to
move from one ISP to another. What if the second ISP
decides to reject the connection? What we have is a
broken connection just because the retransmits
were diverted to the wrong place in a hurry. So, the
nexthops can be compatible or incompatible. For your
setup they are, for others they are not.


I am not sure I completely understand your point. Are you saying that within a
single multipath route some connections to nexthops are allowed and others are
not?

So to put that paragraph into an example

15.0.0.0/16
nexthop via 12.0.0.2  dev swp1 weight 1
nexthop via 12.0.0.3  dev swp1 weight 1

Hosts from 15.0/16 could have TCP connections use 12.0.0.2, but not 12.0.0.3
because 12.0.0.3 could be a different ISP and not allow TCP connections from
this address space?


Yes. Two cases are possible:

1. ISP2 filters saddr, traffic with saddr from ISP1 is dropped.

2. ISP2 allows any saddr. But how will the responses from
the world with daddr=IP_from_ISP1 come back over the ISP2 link?
If the nexthops are for different ISPs, the connection
can survive only if it sticks to its ISP. An ISP will
not work as a backup link for another ISP.


Seems to me this is a problem that is addressed by VRFs, not multipath 
routes where some nexthops are actually deadends because they attempt to 
cross ISPs.




After that if it has information that says that a nexthop is dead, why would
it continue to try to probe? Any traffic that selects that nh is dead. That to


If entry becomes FAILED this state is preserved
if we do not direct traffic to this entry. If there was a
single connection that was rejected after 3 failed probes
the next connection (with your patch) will fallback to
another neigh and the first entry will remain in FAILED
state until expiration. If one wants to refresh the state
often, a script/tool that pings all GWs is needed, so that
you can notice the available or failed paths faster.


me defies the basis of having multiple paths.


We do not know how long the outage is. Long-lived
connections may prefer to survive with retransmits.
Say you are using SSH over a wifi link, doing important work.
Do you want your connection to break just because the link was
down for a while?


neighbor entries have a timeout and when it drops from the cache the arp 
will try again. This suggested patch is not saying 'never try a nexthop 
again' it is saying 'I have multiple paths and since path 1 is down try 
another one'.


I'll send an updated patch when I get time (traveling at the moment); I 
guess a sysctl is going to be needed if the behavior you mention with 
ISPs is reasonable.




Re: [PATCH] ethernet: mvneta: Support netpoll

2016-03-29 Thread David Miller
From: Ezequiel Garcia 
Date: Mon, 28 Mar 2016 17:41:18 -0300

> +/* Polled functionality used by netconsole and others in non interrupt mode */
> +static void mvneta_poll_controller(struct net_device *dev)
> +{
> + struct mvneta_port *pp = netdev_priv(dev);
> +
> + on_each_cpu(mvneta_percpu_poll_controller, pp, false);
> +}

This doesn't work.

netpoll may be invoked from any context whatsoever, even hardware
interrupt handlers or contexts where cpu interrupts are disabled.

smp_call_function() and thus on_each_cpu() may not be called with
disabled interrupts or from a hardware interrupt handler or from
a bottom half handler, all of which are valid situations where
netpoll may occur since printk's can occur anywhere.


[PATCHv2 net] team: team should sync the port's uc/mc addrs when add a port

2016-03-29 Thread Xin Long
There is an issue when we use macvtap over team:
when we replug NIC links from team0, the real NICs' mc lists will no
longer include the multicast address for the macvtap. Then we can't
receive packets destined to the macvtap device, as they are filtered
out by the NICs' mc lists.

The bonding driver syncs the uc/mc addrs in bond_enslave().

Fix this issue in team by syncing the port's uc/mc addrs in
team_port_add.

Signed-off-by: Xin Long 
---
 drivers/net/team/team.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 26c64d2..a0f64cb 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1198,6 +1198,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
goto err_dev_open;
}
 
+   dev_uc_sync_multiple(port_dev, dev);
+   dev_mc_sync_multiple(port_dev, dev);
+
err = vlan_vids_add_by_dev(port_dev, dev);
if (err) {
netdev_err(dev, "Failed to add vlan ids to device %s\n",
@@ -1261,6 +1264,8 @@ err_enable_netpoll:
vlan_vids_del_by_dev(port_dev, dev);
 
 err_vids_add:
+   dev_uc_unsync(port_dev, dev);
+   dev_mc_unsync(port_dev, dev);
dev_close(port_dev);
 
 err_dev_open:
-- 
2.1.0



Re: [net-next 02/16] i40e/i40evf: Rewrite logic for 8 descriptor per packet check

2016-03-29 Thread Alexander Duyck
On Tue, Mar 29, 2016 at 5:34 PM, Jesse Brandeburg
 wrote:
> stupid gmail... sent again to the list, sorry for duplicate (but
> formatted better this time)
>
>
> Hey Alex, this patch appears to have caused a regression (probably both
> i40e/i40evf).  Easily reproducible running rds-stress (see the thread titled
> "i40e card Tx resets", the middle post by sowmini has the repro steps,
> ignore the pktgen discussion in the thread.)

I'll see about setting up rds tomorrow.  It should be pretty straight forward.

> I've spent some time trying to figure out the right fix, but keep getting
> stuck in the complicated logic of the function, so I'm sending this in the
> hope you'll take a look.
>
> This is (one example of) the skb that doesn't get linearized:
> skb_print_bits:  > headlen=114 datalen=33824 data=238 mac=238 nh=252
> h=272 gso=1448 frag=17

So the code as I had written it up should be verifying we use no more
than 8 descriptors as that was what the data sheet states.  You might
want to request a spec update if the hardware only supports 7 data
descriptors per segment for a TSO as that isn't what is in the
documentation.

> skb_print_bits: frag[0]: len: 256
> skb_print_bits: frag[1]: len: 48
> skb_print_bits: frag[2]: len: 256
> skb_print_bits: frag[3]: len: 48
> skb_print_bits: frag[4]: len: 256
> skb_print_bits: frag[5]: len: 48
> skb_print_bits: frag[6]: len: 4096
>
> This descriptor ^^^ is the 8th, I believe the hardware mechanism faults on
> this input.

Is this something that was previously being linearized?  It doesn't
seem like it would be.  We are only 8 descriptors in and that 7th
fragment should have flushed the count back to 1 with a remainder.

If you are wanting to trigger a linearize with the updated code you
would just need to update 2 lines.  Replace "I40E_MAX_BUFFER_TXD - 1"
> with just "I40E_MAX_BUFFER_TXD" in the line where we subtract that
value from nr_frags.  Then remove one of the "sum +=
skb_frag_size(++frag);" lines.  That should update the code so that
you only evaluate 5 buffers at a time instead of 6.  It should force
things to coalesce if we would span 8 or more descriptors instead of 9
or more which is what the code currently does based on what was
required in the EAS.

> I added a print of the sum at each point it is subtracted or added to after
> initially being set, sum7/8 are in the for loop.
>
> skb_print_bits: frag[7]: len: 4096
> skb_print_bits: frag[8]: len: 48
> skb_print_bits: frag[9]: len: 4096
> skb_print_bits: frag[10]: len: 4096
> skb_print_bits: frag[11]: len: 48
> skb_print_bits: frag[12]: len: 4096
> skb_print_bits: frag[13]: len: 4096
> skb_print_bits: frag[14]: len: 48
> skb_print_bits: frag[15]: len: 4096
> skb_print_bits: frag[16]: len: 4096
> __i40e_chk_linearize: sum1: -1399
> __i40e_chk_linearize: sum2: -1143
> __i40e_chk_linearize: sum3: -1095
> __i40e_chk_linearize: sum4: -839
> __i40e_chk_linearize: sum5: -791
> __i40e_chk_linearize: sum7: 3305
> __i40e_chk_linearize: sum8: 3257
> __i40e_chk_linearize: sum7: 7353
> __i40e_chk_linearize: sum8: 7097
> __i40e_chk_linearize: sum7: 7145
> __i40e_chk_linearize: sum8: 7097
> __i40e_chk_linearize: sum7: 11193
> __i40e_chk_linearize: sum8: 10937
> __i40e_chk_linearize: sum7: 15033
> __i40e_chk_linearize: sum8: 14985
> __i40e_chk_linearize: sum7: 15033
>
> This was the descriptors generated from the above:
> d[054] = 0x 0x16a021140011
> d[055] = 0x000bfbaa40ee 0x01ca02871640

len 72

> d[056] = 0x000584bcd000 0x040202871640

len 256

> d[057] = 0x000c0bfea9d0 0x00c202871640

len 48

> d[058] = 0x000584bcd100 0x040202871640

len 256

> d[059] = 0x000c0bfeaa00 0x00c202871640

len 48

> d[05a] = 0x000584bcd200 0x040202871640

len 256

> d[05b] = 0x000c0bfeaa30 0x00c202871640

len 48

> d[05c] = 0x00056d5f 0x400202871640

len 4096

From what I can see we are at 8 descriptors and have exceeded 1 MSS.
According to the EAS this is supposed to work.

> d[05d] = 0x00056d5f1000 0x400202871640
> d[05e] = 0x000c0bfeaa60 0x00c202871640
> d[05f] = 0x0005f2762000 0x400202871640
> d[060] = 0x0005f765e000 0x400202871640
> d[061] = 0x000c0bfeaa90 0x00c202871640
> d[062] = 0x000574928000 0x400202871640
> d[063] = 0x000568ba5000 0x400202871640
> d[064] = 0x000c0bfeaac0 0x00c202871640
> d[065] = 0x0005f68cd000 0x400202871640
> d[066] = 0x000585a2a000 0x400202871670
>
>
> On Fri, Feb 19, 2016 at 3:54 AM Jeff Kirsher 
> wrote:
>>
>> From: Alexander Duyck 
>>
>> This patch is meant to rewrite the logic for how we determine if we can
>> transmit the frame or if it needs to be linearized.
>>
> +   /* Initialize size to the negative value of gso_size minus 1.  We
> +* use this as the worst case scenario in which the frag ahead
> +* of us only provides one byte which is why we are limited to 6
> +* descriptors for a single transmit as the header and previous
> +* fragment are already consuming 2 descriptors.
> +*/

[PATCH net-next v3 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup

2016-03-29 Thread Patrick Uiterwijk
Some of the vendor-specific bootloaders set up this part
of the initialization for us, so this was never added.
However, since upstream bootloaders don't initialize the
chip specifically, they leave the fiber MII's PDOWN flag
set, which means that the CPU port doesn't connect.

This patch checks whether this flag has already been cleared
by something else, and if not, clears it.

Reviewed-by: Andrew Lunn 
Signed-off-by: Patrick Uiterwijk 
---
 drivers/net/dsa/mv88e6xxx.c | 36 
 drivers/net/dsa/mv88e6xxx.h |  8 
 2 files changed, 44 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 86a2029..50454be 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2296,6 +2296,25 @@ restore_page_0:
return ret;
 }
 
+static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
+{
+   int ret;
+
+   ret = _mv88e6xxx_phy_page_read(ds, REG_FIBER_SERDES, PAGE_FIBER_SERDES,
+  MII_BMCR);
+   if (ret < 0)
+   return ret;
+
+   if (ret & BMCR_PDOWN) {
+   ret &= ~BMCR_PDOWN;
+   ret = _mv88e6xxx_phy_page_write(ds, REG_FIBER_SERDES,
+   PAGE_FIBER_SERDES, MII_BMCR,
+   ret);
+   }
+
+   return ret;
+}
+
 static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -2399,6 +2418,23 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, 
int port)
goto abort;
}
 
+   /* If this port is connected to a SerDes, make sure the SerDes is not
+* powered down.
+*/
+   if (mv88e6xxx_6352_family(ds)) {
+   ret = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_STATUS);
+   if (ret < 0)
+   goto abort;
+   ret &= PORT_STATUS_CMODE_MASK;
+   if ((ret == PORT_STATUS_CMODE_100BASE_X) ||
+   (ret == PORT_STATUS_CMODE_1000BASE_X) ||
+   (ret == PORT_STATUS_CMODE_SGMII)) {
+   ret = mv88e6xxx_power_on_serdes(ds);
+   if (ret < 0)
+   goto abort;
+   }
+   }
+
/* Port Control 2: don't force a good FCS, set the maximum frame size to
 * 10240 bytes, disable 802.1q tags checking, don't discard tagged or
 * untagged frames on this port, do a destination address lookup on all
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 9a038ab..26a424a 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -28,6 +28,10 @@
 #define SMI_CMD_OP_45_READ_DATA_INC((3 << 10) | SMI_CMD_BUSY)
 #define SMI_DATA   0x01
 
+/* Fiber/SERDES Registers are located at SMI address F, page 1 */
+#define REG_FIBER_SERDES   0x0f
+#define PAGE_FIBER_SERDES  0x01
+
 #define REG_PORT(p)(0x10 + (p))
 #define PORT_STATUS0x00
 #define PORT_STATUS_PAUSE_EN   BIT(15)
@@ -45,6 +49,10 @@
 #define PORT_STATUS_MGMII  BIT(6) /* 6185 */
 #define PORT_STATUS_TX_PAUSED  BIT(5)
 #define PORT_STATUS_FLOW_CTRL  BIT(4)
+#define PORT_STATUS_CMODE_MASK 0x0f
+#define PORT_STATUS_CMODE_100BASE_X0x8
+#define PORT_STATUS_CMODE_1000BASE_X   0x9
+#define PORT_STATUS_CMODE_SGMII0xa
 #define PORT_PCS_CTRL  0x01
 #define PORT_PCS_CTRL_RGMII_DELAY_RXCLKBIT(15)
 #define PORT_PCS_CTRL_RGMII_DELAY_TXCLKBIT(14)
-- 
2.7.4



[PATCH net-next v3 1/2] net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read,write}

2016-03-29 Thread Patrick Uiterwijk
Add versions of the phy_page_read and _write functions to
be used in a context where the SMI mutex is held.

Tested-by: Vivien Didelot 
Reviewed-by: Vivien Didelot 
Signed-off-by: Patrick Uiterwijk 
---
 drivers/net/dsa/mv88e6xxx.c | 49 +
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index fa086e0..86a2029 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2264,6 +2264,38 @@ static void mv88e6xxx_bridge_work(struct work_struct 
*work)
mutex_unlock(&ps->smi_mutex);
 }
 
+static int _mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
+int reg, int val)
+{
+   int ret;
+
+   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
+   if (ret < 0)
+   goto restore_page_0;
+
+   ret = _mv88e6xxx_phy_write_indirect(ds, port, reg, val);
+restore_page_0:
+   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+
+   return ret;
+}
+
+static int _mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page,
+   int reg)
+{
+   int ret;
+
+   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
+   if (ret < 0)
+   goto restore_page_0;
+
+   ret = _mv88e6xxx_phy_read_indirect(ds, port, reg);
+restore_page_0:
+   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+
+   return ret;
+}
+
 static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -2714,13 +2746,9 @@ int mv88e6xxx_phy_page_read(struct dsa_switch *ds, int 
port, int page, int reg)
int ret;
 
mutex_lock(&ps->smi_mutex);
-   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
-   if (ret < 0)
-   goto error;
-   ret = _mv88e6xxx_phy_read_indirect(ds, port, reg);
-error:
-   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+   ret = _mv88e6xxx_phy_page_read(ds, port, page, reg);
mutex_unlock(&ps->smi_mutex);
+
return ret;
 }
 
@@ -2731,14 +2759,9 @@ int mv88e6xxx_phy_page_write(struct dsa_switch *ds, int 
port, int page,
int ret;
 
mutex_lock(&ps->smi_mutex);
-   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
-   if (ret < 0)
-   goto error;
-
-   ret = _mv88e6xxx_phy_write_indirect(ds, port, reg, val);
-error:
-   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+   ret = _mv88e6xxx_phy_page_write(ds, port, page, reg, val);
mutex_unlock(&ps->smi_mutex);
+
return ret;
 }
 
-- 
2.7.4



Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup

2016-03-29 Thread Andrew Lunn
On Wed, Mar 30, 2016 at 12:58:04AM +, Patrick Uiterwijk wrote:
> Hi Vivien,
> 
> On Tue, Mar 29, 2016 at 6:49 PM, Vivien Didelot
>  wrote:
> > Hi Andrew, Patrick,
> >
> > Andrew Lunn  writes:
> >
> >> On Tue, Mar 29, 2016 at 12:23:06PM -0400, Vivien Didelot wrote:
> >>> Hi Patrick,
> >>>
> >>> Two comments below.
> >>>
> >>> Patrick Uiterwijk  writes:
> >>>
> >>> > +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
> >>>
> >>> Since this function assumes the SMI lock is already held, its name
> >>> should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).
> >>
> >> We decided to drop that, since nearly everything would end up with a _
> >> prefix. The assert_smi_lock() should find any missing locks, and
> >> lockdep/deadlocks will make it clear when the lock is taken twice.
> >
> > OK, I didn't know that. This makes sense. There is no need to respin a
> > v3 only for my previous &= comment then.
> 
> Does that mean the merger will fix this up?
> Or that I'll roll a v3 when I get a reviewed-by for the second patch?

Hi Patrick

Roll a v3, and you can add

Reviewed-by: Andrew Lunn 

as well as Viviens for patch #1.

 Andrew


Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup

2016-03-29 Thread Patrick Uiterwijk
Hi Vivien,

On Tue, Mar 29, 2016 at 6:49 PM, Vivien Didelot
 wrote:
> Hi Andrew, Patrick,
>
> Andrew Lunn  writes:
>
>> On Tue, Mar 29, 2016 at 12:23:06PM -0400, Vivien Didelot wrote:
>>> Hi Patrick,
>>>
>>> Two comments below.
>>>
>>> Patrick Uiterwijk  writes:
>>>
>>> > +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
>>>
>>> Since this function assumes the SMI lock is already held, its name
>>> should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).
>>
>> We decided to drop that, since nearly everything would end up with a _
>> prefix. The assert_smi_lock() should find any missing locks, and
>> lockdep/deadlocks will make it clear when the lock is taken twice.
>
> OK, I didn't know that. This makes sense. There is no need to respin a
> v3 only for my previous &= comment then.

Does that mean the merger will fix this up?
Or that I'll roll a v3 when I get a reviewed-by for the second patch?

Thanks,
Patrick


Re: [PATCH net-next] tcp: remove cwnd moderation after recovery

2016-03-29 Thread Stephen Hemminger
On Tue, 29 Mar 2016 17:15:52 -0700
Yuchung Cheng  wrote:

> For non-SACK connections, cwnd is lowered to inflight plus 3 packets
> when the recovery ends. This is an optional feature in the NewReno
> RFC 2582 to reduce the potential burst when cwnd is "re-opened"
> after recovery and inflight is low.
> 
> This feature is questionably effective because of PRR: when
> the recovery ends (i.e., snd_una == high_seq) NewReno holds the
> CA_Recovery state for another round trip to prevent false fast
> retransmits. But if the inflight is low, PRR will overwrite the
> moderated cwnd in tcp_cwnd_reduction() later.
> 
> On the other hand, if the recovery ends because the sender
> detects the losses were spurious (e.g., reordering), this feature
> unconditionally lowers a reverted cwnd even though nothing
> was lost.
> 
> On principle, the loss recovery module should not update cwnd.
> Further, pacing is much more effective at reducing bursts. Hence this
> patch removes the cwnd moderation feature.
> 
> Signed-off-by: Matt Mathis 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Soheil Hassas Yeganeh 

I have a concern that this might break Linux's builtin protection
against hostile receivers sending bogus ACKs.  Remember Linux is
different from NewReno. You are changing something that has existed for
a long, long time.


Re: [net-next 02/16] i40e/i40evf: Rewrite logic for 8 descriptor per packet check

2016-03-29 Thread Jesse Brandeburg
stupid gmail... sent again to the list, sorry for duplicate (but
formatted better this time)


Hey Alex, this patch appears to have caused a regression (probably both
i40e/i40evf).  Easily reproducible running rds-stress (see the thread titled
"i40e card Tx resets", the middle post by sowmini has the repro steps,
ignore the pktgen discussion in the thread.)

I've spent some time trying to figure out the right fix, but keep getting
stuck in the complicated logic of the function, so I'm sending this in the
hope you'll take a look.

This is (one example of) the skb that doesn't get linearized:
skb_print_bits:  > headlen=114 datalen=33824 data=238 mac=238 nh=252
h=272 gso=1448 frag=17
skb_print_bits: frag[0]: len: 256
skb_print_bits: frag[1]: len: 48
skb_print_bits: frag[2]: len: 256
skb_print_bits: frag[3]: len: 48
skb_print_bits: frag[4]: len: 256
skb_print_bits: frag[5]: len: 48
skb_print_bits: frag[6]: len: 4096

This descriptor ^^^ is the 8th, I believe the hardware mechanism faults on
this input.

I added a print of the sum at each point it is subtracted or added to after
initially being set, sum7/8 are in the for loop.

skb_print_bits: frag[7]: len: 4096
skb_print_bits: frag[8]: len: 48
skb_print_bits: frag[9]: len: 4096
skb_print_bits: frag[10]: len: 4096
skb_print_bits: frag[11]: len: 48
skb_print_bits: frag[12]: len: 4096
skb_print_bits: frag[13]: len: 4096
skb_print_bits: frag[14]: len: 48
skb_print_bits: frag[15]: len: 4096
skb_print_bits: frag[16]: len: 4096
__i40e_chk_linearize: sum1: -1399
__i40e_chk_linearize: sum2: -1143
__i40e_chk_linearize: sum3: -1095
__i40e_chk_linearize: sum4: -839
__i40e_chk_linearize: sum5: -791
__i40e_chk_linearize: sum7: 3305
__i40e_chk_linearize: sum8: 3257
__i40e_chk_linearize: sum7: 7353
__i40e_chk_linearize: sum8: 7097
__i40e_chk_linearize: sum7: 7145
__i40e_chk_linearize: sum8: 7097
__i40e_chk_linearize: sum7: 11193
__i40e_chk_linearize: sum8: 10937
__i40e_chk_linearize: sum7: 15033
__i40e_chk_linearize: sum8: 14985
__i40e_chk_linearize: sum7: 15033

This was the descriptors generated from the above:
d[054] = 0x 0x16a021140011
d[055] = 0x000bfbaa40ee 0x01ca02871640
d[056] = 0x000584bcd000 0x040202871640
d[057] = 0x000c0bfea9d0 0x00c202871640
d[058] = 0x000584bcd100 0x040202871640
d[059] = 0x000c0bfeaa00 0x00c202871640
d[05a] = 0x000584bcd200 0x040202871640
d[05b] = 0x000c0bfeaa30 0x00c202871640
d[05c] = 0x00056d5f 0x400202871640
d[05d] = 0x00056d5f1000 0x400202871640
d[05e] = 0x000c0bfeaa60 0x00c202871640
d[05f] = 0x0005f2762000 0x400202871640
d[060] = 0x0005f765e000 0x400202871640
d[061] = 0x000c0bfeaa90 0x00c202871640
d[062] = 0x000574928000 0x400202871640
d[063] = 0x000568ba5000 0x400202871640
d[064] = 0x000c0bfeaac0 0x00c202871640
d[065] = 0x0005f68cd000 0x400202871640
d[066] = 0x000585a2a000 0x400202871670


On Fri, Feb 19, 2016 at 3:54 AM Jeff Kirsher 
wrote:
>
> From: Alexander Duyck 
>
> This patch is meant to rewrite the logic for how we determine if we can
> transmit the frame or if it needs to be linearized.
>
> +   /* Initialize size to the negative value of gso_size minus 1.  We
> +* use this as the worst case scenario in which the frag ahead
> +* of us only provides one byte which is why we are limited to 6
> +* descriptors for a single transmit as the header and previous
> +* fragment are already consuming 2 descriptors.
> +*/
> +   sum = 1 - gso_size;
> +
> +   /* Add size of frags 1 through 5 to create our initial sum */
> +   sum += skb_frag_size(++frag);


I'm pretty sure this code skips frag[0] due to the pre-increment.  The
bug seems to occur in the algorithm because skb->data contains the
L2/L3/L4 headers plus some data, and should count as 1 descriptor for
every subsequently sent packet; if the incoming skb has 7 chunks that
add up to < MSS then we get in trouble.

> +   sum += skb_frag_size(++frag);
> +   sum += skb_frag_size(++frag);
> +   sum += skb_frag_size(++frag);
> +   sum += skb_frag_size(++frag);
> +
> +   /* Walk through fragments adding latest fragment, testing it, and
> +* then removing stale fragments from the sum.
> +*/
> +   stale = &skb_shinfo(skb)->frags[0];
> +   for (;;) {
> +   sum += skb_frag_size(++frag);
> +
> +   /* if sum is negative we failed to make sufficient progress */
> +   if (sum < 0)
> +   return true;
> +
> +   /* use pre-decrement to avoid processing last fragment */
> +   if (!--nr_frags)
> +   break;
> +
> +   sum -= skb_frag_size(++stale);


I think this line also skips stale[0]


Re: [PATCH RFC net-next] net: core: Pass XPS select queue decision to skb_tx_hash

2016-03-29 Thread John Fastabend
On 16-03-29 03:24 PM, Saeed Mahameed wrote:
> Currently the XPS select queue decision is final and overrides/ignores
> all other select queue parameters such as QoS TC and RX queue recording.
> 
> This patch makes the get_xps_queue value a hint for skb_tx_hash, which
> will decide whether to use this hint as is or to tweak it a little to
> provide the correct TXQ.
> 
> This will fix bugs in cases where TC QoS offload (tc_to_txq mapping)
> and XPS mapping are both configured but select queue only respects the
> XPS configuration, which skips the TC QoS queue selection and thus does
> not satisfy the QoS configuration.
> 
> RFC because I want to discuss how we would like the final behavior of the
> __netdev_pick_tx, with this patch it goes as follows:
> 
> netdev_pick_tx(skb):
> hint = get_xps_queue
> txq = skb_tx_hash(skb, hint)
> 
> skb_tx_hash(skb, hint):
> if (skb_rx_queue_recorded(skb))
> return skb_get_rx_queue(skb);
> 
> queue_offset = 0;
> if (dev->num_tc)
> queue_offset = tc_queue_offset[tc];
> 
> hash = hint < 0 ? skb_get_hash(skb) : hint;
> return hash + queue_offset;
> 
> i.e: instead of blindly returning the XPS decision, we pass it to
> skb_tx_hash, which can make the final decision for us; select queue
> will now respect and combine the XPS and TC QoS tc_to_txq mappings.
> 
> Also there is one additional behavioral change that recorded rx queues will
> now override the XPS configuration.
> 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c |2 +-
>  include/linux/netdevice.h  |6 +++---
>  net/core/dev.c |   10 ++
>  3 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index c0d7b72..873cf49 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -693,7 +693,7 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct 
> sk_buff *skb,
>   u8 up = 0;
>  
>   if (dev->num_tc)
> - return skb_tx_hash(dev, skb);
> + return skb_tx_hash(dev, skb, -1);
>  

I would prefer not to have another strange quirk users have to remember
in order to do tx classification. So with this change, the queue
selection precedence changes depending on the driver. In short I agree
with the problem statement but think we can find a better solution.

One idea that comes to mind: we could have a tc action to force the
queue selection. Now that we have the egress tc hook it would probably
be fairly cheap to implement, and if users want this behavior they can
ask for it explicitly.

If you're thinking about tc stuff, we could fix the tooling to set this
action whenever DCB is turned on, hardware rate limiting is enabled,
etc. And even if we wanted, we could have the driver add the rule in
the cases where firmware protocols are configuring the QoS, etc.


>   if (skb_vlan_tag_present(skb))
>   up = skb_vlan_tag_get(skb) >> VLAN_PRIO_SHIFT;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index cb0d5d0..ad81ffe 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3130,16 +3130,16 @@ static inline int netif_set_xps_queue(struct 
> net_device *dev,
>  #endif
>  
>  u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
> -   unsigned int num_tx_queues);
> +   unsigned int num_tx_queues, int txq_hint);
>  

[...]

And all this seems like it would only ever be called by drivers' select
queue routines, which I really wish we could kill off one of these days
instead of adding to. Now if the signal is something higher in the
stack and not the driver, I think it is OK.

.John


[PATCH net-next] tcp: remove cwnd moderation after recovery

2016-03-29 Thread Yuchung Cheng
For non-SACK connections, cwnd is lowered to inflight plus 3 packets
when the recovery ends. This is an optional feature in the NewReno
RFC 2582 to reduce the potential burst when cwnd is "re-opened"
after recovery and inflight is low.

This feature is questionably effective because of PRR: when
the recovery ends (i.e., snd_una == high_seq) NewReno holds the
CA_Recovery state for another round trip to prevent false fast
retransmits. But if the inflight is low, PRR will overwrite the
moderated cwnd in tcp_cwnd_reduction() later.

On the other hand, if the recovery ends because the sender
detects the losses were spurious (e.g., reordering), this feature
unconditionally lowers a reverted cwnd even though nothing
was lost.

On principle, the loss recovery module should not update cwnd.
Further, pacing is much more effective at reducing bursts. Hence this
patch removes the cwnd moderation feature.

Signed-off-by: Matt Mathis 
Signed-off-by: Neal Cardwell 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h| 11 ---
 net/ipv4/tcp_input.c | 11 ---
 2 files changed, 22 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index b91370f..f8bb4a4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1039,17 +1039,6 @@ static inline __u32 tcp_max_tso_deferred_mss(const 
struct tcp_sock *tp)
return 3;
 }
 
-/* Slow start with delack produces 3 packets of burst, so that
- * it is safe "de facto".  This will be the default - same as
- * the default reordering threshold - but if reordering increases,
- * we must be able to allow cwnd to burst at least this much in order
- * to not pull it back when holes are filled.
- */
-static __inline__ __u32 tcp_max_burst(const struct tcp_sock *tp)
-{
-   return tp->reordering;
-}
-
 /* Returns end sequence number of the receiver's advertised window */
 static inline u32 tcp_wnd_end(const struct tcp_sock *tp)
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e6e65f7..f87b84a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2252,16 +2252,6 @@ static void tcp_update_scoreboard(struct sock *sk, int 
fast_rexmit)
}
 }
 
-/* CWND moderation, preventing bursts due to too big ACKs
- * in dubious situations.
- */
-static inline void tcp_moderate_cwnd(struct tcp_sock *tp)
-{
-   tp->snd_cwnd = min(tp->snd_cwnd,
-  tcp_packets_in_flight(tp) + tcp_max_burst(tp));
-   tp->snd_cwnd_stamp = tcp_time_stamp;
-}
-
 static bool tcp_tsopt_ecr_before(const struct tcp_sock *tp, u32 when)
 {
return tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
@@ -2410,7 +2400,6 @@ static bool tcp_try_undo_recovery(struct sock *sk)
/* Hold old state until something *above* high_seq
 * is ACKed. For Reno it is MUST to prevent false
 * fast retransmits (RFC2582). SACK TCP is safe. */
-   tcp_moderate_cwnd(tp);
if (!tcp_any_retrans_done(sk))
tp->retrans_stamp = 0;
return true;
-- 
2.8.0.rc3.226.g39d4020



Best way to reduce system call overhead for tun device I/O?

2016-03-29 Thread Guus Sliepen
I'm trying to reduce system call overhead when reading/writing to/from a
tun device in userspace. For sockets, one can use sendmmsg()/recvmmsg(),
but a tun fd is not a socket fd, so this doesn't work. I see several
options to allow userspace to read/write multiple packets with one
syscall:

- Implement a TX/RX ring buffer that is mmap()ed, like with AF_PACKET
  sockets.

- Implement a ioctl() to emulate sendmmsg()/recvmmsg().

- Add a flag that can be set using TUNSETIFF that makes regular
  read()/write() calls handle multiple packets in one go.

- Expose a socket fd to userspace, so regular sendmmsg()/recvmmsg() can
  be used. There is tun_get_socket() which is used internally in the
  kernel, but this is not exposed to userspace, and doesn't look trivial
  to do either.

What would be the right way to do this?

-- 
Met vriendelijke groet / with kind regards,
 Guus Sliepen 


[PATCH RFC net-next] net: core: Pass XPS select queue decision to skb_tx_hash

2016-03-29 Thread Saeed Mahameed
Currently the XPS select queue decision is final and overrides/ignores
all other select queue parameters such as QoS TC and RX queue recording.

This patch makes the get_xps_queue value a hint for skb_tx_hash, which
will decide whether to use this hint as is or to tweak it a little to
provide the correct TXQ.

This will fix bugs in cases where TC QoS offload (tc_to_txq mapping)
and XPS mapping are both configured but select queue only respects the
XPS configuration, which skips the TC QoS queue selection and thus does
not satisfy the QoS configuration.

RFC because I want to discuss how we would like the final behavior of the
__netdev_pick_tx, with this patch it goes as follows:

netdev_pick_tx(skb):
hint = get_xps_queue
txq = skb_tx_hash(skb, hint)

skb_tx_hash(skb, hint):
if (skb_rx_queue_recorded(skb))
return skb_get_rx_queue(skb);

queue_offset = 0;
if (dev->num_tc)
queue_offset = tc_queue_offset[tc];

hash = hint < 0 ? skb_get_hash(skb) : hint;
return hash + queue_offset;

i.e: instead of blindly returning the XPS decision, we pass it to
skb_tx_hash, which can make the final decision for us; select queue
will now respect and combine the XPS and TC QoS tc_to_txq mappings.

Also there is one additional behavioral change that recorded rx queues will
now override the XPS configuration.

Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |2 +-
 include/linux/netdevice.h  |6 +++---
 net/core/dev.c |   10 ++
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index c0d7b72..873cf49 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -693,7 +693,7 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct 
sk_buff *skb,
u8 up = 0;
 
if (dev->num_tc)
-   return skb_tx_hash(dev, skb);
+   return skb_tx_hash(dev, skb, -1);
 
if (skb_vlan_tag_present(skb))
up = skb_vlan_tag_get(skb) >> VLAN_PRIO_SHIFT;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb0d5d0..ad81ffe 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3130,16 +3130,16 @@ static inline int netif_set_xps_queue(struct net_device 
*dev,
 #endif
 
 u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
- unsigned int num_tx_queues);
+ unsigned int num_tx_queues, int txq_hint);
 
 /*
  * Returns a Tx hash for the given packet when dev->real_num_tx_queues is used
  * as a distribution range limit for the returned value.
  */
 static inline u16 skb_tx_hash(const struct net_device *dev,
- struct sk_buff *skb)
+ struct sk_buff *skb, int txq_hint)
 {
-   return __skb_tx_hash(dev, skb, dev->real_num_tx_queues);
+   return __skb_tx_hash(dev, skb, dev->real_num_tx_queues, txq_hint);
 }
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index b9bcbe7..ff640b7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2394,7 +2394,7 @@ EXPORT_SYMBOL(netif_device_attach);
  * to be used as a distribution range.
  */
 u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
- unsigned int num_tx_queues)
+ unsigned int num_tx_queues, int txq_hint)
 {
u32 hash;
u16 qoffset = 0;
@@ -2411,9 +2411,12 @@ u16 __skb_tx_hash(const struct net_device *dev, struct 
sk_buff *skb,
u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
qoffset = dev->tc_to_txq[tc].offset;
qcount = dev->tc_to_txq[tc].count;
+   if (txq_hint >= qcount) /* This can happen when xps is configured on TC TXQs */
+   txq_hint = -1; /* recalculate TXQ hash */
}
 
-   return (u16) reciprocal_scale(skb_get_hash(skb), qcount) + qoffset;
+   hash = txq_hint < 0 ? reciprocal_scale(skb_get_hash(skb), qcount) : txq_hint;
+   return (u16)(hash) + qoffset;
 }
 EXPORT_SYMBOL(__skb_tx_hash);
 
@@ -3201,8 +3204,7 @@ static u16 __netdev_pick_tx(struct net_device *dev, 
struct sk_buff *skb)
if (queue_index < 0 || skb->ooo_okay ||
queue_index >= dev->real_num_tx_queues) {
int new_index = get_xps_queue(dev, skb);
-   if (new_index < 0)
-   new_index = skb_tx_hash(dev, skb);
+   new_index = skb_tx_hash(dev, skb, new_index);
 
if (queue_index != new_index && sk &&
sk_fullsock(sk) &&
-- 
1.7.1



Re: [PATCH] bridge: Allow set bridge ageing time when switchdev disabled

2016-03-29 Thread Ido Schimmel
Tue, Mar 29, 2016 at 01:48:08PM IDT, yanhaishu...@cmss.chinamobile.com wrote:
>When NET_SWITCHDEV=n, switchdev_port_attr_set will return -EOPNOTSUPP,
>we should ignore this error code and continue to set the ageing time.
>
>Signed-off-by: Haishuang Yan 

Fixes: c62987bbd8a1 ("bridge: push bridge setting ageing_time down to switchdev")
Acked-by: Ido Schimmel 

Thank you.


Re: [PATCH net] bpf: make padding in bpf_tunnel_key explicit

2016-03-29 Thread Alexei Starovoitov
On Wed, Mar 30, 2016 at 12:02:00AM +0200, Daniel Borkmann wrote:
> Make the 2 byte padding in struct bpf_tunnel_key between tunnel_ttl
> and tunnel_label members explicit. No issue has been observed, and
> gcc/llvm does padding for the old struct already, where tunnel_label
> was not yet present, so the current code works, but since it's part
> of uapi, make sure we don't introduce holes in structs.
> 
> Therefore, add tunnel_ext that we can use generically in future
> (f.e. to flag OAM messages for backends, etc). Also add the offset
> to the compat tests to be sure should some compilers not pad the
> tail of the old version of bpf_tunnel_key.
> 
> Fixes: 4018ab1875e0 ("bpf: support flow label for bpf_skb_{set,get}_tunnel_key")
> Signed-off-by: Daniel Borkmann 

Acked-by: Alexei Starovoitov 



[PATCH net] bpf: make padding in bpf_tunnel_key explicit

2016-03-29 Thread Daniel Borkmann
Make the 2 byte padding in struct bpf_tunnel_key between tunnel_ttl
and tunnel_label members explicit. No issue has been observed, and
gcc/llvm does padding for the old struct already, where tunnel_label
was not yet present, so the current code works, but since it's part
of uapi, make sure we don't introduce holes in structs.

Therefore, add tunnel_ext that we can use generically in future
(f.e. to flag OAM messages for backends, etc). Also add the offset
to the compat tests to be sure should some compilers not pad the
tail of the old version of bpf_tunnel_key.

Fixes: 4018ab1875e0 ("bpf: support flow label for bpf_skb_{set,get}_tunnel_key")
Signed-off-by: Daniel Borkmann 
---
 include/uapi/linux/bpf.h | 1 +
 net/core/filter.c| 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 924f537..23917bb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -375,6 +375,7 @@ struct bpf_tunnel_key {
};
__u8 tunnel_tos;
__u8 tunnel_ttl;
+   __u16 tunnel_ext;
__u32 tunnel_label;
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index b7177d0..4b81b71 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1764,6 +1764,7 @@ static u64 bpf_skb_get_tunnel_key(u64 r1, u64 r2, u64 
size, u64 flags, u64 r5)
if (unlikely(size != sizeof(struct bpf_tunnel_key))) {
switch (size) {
case offsetof(struct bpf_tunnel_key, tunnel_label):
+   case offsetof(struct bpf_tunnel_key, tunnel_ext):
goto set_compat;
case offsetof(struct bpf_tunnel_key, remote_ipv6[1]):
/* Fixup deprecated structure layouts here, so we have
@@ -1849,6 +1850,7 @@ static u64 bpf_skb_set_tunnel_key(u64 r1, u64 r2, u64 
size, u64 flags, u64 r5)
if (unlikely(size != sizeof(struct bpf_tunnel_key))) {
switch (size) {
case offsetof(struct bpf_tunnel_key, tunnel_label):
+   case offsetof(struct bpf_tunnel_key, tunnel_ext):
case offsetof(struct bpf_tunnel_key, remote_ipv6[1]):
/* Fixup deprecated structure layouts here, so we have
 * a common path later on.
@@ -1861,7 +1863,8 @@ static u64 bpf_skb_set_tunnel_key(u64 r1, u64 r2, u64 
size, u64 flags, u64 r5)
return -EINVAL;
}
}
-   if (unlikely(!(flags & BPF_F_TUNINFO_IPV6) && from->tunnel_label))
+   if (unlikely((!(flags & BPF_F_TUNINFO_IPV6) && from->tunnel_label) ||
+from->tunnel_ext))
return -EINVAL;
 
skb_dst_drop(skb);
-- 
1.9.3



[net PATCH v2] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-29 Thread Alexander Duyck
This patch should fix the issues seen with a recent fix to prevent
tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
correct for now as long as we do not add any devices that support
NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
potential to mess things up due to the fact that the outer transport header
points to the outer UDP header and not the GRE header as would be expected.

Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of encapsulation.")
Signed-off-by: Alexander Duyck 
---

v2: Dropped switch statements per suggestion of Tom Herbert.

 net/ipv4/fou.c |   16 
 1 file changed, 16 insertions(+)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index a0586b4a197d..5a94aea280d3 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -195,6 +195,14 @@ static struct sk_buff **fou_gro_receive(struct sk_buff 
**head,
u8 proto = NAPI_GRO_CB(skb)->proto;
const struct net_offload **offloads;
 
+   /* We can clear the encap_mark for FOU as we are essentially doing
+* one of two possible things.  We are either adding an L4 tunnel
+* header to the outer L3 tunnel header, or we are simply
+* treating the GRE tunnel header as though it is a UDP protocol
+* specific header such as VXLAN or GENEVE.
+*/
+   NAPI_GRO_CB(skb)->encap_mark = 0;
+
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[proto]);
@@ -352,6 +360,14 @@ static struct sk_buff **gue_gro_receive(struct sk_buff 
**head,
}
}
 
+   /* We can clear the encap_mark for GUE as we are essentially doing
+* one of two possible things.  We are either adding an L4 tunnel
+* header to the outer L3 tunnel header, or we are simply
+* treating the GRE tunnel header as though it is a UDP protocol
+* specific header such as VXLAN or GENEVE.
+*/
+   NAPI_GRO_CB(skb)->encap_mark = 0;
+
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[guehdr->proto_ctype]);
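The encap_mark logic can be modeled in a few lines of C. All names here are invented for illustration; only the mark-clearing idea follows the patch:

```c
#include <assert.h>

/* Minimal model of GRO's one-level encapsulation guard: encap_mark is
 * a single bit in the GRO control block, so only one layer of tunnel
 * encapsulation is normally aggregated per packet. */
struct gro_cb {
	unsigned int encap_mark : 1;
};

/* An inner tunnel protocol refuses to aggregate a second layer. */
static int tunnel_gro_receive(struct gro_cb *cb)
{
	if (cb->encap_mark)
		return -1;	/* already inside a tunnel: skip GRO */
	cb->encap_mark = 1;
	return 0;
}

/* FOU/GUE clear the mark first: the UDP header they strip belongs to
 * the same tunnel level, so the real inner protocol still gets its
 * one level of aggregation. */
static int fou_gro_receive(struct gro_cb *cb)
{
	cb->encap_mark = 0;
	return tunnel_gro_receive(cb);
}
```

Without the clearing step, an IPIP-over-FOU packet would hit the guard twice and lose GRO entirely, which is the regression the patch addresses.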



Re: [RESEND PATCH 15/16] wcn36xx: don't pad beacons for mesh

2016-03-29 Thread Bjorn Andersson
On Tue 29 Mar 14:41 PDT 2016, Bjorn Andersson wrote:

> From: Jason Mobarak 
> 
> Patch "wcn36xx: Pad TIM PVM if needed" has caused a regression in mesh
> beaconing.  The field tim_off is always 0 for mesh mode, and thus
> pvm_len (referring to the TIM length field) and pad are both incorrectly
> calculated.  Thus, msg_body.beacon_length is incorrectly calculated for
> mesh mode. Fix this.
> 
> Fixes: 8ad99a4e3ee5 ("wcn36xx: Pad TIM PVM if needed")
> Signed-off-by: Jason Mobarak 
> Signed-off-by: Chun-Yeow Yeoh 
> Signed-off-by: Bjorn Andersson 
> ---
> 
> Resend this single patch with included Fixes tag.
> 

Sorry for the spam, I read the git log incorrectly. The patch referred
to is part of this series, so the sha1 is bogus.

Regards,
Bjorn

>  drivers/net/wireless/ath/wcn36xx/smd.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
> b/drivers/net/wireless/ath/wcn36xx/smd.c
> index a57d158298a1..b1bdc229e560 100644
> --- a/drivers/net/wireless/ath/wcn36xx/smd.c
> +++ b/drivers/net/wireless/ath/wcn36xx/smd.c
> @@ -1410,6 +1410,11 @@ int wcn36xx_smd_send_beacon(struct wcn36xx *wcn, 
> struct ieee80211_vif *vif,
>  
>   pvm_len = skb_beacon->data[tim_off + 1] - 3;
>   pad = TIM_MIN_PVM_SIZE - pvm_len;
> +
> + /* Padding is irrelevant to mesh mode since tim_off is always 0. */
> + if (vif->type == NL80211_IFTYPE_MESH_POINT)
> + pad = 0;
> +
>   msg_body.beacon_length = skb_beacon->len + pad;
>   /* TODO need to find out why + 6 is needed */
>   msg_body.beacon_length6 = msg_body.beacon_length + 6;
> -- 
> 2.5.0
> 


[RESEND PATCH 15/16] wcn36xx: don't pad beacons for mesh

2016-03-29 Thread Bjorn Andersson
From: Jason Mobarak 

Patch "wcn36xx: Pad TIM PVM if needed" has caused a regression in mesh
beaconing.  The field tim_off is always 0 for mesh mode, and thus
pvm_len (referring to the TIM length field) and pad are both incorrectly
calculated.  Thus, msg_body.beacon_length is incorrectly calculated for
mesh mode. Fix this.

Fixes: 8ad99a4e3ee5 ("wcn36xx: Pad TIM PVM if needed")
Signed-off-by: Jason Mobarak 
Signed-off-by: Chun-Yeow Yeoh 
Signed-off-by: Bjorn Andersson 
---

Resend this single patch with included Fixes tag.

 drivers/net/wireless/ath/wcn36xx/smd.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index a57d158298a1..b1bdc229e560 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -1410,6 +1410,11 @@ int wcn36xx_smd_send_beacon(struct wcn36xx *wcn, struct 
ieee80211_vif *vif,
 
pvm_len = skb_beacon->data[tim_off + 1] - 3;
pad = TIM_MIN_PVM_SIZE - pvm_len;
+
+   /* Padding is irrelevant to mesh mode since tim_off is always 0. */
+   if (vif->type == NL80211_IFTYPE_MESH_POINT)
+   pad = 0;
+
msg_body.beacon_length = skb_beacon->len + pad;
/* TODO need to find out why + 6 is needed */
msg_body.beacon_length6 = msg_body.beacon_length + 6;
-- 
2.5.0
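The pad computation being fixed can be sketched as follows. TIM_MIN_PVM_SIZE's value and the fixed "- 3" TIM header adjustment are assumptions taken from the driver context, not verified constants:

```c
#include <assert.h>

#define TIM_MIN_PVM_SIZE 6	/* assumed value, for illustration only */

/* Sketch of the beacon pad computation.  For mesh interfaces tim_off
 * is always 0, so data[tim_off + 1] is not a TIM length field at all
 * and the pad must be forced to 0, as the patch does. */
static int beacon_pad(const unsigned char *data, int tim_off, int is_mesh)
{
	int pvm_len, pad;

	if (is_mesh)
		return 0;

	pvm_len = data[tim_off + 1] - 3;	/* strip fixed TIM header bytes */
	pad = TIM_MIN_PVM_SIZE - pvm_len;
	return pad > 0 ? pad : 0;
}
```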



Re: [PATCH 06/16] wcn36xx: Fetch private sta data from sta entry instead of from vif

2016-03-29 Thread Bjorn Andersson
On Tue 29 Mar 10:01 PDT 2016, kbuild test robot wrote:

> Hi Pontus,
> 
> [auto build test ERROR on wireless-drivers/master]
> [also build test ERROR on v4.6-rc1 next-20160329]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improving the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Bjorn-Andersson/Misc-wcn36xx-fixes/20160329-141847
> base:   
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git 
> master
> config: sparc64-allyesconfig (attached as .config)
> reproduce:
> wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=sparc64 
> 
> Note: the linux-review/Bjorn-Andersson/Misc-wcn36xx-fixes/20160329-141847 
> HEAD 8303daac889854237207e7caefaea94fee0b87f2 builds fine.
>   It only hurts bisectability.
> 
> All error/warnings (new ones prefixed by >>):
> 
>drivers/net/wireless/ath/wcn36xx/main.c: In function 'wcn36xx_set_key':
> >> drivers/net/wireless/ath/wcn36xx/main.c:389:9: error: implicit declaration 
> >> of function 'wcn36xx_sta_to_priv' [-Werror=implicit-function-declaration]
>  struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
> ^
> >> drivers/net/wireless/ath/wcn36xx/main.c:389:33: warning: initialization 
> >> makes pointer from integer without a cast
>  struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
> ^
>cc1: some warnings being treated as errors

This should have been reordered with patch 7, that introduces this
helper function. Do you want me to resend, or can you apply the patches
out of order?

Regards,
Bjorn

> 
> vim +/wcn36xx_sta_to_priv +389 drivers/net/wireless/ath/wcn36xx/main.c
> 
>383   struct ieee80211_vif *vif,
>384   struct ieee80211_sta *sta,
>385   struct ieee80211_key_conf *key_conf)
>386{
>387struct wcn36xx *wcn = hw->priv;
>388struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
>  > 389struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
>390int ret = 0;
>391u8 key[WLAN_MAX_KEY_LEN];
>392
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


> ___
> wcn36xx mailing list
> wcn3...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/wcn36xx



Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-29 Thread Alexander Duyck
On Tue, Mar 29, 2016 at 2:13 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  wrote:
>> This patch should fix the issues seen with a recent fix to prevent
>> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
>> correct for now as long as we do not add any devices that support
>> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
>> potential to mess things up due to the fact that the outer transport header
>> points to the outer UDP header and not the GRE header as would be expected.
>>
>> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
>> encapsulation.")
>> Signed-off-by: Alexander Duyck 
>> ---
>>
>> This should allow us to keep the fix that Jesse added without breaking the
>> 3 cases that Tom called out in terms of FOU/GUE.
>>
>> Additional work will be needed in net-next as we probably need to make it
>> so that offloads work correctly when we get around to supporting
>> NETIF_F_GSO_GRE_CSUM.
>>
>>  net/ipv4/fou.c |   32 
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
>> index 4136da9275b2..2c30256ee959 100644
>> --- a/net/ipv4/fou.c
>> +++ b/net/ipv4/fou.c
>> @@ -195,6 +195,22 @@ static struct sk_buff **fou_gro_receive(struct sk_buff 
>> **head,
>> u8 proto = NAPI_GRO_CB(skb)->proto;
>> const struct net_offload **offloads;
>>
>> +   switch (proto) {
>> +   case IPPROTO_IPIP:
>> +   case IPPROTO_IPV6:
>> +   case IPPROTO_GRE:
>> +   /* We can clear the encap_mark for these 3 protocols as
>> +* we are either adding an L4 tunnel header to the outer
>> +* L3 tunnel header, or we are simply treating the
>> +* GRE tunnel header as though it is a UDP protocol
>> +* specific header such as VXLAN or GENEVE.
>> +*/
>> +   NAPI_GRO_CB(skb)->encap_mark = 0;
>> +   /* fall-through */
>> +   default:
>> +   break;
>> +   }
>
> Switch statement is not needed, just do NAPI_GRO_CB(skb)->encap_mark = 0;
>
>> +
>> rcu_read_lock();
>> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : 
>> inet_offloads;
>> ops = rcu_dereference(offloads[proto]);
>> @@ -359,6 +375,22 @@ static struct sk_buff **gue_gro_receive(struct sk_buff 
>> **head,
>> NAPI_GRO_CB(p)->flush |= NAPI_GRO_CB(p)->flush_id;
>> }
>>
>> +   switch (guehdr->proto_ctype) {
>> +   case IPPROTO_IPIP:
>> +   case IPPROTO_IPV6:
>> +   case IPPROTO_GRE:
>> +   /* We can clear the encap_mark for these 3 protocols as
>> +* we are either adding an L4 tunnel header to the outer
>> +* L3 tunnel header, or we are simply treating the
>> +* GRE tunnel header as though it is a UDP protocol
>> +* specific header such as VXLAN or GENEVE.
>> +*/
>> +   NAPI_GRO_CB(skb)->encap_mark = 0;
>> +   /* fall-through */
>> +   default:
>> +   break;
>> +   }
>> +
> Here also.
>
>> rcu_read_lock();
>> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : 
>> inet_offloads;
>> ops = rcu_dereference(offloads[guehdr->proto_ctype]);
>>

Okay, I can update that and submit a v2.  The only real reason why I
had the switch statements was out of an abundance of caution since
those were the only 3 cases where I knew we would run into issues.

- Alex


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-29 Thread Tom Herbert
On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  wrote:
> This patch should fix the issues seen with a recent fix to prevent
> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
> correct for now as long as we do not add any devices that support
> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
> potential to mess things up due to the fact that the outer transport header
> points to the outer UDP header and not the GRE header as would be expected.
>
> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
> encapsulation.")
> Signed-off-by: Alexander Duyck 
> ---
>
> This should allow us to keep the fix that Jesse added without breaking the
> 3 cases that Tom called out in terms of FOU/GUE.
>
> Additional work will be needed in net-next as we probably need to make it
> so that offloads work correctly when we get around to supporting
> NETIF_F_GSO_GRE_CSUM.
>
>  net/ipv4/fou.c |   32 
>  1 file changed, 32 insertions(+)
>
> diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
> index 4136da9275b2..2c30256ee959 100644
> --- a/net/ipv4/fou.c
> +++ b/net/ipv4/fou.c
> @@ -195,6 +195,22 @@ static struct sk_buff **fou_gro_receive(struct sk_buff 
> **head,
> u8 proto = NAPI_GRO_CB(skb)->proto;
> const struct net_offload **offloads;
>
> +   switch (proto) {
> +   case IPPROTO_IPIP:
> +   case IPPROTO_IPV6:
> +   case IPPROTO_GRE:
> +   /* We can clear the encap_mark for these 3 protocols as
> +* we are either adding an L4 tunnel header to the outer
> +* L3 tunnel header, or we are simply treating the
> +* GRE tunnel header as though it is a UDP protocol
> +* specific header such as VXLAN or GENEVE.
> +*/
> +   NAPI_GRO_CB(skb)->encap_mark = 0;
> +   /* fall-through */
> +   default:
> +   break;
> +   }

Switch statement is not needed, just do NAPI_GRO_CB(skb)->encap_mark = 0;

> +
> rcu_read_lock();
> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
> ops = rcu_dereference(offloads[proto]);
> @@ -359,6 +375,22 @@ static struct sk_buff **gue_gro_receive(struct sk_buff 
> **head,
> NAPI_GRO_CB(p)->flush |= NAPI_GRO_CB(p)->flush_id;
> }
>
> +   switch (guehdr->proto_ctype) {
> +   case IPPROTO_IPIP:
> +   case IPPROTO_IPV6:
> +   case IPPROTO_GRE:
> +   /* We can clear the encap_mark for these 3 protocols as
> +* we are either adding an L4 tunnel header to the outer
> +* L3 tunnel header, or we are simply treating the
> +* GRE tunnel header as though it is a UDP protocol
> +* specific header such as VXLAN or GENEVE.
> +*/
> +   NAPI_GRO_CB(skb)->encap_mark = 0;
> +   /* fall-through */
> +   default:
> +   break;
> +   }
> +
Here also.

> rcu_read_lock();
> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
> ops = rcu_dereference(offloads[guehdr->proto_ctype]);
>


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-29 Thread David Miller
From: Tom Herbert 
Date: Mon, 28 Mar 2016 21:51:17 -0700

> No, but I do expect that you support code that is already there. There
> was apparently zero testing done on the original patch and it caused
> one very obvious regression. So how can we have any confidence
> whatsoever that this patch doesn't break other things? Furthermore,
> with all these claims of bugs I still don't see that _anyone_ has
> taken the time to reproduce any issue and show that this patch
> materially fixes any thing. I seriously don't understand how basic
> testing could be such a challenge.
> 
> Anyway, what I expect is moot. It's up to davem to decide what to do
> with this...

You being upset with a lack of testing is one issue, and is
legitimate.

But the fact that we can't support, and never could support, more than
one network header at a time except in a very special case for GRO
is very real.  And you must acknowledge that this was a very shaky
foundation upon which to erect the kinds of things you expect to work.


Re: [PATCH net-next v3.16]r9169: Correct Set Vlan tag

2016-03-29 Thread David Miller
From: Corcodel Marian 
Date: Tue, 29 Mar 2016 11:33:20 +0300

>  This patch sets the VLAN tag and flushes the CPlusCmd register, because
>  when the RxVlan and RxChkSum bits are cleared, unwanted bits (PCIDAC,
>  PCIMulRW and others) are set without any explanation.  Without this
>  patch, running ethtool -d eth0 shows the "C+ Command" field missing
>  "VLAN de-tagging".
> 
> Signed-off-by: Corcodel Marian 

I am hereby blocking you from making any and all postings to the
mailing lists at vger.kernel.org

There is not negotiating about this, all complaints sent to me will be
ignored.  You should have thought about the consequences of your
actions (or lack thereof) over the past several months.

You cannot continually post patches, as you please, and completely
refuse to interact with the developers who review your changes and
give you feedback.

You have not coherently replied to any feedback you have been given.

You have almost always ignored the feedback you were given and
continued to make the same undesirable changes over and over again.

And you refuse to even acknowledge or address the issue of your
interactions with the community in any way whatsoever.

Therefore you are hurting development, and wasting precious developer
resources with your postings.

All of this is unacceptable.

But that all ends right now.

Thank you.


Re: [PATCH 1/1] ipv4: fix NULL pointer dereference in __inet_put_port()

2016-03-29 Thread David Miller
From: fanhui 
Date: Tue, 29 Mar 2016 14:45:53 +0800

> [] tcp_nuke_addr+0x22c/0x2a0

Do not report or fix problems in non-mainline kernels.

Thank you.


Re: ss filter problem

2016-03-29 Thread Vadim Kochan
Hi Phil,

On Tue, Mar 29, 2016 at 09:32:42PM +0200, Phil Sutter wrote:
> Hi,
> 
> I am trying to fix a bug in ss filter code, but feel quite lost right
> now. The issue is this:
> 
> | ss -nl -f inet '( sport = :22 )'
> 
> prints not only listening sockets (as requested by -l flag), but
> established ones as well (reproduce by opening ssh connection to
> 127.0.0.1 before calling above).
> 
> In contrast, the following both don't show the established sockets:
> 
> | ss -nl '( sport = :22 )'
> | ss -nl -f inet
> 
> My investigation led me to see that current_filter.states is altered
> after ssfilter_parse() returns, and using gdb with a watchpoint I was
> able to identify parse_hostcond() to be the bad guy: In line 1560, it
> calls filter_af_set() after checking for fam != AF_UNSPEC (which is the
> case, since fam = preferred_family and the latter is changed to AF_INET
> when parsing '-f inet' parameter).

Yes, after removing the fam != AF_UNSPEC body it works, because it no
longer overwrites the states specified on the command line (-l).  I
can't say yet what else it may affect; I will try to investigate further.

> 
> This whole jumping back and forth confuses me quite effectively. Since
> you did some fixes in the past already, are you possibly able to point
> out where/how this tiny mess has to be fixed?
> 
> I guess in an ideal world we would translate '-l' to 'state listen', '-f
> inet' to 'src inet:*' and pass everything ANDed together to
> ssfilter_parse(). Or maybe that would make things even worse. ;)
> 
> Cheers, Phil

I thought I had fixed and tested the ss filter well, but it seems it
would be good to have proper automated testing.

Regards,
Vadim Kochan


Re: [PATCH] bus: mvebu-mbus: use %pad to print phys_addr_t

2016-03-29 Thread Arnd Bergmann
On Tuesday 29 March 2016 18:04:47 Gregory CLEMENT wrote:
> 
> What is the status of this patch?
> 
> Do you plan to send a second version with the title fixed as suggested
> by Joe Perches?
> 
> Also do you expect that I collect this patch in the mvebu subsystem?

Right now, it's on my long-term todo list along with some 70 other patches
I need to revisit. If you want to fix up the title and apply it now,
that would be great, otherwise I'll get to it in a few weeks after coming
back from ELC/vacation.

Arnd


ss filter problem

2016-03-29 Thread Phil Sutter
Hi,

I am trying to fix a bug in ss filter code, but feel quite lost right
now. The issue is this:

| ss -nl -f inet '( sport = :22 )'

prints not only listening sockets (as requested by -l flag), but
established ones as well (reproduce by opening ssh connection to
127.0.0.1 before calling above).

In contrast, the following both don't show the established sockets:

| ss -nl '( sport = :22 )'
| ss -nl -f inet

My investigation led me to see that current_filter.states is altered
after ssfilter_parse() returns, and using gdb with a watchpoint I was
able to identify parse_hostcond() to be the bad guy: In line 1560, it
calls filter_af_set() after checking for fam != AF_UNSPEC (which is the
case, since fam = preferred_family and the latter is changed to AF_INET
when parsing '-f inet' parameter).

This whole jumping back and forth confuses me quite effectively. Since
you did some fixes in the past already, are you possibly able to point
out where/how this tiny mess has to be fixed?

I guess in an ideal world we would translate '-l' to 'state listen', '-f
inet' to 'src inet:*' and pass everything ANDed together to
ssfilter_parse(). Or maybe that would make things even worse. ;)

Cheers, Phil
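The interaction Phil describes can be modeled in a few lines. Field and function names below loosely mimic ss but are illustrative only; the point is the ordering hazard, with filter_af_set() re-applying a default state mask on top of the LISTEN-only mask that -l installed earlier:

```c
#include <assert.h>

#define SS_LISTEN	(1u << 0)
#define SS_ESTABLISHED	(1u << 1)
#define SS_DEFAULT	(SS_LISTEN | SS_ESTABLISHED)

struct filter {
	unsigned int states;
	int family;
};

/* -l: restrict the state mask to listening sockets. */
static void filter_states_set(struct filter *f, unsigned int states)
{
	f->states = states;
}

/* parse_hostcond() path: fam != AF_UNSPEC (set by -f inet) triggers
 * this, which clobbers the previously narrowed state mask. */
static void filter_af_set(struct filter *f, int family)
{
	f->family = family;
	f->states = SS_DEFAULT;
}
```

Running the two steps in command-line order leaves ESTABLISHED in the mask, which matches the reported behaviour of `ss -nl -f inet '( sport = :22 )'`.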


Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup

2016-03-29 Thread Vivien Didelot
Hi Andrew, Patrick,

Andrew Lunn  writes:

> On Tue, Mar 29, 2016 at 12:23:06PM -0400, Vivien Didelot wrote:
>> Hi Patrick,
>> 
>> Two comments below.
>> 
>> Patrick Uiterwijk  writes:
>> 
>> > +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
>> 
>> Since this function assumes the SMI lock is already held, its name
>> should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).
>
> We decided to drop that, since nearly everything would end up with a _
> prefix. The assert_smi_lock() should find any missing locks, and
> lockdep/deadlocks will make it clear when the lock is taken twice.

OK, I didn't know that. This makes sense. There is no need to respin a
v3 only for my previous &= comment then.

Thanks,
-v


RE: [PATCH] qed: initialize return rc to avoid returning garbage

2016-03-29 Thread Yuval Mintz
> From: Colin Ian King 
> 
> In the case where qed_slowpath_irq_req is not called, rc is not
> assigned, and so qed_int_igu_enable will return a garbage value.
> Fix this by initializing rc to 0.
> 
> Signed-off-by: Colin Ian King 

Thanks Colin.
Acked-by: Yuval Mintz 


[RFC] Add netdev all_adj_list refcnt propagation to fix panic

2016-03-29 Thread Andrew Collins
This is an RFC patch to fix a relatively easily reproducible kernel
panic related to the all_adj_list handling for netdevs in recent kernels.

This is more to generate discussion than anything else.  I don't
particularly like this approach, I'm hoping someone has a better idea.

The following sequence of commands will reproduce the issue:

ip link add link eth0 name eth0.100 type vlan id 100
ip link add link eth0 name eth0.200 type vlan id 200
ip link add name testbr type bridge
ip link set eth0.100 master testbr
ip link set eth0.200 master testbr
ip link add link testbr mac0 type macvlan
ip link delete dev testbr

This creates an upper/lower tree of (excuse the poor ASCII art):

              /--- eth0.100 --- eth0
mac0 -- testbr
              \--- eth0.200 --- eth0

When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice 
from
the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
reference to eth0 is added, so this results in a panic.

This change adds reference count propagation so things are handled properly.

Matthias Schiffer reported a similar crash in batman-adv:

https://github.com/freifunk-gluon/gluon/issues/680
https://www.open-mesh.org/issues/247

which this patch also seems to resolve.
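The bookkeeping being changed can be reduced to a toy model. Names are invented; the idea follows the RFC: an upper device reachable through N lower paths must be inserted with ref_nr = N so that walking those N paths on teardown removes exactly the references that were added:

```c
#include <assert.h>

/* Toy version of a netdev_adjacent entry with a reference count. */
struct adj {
	int present;
	int ref_nr;
};

static void adj_insert(struct adj *a, int ref_nr)
{
	if (a->present) {
		a->ref_nr += ref_nr;
		return;
	}
	a->present = 1;
	a->ref_nr = ref_nr;
}

static void adj_remove(struct adj *a, int ref_nr)
{
	if (a->ref_nr > ref_nr) {
		a->ref_nr -= ref_nr;
		return;
	}
	a->present = 0;		/* last reference gone: free the entry */
	a->ref_nr = 0;
}
```

In the repro above, eth0 is reachable from mac0 via both eth0.100 and eth0.200, so it must be inserted with ref_nr = 2; inserting it with ref_nr = 1 and then removing it twice is the double-free that panics.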
---
 net/core/dev.c | 68 --
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b9bcbe7..4b4ef6b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5471,6 +5471,7 @@ static inline bool netdev_adjacent_is_neigh_list(struct 
net_device *dev,
 
 static int __netdev_adjacent_dev_insert(struct net_device *dev,
struct net_device *adj_dev,
+   u16 ref_nr,
struct list_head *dev_list,
void *private, bool master)
 {
@@ -5480,7 +5481,7 @@ static int __netdev_adjacent_dev_insert(struct net_device 
*dev,
adj = __netdev_find_adj(adj_dev, dev_list);
 
if (adj) {
-   adj->ref_nr++;
+   adj->ref_nr += ref_nr;
return 0;
}
 
@@ -5490,7 +5491,7 @@ static int __netdev_adjacent_dev_insert(struct net_device 
*dev,
 
adj->dev = adj_dev;
adj->master = master;
-   adj->ref_nr = 1;
+   adj->ref_nr = ref_nr;
adj->private = private;
dev_hold(adj_dev);
 
@@ -5529,6 +5530,7 @@ free_adj:
 
 static void __netdev_adjacent_dev_remove(struct net_device *dev,
 struct net_device *adj_dev,
+u16 ref_nr,
 struct list_head *dev_list)
 {
struct netdev_adjacent *adj;
@@ -5541,10 +5543,10 @@ static void __netdev_adjacent_dev_remove(struct 
net_device *dev,
BUG();
}
 
-   if (adj->ref_nr > 1) {
-   pr_debug("%s to %s ref_nr-- = %d\n", dev->name, adj_dev->name,
-adj->ref_nr-1);
-   adj->ref_nr--;
+   if (adj->ref_nr > ref_nr) {
+   pr_debug("%s to %s ref_nr-%d = %d\n", dev->name, adj_dev->name,
+ref_nr, adj->ref_nr-ref_nr);
+   adj->ref_nr -= ref_nr;
return;
}
 
@@ -5563,21 +5565,22 @@ static void __netdev_adjacent_dev_remove(struct 
net_device *dev,
 
 static int __netdev_adjacent_dev_link_lists(struct net_device *dev,
struct net_device *upper_dev,
+   u16 ref_nr,
struct list_head *up_list,
struct list_head *down_list,
void *private, bool master)
 {
int ret;
 
-   ret = __netdev_adjacent_dev_insert(dev, upper_dev, up_list, private,
-  master);
+   ret = __netdev_adjacent_dev_insert(dev, upper_dev, ref_nr, up_list,
+  private, master);
if (ret)
return ret;
 
-   ret = __netdev_adjacent_dev_insert(upper_dev, dev, down_list, private,
-  false);
+   ret = __netdev_adjacent_dev_insert(upper_dev, dev, ref_nr, down_list,
+  private, false);
if (ret) {
-   __netdev_adjacent_dev_remove(dev, upper_dev, up_list);
+   __netdev_adjacent_dev_remove(dev, upper_dev, ref_nr, up_list);
return ret;
}
 
@@ -5585,9 +5588,10 @@ static int __netdev_adjacent_dev_link_lists(struct 
net_device *dev,
 }
 
 static int __netdev_adjacent_dev_link(struct net_device *dev,
- struct net_device *upper_dev)
+ struct net_device *upper_dev,
+ u16 r

Re: [PATCH 06/16] wcn36xx: Fetch private sta data from sta entry instead of from vif

2016-03-29 Thread kbuild test robot
Hi Pontus,

[auto build test ERROR on wireless-drivers/master]
[also build test ERROR on v4.6-rc1 next-20160329]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Bjorn-Andersson/Misc-wcn36xx-fixes/20160329-141847
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git 
master
config: sparc64-allyesconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

Note: the linux-review/Bjorn-Andersson/Misc-wcn36xx-fixes/20160329-141847 HEAD 
8303daac889854237207e7caefaea94fee0b87f2 builds fine.
  It only hurts bisectability.

All error/warnings (new ones prefixed by >>):

   drivers/net/wireless/ath/wcn36xx/main.c: In function 'wcn36xx_set_key':
>> drivers/net/wireless/ath/wcn36xx/main.c:389:9: error: implicit declaration 
>> of function 'wcn36xx_sta_to_priv' [-Werror=implicit-function-declaration]
 struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
^
>> drivers/net/wireless/ath/wcn36xx/main.c:389:33: warning: initialization 
>> makes pointer from integer without a cast
 struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
^
   cc1: some warnings being treated as errors

vim +/wcn36xx_sta_to_priv +389 drivers/net/wireless/ath/wcn36xx/main.c

   383 struct ieee80211_vif *vif,
   384 struct ieee80211_sta *sta,
   385 struct ieee80211_key_conf *key_conf)
   386  {
   387  struct wcn36xx *wcn = hw->priv;
   388  struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
 > 389  struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
   390  int ret = 0;
   391  u8 key[WLAN_MAX_KEY_LEN];
   392  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH] qed: initialize return rc to avoid returning garbage

2016-03-29 Thread Colin King
From: Colin Ian King 

In the case where qed_slowpath_irq_req is not called, rc is not
assigned, and so qed_int_igu_enable will return a garbage value.
Fix this by initializing rc to 0.

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/qlogic/qed/qed_int.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c 
b/drivers/net/ethernet/qlogic/qed/qed_int.c
index ffd0acc..2017b01 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -2750,7 +2750,7 @@ void qed_int_igu_enable_int(struct qed_hwfn *p_hwfn,
 int qed_int_igu_enable(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
   enum qed_int_mode int_mode)
 {
-   int rc;
+   int rc = 0;
 
/* Configure AEU signal change to produce attentions */
qed_wr(p_hwfn, p_ptt, IGU_REG_ATTENTION_ENABLE, 0);
-- 
2.7.4
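The bug pattern being fixed is worth a minimal sketch. The enum values and the stub are invented for illustration; only the single-assignment-branch shape follows the driver:

```c
#include <assert.h>

enum int_mode { INT_MODE_INTA, INT_MODE_MSIX };

static int request_irq_stub(void)
{
	return 0;	/* pretend the IRQ request succeeded */
}

/* rc is assigned only on the MSI-X branch; without the "= 0" the
 * other mode would return an indeterminate stack value. */
static int igu_enable(enum int_mode mode)
{
	int rc = 0;	/* the fix: a defined value on every path */

	if (mode == INT_MODE_MSIX)
		rc = request_irq_stub();
	return rc;
}
```

Compilers often miss this when the assignment is hidden behind a conditional call chain, which is presumably why it survived review.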



Re: [PATCH] rtl8xxxu: fix uninitialized return value in ret

2016-03-29 Thread Jes Sorensen
Colin King  writes:
> From: Colin Ian King 
>
> Several functions do not initialize the return status in ret,
> resulting in garbage being returned instead of 0 for success.
> Currently the calls to these functions do not check the return
> value; however, it seems prudent to return the correct status in
> case they are checked at a later date.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Thanks for the patch! I'm surprised the compiler didn't warn about this.

I'll add it to my queue for rtl8xxxu.

Cheers,
Jes

>
> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c 
> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
> index abdff45..9262aad 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
> @@ -5231,7 +5231,7 @@ static void rtl8xxxu_set_ampdu_min_space(struct 
> rtl8xxxu_priv *priv, u8 density)
>  static int rtl8xxxu_active_to_emu(struct rtl8xxxu_priv *priv)
>  {
>   u8 val8;
> - int count, ret;
> + int count, ret = 0;
>  
>   /* Start of rtl8723AU_card_enable_flow */
>   /* Act to Cardemu sequence*/
> @@ -5281,7 +5281,7 @@ static int rtl8723bu_active_to_emu(struct rtl8xxxu_priv 
> *priv)
>   u8 val8;
>   u16 val16;
>   u32 val32;
> - int count, ret;
> + int count, ret = 0;
>  
>   /* Turn off RF */
>   rtl8xxxu_write8(priv, REG_RF_CTRL, 0);
> @@ -5338,7 +5338,7 @@ static int rtl8xxxu_active_to_lps(struct rtl8xxxu_priv 
> *priv)
>  {
>   u8 val8;
>   u8 val32;
> - int count, ret;
> + int count, ret = 0;
>  
>   rtl8xxxu_write8(priv, REG_TXPAUSE, 0xff);


[PATCH] rtl8xxxu: fix uninitialized return value in ret

2016-03-29 Thread Colin King
From: Colin Ian King 

Several functions do not initialize the return status in ret,
resulting in garbage being returned instead of 0 for success.
Currently the calls to these functions do not check the return
value; however, it seems prudent to return the correct status in
case they are checked at a later date.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c 
b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
index abdff45..9262aad 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
@@ -5231,7 +5231,7 @@ static void rtl8xxxu_set_ampdu_min_space(struct 
rtl8xxxu_priv *priv, u8 density)
 static int rtl8xxxu_active_to_emu(struct rtl8xxxu_priv *priv)
 {
u8 val8;
-   int count, ret;
+   int count, ret = 0;
 
/* Start of rtl8723AU_card_enable_flow */
/* Act to Cardemu sequence*/
@@ -5281,7 +5281,7 @@ static int rtl8723bu_active_to_emu(struct rtl8xxxu_priv 
*priv)
u8 val8;
u16 val16;
u32 val32;
-   int count, ret;
+   int count, ret = 0;
 
/* Turn off RF */
rtl8xxxu_write8(priv, REG_RF_CTRL, 0);
@@ -5338,7 +5338,7 @@ static int rtl8xxxu_active_to_lps(struct rtl8xxxu_priv 
*priv)
 {
u8 val8;
u8 val32;
-   int count, ret;
+   int count, ret = 0;
 
rtl8xxxu_write8(priv, REG_TXPAUSE, 0xff);
 
-- 
2.7.4



Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup

2016-03-29 Thread Andrew Lunn
On Tue, Mar 29, 2016 at 12:23:06PM -0400, Vivien Didelot wrote:
> Hi Patrick,
> 
> Two comments below.
> 
> Patrick Uiterwijk  writes:
> 
> > +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
> 
> Since this function assumes the SMI lock is already held, its name
> should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).

We decided to drop that, since nearly everything would end up with a _
prefix. The assert_smi_lock() should find any missing locks, and
lockdep/deadlocks will make it clear when the lock is taken twice.

  Andrew


Re: [PATCH net-next v2 1/2] net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read,write}

2016-03-29 Thread Vivien Didelot
Patrick Uiterwijk  writes:

> Add versions of the phy_page_read and _write functions to
> be used in a context where the SMI mutex is held.
>
> Signed-off-by: Patrick Uiterwijk 

Tested-by: Vivien Didelot 
Reviewed-by: Vivien Didelot 

Thanks,
Vivien


Re: [PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup

2016-03-29 Thread Vivien Didelot
Hi Patrick,

Two comments below.

Patrick Uiterwijk  writes:

> +static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)

Since this function assumes the SMI lock is already held, its name
should be prefixed with _ by convention (_mv88e6xxx_power_on_serdes).

> +{
> + int ret;
> +
> + ret = _mv88e6xxx_phy_page_read(ds, REG_FIBER_SERDES, PAGE_FIBER_SERDES,
> +MII_BMCR);
> + if (ret < 0)
> + return ret;
> +
> + if (ret & BMCR_PDOWN) {
> + ret = ret & ~BMCR_PDOWN;

ret &= ~BMCR_PDOWN;

> + ret = _mv88e6xxx_phy_page_write(ds, REG_FIBER_SERDES,
> + PAGE_FIBER_SERDES, MII_BMCR,
> + ret);
> + }
> +
> + return ret;
> +}

Thanks,
Vivien


Re: [PATCH] bus: mvebu-mbus: use %pad to print phys_addr_t

2016-03-29 Thread Gregory CLEMENT
Hi Arnd,
 
 On mar., mars 15 2016, Arnd Bergmann  wrote:

> A recent change to the mbus driver added a warning printk that
> prints a phys_addr_t using the %x format string, which fails in
> case we build with 64-bit phys_addr_t:
>
> drivers/bus/mvebu-mbus.c: In function 'mvebu_mbus_get_dram_win_info':
> drivers/bus/mvebu-mbus.c:975:9: error: format '%x' expects argument of type 
> 'unsigned int', but argument 2 has type 'phys_addr_t {aka long long unsigned 
> int}' [-Werror=format=]
>
> This uses the special %pa format string instead, so we always
> print the correct type.
>
> Signed-off-by: Arnd Bergmann 
> Fixes: f2900acea801 ("bus: mvebu-mbus: provide api for obtaining IO
> and DRAM window information")

What is the status of this patch?

Do you plan to send a second version with the title fixed as suggested
by Joe Perches?

Also do you expect that I collect this patch in the mvebu subsystem?

Thanks,

Gregory

> ---
> The warning was introduced in today's linux-next through the net-next
> tree, please apply this fixup on top.
>
>  drivers/bus/mvebu-mbus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
> index c2e52864bb03..ce54a0160faa 100644
> --- a/drivers/bus/mvebu-mbus.c
> +++ b/drivers/bus/mvebu-mbus.c
> @@ -972,7 +972,7 @@ int mvebu_mbus_get_dram_win_info(phys_addr_t phyaddr, u8 
> *target, u8 *attr)
>   }
>   }
>  
> - pr_err("invalid dram address 0x%x\n", phyaddr);
> + pr_err("invalid dram address %pa\n", &phyaddr);
>   return -EINVAL;
>  }
>  EXPORT_SYMBOL_GPL(mvebu_mbus_get_dram_win_info);
> -- 
> 2.7.0
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
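The kernel-side fix uses the %pa printk extension; the same width mismatch exists in plain C, where a 64-bit address needs a width-aware conversion rather than %x. A small userspace analog (the fmt_addr helper is hypothetical):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Printing a 64-bit physical address with %x truncates it (and trips
 * -Werror=format); PRIx64 here plays the role of the kernel's %pa. */
static void fmt_addr(char *buf, size_t len, uint64_t addr)
{
	snprintf(buf, len, "invalid dram address 0x%" PRIx64, addr);
}
```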


[PATCH net] ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates

2016-03-29 Thread Eric Dumazet
From: Eric Dumazet 

IPv6 counters updates use a different macro than IPv4.

Fixes: 36cbb2452cbaf ("udp: Increment UDP_MIB_IGNOREDMULTI for arriving 
unmatched multicasts")
Signed-off-by: Eric Dumazet 
Cc: Rick Jones 
Cc: Willem de Bruijn 
---
 net/ipv6/udp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index fd25e447a5fa..8125931106be 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -843,8 +843,8 @@ start_lookup:
flush_stack(stack, count, skb, count - 1);
} else {
if (!inner_flushed)
-   UDP_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
-proto == IPPROTO_UDPLITE);
+   UDP6_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
+ proto == IPPROTO_UDPLITE);
consume_skb(skb);
}
return 0;




RE: [PATCH] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-03-29 Thread Rosen, Rami
Hi, Netanel,

+into 5 levels and assignes interrupt delay value to each level.
Should be: assigns

+The ENA device AQ and AENQ are allocated on probe and freed ontermination.
Should be: on termination.

+   /* commit previously loaded firmare */
Should be: firmware

+static int ena_com_hash_key_destroy(struct ena_com_dev *ena_dev)
+{
+   struct ena_rss *rss = &ena_dev->rss;
+
+   if (rss->hash_key)
+   dma_free_coherent(ena_dev->dmadev,
+ sizeof(*rss->hash_key),
+ rss->hash_key,
+ rss->hash_key_dma_addr);
+   rss->hash_key = NULL;
+   return 0;
+}

This method always returns 0.

+static int ena_com_hash_ctrl_init(struct ena_com_dev *ena_dev)
+{
+   struct ena_rss *rss = &ena_dev->rss;
+
+   rss->hash_ctrl = dma_alloc_coherent(ena_dev->dmadev,
+   sizeof(*rss->hash_ctrl),
+   &rss->hash_ctrl_dma_addr,
+   GFP_KERNEL | __GFP_ZERO);
+
+   return 0;
+}
+

This method also always returns 0.


+static int ena_com_hash_ctrl_destroy(struct ena_com_dev *ena_dev)
+{
+   struct ena_rss *rss = &ena_dev->rss;
+
+   if (rss->hash_ctrl)
+   dma_free_coherent(ena_dev->dmadev,
+ sizeof(*rss->hash_ctrl),
+ rss->hash_ctrl,
+ rss->hash_ctrl_dma_addr);
+   rss->hash_ctrl = NULL;
+
+   return 0;
+}
+

This method also always returns 0.

+static int ena_com_indirect_table_destroy(struct ena_com_dev *ena_dev)
+{
+   struct ena_rss *rss = &ena_dev->rss;
+   size_t tbl_size = (1 << rss->tbl_log_size) *
+   sizeof(struct ena_admin_rss_ind_table_entry);
+
+   if (rss->rss_ind_tbl)
+   dma_free_coherent(ena_dev->dmadev,
+ tbl_size,
+ rss->rss_ind_tbl,
+ rss->rss_ind_tbl_dma_addr);
+   rss->rss_ind_tbl = NULL;
+
+   if (rss->host_rss_ind_tbl)
+   devm_kfree(ena_dev->dmadev, rss->host_rss_ind_tbl);
+   rss->host_rss_ind_tbl = NULL;
+
+   return 0;
+}

This method also always returns 0.

+int ena_com_rss_destroy(struct ena_com_dev *ena_dev)
+{
+   ena_com_indirect_table_destroy(ena_dev);
+   ena_com_hash_key_destroy(ena_dev);
+   ena_com_hash_ctrl_destroy(ena_dev);
+
+   memset(&ena_dev->rss, 0x0, sizeof(ena_dev->rss));
+
+   return 0;
+}

This method also always returns 0.

Regards,
Rami Rosen
Intel Corporation
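As the review notes, destroy helpers that can never fail are clearer as void, so callers are not tempted to check a return value that is always 0. A minimal userspace sketch of that refactor (rss_state and rss_hash_key_destroy are hypothetical stand-ins, not the ENA code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state mirroring only the shape of the review comment. */
struct rss_state {
	unsigned char *hash_key;
};

/* A destroy helper that cannot fail: void return, idempotent, and it
 * clears the pointer so a double call is harmless (free(NULL) is OK). */
static void rss_hash_key_destroy(struct rss_state *rss)
{
	free(rss->hash_key);
	rss->hash_key = NULL;
}
```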


[net-next] bond: set mac address only if necessary

2016-03-29 Thread Zhang Shengju
The bond device gets its MAC address from the first slave device, so it's
not necessary to copy the slave device's MAC address to the bond when the
two are already equal.

Signed-off-by: Zhang Shengju 
---
 drivers/net/bonding/bond_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 97fad05..6569219 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1473,8 +1473,9 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev)
 */
ether_addr_copy(new_slave->perm_hwaddr, slave_dev->dev_addr);
 
-   if (!bond->params.fail_over_mac ||
-   BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
+   if ((!bond->params.fail_over_mac ||
+   BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) &&
+   !ether_addr_equal_64bits(bond_dev->dev_addr, slave_dev->dev_addr)) {
/* Set slave to master's mac address.  The application already
 * set the master's mac address to that of the first slave
 */
-- 
1.8.3.1
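A userspace sketch of the patch's idea, skipping the address copy when master and slave already match, with memcmp standing in for ether_addr_equal_64bits() (helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace analog of ether_addr_equal_64bits(). */
static bool mac_equal(const unsigned char a[6], const unsigned char b[6])
{
	return memcmp(a, b, 6) == 0;
}

/* Mirrors the patch's intent: only write the master's address when it
 * actually differs from the slave's, avoiding a redundant update. */
static int set_mac_if_needed(unsigned char bond[6], const unsigned char slave[6])
{
	if (mac_equal(bond, slave))
		return 0;	/* nothing to do */
	memcpy(bond, slave, 6);
	return 1;	/* address was updated */
}
```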





Re: bpf: net/core/filter.c:2115 suspicious rcu_dereference_protected() usage!

2016-03-29 Thread Daniel Borkmann

On 03/29/2016 03:55 PM, Daniel Borkmann wrote:

[ dropping my old email address ]

On 03/29/2016 02:58 PM, Michal Kubecek wrote:

On Mon, Feb 22, 2016 at 10:31:33AM -0500, Sasha Levin wrote:


I've hit the following warning while fuzzing with trinity inside a kvmtool guest
running the latest -next kernel:

[ 1343.104588] ===
[ 1343.104591] [ INFO: suspicious RCU usage. ]
[ 1343.104619] 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978 Not 
tainted
[ 1343.104624] ---
[ 1343.104635] net/core/filter.c:2115 suspicious rcu_dereference_protected() 
usage!
[ 1343.104641]
[ 1343.104641] other info that might help us debug this:
[ 1343.104641]
[ 1343.104650]
[ 1343.104650] rcu_scheduler_active = 1, debug_locks = 0
[ 1343.104660] 1 lock held by syz-executor/17916:
[ 1343.104784] #0: (rtnl_mutex){+.+.+.}, at: rtnl_lock (net/core/rtnetlink.c:71)
[ 1343.104789]
[ 1343.104789] stack backtrace:
[ 1343.104820] CPU: 1 PID: 17916 Comm: trinity-c8 Not tainted 
4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
[ 1343.104868]  110036968f44 8801b4b47aa8 a23d9a9d 
0001
[ 1343.104891]  fbfff5c2a630 41b58ab3 adb3a8f2 
a23d9905
[ 1343.104914]   8801b5419b40 fbfff7596522 
0001
[ 1343.104919] Call Trace:
[ 1343.104985] dump_stack (lib/dump_stack.c:53)
[ 1343.105060] lockdep_rcu_suspicious (kernel/locking/lockdep.c:4282)
[ 1343.105093] sk_detach_filter (net/core/filter.c:2114 (discriminator 5))
[ 1343.105193] tun_detach_filter (drivers/net/tun.c:1808 (discriminator 7))
[ 1343.105238] __tun_chr_ioctl (drivers/net/tun.c:2133)
[ 1343.105370] tun_chr_ioctl (drivers/net/tun.c:2161)
[ 1343.105407] do_vfs_ioctl (fs/ioctl.c:44 fs/ioctl.c:674)
[ 1343.105506] SyS_ioctl (fs/ioctl.c:689 fs/ioctl.c:680)
[ 1343.105542] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)


Looks like sk_detach_filter() wants the socket to be owned, which
tun_detach_filter() does not do, unlike sock_setsockopt(). Could you
check if the patch below helps?

I'm also not really sure if it is safe to ignore the return value of
sk_detach_filter() and just set tun->filter_attached to false - but
it's not really clear what should be done if one of the calls fails
after some have succeeded.


Wrt return value, afaik SOCK_FILTER_LOCKED cannot be set for tun devs, so we
should be okay.


diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950617c3..7417d7c20bab 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1818,11 +1818,13 @@ static int set_offload(struct tun_struct *tun, unsigned 
long arg)
  static void tun_detach_filter(struct tun_struct *tun, int n)
  {
  int i;
-struct tun_file *tfile;

  for (i = 0; i < n; i++) {
-tfile = rtnl_dereference(tun->tfiles[i]);
-sk_detach_filter(tfile->socket.sk);
+struct sock *sk = rtnl_dereference(tun->tfiles[i])->socket.sk;
+
+lock_sock(sk);
+sk_detach_filter(sk);
+release_sock(sk);
  }

  tun->filter_attached = false;



In tun case, the control path for tun_attach_filter() and tun_detach_filter()
is under RTNL lock (held in __tun_chr_ioctl()).

So in the BPF core the rcu_dereference_protected(, 
sock_owned_by_user(sk))
looks like a false positive in this specific use case to me, that we should 
probably
just silence.

Running the filter via sk_filter() in tun device happens under rcu_read_lock(),
so the dereference and assignment pair seems okay to me.

Was wondering whether we should convert this to unattached BPF filter, but this
would break with existing expectations from sk_filter() (e.g. security modules).


If we want to silence it, could be something like the below (only 
compile-tested):

 drivers/net/tun.c  |  8 +---
 include/linux/filter.h |  4 
 net/core/filter.c  | 33 +
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950..510e90a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -622,7 +622,8 @@ static int tun_attach(struct tun_struct *tun, struct file 
*file, bool skip_filte

/* Re-attach the filter to persist device */
if (!skip_filter && (tun->filter_attached == true)) {
-   err = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+   err = __sk_attach_filter(&tun->fprog, tfile->socket.sk,
+lockdep_rtnl_is_held());
if (!err)
goto out;
}
@@ -1822,7 +1823,7 @@ static void tun_detach_filter(struct tun_struct *tun, int 
n)

for (i = 0; i < n; i++) {
tfile = rtnl_dereference(tun->tfiles[i]);
-   sk_detach_filter(tfile->socket.sk);
+   __sk_detach_filter(tfile->socket.sk, lockdep_rtnl_is_held());
}

tun->filter_attached = false;
@@ -1835,7 +1836,8 @@ static int tun_attac

Re: [net-next RFC 0/4] SO_BINDTOPREFIX

2016-03-29 Thread Eric Dumazet
On Wed, 2016-03-23 at 02:26 +, Gilberto Bertin wrote:
> Since the net-next window just opened, I'm resubmitting my RFC for the
> SO_BINDTOSUBNET patch, following Mark Smith's suggestion to rename the
> whole thing to a more clear SO_BINDTOPREFIX.

Please do not add such a monolithic option.

BPF is absolutely the way to go here, as it allows for whatever user
specified tweaks, like a list of destination subnetwork, or/and a list
of source network, or the date/time of the day, or port knocking without
netfilter, or ... you name it.

Simply add an option to load a BPF filter on a socket, used to vary the
various compute_score() functions.

No hard coded knowledge in the kernel, but a generic interface.




Re: bpf: net/core/filter.c:2115 suspicious rcu_dereference_protected() usage!

2016-03-29 Thread Daniel Borkmann

[ dropping my old email address ]

On 03/29/2016 02:58 PM, Michal Kubecek wrote:

On Mon, Feb 22, 2016 at 10:31:33AM -0500, Sasha Levin wrote:


I've hit the following warning while fuzzing with trinity inside a kvmtool guest
running the latest -next kernel:

[ 1343.104588] ===
[ 1343.104591] [ INFO: suspicious RCU usage. ]
[ 1343.104619] 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978 Not 
tainted
[ 1343.104624] ---
[ 1343.104635] net/core/filter.c:2115 suspicious rcu_dereference_protected() 
usage!
[ 1343.104641]
[ 1343.104641] other info that might help us debug this:
[ 1343.104641]
[ 1343.104650]
[ 1343.104650] rcu_scheduler_active = 1, debug_locks = 0
[ 1343.104660] 1 lock held by syz-executor/17916:
[ 1343.104784] #0: (rtnl_mutex){+.+.+.}, at: rtnl_lock (net/core/rtnetlink.c:71)
[ 1343.104789]
[ 1343.104789] stack backtrace:
[ 1343.104820] CPU: 1 PID: 17916 Comm: trinity-c8 Not tainted 
4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
[ 1343.104868]  110036968f44 8801b4b47aa8 a23d9a9d 
0001
[ 1343.104891]  fbfff5c2a630 41b58ab3 adb3a8f2 
a23d9905
[ 1343.104914]   8801b5419b40 fbfff7596522 
0001
[ 1343.104919] Call Trace:
[ 1343.104985] dump_stack (lib/dump_stack.c:53)
[ 1343.105060] lockdep_rcu_suspicious (kernel/locking/lockdep.c:4282)
[ 1343.105093] sk_detach_filter (net/core/filter.c:2114 (discriminator 5))
[ 1343.105193] tun_detach_filter (drivers/net/tun.c:1808 (discriminator 7))
[ 1343.105238] __tun_chr_ioctl (drivers/net/tun.c:2133)
[ 1343.105370] tun_chr_ioctl (drivers/net/tun.c:2161)
[ 1343.105407] do_vfs_ioctl (fs/ioctl.c:44 fs/ioctl.c:674)
[ 1343.105506] SyS_ioctl (fs/ioctl.c:689 fs/ioctl.c:680)
[ 1343.105542] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)


Looks like sk_detach_filter() wants the socket to be owned, which
tun_detach_filter() does not do, unlike sock_setsockopt(). Could you
check if the patch below helps?

I'm also not really sure if it is safe to ignore the return value of
sk_detach_filter() and just set tun->filter_attached to false - but
it's not really clear what should be done if one of the calls fails
after some have succeeded.


Wrt return value, afaik SOCK_FILTER_LOCKED cannot be set for tun devs, so we
should be okay.


diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950617c3..7417d7c20bab 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1818,11 +1818,13 @@ static int set_offload(struct tun_struct *tun, unsigned 
long arg)
  static void tun_detach_filter(struct tun_struct *tun, int n)
  {
int i;
-   struct tun_file *tfile;

for (i = 0; i < n; i++) {
-   tfile = rtnl_dereference(tun->tfiles[i]);
-   sk_detach_filter(tfile->socket.sk);
+   struct sock *sk = rtnl_dereference(tun->tfiles[i])->socket.sk;
+
+   lock_sock(sk);
+   sk_detach_filter(sk);
+   release_sock(sk);
}

tun->filter_attached = false;



In tun case, the control path for tun_attach_filter() and tun_detach_filter()
is under RTNL lock (held in __tun_chr_ioctl()).

So in the BPF core the rcu_dereference_protected(, 
sock_owned_by_user(sk))
looks like a false positive in this specific use case to me, that we should 
probably
just silence.

Running the filter via sk_filter() in tun device happens under rcu_read_lock(),
so the dereference and assignment pair seems okay to me.

Was wondering whether we should convert this to unattached BPF filter, but this
would break with existing expectations from sk_filter() (e.g. security modules).


[PATCH] sctp: use list_* in sctp_list_dequeue

2016-03-29 Thread Marcelo Ricardo Leitner
Use the list_* helpers in sctp_list_dequeue; it is more readable.

Signed-off-by: Marcelo Ricardo Leitner 
---
 include/net/sctp/sctp.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 
65521cfdcadeee35d61f280165a387cc2164ab6d..03fb33efcae21d54192204629ff4ced2e36e7d4d
 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -386,11 +386,9 @@ static inline struct list_head *sctp_list_dequeue(struct 
list_head *list)
 {
struct list_head *result = NULL;
 
-   if (list->next != list) {
+   if (!list_empty(list)) {
result = list->next;
-   list->next = result->next;
-   list->next->prev = list;
-   INIT_LIST_HEAD(result);
+   list_del_init(result);
}
return result;
 }
-- 
2.5.0
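To see why the helpers are equivalent to the open-coded pointer surgery they replace, here is a minimal userspace re-implementation of the list_head operations used above (simplified from include/linux/list.h; the kernel versions add debug checks):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *l) { l->next = l->prev = l; }
static bool list_empty(const struct list_head *l) { return l->next == l; }

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Unlinks the entry and re-initializes it, exactly what the original
 * three lines of manual pointer updates did. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Same shape as the patched sctp_list_dequeue(). */
static struct list_head *list_dequeue(struct list_head *list)
{
	struct list_head *result = NULL;

	if (!list_empty(list)) {
		result = list->next;
		list_del_init(result);
	}
	return result;
}
```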



[PATCH 0/2] sctp: delay calls to sk_data_ready() as much as possible

2016-03-29 Thread Marcelo Ricardo Leitner
1st patch is a preparation for the 2nd. The idea is to not call
->sk_data_ready() for every data chunk processed while processing
packets but only once before releasing the socket.

Marcelo Ricardo Leitner (2):
  sctp: compress bit-wide flags to a bitfield on sctp_sock
  sctp: delay calls to sk_data_ready() as much as possible

 include/net/sctp/structs.h | 13 +++--
 net/sctp/sm_sideeffect.c   |  5 +
 net/sctp/ulpqueue.c|  4 ++--
 3 files changed, 14 insertions(+), 8 deletions(-)

-- 
2.5.0



[PATCH 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock

2016-03-29 Thread Marcelo Ricardo Leitner
It wastes space and gets worse as we add new flags, so convert bit-wide
flags to a bitfield.

Currently it already saves 4 bytes in sctp_sock.

Note that do_auto_asconf cannot be merged, as explained in the comment
before it.

Signed-off-by: Marcelo Ricardo Leitner 
---
 include/net/sctp/structs.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 
6df1ce7a411c548bda4163840a90578b6e1b4cfe..1a6a626904bba4223b7921bbb4be41c2550271a7
 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -210,14 +210,14 @@ struct sctp_sock {
int user_frag;
 
__u32 autoclose;
-   __u8 nodelay;
-   __u8 disable_fragments;
-   __u8 v4mapped;
-   __u8 frag_interleave;
__u32 adaptation_ind;
__u32 pd_point;
-   __u8 recvrcvinfo;
-   __u8 recvnxtinfo;
+   __u16   nodelay:1,
+   disable_fragments:1,
+   v4mapped:1,
+   frag_interleave:1,
+   recvrcvinfo:1,
+   recvnxtinfo:1;
 
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
-- 
2.5.0
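The saving can be checked with sizeof: six one-byte flags versus six one-bit fields packed into a 16-bit word. Exact sizes are ABI-dependent, so this sketch only assumes the bitfield layout is smaller (typically 2 vs 6 bytes on common ABIs):

```c
#include <assert.h>
#include <stdint.h>

/* Six byte-wide flags, as sctp_sock had before the patch. */
struct flags_bytes {
	uint8_t nodelay, disable_fragments, v4mapped,
		frag_interleave, recvrcvinfo, recvnxtinfo;
};

/* The same six flags packed into a 16-bit bitfield, as after the patch. */
struct flags_bitfield {
	uint16_t nodelay:1, disable_fragments:1, v4mapped:1,
		 frag_interleave:1, recvrcvinfo:1, recvnxtinfo:1;
};
```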



[PATCH 2/2] sctp: delay calls to sk_data_ready() as much as possible

2016-03-29 Thread Marcelo Ricardo Leitner
Currently, processing of multiple chunks in a single SCTP packet leads to
multiple calls to sk_data_ready, causing multiple wake-up signals which
are costly and don't make it wake up any faster.

With this patch it will note that a wake-up is pending and will do it
before leaving the state machine interpreter, the latest place possible
to do it reliably and cleanly.

Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.

Signed-off-by: Marcelo Ricardo Leitner 
---
 include/net/sctp/structs.h | 3 ++-
 net/sctp/sm_sideeffect.c   | 5 +
 net/sctp/ulpqueue.c| 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 
1a6a626904bba4223b7921bbb4be41c2550271a7..21cb11107e378b4da1e7efde22fab4349496e35a
 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -217,7 +217,8 @@ struct sctp_sock {
v4mapped:1,
frag_interleave:1,
recvrcvinfo:1,
-   recvnxtinfo:1;
+   recvnxtinfo:1,
+   pending_data_ready:1;
 
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 
7fe56d0acabf66cfd8fe29dfdb45f7620b470ac7..e7042f9ce63b0cfca50cae252f51b60b68cb5731
 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1742,6 +1742,11 @@ out:
error = sctp_outq_uncork(&asoc->outqueue, gfp);
} else if (local_cork)
error = sctp_outq_uncork(&asoc->outqueue, gfp);
+
+   if (sctp_sk(ep->base.sk)->pending_data_ready) {
+   ep->base.sk->sk_data_ready(ep->base.sk);
+   sctp_sk(ep->base.sk)->pending_data_ready = 0;
+   }
return error;
 nomem:
error = -ENOMEM;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index 
ce469d648ffbe166f9ae1c5650f481256f31a7f8..72e5b3e41cddf9d79371de8ab01484e4601b97b6
 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -264,7 +264,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct 
sctp_ulpevent *event)
sctp_ulpq_clear_pd(ulpq);
 
if (queue == &sk->sk_receive_queue)
-   sk->sk_data_ready(sk);
+   sctp_sk(sk)->pending_data_ready = 1;
return 1;
 
 out_free:
@@ -1140,5 +1140,5 @@ void sctp_ulpq_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 
/* If there is data waiting, send it up the socket now. */
if (sctp_ulpq_clear_pd(ulpq) || ev)
-   sk->sk_data_ready(sk);
+   sctp_sk(sk)->pending_data_ready = 1;
 }
-- 
2.5.0
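The flag-and-flush pattern the patch introduces can be sketched in userspace, counting wake-ups instead of calling sk_data_ready() (struct and helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

struct sock_sketch {
	bool pending_data_ready;
	int wakeups;
};

static void data_ready(struct sock_sketch *sk) { sk->wakeups++; }

/* Per-chunk path: only record that a wake-up is owed. */
static void chunk_arrived(struct sock_sketch *sk)
{
	sk->pending_data_ready = true;
}

/* Run once when leaving the state machine: at most one wake-up per
 * packet, however many data chunks it carried. */
static void flush_pending(struct sock_sketch *sk)
{
	if (sk->pending_data_ready) {
		data_ready(sk);
		sk->pending_data_ready = false;
	}
}
```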



[PATCH] sctp: avoid refreshing heartbeat timer too often

2016-03-29 Thread Marcelo Ricardo Leitner
Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.

There are 3 places that call sctp_transport_reset_timers(), including
one that calls it for every single chunk added to a packet on
sctp_outq_flush(), even if they are bundled. As there is the possibility
of working on several transports, it's not trivial to simply postpone
this call, as that would require at least remembering which transports
were used.

This change then moves the random factor to outside
sctp_transport_timeout(), allowing sctp_transport_reset_timers() to
check if a timer update is really necessary. For that, the random factor
is considered to be 0. If the timer's expiry is still after that value,
the update is not necessary, as the current expiry is a possible random
value and there is no security or functional loss in keeping it.

On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:

Samples: 103K of event 'cpu-clock', Event count (approx.): 2583300
  Overhead  Command  Shared Object  Symbol
+6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
   - _raw_write_unlock_irqrestore
  - 96,54% _raw_spin_unlock_irqrestore
 - 36,14% mod_timer
+ 97,24% sctp_transport_reset_timers
+ 2,76% sctp_do_sm
 + 33,65% __wake_up_sync_key
 + 28,77% sctp_ulpq_tail_event
 + 1,40% del_timer
  - 1,84% mod_timer
 + 99,03% sctp_transport_reset_timers
 + 0,97% sctp_do_sm
  + 1,50% sctp_ulpq_tail_event

And after this patch:

Samples: 120K of event 'cpu-clock', Event count (approx.): 3000225
  Overhead  Command  Shared Object  Symbol
+7,34%  netperf  [kernel.vmlinux]   [k] memcpy_erms
+6,67%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-5,34%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
   - _raw_write_unlock_irqrestore
  - 98,00% _raw_spin_unlock_irqrestore
 + 54,24% __wake_up_sync_key
 + 41,54% sctp_ulpq_tail_event
 - 2,61% mod_timer
+ 79,88% sctp_transport_update_pmtu
+ 20,12% sctp_do_sm
 + 1,61% del_timer
  + 1,83% sctp_ulpq_tail_event
+2,34%  netperf  [kernel.vmlinux]   [k] pvclock_clocksource_read
+2,31%  netperf  [sctp] [k] sctp_sendmsg
+1,93%  netperf  [kernel.vmlinux]   [k] __slab_free
+1,92%  netperf  [sctp] [k] sctp_packet_transmit
+1,85%  netperf  [kernel.vmlinux]   [k] kfree

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/transport.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 
9b6b48c7524e4b441a151b80f0babec81f539d49..c9d6c61f5a511f96f38a0f2c275e418e158e0632
 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -185,6 +185,8 @@ static void sctp_transport_destroy(struct sctp_transport 
*transport)
  */
 void sctp_transport_reset_timers(struct sctp_transport *transport)
 {
+   unsigned long expires;
+
/* RFC 2960 6.3.2 Retransmission Timer Rules
 *
 * R1) Every time a DATA chunk is sent to any address(including a
@@ -199,8 +201,10 @@ void sctp_transport_reset_timers(struct sctp_transport 
*transport)
sctp_transport_hold(transport);
 
/* When a data chunk is sent, reset the heartbeat interval.  */
-   if (!mod_timer(&transport->hb_timer,
-  sctp_transport_timeout(transport)))
+   expires = sctp_transport_timeout(transport);
+   if (time_before(transport->hb_timer.expires, expires) &&
+   !mod_timer(&transport->hb_timer,
+  expires + prandom_u32_max(transport->rto)))
sctp_transport_hold(transport);
 }
 
@@ -595,7 +599,7 @@ void sctp_transport_burst_reset(struct sctp_transport *t)
 unsigned long sctp_transport_timeout(struct sctp_transport *trans)
 {
/* RTO + timer slack +/- 50% of RTO */
-   unsigned long timeout = (trans->rto >> 1) + prandom_u32_max(trans->rto);
+   unsigned long timeout = trans->rto >> 1;
 
if (trans->state != SCTP_UNCONFIRMED &&
trans->state != SCTP_PF)
-- 
2.5.0
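The core of the optimization, comparing against the deterministic minimum (random slack treated as 0) and skipping the update when the current expiry is already beyond it, can be sketched as follows (hypothetical helper; a plain comparison stands in for time_before()):

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether mod_timer() is needed at all: with the random part of
 * the timeout assumed to be 0, any expiry at or beyond now + rto/2 is a
 * value the randomized timeout could have produced, so leave it alone. */
static bool hb_timer_needs_update(unsigned long cur_expires,
				  unsigned long now, unsigned long rto)
{
	unsigned long min_timeout = now + (rto >> 1);

	return cur_expires < min_timeout;
}
```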



[PATCH] sctp: really allow using GFP_KERNEL on sctp_packet_transmit

2016-03-29 Thread Marcelo Ricardo Leitner
Somehow my patch for commit cea8768f333e ("sctp: allow
sctp_transmit_packet and others to use gfp") missed two important
chunks, which are now added.

Fixes: cea8768f333e ("sctp: allow sctp_transmit_packet and others to use gfp")
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 
736c004abfbc2787a3c50fa85168ebdf3b112787..97745351d58c2fb32b9f9b57d61831d7724d83b2
 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -401,7 +401,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t 
gfp)
sk = chunk->skb->sk;
 
/* Allocate the new skb.  */
-   nskb = alloc_skb(packet->size + MAX_HEADER, GFP_ATOMIC);
+   nskb = alloc_skb(packet->size + MAX_HEADER, gfp);
if (!nskb)
goto nomem;
 
@@ -523,8 +523,8 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t 
gfp)
 */
if (auth)
sctp_auth_calculate_hmac(asoc, nskb,
-   (struct sctp_auth_chunk *)auth,
-   GFP_ATOMIC);
+(struct sctp_auth_chunk *)auth,
+gfp);
 
/* 2) Calculate the Adler-32 checksum of the whole packet,
 *including the SCTP common header and all the
-- 
2.5.0



[PATCH] sctp: flush if we can't fit another DATA chunk

2016-03-29 Thread Marcelo Ricardo Leitner
There is no point in delaying the packet if we can't fit a single byte
of data in it anymore. So let's reduce the threshold by the amount that
a data chunk carrying 4 bytes (for rounding) would use.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 
97745351d58c2fb32b9f9b57d61831d7724d83b2..c518569123ce42a8f21f80754756306c39875013
 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -705,7 +705,8 @@ static sctp_xmit_t sctp_packet_can_append_data(struct 
sctp_packet *packet,
/* Check whether this chunk and all the rest of pending data will fit
 * or delay in hopes of bundling a full sized packet.
 */
-   if (chunk->skb->len + q->out_qlen >= transport->pathmtu - 
packet->overhead)
+   if (chunk->skb->len + q->out_qlen >
+   maxsize - packet->overhead - sizeof(sctp_data_chunk_t) - 4)
/* Enough data queued to fill a packet */
return SCTP_XMIT_OK;
 
-- 
2.5.0
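With hypothetical numbers, the new threshold works out as follows (sizeof(sctp_data_chunk_t) is taken as 16 bytes purely for illustration; the real value depends on the struct layout):

```c
#include <assert.h>
#include <stddef.h>

#define DATA_CHUNK_HDR 16	/* assumed sizeof(sctp_data_chunk_t) */

/* Flush (return 1) once not even a minimal 4-byte DATA chunk would
 * still fit alongside the queued data, mirroring the patched check. */
static int should_flush(size_t queued, size_t maxsize, size_t overhead)
{
	return queued > maxsize - overhead - DATA_CHUNK_HDR - 4;
}
```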



[net-next RFC 0/4] SO_BINDTOPREFIX

2016-03-29 Thread Gilberto Bertin
Since the net-next window just opened, I'm resubmitting my RFC for the
SO_BINDTOSUBNET patch, following Mark Smith's suggestion to rename the
whole thing to a clearer SO_BINDTOPREFIX.

Some arguments for and against it since the first submission:
* SO_BINDTOPREFIX is an arbitrary option and can be seen as another use
  case of the SO_REUSEPORT BPF patch
* but at the same time using BPF requires more work/code on the server,
  and since the bind-to-prefix use case could potentially become a
  common one, maybe there is some value in having it as an option instead
  of having to code (either manually or with clang) an eBPF program that
  would do the same
* it may be possible to achieve the same results using VRF. This
  would require creating a VRF device, configuring the device routing
  table and binding each process to a different VRF device (but
  I'm not sure how this would work/interfere with an existing iptables
  setup, for example)

-

This series introduces support for the SO_BINDTOPREFIX socket option, which
allows a listener socket to bind to a prefix instead of * or a single address.

Motivation:
consider a set of servers, each one with thousands and thousands of IP
addresses. Since assigning individual /32 or /128 IP addresses would be
inefficient, one solution can be assigning prefixes using local routes
(with 'ip route add local').

This allows a listener to listen and terminate connections going to any
of the IP addresses of these prefixes without explicitly configuring all
the IP addresses of the prefix range.
This is very efficient.

Unfortunately there may be the need to use different prefixes for
different purposes.
One can imagine port 80 being served by one HTTP server for some IP
prefix, while another server used for another prefix.
Right now Linux does not allow this.
It is either possible to bind to *, indicating ALL traffic going to a
given port, or to individual IP addresses.
The former only allows accepting connections for all the prefixes at once.
The latter does not scale well with lots of IP addresses.

Using bindtoprefix would solve this problem: just by adding a local
route rule and setting the SO_BINDTOPREFIX option for a socket it would
be possible to easily partition traffic by prefixes.

API:
the prefix is specified (as argument of the setsockopt syscall) by the
address of the network, and the prefix length of the netmask.

IPv4:
struct ipv4_prefix {
__be32 net;
u_char plen;
};

and IPv6:
struct ipv6_prefix {
struct in6_addr net;
u_char plen;
};

Bind conflicts:
two sockets with the bindtoprefix option enabled generate a bind
conflict if their network addresses, masked with the shorter of their
two prefixes, are equal.
The bindtoprefix option can be combined with soreuseport so that two
listeners can bind on the same prefix.
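The IPv4 form of this conflict rule can be sketched in userspace, masking both network addresses with the shorter prefix (helper names are hypothetical; make_mask approximates inet_make_mask):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order netmask for a prefix length, e.g. 8 -> 0xFF000000. */
static uint32_t make_mask(unsigned plen)
{
	return plen ? ~UINT32_C(0) << (32 - plen) : 0;
}

/* Two bound-to-prefix sockets conflict when their network addresses
 * agree under the shorter of the two masks. */
static bool prefix_conflict(uint32_t net1, unsigned plen1,
			    uint32_t net2, unsigned plen2)
{
	uint32_t mask = make_mask(plen1 < plen2 ? plen1 : plen2);

	return (net1 & mask) == (net2 & mask);
}
```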

Any questions/feedback appreciated.

Thanks,
 Gilberto

Gilberto Bertin (4):
  bindtoprefix: infrastructure
  bindtoprefix: TCP/IPv4 implementation
  bindtoprefix: TCP/IPv6 implementation
  bindtoprefix: UDP implementation

 include/net/sock.h|  20 +++
 include/uapi/asm-generic/socket.h |   1 +
 net/core/sock.c   | 111 ++
 net/ipv4/inet_connection_sock.c   |  20 ++-
 net/ipv4/inet_hashtables.c|   9 
 net/ipv4/udp.c|  36 +
 net/ipv6/inet6_connection_sock.c  |  17 +-
 net/ipv6/inet6_hashtables.c   |   6 +++
 8 files changed, 218 insertions(+), 2 deletions(-)

-- 
2.7.3



[net-next RFC 3/4] bindtoprefix: TCP/IPv6 implementation

2016-03-29 Thread Gilberto Bertin
Signed-off-by: Gilberto Bertin 
---
 net/ipv6/inet6_connection_sock.c | 17 -
 net/ipv6/inet6_hashtables.c  |  6 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 36c3f01..c65023f 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -27,6 +27,20 @@
 #include 
 #include 
 
+int inet6_csk_bind_prefix_conflict(const struct sock *sk,
+  const struct sock *sk2)
+{
+   u_char plen;
+
+   plen = min(sk->sk_bind_prefix6.plen, sk2->sk_bind_prefix6.plen);
+
+   if (sk->sk_bind_to_prefix && sk2->sk_bind_to_prefix)
+   return ipv6_prefix_equal(&sk->sk_bind_prefix6.net,
+&sk2->sk_bind_prefix6.net, plen);
+
+   return 0;
+}
+
 int inet6_csk_bind_conflict(const struct sock *sk,
const struct inet_bind_bucket *tb, bool relax)
 {
@@ -44,7 +58,8 @@ int inet6_csk_bind_conflict(const struct sock *sk,
if (sk != sk2 &&
(!sk->sk_bound_dev_if ||
 !sk2->sk_bound_dev_if ||
-sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
+sk->sk_bound_dev_if == sk2->sk_bound_dev_if) &&
+inet6_csk_bind_prefix_conflict(sk, sk2)) {
if ((!reuse || !sk2->sk_reuse ||
 sk2->sk_state == TCP_LISTEN) &&
(!reuseport || !sk2->sk_reuseport ||
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 21ace5a..bcc16ed 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -114,6 +114,12 @@ static inline int compute_score(struct sock *sk, struct 
net *net,
return -1;
score++;
}
+   if (sk->sk_bind_to_prefix) {
+   if (!ipv6_prefix_equal(&sk->sk_bind_prefix6.net, daddr,
+  sk->sk_bind_prefix6.plen))
+   return -1;
+   score++;
+   }
if (sk->sk_incoming_cpu == raw_smp_processor_id())
score++;
}
-- 
2.7.3



[net-next RFC 4/4] bindtoprefix: UDP implementation

2016-03-29 Thread Gilberto Bertin
Signed-off-by: Gilberto Bertin 
---
 net/ipv4/udp.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 95d2f19..31b9687 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -133,6 +133,23 @@ EXPORT_SYMBOL(udp_memory_allocated);
 #define MAX_UDP_PORTS 65536
 #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
+static inline int udp_csk_bind_prefix_conflict(const struct sock *sk,
+  const struct sock *sk2)
+{
+   __be32 mask;
+
+   if (sk->sk_bind_to_prefix && sk2->sk_bind_to_prefix) {
+   mask = inet_make_mask(min(sk->sk_bind_prefix4.plen,
+ sk2->sk_bind_prefix4.plen));
+
+   return (sk->sk_bind_prefix4.net & mask) ==
+  (sk2->sk_bind_prefix4.net & mask);
+   }
+
+   return 0;
+}
+
+
 static int udp_lib_lport_inuse(struct net *net, __u16 num,
   const struct udp_hslot *hslot,
   unsigned long *bitmap,
@@ -153,6 +170,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
(!sk2->sk_reuse || !sk->sk_reuse) &&
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
+udp_csk_bind_prefix_conflict(sk, sk2) &&
(!sk2->sk_reuseport || !sk->sk_reuseport ||
 rcu_access_pointer(sk->sk_reuseport_cb) ||
 !uid_eq(uid, sock_i_uid(sk2))) &&
@@ -189,6 +207,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num,
(!sk2->sk_reuse || !sk->sk_reuse) &&
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
+udp_csk_bind_prefix_conflict(sk, sk2) &&
(!sk2->sk_reuseport || !sk->sk_reuseport ||
 rcu_access_pointer(sk->sk_reuseport_cb) ||
 !uid_eq(uid, sock_i_uid(sk2))) &&
@@ -426,6 +445,15 @@ static inline int compute_score(struct sock *sk, struct 
net *net,
return -1;
score += 4;
}
+
+   if (sk->sk_bind_to_prefix) {
+   __be32 mask = inet_make_mask(sk->sk_bind_prefix4.plen);
+
+   if ((sk->sk_bind_prefix4.net & mask) != (daddr & mask))
+   return -1;
+   score += 4;
+   }
+
if (sk->sk_incoming_cpu == raw_smp_processor_id())
score++;
return score;
@@ -471,6 +499,14 @@ static inline int compute_score2(struct sock *sk, struct 
net *net,
score += 4;
}
 
+   if (sk->sk_bind_to_prefix) {
+   __be32 mask = inet_make_mask(sk->sk_bind_prefix4.plen);
+
+   if ((sk->sk_bind_prefix4.net & mask) != (daddr & mask))
+   return -1;
+   score += 4;
+   }
+
if (sk->sk_incoming_cpu == raw_smp_processor_id())
score++;
 
-- 
2.7.3



[net-next RFC 1/4] bindtoprefix: infrastructure

2016-03-29 Thread Gilberto Bertin
Signed-off-by: Gilberto Bertin 
---
 include/net/sock.h|  20 +++
 include/uapi/asm-generic/socket.h |   1 +
 net/core/sock.c   | 111 ++
 3 files changed, 132 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index f5ea148..409d255 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -109,6 +109,16 @@ typedef struct {
 #endif
 } socket_lock_t;
 
+struct ipv4_prefix {
+   __be32 net;
+   u_char plen;
+};
+
+struct ipv6_prefix {
+   struct in6_addr net;
+   u_char plen;
+};
+
 struct sock;
 struct proto;
 struct net;
@@ -176,6 +186,13 @@ struct sock_common {
unsigned char   skc_ipv6only:1;
unsigned char   skc_net_refcnt:1;
int skc_bound_dev_if;
+
+   unsigned char   skc_bind_to_prefix;
+   union {
+   struct ipv4_prefix skc_bind_prefix4;
+   struct ipv6_prefix skc_bind_prefix6;
+   };
+
union {
struct hlist_node   skc_bind_node;
struct hlist_nulls_node skc_portaddr_node;
@@ -327,6 +344,9 @@ struct sock {
 #define sk_state   __sk_common.skc_state
 #define sk_reuse   __sk_common.skc_reuse
 #define sk_reuseport   __sk_common.skc_reuseport
+#define sk_bind_to_prefix  __sk_common.skc_bind_to_prefix
+#define sk_bind_prefix4__sk_common.skc_bind_prefix4
+#define sk_bind_prefix6__sk_common.skc_bind_prefix6
 #define sk_ipv6only__sk_common.skc_ipv6only
 #define sk_net_refcnt  __sk_common.skc_net_refcnt
 #define sk_bound_dev_if__sk_common.skc_bound_dev_if
diff --git a/include/uapi/asm-generic/socket.h 
b/include/uapi/asm-generic/socket.h
index fb8a416..b4dd61f 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -30,6 +30,7 @@
 #define SO_SNDLOWAT19
 #define SO_RCVTIMEO20
 #define SO_SNDTIMEO21
+#define SO_BINDTOPREFIX 22
 #endif
 
 /* Security levels - as per NRL IPv6 - don't actually do anything */
diff --git a/net/core/sock.c b/net/core/sock.c
index 6c1c8bc..e4c9c55 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -571,6 +571,68 @@ out:
return ret;
 }
 
+static int sock_setbindtoprefix(struct sock *sk, char __user *optval,
+   int optlen)
+{
+   int ret = -ENOPROTOOPT;
+
+   if (sk->sk_family == AF_INET) {
+   struct ipv4_prefix bind_prefix4;
+
+   ret = -EFAULT;
+   if (optlen != sizeof(struct ipv4_prefix))
+   goto out;
+
+   if (copy_from_user(&bind_prefix4, optval,
+  sizeof(struct ipv4_prefix)))
+   goto out;
+
+   ret = -EINVAL;
+   if (bind_prefix4.plen > 32)
+   goto out;
+
+   lock_sock(sk);
+
+   sk->sk_bind_to_prefix = 1;
+   sk->sk_bind_prefix4.net = bind_prefix4.net;
+   sk->sk_bind_prefix4.plen = bind_prefix4.plen;
+   sk_dst_reset(sk);
+
+   release_sock(sk);
+
+   ret = 0;
+   } else if (sk->sk_family == AF_INET6) {
+   struct ipv6_prefix bind_prefix6;
+
+   ret = -EFAULT;
+   if (optlen != sizeof(struct ipv6_prefix))
+   goto out;
+
+   if (copy_from_user(&bind_prefix6, optval,
+  sizeof(struct ipv6_prefix)))
+   goto out;
+
+   ret = -EINVAL;
+   if (bind_prefix6.plen > 128)
+   goto out;
+
+   lock_sock(sk);
+
+   sk->sk_bind_to_prefix = 1;
+   memcpy(&sk->sk_bind_prefix6.net, &bind_prefix6.net,
+  sizeof(struct in6_addr));
+   sk->sk_bind_prefix6.plen = bind_prefix6.plen;
+   sk_dst_reset(sk);
+
+   release_sock(sk);
+
+   ret = 0;
+   }
+
+out:
+   return ret;
+}
+
 static int sock_getbindtodevice(struct sock *sk, char __user *optval,
int __user *optlen, int len)
 {
@@ -611,6 +673,49 @@ out:
return ret;
 }
 
+static int sock_getbindtoprefix(struct sock *sk, char __user *optval,
+   int __user *optlen, int len)
+{
+   int ret;
+
+   if (sk->sk_bind_to_prefix == 0) {
+   len = 0;
+   goto zero;
+   }
+
+   if (sk->sk_family == AF_INET) {
+   ret = -EINVAL;
+   if (len < sizeof(struct ipv4_prefix))
+   goto out;
+
+   len = sizeof(struct ipv4_prefix);
+
+   ret = -EFAULT;
+   if (copy_to_user(optval, &sk->sk_bind_prefix4, len))
+   goto out;
+
+   } else if (sk->sk_family == AF_INET6) {
+   ret = -EINVAL;
+  

[net-next RFC 2/4] bindtoprefix: TCP/IPv4 implementation

2016-03-29 Thread Gilberto Bertin
Signed-off-by: Gilberto Bertin 
---
 net/ipv4/inet_connection_sock.c | 20 +++-
 net/ipv4/inet_hashtables.c  |  9 +
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 6414891..162c252 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -43,6 +44,22 @@ void inet_get_local_port_range(struct net *net, int *low, 
int *high)
 }
 EXPORT_SYMBOL(inet_get_local_port_range);
 
+static inline int inet_csk_bind_prefix_conflict(const struct sock *sk,
+   const struct sock *sk2)
+{
+   __be32 mask;
+
+   if (sk->sk_bind_to_prefix && sk2->sk_bind_to_prefix) {
+   mask = inet_make_mask(min(sk->sk_bind_prefix4.plen,
+ sk2->sk_bind_prefix4.plen));
+
+   return (sk->sk_bind_prefix4.net & mask) ==
+  (sk2->sk_bind_prefix4.net & mask);
+   }
+
+   return 0;
+}
+
 int inet_csk_bind_conflict(const struct sock *sk,
   const struct inet_bind_bucket *tb, bool relax)
 {
@@ -63,7 +80,8 @@ int inet_csk_bind_conflict(const struct sock *sk,
!inet_v6_ipv6only(sk2) &&
(!sk->sk_bound_dev_if ||
 !sk2->sk_bound_dev_if ||
-sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
+sk->sk_bound_dev_if == sk2->sk_bound_dev_if) &&
+inet_csk_bind_prefix_conflict(sk, sk2)) {
if ((!reuse || !sk2->sk_reuse ||
sk2->sk_state == TCP_LISTEN) &&
(!reuseport || !sk2->sk_reuseport ||
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ccc5980..44693c4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -13,6 +13,7 @@
  *  2 of the License, or (at your option) any later version.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -189,6 +190,14 @@ static inline int compute_score(struct sock *sk, struct 
net *net,
return -1;
score += 4;
}
+   if (sk->sk_bind_to_prefix) {
+   __be32 mask = inet_make_mask(sk->sk_bind_prefix4.plen);
+
+   if ((sk->sk_bind_prefix4.net & mask) != (daddr & mask))
+   return -1;
+   score += 4;
+   }
+
if (sk->sk_incoming_cpu == raw_smp_processor_id())
score++;
}
-- 
2.7.3



Re: [PATCH net-next v3.16] r8169: Correct Set Vlan tag

2016-03-29 Thread Eric Dumazet
On Tue, 2016-03-29 at 11:33 +0300, Corcodel Marian wrote:
>  This patch adds the VLAN tag setting and flushes the CPlusCmd register
>  because when the RxVlan and RxChkSum bits are unset, unwanted bits
>  (PCIDAC, PCIMulRW and others) get set without explanation. Without this
>  patch, when running ethtool -d eth0, the "C+ Command" field is missing
>  "VLAN de-tagging".

Are you a bot or a human being ?

It really looks like a program was assigned to fool us, sending
random patches targeting r8169 and random linux kernels.

We won't accept patches if they are not targeting current linux kernels.





Re: [PATCH 1/2 net-next v3.16]r8169: Disable set bit multicast enable per multicast address.

2016-03-29 Thread Eric Dumazet
On Sun, 2016-03-27 at 19:31 +0300, Corcodel Marian wrote:
> On Sat, 26 Mar 2016 10:18:46 -0700
> Eric Dumazet  wrote:
> 
> > On Sat, 2016-03-26 at 12:57 +0200, Corcodel Marian wrote:
> > >  This patch correct set bit multicast enable only once per
> > > set_rx_mode invocation.
> > > 
> > > Signed-off-by: Corcodel Marian 
> > > ---
> > >  drivers/net/ethernet/realtek/r8169.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/realtek/r8169.c
> > > b/drivers/net/ethernet/realtek/r8169.c index 7f6fb1f..f7b0dfb 100644
> > > --- a/drivers/net/ethernet/realtek/r8169.c
> > > +++ b/drivers/net/ethernet/realtek/r8169.c
> > > @@ -4619,12 +4619,11 @@ static void rtl_set_rx_mode(struct
> > > net_device *dev) } else {
> > >   struct netdev_hw_addr *ha;
> > >  
> > > - rx_mode = AcceptBroadcast | AcceptMyPhys;
> > > + rx_mode = AcceptBroadcast | AcceptMyPhys |
> > > AcceptMulticast; mc_filter[1] = mc_filter[0] = 0;
> > >   netdev_for_each_mc_addr(ha, dev) {
> > >   int bit_nr = ether_crc(ETH_ALEN, ha->addr)
> > > >> 26; mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
> > > - rx_mode |= AcceptMulticast;
> > >   }
> > >   }
> > >  
> > 
> > If the list is empty, why should we enable AcceptMulticast ?
> > 
> > 
> > 
> 
> I have not experienced an empty list; in my case the bit_nr variable always exists.

If dev->mc list is empty, the netdev_for_each_mc_addr(ha, dev) does
nothing at all.

In other words, netdev_mc_count(dev) can be 0.

The current code is fine, and runs in the control (slow) path anyway.





Re: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-29 Thread Eric Dumazet
On Tue, 2016-03-29 at 17:27 +0800, Wei-Ning Huang wrote:
> Adding some chromium devs to the thread.
> 
> In, http://lxr.free-electrons.com/source/mm/page_alloc.c#L3152
> 
> The default mm retry allocation when 'order <=
> PAGE_ALLOC_COSTLY_ORDER' of gfp_mask contains __GFP_REPEAT.
> PAGE_ALLOC_COSTLY_ORDER is defined to be 3. On systems with page size
> = 4K, this means memory compaction and retry is only done when the
> size of allocation is <= 32K
> In mwifiex, the allocation size is 64K.



>  When we have a system with
> memory fragmentation and the allocation fails, there will be no retry.
> This is why we need to add __GFP_REPEAT here to allow the system to
> perform memory compaction and retry allocation.
> 
> Maybe Amit@marvell can comment on if this is a good fix on this issue.
> I'm also aware that Marvell is in the process of implementing
> scatter/gather for mwifiex, which can also fix the issue.

Before SG is implemented, you really need to copy incoming frames into
the smallest chunks (to get the lowest skb->truesize) and leave the 64KB
allocated buffer forever in the driver.

__GFP_REPEAT won't really solve the issue.

It seems the problem comes from the fact that the drivers calls
dev_kfree_skb_any() after calling mwifiex_deaggr_sdio_pkt(), instead of
recycling this very precious 64KB skb once memory gets fragmented.

Another problem is that mwifiex_deaggr_sdio_pkt() uses
mwifiex_alloc_dma_align_buf() with GFP_KERNEL | GFP_DMA

Really GFP_DMA makes no sense here, since the skb is going to be
processed by the stack, which has no such requirement.

Please use normal skb allocations there.

diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c 
b/drivers/net/wireless/marvell/mwifiex/sdio.c
index b2c839a..8404db5 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -1123,8 +1123,8 @@ static void mwifiex_deaggr_sdio_pkt(struct 
mwifiex_adapter *adapter,
__func__, pkt_len, blk_size);
break;
}
-   skb_deaggr = mwifiex_alloc_dma_align_buf(pkt_len,
-GFP_KERNEL | GFP_DMA);
+   skb_deaggr = __netdev_alloc_skb_ip_align(NULL, pkt_len,
+GFP_KERNEL);
if (!skb_deaggr)
break;
skb_put(skb_deaggr, pkt_len);






Re: bpf: net/core/filter.c:2115 suspicious rcu_dereference_protected() usage!

2016-03-29 Thread Michal Kubecek
On Mon, Feb 22, 2016 at 10:31:33AM -0500, Sasha Levin wrote:
> 
> I've hit the following warning while fuzzing with trinity inside a kvmtool 
> guest
> running the latest -next kernel:
> 
> [ 1343.104588] ===
> [ 1343.104591] [ INFO: suspicious RCU usage. ]
> [ 1343.104619] 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978 Not 
> tainted
> [ 1343.104624] ---
> [ 1343.104635] net/core/filter.c:2115 suspicious rcu_dereference_protected() 
> usage!
> [ 1343.104641]
> [ 1343.104641] other info that might help us debug this:
> [ 1343.104641]
> [ 1343.104650]
> [ 1343.104650] rcu_scheduler_active = 1, debug_locks = 0
> [ 1343.104660] 1 lock held by syz-executor/17916:
> [ 1343.104784] #0: (rtnl_mutex){+.+.+.}, at: rtnl_lock 
> (net/core/rtnetlink.c:71)
> [ 1343.104789]
> [ 1343.104789] stack backtrace:
> [ 1343.104820] CPU: 1 PID: 17916 Comm: trinity-c8 Not tainted 
> 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
> [ 1343.104868]  110036968f44 8801b4b47aa8 a23d9a9d 
> 0001
> [ 1343.104891]  fbfff5c2a630 41b58ab3 adb3a8f2 
> a23d9905
> [ 1343.104914]   8801b5419b40 fbfff7596522 
> 0001
> [ 1343.104919] Call Trace:
> [ 1343.104985] dump_stack (lib/dump_stack.c:53)
> [ 1343.105060] lockdep_rcu_suspicious (kernel/locking/lockdep.c:4282)
> [ 1343.105093] sk_detach_filter (net/core/filter.c:2114 (discriminator 5))
> [ 1343.105193] tun_detach_filter (drivers/net/tun.c:1808 (discriminator 7))
> [ 1343.105238] __tun_chr_ioctl (drivers/net/tun.c:2133)
> [ 1343.105370] tun_chr_ioctl (drivers/net/tun.c:2161)
> [ 1343.105407] do_vfs_ioctl (fs/ioctl.c:44 fs/ioctl.c:674)
> [ 1343.105506] SyS_ioctl (fs/ioctl.c:689 fs/ioctl.c:680)
> [ 1343.105542] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)

Looks like sk_detach_filter() wants the socket to be owned, which
tun_detach_filter() does not do, unlike sock_setsockopt(). Could you
check if the patch below helps?

I'm also not really sure if it is safe to ignore the return value of
sk_detach_filter() and just set tun->filter_attached to false - but
it's not really clear what should be done if one of the calls fails
after some have succeeded.


diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950617c3..7417d7c20bab 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1818,11 +1818,13 @@ static int set_offload(struct tun_struct *tun, unsigned 
long arg)
 static void tun_detach_filter(struct tun_struct *tun, int n)
 {
int i;
-   struct tun_file *tfile;
 
for (i = 0; i < n; i++) {
-   tfile = rtnl_dereference(tun->tfiles[i]);
-   sk_detach_filter(tfile->socket.sk);
+   struct sock *sk = rtnl_dereference(tun->tfiles[i])->socket.sk;
+
+   lock_sock(sk);
+   sk_detach_filter(sk);
+   release_sock(sk);
}
 
tun->filter_attached = false;


Re: [PATCH 15/16] wcn36xx: don't pad beacons for mesh

2016-03-29 Thread Sergei Shtylyov

Hello.

On 3/29/2016 9:06 AM, Bjorn Andersson wrote:


From: Jason Mobarak 

Patch "wcn36xx: Pad TIM PVM if needed" has caused a regression in mesh


   scripts/checkpatch.pl now enforces the specific commit citing format, i.e.
commit <12-digit SHA1> ("<commit summary>").



beaconing.  The field tim_off is always 0 for mesh mode, and thus
pvm_len (referring to the TIM length field) and pad are both incorrectly
calculated.  Thus, msg_body.beacon_length is incorrectly calculated for
mesh mode. Fix this.

Signed-off-by: Jason Mobarak 
Signed-off-by: Chun-Yeow Yeoh 
Signed-off-by: Bjorn Andersson 


   You might want to add the Fixes: tag.

[...]

MBR, Sergei



Re: am335x: no multicast reception over VLAN

2016-03-29 Thread Grygorii Strashko
On 03/29/2016 03:35 PM, Yegor Yefremov wrote:
> On Tue, Mar 29, 2016 at 1:05 PM, Grygorii Strashko
>  wrote:
>> On 03/29/2016 08:21 AM, Yegor Yefremov wrote:
>>> Hi Mugunthan,
>>>
>>> On Tue, Mar 29, 2016 at 6:00 AM, Mugunthan V N  wrote:
 Hi Yegor

 On Wednesday 16 March 2016 08:35 PM, Yegor Yefremov wrote:
> I have an am335x based board using CPSW in Dual EMAC mode. Without
> VLAN IDs I can receive and send multicast packets [1]. When I create
> VLAN ID:
>
> ip link add link eth1 name eth1.100 type vlan id 100
> ip addr add 192.168.100.2/24 brd 192.168.100.255 dev eth1.100
> route add -net 224.0.0.0 netmask 224.0.0.0 eth1.100
>
> I can successfully send multicast packets, but not receive them. On
> the other side of the Ethernet cable I've used Pandaboard. Pandaboard
> could both receive and send multicast packets via VLAN.

 Are you trying multicast tx/rx on eth1 or eth1.100?
>>>
>>> I'm trying multicast tx/rx on eth1.100.
>>>
>>> eth1 has no problems.
>>>
>>
>> it'd be nice if you would be able to post here the output from these commands:
>> # switch-config -d [git://git.ti.com/switch-config/switch-config.git v4.1]
>> # ifconfig -a
>> # tcpdump -e -f -Q in -i eth0
>> # tcpdump -e -f -Q in -i eth0.100
> 
> Which kernel/branch do you want me to test?
> 
> git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git and ti-rt-linux-4.1.y?
> 
> So far I was using vanilla kernel.

Your branch (but preferably a 4.5 (or 4.4) kernel).
Just when you're done with the configuration, run cmds 1 & 2,
and when you run your use case, run cmds 2 & 3 on the receiver side (grab ~5-10
packets), then stop the test and run cmd 1 again.

After all that, could you please provide your console output here?


-- 
regards,
-grygorii


Re: am335x: no multicast reception over VLAN

2016-03-29 Thread Yegor Yefremov
On Tue, Mar 29, 2016 at 1:05 PM, Grygorii Strashko
 wrote:
> On 03/29/2016 08:21 AM, Yegor Yefremov wrote:
>> Hi Mugunthan,
>>
>> On Tue, Mar 29, 2016 at 6:00 AM, Mugunthan V N  wrote:
>>> Hi Yegor
>>>
>>> On Wednesday 16 March 2016 08:35 PM, Yegor Yefremov wrote:
 I have an am335x based board using CPSW in Dual EMAC mode. Without
 VLAN IDs I can receive and send multicast packets [1]. When I create
 VLAN ID:

 ip link add link eth1 name eth1.100 type vlan id 100
 ip addr add 192.168.100.2/24 brd 192.168.100.255 dev eth1.100
 route add -net 224.0.0.0 netmask 224.0.0.0 eth1.100

 I can successfully send multicast packets, but not receive them. On
 the other side of the Ethernet cable I've used Pandaboard. Pandaboard
 could both receive and send multicast packets via VLAN.
>>>
>>> Are you trying multicast tx/rx on eth1 or eth1.100?
>>
>> I'm trying multicast tx/rx on eth1.100.
>>
>> eth1 has no problems.
>>
>
> it'd be nice if you would be able to post here the output from these commands:
> # switch-config -d [git://git.ti.com/switch-config/switch-config.git v4.1]
> # ifconfig -a
> # tcpdump -e -f -Q in -i eth0
> # tcpdump -e -f -Q in -i eth0.100

Which kernel/branch do you want me to test?

git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git and ti-rt-linux-4.1.y?

So far I was using vanilla kernel.

Yegor


[PATCH net-next v2 2/2] net: dsa: mv88e6xxx: Clear the PDOWN bit on setup

2016-03-29 Thread Patrick Uiterwijk
Some of the vendor-specific bootloaders set up this part
of the initialization for us, so this was never added.
However, since upstream bootloaders don't initialize the
chip specifically, they leave the fiber MII's PDOWN flag
set, which means that the CPU port doesn't connect.

This patch checks whether this flag has already been cleared by
something else, and if not, clears it.

Signed-off-by: Patrick Uiterwijk 
---
 drivers/net/dsa/mv88e6xxx.c | 36 
 drivers/net/dsa/mv88e6xxx.h |  8 
 2 files changed, 44 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 86a2029..9807913 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2296,6 +2296,25 @@ restore_page_0:
return ret;
 }
 
+static int mv88e6xxx_power_on_serdes(struct dsa_switch *ds)
+{
+   int ret;
+
+   ret = _mv88e6xxx_phy_page_read(ds, REG_FIBER_SERDES, PAGE_FIBER_SERDES,
+  MII_BMCR);
+   if (ret < 0)
+   return ret;
+
+   if (ret & BMCR_PDOWN) {
+   ret = ret & ~BMCR_PDOWN;
+   ret = _mv88e6xxx_phy_page_write(ds, REG_FIBER_SERDES,
+   PAGE_FIBER_SERDES, MII_BMCR,
+   ret);
+   }
+
+   return ret;
+}
+
 static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -2399,6 +2418,23 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, 
int port)
goto abort;
}
 
+   /* If this port is connected to a SerDes, make sure the SerDes is not
+* powered down.
+*/
+   if (mv88e6xxx_6352_family(ds)) {
+   ret = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_STATUS);
+   if (ret < 0)
+   goto abort;
+   ret &= PORT_STATUS_CMODE_MASK;
+   if ((ret == PORT_STATUS_CMODE_100BASE_X) ||
+   (ret == PORT_STATUS_CMODE_1000BASE_X) ||
+   (ret == PORT_STATUS_CMODE_SGMII)) {
+   ret = mv88e6xxx_power_on_serdes(ds);
+   if (ret < 0)
+   goto abort;
+   }
+   }
+
/* Port Control 2: don't force a good FCS, set the maximum frame size to
 * 10240 bytes, disable 802.1q tags checking, don't discard tagged or
 * untagged frames on this port, do a destination address lookup on all
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 9a038ab..26a424a 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -28,6 +28,10 @@
 #define SMI_CMD_OP_45_READ_DATA_INC((3 << 10) | SMI_CMD_BUSY)
 #define SMI_DATA   0x01
 
+/* Fiber/SERDES Registers are located at SMI address F, page 1 */
+#define REG_FIBER_SERDES   0x0f
+#define PAGE_FIBER_SERDES  0x01
+
 #define REG_PORT(p)(0x10 + (p))
 #define PORT_STATUS0x00
 #define PORT_STATUS_PAUSE_EN   BIT(15)
@@ -45,6 +49,10 @@
 #define PORT_STATUS_MGMII  BIT(6) /* 6185 */
 #define PORT_STATUS_TX_PAUSED  BIT(5)
 #define PORT_STATUS_FLOW_CTRL  BIT(4)
+#define PORT_STATUS_CMODE_MASK 0x0f
+#define PORT_STATUS_CMODE_100BASE_X0x8
+#define PORT_STATUS_CMODE_1000BASE_X   0x9
+#define PORT_STATUS_CMODE_SGMII0xa
 #define PORT_PCS_CTRL  0x01
 #define PORT_PCS_CTRL_RGMII_DELAY_RXCLKBIT(15)
 #define PORT_PCS_CTRL_RGMII_DELAY_TXCLKBIT(14)
-- 
2.7.4



[PATCH net-next v2 1/2] net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read,write}

2016-03-29 Thread Patrick Uiterwijk
Add versions of the phy_page_read and _write functions to
be used in a context where the SMI mutex is held.

Signed-off-by: Patrick Uiterwijk 
---
 drivers/net/dsa/mv88e6xxx.c | 49 +
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index fa086e0..86a2029 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2264,6 +2264,38 @@ static void mv88e6xxx_bridge_work(struct work_struct 
*work)
mutex_unlock(&ps->smi_mutex);
 }
 
+static int _mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
+int reg, int val)
+{
+   int ret;
+
+   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
+   if (ret < 0)
+   goto restore_page_0;
+
+   ret = _mv88e6xxx_phy_write_indirect(ds, port, reg, val);
+restore_page_0:
+   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+
+   return ret;
+}
+
+static int _mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page,
+   int reg)
+{
+   int ret;
+
+   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
+   if (ret < 0)
+   goto restore_page_0;
+
+   ret = _mv88e6xxx_phy_read_indirect(ds, port, reg);
+restore_page_0:
+   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+
+   return ret;
+}
+
 static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -2714,13 +2746,9 @@ int mv88e6xxx_phy_page_read(struct dsa_switch *ds, int 
port, int page, int reg)
int ret;
 
mutex_lock(&ps->smi_mutex);
-   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
-   if (ret < 0)
-   goto error;
-   ret = _mv88e6xxx_phy_read_indirect(ds, port, reg);
-error:
-   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+   ret = _mv88e6xxx_phy_page_read(ds, port, page, reg);
mutex_unlock(&ps->smi_mutex);
+
return ret;
 }
 
@@ -2731,14 +2759,9 @@ int mv88e6xxx_phy_page_write(struct dsa_switch *ds, int 
port, int page,
int ret;
 
mutex_lock(&ps->smi_mutex);
-   ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
-   if (ret < 0)
-   goto error;
-
-   ret = _mv88e6xxx_phy_write_indirect(ds, port, reg, val);
-error:
-   _mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+   ret = _mv88e6xxx_phy_page_write(ds, port, page, reg, val);
mutex_unlock(&ps->smi_mutex);
+
return ret;
 }
 
-- 
2.7.4



Re: am335x: no multicast reception over VLAN

2016-03-29 Thread Grygorii Strashko
On 03/29/2016 08:21 AM, Yegor Yefremov wrote:
> Hi Mugunthan,
> 
> On Tue, Mar 29, 2016 at 6:00 AM, Mugunthan V N  wrote:
>> Hi Yegor
>>
>> On Wednesday 16 March 2016 08:35 PM, Yegor Yefremov wrote:
>>> I have an am335x based board using CPSW in Dual EMAC mode. Without
>>> VLAN IDs I can receive and send multicast packets [1]. When I create
>>> VLAN ID:
>>>
>>> ip link add link eth1 name eth1.100 type vlan id 100
>>> ip addr add 192.168.100.2/24 brd 192.168.100.255 dev eth1.100
>>> route add -net 224.0.0.0 netmask 224.0.0.0 eth1.100
>>>
>>> I can successfully send multicast packets, but not receive them. On
>>> the other side of the Ethernet cable I've used Pandaboard. Pandaboard
>>> could both receive and send multicast packets via VLAN.
>>
>> Are you trying multicast tx/rx on eth1 or eth1.100?
> 
> I'm trying multicast tx/rx on eth1.100.
> 
> eth1 has no problems.
> 

it'd be nice if you would be able to post here the output from these commands:
# switch-config -d [git://git.ti.com/switch-config/switch-config.git v4.1]
# ifconfig -a
# tcpdump -e -f -Q in -i eth0
# tcpdump -e -f -Q in -i eth0.100


-- 
regards,
-grygorii


Re: [PATCH] mwifiex: advertise low priority scan feature

2016-03-29 Thread Wei-Ning Huang
I've resent the patch here: https://patchwork.kernel.org/patch/8637861/
Thanks!

Wei-Ning

On Tue, Mar 22, 2016 at 12:12 PM, Wei-Ning Huang  wrote:
> Hi Kalle,
>
> Thanks for the review. I accidentally removed the s-o-b line from
> akarwar in this version.
> The original patch can be found at:
> https://chromium-review.googlesource.com/#/c/246052/
> I've resent a new one.
>
> Wei-Ning
>
> On Mon, Mar 21, 2016 at 6:28 PM, Kalle Valo  wrote:
>> Wei-Ning Huang  writes:
>>
>>> From: Amitkumar Karwar 
>>>
>>> Low priority scan handling code which delays or aborts scan
>>> operation based on Tx traffic is removed recently. The reason
>>> is firmware already takes care of it in our new feature scan
>>> channel gap. Hence we should advertise low priority scan
>>> support to cfg80211.
>>>
>>> This patch fixes a problem in which OBSS scan request from
>>> wpa_supplicant was being rejected by cfg80211.
>>>
>>> Signed-off-by: Wei-Ning Huang 
>>
>> The From line states that this is written by Amitkumar but there's no
>> Signed-off-By line from him. I can't take this without that, please
>> resend.
>>
>> (Wei-Ning's s-o-b line is correct, I just need also Amitkumar's line.)
>>
>> --
>> Kalle Valo
>
>
>
> --
> Wei-Ning Huang, 黃偉寧 | Software Engineer, Google Inc., Taiwan |
> wnhu...@google.com | Cell: +886 910-380678



-- 
Wei-Ning Huang, 黃偉寧 | Software Engineer, Google Inc., Taiwan |
wnhu...@google.com | Cell: +886 910-380678


RE: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-29 Thread Amitkumar Karwar
> From: Wei-Ning Huang [mailto:wnhu...@google.com]
> Sent: Tuesday, March 29, 2016 2:57 PM
> To: Kalle Valo
> Cc: Linux Wireless; LKML; Amitkumar Karwar; Nishant Sarmukadam; Sameer
> Nanda; netdev@vger.kernel.org; Sonny Rao; Douglas Anderson
> Subject: Re: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call
> 
> Adding some chromium devs to the thread.
> 
> In, http://lxr.free-electrons.com/source/mm/page_alloc.c#L3152
> 
> The default mm retry allocation when 'order <= PAGE_ALLOC_COSTLY_ORDER'
> of gfp_mask contains __GFP_REPEAT.
> PAGE_ALLOC_COSTLY_ORDER is defined to be 3. On systems with page size =
> 4K, this means memory compaction and retry is only done when the size of
> the allocation is <= 32K. In mwifiex, the allocation size is 64K. When we
> have a system with memory fragmentation and the allocation fails, there will
> be no retry.
> This is why we need to add __GFP_REPEAT here to allow the system to
> perform memory compaction and retry allocation.
> 
> Maybe Amit@marvell can comment on if this is a good fix on this issue.
> I'm also aware that Marvell is in the process of implementing
> scatter/gather for mwifiex, which can also fix the issue.
> 
> Wei-Ning
> 

This fix would be useful. We have a feature called single port aggregation in 
which data received from the SDIO interface can sometimes be >32k (but less 
than 64k). This feature improves throughput performance. We are preparing 
patches for the scatter/gather feature, but scatter/gather won't be supported 
by some platforms. Hence this fix would still be needed.

Regards,
Amitkumar


[PATCH] bridge: Allow set bridge ageing time when switchdev disabled

2016-03-29 Thread Haishuang Yan
When NET_SWITCHDEV=n, switchdev_port_attr_set will return -EOPNOTSUPP;
we should ignore this error code and continue to set the ageing time.

Signed-off-by: Haishuang Yan 
---
 net/bridge/br_stp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index e234490..9cb7044 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -582,7 +582,7 @@ int br_set_ageing_time(struct net_bridge *br, u32 
ageing_time)
int err;
 
err = switchdev_port_attr_set(br->dev, &attr);
-   if (err)
+   if (err && err != -EOPNOTSUPP)
return err;
 
br->ageing_time = t;
-- 
1.8.3.1





Re: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-29 Thread Wei-Ning Huang
Adding some chromium devs to the thread.

In, http://lxr.free-electrons.com/source/mm/page_alloc.c#L3152

By default, mm only retries the allocation (with memory compaction)
when 'order <= PAGE_ALLOC_COSTLY_ORDER' or gfp_mask contains
__GFP_REPEAT. PAGE_ALLOC_COSTLY_ORDER is defined to be 3, so on
systems with a 4K page size, memory compaction and retry are only done
when the allocation size is <= 32K.
In mwifiex, the allocation size is 64K. On a system with memory
fragmentation, a failed allocation will therefore not be retried.
This is why we need to add __GFP_REPEAT here, to allow the system to
perform memory compaction and retry the allocation.

Maybe Amit@marvell can comment on whether this is a good fix for this issue.
I'm also aware that Marvell is in the progress of implementing
scatter/gather for mwifiex, which can also fix the issue.

Wei-Ning

On Tue, Mar 29, 2016 at 4:37 PM, Kalle Valo  wrote:
> Wei-Ning Huang  writes:
>
>> "single skb allocation failure" happens when system is under heavy
>> memory pressure.  Add __GFP_REPEAT to skb allocation call so kernel
>> attempts to reclaim pages and retry the allocation.
>>
>> Signed-off-by: Wei-Ning Huang 
>
> Is this really a proper way to fix the issue? This is the first time I'm
> hearing about the flag and there isn't even a single user in
> drivers/net. I would like to get confirmation from others that
> __GFP_REPEAT is really ok to use in a wireless driver before I can take
> this.
>
> --
> Kalle Valo



-- 
Wei-Ning Huang, 黃偉寧 | Software Engineer, Google Inc., Taiwan |
wnhu...@google.com | Cell: +886 910-380678


[PATCH net-next v3.16] r8169: Correct setting of VLAN tag

2016-03-29 Thread Corcodel Marian
 This patch adds setting the VLAN tag and flushes the CPlusCmd register,
 because when the RxVlan and RxChkSum bits are unset, unwanted bits
 (PCIDAC, PCIMulRW and others) get set without any explanation. Without
 this patch, running ethtool -d eth0 shows the "C+ Command" field missing
 "VLAN de-tagging".

Signed-off-by: Corcodel Marian 
---
 drivers/net/ethernet/realtek/r8169.c | 21 +++--
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 2594bbb..df7ea28 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -752,7 +752,7 @@ struct rtl8169_private {
void *Rx_databuff[NUM_RX_DESC]; /* Rx data buffers */
struct ring_info tx_skb[NUM_TX_DESC];   /* Tx data buffers */
struct timer_list timer;
-   u16 cp_cmd;
+   int cp_cmd;
 
u16 event_slow;
 
@@ -1797,7 +1797,7 @@ static void __rtl8169_set_features(struct net_device *dev,
   netdev_features_t features)
 {
struct rtl8169_private *tp = netdev_priv(dev);
-   netdev_features_t changed = features ^ dev->features;
+   netdev_features_t changed = features & dev->features;
void __iomem *ioaddr = tp->mmio_addr;
 
if (!(changed & (NETIF_F_RXALL | NETIF_F_RXCSUM |
@@ -1807,16 +1807,10 @@ static void __rtl8169_set_features(struct net_device *dev,
if (changed & (NETIF_F_RXCSUM | NETIF_F_HW_VLAN_CTAG_RX)) {
if (features & NETIF_F_RXCSUM)
tp->cp_cmd |= RxChkSum;
-   else
-   tp->cp_cmd &= ~RxChkSum;
 
if (dev->features & NETIF_F_HW_VLAN_CTAG_RX)
tp->cp_cmd |= RxVlan;
-   else
-   tp->cp_cmd &= ~RxVlan;
 
-   RTL_W16(CPlusCmd, tp->cp_cmd);
-   RTL_R16(CPlusCmd);
}
if (changed & NETIF_F_RXALL) {
int tmp = (RTL_R32(RxConfig) & ~(AcceptErr | AcceptRunt));
@@ -6573,8 +6567,6 @@ static int rtl_open(struct net_device *dev)
 
rtl8169_init_phy(dev, tp);
 
-   __rtl8169_set_features(dev, dev->features);
-
rtl_pll_power_up(tp);
 
rtl_hw_start(dev);
@@ -7060,7 +7052,7 @@ rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_out_mwi_2;
}
 
-   tp->cp_cmd = RxChkSum;
+   tp->cp_cmd = ~0x;
 
if ((sizeof(dma_addr_t) > 4) &&
!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && use_dac) {
@@ -7101,13 +7093,6 @@ rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
pci_set_master(pdev);
 
-   /*
-* Pretend we are using VLANs; This bypasses a nasty bug where
-* Interrupts stop flowing on high load on 8110SCd controllers.
-*/
-   if (tp->mac_version == RTL_GIGA_MAC_VER_05)
-   tp->cp_cmd |= RxVlan;
-
rtl_init_mdio_ops(tp);
rtl_init_pll_power_ops(tp);
rtl_init_jumbo_ops(tp);
-- 
2.1.4



Re: [PATCH net] team: team should sync the port's uc/mc addrs when add a port

2016-03-29 Thread Xin Long
On Tue, Mar 29, 2016 at 12:56 PM, Cong Wang  wrote:
> On Mon, Mar 28, 2016 at 9:42 AM, Xin Long  wrote:
>> There is an issue when we use macvtap over team:
>> When we replug NIC links from team0, the real NICs' mc list will no
>> longer include the maddr for macvtap. Then we can't receive pkts to the
>> macvtap device, as they are filtered by the mc list of the NIC.
>>
>> In Bonding Driver, it syncs the uc/mc addrs in bond_enslave().
>>
>> We will fix this issue on team by adding the port's uc/mc addrs sync in
>> team_port_add.
>>
>> Signed-off-by: Xin Long 
>> ---
>>  drivers/net/team/team.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> index 26c64d2..17ff367 100644
>> --- a/drivers/net/team/team.c
>> +++ b/drivers/net/team/team.c
>> @@ -1198,6 +1198,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
>> goto err_dev_open;
>> }
>>
>> +   dev_uc_sync_multiple(port_dev, dev);
>> +   dev_mc_sync_multiple(port_dev, dev);
>> +
>> err = vlan_vids_add_by_dev(port_dev, dev);
>
> You need to call dev_{uc,mc}_unsync() on error path, don't you?

I think so, I'm gonna post another patch as a fix for this one.
Thanks, Cong.


Re: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-29 Thread Kalle Valo
Wei-Ning Huang  writes:

> "single skb allocation failure" happens when system is under heavy
> memory pressure.  Add __GFP_REPEAT to skb allocation call so kernel
> attempts to reclaim pages and retry the allocation.
>
> Signed-off-by: Wei-Ning Huang 

Is this really a proper way to fix the issue? This is the first time I'm
hearing about the flag and there isn't even a single user in
drivers/net. I would like to get confirmation from others that
__GFP_REPEAT is really ok to use in a wireless driver before I can take
this.

-- 
Kalle Valo


Re: [PATCH] net: macb: Only call GPIO functions if there is a valid GPIO

2016-03-29 Thread Charles Keepax
On Mon, Mar 28, 2016 at 08:30:11PM +0300, Sergei Shtylyov wrote:
> >@@ -3029,7 +3030,8 @@ static int macb_remove(struct platform_device *pdev)
> > mdiobus_free(bp->mii_bus);
> >
> > /* Shutdown the PHY if there is a GPIO reset */
> >-gpiod_set_value(bp->reset_gpio, 0);
> >+if (bp->reset_gpio)
> >+gpiod_set_value(bp->reset_gpio, 0);
> 
>Hm, this function was previously OK to call with NULL (it didn't curse)...
> 

Looks like it was changed fairly recently so that it does complain
(patch liberally snipped down):

commit fdeb8e1547cb9dd39d5d7223b33f3565cf86c28e
Author: Linus Walleij 

gpio: reflect base and ngpio into gpio_device


+#define VALIDATE_DESC_VOID(desc) do { \
+ if (!desc || !desc->gdev) { \
+ pr_warn("%s: invalid GPIO\n", __func__); \
+ return; \
+ } \
+ } while (0)

void gpiod_set_value(struct gpio_desc *desc, int value)
{
-   if (!desc)
-   return;
+ VALIDATE_DESC_VOID(desc);

Thanks,
Charles