IPv6: finding a match in find_rr_leaf that cause a NULL pointer dereference

2015-05-26 Thread Jeremy
Hi all,

I am running kernel 3.10.20 and hit the NULL pointer dereference shown below.

According to my log, dev is already NULL at the start of the for loop; it
is the dev of rr_head that has been set to NULL. It may be legitimate for
dev to be NULL, but I do not see any check in rt6_check_dev() that guards
against it, even in kernel 4.0.

I do not know whether my concern is justified, or whether there is a
mechanism elsewhere that I have not noticed.

Any comments are welcome, and thanks for taking the time to review my
post.

for (rt = rr_head; rt && rt->rt6i_metric == metric; rt = rt->dst.rt6_next)
        match = find_match(rt, oif, strict, &mpri, match, do_rr);
          --> rt6_score_route(rt, oif, strict);
            --> rt6_check_dev(rt, oif);
              --> struct net_device *dev = rt->dst.dev;
              --> if (!oif || dev->ifindex == oif) <-- NULL pointer dereference on dev object.
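
A minimal userspace sketch of the kind of guard being asked about follows. It is not the kernel's code and not a proposed patch: the structs are simplified stand-ins that model only the fields in the trace above, the function name rt6_check_dev_guarded is hypothetical, and returning 0 ("no match") for a route without a device is an assumption. The real function's other branches are omitted.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the fields
 * that appear in the call trace are modelled. */
struct net_device { int ifindex; };
struct dst_entry  { struct net_device *dev; };
struct rt6_info   { struct dst_entry dst; };

/* Hypothetical defensive variant of the device check.
 * Returns 2 when the route is usable for the requested output
 * interface and 0 otherwise. A NULL dev is never dereferenced. */
static int rt6_check_dev_guarded(const struct rt6_info *rt, int oif)
{
	const struct net_device *dev = rt->dst.dev;

	if (!oif)
		return 2;	/* no interface requested: any route matches */
	if (!dev)
		return 0;	/* assumption: a route without a device cannot match */
	return dev->ifindex == oif ? 2 : 0;
}
```

Note that in the original condition the `!oif` test short-circuits, so the dereference can only be reached when a specific output interface was requested.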

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in tcp timestamp option ? TSecr in SYN-ACK != TSval in SYN

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 23:12 -0700, Eric Dumazet wrote:
> On Tue, 2015-05-26 at 22:47 -0700, Gopakumar Choorakkot Edakkunni wrote:
> > All,
> > 
> > The original query I had posted is here :
> > http://stackoverflow.com/questions/30414350/tcp-syn-ack-tsecr-not-matching-tsval-in-syn
> > .. The summary is that once in a while, the TSval in SYN is not what
> > is getting echoed in TSecr, and looks like something on amazon aws
> > side is very strict about that and drops those packets. Any clues on
> > this - whether its a known issue/fixed elsewhere etc. would be of
> > great help.
> 
> I guess that if you send SYN packets 3 times as your email did on
> netdev, that might cause some issues...
> 
> More seriously, server has a SYN_RECV socket with same tuple, because of
> a SYN sent earlier :
> 
> 18:36:00.593136 IP XX.YY.ZZ.VV.24548 > AA.BB.CC.DD.443: Flags [S], seq
> 1204544933, win 29200, options [mss 1320,sackOK,TS val 6032576 ecr
> 0,nop,wscale 7], length 0
> 
> 18:36:00.593171 IP AA.BB.CC.DD.443 > XX.YY.ZZ.VV.24548: Flags [S.], seq
> 986069863, ack 1204544934, win 14480, options [mss 1460,sackOK,TS val
> 180940028 ecr 6001497,nop,wscale 5], length 0
> 
> 18:36:00.992699 IP AA.BB.CC.DD.443 > XX.YY.ZZ.VV.24548: Flags [S.], seq
> 986069863, ack 1204544934, win 14480, options [mss 1460,sackOK,TS val
> 180940128 ecr 6001497,nop,wscale 5], length 0
> 
> 
> From these traces, we can guess a SYN packet was sent about 31 seconds
> earlier.
> 
> SYNACK rtx do not update the TSECR : Initial SYN TSval value (6001497)
> is mirrored.
> 
> Are you establishing many active sessions per minute to this particular
> target ?
> 

Here is a packetdrill test to demonstrate behavior :

// Test that SYNACK rtx tsecr is not changed (original SYN tsval)

`../common/defaults.sh
`

// Create a socket.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

// Establish a connection.
0.100 < S 0:0(0) win 2
+0 > S. 0:0(0) ack 1

+0.100 < S 0:0(0) win 2
// check rtx tsecr is still 100, not 199
+0 > S. 0:0(0) ack 1

+0.100 < . 1:1(0) ack 1 win 2
+0 accept(3, ..., ...) = 4
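
The behavior the test checks can also be modelled in a few lines of C. This is only a sketch under stated assumptions: struct pending_req and synack_tsecr() are hypothetical names, not kernel identifiers, and the values 100 and 199 are taken from the comment in the script above (the TCP options themselves are not visible in the script as shown here).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a pending (SYN_RECV) connection request. */
struct pending_req {
	bool     in_use;
	uint32_t ts_recent;	/* TSval taken from the first SYN */
};

/* Handle an incoming SYN carrying syn_tsval and return the TSecr
 * that the SYN-ACK carries in this model. */
static uint32_t synack_tsecr(struct pending_req *req, uint32_t syn_tsval)
{
	if (!req->in_use) {
		/* First SYN for this tuple: remember its TSval. */
		req->in_use = true;
		req->ts_recent = syn_tsval;
	}
	/* A later SYN for the same tuple only triggers a SYN-ACK
	 * retransmit, which echoes the stored value. */
	return req->ts_recent;
}
```

With a zeroed request, a first SYN with TSval 100 yields TSecr 100, and a second SYN with TSval 199 still yields TSecr 100.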




Re: Bug in tcp timestamp option ? TSecr in SYN-ACK != TSval in SYN

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 22:47 -0700, Gopakumar Choorakkot Edakkunni wrote:
> All,
> 
> The original query I had posted is here :
> http://stackoverflow.com/questions/30414350/tcp-syn-ack-tsecr-not-matching-tsval-in-syn
> .. The summary is that once in a while, the TSval in SYN is not what
> is getting echoed in TSecr, and looks like something on amazon aws
> side is very strict about that and drops those packets. Any clues on
> this - whether its a known issue/fixed elsewhere etc. would be of
> great help.

I guess that if you send SYN packets 3 times as your email did on
netdev, that might cause some issues...

More seriously, server has a SYN_RECV socket with same tuple, because of
a SYN sent earlier :

18:36:00.593136 IP XX.YY.ZZ.VV.24548 > AA.BB.CC.DD.443: Flags [S], seq
1204544933, win 29200, options [mss 1320,sackOK,TS val 6032576 ecr
0,nop,wscale 7], length 0

18:36:00.593171 IP AA.BB.CC.DD.443 > XX.YY.ZZ.VV.24548: Flags [S.], seq
986069863, ack 1204544934, win 14480, options [mss 1460,sackOK,TS val
180940028 ecr 6001497,nop,wscale 5], length 0

18:36:00.992699 IP AA.BB.CC.DD.443 > XX.YY.ZZ.VV.24548: Flags [S.], seq
986069863, ack 1204544934, win 14480, options [mss 1460,sackOK,TS val
180940128 ecr 6001497,nop,wscale 5], length 0


From these traces, we can guess a SYN packet was sent about 31 seconds
earlier.

SYNACK rtx do not update the TSECR : Initial SYN TSval value (6001497)
is mirrored.

Are you establishing many active sessions per minute to this particular
target ?
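
The "about 31 seconds" estimate can be reproduced from the two client-side values in the trace (TSval 6032576 in the new SYN, TSecr 6001497 echoed by the server). The sketch below assumes the client's timestamp clock ticks once per millisecond (1000 Hz); that rate is an assumption, not something the trace states. The helper names are hypothetical.

```c
#include <stdint.h>

/* Elapsed timestamp ticks between two TSval samples. Unsigned 32-bit
 * subtraction keeps the result correct across counter wrap-around. */
static uint32_t ts_delta_ticks(uint32_t earlier, uint32_t later)
{
	return later - earlier;
}

/* Convert ticks to whole milliseconds for a timestamp clock that
 * runs at hz ticks per second (hz must be non-zero). */
static uint64_t ticks_to_ms(uint32_t ticks, uint32_t hz)
{
	return (uint64_t)ticks * 1000u / hz;
}
```

For the client, ts_delta_ticks(6001497, 6032576) is 31079 ticks, or roughly 31 seconds at 1000 Hz. The same helpers applied to the two SYN-ACKs (TS val 180940028 and 180940128, captured about 0.4 s apart) give 100 ticks, which is consistent with a 250 Hz clock on the server side.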




Bug in tcp timestamp option ? TSecr in SYN-ACK != TSval in SYN

2015-05-26 Thread Gopakumar Choorakkot Edakkunni
All,

The original query I had posted is here :
http://stackoverflow.com/questions/30414350/tcp-syn-ack-tsecr-not-matching-tsval-in-syn
.. The summary is that once in a while, the TSval in SYN is not what
is getting echoed in TSecr, and looks like something on amazon aws
side is very strict about that and drops those packets. Any clues on
this - whether its a known issue/fixed elsewhere etc. would be of
great help.

Rgds,
Gopa.


[PATCH net-next] qla4xxx: add a missing include

2015-05-26 Thread Eric Dumazet
From: Eric Dumazet 

vmalloc.h used to be included from include/net/inet_hashtables.h
but it is no longer the case.

Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: kbuild test robot 
Signed-off-by: Eric Dumazet 
---
Given it's broken in David's net-next tree, it's probably simpler
that David merges this fix directly ? Thanks !

 drivers/scsi/qla4xxx/ql4_def.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/qla4xxx/ql4_def.h b/drivers/scsi/qla4xxx/ql4_def.h
index 8f6d0fb2cd807255a66e962c3cb7c4c8633d4d77..a7cfc270bd08a1f01867affc2dee0fc6b7611472 100644
--- a/drivers/scsi/qla4xxx/ql4_def.h
+++ b/drivers/scsi/qla4xxx/ql4_def.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include <linux/vmalloc.h>
 
 #include 
 #include 




Re: [PATCH net-next 0/2] net: phy: phy_interface_is_rgmii helper

2015-05-26 Thread David Miller
From: Florian Fainelli 
Date: Tue, 26 May 2015 12:19:57 -0700

> As you suggested, here is the helper function to avoid missing some RGMII
> interface checks. Had to wait for net to be merged in net-next to avoid
> submitting the same patch/commit.
> 
> Dan, you might want to rebase your dp83867 submission to use that helper
> when this patchset gets merged into net-next, thanks!

Applied, thanks for following up on this.


Re: [PATCH net] tools: bpf_jit_disasm: fix segfault on disabled debugging log output

2015-05-26 Thread David Miller
From: Daniel Borkmann 
Date: Mon, 25 May 2015 14:08:03 +0200

> With recent debugging, I noticed that bpf_jit_disasm segfaults when
> there's no debugging output from the JIT compiler to the kernel log.
> 
> Reason is that when regexec(3) doesn't match on anything, start/end
> offsets are not being filled out and contain some uninitialized garbage
> from stack. Thus, we need to zero out offsets first.
> 
> Signed-off-by: Daniel Borkmann 

Applied, thanks Daniel.
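
As an illustration of the failure mode described in the quoted patch, here is a small standalone C sketch using the POSIX regex API. It is not the bpf_jit_disasm code: the function name first_hex_run and the pattern are hypothetical. It zeroes the match offsets before calling regexec(3), as the fix describes, and additionally checks the return value before using them.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Find the first run of lowercase hex digits in buf.
 * Returns the length of the run and stores its offset in *start;
 * returns 0 (with *start set to 0) when nothing matches. */
static size_t first_hex_run(const char *buf, size_t *start)
{
	regex_t re;
	regmatch_t pm[1];
	size_t len = 0;

	/* Zero the offsets first so they never hold stack garbage,
	 * even if regexec() does not fill them in. */
	memset(pm, 0, sizeof(pm));
	*start = 0;

	if (regcomp(&re, "[0-9a-f]+", REG_EXTENDED) != 0)
		return 0;

	/* Only trust the offsets when a match was actually reported. */
	if (regexec(&re, buf, 1, pm, 0) == 0) {
		*start = (size_t)pm[0].rm_so;
		len = (size_t)(pm[0].rm_eo - pm[0].rm_so);
	}

	regfree(&re);
	return len;
}
```

Either measure alone avoids reading uninitialized offsets; the sketch uses both.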


Re: [net-next:master 200/201] net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree'

2015-05-26 Thread Eric Dumazet
A few vmalloc() users forgot to include <linux/vmalloc.h>,
which used to be included from include/net/inet_hashtables.h

Thanks for the report, I'll send fixes.

On Tue, May 26, 2015 at 9:12 PM, kbuild test robot
 wrote:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> master
> head:   d6a4e26afb80c049e7f94e1b7b506dcda61eee88
> commit: 095dc8e0c3686d586a01a50abc3e1bb9ac633054 [200/201] tcp: fix/cleanup 
> inet_ehash_locks_alloc()
> config: powerpc-defconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout 095dc8e0c3686d586a01a50abc3e1bb9ac633054
>   # save the attached .config to linux build tree
>   make.cross ARCH=powerpc
>
> All error/warnings:
>
>net/ipv4/fib_trie.c: In function '__node_free_rcu':
>>> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' 
>>> [-Werror=implicit-function-declaration]
>   vfree(n);
>   ^
>net/ipv4/fib_trie.c: In function 'tnode_alloc':
>>> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 
>>> 'vzalloc' [-Werror=implicit-function-declaration]
>   return vzalloc(size);
>   ^
>>> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer 
>>> without a cast
>cc1: some warnings being treated as errors
> --
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_create_chap_list':
>>> drivers/scsi/qla4xxx/ql4_os.c:617:3: error: implicit declaration of 
>>> function 'vmalloc' [-Werror=implicit-function-declaration]
>   ha->chap_list = vmalloc(chap_size);
>   ^
>>> drivers/scsi/qla4xxx/ql4_os.c:617:17: warning: assignment makes pointer 
>>> from integer without a cast
>   ha->chap_list = vmalloc(chap_size);
> ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_mem_free':
>>> drivers/scsi/qla4xxx/ql4_os.c:4135:3: error: implicit declaration of 
>>> function 'vfree' [-Werror=implicit-function-declaration]
>   vfree(ha->fw_dump);
>   ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_is_session_exists':
>>> drivers/scsi/qla4xxx/ql4_os.c:6340:2: error: implicit declaration of 
>>> function 'vzalloc' [-Werror=implicit-function-declaration]
>  fw_tddb = vzalloc(sizeof(*fw_tddb));
>  ^
>>> drivers/scsi/qla4xxx/ql4_os.c:6340:10: warning: assignment makes pointer 
>>> from integer without a cast
>  fw_tddb = vzalloc(sizeof(*fw_tddb));
>  ^
>>> drivers/scsi/qla4xxx/ql4_os.c:6348:11: warning: assignment makes pointer 
>>> from integer without a cast
>  tmp_tddb = vzalloc(sizeof(*tmp_tddb));
>   ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_is_flash_ddb_exists':
>>> drivers/scsi/qla4xxx/ql4_os.c:6485:10: warning: assignment makes pointer 
>>> from integer without a cast
>  fw_tddb = vzalloc(sizeof(*fw_tddb));
>  ^
>>> drivers/scsi/qla4xxx/ql4_os.c:6493:11: warning: assignment makes pointer 
>>> from integer without a cast
>  tmp_tddb = vzalloc(sizeof(*tmp_tddb));
>   ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_get_ep_fwdb':
>>> drivers/scsi/qla4xxx/ql4_os.c:6556:11: warning: assignment makes pointer 
>>> from integer without a cast
>  dst_addr = vmalloc(sizeof(*dst_addr));
>   ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_build_st_list':
>>> drivers/scsi/qla4xxx/ql4_os.c:6786:14: warning: assignment makes pointer 
>>> from integer without a cast
>   st_ddb_idx = vzalloc(fw_idx_size);
>  ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_build_nt_list':
>>> drivers/scsi/qla4xxx/ql4_os.c:7026:15: warning: assignment makes pointer 
>>> from integer without a cast
>nt_ddb_idx = vmalloc(fw_idx_size);
>   ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_build_new_nt_list':
>>> drivers/scsi/qla4xxx/ql4_os.c:7122:14: warning: assignment makes pointer 
>>> from integer without a cast
>   nt_ddb_idx = vmalloc(fw_idx_size);
>  ^
>drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_sysfs_ddb_logout':
>>> drivers/scsi/qla4xxx/ql4_os.c:7732:13: warning: assignment makes pointer 
>>> from integer without a cast
>  flash_tddb = vzalloc(sizeof(*flash_tddb));
> ^
>>> drivers/scsi/qla4xxx/ql4_os.c:7740:11: warning: assignment makes pointer 
>>> from integer without a cast
>  tmp_tddb = vzalloc(sizeof(*tmp_tddb));
>   ^
>cc1: some warnings being treated as errors
> --
>drivers/scsi/qla4xxx/ql4_init.c: In function 'qla4xxx_alloc_fw_dump':
>>> drivers/scsi/qla4xxx/ql4_init.c:387:2: error: implicit declaration of 
>>> function 'vmalloc' [-Werror=implicit-function-declaration]
>  ha->fw_dump = vmalloc(ha->fw_dump_size);
>  ^
>>> drivers/scsi/qla4xxx/ql4_init.c:387:14: warning: assignment makes pointer 
>>> from integer without a cast
>  ha->fw_dump = vmallo

Re: [net-next:master 200/201] net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree'

2015-05-26 Thread David Miller
From: kbuild test robot 
Date: Wed, 27 May 2015 12:12:08 +0800

> All error/warnings:
> 
>net/ipv4/fib_trie.c: In function '__node_free_rcu':
>>> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' 
>>> [-Werror=implicit-function-declaration]
>   vfree(n);
>   ^
>net/ipv4/fib_trie.c: In function 'tnode_alloc':
>>> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 
>>> 'vzalloc' [-Werror=implicit-function-declaration]
>   return vzalloc(size);
>   ^
>>> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer 
>>> without a cast
>cc1: some warnings being treated as errors

I'll take care of the fib_trie.c part:

commit ffa915d071ce4a05dcd866409df26513d25786f8
Author: David S. Miller
Date:   Wed May 27 00:19:03 2015 -0400

    ipv4: Fix fib_trie.c build, missing linux/vmalloc.h include.

    We used to get this indirectly I suppose, but no longer do.

    Either way, an explicit include should have been done in the
    first place.

       net/ipv4/fib_trie.c: In function '__node_free_rcu':
    >> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
          vfree(n);
          ^
       net/ipv4/fib_trie.c: In function 'tnode_alloc':
    >> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
          return vzalloc(size);
          ^
    >> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer without a cast
       cc1: some warnings being treated as errors

    Reported-by: kbuild test robot
    Signed-off-by: David S. Miller

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 5a5d9bd..01bce15 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -72,6 +72,7 @@
 #include 
 #include 
 #include 
+#include <linux/vmalloc.h>
 #include 
 #include 
 #include 


Re: [net-next:master 200/201] net/ipv4/inet_hashtables.c:620:13: warning: division by zero

2015-05-26 Thread Eric Dumazet
This report seems bogus ?

The divide is guarded by

if (sizeof(spinlock_t) != 0) {

}
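
To make the guard concrete, here is a userspace sketch of the same sizing logic with the lock size passed in as a run-time parameter. The names ehash_nblocks and round_up_pow2 are hypothetical stand-ins, not kernel functions. In the kernel code sizeof(spinlock_t) is a compile-time constant, so the compiler appears to warn about the constant division even though the guarded branch can never run when the size is zero.

```c
#include <stddef.h>

/* Round v up to the next power of two (v >= 1). Stand-in for the
 * kernel's roundup_pow_of_two(). */
static unsigned int round_up_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Number of lock slots to allocate: two cache lines' worth of locks
 * (at least one) per CPU, rounded up to a power of two and capped
 * at the number of hash buckets. */
static unsigned int ehash_nblocks(size_t lock_size, unsigned int cache_bytes,
				  unsigned int ncpus, unsigned int nbuckets)
{
	unsigned int nblocks = 1;

	if (lock_size != 0) {	/* guard: never divide by a zero size */
		nblocks = (unsigned int)(2 * cache_bytes / lock_size);
		if (nblocks < 1)
			nblocks = 1;
		nblocks = round_up_pow2(nblocks * ncpus);
		if (nblocks > nbuckets)
			nblocks = nbuckets;
	}
	return nblocks;
}
```

With lock_size equal to 0 the function returns 1 without performing any division, which is the behavior the guard is meant to ensure.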



On Tue, May 26, 2015 at 9:09 PM, kbuild test robot
 wrote:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> master
> head:   d6a4e26afb80c049e7f94e1b7b506dcda61eee88
> commit: 095dc8e0c3686d586a01a50abc3e1bb9ac633054 [200/201] tcp: fix/cleanup 
> inet_ehash_locks_alloc()
> config: cris-etrax-100lx_v2_defconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout 095dc8e0c3686d586a01a50abc3e1bb9ac633054
>   # save the attached .config to linux build tree
>   make.cross ARCH=cris
>
> All warnings:
>
>net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
>>> net/ipv4/inet_hashtables.c:620:13: warning: division by zero [-Wdiv-by-zero]
>
> vim +620 net/ipv4/inet_hashtables.c
>
>604  int i;
>605
>606  for (i = 0; i < INET_LHTABLE_SIZE; i++) {
>607  spin_lock_init(&h->listening_hash[i].lock);
>608  INIT_HLIST_NULLS_HEAD(&h->listening_hash[i].head,
>609i + LISTENING_NULLS_BASE);
>610  }
>611  }
>612  EXPORT_SYMBOL_GPL(inet_hashinfo_init);
>613
>614  int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
>615  {
>616  unsigned int i, nblocks = 1;
>617
>618  if (sizeof(spinlock_t) != 0) {
>619  /* allocate 2 cache lines or at least one spinlock 
> per cpu */
>  > 620  nblocks = max_t(unsigned int,
>621  2 * L1_CACHE_BYTES / 
> sizeof(spinlock_t),
>622  1);
>623  nblocks = roundup_pow_of_two(nblocks * 
> num_possible_cpus());
>624
>625  /* no more locks than number of hash buckets */
>626  nblocks = min(nblocks, hashinfo->ehash_mask + 1);
>627
>628  hashinfo->ehash_locks = kmalloc_array(nblocks, 
> sizeof(spinlock_t),
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> http://lists.01.org/mailman/listinfo/kbuild                 Intel Corporation


[net-next:master 200/201] net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree'

2015-05-26 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   d6a4e26afb80c049e7f94e1b7b506dcda61eee88
commit: 095dc8e0c3686d586a01a50abc3e1bb9ac633054 [200/201] tcp: fix/cleanup inet_ehash_locks_alloc()
config: powerpc-defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout 095dc8e0c3686d586a01a50abc3e1bb9ac633054
  # save the attached .config to linux build tree
  make.cross ARCH=powerpc 

All error/warnings:

   net/ipv4/fib_trie.c: In function '__node_free_rcu':
>> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' 
>> [-Werror=implicit-function-declaration]
  vfree(n);
  ^
   net/ipv4/fib_trie.c: In function 'tnode_alloc':
>> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' 
>> [-Werror=implicit-function-declaration]
  return vzalloc(size);
  ^
>> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer 
>> without a cast
   cc1: some warnings being treated as errors
--
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_create_chap_list':
>> drivers/scsi/qla4xxx/ql4_os.c:617:3: error: implicit declaration of function 
>> 'vmalloc' [-Werror=implicit-function-declaration]
  ha->chap_list = vmalloc(chap_size);
  ^
>> drivers/scsi/qla4xxx/ql4_os.c:617:17: warning: assignment makes pointer from 
>> integer without a cast
  ha->chap_list = vmalloc(chap_size);
^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_mem_free':
>> drivers/scsi/qla4xxx/ql4_os.c:4135:3: error: implicit declaration of 
>> function 'vfree' [-Werror=implicit-function-declaration]
  vfree(ha->fw_dump);
  ^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_is_session_exists':
>> drivers/scsi/qla4xxx/ql4_os.c:6340:2: error: implicit declaration of 
>> function 'vzalloc' [-Werror=implicit-function-declaration]
 fw_tddb = vzalloc(sizeof(*fw_tddb));
 ^
>> drivers/scsi/qla4xxx/ql4_os.c:6340:10: warning: assignment makes pointer 
>> from integer without a cast
 fw_tddb = vzalloc(sizeof(*fw_tddb));
 ^
>> drivers/scsi/qla4xxx/ql4_os.c:6348:11: warning: assignment makes pointer 
>> from integer without a cast
 tmp_tddb = vzalloc(sizeof(*tmp_tddb));
  ^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_is_flash_ddb_exists':
>> drivers/scsi/qla4xxx/ql4_os.c:6485:10: warning: assignment makes pointer 
>> from integer without a cast
 fw_tddb = vzalloc(sizeof(*fw_tddb));
 ^
>> drivers/scsi/qla4xxx/ql4_os.c:6493:11: warning: assignment makes pointer 
>> from integer without a cast
 tmp_tddb = vzalloc(sizeof(*tmp_tddb));
  ^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_get_ep_fwdb':
>> drivers/scsi/qla4xxx/ql4_os.c:6556:11: warning: assignment makes pointer 
>> from integer without a cast
 dst_addr = vmalloc(sizeof(*dst_addr));
  ^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_build_st_list':
>> drivers/scsi/qla4xxx/ql4_os.c:6786:14: warning: assignment makes pointer 
>> from integer without a cast
  st_ddb_idx = vzalloc(fw_idx_size);
 ^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_build_nt_list':
>> drivers/scsi/qla4xxx/ql4_os.c:7026:15: warning: assignment makes pointer 
>> from integer without a cast
   nt_ddb_idx = vmalloc(fw_idx_size);
  ^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_build_new_nt_list':
>> drivers/scsi/qla4xxx/ql4_os.c:7122:14: warning: assignment makes pointer 
>> from integer without a cast
  nt_ddb_idx = vmalloc(fw_idx_size);
 ^
   drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4xxx_sysfs_ddb_logout':
>> drivers/scsi/qla4xxx/ql4_os.c:7732:13: warning: assignment makes pointer 
>> from integer without a cast
 flash_tddb = vzalloc(sizeof(*flash_tddb));
^
>> drivers/scsi/qla4xxx/ql4_os.c:7740:11: warning: assignment makes pointer 
>> from integer without a cast
 tmp_tddb = vzalloc(sizeof(*tmp_tddb));
  ^
   cc1: some warnings being treated as errors
--
   drivers/scsi/qla4xxx/ql4_init.c: In function 'qla4xxx_alloc_fw_dump':
>> drivers/scsi/qla4xxx/ql4_init.c:387:2: error: implicit declaration of 
>> function 'vmalloc' [-Werror=implicit-function-declaration]
 ha->fw_dump = vmalloc(ha->fw_dump_size);
 ^
>> drivers/scsi/qla4xxx/ql4_init.c:387:14: warning: assignment makes pointer 
>> from integer without a cast
 ha->fw_dump = vmalloc(ha->fw_dump_size);
 ^
   cc1: some warnings being treated as errors
--
   drivers/scsi/qla4xxx/ql4_83xx.c: In function 'qla4_83xx_copy_bootloader':
>> drivers/scsi/qla4xxx/ql4_83xx.c:640:2: error: implicit declaration of 
>> function 'vmalloc' [-Werror=implicit-function-declaration]
 p_cache = vmalloc(size);
 ^
>> drivers/scsi/qla4xxx/ql4_83xx.c:

[net-next:master 200/201] net/ipv4/inet_hashtables.c:620:13: warning: division by zero

2015-05-26 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   d6a4e26afb80c049e7f94e1b7b506dcda61eee88
commit: 095dc8e0c3686d586a01a50abc3e1bb9ac633054 [200/201] tcp: fix/cleanup inet_ehash_locks_alloc()
config: cris-etrax-100lx_v2_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout 095dc8e0c3686d586a01a50abc3e1bb9ac633054
  # save the attached .config to linux build tree
  make.cross ARCH=cris 

All warnings:

   net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
>> net/ipv4/inet_hashtables.c:620:13: warning: division by zero [-Wdiv-by-zero]

vim +620 net/ipv4/inet_hashtables.c

   604          int i;
   605
   606          for (i = 0; i < INET_LHTABLE_SIZE; i++) {
   607                  spin_lock_init(&h->listening_hash[i].lock);
   608                  INIT_HLIST_NULLS_HEAD(&h->listening_hash[i].head,
   609                                        i + LISTENING_NULLS_BASE);
   610          }
   611  }
   612  EXPORT_SYMBOL_GPL(inet_hashinfo_init);
   613
   614  int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
   615  {
   616          unsigned int i, nblocks = 1;
   617
   618          if (sizeof(spinlock_t) != 0) {
   619                  /* allocate 2 cache lines or at least one spinlock per cpu */
 > 620                  nblocks = max_t(unsigned int,
   621                                  2 * L1_CACHE_BYTES / sizeof(spinlock_t),
   622                                  1);
   623                  nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
   624
   625                  /* no more locks than number of hash buckets */
   626                  nblocks = min(nblocks, hashinfo->ehash_mask + 1);
   627
   628                  hashinfo->ehash_locks = kmalloc_array(nblocks, sizeof(spinlock_t),

---
0-DAY kernel test infrastructure                Open Source Technology Center
http://lists.01.org/mailman/listinfo/kbuild                 Intel Corporation
#
# Automatically generated file; DO NOT EDIT.
# Linux/cris 4.1.0-rc4 Kernel Configuration
#
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_NO_IOPORT_MAP=y
CONFIG_FORCE_MAX_ZONEORDER=6
CONFIG_CRIS=y
CONFIG_HZ=100
CONFIG_NR_CPUS=1
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_FHANDLE is not set
CONFIG_USELIB=y
# CONFIG_AUDIT is not set

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_ARCH_USES_GETTIMEOFFSET=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_KTHREAD_PRIO=0
# CONFIG_RCU_EXPEDITE_BOOT is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CGROUPS is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_NAMESPACES is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_KALLSYMS is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_EMBEDDED is not set

#
# Kernel Performance Events And Counters
#
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not set
# CONFIG_UPROBES is not set
# CONFIG_HAVE_64BIT_ALIGNED_ACCESS is not set
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND=y
CONFIG_OLD_SIGACTION=y

#
# GCOV-based kernel profiling
#
# CONFIG_ARCH_HAS_GCOV_PROFILE_ALL is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not se

Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 23:52 -0400, David Miller wrote:
> From: Eric Dumazet 
> Date: Tue, 26 May 2015 08:55:28 -0700
> 
> > From: Eric Dumazet 
> > 
> > By making sure sk->sk_gso_max_segs minimal value is one,
> > and sysctl_tcp_min_tso_segs minimal value is one as well,
> > tcp_tso_autosize() will return a non zero value.
> > 
> > We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b
> > ("tcp: Do not apply TSO segment limit to non-TSO packets")
> > and save a few cpu cycles in the fast path.
> > 
> > Signed-off-by: Eric Dumazet 
> 
> Applied, thanks Eric.

Thanks David, and many thanks to Herbert and Neal for the reviews !
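
The argument in the quoted commit message can be checked with a simplified model of the sizing decision. This is a sketch under assumptions, not the kernel function: tso_autosize_model is a hypothetical name, its inputs are plain parameters rather than socket fields, and byte_budget stands for however many bytes the sender wants to emit per burst.

```c
#include <stdint.h>

/* Simplified model: choose a segment count from a byte budget, then
 * clamp it between a lower bound (min_tso_segs) and an upper bound
 * (gso_max_segs). mss must be non-zero. */
static uint32_t tso_autosize_model(uint32_t byte_budget, uint32_t mss,
				   uint32_t min_tso_segs, uint32_t gso_max_segs)
{
	uint32_t segs = byte_budget / mss;

	if (segs < min_tso_segs)	/* lower bound, assumed >= 1 */
		segs = min_tso_segs;
	if (segs > gso_max_segs)	/* upper bound, assumed >= 1 */
		segs = gso_max_segs;
	return segs;
}
```

As long as both bounds are at least one, the result is at least one even when the byte budget is zero, so callers in this model need no special case for a zero return.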




[PATCH v3 net-next] openvswitch: include datapath actions with sampled-packet upcall to userspace

2015-05-26 Thread Neil McKee
If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.

This directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.

Adding path information enhances visibility into complex virtual
networks.

Signed-off-by: Neil McKee 
---
 include/uapi/linux/openvswitch.h |  4
 net/openvswitch/actions.c        | 23 +++
 net/openvswitch/datapath.c       | 18 --
 net/openvswitch/datapath.h       |  2 ++
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..1dab776 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -153,6 +153,8 @@ enum ovs_packet_cmd {
  * flow key against the kernel's.
  * @OVS_PACKET_ATTR_ACTIONS: Contains actions for the packet.  Used
  * for %OVS_PACKET_CMD_EXECUTE.  It has nested %OVS_ACTION_ATTR_* attributes.
+ * Also used in upcall when %OVS_ACTION_ATTR_USERSPACE has optional
+ * %OVS_USERSPACE_ATTR_ACTIONS attribute.
  * @OVS_PACKET_ATTR_USERDATA: Present for an %OVS_PACKET_CMD_ACTION
  * notification if the %OVS_ACTION_ATTR_USERSPACE action specified an
  * %OVS_USERSPACE_ATTR_USERDATA attribute, with the same length and content
@@ -528,6 +530,7 @@ enum ovs_sample_attr {
  * copied to the %OVS_PACKET_CMD_ACTION message as %OVS_PACKET_ATTR_USERDATA.
  * @OVS_USERSPACE_ATTR_EGRESS_TUN_PORT: If present, u32 output port to get
  * tunnel info.
+ * @OVS_USERSPACE_ATTR_ACTIONS: If present, send actions with upcall.
  */
 enum ovs_userspace_attr {
OVS_USERSPACE_ATTR_UNSPEC,
@@ -535,6 +538,7 @@ enum ovs_userspace_attr {
OVS_USERSPACE_ATTR_USERDATA,  /* Optional user-specified cookie. */
OVS_USERSPACE_ATTR_EGRESS_TUN_PORT,  /* Optional, u32 output port
  * to get tunnel info. */
+   OVS_USERSPACE_ATTR_ACTIONS,   /* Optional flag to get actions. */
__OVS_USERSPACE_ATTR_MAX
 };
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index b491c1c..8a8c0b8 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -608,17 +608,16 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
 }
 
 static int output_userspace(struct datapath *dp, struct sk_buff *skb,
-   struct sw_flow_key *key, const struct nlattr *attr)
+   struct sw_flow_key *key, const struct nlattr *attr,
+   const struct nlattr *actions, int actions_len)
 {
struct ovs_tunnel_info info;
struct dp_upcall_info upcall;
const struct nlattr *a;
int rem;
 
+   memset(&upcall, 0, sizeof(upcall));
upcall.cmd = OVS_PACKET_CMD_ACTION;
-   upcall.userdata = NULL;
-   upcall.portid = 0;
-   upcall.egress_tun_info = NULL;
 
for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
 a = nla_next(a, &rem)) {
@@ -647,6 +646,13 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
break;
}
 
+   case OVS_USERSPACE_ATTR_ACTIONS: {
+   /* Include actions. */
+   upcall.actions = actions;
+   upcall.actions_len = actions_len;
+   break;
+   }
+
} /* End of switch. */
}
 
@@ -654,7 +660,8 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 }
 
 static int sample(struct datapath *dp, struct sk_buff *skb,
- struct sw_flow_key *key, const struct nlattr *attr)
+ struct sw_flow_key *key, const struct nlattr *attr,
+ const struct nlattr *actions, int actions_len)
 {
const struct nlattr *acts_list = NULL;
const struct nlattr *a;
@@ -688,7 +695,7 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 */
if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE &&
   nla_is_last(a, rem)))
-   return output_userspace(dp, skb, key, a);
+   return output_userspace(dp, skb, key, a, actions, actions_len);
 
skb = skb_clone(skb, GFP_ATOMIC);
if (!skb)
@@ -872,7 +879,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
break;
 
case OVS_ACTION_ATTR_USERSPACE:
-   output_userspace(dp, skb, key, a);
+   output_userspace(dp, skb, key, a, attr, len);
break;
 
case OVS_ACTI

[PATCH net-next v6 1/2] pci: Add Cavium PCI vendor id

2015-05-26 Thread Aleksey Makarov
From: Sunil Goutham 

This vendor id will be used by network (vNIC), USB (xHCI),
SATA (AHCI), GPIO, I2C, MMC and maybe other drivers
for ThunderX SoC.

Acked-by: Bjorn Helgaas 
Signed-off-by: Sunil Goutham 
Signed-off-by: Aleksey Makarov 
---
 include/linux/pci_ids.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 1fa99a3..80bd333 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2324,6 +2324,8 @@
 #define PCI_DEVICE_ID_ALTIMA_AC9100   0x03ea
 #define PCI_DEVICE_ID_ALTIMA_AC1003   0x03eb
 
+#define PCI_VENDOR_ID_CAVIUM   0x177d
+
 #define PCI_VENDOR_ID_BELKIN   0x1799
 #define PCI_DEVICE_ID_BELKIN_F5D7010V7 0x701f
 
-- 
2.4.1



Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread David Miller
From: Eric Dumazet 
Date: Tue, 26 May 2015 08:55:28 -0700

> From: Eric Dumazet 
> 
> By making sure sk->sk_gso_max_segs minimal value is one,
> and sysctl_tcp_min_tso_segs minimal value is one as well,
> tcp_tso_autosize() will return a non zero value.
> 
> We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b
> ("tcp: Do not apply TSO segment limit to non-TSO packets")
> and save few cpu cycles in fast path.
> 
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.


Re: [PATCH v3] net/unix: sk_socket can disappear when state is unlocked

2015-05-26 Thread David Miller
From: Hannes Frederic Sowa 
Date: Tue, 26 May 2015 23:24:59 +0200

> On Tue, May 26, 2015, at 17:22, Mark Salyzyn wrote:
>> got a rare NULL pointer dereference in clear_bit
>> 
>> Signed-off-by: Mark Salyzyn 
> 
> IMHO, this is the right approach; I didn't come up with anything
> easier, thanks!
> 
> Acked-by: Hannes Frederic Sowa 

Applied and queued up for -stable, thanks!


Re: [PATCH net-next v6 0/2] Adding support for Cavium ThunderX network controller

2015-05-26 Thread David Miller
From: Aleksey Makarov 
Date: Tue, 26 May 2015 19:20:13 -0700

> This patchset adds support for the Cavium ThunderX network controller.

I don't see patch #1 (yet).


Re: [PATCH] atm:he - Change 0 to false for bool type variable initialization.

2015-05-26 Thread David Miller
From: Shailendra Verma 
Date: Wed, 27 May 2015 06:50:18 +0530

> The variable sdh is bool type so initializing it with false value
> instead of 0.
> 
> Signed-off-by: Shailendra Verma 
> ---
>  drivers/atm/he.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/atm/he.c b/drivers/atm/he.c
> index 93dca2e..eb5bebc 100644
> --- a/drivers/atm/he.c
> +++ b/drivers/atm/he.c
> @@ -116,8 +116,8 @@ static bool disable64;
>  static short nvpibits = -1;
>  static short nvcibits = -1;
>  static short rx_skb_reserve = 16;
> -static bool irq_coalesce = 1;
> -static bool sdh = 0;
> +static bool irq_coalesce = true;
> +static bool sdh = false;

You didn't understand my feedback.

I already applied your patch that handled the irq_coalesce issue,
so you have to submit a patch relative to that.

In fact, you should always test that your patch applied to my tree
before submitting it.


[PATCH net-next v6 0/2] Adding support for Cavium ThunderX network controller

2015-05-26 Thread Aleksey Makarov
This patchset adds support for the Cavium ThunderX network controller.

changes in v6:
 * unused preprocessor symbols were removed
 * reduce no of atomic operations in SQ maintenance
 * support for TCP segmentation at driver level
 * reset RBDR if fifo state is FAIL
 * fixed an issue with link state mailbox message

changes in v5:
 * __packed were removed.  now we rely on C language ABI
 * nic_dbg() -> netdev_dbg()
 * fixes for a typo, constant spelling and using BIT_ULL
 * use print_hex_dump()
 * unnecessary conditions in a long if() chain were removed

changes in v4:
 * the patch "pci: Add Cavium PCI vendor id" was attributed correctly
 * a note that Cavium id is used in many drivers was added
 * the license comments now match MODULE_LICENSE
 * a comment explaining usage of writeq_relaxed()/readq_relaxed() was added

changes in v3:
 * code cleanup
 * issues discovered by reviewers were addressed

changes in v2:
 * non-generic module parameters removed
 * ethtool support added (nicvf_set_rxnfc())

v5: 
https://lkml.kernel.org/g/<1432344498-17131-1-git-send-email-aleksey.maka...@caviumnetworks.com>
v4: 
https://lkml.kernel.org/g/<1432000757-28700-1-git-send-email-aleksey.maka...@auriga.com>
v3: 
https://lkml.kernel.org/g/<1431747401-20847-1-git-send-email-aleksey.maka...@auriga.com>
v2: 
https://lkml.kernel.org/g/<1415596445-10061-1-git-send-email-r...@kernel.org>
v1: https://lkml.kernel.org/g/<20141030165434.GW20170@rric.localhost>

Sunil Goutham (2):
  pci: Add Cavium PCI vendor id
  net: Adding support for Cavium ThunderX network controller

 MAINTAINERS|7 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/cavium/Kconfig|   40 +
 drivers/net/ethernet/cavium/Makefile   |5 +
 drivers/net/ethernet/cavium/thunder/Makefile   |   11 +
 drivers/net/ethernet/cavium/thunder/nic.h  |  414 ++
 drivers/net/ethernet/cavium/thunder/nic_main.c |  940 
 drivers/net/ethernet/cavium/thunder/nic_reg.h  |  213 +++
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c|  601 
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   | 1332 +
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 1544 
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |  381 +
 drivers/net/ethernet/cavium/thunder/q_struct.h |  701 +
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c  |  966 
 drivers/net/ethernet/cavium/thunder/thunder_bgx.h  |  223 +++
 include/linux/pci_ids.h|2 +
 17 files changed, 7382 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/Kconfig
 create mode 100644 drivers/net/ethernet/cavium/Makefile
 create mode 100644 drivers/net/ethernet/cavium/thunder/Makefile
 create mode 100644 drivers/net/ethernet/cavium/thunder/nic.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/nic_main.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nic_reg.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_main.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_queues.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_queues.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/q_struct.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/thunder_bgx.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/thunder_bgx.h

-- 
2.4.1



Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Herbert Xu
On Tue, May 26, 2015 at 07:14:41PM -0700, Eric Dumazet wrote:
>
> Whether Nagle or 'tso should defer' applies in this corner case is not
> very important here, unless you have a specific case in mind?
> Anyway, I double-checked and I believe it is fine.

Yes it does appear to be fine on a second look, for the case where
skb->len > mss.

Acked-by: Herbert Xu 

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Eric Dumazet
On Wed, 2015-05-27 at 10:01 +0800, Herbert Xu wrote:
> On Wed, May 27, 2015 at 09:38:40AM +0800, Herbert Xu wrote:
> > 
> > Not really.  They're not identical.  For example, before your
> > patch a packet greater than MSS with TSO disabled would call
> > tcp_nagle_test, with your patch it will call tcp_tso_should_defer
> > instead.
> > 
> > Maybe this is OK but it is far from obvious.
> 
> Funny enough it was you who originally introduced this change
> in behaviour:
> 
> commit 8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918
> Author: Eric Dumazet 
> Date:   Tue Oct 15 12:24:54 2013 -0700
> 
> tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()
> 
> Now you're trying to do it again :)
> 
> The problem is back at the very beginning, tso_segs > 1 was the
> same as TSO enabled.  So we either need to restore this identity
> or find a new way of indicating that TSO is enabled.
> 
> Cheers,


I have no idea of what you are trying to tell me.

I want to remove duplicate tests, not pushing new ones.

I do not want to add back sk_can_gso()

I do not find something funny here. Sorry.






Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Eric Dumazet
On Wed, 2015-05-27 at 09:38 +0800, Herbert Xu wrote:
> On Tue, May 26, 2015 at 05:36:34PM -0700, Eric Dumazet wrote:
> > 
> > tldr:  "TSO with max_segs==1"   "no TSO/GSO"
> 
> Not really.  They're not identical.  For example, before your
> patch a packet greater than MSS with TSO disabled would call
> tcp_nagle_test, with your patch it will call tcp_tso_should_defer
> instead.

Well, given that a device can set gso_max_segs to one, if there is a bug
here we'll need to fix it asap.

Whether Nagle or 'tso should defer' applies in this corner case is not
very important here, unless you have a specific case in mind?
Anyway, I double-checked and I believe it is fine.

We normally deal with dynamic MSS changes, even for non GSO cases.

A non GSO packet temporarily becomes a GSO one in tcp_init_tso_segs()
(because its skb->len is bigger than cur_mss)

Then we split it.

Nagle or 'tso should defer' would take the same decision: send one full MSS.

By the time the last 'segment' (possibly smaller than mss) will be
considered, Nagle might apply there.

Thanks !
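
The splitting described above can be sketched in plain user-space C. This is a toy model, not kernel code: `toy_seg_count()`, `toy_first_send_len()`, and `toy_tail_len()` are hypothetical helpers. The only ideas taken from the thread are that a buffer longer than the current MSS is counted as several segments, that one full MSS goes out first, and that the possibly shorter tail is considered last.

```c
#include <assert.h>

/* Hypothetical helper: number of MSS-sized segments needed for `len`
 * bytes, rounding up and never returning less than one. */
static unsigned int toy_seg_count(unsigned int len, unsigned int mss)
{
	assert(mss > 0);
	if (len <= mss)
		return 1;
	return 1 + (len - 1) / mss;	/* round up without overflow */
}

/* Hypothetical helper: bytes sent first, i.e. at most one full MSS. */
static unsigned int toy_first_send_len(unsigned int len, unsigned int mss)
{
	assert(mss > 0);
	return len < mss ? len : mss;
}

/* Hypothetical helper: bytes in the short final piece, or 0 when the
 * payload is an exact multiple of the MSS. */
static unsigned int toy_tail_len(unsigned int len, unsigned int mss)
{
	assert(mss > 0);
	return len % mss;
}
```

With a 3000-byte payload and an MSS of 1460, this model counts three segments, sends 1460 bytes first, and leaves an 80-byte tail, which is the piece where a Nagle-style check could still apply.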




Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Herbert Xu
On Wed, May 27, 2015 at 09:38:40AM +0800, Herbert Xu wrote:
> 
> Not really.  They're not identical.  For example, before your
> patch a packet greater than MSS with TSO disabled would call
> tcp_nagle_test, with your patch it will call tcp_tso_should_defer
> instead.
> 
> Maybe this is OK but it is far from obvious.

Funny enough it was you who originally introduced this change
in behaviour:

commit 8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918
Author: Eric Dumazet 
Date:   Tue Oct 15 12:24:54 2013 -0700

tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()

Now you're trying to do it again :)

The problem is back at the very beginning, tso_segs > 1 was the
same as TSO enabled.  So we either need to restore this identity
or find a new way of indicating that TSO is enabled.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF

2015-05-26 Thread Rose, Gregory V

> -Original Message-
> From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> Sent: Tuesday, May 26, 2015 5:28 PM
> To: Rose, Gregory V; Skidmore, Donald C; Kirsher, Jeffrey T; intel-wired-
> l...@lists.osuosl.org
> Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List; Choi, Sy
> Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz;
> sassm...@redhat.com
> Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
> 
> > > -Original Message-
> > > From: Skidmore, Donald C
> > > Sent: Tuesday, May 26, 2015 10:46 AM
> > > To: Hiroshi Shimamoto; Rose, Gregory V; Kirsher, Jeffrey T;
> > > intel-wired- l...@lists.osuosl.org
> > > Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List;
> > > Choi, Sy Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz;
> > > sassm...@redhat.com
> > > Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
> > >
> > >
> >
> > [snip]
> >
> > >
> > > > -Original Message-
> > > > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > > > Sent: Monday, May 25, 2015 6:00 PM
> > > > To: Skidmore, Donald C; Rose, Gregory V; Kirsher, Jeffrey T;
> > > > intel-wired- l...@lists.osuosl.org
> > > > Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List;
> > > > Choi, Sy Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz;
> > > > sassm...@redhat.com
> > > > Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
> > > >
> > > >
> > > > Do you mean that the VF should care whether it is trusted or not?
> > > > Should the VF request MC Promisc again once it is trusted?
> > > > Or do you mean the VF can never become trusted during its (or the VM's) lifetime?
> > >
> > > I think the VF shouldn't directly know whether it is trusted or not
> >
> > That's completely irrelevant.  The person administering the PF will be
> > the person who provided trusted privileges to the VF.  He'll then
> > *tell* or somehow otherwise communicate to the person administering the VF
> > (probably himself/herself) and then proceed to execute commands on that VF
> > that require trusted privileges.
> >
> > If the VF does not have trusted privileges then the commands to add
> > VLAN filters, set promiscuous modes, and any other privileged commands
> will fail.
> >
> > Let's not get too fancy with this.  It's simple - the host VMM admin
> > provides trusted privileges to the VF.  The person administering the
> > VF (if in fact it is not the same person, it usually will be) will
> proceed to do things that require VF trusted privileges.
> 
> Now I think that it's better to have an interface between PF and VF to
> know the VF is trusted.
> Otherwise VM cannot know whether its VF is trusted, that prevents
> automatic operations.

Agreed, it would be silly for the VF to have privileges but not know that it
can use them!

> Or add another communicating interface outside of ixgbe PF-VF mbox API?

We can't depend on any given vendor-specific interface.  I'd add a very clear
comment to the Physical Function ndo op that gives a VF trusted privileges,
saying that it is up to the driver to notify the VF driver.  But yes, in the
case of Intel drivers the mailbox or admin queue (for i40e) would be the
mechanism to do that.  I know you have some ixgbe patches that coincide with
this patch, so that's a good place to look.

> 
> >
> >
> > .  It
> > > should request MC Promisc and get it if it is trusted and not if it
> > > is not trusted.  So if you (as the system admin know you have a VF
> > > that will need to request MC Promisc make sure you promote that VF
> > > to trusted before assigning it to a VM.  That way when it requests
> > > MC Promisc the PF will be able to grant it.
> > >
> >
> > Multicast promiscuous should be allowed for the VFs.  We already allow
> > VFs to set whatever multicast filters they want so if they want to go
> > into MPE then so what?  We don't care.  It's not a security risk.
> > Right now, without any modification, the VF can set 30 multicast
> > filters and listen.  It can then remove those and set another 30 filters
> and listen.  And so on and so on.  So if a VF can already listen on any MC
> filter it wants then why this artificial restriction on MC promiscuous
> mode.
> 
> I'm fine with that; I mentioned this previously.
> Without resetting the PF, we can listen to every MC packet whose hash was set.
> A PF reset will restore the last 30 MC addresses per VF.
> 
> Also, there is a single hash-entry table, so all VFs will get an MC packet
> whose hash was set in the table. If one VF user sets a filter, other users
> will receive that MC packet.
> 
> >
> > We don't care about this case. Unicast promiscuous is the security risk
> and I think we've handled that.
> 
> So, should we separate the discussion, about trusting VF operation and
> about MC promiscuous?

Yes.  And to my mind it shouldn't really be in the context of virtual function 
privilege or trust.

> 
> >
> > >
> > > >
> > > > And what do you think about being untrusted from trusted state?
> > >
> > > Th

Re: [Patch net] net_sched: invoke ->attach() after setting dev->qdisc

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 16:08 -0700, Cong Wang wrote:
> For mq qdisc, we add per tx queue qdisc to root qdisc
> for display purpose, however, that happens too early,
> before the new dev->qdisc is finally set, this causes
> q->list points to an old root qdisc which is going to be
> freed right before assigning with a new one.
> 
> Fix this by moving ->attach() after setting dev->qdisc.
> 
> For the record, this fixes the following crash:
...

> For long term, we probably need to clean up the qdisc_graft() code
> in case it hides other bugs like this.
> 
> Fixes: 95dc19299f74 ("pkt_sched: give visibility to mq slave qdiscs")
> Cc: Jamal Hadi Salim 
> Signed-off-by: Cong Wang 
> ---
>  net/sched/sch_api.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index ad9eed7..1e1c89e 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -815,10 +815,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>   if (dev->flags & IFF_UP)
>   dev_deactivate(dev);
>  
> - if (new && new->ops->attach) {
> - new->ops->attach(new);
> - num_q = 0;
> - }
> + if (new && new->ops->attach)
> + goto skip;
>  
>   for (i = 0; i < num_q; i++) {
>   struct netdev_queue *dev_queue = dev_ingress_queue(dev);
> @@ -834,12 +832,16 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>   qdisc_destroy(old);
>   }
>  
> +skip:
>   if (!ingress) {
>   notify_and_destroy(net, skb, n, classid,
>  dev->qdisc, new);
>   if (new && !new->ops->attach)
>   atomic_inc(&new->refcnt);
>   dev->qdisc = new ? : &noop_qdisc;
> +
> + if (new && new->ops->attach)
> + new->ops->attach(new);
>   } else {
>   notify_and_destroy(net, skb, n, classid, old, new);
>   }

Good catch !

Please CC author of a buggy patch, always interesting to learn from our
mistakes. ;)

Note that attach() method is called with the 'new' qdisc,
we might pass this parameter up to qdisc_list_add()

Conceptually, setting dev->qdisc before attach() seems a bit awkward,
but given that attach() returns void, we did not plan on having an error
path anyway.

No strong feeling here, I think your patch is fine as is.

Acked-by: Eric Dumazet 

Thanks a lot.
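
The ordering issue discussed in this thread can be illustrated with a small user-space model. Everything here is hypothetical (`toy_dev`, `toy_qdisc`, `toy_attach()`, `toy_graft()`); it is a sketch of the idea only, assuming an attach callback that records whichever root the device currently points at. If the callback ran before the device's root pointer was updated, it would record the old root that is about to be freed.

```c
#include <assert.h>
#include <stddef.h>

struct toy_qdisc;

struct toy_dev {
	struct toy_qdisc *root;		/* stands in for dev->qdisc */
};

struct toy_qdisc {
	struct toy_dev *dev;
	struct toy_qdisc *listed_under;	/* root seen at attach time */
	void (*attach)(struct toy_qdisc *q);
};

/* Toy attach(): remembers the device's current root, the way a list
 * insertion keyed on that root would. */
static void toy_attach(struct toy_qdisc *q)
{
	q->listed_under = q->dev->root;
}

/* Graft in the order the patch proposes: publish the new root first,
 * then run attach() so it observes the new root, not the old one. */
static void toy_graft(struct toy_dev *dev, struct toy_qdisc *new_root)
{
	assert(dev != NULL && new_root != NULL);
	dev->root = new_root;
	if (new_root->attach)
		new_root->attach(new_root);
}
```

Swapping the two statements in `toy_graft()` would leave `listed_under` pointing at the previous root, which is the stale-pointer pattern the fix avoids.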





Re: [Intel-wired-lan] [PATCH v5] ixgbe: Add module parameter to disable VLAN filter

2015-05-26 Thread Alexander Duyck

On 05/26/2015 06:11 PM, Hiroshi Shimamoto wrote:

On 05/21/2015 06:10 AM, Hiroshi Shimamoto wrote:

From: Hiroshi Shimamoto 

Introduce module parameter "disable_hw_vlan_filter" to disable HW VLAN
filter on ixgbe module load.

  From the hardware limitation, there are only 64 VLAN entries for HW VLAN
filter, and it leads to limit the number of VLANs up to 64 among PF and
VFs. For SDN/NFV case, we need to handle unlimited VLAN packets on VF.
In such case, every VLAN packet can be transmitted to each VF.

When we try to make VLAN devices on VF, the 65th VLAN registration fails
and the VF is never able to receive a packet with that VLAN tag.
If we do the below command on VM, ethX.65 to ethX.100 cannot be created.
# for i in `seq 1 100`; do \
  ip link add link ethX name ethX.$i type vlan id $i; done

There is a capability to disable HW VLAN filter and that makes all VLAN
tagged packets can be transmitted to every VFs. After VLAN filter stage,
unicast packets are transmitted to VF which has the MAC address same as
the transmitting packet.

With this patch and "disable_hw_vlan_filter=1", we can use unlimited
number of VLANs on VF.

Disabling HW VLAN filter breaks some NIC features such as DCB and FCoE.
DCB and FCoE are disabled when HW VLAN filter is disabled by this module
parameter.
Because of that reason, the administrator has to know that before turning
off HW VLAN filter.

You might also want to note that it makes the system susceptible to
broadcast/multicast storms since it eliminates any/all VLAN isolation.
So a broadcast or multicast packet on one VLAN is received on ALL
interfaces regardless of their VLAN configuration. In addition the
current VF driver is likely to just receive the packet as untagged, see
ixgbevf_process_skb_fields().  As a result one or two VFs can bring the
entire system to a crawl by saturating the PCIe bus via
broadcast/multicast traffic since there is nothing to prevent them from
talking to each other over VLANs that are no longer there.

that's right.

On the other hand, untagged packets are not isolated either;
doesn't the same broadcast/multicast storm occur on an untagged network?


Yes, that is one of the reasons for VLANs.  It provides isolation so 
that if you have two entities on the same network you won't have entity 
A able to talk to entity B.  The problem is with VLAN promiscuous 
enabled if entity B is a VF it will see the traffic but has no way to 
know that it was VLAN tagged and a part of entity A's VLAN.





For the sake of backwards compatibility I would say that a feature like
this should be mutually exclusive with SR-IOV as well since it will
cause erratic behavior.  The VF will receive requests from all VLANs
thinking the traffic is untagged, and then send replies back to VLAN 0
even though that isn't where the message originated.

Sorry, I couldn't catch the above part.
Could you explain a bit more?

thanks,
Hiroshi


Until the VF issue
is fixed this type of feature is a no-go.




The current behavior for a VF is that if it receives a VLAN that it 
didn't request it assumes it is operating in port VLAN mode.  The 
problem is with your patch the VF will be receiving all traffic but have 
no idea which VLAN it came from.  As a result it could be replying to 
multicast or broadcast requests on one VLAN with the wrong VLAN ID.


The VLAN behavior of the VF drivers will need to be fixed before 
something like that could be supported with ANY of the VFs.  As such you 
will probably need to fix the VF driver in order to allow any of them to 
come online when VLAN filtering is disabled, as the driver will need to 
report the VLAN tag ID up to the stack.


- Alex
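
The trade-off in this thread can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not driver code: `toy_vlan_filter`, `toy_vlan_add()`, and `toy_vlan_accept()` are hypothetical, and only the 64-entry limit and the "filter disabled means everything passes" behaviour come from the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VLAN_SLOTS 64	/* hardware limit quoted in the thread */

struct toy_vlan_filter {
	uint16_t vid[TOY_VLAN_SLOTS];
	unsigned int used;
	bool filtering;		/* false models the "filter disabled" mode */
};

/* Register a VLAN ID; fails with -ENOSPC once all slots are taken. */
static int toy_vlan_add(struct toy_vlan_filter *f, uint16_t vid)
{
	unsigned int i;

	assert(f != NULL);
	for (i = 0; i < f->used; i++)
		if (f->vid[i] == vid)
			return 0;	/* already registered */
	if (f->used == TOY_VLAN_SLOTS)
		return -ENOSPC;
	f->vid[f->used++] = vid;
	return 0;
}

/* Decide whether a tagged frame is accepted. With filtering off every
 * VLAN passes, so there is no isolation between VLANs. */
static bool toy_vlan_accept(const struct toy_vlan_filter *f, uint16_t vid)
{
	unsigned int i;

	assert(f != NULL);
	if (!f->filtering)
		return true;
	for (i = 0; i < f->used; i++)
		if (f->vid[i] == vid)
			return true;
	return false;
}
```

In this model the 65th `toy_vlan_add()` fails, and clearing `filtering` makes VLAN 65 visible at the cost of accepting every other VLAN as well.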



Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Herbert Xu
On Tue, May 26, 2015 at 05:36:34PM -0700, Eric Dumazet wrote:
> 
> tldr:  "TSO with max_segs==1"   "no TSO/GSO"

Not really.  They're not identical.  For example, before your
patch a packet greater than MSS with TSO disabled would call
tcp_nagle_test, with your patch it will call tcp_tso_should_defer
instead.

Maybe this is OK but it is far from obvious.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH v3] isdn: Use ktime_t instead of 'struct timeval'

2015-05-26 Thread Tina Ruchandani
'struct timeval' uses 32-bit representation for seconds which will
overflow in year 2038 and beyond. mISDN/clock.c needs to compute and
store elapsed time in intervals of 125 microseconds. This patch replaces
the usage of 'struct timeval' with 64-bit ktime_t which is y2038 safe.
The patch also replaces do_gettimeofday() (wall-clock time) with 
ktime_get() (monotonic time) since we only care about elapsed time here.
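
The interval arithmetic described above can be sketched in user-space C. This is an illustration only, assuming timestamps are plain 64-bit nanosecond counts from a monotonic clock; `toy_elapsed_ticks()`, `toy_clock_advance()`, and the `TOY_*` constants are hypothetical names and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL
#define TOY_TICK_NS (TOY_NSEC_PER_SEC / 8000)	/* 125 us = 125000 ns */

/* Number of 125 us intervals between two monotonic timestamps,
 * truncated to 16 bits like the counter it feeds. */
static uint16_t toy_elapsed_ticks(uint64_t then_ns, uint64_t now_ns)
{
	assert(now_ns >= then_ns);	/* monotonic clock never goes back */
	return (uint16_t)((now_ns - then_ns) / TOY_TICK_NS);
}

/* Advance a wrapping 16-bit sample counter by the elapsed intervals. */
static uint16_t toy_clock_advance(uint16_t count, uint64_t then_ns,
				  uint64_t now_ns)
{
	return (uint16_t)(count + toy_elapsed_ticks(then_ns, now_ns));
}
```

A single 64-bit subtraction followed by one division replaces the separate seconds and microseconds bookkeeping, including the borrow case the old code handled by hand.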

Signed-off-by: Tina Ruchandani 
Suggested-by: Arnd Bergmann 
--
Changes in v3:
- Use division scheme suggested by Arnd Bergmann to avoid
  a (64-bit variable) / (32-bit variable) expression.
Changes in v2:
- Avoid possible truncation bug caused by assigning ktime_us_delta
output to a 16-bit number.
- Use ktime_us_delta, more concise than a combination of ktime_sub
and ktime_to_us.
---
 drivers/isdn/mISDN/clock.c | 56 ++
 include/linux/mISDNif.h|  2 +-
 2 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/drivers/isdn/mISDN/clock.c b/drivers/isdn/mISDN/clock.c
index 693fb7c..0ea7e2f 100644
--- a/drivers/isdn/mISDN/clock.c
+++ b/drivers/isdn/mISDN/clock.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "core.h"
@@ -45,7 +46,7 @@ static u_int *debug;
 static LIST_HEAD(iclock_list);
 static DEFINE_RWLOCK(iclock_lock);
 static u16 iclock_count;   /* counter of last clock */
-static struct timeval iclock_tv;   /* time stamp of last clock */
+static ktime_t iclock_tv;  /* time stamp of last clock */
 static int iclock_tv_valid;/* already received one timestamp */
 static struct mISDNclock *iclock_current;
 
@@ -53,7 +54,7 @@ void
 mISDN_init_clock(u_int *dp)
 {
debug = dp;
-   do_gettimeofday(&iclock_tv);
+   iclock_tv = ktime_get();
 }
 
 static void
@@ -139,12 +140,11 @@ mISDN_unregister_clock(struct mISDNclock *iclock)
 EXPORT_SYMBOL(mISDN_unregister_clock);
 
 void
-mISDN_clock_update(struct mISDNclock *iclock, int samples, struct timeval *tv)
+mISDN_clock_update(struct mISDNclock *iclock, int samples, ktime_t *tv)
 {
u_long  flags;
-   struct timeval  tv_now;
-   time_t  elapsed_sec;
-   int elapsed_8000th;
+   ktime_t tv_now;
+   u16 delta;
 
write_lock_irqsave(&iclock_lock, flags);
if (iclock_current != iclock) {
@@ -160,28 +160,22 @@ mISDN_clock_update(struct mISDNclock *iclock, int samples, struct timeval *tv)
/* increment sample counter by given samples */
iclock_count += samples;
if (tv) { /* tv must be set, if function call is delayed */
-   iclock_tv.tv_sec = tv->tv_sec;
-   iclock_tv.tv_usec = tv->tv_usec;
-   } else
-   do_gettimeofday(&iclock_tv);
+   iclock_tv = *tv;
+   } else {
+   iclock_tv = ktime_get();
+   }
} else {
/* calc elapsed time by system clock */
if (tv) { /* tv must be set, if function call is delayed */
-   tv_now.tv_sec = tv->tv_sec;
-   tv_now.tv_usec = tv->tv_usec;
-   } else
-   do_gettimeofday(&tv_now);
-   elapsed_sec = tv_now.tv_sec - iclock_tv.tv_sec;
-   elapsed_8000th = (tv_now.tv_usec / 125)
-   - (iclock_tv.tv_usec / 125);
-   if (elapsed_8000th < 0) {
-   elapsed_sec -= 1;
-   elapsed_8000th += 8000;
+   tv_now = *tv;
+   } else {
+   tv_now = ktime_get();
}
+   delta = ktime_divns(ktime_sub(tv_now, iclock_tv),
+   (NS_PER_SEC / 8000));
/* add elapsed time to counter and set new timestamp */
-   iclock_count += elapsed_sec * 8000 + elapsed_8000th;
-   iclock_tv.tv_sec = tv_now.tv_sec;
-   iclock_tv.tv_usec = tv_now.tv_usec;
+   iclock_count += delta;
+   iclock_tv = tv_now;
iclock_tv_valid = 1;
if (*debug & DEBUG_CLOCK)
printk("Received first clock from source '%s'.\n",
@@ -195,22 +189,16 @@ unsigned short
 mISDN_clock_get(void)
 {
u_long  flags;
-   struct timeval  tv_now;
-   time_t  elapsed_sec;
-   int elapsed_8000th;
+   ktime_t tv_now;
+   u16 delta;
u16 count;
 
read_lock_irqsave(&iclock_lock, flags);
/* calc elapsed time by system clock */
-   do_gettimeofday(&tv_now);
-   elapsed_sec = tv_now.tv_sec - iclock_tv.tv_sec;
-   elapsed_8000th = (tv_now.tv_usec / 125) - (iclock_tv.tv_usec / 125);
-   if (elapsed_8000th < 0) {
-   elapsed_s
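
The arithmetic this hunk replaces can be checked in user space. The sketch below is illustrative only: elapsed_old() and elapsed_new() are hypothetical helpers, plain integers stand in for struct timeval and ktime_t, and DEMO_NSEC_PER_SEC is a local constant rather than a kernel macro. Both functions count 8000 Hz sample periods (125 us each) between two timestamps.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL	/* local stand-in, not the kernel macro */
#define DEMO_SAMPLE_RATE  8000		/* samples per second */

/* Old scheme: seconds and microseconds handled separately, with a borrow. */
static long elapsed_old(long then_sec, long then_usec,
			long now_sec, long now_usec)
{
	long sec = now_sec - then_sec;
	long ticks = now_usec / 125 - then_usec / 125;

	if (ticks < 0) {
		sec -= 1;
		ticks += DEMO_SAMPLE_RATE;
	}
	return sec * DEMO_SAMPLE_RATE + ticks;
}

/* New scheme: one nanosecond difference divided by the 125000 ns sample period. */
static int64_t elapsed_new(int64_t then_ns, int64_t now_ns)
{
	return (now_ns - then_ns) / (DEMO_NSEC_PER_SEC / DEMO_SAMPLE_RATE);
}
```

For timestamps that fall on 125 us boundaries the two agree. Otherwise they can differ by one sample, because the old code rounds each timestamp down separately while the new code rounds the difference.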

[PATCH] atm:he - Change 0 to false for bool type variable initialization.

2015-05-26 Thread Shailendra Verma
The variables irq_coalesce and sdh are of type bool, so initialize them
with true and false instead of 1 and 0.

Signed-off-by: Shailendra Verma 
---
 drivers/atm/he.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 93dca2e..eb5bebc 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -116,8 +116,8 @@ static bool disable64;
 static short nvpibits = -1;
 static short nvcibits = -1;
 static short rx_skb_reserve = 16;
-static bool irq_coalesce = 1;
-static bool sdh = 0;
+static bool irq_coalesce = true;
+static bool sdh = false;
 
 /* Read from EEPROM =  0011b */
 static unsigned int readtab[] = {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Intel-wired-lan] [PATCH v5] ixgbe: Add module parameter to disable VLAN filter

2015-05-26 Thread Hiroshi Shimamoto
> On 05/21/2015 06:10 AM, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > Introduce module parameter "disable_hw_vlan_filter" to disable HW VLAN
> > filter on ixgbe module load.
> >
> >  From the hardware limitation, there are only 64 VLAN entries for HW VLAN
> > filter, and it leads to limit the number of VLANs up to 64 among PF and
> > VFs. For SDN/NFV case, we need to handle unlimited VLAN packets on VF.
> > In such case, every VLAN packet can be transmitted to each VF.
> >
> > When we try to make VLAN devices on VF, the 65th VLAN registration fails
> > and never be able to receive a packet with that VLAN tag.
> > If we do the below command on VM, ethX.65 to ethX.100 cannot be created.
> ># for i in `seq 1 100`; do \
> >  ip link add link ethX name ethX.$i type vlan id $i; done
> >
> > There is a capability to disable HW VLAN filter and that makes all VLAN
> > tagged packets can be transmitted to every VFs. After VLAN filter stage,
> > unicast packets are transmitted to VF which has the MAC address same as
> > the transmitting packet.
> >
> > With this patch and "disable_hw_vlan_filter=1", we can use unlimited
> > number of VLANs on VF.
> >
> > Disabling HW VLAN filter breaks some NIC features such as DCB and FCoE.
> > DCB and FCoE are disabled when HW VLAN filter is disabled by this module
> > parameter.
> > Because of that reason, the administrator has to know that before turning
> > off HW VLAN filter.
> 
> You might also want to note that it makes the system susceptible to
> broadcast/multicast storms since it eliminates any/all VLAN isolation.
> So a broadcast or multicast packet on one VLAN is received on ALL
> interfaces regardless of their VLAN configuration. In addition the
> current VF driver is likely to just receive the packet as untagged, see
> ixgbevf_process_skb_fields().  As a result one or two VFs can bring the
> entire system to a crawl by saturating the PCIe bus via
> broadcast/multicast traffic since there is nothing to prevent them from
> talking to each other over VLANs that are no longer there.

That's right.

On the other hand, untagged packets are not isolated either.
Wouldn't the same broadcast/multicast storm happen on an untagged network?

> 
> For the sake of backwards compatibility I would say that a feature like
> this should be mutually exclusive with SR-IOV as well since it will
> cause erratic behavior.  The VF will receive requests from all VLANs
> thinking the traffic is untagged, and then send replies back to VLAN 0
> even though that isn't where the message originated.

Sorry, I couldn't catch the above part.
Could you explain a bit more?

thanks,
Hiroshi

> Until the VF issue
> is fixed this type of feature is a no-go.



Re: [PATCH 2/3 nf-next] netfilter: nf_tables: allow to bind table to net_device

2015-05-26 Thread Simon Horman
On Tue, May 26, 2015 at 11:58:24AM +0200, Pablo Neira Ayuso wrote:
> On Tue, May 26, 2015 at 09:48:41AM +0900, Simon Horman wrote:
> > Hi Pablo,
> > 
> > On Mon, May 25, 2015 at 02:46:41PM +0200, Pablo Neira Ayuso wrote:
> > > This patch adds the internal NFT_AF_NEEDS_DEV flag to indicate that you 
> > > must
> > > attach this table to a net_device.
> > > 
> > > This change is required by the follow up patch that introduces the new 
> > > netdev
> > > table.
> > > 
> > > Signed-off-by: Pablo Neira Ayuso 
> > > ---
> > >  include/net/netfilter/nf_tables.h|8 ++
> > >  include/uapi/linux/netfilter/nf_tables.h |2 ++
> > >  net/netfilter/nf_tables_api.c|   46 
> > > ++
> > >  3 files changed, 51 insertions(+), 5 deletions(-)
> > 
> > [snip]
> > 
> > > diff --git a/include/uapi/linux/netfilter/nf_tables.h 
> > > b/include/uapi/linux/netfilter/nf_tables.h
> > > index 5fa1cd0..89a671e 100644
> > > --- a/include/uapi/linux/netfilter/nf_tables.h
> > > +++ b/include/uapi/linux/netfilter/nf_tables.h
> > 
> > [snip]
> > 
> > > @@ -423,6 +425,10 @@ static int nf_tables_fill_table_info(struct sk_buff 
> > > *skb, struct net *net,
> > >   nla_put_be32(skb, NFTA_TABLE_USE, htonl(table->use)))
> > >   goto nla_put_failure;
> > >  
> > > + if (table->dev &&
> > > + nla_put_string(skb, NFTA_TABLE_DEV, table->dev->name))
> > > + goto nla_put_failure;
> > > +
> > >   nlmsg_end(skb, nlh);
> > >   return 0;
> > >  
> > > @@ -608,6 +614,11 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
> > >   if (flags == ctx->table->flags)
> > >   return 0;
> > >  
> > > + if ((ctx->afi->flags & NFT_AF_NEEDS_DEV) &&
> > > + ctx->nla[NFTA_TABLE_DEV] &&
> > > + nla_strcmp(ctx->nla[NFTA_TABLE_DEV], ctx->table->dev->name))
> > > + return -EOPNOTSUPP;
> > > +
> > >   trans = nft_trans_alloc(ctx, NFT_MSG_NEWTABLE,
> > >   sizeof(struct nft_trans_table));
> > >   if (trans == NULL)
> > 
> > I'm a little unsure of the above logic.
> > 
> > Is it ok for NFT_AF_NEEDS_DEV to be set but ctx->nla[NFTA_TABLE_DEV] to
> > be absent?
> 
> This path is only run if the table already exists.
> 
> So it basically checks if we're trying to update the binding, in that
> case we hit -EOPNOTSUPP.
> 
> If we don't pass any NFTA_TABLE_DEV, then we assume we stick to the
> existing binding.
> 
> This allows us to update the table flags without indicating the
> binding, eg.
> 
> nft add table netdev filter { flags dormant\; }
> 
> which basically disables the entire table content.

Thanks Pablo, that is clear to me now.
I have no objections.
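
The check Pablo describes can be modelled in user space as follows. This is a sketch, not the nf_tables code: check_table_dev_update() and its parameters are hypothetical names, and a NULL requested_dev stands for a request that carries no NFTA_TABLE_DEV attribute.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 0 if the table update may proceed, or -EOPNOTSUPP if the request
 * names a device other than the one the table is already bound to.
 * A request without a device name keeps the existing binding.
 */
static int check_table_dev_update(bool family_needs_dev,
				  const char *bound_dev,
				  const char *requested_dev)
{
	if (family_needs_dev &&
	    requested_dev != NULL &&
	    strcmp(requested_dev, bound_dev) != 0)
		return -EOPNOTSUPP;

	return 0;
}
```

With this shape, a flags-only update such as the "flags dormant" example above passes, because no device name is supplied.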


Re: [PATCH/RFC net-next] rocker: by default accept untagged packets

2015-05-26 Thread Simon Horman
Hi Scott,

On Tue, May 26, 2015 at 02:04:00AM -0700, Scott Feldman wrote:
> On Tue, May 26, 2015 at 12:28 AM, Scott Feldman  wrote:
> > On Mon, May 25, 2015 at 5:55 PM, Simon Horman
> >  wrote:
> >> This will occur anyway if the 8021q module is loaded as it will
> >> call rocker_port_vlan_rx_add_vid for vlan 0. This code is here
> >> to handle the case where the 8021q module is not loaded.
> >>
> >> This patch also handles the case where the 8021q is unloaded
> >> removing all VLANs from all ports.
> >>
> >> This change should not affect bridging, although the rules are
> >> harmlessly installed anyway. This is in keeping with the behaviour
> >> for VLANs when the 8021q modules is loaded.
> >>
> >> To aid implementation of the above provide a helper
> >> and use it to replace some existing code.
> >>
> >> Signed-off-by: Simon Horman 
> 
> [cut]
> 
> >
> > Hi Simon,
> >
> > Thanks for looking into this one.  I looked at your patch and the code
> > and I think we can streamline it a little bit more.  For the
> > no-8021q-module case we use rocker_port_vlan_add() and
> > rocker_port_vlan_del() to add/del bridge vlans.  We should be able to
> > move those functions up in the file so they can be called from
> > rocker_port_vlan_rx_add_vid() and rocker_port_vlan_rx_kill_vid(),
> > passing trans=SWITCHDEV_TRANS_NONE, but only if vid != 0.  Next, like
> > in your patch, we need to call rocker_port_vlan_add() in
> > rocker_port_open(), passing in vid=0 for untagged.  And in
> > rocker_port_stop(), call rocker_port_vlan_del(), again passing in
> > vid=0.
> >
> > To summarize:
> >
> > Call rocker_port_vlan_add() from:
> >
> > 1) rocker_port_open with vid=0
> > 2) rocker_port_vlans_add() // bridge vlan
> > 3) rocker_port_vlan_rx_add_vid() if vid != 0   // 8021q vlan
> >
> > Call rocker_port_vlan_del() from:
> >
> > 1) rocker_port_stop with vid=0
> > 2) rocker_port_vlans_del()  // bridge vlan
> > 3) rocker_port_vlan_rx_kill_vid() if vid != 0// 8021q vlan
> >
> > Does this look right?

It seems like it ought to work, I can try implementing the above
idea if you think it is worthwhile.

Can I clarify that it's ok to ignore vid == 0 in
rocker_port_vlan_rx_add_vid() and rocker_port_vlan_rx_kill_vid()?

If so I think that leads to an easy simplification of
my proposed change to rocker_port_vlan_rx_kill_vid():
the logic to restore vlan 0 if no vlans are present can be dropped.

Of course your suggestion goes further than that.

> H...things get simpler if we removed support for 8021q module in
> rocker driver by removing NETIF_F_HW_VLAN_CTAG_FILTER.  That gets rid
> of rocker_port_vlan_rx_add_vid() and rocker_port_vlan_rx_kill_vid().
> Leaving us with the bridge VLAN interface to add/del/show vlans on the
> port.  I'm wondering if we can also avoid setting up untagged traffic
> on the port during port open by requiring a explicit command on the
> port from the user:
> 
> bridge vlan add vid 0 dev DEV master self// enable untagged
> traffic on port

I have some questions about that approach:

* Does that behaviour differ from other devices
  (that don't set NETIF_F_HW_VLAN_CTAG_FILTER)?
  It seems like it may be an extra unnecessary step to me.
* Does that behaviour change the current behaviour supported by rocker?
  If so it seems unwise to change it.
* Does the scheme described above work when rocker ports are not bridged?
  This is the scenario I am interested in at this time.

> 
> Do you have a requirement for 8021q module?  I'm leaning towards a
> clean break from 8021q and using just the built-in bridge VLAN
> support.  I'm curious if others have an opinion about 8021q module
> used with switchdev devices.

I do not have a requirement on the 8021q module at this time.


[PATCH] ray_cs: Change 1 to true for bool type variable.

2015-05-26 Thread Shailendra Verma
The variable translate is of type bool, so assign true instead of 1.

Signed-off-by: Shailendra Verma 
---
 drivers/net/wireless/ray_cs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ray_cs.c b/drivers/net/wireless/ray_cs.c
index 477f863..0881ba8 100644
--- a/drivers/net/wireless/ray_cs.c
+++ b/drivers/net/wireless/ray_cs.c
@@ -143,7 +143,7 @@ static int psm;
 static char *essid;
 
 /* Default to encapsulation unless translation requested */
-static bool translate = 1;
+static bool translate = true;
 
 static int country = USA;
 
-- 
1.7.9.5



Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Eric Dumazet
On Wed, 2015-05-27 at 08:03 +0800, Herbert Xu wrote:
> Eric Dumazet  wrote:
> >
> > -   if (tso_segs == 1 || !max_segs) {
> > +   if (tso_segs == 1) {
> 
> What we're testing for here with max_segs is actually the question
> "is TSO enabled".  So with your patch we will be taking the TSO
> code path even when TSO is disabled.  Now this may or may not
> trigger the original bug that I was trying to fix but it still
> feels unsafe.
> 
> So please convince me that it is totally safe to take the TSO
> code path with TSO disabled, e.g., when PMTU causes tso_segs
> to be greater than one.

tldr:  "TSO with max_segs==1" is equivalent to "no TSO/GSO"


So the worst thing that will happen is that the call to
tcp_mss_split_point() / tso_fragment() will nicely split the (TSO/GSO)
packet into a non-GSO packet of one MSS before transmit.

Right now, one can "ethtool -K eth0 tso off gso off" in the middle of a
TCP session and tcp_write_xmit() automatically falls back to splitting
oversized skbs that were built while GSO/TSO was still enabled in
sendmsg().
  
Not sure what you need to be convinced ;)

Thanks !
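
A deliberately simplified model of that fallback is sketched below. demo_split_limit() is a hypothetical helper and does not reproduce tcp_mss_split_point(); in particular it ignores the send window. It only shows that capping an oversized buffer at max_segs segments yields exactly one MSS when max_segs is 1.

```c
/*
 * Bytes that go out in the next packet, capped at max_segs full-sized
 * segments. The send window and other limits are left out on purpose.
 */
static unsigned int demo_split_limit(unsigned int skb_len, unsigned int mss,
				     unsigned int max_segs)
{
	unsigned int cap = mss * max_segs;

	return skb_len < cap ? skb_len : cap;
}
```

Under these assumptions, a 4344-byte buffer with an MSS of 1448 and max_segs of 1 is cut to a single 1448-byte packet, which is what a non-TSO transmit would send.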



RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF

2015-05-26 Thread Hiroshi Shimamoto
> > -Original Message-
> > From: Skidmore, Donald C
> > Sent: Tuesday, May 26, 2015 10:46 AM
> > To: Hiroshi Shimamoto; Rose, Gregory V; Kirsher, Jeffrey T; intel-wired-
> > l...@lists.osuosl.org
> > Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List; Choi, Sy
> > Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz;
> > sassm...@redhat.com
> > Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
> >
> >
> 
> [snip]
> 
> >
> > > -Original Message-
> > > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > > Sent: Monday, May 25, 2015 6:00 PM
> > > To: Skidmore, Donald C; Rose, Gregory V; Kirsher, Jeffrey T;
> > > intel-wired- l...@lists.osuosl.org
> > > Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List; Choi,
> > > Sy Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz;
> > > sassm...@redhat.com
> > > Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
> > >
> > >
> > > Do you mean that VF should care about it is trusted or not?
> > > Should VF request MC Promisc again when it's trusted?
> > > Or, do you mean VF never be trusted during its (or VM's) lifetime?
> >
> > I think the VF shouldn't directly know whether it is trusted or not
> 
> That's completely irrelevant.  The person administering the PF will be the 
> person who provided trusted privileges to the
> VF.  He'll then *tell* or somehow otherwise communicate to the person 
> administering the VF (probably himself/herself) and
> then proceed to execute commands on that VF that require trusted privileges.
> 
> If the VF does not have trusted privileges then the commands to add VLAN 
> filters, set promiscuous modes, and any other
> privileged commands will fail.
> 
> Let's not get too fancy with this.  It's simple - the host VMM admin provides 
> trusted privileges to the VF.  The person
> administering the VF (if in fact it is not the same person, it usually will 
> be) will proceed to do things that require
> VF trusted privileges.

Now I think it's better to have an interface between the PF and the VF so
that the VF can find out whether it is trusted.
Otherwise the VM cannot know whether its VF is trusted, which prevents
automated operation.
Or should we add another communication interface outside of the ixgbe
PF-VF mbox API?

> 
> 
> .  It
> > should request MC Promisc and get it if it is trusted and not if it is not
> > trusted.  So if you (as the system admin know you have a VF that will need
> > to request MC Promisc make sure you promote that VF to trusted before
> > assigning it to a VM.  That way when it requests MC Promisc the PF will be
> > able to grant it.
> >
> 
> Multicast promiscuous should be allowed for the VFs.  We already allow VFs to 
> set whatever multicast filters they want
> so if they want to go into MPE then so what?  We don't care.  It's not a 
> security risk.  Right now, without any modification,
> the VF can set 30 multicast filters and listen.  It can then remove those and 
> set another 30 filters and listen.  And
> so on and so on.  So if a VF can already listen on any MC filter it wants 
> then why this artificial restriction on MC promiscuous
> mode.

I'm fine with that; I mentioned it previously.
Without resetting the PF, a VF can listen to every MC packet whose hash
has been set.
A PF reset will restore only the last 30 MC addresses per VF.

Also, there is a single hash table, so all VFs will get any MC packet
whose hash is set in the table. If one VF user sets a filter, the other
users will receive that MC packet too.

> 
> We don't care about this case. Unicast promiscuous is the security risk and I 
> think we've handled that.

So, should we separate the discussion about trusting VF operations from
the one about MC promiscuous mode?

> 
> >
> > >
> > > And what do you think about being untrusted from trusted state?
> >
> > This is an interesting question.  If we allowed a VM to go from trusted ->
> > untrusted we would have to turn off any "special" configuration that
> > trusted allowed.  Maybe in such cases we could reset the PF?  And of
> > course require all the "special" configuration (MC Promisc) to default to
> > off after being reset.
> >
> 
> To remove privileges from a VF that you're already set to privileged will 
> require destruction of the VF VSI and VFLR to
> the VF - after it comes up it can't do any further privileged operations.

Yeah, resetting the VF when its privilege changes sounds good.

> 
> [snip]
> 
> > This too is a valid point.  Currently we would just not do it (MC Promisc)
> > and the VF would have to figure that out for itself.  Passing a NAK back
> > to the VF might be nicer. :)  Of course I assumed the sysadm would know
> > that he/she wanted to give a VF trusted status and would do that before
> > the VF was even assigned to a VM, so the issue would never come up.  Maybe
> > that is not valid for your use case?
> 
> Let's not worry about MC promiscuous mode.  As I pointed out above we already 
> let VFs set any MC address filters they
> want so that horse has already left the barn.

Do you thin

Re: [PATCH net-next] tcp: tcp_tso_autosize() minimum is one packet

2015-05-26 Thread Herbert Xu
Eric Dumazet  wrote:
>
> -   if (tso_segs == 1 || !max_segs) {
> +   if (tso_segs == 1) {

What we're testing for here with max_segs is actually the question
"is TSO enabled".  So with your patch we will be taking the TSO
code path even when TSO is disabled.  Now this may or may not
trigger the original bug that I was trying to fix but it still
feels unsafe.

So please convince me that it is totally safe to take the TSO
code path with TSO disabled, e.g., when PMTU causes tso_segs
to be greater than one.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net-next] tcp: fix/cleanup inet_ehash_locks_alloc()

2015-05-26 Thread David Miller
From: Eric Dumazet 
Date: Tue, 26 May 2015 07:55:34 -0700

> From: Eric Dumazet 
> 
> If tcp ehash table is constrained to a very small number of buckets
> (eg boot parameter thash_entries=128), then we can crash if spinlock
> array has more entries.
> 
> While we are at it, un-inline inet_ehash_locks_alloc() and make
> following changes :
> 
> - Budget 2 cache lines per cpu worth of 'spinlocks'
> - Try to kmalloc() the array to avoid extra TLB pressure.
>   (Most servers at Google allocate 8192 bytes for this hash table)
> - Get rid of various #ifdef
> 
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.


Re: [PATCH net] amd-xgbe-phy: Fix initial mode when autoneg is disabled

2015-05-26 Thread David Miller
From: Tom Lendacky 
Date: Tue, 26 May 2015 09:57:33 -0500

> On 05/26/2015 09:51 AM, Tom Lendacky wrote:
>> When the ethtool command is used to set the speed of the device while
>> the device is down, the check to set the initial mode may fail when
>> the device is brought up, causing failure to bring the device up.
>>
>> Update the code to set the initial mode based on the desired speed if
>> auto-negotiation is disabled.
>>
>> This patch fixes a bug introduced by:
>> d9663c8c2149 ("amd-xgbe-phy: Use phydev advertising field vs
>> supported")
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>   drivers/net/phy/amd-xgbe-phy.c | 45
>>   +---
>>   1 file changed, 42 insertions(+), 3 deletions(-)
>>
> 
> Hi David,
> 
> Just a heads up on this patch. With the removal of this file in
> net-next
> you'll have a conflict when merging into net-next. My updates from the
> other day to net-next fix this bug in the new file/location.

Ok, applied, and thanks for the heads up about the conflict.


Re: [PATCH net-next 1/1] tipc: fix bug in link protocol message create function

2015-05-26 Thread David Miller
From: Jon Maloy 
Date: Tue, 26 May 2015 05:40:19 -0400

> In commit dd3f9e70f59f43a5712eba9cf3ee4f1e6999540c
> ("tipc: add packet sequence number at instant of transmission") we
> made a change with the consequence that packets in the link backlog
> queue don't contain valid sequence numbers.
> 
> However, when we create a link protocol message, we still use the
> sequence number of the first packet in the backlog, if there is any,
> as "next_sent" indicator in the message. This may entail unnecessary
> retransmissions or stale packet transmission when there is very low
> traffic on the link.
> 
> This commit fixes this issue by only using the current value of
> tipc_link::snd_nxt as indicator.
> 
> Signed-off-by: Jon Maloy 

Applied, thanks.


Re: pull-request: mac80211 2015-05-26

2015-05-26 Thread David Miller
From: Johannes Berg 
Date: Tue, 26 May 2015 09:25:03 +0200

> Unfortunately I neglected to send this to you last week before our long
> weekend.
> 
> These changes look fairly big, but they're fairly contained to the
> remain-on-channel and AP_VLAN key handling code.
> 
> However, if you think you don't want to pull them into net any more at
> this stage just let me know.

This looks fine, pulled, thanks Johannes.


[PATCH] sctp: Fix mangled IPv4 addresses on a IPv6 listening socket

2015-05-26 Thread Jason Gunthorpe
sctp_v4_map_v6 was subtly writing and reading from members
of a union in a way that clobbered data it needed to read before
it read it.

Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
result.

Reorder things to guarantee correct behaviour no matter what the
union layout is.

This impacts user space clients that open an IPv6 SCTP socket and
receive IPv4 connections. Prior to 299ee user space would see a
sockaddr with AF_INET and a correct address, after 299ee the sockaddr
is AF_INET6, but the address is wrong.

Fixes: 299ee123e198 (sctp: Fixup v4mapped behaviour to comply with Sock API)
Signed-off-by: Jason Gunthorpe 
---
 include/net/sctp/sctp.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

This bugfix should be a candidate for -stable

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 856f01cb51dd..230775f5952a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -571,11 +571,14 @@ static inline void sctp_v6_map_v4(union sctp_addr *addr)
 /* Map v4 address to v4-mapped v6 address */
 static inline void sctp_v4_map_v6(union sctp_addr *addr)
 {
+   __be16 port;
+
+   port = addr->v4.sin_port;
+   addr->v6.sin6_addr.s6_addr32[3] = addr->v4.sin_addr.s_addr;
+   addr->v6.sin6_port = port;
addr->v6.sin6_family = AF_INET6;
addr->v6.sin6_flowinfo = 0;
addr->v6.sin6_scope_id = 0;
-   addr->v6.sin6_port = addr->v4.sin_port;
-   addr->v6.sin6_addr.s6_addr32[3] = addr->v4.sin_addr.s_addr;
addr->v6.sin6_addr.s6_addr32[0] = 0;
addr->v6.sin6_addr.s6_addr32[1] = 0;
addr->v6.sin6_addr.s6_addr32[2] = htonl(0x0000ffff);
-- 
2.1.4
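
The clobbering described in the commit message can be reproduced in user space. The sketch below is an illustration only, not the kernel code: union demo_addr, map_buggy() and map_fixed() are hypothetical names, and it assumes the usual socket-header layout in which sin6_flowinfo occupies the same bytes as sin_addr. Unlike the patch, map_fixed() saves both the port and the v4 address in locals before touching any v6 field.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the kernel's union sctp_addr. */
union demo_addr {
	struct sockaddr_in  v4;
	struct sockaddr_in6 v6;
};

/* Nonzero when sin6_flowinfo aliases sin_addr, the overlap behind the bug. */
static int flowinfo_aliases_v4_addr(void)
{
	return offsetof(struct sockaddr_in6, sin6_flowinfo) ==
	       offsetof(struct sockaddr_in, sin_addr);
}

/* Write the ::ffff:a.b.c.d form into the v6 address bytes. */
static void fill_v4_mapped(struct in6_addr *dst, const void *v4_bytes)
{
	memset(dst->s6_addr, 0, 10);
	memset(dst->s6_addr + 10, 0xff, 2);
	memcpy(dst->s6_addr + 12, v4_bytes, 4);
}

/* Old ordering: sin6_flowinfo is zeroed before the v4 address is read. */
static void map_buggy(union demo_addr *a)
{
	a->v6.sin6_family = AF_INET6;
	a->v6.sin6_flowinfo = 0;	/* overwrites v4.sin_addr on the usual layout */
	a->v6.sin6_scope_id = 0;
	a->v6.sin6_port = a->v4.sin_port;
	fill_v4_mapped(&a->v6.sin6_addr, &a->v4.sin_addr.s_addr);
}

/* Safe ordering: read every v4 field first, then write the v6 fields. */
static void map_fixed(union demo_addr *a)
{
	in_port_t port = a->v4.sin_port;
	in_addr_t v4 = a->v4.sin_addr.s_addr;

	a->v6.sin6_family = AF_INET6;
	a->v6.sin6_flowinfo = 0;
	a->v6.sin6_scope_id = 0;
	a->v6.sin6_port = port;
	fill_v4_mapped(&a->v6.sin6_addr, &v4);
}
```

Reading the inputs into locals first keeps the result independent of how the two structures happen to overlap, which is the property the patch is after.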



[Patch net] net_sched: invoke ->attach() after setting dev->qdisc

2015-05-26 Thread Cong Wang
For the mq qdisc, we add the per-tx-queue qdiscs to the root qdisc
for display purposes. However, that happens too early, before the
new dev->qdisc is finally set, so q->list ends up pointing to the
old root qdisc, which is freed right before the new one is
assigned.

Fix this by moving ->attach() after setting dev->qdisc.

For the record, this fixes the following crash:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
 list_del corruption. prev->next should be 8800d1998ae8, but was 
6b6b6b6b6b6b6b6b
 CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0009 8800d73fb928 81a44e7f 47574756
  8800d73fb978 8800d73fb968 810790da 8800cfc4cd20
  814e725b 8800d1998ae8 82381250 
 Call Trace:
  [] dump_stack+0x4c/0x65
  [] warn_slowpath_common+0x9c/0xb6
  [] ? __list_del_entry+0x5a/0x98
  [] warn_slowpath_fmt+0x46/0x48
  [] ? dev_graft_qdisc+0x5e/0x6a
  [] __list_del_entry+0x5a/0x98
  [] list_del+0xe/0x2d
  [] qdisc_list_del+0x1e/0x20
  [] qdisc_destroy+0x30/0xd6
  [] qdisc_graft+0x11d/0x243
  [] tc_get_qdisc+0x1a6/0x1d4
  [] ? mark_lock+0x2e/0x226
  [] rtnetlink_rcv_msg+0x181/0x194
  [] ? rtnl_lock+0x17/0x19
  [] ? rtnl_lock+0x17/0x19
  [] ? __rtnl_unlock+0x17/0x17
  [] netlink_rcv_skb+0x4d/0x93
  [] rtnetlink_rcv+0x26/0x2d
  [] netlink_unicast+0xcb/0x150
  [] ? might_fault+0x59/0xa9
  [] netlink_sendmsg+0x4fa/0x51c
  [] sock_sendmsg_nosec+0x12/0x1d
  [] sock_sendmsg+0x29/0x2e
  [] ___sys_sendmsg+0x1b4/0x23a
  [] ? native_sched_clock+0x35/0x37
  [] ? sched_clock_local+0x12/0x72
  [] ? sched_clock_cpu+0x9e/0xb7
  [] ? current_kernel_time+0xe/0x32
  [] ? lock_release_holdtime.part.29+0x71/0x7f
  [] ? read_seqcount_begin.constprop.27+0x5f/0x76
  [] ? trace_hardirqs_on_caller+0x17d/0x199
  [] ? __fget_light+0x50/0x78
  [] __sys_sendmsg+0x42/0x60
  [] SyS_sendmsg+0x12/0x1c
  [] system_call_fastpath+0x12/0x6f
 ---[ end trace ef29d3fb28e97ae7 ]---

In the long term, we probably need to clean up the qdisc_graft() code
in case it hides other bugs like this.

Fixes: 95dc19299f74 ("pkt_sched: give visibility to mq slave qdiscs")
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/sch_api.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ad9eed7..1e1c89e 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -815,10 +815,8 @@ static int qdisc_graft(struct net_device *dev, struct 
Qdisc *parent,
if (dev->flags & IFF_UP)
dev_deactivate(dev);
 
-   if (new && new->ops->attach) {
-   new->ops->attach(new);
-   num_q = 0;
-   }
+   if (new && new->ops->attach)
+   goto skip;
 
for (i = 0; i < num_q; i++) {
struct netdev_queue *dev_queue = dev_ingress_queue(dev);
@@ -834,12 +832,16 @@ static int qdisc_graft(struct net_device *dev, struct 
Qdisc *parent,
qdisc_destroy(old);
}
 
+skip:
if (!ingress) {
notify_and_destroy(net, skb, n, classid,
   dev->qdisc, new);
if (new && !new->ops->attach)
atomic_inc(&new->refcnt);
dev->qdisc = new ? : &noop_qdisc;
+
+   if (new && new->ops->attach)
+   new->ops->attach(new);
} else {
notify_and_destroy(net, skb, n, classid, old, new);
}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: DSA and underlying 802.1Q encapsulation

2015-05-26 Thread Guenter Roeck
On Tue, May 26, 2015 at 06:29:57PM -0400, Vivien Didelot wrote:
> Hi,
> 
> I'm doing tests with VLAN support in DSA and I noticed that the EDSA 
> frame is prepended with a 802.1q header once queued to the underlying 
> network device, in net/dsa/tag_edsa.c:
> 
> skb->dev = p->parent->dst->master_netdev;
> dev_queue_xmit(skb);
> 
> This issue can be observed with the following dump:
> 
> curl -s http://ix.io/iIv | tcpdump -en -r -
> 
> I suspect that the DSA code must clear some VLAN flags in the skb
> structure, in order to prevent the additional encapsulation by the lower
> level. Does this make sense?
> 

Hi Vivien,

Interesting question. Does the underlying network device support VLAN HW
acceleration (NETIF_F_HW_VLAN_CTAG_RX, NETIF_F_HW_VLAN_CTAG_TX) ?

If yes, the dsa code may need to move the tag into the header.
If we are lucky, a call to vlan_hwaccel_push_inside() might do it.

Do you have some vlan dsa code to share, by any chance ? That might
save me some time, as I am looking into it as well.

Thanks,
Guenter
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next 0/3] net: Add incoming CPU mask to sockets

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 13:01 -0700, Tom Herbert wrote:

> In that case there's no guarantee that any two packets in a flow will
> hit the same CPU so there's no way to establish affinity to the
> interrupt anyway. RFS would work okay to get affinity of the soft
> processing, but there would be no point in trying to do any affinity
> with incoming cpu so this feature wouldn't help.

This is why I think this patch can hurt users.

RPS/RFS/smp_affinity/SO_REUSEPORT/cpuhotplug are only hints, that would
never break TCP session establishment, even with hash collisions and
sockets being added/deleted to SO_REUSEPORT pools.

It works in _all_ situations. Even the crazy/stupid setups.

Your patch is breaking this rule, without any clear documentation on how
to make sure everything is properly setup.

I am not sure my tcp listener stuff will be finished for 4.2, because I
had to spend a lot of time on more urgent stuff lately, like reviewing
patches ;)

I would prefer your patch is added for 4.3, not before.




[net-next PATCH RFC 2/3] xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input

2015-05-26 Thread Alexander Duyck
This change makes it so that if a tunnel is defined we just use the mark
from the tunnel instead of the mark from the skb header.  By doing this we
can avoid the need to set skb->mark inside of the tunnel receive functions.

Signed-off-by: Alexander Duyck 
---
 net/xfrm/xfrm_input.c |   17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 526c4feb3b50..b58286ecd156 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 static struct kmem_cache *secpath_cachep __read_mostly;
 
@@ -186,6 +188,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
struct xfrm_state *x = NULL;
xfrm_address_t *daddr;
struct xfrm_mode *inner_mode;
+   u32 mark = skb->mark;
unsigned int family;
int decaps = 0;
int async = 0;
@@ -203,6 +206,18 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
   XFRM_SPI_SKB_CB(skb)->daddroff);
family = XFRM_SPI_SKB_CB(skb)->family;
 
+   /* if tunnel is present override skb->mark value with tunnel i_key */
+   if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4) {
+   switch (family) {
+   case AF_INET:
+   mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4->parms.i_key);
+   break;
+   case AF_INET6:
+   mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6->parms.i_key);
+   break;
+   }
+   }
+
/* Allocate new secpath or COW existing one. */
if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) {
struct sec_path *sp;
@@ -229,7 +244,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
goto drop;
}
 
-   x = xfrm_state_lookup(net, skb->mark, daddr, spi, nexthdr, family);
+   x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
if (x == NULL) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
xfrm_audit_state_notfound(skb, family, spi, seq);
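
The mark selection done by this patch can be modelled in plain userspace C.
This is only a sketch under stated assumptions: struct tunnel_parms, struct
packet and lookup_mark() are hypothetical stand-ins for the kernel
structures, and ntohl() stands in for be32_to_cpu().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct tunnel_parms {
	uint32_t i_key;			/* input key, network byte order */
};

struct packet {
	uint32_t mark;			/* plays the role of skb->mark */
	const struct tunnel_parms *tunnel;	/* NULL when no tunnel is attached */
};

/*
 * Pick the mark used for the state lookup: the tunnel input key when a
 * tunnel is attached, otherwise the packet's own mark. The packet itself
 * is never modified.
 */
static uint32_t lookup_mark(const struct packet *pkt)
{
	assert(pkt != NULL);

	if (pkt->tunnel)
		return ntohl(pkt->tunnel->i_key);
	return pkt->mark;
}
```

Because the override lives in a local value, the packet's mark is still the
original one after the lookup.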



[net-next PATCH RFC 0/3] Preserve skb->mark through VTI tunnels

2015-05-26 Thread Alexander Duyck
These patches are meant to address the fact that VTI tunnels are
currently overwriting the skb->mark value.  I am generally happy with the
first two patches; however, the third patch still modifies the skb->mark,
though it restores the original value afterwards.

The main problem I am trying to address is that currently, if I use
a v6 over v6 VTI tunnel, I cannot receive any traffic on the interface, as
the skb->mark is bleeding through and causing the traffic to be dropped.

---

Alexander Duyck (3):
  ip_vti/ip6_vti: Do not touch skb->mark on xmit
  xfrm: Override skb->mark with tunnel->parm.i_key in xfrm_input
  ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call


 net/ipv4/ip_vti.c |9 ++---
 net/ipv6/ip6_vti.c|8 ++--
 net/xfrm/xfrm_input.c |   17 -
 3 files changed, 28 insertions(+), 6 deletions(-)

--


[net-next PATCH RFC 1/3] ip_vti/ip6_vti: Do not touch skb->mark on xmit

2015-05-26 Thread Alexander Duyck
Instead of modifying skb->mark we can simply modify the flowi_mark that is
generated as a result of the xfrm_decode_session.  By doing this we don't
need to actually touch the skb->mark and it can be preserved as it passes
out through the tunnel.

Signed-off-by: Alexander Duyck 
---
 net/ipv4/ip_vti.c  |5 +++--
 net/ipv6/ip6_vti.c |4 +++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 9f7269f3c54a..4c318e1c13c8 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -216,8 +216,6 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 
memset(&fl, 0, sizeof(fl));
 
-   skb->mark = be32_to_cpu(tunnel->parms.o_key);
-
switch (skb->protocol) {
case htons(ETH_P_IP):
xfrm_decode_session(skb, &fl, AF_INET);
@@ -233,6 +231,9 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
 
+   /* override mark with tunnel output key */
+   fl.flowi_mark = be32_to_cpu(tunnel->parms.o_key);
+
return vti_xmit(skb, dev, &fl);
 }
 
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ed9d681207fa..104de4da3ff3 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -495,7 +495,6 @@ vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
int ret;
 
memset(&fl, 0, sizeof(fl));
-   skb->mark = be32_to_cpu(t->parms.o_key);
 
switch (skb->protocol) {
case htons(ETH_P_IPV6):
@@ -516,6 +515,9 @@ vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
goto tx_err;
}
 
+   /* override mark with tunnel output key */
+   fl.flowi_mark = be32_to_cpu(t->parms.o_key);
+
ret = vti6_xmit(skb, dev, &fl);
if (ret < 0)
goto tx_err;
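
The transmit-side change can be sketched as follows. This is a userspace
model, not the kernel code: struct packet, struct flow, decode_session() and
build_xmit_flow() are hypothetical names, and the assumption is that session
decoding copies the packet mark into the flow before the tunnel output key
overrides it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: only the mark fields matter for this sketch. */
struct packet {
	uint32_t mark;		/* plays the role of skb->mark */
};

struct flow {
	uint32_t mark;		/* plays the role of fl.flowi_mark */
};

/* Assumed behaviour of session decoding: the flow inherits the packet mark. */
static void decode_session(const struct packet *pkt, struct flow *fl)
{
	fl->mark = pkt->mark;
}

/*
 * Build the flow used for the outgoing lookup. The tunnel output key
 * (network byte order) replaces the flow mark; the packet is read-only
 * here, so its mark survives the trip through the tunnel.
 */
static struct flow build_xmit_flow(const struct packet *pkt, uint32_t o_key_be)
{
	struct flow fl;

	assert(pkt != NULL);
	memset(&fl, 0, sizeof(fl));
	decode_session(pkt, &fl);
	fl.mark = ntohl(o_key_be);	/* override mark with tunnel output key */
	return fl;
}
```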



[net-next PATCH RFC 3/3] ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call

2015-05-26 Thread Alexander Duyck
The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb->mark modified
after completing the function.  This resulted in the original skb->mark
value being lost.  Since we only need skb->mark to be set for
xfrm_policy_check we can pull the assignment into the rcv_cb calls and then
just restore the original mark after xfrm_policy_check has been completed.

Signed-off-by: Alexander Duyck 
---
 net/ipv4/ip_vti.c  |9 +++--
 net/ipv6/ip6_vti.c |9 +++--
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 4c318e1c13c8..0c152087ca15 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -65,7 +65,6 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
goto drop;
 
XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = tunnel;
-   skb->mark = be32_to_cpu(tunnel->parms.i_key);
 
return xfrm_input(skb, nexthdr, spi, encap_type);
}
@@ -91,6 +90,8 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
struct pcpu_sw_netstats *tstats;
struct xfrm_state *x;
struct ip_tunnel *tunnel = XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4;
+   u32 orig_mark = skb->mark;
+   int ret;
 
if (!tunnel)
return 1;
@@ -107,7 +108,11 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
x = xfrm_input_state(skb);
family = x->inner_mode->afinfo->family;
 
-   if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family))
+   skb->mark = be32_to_cpu(tunnel->parms.i_key);
+   ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family);
+   skb->mark = orig_mark;
+
+   if (!ret)
return -EPERM;
 
skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(skb->dev)));
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 104de4da3ff3..ff3bd863fa03 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -322,7 +322,6 @@ static int vti6_rcv(struct sk_buff *skb)
}
 
XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
-   skb->mark = be32_to_cpu(t->parms.i_key);
 
rcu_read_unlock();
 
@@ -342,6 +341,8 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
struct pcpu_sw_netstats *tstats;
struct xfrm_state *x;
struct ip6_tnl *t = XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6;
+   u32 orig_mark = skb->mark;
+   int ret;
 
if (!t)
return 1;
@@ -358,7 +359,11 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
x = xfrm_input_state(skb);
family = x->inner_mode->afinfo->family;
 
-   if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family))
+   skb->mark = be32_to_cpu(t->parms.i_key);
+   ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family);
+   skb->mark = orig_mark;
+
+   if (!ret)
return -EPERM;
 
skb_scrub_packet(skb, !net_eq(t->net, dev_net(skb->dev)));
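
The save/override/restore pattern used around xfrm_policy_check in both
rcv_cb functions can be sketched in userspace C. The names below
(struct packet, policy_check_fn, check_with_tunnel_mark, accept_mark_42) are
hypothetical and only illustrate the pattern.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct packet {
	uint32_t mark;		/* plays the role of skb->mark */
};

/* Hypothetical policy check: returns non-zero when the packet is accepted. */
typedef int (*policy_check_fn)(const struct packet *pkt);

/* Example policy for the sketch: accept only packets marked 42. */
static int accept_mark_42(const struct packet *pkt)
{
	return pkt->mark == 42;
}

/*
 * Run the policy check with the tunnel input key as the mark, then put
 * the original mark back so later processing sees it unchanged.
 */
static int check_with_tunnel_mark(struct packet *pkt, uint32_t i_key_be,
				  policy_check_fn check)
{
	uint32_t orig_mark;
	int ret;

	assert(pkt != NULL && check != NULL);
	orig_mark = pkt->mark;
	pkt->mark = ntohl(i_key_be);
	ret = check(pkt);
	pkt->mark = orig_mark;	/* restored whether or not the check passed */
	return ret;
}
```

The restore happens before the result is examined, so the failure path
also leaves the original mark in place.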



DSA and underlying 802.1Q encapsulation

2015-05-26 Thread Vivien Didelot
Hi,

I'm doing tests with VLAN support in DSA and I noticed that the EDSA
frame is prepended with an 802.1Q header once queued to the underlying
network device, in net/dsa/tag_edsa.c:

skb->dev = p->parent->dst->master_netdev;
dev_queue_xmit(skb);

This issue can be observed with the following dump:

curl -s http://ix.io/iIv | tcpdump -en -r -

I suspect that the DSA code must clear some VLAN flags in the skb
structure, in order to prevent the additional encapsulation by the lower
level. Does this make sense?

Thanks,
-v


[PATCH 2/2] cgroup: report cgroup release event to proc connector

2015-05-26 Thread Dimitri John Ledkov
This adds a call to proc_cgrelease_connector at check_for_release
time. It is done when cgroup becomes dead, regardless of the
notify_on_release status.

This is thus compatible with both current & unified cgroups hierarchy,
and the decision which cgroups to emit events for is offloaded to the
proc connector API.

Specifically, if there are no listeners, no events are emitted. If
only certain events are desired, the userspace proc connector listener
can filter them in userspace or install a BPF filter on the socket to
ignore events it doesn't care about.

Signed-off-by: Dimitri John Ledkov 
---
 kernel/cgroup.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 469dd54..c52e584 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,6 +57,7 @@
 #include  /* TODO: replace with more sophisticated array */
 #include 
 #include 
+#include 
 
 #include 
 
@@ -5307,9 +5308,12 @@ void cgroup_exit(struct task_struct *tsk)
 
 static void check_for_release(struct cgroup *cgrp)
 {
-   if (notify_on_release(cgrp) && !cgroup_has_tasks(cgrp) &&
-   !css_has_online_children(&cgrp->self) && !cgroup_is_dead(cgrp))
-   schedule_work(&cgrp->release_agent_work);
+   if (!cgroup_has_tasks(cgrp) &&
+   !css_has_online_children(&cgrp->self) && !cgroup_is_dead(cgrp)) {
+   proc_cgrelease_connector(cgrp);
+   if (notify_on_release(cgrp))
+   schedule_work(&cgrp->release_agent_work);
+   }
 }
 
 /*
-- 
2.1.4
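
The new decision logic in check_for_release can be modelled as a small pure
function. This sketch follows the condition in the diff above; struct
cgroup_model, the ACT_* flags and release_actions() are hypothetical names
introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state check_for_release looks at. */
struct cgroup_model {
	bool has_tasks;
	bool has_online_children;
	bool is_dead;
	bool notify_on_release;
};

enum {
	ACT_NONE            = 0,
	ACT_CONNECTOR_EVENT = 1,	/* proc connector event would be sent */
	ACT_RELEASE_AGENT   = 2		/* release agent work would be scheduled */
};

/*
 * The connector event depends only on the cgroup being empty; the release
 * agent is additionally gated on notify_on_release, as before the patch.
 */
static unsigned int release_actions(const struct cgroup_model *cg)
{
	unsigned int actions = ACT_NONE;

	assert(cg != NULL);
	if (!cg->has_tasks && !cg->has_online_children && !cg->is_dead) {
		actions |= ACT_CONNECTOR_EVENT;
		if (cg->notify_on_release)
			actions |= ACT_RELEASE_AGENT;
	}
	return actions;
}
```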



[PATCH 1/2] connector: add cgroup release event report to proc connector

2015-05-26 Thread Dimitri John Ledkov
Add a kernel API to send a proc connector notification that a cgroup
has become empty. A userspace daemon can then act upon such
information, and usually clean up and remove such a group as it's no
longer needed.

Currently there are two other ways (one for current & one for unified
cgroups) to receive such notifications, but they either involve
spawning a userspace helper or monitoring a lot of files. This is
instead a firehose of all such events from a single place.

In the current cgroups structure the way to get notifications is by
enabling `release_agent' and setting `notify_on_release' for a given
cgroup hierarchy. This will then spawn a userspace helper with the
removed cgroup as an argument. It has been acknowledged that this is
expensive, especially in exit-heavy workloads. In userspace this
is currently used by systemd and CGmanager that I know of; both
agents establish a connection to the long-running daemon and pass the
message to it. As a courtesy to other processes, such an event is
sometimes forwarded further on, e.g. systemd forwards it to the system
DBus.

In the future/unified cgroups structure support for `release_agent' is
removed, without a direct replacement. However, there is a new
`cgroup.populated' file exposed that recursively reports if there are
any tasks in a given cgroup hierarchy. It's a very good flag to
quickly/lazily scan for empty things; however, one would need to
establish an inotify watch on each and every cgroup.populated file at
cgroup setup time (ideally before any pids enter said cgroup). Thus
again nobody else, but the original creator of a given cgroup, has a
chance to reliably monitor a cgroup becoming empty (since there is no
reliable recursive inotify watch).

Hence, the addition to the proc connector firehose. Multiple things
(albeit with CAP_NET_ADMIN in the init pid/user namespace) could
connect and monitor cgroup release notifications. In a way, this
repeats udev history: at first it was a userspace helper, which later
became a netlink socket. And I hope that proc connector is a
naturally good fit for this notification type.

For precisely when cgroups should emit this event, see next patch
against kernel/cgroup.c.

Signed-off-by: Dimitri John Ledkov 
---
 drivers/connector/cn_proc.c  | 56 
 include/linux/cn_proc.h  |  6 +
 include/uapi/linux/cn_proc.h |  8 ++-
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 15d06fc..ef71bd9 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -244,6 +245,61 @@ void proc_comm_connector(struct task_struct *task)
cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_KERNEL);
 }
 
+void proc_cgrelease_connector(struct cgroup *cgrp)
+{
+   struct cn_msg *msg;
+   struct proc_event *ev;
+   char *path_buffer, *path;
+   __u8 *buffer;
+   __u8 length;
+
+   if (atomic_read(&proc_event_num_listeners) < 1)
+   return;
+
+   /* ENOMEM */
+   path_buffer = kmalloc(PATH_MAX, GFP_KERNEL);
+   if (!path_buffer)
+   return;
+
+   /* ENAMETOOLONG */
+   path = cgroup_path(cgrp, path_buffer, PATH_MAX);
+   if (!path)
+   goto out_path_buffer;
+
+   length = strlen(path);
+
+   buffer = kmalloc(CN_PROC_MSG_SIZE + length, GFP_KERNEL);
+   if (!buffer)
+   goto out_path_buffer;
+
+   msg = buffer_to_cn_msg(buffer);
+   ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data) + length);
+   get_seq(&msg->seq, &ev->cpu);
+   ev->timestamp_ns = ktime_get_ns();
+   ev->what = PROC_EVENT_CGROUP_RELEASE;
+
+   /* If MAX_CGROUP_ROOT_NAMELEN is ever increased,
+* ./include/uapi/linux/cn_proc.h will need an update */
+   BUILD_BUG_ON(MAX_CGROUP_ROOT_NAMELEN > 64);
+   memcpy(ev->event_data.cgroup_release.cgroup_root,
+  cgrp->root->name,
+  MAX_CGROUP_ROOT_NAMELEN);
+
+   memcpy(ev->event_data.cgroup_release.cgroup_path, path, length);
+
+   memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+   msg->ack = 0; /* not used */
+   msg->len = sizeof(*ev) + length;
+   msg->flags = 0; /* not used */
+   cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_KERNEL);
+
+   kfree(buffer);
+
+out_path_buffer:
+   kfree(path_buffer);
+}
+
 void proc_coredump_connector(struct task_struct *task)
 {
struct cn_msg *msg;
diff --git a/include/linux/cn_proc.h b/include/linux/cn_proc.h
index 1d5b02a..cf7cc56 100644
--- a/include/linux/cn_proc.h
+++ b/include/linux/cn_proc.h
@@ -19,6 +19,8 @@
 
 #include 
 
+struct cgroup;
+
 #ifdef CONFIG_PROC_EVENTS
 void proc_fork_connector(struct task_struct *task);
 void proc_exec_connector(struct task_struct *task);
@@ -28,6 +30,7 @@ void proc_ptrace_connector(struct t

Re: [PATCH net] openvswitch: disable LRO unless stated otherwise

2015-05-26 Thread Pravin Shelar
On Tue, May 26, 2015 at 10:38 AM, Jiri Benc  wrote:
> Currently, openvswitch tries to disable LRO from the user space. This does
> not work correctly when the device added is a vlan interface, though.
> Instead of dealing with possibly complex stacked cross name space relations
> in the user space, do the same as bridging does and call dev_disable_lro in
> the kernel.
>
> As there are use cases of openvswitch setup that can keep LRO enabled and
> there's a planned feature to optimize such use cases (and stop disabling LRO
> unconditionally from the user space daemon), allow the user space to
> override this when adding the interface.
>

An OVS interface for a generic networking device operation looks odd. Have
you considered adding a new device ioctl to do this?

> Signed-off-by: Jiri Benc 
> ---
>  include/uapi/linux/openvswitch.h | 9 +
>  net/openvswitch/vport-netdev.c   | 3 +++
>  2 files changed, 12 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index bbd49a0c46c7..3832953a4f27 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -252,6 +252,15 @@ enum ovs_vport_attr {
>
>  #define OVS_VPORT_ATTR_MAX (__OVS_VPORT_ATTR_MAX - 1)
>
> +/* OVS_VPORT_ATTR_OPTIONS attributes for netdev vports.
> + */
> +enum {
> +   OVS_NETDEV_ATTR_KEEP_LRO, /* flag */
> +   __OVS_NETDEV_ATTR_MAX
> +};
> +
> +#define OVS_NETDEV_ATTR_MAX (__OVS_NETDEV_ATTR_MAX - 1)
> +
>  enum {
> OVS_VXLAN_EXT_UNSPEC,
> OVS_VXLAN_EXT_GBP,  /* Flag or __u32 */


Re: xen-netfront crash when detaching network while some network activity

2015-05-26 Thread Marek Marczykowski-Górecki
On Tue, May 26, 2015 at 11:56:00AM +0100, David Vrabel wrote:
> On 22/05/15 12:49, Marek Marczykowski-Górecki wrote:
> > Hi all,
> > 
> > I'm experiencing xen-netfront crash when doing xl network-detach while
> > some network activity is going on at the same time. It happens only when
> > domU has more than one vcpu. Not sure if this matters, but the backend
> > is in another domU (not dom0). I'm using Xen 4.2.2. It happens on kernel
> > 3.9.4 and 4.1-rc1 as well.
> > 
> > Steps to reproduce:
> > 1. Start the domU with some network interface
> > 2. Call there 'ping -f some-IP'
> > 3. Call 'xl network-detach NAME 0'
> 
> There's a use-after-free in xennet_remove().  Does this patch fix it?

Unfortunately not. Note that the crash is in xennet_disconnect_backend,
which is called before xennet_destroy_queues in xennet_remove.
I've tried adding napi_disable and even netif_napi_del just after
napi_synchronize in xennet_disconnect_backend (which would probably
cause a crash when the same cleanup is attempted again later), but it
doesn't help - the crash is the same (still in gnttab_end_foreign_access
called from xennet_disconnect_backend).


> 8<
> xen-netfront: properly destroy queues when removing device
> 
> xennet_remove() freed the queues before freeing the netdevice which
> results in a use-after-free when free_netdev() tries to delete the
> napi instances that have already been freed.
> 
> Fix this by fully destroy the queues (which includes deleting the napi
> instances) before freeing the netdevice.
> 
> Reported-by: Marek Marczykowski 
> Signed-off-by: David Vrabel 
> ---
>  drivers/net/xen-netfront.c |   15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 3f45afd..e031c94 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1698,6 +1698,7 @@ static void xennet_destroy_queues(struct netfront_info 
> *info)
>  
>   if (netif_running(info->netdev))
>   napi_disable(&queue->napi);
> + del_timer_sync(&queue->rx_refill_timer);
>   netif_napi_del(&queue->napi);
>   }
>  
> @@ -2102,9 +2103,6 @@ static const struct attribute_group xennet_dev_group = {
>  static int xennet_remove(struct xenbus_device *dev)
>  {
>   struct netfront_info *info = dev_get_drvdata(&dev->dev);
> - unsigned int num_queues = info->netdev->real_num_tx_queues;
> - struct netfront_queue *queue = NULL;
> - unsigned int i = 0;
>  
>   dev_dbg(&dev->dev, "%s\n", dev->nodename);
>  
> @@ -2112,16 +2110,7 @@ static int xennet_remove(struct xenbus_device *dev)
>  
>   unregister_netdev(info->netdev);
>  
> - for (i = 0; i < num_queues; ++i) {
> - queue = &info->queues[i];
> - del_timer_sync(&queue->rx_refill_timer);
> - }
> -
> - if (num_queues) {
> - kfree(info->queues);
> - info->queues = NULL;
> - }
> -
> + xennet_destroy_queues(info);
>   xennet_free_netdev(info->netdev);
>  
>   return 0;

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?




Re: [PATCH net] mpls: fix mpls route deletes to not check for route scope and type

2015-05-26 Thread roopa

On 5/26/15, 2:48 PM, Eric W. Biederman wrote:
> Roopa Prabhu writes:
>
>> From: Roopa Prabhu
>>
>> This patch fixes incorrect -EINVAL error due to invalid
>> scope and type for mpls route deletes.
>
> Well, this is embarrassing; apparently I did not exercise this code path
> in iproute.
>
> Looking through my tests the closest I came was:
> ip -M route flush table all
>
>> iproute2 route modify code does not set protocol/scope/type
>> for RTM_DELROUTE msgs. mpls code can skip checking for
>> these too.
>
> I am really not certain that is the case.  I expect if you check
> you will find that rtm_scope is set to 0  aka RT_SCOPE_UNIVERSE.
>
> For scope I don't much care.  The mpls concepts and the ip concepts
> don't match.  With mpls, packets can be sent from anywhere in the
> universe to an address that is valid only on one link.
>
> For rtm_type I think we do care.  IPv4 and IPv6 are a disaster when it
> comes to interfaces for setting up multicast routes, and I don't see any
> reason why we would need to replicate that disaster for mpls.
>
> As such I would like rtm_type to actually mean something, as for mpls
> the lookup for multicast packets and the lookup for unicast packets is
> completely different.  Unicast packet addresses are defined by the
> receiver, and multicast packet addresses are defined by the sender.
>
> So can we instead fix iproute to set rtm_type == RTN_UNICAST?
> At least for mpls.


Yes, sure. I started by handling this in iproute2, so I do have an
iproute2 patch for this.

Will post it later today.

thanks.


[PATCH net-next 1/1] hv_netvsc: Properly size the vrss queues

2015-05-26 Thread K. Y. Srinivasan
The current algorithm for deciding on the number of VRSS channels is
not optimal since we open up the min of number of CPUs online and the
number of VRSS channels the host is offering. So on a 32 VCPU guest
we could potentially open 32 VRSS subchannels. Experimentation has
shown that it is best to limit the number of VRSS channels to the number
of CPUs within a NUMA node. As part of this work introduce a module
parameter to control the number of sub-channels we would open up as well.
Here is the new algorithm for deciding on the number of sub-channels we
would open up:
1) Pick the minimum of what the host is offering and what the driver
   in the guest is specifying via the module parameter.
2) Pick the minimum of (1) and the number of CPUs in the NUMA
   node the primary channel is bound to.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h   |1 +
 drivers/net/hyperv/netvsc_drv.c   |6 ++
 drivers/net/hyperv/rndis_filter.c |   16 ++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index ddcc7f8..dd45440 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -161,6 +161,7 @@ struct netvsc_device_info {
unsigned char mac_adr[ETH_ALEN];
bool link_state;/* 0 - link up, 1 - link down */
int  ring_size;
+   u32  max_num_vrss_chns;
 };
 
 enum rndis_device_state {
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index d9c88bc..feb94e2 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -46,6 +46,10 @@ static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+static int max_num_vrss_chns = 8;
+module_param(max_num_vrss_chns, int, S_IRUGO);
+MODULE_PARM_DESC(max_num_vrss_chns, "Maximum VRSS channels we would open by default");
+
 static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
NETIF_MSG_LINK | NETIF_MSG_IFUP |
NETIF_MSG_IFDOWN | NETIF_MSG_RX_ERR |
@@ -755,6 +759,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
ndevctx->device_ctx = hdev;
hv_set_drvdata(hdev, ndev);
device_info.ring_size = ring_size;
+   device_info.max_num_vrss_chns = max_num_vrss_chns;
rndis_filter_device_add(hdev, &device_info);
netif_tx_wake_all_queues(ndev);
 
@@ -975,6 +980,7 @@ static int netvsc_probe(struct hv_device *dev,
 
/* Notify the netvsc driver of the new device */
device_info.ring_size = ring_size;
+   device_info.max_num_vrss_chns = max_num_vrss_chns;
ret = rndis_filter_device_add(dev, &device_info);
if (ret != 0) {
netdev_err(net, "unable to add netvsc device (ret %d)\n", ret);
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 9118cea..006c1b8 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1013,6 +1013,9 @@ int rndis_filter_device_add(struct hv_device *dev,
struct ndis_recv_scale_cap rsscap;
u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
u32 mtu, size;
+   u32 num_rss_qs;
+   const struct cpumask *node_cpu_mask;
+   u32 num_possible_rss_qs;
 
rndis_device = get_rndis_device();
if (!rndis_device)
@@ -1100,9 +1103,18 @@ int rndis_filter_device_add(struct hv_device *dev,
if (ret || rsscap.num_recv_que < 2)
goto out;
 
+   num_rss_qs = min(device_info->max_num_vrss_chns, rsscap.num_recv_que);
+
net_device->max_chn = rsscap.num_recv_que;
-   net_device->num_chn = (num_online_cpus() < rsscap.num_recv_que) ?
-  num_online_cpus() : rsscap.num_recv_que;
+
+   /*
+* We will limit the VRSS channels to the number CPUs in the NUMA node
+* the primary channel is currently bound to.
+*/
+   node_cpu_mask = cpumask_of_node(cpu_to_node(dev->channel->target_cpu));
+   num_possible_rss_qs = cpumask_weight(node_cpu_mask);
+   net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
+
if (net_device->num_chn == 1)
goto out;
 
-- 
1.7.4.1
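
The two-step selection from the commit message can be written as a small
pure function. This is a sketch rather than the driver code: min_u32() and
pick_num_chn() are hypothetical names, and the three inputs stand for
rsscap.num_recv_que, the max_num_vrss_chns module parameter and the CPU
count of the primary channel's NUMA node.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two unsigned values (stand-in for the kernel's min()). */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * host_offered:  channels the host is offering
 * module_limit:  limit requested via the module parameter
 * cpus_in_node:  CPUs in the NUMA node the primary channel is bound to
 */
static uint32_t pick_num_chn(uint32_t host_offered, uint32_t module_limit,
			     uint32_t cpus_in_node)
{
	/* Step 1: minimum of the host offer and the module parameter. */
	uint32_t num_rss_qs = min_u32(module_limit, host_offered);
	/* Step 2: minimum of step 1 and the CPUs in the NUMA node. */
	uint32_t num_chn = min_u32(cpus_in_node, num_rss_qs);

	assert(num_chn <= host_offered);
	return num_chn;
}
```

For example, a host offering 32 channels with the default limit of 8 and
16 CPUs in the node yields 8 channels; with only 2 CPUs in the node it
yields 2.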



Re: [PATCH net] mpls: fix mpls route deletes to not check for route scope and type

2015-05-26 Thread Eric W. Biederman
Roopa Prabhu  writes:

> From: Roopa Prabhu 
>
> This patch fixes incorrect -EINVAL error due to invalid
> scope and type for mpls route deletes.

Well, this is embarrassing; apparently I did not exercise this code path
in iproute.

Looking through my tests the closest I came was:
ip -M route flush table all

> iproute2 route modify code does not set protocol/scope/type
> for RTM_DELROUTE msgs. mpls code can skip checking for
> these too.

I am really not certain that is the case.  I expect if you check
you will find that rtm_scope is set to 0  aka RT_SCOPE_UNIVERSE.

For scope I don't much care.  The mpls concepts and the ip concepts
don't match.  With mpls, packets can be sent from anywhere in the
universe to an address that is valid only on one link.

For rtm_type I think we do care.  IPv4 and IPv6 are a disaster when it
comes to interfaces for setting up multicast routes, and I don't see any
reason why we would need to replicate that disaster for mpls.

As such I would like rtm_type to actually mean something, as for mpls
the lookup for multicast packets and the lookup for unicast packets is
completely different.  Unicast packet addresses are defined by the
receiver, and multicast packet addresses are defined by the sender.

So can we instead fix iproute to set rtm_type == RTN_UNICAST?
At least for mpls.

Eric

> $ip -f mpls route add 100 as 200 via inet 10.1.1.2 dev swp1
>
> $ip -f mpls route show
> 100 as to 200 via inet 10.1.1.2 dev swp1
>
> $ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1
> RTNETLINK answers: Invalid argument
>
> $ip -f mpls route del 100
> RTNETLINK answers: Invalid argument
>
> After patch:
>
> $ip -f mpls route show
> 100 as to 200 via inet 10.1.1.2 dev swp1
>
> $ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1
>
> $ip -f mpls route show
>
> Reported-by: Vivek Venkataraman 
> Suggested-by: Vivek Venkataraman 
> Signed-off-by: Vivek Venkataraman 
> Signed-off-by: Roopa Prabhu 
> ---
>  net/mpls/af_mpls.c |   11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 7b3f732..18ab7bf 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -693,10 +693,13 @@ static int rtm_to_route_config(struct sk_buff *skb,  
> struct nlmsghdr *nlh,
>* (or source specific address in the case of multicast)
>* all addresses have universal scope.
>*/
> - if (rtm->rtm_scope != RT_SCOPE_UNIVERSE)
> - goto errout;
> - if (rtm->rtm_type != RTN_UNICAST)
> - goto errout;
> + if (nlh->nlmsg_type != RTM_DELROUTE) {
> + if (rtm->rtm_scope != RT_SCOPE_UNIVERSE)
> + goto errout;
> + if (rtm->rtm_type != RTN_UNICAST)
> + goto errout;
> + }
> +
>   if (rtm->rtm_flags != 0)
>   goto errout;
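
The validation in the quoted patch can be sketched as a standalone
predicate over the rtnetlink header. This is a userspace illustration
using the uapi definitions from <linux/rtnetlink.h>; route_config_ok() is
a hypothetical name, and only the three checks visible in the diff are
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/*
 * Mirror of the patched checks: scope and type are only validated for
 * messages other than RTM_DELROUTE, while rtm_flags must always be zero.
 */
static bool route_config_ok(unsigned short nlmsg_type, const struct rtmsg *rtm)
{
	assert(rtm != NULL);

	if (nlmsg_type != RTM_DELROUTE) {
		if (rtm->rtm_scope != RT_SCOPE_UNIVERSE)
			return false;
		if (rtm->rtm_type != RTN_UNICAST)
			return false;
	}

	if (rtm->rtm_flags != 0)
		return false;

	return true;
}
```

With this logic a delete carrying rtm_type == RTN_UNSPEC is accepted,
while the same header on RTM_NEWROUTE is rejected. The alternative
suggested in this reply would keep the type check for deletes and have
iproute set RTN_UNICAST instead.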


Re: [PATCH net-next v5 00/11] ipv6: Only create RTF_CACHE route after encountering pmtu exception

2015-05-26 Thread Martin KaFai Lau
On Tue, May 26, 2015 at 11:20:53PM +0200, Hannes Frederic Sowa wrote:
> I also went over the changes to the last version and such, albeit a bit
> late:
> Reviewed-by: Hannes Frederic Sowa 
Thanks for your help and review, Hannes!

--Martin


Re: [PATCH v2 net-next] openvswitch: include datapath actions with sampled-packet upcall to userspace

2015-05-26 Thread Pravin Shelar
On Fri, May 22, 2015 at 4:21 PM, Neil McKee  wrote:
> If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
> OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
> in the upcall.
>
> This Directly associates the sampled packet with the path it takes
> through the virtual switch. Path information currently includes mangling,
> encapsulation and decapsulation actions for tunneling protocols GRE,
> VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
> changes to accommodate datapath actions that may be added in the
> future.
>
> Adding path information enhances visibility into complex virtual
> networks.
>
> Signed-off-by: Neil McKee 
> ---
>  include/uapi/linux/openvswitch.h |  4 
>  net/openvswitch/actions.c| 21 -
>  net/openvswitch/datapath.c   | 18 ++
>  net/openvswitch/datapath.h   |  2 ++
>  4 files changed, 40 insertions(+), 5 deletions(-)
>

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index b491c1c..b3f9b89 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -608,7 +608,8 @@ static void do_output(struct datapath *dp, struct sk_buff 
> *skb, int out_port)
>  }
>
>  static int output_userspace(struct datapath *dp, struct sk_buff *skb,
> -   struct sw_flow_key *key, const struct nlattr 
> *attr)
> +   struct sw_flow_key *key, const struct nlattr 
> *attr,
> +   const struct nlattr *actions, int actions_len)
>  {
> struct ovs_tunnel_info info;
> struct dp_upcall_info upcall;
> @@ -619,6 +620,8 @@ static int output_userspace(struct datapath *dp, struct 
> sk_buff *skb,
> upcall.userdata = NULL;
> upcall.portid = 0;
> upcall.egress_tun_info = NULL;
> +   upcall.actions = NULL;
> +   upcall.actions_len = 0;
>
At this point it's better to just memset this structure.

> for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>  a = nla_next(a, &rem)) {
> @@ -647,6 +650,13 @@ static int output_userspace(struct datapath *dp, struct 
> sk_buff *skb,
...

> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 5bae724..e735c8f 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -277,6 +277,8 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct 
> sw_flow_key *key)
> upcall.userdata = NULL;
> upcall.portid = ovs_vport_find_upcall_portid(p, skb);
> upcall.egress_tun_info = NULL;
> +   upcall.actions = NULL;
> +   upcall.actions_len = 0;

Here also memset should be used.

> error = ovs_dp_upcall(dp, skb, key, &upcall);
> if (unlikely(error))
> kfree_skb(skb);

...
> @@ -479,6 +485,18 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
> nla_nest_end(user_skb, nla);
> }
>
> +   if (upcall_info->actions_len) {
> +   nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_ACTIONS);
> +   err = ovs_nla_put_actions(upcall_info->actions,
> + upcall_info->actions_len,
> + user_skb);
> +   if (!err) {
> +   nla_nest_end(user_skb, nla);
> +   } else {
> +   nla_nest_cancel(user_skb, nla);
> +   }
> +   }
> +
There is no need for curly brackets around an if-else block with
single statements.


Re: [PATCH v3] net/unix: sk_socket can disappear when state is unlocked

2015-05-26 Thread Hannes Frederic Sowa
Hi,

On Tue, May 26, 2015, at 17:22, Mark Salyzyn wrote:
> got a rare NULL pointer dereference in clear_bit
> 
> Signed-off-by: Mark Salyzyn 

IMHO, this is the right approach. I couldn't come up with anything
simpler, thanks!

Acked-by: Hannes Frederic Sowa 


Re: [PATCH] Remove unused functions from the driver file, bcmgenet.c

2015-05-26 Thread Florian Fainelli
+Petri,

On 26/05/15 09:28, Nicholas Krause wrote:
> This removes the unused function bcmgenet_hfb_add_filter and
> the filter helper functions used within it, since they either have
> no callers or their only caller is now removed from the file
> bcmgenet.c along with bcmgenet_hfb_add_filter.

I am fairly sure Petri has pending changes that will utilize this
function, so if we can keep the code around for a while, that would help.

> 
> Signed-off-by: Nicholas Krause 
> ---
>  drivers/net/ethernet/broadcom/genet/bcmgenet.c | 122 
> -
>  1 file changed, 122 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 6043734..0d5dea9 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -2451,128 +2451,6 @@ static void bcmgenet_enable_dma(struct bcmgenet_priv *priv, u32 dma_ctrl)
>   bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
>  }
>  
> -static bool bcmgenet_hfb_is_filter_enabled(struct bcmgenet_priv *priv,
> -u32 f_index)
> -{
> - u32 offset;
> - u32 reg;
> -
> - offset = HFB_FLT_ENABLE_V3PLUS + (f_index < 32) * sizeof(u32);
> - reg = bcmgenet_hfb_reg_readl(priv, offset);
> - return !!(reg & (1 << (f_index % 32)));
> -}
> -
> -static void bcmgenet_hfb_enable_filter(struct bcmgenet_priv *priv, u32 f_index)
> -{
> - u32 offset;
> - u32 reg;
> -
> - offset = HFB_FLT_ENABLE_V3PLUS + (f_index < 32) * sizeof(u32);
> - reg = bcmgenet_hfb_reg_readl(priv, offset);
> - reg |= (1 << (f_index % 32));
> - bcmgenet_hfb_reg_writel(priv, reg, offset);
> -}
> -
> -static void bcmgenet_hfb_set_filter_rx_queue_mapping(struct bcmgenet_priv *priv,
> -  u32 f_index, u32 rx_queue)
> -{
> - u32 offset;
> - u32 reg;
> -
> - offset = f_index / 8;
> - reg = bcmgenet_rdma_readl(priv, DMA_INDEX2RING_0 + offset);
> - reg &= ~(0xF << (4 * (f_index % 8)));
> - reg |= ((rx_queue & 0xF) << (4 * (f_index % 8)));
> - bcmgenet_rdma_writel(priv, reg, DMA_INDEX2RING_0 + offset);
> -}
> -
> -static void bcmgenet_hfb_set_filter_length(struct bcmgenet_priv *priv,
> -u32 f_index, u32 f_length)
> -{
> - u32 offset;
> - u32 reg;
> -
> - offset = HFB_FLT_LEN_V3PLUS +
> -  ((priv->hw_params->hfb_filter_cnt - 1 - f_index) / 4) *
> -  sizeof(u32);
> - reg = bcmgenet_hfb_reg_readl(priv, offset);
> - reg &= ~(0xFF << (8 * (f_index % 4)));
> - reg |= ((f_length & 0xFF) << (8 * (f_index % 4)));
> - bcmgenet_hfb_reg_writel(priv, reg, offset);
> -}
> -
> -static int bcmgenet_hfb_find_unused_filter(struct bcmgenet_priv *priv)
> -{
> - u32 f_index;
> -
> - for (f_index = 0; f_index < priv->hw_params->hfb_filter_cnt; f_index++)
> - if (!bcmgenet_hfb_is_filter_enabled(priv, f_index))
> - return f_index;
> -
> - return -ENOMEM;
> -}
> -
> -/* bcmgenet_hfb_add_filter
> - *
> - * Add new filter to Hardware Filter Block to match and direct Rx traffic to
> - * desired Rx queue.
> - *
> - * f_data is an array of unsigned 32-bit integers where each 32-bit integer
> - * provides filter data for 2 bytes (4 nibbles) of Rx frame:
> - *
> - * bits 31:20 - unused
> - * bit  19    - nibble 0 match enable
> - * bit  18    - nibble 1 match enable
> - * bit  17    - nibble 2 match enable
> - * bit  16    - nibble 3 match enable
> - * bits 15:12 - nibble 0 data
> - * bits 11:8  - nibble 1 data
> - * bits 7:4   - nibble 2 data
> - * bits 3:0   - nibble 3 data
> - *
> - * Example:
> - * In order to match:
> - * - Ethernet frame type = 0x0800 (IP)
> - * - IP version field = 4
> - * - IP protocol field = 0x11 (UDP)
> - *
> - * The following filter is needed:
> - * u32 hfb_filter_ipv4_udp[] = {
> - *   Rx frame offset 0x00: 0x, 0x, 0x, 0x,
> - *   Rx frame offset 0x08: 0x, 0x, 0x000F0800, 0x00084000,
> - *   Rx frame offset 0x10: 0x, 0x, 0x, 0x00030011,
> - * };
> - *
> - * To add the filter to HFB and direct the traffic to Rx queue 0, call:
> - * bcmgenet_hfb_add_filter(priv, hfb_filter_ipv4_udp,
> - * ARRAY_SIZE(hfb_filter_ipv4_udp), 0);
> - */
> -int bcmgenet_hfb_add_filter(struct bcmgenet_priv *priv, u32 *f_data,
> - u32 f_length, u32 rx_queue)
> -{
> - int f_index;
> - u32 i;
> -
> - f_index = bcmgenet_hfb_find_unused_filter(priv);
> - if (f_index < 0)
> - return -ENOMEM;
> -
> - if (f_length > priv->hw_params->hfb_filter_size)
> - return -EINVAL;
> -
> - for (i = 0; i < f_length; i++)
> - bcmgenet_hfb_writel(priv, f_data[i],
> - (f_index * priv->hw_params->hfb_filter_size + i) *
> - siz
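
The bit layout in the quoted comment can be checked with a small userspace sketch. hfb_filter_word() and HFB_ENABLE_SHIFT are hypothetical names introduced only for illustration; the layout itself (match enables in bits 19:16 with bit 19 for nibble 0, data in bits 15:0) is taken from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 19:16 are the per-nibble match enables (bit 19 = nibble 0),
 * bits 15:0 carry the four data nibbles, as described in the comment. */
#define HFB_ENABLE_SHIFT 16

/* Illustrative helper: build one 32-bit filter word covering two bytes
 * of the Rx frame. Not part of the bcmgenet driver. */
static uint32_t hfb_filter_word(uint32_t nibble_enable, uint32_t data)
{
	assert(nibble_enable <= 0xF);
	assert(data <= 0xFFFF);
	return (nibble_enable << HFB_ENABLE_SHIFT) | data;
}
```

Under this reading, the three non-trivial words of the example filter are: all four nibbles enabled with data 0x0800 (EtherType), only nibble 0 enabled with data 0x4000 (IP version 4), and nibbles 2 and 3 enabled with data 0x0011 (protocol UDP).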

Re: [PATCH net-next v5 00/11] ipv6: Only create RTF_CACHE route after encountering pmtu exception

2015-05-26 Thread Hannes Frederic Sowa
On Mon, May 25, 2015, at 19:34, David Miller wrote:
> From: Martin KaFai Lau 
> Date: Fri, 22 May 2015 20:55:55 -0700
> 
> > This series is to avoid creating a RTF_CACHE route whenever we are
> > consulting the fib6 tree with a new destination.  Instead, only create
> > RTF_CACHE route when we see a pmtu exception.
> 
> Looks great, nice work.
> 
> Series applied to net-next, thanks!

I also went over the changes in the last version, albeit a bit
late:
Reviewed-by: Hannes Frederic Sowa 

Thanks!


[PATCH net] mpls: fix mpls route deletes to not check for route scope and type

2015-05-26 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch fixes incorrect -EINVAL error due to invalid
scope and type for mpls route deletes.

iproute2 route modify code does not set protocol/scope/type
for RTM_DELROUTE msgs. mpls code can skip checking for
these too.

$ip -f mpls route add 100 as 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route show
100 as to 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1
RTNETLINK answers: Invalid argument

$ip -f mpls route del 100
RTNETLINK answers: Invalid argument

After patch:

$ip -f mpls route show
100 as to 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route del 100 as 200 via inet 10.1.1.2 dev swp1

$ip -f mpls route show

Reported-by: Vivek Venkataraman 
Suggested-by: Vivek Venkataraman 
Signed-off-by: Vivek Venkataraman 
Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7b3f732..18ab7bf 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -693,10 +693,13 @@ static int rtm_to_route_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 * (or source specific address in the case of multicast)
 * all addresses have universal scope.
 */
-   if (rtm->rtm_scope != RT_SCOPE_UNIVERSE)
-   goto errout;
-   if (rtm->rtm_type != RTN_UNICAST)
-   goto errout;
+   if (nlh->nlmsg_type != RTM_DELROUTE) {
+   if (rtm->rtm_scope != RT_SCOPE_UNIVERSE)
+   goto errout;
+   if (rtm->rtm_type != RTN_UNICAST)
+   goto errout;
+   }
+
if (rtm->rtm_flags != 0)
goto errout;
 
-- 
1.7.10.4
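
For readers who want to reason about the check outside the kernel, the following is a minimal userspace sketch of the validation order after the patch. The MODEL_* constants are local copies of what I understand the uapi values to be, struct rtmsg_model and mpls_check_rtmsg() are hypothetical names, and the scope/type values used for a delete request are example inputs rather than a statement of what iproute2 sends.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local copies so the sketch builds without kernel headers; treat the
 * numeric values as assumptions. */
enum { MODEL_RTM_NEWROUTE = 24, MODEL_RTM_DELROUTE = 25 };
enum { MODEL_RT_SCOPE_UNIVERSE = 0, MODEL_RT_SCOPE_NOWHERE = 255 };
enum { MODEL_RTN_UNSPEC = 0, MODEL_RTN_UNICAST = 1 };

/* Hypothetical reduced view of struct rtmsg. */
struct rtmsg_model {
	unsigned char scope;
	unsigned char type;
	unsigned int flags;
};

/* Scope and type are only validated for non-delete requests;
 * flags are validated for every request. */
static int mpls_check_rtmsg(int nlmsg_type, const struct rtmsg_model *rtm)
{
	assert(rtm != NULL);
	if (nlmsg_type != MODEL_RTM_DELROUTE) {
		if (rtm->scope != MODEL_RT_SCOPE_UNIVERSE)
			return -EINVAL;
		if (rtm->type != MODEL_RTN_UNICAST)
			return -EINVAL;
	}
	if (rtm->flags != 0)
		return -EINVAL;
	return 0;
}
```

A delete request with unset scope and type passes, while the same header on an add request is still rejected.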



Re: [PATCH net-next v2] test_bpf: add similarly conflicting jump test case only for classic

2015-05-26 Thread Alexei Starovoitov

On 5/26/15 1:35 PM, Daniel Borkmann wrote:

While 3b52960266a3 ("test_bpf: add more eBPF jump torture cases")
added the int3 bug test case only for eBPF, which needs exactly 11
passes to converge, here's a version for classic BPF with 11 passes,
and one that would need 70 passes on x86_64 to actually converge for
being successfully JITed. Effectively, all jumps are being optimized
out resulting in a JIT image of just 89 bytes (from originally max
BPF insns), only returning K.

Might be useful as a recipe for folks wanting to craft a test case
when backporting the fix in commit 3f7352bf21f8 ("x86: bpf_jit: fix
compilation of large bpf programs") while not having eBPF. The 2nd
one is delegated to the interpreter as the last pass still results
in shrinking, in other words, this one won't be JITed on x86_64.

Signed-off-by: Daniel Borkmann 


great tests. Thanks!
Acked-by: Alexei Starovoitov 




[PATCH net-next v2] test_bpf: add similarly conflicting jump test case only for classic

2015-05-26 Thread Daniel Borkmann
While 3b52960266a3 ("test_bpf: add more eBPF jump torture cases")
added the int3 bug test case only for eBPF, which needs exactly 11
passes to converge, here's a version for classic BPF with 11 passes,
and one that would need 70 passes on x86_64 to actually converge for
being successfully JITed. Effectively, all jumps are being optimized
out resulting in a JIT image of just 89 bytes (from originally max
BPF insns), only returning K.

Might be useful as a recipe for folks wanting to craft a test case
when backporting the fix in commit 3f7352bf21f8 ("x86: bpf_jit: fix
compilation of large bpf programs") while not having eBPF. The 2nd
one is delegated to the interpreter as the last pass still results
in shrinking, in other words, this one won't be JITed on x86_64.

Signed-off-by: Daniel Borkmann 
---
 v1 -> v2:
  - Fixed newline, added 2nd case

 lib/test_bpf.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index c07b8e7..7f58c73 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -314,6 +314,47 @@ static int bpf_fill_maxinsns10(struct bpf_test *self)
return 0;
 }
 
+static int __bpf_fill_ja(struct bpf_test *self, unsigned int len,
+unsigned int plen)
+{
+   struct sock_filter *insn;
+   unsigned int rlen;
+   int i, j;
+
+   insn = kmalloc_array(len, sizeof(*insn), GFP_KERNEL);
+   if (!insn)
+   return -ENOMEM;
+
+   rlen = (len % plen) - 1;
+
+   for (i = 0; i + plen < len; i += plen)
+   for (j = 0; j < plen; j++)
+   insn[i + j] = __BPF_JUMP(BPF_JMP | BPF_JA,
+plen - 1 - j, 0, 0);
+   for (j = 0; j < rlen; j++)
+   insn[i + j] = __BPF_JUMP(BPF_JMP | BPF_JA, rlen - 1 - j,
+0, 0);
+
+   insn[len - 1] = __BPF_STMT(BPF_RET | BPF_K, 0xababcbac);
+
+   self->u.ptr.insns = insn;
+   self->u.ptr.len = len;
+
+   return 0;
+}
+
+static int bpf_fill_maxinsns11(struct bpf_test *self)
+{
+   /* Hits 70 passes on x86_64, so cannot get JITed there. */
+   return __bpf_fill_ja(self, BPF_MAXINSNS, 68);
+}
+
+static int bpf_fill_ja(struct bpf_test *self)
+{
+   /* Hits exactly 11 passes on x86_64 JIT. */
+   return __bpf_fill_ja(self, 12, 9);
+}
+
 static struct bpf_test tests[] = {
{
"TAX",
@@ -4252,6 +4293,14 @@ static struct bpf_test tests[] = {
{ },
{ { 0, 1 } },
},
+   {
+   "JMP_JA: Jump, gap, jump, ...",
+   { },
+   CLASSIC | FLAG_NO_DATA,
+   { },
+   { { 0, 0xababcbac } },
+   .fill_helper = bpf_fill_ja,
+   },
{   /* Mainly checking JIT here. */
"BPF_MAXINSNS: Maximum possible literals",
{ },
@@ -4335,6 +4384,14 @@ static struct bpf_test tests[] = {
{ { 0, 0xabababac } },
.fill_helper = bpf_fill_maxinsns10,
},
+   {
+   "BPF_MAXINSNS: Jump, gap, jump, ...",
+   { },
+   CLASSIC | FLAG_NO_DATA,
+   { },
+   { { 0, 0xababcbac } },
+   .fill_helper = bpf_fill_maxinsns11,
+   },
 };
 
 static struct net_device dev;
-- 
1.9.3
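
To see what the small "JMP_JA" program looks like, here is a userspace sketch that reproduces the fill pattern of __bpf_fill_ja() and walks the result. It only models unconditional jumps and a final return; struct model_insn, fill_ja() and run_model() are hypothetical names, calloc() stands in for the kernel allocation, and the number of JIT passes is not modeled at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum model_op { MODEL_JA, MODEL_RET };

struct model_insn {
	enum model_op op;
	uint32_t k;	/* jump offset for MODEL_JA, return value for MODEL_RET */
};

/* Same index arithmetic as __bpf_fill_ja() in the patch. The sketch
 * requires len % plen != 0 so that rlen does not wrap around. */
static struct model_insn *fill_ja(unsigned int len, unsigned int plen)
{
	struct model_insn *insn;
	unsigned int rlen, i, j;

	assert(plen > 0 && len % plen > 0);
	insn = calloc(len, sizeof(*insn));
	if (!insn)
		return NULL;

	rlen = (len % plen) - 1;
	for (i = 0; i + plen < len; i += plen)
		for (j = 0; j < plen; j++)
			insn[i + j] = (struct model_insn){ MODEL_JA, plen - 1 - j };
	for (j = 0; j < rlen; j++)
		insn[i + j] = (struct model_insn){ MODEL_JA, rlen - 1 - j };

	insn[len - 1] = (struct model_insn){ MODEL_RET, 0xababcbac };
	return insn;
}

/* Execute the model program and count how many instructions were visited. */
static uint32_t run_model(const struct model_insn *insn, unsigned int len,
			  unsigned int *steps)
{
	unsigned int pc = 0;

	*steps = 0;
	while (pc < len) {
		(*steps)++;
		if (insn[pc].op == MODEL_RET)
			return insn[pc].k;
		pc += 1 + insn[pc].k;	/* classic BPF jumps are relative to pc + 1 */
	}
	return 0;
}
```

For len = 12 and plen = 9, the first block of nine jumps all lands on instruction 9, which jumps to the return at instruction 11, so only three instructions are ever executed.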



Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()

2015-05-26 Thread Greg KH
On Tue, May 26, 2015 at 10:06:17AM -0700, Joe Perches wrote:
> On Tue, 2015-05-26 at 09:35 -0700, Greg KH wrote:
> > On Tue, May 26, 2015 at 07:48:59AM -0700, Joe Perches wrote:
> > > > > The main point is that patches shouldn't be applied without
> > > > > being submitted to a more widely read list.
> > > > 
> > > > I take the blame for any problems with Outreachy patches.
> > > 
> > > Are you going to change any procedure associated to these
> > > Outreachy patches?
> > 
> > 2 bugs out of 900?  Nah, I think that's good odds.
> > 
> > Also, the outreachy patch process would overwhelm everyone else on the
> > list, it's really high volume during the application phase, I'd prefer
> > it to stick with the mentors that wish to help out with the process.  If
> > you and/or Dan, or anyone else wishes to help out with this, I would
> > really appreciate it.  But I don't think that forcing them to post to
> > the driverdevel list is a good idea.
> 
> I don't think that's necessary, but sending them to any
> listed maintainer should be.
> 
> If you're collecting them, I suggest you stick them in
> a separate branch, post them to your driverdev list and
> cc the appropriate maintainers, wait a week, then apply
> them to your main branch.
> 
> You already batch post hundreds of patches for kernel
> x.y.z stable branches.  What's another few hundred?
> 

Stable patches are a totally different workflow from my "normal" kernel
patch acceptance work.  I'll consider changing something for the future
outreachy application process.

thanks,

greg k-h


Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()

2015-05-26 Thread Dan Carpenter
On Tue, May 26, 2015 at 10:06:17AM -0700, Joe Perches wrote:
> If you're collecting them, I suggest you stick them in
> a separate branch, post them to your driverdev list and
> cc the appropriate maintainers, wait a week, then apply
> them to your main branch.

That would work.  A massive thread is easy to delete.  Or even just
apply them first, and we tell people to send follow on patches or revert
as needed.  My whole setup revolves around email so finding and
reviewing patches using git log is awkward.

regards,
dan carpenter



[PATCH] net: tcp: Fix a PTO timing granularity issue

2015-05-26 Thread Ido Yariv
The Tail Loss Probe RFC specifies that the PTO value should be set to
max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.

The PTO value is converted to jiffies, so the timer may expire
prematurely.

This is especially problematic on systems in which HZ <= 100, so work
around this by setting the timeout to at least 2 jiffies on such
systems.

The 10ms figure was originally selected based on tests performed with
the current implementation and HZ = 1000. Thus, leave the behavior on
systems with HZ > 100 unchanged.

Signed-off-by: Ido Yariv 
---
 net/ipv4/tcp_output.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 534e5fd..5321df8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
timeout = max_t(u32, timeout,
(rtt + (rtt >> 1) + TCP_DELACK_MAX));
timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+#if HZ <= 100
+   timeout = max_t(u32, timeout, 2);
+#endif
 
/* If RTO is shorter, just schedule TLP in its place. */
tlp_time_stamp = tcp_time_stamp + timeout;
-- 
2.1.0
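
The granularity problem can be illustrated with a simplified userspace model. model_msecs_to_jiffies() and model_pto_jiffies() are hypothetical helpers: HZ is passed as a parameter instead of being a compile-time constant, the conversion assumes simple round-up, and the packets_out == 1 branch of tcp_schedule_loss_probe() is left out. At HZ = 100, 10 ms converts to a single jiffy, and a timer armed for one jiffy fires on the next tick, which may be almost immediately.

```c
#include <assert.h>

/* Round-up conversion from milliseconds to ticks for a given HZ.
 * A simplified model, not the kernel's msecs_to_jiffies(). */
static unsigned long model_msecs_to_jiffies(unsigned long msecs,
					    unsigned long hz)
{
	assert(hz > 0);
	return (msecs * hz + 999) / 1000;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* PTO in jiffies: max(2 * SRTT, 10ms), plus the 2-jiffy floor that the
 * patch adds for HZ <= 100. rtt_jiffies is the smoothed RTT in ticks. */
static unsigned long model_pto_jiffies(unsigned long rtt_jiffies,
				       unsigned long hz)
{
	unsigned long timeout = rtt_jiffies << 1;

	timeout = max_ul(timeout, model_msecs_to_jiffies(10, hz));
	if (hz <= 100)
		timeout = max_ul(timeout, 2);
	return timeout;
}
```

With the floor in place, a low-HZ system waits at least one full tick period before the probe timer can expire, while HZ = 1000 behavior is unchanged.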



[PATCH] i40e: remove unneeded tests

2015-05-26 Thread Laurent Navet
The same code is executed regardless ret_code value, so these tests can
be removed.
Fix Coverity CID 1268789 and 1268791

Signed-off-by: Laurent Navet 
---
 drivers/net/ethernet/intel/i40e/i40e_hmc.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
index 9b987cc..eae4248 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
@@ -298,8 +298,6 @@ i40e_status i40e_remove_sd_bp_new(struct i40e_hw *hw,
goto exit;
}
ret_code = i40e_free_dma_mem(hw, &(sd_entry->u.bp.addr));
-   if (ret_code)
-   goto exit;
 exit:
return ret_code;
 }
@@ -353,8 +351,6 @@ i40e_status i40e_remove_pd_page_new(struct i40e_hw *hw,
}
/* free memory here */
ret_code = i40e_free_dma_mem(hw, &(sd_entry->u.pd_table.pd_page_addr));
-   if (ret_code)
-   goto exit;
 exit:
return ret_code;
 }
-- 
2.1.4



Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()

2015-05-26 Thread Dan Carpenter
On Tue, May 26, 2015 at 09:35:55AM -0700, Greg KH wrote:
> Also, the outreachy patch process would overwhelm everyone else on the
> list, it's really high volume during the application phase, I'd prefer
> it to stick with the mentors that wish to help out with the process.  If
> you and/or Dan, or anyone else wishes to help out with this, I would
> really appreciate it.  But I don't think that forcing them to post to
> the driverdevel list is a good idea.

If it makes your life easier to merge them directly from outreachy
that's great and I'm ok with that, but don't skip the normal review
process as a favour to us.

regards,
dan carpenter



Re: [PATCH v2 net-next 0/3] net: Add incoming CPU mask to sockets

2015-05-26 Thread Tom Herbert
On Tue, May 26, 2015 at 11:19 AM, Eric Dumazet  wrote:
> On Tue, 2015-05-26 at 11:00 -0700, Tom Herbert wrote:
>> On Tue, May 26, 2015 at 10:18 AM, Eric Dumazet wrote:
>> > On Tue, 2015-05-26 at 09:34 -0700, Tom Herbert wrote:
>> >> Added matching of CPU to a socket CPU mask. This is useful for TCP
>> >> listeners and unconnected UDP. This works with SO_REUSEPORT to steer
>> >> packets to listener sockets based on CPU affinity. These patches
>> >> allow steering packets to listeners based on numa locality. This is
>> >> only useful for passive connections.
>> >>
>> >> v2:
>> >>   - Add cache alignment for fields used in socket lookup in sock_common
>> >>   - Added UDP test results
>> >
>> > What about the feedback I gave earlier Tom ???
>> >
>> > This cannot work for TCP in its current state.
>> >
>> It does work and it fixes cache server locality issues we are seeing.
>> Right now half of our connections are persistently crossing numa nodes
>> on receive-- this is having big negative impact. Yes, there may be
>> edge conditions where SYN goes to a different CPU than the rest of the
>> flow (probably need RFS or flow director for that problem), and that
>> sounds like something nice to fix, but this patch is not dependent on
>> that. Besides, did you foresee an API change would be required?
>
> With current stack, there is no guarantee SYN and ACK packets are
> handled by same cpu.
>
> These are no edge conditions, but real ones, even with RFS.
>
> Not everyone tweaks /proc/irq/*/smp_affinity
>
> Default is still allowing cpus being almost random (affinity=fff)
>
In that case there's no guarantee that any two packets in a flow will
hit the same CPU so there's no way to establish affinity to the
interrupt anyway. RFS would work okay to get affinity of the soft
processing, but there would be no point in trying to do any affinity
with incoming cpu so this feature wouldn't help.

The general problem is that the flow hash and/or RX CPU for a flow are
not guaranteed to be persistent for a connection. UDP doesn't have a
problem with this since every RX UDP packet can be independently
steered to a good socket in SO_REUSEPORT. For TCP we only get to make
this decision once for the whole lifetime of the flow, which means
that the decision may eventually turn out to be "wrong". These patches
don't try to fix that problem; for that I believe we're going to need
to do something a little more radical :-)
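
To make the two selection strategies in this discussion concrete, here is a hedged userspace sketch. pick_by_hash() scales a 32-bit flow hash onto the listener count, and pick_by_cpu() returns the first listener whose CPU mask contains the receiving CPU. Both names, the 64-bit mask representation, and the first-match policy are assumptions for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit flow hash onto [0, nsocks) without a division.
 * The result depends only on the hash, not on the receiving CPU. */
static uint32_t pick_by_hash(uint32_t hash, uint32_t nsocks)
{
	return (uint32_t)(((uint64_t)hash * nsocks) >> 32);
}

/* Return the index of the first listener whose mask includes cpu,
 * or -1 if none matches. The result changes if the same flow is
 * delivered on a different CPU. */
static int pick_by_cpu(const uint64_t *cpu_masks, int nsocks,
		       unsigned int cpu)
{
	int i;

	assert(cpu < 64);
	for (i = 0; i < nsocks; i++)
		if (cpu_masks[i] & (UINT64_C(1) << cpu))
			return i;
	return -1;
}
```

The sketch shows why the two sides disagree: hash-based selection gives the same listener for every packet of a flow as long as the hash is stable, while CPU-based selection is only stable if SYN and ACK arrive on CPUs covered by the same mask.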

> That was partly for these reasons that SO_REUSEPORT (for TCP) could not
> use cpu number, but a flow hash to select the target socket.
>
>
>


Re: [PATCH] net: tcp: Fix a PTO timing granularity issue

2015-05-26 Thread Ido Yariv
Hi Eric,

On Tue, May 26, 2015 at 11:25:21AM -0700, Eric Dumazet wrote:
> On Tue, 2015-05-26 at 13:55 -0400, Ido Yariv wrote:
> 
> > 
> > The platform this was tested on was an embedded platform with a wifi
> > module (11n, 20 MHz). The other end was a computer running Windows, and
> > the benchmarking software was IxChariot.
> > The whole setup was running in a shielded box with minimal
> > interference.
> > 
> > As it seems, the throughput was limited by the congestion window.
> > Further analysis led to TLP - the fact that its timer was expiring
> > prematurely impacted cwnd, which in turn prevented the wireless driver
> > from having enough skbs to buffer and send.
> > 
> > Increasing the size of the chunks being sent had a similar impact on
> > throughput, presumably because the congestion window had enough time to
> > increase.
> > 
> > Changing the congestion control algorithm to Westwood from cubic/reno
> > also had a similar impact on throughput.
> > 
> 
> Have you tested what results you had by completely disabling TLP ?
> 
> Maybe a timer of 10 to 20ms is too short anyway in your testbed.

Yes, I have (by writing 2 to /proc/sys/net/ipv4/tcp_early_retrans), and
it also had a similar effect. That was actually the first workaround for
this issue, before the issue in TLP was traced.

With 10ms to 20ms timers the throughput was just the same as disabling
TLP altogether, so it seems it was just enough to handle.

Cheers,
Ido.


[PATCH 0/5] Add support for QCA IPQ806x Ethernet GMAC controller

2015-05-26 Thread Mathieu Olivari
This patch set adds support for the integrated Ethernet GMAC controller
on QCA IPQ806x SoC. This controller is based on a Gigabit Synopsys
DesignWare IP, already supported in the stmmac driver located in
drivers/net/ethernet/stmicro/stmmac.

This change is done as a follow-up to the following thread:
* http://www.spinics.net/lists/netdev/msg311265.html
While the previous attempt created a new driver for this controller,
this new post leverages the existing stmmac driver by implementing the
SoC-specific glue for it.

Aside from the pure stmmac glue layer, we have a couple of related
patches:
* IPQ806x NSS clock addition is cherry-picked and refreshed from the
  following thread: https://lkml.org/lkml/2014/8/6/390
* phy-handle and fixed-link support are also added in this change set so the
  driver can be fully functional on platforms using device trees as well as
  Ethernet switches.

Mathieu Olivari (4):
  stmmac: add phy-handle support to the platform layer
  stmmac: add fixed-link device-tree support
  stmmac: add ipq806x glue layer
  net: stmmac: ipq806x: document device tree bindings

Stephen Boyd (1):
  clk: qcom: Add support for NSS/GMAC clocks and resets

 .../devicetree/bindings/net/ipq806x-dwmac.txt  |  35 ++
 drivers/clk/qcom/gcc-ipq806x.c | 594 -
 drivers/net/ethernet/stmicro/stmmac/Kconfig|  14 +
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   1 +
 .../net/ethernet/stmicro/stmmac/dwmac-ipq806x.c| 365 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  30 +-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  18 +-
 include/dt-bindings/clock/qcom,gcc-ipq806x.h   |   2 +
 include/dt-bindings/reset/qcom,gcc-ipq806x.h   |  43 ++
 include/linux/stmmac.h |   1 +
 10 files changed, 1089 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/ipq806x-dwmac.txt
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c

-- 
2.1.4



[PATCH 3/5] stmmac: add fixed-link device-tree support

2015-05-26 Thread Mathieu Olivari
When DT is used, this change lets the stmmac driver detect a
fixed-link PHY, instantiate it, and use it during phy_connect().

Fixed link PHYs DT usage is described in:
Documentation/devicetree/bindings/net/fixed-link.txt

Signed-off-by: Mathieu Olivari 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 31c6416..c46178c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -856,7 +856,7 @@ static int stmmac_init_phy(struct net_device *dev)
 * device as well.
 * Note: phydev->phy_id is the result of reading the UID PHY registers.
 */
-   if (phydev->phy_id == 0) {
+   if (!priv->plat->phy_node && phydev->phy_id == 0) {
phy_disconnect(phydev);
return -ENODEV;
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 8d23155..f3918c7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -148,6 +148,14 @@ static int stmmac_probe_config_dt(struct platform_device *pdev,
/* If we find a phy-handle property, use it as the PHY */
plat->phy_node = of_parse_phandle(np, "phy-handle", 0);
 
+   /* If phy-handle is not specified, check if we have a fixed-phy */
+   if (!plat->phy_node && of_phy_is_fixed_link(np)) {
+   if ((of_phy_register_fixed_link(np) < 0))
+   return -ENODEV;
+
+   plat->phy_node = of_node_get(np);
+   }
+
/* "snps,phy-addr" is not a standard property. Mark it as deprecated
 * and warn of its use. Remove this when phy node support is added.
 */
@@ -212,8 +220,10 @@ static int stmmac_probe_config_dt(struct platform_device *pdev,
if (of_find_property(np, "snps,pbl", NULL)) {
dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*dma_cfg),
   GFP_KERNEL);
-   if (!dma_cfg)
+   if (!dma_cfg) {
+   of_node_put(np);
return -ENOMEM;
+   }
plat->dma_cfg = dma_cfg;
of_property_read_u32(np, "snps,pbl", &dma_cfg->pbl);
dma_cfg->fixed_burst =
-- 
2.1.4
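
The lookup order introduced by this patch can be summarized with a small userspace model. Everything here is hypothetical scaffolding (struct dt_node_model, pick_phy_source(), check_phy_id()); it only mirrors the control flow visible in the diff: prefer phy-handle, fall back to a fixed link, and treat a zero PHY ID as fatal only when no PHY node was found in the device tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum phy_source { PHY_SRC_NONE, PHY_SRC_HANDLE, PHY_SRC_FIXED_LINK };

/* Hypothetical summary of what the DT node provides. */
struct dt_node_model {
	bool has_phy_handle;
	bool has_fixed_link;
	bool fixed_link_register_fails;
};

/* Mirrors the probe-time order: phy-handle first, then fixed-link. */
static int pick_phy_source(const struct dt_node_model *np,
			   enum phy_source *src)
{
	assert(np != NULL && src != NULL);
	*src = PHY_SRC_NONE;

	if (np->has_phy_handle) {
		*src = PHY_SRC_HANDLE;
		return 0;
	}
	if (np->has_fixed_link) {
		if (np->fixed_link_register_fails)
			return -ENODEV;
		*src = PHY_SRC_FIXED_LINK;
	}
	return 0;
}

/* Mirrors the relaxed check in stmmac_init_phy(): a zero phy_id is
 * rejected only when no PHY node came from the device tree. */
static int check_phy_id(enum phy_source src, unsigned int phy_id)
{
	if (src == PHY_SRC_NONE && phy_id == 0)
		return -ENODEV;
	return 0;
}
```

The second helper captures why the phy_id test had to change: the sketch assumes a fixed-link PHY can report an ID of 0, which the old unconditional check would have rejected.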



[PATCH 5/5] net: stmmac: ipq806x: document device tree bindings

2015-05-26 Thread Mathieu Olivari
Add the device tree bindings documentation for the QCA IPQ806x
variant of the Synopsys DesignWare MAC.

Signed-off-by: Mathieu Olivari 
---
 .../devicetree/bindings/net/ipq806x-dwmac.txt  | 35 ++
 1 file changed, 35 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/ipq806x-dwmac.txt

diff --git a/Documentation/devicetree/bindings/net/ipq806x-dwmac.txt b/Documentation/devicetree/bindings/net/ipq806x-dwmac.txt
new file mode 100644
index 000..6d7ab4e
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ipq806x-dwmac.txt
@@ -0,0 +1,35 @@
+* IPQ806x DWMAC Ethernet controller
+
+The device inherits all the properties of the dwmac/stmmac devices
+described in the file net/stmmac.txt with the following changes.
+
+Required properties:
+
+- compatible: should be "qcom,ipq806x-gmac" along with "snps,dwmac"
+ and any applicable more detailed version number
+ described in net/stmmac.txt
+
+- qcom,nss-common: should contain a phandle to a syscon device mapping the
+  nss-common registers.
+
+- qcom,qsgmii-csr: should contain a phandle to a syscon device mapping the
+  qsgmii-csr registers.
+
+Example:
+
+   gmac: ethernet@3700 {
+   device_type = "network";
+   compatible = "qcom,ipq806x-gmac";
+   reg = <0x3700 0x20>;
+   interrupts = ;
+   interrupt-names = "macirq";
+
+   qcom,nss-common = <&nss_common>;
+   qcom,qsgmii-csr = <&qsgmii_csr>;
+
+   clocks = <&gcc GMAC_CORE1_CLK>;
+   clock-names = "stmmaceth";
+
+   resets = <&gcc GMAC_CORE1_RESET>;
+   reset-names = "stmmaceth";
+   };
-- 
2.1.4



[PATCH 1/5] clk: qcom: Add support for NSS/GMAC clocks and resets

2015-05-26 Thread Mathieu Olivari
From: Stephen Boyd 

Add the NSS/GMAC clocks and the TCM clock and NSS resets.

Signed-off-by: Stephen Boyd 
Signed-off-by: Mathieu Olivari 
---
 drivers/clk/qcom/gcc-ipq806x.c   | 594 ++-
 include/dt-bindings/clock/qcom,gcc-ipq806x.h |   2 +
 include/dt-bindings/reset/qcom,gcc-ipq806x.h |  43 ++
 3 files changed, 638 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/qcom/gcc-ipq806x.c b/drivers/clk/qcom/gcc-ipq806x.c
index a50936a..5639699 100644
--- a/drivers/clk/qcom/gcc-ipq806x.c
+++ b/drivers/clk/qcom/gcc-ipq806x.c
@@ -140,12 +140,47 @@ static struct clk_regmap pll14_vote = {
},
 };
 
+#define NSS_PLL_RATE(f, _l, _m, _n, i) \
+   {  \
+   .freq = f,  \
+   .l = _l, \
+   .m = _m, \
+   .n = _n, \
+   .ibits = i, \
+   }
+
+static struct pll_freq_tbl pll18_freq_tbl[] = {
+   NSS_PLL_RATE(55000, 44, 0, 1, 0x01495625),
+   NSS_PLL_RATE(73300, 58, 16, 25, 0x014b5625),
+};
+
+static struct clk_pll pll18 = {
+   .l_reg = 0x31a4,
+   .m_reg = 0x31a8,
+   .n_reg = 0x31ac,
+   .config_reg = 0x31b4,
+   .mode_reg = 0x31a0,
+   .status_reg = 0x31b8,
+   .status_bit = 16,
+   .post_div_shift = 16,
+   .post_div_width = 1,
+   .freq_tbl = pll18_freq_tbl,
+   .clkr.hw.init = &(struct clk_init_data){
+   .name = "pll18",
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .ops = &clk_pll_ops,
+   },
+};
+
 enum {
P_PXO,
P_PLL8,
P_PLL3,
P_PLL0,
P_CXO,
+   P_PLL14,
+   P_PLL18,
 };
 
 static const struct parent_map gcc_pxo_pll8_map[] = {
@@ -197,6 +232,22 @@ static const char *gcc_pxo_pll8_pll0_map[] = {
"pll0_vote",
 };
 
+static const struct parent_map gcc_pxo_pll8_pll14_pll18_pll0_map[] = {
+   { P_PXO, 0 },
+   { P_PLL8, 4 },
+   { P_PLL0, 2 },
+   { P_PLL14, 5 },
+   { P_PLL18, 1 }
+};
+
+static const char *gcc_pxo_pll8_pll14_pll18_pll0[] = {
+   "pxo",
+   "pll8_vote",
+   "pll0_vote",
+   "pll14",
+   "pll18",
+};
+
 static struct freq_tbl clk_tbl_gsbi_uart[] = {
{  1843200, P_PLL8, 2,  6, 625 },
{  3686400, P_PLL8, 2, 12, 625 },
@@ -2202,6 +2253,472 @@ static struct clk_branch ebi2_aon_clk = {
},
 };
 
+static const struct freq_tbl clk_tbl_gmac[] = {
+   { 13300, P_PLL0, 1,  50, 301 },
+   { 26600, P_PLL0, 1, 127, 382 },
+   { }
+};
+
+static struct clk_dyn_rcg gmac_core1_src = {
+   .ns_reg[0] = 0x3cac,
+   .ns_reg[1] = 0x3cb0,
+   .md_reg[0] = 0x3ca4,
+   .md_reg[1] = 0x3ca8,
+   .bank_reg = 0x3ca0,
+   .mn[0] = {
+   .mnctr_en_bit = 8,
+   .mnctr_reset_bit = 7,
+   .mnctr_mode_shift = 5,
+   .n_val_shift = 16,
+   .m_val_shift = 16,
+   .width = 8,
+   },
+   .mn[1] = {
+   .mnctr_en_bit = 8,
+   .mnctr_reset_bit = 7,
+   .mnctr_mode_shift = 5,
+   .n_val_shift = 16,
+   .m_val_shift = 16,
+   .width = 8,
+   },
+   .s[0] = {
+   .src_sel_shift = 0,
+   .parent_map = gcc_pxo_pll8_pll14_pll18_pll0_map,
+   },
+   .s[1] = {
+   .src_sel_shift = 0,
+   .parent_map = gcc_pxo_pll8_pll14_pll18_pll0_map,
+   },
+   .p[0] = {
+   .pre_div_shift = 3,
+   .pre_div_width = 2,
+   },
+   .p[1] = {
+   .pre_div_shift = 3,
+   .pre_div_width = 2,
+   },
+   .mux_sel_bit = 0,
+   .freq_tbl = clk_tbl_gmac,
+   .clkr = {
+   .enable_reg = 0x3ca0,
+   .enable_mask = BIT(1),
+   .hw.init = &(struct clk_init_data){
+   .name = "gmac_core1_src",
+   .parent_names = gcc_pxo_pll8_pll14_pll18_pll0,
+   .num_parents = 5,
+   .ops = &clk_dyn_rcg_ops,
+   },
+   },
+};
+
+static struct clk_branch gmac_core1_clk = {
+   .halt_reg = 0x3c20,
+   .halt_bit = 4,
+   .hwcg_reg = 0x3cb4,
+   .hwcg_bit = 6,
+   .clkr = {
+   .enable_reg = 0x3cb4,
+   .enable_mask = BIT(4),
+   .hw.init = &(struct clk_init_data){
+   .name = "gmac_core1_clk",
+   .parent_names = (const char *[]){
+   "gmac_core1_src",
+   },
+   .num_parents = 1,
+   .ops = &clk_branch_ops,
+   .flags = CLK_SET_RATE_PARENT,
+   },
+   },
+};
+
+static struct clk_dyn_rcg gmac_core2_src = {
+   .ns_reg[0] = 0x3ccc,
+   .ns_reg[1] = 0x3cd0,
+   .md_reg[0] = 0x3cc4,
+   .md_reg[1] = 0x3cc8,
+   .bank_reg = 0x3ca0,
+   

[PATCH 4/5] stmmac: add ipq806x glue layer

2015-05-26 Thread Mathieu Olivari
The ethernet controller available in IPQ806x is a Synopsys DesignWare
Gigabit MAC IP core, already supported by the stmmac driver.

This glue layer implements some platform specific settings required to
get the controller working on an IPQ806x based platform.

Signed-off-by: Mathieu Olivari 
---
 drivers/net/ethernet/stmicro/stmmac/Kconfig|  14 +
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   1 +
 .../net/ethernet/stmicro/stmmac/dwmac-ipq806x.c| 365 +
 3 files changed, 380 insertions(+)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig 
b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 731e045..cec147d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -16,6 +16,7 @@ if STMMAC_ETH
 config STMMAC_PLATFORM
tristate "STMMAC Platform bus support"
depends on STMMAC_ETH
+   select MFD_SYSCON
default y
---help---
  This selects the platform specific bus support for the stmmac driver.
@@ -36,6 +37,19 @@ config DWMAC_GENERIC
  platform specific code to function or is using platform
  data for setup.
 
+config DWMAC_IPQ806X
+   tristate "QCA IPQ806x DWMAC support"
+   default ARCH_QCOM
+   depends on OF
+   select MFD_SYSCON
+   help
+ Support for QCA IPQ806X DWMAC Ethernet.
+
+ This selects the IPQ806x SoC glue layer support for the stmmac
+ device driver. This driver does not use any of the hardware
+ acceleration features available on this SoC. Network devices
+ will behave like standard non-accelerated ethernet interfaces.
+
 config DWMAC_LPC18XX
tristate "NXP LPC18xx/43xx DWMAC support"
default ARCH_LPC18XX
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 92e714a..b390161 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -6,6 +6,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
+obj-$(CONFIG_DWMAC_IPQ806X)+= dwmac-ipq806x.o
 obj-$(CONFIG_DWMAC_LPC18XX)+= dwmac-lpc18xx.o
 obj-$(CONFIG_DWMAC_MESON)  += dwmac-meson.o
 obj-$(CONFIG_DWMAC_ROCKCHIP)   += dwmac-rk.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c
new file mode 100644
index 000..577b716
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.c
@@ -0,0 +1,365 @@
+/*
+ * Qualcomm Atheros IPQ806x GMAC glue layer
+ *
+ * Copyright (C) 2015 The Linux Foundation
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "stmmac_platform.h"
+
+#define NSS_COMMON_CLK_GATE0x8
+#define NSS_COMMON_CLK_GATE_PTP_EN(x)  BIT(0x10 + x)
+#define NSS_COMMON_CLK_GATE_RGMII_RX_EN(x) BIT(0x9 + (x * 2))
+#define NSS_COMMON_CLK_GATE_RGMII_TX_EN(x) BIT(0x8 + (x * 2))
+#define NSS_COMMON_CLK_GATE_GMII_RX_EN(x)  BIT(0x4 + x)
+#define NSS_COMMON_CLK_GATE_GMII_TX_EN(x)  BIT(0x0 + x)
+
+#define NSS_COMMON_CLK_DIV00xC
+#define NSS_COMMON_CLK_DIV_OFFSET(x)   (x * 8)
+#define NSS_COMMON_CLK_DIV_MASK0x7f
+
+#define NSS_COMMON_CLK_SRC_CTRL0x14
+#define NSS_COMMON_CLK_SRC_CTRL_OFFSET(x)  (1 << x)
+/* Mode is coded on 1 bit but is different depending on the MAC ID:
+ * MAC0: QSGMII=0 RGMII=1
+ * MAC1: QSGMII=0 SGMII=0 RGMII=1
+ * MAC2 & MAC3: QSGMII=0 SGMII=1
+ */
+#define NSS_COMMON_CLK_SRC_CTRL_RGMII(x)   1
+#define NSS_COMMON_CLK_SRC_CTRL_SGMII(x)   ((x >= 2) ? 1 : 0)
+
+#define NSS_COMMON_MACSEC_CTL  0x28
+#define NSS_COMMON_MACSEC_CTL_EXT_BYPASS_EN(x) (1 << x)
+
+#define NSS_COMMON_GMAC_CTL(x) (0x30 + (x * 4))
+#define NSS_COMMON_GMAC_CTL_CSYS_REQ   BIT(19)
+#define NSS_COMMON_GMAC_CTL_PHY_IFACE_SEL  BIT(16)
+#define NSS_COMMON_GMAC_C

[PATCH 2/5] stmmac: add phy-handle support to the platform layer

2015-05-26 Thread Mathieu Olivari
In the stmmac driver, PHY specification in device-tree was done using the
non-standard property "snps,phy-addr". Specifying a PHY on a different
MDIO bus than the one within the stmmac controller doesn't seem to be
possible when device-tree is used.

This change adds support for the phy-handle property, as specified in
Documentation/devicetree/bindings/net/ethernet.txt.

Signed-off-by: Mathieu Olivari 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 28 ++
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  6 -
 include/linux/stmmac.h |  1 +
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index e4f2739..31c6416 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -52,6 +52,7 @@
 #include "stmmac_ptp.h"
 #include "stmmac.h"
 #include 
+#include 
 
 #define STMMAC_ALIGN(x)L1_CACHE_ALIGN(x)
 
@@ -816,18 +817,25 @@ static int stmmac_init_phy(struct net_device *dev)
priv->speed = 0;
priv->oldduplex = -1;
 
-   if (priv->plat->phy_bus_name)
-   snprintf(bus_id, MII_BUS_ID_SIZE, "%s-%x",
-priv->plat->phy_bus_name, priv->plat->bus_id);
-   else
-   snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
-priv->plat->bus_id);
+   if (priv->plat->phy_node) {
+   phydev = of_phy_connect(dev, priv->plat->phy_node,
+   &stmmac_adjust_link, 0, interface);
+   } else {
+   if (priv->plat->phy_bus_name)
+   snprintf(bus_id, MII_BUS_ID_SIZE, "%s-%x",
+priv->plat->phy_bus_name, priv->plat->bus_id);
+   else
+   snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
+priv->plat->bus_id);
 
-   snprintf(phy_id_fmt, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
-priv->plat->phy_addr);
-   pr_debug("stmmac_init_phy:  trying to attach to %s\n", phy_id_fmt);
+   snprintf(phy_id_fmt, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
+priv->plat->phy_addr);
+   pr_debug("stmmac_init_phy:  trying to attach to %s\n",
+phy_id_fmt);
 
-   phydev = phy_connect(dev, phy_id_fmt, &stmmac_adjust_link, interface);
+   phydev = phy_connect(dev, phy_id_fmt, &stmmac_adjust_link,
+interface);
+   }
 
if (IS_ERR(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 1664c01..8d23155 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "stmmac.h"
 #include "stmmac_platform.h"
@@ -144,13 +145,16 @@ static int stmmac_probe_config_dt(struct platform_device 
*pdev,
/* Default to phy auto-detection */
plat->phy_addr = -1;
 
+   /* If we find a phy-handle property, use it as the PHY */
+   plat->phy_node = of_parse_phandle(np, "phy-handle", 0);
+
/* "snps,phy-addr" is not a standard property. Mark it as deprecated
 * and warn of its use. Remove this when phy node support is added.
 */
if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
 
-   if (plat->phy_bus_name)
+   if (plat->phy_node || plat->phy_bus_name)
plat->mdio_bus_data = NULL;
else
plat->mdio_bus_data =
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 7f484a2..c735f5c 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -99,6 +99,7 @@ struct plat_stmmacenet_data {
int phy_addr;
int interface;
struct stmmac_mdio_bus_data *mdio_bus_data;
+   struct device_node *phy_node;
struct stmmac_dma_cfg *dma_cfg;
int clk_csr;
int has_gmac;
-- 
2.1.4



[PATCH net-next 2/2] net: phy: Utilize phy_interface_is_rgmii

2015-05-26 Thread Florian Fainelli
Update all open-coded tests for all 4 PHY_INTERFACE_MODE_RGMII* values
to use the newly introduced helper: phy_interface_is_rgmii.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/icplus.c  |  5 +
 drivers/net/phy/marvell.c | 10 ++
 drivers/net/phy/phy.c |  3 +--
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/phy/icplus.c b/drivers/net/phy/icplus.c
index 8644f039d922..0dbc445a5fa0 100644
--- a/drivers/net/phy/icplus.c
+++ b/drivers/net/phy/icplus.c
@@ -139,10 +139,7 @@ static int ip1001_config_init(struct phy_device *phydev)
if (c < 0)
return c;
 
-   if ((phydev->interface == PHY_INTERFACE_MODE_RGMII) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)) {
+   if (phy_interface_is_rgmii(phydev)) {
 
c = phy_read(phydev, IP10XX_SPEC_CTRL_STATUS);
if (c < 0)
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 1b1698f98818..f721444c2b0a 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -317,10 +317,7 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
if (err < 0)
return err;
 
-   if ((phydev->interface == PHY_INTERFACE_MODE_RGMII) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)) {
+   if (phy_interface_is_rgmii(phydev)) {
 
mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG) &
MII_88E1121_PHY_MSCR_DELAY_MASK;
@@ -469,10 +466,7 @@ static int m88e_config_init(struct phy_device *phydev)
int err;
int temp;
 
-   if ((phydev->interface == PHY_INTERFACE_MODE_RGMII) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) ||
-   (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)) {
+   if (phy_interface_is_rgmii(phydev)) {
 
temp = phy_read(phydev, MII_M_PHY_EXT_CR);
if (temp < 0)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 377d2db04d33..b2197b506acb 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1093,8 +1093,7 @@ int phy_init_eee(struct phy_device *phydev, bool 
clk_stop_enable)
if ((phydev->duplex == DUPLEX_FULL) &&
((phydev->interface == PHY_INTERFACE_MODE_MII) ||
(phydev->interface == PHY_INTERFACE_MODE_GMII) ||
-   (phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
-phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID) ||
+phy_interface_is_rgmii(phydev) ||
 phy_is_internal(phydev))) {
int eee_lp, eee_cap, eee_adv;
u32 lp, cap, adv;
-- 
2.1.0



[PATCH net-next 1/2] net: phy: Add phy_interface_is_rgmii helper

2015-05-26 Thread Florian Fainelli
RGMII interfaces come in 4 different flavors that the PHY library needs
to care about: regular RGMII (no delays), RGMII with either RX or TX
delay, and both. In order to avoid the error of checking for only one type
of RGMII interface and missing the 3 others, introduce a convenience
function which tests for all values.

Suggested-by: David S. Miller 
Signed-off-by: Florian Fainelli 
---
 include/linux/phy.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 701c7a3946e0..a26c3f84b8dd 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -678,6 +678,17 @@ static inline bool phy_is_internal(struct phy_device 
*phydev)
 }
 
 /**
+ * phy_interface_is_rgmii - Convenience function for testing if a PHY interface
+ * is RGMII (all variants)
+ * @phydev: the phy_device struct
+ */
+static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
+{
+   return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
+   phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
+}
+
+/**
  * phy_write_mmd - Convenience function for writing a register
  * on an MMD on a given PHY.
  * @phydev: The phy_device struct
-- 
2.1.0
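
For readers who want to see why a single range comparison covers all four
variants, here is a minimal user-space sketch. The enum and function names
(mock_phy_mode, mock_interface_is_rgmii) are hypothetical stand-ins, not the
kernel's definitions, and the sketch assumes, as the helper in this patch
does, that the four RGMII values are declared contiguously.

```c
#include <stdbool.h>

/* Hypothetical, shortened mode list; only the ordering matters here. */
enum mock_phy_mode {
	MOCK_MODE_MII,
	MOCK_MODE_GMII,
	MOCK_MODE_SGMII,
	MOCK_MODE_RGMII,	/* no internal delay */
	MOCK_MODE_RGMII_ID,	/* RX and TX delay */
	MOCK_MODE_RGMII_RXID,	/* RX delay only */
	MOCK_MODE_RGMII_TXID,	/* TX delay only */
	MOCK_MODE_RTBI,
};

/* True for any of the four RGMII variants, relying on their contiguity. */
static bool mock_interface_is_rgmii(enum mock_phy_mode mode)
{
	return mode >= MOCK_MODE_RGMII && mode <= MOCK_MODE_RGMII_TXID;
}
```

The trade-off of a range test is that a mode inserted between the RGMII
entries later would silently start matching, whereas four explicit
comparisons would not.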



[PATCH net-next 0/2] net: phy: phy_interface_is_rgmii helper

2015-05-26 Thread Florian Fainelli
Hi David,

As you suggested, here is the helper function to avoid missing some RGMII
interface checks. Had to wait for net to be merged in net-next to avoid
submitting the same patch/commit.

Dan, you might want to rebase your dp83867 submission to use that helper
once this patchset gets merged into net-next, thanks!

Florian Fainelli (2):
  net: phy: Add phy_interface_is_rgmii helper
  net: phy: Utilize phy_interface_is_rgmii

 drivers/net/phy/icplus.c  |  5 +
 drivers/net/phy/marvell.c | 10 ++
 drivers/net/phy/phy.c |  3 +--
 include/linux/phy.h   | 11 +++
 4 files changed, 15 insertions(+), 14 deletions(-)

-- 
2.1.0



Re: [PATCH iproute2] ss: add support for segs_in and segs_out

2015-05-26 Thread Marcelo Ricardo Leitner
On Tue, May 26, 2015 at 02:54:41PM -0400, Craig Gallek wrote:
> Two new tcp_info fields: tcpi_segs_in and tcpi_segs_out.
> (2efd055c53c06b7e89c167c98069bab9afce7e59)
> 
> ~: ss -ti src :22
>cubic wscale:7,6 rto:201 rtt:0.244/0.012 ato:40 mss:1418 cwnd:21 
> bytes_acked:80605 bytes_received:20491 segs_out:414 segs_in:600 send 
> 976.3Mbps lastsnd:23 lastrcv:23 lastack:22 pacing_rate 1952.7Mbps rcv_rtt:98 
> rcv_space:28960
> 
> Signed-off-by: Craig Gallek 

Cool, thanks Craig.

Reviewed-by: Marcelo Ricardo Leitner 

> ---
>  include/linux/tcp.h | 4 +++-
>  misc/ss.c   | 8 
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 8b17cff..1e9b4a6 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -192,8 +192,10 @@ struct tcp_info {
>  
>   __u64   tcpi_pacing_rate;
>   __u64   tcpi_max_pacing_rate;
> - __u64   tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
> + __u64   tcpi_bytes_acked;/* RFC4898 tcpEStatsAppHCThruOctetsAcked */
>   __u64   tcpi_bytes_received; /* RFC4898 
> tcpEStatsAppHCThruOctetsReceived */
> + __u32   tcpi_segs_out;   /* RFC4898 tcpEStatsPerfSegsOut */
> + __u32   tcpi_segs_in;/* RFC4898 tcpEStatsPerfSegsIn */
>  };
>  
>  /* for TCP_MD5SIG socket option */
> diff --git a/misc/ss.c b/misc/ss.c
> index dba0901..3e01f88 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -769,6 +769,8 @@ struct tcpstat
>   double  pacing_rate_max;
>   unsigned long long  bytes_acked;
>   unsigned long long  bytes_received;
> + unsigned intsegs_out;
> + unsigned intsegs_in;
>   unsigned intunacked;
>   unsigned intretrans;
>   unsigned intretrans_total;
> @@ -1695,6 +1697,10 @@ static void tcp_stats_print(struct tcpstat *s)
>   printf(" bytes_acked:%llu", s->bytes_acked);
>   if (s->bytes_received)
>   printf(" bytes_received:%llu", s->bytes_received);
> + if (s->segs_out)
> + printf(" segs_out:%u", s->segs_out);
> + if (s->segs_in)
> + printf(" segs_in:%u", s->segs_in);
>  
>   if (s->dctcp && s->dctcp->enabled) {
>   struct dctcpstat *dctcp = s->dctcp;
> @@ -1990,6 +1996,8 @@ static void tcp_show_info(const struct nlmsghdr *nlh, 
> struct inet_diag_msg *r,
>   }
>   s.bytes_acked = info->tcpi_bytes_acked;
>   s.bytes_received = info->tcpi_bytes_received;
> + s.segs_out = info->tcpi_segs_out;
> + s.segs_in = info->tcpi_segs_in;
>   tcp_stats_print(&s);
>   if (s.dctcp)
>   free(s.dctcp);
> -- 
> 2.2.0.rc0.207.ga3a616c
> 


Re: "ip netns create" hangs forever, spamming console with "unregister_netdevice: waiting for lo to become free"

2015-05-26 Thread Zack Weinberg
On Tue, May 26, 2015 at 12:21 PM, Zack Weinberg  wrote:
> I have an application that makes heavy use of network namespaces,
> creating and destroying them on the fly during operation.  With 100%
> reproducibility, the first invocation of "ip netns create" after any
> "ip netns del" hangs forever in D-state; only rebooting the machine
> clears the condition.

Following up to myself to say that reproduction is not as simple as
'ip netns add test; ip netns del test; ip netns add test2'.  In fact,
not even bringing the namespace (and all associated interfaces) up and
then down again _exactly_ as my production code does it will trigger
the bug.  It appears to be necessary to push a significant amount of
data through interfaces attached to the namespace.

Since I had to reset the machine to attempt to create a repro recipe,
I can no longer perform diagnostics on the hung processes, but I've
restarted the application and it should reach the problem state again
in a day or two.

zw


[PATCH iproute2] ss: add support for segs_in and segs_out

2015-05-26 Thread Craig Gallek
Two new tcp_info fields: tcpi_segs_in and tcpi_segs_out.
(2efd055c53c06b7e89c167c98069bab9afce7e59)

~: ss -ti src :22
 cubic wscale:7,6 rto:201 rtt:0.244/0.012 ato:40 mss:1418 cwnd:21 
bytes_acked:80605 bytes_received:20491 segs_out:414 segs_in:600 send 976.3Mbps 
lastsnd:23 lastrcv:23 lastack:22 pacing_rate 1952.7Mbps rcv_rtt:98 
rcv_space:28960

Signed-off-by: Craig Gallek 
---
 include/linux/tcp.h | 4 +++-
 misc/ss.c   | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 8b17cff..1e9b4a6 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -192,8 +192,10 @@ struct tcp_info {
 
__u64   tcpi_pacing_rate;
__u64   tcpi_max_pacing_rate;
-   __u64   tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
+   __u64   tcpi_bytes_acked;/* RFC4898 tcpEStatsAppHCThruOctetsAcked */
__u64   tcpi_bytes_received; /* RFC4898 
tcpEStatsAppHCThruOctetsReceived */
+   __u32   tcpi_segs_out;   /* RFC4898 tcpEStatsPerfSegsOut */
+   __u32   tcpi_segs_in;/* RFC4898 tcpEStatsPerfSegsIn */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/misc/ss.c b/misc/ss.c
index dba0901..3e01f88 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -769,6 +769,8 @@ struct tcpstat
double  pacing_rate_max;
unsigned long long  bytes_acked;
unsigned long long  bytes_received;
+   unsigned intsegs_out;
+   unsigned intsegs_in;
unsigned intunacked;
unsigned intretrans;
unsigned intretrans_total;
@@ -1695,6 +1697,10 @@ static void tcp_stats_print(struct tcpstat *s)
printf(" bytes_acked:%llu", s->bytes_acked);
if (s->bytes_received)
printf(" bytes_received:%llu", s->bytes_received);
+   if (s->segs_out)
+   printf(" segs_out:%u", s->segs_out);
+   if (s->segs_in)
+   printf(" segs_in:%u", s->segs_in);
 
if (s->dctcp && s->dctcp->enabled) {
struct dctcpstat *dctcp = s->dctcp;
@@ -1990,6 +1996,8 @@ static void tcp_show_info(const struct nlmsghdr *nlh, 
struct inet_diag_msg *r,
}
s.bytes_acked = info->tcpi_bytes_acked;
s.bytes_received = info->tcpi_bytes_received;
+   s.segs_out = info->tcpi_segs_out;
+   s.segs_in = info->tcpi_segs_in;
tcp_stats_print(&s);
if (s.dctcp)
free(s.dctcp);
-- 
2.2.0.rc0.207.ga3a616c



Re: [PATCH] net: phy: dp83867: Add TI dp83867 phy

2015-05-26 Thread Dan Murphy
Florian

Thanks for the review!

On 05/26/2015 01:04 PM, Florian Fainelli wrote:
> On 26/05/15 06:07, Dan Murphy wrote:
>> Add support for the TI dp83867 Gigabit ethernet phy
>> device.
>>
>> The DP83867 is a robust, low power, fully featured
>> Physical Layer transceiver with integrated PMD
>> sublayers to support 10BASE-T, 100BASE-TX and
>> 1000BASE-T Ethernet protocols.
>>
>> Signed-off-by: Dan Murphy 
>> ---
> [snip]
>
>> +
>> +int rx_tx_delay = (DP83867_RGMIIDCTL_2_75_NS << 
>> DP83867_RGMII_RX_CLK_DELAY_SHIFT) | DP83867_RGMIIDCTL_2_25_NS;
>> +module_param(rx_tx_delay, int, 0664);
> This is not going to work, rx and tx delays are inherent properties of
> PCB/board designs, you want to be able to get that value from your
> platform configuration, Device Tree would certainly be preferred here.
> Asking a user to figure this out through module parameters is going to
> be both error prone, and limiting yourself to no more than one instance.

Yeah, I agree.  I also have platform data for legacy products that I could put
in the DT as well.

> [snip]
>
>> +
>> +static int dp83867_phy_reset(struct phy_device *phydev)
>> +{
>> +int err;
>> +
>> +err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESET);
>> +if (err < 0)
>> +return err;
>> +
>> +err = dp83867_config_init(phydev);
>> +return err;
> you could do a tail-call return directly?

Yep.  Will change that

>
> [snip]
>
>> +
>> +static int __init dp83867_init(void)
>> +{
>> +return phy_driver_register(&dp83867_driver);
>> +}
>> +
>> +static void __exit dp83867_exit(void)
>> +{
>> +phy_driver_unregister(&dp83867_driver);
>> +}
>> +
>> +module_init(dp83867_init);
>> +module_exit(dp83867_exit);
> You could use module_phy_driver to eliminate some boilerplate here.

Nice I will do that.

Dan

-- 
--
Dan Murphy



Re: [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM

2015-05-26 Thread Jason Gunthorpe
On Tue, May 26, 2015 at 01:46:36PM -0400, Doug Ledford wrote:

> > Remember, this isn't RDMA namespaces, this is netdev namespace support
> > for RDMA-CM -> very different things.
> 
> That was the point of my email.  This is a very myopic view of the
> feature.  It *should* at least have an idea of these other things too.

Everything you talked about seems covered: iwarp/roce/ib now have a
fairly clear uniform story for CM. usNIC doesn't use core code.

I doubt a larger discussion about a 'rdma namespace' is going to
substantially change these patches, they are really netdev focused.
Anyhow, I've been saving that discussion for when the roce and
umad/uverbs namespace stuff is re-sent. It seems more appropriate at
that point.

I don't know about you, but I am exhausted looking at these huge patch
sets, and narrowing the focus is the only way I see to get through.
This series has hopefully narrowed to: 'fix the flow in netdev
handling for rdma-cm'.

Jason


Re: [PATCH net-next] test_bpf: add similarly conflicting jump test case only for classic

2015-05-26 Thread Daniel Borkmann

On 05/26/2015 07:45 PM, Daniel Borkmann wrote:

While 3b52960266a3 ("test_bpf: add more eBPF jump torture cases")
added the int3 bug test case only for eBPF, which needs exactly 11
passes to converge, here's a version for classic BPF that would

...

Noticed a newline accidentally slipped in, please ignore this patch,
will fix it.


Re: [PATCH] net: tcp: Fix a PTO timing granularity issue

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 13:55 -0400, Ido Yariv wrote:

> 
> The platform this was tested on was an embedded platform with a wifi
> module (11n, 20MHZ). The other end was a computer running Windows, and
> the benchmarking software was IxChariot.
> The whole setup was running in a shielded box with minimal
> interferences.
> 
> As it seems, the throughput was limited by the congestion window.
> Further analysis led to TLP - the fact that its timer was expiring
> prematurely impacted cwnd, which in turn prevented the wireless driver
> from having enough skbs to buffer and send.
> 
> Increasing the size of the chunks being sent had a similar impact on
> throughput, presumably because the congestion window had enough time to
> increase.
> 
> Changing the congestion window to Westwood from cubic/reno also had a
> similar impact on throughput.
> 

Have you tested what results you get with TLP completely disabled?

Maybe a timer of 10 to 20ms is too short anyway in your testbed.





Re: [PATCH v2 net-next 0/3] net: Add incoming CPU mask to sockets

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 11:00 -0700, Tom Herbert wrote:
> On Tue, May 26, 2015 at 10:18 AM, Eric Dumazet  wrote:
> > On Tue, 2015-05-26 at 09:34 -0700, Tom Herbert wrote:
> >> Added matching of CPU to a socket CPU mask. This is useful for TCP
> >> listeners and unconnected UDP. This works with SO_REUSEPORT to steer
> >> packets to listener sockets based on CPU affinity. These patches
> >> allow steering packets to listeners based on numa locality. This is
> >> only useful for passive connections.
> >>
> >> v2:
> >>   - Add cache alignment for fields used in socket lookup in sock_common
> >>   - Added UDP test results
> >
> > What about the feedback I gave earlier Tom ???
> >
> > This cannot work for TCP in its current state.
> >
> It does work and it fixes cache server locality issues we are seeing.
> Right now half of our connections are persistently crossing numa nodes
> on receive-- this is having big negative impact. Yes, there may be
> edge conditions where SYN goes to a different CPU than the rest of the
> flow (probably need RFS or flow director for that problem), and that
> sounds like something nice to fix, but this patch is not dependent on
> that. Besides, did you foresee an API change would be required?

With current stack, there is no guarantee SYN and ACK packets are
handled by same cpu.

These are not edge conditions but real ones, even with RFS.

Not everyone tweaks /proc/irq/*/smp_affinity

The default still allows the handling cpu to be almost random (affinity=fff)

That was partly for these reasons that SO_REUSEPORT (for TCP) could not
use cpu number, but a flow hash to select the target socket.
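
To illustrate why a flow hash gives a stable choice where a cpu number does
not, here is a minimal sketch. The hash below is made up for the example and
the names (flow_tuple, mock_flow_hash, pick_listener) are hypothetical; the
kernel's own hashing and socket lookup are not reproduced here. The point is
only that the SYN and the later ACK of one connection carry the same 4-tuple,
so they select the same listener whichever cpu happens to process each packet.

```c
#include <stdint.h>

/* The fields that identify a flow; identical for the SYN and the ACK. */
struct flow_tuple {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* Toy hash for illustration only, not the kernel's flow hash. */
static uint32_t mock_flow_hash(const struct flow_tuple *t)
{
	uint32_t h = t->saddr;

	h = h * 31u + t->daddr;
	h = h * 31u + t->sport;
	h = h * 31u + t->dport;
	return h;
}

/* Pick one of nr_listeners sockets; the result depends on the flow only. */
static unsigned int pick_listener(const struct flow_tuple *t,
				  unsigned int nr_listeners)
{
	return mock_flow_hash(t) % nr_listeners;
}
```

A selection keyed on the current cpu would, by contrast, return different
listeners for two packets of the same flow whenever they land on different
cpus.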





Re: [PATCH] net: tcp: Fix a PTO timing granularity issue

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 13:55 -0400, Ido Yariv wrote:
> Hi Eric,
> 

> 
> I understand, and I also suspect that having it expire after 9ms will
> have very little impact, if at all.
> 
> Since it mainly affects HZ=100 systems, we can simply go with having at
> least 2 jiffies on these systems, and leave everything else as is.
> 
> However, if the 10ms has a special meaning (couldn't find reasoning for
> it in the RFC), making sure this timer doesn't expire prematurely could
> be beneficial. I'm afraid this was not tested on the setup mentioned
> above though.
> 


The RFC did not explain how the 10ms delay was implemented. This is the
dark side of RFCs: they are full of 'unsaid things', like OS bugs.

What is not said in the TLP paper is: linux timers have a 'jiffy'
granularity that might be 1/100, 1/250, 1/1000, or even 1/64 on Alpha
processors...

The fact is: we did the TLP implementation, the experiments and the paper
at the same time, and we do not want to change the current behavior on
HZ=1000 hosts. This is the kind of change that would require a lot of
tests for Google.

Please resend your patch so that only HZ <= 100 is changed, we will
happily acknowledge it.

Thanks
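
The arithmetic behind this request can be sketched in user space as follows.
This is only an illustration under stated assumptions: ms_to_ticks and
pto_min_ticks are hypothetical helpers, not kernel functions, and the clamp
shown is one possible reading of "at least 2 jiffies on HZ <= 100 systems",
not the patch that was eventually sent. With HZ=100 a 10ms delay is a single
tick, and a one-tick timer can fire almost immediately if the current tick is
nearly over, which is why the floor is raised there and nowhere else.

```c
/* Convert milliseconds to timer ticks for a given tick rate, rounding up. */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}

/*
 * Hypothetical minimum probe timeout in ticks for a 10ms target.
 * Only coarse-tick systems (hz <= 100) are changed; others keep 10ms.
 */
static unsigned long pto_min_ticks(unsigned int hz)
{
	unsigned long ticks = ms_to_ticks(10, hz);

	if (hz <= 100 && ticks < 2)
		ticks = 2;
	return ticks;
}
```

Under these assumptions HZ=1000 keeps its 10 ticks (10ms) while HZ=100 moves
from 1 tick to 2, so the timer can no longer expire before roughly 10ms have
elapsed.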




Re: [PATCH] net: phy: dp83867: Add TI dp83867 phy

2015-05-26 Thread Florian Fainelli
On 26/05/15 06:07, Dan Murphy wrote:
> Add support for the TI dp83867 Gigabit ethernet phy
> device.
> 
> The DP83867 is a robust, low power, fully featured
> Physical Layer transceiver with integrated PMD
> sublayers to support 10BASE-T, 100BASE-TX and
> 1000BASE-T Ethernet protocols.
> 
> Signed-off-by: Dan Murphy 
> ---
[snip]

> +
> +int rx_tx_delay = (DP83867_RGMIIDCTL_2_75_NS << DP83867_RGMII_RX_CLK_DELAY_SHIFT) | DP83867_RGMIIDCTL_2_25_NS;
> +module_param(rx_tx_delay, int, 0664);

This is not going to work. RX and TX delays are inherent properties of the
PCB/board design, so you want to get that value from your platform
configuration; Device Tree would certainly be preferred here. Asking a
user to figure this out through module parameters is both error prone and
limits you to no more than one instance.

[snip]

> +
> +static int dp83867_phy_reset(struct phy_device *phydev)
> +{
> + int err;
> +
> + err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESET);
> + if (err < 0)
> + return err;
> +
> + err = dp83867_config_init(phydev);
> + return err;

You could return the result of dp83867_config_init() directly here?
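
As a rough illustration of that suggestion, here is a userspace sketch of
the same control flow. The types and helpers (fake_phy, fake_phy_write,
fake_config_init, fake_phy_reset) are hypothetical stand-ins so that it
compiles outside the kernel; they are not the driver's real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct phy_device: each field holds the
 * value the corresponding fake helper should return. */
struct fake_phy {
	int write_err;
	int init_err;
};

/* Stand-in for phy_write(): 0 on success, negative errno on failure. */
static int fake_phy_write(struct fake_phy *phy)
{
	return phy->write_err;
}

/* Stand-in for dp83867_config_init(). */
static int fake_config_init(struct fake_phy *phy)
{
	return phy->init_err;
}

static int fake_phy_reset(struct fake_phy *phy)
{
	int err;

	assert(phy != NULL);

	err = fake_phy_write(phy);
	if (err < 0)
		return err;

	/* Return the callee's result directly; no temporary is needed. */
	return fake_config_init(phy);
}
```

The behavior is unchanged: an error from the register write is returned
early, and otherwise the caller sees whatever the init step returned.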

[snip]

> +
> +static int __init dp83867_init(void)
> +{
> + return phy_driver_register(&dp83867_driver);
> +}
> +
> +static void __exit dp83867_exit(void)
> +{
> + phy_driver_unregister(&dp83867_driver);
> +}
> +
> +module_init(dp83867_init);
> +module_exit(dp83867_exit);

You could use module_phy_driver to eliminate some boilerplate here.
-- 
Florian


RE: [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM

2015-05-26 Thread Christian Benvenuti (benve)
> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
> Behalf Of Doug Ledford
> Sent: Tuesday, May 26, 2015 6:35 AM
> To: Haggai Eran
> Cc: linux-r...@vger.kernel.org; netdev@vger.kernel.org; Liran Liss; Guy
> Shapiro; Shachar Raindel; Yotam Kenneth
> Subject: Re: [PATCH v4 for-next 00/12] Add network namespace support in the
> RDMA-CM

...

> I don't think this is an issue for usNIC as if you
> want namespace support there, you just start the user space app in a given
> namespace and you are probably 90% of the way there since the user space
> application gets its own device and so its own MAC/IP and all of the RDMA
> transfers are UDP, so the application's namespace should get inherited by all
> the rest, but Cisco would need to confirm that, hence why I say 90% of the way
> there, it needs confirmed).

This is correct. 

Thanks
/Chris


RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF

2015-05-26 Thread Rose, Gregory V


> -----Original Message-----
> From: Skidmore, Donald C
> Sent: Tuesday, May 26, 2015 10:46 AM
> To: Hiroshi Shimamoto; Rose, Gregory V; Kirsher, Jeffrey T; intel-wired-
> l...@lists.osuosl.org
> Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List; Choi, Sy
> Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz;
> sassm...@redhat.com
> Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
> 
> 

[snip]

> 
> > -----Original Message-----
> > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > Sent: Monday, May 25, 2015 6:00 PM
> > To: Skidmore, Donald C; Rose, Gregory V; Kirsher, Jeffrey T;
> > intel-wired- l...@lists.osuosl.org
> > Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List; Choi,
> > Sy Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz;
> > sassm...@redhat.com
> > Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
> >
> >
> > Do you mean that VF should care about it is trusted or not?
> > Should VF request MC Promisc again when it's trusted?
> > Or, do you mean VF never be trusted during its (or VM's) lifetime?
> 
> I think the VF shouldn't directly know whether it is trusted or not

That's completely irrelevant.  The person administering the PF will be the
person who provided trusted privileges to the VF.  He'll then *tell* or
otherwise communicate this to the person administering the VF (probably
himself/herself), who will then proceed to execute commands on that VF that
require trusted privileges.

If the VF does not have trusted privileges then the commands to add VLAN 
filters, set promiscuous modes, and any other privileged commands will fail.

Let's not get too fancy with this.  It's simple - the host VMM admin provides 
trusted privileges to the VF.  The person administering the VF (if in fact it 
is not the same person, it usually will be) will proceed to do things that 
require VF trusted privileges.  


> It should request MC Promisc and get it if it is trusted and not if it is not
> trusted.  So if you (as the system admin) know you have a VF that will need
> to request MC Promisc, make sure you promote that VF to trusted before
> assigning it to a VM.  That way when it requests MC Promisc the PF will be
> able to grant it.
> 

Multicast promiscuous should be allowed for the VFs.  We already allow VFs to
set whatever multicast filters they want, so if they want to go into MPE, then
so what?  We don't care.  It's not a security risk.  Right now, without any
modification, the VF can set 30 multicast filters and listen.  It can then
remove those and set another 30 filters and listen.  And so on and so on.  So
if a VF can already listen on any MC filter it wants, then why this artificial
restriction on MC promiscuous mode?

We don't care about this case. Unicast promiscuous is the security risk and I 
think we've handled that.

> 
> >
> > And what do you think about being untrusted from trusted state?
> 
> This is an interesting question.  If we allowed a VM to go from trusted ->
> untrusted we would have to turn off any "special" configuration that
> trusted allowed.  Maybe in such cases we could reset the PF?  And of
> course require all the "special" configuration (MC Promisc) to default to
> off after being reset.
> 

To remove privileges from a VF that you've already set to privileged will
require destruction of the VF VSI and a VFLR to the VF - after it comes up it
can't do any further privileged operations.

[snip]

> This too is a valid point.  Currently we would just not do it (MC Promisc)
> and the VF would have to figure that out for itself.  Passing a NAK back
> to the VF might be nicer. :)  Of course I assumed the sysadm would know
> that he/she wanted to give a VF trusted status and would do that before
> the VF was even assigned to a VM, so the issue would never come up.  Maybe
> that is not valid for your use case?

Let's not worry about MC promiscuous mode.  As I pointed out above we already 
let VFs set any MC address filters they want so that horse has already left the 
barn.

Focus on getting the VF privileged mode configuration going and then we're well 
on our way to accomplishing what we need to do.

- Greg



Re: [RFC V7 PATCH 7/7] vhost_net: add interrupt coalescing support

2015-05-26 Thread Stephen Hemminger
On Mon, 25 May 2015 01:24:04 -0400
Jason Wang wrote:

> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/net.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7d137a4..5ee28b7 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -320,6 +320,9 @@ static void handle_tx(struct vhost_net *net)
>   hdr_size = nvq->vhost_hlen;
>   zcopy = nvq->ubufs;
>  
> + /* Finish pending interrupts first */
> + vhost_check_coalesce_and_signal(vq->dev, vq, false);
> +
>   for (;;) {
>   /* Release DMAs done buffers first */
>   if (zcopy)
> @@ -415,6 +418,7 @@ static void handle_tx(struct vhost_net *net)
>   }
>   }
>  out:
> + vhost_check_coalesce_and_signal(vq->dev, vq, true);
>   mutex_unlock(&vq->mutex);
>  }
>  
> @@ -554,6 +558,9 @@ static void handle_rx(struct vhost_net *net)
>   vq->log : NULL;
>   mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>  
> + /* Finish pending interrupts first */
> + vhost_check_coalesce_and_signal(vq->dev, vq, false);
> +
>   while ((sock_len = peek_head_len(sock->sk))) {
>   sock_len += sock_hlen;
>   vhost_len = sock_len + vhost_hlen;
> @@ -638,6 +645,7 @@ static void handle_rx(struct vhost_net *net)
>   }
>   }
>  out:
> + vhost_check_coalesce_and_signal(vq->dev, vq, true);
>   mutex_unlock(&vq->mutex);
>  }
>  

Could you implement ethtool control of these coalescing parameters?


Re: [PATCH v2 net-next 1/3] net: Add cache alignment in sock_common for socket lookup fields

2015-05-26 Thread Tom Herbert
On Tue, May 26, 2015 at 10:54 AM, Eric Dumazet wrote:
> On Tue, 2015-05-26 at 10:19 -0700, Eric Dumazet wrote:
>
>> No, we do not want to increase the size of sock_common with such a
>> hammer.
>
> Current sizeof(struct sock_common) is 0x78 bytes.
>
> So moving 2 read_mostly pointers into this structure would be enough to
> make 2 first cache lines read mostly.
>
Right, the problem is getting refcnt out of those cache lines.

> One candidate would be sk_rx_dst, as it is used in early demux.
>
>


Re: [PATCH v2 net-next 0/3] net: Add incoming CPU mask to sockets

2015-05-26 Thread Tom Herbert
On Tue, May 26, 2015 at 10:18 AM, Eric Dumazet wrote:
> On Tue, 2015-05-26 at 09:34 -0700, Tom Herbert wrote:
>> Added matching of CPU to a socket CPU mask. This is useful for TCP
>> listeners and unconnected UDP. This works with SO_REUSEPORT to steer
>> packets to listener sockets based on CPU affinity. These patches
>> allow steering packets to listeners based on numa locality. This is
>> only useful for passive connections.
>>
>> v2:
>>   - Add cache alignment for fields used in socket lookup in sock_common
>>   - Added UDP test results
>
> What about the feedback I gave earlier Tom ???
>
> This cannot work for TCP in its current state.
>
It does work, and it fixes the cache server locality issues we are seeing.
Right now half of our connections are persistently crossing NUMA nodes
on receive; this is having a big negative impact. Yes, there may be
edge conditions where the SYN goes to a different CPU than the rest of the
flow (we probably need RFS or flow director for that problem), and that
sounds like something nice to fix, but this patch is not dependent on
that. Besides, did you foresee that an API change would be required?

>


Re: Socket receives packet to multicast group to which it was not joined since kernel 3.13.10-1

2015-05-26 Thread Eric Dumazet
On Tue, 2015-05-26 at 12:41 -0500, Shawn Bohrer wrote:
> On Sun, May 24, 2015 at 04:55:40AM +, Oliver Graff wrote:
> > Shawn Bohrer  gmail.com> writes:
> > 
> > > 
> > > On Tue, May 13, 2014 at 04:36:41PM -0500, Shawn Bohrer wrote:
> > 
> > > > If I did "break" something here it appears to be because with
> > > > ip_early_demux we only call ip_check_mc_rcu() once on the initial
> > > > packet, and subsequent packets destined for that socket simply need to
> > > > pass the __udp_is_mcast_sock() test.  With ip_early_demux disabled we
> > > > call ip_check_mc_rcu() for every packet.
> > > 
> > > So is the solution to effectively call ip_check_mc_rcu() inside
> > > udp_v4_early_demux()?
> > > 
> > > --
> > > Shawn
> > > 
> > (See earlier parts of thread here:
> > http://netdev.vger.kernel.narkive.com/mwurLsMT/socket-receives-packet-to-multicast-group-to-which-it-was-not-joined-since-kernel-3-13-10-1)
> > 
> > Was a bug report filed for this / was this fixed?
> > I looked at the diffs from the referenced commit and the current version of
> > udp_v4_early_demux and the changes appear to be minimal;
> > it doesn't look like ip_check_mc_rcu was added. Is that all that's needed?
> 
> Hi Oliver,
> 
> I haven't tested but looking over the code I would say that this has
> not been fixed.  Is this issue impacting you?
> 
> I think calling ip_check_mc_rcu (you might need some of the code from
> ip_route_input_noref too) might "fix" the issue, though I have not
> tried.  My concern is that early demux is a performance optimization
> and I don't know how much this would impact multicast receive
> performance.
> 
> As Eric mentioned in the earlier email if you don't care about
> performance you can simply disable early demux:
> 
> echo 0 >/proc/sys/net/ipv4/ip_early_demux

Well, we should fix the bug, whatever this sysctl value is ;)



