Re: [PATCH/RFC net-next] rocker: forward packets to CPU when a port in promiscuous mode

2015-07-13 Thread Scott Feldman
On Wed, Jul 8, 2015 at 9:25 PM, Simon Horman  wrote:
> This change allows the CPU to see all packets seen by a port when the
> netdev associated with the port is in promiscuous mode.
>
> This change was previously posted as part of a larger patch and in turn
> patchset which also aimed to allow rocker interfaces to receive packets
> when not bridged. That problem has subsequently been addressed in a
> different way by Scott Feldman.
>
> When this change was previously posted Scott indicated that he had some
> reservations about sending all packets from a switch to the CPU. The
> purpose of posting this patch is to start discussion of whether this
> approach is appropriate and if not how else we might move forwards.
>
> In my opinion if the host doesn't want all packets it shouldn't put a port
> in promiscuous mode. But perhaps that is an overly naïve view to take.
>
> My main motivation for this change at this time is to allow rocker to
> work with Open vSwitch and it appears that this change is sufficient to
> reach that goal. Another approach might be to teach
> rocker_port_master_changed() about Open vSwitch.
>
> In the longer term I believe Open vSwitch should be able to program
> flows into rocker 'hardware' and thus not all packets would reach the CPU.

Hi Simon,

I like your alternate approach to teach rocker about Open vSwitch
using rocker_port_master_changed(), and only when the port is captured by
OVS would we install the "promisc" filter to pass all traffic up.
(Maybe call it the ROCKER_CTRL_DFLT_OVS rule?).

Putting a non-bridged, non-ovs port into promisc is kind of weird for
a switch port.  I think of the port in L3 mode by default, where the
port is locked down for all but some selective mcasts, and only opened
up by installing explicit routes.  (Unlike a bridged port where we
flood everything L2 we don't understand).

So maybe first pass is to pass up everything when port is captured by
OVS, and then later refine what's passed up per ovs flows on that
port.

-scott
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net] fq_codel: fix a use-after-free

2015-07-13 Thread Eric Dumazet
On Mon, 2015-07-13 at 12:30 -0700, Cong Wang wrote:
> Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines")
> Cc: John Fastabend 
> Signed-off-by: Cong Wang 
> Signed-off-by: Cong Wang 
> ---
>  net/sched/sch_fq_codel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index d75993f..06e7c84 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -155,10 +155,10 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
>   skb = dequeue_head(flow);
>   len = qdisc_pkt_len(skb);
>   q->backlogs[idx] -= len;
> - kfree_skb(skb);
>   sch->q.qlen--;
>   qdisc_qstats_drop(sch);
>   qdisc_qstats_backlog_dec(sch, skb);
> + kfree_skb(skb);
>   flow->dropped++;
>   return idx;
>  }

Thanks Cong

Acked-by: Eric Dumazet 
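
For readers following along, the ordering rule behind this fix can be shown outside the kernel. What follows is only a minimal user-space sketch, not the kernel code: `struct pkt`, `struct queue`, `queue_push()` and `queue_drop_head()` are hypothetical stand-ins for `sk_buff`, `Qdisc` and `fq_codel_drop()`. The point is the same as in the patch: every read of the packet (here its length, for backlog accounting) has to happen before the packet is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for sk_buff and Qdisc; these are not kernel types. */
struct pkt {
	struct pkt *next;
	unsigned int len;
};

struct queue {
	struct pkt *head;
	unsigned int qlen;     /* packets queued */
	unsigned int backlog;  /* bytes queued */
	unsigned int drops;    /* packets dropped so far */
};

/* Queue a packet of the given length at the head; returns 0 or -1. */
static int queue_push(struct queue *q, unsigned int len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->len = len;
	p->next = q->head;
	q->head = p;
	q->qlen++;
	q->backlog += len;
	return 0;
}

/* Drop the head packet; returns its length, or 0 if the queue is empty. */
static unsigned int queue_drop_head(struct queue *q)
{
	struct pkt *p = q->head;
	unsigned int len;

	if (!p)
		return 0;
	q->head = p->next;
	len = p->len;

	/* All accounting that reads the packet happens first ... */
	q->qlen--;
	q->drops++;
	q->backlog -= p->len;

	/* ... and the packet is freed only after the last access. */
	free(p);
	return len;
}
```

Freeing `p` before the `q->backlog -= p->len` line would be the user-space analogue of the bug: the code would usually still appear to work, which is why this kind of ordering mistake is easy to miss.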




Re: net: Fix skb csum races when peeking

2015-07-13 Thread Eric Dumazet
On Mon, 2015-07-13 at 20:01 +0800, Herbert Xu wrote:

> ---8<---
> When we calculate the checksum on the recv path, we store the
> result in the skb as an optimisation in case we need the checksum
> again down the line.
> 
> This is in fact bogus for the MSG_PEEK case as this is done without
> any locking.  So multiple threads can peek and then store the result
> to the same skb, potentially resulting in bogus skb states.
> 
> This patch fixes this by only storing the result if the skb is not
> shared.  This preserves the optimisations for the few cases where
> it can be done safely due to locking or other reasons, e.g., SIOCINQ.
> 
> Signed-off-by: Herbert Xu 
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index b80fb91..4967262 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -622,7 +657,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
>   !skb->csum_complete_sw)
>   netdev_rx_csum_fault(skb->dev);
>   }
> - skb->csum_valid = !sum;
> + if (!skb_shared(skb))
> + skb->csum_valid = !sum;
>   return sum;
>  }
>  EXPORT_SYMBOL(__skb_checksum_complete_head);
> @@ -642,11 +678,13 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>   netdev_rx_csum_fault(skb->dev);
>   }
>  
> - /* Save full packet checksum */
> - skb->csum = csum;
> - skb->ip_summed = CHECKSUM_COMPLETE;
> - skb->csum_complete_sw = 1;
> - skb->csum_valid = !sum;
> + if (!skb_shared(skb)) {
> + /* Save full packet checksum */
> + skb->csum = csum;
> + skb->ip_summed = CHECKSUM_COMPLETE;
> + skb->csum_complete_sw = 1;
> + skb->csum_valid = !sum;
> + }
>  
>   return sum;
>  }

Acked-by: Eric Dumazet 

Thanks !
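
As an illustration of the idea in the patch (store the computed result only when the buffer is not shared), here is a small user-space sketch. It is an assumption-laden model, not the kernel implementation: `struct buf`, `buf_shared()`, `buf_sum()` and `buf_checksum()` are hypothetical names, and the reference count is a plain `int` where the kernel uses an atomic counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sk_buff; this is not a kernel type. */
struct buf {
	const uint8_t *data;
	size_t len;
	int users;          /* reference count; anything other than 1 means shared */
	bool csum_cached;   /* true once csum holds a stored result */
	uint16_t csum;
};

static bool buf_shared(const struct buf *b)
{
	return b->users != 1;
}

/* 16-bit ones' complement sum over the payload, big-endian word order. */
static uint16_t buf_sum(const struct buf *b)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < b->len; i += 2)
		sum += (uint32_t)b->data[i] << 8 | b->data[i + 1];
	if (b->len & 1)
		sum += (uint32_t)b->data[b->len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Return the checksum, caching it only if no one else holds the buffer. */
static uint16_t buf_checksum(struct buf *b)
{
	uint16_t sum;

	if (b->csum_cached)
		return b->csum;

	sum = buf_sum(b);

	/* Write back only when no other holder can observe the buffer. */
	if (!buf_shared(b)) {
		b->csum = sum;
		b->csum_cached = true;
	}
	return sum;
}
```

A shared buffer still gets a correct return value; it just pays for the computation again on the next call, which is the trade-off the commit message describes.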




Re: linux-4.2-rc2/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1993: possible bad error checking ?

2015-07-13 Thread Giuseppe CAVALLARO

Hello David

Thanks for looking at this. I'll check the code and fix it if needed,
unless you already have a patch to propose.


Kind Regards
Peppe

On 7/13/2015 11:48 AM, David Binderman wrote:

Hello there,

[linux-4.2-rc2/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1993]: (style) 
Checking if unsigned variable 'entry' is less than zero.

Source code is

 entry = priv->hw->mode->jumbo_frm(priv, skb, csum_insertion);
 if (unlikely(entry < 0))
 goto dma_map_err;

but

 unsigned int entry;

So the error checking from the function call looks broken to me.

If the return value from the call to jumbo_frm is a plain signed
integer, I suggest sanity-checking it *before* assigning it to an
unsigned integer.

Regards

David Binderman
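
The suggestion above could look roughly like the following. This is a user-space sketch only, under the assumption (not confirmed in this thread) that the callback returns a signed int that is negative on error; `fake_jumbo_frm()` and `xmit_checked()` are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback: returns a ring index >= 0, or a negative errno. */
static int fake_jumbo_frm(int fail)
{
	return fail ? -ENOMEM : 5;
}

/*
 * Keep the return value in a signed variable, test it, and only then
 * convert it to the unsigned ring index. Testing "entry < 0" after the
 * assignment to an unsigned int can never be true.
 */
static int xmit_checked(int fail, unsigned int *entry_out)
{
	int ret = fake_jumbo_frm(fail);

	if (ret < 0)
		return ret;             /* error path is reachable */

	*entry_out = (unsigned int)ret;
	return 0;
}
```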







[PATCH net] ipv6: lock socket in ip6_datagram_connect()

2015-07-13 Thread Eric Dumazet
From: Eric Dumazet 

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet 
---
 include/net/ip.h    |    1 +
 net/ipv4/datagram.c |   16 ++++++++++++----
 net/ipv6/datagram.c |   20 +++++++++++++++-----
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 0750a186ea63..d5fe9f2ab699 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -161,6 +161,7 @@ static inline __u8 get_rtconn_flags(struct ipcm_cookie* ipc, struct sock* sk)
 }
 
 /* datagram.c */
+int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 
 void ip4_datagram_release_cb(struct sock *sk);
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 90c0e8386116..574fad9cca05 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 
-int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *usin = (struct sockaddr_in *) uaddr;
@@ -39,8 +39,6 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
sk_dst_reset(sk);
 
-   lock_sock(sk);
-
oif = sk->sk_bound_dev_if;
saddr = inet->inet_saddr;
if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
@@ -82,9 +80,19 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
sk_dst_set(sk, &rt->dst);
err = 0;
 out:
-   release_sock(sk);
return err;
 }
+EXPORT_SYMBOL(__ip4_datagram_connect);
+
+int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+{
+   int res;
+
+   lock_sock(sk);
+   res = __ip4_datagram_connect(sk, uaddr, addr_len);
+   release_sock(sk);
+   return res;
+}
 EXPORT_SYMBOL(ip4_datagram_connect);
 
 /* Because UDP xmit path can manipulate sk_dst_cache without holding
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 62d908e64eeb..b10a88986a98 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -40,7 +40,7 @@ static bool ipv6_mapped_addr_any(const struct in6_addr *a)
return ipv6_addr_v4mapped(a) && (a->s6_addr32[3] == 0);
 }
 
-int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+static int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
struct sockaddr_in6 *usin = (struct sockaddr_in6 *) uaddr;
struct inet_sock*inet = inet_sk(sk);
@@ -56,7 +56,7 @@ int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
if (usin->sin6_family == AF_INET) {
if (__ipv6_only_sock(sk))
return -EAFNOSUPPORT;
-   err = ip4_datagram_connect(sk, uaddr, addr_len);
+   err = __ip4_datagram_connect(sk, uaddr, addr_len);
goto ipv4_connected;
}
 
@@ -98,9 +98,9 @@ int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
sin.sin_addr.s_addr = daddr->s6_addr32[3];
sin.sin_port = usin->sin6_port;
 
-   err = ip4_datagram_connect(sk,
-  (struct sockaddr *) &sin,
-  sizeof(sin));
+   err = __ip4_datagram_connect(sk,
+(struct sockaddr *) &sin,
+sizeof(sin));
 
 ipv4_connected:
if (err)
@@ -204,6 +204,16 @@ out:
fl6_sock_release(flowlabel);
return err;
 }
+
+int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+{
+   int res;
+
+   lock_sock(sk);
+   res = __ip6_datagram_connect(sk, uaddr, addr_len);
+   release_sock(sk);
+   return res;
+}
 EXPORT_SYMBOL_GPL(ip6_datagram_connect);
 
 int ip6_datagram_connect_v6_only(struct sock *sk, struct sockaddr *uaddr,
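
The shape of this change (a locked public entry point wrapping an unlocked helper, with the IPv6 helper calling the unlocked IPv4 helper) can be sketched in user space as below. This is only a model under stated assumptions: `struct fake_sock`, `fake_lock_sock()`, `fake_release_sock()` and the `connect_*` functions are hypothetical, and the "lock" is a flag with assertions rather than a real lock. The `do_` prefix plays the role of the `__` prefix in the patch, since names starting with two underscores are reserved in user-space C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock with a non-recursive lock. */
struct fake_sock {
	bool locked;
	int peer;
};

static void fake_lock_sock(struct fake_sock *sk)
{
	/* A real non-recursive lock would deadlock here instead of asserting. */
	assert(!sk->locked);
	sk->locked = true;
}

static void fake_release_sock(struct fake_sock *sk)
{
	assert(sk->locked);
	sk->locked = false;
}

/* Unlocked helper (the "__" variant in the patch): caller holds the lock. */
static int do_connect_v4(struct fake_sock *sk, int peer)
{
	assert(sk->locked);
	sk->peer = peer;
	return 0;
}

/* Locked entry point for IPv4 callers. */
static int connect_v4(struct fake_sock *sk, int peer)
{
	int res;

	fake_lock_sock(sk);
	res = do_connect_v4(sk, peer);
	fake_release_sock(sk);
	return res;
}

/* Unlocked IPv6 helper: it must use do_connect_v4(), not connect_v4(). */
static int do_connect_v6(struct fake_sock *sk, int peer, bool v4_mapped)
{
	assert(sk->locked);
	if (v4_mapped)
		return do_connect_v4(sk, peer);
	sk->peer = peer;
	return 0;
}

/* Locked entry point for IPv6 callers; the lock covers the whole operation. */
static int connect_v6(struct fake_sock *sk, int peer, bool v4_mapped)
{
	int res;

	fake_lock_sock(sk);
	res = do_connect_v6(sk, peer, v4_mapped);
	fake_release_sock(sk);
	return res;
}
```

If `do_connect_v6()` called `connect_v4()` for the v4-mapped case, the assertion in `fake_lock_sock()` would fire, which corresponds to the reason the patch switches those two call sites to `__ip4_datagram_connect()`.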




Re: [rhashtable] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:301 __debug_object_init()

2015-07-13 Thread Fengguang Wu
- 
> 16  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect good 1d9c5d79e6e4385aea6f69c23ba543717434ed70  # 09:14 22+  0  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
> git bisect good 29afc4e9a408f2304e09c6dd0dbcfbd2356d0faa  # 09:18 22+  0  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
> git bisect  bad fd3e646c87ab3f2ba98aa25394581af27cc78dc5  # 09:27  0- 22  net: act_bpf: fix size mismatch on filter preparation
> git bisect  bad e84448d52190413400663736067f826f28a04ad6  # 09:32  0- 22  xen-netfront: refactor skb slot counting
> git bisect  bad 829a3ada9cc7d4c30fa61f8033403fb6c8f8092a  # 09:38  0-  1  geneve: Simplify locking.
> git bisect good a4c9ea5e8fec680134d22aa99b54d1cd8c226ebd  # 09:42 22+ 12  geneve: Add Geneve GRO support
> git bisect good 255047b0dca31e6b8ce254481a0b65d559d2ebb8  # 09:46 20+  0  Bluetooth: Add timing information to SMP test case runs
> git bisect good 354f473ee2c5d01c1cf90f747f95218ee3e73e95  # 09:52 22+  0  ath9k: fix typo
> git bisect good d312da293f787e1b19c57acb58e8c1b171c4a04a  # 09:59 22+  0  ixgbe: convert to CYCLECOUNTER_MASK macro.
> git bisect good b8e1943e9f754219bcfb40bac4a605b5348acb25  # 10:03 22+  8  rhashtable: Factor out bucket_tail() function
> git bisect  bad f89bd6f87a53ce5a7d60662429591ebac2745c10  # 10:08  0- 22  rhashtable: Supports for nulls marker
> git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa  # 10:12 22+ 13  spinlock: Add spin_lock_bh_nested()
> git bisect  bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1  # 10:16  0- 22  rhashtable: Per bucket locks & deferred expansion/shrinking
> # first bad commit: [97defe1ecf868b8127f8e62395499d6a06e4c4b1] rhashtable: Per bucket locks & deferred expansion/shrinking
> git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa  # 10:19 66+ 27  spinlock: Add spin_lock_bh_nested()
> # extra tests with DEBUG_INFO
> git bisect  bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1  # 10:25  0- 66  rhashtable: Per bucket locks & deferred expansion/shrinking
> # extra tests on HEAD of linux-devel/devel-spot-201507122014
> git bisect good 3afd2c3f65a385c405a084d80431c84b103cb6df  # 10:28 66+ 49  0day head guard for 'devel-spot-201507122014'
> # extra tests on tree/branch linus/master
> git bisect good f760b87f8f12eb262f14603e65042996fe03720e  # 10:33 66+  0  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> # extra tests on tree/branch linus/master
> git bisect good f760b87f8f12eb262f14603e65042996fe03720e  # 10:33 66+  0  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> # extra tests on tree/branch next/master
> git bisect good 2eb62d762a2112579f259903e62ba18d16c51f66  # 10:36 66+ 20  Add linux-next specific files for 20150713
> 
> 
> This script may reproduce the error.
> 
> 
> #!/bin/bash
> 
> kernel=$1
> initrd=yocto-minimal-x86_64.cgz
> 
> wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd
> 
> kvm=(
>   qemu-system-x86_64
>   -enable-kvm
>   -cpu Haswell,+smep,+smap
>   -kernel $kernel
>   -initrd $initrd
>   -m 256
>   -smp 1
>   -device e1000,netdev=net0
>   -netdev user,id=net0
>   -boot order=nc
>   -no-reboot
>   -watchdog i6300esb
>   -rtc base=localtime
>   -serial stdio
>   -display none
>   -monitor null 
> )
> 
> append=(
>   hung_task_panic=1
>   earlyprintk=ttyS0,115200
>   systemd.log_level=err
>   debug
>   apic=debug
>   sysrq_always_enabled
>   rcupdate.rcu_cpu_stall_timeout=100
>   panic=-1
>   softlockup_panic=1
>   nmi_watchdog=panic
>   oops=panic
>   load_ramdisk=2
>   prompt_ramdisk=0
>   console=ttyS0,115200
>   console=tty0
>   vga=normal
>   root=/dev/ram0
>   rw
>   drbd.minor_count=8
> )
> 
> "${kvm[@]}" --append "${append[*]}"
> 
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/lkp                          Intel Corporation

> early console in setup code
> [0.00] Initializing cgroup subsys cpuset
> [0.00] Initializing cgrou

[rhashtable] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:301 __debug_object_init()

2015-07-13 Thread Fengguang Wu
git bisect good 354f473ee2c5d01c1cf90f747f95218ee3e73e95  # 09:52 22+  0  ath9k: fix typo
git bisect good d312da293f787e1b19c57acb58e8c1b171c4a04a  # 09:59 22+  0  ixgbe: convert to CYCLECOUNTER_MASK macro.
git bisect good b8e1943e9f754219bcfb40bac4a605b5348acb25  # 10:03 22+  8  rhashtable: Factor out bucket_tail() function
git bisect  bad f89bd6f87a53ce5a7d60662429591ebac2745c10  # 10:08  0- 22  rhashtable: Supports for nulls marker
git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa  # 10:12 22+ 13  spinlock: Add spin_lock_bh_nested()
git bisect  bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1  # 10:16  0- 22  rhashtable: Per bucket locks & deferred expansion/shrinking
# first bad commit: [97defe1ecf868b8127f8e62395499d6a06e4c4b1] rhashtable: Per bucket locks & deferred expansion/shrinking
git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa  # 10:19 66+ 27  spinlock: Add spin_lock_bh_nested()
# extra tests with DEBUG_INFO
git bisect  bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1  # 10:25  0- 66  rhashtable: Per bucket locks & deferred expansion/shrinking
# extra tests on HEAD of linux-devel/devel-spot-201507122014
git bisect good 3afd2c3f65a385c405a084d80431c84b103cb6df  # 10:28 66+ 49  0day head guard for 'devel-spot-201507122014'
# extra tests on tree/branch linus/master
git bisect good f760b87f8f12eb262f14603e65042996fe03720e  # 10:33 66+  0  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
# extra tests on tree/branch linus/master
git bisect good f760b87f8f12eb262f14603e65042996fe03720e  # 10:33 66+  0  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
# extra tests on tree/branch next/master
git bisect good 2eb62d762a2112579f259903e62ba18d16c51f66  # 10:36 66+ 20  Add linux-next specific files for 20150713


This script may reproduce the error.


#!/bin/bash

kernel=$1
initrd=yocto-minimal-x86_64.cgz

wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
qemu-system-x86_64
-enable-kvm
-cpu Haswell,+smep,+smap
-kernel $kernel
-initrd $initrd
-m 256
-smp 1
-device e1000,netdev=net0
-netdev user,id=net0
-boot order=nc
-no-reboot
-watchdog i6300esb
-rtc base=localtime
-serial stdio
-display none
-monitor null 
)

append=(
hung_task_panic=1
earlyprintk=ttyS0,115200
systemd.log_level=err
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"


---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/lkp                          Intel Corporation
early console in setup code
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.19.0-rc2-00323-g97defe1 (kbuild@lkp-ib03) (gcc 
version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Tue Jul 14 10:14:59 CST 2015
[0.00] Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 
systemd.log_level=err debug apic=debug sysrq_always_enabled 
rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 
nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 
console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 rw 
link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-a0-07122340/linux-devel:devel-spot-201507122014:97defe1ecf868b8127f8e62395499d6a06e4c4b1:bisect-linux-1/.vmlinuz-97defe1ecf868b8127f8e62395499d6a06e4c4b1-20150714101515-19-ivb41
 branch=linux-devel/devel-spot-201507122014 
BOOT_IMAGE=/pkg/linux/x86_64-randconfig-a0-07122340/gcc-4.9/97defe1ecf868b8127f8e62395499d6a06e4c4b1/vmlinuz-3.19.0-rc2-00323-g97defe1
 drbd.minor_count=8
[0.00] KERNEL supported cpus:
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] CPU: vendor_id 'GenuineIntel' unknown, using generic init.
[0.00] CPU: Your system may be unstable.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x0ffd] usable
[0.00] BIOS-e820: [mem 

Re: mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings

2015-07-13 Thread Cong Wang
On Mon, Jul 13, 2015 at 6:18 AM, Kirill A. Shutemov
 wrote:
> Hi,
>
> This simple test case triggers a few locking asserts in the kernel:
>
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> #define SOL_NETLINK 270
>
> int main(int argc, char **argv)
> {
>         unsigned int block_size = 16 * 4096;
>         struct nl_mmap_req req = {
>                 .nm_block_size  = block_size,
>                 .nm_block_nr    = 64,
>                 .nm_frame_size  = 16384,
>                 .nm_frame_nr    = 64 * block_size / 16384,
>         };
>         unsigned int ring_size;
>         int fd;
>
>         fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
>         if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
>                 exit(1);
>         if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
>                 exit(1);
>
>         ring_size = req.nm_block_nr * req.nm_block_size;
>         mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>         return 0;
> }
>
> +++ exited with 0 +++
> [2.500126] BUG: sleeping function called from invalid context at 
> /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
> [2.501328] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
> [2.501997] 3 locks held by init/1:
> [2.502380]  #0:  (reboot_mutex){+.+...}, at: [] 
> SyS_reboot+0xa9/0x220
> [2.503328]  #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: 
> [] __blocking_notifier_call_chain+0x39/0x70
> [2.504659]  #2:  (rcu_callback){..}, at: [] 
> rcu_do_batch.isra.49+0x160/0x10c0
> [2.505724] Preemption disabled at:[] __delay+0xf/0x20
> [2.506443]
> [2.506612] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-9-gbddf4c4818e0 
> #253
> [2.507378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> Debian-1.8.2-1 04/01/2014
> [2.508386]  88017b3d8000 88027bc03c38 81929ceb 
> 0102
> [2.509233]   88027bc03c68 81085a9d 
> 0002
> [2.510057]  81ca2a20 0268  
> 88027bc03c98
> [2.510882] Call Trace:
> [2.511146][] dump_stack+0x4f/0x7b
> [2.511763]  [] ___might_sleep+0x16d/0x270
> [2.512476]  [] __might_sleep+0x4d/0x90
> [2.513071]  [] mutex_lock_nested+0x2f/0x430
> [2.513683]  [] ? _raw_spin_unlock_irqrestore+0x5d/0x80
> [2.514385]  [] ? __this_cpu_preempt_check+0x13/0x20
> [2.515066]  [] netlink_set_ring+0x1ed/0x350
> [2.515694]  [] ? netlink_undo_bind+0x70/0x70
> [2.516411]  [] netlink_sock_destruct+0x80/0x150
> [2.517070]  [] __sk_free+0x1d/0x160
> [2.517607]  [] sk_free+0x19/0x20
> [2.518118]  [] deferred_put_nlk_sk+0x20/0x30
> [2.518735]  [] rcu_do_batch.isra.49+0x79c/0x10c0

Caused by:

commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7
Author: Thomas Graf 
Date:   Fri Jan 2 23:00:22 2015 +0100

netlink: Lockless lookup with RCU grace period in socket release

Defers the release of the socket reference using call_rcu() to
allow using an RCU read-side protected call to rhashtable_lookup()

This restores behaviour and performance gains as previously
introduced by e341694 ("netlink: Convert netlink_lookup() to use
RCU protected hash table") without the side effect of severely
delayed socket destruction.

Signed-off-by: Thomas Graf 
Signed-off-by: David S. Miller 


We can't take a mutex in an RCU callback, since mutex_lock() may sleep
and RCU callbacks run in atomic context. Perhaps we could defer the
mmap ring cleanup to a workqueue.


Re: [PATCH 4/6] net: ieee802154: Remove redundant spi driver bus initialization

2015-07-13 Thread Marcel Holtmann
Hi Antonio,

> In ancient times it was necessary to manually initialize the bus
> field of an spi_driver to spi_bus_type. These days this is done in
> spi_register_driver(), so we can drop the manual assignment.
> 
> Signed-off-by: Antonio Borneo 
> To: Alan Ott 
> To: Alexander Aring 
> To: Varka Bhadram 
> To: linux-w...@vger.kernel.org
> To: netdev@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> ---
> drivers/net/ieee802154/cc2520.c   | 1 -
> drivers/net/ieee802154/mrf24j40.c | 1 -
> 2 files changed, 2 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel



[PATCH] e1000e: Move e1000e_disable_aspm_locked() inside CONFIG_PM

2015-07-13 Thread Michael Ellerman
e1000e_disable_aspm_locked() is only used in __e1000_resume() which is
inside CONFIG_PM. So when CONFIG_PM=n we get a "defined but not used"
warning for e1000e_disable_aspm_locked().

Move it inside the existing CONFIG_PM block to avoid the warning.

Signed-off-by: Michael Ellerman 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 89d788d8f263..f1d7fe2ea183 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6439,6 +6439,7 @@ static void e1000e_disable_aspm(struct pci_dev *pdev, u16 state)
__e1000e_disable_aspm(pdev, state, 0);
 }
 
+#ifdef CONFIG_PM
 /**
  * e1000e_disable_aspm_locked   Disable ASPM states.
  * @pdev: pointer to PCI device struct
@@ -6452,7 +6453,6 @@ static void e1000e_disable_aspm_locked(struct pci_dev *pdev, u16 state)
__e1000e_disable_aspm(pdev, state, 1);
 }
 
-#ifdef CONFIG_PM
 static int __e1000_resume(struct pci_dev *pdev)
 {
struct net_device *netdev = pci_get_drvdata(pdev);
-- 
2.1.0



Re: [PATCH net] bridge: mdb: fix double add notification

2015-07-13 Thread Cong Wang
On Mon, Jul 13, 2015 at 6:36 AM, Nikolay Aleksandrov
 wrote:
> Since the mdb add/del code was introduced there have been 2 br_mdb_notify
> calls when doing br_mdb_add() resulting in 2 notifications on each add.
>
> Example:
>  Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
>  Before patch:
>  root@debian:~# bridge monitor all
>  [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
>  [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
>
>  After patch:
>  root@debian:~# bridge monitor all
>  [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
>
> Signed-off-by: Nikolay Aleksandrov 
> Fixes: cfd567543590 ("bridge: add support of adding and deleting mdb entries")
> ---
>  net/bridge/br_mdb.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> index c11cf2611db0..1198a3dbad95 100644
> --- a/net/bridge/br_mdb.c
> +++ b/net/bridge/br_mdb.c
> @@ -351,7 +351,6 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
> if (state == MDB_TEMPORARY)
> mod_timer(&p->timer, now + br->multicast_membership_interval);
>
> -   br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
> return 0;
>  }

Looks good to me.

And probably we can convert existing __br_mdb_notify() to using
non-atomic allocation too, but that is for net-next.


Re: [RFC PATCH 0/2] net: macb: Add mdio driver for accessing multiple phy devices

2015-07-13 Thread punnaiah choudary kalluri
On Tue, Jul 14, 2015 at 12:13 AM, Florian Fainelli  wrote:
> On 12/07/15 21:48, Punnaiah Choudary Kalluri wrote:
>> This patch adds support for designs that have multiple Ethernet
>> MAC controllers and a single MDIO bus connected to multiple PHY devices,
>> i.e. the MDIO lines are connected to one of the Ethernet MAC controllers and
>> all the PHY devices are accessed using the PHY maintenance interface
>> in that MAC controller.
>>
>>  ______          _____
>> |      |        |PHY0 |
>> | MAC0 |--------|     |
>> |______|   |    |_____|
>>            |
>>  ______    |     _____
>> |      |   |    |     |
>> | MAC1 |   |____|PHY1 |
>> |______|        |     |
>>
>> So, I came up with two implementations for addressing the above
>> configuration.
>>
>> Implementation 1:
>>  Have separate driver for mdio bus
>>  Create a DT node for all the PHY devices connected to the mdio bus
>>  This driver will share the register space of the mac controller that has
>>  mdio bus connected.
>
> That is the best design implementation, MDIO in itself is a sub-piece of
> your Ethernet MAC controller the fact that it is within the Ethernet MAC
> core is just coincidental, but there is no reason why it could not be
> taken apart and made a separate block in itself.

Thanks Florian for suggesting this.
No idea on why the mdio block was not made a separate block.

regards,
Punnaiah

>
>>
>> Implementation 2:
>>  Add new property "has-mdio" and it should be 1 for the mac that has mdio bus
>>  connected.
>>  Create the mdio bus only when the has-mdio property is 1
>>
>> Please review the two implementations and suggest which one is better to
>> proceed with. In my opinion implementation 1 will be the ideal one.
>
> Agreed.
>
>>
>> Currently I have tested the patches with a single MAC and single PHY
>> configuration. I need to take care of a few more cases before releasing
>> the final patch, but before that I would like to have your opinion on the
>> above implementations and settle on one, so that I can enhance it further.
>>
>> Punnaiah Choudary Kalluri (1):
>>   net: macb: Add mdio driver for accessing multiple phy devices
>>   net: macb: Add support for single mac managing more than one phy
>>
>>
>>  drivers/net/ethernet/cadence/Makefile|2 +-
>>  drivers/net/ethernet/cadence/macb.c  |   93 +-
>>  drivers/net/ethernet/cadence/macb.h  |3 +-
>>  drivers/net/ethernet/cadence/macb_mdio.c |  204 
>> ++
>>  4 files changed, 211 insertions(+), 91 deletions(-)
>>  create mode 100644 drivers/net/ethernet/cadence/macb_mdio.c
>>
>
>
> --
> Florian


RE: [PATCH v3] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl

2015-07-13 Thread Hall, Christopher S
I am assuming the patch is rejected at this point.  I will re-submit later as 
soon as I am able to post a full end to end solution.

Chris

> -Original Message-
> From: Richard Cochran [mailto:richardcoch...@gmail.com]
> Sent: Thursday, July 09, 2015 7:58 AM
> To: Hall, Christopher S
> Cc: t...@linutronix.de; john.stu...@linaro.org; Ronciak, John; linux-
> ker...@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH v3] Add support for driver cross-timestamp to
> PTP_SYS_OFFSET ioctl
> 
> On Wed, Jul 08, 2015 at 01:46:41PM -0700, Christopher Hall wrote:
> > This patch allows system and device time ("cross-timestamp") to be
> > performed by the driver. Currently, the cross-timestamping is performed
> > in the PTP_SYS_OFFSET ioctl.  The PTP clock driver reads gettimeofday()
> > and the gettime64() callback provided by the driver. The cross-timestamp
> > is best effort where the latency between the capture of system time
> > (getnstimeofday()) and the device time (driver callback) may be
> > significant.
> 
> The interface looks okay to me.  Now all we need is a user of it...
> 
> Acked-by: Richard Cochran 
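
For context, the best-effort scheme described in the quoted text can be modelled in user space as follows. This is an illustration only and not the PTP core or ioctl code: `cross_timestamp()`, `struct xtstamp`, `struct fake_clocks` and the two fake clock readers are hypothetical. The sketch reads system time, then device time, then system time again, and uses the midpoint of the two system reads as the estimate of when the device was read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock reader: returns nanoseconds. */
typedef int64_t (*clock_fn)(void *ctx);

struct xtstamp {
	int64_t offset_ns;       /* device time minus estimated system time */
	int64_t uncertainty_ns;  /* half the window between the two system reads */
};

/* Software cross-timestamp: sys, dev, sys, then take the midpoint. */
static struct xtstamp cross_timestamp(clock_fn sys, clock_fn dev, void *ctx)
{
	int64_t t0 = sys(ctx);
	int64_t d = dev(ctx);
	int64_t t1 = sys(ctx);
	struct xtstamp x;

	x.uncertainty_ns = (t1 - t0) / 2;
	x.offset_ns = d - (t0 + x.uncertainty_ns);
	return x;
}

/* Simulated clocks for the example; every read costs some time. */
struct fake_clocks {
	int64_t now_ns;        /* simulated system time */
	int64_t dev_ahead_ns;  /* device clock = system clock + this */
	int64_t read_cost_ns;  /* time that passes on every clock read */
};

static int64_t fake_sys(void *ctx)
{
	struct fake_clocks *c = ctx;

	c->now_ns += c->read_cost_ns;
	return c->now_ns;
}

static int64_t fake_dev(void *ctx)
{
	struct fake_clocks *c = ctx;

	c->now_ns += c->read_cost_ns;
	return c->now_ns + c->dev_ahead_ns;
}
```

The midpoint estimate is exact only when the delays on either side of the device read are equal. When they are not, the error can be as large as `uncertainty_ns`, which is the latency problem a driver-provided cross-timestamp is meant to avoid.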


linux-next: manual merge of the net-next tree with Linus' tree

2015-07-13 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/bridge/br_mdb.c

between commit:

  f1158b74e54f ("bridge: mdb: zero out the local br_ip variable before use")

from Linus' tree and commit:

  74fe61f17e99 ("bridge: mdb: add vlan support for user entries")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc net/bridge/br_mdb.c
index c11cf2611db0,a8d0e93d43f2..
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@@ -348,10 -352,8 +353,10 @@@ static int br_mdb_add_group(struct net_
if (unlikely(!p))
return -ENOMEM;
rcu_assign_pointer(*pp, p);
 +  if (state == MDB_TEMPORARY)
 +  mod_timer(&p->timer, now + br->multicast_membership_interval);
  
-   br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
+   br_mdb_notify(br->dev, port, group, RTM_NEWMDB, state);
return 0;
  }
  
@@@ -374,7 -376,7 +379,8 @@@ static int __br_mdb_add(struct net *net
if (!p || p->br != br || p->state == BR_STATE_DISABLED)
return -EINVAL;
  
 +  memset(&ip, 0, sizeof(ip));
+   ip.vid = entry->vid;
ip.proto = entry->addr.proto;
if (ip.proto == htons(ETH_P_IP))
ip.u.ip4 = entry->addr.u.ip4;
@@@ -421,14 -423,21 +427,15 @@@ static int __br_mdb_del(struct net_brid
if (!netif_running(br->dev) || br->multicast_disabled)
return -EINVAL;
  
 +  memset(&ip, 0, sizeof(ip));
+   ip.vid = entry->vid;
ip.proto = entry->addr.proto;
 -  if (ip.proto == htons(ETH_P_IP)) {
 -  if (timer_pending(&br->ip4_other_query.timer))
 -  return -EBUSY;
 -
 +  if (ip.proto == htons(ETH_P_IP))
ip.u.ip4 = entry->addr.u.ip4;
  #if IS_ENABLED(CONFIG_IPV6)
 -  } else {
 -  if (timer_pending(&br->ip6_other_query.timer))
 -  return -EBUSY;
 -
 +  else
ip.u.ip6 = entry->addr.u.ip6;
  #endif
 -  }
  
spin_lock_bh(&br->multicast_lock);
mdb = mlock_dereference(br->mdb, br);




Re: [PATCH net-next v5 1/4] net core: Add protodown support.

2015-07-13 Thread Anuradha Karuppiah
On Mon, Jul 13, 2015 at 2:34 PM, David Miller  wrote:
> From: anurad...@cumulusnetworks.com
> Date: Thu,  9 Jul 2015 15:35:27 -0700
>
>> +/* proto_flags - port state information can be passed to the switch driver 
>> and
>> + * used to determine the phys state of the switch port */
>> +enum {
>> + IF_PROTOF_DOWN  = 1<<0  /* set switch port phys state down */
>> +};
>
> Realistically, do we really foresee any other proto flags being added in
> the future?
>
> Unless there is a strong sense that we will have some, this is
> insanely overengineered with all of these bit masking capabilities and
> such and nested attributes.
>
> I'd say just do one boolean attribute and that's it.
Ack, I will resubmit with protodown as a boolean. Thanks for the review Dave.


[PATCH net-next 3/3] rhashtable: Add scored lookups

2015-07-13 Thread Tom Herbert
This patch adds a mechanism to do scored lookups in an rhashtable.
This mechanism is based on the UDP and TCP listener socket lookup
functions.

When a bucket is traversed, a matching score is computed for each entry
and the input key. The entry with the highest non-zero score is
returned, and if there are multiple entries with the highest score then
one entry is selected by modulo on a hash value for the key (which must
be separate from the hash used to determine the bucket).

Signed-off-by: Tom Herbert 
---
 include/linux/rhashtable.h | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 05171c3..317bc793 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -93,6 +94,8 @@ typedef u32 (*rht_obj_hashfn_t)(const void *data, u32 len, u32 seed);
 typedef int (*rht_obj_cmpfn_t)(struct rhashtable_compare_arg *arg,
   const void *obj);
 typedef int (*rht_obj_orderfn_t)(const void *obj);
+typedef unsigned int (*rht_obj_scorefn_t)(struct rhashtable_compare_arg *arg,
+ const void *obj);
 
 struct rhashtable;
 
@@ -515,6 +518,67 @@ static inline int rhashtable_compare(struct rhashtable_compare_arg *arg,
return memcmp(ptr + ht->p.key_offset, arg->key, ht->p.key_len);
 }
 
+
+/**
+ * rhashtable_lookup_scored - search hash table with scoring
+ * @ht:hash table
+ * @key:   the pointer to the key
+ * @params:hash table parameters
+ * @obj_scorefn: Scoring function
+ * @flow_hash: hash value used to select when multiple matches are found
+ *
+ * Computes the hash value for the key and traverses the bucket chain computing
+ * a match score for each object on the list and the key. The entry with the
+ * highest score is returned. If there is more than one entry with the highest
+ * score, then one of the entries is selected based on a hash of the input key.
+ */
+static inline void *rhashtable_lookup_scored(
+   struct rhashtable *ht, const void *key,
+   const struct rhashtable_params params,
+   rht_obj_scorefn_t obj_scorefn,
+   unsigned int flow_hash)
+{
+   struct rhashtable_compare_arg arg = {
+   .ht = ht,
+   .key = key,
+   };
+   const struct bucket_table *tbl;
+   struct rhash_head *he, *result = NULL;
+   unsigned int hash, score;
+   unsigned int best_score = 0, khash = 0;
+   int matches = 0;
+
+   rcu_read_lock();
+
+   tbl = rht_dereference_rcu(ht->tbl, ht);
+restart:
+   hash = rht_key_hashfn(ht, tbl, key, params);
+   rht_for_each_rcu(he, tbl, hash) {
+   score = obj_scorefn(&arg, rht_obj(ht, he));
+   if (score > best_score) {
+   result = he;
+   best_score = score;
+   matches = 1;
+   khash = flow_hash;
+   } else if (score && score == best_score) {
+   matches++;
+   if (reciprocal_scale(khash, matches) == 0)
+   result = he;
+   khash = next_pseudo_random32(khash);
+   }
+   }
+
+   /* Ensure we see any new tables. */
+   smp_rmb();
+
+   tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+   if (unlikely(tbl))
+   goto restart;
+   rcu_read_unlock();
+
+   return result ? rht_obj(ht, result) : NULL;
+}
+
 /**
  * rhashtable_lookup_fast - search hash table, inlined version
  * @ht:hash table
-- 
1.8.1
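
For readers who want to experiment with the selection logic outside the
kernel, the bucket traversal in rhashtable_lookup_scored() can be sketched
in userspace C roughly as follows. This is only a sketch under stated
assumptions: struct entry, lookup_scored(), scale() and next_rand() are
hypothetical names, the score is stored in each entry instead of being
computed by obj_scorefn, and the RCU and future_tbl handling is left out.
scale() and next_rand() are intended to mirror what reciprocal_scale() and
next_pseudo_random32() do in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry type: one node of a singly linked bucket chain. */
struct entry {
	unsigned int score;	/* stand-in for the obj_scorefn() result */
	struct entry *next;
};

/* Userspace stand-in for reciprocal_scale(): maps val into [0, n). */
static uint32_t scale(uint32_t val, uint32_t n)
{
	assert(n > 0);
	return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Userspace stand-in for next_pseudo_random32(): one LCG step. */
static uint32_t next_rand(uint32_t seed)
{
	return seed * 1664525u + 1013904223u;
}

/* Return the entry with the highest non-zero score; break ties by hash. */
static struct entry *lookup_scored(struct entry *head, uint32_t flow_hash)
{
	struct entry *e, *result = NULL;
	unsigned int best = 0;
	uint32_t khash = 0, matches = 0;

	for (e = head; e; e = e->next) {
		if (e->score > best) {
			/* New best score: restart tie tracking. */
			result = e;
			best = e->score;
			matches = 1;
			khash = flow_hash;
		} else if (e->score && e->score == best) {
			/* Tie: replace the current pick with chance 1/matches. */
			matches++;
			if (scale(khash, matches) == 0)
				result = e;
			khash = next_rand(khash);
		}
	}
	return result;
}
```

In this sketch, with two entries tied at the best score, a flow_hash of 0
selects the second entry while 0xFFFFFFFF keeps the first, so different
flows can land on different tied entries.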



[PATCH net-next 0/3] rhashtable: Wildcard and scored lookups

2015-07-13 Thread Tom Herbert
This patch set implements:
  - A compare function can be passed in the lookup. This allows for
comparison to include "wildcard fields"
  - Order insertion within a bucket, so that entries with more specific
information can be matched first.
  - Scored lookups. This is like the socket lookups. It allows
different levels of matching, and returning one of N possible
best matches with a uniform distribution based on flow hash.

Testing: Tested this in conjunction with ILA development. Will be
posting ILA patches shortly.


Tom Herbert (3):
  rhashtable: Add a function for in order insertion in buckets
  rhashtable: allow lookup function to have compare function argument
  rhashtable: Add scored lookups

 include/linux/rhashtable.h | 122 ++---
 lib/rhashtable.c   |  20 
 2 files changed, 125 insertions(+), 17 deletions(-)

-- 
1.8.1



[PATCH net-next 2/3] rhashtable: allow lookup function to have compare function argument

2015-07-13 Thread Tom Herbert
Added rhashtable_lookup_fast_cmpfn which does a lookup in an rhash table
with the compare function being taken from an argument. This allows
different compare functions to be used on the same table.

Signed-off-by: Tom Herbert 
---
 include/linux/rhashtable.h | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 8e27159..05171c3 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -526,9 +526,10 @@ static inline int rhashtable_compare(struct rhashtable_compare_arg *arg,
  *
  * Returns the first entry on which the compare function returned true.
  */
-static inline void *rhashtable_lookup_fast(
+static inline void *rhashtable_lookup_fast_cmpfn(
struct rhashtable *ht, const void *key,
-   const struct rhashtable_params params)
+   const struct rhashtable_params params,
+   rht_obj_cmpfn_t obj_cmpfn)
 {
struct rhashtable_compare_arg arg = {
.ht = ht,
@@ -544,8 +545,8 @@ static inline void *rhashtable_lookup_fast(
 restart:
hash = rht_key_hashfn(ht, tbl, key, params);
rht_for_each_rcu(he, tbl, hash) {
-   if (params.obj_cmpfn ?
-   params.obj_cmpfn(&arg, rht_obj(ht, he)) :
+   if (obj_cmpfn ?
+   obj_cmpfn(&arg, rht_obj(ht, he)) :
rhashtable_compare(&arg, rht_obj(ht, he)))
continue;
rcu_read_unlock();
@@ -563,6 +564,14 @@ restart:
return NULL;
 }
 
+static inline void *rhashtable_lookup_fast(
+   struct rhashtable *ht, const void *key,
+   const struct rhashtable_params params)
+{
+   return rhashtable_lookup_fast_cmpfn(ht, key, params,
+   params.obj_cmpfn);
+}
+
 struct rht_insert_pos {
struct rhash_head __rcu *head;
struct rhash_head __rcu **pos;
-- 
1.8.1
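
The pattern in this patch (a lookup that takes the compare function as an
argument, plus a thin wrapper that keeps the old signature) can be tried
out in userspace with a sketch like the one below. All names here are
hypothetical (struct item, cmp_exact, cmp_wildcard, lookup_cmpfn, lookup),
and the "*" wildcard convention is an assumption made purely for
illustration. Unlike the patch, the wrapper here passes NULL to get the
default compare rather than forwarding params.obj_cmpfn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a string key in a singly linked chain. */
struct item {
	const char *key;
	struct item *next;
};

/* Returns 0 on match, like memcmp() and the rhashtable compare callbacks. */
typedef int (*cmpfn_t)(const char *key, const struct item *obj);

/* Default compare: exact key match. */
static int cmp_exact(const char *key, const struct item *obj)
{
	return strcmp(key, obj->key);
}

/* Assumed wildcard compare: an entry keyed "*" matches any lookup key. */
static int cmp_wildcard(const char *key, const struct item *obj)
{
	if (strcmp(obj->key, "*") == 0)
		return 0;
	return strcmp(key, obj->key);
}

/* Lookup with a caller-supplied compare; NULL selects the default. */
static struct item *lookup_cmpfn(struct item *head, const char *key,
				 cmpfn_t cmpfn)
{
	struct item *it;

	assert(key != NULL);
	for (it = head; it; it = it->next) {
		if ((cmpfn ? cmpfn : cmp_exact)(key, it) == 0)
			return it;	/* first entry that compares equal */
	}
	return NULL;
}

/* Wrapper keeping the original signature and the default compare. */
static struct item *lookup(struct item *head, const char *key)
{
	return lookup_cmpfn(head, key, NULL);
}
```

The same chain can then be searched with different matching rules: a key
that misses under the exact compare can still hit the "*" entry when
cmp_wildcard is passed in.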



[PATCH net-next 1/3] rhashtable: Add a function for in order insertion in buckets

2015-07-13 Thread Tom Herbert
The obj_orderfn function may be specified in the parameters for a
rhashtable. When inserting an element, this function is used to order
objects in a bucket list (greatest to least ordering value). This
allows entries to have wildcard fields, where entries with more
specific match information are placed first in the bucket.
When a lookup is done, the first match found will be the most
specific match.

Signed-off-by: Tom Herbert 
---
 include/linux/rhashtable.h | 41 ++---
 lib/rhashtable.c   | 20 ++--
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..8e27159 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -92,6 +92,7 @@ typedef u32 (*rht_hashfn_t)(const void *data, u32 len, u32 seed);
 typedef u32 (*rht_obj_hashfn_t)(const void *data, u32 len, u32 seed);
 typedef int (*rht_obj_cmpfn_t)(struct rhashtable_compare_arg *arg,
   const void *obj);
+typedef int (*rht_obj_orderfn_t)(const void *obj);
 
 struct rhashtable;
 
@@ -111,6 +112,7 @@ struct rhashtable;
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
  * @obj_cmpfn: Function to compare key with object
+ * @obj_orderfn: Function to order an object for in-order insertion
  */
 struct rhashtable_params {
size_t  nelem_hint;
@@ -127,6 +129,7 @@ struct rhashtable_params {
rht_hashfn_thashfn;
rht_obj_hashfn_tobj_hashfn;
rht_obj_cmpfn_t obj_cmpfn;
+   rht_obj_orderfn_t   obj_orderfn;
 };
 
 /**
@@ -560,6 +563,37 @@ restart:
return NULL;
 }
 
+struct rht_insert_pos {
+   struct rhash_head __rcu *head;
+   struct rhash_head __rcu **pos;
+};
+
+static inline void rht_insert_pos(struct rhashtable *ht,
+ struct rhash_head *obj,
+ struct bucket_table *tbl,
+ unsigned int hash,
+ struct rht_insert_pos *ipos)
+{
+   struct rhash_head __rcu *head, **pos;
+
+   pos = &tbl->buckets[hash];
+
+   if (ht->p.obj_orderfn) {
+   int obj_order = ht->p.obj_orderfn(rht_obj(ht, obj));
+
+   rht_for_each_rcu(head, tbl, hash) {
+   if (ht->p.obj_orderfn(rht_obj(ht, head)) <= obj_order)
+   break;
+   pos = &head->next;
+   }
+   } else {
+   head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
+   }
+
+   ipos->head = head;
+   ipos->pos = pos;
+}
+
 /* Internal function, please use rhashtable_insert_fast() instead */
 static inline int __rhashtable_insert_fast(
struct rhashtable *ht, const void *key, struct rhash_head *obj,
@@ -571,6 +605,7 @@ static inline int __rhashtable_insert_fast(
};
struct bucket_table *tbl, *new_tbl;
struct rhash_head *head;
+   struct rht_insert_pos ipos;
spinlock_t *lock;
unsigned int elasticity;
unsigned int hash;
@@ -633,11 +668,11 @@ slow_path:
 
err = 0;
 
-   head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
+   rht_insert_pos(ht, obj, tbl, hash, &ipos);
 
-   RCU_INIT_POINTER(obj->next, head);
+   RCU_INIT_POINTER(obj->next, ipos.head);
 
-   rcu_assign_pointer(tbl->buckets[hash], obj);
+   rcu_assign_pointer(*ipos.pos, obj);
 
atomic_inc(&ht->nelems);
if (rht_grow_above_75(ht, tbl))
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index a60a6d3..0de37e0 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -162,9 +162,10 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
rht_dereference_rcu(old_tbl->future_tbl, ht));
struct rhash_head __rcu **pprev = &old_tbl->buckets[old_hash];
int err = -ENOENT;
-   struct rhash_head *head, *next, *entry;
+   struct rhash_head *next, *entry;
spinlock_t *new_bucket_lock;
unsigned int new_hash;
+   struct rht_insert_pos ipos;
 
rht_for_each(entry, old_tbl, old_hash) {
err = 0;
@@ -184,15 +185,14 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
new_bucket_lock = rht_bucket_lock(new_tbl, new_hash);
 
spin_lock_nested(new_bucket_lock, SINGLE_DEPTH_NESTING);
-   head = rht_dereference_bucket(new_tbl->buckets[new_hash],
- new_tbl, new_hash);
+   rht_insert_pos(ht, entry, new_tbl, new_hash, &ipos);
 
-   if (rht_is_a_nulls(head))
+   if (rht_is_a_nulls(ipos.head))
INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
else
-   RCU_INIT_POINTER(entry->next, head);
+   RCU_INIT_POINTER(entry->next, ipos.head);
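
The insertion-position walk done by rht_insert_pos() in this patch can be
illustrated with a plain linked list in userspace. The sketch below is not
the kernel code: struct node and insert_ordered() are hypothetical names,
the ordering value is stored in the node instead of being returned by
obj_orderfn, and locking, RCU and the nulls markers are omitted. It keeps
the same rule as the patch, namely that a new object goes in front of the
first existing entry whose ordering value is less than or equal to its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bucket entry; order stands in for obj_orderfn(obj). */
struct node {
	int order;
	struct node *next;
};

/*
 * Insert obj so that the chain stays sorted from greatest to least
 * ordering value. pos always points at the link that should end up
 * pointing to obj, which avoids special-casing the bucket head.
 */
static void insert_ordered(struct node **bucket, struct node *obj)
{
	struct node **pos = bucket;

	assert(bucket != NULL && obj != NULL);
	while (*pos && (*pos)->order > obj->order)
		pos = &(*pos)->next;

	obj->next = *pos;	/* obj now precedes the first entry <= its order */
	*pos = obj;
}
```

Inserting entries with ordering values 1, 3, 2 and then another 3 yields
the chain 3 (the later one), 3, 2, 1, so a lookup that stops at the first
match sees the most specific entries first.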

Re: [PATCH 1/2] isdn/gigaset: reset tty->receive_room when attaching ser_gigaset

2015-07-13 Thread Tilman Schmidt
Am 14.07.2015 um 01:14 schrieb Peter Hurley:
> On 07/13/2015 06:37 PM, Tilman Schmidt wrote:
>> Commit 79901317ce80 ("n_tty: Don't flush buffer when closing ldisc"),
>> first merged in kernel release 3.10, caused the following regression
>> in the Gigaset M101 driver:
>>
>> Before that commit, when closing the N_TTY line discipline in
>> preparation to switching to N_GIGASET_M101, receive_room would be
>> reset to a non-zero value by the call to n_tty_flush_buffer() in
>> n_tty's close method. With the removal of that call, receive_room
>> might be left at zero, blocking data reception on the serial line.
> 
> That commit didn't cause the problem; it was a bug all along.

Sure. That's why it is correctly fixed in the Gigaset driver.
But before that commit the bug was never actually triggered.
So that commit defines the point in the commit history from
which the fix is needed, and therefore needs to be mentioned
in order to decide which stable releases will need the fix.

> Non-flow controlling line disciplines _must_ set tty->receive_room
> on line discipline open because they are declaring that on every
> input they can accept that much data.

I have submitted a corresponding fix to the line discipline
documentation separately.

Thanks,
Tilman


[PATCH 1/2] isdn/gigaset: reset tty->receive_room when attaching ser_gigaset

2015-07-13 Thread Tilman Schmidt
Commit 79901317ce80 ("n_tty: Don't flush buffer when closing ldisc"),
first merged in kernel release 3.10, caused the following regression
in the Gigaset M101 driver:

Before that commit, when closing the N_TTY line discipline in
preparation to switching to N_GIGASET_M101, receive_room would be
reset to a non-zero value by the call to n_tty_flush_buffer() in
n_tty's close method. With the removal of that call, receive_room
might be left at zero, blocking data reception on the serial line.

The present patch fixes that regression by setting receive_room
to an appropriate value in the ldisc open method.

Fixes: 79901317ce80 ("n_tty: Don't flush buffer when closing ldisc")
Signed-off-by: Tilman Schmidt 
---
 drivers/isdn/gigaset/ser-gigaset.c |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
index 8c91fd5..3ac9c41 100644
--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -524,9 +524,18 @@ gigaset_tty_open(struct tty_struct *tty)
cs->hw.ser->tty = tty;
atomic_set(&cs->hw.ser->refcnt, 1);
init_completion(&cs->hw.ser->dead_cmp);
-
tty->disc_data = cs;
 
+   /* Set the amount of data we're willing to receive per call
+* from the hardware driver to half of the input buffer size
+* to leave some reserve.
+* Note: We don't do flow control towards the hardware driver.
+* If more data is received than will fit into the input buffer,
+* it will be dropped and an error will be logged. This should
+* never happen as the device is slow and the buffer size ample.
+*/
+   tty->receive_room = RBUFSIZE/2;
+
/* OK.. Initialization of the datastructures and the HW is done.. Now
 * startup system and notify the LL that we are ready to run
 */
-- 
1.7.1
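
The failure mode described in this patch can be modelled in a few lines of
userspace C. This is a deliberately simplified sketch, not tty core code:
struct fake_tty, fake_ldisc_open(), fake_push() and RBUF_SIZE are all
hypothetical, and the only behaviour assumed is that the amount of data
handed to a line discipline per call is capped by receive_room. Under that
assumption, a receive_room left at zero by the previous line discipline
means nothing is ever delivered until the open method sets it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed buffer size for the sketch; not taken from the gigaset driver. */
#define RBUF_SIZE 4096

/* Hypothetical, heavily reduced stand-in for struct tty_struct. */
struct fake_tty {
	size_t receive_room;	/* bytes the ldisc accepts per call */
	size_t delivered;	/* bytes handed to the ldisc so far */
};

/* Models an ldisc open method that declares a fixed receive_room. */
static void fake_ldisc_open(struct fake_tty *tty)
{
	assert(tty != NULL);
	tty->receive_room = RBUF_SIZE / 2;
}

/* Models one push from the driver side: at most receive_room bytes. */
static size_t fake_push(struct fake_tty *tty, size_t count)
{
	size_t n;

	assert(tty != NULL);
	n = count < tty->receive_room ? count : tty->receive_room;
	tty->delivered += n;
	return n;
}
```

Calling fake_push() before fake_ldisc_open() on a zeroed fake_tty delivers
nothing, which corresponds to the blocked reception reported here; after
the open call, each push is capped at half the assumed buffer size.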



[PATCH 2/2] isdn/gigaset: drop unused ldisc methods

2015-07-13 Thread Tilman Schmidt
The line discipline read and write methods are optional so the dummy
methods in ser_gigaset are unnecessary and can be removed.

Signed-off-by: Tilman Schmidt 
---
 drivers/isdn/gigaset/ser-gigaset.c |   24 
 1 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
index 3ac9c41..375be50 100644
--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -607,28 +607,6 @@ static int gigaset_tty_hangup(struct tty_struct *tty)
 }
 
 /*
- * Read on the tty.
- * Unused, received data goes only to the Gigaset driver.
- */
-static ssize_t
-gigaset_tty_read(struct tty_struct *tty, struct file *file,
-unsigned char __user *buf, size_t count)
-{
-   return -EAGAIN;
-}
-
-/*
- * Write on the tty.
- * Unused, transmit data comes only from the Gigaset driver.
- */
-static ssize_t
-gigaset_tty_write(struct tty_struct *tty, struct file *file,
- const unsigned char *buf, size_t count)
-{
-   return -EAGAIN;
-}
-
-/*
  * Ioctl on the tty.
  * Called in process context only.
  * May be re-entered by multiple ioctl calling threads.
@@ -761,8 +739,6 @@ static struct tty_ldisc_ops gigaset_ldisc = {
.open   = gigaset_tty_open,
.close  = gigaset_tty_close,
.hangup = gigaset_tty_hangup,
-   .read   = gigaset_tty_read,
-   .write  = gigaset_tty_write,
.ioctl  = gigaset_tty_ioctl,
.receive_buf= gigaset_tty_receive,
.write_wakeup   = gigaset_tty_wakeup,
-- 
1.7.1



[PATCH 0/2] Fix long-standing regression in ser_gigaset ISDN driver

2015-07-13 Thread Tilman Schmidt
This series fixes a serious regression in the Gigaset M101 driver
introduced in kernel release 3.10 and removes some unneeded code.

Please also queue up patch 1 of the series for inclusion in the
stable/longterm releases 3.10 and later.

Tilman Schmidt (2):
  isdn/gigaset: reset tty->receive_room when attaching ser_gigaset
  isdn/gigaset: drop unused ldisc methods

 drivers/isdn/gigaset/ser-gigaset.c |   35 ++-
 1 files changed, 10 insertions(+), 25 deletions(-)



Re: [PATCH 1/2] isdn/gigaset: reset tty->receive_room when attaching ser_gigaset

2015-07-13 Thread Peter Hurley
On 07/13/2015 06:37 PM, Tilman Schmidt wrote:
> Commit 79901317ce80 ("n_tty: Don't flush buffer when closing ldisc"),
> first merged in kernel release 3.10, caused the following regression
> in the Gigaset M101 driver:
> 
> Before that commit, when closing the N_TTY line discipline in
> preparation to switching to N_GIGASET_M101, receive_room would be
> reset to a non-zero value by the call to n_tty_flush_buffer() in
> n_tty's close method. With the removal of that call, receive_room
> might be left at zero, blocking data reception on the serial line.

That commit didn't cause the problem; it was a bug all along.

For example, if the tty had first been hooked up to some other line
discipline which consumed most of tty->receive_room, _then_
switched to N_GIGASET_M101 line discipline, the same problem would
have occurred.

Non-flow controlling line disciplines _must_ set tty->receive_room
on line discipline open because they are declaring that on every
input they can accept that much data.

Regards,
Peter Hurley

> The present patch fixes that regression by setting receive_room
> to an appropriate value in the ldisc open method.
> 
> Fixes: 79901317ce80 ("n_tty: Don't flush buffer when closing ldisc")
> Signed-off-by: Tilman Schmidt 
> ---
>  drivers/isdn/gigaset/ser-gigaset.c |   11 ++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/isdn/gigaset/ser-gigaset.c 
> b/drivers/isdn/gigaset/ser-gigaset.c
> index 8c91fd5..3ac9c41 100644
> --- a/drivers/isdn/gigaset/ser-gigaset.c
> +++ b/drivers/isdn/gigaset/ser-gigaset.c
> @@ -524,9 +524,18 @@ gigaset_tty_open(struct tty_struct *tty)
>   cs->hw.ser->tty = tty;
>   atomic_set(&cs->hw.ser->refcnt, 1);
>   init_completion(&cs->hw.ser->dead_cmp);
> -
>   tty->disc_data = cs;
>  
> + /* Set the amount of data we're willing to receive per call
> +  * from the hardware driver to half of the input buffer size
> +  * to leave some reserve.
> +  * Note: We don't do flow control towards the hardware driver.
> +  * If more data is received than will fit into the input buffer,
> +  * it will be dropped and an error will be logged. This should
> +  * never happen as the device is slow and the buffer size ample.
> +  */
> + tty->receive_room = RBUFSIZE/2;
> +
>   /* OK.. Initialization of the datastructures and the HW is done.. Now
>* startup system and notify the LL that we are ready to run
>*/
> 



Re: [RFC net-next 18/22] openvswitch: Make tunnel set action attach a metadata dst

2015-07-13 Thread Joe Stringer
Hi Thomas,

On 10 July 2015 at 07:19, Thomas Graf  wrote:
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index ecfa530..05fe46b 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -1548,11 +1548,45 @@ static struct sw_flow_actions 
> *nla_alloc_flow_actions(int size, bool log)
> return sfa;
>  }
>
> +static void ovs_nla_free_set_action(const struct nlattr *a)
> +{
> +   const struct nlattr *ovs_key = nla_data(a);
> +   struct ovs_tunnel_info *ovs_tun;
> +
> +   switch (nla_type(ovs_key)) {
> +   case OVS_KEY_ATTR_TUNNEL_INFO:
> +   ovs_tun = nla_data(ovs_key);
> +   dst_release((struct dst_entry *)ovs_tun->tun_dst);
> +   break;
> +   }
> +}
> +
> +void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
> +{
> +   const struct nlattr *a;
> +   int rem;
> +
> +   nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
> +   switch (nla_type(a)) {
> +   case OVS_ACTION_ATTR_SET:
> +   ovs_nla_free_set_action(a);
> +   break;
> +   }
> +   }
> +
> +   kfree(sf_acts);
> +}

It doesn't look like flow_free() is using this new function to
properly free the actions. Also, some of the error cases that hit this
code have sf_acts=NULL.


Re: [PATCH net-next] tc: fix tc actions in case of shared skb

2015-07-13 Thread Alexei Starovoitov

On 7/13/15 1:55 PM, Daniel Borkmann wrote:
> On 07/13/2015 10:17 PM, Alexei Starovoitov wrote:
> ...
>> We cannot check tc actions from pktgen, since they can be added
>> dynamically.
>> So I see three options:
>> 1 get rid of burst hack for both RX and TX in pktgen (kills performance)
>> 2 add unlikely(skb_shared) check to few tc actions
>> 3 do nothing
> ...
> pktgen case. :/ With regards to option 2, you could hide that behind
> a static inline helper wrapped in IS_ENABLED(CONFIG_NET_PKTGEN), but
> that is a very ugly workaround/hack as well (and distros might
> even ship it nevertheless).

Naming such a helper is a headache as well.

static inline bool is_pktgen_shared_skb(struct sk_buff *skb)
{
#if IS_ENABLED(CONFIG_NET_PKTGEN)
	/* pktgen uses skb->users += burst trick to reuse skb */
	return skb_shared(skb);
#else
	return false;
#endif
}

and in actions:

	if (unlikely(is_pktgen_shared_skb(skb))) goto drop;

thoughts?



[PATCH] pkt_sched: sch_qfq: remove unused member of struct qfq_sched

2015-07-13 Thread Andrea Parri
The member (u32) "num_active_agg" of struct qfq_sched has been unused
since its introduction in 462dbc9101acd38e92eda93c0726857517a24bbd
"pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT)
there is no active plan to use it; this removes the member.

Signed-off-by: Andrea Parri 
Acked-by: Paolo Valente 
---
 net/sched/sch_qfq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index b8d73bc..ffaeea6 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -186,7 +186,6 @@ struct qfq_sched {
 
u64 oldV, V;/* Precise virtual times. */
struct qfq_aggregate*in_serv_agg;   /* Aggregate being served. */
-   u32 num_active_agg; /* Num. of active aggregates */
u32 wsum;   /* weight sum */
u32 iwsum;  /* inverse weight sum */
 
-- 
1.9.1



[PATCH] rfkill-gpio: Add support for the Realtek 8723 BT

2015-07-13 Thread Bastien Nocera
http://thread.gmane.org/gmane.linux.kernel.wireless.general/127706/focus=127896

Signed-off-by: Bastien Nocera 

---
 net/rfkill/rfkill-gpio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c
index d5d58d9..9471024 100644
--- a/net/rfkill/rfkill-gpio.c
+++ b/net/rfkill/rfkill-gpio.c
@@ -168,6 +168,7 @@ static const struct acpi_device_id rfkill_acpi_match[] = {
{ "BCM2E3D", RFKILL_TYPE_BLUETOOTH },
{ "BCM2E40", RFKILL_TYPE_BLUETOOTH },
{ "BCM2E64", RFKILL_TYPE_BLUETOOTH },
+   { "OBDA8723", RFKILL_TYPE_BLUETOOTH },
{ "BCM4752", RFKILL_TYPE_GPS },
{ "LNV4752", RFKILL_TYPE_GPS },
{ },
-- 
2.4.3


Re: [PATCH net-next] bridge: mdb: add vlan support for user entries

2015-07-13 Thread David Miller
From: Nikolay Aleksandrov 
Date: Fri, 10 Jul 2015 08:02:08 -0700

> Until now all user mdb entries were added in vlan 0, this patch adds
> support to allow the user to specify the vlan for the entry.
> About the uapi change a hole in struct br_mdb_entry is used so the size
> and offsets are kept the same (verified with pahole and tested with older
> iproute2).
> 
> Example:
> $ bridge mdb
> dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000
> dev br0 port eth1 grp 239.0.0.1 permanent vlan 200
> dev br0 port eth1 grp 239.0.0.1 permanent
> 
> Signed-off-by: Nikolay Aleksandrov 

This looks ok, applied thanks Nikolay.


Re: [PATCH net-next v5 1/4] net core: Add protodown support.

2015-07-13 Thread David Miller
From: anurad...@cumulusnetworks.com
Date: Thu,  9 Jul 2015 15:35:27 -0700

> +/* proto_flags - port state information can be passed to the switch driver 
> and
> + * used to determine the phys state of the switch port */
> +enum {
> + IF_PROTOF_DOWN  = 1<<0  /* set switch port phys state down */
> +};

Realistically, do we really foresee any other proto flags being added in
the future?

Unless there is a strong sense that we will have some, this is
insanely overengineered with all of these bit masking capabilities and
such and nested attributes.

I'd say just do one boolean attribute and that's it.


Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one

2015-07-13 Thread Nikolay Aleksandrov
On 07/13/2015 11:05 PM, Nikolay Aleksandrov wrote:
> On 07/13/2015 08:57 PM, cls...@linux.vnet.ibm.com wrote:
>> From: Carol L Soto 
>>
>> Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
>> trace at rmmod bonding.
>>
>> [68830.202239] remove_proc_entry: removing non-empty directory
>> 'net/bonding', leaking at least 'bond0'
>> [68830.202257] [ cut here ]
>> [68830.202260] WARNING: at fs/proc/generic.c:562
>> [68830.202412] NIP [c02abf6c] .remove_proc_entry+0x1fc/0x240
>> [68830.202416] LR [c02abf68] .remove_proc_entry+0x1f8/0x240
>> [68830.202419] PACATMSCRATCH [80009032]
>> [68830.202421] Call Trace:
>> [68830.202424] [c00179277940] [c02abf68] 
>> .remove_proc_entry+0x1f8/0x240 (unreliable)
>> [68830.202434] [c001792779f0] [d53229a4] 
>> .bond_destroy_proc_dir+0x34/0x54 [bonding]
>> [68830.202440] [c00179277a70] [d53130e0] 
>> .bond_net_exit+0x90/0x120 [bonding]
>> [68830.202445] [c00179277b10] [c059944c] 
>> .ops_exit_list.isra.0+0x6c/0xd0
>> [68830.202450] [c00179277ba0] [c0599774] 
>> .unregister_pernet_operations+0x94/0x100
>> [68830.202454] [c00179277c40] [c0599814] 
>> .unregister_pernet_subsys+0x34/0x60
>> [68830.202460] [c00179277cc0] [d5323758] 
>> .bonding_exit+0x48/0x2328 [bonding]
>> [68830.202466] [c00179277d30] [c010dcc4] 
>> .SyS_delete_module+0x1f4/0x340
>> [68830.202471] [c00179277e30] [c0009e7c] 
>> syscall_exit+0x0/0x7c
>> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
>>
>> Signed-off-by: Carol L Soto 
>> ---
>>  drivers/net/bonding/bond_main.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c 
>> b/drivers/net/bonding/bond_main.c
>> index 19eb990..ace105a 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device 
>> *bond_dev,
>>  dev_set_mac_address(slave_dev, &addr);
>>  }
>>  
>> +bond_remove_proc_entry(bond);
>> +
>>  dev_set_mtu(slave_dev, slave->original_mtu);
>>  
>>  slave_dev->priv_flags &= ~IFF_BONDING;
>>
> 
> This is incorrect, it tries to remove the bond entry on every slave release
> so if we have a bonding device with >= 2 slaves and release one of them then
> the whole bond device entry will be removed from /proc/net/bonding.

> You can hit this case only if you had created a bonding device while doing the
> rmmod bonding (it's an old race condition which was fixed long time ago, but
> the procfs was apparently missed) and only after the notifier has been
> unregistered but before the sysfs has been removed.
> 
Scratch this part, it should be triggered in a different way.
Could you provide a way to reproduce ?

> Since the bonding netdevice notifier is handling the procfs 
> creation/destruction
> we could try moving the unregister after the pernet destruction which should
> help avoid such problems. Could you try the following patch:
> 
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 19eb990d398c..d515ee38b77f 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4682,12 +4682,10 @@ err_link:
>  
>  static void __exit bonding_exit(void)
>  {
> - unregister_netdevice_notifier(&bond_netdev_notifier);
> -
>   bond_destroy_debugfs();
> -
>   bond_netlink_fini();
>   unregister_pernet_subsys(&bond_net_ops);
> + unregister_netdevice_notifier(&bond_netdev_notifier);
>  
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>   /* Make sure we don't have an imbalance on our netpoll blocking */
> 


Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one

2015-07-13 Thread Nikolay Aleksandrov
On 07/13/2015 08:57 PM, cls...@linux.vnet.ibm.com wrote:
> From: Carol L Soto 
> 
> Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
> trace at rmmod bonding.
> 
> [68830.202239] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0'
> [68830.202257] ------------[ cut here ]------------
> [68830.202260] WARNING: at fs/proc/generic.c:562
> [68830.202412] NIP [c02abf6c] .remove_proc_entry+0x1fc/0x240
> [68830.202416] LR [c02abf68] .remove_proc_entry+0x1f8/0x240
> [68830.202419] PACATMSCRATCH [80009032]
> [68830.202421] Call Trace:
> [68830.202424] [c00179277940] [c02abf68] .remove_proc_entry+0x1f8/0x240 (unreliable)
> [68830.202434] [c001792779f0] [d53229a4] .bond_destroy_proc_dir+0x34/0x54 [bonding]
> [68830.202440] [c00179277a70] [d53130e0] .bond_net_exit+0x90/0x120 [bonding]
> [68830.202445] [c00179277b10] [c059944c] .ops_exit_list.isra.0+0x6c/0xd0
> [68830.202450] [c00179277ba0] [c0599774] .unregister_pernet_operations+0x94/0x100
> [68830.202454] [c00179277c40] [c0599814] .unregister_pernet_subsys+0x34/0x60
> [68830.202460] [c00179277cc0] [d5323758] .bonding_exit+0x48/0x2328 [bonding]
> [68830.202466] [c00179277d30] [c010dcc4] .SyS_delete_module+0x1f4/0x340
> [68830.202471] [c00179277e30] [c0009e7c] syscall_exit+0x0/0x7c
> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
> 
> Signed-off-by: Carol L Soto 
> ---
>  drivers/net/bonding/bond_main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 19eb990..ace105a 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>   dev_set_mac_address(slave_dev, &addr);
>   }
>  
> + bond_remove_proc_entry(bond);
> +
>   dev_set_mtu(slave_dev, slave->original_mtu);
>  
>   slave_dev->priv_flags &= ~IFF_BONDING;
> 

This is incorrect, it tries to remove the bond entry on every slave release
so if we have a bonding device with >= 2 slaves and release one of them then
the whole bond device entry will be removed from /proc/net/bonding.
You can hit this case only if you had created a bonding device while doing the
rmmod bonding (it's an old race condition which was fixed a long time ago, but
the procfs was apparently missed) and only after the notifier has been
unregistered but before the sysfs has been removed.
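
The objection to the proposed hunk can be shown with a toy userspace model. This is only a sketch: `struct bond_model` and both `release_one_*` helpers are hypothetical names invented for illustration, not bonding driver code. It assumes one proc entry per bond device, shared by all of its slaves.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not the bonding driver's real data structures. */
struct bond_model {
    int slave_cnt;    /* slaves still enslaved to the bond */
    bool proc_entry;  /* stands in for the bond's entry under /proc/net/bonding */
};

/* Models the proposed patch: the proc entry is dropped on every slave release. */
static void release_one_patched(struct bond_model *bond)
{
    if (bond->slave_cnt > 0)
        bond->slave_cnt--;
    bond->proc_entry = false;
}

/* Models the unpatched behaviour: releasing a slave leaves the entry alone. */
static void release_one_current(struct bond_model *bond)
{
    if (bond->slave_cnt > 0)
        bond->slave_cnt--;
}
```

With two slaves, one call to `release_one_patched()` leaves a bond that still has a slave but no proc entry, which is the case described above.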

Since the bonding netdevice notifier is handling the procfs creation/destruction
we could try moving the unregister after the pernet destruction which should
help avoid such problems. Could you try the following patch:


diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990d398c..d515ee38b77f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4682,12 +4682,10 @@ err_link:
 
 static void __exit bonding_exit(void)
 {
-	unregister_netdevice_notifier(&bond_netdev_notifier);
-
 	bond_destroy_debugfs();
-
 	bond_netlink_fini();
 	unregister_pernet_subsys(&bond_net_ops);
+	unregister_netdevice_notifier(&bond_netdev_notifier);
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	/* Make sure we don't have an imbalance on our netpoll blocking */


Re: [PATCH net-next] tc: fix tc actions in case of shared skb

2015-07-13 Thread Daniel Borkmann

On 07/13/2015 10:17 PM, Alexei Starovoitov wrote:
...

We cannot check tc actions from pktgen, since they can be added
dynamically.
So I see three options:
1 get rid of burst hack for both RX and TX in pktgen (kills performance)
2 add an unlikely(skb_shared()) check to a few tc actions
3 do nothing

I think 2 isn't that bad after all if properly documented with
"because pktgen is doing this hack for performance" ?

I'm fine with 3 too, since the whole pktgen business is for root
and for kernel hackers who are supposed to know what they're doing.


Hmm, one thing for option 3 could be that we add a modinfo tag
"experimental", so that on loading of pktgen module, we trigger
(like in case of staging) ...

  add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);

... and add a pr_warn() to the user, it may be more visible/clear
than the "Packet Generator (USE WITH CAUTION)" Kconfig title? ;)

It'd be a pity that we'd need the extra atomic read only for the
pktgen case. :/ With regards to option 2, you could hide that behind
a static inline helper wrapped in IS_ENABLED(CONFIG_NET_PKTGEN), but
that is a very ugly workaround/hack as well (and distros might
even ship it nevertheless). I wouldn't be surprised if there are
other usage combinations with pktgen that would crash your box. :/

Best,
Daniel


virtio-net TSO Lockup

2015-07-13 Thread Brian Rak
We've been encountering an issue in the virtio-net driver that causes it 
to become unresponsive after a period of high load.  This issue goes 
away if we disable TSO on the interface.


Once this issue has been triggered, the interface can still receive 
traffic, but will not transmit anything.


Specifically:
* Initially the machine will still try to respond to packets (I say try, 
because I see the packets in tcpdump, but the counters shown by 'ip -s 
-d link show eth1' do not increment.  I also do not see the packets make 
it to the upstream network interface)
* After a little while (1-2 minutes), I stop seeing the response packets 
in tcpdump.  (In this case I'm looking for ARP request/replies, so the 
requests still come in, but the responses do not go out.  This is not 
limited to just ARP, the interface will not respond at all)
* If I leave a ping running while the interface is broken, eventually I 
start seeing 'ping: sendmsg: No buffer space available'


I've reproduced this on a few Ubuntu kernel builds (3.13.0-53-generic 
and 4.0.7-040007-generic), and a few CentOS kernels 
(2.6.32-504.16.2.el6.x86_64, 4.1.1-1.el6.elrepo.x86_64) so I do not 
believe this to be distribution specific.


If I restart the machine (just issuing a server level 'reboot' command, 
not restarting qemu itself), the adapter starts working properly again.


Interestingly, these machines have two virtio NICs, and this only seems 
to occur for one of them (by this, I mean eth0 always works, and eth1 
always breaks.  If I remove eth0 from the machine, eth1 still breaks). 
On the host level, the broken one is a macvtap interface, while the 
working one is a tap device.  We've seen this in the past with a 
different interface type (the qemu multicast NIC type), so I do not 
believe this is really relevant.  If I switch the machines to using 
emulated e1000 nics, I can no longer reproduce the issue.


Reproduction is fairly easy: with two machines, run `nc -lk 1818 | pv > 
/dev/null` on one and `cat /dev/zero | pv | nc 10.99.0.100 1818` on the 
other (the machine sending traffic will break within a minute or two).  I can 
easily provide access to machines where the problem manifests, if that 
would be helpful.


I'm not really sure where to go from here.  Tracking down a bug in the 
virtio driver is a bit above my skill level.



Re: [RFC PATCH 0/2] net: macb: Add mdio driver for accessing multiple phy devices

2015-07-13 Thread Florian Fainelli
On 12/07/15 21:48, Punnaiah Choudary Kalluri wrote:
> This patch is to add support for the design that has multiple ethernet
> mac controllers and single mdio bus connected to multiple phy devices.
> i.e mdio lines are connected to any of the ethernet mac controller and
> all the phy devices will be accessed using the phy maintenance interface
> in that mac controller.
> 
>  __   _
> |  | |PHY0 |
> | MAC0 |-| |
> |__|   | |_|
>|   
>  __|  _
> |  |   | | |
> | MAC1 |   |_|PHY1 | 
> |__| | |
> 
> So, i come up with two implementations for addressing the above configuration.
> 
> Implementation 1:
>  Have separate driver for mdio bus
>  Create a DT node for all the PHY devices connected to the mdio bus
>  This driver will share the register space of the mac controller that has
>  mdio bus connected.

That is the best design. MDIO is a sub-piece of your Ethernet MAC
controller, but the fact that it sits within the Ethernet MAC core is just
coincidental; there is no reason why it could not be taken apart and made
a separate block in itself.

> 
> Implementation 2:
>  Add new property "has-mdio" and it should be 1 for the mac that has mdio bus
>  connected.
>  Create the mdio bus only when the has-mdio property is 1
> 
> Please review the two implementations and suggest which one is better to 
> proceed
> further. In my opinion implementation 1 will be the ideal one.

Agreed.

> 
> Currently i have tested the patches with single mac and single phy
> configuration. I need to take care of few more cases before releasing the 
> final patch
> but before that i would like to have your opinion on the above implementations
> and finalize one implementation. so that i can enhance it further.
> 
> Punnaiah Choudary Kalluri (1):
>   net: macb: Add mdio driver for accessing multiple phy devices
>   net: macb: Add support for single mac managing more than one phy
> 
> 
>  drivers/net/ethernet/cadence/Makefile|2 +-
>  drivers/net/ethernet/cadence/macb.c  |   93 +-
>  drivers/net/ethernet/cadence/macb.h  |3 +-
>  drivers/net/ethernet/cadence/macb_mdio.c |  204 
> ++
>  4 files changed, 211 insertions(+), 91 deletions(-)
>  create mode 100644 drivers/net/ethernet/cadence/macb_mdio.c
> 


-- 
Florian


Re: [PATCH net-next] tc: fix tc actions in case of shared skb

2015-07-13 Thread Alexei Starovoitov

On 7/13/15 1:04 PM, David Miller wrote:

From: Alexei Starovoitov 
Date: Mon, 13 Jul 2015 12:47:42 -0700


In all normal cases skb->users == 1, but pktgen is using trick:
atomic_add(burst, &skb->users);
so when testing something like:


You can want pktgen rx (which is the only buggy case as far as I can
see, TX is fine) to run fast, but you must do so by abiding by the
appropriate SKB sharing rules.

You can't do an optimization in pktgen for RX processing that works
"some of the time".  We have shared SKB rules for a reason.

And I don't want to have to explain to someone in the future why that
drop check is there, and have to tell them "because pktgen is broken
and we decided to add a hack here rather than make pktgen send
properly formed SKBs into the RX path"

Ok?


in general all makes sense, but it is both RX and TX.
Without burst hack we cannot achieve line rate TX.
atomic_add(burst, &pkt_dev->skb->users);
xmit_more:
ret = netdev_start_xmit(pkt_dev->skb, odev, txq, --burst > 0);

in pktgen we check that driver can work with users > 1 via:
pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING

so real hw drivers are mostly ready for users > 1; it's only a
few tc actions that struggle a bit.
We cannot check tc actions from pktgen, since they can be added
dynamically.
So I see three options:
1 get rid of burst hack for both RX and TX in pktgen (kills performance)
2 add an unlikely(skb_shared()) check to a few tc actions
3 do nothing

I think 2 isn't that bad after all if properly documented with
"because pktgen is doing this hack for performance" ?

I'm fine with 3 too, since the whole pktgen business is for root
and for kernel hackers who are supposed to know what they're doing.



Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users

2015-07-13 Thread Marcelo Ricardo Leitner

On 13-07-2015 16:58, David Miller wrote:

From: Marcelo Ricardo Leitner 
Date: Mon, 13 Jul 2015 16:05:27 -0300


On 13-07-2015 15:59, David Miller wrote:

From: Neil Horman 
Date: Mon, 13 Jul 2015 06:39:11 -0400


Initially Marcelo had created duplicate code paths, one to return an
fd, one to return a file struct.  If you would rather go in that
direction, I'm sure he can propose it again, but that seems less
correct to me than this solution.


That's much better.


I'm not sure what you mean. Is the new option better or the
history/description?


I mean that adding an explicit function for these internal kernel
users to call is better.


Okay. I'll try to minimize that code duplication then.

Thanks
Marcelo



Re: [PATCH] Logically DeadCode

2015-07-13 Thread Rafał Miłecki
On 3 July 2015 at 06:52, Rahul Jain  wrote:
> From 0c34030166a150d6d9f1ab52e7bb40a5440a68c2 Mon Sep 17 00:00:00 2001
> From: Rahul Jain 
> Date: Fri, 3 Jul 2015 10:19:12 +0530
> Subject: [PATCH] Logically DeadCode

You didn't use any prefix for the commit message, it's unclear
(Logically DeadCode what?), no description, you touch two code places
at once.

Please fix above problems and resend.


Re: [PATCH net-next] net: Build IPv6 into kernel by default

2015-07-13 Thread David Miller
From: Tom Herbert 
Date: Mon, 13 Jul 2015 08:48:00 -0700

> This patch makes the default to build IPv6 into the kernel. IPv6
> now has significant traction and any remaining vestiges of IPv6
> not being provided parity with IPv4 should be swept away. IPv6 is now
> core to the Internet and kernel.
> 
> Points on IPv6 adoption:
 ...
> Acked-by: YOSHIFUJI Hideaki 
> Signed-off-by: Tom Herbert 

Applied, thanks.


Re: [PATCH net-next] ebpf: remove self-assignment in interpreter's tail call

2015-07-13 Thread David Miller
From: Daniel Borkmann 
Date: Mon, 13 Jul 2015 20:49:32 +0200

> ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1]
> and thus has no effect. Add a comment instead, explaining what happens and
> why it's okay to just remove it. Since from user space side, a tail call is
> invoked as a pseudo helper function via bpf_tail_call_proto, the verifier
> checks the arguments just like with any other helper function and makes
> sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.
> 
> Signed-off-by: Daniel Borkmann 

Applied, thanks Daniel.


Re: [PATCH net-next] tc: fix tc actions in case of shared skb

2015-07-13 Thread David Miller
From: Alexei Starovoitov 
Date: Mon, 13 Jul 2015 12:47:42 -0700

> In all normal cases skb->users == 1, but pktgen is using trick:
> atomic_add(burst, &skb->users);
> so when testing something like:

You can want pktgen rx (which is the only buggy case as far as I can
see, TX is fine) to run fast, but you must do so by abiding by the
appropriate SKB sharing rules.

You can't do an optimization in pktgen for RX processing that works
"some of the time".  We have shared SKB rules for a reason.

And I don't want to have to explain to someone in the future why that
drop check is there, and have to tell them "because pktgen is broken
and we decided to add a hack here rather than make pktgen send
properly formed SKBs into the RX path"

Ok?


Re: [PATCH] Revert "net: fec: Ensure clocks are enabled while using mdio bus"

2015-07-13 Thread David Miller
From: Fabio Estevam 
Date: Mon, 13 Jul 2015 08:13:52 -0300

> This reverts commit 6c3e921b18edca290099adfddde8a50236bf2d80.
> 
> commit 6c3e921b18ed ("net: fec: Ensure clocks are enabled while using mdio
>  bus") prevents the kernel from booting on mx6 boards, so let's revert it.
> 
> Reported-by: Tyler Baker 
> Signed-off-by: Fabio Estevam 

Andrew, please review.


Re: [PATCH net-next] tc: fix tc actions in case of shared skb

2015-07-13 Thread Alexei Starovoitov

On 7/11/15 9:29 PM, David Miller wrote:

From: Alexei Starovoitov 
Date: Fri, 10 Jul 2015 17:10:11 -0700


TC actions need to check for very unlikely event skb->users != 1,
otherwise subsequent pskb_may_pull/pskb_expand_head will crash.
When skb_shared() just drop the packet, since in the middle of actions
it's too late to call skb_share_check(), since classifiers/actions assume
the same skb pointer.

Signed-off-by: Alexei Starovoitov 


I think whatever creates this skb->users != 1 situation should be fixed,
they should clone the packet.


In all normal cases skb->users == 1, but pktgen is using trick:
atomic_add(burst, &skb->users);
so when testing something like:
tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
  action vlan push id 2 action drop

it will crash:
[   31.999519] kernel BUG at ../net/core/skbuff.c:1130!
[   31.999519] invalid opcode:  [#1] PREEMPT SMP
[   31.999519] Modules linked in: act_gact act_vlan cls_u32 sch_ingress veth pktgen

[   31.999519] CPU: 0 PID: 339 Comm: kpktgend_0 Not tainted 4.1.0+ #730
[   31.999519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
[   31.999519] Call Trace:

[   31.999519]  [] skb_vlan_push+0x1d7/0x200
[   31.999519]  [] tcf_vlan+0x108/0x110 [act_vlan]
[   31.999519]  [] tcf_action_exec+0x46/0x80
[   31.999519]  [] u32_classify+0x30e/0x740 [cls_u32]
[   31.999519]  [] ? __lock_acquire+0xbcf/0x1e80
[   31.999519]  [] ? __lock_acquire+0xbcf/0x1e80
[   31.999519]  [] ? __netif_receive_skb_core+0x1b2/0xce0
[   31.999519]  [] tc_classify_compat+0xa3/0xb0
[   31.999519]  [] tc_classify+0x33/0x90
[   31.999519]  [] __netif_receive_skb_core+0x494/0xce0
[   31.999519]  [] ? __netif_receive_skb_core+0x94/0xce0
[   31.999519]  [] ? trace_hardirqs_on_caller+0xad/0x1d0
[   31.999519]  [] __netif_receive_skb+0x21/0x70
[   31.999519]  [] netif_receive_skb_internal+0x23/0x1c0
[   31.999519]  [] netif_receive_skb_sk+0x49/0x1e0
[   31.999519]  [] pktgen_thread_worker+0x111d/0x1fa0 [pktgen]



In fact, it would really help enormously if you could explain in detail
how this situation can actually arise.  Especially since I do not consider
it acceptable to drop the packet in this situation.


It's not pretty to drop, but it's better than a crash.
I don't think we can get rid of 'skb->users += burst' trick, since
that's where all performance comes from (for both TX and RX testing).

So the only cheap way I see to avoid crash is to do this
if (unlikely(skb_shared(skb)))
check in actions that call pskb_expand_head.

In all normal scenarios it won't be triggered and pktgen tests
won't be crashing.
Yes, pktgen numbers will be a bit meaningless, since act_vlan will be
dropping instead of adding vlan, so users cannot draw any performance
conclusions, but that's still better than a crash.
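
The check being proposed can be sketched in userspace as follows. This is a simplified model, not the posted patch: `struct skb_model`, `act_vlan_push_model()`, `pktgen_burst_model()` and the `ACT_*` values are hypothetical names for illustration. The only thing taken from the thread is the rule itself: an skb with `users != 1` is shared, and an action that needs to modify it drops the packet instead.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff; only the reference count is modelled. */
struct skb_model {
    int users;      /* mirrors skb->users */
    int vlan_tags;  /* counts "push" operations applied to the buffer */
};

/* Same idea as skb_shared(): the buffer has more than one holder. */
static bool skb_model_shared(const struct skb_model *skb)
{
    return skb->users != 1;
}

enum act_result { ACT_OK, ACT_SHOT };

/* An action that must modify the buffer: refuse (drop) if it is shared. */
static enum act_result act_vlan_push_model(struct skb_model *skb)
{
    if (skb_model_shared(skb))
        return ACT_SHOT;
    skb->vlan_tags++;
    return ACT_OK;
}

/* Models pktgen's burst trick: take `burst` extra references up front. */
static void pktgen_burst_model(struct skb_model *skb, int burst)
{
    skb->users += burst;
}
```

In the normal case (`users == 1`) the push goes through; after the burst trick bumps the count, the same action returns the drop verdict and leaves the buffer untouched.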


the rules specified here:
Documentation/networking/tc-actions-env-rules.txt
insufficient?


Jamal,
that doc definitely needs updating. :)
It says:
"If you munge any packet thou shalt call pskb_expand_head in the case
someone else is referencing the skb. After that you "own" the skb."
that's incorrect. If somebody is 'referencing' the skb via skb->users > 1,
it's too late to call pskb_expand_head, as you can see in the
crash trace above.



Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users

2015-07-13 Thread David Miller
From: Marcelo Ricardo Leitner 
Date: Mon, 13 Jul 2015 16:05:27 -0300

> On 13-07-2015 15:59, David Miller wrote:
>> From: Neil Horman 
>> Date: Mon, 13 Jul 2015 06:39:11 -0400
>>
>>> Initially Marcelo had created duplicate code paths, one to return an
>>> fd, one to return a file struct.  If you would rather go in that
>>> direction, I'm sure he can propose it again, but that seems less
>>> correct to me than this solution.
>>
>> That's much better.
> 
> I'm not sure what you mean. Is the new option better or the
> history/description?

I mean that adding an explicit function for these internal kernel
users to call is better.


[PATCH] net: qlcnic: Deletion of unnecessary memset

2015-07-13 Thread Christophe JAILLET
There is no need to memset memory allocated with vzalloc.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 2f6cc42..7dbab3c 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -2403,7 +2403,6 @@ int qlcnic_alloc_tx_rings(struct qlcnic_adapter *adapter,
qlcnic_free_tx_rings(adapter);
return -ENOMEM;
}
-   memset(cmd_buf_arr, 0, TX_BUFF_RINGSIZE(tx_ring));
tx_ring->cmd_buf_arr = cmd_buf_arr;
spin_lock_init(&tx_ring->tx_clean_lock);
}
-- 
2.1.4
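
A userspace analogy for why the memset is redundant, offered only as a sketch: `calloc()` plays the role of `vzalloc()` here, since both return zero-filled memory. The helper names `alloc_zeroed()` and `all_zero()` are made up for this example and do not exist in the driver.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace analogy: calloc(), like vzalloc(), hands back zero-filled memory. */
static unsigned char *alloc_zeroed(size_t n)
{
    return calloc(n, 1);  /* no memset() needed afterwards */
}

/* Returns 1 if every byte of buf[0..n) is zero, 0 otherwise. */
static int all_zero(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```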



[Patch net] fq_codel: fix a use-after-free

2015-07-13 Thread Cong Wang
Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines")
Cc: John Fastabend 
Signed-off-by: Cong Wang 
Signed-off-by: Cong Wang 
---
 net/sched/sch_fq_codel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index d75993f..06e7c84 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -155,10 +155,10 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
 	skb = dequeue_head(flow);
 	len = qdisc_pkt_len(skb);
 	q->backlogs[idx] -= len;
-	kfree_skb(skb);
 	sch->q.qlen--;
 	qdisc_qstats_drop(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
+	kfree_skb(skb);
 	flow->dropped++;
 	return idx;
 }
-- 
1.8.3.1
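
The patch carries no description, so here is a hedged userspace sketch of the ordering it appears to fix. It assumes the backlog helper reads the packet length from the packet itself, which is why the free has to come last. All names (`pkt_model`, `sch_model`, `backlog_dec()`, `drop_one()`) are hypothetical and only mirror the shape of the hunk above.

```c
#include <stdlib.h>

struct pkt_model { unsigned int len; };
struct sch_model { unsigned int qlen; unsigned int backlog; unsigned int drops; };

/* Dereferences the packet, so it must run before the packet is freed. */
static void backlog_dec(struct sch_model *sch, const struct pkt_model *pkt)
{
    sch->backlog -= pkt->len;
}

/* Drop one packet: update all statistics first, free the packet last. */
static unsigned int drop_one(struct sch_model *sch, struct pkt_model *pkt)
{
    unsigned int len = pkt->len;

    sch->qlen--;
    sch->drops++;
    backlog_dec(sch, pkt);
    free(pkt);  /* only after the last use of pkt */
    return len;
}
```

Moving `free(pkt)` above `backlog_dec()` in this model would reproduce the use-after-free pattern that the subject line refers to.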



[PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one

2015-07-13 Thread clsoto
From: Carol L Soto 

Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
trace at rmmod bonding.

[68830.202239] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0'
[68830.202257] ------------[ cut here ]------------
[68830.202260] WARNING: at fs/proc/generic.c:562
[68830.202412] NIP [c02abf6c] .remove_proc_entry+0x1fc/0x240
[68830.202416] LR [c02abf68] .remove_proc_entry+0x1f8/0x240
[68830.202419] PACATMSCRATCH [80009032]
[68830.202421] Call Trace:
[68830.202424] [c00179277940] [c02abf68] .remove_proc_entry+0x1f8/0x240 (unreliable)
[68830.202434] [c001792779f0] [d53229a4] .bond_destroy_proc_dir+0x34/0x54 [bonding]
[68830.202440] [c00179277a70] [d53130e0] .bond_net_exit+0x90/0x120 [bonding]
[68830.202445] [c00179277b10] [c059944c] .ops_exit_list.isra.0+0x6c/0xd0
[68830.202450] [c00179277ba0] [c0599774] .unregister_pernet_operations+0x94/0x100
[68830.202454] [c00179277c40] [c0599814] .unregister_pernet_subsys+0x34/0x60
[68830.202460] [c00179277cc0] [d5323758] .bonding_exit+0x48/0x2328 [bonding]
[68830.202466] [c00179277d30] [c010dcc4] .SyS_delete_module+0x1f4/0x340
[68830.202471] [c00179277e30] [c0009e7c] syscall_exit+0x0/0x7c
[68830.202491] ---[ end trace 9bd1d810219c9875 ]---

Signed-off-by: Carol L Soto 
---
 drivers/net/bonding/bond_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990..ace105a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
 		dev_set_mac_address(slave_dev, &addr);
 	}
 
+	bond_remove_proc_entry(bond);
+
 	dev_set_mtu(slave_dev, slave->original_mtu);
 
 	slave_dev->priv_flags &= ~IFF_BONDING;
-- 
1.8.3.1



Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users

2015-07-13 Thread Marcelo Ricardo Leitner

On 13-07-2015 15:59, David Miller wrote:

From: Neil Horman 
Date: Mon, 13 Jul 2015 06:39:11 -0400


Initially Marcelo had created duplicate code paths, one to return an
fd, one to return a file struct.  If you would rather go in that
direction, I'm sure he can propose it again, but that seems less
correct to me than this solution.


That's much better.


I'm not sure what you mean. Is the new option better or the 
history/description?


  Marcelo



Re: Fighting out-of-order reception with RPS?

2015-07-13 Thread Tom Herbert
On Sun, Jul 12, 2015 at 12:15 PM, Oliver Hartkopp
 wrote:
> Hello Eric,
>
> On 07/11/2015 06:35 AM, Eric Dumazet wrote:
>> On Fri, 2015-07-10 at 22:36 +0200, Oliver Hartkopp wrote:
>
>>> Hm. Doesn't sound like a good solution when there's a difference between 
>>> NAPI
>>> and non-NAPI drivers in matters of OOO, right?
>>
>> Isn't OOO a problem for you ? Then you either have to :
>>
>> 1) Use a single CPU to handle IRQ from the device
>> 2) Use NAPI
>>
>
> See below ...
>
>>> What about checking in netif_rx() if the non-NAPI driver has set a hash (aka
>>> the driver is OOO sensitive)?
>>> And if so we could automatically set rps_cpus for this interface in a way 
>>> that
>>> all CPUs are enabled to take skbs following the hash.
>>
>> Wow, netif_rx() is packet processing fast path, certainly not the place
>> to add controlling path decisions.
>
> My only requirement is to be able to pick CAN frames (contained in skbs) from
> the socket in the same order they have been received.
>
>> Please convert your driver to NAPI. You might then even benefit from
>> GRO.
>
> Just some remarks about CAN and CAN frames as you suggest GRO which is
> completely pointless for CAN.
>
> CAN frames have a 11 or 29 bit CAN Identifier (no MAC but content addressing)
> and 0 to 64 bytes of payload. Therefore the MTU for CAN interfaces is 16 or 72
> byte (see struct can(fd)_frame). Each skbuff contains a single CAN frame.
>
> There are CAN controllers which have a FIFO for up to 32 CAN frames, e.g.
> flexcan.c which also implements NAPI. Others (e.g. sja1000.c) don't have any
> FIFO and the reading of the CAN frame from the memory mapped registers needs
> to be processed in the irq context instantly. So 'fast path' netif_rx() is
> reasonable, right?
>
> So why is it not possible to pass netif_rx() skbs from a specific CAN network
> interface to whatever queue where they are processed in order?
>
> E.g. with
>
> skb_set_hash(skb, dev->ifindex, PKT_HASH_TYPE_L2);
>
> and
>
> echo f > /sys/class/net/can0/queues/rx-0/rps_cpus
>
> I get properly ordered CAN frames - even with netif_rx() processed skbs. I
> just want to have this stuff to be enabled by default for CAN interfaces to
> kill the OOO frame issue.
>
If you really must process the CAN FIFO in the hard interrupt then
create a private sk_buff queue. In the interrupt, dequeue from the FIFO and
enqueue on the sk_buff queue. Then schedule NAPI, and when that runs
process the sk_buff queue, calling netif_receive_skb for each
enqueued skb. Pretty simple actually :-)
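
The two-stage scheme described above can be modelled in plain C. This is a userspace sketch under stated assumptions, not driver code: `struct rx_queue_model`, `irq_enqueue()` and `napi_poll_model()` are hypothetical names, and an `int` stands in for each CAN frame. The point it shows is that a single FIFO between the interrupt side and the poll side keeps frames in arrival order.

```c
#include <stddef.h>

#define RXQ_LEN 8  /* hypothetical capacity of the private queue */

/* Hypothetical FIFO standing in for a driver-private sk_buff queue. */
struct rx_queue_model {
    int frames[RXQ_LEN];
    size_t head, tail, count;
};

/* "Hard interrupt" side: move one frame from the controller into the queue. */
static int irq_enqueue(struct rx_queue_model *q, int frame)
{
    if (q->count == RXQ_LEN)
        return -1;  /* queue full, frame is lost */
    q->frames[q->tail] = frame;
    q->tail = (q->tail + 1) % RXQ_LEN;
    q->count++;
    return 0;
}

/* "NAPI poll" side: deliver up to `budget` frames, oldest first. */
static size_t napi_poll_model(struct rx_queue_model *q, int *out, size_t budget)
{
    size_t done = 0;

    while (done < budget && q->count > 0) {
        out[done++] = q->frames[q->head];
        q->head = (q->head + 1) % RXQ_LEN;
        q->count--;
    }
    return done;
}
```

A real driver would also need locking between the interrupt handler and the poll routine; that is deliberately left out of this model.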

> Regards,
> Oliver
>


[PATCH net] tcp: don't use F-RTO on non-recurring timeouts

2015-07-13 Thread Yuchung Cheng
Currently F-RTO may repeatedly send new data packets on non-recurring
timeouts in CA_Loss mode. This is a bug because F-RTO (RFC5682)
should only be used on either new recovery or recurring timeouts.

This exacerbates the recovery progress during frequent timeout &
repair, because we prioritize sending new data packets instead of
repairing the holes when the bandwidth is already scarce.

Fix it by correcting the test of a new recovery episode.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
---
 net/ipv4/tcp_input.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1578fc2..0cef1af 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1920,14 +1920,13 @@ void tcp_enter_loss(struct sock *sk)
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
-   bool new_recovery = false;
+   bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
bool is_reneg;  /* is receiver reneging on SACKs? */
 
/* Reduce ssthresh if it has not yet been made inside this window. */
if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
!after(tp->high_seq, tp->snd_una) ||
(icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
-   new_recovery = true;
tp->prior_ssthresh = tcp_current_ssthresh(sk);
tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
tcp_ca_event(sk, CA_EVENT_LOSS);
-- 
2.4.3.573.g4eafbef
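
The change in the new-recovery test can be compared side by side in a small userspace model. This is a sketch, not kernel code: `struct loss_ctx` and the two predicate functions are hypothetical, and the enum assumes the congestion states are ordered Open, Disorder, CWR, Recovery, Loss as in the kernel's `tcp_ca_state`. The old predicate is taken from the condition in the hunk above; the new one is the initializer the patch adds.

```c
#include <stdbool.h>

/* Assumed to follow the order of the kernel's tcp_ca_state enum. */
enum ca_state_model { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

struct loss_ctx {
    enum ca_state_model state;
    bool high_seq_after_una;   /* after(tp->high_seq, tp->snd_una) */
    unsigned int retransmits;  /* icsk->icsk_retransmits */
};

/* Before the patch: new_recovery piggybacked on the ssthresh-reduction test. */
static bool new_recovery_old(const struct loss_ctx *c)
{
    return c->state <= CA_DISORDER ||
           !c->high_seq_after_una ||
           (c->state == CA_LOSS && c->retransmits == 0);
}

/* After the patch: only states below Recovery start a new episode. */
static bool new_recovery_new(const struct loss_ctx *c)
{
    return c->state < CA_RECOVERY;
}
```

For a socket already in CA_Loss with `retransmits == 0` (a non-recurring timeout), the old test reports a new recovery and the new test does not, which is the case the commit message describes.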



Re: [PATCH net-next] ebpf: remove self-assignment in interpreter's tail call

2015-07-13 Thread Alexei Starovoitov

On 7/13/15 11:49 AM, Daniel Borkmann wrote:

ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1]
and thus has no effect. Add a comment instead, explaining what happens and
why it's okay to just remove it. Since from user space side, a tail call is
invoked as a pseudo helper function via bpf_tail_call_proto, the verifier
checks the arguments just like with any other helper function and makes
sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.

Signed-off-by: Daniel Borkmann 


Thanks!
Acked-by: Alexei Starovoitov 



Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users

2015-07-13 Thread David Miller
From: Neil Horman 
Date: Mon, 13 Jul 2015 06:39:11 -0400

> Initially Marcelo had created duplicate code paths, one to return an
> fd, one to return a file struct.  If you would rather go in that
> direction, I'm sure he can propose it again, but that seems less
> correct to me than this solution.

That's much better.


[PATCH net-next] ebpf: remove self-assignment in interpreter's tail call

2015-07-13 Thread Daniel Borkmann
ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1]
and thus has no effect. Add a comment instead, explaining what happens and
why it's okay to just remove it. Since from user space side, a tail call is
invoked as a pseudo helper function via bpf_tail_call_proto, the verifier
checks the arguments just like with any other helper function and makes
sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.

Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/core.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c5bedc8..bf38f5e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -453,7 +453,11 @@ select_insn:
if (unlikely(!prog))
goto out;
 
-   ARG1 = BPF_R1;
+   /* ARG1 at this point is guaranteed to point to CTX from
+* the verifier side due to the fact that the tail call is
+* handled like a helper, that is, bpf_tail_call_proto,
+* where arg1_type is ARG_PTR_TO_CTX.
+*/
insn = prog->insnsi;
goto select_insn;
 out:
-- 
1.9.3

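For readers who want to convince themselves that the removed statement did nothing, here is a minimal userspace sketch. The enum values and the two macros are simplified stand-ins written for this note (assumptions, not copied from the kernel headers); the only property that matters is that both macros name the same slot of the register array, as the commit message states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the interpreter's register numbering (assumed). */
enum { BPF_REG_0, BPF_REG_1, BPF_REG_2, MAX_BPF_REG };
#define BPF_REG_ARG1 BPF_REG_1

/* Both macros resolve to the same array slot. */
#define BPF_R1 regs[BPF_REG_1]
#define ARG1   regs[BPF_REG_ARG1]

/* Returns 1 if "ARG1 = BPF_R1" leaves the register file unchanged. */
static int self_assignment_is_noop(uint64_t ctx)
{
	uint64_t regs[MAX_BPF_REG] = { 0, ctx, 0 };
	uint64_t before[MAX_BPF_REG];

	memcpy(before, regs, sizeof(regs));
	ARG1 = BPF_R1;	/* expands to regs[BPF_REG_1] = regs[BPF_REG_1] */
	return memcmp(before, regs, sizeof(regs)) == 0;
}
```

Whether the context pointer in that slot is valid is a separate question, which the patch answers by pointing at the verifier's ARG_PTR_TO_CTX check.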


[PATCH V4 2/2] pci: Add VPD quirk for Intel Ethernet devices

2015-07-13 Thread Mark D Rustad
From: Mark Rustad 

This quirk sets the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel
Ethernet device functions other than function 0.

Signed-off-by: Mark Rustad 
---
Changes in V3:
- Added a multifunction device check
---
 drivers/pci/quirks.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index e9fd0e90fa3b..08c04e4f5ab2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1894,6 +1894,15 @@ static void quirk_netmos(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
 
+static void quirk_f0_vpd_link(struct pci_dev *dev)
+{
+   if (!dev->multifunction || !PCI_FUNC(dev->devfn))
+   return;
+   dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
+}
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
+ PCI_CLASS_NETWORK_ETHERNET, 8, quirk_f0_vpd_link);
+
 static void quirk_e100_interrupt(struct pci_dev *dev)
 {
u16 command, pmcsr;

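The condition in quirk_f0_vpd_link() depends on how a devfn packs the slot and function numbers. The sketch below reproduces that encoding in userspace and applies the same decision to a stripped-down device record. The struct and the fake_ prefixed names are hypothetical and exist only for this illustration; the flag value mirrors the (1 << 8) used in patch 1/2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* devfn layout: upper 5 bits are the slot, lower 3 bits are the function. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

#define VPD_REF_F0	(1u << 8)	/* mirrors PCI_DEV_FLAGS_VPD_REF_F0 */

/* Hypothetical, stripped-down stand-in for struct pci_dev. */
struct fake_pci_dev {
	uint8_t devfn;
	bool multifunction;
	unsigned int dev_flags;
};

/* Same decision as the quirk: skip single-function devices and function 0. */
static void fake_quirk_f0_vpd_link(struct fake_pci_dev *dev)
{
	if (!dev->multifunction || !PCI_FUNC(dev->devfn))
		return;
	dev->dev_flags |= VPD_REF_F0;
}
```

Function 0 is deliberately left alone, which matches the warning in patch 1/2 that setting the bit there would recurse forever.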


[PATCH V4 1/2] pci: Add dev_flags bit to access VPD through function 0

2015-07-13 Thread Mark D Rustad
From: Mark Rustad 

Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through
function 0 to provide VPD access on other functions. This is for
hardware devices that provide copies of the same VPD capability
registers in multiple functions. Because the kernel expects that
each function has its own registers, both the locking and the state
tracking are affected by VPD accesses to different functions.

On such devices for example, if a VPD write is performed on function
0, *any* later attempt to read VPD from any other function of that
device will hang. This has to do with how the kernel tracks the
expected value of the F bit per function.

Concurrent accesses to different functions of the same device can
not only hang but also corrupt both read and write VPD data.

When hangs occur, typically the error message:

vpd r/w failed.  This is likely a firmware bug on this device.

will be seen.

Never set this bit on function 0 or there will be an infinite recursion.

Signed-off-by: Mark Rustad 
---
Changes in V2:
- Corrected spelling in log message
- Added checks to see that the referenced function 0 is reasonable
Changes in V3:
- Don't leak a device reference
- Check that function 0 has VPD
- Make a helper for the function 0 checks
- Do multifunction check in the quirk
Changes in V4:
- Provide a much more detailed explanation in the commit log
---
 drivers/pci/access.c |   61 +-
 include/linux/pci.h  |2 ++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index d9b64a175990..b965c12168b7 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -439,6 +439,56 @@ static const struct pci_vpd_ops pci_vpd_pci22_ops = {
.release = pci_vpd_pci22_release,
 };
 
+static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
+  void *arg)
+{
+   struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+   ssize_t ret;
+
+   if (!tdev)
+   return -ENODEV;
+
+   ret = pci_read_vpd(tdev, pos, count, arg);
+   pci_dev_put(tdev);
+   return ret;
+}
+
+static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
+   const void *arg)
+{
+   struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+   ssize_t ret;
+
+   if (!tdev)
+   return -ENODEV;
+
+   ret = pci_write_vpd(tdev, pos, count, arg);
+   pci_dev_put(tdev);
+   return ret;
+}
+
+static const struct pci_vpd_ops pci_vpd_f0_ops = {
+   .read = pci_vpd_f0_read,
+   .write = pci_vpd_f0_write,
+   .release = pci_vpd_pci22_release,
+};
+
+static int pci_vpd_f0_dev_check(struct pci_dev *dev)
+{
+   struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+   int ret = 0;
+
+   if (!tdev)
+   return -ENODEV;
+   if (!tdev->vpd || !tdev->multifunction ||
+   dev->class != tdev->class || dev->vendor != tdev->vendor ||
+   dev->device != tdev->device)
+   ret = -ENODEV;
+
+   pci_dev_put(tdev);
+   return ret;
+}
+
 int pci_vpd_pci22_init(struct pci_dev *dev)
 {
struct pci_vpd_pci22 *vpd;
@@ -447,12 +497,21 @@ int pci_vpd_pci22_init(struct pci_dev *dev)
cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
if (!cap)
return -ENODEV;
+   if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
+   int ret = pci_vpd_f0_dev_check(dev);
+
+   if (ret)
+   return ret;
+   }
vpd = kzalloc(sizeof(*vpd), GFP_ATOMIC);
if (!vpd)
return -ENOMEM;
 
vpd->base.len = PCI_VPD_PCI22_SIZE;
-   vpd->base.ops = &pci_vpd_pci22_ops;
+   if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0)
+   vpd->base.ops = &pci_vpd_f0_ops;
+   else
+   vpd->base.ops = &pci_vpd_pci22_ops;
mutex_init(&vpd->lock);
vpd->cap = cap;
vpd->busy = false;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8a0321a8fb59..8edb125db13a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -180,6 +180,8 @@ enum pci_dev_flags {
PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6),
/* Do not use PM reset even if device advertises NoSoftRst- */
PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
+   /* Get VPD from function 0 VPD */
+   PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
 };
 
 enum pci_irq_reroute_variant {

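The redirect idea in this patch can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: fake_dev, fake_bus, fake_get_slot() and fake_read_vpd() are hypothetical names invented here, there is no locking or reference counting, and function 0 is looked up by its full devfn (slot number combined with function 0). It also shows why the flag must never be set on function 0: the sketch refuses that case instead of recursing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)
#define VPD_REF_F0		(1u << 8)
#define VPD_LEN			16

/* Hypothetical device record: one VPD buffer per function. */
struct fake_dev {
	unsigned int devfn;
	unsigned int dev_flags;
	char vpd[VPD_LEN];
};

struct fake_bus {
	struct fake_dev *devs;
	size_t ndevs;
};

/* Look a device up by its full devfn, or return NULL. */
static struct fake_dev *fake_get_slot(struct fake_bus *bus, unsigned int devfn)
{
	for (size_t i = 0; i < bus->ndevs; i++)
		if (bus->devs[i].devfn == devfn)
			return &bus->devs[i];
	return NULL;
}

/* Returns the number of bytes copied, or -1 on error. */
static long fake_read_vpd(struct fake_bus *bus, struct fake_dev *dev,
			  size_t pos, size_t count, void *buf)
{
	if (dev->dev_flags & VPD_REF_F0) {
		struct fake_dev *f0;

		/* The flag on function 0 would redirect to itself forever. */
		if (PCI_FUNC(dev->devfn) == 0)
			return -1;
		f0 = fake_get_slot(bus, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
		if (!f0)
			return -1;
		return fake_read_vpd(bus, f0, pos, count, buf);
	}
	if (pos >= VPD_LEN)
		return 0;
	if (count > VPD_LEN - pos)
		count = VPD_LEN - pos;
	memcpy(buf, dev->vpd + pos, count);
	return (long)count;
}
```

Because every flagged function ends up in function 0's accessor, only one set of state (and, in the real code, one mutex) is ever involved, which is the property the commit log relies on.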


[PATCH net-next] hv_netvsc: Add close of RNDIS filter into change mtu call

2015-07-13 Thread Haiyang Zhang
The current change_mtu call only stops tx before removing the RNDIS filter.
If the ring buffer is not empty, rndis_filter_device_remove() may hang
while removing the buffers.

This patch closes the RNDIS filter before removing it, and adds a waiting
loop with gradually increasing delays until the ring is empty. This fixes
the change_mtu hang seen under heavy traffic.

Signed-off-by: Haiyang Zhang 
Reviewed-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/netvsc_drv.c |   58 +++
 1 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b855ba9..7b36d5f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -106,7 +106,7 @@ static int netvsc_open(struct net_device *net)
return ret;
}
 
-   netif_tx_start_all_queues(net);
+   netif_tx_wake_all_queues(net);
 
nvdev = hv_get_drvdata(device_obj);
rdev = nvdev->extension;
@@ -120,15 +120,56 @@ static int netvsc_close(struct net_device *net)
 {
struct net_device_context *net_device_ctx = netdev_priv(net);
struct hv_device *device_obj = net_device_ctx->device_ctx;
+   struct netvsc_device *nvdev = hv_get_drvdata(device_obj);
int ret;
+   u32 aread, awrite, i, msec = 10, retry = 0, retry_max = 20;
+   struct vmbus_channel *chn;
 
netif_tx_disable(net);
 
/* Make sure netvsc_set_multicast_list doesn't re-enable filter! */
cancel_work_sync(&net_device_ctx->work);
ret = rndis_filter_close(device_obj);
-   if (ret != 0)
+   if (ret != 0) {
netdev_err(net, "unable to close device (ret %d).\n", ret);
+   return ret;
+   }
+
+   /* Ensure pending bytes in ring are read */
+   while (true) {
+   aread = 0;
+   for (i = 0; i < nvdev->num_chn; i++) {
+   chn = nvdev->chn_table[i];
+   if (!chn)
+   continue;
+
+   hv_get_ringbuffer_availbytes(&chn->inbound, &aread,
+&awrite);
+
+   if (aread)
+   break;
+
+   hv_get_ringbuffer_availbytes(&chn->outbound, &aread,
+&awrite);
+
+   if (aread)
+   break;
+   }
+
+   retry++;
+   if (retry > retry_max || aread == 0)
+   break;
+
+   msleep(msec);
+
+   if (msec < 1000)
+   msec *= 2;
+   }
+
+   if (aread) {
+   netdev_err(net, "Ring buffer not empty after closing rndis\n");
+   ret = -ETIMEDOUT;
+   }
 
return ret;
 }
@@ -736,6 +777,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
struct netvsc_device *nvdev = hv_get_drvdata(hdev);
struct netvsc_device_info device_info;
int limit = ETH_DATA_LEN;
+   int ret = 0;
 
if (nvdev == NULL || nvdev->destroy)
return -ENODEV;
@@ -746,9 +788,11 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
if (mtu < NETVSC_MTU_MIN || mtu > limit)
return -EINVAL;
 
+   ret = netvsc_close(ndev);
+   if (ret)
+   goto out;
+
nvdev->start_remove = true;
-   cancel_work_sync(&ndevctx->work);
-   netif_tx_disable(ndev);
rndis_filter_device_remove(hdev);
 
ndev->mtu = mtu;
@@ -758,9 +802,11 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
device_info.ring_size = ring_size;
device_info.max_num_vrss_chns = max_num_vrss_chns;
rndis_filter_device_add(hdev, &device_info);
-   netif_tx_wake_all_queues(ndev);
 
-   return 0;
+out:
+   netvsc_open(ndev);
+
+   return ret;
 }
 
 static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
-- 
1.7.4.1

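To get a feel for how long the new loop in netvsc_close() can block, the following userspace sketch replays its backoff schedule without sleeping. It assumes the worst case, where the ring never drains; worst_case_wait_ms() is a hypothetical helper written for this note, not part of the driver.

```c
#include <assert.h>

/*
 * Total time slept, in milliseconds, if the ring never becomes empty.
 * Mirrors the loop shape in the patch: sleep, then double the delay
 * while it is still below cap_ms, for at most retry_max sleeps.
 */
static unsigned int worst_case_wait_ms(unsigned int start_ms,
				       unsigned int cap_ms,
				       unsigned int retry_max)
{
	unsigned int msec = start_ms, retry = 0, total = 0;

	while (1) {
		retry++;
		if (retry > retry_max)
			break;
		total += msec;		/* stands in for msleep(msec) */
		if (msec < cap_ms)
			msec *= 2;
	}
	return total;
}
```

With the values from the patch (start 10, threshold 1000, retry_max 20) this model yields 17910 ms. The delay settles at 1280 ms rather than 1000 ms, because the comparison is made before the doubling.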


RE: [PATCH] bnx2:Make various functions to have a return type of void in the file bnx2.c

2015-07-13 Thread Sony Chacko
> Sony,
> I also sent this patch and was wondering if I can get a reply on it.
> From 4a607447562bec161fd947caae5eb02c2365c58a Mon Sep 17 00:00:00 2001
> From: Nicholas Krause 
> Date: Wed, 8 Jul 2015 08:29:07 -0400
> Subject: [PATCH] bnx2i:Fix backwards locking scenario in the
> function  bnx2i_cleanup_task
> 
> This fixes the backwards locking scenario for unlocking the
> bottom half spinlock before calling the
> wait_for_completion_timeout on the structure pointer
> bnx2i_conn's member cmd_cleanup_cmpl for the critical region
> of this function to lock the spin_lock bottom half before
> unlocking it after the call to this function in order to have actual
> protection for the function bnx2i_cleanup_task's critical region.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  drivers/scsi/bnx2i/bnx2i_iscsi.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c
> b/drivers/scsi/bnx2i/bnx2i_iscsi.c
> index 7289437..619a26f 100644
> --- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
> +++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
> @@ -1172,12 +1172,12 @@ static void bnx2i_cleanup_task(struct iscsi_task *task)
>   if (task->state == ISCSI_TASK_ABRT_TMF) {
>   bnx2i_send_cmd_cleanup_req(hba, task->dd_data);
> 
> - spin_unlock_bh(&conn->session->back_lock);
> - spin_unlock_bh(&conn->session->frwd_lock);
> + spin_lock_bh(&conn->session->back_lock);
> + spin_lock_bh(&conn->session->frwd_lock);
>   wait_for_completion_timeout(&bnx2i_conn->cmd_cleanup_cmpl,
>   msecs_to_jiffies(ISCSI_CMD_CLEANUP_TIMEOUT));
> - spin_lock_bh(&conn->session->frwd_lock);
> - spin_lock_bh(&conn->session->back_lock);
> + spin_unlock_bh(&conn->session->frwd_lock);
> + spin_unlock_bh(&conn->session->back_lock);
>   }
>   bnx2i_iscsi_unmap_sg_list(task->dd_data);
>  }
> --
> 2.1.4
> I am assuming it's wrong but you never know.
> Nick

Nick,

I have added the QLogic iSCSI engineer to this thread to
review and ACK the patch. I will also follow up with the
iSCSI team.

Thanks,
Sony


[PATCH V4 0/2] pci: Provide a flag to access VPD through function 0

2015-07-13 Thread Mark D Rustad
Many multi-function devices provide shared registers in extended
config space for accessing VPD. The behavior of these registers
means that the state must be tracked and access locked correctly
for accesses not to hang or worse. One way to meet these needs is
to always perform the accesses through function 0, thereby using
the state tracking and mutex that already exists.

To provide this behavior, add a dev_flags bit to indicate that this
should be done. This bit can then be set for any non-zero function
that needs to redirect such VPD access to function 0. Do not set
this bit on the zero function or there will be an infinite recursion.

The second patch uses this new flag to invoke this behavior on all
multi-function Intel Ethernet devices.

Any hardware that shares VPD registers with multiple functions has
been suffering these problems forever. The hangs result in the log
message:

vpd r/w failed.  This is likely a firmware bug on this device.

Both read and write data corruption are also possible during
overlapping accesses in addition to hangs.

Signed-off-by: Mark Rustad 

---
Changes in V2:
- Corrected a spelling error in a log message
- Added checks to see that the referenced function 0 is reasonable
Changes in V3:
- Don't leak a device reference
- Check that function 0 has VPD
- Make a helper for the function 0 checks
- Moved a multifunction check to the quirk patch
Changes in V4:
- Provide a more extensive commit log for patch 1

---

Mark Rustad (2):
  pci: Add dev_flags bit to access VPD through function 0
  pci: Add VPD quirk for Intel Ethernet devices


 drivers/pci/access.c |   61 +-
 drivers/pci/quirks.c |9 +++
 include/linux/pci.h  |2 ++
 3 files changed, 71 insertions(+), 1 deletion(-)

-- 
Mark Rustad, Network Division, Intel Corporation


Re: [PATCH v2 18/22] fjes: unshare_watch_task

2015-07-13 Thread Yasuaki Ishimatsu

Hi Izumi-san,

On Wed, 24 Jun 2015 11:55:50 +0900
Taku Izumi  wrote:

> This patch adds unshare_watch_task.
> Shared buffer's status can be changed into unshared.
> This task is used to monitor shared buffer's status.
> 
> Signed-off-by: Taku Izumi 
> ---
>  drivers/net/fjes/fjes.h  |   3 +
>  drivers/net/fjes/fjes_main.c | 130 +++
>  2 files changed, 133 insertions(+)
> 
> diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
> index d31d4c3..57feee8 100644
> --- a/drivers/net/fjes/fjes.h
> +++ b/drivers/net/fjes/fjes.h
> @@ -59,6 +59,9 @@ struct fjes_adapter {
>   struct work_struct tx_stall_task;
>   struct work_struct raise_intr_rxdata_task;
>  
> + struct work_struct unshare_watch_task;
> + unsigned long unshare_watch_bitmask;
> +
>   struct delayed_work interrupt_watch_task;
>   bool interrupt_watch_enable;
>  
> diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
> index 1ddb9d3..69a238c 100644
> --- a/drivers/net/fjes/fjes_main.c
> +++ b/drivers/net/fjes/fjes_main.c
> @@ -73,6 +73,7 @@ static int fjes_remove(struct platform_device *);
>  static int fjes_sw_init(struct fjes_adapter *);
>  static void fjes_netdev_setup(struct net_device *);
>  static void fjes_irq_watch_task(struct work_struct *);
> +static void fjes_watch_unshare_task(struct work_struct *);
>  static void fjes_rx_irq(struct fjes_adapter *, int);
>  static int fjes_poll(struct napi_struct *, int);
>  
> @@ -312,6 +313,8 @@ static int fjes_close(struct net_device *netdev)
>   fjes_free_irq(adapter);
>  
>   cancel_delayed_work_sync(&adapter->interrupt_watch_task);
> + cancel_work_sync(&adapter->unshare_watch_task);
> + adapter->unshare_watch_bitmask = 0;
>   cancel_work_sync(&adapter->raise_intr_rxdata_task);
>   cancel_work_sync(&adapter->tx_stall_task);
>  
> @@ -1032,6 +1035,8 @@ static int fjes_probe(struct platform_device *plat_dev)
>   INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_task);
>   INIT_WORK(&adapter->raise_intr_rxdata_task,
> fjes_raise_intr_rxdata_task);
> + INIT_WORK(&adapter->unshare_watch_task, fjes_watch_unshare_task);
> + adapter->unshare_watch_bitmask = 0;
>  
>   INIT_DELAYED_WORK(&adapter->interrupt_watch_task, fjes_irq_watch_task);
>   adapter->interrupt_watch_enable = false;
> @@ -1077,6 +1082,7 @@ static int fjes_remove(struct platform_device *plat_dev)
>   struct fjes_hw *hw = &adapter->hw;
>  
>   cancel_delayed_work_sync(&adapter->interrupt_watch_task);
> + cancel_work_sync(&adapter->unshare_watch_task);
>   cancel_work_sync(&adapter->raise_intr_rxdata_task);
>   cancel_work_sync(&adapter->tx_stall_task);
>   if (adapter->control_wq)
> @@ -1136,6 +1142,130 @@ static void fjes_irq_watch_task(struct work_struct *work)
>   }
>  }
>  
> +static void fjes_watch_unshare_task(struct work_struct *work)
> +{
> + struct fjes_adapter *adapter =
> + container_of(work, struct fjes_adapter, unshare_watch_task);
> +
> + struct fjes_hw *hw = &adapter->hw;
> + struct net_device *netdev = adapter->netdev;
> + int epidx;
> + int max_epid, my_epid;
> + unsigned long unshare_watch_bitmask;
> + int wait_time = 0;
> + int is_shared;
> + int stop_req, stop_req_done;
> + int unshare_watch, unshare_reserve;
> + int ret;
> +
> + my_epid = hw->my_epid;
> + max_epid = hw->max_epid;
> +
> + unshare_watch_bitmask = adapter->unshare_watch_bitmask;
> + adapter->unshare_watch_bitmask = 0;
> +
> + while ((unshare_watch_bitmask || hw->txrx_stop_req_bit) &&
> +(wait_time < 3000)) {
> + for (epidx = 0; epidx < hw->max_epid; epidx++) {
> + if (epidx == hw->my_epid)
> + continue;
> +
> + is_shared =
> + fjes_hw_epid_is_shared(hw->hw_info.share, epidx);
> +
> + stop_req =
> + test_bit(epidx, &hw->txrx_stop_req_bit);
> +
> + stop_req_done =
> + hw->ep_shm_info[epidx].rx.info->v1i.rx_status &
> + FJES_RX_STOP_REQ_DONE;
> +
> + unshare_watch =
> + test_bit(epidx, &unshare_watch_bitmask);
> +
> + unshare_reserve =
> + test_bit(epidx,
> +  &hw->hw_info.buffer_unshare_reserve_bit);
> +
> + if ((!stop_req ||
> +  (is_shared && (!is_shared || !stop_req_done))) &&
> + (is_shared || !unshare_watch || !unshare_reserve))
> + continue;
> +

> + mutex_lock(&hw->hw_info.lock);
> + ret = fjes_hw_unregister_buff_addr(hw, epidx);
> + switch (ret) {
> + case 0:
> + break;
> +  

Re: Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c

2015-07-13 Thread Cong Wang
On Mon, Jul 13, 2015 at 9:13 AM, Akemi Yagi  wrote:
> On Sun, 05 Jul 2015 08:35:20 -0700, Guenter Roeck wrote:
>
>> On Sat, Jul 04, 2015 at 12:44:36AM -0700, Vinson Lee wrote:
>>> Hi.
>>>
>>> With the latest Linux 4.2-rc1, I am hitting this build error with GCC
>>> 4.4.7 on CentOS 6.
>>>
>>>   CC  net/netfilter/ipset/ip_set_hash_netnet.o
>>> net/netfilter/ipset/ip_set_hash_netnet.c: In function
>>> ‘hash_netnet4_uadt’:
>>> net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field
>>> ‘cidr’ specified in initializer
>>> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces
>>> around initializer net/netfilter/ipset/ip_set_hash_netnet.c:163:
>>> warning: (near initialization for ‘e.<anonymous>.ip’)
>>> net/netfilter/ipset/ip_set_hash_netnet.c: In function
>>> ‘hash_netnet6_uadt’:
>>> net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field
>>> ‘cidr’ specified in initializer
>>> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces
>>> around initializer net/netfilter/ipset/ip_set_hash_netnet.c:388:
>>> warning: (near initialization for ‘e.ip[0]’)
>>>
>> Previously fixed with commit 1a869205c75cb ("netfilter: ipset: The
>> unnamed union initialization may lead to compilation error"),
>> reintroduced with commit aff227581ed1a ("netfilter: ipset: Check CIDR
>> value only when attribute is given").
>>
>> Guenter
>
> I wonder what can be done to get this issue fixed. This problem was seen
> in 4.2-rc1 and now in 4.2-rc2 on RHEL-6.6.
>

Just revert the initializer piece.
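The reply above does not spell out what the reverted code should look like. One common way to stay compatible with older compilers such as the GCC 4.4.7 reported here is to avoid designated initializers for members of an unnamed union and assign them after zeroing the structure. The sketch below shows that pattern on a hypothetical struct elem that is only loosely modelled on the ipset element; it is not the actual kernel fix.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HOST_MASK 32

/* Hypothetical element type with an unnamed union (illustration only). */
struct elem {
	uint32_t ip[2];
	uint8_t nomatch;
	union {
		uint8_t cidr[2];
		uint16_t ccmp;
	};
};

static struct elem make_elem(void)
{
	struct elem e;

	/*
	 * Instead of: struct elem e = { .cidr[0] = HOST_MASK, ... };
	 * which older GCC rejects with "unknown field 'cidr'".
	 */
	memset(&e, 0, sizeof(e));
	e.cidr[0] = HOST_MASK;
	e.cidr[1] = HOST_MASK;
	return e;
}
```

Plain member access through the unnamed union is accepted by those compilers; only the initializer syntax is the problem.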


Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-13 Thread Jason Gunthorpe
On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
> + switch (ib_event->event) {
> + case IB_CM_REQ_RECEIVED:
> + req->device = req_param->listen_id->device;
> + req->port   = req_param->port;
> + req->local_gid  = &req_param->primary_path->sgid;
> + req->service_id = req_param->primary_path->service_id;
> + req->pkey   = be16_to_cpu(req_param->primary_path->pkey);

I feel pretty strongly that we should be using the pkey from the work
completion, not the pkey in the message.

The reason, if someone is using pkey like vlan, and expecting a
container to never receive packets outside the assigned pkey, then we
need to check each and every packet for the correct pkey before
associating it with that container.

When doing the namespace patches you should probably also look at
other CM GMPs than just the REQ and how the paths are setup and
consider what to do with the pkey. I'd probably suggest that the pkey
should be forced throughout the entire process to ensure it always
matches the ip device - at least for containers that is the right
thing.. I probably wouldn't turn it on for the root namespace though..

Jason


Re: [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids

2015-07-13 Thread Jason Gunthorpe
On Mon, Jun 22, 2015 at 03:42:40PM +0300, Haggai Eran wrote:
> Use ib_cm_id_create_and_listen to create listening IB CM IDs or share
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
Is that the wrong name? ib_cm_insert_listen perhaps?

I think I've looked at the details in this series I was concerned
about, Sean should OK the rest of the changes to the CM code, but
nothing much stood out to me.

Jason


Re: [PATCH v1 05/12] IB/cm: Share listening CM IDs

2015-07-13 Thread Jason Gunthorpe
On Mon, Jun 22, 2015 at 03:42:34PM +0300, Haggai Eran wrote:
>   spin_lock_irq(&cm.lock);
> + if (--cm_id_priv->listen_sharecount > 0) {
> + /* The id is still shared. */
> + atomic_dec(&cm_id_priv->refcount);

Nit: This looks very strange not to be cm_deref_id .. Looks OK as is
because we are sure refcount cannot be 0 here?

> @@ -958,8 +988,10 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>   }
>  
>   cm_id->state = IB_CM_LISTEN;
> + ++cm_id_priv->listen_sharecount;
>
> - spin_lock_irqsave(&cm.lock, flags);
> + if (lock)
> + spin_lock_irqsave(&cm.lock, flags);

Hmm, I'd like to see the listen_sharecount consistently locked, so it
should be manipulated only while cm.lock is held..

>   if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
>   cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
>   cm_id->service_mask = ~cpu_to_be64(0);
> @@ -968,18 +1000,98 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>   cm_id->service_mask = service_mask;
>   }
>   cur_cm_id_priv = cm_insert_listen(cm_id_priv);
> - spin_unlock_irqrestore(&cm.lock, flags);
> + if (lock)
> + spin_unlock_irqrestore(&cm.lock, flags);
>  
>   if (cur_cm_id_priv) {
>   cm_id->state = IB_CM_IDLE;
> + --cm_id_priv->listen_sharecount;

Ditto

Otherwise I don't see any other mechanical problems with this. Sean
said he was happy with the idea right?

Reviewed-By: Jason Gunthorpe 

Jason


Re: [PATCH 4/6] net: ieee802154: Remove redundant spi driver bus initialization

2015-07-13 Thread Alexander Aring
On Tue, Jun 23, 2015 at 10:52:52PM +0800, Antonio Borneo wrote:
> In ancient times it was necessary to manually initialize the bus
> field of an spi_driver to spi_bus_type. These days this is done in
> spi_register_driver(), so we can drop the manual assignment.
> 

Marcel,

I don't see this patch in any linux-next, net-next, bluetooth-next tree.
Could you please apply this patch with the acks by Alan and Varka?

- Alex


request for -stable: "route: Use ipv4_mtu instead of raw rt_pmtu"

2015-07-13 Thread Timo Teras
Hi,

Can you queue the following for the active older -stable trees, up to 3.18:

commit 3cdaa5be9e81 "ipv4: Don't increase PMTU with Datagram Too Big message"
commit cb6ccf09d6b9 "route: Use ipv4_mtu instead of raw rt_pmtu"

Commit 3cdaa5be9e81 made it to 3.19.y and was later amended by the
conversion to ipv4_mtu() in the second commit above.

Together, these patches also fix another, less obvious case: when the
original route has an MTU set on it. Previously that MTU was ignored, but
using ipv4_mtu() as the first check also takes RTAX_MTU from the route
metrics into account. This fixes the nasty issue where PMTU handling
could cause packets larger than the MTU explicitly configured on a static
route to be sent.

Thanks,
Timo
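The effect described in this request can be shown with a small model. This is a simplified sketch, not the kernel code: struct fake_route, effective_mtu() and update_pmtu() are hypothetical names, a value of 0 means "not set", and expiry of the learned PMTU and the minimum PMTU clamp are left out on purpose.

```c
#include <assert.h>

/* Hypothetical, simplified route; 0 means "not set". */
struct fake_route {
	unsigned int pmtu;		/* learned path MTU */
	unsigned int metric_mtu;	/* MTU configured on the route (RTAX_MTU) */
	unsigned int dev_mtu;		/* MTU of the outgoing device */
};

/* Learned PMTU first, then the route metric, then the device MTU. */
static unsigned int effective_mtu(const struct fake_route *rt)
{
	if (rt->pmtu)
		return rt->pmtu;
	if (rt->metric_mtu)
		return rt->metric_mtu;
	return rt->dev_mtu;
}

/* Apply a "Datagram Too Big" report; returns 1 if the PMTU was updated. */
static int update_pmtu(struct fake_route *rt, unsigned int reported)
{
	/* Never let a report raise the MTU above what is already in effect. */
	if (effective_mtu(rt) < reported)
		return 0;
	rt->pmtu = reported;
	return 1;
}
```

For a route configured with MTU 1400 on a 1500-byte device, a report of 1450 is ignored in this model, whereas a check against the device MTU alone would have accepted it and raised the effective MTU above the configured value.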


Re: [PATCH v2 15/22] fjes: net_device_ops.ndo_vlan_rx_add/kill_vid

2015-07-13 Thread Yasuaki Ishimatsu
Hi Izumi-san,

On Wed, 24 Jun 2015 11:55:47 +0900
Taku Izumi  wrote:

> This patch adds net_device_ops.ndo_vlan_rx_add_vid and
> net_device_ops.ndo_vlan_rx_kill_vid callback.
> 
> Signed-off-by: Taku Izumi 
> ---
>  drivers/net/fjes/fjes_hw.c   | 27 +++
>  drivers/net/fjes/fjes_hw.h   |  2 ++
>  drivers/net/fjes/fjes_main.c | 40 
>  3 files changed, 69 insertions(+)
> 
> diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
> index 5e3f847..8363e22 100644
> --- a/drivers/net/fjes/fjes_hw.c
> +++ b/drivers/net/fjes/fjes_hw.c
> @@ -827,6 +827,33 @@ bool fjes_hw_check_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
>   return ret;
>  }
>  
> +bool fjes_hw_set_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
> +{
> + union ep_buffer_info *info = epbh->info;
> + int i;
> +
> + for (i = 0; i < EP_BUFFER_SUPPORT_VLAN_MAX; i++) {
> + if (info->v1i.vlan_id[i] == 0) {
> + info->v1i.vlan_id[i] = vlan_id;
> + return true;
> + }
> + }
> + return false;
> +}
> +
> +void fjes_hw_del_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
> +{
> + union ep_buffer_info *info = epbh->info;
> + int i;
> +

> + if (0 != vlan_id) {

How about using the following early return, so that you can drop one
level of indentation?

if (vlan_id == 0)
return;

> + for (i = 0; i < EP_BUFFER_SUPPORT_VLAN_MAX; i++) {
> + if (vlan_id == info->v1i.vlan_id[i])
> + info->v1i.vlan_id[i] = 0;
> + }
> + }
> +}
> +
>  bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *epbh)
>  {
>   union ep_buffer_info *info = epbh->info;
> diff --git a/drivers/net/fjes/fjes_hw.h b/drivers/net/fjes/fjes_hw.h
> index ea30aeb..afad03e 100644
> --- a/drivers/net/fjes/fjes_hw.h
> +++ b/drivers/net/fjes/fjes_hw.h
> @@ -321,6 +321,8 @@ int fjes_hw_epid_is_shared(struct fjes_device_shared_info *, int);
>  bool fjes_hw_check_epbuf_version(struct epbuf_handler *, u32);
>  bool fjes_hw_check_mtu(struct epbuf_handler *, u32);
>  bool fjes_hw_check_vlan_id(struct epbuf_handler *, u16);
> +bool fjes_hw_set_vlan_id(struct epbuf_handler *, u16);
> +void fjes_hw_del_vlan_id(struct epbuf_handler *, u16);
>  bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *);
>  void *fjes_hw_epbuf_rx_curpkt_get_addr(struct epbuf_handler *, size_t *);
>  void fjes_hw_epbuf_rx_curpkt_drop(struct epbuf_handler *);
> diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
> index e2e69e0..bb4c8e4 100644
> --- a/drivers/net/fjes/fjes_main.c
> +++ b/drivers/net/fjes/fjes_main.c
> @@ -58,6 +58,8 @@ static irqreturn_t fjes_intr(int, void*);
>  static struct rtnl_link_stats64 *
>  fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
>  static int fjes_change_mtu(struct net_device *, int);
> +static int fjes_vlan_rx_add_vid(struct net_device *, __be16 proto, u16);
> +static int fjes_vlan_rx_kill_vid(struct net_device *, __be16 proto, u16);
>  static void fjes_tx_retry(struct net_device *);
>  
>  static int fjes_acpi_add(struct acpi_device *);
> @@ -229,6 +231,8 @@ static const struct net_device_ops fjes_netdev_ops = {
>   .ndo_get_stats64= fjes_get_stats64,
>   .ndo_change_mtu = fjes_change_mtu,
>   .ndo_tx_timeout = fjes_tx_retry,
> + .ndo_vlan_rx_add_vid= fjes_vlan_rx_add_vid,
> + .ndo_vlan_rx_kill_vid = fjes_vlan_rx_kill_vid,
>  };
>  
>  /* fjes_open - Called when a network interface is made active */
> @@ -757,6 +761,42 @@ static int fjes_change_mtu(struct net_device *netdev, int new_mtu)
>   return -EINVAL;
>  }
>  
> +static int fjes_vlan_rx_add_vid(struct net_device *netdev,
> + __be16 proto, u16 vid)
> +{
> + struct fjes_adapter *adapter = netdev_priv(netdev);
> + bool ret = true;
> + int epid;
> +
> + for (epid = 0; epid < adapter->hw.max_epid; epid++) {
> + if (epid == adapter->hw.my_epid)
> + continue;
> +
> + if (!fjes_hw_check_vlan_id(
> + &adapter->hw.ep_shm_info[epid].tx, vid))
> + ret = fjes_hw_set_vlan_id(
> + &adapter->hw.ep_shm_info[epid].tx, vid);
> + }
> +
> + return ret ? 0 : -ENOSPC;
> +}
> +

> +static int fjes_vlan_rx_kill_vid(struct net_device *netdev,
> +  __be16 proto, u16 vid)

The function always returns 0. So how about defining the function
as void?

Thanks,
Yasuaki Ishimatsu

> +{
> + struct fjes_adapter *adapter = netdev_priv(netdev);
> + int epid;
> +
> + for (epid = 0; epid < adapter->hw.max_epid; epid++) {
> + if (epid == adapter->hw.my_epid)
> + continue;
> +
> + fjes_hw_del_vlan_id(&adapter->hw.ep_shm_info[epid].tx, vid);
> + }
> +
> + return 0;
> +}
> +
>  static i
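The early-return form suggested in the review above would look roughly like the following userspace sketch. The fake_info struct, the VLAN_MAX value and the fake_ prefixed function names are stand-ins invented for this note; only the control flow follows the quoted patch and the review comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_MAX 4	/* stand-in for EP_BUFFER_SUPPORT_VLAN_MAX */

/* Hypothetical stand-in for the vlan_id table in the shared buffer info. */
struct fake_info {
	uint16_t vlan_id[VLAN_MAX];
};

/* Store vlan_id in the first free slot; false if the table is full. */
static bool fake_set_vlan_id(struct fake_info *info, uint16_t vlan_id)
{
	int i;

	for (i = 0; i < VLAN_MAX; i++) {
		if (info->vlan_id[i] == 0) {
			info->vlan_id[i] = vlan_id;
			return true;
		}
	}
	return false;
}

/* Clear every slot holding vlan_id; 0 marks a free slot, so it is a no-op. */
static void fake_del_vlan_id(struct fake_info *info, uint16_t vlan_id)
{
	int i;

	if (vlan_id == 0)
		return;

	for (i = 0; i < VLAN_MAX; i++) {
		if (info->vlan_id[i] == vlan_id)
			info->vlan_id[i] = 0;
	}
}
```

The behaviour is the same as the nested version in the patch; the loop just sits one level shallower.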

Re: Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c

2015-07-13 Thread Akemi Yagi
On Sun, 05 Jul 2015 08:35:20 -0700, Guenter Roeck wrote:

> On Sat, Jul 04, 2015 at 12:44:36AM -0700, Vinson Lee wrote:
>> Hi.
>> 
>> With the latest Linux 4.2-rc1, I am hitting this build error with GCC
>> 4.4.7 on CentOS 6.
>> 
>>   CC  net/netfilter/ipset/ip_set_hash_netnet.o
>> net/netfilter/ipset/ip_set_hash_netnet.c: In function
>> ‘hash_netnet4_uadt’:
>> net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field
>> ‘cidr’ specified in initializer
>> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces
>> around initializer net/netfilter/ipset/ip_set_hash_netnet.c:163:
>> warning: (near initialization for ‘e..ip’)
>> net/netfilter/ipset/ip_set_hash_netnet.c: In function
>> ‘hash_netnet6_uadt’:
>> net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field
>> ‘cidr’ specified in initializer
>> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces
>> around initializer net/netfilter/ipset/ip_set_hash_netnet.c:388:
>> warning: (near initialization for ‘e.ip[0]’)
>> 
> Previously fixed with commit 1a869205c75cb ("netfilter: ipset: The
> unnamed union initialization may lead to compilation error"),
> reintroduced with commit aff227581ed1a ("netfilter: ipset: Check CIDR
> value only when attribute is given").
> 
> Guenter

I wonder what can be done to get this issue fixed. This problem was seen 
in 4.2-rc1 and now in 4.2-rc2 on RHEL-6.6.

$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)

Please advise.

Akemi

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/22] fjes: tx_stall_task

2015-07-13 Thread Yasuaki Ishimatsu
Hi Izumi-san,

On Wed, 24 Jun 2015 11:55:42 +0900
Taku Izumi  wrote:

> This patch adds tx_stall_task.
> When receiver's buffer is full, sender stops
> its tx queue. This task is used to monitor
> receiver's status and when receiver's buffer
> is available, it resumes tx queue.
> 
> Signed-off-by: Taku Izumi 
> ---
>  drivers/net/fjes/fjes.h  |  2 ++
>  drivers/net/fjes/fjes_main.c | 63 
> 
>  2 files changed, 65 insertions(+)
> 
> diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
> index 8e9899e..b04ea9d 100644
> --- a/drivers/net/fjes/fjes.h
> +++ b/drivers/net/fjes/fjes.h
> @@ -30,6 +30,7 @@
>  #define FJES_MAX_QUEUES  1
>  #define FJES_TX_RETRY_INTERVAL   (20 * HZ)
>  #define FJES_TX_RETRY_TIMEOUT(100)
> +#define FJES_TX_TX_STALL_TIMEOUT (FJES_TX_RETRY_INTERVAL / 2)
>  #define FJES_OPEN_ZONE_UPDATE_WAIT   (300) /* msec */
>  
>  /* board specific private data structure */
> @@ -52,6 +53,7 @@ struct fjes_adapter {
>  
>   struct workqueue_struct *txrx_wq;
>  
> + struct work_struct tx_stall_task;
>   struct work_struct raise_intr_rxdata_task;
>  
>   struct fjes_hw hw;
> diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
> index 735aa5e..f4c2445 100644
> --- a/drivers/net/fjes/fjes_main.c
> +++ b/drivers/net/fjes/fjes_main.c
> @@ -53,6 +53,7 @@ static int fjes_setup_resources(struct fjes_adapter *);
>  static void fjes_free_resources(struct fjes_adapter *);
>  static netdev_tx_t fjes_xmit_frame(struct sk_buff *, struct net_device *);
>  static void fjes_raise_intr_rxdata_task(struct work_struct *);
> +static void fjes_tx_stall_task(struct work_struct *);
>  static irqreturn_t fjes_intr(int, void*);
>  
>  static int fjes_acpi_add(struct acpi_device *);
> @@ -281,6 +282,7 @@ static int fjes_close(struct net_device *netdev)
>   fjes_free_irq(adapter);
>  
>   cancel_work_sync(&adapter->raise_intr_rxdata_task);
> + cancel_work_sync(&adapter->tx_stall_task);
>  
>   fjes_hw_wait_epstop(hw);
>  
> @@ -410,6 +412,61 @@ static void fjes_free_resources(struct fjes_adapter 
> *adapter)
>   }
>  }
>  
> +static void fjes_tx_stall_task(struct work_struct *work)
> +{
> + struct fjes_adapter *adapter = container_of(work,
> + struct fjes_adapter, tx_stall_task);
> + struct fjes_hw *hw = &adapter->hw;
> + struct net_device *netdev = adapter->netdev;
> + enum ep_partner_status pstatus;
> + int epid;
> + int max_epid, my_epid;
> + union ep_buffer_info *info;
> + int all_queue_available;
> + int i;
> + int sendable;
> +
> + if (((long)jiffies -
> + (long)(netdev->trans_start)) > FJES_TX_TX_STALL_TIMEOUT) {
> + netif_wake_queue(netdev);
> + return;
> + }
> +
> + my_epid = hw->my_epid;
> + max_epid = hw->max_epid;
> +

> + for (i = 0; i < 5; i++) {

Why do you loop 5 times?

Thanks,
Yasuaki Ishimatsu

> + all_queue_available = 1;
> +
> + for (epid = 0; epid < max_epid; epid++) {
> + if (my_epid == epid)
> + continue;
> +
> + pstatus = fjes_hw_get_partner_ep_status(hw, epid);
> + sendable = (pstatus == EP_PARTNER_SHARED);
> + if (!sendable)
> + continue;
> +
> + info = adapter->hw.ep_shm_info[epid].tx.info;
> +
> + if (EP_RING_FULL(info->v1i.head, info->v1i.tail,
> +  info->v1i.count_max)) {
> + all_queue_available = 0;
> + break;
> + }
> + }
> +
> + if (all_queue_available) {
> + netif_wake_queue(netdev);
> + return;
> + }
> + }
> +
> + usleep_range(50, 100);
> +
> + queue_work(adapter->txrx_wq, &adapter->tx_stall_task);
> +}
> +
>  static void fjes_raise_intr_rxdata_task(struct work_struct *work)
>  {
>   struct fjes_adapter *adapter = container_of(work,
> @@ -606,6 +663,10 @@ fjes_xmit_frame(struct sk_buff *skb, struct net_device 
> *netdev)
>   netdev->trans_start = jiffies;
>   netif_tx_stop_queue(cur_queue);
>  
> + if 
> (!work_pending(&adapter->tx_stall_task))
> + queue_work(adapter->txrx_wq,
> +
> &adapter->tx_stall_task);
> +
>   ret = NETDEV_TX_BUSY;
>   }
>   } else {
> @@ -690,6 +751,7 @@ static int fjes_probe(struct platform_device *plat_dev)
>  
>   adapter->txrx_wq = create_workqueue(DRV_NAME "/txrx");
>  
> + INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_ta

[PATCH net-next] net: Build IPv6 into kernel by default

2015-07-13 Thread Tom Herbert
This patch makes the default to build IPv6 into the kernel. IPv6
now has significant traction and any remaining vestiges of IPv6
not being provided parity with IPv4 should be swept away. IPv6 is now
core to the Internet and kernel.

Points on IPv6 adoption:

- Per Google statistics, IPv6 usage has reached 7% on the Internet
  and continues to exhibit an exponential growth rate
  https://www.google.com/intl/en/ipv6/statistics.html
- Just a few days ago ARIN officially depleted its IPv4 pool
- IPv6-only data centers are being successfully built
  (e.g. at Facebook)

This patch changes the Kconfig entry for IPV6: the default for CONFIG_IPV6
is set to "y" and the help text has been updated to reflect the maturity of
IPv6.

Impact:

Under some circumstances, building modules into the kernel might have a
performance advantage. In my testing, I did notice a very slight
improvement.

This will obviously increase the size of the kernel image. In my
configuration I see:

IPv6 as module:

   text    data     bss      dec    hex filename
9436490 1879600  913408 12229498 ba9b7a vmlinux

IPv6 built into kernel:

   text    data     bss      dec    hex filename
9703666 1899288  933888 12536842 bf4c0a vmlinux

Which increases text size by ~270K (2.8% increase in size for me). If
image size is an issue, presumably for a device which does not do IP
networking (IMO we should be discouraging IPv4-only devices), IPV6 can
be disabled or still built as a module.
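
As a quick sanity check, the growth can be recomputed from the two text figures quoted above. This is only a sketch of the arithmetic: the macro and helper names are invented for illustration, and rounding to tenths of a percent is an arbitrary choice.

```c
#include <assert.h>

/* Text-segment sizes in bytes, copied from the two size(1) runs above. */
#define TEXT_SMALLER 9436490UL
#define TEXT_LARGER  9703666UL

/* Hypothetical helper: absolute growth in bytes. */
static unsigned long text_growth(unsigned long before, unsigned long after)
{
	assert(after >= before);
	return after - before;
}

/* Hypothetical helper: growth in tenths of a percent, rounded to nearest. */
static unsigned long growth_permille(unsigned long before, unsigned long after)
{
	assert(before > 0);
	return (text_growth(before, after) * 1000UL + before / 2) / before;
}
```

With the figures above, text_growth() gives 267176 bytes and growth_permille() gives 28, i.e. roughly 270K and 2.8%.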

Acked-by: YOSHIFUJI Hideaki 
Signed-off-by: Tom Herbert 
---
 net/ipv6/Kconfig | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 438a73a..643f613 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -5,16 +5,15 @@
 #   IPv6 as module will cause a CRASH if you try to unload it
 menuconfig IPV6
tristate "The IPv6 protocol"
-   default m
+   default y
---help---
- This is complemental support for the IP version 6.
- You will still be able to do traditional IPv4 networking as well.
+ Support for IP version 6 (IPv6).
 
  For general information about IPv6, see
  .
- For Linux IPv6 development information, see 
.
- For specific information about IPv6 under Linux, read the HOWTO at
- .
+ For specific information about IPv6 under Linux, see
+ Documentation/networking/ipv6.txt and read the HOWTO at
+ 
 
  To compile this protocol support as a module, choose M here: the 
  module will be called ipv6.
-- 
1.8.1



Re: [PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().

2015-07-13 Thread Tom Herbert
I am testing this patch, which may be a little simpler. Also, idev needs
to be checked after __in6_dev_get().

Tom
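
To illustrate the idea outside the kernel, here is a minimal user-space sketch of passing both slot pointers in and out, so that a swap done inside the helper is visible to the caller. The struct, the function names and the candidate arrays are all made up for illustration; only the swap-and-write-back pattern mirrors the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct ipv6_saddr_score: only a rule value. */
struct score {
	int rule;
};

/*
 * Scan the candidates of one "device". Whenever a candidate beats the
 * current best, keep it by swapping the two slot pointers, then hand
 * both pointers back to the caller.
 */
static void scan(const int *cand, size_t n,
		 struct score **in_score, struct score **in_hiscore)
{
	struct score *score = *in_score, *hiscore = *in_hiscore;
	size_t i;

	assert(score && hiscore && score != hiscore);

	for (i = 0; i < n; i++) {
		struct score *tmp;

		score->rule = cand[i];
		if (score->rule <= hiscore->rule)
			continue;

		/* same idea as swap(hiscore, score) in the kernel code */
		tmp = hiscore;
		hiscore = score;
		score = tmp;
	}

	/* without these two stores the caller would keep stale pointers */
	*in_hiscore = hiscore;
	*in_score = score;
}

/* Hypothetical caller: best value over two devices scanned back to back. */
static int best_of_two_devices(void)
{
	static const int dev_a[] = { 1, 5 };
	static const int dev_b[] = { 3, 7, 2 };
	struct score scores[2];
	struct score *score = &scores[0], *hiscore = &scores[1];

	hiscore->rule = -1;
	scan(dev_a, 2, &score, &hiscore);
	scan(dev_b, 3, &score, &hiscore);
	return hiscore->rule;
}
```

The second scan() call starts from whichever slot the first call left as the best one, which is the property the write-back is meant to provide.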

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ab74d5..d631ac3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1363,9 +1363,10 @@ static void __ipv6_dev_get_saddr(struct net *net,
 unsigned int prefs,
 const struct in6_addr *saddr,
 struct inet6_dev *idev,
-struct ipv6_saddr_score *scores)
+struct ipv6_saddr_score **in_score,
+struct ipv6_saddr_score **in_hiscore)
 {
-   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
+   struct ipv6_saddr_score *score = *in_score, *hiscore = *in_hiscore;

read_lock_bh(&idev->lock);
list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
@@ -1434,13 +1435,16 @@ static void __ipv6_dev_get_saddr(struct net *net,
}
 out:
read_unlock_bh(&idev->lock);
+   *in_hiscore = hiscore;
+   *in_score = score;
 }

 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
   const struct in6_addr *daddr, unsigned int prefs,
   struct in6_addr *saddr)
 {
-   struct ipv6_saddr_score scores[2], *hiscore = &scores[1];
+   struct ipv6_saddr_score scores[2];
+   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
struct ipv6_saddr_dst dst;
struct inet6_dev *idev;
struct net_device *dev;
@@ -1475,18 +1479,19 @@ int ipv6_dev_get_saddr(struct net *net, const
struct net_device *dst_dev,
if ((dst_type & IPV6_ADDR_MULTICAST) ||
dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL) {
idev = __in6_dev_get(dst_dev);
-   use_oif_addr = true;
+   if (idev)
+   use_oif_addr = true;
}
}
if (use_oif_addr) {
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
&score, &hiscore);
} else {
for_each_netdev_rcu(net, dev) {
idev = __in6_dev_get(dev);
if (!idev)
continue;
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr,
idev, scores);
+   __ipv6_dev_get_saddr(net, &dst, prefs, saddr,
idev, &score, &hiscore);
}
}
rcu_read_unlock();

On Mon, Jul 13, 2015 at 7:28 AM, YOSHIFUJI Hideaki/吉藤英明
 wrote:
> Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
> finding source address on specific interface.") did not properly
> update best source address available.  Plus, it introduced
> possible NULL pointer dereference.
>
> Bug was reported by Erik Kline.
> Based on patch proposed by Hajime Tazaki.
>
> Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
> iterate over all interfaces when finding source address
> on specific interface.")
> Signed-off-by: YOSHIFUJI Hideaki 
> ---
>  net/ipv6/addrconf.c | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4ab74d5..4c9a024 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1358,14 +1358,15 @@ out:
> return ret;
>  }
>
> -static void __ipv6_dev_get_saddr(struct net *net,
> -struct ipv6_saddr_dst *dst,
> -unsigned int prefs,
> -const struct in6_addr *saddr,
> -struct inet6_dev *idev,
> -struct ipv6_saddr_score *scores)
> +static int __ipv6_dev_get_saddr(struct net *net,
> +   struct ipv6_saddr_dst *dst,
> +   unsigned int prefs,
> +   const struct in6_addr *saddr,
> +   struct inet6_dev *idev,
> +   struct ipv6_saddr_score *scores,
> +   int hiscore_idx)
>  {
> -   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
> +   struct ipv6_saddr_score *score = &scores[1 - hiscore_idx], *hiscore = 
> &scores[hiscore_idx];
>
> read_lock_bh(&idev->lock);
> list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
> @@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
> in6_ifa_hold(score->ifa);
>
> swap(hiscore, score);
> +   hiscore_idx = 1 - hiscore_idx;
>
> /* restore our iterator */
>   

Re: [PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().

2015-07-13 Thread Hajime Tazaki

At Mon, 13 Jul 2015 23:28:10 +0900,
YOSHIFUJI Hideaki/吉藤英明 wrote:
> 
> Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
> finding source address on specific interface.") did not properly
> update best source address available.  Plus, it introduced
> possible NULL pointer dereference.
> 
> Bug was reported by Erik Kline.
> Based on patch proposed by Hajime Tazaki.
> 
> Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
>   iterate over all interfaces when finding source address
>   on specific interface.")
> Signed-off-by: YOSHIFUJI Hideaki 

All of my tests passed with this patch applied on top of 14fe22e
("Revert "ipv4: use skb coalescing in defragmentation"").

Thanks for the prompt fix!

Acked-by: Hajime Tazaki 



Re: [PATCH] nf: IDLETIMER: fix lockdep warning

2015-07-13 Thread Pablo Neira Ayuso
On Mon, Jul 13, 2015 at 08:02:36AM -0700, Dmitry Torokhov wrote:
> On Mon, Jul 13, 2015 at 6:20 AM, Pablo Neira Ayuso  
> wrote:
> > On Thu, Jul 09, 2015 at 05:15:01PM -0700, Dmitry Torokhov wrote:
> >> Dynamically allocated sysfs attributes should be initialized with
> >> sysfs_attr_init() otherwise lockdep will be angry with us:
> >>
> >> [   45.468653] BUG: key ffc030fad4e0 not in .data!
> >> [   45.468655] [ cut here ]
> >> [   45.468666] WARNING: CPU: 0 PID: 1176 at 
> >> /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991
> >>  lockdep_init_map+0x12c/0x490()
> >> [   45.468672] DEBUG_LOCKS_WARN_ON(1)
> >> [   45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U  W 3.18.0 
> >> #43
> >> [   45.468674] Hardware name: XXX
> >> [   45.468675] Call trace:
> >> [   45.468680] [] dump_backtrace+0x0/0x10c
> >> [   45.468683] [] show_stack+0x10/0x1c
> >> [   45.468688] [] dump_stack+0x74/0x94
> >> [   45.468692] [] warn_slowpath_common+0x84/0xb0
> >> [   45.468694] [] warn_slowpath_fmt+0x4c/0x58
> >> [   45.468697] [] lockdep_init_map+0x128/0x490
> >> [   45.468701] [] __kernfs_create_file+0x80/0xe4
> >> [   45.468704] [] sysfs_add_file_mode_ns+0x104/0x170
> >> [   45.468706] [] sysfs_create_file_ns+0x58/0x64
> >> [   45.468711] [] idletimer_tg_checkentry+0x14c/0x324
> >> [   45.468714] [] xt_check_target+0x170/0x198
> >> [   45.468717] [] check_target+0x58/0x6c
> >> [   45.468720] [] translate_table+0x30c/0x424
> >> [   45.468723] [] do_ipt_set_ctl+0x144/0x1d0
> >> [   45.468728] [] nf_setsockopt+0x50/0x60
> >> [   45.468732] [] ip_setsockopt+0x8c/0xb4
> >> [   45.468735] [] raw_setsockopt+0x10/0x50
> >> [   45.468739] [] sock_common_setsockopt+0x14/0x20
> >> [   45.468742] [] SyS_setsockopt+0x88/0xb8
> >> [   45.468744] ---[ end trace 41d156354d18c039 ]---
> >
> > Applied, thanks.
> >
> > One question:
> >
> >> Change-Id: I1da5cd96fc8e1e1e4209e81eba1165a42d4d45e9
> >
> > BTW, does this gerrit change ID provide any public information? Thanks.
> 
> Argh, I am sorry, I forgot to clean this out when mailing the patch.
> In this particular case you can find the change in AOSP gerrit at
> https://android-review.googlesource.com but without such context this
> change-id is of course useless.

No problem, I'll remove it. Thanks Dmitry.


Re: [PATCH] nf: IDLETIMER: fix lockdep warning

2015-07-13 Thread Dmitry Torokhov
On Mon, Jul 13, 2015 at 6:20 AM, Pablo Neira Ayuso  wrote:
> On Thu, Jul 09, 2015 at 05:15:01PM -0700, Dmitry Torokhov wrote:
>> Dynamically allocated sysfs attributes should be initialized with
>> sysfs_attr_init() otherwise lockdep will be angry with us:
>>
>> [   45.468653] BUG: key ffc030fad4e0 not in .data!
>> [   45.468655] [ cut here ]
>> [   45.468666] WARNING: CPU: 0 PID: 1176 at 
>> /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 
>> lockdep_init_map+0x12c/0x490()
>> [   45.468672] DEBUG_LOCKS_WARN_ON(1)
>> [   45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U  W 3.18.0 #43
>> [   45.468674] Hardware name: XXX
>> [   45.468675] Call trace:
>> [   45.468680] [] dump_backtrace+0x0/0x10c
>> [   45.468683] [] show_stack+0x10/0x1c
>> [   45.468688] [] dump_stack+0x74/0x94
>> [   45.468692] [] warn_slowpath_common+0x84/0xb0
>> [   45.468694] [] warn_slowpath_fmt+0x4c/0x58
>> [   45.468697] [] lockdep_init_map+0x128/0x490
>> [   45.468701] [] __kernfs_create_file+0x80/0xe4
>> [   45.468704] [] sysfs_add_file_mode_ns+0x104/0x170
>> [   45.468706] [] sysfs_create_file_ns+0x58/0x64
>> [   45.468711] [] idletimer_tg_checkentry+0x14c/0x324
>> [   45.468714] [] xt_check_target+0x170/0x198
>> [   45.468717] [] check_target+0x58/0x6c
>> [   45.468720] [] translate_table+0x30c/0x424
>> [   45.468723] [] do_ipt_set_ctl+0x144/0x1d0
>> [   45.468728] [] nf_setsockopt+0x50/0x60
>> [   45.468732] [] ip_setsockopt+0x8c/0xb4
>> [   45.468735] [] raw_setsockopt+0x10/0x50
>> [   45.468739] [] sock_common_setsockopt+0x14/0x20
>> [   45.468742] [] SyS_setsockopt+0x88/0xb8
>> [   45.468744] ---[ end trace 41d156354d18c039 ]---
>
> Applied, thanks.
>
> One question:
>
>> Change-Id: I1da5cd96fc8e1e1e4209e81eba1165a42d4d45e9
>
> BTW, does this gerrit change ID provide any public information? Thanks.

Argh, I am sorry, I forgot to clean this out when mailing the patch.
In this particular case you can find the change in AOSP gerrit at
https://android-review.googlesource.com but without such context this
change-id is of course useless.

Thanks,
Dmitry


[PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().

2015-07-13 Thread YOSHIFUJI Hideaki/吉藤英明
Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
finding source address on specific interface.") did not properly
update best source address available.  Plus, it introduced
possible NULL pointer dereference.

Bug was reported by Erik Kline.
Based on patch proposed by Hajime Tazaki.
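
The index-passing scheme used by this patch can be tried in user space with a small sketch. Everything here (struct, function names, candidate values) is hypothetical and heavily simplified; it only shows how returning the index of the best slot keeps caller and helper in agreement about which of the two slots currently holds the best score.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct ipv6_saddr_score: only a rule value. */
struct score {
	int rule;
};

/*
 * scores[hiscore_idx] holds the best entry so far and
 * scores[1 - hiscore_idx] is scratch space. Returns the index of the
 * best slot once all candidates have been looked at.
 */
static int scan_idx(const int *cand, size_t n, struct score *scores,
		    int hiscore_idx)
{
	size_t i;

	assert(hiscore_idx == 0 || hiscore_idx == 1);

	for (i = 0; i < n; i++) {
		struct score *score = &scores[1 - hiscore_idx];

		score->rule = cand[i];
		if (score->rule > scores[hiscore_idx].rule)
			hiscore_idx = 1 - hiscore_idx;	/* the two roles flip */
	}
	return hiscore_idx;
}

/* Hypothetical caller: best value over two devices scanned back to back. */
static int best_by_index(void)
{
	static const int dev_a[] = { 4 };
	static const int dev_b[] = { 9, 6 };
	struct score scores[2];
	int hiscore_idx = 0;

	scores[hiscore_idx].rule = -1;
	hiscore_idx = scan_idx(dev_a, 1, scores, hiscore_idx);
	hiscore_idx = scan_idx(dev_b, 2, scores, hiscore_idx);
	return scores[hiscore_idx].rule;
}
```

Flipping an index instead of swapping pointers means there is a single piece of state to carry between calls, at the cost of threading it through every call site.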

Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
iterate over all interfaces when finding source address
on specific interface.")
Signed-off-by: YOSHIFUJI Hideaki 
---
 net/ipv6/addrconf.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ab74d5..4c9a024 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1358,14 +1358,15 @@ out:
return ret;
 }
 
-static void __ipv6_dev_get_saddr(struct net *net,
-struct ipv6_saddr_dst *dst,
-unsigned int prefs,
-const struct in6_addr *saddr,
-struct inet6_dev *idev,
-struct ipv6_saddr_score *scores)
+static int __ipv6_dev_get_saddr(struct net *net,
+   struct ipv6_saddr_dst *dst,
+   unsigned int prefs,
+   const struct in6_addr *saddr,
+   struct inet6_dev *idev,
+   struct ipv6_saddr_score *scores,
+   int hiscore_idx)
 {
-   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
+   struct ipv6_saddr_score *score = &scores[1 - hiscore_idx], *hiscore = 
&scores[hiscore_idx];
 
read_lock_bh(&idev->lock);
list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
@@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
in6_ifa_hold(score->ifa);
 
swap(hiscore, score);
+   hiscore_idx = 1 - hiscore_idx;
 
/* restore our iterator */
score->ifa = hiscore->ifa;
@@ -1434,18 +1436,20 @@ static void __ipv6_dev_get_saddr(struct net *net,
}
 out:
read_unlock_bh(&idev->lock);
+   return hiscore_idx;
 }
 
 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
   const struct in6_addr *daddr, unsigned int prefs,
   struct in6_addr *saddr)
 {
-   struct ipv6_saddr_score scores[2], *hiscore = &scores[1];
+   struct ipv6_saddr_score scores[2], *hiscore;
struct ipv6_saddr_dst dst;
struct inet6_dev *idev;
struct net_device *dev;
int dst_type;
bool use_oif_addr = false;
+   int hiscore_idx = 0;
 
dst_type = __ipv6_addr_type(daddr);
dst.addr = daddr;
@@ -1454,8 +1458,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
net_device *dst_dev,
dst.label = ipv6_addr_label(net, daddr, dst_type, dst.ifindex);
dst.prefs = prefs;
 
-   hiscore->rule = -1;
-   hiscore->ifa = NULL;
+   scores[hiscore_idx].rule = -1;
+   scores[hiscore_idx].ifa = NULL;
 
rcu_read_lock();
 
@@ -1480,17 +1484,19 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
net_device *dst_dev,
}
 
if (use_oif_addr) {
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+   if (idev)
+   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, prefs, 
saddr, idev, scores, hiscore_idx);
} else {
for_each_netdev_rcu(net, dev) {
idev = __in6_dev_get(dev);
if (!idev)
continue;
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, 
scores);
+   hiscore_idx = __ipv6_dev_get_saddr(net, &dst, prefs, 
saddr, idev, scores, hiscore_idx);
}
}
rcu_read_unlock();
 
+   hiscore = &scores[hiscore_idx];
if (!hiscore->ifa)
return -EADDRNOTAVAIL;
 
-- 
1.9.1



Re: net: Fix skb csum races when peeking

2015-07-13 Thread Herbert Xu
On Mon, Jul 13, 2015 at 08:01:42PM +0800, Herbert Xu wrote:
> 
> PS we seem to no longer use the hardware checksum in case of
> CHECKSUM_COMPLETE, I wonder why that is?

Never mind, it's still there.  I was just looking in the wrong place.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net v4] rtnl/bond: don't send rtnl msg for unregistered iface

2015-07-13 Thread Kristian Evensen
Hello,

I have a quick question about this patch.

On Wed, May 13, 2015 at 2:19 PM, Nicolas Dichtel
 wrote:
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 837d30b5ffed..7b25f1ef3d75 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2415,6 +2415,9 @@ void rtmsg_ifinfo(int type, struct net_device *dev, 
> unsigned int change,
>  {
> struct sk_buff *skb;
>
> +   if (dev->reg_state != NETREG_REGISTERED)
> +   return;
> +

Is this check correct, and is it placed at the right location? The reason I
am asking is as follows. In rollback_registered_many(), dev->reg_state
is set to NETREG_UNREGISTERING for devices that will be unregistered.
When rtmsg_ifinfo_build_skb(RTM_DELLINK, ...) is called in the
following loop in rollback_registered_many(), this comparison will
always be true and no DELLINK event is generated.

Because of this change, some of my applications no longer behave as
expected: the DELLINK event is missing when network devices are removed.
I also see no DELLINK with ip mon link. Removing the check restores the
old behavior (DELLINK events are generated). My machine is running
3.18.18, which includes this fix.
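
A toy user-space model of the sequence described above, assuming the DELLINK notification passes through the guarded function. The struct and function names are invented, and the enum is a reduced copy of the kernel's registration states; this is a sketch of the ordering problem, not of the real rtnetlink code.

```c
#include <assert.h>

/* Reduced set of netdev registration states, for illustration only. */
enum reg_state {
	NETREG_UNINITIALIZED,
	NETREG_REGISTERED,
	NETREG_UNREGISTERING,
};

enum msg_type { RTM_NEWLINK = 16, RTM_DELLINK = 17 };

/* Hypothetical stand-in for struct net_device. */
struct dev {
	enum reg_state reg_state;
	int sent;	/* number of notifications that actually went out */
};

/* Models the guarded notifier: returns 1 if a message was sent. */
static int notify(struct dev *dev, int type)
{
	assert(type == RTM_NEWLINK || type == RTM_DELLINK);

	if (dev->reg_state != NETREG_REGISTERED)
		return 0;	/* the early return added by the patch */

	dev->sent++;
	return 1;
}

/* Models the unregister path: the state changes first, DELLINK comes after. */
static int unregister_and_notify(struct dev *dev)
{
	dev->reg_state = NETREG_UNREGISTERING;
	return notify(dev, RTM_DELLINK);
}
```

In this model a registered device still gets its NEWLINK, but the DELLINK attempted after the state change is silently dropped.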

Thanks in advance for any help,
Kristian


Re: [PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.

2015-07-13 Thread YOSHIFUJI Hideaki
Hi,

Hajime Tazaki wrote:
> 
> Yoshifuji-san,
> 
> At Mon, 13 Jul 2015 17:38:48 +0900,
> Erik Kline wrote:
>>
>> On 13 July 2015 at 15:32, YOSHIFUJI Hideaki
>>  wrote:
>>> Hi,
>>>
>>> Erik Kline wrote:
>>>> Hmm, when I run a UML linux with this patch (which, I'm ashamed to
>>>> say, I failed to do before) I get these kinds of errors:
>>>>
>>>> unregister_netdevice: waiting for  to become free.
>>>> Usage count = 1
>>>> unregister_netdevice: waiting for  to become free.
>>>> Usage count = 1
>>>>
>>>> Perhaps they're unrelated... I'm still investigating.
>>>
>>> Would you test attached patch please?
>>
>> That does look logically correct, so +1 to it regardless, but it does
>> not seem to have fixed the issue I'm seeing.
>>
>> I still haven't produced the smallest possible demo test program.
> 
> Sorry to jump in, but there is a side effect with this
> patch: my tcp and dccp tests (ipv6) fail.
> 
> Because the newly added function (__ipv6_dev_get_saddr) doesn't
> pass the updated 'hiscore' back (it is swapped with 'score' in some
> cases), the caller (ipv6_dev_get_saddr) can't fill in an
> appropriate saddr in the end.
> 
> I don't know if this is a good patch but the following diff
> makes my test happy.

We should update score as well...

> 
> -- Hajime
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4ab74d5..c4e9416 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1363,7 +1363,8 @@ static void __ipv6_dev_get_saddr(struct net *net,
>unsigned int prefs,
>const struct in6_addr *saddr,
>struct inet6_dev *idev,
> -  struct ipv6_saddr_score *scores)
> +  struct ipv6_saddr_score *scores,
> +  struct ipv6_saddr_score **in_hiscore)
>  {
>   struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
>  
> @@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
>   in6_ifa_hold(score->ifa);
>  
>   swap(hiscore, score);
> + *in_hiscore = hiscore;
>  
>   /* restore our iterator */
>   score->ifa = hiscore->ifa;
> @@ -1480,13 +1482,15 @@ int ipv6_dev_get_saddr(struct net *net, const struct 
> net_device *dst_dev,
>   }
>  
>   if (use_oif_addr) {
> - __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
> + __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
> +  scores, &hiscore);
>   } else {
>   for_each_netdev_rcu(net, dev) {
>   idev = __in6_dev_get(dev);
>   if (!idev)
>   continue;
> - __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, 
> scores);
> + __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
> +  scores, &hiscore);
>   }
>   }
>   rcu_read_unlock();
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


[PATCH net] bridge: mdb: fix double add notification

2015-07-13 Thread Nikolay Aleksandrov
Since the mdb add/del code was introduced there have been 2 br_mdb_notify
calls when doing br_mdb_add() resulting in 2 notifications on each add.

Example:
 Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
 Before patch:
 root@debian:~# bridge monitor all
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent

 After patch:
 root@debian:~# bridge monitor all
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent

Signed-off-by: Nikolay Aleksandrov 
Fixes: cfd567543590 ("bridge: add support of adding and deleting mdb entries")
---
 net/bridge/br_mdb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index c11cf2611db0..1198a3dbad95 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -351,7 +351,6 @@ static int br_mdb_add_group(struct net_bridge *br, struct 
net_bridge_port *port,
if (state == MDB_TEMPORARY)
mod_timer(&p->timer, now + br->multicast_membership_interval);
 
-   br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
return 0;
 }
 
-- 
1.9.1



[PATCH net-next 1/4 v2] gianfar: Bundle Rx allocation, cleanup

2015-07-13 Thread Claudiu Manoil
Use a more common consumer/producer index design to improve
rx buffer allocation.  Instead of allocating a single new buffer
(skb) on each iteration, bundle the allocation of several rx
buffers at a time.  This also opens the path for further memory
optimizations.
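
For readers unfamiliar with the scheme, a minimal user-space sketch of a consumer/producer index ring with batched refill follows. The struct and helper names are hypothetical and the ring holds no real buffers; the "unused" computation follows the common convention of always leaving one descriptor empty so that a full ring and an empty ring can be told apart.

```c
#include <assert.h>

/* Hypothetical miniature of an rx ring: indices only, no buffers. */
struct rx_ring {
	unsigned int size;		/* number of descriptors */
	unsigned int next_to_use;	/* producer: next slot to refill */
	unsigned int next_to_clean;	/* consumer: next slot to process */
};

/* Slots the producer may still fill, always keeping one slot empty. */
static unsigned int ring_unused(const struct rx_ring *r)
{
	assert(r->size > 1);
	if (r->next_to_clean > r->next_to_use)
		return r->next_to_clean - r->next_to_use - 1;
	return r->size + r->next_to_clean - r->next_to_use - 1;
}

/* Refill a batch of slots in one go; returns how many were filled. */
static unsigned int ring_refill(struct rx_ring *r, unsigned int cnt)
{
	unsigned int room = ring_unused(r);
	unsigned int i;

	if (cnt > room)
		cnt = room;
	for (i = 0; i < cnt; i++)
		r->next_to_use = (r->next_to_use + 1) % r->size;
	return cnt;
}

/* Consume up to budget filled slots; returns how many were consumed. */
static unsigned int ring_clean(struct rx_ring *r, unsigned int budget)
{
	unsigned int done = 0;

	while (done < budget && r->next_to_clean != r->next_to_use) {
		r->next_to_clean = (r->next_to_clean + 1) % r->size;
		done++;
	}
	return done;
}
```

Starting from an 8-entry ring with both indices at 0, ring_unused() reports 7, so an initial refill stops one slot short of wrapping and next_to_clean never equals next_to_use on a populated ring.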

Remove useless check of rxq->rfbptr, since this patch touches
rx pause frame handling code as well.  rxq->rfbptr is always
initialized as part of Rx BD ring init.
Remove redundant (and misleading) 'amount_pull' parameter.

Signed-off-by: Claudiu Manoil 
---
v2: none

 drivers/net/ethernet/freescale/gianfar.c | 201 ---
 drivers/net/ethernet/freescale/gianfar.h |  39 +++--
 drivers/net/ethernet/freescale/gianfar_ethtool.c |   3 +
 3 files changed, 136 insertions(+), 107 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index ff87502..b35bf3d 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -116,8 +116,8 @@ static int gfar_start_xmit(struct sk_buff *skb, struct 
net_device *dev);
 static void gfar_reset_task(struct work_struct *work);
 static void gfar_timeout(struct net_device *dev);
 static int gfar_close(struct net_device *dev);
-static struct sk_buff *gfar_new_skb(struct net_device *dev,
-   dma_addr_t *bufaddr);
+static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q *rx_queue,
+   int alloc_cnt);
 static int gfar_set_mac_address(struct net_device *dev);
 static int gfar_change_mtu(struct net_device *dev, int new_mtu);
 static irqreturn_t gfar_error(int irq, void *dev_id);
@@ -142,7 +142,7 @@ static void gfar_netpoll(struct net_device *dev);
 int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit);
 static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
 static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
-  int amount_pull, struct napi_struct *napi);
+  struct napi_struct *napi);
 static void gfar_halt_nodisable(struct gfar_private *priv);
 static void gfar_clear_exact_match(struct net_device *dev);
 static void gfar_set_mac_for_addr(struct net_device *dev, int num,
@@ -169,17 +169,15 @@ static void gfar_init_rxbdp(struct gfar_priv_rx_q 
*rx_queue, struct rxbd8 *bdp,
bdp->lstatus = cpu_to_be32(lstatus);
 }
 
-static int gfar_init_bds(struct net_device *ndev)
+static void gfar_init_bds(struct net_device *ndev)
 {
struct gfar_private *priv = netdev_priv(ndev);
struct gfar __iomem *regs = priv->gfargrp[0].regs;
struct gfar_priv_tx_q *tx_queue = NULL;
struct gfar_priv_rx_q *rx_queue = NULL;
struct txbd8 *txbdp;
-   struct rxbd8 *rxbdp;
u32 __iomem *rfbptr;
int i, j;
-   dma_addr_t bufaddr;
 
for (i = 0; i < priv->num_tx_queues; i++) {
tx_queue = priv->tx_queue[i];
@@ -207,33 +205,18 @@ static int gfar_init_bds(struct net_device *ndev)
rfbptr = &regs->rfbptr0;
for (i = 0; i < priv->num_rx_queues; i++) {
rx_queue = priv->rx_queue[i];
-   rx_queue->cur_rx = rx_queue->rx_bd_base;
-   rx_queue->skb_currx = 0;
-   rxbdp = rx_queue->rx_bd_base;
-
-   for (j = 0; j < rx_queue->rx_ring_size; j++) {
-   struct sk_buff *skb = rx_queue->rx_skbuff[j];
 
-   if (skb) {
-   bufaddr = be32_to_cpu(rxbdp->bufPtr);
-   } else {
-   skb = gfar_new_skb(ndev, &bufaddr);
-   if (!skb) {
-   netdev_err(ndev, "Can't allocate RX 
buffers\n");
-   return -ENOMEM;
-   }
-   rx_queue->rx_skbuff[j] = skb;
-   }
+   rx_queue->next_to_clean = 0;
+   rx_queue->next_to_use = 0;
 
-   gfar_init_rxbdp(rx_queue, rxbdp, bufaddr);
-   rxbdp++;
-   }
+   /* make sure next_to_clean != next_to_use after this
+* by leaving at least 1 unused descriptor
+*/
+   gfar_alloc_rx_buffs(rx_queue, gfar_rxbd_unused(rx_queue));
 
rx_queue->rfbptr = rfbptr;
rfbptr += 2;
}
-
-   return 0;
 }
 
 static int gfar_alloc_skb_resources(struct net_device *ndev)
@@ -311,8 +294,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev)
rx_queue->rx_skbuff[j] = NULL;
}
 
-   if (gfar_init_bds(ndev))
-   goto cleanup;
+   gfar_init_bds(ndev);
 
return 0;
 
@@ -1639,10 +1621,7 @@ static int gfar_restore(struct device *dev)
return 0;
}
 
-   if (gfar_init_bds(ndev)) {
-   

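Editor's sketch: the "leave at least 1 unused descriptor" comment in the patch above can be illustrated with a small ring-accounting model. This is a minimal user-space sketch, not the driver's code; the struct and helper names (rx_ring, ring_unused, ring_fill) are hypothetical. It only assumes that one slot is kept free so that next_to_use never becomes equal to next_to_clean after a refill.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's Rx queue bookkeeping. */
struct rx_ring {
	unsigned int size;          /* number of descriptors in the ring */
	unsigned int next_to_clean; /* next descriptor the driver will process */
	unsigned int next_to_use;   /* next descriptor to hand to the hardware */
};

/* Descriptors that may still be refilled while keeping one slot free,
 * so that next_to_use never catches up with next_to_clean.
 */
static unsigned int ring_unused(const struct rx_ring *r)
{
	assert(r->size > 1);
	assert(r->next_to_clean < r->size && r->next_to_use < r->size);

	if (r->next_to_clean > r->next_to_use)
		return r->next_to_clean - r->next_to_use - 1;

	return r->size + r->next_to_clean - r->next_to_use - 1;
}

/* Advance next_to_use as if 'count' buffers had just been allocated. */
static void ring_fill(struct rx_ring *r, unsigned int count)
{
	r->next_to_use = (r->next_to_use + count) % r->size;
}
```

With both indices reset to 0, ring_unused() reports size - 1 free descriptors; filling exactly that many leaves the two indices different, which is the property the comment asks for.
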
mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings

2015-07-13 Thread Kirill A. Shutemov
Hi,

This simple test-case triggers a few locking asserts in the kernel:

#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define SOL_NETLINK 270

int main(int argc, char **argv)
{
unsigned int block_size = 16 * 4096;
struct nl_mmap_req req = {
.nm_block_size  = block_size,
.nm_block_nr= 64,
.nm_frame_size  = 16384,
.nm_frame_nr= 64 * block_size / 16384,
};
unsigned int ring_size;
int fd;

fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
exit(1);
if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
exit(1);

ring_size = req.nm_block_nr * req.nm_block_size;
mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
return 0;
}

+++ exited with 0 +++
[2.500126] BUG: sleeping function called from invalid context at 
/home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
[2.501328] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
[2.501997] 3 locks held by init/1:
[2.502380]  #0:  (reboot_mutex){+.+...}, at: [] 
SyS_reboot+0xa9/0x220
[2.503328]  #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: 
[] __blocking_notifier_call_chain+0x39/0x70
[2.504659]  #2:  (rcu_callback){..}, at: [] 
rcu_do_batch.isra.49+0x160/0x10c0
[2.505724] Preemption disabled at:[] __delay+0xf/0x20
[2.506443] 
[2.506612] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-9-gbddf4c4818e0 
#253
[2.507378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
Debian-1.8.2-1 04/01/2014
[2.508386]  88017b3d8000 88027bc03c38 81929ceb 
0102
[2.509233]   88027bc03c68 81085a9d 
0002
[2.510057]  81ca2a20 0268  
88027bc03c98
[2.510882] Call Trace:
[2.511146][] dump_stack+0x4f/0x7b
[2.511763]  [] ___might_sleep+0x16d/0x270
[2.512476]  [] __might_sleep+0x4d/0x90
[2.513071]  [] mutex_lock_nested+0x2f/0x430
[2.513683]  [] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[2.514385]  [] ? __this_cpu_preempt_check+0x13/0x20
[2.515066]  [] netlink_set_ring+0x1ed/0x350
[2.515694]  [] ? netlink_undo_bind+0x70/0x70
[2.516411]  [] netlink_sock_destruct+0x80/0x150
[2.517070]  [] __sk_free+0x1d/0x160
[2.517607]  [] sk_free+0x19/0x20
[2.518118]  [] deferred_put_nlk_sk+0x20/0x30
[2.518735]  [] rcu_do_batch.isra.49+0x79c/0x10c0
[2.519386]  [] ? rcu_do_batch.isra.49+0x160/0x10c0
[2.520101]  [] rcu_process_callbacks+0xdb/0x6d0
[2.520790]  [] __do_softirq+0x152/0x630
[2.521370]  [] irq_exit+0x8e/0xb0
[2.521895]  [] smp_apic_timer_interrupt+0x46/0x60
[2.522558]  [] ? __delay+0xf/0x20
[2.523079]  [] apic_timer_interrupt+0x70/0x80
[2.523705][] ? __delay+0xf/0x20
[2.524366]  [] ? in_lock_functions+0x1b/0x20
[2.524995]  [] get_parent_ip+0x11/0x50
[2.525562]  [] preempt_count_sub+0x9f/0xf0
[2.526179]  [] delay_tsc+0x68/0xc0
[2.526706]  [] __delay+0xf/0x20
[2.527207]  [] __const_udelay+0x2a/0x30
[2.527781]  [] md_notify_reboot+0xea/0x100
[2.528489]  [] ? __blocking_notifier_call_chain+0x39/0x70
[2.529236]  [] notifier_call_chain+0x66/0x90
[2.529856]  [] __blocking_notifier_call_chain+0x51/0x70
[2.530570]  [] ? __lock_acquire+0x606/0xf50
[2.531172]  [] blocking_notifier_call_chain+0x16/0x20
[2.531869]  [] kernel_restart_prepare+0x1d/0x40
[2.532593]  [] kernel_restart+0x16/0x60
[2.533183]  [] SyS_reboot+0x157/0x220
[2.533738]  [] ? __restore_xstate_sig+0xf8/0x720
[2.534390]  [] ? debug_smp_processor_id+0x17/0x20
[2.535051]  [] ? put_lock_stats.isra.19+0xe/0x30
[2.535707]  [] ? _raw_spin_unlock_irq+0x30/0x60
[2.536446]  [] ? preempt_count_sub+0xab/0xf0
[2.537112]  [] ? syscall_return+0x11/0x54
[2.537709]  [] ? __this_cpu_preempt_check+0x13/0x20
[2.538399]  [] ? trace_hardirqs_on_caller+0xf3/0x240
[2.539094]  [] ? trace_hardirqs_on_thunk+0x17/0x19
[2.539764]  [] system_call_fastpath+0x12/0x6f
[2.540523] 
[2.540695] =
[2.541161] [ INFO: inconsistent lock state ]
[2.541618] 4.1.0-9-gbddf4c4818e0 #253 Not tainted
[2.542154] -
[2.542610] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[2.543236] init/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
[2.543729]  (&nlk->pg_vec_lock){+.?.+.}, at: [] 
netlink_set_ring+0x1ed/0x350
[2.544503] {SOFTIRQ-ON-W} state was registered at:
[2.544503]   [] __lock_acquire+0xb0e/0xf50
[2.544503]   [] lock_acquire+0xd2/0x2b0
[2.544503]   [] mutex_lock_nested+0x71/0x430
[2.544503]   [] netlink_set_ring+0x1ed/0x350
[2.544503]   [] netl

[PATCH net-next 3/4 v2] gianfar: Use ndev, more Rx path cleanup

2015-07-13 Thread Claudiu Manoil
Use "ndev" instead of "dev", as the rx queue back pointer
to a net_device struct, to avoid name clashing with a
"struct device" reference.  This prepares the addition of a
"struct device" back pointer to the rx queue structure.

Remove duplicated rxq registration in the process.
Move napi_gro_receive() outside gfar_process_frame().

Signed-off-by: Claudiu Manoil 
---
v2: merge lstatus as u32

 drivers/net/ethernet/freescale/gianfar.c | 54 ++--
 drivers/net/ethernet/freescale/gianfar.h |  4 +--
 2 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index c839e76..7654d5e 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -141,8 +141,7 @@ static void gfar_netpoll(struct net_device *dev);
 #endif
 int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit);
 static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
-static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
-  struct napi_struct *napi);
+static void gfar_process_frame(struct net_device *ndev, struct sk_buff *skb);
 static void gfar_halt_nodisable(struct gfar_private *priv);
 static void gfar_clear_exact_match(struct net_device *dev);
 static void gfar_set_mac_for_addr(struct net_device *dev, int num,
@@ -262,7 +261,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev)
rx_queue = priv->rx_queue[i];
rx_queue->rx_bd_base = vaddr;
rx_queue->rx_bd_dma_base = addr;
-   rx_queue->dev = ndev;
+   rx_queue->ndev = ndev;
addr  += sizeof(struct rxbd8) * rx_queue->rx_ring_size;
vaddr += sizeof(struct rxbd8) * rx_queue->rx_ring_size;
}
@@ -593,7 +592,7 @@ static int gfar_alloc_rx_queues(struct gfar_private *priv)
 
priv->rx_queue[i]->rx_skbuff = NULL;
priv->rx_queue[i]->qindex = i;
-   priv->rx_queue[i]->dev = priv->ndev;
+   priv->rx_queue[i]->ndev = priv->ndev;
}
return 0;
 }
@@ -1913,7 +1912,7 @@ static void free_skb_tx_queue(struct gfar_priv_tx_q 
*tx_queue)
 static void free_skb_rx_queue(struct gfar_priv_rx_q *rx_queue)
 {
struct rxbd8 *rxbdp;
-   struct gfar_private *priv = netdev_priv(rx_queue->dev);
+   struct gfar_private *priv = netdev_priv(rx_queue->ndev);
int i;
 
rxbdp = rx_queue->rx_bd_base;
@@ -2709,17 +2708,17 @@ static struct sk_buff *gfar_new_skb(struct net_device 
*ndev,
 
 static void gfar_rx_alloc_err(struct gfar_priv_rx_q *rx_queue)
 {
-   struct gfar_private *priv = netdev_priv(rx_queue->dev);
+   struct gfar_private *priv = netdev_priv(rx_queue->ndev);
struct gfar_extra_stats *estats = &priv->extra_stats;
 
-   netdev_err(rx_queue->dev, "Can't alloc RX buffers\n");
+   netdev_err(rx_queue->ndev, "Can't alloc RX buffers\n");
atomic64_inc(&estats->rx_alloc_err);
 }
 
 static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q *rx_queue,
int alloc_cnt)
 {
-   struct net_device *ndev = rx_queue->dev;
+   struct net_device *ndev = rx_queue->ndev;
struct rxbd8 *bdp, *base;
dma_addr_t bufaddr;
int i;
@@ -2756,10 +2755,10 @@ static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q 
*rx_queue,
rx_queue->next_to_use = i;
 }
 
-static void count_errors(u32 lstatus, struct net_device *dev)
+static void count_errors(u32 lstatus, struct net_device *ndev)
 {
-   struct gfar_private *priv = netdev_priv(dev);
-   struct net_device_stats *stats = &dev->stats;
+   struct gfar_private *priv = netdev_priv(ndev);
+   struct net_device_stats *stats = &ndev->stats;
struct gfar_extra_stats *estats = &priv->extra_stats;
 
/* If the packet was truncated, none of the other errors matter */
@@ -2854,10 +2853,9 @@ static inline void gfar_rx_checksum(struct sk_buff *skb, 
struct rxfcb *fcb)
 }
 
 /* gfar_process_frame() -- handle one incoming packet if skb isn't NULL. */
-static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
-  struct napi_struct *napi)
+static void gfar_process_frame(struct net_device *ndev, struct sk_buff *skb)
 {
-   struct gfar_private *priv = netdev_priv(dev);
+   struct gfar_private *priv = netdev_priv(ndev);
struct rxfcb *fcb = NULL;
 
/* fcb is at the beginning if exists */
@@ -2866,10 +2864,8 @@ static void gfar_process_frame(struct net_device *dev, 
struct sk_buff *skb,
/* Remove the FCB from the skb
 * Remove the padded bytes, if there are any
 */
-   if (priv->uses_rxfcb) {
-   skb_record_rx_queue(skb, fcb->rq);
+   if (priv->uses_rxfcb)
skb_pull(skb, GMAC_FCB_LEN);
-   }
 
/* Get receive timestamp fro

[PATCH net-next 2/4 v2] gianfar: Fix and cleanup rxbd status handling

2015-07-13 Thread Claudiu Manoil
There are several (long standing) problems about how the status
field of the rx buffer descriptor (rxbd) is currently handled on
the error path:
- too many unnecessary 16bit reads of the two halves of the rxbd
status field (32bit), also resulting in overuse of endianness
conversion macros;
- "bdp->status = RXBD_LARGE" makes no sense, since the "large"
flag is read only (only eTSEC can write it), and trying to clear
the other status bits is also error prone in this context
(most of the rx status bits are read only anyway).

This is fixed with a single 32bit read of the "status" field,
and then the appropriate 16bit shifting is applied to access
the various status bits or the rx frame length. Also corrected
the use of the RXBD_LARGE flag.

Additional fix:
"rx_over_errors" stat is incremented instead of "rx_crc_errors"
in case of RXBD_OVERRUN occurrence.

Signed-off-by: Claudiu Manoil 
---
v2: lstatus is "u32", not "unsigned long"

 drivers/net/ethernet/freescale/gianfar.c | 34 +---
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index b35bf3d..c839e76 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2756,14 +2756,14 @@ static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q 
*rx_queue,
rx_queue->next_to_use = i;
 }
 
-static inline void count_errors(unsigned short status, struct net_device *dev)
+static void count_errors(u32 lstatus, struct net_device *dev)
 {
struct gfar_private *priv = netdev_priv(dev);
struct net_device_stats *stats = &dev->stats;
struct gfar_extra_stats *estats = &priv->extra_stats;
 
/* If the packet was truncated, none of the other errors matter */
-   if (status & RXBD_TRUNCATED) {
+   if (lstatus & BD_LFLAG(RXBD_TRUNCATED)) {
stats->rx_length_errors++;
 
atomic64_inc(&estats->rx_trunc);
@@ -2771,25 +2771,25 @@ static inline void count_errors(unsigned short status, 
struct net_device *dev)
return;
}
/* Count the errors, if there were any */
-   if (status & (RXBD_LARGE | RXBD_SHORT)) {
+   if (lstatus & BD_LFLAG(RXBD_LARGE | RXBD_SHORT)) {
stats->rx_length_errors++;
 
-   if (status & RXBD_LARGE)
+   if (lstatus & BD_LFLAG(RXBD_LARGE))
atomic64_inc(&estats->rx_large);
else
atomic64_inc(&estats->rx_short);
}
-   if (status & RXBD_NONOCTET) {
+   if (lstatus & BD_LFLAG(RXBD_NONOCTET)) {
stats->rx_frame_errors++;
atomic64_inc(&estats->rx_nonoctet);
}
-   if (status & RXBD_CRCERR) {
+   if (lstatus & BD_LFLAG(RXBD_CRCERR)) {
atomic64_inc(&estats->rx_crcerr);
stats->rx_crc_errors++;
}
-   if (status & RXBD_OVERRUN) {
+   if (lstatus & BD_LFLAG(RXBD_OVERRUN)) {
atomic64_inc(&estats->rx_overrun);
-   stats->rx_crc_errors++;
+   stats->rx_over_errors++;
}
 }
 
@@ -2921,6 +2921,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, 
int rx_work_limit)
i = rx_queue->next_to_clean;
 
while (rx_work_limit--) {
+   u32 lstatus;
 
if (cleaned_cnt >= GFAR_RX_BUFF_ALLOC) {
gfar_alloc_rx_buffs(rx_queue, cleaned_cnt);
@@ -2928,7 +2929,8 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, 
int rx_work_limit)
}
 
bdp = &rx_queue->rx_bd_base[i];
-   if (be16_to_cpu(bdp->status) & RXBD_EMPTY)
+   lstatus = be32_to_cpu(bdp->lstatus);
+   if (lstatus & BD_LFLAG(RXBD_EMPTY))
break;
 
/* order rx buffer descriptor reads */
@@ -2940,13 +2942,13 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, 
int rx_work_limit)
dma_unmap_single(priv->dev, be32_to_cpu(bdp->bufPtr),
 priv->rx_buffer_size, DMA_FROM_DEVICE);
 
-   if (unlikely(!(be16_to_cpu(bdp->status) & RXBD_ERR) &&
-be16_to_cpu(bdp->length) > priv->rx_buffer_size))
-   bdp->status = cpu_to_be16(RXBD_LARGE);
+   if (unlikely(!(lstatus & BD_LFLAG(RXBD_ERR)) &&
+(lstatus & BD_LENGTH_MASK) > priv->rx_buffer_size))
+   lstatus |= BD_LFLAG(RXBD_LARGE);
 
-   if (unlikely(!(be16_to_cpu(bdp->status) & RXBD_LAST) ||
-be16_to_cpu(bdp->status) & RXBD_ERR)) {
-   count_errors(be16_to_cpu(bdp->status), dev);
+   if (unlikely(!(lstatus & BD_LFLAG(RXBD_LAST)) ||
+(lstatus & BD_LFLAG(RXBD_ERR)))) {
+   count_errors(lstatus, dev);
 
   

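Editor's sketch: the status handling described in the commit message above (one 32-bit read, then flag tests via a 16-bit shift and a length mask) can be modelled in user space as follows. This is a minimal sketch under stated assumptions: the flag values and the RXBD_ERR mask are assumed to match the driver header and should be checked against gianfar.h before reuse, and rx_stats, make_lstatus, rx_frame_len and flag_oversized are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, modelled on the driver header; verify before reuse. */
#define RXBD_LAST       0x0800
#define RXBD_LARGE      0x0080
#define RXBD_NONOCTET   0x0040
#define RXBD_SHORT      0x0020
#define RXBD_CRCERR     0x0004
#define RXBD_OVERRUN    0x0002
#define RXBD_TRUNCATED  0x0001
#define RXBD_ERR        (RXBD_LARGE | RXBD_SHORT | RXBD_NONOCTET | \
			 RXBD_CRCERR | RXBD_OVERRUN | RXBD_TRUNCATED)

#define BD_LFLAG(flags) ((uint32_t)(flags) << 16) /* status: upper 16 bits */
#define BD_LENGTH_MASK  0x0000ffffu               /* length: lower 16 bits */

/* Hypothetical counters standing in for the netdev statistics. */
struct rx_stats {
	unsigned long length_errors, frame_errors, crc_errors, over_errors;
};

/* Build a host-order lstatus word from its two 16-bit halves. */
static uint32_t make_lstatus(uint16_t status, uint16_t length)
{
	return BD_LFLAG(status) | length;
}

static uint32_t rx_frame_len(uint32_t lstatus)
{
	return lstatus & BD_LENGTH_MASK;
}

/* Mark an error-free but oversized frame in the local copy only. */
static uint32_t flag_oversized(uint32_t lstatus, uint32_t rx_buffer_size)
{
	assert(rx_buffer_size <= BD_LENGTH_MASK);
	if (!(lstatus & BD_LFLAG(RXBD_ERR)) &&
	    rx_frame_len(lstatus) > rx_buffer_size)
		lstatus |= BD_LFLAG(RXBD_LARGE);
	return lstatus;
}

/* Same decision order as count_errors() in the patch. */
static void count_errors(uint32_t lstatus, struct rx_stats *s)
{
	/* If the packet was truncated, none of the other errors matter. */
	if (lstatus & BD_LFLAG(RXBD_TRUNCATED)) {
		s->length_errors++;
		return;
	}
	if (lstatus & BD_LFLAG(RXBD_LARGE | RXBD_SHORT))
		s->length_errors++;
	if (lstatus & BD_LFLAG(RXBD_NONOCTET))
		s->frame_errors++;
	if (lstatus & BD_LFLAG(RXBD_CRCERR))
		s->crc_errors++;
	if (lstatus & BD_LFLAG(RXBD_OVERRUN))
		s->over_errors++;
}
```

Note that flag_oversized() changes only the local copy of the word, in line with the commit message's point that the "large" flag in the descriptor itself is read only.
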
[PATCH net-next 4/4 v2] gianfar: Add paged allocation and Rx S/G

2015-07-13 Thread Claudiu Manoil
The eTSEC h/w is capable of scatter/gather on the receive side
too if MAXFRM > MRBLR, when the allowed maximum Rx frame size
is set to be greater than the maximum Rx buffer size (MRBLR).
It's about time the driver makes use of this h/w capability,
by supporting fixed buffer sizes and Rx S/G.

The buffer size given to eTSEC for reception is fixed to
1536B (must be multiple of 64), which is the same default
buffer size as before, used to accommodate standard MTU
(1500B) size frames.  As before, eTSEC can receive frames of
up to 9600B.  Individual Rx buffers are mapped to page halves
(page size for eTSEC systems is 4KB).  The skb is built around
the first buffer of a frame (using build_skb()).  In case the
frame spans multiple buffers, the trailing buffers are added
as Rx fragments to the skb.  The last buffer in frame is marked
by the L status flag.  A mechanism is in place to reuse the pages
owned by the driver (for Rx) for subsequent receptions.

Supporting fixed size buffers allows the implementation of Rx S/G,
which in turn removes the memory pressure issues the driver had
before when MTU was set for jumbo frame reception.
Also, in most cases, the Rx path becomes faster due to Rx page
reuse, since the overhead of allocating new rx buffers is removed
from the fast path.

Signed-off-by: Claudiu Manoil 
---
v2: use lstatus as u32 consistently

 drivers/net/ethernet/freescale/gianfar.c | 320 ++-
 drivers/net/ethernet/freescale/gianfar.h |  31 ++-
 drivers/net/ethernet/freescale/gianfar_ethtool.c |   1 -
 3 files changed, 208 insertions(+), 144 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 7654d5e..648ca85 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -109,7 +109,7 @@
 
 #define TX_TIMEOUT  (1*HZ)
 
-const char gfar_driver_version[] = "1.3";
+const char gfar_driver_version[] = "2.0";
 
 static int gfar_enet_open(struct net_device *dev);
 static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev);
@@ -207,6 +207,7 @@ static void gfar_init_bds(struct net_device *ndev)
 
rx_queue->next_to_clean = 0;
rx_queue->next_to_use = 0;
+   rx_queue->next_to_alloc = 0;
 
/* make sure next_to_clean != next_to_use after this
 * by leaving at least 1 unused descriptor
@@ -222,7 +223,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev)
 {
void *vaddr;
dma_addr_t addr;
-   int i, j, k;
+   int i, j;
struct gfar_private *priv = netdev_priv(ndev);
struct device *dev = priv->dev;
struct gfar_priv_tx_q *tx_queue = NULL;
@@ -262,6 +263,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev)
rx_queue->rx_bd_base = vaddr;
rx_queue->rx_bd_dma_base = addr;
rx_queue->ndev = ndev;
+   rx_queue->dev = dev;
addr  += sizeof(struct rxbd8) * rx_queue->rx_ring_size;
vaddr += sizeof(struct rxbd8) * rx_queue->rx_ring_size;
}
@@ -276,21 +278,17 @@ static int gfar_alloc_skb_resources(struct net_device 
*ndev)
if (!tx_queue->tx_skbuff)
goto cleanup;
 
-   for (k = 0; k < tx_queue->tx_ring_size; k++)
-   tx_queue->tx_skbuff[k] = NULL;
+   for (j = 0; j < tx_queue->tx_ring_size; j++)
+   tx_queue->tx_skbuff[j] = NULL;
}
 
for (i = 0; i < priv->num_rx_queues; i++) {
rx_queue = priv->rx_queue[i];
-   rx_queue->rx_skbuff =
-   kmalloc_array(rx_queue->rx_ring_size,
- sizeof(*rx_queue->rx_skbuff),
- GFP_KERNEL);
-   if (!rx_queue->rx_skbuff)
+   rx_queue->rx_buff = kcalloc(rx_queue->rx_ring_size,
+   sizeof(*rx_queue->rx_buff),
+   GFP_KERNEL);
+   if (!rx_queue->rx_buff)
goto cleanup;
-
-   for (j = 0; j < rx_queue->rx_ring_size; j++)
-   rx_queue->rx_skbuff[j] = NULL;
}
 
gfar_init_bds(ndev);
@@ -335,10 +333,8 @@ static void gfar_init_rqprm(struct gfar_private *priv)
}
 }
 
-static void gfar_rx_buff_size_config(struct gfar_private *priv)
+static void gfar_rx_offload_en(struct gfar_private *priv)
 {
-   int frame_size = priv->ndev->mtu + ETH_HLEN + ETH_FCS_LEN;
-
/* set this when rx hw offload (TOE) functions are being used */
priv->uses_rxfcb = 0;
 
@@ -347,16 +343,6 @@ static void gfar_rx_buff_size_config(struct gfar_private 
*priv)
 
if (priv->hwts_rx_en)
priv->uses_rxfcb = 1;
-
-   if (priv->uses_rxfcb)
-   frame_size += GMAC_FCB_LEN;
-
-

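Editor's sketch: the buffer layout described in the commit message above (1536B buffers, one per half of a 4KB page, multi-buffer frames for jumbo sizes) can be illustrated with a few lines of arithmetic. This is a simplified model, not the driver's code: the names are hypothetical, the FCB and padding bytes are ignored, and toggling the page offset is only one common way to alternate between page halves, assumed here for illustration.

```c
#include <assert.h>

#define PAGE_SIZE_BYTES  4096u                 /* page size from the commit message */
#define RX_BUF_SIZE      1536u                 /* fixed Rx buffer size, multiple of 64 */
#define RX_BUF_TRUESIZE  (PAGE_SIZE_BYTES / 2) /* one buffer per page half */

/* Hypothetical per-descriptor bookkeeping. */
struct rx_buff {
	unsigned int page_offset; /* 0 or RX_BUF_TRUESIZE */
};

/* Switch to the other half of the page, so the half that was just
 * received into can stay with the stack while the page is reused.
 */
static void rx_buff_flip(struct rx_buff *b)
{
	b->page_offset ^= RX_BUF_TRUESIZE;
}

/* Number of Rx buffers a frame occupies when every buffer except the
 * last one is filled up to RX_BUF_SIZE.
 */
static unsigned int rx_buffs_for_frame(unsigned int frame_len)
{
	assert(frame_len > 0);
	return (frame_len + RX_BUF_SIZE - 1) / RX_BUF_SIZE;
}
```

Under these assumptions a standard 1500B frame fits in one buffer, while a 9600B frame spans seven buffers: the first becomes the skb head and the remaining six are attached as fragments.
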
[PATCH net-next 0/4 v2] gianfar: Add Rx S/G

2015-07-13 Thread Claudiu Manoil
Hi David,
This patch-set introduces scatter/gather support
on the Rx side, addressing Rx path performance
issues in the driver.
Thanks.

As an example, two boards connected back-to-back
were used to measure the throughput, running the
same kernel 4.1, before and after applying these
patches.
The netperf UDP_STREAM results below show that the
bottleneck lies on the Rx side BEFORE applying the
patches, and that the Rx throughput is even lower
with a larger MTU.  AFTER applying the patches the
Rx bottleneck is gone (Rx throughput matches the
Tx one) and the RX throughput is not influenced by
MTU size any longer (as expected).
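
Editor's sketch: the throughput column in the netperf tables below can be cross-checked from the message counts, since throughput is messages times message size times 8, divided by the elapsed time. The helper names here are made up for illustration and are not part of netperf.

```c
#include <assert.h>

/* Throughput in 10^6 bits/s from a message count, size and duration. */
static double throughput_mbps(unsigned long long messages,
			      unsigned int msg_bytes, double secs)
{
	assert(secs > 0.0);
	return (double)messages * msg_bytes * 8.0 / secs / 1e6;
}

/* Fraction of the sent messages that the receiver actually got. */
static double rx_ratio(unsigned long long sent, unsigned long long received)
{
	assert(sent > 0);
	return (double)received / (double)sent;
}
```

For the first BEFORE run (512-byte messages, 150 s), 20119124 sent messages work out to roughly 549.4 and 14057349 received messages to roughly 383.9, so only about 70% of the datagrams reached the receiver.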


BEFORE:

1) MTU 1500 (default)

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed  Messages                       CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840      512  150.00   20119124      0          549.4 100.00   14.911
163840           150.00   14057349                 383.9 100.00   14.911

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed  Messages                       CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840       64  150.00   23654013      0           80.7 100.00   101.463
163840           150.00   15875288                  54.2 100.00   101.463

2) MTU 8000

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed  Messages                       CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840      512  150.00   20067232      0          548.0 100.00   14.950
163840           150.00    6113498                 166.9 99.95    14.942

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed  Messages                       CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840       64  150.00   23621279      0           80.6 100.00   101.604
163840           150.00    5868602                  20.0 99.96    101.563


AFTER:
(both MTU 1500 and MTU 8000)

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed  Messages                       CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840      512  150.00   19914969      0          543.8 100.00   15.064
163840           150.00   19914969                 543.8 99.35    14.966

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed  Messages                       CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840       64  150.00   23433989      0           80.0 100.00   102.416
163840           150.00   23433989                  80.0 99.62    102.023




Claudiu Manoil (4):
  gianfar: Bundle Rx allocation, cleanup
  gianfar: Fix and cleanup rxbd status handling
  gianfar: Use ndev, more Rx path cleanup
  gianfar: Add paged allocation and Rx S/G

 drivers/net/ethernet/freescale/gianfar.c | 496 +--
 drivers/net/ethernet/freescale/gianfar.h |  72 ++--
 drivers/net/ethernet/freescale/gianfar_ethtool.c |   4 +-
 3 files changed, 331 insertions(+), 241 deletions(-)

-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users

2015-07-13 Thread Marcelo Ricardo Leitner

On 13-07-2015 07:39, Neil Horman wrote:

On Fri, Jul 10, 2015 at 06:21:14PM -0700, David Miller wrote:

From: Marcelo Ricardo Leitner 
Date: Thu,  9 Jul 2015 11:15:19 -0300


SCTP has this operation to peel off associations from a given socket and
create a new socket using this association. We currently have two ways
to use this operation:
- via getsockopt(), on which it will also create and return a file
   descriptor for this new socket
- via sctp_do_peeloff(), which is for kernel only

The caveat with using sctp_do_peeloff() directly is that it creates a
dependency to SCTP module, while all other operations are handled via
kernel_{socket,sendmsg,getsockopt...}() interface. This causes the
kernel to load SCTP module even when it's not directly used

This patch then updates SCTP_SOCKOPT_PEELOFF so that for kernel users of
this protocol it will not allocate a file descriptor but instead just
return the socket pointer directly.

If called by a user application it will work as before.

Signed-off-by: Marcelo Ricardo Leitner 


I do not like this at all.

Socket option implementations should not change their behavior or what
datastructures they consume or return just because the socket happens
to be a kernel socket.


But in this case it's necessary, as the kernel here can't allocate an fd, due
to serious leakage (see commit 2f2d76cc3e938389feee671b46252dde6880b3b7).
Initially Marcelo had created duplicate code paths, one to return an fd, one to
return a file struct.  If you would rather go in that direction, I'm sure he can
propose it again, but that seems less correct to me than this solution.


Yes.

dlm is the only user of this option within kernel today and it causes 
serious problems, as Neil just referenced. Another good result of this 
implementation is that we are preventing such leakage from happening 
again in the future.



I'm not applying this series, sorry.

Also, your patch series lacked an initial "PATCH 0/N" posting, so you
could at least spend the time to discuss this patch series at a high
level and explain your overall motivations.


That was in the initial posting.  It should have been reposted, but if you're
interested:
http://marc.info/?l=linux-sctp&m=143449456219518&w=2


My bad. Won't happen again.

Thanks,
Marcelo



Re: [PATCH] nf: IDLETIMER: fix lockdep warning

2015-07-13 Thread Pablo Neira Ayuso
On Thu, Jul 09, 2015 at 05:15:01PM -0700, Dmitry Torokhov wrote:
> Dynamically allocated sysfs attributes should be initialized with
> sysfs_attr_init() otherwise lockdep will be angry with us:
> 
> [   45.468653] BUG: key ffc030fad4e0 not in .data!
> [   45.468655] [ cut here ]
> [   45.468666] WARNING: CPU: 0 PID: 1176 at 
> /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 
> lockdep_init_map+0x12c/0x490()
> [   45.468672] DEBUG_LOCKS_WARN_ON(1)
> [   45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U  W 3.18.0 #43
> [   45.468674] Hardware name: XXX
> [   45.468675] Call trace:
> [   45.468680] [] dump_backtrace+0x0/0x10c
> [   45.468683] [] show_stack+0x10/0x1c
> [   45.468688] [] dump_stack+0x74/0x94
> [   45.468692] [] warn_slowpath_common+0x84/0xb0
> [   45.468694] [] warn_slowpath_fmt+0x4c/0x58
> [   45.468697] [] lockdep_init_map+0x128/0x490
> [   45.468701] [] __kernfs_create_file+0x80/0xe4
> [   45.468704] [] sysfs_add_file_mode_ns+0x104/0x170
> [   45.468706] [] sysfs_create_file_ns+0x58/0x64
> [   45.468711] [] idletimer_tg_checkentry+0x14c/0x324
> [   45.468714] [] xt_check_target+0x170/0x198
> [   45.468717] [] check_target+0x58/0x6c
> [   45.468720] [] translate_table+0x30c/0x424
> [   45.468723] [] do_ipt_set_ctl+0x144/0x1d0
> [   45.468728] [] nf_setsockopt+0x50/0x60
> [   45.468732] [] ip_setsockopt+0x8c/0xb4
> [   45.468735] [] raw_setsockopt+0x10/0x50
> [   45.468739] [] sock_common_setsockopt+0x14/0x20
> [   45.468742] [] SyS_setsockopt+0x88/0xb8
> [   45.468744] ---[ end trace 41d156354d18c039 ]---

Applied, thanks.

One question:

> Change-Id: I1da5cd96fc8e1e1e4209e81eba1165a42d4d45e9

BTW, does this gerrit change ID provide any public information? Thanks.


Re: [PATCH net-next] tc: fix tc actions in case of shared skb

2015-07-13 Thread Jamal Hadi Salim

On 07/10/15 20:10, Alexei Starovoitov wrote:

TC actions need to check for very unlikely event skb->users != 1,
otherwise subsequent pskb_may_pull/pskb_expand_head will crash.
When skb_shared() just drop the packet, since in the middle of actions
it's too late to call skb_share_check(), since classifiers/actions assume
the same skb pointer.



Alexei,
To add to what Dave said - are the rules specified here:
Documentation/networking/tc-actions-env-rules.txt
insufficient?

cheers,
jamal


Re: [PATCH v2] add stealth mode

2015-07-13 Thread Austin S Hemmelgarn

On 2015-07-12 19:13, Matteo Croce wrote:

2015-07-08 15:32 GMT+02:00 Austin S Hemmelgarn :

On 2015-07-06 15:44, Matteo Croce wrote:
Just to name a few that I know of off the top of my head:
1. IP packets with any protocol number not supported by your current kernel
(these return a special ICMP message).


Right, I'll handle them


2. SCTP INIT and COOKIE_ECHO chunks when you have SCTP enabled in the
kernel.


Well, I've never played with SCTP before

It should still be checked, as should DCCP and RDS (those are the only
other Layer 3 protocols that I have ever actually seen people try to 
scan hosts with besides TCP/UDP/SCTP).  SCTP itself is not hugely 
prevalent outside of some clustering uses, but it is still seen on the 
internet sometimes (for example, Gentoo has optional patches for OpenSSH 
to use SCTP).



3. Theoretically, some IGMP messages.
4. NDP messages.
5. ARP queries looking for the machine's IP addresses.


Yes I know, but it's unlikely to receive these packets from WAN, right?
My flag is intended to be used mostly on WAN interfaces,
machines in LAN should be easily discoverable IMHO.

In theory it's unlikely, but if you use any kind of IPv4 multicast on
the WAN you will get IGMP (and MLD for IPv6 multicast).  You may also
get some NDP queries if you are using IPv6 and your WAN is itself
behind a NAT router (and yes, there are ISPs who do that).



6. Certain odd flag combinations on single TCP packets (check the
documentation for Nmap for more info regarding these), which I believe
(although I may be reading the code wrong) you aren't accounting for.


I've tried many TCP flags combination with hping3, NUL, SYN/ACK, ACK,
SYN/FIN, etc.
They don't get any response when the flag is set

How about FIN/ACK and FIN/PSH/URG?



7. DAD queries.


Never looked at these packets, are they a subset of NDP?

Kind of, it's an ICMPv6 extension for detecting if a SLAAC-configured
address is already in use.  Most distros have support for it enabled by
default.

8. ICMP address mask queries (which you also don't appear to account for).


It's deprecated and actually it doesn't get any response already

Just because it's deprecated doesn't mean you shouldn't account for it,
although it does appear to get dropped by default by the kernel.


You should also test how different combinations of sysctls under 
/proc/sys/net affect this (there are for example already sysctls for 
ignoring certain types of ICMP packets).






add some more information RE: Issue with active-backup mode bond and bridge

2015-07-13 Thread pengyi Peng(Yi)
I tested this issue on kernel 3.0.93. It is reproducible with these steps:

Step 1. Create an active-backup mode bond with two NICs and make sure the IP
address is configured on the bond.
Step 2. Create a bridge with the brctl command.
Step 3. Add the bond to the bridge and move the IP address to the bridge device.
Step 4. Use "tcpdump -i bond" to watch the packets crossing the bond.
Step 5. Use "ifconfig ethX down" to take the active slave down, then check
whether any gratuitous ARPs are sent.

-Original Message-
From: pengyi Peng(Yi) 
Sent: Thursday, July 02, 2015 11:05 AM
To: 'netdev@vger.kernel.org'
Cc: Lichunhe; Zhangwei (FF)
Subject: Issue with active-backup mode bond and bridge

I find that the kernel does not seem to handle the combination of the bonding
and bridge modules well. I have a physical host with two NICs that are bonded
together (active-backup mode).  Each NIC is connected to a separate L2 switch,
and the two L2 switches are connected to an L3 switch.

If the host only has the bond device, when I manually take the active slave
down, bonding will issue one or more gratuitous ARPs on the newly active slave.
One gratuitous ARP is issued for the bonding master interface, provided that
the interface has at least one IP address configured.

However, if there is a bridge named br0 and the bond device joins the bridge
br0, the IP address of the bond moves to the br0 device. First, I bring both
NICs up. But this time, when I again take the active slave down, I can't
capture the gratuitous ARP on the bond device with tcpdump. This can break
connectivity to the host: with no ARP packet sent out of the host, the L3
switch may still send packets from outside to the old L2 switch, which is
connected to the NIC that is now the backup. These packets don't get any
response.

I read the kernel code.
When changing the active slave to the specified one, in the
bond_change_active_slave function, bond will send the NETDEV_NOTIFY_PEERS event:
netdev_bonding_change(bond->dev, 
NETDEV_BONDING_FAILOVER);
if (should_notify_peers)
netdev_bonding_change(bond->dev,
  NETDEV_NOTIFY_PEERS);

  
And in the inetdev_event function, if the event is NETDEV_NOTIFY_PEERS, it will
call inetdev_send_gratuitous_arp to send a gratuitous ARP.
case NETDEV_NOTIFY_PEERS:
/* Send gratuitous ARP to notify of link change */
inetdev_send_gratuitous_arp(dev, in_dev);
break;

But when the bond is in the bridge, the code won't change the dev to the bridge 
device, and there is no IP address in bond device, so there is no gratuitous 
ARP.
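
The behaviour described above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions: struct toy_dev,
toy_notify_peers() and toy_notify_peers_with_master() are hypothetical names,
not kernel APIs. It assumes, as the report says, that a gratuitous ARP is only
sent for a device that itself has an IPv4 address, and that the event is
delivered for the bond rather than for the bridge holding the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a net_device; every name here is hypothetical. */
struct toy_dev {
	const char *name;
	bool has_ipv4_addr;	/* an address is configured on this device */
	struct toy_dev *master;	/* bridge the device is enslaved to, or NULL */
	int garp_sent;		/* gratuitous ARPs "sent" for this device */
};

/* Models the NETDEV_NOTIFY_PEERS case: ARP only if dev itself has an address. */
static void toy_notify_peers(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->has_ipv4_addr)
		dev->garp_sent++;
}

/*
 * One conceivable remedy (an assumption for illustration, not what the
 * kernel does): also deliver the notification to the master device.
 */
static void toy_notify_peers_with_master(struct toy_dev *dev)
{
	toy_notify_peers(dev);
	if (dev->master)
		toy_notify_peers(dev->master);
}
```

With a bond that has no address and a bridge master that does, the first
function sends nothing, while the second sends one ARP on behalf of the bridge.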

My question is: why does the latest kernel (4.1) still not handle this
condition?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@xxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net-next 12/16] i40evf: don't delete all the filters

2015-07-13 Thread Sergei Shtylyov

Hello.

On 7/13/2015 12:08 PM, Jeff Kirsher wrote:

> From: Mitch Williams
>
> Due to an inverted conditional, the driver was marking all of its MAC
> filters for deletion every time set_rx_mode was called. Depending upon
> the timing of the calls to set_rx_mode and the processing of the admin
> queue, the driver would (accidentally) end up with a varying number of
> functional filters.
>
> Correct this logic so that MAC filters are added and removed correctly.
> Add a check for the driver's "hardware" MAC address so that this filter
> doesn't get removed incorrectly.
>
> Change-ID: Ib3e7c4a5b53df6835f164fe44cb778cb71f8aff8
> Signed-off-by: Mitch Williams
> Tested-by: Jim Young
> Signed-off-by: Jeff Kirsher
> ---
>   drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
> index 94eff4a..07f6052 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
> @@ -892,8 +892,10 @@ static void i40evf_set_rx_mode(struct net_device *netdev)
> break;
> }
> }
> +   if (ether_addr_equal(f->macaddr, adapter->hw.mac.addr))
> +   found = true;


   This line is indented too much.

[...]

WBR, Sergei
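
The logic the quoted commit message describes can be sketched in user space as
follows. This is an illustrative model only: struct toy_mac, struct toy_filter
and toy_sync_filters() are hypothetical and do not reproduce the i40evf data
structures. It assumes a filter should be marked for removal only when its
address is neither in the wanted list nor the device's own hardware address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ALEN 6

struct toy_mac { unsigned char b[TOY_ETH_ALEN]; };

struct toy_filter {
	struct toy_mac mac;
	bool remove;		/* marked for deletion */
};

static bool toy_mac_equal(const struct toy_mac *a, const struct toy_mac *b)
{
	return memcmp(a->b, b->b, TOY_ETH_ALEN) == 0;
}

/* Mark every filter that is neither wanted nor the hardware address. */
static void toy_sync_filters(struct toy_filter *filters, size_t nfilters,
			     const struct toy_mac *wanted, size_t nwanted,
			     const struct toy_mac *hw_addr)
{
	assert(filters != NULL && hw_addr != NULL);
	for (size_t i = 0; i < nfilters; i++) {
		bool found = false;

		for (size_t j = 0; j < nwanted; j++) {
			if (toy_mac_equal(&filters[i].mac, &wanted[j])) {
				found = true;
				break;
			}
		}
		/* Keep the device's own address even if it is not listed. */
		if (toy_mac_equal(&filters[i].mac, hw_addr))
			found = true;
		/* Testing "found" instead of "!found" here would mark the wrong set. */
		if (!found)
			filters[i].remove = true;
	}
}
```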



[PATCH net] bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave

2015-07-13 Thread Nikolay Aleksandrov
From: Satish Ashok 

A report with INCLUDE/Change_to_include and empty source list should be
treated as a leave, specified by RFC 3376, section 3.1:
"If the requested filter mode is INCLUDE *and* the requested source
 list is empty, then the entry corresponding to the requested
 interface and multicast address is deleted if present.  If no such
 entry is present, the request is ignored."

Signed-off-by: Satish Ashok 
Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_multicast.c | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 742a6c27d7a2..79db489cdade 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -39,6 +39,16 @@ static void br_multicast_start_querier(struct net_bridge *br,
   struct bridge_mcast_own_query *query);
 static void br_multicast_add_router(struct net_bridge *br,
struct net_bridge_port *port);
+static void br_ip4_multicast_leave_group(struct net_bridge *br,
+struct net_bridge_port *port,
+__be32 group,
+__u16 vid);
+#if IS_ENABLED(CONFIG_IPV6)
+static void br_ip6_multicast_leave_group(struct net_bridge *br,
+struct net_bridge_port *port,
+const struct in6_addr *group,
+__u16 vid);
+#endif
 unsigned int br_mdb_rehash_seq;
 
 static inline int br_ip_equal(const struct br_ip *a, const struct br_ip *b)
@@ -1010,9 +1020,15 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
continue;
}
 
-   err = br_ip4_multicast_add_group(br, port, group, vid);
-   if (err)
-   break;
+   if ((type == IGMPV3_CHANGE_TO_INCLUDE ||
+type == IGMPV3_MODE_IS_INCLUDE) &&
+   ntohs(grec->grec_nsrcs) == 0) {
+   br_ip4_multicast_leave_group(br, port, group, vid);
+   } else {
+   err = br_ip4_multicast_add_group(br, port, group, vid);
+   if (err)
+   break;
+   }
}
 
return err;
@@ -1071,10 +1087,17 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
continue;
}
 
-   err = br_ip6_multicast_add_group(br, port, &grec->grec_mca,
-vid);
-   if (err)
-   break;
+   if ((grec->grec_type == MLD2_CHANGE_TO_INCLUDE ||
+grec->grec_type == MLD2_MODE_IS_INCLUDE) &&
+   ntohs(*nsrcs) == 0) {
+   br_ip6_multicast_leave_group(br, port, &grec->grec_mca,
+vid);
+   } else {
+   err = br_ip6_multicast_add_group(br, port,
+&grec->grec_mca, vid);
+   if (err)
+   break;
+   }
}
 
return err;
-- 
1.9.3
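
The RFC 3376 rule quoted in the commit message can be expressed as a small
predicate. The sketch below is a user-space illustration, not the bridge code:
struct toy_grec and toy_record_is_leave() are hypothetical names, and the source
count is assumed to be already converted to host byte order. The record type
values are the ones defined in RFC 3376, section 4.2.12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IGMPv3 group record types (RFC 3376, section 4.2.12). */
enum toy_igmpv3_type {
	TOY_MODE_IS_INCLUDE = 1,
	TOY_MODE_IS_EXCLUDE = 2,
	TOY_CHANGE_TO_INCLUDE = 3,
	TOY_CHANGE_TO_EXCLUDE = 4,
	TOY_ALLOW_NEW_SOURCES = 5,
	TOY_BLOCK_OLD_SOURCES = 6,
};

/* Hypothetical, simplified group record; nsrcs is in host byte order. */
struct toy_grec {
	enum toy_igmpv3_type type;
	uint16_t nsrcs;
};

/* INCLUDE mode with an empty source list means "leave the group". */
static bool toy_record_is_leave(const struct toy_grec *grec)
{
	assert(grec != NULL);
	return (grec->type == TOY_CHANGE_TO_INCLUDE ||
		grec->type == TOY_MODE_IS_INCLUDE) &&
	       grec->nsrcs == 0;
}
```

Any other combination, including INCLUDE with at least one source, is still
treated as a join in this model.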



net: Fix skb csum races when peeking

2015-07-13 Thread Herbert Xu
On Mon, Jul 13, 2015 at 04:31:00PM +0800, Herbert Xu wrote:
> On Mon, Jul 13, 2015 at 10:28:19AM +0200, Eric Dumazet wrote:
> >
> > Except that udp checksum are checked outside of spinlock protection.
> 
> Good point.  I wonder when this got broken.  I'll do some digging.

OK looks like I can claim credit for this bug too :)

commit fb286bb2990a107009dbf25f6ffebeb7df77f9be
Author: Herbert Xu 
Date:   Thu Nov 10 13:01:24 2005 -0800

[NET]: Detect hardware rx checksum faults correctly

Although others have made the hole bigger more recently.

PS we seem to no longer use the hardware checksum in case of
CHECKSUM_COMPLETE, I wonder why that is?

---8<---
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu 

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b80fb91..4967262 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -622,7 +657,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
!skb->csum_complete_sw)
netdev_rx_csum_fault(skb->dev);
}
-   skb->csum_valid = !sum;
+   if (!skb_shared(skb))
+   skb->csum_valid = !sum;
return sum;
 }
 EXPORT_SYMBOL(__skb_checksum_complete_head);
@@ -642,11 +678,13 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
netdev_rx_csum_fault(skb->dev);
}
 
-   /* Save full packet checksum */
-   skb->csum = csum;
-   skb->ip_summed = CHECKSUM_COMPLETE;
-   skb->csum_complete_sw = 1;
-   skb->csum_valid = !sum;
+   if (!skb_shared(skb)) {
+   /* Save full packet checksum */
+   skb->csum = csum;
+   skb->ip_summed = CHECKSUM_COMPLETE;
+   skb->csum_complete_sw = 1;
+   skb->csum_valid = !sum;
+   }
 
return sum;
 }
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
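
The idea of the patch above, caching the computed result only when nobody else
holds the buffer, can be sketched in user space. This is a simplified model and
not the kernel implementation: struct toy_skb, toy_skb_shared() and the byte-sum
"checksum" are hypothetical, and a plain int stands in for the real reference
count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer descriptor; all names are hypothetical. */
struct toy_skb {
	int users;		/* reference count */
	unsigned int csum;	/* cached checksum result */
	bool csum_valid;	/* cached verdict */
};

/* A buffer with more than one holder must not be modified without locking. */
static bool toy_skb_shared(const struct toy_skb *skb)
{
	return skb->users != 1;
}

/* Toy checksum: the low byte of the byte sum; 0 means "valid". */
static unsigned int toy_sum(const unsigned char *data, size_t len)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += data[i];
	return sum & 0xffu;
}

/* Always return the result, but only cache it in an unshared buffer. */
static unsigned int toy_checksum_complete(struct toy_skb *skb,
					  const unsigned char *data, size_t len)
{
	unsigned int sum;

	assert(skb != NULL);
	sum = toy_sum(data, len);
	if (!toy_skb_shared(skb)) {
		skb->csum = sum;
		skb->csum_valid = (sum == 0);
	}
	return sum;
}
```

Callers get the same return value either way; the only difference is whether
the result is stored for later reuse.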


[PATCH] Revert "net: fec: Ensure clocks are enabled while using mdio bus"

2015-07-13 Thread Fabio Estevam
This reverts commit 6c3e921b18edca290099adfddde8a50236bf2d80.

commit 6c3e921b18ed ("net: fec: Ensure clocks are enabled while using mdio
 bus") prevents the kernel from booting on mx6 boards, so let's revert it.

Reported-by: Tyler Baker 
Signed-off-by: Fabio Estevam 
---
 drivers/net/ethernet/freescale/fec_main.c | 88 +--
 1 file changed, 13 insertions(+), 75 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 42e20e5..1f89c59 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -24,7 +24,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -78,7 +77,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev);
 #define FEC_ENET_RAEM_V 0x8
 #define FEC_ENET_RAFL_V 0x8
 #define FEC_ENET_OPD_V  0xFFF0
-#define FEC_MDIO_PM_TIMEOUT  100 /* ms */
 
 static struct platform_device_id fec_devtype[] = {
{
@@ -1769,13 +1767,7 @@ static void fec_enet_adjust_link(struct net_device *ndev)
 static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 {
struct fec_enet_private *fep = bus->priv;
-   struct device *dev = &fep->pdev->dev;
unsigned long time_left;
-   int ret = 0;
-
-   ret = pm_runtime_get_sync(dev);
-   if (IS_ERR_VALUE(ret))
-   return ret;
 
fep->mii_timeout = 0;
init_completion(&fep->mdio_done);
@@ -1791,30 +1783,18 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
if (time_left == 0) {
fep->mii_timeout = 1;
netdev_err(fep->netdev, "MDIO read timeout\n");
-   ret = -ETIMEDOUT;
-   goto out;
+   return -ETIMEDOUT;
}
 
-   ret = FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
-
-out:
-   pm_runtime_mark_last_busy(dev);
-   pm_runtime_put_autosuspend(dev);
-
-   return ret;
+   /* return value */
+   return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
 }
 
 static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
   u16 value)
 {
struct fec_enet_private *fep = bus->priv;
-   struct device *dev = &fep->pdev->dev;
unsigned long time_left;
-   int ret = 0;
-
-   ret = pm_runtime_get_sync(dev);
-   if (IS_ERR_VALUE(ret))
-   return ret;
 
fep->mii_timeout = 0;
init_completion(&fep->mdio_done);
@@ -1831,13 +1811,10 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
if (time_left == 0) {
fep->mii_timeout = 1;
netdev_err(fep->netdev, "MDIO write timeout\n");
-   ret  = -ETIMEDOUT;
+   return -ETIMEDOUT;
}
 
-   pm_runtime_mark_last_busy(dev);
-   pm_runtime_put_autosuspend(dev);
-
-   return ret;
+   return 0;
 }
 
 static int fec_enet_clk_enable(struct net_device *ndev, bool enable)
@@ -1849,6 +1826,9 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable)
ret = clk_prepare_enable(fep->clk_ahb);
if (ret)
return ret;
+   ret = clk_prepare_enable(fep->clk_ipg);
+   if (ret)
+   goto failed_clk_ipg;
if (fep->clk_enet_out) {
ret = clk_prepare_enable(fep->clk_enet_out);
if (ret)
@@ -1872,6 +1852,7 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable)
}
} else {
clk_disable_unprepare(fep->clk_ahb);
+   clk_disable_unprepare(fep->clk_ipg);
if (fep->clk_enet_out)
clk_disable_unprepare(fep->clk_enet_out);
if (fep->clk_ptp) {
@@ -1893,6 +1874,8 @@ failed_clk_ptp:
if (fep->clk_enet_out)
clk_disable_unprepare(fep->clk_enet_out);
 failed_clk_enet_out:
+   clk_disable_unprepare(fep->clk_ipg);
+failed_clk_ipg:
clk_disable_unprepare(fep->clk_ahb);
 
return ret;
@@ -2864,14 +2847,10 @@ fec_enet_open(struct net_device *ndev)
struct fec_enet_private *fep = netdev_priv(ndev);
int ret;
 
-   ret = pm_runtime_get_sync(&fep->pdev->dev);
-   if (IS_ERR_VALUE(ret))
-   return ret;
-
pinctrl_pm_select_default_state(&fep->pdev->dev);
ret = fec_enet_clk_enable(ndev, true);
if (ret)
-   goto clk_enable;
+   return ret;
 
/* I should reset the ring buffers here, but I don't yet know
 * a simple way to do that.
@@ -2902,9 +2881,6 @@ err_enet_mii_probe:
fec_enet_free_buffers(ndev);
 err_enet_alloc:
fec_enet_clk_enable(ndev, false);
-clk_enable:
-   pm_runtime_mark_last_busy(&fep->pdev->dev);
-   pm_runtime_put_autosuspend(&fep->pdev->dev

Re: [PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.

2015-07-13 Thread Hajime Tazaki

Yoshifuji-san,

At Mon, 13 Jul 2015 17:38:48 +0900,
Erik Kline wrote:
> 
> On 13 July 2015 at 15:32, YOSHIFUJI Hideaki
>  wrote:
> > Hi,
> >
> > Erik Kline wrote:
> >> Hmm, when I run a UML linux with this patch (which, I'm ashamed to
> >> say, I failed to do before) I get these kinds of errors:
> >>
> >> unregister_netdevice: waiting for  to become free.
> >> Usage count = 1
> >> unregister_netdevice: waiting for  to become free.
> >> Usage count = 1
> >>
> >> Perhaps they're unrelated... I'm still investigating.
> >
> > Would you test attached patch please?
> 
> That does look logically correct, so +1 to it regardless, but it does
> not seem to have fixed the issue I'm seeing.
> 
> I still haven't produced the smallest possible demo test program.

Sorry to jump in, but there is a side effect with this
patch: my TCP and DCCP tests (IPv6) fail.

Because the newly added function (__ipv6_dev_get_saddr) does not
propagate its 'hiscore' variable (it is swapped with 'score' in some
cases) back to the caller, the caller (ipv6_dev_get_saddr) cannot
fill in an appropriate saddr in the end.

I don't know if this is a good patch, but the following diff
makes my tests pass.

-- Hajime

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ab74d5..c4e9416 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1363,7 +1363,8 @@ static void __ipv6_dev_get_saddr(struct net *net,
 unsigned int prefs,
 const struct in6_addr *saddr,
 struct inet6_dev *idev,
-struct ipv6_saddr_score *scores)
+struct ipv6_saddr_score *scores,
+struct ipv6_saddr_score **in_hiscore)
 {
struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
 
@@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
in6_ifa_hold(score->ifa);
 
swap(hiscore, score);
+   *in_hiscore = hiscore;
 
/* restore our iterator */
score->ifa = hiscore->ifa;
@@ -1480,13 +1482,15 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
}
 
if (use_oif_addr) {
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
+scores, &hiscore);
} else {
for_each_netdev_rcu(net, dev) {
idev = __in6_dev_get(dev);
if (!idev)
continue;
-   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+   __ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev,
+scores, &hiscore);
}
}
rcu_read_unlock();
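
The problem described in this message, a pointer swap inside a helper that the
caller never sees, can be reproduced in a few lines of user-space C. The sketch
is an assumption-laden model: struct toy_score and toy_pick() are hypothetical,
and an integer rank stands in for the real scoring rules. Passing NULL as the
last argument models the helper before the diff; passing the address of the
caller's pointer models the helper after it.

```c
#include <assert.h>
#include <stddef.h>

struct toy_score { int rank; };

/*
 * Evaluate one candidate. The two-slot scratch array mirrors the
 * score/hiscore pair; in_hiscore, when non-NULL, receives the winner.
 */
static void toy_pick(struct toy_score *scores, int candidate_rank,
		     struct toy_score **in_hiscore)
{
	struct toy_score *score = &scores[0], *hiscore = &scores[1];

	assert(scores != NULL);
	score->rank = candidate_rank;
	if (score->rank > hiscore->rank) {
		/* Local swap: only this function's pointers change. */
		struct toy_score *tmp = hiscore;

		hiscore = score;
		score = tmp;
		/* Report the new winner to the caller, if it asked for it. */
		if (in_hiscore)
			*in_hiscore = hiscore;
		score->rank = -1;	/* scratch slot is free again */
	}
}
```

A caller that keeps its own pointer to the second slot and passes NULL keeps
reading the stale slot after the swap, which is the symptom reported above.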


Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users

2015-07-13 Thread Neil Horman
On Fri, Jul 10, 2015 at 06:21:14PM -0700, David Miller wrote:
> From: Marcelo Ricardo Leitner 
> Date: Thu,  9 Jul 2015 11:15:19 -0300
> 
> > SCTP has this operation to peel off associations from a given socket and
> > create a new socket using this association. We currently have two ways
> > to use this operation:
> > - via getsockopt(), on which it will also create and return a file
> >   descriptor for this new socket
> > - via sctp_do_peeloff(), which is for kernel only
> > 
> > The caveat with using sctp_do_peeloff() directly is that it creates a
> > dependency to SCTP module, while all other operations are handled via
> > kernel_{socket,sendmsg,getsockopt...}() interface. This causes the
> > kernel to load SCTP module even when it's not directly used
> > 
> > This patch then updates SCTP_SOCKOPT_PEELOFF so that for kernel users of
> > this protocol it will not allocate a file descriptor but instead just
> > return the socket pointer directly.
> > 
> > If called by an user application it will work as before.
> > 
> > Signed-off-by: Marcelo Ricardo Leitner 
> 
> I do not like this at all.
> 
> Socket option implementations should not change their behavior or what
> datastructures they consume or return just because the socket happens
> to be a kernel socket.
> 
But in this case it's necessary, as the kernel here can't allocate an fd, due
to serious leakage (see commit 2f2d76cc3e938389feee671b46252dde6880b3b7).
Initially Marcelo had created duplicate code paths, one to return an fd, one to
return a file struct.  If you would rather go in that direction, I'm sure he can
propose it again, but that seems less correct to me than this solution.

> I'm not applying this series, sorry.
> 
> Also, your patch series lacked an initial "PATCH 0/N" posting, so you
> could at least spend the time to discuss this patch series at a high
> level and explain your overall motivations.
> 
That was in the initial posting.  It should have been reposted, but if you're
interested:
http://marc.info/?l=linux-sctp&m=143449456219518&w=2

Regards
Neil





RE: [PATCH] bnx2x: Update to FW version 7.12.30

2015-07-13 Thread Yuval Mintz
> > > The new FW will allow us to utilize some new features in our driver,
> > > mainly adding vlan filtering offload and vxlan offload support.
> > >
> > > In addition, this fixes several issues:
> > > 1. Packets from a VF with pvid configured which were sent with a
> > >different vlan were transmitted instead of being discarded.
> > >
> > > 2. FCoE traffic might not recover after a failure while there's traffic
> > >to another function.
> > >
> > > Signed-off-by: Yuval Mintz 
> >
> > Hi, any news about this one?
> > Thanks, Yuval
> 
> Any updates? I sent this 3 weeks ago and haven't seen any reply.

Apparently the destination E-mail has changed and I was unaware.
Is anyone here? ;-)


Re: [PATCH v2 0/2] net: enable inband link state negotiation only when explicitly requested

2015-07-13 Thread Stas Sergeev
13.07.2015 12:54, Sebastien Rannou wrote:
> Hi Stas,
> 
> On Fri, 10 Jul 2015, Stas Sergeev wrote:
> 
>> Those who were affected by the change, please send your Tested-by,
>> Thanks!
> 
> I also confirm that this version of the patch solves the issue:
> 
> Tested-by: Sebastien Rannou 
Thanks Sebastien!
Unfortunately, there will be a v3 in a few days.
Perhaps you should not rush with the tests until
things are settled, or who knows how many iterations
you'll have to test...


Re: [PATCH v2 0/2] net: enable inband link state negotiation only when explicitly requested

2015-07-13 Thread Sebastien Rannou
Hi Stas,

On Fri, 10 Jul 2015, Stas Sergeev wrote:

> Those who were affected by the change, please send your Tested-by,
> Thanks!

I also confirm that this version of the patch solves the issue:

Tested-by: Sebastien Rannou 

-- 
Sébastien

linux-4.2-rc2/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1993: possible bad error checking ?

2015-07-13 Thread David Binderman
Hello there,

[linux-4.2-rc2/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1993]: (style) 
Checking if unsigned variable 'entry' is less than zero.

Source code is

    entry = priv->hw->mode->jumbo_frm(priv, skb, csum_insertion);
    if (unlikely(entry < 0))
        goto dma_map_err;

but

    unsigned int entry;

So the error checking from the function call looks broken to me.

If the return value from the function call to jumbo_frm is a plain signed
integer, I suggest sanity-checking it *before* assigning it to an unsigned
integer.

Regards

David Binderman
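
The suggestion above can be sketched as follows. This is a stand-alone
illustration, not the stmmac code: toy_jumbo_frm() and toy_xmit() are
hypothetical stand-ins, and it is assumed that the callee returns either a
non-negative ring index or a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callee: a ring index on success, a negative errno on failure. */
static int toy_jumbo_frm(int fail)
{
	return fail ? -ENOMEM : 7;
}

/* Returns 0 and stores the index on success, or the negative error code. */
static int toy_xmit(int fail, unsigned int *entry_out)
{
	int ret;

	assert(entry_out != NULL);
	ret = toy_jumbo_frm(fail);
	if (ret < 0)		/* test the signed value first */
		return ret;
	*entry_out = (unsigned int)ret;	/* safe: ret is known to be >= 0 */
	return 0;
}
```

Had the result been stored directly in an unsigned variable, a negative error
code would have become a large positive index and a "< 0" test could never fire.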


