RE: A Regression between v4.2-rc2 and v4.2-rc3

2015-07-21 Thread Duan Andy
From: Peter Chen  Sent: Wednesday, July 22, 2015 1:45 PM
> To: netdev@vger.kernel.org
> Cc: Andrew Lunn; Duan Fugang-B38611; David S. Miller
> Subject: A Regression between v4.2-rc2 and v4.2-rc3
> 
> Hi List,
> 
> I ran into a kernel oops [2] with nfsroot on several imx6 boards when
> rebasing to v4.2-rc3; after reverting the patch below [1], it is OK.
> This patch just adds runtime PM for the IPG clock, so I wonder why it
> was taken as a bug fix.
> 
> [1]
> commit 6c3e921b18edca290099adfddde8a50236bf2d80
> Author: Andrew Lunn 
> Date:   Mon Jul 6 20:34:55 2015 +0200
> 
> net: fec: Ensure clocks are enabled while using mdio bus
> 
> When a switch is attached to the mdio bus, the mdio bus can be used
> while the interface is not open. If the IPG clock is not enabled, MDIO
> reads/writes will simply time out.
> 
> Add support for runtime PM to control this clock. Enable/disable this
> clock using runtime PM, with open()/close() and mdio read()/write()
> function triggering runtime PM operations. Since PM is optional, the
> IPG clock is enabled at probe and is no longer modified by
> fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
> guaranteed the clock is running when MDIO operations are performed.
> 
> Signed-off-by: Andrew Lunn 
> Acked-by: Fugang Duan 
> Signed-off-by: David S. Miller 
> 

The patch was reverted last week.

Regards,
Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


A Regression between v4.2-rc2 and v4.2-rc3

2015-07-21 Thread Peter Chen
Hi List,

I ran into a kernel oops [2] with nfsroot on several imx6 boards
when rebasing to v4.2-rc3; after reverting the patch below [1], it is OK.
This patch just adds runtime PM for the IPG clock, so I wonder
why it was taken as a bug fix.

[1]
commit 6c3e921b18edca290099adfddde8a50236bf2d80
Author: Andrew Lunn 
Date:   Mon Jul 6 20:34:55 2015 +0200

net: fec: Ensure clocks are enabled while using mdio bus

When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the IPG clock is not enabled, MDIO
reads/writes will simply time out.

Add support for runtime PM to control this clock. Enable/disable this
clock using runtime PM, with open()/close() and mdio read()/write()
function triggering runtime PM operations. Since PM is optional, the
IPG clock is enabled at probe and is no longer modified by
fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
guaranteed the clock is running when MDIO operations are performed.

Signed-off-by: Andrew Lunn 
Acked-by: Fugang Duan 
Signed-off-by: David S. Miller 
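
The usage-count idea described in the commit message above can be modeled in user space. This is only a sketch under stated assumptions: `demo_clk`, `demo_pm_get()`, `demo_pm_put()` and `demo_mdio_read()` are hypothetical names standing in for the IPG clock and the runtime PM get/put calls; it is not the fec driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a clock gated by a usage count. */
struct demo_clk {
	int usage;	/* active users: open interface, MDIO access, ... */
	bool running;	/* true while the modeled clock is enabled */
};

/* Take a reference; the first user turns the clock on. */
static void demo_pm_get(struct demo_clk *clk)
{
	if (clk->usage++ == 0)
		clk->running = true;
}

/* Drop a reference; the last user turns the clock off. */
static void demo_pm_put(struct demo_clk *clk)
{
	assert(clk->usage > 0);
	if (--clk->usage == 0)
		clk->running = false;
}

/* Modeled MDIO read: returns 0 on success, -1 if the clock was off. */
static int demo_mdio_read(struct demo_clk *clk)
{
	int ret;

	demo_pm_get(clk);
	ret = clk->running ? 0 : -1;
	demo_pm_put(clk);
	return ret;
}
```

In this model an MDIO access succeeds whether or not the interface is open, because the access itself holds a reference for its duration.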

[2]
[2.534260] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc3 #387
[2.540618] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[2.546455] Backtrace:
[2.548933] [<80014e00>] (dump_backtrace) from [<80015048>] (show_stack+0x20/0x24)
[2.556506]  r6:80cd9db0 r5: r4: r3:
[2.562234] [<80015028>] (show_stack) from [<808b0094>] (dump_stack+0x8c/0xa4)
[2.569467] [<808b0008>] (dump_stack) from [<80077b58>] (__lock_acquire+0x1d24/0x1ecc)
[2.577385]  r6: r5: r4:80e7d900 r3:0001
[2.583107] [<80075e34>] (__lock_acquire) from [<80078608>] (lock_acquire+0xa4/0x124)
[2.590937]  r10:6193 r9:80d3e5c0 r8: r7: r6: r5:be0bdae0
[2.598839]  r4:
[2.601400] [<80078564>] (lock_acquire) from [<80095870>] (call_timer_fn+0x78/0x1a0)
[2.609144]  r10:0001 r9:0020 r8:bd8cd830 r7:0100 r6:80d3e610 r5:be0bdae0
[2.617045]  r4:bd8cd854
[2.619600] [<800957f8>] (call_timer_fn) from [<80095a84>] (run_timer_softirq+0xec/0x2a4)
[2.62]  r10:bd8cd854 r9:0020 r8:bd8cd830 r7:0020 r6:80d3e610 r5:00c9
[2.635679]  r4:be7be440
[2.638238] [<80095998>] (run_timer_softirq) from [<80033cbc>] (__do_softirq+0xdc/0x364)
[2.646329]  r10:0100 r9:0004 r8:0001 r7:80d3e32c r6:0202 r5:0001
[2.654230]  r4:80c92084
[2.656785] [<80033be0>] (__do_softirq) from [<800342b4>] (irq_exit+0xcc/0x140)
[2.664094]  r10:0001 r9:be01e000 r8:0001 r7: r6:80c932d4 r5:
[2.671995]  r4:80c8d654
[2.674549] [<800341e8>] (irq_exit) from [<80084254>] (__handle_domain_irq+0x7c/0xf0)
[2.682380]  r4:80c8d654 r3:0125
[2.685990] [<800841d8>] (__handle_domain_irq) from [<800095a8>] (gic_handle_irq+0x30/0x70)
[2.694342]  r9: r8:07c1 r7:c080e100 r6:80c934bc r5:c080e10c r4:be0bdc18
[2.702163] [<80009578>] (gic_handle_irq) from [<80015c24>] (__irq_svc+0x44/0x5c)
[2.709647] Exception stack(0xbe0bdc18 to 0xbe0bdc60)
[2.714702] dc00:   0001 be1103f8
[2.722887] dc20:  6193 2113 80cda1a4 2113 2113 07c1
[2.731068] dc40: 0001 be0bdc74 80e8ecc0 be0bdc60 800758d8 808ba154 2113
[2.739245]  r7:be0bdc4c r6: r5:2113 r4:808ba154
[2.744976] [<808ba110>] (_raw_spin_unlock_irqrestore) from [<80385dd0>] (add_dma_entry+0xa4/0x164)
[2.754023]  r5:02f4f305 r4:
[2.757633] [<80385d2c>] (add_dma_entry) from [<803861e0>] (debug_dma_map_page+0x108/0x120)
[2.765984]  r7:be1e4010 r6:bef98980 r5:bd3cc140 r4:be280c00
[2.771714] [<803860d8>] (debug_dma_map_page) from [<8051b604>] (fec_enet_new_rxbdp.isra.36+0xe4/0x148)
[2.781107]  r10:be1e4010 r9:0002 r8:bd8c9000 r7:bd3cc140 r6:007a7980 r5:07c1
[2.789008]  r4:0140 r3:07c1
[2.792619] [<8051b520>] (fec_enet_new_rxbdp.isra.36) from [<8051c558>] (fec_enet_open+0x98/0x570)
[2.801578]  r10:bd8cc0f0 r9:003c r8:bd8c9640 r7:bd8cc000 r6:bd254480 r5:bd8c9000
[2.809481]  r4:bf088780
[2.812043] [<8051c4c0>] (fec_enet_open) from [<806873a4>] (__dev_open+0xb8/0x120)
[2.819613]  r10:80d20f00 r9:bd8c9000 r8: r7:bd8c9030 r6:809194c4 r5:
[2.827515]  r4:bd8c9000
[2.830069] [<806872ec>] (__dev_open) from [<80687680>] (__dev_change_flags+0x98/0x158)
[2.838073]  r7:1002 r6:1003 r5:0001 r4:bd8c9000
[2.843797] [<806875e8>] (__dev_change_flags) from [<80687768>] (dev_change_flags+0x28/0x58)
[2.852235]  r8: r7:80d20ff0 r6:1002 r5:bd8c9138 r4:bd8c9000 r3:80c678b4
[2.860060] [<80687740>] (dev_change_flags) from [<80c43ed8>] (ip_auto_config.part.14+0x184/0x1020)
[2.869106]  r8:80d20f00 r7:80d20ff0 r6:80d20ff0 r5:

Re: [RFC PATCH v2 net-next 3/3] tcp: add NV congestion control

2015-07-21 Thread Yuchung Cheng
On Tue, Jul 21, 2015 at 9:21 PM, Lawrence Brakmo  wrote:
> This is a request for comments.
>
> TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of
> NV was presented at 2010's LPC (slides). It is a delayed based
> congestion avoidance for the data center. This version has been tested
> within a 10G rack where the HW RTTs are 20-50us.
>
> A description of TCP-NV, including implementation and experimental
> results, can be found at:
> http://www.brakmo.org/networking/tcp-nv/TCPNV.html
>
> The current version includes many module parameters to support
> experimentation with the parameters.
>
> Signed-off-by: Lawrence Brakmo 
> ---
>  include/net/tcp.h  |   1 +
>  net/ipv4/Kconfig   |  16 ++
>  net/ipv4/Makefile  |   1 +
>  net/ipv4/sysctl_net_ipv4.c |   9 +
>  net/ipv4/tcp_input.c   |   2 +
>  net/ipv4/tcp_nv.c  | 479 
> +
>  6 files changed, 508 insertions(+)
>  create mode 100644 net/ipv4/tcp_nv.c
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2e62efe..c0690ae 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
>  extern int sysctl_tcp_min_tso_segs;
>  extern int sysctl_tcp_autocorking;
>  extern int sysctl_tcp_invalid_ratelimit;
> +extern int sysctl_tcp_nv_enable;
>
>  extern atomic_long_t tcp_memory_allocated;
>  extern struct percpu_counter tcp_sockets_allocated;
> diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
> index 6fb3c90..c37b374 100644
> --- a/net/ipv4/Kconfig
> +++ b/net/ipv4/Kconfig
> @@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
> window. TCP Vegas should provide less packet loss, but it is
> not as aggressive as TCP Reno.
>
> +config TCP_CONG_NV
> +   tristate "TCP NV"
> +   default m
> +   ---help---
> +   TCP NV is a follow up to TCP Vegas. It has been modified to deal with
> +   10G networks, measurement noise introduced by LRO, GRO and interrupt
> +   coalescence. In addition, it will decrease its cwnd multiplicative
multiplicatively

> +   instead of linearly.
> +
> +   Note that in general congestion avoidance (cwnd decreased when # packets
> +   queued grows) cannot coexist with congestion control (cwnd decreased only
> +   when there is packet loss) due to fairness issues. One scenario when the
s/the/they
> +   can coexist safely is when the CA flows have RTTs << CC flows RTTs.
> +
> +   For further details see http://www.brakmo.org/networking/tcp-nv/
> +
>  config TCP_CONG_SCALABLE
> tristate "Scalable TCP"
> default n
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> index efc43f3..06f335f 100644
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
>  obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
>  obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
>  obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
> +obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
>  obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
>  obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
>  obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 433231c..31846d5 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -730,6 +730,15 @@ static struct ctl_table ipv4_table[] = {
> .proc_handler   = proc_dointvec_ms_jiffies,
> },
> {
> +   .procname   = "tcp_nv_enable",
> +   .data   = &sysctl_tcp_nv_enable,
> +   .maxlen = sizeof(int),
> +   .mode   = 0644,
> +   .proc_handler   = proc_dointvec_minmax,
> +   .extra1 = &zero,
> +   .extra2 = &one,
> +   },
> +   {
> .procname   = "icmp_msgs_per_sec",
> .data   = &sysctl_icmp_msgs_per_sec,
> .maxlen = sizeof(int),
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index aca4ae5..87560d9 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
>  int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
>  int sysctl_tcp_early_retrans __read_mostly = 3;
>  int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
> +int sysctl_tcp_nv_enable __read_mostly = 1;
> +EXPORT_SYMBOL(sysctl_tcp_nv_enable);
>
>  #define FLAG_DATA  0x01 /* Incoming frame contained data. */
>  #define FLAG_WIN_UPDATE 0x02 /* Incoming ACK was a window update.   */
> diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
> new file mode 100644
> index 000..af451b6
> --- /dev/null
> +++ b/net/ipv4/tcp_nv.c
> @@ -0,0 +1,479 @@
> +/*
> + * TCP NV: TCP with Congestion Avoidance
> + *
> + * TCP-NV is a successor of TCP-Vegas that has been developed to
> + * deal wi
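
The Kconfig help text quoted above contrasts decreasing cwnd linearly with decreasing it multiplicatively. The difference can be sketched as follows; this is only an illustration, and the function names, the step size and the num/den factor are hypothetical and not taken from tcp_nv.c.

```c
#include <assert.h>
#include <stdint.h>

/* Linear decrease: remove a fixed number of segments, floor at min_cwnd. */
static uint32_t demo_cwnd_dec_linear(uint32_t cwnd, uint32_t step,
				     uint32_t min_cwnd)
{
	if (cwnd <= min_cwnd || cwnd - min_cwnd <= step)
		return min_cwnd;
	return cwnd - step;
}

/* Multiplicative decrease: scale cwnd by num/den, floor at min_cwnd. */
static uint32_t demo_cwnd_dec_mult(uint32_t cwnd, uint32_t num, uint32_t den,
				   uint32_t min_cwnd)
{
	uint64_t scaled;

	assert(den != 0 && num <= den);
	scaled = (uint64_t)cwnd * num / den;
	return scaled < min_cwnd ? min_cwnd : (uint32_t)scaled;
}
```

With a large window, one multiplicative step removes far more segments than one linear step, which is presumably why it suits the high-bandwidth case the help text mentions.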

[net-next:master 83/84] af_mpls.c:undefined reference to `ip6_route_output'

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   0b2c2a931a051e75f9df429b520bb2c2f2bb056b
commit: 01faef2cebae02685e2bcfc9bbee8416d5ec19fc [83/84] mpls: make RTA_OIF optional
config: i386-randconfig-n1-201529 (attached as .config)
reproduce:
  git checkout 01faef2cebae02685e2bcfc9bbee8416d5ec19fc
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   net/built-in.o: In function `find_outdev':
>> af_mpls.c:(.text+0x194fef): undefined reference to `ip6_route_output'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.2.0-rc2 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_KERNEL_LZ4=y
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
# CONFIG_TASK_DELAY_ACCT is not set
# CONFIG_TASK_XACCT is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_BOOST=y
CONFIG_RCU_KTHREAD_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_NONE=y
# CONFIG_RCU_NOCB_CPU_ZERO is not set
# CONFIG_RCU_NOCB_CPU_ALL is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_MEMCG is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_BLK_CGROUP=y
CONFIG_DEBUG_BLK_CGROUP=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
# CONFIG_NET_NS is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CO

[PATCH net-next 5/6] bnx2x: Add MFW dump support

2015-07-21 Thread Yuval Mintz
Devices with up-to-date management FW will be able to store register dumps
on their persistent storage - in case management FW identifies a fatal
error it would gather and store such dumps, which could later be retrieved
using specific debug tools.

This patch adds the necessary part in the driver in order to make the
feature operational, as well as update users [under debug] during load
in case their device contains a dump of a previous crash.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h  |  2 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |  4 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h  | 17 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 28 
 4 files changed, 51 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index ecf1d7f..2fe3563 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -2582,6 +2582,8 @@ void bnx2x_set_local_cmng(struct bnx2x *bp);
 
 void bnx2x_update_mng_version(struct bnx2x *bp);
 
+void bnx2x_update_mfw_dump(struct bnx2x *bp);
+
 #define MCPR_SCRATCH_BASE(bp) \
(CHIP_IS_E1x(bp) ? MCP_REG_MCPR_SCRATCH : MCP_A_REG_MCPR_SCRATCH)
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 4ad9eeb..7d29bf2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2908,6 +2908,10 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
return -EBUSY;
}
 
+   /* Update driver data for On-Chip MFW dump. */
+   if (IS_PF(bp))
+   bnx2x_update_mfw_dump(bp);
+
/* If PMF - send ADMIN DCBX msg to MFW to initiate DCBX FSM */
if (bp->port.pmf && (bp->state != BNX2X_STATE_DIAG))
bnx2x_dcbx_init(bp, false);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index 53c8818..931b1b9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -2075,6 +2075,20 @@ enum curr_cfg_method_e {
CURR_CFG_MET_VENDOR_SPEC = 2,/* e.g. Option ROM, NPAR, O/S Cfg Utils */
 };
 
+struct mdump_driver_info {
+   u32 epoc;
+   u32 drv_ver;
+   u32 fw_ver;
+
+   u32 valid_dump;
+   #define FIRST_DUMP_VALID    (1 << 0)
+   #define SECOND_DUMP_VALID   (1 << 1)
+
+   u32 flags;
+   #define ENABLE_ALL_TRIGGERS (0x7fff)
+   #define TRIGGER_MDUMP_ONCE  (1 << 31)
+};
+
 struct ncsi_oem_data {
u32 driver_version[4];
struct ncsi_oem_fcoe_features ncsi_oem_fcoe_features;
@@ -2347,6 +2361,9 @@ struct shmem2_region {
#define OS_DRIVER_STATE_LOADING 1 /* transition state */
#define OS_DRIVER_STATE_DISABLED 2 /* installed but disabled */
#define OS_DRIVER_STATE_ACTIVE  3 /* installed and active */
+
+   /* mini dump driver info */
+   struct mdump_driver_info drv_info;  /* 0x218 */
 };
 
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0a069fa..78e55fe 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3709,6 +3709,34 @@ out:
   ethver, iscsiver, fcoever);
 }
 
+void bnx2x_update_mfw_dump(struct bnx2x *bp)
+{
+   struct timeval epoc;
+   u32 drv_ver;
+   u32 valid_dump;
+
+   if (!SHMEM2_HAS(bp, drv_info))
+   return;
+
+   /* Update Driver load time */
+   do_gettimeofday(&epoc);
+   SHMEM2_WR(bp, drv_info.epoc, epoc.tv_sec);
+
+   drv_ver = bnx2x_update_mng_version_utility(DRV_MODULE_VERSION, true);
+   SHMEM2_WR(bp, drv_info.drv_ver, drv_ver);
+
+   SHMEM2_WR(bp, drv_info.fw_ver, REG_RD(bp, XSEM_REG_PRAM));
+
+   /* Check & notify On-Chip dump. */
+   valid_dump = SHMEM2_RD(bp, drv_info.valid_dump);
+
+   if (valid_dump & FIRST_DUMP_VALID)
+   DP(NETIF_MSG_IFUP, "A valid On-Chip MFW dump found on 1st partition\n");
+
+   if (valid_dump & SECOND_DUMP_VALID)
+   DP(NETIF_MSG_IFUP, "A valid On-Chip MFW dump found on 2nd partition\n");
+}
+
 static void bnx2x_oem_event(struct bnx2x *bp, u32 event)
 {
u32 cmd_ok, cmd_fail;
-- 
1.9.3
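
The `valid_dump` check in `bnx2x_update_mfw_dump()` above is a plain bit-flag test. A stand-alone sketch of the same decoding follows; the two bit values are copied from the `mdump_driver_info` hunk, while `demo_report_dumps()` and its message format are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values as defined in the mdump_driver_info hunk above. */
#define FIRST_DUMP_VALID	(1 << 0)
#define SECOND_DUMP_VALID	(1 << 1)

/*
 * Hypothetical helper: describe which partitions hold a dump and
 * return how many valid dumps were found (0, 1 or 2).
 */
static int demo_report_dumps(uint32_t valid_dump, char *buf, size_t len)
{
	int first = !!(valid_dump & FIRST_DUMP_VALID);
	int second = !!(valid_dump & SECOND_DUMP_VALID);

	assert(buf != NULL && len > 0);
	snprintf(buf, len, "1st partition: %s, 2nd partition: %s",
		 first ? "dump" : "empty", second ? "dump" : "empty");
	return first + second;
}
```

Any bits other than the two defined flags are ignored here, matching the driver code, which only tests those two bits.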



[PATCH net-next 6/6] bnx2x: Bump up driver version to 1.712.30

2015-07-21 Thread Yuval Mintz
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 2fe3563..a1f9785 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -32,7 +32,7 @@
  * (you will need to reboot afterwards) */
 /* #define BNX2X_STOP_ON_ERROR */
 
-#define DRV_MODULE_VERSION  "1.710.51-0"
+#define DRV_MODULE_VERSION  "1.712.30-0"
 #define DRV_MODULE_RELDATE  "2014/02/10"
 #define BNX2X_BC_VER 0x040200
 
-- 
1.9.3



[PATCH net-next 3/6] bnx2x: Add 84858 phy support

2015-07-21 Thread Yuval Mintz
From: Yaniv Rosner 

This adds support for a new copper PHY.

Signed-off-by: Yaniv Rosner 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h  |   3 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 244 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h  |  58 +++---
 3 files changed, 232 insertions(+), 73 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index a838b6e..5425de0 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -731,6 +731,7 @@ struct port_hw_cfg {/* port 0: 0x12c  port 1: 0x2bc */
#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM8722   0x0f00
#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM54616  0x1000
#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM84834  0x1100
+   #define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM84858  0x1200
#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_FAILURE   0xfd00
#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_NOT_CONN  0xff00
 
@@ -788,6 +789,7 @@ struct port_hw_cfg {/* port 0: 0x12c  port 1: 0x2bc */
#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM8722    0x0f00
#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM54616   0x1000
#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834   0x1100
+   #define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84858   0x1200
#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_DIRECT_WC  0xfc00
#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_FAILURE    0xfd00
#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_NOT_CONN   0xff00
@@ -2253,6 +2255,7 @@ struct shmem2_region {
u32 reserved4;  /* Offset 0x150 */
u32 link_attr_sync[PORT_MAX];   /* Offset 0x154 */
#define LINK_ATTR_SYNC_KR2_ENABLE   0x0001
+   #define LINK_ATTR_84858 0x0002
#define LINK_SFP_EEPROM_COMP_CODE_MASK  0xff00
#define LINK_SFP_EEPROM_COMP_CODE_SHIFT  8
#define LINK_SFP_EEPROM_COMP_CODE_SR    0x1000
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index 7f9ec51..d946bba 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -9654,6 +9654,13 @@ static void bnx2x_8727_link_reset(struct bnx2x_phy *phy,
 /**/
 /* BCM8481/BCM84823/BCM84833 PHY SECTION */
 /**/
+static int bnx2x_is_8483x_8485x(struct bnx2x_phy *phy)
+{
+   return ((phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) ||
+   (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834) ||
+   (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84858));
+}
+
 static void bnx2x_save_848xx_spirom_version(struct bnx2x_phy *phy,
struct bnx2x *bp,
u8 port)
@@ -9668,8 +9675,7 @@ static void bnx2x_save_848xx_spirom_version(struct bnx2x_phy *phy,
};
u16 fw_ver1;
 
-   if ((phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) ||
-   (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834)) {
+   if (bnx2x_is_8483x_8485x(phy)) {
bnx2x_cl45_read(bp, phy, MDIO_CTL_DEVAD, 0x400f, &fw_ver1);
bnx2x_save_spirom_version(bp, port, fw_ver1 & 0xfff,
phy->ver_addr);
@@ -9751,8 +9757,7 @@ static void bnx2x_848xx_set_led(struct bnx2x *bp,
bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
 reg_set[i].val);
 
-   if ((phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) ||
-   (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834))
+   if (bnx2x_is_8483x_8485x(phy))
offset = MDIO_PMA_REG_84833_CTL_LED_CTL_1;
else
offset = MDIO_PMA_REG_84823_CTL_LED_CTL_1;
@@ -9770,8 +9775,7 @@ static void bnx2x_848xx_specific_func(struct bnx2x_phy *phy,
struct bnx2x *bp = params->bp;
switch (action) {
case PHY_INIT:
-   if ((phy->type != PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) &&
-   (phy->type != PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834)) {
+   if (!bnx2x_is_8483x_8485x(phy)) {
/* Save spirom version */
bnx2x_save_848xx_spirom_version(phy, bp, params->port);
}
@@ -9903,8 +9907,7 @@ static int bnx2x_848xx_cmn_config_init(struct bnx2x_phy *phy,
/* Always write this if this is not 8

[PATCH net-next 0/6] bnx2x: update FW, rebrand and more

2015-07-21 Thread Yuval Mintz
This patch series does several things - it updates the bnx2x FW to
7.12.30, which both contains some small fixes and opens the door
for several new features for the device - mainly vxlan/geneve offloads
and vlan filtering offload.
It then adds a new Multi-function mode [BD] which requires this FW in
order to operate.

In addition, this finally rebrands the driver from a 'broadcom' driver
into a 'qlogic' driver [although it would still reside under Broadcom's
tree in the kernel].

Dave,

Please consider applying this series to `net-next'.

[Do notice some of these don't pass checkpatch cleanly, usually due
to conforming with already existing 'bad' styles.
E.g., preprocessor `#define' which is shifted]

Thanks,
Yuval
-- 
1.8.3.1



[PATCH net-next 1/6] bnx2x: Utilize FW 7.12.30

2015-07-21 Thread Yuval Mintz
This moves bnx2x into using 7.12.30 FW. Said firmware fixes the following:

 - Packets from a VF with pvid configured which were sent with a
   different vlan were transmitted instead of being discarded.

 - FCoE traffic might not recover after a failure while there's traffic
   to another function.

In addition, this FW opens the door for the driver to implement several
new features; specifically, this enhances the device's support for
encapsulated packets and will allow vxlan/geneve offloads to be added in
the future, as well as vlan filtering offload.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c| 11 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c|  2 +
 .../net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h|  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h| 87 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |  2 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 53 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 45 +++
 8 files changed, 136 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index a90d736..fc32821 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2103,9 +2103,14 @@ int bnx2x_rss(struct bnx2x *bp, struct bnx2x_rss_config_obj *rss_obj,
if (rss_obj->udp_rss_v6)
__set_bit(BNX2X_RSS_IPV6_UDP, &params.rss_flags);
 
-   if (!CHIP_IS_E1x(bp))
+   if (!CHIP_IS_E1x(bp)) {
+   /* valid only for TUNN_MODE_VXLAN tunnel mode */
+   __set_bit(BNX2X_RSS_IPV4_VXLAN, &params.rss_flags);
+   __set_bit(BNX2X_RSS_IPV6_VXLAN, &params.rss_flags);
+
/* valid only for TUNN_MODE_GRE tunnel mode */
-   __set_bit(BNX2X_RSS_GRE_INNER_HDRS, &params.rss_flags);
+   __set_bit(BNX2X_RSS_TUNN_INNER_HDRS, &params.rss_flags);
+   }
} else {
__set_bit(BNX2X_RSS_MODE_DISABLED, &params.rss_flags);
}
@@ -3677,7 +3682,7 @@ static void bnx2x_update_pbds_gso_enc(struct sk_buff *skb,
pbd2->fw_ip_hdr_to_payload_w =
hlen_w - ((sizeof(struct ipv6hdr)) >> 1);
pbd_e2->data.tunnel_data.flags |=
-   ETH_TUNNEL_DATA_IP_HDR_TYPE_OUTER;
+   ETH_TUNNEL_DATA_IPV6_OUTER;
}
 
pbd2->tcp_send_seq = bswab32(inner_tcp_hdr(skb)->seq);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 03b7404..ec50d12 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -936,9 +936,7 @@ static inline int bnx2x_func_start(struct bnx2x *bp)
else /* CHIP_IS_E1X */
start_params->network_cos_mode = FW_WRR;
 
-   start_params->tunnel_mode   = TUNN_MODE_GRE;
-   start_params->gre_tunnel_type   = IPGRE_TUNNEL;
-   start_params->inner_gre_rss_en  = 1;
+   start_params->inner_rss = 1;
 
if (IS_MF_UFP(bp) && BNX2X_IS_MF_SD_PROTOCOL_FCOE(bp)) {
start_params->class_fail_ethtype = ETH_P_FIP;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
index 6e4294e..b50f154 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
@@ -1850,6 +1850,8 @@ static void bnx2x_dcbx_fw_struct(struct bnx2x *bp,
if (bp->dcbx_port_params.ets.cos_params[cos].
pri_bitmask & pri_bit)
tt2cos[pri].cos = cos;
+
+   pfc_fw_cfg->dcb_outer_pri[pri]  = ttp[pri];
}
 
/* we never want the FW to add a 0 vlan tag */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
index 7636e3c..bfda526 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
@@ -372,7 +372,7 @@
 #define MAX_COS_NUMBER 4
 #define MAX_TRAFFIC_TYPES 8
 #define MAX_PFC_PRIORITIES 8
-
+#define MAX_VLAN_PRIORITIES 8
/* used by array traffic_type_to_priority[] to mark traffic type \
that is not mapped to priority*/
 #define LLFC_TRAFFIC_TYPE_TO_PRIORITY_UNMAPPED 0xFF
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index 058bc73..2b6f97b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -2898,8 +2898,8 @@ struct afex_stats {
 };
 
 #define BCM_5710_FW_

[PATCH net-next 4/6] bnx2x: new Multi-function mode - BD

2015-07-21 Thread Yuval Mintz
This adds support for a new multi-function mode, enabling the driver to
initialize such devices and correctly interact with management FW
in order to fully utilize their features.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h|  3 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c| 74 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h| 36 +++
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c|  3 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h| 74 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 56 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h| 17 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  |  3 +
 8 files changed, 251 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index a59f0b9..ecf1d7f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1424,6 +1424,7 @@ enum {
SUB_MF_MODE_UNKNOWN = 0,
SUB_MF_MODE_UFP,
SUB_MF_MODE_NPAR1_DOT_5,
+   SUB_MF_MODE_BD,
 };
 
 struct bnx2x {
@@ -1638,6 +1639,8 @@ struct bnx2x {
u8  mf_sub_mode;
 #define IS_MF_UFP(bp)  (IS_MF_SD(bp) && \
 bp->mf_sub_mode == SUB_MF_MODE_UFP)
+#define IS_MF_BD(bp)   (IS_MF_SD(bp) && \
+bp->mf_sub_mode == SUB_MF_MODE_BD)
 
u8  wol;
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index e395ae9..0e392ca 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2517,6 +2517,20 @@ static void bnx2x_bz_fp(struct bnx2x *bp, int index)
fp->mode = TPA_MODE_DISABLED;
 }
 
+void bnx2x_set_os_driver_state(struct bnx2x *bp, u32 state)
+{
+   u32 cur;
+
+   if (!IS_MF_BD(bp) || !SHMEM2_HAS(bp, os_driver_state) || IS_VF(bp))
+   return;
+
+   cur = SHMEM2_RD(bp, os_driver_state[BP_FW_MB_IDX(bp)]);
+   DP(NETIF_MSG_IFUP, "Driver state %08x-->%08x\n",
+  cur, state);
+
+   SHMEM2_WR(bp, os_driver_state[BP_FW_MB_IDX(bp)], state);
+}
+
 int bnx2x_load_cnic(struct bnx2x *bp)
 {
int i, rc, port = BP_PORT(bp);
@@ -2880,6 +2894,8 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
/* mark driver is loaded in shmem2 */
u32 val;
val = SHMEM2_RD(bp, drv_capabilities_flag[BP_FW_MB_IDX(bp)]);
+   val &= ~DRV_FLAGS_MTU_MASK;
+   val |= (bp->dev->mtu << DRV_FLAGS_MTU_SHIFT);
SHMEM2_WR(bp, drv_capabilities_flag[BP_FW_MB_IDX(bp)],
  val | DRV_FLAGS_CAPABILITIES_LOADED_SUPPORTED |
  DRV_FLAGS_CAPABILITIES_LOADED_L2);
@@ -2896,6 +2912,9 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
if (bp->port.pmf && (bp->state != BNX2X_STATE_DIAG))
bnx2x_dcbx_init(bp, false);
 
+   if (!IS_MF_SD_STORAGE_PERSONALITY_ONLY(bp))
+   bnx2x_set_os_driver_state(bp, OS_DRIVER_STATE_ACTIVE);
+
DP(NETIF_MSG_IFUP, "Ending successfully NIC load\n");
 
return 0;
@@ -2963,6 +2982,9 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, 
bool keep_link)
 
DP(NETIF_MSG_IFUP, "Starting NIC unload\n");
 
+   if (!IS_MF_SD_STORAGE_PERSONALITY_ONLY(bp))
+   bnx2x_set_os_driver_state(bp, OS_DRIVER_STATE_DISABLED);
+
/* mark driver is unloaded in shmem2 */
if (IS_PF(bp) && SHMEM2_HAS(bp, drv_capabilities_flag)) {
u32 val;
@@ -4191,6 +4213,41 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
+void bnx2x_get_c2s_mapping(struct bnx2x *bp, u8 *c2s_map, u8 *c2s_default)
+{
+   int mfw_vn = BP_FW_MB_IDX(bp);
+   u32 tmp;
+
+   /* If the shmem shouldn't affect configuration, reflect */
+   if (!IS_MF_BD(bp)) {
+   int i;
+
+   for (i = 0; i < BNX2X_MAX_PRIORITY; i++)
+   c2s_map[i] = i;
+   *c2s_default = 0;
+
+   return;
+   }
+
+   tmp = SHMEM2_RD(bp, c2s_pcp_map_lower[mfw_vn]);
+   tmp = (__force u32)be32_to_cpu((__force __be32)tmp);
+   c2s_map[0] = tmp & 0xff;
+   c2s_map[1] = (tmp >> 8) & 0xff;
+   c2s_map[2] = (tmp >> 16) & 0xff;
+   c2s_map[3] = (tmp >> 24) & 0xff;
+
+   tmp = SHMEM2_RD(bp, c2s_pcp_map_upper[mfw_vn]);
+   tmp = (__force u32)be32_to_cpu((__force __be32)tmp);
+   c2s_map[4] = tmp & 0xff;
+   c2s_map[5] = (tmp >> 8) & 0xff;
+   c2s_map[6] = (tmp >> 16) & 0xff;
+   c2s_map[7] = (tmp >> 24) & 0xff;
+
+   tmp = SHMEM2_RD(bp, c2s_pcp_map_default[mfw_vn]);
+   tmp = (__force u32)be32_t

[PATCH net-next 2/6] bnx2x: Rebrand from 'broadcom' into 'qlogic'

2015-07-21 Thread Yuval Mintz
bnx2x still appears as a Broadcom driver even though the devices it
drives have belonged to QLogic for more than a year.

This patch changes the various headers and the device strings to indicate
the correct ownership of the device.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c| 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.h| 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dump.h   | 10 +++--
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c|  4 +-
 .../net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h|  4 +-
 .../ethernet/broadcom/bnx2x/bnx2x_fw_file_hdr.h|  2 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h   |  4 +-
 .../net/ethernet/broadcom/bnx2x/bnx2x_init_ops.h   |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c   | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h   | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 50 +++---
 .../net/ethernet/broadcom/bnx2x/bnx2x_mfw_req.h|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 14 +++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 14 +++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h  | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c  |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h  |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h   | 22 ++
 25 files changed, 142 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index cd4ae76..a59f0b9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1,6 +1,8 @@
-/* bnx2x.h: Broadcom Everest network driver.
+/* bnx2x.h: QLogic Everest network driver.
  *
  * Copyright (c) 2007-2013 Broadcom Corporation
+ * Copyright (c) 2014 QLogic Corporation
+ * All rights reserved
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index fc32821..e395ae9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1,6 +1,8 @@
-/* bnx2x_cmn.c: Broadcom Everest network driver.
+/* bnx2x_cmn.c: QLogic Everest network driver.
  *
  * Copyright (c) 2007-2013 Broadcom Corporation
+ * Copyright (c) 2014 QLogic Corporation
+ * All rights reserved
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index ec50d12..77693d3 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -1,6 +1,8 @@
-/* bnx2x_cmn.h: Broadcom Everest network driver.
+/* bnx2x_cmn.h: QLogic Everest network driver.
  *
  * Copyright (c) 2007-2013 Broadcom Corporation
+ * Copyright (c) 2014 QLogic Corporation
+ * All rights reserved
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
index b50f154..7ccf668 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
@@ -1,15 +1,17 @@
-/* bnx2x_dcb.c: Broadcom Everest network driver.
+/* bnx2x_dcb.c: QLogic Everest network driver.
  *
  * Copyright 2009-2013 Broadcom Corporation
+ * Copyright 2014 QLogic Corporation
+ * All rights reserved
  *
- * Unless you and Broadcom execute a separate written software license
+ * Unless you and QLogic execute a separate written software license
  * agreement governing use of this software, this software is licensed to you
  * under the terms of the GNU General Public License version 2, available
  * at http://www.gnu.org/licenses/old-licenses/gpl-2.0.html (the "GPL").
  *
  * Notwithstanding the above, under no circumstances may you combine this
- * software in any way with any other Broadcom software provided under a
- * license other than the GPL, without Broadcom's express prior written
+ * software in any way with any other QLogic software provided under a
+ * license other than the GPL, without QLogic's express prior written
 

Re: [PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().

2015-07-21 Thread roopa

On 7/21/15, 9:57 PM, Rami Rosen wrote:

This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range()
method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has
no effect and is unneeded, as the vinfo.flags value is overridden by the
immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END
assignment.

Signed-off-by: Rami Rosen 


Acked-by: Roopa Prabhu 

Thanks.


[PATCH net-next] mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference

2015-07-21 Thread Roopa Prabhu
From: Roopa Prabhu 

fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison
expression (different address spaces)

remove incorrect rcu_dereference possibly left over from
earlier revisions of the code.

Reported-by: kbuild test robot 
Signed-off-by: Roopa Prabhu 
---
 net/mpls/mpls_iptunnel.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index eea096f..276f8c9 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -70,7 +70,7 @@ int mpls_output(struct sock *sk, struct sk_buff *skb)
skb_orphan(skb);
 
/* Find the output device */
-   out_dev = rcu_dereference(dst->dev);
+   out_dev = dst->dev;
if (!mpls_output_possible(out_dev) ||
!lwtstate || skb_warn_if_lro(skb))
goto drop;
-- 
1.7.10.4



[PATCH net-next] be2net: support ndo_get_phys_port_id()

2015-07-21 Thread Sriharsha Basavapatna
From: Sriharsha Basavapatna 

Add be_get_phys_port_id() function to report physical port id. The port id
should be unique across different be2net devices in the system. We use the
chip serial number along with the physical port number for this.

Signed-off-by: Sriharsha Basavapatna 
---
 drivers/net/ethernet/emulex/benet/be.h  |  3 +++
 drivers/net/ethernet/emulex/benet/be_cmds.c |  7 ++-
 drivers/net/ethernet/emulex/benet/be_cmds.h |  8 +---
 drivers/net/ethernet/emulex/benet/be_main.c | 22 ++
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index cb5777b..8cd384d 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -105,6 +105,8 @@
 
 #define MAX_VFS 30 /* Max VFs supported by BE3 FW */
 #define FW_VER_LEN 32
+#define CNTL_SERIAL_NUM_WORDS   8  /* Controller serial number words */
+#define CNTL_SERIAL_NUM_WORD_SZ (sizeof(u16)) /* Byte-sz of serial num word */
 
 #define RSS_INDIR_TABLE_LEN 128
 #define RSS_HASH_KEY_LEN   40
@@ -590,6 +592,7 @@ struct be_adapter {
struct rss_info rss_info;
/* Filters for packets that need to be sent to BMC */
u32 bmc_filt_mask;
+   u16 serial_num[CNTL_SERIAL_NUM_WORDS];
 };
 
 #define be_physfn(adapter) (!adapter->virtfn)
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c 
b/drivers/net/ethernet/emulex/benet/be_cmds.c
index ecad46f..3be1fbd 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -2852,10 +2852,11 @@ int be_cmd_get_cntl_attributes(struct be_adapter 
*adapter)
struct be_mcc_wrb *wrb;
struct be_cmd_req_cntl_attribs *req;
struct be_cmd_resp_cntl_attribs *resp;
-   int status;
+   int status, i;
int payload_len = max(sizeof(*req), sizeof(*resp));
struct mgmt_controller_attrib *attribs;
struct be_dma_mem attribs_cmd;
+   u32 *serial_num;
 
if (mutex_lock_interruptible(&adapter->mbox_lock))
return -1;
@@ -2886,6 +2887,10 @@ int be_cmd_get_cntl_attributes(struct be_adapter 
*adapter)
if (!status) {
attribs = attribs_cmd.va + sizeof(struct be_cmd_resp_hdr);
adapter->hba_port_num = attribs->hba_attribs.phy_port;
+   serial_num = attribs->hba_attribs.controller_serial_number;
+   for (i = 0; i < CNTL_SERIAL_NUM_WORDS; i++)
+   adapter->serial_num[i] = le32_to_cpu(serial_num[i]) &
+   (BIT_MASK(16) - 1);
}
 
 err:
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h 
b/drivers/net/ethernet/emulex/benet/be_cmds.h
index a4479f7..36d835b 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1637,10 +1637,12 @@ struct be_cmd_req_set_qos {
 struct mgmt_hba_attribs {
u32 rsvd0[24];
u8 controller_model_number[32];
-   u32 rsvd1[79];
-   u8 rsvd2[3];
+   u32 rsvd1[16];
+   u32 controller_serial_number[8];
+   u32 rsvd2[55];
+   u8 rsvd3[3];
u8 phy_port;
-   u32 rsvd3[13];
+   u32 rsvd4[13];
 } __packed;
 
 struct mgmt_controller_attrib {
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index c996dd7..5e92db8 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5219,6 +5219,27 @@ static netdev_features_t be_features_check(struct 
sk_buff *skb,
 }
 #endif
 
+static int be_get_phys_port_id(struct net_device *dev,
+  struct netdev_phys_item_id *ppid)
+{
+   int i, id_len = CNTL_SERIAL_NUM_WORDS * CNTL_SERIAL_NUM_WORD_SZ + 1;
+   struct be_adapter *adapter = netdev_priv(dev);
+   u8 *id;
+
+   if (MAX_PHYS_ITEM_ID_LEN < id_len)
+   return -ENOSPC;
+
+   ppid->id[0] = adapter->hba_port_num + 1;
+   id = &ppid->id[1];
+   for (i = CNTL_SERIAL_NUM_WORDS - 1; i >= 0;
+i--, id += CNTL_SERIAL_NUM_WORD_SZ)
+   memcpy(id, &adapter->serial_num[i], CNTL_SERIAL_NUM_WORD_SZ);
+
+   ppid->id_len = id_len;
+
+   return 0;
+}
+
 static const struct net_device_ops be_netdev_ops = {
.ndo_open   = be_open,
.ndo_stop   = be_close,
@@ -5249,6 +5270,7 @@ static const struct net_device_ops be_netdev_ops = {
.ndo_del_vxlan_port = be_del_vxlan_port,
.ndo_features_check = be_features_check,
 #endif
+   .ndo_get_phys_port_id   = be_get_phys_port_id,
 };
 
 static void be_netdev_init(struct net_device *netdev)
-- 
1.8.3.1
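
The ID layout described in this patch (one byte for the port number plus one, followed by the serial number words copied from last to first) can be illustrated with a minimal user-space sketch. The helper name build_phys_port_id, the SERIAL_* macros and MAX_ID_LEN (assumed to be 32 here) are made up for illustration; the driver itself uses be_get_phys_port_id() and MAX_PHYS_ITEM_ID_LEN as shown in the diff.

```c
#include <stdint.h>
#include <string.h>

#define SERIAL_WORDS   8                 /* mirrors CNTL_SERIAL_NUM_WORDS */
#define SERIAL_WORD_SZ sizeof(uint16_t)  /* mirrors CNTL_SERIAL_NUM_WORD_SZ */
#define MAX_ID_LEN     32                /* assumed buffer limit for this sketch */

/* Hypothetical helper: fills id[] and returns its length, or -1 if it cannot fit. */
static int build_phys_port_id(uint8_t port_num,
                              const uint16_t serial[SERIAL_WORDS],
                              uint8_t id[MAX_ID_LEN])
{
    size_t id_len = SERIAL_WORDS * SERIAL_WORD_SZ + 1;
    uint8_t *p;
    int i;

    if (id_len > MAX_ID_LEN)
        return -1;

    id[0] = port_num + 1;                /* port byte first, 1-based as in the patch */
    p = &id[1];
    for (i = SERIAL_WORDS - 1; i >= 0; i--, p += SERIAL_WORD_SZ)
        memcpy(p, &serial[i], SERIAL_WORD_SZ);  /* words copied last to first */

    return (int)id_len;
}
```

With port number 0 and serial words 1 through 8, the sketch produces a 17-byte ID whose first byte is 1 and whose first copied word is word 8 (in host byte order).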


Re: [RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()

2015-07-21 Thread Eric Dumazet
On Tue, 2015-07-21 at 21:21 -0700, Lawrence Brakmo wrote:
> Replace 2 arguments (cnt and rtt) in the congestion control modules'
> pkts_acked() function with a struct. This will allow adding more
> information without having to modify existing congestion control
> modules (tcp_nv in particular needs bytes in flight when packet
> was sent).
> 
> This was proposed by Neal Cardwell in his comments to the tcp_nv patch.

Are you sure Neal suggested to pass a struct as argument ?

It was probably a struct pointer instead.
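
To make the distinction concrete, here is a small user-space sketch of a callback that takes a pointer to a const struct rather than the struct by value. All names below (struct ack_sample, its fields, pkts_acked_fn, demo_pkts_acked) are hypothetical and are not taken from the patch under review.

```c
#include <stdint.h>

/* Hypothetical sample record; field names are illustrative only. */
struct ack_sample {
    uint32_t pkts_acked;
    int32_t  rtt_us;
    uint32_t in_flight;   /* the kind of field that could be added later */
};

/*
 * The callback receives a pointer to const: the struct is not copied on
 * each call, and adding fields later does not change the signature.
 */
typedef void (*pkts_acked_fn)(void *state, const struct ack_sample *sample);

struct demo_state {
    uint32_t total_acked;
    int32_t  last_rtt_us;
};

static void demo_pkts_acked(void *state, const struct ack_sample *sample)
{
    struct demo_state *s = state;

    s->total_acked += sample->pkts_acked;
    if (sample->rtt_us >= 0)             /* negative means "no RTT sample" here */
        s->last_rtt_us = sample->rtt_us;
}
```

A module that ignores a newly added field keeps compiling unchanged, which is the property the quoted description is after.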




Re: [PATCH net-next] net: track success and failure of TCP PMTU probing

2015-07-21 Thread David Miller
From: r...@tardy.usa.hp.com (Rick Jones)
Date: Tue, 21 Jul 2015 16:14:13 -0700 (PDT)

> From: Rick Jones 
> 
> Track success and failure of TCP PMTU probing.
> 
> Signed-off-by: Rick Jones 

Seems reasonable, applied, thanks Rick.


Re: [PATCH] ravb: fix ring memory allocation

2015-07-21 Thread David Miller
From: Sergei Shtylyov 
Date: Wed, 22 Jul 2015 01:31:59 +0300

> The driver is written as if it can adapt to a low memory situation by allocating
> fewer RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In
> reality, though, the driver would malfunction in this case. Stop being overly
> smart and just fail in such a situation; this is achieved by moving the memory
> allocation from ravb_ring_format() to ravb_ring_init().
> 
> We leave the dma_map_single() calls in place but make their failure non-fatal
> by marking the corresponding RX descriptors with zero data size, which should
> prevent DMA to invalid addresses.
> 
> Signed-off-by: Sergei Shtylyov 

Applied.

But the real way to handle this is to allocate all of the necessary
resources for the replacement RX SKB before unmapping and passing the
original SKB up into the stack.

That way you _NEVER_ starve the device of RX packets to receive into,
since if you fail the memory allocation or the DMA mapping, you just
put the original SKB back into the ring.
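
The allocate-before-handoff pattern described above can be sketched in user space as follows. This is only an illustration under simplifying assumptions: struct rx_slot, rx_receive() and failing_alloc() are hypothetical names, a plain buffer stands in for an SKB, and the DMA mapping step is left out.

```c
#include <stdlib.h>

struct rx_slot { void *buf; };   /* hypothetical one-buffer ring slot */

/* Stand-in for an allocation failure under memory pressure. */
static void *failing_alloc(size_t len)
{
    (void)len;
    return NULL;
}

/*
 * Returns the filled buffer to pass up the stack, or NULL if the packet
 * is dropped. On failure the original buffer stays in the ring, so the
 * slot is never left empty.
 */
static void *rx_receive(struct rx_slot *slot, void *(*alloc)(size_t), size_t len)
{
    void *fresh = alloc(len);    /* allocate the replacement first */
    void *full;

    if (!fresh)
        return NULL;             /* drop: reuse the original buffer */

    full = slot->buf;
    slot->buf = fresh;           /* ring slot is refilled before handoff */
    return full;
}
```

The cost of this approach is one dropped packet per failed allocation, in exchange for a ring that always has a buffer to receive into.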


Re: [PATCHv2 net-next] cxgb4: Add debugfs entry to enable backdoor access

2015-07-21 Thread David Miller
From: Hariprasad Shenai 
Date: Tue, 21 Jul 2015 22:39:40 +0530

> Add debugfs entry 'use_backdoor' to enable backdoor access to read sge
> context. By default, we read sge context's via firmware. In case of FW
> issues, one can enable backdoor access via debugfs to dump sge context
> for debugging purpose.
> 
> Signed-off-by: Hariprasad Shenai 
> ---
> V2: Remove unnecessary braces as per comments by Sergei Shtylyov 
> 

Applied.


Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay

2015-07-21 Thread David Miller
From: Dan Murphy 
Date: Tue, 21 Jul 2015 12:06:45 -0500

> Fix warning: logical ‘or’ of collectively exhaustive tests is always true
> 
> Change the internal delay check from an 'or' condition to an 'and'
> condition.
> 
> Reported-by: David Binderman 
> Signed-off-by: Dan Murphy 

Applied, thanks.
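
The class of bug behind this warning can be shown with a generic range check. The driver's actual condition is not quoted in this message, so the bounds and function names below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical range bounds, for illustration only. */
enum { MODE_LOW = 3, MODE_HIGH = 6 };

/* Buggy form: every integer is >= MODE_LOW or <= MODE_HIGH, so this is always true. */
static bool in_range_buggy(int mode)
{
    return mode >= MODE_LOW || mode <= MODE_HIGH;
}

/* Fixed form: true only when mode lies inside [MODE_LOW, MODE_HIGH]. */
static bool in_range_fixed(int mode)
{
    return mode >= MODE_LOW && mode <= MODE_HIGH;
}
```

Because the two tests in the buggy form together cover every possible value, the branch they guard is taken unconditionally; switching to 'and' restores the intended range check.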


Re: [PATCH] cgroup: net_cls: fix false-positive "suspicious RCU usage"

2015-07-21 Thread David Miller
From: Konstantin Khlebnikov 
Date: Tue, 21 Jul 2015 19:46:29 +0300

> @@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct 
> cgroup_subsys_state
>  
>  struct cgroup_cls_state *task_cls_state(struct task_struct *p)
>  {
> - return css_cls_state(task_css(p, net_cls_cgrp_id));
> +   return css_cls_state(task_css_check(p, net_cls_cgrp_id,
> +rcu_read_lock_bh_held()));

You've made a serious mess of the indentation here.

First of all, you've changed the correct plain "TAB" before the 'return' line
into a TAB and two SPACE characters.

Secondly, the second line needs to be precisely indented to the exact column
following the opening parenthesis of the task_css_check() call on the
previous line.


Re: [PATCH net-next] mpls: make RTA_OIF optional

2015-07-21 Thread David Miller
From: Roopa Prabhu 
Date: Tue, 21 Jul 2015 09:16:24 -0700

> From: Roopa Prabhu 
> 
> If user did not specify an oif, try and get it from the via address.
> If failed to get device, return with -ENODEV.
> 
> Signed-off-by: Roopa Prabhu 

Applied, thanks.


Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread David Miller
From: Chris J Arges 
Date: Tue, 21 Jul 2015 12:36:33 -0500

> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
> 
> Use nr_node_ids to allocate a maximal sparse array instead of
> num_possible_nodes().
> 
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
> 
> Signed-off-by: Chris J Arges 

Applied, thanks.
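
The difference between counting possible nodes and sizing by the highest node ID can be sketched in user space. The node set below and the helper names (possible_nodes, node_id_span, alloc_node_stats) are hypothetical; in the kernel the corresponding quantities are num_possible_nodes() and nr_node_ids, as named in the patch description.

```c
#include <stdlib.h>

/* Hypothetical sparse set of possible NUMA node IDs: only nodes 0 and 16 exist. */
static const int possible_nodes[] = { 0, 16 };
#define NUM_POSSIBLE (sizeof(possible_nodes) / sizeof(possible_nodes[0]))

/* Highest possible ID plus one: the role nr_node_ids plays in the patch description. */
static size_t node_id_span(void)
{
    size_t i, span = 0;

    for (i = 0; i < NUM_POSSIBLE; i++)
        if ((size_t)possible_nodes[i] + 1 > span)
            span = (size_t)possible_nodes[i] + 1;
    return span;
}

/* One counter per node ID, so stats[node] is valid for every possible node. */
static unsigned long *alloc_node_stats(void)
{
    return calloc(node_id_span(), sizeof(unsigned long));
}
```

An array sized by the count (2 entries here) would be overrun by stats[16]; sizing by the ID span (17 entries) wastes some slots but keeps direct indexing by node ID safe.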


Re: [PATCH net] netlink: don't hold mutex in rcu callback when releasing mmapd ring

2015-07-21 Thread David Miller
From: Florian Westphal 
Date: Tue, 21 Jul 2015 16:33:50 +0200

> Kirill A. Shutemov says:
> 
> This simple test-case triggers a few locking asserts in the kernel:
 ...
> Cong Wang says:
> 
> We can't hold mutex lock in a rcu callback, [..]
> 
> Thomas Graf says:
> 
> The socket should be dead at this point. It might be simpler to
> add a netlink_release_ring() function which doesn't require
> locking at all.
> 
> Reported-by: "Kirill A. Shutemov" 
> Diagnosed-by: Cong Wang 
> Suggested-by: Thomas Graf 
> Signed-off-by: Florian Westphal 

Applied, thanks everyone.


Re: [PATCH net-next 0/9] sfc: support for cascaded multicast filtering

2015-07-21 Thread David Miller
From: Edward Cree 
Date: Tue, 21 Jul 2015 15:07:44 +0100

> Recent versions of firmware for SFC9100 adapters add support for filter
>  chaining, in which packets matching multiple filters are delivered to all
>  filters' recipients, rather than only the highest match-priority filter as was
>  previously the case.
> This patch series enables this feature and redesigns the filter handling code
>  to make use of it; in particular, subscribing to a multicast address on one
>  function no longer prevents traffic to that address reaching another function
>  which is in promiscuous or allmulti mode.
> If the firmware does not support filter chaining, the driver will fall back to
>  the old behaviour.

Series applied, thanks.


Re: [PATCH V2 net 0/3] BPF JIT fixes for ARM

2015-07-21 Thread David Miller
From: Nicolas Schichan 
Date: Tue, 21 Jul 2015 14:14:11 +0200

> These patches are fixing bugs in the ARM JIT and should probably find
> their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
> are now passing OK (was 54 out of 60 before).

Series applied, thanks.


Re: [PATCH net] tcp: suppress a division by zero warning

2015-07-21 Thread David Miller
From: Eric Dumazet 
Date: Wed, 22 Jul 2015 07:02:00 +0200

> From: Eric Dumazet 
> 
> Andrew Morton reported following warning on one ARM build
> with gcc-4.4 :
> 
> net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
> net/ipv4/inet_hashtables.c:617: warning: division by zero
> 
> Even when guarded by a test on sizeof(spinlock_t), the compiler does not
> like the current construct on a !CONFIG_SMP build.
> 
> Remove the warning by using a temporary variable.
> 
> Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
> Reported-by: Andrew Morton 
> Signed-off-by: Eric Dumazet 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net] tcp: suppress a division by zero warning

2015-07-21 Thread Eric Dumazet
From: Eric Dumazet 

Andrew Morton reported following warning on one ARM build
with gcc-4.4 :

net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero

Even when guarded by a test on sizeof(spinlock_t), the compiler does not
like the current construct on a !CONFIG_SMP build.

Remove the warning by using a temporary variable.

Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: Andrew Morton 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/inet_hashtables.c |   11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 5f9b063bbe8a..0cb9165421d4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -624,22 +624,21 @@ EXPORT_SYMBOL_GPL(inet_hashinfo_init);
 
 int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 {
+   unsigned int locksz = sizeof(spinlock_t);
unsigned int i, nblocks = 1;
 
-   if (sizeof(spinlock_t) != 0) {
+   if (locksz != 0) {
/* allocate 2 cache lines or at least one spinlock per cpu */
-   nblocks = max_t(unsigned int,
-   2 * L1_CACHE_BYTES / sizeof(spinlock_t),
-   1);
+   nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U);
nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
 
/* no more locks than number of hash buckets */
nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-   hashinfo->ehash_locks = kmalloc_array(nblocks, 
sizeof(spinlock_t),
+   hashinfo->ehash_locks = kmalloc_array(nblocks, locksz,
  GFP_KERNEL | 
__GFP_NOWARN);
if (!hashinfo->ehash_locks)
-   hashinfo->ehash_locks = vmalloc(nblocks * 
sizeof(spinlock_t));
+   hashinfo->ehash_locks = vmalloc(nblocks * locksz);
 
if (!hashinfo->ehash_locks)
return -ENOMEM;
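
The arithmetic in this patch can be tried out in user space with the sketch below. CACHE_LINE_BYTES (assumed to be 64) stands in for L1_CACHE_BYTES, and locks_per_cpu() is a hypothetical helper covering only the per-CPU part of the computation; rounding to a power of two and the hash-bucket cap are left out.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64u   /* assumed stand-in for L1_CACHE_BYTES */

/*
 * Hypothetical helper: number of locks per CPU for a lock of lock_size
 * bytes. A size of 0 models a build where spinlocks compile away.
 */
static unsigned int locks_per_cpu(size_t lock_size)
{
    unsigned int locksz = (unsigned int)lock_size;  /* temporary, as in the patch */
    unsigned int n = 1;

    if (locksz != 0) {
        n = 2u * CACHE_LINE_BYTES / locksz;   /* two cache lines' worth of locks */
        if (n < 1u)
            n = 1u;                           /* but at least one lock */
    }
    return n;
}
```

For a 4-byte lock this yields 32 locks per CPU; for a zero-sized lock the division is skipped and the result stays at 1. Whether a particular compiler warns about the original sizeof-based form depends on its version and options.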




[PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().

2015-07-21 Thread Rami Rosen
This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range()
method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has
no effect and is unneeded, as the vinfo.flags value is overridden by the
immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END
assignment.

Signed-off-by: Rami Rosen 
---
 net/bridge/br_netlink.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 364bdc9..793d247 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -164,8 +164,6 @@ static int br_fill_ifvlaninfo_range(struct sk_buff *skb, 
u16 vid_start,
sizeof(vinfo), &vinfo))
goto nla_put_failure;
 
-   vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN;
-
vinfo.vid = vid_end;
vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END;
if (nla_put(skb, IFLA_BRIDGE_VLAN_INFO,
-- 
1.9.3



Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Eric Dumazet
On Tue, 2015-07-21 at 19:03 -0700, Cong Wang wrote:
> On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet  wrote:
> > On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
> >
> >> > -   kfree_skb(skb);
> >> > +   INIT_LIST_HEAD(&q->new_flows);
> >> > +   INIT_LIST_HEAD(&q->old_flows);
> >> > +   for (i = 0; i < q->flows_cnt; i++) {
> >> > +   struct fq_codel_flow *flow = q->flows + i;
> >> > +
> >> > +   while (flow->head)
> >> > +   kfree_skb(dequeue_head(flow));
> >> > +
> >> > +   INIT_LIST_HEAD(&flow->flowchain);
> >>
> >>
> >> You probably need to call codel_vars_init(&flow->cvars) as well.
> >
> > It is not necessary : flow->cvars only matter in the event of a dequeue,
> > but whole qdisc is dismantled and no packet will be dequeued.
> >
> 
> But it will affect the next dequeue _after_ reset? which is not supposed
> to happen as we expect a fresh start after reset?

Hmm... I thought reset() was only done at queue dismantle, so no new
packet should be added later, and since no packet should be left after
reset, no dequeue should happen.

For completeness, we still can add the codel_vars_init(), no problem.

Thanks.





[PATCH nf] netfilter: Support expectations in different zones

2015-07-21 Thread Joe Stringer
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.

Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")

Signed-off-by: Joe Stringer 
---
 net/netfilter/nf_conntrack_expect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_expect.c 
b/net/netfilter/nf_conntrack_expect.c
index 7a17070..b45a422 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -219,7 +219,8 @@ static inline int expect_clash(const struct 
nf_conntrack_expect *a,
a->mask.src.u3.all[count] & b->mask.src.u3.all[count];
}
 
-   return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask);
+   return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask) &&
+  nf_ct_zone(a->master) == nf_ct_zone(b->master);
 }
 
 static inline int expect_matches(const struct nf_conntrack_expect *a,
-- 
2.1.4
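
The effect of adding the zone comparison can be illustrated with a heavily simplified user-space model. The struct below (one 32-bit tuple, one mask, one zone ID) is hypothetical and much smaller than the real nf_conntrack_expect; only the shape of the check follows the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified expectation. */
struct expect {
    uint32_t tuple;
    uint32_t mask;
    uint16_t zone;
};

static bool expect_clash(const struct expect *a, const struct expect *b)
{
    uint32_t intersect = a->mask & b->mask;   /* compare only bits both sides care about */

    return ((a->tuple ^ b->tuple) & intersect) == 0 &&
           a->zone == b->zone;                /* same tuple in another zone is no clash */
}
```

Two expectations with identical tuples but zones 1 and 2 no longer clash in this model, while the same tuple in the same zone still does.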



[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control

2015-07-21 Thread Lawrence Brakmo
This is a request for comments.

TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of
NV was presented at 2010's LPC (slides). It is a delay-based congestion
avoidance algorithm for the data center. This version has been tested
within a 10G rack where the HW RTTs are 20-50us.

A description of TCP-NV, including implementation and experimental
results, can be found at:
http://www.brakmo.org/networking/tcp-nv/TCPNV.html

The current version includes many module parameters to support
experimentation with the parameters.

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h  |   1 +
 net/ipv4/Kconfig   |  16 ++
 net/ipv4/Makefile  |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_input.c   |   2 +
 net/ipv4/tcp_nv.c  | 479 +
 6 files changed, 508 insertions(+)
 create mode 100644 net/ipv4/tcp_nv.c

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2e62efe..c0690ae 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
 extern int sysctl_tcp_min_tso_segs;
 extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_nv_enable;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..c37b374 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
window. TCP Vegas should provide less packet loss, but it is
not as aggressive as TCP Reno.
 
+config TCP_CONG_NV
+   tristate "TCP NV"
+   default m
+   ---help---
+   TCP NV is a follow up to TCP Vegas. It has been modified to deal with
+   10G networks, measurement noise introduced by LRO, GRO and interrupt
+   coalescence. In addition, it will decrease its cwnd multiplicatively
+   instead of linearly.
+
+   Note that in general congestion avoidance (cwnd decreased when # packets
+   queued grows) cannot coexist with congestion control (cwnd decreased only
+   when there is packet loss) due to fairness issues. One scenario when they
+   can coexist safely is when the CA flows have RTTs << CC flows RTTs.
+
+   For further details see http://www.brakmo.org/networking/tcp-nv/
+
 config TCP_CONG_SCALABLE
tristate "Scalable TCP"
default n
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..06f335f 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
 obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 433231c..31846d5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -730,6 +730,15 @@ static struct ctl_table ipv4_table[] = {
.proc_handler   = proc_dointvec_ms_jiffies,
},
{
+   .procname   = "tcp_nv_enable",
+   .data   = &sysctl_tcp_nv_enable,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &zero,
+   .extra2 = &one,
+   },  
+   {
.procname   = "icmp_msgs_per_sec",
.data   = &sysctl_icmp_msgs_per_sec,
.maxlen = sizeof(int),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index aca4ae5..87560d9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_early_retrans __read_mostly = 3;
 int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_nv_enable __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_nv_enable);
 
 #define FLAG_DATA        0x01 /* Incoming frame contained data.       */
 #define FLAG_WIN_UPDATE  0x02 /* Incoming ACK was a window update.    */
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
new file mode 100644
index 000..af451b6
--- /dev/null
+++ b/net/ipv4/tcp_nv.c
@@ -0,0 +1,479 @@
+/*
+ * TCP NV: TCP with Congestion Avoidance
+ *
+ * TCP-NV is a successor of TCP-Vegas that has been developed to
+ * deal with the issues that occur in modern networks. 
+ * Like TCP-Vegas, TCP-NV supports true congestion avoidance,
+ * the ability to detect congestion before packet losses occur.
+ * When congestion (queue buildup) starts to occur, TCP-NV
+ * predicts what the cwnd size should be for the current
+ * throughput and it re

[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()

2015-07-21 Thread Lawrence Brakmo
Replace 2 arguments (cnt and rtt) in the congestion control modules'
pkts_acked() function with a struct. This will allow adding more
information without having to modify existing congestion control
modules (tcp_nv in particular needs bytes in flight when packet
was sent).

This was proposed by Neal Cardwell in his comments to the tcp_nv patch.
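
To illustrate the idea outside the kernel, here is a minimal user-space sketch of a callback that takes a sample struct instead of separate arguments. The names sock_stub, demo_acked and deliver_ack are hypothetical stand-ins and not part of the patch; only the ack_sample layout mirrors the one proposed here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct sock. */
struct sock_stub {
	uint32_t acked_total;
	int32_t last_rtt_us;
};

/* Per-ACK sample. Fields appended later do not change the
 * signature of callbacks that were written against it.
 */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t rtt_us;
};

/* Callback type, analogous to the pkts_acked() hook. */
typedef void (*pkts_acked_fn)(struct sock_stub *sk, struct ack_sample sample);

/* A module that only reads the fields it cares about. */
void demo_acked(struct sock_stub *sk, struct ack_sample sample)
{
	sk->acked_total += sample.pkts_acked;
	if (sample.rtt_us > 0)
		sk->last_rtt_us = sample.rtt_us;
}

/* Caller side: build the sample and invoke the hook if one is set. */
void deliver_ack(struct sock_stub *sk, pkts_acked_fn hook,
		 uint32_t pkts_acked, int32_t rtt_us)
{
	if (hook) {
		struct ack_sample sample = { pkts_acked, rtt_us };

		hook(sk, sample);
	}
}
```

If a third field is later added to the struct, demo_acked() compiles unchanged; only the caller that fills the sample needs to be touched.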

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h   |  7 ++-
 net/ipv4/tcp_bic.c  |  6 +++---
 net/ipv4/tcp_cdg.c  | 14 +++---
 net/ipv4/tcp_cubic.c|  6 +++---
 net/ipv4/tcp_htcp.c | 10 +-
 net/ipv4/tcp_illinois.c | 20 ++--
 net/ipv4/tcp_input.c|  7 +--
 net/ipv4/tcp_lp.c   |  6 +++---
 net/ipv4/tcp_vegas.c|  6 +++---
 net/ipv4/tcp_vegas.h|  2 +-
 net/ipv4/tcp_veno.c |  6 +++---
 net/ipv4/tcp_westwood.c |  6 +++---
 net/ipv4/tcp_yeah.c |  6 +++---
 13 files changed, 55 insertions(+), 47 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..26e7651 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -834,6 +834,11 @@ enum tcp_ca_ack_event_flags {
 
 union tcp_cc_info;
 
+struct ack_sample {
+   u32 pkts_acked;
+   s32 rtt_us;
+};
+
 struct tcp_congestion_ops {
struct list_head list;
u32 key;
@@ -857,7 +862,7 @@ struct tcp_congestion_ops {
/* new value of cwnd after loss (optional) */
u32  (*undo_cwnd)(struct sock *sk);
/* hook for packet ack accounting (optional) */
-   void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
+   void (*pkts_acked)(struct sock *sk, struct ack_sample);
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index fd1405d..6a873f7 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -197,15 +197,15 @@ static void bictcp_state(struct sock *sk, u8 new_state)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt)
+static void bictcp_acked(struct sock *sk, struct ack_sample sample)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
 
if (icsk->icsk_ca_state == TCP_CA_Open) {
struct bictcp *ca = inet_csk_ca(sk);
 
-   cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
-   ca->delayed_ack += cnt;
+   ca->delayed_ack += sample.pkts_acked - 
+   (ca->delayed_ack >> ACK_RATIO_SHIFT);
}
 }
 
diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index 167b6a3..ef64106 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -294,12 +294,12 @@ static void tcp_cdg_cong_avoid(struct sock *sk, u32 ack, u32 acked)
ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr);
 }
 
-static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
+static void tcp_cdg_acked(struct sock *sk, struct ack_sample sample)
 {
struct cdg *ca = inet_csk_ca(sk);
struct tcp_sock *tp = tcp_sk(sk);
 
-   if (rtt_us <= 0)
+   if (sample.rtt_us <= 0)
return;
 
/* A heuristic for filtering delayed ACKs, adapted from:
@@ -307,20 +307,20 @@ static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
 * delay and rate based TCP mechanisms." TR 100219A. CAIA, 2010.
 */
if (tp->sacked_out == 0) {
-   if (num_acked == 1 && ca->delack) {
+   if (sample.pkts_acked == 1 && ca->delack) {
/* A delayed ACK is only used for the minimum if it is
 * provenly lower than an existing non-zero minimum.
 */
-   ca->rtt.min = min(ca->rtt.min, rtt_us);
+   ca->rtt.min = min(ca->rtt.min, sample.rtt_us);
ca->delack--;
return;
-   } else if (num_acked > 1 && ca->delack < 5) {
+   } else if (sample.pkts_acked > 1 && ca->delack < 5) {
ca->delack++;
}
}
 
-   ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us);
-   ca->rtt.max = max(ca->rtt.max, rtt_us);
+   ca->rtt.min = min_not_zero(ca->rtt.min, sample.rtt_us);
+   ca->rtt.max = max(ca->rtt.max, sample.rtt_us);
 }
 
 static u32 tcp_cdg_ssthresh(struct sock *sk)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 28011fb..070d629 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -416,21 +416,21 @@ static void hystart_update(struct sock *sk, u32 delay)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us)
+static void bictcp_acked(struct sock *sk, struct ack_sample sample)
 {
const struct tcp

[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb

2015-07-21 Thread Lawrence Brakmo
Based on comments by Neal Cardwell to tcp_nv patch:

  AFAICT this patch would not require an increase in the size of sk_buff
  cb[] if it were to take advantage of the fact that the tcp_skb_cb
  header.h4 and header.h6 fields are only used in the packet reception
  code path, and this in_flight field is only used on the transmit
  side. So the in_flight field could be placed in a struct that is
  itself placed in a union with the "header" union.

  That way the sender code can remember the in_flight value
  without requiring any extra space. And in the future other
  sender-side info could be stored in the "tx" struct, if needed.
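
The space argument above can be checked with a small user-space sketch. The struct names and field sizes below (rx_parm4, rx_parm6, cb_before, cb_after) are assumptions chosen for illustration and do not match the real tcp_skb_cb layout; the point is only that a transmit-side struct overlaid on the receive-side union does not grow the control block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the receive-side parameter blocks. */
struct rx_parm4 { uint8_t opt[16]; };
struct rx_parm6 { uint8_t opt[24]; };

/* Layout before: only the receive-side "header" union. */
struct cb_before {
	uint32_t seq;
	union {
		struct rx_parm4 h4;
		struct rx_parm6 h6;
	} header;
};

/* Layout after: a transmit-side struct shares storage with "header". */
struct cb_after {
	uint32_t seq;
	union {
		struct {
			uint32_t in_flight; /* bytes in flight at send time */
		} tx;                       /* outgoing packets only */
		union {
			struct rx_parm4 h4;
			struct rx_parm6 h6;
		} header;                   /* incoming packets only */
	};
};

/* The overlay is free as long as tx fits inside header. */
_Static_assert(sizeof(struct cb_after) == sizeof(struct cb_before),
	       "tx overlay must not grow the control block");
_Static_assert(offsetof(struct cb_after, tx) ==
	       offsetof(struct cb_after, header),
	       "tx and header must share the same storage");

/* Sender side: remember how much data was outstanding at send time. */
uint32_t record_in_flight(struct cb_after *cb, uint32_t end_seq,
			  uint32_t snd_una)
{
	cb->tx.in_flight = end_seq - snd_una;
	return cb->tx.in_flight;
}
```

The overlay is only safe because a given buffer is either on the transmit path or on the receive path, never both at once.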

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h | 13 ++---
 net/ipv4/tcp_input.c  |  5 -
 net/ipv4/tcp_output.c |  4 +++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 26e7651..2e62efe 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -755,11 +755,17 @@ struct tcp_skb_cb {
/* 1 byte hole */
__u32   ack_seq;        /* Sequence number ACK'd */
union {
-   struct inet_skb_parm    h4;
+   struct {
+   /* bytes in flight when this packet was sent */
+   __u32 in_flight;
+   } tx;   /* only used for outgoing skbs */
+   union {
+   struct inet_skb_parm    h4;
 #if IS_ENABLED(CONFIG_IPV6)
-   struct inet6_skb_parm   h6;
+   struct inet6_skb_parm   h6;
 #endif
-   } header;   /* For incoming frames  */
+   } header;   /* For incoming skbs */
+   };
 };
 
 #define TCP_SKB_CB(__skb)  ((struct tcp_skb_cb *)&((__skb)->cb[0]))
@@ -837,6 +843,7 @@ union tcp_cc_info;
 struct ack_sample {
u32 pkts_acked;
s32 rtt_us;
+   u32 in_flight;
 };
 
 struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4f641f6..aca4ae5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3068,6 +3068,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
long ca_rtt_us = -1L;
struct sk_buff *skb;
u32 pkts_acked = 0;
+   u32 last_in_flight = 0;
bool rtt_update;
int flag = 0;
 
@@ -3107,6 +3108,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
if (!first_ackt.v64)
first_ackt = last_ackt;
 
+   last_in_flight = TCP_SKB_CB(skb)->tx.in_flight;
reord = min(pkts_acked, reord);
if (!after(scb->end_seq, tp->high_seq))
flag |= FLAG_ORIG_SACK_ACKED;
@@ -3196,7 +3198,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
}
 
if (icsk->icsk_ca_ops->pkts_acked) {
-   struct ack_sample sample = {pkts_acked, ca_rtt_us};
+   struct ack_sample sample = {pkts_acked, ca_rtt_us,
+   last_in_flight};
 
icsk->icsk_ca_ops->pkts_acked(sk, sample);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7105784..e9deab5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
int err;
 
BUG_ON(!skb || !tcp_skb_pcount(skb));
+   tp = tcp_sk(sk);
 
if (clone_it) {
skb_mstamp_get(&skb->skb_mstamp);
+   TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
+   - tp->snd_una;
 
if (unlikely(skb_cloned(skb)))
skb = pskb_copy(skb, gfp_mask);
@@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
}
 
inet = inet_sk(sk);
-   tp = tcp_sk(sk);
tcb = TCP_SKB_CB(skb);
memset(&opts, 0, sizeof(opts));
 
-- 
1.8.1



[RFC PATCH v2 net-next 0/3] tcp: add NV congestion control

2015-07-21 Thread Lawrence Brakmo
This patchset adds support for NV congestion control.

The first patch replaces two arguments in the pkts_acked() function
of the congestion control modules with a struct, making it easier to
add more parameters later without modifying the existing congestion
control modules.

The second patch adds the number of bytes in_flight when a packet is sent
to the tcp_skb_cb without increasing its size.

The third patch adds NV congestion control support.

[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb
[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control

Signed-off-by: Lawrence Brakmo 

 include/net/tcp.h  |  21 ++-
 net/ipv4/Kconfig   |  16 ++
 net/ipv4/Makefile  |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_bic.c |   6 +-
 net/ipv4/tcp_cdg.c |  14 +-
 net/ipv4/tcp_cubic.c   |   6 +-
 net/ipv4/tcp_htcp.c|  10 +-
 net/ipv4/tcp_illinois.c|  20 +-
 net/ipv4/tcp_input.c   |  12 +-
 net/ipv4/tcp_lp.c  |   6 +-
 net/ipv4/tcp_nv.c  | 479
 net/ipv4/tcp_output.c  |   4 +-
 net/ipv4/tcp_vegas.c   |   6 +-
 net/ipv4/tcp_vegas.h   |   2 +-
 net/ipv4/tcp_veno.c|   6 +-
 net/ipv4/tcp_westwood.c|   6 +-
 net/ipv4/tcp_yeah.c|   6 +-
 18 files changed, 579 insertions(+), 51 deletions(-)



Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread Julian Anastasov

Hello,

On Tue, 21 Jul 2015, Martin KaFai Lau wrote:

> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

Locking usage is absolutely correct.

> + if (!(neigh->nud_state & NUD_VALID) &&
> + time_after(jiffies, neigh->updated + 
> rt->rt6i_idev->cnf.rtr_probe_interval)) {

but this line is too long...

> + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + if (work) {
> + __neigh_set_probe_once(neigh);
> + }

scripts/checkpatch.pl --strict /tmp/file.patch

Regards

--
Julian Anastasov 


[PATCH] mac80211_hwsim: unregister genetlink family properly

2015-07-21 Thread Su Kang Yin
In hwsim_init_netlink(), we should call genl_unregister_family()
if netlink_register_notifier() fails, since the genetlink family is
already registered at that point.
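
The unwinding rule described above can be sketched in plain C. Everything in this block (demo_state and the demo_* functions) is hypothetical and only models two registration steps; it is not the genetlink or netlink notifier API.

```c
#include <stdbool.h>

/* Hypothetical state standing in for the two kernel registries. */
struct demo_state {
	bool family_registered;
	bool notifier_registered;
	bool notifier_should_fail; /* test knob, not part of any real API */
};

int demo_register_family(struct demo_state *s)
{
	s->family_registered = true;
	return 0;
}

void demo_unregister_family(struct demo_state *s)
{
	s->family_registered = false;
}

int demo_register_notifier(struct demo_state *s)
{
	if (s->notifier_should_fail)
		return -1;
	s->notifier_registered = true;
	return 0;
}

/* Two-step init: a failure in step two must undo step one. */
int demo_init(struct demo_state *s)
{
	int rc;

	rc = demo_register_family(s);
	if (rc)
		return rc;

	rc = demo_register_notifier(s);
	if (rc) {
		/* Undo the earlier step so no registration leaks. */
		demo_unregister_family(s);
		return rc;
	}

	return 0;
}
```

Without the unregister call in the failure branch, a failed init would leave the family registered while reporting an error to the caller.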

Signed-off-by: Su Kang Yin 
---
 drivers/net/wireless/mac80211_hwsim.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 99e873d..16d953e 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -3120,8 +3120,10 @@ static int hwsim_init_netlink(void)
goto failure;
 
rc = netlink_register_notifier(&hwsim_netlink_notifier);
-   if (rc)
+   if (rc) {
+   genl_unregister_family(&hwsim_genl_family);
goto failure;
+   }
 
return 0;
 
-- 
1.9.1



Re: [RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers

2015-07-21 Thread Alexei Starovoitov
On Tue, Jul 21, 2015 at 07:00:40PM -0700, Alex Gartrell wrote:
> mov %rsp, %r1           ; r1 = rsp
> add $-8, %r1            ; r1 = rsp - 8
> store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  <- valid
> store_q $123, (%r1)     ; *(u64*)r1 = 123  <- previously invalid
> mov $0, %r0
> exit                    ; Always need to exit

Is this your new eBPF assembler syntax? :)
imo gnu style looks ugly... ;)

It's great to see such an in-depth understanding of the verifier!

> And we'd get the following error:
> 
>   0: (bf) r1 = r10
>   1: (07) r1 += -8
>   2: (7a) *(u64 *)(r10 -8) = 999
>   3: (7a) *(u64 *)(r1 +0) = 999
>   R1 invalid mem access 'fp'
> 
>   Unable to load program
> 
> We already know that a register is a stack address and the appropriate
> offset, so we should be able to validate those references as well.

Yes, we can teach the verifier to do that, though llvm doesn't generate
such code. It's a small enough change.

> Signed-off-by: Alex Gartrell 
> ---
>  kernel/bpf/verifier.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 039d866..5dfbece 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
>   err = check_stack_write(state, off, size, value_regno);
>   else
>   err = check_stack_read(state, off, size, value_regno);
> + } else if (state->regs[regno].type == PTR_TO_STACK) {
> + int real_off = state->regs[regno].imm + off;

real_off is missing alignment and bounds checks.
something like:
if (state->regs[regno].type == PTR_TO_STACK)
off += state->regs[regno].imm;
if (off % size != 0)
...
else if (state->regs[regno].type == FRAME_PTR || == PTR_TO_STACK)
.. as-is here ...

would fix it.

Please add a few accept and reject tests for this to test_verifier.c as well.
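
As a rough user-space illustration of the alignment and bounds checks suggested above: the types, the 512-byte stack size and the function name below are assumptions made for this sketch, and the code is not the verifier's implementation.

```c
#include <stdbool.h>

#define DEMO_MAX_STACK 512 /* assumed stack size for this sketch */

enum demo_reg_type { DEMO_FRAME_PTR, DEMO_PTR_TO_STACK, DEMO_OTHER };

struct demo_reg {
	enum demo_reg_type type;
	int imm; /* constant offset from the frame pointer (PTR_TO_STACK) */
};

/* Return true if an access of `size` bytes at reg + off is naturally
 * aligned and stays inside the stack, which grows down from offset 0.
 */
bool demo_stack_access_ok(const struct demo_reg *reg, int off, int size)
{
	if (reg->type != DEMO_FRAME_PTR && reg->type != DEMO_PTR_TO_STACK)
		return false;
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;

	/* Reject wild offsets first so the addition below cannot overflow. */
	if (off < -DEMO_MAX_STACK || off > DEMO_MAX_STACK)
		return false;

	if (reg->type == DEMO_PTR_TO_STACK) {
		if (reg->imm < -DEMO_MAX_STACK || reg->imm > 0)
			return false;
		/* Fold the register's known offset into the access offset. */
		off += reg->imm;
	}

	if (off % size != 0)
		return false;
	if (off >= 0 || off < -DEMO_MAX_STACK)
		return false;

	return true;
}
```

With these rules, a write through r1 = fp - 8 at offset 0 is accepted exactly like a write at fp - 8, while the same register with offset 4 and size 8 is rejected as misaligned. The accept and reject cases in the test list could follow the same pattern.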



Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread YOSHIFUJI Hideaki
Hi,

Martin KaFai Lau wrote:
> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

Theoretically, you have to take "some" lock when accessing
neigh->nud_state.
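
For readers following the discussion, the pattern in question (an unlocked pre-check followed by a re-check under the lock) can be sketched in user space with C11 atomics. The demo_neigh type, the spinlock and demo_should_probe() are hypothetical; the atomic load stands in for whatever access rule the kernel code ends up using for nud_state.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_VALID 0x1u

struct demo_neigh {
	atomic_flag lock;      /* minimal spinlock standing in for the writer lock */
	atomic_uint state;     /* may be read without the lock on the fast path */
	unsigned long updated; /* protected by lock */
};

void demo_lock(struct demo_neigh *n)
{
	while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

void demo_unlock(struct demo_neigh *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/* Return true if a probe should be scheduled for this entry. */
bool demo_should_probe(struct demo_neigh *n, unsigned long now,
		       unsigned long interval)
{
	bool probe = false;

	/* Fast path: a valid entry needs no probe and no lock. */
	if (atomic_load_explicit(&n->state, memory_order_relaxed) & DEMO_VALID)
		return false;

	demo_lock(n);
	/* Re-check under the lock: the state may have changed meanwhile. */
	if (!(atomic_load_explicit(&n->state, memory_order_relaxed) & DEMO_VALID) &&
	    now - n->updated > interval) {
		n->updated = now; /* stands in for marking the probe as pending */
		probe = true;
	}
	demo_unlock(n);

	return probe;
}
```

The unlocked read is only a hint. The decision that modifies the entry is still made under the lock, which is why the condition is tested twice.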

> 
> I also take this chance to re-arrange the code.

No, please do not mix multiple changes.

> 
> 40 udpflood processes and a /64 gateway route are used.
> The gateway has NUD_PERMANENT.  Each of them is run for 30s.
> At the end, the total number of finished sendto():
> 
> Before    After
> 55M       95M
> 
> Signed-off-by: Martin KaFai Lau 
> Cc: Hannes Frederic Sowa 
> ---
>  net/ipv6/route.c | 41 -
>  1 file changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 6090969..a6c6b5a 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w)
>  
>  static void rt6_probe(struct rt6_info *rt)
>  {
> + struct __rt6_probe_work *work;
>   struct neighbour *neigh;
>   /*
>* Okay, this does not seem to be appropriate
> @@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt)
>   rcu_read_lock_bh();
>   neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
>   if (neigh) {
> - write_lock(&neigh->lock);
>   if (neigh->nud_state & NUD_VALID)
>   goto out;
> - }
> -
> - if (!neigh ||
> - time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
> - struct __rt6_probe_work *work;
>  
> + work = NULL;
> + write_lock(&neigh->lock);
> + if (!(neigh->nud_state & NUD_VALID) &&
> + time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
> + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + if (work) {
> + __neigh_set_probe_once(neigh);
> + }
> + }
> + write_unlock(&neigh->lock);
> + } else {
>   work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + }
>  
> - if (neigh && work)
> - __neigh_set_probe_once(neigh);
> -
> - if (neigh)
> - write_unlock(&neigh->lock);
> + if (work) {
> + INIT_WORK(&work->work, rt6_probe_deferred);
> + work->target = rt->rt6i_gateway;
> + dev_hold(rt->dst.dev);
> + work->dev = rt->dst.dev;
> + schedule_work(&work->work);
> + }
>  
> - if (work) {
> - INIT_WORK(&work->work, rt6_probe_deferred);
> - work->target = rt->rt6i_gateway;
> - dev_hold(rt->dst.dev);
> - work->dev = rt->dst.dev;
> - schedule_work(&work->work);
> - }
> - } else {
>  out:
> - write_unlock(&neigh->lock);
> - }
>   rcu_read_unlock_bh();
>  }
>  #else
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Cong Wang
On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet  wrote:
> On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
>
>> > -   kfree_skb(skb);
>> > +   INIT_LIST_HEAD(&q->new_flows);
>> > +   INIT_LIST_HEAD(&q->old_flows);
>> > +   for (i = 0; i < q->flows_cnt; i++) {
>> > +   struct fq_codel_flow *flow = q->flows + i;
>> > +
>> > +   while (flow->head)
>> > +   kfree_skb(dequeue_head(flow));
>> > +
>> > +   INIT_LIST_HEAD(&flow->flowchain);
>>
>>
>> You probably need to call codel_vars_init(&flow->cvars) as well.
>
> It is not necessary : flow->cvars only matter in the event of a dequeue,
> but whole qdisc is dismantled and no packet will be dequeued.
>

But won't it affect the next dequeue _after_ reset? That is not supposed
to happen, since we expect a fresh start after reset.


[RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers

2015-07-21 Thread Alex Gartrell
mov %rsp, %r1           ; r1 = rsp
add $-8, %r1            ; r1 = rsp - 8
store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  <- valid
store_q $123, (%r1)     ; *(u64*)r1 = 123  <- previously invalid
mov $0, %r0
exit                    ; Always need to exit

And we'd get the following error:

0: (bf) r1 = r10
1: (07) r1 += -8
2: (7a) *(u64 *)(r10 -8) = 999
3: (7a) *(u64 *)(r1 +0) = 999
R1 invalid mem access 'fp'

Unable to load program

We already know that a register is a stack address and the appropriate
offset, so we should be able to validate those references as well.

Signed-off-by: Alex Gartrell 
---
 kernel/bpf/verifier.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 039d866..5dfbece 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
err = check_stack_write(state, off, size, value_regno);
else
err = check_stack_read(state, off, size, value_regno);
+   } else if (state->regs[regno].type == PTR_TO_STACK) {
+   int real_off = state->regs[regno].imm + off;
+
+   if (t == BPF_WRITE)
+   err = check_stack_write(
+   state, real_off, size, value_regno);
+   else
+   err = check_stack_read(
+   state, real_off, size, value_regno);
} else {
verbose("R%d invalid mem access '%s'\n",
regno, reg_type_str[state->regs[regno].type]);
-- 
Alex Gartrell 



RE: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path

2015-07-21 Thread Duan Andy
From: Lucas Stach  Sent: Tuesday, July 21, 2015 11:11 PM
> To: David S. Miller
> Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org;
> ker...@pengutronix.de; patchwork-...@pengutronix.de
> Subject: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe
> fail path
> 
> This function frees resources and cancels delayed work item that have
> been initialized in fec_ptp_init().
> 
> Use this to do proper error handling if something goes wrong in probe
> function after fec_ptp_init has been called.
> 
> Signed-off-by: Lucas Stach 
> ---
>  drivers/net/ethernet/freescale/fec.h  |  1 +
>  drivers/net/ethernet/freescale/fec_main.c |  5 ++---
>  drivers/net/ethernet/freescale/fec_ptp.c  | 10 ++
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec.h
> b/drivers/net/ethernet/freescale/fec.h
> index 1eee73cccdf5..99d33e2d35e6 100644
> --- a/drivers/net/ethernet/freescale/fec.h
> +++ b/drivers/net/ethernet/freescale/fec.h
> @@ -562,6 +562,7 @@ struct fec_enet_private {
>  };
> 
>  void fec_ptp_init(struct platform_device *pdev);
> +void fec_ptp_stop(struct platform_device *pdev);
>  void fec_ptp_start_cyclecounter(struct net_device *ndev);
>  int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
>  int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index a7f1bdf718f8..32e3807c650e 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3494,6 +3494,7 @@ failed_register:
>  failed_mii_init:
>  failed_irq:
>  failed_init:
> + fec_ptp_stop(pdev);
>   if (fep->reg_phy)
>   regulator_disable(fep->reg_phy);
>  failed_regulator:
> @@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev)
>   struct net_device *ndev = platform_get_drvdata(pdev);
>   struct fec_enet_private *fep = netdev_priv(ndev);
> 
> - cancel_delayed_work_sync(&fep->time_keep);
>   cancel_work_sync(&fep->tx_timeout_work);
> + fec_ptp_stop(pdev);
>   unregister_netdev(ndev);
>   fec_enet_mii_remove(fep);
>   if (fep->reg_phy)
>   regulator_disable(fep->reg_phy);
> - if (fep->ptp_clock)
> - ptp_clock_unregister(fep->ptp_clock);
>   of_node_put(fep->phy_node);
>   free_netdev(ndev);
> 
> diff --git a/drivers/net/ethernet/freescale/fec_ptp.c
> b/drivers/net/ethernet/freescale/fec_ptp.c
> index a15663ad7f5e..f457a23d0bfb 100644
> --- a/drivers/net/ethernet/freescale/fec_ptp.c
> +++ b/drivers/net/ethernet/freescale/fec_ptp.c
> @@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev)
>   schedule_delayed_work(&fep->time_keep, HZ);
>  }
> 
> +void fec_ptp_stop(struct platform_device *pdev)
> +{
> + struct net_device *ndev = platform_get_drvdata(pdev);
> + struct fec_enet_private *fep = netdev_priv(ndev);
> +
> + cancel_delayed_work_sync(&fep->time_keep);
> + if (fep->ptp_clock)
> + ptp_clock_unregister(fep->ptp_clock);
> +}
> +
>  /**
>   * fec_ptp_check_pps_event
>   * @fep: the fec_enet_private structure handle
> --
> 2.1.4

Acked-by: Fugang Duan 


RE: [PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring

2015-07-21 Thread Duan Andy
From: Lucas Stach  Sent: Tuesday, July 21, 2015 11:11 PM
> To: David S. Miller
> Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org;
> ker...@pengutronix.de; patchwork-...@pengutronix.de
> Subject: [PATCH v2 1/2] net: fec: use managed DMA API functions to
> allocate BD ring
> 
> So it gets freed when the device is going away.
> This fixes a DMA memory leak on driver probe() fail and driver remove().
> 
> Signed-off-by: Lucas Stach 
> ---
> v2: Fix indentation of second line to fix alignment with opening bracket.
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 349365d85b92..a7f1bdf718f8 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
>   fep->bufdesc_size;
> 
>   /* Allocate memory for buffer descriptors. */
> - cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
> -   GFP_KERNEL);
> + cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
> +GFP_KERNEL);
>   if (!cbd_base) {
>   return -ENOMEM;
>   }
> --

Can you also convert the dma_alloc_coherent() call below to dmam_alloc_coherent()?
txq->tso_hdrs = dma_alloc_coherent(NULL,
txq->tx_ring_size * TSO_HEADER_SIZE,
&txq->tso_hdrs_dma,
GFP_KERNEL);


Regards,
Andy


Re: [PATCH] e1000e: Move e1000e_disable_aspm_locked() inside CONFIG_PM

2015-07-21 Thread Michael Ellerman
On Wed, 2015-07-15 at 03:30 -0700, Jeff Kirsher wrote:
> On Tue, 2015-07-14 at 13:54 +1000, Michael Ellerman wrote:
> > e1000e_disable_aspm_locked() is only used in __e1000_resume() which is
> > inside CONFIG_PM. So when CONFIG_PM=n we get a "defined but not used"
> > warning for e1000e_disable_aspm_locked().
> > 
> > Move it inside the existing CONFIG_PM block to avoid the warning.
> > 
> > Signed-off-by: Michael Ellerman 
> > ---
> >  drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> NACK, this is already fixed in my next-queue tree.  Raanan submitted a
> patch back on July 6th to resolve this issue, see commit id
> a75787d2246a93d256061db602f252703559af65 in my dev-queue branch of my
> next-queue tree.

OK. I take it your next-queue is destined for 4.3, so we'll just have to suck
on the warning until then?

cheers




[net-next:master 200/208] drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different base types)

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: 614732eaa12dd462c0ab274700bed14f36afea5e [200/208] openvswitch: Use regular VXLAN net_device device
reproduce:
  # apt-get install sparse
  git checkout 614732eaa12dd462c0ab274700bed14f36afea5e
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types)
   include/net/checksum.h:166:35:    expected restricted __wsum [usertype] csum
   include/net/checksum.h:166:35:    got restricted __sum16
   include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types)
   include/net/checksum.h:166:43:    expected restricted __wsum [usertype] addend
   include/net/checksum.h:166:43:    got restricted __sum16 [usertype]
   include/net/checksum.h:174:43: sparse: incorrect type in argument 2 (different base types)
   include/net/checksum.h:174:43:    expected restricted __wsum [usertype] addend
   include/net/checksum.h:174:43:    got restricted __sum16 [usertype]
   include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types)
   include/net/checksum.h:166:35:    expected restricted __wsum [usertype] csum
   include/net/checksum.h:166:35:    got restricted __sum16
   include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types)
   include/net/checksum.h:166:43:    expected restricted __wsum [usertype] addend
   include/net/checksum.h:166:43:    got restricted __sum16 [usertype]
>> drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different base types)
   drivers/net/vxlan.c:1739:21:    expected restricted __be32 [usertype] vx_vni
   drivers/net/vxlan.c:1739:21:    got unsigned int [unsigned] [usertype] vni
   drivers/net/vxlan.c:1818:21: sparse: incorrect type in assignment (different base types)
   drivers/net/vxlan.c:1818:21:    expected restricted __be32 [usertype] vx_vni
   drivers/net/vxlan.c:1818:21:    got unsigned int [unsigned] [usertype] vni
>> drivers/net/vxlan.c:2014:58: sparse: incorrect type in argument 11 (different base types)
   drivers/net/vxlan.c:2014:58:    expected unsigned int [unsigned] [usertype] vni
   drivers/net/vxlan.c:2014:58:    got restricted __be32 [usertype]
   drivers/net/vxlan.c:2072:67: sparse: incorrect type in argument 11 (different base types)
   drivers/net/vxlan.c:2072:67:    expected unsigned int [unsigned] [usertype] vni
   drivers/net/vxlan.c:2072:67:    got restricted __be32 [usertype]

vim +1739 drivers/net/vxlan.c

  1723  }
  1724  
  1725  skb = vlan_hwaccel_push_inside(skb);
  1726  if (WARN_ON(!skb)) {
  1727  err = -ENOMEM;
  1728  goto err;
  1729  }
  1730  
  1731  skb = iptunnel_handle_offloads(skb, udp_sum, type);
  1732  if (IS_ERR(skb)) {
  1733  err = -EINVAL;
  1734  goto err;
  1735  }
  1736  
  1737  vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
  1738  vxh->vx_flags = htonl(VXLAN_HF_VNI);
> 1739  vxh->vx_vni = vni;
  1740  
  1741  if (type & SKB_GSO_TUNNEL_REMCSUM) {
  1742  u32 data = (skb_checksum_start_offset(skb) - hdrlen) >>
  1743 VXLAN_RCO_SHIFT;
  1744  
  1745  if (skb->csum_offset == offsetof(struct udphdr, check))
  1746  data |= VXLAN_RCO_UDP;
  1747  
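
For context on why sparse flags line 1739: a field annotated __be32 is expected to hold a big-endian value, while a plain unsigned int carries no byte-order promise. The user-space sketch below shows one way a 24-bit VNI given in host order could be converted to and from a 32-bit on-wire field, assuming the VNI sits in the upper 24 bits of that field. The demo_* helpers are hypothetical, and this is not presented as the fix for the reported warning.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Encode a 24-bit VNI (host byte order) into the 32-bit on-wire field:
 * VNI in the upper 24 bits, low 8 bits reserved and left at zero.
 */
uint32_t demo_vni_to_field(uint32_t vni)
{
	return htonl((vni & 0xffffffu) << 8);
}

/* Decode the on-wire field back into a host-order VNI. */
uint32_t demo_field_to_vni(uint32_t field)
{
	return ntohl(field) >> 8;
}
```

Keeping the conversion in one pair of helpers makes it clear at each call site whether a value is in host or network order, which is the distinction the sparse annotations enforce.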

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[net-next:master 187/208] net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: e3e4712ec0961ed586a8db340bd994c4ad7f5dba [187/208] mpls: ip tunnel support
reproduce:
  # apt-get install sparse
  git checkout e3e4712ec0961ed586a8db340bd994c4ad7f5dba
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)

vim +73 net/mpls/mpls_iptunnel.c

    57  	/* Obtain the ttl */
    58  	if (skb->protocol == htons(ETH_P_IP)) {
    59  		ttl = ip_hdr(skb)->ttl;
    60  		rt = (struct rtable *)dst;
    61  		lwtstate = rt->rt_lwtstate;
    62  	} else if (skb->protocol == htons(ETH_P_IPV6)) {
    63  		ttl = ipv6_hdr(skb)->hop_limit;
    64  		rt6 = (struct rt6_info *)dst;
    65  		lwtstate = rt6->rt6i_lwtstate;
    66  	} else {
    67  		goto drop;
    68  	}
    69  
    70  	skb_orphan(skb);
    71  
    72  	/* Find the output device */
  > 73  	out_dev = rcu_dereference(dst->dev);
    74  	if (!mpls_output_possible(out_dev) ||
    75  	    !lwtstate || skb_warn_if_lro(skb))
    76  		goto drop;
    77  
    78  	skb_forward_csum(skb);
    79  
    80  	tun_encap_info = mpls_lwtunnel_encap(lwtstate);
    81  



[Patch net] sch_choke: drop all packets in queue during reset

2015-07-21 Thread Cong Wang
Signed-off-by: Cong Wang 
---
 net/sched/sch_choke.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 93d5742..6a783af 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -385,6 +385,19 @@ static void choke_reset(struct Qdisc *sch)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 
+	while (q->head != q->tail) {
+		struct sk_buff *skb = q->tab[q->head];
+
+		q->head = (q->head + 1) & q->tab_mask;
+		if (!skb)
+			continue;
+		qdisc_qstats_backlog_dec(sch, skb);
+		--sch->q.qlen;
+		qdisc_drop(skb, sch);
+	}
+
+	memset(q->tab, 0, (q->tab_mask + 1) * sizeof(struct sk_buff *));
+	q->head = q->tail = 0;
 	red_restart(&q->vars);
 }
 
-- 
1.8.3.1
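
The drain loop in the patch above can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming a power-of-two table indexed through a mask and NULL entries marking holes, as in the patch. The names struct ring and ring_drain() and the optional drop callback are hypothetical stand-ins for choke_sched_data and qdisc_drop(), not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the relevant fields of choke_sched_data. */
struct ring {
	void **tab;            /* slot array; NULL marks a hole */
	unsigned int head;     /* index of the oldest entry */
	unsigned int tail;     /* index one past the newest entry */
	unsigned int tab_mask; /* table size minus one; size is a power of two */
};

/* Walk from head to tail with wrap-around, hand every non-NULL entry to
 * drop() if one is given, then clear the table and reset both indices.
 * Returns the number of entries dropped. */
unsigned int ring_drain(struct ring *r, void (*drop)(void *))
{
	unsigned int dropped = 0;

	while (r->head != r->tail) {
		void *item = r->tab[r->head];

		r->head = (r->head + 1) & r->tab_mask;
		if (!item)
			continue;       /* hole left by an earlier removal */
		if (drop)
			drop(item);
		dropped++;
	}

	memset(r->tab, 0, (r->tab_mask + 1) * sizeof(*r->tab));
	r->head = r->tail = 0;
	return dropped;
}
```

Advancing head with a mask rather than a modulo only works because the table size is a power of two, which is also what lets the memset compute the size as tab_mask + 1.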



[PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread Martin KaFai Lau
The patch checks neigh->nud_state before acquiring the writer lock.
Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

I also take this chance to re-arrange the code.

40 udpflood processes and a /64 gateway route are used.
The gateway has NUD_PERMANENT.  Each of them is run for 30s.
At the end, the total number of finished sendto():

Before    After
55M       95M

Signed-off-by: Martin KaFai Lau 
Cc: Hannes Frederic Sowa 
---
 net/ipv6/route.c | 41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6090969..a6c6b5a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w)
 
 static void rt6_probe(struct rt6_info *rt)
 {
+	struct __rt6_probe_work *work;
 	struct neighbour *neigh;
 	/*
 	 * Okay, this does not seem to be appropriate
@@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt)
 	rcu_read_lock_bh();
 	neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
 	if (neigh) {
-		write_lock(&neigh->lock);
 		if (neigh->nud_state & NUD_VALID)
 			goto out;
-	}
-
-	if (!neigh ||
-	    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
-		struct __rt6_probe_work *work;
 
+		work = NULL;
+		write_lock(&neigh->lock);
+		if (!(neigh->nud_state & NUD_VALID) &&
+		    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
+			work = kmalloc(sizeof(*work), GFP_ATOMIC);
+			if (work) {
+				__neigh_set_probe_once(neigh);
+			}
+		}
+		write_unlock(&neigh->lock);
+	} else {
 		work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	}
 
-		if (neigh && work)
-			__neigh_set_probe_once(neigh);
-
-		if (neigh)
-			write_unlock(&neigh->lock);
+	if (work) {
+		INIT_WORK(&work->work, rt6_probe_deferred);
+		work->target = rt->rt6i_gateway;
+		dev_hold(rt->dst.dev);
+		work->dev = rt->dst.dev;
+		schedule_work(&work->work);
+	}
 
-		if (work) {
-			INIT_WORK(&work->work, rt6_probe_deferred);
-			work->target = rt->rt6i_gateway;
-			dev_hold(rt->dst.dev);
-			work->dev = rt->dst.dev;
-			schedule_work(&work->work);
-		}
-	} else {
 out:
-		write_unlock(&neigh->lock);
-	}
 	rcu_read_unlock_bh();
 }
 #else
-- 
1.8.1



[net-next:master 195/208] net/core/fib_rules.c:418:3: error: implicit declaration of function 'ip_tunnel_need_metadata'

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: e7030878fc8448492b6e5cecd574043f63271298 [195/208] fib: Add fib rule match on tunnel id
config: i386-randconfig-r0-201529 (attached as .config)
reproduce:
  git checkout e7030878fc8448492b6e5cecd574043f63271298
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   net/core/fib_rules.c: In function 'fib_nl_newrule':
>> net/core/fib_rules.c:418:3: error: implicit declaration of function 'ip_tunnel_need_metadata' [-Werror=implicit-function-declaration]
  ip_tunnel_need_metadata();
  ^
   net/core/fib_rules.c: In function 'fib_nl_delrule':
>> net/core/fib_rules.c:505:4: error: implicit declaration of function 'ip_tunnel_unneed_metadata' [-Werror=implicit-function-declaration]
   ip_tunnel_unneed_metadata();
   ^
   cc1: some warnings being treated as errors

vim +/ip_tunnel_need_metadata +418 net/core/fib_rules.c

   412  		ops->nr_goto_rules++;
   413  
   414  	if (unresolved)
   415  		ops->unresolved_rules++;
   416  
   417  	if (rule->tun_id)
 > 418  		ip_tunnel_need_metadata();
   419  
   420  	notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).portid);
   421  	flush_route_cache(ops);
   422  	rules_ops_put(ops);
   423  	return 0;
   424  
   425  errout_free:
   426  	kfree(rule);
   427  errout:
   428  	rules_ops_put(ops);
   429  	return err;
   430  }
   431  
   432  static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
   433  {
   434  	struct net *net = sock_net(skb->sk);
   435  	struct fib_rule_hdr *frh = nlmsg_data(nlh);
   436  	struct fib_rules_ops *ops = NULL;
   437  	struct fib_rule *rule, *tmp;
   438  	struct nlattr *tb[FRA_MAX+1];
   439  	int err = -EINVAL;
   440  
   441  	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
   442  		goto errout;
   443  
   444  	ops = lookup_rules_ops(net, frh->family);
   445  	if (ops == NULL) {
   446  		err = -EAFNOSUPPORT;
   447  		goto errout;
   448  	}
   449  
   450  	err = nlmsg_parse(nlh, sizeof(*frh), tb, FRA_MAX, ops->policy);
   451  	if (err < 0)
   452  		goto errout;
   453  
   454  	err = validate_rulemsg(frh, tb, ops);
   455  	if (err < 0)
   456  		goto errout;
   457  
   458  	list_for_each_entry(rule, &ops->rules_list, list) {
   459  		if (frh->action && (frh->action != rule->action))
   460  			continue;
   461  
   462  		if (frh_get_table(frh, tb) &&
   463  		    (frh_get_table(frh, tb) != rule->table))
   464  			continue;
   465  
   466  		if (tb[FRA_PRIORITY] &&
   467  		    (rule->pref != nla_get_u32(tb[FRA_PRIORITY])))
   468  			continue;
   469  
   470  		if (tb[FRA_IIFNAME] &&
   471  		    nla_strcmp(tb[FRA_IIFNAME], rule->iifname))
   472  			continue;
   473  
   474  		if (tb[FRA_OIFNAME] &&
   475  		    nla_strcmp(tb[FRA_OIFNAME], rule->oifname))
   476  			continue;
   477  
   478  		if (tb[FRA_FWMARK] &&
   479  		    (rule->mark != nla_get_u32(tb[FRA_FWMARK])))
   480  			continue;
   481  
   482  		if (tb[FRA_FWMASK] &&
   483  		    (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
   484  			continue;
   485  
   486  		if (tb[FRA_TUN_ID] &&
   487  		    (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
   488  			continue;
   489  
   490  		if (!ops->compare(rule, frh, tb))
   491  			continue;
   492  
   493  		if (rule->flags & FIB_RULE_PERMANENT) {
   494  			err = -EPERM;
   495  			goto errout;
   496  		}
   497  
   498  		if (ops->delete) {
   499  			err = ops->delete(rule);
   500  			if (err)
   501  				goto errout;
   502  		}
   503  
   504  		if (rule->tun_id)
 > 505  			ip_tunnel_unneed_metadata();
   506  
   507  		list_del_rcu(&rule->list);
   508  


[net-next:master 194/208] include/net/dst_metadata.h:39:4: error: implicit declaration of function 'lwt_tun_info'

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: 3093fbe7ff4bc7d1571fc217dade1cf80330a714 [194/208] route: Per route IP tunnel metadata via lightweight tunnel
config: i386-randconfig-i0-201529 (attached as .config)
reproduce:
  git checkout 3093fbe7ff4bc7d1571fc217dade1cf80330a714
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from net/core/dst.c:25:0:
   include/net/dst_metadata.h: In function 'skb_tunnel_info':
>> include/net/dst_metadata.h:39:4: error: implicit declaration of function 'lwt_tun_info' [-Werror=implicit-function-declaration]
   return lwt_tun_info(rt->rt_lwtstate);
   ^
>> include/net/dst_metadata.h:39:4: warning: return makes pointer from integer without a cast
   cc1: some warnings being treated as errors

vim +/lwt_tun_info +39 include/net/dst_metadata.h

    33  		return &md_dst->u.tun_info;
    34  
    35  	switch (family) {
    36  	case AF_INET:
    37  		rt = (struct rtable *)skb_dst(skb);
    38  		if (rt && rt->rt_lwtstate)
  > 39  			return lwt_tun_info(rt->rt_lwtstate);
    40  		break;
    41  	}
    42  


[Patch net] sch_plug: purge buffered packets during reset

2015-07-21 Thread Cong Wang
Otherwise the skbuff related structures are not correctly
refcount'ed.

Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/sch_plug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index 89f8fcf..ade9445 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -216,6 +216,7 @@ static struct Qdisc_ops plug_qdisc_ops __read_mostly = {
 	.peek        =       qdisc_peek_head,
 	.init        =       plug_init,
 	.change      =       plug_change,
+	.reset       =       qdisc_reset_queue,
 	.owner       =       THIS_MODULE,
 };
 
-- 
1.8.3.1



Re: [PATCH net-next 1/1] tipc: fix compatibility bug

2015-07-21 Thread David Miller
From: Jon Maloy 
Date: Tue, 21 Jul 2015 06:42:28 -0400

> In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6
> ("tipc: reduce locking scope during packet reception") we introduced
> a new function tipc_link_proto_rcv(). This function contains a bug,
> so that it sometimes by error sends out a non-zero link priority value
> in created protocol messages.
> 
> The bug may lead to an extra link reset at initial link establishing
> with older nodes. This will never happen more than once, whereafter
> the link will work as intended.
> 
> We fix this bug in this commit.
> 
> Signed-off-by: Jon Maloy 

Applied.


[PATCH net-next] net: track success and failure of TCP PMTU probing

2015-07-21 Thread Rick Jones
From: Rick Jones 

Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones 

---

Tested by loading-up into an OpenStack instance and kicking the MTU
out from under it in the corresponding router namespace.

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index eee8968..25a9ad8 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -278,6 +278,8 @@ enum
 	LINUX_MIB_TCPACKSKIPPEDCHALLENGE,	/* TCPACKSkippedChallenge */
 	LINUX_MIB_TCPWINPROBE,			/* TCPWinProbe */
 	LINUX_MIB_TCPKEEPALIVE,			/* TCPKeepAlive */
+	LINUX_MIB_TCPMTUPFAIL,			/* TCPMTUPFail */
+	LINUX_MIB_TCPMTUPSUCCESS,		/* TCPMTUPSuccess */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index da5d483..3abd9d7 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -300,6 +300,8 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPACKSkippedChallenge", LINUX_MIB_TCPACKSKIPPEDCHALLENGE),
 	SNMP_MIB_ITEM("TCPWinProbe", LINUX_MIB_TCPWINPROBE),
 	SNMP_MIB_ITEM("TCPKeepAlive", LINUX_MIB_TCPKEEPALIVE),
+	SNMP_MIB_ITEM("TCPMTUPFail", LINUX_MIB_TCPMTUPFAIL),
+	SNMP_MIB_ITEM("TCPMTUPSuccess", LINUX_MIB_TCPMTUPSUCCESS),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1578fc2..cda3ffe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2593,6 +2593,7 @@ static void tcp_mtup_probe_failed(struct sock *sk)
 
 	icsk->icsk_mtup.search_high = icsk->icsk_mtup.probe_size - 1;
 	icsk->icsk_mtup.probe_size = 0;
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPFAIL);
 }
 
 static void tcp_mtup_probe_success(struct sock *sk)
@@ -2612,6 +2613,7 @@ static void tcp_mtup_probe_success(struct sock *sk)
 	icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size;
 	icsk->icsk_mtup.probe_size = 0;
 	tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPSUCCESS);
 }
 
 /* Do a simple retransmit without using the backoff mechanisms in


Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested

2015-07-21 Thread David Miller
From: Florian Fainelli 
Date: Mon, 20 Jul 2015 17:49:54 -0700

> Changes in v5:
> 
> - removed an invalid use of the link_update callback in the SF2 driver
>   that appeared after merging "net: phy: fixed_phy: handle link-down case"
> 
> - reworded the commit message for patch 2 to make it clear what it fixes and
>   why this is required

Series applied, thanks Florian.


Re: pull-request: wireless-drivers 2015-07-20

2015-07-21 Thread David Miller
From: Kalle Valo 
Date: Mon, 20 Jul 2015 18:36:30 +0300

> here are a few fixes for 4.2, should not have anything out of the ordinary.
> Please let me know if there are any issues.

Pulled, thanks.


Re: [PATCH net-next v4] ipv6: sysctl to restrict candidate source addresses

2015-07-21 Thread David Miller
From: Erik Kline 
Date: Mon, 20 Jul 2015 16:06:34 +0200

> I thought perhaps "use_oif_addr_only" was a slightly clearer sysctl name.
> 
> (Maybe it should be plural, "use_oif_addrs_only"?)

I think plural would be better too, please respin with that change.

Thanks.


Re: [PATCHv2 net-next] net: #ifdefify sk_classid member of struct sock

2015-07-21 Thread David Miller
From: Mathias Krause 
Date: Sun, 19 Jul 2015 22:21:13 +0200

> The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
> enabled. #ifdefify it to reduce the size of struct sock on 32 bit
> systems, at least.
> 
> Signed-off-by: Mathias Krause 
> ---
> v2:
> - ensure we'll error out in nft_meta_get_init() if CONFIG_CGROUP_NET_CLASSID
>   is not set

Applied, thanks.


[PATCH] ravb: fix ring memory allocation

2015-07-21 Thread Sergei Shtylyov
The driver is written as if it can adapt to a low-memory situation by
allocating fewer RX skbs and TX aligned buffers than the respective RX/TX
ring sizes. In reality, though, the driver would malfunction in this case.
Stop being overly smart and just fail in such a situation; this is achieved
by moving the memory allocation from ravb_ring_format() to ravb_ring_init().

We leave the dma_map_single() calls in place but make their failure non-fatal
by marking the corresponding RX descriptors with zero data size, which should
prevent DMA to an invalid address.

Signed-off-by: Sergei Shtylyov 

---
The patch is against Dave Miller's 'net.git' repo.

drivers/net/ethernet/renesas/ravb_main.c |   59 +--
 1 file changed, 34 insertions(+), 25 deletions(-)

Index: net/drivers/net/ethernet/renesas/ravb_main.c
===
--- net.orig/drivers/net/ethernet/renesas/ravb_main.c
+++ net/drivers/net/ethernet/renesas/ravb_main.c
@@ -228,9 +228,7 @@ static void ravb_ring_format(struct net_
struct ravb_desc *desc = NULL;
int rx_ring_size = sizeof(*rx_desc) * priv->num_rx_ring[q];
int tx_ring_size = sizeof(*tx_desc) * priv->num_tx_ring[q];
-   struct sk_buff *skb;
dma_addr_t dma_addr;
-   void *buffer;
int i;
 
priv->cur_rx[q] = 0;
@@ -241,41 +239,28 @@ static void ravb_ring_format(struct net_
memset(priv->rx_ring[q], 0, rx_ring_size);
/* Build RX ring buffer */
for (i = 0; i < priv->num_rx_ring[q]; i++) {
-   priv->rx_skb[q][i] = NULL;
-   skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1);
-   if (!skb)
-   break;
-   ravb_set_buffer_align(skb);
/* RX descriptor */
rx_desc = &priv->rx_ring[q][i];
/* The size of the buffer should be on 16-byte boundary. */
rx_desc->ds_cc = cpu_to_le16(ALIGN(PKT_BUF_SZ, 16));
-   dma_addr = dma_map_single(&ndev->dev, skb->data,
+   dma_addr = dma_map_single(&ndev->dev, priv->rx_skb[q][i]->data,
  ALIGN(PKT_BUF_SZ, 16),
  DMA_FROM_DEVICE);
-   if (dma_mapping_error(&ndev->dev, dma_addr)) {
-   dev_kfree_skb(skb);
-   break;
-   }
-   priv->rx_skb[q][i] = skb;
+   /* We just set the data size to 0 for a failed mapping which
+* should prevent DMA from happening...
+*/
+   if (dma_mapping_error(&ndev->dev, dma_addr))
+   rx_desc->ds_cc = cpu_to_le16(0);
rx_desc->dptr = cpu_to_le32(dma_addr);
rx_desc->die_dt = DT_FEMPTY;
}
rx_desc = &priv->rx_ring[q][i];
rx_desc->dptr = cpu_to_le32((u32)priv->rx_desc_dma[q]);
rx_desc->die_dt = DT_LINKFIX; /* type */
-   priv->dirty_rx[q] = (u32)(i - priv->num_rx_ring[q]);
 
memset(priv->tx_ring[q], 0, tx_ring_size);
/* Build TX ring buffer */
for (i = 0; i < priv->num_tx_ring[q]; i++) {
-   priv->tx_skb[q][i] = NULL;
-   priv->tx_buffers[q][i] = NULL;
-   buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL);
-   if (!buffer)
-   break;
-   /* Aligned TX buffer */
-   priv->tx_buffers[q][i] = buffer;
tx_desc = &priv->tx_ring[q][i];
tx_desc->die_dt = DT_EEMPTY;
}
@@ -298,7 +283,10 @@ static void ravb_ring_format(struct net_
 static int ravb_ring_init(struct net_device *ndev, int q)
 {
struct ravb_private *priv = netdev_priv(ndev);
+   struct sk_buff *skb;
int ring_size;
+   void *buffer;
+   int i;
 
/* Allocate RX and TX skb rings */
priv->rx_skb[q] = kcalloc(priv->num_rx_ring[q],
@@ -308,12 +296,28 @@ static int ravb_ring_init(struct net_dev
if (!priv->rx_skb[q] || !priv->tx_skb[q])
goto error;
 
+   for (i = 0; i < priv->num_rx_ring[q]; i++) {
+   skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1);
+   if (!skb)
+   goto error;
+   ravb_set_buffer_align(skb);
+   priv->rx_skb[q][i] = skb;
+   }
+
/* Allocate rings for the aligned buffers */
priv->tx_buffers[q] = kcalloc(priv->num_tx_ring[q],
  sizeof(*priv->tx_buffers[q]), GFP_KERNEL);
if (!priv->tx_buffers[q])
goto error;
 
+   for (i = 0; i < priv->num_tx_ring[q]; i++) {
+   buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL);
+   if (!buffer)
+   goto error;
+   /* Aligned TX buffer */
+   priv->tx_buffers[q][i] = buffer;
+   }
+
   

Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [11:30:58 -0500], Chris J Arges wrote:
> On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
> > On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> > > Some architectures like POWER can have a NUMA node_possible_map that
> > > contains sparse entries. This causes memory corruption with openvswitch
> > > since it allocates flow_cache with a multiple of num_possible_nodes() and
> > 
> > Couldn't this also be fixed by just allocating with a multiple of
> > nr_node_ids (which seems to have been the original intent all along)?
> > You could then make your stats array be sparse or not.
> > 
> 
> Yea originally this is what I did, but I thought it would be wasting memory.
> 
> > > assumes the node variable returned by for_each_node will index into
> > > flow->stats[node].
> > > 
> > > For example, if node_possible_map is 0x30003, this patch will map node to
> > > node_cnt as follows:
> > > 0,1,16,17 => 0,1,2,3
> > > 
> > > The crash was noticed after 3af229f2 was applied as it changed the
> > > node_possible_map to match node_online_map on boot.
> > > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
> > 
> > My concern with this version of the fix is that you're relying on,
> > implicitly, the order of for_each_node's iteration corresponding to the
> > entries in stats 1:1. But what about node hotplug? It seems better to
> > have the enumeration of the stats array match the topology accurately,
> > rather, or to maintain some sort of internal map in the OVS code between
> > the NUMA node and the entry in the stats array?
> > 
> > I'm willing to be convinced otherwise, though :)
> > 
> > -Nish
> >
> 
> Nish,
> 
> The method I described should work for hotplug since it's using possible map
> which AFAIK is static rather than the online map. 

Oh you're right, I'm sorry!



Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [12:36:33 -0500], Chris J Arges wrote:
> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
> 
> Use nr_node_ids to allocate a maximal sparse array instead of
> num_possible_nodes().
> 
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
> 
> Signed-off-by: Chris J Arges 
Acked-by: Nishanth Aravamudan 
> ---
>  net/openvswitch/flow_table.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 4613df8..6552394 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -752,7 +752,7 @@ int ovs_flow_init(void)
>  	BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
>  
>  	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
> -				       + (num_possible_nodes()
> +				       + (nr_node_ids
>  					  * sizeof(struct flow_stats *)),
>  				       0, 0, NULL);
>  	if (flow_cache == NULL)
> -- 
> 1.9.1
> 
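
The difference between the two sizes discussed in this thread can be shown with a small user-space sketch. It treats the possible-node map as a plain 64-bit bitmask; count_set_nodes() and max_node_id_plus_one() are hypothetical helpers that only mimic what num_possible_nodes() and nr_node_ids represent, and are not kernel functions.

```c
#include <stdint.h>

/* Number of set bits: mimics num_possible_nodes() for a bitmask map. */
unsigned int count_set_nodes(uint64_t map)
{
	unsigned int n = 0;

	while (map) {
		map &= map - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Highest set bit plus one: mimics nr_node_ids for a bitmask map. */
unsigned int max_node_id_plus_one(uint64_t map)
{
	unsigned int n = 0;

	while (map) {
		map >>= 1;
		n++;
	}
	return n;
}
```

For the map 0x30003 mentioned earlier in the thread (nodes 0, 1, 16 and 17), the first helper gives 4 and the second gives 18. An array with 4 slots that is indexed directly by node id would be overrun at node 16, which is the corruption described in the commit message; sizing it by the second value avoids that at the cost of some unused slots.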



Re: Why return E2BIG from bpf map update?

2015-07-21 Thread Alexei Starovoitov

On 7/21/15 3:13 AM, Alex Gartrell wrote:
> But, the EINVAL errno has similarly been
> abused to death

There was a thread a few months ago trying to come up with
a generic solution for aliased error codes, but unfortunately
nothing concrete came out of it.
The proposal I liked was that the kernel could extend the
syscall interface to return a string together with errno,
but that is quite hard to do at present.
Maybe extensions to vdso data writable by the kernel can
improve the situation.



Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Eric Dumazet
On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:

> > -   kfree_skb(skb);
> > +   INIT_LIST_HEAD(&q->new_flows);
> > +   INIT_LIST_HEAD(&q->old_flows);
> > +   for (i = 0; i < q->flows_cnt; i++) {
> > +   struct fq_codel_flow *flow = q->flows + i;
> > +
> > +   while (flow->head)
> > +   kfree_skb(dequeue_head(flow));
> > +
> > +   INIT_LIST_HEAD(&flow->flowchain);
> 
> 
> You probably need to call codel_vars_init(&flow->cvars) as well.

It is not necessary : flow->cvars only matter in the event of a dequeue,
but whole qdisc is dismantled and no packet will be dequeued.




Re: [PATCH iproute2] Use PATH_MAX instead of MAXPATHLEN

2015-07-21 Thread Yegor Yefremov
On Wed, Apr 29, 2015 at 6:52 PM, Felix Janda  wrote:
> Florian Fainelli wrote:
>> On 27/04/15 09:13, Stephen Hemminger wrote:
>> > On Sat, 25 Apr 2015 22:33:28 +0200
>> > Felix Janda  wrote:
>> >
>> >> They are equivalent but the former is more common. PATH_MAX is
>> >> specified by POSIX and needs <limits.h> while MAXPATHLEN has BSD
>> >> origin and needs <sys/param.h>.
>> >>
>> >> PATH_MAX has already been in use in misc/lnstat.h.
>> >>
>> >> Signed-off-by: Felix Janda 
>> >
>> > Iproute2 is intended for use on Linux.
>> > It makes more sense to align with Posix than using leftover
>> > BSD stuff. Therefore I don't see any point in doing this.
>>
>> My reading from Felix's commit message is that he is attempting to do
>> exactly that: conform to POSIX rather than BSD, which seems to be the
>> direction you are also suggesting here.
>> --
>> Florian
>
> This is correct. (In fact I misread the end of Stephen's message,
> thought that the patch was merged and wanted to thank for that.)

What's the status of this patch? This is one of the reasons iproute2
cannot be compiled against the musl C library. After fixing this I get
tons of redefinition errors:

In file included from ../include/linux/xfrm.h:4:0,
                 from xfrm_state.c:31:
../include/linux/in6.h:32:8: error: redefinition of ‘struct in6_addr’
 struct in6_addr {
        ^
In file included from /home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
                 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:24:8: note: originally defined here
 struct in6_addr
        ^
In file included from ../include/linux/xfrm.h:4:0,
                 from xfrm_state.c:31:
../include/linux/in6.h:40:0: warning: "s6_addr" redefined
 #define s6_addr   in6_u.u6_addr8
 ^
In file included from /home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
                 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:32:0: note: this is the location of the previous definition
 #define s6_addr __in6_union.__s6_addr
 ^

Yegor


Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested

2015-07-21 Thread Arnaud Ebalard
Hi guys,

Florian Fainelli  writes:

> Changes in v5:
>
> - removed an invalid use of the link_update callback in the SF2 driver
>   that appeared after merging "net: phy: fixed_phy: handle link-down case"
>
> - reworded the commit message for patch 2 to make it clear what it fixes and
>   why this is required
>
> Initial cover letter from Stas:
>
> Hello.
>
> Currently the link status auto-negotiation is enabled
> for any SGMII link with fixed-link DT binding.
> The regression was reported:
> https://lkml.org/lkml/2015/7/8/865
> Apparently not all HW that implements SGMII protocol, generates the
> inband status for the auto-negotiation to work.
> More details here:
> https://lkml.org/lkml/2015/7/10/206
>
> The following patches revert to the old behavior by default,
> which is to not enable the auto-negotiation for fixed-link.
> A new DT property is added that allows the auto-negotiation
> to be requested explicitly.

FWIW, I tested this v5 series on mirabox (2 mvneta interfaces using
RGMII); both interfaces still work as expected, i.e. no regression
on my side.

a+


Re: ARP response with link local IP, why not broadcast

2015-07-21 Thread Sowmini Varadhan
On Tue, Jul 21, 2015 at 4:38 PM, Sebastian Fett  wrote:
> Hello!
>
> According to RFC3927 every ARP packet (reply and request) should be sent as
> link layer broadcast as long as the sender IP is a link local address. (see
> chapter 2.5).

Because broadcast replies are noisy and should be avoided if possible:
they create a broadcast flood that wakes up all receivers, which is
especially undesirable in today's world, where bcast would wake up
sleepy devices or require other inefficient processing in a cloud env.
See also https://www.ietf.org/id/draft-nordmark-6man-dad-approaches-01.txt

> That functionality would help me a lot with a use case I have with our
> application.

what is your use case?

>
> But it is not implemented in the kernel that way.
> Does anyone know why?

--Sowmini
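
For context, the link-local range that RFC 3927 refers to is 169.254.0.0/16.
The check itself is a single mask-and-compare. The sketch below is userspace
C with hypothetical function names; it only illustrates the rule as quoted in
the question and makes no claim about what the kernel does or should do.

```c
#include <arpa/inet.h>  /* ntohl(), inet_addr() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr_be (network byte order) lies in 169.254.0.0/16. */
static bool is_ipv4_link_local(uint32_t addr_be)
{
	return (ntohl(addr_be) & 0xffff0000u) == 0xa9fe0000u;
}

/*
 * Hypothetical policy hook: under the quoted reading of RFC 3927, an
 * ARP packet whose sender IP is link-local would be sent as a
 * link-layer broadcast.
 */
static bool arp_wants_broadcast(uint32_t sender_ip_be)
{
	return is_ipv4_link_local(sender_ip_be);
}

/* Usage example: 169.254.1.5 is link-local, 192.168.1.5 is not. */
static void example(void)
{
	assert(arp_wants_broadcast(inet_addr("169.254.1.5")));
	assert(!arp_wants_broadcast(inet_addr("192.168.1.5")));
}
```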


Re: reproducable panic eviction work queue

2015-07-21 Thread Florian Westphal
Frank Schreuder  wrote:

[ inet frag evictor crash ]

We believe we found the bug.  This patch should fix it.

We cannot share the list member between the hash buckets and the evictor;
the flags member is subject to race conditions, so the
flags & INET_FRAG_EVICTED test is not reliable.

It would be great if you could confirm that this fixes the problem
for you; we'll then make a formal patch submission.

Please apply this on a kernel without the previous test patches. Whether you
use an affected -stable or net-next kernel shouldn't matter since those are
similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
spinlock_t  lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
__u8flags;
u16 max_size;
struct netns_frags  *net;
+   struct hlist_node   list_evictor;
 };
 
 #define INETFRAGS_HASHSZ   1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
}
 
fq->flags |= INET_FRAG_EVICTED;
-   hlist_del(&fq->list);
-   hlist_add_head(&fq->list, &expired);
+   hlist_add_head(&fq->list_evictor, &expired);
++evicted;
}
 
spin_unlock(&hb->chain_lock);
 
-   hlist_for_each_entry_safe(fq, n, &expired, list)
+   hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
f->frag_expire((unsigned long) fq);
 
return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
struct inet_frag_bucket *hb;
 
hb = get_frag_bucket_locked(fq, f);
-   if (!(fq->flags & INET_FRAG_EVICTED))
-   hlist_del(&fq->list);
+   hlist_del(&fq->list);
spin_unlock(&hb->chain_lock);
 }
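
The idea behind the patch above, giving each queue a second link member so it
can sit on the bucket chain and on the evictor's list at the same time, can be
sketched in plain userspace C. This is an illustration only: it uses a toy
singly linked list instead of the kernel's hlist, and every name except
list and list_evictor is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy intrusive link; the kernel uses struct hlist_node instead. */
struct link {
	struct link *next;
};

struct frag_queue {
	int id;
	struct link list;         /* membership in the hash bucket chain */
	struct link list_evictor; /* membership in the evictor's private list */
};

static void push(struct link **head, struct link *node)
{
	assert(head != NULL && node != NULL);
	node->next = *head;
	*head = node;
}

static size_t length(const struct link *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}

/* Put two queues in a bucket, then select one of them for eviction. */
static void demo(size_t *in_bucket, size_t *in_expired)
{
	struct frag_queue a = { .id = 1 }, b = { .id = 2 };
	struct link *bucket = NULL, *expired = NULL;

	push(&bucket, &a.list);
	push(&bucket, &b.list);
	/* b stays on the bucket chain; only its second link is used here. */
	push(&expired, &b.list_evictor);

	*in_bucket = length(bucket);
	*in_expired = length(expired);
}
```

Because the evictor no longer reuses the bucket link, unlinking from the
bucket does not need to consult a flag to know which list the queue is on.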
 


Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Pravin Shelar
On Tue, Jul 21, 2015 at 10:36 AM, Chris J Arges
 wrote:
> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
>
> Use nr_node_ids to allocate a maximal sparse array instead of
> num_possible_nodes().
>
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges 

Acked-by: Pravin B Shelar 

Thanks.


Re: [PATCH V4 0/2] pci: Provide a flag to access VPD through function 0

2015-07-21 Thread Bjorn Helgaas
[+cc Alex]

On Mon, Jul 13, 2015 at 11:39:54AM -0700, Mark D Rustad wrote:
> Many multi-function devices provide shared registers in extended
> config space for accessing VPD. The behavior of these registers
> means that the state must be tracked and access locked correctly
> for accesses not to hang or worse. One way to meet these needs is
> to always perform the accesses through function 0, thereby using
> the state tracking and mutex that already exists.
> 
> To provide this behavior, add a dev_flags bit to indicate that this
> should be done. This bit can then be set for any non-zero function
> that needs to redirect such VPD access to function 0. Do not set
> this bit on the zero function or there will be an infinite recursion.
> 
> The second patch uses this new flag to invoke this behavior on all
> multi-function Intel Ethernet devices.
> 
> Any hardware that shares VPD registers with multiple functions has
> been suffering these problems forever. The hangs result in the log
> message:
> 
> vpd r/w failed.  This is likely a firmware bug on this device.
> 
> Both read and write data corruption are also possible during
> overlapping accesses in addition to hangs.
> 
> Signed-off-by: Mark Rustad 
> 
> ---
> Changes in V2:
> - Corrected a spelling error in a log message
> - Added checks to see that the referenced function 0 is reasonable
> Changes in V3:
> - Don't leak a device reference
> - Check that function 0 has VPD
> - Make a helper for the function 0 checks
> - Moved a multifunction check to the quirk patch
> Changes in V4:
> - Provide a more extensive commit log for patch 1

I applied these to pci/misc for v4.3 with changelogs as follows.  I added
Alex's ack, since he acked v3 and the only difference here is the
changelog.  I also added a stable tag.  Thanks!

Bjorn


commit 932c435caba8a2ce473a91753bad0173269ef334
Author: Mark Rustad 
Date:   Mon Jul 13 11:40:02 2015 -0700

PCI: Add dev_flags bit to access VPD through function 0

Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through
function 0 to provide VPD access on other functions.  This is for hardware
devices that provide copies of the same VPD capability registers in
multiple functions.  Because the kernel expects that each function has its
own registers, both the locking and the state tracking are affected by VPD
accesses to different functions.

On such devices for example, if a VPD write is performed on function 0,
*any* later attempt to read VPD from any other function of that device will
hang.  This has to do with how the kernel tracks the expected value of the
F bit per function.

Concurrent accesses to different functions of the same device can not only
hang but also corrupt both read and write VPD data.

When hangs occur, typically the error message:

  vpd r/w failed.  This is likely a firmware bug on this device.

will be seen.

Never set this bit on function 0 or there will be an infinite recursion.

Signed-off-by: Mark Rustad 
Signed-off-by: Bjorn Helgaas 
Acked-by: Alexander Duyck 
CC: sta...@vger.kernel.org

commit 7aa6ca4d39edf01f997b9e02cf6d2fdeb224f351
Author: Mark Rustad 
Date:   Mon Jul 13 11:40:07 2015 -0700

PCI: Add VPD function 0 quirk for Intel Ethernet devices

Set the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device
functions other than function 0, so that on multi-function devices, we will
always read VPD from function 0 instead of from the other functions.

[bhelgaas: changelog]
Signed-off-by: Mark Rustad 
Signed-off-by: Bjorn Helgaas 
Acked-by: Alexander Duyck 
CC: sta...@vger.kernel.org


Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Cong Wang
On Tue, Jul 21, 2015 at 3:52 AM, Eric Dumazet  wrote:
> On Tue, 2015-07-21 at 06:04 -0400, Jamal Hadi Salim wrote:
>
>> It is worrisome to fix the core code for this. The root cause seems to
>> be codel. Dont have time but in general, reset would be something like:
>>
>> struct fq_codel_sched_data *q = qdisc_priv(sch);
>> qdisc_reset(q)
>
> This only works for very simple qdisc with one queue.
>
>>
>> or something along those lines...
>> But certainly dequeue semantics dont seem right there..
>
> Well, reset() is trivial to implement like this
>
> while (skb = local_dequeue(sch)) {
> kfree_skb(skb);
> }
>
> And I guess I copy/pasted sfq code here, because I was lazy.
>
> But yes, qdisc_tree_decrease_qlen() would have to be not called.


Hmm, so the semantics are that each qdisc resets its own qlen
and calls qdisc_reset() to reset its leaf qdiscs; that makes sense
to me.

>
> It seems I coded fq_reset() differently.
>
> Alex, please try instead :
>
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index 21ca33c9f036..3f0320ab6029 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -288,10 +288,21 @@ begin:
>
>  static void fq_codel_reset(struct Qdisc *sch)
>  {
> -   struct sk_buff *skb;
> +   struct fq_codel_sched_data *q = qdisc_priv(sch);
> +   int i;
>
> -   while ((skb = fq_codel_dequeue(sch)) != NULL)
> -   kfree_skb(skb);
> +   INIT_LIST_HEAD(&q->new_flows);
> +   INIT_LIST_HEAD(&q->old_flows);
> +   for (i = 0; i < q->flows_cnt; i++) {
> +   struct fq_codel_flow *flow = q->flows + i;
> +
> +   while (flow->head)
> +   kfree_skb(dequeue_head(flow));
> +
> +   INIT_LIST_HEAD(&flow->flowchain);


You probably need to call codel_vars_init(&flow->cvars) as well.

> +   }
> +   memset(q->backlogs, 0, q->flows_cnt * sizeof(u32));
> +   sch->q.qlen = 0;
>  }
>
>  static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
>
>
>

Thanks.


Re: [PATCH net-next 1/2] tcp: don't extend RTO on failed loss probe attempts

2015-07-21 Thread Yuchung Cheng
On Fri, Jul 17, 2015 at 10:27 PM, Eric Dumazet  wrote:
>
> On Fri, 2015-07-17 at 14:22 -0700, Yuchung Cheng wrote:
> > If TLP was unable to send a probe, it extended the RTO to
> > now + icsk_rto. But extending the RTO makes little sense
> > if no TLP probe went out. With this commit, instead of
> > extending the RTO we re-arm it relative to the transmit time
> > of the write queue head.
>
> But what was the reason the probe could not be sent ?
>
> If it is local congestion or memory allocation error, it does make sense
> to not add fuel to the fire.
Good point. We can identify those so we don't attempt to
retransmit on these errors, but will retransmit on receive-window
limit. I'll re-spin the patch.
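
The difference between extending the RTO and re-arming it relative to the
head's transmit time comes down to which timestamp the timeout is added to.
A rough arithmetic sketch follows, with hypothetical names and abstract tick
units; returning 0 for an already expired deadline is an assumption of this
sketch, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Behaviour described as "extending": deadline counted from now. */
static uint32_t deadline_extended(uint32_t now, uint32_t rto)
{
	return now + rto;
}

/* Behaviour described as "re-arming": deadline counted from the
 * transmit time of the write queue head. */
static uint32_t deadline_rearmed(uint32_t head_sent, uint32_t rto)
{
	return head_sent + rto;
}

/* Ticks left until the deadline, using wrap-safe subtraction. */
static uint32_t time_left(uint32_t now, uint32_t deadline)
{
	int32_t delta = (int32_t)(deadline - now);

	return delta > 0 ? (uint32_t)delta : 0;
}

/* Head sent at tick 900, probe attempt fails at tick 1000, RTO is 300. */
static void example(void)
{
	assert(time_left(1000, deadline_extended(1000, 300)) == 300);
	assert(time_left(1000, deadline_rearmed(900, 300)) == 200);
	assert(time_left(1250, deadline_rearmed(900, 300)) == 0);
}
```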

>
>
>


Re: [PATCH iproute2 v2] ss: Fix crash when dump stats from /proc with '-p'

2015-07-21 Thread Stephen Hemminger
On Tue, 21 Jul 2015 16:18:36 +0300
Vadim Kochan  wrote:

> From: Vadim Kochan 
> 
> It really partially reverts:
> 
> ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
> 
> but adds few fields (name & peer_name) from removed unixstat to sockstat
> struct to easy return original code.
> 
> Fixes: ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
> Reported-by: Marc Dietrich 
> Signed-off-by: Vadim Kochan 

I applied this one after resolving merge conflicts.



Re: [PATCH net-next 14/22] vxlan: Flow based tunneling

2015-07-21 Thread Thomas Graf
On 07/21/15 at 10:30am, Alexei Starovoitov wrote:
> RX:
> >+info->mode = IP_TUNNEL_INFO_RX;
> >+info->key.tun_flags = TUNNEL_KEY;
> >+info->key.tun_id = cpu_to_be64(vni >> 8);
> ...
> TX:
> >+dst_port = info->key.tp_dst ? : vxlan->dst_port;
> >+vni = be64_to_cpu(info->key.tun_id);
> 
> I think the copy paste of ovs_tunnel_info into ip_tunnel_info
> can be improved. In particular instead of '__be64 tun_id'
> we can use '__u64 tun_id' which will avoid extra byteswaps for rx/tx
> paths.
> 
> netlink for this part also seems inconsistent.
> In the patch 16:
> +static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
> + [IP_TUN_ID] = { .type = NLA_U64 },
> ...
> + if (tb[IP_TUN_ID])
> + tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);
> 
> I think nla_get_be64 should be there?
> and with my suggestion we can add be64_to_cpu() here instead
> of doing it per packet.
> Thoughts?

I like this. The be64 originates from how OVS stores the tun_id in the
flow key. I agree that it makes sense to limit and delay the byteswaps
to when OVS inherits the flow key from the ip_tunnel_info. I will send
a follow-up.


Re: [PATCH 1/1] drivers: net: cpsw: remove tx event processing in rx napi poll

2015-07-21 Thread David Miller
From: Mugunthan V N 
Date: Tue, 21 Jul 2015 16:00:42 +0530

> With commit c03abd84634d ("net: ethernet: cpsw: don't requests IRQs
> we don't use") common isr and napi are separated into separate tx isr
> and rx isr/napi, but still in rx napi tx events are handled. So removing
> the tx event handling in rx napi.
> 
> Signed-off-by: Mugunthan V N 

Applied, thanks.


Re: [PATCH net-next 00/22 v2] Lightweight & flow based encapsulation

2015-07-21 Thread David Miller
From: Thomas Graf 
Date: Tue, 21 Jul 2015 10:43:44 +0200

> This series combines the work previously posted by Roopa, Robert and
> myself. It's according to what we discussed at NFWS. The motivation
> of this series is to:
> 
>  * Consolidate code between OVS and the rest of the kernel and get
>rid of OVS vports and instead represent them as pure net_devices.
>  * Introduce a lightweight tunneling mechanism which enables flow
>based encapsulation to improve scalability on both RX and TX.
>  * Do the above in an encapsulation unspecific way so that the
>encapsulation type is eventually abstracted away from the user.
>  * Use the same forwarding decision for both native forwarding and
>encapsulation thus allowing to switch between native IPv6 and
>UDP encapsulation based on endpoint without requiring additional
>logic
> 
> The fundamental changes introduces in this series are:
>  * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
>instructions. Depending on the specified type, the instructions
>apply to UDP encapsulations, MPLS and possible other in the future.
>  * Depending on the encapsulation type, the output function of the
>dst is directly overwritten or the dst merely attaches metadata and
>relies on a subsequent net_device to apply it to the packet. The
>latter is typically used if an inner and outer IP header exist which
>require two subsequent routing lookups to be performed.
>  * A new metadata_dst structure which can be attached to skbs to
>carry metadata in between subsystems. This new metadata transport
>is used to provide a single interface for VXLAN, routing and OVS
>to communicate through metadata.

Series applied, but please take Alexei's endianness feedback into
consideration.

Thanks!


Re: [PATCH v2 net] inet: frags: fix defragmented packet's IP header for af_packet

2015-07-21 Thread David Miller
From: Eric Dumazet 
Date: Tue, 21 Jul 2015 09:43:59 +0200

> From: Edward Hyunkoo Jee 
> 
> When ip_frag_queue() computes positions, it assumes that the passed
> sk_buff does not contain L2 headers.
> 
> However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
> functions can be called on outgoing packets that contain L2 headers. 
> 
> Also, IPv4 checksum is not corrected after reassembly.
> 
> Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
> Signed-off-by: Edward Hyunkoo Jee 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable, thanks.


[PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Chris J Arges
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

Signed-off-by: Chris J Arges 
---
 net/openvswitch/flow_table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..6552394 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -752,7 +752,7 @@ int ovs_flow_init(void)
BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
-  + (num_possible_nodes()
+  + (nr_node_ids
  * sizeof(struct flow_stats *)),
   0, 0, NULL);
if (flow_cache == NULL)
-- 
1.9.1
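
To see why the two sizes differ on a sparse map, consider the possible-node
mask 0x30003 (nodes 0, 1, 16 and 17). The sketch below is userspace C with
hypothetical helper names; possible_count() stands in for
num_possible_nodes() and id_limit() for nr_node_ids, on a 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits; stands in for num_possible_nodes(). */
static unsigned int possible_count(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask != 0; mask &= mask - 1)
		n++;
	return n;
}

/* Highest set bit plus one; stands in for nr_node_ids. */
static unsigned int id_limit(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask != 0; mask >>= 1)
		n++;
	return n;
}

/* True if stats[node] is in bounds for an array of 'slots' entries. */
static int index_fits(unsigned int node, unsigned int slots)
{
	return node < slots;
}

static void example(void)
{
	const uint32_t mask = 0x30003; /* nodes 0, 1, 16, 17 */

	assert(possible_count(mask) == 4);
	assert(id_limit(mask) == 18);
	/* Sized by the count, stats[17] is out of bounds. */
	assert(!index_fits(17, possible_count(mask)));
	/* Sized by the id limit, every possible node id fits. */
	assert(index_fits(17, id_limit(mask)));
}
```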



Re: [PATCH net-next 14/22] vxlan: Flow based tunneling

2015-07-21 Thread Alexei Starovoitov

On 7/21/15 1:43 AM, Thomas Graf wrote:
> This prepares the VXLAN device to be steered by the routing and other
> subsystems which allows to support encapsulation for a large number
> of tunnel endpoints and tunnel ids through a single net_device which
> improves the scalability.

+1. looks very useful.

RX:
> +   info->mode = IP_TUNNEL_INFO_RX;
> +   info->key.tun_flags = TUNNEL_KEY;
> +   info->key.tun_id = cpu_to_be64(vni >> 8);
...
TX:
> +   dst_port = info->key.tp_dst ? : vxlan->dst_port;
> +   vni = be64_to_cpu(info->key.tun_id);


I think the copy paste of ovs_tunnel_info into ip_tunnel_info
can be improved. In particular instead of '__be64 tun_id'
we can use '__u64 tun_id' which will avoid extra byteswaps for rx/tx
paths.

netlink for this part also seems inconsistent.
In the patch 16:
+static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
+   [IP_TUN_ID] = { .type = NLA_U64 },
...
+   if (tb[IP_TUN_ID])
+   tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);

I think nla_get_be64 should be there?
and with my suggestion we can add be64_to_cpu() here instead
of doing it per packet.
Thoughts?



Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay

2015-07-21 Thread Florian Fainelli
On 21/07/15 10:06, Dan Murphy wrote:
> Fix warning: logical ‘or’ of collectively exhaustive tests is always true
> 
> Change the internal delay check from an 'or' condition to an 'and'
> condition.
> 
> Reported-by: David Binderman 
> Signed-off-by: Dan Murphy 

Acked-by: Florian Fainelli 

> ---
>  drivers/net/phy/dp83867.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
> index c7a12e2..8a3bf54 100644
> --- a/drivers/net/phy/dp83867.c
> +++ b/drivers/net/phy/dp83867.c
> @@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
>   return ret;
>   }
>  
> - if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
> + if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
>   (phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
>   val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
>   DP83867_DEVADDR, phydev->addr);
> 


-- 
Florian


Identifying underlying interface from struct sock

2015-07-21 Thread Guru Prasad
Hi,

First, I apologize for posting on the netdev forum. Majordomo did not
list any other network related mailing list.

Is there a way to identify the underlying network interface from an
instance of struct sock? I realize that the socket is abstract and
shouldn't/doesn't necessarily depend on the underlying interface, but
say, with TCP, where the connection is endpoint oriented, shouldn't
this mean that the socket maintains a reference to the interface to
which it is associated?

I tried
dev = dev_get_by_index(sock_net(sk), skb->skb_iif);
and
dev = skb->dev;
but in both cases, dev was NULL.

I'm trying to reference the underlying interface to determine whether
the conditions present in that interface are acceptable for
transmission.


[PATCH] net: phy: dp83867: Fix warning check for setting the internal delay

2015-07-21 Thread Dan Murphy
Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.

Reported-by: David Binderman 
Signed-off-by: Dan Murphy 
---
 drivers/net/phy/dp83867.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index c7a12e2..8a3bf54 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
return ret;
}
 
-   if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
+   if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
(phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
DP83867_DEVADDR, phydev->addr);
-- 
1.9.1
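
The warning is easy to reproduce outside the driver: for any lo <= hi, the
test "x >= lo || x <= hi" holds for every x, so only the "&&" form is a range
check. A minimal sketch, using a stand-in enum rather than the kernel's
phy_interface_t values:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for illustration; not the kernel's enum. */
enum mode {
	MODE_MII,
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID
};

/* Before the fix: true for every x whenever lo <= hi. */
static bool in_range_or(int x, int lo, int hi)
{
	return x >= lo || x <= hi;
}

/* After the fix: true only for lo <= x <= hi. */
static bool in_range_and(int x, int lo, int hi)
{
	return x >= lo && x <= hi;
}

static void example(void)
{
	/* The 'or' form accepts modes on both sides of the range. */
	assert(in_range_or(MODE_MII, MODE_RGMII_ID, MODE_RGMII_RXID));
	assert(in_range_or(MODE_RGMII_TXID, MODE_RGMII_ID, MODE_RGMII_RXID));

	/* The 'and' form accepts only the modes inside it. */
	assert(!in_range_and(MODE_MII, MODE_RGMII_ID, MODE_RGMII_RXID));
	assert(in_range_and(MODE_RGMII_ID, MODE_RGMII_ID, MODE_RGMII_RXID));
	assert(!in_range_and(MODE_RGMII_TXID, MODE_RGMII_ID, MODE_RGMII_RXID));
}
```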



Re: [PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Willem de Bruijn
On Tue, Jul 21, 2015 at 12:38 PM, Martin Blumenstingl
 wrote:
> Hi Willem,
>
> On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn  wrote:
>> Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
>> packet_snd to make the two paths equivalent. The existing check had the
>> ominous statement
>>
>> /* net device doesn't like empty head */
> OK, I guess it's best to find out what the purpose of this comment is.
>
>> so allowing a header-only packet while correct in your case may not be
>> safe in some edge cases (specific device drivers?).
> I'm wondering what a good fix would look like (I can think of a few
> things, like renaming hard_header_len to something like min_packet_size)?
> I am open to suggestions since I have zero knowledge of the inner
> workings of the packet framework.

I don't see a simple way of verifying the safety of allowing packets
without data short of a code audit, which would be huge, especially
when taking device driver logic into account. Perhaps someone
remembers why that statement was added and what edge case(s)
it refers to. I'm afraid that I don't. It was added in 69e3c75f4d54. I
added the author to this thread.

>> This was also discussed previously
>>
>>   http://www.spinics.net/lists/netdev/msg309677.html
>>
>> In any case, I don't think that reverting the patch and restoring the old
>> inconsistent state is a fix.
> I totally agree with you that it's a bad fix if this means that we
> could break other drivers.
> My primary goal was to fix PPPoE connections - I guess I should have
> simply added "RFC" to the subject.
>
>
> Regards,
> Martin


[PATCH] cgroup: net_cls: fix false-positive "suspicious RCU usage"

2015-07-21 Thread Konstantin Khlebnikov
In dev_queue_xmit(), net_cls is protected by rcu-bh.

[  270.730026] ===
[  270.730029] [ INFO: suspicious RCU usage. ]
[  270.730033] 4.2.0-rc3+ #2 Not tainted
[  270.730036] ---
[  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[  270.730041] other info that might help us debug this:
[  270.730043] rcu_scheduler_active = 1, debug_locks = 1
[  270.730045] 2 locks held by dhclient/748:
[  270.730046]  #0:  (rcu_read_lock_bh){..}, at: [] __dev_queue_xmit+0x50/0x960
[  270.730085]  #1:  (&qdisc_tx_lock){+.}, at: [] __dev_queue_xmit+0x240/0x960
[  270.730090] stack backtrace:
[  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[  270.730100]  0001 8800bafeba58 817ad487 0007
[  270.730103]  880232a0a780 8800bafeba88 810ca4f2 88022fb23e00
[  270.730105]  880232a0a780 8800bafebb68 8800bafebb68 8800bafebaa8
[  270.730108] Call Trace:
[  270.730121]  [] dump_stack+0x4c/0x65
[  270.730148]  [] lockdep_rcu_suspicious+0xe2/0x120
[  270.730153]  [] task_cls_state+0x92/0xa0
[  270.730158]  [] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[  270.730164]  [] tc_classify_compat+0x74/0xc0
[  270.730166]  [] tc_classify+0x33/0x90
[  270.730170]  [] htb_enqueue+0xaa/0x4a0 [sch_htb]
[  270.730172]  [] __dev_queue_xmit+0x306/0x960
[  270.730174]  [] ? __dev_queue_xmit+0x50/0x960
[  270.730176]  [] dev_queue_xmit_sk+0x13/0x20
[  270.730185]  [] dev_queue_xmit+0x10/0x20
[  270.730187]  [] packet_snd.isra.62+0x54c/0x760
[  270.730190]  [] packet_sendmsg+0x2f5/0x3f0
[  270.730203]  [] ? sock_def_readable+0x5/0x190
[  270.730210]  [] ? _raw_spin_unlock+0x2b/0x40
[  270.730216]  [] ? unix_dgram_sendmsg+0x5cc/0x640
[  270.730219]  [] sock_sendmsg+0x47/0x50
[  270.730221]  [] sock_write_iter+0x7f/0xd0
[  270.730232]  [] __vfs_write+0xa7/0xf0
[  270.730234]  [] vfs_write+0xb8/0x190
[  270.730236]  [] SyS_write+0x52/0xb0
[  270.730239]  [] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov 
---
 net/core/netclassid_cgroup.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 1f2a126f4ffa..515939034298 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state
 
 struct cgroup_cls_state *task_cls_state(struct task_struct *p)
 {
-   return css_cls_state(task_css(p, net_cls_cgrp_id));
+   return css_cls_state(task_css_check(p, net_cls_cgrp_id,
+  rcu_read_lock_bh_held()));
 }
 EXPORT_SYMBOL_GPL(task_cls_state);
 



Re: [PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Martin Blumenstingl
Hi Willem,

On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn  wrote:
> Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
> packet_snd to make the two paths equivalent. The existing check had the
> ominous statement
>
> /* net device doesn't like empty head */
OK, I guess it's best to find out what the purpose of this comment is.

> so allowing a header-only packet while correct in your case may not be
> safe in some edge cases (specific device drivers?).
I'm wondering what a good fix would look like (I can think of a few
things, like renaming hard_header_len to something like min_packet_size)?
I am open to suggestions since I have zero knowledge of the inner
workings of the packet framework.

> This was also discussed previously
>
>   http://www.spinics.net/lists/netdev/msg309677.html
>
> In any case, I don't think that reverting the patch and restoring the old
> inconsistent state is a fix.
I totally agree with you that it's a bad fix if this means that we
could break other drivers.
My primary goal was to fix PPPoE connections - I guess I should have
simply added "RFC" to the subject.


Regards,
Martin


Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Chris J Arges
On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
> On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> > Some architectures like POWER can have a NUMA node_possible_map that
> > contains sparse entries. This causes memory corruption with openvswitch
> > since it allocates flow_cache with a multiple of num_possible_nodes() and
> 
> Couldn't this also be fixed by just allocating with a multiple of
> nr_node_ids (which seems to have been the original intent all along)?
> You could then make your stats array be sparse or not.
> 

Yea originally this is what I did, but I thought it would be wasting memory.

> > assumes the node variable returned by for_each_node will index into
> > flow->stats[node].
> > 
> > For example, if node_possible_map is 0x30003, this patch will map node to
> > node_cnt as follows:
> > 0,1,16,17 => 0,1,2,3
> > 
> > The crash was noticed after 3af229f2 was applied as it changed the
> > node_possible_map to match node_online_map on boot.
> > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
> 
> My concern with this version of the fix is that you're relying on,
> implicitly, the order of for_each_node's iteration corresponding to the
> entries in stats 1:1. But what about node hotplug? It seems better to
> have the enumeration of the stats array match the topology accurately,
> rather, or to maintain some sort of internal map in the OVS code between
> the NUMA node and the entry in the stats array?
> 
> I'm willing to be convinced otherwise, though :)
> 
> -Nish
>

Nish,

The method I described should work for hotplug since it's using possible map
which AFAIK is static rather than the online map. 

Regardless, the simpler solution to this issue would be to just
allocate nr_node_ids entries and use up the extra memory.

I'll send a v2 after testing it.
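
For readers following the thread, the difference between the two sizes under discussion can be sketched in userspace C. This is only an illustration under stated assumptions: the helper names below are made up for this note and merely stand in for num_possible_nodes(), nr_node_ids and the dense node_cnt numbering; they are not kernel code. The example mask 0x30003 is the one from the patch description.

```c
/* Example mask from the patch description: nodes 0, 1, 16 and 17 possible. */
#define EXAMPLE_POSSIBLE_MAP 0x30003UL

/* Hypothetical stand-in for num_possible_nodes(): number of bits set. */
static unsigned int count_possible_nodes(unsigned long map)
{
	unsigned int n = 0;

	for (; map; map &= map - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* Hypothetical stand-in for nr_node_ids: highest possible node id plus one. */
static unsigned int highest_node_id_plus_one(unsigned long map)
{
	unsigned int n = 0;

	for (; map; map >>= 1)
		n++;
	return n;
}

/*
 * Dense index in the spirit of node_cnt: position of 'node' among the
 * set bits of the map, or -1 if the node is not possible at all.
 */
static int dense_index(unsigned long map, unsigned int node)
{
	unsigned long below;

	if (node >= 8 * sizeof(map) || !(map & (1UL << node)))
		return -1;
	below = map & ((1UL << node) - 1);	/* possible nodes with a lower id */
	return (int)count_possible_nodes(below);
}
```

With the example mask, an array sized by the bit count has 4 slots, so indexing it directly with node id 16 or 17 runs past the end; sizing it by the highest id plus one gives 18 slots, most of them unused. That is the memory versus simplicity trade-off mentioned above.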

--chris

> > Signed-off-by: Chris J Arges 
> > ---
> >  net/openvswitch/flow.c   | 10 ++
> >  net/openvswitch/flow_table.c | 18 +++---
> >  2 files changed, 17 insertions(+), 11 deletions(-)
> > 
> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index bc7b0ab..425d45d 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
> > struct ovs_flow_stats *ovs_stats,
> > unsigned long *used, __be16 *tcp_flags)
> >  {
> > -   int node;
> > +   int node, node_cnt = 0;
> > 
> > *used = 0;
> > *tcp_flags = 0;
> > memset(ovs_stats, 0, sizeof(*ovs_stats));
> > 
> > for_each_node(node) {
> > -   struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]);
> > +   struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node_cnt]);
> > 
> > if (stats) {
> > /* Local CPU may write on non-local stats, so we must
> > @@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
> > ovs_stats->n_bytes += stats->byte_count;
> > spin_unlock_bh(&stats->lock);
> > }
> > +   node_cnt++;
> > }
> >  }
> > 
> >  /* Called with ovs_mutex. */
> >  void ovs_flow_stats_clear(struct sw_flow *flow)
> >  {
> > -   int node;
> > +   int node, node_cnt = 0;
> > 
> > for_each_node(node) {
> > -   struct flow_stats *stats = ovsl_dereference(flow->stats[node]);
> > +   struct flow_stats *stats = ovsl_dereference(flow->stats[node_cnt]);
> > 
> > if (stats) {
> > spin_lock_bh(&stats->lock);
> > @@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
> > stats->tcp_flags = 0;
> > spin_unlock_bh(&stats->lock);
> > }
> > +   node_cnt++;
> > }
> >  }
> > 
> > diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> > index 4613df8..5d10c54 100644
> > --- a/net/openvswitch/flow_table.c
> > +++ b/net/openvswitch/flow_table.c
> > @@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void)
> >  {
> > struct sw_flow *flow;
> > struct flow_stats *stats;
> > -   int node;
> > +   int node, node_cnt = 0;
> > 
> > flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
> > if (!flow)
> > @@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void)
> > 
> > RCU_INIT_POINTER(flow->stats[0], stats);
> > 
> > -   for_each_node(node)
> > +   for_each_node(node) {
> > if (node != 0)
> > -   RCU_INIT_POINTER(flow->stats[node], NULL);
> > +   RCU_INIT_POINTER(flow->stats[node_cnt], NULL);
> > +   node_cnt++;
> > +   }
> > 
> > return flow;
> >  err:
> > @@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)
> > 
> >  static void flow_free(struct sw_flow *flow)
> >  {
> > -   int node;
> > +   int node, node_cnt = 0;
> > 
> > if (ovs_identifier_is_key(&flow->id))
> > kfree

Re: [PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Willem de Bruijn
On Tue, Jul 21, 2015 at 12:14 PM, Martin Blumenstingl
 wrote:
> 9c70776 added validation for the packet size in packet_snd. This change
> enforced that every packet needs a long enough header and at least one
> byte payload.
>
> However, when trying to establish a PPPoE connection the following message
> is printed every time a PPPoE discovery packet is sent:
> pppd: packet size is too short (24 <= 24)
>
> From what I can see in the PPPoE code the "PADI" discovery packet can
> consist of only a header with no payload (when there is neither a service
> name nor a Host-Uniq configured).

Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
packet_snd to make the two paths equivalent. The existing check had the
ominous statement

/* net device doesn't like empty head */

so allowing a header-only packet while correct in your case may not be
safe in some edge cases (specific device drivers?).
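
To make the difference between the existing check and the one proposed in the patch concrete, here is a minimal userspace sketch. The function names are hypothetical and only mirror the two comparisons; the numbers in the usage note come from the pppd message quoted above (24 <= 24).

```c
#include <stdbool.h>

/* Existing behaviour: a packet must carry at least one byte beyond the header. */
static bool rejected_by_existing_check(int len, int hard_header_len)
{
	return len <= hard_header_len;
}

/* Proposed behaviour for packet_snd: only a truncated header is rejected. */
static bool rejected_by_proposed_check(int len, int hard_header_len)
{
	return len < hard_header_len;
}
```

A 24-byte packet on a device with a 24-byte hard header is rejected by the first comparison and accepted by the second. Whether accepting it is safe for every driver is exactly the open question in this thread.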

This was also discussed previously

  http://www.spinics.net/lists/netdev/msg309677.html

In any case, I don't think that reverting the patch and restoring the old
inconsistent state is a fix.


Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and

Couldn't this also be fixed by just allocating with a multiple of
nr_node_ids (which seems to have been the original intent all along)?
You could then make your stats array be sparse or not.

> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
> 
> For example, if node_possible_map is 0x30003, this patch will map node to
> node_cnt as follows:
> 0,1,16,17 => 0,1,2,3
> 
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

My concern with this version of the fix is that you're relying on,
implicitly, the order of for_each_node's iteration corresponding to the
entries in stats 1:1. But what about node hotplug? It seems better to
have the enumeration of the stats array match the topology accurately,
rather, or to maintain some sort of internal map in the OVS code between
the NUMA node and the entry in the stats array?

I'm willing to be convinced otherwise, though :)

-Nish

> Signed-off-by: Chris J Arges 
> ---
>  net/openvswitch/flow.c   | 10 ++
>  net/openvswitch/flow_table.c | 18 +++---
>  2 files changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index bc7b0ab..425d45d 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
>   struct ovs_flow_stats *ovs_stats,
>   unsigned long *used, __be16 *tcp_flags)
>  {
> - int node;
> + int node, node_cnt = 0;
> 
>   *used = 0;
>   *tcp_flags = 0;
>   memset(ovs_stats, 0, sizeof(*ovs_stats));
> 
>   for_each_node(node) {
> - struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]);
> + struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node_cnt]);
> 
>   if (stats) {
>   /* Local CPU may write on non-local stats, so we must
> @@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
>   ovs_stats->n_bytes += stats->byte_count;
>   spin_unlock_bh(&stats->lock);
>   }
> + node_cnt++;
>   }
>  }
> 
>  /* Called with ovs_mutex. */
>  void ovs_flow_stats_clear(struct sw_flow *flow)
>  {
> - int node;
> + int node, node_cnt = 0;
> 
>   for_each_node(node) {
> - struct flow_stats *stats = ovsl_dereference(flow->stats[node]);
> + struct flow_stats *stats = ovsl_dereference(flow->stats[node_cnt]);
> 
>   if (stats) {
>   spin_lock_bh(&stats->lock);
> @@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
>   stats->tcp_flags = 0;
>   spin_unlock_bh(&stats->lock);
>   }
> + node_cnt++;
>   }
>  }
> 
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 4613df8..5d10c54 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void)
>  {
>   struct sw_flow *flow;
>   struct flow_stats *stats;
> - int node;
> + int node, node_cnt = 0;
> 
>   flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
>   if (!flow)
> @@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void)
> 
>   RCU_INIT_POINTER(flow->stats[0], stats);
> 
> - for_each_node(node)
> + for_each_node(node) {
>   if (node != 0)
> - RCU_INIT_POINTER(flow->stats[node], NULL);
> + RCU_INIT_POINTER(flow->stats[node_cnt], NULL);
> + node_cnt++;
> + }
> 
>   return flow;
>  err:
> @@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)
> 
>  static void flow_free(struct sw_flow *flow)
>  {
> - int node;
> + int node, node_cnt = 0;
> 
>   if (ovs_identifier_is_key(&flow->id))
>   kfree(flow->id.unmasked_key);
>   kfree((struct sw_flow_actions __force *)flow->sf_acts);
> - for_each_node(node)
> - if (flow->stats[node])
> + for_each_node(node) {
> + if (flow->stats[node_cnt])
>   kmem_cache_free(flow_stats_cache,
> - (struct flow_stats __force *)flow->stats[node]);
> + (struct flow_stats __force *)flow->stats[node_cnt]);
> + node_cnt++;
> + }
>   kmem_cache_free(flow_cache, flow);
>  }
> 
> -- 
> 1.9.1
> 


[PATCH net-next] mpls: make RTA_OIF optional

2015-07-21 Thread Roopa Prabhu
From: Roopa Prabhu 

If the user did not specify an oif, try to get it from the via address.
If no device can be found, return -ENODEV.

Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c |   67 +++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..4cd3789 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #define LABEL_NOT_SPECIFIED (1<<20)
@@ -327,6 +328,70 @@ static unsigned find_free_label(struct net *net)
return LABEL_NOT_SPECIFIED;
 }
 
+static struct net_device *inet_fib_lookup_dev(struct net *net, void *addr)
+{
+   struct net_device *dev = NULL;
+   struct rtable *rt;
+   struct in_addr daddr;
+
+   memcpy(&daddr, addr, sizeof(struct in_addr));
+   rt = ip_route_output(net, daddr.s_addr, 0, 0, 0);
+   if (IS_ERR(rt))
+   goto errout;
+
+   dev = rt->dst.dev;
+   dev_hold(dev);
+
+   ip_rt_put(rt);
+
+errout:
+   return dev;
+}
+
+static struct net_device *inet6_fib_lookup_dev(struct net *net, void *addr)
+{
+   struct net_device *dev = NULL;
+   struct dst_entry *dst;
+   struct flowi6 fl6;
+
+   memset(&fl6, 0, sizeof(fl6));
+   memcpy(&fl6.daddr, addr, sizeof(struct in6_addr));
+   dst = ip6_route_output(net, NULL, &fl6);
+   if (dst->error)
+   goto errout;
+
+   dev = dst->dev;
+   dev_hold(dev);
+
+errout:
+   dst_release(dst);
+
+   return dev;
+}
+
+static struct net_device *find_outdev(struct net *net,
+ struct mpls_route_config *cfg)
+{
+   struct net_device *dev = NULL;
+
+   if (!cfg->rc_ifindex) {
+   switch (cfg->rc_via_table) {
+   case NEIGH_ARP_TABLE:
+   dev = inet_fib_lookup_dev(net, cfg->rc_via);
+   break;
+   case NEIGH_ND_TABLE:
+   dev = inet6_fib_lookup_dev(net, cfg->rc_via);
+   break;
+   case NEIGH_LINK_TABLE:
+   break;
+   }
+   } else {
+   dev = dev_get_by_index(net, cfg->rc_ifindex);
+   }
+
+   return dev;
+}
+
 static int mpls_route_add(struct mpls_route_config *cfg)
 {
struct mpls_route __rcu **platform_label;
@@ -358,7 +423,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
goto errout;
 
err = -ENODEV;
-   dev = dev_get_by_index(net, cfg->rc_ifindex);
+   dev = find_outdev(net, cfg);
if (!dev)
goto errout;
 
-- 
1.7.10.4



[PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Martin Blumenstingl
9c70776 added validation for the packet size in packet_snd. This change
enforced that every packet needs a long enough header and at least one
byte payload.

However, when trying to establish a PPPoE connection the following message
is printed every time a PPPoE discovery packet is sent:
pppd: packet size is too short (24 <= 24)

From what I can see in the PPPoE code the "PADI" discovery packet can
consist of only a header with no payload (when there is neither a service
name nor a Host-Uniq configured).

Signed-off-by: Martin Blumenstingl 
---
 net/packet/af_packet.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9e8741..d983f8f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2199,18 +2199,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
sock_wfree(skb);
 }
 
-static bool ll_header_truncated(const struct net_device *dev, int len)
-{
-   /* net device doesn't like empty head */
-   if (unlikely(len <= dev->hard_header_len)) {
-   net_warn_ratelimited("%s: packet size is too short (%d <= %d)\n",
-current->comm, len, dev->hard_header_len);
-   return true;
-   }
-
-   return false;
-}
-
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
void *frame, struct net_device *dev, int size_max,
__be16 proto, unsigned char *addr, int hlen)
@@ -2286,8 +2274,14 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
if (unlikely(err < 0))
return -EINVAL;
} else if (dev->hard_header_len) {
-   if (ll_header_truncated(dev, tp_len))
+   /* net device doesn't like empty head */
+   if (unlikely(tp_len <= dev->hard_header_len)) {
+   net_warn_ratelimited("%s: packet size is too short "
+   "(%d <= %d)\n",
+   current->comm, tp_len,
+   dev->hard_header_len);
return -EINVAL;
+   }
 
skb_push(skb, dev->hard_header_len);
err = skb_store_bits(skb, 0, data,
@@ -2624,8 +2618,13 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
if (unlikely(offset < 0))
goto out_free;
} else {
-   if (ll_header_truncated(dev, len))
+   if (unlikely(len < dev->hard_header_len)) {
+   net_warn_ratelimited("%s: packet size is shorter than "
+   "minimum header size (%d < %d)\n",
+   current->comm, len,
+   dev->hard_header_len);
goto out_free;
+   }
}
 
/* Returns -EFAULT on error */
-- 
2.4.6



Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel

2015-07-21 Thread Kalle Valo
Maninder Singh  writes:

>>> chandef is initialized with NULL and on the very next line,
>>> we are using it to get channel, which is not correct.
>>>
>>> channel should be initialized after obtaining chandef.
>>>
>>> Signed-off-by: Maninder Singh 
>
>>How did you find this bug?
>
> Static analysis tools such as Coverity or cppcheck report this bug:
>
> drivers/net/wireless/ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Thanks. This is always good to add to the commit log so I did that:

ath10k: fix wrong initialization of struct channel

chandef is initialized with NULL and on the very next line, we are using it
to get channel, which is not correct. Channel should be initialized after
obtaining chandef.

Found by cppcheck:

ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Signed-off-by: Maninder Singh 
Signed-off-by: Kalle Valo 


-- 
Kalle Valo


Re: [PATCH V2 net-next 0/3] ARM BPF JIT features

2015-07-21 Thread Alexei Starovoitov

On 7/21/15 5:16 AM, Nicolas Schichan wrote:

This series adds support for more instructions to the ARM BPF JIT
namely skb netdevice type retrieval, skb payload offset retrieval, and
skb packet type retrieval.

This allows 35 tests to use the JIT instead of 29 before.

This series depends on the "BPF JIT fixes for ARM" series sent earlier.


Actually in these patches I don't see a strong dependency on 'net' set,
but since you're saying there is, you'd need to resubmit this set after
your 'net' set is merged, whole 'net' sent to Linus and merged
into net-next.


[PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Chris J Arges
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

For example, if node_possible_map is 0x30003, this patch will map node to
node_cnt as follows:
0,1,16,17 => 0,1,2,3

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

Signed-off-by: Chris J Arges 
---
 net/openvswitch/flow.c   | 10 ++
 net/openvswitch/flow_table.c | 18 +++---
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc7b0ab..425d45d 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
struct ovs_flow_stats *ovs_stats,
unsigned long *used, __be16 *tcp_flags)
 {
-   int node;
+   int node, node_cnt = 0;
 
*used = 0;
*tcp_flags = 0;
memset(ovs_stats, 0, sizeof(*ovs_stats));
 
for_each_node(node) {
-   struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]);
+   struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node_cnt]);
 
if (stats) {
/* Local CPU may write on non-local stats, so we must
@@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
ovs_stats->n_bytes += stats->byte_count;
spin_unlock_bh(&stats->lock);
}
+   node_cnt++;
}
 }
 
 /* Called with ovs_mutex. */
 void ovs_flow_stats_clear(struct sw_flow *flow)
 {
-   int node;
+   int node, node_cnt = 0;
 
for_each_node(node) {
-   struct flow_stats *stats = ovsl_dereference(flow->stats[node]);
+   struct flow_stats *stats = ovsl_dereference(flow->stats[node_cnt]);
 
if (stats) {
spin_lock_bh(&stats->lock);
@@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
stats->tcp_flags = 0;
spin_unlock_bh(&stats->lock);
}
+   node_cnt++;
}
 }
 
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..5d10c54 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void)
 {
struct sw_flow *flow;
struct flow_stats *stats;
-   int node;
+   int node, node_cnt = 0;
 
flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
if (!flow)
@@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void)
 
RCU_INIT_POINTER(flow->stats[0], stats);
 
-   for_each_node(node)
+   for_each_node(node) {
if (node != 0)
-   RCU_INIT_POINTER(flow->stats[node], NULL);
+   RCU_INIT_POINTER(flow->stats[node_cnt], NULL);
+   node_cnt++;
+   }
 
return flow;
 err:
@@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)
 
 static void flow_free(struct sw_flow *flow)
 {
-   int node;
+   int node, node_cnt = 0;
 
if (ovs_identifier_is_key(&flow->id))
kfree(flow->id.unmasked_key);
kfree((struct sw_flow_actions __force *)flow->sf_acts);
-   for_each_node(node)
-   if (flow->stats[node])
+   for_each_node(node) {
+   if (flow->stats[node_cnt])
kmem_cache_free(flow_stats_cache,
-   (struct flow_stats __force *)flow->stats[node]);
+   (struct flow_stats __force *)flow->stats[node_cnt]);
+   node_cnt++;
+   }
kmem_cache_free(flow_cache, flow);
 }
 
-- 
1.9.1



[PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring

2015-07-21 Thread Lucas Stach
So it gets freed when the device is going away.
This fixes a DMA memory leak on driver probe() fail and driver
remove().

Signed-off-by: Lucas Stach 
---
v2: Fix indentation of second line to fix alignment with opening bracket.
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 349365d85b92..a7f1bdf718f8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
fep->bufdesc_size;
 
/* Allocate memory for buffer descriptors. */
-   cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
- GFP_KERNEL);
+   cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
+  GFP_KERNEL);
if (!cbd_base) {
return -ENOMEM;
}
-- 
2.1.4



[PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path

2015-07-21 Thread Lucas Stach
This function frees the resources and cancels the delayed work item that
were initialized in fec_ptp_init().

Use this to do proper error handling if something goes wrong in the
probe function after fec_ptp_init() has been called.

Signed-off-by: Lucas Stach 
---
 drivers/net/ethernet/freescale/fec.h  |  1 +
 drivers/net/ethernet/freescale/fec_main.c |  5 ++---
 drivers/net/ethernet/freescale/fec_ptp.c  | 10 ++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 1eee73cccdf5..99d33e2d35e6 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -562,6 +562,7 @@ struct fec_enet_private {
 };
 
 void fec_ptp_init(struct platform_device *pdev);
+void fec_ptp_stop(struct platform_device *pdev);
 void fec_ptp_start_cyclecounter(struct net_device *ndev);
 int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
 int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a7f1bdf718f8..32e3807c650e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3494,6 +3494,7 @@ failed_register:
 failed_mii_init:
 failed_irq:
 failed_init:
+   fec_ptp_stop(pdev);
if (fep->reg_phy)
regulator_disable(fep->reg_phy);
 failed_regulator:
@@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
 
-   cancel_delayed_work_sync(&fep->time_keep);
cancel_work_sync(&fep->tx_timeout_work);
+   fec_ptp_stop(pdev);
unregister_netdev(ndev);
fec_enet_mii_remove(fep);
if (fep->reg_phy)
regulator_disable(fep->reg_phy);
-   if (fep->ptp_clock)
-   ptp_clock_unregister(fep->ptp_clock);
of_node_put(fep->phy_node);
free_netdev(ndev);
 
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index a15663ad7f5e..f457a23d0bfb 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev)
schedule_delayed_work(&fep->time_keep, HZ);
 }
 
+void fec_ptp_stop(struct platform_device *pdev)
+{
+   struct net_device *ndev = platform_get_drvdata(pdev);
+   struct fec_enet_private *fep = netdev_priv(ndev);
+
+   cancel_delayed_work_sync(&fep->time_keep);
+   if (fep->ptp_clock)
+   ptp_clock_unregister(fep->ptp_clock);
+}
+
 /**
  * fec_ptp_check_pps_event
  * @fep: the fec_enet_private structure handle
-- 
2.1.4



Re: [2/2] iwlegacy: convert hex_dump_to_buffer() to %*ph

2015-07-21 Thread Kalle Valo

> There is no need to use hex_dump_to_buffer() in cases like this:
> 
>   hex_dump_to_buffer(buf, len, 16, 1, outbuf, outlen, false);  /* len <= 16 */
>   sprintf("%s\n", outbuf);
> 
> since it may easily be converted to a simple:
> 
>   sprintf("%*ph\n", len, buf);
> 
> Note: in this case the output seems to be grouped by 2 bytes, which looks
> like a typo. Thus, the patch changes that to a plain byte stream.
> 
> Signed-off-by: Andy Shevchenko 
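
As background for the conversion above: the kernel's %*ph specifier prints a small buffer as space-separated hex bytes. It is a kernel printk extension and is not available in a userspace libc, so the sketch below only emulates the output shape with a hypothetical helper, format_hex_bytes(), which is not part of any kernel or libc API.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format len bytes of buf as lowercase hex separated by single spaces,
 * e.g. "de ad be ef". Returns the number of characters written (not
 * counting the terminating NUL), or -1 if out is too small.
 */
static int format_hex_bytes(char *out, size_t outlen,
			    const unsigned char *buf, size_t len)
{
	size_t i, pos = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	for (i = 0; i < len; i++) {
		/* No separator before the first byte. */
		int n = snprintf(out + pos, outlen - pos,
				 i ? " %02x" : "%02x", buf[i]);

		if (n < 0 || (size_t)n >= outlen - pos)
			return -1;	/* output would have been truncated */
		pos += (size_t)n;
	}
	return (int)pos;
}
```

The point of the patch is that in kernel code this whole helper collapses into a single format specifier, with the length passed as the field width.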

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


Re: [v2] brcmsmac: Use kstrdup to simplify code

2015-07-21 Thread Kalle Valo

> Replace a kmalloc+strcpy by an equivalent kstrdup in order to improve
> readability.
> 
> Signed-off-by: Christophe JAILLET 
> Acked-by: Arend van Spriel 

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


Re: rtlwifi: rtl8821ae: Fix an expression that is always false

2015-07-21 Thread Kalle Valo

> In routine _rtl8821ae_set_media_status(), an incorrect mask results in a test
> for AP status to always be false. Similar bugs were fixed in rtl8192cu and
> rtl8192de, but this instance was missed at that time.
> 
> Reported-by: David Binderman 
> Signed-off-by: Larry Finger 
> Cc: Stable  [3.18+]
> Cc: David Binderman 

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


Re: [PATCH] netcp:Fix error handling in the function netcp_xgbe_serdes_config

2015-07-21 Thread Murali Karicheri

On 07/20/2015 11:54 AM, Nicholas Krause wrote:

This fixes error handling in netcp_xgbe_serdes_config() by storing the
return value of netcp_xgbe_serdes_check_lane() in the variable ret and
returning it to the caller, since that function can fail with the
error code -ETIMEDOUT.

Signed-off-by: Nicholas Krause 
---
  drivers/net/ethernet/ti/netcp_xgbepcsr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_xgbepcsr.c b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
index 33571ac..0c79e3d 100644
--- a/drivers/net/ethernet/ti/netcp_xgbepcsr.c
+++ b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
@@ -483,7 +483,7 @@ static int netcp_xgbe_serdes_config(void __iomem *serdes_regs,
return ret;

netcp_xgbe_serdes_enable_xgmii_port(sw_regs);
-   netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
+   ret = netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
return ret;
  }



Nicholas,

Thanks for the patch.

Acked-by: Murali Karicheri 

--
Murali Karicheri
Linux Kernel, Keystone


Re: [PATCH net v2] macvtap: fix network header pointer for VLAN tagged pkts

2015-07-21 Thread Toshiaki Makita
On 15/07/21 (火) 16:18, Ivan Vecera wrote:
> Network header is set with offset ETH_HLEN but it is not true for VLAN
> (multiple-)tagged and results in checksum issues in lower devices.
> 
> v2: leave skb->protocol untouched (thx Vlad), comment added
> 
> Signed-off-by: Ivan Vecera 
> ---
>   drivers/net/macvtap.c | 7 +++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 3b933bb..b75776b 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -796,6 +796,13 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>   skb_reset_mac_header(skb);
>   skb->protocol = eth_hdr(skb)->h_proto;
>   
> + /* Move network header to the right position for VLAN tagged packets */
> + if (skb_vlan_tagged(skb)) {

I guess you don't need the condition skb_vlan_tag_present(skb), i.e.,

if (skb->protocol == htons(ETH_P_8021Q) ||
skb->protocol == htons(ETH_P_8021AD))

> + int depth;
> + __vlan_get_protocol(skb, skb->protocol, &depth);

__vlan_get_protocol() can fail, and then, depth will not be initialized.

> + skb_set_network_header(skb, depth);

I think you should set network_header after
skb_probe_transport_header(). It calls skb_flow_dissect_flow_keys(),
which seems to expect network_header to be ETH_HLEN.
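
The concern about an uninitialized depth can be illustrated with a small userspace sketch that walks the VLAN tags of a raw Ethernet frame and reports where the network header starts. This is an assumption-laden illustration, not the macvtap code: network_header_offset() and the EX_ constants are made up here, and the function deliberately returns an explicit error instead of leaving a result undefined.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_ETH_ALEN	6	/* bytes in a MAC address */
#define EX_ETH_HLEN	14	/* dst MAC + src MAC + ethertype */
#define EX_VLAN_HLEN	4	/* TCI + encapsulated ethertype */
#define EX_ETH_P_8021Q	0x8100
#define EX_ETH_P_8021AD	0x88A8

/* Read a big-endian 16-bit value. */
static uint16_t be16_at(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the offset of the network header in an Ethernet frame,
 * skipping any number of stacked VLAN tags, or -1 if the frame is
 * too short to contain the headers it announces.
 */
static int network_header_offset(const uint8_t *frame, size_t len)
{
	size_t off = EX_ETH_HLEN;
	uint16_t proto;

	if (len < EX_ETH_HLEN)
		return -1;

	proto = be16_at(frame + 2 * EX_ETH_ALEN);
	while (proto == EX_ETH_P_8021Q || proto == EX_ETH_P_8021AD) {
		if (len < off + EX_VLAN_HLEN)
			return -1;	/* truncated tag: report failure */
		proto = be16_at(frame + off + 2);	/* encapsulated type */
		off += EX_VLAN_HLEN;
	}
	return (int)off;
}
```

An untagged frame yields 14, a single-tagged frame 18, and a double-tagged (802.1ad plus 802.1Q) frame 22. A caller has to check for -1 before using the offset, which corresponds to checking the return value of the protocol lookup before trusting depth.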

Toshiaki Makita


ARP response with link local IP, why not broadcast

2015-07-21 Thread Sebastian Fett

Hello!

According to RFC 3927 (see section 2.5), every ARP packet (reply and
request) should be sent as a link-layer broadcast as long as the sender
IP is a link-local address.
That functionality would help me a lot with a use case I have with our 
application.
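
The behaviour asked about can be sketched in userspace C as follows. This is only an illustration of the rule as described above, not how the kernel ARP code is structured; both function names are hypothetical, and addresses are taken in host byte order for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if addr (host byte order) lies in 169.254.0.0/16. */
static bool ipv4_is_link_local(uint32_t addr)
{
	return (addr & 0xFFFF0000u) == 0xA9FE0000u;
}

/*
 * Pick the link-layer destination for an ARP reply: broadcast when the
 * sender IP is link-local, otherwise unicast to the requester.
 */
static void choose_arp_dest_mac(uint32_t sender_ip,
				const uint8_t requester_mac[6],
				uint8_t dest_mac[6])
{
	if (ipv4_is_link_local(sender_ip))
		memset(dest_mac, 0xff, 6);	/* ff:ff:ff:ff:ff:ff */
	else
		memcpy(dest_mac, requester_mac, 6);
}
```

For a sender address of 169.254.1.2 the destination becomes the broadcast address, while for 192.168.1.1 the reply goes straight back to the requester.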


But it is not implemented in the kernel that way.
Does anyone know why?

Regards,
Sebastian


[PATCH net] netlink: don't hold mutex in rcu callback when releasing mmapd ring

2015-07-21 Thread Florian Westphal
Kirill A. Shutemov says:

This simple test-case triggers a few locking asserts in the kernel:

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/netlink.h>	/* struct nl_mmap_req, NETLINK_{RX,TX}_RING */

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

int main(int argc, char **argv)
{
	unsigned int block_size = 16 * 4096;
	struct nl_mmap_req req = {
		.nm_block_size	= block_size,
		.nm_block_nr	= 64,
		.nm_frame_size	= 16384,
		.nm_frame_nr	= 64 * block_size / 16384,
	};
	unsigned int ring_size;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
	if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
		exit(1);
	if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
		exit(1);

	ring_size = req.nm_block_nr * req.nm_block_size;
	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
 #0:  (reboot_mutex){+.+...}, at: [] SyS_reboot+0xa9/0x220
 #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [] __blocking_notifier_call_chain+0x39/0x70
 #2:  (rcu_callback){..}, at: [] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-9-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
 88017b3d8000 88027bc03c38 81929ceb 0102
  88027bc03c68 81085a9d 0002
 81ca2a20 0268  88027bc03c98
Call Trace:
   [] dump_stack+0x4f/0x7b
 [] ___might_sleep+0x16d/0x270
 [] __might_sleep+0x4d/0x90
 [] mutex_lock_nested+0x2f/0x430
 [] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [] ? __this_cpu_preempt_check+0x13/0x20
 [] netlink_set_ring+0x1ed/0x350
 [] ? netlink_undo_bind+0x70/0x70
 [] netlink_sock_destruct+0x80/0x150
 [] __sk_free+0x1d/0x160
 [] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold mutex lock in a rcu callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
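
The split suggested here, a locked configuration path plus a lock-free release path for the dead socket, can be modeled in userspace roughly as below. This is a sketch under assumptions, not the patch itself: struct sock_model, ring_swap(), ring_set() and ring_release() are hypothetical names, and a pthread mutex stands in for pg_vec_lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for struct netlink_ring / netlink_sock. */
struct ring_model {
	void **pg_vec;
	unsigned int pg_vec_len;
};

struct sock_model {
	pthread_mutex_t pg_vec_lock;	/* may sleep: process context only */
	int mapped;			/* non-zero while the ring is mmap()ed */
	struct ring_model ring;
};

static void free_vec(void **vec, unsigned int len)
{
	for (unsigned int i = 0; i < len; i++)
		free(vec[i]);
	free(vec);
}

/*
 * Lock-free core: install the new vector and free the old one.
 * The caller must guarantee that nobody else touches the ring.
 */
static void ring_swap(struct sock_model *sk, void **vec, unsigned int len)
{
	void **old = sk->ring.pg_vec;
	unsigned int old_len = sk->ring.pg_vec_len;

	sk->ring.pg_vec = vec;
	sk->ring.pg_vec_len = len;
	if (old)
		free_vec(old, old_len);
}

/* Configuration path (setsockopt-like): takes the sleeping lock. */
static int ring_set(struct sock_model *sk, unsigned int nblocks)
{
	void **vec = NULL;

	if (sk->mapped)
		return -EBUSY;
	if (nblocks) {
		vec = calloc(nblocks, sizeof(*vec));
		if (!vec)
			return -ENOMEM;
		for (unsigned int i = 0; i < nblocks; i++) {
			vec[i] = malloc(4096);
			if (!vec[i]) {
				free_vec(vec, i);
				return -ENOMEM;
			}
		}
	}
	pthread_mutex_lock(&sk->pg_vec_lock);
	ring_swap(sk, vec, nblocks);
	pthread_mutex_unlock(&sk->pg_vec_lock);
	return 0;
}

/* Teardown path (destructor-like): the socket is dead, so no lock is taken. */
static void ring_release(struct sock_model *sk)
{
	ring_swap(sk, NULL, 0);
}
```

The point of the split is that only ring_set() may block, so ring_release() is safe to call from a context that must not sleep, such as an RCU callback.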

Reported-by: "Kirill A. Shutemov" 
Diagnosed-by: Cong Wang 
Suggested-by: Thomas Graf 
Signed-off-by: Florian Westphal 
---
 net/netlink/af_netlink.c | 79 
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9a0ae71..d8e2e39 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -357,25 +357,52 @@ err1:
return NULL;
 }
 
+
+static void
+__netlink_set_ring(struct sock *sk, struct nl_mmap_req *req, bool tx_ring, void **pg_vec,
+  unsigned int order)
+{
+   struct netlink_sock *nlk = nlk_sk(sk);
+   struct sk_buff_head *queue;
+   struct netlink_ring *ring;
+
+   queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
+   ring  = tx_ring ? &nlk->tx_ring : &nlk->rx_ring;
+
+   spin_lock_bh(&queue->lock);
+
+   ring->frame_max = req->nm_frame_nr - 1;
+   ring->head  = 0;
+   ring->frame_size= req->nm_frame_size;
+   ring->pg_vec_pages  = req->nm_block_size / PAGE_SIZE;
+
+   swap(ring->pg_vec_len, req->nm_block_nr);
+   swap(ring->pg_vec_order, order);
+   swap(ring->pg_vec, pg_vec);
+
+   __skb_queue_purge(queue);
+   spin_unlock_bh(&queue->lock);
+
+   WARN_ON(atomic_read(&nlk->mapped));
+
+   if (pg_vec)
+   free_pg_vec(pg_vec, order, req->nm_block_nr);
+}
+
 static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req,
-   bool closing, bool tx_ring)
+   bool tx_ring)
 {
struct netlink_sock *nlk = nlk_sk(sk);
struct netlink_ring *ring;
-   struct sk_buff_head *queue;
void **pg_vec = NULL;
unsigned int order = 0;
-   int err;
 
ring  = tx_ring ? &nlk->tx_ring : &nlk->rx_ring;
-   queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
 
-   if (!closing) {
-   if (atomic_read(&nlk->mapped))
-   return -EBUSY;
-   if (atomic_read(&ring->pending))
-   return -EBUSY;
-   }
+   if (atomic_read(&nlk->mapped))
+   return -EBUSY;
+   if (atomic_read(&ring->pending))
+   return -EBUSY;
 
if (req->nm_block_nr) {
if (ring->pg_vec != NULL)
@@ -407,31 +434,19 @@ static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req,
return -EINVAL;
}
 
-   err = -EBUSY;
mutex_lock(&nlk->pg_vec_lock);
-   if (closing || atomic_read(&nlk->mapped) == 0) {
-  

Re: Several races in "usbnet" module (kernel 4.1.x)

2015-07-21 Thread Oliver Neukum
On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:
> And here, the code clears EVENT_RX_KILL bit in dev->flags, which may 
> execute concurrently with the above operation:
> #0 clear_bit (bitops.h:113, inlined)
> #1 usbnet_bh (usbnet.c:1475)
> /* restart RX again after disabling due to high error rate */
> clear_bit(EVENT_RX_KILL, &dev->flags);
> 
> If clear_bit() is atomic w.r.t. setting dev->flags to 0, this race is 
> not a problem, I guess. Otherwise, it may be.

clear_bit is atomic with respect to other atomic operations.
So how about this:

Regards
Oliver

From 1c4e685b3a9c183e04c46b661830e5c7ed35b513 Mon Sep 17 00:00:00 2001
From: Oliver Neukum 
Date: Tue, 21 Jul 2015 16:19:40 +0200
Subject: [PATCH] usbnet: fix race between usbnet_stop() and the BH

Does this do the job?

Signed-off-by: Oliver Neukum 
---
 drivers/net/usb/usbnet.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..77a9a86 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net)
 {
struct usbnet   *dev = netdev_priv(net);
struct driver_info  *info = dev->driver_info;
-   int retval, pm;
+   int retval, pm, mpn;
 
clear_bit(EVENT_DEV_OPEN, &dev->flags);
netif_stop_queue (net);
@@ -813,14 +813,17 @@ int usbnet_stop (struct net_device *net)
 * can't flush_scheduled_work() until we drop rtnl (later),
 * else workers could deadlock; so make workers a NOP.
 */
+   mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
dev->flags = 0;
del_timer_sync (&dev->delay);
tasklet_kill (&dev->bh);
+   mpn |= !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
+   /* in case the bh reset a flag */
+   dev->flags = 0;
if (!pm)
usb_autopm_put_interface(dev->intf);
 
-   if (info->manage_power &&
-   !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
+   if (info->manage_power && mpn)
info->manage_power(dev, 0);
else
usb_autopm_put_interface(dev->intf);
-- 
2.1.4
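
The remark that clear_bit() is atomic only with respect to other atomic operations can be illustrated with C11 atomics in userspace. The *_model helpers below are hypothetical analogues, not the kernel bitops; they assume the kernel semantics that test_and_clear_bit() returns the previous value of the bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit nr; other bits are left untouched. */
static void set_bit_model(unsigned int nr, atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1UL << nr);
}

/* Atomically clear bit nr; concurrent atomic updates of other bits survive. */
static void clear_bit_model(unsigned int nr, atomic_ulong *flags)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

/* Atomically clear bit nr and report whether it was set before. */
static bool test_and_clear_bit_model(unsigned int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

static bool test_bit_model(unsigned int nr, atomic_ulong *flags)
{
	return (atomic_load(flags) & (1UL << nr)) != 0;
}
```

Each helper is a single read-modify-write, so two of them running concurrently cannot lose each other's bits. A whole-word store such as dev->flags = 0 is not ordered against them in the same way: a bit set by the BH between the store and tasklet_kill() survives, which is why the patch resets the word a second time.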




