RE: A Regression between v4.2-rc2 and v4.2-rc3
From: Peter Chen Sent: Wednesday, July 22, 2015 1:45 PM
> To: netdev@vger.kernel.org
> Cc: Andrew Lunn; Duan Fugang-B38611; David S. Miller
> Subject: A Regression between v4.2-rc2 and v4.2-rc3
>
> Hi List,
>
> I ran into a kernel oops [2] with nfsroot on several imx6 boards after
> rebasing to v4.2-rc3; after reverting the patch below [1], it is OK.
> This patch just adds runtime PM for the IPG clock, so I wonder why it
> was taken as a bug fix.
>
> [1]
> commit 6c3e921b18edca290099adfddde8a50236bf2d80
> Author: Andrew Lunn
> Date: Mon Jul 6 20:34:55 2015 +0200
>
> net: fec: Ensure clocks are enabled while using mdio bus
>
> When a switch is attached to the mdio bus, the mdio bus can be used
> while the interface is not open. If the IPG clock is not enabled,
> MDIO reads/writes will simply time out.
>
> Add support for runtime PM to control this clock. Enable/disable this
> clock using runtime PM, with open()/close() and mdio read()/write()
> function triggering runtime PM operations. Since PM is optional, the
> IPG clock is enabled at probe and is no longer modified by
> fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
> guaranteed the clock is running when MDIO operations are performed.
>
> Signed-off-by: Andrew Lunn
> Acked-by: Fugang Duan
> Signed-off-by: David S. Miller
>

The patch was reverted last week.

Regards,
Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
A Regression between v4.2-rc2 and v4.2-rc3
Hi List,

I ran into a kernel oops [2] with nfsroot on several imx6 boards after
rebasing to v4.2-rc3; after reverting the patch below [1], it is OK.
This patch just adds runtime PM for the IPG clock, so I wonder why it
was taken as a bug fix.

[1]
commit 6c3e921b18edca290099adfddde8a50236bf2d80
Author: Andrew Lunn
Date: Mon Jul 6 20:34:55 2015 +0200

net: fec: Ensure clocks are enabled while using mdio bus

When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the IPG clock is not enabled, MDIO
reads/writes will simply time out.

Add support for runtime PM to control this clock. Enable/disable this
clock using runtime PM, with open()/close() and mdio read()/write()
function triggering runtime PM operations. Since PM is optional, the
IPG clock is enabled at probe and is no longer modified by
fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
guaranteed the clock is running when MDIO operations are performed.

Signed-off-by: Andrew Lunn
Acked-by: Fugang Duan
Signed-off-by: David S. Miller

[2]
[2.534260] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc3 #387
[2.540618] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[2.546455] Backtrace:
[2.548933] [<80014e00>] (dump_backtrace) from [<80015048>] (show_stack+0x20/0x24)
[2.556506] r6:80cd9db0 r5: r4: r3:
[2.562234] [<80015028>] (show_stack) from [<808b0094>] (dump_stack+0x8c/0xa4)
[2.569467] [<808b0008>] (dump_stack) from [<80077b58>] (__lock_acquire+0x1d24/0x1ecc)
[2.577385] r6: r5: r4:80e7d900 r3:0001
[2.583107] [<80075e34>] (__lock_acquire) from [<80078608>] (lock_acquire+0xa4/0x124)
[2.590937] r10:6193 r9:80d3e5c0 r8: r7: r6: r5:be0bdae0
[2.598839] r4:
[2.601400] [<80078564>] (lock_acquire) from [<80095870>] (call_timer_fn+0x78/0x1a0)
[2.609144] r10:0001 r9:0020 r8:bd8cd830 r7:0100 r6:80d3e610 r5:be0bdae0
[2.617045] r4:bd8cd854
[2.619600] [<800957f8>] (call_timer_fn) from [<80095a84>] (run_timer_softirq+0xec/0x2a4)
[2.62] r10:bd8cd854 r9:0020 r8:bd8cd830 r7:0020 r6:80d3e610 r5:00c9
[2.635679] r4:be7be440
[2.638238] [<80095998>] (run_timer_softirq) from [<80033cbc>] (__do_softirq+0xdc/0x364)
[2.646329] r10:0100 r9:0004 r8:0001 r7:80d3e32c r6:0202 r5:0001
[2.654230] r4:80c92084
[2.656785] [<80033be0>] (__do_softirq) from [<800342b4>] (irq_exit+0xcc/0x140)
[2.664094] r10:0001 r9:be01e000 r8:0001 r7: r6:80c932d4 r5:
[2.671995] r4:80c8d654
[2.674549] [<800341e8>] (irq_exit) from [<80084254>] (__handle_domain_irq+0x7c/0xf0)
[2.682380] r4:80c8d654 r3:0125
[2.685990] [<800841d8>] (__handle_domain_irq) from [<800095a8>] (gic_handle_irq+0x30/0x70)
[2.694342] r9: r8:07c1 r7:c080e100 r6:80c934bc r5:c080e10c r4:be0bdc18
[2.702163] [<80009578>] (gic_handle_irq) from [<80015c24>] (__irq_svc+0x44/0x5c)
[2.709647] Exception stack(0xbe0bdc18 to 0xbe0bdc60)
[2.714702] dc00: 0001 be1103f8
[2.722887] dc20: 6193 2113 80cda1a4 2113 2113 07c1
[2.731068] dc40: 0001 be0bdc74 80e8ecc0 be0bdc60 800758d8 808ba154 2113
[2.739245] r7:be0bdc4c r6: r5:2113 r4:808ba154
[2.744976] [<808ba110>] (_raw_spin_unlock_irqrestore) from [<80385dd0>] (add_dma_entry+0xa4/0x164)
[2.754023] r5:02f4f305 r4:
[2.757633] [<80385d2c>] (add_dma_entry) from [<803861e0>] (debug_dma_map_page+0x108/0x120)
[2.765984] r7:be1e4010 r6:bef98980 r5:bd3cc140 r4:be280c00
[2.771714] [<803860d8>] (debug_dma_map_page) from [<8051b604>] (fec_enet_new_rxbdp.isra.36+0xe4/0x148)
[2.781107] r10:be1e4010 r9:0002 r8:bd8c9000 r7:bd3cc140 r6:007a7980 r5:07c1
[2.789008] r4:0140 r3:07c1
[2.792619] [<8051b520>] (fec_enet_new_rxbdp.isra.36) from [<8051c558>] (fec_enet_open+0x98/0x570)
[2.801578] r10:bd8cc0f0 r9:003c r8:bd8c9640 r7:bd8cc000 r6:bd254480 r5:bd8c9000
[2.809481] r4:bf088780
[2.812043] [<8051c4c0>] (fec_enet_open) from [<806873a4>] (__dev_open+0xb8/0x120)
[2.819613] r10:80d20f00 r9:bd8c9000 r8: r7:bd8c9030 r6:809194c4 r5:
[2.827515] r4:bd8c9000
[2.830069] [<806872ec>] (__dev_open) from [<80687680>] (__dev_change_flags+0x98/0x158)
[2.838073] r7:1002 r6:1003 r5:0001 r4:bd8c9000
[2.843797] [<806875e8>] (__dev_change_flags) from [<80687768>] (dev_change_flags+0x28/0x58)
[2.852235] r8: r7:80d20ff0 r6:1002 r5:bd8c9138 r4:bd8c9000 r3:80c678b4
[2.860060] [<80687740>] (dev_change_flags) from [<80c43ed8>] (ip_auto_config.part.14+0x184/0x1020)
[2.869106] r8:80d20f00 r7:80d20ff0 r6:80d20ff0 r5:
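The commit message in [1] describes taking a runtime PM reference around each MDIO access, so that the IPG clock is running for the duration of the transfer. Below is a minimal userspace sketch of that pattern, assuming a plain usage counter stands in for the kernel's runtime PM core. The names clk_model, pm_get, pm_put, mdio_read_raw and mdio_read are hypothetical and are not taken from the fec driver.

```c
#include <stdbool.h>

/* Toy model of a runtime-PM-managed clock: enabled while usage > 0. */
struct clk_model {
	int usage;
	bool enabled;
};

/* Take a reference; the first user switches the clock on. */
static void pm_get(struct clk_model *c)
{
	if (c->usage++ == 0)
		c->enabled = true;
}

/* Drop a reference; the last user switches the clock off. */
static void pm_put(struct clk_model *c)
{
	if (--c->usage == 0)
		c->enabled = false;
}

/* Raw access: fails (models an MDIO timeout) when the clock is gated. */
static int mdio_read_raw(const struct clk_model *c, int reg, int *val)
{
	if (!c->enabled)
		return -1;
	*val = 0x1000 + reg;	/* dummy register contents for the sketch */
	return 0;
}

/* Wrapped access: hold a PM reference for the duration of the read. */
static int mdio_read(struct clk_model *c, int reg, int *val)
{
	int ret;

	pm_get(c);
	ret = mdio_read_raw(c, reg, val);
	pm_put(c);
	return ret;
}
```

With the interface closed (usage == 0), mdio_read_raw() fails in this model while mdio_read() succeeds and leaves the clock gated again afterwards. This only illustrates the design described in the commit message; it says nothing about the cause of the oops in [2].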
Re: [RFC PATCH v2 net-next 3/3] tcp: add NV congestion control
On Tue, Jul 21, 2015 at 9:21 PM, Lawrence Brakmo wrote:
> This is a request for comments.
>
> TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of
> NV was presented at 2010's LPC (slides). It is a delayed based
> congestion avoidance for the data center. This version has been tested
> within a 10G rack where the HW RTTs are 20-50us.
>
> A description of TCP-NV, including implementation and experimental
> results, can be found at:
> http://www.brakmo.org/networking/tcp-nv/TCPNV.html
>
> The current version includes many module parameters to support
> experimentation with the parameters.
>
> Signed-off-by: Lawrence Brakmo
> ---
>  include/net/tcp.h | 1 +
>  net/ipv4/Kconfig | 16 ++
>  net/ipv4/Makefile | 1 +
>  net/ipv4/sysctl_net_ipv4.c | 9 +
>  net/ipv4/tcp_input.c | 2 +
>  net/ipv4/tcp_nv.c | 479 +
>  6 files changed, 508 insertions(+)
>  create mode 100644 net/ipv4/tcp_nv.c
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2e62efe..c0690ae 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
>  extern int sysctl_tcp_min_tso_segs;
>  extern int sysctl_tcp_autocorking;
>  extern int sysctl_tcp_invalid_ratelimit;
> +extern int sysctl_tcp_nv_enable;
>
>  extern atomic_long_t tcp_memory_allocated;
>  extern struct percpu_counter tcp_sockets_allocated;
> diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
> index 6fb3c90..c37b374 100644
> --- a/net/ipv4/Kconfig
> +++ b/net/ipv4/Kconfig
> @@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
>  	window. TCP Vegas should provide less packet loss, but it is
>  	not as aggressive as TCP Reno.
>
> +config TCP_CONG_NV
> +	tristate "TCP NV"
> +	default m
> +	---help---
> +	TCP NV is a follow up to TCP Vegas. It has been modified to deal with
> +	10G networks, measurement noise introduced by LRO, GRO and interrupt
> +	coalescence. In addition, it will decrease its cwnd multiplicative

multiplicatively

> +	instead of linearly.
> +
> +	Note that in general congestion avoidance (cwnd decreased when # packets
> +	queued grows) cannot coexist with congestion control (cwnd decreased only
> +	when there is packet loss) due to fairness issues. One scenario when the

s/the/they

> +	can coexist safely is when the CA flows have RTTs << CC flows RTTs.
> +
> +	For further details see http://www.brakmo.org/networking/tcp-nv/
> +
>  config TCP_CONG_SCALABLE
>  	tristate "Scalable TCP"
>  	default n
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> index efc43f3..06f335f 100644
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
>  obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
>  obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
>  obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
> +obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
>  obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
>  obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
>  obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 433231c..31846d5 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -730,6 +730,15 @@ static struct ctl_table ipv4_table[] = {
>  		.proc_handler = proc_dointvec_ms_jiffies,
>  	},
>  	{
> +		.procname = "tcp_nv_enable",
> +		.data = &sysctl_tcp_nv_enable,
> +		.maxlen = sizeof(int),
> +		.mode = 0644,
> +		.proc_handler = proc_dointvec_minmax,
> +		.extra1 = &zero,
> +		.extra2 = &one,
> +	},
> +	{
>  		.procname = "icmp_msgs_per_sec",
>  		.data = &sysctl_icmp_msgs_per_sec,
>  		.maxlen = sizeof(int),
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index aca4ae5..87560d9 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
>  int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
>  int sysctl_tcp_early_retrans __read_mostly = 3;
>  int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
> +int sysctl_tcp_nv_enable __read_mostly = 1;
> +EXPORT_SYMBOL(sysctl_tcp_nv_enable);
>
>  #define FLAG_DATA 0x01 /* Incoming frame contained data. */
>  #define FLAG_WIN_UPDATE 0x02 /* Incoming ACK was a window update. */
> diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
> new file mode 100644
> index 000..af451b6
> --- /dev/null
> +++ b/net/ipv4/tcp_nv.c
> @@ -0,0 +1,479 @@
> +/*
> + * TCP NV: TCP with Congestion Avoidance
> + *
> + * TCP-NV is a successor of TCP-Vegas that has been developed to
> + * deal wi
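The Kconfig help text quoted above says NV decreases cwnd multiplicatively instead of linearly. The difference can be sketched as follows. This is only an illustration: the function names and the num/den factor are hypothetical, and the actual reduction used by tcp_nv.c is not visible in the quoted part of the patch.

```c
#include <stdint.h>

/* Linear decrease: give up one segment per step, floor at min_cwnd. */
static uint32_t decrease_linear(uint32_t cwnd, uint32_t min_cwnd)
{
	return cwnd > min_cwnd ? cwnd - 1 : min_cwnd;
}

/*
 * Multiplicative decrease: scale cwnd by num/den (a hypothetical factor),
 * again never going below min_cwnd. The 64-bit intermediate avoids overflow.
 */
static uint32_t decrease_multiplicative(uint32_t cwnd, uint32_t num,
					uint32_t den, uint32_t min_cwnd)
{
	uint32_t next = (uint32_t)(((uint64_t)cwnd * num) / den);

	return next > min_cwnd ? next : min_cwnd;
}
```

For a window of 100 segments, one linear step gives 99, while one multiplicative step with an assumed factor of 7/8 gives 87, so the multiplicative form backs off much faster when the window is large.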
[net-next:master 83/84] af_mpls.c:undefined reference to `ip6_route_output'
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head: 0b2c2a931a051e75f9df429b520bb2c2f2bb056b
commit: 01faef2cebae02685e2bcfc9bbee8416d5ec19fc [83/84] mpls: make RTA_OIF optional
config: i386-randconfig-n1-201529 (attached as .config)
reproduce:
  git checkout 01faef2cebae02685e2bcfc9bbee8416d5ec19fc
  # save the attached .config to linux build tree
  make ARCH=i386

All error/warnings (new ones prefixed by >>):

   net/built-in.o: In function `find_outdev':
>> af_mpls.c:(.text+0x194fef): undefined reference to `ip6_route_output'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

# # Automatically generated file; DO NOT EDIT. # Linux/i386 4.2.0-rc2 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_SMP=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=3 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set CONFIG_KERNEL_LZ4=y CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y # CONFIG_TASK_DELAY_ACCT is not set # CONFIG_TASK_XACCT is not set # # RCU Subsystem # CONFIG_PREEMPT_RCU=y CONFIG_RCU_EXPERT=y CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 # CONFIG_RCU_FAST_NO_HZ is not set # CONFIG_TREE_RCU_TRACE is not set CONFIG_RCU_BOOST=y CONFIG_RCU_KTHREAD_PRIO=1 CONFIG_RCU_BOOST_DELAY=500 CONFIG_RCU_NOCB_CPU=y CONFIG_RCU_NOCB_CPU_NONE=y # CONFIG_RCU_NOCB_CPU_ZERO 
is not set # CONFIG_RCU_NOCB_CPU_ALL is not set # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y # CONFIG_IKCONFIG_PROC is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CGROUP_DEVICE is not set CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y # CONFIG_CGROUP_CPUACCT is not set # CONFIG_MEMCG is not set # CONFIG_CGROUP_HUGETLB is not set # CONFIG_CGROUP_PERF is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_CFS_BANDWIDTH is not set # CONFIG_RT_GROUP_SCHED is not set CONFIG_BLK_CGROUP=y CONFIG_DEBUG_BLK_CGROUP=y # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y # CONFIG_USER_NS is not set CONFIG_PID_NS=y # CONFIG_NET_NS is not set CONFIG_SCHED_AUTOGROUP=y # CONFIG_SYSFS_DEPRECATED is not set # CONFIG_RELAY is not set CO
[PATCH net-next 5/6] bnx2x: Add MFW dump support
Devices with up-to-date management FW will be able to store register
dumps on their persistent storage - in case management FW identifies a
fatal error it would gather and store such dumps, which could later be
retrieved using specific debug tools.

This patch adds the necessary part in the driver in order to make the
feature operational, as well as update users [under debug] during load
in case their device contains a dump of a previous crash.

Signed-off-by: Yuval Mintz
Signed-off-by: Ariel Elior
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 4
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 17 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 28
 4 files changed, 51 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index ecf1d7f..2fe3563 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -2582,6 +2582,8 @@ void bnx2x_set_local_cmng(struct bnx2x *bp);
 
 void bnx2x_update_mng_version(struct bnx2x *bp);
 
+void bnx2x_update_mfw_dump(struct bnx2x *bp);
+
 #define MCPR_SCRATCH_BASE(bp) \
 	(CHIP_IS_E1x(bp) ? MCP_REG_MCPR_SCRATCH : MCP_A_REG_MCPR_SCRATCH)
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 4ad9eeb..7d29bf2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2908,6 +2908,10 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
 		return -EBUSY;
 	}
 
+	/* Update driver data for On-Chip MFW dump. */
+	if (IS_PF(bp))
+		bnx2x_update_mfw_dump(bp);
+
 	/* If PMF - send ADMIN DCBX msg to MFW to initiate DCBX FSM */
 	if (bp->port.pmf && (bp->state != BNX2X_STATE_DIAG))
 		bnx2x_dcbx_init(bp, false);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index 53c8818..931b1b9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -2075,6 +2075,20 @@ enum curr_cfg_method_e {
 	CURR_CFG_MET_VENDOR_SPEC = 2,/* e.g. Option ROM, NPAR, O/S Cfg Utils */
 };
 
+struct mdump_driver_info {
+	u32 epoc;
+	u32 drv_ver;
+	u32 fw_ver;
+
+	u32 valid_dump;
+	#define FIRST_DUMP_VALID (1 << 0)
+	#define SECOND_DUMP_VALID (1 << 1)
+
+	u32 flags;
+	#define ENABLE_ALL_TRIGGERS (0x7fff)
+	#define TRIGGER_MDUMP_ONCE (1 << 31)
+};
+
 struct ncsi_oem_data {
 	u32 driver_version[4];
 	struct ncsi_oem_fcoe_features ncsi_oem_fcoe_features;
@@ -2347,6 +2361,9 @@ struct shmem2_region {
 	#define OS_DRIVER_STATE_LOADING 1 /* transition state */
 	#define OS_DRIVER_STATE_DISABLED 2 /* installed but disabled */
 	#define OS_DRIVER_STATE_ACTIVE 3 /* installed and active */
+
+	/* mini dump driver info */
+	struct mdump_driver_info drv_info; /* 0x218 */
 };
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0a069fa..78e55fe 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3709,6 +3709,34 @@ out:
 		ethver, iscsiver, fcoever);
 }
 
+void bnx2x_update_mfw_dump(struct bnx2x *bp)
+{
+	struct timeval epoc;
+	u32 drv_ver;
+	u32 valid_dump;
+
+	if (!SHMEM2_HAS(bp, drv_info))
+		return;
+
+	/* Update Driver load time */
+	do_gettimeofday(&epoc);
+	SHMEM2_WR(bp, drv_info.epoc, epoc.tv_sec);
+
+	drv_ver = bnx2x_update_mng_version_utility(DRV_MODULE_VERSION, true);
+	SHMEM2_WR(bp, drv_info.drv_ver, drv_ver);
+
+	SHMEM2_WR(bp, drv_info.fw_ver, REG_RD(bp, XSEM_REG_PRAM));
+
+	/* Check & notify On-Chip dump. */
+	valid_dump = SHMEM2_RD(bp, drv_info.valid_dump);
+
+	if (valid_dump & FIRST_DUMP_VALID)
+		DP(NETIF_MSG_IFUP, "A valid On-Chip MFW dump found on 1st partition\n");
+
+	if (valid_dump & SECOND_DUMP_VALID)
+		DP(NETIF_MSG_IFUP, "A valid On-Chip MFW dump found on 2nd partition\n");
+}
+
 static void bnx2x_oem_event(struct bnx2x *bp, u32 event)
 {
 	u32 cmd_ok, cmd_fail;
-- 
1.9.3
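The valid_dump check at the end of bnx2x_update_mfw_dump() is a plain bitmask test. Here is a small userspace sketch of the same decoding, assuming the two bit definitions from struct mdump_driver_info in this patch (written with unsigned constants here). The helper report_mfw_dumps() is hypothetical, and printf() stands in for the driver's DP() macro.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions as defined in struct mdump_driver_info in the patch. */
#define FIRST_DUMP_VALID	(1u << 0)
#define SECOND_DUMP_VALID	(1u << 1)

/*
 * Hypothetical helper: log each partition flagged as holding a valid
 * dump and return how many were found.
 */
static int report_mfw_dumps(uint32_t valid_dump)
{
	int found = 0;

	if (valid_dump & FIRST_DUMP_VALID) {
		printf("A valid On-Chip MFW dump found on 1st partition\n");
		found++;
	}

	if (valid_dump & SECOND_DUMP_VALID) {
		printf("A valid On-Chip MFW dump found on 2nd partition\n");
		found++;
	}

	return found;
}
```

Bits other than the two defined ones are ignored, which matches the two independent tests in the patch.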
[PATCH net-next 6/6] bnx2x: Bump up driver version to 1.712.30
Signed-off-by: Yuval Mintz
Signed-off-by: Ariel Elior
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 2fe3563..a1f9785 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -32,7 +32,7 @@
  * (you will need to reboot afterwards) */
 /* #define BNX2X_STOP_ON_ERROR */
 
-#define DRV_MODULE_VERSION "1.710.51-0"
+#define DRV_MODULE_VERSION "1.712.30-0"
 #define DRV_MODULE_RELDATE "2014/02/10"
 #define BNX2X_BC_VER 0x040200
 
-- 
1.9.3
[PATCH net-next 3/6] bnx2x: Add 84858 phy support
From: Yaniv Rosner

This adds support for a new copper phy.

Signed-off-by: Yaniv Rosner
Signed-off-by: Yuval Mintz
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 3 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 244 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h | 58 +++---
 3 files changed, 232 insertions(+), 73 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index a838b6e..5425de0 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -731,6 +731,7 @@ struct port_hw_cfg { /* port 0: 0x12c port 1: 0x2bc */
 	#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM8722 0x0f00
 	#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM54616 0x1000
 	#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM84834 0x1100
+	#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_BCM84858 0x1200
 	#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_FAILURE 0xfd00
 	#define PORT_HW_CFG_XGXS_EXT_PHY2_TYPE_NOT_CONN 0xff00
 
@@ -788,6 +789,7 @@ struct port_hw_cfg { /* port 0: 0x12c port 1: 0x2bc */
 	#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM8722 0x0f00
 	#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM54616 0x1000
 	#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834 0x1100
+	#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84858 0x1200
 	#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_DIRECT_WC 0xfc00
 	#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_FAILURE 0xfd00
 	#define PORT_HW_CFG_XGXS_EXT_PHY_TYPE_NOT_CONN 0xff00
@@ -2253,6 +2255,7 @@ struct shmem2_region {
 	u32 reserved4; /* Offset 0x150 */
 	u32 link_attr_sync[PORT_MAX]; /* Offset 0x154 */
 	#define LINK_ATTR_SYNC_KR2_ENABLE 0x0001
+	#define LINK_ATTR_84858 0x0002
 	#define LINK_SFP_EEPROM_COMP_CODE_MASK 0xff00
 	#define LINK_SFP_EEPROM_COMP_CODE_SHIFT 8
 	#define LINK_SFP_EEPROM_COMP_CODE_SR 0x1000
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index 7f9ec51..d946bba 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -9654,6 +9654,13 @@ static void bnx2x_8727_link_reset(struct bnx2x_phy *phy,
 /**/
 /* BCM8481/BCM84823/BCM84833 PHY SECTION */
 /**/
+static int bnx2x_is_8483x_8485x(struct bnx2x_phy *phy)
+{
+	return ((phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) ||
+		(phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834) ||
+		(phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84858));
+}
+
 static void bnx2x_save_848xx_spirom_version(struct bnx2x_phy *phy,
 					    struct bnx2x *bp,
 					    u8 port)
@@ -9668,8 +9675,7 @@ static void bnx2x_save_848xx_spirom_version(struct bnx2x_phy *phy,
 	};
 	u16 fw_ver1;
 
-	if ((phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) ||
-	    (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834)) {
+	if (bnx2x_is_8483x_8485x(phy)) {
 		bnx2x_cl45_read(bp, phy, MDIO_CTL_DEVAD, 0x400f, &fw_ver1);
 		bnx2x_save_spirom_version(bp, port, fw_ver1 & 0xfff,
 					  phy->ver_addr);
@@ -9751,8 +9757,7 @@ static void bnx2x_848xx_set_led(struct bnx2x *bp,
 		bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
 				 reg_set[i].val);
 
-	if ((phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) ||
-	    (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834))
+	if (bnx2x_is_8483x_8485x(phy))
 		offset = MDIO_PMA_REG_84833_CTL_LED_CTL_1;
 	else
 		offset = MDIO_PMA_REG_84823_CTL_LED_CTL_1;
@@ -9770,8 +9775,7 @@ static void bnx2x_848xx_specific_func(struct bnx2x_phy *phy,
 	struct bnx2x *bp = params->bp;
 	switch (action) {
 	case PHY_INIT:
-		if ((phy->type != PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) &&
-		    (phy->type != PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84834)) {
+		if (!bnx2x_is_8483x_8485x(phy)) {
 			/* Save spirom version */
 			bnx2x_save_848xx_spirom_version(phy, bp, params->port);
 		}
@@ -9903,8 +9907,7 @@ static int bnx2x_848xx_cmn_config_init(struct bnx2x_phy *phy,
 	/* Always write this if this is not 8
[PATCH net-next 0/6] bnx2x: update FW, rebrand and more
This patch series does several things: it updates the bnx2x FW to
7.12.30, which both contains some small fixes and opens the door for
several new features for the device - mainly vxlan/geneve offloads and
vlan filtering offload. It then adds a new Multi-function mode [BD]
which requires this FW in order to operate.

In addition, this finally rebrands the driver from a 'broadcom' driver
into a 'qlogic' driver [although it would still reside under Broadcom's
tree in the kernel].

Dave,

Please consider applying this series to `net-next'.

[Do notice some of these don't pass checkpatch cleanly, usually due to
conforming to already existing 'bad' styles. E.g., preprocessor
`#define' which is shifted]

Thanks,
Yuval

-- 
1.8.3.1
[PATCH net-next 1/6] bnx2x: Utilize FW 7.12.30
This moves bnx2x into using 7.12.30 FW. Said firmware fixes the following:
 - Packets from a VF with pvid configured which were sent with a
   different vlan were transmitted instead of being discarded.
 - FCoE traffic might not recover after a failure while there's traffic
   to another function.

In addition, this FW opens the door for the driver to implement several
new features; specifically, this enhances the device's support for
encapsulated packets and will allow vxlan/geneve offloads to be added in
the future, as well as vlan filtering offload.

Signed-off-by: Yuval Mintz
Signed-off-by: Ariel Elior
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 11 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c | 2 +
 .../net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 87 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 53 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 45 +++
 8 files changed, 136 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index a90d736..fc32821 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2103,9 +2103,14 @@ int bnx2x_rss(struct bnx2x *bp, struct bnx2x_rss_config_obj *rss_obj,
 		if (rss_obj->udp_rss_v6)
 			__set_bit(BNX2X_RSS_IPV6_UDP, &params.rss_flags);
 
-		if (!CHIP_IS_E1x(bp))
+		if (!CHIP_IS_E1x(bp)) {
+			/* valid only for TUNN_MODE_VXLAN tunnel mode */
+			__set_bit(BNX2X_RSS_IPV4_VXLAN, &params.rss_flags);
+			__set_bit(BNX2X_RSS_IPV6_VXLAN, &params.rss_flags);
+
 			/* valid only for TUNN_MODE_GRE tunnel mode */
-			__set_bit(BNX2X_RSS_GRE_INNER_HDRS, &params.rss_flags);
+			__set_bit(BNX2X_RSS_TUNN_INNER_HDRS, &params.rss_flags);
+		}
 	} else {
 		__set_bit(BNX2X_RSS_MODE_DISABLED, &params.rss_flags);
 	}
@@ -3677,7 +3682,7 @@ static void bnx2x_update_pbds_gso_enc(struct sk_buff *skb,
 		pbd2->fw_ip_hdr_to_payload_w =
 			hlen_w - ((sizeof(struct ipv6hdr)) >> 1);
 		pbd_e2->data.tunnel_data.flags |=
-			ETH_TUNNEL_DATA_IP_HDR_TYPE_OUTER;
+			ETH_TUNNEL_DATA_IPV6_OUTER;
 	}
 
 	pbd2->tcp_send_seq = bswab32(inner_tcp_hdr(skb)->seq);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 03b7404..ec50d12 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -936,9 +936,7 @@ static inline int bnx2x_func_start(struct bnx2x *bp)
 	else /* CHIP_IS_E1X */
 		start_params->network_cos_mode = FW_WRR;
 
-	start_params->tunnel_mode = TUNN_MODE_GRE;
-	start_params->gre_tunnel_type = IPGRE_TUNNEL;
-	start_params->inner_gre_rss_en = 1;
+	start_params->inner_rss = 1;
 
 	if (IS_MF_UFP(bp) && BNX2X_IS_MF_SD_PROTOCOL_FCOE(bp)) {
 		start_params->class_fail_ethtype = ETH_P_FIP;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
index 6e4294e..b50f154 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
@@ -1850,6 +1850,8 @@ static void bnx2x_dcbx_fw_struct(struct bnx2x *bp,
 			if (bp->dcbx_port_params.ets.cos_params[cos].
 						pri_bitmask & pri_bit)
 				tt2cos[pri].cos = cos;
+
+		pfc_fw_cfg->dcb_outer_pri[pri] = ttp[pri];
 	}
 
 	/* we never want the FW to add a 0 vlan tag */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
index 7636e3c..bfda526 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
@@ -372,7 +372,7 @@
 #define MAX_COS_NUMBER 4
 #define MAX_TRAFFIC_TYPES 8
 #define MAX_PFC_PRIORITIES 8
-
+#define MAX_VLAN_PRIORITIES 8
 /* used by array traffic_type_to_priority[] to mark traffic type \
 that is not mapped to priority*/
 #define LLFC_TRAFFIC_TYPE_TO_PRIORITY_UNMAPPED 0xFF
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index 058bc73..2b6f97b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -2898,8 +2898,8 @@ struct afex_stats {
 };
 
 #define BCM_5710_FW_
[PATCH net-next 4/6] bnx2x: new Multi-function mode - BD
This adds support for a new multi-function mode, enabling the driver to
initialize such devices and correctly interact with management FW in
order to fully utilize their features.

Signed-off-by: Yuval Mintz
Signed-off-by: Ariel Elior
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 3 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 74 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 36 +++
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 3 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 74 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 56
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h | 17 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 3 +
 8 files changed, 251 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index a59f0b9..ecf1d7f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1424,6 +1424,7 @@ enum {
 	SUB_MF_MODE_UNKNOWN = 0,
 	SUB_MF_MODE_UFP,
 	SUB_MF_MODE_NPAR1_DOT_5,
+	SUB_MF_MODE_BD,
 };
 
 struct bnx2x {
@@ -1638,6 +1639,8 @@ struct bnx2x {
 	u8 mf_sub_mode;
 #define IS_MF_UFP(bp) (IS_MF_SD(bp) && \
 			bp->mf_sub_mode == SUB_MF_MODE_UFP)
+#define IS_MF_BD(bp) (IS_MF_SD(bp) && \
+			bp->mf_sub_mode == SUB_MF_MODE_BD)
 
 	u8 wol;
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index e395ae9..0e392ca 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2517,6 +2517,20 @@ static void bnx2x_bz_fp(struct bnx2x *bp, int index)
 	fp->mode = TPA_MODE_DISABLED;
 }
 
+void bnx2x_set_os_driver_state(struct bnx2x *bp, u32 state)
+{
+	u32 cur;
+
+	if (!IS_MF_BD(bp) || !SHMEM2_HAS(bp, os_driver_state) || IS_VF(bp))
+		return;
+
+	cur = SHMEM2_RD(bp, os_driver_state[BP_FW_MB_IDX(bp)]);
+	DP(NETIF_MSG_IFUP, "Driver state %08x-->%08x\n",
+	   cur, state);
+
+	SHMEM2_WR(bp, os_driver_state[BP_FW_MB_IDX(bp)], state);
+}
+
 int bnx2x_load_cnic(struct bnx2x *bp)
 {
 	int i, rc, port = BP_PORT(bp);
@@ -2880,6 +2894,8 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
 		/* mark driver is loaded in shmem2 */
 		u32 val;
 		val = SHMEM2_RD(bp, drv_capabilities_flag[BP_FW_MB_IDX(bp)]);
+		val &= ~DRV_FLAGS_MTU_MASK;
+		val |= (bp->dev->mtu << DRV_FLAGS_MTU_SHIFT);
 		SHMEM2_WR(bp, drv_capabilities_flag[BP_FW_MB_IDX(bp)],
 			  val | DRV_FLAGS_CAPABILITIES_LOADED_SUPPORTED |
 			  DRV_FLAGS_CAPABILITIES_LOADED_L2);
@@ -2896,6 +2912,9 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
 	if (bp->port.pmf && (bp->state != BNX2X_STATE_DIAG))
 		bnx2x_dcbx_init(bp, false);
 
+	if (!IS_MF_SD_STORAGE_PERSONALITY_ONLY(bp))
+		bnx2x_set_os_driver_state(bp, OS_DRIVER_STATE_ACTIVE);
+
 	DP(NETIF_MSG_IFUP, "Ending successfully NIC load\n");
 
 	return 0;
@@ -2963,6 +2982,9 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
 
 	DP(NETIF_MSG_IFUP, "Starting NIC unload\n");
 
+	if (!IS_MF_SD_STORAGE_PERSONALITY_ONLY(bp))
+		bnx2x_set_os_driver_state(bp, OS_DRIVER_STATE_DISABLED);
+
 	/* mark driver is unloaded in shmem2 */
 	if (IS_PF(bp) && SHMEM2_HAS(bp, drv_capabilities_flag)) {
 		u32 val;
@@ -4191,6 +4213,41 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+void bnx2x_get_c2s_mapping(struct bnx2x *bp, u8 *c2s_map, u8 *c2s_default)
+{
+	int mfw_vn = BP_FW_MB_IDX(bp);
+	u32 tmp;
+
+	/* If the shmem shouldn't affect configuration, reflect */
+	if (!IS_MF_BD(bp)) {
+		int i;
+
+		for (i = 0; i < BNX2X_MAX_PRIORITY; i++)
+			c2s_map[i] = i;
+		*c2s_default = 0;
+
+		return;
+	}
+
+	tmp = SHMEM2_RD(bp, c2s_pcp_map_lower[mfw_vn]);
+	tmp = (__force u32)be32_to_cpu((__force __be32)tmp);
+	c2s_map[0] = tmp & 0xff;
+	c2s_map[1] = (tmp >> 8) & 0xff;
+	c2s_map[2] = (tmp >> 16) & 0xff;
+	c2s_map[3] = (tmp >> 24) & 0xff;
+
+	tmp = SHMEM2_RD(bp, c2s_pcp_map_upper[mfw_vn]);
+	tmp = (__force u32)be32_to_cpu((__force __be32)tmp);
+	c2s_map[4] = tmp & 0xff;
+	c2s_map[5] = (tmp >> 8) & 0xff;
+	c2s_map[6] = (tmp >> 16) & 0xff;
+	c2s_map[7] = (tmp >> 24) & 0xff;
+
+	tmp = SHMEM2_RD(bp, c2s_pcp_map_default[mfw_vn]);
+	tmp = (__force u32)be32_t
[PATCH net-next 2/6] bnx2x: Rebrand from 'broadcom' into 'qlogic'
bnx2x still appears as a Broadcom driver even though the devices it drives have belonged to QLogic for more than a year. This patch changes the various headers and the device strings to indicate the correct ownership of the device.

Signed-off-by: Yuval Mintz
Signed-off-by: Ariel Elior
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h        |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c    |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h    |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c    | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.h    | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_dump.h   | 10 +++--
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |  4 +-
 .../net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h    |  4 +-
 .../ethernet/broadcom/bnx2x/bnx2x_fw_file_hdr.h    |  2 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h    |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h   |  4 +-
 .../net/ethernet/broadcom/bnx2x/bnx2x_init_ops.h   |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c   | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h   | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 50 +++---
 .../net/ethernet/broadcom/bnx2x/bnx2x_mfw_req.h    |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h    |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c     | 14 +++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h     | 14 +++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c  | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h  | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c  |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h  |  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   | 10 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h   | 22 ++
 25 files changed, 142 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index cd4ae76..a59f0b9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1,6 +1,8 @@
-/* bnx2x.h: Broadcom Everest network driver.
+/* bnx2x.h: QLogic Everest network driver.
  *
  * Copyright (c) 2007-2013 Broadcom Corporation
+ * Copyright (c) 2014 QLogic Corporation
+ * All rights reserved
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index fc32821..e395ae9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1,6 +1,8 @@
-/* bnx2x_cmn.c: Broadcom Everest network driver.
+/* bnx2x_cmn.c: QLogic Everest network driver.
  *
  * Copyright (c) 2007-2013 Broadcom Corporation
+ * Copyright (c) 2014 QLogic Corporation
+ * All rights reserved
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index ec50d12..77693d3 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -1,6 +1,8 @@
-/* bnx2x_cmn.h: Broadcom Everest network driver.
+/* bnx2x_cmn.h: QLogic Everest network driver.
  *
  * Copyright (c) 2007-2013 Broadcom Corporation
+ * Copyright (c) 2014 QLogic Corporation
+ * All rights reserved
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
index b50f154..7ccf668 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c
@@ -1,15 +1,17 @@
-/* bnx2x_dcb.c: Broadcom Everest network driver.
+/* bnx2x_dcb.c: QLogic Everest network driver.
  *
  * Copyright 2009-2013 Broadcom Corporation
+ * Copyright 2014 QLogic Corporation
+ * All rights reserved
  *
- * Unless you and Broadcom execute a separate written software license
+ * Unless you and QLogic execute a separate written software license
  * agreement governing use of this software, this software is licensed to you
  * under the terms of the GNU General Public License version 2, available
  * at http://www.gnu.org/licenses/old-licenses/gpl-2.0.html (the "GPL").
  *
  * Notwithstanding the above, under no circumstances may you combine this
- * software in any way with any other Broadcom software provided under a
- * license other than the GPL, without Broadcom's express prior written
+ * software in any way with any other QLogic software provided under a
+ * license other than the GPL, without QLogic's express prior written
Re: [PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().
On 7/21/15, 9:57 PM, Rami Rosen wrote:
> This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range()
> method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no
> effect and is unneeded, as the vinfo.flags value is overridden by the
> immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END
> assignment.
>
> Signed-off-by: Rami Rosen

Acked-by: Roopa Prabhu

Thanks.
[PATCH net-next] mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference
From: Roopa Prabhu

fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)

remove incorrect rcu_dereference possibly left over from earlier revisions of the code.

Reported-by: kbuild test robot
Signed-off-by: Roopa Prabhu
---
 net/mpls/mpls_iptunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index eea096f..276f8c9 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -70,7 +70,7 @@ int mpls_output(struct sock *sk, struct sk_buff *skb)
 	skb_orphan(skb);
 
 	/* Find the output device */
-	out_dev = rcu_dereference(dst->dev);
+	out_dev = dst->dev;
 	if (!mpls_output_possible(out_dev) ||
 	    !lwtstate || skb_warn_if_lro(skb))
 		goto drop;
--
1.7.10.4
[PATCH net-next] be2net: support ndo_get_phys_port_id()
From: Sriharsha Basavapatna

Add be_get_phys_port_id() function to report physical port id. The port id should be unique across different be2net devices in the system. We use the chip serial number along with the physical port number for this.

Signed-off-by: Sriharsha Basavapatna
---
 drivers/net/ethernet/emulex/benet/be.h      |  3 +++
 drivers/net/ethernet/emulex/benet/be_cmds.c |  7 ++-
 drivers/net/ethernet/emulex/benet/be_cmds.h |  8 +---
 drivers/net/ethernet/emulex/benet/be_main.c | 22 ++
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index cb5777b..8cd384d 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -105,6 +105,8 @@
 
 #define MAX_VFS			30 /* Max VFs supported by BE3 FW */
 #define FW_VER_LEN		32
+#define	CNTL_SERIAL_NUM_WORDS	8  /* Controller serial number words */
+#define	CNTL_SERIAL_NUM_WORD_SZ	(sizeof(u16)) /* Byte-sz of serial num word */
 
 #define	RSS_INDIR_TABLE_LEN	128
 #define RSS_HASH_KEY_LEN	40
@@ -590,6 +592,7 @@ struct be_adapter {
 	struct rss_info rss_info;
 	/* Filters for packets that need to be sent to BMC */
 	u32 bmc_filt_mask;
+	u16 serial_num[CNTL_SERIAL_NUM_WORDS];
 };
 
 #define be_physfn(adapter)		(!adapter->virtfn)
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index ecad46f..3be1fbd 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -2852,10 +2852,11 @@ int be_cmd_get_cntl_attributes(struct be_adapter *adapter)
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_cntl_attribs *req;
 	struct be_cmd_resp_cntl_attribs *resp;
-	int status;
+	int status, i;
 	int payload_len = max(sizeof(*req), sizeof(*resp));
 	struct mgmt_controller_attrib *attribs;
 	struct be_dma_mem attribs_cmd;
+	u32 *serial_num;
 
 	if (mutex_lock_interruptible(&adapter->mbox_lock))
 		return -1;
@@ -2886,6 +2887,10 @@ int be_cmd_get_cntl_attributes(struct be_adapter *adapter)
 	if (!status) {
 		attribs = attribs_cmd.va + sizeof(struct be_cmd_resp_hdr);
 		adapter->hba_port_num = attribs->hba_attribs.phy_port;
+		serial_num = attribs->hba_attribs.controller_serial_number;
+		for (i = 0; i < CNTL_SERIAL_NUM_WORDS; i++)
+			adapter->serial_num[i] = le32_to_cpu(serial_num[i]) &
+				(BIT_MASK(16) - 1);
 	}
 
 err:
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index a4479f7..36d835b 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1637,10 +1637,12 @@ struct be_cmd_req_set_qos {
 struct mgmt_hba_attribs {
 	u32 rsvd0[24];
 	u8 controller_model_number[32];
-	u32 rsvd1[79];
-	u8 rsvd2[3];
+	u32 rsvd1[16];
+	u32 controller_serial_number[8];
+	u32 rsvd2[55];
+	u8 rsvd3[3];
 	u8 phy_port;
-	u32 rsvd3[13];
+	u32 rsvd4[13];
 } __packed;
 
 struct mgmt_controller_attrib {
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index c996dd7..5e92db8 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5219,6 +5219,27 @@ static netdev_features_t be_features_check(struct sk_buff *skb,
 }
 #endif
 
+static int be_get_phys_port_id(struct net_device *dev,
+			       struct netdev_phys_item_id *ppid)
+{
+	int i, id_len = CNTL_SERIAL_NUM_WORDS * CNTL_SERIAL_NUM_WORD_SZ + 1;
+	struct be_adapter *adapter = netdev_priv(dev);
+	u8 *id;
+
+	if (MAX_PHYS_ITEM_ID_LEN < id_len)
+		return -ENOSPC;
+
+	ppid->id[0] = adapter->hba_port_num + 1;
+	id = &ppid->id[1];
+	for (i = CNTL_SERIAL_NUM_WORDS - 1; i >= 0;
+	     i--, id += CNTL_SERIAL_NUM_WORD_SZ)
+		memcpy(id, &adapter->serial_num[i], CNTL_SERIAL_NUM_WORD_SZ);
+
+	ppid->id_len = id_len;
+
+	return 0;
+}
+
 static const struct net_device_ops be_netdev_ops = {
 	.ndo_open		= be_open,
 	.ndo_stop		= be_close,
@@ -5249,6 +5270,7 @@ static const struct net_device_ops be_netdev_ops = {
 	.ndo_del_vxlan_port	= be_del_vxlan_port,
 	.ndo_features_check	= be_features_check,
 #endif
+	.ndo_get_phys_port_id	= be_get_phys_port_id,
 };
 
 static void be_netdev_init(struct net_device *netdev)
--
1.8.3.1
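The ID layout built by be_get_phys_port_id() in the patch above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the driver code: build_phys_port_id() and SERIAL_WORDS are hypothetical names, and the caller supplies the buffer capacity that MAX_PHYS_ITEM_ID_LEN provides in the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERIAL_WORDS 8	/* mirrors CNTL_SERIAL_NUM_WORDS from the patch */

/*
 * Hypothetical helper: byte 0 holds the 1-based port number, followed by
 * the 16-bit serial words copied in reverse order (last word first).
 * Returns the ID length in bytes, or -1 if the buffer is too small.
 */
static int build_phys_port_id(uint8_t port, const uint16_t serial[SERIAL_WORDS],
			      uint8_t *id, size_t id_cap)
{
	size_t id_len = SERIAL_WORDS * sizeof(uint16_t) + 1;
	uint8_t *p = id + 1;
	int i;

	if (id_cap < id_len)
		return -1;

	id[0] = (uint8_t)(port + 1);
	for (i = SERIAL_WORDS - 1; i >= 0; i--, p += sizeof(uint16_t))
		memcpy(p, &serial[i], sizeof(uint16_t));

	return (int)id_len;
}
```

With eight 16-bit words the ID is 17 bytes long. Two ports on the same controller differ only in byte 0, while ports on different controllers differ in the serial-number bytes.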
Re: [RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
On Tue, 2015-07-21 at 21:21 -0700, Lawrence Brakmo wrote:
> Replace 2 arguments (cnt and rtt) in the congestion control modules'
> pkts_acked() function with a struct. This will allow adding more
> information without having to modify existing congestion control
> modules (tcp_nv in particular needs bytes in flight when packet
> was sent).
>
> This was proposed by Neal Cardwell in his comments to the tcp_nv patch.

Are you sure Neal suggested passing a struct as an argument? It was probably a struct pointer instead.
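The distinction raised in this reply can be sketched in plain C. The struct fields are taken from the quoted patch; the two accessor functions are hypothetical and use user-space integer types in place of u32 and s32.

```c
#include <stdint.h>

/* Fields as in the quoted patch, spelled with user-space types. */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t  rtt_us;
};

/* By value: the whole struct is copied on every call. */
static int32_t rtt_by_value(struct ack_sample sample)
{
	return sample.rtt_us;
}

/* By pointer: only an address is passed; const documents read-only use. */
static int32_t rtt_by_pointer(const struct ack_sample *sample)
{
	return sample->rtt_us;
}
```

Both forms let fields be added later without touching the callback prototype. The pointer form has the further property that the per-call cost does not grow as the struct grows, which is presumably the concern behind the question.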
Re: [PATCH net-next] net: track success and failure of TCP PMTU probing
From: r...@tardy.usa.hp.com (Rick Jones)
Date: Tue, 21 Jul 2015 16:14:13 -0700 (PDT)

> From: Rick Jones
>
> Track success and failure of TCP PMTU probing.
>
> Signed-off-by: Rick Jones

Seems reasonable, applied, thanks Rick.
Re: [PATCH] ravb: fix ring memory allocation
From: Sergei Shtylyov
Date: Wed, 22 Jul 2015 01:31:59 +0300

> The driver is written as if it can adapt to a low memory situation allocating
> less RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In
> reality though the driver would malfunction in this case. Stop being overly
> smart and just fail in such situation -- this is achieved by moving the memory
> allocation from ravb_ring_format() to ravb_ring_init().
>
> We leave dma_map_single() calls in place but make their failure non-fatal
> by marking the corresponding RX descriptors with zero data size which should
> prevent DMA to an invalid addresses.
>
> Signed-off-by: Sergei Shtylyov

Applied.

But the real way to handle this is to allocate all of the necessary resources for the replacement RX SKB before unmapping and passing the original SKB up into the stack.

That way you _NEVER_ starve the device of RX packets to receive into, since if you fail the memory allocation or the DMA mapping, you just put the original SKB back into the ring.
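The refill ordering described in this reply can be sketched in user space. This is a simplified model, not ravb code: struct rx_slot, rx_receive() and the allocator hook are hypothetical, and plain heap buffers stand in for SKBs and DMA mappings.

```c
#include <stddef.h>
#include <stdlib.h>

struct rx_slot {
	void *buf;	/* buffer currently owned by the (simulated) device */
};

/* Allocator hook so that a failure can be simulated; hypothetical. */
typedef void *(*rx_alloc_fn)(size_t len);

/*
 * Returns the received buffer to hand up the stack, or NULL if the packet
 * had to be dropped. In both cases the slot still owns a valid buffer.
 */
static void *rx_receive(struct rx_slot *slot, size_t len, rx_alloc_fn alloc)
{
	void *replacement = alloc(len);	/* step 1: get the replacement first */
	void *received;

	if (!replacement)
		return NULL;		/* drop: the original stays in the ring */

	received = slot->buf;		/* step 2: only now detach the original */
	slot->buf = replacement;
	return received;
}

/* Simulates memory pressure. */
static void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}
```

Passing malloc as the allocator models the normal path, and failing_alloc models the low-memory path: the packet is lost, but the ring slot is never left empty.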
Re: [PATCHv2 net-next] cxgb4: Add debugfs entry to enable backdoor access
From: Hariprasad Shenai
Date: Tue, 21 Jul 2015 22:39:40 +0530

> Add debugfs entry 'use_backdoor' to enable backdoor access to read sge
> context. By default, we read sge context's via firmware. In case of FW
> issues, one can enable backdoor access via debugfs to dump sge context
> for debugging purpose.
>
> Signed-off-by: Hariprasad Shenai
> ---
> V2: Remove unnecessary braces as per comments by Sergei Shtylyov
>

Applied.
Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay
From: Dan Murphy
Date: Tue, 21 Jul 2015 12:06:45 -0500

> Fix warning: logical ‘or’ of collectively exhaustive tests is always true
>
> Change the internal delay check from an 'or' condition to an 'and'
> condition.
>
> Reported-by: David Binderman
> Signed-off-by: Dan Murphy

Applied, thanks.
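The class of bug behind this warning can be shown with a small range check. The patch itself is not quoted in this message, so the shape of the condition below is an assumption, and MODE_LO, MODE_HI and both functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bounds standing in for the first and last mode of a range. */
enum { MODE_LO = 3, MODE_HI = 5 };

/* Buggy form: every value is >= MODE_LO or <= MODE_HI, so this is always true. */
static bool in_range_or(int mode)
{
	return mode >= MODE_LO || mode <= MODE_HI;
}

/* Fixed form: true only when MODE_LO <= mode <= MODE_HI. */
static bool in_range_and(int mode)
{
	return mode >= MODE_LO && mode <= MODE_HI;
}
```

Because the two tests in the 'or' form together cover every possible value, the guarded code runs unconditionally, which is what the compiler diagnostic points out.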
Re: [PATCH] cgroup: net_cls: fix false-positive "suspicious RCU usage"
From: Konstantin Khlebnikov
Date: Tue, 21 Jul 2015 19:46:29 +0300

> @@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct
> cgroup_subsys_state
>
>  struct cgroup_cls_state *task_cls_state(struct task_struct *p)
>  {
> -	return css_cls_state(task_css(p, net_cls_cgrp_id));
> +	  return css_cls_state(task_css_check(p, net_cls_cgrp_id,
> +			rcu_read_lock_bh_held()));

You've made a serious mess of the indentation here.

First of all, you've changed the correct plain "TAB" before the 'return' line into a TAB and two SPACE characters.

Secondly, the second line needs to be precisely indented to the exact column following the opening parenthesis of the task_css_check() call on the previous line.
Re: [PATCH net-next] mpls: make RTA_OIF optional
From: Roopa Prabhu
Date: Tue, 21 Jul 2015 09:16:24 -0700

> From: Roopa Prabhu
>
> If user did not specify an oif, try and get it from the via address.
> If failed to get device, return with -ENODEV.
>
> Signed-off-by: Roopa Prabhu

Applied, thanks.
Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
From: Chris J Arges
Date: Tue, 21 Jul 2015 12:36:33 -0500

> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
>
> Use nr_node_ids to allocate a maximal sparse array instead of
> num_possible_nodes().
>
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges

Applied, thanks.
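Why a sparse node map breaks an array sized by the node count can be seen in a small user-space model. The bitmap type and both helpers are hypothetical; they only mimic what num_possible_nodes() and nr_node_ids express.

```c
#include <stdint.h>

/* Hypothetical bitmap of possible NUMA nodes: bit n set means node n can exist. */
typedef uint32_t node_mask_t;

/* Analogue of num_possible_nodes(): the number of bits set. */
static unsigned int count_possible(node_mask_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* Analogue of nr_node_ids: the highest possible node id plus one. */
static unsigned int node_id_limit(node_mask_t mask)
{
	unsigned int limit = 0;

	for (; mask; mask >>= 1)
		limit++;
	return limit;
}
```

With possible nodes 0 and 16, the count is 2 but the id limit is 17. An array with 2 entries indexed by node id 16 is overrun, whereas an array sized by the id limit is safe at the cost of some unused slots.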
Re: [PATCH net] netlink: don't hold mutex in rcu callback when releasing mmapd ring
From: Florian Westphal
Date: Tue, 21 Jul 2015 16:33:50 +0200

> Kirill A. Shutemov says:
>
> This simple test-case triggers few locking asserts in kernel:
 ...
> Cong Wang says:
>
> We can't hold mutex lock in a rcu callback, [..]
>
> Thomas Graf says:
>
> The socket should be dead at this point. It might be simpler to
> add a netlink_release_ring() function which doesn't require
> locking at all.
>
> Reported-by: "Kirill A. Shutemov"
> Diagnosed-by: Cong Wang
> Suggested-by: Thomas Graf
> Signed-off-by: Florian Westphal

Applied, thanks everyone.
Re: [PATCH net-next 0/9] sfc: support for cascaded multicast filtering
From: Edward Cree
Date: Tue, 21 Jul 2015 15:07:44 +0100

> Recent versions of firmware for SFC9100 adapters add support for filter
> chaining, in which packets matching multiple filters are delivered to all
> filters' recipients, rather than only the highest match-priority filter as was
> previously the case.
> This patch series enables this feature and redesigns the filter handling code
> to make use of it; in particular, subscribing to a multicast address on one
> function no longer prevents traffic to that address reaching another function
> which is in promiscuous or allmulti mode.
> If the firmware does not support filter chaining, the driver will fall back to
> the old behaviour.

Series applied, thanks.
Re: [PATCH V2 net 0/3] BPF JIT fixes for ARM
From: Nicolas Schichan
Date: Tue, 21 Jul 2015 14:14:11 +0200

> These patches are fixing bugs in the ARM JIT and should probably find
> their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
> are now passing OK (was 54 out of 60 before).

Series applied, thanks.
Re: [PATCH net] tcp: suppress a division by zero warning
From: Eric Dumazet
Date: Wed, 22 Jul 2015 07:02:00 +0200

> From: Eric Dumazet
>
> Andrew Morton reported following warning on one ARM build
> with gcc-4.4 :
>
> net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
> net/ipv4/inet_hashtables.c:617: warning: division by zero
>
> Even guarded with a test on sizeof(spinlock_t), compiler does not
> like current construct on a !CONFIG_SMP build.
>
> Remove the warning by using a temporary variable.
>
> Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
> Reported-by: Andrew Morton
> Signed-off-by: Eric Dumazet

Applied, thanks.
[PATCH net] tcp: suppress a division by zero warning
From: Eric Dumazet

Andrew Morton reported following warning on one ARM build
with gcc-4.4 :

net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero

Even guarded with a test on sizeof(spinlock_t), compiler does not
like current construct on a !CONFIG_SMP build.

Remove the warning by using a temporary variable.

Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: Andrew Morton
Signed-off-by: Eric Dumazet
---
 net/ipv4/inet_hashtables.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 5f9b063bbe8a..0cb9165421d4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -624,22 +624,21 @@ EXPORT_SYMBOL_GPL(inet_hashinfo_init);
 
 int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 {
+	unsigned int locksz = sizeof(spinlock_t);
 	unsigned int i, nblocks = 1;
 
-	if (sizeof(spinlock_t) != 0) {
+	if (locksz != 0) {
 		/* allocate 2 cache lines or at least one spinlock per cpu */
-		nblocks = max_t(unsigned int,
-				2 * L1_CACHE_BYTES / sizeof(spinlock_t),
-				1);
+		nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U);
 		nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
 
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks =	kmalloc_array(nblocks, sizeof(spinlock_t),
+		hashinfo->ehash_locks =	kmalloc_array(nblocks, locksz,
 						      GFP_KERNEL | __GFP_NOWARN);
 		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * sizeof(spinlock_t));
+			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
 
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
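The sizing arithmetic in this patch can be tried out in user space with the lock size passed in as an ordinary variable, which is the same idea as the locksz temporary. ehash_lock_count() and round_up_pow2() are hypothetical stand-ins; the real function takes its inputs from sizeof(spinlock_t), L1_CACHE_BYTES, num_possible_cpus() and hashinfo->ehash_mask.

```c
/* User-space stand-in for roundup_pow_of_two(); assumes v fits comfortably. */
static unsigned int round_up_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical helper mirroring the sizing logic of the patch. A lock size
 * of zero (the !CONFIG_SMP case) skips the division entirely.
 */
static unsigned int ehash_lock_count(unsigned int locksz, unsigned int cache_bytes,
				     unsigned int ncpus, unsigned int ehash_mask)
{
	unsigned int nblocks = 1;

	if (locksz != 0) {
		/* two cache lines' worth of locks, but at least one */
		nblocks = 2U * cache_bytes / locksz;
		if (nblocks < 1U)
			nblocks = 1U;
		nblocks = round_up_pow2(nblocks * ncpus);

		/* no more locks than hash buckets */
		if (nblocks > ehash_mask + 1)
			nblocks = ehash_mask + 1;
	}
	return nblocks;
}
```

Since the divisor here is a run-time value rather than a constant expression, the compiler has no constant division by zero to diagnose, while the explicit guard still protects the zero-size case.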
[PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().
This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range() method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no effect and is unneeded, as the vinfo.flags value is overridden by the immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END assignment.

Signed-off-by: Rami Rosen
---
 net/bridge/br_netlink.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 364bdc9..793d247 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -164,8 +164,6 @@ static int br_fill_ifvlaninfo_range(struct sk_buff *skb, u16 vid_start,
 			    sizeof(vinfo), &vinfo))
 			goto nla_put_failure;
 
-		vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN;
-
 		vinfo.vid = vid_end;
 		vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END;
 		if (nla_put(skb, IFLA_BRIDGE_VLAN_INFO,
--
1.9.3
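That the removed statement is a dead store can be checked with a tiny model of the two assignments. The flag values and both functions below are assumptions made for illustration only.

```c
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions for this sketch. */
#define RANGE_BEGIN (1u << 3)
#define RANGE_END   (1u << 4)

/* Before the patch: the masked value is overwritten by the next statement. */
static uint16_t end_flags_before(uint16_t vinfo_flags, uint16_t flags)
{
	vinfo_flags &= (uint16_t)~RANGE_BEGIN;		/* dead store */
	vinfo_flags = (uint16_t)(flags | RANGE_END);
	return vinfo_flags;
}

/* After the patch: only the assignment that matters remains. */
static uint16_t end_flags_after(uint16_t flags)
{
	return (uint16_t)(flags | RANGE_END);
}
```

Whatever vinfo_flags held beforehand, both functions return the same value, so dropping the first statement does not change behaviour.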
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, 2015-07-21 at 19:03 -0700, Cong Wang wrote:
> On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet wrote:
> > On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
> >
> >> > -	kfree_skb(skb);
> >> > +	INIT_LIST_HEAD(&q->new_flows);
> >> > +	INIT_LIST_HEAD(&q->old_flows);
> >> > +	for (i = 0; i < q->flows_cnt; i++) {
> >> > +		struct fq_codel_flow *flow = q->flows + i;
> >> > +
> >> > +		while (flow->head)
> >> > +			kfree_skb(dequeue_head(flow));
> >> > +
> >> > +		INIT_LIST_HEAD(&flow->flowchain);
> >>
> >>
> >> You probably need to call codel_vars_init(&flow->cvars) as well.
> >
> > It is not necessary : flow->cvars only matter in the event of a dequeue,
> > but whole qdisc is dismantled and no packet will be dequeued.
> >
>
> But it will affect the next dequeue _after_ reset? which is not supposed
> to happen as we expect a fresh start after reset?

Hmm... I thought reset() was only done at queue dismantle, so no new packet should be added later, and since no packet should be left after reset, no dequeue should happen.

For completeness, we still can add the codel_vars_init(), no problem.

Thanks.
[PATCH nf] netfilter: Support expectations in different zones
When zones were originally introduced, the expectation functions were all extended to perform lookup using the zone. However, insertion was not modified to check the zone. This means that two expectations which are intended to apply for different connections that have the same tuple but exist in different zones cannot both be tracked.

Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer
---
 net/netfilter/nf_conntrack_expect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 7a17070..b45a422 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -219,7 +219,8 @@ static inline int expect_clash(const struct nf_conntrack_expect *a,
 			a->mask.src.u3.all[count] & b->mask.src.u3.all[count];
 	}
 
-	return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask);
+	return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask) &&
+	       nf_ct_zone(a->master) == nf_ct_zone(b->master);
 }
 
 static inline int expect_matches(const struct nf_conntrack_expect *a,
--
2.1.4
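The effect of adding the zone comparison to the clash test can be modelled with a heavily simplified expectation. The struct and all three functions are hypothetical: the real code compares masked tuples and reads the zone from the master conntrack entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for an expectation: one address, one port, one zone. */
struct expect {
	uint32_t addr;
	uint16_t port;
	uint16_t zone;
};

static bool tuple_equal(const struct expect *a, const struct expect *b)
{
	return a->addr == b->addr && a->port == b->port;
}

/* Pre-patch behaviour: equal tuples clash even across zones. */
static bool clash_ignoring_zone(const struct expect *a, const struct expect *b)
{
	return tuple_equal(a, b);
}

/* Post-patch behaviour: equal tuples clash only within the same zone. */
static bool clash_with_zone(const struct expect *a, const struct expect *b)
{
	return tuple_equal(a, b) && a->zone == b->zone;
}
```

Two expectations with the same tuple in zones 1 and 2 clash under the first test, so only one of them could be inserted; under the second test they are independent, and both can be tracked.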
[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control
This is a request for comments.

TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of NV was presented at 2010's LPC (slides). It is a delay-based congestion avoidance scheme for the data center. This version has been tested within a 10G rack where the HW RTTs are 20-50us.

A description of TCP-NV, including implementation and experimental results, can be found at: http://www.brakmo.org/networking/tcp-nv/TCPNV.html

The current version includes many module parameters to support experimentation with the parameters.

Signed-off-by: Lawrence Brakmo
---
 include/net/tcp.h          |   1 +
 net/ipv4/Kconfig           |  16 ++
 net/ipv4/Makefile          |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_input.c       |   2 +
 net/ipv4/tcp_nv.c          | 479 +
 6 files changed, 508 insertions(+)
 create mode 100644 net/ipv4/tcp_nv.c

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2e62efe..c0690ae 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
 extern int sysctl_tcp_min_tso_segs;
 extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_nv_enable;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..c37b374 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
 	window. TCP Vegas should provide less packet loss, but it is
 	not as aggressive as TCP Reno.
 
+config TCP_CONG_NV
+	tristate "TCP NV"
+	default m
+	---help---
+	TCP NV is a follow up to TCP Vegas. It has been modified to deal with
+	10G networks, measurement noise introduced by LRO, GRO and interrupt
+	coalescence. In addition, it will decrease its cwnd multiplicative
+	instead of linearly.
+
+	Note that in general congestion avoidance (cwnd decreased when # packets
+	queued grows) cannot coexist with congestion control (cwnd decreased only
+	when there is packet loss) due to fairness issues. One scenario when the
+	can coexist safely is when the CA flows have RTTs << CC flows RTTs.
+
+	For further details see http://www.brakmo.org/networking/tcp-nv/
+
 config TCP_CONG_SCALABLE
 	tristate "Scalable TCP"
 	default n
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..06f335f 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
 obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 433231c..31846d5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -730,6 +730,15 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec_ms_jiffies,
 	},
 	{
+		.procname	= "tcp_nv_enable",
+		.data		= &sysctl_tcp_nv_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "icmp_msgs_per_sec",
 		.data		= &sysctl_icmp_msgs_per_sec,
 		.maxlen		= sizeof(int),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index aca4ae5..87560d9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_early_retrans __read_mostly = 3;
 int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_nv_enable __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_nv_enable);
 
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
 #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
new file mode 100644
index 000..af451b6
--- /dev/null
+++ b/net/ipv4/tcp_nv.c
@@ -0,0 +1,479 @@
+/*
+ * TCP NV: TCP with Congestion Avoidance
+ *
+ * TCP-NV is a successor of TCP-Vegas that has been developed to
+ * deal with the issues that occur in modern networks.
+ * Like TCP-Vegas, TCP-NV supports true congestion avoidance,
+ * the ability to detect congestion before packet losses occur.
+ * When congestion (queue buildup) starts to occur, TCP-NV
+ * predicts what the cwnd size should be for the current
+ * throughput and it re
[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
Replace 2 arguments (cnt and rtt) in the congestion control modules' pkts_acked() function with a struct. This will allow adding more information without having to modify existing congestion control modules (tcp_nv in particular needs bytes in flight when packet was sent). This was proposed by Neal Cardwell in his comments to the tcp_nv patch. Signed-off-by: Lawrence Brakmo --- include/net/tcp.h | 7 ++- net/ipv4/tcp_bic.c | 6 +++--- net/ipv4/tcp_cdg.c | 14 +++--- net/ipv4/tcp_cubic.c| 6 +++--- net/ipv4/tcp_htcp.c | 10 +- net/ipv4/tcp_illinois.c | 20 ++-- net/ipv4/tcp_input.c| 7 +-- net/ipv4/tcp_lp.c | 6 +++--- net/ipv4/tcp_vegas.c| 6 +++--- net/ipv4/tcp_vegas.h| 2 +- net/ipv4/tcp_veno.c | 6 +++--- net/ipv4/tcp_westwood.c | 6 +++--- net/ipv4/tcp_yeah.c | 6 +++--- 13 files changed, 55 insertions(+), 47 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 364426a..26e7651 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -834,6 +834,11 @@ enum tcp_ca_ack_event_flags { union tcp_cc_info; +struct ack_sample { + u32 pkts_acked; + s32 rtt_us; +}; + struct tcp_congestion_ops { struct list_headlist; u32 key; @@ -857,7 +862,7 @@ struct tcp_congestion_ops { /* new value of cwnd after loss (optional) */ u32 (*undo_cwnd)(struct sock *sk); /* hook for packet ack accounting (optional) */ - void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us); + void (*pkts_acked)(struct sock *sk, struct ack_sample); /* get info for inet_diag (optional) */ size_t (*get_info)(struct sock *sk, u32 ext, int *attr, union tcp_cc_info *info); diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c index fd1405d..6a873f7 100644 --- a/net/ipv4/tcp_bic.c +++ b/net/ipv4/tcp_bic.c @@ -197,15 +197,15 @@ static void bictcp_state(struct sock *sk, u8 new_state) /* Track delayed acknowledgment ratio using sliding window * ratio = (15*ratio + sample) / 16 */ -static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt) +static void bictcp_acked(struct sock *sk, struct ack_sample 
sample) { const struct inet_connection_sock *icsk = inet_csk(sk); if (icsk->icsk_ca_state == TCP_CA_Open) { struct bictcp *ca = inet_csk_ca(sk); - cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT; - ca->delayed_ack += cnt; + ca->delayed_ack += sample.pkts_acked - + (ca->delayed_ack >> ACK_RATIO_SHIFT); } } diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c index 167b6a3..ef64106 100644 --- a/net/ipv4/tcp_cdg.c +++ b/net/ipv4/tcp_cdg.c @@ -294,12 +294,12 @@ static void tcp_cdg_cong_avoid(struct sock *sk, u32 ack, u32 acked) ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr); } -static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us) +static void tcp_cdg_acked(struct sock *sk, struct ack_sample sample) { struct cdg *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); - if (rtt_us <= 0) + if (sample.rtt_us <= 0) return; /* A heuristic for filtering delayed ACKs, adapted from: @@ -307,20 +307,20 @@ static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us) * delay and rate based TCP mechanisms." TR 100219A. CAIA, 2010. */ if (tp->sacked_out == 0) { - if (num_acked == 1 && ca->delack) { + if (sample.pkts_acked == 1 && ca->delack) { /* A delayed ACK is only used for the minimum if it is * provenly lower than an existing non-zero minimum. 
*/ - ca->rtt.min = min(ca->rtt.min, rtt_us); + ca->rtt.min = min(ca->rtt.min, sample.rtt_us); ca->delack--; return; - } else if (num_acked > 1 && ca->delack < 5) { + } else if (sample.pkts_acked > 1 && ca->delack < 5) { ca->delack++; } } - ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us); - ca->rtt.max = max(ca->rtt.max, rtt_us); + ca->rtt.min = min_not_zero(ca->rtt.min, sample.rtt_us); + ca->rtt.max = max(ca->rtt.max, sample.rtt_us); } static u32 tcp_cdg_ssthresh(struct sock *sk) diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 28011fb..070d629 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -416,21 +416,21 @@ static void hystart_update(struct sock *sk, u32 delay) /* Track delayed acknowledgment ratio using sliding window * ratio = (15*ratio + sample) / 16 */ -static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us) +static void bictcp_acked(struct sock *sk, struct ack_sample sample) { const struct tcp
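The interface change in this patch can be sketched in user-space C. This is only an illustration of the calling convention, not the kernel code: struct ack_sample mirrors the one added to include/net/tcp.h (with stdint types standing in for u32/s32), while demo_ca, demo_cong_ops, demo_acked() and demo_deliver() are hypothetical names invented for the sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors the struct added by the patch; stdint types stand in for u32/s32. */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t  rtt_us;
};

/* Hypothetical per-connection state of a congestion control module. */
struct demo_ca {
	int32_t  rtt_min;	/* smallest valid RTT seen so far, 0 = none yet */
	uint32_t acked;		/* running count of acked packets */
};

/* Hypothetical ops table holding only the hook under discussion. */
struct demo_cong_ops {
	void (*pkts_acked)(struct demo_ca *ca, struct ack_sample sample);
};

/* A module reads the fields it cares about and ignores the rest. */
static void demo_acked(struct demo_ca *ca, struct ack_sample sample)
{
	ca->acked += sample.pkts_acked;

	if (sample.rtt_us <= 0)
		return;		/* no valid RTT measurement in this sample */

	if (ca->rtt_min == 0 || sample.rtt_us < ca->rtt_min)
		ca->rtt_min = sample.rtt_us;
}

/* Caller side: build the sample once and pass it to the optional hook. */
static void demo_deliver(const struct demo_cong_ops *ops, struct demo_ca *ca,
			 uint32_t pkts_acked, int32_t rtt_us)
{
	if (ops->pkts_acked) {
		struct ack_sample sample = {
			.pkts_acked = pkts_acked,
			.rtt_us = rtt_us,
		};

		ops->pkts_acked(ca, sample);
	}
}
```

The sketch uses designated initializers, which is a choice made here rather than something taken from the patch. With this shape, adding a field to struct ack_sample leaves the signature of every existing module untouched, which is the motivation given in the commit message.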
[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb
Based on comments by Neal Cardwell to tcp_nv patch: AFAICT this patch would not require an increase in the size of sk_buff cb[] if it were to take advantage of the fact that the tcp_skb_cb header.h4 and header.h6 fields are only used in the packet reception code path, and this in_flight field is only used on the transmit side. So the in_flight field could be placed in a struct that is itself placed in a union with the "header" union. That way the sender code can remember the in_flight value without requiring any extra space. And in the future other sender-side info could be stored in the "tx" struct, if needed. Signed-off-by: Lawrence Brakmo --- include/net/tcp.h | 13 ++--- net/ipv4/tcp_input.c | 5 - net/ipv4/tcp_output.c | 4 +++- 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 26e7651..2e62efe 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -755,11 +755,17 @@ struct tcp_skb_cb { /* 1 byte hole */ __u32 ack_seq;/* Sequence number ACK'd*/ union { - struct inet_skb_parmh4; + struct { + /* bytes in flight when this packet was sent */ + __u32 in_flight; + } tx; /* only used for outgoing skbs */ + union { + struct inet_skb_parmh4; #if IS_ENABLED(CONFIG_IPV6) - struct inet6_skb_parm h6; + struct inet6_skb_parm h6; #endif - } header; /* For incoming frames */ + } header; /* For incoming skbs */ + }; }; #define TCP_SKB_CB(__skb) ((struct tcp_skb_cb *)&((__skb)->cb[0])) @@ -837,6 +843,7 @@ union tcp_cc_info; struct ack_sample { u32 pkts_acked; s32 rtt_us; + u32 in_flight; }; struct tcp_congestion_ops { diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4f641f6..aca4ae5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3068,6 +3068,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, long ca_rtt_us = -1L; struct sk_buff *skb; u32 pkts_acked = 0; + u32 last_in_flight = 0; bool rtt_update; int flag = 0; @@ -3107,6 +3108,7 @@ static int tcp_clean_rtx_queue(struct 
sock *sk, int prior_fackets, if (!first_ackt.v64) first_ackt = last_ackt; + last_in_flight = TCP_SKB_CB(skb)->tx.in_flight; reord = min(pkts_acked, reord); if (!after(scb->end_seq, tp->high_seq)) flag |= FLAG_ORIG_SACK_ACKED; @@ -3196,7 +3198,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, } if (icsk->icsk_ca_ops->pkts_acked) { - struct ack_sample sample = {pkts_acked, ca_rtt_us}; + struct ack_sample sample = {pkts_acked, ca_rtt_us, + last_in_flight}; icsk->icsk_ca_ops->pkts_acked(sk, sample); } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 7105784..e9deab5 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, int err; BUG_ON(!skb || !tcp_skb_pcount(skb)); + tp = tcp_sk(sk); if (clone_it) { skb_mstamp_get(&skb->skb_mstamp); + TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq + - tp->snd_una; if (unlikely(skb_cloned(skb))) skb = pskb_copy(skb, gfp_mask); @@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, } inet = inet_sk(sk); - tp = tcp_sk(sk); tcb = TCP_SKB_CB(skb); memset(&opts, 0, sizeof(opts)); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
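The space argument quoted from Neal Cardwell can be checked with a small user-space sketch. The demo_parm4 and demo_parm6 structs and their sizes are assumptions standing in for inet_skb_parm and inet6_skb_parm, and demo_cb_before/demo_cb_after are simplified, hypothetical versions of tcp_skb_cb; only the union layout is taken from the patch.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins for the receive-side parm structs; sizes are illustrative. */
struct demo_parm4 { uint8_t bytes[16]; };
struct demo_parm6 { uint8_t bytes[24]; };

/* Layout before the patch: "header" is used for incoming skbs only. */
struct demo_cb_before {
	uint32_t ack_seq;
	union {
		struct demo_parm4 h4;
		struct demo_parm6 h6;
	} header;
};

/* Layout after the patch: "tx" overlays "header" through an anonymous union. */
struct demo_cb_after {
	uint32_t ack_seq;
	union {
		struct {
			uint32_t in_flight;	/* bytes in flight when sent */
		} tx;				/* outgoing skbs only */
		union {
			struct demo_parm4 h4;
			struct demo_parm6 h6;
		} header;			/* incoming skbs only */
	};
};

/* Both views start at the same offset, so the new field costs no space here. */
_Static_assert(offsetof(struct demo_cb_after, tx) ==
	       offsetof(struct demo_cb_after, header),
	       "tx and header must overlap");

/* Number of bytes the control block grew by in this sketch. */
static size_t demo_cb_growth(void)
{
	return sizeof(struct demo_cb_after) - sizeof(struct demo_cb_before);
}

/* Same arithmetic as the tcp_transmit_skb() hunk; unsigned math handles wrap. */
static uint32_t demo_in_flight(uint32_t end_seq, uint32_t snd_una)
{
	return end_seq - snd_una;
}
```

The overlay is only safe because a given skb is either on the transmit path or on the receive path, as the commit message states. The sketch does not enforce that; it only shows the size and offset consequences.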
[RFC PATCH v2 net-next 0/3] tcp: add NV congestion control
This patchset adds support for NV congestion control.

The first patch replaces two arguments in the pkts_acked() function of the
congestion control modules with a struct, making it easier to add more
parameters later without modifying the existing congestion control modules.

The second patch adds the number of bytes in_flight when a packet is sent to
the tcp_skb_cb without increasing its size.

The third patch adds NV congestion control support.

[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb
[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control

Signed-off-by: Lawrence Brakmo

 include/net/tcp.h          |  21 ++-
 net/ipv4/Kconfig           |  16 ++
 net/ipv4/Makefile          |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_bic.c         |   6 +-
 net/ipv4/tcp_cdg.c         |  14 +-
 net/ipv4/tcp_cubic.c       |   6 +-
 net/ipv4/tcp_htcp.c        |  10 +-
 net/ipv4/tcp_illinois.c    |  20 +-
 net/ipv4/tcp_input.c       |  12 +-
 net/ipv4/tcp_lp.c          |   6 +-
 net/ipv4/tcp_nv.c          | 479
 net/ipv4/tcp_output.c      |   4 +-
 net/ipv4/tcp_vegas.c       |   6 +-
 net/ipv4/tcp_vegas.h       |   2 +-
 net/ipv4/tcp_veno.c        |   6 +-
 net/ipv4/tcp_westwood.c    |   6 +-
 net/ipv4/tcp_yeah.c        |   6 +-
 18 files changed, 579 insertions(+), 51 deletions(-)
Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path
Hello,

On Tue, 21 Jul 2015, Martin KaFai Lau wrote:

> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

Locking usage is absolutely correct.

> +		if (!(neigh->nud_state & NUD_VALID) &&
> +		    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {

but this line is too long...

> +			work = kmalloc(sizeof(*work), GFP_ATOMIC);
> +			if (work) {
> +				__neigh_set_probe_once(neigh);
> +			}

scripts/checkpatch.pl --strict /tmp/file.patch

Regards

--
Julian Anastasov
[PATCH] mac80211_hwsim: unregister genetlink family properly
In hwsim_init_netlink(), call genl_unregister_family() if
netlink_register_notifier() fails, since the genetlink family is already
registered at that point.

Signed-off-by: Su Kang Yin
---
 drivers/net/wireless/mac80211_hwsim.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 99e873d..16d953e 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -3120,8 +3120,10 @@ static int hwsim_init_netlink(void)
 		goto failure;
 
 	rc = netlink_register_notifier(&hwsim_netlink_notifier);
-	if (rc)
+	if (rc) {
+		genl_unregister_family(&hwsim_genl_family);
 		goto failure;
+	}
 
 	return 0;
 
--
1.9.1
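The unwinding rule behind this fix (undo every step that already succeeded before reporting failure) can be sketched outside the kernel. Everything prefixed demo_ is a hypothetical stand-in: demo_register_family() plays the role of the genetlink registration, demo_register_notifier() the role of netlink_register_notifier(), and demo_notifier_rc is a knob that simulates the second step failing.

```c
#include <stdbool.h>

/* Hypothetical state: true while the "family" is registered. */
static bool demo_family_registered;

/* Set to a negative errno-style value to simulate a notifier failure. */
static int demo_notifier_rc;

static int demo_register_family(void)
{
	demo_family_registered = true;
	return 0;
}

static void demo_unregister_family(void)
{
	demo_family_registered = false;
}

static int demo_register_notifier(void)
{
	return demo_notifier_rc;
}

/* Same shape as the patched function: step 2 failing undoes step 1. */
static int demo_init_netlink(void)
{
	int rc;

	rc = demo_register_family();
	if (rc)
		goto failure;

	rc = demo_register_notifier();
	if (rc) {
		demo_unregister_family();	/* undo step 1 before bailing out */
		goto failure;
	}

	return 0;

failure:
	return rc;
}
```

Without the demo_unregister_family() call, a failed init in this sketch would return an error while leaving demo_family_registered set, which is the leak the patch description points at.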
Re: [RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers
On Tue, Jul 21, 2015 at 07:00:40PM -0700, Alex Gartrell wrote:
>   mov %rsp, %r1           ; r1 = rsp
>   add $-8, %r1            ; r1 = rsp - 8
>   store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  <- valid
>   store_q $123, (%r1)     ; *(u64*)r1 = 123  <- previously invalid
>   mov $0, %r0
>   exit                    ; Always need to exit

Is this your new eBPF assembler syntax? :)
imo gnu style looks ugly... ;)

It's great to see such in-depth understanding of verifier!!

> And we'd get the following error:
>
>   0: (bf) r1 = r10
>   1: (07) r1 += -8
>   2: (7a) *(u64 *)(r10 -8) = 999
>   3: (7a) *(u64 *)(r1 +0) = 999
>   R1 invalid mem access 'fp'
>
>   Unable to load program
>
> We already know that a register is a stack address and the appropriate
> offset, so we should be able to validate those references as well.

yes, we can teach verifier to do that. Though llvm doesn't generate such code.
It's small enough change.

> Signed-off-by: Alex Gartrell
> ---
>  kernel/bpf/verifier.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 039d866..5dfbece 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
>  			err = check_stack_write(state, off, size, value_regno);
>  		else
>  			err = check_stack_read(state, off, size, value_regno);
> +	} else if (state->regs[regno].type == PTR_TO_STACK) {
> +		int real_off = state->regs[regno].imm + off;

real_off is missing alignment and bounds checks. something like:

if (state->regs[regno].type == PTR_TO_STACK)
	off += state->regs[regno].imm;
if (off % size != 0)
	...
else if (state->regs[regno].type == FRAME_PTR || == PTR_TO_STACK)
	.. as-is here ...

would fix it.
please add few accept and reject tests for this to test_verifier.c as well.
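The alignment and bounds checks asked for in this review can be sketched as a standalone function. This is an illustration under stated assumptions, not the verifier code: demo_check_stack_access() is a hypothetical helper, DEMO_MAX_STACK (512 bytes) is an assumed stack size, and the rule that valid offsets are negative and relative to the frame pointer is inferred from the r10 - 8 example in the thread.

```c
#include <errno.h>

/* Assumed stack size for this sketch. */
#define DEMO_MAX_STACK 512

/*
 * imm:  constant already folded into the pointer register (r1 = fp + imm)
 * off:  offset encoded in the load/store instruction
 * size: access width in bytes (1, 2, 4 or 8)
 *
 * Returns 0 if the access is acceptable, a negative errno otherwise.
 */
static int demo_check_stack_access(int imm, int off, int size)
{
	int real_off = imm + off;

	if (size <= 0)
		return -EINVAL;

	/* The combined offset, not just the instruction offset, must be aligned. */
	if (real_off % size != 0)
		return -EACCES;

	/* The stack grows down from the frame pointer: valid offsets are negative. */
	if (real_off >= 0 || real_off < -DEMO_MAX_STACK)
		return -EACCES;

	return 0;
}
```

The point of folding imm into off before any check is that a pointer such as fp - 8 combined with an instruction offset of +8 lands exactly on the frame pointer, which each value taken alone would not reveal.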
Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path
Hi, Martin KaFai Lau wrote: > The patch checks neigh->nud_state before acquiring the writer lock. > Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF. You have to take "some" lock when accessing neigh->nud_state theoretically. > > I also take this chance to re-arrange the code. No, please do not mix multiple changes. > > 40 udpflood processes and a /64 gateway route are used. > The gateway has NUD_PERMANENT. Each of them is run for 30s. > At the end, the total number of finished sendto(): > > BeforeAfter > 55M 95M > > Signed-off-by: Martin KaFai Lau > Cc: Hannes Frederic Sowa > --- > net/ipv6/route.c | 41 - > 1 file changed, 20 insertions(+), 21 deletions(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 6090969..a6c6b5a 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w) > > static void rt6_probe(struct rt6_info *rt) > { > + struct __rt6_probe_work *work; > struct neighbour *neigh; > /* >* Okay, this does not seem to be appropriate > @@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt) > rcu_read_lock_bh(); > neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); > if (neigh) { > - write_lock(&neigh->lock); > if (neigh->nud_state & NUD_VALID) > goto out; > - } > - > - if (!neigh || > - time_after(jiffies, neigh->updated + > rt->rt6i_idev->cnf.rtr_probe_interval)) { > - struct __rt6_probe_work *work; > > + work = NULL; > + write_lock(&neigh->lock); > + if (!(neigh->nud_state & NUD_VALID) && > + time_after(jiffies, neigh->updated + > rt->rt6i_idev->cnf.rtr_probe_interval)) { > + work = kmalloc(sizeof(*work), GFP_ATOMIC); > + if (work) { > + __neigh_set_probe_once(neigh); > + } > + } > + write_unlock(&neigh->lock); > + } else { > work = kmalloc(sizeof(*work), GFP_ATOMIC); > + } > > - if (neigh && work) > - __neigh_set_probe_once(neigh); > - > - if (neigh) > - write_unlock(&neigh->lock); > + if (work) { > + INIT_WORK(&work->work, 
rt6_probe_deferred); > + work->target = rt->rt6i_gateway; > + dev_hold(rt->dst.dev); > + work->dev = rt->dst.dev; > + schedule_work(&work->work); > + } > > - if (work) { > - INIT_WORK(&work->work, rt6_probe_deferred); > - work->target = rt->rt6i_gateway; > - dev_hold(rt->dst.dev); > - work->dev = rt->dst.dev; > - schedule_work(&work->work); > - } > - } else { > out: > - write_unlock(&neigh->lock); > - } > rcu_read_unlock_bh(); > } > #else > -- Hideaki Yoshifuji Technical Division, MIRACLE LINUX CORPORATION
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet wrote:
> On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
>
>> > -       kfree_skb(skb);
>> > +       INIT_LIST_HEAD(&q->new_flows);
>> > +       INIT_LIST_HEAD(&q->old_flows);
>> > +       for (i = 0; i < q->flows_cnt; i++) {
>> > +               struct fq_codel_flow *flow = q->flows + i;
>> > +
>> > +               while (flow->head)
>> > +                       kfree_skb(dequeue_head(flow));
>> > +
>> > +               INIT_LIST_HEAD(&flow->flowchain);
>>
>>
>> You probably need to call codel_vars_init(&flow->cvars) as well.
>
> It is not necessary: flow->cvars only matter in the event of a dequeue,
> but whole qdisc is dismantled and no packet will be dequeued.
>

But won't it affect the next dequeue _after_ reset? That is not supposed
to happen, since we expect a fresh start after reset.
[RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers
  mov %rsp, %r1           ; r1 = rsp
  add $-8, %r1            ; r1 = rsp - 8
  store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  <- valid
  store_q $123, (%r1)     ; *(u64*)r1 = 123  <- previously invalid
  mov $0, %r0
  exit                    ; Always need to exit

And we'd get the following error:

  0: (bf) r1 = r10
  1: (07) r1 += -8
  2: (7a) *(u64 *)(r10 -8) = 999
  3: (7a) *(u64 *)(r1 +0) = 999
  R1 invalid mem access 'fp'

  Unable to load program

We already know that a register is a stack address and the appropriate
offset, so we should be able to validate those references as well.

Signed-off-by: Alex Gartrell
---
 kernel/bpf/verifier.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 039d866..5dfbece 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 			err = check_stack_write(state, off, size, value_regno);
 		else
 			err = check_stack_read(state, off, size, value_regno);
+	} else if (state->regs[regno].type == PTR_TO_STACK) {
+		int real_off = state->regs[regno].imm + off;
+
+		if (t == BPF_WRITE)
+			err = check_stack_write(
+				state, real_off, size, value_regno);
+		else
+			err = check_stack_read(
+				state, real_off, size, value_regno);
 	} else {
 		verbose("R%d invalid mem access '%s'\n",
 			regno, reg_type_str[state->regs[regno].type]);
--
Alex Gartrell
RE: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path
From: Lucas Stach Sent: Tuesday, July 21, 2015 11:11 PM > To: David S. Miller > Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org; > ker...@pengutronix.de; patchwork-...@pengutronix.de > Subject: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe > fail path > > This function frees resources and cancels delayed work item that have > been initialized in fec_ptp_init(). > > Use this to do proper error handling if something goes wrong in probe > function after fec_ptp_init has been called. > > Signed-off-by: Lucas Stach > --- > drivers/net/ethernet/freescale/fec.h | 1 + > drivers/net/ethernet/freescale/fec_main.c | 5 ++--- > drivers/net/ethernet/freescale/fec_ptp.c | 10 ++ > 3 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/freescale/fec.h > b/drivers/net/ethernet/freescale/fec.h > index 1eee73cccdf5..99d33e2d35e6 100644 > --- a/drivers/net/ethernet/freescale/fec.h > +++ b/drivers/net/ethernet/freescale/fec.h > @@ -562,6 +562,7 @@ struct fec_enet_private { }; > > void fec_ptp_init(struct platform_device *pdev); > +void fec_ptp_stop(struct platform_device *pdev); > void fec_ptp_start_cyclecounter(struct net_device *ndev); int > fec_ptp_set(struct net_device *ndev, struct ifreq *ifr); int > fec_ptp_get(struct net_device *ndev, struct ifreq *ifr); diff --git > a/drivers/net/ethernet/freescale/fec_main.c > b/drivers/net/ethernet/freescale/fec_main.c > index a7f1bdf718f8..32e3807c650e 100644 > --- a/drivers/net/ethernet/freescale/fec_main.c > +++ b/drivers/net/ethernet/freescale/fec_main.c > @@ -3494,6 +3494,7 @@ failed_register: > failed_mii_init: > failed_irq: > failed_init: > + fec_ptp_stop(pdev); > if (fep->reg_phy) > regulator_disable(fep->reg_phy); > failed_regulator: > @@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev) > struct net_device *ndev = platform_get_drvdata(pdev); > struct fec_enet_private *fep = netdev_priv(ndev); > > - cancel_delayed_work_sync(&fep->time_keep); > 
cancel_work_sync(&fep->tx_timeout_work); > + fec_ptp_stop(pdev); > unregister_netdev(ndev); > fec_enet_mii_remove(fep); > if (fep->reg_phy) > regulator_disable(fep->reg_phy); > - if (fep->ptp_clock) > - ptp_clock_unregister(fep->ptp_clock); > of_node_put(fep->phy_node); > free_netdev(ndev); > > diff --git a/drivers/net/ethernet/freescale/fec_ptp.c > b/drivers/net/ethernet/freescale/fec_ptp.c > index a15663ad7f5e..f457a23d0bfb 100644 > --- a/drivers/net/ethernet/freescale/fec_ptp.c > +++ b/drivers/net/ethernet/freescale/fec_ptp.c > @@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev) > schedule_delayed_work(&fep->time_keep, HZ); } > > +void fec_ptp_stop(struct platform_device *pdev) { > + struct net_device *ndev = platform_get_drvdata(pdev); > + struct fec_enet_private *fep = netdev_priv(ndev); > + > + cancel_delayed_work_sync(&fep->time_keep); > + if (fep->ptp_clock) > + ptp_clock_unregister(fep->ptp_clock); > +} > + > /** > * fec_ptp_check_pps_event > * @fep: the fec_enet_private structure handle > -- > 2.1.4 Acked-by: Fugang Duan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring
From: Lucas Stach
Sent: Tuesday, July 21, 2015 11:11 PM
> To: David S. Miller
> Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org;
> ker...@pengutronix.de; patchwork-...@pengutronix.de
> Subject: [PATCH v2 1/2] net: fec: use managed DMA API functions to
> allocate BD ring
>
> So it gets freed when the device is going away.
> This fixes a DMA memory leak on driver probe() fail and driver remove().
>
> Signed-off-by: Lucas Stach
> ---
> v2: Fix indentation of second line to fix alignment with opening bracket.
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 349365d85b92..a7f1bdf718f8 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
>  			fep->bufdesc_size;
>
>  	/* Allocate memory for buffer descriptors. */
> -	cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
> -				      GFP_KERNEL);
> +	cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
> +				       GFP_KERNEL);
>  	if (!cbd_base) {
>  		return -ENOMEM;
>  	}
> --

Can you also convert the allocation below to dmam_alloc_coherent()?

	txq->tso_hdrs = dma_alloc_coherent(NULL,
					   txq->tx_ring_size * TSO_HEADER_SIZE,
					   &txq->tso_hdrs_dma,
					   GFP_KERNEL);

Regards,
Andy
Re: [PATCH] e1000e: Move e1000e_disable_aspm_locked() inside CONFIG_PM
On Wed, 2015-07-15 at 03:30 -0700, Jeff Kirsher wrote:
> On Tue, 2015-07-14 at 13:54 +1000, Michael Ellerman wrote:
> > e1000e_disable_aspm_locked() is only used in __e1000_resume() which is
> > inside CONFIG_PM. So when CONFIG_PM=n we get a "defined but not used"
> > warning for e1000e_disable_aspm_locked().
> >
> > Move it inside the existing CONFIG_PM block to avoid the warning.
> >
> > Signed-off-by: Michael Ellerman
> > ---
> >  drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> NACK, this is already fixed in my next-queue tree. Raanan submitted a
> patch back on July 6th to resolve this issue, see commit id
> a75787d2246a93d256061db602f252703559af65 in my dev-queue branch of my
> next-queue tree.

OK.

I take it your next-queue is destined for 4.3, so we'll just have to suck on
the warning until then?

cheers
[net-next:master 200/208] drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different base types)
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 16040894b26af9f85d9395f072c53d76a44eba21 commit: 614732eaa12dd462c0ab274700bed14f36afea5e [200/208] openvswitch: Use regular VXLAN net_device device reproduce: # apt-get install sparse git checkout 614732eaa12dd462c0ab274700bed14f36afea5e make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types) include/net/checksum.h:166:35:expected restricted __wsum [usertype] csum include/net/checksum.h:166:35:got restricted __sum16 include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types) include/net/checksum.h:166:43:expected restricted __wsum [usertype] addend include/net/checksum.h:166:43:got restricted __sum16 [usertype] include/net/checksum.h:174:43: sparse: incorrect type in argument 2 (different base types) include/net/checksum.h:174:43:expected restricted __wsum [usertype] addend include/net/checksum.h:174:43:got restricted __sum16 [usertype] include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types) include/net/checksum.h:166:35:expected restricted __wsum [usertype] csum include/net/checksum.h:166:35:got restricted __sum16 include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types) include/net/checksum.h:166:43:expected restricted __wsum [usertype] addend include/net/checksum.h:166:43:got restricted __sum16 [usertype] >> drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different >> base types) drivers/net/vxlan.c:1739:21:expected restricted __be32 [usertype] vx_vni drivers/net/vxlan.c:1739:21:got unsigned int [unsigned] [usertype] vni drivers/net/vxlan.c:1818:21: sparse: incorrect type in assignment (different base types) drivers/net/vxlan.c:1818:21:expected restricted __be32 [usertype] vx_vni drivers/net/vxlan.c:1818:21:got unsigned int 
[unsigned] [usertype] vni >> drivers/net/vxlan.c:2014:58: sparse: incorrect type in argument 11 >> (different base types) drivers/net/vxlan.c:2014:58:expected unsigned int [unsigned] [usertype] vni drivers/net/vxlan.c:2014:58:got restricted __be32 [usertype] drivers/net/vxlan.c:2072:67: sparse: incorrect type in argument 11 (different base types) drivers/net/vxlan.c:2072:67:expected unsigned int [unsigned] [usertype] vni drivers/net/vxlan.c:2072:67:got restricted __be32 [usertype] vim +1739 drivers/net/vxlan.c 1723 } 1724 1725 skb = vlan_hwaccel_push_inside(skb); 1726 if (WARN_ON(!skb)) { 1727 err = -ENOMEM; 1728 goto err; 1729 } 1730 1731 skb = iptunnel_handle_offloads(skb, udp_sum, type); 1732 if (IS_ERR(skb)) { 1733 err = -EINVAL; 1734 goto err; 1735 } 1736 1737 vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh)); 1738 vxh->vx_flags = htonl(VXLAN_HF_VNI); > 1739 vxh->vx_vni = vni; 1740 1741 if (type & SKB_GSO_TUNNEL_REMCSUM) { 1742 u32 data = (skb_checksum_start_offset(skb) - hdrlen) >> 1743 VXLAN_RCO_SHIFT; 1744 1745 if (skb->csum_offset == offsetof(struct udphdr, check)) 1746 data |= VXLAN_RCO_UDP; 1747 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
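The warnings at vxlan.c:1739 and :2014 are about a host-order integer being mixed with a field annotated as big-endian (__be32). As a rough user-space illustration of why the distinction matters, the sketch below packs a 24-bit VNI into the 32-bit on-wire field, assuming the RFC 7348 layout (24-bit VNI followed by 8 reserved bits). demo_put_vni() and demo_get_vni() are hypothetical helpers, and the report above does not say which fix the driver adopted.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/*
 * Store a host-order 24-bit VNI into the 4-byte wire field.
 * The VNI occupies the upper three bytes; the low byte is reserved.
 */
static void demo_put_vni(uint8_t wire[4], uint32_t vni)
{
	uint32_t be = htonl((vni & 0xffffffu) << 8);	/* host -> network order */

	memcpy(wire, &be, sizeof(be));
}

/* Recover the host-order VNI from the 4-byte wire field. */
static uint32_t demo_get_vni(const uint8_t wire[4])
{
	uint32_t be;

	memcpy(&be, wire, sizeof(be));
	return ntohl(be) >> 8;				/* network -> host order */
}
```

On a little-endian host, writing the host-order value into the field without the htonl() step would put the bytes on the wire in reverse order. This is the class of mistake the sparse __be32 annotation is there to catch at build time.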
[net-next:master 187/208] net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 16040894b26af9f85d9395f072c53d76a44eba21 commit: e3e4712ec0961ed586a8db340bd994c4ad7f5dba [187/208] mpls: ip tunnel support reproduce: # apt-get install sparse git checkout e3e4712ec0961ed586a8db340bd994c4ad7f5dba make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison >> expression (different address spaces) vim +73 net/mpls/mpls_iptunnel.c 57 /* Obtain the ttl */ 58 if (skb->protocol == htons(ETH_P_IP)) { 59 ttl = ip_hdr(skb)->ttl; 60 rt = (struct rtable *)dst; 61 lwtstate = rt->rt_lwtstate; 62 } else if (skb->protocol == htons(ETH_P_IPV6)) { 63 ttl = ipv6_hdr(skb)->hop_limit; 64 rt6 = (struct rt6_info *)dst; 65 lwtstate = rt6->rt6i_lwtstate; 66 } else { 67 goto drop; 68 } 69 70 skb_orphan(skb); 71 72 /* Find the output device */ > 73 out_dev = rcu_dereference(dst->dev); 74 if (!mpls_output_possible(out_dev) || 75 !lwtstate || skb_warn_if_lro(skb)) 76 goto drop; 77 78 skb_forward_csum(skb); 79 80 tun_encap_info = mpls_lwtunnel_encap(lwtstate); 81 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Patch net] sch_choke: drop all packets in queue during reset
Signed-off-by: Cong Wang
---
 net/sched/sch_choke.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 93d5742..6a783af 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -385,6 +385,19 @@ static void choke_reset(struct Qdisc *sch)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 
+	while (q->head != q->tail) {
+		struct sk_buff *skb = q->tab[q->head];
+
+		q->head = (q->head + 1) & q->tab_mask;
+		if (!skb)
+			continue;
+		qdisc_qstats_backlog_dec(sch, skb);
+		--sch->q.qlen;
+		qdisc_drop(skb, sch);
+	}
+
+	memset(q->tab, 0, (q->tab_mask + 1) * sizeof(struct sk_buff *));
+	q->head = q->tail = 0;
 	red_restart(&q->vars);
 }
 
--
1.8.3.1
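The drain loop in this patch walks a power-of-two ring buffer from head to tail, skipping holes, and then clears the table. A self-contained sketch of the same traversal follows; demo_queue, DEMO_TAB_SIZE and demo_reset() are hypothetical, and a plain counter stands in for qdisc_drop() and the backlog accounting.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_TAB_SIZE 8		/* must be a power of two */

struct demo_queue {
	void *tab[DEMO_TAB_SIZE];	/* NULL entries are holes */
	unsigned int tab_mask;		/* DEMO_TAB_SIZE - 1 */
	unsigned int head, tail;
	unsigned int qlen;		/* number of non-NULL entries */
	unsigned int dropped;		/* stand-in for the drop accounting */
};

static void demo_reset(struct demo_queue *q)
{
	while (q->head != q->tail) {
		void *pkt = q->tab[q->head];

		/* Masking wraps the index around the end of the table. */
		q->head = (q->head + 1) & q->tab_mask;
		if (!pkt)
			continue;	/* hole left by an earlier removal */
		--q->qlen;
		++q->dropped;
	}

	/* Clear stale pointers and restart from slot 0. */
	memset(q->tab, 0, (q->tab_mask + 1) * sizeof(void *));
	q->head = q->tail = 0;
}
```

The mask only works as a wrap-around because the table size is a power of two; with any other size the index would need a modulo instead.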
[PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path
The patch checks neigh->nud_state before acquiring the writer lock. Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF. I also take this chance to re-arrange the code. 40 udpflood processes and a /64 gateway route are used. The gateway has NUD_PERMANENT. Each of them is run for 30s. At the end, the total number of finished sendto(): BeforeAfter 55M 95M Signed-off-by: Martin KaFai Lau Cc: Hannes Frederic Sowa --- net/ipv6/route.c | 41 - 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 6090969..a6c6b5a 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w) static void rt6_probe(struct rt6_info *rt) { + struct __rt6_probe_work *work; struct neighbour *neigh; /* * Okay, this does not seem to be appropriate @@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt) rcu_read_lock_bh(); neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); if (neigh) { - write_lock(&neigh->lock); if (neigh->nud_state & NUD_VALID) goto out; - } - - if (!neigh || - time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) { - struct __rt6_probe_work *work; + work = NULL; + write_lock(&neigh->lock); + if (!(neigh->nud_state & NUD_VALID) && + time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) { + work = kmalloc(sizeof(*work), GFP_ATOMIC); + if (work) { + __neigh_set_probe_once(neigh); + } + } + write_unlock(&neigh->lock); + } else { work = kmalloc(sizeof(*work), GFP_ATOMIC); + } - if (neigh && work) - __neigh_set_probe_once(neigh); - - if (neigh) - write_unlock(&neigh->lock); + if (work) { + INIT_WORK(&work->work, rt6_probe_deferred); + work->target = rt->rt6i_gateway; + dev_hold(rt->dst.dev); + work->dev = rt->dst.dev; + schedule_work(&work->work); + } - if (work) { - INIT_WORK(&work->work, rt6_probe_deferred); - work->target = rt->rt6i_gateway; - dev_hold(rt->dst.dev); - work->dev = 
rt->dst.dev; - schedule_work(&work->work); - } - } else { out: - write_unlock(&neigh->lock); - } rcu_read_unlock_bh(); } #else -- 1.8.1
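The control flow of the patched rt6_probe() (peek at the state without the lock, take the writer lock only when the peek says work may be needed, then re-check under the lock) can be sketched in single-threaded C. Everything here is hypothetical: demo_lock merely counts acquisitions instead of locking, DEMO_NUD_VALID stands in for NUD_VALID, and the sketch takes no position on whether the unlocked read is safe, which is the point raised in the replies to this patch.

```c
#include <stdbool.h>

#define DEMO_NUD_VALID 0x1	/* hypothetical "neighbour is valid" flag */

/* Fake lock that only records how often it was taken. */
struct demo_lock {
	unsigned long acquisitions;
	bool held;
};

static void demo_write_lock(struct demo_lock *l)
{
	l->held = true;
	l->acquisitions++;
}

static void demo_write_unlock(struct demo_lock *l)
{
	l->held = false;
}

struct demo_neigh {
	struct demo_lock lock;
	unsigned int nud_state;
	unsigned long updated;		/* time of last update, in ticks */
};

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Returns true when a probe should be scheduled. */
static bool demo_should_probe(struct demo_neigh *n, unsigned long now,
			      unsigned long interval)
{
	bool probe = false;

	/* Fast path: a valid neighbour never needs the writer lock. */
	if (n->nud_state & DEMO_NUD_VALID)
		return false;

	demo_write_lock(&n->lock);
	/* Re-check under the lock: the state may have changed since the peek. */
	if (!(n->nud_state & DEMO_NUD_VALID) &&
	    demo_time_after(now, n->updated + interval))
		probe = true;
	demo_write_unlock(&n->lock);

	return probe;
}
```

In the benchmark described above the gateway is NUD_PERMANENT, so every call would take the fast path of this sketch and never touch the lock, which is consistent with the reported throughput gain.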
[net-next:master 195/208] net/core/fib_rules.c:418:3: error: implicit declaration of function 'ip_tunnel_need_metadata'
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 16040894b26af9f85d9395f072c53d76a44eba21 commit: e7030878fc8448492b6e5cecd574043f63271298 [195/208] fib: Add fib rule match on tunnel id config: i386-randconfig-r0-201529 (attached as .config) reproduce: git checkout e7030878fc8448492b6e5cecd574043f63271298 # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): net/core/fib_rules.c: In function 'fib_nl_newrule': >> net/core/fib_rules.c:418:3: error: implicit declaration of function >> 'ip_tunnel_need_metadata' [-Werror=implicit-function-declaration] ip_tunnel_need_metadata(); ^ net/core/fib_rules.c: In function 'fib_nl_delrule': >> net/core/fib_rules.c:505:4: error: implicit declaration of function >> 'ip_tunnel_unneed_metadata' [-Werror=implicit-function-declaration] ip_tunnel_unneed_metadata(); ^ cc1: some warnings being treated as errors vim +/ip_tunnel_need_metadata +418 net/core/fib_rules.c 412 ops->nr_goto_rules++; 413 414 if (unresolved) 415 ops->unresolved_rules++; 416 417 if (rule->tun_id) > 418 ip_tunnel_need_metadata(); 419 420 notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).portid); 421 flush_route_cache(ops); 422 rules_ops_put(ops); 423 return 0; 424 425 errout_free: 426 kfree(rule); 427 errout: 428 rules_ops_put(ops); 429 return err; 430 } 431 432 static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh) 433 { 434 struct net *net = sock_net(skb->sk); 435 struct fib_rule_hdr *frh = nlmsg_data(nlh); 436 struct fib_rules_ops *ops = NULL; 437 struct fib_rule *rule, *tmp; 438 struct nlattr *tb[FRA_MAX+1]; 439 int err = -EINVAL; 440 441 if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh))) 442 goto errout; 443 444 ops = lookup_rules_ops(net, frh->family); 445 if (ops == NULL) { 446 err = -EAFNOSUPPORT; 447 goto errout; 448 } 449 450 err = nlmsg_parse(nlh, sizeof(*frh), tb, FRA_MAX, ops->policy); 451 if (err < 0) 452 goto errout; 453 454 err = 
validate_rulemsg(frh, tb, ops); 455 if (err < 0) 456 goto errout; 457 458 list_for_each_entry(rule, &ops->rules_list, list) { 459 if (frh->action && (frh->action != rule->action)) 460 continue; 461 462 if (frh_get_table(frh, tb) && 463 (frh_get_table(frh, tb) != rule->table)) 464 continue; 465 466 if (tb[FRA_PRIORITY] && 467 (rule->pref != nla_get_u32(tb[FRA_PRIORITY]))) 468 continue; 469 470 if (tb[FRA_IIFNAME] && 471 nla_strcmp(tb[FRA_IIFNAME], rule->iifname)) 472 continue; 473 474 if (tb[FRA_OIFNAME] && 475 nla_strcmp(tb[FRA_OIFNAME], rule->oifname)) 476 continue; 477 478 if (tb[FRA_FWMARK] && 479 (rule->mark != nla_get_u32(tb[FRA_FWMARK]))) 480 continue; 481 482 if (tb[FRA_FWMASK] && 483 (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK]))) 484 continue; 485 486 if (tb[FRA_TUN_ID] && 487 (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID]))) 488 continue; 489 490 if (!ops->compare(rule, frh, tb)) 491 continue; 492 493 if (rule->flags & FIB_RULE_PERMANENT) { 494 err = -EPERM; 495 goto errout; 496 } 497 498 if (ops->delete) { 499 err = ops->delete(rule); 500 if (err) 501 goto errout; 502 } 503 504 if (rule->tun_id) > 505 ip_tunnel_unneed_metadata(); 506 507 list_del_rcu(&rule->list); 508 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation # # Automatically generated file; DO NOT EDIT. # Linux/i386 4.2.0-rc2 Kernel Configuration # # CONFIG_64BIT is not se
[net-next:master 194/208] include/net/dst_metadata.h:39:4: error: implicit declaration of function 'lwt_tun_info'
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 16040894b26af9f85d9395f072c53d76a44eba21 commit: 3093fbe7ff4bc7d1571fc217dade1cf80330a714 [194/208] route: Per route IP tunnel metadata via lightweight tunnel config: i386-randconfig-i0-201529 (attached as .config) reproduce: git checkout 3093fbe7ff4bc7d1571fc217dade1cf80330a714 # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): In file included from net/core/dst.c:25:0: include/net/dst_metadata.h: In function 'skb_tunnel_info': >> include/net/dst_metadata.h:39:4: error: implicit declaration of function >> 'lwt_tun_info' [-Werror=implicit-function-declaration] return lwt_tun_info(rt->rt_lwtstate); ^ >> include/net/dst_metadata.h:39:4: warning: return makes pointer from integer >> without a cast cc1: some warnings being treated as errors vim +/lwt_tun_info +39 include/net/dst_metadata.h 33 return &md_dst->u.tun_info; 34 35 switch (family) { 36 case AF_INET: 37 rt = (struct rtable *)skb_dst(skb); 38 if (rt && rt->rt_lwtstate) > 39 return lwt_tun_info(rt->rt_lwtstate); 40 break; 41 } 42 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation # # Automatically generated file; DO NOT EDIT. 
# Linux/i386 4.2.0-rc2 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set CONFIG_KERNEL_LZ4=y CONFIG_DEFAULT_HOSTNAME="(none)" # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y 
# # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_DEBUG=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set CONFIG_NO_HZ=y # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # # RCU Subsystem # CONFIG_TINY_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set # CONFIG_RCU_STALL_COMMON is not set # CONFIG_TREE_RCU_TRACE is not set # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y # CONFIG_IKCONFIG_PROC is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_DEVICE=y # CONFIG_CPUSETS is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_MEMCG is not set # CONFIG_CGROUP_PERF is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y CONFIG_RT_GROUP_SCHED=y CONFIG_BLK_CGROUP=y CONFIG_DEBUG_BLK_CGROUP=y # CONFIG_CHECKPOINT_RESTO
[Patch net] sch_plug: purge buffered packets during reset
Otherwise the skbuff related structures are not correctly
refcount'ed.

Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
---
 net/sched/sch_plug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index 89f8fcf..ade9445 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -216,6 +216,7 @@ static struct Qdisc_ops plug_qdisc_ops __read_mostly = {
 	.peek		= qdisc_peek_head,
 	.init		= plug_init,
 	.change		= plug_change,
+	.reset		= qdisc_reset_queue,
 	.owner		= THIS_MODULE,
 };
 
-- 
1.8.3.1
Re: [PATCH net-next 1/1] tipc: fix compatibility bug
From: Jon Maloy
Date: Tue, 21 Jul 2015 06:42:28 -0400

> In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6
> ("tipc: reduce locking scope during packet reception") we introduced
> a new function tipc_link_proto_rcv(). This function contains a bug,
> so that it sometimes by error sends out a non-zero link priority value
> in created protocol messages.
>
> The bug may lead to an extra link reset at initial link establishing
> with older nodes. This will never happen more than once, whereafter
> the link will work as intended.
>
> We fix this bug in this commit.
>
> Signed-off-by: Jon Maloy

Applied.
[PATCH net-next] net: track success and failure of TCP PMTU probing
From: Rick Jones

Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones
---

Tested by loading-up into an OpenStack instance and kicking the MTU
out from under it in the corresponding router namespace.

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index eee8968..25a9ad8 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -278,6 +278,8 @@ enum
 	LINUX_MIB_TCPACKSKIPPEDCHALLENGE,	/* TCPACKSkippedChallenge */
 	LINUX_MIB_TCPWINPROBE,			/* TCPWinProbe */
 	LINUX_MIB_TCPKEEPALIVE,			/* TCPKeepAlive */
+	LINUX_MIB_TCPMTUPFAIL,			/* TCPMTUPFail */
+	LINUX_MIB_TCPMTUPSUCCESS,		/* TCPMTUPSuccess */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index da5d483..3abd9d7 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -300,6 +300,8 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPACKSkippedChallenge", LINUX_MIB_TCPACKSKIPPEDCHALLENGE),
 	SNMP_MIB_ITEM("TCPWinProbe", LINUX_MIB_TCPWINPROBE),
 	SNMP_MIB_ITEM("TCPKeepAlive", LINUX_MIB_TCPKEEPALIVE),
+	SNMP_MIB_ITEM("TCPMTUPFail", LINUX_MIB_TCPMTUPFAIL),
+	SNMP_MIB_ITEM("TCPMTUPSuccess", LINUX_MIB_TCPMTUPSUCCESS),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1578fc2..cda3ffe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2593,6 +2593,7 @@ static void tcp_mtup_probe_failed(struct sock *sk)
 
 	icsk->icsk_mtup.search_high = icsk->icsk_mtup.probe_size - 1;
 	icsk->icsk_mtup.probe_size = 0;
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPFAIL);
 }
 
 static void tcp_mtup_probe_success(struct sock *sk)
@@ -2612,6 +2613,7 @@ static void tcp_mtup_probe_success(struct sock *sk)
 	icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size;
 	icsk->icsk_mtup.probe_size = 0;
 	tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPSUCCESS);
 }
 
 /* Do a simple retransmit without using the backoff mechanisms in
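For anyone wanting to watch the two new counters from userspace, here is a
minimal sketch, assuming they show up in the usual TcpExt name/value line
pair of /proc/net/netstat once the patch is applied. The helper name
netstat_lookup() is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in a header/value line pair laid out like
 * /proc/net/netstat ("TcpExt: Name1 Name2 ..." then "TcpExt: 1 2 ...").
 * Both lines are walked token by token in lock-step.
 * Returns 0 and stores the value in *out, or -1 if the name is absent.
 * netstat_lookup() is a hypothetical helper, not an existing API.
 */
int netstat_lookup(const char *names, const char *values,
		   const char *wanted, unsigned long long *out)
{
	size_t wlen;

	assert(names && values && wanted && out);
	wlen = strlen(wanted);

	for (;;) {
		size_t nlen, vlen;

		/* Skip separators in front of the next token on each line. */
		names += strspn(names, " \n");
		values += strspn(values, " \n");
		if (!*names || !*values)
			return -1;

		nlen = strcspn(names, " \n");
		vlen = strcspn(values, " \n");
		if (nlen == wlen && memcmp(names, wanted, wlen) == 0) {
			*out = strtoull(values, NULL, 10);
			return 0;
		}
		names += nlen;
		values += vlen;
	}
}
```

A caller would read the two TcpExt lines with fgets() and pass them in,
asking for "TCPMTUPFail" or "TCPMTUPSuccess".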
Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested
From: Florian Fainelli
Date: Mon, 20 Jul 2015 17:49:54 -0700

> Changes in v5:
>
> - removed an invalid use of the link_update callback in the SF2 driver
>   that appeared after merging "net: phy: fixed_phy: handle link-down case"
>
> - reworded the commit message for patch 2 to make it clear what it fixes and
>   why this is required

Series applied, thanks Florian.
Re: pull-request: wireless-drivers 2015-07-20
From: Kalle Valo
Date: Mon, 20 Jul 2015 18:36:30 +0300

> here are a few fixes for 4.2, should not have anything out of the ordinary.
> Please let me know if there are any issues.

Pulled, thanks.
Re: [PATCH net-next v4] ipv6: sysctl to restrict candidate source addresses
From: Erik Kline
Date: Mon, 20 Jul 2015 16:06:34 +0200

> I thought perhaps "use_oif_addr_only" was a slightly clearer sysctl name.
>
> (Maybe it should be plural, "use_oif_addrs_only"?)

I think plural would be better too, please respin with that change.

Thanks.
Re: [PATCHv2 net-next] net: #ifdefify sk_classid member of struct sock
From: Mathias Krause
Date: Sun, 19 Jul 2015 22:21:13 +0200

> The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
> enabled. #ifdefify it to reduce the size of struct sock on 32 bit
> systems, at least.
>
> Signed-off-by: Mathias Krause
> ---
> v2:
> - ensure we'll error out in nft_meta_get_init() if CONFIG_CGROUP_NET_CLASSID
>   is not set

Applied, thanks.
[PATCH] ravb: fix ring memory allocation
The driver is written as if it can adapt to a low memory situation allocating less RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In reality though the driver would malfunction in this case. Stop being overly smart and just fail in such situation -- this is achieved by moving the memory allocation from ravb_ring_format() to ravb_ring_init(). We leave dma_map_single() calls in place but make their failure non-fatal by marking the corresponding RX descriptors with zero data size which should prevent DMA to an invalid addresses. Signed-off-by: Sergei Shtylyov --- The patch is against Dave Miller's 'net.git' repo. drivers/net/ethernet/renesas/ravb_main.c | 59 +-- 1 file changed, 34 insertions(+), 25 deletions(-) Index: net/drivers/net/ethernet/renesas/ravb_main.c === --- net.orig/drivers/net/ethernet/renesas/ravb_main.c +++ net/drivers/net/ethernet/renesas/ravb_main.c @@ -228,9 +228,7 @@ static void ravb_ring_format(struct net_ struct ravb_desc *desc = NULL; int rx_ring_size = sizeof(*rx_desc) * priv->num_rx_ring[q]; int tx_ring_size = sizeof(*tx_desc) * priv->num_tx_ring[q]; - struct sk_buff *skb; dma_addr_t dma_addr; - void *buffer; int i; priv->cur_rx[q] = 0; @@ -241,41 +239,28 @@ static void ravb_ring_format(struct net_ memset(priv->rx_ring[q], 0, rx_ring_size); /* Build RX ring buffer */ for (i = 0; i < priv->num_rx_ring[q]; i++) { - priv->rx_skb[q][i] = NULL; - skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1); - if (!skb) - break; - ravb_set_buffer_align(skb); /* RX descriptor */ rx_desc = &priv->rx_ring[q][i]; /* The size of the buffer should be on 16-byte boundary. 
*/ rx_desc->ds_cc = cpu_to_le16(ALIGN(PKT_BUF_SZ, 16)); - dma_addr = dma_map_single(&ndev->dev, skb->data, + dma_addr = dma_map_single(&ndev->dev, priv->rx_skb[q][i]->data, ALIGN(PKT_BUF_SZ, 16), DMA_FROM_DEVICE); - if (dma_mapping_error(&ndev->dev, dma_addr)) { - dev_kfree_skb(skb); - break; - } - priv->rx_skb[q][i] = skb; + /* We just set the data size to 0 for a failed mapping which +* should prevent DMA from happening... +*/ + if (dma_mapping_error(&ndev->dev, dma_addr)) + rx_desc->ds_cc = cpu_to_le16(0); rx_desc->dptr = cpu_to_le32(dma_addr); rx_desc->die_dt = DT_FEMPTY; } rx_desc = &priv->rx_ring[q][i]; rx_desc->dptr = cpu_to_le32((u32)priv->rx_desc_dma[q]); rx_desc->die_dt = DT_LINKFIX; /* type */ - priv->dirty_rx[q] = (u32)(i - priv->num_rx_ring[q]); memset(priv->tx_ring[q], 0, tx_ring_size); /* Build TX ring buffer */ for (i = 0; i < priv->num_tx_ring[q]; i++) { - priv->tx_skb[q][i] = NULL; - priv->tx_buffers[q][i] = NULL; - buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL); - if (!buffer) - break; - /* Aligned TX buffer */ - priv->tx_buffers[q][i] = buffer; tx_desc = &priv->tx_ring[q][i]; tx_desc->die_dt = DT_EEMPTY; } @@ -298,7 +283,10 @@ static void ravb_ring_format(struct net_ static int ravb_ring_init(struct net_device *ndev, int q) { struct ravb_private *priv = netdev_priv(ndev); + struct sk_buff *skb; int ring_size; + void *buffer; + int i; /* Allocate RX and TX skb rings */ priv->rx_skb[q] = kcalloc(priv->num_rx_ring[q], @@ -308,12 +296,28 @@ static int ravb_ring_init(struct net_dev if (!priv->rx_skb[q] || !priv->tx_skb[q]) goto error; + for (i = 0; i < priv->num_rx_ring[q]; i++) { + skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1); + if (!skb) + goto error; + ravb_set_buffer_align(skb); + priv->rx_skb[q][i] = skb; + } + /* Allocate rings for the aligned buffers */ priv->tx_buffers[q] = kcalloc(priv->num_tx_ring[q], sizeof(*priv->tx_buffers[q]), GFP_KERNEL); if (!priv->tx_buffers[q]) goto error; + for (i = 0; i < 
priv->num_tx_ring[q]; i++) { + buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL); + if (!buffer) + goto error; + /* Aligned TX buffer */ + priv->tx_buffers[q][i] = buffer; + } +
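The "allocate everything up front or fail" approach described in the
changelog can be sketched in plain userspace C as below. This is only an
illustration of the pattern, not the driver code: struct ring, ring_init()
and ring_free() are hypothetical names, and malloc() stands in for
netdev_alloc_skb() and kmalloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical ring: one buffer pointer per ring entry. */
struct ring {
	void **buf;
	size_t entries;
};

/* Release every buffer and the pointer array; safe on a partial ring. */
void ring_free(struct ring *r)
{
	size_t i;

	if (!r->buf)
		return;
	for (i = 0; i < r->entries; i++)
		free(r->buf[i]);	/* free(NULL) is a no-op */
	free(r->buf);
	r->buf = NULL;
}

/*
 * Allocate all buffers at init time. If any allocation fails, undo the
 * partial work and report -ENOMEM instead of running with a short ring.
 */
int ring_init(struct ring *r, size_t entries, size_t buf_size)
{
	size_t i;

	assert(r && entries > 0 && buf_size > 0);
	r->entries = entries;
	r->buf = calloc(entries, sizeof(*r->buf));
	if (!r->buf)
		return -ENOMEM;

	for (i = 0; i < entries; i++) {
		r->buf[i] = malloc(buf_size);
		if (!r->buf[i]) {
			ring_free(r);
			return -ENOMEM;
		}
	}
	return 0;
}
```

With this shape the later "format" step only has to fill in descriptors
and can rely on every slot already holding a buffer.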
Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems
On 21.07.2015 [11:30:58 -0500], Chris J Arges wrote:
> On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
> > On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> > > Some architectures like POWER can have a NUMA node_possible_map that
> > > contains sparse entries. This causes memory corruption with openvswitch
> > > since it allocates flow_cache with a multiple of num_possible_nodes() and
> >
> > Couldn't this also be fixed by just allocating with a multiple of
> > nr_node_ids (which seems to have been the original intent all along)?
> > You could then make your stats array be sparse or not.
> >
>
> Yea originally this is what I did, but I thought it would be wasting memory.
>
> > > assumes the node variable returned by for_each_node will index into
> > > flow->stats[node].
> > >
> > > For example, if node_possible_map is 0x30003, this patch will map node to
> > > node_cnt as follows:
> > > 0,1,16,17 => 0,1,2,3
> > >
> > > The crash was noticed after 3af229f2 was applied as it changed the
> > > node_possible_map to match node_online_map on boot.
> > > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
> >
> > My concern with this version of the fix is that you're relying on,
> > implicitly, the order of for_each_node's iteration corresponding to the
> > entries in stats 1:1. But what about node hotplug? It seems better to
> > have the enumeration of the stats array match the topology accurately,
> > rather, or to maintain some sort of internal map in the OVS code between
> > the NUMA node and the entry in the stats array?
> >
> > I'm willing to be convinced otherwise, though :)
> >
> > -Nish
> >
>
> Nish,
>
> The method I described should work for hotplug since it's using possible map
> which AFAIK is static rather than the online map.

Oh you're right, I'm sorry!
Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
On 21.07.2015 [12:36:33 -0500], Chris J Arges wrote:
> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
>
> Use nr_node_ids to allocate a maximal sparse array instead of
> num_possible_nodes().
>
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges

Acked-by: Nishanth Aravamudan

> ---
>  net/openvswitch/flow_table.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 4613df8..6552394 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -752,7 +752,7 @@ int ovs_flow_init(void)
>  	BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
>
>  	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
> -				       + (num_possible_nodes()
> +				       + (nr_node_ids
>  					  * sizeof(struct flow_stats *)),
>  				       0, 0, NULL);
>  	if (flow_cache == NULL)
> --
> 1.9.1
>
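To make the sizing problem concrete, here is a small userspace sketch using
the 0x30003 mask quoted earlier in this thread. The helpers are stand-ins
written for illustration: possible_nodes() mimics what
num_possible_nodes() counts, and node_ids() mimics nr_node_ids (highest
possible node id plus one). Neither is the kernel implementation.

```c
#include <assert.h>

/* Number of set bits: stand-in for num_possible_nodes(). */
unsigned int possible_nodes(unsigned long mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Highest set bit plus one: stand-in for nr_node_ids. */
unsigned int node_ids(unsigned long mask)
{
	unsigned int n = 0;

	for (; mask; mask >>= 1)
		n++;
	return n;
}

/*
 * Return 1 if every node id set in the mask is a valid index into an
 * array of 'slots' entries, 0 if some stats[node] would be out of bounds.
 */
int all_nodes_fit(unsigned long mask, unsigned int slots)
{
	unsigned int node;

	assert(slots > 0);
	for (node = 0; mask; node++, mask >>= 1)
		if ((mask & 1) && node >= slots)
			return 0;
	return 1;
}
```

For 0x30003 the possible nodes are 0, 1, 16 and 17: an array sized by the
node count has 4 slots, so stats[16] and stats[17] fall outside it, while
an array sized by the highest id plus one has 18 slots and fits them all.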
Re: Why return E2BIG from bpf map update?
On 7/21/15 3:13 AM, Alex Gartrell wrote:
> But, the EINVAL errno has similarly been abused to death

There was a thread a few months ago trying to come up with a generic
solution for aliased error codes, but unfortunately nothing concrete
came out of it. The idea I liked was that the kernel could extend the
syscall interface to return a string together with errno, but that is
quite hard to do at present. Maybe extensions to vdso data writable by
the kernel can improve the situation.
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:

> > -		kfree_skb(skb);
> > +	INIT_LIST_HEAD(&q->new_flows);
> > +	INIT_LIST_HEAD(&q->old_flows);
> > +	for (i = 0; i < q->flows_cnt; i++) {
> > +		struct fq_codel_flow *flow = q->flows + i;
> > +
> > +		while (flow->head)
> > +			kfree_skb(dequeue_head(flow));
> > +
> > +		INIT_LIST_HEAD(&flow->flowchain);
>
>
> You probably need to call codel_vars_init(&flow->cvars) as well.

It is not necessary: flow->cvars only matters in the event of a dequeue,
but the whole qdisc is dismantled and no packet will be dequeued.
Re: [PATCH iproute2] Use PATH_MAX instead of MAXPATHLEN
On Wed, Apr 29, 2015 at 6:52 PM, Felix Janda wrote:
> Florian Fainelli wrote:
>> On 27/04/15 09:13, Stephen Hemminger wrote:
>> > On Sat, 25 Apr 2015 22:33:28 +0200
>> > Felix Janda wrote:
>> >
>> >> They are equivalent but the former is more common. PATH_MAX is
>> >> specified by POSIX and needs <limits.h> while MAXPATHLEN has BSD
>> >> origin and needs <sys/param.h>.
>> >>
>> >> PATH_MAX has already been in use in misc/lnstat.h.
>> >>
>> >> Signed-off-by: Felix Janda
>> >
>> > Iproute2 is intended for use on Linux.
>> > It makes more sense to align with Posix than using leftover
>> > BSD stuff. Therefore I don't see any point in doing this.
>>
>> My reading from Felix's commit message is that he is attempting to do
>> exactly that: conform to POSIX rather than BSD, which seems to be the
>> direction you are also suggesting here.
>> --
>> Florian
>
> This is correct. (In fact I misread the end of Stephen's message,
> thought that the patch was merged and wanted to thank for that.)

What's the status of this patch? This is one of the reasons iproute2
cannot be compiled against the musl C library.

After fixing this I get tons of redefinition errors:

In file included from ../include/linux/xfrm.h:4:0,
                 from xfrm_state.c:31:
../include/linux/in6.h:32:8: error: redefinition of ‘struct in6_addr’
 struct in6_addr {
        ^
In file included from /home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
                 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:24:8: note: originally defined here
 struct in6_addr
        ^
In file included from ../include/linux/xfrm.h:4:0,
                 from xfrm_state.c:31:
../include/linux/in6.h:40:0: warning: "s6_addr" redefined
 #define s6_addr in6_u.u6_addr8
 ^
In file included from /home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
                 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:32:0: note: this is the location of the previous definition
 #define s6_addr __in6_union.__s6_addr
 ^

Yegor
Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested
Hi guys,

Florian Fainelli writes:

> Changes in v5:
>
> - removed an invalid use of the link_update callback in the SF2 driver
>   that appeared after merging "net: phy: fixed_phy: handle link-down case"
>
> - reworded the commit message for patch 2 to make it clear what it fixes and
>   why this is required
>
> Initial cover letter from Stas:
>
> Hello.
>
> Currently the link status auto-negotiation is enabled
> for any SGMII link with fixed-link DT binding.
> The regression was reported:
> https://lkml.org/lkml/2015/7/8/865
> Apparently not all HW that implements the SGMII protocol generates the
> inband status for the auto-negotiation to work.
> More details here:
> https://lkml.org/lkml/2015/7/10/206
>
> The following patches revert to the old behavior by default,
> which is to not enable the auto-negotiation for fixed-link.
> A new DT property is added that allows the auto-negotiation to be
> requested explicitly.

FWIW, I tested this v5 series on mirabox (2 mvneta interfaces using
RGMII); both interfaces still work as expected, i.e. no regression on
my side.

a+
Re: ARP response with link local IP, why not broadcast
On Tue, Jul 21, 2015 at 4:38 PM, Sebastian Fett wrote:
> Hello!
>
> According to RFC3927 every ARP packet (reply and request) should be sent as
> link layer broadcast as long as the sender IP is a link local address. (see
> chapter 2.5).

Because broadcast replies are noisy and should be avoided if possible:
a broadcast reply creates a flood that wakes up all receivers, and is
especially undesirable in today's world, where bcast would wake up
sleepy devices, or require other inefficient processes in a cloud env.

See also https://www.ietf.org/id/draft-nordmark-6man-dad-approaches-01.txt

> That functionality would help me a lot with a use case I have with our
> application.

what is your use case?

>
> But it is not implemented in the kernel that way.
> Does anyone know why?

--Sowmini
Re: reproducable panic eviction work queue
Frank Schreuder wrote:

[ inet frag evictor crash ]

We believe we found the bug. This patch should fix it.

We cannot share the list node between the buckets and the evictor; the
flags member is subject to race conditions, so the
flags & INET_FRAG_EVICTED test is not reliable.

It would be great if you could confirm that this fixes the problem
for you, we'll then make a formal patch submission.

Please apply this on a kernel without the previous test patches; whether
you use an affected -stable or net-next kernel shouldn't matter since
those are similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
 		}
 
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 
 	spin_unlock(&hb->chain_lock);
 
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 
 	return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
 
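The idea behind the extra list_evictor member can be sketched outside the
kernel as follows: an object carries two independent link nodes, so it can
stay on its hash chain while also being queued for eviction, and unlinking
from the chain no longer depends on a flag test. All names here (struct
dl_node, dl_*, struct frag_queue, the two helpers) are invented for the
sketch; the real code uses hlist and the bucket lock.

```c
#include <assert.h>

/* Minimal circular doubly linked list, a stand-in for the kernel's hlist. */
struct dl_node {
	struct dl_node *next, *prev;
};

void dl_init(struct dl_node *h)
{
	h->next = h->prev = h;
}

void dl_add(struct dl_node *n, struct dl_node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

void dl_del(struct dl_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

int dl_empty(const struct dl_node *h)
{
	return h->next == h;
}

/* Hypothetical queue object with one link node per list it can be on. */
struct frag_queue {
	struct dl_node chain;	/* membership in the hash bucket */
	struct dl_node evictor;	/* membership in the eviction list */
};

/* Queue an entry for eviction without touching its bucket linkage. */
void mark_for_eviction(struct frag_queue *fq, struct dl_node *expired)
{
	assert(fq && expired);
	dl_add(&fq->evictor, expired);
}

/* Always valid: 'chain' is never borrowed by the eviction list. */
void unlink_from_bucket(struct frag_queue *fq)
{
	assert(fq);
	dl_del(&fq->chain);
}
```

Because each list owns its own node, removing the entry from the bucket
leaves the eviction list intact, which is the property the patch relies on.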
Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
On Tue, Jul 21, 2015 at 10:36 AM, Chris J Arges wrote:
> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
>
> Use nr_node_ids to allocate a maximal sparse array instead of
> num_possible_nodes().
>
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges

Acked-by: Pravin B Shelar

Thanks.
Re: [PATCH V4 0/2] pci: Provide a flag to access VPD through function 0
[+cc Alex]

On Mon, Jul 13, 2015 at 11:39:54AM -0700, Mark D Rustad wrote:
> Many multi-function devices provide shared registers in extended
> config space for accessing VPD. The behavior of these registers
> means that the state must be tracked and access locked correctly
> for accesses not to hang or worse. One way to meet these needs is
> to always perform the accesses through function 0, thereby using
> the state tracking and mutex that already exists.
>
> To provide this behavior, add a dev_flags bit to indicate that this
> should be done. This bit can then be set for any non-zero function
> that needs to redirect such VPD access to function 0. Do not set
> this bit on the zero function or there will be an infinite recursion.
>
> The second patch uses this new flag to invoke this behavior on all
> multi-function Intel Ethernet devices.
>
> Any hardware that shares VPD registers with multiple functions has
> been suffering these problems forever. The hangs result in the log
> message:
>
> vpd r/w failed. This is likely a firmware bug on this device.
>
> Both read and write data corruption are also possible during
> overlapping accesses in addition to hangs.
>
> Signed-off-by: Mark Rustad
>
> ---
> Changes in V2:
> - Corrected a spelling error in a log message
> - Added checks to see that the referenced function 0 is reasonable
> Changes in V3:
> - Don't leak a device reference
> - Check that function 0 has VPD
> - Make a helper for the function 0 checks
> - Moved a multifunction check to the quirk patch
> Changes in V4:
> - Provide a more extensive commit log for patch 1

I applied these to pci/misc for v4.3 with changelogs as follows. I added
Alex's ack, since he acked v3 and the only difference here is the
changelog. I also added a stable tag. Thanks!

Bjorn


commit 932c435caba8a2ce473a91753bad0173269ef334
Author: Mark Rustad
Date:   Mon Jul 13 11:40:02 2015 -0700

    PCI: Add dev_flags bit to access VPD through function 0

    Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through
    function 0 to provide VPD access on other functions. This is for
    hardware devices that provide copies of the same VPD capability
    registers in multiple functions. Because the kernel expects that each
    function has its own registers, both the locking and the state tracking
    are affected by VPD accesses to different functions.

    On such devices for example, if a VPD write is performed on function 0,
    *any* later attempt to read VPD from any other function of that device
    will hang. This has to do with how the kernel tracks the expected value
    of the F bit per function.

    Concurrent accesses to different functions of the same device can not
    only hang but also corrupt both read and write VPD data.

    When hangs occur, typically the error message:

      vpd r/w failed. This is likely a firmware bug on this device.

    will be seen.

    Never set this bit on function 0 or there will be an infinite recursion.

    Signed-off-by: Mark Rustad
    Signed-off-by: Bjorn Helgaas
    Acked-by: Alexander Duyck
    CC: sta...@vger.kernel.org

commit 7aa6ca4d39edf01f997b9e02cf6d2fdeb224f351
Author: Mark Rustad
Date:   Mon Jul 13 11:40:07 2015 -0700

    PCI: Add VPD function 0 quirk for Intel Ethernet devices

    Set the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device
    functions other than function 0, so that on multi-function devices, we
    will always read VPD from function 0 instead of from the other
    functions.

    [bhelgaas: changelog]
    Signed-off-by: Mark Rustad
    Signed-off-by: Bjorn Helgaas
    Acked-by: Alexander Duyck
    CC: sta...@vger.kernel.org
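The redirect described in the first changelog can be modelled in a few
lines of userspace C. This is a sketch under stated assumptions, not the
PCI core code: struct func, FLAG_VPD_REF_F0 (a stand-in for
PCI_DEV_FLAGS_VPD_REF_F0), vpd_target() and vpd_access() are all
hypothetical, and the sketch refuses to redirect function 0 to itself
rather than recursing.

```c
#include <assert.h>

/* Hypothetical stand-in for PCI_DEV_FLAGS_VPD_REF_F0. */
#define FLAG_VPD_REF_F0 (1u << 0)

/* Hypothetical model of one function of a multi-function device. */
struct func {
	unsigned int number;	/* function number, 0 for function 0 */
	unsigned int flags;
	struct func *f0;	/* function 0 of the same device */
	unsigned int accesses;	/* accesses served by this function's VPD state */
};

/*
 * Pick the function whose VPD state should serve the access. A flagged
 * non-zero function is redirected to function 0; function 0 is never
 * redirected, even if the flag was set on it by mistake.
 */
struct func *vpd_target(struct func *f)
{
	assert(f);
	if ((f->flags & FLAG_VPD_REF_F0) && f->number != 0 && f->f0)
		return f->f0;
	return f;
}

/* Perform one access and return how many the chosen function has served. */
unsigned int vpd_access(struct func *f)
{
	struct func *target = vpd_target(f);

	return ++target->accesses;
}
```

All accesses from flagged functions end up on function 0, so a single
piece of state (and, in the kernel, a single mutex) covers the shared
registers.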
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, Jul 21, 2015 at 3:52 AM, Eric Dumazet wrote: > On Tue, 2015-07-21 at 06:04 -0400, Jamal Hadi Salim wrote: > >> It is worrisome to fix the core code for this. The root cause seems to >> be codel. Dont have time but in general, reset would be something like: >> >> struct fq_codel_sched_data *q = qdisc_priv(sch); >> qdisc_reset(q) > > This only works for very simple qdisc with one queue. > >> >> or something along those lines... >> But certainly dequeue semantics dont seem right there.. > > Well, reset() is trivial to implement like this > > while (skb = local_dequeue(sch)) { > kfree_skb(skb); > } > > And I guess I copy/pasted sfq code here, because I was lazy. > > But yes, qdisc_tree_decrease_qlen() would have to be not called. Hmm, so the semantic is each qdisc resets qlen for its own and calls qdisc_reset() to reset its leaf qdisc's, that makes sense for me. > > It seems I coded fq_reset() differently. > > Alex, please try instead : > > diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c > index 21ca33c9f036..3f0320ab6029 100644 > --- a/net/sched/sch_fq_codel.c > +++ b/net/sched/sch_fq_codel.c > @@ -288,10 +288,21 @@ begin: > > static void fq_codel_reset(struct Qdisc *sch) > { > - struct sk_buff *skb; > + struct fq_codel_sched_data *q = qdisc_priv(sch); > + int i; > > - while ((skb = fq_codel_dequeue(sch)) != NULL) > - kfree_skb(skb); > + INIT_LIST_HEAD(&q->new_flows); > + INIT_LIST_HEAD(&q->old_flows); > + for (i = 0; i < q->flows_cnt; i++) { > + struct fq_codel_flow *flow = q->flows + i; > + > + while (flow->head) > + kfree_skb(dequeue_head(flow)); > + > + INIT_LIST_HEAD(&flow->flowchain); You probably need to call codel_vars_init(&flow->cvars) as well. > + } > + memset(q->backlogs, 0, q->flows_cnt * sizeof(u32)); > + sch->q.qlen = 0; > } > > static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = { > > > Thanks. 
Re: [PATCH net-next 1/2] tcp: don't extend RTO on failed loss probe attempts
On Fri, Jul 17, 2015 at 10:27 PM, Eric Dumazet wrote:
>
> On Fri, 2015-07-17 at 14:22 -0700, Yuchung Cheng wrote:
> > If TLP was unable to send a probe, it extended the RTO to
> > now + icsk_rto. But extending the RTO makes little sense
> > if no TLP probe went out. With this commit, instead of
> > extending the RTO we re-arm it relative to the transmit time
> > of the write queue head.
>
> But what was the reason the probe could not be sent ?
>
> If it is local congestion or memory allocation error, it does make sense
> to not add fuel to the fire.

Good point. We can identify those cases so that we don't attempt to
retransmit on these errors, but still retransmit when the limit is the
receive window. I'll re-spin the patch.
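For readers following the thread, the difference between the two timer policies in the quoted commit message can be shown with plain arithmetic. This is a user-space sketch, not the kernel implementation: the function names and the millisecond time base are hypothetical, counter wraparound is ignored, and clamping the re-armed deadline to the current time is an assumption made here so the deadline never lies in the past.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour as described in the commit message: after a failed
 * probe, the RTO deadline is pushed out from the current time. */
uint32_t rto_deadline_extended(uint32_t now_ms, uint32_t rto_ms)
{
	return now_ms + rto_ms;
}

/* Described change: base the deadline on the transmit time of the
 * write queue head instead. The clamp to now_ms is an assumption of
 * this sketch, not something stated in the thread. */
uint32_t rto_deadline_rearmed(uint32_t head_tx_ms, uint32_t now_ms,
			      uint32_t rto_ms)
{
	uint32_t deadline = head_tx_ms + rto_ms;

	return deadline > now_ms ? deadline : now_ms;
}

/* Usage: head sent at t=1000, RTO of 300, probe fails at t=1200. */
void rto_example(void)
{
	assert(rto_deadline_extended(1200, 300) == 1500);
	assert(rto_deadline_rearmed(1000, 1200, 300) == 1300);
}
```

In this example the extended deadline is 200 ms later than the re-armed one, which is exactly the time that had already elapsed since the head of the write queue was sent.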
Re: [PATCH iproute2 v2] ss: Fix crash when dump stats from /proc with '-p'
On Tue, 21 Jul 2015 16:18:36 +0300 Vadim Kochan wrote:
> From: Vadim Kochan
>
> It really partially reverts:
>
> ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
>
> but adds a few fields (name & peer_name) from the removed unixstat to
> the sockstat struct to make it easy to restore the original code.
>
> Fixes: ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
> Reported-by: Marc Dietrich
> Signed-off-by: Vadim Kochan

I applied this one after resolving merge conflicts.
Re: [PATCH net-next 14/22] vxlan: Flow based tunneling
On 07/21/15 at 10:30am, Alexei Starovoitov wrote:
> RX:
> >+info->mode = IP_TUNNEL_INFO_RX;
> >+info->key.tun_flags = TUNNEL_KEY;
> >+info->key.tun_id = cpu_to_be64(vni >> 8);
> ...
> TX:
> >+dst_port = info->key.tp_dst ? : vxlan->dst_port;
> >+vni = be64_to_cpu(info->key.tun_id);
>
> I think the copy paste of ovs_tunnel_info into ip_tunnel_info
> can be improved. In particular instead of '__be64 tun_id'
> we can use '__u64 tun_id' which will avoid extra byteswaps for rx/tx
> paths.
>
> netlink for this part also seems inconsistent.
> In the patch 16:
> +static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
> + [IP_TUN_ID] = { .type = NLA_U64 },
> ...
> + if (tb[IP_TUN_ID])
> + tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);
>
> I think nla_get_be64 should be there?
> and with my suggestion we can add be64_to_cpu() here instead
> of doing it per packet.
> Thoughts?

I like this. The be64 originates from how OVS stores the tun_id in the
flow key. I agree that it makes sense to limit and delay the byteswaps
to when OVS inherits the flow key from the ip_tunnel_info.

I will send a follow-up.
Re: [PATCH 1/1] drivers: net: cpsw: remove tx event processing in rx napi poll
From: Mugunthan V N
Date: Tue, 21 Jul 2015 16:00:42 +0530

> With commit c03abd84634d ("net: ethernet: cpsw: don't requests IRQs
> we don't use") common isr and napi are separated into separate tx isr
> and rx isr/napi, but still in rx napi tx events are handled. So removing
> the tx event handling in rx napi.
>
> Signed-off-by: Mugunthan V N

Applied, thanks.
Re: [PATCH net-next 00/22 v2] Lightweight & flow based encapsulation
From: Thomas Graf
Date: Tue, 21 Jul 2015 10:43:44 +0200

> This series combines the work previously posted by Roopa, Robert and
> myself. It's according to what we discussed at NFWS. The motivation
> of this series is to:
>
> * Consolidate code between OVS and the rest of the kernel and get
>   rid of OVS vports and instead represent them as pure net_devices.
> * Introduce a lightweight tunneling mechanism which enables flow
>   based encapsulation to improve scalability on both RX and TX.
> * Do the above in an encapsulation unspecific way so that the
>   encapsulation type is eventually abstracted away from the user.
> * Use the same forwarding decision for both native forwarding and
>   encapsulation thus allowing to switch between native IPv6 and
>   UDP encapsulation based on endpoint without requiring additional
>   logic
>
> The fundamental changes introduced in this series are:
> * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
>   instructions. Depending on the specified type, the instructions
>   apply to UDP encapsulations, MPLS and possibly others in the future.
> * Depending on the encapsulation type, the output function of the
>   dst is directly overwritten or the dst merely attaches metadata and
>   relies on a subsequent net_device to apply it to the packet. The
>   latter is typically used if an inner and outer IP header exist which
>   require two subsequent routing lookups to be performed.
> * A new metadata_dst structure which can be attached to skbs to
>   carry metadata in between subsystems. This new metadata transport
>   is used to provide a single interface for VXLAN, routing and OVS
>   to communicate through metadata.

Series applied, but please take Alexei's endianness feedback into
consideration.

Thanks!
Re: [PATCH v2 net] inet: frags: fix defragmented packet's IP header for af_packet
From: Eric Dumazet
Date: Tue, 21 Jul 2015 09:43:59 +0200

> From: Edward Hyunkoo Jee
>
> When ip_frag_queue() computes positions, it assumes that the passed
> sk_buff does not contain L2 headers.
>
> However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
> functions can be called on outgoing packets that contain L2 headers.
>
> Also, IPv4 checksum is not corrected after reassembly.
>
> Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4
> fanouts.")
> Signed-off-by: Edward Hyunkoo Jee
> Signed-off-by: Eric Dumazet

Applied and queued up for -stable, thanks.
[PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

Signed-off-by: Chris J Arges
---
 net/openvswitch/flow_table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..6552394 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -752,7 +752,7 @@ int ovs_flow_init(void)
 	BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
 	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
-				       + (num_possible_nodes()
+				       + (nr_node_ids
 					  * sizeof(struct flow_stats *)),
 				       0, 0, NULL);
 	if (flow_cache == NULL)
-- 
1.9.1
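The indexing problem described in this commit message can be reproduced in user space. The sketch below is not kernel code: count_possible() and highest_id_plus_one() are hypothetical stand-ins for num_possible_nodes() and nr_node_ids, and the 32-bit bitmap is a simplification of the kernel's nodemask. It uses the sparse map 0x30003 (nodes 0, 1, 16 and 17) that was given as an example in the v1 discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for num_possible_nodes(): number of set bits. */
unsigned int count_possible(uint32_t possible_map)
{
	unsigned int n = 0;

	for (; possible_map; possible_map &= possible_map - 1)
		n++;
	return n;
}

/* Hypothetical stand-in for nr_node_ids: highest set bit plus one. */
unsigned int highest_id_plus_one(uint32_t possible_map)
{
	unsigned int n = 0;

	for (; possible_map; possible_map >>= 1)
		n++;
	return n;
}

/* Returns 1 if every possible node ID is a valid index into an array
 * with the given number of slots, 0 if some ID would overrun it. */
int all_nodes_fit(uint32_t possible_map, unsigned int slots)
{
	unsigned int node;

	for (node = 0; node < 32; node++)
		if ((possible_map & (UINT32_C(1) << node)) && node >= slots)
			return 0;
	return 1;
}

/* Usage: nodes 0, 1, 16 and 17 are possible. */
void sparse_map_example(void)
{
	uint32_t map = 0x30003;

	/* Sizing by the node count leaves nodes 16 and 17 out of bounds. */
	assert(!all_nodes_fit(map, count_possible(map)));
	/* Sizing by the highest ID plus one covers every node. */
	assert(all_nodes_fit(map, highest_id_plus_one(map)));
}
```

With this map the count-based array has 4 slots while the ID-based one has 18, so the second choice trades 14 unused pointer slots per flow for indexing that is always in bounds.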
Re: [PATCH net-next 14/22] vxlan: Flow based tunneling
On 7/21/15 1:43 AM, Thomas Graf wrote:
> This prepares the VXLAN device to be steered by the routing and other
> subsystems which allows to support encapsulation for a large number
> of tunnel endpoints and tunnel ids through a single net_device which
> improves the scalability.

+1. looks very useful.

RX:
> + info->mode = IP_TUNNEL_INFO_RX;
> + info->key.tun_flags = TUNNEL_KEY;
> + info->key.tun_id = cpu_to_be64(vni >> 8);
...
TX:
> + dst_port = info->key.tp_dst ? : vxlan->dst_port;
> + vni = be64_to_cpu(info->key.tun_id);

I think the copy paste of ovs_tunnel_info into ip_tunnel_info
can be improved. In particular instead of '__be64 tun_id'
we can use '__u64 tun_id' which will avoid extra byteswaps for rx/tx
paths.

netlink for this part also seems inconsistent.
In the patch 16:
+static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
+ [IP_TUN_ID] = { .type = NLA_U64 },
...
+ if (tb[IP_TUN_ID])
+ tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);

I think nla_get_be64 should be there?
and with my suggestion we can add be64_to_cpu() here instead
of doing it per packet.
Thoughts?
Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay
On 21/07/15 10:06, Dan Murphy wrote:
> Fix warning: logical ‘or’ of collectively exhaustive tests is always true
>
> Change the internal delay check from an 'or' condition to an 'and'
> condition.
>
> Reported-by: David Binderman
> Signed-off-by: Dan Murphy

Acked-by: Florian Fainelli

> ---
>  drivers/net/phy/dp83867.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
> index c7a12e2..8a3bf54 100644
> --- a/drivers/net/phy/dp83867.c
> +++ b/drivers/net/phy/dp83867.c
> @@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
>  			return ret;
>  	}
>
> -	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
> +	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
>  	    (phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
>  		val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
>  					    DP83867_DEVADDR, phydev->addr);
>

-- 
Florian
Identifying underlying interface from struct sock
Hi,

First, I apologize for posting on the netdev list. Majordomo did not
list any other network-related mailing list.

Is there a way to identify the underlying network interface from an
instance of struct sock? I realize that the socket is abstract and
shouldn't/doesn't necessarily depend on the underlying interface, but
with TCP, say, where the connection is endpoint-oriented, shouldn't the
socket maintain a reference to the interface with which it is
associated?

I tried

	dev = dev_get_by_index(sock_net(sk), skb->skb_iif);

and

	dev = skb->dev;

but in both cases, dev was NULL.

I'm trying to reference the underlying interface to determine whether
the conditions present on that interface are acceptable for
transmission.
[PATCH] net: phy: dp83867: Fix warning check for setting the internal delay
Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.

Reported-by: David Binderman
Signed-off-by: Dan Murphy
---
 drivers/net/phy/dp83867.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index c7a12e2..8a3bf54 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
 			return ret;
 	}
 
-	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
+	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
 	    (phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
 		val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
 					    DP83867_DEVADDR, phydev->addr);
-- 
1.9.1
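The warning is easy to reproduce outside the driver. In the sketch below, enum mode is a hypothetical, shortened stand-in for the kernel's phy_interface_t; only the fact that the two bounds are ordered matters, and the enumerator order is not claimed to match the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for phy_interface_t, for illustration only. */
enum mode {
	MODE_GMII,
	MODE_RGMII,
	MODE_RGMII_ID,		/* lower bound of the range */
	MODE_RGMII_RXID,	/* upper bound of the range */
	MODE_SGMII,
};

/* Original form: every value is either >= the lower bound or <= the
 * upper bound, so this is true for all inputs. */
bool in_delay_range_or(enum mode m)
{
	return (m >= MODE_RGMII_ID) || (m <= MODE_RGMII_RXID);
}

/* Fixed form: true only for values between the two bounds. */
bool in_delay_range_and(enum mode m)
{
	return (m >= MODE_RGMII_ID) && (m <= MODE_RGMII_RXID);
}

/* Usage: a mode outside the range shows the difference. */
void range_check_example(void)
{
	assert(in_delay_range_or(MODE_SGMII));
	assert(!in_delay_range_and(MODE_SGMII));
	assert(in_delay_range_and(MODE_RGMII_ID));
}
```

With the 'or' form the body of the if statement would run for every interface mode, which is why the compiler reports the tests as collectively exhaustive.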
Re: [PATCH] packet: Allow packets with only a header (but no payload)
On Tue, Jul 21, 2015 at 12:38 PM, Martin Blumenstingl wrote:
> Hi Willem,
>
> On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn wrote:
>> Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
>> packet_snd to make the two paths equivalent. The existing check had the
>> ominous statement
>>
>> /* net device doesn't like empty head */
> OK, I guess it's best to find out what the purpose of this comment is.
>
>> so allowing a header-only packet while correct in your case may not be
>> safe in some edge cases (specific device drivers?).
> I'm wondering how a good fix would look like (I can think of a few
> things, like renaming hard_header_len to something min_packet_size)?
> I am open for suggestions since I have zero knowledge about the inner
> workings of the packet framework.

I don't see a simple way of verifying that it is safe to allow packets
without data, short of a code audit, which would be huge, especially
when device driver logic is taken into account.

Perhaps someone remembers why that statement was added and what edge
case(s) it refers to. I'm afraid that I don't. It was added in
69e3c75f4d54. I added the author to this thread.

>> This was also discussed previously
>>
>> http://www.spinics.net/lists/netdev/msg309677.html
>>
>> In any case, I don't think that reverting the patch and restoring the old
>> inconsistent state is a fix.
> I totally agree with you that it's a bad fix if this means that we
> could break other drivers.
> My primary goal was to fix PPPoE connections - I guess I should have
> simply added "RFC" to the subject.
>
>
> Regards,
> Martin
[PATCH] cgroup: net_cls: fix false-positive "suspicious RCU usage"
In dev_queue_xmit(), net_cls is protected by rcu-bh.

[ 270.730026] ===
[ 270.730029] [ INFO: suspicious RCU usage. ]
[ 270.730033] 4.2.0-rc3+ #2 Not tainted
[ 270.730036] ---
[ 270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[ 270.730041] other info that might help us debug this:
[ 270.730043] rcu_scheduler_active = 1, debug_locks = 1
[ 270.730045] 2 locks held by dhclient/748:
[ 270.730046] #0: (rcu_read_lock_bh){..}, at: [] __dev_queue_xmit+0x50/0x960
[ 270.730085] #1: (&qdisc_tx_lock){+.}, at: [] __dev_queue_xmit+0x240/0x960
[ 270.730090] stack backtrace:
[ 270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[ 270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[ 270.730100] 0001 8800bafeba58 817ad487 0007
[ 270.730103] 880232a0a780 8800bafeba88 810ca4f2 88022fb23e00
[ 270.730105] 880232a0a780 8800bafebb68 8800bafebb68 8800bafebaa8
[ 270.730108] Call Trace:
[ 270.730121] [] dump_stack+0x4c/0x65
[ 270.730148] [] lockdep_rcu_suspicious+0xe2/0x120
[ 270.730153] [] task_cls_state+0x92/0xa0
[ 270.730158] [] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[ 270.730164] [] tc_classify_compat+0x74/0xc0
[ 270.730166] [] tc_classify+0x33/0x90
[ 270.730170] [] htb_enqueue+0xaa/0x4a0 [sch_htb]
[ 270.730172] [] __dev_queue_xmit+0x306/0x960
[ 270.730174] [] ? __dev_queue_xmit+0x50/0x960
[ 270.730176] [] dev_queue_xmit_sk+0x13/0x20
[ 270.730185] [] dev_queue_xmit+0x10/0x20
[ 270.730187] [] packet_snd.isra.62+0x54c/0x760
[ 270.730190] [] packet_sendmsg+0x2f5/0x3f0
[ 270.730203] [] ? sock_def_readable+0x5/0x190
[ 270.730210] [] ? _raw_spin_unlock+0x2b/0x40
[ 270.730216] [] ? unix_dgram_sendmsg+0x5cc/0x640
[ 270.730219] [] sock_sendmsg+0x47/0x50
[ 270.730221] [] sock_write_iter+0x7f/0xd0
[ 270.730232] [] __vfs_write+0xa7/0xf0
[ 270.730234] [] vfs_write+0xb8/0x190
[ 270.730236] [] SyS_write+0x52/0xb0
[ 270.730239] [] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov
---
 net/core/netclassid_cgroup.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 1f2a126f4ffa..515939034298 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state
 struct cgroup_cls_state *task_cls_state(struct task_struct *p)
 {
-	return css_cls_state(task_css(p, net_cls_cgrp_id));
+	return css_cls_state(task_css_check(p, net_cls_cgrp_id,
+					    rcu_read_lock_bh_held()));
 }
 EXPORT_SYMBOL_GPL(task_cls_state);
Re: [PATCH] packet: Allow packets with only a header (but no payload)
Hi Willem,

On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn wrote:
> Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
> packet_snd to make the two paths equivalent. The existing check had the
> ominous statement
>
> /* net device doesn't like empty head */
OK, I guess it's best to find out what the purpose of this comment is.

> so allowing a header-only packet while correct in your case may not be
> safe in some edge cases (specific device drivers?).
I'm wondering how a good fix would look like (I can think of a few
things, like renaming hard_header_len to something min_packet_size)?
I am open for suggestions since I have zero knowledge about the inner
workings of the packet framework.

> This was also discussed previously
>
> http://www.spinics.net/lists/netdev/msg309677.html
>
> In any case, I don't think that reverting the patch and restoring the old
> inconsistent state is a fix.
I totally agree with you that it's a bad fix if this means that we
could break other drivers.
My primary goal was to fix PPPoE connections - I guess I should have
simply added "RFC" to the subject.


Regards,
Martin
Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems
On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote: > On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote: > > Some architectures like POWER can have a NUMA node_possible_map that > > contains sparse entries. This causes memory corruption with openvswitch > > since it allocates flow_cache with a multiple of num_possible_nodes() and > > Couldn't this also be fixed by just allocationg with a multiple of > nr_node_ids (which seems to have been the original intent all along)? > You could then make your stats array be sparse or not. > Yea originally this is what I did, but I thought it would be wasting memory. > > assumes the node variable returned by for_each_node will index into > > flow->stats[node]. > > > > For example, if node_possible_map is 0x30003, this patch will map node to > > node_cnt as follows: > > 0,1,16,17 => 0,1,2,3 > > > > The crash was noticed after 3af229f2 was applied as it changed the > > node_possible_map to match node_online_map on boot. > > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861 > > My concern with this version of the fix is that you're relying on, > implicitly, the order of for_each_node's iteration corresponding to the > entries in stats 1:1. But what about node hotplug? It seems better to > have the enumeration of the stats array match the topology accurately, > rather, or to maintain some sort of internal map in the OVS code between > the NUMA node and the entry in the stats array? > > I'm willing to be convinced otherwise, though :) > > -Nish > Nish, The method I described should work for hotplug since it's using possible map which AFAIK is static rather than the online map. Regardless, the more simple solution to solve this issue would be to just allocate nr_node_ids number of entries and use up extra memory. I'll send a v2 after testing it. 
--chris > > Signed-off-by: Chris J Arges > > --- > > net/openvswitch/flow.c | 10 ++ > > net/openvswitch/flow_table.c | 18 +++--- > > 2 files changed, 17 insertions(+), 11 deletions(-) > > > > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c > > index bc7b0ab..425d45d 100644 > > --- a/net/openvswitch/flow.c > > +++ b/net/openvswitch/flow.c > > @@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow, > > struct ovs_flow_stats *ovs_stats, > > unsigned long *used, __be16 *tcp_flags) > > { > > - int node; > > + int node, node_cnt = 0; > > > > *used = 0; > > *tcp_flags = 0; > > memset(ovs_stats, 0, sizeof(*ovs_stats)); > > > > for_each_node(node) { > > - struct flow_stats *stats = > > rcu_dereference_ovsl(flow->stats[node]); > > + struct flow_stats *stats = > > rcu_dereference_ovsl(flow->stats[node_cnt]); > > > > if (stats) { > > /* Local CPU may write on non-local stats, so we must > > @@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow, > > ovs_stats->n_bytes += stats->byte_count; > > spin_unlock_bh(&stats->lock); > > } > > + node_cnt++; > > } > > } > > > > /* Called with ovs_mutex. 
*/ > > void ovs_flow_stats_clear(struct sw_flow *flow) > > { > > - int node; > > + int node, node_cnt = 0; > > > > for_each_node(node) { > > - struct flow_stats *stats = ovsl_dereference(flow->stats[node]); > > + struct flow_stats *stats = > > ovsl_dereference(flow->stats[node_cnt]); > > > > if (stats) { > > spin_lock_bh(&stats->lock); > > @@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow) > > stats->tcp_flags = 0; > > spin_unlock_bh(&stats->lock); > > } > > + node_cnt++; > > } > > } > > > > diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c > > index 4613df8..5d10c54 100644 > > --- a/net/openvswitch/flow_table.c > > +++ b/net/openvswitch/flow_table.c > > @@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void) > > { > > struct sw_flow *flow; > > struct flow_stats *stats; > > - int node; > > + int node, node_cnt = 0; > > > > flow = kmem_cache_alloc(flow_cache, GFP_KERNEL); > > if (!flow) > > @@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void) > > > > RCU_INIT_POINTER(flow->stats[0], stats); > > > > - for_each_node(node) > > + for_each_node(node) { > > if (node != 0) > > - RCU_INIT_POINTER(flow->stats[node], NULL); > > + RCU_INIT_POINTER(flow->stats[node_cnt], NULL); > > + node_cnt++; > > + } > > > > return flow; > > err: > > @@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int > > n_buckets) > > > > static void flow_free(struct sw_flow *flow) > > { > > - int node; > > + int node, node_cnt = 0; > > > > if (ovs_identifier_is_key(&flow->id)) > > kfree
Re: [PATCH] packet: Allow packets with only a header (but no payload)
On Tue, Jul 21, 2015 at 12:14 PM, Martin Blumenstingl wrote:
> 9c70776 added validation for the packet size in packet_snd. This change
> enforced that every packet needs a long enough header and at least one
> byte payload.
>
> However, when trying to establish a PPPoE connection the following message
> is printed every time a PPPoE discovery packet is sent:
> pppd: packet size is too short (24 <= 24)
>
> From what I can see in the PPPoE code the "PADI" discovery packet can
> consist of only a header with no payload (when there is neither a service
> name nor a Host-Uniq configured).

Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
packet_snd to make the two paths equivalent. The existing check had the
ominous statement

  /* net device doesn't like empty head */

so allowing a header-only packet while correct in your case may not be
safe in some edge cases (specific device drivers?).

This was also discussed previously

  http://www.spinics.net/lists/netdev/msg309677.html

In any case, I don't think that reverting the patch and restoring the old
inconsistent state is a fix.
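The boundary case behind the "24 <= 24" message can be isolated in a few lines. This is a sketch with hypothetical function names: truncated_strict() mirrors the `len <= hard_header_len` comparison quoted in the thread, while truncated_relaxed() is only an illustration of what accepting header-only packets would mean. It is not a proposed fix, and it says nothing about whether drivers can cope with such packets, which is the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Comparison as quoted in the thread: a packet that is exactly as long
 * as the link-layer header is rejected. */
bool truncated_strict(size_t len, size_t hard_header_len)
{
	return len <= hard_header_len;
}

/* Hypothetical relaxed variant: reject only an incomplete header. */
bool truncated_relaxed(size_t len, size_t hard_header_len)
{
	return len < hard_header_len;
}

/* Usage: the reported case of a 24-byte packet and a 24-byte header. */
void header_only_example(void)
{
	assert(truncated_strict(24, 24));
	assert(!truncated_relaxed(24, 24));
}
```

The two variants differ only when the length equals the header length, which is exactly the header-only case described in the report.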
Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems
On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote: > Some architectures like POWER can have a NUMA node_possible_map that > contains sparse entries. This causes memory corruption with openvswitch > since it allocates flow_cache with a multiple of num_possible_nodes() and Couldn't this also be fixed by just allocationg with a multiple of nr_node_ids (which seems to have been the original intent all along)? You could then make your stats array be sparse or not. > assumes the node variable returned by for_each_node will index into > flow->stats[node]. > > For example, if node_possible_map is 0x30003, this patch will map node to > node_cnt as follows: > 0,1,16,17 => 0,1,2,3 > > The crash was noticed after 3af229f2 was applied as it changed the > node_possible_map to match node_online_map on boot. > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861 My concern with this version of the fix is that you're relying on, implicitly, the order of for_each_node's iteration corresponding to the entries in stats 1:1. But what about node hotplug? It seems better to have the enumeration of the stats array match the topology accurately, rather, or to maintain some sort of internal map in the OVS code between the NUMA node and the entry in the stats array? 
I'm willing to be convinced otherwise, though :) -Nish > Signed-off-by: Chris J Arges > --- > net/openvswitch/flow.c | 10 ++ > net/openvswitch/flow_table.c | 18 +++--- > 2 files changed, 17 insertions(+), 11 deletions(-) > > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c > index bc7b0ab..425d45d 100644 > --- a/net/openvswitch/flow.c > +++ b/net/openvswitch/flow.c > @@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow, > struct ovs_flow_stats *ovs_stats, > unsigned long *used, __be16 *tcp_flags) > { > - int node; > + int node, node_cnt = 0; > > *used = 0; > *tcp_flags = 0; > memset(ovs_stats, 0, sizeof(*ovs_stats)); > > for_each_node(node) { > - struct flow_stats *stats = > rcu_dereference_ovsl(flow->stats[node]); > + struct flow_stats *stats = > rcu_dereference_ovsl(flow->stats[node_cnt]); > > if (stats) { > /* Local CPU may write on non-local stats, so we must > @@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow, > ovs_stats->n_bytes += stats->byte_count; > spin_unlock_bh(&stats->lock); > } > + node_cnt++; > } > } > > /* Called with ovs_mutex. 
*/ > void ovs_flow_stats_clear(struct sw_flow *flow) > { > - int node; > + int node, node_cnt = 0; > > for_each_node(node) { > - struct flow_stats *stats = ovsl_dereference(flow->stats[node]); > + struct flow_stats *stats = > ovsl_dereference(flow->stats[node_cnt]); > > if (stats) { > spin_lock_bh(&stats->lock); > @@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow) > stats->tcp_flags = 0; > spin_unlock_bh(&stats->lock); > } > + node_cnt++; > } > } > > diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c > index 4613df8..5d10c54 100644 > --- a/net/openvswitch/flow_table.c > +++ b/net/openvswitch/flow_table.c > @@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void) > { > struct sw_flow *flow; > struct flow_stats *stats; > - int node; > + int node, node_cnt = 0; > > flow = kmem_cache_alloc(flow_cache, GFP_KERNEL); > if (!flow) > @@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void) > > RCU_INIT_POINTER(flow->stats[0], stats); > > - for_each_node(node) > + for_each_node(node) { > if (node != 0) > - RCU_INIT_POINTER(flow->stats[node], NULL); > + RCU_INIT_POINTER(flow->stats[node_cnt], NULL); > + node_cnt++; > + } > > return flow; > err: > @@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int > n_buckets) > > static void flow_free(struct sw_flow *flow) > { > - int node; > + int node, node_cnt = 0; > > if (ovs_identifier_is_key(&flow->id)) > kfree(flow->id.unmasked_key); > kfree((struct sw_flow_actions __force *)flow->sf_acts); > - for_each_node(node) > - if (flow->stats[node]) > + for_each_node(node) { > + if (flow->stats[node_cnt]) > kmem_cache_free(flow_stats_cache, > - (struct flow_stats __force > *)flow->stats[node]); > + (struct flow_stats __force > *)flow->stats[node_cnt]); > + node_cnt++; > + } > kmem_cache_free(flow_cache, flow); > } > > -- > 1.9.1 > -- To unsubscribe from this list: send t
[PATCH net-next] mpls: make RTA_OIF optional
From: Roopa Prabhu

If the user did not specify an oif, try to get it from the via address.
If no device can be found, return -ENODEV.

Signed-off-by: Roopa Prabhu
---
 net/mpls/af_mpls.c | 67 +++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..4cd3789 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"

 #define LABEL_NOT_SPECIFIED (1<<20)
@@ -327,6 +328,70 @@ static unsigned find_free_label(struct net *net)
 	return LABEL_NOT_SPECIFIED;
 }

+static struct net_device *inet_fib_lookup_dev(struct net *net, void *addr)
+{
+	struct net_device *dev = NULL;
+	struct rtable *rt;
+	struct in_addr daddr;
+
+	memcpy(&daddr, addr, sizeof(struct in_addr));
+	rt = ip_route_output(net, daddr.s_addr, 0, 0, 0);
+	if (IS_ERR(rt))
+		goto errout;
+
+	dev = rt->dst.dev;
+	dev_hold(dev);
+
+	ip_rt_put(rt);
+
+errout:
+	return dev;
+}
+
+static struct net_device *inet6_fib_lookup_dev(struct net *net, void *addr)
+{
+	struct net_device *dev = NULL;
+	struct dst_entry *dst;
+	struct flowi6 fl6;
+
+	memset(&fl6, 0, sizeof(fl6));
+	memcpy(&fl6.daddr, addr, sizeof(struct in6_addr));
+	dst = ip6_route_output(net, NULL, &fl6);
+	if (dst->error)
+		goto errout;
+
+	dev = dst->dev;
+	dev_hold(dev);
+
+errout:
+	dst_release(dst);
+
+	return dev;
+}
+
+static struct net_device *find_outdev(struct net *net,
+				      struct mpls_route_config *cfg)
+{
+	struct net_device *dev = NULL;
+
+	if (!cfg->rc_ifindex) {
+		switch (cfg->rc_via_table) {
+		case NEIGH_ARP_TABLE:
+			dev = inet_fib_lookup_dev(net, cfg->rc_via);
+			break;
+		case NEIGH_ND_TABLE:
+			dev = inet6_fib_lookup_dev(net, cfg->rc_via);
+			break;
+		case NEIGH_LINK_TABLE:
+			break;
+		}
+	} else {
+		dev = dev_get_by_index(net, cfg->rc_ifindex);
+	}
+
+	return dev;
+}
+
 static int mpls_route_add(struct mpls_route_config *cfg)
 {
 	struct mpls_route __rcu **platform_label;
@@ -358,7 +423,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 		goto errout;

 	err = -ENODEV;
-	dev = dev_get_by_index(net, cfg->rc_ifindex);
+	dev = find_outdev(net, cfg);
 	if (!dev)
 		goto errout;

--
1.7.10.4
[PATCH] packet: Allow packets with only a header (but no payload)
9c70776 added validation for the packet size in packet_snd. This change
enforced that every packet needs a long enough header and at least one
byte payload.

However, when trying to establish a PPPoE connection the following message
is printed every time a PPPoE discovery packet is sent:
pppd: packet size is too short (24 <= 24)

From what I can see in the PPPoE code the "PADI" discovery packet can
consist of only a header with no payload (when there is neither a service
name nor a Host-Uniq configured).

Signed-off-by: Martin Blumenstingl
---
 net/packet/af_packet.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9e8741..d983f8f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2199,18 +2199,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }

-static bool ll_header_truncated(const struct net_device *dev, int len)
-{
-	/* net device doesn't like empty head */
-	if (unlikely(len <= dev->hard_header_len)) {
-		net_warn_ratelimited("%s: packet size is too short (%d <= %d)\n",
-				     current->comm, len, dev->hard_header_len);
-		return true;
-	}
-
-	return false;
-}
-
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		void *frame, struct net_device *dev, int size_max,
 		__be16 proto, unsigned char *addr, int hlen)
@@ -2286,8 +2274,14 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		if (unlikely(err < 0))
 			return -EINVAL;
 	} else if (dev->hard_header_len) {
-		if (ll_header_truncated(dev, tp_len))
+		/* net device doesn't like empty head */
+		if (unlikely(len <= dev->hard_header_len)) {
+			net_warn_ratelimited("%s: packet size is too short "
+					     "(%d <= %d)\n",
+					     current->comm, len,
+					     dev->hard_header_len);
 			return -EINVAL;
+		}

 		skb_push(skb, dev->hard_header_len);
 		err = skb_store_bits(skb, 0, data,
@@ -2624,8 +2618,13 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 		if (unlikely(offset < 0))
 			goto out_free;
 	} else {
-		if (ll_header_truncated(dev, len))
+		if (unlikely(len < dev->hard_header_len)) {
+			net_warn_ratelimited("%s: packet size is shorter than "
+					     "minimum header size (%d < %d)\n",
+					     current->comm, len,
+					     dev->hard_header_len);
 			goto out_free;
+		}
 	}

 	/* Returns -EFAULT on error */
--
2.4.6
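The effect of relaxing the comparison in packet_snd() can be sketched in plain user-space C. This is only an illustration: the function names are hypothetical and not part of the patch; only the two comparisons (`<=` in the removed helper, `<` in the new check) are taken from it.

```c
#include <stdbool.h>

/* Old behaviour (removed ll_header_truncated): a packet that consists
 * of exactly one link-layer header and no payload is rejected. */
bool header_truncated_old(int len, int hard_header_len)
{
	return len <= hard_header_len;
}

/* New behaviour in packet_snd(): only packets shorter than the
 * link-layer header are rejected, so header-only packets pass. */
bool header_truncated_new(int len, int hard_header_len)
{
	return len < hard_header_len;
}
```

With the values from the warning quoted above (len 24, header length 24), the old predicate reports truncation and the new one does not.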
Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel
Maninder Singh writes:

>>> chandef is initialized with NULL and on the very next line,
>>> we are using it to get channel, which is not correct.
>>>
>>> channel should be initialized after obtaining chandef.
>>>
>>> Signed-off-by: Maninder Singh
>
>> How did you find this bug?
>
> Static analysis reports this bug, e.g. Coverity or any other static tool
> like cppcheck:
>
> drivers/net/wireless/ath/ath10k/mac.c:839]: (error) Possible null pointer
> dereference: chandef

Thanks. This is always good to add to the commit log so I did that:

ath10k: fix wrong initialization of struct channel

chandef is initialized with NULL and on the very next line, we are using
it to get channel, which is not correct.

Channel should be initialized after obtaining chandef.

Found by cppcheck:

ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Signed-off-by: Maninder Singh
Signed-off-by: Kalle Valo

--
Kalle Valo
Re: [PATCH V2 net-next 0/3] ARM BPF JIT features
On 7/21/15 5:16 AM, Nicolas Schichan wrote:
> This series adds support for more instructions to the ARM BPF JIT,
> namely skb netdevice type retrieval, skb payload offset retrieval, and
> skb packet type retrieval.
>
> This allows 35 tests to use the JIT instead of 29 before.
>
> This series depends on the "BPF JIT fixes for ARM" series sent earlier.

Actually in these patches I don't see a strong dependency on the 'net' set,
but since you're saying there is, you'd need to resubmit this set after
your 'net' set is merged, the whole of 'net' is sent to Linus, and it is
merged into net-next.
[PATCH] openvswitch: make for_each_node loops work with sparse numa systems
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

For example, if node_possible_map is 0x30003, this patch will map node to
node_cnt as follows:
0,1,16,17 => 0,1,2,3

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.

Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
Signed-off-by: Chris J Arges
---
 net/openvswitch/flow.c       | 10 ++
 net/openvswitch/flow_table.c | 18 +++---
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc7b0ab..425d45d 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
 			struct ovs_flow_stats *ovs_stats,
 			unsigned long *used, __be16 *tcp_flags)
 {
-	int node;
+	int node, node_cnt = 0;

 	*used = 0;
 	*tcp_flags = 0;
 	memset(ovs_stats, 0, sizeof(*ovs_stats));

 	for_each_node(node) {
-		struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]);
+		struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node_cnt]);

 		if (stats) {
 			/* Local CPU may write on non-local stats, so we must
@@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
 			ovs_stats->n_bytes += stats->byte_count;
 			spin_unlock_bh(&stats->lock);
 		}
+		node_cnt++;
 	}
 }

 /* Called with ovs_mutex. */
 void ovs_flow_stats_clear(struct sw_flow *flow)
 {
-	int node;
+	int node, node_cnt = 0;

 	for_each_node(node) {
-		struct flow_stats *stats = ovsl_dereference(flow->stats[node]);
+		struct flow_stats *stats = ovsl_dereference(flow->stats[node_cnt]);

 		if (stats) {
 			spin_lock_bh(&stats->lock);
@@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
 			stats->tcp_flags = 0;
 			spin_unlock_bh(&stats->lock);
 		}
+		node_cnt++;
 	}
 }

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..5d10c54 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void)
 {
 	struct sw_flow *flow;
 	struct flow_stats *stats;
-	int node;
+	int node, node_cnt = 0;

 	flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
 	if (!flow)
@@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void)

 	RCU_INIT_POINTER(flow->stats[0], stats);

-	for_each_node(node)
+	for_each_node(node) {
 		if (node != 0)
-			RCU_INIT_POINTER(flow->stats[node], NULL);
+			RCU_INIT_POINTER(flow->stats[node_cnt], NULL);
+		node_cnt++;
+	}

 	return flow;
 err:
@@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)

 static void flow_free(struct sw_flow *flow)
 {
-	int node;
+	int node, node_cnt = 0;

 	if (ovs_identifier_is_key(&flow->id))
 		kfree(flow->id.unmasked_key);
 	kfree((struct sw_flow_actions __force *)flow->sf_acts);
-	for_each_node(node)
-		if (flow->stats[node])
+	for_each_node(node) {
+		if (flow->stats[node_cnt])
 			kmem_cache_free(flow_stats_cache,
-					(struct flow_stats __force *)flow->stats[node]);
+					(struct flow_stats __force *)flow->stats[node_cnt]);
+		node_cnt++;
+	}
 	kmem_cache_free(flow_cache, flow);
 }

--
1.9.1
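The sparse-to-dense mapping described in the commit message (0,1,16,17 => 0,1,2,3 for a possible map of 0x30003) can be sketched in user-space C. This is an illustration under assumptions: `dense_node_index` is a hypothetical helper and a 32-bit mask stands in for the kernel nodemask. The patch itself simply keeps a running counter inside for_each_node().

```c
#include <stdint.h>

/*
 * Return the dense index of `node` within `possible_map`, i.e. the
 * number of possible nodes with a lower id, or -1 if `node` is not
 * set in the map.
 */
int dense_node_index(uint32_t possible_map, unsigned int node)
{
	int idx = 0;

	if (node >= 32 || !(possible_map & (UINT32_C(1) << node)))
		return -1;

	/* count the possible nodes below `node` */
	for (unsigned int n = 0; n < node; n++)
		if (possible_map & (UINT32_C(1) << n))
			idx++;

	return idx;
}
```

Indexing a per-node array of num_possible_nodes() entries with the dense index stays in bounds, whereas indexing it with the raw node id (16 or 17 here) does not.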
[PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring
So it gets freed when the device is going away.
This fixes a DMA memory leak on driver probe() fail and driver remove().

Signed-off-by: Lucas Stach
---
v2: Fix indentation of second line to fix alignment with opening bracket.
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 349365d85b92..a7f1bdf718f8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
 			fep->bufdesc_size;

 	/* Allocate memory for buffer descriptors. */
-	cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
-				      GFP_KERNEL);
+	cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
+				       GFP_KERNEL);
 	if (!cbd_base) {
 		return -ENOMEM;
 	}
--
2.1.4
[PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path
This function frees the resources and cancels the delayed work item that
have been initialized in fec_ptp_init().

Use this to do proper error handling if something goes wrong in the probe
function after fec_ptp_init() has been called.

Signed-off-by: Lucas Stach
---
 drivers/net/ethernet/freescale/fec.h      |  1 +
 drivers/net/ethernet/freescale/fec_main.c |  5 ++---
 drivers/net/ethernet/freescale/fec_ptp.c  | 10 ++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 1eee73cccdf5..99d33e2d35e6 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -562,6 +562,7 @@ struct fec_enet_private {
 };

 void fec_ptp_init(struct platform_device *pdev);
+void fec_ptp_stop(struct platform_device *pdev);
 void fec_ptp_start_cyclecounter(struct net_device *ndev);
 int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
 int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a7f1bdf718f8..32e3807c650e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3494,6 +3494,7 @@ failed_register:
 failed_mii_init:
 failed_irq:
 failed_init:
+	fec_ptp_stop(pdev);
 	if (fep->reg_phy)
 		regulator_disable(fep->reg_phy);
 failed_regulator:
@@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev)
 	struct net_device *ndev = platform_get_drvdata(pdev);
 	struct fec_enet_private *fep = netdev_priv(ndev);

-	cancel_delayed_work_sync(&fep->time_keep);
 	cancel_work_sync(&fep->tx_timeout_work);
+	fec_ptp_stop(pdev);
 	unregister_netdev(ndev);
 	fec_enet_mii_remove(fep);
 	if (fep->reg_phy)
 		regulator_disable(fep->reg_phy);
-	if (fep->ptp_clock)
-		ptp_clock_unregister(fep->ptp_clock);
 	of_node_put(fep->phy_node);
 	free_netdev(ndev);

diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index a15663ad7f5e..f457a23d0bfb 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev)
 	schedule_delayed_work(&fep->time_keep, HZ);
 }

+void fec_ptp_stop(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct fec_enet_private *fep = netdev_priv(ndev);
+
+	cancel_delayed_work_sync(&fep->time_keep);
+	if (fep->ptp_clock)
+		ptp_clock_unregister(fep->ptp_clock);
+}
+
 /**
  * fec_ptp_check_pps_event
  * @fep: the fec_enet_private structure handle
--
2.1.4
Re: [2/2] iwlegacy: convert hex_dump_to_buffer() to %*ph
> There is no need to use hex_dump_to_buffer() in the cases like this:
>
>	hex_dump_to_buffer(buf, len, 16, 1, outbuf, outlen, false); /* len <= 16 */
>	sprintf("%s\n", outbuf);
>
> since it may be easily converted to simple:
>
>	sprintf("%*ph\n", len, buf);
>
> Note: it seems in the case the output is grouped by 2 bytes and looks like a
> typo. Thus, patch changes that to plain byte stream.
>
> Signed-off-by: Andy Shevchenko

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
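For readers outside the kernel: `%*ph` is a kernel-only printk extension that prints a small buffer as space-separated hex bytes. A rough user-space emulation of that output format could look like the sketch below. `format_hex_bytes` is a hypothetical name, and the sketch only mimics the plain `%*ph` layout, not the kernel implementation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write `len` bytes of `buf` into `out` as two-digit lowercase hex
 * values separated by single spaces, e.g. "de ad be ef".
 * Returns 0 on success, -1 if `out` is too small.
 */
int format_hex_bytes(const unsigned char *buf, size_t len,
		     char *out, size_t outlen)
{
	size_t pos = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	for (size_t i = 0; i < len; i++) {
		/* 2 hex digits, a separator after the first byte, plus NUL */
		size_t need = (i ? 3 : 2) + 1;

		if (pos + need > outlen)
			return -1;
		pos += (size_t)snprintf(out + pos, outlen - pos,
					i ? " %02x" : "%02x", buf[i]);
	}

	return 0;
}
```

Four bytes need a 12-byte output buffer here: eight hex digits, three separators and the terminating NUL.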
Re: [v2] brcmsmac: Use kstrdup to simplify code
> Replace a kmalloc+strcpy by an equivalent kstrdup in order to improve
> readability.
>
> Signed-off-by: Christophe JAILLET
> Acked-by: Arend van Spriel

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
Re: rtlwifi: rtl8821ae: Fix an expression that is always false
> In routine _rtl8821ae_set_media_status(), an incorrect mask results in a test
> for AP status to always be false. Similar bugs were fixed in rtl8192cu and
> rtl8192de, but this instance was missed at that time.
>
> Reported-by: David Binderman
> Signed-off-by: Larry Finger
> Cc: Stable [3.18+]
> Cc: David Binderman

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
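The class of bug described here (a mask that makes a comparison always false) can be shown with a small user-space sketch. The mask and mode values below are illustrative assumptions, not taken from the rtl8821ae driver, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MODE_MASK 0x03u	/* assumed: mode field in the low two bits */
#define EXAMPLE_MODE_AP   0x03u	/* assumed: value meaning "AP mode" */

/* Buggy: the mask clears exactly the bits the constant needs, so the
 * comparison can never be true. */
bool is_ap_buggy(uint8_t reg)
{
	return (reg & 0xfcu) == EXAMPLE_MODE_AP;
}

/* Fixed: select the mode field first, then compare. */
bool is_ap_fixed(uint8_t reg)
{
	return (reg & EXAMPLE_MODE_MASK) == EXAMPLE_MODE_AP;
}
```

Static checkers tend to flag the buggy form because no 8-bit input can satisfy it.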
Re: [PATCH] netcp:Fix error handling in the function netcp_xgbe_serdes_config
On 07/20/2015 11:54 AM, Nicholas Krause wrote:
> This fixes error handling in the function netcp_xgbe_serdes_config
> by putting the return value of netcp_xgbe_serdes_check_lane into the
> variable ret and returning this value to the caller, as this function
> can fail by returning the error code -ETIMEDOUT.
>
> Signed-off-by: Nicholas Krause
> ---
>  drivers/net/ethernet/ti/netcp_xgbepcsr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ti/netcp_xgbepcsr.c b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
> index 33571ac..0c79e3d 100644
> --- a/drivers/net/ethernet/ti/netcp_xgbepcsr.c
> +++ b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
> @@ -483,7 +483,7 @@ static int netcp_xgbe_serdes_config(void __iomem *serdes_regs,
>  		return ret;
>
>  	netcp_xgbe_serdes_enable_xgmii_port(sw_regs);
> -	netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
> +	ret = netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
>  	return ret;
>  }

Nicholas,

Thanks for the patch.

Acked-by: Murali Karicheri

--
Murali Karicheri
Linux Kernel, Keystone
Re: [PATCH net v2] macvtap: fix network header pointer for VLAN tagged pkts
On 15/07/21 (Tue) 16:18, Ivan Vecera wrote:
> Network header is set with offset ETH_HLEN but it is not true for VLAN
> (multiple-)tagged and results in checksum issues in lower devices.
>
> v2: leave skb->protocol untouched (thx Vlad), comment added
>
> Signed-off-by: Ivan Vecera
> ---
>  drivers/net/macvtap.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 3b933bb..b75776b 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -796,6 +796,13 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  	skb_reset_mac_header(skb);
>  	skb->protocol = eth_hdr(skb)->h_proto;
>
> +	/* Move network header to the right position for VLAN tagged packets */
> +	if (skb_vlan_tagged(skb)) {

I guess you don't need the condition skb_vlan_tag_present(skb), i.e.,

	if (skb->protocol == htons(ETH_P_8021Q) ||
	    skb->protocol == htons(ETH_P_8021AD))

> +		int depth;
> +		__vlan_get_protocol(skb, skb->protocol, &depth);

__vlan_get_protocol() can fail, and then depth will not be initialized.

> +		skb_set_network_header(skb, depth);

I think you should set network_header after skb_probe_transport_header().
It calls skb_flow_dissect_flow_keys(), which seems to expect network_header
to be ETH_HLEN.

Toshiaki Makita
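The offset computation under discussion, including the failure case raised in the review, can be sketched in user-space C on a raw frame buffer. This is an illustration under assumptions: `vlan_network_offset` is a hypothetical helper, not the kernel's __vlan_get_protocol(). The constants use the standard values (14-byte Ethernet header, 4-byte VLAN tag, EtherTypes 0x8100 and 0x88A8).

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN	14	/* dst MAC + src MAC + EtherType */
#define VLAN_HLEN	4	/* one 802.1Q / 802.1ad tag */
#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

/*
 * Find the network header offset of an Ethernet frame, skipping any
 * number of stacked VLAN tags. On success, store the offset in *depth
 * and return 0. On a truncated frame, return -1 and leave *depth
 * untouched, so the caller must check the return value before using it.
 */
int vlan_network_offset(const uint8_t *frame, size_t len, size_t *depth)
{
	size_t off = ETH_HLEN;

	for (;;) {
		uint16_t proto;

		if (len < off)
			return -1;

		/* the EtherType occupies the two bytes just before `off` */
		proto = (uint16_t)((frame[off - 2] << 8) | frame[off - 1]);
		if (proto != ETH_P_8021Q && proto != ETH_P_8021AD)
			break;

		off += VLAN_HLEN;
	}

	*depth = off;
	return 0;
}
```

An untagged frame yields 14, a single-tagged frame 18, and a double-tagged (802.1ad + 802.1Q) frame 22.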
ARP response with link local IP, why not broadcast
Hello!

According to RFC 3927, every ARP packet (reply and request) should be sent
as a link-layer broadcast as long as the sender IP is a link-local address
(see section 2.5).

That functionality would help me a lot with a use case I have with our
application. But it is not implemented in the kernel that way. Does anyone
know why?

Regards,
Sebastian
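For reference, the condition the RFC attaches to the sender address is membership in 169.254.0.0/16. A minimal user-space sketch of that test follows. `ipv4_is_link_local` is a hypothetical helper name, and the address is assumed to be in host byte order. It is not the kernel's ARP code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` (IPv4, host byte order) lies in 169.254.0.0/16. */
bool ipv4_is_link_local(uint32_t addr)
{
	return (addr & UINT32_C(0xffff0000)) == UINT32_C(0xa9fe0000);
}
```

A sender following section 2.5 would use the link-layer broadcast address for any ARP packet whose sender IP passes this test.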
[PATCH net] netlink: don't hold mutex in rcu callback when releasing mmapd ring
Kirill A. Shutemov says:

This simple test-case triggers a few locking asserts in the kernel:

int main(int argc, char **argv)
{
	unsigned int block_size = 16 * 4096;
	struct nl_mmap_req req = {
		.nm_block_size	= block_size,
		.nm_block_nr	= 64,
		.nm_frame_size	= 16384,
		.nm_frame_nr	= 64 * block_size / 16384,
	};
	unsigned int ring_size;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
	if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
		exit(1);
	if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
		exit(1);

	ring_size = req.nm_block_nr * req.nm_block_size;
	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
 #0: (reboot_mutex){+.+...}, at: [] SyS_reboot+0xa9/0x220
 #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [] __blocking_notifier_call_chain+0x39/0x70
 #2: (rcu_callback){..}, at: [] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-9-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
 88017b3d8000 88027bc03c38 81929ceb 0102
 88027bc03c68 81085a9d 0002 81ca2a20
 0268 88027bc03c98
Call Trace:
 [] dump_stack+0x4f/0x7b
 [] ___might_sleep+0x16d/0x270
 [] __might_sleep+0x4d/0x90
 [] mutex_lock_nested+0x2f/0x430
 [] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [] ? __this_cpu_preempt_check+0x13/0x20
 [] netlink_set_ring+0x1ed/0x350
 [] ? netlink_undo_bind+0x70/0x70
 [] netlink_sock_destruct+0x80/0x150
 [] __sk_free+0x1d/0x160
 [] sk_free+0x19/0x20
[..]

Cong Wang says:
 We can't hold mutex lock in a rcu callback, [..]

Thomas Graf says:
 The socket should be dead at this point. It might be simpler to
 add a netlink_release_ring() function which doesn't require
 locking at all.

Reported-by: "Kirill A. Shutemov"
Diagnosed-by: Cong Wang
Suggested-by: Thomas Graf
Signed-off-by: Florian Westphal
---
 net/netlink/af_netlink.c | 79
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9a0ae71..d8e2e39 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -357,25 +357,52 @@ err1:
 	return NULL;
 }

+
+static void
+__netlink_set_ring(struct sock *sk, struct nl_mmap_req *req, bool tx_ring, void **pg_vec,
+		   unsigned int order)
+{
+	struct netlink_sock *nlk = nlk_sk(sk);
+	struct sk_buff_head *queue;
+	struct netlink_ring *ring;
+
+	queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
+	ring  = tx_ring ? &nlk->tx_ring : &nlk->rx_ring;
+
+	spin_lock_bh(&queue->lock);
+
+	ring->frame_max		= req->nm_frame_nr - 1;
+	ring->head		= 0;
+	ring->frame_size	= req->nm_frame_size;
+	ring->pg_vec_pages	= req->nm_block_size / PAGE_SIZE;
+
+	swap(ring->pg_vec_len, req->nm_block_nr);
+	swap(ring->pg_vec_order, order);
+	swap(ring->pg_vec, pg_vec);
+
+	__skb_queue_purge(queue);
+	spin_unlock_bh(&queue->lock);
+
+	WARN_ON(atomic_read(&nlk->mapped));
+
+	if (pg_vec)
+		free_pg_vec(pg_vec, order, req->nm_block_nr);
+}
+
 static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req,
-			    bool closing, bool tx_ring)
+			    bool tx_ring)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 	struct netlink_ring *ring;
-	struct sk_buff_head *queue;
 	void **pg_vec = NULL;
 	unsigned int order = 0;
-	int err;

 	ring  = tx_ring ? &nlk->tx_ring : &nlk->rx_ring;
-	queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;

-	if (!closing) {
-		if (atomic_read(&nlk->mapped))
-			return -EBUSY;
-		if (atomic_read(&ring->pending))
-			return -EBUSY;
-	}
+	if (atomic_read(&nlk->mapped))
+		return -EBUSY;
+	if (atomic_read(&ring->pending))
+		return -EBUSY;

 	if (req->nm_block_nr) {
 		if (ring->pg_vec != NULL)
@@ -407,31 +434,19 @@ static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req,
 			return -EINVAL;
 	}

-	err = -EBUSY;
 	mutex_lock(&nlk->pg_vec_lock);
-	if (closing || atomic_read(&nlk->mapped) == 0) {
-
Re: Several races in "usbnet" module (kernel 4.1.x)
On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:

> And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
> execute concurrently with the above operation:
> #0 clear_bit (bitops.h:113, inlined)
> #1 usbnet_bh (usbnet.c:1475)
>	/* restart RX again after disabling due to high error rate */
>	clear_bit(EVENT_RX_KILL, &dev->flags);
>
> If clear_bit() is atomic w.r.t. setting dev->flags to 0, this race is
> not a problem, I guess. Otherwise, it may be.

clear_bit is atomic with respect to other atomic operations.
So how about this:

	Regards
		Oliver

From 1c4e685b3a9c183e04c46b661830e5c7ed35b513 Mon Sep 17 00:00:00 2001
From: Oliver Neukum
Date: Tue, 21 Jul 2015 16:19:40 +0200
Subject: [PATCH] usbnet: fix race between usbnet_stop() and the BH

Does this do the job?

Signed-off-by: Oliver Neukum
---
 drivers/net/usb/usbnet.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..77a9a86 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
 	struct driver_info	*info = dev->driver_info;
-	int			retval, pm;
+	int			retval, pm, mpn;

 	clear_bit(EVENT_DEV_OPEN, &dev->flags);
 	netif_stop_queue (net);
@@ -813,14 +813,17 @@ int usbnet_stop (struct net_device *net)
 	 * can't flush_scheduled_work() until we drop rtnl (later),
 	 * else workers could deadlock; so make workers a NOP.
 	 */
+	mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
 	dev->flags = 0;
 	del_timer_sync (&dev->delay);
 	tasklet_kill (&dev->bh);
+	mpn |= !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
+	/* in case the bh reset a flag */
+	dev->flags = 0;
 	if (!pm)
 		usb_autopm_put_interface(dev->intf);

-	if (info->manage_power &&
-	    !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
+	if (info->manage_power && mpn)
 		info->manage_power(dev, 0);
 	else
 		usb_autopm_put_interface(dev->intf);
--
2.1.4