Re: [PATCH] IB/mlx4: avoid a -Wmaybe-uninitialize warning
On 10/25/2016 7:16 PM, Arnd Bergmann wrote:

There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:

ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem here is that gcc won't track the state of the variable
across a spin_unlock. Moving the assignment out of the lock is safe
here and avoids the warning.

Signed-off-by: Arnd Bergmann
Reviewed-by: Yishai Hadas
---
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 84d7857ccc27..c548beaaf910 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -1605,13 +1605,14 @@ static int eq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 			r->com.from_state = r->com.state;
 			r->com.to_state = state;
 			r->com.state = RES_EQ_BUSY;
-			if (eq)
-				*eq = r;
 		}
 	}
 
 	spin_unlock_irq(mlx4_tlock(dev));
 
+	if (!err && eq)
+		*eq = r;
+
 	return err;
 }
Re: [PATCH 19/28] brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap
Arnd Bergmann writes:

> A bugfix added a sanity check around the assignment and use of the
> 'is_11d' variable, which looks correct to me, but as the function is
> rather complex already, this confuses the compiler to the point where
> it can no longer figure out if the variable is always initialized
> correctly:
>
> brcm80211/brcmfmac/cfg80211.c: In function ‘brcmf_cfg80211_start_ap’:
> brcm80211/brcmfmac/cfg80211.c:4586:10: error: ‘is_11d’ may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
>
> This adds an initialization for the newly introduced case in which
> the variable should not really be used, in order to make the warning
> go away.
>
> Fixes: b3589dfe0212 ("brcmfmac: ignore 11d configuration errors")
> Cc: Hante Meuleman
> Cc: Arend van Spriel
> Cc: Kalle Valo
> Signed-off-by: Arnd Bergmann

Via which tree are you planning to submit this? Should I take it?

-- 
Kalle Valo
[PATCH net-next] net: core: Traverse the adjacency list from first entry
From: Ido Schimmel

netdev_next_lower_dev() returns NULL when we finish traversing the
adjacency list ('iter' points to the list's head). Therefore, we must
start traversing the list from the first entry and not its head.

Fixes: 1a3f060c1a47 ("net: Introduce new api for walking upper and lower devices")
Signed-off-by: Ido Schimmel
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f55fb45..d9c937f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5419,7 +5419,7 @@ int netdev_walk_all_lower_dev(struct net_device *dev,
 	struct list_head *iter;
 	int ret;
 
-	for (iter = &dev->adj_list.lower,
+	for (iter = dev->adj_list.lower.next,
 	     ldev = netdev_next_lower_dev(dev, &iter);
 	     ldev;
 	     ldev = netdev_next_lower_dev(dev, &iter)) {
-- 
2.7.4
Re: [PATCH (net.git)] net: phy: at803x: disable by default the hibernation feature
Hello Andrew.

On 10/25/2016 11:00 AM, Andrew Lunn wrote:
>> For example, while booting a Kernel the SYNP MAC (stmmac) fails to
>> initialize its own DMA engine if the PHY entered hibernation before.
>
> Have you tried fixing stmmac instead?

Let me describe better what happens. To be honest, this is a marginal
use case, but maybe it makes sense to share this patch in case somebody
meets the same issue.

When performing "ifconfig eth0 up", if this PHY is not in hibernation,
the iface comes up w/o any issues. If the PHY is in hibernation because
the cable is unplugged (and this is the default for these transceivers),
the PHY clock goes down and the MAC cannot init its own DMA. The stmmac
is designed to fail the open in this case. If I plug the cable, the
next ifconfig up is ok.

The point of the patch I proposed is to disable by default this
hibernation feature at the PHY level, which, for me, should be an
option and not a default. For example, I have used other HW where some
power state features could be enabled but, by default, were turned
off. Also, these transceivers support EEE so, I guess, there is all
the technology needed to manage the power consumption on new setups.

Concerning the stmmac, how could the driver fix this situation? The PHY
does not provide the clock required for the GMAC, so the stmmac cannot
reset its own DMA. I had thought of delaying this until the link is up,
but I don't like an approach where open() reports a sane state that is
not actually true, and we would have to wait for the ACK from the PHY
before resetting the MAC DMA.

Anyway, as said, the patch covers a marginal use case so feel free to
consider it or not. For sure, I am open to changing something at the
MAC level if you have a better idea.

Regards
Peppe
Re: [PATCH net] sctp: validate chunk len before actually using it
On Wed, Oct 26, 2016 at 12:27 AM, Marcelo Ricardo Leitner wrote:
> Andrey Konovalov reported that KASAN detected that SCTP was using a slab
> beyond the boundaries. It was caused because when handling out of the
> blue packets in function sctp_sf_ootb() it was checking the chunk len
> only after already processing the first chunk, validating only for the
> 2nd and subsequent ones.
>
> The fix is to just move the check upwards so it's also validated for the
> 1st chunk.
>
> Reported-by: Andrey Konovalov
> Tested-by: Andrey Konovalov
> Signed-off-by: Marcelo Ricardo Leitner

Reviewed-by: Xin Long
Re: [PATCH net] packet: on direct_xmit, limit tso and csum to supported devices
On Tue, Oct 25, 2016 at 8:57 PM, Eric Dumazet wrote:
> On Tue, 2016-10-25 at 20:28 -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn
>>
>> When transmitting on a packet socket with PACKET_VNET_HDR and
>> PACKET_QDISC_BYPASS, validate device support for features requested
>> in vnet_hdr.
>
> You probably need to add an EXPORT_SYMBOL(validate_xmit_skb_list)
> because af_packet might be modular.

Thanks, Eric. I'll send a v2.
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On 10/25/16 8:48 PM, Eric Dumazet wrote:
> Maybe I do not understand how you plan to use this.
>
> Let say you want a filter to force a BIND_TO_DEVICE xxx because a task
> runs in a cgroup yyy
>
> Then a program doing a socket() + connect (127.0.0.1) will fail ?

Maybe. VRF devices can have a 127.0.0.1 address, in which case the
connect would succeed. ntpq uses 127.0.0.1 to talk to ntpd, for
example. If ntpd is bound to a management VRF, then you need this
context for ntpq to talk to it.

> I do not see how a BPF filter at socket() time can be selective.

Here's my use case - and this is what we are doing today with the
l3mdev cgroup (a patch which has not been accepted upstream):

1. create VRF device

2. create cgroup and configure it
   In this case that means loading the bpf program that sets
   sk_bound_dev_if to the vrf device that was just created.

3. add shell to cgroup

For Management VRF this can be done automatically at login so a user
does not need to take any action. At this point any command run in the
shell runs in the VRF context (PS1 for bash can show the VRF to keep
you from going crazy on why a connect fails :-)) so any ipv4/ipv6
sockets have that VRF scope. For example, if the VRF is a management
VRF, sockets opened by apt-get are automatically bound to the VRF at
create time, so requests go out the eth0 (management) interface.

In a similar fashion, using a cgroup and assigning tasks to it allows
automated launch of systemd services with instances running in a VRF
context - one dhcrelay in vrf red, one in vrf blue, with both using a
parameterized instance file.
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On Tue, 2016-10-25 at 20:21 -0600, David Ahern wrote:
> On 10/25/16 5:39 PM, Eric Dumazet wrote:
> > On Tue, 2016-10-25 at 15:30 -0700, David Ahern wrote:
> >> Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
> >> BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
> >> any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
> >> Currently only sk_bound_dev_if is exported to userspace for modification
> >> by a bpf program.
> >>
> >> This allows a cgroup to be configured such that AF_INET{6} sockets opened
> >> by processes are automatically bound to a specific device. In turn, this
> >> enables the running of programs that do not support SO_BINDTODEVICE in a
> >> specific VRF context / L3 domain.
> >
> > Does this mean that these programs no longer can use loopback ?
>
> I am probably misunderstanding your question, so I'll ramble a bit and
> see if I cover it.
>
> This patch set generically allows sk_bound_dev_if to be set to any
> value. It does not check that an index corresponds to a device at that
> moment (either bpf prog install or execution of the filter), and even
> if it did the device can be deleted at any moment. That seems to be
> standard operating procedure with bpf filters (user mistakes mean
> packets go nowhere and in this case a socket is bound to a
> non-existent device).
>
> The index can be any interface (e.g., eth0) or an L3 device (e.g., a
> VRF). Loopback and index=1 is allowed.
>
> The VRF device is the loopback device for the domain, so binding to it
> covers addresses on the VRF device as well as interfaces enslaved to
> it.
>
> Did you mean something else?

Maybe I do not understand how you plan to use this.

Let say you want a filter to force a BIND_TO_DEVICE xxx because a task
runs in a cgroup yyy

Then a program doing a socket() + connect (127.0.0.1) will fail ?

I do not see how a BPF filter at socket() time can be selective.
[PATCH v2 3/5] kconfig: regenerate *.c_shipped files after previous changes
Signed-off-by: Nicolas Pitre --- scripts/kconfig/zconf.hash.c_shipped | 228 ++--- scripts/kconfig/zconf.tab.c_shipped | 1631 -- 2 files changed, 888 insertions(+), 971 deletions(-) diff --git a/scripts/kconfig/zconf.hash.c_shipped b/scripts/kconfig/zconf.hash.c_shipped index 360a62df2b..bf7f1378b3 100644 --- a/scripts/kconfig/zconf.hash.c_shipped +++ b/scripts/kconfig/zconf.hash.c_shipped @@ -32,7 +32,7 @@ struct kconf_id; static const struct kconf_id *kconf_id_lookup(register const char *str, register unsigned int len); -/* maximum key range = 71, duplicates = 0 */ +/* maximum key range = 72, duplicates = 0 */ #ifdef __GNUC__ __inline @@ -46,32 +46,32 @@ kconf_id_hash (register const char *str, register unsigned int len) { static const unsigned char asso_values[] = { - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 0, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 5, 25, 25, - 0, 0, 0, 5, 0, 0, 73, 73, 5, 0, - 10, 5, 45, 73, 20, 20, 0, 15, 15, 73, - 20, 5, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73 + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 
74, 74, 74, + 74, 74, 74, 74, 74, 0, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 0, 20, 10, + 0, 0, 0, 30, 0, 0, 74, 74, 5, 15, + 0, 25, 40, 74, 15, 0, 0, 10, 35, 74, + 10, 0, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74 }; register int hval = len; @@ -97,33 +97,35 @@ struct kconf_id_strings_t char kconf_id_strings_str8[sizeof("tristate")]; char kconf_id_strings_str9[sizeof("endchoice")]; char kconf_id_strings_str10[sizeof("---help---")]; +char kconf_id_strings_str11[sizeof("select")]; char kconf_id_strings_str12[sizeof("def_tristate")]; char kconf_id_strings_str13[sizeof("def_bool")]; char kconf_id_strings_str14[sizeof("defconfig_list")]; -char kconf_id_strings_str17[sizeof("on")]; -char kconf_id_strings_str18[sizeof("optional")]; -char kconf_id_strings_str21[sizeof("option")]; -char kconf_id_strings_str22[sizeof("endmenu")]; -char kconf_id_strings_str23[sizeof("mainmenu")]; -char kconf_id_strings_str25[sizeof("menuconfig")]; -char kconf_id_strings_str27[sizeof("modules")]; -char kconf_id_strings_str28[sizeof("allnoconfig_y")]; +char kconf_id_strings_str16[sizeof("source")]; +char kconf_id_strings_str17[sizeof("endmenu")]; +char kconf_id_strings_str18[sizeof("allnoconfig_y")]; +char kconf_id_strings_str20[sizeof("range")]; +char kconf_id_strings_str22[sizeof("modules")]; +char 
kconf_id_strings_str23[sizeof("hex")]; +char kconf_id_strings_str27[sizeof("on")]; char kconf_id_strings_str29[sizeof("menu")]; -char kconf_id_strings_str31[sizeof("select")]; +char kconf_id_strings_str31[sizeof("option")]; char kconf_id_strings_str32[sizeof("comment")]; -char kconf_id_strings_str33[sizeof("env")]; -char kconf_id_strings_str35[sizeof("range")]; -char kconf_id_strings_str36[sizeof("choice")]; -char kconf_id_strings_str39[sizeof("bool")]; -char kconf_id_strings_str41[sizeof("source")]; +char kconf_id_string
[PATCH v2 5/5] posix-timers: make it configurable
Some embedded systems have no use for them. This removes about 22KB from the kernel binary size when configured out. Corresponding syscalls are routed to a stub logging the attempt to use those syscalls which should be enough of a clue if they were disabled without proper consideration. They are: timer_create, timer_gettime: timer_getoverrun, timer_settime, timer_delete, clock_adjtime. The clock_settime, clock_gettime, clock_getres and clock_nanosleep syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME, CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast majority of use cases with very little code. Signed-off-by: Nicolas Pitre Reviewed-by: Josh Triplett --- drivers/ptp/Kconfig | 2 +- include/linux/posix-timers.h | 28 +- include/linux/sched.h| 10 init/Kconfig | 17 +++ kernel/signal.c | 4 ++ kernel/time/Makefile | 10 +++- kernel/time/posix-stubs.c| 118 +++ 7 files changed, 184 insertions(+), 5 deletions(-) create mode 100644 kernel/time/posix-stubs.c diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig index 0f7492f8ea..bdce332911 100644 --- a/drivers/ptp/Kconfig +++ b/drivers/ptp/Kconfig @@ -6,7 +6,7 @@ menu "PTP clock support" config PTP_1588_CLOCK tristate "PTP clock support" - depends on NET + depends on NET && POSIX_TIMERS select PPS select NET_PTP_CLASSIFY help diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 62d44c1760..2288c5c557 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -118,6 +118,8 @@ struct k_clock { extern struct k_clock clock_posix_cpu; extern struct k_clock clock_posix_dynamic; +#ifdef CONFIG_POSIX_TIMERS + void posix_timers_register_clock(const clockid_t clock_id, struct k_clock *new_clock); /* function to call to trigger timer event */ @@ -131,8 +133,30 @@ void posix_cpu_timers_exit_group(struct task_struct *task); void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx, cputime_t *newval, cputime_t *oldval); -long 
clock_nanosleep_restart(struct restart_block *restart_block); - void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); +#else + +#include + +static inline void posix_timers_register_clock(const clockid_t clock_id, + struct k_clock *new_clock) {} +static inline int posix_timer_event(struct k_itimer *timr, int si_private) +{ return 0; } +static inline void run_posix_cpu_timers(struct task_struct *task) {} +static inline void posix_cpu_timers_exit(struct task_struct *task) +{ + add_device_randomness((const void*) &task->se.sum_exec_runtime, + sizeof(unsigned long long)); +} +static inline void posix_cpu_timers_exit_group(struct task_struct *task) {} +static inline void set_process_cpu_timer(struct task_struct *task, + unsigned int clock_idx, cputime_t *newval, cputime_t *oldval) {} +static inline void update_rlimit_cpu(struct task_struct *task, +unsigned long rlim_new) {} + +#endif + +long clock_nanosleep_restart(struct restart_block *restart_block); + #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index 348f51b0ec..ad716d5559 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2946,8 +2946,13 @@ static inline void exit_thread(struct task_struct *tsk) extern void exit_files(struct task_struct *); extern void __cleanup_sighand(struct sighand_struct *); +#ifdef CONFIG_POSIX_TIMERS extern void exit_itimers(struct signal_struct *); extern void flush_itimer_signals(void); +#else +static inline void exit_itimers(struct signal_struct *s) {} +static inline void flush_itimer_signals(void) {} +#endif extern void do_group_exit(int); @@ -3450,7 +3455,12 @@ static __always_inline bool need_resched(void) * Thread group CPU time accounting. 
*/ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times); +#ifdef CONFIG_POSIX_TIMERS void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times); +#else +static inline void thread_group_cputimer(struct task_struct *tsk, +struct task_cputime *times) {} +#endif /* * Reevaluate whether the task has signals pending delivery. diff --git a/init/Kconfig b/init/Kconfig index 34407f15e6..351d422252 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1445,6 +1445,23 @@ config SYSCTL_SYSCALL If unsure say N here. +config POSIX_TIMERS + bool "Posix Clocks & timers" if EXPERT + default y + help + This includes native support for POSIX timers to the kernel. + Most embedded systems may have no use for them and therefore they + can be configured out to reduce the size of the kernel image. +
[PATCH v2 1/5] kconfig: introduce the "imply" keyword
The "imply" keyword is a weak version of "select" where the target config symbol can still be turned off, avoiding those pitfalls that come with the "select" keyword. This is useful e.g. with multiple drivers that want to indicate their ability to hook into a given subsystem while still being able to configure that subsystem out and keep those drivers selected. Currently, the same effect can almost be achieved with: config DRIVER_A tristate config DRIVER_B tristate config DRIVER_C tristate config DRIVER_D tristate [...] config SUBSYSTEM_X tristate default DRIVER_A || DRIVER_B || DRIVER_C || DRIVER_D || [...] This is unwieldly to maintain especially with a large number of drivers. Furthermore, there is no easy way to restrict the choice for SUBSYSTEM_X to y or n, excluding m, when some drivers are built-in. The "select" keyword allows for excluding m, but it excludes n as well. Hence this "imply" keyword. The above becomes: config DRIVER_A tristate imply SUBSYSTEM_X config DRIVER_B tristate imply SUBSYSTEM_X [...] config SUBSYSTEM_X tristate This is much cleaner, and way more flexible than "select". SUBSYSTEM_X can still be configured out, and it can be set as a module when none of the drivers are selected or all of them are also modular. Signed-off-by: Nicolas Pitre Reviewed-by: Josh Triplett --- Documentation/kbuild/kconfig-language.txt | 28 scripts/kconfig/expr.h| 2 ++ scripts/kconfig/menu.c| 55 ++- scripts/kconfig/symbol.c | 24 +- scripts/kconfig/zconf.gperf | 1 + scripts/kconfig/zconf.y | 16 +++-- 6 files changed, 107 insertions(+), 19 deletions(-) diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index 069fcb3eef..5ee0dd3c85 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt @@ -113,6 +113,33 @@ applicable everywhere (see syntax). That will limit the usefulness but on the other hand avoid the illegal configurations all over. 
+- weak reverse dependencies: "imply" <symbol> ["if" <expr>]
+  This is similar to "select" as it enforces a lower limit on another
+  symbol except that the "implied" config symbol's value may still be
+  set to n from a direct dependency or with a visible prompt.
+
+  Given the following example:
+
+  config FOO
+	tristate
+	imply BAZ
+
+  config BAZ
+	tristate
+	depends on BAR
+
+  The following values are possible:
+
+	FOO		BAR		BAZ's default	choice for BAZ
+	---		---		-------------	--------------
+	n		y		n		N/m/y
+	m		y		m		M/y/n
+	y		y		y		Y/n
+	y		n		*		N
+
+  This is useful e.g. with multiple drivers that want to indicate their
+  ability to hook into a given subsystem while still being able to
+  configure that subsystem out and keep those drivers selected.
+
 - limiting menu display: "visible if" <expr>
   This attribute is only applicable to menu blocks, if the condition is
   false, the menu block is not displayed to the user (the symbols
@@ -481,6 +508,7 @@ historical issues resolved through these different solutions.
   b) Match dependency semantics:
 	b1) Swap all "select FOO" to "depends on FOO" or,
 	b2) Swap all "depends on FOO" to "select FOO"
+  c) Consider the use of "imply" instead of "select"
 
 The resolution to a) can be tested with the sample Kconfig file
 Documentation/kbuild/Kconfig.recursion-issue-01 through the removal
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 973b6f7333..a73f762c48 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -85,6 +85,7 @@ struct symbol {
 	struct property *prop;
 	struct expr_value dir_dep;
 	struct expr_value rev_dep;
+	struct expr_value implied;
 };
 
 #define for_all_symbols(i, sym) for (i = 0; i < SYMBOL_HASHSIZE; i++) for (sym = symbol_hash[i]; sym; sym = sym->next) if (sym->type != S_OTHER)
@@ -136,6 +137,7 @@ enum prop_type {
 	P_DEFAULT,	/* default y */
 	P_CHOICE,	/* choice value */
 	P_SELECT,	/* select BAR */
+	P_IMPLY,	/* imply BAR */
 	P_RANGE,	/* range 7..100 (for a symbol) */
 	P_ENV,		/* value from environment variable */
 	P_SYMBOL,	/* where a symbol is defined */
diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c index aed678e8a7..e9357931b4 100644 --- a/scripts/kconfig/menu.c +++ b/scripts/kconfig/menu.c @@ -233,6 +233,8 @@ static void sym_check_prop(struct symbol *sym) { struct property *prop; struct symbol *sym2; + char
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On 10/25/16 7:55 PM, Alexei Starovoitov wrote:
> Same question as Daniel... why extra helper?

It can be dropped. Wrong path while learning this code.

> If program overwrites bpf_sock->sk_bound_dev_if can we use that
> after program returns?
>
> Also do you think it's possible to extend this patch to prototype
> the port bind restrictions that were proposed few month back using
> the same bpf_sock input structure?
> Probably the check would need to be moved into different
> place instead of sk_alloc(), but then we'll have more
> opportunities to overwrite bound_dev_if, look at ports and so on ?

I think the sk_bound_dev_if should be set when the socket is created
versus waiting until it is used (bind, connect, sendmsg, recvmsg). That
said, the filter could (should?) be run in the protocol family create
functions (inet_create and inet6_create) versus sk_alloc. That would
allow the filter to allocate a local port based on its logic. I'd
prefer interested parties to look into the details of that use case.

I'll move the running of the filter to the end of the create functions
for v2.
[PATCH v2 4/5] ptp_clock: allow for it to be optional
In order to break the hard dependency between the PTP clock subsystem
and ethernet drivers capable of being clock providers, this patch
provides simple PTP stub functions to allow linkage of those drivers
into the kernel even when the PTP subsystem is configured out. Drivers
must be ready to accept NULL from ptp_clock_register() in that case.

And to make it possible for PTP to be configured out, the select
statement in those drivers' Kconfig menu entries is converted to the
new "imply" statement. This way the PTP subsystem may have Kconfig
dependencies of its own, such as POSIX_TIMERS, without having to make
those ethernet drivers unavailable if POSIX timers are configured out.
And when support for POSIX timers is selected again, the default config
option for PTP clock support will automatically be adjusted
accordingly.

The pch_gbe driver is a bit special as it relies on extra code in
drivers/ptp/ptp_pch.c. Therefore we let the make process descend into
drivers/ptp/ even if PTP_1588_CLOCK is unselected.
Signed-off-by: Nicolas Pitre Reviewed-by: Josh Triplett --- drivers/Makefile| 2 +- drivers/net/ethernet/adi/Kconfig| 2 +- drivers/net/ethernet/amd/Kconfig| 2 +- drivers/net/ethernet/amd/xgbe/xgbe-main.c | 6 ++- drivers/net/ethernet/broadcom/Kconfig | 4 +- drivers/net/ethernet/cavium/Kconfig | 2 +- drivers/net/ethernet/freescale/Kconfig | 2 +- drivers/net/ethernet/intel/Kconfig | 10 ++-- drivers/net/ethernet/mellanox/mlx4/Kconfig | 2 +- drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 2 +- drivers/net/ethernet/renesas/Kconfig| 2 +- drivers/net/ethernet/samsung/Kconfig| 2 +- drivers/net/ethernet/sfc/Kconfig| 2 +- drivers/net/ethernet/stmicro/stmmac/Kconfig | 2 +- drivers/net/ethernet/ti/Kconfig | 2 +- drivers/net/ethernet/tile/Kconfig | 2 +- drivers/ptp/Kconfig | 8 +-- include/linux/ptp_clock_kernel.h| 65 - 18 files changed, 69 insertions(+), 50 deletions(-) diff --git a/drivers/Makefile b/drivers/Makefile index f0afdfb3c7..8cfa1ff8f6 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -107,7 +107,7 @@ obj-$(CONFIG_INPUT) += input/ obj-$(CONFIG_RTC_LIB) += rtc/ obj-y += i2c/ media/ obj-$(CONFIG_PPS) += pps/ -obj-$(CONFIG_PTP_1588_CLOCK) += ptp/ +obj-y += ptp/ obj-$(CONFIG_W1) += w1/ obj-y += power/ obj-$(CONFIG_HWMON)+= hwmon/ diff --git a/drivers/net/ethernet/adi/Kconfig b/drivers/net/ethernet/adi/Kconfig index 6b94ba6103..98cc8f5350 100644 --- a/drivers/net/ethernet/adi/Kconfig +++ b/drivers/net/ethernet/adi/Kconfig @@ -58,7 +58,7 @@ config BFIN_RX_DESC_NUM config BFIN_MAC_USE_HWSTAMP bool "Use IEEE 1588 hwstamp" depends on BFIN_MAC && BF518 - select PTP_1588_CLOCK + imply PTP_1588_CLOCK default y ---help--- To support the IEEE 1588 Precision Time Protocol (PTP), select y here diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig index 0038709fd3..713ea7ad22 100644 --- a/drivers/net/ethernet/amd/Kconfig +++ b/drivers/net/ethernet/amd/Kconfig @@ -177,7 +177,7 @@ config AMD_XGBE depends on ARM64 || COMPILE_TEST select BITREVERSE select 
CRC32 - select PTP_1588_CLOCK + imply PTP_1588_CLOCK ---help--- This driver supports the AMD 10GbE Ethernet device found on an AMD SoC. diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c index 9de078819a..e10e569c0d 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c @@ -773,7 +773,8 @@ static int xgbe_probe(struct platform_device *pdev) goto err_wq; } - xgbe_ptp_register(pdata); + if (IS_REACHABLE(CONFIG_PTP_1588_CLOCK)) + xgbe_ptp_register(pdata); xgbe_debugfs_init(pdata); @@ -812,7 +813,8 @@ static int xgbe_remove(struct platform_device *pdev) xgbe_debugfs_exit(pdata); - xgbe_ptp_unregister(pdata); + if (IS_REACHABLE(CONFIG_PTP_1588_CLOCK)) + xgbe_ptp_unregister(pdata); flush_workqueue(pdata->an_workqueue); destroy_workqueue(pdata->an_workqueue); diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig index bd8c80c0b7..6a8d74aeb6 100644 --- a/drivers/net/ethernet/broadcom/Kconfig +++ b/drivers/net/ethernet/broadcom/Kconfig @@ -110,7 +110,7 @@ config TIGON3 depends on PCI select PHYLIB select HWMON - select PTP_1588_CLOCK + imply PTP_1588_CLOCK ---help--- This driver supports Broadcom T
[PATCH v2 2/5] kconfig: introduce the "suggest" keyword
Similar to "imply" but with no added restrictions on the target
symbol's value. Useful for providing a default value to another symbol.

Suggested by Edward Cree.

Signed-off-by: Nicolas Pitre
---
 Documentation/kbuild/kconfig-language.txt |  6 ++
 scripts/kconfig/expr.h                    |  2 ++
 scripts/kconfig/menu.c                    | 15 ++-
 scripts/kconfig/symbol.c                  | 20 +++-
 scripts/kconfig/zconf.gperf               |  1 +
 scripts/kconfig/zconf.y                   | 16 ++--
 6 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index 5ee0dd3c85..b7f4f0ca1d 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -140,6 +140,12 @@ applicable everywhere (see syntax).
    ability to hook into a given subsystem while still being able to
    configure that subsystem out and keep those drivers selected.
 
+- even weaker reverse dependencies: "suggest" <symbol> ["if" <expr>]
+  This is similar to "imply" except that this doesn't add any restrictions
+  on the value the suggested symbol may use. In other words this only
+  provides a default for the specified symbol based on the value for the
+  config entry where this is used.
+ - limiting menu display: "visible if" This attribute is only applicable to menu blocks, if the condition is false, the menu block is not displayed to the user (the symbols diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h index a73f762c48..eea3aa3c7a 100644 --- a/scripts/kconfig/expr.h +++ b/scripts/kconfig/expr.h @@ -86,6 +86,7 @@ struct symbol { struct expr_value dir_dep; struct expr_value rev_dep; struct expr_value implied; + struct expr_value suggested; }; #define for_all_symbols(i, sym) for (i = 0; i < SYMBOL_HASHSIZE; i++) for (sym = symbol_hash[i]; sym; sym = sym->next) if (sym->type != S_OTHER) @@ -138,6 +139,7 @@ enum prop_type { P_CHOICE, /* choice value */ P_SELECT, /* select BAR */ P_IMPLY,/* imply BAR */ + P_SUGGEST, /* suggest BAR */ P_RANGE,/* range 7..100 (for a symbol) */ P_ENV, /* value from environment variable */ P_SYMBOL, /* where a symbol is defined */ diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c index e9357931b4..3abc5c85ac 100644 --- a/scripts/kconfig/menu.c +++ b/scripts/kconfig/menu.c @@ -255,7 +255,9 @@ static void sym_check_prop(struct symbol *sym) break; case P_SELECT: case P_IMPLY: - use = prop->type == P_SELECT ? "select" : "imply"; + case P_SUGGEST: + use = prop->type == P_SELECT ? "select" : + prop->type == P_IMPLY ? 
"imply" : "suggest"; sym2 = prop_get_symbol(prop); if (sym->type != S_BOOLEAN && sym->type != S_TRISTATE) prop_warn(prop, @@ -341,6 +343,10 @@ void menu_finalize(struct menu *parent) struct symbol *es = prop_get_symbol(prop); es->implied.expr = expr_alloc_or(es->implied.expr, expr_alloc_and(expr_alloc_symbol(menu->sym), expr_copy(dep))); + } else if (prop->type == P_SUGGEST) { + struct symbol *es = prop_get_symbol(prop); + es->suggested.expr = expr_alloc_or(es->suggested.expr, + expr_alloc_and(expr_alloc_symbol(menu->sym), expr_copy(dep))); } } } @@ -687,6 +693,13 @@ static void get_symbol_str(struct gstr *r, struct symbol *sym, str_append(r, "\n"); } + get_symbol_props_str(r, sym, P_SUGGEST, _(" Suggests: ")); + if (sym->suggested.expr) { + str_append(r, _(" Suggested by: ")); + expr_gstr_print(sym->suggested.expr, r); + str_append(r, "\n"); + } + str_append(r, "\n\n"); } diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c index 20136ffefb..4a8094a63c 100644 --- a/scripts/kconfig/symbol.c +++ b/scripts/kconfig/symbol.c @@ -267,6 +267,16 @@ static void sym_calc_visibility(struct symbol *sym) sym->implied.tri = tri; sym_set_changed(sym); } + tri = no; + if (sym->suggested.expr) + tri = expr_calc_value(sym->suggested.expr); + tri = EXPR_AND(tri, sym->visible); + if (tri == mod && sym_get_type(sym) == S_BOOLEAN) + tri = yes; + if (sym->suggested.tri != tri) { + sym->suggested.tri = tri; +
From: Nicolas Pitre Subject: [PATCH v2 0/5] make POSIX timers optional with some Kconfig help Many embedded systems don't need the full POSIX timer support. Configuring them out provides a nice kernel image size reduction. When POSIX timers are configured out, the PTP clock subsystem should be left out as well. However a bunch of ethernet drivers currently *select* the latter in their Kconfig entries. Therefore some more work was needed to break that hard dependency from those drivers without preventing their usage altogether. This series therefore also includes kconfig changes to implement a new keyword to express some reverse dependencies like "select" does, named "imply", while still allowing the target config symbol to be disabled if the user or a direct dependency says so. The "suggest" keyword is also provided to complement "imply" but without the restrictions from "imply" or "select". At this point I'd like to gather ACKs especially from people in the "To" field. Ideally this would need to go upstream as a single series to avoid cross subsystem dependency issues, and we should decide which maintainer tree to use. Suggestions welcome. Changes from v1: - added "suggest" to kconfig for completeness - various typo fixes - small "imply" effect visibility fix The bulk of the diffstat comes from the kconfig lex parser regeneration.
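The semantics described above can be sketched with a hypothetical Kconfig fragment (FOO_NIC and BAR_NIC are illustrative symbols invented for this example; PTP_1588_CLOCK is the real PTP subsystem symbol the series targets):

```kconfig
config FOO_NIC
	tristate "Hypothetical ethernet driver"
	# "select" would force PTP_1588_CLOCK on, ignoring its own
	# dependencies and the user's choice. "imply" only defaults it
	# to on; the user, or a direct dependency of PTP_1588_CLOCK,
	# can still turn it off.
	imply PTP_1588_CLOCK

config BAR_NIC
	tristate "Another hypothetical driver"
	# "suggest" is weaker still: a pure default hint, without the
	# remaining restrictions of "imply" or "select".
	suggest PTP_1588_CLOCK
```

With "imply", the driver keeps working even when the user disables PTP_1588_CLOCK, which is exactly the behavior "select" cannot express.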
Diffstat: Documentation/kbuild/kconfig-language.txt | 34 + drivers/Makefile|2 +- drivers/net/ethernet/adi/Kconfig|2 +- drivers/net/ethernet/amd/Kconfig|2 +- drivers/net/ethernet/amd/xgbe/xgbe-main.c |6 +- drivers/net/ethernet/broadcom/Kconfig |4 +- drivers/net/ethernet/cavium/Kconfig |2 +- drivers/net/ethernet/freescale/Kconfig |2 +- drivers/net/ethernet/intel/Kconfig | 10 +- drivers/net/ethernet/mellanox/mlx4/Kconfig |2 +- drivers/net/ethernet/mellanox/mlx5/core/Kconfig |2 +- drivers/net/ethernet/renesas/Kconfig|2 +- drivers/net/ethernet/samsung/Kconfig|2 +- drivers/net/ethernet/sfc/Kconfig|2 +- drivers/net/ethernet/stmicro/stmmac/Kconfig |2 +- drivers/net/ethernet/ti/Kconfig |2 +- drivers/net/ethernet/tile/Kconfig |2 +- drivers/ptp/Kconfig | 10 +- include/linux/posix-timers.h| 28 +- include/linux/ptp_clock_kernel.h| 65 +- include/linux/sched.h | 10 + init/Kconfig| 17 + kernel/signal.c |4 + kernel/time/Makefile| 10 +- kernel/time/posix-stubs.c | 118 ++ scripts/kconfig/expr.h |4 + scripts/kconfig/menu.c | 68 +- scripts/kconfig/symbol.c| 42 +- scripts/kconfig/zconf.gperf |2 + scripts/kconfig/zconf.hash.c_shipped| 228 +-- scripts/kconfig/zconf.tab.c_shipped | 1631 - scripts/kconfig/zconf.y | 28 +- 32 files changed, 1300 insertions(+), 1045 deletions(-)
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On 10/25/16 5:39 PM, Eric Dumazet wrote: > On Tue, 2016-10-25 at 15:30 -0700, David Ahern wrote: >> Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to >> BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run >> any time a process in the cgroup opens an AF_INET or AF_INET6 socket. >> Currently only sk_bound_dev_if is exported to userspace for modification >> by a bpf program. >> >> This allows a cgroup to be configured such that AF_INET{6} sockets opened >> by processes are automatically bound to a specific device. In turn, this >> enables the running of programs that do not support SO_BINDTODEVICE in a >> specific VRF context / L3 domain. > > Does this mean that these programs no longer can use loopback ? I am probably misunderstanding your question, so I'll ramble a bit and see if I cover it. This patch set generically allows sk_bound_dev_if to be set to any value. It does not check that an index corresponds to a device at that moment (either bpf prog install or execution of the filter), and even if it did the device can be deleted at any moment. That seems to be standard operating procedure with bpf filters (user mistakes mean packets go nowhere, and in this case a socket is bound to a non-existent device). The index can be any interface (e.g., eth0) or an L3 device (e.g., a VRF). Loopback (index=1) is allowed. The VRF device is the loopback device for the domain, so binding to it covers addresses on the VRF device as well as interfaces enslaved to it. Did you mean something else?
[PATCH] ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()
This patch updates skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit() when an IPv6 header is installed to a socket buffer. This is not a cosmetic change. Without updating this value, GSO packets transmitted through an ipip6 tunnel have the protocol of ETH_P_IP and skb_mac_gso_segment() will attempt to call gso_segment() for IPv4, which results in the packets being dropped. Fixes: b8921ca83eed ("ip4ip6: Support for GSO/GRO") Signed-off-by: Eli Cooper --- net/ipv6/ip6_tunnel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 202d16a..03e050d 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1172,6 +1172,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, if (err) return err; + skb->protocol = htons(ETH_P_IPV6); skb_push(skb, sizeof(struct ipv6hdr)); skb_reset_network_header(skb); ipv6h = ipv6_hdr(skb); -- 2.10.1
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On 10/25/16 5:28 PM, Daniel Borkmann wrote: >> +BPF_CALL_3(bpf_sock_store_u32, struct sock *, sk, u32, offset, u32, val) >> +{ >> +u8 *ptr = (u8 *)sk; >> + >> +if (unlikely(offset > sizeof(*sk))) >> +return -EFAULT; >> + >> +*((u32 *)ptr) = val; >> + >> +return 0; >> +} > > Seems strange to me. So, this helper allows to overwrite arbitrary memory > of a struct sock instance. Potentially we could crash the kernel. > > And in your sock_filter_convert_ctx_access(), you already implement inline > read/write for the context ... > > Your demo code does in pseudocode: > > r1 = sk > r2 = offsetof(struct bpf_sock, bound_dev_if) > r3 = idx > r1->sk_bound_dev_if = idx > sock_store_u32(r1, r2, r3) // updates sk_bound_dev_if again to idx > return 1 > > Dropping that helper from the patch, the only thing a program can do here > is to read/write the sk_bound_dev_if helper per cgroup. Hmm ... dunno. So > this really has to be for cgroups v2, right? Showing my inexperience with the bpf code. The helper can be dropped. I'll do that for v2. Yes, Daniel's patch set provides the infra for this one and it has a cgroups v2 limitation.
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On Wed, Oct 26, 2016 at 01:28:24AM +0200, Daniel Borkmann wrote: > On 10/26/2016 12:30 AM, David Ahern wrote: > >Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to > >BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run > >any time a process in the cgroup opens an AF_INET or AF_INET6 socket. > >Currently only sk_bound_dev_if is exported to userspace for modification > >by a bpf program. > > > >This allows a cgroup to be configured such that AF_INET{6} sockets opened > >by processes are automatically bound to a specific device. In turn, this > >enables the running of programs that do not support SO_BINDTODEVICE in a > >specific VRF context / L3 domain. > > > >Signed-off-by: David Ahern > [...] > >@@ -524,6 +535,10 @@ struct bpf_tunnel_key { > > __u32 tunnel_label; > > }; > > > >+struct bpf_sock { > >+__u32 bound_dev_if; > >+}; > >+ > > /* User return codes for XDP prog type. > > * A valid XDP program must return one of these defined values. All other > > * return codes are reserved for future use. Unknown return codes will > > result > [...] > >diff --git a/net/core/filter.c b/net/core/filter.c > >index 4552b8c93b99..775802881b01 100644 > >--- a/net/core/filter.c > >+++ b/net/core/filter.c > >@@ -2482,6 +2482,27 @@ static const struct bpf_func_proto > >bpf_xdp_event_output_proto = { > > .arg5_type = ARG_CONST_STACK_SIZE, > > }; > > > >+BPF_CALL_3(bpf_sock_store_u32, struct sock *, sk, u32, offset, u32, val) > >+{ > >+u8 *ptr = (u8 *)sk; > >+ > >+if (unlikely(offset > sizeof(*sk))) > >+return -EFAULT; > >+ > >+*((u32 *)ptr) = val; > >+ > >+return 0; > >+} > > Seems strange to me. So, this helper allows to overwrite arbitrary memory > of a struct sock instance. Potentially we could crash the kernel. > > And in your sock_filter_convert_ctx_access(), you already implement inline > read/write for the context ... 
> > Your demo code does in pseudocode: > > r1 = sk > r2 = offsetof(struct bpf_sock, bound_dev_if) > r3 = idx > r1->sk_bound_dev_if = idx > sock_store_u32(r1, r2, r3) // updates sk_bound_dev_if again to idx > return 1 > > Dropping that helper from the patch, the only thing a program can do here > is to read/write the sk_bound_dev_if helper per cgroup. Hmm ... dunno. So > this really has to be for cgroups v2, right? Looks pretty cool. Same question as Daniel... why extra helper? If program overwrites bpf_sock->sk_bound_dev_if can we use that after program returns? Also do you think it's possible to extend this patch to prototype the port bind restrictions that were proposed few month back using the same bpf_sock input structure? Probably the check would need to be moved into different place instead of sk_alloc(), but then we'll have more opportunities to overwrite bound_dev_if, look at ports and so on ?
Re: [PATCH net] bpf: fix samples to add fake KBUILD_MODNAME
On Wed, Oct 26, 2016 at 12:37:53AM +0200, Daniel Borkmann wrote: > Some of the sample files are causing issues when they are loaded with tc > and cls_bpf, meaning tc bails out while trying to parse the resulting ELF > file as program/map/etc sections are not present, which can be easily > spotted with readelf(1). > > Currently, BPF samples are including some of the kernel headers and mid > term we should change them to refrain from this, really. When dynamic > debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which > is easily overlooked in the build as clang spills this along with other > noisy warnings from various header includes, and llc still generates an > ELF file with mentioned characteristics. For just playing around with BPF > examples, this can be a bit of a hurdle to take. > > Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is > done in xdp*_kern samples already. > > Fixes: 65d472fb007d ("samples/bpf: add 'pointer to packet' tests") > Fixes: 6afb1e28b859 ("samples/bpf: Add tunnel set/get tests.") > Fixes: a3f74617340b ("cgroup: bpf: Add an example to do cgroup checking in > BPF") > Reported-by: Chandrasekar Kannan > Signed-off-by: Daniel Borkmann > --- > samples/bpf/parse_ldabs.c| 1 + > samples/bpf/parse_simple.c | 1 + > samples/bpf/parse_varlen.c | 1 + > samples/bpf/tcbpf1_kern.c| 1 + > samples/bpf/tcbpf2_kern.c| 1 + > samples/bpf/test_cgrp2_tc_kern.c | 1 + > 6 files changed, 6 insertions(+) It's also needed for all of tracex*_kern.c, right? For networking samples we probably should get rid of kernel headers. I guess they were there by copy-paste mistake from tracing, since tracing samples actually need to include them, since they do bpf_probe_read into kernel data structures. For this patch in the mean time: Acked-by: Alexei Starovoitov
[PATCH net-next V3 8/9] liquidio CN23XX: copyrights changes and alignment
Updated copyrights comments and also changed some other comments alignments. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 53 ++ .../ethernet/cavium/liquidio/cn23xx_pf_device.h| 39 +++- .../net/ethernet/cavium/liquidio/cn23xx_pf_regs.h | 39 +++- .../net/ethernet/cavium/liquidio/cn66xx_device.c | 36 +++ .../net/ethernet/cavium/liquidio/cn66xx_device.h | 37 +++ drivers/net/ethernet/cavium/liquidio/cn66xx_regs.h | 37 +++ .../net/ethernet/cavium/liquidio/cn68xx_device.c | 36 +++ .../net/ethernet/cavium/liquidio/cn68xx_device.h | 37 +++ drivers/net/ethernet/cavium/liquidio/cn68xx_regs.h | 37 +++ drivers/net/ethernet/cavium/liquidio/lio_core.c| 36 +++ drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 42 - drivers/net/ethernet/cavium/liquidio/lio_main.c| 36 +++ .../net/ethernet/cavium/liquidio/liquidio_common.h | 37 +++ .../net/ethernet/cavium/liquidio/liquidio_image.h | 36 +++ .../net/ethernet/cavium/liquidio/octeon_config.h | 37 +++ .../net/ethernet/cavium/liquidio/octeon_console.c | 43 -- .../net/ethernet/cavium/liquidio/octeon_device.c | 36 +++ .../net/ethernet/cavium/liquidio/octeon_device.h | 45 -- drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 36 +++ drivers/net/ethernet/cavium/liquidio/octeon_droq.h | 17 +++ drivers/net/ethernet/cavium/liquidio/octeon_iq.h | 21 - .../net/ethernet/cavium/liquidio/octeon_mailbox.c | 3 -- .../net/ethernet/cavium/liquidio/octeon_mailbox.h | 3 -- drivers/net/ethernet/cavium/liquidio/octeon_main.h | 19 +++- .../net/ethernet/cavium/liquidio/octeon_mem_ops.c | 5 +- .../net/ethernet/cavium/liquidio/octeon_mem_ops.h | 5 +- .../net/ethernet/cavium/liquidio/octeon_network.h | 5 +- drivers/net/ethernet/cavium/liquidio/octeon_nic.c | 5 +- drivers/net/ethernet/cavium/liquidio/octeon_nic.h | 5 +- .../net/ethernet/cavium/liquidio/request_manager.c | 5 +- .../ethernet/cavium/liquidio/response_manager.c| 5 +- 
.../ethernet/cavium/liquidio/response_manager.h| 5 +- 32 files changed, 352 insertions(+), 486 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c index d6bbccd..c9a706d 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c @@ -1,27 +1,21 @@ /** -* Author: Cavium, Inc. -* -* Contact: supp...@cavium.com -* Please include "LiquidIO" in the subject. -* -* Copyright (c) 2003-2015 Cavium, Inc. -* -* This file is free software; you can redistribute it and/or modify -* it under the terms of the GNU General Public License, Version 2, as -* published by the Free Software Foundation. -* -* This file is distributed in the hope that it will be useful, but -* AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty -* of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or -* NONINFRINGEMENT. See the GNU General Public License for more -* details. -* -* This file may also be available under a different license from Cavium. -* Contact Cavium, Inc. for more information -**/ - + * Author: Cavium, Inc. + * + * Contact: supp...@cavium.com + * Please include "LiquidIO" in the subject. + * + * Copyright (c) 2003-2016 Cavium, Inc. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, Version 2, as + * published by the Free Software Foundation. + * + * This file is distributed in the hope that it will be useful, but + * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or + * NONINFRINGEMENT. See the GNU General Public License for more details. 
+ ***/ #include -#include #include #include #include "liquidio_common.h" @@ -421,10 +415,10 @@ static int cn23xx_pf_setup_global_input_regs(struct octeon_device *oct) return -1; /** Set the MAC_NUM and PVF_NUM in IQ_PKT_CONTROL reg - * for all queues.Only PF can set these bits. - * bits 29:30 indicate the MAC num. - * bits 32:47 indicate the PVF num. - */ +* for all queues.Only PF can set these bits. +* bits 29:30 indicate the MAC num. +* bits 32:47 indicate the PVF num. +*/ for (q_no = 0; q_no < ern; q_no++) {
[PATCH net-next V3 4/9] liquidio CN23XX: mailbox interrupt processing
Adds support for mailbox interrupt processing of various commands. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 157 + drivers/net/ethernet/cavium/liquidio/lio_main.c| 12 ++ .../net/ethernet/cavium/liquidio/octeon_device.c | 1 + .../net/ethernet/cavium/liquidio/octeon_device.h | 21 ++- 4 files changed, 184 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c index 2c7cf89..37d1a4e 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c @@ -30,6 +30,7 @@ #include "octeon_device.h" #include "cn23xx_pf_device.h" #include "octeon_main.h" +#include "octeon_mailbox.h" #define RESET_NOTDONE 0 #define RESET_DONE 1 @@ -677,6 +678,118 @@ static void cn23xx_setup_oq_regs(struct octeon_device *oct, u32 oq_no) } } +static void cn23xx_pf_mbox_thread(struct work_struct *work) +{ + struct cavium_wk *wk = (struct cavium_wk *)work; + struct octeon_mbox *mbox = (struct octeon_mbox *)wk->ctxptr; + struct octeon_device *oct = mbox->oct_dev; + u64 mbox_int_val, val64; + u32 q_no, i; + + if (oct->rev_id < OCTEON_CN23XX_REV_1_1) { + /*read and clear by writing 1*/ + mbox_int_val = readq(mbox->mbox_int_reg); + writeq(mbox_int_val, mbox->mbox_int_reg); + + for (i = 0; i < oct->sriov_info.num_vfs_alloced; i++) { + q_no = i * oct->sriov_info.rings_per_vf; + + val64 = readq(oct->mbox[q_no]->mbox_write_reg); + + if (val64 && (val64 != OCTEON_PFVFACK)) { + if (octeon_mbox_read(oct->mbox[q_no])) + octeon_mbox_process_message( + oct->mbox[q_no]); + } + } + + schedule_delayed_work(&wk->work, msecs_to_jiffies(10)); + } else { + octeon_mbox_process_message(mbox); + } +} + +static int cn23xx_setup_pf_mbox(struct octeon_device *oct) +{ + struct octeon_mbox *mbox = NULL; + u16 mac_no = oct->pcie_port; 
+ u16 pf_num = oct->pf_num; + u32 q_no, i; + + if (!oct->sriov_info.max_vfs) + return 0; + + for (i = 0; i < oct->sriov_info.max_vfs; i++) { + q_no = i * oct->sriov_info.rings_per_vf; + + mbox = vmalloc(sizeof(*mbox)); + if (!mbox) + goto free_mbox; + + memset(mbox, 0, sizeof(struct octeon_mbox)); + + spin_lock_init(&mbox->lock); + + mbox->oct_dev = oct; + + mbox->q_no = q_no; + + mbox->state = OCTEON_MBOX_STATE_IDLE; + + /* PF mbox interrupt reg */ + mbox->mbox_int_reg = (u8 *)oct->mmio[0].hw_addr + +CN23XX_SLI_MAC_PF_MBOX_INT(mac_no, pf_num); + + /* PF writes into SIG0 reg */ + mbox->mbox_write_reg = (u8 *)oct->mmio[0].hw_addr + + CN23XX_SLI_PKT_PF_VF_MBOX_SIG(q_no, 0); + + /* PF reads from SIG1 reg */ + mbox->mbox_read_reg = (u8 *)oct->mmio[0].hw_addr + + CN23XX_SLI_PKT_PF_VF_MBOX_SIG(q_no, 1); + + /*Mail Box Thread creation*/ + INIT_DELAYED_WORK(&mbox->mbox_poll_wk.work, + cn23xx_pf_mbox_thread); + mbox->mbox_poll_wk.ctxptr = (void *)mbox; + + oct->mbox[q_no] = mbox; + + writeq(OCTEON_PFVFSIG, mbox->mbox_read_reg); + } + + if (oct->rev_id < OCTEON_CN23XX_REV_1_1) + schedule_delayed_work(&oct->mbox[0]->mbox_poll_wk.work, + msecs_to_jiffies(0)); + + return 0; + +free_mbox: + while (i) { + i--; + vfree(oct->mbox[i]); + } + + return 1; +} + +static int cn23xx_free_pf_mbox(struct octeon_device *oct) +{ + u32 q_no, i; + + if (!oct->sriov_info.max_vfs) + return 0; + + for (i = 0; i < oct->sriov_info.max_vfs; i++) { + q_no = i * oct->sriov_info.rings_per_vf; + cancel_delayed_work_sync( + &oct->mbox[q_no]->mbox_poll_wk.work); + vfree(oct->mbox[q_no]); + } + + return 0; +} + static int cn23xx_enable_io_queues(struct octeon_device *oct) { u64 reg_val; @@ -871,6 +984,29 @@ static u64 cn23xx_pf_msix_interrupt_handler(void *dev) return ret; } +static void cn23xx_handle_pf_mbox_intr(struct octeon_device *oct) +{ + struct delayed_work *work; +
[PATCH net-next V3 9/9] liquidio CN23XX: fix for new check patch errors
New checkpatch script shows some errors with pre-existing driver. This patch provides fix for those errors. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../net/ethernet/cavium/liquidio/cn23xx_pf_regs.h | 12 +-- drivers/net/ethernet/cavium/liquidio/cn66xx_regs.h | 12 +-- .../net/ethernet/cavium/liquidio/cn68xx_device.c | 2 +- drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 9 +- drivers/net/ethernet/cavium/liquidio/lio_main.c| 9 +- .../net/ethernet/cavium/liquidio/liquidio_common.h | 50 - .../net/ethernet/cavium/liquidio/octeon_console.c | 113 ++--- .../net/ethernet/cavium/liquidio/octeon_device.c | 23 ++--- .../net/ethernet/cavium/liquidio/octeon_device.h | 20 ++-- drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 40 drivers/net/ethernet/cavium/liquidio/octeon_iq.h | 3 + .../net/ethernet/cavium/liquidio/octeon_mem_ops.c | 2 +- .../net/ethernet/cavium/liquidio/octeon_network.h | 6 +- drivers/net/ethernet/cavium/liquidio/octeon_nic.h | 2 +- .../net/ethernet/cavium/liquidio/request_manager.c | 16 ++- 15 files changed, 149 insertions(+), 170 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h index 680a405..e6d4ad9 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h @@ -58,7 +58,7 @@ #define CN23XX_CONFIG_SRIOV_BAR_START 0x19C #define CN23XX_CONFIG_SRIOV_BARX(i)\ - (CN23XX_CONFIG_SRIOV_BAR_START + (i * 4)) + (CN23XX_CONFIG_SRIOV_BAR_START + ((i) * 4)) #define CN23XX_CONFIG_SRIOV_BAR_PF0x08 #define CN23XX_CONFIG_SRIOV_BAR_64BIT 0x04 #define CN23XX_CONFIG_SRIOV_BAR_IO0x01 @@ -508,7 +508,7 @@ /* 4 Registers (64 - bit) */ #defineCN23XX_SLI_S2M_PORT_CTL_START 0x23D80 #defineCN23XX_SLI_S2M_PORTX_CTL(port) \ - (CN23XX_SLI_S2M_PORT_CTL_START + (port * 0x10)) + (CN23XX_SLI_S2M_PORT_CTL_START + ((port) * 0x10)) 
#defineCN23XX_SLI_MAC_NUMBER 0x20050 @@ -549,26 +549,26 @@ * Provides DMA Engine Queue Enable */ #defineCN23XX_DPI_DMA_ENG0_ENB0x0001df80ULL -#defineCN23XX_DPI_DMA_ENG_ENB(eng) (CN23XX_DPI_DMA_ENG0_ENB + (eng * 8)) +#defineCN23XX_DPI_DMA_ENG_ENB(eng) (CN23XX_DPI_DMA_ENG0_ENB + ((eng) * 8)) /* 8 register (64-bit) - DPI_DMA(0..7)_REQQ_CTL * Provides control bits for transaction on 8 Queues */ #defineCN23XX_DPI_DMA_REQQ0_CTL 0x0001df000180ULL #defineCN23XX_DPI_DMA_REQQ_CTL(q_no) \ - (CN23XX_DPI_DMA_REQQ0_CTL + (q_no * 8)) + (CN23XX_DPI_DMA_REQQ0_CTL + ((q_no) * 8)) /* 6 register (64-bit) - DPI_ENG(0..5)_BUF * Provides DMA Engine FIFO (Queue) Size */ #defineCN23XX_DPI_DMA_ENG0_BUF0x0001df000880ULL #defineCN23XX_DPI_DMA_ENG_BUF(eng) \ - (CN23XX_DPI_DMA_ENG0_BUF + (eng * 8)) + (CN23XX_DPI_DMA_ENG0_BUF + ((eng) * 8)) /* 4 Registers (64-bit) */ #defineCN23XX_DPI_SLI_PRT_CFG_START 0x0001df000900ULL #defineCN23XX_DPI_SLI_PRTX_CFG(port)\ - (CN23XX_DPI_SLI_PRT_CFG_START + (port * 0x8)) + (CN23XX_DPI_SLI_PRT_CFG_START + ((port) * 0x8)) /* Masks for DPI_DMA_CONTROL Register */ #defineCN23XX_DPI_DMA_COMMIT_MODE BIT_ULL(58) diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_regs.h b/drivers/net/ethernet/cavium/liquidio/cn66xx_regs.h index 23152c0..b248966 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn66xx_regs.h +++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_regs.h @@ -438,10 +438,10 @@ #defineCN6XXX_SLI_S2M_PORT0_CTL 0x3D80 #defineCN6XXX_SLI_S2M_PORT1_CTL 0x3D90 #defineCN6XXX_SLI_S2M_PORTX_CTL(port)\ - (CN6XXX_SLI_S2M_PORT0_CTL + (port * 0x10)) + (CN6XXX_SLI_S2M_PORT0_CTL + ((port) * 0x10)) #defineCN6XXX_SLI_INT_ENB64(port)\ - (CN6XXX_SLI_INT_ENB64_PORT0 + (port * 0x10)) + (CN6XXX_SLI_INT_ENB64_PORT0 + ((port) * 0x10)) #defineCN6XXX_SLI_MAC_NUMBER 0x3E00 @@ -453,7 +453,7 @@ #defineCN6XXX_PCI_BAR1_OFFSET 0x8 #defineCN6XXX_BAR1_REG(idx, port) \ - (CN6XXX_BAR1_INDEX_START + (port * CN6XXX_PEM_OFFSET) + \ + (CN6XXX_BAR1_INDEX_START + ((port) * CN6XXX_PEM_OFFSET) + \ 
(CN6XXX_PCI_BAR1_OFFSET * (idx))) /* DPI #*/ @@ -471,17 +471,17 @@ #defineCN6XXX_DPI_DMA_ENG0_ENB0x0001df80ULL #defineCN6XXX_DPI_DMA_ENG_ENB(q_no) \ - (CN6XXX_DPI_DMA_ENG0_ENB + (q_no * 8)) + (CN6XXX_DPI_DMA_ENG0_ENB + ((q_no) * 8)) #d
[PATCH net-next V3 6/9] liquidio CN23XX: device states
Cleaned up resource leaks during destroy resources by introducing more device states. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/lio_main.c| 33 -- .../net/ethernet/cavium/liquidio/octeon_device.c | 6 +++- .../net/ethernet/cavium/liquidio/octeon_device.h | 29 ++- drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 13 + drivers/net/ethernet/cavium/liquidio/octeon_main.h | 8 -- .../net/ethernet/cavium/liquidio/request_manager.c | 6 +++- 6 files changed, 64 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index b31ab7e..fcf38ab 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -780,6 +780,7 @@ static void delete_glists(struct lio *lio) } kfree((void *)lio->glist); + kfree((void *)lio->glist_lock); } /** @@ -1339,6 +1340,7 @@ static int liquidio_watchdog(void *param) complete(&first_stage); if (octeon_device_init(oct_dev)) { + complete(&hs->init); liquidio_remove(pdev); return -ENOMEM; } @@ -1363,7 +1365,15 @@ static int liquidio_watchdog(void *param) oct_dev->watchdog_task = kthread_create( liquidio_watchdog, oct_dev, "liowd/%02hhx:%02hhx.%hhx", bus, device, function); - wake_up_process(oct_dev->watchdog_task); + if (!IS_ERR(oct_dev->watchdog_task)) { + wake_up_process(oct_dev->watchdog_task); + } else { + oct_dev->watchdog_task = NULL; + dev_err(&oct_dev->pci_dev->dev, + "failed to create kernel_thread\n"); + liquidio_remove(pdev); + return -1; + } } } @@ -1427,6 +1437,8 @@ static void octeon_destroy_resources(struct octeon_device *oct) if (lio_wait_for_oq_pkts(oct)) dev_err(&oct->pci_dev->dev, "OQ had pending packets\n"); + /* fallthrough */ + case OCT_DEV_INTR_SET_DONE: /* Disable interrupts */ oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR); @@ -1453,6 +1465,8 @@ static void 
octeon_destroy_resources(struct octeon_device *oct) pci_disable_msi(oct->pci_dev); } + /* fallthrough */ + case OCT_DEV_MSIX_ALLOC_VECTOR_DONE: if (OCTEON_CN23XX_PF(oct)) octeon_free_ioq_vector(oct); @@ -1516,10 +1530,13 @@ static void octeon_destroy_resources(struct octeon_device *oct) octeon_unmap_pci_barx(oct, 1); /* fallthrough */ - case OCT_DEV_BEGIN_STATE: + case OCT_DEV_PCI_ENABLE_DONE: + pci_clear_master(oct->pci_dev); /* Disable the device, releasing the PCI INT */ pci_disable_device(oct->pci_dev); + /* fallthrough */ + case OCT_DEV_BEGIN_STATE: /* Nothing to be done here either */ break; } /* end switch (oct->status) */ @@ -1798,6 +1815,7 @@ static int octeon_pci_os_setup(struct octeon_device *oct) if (dma_set_mask_and_coherent(&oct->pci_dev->dev, DMA_BIT_MASK(64))) { dev_err(&oct->pci_dev->dev, "Unexpected DMA device capability\n"); + pci_disable_device(oct->pci_dev); return 1; } @@ -4452,6 +4470,8 @@ static int octeon_device_init(struct octeon_device *octeon_dev) if (octeon_pci_os_setup(octeon_dev)) return 1; + atomic_set(&octeon_dev->status, OCT_DEV_PCI_ENABLE_DONE); + /* Identify the Octeon type and map the BAR address space. */ if (octeon_chip_specific_setup(octeon_dev)) { dev_err(&octeon_dev->pci_dev->dev, "Chip specific setup failed\n"); @@ -4523,9 +4543,6 @@ static int octeon_device_init(struct octeon_device *octeon_dev) if (octeon_setup_instr_queues(octeon_dev)) { dev_err(&octeon_dev->pci_dev->dev, "instruction queue initialization failed\n"); - /* On error, release any previously allocated queues */ - for (j = 0; j < octeon_dev->num_iqs; j++) - octeon_delete_instr_queue(octeon_dev, j); return 1; } atomic_set(&octeon_dev->status, OCT_DEV_INSTR_QUEUE_INIT_DONE); @@ -4541,9 +4558,6 @@ static int octeon_device_init(struct octeon_device *octeon_dev) if (
[PATCH net-next V3 5/9] liquidio CN23XX: VF related operations
Adds support for VF related operations like mac address vlan and link changes. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 22 +++ .../ethernet/cavium/liquidio/cn23xx_pf_device.h| 3 + drivers/net/ethernet/cavium/liquidio/lio_main.c| 214 + .../net/ethernet/cavium/liquidio/liquidio_common.h | 5 + .../net/ethernet/cavium/liquidio/octeon_device.h | 8 + 5 files changed, 252 insertions(+) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c index 37d1a4e..d6bbccd 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "liquidio_common.h" #include "octeon_droq.h" #include "octeon_iq.h" @@ -1457,3 +1458,24 @@ int cn23xx_fw_loaded(struct octeon_device *oct) val = octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1); return (val >> 1) & 1ULL; } + +void cn23xx_tell_vf_its_macaddr_changed(struct octeon_device *oct, int vfidx, + u8 *mac) +{ + if (oct->sriov_info.vf_drv_loaded_mask & BIT_ULL(vfidx)) { + struct octeon_mbox_cmd mbox_cmd; + + mbox_cmd.msg.u64 = 0; + mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST; + mbox_cmd.msg.s.resp_needed = 0; + mbox_cmd.msg.s.cmd = OCTEON_PF_CHANGED_VF_MACADDR; + mbox_cmd.msg.s.len = 1; + mbox_cmd.recv_len = 0; + mbox_cmd.recv_status = 0; + mbox_cmd.fn = NULL; + mbox_cmd.fn_arg = 0; + ether_addr_copy(mbox_cmd.msg.s.params, mac); + mbox_cmd.q_no = vfidx * oct->sriov_info.rings_per_vf; + octeon_mbox_write(oct, &mbox_cmd); + } +} diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h index 21b5c90..20a9dc5 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h @@ -56,4 +56,7 @@ 
int validate_cn23xx_pf_config_info(struct octeon_device *oct, void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct); int cn23xx_fw_loaded(struct octeon_device *oct); + +void cn23xx_tell_vf_its_macaddr_changed(struct octeon_device *oct, int vfidx, + u8 *mac); #endif diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index 0fc6257..b31ab7e 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -3590,6 +3590,151 @@ static void liquidio_del_vxlan_port(struct net_device *netdev, OCTNET_CMD_VXLAN_PORT_DEL); } +static int __liquidio_set_vf_mac(struct net_device *netdev, int vfidx, +u8 *mac, bool is_admin_assigned) +{ + struct lio *lio = GET_LIO(netdev); + struct octeon_device *oct = lio->oct_dev; + struct octnic_ctrl_pkt nctrl; + + if (!is_valid_ether_addr(mac)) + return -EINVAL; + + if (vfidx < 0 || vfidx >= oct->sriov_info.max_vfs) + return -EINVAL; + + memset(&nctrl, 0, sizeof(struct octnic_ctrl_pkt)); + + nctrl.ncmd.u64 = 0; + nctrl.ncmd.s.cmd = OCTNET_CMD_CHANGE_MACADDR; + /* vfidx is 0 based, but vf_num (param1) is 1 based */ + nctrl.ncmd.s.param1 = vfidx + 1; + nctrl.ncmd.s.param2 = (is_admin_assigned ? 1 : 0); + nctrl.ncmd.s.more = 1; + nctrl.iq_no = lio->linfo.txpciq[0].s.q_no; + nctrl.cb_fn = 0; + nctrl.wait_time = 100; + + nctrl.udd[0] = 0; + /* The MAC Address is presented in network byte order. 
*/ + ether_addr_copy((u8 *)&nctrl.udd[0] + 2, mac); + + oct->sriov_info.vf_macaddr[vfidx] = nctrl.udd[0]; + + octnet_send_nic_ctrl_pkt(oct, &nctrl); + + return 0; +} + +static int liquidio_set_vf_mac(struct net_device *netdev, int vfidx, u8 *mac) +{ + struct lio *lio = GET_LIO(netdev); + struct octeon_device *oct = lio->oct_dev; + int retval; + + retval = __liquidio_set_vf_mac(netdev, vfidx, mac, true); + if (!retval) + cn23xx_tell_vf_its_macaddr_changed(oct, vfidx, mac); + + return retval; +} + +static int liquidio_set_vf_vlan(struct net_device *netdev, int vfidx, + u16 vlan, u8 qos, __be16 vlan_proto) +{ + struct lio *lio = GET_LIO(netdev); + struct octeon_device *oct = lio->oct_dev; + struct octnic_ctrl_pkt nctrl; + u16 vlantci; + + if (vfidx < 0 || vfidx >= oct->sriov_info.num_vfs_alloced) + return -EINVAL; + + if (vl
[PATCH net-next V3 0/9] liquidio CN23XX VF support
Dave,

Following is the V3 patch series for adding VF support on CN23XX devices. This version addressed:

1) Your concern about ordering local variable declarations from the longest to the shortest line.
2) As you recommended, removed the custom module parameter max_vfs.
3) Minor changes fixing new checkpatch script errors in the pre-existing driver.

I will post the remaining VF patches soon after this patch series is applied. Please apply the patches in the following order, as some of them depend on earlier patches.

Thanks.

Raghu Vatsavayi (9):
  liquidio CN23XX: HW config for VF support
  liquidio CN23XX: sysfs VF config support
  liquidio CN23XX: Mailbox support
  liquidio CN23XX: mailbox interrupt processing
  liquidio CN23XX: VF related operations
  liquidio CN23XX: device states
  liquidio CN23XX: code cleanup
  liquidio CN23XX: copyrights changes and alignment
  liquidio CN23XX: fix for new check patch errors

 drivers/net/ethernet/cavium/liquidio/Makefile | 1 +
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 357 ++---
 .../ethernet/cavium/liquidio/cn23xx_pf_device.h| 42 +-
 .../net/ethernet/cavium/liquidio/cn23xx_pf_regs.h | 51 ++-
 .../net/ethernet/cavium/liquidio/cn66xx_device.c | 49 +--
 .../net/ethernet/cavium/liquidio/cn66xx_device.h | 41 +-
 drivers/net/ethernet/cavium/liquidio/cn66xx_regs.h | 49 +--
 .../net/ethernet/cavium/liquidio/cn68xx_device.c | 38 +-
 .../net/ethernet/cavium/liquidio/cn68xx_device.h | 37 +-
 drivers/net/ethernet/cavium/liquidio/cn68xx_regs.h | 37 +-
 drivers/net/ethernet/cavium/liquidio/lio_core.c| 68 +++-
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 65 ++-
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 442 ++---
 .../net/ethernet/cavium/liquidio/liquidio_common.h | 100 +++--
 .../net/ethernet/cavium/liquidio/liquidio_image.h | 36 +-
 .../net/ethernet/cavium/liquidio/octeon_config.h | 46 ++-
 .../net/ethernet/cavium/liquidio/octeon_console.c | 156
 .../net/ethernet/cavium/liquidio/octeon_device.c | 74 ++--
 .../net/ethernet/cavium/liquidio/octeon_device.h | 133 ---
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 91 +++--
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h | 18 +-
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h | 25 +-
 .../net/ethernet/cavium/liquidio/octeon_mailbox.c | 318 +++
 .../net/ethernet/cavium/liquidio/octeon_mailbox.h | 112 ++
 drivers/net/ethernet/cavium/liquidio/octeon_main.h | 47 +--
 .../net/ethernet/cavium/liquidio/octeon_mem_ops.c | 7 +-
 .../net/ethernet/cavium/liquidio/octeon_mem_ops.h | 5 +-
 .../net/ethernet/cavium/liquidio/octeon_network.h | 11 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c | 5 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h | 7 +-
 .../net/ethernet/cavium/liquidio/request_manager.c | 34 +-
 .../ethernet/cavium/liquidio/response_manager.c| 11 +-
 .../ethernet/cavium/liquidio/response_manager.h| 6 +-
 33 files changed, 1733 insertions(+), 786 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
 create mode 100644 drivers/net/ethernet/cavium/liquidio/octeon_mailbox.h

--
1.8.3.1
[PATCH net-next V3 1/9] liquidio CN23XX: HW config for VF support
Adds support for configuring HW for creating VFs. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 125 - drivers/net/ethernet/cavium/liquidio/lio_main.c| 23 .../net/ethernet/cavium/liquidio/octeon_config.h | 6 + .../net/ethernet/cavium/liquidio/octeon_device.h | 12 +- 4 files changed, 135 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c index 380a641..2c7cf89 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c @@ -40,11 +40,6 @@ */ #define CN23XX_INPUT_JABBER 64600 -#define LIOLUT_RING_DISTRIBUTION 9 -const int liolut_num_vfs_to_rings_per_vf[LIOLUT_RING_DISTRIBUTION] = { - 0, 8, 4, 2, 2, 2, 1, 1, 1 -}; - void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct) { int i = 0; @@ -309,9 +304,10 @@ u32 cn23xx_pf_get_oq_ticks(struct octeon_device *oct, u32 time_intr_in_us) static void cn23xx_setup_global_mac_regs(struct octeon_device *oct) { - u64 reg_val; u16 mac_no = oct->pcie_port; u16 pf_num = oct->pf_num; + u64 reg_val; + u64 temp; /* programming SRN and TRS for each MAC(0..3) */ @@ -333,6 +329,14 @@ static void cn23xx_setup_global_mac_regs(struct octeon_device *oct) /* setting TRS <23:16> */ reg_val = reg_val | (oct->sriov_info.trs << CN23XX_PKT_MAC_CTL_RINFO_TRS_BIT_POS); + /* setting RPVF <39:32> */ + temp = oct->sriov_info.rings_per_vf & 0xff; + reg_val |= (temp << CN23XX_PKT_MAC_CTL_RINFO_RPVF_BIT_POS); + + /* setting NVFS <55:48> */ + temp = oct->sriov_info.max_vfs & 0xff; + reg_val |= (temp << CN23XX_PKT_MAC_CTL_RINFO_NVFS_BIT_POS); + /* write these settings to MAC register */ octeon_write_csr64(oct, CN23XX_SLI_PKT_MAC_RINFO64(mac_no, pf_num), reg_val); @@ -399,11 +403,12 @@ static int cn23xx_reset_io_queues(struct octeon_device *oct) static int 
cn23xx_pf_setup_global_input_regs(struct octeon_device *oct) { + struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip; + struct octeon_instr_queue *iq; + u64 intr_threshold, reg_val; u32 q_no, ern, srn; u64 pf_num; - u64 intr_threshold, reg_val; - struct octeon_instr_queue *iq; - struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip; + u64 vf_num; pf_num = oct->pf_num; @@ -420,6 +425,16 @@ static int cn23xx_pf_setup_global_input_regs(struct octeon_device *oct) */ for (q_no = 0; q_no < ern; q_no++) { reg_val = oct->pcie_port << CN23XX_PKT_INPUT_CTL_MAC_NUM_POS; + + /* for VF assigned queues. */ + if (q_no < oct->sriov_info.pf_srn) { + vf_num = q_no / oct->sriov_info.rings_per_vf; + vf_num += 1; /* VF1, VF2, */ + } else { + vf_num = 0; + } + + reg_val |= vf_num << CN23XX_PKT_INPUT_CTL_VF_NUM_POS; reg_val |= pf_num << CN23XX_PKT_INPUT_CTL_PF_NUM_POS; octeon_write_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no), @@ -1048,50 +1063,100 @@ static void cn23xx_setup_reg_address(struct octeon_device *oct) static int cn23xx_sriov_config(struct octeon_device *oct) { - u32 total_rings; struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip; - /* num_vfs is already filled for us */ + u32 max_rings, total_rings, max_vfs; u32 pf_srn, num_pf_rings; + u32 max_possible_vfs; + u32 rings_per_vf = 0; cn23xx->conf = - (struct octeon_config *)oct_get_config_info(oct, LIO_23XX); + (struct octeon_config *)oct_get_config_info(oct, LIO_23XX); switch (oct->rev_id) { case OCTEON_CN23XX_REV_1_0: - total_rings = CN23XX_MAX_RINGS_PER_PF_PASS_1_0; + max_rings = CN23XX_MAX_RINGS_PER_PF_PASS_1_0; + max_possible_vfs = CN23XX_MAX_VFS_PER_PF_PASS_1_0; break; case OCTEON_CN23XX_REV_1_1: - total_rings = CN23XX_MAX_RINGS_PER_PF_PASS_1_1; + max_rings = CN23XX_MAX_RINGS_PER_PF_PASS_1_1; + max_possible_vfs = CN23XX_MAX_VFS_PER_PF_PASS_1_1; break; default: - total_rings = CN23XX_MAX_RINGS_PER_PF; + max_rings = CN23XX_MAX_RINGS_PER_PF; + max_possible_vfs = 
CN23XX_MAX_VFS_PER_PF; break; } - if (!oct->sriov_info.num_pf_rings) { - if (total_rings > num_present_cpus()) - num_pf_rings = num_present_
[PATCH net-next V3 7/9] liquidio CN23XX: code cleanup
Cleaned up unnecessary comments and added some minor macros. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/cn66xx_device.c | 13 - drivers/net/ethernet/cavium/liquidio/cn66xx_device.h | 4 ++-- drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 14 -- drivers/net/ethernet/cavium/liquidio/lio_main.c| 17 + drivers/net/ethernet/cavium/liquidio/liquidio_common.h | 2 -- drivers/net/ethernet/cavium/liquidio/octeon_device.c | 8 drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 2 +- drivers/net/ethernet/cavium/liquidio/octeon_droq.h | 1 - drivers/net/ethernet/cavium/liquidio/octeon_iq.h | 1 - drivers/net/ethernet/cavium/liquidio/octeon_main.h | 18 -- drivers/net/ethernet/cavium/liquidio/request_manager.c | 7 ++- .../net/ethernet/cavium/liquidio/response_manager.c| 6 +- .../net/ethernet/cavium/liquidio/response_manager.h| 1 - 13 files changed, 23 insertions(+), 71 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c index e779af8..1ebc225 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c @@ -275,7 +275,6 @@ void lio_cn6xxx_setup_iq_regs(struct octeon_device *oct, u32 iq_no) { struct octeon_instr_queue *iq = oct->instr_queue[iq_no]; - /* Disable Packet-by-Packet mode; No Parse Mode or Skip length */ octeon_write_csr64(oct, CN6XXX_SLI_IQ_PKT_INSTR_HDR64(iq_no), 0); /* Write the start of the input queue's ring and its size */ @@ -378,7 +377,7 @@ void lio_cn6xxx_disable_io_queues(struct octeon_device *oct) /* Reset the doorbell register for each Input queue. 
*/ for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct); i++) { - if (!(oct->io_qmask.iq & (1ULL << i))) + if (!(oct->io_qmask.iq & BIT_ULL(i))) continue; octeon_write_csr(oct, CN6XXX_SLI_IQ_DOORBELL(i), 0x); d32 = octeon_read_csr(oct, CN6XXX_SLI_IQ_DOORBELL(i)); @@ -400,9 +399,8 @@ void lio_cn6xxx_disable_io_queues(struct octeon_device *oct) ; /* Reset the doorbell register for each Output queue. */ - /* for (i = 0; i < oct->num_oqs; i++) { */ for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) { - if (!(oct->io_qmask.oq & (1ULL << i))) + if (!(oct->io_qmask.oq & BIT_ULL(i))) continue; octeon_write_csr(oct, CN6XXX_SLI_OQ_PKTS_CREDIT(i), 0x); d32 = octeon_read_csr(oct, CN6XXX_SLI_OQ_PKTS_CREDIT(i)); @@ -537,15 +535,14 @@ static int lio_cn6xxx_process_droq_intr_regs(struct octeon_device *oct) oct->droq_intr = 0; - /* for (oq_no = 0; oq_no < oct->num_oqs; oq_no++) { */ for (oq_no = 0; oq_no < MAX_OCTEON_OUTPUT_QUEUES(oct); oq_no++) { - if (!(droq_mask & (1ULL << oq_no))) + if (!(droq_mask & BIT_ULL(oq_no))) continue; droq = oct->droq[oq_no]; pkt_count = octeon_droq_check_hw_for_pkts(droq); if (pkt_count) { - oct->droq_intr |= (1ULL << oq_no); + oct->droq_intr |= BIT_ULL(oq_no); if (droq->ops.poll_mode) { u32 value; u32 reg; @@ -721,8 +718,6 @@ int lio_setup_cn66xx_octeon_device(struct octeon_device *oct) int lio_validate_cn6xxx_config_info(struct octeon_device *oct, struct octeon_config *conf6xxx) { - /* int total_instrs = 0; */ - if (CFG_GET_IQ_MAX_Q(conf6xxx) > CN6XXX_MAX_INPUT_QUEUES) { dev_err(&oct->pci_dev->dev, "%s: Num IQ (%d) exceeds Max (%d)\n", __func__, CFG_GET_IQ_MAX_Q(conf6xxx), diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h index a40a913..32fbbb2 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h +++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h @@ -96,8 +96,8 @@ void lio_cn6xxx_setup_reg_address(struct octeon_device *oct, void *chip, struct octeon_reg_list 
*reg_list); u32 lio_cn6xxx_coprocessor_clock(struct octeon_device *oct); u32 lio_cn6xxx_get_oq_ticks(struct octeon_device *oct, u32 time_intr_in_us); -int lio_setup_cn66xx_octeon_device(struct octeon_device *); +int lio_setup_cn66xx_octeon_device(struct octeon_device *oct); int lio_validate_cn6xxx_config_info(struct octeon_device *oct, - struct octeon_config *); + struct octeon_co
[PATCH net-next V3 3/9] liquidio CN23XX: Mailbox support
Adds support for mailbox communication between PF and VF. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/Makefile | 1 + drivers/net/ethernet/cavium/liquidio/lio_core.c| 32 ++ .../net/ethernet/cavium/liquidio/liquidio_common.h | 6 +- .../net/ethernet/cavium/liquidio/octeon_device.h | 4 + .../net/ethernet/cavium/liquidio/octeon_mailbox.c | 321 + .../net/ethernet/cavium/liquidio/octeon_mailbox.h | 115 drivers/net/ethernet/cavium/liquidio/octeon_main.h | 2 +- 7 files changed, 478 insertions(+), 3 deletions(-) create mode 100644 drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c create mode 100644 drivers/net/ethernet/cavium/liquidio/octeon_mailbox.h diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile index 5a27b2a..14958de 100644 --- a/drivers/net/ethernet/cavium/liquidio/Makefile +++ b/drivers/net/ethernet/cavium/liquidio/Makefile @@ -11,6 +11,7 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \ cn66xx_device.o\ cn68xx_device.o\ cn23xx_pf_device.o \ + octeon_mailbox.o \ octeon_mem_ops.o \ octeon_droq.o \ octeon_nic.o diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c index 201eddb..e6026df 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_core.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c @@ -264,3 +264,35 @@ void liquidio_link_ctrl_cmd_completion(void *nctrl_ptr) nctrl->ncmd.s.cmd); } } + +void octeon_pf_changed_vf_macaddr(struct octeon_device *oct, u8 *mac) +{ + bool macaddr_changed = false; + struct net_device *netdev; + struct lio *lio; + + rtnl_lock(); + + netdev = oct->props[0].netdev; + lio = GET_LIO(netdev); + + lio->linfo.macaddr_is_admin_asgnd = true; + + if (!ether_addr_equal(netdev->dev_addr, mac)) { + macaddr_changed = true; + ether_addr_copy(netdev->dev_addr, mac); + ether_addr_copy(((u8 *)&lio->linfo.hw_addr) + 
2, mac); + call_netdevice_notifiers(NETDEV_CHANGEADDR, netdev); + } + + rtnl_unlock(); + + if (macaddr_changed) + dev_info(&oct->pci_dev->dev, +"PF changed VF's MAC address to %02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx\n", +mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]); + + /* no need to notify the firmware of the macaddr change because +* the PF did that already +*/ +} diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h index 0d990ac..caeff9a 100644 --- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h +++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h @@ -731,13 +731,15 @@ struct oct_link_info { #ifdef __BIG_ENDIAN_BITFIELD u64 gmxport:16; - u64 rsvd:32; + u64 macaddr_is_admin_asgnd:1; + u64 rsvd:31; u64 num_txpciq:8; u64 num_rxpciq:8; #else u64 num_rxpciq:8; u64 num_txpciq:8; - u64 rsvd:32; + u64 rsvd:31; + u64 macaddr_is_admin_asgnd:1; u64 gmxport:16; #endif diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h index cfd12ec..77a6eb7 100644 --- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h +++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h @@ -492,6 +492,9 @@ struct octeon_device { int msix_on; + /** Mail Box details of each octeon queue. */ + struct octeon_mbox *mbox[MAX_POSSIBLE_VFS]; + /** IOq information of it's corresponding MSI-X interrupt. 
*/ struct octeon_ioq_vector*ioq_vector; @@ -511,6 +514,7 @@ struct octeon_device { #define OCTEON_CN6XXX(oct) ((oct->chip_id == OCTEON_CN66XX) || \ (oct->chip_id == OCTEON_CN68XX)) #define OCTEON_CN23XX_PF(oct)(oct->chip_id == OCTEON_CN23XX_PF_VID) +#define OCTEON_CN23XX_VF(oct)((oct)->chip_id == OCTEON_CN23XX_VF_VID) #define CHIP_FIELD(oct, TYPE, field) \ (((struct octeon_ ## TYPE *)(oct->chip))->field) diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c new file mode 100644 index 000..3a2f6c1 --- /dev/null +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c @@ -0,0 +1,321 @@ +/** + * Author: Cavium, Inc. + * + * Contact: supp...@cavium.com + * Please includ
[PATCH net-next V3 2/9] liquidio CN23XX: sysfs VF config support
Adds sysfs based support for enabling or disabling VFs. Signed-off-by: Raghu Vatsavayi Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/lio_main.c| 98 ++ .../net/ethernet/cavium/liquidio/octeon_config.h | 3 + .../net/ethernet/cavium/liquidio/octeon_device.h | 8 ++ 3 files changed, 109 insertions(+) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index d25746f..51ed875 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -194,6 +194,8 @@ struct octeon_device_priv { unsigned long napi_mask; }; +static int liquidio_enable_sriov(struct pci_dev *dev, int num_vfs); + static int octeon_device_init(struct octeon_device *); static int liquidio_stop(struct net_device *netdev); static void liquidio_remove(struct pci_dev *pdev); @@ -532,6 +534,7 @@ static int liquidio_resume(struct pci_dev *pdev __attribute__((unused))) .suspend= liquidio_suspend, .resume = liquidio_resume, #endif + .sriov_configure = liquidio_enable_sriov, }; /** @@ -1486,6 +1489,8 @@ static void octeon_destroy_resources(struct octeon_device *oct) continue; octeon_delete_instr_queue(oct, i); } + if (oct->sriov_info.sriov_enabled) + pci_disable_sriov(oct->pci_dev); /* fallthrough */ case OCT_DEV_SC_BUFF_POOL_INIT_DONE: octeon_free_sc_buffer_pool(oct); @@ -4013,6 +4018,99 @@ static int setup_nic_devices(struct octeon_device *octeon_dev) return -ENODEV; } +static int octeon_enable_sriov(struct octeon_device *oct) +{ + unsigned int num_vfs_alloced = oct->sriov_info.num_vfs_alloced; + struct pci_dev *vfdev; + int err; + u32 u; + + if (OCTEON_CN23XX_PF(oct) && num_vfs_alloced) { + err = pci_enable_sriov(oct->pci_dev, + oct->sriov_info.num_vfs_alloced); + if (err) { + dev_err(&oct->pci_dev->dev, + "OCTEON: Failed to enable PCI sriov: %d\n", + err); + oct->sriov_info.num_vfs_alloced = 0; + return err; + } + 
oct->sriov_info.sriov_enabled = 1; + + /* init lookup table that maps DPI ring number to VF pci_dev +* struct pointer +*/ + u = 0; + vfdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, + OCTEON_CN23XX_VF_VID, NULL); + while (vfdev) { + if (vfdev->is_virtfn && + (vfdev->physfn == oct->pci_dev)) { + oct->sriov_info.dpiring_to_vfpcidev_lut[u] = + vfdev; + u += oct->sriov_info.rings_per_vf; + } + vfdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, + OCTEON_CN23XX_VF_VID, vfdev); + } + } + + return num_vfs_alloced; +} + +static int lio_pci_sriov_disable(struct octeon_device *oct) +{ + int u; + + if (pci_vfs_assigned(oct->pci_dev)) { + dev_err(&oct->pci_dev->dev, "VFs are still assigned to VMs.\n"); + return -EPERM; + } + + pci_disable_sriov(oct->pci_dev); + + u = 0; + while (u < MAX_POSSIBLE_VFS) { + oct->sriov_info.dpiring_to_vfpcidev_lut[u] = NULL; + u += oct->sriov_info.rings_per_vf; + } + + oct->sriov_info.num_vfs_alloced = 0; + dev_info(&oct->pci_dev->dev, "oct->pf_num:%d disabled VFs\n", +oct->pf_num); + + return 0; +} + +static int liquidio_enable_sriov(struct pci_dev *dev, int num_vfs) +{ + struct octeon_device *oct = pci_get_drvdata(dev); + int ret = 0; + + if ((num_vfs == oct->sriov_info.num_vfs_alloced) && + (oct->sriov_info.sriov_enabled)) { + dev_info(&oct->pci_dev->dev, "oct->pf_num:%d already enabled num_vfs:%d\n", +oct->pf_num, num_vfs); + return 0; + } + + if (!num_vfs) { + ret = lio_pci_sriov_disable(oct); + } else if (num_vfs > oct->sriov_info.max_vfs) { + dev_err(&oct->pci_dev->dev, + "OCTEON: Max allowed VFs:%d user requested:%d", + oct->sriov_info.max_vfs, num_vfs); + ret = -EPERM; + } else { + oct->sriov_info.num_vfs_alloced = num_vfs; + ret = octeon_enable_sriov(oct); + dev_info(&oct->pci_dev->dev, "oct->pf_n
Re: [PATCH net] packet: on direct_xmit, limit tso and csum to supported devices
On Tue, 2016-10-25 at 20:28 -0400, Willem de Bruijn wrote: > From: Willem de Bruijn > > When transmitting on a packet socket with PACKET_VNET_HDR and > PACKET_QDISC_BYPASS, validate device support for features requested > in vnet_hdr. You probably need to add an EXPORT_SYMBOL(validate_xmit_skb_list) because af_packet might be modular. Sorry for not catching this earlier.
[PATCH net] packet: on direct_xmit, limit tso and csum to supported devices
From: Willem de Bruijn When transmitting on a packet socket with PACKET_VNET_HDR and PACKET_QDISC_BYPASS, validate device support for features requested in vnet_hdr. Drop TSO packets sent to devices that do not support TSO or have the feature disabled. Note that the latter currently do process those packets correctly, regardless of not advertising the feature. Because of SKB_GSO_DODGY, it is not sufficient to test device features with netif_needs_gso. Full validate_xmit_skb is needed. Switch to software checksum for non-TSO packets that request checksum offload if that device feature is unsupported or disabled. Note that, similar to the TSO case, device drivers may perform checksum offload correctly even when not advertising it. When switching to software checksum, packets hit skb_checksum_help, which has two BUG_ONs that trip if the checksum field is not in the linear segment. Packet sockets always allocate at least up to csum_start + csum_off + 2 as linear. Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c ethtool -K eth0 tso off tx on psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N ethtool -K eth0 tx off psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by: Willem de Bruijn --- net/packet/af_packet.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 11db0d6..d2238b2 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -250,7 +250,7 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po); static int packet_direct_xmit(struct sk_buff *skb) { struct net_device *dev = skb->dev; - netdev_features_t features; + struct sk_buff *orig_skb = skb; struct netdev_queue *txq; int ret = NETDEV_TX_BUSY; @@ -258,9 +258,8 @@ static int
packet_direct_xmit(struct sk_buff *skb) !netif_carrier_ok(dev))) goto drop; - features = netif_skb_features(skb); - if (skb_needs_linearize(skb, features) && - __skb_linearize(skb)) + skb = validate_xmit_skb_list(skb, dev); + if (skb != orig_skb) goto drop; txq = skb_get_tx_queue(dev, skb); @@ -280,7 +279,7 @@ static int packet_direct_xmit(struct sk_buff *skb) return ret; drop: atomic_long_inc(&dev->tx_dropped); - kfree_skb(skb); + kfree_skb_list(skb); return NET_XMIT_DROP; } -- 2.8.0.rc3.226.g39d4020
Re: [PATCH] net: Reset skb to network header in neigh_hh_output
On Wed, 2016-10-26 at 01:57 +0200, Abdelrhman Ahmed wrote: > > What is the issue you want to fix exactly ? > > Please describe the use case. > > When netfilter hook uses skb_push to add a specific header between network > header and hardware header. > For the first time(s) before caching hardware header, this header will be > removed / overwritten by hardware header due to resetting to network header. > After using the cached hardware header, this header will be kept as we do not > reset. I think this behavior is inconsistent, so we need to reset in both > cases. > > > Otherwise, your fix is in fact adding a critical bug. > > Could you explain more as it's not clear to me? > Maybe my wording was not good here. What I intended to say is that the __skb_pull(skb, skb_network_offset(skb)) might not be at the right place. Look at commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c to find the reason. > > > On Fri, 07 Oct 2016 23:10:56 +0200 Eric Dumazet > wrote > > On Fri, 2016-10-07 at 16:14 +0200, Abdelrhman Ahmed wrote: > > > When hardware header is added without using cached one, > neigh_resolve_output > > > and neigh_connected_output reset skb to network header before adding it. > > > When cached one is used, neigh_hh_output does not reset the skb to > network > > > header. > > > > > > The fix is to reset skb to network header before adding cached hardware > header > > > to keep the behavior consistent in all cases. > > > > What is the issue you want to fix exactly ? > > > > Please describe the use case. > > > > I highly suggest you take a look at commit > > > > e1f165032c8bade3a6bdf546f8faf61fda4dd01c > > ("net: Fix skb_under_panic oops in neigh_resolve_output") > > > > Otherwise, your fix is in fact adding a critical bug. > > > > > > >
[PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
We recently encountered a bug where a few customers using ibmveth on the same LPAR hit an issue where a TCP session hung when large receive was enabled. Closer analysis revealed that the session was stuck because one side was repeatedly advertising a zero window. We narrowed this down to the fact that the ibmveth driver did not set gso_size, which is translated by TCP into the MSS later up the stack. The MSS is used to calculate the TCP window size, and as that was abnormally large, it was calculating a zero window even though the socket's receive buffer was completely empty. We were able to reproduce this and worked with IBM to fix it. Thanks to Tom and Marcelo for all your help and review on this. The patch fixes both our internal reproduction tests and our customers' tests. Signed-off-by: Jon Maxwell --- drivers/net/ethernet/ibm/ibmveth.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 29c05d0..c51717e 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) int frames_processed = 0; unsigned long lpar_rc; struct iphdr *iph; + bool large_packet = 0; + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr); restart_poll: while (frames_processed < budget) { @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) iph->check = 0; iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl); adapter->rx_large_packets++; + large_packet = 1; } } } + if (skb->len > netdev->mtu) { + iph = (struct iphdr *)skb->data; + if (be16_to_cpu(skb->protocol) == ETH_P_IP && + iph->protocol == IPPROTO_TCP) { + hdr_len += sizeof(struct iphdr); + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len; + } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 && + iph->protocol == IPPROTO_TCP) { + hdr_len += sizeof(struct ipv6hdr); +
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6; + skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len; + } + if (!large_packet) + adapter->rx_large_packets++; + } + napi_gro_receive(napi, skb);/* send it up */ netdev->stats.rx_packets++; -- 1.8.3.1
Re: [PATCH] net: Reset skb to network header in neigh_hh_output
> What is the issue you want to fix exactly ? > Please describe the use case. When netfilter hook uses skb_push to add a specific header between network header and hardware header. For the first time(s) before caching hardware header, this header will be removed / overwritten by hardware header due to resetting to network header. After using the cached hardware header, this header will be kept as we do not reset. I think this behavior is inconsistent, so we need to reset in both cases. > Otherwise, your fix is in fact adding a critical bug. Could you explain more as it's not clear to me? On Fri, 07 Oct 2016 23:10:56 +0200 Eric Dumazet wrote > On Fri, 2016-10-07 at 16:14 +0200, Abdelrhman Ahmed wrote: > > When hardware header is added without using cached one, > > neigh_resolve_output > > and neigh_connected_output reset skb to network header before adding it. > > When cached one is used, neigh_hh_output does not reset the skb to network > > header. > > > > The fix is to reset skb to network header before adding cached hardware > > header > > to keep the behavior consistent in all cases. > > What is the issue you want to fix exactly ? > > Please describe the use case. > > I highly suggest you take a look at commit > > e1f165032c8bade3a6bdf546f8faf61fda4dd01c > ("net: Fix skb_under_panic oops in neigh_resolve_output") > > Otherwise, your fix is in fact adding a critical bug. > > >
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On Tue, 2016-10-25 at 15:30 -0700, David Ahern wrote: > Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to > BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run > any time a process in the cgroup opens an AF_INET or AF_INET6 socket. > Currently only sk_bound_dev_if is exported to userspace for modification > by a bpf program. > > This allows a cgroup to be configured such that AF_INET{6} sockets opened > by processes are automatically bound to a specific device. In turn, this > enables the running of programs that do not support SO_BINDTODEVICE in a > specific VRF context / L3 domain. Does this mean that these programs no longer can use loopback ?
Re: [PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
On 10/26/2016 12:30 AM, David Ahern wrote: Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program. This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain. Signed-off-by: David Ahern [...] @@ -524,6 +535,10 @@ struct bpf_tunnel_key { __u32 tunnel_label; }; +struct bpf_sock { + __u32 bound_dev_if; +}; + /* User return codes for XDP prog type. * A valid XDP program must return one of these defined values. All other * return codes are reserved for future use. Unknown return codes will result [...] diff --git a/net/core/filter.c b/net/core/filter.c index 4552b8c93b99..775802881b01 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2482,6 +2482,27 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = { .arg5_type = ARG_CONST_STACK_SIZE, }; +BPF_CALL_3(bpf_sock_store_u32, struct sock *, sk, u32, offset, u32, val) +{ + u8 *ptr = (u8 *)sk; + + if (unlikely(offset > sizeof(*sk))) + return -EFAULT; + + *((u32 *)ptr) = val; + + return 0; +} Seems strange to me. So, this helper allows to overwrite arbitrary memory of a struct sock instance. Potentially we could crash the kernel. And in your sock_filter_convert_ctx_access(), you already implement inline read/write for the context ... Your demo code does in pseudocode: r1 = sk r2 = offsetof(struct bpf_sock, bound_dev_if) r3 = idx r1->sk_bound_dev_if = idx sock_store_u32(r1, r2, r3) // updates sk_bound_dev_if again to idx return 1 Dropping that helper from the patch, the only thing a program can do here is to read/write the sk_bound_dev_if helper per cgroup. Hmm ... dunno. 
So this really has to be for cgroups v2, right?
Re: [PATCH net-next 1/3] bpf: Refactor cgroups code in prep for new type
On 10/25/16 5:01 PM, Daniel Borkmann wrote: >> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c >> index a0ab43f264b0..918c01a6f129 100644 >> --- a/kernel/bpf/cgroup.c >> +++ b/kernel/bpf/cgroup.c >> @@ -117,6 +117,19 @@ void __cgroup_bpf_update(struct cgroup *cgrp, >> } >> } >> >> +static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb, >> + struct bpf_prog *prog) >> +{ >> +unsigned int offset = skb->data - skb_network_header(skb); >> +int ret; >> + >> +__skb_push(skb, offset); >> +ret = bpf_prog_run_clear_cb(prog, skb) == 1 ? 0 : -EPERM; > > Original code save skb->cb[], this one clears it. > ah, it changed in Daniel's v6 to v7 code and I missed it. Will fix. Thanks for pointing it out.
[PATCH] uapi: Fix userspace compilation of ip_tables.h/ip6_tables.h in C++ mode
The implicit cast from void * is not allowed for C++ compilers, and the arithmetic on void * generates warnings if a C++ application tries to include these UAPI headers. $ g++ -c t.cc ip_tables.h:221:24: warning: pointer of type 'void *' used in arithmetic ip_tables.h:221:24: error: invalid conversion from 'void*' to 'xt_entry_target*' Signed-off-by: Jason Gunthorpe --- include/uapi/linux/netfilter_ipv4/ip_tables.h | 2 +- include/uapi/linux/netfilter_ipv6/ip6_tables.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/netfilter_ipv4/ip_tables.h b/include/uapi/linux/netfilter_ipv4/ip_tables.h index d0da53d96d93..4682b18f3f44 100644 --- a/include/uapi/linux/netfilter_ipv4/ip_tables.h +++ b/include/uapi/linux/netfilter_ipv4/ip_tables.h @@ -221,7 +221,7 @@ struct ipt_get_entries { static __inline__ struct xt_entry_target * ipt_get_target(struct ipt_entry *e) { - return (void *)e + e->target_offset; + return (struct xt_entry_target *)((__u8 *)e + e->target_offset); } /* diff --git a/include/uapi/linux/netfilter_ipv6/ip6_tables.h b/include/uapi/linux/netfilter_ipv6/ip6_tables.h index d1b22653daf2..05e0631a6d12 100644 --- a/include/uapi/linux/netfilter_ipv6/ip6_tables.h +++ b/include/uapi/linux/netfilter_ipv6/ip6_tables.h @@ -261,7 +261,7 @@ struct ip6t_get_entries { static __inline__ struct xt_entry_target * ip6t_get_target(struct ip6t_entry *e) { - return (void *)e + e->target_offset; + return (struct xt_entry_target *)((__u8 *)e + e->target_offset); } /* -- 2.1.4
Re: [PATCH net-next 1/3] bpf: Refactor cgroups code in prep for new type
On 10/26/2016 12:30 AM, David Ahern wrote: Code move only; no functional change intended. Not quite, see below. Signed-off-by: David Ahern --- kernel/bpf/cgroup.c | 27 ++- kernel/bpf/syscall.c | 28 +++- 2 files changed, 37 insertions(+), 18 deletions(-) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index a0ab43f264b0..918c01a6f129 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -117,6 +117,19 @@ void __cgroup_bpf_update(struct cgroup *cgrp, } } +static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb, + struct bpf_prog *prog) +{ + unsigned int offset = skb->data - skb_network_header(skb); + int ret; + + __skb_push(skb, offset); + ret = bpf_prog_run_clear_cb(prog, skb) == 1 ? 0 : -EPERM; Original code save skb->cb[], this one clears it. + __skb_pull(skb, offset); + + return ret; +} + /** * __cgroup_bpf_run_filter() - Run a program for packet filtering * @sk: The socken sending or receiving traffic @@ -153,11 +166,15 @@ int __cgroup_bpf_run_filter(struct sock *sk, prog = rcu_dereference(cgrp->bpf.effective[type]); if (prog) { - unsigned int offset = skb->data - skb_network_header(skb); - - __skb_push(skb, offset); - ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 0 : -EPERM; - __skb_pull(skb, offset); + switch (type) { + case BPF_CGROUP_INET_INGRESS: + case BPF_CGROUP_INET_EGRESS: + ret = __cgroup_bpf_run_filter_skb(skb, prog); + break; + /* make gcc happy else complains about missing enum value */ + default: + return 0; + } }
Re: [PATCH net] inet: Fix missing return value in inet6_hash
On Tue, Oct 25, 2016 at 6:08 PM, Craig Gallek wrote: > From: Craig Gallek > > As part of a series to implement faster SO_REUSEPORT lookups, > commit 086c653f5862 ("sock: struct proto hash function may error") > added return values to protocol hash functions and > commit 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function") > implemented a new hash function for IPv6. However, the latter does > not respect the former's convention. > > This properly propagates the hash errors in the IPv6 case. > > Fixes: 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function") > Reported-by: Soheil Hassas Yeganeh > Signed-off-by: Craig Gallek Acked-by: Soheil Hassas Yeganeh > --- > net/ipv6/inet6_hashtables.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c > index 2fd0374a35b1..02761c9fe43e 100644 > --- a/net/ipv6/inet6_hashtables.c > +++ b/net/ipv6/inet6_hashtables.c > @@ -264,13 +264,15 @@ EXPORT_SYMBOL_GPL(inet6_hash_connect); > > int inet6_hash(struct sock *sk) > { > + int err = 0; > + > if (sk->sk_state != TCP_CLOSE) { > local_bh_disable(); > - __inet_hash(sk, NULL, ipv6_rcv_saddr_equal); > + err = __inet_hash(sk, NULL, ipv6_rcv_saddr_equal); > local_bh_enable(); > } > > - return 0; > + return err; > } > EXPORT_SYMBOL_GPL(inet6_hash); Thanks for the fix! > -- > 2.8.0.rc3.226.g39d4020 >
[PATCH net] bpf: fix samples to add fake KBUILD_MODNAME
Some of the sample files are causing issues when they are loaded with tc and cls_bpf, meaning tc bails out while trying to parse the resulting ELF file as program/map/etc sections are not present, which can be easily spotted with readelf(1). Currently, BPF samples are including some of the kernel headers and mid term we should change them to refrain from this, really. When dynamic debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which is easily overlooked in the build as clang spills this along with other noisy warnings from various header includes, and llc still generates an ELF file with mentioned characteristics. For just playing around with BPF examples, this can be a bit of a hurdle to take. Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is done in xdp*_kern samples already. Fixes: 65d472fb007d ("samples/bpf: add 'pointer to packet' tests") Fixes: 6afb1e28b859 ("samples/bpf: Add tunnel set/get tests.") Fixes: a3f74617340b ("cgroup: bpf: Add an example to do cgroup checking in BPF") Reported-by: Chandrasekar Kannan Signed-off-by: Daniel Borkmann --- samples/bpf/parse_ldabs.c| 1 + samples/bpf/parse_simple.c | 1 + samples/bpf/parse_varlen.c | 1 + samples/bpf/tcbpf1_kern.c| 1 + samples/bpf/tcbpf2_kern.c| 1 + samples/bpf/test_cgrp2_tc_kern.c | 1 + 6 files changed, 6 insertions(+) diff --git a/samples/bpf/parse_ldabs.c b/samples/bpf/parse_ldabs.c index d175501..6db6b21 100644 --- a/samples/bpf/parse_ldabs.c +++ b/samples/bpf/parse_ldabs.c @@ -4,6 +4,7 @@ * modify it under the terms of version 2 of the GNU General Public * License as published by the Free Software Foundation. 
*/ +#define KBUILD_MODNAME "foo" #include #include #include diff --git a/samples/bpf/parse_simple.c b/samples/bpf/parse_simple.c index cf2511c..10af53d 100644 --- a/samples/bpf/parse_simple.c +++ b/samples/bpf/parse_simple.c @@ -4,6 +4,7 @@ * modify it under the terms of version 2 of the GNU General Public * License as published by the Free Software Foundation. */ +#define KBUILD_MODNAME "foo" #include #include #include diff --git a/samples/bpf/parse_varlen.c b/samples/bpf/parse_varlen.c index edab34d..95c1632 100644 --- a/samples/bpf/parse_varlen.c +++ b/samples/bpf/parse_varlen.c @@ -4,6 +4,7 @@ * modify it under the terms of version 2 of the GNU General Public * License as published by the Free Software Foundation. */ +#define KBUILD_MODNAME "foo" #include #include #include diff --git a/samples/bpf/tcbpf1_kern.c b/samples/bpf/tcbpf1_kern.c index fa051b3..274c884 100644 --- a/samples/bpf/tcbpf1_kern.c +++ b/samples/bpf/tcbpf1_kern.c @@ -1,3 +1,4 @@ +#define KBUILD_MODNAME "foo" #include #include #include diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c index 3303bb8..9c823a6 100644 --- a/samples/bpf/tcbpf2_kern.c +++ b/samples/bpf/tcbpf2_kern.c @@ -5,6 +5,7 @@ * modify it under the terms of version 2 of the GNU General Public * License as published by the Free Software Foundation. */ +#define KBUILD_MODNAME "foo" #include #include #include diff --git a/samples/bpf/test_cgrp2_tc_kern.c b/samples/bpf/test_cgrp2_tc_kern.c index 10ff734..1547b36 100644 --- a/samples/bpf/test_cgrp2_tc_kern.c +++ b/samples/bpf/test_cgrp2_tc_kern.c @@ -4,6 +4,7 @@ * modify it under the terms of version 2 of the GNU General Public * License as published by the Free Software Foundation. */ +#define KBUILD_MODNAME "foo" #include #include #include -- 1.9.3
[PATCH net-next 2/3] bpf: Add new cgroups prog type to enable sock modifications
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program. This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain. Signed-off-by: David Ahern --- include/linux/filter.h | 2 +- include/uapi/linux/bpf.h | 15 kernel/bpf/cgroup.c | 9 + kernel/bpf/syscall.c | 4 +++ net/core/filter.c| 92 net/core/sock.c | 7 6 files changed, 128 insertions(+), 1 deletion(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 1f09c521adfe..808e158742a2 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -408,7 +408,7 @@ struct bpf_prog { enum bpf_prog_type type; /* Type of BPF program */ struct bpf_prog_aux *aux; /* Auxiliary fields */ struct sock_fprog_kern *orig_prog; /* Original BPF program */ - unsigned int(*bpf_func)(const struct sk_buff *skb, + unsigned int(*bpf_func)(const void *ctx, const struct bpf_insn *filter); /* Instructions for interpreter */ union { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 6b62ee9a2f78..ce5283f221e7 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -99,11 +99,13 @@ enum bpf_prog_type { BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_CGROUP_SKB, + BPF_PROG_TYPE_CGROUP_SOCK, }; enum bpf_attach_type { BPF_CGROUP_INET_INGRESS, BPF_CGROUP_INET_EGRESS, + BPF_CGROUP_INET_SOCK_CREATE, __MAX_BPF_ATTACH_TYPE }; @@ -449,6 +451,15 @@ enum bpf_func_id { */ BPF_FUNC_get_numa_node_id, + /** +* sock_store_u32(sk, offset, val) - store bytes into sock +* @sk: pointer to sock +* @offset: offset within sock +* @val: value to write +* Return: 0 on 
success +*/ + BPF_FUNC_sock_store_u32, + __BPF_FUNC_MAX_ID, }; @@ -524,6 +535,10 @@ struct bpf_tunnel_key { __u32 tunnel_label; }; +struct bpf_sock { + __u32 bound_dev_if; +}; + /* User return codes for XDP prog type. * A valid XDP program must return one of these defined values. All other * return codes are reserved for future use. Unknown return codes will result diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 918c01a6f129..4fcb58013a3a 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -117,6 +117,12 @@ void __cgroup_bpf_update(struct cgroup *cgrp, } } +static int __cgroup_bpf_run_filter_sk_create(struct sock *sk, +struct bpf_prog *prog) +{ + return prog->bpf_func(sk, prog->insnsi) == 1 ? 0 : -EPERM; +} + static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb, struct bpf_prog *prog) { @@ -171,6 +177,9 @@ int __cgroup_bpf_run_filter(struct sock *sk, case BPF_CGROUP_INET_EGRESS: ret = __cgroup_bpf_run_filter_skb(skb, prog); break; + case BPF_CGROUP_INET_SOCK_CREATE: + ret = __cgroup_bpf_run_filter_sk_create(sk, prog); + break; /* make gcc happy else complains about missing enum value */ default: return 0; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 9abc88deabbc..3b7e30e28cd3 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -844,6 +844,9 @@ static int bpf_prog_attach(const union bpf_attr *attr) ptype = BPF_PROG_TYPE_CGROUP_SKB; break; + case BPF_CGROUP_INET_SOCK_CREATE: + ptype = BPF_PROG_TYPE_CGROUP_SOCK; + break; default: return -EINVAL; } @@ -879,6 +882,7 @@ static int bpf_prog_detach(const union bpf_attr *attr) switch (attr->attach_type) { case BPF_CGROUP_INET_INGRESS: case BPF_CGROUP_INET_EGRESS: + case BPF_CGROUP_INET_SOCK_CREATE: cgrp = cgroup_get_from_fd(attr->target_fd); if (IS_ERR(cgrp)) return PTR_ERR(cgrp); diff --git a/net/core/filter.c b/net/core/filter.c index 4552b8c93b99..775802881b01 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2482,6 +2482,27 @@ static const struct 
bpf_func_proto bpf_xdp_event_output_proto = { .arg5_type = ARG_CONST_STACK_SIZE, }; +
[PATCH net-next 1/3] bpf: Refactor cgroups code in prep for new type
Code move only; no functional change intended. Signed-off-by: David Ahern --- kernel/bpf/cgroup.c | 27 ++- kernel/bpf/syscall.c | 28 +++- 2 files changed, 37 insertions(+), 18 deletions(-) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index a0ab43f264b0..918c01a6f129 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -117,6 +117,19 @@ void __cgroup_bpf_update(struct cgroup *cgrp, } } +static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb, + struct bpf_prog *prog) +{ + unsigned int offset = skb->data - skb_network_header(skb); + int ret; + + __skb_push(skb, offset); + ret = bpf_prog_run_clear_cb(prog, skb) == 1 ? 0 : -EPERM; + __skb_pull(skb, offset); + + return ret; +} + /** * __cgroup_bpf_run_filter() - Run a program for packet filtering * @sk: The socken sending or receiving traffic @@ -153,11 +166,15 @@ int __cgroup_bpf_run_filter(struct sock *sk, prog = rcu_dereference(cgrp->bpf.effective[type]); if (prog) { - unsigned int offset = skb->data - skb_network_header(skb); - - __skb_push(skb, offset); - ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 
0 : -EPERM; - __skb_pull(skb, offset); + switch (type) { + case BPF_CGROUP_INET_INGRESS: + case BPF_CGROUP_INET_EGRESS: + ret = __cgroup_bpf_run_filter_skb(skb, prog); + break; + /* make gcc happy else complains about missing enum value */ + default: + return 0; + } } rcu_read_unlock(); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 1814c010ace6..9abc88deabbc 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -828,6 +828,7 @@ static int bpf_obj_get(const union bpf_attr *attr) static int bpf_prog_attach(const union bpf_attr *attr) { + enum bpf_prog_type ptype = BPF_PROG_TYPE_UNSPEC; struct bpf_prog *prog; struct cgroup *cgrp; @@ -840,25 +841,26 @@ static int bpf_prog_attach(const union bpf_attr *attr) switch (attr->attach_type) { case BPF_CGROUP_INET_INGRESS: case BPF_CGROUP_INET_EGRESS: - prog = bpf_prog_get_type(attr->attach_bpf_fd, -BPF_PROG_TYPE_CGROUP_SKB); - if (IS_ERR(prog)) - return PTR_ERR(prog); - - cgrp = cgroup_get_from_fd(attr->target_fd); - if (IS_ERR(cgrp)) { - bpf_prog_put(prog); - return PTR_ERR(cgrp); - } - - cgroup_bpf_update(cgrp, prog, attr->attach_type); - cgroup_put(cgrp); + ptype = BPF_PROG_TYPE_CGROUP_SKB; break; default: return -EINVAL; } + prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); + if (IS_ERR(prog)) + return PTR_ERR(prog); + + cgrp = cgroup_get_from_fd(attr->target_fd); + if (IS_ERR(cgrp)) { + bpf_prog_put(prog); + return PTR_ERR(cgrp); + } + + cgroup_bpf_update(cgrp, prog, attr->attach_type); + cgroup_put(cgrp); + return 0; } -- 2.1.4
[PATCH net-next 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
Add a simple program to demonstrate the ability to attach a bpf program to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they are created. Signed-off-by: David Ahern --- samples/bpf/Makefile | 2 ++ samples/bpf/bpf_helpers.h | 2 ++ samples/bpf/test_cgrp2_sock.c | 84 +++ 3 files changed, 88 insertions(+) create mode 100644 samples/bpf/test_cgrp2_sock.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 2624d5d7ce8b..ec4ef37a2dbc 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -22,6 +22,7 @@ hostprogs-y += map_perf_test hostprogs-y += test_overhead hostprogs-y += test_cgrp2_array_pin hostprogs-y += test_cgrp2_attach +hostprogs-y += test_cgrp2_sock hostprogs-y += xdp1 hostprogs-y += xdp2 hostprogs-y += test_current_task_under_cgroup @@ -48,6 +49,7 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o +test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o xdp1-objs := bpf_load.o libbpf.o xdp1_user.o # reuse xdp1 source intentionally xdp2-objs := bpf_load.o libbpf.o xdp1_user.o diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h index 90f44bd2045e..7d95c9af3681 100644 --- a/samples/bpf/bpf_helpers.h +++ b/samples/bpf/bpf_helpers.h @@ -88,6 +88,8 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag (void *) BPF_FUNC_l4_csum_replace; static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) = (void *) BPF_FUNC_skb_under_cgroup; +static int (*bpf_sock_store_u32)(void *ctx, __u32 off, __u32 val) = + (void *) BPF_FUNC_sock_store_u32; #if defined(__x86_64__) diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c new file mode 100644 index ..1fab10a08846 --- /dev/null +++ b/samples/bpf/test_cgrp2_sock.c @@ -0,0 +1,84 @@ +/* eBPF example program: + * + * - Loads eBPF program 
+ * + * The eBPF program sets the sk_bound_dev_if index in new AF_INET{6} + * sockets opened by processes in the cgroup. + * + * - Attaches the new program to a cgroup using BPF_PROG_ATTACH + */ + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "libbpf.h" + +static int prog_load(int idx) +{ + struct bpf_insn prog[] = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + BPF_MOV64_IMM(BPF_REG_3, idx), + BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)), + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_sock_store_u32), + BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */ + BPF_EXIT_INSN(), + }; + + return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, +prog, sizeof(prog), "GPL", 0); +} + +static int usage(const char *argv0) +{ + printf("Usage: %s cg-path device-index\n", argv0); + return EXIT_FAILURE; +} + +int main(int argc, char **argv) +{ + int cg_fd, prog_fd, ret; + int idx = 0; + + if (argc < 3) + return usage(argv[0]); + + idx = atoi(argv[2]); + if (!idx) { + printf("Invalid device index\n"); + return EXIT_FAILURE; + } + + cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY); + if (cg_fd < 0) { + printf("Failed to open cgroup path: '%s'\n", strerror(errno)); + return EXIT_FAILURE; + } + + prog_fd = prog_load(idx); + printf("Output from kernel verifier:\n%s\n---\n", bpf_log_buf); + + if (prog_fd < 0) { + printf("Failed to load prog: '%s'\n", strerror(errno)); + return EXIT_FAILURE; + } + + ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK_CREATE); + ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE); + if (ret < 0) { + printf("Failed to attach prog to cgroup: '%s'\n", + strerror(errno)); + return EXIT_FAILURE; + } + + return EXIT_SUCCESS; +} -- 2.1.4
[PATCH net-next 0/3] Add bpf support to set sk_bound_dev_if
The recently added VRF support in Linux leverages the bind-to-device API for programs to specify an L3 domain for a socket. While SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable program has support for it. Even for those programs that do support it, the API requires processes to be started as root (CAP_NET_RAW) which is not desirable from a general security perspective. This patch set leverages Daniel Mack's work to attach bpf programs to a cgroup: https://www.mail-archive.com/netdev@vger.kernel.org/msg134028.html to provide a capability to set sk_bound_dev_if for all AF_INET{6} sockets opened by a process in a cgroup when the sockets are allocated. This capability enables running any program in a VRF context and is key to deploying Management VRF, a fundamental configuration for networking gear, with any Linux OS installation. David Ahern (3): bpf: Refactor cgroups code in prep for new type bpf: Add new cgroups prog type to enable sock modifications samples: bpf: add userspace example for modifying sk_bound_dev_if include/linux/filter.h| 2 +- include/uapi/linux/bpf.h | 15 +++ kernel/bpf/cgroup.c | 36 ++--- kernel/bpf/syscall.c | 32 +-- net/core/filter.c | 92 +++ net/core/sock.c | 7 samples/bpf/Makefile | 2 + samples/bpf/bpf_helpers.h | 2 + samples/bpf/test_cgrp2_sock.c | 84 +++ 9 files changed, 253 insertions(+), 19 deletions(-) create mode 100644 samples/bpf/test_cgrp2_sock.c -- 2.1.4
[PATCH net] inet: Fix missing return value in inet6_hash
From: Craig Gallek As part of a series to implement faster SO_REUSEPORT lookups, commit 086c653f5862 ("sock: struct proto hash function may error") added return values to protocol hash functions and commit 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function") implemented a new hash function for IPv6. However, the latter does not respect the former's convention. This properly propagates the hash errors in the IPv6 case. Fixes: 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function") Reported-by: Soheil Hassas Yeganeh Signed-off-by: Craig Gallek --- net/ipv6/inet6_hashtables.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 2fd0374a35b1..02761c9fe43e 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -264,13 +264,15 @@ EXPORT_SYMBOL_GPL(inet6_hash_connect); int inet6_hash(struct sock *sk) { + int err = 0; + if (sk->sk_state != TCP_CLOSE) { local_bh_disable(); - __inet_hash(sk, NULL, ipv6_rcv_saddr_equal); + err = __inet_hash(sk, NULL, ipv6_rcv_saddr_equal); local_bh_enable(); } - return 0; + return err; } EXPORT_SYMBOL_GPL(inet6_hash); -- 2.8.0.rc3.226.g39d4020
Re: [net-next PATCH 04/27] arch/arc: Add option to skip sync on DMA mapping
On 10/25/2016 02:38 PM, Alexander Duyck wrote: > This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to > avoid invoking cache line invalidation if the driver will just handle it > later via a sync_for_cpu or sync_for_device call. > > Cc: Vineet Gupta > Cc: linux-snps-...@lists.infradead.org > Signed-off-by: Alexander Duyck > --- > arch/arc/mm/dma.c |5 - Acked-by: Vineet Gupta > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c > index 20afc65..6303c34 100644 > --- a/arch/arc/mm/dma.c > +++ b/arch/arc/mm/dma.c > @@ -133,7 +133,10 @@ static dma_addr_t arc_dma_map_page(struct device *dev, > struct page *page, > unsigned long attrs) > { > phys_addr_t paddr = page_to_phys(page) + offset; > - _dma_cache_sync(paddr, size, dir); > + > + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) > + _dma_cache_sync(paddr, size, dir); > + > return plat_phys_to_dma(dev, paddr); > } > > >
[net-next PATCH 05/27] arch/arm: Add option to skip sync on DMA map and unmap
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Cc: Russell King Signed-off-by: Alexander Duyck --- arch/arm/common/dmabounce.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c index 3012816..75055df 100644 --- a/arch/arm/common/dmabounce.c +++ b/arch/arm/common/dmabounce.c @@ -243,7 +243,8 @@ static int needs_bounce(struct device *dev, dma_addr_t dma_addr, size_t size) } static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size, - enum dma_data_direction dir) + enum dma_data_direction dir, + unsigned long attrs) { struct dmabounce_device_info *device_info = dev->archdata.dmabounce; struct safe_buffer *buf; @@ -262,7 +263,8 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size, __func__, buf->ptr, virt_to_dma(dev, buf->ptr), buf->safe, buf->safe_dma_addr); - if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) { + if ((dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { dev_dbg(dev, "%s: copy unsafe %p to safe %p, size %d\n", __func__, ptr, buf->safe, size); memcpy(buf->safe, ptr, size); @@ -272,7 +274,8 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size, } static inline void unmap_single(struct device *dev, struct safe_buffer *buf, - size_t size, enum dma_data_direction dir) + size_t size, enum dma_data_direction dir, + unsigned long attrs) { BUG_ON(buf->size != size); BUG_ON(buf->direction != dir); @@ -283,7 +286,8 @@ static inline void unmap_single(struct device *dev, struct safe_buffer *buf, DO_STATS(dev->archdata.dmabounce->bounce_count++); - if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) { + if ((dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { void *ptr = buf->ptr; dev_dbg(dev, 
"%s: copy back safe %p to unsafe %p size %d\n", @@ -334,7 +338,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page, return DMA_ERROR_CODE; } - return map_single(dev, page_address(page) + offset, size, dir); + return map_single(dev, page_address(page) + offset, size, dir, attrs); } /* @@ -357,7 +361,7 @@ static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t return; } - unmap_single(dev, buf, size, dir); + unmap_single(dev, buf, size, dir, attrs); } static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
[net-next PATCH 04/27] arch/arc: Add option to skip sync on DMA mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Cc: Vineet Gupta Cc: linux-snps-...@lists.infradead.org Signed-off-by: Alexander Duyck --- arch/arc/mm/dma.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c index 20afc65..6303c34 100644 --- a/arch/arc/mm/dma.c +++ b/arch/arc/mm/dma.c @@ -133,7 +133,10 @@ static dma_addr_t arc_dma_map_page(struct device *dev, struct page *page, unsigned long attrs) { phys_addr_t paddr = page_to_phys(page) + offset; - _dma_cache_sync(paddr, size, dir); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + _dma_cache_sync(paddr, size, dir); + return plat_phys_to_dma(dev, paddr); }
[net-next PATCH 07/27] arch/blackfin: Add option to skip sync on DMA map
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/blackfin folder. This change is meant to correct that so that we get consistent behavior. Cc: Steven Miao Signed-off-by: Alexander Duyck --- arch/blackfin/kernel/dma-mapping.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/blackfin/kernel/dma-mapping.c b/arch/blackfin/kernel/dma-mapping.c index 53fbbb6..a27a74a 100644 --- a/arch/blackfin/kernel/dma-mapping.c +++ b/arch/blackfin/kernel/dma-mapping.c @@ -118,6 +118,10 @@ static int bfin_dma_map_sg(struct device *dev, struct scatterlist *sg_list, for_each_sg(sg_list, sg, nents, i) { sg->dma_address = (dma_addr_t) sg_virt(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_sync(sg_dma_address(sg), sg_dma_len(sg), direction); } @@ -143,7 +147,9 @@ static dma_addr_t bfin_dma_map_page(struct device *dev, struct page *page, { dma_addr_t handle = (dma_addr_t)(page_address(page) + offset); - _dma_sync(handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + _dma_sync(handle, size, dir); + return handle; }
[net-next PATCH 22/27] arch/xtensa: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Max Filippov Signed-off-by: Alexander Duyck --- arch/xtensa/kernel/pci-dma.c |7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index 1e68806..6a16dec 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -189,7 +189,9 @@ static dma_addr_t xtensa_map_page(struct device *dev, struct page *page, { dma_addr_t dma_handle = page_to_phys(page) + offset; - xtensa_sync_single_for_device(dev, dma_handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + xtensa_sync_single_for_device(dev, dma_handle, size, dir); + return dma_handle; } @@ -197,7 +199,8 @@ static void xtensa_unmap_page(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction dir, unsigned long attrs) { - xtensa_sync_single_for_cpu(dev, dma_handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + xtensa_sync_single_for_cpu(dev, dma_handle, size, dir); } static int xtensa_map_sg(struct device *dev, struct scatterlist *sg,
[net-next PATCH 01/27] swiotlb: Drop unused function swiotlb_map_sg
There are no users for swiotlb_map_sg so we might as well just drop it. Acked-by: Konrad Rzeszutek Wilk Signed-off-by: Alexander Duyck --- include/linux/swiotlb.h |4 lib/swiotlb.c |8 2 files changed, 12 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 5f81f8a..e237b6f 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -72,10 +72,6 @@ extern void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, size_t size, enum dma_data_direction dir, unsigned long attrs); -extern int -swiotlb_map_sg(struct device *hwdev, struct scatterlist *sg, int nents, - enum dma_data_direction dir); - extern void swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents, enum dma_data_direction dir); diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0..47aad37 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -910,14 +910,6 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, } EXPORT_SYMBOL(swiotlb_map_sg_attrs); -int -swiotlb_map_sg(struct device *hwdev, struct scatterlist *sgl, int nelems, - enum dma_data_direction dir) -{ - return swiotlb_map_sg_attrs(hwdev, sgl, nelems, dir, 0); -} -EXPORT_SYMBOL(swiotlb_map_sg); - /* * Unmap a set of streaming mode DMA translations. Again, cpu read rules * concerning calls here are the same as for swiotlb_unmap_page() above.
[net-next PATCH 03/27] swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC
As a first step to making DMA_ATTR_SKIP_CPU_SYNC apply to architectures beyond just ARM I need to make it so that the swiotlb will respect the flag. In order to do that I also need to update the swiotlb-xen since it heavily makes use of the functionality. Cc: Konrad Rzeszutek Wilk Signed-off-by: Alexander Duyck --- drivers/xen/swiotlb-xen.c | 11 +++--- include/linux/swiotlb.h |6 -- lib/swiotlb.c | 48 +++-- 3 files changed, 40 insertions(+), 25 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index b8014bf..3d048af 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -405,7 +405,8 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, */ trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force); - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir); + map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir, +attrs); if (map == SWIOTLB_MAP_ERROR) return DMA_ERROR_CODE; @@ -419,7 +420,8 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, if (dma_capable(dev, dev_addr, size)) return dev_addr; - swiotlb_tbl_unmap_single(dev, map, size, dir); + swiotlb_tbl_unmap_single(dev, map, size, dir, +attrs | DMA_ATTR_SKIP_CPU_SYNC); return DMA_ERROR_CODE; } @@ -445,7 +447,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr, /* NOTE: We use dev_addr here, not paddr! */ if (is_xen_swiotlb_buffer(dev_addr)) { - swiotlb_tbl_unmap_single(hwdev, paddr, size, dir); + swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs); return; } @@ -558,11 +560,12 @@ void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, start_dma_addr, sg_phys(sg), sg->length, -dir); +dir, attrs); if (map == SWIOTLB_MAP_ERROR) { dev_warn(hwdev, "swiotlb buffer is full\n"); /* Don't panic here, we expect map_sg users to do proper error handling. 
*/ + attrs |= DMA_ATTR_SKIP_CPU_SYNC; xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, attrs); sg_dma_len(sgl) = 0; diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index e237b6f..4517be9 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -44,11 +44,13 @@ enum dma_sync_target { extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, dma_addr_t tbl_dma_addr, phys_addr_t phys, size_t size, - enum dma_data_direction dir); + enum dma_data_direction dir, + unsigned long attrs); extern void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, -size_t size, enum dma_data_direction dir); +size_t size, enum dma_data_direction dir, +unsigned long attrs); extern void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr, diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 47aad37..b538d39 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -425,7 +425,8 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, dma_addr_t tbl_dma_addr, phys_addr_t orig_addr, size_t size, - enum dma_data_direction dir) + enum dma_data_direction dir, + unsigned long attrs) { unsigned long flags; phys_addr_t tlb_addr; @@ -526,7 +527,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, */ for (i = 0; i < nslots; i++) io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT); - if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && + (dir == DMA_TO_DEVICE || dir == DMA_BIDIRE
[net-next PATCH 23/27] dma: Add calls for dma_map_page_attrs and dma_unmap_page_attrs
Add support for mapping and unmapping a page with attributes. The primary use for this is currently to allow for us to pass the DMA_ATTR_SKIP_CPU_SYNC attribute when mapping and unmapping a page. On some architectures such as ARM the synchronization has significant overhead and if we are already taking care of the sync_for_cpu and sync_for_device from the driver there isn't much need to handle this in the map/unmap calls as well. Signed-off-by: Alexander Duyck --- include/linux/dma-mapping.h | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 08528af..10c5a17 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -243,29 +243,33 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg ops->unmap_sg(dev, sg, nents, dir, attrs); } -static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, - size_t offset, size_t size, - enum dma_data_direction dir) +static inline dma_addr_t dma_map_page_attrs(struct device *dev, + struct page *page, + size_t offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) { struct dma_map_ops *ops = get_dma_ops(dev); dma_addr_t addr; kmemcheck_mark_initialized(page_address(page) + offset, size); BUG_ON(!valid_dma_direction(dir)); - addr = ops->map_page(dev, page, offset, size, dir, 0); + addr = ops->map_page(dev, page, offset, size, dir, attrs); debug_dma_map_page(dev, page, offset, size, dir, addr, false); return addr; } -static inline void dma_unmap_page(struct device *dev, dma_addr_t addr, - size_t size, enum dma_data_direction dir) +static inline void dma_unmap_page_attrs(struct device *dev, + dma_addr_t addr, size_t size, + enum dma_data_direction dir, + unsigned long attrs) { struct dma_map_ops *ops = get_dma_ops(dev); BUG_ON(!valid_dma_direction(dir)); if (ops->unmap_page) - ops->unmap_page(dev, addr, size, dir, 0); + ops->unmap_page(dev, addr, size, dir, attrs); 
debug_dma_unmap_page(dev, addr, size, dir, false); } @@ -385,6 +389,8 @@ static inline void dma_sync_single_range_for_device(struct device *dev, #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0) #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0) #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0) +#define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0) +#define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0) extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size);
[net-next PATCH 24/27] mm: Add support for releasing multiple instances of a page
This patch adds a function that allows us to batch free a page that has multiple references outstanding. Specifically this function can be used to drop a page being used in the page frag alloc cache. With this drivers can make use of functionality similar to the page frag alloc cache without having to do any workarounds for the fact that there is no function that frees multiple references. Cc: linux...@kvack.org Signed-off-by: Alexander Duyck --- include/linux/gfp.h |2 ++ mm/page_alloc.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f8041f9de..4175dca 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -506,6 +506,8 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, extern void free_hot_cold_page_list(struct list_head *list, bool cold); struct page_frag_cache; +extern void __page_frag_drain(struct page *page, unsigned int order, + unsigned int count); extern void *__alloc_page_frag(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask); extern void __free_page_frag(void *addr); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ca423cc..253046a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3883,6 +3883,20 @@ static struct page *__page_frag_refill(struct page_frag_cache *nc, return page; } +void __page_frag_drain(struct page *page, unsigned int order, + unsigned int count) +{ + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); + + if (page_ref_sub_and_test(page, count)) { + if (order == 0) + free_hot_cold_page(page, false); + else + __free_pages_ok(page, order); + } +} +EXPORT_SYMBOL(__page_frag_drain); + void *__alloc_page_frag(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask) {

[RFC PATCH ethtool 1/2] ethtool-copy.h: sync with net
From: Vidya Sagar Ravipati Sending out this review as RFC to get early feedback on FEC options and output changes, as the changes to the kernel ethtool uapi are under review on netdev currently and might change based on review: http://patchwork.ozlabs.org/patch/686293/ Signed-off-by: Vidya Sagar Ravipati --- ethtool-copy.h | 53 +++-- 1 file changed, 51 insertions(+), 2 deletions(-) diff --git a/ethtool-copy.h b/ethtool-copy.h index 70748f5..ff3f4f0 100644 --- a/ethtool-copy.h +++ b/ethtool-copy.h @@ -1222,6 +1222,51 @@ struct ethtool_per_queue_op { char data[]; }; +/** + * struct ethtool_fecparam - Ethernet forward error correction (FEC) parameters + * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM + * @autoneg: Flag to enable autonegotiation of FEC modes (RS, BaseR) + * (D44:47 of base link code word) + * @fec: Bitmask of supported FEC modes + * @reserved: Reserved for future extensions, e.g. a FEC bypass feature. + * + * Drivers should reject a non-zero setting of @autoneg when + * autonegotiation is disabled (or not supported) for the link. + * + * If @autoneg is non-zero, the MAC is configured to enable one of + * the supported FEC modes according to the result of autonegotiation. 
+ * Otherwise, it is configured directly based on the @fec parameter + */ +struct ethtool_fecparam { + __u32 cmd; + __u32 autoneg; + /* bitmask of FEC modes */ + __u32 fec; + __u32 reserved; +}; + +/** + * enum ethtool_fec_config_bits - flags definition of ethtool_fec_configuration + * @ETHTOOL_FEC_NONE: FEC mode configuration is not supported + * @ETHTOOL_FEC_AUTO: Default/Best FEC mode provided by driver + * @ETHTOOL_FEC_OFF: No FEC Mode + * @ETHTOOL_FEC_RS: Reed-Solomon Forward Error Detection mode + * @ETHTOOL_FEC_BASER: Base-R/Reed-Solomon Forward Error Detection mode + */ +enum ethtool_fec_config_bits { + ETHTOOL_FEC_NONE_BIT, + ETHTOOL_FEC_AUTO_BIT, + ETHTOOL_FEC_OFF_BIT, + ETHTOOL_FEC_RS_BIT, + ETHTOOL_FEC_BASER_BIT, +}; + +#define ETHTOOL_FEC_NONE (1 << ETHTOOL_FEC_NONE_BIT) +#define ETHTOOL_FEC_AUTO (1 << ETHTOOL_FEC_AUTO_BIT) +#define ETHTOOL_FEC_OFF (1 << ETHTOOL_FEC_OFF_BIT) +#define ETHTOOL_FEC_RS (1 << ETHTOOL_FEC_RS_BIT) +#define ETHTOOL_FEC_BASER (1 << ETHTOOL_FEC_BASER_BIT) + /* CMDs currently supported */ #define ETHTOOL_GSET 0x0001 /* DEPRECATED, Get settings. * Please use ETHTOOL_GLINKSETTINGS @@ -1313,6 +1358,8 @@ struct ethtool_per_queue_op { #define ETHTOOL_GLINKSETTINGS 0x004c /* Get ethtool_link_settings */ #define ETHTOOL_SLINKSETTINGS 0x004d /* Set ethtool_link_settings */ +#define ETHTOOL_GFECPARAM 0x004e /* Get FEC settings */ +#define ETHTOOL_SFECPARAM 0x004f /* Set FEC settings */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET @@ -1367,7 +1414,9 @@ enum ethtool_link_mode_bit_indices { ETHTOOL_LINK_MODE_10000baseLR_Full_BIT = 44, ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT = 45, ETHTOOL_LINK_MODE_10000baseER_Full_BIT = 46, - + ETHTOOL_LINK_MODE_FEC_NONE_BIT = 47, + ETHTOOL_LINK_MODE_FEC_RS_BIT = 48, + ETHTOOL_LINK_MODE_FEC_BASER_BIT = 49, /* Last allowed bit for __ETHTOOL_LINK_MODE_LEGACY_MASK is bit * 31. 
Please do NOT define any SUPPORTED_* or ADVERTISED_* @@ -1376,7 +1425,7 @@ enum ethtool_link_mode_bit_indices { */ __ETHTOOL_LINK_MODE_LAST - = ETHTOOL_LINK_MODE_10000baseER_Full_BIT, + = ETHTOOL_LINK_MODE_FEC_BASER_BIT, }; #define __ETHTOOL_LINK_MODE_LEGACY_MASK(base_name) \ -- 2.1.4
[net-next PATCH 08/27] arch/c6x: Add option to skip sync on DMA map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Cc: Mark Salter Cc: Aurelien Jacquiot Signed-off-by: Alexander Duyck --- arch/c6x/kernel/dma.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c index db4a6a3..6752df3 100644 --- a/arch/c6x/kernel/dma.c +++ b/arch/c6x/kernel/dma.c @@ -42,14 +42,17 @@ static dma_addr_t c6x_dma_map_page(struct device *dev, struct page *page, { dma_addr_t handle = virt_to_phys(page_address(page) + offset); - c6x_dma_sync(handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + c6x_dma_sync(handle, size, dir); + return handle; } static void c6x_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir, unsigned long attrs) { - c6x_dma_sync(handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + c6x_dma_sync(handle, size, dir); } static int c6x_dma_map_sg(struct device *dev, struct scatterlist *sglist, @@ -60,7 +63,8 @@ static int c6x_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); - c6x_dma_sync(sg->dma_address, sg->length, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + c6x_dma_sync(sg->dma_address, sg->length, dir); } return nents; @@ -72,9 +76,11 @@ static void c6x_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, struct scatterlist *sg; int i; + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + for_each_sg(sglist, sg, nents, i) c6x_dma_sync(sg_dma_address(sg), sg->length, dir); - } static void c6x_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
[net-next PATCH 18/27] arch/powerpc: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Alexander Duyck --- arch/powerpc/kernel/dma.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index e64a601..6877e3f 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -203,6 +203,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, for_each_sg(sgl, sg, nents, i) { sg->dma_address = sg_phys(sg) + get_dma_offset(dev); sg->dma_length = sg->length; + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction); } @@ -235,7 +239,10 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev, unsigned long attrs) { BUG_ON(dir == DMA_NONE); - __dma_sync_page(page, offset, size, dir); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync_page(page, offset, size, dir); + return page_to_phys(page) + offset + get_dma_offset(dev); }
[net-next PATCH 20/27] arch/sparc: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: "David S. Miller" Cc: sparcli...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/sparc/kernel/iommu.c |4 ++-- arch/sparc/kernel/ioport.c |4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..8fda4e4 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -415,7 +415,7 @@ static void dma_4u_unmap_page(struct device *dev, dma_addr_t bus_addr, ctx = (iopte_val(*base) & IOPTE_CONTEXT) >> 47UL; /* Step 1: Kick data out of streaming buffers if necessary. */ - if (strbuf->strbuf_enabled) + if (strbuf->strbuf_enabled && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) strbuf_flush(strbuf, iommu, bus_addr, ctx, npages, direction); @@ -640,7 +640,7 @@ static void dma_4u_unmap_sg(struct device *dev, struct scatterlist *sglist, base = iommu->page_table + entry; dma_handle &= IO_PAGE_MASK; - if (strbuf->strbuf_enabled) + if (strbuf->strbuf_enabled && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) strbuf_flush(strbuf, iommu, dma_handle, ctx, npages, direction); diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index 2344103..6ffaec4 100644 --- a/arch/sparc/kernel/ioport.c +++ b/arch/sparc/kernel/ioport.c @@ -527,7 +527,7 @@ static dma_addr_t pci32_map_page(struct device *dev, struct page *page, static void pci32_unmap_page(struct device *dev, dma_addr_t ba, size_t size, enum dma_data_direction dir, unsigned long attrs) { - if (dir != PCI_DMA_TODEVICE) + if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) dma_make_coherent(ba, PAGE_ALIGN(size)); } @@ -572,7 +572,7 @@ static void pci32_unmap_sg(struct device *dev, struct scatterlist *sgl, struct scatterlist *sg; int n; - if (dir != PCI_DMA_TODEVICE) { + if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { 
for_each_sg(sgl, sg, nents, n) { dma_make_coherent(sg_phys(sg), PAGE_ALIGN(sg->length)); }
[net-next PATCH 21/27] arch/tile: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Chris Metcalf Signed-off-by: Alexander Duyck --- arch/tile/kernel/pci-dma.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c index 09bb774..24e0f8c 100644 --- a/arch/tile/kernel/pci-dma.c +++ b/arch/tile/kernel/pci-dma.c @@ -213,10 +213,12 @@ static int tile_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); - __dma_prep_pa_range(sg->dma_address, sg->length, direction); #ifdef CONFIG_NEED_SG_DMA_LENGTH sg->dma_length = sg->length; #endif + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_prep_pa_range(sg->dma_address, sg->length, direction); } return nents; @@ -232,6 +234,8 @@ static void tile_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(!valid_dma_direction(direction)); for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; __dma_complete_pa_range(sg->dma_address, sg->length, direction); } @@ -245,7 +249,8 @@ static dma_addr_t tile_dma_map_page(struct device *dev, struct page *page, BUG_ON(!valid_dma_direction(direction)); BUG_ON(offset + size > PAGE_SIZE); - __dma_prep_page(page, offset, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_prep_page(page, offset, size, direction); return page_to_pa(page) + offset; } @@ -256,6 +261,9 @@ static void tile_dma_unmap_page(struct device *dev, dma_addr_t dma_address, { BUG_ON(!valid_dma_direction(direction)); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + __dma_complete_page(pfn_to_page(PFN_DOWN(dma_address)), dma_address & (PAGE_SIZE - 1), size, direction); }
[RFC PATCH ethtool 0/2] ethtool: Add support for FEC encoding configuration
From: Vidya Sagar Ravipati Forward Error Correction (FEC) modes, i.e. Base-R and Reed-Solomon modes, are introduced in 25G/40G/100G standards for providing good BER at high speeds. Various networking devices which support 25G/40G/100G provide the ability to manage supported FEC modes, and the lack of FEC encoding control and reporting today is a source of interoperability issues for many vendors. FEC capability as well as a specific FEC mode, i.e. Base-R or RS modes, can be requested or advertised through bits D44:47 of the base link codeword. This patch set intends to provide an option under ethtool to manage and report FEC encoding settings for networking devices as per IEEE 802.3 bj, bm and by specs. set-fec/show-fec option(s) are designed to provide control and report the FEC encoding on the link. SET FEC option: root@tor: ethtool --set-fec swp1 encoding [off | RS | BaseR | auto] autoneg [off | on] Encoding: Types of encoding Off: Turning off any encoding RS : enforcing RS-FEC encoding on supported speeds BaseR : enforcing Base R encoding on supported speeds Auto : Default FEC settings for drivers, and would represent asking the hardware to essentially go into a best effort mode. Here are a few examples of what we would expect if encoding=auto: - if autoneg is on, we are expecting FEC to be negotiated as on or off as long as the protocol supports it - if the hardware is capable of detecting the FEC encoding on its receiver it will reconfigure its encoder to match - in the absence of the above, the configuration would be set to IEEE defaults. From our understanding, this is essentially what most hardware/driver combinations are doing today in the absence of a way for users to control the behavior. 
SHOW FEC option: root@tor: ethtool --show-fec swp1 FEC parameters for swp1: Autonegotiate: off FEC encodings: RS ETHTOOL DEVNAME output modification: ethtool devname output: root@tor:~# ethtool swp1 Settings for swp1: root@hpe-7712-03:~# ethtool swp18 Settings for swp18: Supported ports: [ FIBRE ] Supported link modes: 4baseCR4/Full 4baseSR4/Full 4baseLR4/Full 10baseSR4/Full 10baseCR4/Full 10baseLR4_ER4/Full Supported pause frame use: No Supports auto-negotiation: Yes Supported FEC modes: [RS | BaseR | None | Not reported] Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: [RS | BaseR | None | Not reported] One or more FEC modes Speed: 10Mb/s Duplex: Full Port: FIBRE PHYAD: 106 Transceiver: internal Auto-negotiation: off Link detected: yes Vidya Sagar Ravipati (2): ethtool-copy.h: sync with net ethtool: Support for FEC encoding control ethtool-copy.h | 53 +++- ethtool.c | 152 + 2 files changed, 203 insertions(+), 2 deletions(-) -- 2.1.4
[RFC PATCH ethtool 2/2] ethtool: Support for FEC encoding control
From: Vidya Sagar Ravipati As FEC settings and different FEC modes are mandatory and configurable across various interfaces of 25G/50G/100G/40G, the lack of FEC encoding control and reporting today is a source of interoperability issues for many vendors. set-fec/show-fec option(s) are designed to provide control and report the FEC encoding on the link. root@tor: ethtool --set-fec swp1 encoding [off | RS | BaseR | auto] autoneg [off | on] Encoding: Types of encoding Off: Turning off any encoding RS : enforcing RS-FEC encoding on supported speeds BaseR : enforcing Base R encoding on supported speeds Auto : Default FEC settings for drivers, and would represent asking the hardware to essentially go into a best effort mode. Here are a few examples of what we would expect if encoding=auto: - if autoneg is on, we are expecting FEC to be negotiated as on or off as long as the protocol supports it - if the hardware is capable of detecting the FEC encoding on its receiver it will reconfigure its encoder to match - in the absence of the above, the configuration would be set to IEEE defaults. From our understanding, this is essentially what most hardware/driver combinations are doing today in the absence of a way for users to control the behavior. 
root@tor: ethtool --show-fec swp1 FEC parameters for swp1: Autonegotiate: off FEC encodings: RS ethtool devname output: root@tor:~# ethtool swp1 Settings for swp1: root@hpe-7712-03:~# ethtool swp18 Settings for swp18: Supported ports: [ FIBRE ] Supported link modes: 4baseCR4/Full 4baseSR4/Full 4baseLR4/Full 10baseSR4/Full 10baseCR4/Full 10baseLR4_ER4/Full Supported pause frame use: No Supports auto-negotiation: Yes Supported FEC modes: [RS | BaseR | None | Not reported] Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: [RS | BaseR | None | Not reported] Speed: 10Mb/s Duplex: Full Port: FIBRE PHYAD: 106 Transceiver: internal Auto-negotiation: off Link detected: yes Signed-off-by: Vidya Sagar Ravipati --- ethtool.c | 152 ++ 1 file changed, 152 insertions(+) diff --git a/ethtool.c b/ethtool.c index 49ac94e..7fa058c 100644 --- a/ethtool.c +++ b/ethtool.c @@ -684,6 +684,7 @@ static void dump_link_caps(const char *prefix, const char *an_prefix, }; int indent; int did1, new_line_pend, i; + int fecreported = 0; /* Indent just like the separate functions used to */ indent = strlen(prefix) + 14; @@ -735,6 +736,26 @@ static void dump_link_caps(const char *prefix, const char *an_prefix, fprintf(stdout, "Yes\n"); else fprintf(stdout, "No\n"); + + fprintf(stdout, " %s FEC modes: ", prefix); + if (ethtool_link_mode_test_bit( + ETHTOOL_LINK_MODE_FEC_NONE_BIT, mask)) { + fprintf(stdout, "None\n"); + fecreported = 1; + } + if (ethtool_link_mode_test_bit( + ETHTOOL_LINK_MODE_FEC_BASER_BIT, mask)) { + fprintf(stdout, "BaseR\n"); + fecreported = 1; + } + if (ethtool_link_mode_test_bit( + ETHTOOL_LINK_MODE_FEC_RS_BIT, mask)) { + fprintf(stdout, "RS\n"); + fecreported = 1; + } + if (!fecreported) { + fprintf(stdout, "Not reported\n"); + } } } @@ -1562,6 +1583,42 @@ static void dump_eeecmd(struct ethtool_eee *ep) dump_link_caps("Link partner advertised EEE", "", link_mode, 1); } +static void dump_feccmd(struct 
ethtool_fecparam *ep) +{ + static char buf[300]; + + memset(buf, 0, sizeof(buf)); + + bool first = true; + + fprintf(stdout, + "Auto-negotiation: %s\n", + ep->autoneg ? "on" : "off"); + fprintf(stdout, "FEC encodings :"); + + if(ep->fec & ETHTOOL_FEC_NONE) { + strcat(buf, "NotSupported"); + first = false; + } + if(ep->fec & ETHTOOL_FEC_OFF) { + strcat(buf, "None"); + first = false; + } + if(ep->fec & ETHTOOL_FEC_BASER) { + if (!first) + strcat(buf, " | "); + strcat(buf, "BaseR"); + first = false; + } + if(ep->fec & ETHTOOL_FEC_RS) { + if (!first) + strcat(buf, " | "); + strcat(buf, "RS"); + first = false
[net-next PATCH 15/27] arch/nios2: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Ley Foon Tan Signed-off-by: Alexander Duyck --- arch/nios2/mm/dma-mapping.c | 26 ++ 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index d800fad..f6a5dcf 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -98,13 +98,17 @@ static int nios2_dma_map_sg(struct device *dev, struct scatterlist *sg, int i; for_each_sg(sg, sg, nents, i) { - void *addr; + void *addr = sg_virt(sg); - addr = sg_virt(sg); - if (addr) { - __dma_sync_for_device(addr, sg->length, direction); - sg->dma_address = sg_phys(sg); - } + if (!addr) + continue; + + sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + + __dma_sync_for_device(addr, sg->length, direction); } return nents; @@ -117,7 +121,9 @@ static dma_addr_t nios2_dma_map_page(struct device *dev, struct page *page, { void *addr = page_address(page) + offset; - __dma_sync_for_device(addr, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync_for_device(addr, size, direction); + return page_to_phys(page) + offset; } @@ -125,7 +131,8 @@ static void nios2_dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, enum dma_data_direction direction, unsigned long attrs) { - __dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); } static void nios2_dma_unmap_sg(struct device *dev, struct scatterlist *sg, @@ -138,6 +145,9 @@ static void nios2_dma_unmap_sg(struct device *dev, struct scatterlist *sg, if (direction == DMA_TO_DEVICE) return; + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + for_each_sg(sg, sg, nhwentries, i) { addr = sg_virt(sg); if (addr)
[net-next PATCH 19/27] arch/sh: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Yoshinori Sato Cc: Rich Felker Cc: linux...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/sh/kernel/dma-nommu.c |7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c index eadb669..47fee3b 100644 --- a/arch/sh/kernel/dma-nommu.c +++ b/arch/sh/kernel/dma-nommu.c @@ -18,7 +18,9 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page, dma_addr_t addr = page_to_phys(page) + offset; WARN_ON(size == 0); - dma_cache_sync(dev, page_address(page) + offset, size, dir); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_cache_sync(dev, page_address(page) + offset, size, dir); return addr; } @@ -35,7 +37,8 @@ static int nommu_map_sg(struct device *dev, struct scatterlist *sg, for_each_sg(sg, s, nents, i) { BUG_ON(!sg_page(s)); - dma_cache_sync(dev, sg_virt(s), s->length, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_cache_sync(dev, sg_virt(s), s->length, dir); s->dma_address = sg_phys(s); s->dma_length = s->length;
[net-next PATCH 13/27] arch/microblaze: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Michal Simek Signed-off-by: Alexander Duyck --- arch/microblaze/kernel/dma.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index ec04dc1..818daf2 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -61,6 +61,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, /* FIXME this part of code is untested */ for_each_sg(sgl, sg, nents, i) { sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_sync(page_to_phys(sg_page(sg)) + sg->offset, sg->length, direction); } @@ -80,7 +84,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev, enum dma_data_direction direction, unsigned long attrs) { - __dma_sync(page_to_phys(page) + offset, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync(page_to_phys(page) + offset, size, direction); return page_to_phys(page) + offset; } @@ -95,7 +100,8 @@ static inline void dma_direct_unmap_page(struct device *dev, * phys_to_virt is here because in __dma_sync_page is __virt_to_phys and * dma_address is physical address */ - __dma_sync(dma_address, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync(dma_address, size, direction); } static inline void
[net-next PATCH 10/27] arch/hexagon: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Cc: Richard Kuo Cc: linux-hexa...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/hexagon/kernel/dma.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index b901778..dbc4f10 100644 --- a/arch/hexagon/kernel/dma.c +++ b/arch/hexagon/kernel/dma.c @@ -119,6 +119,9 @@ static int hexagon_map_sg(struct device *hwdev, struct scatterlist *sg, s->dma_length = s->length; + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + flush_dcache_range(dma_addr_to_virt(s->dma_address), dma_addr_to_virt(s->dma_address + s->length)); } @@ -180,7 +183,8 @@ static dma_addr_t hexagon_map_page(struct device *dev, struct page *page, if (!check_addr("map_single", dev, bus, size)) return bad_dma_address; - dma_sync(dma_addr_to_virt(bus), size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync(dma_addr_to_virt(bus), size, dir); return bus; }
[net-next PATCH 09/27] arch/frv: Add option to skip sync on DMA map
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Signed-off-by: Alexander Duyck --- arch/frv/mb93090-mb00/pci-dma-nommu.c | 14 ++ arch/frv/mb93090-mb00/pci-dma.c |9 +++-- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/frv/mb93090-mb00/pci-dma-nommu.c b/arch/frv/mb93090-mb00/pci-dma-nommu.c index 90f2e4c..1876881 100644 --- a/arch/frv/mb93090-mb00/pci-dma-nommu.c +++ b/arch/frv/mb93090-mb00/pci-dma-nommu.c @@ -109,16 +109,19 @@ static int frv_dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction, unsigned long attrs) { - int i; struct scatterlist *sg; + int i; + + BUG_ON(direction == DMA_NONE); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return nents; for_each_sg(sglist, sg, nents, i) { frv_cache_wback_inv(sg_dma_address(sg), sg_dma_address(sg) + sg_dma_len(sg)); } - BUG_ON(direction == DMA_NONE); - return nents; } @@ -127,7 +130,10 @@ static dma_addr_t frv_dma_map_page(struct device *dev, struct page *page, enum dma_data_direction direction, unsigned long attrs) { BUG_ON(direction == DMA_NONE); - flush_dcache_page(page); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + flush_dcache_page(page); + return (dma_addr_t) page_to_phys(page) + offset; } diff --git a/arch/frv/mb93090-mb00/pci-dma.c b/arch/frv/mb93090-mb00/pci-dma.c index f585745..dba7df9 100644 --- a/arch/frv/mb93090-mb00/pci-dma.c +++ b/arch/frv/mb93090-mb00/pci-dma.c @@ -40,13 +40,16 @@ static int frv_dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction, unsigned long attrs) { + struct scatterlist *sg; unsigned long dampr2; void *vaddr; int i; - struct scatterlist *sg; BUG_ON(direction == DMA_NONE); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return nents; + dampr2 = __get_DAMPR(2); for_each_sg(sglist, sg, nents, i) { @@ -70,7 +73,9 @@ static dma_addr_t 
frv_dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction direction, unsigned long attrs) { - flush_dcache_page(page); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + flush_dcache_page(page); + return (dma_addr_t) page_to_phys(page) + offset; }
[net-next PATCH 11/27] arch/m68k: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Cc: Geert Uytterhoeven Cc: linux-m...@lists.linux-m68k.org Signed-off-by: Alexander Duyck --- arch/m68k/kernel/dma.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index 8cf97cb..0707006 100644 --- a/arch/m68k/kernel/dma.c +++ b/arch/m68k/kernel/dma.c @@ -134,7 +134,9 @@ static dma_addr_t m68k_dma_map_page(struct device *dev, struct page *page, { dma_addr_t handle = page_to_phys(page) + offset; - dma_sync_single_for_device(dev, handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync_single_for_device(dev, handle, size, dir); + return handle; } @@ -146,6 +148,10 @@ static int m68k_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + dma_sync_single_for_device(dev, sg->dma_address, sg->length, dir); }
[net-next PATCH 17/27] arch/parisc: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: linux-par...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/parisc/kernel/pci-dma.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 02d9ed0..be55ede 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -459,7 +459,9 @@ static dma_addr_t pa11_dma_map_page(struct device *dev, struct page *page, void *addr = page_address(page) + offset; BUG_ON(direction == DMA_NONE); - flush_kernel_dcache_range((unsigned long) addr, size); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + flush_kernel_dcache_range((unsigned long) addr, size); + return virt_to_phys(addr); } @@ -469,8 +471,11 @@ static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, { BUG_ON(direction == DMA_NONE); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + if (direction == DMA_TO_DEVICE) - return; + return; /* * For PCI_DMA_FROMDEVICE this flush is not necessary for the @@ -479,7 +484,6 @@ static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, */ flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle), size); - return; } static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, @@ -496,6 +500,10 @@ static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr); sg_dma_len(sg) = sg->length; + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + flush_kernel_dcache_range(vaddr, sg->length); } return nents; @@ -510,14 +518,16 @@ static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(direction == DMA_NONE); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + if (direction == DMA_TO_DEVICE) - return; + return; 
/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */ for_each_sg(sglist, sg, nents, i) flush_kernel_vmap_range(sg_virt(sg), sg->length); - return; } static void pa11_dma_sync_single_for_cpu(struct device *dev,
[net-next PATCH 12/27] arch/metag: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: James Hogan Cc: linux-me...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/metag/kernel/dma.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/metag/kernel/dma.c b/arch/metag/kernel/dma.c index 0db31e2..91968d9 100644 --- a/arch/metag/kernel/dma.c +++ b/arch/metag/kernel/dma.c @@ -484,8 +484,9 @@ static dma_addr_t metag_dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction direction, unsigned long attrs) { - dma_sync_for_device((void *)(page_to_phys(page) + offset), size, - direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync_for_device((void *)(page_to_phys(page) + offset), + size, direction); return page_to_phys(page) + offset; } @@ -493,7 +494,8 @@ static void metag_dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, enum dma_data_direction direction, unsigned long attrs) { - dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); } static int metag_dma_map_sg(struct device *dev, struct scatterlist *sglist, @@ -507,6 +509,10 @@ static int metag_dma_map_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(!sg_page(sg)); sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + dma_sync_for_device(sg_virt(sg), sg->length, direction); } @@ -525,6 +531,10 @@ static void metag_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(!sg_page(sg)); sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + dma_sync_for_cpu(sg_virt(sg), sg->length, direction); } }
[net-next PATCH 14/27] arch/mips: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Ralf Baechle Cc: Keguang Zhang Cc: linux-m...@linux-mips.org Signed-off-by: Alexander Duyck --- arch/mips/loongson64/common/dma-swiotlb.c |2 +- arch/mips/mm/dma-default.c|8 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/mips/loongson64/common/dma-swiotlb.c b/arch/mips/loongson64/common/dma-swiotlb.c index 1a80b6f..aab4fd6 100644 --- a/arch/mips/loongson64/common/dma-swiotlb.c +++ b/arch/mips/loongson64/common/dma-swiotlb.c @@ -61,7 +61,7 @@ static int loongson_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir, unsigned long attrs) { - int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, 0); + int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, attrs); mb(); return r; diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c index b2eadd6..dd998d7 100644 --- a/arch/mips/mm/dma-default.c +++ b/arch/mips/mm/dma-default.c @@ -293,7 +293,7 @@ static inline void __dma_sync(struct page *page, static void mips_dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size, enum dma_data_direction direction, unsigned long attrs) { - if (cpu_needs_post_dma_flush(dev)) + if (cpu_needs_post_dma_flush(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) __dma_sync(dma_addr_to_page(dev, dma_addr), dma_addr & ~PAGE_MASK, size, direction); plat_post_dma_flush(dev); @@ -307,7 +307,8 @@ static int mips_dma_map_sg(struct device *dev, struct scatterlist *sglist, struct scatterlist *sg; for_each_sg(sglist, sg, nents, i) { - if (!plat_device_is_coherent(dev)) + if (!plat_device_is_coherent(dev) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) __dma_sync(sg_page(sg), sg->offset, sg->length, direction); #ifdef CONFIG_NEED_SG_DMA_LENGTH @@ -324,7 +325,7 @@ static dma_addr_t mips_dma_map_page(struct device *dev, struct page *page, unsigned 
long offset, size_t size, enum dma_data_direction direction, unsigned long attrs) { - if (!plat_device_is_coherent(dev)) + if (!plat_device_is_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) __dma_sync(page, offset, size, direction); return plat_map_dma_mem_page(dev, page) + offset; @@ -339,6 +340,7 @@ static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nhwentries, i) { if (!plat_device_is_coherent(dev) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && direction != DMA_TO_DEVICE) __dma_sync(sg_page(sg), sg->offset, sg->length, direction);
[net-next PATCH 00/27] Add support for DMA writable pages being writable by the network stack
The first 22 patches in the set add support for the DMA attribute DMA_ATTR_SKIP_CPU_SYNC on multiple platforms/architectures. This is needed so that we can flag the calls to dma_map/unmap_page so that we do not invalidate cache lines that do not currently belong to the device. Instead we have to take care of this in the driver via a call to sync_single_range_for_cpu prior to freeing the Rx page. Patch 23 adds support for dma_map_page_attrs and dma_unmap_page_attrs so that we can unmap and map a page using the DMA_ATTR_SKIP_CPU_SYNC attribute. Patch 24 adds support for freeing a page that has multiple references being held by a single caller. This way we can free page fragments that were allocated by a given driver. The last 3 patches use these updates in the igb driver to allow for us to reimplement the use of build_skb. My hope is to get the series accepted into the net-next tree as I have a number of other Intel drivers I could then begin updating once these patches are accepted. v1: Split out changes DMA_ERROR_CODE fix for swiotlb-xen Minor fixes based on issues found by kernel build bot Few minor changes for issues found on code review Added Acked-by for patches that were acked and not changed --- Alexander Duyck (27): swiotlb: Drop unused function swiotlb_map_sg swiotlb-xen: Enforce return of DMA_ERROR_CODE in mapping function swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC arch/arc: Add option to skip sync on DMA mapping arch/arm: Add option to skip sync on DMA map and unmap arch/avr32: Add option to skip sync on DMA map arch/blackfin: Add option to skip sync on DMA map arch/c6x: Add option to skip sync on DMA map and unmap arch/frv: Add option to skip sync on DMA map arch/hexagon: Add option to skip DMA sync as a part of mapping arch/m68k: Add option to skip DMA sync as a part of mapping arch/metag: Add option to skip DMA sync as a part of map and unmap arch/microblaze: Add option to skip DMA sync as a part of map and unmap arch/mips: Add option to skip DMA
sync as a part of map and unmap arch/nios2: Add option to skip DMA sync as a part of map and unmap arch/openrisc: Add option to skip DMA sync as a part of mapping arch/parisc: Add option to skip DMA sync as a part of map and unmap arch/powerpc: Add option to skip DMA sync as a part of mapping arch/sh: Add option to skip DMA sync as a part of mapping arch/sparc: Add option to skip DMA sync as a part of map and unmap arch/tile: Add option to skip DMA sync as a part of map and unmap arch/xtensa: Add option to skip DMA sync as a part of mapping dma: Add calls for dma_map_page_attrs and dma_unmap_page_attrs mm: Add support for releasing multiple instances of a page igb: Update driver to make use of DMA_ATTR_SKIP_CPU_SYNC igb: Update code to better handle incrementing page count igb: Revert "igb: Revert support for build_skb in igb" arch/arc/mm/dma.c |5 + arch/arm/common/dmabounce.c | 16 +- arch/arm/xen/mm.c |1 arch/avr32/mm/dma-coherent.c |7 + arch/blackfin/kernel/dma-mapping.c|8 + arch/c6x/kernel/dma.c | 14 +- arch/frv/mb93090-mb00/pci-dma-nommu.c | 14 +- arch/frv/mb93090-mb00/pci-dma.c |9 + arch/hexagon/kernel/dma.c |6 + arch/m68k/kernel/dma.c|8 + arch/metag/kernel/dma.c | 16 ++ arch/microblaze/kernel/dma.c | 10 + arch/mips/loongson64/common/dma-swiotlb.c |2 arch/mips/mm/dma-default.c|8 + arch/nios2/mm/dma-mapping.c | 26 +++- arch/openrisc/kernel/dma.c|3 arch/parisc/kernel/pci-dma.c | 20 ++- arch/powerpc/kernel/dma.c |9 + arch/sh/kernel/dma-nommu.c|7 + arch/sparc/kernel/iommu.c |4 - arch/sparc/kernel/ioport.c|4 - arch/tile/kernel/pci-dma.c| 12 +- arch/x86/xen/pci-swiotlb-xen.c|1 arch/xtensa/kernel/pci-dma.c |7 + drivers/net/ethernet/intel/igb/igb.h | 36 - drivers/net/ethernet/intel/igb/igb_main.c | 207 +++-- drivers/xen/swiotlb-xen.c | 27 ++-- include/linux/dma-mapping.h | 20 ++- include/linux/gfp.h |2 include/linux/swiotlb.h | 10 + include/xen/swiotlb-xen.h |3 lib/swiotlb.c | 56 mm/page_alloc.c | 14 ++ 33 files changed, 433 insertions(+), 159 deletions(-) -- Signature
[net-next PATCH 06/27] arch/avr32: Add option to skip sync on DMA map
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Acked-by: Hans-Christian Noren Egtvedt Signed-off-by: Alexander Duyck --- arch/avr32/mm/dma-coherent.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/avr32/mm/dma-coherent.c b/arch/avr32/mm/dma-coherent.c index 58610d0..54534e5 100644 --- a/arch/avr32/mm/dma-coherent.c +++ b/arch/avr32/mm/dma-coherent.c @@ -146,7 +146,8 @@ static dma_addr_t avr32_dma_map_page(struct device *dev, struct page *page, { void *cpu_addr = page_address(page) + offset; - dma_cache_sync(dev, cpu_addr, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_cache_sync(dev, cpu_addr, size, direction); return virt_to_bus(cpu_addr); } @@ -162,6 +163,10 @@ static int avr32_dma_map_sg(struct device *dev, struct scatterlist *sglist, sg->dma_address = page_to_bus(sg_page(sg)) + sg->offset; virt = sg_virt(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + dma_cache_sync(dev, virt, sg->length, direction); }
[net-next PATCH 26/27] igb: Update code to better handle incrementing page count
This patch updates the driver code so that we do bulk updates of the page reference count instead of just incrementing it by one reference at a time. The advantage to doing this is that we cut down on atomic operations and this in turn should give us a slight improvement in cycles per packet. In addition if we eventually move this over to using build_skb the gains will be more noticeable. Signed-off-by: Alexander Duyck --- drivers/net/ethernet/intel/igb/igb.h |7 ++- drivers/net/ethernet/intel/igb/igb_main.c | 24 +--- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index d11093d..acbc3ab 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -210,7 +210,12 @@ struct igb_tx_buffer { struct igb_rx_buffer { dma_addr_t dma; struct page *page; - unsigned int page_offset; +#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536) + __u32 page_offset; +#else + __u16 page_offset; +#endif + __u16 pagecnt_bias; }; struct igb_tx_queue_stats { diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index c8c458c..5e66cde 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3962,7 +3962,8 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring) PAGE_SIZE, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC); - __free_page(buffer_info->page); + __page_frag_drain(buffer_info->page, 0, + buffer_info->pagecnt_bias); buffer_info->page = NULL; } @@ -6830,13 +6831,15 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer, struct page *page, unsigned int truesize) { + unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--; + /* avoid re-using remote pages */ if (unlikely(igb_page_is_reserved(page))) return false; #if (PAGE_SIZE < 8192) /* if we are only owner of page we can reuse it */ - if (unlikely(page_count(page) != 1)) + if (unlikely(page_ref_count(page) != 
pagecnt_bias)) return false; /* flip page offset to other buffer */ @@ -6849,10 +6852,14 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer, return false; #endif - /* Even if we own the page, we are not allowed to use atomic_set() -* This would break get_page_unless_zero() users. + /* If we have drained the page fragment pool we need to update +* the pagecnt_bias and page count so that we fully restock the +* number of references the driver holds. */ - page_ref_inc(page); + if (unlikely(pagecnt_bias == 1)) { + page_ref_add(page, USHRT_MAX); + rx_buffer->pagecnt_bias = USHRT_MAX; + } return true; } @@ -6904,7 +6911,6 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring, return true; /* this page cannot be reused so discard it */ - __free_page(page); return false; } @@ -6975,10 +6981,13 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring, /* hand second half of page back to the ring */ igb_reuse_rx_page(rx_ring, rx_buffer); } else { - /* we are not reusing the buffer so unmap it */ + /* We are not reusing the buffer so unmap it and free +* any references we are holding to it +*/ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, PAGE_SIZE, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC); + __page_frag_drain(page, 0, rx_buffer->pagecnt_bias); } /* clear contents of rx_buffer */ @@ -7252,6 +7261,7 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring, bi->dma = dma; bi->page = page; bi->page_offset = 0; + bi->pagecnt_bias = 1; return true; }
[net-next PATCH 27/27] igb: Revert "igb: Revert support for build_skb in igb"
This reverts commit f9d40f6a9921 ("igb: Revert support for build_skb in igb") and adds a few changes to update it to work with the latest version of igb. We are now able to revert the removal of this due to the fact that with the recent changes to the page count and the use of DMA_ATTR_SKIP_CPU_SYNC we can make the pages writable so we should not be invalidating the additional data added when we call build_skb. The biggest risk with this change is that we are now not able to support full jumbo frames when using build_skb. Instead we can only support up to 2K minus the skb overhead and padding offset. Signed-off-by: Alexander Duyck --- drivers/net/ethernet/intel/igb/igb.h | 29 ++ drivers/net/ethernet/intel/igb/igb_main.c | 130 ++--- 2 files changed, 142 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index acbc3ab..c3420f3 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -145,6 +145,10 @@ struct vf_data_storage { #define IGB_RX_HDR_LEN IGB_RXBUFFER_256 #define IGB_RX_BUFSZ IGB_RXBUFFER_2048 +#define IGB_SKB_PAD(NET_SKB_PAD + NET_IP_ALIGN) +#define IGB_MAX_BUILD_SKB_SIZE \ + (SKB_WITH_OVERHEAD(IGB_RX_BUFSZ) - (IGB_SKB_PAD + IGB_TS_HDR_LEN)) + /* How many Rx Buffers do we bundle into one write to the hardware ? 
*/ #define IGB_RX_BUFFER_WRITE16 /* Must be power of 2 */ @@ -301,12 +305,29 @@ struct igb_q_vector { }; enum e1000_ring_flags_t { - IGB_RING_FLAG_RX_SCTP_CSUM, - IGB_RING_FLAG_RX_LB_VLAN_BSWAP, - IGB_RING_FLAG_TX_CTX_IDX, - IGB_RING_FLAG_TX_DETECT_HANG + IGB_RING_FLAG_RX_SCTP_CSUM = 0, +#if (NET_IP_ALIGN != 0) + IGB_RING_FLAG_RX_BUILD_SKB_ENABLED = 1, +#endif + IGB_RING_FLAG_RX_LB_VLAN_BSWAP = 2, + IGB_RING_FLAG_TX_CTX_IDX = 3, + IGB_RING_FLAG_TX_DETECT_HANG = 4, +#if (NET_IP_ALIGN == 0) +#if (L1_CACHE_SHIFT < 5) + IGB_RING_FLAG_RX_BUILD_SKB_ENABLED = 5, +#else + IGB_RING_FLAG_RX_BUILD_SKB_ENABLED = L1_CACHE_SHIFT, +#endif +#endif }; +#define ring_uses_build_skb(ring) \ + test_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags) +#define set_ring_build_skb_enabled(ring) \ + set_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags) +#define clear_ring_build_skb_enabled(ring) \ + clear_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags) + #define IGB_TXD_DCMD (E1000_ADVTXD_DCMD_EOP | E1000_ADVTXD_DCMD_RS) #define IGB_RX_DESC(R, i) \ diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 5e66cde..e55407a 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3761,6 +3761,16 @@ void igb_configure_rx_ring(struct igb_adapter *adapter, wr32(E1000_RXDCTL(reg_idx), rxdctl); } +static void igb_set_rx_buffer_len(struct igb_adapter *adapter, + struct igb_ring *rx_ring) +{ + /* set build_skb flag */ + if (adapter->max_frame_size <= IGB_MAX_BUILD_SKB_SIZE) + set_ring_build_skb_enabled(rx_ring); + else + clear_ring_build_skb_enabled(rx_ring); +} + /** * igb_configure_rx - Configure receive Unit after Reset * @adapter: board private structure @@ -3778,8 +3788,12 @@ static void igb_configure_rx(struct igb_adapter *adapter) /* Setup the HW Rx Head and Tail Descriptor Pointers and * the Base and Length of the Rx Descriptor Ring */ - for (i = 0; i < adapter->num_rx_queues; 
i++) - igb_configure_rx_ring(adapter, adapter->rx_ring[i]); + for (i = 0; i < adapter->num_rx_queues; i++) { + struct igb_ring *rx_ring = adapter->rx_ring[i]; + + igb_set_rx_buffer_len(adapter, rx_ring); + igb_configure_rx_ring(adapter, rx_ring); + } } /** @@ -4238,7 +4252,7 @@ static void igb_set_rx_mode(struct net_device *netdev) struct igb_adapter *adapter = netdev_priv(netdev); struct e1000_hw *hw = &adapter->hw; unsigned int vfn = adapter->vfs_allocated_count; - u32 rctl = 0, vmolr = 0; + u32 rctl = 0, vmolr = 0, rlpml = MAX_JUMBO_FRAME_SIZE; int count; /* Check for Promiscuous and All Multicast modes */ @@ -4310,12 +4324,18 @@ static void igb_set_rx_mode(struct net_device *netdev) vmolr |= rd32(E1000_VMOLR(vfn)) & ~(E1000_VMOLR_ROPE | E1000_VMOLR_MPME | E1000_VMOLR_ROMPE); - /* enable Rx jumbo frames, no need for restriction */ + /* enable Rx jumbo frames, restrict as needed to support build_skb */ vmolr &= ~E1000_VMOLR_RLPML_MASK; - vmolr |= MAX_JUMBO_FRAME_SIZE | E1000_VMOLR_LPE; + vmolr |= E1000_VMOLR_LPE; + vmolr |= (adapter->max_frame_size <= IGB_MAX_BUILD_SKB_SIZE) ? +IGB_MAX_BUILD_SKB_SIZE : MAX
[net-next PATCH 16/27] arch/openrisc: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Jonas Bonn Signed-off-by: Alexander Duyck --- arch/openrisc/kernel/dma.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index 140c991..906998b 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -141,6 +141,9 @@ unsigned long cl; dma_addr_t addr = page_to_phys(page) + offset; + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return addr; + switch (dir) { case DMA_TO_DEVICE: /* Flush the dcache for the requested range */
[net-next PATCH 25/27] igb: Update driver to make use of DMA_ATTR_SKIP_CPU_SYNC
The ARM architecture provides a mechanism for deferring cache line invalidation in the case of map/unmap. This patch makes use of this mechanism to avoid unnecessary synchronization. A secondary effect of this change is that the portion of the page that has been synchronized for use by the CPU should be writable and could be passed up the stack (at least on ARM). The last bit that occurred to me is that on architectures where the sync_for_cpu call invalidates cache lines we were prefetching and then invalidating the first 128 bytes of the packet. To avoid that I have moved the sync up to before we perform the prefetch and allocate the skbuff so that we can actually make use of it. Signed-off-by: Alexander Duyck --- drivers/net/ethernet/intel/igb/igb_main.c | 53 ++--- 1 file changed, 33 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 4feca69..c8c458c 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3947,10 +3947,21 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring) if (!buffer_info->page) continue; - dma_unmap_page(rx_ring->dev, - buffer_info->dma, - PAGE_SIZE, - DMA_FROM_DEVICE); + /* Invalidate cache lines that may have been written to by +* device so that we avoid corrupting memory. 
+*/ + dma_sync_single_range_for_cpu(rx_ring->dev, + buffer_info->dma, + buffer_info->page_offset, + IGB_RX_BUFSZ, + DMA_FROM_DEVICE); + + /* free resources associated with mapping */ + dma_unmap_page_attrs(rx_ring->dev, +buffer_info->dma, +PAGE_SIZE, +DMA_FROM_DEVICE, +DMA_ATTR_SKIP_CPU_SYNC); __free_page(buffer_info->page); buffer_info->page = NULL; @@ -6808,12 +6819,6 @@ static void igb_reuse_rx_page(struct igb_ring *rx_ring, /* transfer page from old buffer to new buffer */ *new_buff = *old_buff; - - /* sync the buffer for use by the device */ - dma_sync_single_range_for_device(rx_ring->dev, old_buff->dma, -old_buff->page_offset, -IGB_RX_BUFSZ, -DMA_FROM_DEVICE); } static inline bool igb_page_is_reserved(struct page *page) @@ -6934,6 +6939,13 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring, page = rx_buffer->page; prefetchw(page); + /* we are reusing so sync this buffer for CPU use */ + dma_sync_single_range_for_cpu(rx_ring->dev, + rx_buffer->dma, + rx_buffer->page_offset, + size, + DMA_FROM_DEVICE); + if (likely(!skb)) { void *page_addr = page_address(page) + rx_buffer->page_offset; @@ -6958,21 +6970,15 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring, prefetchw(skb->data); } - /* we are reusing so sync this buffer for CPU use */ - dma_sync_single_range_for_cpu(rx_ring->dev, - rx_buffer->dma, - rx_buffer->page_offset, - size, - DMA_FROM_DEVICE); - /* pull page into skb */ if (igb_add_rx_frag(rx_ring, rx_buffer, size, rx_desc, skb)) { /* hand second half of page back to the ring */ igb_reuse_rx_page(rx_ring, rx_buffer); } else { /* we are not reusing the buffer so unmap it */ - dma_unmap_page(rx_ring->dev, rx_buffer->dma, - PAGE_SIZE, DMA_FROM_DEVICE); + dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, +PAGE_SIZE, DMA_FROM_DEVICE, +DMA_ATTR_SKIP_CPU_SYNC); } /* clear contents of rx_buffer */ @@ -7230,7 +7236,8 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring, } /* map page for use */ - dma = 
dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE); + dma = dma_map_page_attrs(rx_ring->dev, page, 0, PAGE_SIZE, +DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC); /
[net-next PATCH 02/27] swiotlb-xen: Enforce return of DMA_ERROR_CODE in mapping function
The mapping function should always return DMA_ERROR_CODE when a mapping has failed as this is what the DMA API expects when a DMA error has occurred. The current function for mapping a page in Xen was returning either DMA_ERROR_CODE or 0 depending on where it failed. On x86 DMA_ERROR_CODE is 0, but on other architectures such as ARM it is ~0. We need to make sure we return the same error value if either the mapping failed or the device is not capable of accessing the mapping. If we are returning DMA_ERROR_CODE as our error value we can drop the function for checking the error code as the default is to compare the return value against DMA_ERROR_CODE if no function is defined. Cc: Konrad Rzeszutek Wilk Signed-off-by: Alexander Duyck --- arch/arm/xen/mm.c |1 - arch/x86/xen/pci-swiotlb-xen.c |1 - drivers/xen/swiotlb-xen.c | 18 ++ include/xen/swiotlb-xen.h |3 --- 4 files changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index d062f08..bd62d94 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -186,7 +186,6 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order) EXPORT_SYMBOL(xen_dma_ops); static struct dma_map_ops xen_swiotlb_dma_ops = { - .mapping_error = xen_swiotlb_dma_mapping_error, .alloc = xen_swiotlb_alloc_coherent, .free = xen_swiotlb_free_coherent, .sync_single_for_cpu = xen_swiotlb_sync_single_for_cpu, diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c index 0e98e5d..a9fafb5 100644 --- a/arch/x86/xen/pci-swiotlb-xen.c +++ b/arch/x86/xen/pci-swiotlb-xen.c @@ -19,7 +19,6 @@ int xen_swiotlb __read_mostly; static struct dma_map_ops xen_swiotlb_dma_ops = { - .mapping_error = xen_swiotlb_dma_mapping_error, .alloc = xen_swiotlb_alloc_coherent, .free = xen_swiotlb_free_coherent, .sync_single_for_cpu = xen_swiotlb_sync_single_for_cpu, diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 87e6035..b8014bf 100644 --- a/drivers/xen/swiotlb-xen.c +++ 
b/drivers/xen/swiotlb-xen.c @@ -416,11 +416,12 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, /* * Ensure that the address returned is DMA'ble */ - if (!dma_capable(dev, dev_addr, size)) { - swiotlb_tbl_unmap_single(dev, map, size, dir); - dev_addr = 0; - } - return dev_addr; + if (dma_capable(dev, dev_addr, size)) + return dev_addr; + + swiotlb_tbl_unmap_single(dev, map, size, dir); + + return DMA_ERROR_CODE; } EXPORT_SYMBOL_GPL(xen_swiotlb_map_page); @@ -648,13 +649,6 @@ void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, } EXPORT_SYMBOL_GPL(xen_swiotlb_sync_sg_for_device); -int -xen_swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr) -{ - return !dma_addr; -} -EXPORT_SYMBOL_GPL(xen_swiotlb_dma_mapping_error); - /* * Return whether the given device DMA address mask can be supported * properly. For example, if your device can only drive the low 24-bits diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h index 7c35e27..a0083be 100644 --- a/include/xen/swiotlb-xen.h +++ b/include/xen/swiotlb-xen.h @@ -51,9 +51,6 @@ extern void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, int nelems, enum dma_data_direction dir); extern int -xen_swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr); - -extern int xen_swiotlb_dma_supported(struct device *hwdev, u64 mask); extern int
Re: [PATCH net-next] ibmveth: calculate correct gso_size and set gso_type
>> + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr); > Compiler may optmize this, but maybe move hdr_len to [*] ?> There are other places in the stack where a u16 is used for the same purpose. So I'll rather stick to that convention. I'll make the other formatting changes you suggested and resubmit as v1. Thanks Jon On Tue, Oct 25, 2016 at 9:31 PM, Marcelo Ricardo Leitner wrote: > On Tue, Oct 25, 2016 at 04:13:41PM +1100, Jon Maxwell wrote: >> We recently encountered a bug where a few customers using ibmveth on the >> same LPAR hit an issue where a TCP session hung when large receive was >> enabled. Closer analysis revealed that the session was stuck because the >> one side was advertising a zero window repeatedly. >> >> We narrowed this down to the fact the ibmveth driver did not set gso_size >> which is translated by TCP into the MSS later up the stack. The MSS is >> used to calculate the TCP window size and as that was abnormally large, >> it was calculating a zero window, even although the sockets receive buffer >> was completely empty. >> >> We were able to reproduce this and worked with IBM to fix this. Thanks Tom >> and Marcelo for all your help and review on this. >> >> The patch fixes both our internal reproduction tests and our customers tests. >> >> Signed-off-by: Jon Maxwell >> --- >> drivers/net/ethernet/ibm/ibmveth.c | 19 +++ >> 1 file changed, 19 insertions(+) >> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c >> b/drivers/net/ethernet/ibm/ibmveth.c >> index 29c05d0..3028c33 100644 >> --- a/drivers/net/ethernet/ibm/ibmveth.c >> +++ b/drivers/net/ethernet/ibm/ibmveth.c >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int >> budget) >> int frames_processed = 0; >> unsigned long lpar_rc; >> struct iphdr *iph; >> + bool large_packet = 0; >> + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr); > > Compiler may optmize this, but maybe move hdr_len to [*] ? 
> >> >> restart_poll: >> while (frames_processed < budget) { >> @@ -1236,10 +1238,27 @@ static int ibmveth_poll(struct napi_struct *napi, >> int budget) >> iph->check = 0; >> iph->check = >> ip_fast_csum((unsigned char *)iph, iph->ihl); >> adapter->rx_large_packets++; >> + large_packet = 1; >> } >> } >> } >> >> + if (skb->len > netdev->mtu) { > > [*] > >> + iph = (struct iphdr *)skb->data; >> + if (be16_to_cpu(skb->protocol) == ETH_P_IP && >> iph->protocol == IPPROTO_TCP) { > > The if line above is too long, should be broken in two. > >> + hdr_len += sizeof(struct iphdr); >> + skb_shinfo(skb)->gso_type = >> SKB_GSO_TCPV4; >> + skb_shinfo(skb)->gso_size = >> netdev->mtu - hdr_len; >> + } else if (be16_to_cpu(skb->protocol) == >> ETH_P_IPV6 && >> + iph->protocol == IPPROTO_TCP) { > ^ > And this one should start 3 spaces later, right below be16_ > > Marcelo > >> + hdr_len += sizeof(struct ipv6hdr); >> + skb_shinfo(skb)->gso_type = >> SKB_GSO_TCPV6; >> + skb_shinfo(skb)->gso_size = >> netdev->mtu - hdr_len; >> + } >> + if (!large_packet) >> + adapter->rx_large_packets++; >> + } >> + >> napi_gro_receive(napi, skb);/* send it up */ >> >> netdev->stats.rx_packets++; >> -- >> 1.8.3.1 >>
[PATCH v2] cw1200: fix bogus maybe-uninitialized warning
On x86, the cw1200 driver produces a rather silly warning about the possible use of the 'ret' variable without an initialization presumably after being confused by the architecture specific definition of WARN_ON: drivers/net/wireless/st/cw1200/wsm.c: In function ‘wsm_handle_rx’: drivers/net/wireless/st/cw1200/wsm.c:1457:9: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized] We have already checked that 'count' is larger than 0 here, so we know that 'ret' is initialized. Changing the 'for' loop into do/while also makes this clear to the compiler. Suggested-by: David Laight Signed-off-by: Arnd Bergmann --- drivers/net/wireless/st/cw1200/wsm.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) v2: rewrite based on David Laight's suggestion, the first version was completely wrong. diff --git a/drivers/net/wireless/st/cw1200/wsm.c b/drivers/net/wireless/st/cw1200/wsm.c index 680d60eabc75..ed93bf3474ec 100644 --- a/drivers/net/wireless/st/cw1200/wsm.c +++ b/drivers/net/wireless/st/cw1200/wsm.c @@ -379,7 +379,6 @@ static int wsm_multi_tx_confirm(struct cw1200_common *priv, { int ret; int count; - int i; count = WSM_GET32(buf); if (WARN_ON(count <= 0)) @@ -395,11 +394,10 @@ static int wsm_multi_tx_confirm(struct cw1200_common *priv, } cw1200_debug_txed_multi(priv, count); - for (i = 0; i < count; ++i) { + do { ret = wsm_tx_confirm(priv, buf, link_id); - if (ret) - return ret; - } + } while (!ret && --count); + return ret; underflow: -- 2.9.0
Re: [PATCH] cw1200: fix bogus maybe-uninitialized warning
On Tuesday, October 25, 2016 1:24:55 PM CEST David Laight wrote: > > diff --git a/drivers/net/wireless/st/cw1200/wsm.c > > b/drivers/net/wireless/st/cw1200/wsm.c > > index 680d60eabc75..094e6637ade2 100644 > > --- a/drivers/net/wireless/st/cw1200/wsm.c > > +++ b/drivers/net/wireless/st/cw1200/wsm.c > > @@ -385,14 +385,13 @@ static int wsm_multi_tx_confirm(struct cw1200_common > > *priv, > > if (WARN_ON(count <= 0)) > > return -EINVAL; > > > > - if (count > 1) { > > - /* We already released one buffer, now for the rest */ > > - ret = wsm_release_tx_buffer(priv, count - 1); > > - if (ret < 0) > > - return ret; > > - else if (ret > 0) > > - cw1200_bh_wakeup(priv); > > - } > > + /* We already released one buffer, now for the rest */ > > + ret = wsm_release_tx_buffer(priv, count - 1); > > + if (ret < 0) > > + return ret; > > + > > + if (ret > 0) > > + cw1200_bh_wakeup(priv); > > That doesn't look equivalent to me (when count == 1). Ah, that's what I missed, thanks for pointing that out! > > > > cw1200_debug_txed_multi(priv, count); > > for (i = 0; i < count; ++i) { > > Convert this loop into a do ... while so the body executes at least once. Good idea. Version 2 coming now. Arnd
Re: [PATCH] virtio-net: Update the mtu code to match virtio spec
Aaron Conole writes: >> From: Aaron Conole >> >> The virtio committee recently ratified a change, VIRTIO-152, which >> defines the mtu field to be 'max' MTU, not simply desired MTU. >> >> This commit brings the virtio-net device in compliance with VIRTIO-152. >> >> Additionally, drop the max_mtu branch - it cannot be taken since the u16 >> returned by virtio_cread16 will never exceed the initial value of >> max_mtu. >> >> Cc: "Michael S. Tsirkin" >> Cc: Jarod Wilson >> Signed-off-by: Aaron Conole >> --- > > Sorry about the subject line, David. This is targeted at net-next, and > it appears my from was mangled. Would you like me to resubmit with > these details corrected? I answered my own question. Sorry for the noise.
[PATCH v2 net-next] virtio-net: Update the mtu code to match virtio spec
The virtio committee recently ratified a change, VIRTIO-152, which defines the mtu field to be 'max' MTU, not simply desired MTU. This commit brings the virtio-net device in compliance with VIRTIO-152. Additionally, drop the max_mtu branch - it cannot be taken since the u16 returned by virtio_cread16 will never exceed the initial value of max_mtu. Signed-off-by: Aaron Conole Acked-by: "Michael S. Tsirkin" Acked-by: Jarod Wilson --- Nothing code-wise has changed, but I've included the ACKs and fixed up the subject line. drivers/net/virtio_net.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 720809f..2cafd12 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1870,10 +1870,12 @@ static int virtnet_probe(struct virtio_device *vdev) mtu = virtio_cread16(vdev, offsetof(struct virtio_net_config, mtu)); - if (mtu < dev->min_mtu || mtu > dev->max_mtu) + if (mtu < dev->min_mtu) { __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU); - else + } else { dev->mtu = mtu; + dev->max_mtu = mtu; + } } if (vi->any_header_sg) -- 2.7.4
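The decision in the hunk can be modeled in isolation. In this sketch, struct dev_mtu and apply_advertised_mtu() are hypothetical stand-ins (the real fields live in struct net_device): an advertised MTU below the minimum disables the feature, and anything else becomes both the current and the maximum MTU, following VIRTIO-152's "max MTU" reading of the config field. The dropped `mtu > dev->max_mtu` test was unreachable because a u16 read can never exceed the initial u16 max.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device state; not kernel types. */
struct dev_mtu {
	uint16_t mtu;
	uint16_t min_mtu;
	uint16_t max_mtu;
	int mtu_feature;	/* stands in for VIRTIO_NET_F_MTU */
};

/* Mirrors the patched branch: below-minimum advertisements clear the
 * feature bit; valid ones set both mtu and max_mtu. */
static void apply_advertised_mtu(struct dev_mtu *d, uint16_t mtu)
{
	if (mtu < d->min_mtu) {
		d->mtu_feature = 0;
	} else {
		d->mtu = mtu;
		d->max_mtu = mtu;
	}
}
```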
Re: [PATCH] virtio-net: Update the mtu code to match virtio spec
> From: Aaron Conole > > The virtio committee recently ratified a change, VIRTIO-152, which > defines the mtu field to be 'max' MTU, not simply desired MTU. > > This commit brings the virtio-net device in compliance with VIRTIO-152. > > Additionally, drop the max_mtu branch - it cannot be taken since the u16 > returned by virtio_cread16 will never exceed the initial value of > max_mtu. > > Cc: "Michael S. Tsirkin" > Cc: Jarod Wilson > Signed-off-by: Aaron Conole > --- Sorry about the subject line, David. This is targeted at net-next, and it appears my from was mangled. Would you like me to resubmit with these details corrected? -Aaron
Re: [PATCH net] udp: fix IP_CHECKSUM handling
On Tue, 2016-10-25 at 15:43 -0400, Willem de Bruijn wrote: > On Sun, Oct 23, 2016 at 9:03 PM, Eric Dumazet wrote: > > From: Eric Dumazet > > > > First bug was added in commit ad6f939ab193 ("ip: Add offset parameter to > > ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on > > AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by > > ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr)); > > > > Then commit e6afc8ace6dd ("udp: remove headers from UDP packets before > > queueing") forgot to adjust the offsets now UDP headers are pulled > > before skb are put in receive queue. > > > > Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv") > > Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") > > Signed-off-by: Eric Dumazet > > Cc: Sam Kumar > > Cc: Willem de Bruijn > > --- > > Tom, I would appreciate your feedback on this patch, I presume > > you have tests to verify IP_CHECKSUM feature ? Thanks ! > > > > Tested-by: Willem de Bruijn > > Thanks for fixing, Eric. > > Tested with > https://github.com/wdebruij/kerneltools/blob/master/tests/recv_cmsg_ipchecksum.c Thanks a lot Willem for cooking this test !
[PATCH v2] netfilter: fix type mismatch with error return from nft_parse_u32_check
Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of u32 netlink attributes") introduced nft_parse_u32_check with a return value of "unsigned int", yet on error it returns "-ERANGE". This patch corrects the mismatch by changing the return value to "int", which happens to match the actual users of nft_parse_u32_check already. Found by Coverity, CID 1373930. Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error handling in nft_exthdr_init()") attempted to address the issue, but did not address the return type of nft_parse_u32_check. Signed-off-by: John W. Linville Cc: Laura Garcia Liebana Cc: Pablo Neira Ayuso Cc: Dan Carpenter Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...") --- include/net/netfilter/nf_tables.h | 2 +- net/netfilter/nf_tables_api.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 5031e072567b..da43f50b39c6 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -145,7 +145,7 @@ static inline enum nft_registers nft_type_to_reg(enum nft_data_types type) return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE; } -unsigned int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest); +int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest); unsigned int nft_parse_register(const struct nlattr *attr); int nft_dump_register(struct sk_buff *skb, unsigned int attr, unsigned int reg); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 24db22257586..32fa4f08444a 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4421,7 +4421,7 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx, * Otherwise a 0 is returned and the attribute value is stored in the * destination variable.
*/ -unsigned int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest) +int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest) { u32 val; -- 2.7.4
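The hazard being fixed is easy to demonstrate in miniature (parse_u32_buggy() and parse_u32_fixed() are illustrative names, not the kernel functions): returning -ERANGE through an `unsigned int` yields a huge positive value, so a caller's `< 0` error test can never fire.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Models the pre-patch signature: the negative errno is implicitly
 * converted to a large unsigned value on return. */
static unsigned int parse_u32_buggy(uint32_t val, uint32_t max)
{
	if (val > max)
		return -ERANGE;	/* wraps; a caller's '< 0' test misses it */
	return 0;
}

/* Models the patched signature: the error stays negative, matching how
 * the callers of nft_parse_u32_check already test the result. */
static int parse_u32_fixed(uint32_t val, uint32_t max)
{
	if (val > max)
		return -ERANGE;
	return 0;
}
```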
Re: [PATCH] netfilter: fix type mismatch with error return from nft_parse_u32_check
On Tue, Oct 25, 2016 at 03:08:04PM -0400, John W. Linville wrote: > Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of > u32 netlink attributes") introduced nft_parse_u32_check with a return > value of "unsigned int", yet on error it returns "-ERANGE". > > This patch corrects the mismatch by changing the return value to "int", > which happens to match the actual users of nft_parse_u32_check already. > > Found by Coverity, CID 1373930. > > Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error > handling in nft_exthdr_init()") attempted to address the issue, but > did not address the return type of nft_parse_u32_check. > > Signed-off-by: John W. Linville > Cc: Laura Garcia Liebana > Cc: Pablo Neira Ayuso > Cc: Dan Carpenter > Fixes: 0eadf37afc250 ("netfilter: nf_tables: validate maximum value...") The Fixes line is incorrect -- corrected patch to follow! John -- John W. Linville linvi...@tuxdriver.com Someday the world will need a hero, and you might be all we have. Be ready.
Re: [PATCH net] udp: fix IP_CHECKSUM handling
On Sun, Oct 23, 2016 at 9:03 PM, Eric Dumazet wrote: > From: Eric Dumazet > > First bug was added in commit ad6f939ab193 ("ip: Add offset parameter to > ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on > AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by > ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr)); > > Then commit e6afc8ace6dd ("udp: remove headers from UDP packets before > queueing") forgot to adjust the offsets now UDP headers are pulled > before skb are put in receive queue. > > Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv") > Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") > Signed-off-by: Eric Dumazet > Cc: Sam Kumar > Cc: Willem de Bruijn > --- > Tom, I would appreciate your feedback on this patch, I presume > you have tests to verify IP_CHECKSUM feature ? Thanks ! > Tested-by: Willem de Bruijn Thanks for fixing, Eric. Tested with https://github.com/wdebruij/kerneltools/blob/master/tests/recv_cmsg_ipchecksum.c
[PATCH] netfilter: fix type mismatch with error return from nft_parse_u32_check
Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of u32 netlink attributes") introduced nft_parse_u32_check with a return value of "unsigned int", yet on error it returns "-ERANGE". This patch corrects the mismatch by changing the return value to "int", which happens to match the actual users of nft_parse_u32_check already. Found by Coverity, CID 1373930. Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error handling in nft_exthdr_init()") attempted to address the issue, but did not address the return type of nft_parse_u32_check. Signed-off-by: John W. Linville Cc: Laura Garcia Liebana Cc: Pablo Neira Ayuso Cc: Dan Carpenter Fixes: 0eadf37afc250 ("netfilter: nf_tables: validate maximum value...") --- include/net/netfilter/nf_tables.h | 2 +- net/netfilter/nf_tables_api.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 5031e072567b..da43f50b39c6 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -145,7 +145,7 @@ static inline enum nft_registers nft_type_to_reg(enum nft_data_types type) return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE; } -unsigned int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest); +int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest); unsigned int nft_parse_register(const struct nlattr *attr); int nft_dump_register(struct sk_buff *skb, unsigned int attr, unsigned int reg); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 24db22257586..32fa4f08444a 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4421,7 +4421,7 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx, * Otherwise a 0 is returned and the attribute value is stored in the * destination variable.
*/ -unsigned int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest) +int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest) { u32 val; -- 2.7.4
Re: [PATCH] cw1200: fix bogus maybe-uninitialized warning
On Tue, Oct 25, 2016 at 01:24:55PM +0000, David Laight wrote: > > - if (count > 1) { > > - /* We already released one buffer, now for the rest */ > > - ret = wsm_release_tx_buffer(priv, count - 1); > > - if (ret < 0) > > - return ret; > > - else if (ret > 0) > > - cw1200_bh_wakeup(priv); > > - } > > + /* We already released one buffer, now for the rest */ > > + ret = wsm_release_tx_buffer(priv, count - 1); > > + if (ret < 0) > > + return ret; > > + > > + if (ret > 0) > > + cw1200_bh_wakeup(priv); > > That doesn't look equivalent to me (when count == 1). I concur, this patch should not be applied in its current form. - Solomon -- Solomon Peachy pizza at shaftnet dot org Delray Beach, FL ^^ (email/xmpp) ^^ Quidquid latine dictum sit, altum viditur.
RE: nfs NULL-dereferencing in net-next
>-Original Message- >From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On >Behalf Of Jakub Kicinski >Sent: Monday, October 17, 2016 10:20 PM >To: Andy Adamson ; Anna Schumaker >; linux-...@vger.kernel.org >Cc: netdev@vger.kernel.org; Trond Myklebust >Subject: nfs NULL-dereferencing in net-next > >Hi! > >I'm hitting this reliably on net-next, HEAD at 3f3177bb680f >("fsl/fman: fix error return code in mac_probe()"). I see the same thing. It happens constantly on some of my machines, making them completely unusable. I bisected it and got to the commit: commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274 Author: Andy Adamson Date: Fri Sep 9 09:22:27 2016 -0400 NFS add xprt switch addrs test to match client Signed-off-by: Andy Adamson Signed-off-by: Anna Schumaker > >[ 23.409633] BUG: unable to handle kernel NULL pointer dereference at >0172 >[ 23.418716] IP: [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 >[sunrpc] >[ 23.427574] PGD 859020067 [ 23.430472] PUD 858f2d067 >PMD 0 [ 23.434311] >[ 23.436133] Oops: [#1] PREEMPT SMP >[ 23.440506] Modules linked in: nfsv4 ip6table_filter ip6_tables >iptable_filter >ip_tables ebtable_nat ebtables x_tables intel_ri >[ 23.505915] CPU: 1 PID: 1067 Comm: mount.nfs Not tainted 4.8.0-perf-13951- >g3f3177bb680f #51 >[ 23.515363] Hardware name: Dell Inc. 
PowerEdge T630/0W9WXC, BIOS 1.2.10 >03/10/2015 >[ 23.523937] task: 983e9086ea00 task.stack: ac6c0a57c000 >[ 23.530641] RIP: 0010:[] [] >rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc] >[ 23.542229] RSP: 0018:ac6c0a57fb28 EFLAGS: 00010a97 >[ 23.548255] RAX: c80214ac RBX: 983e97c7b000 RCX: >983e9b3bc180 >[ 23.556320] RDX: 0001 RSI: 983e9928ed28 RDI: >ffea >[ 23.564386] RBP: ac6c0a57fb38 R08: 983e97090630 R09: >983e9928ed30 >[ 23.572452] R10: ac6c0a57fba0 R11: 0010 R12: >ac6c0a57fba0 >[ 23.580517] R13: 983e9928ed28 R14: R15: >983e91360560 >[ 23.588585] FS: 7f4c348aa880() GS:983e9f24() >knlGS: >[ 23.597742] CS: 0010 DS: ES: CR0: 80050033 >[ 23.604251] CR2: 0172 CR3: 000850a5f000 CR4: >001406e0 >[ 23.612316] Stack: >[ 23.614648] 983e97c7b000 ac6c0a57fba0 ac6c0a57fb90 >c04d38c3 >[ 23.623331] 983e91360500 983e9928ed30 c0b9e560 >983e913605b8 >[ 23.632016] 983e9882e800 983e9882e800 ac6c0a57fc30 >ac6c0a57fdb8 >[ 23.640706] Call Trace: >[ 23.643535] [] nfs_get_client+0x123/0x340 [nfs] >[ 23.650542] [] nfs4_set_client+0x80/0xb0 [nfsv4] >[ 23.657642] [] nfs4_create_server+0x115/0x2a0 [nfsv4] >[ 23.665230] [] nfs4_remote_mount+0x2e/0x60 [nfsv4] >[ 23.672519] [] mount_fs+0x3a/0x160 >[ 23.678254] [] ? alloc_vfsmnt+0x19e/0x230 >[ 23.684669] [] vfs_kern_mount+0x67/0x110 >[ 23.690990] [] nfs_do_root_mount+0x84/0xc0 [nfsv4] >[ 23.698284] [] nfs4_try_mount+0x37/0x50 [nfsv4] >[ 23.705287] [] nfs_fs_mount+0x2d1/0xa70 [nfs] >[ 23.712092] [] ? find_next_bit+0x18/0x20 >[ 23.718413] [] ? nfs_remount+0x3c0/0x3c0 [nfs] >[ 23.725316] [] ? nfs_clone_super+0x130/0x130 [nfs] >[ 23.732606] [] mount_fs+0x3a/0x160 >[ 23.738340] [] ? alloc_vfsmnt+0x19e/0x230 >[ 23.744755] [] vfs_kern_mount+0x67/0x110 >[ 23.751071] [] do_mount+0x1bf/0xc70 >[ 23.756904] [] ? 
copy_mount_options+0xbb/0x220 >[ 23.763803] [] SyS_mount+0x83/0xd0 >[ 23.769538] [] entry_SYSCALL_64_fastpath+0x17/0x98 >[ 23.776817] Code: 01 00 48 8b 93 f8 04 00 00 44 89 e6 48 c7 c7 98 b2 43 c0 >e8 9f 0d d4 >f9 eb c0 0f 1f 44 00 00 0f 1f 44 00 00 >[ 23.802909] RIP [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 >[sunrpc] >[ 23.811857] RSP >[ 23.815839] CR2: 0172 >[ 23.819629] ---[ end trace 9958eca92c9eeafe ]--- >[ 23.827345] note: mount.nfs[1067] exited with preempt_count 1
Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
On Sat, 22 Oct 2016 04:07:23 +, Shrijeet Mukherjee wrote: > + act = bpf_prog_run_xdp(xdp_prog, &xdp); > + switch (act) { > + case XDP_PASS: > + return XDP_PASS; > + case XDP_TX: > + case XDP_ABORTED: > + case XDP_DROP: > + return XDP_DROP; > + default: > + bpf_warn_invalid_xdp_action(act); > + } > + } > + return XDP_PASS; FWIW you may want to move the default label before XDP_TX/XDP_ABORT, to get the behaviour to be drop on unknown ret code.
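The suggested reordering is legal C and worth seeing in isolation (filter_verdict() and invalid_warnings are illustrative, not the driver code): with `default` placed before the drop-like labels, an unknown return code is warned about and then falls through to XDP_DROP, instead of sliding out of the switch and being treated as XDP_PASS.

```c
#include <assert.h>

enum xdp_act { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX };

static int invalid_warnings;	/* stands in for bpf_warn_invalid_xdp_action() */

/* Sketch of the rearranged switch: unknown codes hit 'default',
 * record a warning, and fall through into the drop cases. */
static enum xdp_act filter_verdict(int act)
{
	switch (act) {
	case XDP_PASS:
		return XDP_PASS;
	default:
		invalid_warnings++;
		/* fall through */
	case XDP_TX:
	case XDP_ABORTED:
	case XDP_DROP:
		return XDP_DROP;
	}
}
```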
Re: [PATCH 0/2] at803x: don't power-down SGMII link
Zefir Kurtisi wrote: In a device where the ar8031 operates in SGMII mode, we observed that after a suspend-resume cycle in very rare cases the copper side autonegotiation succeeds but the SGMII side fails to come up. As a work-around, a patch was provided that on suspend and resume powers the SGMII link down and up along with the copper side. This fixed the observed failure, but introduced a regression Timur Tabi observed: once the SGMII is powered down, the PHY is inaccessible by the CPU and with that e.g. can't be re-initialized after suspend. Since the original issue could not be reproduced by others, this series provides an alternative handling: * the first patch reverts the previous fix that powers down SGMII * the second patch adds double-checking for the observed failure condition Zefir Kurtisi (2): Revert "at803x: fix suspend/resume for SGMII link" at803x: double check SGMII side autoneg Tested-by: Timur Tabi With these patches, the problem I was seeing no longer occurs, and the new code does not appear to break anything. As before, I still have never seen the original problem, but this patchset seems to work for both of us. -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH] virtio-net: Update the mtu code to match virtio spec
On Tue, Oct 25, 2016 at 12:35:35PM -0400, Aaron Conole wrote: > From: Aaron Conole > > The virtio committee recently ratified a change, VIRTIO-152, which > defines the mtu field to be 'max' MTU, not simply desired MTU. > > This commit brings the virtio-net device in compliance with VIRTIO-152. > > Additionally, drop the max_mtu branch - it cannot be taken since the u16 > returned by virtio_cread16 will never exceed the initial value of > max_mtu. > > Cc: "Michael S. Tsirkin" > Cc: Jarod Wilson > Signed-off-by: Aaron Conole Worksforme. Acked-by: Jarod Wilson -- Jarod Wilson ja...@redhat.com
[PATCH] net: bonding: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes --- drivers/net/bonding/bond_main.c | 16 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index c9944d8..5708f17 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4080,16 +4080,16 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev) return ret; } -static int bond_ethtool_get_settings(struct net_device *bond_dev, -struct ethtool_cmd *ecmd) +static int bond_ethtool_get_link_ksettings(struct net_device *bond_dev, + struct ethtool_link_ksettings *cmd) { struct bonding *bond = netdev_priv(bond_dev); unsigned long speed = 0; struct list_head *iter; struct slave *slave; - ecmd->duplex = DUPLEX_UNKNOWN; - ecmd->port = PORT_OTHER; + cmd->base.duplex = DUPLEX_UNKNOWN; + cmd->base.port = PORT_OTHER; /* Since bond_slave_can_tx returns false for all inactive or down slaves, we * do not need to check mode. Though link speed might not represent @@ -4100,12 +4100,12 @@ static int bond_ethtool_get_settings(struct net_device *bond_dev, if (bond_slave_can_tx(slave)) { if (slave->speed != SPEED_UNKNOWN) speed += slave->speed; - if (ecmd->duplex == DUPLEX_UNKNOWN && + if (cmd->base.duplex == DUPLEX_UNKNOWN && slave->duplex != DUPLEX_UNKNOWN) - ecmd->duplex = slave->duplex; + cmd->base.duplex = slave->duplex; } } - ethtool_cmd_speed_set(ecmd, speed ? : SPEED_UNKNOWN); + cmd->base.speed = speed ? : SPEED_UNKNOWN; return 0; } @@ -4121,8 +4121,8 @@ static void bond_ethtool_get_drvinfo(struct net_device *bond_dev, static const struct ethtool_ops bond_ethtool_ops = { .get_drvinfo= bond_ethtool_get_drvinfo, - .get_settings = bond_ethtool_get_settings, .get_link = ethtool_op_get_link, + .get_link_ksettings = bond_ethtool_get_link_ksettings, }; static const struct net_device_ops bond_netdev_ops = { -- 1.7.4.4
Re: [PATCH] virtio-net: Update the mtu code to match virtio spec
On Tue, Oct 25, 2016 at 12:35:35PM -0400, Aaron Conole wrote: > From: Aaron Conole > > The virtio committee recently ratified a change, VIRTIO-152, which > defines the mtu field to be 'max' MTU, not simply desired MTU. > > This commit brings the virtio-net device in compliance with VIRTIO-152. > > Additionally, drop the max_mtu branch - it cannot be taken since the u16 > returned by virtio_cread16 will never exceed the initial value of > max_mtu. > > Cc: "Michael S. Tsirkin" > Cc: Jarod Wilson > Signed-off-by: Aaron Conole Acked-by: Michael S. Tsirkin > --- > drivers/net/virtio_net.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 720809f..2cafd12 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -1870,10 +1870,12 @@ static int virtnet_probe(struct virtio_device *vdev) > mtu = virtio_cread16(vdev, >offsetof(struct virtio_net_config, > mtu)); > - if (mtu < dev->min_mtu || mtu > dev->max_mtu) > + if (mtu < dev->min_mtu) { > __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU); > - else > + } else { > dev->mtu = mtu; > + dev->max_mtu = mtu; > + } > } > > if (vi->any_header_sg) > -- > 2.7.4
[PATCH] virtio-net: Update the mtu code to match virtio spec
From: Aaron Conole The virtio committee recently ratified a change, VIRTIO-152, which defines the mtu field to be 'max' MTU, not simply desired MTU. This commit brings the virtio-net device in compliance with VIRTIO-152. Additionally, drop the max_mtu branch - it cannot be taken since the u16 returned by virtio_cread16 will never exceed the initial value of max_mtu. Cc: "Michael S. Tsirkin" Cc: Jarod Wilson Signed-off-by: Aaron Conole --- drivers/net/virtio_net.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 720809f..2cafd12 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1870,10 +1870,12 @@ static int virtnet_probe(struct virtio_device *vdev) mtu = virtio_cread16(vdev, offsetof(struct virtio_net_config, mtu)); - if (mtu < dev->min_mtu || mtu > dev->max_mtu) + if (mtu < dev->min_mtu) { __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU); - else + } else { dev->mtu = mtu; + dev->max_mtu = mtu; + } } if (vi->any_header_sg) -- 2.7.4
[PATCH net] sctp: validate chunk len before actually using it
Andrey Konovalov reported that KASAN detected that SCTP was using a slab beyond the boundaries. It was caused because when handling out of the blue packets in function sctp_sf_ootb() it was checking the chunk len only after already processing the first chunk, validating only for the 2nd and subsequent ones. The fix is to just move the check upwards so it's also validated for the 1st chunk. Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: Marcelo Ricardo Leitner --- Hi. Please consider this to -stable too. Thanks net/sctp/sm_statefuns.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 026e3bca4a94bd34b418d5e6947f7182c1512358..8ec20a64a3f8055a0c3576627c5ec5dad7e99ca8 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -3422,6 +3422,12 @@ sctp_disposition_t sctp_sf_ootb(struct net *net, return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, commands); + /* Report violation if chunk len overflows */ + ch_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length)); + if (ch_end > skb_tail_pointer(skb)) + return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, + commands); + /* Now that we know we at least have a chunk header, * do things that are type appropriate. */ @@ -3453,12 +3459,6 @@ sctp_disposition_t sctp_sf_ootb(struct net *net, } } - /* Report violation if chunk len overflows */ - ch_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length)); - if (ch_end > skb_tail_pointer(skb)) - return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, - commands); - ch = (sctp_chunkhdr_t *) ch_end; } while (ch_end < skb_tail_pointer(skb)); -- 2.7.4
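The essence of the fix is a bounds check that must run before the chunk is touched. This hedged sketch (chunk_fits() is a hypothetical helper; PAD4 mirrors SCTP_PAD4's round-up-to-4 behavior) models the moved test, which after the patch also covers the first chunk:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors SCTP_PAD4(): chunk lengths are padded to 4-byte multiples. */
#define PAD4(len) (((len) + 3U) & ~3U)

/* Models the relocated check: the declared chunk length must fit
 * within the buffer *before* the chunk is inspected; this corresponds
 * to 'if (ch_end > skb_tail_pointer(skb)) -> violation' in the patch. */
static int chunk_fits(const uint8_t *ch, uint16_t declared_len,
		      const uint8_t *tail)
{
	const uint8_t *ch_end = ch + PAD4(declared_len);

	return ch_end <= tail;
}
```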
RE: [PATCH] netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning
From: Arnd Bergmann > Sent: 24 October 2016 21:22 > On Monday, October 24, 2016 10:47:54 PM CEST Julian Anastasov wrote: > > > diff --git a/net/netfilter/ipvs/ip_vs_sync.c > > > b/net/netfilter/ipvs/ip_vs_sync.c > > > index 1b07578bedf3..9350530c16c1 100644 > > > --- a/net/netfilter/ipvs/ip_vs_sync.c > > > +++ b/net/netfilter/ipvs/ip_vs_sync.c > > > @@ -283,6 +283,7 @@ struct ip_vs_sync_buff { > > > */ > > > static void ntoh_seq(struct ip_vs_seq *no, struct ip_vs_seq *ho) > > > { > > > + memset(ho, 0, sizeof(*ho)); > > > ho->init_seq = get_unaligned_be32(&no->init_seq); > > > ho->delta = get_unaligned_be32(&no->delta); > > > ho->previous_delta = get_unaligned_be32(&no->previous_delta); > > > > So, now there is a double write here? > > Correct. I would hope that a sane version of gcc would just not > perform the first write. What happens instead is that the version > that produces the warning here moves the initialization to the > top of the calling function. Maybe doing the 3 get_unaligned_be32() before the memset will stop the double-writes. The problem is that the compiler doesn't know that the two structures don't alias each other. David
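Both suggestions in the thread can be combined in one hedged sketch (struct seq_state and ntoh_seq_sketch() are simplified stand-ins, not the ipvs types): reading all source fields into locals first makes the memset safe even if the two pointers alias, and zeroing the destination lets the compiler see every byte of *ho written, which is what silences the bogus maybe-uninitialized warning in the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct ip_vs_seq: only the three fields
 * the conversion touches. */
struct seq_state {
	uint32_t init_seq;
	uint32_t delta;
	uint32_t previous_delta;
};

/* All reads happen before any write, so 'no' aliasing 'ho' is safe;
 * the memset is redundant at runtime but proves full initialization
 * to the compiler. */
static void ntoh_seq_sketch(struct seq_state *ho, const struct seq_state *no)
{
	uint32_t a = no->init_seq;
	uint32_t b = no->delta;
	uint32_t c = no->previous_delta;

	memset(ho, 0, sizeof(*ho));
	ho->init_seq = a;
	ho->delta = b;
	ho->previous_delta = c;
}
```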