[PATCH net-next] cxgb4: Clear On FLASH config file after a FW upgrade
From: Arjun Vynipadath

Because Firmware and the Firmware Configuration File need to be in
sync; clear out any On-FLASH Firmware Configuration File when new
Firmware is loaded. This will avoid difficult to diagnose and fix
problems with a mis-matched Firmware Configuration File which prevents
the adapter from being initialized.

Original work by: Casey Leedom

Signed-off-by: Arjun Vynipadath
Signed-off-by: Ganesh Goudar
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 70 ++
 2 files changed, 71 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 1978abb..daa3775 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1405,6 +1405,7 @@ int t4_fw_upgrade(struct adapter *adap, unsigned int mbox,
 int t4_fl_pkt_align(struct adapter *adap);
 unsigned int t4_flash_cfg_addr(struct adapter *adapter);
 int t4_check_fw_version(struct adapter *adap);
+int t4_load_cfg(struct adapter *adapter, const u8 *cfg_data, unsigned int size);
 int t4_get_fw_version(struct adapter *adapter, u32 *vers);
 int t4_get_bs_version(struct adapter *adapter, u32 *vers);
 int t4_get_tp_version(struct adapter *adapter, u32 *vers);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 24087c8..fff8fba 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -6664,6 +6664,17 @@ int t4_fw_upgrade(struct adapter *adap, unsigned int mbox,
 		goto out;
 
 	/*
+	 * If there was a Firmware Configuration File stored in FLASH,
+	 * there's a good chance that it won't be compatible with the new
+	 * Firmware. In order to prevent difficult to diagnose adapter
+	 * initialization issues, we clear out the Firmware Configuration File
+	 * portion of the FLASH. The user will need to re-FLASH a new
+	 * Firmware Configuration File which is compatible with the new
+	 * Firmware if that's desired.
+	 */
+	(void)t4_load_cfg(adap, NULL, 0);
+
+	/*
 	 * Older versions of the firmware don't understand the new
 	 * PCIE_FW.HALT flag and so won't know to perform a RESET when they
 	 * restart. So for newly loaded older firmware we'll have to do the
@@ -8896,6 +8907,65 @@ void t4_idma_monitor(struct adapter *adapter,
 }
 
 /**
+ *	t4_load_cfg - download config file
+ *	@adap: the adapter
+ *	@cfg_data: the cfg text file to write
+ *	@size: text file size
+ *
+ *	Write the supplied config text file to the card's serial flash.
+ */
+int t4_load_cfg(struct adapter *adap, const u8 *cfg_data, unsigned int size)
+{
+	int ret, i, n, cfg_addr;
+	unsigned int addr;
+	unsigned int flash_cfg_start_sec;
+	unsigned int sf_sec_size = adap->params.sf_size / adap->params.sf_nsec;
+
+	cfg_addr = t4_flash_cfg_addr(adap);
+	if (cfg_addr < 0)
+		return cfg_addr;
+
+	addr = cfg_addr;
+	flash_cfg_start_sec = addr / SF_SEC_SIZE;
+
+	if (size > FLASH_CFG_MAX_SIZE) {
+		dev_err(adap->pdev_dev, "cfg file too large, max is %u bytes\n",
+			FLASH_CFG_MAX_SIZE);
+		return -EFBIG;
+	}
+
+	i = DIV_ROUND_UP(FLASH_CFG_MAX_SIZE,	/* # of sectors spanned */
+			 sf_sec_size);
+	ret = t4_flash_erase_sectors(adap, flash_cfg_start_sec,
+				     flash_cfg_start_sec + i - 1);
+	/* If size == 0 then we're simply erasing the FLASH sectors associated
+	 * with the on-adapter Firmware Configuration File.
+	 */
+	if (ret || size == 0)
+		goto out;
+
+	/* this will write to the flash up to SF_PAGE_SIZE at a time */
+	for (i = 0; i < size; i += SF_PAGE_SIZE) {
+		if ((size - i) < SF_PAGE_SIZE)
+			n = size - i;
+		else
+			n = SF_PAGE_SIZE;
+		ret = t4_write_flash(adap, addr, n, cfg_data);
+		if (ret)
+			goto out;
+
+		addr += SF_PAGE_SIZE;
+		cfg_data += SF_PAGE_SIZE;
+	}
+
+out:
+	if (ret)
+		dev_err(adap->pdev_dev, "config file %s failed %d\n",
+			(size == 0 ? "clear" : "download"), ret);
+	return ret;
+}
+
+/**
  *	t4_set_vf_mac - Set MAC address for the specified VF
  *	@adapter: The adapter
  *	@vf: one of the VFs instantiated by the specified PF
-- 
2.1.0
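For readers following the erase-then-write flow of t4_load_cfg(), here is a minimal userspace sketch of the same pattern: erase the region first, return early when size is 0, otherwise write in page-sized chunks. Everything prefixed demo_/DEMO_ is hypothetical (an in-memory array stands in for the flash, and the sizes are made up); it is not the driver's code and uses none of the cxgb4 helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for illustration; not taken from the driver. */
#define DEMO_PAGE_SIZE		256u
#define DEMO_CFG_MAX_SIZE	1024u

/* In-memory stand-in for the config region of the flash. */
static unsigned char demo_flash[DEMO_CFG_MAX_SIZE];
static unsigned int demo_write_calls;

/* Stand-in for a page write: copies at most one page to offset addr. */
static int demo_write_flash(unsigned int addr, unsigned int n,
			    const unsigned char *data)
{
	assert(data != NULL);
	if (n > DEMO_PAGE_SIZE || addr > DEMO_CFG_MAX_SIZE ||
	    n > DEMO_CFG_MAX_SIZE - addr)
		return -1;
	memcpy(&demo_flash[addr], data, n);
	demo_write_calls++;
	return 0;
}

/* Erase the region, then (if size != 0) write data one page at a time. */
int demo_load_cfg(const unsigned char *data, unsigned int size)
{
	unsigned int addr = 0, i, n;
	int ret;

	if (size > DEMO_CFG_MAX_SIZE)
		return -1;

	/* "Erase": flash reads back as 0xff after an erase. */
	memset(demo_flash, 0xff, sizeof(demo_flash));
	demo_write_calls = 0;

	/* size == 0 means the caller only wanted the region cleared. */
	if (size == 0)
		return 0;

	for (i = 0; i < size; i += DEMO_PAGE_SIZE) {
		/* The last chunk may be shorter than a full page. */
		n = (size - i < DEMO_PAGE_SIZE) ? size - i : DEMO_PAGE_SIZE;
		ret = demo_write_flash(addr, n, data);
		if (ret)
			return ret;
		addr += n;
		data += n;
	}
	return 0;
}
```

Calling demo_load_cfg(NULL, 0) corresponds to the "clear only" use that the patch adds to t4_fw_upgrade(). The sketch advances by n rather than by a full page so that the data pointer never moves past the end of the caller's buffer.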
Re: [PATCHv2 net] net: sched: set xt_tgchk_param par.net properly in ipt_init_target
Tue, Aug 08, 2017 at 04:13:27AM CEST, lucien@gmail.com wrote:
>Now xt_tgchk_param par in ipt_init_target is a local variable,
>par.net is not initialized there. Later when xt_check_target
>calls target's checkentry in which it may access par.net, it
>would cause kernel panic.
>
>Jaroslav found this panic when running:
>
>  # ip link add TestIface type dummy
>  # tc qd add dev TestIface ingress handle :
>  # tc filter add dev TestIface parent : u32 match u32 0 0 \
>    action xt -j CONNMARK --set-mark 4
>
>This patch is to pass net param into ipt_init_target and set
>par.net with it properly in there.
>
>v1->v2:
>  As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
>  it by also passing net_id to __tcf_ipt_init.
>
>Reported-by: Jaroslav Aster
>Signed-off-by: Xin Long

Fixes what? You need to have "Fixes" tag for net patches.
Re: [Patch net-next] net_sched: get rid of some forward declarations
Tue, Aug 08, 2017 at 12:26:50AM CEST, xiyou.wangc...@gmail.com wrote:
>If we move up tcf_fill_node() we can get rid of these
>forward declarations.
>
>Also, move down tfilter_notify_chain() to group them together.
>
>Reported-by: Jamal Hadi Salim
>Cc: Jamal Hadi Salim
>Signed-off-by: Cong Wang

Acked-by: Jiri Pirko
Re: [PATCH] net: Reduce skb_warn_bad_offload() noise.
Hi Willem,

There is another case that also triggers the warning. The test
topology is shown below.

VM01: veth1 and eth0 in VM01 are attached to the OVS bridge br0.

  veth0 (IP: 172.16.34.100/24) -- veth1--br0--eth0
  iperf3 -c 172.168.100.13 -i 1 -P 10 -t 10 -u -b 1000M -l 10K

VM02:

  eth0 (IP: 172.16.34.200/24)
  iperf3 -s

The warning is shown below [1]. If we change CHECKSUM_NONE to
CHECKSUM_UNNECESSARY in udp4_ufo_fragment(), we should also add a
check in skb_needs_check() when outputting a packet.

diff --git a/net/core/dev.c b/net/core/dev.c
index 416137c..8fe12a7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2670,6 +2670,7 @@ static inline bool skb_needs_check(struct sk_buff *skb, bool tx_path)
 {
 	if (tx_path)
 		return skb->ip_summed != CHECKSUM_PARTIAL &&
+		       skb->ip_summed != CHECKSUM_UNNECESSARY &&
 		       skb->ip_summed != CHECKSUM_NONE;
 
 	return skb->ip_summed == CHECKSUM_NONE;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 7812501..0932c85 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -235,7 +235,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 	if (uh->check == 0)
 		uh->check = CSUM_MANGLED_0;
 
-	skb->ip_summed = CHECKSUM_NONE;
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
 
 	/* If there is no outer header we can fake a checksum offload
 	 * due to the fact that we have already done the checksum in

[1]:
[ 1291.596232] vmxnet3: caps=(0x006000214ba9, 0x) len=10282 data_len=10240 gso_size=1480 gso_type=2 ip_summed=1
[ 1291.596239] [ cut here ]
[ 1291.596242] WARNING: CPU: 1 PID: 2203 at net/core/dev.c:2564 skb_warn_bad_offload+0xd3/0xde
[ 1291.596242] Modules linked in: veth udp_tunnel gre openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack cfg80211 rfkill ext4 jbd2 mbcache sb_edac coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper ppdev vmw_balloon cryptd vmw_vmci sg i2c_piix4 pcspkr parport_pc parport shpchp ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw vmxnet3 ata_piix mptspi libata scsi_transport_spi mptscsih mptbase i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
[ 1291.596280] CPU: 1 PID: 2203 Comm: iperf3 Tainted: GW 4.12.0+ #1
[ 1291.596280] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[ 1291.596281] task: 8be36f5e9680 task.stack: b6c840d44000
[ 1291.596283] RIP: 0010:skb_warn_bad_offload+0xd3/0xde
[ 1291.596284] RSP: 0018:8be3796438c8 EFLAGS: 00010246
[ 1291.596285] RAX: 0074 RBX: 8be363ff9f00 RCX: 0006
[ 1291.596286] RDX: RSI: 0086 RDI: 8be37964e0a0
[ 1291.596287] RBP: 8be3796438f0 R08: R09: 0bf2
[ 1291.596287] R10: 0004 R11: 0bf1 R12: 8be3660f
[ 1291.596288] R13: 0001 R14: R15: 8be3660f
[ 1291.596289] FS: 7f8d93bdb740() GS:8be37964() knlGS:
[ 1291.596290] CS: 0010 DS: ES: CR0: 80050033
[ 1291.596291] CR2: 555f94a6e0c8 CR3: 00012b84a000 CR4: 001406e0
[ 1291.596296] Call Trace:
[ 1291.596297]
[ 1291.596300] __skb_gso_segment+0x15d/0x170
[ 1291.596301] validate_xmit_skb+0x12d/0x2b0
[ 1291.596303] validate_xmit_skb_list+0x42/0x70
[ 1291.596306] sch_direct_xmit+0xd0/0x1b0
[ 1291.596308] __dev_queue_xmit+0x42e/0x630
[ 1291.596310] ? hrtimer_interrupt+0xd2/0x1a0
[ 1291.596312] dev_queue_xmit+0x10/0x20
[ 1291.596315] ovs_vport_send+0xc2/0x150 [openvswitch]
[ 1291.596318] do_output+0x53/0xf0 [openvswitch]
[ 1291.596322] do_execute_actions+0x9bc/0x9d0 [openvswitch]
[ 1291.596324] ? __bpf_prog_run+0x385/0x1310
[ 1291.596327] ovs_execute_actions+0x40/0x120 [openvswitch]
[ 1291.596330] ovs_dp_process_packet+0x84/0x120 [openvswitch]
[ 1291.596333] ? ovs_ct_update_key+0x9a/0xe0 [openvswitch]
[ 1291.596336] ovs_vport_receive+0x73/0xd0 [openvswitch]
[ 1291.596339] ? handle_irq_event_percpu+0x54/0x80
[ 1291.596340] ? handle_irq_event+0x46/0x60
[ 1291.596342] ? handle_edge_irq+0x8d/0x130
[ 1291.596344] ? handle_irq+0xab/0x120
[ 1291.596346] ? irq_exit+0x77/0xf0
[ 1291.596348] ? do_IRQ+0x51/0xd0
[ 1291.596352] netdev_frame_hook+0xd3/0x160 [openvswitch]
[ 1291.596355] __netif_receive_skb_core+0x1da/0x9e0
[ 1291.596358] ? vport_netdev_free+0x30/0x30 [openvswitch]
[ 1291.596360] ? kfree_skbmem+0x5a/0x60
[ 1291.596361] ? consume_skb+0x34/0x90
[ 1291.596363] __netif_receive_skb+0x18/0x60
[ 1291.596365] process_backlog+0x95/0x140
[ 1291.596367] net_rx_action+0x26c/0x3b0
[ 1291.596369] __do_softirq+0xc9/0x269
[ 1291.596371]
[PATCH net-next] openvswitch: add NSH support
The OVS master and 2.8 branches have merged the NSH userspace patch
series. This patch enables NSH support in the kernel datapath so that
OVS can support NSH in the 2.8 release in compat mode by porting this.

Signed-off-by: Yi Yang
---
 drivers/net/vxlan.c              |   7 ++
 include/net/nsh.h                | 126 ++
 include/uapi/linux/openvswitch.h |  33
 net/openvswitch/actions.c        | 165 +++
 net/openvswitch/flow.c           |  41 ++
 net/openvswitch/flow.h           |   1 +
 net/openvswitch/flow_netlink.c   |  54 -
 7 files changed, 426 insertions(+), 1 deletion(-)
 create mode 100644 include/net/nsh.h

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index dbca067..843714c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include
@@ -1267,6 +1268,9 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
 	case VXLAN_GPE_NP_IPV6:
 		*protocol = htons(ETH_P_IPV6);
 		break;
+	case VXLAN_GPE_NP_NSH:
+		*protocol = htons(ETH_P_NSH);
+		break;
 	case VXLAN_GPE_NP_ETHERNET:
 		*protocol = htons(ETH_P_TEB);
 		break;
@@ -1806,6 +1810,9 @@ static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags,
 	case htons(ETH_P_IPV6):
 		gpe->next_protocol = VXLAN_GPE_NP_IPV6;
 		return 0;
+	case htons(ETH_P_NSH):
+		gpe->next_protocol = VXLAN_GPE_NP_NSH;
+		return 0;
 	case htons(ETH_P_TEB):
 		gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
 		return 0;
diff --git a/include/net/nsh.h b/include/net/nsh.h
new file mode 100644
index 000..96477a1
--- /dev/null
+++ b/include/net/nsh.h
@@ -0,0 +1,126 @@
+#ifndef __NET_NSH_H
+#define __NET_NSH_H 1
+
+
+/*
+ * Network Service Header:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Ver|O|C|R|R|R|R|R|R|   Length  |    MD Type    |  Next Proto   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                Service Path ID                | Service Index |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                                                               |
+ * ~               Mandatory/Optional Context Header               ~
+ * |                                                               |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * Ver = The version field is used to ensure backward compatibility
+ *       going forward with future NSH updates. It MUST be set to 0x0
+ *       by the sender, in this first revision of NSH.
+ *
+ * O = OAM. when set to 0x1 indicates that this packet is an operations
+ *     and management (OAM) packet. The receiving SFF and SFs nodes
+ *     MUST examine the payload and take appropriate action.
+ *
+ * C = context. Indicates that a critical metadata TLV is present.
+ *
+ * Length : total length, in 4-byte words, of NSH including the Base
+ *          Header, the Service Path Header and the optional variable
+ *          TLVs.
+ * MD Type: indicates the format of NSH beyond the mandatory Base Header
+ *          and the Service Path Header.
+ *
+ * Next Protocol: indicates the protocol type of the original packet. A
+ *          new IANA registry will be created for protocol type.
+ *
+ * Service Path Identifier (SPI): identifies a service path.
+ *          Participating nodes MUST use this identifier for Service
+ *          Function Path selection.
+ *
+ * Service Index (SI): provides location within the SFP.
+ *
+ * [0] https://tools.ietf.org/html/draft-ietf-sfc-nsh-13
+ */
+
+/**
+ * struct nsh_md1_ctx - Keeps track of NSH context data
+ * @nshc<1-4>: NSH Contexts.
+ */
+struct nsh_md1_ctx {
+	__be32 c[4];
+};
+
+struct nsh_md2_tlv {
+	__be16 md_class;
+	u8 type;
+	u8 length;
+	u8 md_value[];
+};
+
+struct nsh_hdr {
+	__be16 ver_flags_len;
+	u8 md_type;
+	u8 next_proto;
+	__be32 path_hdr;
+	union {
+		struct nsh_md1_ctx md1;
+		struct nsh_md2_tlv md2[0];
+	};
+};
+
+/* Masking NSH header fields. */
+#define NSH_VER_MASK       0xc000
+#define NSH_VER_SHIFT      14
+#define NSH_FLAGS_MASK     0x3fc0
+#define NSH_FLAGS_SHIFT    6
+#define NSH_LEN_MASK       0x003f
+#define NSH_LEN_SHIFT      0
+
+#define NSH_SPI_MASK       0xff00
+#define NSH_SPI_SHIFT      8
+#define NSH_SI_MASK        0x00ff
+#define NSH_SI_SHIFT       0
+
+#define NSH_DST_PORT    4790     /* UDP Port for NSH on VXLAN. */
+#define ETH_P_NSH       0x894F   /* Ethertype for NSH. */
+
+/* NSH Base Header Next Protocol. */
+#define NSH_P_IPV4        0x01
+#define NSH_P_IPV6        0x02
+#define NSH_P_ETHERNET    0x03
+#define NSH_P_NSH
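As a rough illustration of how the ver_flags_len masks quoted above are meant to be applied, the following userspace sketch splits the first two bytes of an NSH base header into version, flags and length. The struct demo_nsh_base type and the demo_ helpers are hypothetical and do not appear in the patch; only the six NSH_*_MASK / NSH_*_SHIFT values for ver_flags_len are copied from it. The input is taken as two bytes in network order, so no byte-order helper is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks and shifts for ver_flags_len, as quoted in the patch. */
#define NSH_VER_MASK	0xc000
#define NSH_VER_SHIFT	14
#define NSH_FLAGS_MASK	0x3fc0
#define NSH_FLAGS_SHIFT	6
#define NSH_LEN_MASK	0x003f
#define NSH_LEN_SHIFT	0

/* Hypothetical decoded view of the first 16 bits of the base header. */
struct demo_nsh_base {
	uint8_t ver;		/* 2-bit version */
	uint8_t flags;		/* 8 flag bits: O, C, then reserved */
	uint8_t len_words;	/* total NSH length in 4-byte words */
};

/* Decode the first two header bytes, given in network (big-endian) order. */
struct demo_nsh_base demo_nsh_decode(const uint8_t hdr[2])
{
	struct demo_nsh_base base;
	uint16_t v;

	assert(hdr != NULL);
	v = (uint16_t)((hdr[0] << 8) | hdr[1]);	/* network to host order */

	base.ver = (uint8_t)((v & NSH_VER_MASK) >> NSH_VER_SHIFT);
	base.flags = (uint8_t)((v & NSH_FLAGS_MASK) >> NSH_FLAGS_SHIFT);
	base.len_words = (uint8_t)((v & NSH_LEN_MASK) >> NSH_LEN_SHIFT);
	return base;
}

/* The Length field counts 4-byte words, so convert to bytes. */
unsigned int demo_nsh_len_bytes(struct demo_nsh_base base)
{
	return (unsigned int)base.len_words * 4u;
}
```

With the layout in the header comment, a Length of 6 corresponds to a 24-byte header, and the O bit lands in the top bit of the decoded flags byte.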
Re: [PATCH 0/6] In-kernel QMI handling
On Mon 07 Aug 12:19 PDT 2017, Marcel Holtmann wrote:

> Hi Bjorn,
>
> >>> This series starts by moving the common definitions of the QMUX
> >>> protocol to the uapi header, as they are shared with clients - both
> >>> in kernel and userspace.
> >>>
> >>> This series then introduces in-kernel helper functions for aiding the
> >>> handling of QMI encoded messages in the kernel. QMI encoding is a
> >>> wire-format used in exchanging messages between the majority of QRTR
> >>> clients and services.
> >>
> >> This raises a few red-flags for me.
> >
> > I'm glad it does. In discussions with the responsible team within
> > Qualcomm I've highlighted a number of concerns about enabling this
> > support in the kernel. Together we're continuously looking into what
> > should be pushed out to user space, and trying to not introduce
> > unnecessary new users.
> >
> >> So far, we've kept almost everything QMI related in userspace and
> >> handled all QMI control-channel messages from libraries like libqmi or
> >> uqmi via the cdc-wdm driver and the "rmnet" interface via the qmi_wwan
> >> driver. The kernel drivers just serve as the transport.
> >>
> >
> > The path that was taken to support the MSM-style devices was to
> > implement net/qrtr, which exposes a socket interface to abstract the
> > physical transports (QMUX or IPCROUTER in Qualcomm terminology).
> >
> > As I share you view on letting the kernel handle the transportation only
> > the task of keeping track of registered services (service id -> node and
> > port mapping) was done in a user space process and so far we've only
> > ever have to deal with QMI encoded messages in various user space tools.
>
> I think that the transport and multiplexing can be in the kernel as
> long as it is done as proper subsystem. Similar to Phonet or CAIF.
> Meaning it should have a well defined socket interface that can be
> easily used from userspace, but also a clean in-kernel interface
> handling.
>

In a mobile Qualcomm device there are a few different components
involved here: message routing, the QMUX protocol and QMI-encoding.

The downstream Qualcomm kernel implements the first two in the
IPCROUTER; upstream this is split between the kernel net/qrtr and a
user space service-register implementing the QMUX protocol for knowing
where services are located.

The common encoding of messages passed between endpoints of the message
routing is QMI, which is left entirely to each client.

> If Qualcomm is supportive of this effort and is willing to actually
> assist and/or open some of the specs or interface descriptions, then
> this is a good thing. Service registration and cleanup is really done
> best in the kernel. Same applies to multiplexing. Trying to do
> multiplexing in userspace is always cumbersome and leads to overhead
> that is of no gain. For example within oFono, we had to force
> everything to go via oFono since it was the only sane way of handling
> it. Other approaches were error prone and full of race conditions. You
> need a central entity that can clean up.
>

The current upstream solution depends on a collaboration between
net/qrtr and the user space service register for figuring out whom to
send messages to. After that, muxing et al is handled by the socket
interface and the service registry does not need to be involved.

Qualcomm is very supportive of this solution and we're collaborating on
transitioning "downstream" to use this implementation.

> For the definition of an UAPI to share some code, I am actually not
> sure that is such a good idea. For example the QMI code in oFono
> follows a way simpler approach. And I am not convinced that all the
> macros are actually beneficial. For example, the whole netlink macros
> are pretty cumbersome. Adding some Documentation/qmi.txt on how the
> wire format looks like and what is expected seems to be a way better
> approach.
>

The socket interface provided by the kernel expects some knowledge of
the QMUX protocol, for service management. The majority of this
knowledge is already public, but I agree that it would be good to
gather this in a document. The common data structure for the control
message is what I've put in the uapi, as this is used by anyone dealing
with control messages.

When it comes to the QMI-encoded messages these are application
specific, just like e.g. protobuf definitions are application specific.

As the core infrastructure is becoming available upstream and boards
like the DB410c and DB820c aim to be supported by open solutions we
will have a natural place to discuss publication of at least some of
the application level protocols.

Regards,
Bjorn
Re: [PATCH net-next 03/14] sctp: remove the typedef sctp_scope_policy_t
On Mon, Aug 7, 2017 at 9:28 PM, David Laight wrote:
> From: Xin Long
>> Sent: 05 August 2017 13:00
>> This patch is to remove the typedef sctp_scope_policy_t and keep
>> it's members as an anonymous enum.
>>
>> It is also to define SCTP_SCOPE_POLICY_MAX to replace the num 3
>> in sysctl.c to make codes clear.
>>
>> Signed-off-by: Xin Long
>> ---
>>  include/net/sctp/constants.h | 6 --
>>  net/sctp/sysctl.c            | 2 +-
>>  2 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
>> index 922fba5..acb03eb 100644
>> --- a/include/net/sctp/constants.h
>> +++ b/include/net/sctp/constants.h
>> @@ -341,12 +341,14 @@ typedef enum {
>>       SCTP_SCOPE_UNUSABLE,    /* IPv4 unusable addresses */
>>  } sctp_scope_t;
>>
>> -typedef enum {
>> +enum {
>>       SCTP_SCOPE_POLICY_DISABLE,      /* Disable IPv4 address scoping */
>>       SCTP_SCOPE_POLICY_ENABLE,       /* Enable IPv4 address scoping */
>>       SCTP_SCOPE_POLICY_PRIVATE,      /* Follow draft but allow IPv4 private addresses */
>>       SCTP_SCOPE_POLICY_LINK,         /* Follow draft but allow IPv4 link local addresses */
>> -} sctp_scope_policy_t;
>> +};
>> +
>> +#define SCTP_SCOPE_POLICY_MAX        SCTP_SCOPE_POLICY_LINK
>
> Perhaps slightly better to end the enum with:
>         SCTP_SCOPE_POLICY_COUNT,        /* Number of policies */
>         SCTP_SCOPE_POLICY_MAX = SCTP_SCOPE_POLICY_COUNT - 1 /* Last policy */
> };

It might be, so that adding a new member later will not change too much.
I just copied the idea of SCTP_EVENT__MAX, SCTP_STATE_MAX :-)
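To make the COUNT/MAX suggestion above concrete, here is a small standalone sketch of that enum shape. The DEMO_ names are hypothetical stand-ins and not the SCTP constants; the point is only that appending a new policy before DEMO_POLICY_COUNT updates both the count and the maximum without touching any other line.

```c
#include <assert.h>

/* Hypothetical policies mirroring the shape of the quoted enum. */
enum demo_scope_policy {
	DEMO_POLICY_DISABLE,
	DEMO_POLICY_ENABLE,
	DEMO_POLICY_PRIVATE,
	DEMO_POLICY_LINK,
	/* New policies go above this line. */
	DEMO_POLICY_COUNT,			/* number of policies */
	DEMO_POLICY_MAX = DEMO_POLICY_COUNT - 1	/* last valid policy */
};

/* COUNT sizes the table, so a missing name shows up as a NULL slot. */
static const char *const demo_policy_names[DEMO_POLICY_COUNT] = {
	[DEMO_POLICY_DISABLE] = "disable",
	[DEMO_POLICY_ENABLE]  = "enable",
	[DEMO_POLICY_PRIVATE] = "private",
	[DEMO_POLICY_LINK]    = "link",
};

/* Range check of the kind a sysctl upper bound would enforce. */
int demo_policy_is_valid(int value)
{
	return value >= DEMO_POLICY_DISABLE && value <= DEMO_POLICY_MAX;
}

/* Look up a name; callers must pass a value that passed the range check. */
const char *demo_policy_name(int value)
{
	assert(demo_policy_is_valid(value));
	return demo_policy_names[value];
}
```

The trade-off is that the enum gains a COUNT member that is not itself a valid policy, which is why the range check compares against MAX rather than COUNT.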
Re: [PATCH net] sctp: use __GFP_NOWARN for sctpw.fifo allocation
On Mon, Aug 7, 2017 at 11:39 AM, Marcelo Ricardo Leitner wrote:
> On Sun, Aug 06, 2017 at 06:14:39PM +1200, Xin Long wrote:
>> On Sun, Aug 6, 2017 at 5:08 AM, Marcelo Ricardo Leitner
>> wrote:
>> > On Sat, Aug 05, 2017 at 08:31:09PM +0800, Xin Long wrote:
>> >> Chen Wei found a kernel call trace when modprobe sctp_probe with
>> >> bufsize set with a huge value.
>> >>
>> >> It's because in sctpprobe_init when alloc memory for sctpw.fifo,
>> >> the size is got from userspace. If it is too large, kernel will
>> >> fail and give a warning.
>> >
>> > Yes but sctp_probe can only be loaded by an admin and it would happen
>> > only during modprobe. It's different from the commit mentioned below, on
>> > which any user could trigger it.
>> yeah, in this way it's different, I think generally it's acceptable to have
>> this kinda warning call trace by admin.
>>
>> But it could get the feedback from the return value and the warning
>> call trace seems not useful. sometimes users may be confused
>
> users or admins?
admins.

>
>> with this call trace. So it may be better not to dump the warning ?
>>
>> Or you think it can be helpful if we leave it here ?
>
> I'm afraid we may be exagerating here. There are several other ways that
> an admin can trigger scary warnings, this one is no special. I'd rather
> leave this one to the mm defaults instead.
OK, I'm all for that.

>
>>
>> >
>> >>
>> >> As there will be a fallback allocation later, this patch is just
>> >> to fail silently and return ret, just as commit 0ccc22f425e5
>> >> ("sit: use __GFP_NOWARN for user controlled allocation") did.
>> >>
>> >> Reported-by: Chen Wei
>> >> Signed-off-by: Xin Long
>> >> ---
>> >>  net/sctp/probe.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/net/sctp/probe.c b/net/sctp/probe.c
>> >> index 6cc2152..5bf3164 100644
>> >> --- a/net/sctp/probe.c
>> >> +++ b/net/sctp/probe.c
>> >> @@ -210,7 +210,7 @@ static __init int sctpprobe_init(void)
>> >>
>> >>       init_waitqueue_head();
>> >>       spin_lock_init();
>> >> -     if (kfifo_alloc(, bufsize, GFP_KERNEL))
>> >> +     if (kfifo_alloc(, bufsize, GFP_KERNEL | __GFP_NOWARN))
>> >>               return ret;
>> >>
>> >>       if (!proc_create(procname, S_IRUSR, init_net.proc_net,
>> >> --
>> >> 2.1.0
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> >> the body of a message to majord...@vger.kernel.org
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] net: sched: fix NULL pointer dereference when action calls some targets
On Tue, Aug 8, 2017 at 9:15 AM, Cong Wang wrote:
> (Cc'ing netfilter and Jamal)
>
> On Sat, Aug 5, 2017 at 4:35 AM, Xin Long wrote:
>> As we know in some target's checkentry it may dereference par.entryinfo
>> to check entry stuff inside. But when sched action calls xt_check_target,
>> par.entryinfo is set with NULL. It would cause kernel panic when calling
>> some targets.
>>
>> It can be reproduce with:
>>   # tc qd add dev eth1 ingress handle :
>>   # tc filter add dev eth1 parent : u32 match u32 0 0 action xt \
>>     -j ECN --ecn-tcp-remove
>>
>> It could also crash kernel when using target CLUSTERIP or TPROXY.
>>
[1]
>> By now there's no proper value for par.entryinfo in ipt_init_target,
>> but it can not be set with NULL. This patch is to void all these
>> panics by setting it with an ipt_entry obj with all members 0.
>>
>> Signed-off-by: Xin Long
>> ---
>>  net/sched/act_ipt.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
>> index 7c4816b..0f09f70 100644
>> --- a/net/sched/act_ipt.c
>> +++ b/net/sched/act_ipt.c
>> @@ -41,6 +41,7 @@ static int ipt_init_target(struct net *net, struct xt_entry_target *t,
>>  {
>>         struct xt_tgchk_param par;
>>         struct xt_target *target;
>> +       struct ipt_entry e;
>>         int ret = 0;
>>
>>         target = xt_request_find_target(AF_INET, t->u.user.name,
>> @@ -48,10 +49,11 @@ static int ipt_init_target(struct net *net, struct xt_entry_target *t,
>>         if (IS_ERR(target))
>>                 return PTR_ERR(target);
>>
>> +       memset(, 0, sizeof(e));
>>         t->u.kernel.target = target;
>>         par.net       = net;
>>         par.table     = table;
>> -       par.entryinfo = NULL;
>> +       par.entryinfo =
>
> This looks like a completely API burden?

netfilter xt targets are not really compatible with netsched action.
I've got to say, the patch is just a way to make checkentry return
false and avoid the panic, as [1] says.
Re: [PATCH net] net: sched: set xt_tgchk_param par.nft_compat with false in ipt_init_target
On Tue, Aug 8, 2017 at 9:03 AM, Cong Wang wrote:
> On Sat, Aug 5, 2017 at 4:32 AM, Xin Long wrote:
>> Commit 55917a21d0cc ("netfilter: x_tables: add context to know if
>> extension runs from nft_compat") introduced a member nft_compat to
>> xt_tgchk_param structure.
>>
>> But it didn't set it's value for ipt_init_target. With unexpected
>> value in par.nft_compat, it may return unexpected result in some
>> target's checkentry.
>>
>> This patch is to set par.nft_compat with false in ipt_init_target.
>
> It's time to set all these fields to 0 and only initialize those non-zero
> fields, in case we will add more fields in the future.

ok, the new fix depends on the net_id one. I will post v2 after
that one gets accepted.

thanks.
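The suggestion above (zero the whole parameter block, then set only the non-zero fields) can be sketched in plain C as follows. struct demo_chk_param is a hypothetical stand-in, not the real xt_tgchk_param, and demo_init_param() is not a kernel function; the sketch only shows why a field added later, such as the nft_compat flag discussed here, ends up with a defined value without any change to the initializer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter block that gains fields over time. */
struct demo_chk_param {
	const char *table;
	const void *entryinfo;
	unsigned int hook_mask;
	bool nft_compat;	/* imagine this field was added later */
};

/*
 * Zero everything first, then fill in only what this caller knows.
 * Fields the caller never heard of stay 0 / NULL / false.
 */
void demo_init_param(struct demo_chk_param *par, const char *table,
		     unsigned int hook_mask)
{
	assert(par != NULL);
	memset(par, 0, sizeof(*par));
	par->table = table;
	par->hook_mask = hook_mask;
}
```

Without the memset(), a stack-allocated block would carry whatever bytes happened to be on the stack in nft_compat and entryinfo, which is the class of bug these threads are discussing.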
[PATCHv2 net] net: sched: set xt_tgchk_param par.net properly in ipt_init_target
Now xt_tgchk_param par in ipt_init_target is a local variable,
par.net is not initialized there. Later when xt_check_target
calls target's checkentry in which it may access par.net, it
would cause kernel panic.

Jaroslav found this panic when running:

  # ip link add TestIface type dummy
  # tc qd add dev TestIface ingress handle :
  # tc filter add dev TestIface parent : u32 match u32 0 0 \
    action xt -j CONNMARK --set-mark 4

This patch is to pass net param into ipt_init_target and set
par.net with it properly in there.

v1->v2:
  As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
  it by also passing net_id to __tcf_ipt_init.

Reported-by: Jaroslav Aster
Signed-off-by: Xin Long
---
 net/sched/act_ipt.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 36f0ced..94ba5cf 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -36,8 +36,8 @@ static struct tc_action_ops act_ipt_ops;
 static unsigned int xt_net_id;
 static struct tc_action_ops act_xt_ops;
 
-static int ipt_init_target(struct xt_entry_target *t, char *table,
-			   unsigned int hook)
+static int ipt_init_target(struct net *net, struct xt_entry_target *t,
+			   char *table, unsigned int hook)
 {
 	struct xt_tgchk_param par;
 	struct xt_target *target;
@@ -49,6 +49,7 @@ static int ipt_init_target(struct xt_entry_target *t, char *table,
 		return PTR_ERR(target);
 
 	t->u.kernel.target = target;
+	par.net       = net;
 	par.table     = table;
 	par.entryinfo = NULL;
 	par.target    = target;
@@ -91,10 +92,11 @@ static const struct nla_policy ipt_policy[TCA_IPT_MAX + 1] = {
 	[TCA_IPT_TARG]	= { .len = sizeof(struct xt_entry_target) },
 };
 
-static int __tcf_ipt_init(struct tc_action_net *tn, struct nlattr *nla,
+static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
 			  struct nlattr *est, struct tc_action **a,
 			  const struct tc_action_ops *ops, int ovr, int bind)
 {
+	struct tc_action_net *tn = net_generic(net, id);
 	struct nlattr *tb[TCA_IPT_MAX + 1];
 	struct tcf_ipt *ipt;
 	struct xt_entry_target *td, *t;
@@ -159,7 +161,7 @@ static int __tcf_ipt_init(struct tc_action_net *tn, struct nlattr *nla,
 	if (unlikely(!t))
 		goto err2;
 
-	err = ipt_init_target(t, tname, hook);
+	err = ipt_init_target(net, t, tname, hook);
 	if (err < 0)
 		goto err3;
 
@@ -193,18 +195,16 @@ static int tcf_ipt_init(struct net *net, struct nlattr *nla,
 			struct nlattr *est, struct tc_action **a, int ovr,
 			int bind)
 {
-	struct tc_action_net *tn = net_generic(net, ipt_net_id);
-
-	return __tcf_ipt_init(tn, nla, est, a, &act_ipt_ops, ovr, bind);
+	return __tcf_ipt_init(net, ipt_net_id, nla, est, a, &act_ipt_ops, ovr,
+			      bind);
 }
 
 static int tcf_xt_init(struct net *net, struct nlattr *nla,
 		       struct nlattr *est, struct tc_action **a, int ovr,
 		       int bind)
 {
-	struct tc_action_net *tn = net_generic(net, xt_net_id);
-
-	return __tcf_ipt_init(tn, nla, est, a, &act_xt_ops, ovr, bind);
+	return __tcf_ipt_init(net, xt_net_id, nla, est, a, &act_xt_ops, ovr,
+			      bind);
 }
 
 static int tcf_ipt(struct sk_buff *skb, const struct tc_action *a,
-- 
2.1.0
Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
On Mon, Aug 07, 2017 at 02:14:48PM -0700, David Miller wrote:
> From: Ding Tianhong
> Date: Mon, 7 Aug 2017 12:13:17 +0800
>
> > Hi David:
> >
> > I think networking tree merge it is a better choice, as it mainly used
> > to tell the NIC drivers how to use the Relaxed Ordering Attribute, and
> > later we need send patch to enable RO for ixgbe driver base on this
> > patch. But I am not sure whether Bjorn has some of his own view. :)
> >
> > Hi Bjorn:
> >
> > Could you help review this patch or give some feedback ?
>
> I'm still waiting on this...
>
> Bjorn?

I was on vacation Friday-today, but I'll look at this series this
week.
Re: [PATCH RFC v2 3/5] samples/bpf: Fix inline asm issues building samples on arm64
Hi Dave,

On Mon, Aug 7, 2017 at 11:28 AM, David Miller wrote:
>
> Please, no.

Sorry you dislike it. I had intentionally marked it as RFC, as it's an
idea I was just toying with and posted early to get feedback.

>
> The amount of hellish hacks we are adding to deal with this is getting
> way out of control.

I agree with you that hellish hacks are being added, which is why it
keeps breaking. I think one of the things my series does is to add
back inclusion of asm headers that were previously removed (that
removal is, in my opinion, the worst hellish hack that exists in
mainline). So in that respect my patch is an improvement and makes it
possible to build for arm64 platforms (which is currently broken in
mainline).

>
> BPF programs MUST have their own set of asm headers, this is the
> only way to get around this issue in the long term.

Wouldn't that break scripts or bpf code that instrument/trace arch
specific code?

>
> I am also strongly against adding -static to the build.

I can drop -static if you prefer, that's not an issue.

As I understand it, there are no other cleaner alternatives and this
patchset makes the samples work. I would even argue that it's more
functional than previous attempts and fixes something broken in
mainline in a more generic way. If you can provide an example of where
my patchset may not work, I would love to hear it. My whole idea was
to do it in a way that prevents future breakage. I don't think that
leaving things broken in this state for extended periods of time makes
sense, and IMHO it will slow usage of bpf samples on other platforms.

thanks,

-Joel
Re:Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
At 2017-08-08 01:17:02, "Cong Wang" wrote:
>On Sun, Aug 6, 2017 at 6:32 PM, Gao Feng wrote:
>> I think the RCU should be supposed to avoid the race between del_chan and
>> lookup_chan.
>
>More precisely, it is callid_sock which is protected by RCU.
>
>Unless I miss any other code path, pptp_exit_module() is
>problematic too, I don't think it can just vfree() the whole thing.
>
>
>> The synchronize_rcu could make sure if there was one which calls lookup_chan
>> in this period, it would be finished and the sock refcnt is increased if
>> necessary.
>>
>> So I think it is ok to invoke sock_put directly without SOCK_RCU_FREE,
>> because the lookup_chan caller has already hold the sock refcnt,
>>
>

Hi Cong,

I was just thinking about this issue last night, and then I got your
email this morning.

>If you mean the sock_hold() inside lookup_chan(), no,
>it doesn't help because we already dereference the sock
>before it.
>

Sorry, I don't quite follow you. Why isn't the sock_hold() helpful?
pptp_release invokes synchronize_rcu after del_chan, which makes sure
that any other lookup has finished and has increased the sock refcnt if
necessary. Nobody can get the sock after synchronize_rcu in
pptp_release.

But I thought of another problem. It seems that pptp_sock_destruct
should not invoke del_chan and pppox_unbind_sock, because when the sock
refcnt is 0, pptp_release must already have been invoked.

There are two cases in total.

1. When pptp_release invokes sock_put, the refcnt is 0.
   del_chan and pppox_unbind_sock have been invoked.

2. When pptp_release invokes sock_put, the refcnt is not 0.
   It means someone held the sock during the period in which
   pptp_release invoked del_chan. Later that holder invokes sock_put,
   the sock refcnt reaches 0, and sk_free is invoked, which in turn
   invokes pptp_sock_destruct.

So it is unnecessary to invoke del_chan and pppox_unbind_sock again.
Having pptp_sock_destruct invoke del_chan would even introduce a race.

If so, I would send another patch for it.

>Also, lookup_chan_dst() does not have a refcnt, I don't
>find any code preventing it deref'ing other sock in callid_sock
>than the calling one.

Sorry, the last email was in html format, not text. So I am sending it
again in text format.

Best Regards
Feng
Re: [PATCH net] net: sched: set xt_tgchk_param par.net properly in ipt_init_target
On Tue, Aug 8, 2017 at 9:00 AM, Cong Wang wrote:
> On Sat, Aug 5, 2017 at 1:48 AM, Xin Long wrote:
>> -static int __tcf_ipt_init(struct tc_action_net *tn, struct nlattr *nla,
>> +static int __tcf_ipt_init(struct net *net, struct nlattr *nla,
>>                           struct nlattr *est, struct tc_action **a,
>>                           const struct tc_action_ops *ops, int ovr, int bind)
>>  {
>> +       struct tc_action_net *tn = net_generic(net, xt_net_id);
>
> ...
>
>> @@ -193,18 +195,14 @@ static int tcf_ipt_init(struct net *net, struct nlattr *nla,
>>                         struct nlattr *est, struct tc_action **a, int ovr,
>>                         int bind)
>>  {
>> -       struct tc_action_net *tn = net_generic(net, ipt_net_id);
>> -
>> -       return __tcf_ipt_init(tn, nla, est, a, &act_ipt_ops, ovr, bind);
>> +       return __tcf_ipt_init(net, nla, est, a, &act_ipt_ops, ovr, bind);
>>  }
>>
>>  static int tcf_xt_init(struct net *net, struct nlattr *nla,
>>                        struct nlattr *est, struct tc_action **a, int ovr,
>>                        int bind)
>>  {
>> -       struct tc_action_net *tn = net_generic(net, xt_net_id);
>> -
>> -       return __tcf_ipt_init(tn, nla, est, a, &act_xt_ops, ovr, bind);
>> +       return __tcf_ipt_init(net, nla, est, a, &act_xt_ops, ovr, bind);
>
> This is not correct.
>
> You miss ipt_net_id != xt_net_id.

right, that's a silly mistake.
seems no better way but to pass both net and net_id to __tcf_ipt_init.

will send v2. thanks.
Re: [PATCH v5 net-next 00/12] bpf: rewrite value tracking in verifier
On 08/07/2017 04:21 PM, Edward Cree wrote:
> This series simplifies alignment tracking, generalises bounds tracking and fixes some bounds-tracking bugs in the BPF verifier. Pointer arithmetic on packet pointers, stack pointers, map value pointers and context pointers has been unified, and bounds on these pointers are only checked when the pointer is dereferenced.
>
> Operations on pointers which destroy all relation to the original pointer (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks, otherwise they convert the pointer to an unknown scalar and feed it to the normal scalar arithmetic handling.
>
> Pointer types have been unified with the corresponding adjusted-pointer types where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into SCALAR_VALUE.
>
> Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and a 'variable offset'; the former is used when e.g. adding an immediate or a known-constant register, as long as it does not overflow. Otherwise the latter is used, and any operation creating a new variable offset creates a new 'id' (and, for PTR_TO_PACKET, clears the 'range').
>
> SCALAR_VALUEs use the 'variable offset' fields to track the range of possible values; the 'fixed offset' should never be set on a scalar.

Been testing and reviewing the series over the last several days, looks reasonable to me as far as I can tell. Thanks for all the hard work on unifying this, Edward!

Acked-by: Daniel Borkmann
[PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN
Currently, passing the tid (gettid(2)) of a thread in struct ucred in an SCM_CREDENTIALS message requires the CAP_SYS_ADMIN capability; otherwise it fails with EPERM. Some applications deal with the thread id (tid) of a thread, so it would help to allow a tid in the SCM_CREDENTIALS message. Basically, either the tgid (pid of the process) or the tid of the thread should be allowed without the need for CAP_SYS_ADMIN.

SCM_CREDENTIALS will be used to determine the global id of a process or a thread running inside a pid namespace.

This patch adds the necessary check to accept a tid in the SCM_CREDENTIALS struct ucred.

Signed-off-by: Prakash Sangappa
---
 net/core/scm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/scm.c b/net/core/scm.c
index b1ff8a4..9274197 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -55,6 +55,7 @@ static __inline__ int scm_check_creds(struct ucred *creds)
 		return -EINVAL;
 
 	if ((creds->pid == task_tgid_vnr(current) ||
+	     creds->pid == task_pid_vnr(current) ||
 	     ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)) &&
 	    ((uid_eq(uid, cred->uid) || uid_eq(uid, cred->euid) ||
 	      uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, CAP_SETUID)) &&
-- 
2.7.4
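For readers unfamiliar with the userspace side, the following is a minimal sketch of how a sender fills in struct ucred explicitly and how a receiver reads it back. It is not part of the patch. The helper names (current_tid, send_with_creds, recv_cred_pid, creds_roundtrip) are hypothetical and exist only for this illustration. Sending the tgid works on any kernel; whether a tid from a non-leader thread is accepted without CAP_SYS_ADMIN depends on the check discussed above.

```c
#define _GNU_SOURCE		/* glibc exposes struct ucred only with this */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: kernel thread id of the calling thread. */
pid_t current_tid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical helper: send one byte on fd with explicit credentials. */
int send_with_creds(int fd, pid_t id)
{
	struct ucred cred = { .pid = id, .uid = getuid(), .gid = getgid() };
	union {			/* the union keeps the buffer cmsghdr-aligned */
		char buf[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;
	} ctrl;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);	/* the buffer was sized for one header */
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_CREDENTIALS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(cred));
	memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));

	/* The kernel validates cred here; a rejected id yields EPERM. */
	return sendmsg(fd, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one message, return the sender pid or -1. */
pid_t recv_cred_pid(int fd)
{
	union {
		char buf[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;
	} ctrl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct ucred cred;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(fd, &msg, 0) < 0)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_CREDENTIALS) {
			memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
			return cred.pid;
		}
	}
	return -1;
}

/* Hypothetical helper: pass id through a socketpair, return what arrived. */
pid_t creds_roundtrip(pid_t id)
{
	int sv[2], on = 1;
	pid_t seen = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	/* The receiver must opt in to credential passing. */
	if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
	    send_with_creds(sv[0], id) == 0)
		seen = recv_cred_pid(sv[1]);
	close(sv[0]);
	close(sv[1]);
	return seen;
}
```

Calling creds_roundtrip(getpid()) should return the caller's pid. In the thread-group leader, current_tid() equals getpid(), so that case passes the existing tgid check as well; only a tid taken in a secondary thread exercises the new comparison.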
Re: [PATCH] net: Reduce skb_warn_bad_offload() noise.
That is fine to me. I have tested it. Thanks.

On Mon, Aug 7, 2017 at 12:42 PM, Willem de Bruijn wrote:
>> The openvswitch kernel module calls the __skb_gso_segment()(and sets
>> tx_path = false) when passing packets to userspace. The UFO will set
>> the ip_summed to CHECKSUM_NONE. There are a lot of warn logs. The warn
>> log is shown as below. I guess we should revert the patch.
>
> Indeed, the software UFO code computes the checksum and
> sets ip_summed to CHECKSUM_NONE, as is correct on the
> egress path.
>
> Commit 6e7bc478c9a0 ("net: skb_needs_check() accepts
> CHECKSUM_NONE for tx") revised the tx_path case in
> skb_needs_check to avoid the warning exactly for the UFO case.
>
> We cannot just make an exception for CHECKSUM_NONE in the
> !tx_path case, as the entire statement then becomes false:
>
>         return skb->ip_summed == CHECKSUM_NONE;
>
> Since on egress CHECKSUM_UNNECESSARY is equivalent to
> CHECKSUM_NONE, it should be fine to update the UFO code
> to set that, instead:
>
> @@ -235,7 +235,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
>         if (uh->check == 0)
>                 uh->check = CSUM_MANGLED_0;
>
> -       skb->ip_summed = CHECKSUM_NONE;
> +       skb->ip_summed = CHECKSUM_UNNECESSARY;
Re: Qdisc->u32_node - licence to kill
On Mon, Aug 7, 2017 at 12:54 PM, John Fastabendwrote: > On 08/07/2017 12:06 PM, Jiri Pirko wrote: >> Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastab...@gmail.com wrote: >>> On 08/07/2017 09:41 AM, Jiri Pirko wrote: Hi Jamal/Cong/David/all. Digging in the u32 code deeper now. I need to get rid of tp->q for shared blocks, but I found out about this: struct Qdisc { .. void*u32_node; .. }; Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually stores a linked list of all hashtables added to one qdiscs. So basically what you have is, you have 1 root ht per prio/pref. Then you can have multiple hts, linked from any other ht, does not matter in which prio/pref they are. >>> >>> We can create arbitrary hash tables here independent of prio/pref via >>> TCA_U32_DIVISOR. Then these can be linked to other hash tables via >>> TCA_U32_LINK commands. >> >> Yeah, that's what I thought. >> >> >>> >>> prio/pref does not really play any part here from my reading, except as >>> a further specifier in the walk callbacks. Making it a useful filter on >>> dump operations. >> >> Not correct. prio/pref is one level up priority, independent on specific >> cls implementation. You can have cls_u32 instance on prio 10 and >> cls_flower instance on prio 20. Both work. > > ah right, lets make sure I got this right then (its been awhile since I've > read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the > classifier by prio. Then tcf_classify walks the list of classifiers looking > for any matches, specifically any return codes it recognizes or a return code > greater than zero. u32 though has this link notion that allows users to jump > to other u32 classifiers that are in this list, because it has a global hash > table list. So the per prio classifier isolation is not true in u32 case. u32 filter supports multiple hash tables within a qdisc, struct tc_u_common is supposed to link them together. 
This has to be per qdisc because all of these hash tables belong to one qdisc and their ID's are unique within the qdisc. I dislike it too, and I actually tried to improve it in the past, unfortunately didn't make any real progress. I think we can definitely make it less ugly, but I don't think we can totally get rid of it because of the design of u32. Similar for tp->data.
Re: [PATCH net-next] net: dsa: lan9303: Only allocate 3 ports
Egil Hjelmeland writes:

> Save 2628 bytes on arm eabi by allocate only the required 3 ports.
>
> Now that ds->num_ports is correct: In net/dsa/tag_lan9303.c
> eliminate duplicate LAN9303_MAX_PORTS, use ds->num_ports.
> (Matching the pattern of other net/dsa/tag_xxx.c files.)
>
> Signed-off-by: Egil Hjelmeland

Reviewed-by: Vivien Didelot
Re: [PATCH net-next] selftests: bpf: add a test for XDP redirect
On 08/07/2017 01:14 PM, William Tu wrote:
> Add test for xdp_redirect by creating two namespaces with two
> veth peers, then forward packets in-between.
>
> Signed-off-by: William Tu
> Cc: Daniel Borkmann
> Cc: John Fastabend
> ---

Thanks for doing this.

Acked-by: John Fastabend
Re: [PATCH] ip/link_vti*.c: Fix output for ikey/okey
On Mon, 7 Aug 2017 11:59:28 +0200 Christian Langrock wrote:

> ikey and okey are normal u32 values. There's no reason to print them as
> IPv4/IPv6 addresses.
>
> Signed-off-by: Christian Langrock

Changing output format breaks scripts that parse output. But on the other hand, the VTI code breaks the assumption that ip command output should be the same as input. More likely the original output format was done to match Cisco output.

Why not print in hex like fwmark?
Re: [PATCH net-next] net: dsa: lan9303: Only allocate 3 ports
On 08/07/2017 03:22 PM, Egil Hjelmeland wrote:
> Save 2628 bytes on arm eabi by allocate only the required 3 ports.
>
> Now that ds->num_ports is correct: In net/dsa/tag_lan9303.c
> eliminate duplicate LAN9303_MAX_PORTS, use ds->num_ports.
> (Matching the pattern of other net/dsa/tag_xxx.c files.)
>
> Signed-off-by: Egil Hjelmeland

Reviewed-by: Florian Fainelli
--
Florian
[Patch net-next] net_sched: get rid of some forward declarations
If we move up tcf_fill_node() we can get rid of these forward declarations. Also, move down tfilter_notify_chain() to group them together. Reported-by: Jamal Hadi SalimCc: Jamal Hadi Salim Signed-off-by: Cong Wang --- net/sched/cls_api.c | 214 +--- 1 file changed, 103 insertions(+), 111 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 668afb6e9885..8d1157aebaf7 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -100,25 +100,6 @@ int unregister_tcf_proto_ops(struct tcf_proto_ops *ops) } EXPORT_SYMBOL(unregister_tcf_proto_ops); -static int tfilter_notify(struct net *net, struct sk_buff *oskb, - struct nlmsghdr *n, struct tcf_proto *tp, - void *fh, int event, bool unicast); - -static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, - struct nlmsghdr *n, struct tcf_proto *tp, - void *fh, bool unicast, bool *last); - -static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, -struct nlmsghdr *n, -struct tcf_chain *chain, int event) -{ - struct tcf_proto *tp; - - for (tp = rtnl_dereference(chain->filter_chain); -tp; tp = rtnl_dereference(tp->next)) - tfilter_notify(net, oskb, n, tp, 0, event, false); -} - /* Select new prio value from the range, managed by kernel. 
*/ static inline u32 tcf_auto_prio(struct tcf_proto *tp) @@ -411,6 +392,109 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain, return tp; } +static int tcf_fill_node(struct net *net, struct sk_buff *skb, +struct tcf_proto *tp, void *fh, u32 portid, +u32 seq, u16 flags, int event) +{ + struct tcmsg *tcm; + struct nlmsghdr *nlh; + unsigned char *b = skb_tail_pointer(skb); + + nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags); + if (!nlh) + goto out_nlmsg_trim; + tcm = nlmsg_data(nlh); + tcm->tcm_family = AF_UNSPEC; + tcm->tcm__pad1 = 0; + tcm->tcm__pad2 = 0; + tcm->tcm_ifindex = qdisc_dev(tp->q)->ifindex; + tcm->tcm_parent = tp->classid; + tcm->tcm_info = TC_H_MAKE(tp->prio, tp->protocol); + if (nla_put_string(skb, TCA_KIND, tp->ops->kind)) + goto nla_put_failure; + if (nla_put_u32(skb, TCA_CHAIN, tp->chain->index)) + goto nla_put_failure; + if (!fh) { + tcm->tcm_handle = 0; + } else { + if (tp->ops->dump && tp->ops->dump(net, tp, fh, skb, tcm) < 0) + goto nla_put_failure; + } + nlh->nlmsg_len = skb_tail_pointer(skb) - b; + return skb->len; + +out_nlmsg_trim: +nla_put_failure: + nlmsg_trim(skb, b); + return -1; +} + +static int tfilter_notify(struct net *net, struct sk_buff *oskb, + struct nlmsghdr *n, struct tcf_proto *tp, + void *fh, int event, bool unicast) +{ + struct sk_buff *skb; + u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; + + skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb) + return -ENOBUFS; + + if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq, + n->nlmsg_flags, event) <= 0) { + kfree_skb(skb); + return -EINVAL; + } + + if (unicast) + return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT); + + return rtnetlink_send(skb, net, portid, RTNLGRP_TC, + n->nlmsg_flags & NLM_F_ECHO); +} + +static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, + struct nlmsghdr *n, struct tcf_proto *tp, + void *fh, bool unicast, bool *last) +{ + struct sk_buff *skb; + u32 portid = oskb ? 
NETLINK_CB(oskb).portid : 0; + int err; + + skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb) + return -ENOBUFS; + + if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq, + n->nlmsg_flags, RTM_DELTFILTER) <= 0) { + kfree_skb(skb); + return -EINVAL; + } + + err = tp->ops->delete(tp, fh, last); + if (err) { + kfree_skb(skb); + return err; + } + + if (unicast) + return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT); + + return rtnetlink_send(skb, net, portid, RTNLGRP_TC, + n->nlmsg_flags & NLM_F_ECHO); +} + +static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, +struct nlmsghdr *n, +struct
multi-queue over IFF_NO_QUEUE "virtual" devices
Hi,

Most DSA supported Broadcom switches have multiple queues per port (usually 8) and each of these queues can be configured with different pause, drop, hysteresis thresholds and so on in order to make use of the switch's internal buffering scheme and have some queues achieve some kind of lossless behavior (e.g: LAN to LAN traffic for Q7 has a higher priority than LAN to WAN for Q0).

This is obviously very workload specific, so I'd want as much programmability as possible.

This brings me to a few questions:

1) If we have the DSA slave network devices currently flagged with IFF_NO_QUEUE becoming multi-queue (on TX) aware such that an application can control exactly which switch egress queue is used on a per-flow basis, would that be a problem (this is the dynamic selection of the TX queue)?

2) The conduit interface (CPU) port network interface has a congestion control scheme which requires each of its TX queues (32 or 16) to be statically mapped to each of the underlying switch port queues because the congestion/ HW needs to inspect the queue depths of the switch to accept/reject a packet at the CPU's TX ring level. Do we have a good way with tc to map a virtual/stacked device's queue(s) on-top of its physical/underlying device's queues (this is the static queue mapping necessary for congestion to work)?

Let me know if you think this is the right approach or not.

Thanks!
--
Florian
[RFC PATCH 1/2] bpf: Fix bpf_trace_printk on 32-bit architectures
bpf_trace_printk() uses conditional operators to attempt to pass different types to __trace_printk() depending on the format operators. This doesn't work as intended on 32-bit architectures where u32 & long are passed differently to u64, since the result of C conditional operators follows the "usual arithmetic conversions" rules, such that the values passed to __trace_printk() will always be u64.

For example the samples/bpf/tracex5 test printed lines like below on MIPS, where the fd and buf have come from the u64 fd argument, and the size from the buf argument:

  dd-1176 [000] 1180.941542: 0x0001: write(fd=1, buf= (null), size=6258688)

Instead of this:

  dd-1217 [000] 1625.616026: 0x0001: write(fd=1, buf=009e4000, size=512)

Work around this with an ugly hack which expands each combination of argument types for the 3 arguments. On 64-bit kernels it is assumed that u32, long & u64 are all passed the same way so no casting takes place (it has apparently worked implicitly until now). On 32-bit kernels it is assumed that long and u32 pass the same way so there are 8 combinations.

On 32-bit kernels bpf_trace_printk() increases in size but should now work correctly. On 64-bit kernels it actually reduces in size slightly, I presume due to removal of some of the casts (which as far as I can tell are unnecessary for printk anyway due to the controlled nature of the interpretation):

  arch    function          old   new  delta
  x86_64  bpf_trace_printk  532   412   -120
  x86     bpf_trace_printk  676  1120   +444
  MIPS64  bpf_trace_printk  760   612   -148
  MIPS32  bpf_trace_printk  768   996   +228

Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Signed-off-by: James Hogan
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: netdev@vger.kernel.org
---
I'm open to nicer ways of fixing this. This is tested with samples/bpf/tracex5 on MIPS32 and MIPS64. Only build tested on x86.
--- kernel/trace/bpf_trace.c | 26 ++ 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 37385193a608..32dcbe1b48f2 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -204,10 +204,28 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1, fmt_cnt++; } - return __trace_printk(1/* fake ip will not be printed */, fmt, - mod[0] == 2 ? arg1 : mod[0] == 1 ? (long) arg1 : (u32) arg1, - mod[1] == 2 ? arg2 : mod[1] == 1 ? (long) arg2 : (u32) arg2, - mod[2] == 2 ? arg3 : mod[2] == 1 ? (long) arg3 : (u32) arg3); + /* +* This is a horribly ugly hack to allow different combinations of +* argument types to be used, particularly on 32-bit architectures where +* u32 & long pass the same as one another, but differently to u64. +* +* On 64-bit architectures it is assumed u32, long & u64 pass in the +* same way. +*/ + +#define __BPFTP_P(...) __trace_printk(1/* fake ip will not be printed */, \ + fmt, ##__VA_ARGS__) +#define __BPFTP_1(...) ((mod[0] == 2 || __BITS_PER_LONG == 64) \ +? __BPFTP_P(arg1, ##__VA_ARGS__) \ +: __BPFTP_P((long)arg1, ##__VA_ARGS__)) +#define __BPFTP_2(...) ((mod[1] == 2 || __BITS_PER_LONG == 64) \ +? __BPFTP_1(arg2, ##__VA_ARGS__) \ +: __BPFTP_1((long)arg2, ##__VA_ARGS__)) +#define __BPFTP_3(...) ((mod[2] == 2 || __BITS_PER_LONG == 64) \ +? __BPFTP_2(arg3, ##__VA_ARGS__) \ +: __BPFTP_2((long)arg3, ##__VA_ARGS__)) + + return __BPFTP_3(); } static const struct bpf_func_proto bpf_trace_printk_proto = { -- 2.13.2
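The conversion rule behind this bug can be checked in plain userspace C. The sketch below is not from the patch. uint64_t and uint32_t stand in for the kernel's u64 and u32, and the function names are hypothetical. It shows that a conditional expression with a 64-bit arm and a 32-bit arm always has the 64-bit type, whichever arm is selected at run time, so a cast in one arm cannot change how the argument is passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * The type of a ?: expression is fixed at compile time by the usual
 * arithmetic conversions applied to both arms, not by the arm taken.
 */
static_assert(sizeof(1 ? (uint64_t)0 : (uint32_t)0) == sizeof(uint64_t),
	      "conditional expression widens to the larger arm");

/* Size of the value the conditional yields, for either selector. */
size_t cond_result_size(int want64)
{
	uint64_t arg = 0;

	/* sizeof only inspects the type; the operand is not evaluated. */
	return sizeof(want64 ? arg : (uint32_t)arg);
}

/*
 * The (uint32_t) cast still truncates the value, but the result is
 * widened back to 64 bits before it leaves the expression.
 */
uint64_t cond_result_value(int want64, uint64_t arg)
{
	return want64 ? arg : (uint32_t)arg;
}
```

With arg = 0x1122334455667788, selecting the 32-bit arm yields 0x55667788 carried in a 64-bit object. On a 32-bit ABI such a value is passed to a variadic function as a 64-bit argument, which is why the format string and the arguments went out of step in the tracex5 output quoted in the commit message.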
[RFC PATCH 0/2] bpf_trace_printk() fixes
A couple of RFC fixes for bpf_trace_printk(). The first affects 32-bit architectures in particular, the second is a theoretical uninitialised variable fix.

Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: netdev@vger.kernel.org

James Hogan (2):
  bpf: Fix bpf_trace_printk on 32-bit architectures
  bpf: Initialise mod[] in bpf_trace_printk

 kernel/trace/bpf_trace.c | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

-- 
2.13.2
[RFC PATCH 2/2] bpf: Initialise mod[] in bpf_trace_printk
In bpf_trace_printk(), the elements in mod[] are left uninitialised, but they are then incremented to track the width of the formats. Zero initialise the array just in case the memory contains non-zero values on entry.

Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Signed-off-by: James Hogan
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: netdev@vger.kernel.org
---
When I checked (on MIPS32), the elements tended to have the value zero anyway (does BPF zero the stack or something clever?), so this is a purely theoretical fix.
---
 kernel/trace/bpf_trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 32dcbe1b48f2..86a52857d941 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -129,7 +129,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
 	   u64, arg2, u64, arg3)
 {
 	bool str_seen = false;
-	int mod[3] = {};
+	int mod[3] = { 0, 0, 0 };
 	int fmt_cnt = 0;
 	u64 unsafe_addr;
 	char buf[64];
-- 
2.13.2
[RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call
Fix an issue with relying on netif_running(), which could be true while the dev->open() handler is being called, even if the handler would exit with a failure. This ensures the state does not get set and then cleared, which leaves a narrow race window in which other callers read the device as open when in fact it never finished opening.

Signed-off-by: Jacob Keller
---
I found this as a result of debugging a race condition in the i40evf driver, in which we assumed that netif_running() would not be true until after dev->open() had been called and succeeded. Unfortunately we can't hold the rtnl_lock() while checking netif_running() because it would cause a deadlock between our reset task and our ndo_open handler.

I am wondering whether the proposed change is acceptable here, or whether some ndo_open handlers rely on __LINK_STATE_START being true prior to their being called?

 net/core/dev.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1d75499add72..11953af90427 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1362,8 +1362,6 @@ static int __dev_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	set_bit(__LINK_STATE_START, &dev->state);
-
 	if (ops->ndo_validate_addr)
 		ret = ops->ndo_validate_addr(dev);
 
@@ -1372,9 +1370,8 @@ static int __dev_open(struct net_device *dev)
 
 	netpoll_poll_enable(dev);
 
-	if (ret)
-		clear_bit(__LINK_STATE_START, &dev->state);
-	else {
+	if (!ret) {
+		set_bit(__LINK_STATE_START, &dev->state);
 		dev->flags |= IFF_UP;
 		dev_set_rx_mode(dev);
 		dev_activate(dev);
-- 
2.14.0.rc1.251.g593d8d6362ce
[PATCH net-next] net: dsa: lan9303: Only allocate 3 ports
Save 2628 bytes on arm eabi by allocating only the required 3 ports.

Now that ds->num_ports is correct: In net/dsa/tag_lan9303.c eliminate duplicate LAN9303_MAX_PORTS, use ds->num_ports. (Matching the pattern of other net/dsa/tag_xxx.c files.)

Signed-off-by: Egil Hjelmeland
---
 drivers/net/dsa/lan9303-core.c | 2 +-
 net/dsa/tag_lan9303.c          | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 15befd155251..46fc1d5d3c9e 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -811,7 +811,7 @@ static struct dsa_switch_ops lan9303_switch_ops = {
 
 static int lan9303_register_switch(struct lan9303 *chip)
 {
-	chip->ds = dsa_switch_alloc(chip->dev, DSA_MAX_PORTS);
+	chip->ds = dsa_switch_alloc(chip->dev, LAN9303_NUM_PORTS);
 	if (!chip->ds)
 		return -ENOMEM;
 
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index 247774d149f9..e23e7635fa00 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -39,7 +39,6 @@
  */
 
 #define LAN9303_TAG_LEN 4
-#define LAN9303_MAX_PORTS 3
 
 static struct sk_buff *lan9303_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -104,7 +103,7 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
 
 	source_port = ntohs(lan9303_tag[1]) & 0x3;
 
-	if (source_port >= LAN9303_MAX_PORTS) {
+	if (source_port >= ds->num_ports) {
 		dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid source port\n");
 		return NULL;
 	}
-- 
2.11.0
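The receive-path check touched by this patch is easy to exercise in userspace. The sketch below is an illustration only. tag_source_port is a hypothetical name, the tag is modelled as two 16-bit words in network byte order, and num_ports stands in for ds->num_ports. It shows why the bound matters: the 2-bit mask can yield 0 to 3, so with 3 ports the value 3 must be rejected.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the check in lan9303_rcv(): extract the
 * source port from the second tag word and validate it against the number
 * of ports the switch was allocated with.
 * Returns the port number, or -1 if the frame should be dropped.
 */
int tag_source_port(const uint16_t tag[2], unsigned int num_ports)
{
	unsigned int port;

	/* A 2-bit field cannot address more than 4 ports. */
	assert(num_ports <= 4);

	port = ntohs(tag[1]) & 0x3;	/* low two bits carry the port */
	if (port >= num_ports)
		return -1;		/* invalid source port, drop */
	return (int)port;
}
```

Passing the port count as a parameter mirrors the change from a private LAN9303_MAX_PORTS constant to ds->num_ports: the tagger no longer needs its own copy of the number.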
Re: [PATCH net-next] net: vrf: Add extack messages for newlink failures
From: David Ahern
Date: Mon, 7 Aug 2017 10:08:10 -0700

> Add extack error messages for failure paths creating vrf devices. Once
> extack support is added to iproute2, we go from the unhelpful:
>     $ ip li add foobar type vrf
>     RTNETLINK answers: Invalid argument
>
> to:
>     $ ip li add foobar type vrf
>     Error: VRF table id is missing
>
> Signed-off-by: David Ahern

Applied, thanks David.
[PATCH RFC 1/2] bpf: Add a BPF return code to disconnect a connection
When running a BPF program against a flow, a possible verdict is that the packet should not only be dropped, but that the flow the packet was received on should be terminated.

Signed-off-by: Tom Herbert
---
 include/uapi/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1d06be1569b1..324e886c3490 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -708,6 +708,7 @@ enum bpf_ret_code {
 	BPF_DROP = 2,
 	/* 3-6 reserved */
 	BPF_REDIRECT = 7,
+	BPF_DISCONNECT = 8,
 	/* >127 are reserved for prog type specific return codes */
 };
 
-- 
2.11.0
[PATCH RFC 2/2] stap: Socket tap
The socket tap uses the ULP mechanism to insert functionality between the socket system calls and the connection itself. Basically, this is a means to tap into a socket.

To implement the tap, the data operations (sendmsg, recvmsg, sendpage, and read_sock) are intercepted using the ULP infrastructure. In each direction a strparser instance is used to delineate a stream into application messages. A BPF "verdict" program is run on the output (application messages) of each strparser. The verdict program can allow the message, drop it, or drop it and also kill the connection.

The sendmsg path looks something like:

  sendmsg (ULP) -> strparser -> verdict -> send_sock_skb

The receive path looks like:

  tcp_read_sock -> strparser -> verdict -> recvmsg

Note the socket tap does not introduce any new locks or queuing (except for messages under construction by strparser). Also, the buffer limits of the socket are respected in all operations.

Signed-off-by: Tom Herbert
---
 include/net/stap.h        |  43 +++
 include/uapi/linux/stap.h |  21 ++
 net/Kconfig               |   1 +
 net/Makefile              |   1 +
 net/stap/Kconfig          |   8 +
 net/stap/Makefile         |   3 +
 net/stap/stap_main.c      | 769 ++
 7 files changed, 846 insertions(+)
 create mode 100644 include/net/stap.h
 create mode 100644 include/uapi/linux/stap.h
 create mode 100644 net/stap/Kconfig
 create mode 100644 net/stap/Makefile
 create mode 100644 net/stap/stap_main.c

diff --git a/include/net/stap.h b/include/net/stap.h
new file mode 100644
index ..dfc96a116db2
--- /dev/null
+++ b/include/net/stap.h
@@ -0,0 +1,43 @@
+/*
+ * Socket tap
+ *
+ * Copyright (c) 2017 Tom Herbert
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */ + +#ifndef __NET_STAP_H_ +#define __NET_STAP_H_ + +#include +#include +#include + +struct stap_bops { + struct strparser strp; + struct bpf_prog *parse_prog; + struct bpf_prog *verdict_prog; +}; + +struct stap_sock { + struct sock *sk; /* Associated socket */ + + const struct proto_ops *orig_ops; + + void (*save_data_ready)(struct sock *sk); + void (*save_write_space)(struct sock *sk); + void (*save_state_change)(struct sock *sk); + + /* Send items */ + struct stap_bops send_bops; + struct sk_buff_head build_list; + struct sk_buff_head ready_list; + + /* Receive items */ + struct stap_bops recv_bops; + struct sk_buff *recv_skb; +}; + +#endif /* __NET_STAP_H_ */ diff --git a/include/uapi/linux/stap.h b/include/uapi/linux/stap.h new file mode 100644 index ..fa8545628fd2 --- /dev/null +++ b/include/uapi/linux/stap.h @@ -0,0 +1,21 @@ +/* + * Socket tap + * + * Copyright (c) 2017 Tom Herbert + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation. 
+ */ + +#ifndef _UAPI_LINUX_STAP_H +#define _UAPI_LINUX_STAP_H + +struct stap_params { + int bpf_send_parse_fd; + int bpf_send_verdict_fd; + int bpf_recv_parse_fd; + int bpf_recv_verdict_fd; +}; + +#endif /* _UAPI_LINUX_STAP_H */ diff --git a/net/Kconfig b/net/Kconfig index 2b8d2d88bc2b..8a1bdd269e84 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -368,6 +368,7 @@ source "net/irda/Kconfig" source "net/bluetooth/Kconfig" source "net/rxrpc/Kconfig" source "net/kcm/Kconfig" +source "net/stap/Kconfig" source "net/strparser/Kconfig" config FIB_RULES diff --git a/net/Makefile b/net/Makefile index bed80fa398b7..3ef1d8ae8e58 100644 --- a/net/Makefile +++ b/net/Makefile @@ -36,6 +36,7 @@ obj-$(CONFIG_BT) += bluetooth/ obj-$(CONFIG_SUNRPC) += sunrpc/ obj-$(CONFIG_AF_RXRPC) += rxrpc/ obj-$(CONFIG_AF_KCM) += kcm/ +obj-$(CONFIG_STAP) += stap/ obj-$(CONFIG_STREAM_PARSER)+= strparser/ obj-$(CONFIG_ATM) += atm/ obj-$(CONFIG_L2TP) += l2tp/ diff --git a/net/stap/Kconfig b/net/stap/Kconfig new file mode 100644 index ..adb3149ea624 --- /dev/null +++ b/net/stap/Kconfig @@ -0,0 +1,8 @@ +config STAP + tristate "Socket tap" + depends on INET + select BPF_SYSCALL + select STREAM_PARSER + ---help--- + Socket tap. + diff --git a/net/stap/Makefile b/net/stap/Makefile new file mode 100644 index ..f5bce0f59da5 --- /dev/null +++ b/net/stap/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_STAP) += stap.o + +stap-y := stap_main.o diff --git a/net/stap/stap_main.c b/net/stap/stap_main.c new file mode 100644 index ..a34dcf19e6cd --- /dev/null +++ b/net/stap/stap_main.c @@ -0,0 +1,769 @@ +/* + * Socket tap + * + * Copyright (c) 2017 Tom
[PATCH RFC 0/2] stap: Socket tap
This patch set introduces generic socket tap support. This provides a means to run BPF programs or implement other functionality in both the send and receive path of a socket.

The socket tap uses the ULP mechanism recently introduced for kTLS. The data operations (sendmsg, recvmsg, sendpage, and splice_recv) are intercepted. In both directions strparser is used to break the stream into discrete application layer messages. Each message is then run through a BPF verdict program that indicates the message is okay, should be dropped, or should be dropped and the connection terminated.

The data path for socket tap for TCP is illustrated below:

       +-----------------------------------+
       |            Application            |
       |                                   |
       |    sendmsg             recvmsg    |
       +-----------------------------------+
             |                     ^
 ------------|---------------------|---------------------------------------
             |                     |
             |       +------------+    +--------------+    +----------------+
             |       | Socket tap |<---| Verdict prog |<---| strparser      |
             |       | recvmsg    |    |              |    | TCP data ready |
             |       +------------+    +--------------+    +----------------+
             |                                                     ^
             |                                                     |
             |                                            TCP receive queue
             V
 +------------+   +-----------+   +--------------+   +---------------+
 | Socket tap |-->| strparser |-->| Verdict prog |-->| skb_send_sock |
 | sendmsg    |   | send mode |   |              |   | locked        |
 +------------+   +-----------+   +--------------+   +---------------+
                                                             |
                                                             V
                                                      TCP write queue

Interface:

A socket tap is enabled on a socket using the SO_ULP socket option with ULP type "stap". The socket option takes ULP specific configuration in the stap_params structure. The parameters consist of four file descriptors for BPF programs. These are for the parser (strparser) in the send path, the verdict program for the send path, and the parser and verdict programs in the receive path.

Example configuration to set a socket tap. In this case the same parse and verdict programs are used on the send and receive sides of the socket:

	struct {
		struct ulp_config ulpc;
		struct stap_params sp;
	} my_config;

	load_bpf_file("parse_kern.o");
	load_bpf_file("verdict_kern.o");

	my_config.ulpc.ulp_name = "stap";
	my_config.sp.bpf_send_parse_fd = prog_fd[0];
	my_config.sp.bpf_send_verdict_fd = prog_fd[1];
	my_config.sp.bpf_recv_parse_fd = prog_fd[0];
	my_config.sp.bpf_recv_verdict_fd = prog_fd[1];

	setsockopt(fd, SOL_SOCKET, SO_ULP, &my_config, sizeof(my_config));

Future work:

- Fill in all the expected semantics. A goal of socket tap is transparency with applications.
- Performance evaluation.
- Add a mechanism to allow an admin process to tap other users' sockets.
- Add a userspace tap that can divert application messages through userspace. I'm thinking of connecting tapped sockets to KCM to provide the interface.
- Integrate with kTLS.
- Support for BPF_REDIRECT. This would be useful to redirect messages to different sockets like in John Fastabend's socket redirect.

Tom Herbert (2):
  bpf: Add a BPF return code to disconnect a connection
  stap: Socket tap

 include/net/stap.h        |  43 +++
 include/uapi/linux/bpf.h  |   1 +
 include/uapi/linux/stap.h |  21 ++
 net/Kconfig               |   1 +
 net/Makefile              |   1 +
 net/stap/Kconfig          |   8 +
 net/stap/Makefile         |   3 +
 net/stap/stap_main.c      | 769 ++
 8 files changed, 847 insertions(+)
 create mode 100644 include/net/stap.h
 create mode 100644 include/uapi/linux/stap.h
 create mode 100644 net/stap/Kconfig
 create mode 100644 net/stap/Makefile
 create mode 100644 net/stap/stap_main.c

-- 
2.11.0
Re: [PATCH] wan: dscc4: add checks for dma mapping errors
Alexey Khoroshilov: > The driver does not check if mapping dma memory succeed. > The patch adds the checks and failure handling. > > Found by Linux Driver Verification project (linuxtesting.org). > > Signed-off-by: Alexey Khoroshilov Please amend your subject line as: Subject: [PATCH net-next v2 1/1] dscc4: add checks for dma mapping errors. Rationale: davem is not supposed to guess the branch the patch should be applied to. [...] > diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c > index 799830ffcae2..1a94f0a95b2c 100644 > --- a/drivers/net/wan/dscc4.c > +++ b/drivers/net/wan/dscc4.c > @@ -522,19 +522,27 @@ static inline int try_get_rx_skb(struct dscc4_dev_priv > *dpriv, > struct RxFD *rx_fd = dpriv->rx_fd + dirty; > const int len = RX_MAX(HDLC_MAX_MRU); > struct sk_buff *skb; > - int ret = 0; > + dma_addr_t addr; > > skb = dev_alloc_skb(len); > dpriv->rx_skbuff[dirty] = skb; > - if (skb) { > - skb->protocol = hdlc_type_trans(skb, dev); > - rx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev, > - skb->data, len, PCI_DMA_FROMDEVICE)); > - } else { > - rx_fd->data = 0; > - ret = -1; > - } > - return ret; > + if (!skb) > + goto err_out; > + > + skb->protocol = hdlc_type_trans(skb, dev); > + addr = pci_map_single(dpriv->pci_priv->pdev, > + skb->data, len, PCI_DMA_FROMDEVICE); > + if (pci_dma_mapping_error(dpriv->pci_priv->pdev, addr)) > + goto err_free_skb; Nit: please use a local 'struct pci_dev *pdev = dpriv->pci_priv->pdev;' [...] > @@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff > *skb, > struct dscc4_dev_priv *dpriv = dscc4_priv(dev); > struct dscc4_pci_priv *ppriv = dpriv->pci_priv; > struct TxFD *tx_fd; > + dma_addr_t addr; > int next; > > + addr = pci_map_single(ppriv->pdev, skb->data, skb->len, > + PCI_DMA_TODEVICE); > + if (pci_dma_mapping_error(ppriv->pdev, addr)) { > + dev_kfree_skb_any(skb); > + dev->stats.tx_errors++; It should read 'dev->stats.tx_dropped++'. -- Ueimor
Re: [PATCH net-next v3 00/13] Update DSA's FDB API and perform switchdev cleanup
From: Arkadi Sharshevsky
Date: Sun, 6 Aug 2017 16:15:38 +0300

> The patchset adds support for configuring static FDB entries via the
> switchdev notification chain. The current method for FDB configuration
> uses the switchdev's bridge bypass implementation. In order to support
> this legacy way and to perform the switchdev cleanup, the implementation
> is moved inside DSA.
>
> The DSA drivers cannot sync the software bridge with hardware learned
> entries and use the switchdev's implementation of bypass FDB dumping.
> Because they are the only ones using this functionality, the fdb_dump
> implementation is moved from switchdev code into DSA.
>
> Finally after this changes a major cleanup in switchdev can be done.
> ---
> Please see individual patches for patch specific change logs.
> v1->v2
> - Split MDB/vlan dump removal into core/driver removal.
>
> v2->v3
> - The self implementation for FDB add/del is moved inside DSA.

Series applied, thank you.
Re: [PATCH 0/3] ARM: dts: keystone-k2g: Add DCAN instances to 66AK2G
Hi Santosh, On 08/04/2017 12:07 PM, Santosh Shilimkar wrote: > Hi Franklin, > > On 8/2/2017 1:18 PM, Franklin S Cooper Jr wrote: >> Add D CAN nodes to 66AK2G based SoC dtsi. >> >> Franklin S Cooper Jr (2): >>dt-bindings: net: c_can: Update binding for clock and power-domains >> property >>ARM: configs: keystone: Enable D_CAN driver >> >> Lokesh Vutla (1): >>ARM: dts: k2g: Add DCAN nodes >> > Any DCAN driver dependency with these patchset ? If not, I can > queue this up so do let me know. There aren't any dependencies. > > Regards, > Santosh
Re: [PATCH net-next] selftests: bpf: add a test for XDP redirect
On 08/07/2017 10:14 PM, William Tu wrote:

Add test for xdp_redirect by creating two namespaces with two veth peers, then forward packets in-between.

Signed-off-by: William Tu
Cc: Daniel Borkmann
Cc: John Fastabend

Acked-by: Daniel Borkmann
Re: [PATCH] hamradio: baycom: make hdlcdrv_ops const
From: Bhumika Goyal
Date: Sun, 6 Aug 2017 14:21:45 +0530

> Make hdlcdrv_ops structures const as they are only passed to
> hdlcdrv_register function. The corresponding argument is of type const,
> so make the structures const.
>
> Signed-off-by: Bhumika Goyal

Applied, thanks.
Re: [PATCH ipsec-next] xfrm: check that cached bundle is still valid
From: Florian Westphal
Date: Sun, 6 Aug 2017 10:19:07 +0200

> Quoting Ilan Tayari:
> 1. Set up a host-to-host IPSec tunnel (or transport, doesn't matter)
> 2. Ping over IPSec, or do something to populate the pcpu cache
> 3. Join a MC group, then leave MC group
> 4. Try to ping again using same CPU as before -> traffic
> doesn't egress the machine at all
>
> Ilan debugged the problem down to the fact that one of the path dsts
> devices point to lo due to earlier dst_dev_put().
> In this case, dst is marked as DEAD and we cannot reuse the bundle.
>
> The cache only asserted that the requested policy and that of the cached
> bundle match, but it's not enough - also verify the path is still valid.
>
> Fixes: ec30d78c14a813 ("xfrm: add xdst pcpu cache")
> Reported-by: Ayham Masood
> Tested-by: Ilan Tayari
> Signed-off-by: Florian Westphal

Since this regression is from the flow cache removal that went directly into my tree, I'll apply this directly to net-next as well.

Thanks Florian.
Re: [PATCH net-next v2 0/3] net: dsa: remove useless arguments
From: Vivien Didelot
Date: Sat, 5 Aug 2017 16:20:16 -0400

> Several DSA core setup functions take many arguments, mostly because of
> the legacy code. This patch series removes the useless args of these
> functions, where either the dsa_switch or dsa_port argument is enough.
>
> Changes in v2:
> - ds->dev is already assigned by dsa_switch_alloc

Series applied, thanks.
Re: [PATCH][net-next] net: hns3: fix spelling mistake: "capabilty" -> "capability"
From: Colin King
Date: Sat, 5 Aug 2017 14:46:35 +0100

> From: Colin Ian King
>
> Trivial fix to spelling mistake in dev_err error message and also
> split overly long line to avoid a checkpatch warning.
>
> Signed-off-by: Colin Ian King

Applied, thank you.
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
On Mon, 07 Aug 2017 13:26:03 -0700 (PDT) David Miller wrote:

> From: Stephen Hemminger
> Date: Mon, 7 Aug 2017 12:12:35 -0700
>
> > Dave, I asked for test cases, and received none.
>
> You don't need a test case to type make and make sure the build succeeds.

It did succeed for libmnl but was not doing anything.
Re: [PATCH v4 net-next 0/5] Refactor lan9303_xxx_packet_processing
From: Egil Hjelmeland
Date: Sat, 5 Aug 2017 13:05:45 +0200

> This series is purely non functional.
>
> It changes the lan9303_enable_packet_processing,
> lan9303_disable_packet_processing() to pass port number (0,1,2) as
> parameter instead of port offset. This aligns them with
> other functions in the module, and makes it possible to simplify the code.
>
> The lan9303_enable_packet_processing, lan9303_disable_packet_processing
> functions operate on port. Therefore rename the functions to reflect that
> as well.
>
> Reviewer pointed out lan9303_get_ethtool_stats would be better off with
> the use of a lan9303_read_switch_port(). So that was added to the series.
 ...

Series applied, thank you.
Re: [PATCH net-next v2 0/5] ipv6: sr: add support for advanced local segment processing
From: David Lebrun
Date: Sat, 5 Aug 2017 12:38:23 +0200

> v2: use EXPORT_SYMBOL_GPL
>
> The current implementation of IPv6 SR supports SRH insertion/encapsulation
> and basic segment endpoint behavior (i.e., processing of an SRH contained in
> a packet whose active segment (IPv6 DA) is routed to the local node). This
> behavior simply consists of updating the DA to the next segment and forwarding
> the packet accordingly. This processing is realised for all such packets,
> regardless of the active segment.
>
> The most recent specifications of IPv6 SR [1] [2] extend the SRH processing
> features as follows. Each segment endpoint defines a MyLocalSID table.
> This table maps segments to operations to perform. For each ingress IPv6
> packet whose DA is part of a given prefix, the segment endpoint looks
> up the active segment (i.e., the IPv6 DA) in the MyLocalSID table and
> applies the corresponding operation. Such specifications enable to specify
> arbitrary operations besides the basic SRH processing and allow for a more
> fine-grained classification.
>
> This patch series implements those extended specifications by leveraging
> a new type of lightweight tunnel, seg6local.
 ...

Series applied, thanks David.
Re: [PATCH net] net: sched: fix NULL pointer dereference when action calls some targets
(Cc'ing netfilter and Jamal)

On Sat, Aug 5, 2017 at 4:35 AM, Xin Long wrote:
> As we know in some target's checkentry it may dereference par.entryinfo
> to check entry stuff inside. But when sched action calls xt_check_target,
> par.entryinfo is set with NULL. It would cause kernel panic when calling
> some targets.
>
> It can be reproduced with:
> # tc qd add dev eth1 ingress handle :
> # tc filter add dev eth1 parent : u32 match u32 0 0 action xt \
> -j ECN --ecn-tcp-remove
>
> It could also crash kernel when using target CLUSTERIP or TPROXY.
>
> By now there's no proper value for par.entryinfo in ipt_init_target,
> but it can not be set with NULL. This patch is to avoid all these
> panics by setting it with an ipt_entry obj with all members 0.
>
> Signed-off-by: Xin Long
> ---
> net/sched/act_ipt.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
> index 7c4816b..0f09f70 100644
> --- a/net/sched/act_ipt.c
> +++ b/net/sched/act_ipt.c
> @@ -41,6 +41,7 @@ static int ipt_init_target(struct net *net, struct xt_entry_target *t,
>  {
>  	struct xt_tgchk_param par;
>  	struct xt_target *target;
> +	struct ipt_entry e;
>  	int ret = 0;
>
>  	target = xt_request_find_target(AF_INET, t->u.user.name,
> @@ -48,10 +49,11 @@ static int ipt_init_target(struct net *net, struct xt_entry_target *t,
>  	if (IS_ERR(target))
>  		return PTR_ERR(target);
>
> +	memset(&e, 0, sizeof(e));
>  	t->u.kernel.target = target;
>  	par.net = net;
>  	par.table = table;
> -	par.entryinfo = NULL;
> +	par.entryinfo = &e;

This looks like a complete API burden?
Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1
On Mon, Aug 7, 2017 at 2:05 PM, John Stultz wrote:
> So, with recent testing with my HiKey board, I've been noticing some
> quirky behavior with my USB eth adapter.
>
> Basically, plugging the usb eth adapter in and then removing it, when
> plugging it back in I often find that it's not detected, and the system
> slowly spits out the following message over and over:
> unregister_netdevice: waiting for eth0 to become free. Usage count = 1

The other bit is that after this starts printing, the board will no longer reboot (it hangs, continuing to occasionally print the above message), and I have to manually reset the device.

thanks
-john
Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
From: Ding Tianhong
Date: Mon, 7 Aug 2017 12:13:17 +0800

> Hi David:
>
> I think networking tree merge it is a better choice, as it mainly used to tell the NIC
> drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
> RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
> view. :)
>
> Hi Bjorn:
>
> Could you help review this patch or give some feedback ?

I'm still waiting on this... Bjorn?
Re: [net-next PATCH v2] bpf: devmap fix mutex in rcu critical section
From: John Fastabend
Date: Fri, 04 Aug 2017 22:02:19 -0700

> Originally we used a mutex to protect concurrent devmap update
> and delete operations from racing with netdev unregister notifier
> callbacks.
>
> The notifier hook is needed because we increment the netdev ref
> count when a dev is added to the devmap. This ensures the netdev
> reference is valid in the datapath. However, we don't want to block
> unregister events, hence the initial mutex and notifier handler.
>
> The concern was in the notifier hook we search the map for dev
> entries that hold a refcnt on the net device being torn down. But,
> in order to do this we require two steps,
 ...
> Fortunately, by writing slightly better code we can avoid the
> mutex altogether. If CPU 1 in the above example uses a cmpxchg
> and _only_ replaces the dev reference in the map when it is in
> fact the expected dev the race is removed completely. The two
> cases being illustrated here, first the race condition,
 ...
> And voilà the original race we tried to solve with a mutex is
> corrected and the trace noted by Sasha below is resolved due
> to removal of the mutex.
>
> Note: When walking the devmap and removing dev references as needed
> we depend on the core to fail any calls to dev_get_by_index() using
> the ifindex of the device being removed. This way we do not race with
> the user while searching the devmap.
>
> Additionally, the mutex was also protecting list add/del/read on
> the list of maps in-use. This patch converts this to an RCU list
> and spinlock implementation. This protects the list from concurrent
> alloc/free operations. The notifier hook walks this list so it uses
> RCU read semantics.
>
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
> in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1
> 1 lock held by syz-executor1/16315:
> #0: (rcu_read_lock){..}, at: [] map_delete_elem kernel/bpf/syscall.c:577 [inline]
> #0: (rcu_read_lock){..}, at: [] SYSC_bpf kernel/bpf/syscall.c:1427 [inline]
> #0: (rcu_read_lock){..}, at: [] SyS_bpf+0x1d32/0x4ba0 kernel/bpf/syscall.c:1388
>
> Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map")
> Reported-by: Sasha Levin
> Signed-off-by: Daniel Borkmann
> Signed-off-by: John Fastabend

Applied, thanks John.
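The "replace only if it is still the expected dev" idea from the quoted changelog can be sketched in user space with C11 atomics. This is an illustration under stated assumptions, not the devmap code: struct fake_dev, dev_map and map_forget_dev are hypothetical names, and atomic_compare_exchange_strong stands in for the kernel's cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct fake_dev {
	int ifindex;
};

#define MAP_SLOTS 4

/* Hypothetical stand-in for the devmap array; zero-initialized to NULL. */
static _Atomic(struct fake_dev *) dev_map[MAP_SLOTS];

/*
 * Notifier-side cleanup: clear every slot that still points at 'dying'.
 * A slot that a concurrent update has already switched to another device
 * is left alone, because the compare-and-exchange only succeeds when the
 * slot holds the expected pointer. Returns the number of slots cleared.
 */
static int map_forget_dev(struct fake_dev *dying)
{
	int cleared = 0;

	assert(dying != NULL);
	for (int i = 0; i < MAP_SLOTS; i++) {
		struct fake_dev *expected = dying;

		if (atomic_compare_exchange_strong(&dev_map[i], &expected,
						   (struct fake_dev *)NULL))
			cleared++;
	}
	return cleared;
}
```

Because the exchange is conditional, no lock is needed to keep the cleanup from clobbering an entry that was updated in the meantime, which is the property the changelog relies on.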
Re: [Patch net-next 0/2] net_sched: clean up filter handle
From: Cong Wang
Date: Fri, 4 Aug 2017 21:31:41 -0700

> This patchset sits in my local branch for a long time, it is time to
> send it out. It cleans up the ambiguous use of 'unsigned long fh',
> please see each of them for details.

Series applied, thanks Cong.
Re: [PATCH net-next v2] lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPL
From: Roopa Prabhu
Date: Fri, 4 Aug 2017 18:19:18 -0700

> From: Roopa Prabhu
>
> Signed-off-by: Roopa Prabhu
> ---
> v2 - fixed a incorrect replace

Applied, thanks.
Re: [PATCH net-next v4 0/2] bpf: add support for sys_{enter|exit}_* tracepoints
From: Yonghong Song
Date: Fri, 4 Aug 2017 16:00:08 -0700

> Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
> style tracepoints. The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
> tracepoints are treated differently from other tracepoints and there
> is no bpf hook to it.
>
> This patch set adds bpf support for these syscalls tracepoints and also
> adds a test case for it.
>
> Changelogs:
> v3 -> v4:
> - Check the legality of ctx offset access for syscall tracepoint as well.
>   trace_event_get_offsets will return correct max offset for each
>   specific syscall tracepoint.
> - Use variable length array to avoid hardcode 6 as the maximum
>   arguments beyond syscall_nr.
> v2 -> v3:
> - Fix a build issue
> v1 -> v2:
> - Do not use TRACE_EVENT_FL_CAP_ANY to identify syscall tracepoint.
>   Instead use trace_event_call->class.

Series applied, thank you.
Re: [PATCH] of_mdio: use of_property_read_u32_array()
From: Sergei Shtylyov
Date: Sat, 05 Aug 2017 00:43:43 +0300

> The "fixed-link" prop support predated of_property_read_u32_array(), so
> basically had to open-code it. Using the modern API saves 24 bytes of the
> object code (ARM gcc 4.8.5); the only behavior change would be that the
> prop length check is now less strict (however the strict pre-check done
> in of_phy_is_fixed_link() is left intact anyway)...
>
> Signed-off-by: Sergei Shtylyov

Applied to net-next.
Re: [PATCH] wan: dscc4: add checks for dma mapping errors
From: Alexey Khoroshilov
Date: Fri, 4 Aug 2017 23:23:24 +0300

> The driver does not check if mapping dma memory succeed.
> The patch adds the checks and failure handling.
>
> Found by Linux Driver Verification project (linuxtesting.org).
>
> Signed-off-by: Alexey Khoroshilov

This is a great example of why it can be irritating to see these mechanical "bug fixes" for drivers that very few people use and actually test: they introduce new bugs.

> @@ -522,19 +522,27 @@ static inline int try_get_rx_skb(struct dscc4_dev_priv *dpriv,
>  	struct RxFD *rx_fd = dpriv->rx_fd + dirty;
>  	const int len = RX_MAX(HDLC_MAX_MRU);
>  	struct sk_buff *skb;
> -	int ret = 0;
> +	dma_addr_t addr;
>
>  	skb = dev_alloc_skb(len);
>  	dpriv->rx_skbuff[dirty] = skb;

skb recorded here.

> +err_free_skb:
> +	dev_kfree_skb_any(skb);

Yet freed here in the error path.

dpriv->rx_skbuff[dirty] should not be set to 'skb' until all possible failure tests have passed.
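The ordering asked for in this review (record the buffer in the ring only once nothing can fail any more) can be shown with a user-space sketch. Everything here is a hypothetical stand-in: struct fake_ring mirrors rx_skbuff[], malloc/free stand in for skb allocation, and fake_map() with its 'fail' switch stands in for the DMA mapping and its error check. The sketch also assumes the slot is empty on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 4

/* Hypothetical stand-in for the driver's rx_skbuff[] array. */
struct fake_ring {
	void *buf[RING_SIZE];
};

/* Hypothetical mapping step; 'fail' lets a caller simulate a mapping error. */
static bool fake_map(void *buf, bool fail)
{
	return buf != NULL && !fail;
}

/*
 * Refill one ring slot. The buffer is published in the ring only after
 * allocation and mapping have both succeeded, so the error path never
 * leaves a freed pointer behind in ring->buf[slot].
 */
static int ring_refill_slot(struct fake_ring *ring, unsigned int slot,
			    bool fail_map)
{
	void *buf;

	assert(ring != NULL && slot < RING_SIZE);

	buf = malloc(64);
	if (!buf)
		goto err_out;
	if (!fake_map(buf, fail_map))
		goto err_free;

	ring->buf[slot] = buf;	/* publish last */
	return 0;

err_free:
	free(buf);
err_out:
	ring->buf[slot] = NULL;
	return -1;
}
```

With this ordering a failed refill leaves the slot NULL, so later cleanup code cannot free the same buffer twice.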
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
So, with recent testing with my HiKey board, I've been noticing some quirky behavior with my USB eth adapter.

Basically, plugging the usb eth adapter in and then removing it, when plugging it back in I often find that it's not detected, and the system slowly spits out the following message over and over:

  unregister_netdevice: waiting for eth0 to become free. Usage count = 1

I've tried to go through and bisect it, but the issue isn't always reproducible, so I'm getting lots of false negatives (the same kernel does not reproduce the issue on every boot). I've done three bisection passes (always restarting with the "first bad commit" from the previous bisection as the initial bad commit for the following pass), and it does seem to keep moving back. But it seems much easier to trigger with newer kernels than older ones (and so far I've not seen it with 4.12).

Wanted to see if anyone had any ideas what might be going wrong, and how I should further debug this.

The last bisect log I generated was:

# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [98fdd857a3bd6a3bf0003d3f68f07c25c85dcde3] net: ethernet: ti: cpsw: move skb timestamp to packet_submit
git bisect bad 98fdd857a3bd6a3bf0003d3f68f07c25c85dcde3
# good: [48b6bbef9a1789f0365c1a385879a1fea4460016] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 48b6bbef9a1789f0365c1a385879a1fea4460016
# good: [a2e8bbd2ef5457485f00b6b947bbbfa2778e5b1e] bpf: Fix test_obj_id.c for llvm 5.0
git bisect good a2e8bbd2ef5457485f00b6b947bbbfa2778e5b1e
# good: [273889e306256e95ea55d5ebaef99310cf589def] Merge tag 'mlx5-updates-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
git bisect good 273889e306256e95ea55d5ebaef99310cf589def
# bad: [8f46d46715a12f509e13200033a1ed4d6cf335ff] cxgb4: Use Firmware params to get buffer-group map
git bisect bad 8f46d46715a12f509e13200033a1ed4d6cf335ff
# bad: [f5c306470ed0a8f03ba7017f397da2555b5800d4] Merge tag 'mlx5-updates-2017-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
git bisect bad f5c306470ed0a8f03ba7017f397da2555b5800d4
# bad: [e289ef0ded13021db292be9aef134451546e7c60] net: dsa: mv88e6xxx: clarify SMI PHY functions
git bisect bad e289ef0ded13021db292be9aef134451546e7c60
# bad: [836d57e5c08e13bb206dcd559d96ee9355e8316e] liquidio: implement vlan filter enable and disable
git bisect bad 836d57e5c08e13bb206dcd559d96ee9355e8316e
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
# good: [0830106c53900181d336350581119af09e123bf3] ipv4: take dst->__refcnt when caching dst in fib
git bisect good 0830106c53900181d336350581119af09e123bf3
# good: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect good b838d5e1c5b6e57b10ec8af2268824041e3ea911
# bad: [9514528d92d4cbe086499322370155ed69f5d06c] ipv6: call dst_dev_put() properly
git bisect bad 9514528d92d4cbe086499322370155ed69f5d06c
# good: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree
git bisect good 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
# first bad commit: [9514528d92d4cbe086499322370155ed69f5d06c] ipv6: call dst_dev_put() properly

But again, reverting the "ipv6: call dst_dev_put() properly" commit doesn't seem to completely resolve the issue on newer kernels (though it may make it harder to trigger), and I suspect with further bisection passes I might move further back.

Ideas? I don't seem to have similar issues with USB mass storage devices, so it seems to be networking specific.

thanks
-john
Re: [PATCH net] net: sched: set xt_tgchk_param par.nft_compat with false in ipt_init_target
On Sat, Aug 5, 2017 at 4:32 AM, Xin Long wrote:
> Commit 55917a21d0cc ("netfilter: x_tables: add context to know if
> extension runs from nft_compat") introduced a member nft_compat to
> xt_tgchk_param structure.
>
> But it didn't set its value for ipt_init_target. With unexpected
> value in par.nft_compat, it may return unexpected result in some
> target's checkentry.
>
> This patch is to set par.nft_compat with false in ipt_init_target.

It's time to zero the whole structure and only initialize the non-zero fields, in case we add more fields in the future.
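The suggestion in this reply, zero the whole parameter block and then set only the fields that need a non-zero value, looks roughly like the sketch below. struct fake_tgchk_param is a hypothetical, trimmed stand-in for xt_tgchk_param (only the fields mentioned in this thread plus a hook mask are modelled), and fake_param_init is an invented helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed stand-in for struct xt_tgchk_param. */
struct fake_tgchk_param {
	const void *net;
	const char *table;
	const void *entryinfo;
	unsigned int hook_mask;
	bool nft_compat;
};

/*
 * Zero everything first, then fill in only the fields the caller cares
 * about. A field added to the structure later starts out as 0/false/NULL
 * without this function having to change.
 */
static void fake_param_init(struct fake_tgchk_param *par, const void *net,
			    const char *table, unsigned int hook_mask)
{
	assert(par != NULL);
	memset(par, 0, sizeof(*par));
	par->net = net;
	par->table = table;
	par->hook_mask = hook_mask;
}
```

The design choice is that an uninitialized stack value can no longer leak into a checkentry callback through a field nobody remembered to set.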
Re: [PATCH net v2] net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
From: Davide Caratti
Date: Thu, 3 Aug 2017 22:54:48 +0200

> if the NIC fails to validate the checksum on TCP/UDP, and validation of IP
> checksum is successful, the driver subtracts the pseudo-header checksum
> from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't
> do that if protocol is IPPROTO_SCTP, otherwise CRC32c validation fails.
>
> V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set
>
> Reported-by: Shuang Li
> Fixes: f8c6455bb04b ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE")
> Signed-off-by: Davide Caratti

Can I get reviews from some Mellanox folks please?
Re: [PATCH net-next] ibmvnic: Report rx buffer return codes as netdev_dbg
From: John Allen
Date: Mon, 7 Aug 2017 15:42:30 -0500

> Reporting any return code for a receive buffer as an "rx error" only
> produces alarming noise and the only values that have been observed to be
> used in this field are not error conditions. Change this to a netdev_dbg
> with a more descriptive message.
>
> Signed-off-by: John Allen

Applied, thanks John.
Re: [PATCH net] net: sched: set xt_tgchk_param par.net properly in ipt_init_target
On Sat, Aug 5, 2017 at 1:48 AM, Xin Long wrote:
> -static int __tcf_ipt_init(struct tc_action_net *tn, struct nlattr *nla,
> +static int __tcf_ipt_init(struct net *net, struct nlattr *nla,
>  			  struct nlattr *est, struct tc_action **a,
>  			  const struct tc_action_ops *ops, int ovr, int bind)
>  {
> +	struct tc_action_net *tn = net_generic(net, xt_net_id);
...
> @@ -193,18 +195,14 @@ static int tcf_ipt_init(struct net *net, struct nlattr *nla,
>  			struct nlattr *est, struct tc_action **a, int ovr,
>  			int bind)
>  {
> -	struct tc_action_net *tn = net_generic(net, ipt_net_id);
> -
> -	return __tcf_ipt_init(tn, nla, est, a, &act_ipt_ops, ovr, bind);
> +	return __tcf_ipt_init(net, nla, est, a, &act_ipt_ops, ovr, bind);
>  }
>
>  static int tcf_xt_init(struct net *net, struct nlattr *nla,
>  		       struct nlattr *est, struct tc_action **a, int ovr,
>  		       int bind)
>  {
> -	struct tc_action_net *tn = net_generic(net, xt_net_id);
> -
> -	return __tcf_ipt_init(tn, nla, est, a, &act_xt_ops, ovr, bind);
> +	return __tcf_ipt_init(net, nla, est, a, &act_xt_ops, ovr, bind);

This is not correct. You miss ipt_net_id != xt_net_id.
[PATCH net-next] ibmvnic: Report rx buffer return codes as netdev_dbg
Reporting any return code for a receive buffer as an "rx error" only produces alarming noise and the only values that have been observed to be used in this field are not error conditions. Change this to a netdev_dbg with a more descriptive message.

Signed-off-by: John Allen
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5932160..99576ba 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1579,7 +1579,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget)
 							  rx_comp.correlator);
 		/* do error checking */
 		if (next->rx_comp.rc) {
-			netdev_err(netdev, "rx error %x\n", next->rx_comp.rc);
+			netdev_dbg(netdev, "rx buffer returned with rc %x\n",
+				   be16_to_cpu(next->rx_comp.rc));
 			/* free the entry */
 			next->rx_comp.first = 0;
 			remove_buff_from_pool(adapter, rx_buff);
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
From: Stephen Hemminger
Date: Mon, 7 Aug 2017 12:12:35 -0700

> Dave, I asked for test cases, and received none.

You don't need a test case to type make and make sure the build succeeds.
[PATCH net-next] selftests: bpf: add a test for XDP redirect
Add test for xdp_redirect by creating two namespaces with two veth peers, then forward packets in-between.

Signed-off-by: William Tu
Cc: Daniel Borkmann
Cc: John Fastabend
---
 tools/include/uapi/linux/bpf.h                   |  3 +-
 tools/testing/selftests/bpf/Makefile             |  4 +-
 tools/testing/selftests/bpf/test_xdp_redirect.c  | 28
 tools/testing/selftests/bpf/test_xdp_redirect.sh | 54
 4 files changed, 86 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_xdp_redirect.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_redirect.sh

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1579cab49717..8d9bfcca3fe4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -592,7 +592,8 @@ union bpf_attr {
 	FN(get_socket_uid),		\
 	FN(set_hash),			\
 	FN(setsockopt),			\
-	FN(skb_adjust_room),
+	FN(skb_adjust_room),		\
+	FN(redirect_map),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 153c3a181a4c..3c2e67da4b41 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -15,9 +15,9 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_align
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
-	test_pkt_md_access.o
+	test_pkt_md_access.o test_xdp_redirect.o
 
-TEST_PROGS := test_kmod.sh
+TEST_PROGS := test_kmod.sh test_xdp_redirect.sh
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/bpf/test_xdp_redirect.c b/tools/testing/selftests/bpf/test_xdp_redirect.c
new file mode 100644
index ..ef9e704be140
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_redirect.c
@@ -0,0 +1,28 @@
+/* Copyright (c) 2017 VMware
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+SEC("redirect_to_111")
+int xdp_redirect_to_111(struct xdp_md *xdp)
+{
+	return bpf_redirect(111, 0);
+}
+SEC("redirect_to_222")
+int xdp_redirect_to_222(struct xdp_md *xdp)
+{
+	return bpf_redirect(222, 0);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_xdp_redirect.sh b/tools/testing/selftests/bpf/test_xdp_redirect.sh
new file mode 100755
index ..d8c73ed6e040
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_redirect.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+# Create 2 namespaces with two veth peers, and
+# forward packets in-between using generic XDP
+#
+# NS1(veth11)     NS2(veth22)
+#     |               |
+#     |               |
+#   (veth1, ------ (veth2,
+#    id:111)        id:222)
+#     | xdp forwarding |
+#     ------------------
+
+cleanup()
+{
+	if [ "$?" = "0" ]; then
+		echo "selftests: test_xdp_redirect [PASS]";
+	else
+		echo "selftests: test_xdp_redirect [FAILED]";
+	fi
+
+	set +e
+	ip netns del ns1 2> /dev/null
+	ip netns del ns2 2> /dev/null
+}
+
+set -e
+
+ip netns add ns1
+ip netns add ns2
+
+trap cleanup 0 2 3 6 9
+
+ip link add veth1 index 111 type veth peer name veth11
+ip link add veth2 index 222 type veth peer name veth22
+
+ip link set veth11 netns ns1
+ip link set veth22 netns ns2
+
+ip link set veth1 up
+ip link set veth2 up
+
+ip netns exec ns1 ip addr add 10.1.1.11/24 dev veth11
+ip netns exec ns2 ip addr add 10.1.1.22/24 dev veth22
+
+ip netns exec ns1 ip link set dev veth11 up
+ip netns exec ns2 ip link set dev veth22 up
+
+ip link set dev veth1 xdpgeneric obj test_xdp_redirect.o sec redirect_to_222
+ip link set dev veth2 xdpgeneric obj test_xdp_redirect.o sec redirect_to_111
+
+ip netns exec ns1 ping -c 1 10.1.1.22
+ip netns exec ns2 ping -c 1 10.1.1.11
+
+exit 0
--
2.7.4
Re: Qdisc->u32_node - licence to kill
On 08/07/2017 12:06 PM, Jiri Pirko wrote:
> Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastab...@gmail.com wrote:
>> On 08/07/2017 09:41 AM, Jiri Pirko wrote:
>>> Hi Jamal/Cong/David/all.
>>>
>>> Digging in the u32 code deeper now. I need to get rid of tp->q for shared
>>> blocks, but I found out about this:
>>>
>>> struct Qdisc {
>>> 	..
>>> 	void *u32_node;
>>> 	..
>>> };
>>>
>>> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
>>> stores a linked list of all hashtables added to one qdisc.
>>>
>>> So basically what you have is, you have 1 root ht per prio/pref. Then
>>> you can have multiple hts, linked from any other ht, does not matter in
>>> which prio/pref they are.
>>>
>>
>> We can create arbitrary hash tables here independent of prio/pref via
>> TCA_U32_DIVISOR. Then these can be linked to other hash tables via
>> TCA_U32_LINK commands.
>
> Yeah, that's what I thought.
>
>
>>
>> prio/pref does not really play any part here from my reading, except as
>> a further specifier in the walk callbacks. Making it a useful filter on
>> dump operations.
>
> Not correct. prio/pref is one level up priority, independent on specific
> cls implementation. You can have cls_u32 instance on prio 10 and
> cls_flower instance on prio 20. Both work.

ah right, let's make sure I got this right then (it's been a while since
I've read this code).

So the tc_ctl_tfilter hook walks classifiers, inserting the classifier
by prio. Then tcf_classify walks the list of classifiers looking for any
matches, specifically any return codes it recognizes or a return code
greater than zero. u32 though has this link notion that allows users to
jump to other u32 classifiers that are in this list, because it has a
global hash table list. So the per prio classifier isolation is not true
in u32 case.

>
> In fact, the current u32 "linking" ignores the upper level
> prio/pref and breaks user assumptions when he inserts rules with
> specific prio.
>
>

hmm yep, I guess users of u32 have a "different" set of assumptions when
working with u32 hash tables than the rest of the classifiers.

>>
>>> Do I understand that correctly that prio/pref only has meaning if
>>> linking does not take place, because if there is linking, the prio/pref
>>> of inserted rule is simply ignored?
>>
>> I think even then the prio/pref meaning is dubious, from u32_change,
>
> Please see tc_ctl_tfilter. That is where prio/pref is processed. What
> you describe is one level down.
>

got it.

>
>>
>> 	for (pins = rtnl_dereference(*ins); pins;
>> 	     ins = &pins->next, pins = rtnl_dereference(*ins))
>> 		if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
>> 			break;
>>
>> I think the list insert is done via handle not via prio/pref.
>>
>>>
>>> That is the most confusing thing I saw in net/sched/ so far.
>>> Is this a bug? Sounds like one.
>>>
>>
>> I don't think this is a bug; at the very least I don't see how we can
>> change it without breaking users. I know people depend on the hash map
>> capabilities and linking logic.
>
> Do they insert rules into multiple hashtables with different prio? Why?
> What is the usecase?
>

Single u32 classifier with multiple hash tables linked together I would
think is the normal way. I guess because the API never disallowed it and
the user API is a bit tricky, it's possible users may use multiple prios,
but probably it is not needed. Maybe Jamal has some use case where this
is required?

>
>>
>>> Did someone introduce *u32_node (formerly static struct tc_u_common
>>> *u32_list;) just to allow this weirdness?
>>>
>>> Can I just remove this shared tp_c and make the linking to other
>>> hashtables only possible within the same prio/pref? That would make
>>> sense to me.
>>>
>>
>> The idea to make linking hash tables only possible within the same
>> prio/pref will break existing programs. We can't do this; it's part of
>> UAPI now and people depend on it.
>
> That's why I asked if that is a bug. I still feel it is. But I
> definitely understand your concern. I'm just trying to figure out how
> to resolve this misdesign :(
>

I don't have a good argument for the current design, but just want to be
sure we don't break existing users.

.John
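
For readers following the thread, here is a minimal user-space sketch of the insertion loop quoted from u32_change. It assumes a cut-down `struct knode` (hypothetical, not the kernel's `struct tc_u_knode`) and leaves out the RTNL/RCU accessors; `TC_U32_NODE()` is taken to be the low 12 bits of the handle, as in the uapi header. It only illustrates that the ordering key is the node id carried in the handle, with prio/pref playing no part in the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 12 bits of a u32 handle select the node id. */
#define TC_U32_NODE(h) ((h) & 0xFFF)

/* Hypothetical stand-in for the kernel's key node: only what is needed here. */
struct knode {
	uint32_t handle;
	struct knode *next;
};

/* Insert n into the list rooted at *ins, ordered by node id only. */
static void knode_insert(struct knode **ins, struct knode *n)
{
	struct knode *pins;

	/* Walk until the first entry whose node id is larger than ours. */
	for (pins = *ins; pins; ins = &pins->next, pins = *ins)
		if (TC_U32_NODE(n->handle) < TC_U32_NODE(pins->handle))
			break;

	/* ins now points at the link that must be redirected to n. */
	n->next = pins;
	*ins = n;
}
```

Calling `knode_insert(&head, &node)` repeatedly keeps the list sorted by `TC_U32_NODE(handle)`, whatever the upper bits of the handle are.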
Re: [PATCH iproute2] lib: Dump ext-ack string by default
On 8/7/17 1:28 PM, David Ahern wrote:
> @@ -99,7 +95,12 @@ static int nl_dump_ext_err(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
>  		err_nlh = &err->msg;
>  	}
>
> -	return errfn(errmsg, off, err_nlh);
> +	if (errfn)
> +		return errfn(errmsg, off, err_nlh);
> +
> +	fprintf(stderr, "Error: %s\n", errmsg);
> +
> +	return 1;

Dang it, missing an 'if (errmsg)' since it does not have to exist.
Will send a v2.
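
A minimal sketch of the missing guard, assuming a hypothetical helper `dump_ext_err_string()` that stands in for the tail of nl_dump_ext_err(). Returning 0 when there is no message is an assumption made for this sketch, not necessarily what v2 does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print the kernel's error string if one was supplied.
 * Returns 1 if something was printed, 0 otherwise. */
static int dump_ext_err_string(FILE *out, const char *errmsg)
{
	/* The message attribute is optional, so errmsg may still be NULL. */
	if (!errmsg)
		return 0;

	fprintf(out, "Error: %s\n", errmsg);
	return 1;
}
```

Passing NULL to "%s" is undefined behaviour in C, which is why the check has to come before the fprintf().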
Re: [pull request][for-next 0/8] Mellanox, mlx5 shared 2017-08-07
On Mon, Aug 7, 2017 at 9:20 PM, David Miller wrote:
> From: Saeed Mahameed
> Date: Mon, 7 Aug 2017 13:18:00 +0300
>
>> Hi Dave & Doug,
>>
>> This series contains some low level updates for mlx5 core driver,
>> to be shared as base code for net-next and rdma for-next mlx5
>> 4.14 submissions.
>>
>> Please find more information in the tag message below.
>>
>> Please pull and let me know if there's any problem.
>>
>> Side note:
>> This series merges cleanly with current net-next, but it will conflict
>> with Jiri's patch
>> "mlx5e: push cls_flower and mqprio setup_tc processing into separate
>> functions"
>> which is under review.
>> Since this is shared code and must go to both rdma and net-next it has
>> to be based on 4.13-rc4, so there is not much I can do about this.
>
> I resolved the merge conflict as best as I could, please take a look.
>

Thanks Dave! Looks good. I will do some compilation testing later and if
I find anything I will post a patch.

Thanks a lot,
-Saeed.
[PATCH iproute2] lib: Dump ext-ack string by default
In time, errfn can be implemented for link, route, etc commands to
give a much more detailed response (e.g., point to the attribute
that failed). Doing so is much more complicated to process the
message and convert attribute ids to names.

In any case the error string returned by the kernel should be dumped
to the user, so make that happen now.

Signed-off-by: David Ahern
---
 lib/libnetlink.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 145de2cb0ccf..ee78f768a8bd 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -61,7 +61,6 @@ static int err_attr_cb(const struct nlattr *attr, void *data)
 	return MNL_CB_OK;
 }
 
-
 /* dump netlink extended ack error message */
 static int nl_dump_ext_err(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
 {
@@ -72,9 +71,6 @@ static int nl_dump_ext_err(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
 	const char *errmsg = NULL;
 	uint32_t off = 0;
 
-	if (!errfn)
-		return 0;
-
 	/* no TLVs, nothing to do here */
 	if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
 		return 0;
@@ -99,7 +95,12 @@ static int nl_dump_ext_err(const struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
 		err_nlh = &err->msg;
 	}
 
-	return errfn(errmsg, off, err_nlh);
+	if (errfn)
+		return errfn(errmsg, off, err_nlh);
+
+	fprintf(stderr, "Error: %s\n", errmsg);
+
+	return 1;
 }
 #else
 #warning "libmnl required for error support"
-- 
2.1.4
[PATCH net-next] liquidio: fix misspelled firmware image filenames
From: Derek Chickles

Fix misspelled firmware image filenames advertised via MODULE_FIRMWARE().

Signed-off-by: Derek Chickles
Signed-off-by: Felix Manlunas
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 8c2cd80..3ec0dd9 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -39,10 +39,14 @@ MODULE_AUTHOR("Cavium Networks, ");
 MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(LIQUIDIO_VERSION);
-MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210SV_NAME LIO_FW_NAME_SUFFIX);
-MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210NV_NAME LIO_FW_NAME_SUFFIX);
-MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_410NV_NAME LIO_FW_NAME_SUFFIX);
-MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_23XX_NAME LIO_FW_NAME_SUFFIX);
+MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210SV_NAME
+		"_" LIO_FW_NAME_TYPE_NIC LIO_FW_NAME_SUFFIX);
+MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210NV_NAME
+		"_" LIO_FW_NAME_TYPE_NIC LIO_FW_NAME_SUFFIX);
+MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_410NV_NAME
+		"_" LIO_FW_NAME_TYPE_NIC LIO_FW_NAME_SUFFIX);
+MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_23XX_NAME
+		"_" LIO_FW_NAME_TYPE_NIC LIO_FW_NAME_SUFFIX);
 
 static int ddr_timeout = 1;
 module_param(ddr_timeout, int, 0644);
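
The fix relies on C's compile-time concatenation of adjacent string literals. A small user-space sketch of how the advertised name changes, where every macro value below is a placeholder assumed for illustration (the driver's real definitions live in its own headers):

```c
#include <assert.h>
#include <string.h>

/* Placeholder values, assumed for illustration only. */
#define FW_DIR        "liquidio/"
#define FW_BASE_NAME  "lio_"
#define CARD_NAME     "210sv"
#define FW_TYPE_NIC   "nic"
#define FW_SUFFIX     ".bin"

/* Before the fix: no "_<type>" component between card name and suffix. */
static const char fw_name_old[] = FW_DIR FW_BASE_NAME CARD_NAME FW_SUFFIX;

/* After the fix: the extra "_" and type literals are glued in by the
 * compiler, exactly like the other adjacent literals. */
static const char fw_name_new[] =
	FW_DIR FW_BASE_NAME CARD_NAME "_" FW_TYPE_NIC FW_SUFFIX;
```

With these placeholder values the old string is "liquidio/lio_210sv.bin" and the new one is "liquidio/lio_210sv_nic.bin".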
Re: [PATCH 0/6] In-kernel QMI handling
Hi Bjorn,

>>> This series starts by moving the common definitions of the QMUX
>>> protocol to the uapi header, as they are shared with clients - both
>>> in kernel and userspace.
>>>
>>> This series then introduces in-kernel helper functions for aiding the
>>> handling of QMI encoded messages in the kernel. QMI encoding is a
>>> wire-format used in exchanging messages between the majority of QRTR
>>> clients and services.
>>
>> This raises a few red-flags for me.
>
> I'm glad it does. In discussions with the responsible team within
> Qualcomm I've highlighted a number of concerns about enabling this
> support in the kernel. Together we're continuously looking into what
> should be pushed out to user space, and trying to not introduce
> unnecessary new users.
>
>> So far, we've kept almost everything QMI related in userspace and
>> handled all QMI control-channel messages from libraries like libqmi or
>> uqmi via the cdc-wdm driver and the "rmnet" interface via the qmi_wwan
>> driver. The kernel drivers just serve as the transport.
>>
>
> The path that was taken to support the MSM-style devices was to
> implement net/qrtr, which exposes a socket interface to abstract the
> physical transports (QMUX or IPCROUTER in Qualcomm terminology).
>
> As I share your view on letting the kernel handle the transportation only
> the task of keeping track of registered services (service id -> node and
> port mapping) was done in a user space process and so far we've only
> ever had to deal with QMI encoded messages in various user space tools.

I think that the transport and multiplexing can be in the kernel as long
as it is done as a proper subsystem. Similar to Phonet or CAIF. Meaning it
should have a well defined socket interface that can be easily used from
userspace, but also a clean in-kernel interface handling.

If Qualcomm is supportive of this effort and is willing to actually
assist and/or open some of the specs or interface descriptions, then this
is a good thing. Service registration and cleanup is really done best in
the kernel. Same applies to multiplexing. Trying to do multiplexing in
userspace is always cumbersome and leads to overhead that is of no gain.
For example within oFono, we had to force everything to go via oFono
since it was the only sane way of handling it. Other approaches were
error prone and full of race conditions. You need a central entity that
can clean up.

For the definition of a UAPI to share some code, I am actually not sure
that is such a good idea. For example the QMI code in oFono follows a way
simpler approach. And I am not convinced that all the macros are actually
beneficial. For example, the whole netlink macros are pretty cumbersome.
Adding some Documentation/qmi.txt on what the wire format looks like and
what is expected seems to be a way better approach.

Regards

Marcel
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
On Mon, 07 Aug 2017 11:45:17 -0700 (PDT) David Miller wrote:

> From: David Ahern
> Date: Mon, 7 Aug 2017 12:09:31 -0600
>
> > On 8/7/17 12:06 PM, Stephen Hemminger wrote:
> >>> Does not work. Seems like you pushed the RFC commit which was known to
> >>> be incomplete.
> >>
> >> Patches welcome.
> >
> > What exists does not even compile. Patches will be sent once you fix that.
>
> Yeah seriously Stephen, you created this huge mess so the onus is really
> on you to fix it up.

It is fixed now.
Dave, I asked for test cases, and received none.
Re: Qdisc->u32_node - licence to kill
Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastab...@gmail.com wrote:
>On 08/07/2017 09:41 AM, Jiri Pirko wrote:
>> Hi Jamal/Cong/David/all.
>>
>> Digging in the u32 code deeper now. I need to get rid of tp->q for shared
>> blocks, but I found out about this:
>>
>> struct Qdisc {
>> 	..
>> 	void *u32_node;
>> 	..
>> };
>>
>> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
>> stores a linked list of all hashtables added to one qdisc.
>>
>> So basically what you have is, you have 1 root ht per prio/pref. Then
>> you can have multiple hts, linked from any other ht, does not matter in
>> which prio/pref they are.
>>
>
>We can create arbitrary hash tables here independent of prio/pref via
>TCA_U32_DIVISOR. Then these can be linked to other hash tables via
>TCA_U32_LINK commands.

Yeah, that's what I thought.

>
>prio/pref does not really play any part here from my reading, except as
>a further specifier in the walk callbacks. Making it a useful filter on
>dump operations.

Not correct. prio/pref is one level up priority, independent on specific
cls implementation. You can have cls_u32 instance on prio 10 and
cls_flower instance on prio 20. Both work.

In fact, the current u32 "linking" ignores the upper level
prio/pref and breaks user assumptions when he inserts rules with
specific prio.

>
>> Do I understand that correctly that prio/pref only has meaning if
>> linking does not take place, because if there is linking, the prio/pref
>> of inserted rule is simply ignored?
>
>I think even then the prio/pref meaning is dubious, from u32_change,

Please see tc_ctl_tfilter. That is where prio/pref is processed. What
you describe is one level down.

>
>	for (pins = rtnl_dereference(*ins); pins;
>	     ins = &pins->next, pins = rtnl_dereference(*ins))
>		if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
>			break;
>
>I think the list insert is done via handle not via prio/pref.
>
>>
>> That is the most confusing thing I saw in net/sched/ so far.
>> Is this a bug? Sounds like one.
>>
>
>I don't think this is a bug; at the very least I don't see how we can
>change it without breaking users. I know people depend on the hash map
>capabilities and linking logic.

Do they insert rules into multiple hashtables with different prio? Why?
What is the usecase?

>
>> Did someone introduce *u32_node (formerly static struct tc_u_common
>> *u32_list;) just to allow this weirdness?
>>
>> Can I just remove this shared tp_c and make the linking to other
>> hashtables only possible within the same prio/pref? That would make
>> sense to me.
>>
>
>The idea to make linking hash tables only possible within the same
>prio/pref will break existing programs. We can't do this; it's part of
>UAPI now and people depend on it.

That's why I asked if that is a bug. I still feel it is. But I
definitely understand your concern. I'm just trying to figure out how
to resolve this misdesign :(
Re: [PATCH net-next v4 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints
On 8/4/17 1:00 PM, Yonghong Song wrote:
> Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
> style tracepoints. The iovisor/bcc issue #748
> (https://github.com/iovisor/bcc/issues/748) documents this issue.
> For example, if you try to attach a bpf program to tracepoints
> syscalls/sys_enter_newfstat, you will get the following error:
>    # ./tools/trace.py t:syscalls:sys_enter_newfstat
>    Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
>    Failed to attach BPF to tracepoint
>
> The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
> tracepoints are treated differently from other tracepoints and there
> is no bpf hook to it.
>
> This patch adds bpf support for these syscalls tracepoints by
>   . permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF
>   . calling bpf programs in perf_syscall_enter and perf_syscall_exit
>
> The legality of bpf program ctx access is also checked.
> Function trace_event_get_offsets returns correct max offset for each
> specific syscall tracepoint, which is compared against the maximum offset
> access in bpf program.
>
> Signed-off-by: Yonghong Song

lgtm
Acked-by: Alexei Starovoitov
Re: [PATCH net-next v3 00/13] Update DSA's FDB API and perform switchdev cleanup
On 08/07/2017 07:59 AM, Vivien Didelot wrote:
> Hi Arkadi,
>
> Arkadi Sharshevsky writes:
>
>> The patchset adds support for configuring static FDB entries via the
>> switchdev notification chain. The current method for FDB configuration
>> uses the switchdev's bridge bypass implementation. In order to support
>> this legacy way and to perform the switchdev cleanup, the implementation
>> is moved inside DSA.
>>
>> The DSA drivers cannot sync the software bridge with hardware learned
>> entries and use the switchdev's implementation of bypass FDB dumping.
>> Because they are the only ones using this functionality, the fdb_dump
>> implementation is moved from switchdev code into DSA.
>>
>> Finally after this changes a major cleanup in switchdev can be done.
>> ---
>> Please see individual patches for patch specific change logs.
>> v1->v2
>> - Split MDB/vlan dump removal into core/driver removal.
>>
>> v2->v3
>> - The self implementation for FDB add/del is moved inside DSA.
>
> v3 behaves correctly:
>
>     # bridge fdb add e4:1d:2d:a5:f0:2a dev lan3
>     # bridge fdb add e4:1d:2d:a5:f0:4a dev lan4 master
>     # bridge fdb show
>     01:00:5e:00:00:01 dev eth0 self permanent
>     01:00:5e:00:00:01 dev eth1 self permanent
>     b6:f2:c8:3a:1c:71 dev lan0 master br0 permanent
>     e4:1d:2d:a5:f0:2a dev lan3 self static
>     e4:1d:2d:a5:f0:4a dev lan4 offload master br0 permanent
>     e4:1d:2d:a5:f0:4a dev lan4 self static
>     01:00:5e:00:00:01 dev br0 self permanent
>     # bridge fdb del e4:1d:2d:a5:f0:2a dev lan3
>     # bridge fdb del e4:1d:2d:a5:f0:4a dev lan4 master
>     # bridge fdb show
>     01:00:5e:00:00:01 dev eth0 self permanent
>     01:00:5e:00:00:01 dev eth1 self permanent
>     b6:f2:c8:3a:1c:71 dev lan0 master br0 permanent
>     01:00:5e:00:00:01 dev br0 self permanent
>
> Tested-by: Vivien Didelot

Same here:

Tested-by: Florian Fainelli

thanks!
--
Florian
[PATCH net-next v2 2/2] bpf: Extend check_uarg_tail_zero() checks
The function check_uarg_tail_zero() was created from bpf(2) for
BPF_OBJ_GET_INFO_BY_FD without taking the access_ok() nor the PAGE_SIZE
checks. Make these checks more generally available while unlikely to be
triggered, extend the memory range check and add an explanation
including why the ToCToU should not be a security concern.

Signed-off-by: Mickaël Salaün
Acked-by: Daniel Borkmann
Cc: Alexei Starovoitov
Cc: David S. Miller
Cc: Kees Cook
Cc: Martin KaFai Lau
Link: https://lkml.kernel.org/r/CAGXu5j+vRGFvJZmjtAcT8Hi8B+Wz0e1b6VKYZHfQP_=dxzc...@mail.gmail.com
---
 kernel/bpf/syscall.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c653ee0bd162..fbe09a0cccf4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -48,6 +48,15 @@ static const struct bpf_map_ops * const bpf_map_types[] = {
 #undef BPF_MAP_TYPE
 };
 
+/*
+ * If we're handed a bigger struct than we know of, ensure all the unknown bits
+ * are 0 - i.e. new user-space does not rely on any kernel feature extensions
+ * we don't know about yet.
+ *
+ * There is a ToCToU between this function call and the following
+ * copy_from_user() call. However, this is not a concern since this function is
+ * meant to be a future-proofing of bits.
+ */
 static int check_uarg_tail_zero(void __user *uaddr,
 				size_t expected_size,
 				size_t actual_size)
@@ -57,6 +66,12 @@ static int check_uarg_tail_zero(void __user *uaddr,
 	unsigned char val;
 	int err;
 
+	if (unlikely(actual_size > PAGE_SIZE))	/* silly large */
+		return -E2BIG;
+
+	if (unlikely(!access_ok(VERIFY_READ, uaddr, actual_size)))
+		return -EFAULT;
+
 	if (actual_size <= expected_size)
 		return 0;
 
@@ -1393,17 +1408,6 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	if (!capable(CAP_SYS_ADMIN) && sysctl_unprivileged_bpf_disabled)
 		return -EPERM;
 
-	if (!access_ok(VERIFY_READ, uattr, 1))
-		return -EFAULT;
-
-	if (size > PAGE_SIZE)	/* silly large */
-		return -E2BIG;
-
-	/* If we're handed a bigger struct than we know of,
-	 * ensure all the unknown bits are 0 - i.e. new
-	 * user-space does not rely on any kernel feature
-	 * extensions we dont know about yet.
-	 */
 	err = check_uarg_tail_zero(uattr, sizeof(attr), size);
 	if (err)
 		return err;
-- 
2.13.3
[PATCH net-next v2 1/2] bpf: Move check_uarg_tail_zero() upward
The function check_uarg_tail_zero() may be useful for other parts of the
code in the syscall.c file. Move this function to the beginning of the
file.

Signed-off-by: Mickaël Salaün
Acked-by: Daniel Borkmann
Cc: Alexei Starovoitov
Cc: David S. Miller
Cc: Kees Cook
Cc: Martin KaFai Lau
---

This is needed for the Landlock patch series. :)
---
 kernel/bpf/syscall.c | 52 ++++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6c772adabad2..c653ee0bd162 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -48,6 +48,32 @@ static const struct bpf_map_ops * const bpf_map_types[] = {
 #undef BPF_MAP_TYPE
 };
 
+static int check_uarg_tail_zero(void __user *uaddr,
+				size_t expected_size,
+				size_t actual_size)
+{
+	unsigned char __user *addr;
+	unsigned char __user *end;
+	unsigned char val;
+	int err;
+
+	if (actual_size <= expected_size)
+		return 0;
+
+	addr = uaddr + expected_size;
+	end  = uaddr + actual_size;
+
+	for (; addr < end; addr++) {
+		err = get_user(val, addr);
+		if (err)
+			return err;
+		if (val)
+			return -E2BIG;
+	}
+
+	return 0;
+}
+
 static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 {
 	struct bpf_map *map;
@@ -1246,32 +1272,6 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	return fd;
 }
 
-static int check_uarg_tail_zero(void __user *uaddr,
-				size_t expected_size,
-				size_t actual_size)
-{
-	unsigned char __user *addr;
-	unsigned char __user *end;
-	unsigned char val;
-	int err;
-
-	if (actual_size <= expected_size)
-		return 0;
-
-	addr = uaddr + expected_size;
-	end  = uaddr + actual_size;
-
-	for (; addr < end; addr++) {
-		err = get_user(val, addr);
-		if (err)
-			return err;
-		if (val)
-			return -E2BIG;
-	}
-
-	return 0;
-}
-
 static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 				   const union bpf_attr *attr,
 				   union bpf_attr __user *uattr)
-- 
2.13.3
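
To make the tail check easier to follow outside the kernel, here is a user-space analogue, offered only as a sketch. It assumes the buffer is ordinary memory, so get_user(), access_ok() and the PAGE_SIZE cap have no counterpart here; the name `check_tail_zero()` is made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* buf holds actual_size bytes, of which this code understands only the
 * first expected_size. Any non-zero byte beyond that means the caller
 * relies on a field we do not know about. */
static int check_tail_zero(const unsigned char *buf,
			   size_t expected_size, size_t actual_size)
{
	size_t i;

	/* Caller's struct is no bigger than ours: nothing to verify. */
	if (actual_size <= expected_size)
		return 0;

	for (i = expected_size; i < actual_size; i++)
		if (buf[i])
			return -E2BIG;

	return 0;
}
```

An older consumer can therefore accept a newer, larger struct as long as every byte it does not understand is zero, and rejects it with -E2BIG otherwise.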
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
From: David Ahern
Date: Mon, 7 Aug 2017 12:09:31 -0600

> On 8/7/17 12:06 PM, Stephen Hemminger wrote:
>>> Does not work. Seems like you pushed the RFC commit which was known to
>>> be incomplete.
>>
>> Patches welcome.
>
> What exists does not even compile. Patches will be sent once you fix that.

Yeah seriously Stephen, you created this huge mess so the onus is really
on you to fix it up.
Re: [PATCH] of_mdio: use of_property_read_u32_array()
On Mon, Aug 7, 2017 at 1:01 PM, Florian Fainelli wrote:
> On 08/07/2017 09:18 AM, Sergei Shtylyov wrote:
>> Hello!
>>
>> On 08/07/2017 05:18 PM, Rob Herring wrote:
>>
>>>> The "fixed-link" prop support predated of_property_read_u32_array(), so
>>>> basically had to open-code it. Using the modern API saves 24 bytes of the
>>>> object code (ARM gcc 4.8.5); the only behavior change would be that the
>>>> prop length check is now less strict (however the strict pre-check done
>>>> in of_phy_is_fixed_link() is left intact anyway)...
>>>>
>>>> Signed-off-by: Sergei Shtylyov
>>>>
>>>> ---
>>>> The patch is against the 'dt/next' branch of Rob Herring's 'linux-git'
>>>> repo plus the previously posted patch killing the useless local variable
>>>> in of_phy_register_fixed_link().
>>>
>>> It shouldn't depend on anything in my tree and David normally takes
>>> of_mdio.c changes.
>>
>>    MAINTAINERS still only point at the DT repo, perhaps it should be
>> updated?
>
> More or less done with this (minus the repo part):
>
> http://patchwork.ozlabs.org/patch/795887/

Really I'd like to see this fixed by moving of_mdio.c and of_net.c to
drivers/net/ as we've done for all other subsystems.

Rob
Re: [PATCH v3 net-next 0/7] net: l3mdev: Support for sockets bound to enslaved device
On 8/7/17 12:39 PM, David Miller wrote:
> Series applied, let's see if it builds this time :-)

I did an allyesconfig build before sending just to make sure, so our
mileage better not vary.
Re: [PATCH v3 net-next 0/7] net: l3mdev: Support for sockets bound to enslaved device
From: David Ahern
Date: Mon, 7 Aug 2017 08:44:15 -0700

> A missing piece to the VRF puzzle is the ability to bind sockets to
> devices enslaved to a VRF. This patch set adds the enslaved device
> index, sdif, to IPv4 and IPv6 socket lookups. The end result for users
> is the following scope options for services:
>
> 1. "global" services - sockets not bound to any device
>
>    Allows 1 service to work across all network interfaces with
>    connected sockets bound to the VRF the connection originates
>    (Requires net.ipv4.tcp_l3mdev_accept=1 for TCP and
>     net.ipv4.udp_l3mdev_accept=1 for UDP)
>
> 2. "VRF" local services - sockets bound to a VRF
>
>    Sockets work across all network interfaces enslaved to a VRF but
>    are limited to just the one VRF.
>
> 3. "device" services - sockets bound to a specific network interface
>
>    Service works only through the one specific interface.
>
> v3
> - convert __inet_lookup_established in dccp_v4_err; missed in v2
>
> v2
> - remove sk_lookup struct and add sdif as an argument to existing
>   functions
>
> Changes since RFC:
> - no significant logic changes; mainly whitespace cleanups

Series applied, let's see if it builds this time :-)
Re: pull-request: wireless-drivers-next 2017-08-07
From: Kalle Valo
Date: Mon, 07 Aug 2017 17:55:40 +0300

> here's the first pull request to net-next for 4.14, more info in the
> signed tag below. This time there's a simple conflict in iwlwifi but
> you can fix it just like Stephen did:
>
> https://lkml.kernel.org/r/20170804120408.0d147...@canb.auug.org.au
>
> Please let me know if you have any problems.

Pulled, thanks Kalle.
Re: [PATCH net-next v1 2/2] bpf: Extend check_uarg_tail_zero() checks
On 08/07/2017 06:36 PM, Mickaël Salaün wrote:
> The function check_uarg_tail_zero() was created from bpf(2) for
> BPF_OBJ_GET_INFO_BY_FD without taking the access_ok() nor the PAGE_SIZE
> checks. Make this checks more generally available while unlikely to be
> triggered, extend the memory range check and add an explanation
> including why the ToCToU should not be a security concern.
>
> Signed-off-by: Mickaël Salaün
> Cc: Alexei Starovoitov
> Cc: Daniel Borkmann
> Cc: David S. Miller
> Cc: Kees Cook
> Cc: Martin KaFai Lau
> Link: https://lkml.kernel.org/r/CAGXu5j+vRGFvJZmjtAcT8Hi8B+Wz0e1b6VKYZHfQP_=dxzc...@mail.gmail.com
> ---
>  kernel/bpf/syscall.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c653ee0bd162..b884fdc371e0 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -48,6 +48,15 @@ static const struct bpf_map_ops * const bpf_map_types[] = {
>  #undef BPF_MAP_TYPE
>  };
>
> +/*
> + * If we're handed a bigger struct than we know of, ensure all the unknown bits
> + * are 0 - i.e. new user-space does not rely on any kernel feature extensions
> + * we dont know about yet.

Nit: don't

> + *
> + * There is a ToCToU between this function call and the following
> + * copy_from_user() call. However, this should not be a concern since this

Let's make it a bit more clear to the reader: s/should not/is not/

> + * function is meant to be a future-proofing of bits.
> + */
>  static int check_uarg_tail_zero(void __user *uaddr,
>  				size_t expected_size,
>  				size_t actual_size)
> @@ -57,6 +66,12 @@ static int check_uarg_tail_zero(void __user *uaddr,
>  	unsigned char val;
>  	int err;
>
> +	if (unlikely(!access_ok(VERIFY_READ, uaddr, actual_size)))
> +		return -EFAULT;
> +
> +	if (unlikely(actual_size > PAGE_SIZE))	/* silly large */
> +		return -E2BIG;
> +

Yeah, moving the checks into check_uarg_tail_zero() is fine by me. Can
we make the 'silly large' test first, so we don't generate unnecessary
work if we bail out later anyway?

Other than that:

Acked-by: Daniel Borkmann

Thanks,
Daniel

>  	if (actual_size <= expected_size)
>  		return 0;
>
> @@ -1393,17 +1408,6 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>  	if (!capable(CAP_SYS_ADMIN) && sysctl_unprivileged_bpf_disabled)
>  		return -EPERM;
>
> -	if (!access_ok(VERIFY_READ, uattr, 1))
> -		return -EFAULT;
> -
> -	if (size > PAGE_SIZE)	/* silly large */
> -		return -E2BIG;
> -
> -	/* If we're handed a bigger struct than we know of,
> -	 * ensure all the unknown bits are 0 - i.e. new
> -	 * user-space does not rely on any kernel feature
> -	 * extensions we dont know about yet.
> -	 */
>  	err = check_uarg_tail_zero(uattr, sizeof(attr), size);
>  	if (err)
>  		return err;
[PATCH net-next 1/1] netvsc: make sure and unregister datapath
Go back to switching datapath directly in the notifier callback.
Otherwise datapath might not get switched on unregister.

No need for calling the NOTIFY_PEERS notifier since that is only for
a gratuitous ARP/ND packet; but that is not required with Hyper-V
because both VF and synthetic NIC have the same MAC address.

Reported-by: Vitaly Kuznetsov
Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger
---
 drivers/net/hyperv/hyperv_net.h |  3 --
 drivers/net/hyperv/netvsc.c     |  2 --
 drivers/net/hyperv/netvsc_drv.c | 71 ++++++++++++++++-------------------------
 3 files changed, 28 insertions(+), 48 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index c701b059c5ac..d1ea99a12cf2 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -724,14 +724,11 @@ struct net_device_context {
 	struct net_device __rcu *vf_netdev;
 	struct netvsc_vf_pcpu_stats __percpu *vf_stats;
 	struct work_struct vf_takeover;
-	struct work_struct vf_notify;
 
 	/* 1: allocated, serial number is valid. 0: not allocated */
 	u32 vf_alloc;
 	/* Serial number of the VF to team with */
 	u32 vf_serial;
-
-	bool datapath;	/* 0 - synthetic, 1 - VF nic */
 };
 
 /* Per channel data */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 9598220b3bcc..208f03aa83de 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -60,8 +60,6 @@ void netvsc_switch_datapath(struct net_device *ndev, bool vf)
 			       sizeof(struct nvsp_message),
 			       (unsigned long)init_pkt,
 			       VM_PKT_DATA_INBAND, 0);
-
-	net_device_ctx->datapath = vf;
 }
 
 static struct netvsc_device *alloc_net_device(void)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index e75c0f852a63..eb0023f55fe1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1649,55 +1649,35 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	return NOTIFY_OK;
 }
 
-/* Change datapath */
-static void netvsc_vf_update(struct work_struct *w)
+static int netvsc_vf_up(struct net_device *vf_netdev)
 {
-	struct net_device_context *ndev_ctx
-		= container_of(w, struct net_device_context, vf_notify);
-	struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
+	struct net_device_context *net_device_ctx;
 	struct netvsc_device *netvsc_dev;
-	struct net_device *vf_netdev;
-	bool vf_is_up;
-
-	if (!rtnl_trylock()) {
-		schedule_work(w);
-		return;
-	}
+	struct net_device *ndev;
 
-	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
-	if (!vf_netdev)
-		goto unlock;
+	ndev = get_netvsc_byref(vf_netdev);
+	if (!ndev)
+		return NOTIFY_DONE;
 
-	netvsc_dev = rtnl_dereference(ndev_ctx->nvdev);
+	net_device_ctx = netdev_priv(ndev);
+	netvsc_dev = rtnl_dereference(net_device_ctx->nvdev);
 	if (!netvsc_dev)
-		goto unlock;
-
-	vf_is_up = netif_running(vf_netdev);
-	if (vf_is_up != ndev_ctx->datapath) {
-		if (vf_is_up) {
-			netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
-			rndis_filter_open(netvsc_dev);
-			netvsc_switch_datapath(ndev, true);
-			netdev_info(ndev, "Data path switched to VF: %s\n",
-				    vf_netdev->name);
-		} else {
-			netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
-			netvsc_switch_datapath(ndev, false);
-			rndis_filter_close(netvsc_dev);
-			netdev_info(ndev, "Data path switched from VF: %s\n",
-				    vf_netdev->name);
-		}
+		return NOTIFY_DONE;
 
-		/* Now notify peers through VF device. */
-		call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, ndev);
-	}
-unlock:
-	rtnl_unlock();
+	/* Bump refcount when datapath is active - Why? */
+	rndis_filter_open(netvsc_dev);
+
+	/* notify the host to switch the data path. */
+	netvsc_switch_datapath(ndev, true);
+	netdev_info(ndev, "Data path switched to VF: %s\n", vf_netdev->name);
+
+	return NOTIFY_OK;
 }
 
-static int netvsc_vf_notify(struct net_device *vf_netdev)
+static int netvsc_vf_down(struct net_device *vf_netdev)
 {
 	struct net_device_context *net_device_ctx;
+	struct netvsc_device *netvsc_dev;
 	struct net_device *ndev;
 
 	ndev = get_netvsc_byref(vf_netdev);
@@ -1705,7 +1685,13 @@ static int netvsc_vf_notify(struct net_device *vf_netdev)
 		return
[PATCH net-next 0/1] netvsc: another VF datapath fix
Previous fix was incomplete.

Stephen Hemminger (1):
  netvsc: make sure and unregister datapath

 drivers/net/hyperv/hyperv_net.h |  3 --
 drivers/net/hyperv/netvsc.c     |  2 --
 drivers/net/hyperv/netvsc_drv.c | 71 ++++++++++++++++-------------------------
 3 files changed, 28 insertions(+), 48 deletions(-)

-- 
2.11.0
Re: [PATCH RFC v2 3/5] samples/bpf: Fix inline asm issues building samples on arm64
Please, no. The amount of hellish hacks we are adding to deal with this is getting way out of control. BPF programs MUST have their own set of asm headers, this is the only way to get around this issue in the long term. I am also strongly against adding -static to the build.
Re: [PATCH net 1/1] s390/qeth: fix L3 next-hop in xmit qeth hdr
From: Ursula Braun
Date: Mon, 7 Aug 2017 13:28:39 +0200

> From: Julian Wiedmann
>
> On L3, the qeth_hdr struct needs to be filled with the next-hop
> IP address.
> The current code accesses rtable->rt_gateway without checking that
> rtable is a valid address. The accidental access to a lowcore area
> results in a random next-hop address in the qeth_hdr.
> rtable (or more precisely, skb_dst(skb)) can be NULL in rare cases
> (for instance together with AF_PACKET sockets).
> This patch adds the missing NULL-ptr checks.
>
> Signed-off-by: Julian Wiedmann
> Signed-off-by: Ursula Braun
> Fixes: 87e7597b5a3 qeth: Move away from using neighbour entries in
> qeth_l3_fill_header()

Applied.
Re: [PATCH v2] hysdn: fix to a race condition in put_log_buffer
From: Anton Volkov
Date: Mon, 7 Aug 2017 15:54:14 +0300

> The synchronization type that was used earlier to guard the loop that
> deletes unused log buffers may lead to a situation that prevents any
> thread from going through the loop.
>
> The patch deletes previously used synchronization mechanism and moves
> the loop under the spin_lock so the similar cases won't be feasible in
> the future.
>
> Found by Linux Driver Verification project (linuxtesting.org).
>
> Signed-off-by: Anton Volkov
> ---
> v2: Fixed coding style issues

Applied.
Re: [PATCH net-next v1 1/2] bpf: Move check_uarg_tail_zero() upward
On 08/07/2017 06:36 PM, Mickaël Salaün wrote:
> The function check_uarg_tail_zero() may be useful for other part of the
> code in the syscall.c file. Move this function at the beginning of the
> file.
>
> Signed-off-by: Mickaël Salaün
> Cc: Alexei Starovoitov
> Cc: Daniel Borkmann
> Cc: David S. Miller
> Cc: Kees Cook
> Cc: Martin KaFai Lau

Acked-by: Daniel Borkmann
Re: [PATCH] hns3: fix unused function warning
From: Arnd Bergmann
Date: Mon, 7 Aug 2017 12:41:53 +0200

> Without CONFIG_PCI_IOV, we get a harmless warning about an
> unused function:
>
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2273:13: error:
> 'hclge_disable_sriov' defined but not used [-Werror=unused-function]
>
> The #ifdefs in this driver are obviously wrong, so this just
> removes them and uses an IS_ENABLED() check that does the same
> thing correctly in a more readable way.
>
> Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility
> Layer Support")
> Signed-off-by: Arnd Bergmann

Applied.
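
A simplified user-space sketch of why a compile-time `if` avoids the warning that a mismatched `#ifdef` produces. `DEMO_IS_ENABLED()` and `CONFIG_DEMO_IOV` are stand-ins invented for this example; the kernel's real IS_ENABLED() also copes with options that are not defined at all, which this version does not attempt.

```c
#include <assert.h>

/* Stand-in option, always defined to 0 or 1 in this sketch. */
#define CONFIG_DEMO_IOV 0
#define DEMO_IS_ENABLED(option) (option)

static int sriov_disabled_calls;

/* Always compiled and always referenced below, so the compiler never
 * reports it as "defined but not used". */
static void demo_disable_sriov(void)
{
	sriov_disabled_calls++;
}

static void demo_remove(void)
{
	/* The condition is a compile-time constant: with the option off the
	 * call is dead code, yet it still counts as a use of the function
	 * and is still type-checked. */
	if (DEMO_IS_ENABLED(CONFIG_DEMO_IOV))
		demo_disable_sriov();
}
```

With the option set to 0 the call never runs; flipping it to 1 enables it without touching any preprocessor conditionals around the function definition.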
Re: [pull request][for-next 0/8] Mellanox, mlx5 shared 2017-08-07
From: Saeed Mahameed
Date: Mon, 7 Aug 2017 13:18:00 +0300

> Hi Dave & Doug,
>
> This series contains some low level updates for mlx5 core driver,
> to be shared as base code for net-next and rdma for-next mlx5
> 4.14 submissions.
>
> Please find more information in the tag message below.
>
> Please pull and let me know if there's any problem.
>
> Side note:
> This series merges cleanly with current net-next, but it will conflict with
> Jiri's patch
> "mlx5e: push cls_flower and mqprio setup_tc processing into separate
> functions"
> Which is under review.
> since this is shared code and must go to both rdma and net-next it has to be
> based on 4.13-rc4, so there is not much I can do about this.

I resolved the merge conflict as best as I could, please take a look.

Pulled, thanks.
[PATCH v1 net] TCP_USER_TIMEOUT and tcp_keepalive should conform to RFC5482
Change from version 0:

Rationale behind the change:

The man page for tcp(7) states

  when used with the TCP keepalive (SO_KEEPALIVE) option, TCP_USER_TIMEOUT
  will override keepalive to determine when to close a connection due to
  keepalive failure.

This is ambiguous at best. The user expectation is most likely that the
connection will be reset after TCP_USER_TIMEOUT milliseconds of
inactivity. The code, however, waits for the keepalive to kick in
(default 2hrs) and then resets the connection after one failure. What is
the rationale for that? The same effect can be obtained by simply
changing the value of tcp_keep_alive_probes.

Since the TCP_USER_TIMEOUT option was added based on RFC 5482, we need to
follow the RFC, which states:

  4.2 TCP keep-Alives:

  Some TCP implementations, such as those in BSD systems, use a different
  abort policy for TCP keep-alives than for user data. Thus, the TCP
  keep-alive mechanism might abort a connection that would otherwise have
  survived the transient period without connectivity. Therefore, if a
  connection that enables keep-alives is also using the TCP User Timeout
  Option, then the keep-alive timer MUST be set to a value larger than
  that of the adopted USER TIMEOUT.

This patch enforces the MUST and also dissociates the user timeout from
keepalive. A man page patch will be submitted separately.
Signed-off-by: Rao Shoaib
---
 net/ipv4/tcp.c | 10 --
 net/ipv4/tcp_timer.c | 9 +
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 71ce33d..f2af44d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2628,7 +2628,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_KEEPIDLE:
-		if (val < 1 || val > MAX_TCP_KEEPIDLE)
+		/* Per RFC5482 keepalive_time must be > user_timeout */
+		if (val < 1 || val > MAX_TCP_KEEPIDLE ||
+		    ((val * HZ) <= icsk->icsk_user_timeout))
 			err = -EINVAL;
 		else {
 			tp->keepalive_time = val * HZ;
@@ -2724,8 +2726,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_USER_TIMEOUT:
 		/* Cap the max time in ms TCP will retry or probe the window
 		 * before giving up and aborting (ETIMEDOUT) a connection.
+		 * Per RFC5482 TCP user timeout must be < keepalive_time.
+		 * If the default value changes later -- all bets are off.
 		 */
-		if (val < 0)
+		if (val < 0 || (tp->keepalive_time &&
+		    tp->keepalive_time <= msecs_to_jiffies(val)) ||
+		    net->ipv4.sysctl_tcp_keepalive_time <= msecs_to_jiffies(val))
 			err = -EINVAL;
 		else
 			icsk->icsk_user_timeout = msecs_to_jiffies(val);
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c0f..d39fe60 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -664,14 +664,7 @@ static void tcp_keepalive_timer (unsigned long data)
 	elapsed = keepalive_time_elapsed(tp);
 
 	if (elapsed >= keepalive_time_when(tp)) {
-		/* If the TCP_USER_TIMEOUT option is enabled, use that
-		 * to determine when to timeout instead.
-		 */
-		if ((icsk->icsk_user_timeout != 0 &&
-		    elapsed >= icsk->icsk_user_timeout &&
-		    icsk->icsk_probes_out > 0) ||
-		    (icsk->icsk_user_timeout == 0 &&
-		    icsk->icsk_probes_out >= keepalive_probes(tp))) {
+		if (icsk->icsk_probes_out >= keepalive_probes(tp)) {
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			tcp_write_err(sk);
 			goto out;
--
2.7.4
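The two setsockopt checks in the patch above amount to one ordering rule: the keepalive idle time has to stay strictly above the user timeout. A minimal userspace sketch of that rule follows, assuming both values are expressed in milliseconds rather than jiffies. The function names demo_keepidle_ok() and demo_user_timeout_ok() are hypothetical, and the range checks on the idle time (val < 1, MAX_TCP_KEEPIDLE) are left out.

```c
#include <stdbool.h>

/*
 * TCP_KEEPIDLE side: a new idle time is accepted only if it is strictly
 * greater than the user timeout already set on the socket. A user
 * timeout of 0 means "not set", which any positive idle time satisfies.
 */
bool demo_keepidle_ok(unsigned int keepidle_ms, unsigned int user_timeout_ms)
{
	return keepidle_ms > user_timeout_ms;
}

/*
 * TCP_USER_TIMEOUT side: a new timeout is rejected if it is negative,
 * if it reaches the per-socket idle time (when one is set), or if it
 * reaches the system-wide default idle time. As in the patch, the
 * default is checked even when a per-socket value exists.
 */
bool demo_user_timeout_ok(int timeout_ms, unsigned int sock_keepidle_ms,
			  unsigned int sysctl_keepidle_ms)
{
	if (timeout_ms < 0)
		return false;
	if (sock_keepidle_ms && sock_keepidle_ms <= (unsigned int)timeout_ms)
		return false;
	return sysctl_keepidle_ms > (unsigned int)timeout_ms;
}
```

With the 2 hour default idle time mentioned in the rationale (7200000 ms), a 30 second user timeout passes, while the same timeout on a socket whose idle time was lowered to 20 seconds is refused.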
Re: [PATCH net-next v3 13/13] net: switchdev: Remove bridge bypass support from switchdev
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> Currently the bridge port flags, vlans, FDBs and MDBs can be offloaded
> through the bridge code, making the switchdev's SELF bridge bypass
> implementation to be redundant. This implies several changes:
> - No need for dump infra in switchdev, DSA's special case is handled
>   privately.
> - Remove obj_dump from switchdev_ops.
> - FDBs are removed from obj_add/del routines, due to the fact that they
>   are offloaded through the bridge notification chain.
> - The switchdev_port_bridge_xx() and switchdev_port_fdb_xx() functions
>   can be removed.
>
> Signed-off-by: Arkadi Sharshevsky
> Reviewed-by: Vivien Didelot

Reviewed-by: Florian Fainelli
--
Florian
Re: [PATCH net-next v4 2/2] bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints
On 08/05/2017 01:00 AM, Yonghong Song wrote:

Signed-off-by: Yonghong Song

Acked-by: Daniel Borkmann
Re: [PATCH net-next v3 12/13] net: bridge: Remove FDB deletion through switchdev object
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> At this point no driver supports FDB add/del through switchdev object
> but rather via notification chain, thus, it is removed.
>
> Signed-off-by: Arkadi Sharshevsky
> Reviewed-by: Vivien Didelot

Reviewed-by: Florian Fainelli
--
Florian
Re: [PATCH net-next v3 09/13] net: dsa: Remove support for MDB dump from DSA's drivers
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> This is done as a preparation before removing support for MDB dump from
> DSA core. The MDBs are synced with the bridge and thus there is no
> need for special dump operation support.
>
> Signed-off-by: Arkadi Sharshevsky

Reviewed-by: Florian Fainelli
--
Florian
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
On 8/7/17 12:06 PM, Stephen Hemminger wrote:
>> Does not work. Seems like you pushed the RFC commit which was known to
>> be incomplete.
>
> Patches welcome.

What exists does not even compile. Patches will be sent once you fix that.
Re: [PATCH net-next v3 07/13] net: dsa: Remove support for vlan dump from DSA's drivers
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> This is done as a preparation before removing support for vlan dump from
> DSA core. The vlans are synced with the bridge and thus there is no
> need for special dump operation support.
>
> Signed-off-by: Arkadi Sharshevsky

Reviewed-by: Florian Fainelli
--
Florian
Re: [PATCH net-next v3 08/13] net: dsa: Remove support for bypass bridge port attributes/vlan set
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> The bridge port attributes/vlan for DSA devices should be set only
> from bridge code. Furthermore, The vlans are synced totally with the
> bridge so there is no need for special dump support.
>
> Signed-off-by: Arkadi Sharshevsky

Reviewed-by: Florian Fainelli
--
Florian
Re: [PATCH net-next v3 05/13] net: dsa: Move FDB add/del implementation inside DSA
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> Currently DSA uses switchdev's implementation of FDB add/del ndos. This
> patch moves the implementation inside DSA in order to support the legacy
> way for static FDB configuration.
>
> Signed-off-by: Arkadi Sharshevsky

Reviewed-by: Florian Fainelli
--
Florian
Re: [PATCH net-next v3 04/13] net: dsa: Add support for learning FDB through notification
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> Add support for learning FDB through notification. The driver defers
> the hardware update via ordered work queue. In case of a successful
> FDB add a notification is sent back to bridge.
>
> In case of hw FDB del failure the static FDB will be deleted from
> the bridge, thus, the interface is moved to down state in order to
> indicate inconsistent situation.
>
> Signed-off-by: Arkadi Sharshevsky

Reviewed-by: Florian Fainelli
--
Florian
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
On Mon, 7 Aug 2017 10:48:23 -0600
David Ahern wrote:

> On 8/4/17 10:47 AM, Stephen Hemminger wrote:
> > I will put in the libmnl version. If it doesn't work because no one sent
> > me test cases, then fine. send a patch for that.
>
> This commit:
>
> commit b6432e68ac2f1f6b4ea50aa0d6d47e72c445c71c
> Author: Stephen Hemminger
> Date: Fri Aug 4 09:52:15 2017 -0700
>
> iproute: Add support for extended ack to rtnl_talk
>
>
> Does not work. Seems like you pushed the RFC commit which was known to
> be incomplete.

Patches welcome.

> First, the Config is HAVE_MNL not HAVE_LIBMNL which is in lib/libnetlink.c.
>
> Second, changing that to HAVE_MNL does not work -- something is not
> getting passed in correctly. Just remove the semicolon on the else path:
>
> +#else
> +/* No extended error ack without libmnl */
> +static int nl_dump_ext_err(const struct nlmsghdr *nlh, nl_ext_ack_fn_t
> errfn)
> +{
> + return 0;
> +}
> +#endif
>
> and you will see that HAVE_MNL is never defined.

Ok, that I will fix.
Re: [PATCH net-next v3 10/13] net: dsa: Remove redundant MDB dump support
On 08/06/2017 06:15 AM, Arkadi Sharshevsky wrote:
> Currently the MDB HW database is synced with the bridge's one, thus,
> There is no need to support special dump functionality.
>
> Signed-off-by: Arkadi Sharshevsky

Reviewed-by: Florian Fainelli
--
Florian