Re: [PATCH v3 net-next 1/2] rocker: remove unused rocker_port parameter from rocker_port_kfree
Mon, May 25, 2015 at 07:28:35AM CEST, simon.hor...@netronome.com wrote:
> Remove unused rocker_port parameter from rocker_port_kfree.
>
> Also remove the rocker_port parameter from callers of rocker_port_kfree
> where it is now unused.
>
> Signed-off-by: Simon Horman <simon.hor...@netronome.com>
> Acked-by: Scott Feldman <sfel...@gmail.com>

Acked-by: Jiri Pirko <j...@resnulli.us>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2] irda: use msecs_to_jiffies for conversion to jiffies
API compliance scanning with coccinelle flagged:
./net/irda/timer.c:63:35-37: use of msecs_to_jiffies probably perferable

Converting milliseconds to jiffies by val * HZ / 1000 technically
is not a clean solution as it does not handle all corner cases correctly.
By changing the conversion to use msecs_to_jiffies(val) conversion is
correct in all cases. Further the () around the arithmetic expression
was dropped.

Patch was compile tested for x86_64_defconfig + CONFIG_IRDA=m

Patch is against 4.1-rc4 (localversion-next is -next-20150522)

Signed-off-by: Nicholas Mc Guire <hof...@osadl.org>
---

V2: botched prefix in the subject line updated to irda:
    thanks to David Miller <da...@davemloft.net>

Should the Status note at the top of the file be updated? It seems the
file has been marked experimental since at least 1999.

 net/irda/timer.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/irda/timer.c b/net/irda/timer.c
index 0c4c115..f2280f7 100644
--- a/net/irda/timer.c
+++ b/net/irda/timer.c
@@ -60,8 +60,8 @@ void irlap_start_query_timer(struct irlap_cb *self, int S, int s)
 	 * to avoid messing with for incoming connections requests and
 	 * to accommodate devices that perform discovery slower than us.
 	 * Jean II */
-	timeout = ((sysctl_slot_timeout * HZ / 1000) * (S - s)
-		   + XIDEXTRA_TIMEOUT + SMALLBUSY_TIMEOUT);
+	timeout = msecs_to_jiffies(sysctl_slot_timeout) * (S - s)
+		   + XIDEXTRA_TIMEOUT + SMALLBUSY_TIMEOUT;
 
 	/* Set or re-set the timer. We reset the timer for each received
 	 * discovery query, which allow us to automatically adjust to
-- 
1.7.10.4
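
One of the corner cases mentioned above can be shown in plain userspace C. This is only a sketch: SKETCH_HZ and both helper names are made up for illustration, and the round-up helper mirrors what msecs_to_jiffies() is understood to do when HZ divides 1000 evenly. It does not reproduce the kernel's overflow or negative-value handling.

```c
/* Assumed tick rate for this sketch; the kernel's HZ is a build-time option. */
#define SKETCH_HZ 100u
#define SKETCH_MSEC_PER_SEC 1000u

/* The open-coded conversion the patch removes: truncates toward zero. */
static unsigned long naive_ms_to_jiffies(unsigned int ms)
{
	return (unsigned long)ms * SKETCH_HZ / SKETCH_MSEC_PER_SEC;
}

/* Round-up conversion, only valid when SKETCH_HZ divides 1000 evenly. */
static unsigned long roundup_ms_to_jiffies(unsigned int ms)
{
	const unsigned long ms_per_tick = SKETCH_MSEC_PER_SEC / SKETCH_HZ;

	return ((unsigned long)ms + ms_per_tick - 1) / ms_per_tick;
}
```

With a 10 ms tick, a 5 ms timeout truncates to zero jiffies in the naive form, so the timer could fire immediately; the round-up form yields one jiffy.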
Re: [linuxwifi] [PATCH] iwlwifi: Remove use of the deprecacted marco, PTR_RET for the function, iwl_mvm_get_regdomain
On Sat, 2015-05-23 at 20:53 -0400, Nicholas Krause wrote:
> This removes the use of the two deprecated calls to the macro,
> PTR_RET in the function, iwl_mvm_get_regdomain and replaces them
> both with a call to the function, PTR_ERR_OR_ZERO.
>
> Signed-off-by: Nicholas Krause <xerofo...@gmail.com>

Applied in our internal tree after a few fixes to the commit message.
Will be pushed out with the regular process.

Thanks.

> ---
>  drivers/net/wireless/iwlwifi/mvm/mac80211.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
> index 40265b9..5fb2e8e 100644
> --- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c
> +++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
> @@ -319,7 +319,7 @@ struct ieee80211_regdomain *iwl_mvm_get_regdomain(struct wiphy *wiphy,
>  	resp = iwl_mvm_update_mcc(mvm, alpha2, src_id);
>  	if (IS_ERR_OR_NULL(resp)) {
>  		IWL_DEBUG_LAR(mvm, "Could not get update from FW %d\n",
> -			      PTR_RET(resp));
> +			      PTR_ERR_OR_ZERO(resp));
>  		goto out;
>  	}
>  
> @@ -335,7 +335,7 @@ struct ieee80211_regdomain *iwl_mvm_get_regdomain(struct wiphy *wiphy,
>  	kfree(resp);
>  	if (IS_ERR_OR_NULL(regd)) {
>  		IWL_DEBUG_LAR(mvm, "Could not get parse update from FW %d\n",
> -			      PTR_RET(regd));
> +			      PTR_ERR_OR_ZERO(regd));
>  		goto out;
>  	}
>  
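
For readers unfamiliar with the error-pointer convention behind PTR_ERR_OR_ZERO, here is a userspace sketch. The sketch_* names are hypothetical stand-ins, not the kernel's definitions; the sketch only assumes the usual convention that the top 4095 address values encode negative errno codes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the errno range encoded in pointers. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True when the pointer value lies in the reserved error range. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Error code for error pointers, 0 for anything else (including NULL). */
static long sketch_ptr_err_or_zero(const void *ptr)
{
	return sketch_is_err(ptr) ? (long)(intptr_t)ptr : 0;
}
```

Because the patched code sits behind IS_ERR_OR_NULL(), a NULL response would make the debug message print 0 rather than an error code.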
Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
On 22/05/2015 22:50, Alexander Holler wrote:
> On 08.05.2015 at 14:02, Eric W. Biederman wrote:
>>
>> So I am dense. I have read through the patches and I don't see where
>> you tag packets from other network namespaces with a network namespace id.
>
> Me too,
>
> I've recently written a little tool called snetmanmon (source is
> available at github) to monitor and handle network related events by
> using rtnetlink.
>
> Having seen this patch series (thanks!), I've played with it. I've applied
> the patch series to v4.1-rc4.
>
> Maybe I'm using or holding it wrong, but I've some comments.
>
> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump of the
> interfaces through RTM_GETLINK together with NLM_F_DUMP and NLM_F_REQUEST
> should return all interfaces of all reachable namespaces.

This option is only for 'listening', i.e. spontaneous notifications from the
kernel. It does nothing for requests.

> Next, if NETLINK_LISTEN_ALL_NSID is enabled, I receive RTM_NEWLINK but
> without any indication of the namespace. E.g. if I do
>
> ip netns add netns1
> ip netns exec netns1 brctl addbr br0
>
> the RTM_NEWLINK for br0 (received in the root ns, not netns1) doesn't have
> the attribute IFLA_LINK_NETNSID.

The nsid is sent through a control message (see recvmsg). Try the iproute2
branch net-next: 'ip monitor all-nsid'. It's an example of how to use it.

> Same for the RTM_DELLINK msg if I call
>
> ip netns exec netns1 brctl delbr br0
>
> afterwards. So both netlink messages are looking like br0 was created in
> the root ns.
>
> Another problem seems to be with veth devices. E.g. if I do
>
> ip link add veth0 type veth peer name veth1
> ip link set veth1 netns netns1
>
> I receive
>
> RTM_NEWLINK for veth0 (no nsid)
> RTM_NEWLINK for veth1 (no nsid)
> RTM_DELLINK for veth1 (no nsid)
> RTM_NEWLINK for veth1 (with nsid 0)
>
> That looks ok, except the missing RTM_NEWLINK for lo in netns1, which

The nsid for netns1 in the current netns is allocated when veth1 is
moved to netns1. At that time, lo has already existed for a long time, thus
the kernel won't send any notification.
Note, you can manually allocate it with 'ip netns set netns1 -1', but you
won't get any notifications for the loopback.

> was created together with the namespace.
>
> But if I now request a dump, I get
>
> RTM_NEWLINK for veth0 (with nsid 0)
>
> which looks like veth0 is part of nsid 0, and I get nothing for veth1.

The netlink message gives information about veth1. With iproute2:
$ ip netns
netns1 (id: 0)
$ ip -d l ls veth0
9: veth0@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:36:c0:f4:35:64 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0
    veth addrgenmode eui64

The peer veth is the interface with ifindex 8 (@if8) in netns1 (link-netnsid 0).
To get information about this interface, you need to dump it in netns1.

> Of course, that vlan device might be part of nsid 0 too (as veth1), but
> its part named veth0 is not part of that namespace. So the IFLA_LINK_NETNSID
> attribute received with the RTM_NEWLINK for veth0 through the dump is
> misleading.

Not sure I follow you. veth0 sits in the current netns (let's say init_net)
and veth1 in netns1. So, when you dump veth0 in init_net, its link-netnsid is
set to the id of netns1 in init_net. And when you dump veth1 in netns1, its
link-netnsid is set to the id of init_net in netns1.

> So it looks like either I missed something, I'm doing something wrong,
> or there still is some work todo to make NETLINK_LISTEN_ALL_NSID work
> like expected (or like my simple mind would expect it).

Having a patch that allows performing a request from a netns foo for a netns
bar is something doable, but much more complicated. And I think it requires
more thought.

> Let's see what will happen ;-)
>
> Thanks again for the patches, regards,

Thank you,

Regards,
Nicolas
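
As a rough illustration of the "nsid is sent through a control message" point, the following sketch enables the option on a netlink socket and pulls the nsid out of the ancillary data returned by recvmsg(). The helper names and the -1 "no nsid" return value are assumptions made for this example; the fallback defines are only there for older headers.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks for older headers; values assumed from the kernel UAPI. */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8
#endif

/* Ask the kernel to deliver notifications from all netns that have an nsid. */
static int enable_all_nsid(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
			  &one, sizeof(one));
}

/* Return the nsid carried in the control data of msg, or -1 if absent. */
static int nsid_from_cmsg(struct msghdr *msg)
{
	struct cmsghdr *cmsg;
	int nsid = -1;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_NETLINK &&
		    cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
			memcpy(&nsid, CMSG_DATA(cmsg), sizeof(nsid));
	}
	return nsid;
}
```

A listener would call enable_all_nsid() once after binding the socket, pass a control buffer in msg_control to every recvmsg(), and then call nsid_from_cmsg() on the filled-in msghdr.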
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On Sun, May 24, 2015 at 07:11:40PM -0500, Larry Finger wrote:
> On 05/24/2015 02:03 PM, Haggai Eran wrote:
> > On 24 May 2015 at 00:16, Larry Finger <larry.fin...@lwfinger.net> wrote:
> > > The driver is reporting a warning at kernel/time/timer.c:1096 due to calling
> > > del_timer_sync() while in interrupt mode. Such warnings are fixed by calling
> > > del_timer() instead.
> > >
> > > Signed-off-by: Larry Finger <larry.fin...@lwfinger.net>
> > > Cc: Stable <sta...@vger.kernel.org>
> > > Cc: Haggi Eran <haggai.e...@gmail.com>
> >
> > Hi,
> >
> > I haven't been using kernel v4.1 so I haven't seen this warning, but
> > looking at the code it seems to originate from the two recent patches
> > to remove _cancel_timer and _cancel_timer_ex.
> >
> > I see that there's another patch in lkml [1] that changes
> > del_timer_sync back to del_timer in more places. Perhaps it could
> > prevent other warnings like this in the future.
> >
> > Regards,
> > Haggai
> >
> > [1] https://lkml.org/lkml/2015/5/15/226
>
> Yes, the script kiddies make changes they do not understand and screw
> everything up. Unfortunately, I did not catch these in review. I think I
> will submit V2 and blast the contributor.

Don't blast the contributor... These are special intern patches that
don't go through the normal review process. The intern process is over
this year.

The lack of normal review introduced a number of bugs this year. I
always complain to Greg about it and he says that I should join the
intern mailing list if I care so much.

regards,
dan carpenter
Re: [PATCH v3 net-next 2/2] rocker: mark parameters and local variables as const
Mon, May 25, 2015 at 07:28:36AM CEST, simon.hor...@netronome.com wrote:
> Mark parameters and local variables as const where possible.
>
> Signed-off-by: Simon Horman <simon.hor...@netronome.com>
> Acked-by: Scott Feldman <sfel...@gmail.com>

Acked-by: Jiri Pirko <j...@resnulli.us>
[PATCH net-next 2/4] net: cpsw: remove two unused global functions
The functions, cpsw_ale_flush and cpsw_ale_set_ageout, have never been used
since they were first introduced. This patch removes the dead code.

Signed-off-by: Richard Cochran <richardcoch...@gmail.com>
---
 drivers/net/ethernet/ti/cpsw_ale.c | 45 ---------------------------------------------
 drivers/net/ethernet/ti/cpsw_ale.h |  2 --
 2 files changed, 47 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 6e927b4..43b061b 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -268,39 +268,6 @@ int cpsw_ale_flush_multicast(struct cpsw_ale *ale, int port_mask, int vid)
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_flush_multicast);
 
-static void cpsw_ale_flush_ucast(struct cpsw_ale *ale, u32 *ale_entry,
-				 int port_mask)
-{
-	int port;
-
-	port = cpsw_ale_get_port_num(ale_entry);
-	if ((BIT(port) & port_mask) == 0)
-		return; /* ports dont intersect, not interested */
-	cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
-}
-
-int cpsw_ale_flush(struct cpsw_ale *ale, int port_mask)
-{
-	u32 ale_entry[ALE_ENTRY_WORDS];
-	int ret, idx;
-
-	for (idx = 0; idx < ale->params.ale_entries; idx++) {
-		cpsw_ale_read(ale, idx, ale_entry);
-		ret = cpsw_ale_get_entry_type(ale_entry);
-		if (ret != ALE_TYPE_ADDR && ret != ALE_TYPE_VLAN_ADDR)
-			continue;
-
-		if (cpsw_ale_get_mcast(ale_entry))
-			cpsw_ale_flush_mcast(ale, ale_entry, port_mask);
-		else
-			cpsw_ale_flush_ucast(ale, ale_entry, port_mask);
-
-		cpsw_ale_write(ale, idx, ale_entry);
-	}
-	return 0;
-}
-EXPORT_SYMBOL_GPL(cpsw_ale_flush);
-
 static inline void cpsw_ale_set_vlan_entry_type(u32 *ale_entry,
 						int flags, u16 vid)
 {
@@ -752,18 +719,6 @@ static void cpsw_ale_timer(unsigned long arg)
 	}
 }
 
-int cpsw_ale_set_ageout(struct cpsw_ale *ale, int ageout)
-{
-	del_timer_sync(&ale->timer);
-	ale->ageout = ageout * HZ;
-	if (ale->ageout) {
-		ale->timer.expires = jiffies + ale->ageout;
-		add_timer(&ale->timer);
-	}
-	return 0;
-}
-EXPORT_SYMBOL_GPL(cpsw_ale_set_ageout);
-
 void cpsw_ale_start(struct cpsw_ale *ale)
 {
 	u32 rev;
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index af1e7ec..a700189 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -90,8 +90,6 @@ int cpsw_ale_destroy(struct cpsw_ale *ale);
 void cpsw_ale_start(struct cpsw_ale *ale);
 void cpsw_ale_stop(struct cpsw_ale *ale);
 
-int cpsw_ale_set_ageout(struct cpsw_ale *ale, int ageout);
-int cpsw_ale_flush(struct cpsw_ale *ale, int port_mask);
 int cpsw_ale_flush_multicast(struct cpsw_ale *ale, int port_mask, int vid);
 int cpsw_ale_add_ucast(struct cpsw_ale *ale, u8 *addr, int port,
 		       int flags, u16 vid);
-- 
2.1.4
[PATCH net-next 0/4] cpsw cleanups
While working on an out-of-tree customization, I noticed a few minor
problems in the cpsw code. This series cleans up the issues I found.

Thanks,
Richard

Richard Cochran (4):
  net: cpsw: fix misplaced break statements.
  net: cpsw: remove two unused global functions
  net: cpsw: remove redundant calls enabling dma interrupts.
  net: cpsw: remove redundant calls disabling dma interrupts.

 drivers/net/ethernet/ti/cpsw.c     |  9 ++-------
 drivers/net/ethernet/ti/cpsw_ale.c | 45 ---------------------------------------------
 drivers/net/ethernet/ti/cpsw_ale.h |  2 --
 3 files changed, 2 insertions(+), 54 deletions(-)

-- 
2.1.4
[PATCH net-next 4/4] net: cpsw: remove redundant calls disabling dma interrupts.
The function, cpsw_intr_disable, already calls cpdma_ctlr_int_ctrl.
There is no need to disable the dma interrupts twice. This patch
removes the extra calls.

Signed-off-by: Richard Cochran <richardcoch...@gmail.com>
---
 drivers/net/ethernet/ti/cpsw.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 0d0cf9a..4628205 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1361,7 +1361,6 @@ static int cpsw_ndo_stop(struct net_device *ndev)
 	if (cpsw_common_res_usage_state(priv) <= 1) {
 		cpts_unregister(priv->cpts);
 		cpsw_intr_disable(priv);
-		cpdma_ctlr_int_ctrl(priv->dma, false);
 		cpdma_ctlr_stop(priv->dma);
 		cpsw_ale_stop(priv->ale);
 	}
@@ -1589,7 +1588,6 @@ static void cpsw_ndo_tx_timeout(struct net_device *ndev)
 	cpsw_err(priv, tx_err, "transmit timeout, restarting dma\n");
 	ndev->stats.tx_errors++;
 	cpsw_intr_disable(priv);
-	cpdma_ctlr_int_ctrl(priv->dma, false);
 	cpdma_chan_stop(priv->txch);
 	cpdma_chan_start(priv->txch);
 	cpsw_intr_enable(priv);
@@ -1628,7 +1626,6 @@ static void cpsw_ndo_poll_controller(struct net_device *ndev)
 	struct cpsw_priv *priv = netdev_priv(ndev);
 
 	cpsw_intr_disable(priv);
-	cpdma_ctlr_int_ctrl(priv->dma, false);
 	cpsw_rx_interrupt(priv->irqs_table[0], priv);
 	cpsw_tx_interrupt(priv->irqs_table[1], priv);
 	cpsw_intr_enable(priv);
-- 
2.1.4
[PATCH net-next 3/4] net: cpsw: remove redundant calls enabling dma interrupts.
The function, cpsw_intr_enable, already calls cpdma_ctlr_int_ctrl.
There is no need to enable the dma interrupts twice. This patch
removes the extra call.

Signed-off-by: Richard Cochran <richardcoch...@gmail.com>
---
 drivers/net/ethernet/ti/cpsw.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e9e3ab3..0d0cf9a 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1592,7 +1592,6 @@ static void cpsw_ndo_tx_timeout(struct net_device *ndev)
 	cpdma_ctlr_int_ctrl(priv->dma, false);
 	cpdma_chan_stop(priv->txch);
 	cpdma_chan_start(priv->txch);
-	cpdma_ctlr_int_ctrl(priv->dma, true);
 	cpsw_intr_enable(priv);
 }
 
@@ -1632,7 +1631,6 @@ static void cpsw_ndo_poll_controller(struct net_device *ndev)
 	cpdma_ctlr_int_ctrl(priv->dma, false);
 	cpsw_rx_interrupt(priv->irqs_table[0], priv);
 	cpsw_tx_interrupt(priv->irqs_table[1], priv);
-	cpdma_ctlr_int_ctrl(priv->dma, true);
 	cpsw_intr_enable(priv);
 }
 #endif
-- 
2.1.4
[PATCH net-next 1/4] net: cpsw: fix misplaced break statements.
Having the breaks too far to the left makes parsing the dense
switch/case block unnecessarily harder.

Signed-off-by: Richard Cochran <richardcoch...@gmail.com>
---
 drivers/net/ethernet/ti/cpsw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index b536b4c..e9e3ab3 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1456,7 +1456,7 @@ static void cpsw_hwtstamp_v2(struct cpsw_priv *priv)
 
 		if (priv->cpts->rx_enable)
 			ctrl |= CTRL_V2_RX_TS_BITS;
-	break;
+		break;
 	case CPSW_VERSION_3:
 	default:
 		ctrl &= ~CTRL_V3_ALL_TS_MASK;
@@ -1466,7 +1466,7 @@ static void cpsw_hwtstamp_v2(struct cpsw_priv *priv)
 
 		if (priv->cpts->rx_enable)
 			ctrl |= CTRL_V3_RX_TS_BITS;
-	break;
+		break;
 	}
 
 	mtype = (30 << TS_SEQ_ID_OFFSET_SHIFT) | EVENT_MSG_BITS;
-- 
2.1.4
[PATCH] can: mcp251x: not correct register address
This patch corrects addresses of acceptance filters.
These registers are not in use, but values should be correct.

Tested with MCP2515 and am3352 and also checked datasheets for
MCP2515 and MCP2510.

Signed-off-by: Tomas Krcka <tomas.kr...@nkgroup.cz>
---
 drivers/net/can/spi/mcp251x.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index bf63fee..c1a95a3 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -190,10 +190,11 @@
 #define RXBEID0_OFF 4
 #define RXBDLC_OFF 5
 #define RXBDAT_OFF 6
-#define RXFSIDH(n) ((n) * 4)
-#define RXFSIDL(n) ((n) * 4 + 1)
-#define RXFEID8(n) ((n) * 4 + 2)
-#define RXFEID0(n) ((n) * 4 + 3)
+#define RXFSID(n) ((n < 3) ? 0 : 4)
+#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
+#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
+#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
+#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))
 #define RXMSIDH(n) ((n) * 4 + 0x20)
 #define RXMSIDL(n) ((n) * 4 + 0x21)
 #define RXMEID8(n) ((n) * 4 + 0x22)
-- 
1.7.5.4
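
To see what the new macros compute, here is a small standalone sketch that evaluates them for all six filters. The OLD_RXFSIDH name and the helper functions are invented for this example, and the RXFSID argument is parenthesized here, which the patch itself does not do. The expected addresses (0x00, 0x04, 0x08, 0x10, 0x14, 0x18 for RXF0SIDH to RXF5SIDH) are what the MCP2515 register map is understood to list; please check them against the datasheet.

```c
/* Old definition: assumes the six filters are contiguous from 0x00. */
#define OLD_RXFSIDH(n) ((n) * 4)

/* New definitions from the patch: filters 3..5 sit 4 bytes further up. */
#define RXFSID(n)  (((n) < 3) ? 0 : 4)
#define RXFSIDH(n) ((n) * 4 + RXFSID(n))
#define RXFSIDL(n) ((n) * 4 + 1 + RXFSID(n))
#define RXFEID8(n) ((n) * 4 + 2 + RXFSID(n))
#define RXFEID0(n) ((n) * 4 + 3 + RXFSID(n))

/* Address of RXFnSIDH with the corrected macro. */
static unsigned int rxfsidh_addr(unsigned int n)
{
	return RXFSIDH(n);
}

/* Address the old macro would have produced for the same filter. */
static unsigned int old_rxfsidh_addr(unsigned int n)
{
	return OLD_RXFSIDH(n);
}
```

For filter 3 the old macro gives 0x0C, which is not a filter register, while the corrected macro gives 0x10.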
[PATCH net 3/3] net: dp83640: fix improper double spin locking.
A pair of nested spin locks was introduced in commit 63502b8d0
"dp83640: Fix receive timestamp race condition".

Unfortunately the 'flags' parameter was reused for the inner lock,
clobbering the originally saved IRQ state. This patch fixes the issue
by changing the inner lock to plain spin_lock without irqsave.

Signed-off-by: Richard Cochran <richardcoch...@gmail.com>
---
 drivers/net/phy/dp83640.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index e570036..00cb41e 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -846,7 +846,7 @@ static void decode_rxts(struct dp83640_private *dp83640,
 	list_del_init(&rxts->list);
 	phy2rxts(phy_rxts, rxts);
 
-	spin_lock_irqsave(&dp83640->rx_queue.lock, flags);
+	spin_lock(&dp83640->rx_queue.lock);
 	skb_queue_walk(&dp83640->rx_queue, skb) {
 		struct dp83640_skb_info *skb_info;
 
@@ -861,7 +861,7 @@ static void decode_rxts(struct dp83640_private *dp83640,
 			break;
 		}
 	}
-	spin_unlock_irqrestore(&dp83640->rx_queue.lock, flags);
+	spin_unlock(&dp83640->rx_queue.lock);
 
 	if (!shhwtstamps)
 		list_add_tail(&rxts->list, &dp83640->rxts);
-- 
2.1.4
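
The clobbering described above can be mimicked in userspace. This is a simulation only: the sim_* functions are invented stand-ins that model just the save and restore of an interrupt-enable bit, not real locking.

```c
/* Simulated CPU interrupt-enable bit; 1 means interrupts are enabled. */
static int sim_irqs_enabled = 1;

/* Stand-in for spin_lock_irqsave(): save the state, then disable. */
static void sim_lock_irqsave(unsigned long *flags)
{
	*flags = (unsigned long)sim_irqs_enabled;
	sim_irqs_enabled = 0;
}

/* Stand-in for spin_unlock_irqrestore(): put back the saved state. */
static void sim_unlock_irqrestore(unsigned long flags)
{
	sim_irqs_enabled = (int)flags;
}

/* Buggy pattern: the inner irqsave reuses 'flags'. Returns the final state. */
static int nested_reusing_flags(void)
{
	unsigned long flags;

	sim_irqs_enabled = 1;
	sim_lock_irqsave(&flags);     /* outer: flags = 1 */
	sim_lock_irqsave(&flags);     /* inner: flags = 0, original value lost */
	sim_unlock_irqrestore(flags); /* inner unlock */
	sim_unlock_irqrestore(flags); /* outer unlock restores 0, not 1 */
	return sim_irqs_enabled;
}

/* Fixed pattern: the inner lock does not touch the interrupt state. */
static int nested_plain_inner(void)
{
	unsigned long flags;

	sim_irqs_enabled = 1;
	sim_lock_irqsave(&flags);     /* outer: flags = 1 */
	/* inner plain lock and unlock would go here */
	sim_unlock_irqrestore(flags); /* restores 1 */
	return sim_irqs_enabled;
}
```

In the buggy variant the simulated interrupts stay disabled after both unlocks, which is the effect the patch avoids by using a plain spin_lock for the inner lock.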
[PATCH net 2/3] net: dp83640: reinforce locking rules.
Callers of the ext_write function are supposed to hold a mutex that
protects the state of the dialed page, but one caller was missing the
lock from the very start, and over time the code has been changed
without following the rule. This patch cleans up the call sites in
violation of the rule.

Signed-off-by: Richard Cochran <richardcoch...@gmail.com>
---
 drivers/net/phy/dp83640.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 7a068d9..e570036 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -496,7 +496,9 @@ static int ptp_dp83640_enable(struct ptp_clock_info *ptp,
 			else
 				evnt |= EVNT_RISE;
 		}
+		mutex_lock(&clock->extreg_lock);
 		ext_write(0, phydev, PAGE5, PTP_EVNT, evnt);
+		mutex_unlock(&clock->extreg_lock);
 		return 0;
 
 	case PTP_CLK_REQ_PEROUT:
@@ -532,6 +534,8 @@ static u8 status_frame_src[6] = { 0x08, 0x00, 0x17, 0x0B, 0x6B, 0x0F };
 
 static void enable_status_frames(struct phy_device *phydev, bool on)
 {
+	struct dp83640_private *dp83640 = phydev->priv;
+	struct dp83640_clock *clock = dp83640->clock;
 	u16 cfg0 = 0, ver;
 
 	if (on)
@@ -539,9 +543,13 @@ static void enable_status_frames(struct phy_device *phydev, bool on)
 
 	ver = (PSF_PTPVER & VERSIONPTP_MASK) << VERSIONPTP_SHIFT;
 
+	mutex_lock(&clock->extreg_lock);
+
 	ext_write(0, phydev, PAGE5, PSF_CFG0, cfg0);
 	ext_write(0, phydev, PAGE6, PSF_CFG1, ver);
 
+	mutex_unlock(&clock->extreg_lock);
+
 	if (!phydev->attached_dev) {
 		pr_warn("expected to find an attached netdevice\n");
 		return;
@@ -1173,11 +1181,18 @@ static int dp83640_config_init(struct phy_device *phydev)
 
 	if (clock->chosen && !list_empty(&clock->phylist))
 		recalibrate(clock);
-	else
+	else {
+		mutex_lock(&clock->extreg_lock);
 		enable_broadcast(phydev, clock->page, 1);
+		mutex_unlock(&clock->extreg_lock);
+	}
 
 	enable_status_frames(phydev, true);
+
+	mutex_lock(&clock->extreg_lock);
 	ext_write(0, phydev, PAGE4, PTP_CTL, PTP_ENABLE);
+	mutex_unlock(&clock->extreg_lock);
+
 	return 0;
 }
 
-- 
2.1.4
[PATCH net 0/3] phyter bug fixes
While working on a project using the phyter, I noticed some bugs that
have crept in over time. This series fixes those bugs. These patches
are also meant for stable.

Thanks,
Richard

Richard Cochran (3):
  net: dp83640: fix broken calibration routine.
  net: dp83640: reinforce locking rules.
  net: dp83640: fix improper double spin locking.

 drivers/net/phy/dp83640.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

-- 
2.1.4
[PATCH net 1/3] net: dp83640: fix broken calibration routine.
Currently, the calibration function that corrects the initial offsets
among multiple devices only works the first time. If the function is
called more than once, the calibration fails and bogus offsets will be
programmed into the devices.

In a well hidden spot, the device documentation tells that trigger indexes
0 and 1 are special in allowing the TRIG_IF_LATE flag to actually work.

This patch fixes the issue by using one of the special triggers during the
recalibration method.

Signed-off-by: Richard Cochran <richardcoch...@gmail.com>
---
 drivers/net/phy/dp83640.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 496e02f..7a068d9 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -47,7 +47,7 @@
 #define PSF_TX		0x1000
 #define EXT_EVENT	1
 #define CAL_EVENT	7
-#define CAL_TRIGGER	7
+#define CAL_TRIGGER	1
 #define DP83640_N_PINS	12
 
 #define MII_DP83640_MICR 0x11
-- 
2.1.4
Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
On 25.05.2015 at 09:45, Nicolas Dichtel wrote:
> On 22/05/2015 22:50, Alexander Holler wrote:
>> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump of the
>> interfaces through RTM_GETLINK together with NLM_F_DUMP and NLM_F_REQUEST
>> should return all interfaces of all reachable namespaces.
> This option is only for 'listening', ie spontaneous notifications from the
> kernel. It does nothing for request.

The problem is that you need information about the affected interfaces.
E.g. if you receive a NEWADDR or NEWROUTE for some interface (indicated
by the index of the interface) in a(nother) namespace, how do you get
information about that interface, if not by a dump which includes the
interfaces of these namespaces too? Without knowledge about the
interface, these messages are not very usable. ;)

> Not sure to follow you. veth0 sits in the current netns (let's say init_net)
> and veth1 in netns1. So, when you dump veth0 in init_net, its link-netnsid is
> set to the id of netns1 in init_net. And when you dump veth1 in netns1, it's
> link-netnsid is set to the id of init_net in netns1.

I've misunderstood the meaning of IFLA_LINK_NETNSID. I thought it
indicates the namespace an interface lives in, but it indicates the
namespace it is linked to.

I've also thought that the NETNSID is a globally unique identifier of a
namespace, which seems to be wrong too. While I still have not read
through all the sources, the other comments suggest that the NSID
is just an ID which is unique only within one namespace, or in other words,
every namespace has its own set of nsids. I'm not sure if I'm now right
with that assumption, but that's what I now think after the responses to
my mail. ;)

So to conclude, I've now scheduled support for namespaces to a far later
point. It doesn't seem to be as easy as I thought after having read
the introductory mail of your patch series. ;)

Regards,

Alexander Holler
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On Mon, May 25, 2015 at 03:07:08PM +0530, Vaishali Thakkar wrote:
> I am sorry for those patches. It was me who introduced those bugs.
> Yes, it was sent during the Outreachy process. But it was my mistake as a
> newbie. Maybe I should have taken care of the interrupt mode thing.
> I would like to fix it if no one else is doing it. Sorry again. I will
> take care of these things in my future patches.

No, it's not your fault for making mistakes. We *expect* newbies to
make mistakes. It's Greg's fault for merging these when they weren't
sent to the list or to the other maintainers.

But Greg knows I'm annoyed already since we have been dealing with the
fallout for months and I always make sure to complain whenever I have a
chance. /me shakes a fist! Grrr :P

regards,
dan carpenter
Re: [v2 PATCH 13/13] crypto: algif_aead - Switch to new AEAD interface
On Sun, May 24, 2015 at 12:52:02PM +0200, Stephan Mueller wrote:
>
> [ 29.653113] BUG: unable to handle kernel NULL pointer dereference at 000c

Weird. I tried running your test but it appears to pass. The
only failures were the nonsense strings and everything else says
passed. It certainly didn't crash for me.

Considering that I just killed cryptoff in my local tree, it is
entirely possible that the patches that you are running are no
longer the same as mine.

So let me merge the cryptoff patches and then I'll repost the
algif_aead patch and ask you to retest.

Thanks,
-- 
Email: Herbert Xu <herb...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: pull-request: wireless-drivers-next 2015-05-21
David Miller <da...@davemloft.net> writes:

> From: Kalle Valo <kv...@codeaurora.org>
> Date: Thu, 21 May 2015 16:39:04 +0300
>
>> here's a wireless-drivers pull request for 4.2. This time please pay
>> extra attention to this pull as there are two problems:
>>
>> First of all as you can see the diffstat from git-pull-request in the
>> end is just weird. I was long and hard trying to check everything and to
>> my understanding all the merges look ok and I cannot explain the reason
>> for the diffstat, but of course I might be missing something. Maybe
>> git-request-pull is just buggy? At least with gitk everything looks to
>> be ok and the patch list below also looks valid.
>
> The diffstat doesn't look anything like that for me. It contained
> only your wireless changes. It may have helped that I merged 'net'
> into 'net-next' right before I pulled this.

Good to hear.

>> Secondly there's a non-trivial conflict in
>> drivers/net/wireless/ath/ath10k/mac.c which is due to removal of
>> FIF_PROMISC_IN_BSS in commit df1404650c. You need to remove more code
>> than just the obvious conflicts shown by git. In the end of this mail I
>> added a git diff output after I fixed the conflict, hopefully that helps
>> you to fix it. The main points are that you remove
>> ath10k_mac_should_disable_promisc() and the last ath10k_monitor_recalc()
>> call from ath10k_vdev_start_restart() along with the obvious conflict
>> fixes git points out. There's also a patch from Michal which will also
>> help to fix the resolution. Michal, please double check the resolution
>> proposal below so that I didn't miss anything.
>>
>> https://patchwork.kernel.org/patch/6387631/
>
> Thanks, I think I got the conflict resolution correct, please have
> a look.

Looks good, thanks for fixing it.

-- 
Kalle Valo
Re: [PATCH net-next] bridge: skip fdb add if the port shouldn't learn
On Mon, May 25, 2015 at 4:59 AM, David Miller <da...@davemloft.net> wrote:
> From: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
> Date: Thu, 21 May 2015 03:42:57 -0700
>
>> From: Wilson Kok <w...@cumulusnetworks.com>
>>
>> Check in fdb_add_entry() if the source port should learn, similar
>> check is used in br_fdb_update.
>> Note that new fdb entries which are added manually or
>> as local ones are still permitted.
>> This patch has been tested by running traffic via a bridge port and
>> switching the port's state, also by manually adding/removing entries
>> from the bridge's fdb.
>>
>> Signed-off-by: Wilson Kok <w...@cumulusnetworks.com>
>> Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
>> ---
>> Nik: Maybe it'd be better if we returned an error even though it
>> doesn't look necessary. I'm open to suggestions.
>
> If you don't return an error, then rtnetlink.c is going to emit
> a NEWNEIGH netlink message.
>
> I seriously doubt we want that to happen.

Thanks Dave, I was afraid I've missed something like that. I'll re-spin, test
and post a v2.

Nik
Re: [v2 PATCH 13/13] crypto: algif_aead - Switch to new AEAD interface
On Mon, May 25, 2015 at 01:50:55PM +0200, Stephan Mueller wrote:
>
> When you have my code local, simply execute libkcapi/test/kcapi -y twice or
> three times. That triggered the crash.

Aha that's what I was missing. I'll look into the crash.

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
pull-request: wireless-drivers 2015-05-25
Hi Dave,

here's hopefully the last wireless-drivers pull request for 4.1. Mostly
iwlwifi fixes this time.

Please let me know if there are any problems.

Kalle

The following changes since commit f673821864899153142365aca888435815ac93f0:

  ath9k: fix per-packet tx power configuration (2015-05-03 23:54:38 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2015-05-25

for you to fetch changes up to aefa441b150279dd8d25658e018898a3fe9a6769:

  Merge tag 'iwlwifi-for-kalle-2015-05-21' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes (2015-05-22 10:47:02 +0300)

iwlwifi:

* fix firmware name and other things to enable 3165
* fix bad APMG configuration for 8000 (no APMG on these devices)
* fix MAC address assignment for 8000
* fix firmware debugging triggers (MLME)
* fix several bugs in low power states code (net-detect, d0i3)

ssb:

* fix reboot after device reset for WRT350N v1

Avri Altman (1):
      iwlwifi: pcie: don't disable the busmaster DMA clock for family 8000

Eliad Peller (1):
      iwlwifi: mvm: avoid use-after-free on iwl_mvm_d0i3_enable_tx()

Emmanuel Grumbach (4):
      iwlwifi: mvm: forbid MIMO on devices that don't support it
      iwlwifi: 7000: modify the firmware name for 3165
      iwlwifi: mvm: fix MLME trigger
      iwlwifi: mvm: BT Coex - duplicate the command if sent ASYNC

Haim Dreyfuss (1):
      iwlwifi: mvm: Free fw_status after use to avoid memory leak

Kalle Valo (1):
      Merge tag 'iwlwifi-for-kalle-2015-05-21' of https://git.kernel.org/.../iwlwifi/iwlwifi-fixes

Liad Kaufman (1):
      iwlwifi: nvm: force mac from otp in case nvm mac is reserved

Luciano Coelho (2):
      iwlwifi: mvm: take the UCODE_DOWN reference when resuming
      iwlwifi: mvm: clean net-detect info if device was reset during suspend

Rafał Miłecki (1):
      ssb: extend fix for PCI related silent reboots to all chipsets

 drivers/net/wireless/iwlwifi/Kconfig            |    1 +
 drivers/net/wireless/iwlwifi/iwl-7000.c         |   16 ++--
 drivers/net/wireless/iwlwifi/iwl-eeprom-parse.c |    5
 drivers/net/wireless/iwlwifi/iwl-eeprom-parse.h |    3 +++
 drivers/net/wireless/iwlwifi/iwl-nvm-parse.c    |   30 +--
 drivers/net/wireless/iwlwifi/mvm/coex_legacy.c  |    2 +-
 drivers/net/wireless/iwlwifi/mvm/d3.c           |   22 -
 drivers/net/wireless/iwlwifi/mvm/mac80211.c     |    3 ---
 drivers/net/wireless/iwlwifi/mvm/ops.c          |    6 +++--
 drivers/net/wireless/iwlwifi/mvm/rs.c           |    3 +++
 drivers/net/wireless/iwlwifi/pcie/trans.c       |    8 +++---
 drivers/ssb/driver_pcicore.c                    |    7 +++---
 12 files changed, 73 insertions(+), 33 deletions(-)
[PATCH net] tools: bpf_jit_disasm: fix segfault on disabled debugging log output
With recent debugging, I noticed that bpf_jit_disasm segfaults when
there's no debugging output from the JIT compiler to the kernel log.

The reason is that when regexec(3) doesn't match anything, the start/end
offsets are not filled out and contain uninitialized garbage from the
stack. Thus, we need to zero out the offsets first.

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
---
 tools/net/bpf_jit_disasm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/net/bpf_jit_disasm.c b/tools/net/bpf_jit_disasm.c
index c5baf9c..618c2bc 100644
--- a/tools/net/bpf_jit_disasm.c
+++ b/tools/net/bpf_jit_disasm.c
@@ -123,6 +123,8 @@ static int get_last_jit_image(char *haystack, size_t hlen,
 	assert(ret == 0);
 
 	ptr = haystack;
+	memset(pmatch, 0, sizeof(pmatch));
+
 	while (1) {
 		ret = regexec(&regex, ptr, 1, pmatch, 0);
 		if (ret == 0) {
-- 
1.9.3
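For readers who want to see the pattern in isolation, here is a minimal userspace sketch of the same idea: zero the regmatch_t array before the search loop so that a first-call no-match leaves defined offsets behind. The function name last_match_end() and its return convention are made up for illustration and do not appear in bpf_jit_disasm.c. The sketch also tracks the result in a separate variable, so it never has to read the array after a failed regexec() call.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return the offset just past
 * the last non-empty match of `pattern` in `haystack`, or -1 if the
 * pattern never matches or fails to compile.
 */
long last_match_end(const char *pattern, const char *haystack)
{
	regex_t regex;
	regmatch_t pmatch[1];
	const char *ptr = haystack;
	long end = -1;

	if (regcomp(&regex, pattern, REG_EXTENDED) != 0)
		return -1;

	/* Same step as the patch: start from zeroed offsets, not stack garbage. */
	memset(pmatch, 0, sizeof(pmatch));

	while (regexec(&regex, ptr, 1, pmatch, 0) == 0) {
		if (pmatch[0].rm_eo == 0)
			break;	/* empty match at ptr: stop instead of looping forever */
		ptr += pmatch[0].rm_eo;
		end = ptr - haystack;
	}

	regfree(&regex);
	return end;
}
```

Calling last_match_end("ab", "xxabyyab") walks both occurrences and reports the end of the second one, while a pattern that never matches falls straight through the loop and returns -1 without touching any offsets.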
[PATCH 3/3 nf-next] netfilter: nf_tables: add netdev table to filter from ingress
This allows us to create netdev tables that contain ingress chains. Use
skb_header_pointer() as we may see shared sk_buffs at this stage.

This change provides access to the existing nf_tables features from the ingress
hook.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netns/nftables.h     |    1 +
 net/netfilter/Kconfig            |    5 ++
 net/netfilter/Makefile           |    1 +
 net/netfilter/nf_tables_netdev.c |  183 ++
 4 files changed, 190 insertions(+)
 create mode 100644 net/netfilter/nf_tables_netdev.c

diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h
index eee608b..c807811 100644
--- a/include/net/netns/nftables.h
+++ b/include/net/netns/nftables.h
@@ -13,6 +13,7 @@ struct netns_nftables {
 	struct nft_af_info	*inet;
 	struct nft_af_info	*arp;
 	struct nft_af_info	*bridge;
+	struct nft_af_info	*netdev;
 	unsigned int		base_seq;
 	u8			gencursor;
 };
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 9a89e7c..bd5aaeb 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -456,6 +456,11 @@ config NF_TABLES_INET
 	help
 	  This option enables support for a mixed IPv4/IPv6 "inet" table.
 
+config NF_TABLES_NETDEV
+	tristate "Netfilter nf_tables netdev tables support"
+	help
+	  This option enables support for the "netdev" table.
+
 config NFT_EXTHDR
 	tristate "Netfilter nf_tables IPv6 exthdr module"
 	help
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index a87d8b8..70d026d 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -75,6 +75,7 @@ nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o
 
 obj-$(CONFIG_NF_TABLES)		+= nf_tables.o
 obj-$(CONFIG_NF_TABLES_INET)	+= nf_tables_inet.o
+obj-$(CONFIG_NF_TABLES_NETDEV)	+= nf_tables_netdev.o
 obj-$(CONFIG_NFT_COMPAT)	+= nft_compat.o
 obj-$(CONFIG_NFT_EXTHDR)	+= nft_exthdr.o
 obj-$(CONFIG_NFT_META)		+= nft_meta.o
diff --git a/net/netfilter/nf_tables_netdev.c b/net/netfilter/nf_tables_netdev.c
new file mode 100644
index 0000000..04cb170
--- /dev/null
+++ b/net/netfilter/nf_tables_netdev.c
@@ -0,0 +1,183 @@
+/*
+ * Copyright (c) 2015 Pablo Neira Ayuso pa...@netfilter.org
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <net/netfilter/nf_tables.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netfilter/nf_tables_ipv4.h>
+#include <net/netfilter/nf_tables_ipv6.h>
+
+static inline void
+nft_netdev_set_pktinfo_ipv4(struct nft_pktinfo *pkt,
+			    const struct nf_hook_ops *ops, struct sk_buff *skb,
+			    const struct nf_hook_state *state)
+{
+	struct iphdr *iph, _iph;
+	u32 len, thoff;
+
+	nft_set_pktinfo(pkt, ops, skb, state);
+
+	iph = skb_header_pointer(skb, skb_network_offset(skb), sizeof(*iph),
+				 &_iph);
+	if (!iph)
+		return;
+
+	iph = ip_hdr(skb);
+	if (iph->ihl < 5 || iph->version != 4)
+		return;
+
+	len = ntohs(iph->tot_len);
+	thoff = iph->ihl * 4;
+	if (skb->len < len)
+		return;
+	else if (len < thoff)
+		return;
+
+	pkt->tprot = iph->protocol;
+	pkt->xt.thoff = thoff;
+	pkt->xt.fragoff = ntohs(iph->frag_off) & IP_OFFSET;
+}
+
+static inline void
+__nft_netdev_set_pktinfo_ipv6(struct nft_pktinfo *pkt,
+			      const struct nf_hook_ops *ops,
+			      struct sk_buff *skb,
+			      const struct nf_hook_state *state)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	struct ipv6hdr *ip6h, _ip6h;
+	unsigned int thoff = 0;
+	unsigned short frag_off;
+	int protohdr;
+	u32 pkt_len;
+
+	ip6h = skb_header_pointer(skb, skb_network_offset(skb), sizeof(*ip6h),
+				  &_ip6h);
+	if (!ip6h)
+		return;
+
+	if (ip6h->version != 6)
+		return;
+
+	pkt_len = ntohs(ip6h->payload_len);
+	if (pkt_len + sizeof(*ip6h) > skb->len)
+		return;
+
+	protohdr = ipv6_find_hdr(pkt->skb, &thoff, -1, &frag_off, NULL);
+	if (protohdr < 0)
+		return;
+
+	pkt->tprot = protohdr;
+	pkt->xt.thoff = thoff;
+	pkt->xt.fragoff = frag_off;
+#endif
+}
+
+static inline void nft_netdev_set_pktinfo_ipv6(struct nft_pktinfo *pkt,
+					       const struct nf_hook_ops *ops,
+					       struct sk_buff *skb,
+					       const struct nf_hook_state *state)
+{
+	nft_set_pktinfo(pkt, ops,
[PATCH nft] src: add netdev family support
This patch adds support for the new 'netdev' table. So far, this table allows
you to create filter chains from ingress.

The following example shows a very simple base configuration with one table that
is bound to device 'eth0' with a single ingress chain:

 # nft list table netdev eth0
 table netdev eth0 {
 	device eth0;
 	chain ingress {
 		type filter hook ingress priority 0; policy accept;
 	}
 }

The selected table name is 'eth0' but you could have selected any name.

You can test that this works by adding a simple rule with counters:

 # nft add rule netdev eth0 ingress counter

or a more elaborate test such as:

 http://people.netfilter.org/pablo/nft-ingress.ruleset

More information will be available at the nftables documentation site [1].

[1] http://wiki.nftables.org/

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 doc/nft.xml               |   41 +
 include/linux/netfilter.h |    8
 include/rule.h            |    2 ++
 src/evaluate.c            |    4
 src/netlink.c             |   11 +--
 src/parser_bison.y        |    7 +++
 src/payload.c             |    1 +
 src/proto.c               |    1 +
 src/rule.c                |   23 +++
 src/scanner.l             |    2 ++
 10 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/doc/nft.xml b/doc/nft.xml
index 8d79016..1172c43 100644
--- a/doc/nft.xml
+++ b/doc/nft.xml
@@ -267,6 +267,14 @@ filter input iif $int_ifs accept
 						</para>
 					</listitem>
 				</varlistentry>
+				<varlistentry>
+					<term><option>netdev</option></term>
+					<listitem>
+						<para>
+							Netdev address family, handling packets from ingress.
+						</para>
+					</listitem>
+				</varlistentry>
 			</variablelist>
 		</para>
 		<para>
@@ -373,6 +381,38 @@ filter input iif $int_ifs accept
 				The bridge address family handles ethernet packets traversing bridge devices.
 			</para>
 		</refsect2>
+		<refsect2>
+			<title>Netdev address family</title>
+			<para>
+				The Netdev address family handles packets from ingress.
+			</para>
+			<para>
+				<table frame="all">
+					<title>Netdev address family hooks</title>
+					<tgroup cols='2' align='left' colsep='1' rowsep='1' pgwide="1">
+						<colspec colname='c1' colwidth="1*"/>
+						<colspec colname='c2' colwidth="5*"/>
+						<thead>
+							<row>
+								<entry>Hook</entry>
+								<entry>Description</entry>
+							</row>
+						</thead>
+						<tbody>
+							<row>
+								<entry>ingress</entry>
+								<entry>
+									All packets entering the system are processed by this hook. It is invoked
+									before layer 3 protocol handlers and it can be used for early filtering and
+									policing.
+								</entry>
+							</row>
+						</tbody>
+					</tgroup>
+				</table>
+			</para>
+		</refsect2>
+
 	</refsect1>
 
 	<refsect1>
@@ -401,6 +441,7 @@ filter input iif $int_ifs accept
 				<member><literal>inet</literal></member>
 				<member><literal>arp</literal></member>
 				<member><literal>bridge</literal></member>
+				<member><literal>netdev</literal></member>
 			</simplelist>.
 			The <literal>inet</literal> address family is a dummy family which is used to
[PATCH 2/3 nf-next] netfilter: nf_tables: allow to bind table to net_device
This patch adds the internal NFT_AF_NEEDS_DEV flag to indicate that you must
attach this table to a net_device.

This change is required by the follow up patch that introduces the new netdev
table.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/nf_tables.h        |    8 ++
 include/uapi/linux/netfilter/nf_tables.h |    2 ++
 net/netfilter/nf_tables_api.c            |   46 ++
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index e6bcf55..3d6f48c 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -819,6 +819,7 @@ unsigned int nft_do_chain(struct nft_pktinfo *pkt,
  *	@use: number of chain references to this table
  *	@flags: table flag (see enum nft_table_flags)
  *	@name: name of the table
+ *	@dev: this table is bound to this device (if any)
  */
 struct nft_table {
 	struct list_head		list;
@@ -828,6 +829,11 @@ struct nft_table {
 	u32				use;
 	u16				flags;
 	char				name[NFT_TABLE_MAXNAMELEN];
+	struct net_device		*dev;
+};
+
+enum nft_af_flags {
+	NFT_AF_NEEDS_DEV	= (1 << 0),
 };
 
 /**
@@ -838,6 +844,7 @@ struct nft_table {
  *	@nhooks: number of hooks in this family
  *	@owner: module owner
  *	@tables: used internally
+ *	@flags: family flags
  *	@nops: number of hook ops in this family
  *	@hook_ops_init: initialization function for chain hook ops
  *	@hooks: hookfn overrides for packet validation
@@ -848,6 +855,7 @@ struct nft_af_info {
 	unsigned int			nhooks;
 	struct module			*owner;
 	struct list_head		tables;
+	u32				flags;
 	unsigned int			nops;
 	void				(*hook_ops_init)(struct nf_hook_ops *,
 							 unsigned int);
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 5fa1cd0..89a671e 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -146,12 +146,14 @@ enum nft_table_flags {
  * @NFTA_TABLE_NAME: name of the table (NLA_STRING)
  * @NFTA_TABLE_FLAGS: bitmask of enum nft_table_flags (NLA_U32)
  * @NFTA_TABLE_USE: number of chains in this table (NLA_U32)
+ * @NFTA_TABLE_DEV: net device name (NLA_STRING)
  */
 enum nft_table_attributes {
 	NFTA_TABLE_UNSPEC,
 	NFTA_TABLE_NAME,
 	NFTA_TABLE_FLAGS,
 	NFTA_TABLE_USE,
+	NFTA_TABLE_DEV,
 	__NFTA_TABLE_MAX
 };
 #define NFTA_TABLE_MAX		(__NFTA_TABLE_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ad9d11f..2fd4e99 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -399,6 +399,8 @@ static const struct nla_policy nft_table_policy[NFTA_TABLE_MAX + 1] = {
 	[NFTA_TABLE_NAME]	= { .type = NLA_STRING,
 				    .len = NFT_TABLE_MAXNAMELEN - 1 },
 	[NFTA_TABLE_FLAGS]	= { .type = NLA_U32 },
+	[NFTA_TABLE_DEV]	= { .type = NLA_STRING,
+				    .len = IFNAMSIZ - 1 },
 };
 
 static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net,
@@ -423,6 +425,10 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net,
 	    nla_put_be32(skb, NFTA_TABLE_USE, htonl(table->use)))
 		goto nla_put_failure;
 
+	if (table->dev &&
+	    nla_put_string(skb, NFTA_TABLE_DEV, table->dev->name))
+		goto nla_put_failure;
+
 	nlmsg_end(skb, nlh);
 	return 0;
 
@@ -608,6 +614,11 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
 	if (flags == ctx->table->flags)
 		return 0;
 
+	if ((ctx->afi->flags & NFT_AF_NEEDS_DEV) &&
+	    ctx->nla[NFTA_TABLE_DEV] &&
+	    nla_strcmp(ctx->nla[NFTA_TABLE_DEV], ctx->table->dev->name))
+		return -EOPNOTSUPP;
+
 	trans = nft_trans_alloc(ctx, NFT_MSG_NEWTABLE,
 				sizeof(struct nft_trans_table));
 	if (trans == NULL)
@@ -645,6 +656,7 @@ static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb,
 	struct nft_table *table;
 	struct net *net = sock_net(skb->sk);
 	int family = nfmsg->nfgen_family;
+	struct net_device *dev = NULL;
 	u32 flags = 0;
 	struct nft_ctx ctx;
 	int err;
@@ -679,30 +691,50 @@ static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb,
 			return -EINVAL;
 	}
 
+	if (afi->flags & NFT_AF_NEEDS_DEV) {
+		char ifname[IFNAMSIZ];
+
+		if (!nla[NFTA_TABLE_DEV])
+			return
Re: [v2 PATCH 13/13] crypto: algif_aead - Switch to new AEAD interface
On Monday, 25 May 2015, 18:20:21, Herbert Xu wrote:

Hi Herbert,

> On Sun, May 24, 2015 at 12:52:02PM +0200, Stephan Mueller wrote:
>> [ 29.653113] BUG: unable to handle kernel NULL pointer dereference at
>> 000c
>
> Weird. I tried running your test but it appears to pass. The
> only failures were the nonsense strings and everything else says
> passed.

The simple way to verify that everything passes is to check the return
code: the return code tells you the number of failures, so a value of 0
indicates that all tests pass.

And I see a simple test problem: I added a debug return that I forgot to
remove in test.sh. Thus, the large test is not executed with test.sh.

>> When you have my code local, simply execute libkcapi/test/kcapi -y twice
>> or three times. That triggered the crash.
>
> It certainly didn't crash for me. Considering that I just killed
> cryptoff in my local tree, it is entirely possible that the patches
> that you are running are no longer the same as mine.
>
> So let me merge the cryptoff patches and then I'll repost the
> algif_aead patch and ask you to retest.
>
> Thanks,

-- 
Ciao
Stephan
[PATCH 1/3 nf-next] netfilter: default CONFIG_NETFILTER_INGRESS to y
Useful to compile-test all options.

Suggested-by: Alexei Starovoitov a...@plumgrid.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index db1c674..9a89e7c 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -3,6 +3,7 @@ menu "Core Netfilter Configuration"
 
 config NETFILTER_INGRESS
 	bool "Netfilter ingress support"
+	default y
 	select NET_INGRESS
 	help
 	  This allows you to classify packets from ingress using the Netfilter
-- 
1.7.10.4
[PATCH libnftnl] table: add netdev family support
This adds support for the new 'netdev' family tables.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/buffer.h                    |    1 +
 include/libnftnl/table.h            |    1 +
 include/linux/netfilter.h           |    8
 include/linux/netfilter/nf_tables.h |    2 ++
 src/chain.c                         |    6 ++
 src/table.c                         |   37 +--
 6 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/include/buffer.h b/include/buffer.h
index 52942ed..38b6136 100644
--- a/include/buffer.h
+++ b/include/buffer.h
@@ -41,6 +41,7 @@ int nft_buf_reg(struct nft_buf *b, int type, union nft_data_reg *reg,
 #define CHAIN			"chain"
 #define CODE			"code"
 #define DATA			"data"
+#define DEVICE			"device"
 #define DIR			"dir"
 #define DREG			"dreg"
 #define EXTHDR_TYPE		"exthdr_type"
diff --git a/include/libnftnl/table.h b/include/libnftnl/table.h
index fac79e7..16df5fa 100644
--- a/include/libnftnl/table.h
+++ b/include/libnftnl/table.h
@@ -22,6 +22,7 @@ enum {
 	NFT_TABLE_ATTR_FAMILY,
 	NFT_TABLE_ATTR_FLAGS,
 	NFT_TABLE_ATTR_USE,
+	NFT_TABLE_ATTR_DEV,
 	__NFT_TABLE_ATTR_MAX
 };
 #define NFT_TABLE_ATTR_MAX (__NFT_TABLE_ATTR_MAX - 1)
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index be0bc18..18075f9 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -32,6 +32,7 @@
 #define NF_DROP_ERR(x) (((-x) << 16) | NF_DROP)
 
 /* only for userspace compatibility */
+#ifndef __KERNEL__
 /* Generic cache responses from hook functions.
    <= 0x2000 is used for protocol-flags. */
 #define NFC_UNKNOWN 0x4000
@@ -39,6 +40,7 @@
 
 /* NF_VERDICT_BITS should be 8 now, but userspace might break if this changes */
 #define NF_VERDICT_BITS 16
+#endif
 
 enum nf_inet_hooks {
 	NF_INET_PRE_ROUTING,
@@ -49,11 +51,17 @@ enum nf_inet_hooks {
 	NF_INET_NUMHOOKS
 };
 
+enum nf_dev_hooks {
+	NF_NETDEV_INGRESS,
+	NF_NETDEV_NUMHOOKS
+};
+
 enum {
 	NFPROTO_UNSPEC =  0,
 	NFPROTO_INET   =  1,
 	NFPROTO_IPV4   =  2,
 	NFPROTO_ARP    =  3,
+	NFPROTO_NETDEV =  5,
 	NFPROTO_BRIDGE =  7,
 	NFPROTO_IPV6   = 10,
 	NFPROTO_DECNET = 12,
diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index 5fa1cd0..89a671e 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -146,12 +146,14 @@ enum nft_table_flags {
  * @NFTA_TABLE_NAME: name of the table (NLA_STRING)
  * @NFTA_TABLE_FLAGS: bitmask of enum nft_table_flags (NLA_U32)
  * @NFTA_TABLE_USE: number of chains in this table (NLA_U32)
+ * @NFTA_TABLE_DEV: net device name (NLA_STRING)
  */
 enum nft_table_attributes {
 	NFTA_TABLE_UNSPEC,
 	NFTA_TABLE_NAME,
 	NFTA_TABLE_FLAGS,
 	NFTA_TABLE_USE,
+	NFTA_TABLE_DEV,
 	__NFTA_TABLE_MAX
 };
 #define NFTA_TABLE_MAX		(__NFTA_TABLE_MAX - 1)
diff --git a/src/chain.c b/src/chain.c
index 84851e0..74e5925 100644
--- a/src/chain.c
+++ b/src/chain.c
@@ -76,6 +76,12 @@ static const char *nft_hooknum2str(int family, int hooknum)
 			return "forward";
 		}
 		break;
+	case NFPROTO_NETDEV:
+		switch (hooknum) {
+		case NF_NETDEV_INGRESS:
+			return "ingress";
+		}
+		break;
 	}
 	return "unknown";
 }
diff --git a/src/table.c b/src/table.c
index ab0a8ea..f748d6d 100644
--- a/src/table.c
+++ b/src/table.c
@@ -32,6 +32,7 @@ struct nft_table {
 	const char	*name;
 	uint32_t	family;
 	uint32_t	table_flags;
+	const char	*dev;
 	uint32_t	use;
 	uint32_t	flags;
 };
@@ -74,6 +75,12 @@ void nft_table_attr_unset(struct nft_table *t, uint16_t attr)
 		break;
 	case NFT_TABLE_ATTR_USE:
 		break;
+	case NFT_TABLE_ATTR_DEV:
+		if (t->dev) {
+			xfree(t->dev);
+			t->dev = NULL;
+		}
+		break;
 	}
 	t->flags &= ~(1 << attr);
 }
@@ -108,6 +115,12 @@ void nft_table_attr_set_data(struct nft_table *t, uint16_t attr,
 	case NFT_TABLE_ATTR_USE:
 		t->use = *((uint32_t *)data);
 		break;
+	case NFT_TABLE_ATTR_DEV:
+		if (t->dev)
+			xfree(t->dev);
+
+		t->dev = strdup(data);
+		break;
 	}
 	t->flags |= (1 << attr);
 }
@@ -155,6 +168,8 @@ const void *nft_table_attr_get_data(struct nft_table *t, uint16_t attr,
 	case NFT_TABLE_ATTR_USE:
 		*data_len = sizeof(uint32_t);
 		return &t->use;
+	case NFT_TABLE_ATTR_DEV:
+		return t->dev;
 	}
 	return NULL;
 }
@@ -193,6 +208,8 @@ void
[PATCH 0/3 nf-next] nf_tables support at ingress
Hi,

This is the follow-up patchset to add nf_tables on top of the Netfilter
ingress hook [1] now available in the net-next tree:

1) default CONFIG_NETFILTER_INGRESS to y for easier compile-testing of all
   options.

2) Allow to bind a table to net_device. This introduces the internal
   NFT_AF_NEEDS_DEV flag to perform a mandatory check for this binding.
   This patch is required by the next patch.

3) Add the 'netdev' table family, this new table allows you to create ingress
   filter basechains. This provides access to the existing nf_tables features
   from ingress.

After this, I'll prepare more patches to revisit our existing limit expression
and to add support for the tee expression.

If no objections, I'll enqueue this patchset to the nf-next tree.

Thanks.

[1] https://lwn.net/Articles/644937/

Pablo Neira Ayuso (3):
  netfilter: default CONFIG_NETFILTER_INGRESS to y
  netfilter: nf_tables: allow to bind table to net_device
  netfilter: nf_tables: add netdev table to filter from ingress

 include/net/netfilter/nf_tables.h        |    8 ++
 include/net/netns/nftables.h             |    1 +
 include/uapi/linux/netfilter/nf_tables.h |    2 +
 net/netfilter/Kconfig                    |    6 +
 net/netfilter/Makefile                   |    1 +
 net/netfilter/nf_tables_api.c            |   46 +++-
 net/netfilter/nf_tables_netdev.c         |  183 ++
 7 files changed, 242 insertions(+), 5 deletions(-)
 create mode 100644 net/netfilter/nf_tables_netdev.c

-- 
1.7.10.4
Re: [PATCH net-next v2 0/7] netns: ease netlink use with a lot of netns
On 25/05/2015 12:55, Alexander Holler wrote:
> On 25.05.2015 at 09:45, Nicolas Dichtel wrote:
>> On 22/05/2015 22:50, Alexander Holler wrote:
>>> First I think if NETLINK_LISTEN_ALL_NSID is enabled, a dump of the
>>> interfaces through RTM_GETLINK together with NLM_F_DUMP and
>>> NLM_F_REQUEST should return all interfaces of all reachable namespaces.
>> This option is only for 'listening', i.e. spontaneous notifications from
>> the kernel. It does nothing for requests.
>
> The problem is that you need information about the affected interfaces.
> E.g. if you receive a NEWADDR or NEWROUTE for some interface (indicated by
> the index of the interface) in a(nother) namespace, how do you get
> information about that interface, if not by a dump which includes the
> interfaces of these namespaces too? Without knowledge about the interface,
> these messages are not very usable. ;)

Yes, that is the right thing to do. Usually, a daemon opens a socket to
listen for netlink events. Then, it opens another netlink socket to dump the
configuration (interfaces, addresses, routes, etc.) and fill its internal
structures. From that point on, for most configuration parameters, it no
longer needs to do dumps and thus it can close the second socket. This
allows your daemon to have only one socket to monitor a set of netns.
Look at iproute for example, it starts by dumping all interfaces before
executing the specified command.

>> Not sure I follow you. veth0 sits in the current netns (let's say
>> init_net) and veth1 in netns1. So, when you dump veth0 in init_net, its
>> link-netnsid is set to the id of netns1 in init_net. And when you dump
>> veth1 in netns1, its link-netnsid is set to the id of init_net in netns1.
>
> I've misunderstood the meaning of IFLA_LINK_NETNSID. I thought it indicates
> the namespace an interface lives in, but it indicates the namespace it is
> linked to.

Yes.

> I've also thought that the NETNSID is a globally unique identifier of a
> namespace, which seems to be wrong too. While I still have not read through
> all the sources, the other comments suggest that the NSID is just an ID
> which is unique only in one namespace, or in other words, every namespace
> has its own set of nsids. I'm not sure if I'm now right with that
> assumption, but that's what I now think after the responses to my mail. ;)

Right, nsids are local to a netns. This allows a container to be migrated.
With a global id, that wouldn't be possible. ifindex values are local for
the exact same purpose.

> So to conclude, I've now scheduled support for namespaces to a far later
> point. It doesn't seem to be as easy as I've thought after having read the
> introductory mail of your patch series. ;)

The main goal of the series was to improve scalability ;-)

Regards,
Nicolas
[PATCH net-next v2] bridge: skip fdb add if the port shouldn't learn
From: Wilson Kok w...@cumulusnetworks.com

Check in fdb_add_entry() if the source port should learn, similar
check is used in br_fdb_update.
Note that new fdb entries which are added manually or
as local ones are still permitted.
This patch has been tested by running traffic via a bridge port and
switching the port's state, also by manually adding/removing entries
from the bridge's fdb.

Signed-off-by: Wilson Kok w...@cumulusnetworks.com
Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
v2: return an error instead of silently failing.

 net/bridge/br_fdb.c |    6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index e0670d7054f9..7896cf143045 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -736,6 +736,12 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
 	struct net_bridge_fdb_entry *fdb;
 	bool modified = false;
 
+	/* If the port cannot learn allow only local and static entries */
+	if (!(state & NUD_PERMANENT) && !(state & NUD_NOARP) &&
+	    !(source->state == BR_STATE_LEARNING ||
+	      source->state == BR_STATE_FORWARDING))
+		return -EPERM;
+
 	fdb = fdb_find(head, addr, vid);
 	if (fdb == NULL) {
 		if (!(flags & NLM_F_CREATE))
-- 
1.7.10.4
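The condition added by the patch above can be read as a small truth table. The sketch below restates it as a standalone userspace predicate so it can be tried outside the kernel. The helper name fdb_add_allowed() is hypothetical, and the NUD_* and BR_STATE_* constants are local stand-ins written to mirror the kernel's uapi values rather than being pulled from kernel headers.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel constants (assumed to mirror the uapi values). */
#define NUD_REACHABLE	0x02
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80

enum {
	BR_STATE_DISABLED,
	BR_STATE_LISTENING,
	BR_STATE_LEARNING,
	BR_STATE_FORWARDING,
	BR_STATE_BLOCKING,
};

/*
 * Hypothetical helper: true when the add would pass the new check,
 * false when the patched fdb_add_entry() would return -EPERM.
 */
bool fdb_add_allowed(unsigned int nud_state, int port_state)
{
	bool local_or_static = (nud_state & NUD_PERMANENT) ||
			       (nud_state & NUD_NOARP);
	bool port_learns = port_state == BR_STATE_LEARNING ||
			   port_state == BR_STATE_FORWARDING;

	/* Negation of the patch's reject condition, by De Morgan. */
	return local_or_static || port_learns;
}
```

Under these assumptions, a dynamic (NUD_REACHABLE) entry on a blocking port is refused, while the same entry on a forwarding port, or a permanent entry on any port, is still accepted.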
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On 05/23/2015 04:16 PM, Larry Finger wrote:
> The driver is reporting a warning at kernel/time/timer.c:1096 due to calling
> del_timer_sync() while in interrupt mode. Such warnings are fixed by calling
> del_timer() instead.
>
> Signed-off-by: Larry Finger larry.fin...@lwfinger.net
> Cc: Stable sta...@vger.kernel.org
> Cc: Haggi Eran haggai.e...@gmail.com
> ---

Greg,

Please drop this patch. The same fixes were submitted as
https://lkml.org/lkml/2015/5/15/226. It is crucial that this get into the 4.1
kernel where the regression was introduced.

Larry

>  drivers/staging/rtl8712/rtl8712_led.c  | 2 +-
>  drivers/staging/rtl8712/rtl871x_mlme.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/rtl8712/rtl8712_led.c b/drivers/staging/rtl8712/rtl8712_led.c
> index f1d47a0..8cc716c 100644
> --- a/drivers/staging/rtl8712/rtl8712_led.c
> +++ b/drivers/staging/rtl8712/rtl8712_led.c
> @@ -921,7 +921,7 @@ static void SwLedControlMode1(struct _adapter *padapter,
>  			     IS_LED_WPS_BLINKING(pLed))
>  				return;
>  			if (pLed->bLedNoLinkBlinkInProgress == true) {
> -				del_timer_sync(&pLed->BlinkTimer);
> +				del_timer(&pLed->BlinkTimer);
>  				pLed->bLedNoLinkBlinkInProgress = false;
>  			}
>  			if (pLed->bLedBlinkInProgress == true) {
> diff --git a/drivers/staging/rtl8712/rtl871x_mlme.c b/drivers/staging/rtl8712/rtl871x_mlme.c
> index fb2b195..ace88ab 100644
> --- a/drivers/staging/rtl8712/rtl871x_mlme.c
> +++ b/drivers/staging/rtl8712/rtl871x_mlme.c
> @@ -582,7 +582,7 @@ void r8712_surveydone_event_callback(struct _adapter *adapter, u8 *pbuf)
>  	spin_lock_irqsave(&pmlmepriv->lock, irqL);
>  
>  	if (check_fwstate(pmlmepriv, _FW_UNDER_SURVEY) == true) {
> -		del_timer_sync(&pmlmepriv->scan_to_timer);
> +		del_timer(&pmlmepriv->scan_to_timer);
>  		_clr_fwstate_(pmlmepriv, _FW_UNDER_SURVEY);
>  	}
>  
> @@ -910,7 +910,7 @@ void r8712_joinbss_event_callback(struct _adapter *adapter, u8 *pbuf)
>  				if (check_fwstate(pmlmepriv, WIFI_STATION_STATE)
>  					== true)
>  					r8712_indicate_connect(adapter);
> -				del_timer_sync(&pmlmepriv->assoc_timer);
> +				del_timer(&pmlmepriv->assoc_timer);
>  			} else
>  				goto ignore_joinbss_callback;
>  		} else {
Re: pull-request: wireless-drivers 2015-05-25
From: Kalle Valo kv...@codeaurora.org
Date: Mon, 25 May 2015 15:01:19 +0300

> here's hopefully the last wireless-drivers pull request for 4.1. Mostly
> iwlwifi fixes this time.
>
> Please let me know if there are any problems.

Pulled into 'net', thanks Kalle.
TCP window auto-tuning sub-optimal in GRE tunnel
Hello, all. I hope this is the correct list for this question. We are
having serious problems on high BDP networks using GRE tunnels. Our
traces show it to be a TCP Window problem. When we test without GRE,
throughput is wire speed and traces show the window size to be 16MB
which is what we configured for r/wmem_max and tcp_r/wmem. When we
switch to GRE, we see over a 90% drop in throughput and the TCP window
size seems to peak at around 500K.

What causes this and how can we get the GRE tunnels to use the max
window size? Thanks - John
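To put rough numbers on the question above: a single TCP flow cannot move more than one window of data per round trip, so the window size caps throughput at window / RTT, and the bandwidth-delay product (rate x RTT) is the window needed to fill the path. The small C sketch below does that arithmetic. The helper names are made up, and the 1 Gbit/s link rate and 80 ms RTT used in the note after it are assumed values for illustration only, since the post does not state either; "500K" is read as 500,000 bytes.

```c
#include <stdint.h>

/*
 * Bandwidth-delay product: bytes that must be in flight to keep a path
 * of the given rate (bits per second) and round-trip time (ms) full.
 */
uint64_t bdp_bytes(uint64_t rate_bps, uint64_t rtt_ms)
{
	return rate_bps / 8 * rtt_ms / 1000;
}

/*
 * Upper bound on single-flow throughput (bits per second) when at most
 * window_bytes can be outstanding per round trip. Returns 0 for a zero
 * RTT, where the bound is meaningless.
 */
uint64_t window_limited_bps(uint64_t window_bytes, uint64_t rtt_ms)
{
	if (rtt_ms == 0)
		return 0;
	return window_bytes * 8 * 1000 / rtt_ms;
}
```

With the assumed 1 Gbit/s and 80 ms path, bdp_bytes() gives 10,000,000 bytes, which a 16MB window covers comfortably, while window_limited_bps() for a 500,000 byte window gives 50,000,000 bit/s. That is only an illustration of why a window stuck near 500K would show up as a large throughput drop on a long path; it says nothing about why the window stays small under GRE.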
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On 05/25/2015 04:37 AM, Vaishali Thakkar wrote:
> On 25 May 2015 14:49, Dan Carpenter dan.carpen...@oracle.com wrote:
>> On Sun, May 24, 2015 at 07:11:40PM -0500, Larry Finger wrote:
>>> On 05/24/2015 02:03 PM, Haggai Eran wrote:
>>>> On 24 May 2015 at 00:16, Larry Finger larry.fin...@lwfinger.net wrote:
>>>>> The driver is reporting a warning at kernel/time/timer.c:1096 due to
>>>>> calling del_timer_sync() while in interrupt mode. Such warnings are
>>>>> fixed by calling del_timer() instead.
>>>>>
>>>>> Signed-off-by: Larry Finger larry.fin...@lwfinger.net
>>>>> Cc: Stable sta...@vger.kernel.org
>>>>> Cc: Haggi Eran haggai.e...@gmail.com
>>>>
>>>> Hi,
>>>>
>>>> I haven't been using kernel v4.1 so I haven't seen this warning, but
>>>> looking at the code it seems to originate from the two recent patches
>>>> to remove _cancel_timer and _cancel_timer_ex.
>>>>
>>>> I see that there's another patch in lkml [1] that changes
>>>> del_timer_sync back to del_timer in more places. Perhaps it could
>>>> prevent other warnings like this in the future.
>>>>
>>>> Regards,
>>>> Haggai
>>>>
>>>> [1] https://lkml.org/lkml/2015/5/15/226
>>>
>>> Yes, the script kiddies make changes they do not understand and
>>> screw everything up. Unfortunately, I did not catch these in review.
>>> I think I will submit V2 and blast the contributor.
>>
>> Don't blast the contributor... These are special intern patches that
>> don't go through the normal review process. The intern process is over
>> this year.
>>
>> The lack of normal review introduced a number of bugs this year. I
>> always complain to Greg about it and he says that I should join the
>> intern mailing list if I care so much.
>
> I am sorry for those patches. It was me who introduced those bugs. Yes,
> it was sent during the Outreachy process. But it was my mistake as a
> newbie. Maybe I should have taken care of the interrupt mode thing. I
> would like to fix it if no one else is doing it. Sorry again. I will
> take care of these things in my future patches.

No, one of us will fix the problems with r8712u. The hardware is needed for
proper testing, and I doubt that you have it.

Larry
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On 05/24/2015 11:42 PM, Sudip Mukherjee wrote:
>> I haven't been using kernel v4.1 so I haven't seen this warning, but
>> looking at the code it seems to originate from the two recent patches
>> to remove _cancel_timer and _cancel_timer_ex.
>>
>> I see that there's another patch in lkml [1] that changes
>> del_timer_sync back to del_timer in more places. Perhaps it could
>> prevent other warnings like this in the future.
>
> _cancel_timer and _cancel_timer_ex both were internally using
> del_timer, and the issue was reported in bugzilla. I have given the
> reference of the bugzilla in my patch in lkml. I have changed the
> reference of del_timer_sync to del_timer in all places which were in
> interrupt context, in some places it was not removed as those were not
> in interrupt context.

Why did you not Cc the maintainers?? That is why file MAINTAINERS exists. Was your patch sent to Greg?

Larry
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 11:42 -0400, John A. Sullivan III wrote:
> Hello, all. I hope this is the correct list for this question. We are
> having serious problems on high BDP networks using GRE tunnels. Our
> traces show it to be a TCP Window problem. When we test without GRE,
> throughput is wire speed and traces show the window size to be 16MB
> which is what we configured for r/wmem_max and tcp_r/wmem. When we
> switch to GRE, we see over a 90% drop in throughput and the TCP window
> size seems to peak at around 500K.
>
> What causes this and how can we get the GRE tunnels to use the max
> window size? Thanks - John

Hi John

Is it for a single flow or multiple ones ? Which kernel versions on sender and receiver ? What is the nominal speed of non GRE traffic ?

What is the brand/model of receiving NIC ? Is GRO enabled ?

It is possible receiver window is impacted because of GRE encapsulation making skb->len/skb->truesize ratio a bit smaller, but not by 90%.

I suspect some more trivial issues, like receiver overwhelmed by the extra load of GRE encapsulation.

1) Non GRE session

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H lpaa24 -Cc -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
tcpi_rtt 70 tcpi_rttvar 7 tcpi_snd_ssthresh 221 tpci_snd_cwnd 258
tcpi_reordering 3 tcpi_total_retrans 711
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1912320     6291456     16384  10.00   22386.89   10^6bits/s  1.20  S      2.60   S      0.211   0.456   usec/KB

2) GRE session

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 7.7.7.24 -Cc -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.24 () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
tcpi_rtt 76 tcpi_rttvar 7 tcpi_snd_ssthresh 176 tpci_snd_cwnd 249
tcpi_reordering 3 tcpi_total_retrans 819
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1815552     6291456     16384  10.00   22420.88   10^6bits/s  1.01  S      3.44   S      0.177   0.603   usec/KB
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 14:49 -0400, John A. Sullivan III wrote:
> On Mon, 2015-05-25 at 09:58 -0700, Eric Dumazet wrote:
>> On Mon, 2015-05-25 at 11:42 -0400, John A. Sullivan III wrote:
>>> Hello, all. I hope this is the correct list for this question. We are
>>> having serious problems on high BDP networks using GRE tunnels. Our
>>> traces show it to be a TCP Window problem. When we test without GRE,
>>> throughput is wire speed and traces show the window size to be 16MB
>>> which is what we configured for r/wmem_max and tcp_r/wmem. When we
>>> switch to GRE, we see over a 90% drop in throughput and the TCP window
>>> size seems to peak at around 500K.
>>>
>>> What causes this and how can we get the GRE tunnels to use the max
>>> window size? Thanks - John
>>
>> Hi John
>>
>> Is it for a single flow or multiple ones ? Which kernel versions on
>> sender and receiver ? What is the nominal speed of non GRE traffic ?
>>
>> What is the brand/model of receiving NIC ? Is GRO enabled ?
>>
>> It is possible receiver window is impacted because of GRE encapsulation
>> making skb->len/skb->truesize ratio a bit smaller, but not by 90%.
>>
>> I suspect some more trivial issues, like receiver overwhelmed by the
>> extra load of GRE encapsulation.
>>
>> 1) Non GRE session
>>
>> lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H lpaa24 -Cc -t OMNI
>> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET
>> tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
>> tcpi_rtt 70 tcpi_rttvar 7 tcpi_snd_ssthresh 221 tpci_snd_cwnd 258
>> tcpi_reordering 3 tcpi_total_retrans 711
>> Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
>> Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
>> Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
>> Final       Final                                             %     Method %      Method
>> 1912320     6291456     16384  10.00   22386.89   10^6bits/s  1.20  S      2.60   S      0.211   0.456   usec/KB
>>
>> 2) GRE session
>>
>> lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 7.7.7.24 -Cc -t OMNI
>> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.24 () port 0 AF_INET
>> tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
>> tcpi_rtt 76 tcpi_rttvar 7 tcpi_snd_ssthresh 176 tpci_snd_cwnd 249
>> tcpi_reordering 3 tcpi_total_retrans 819
>> Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
>> Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
>> Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
>> Final       Final                                             %     Method %      Method
>> 1815552     6291456     16384  10.00   22420.88   10^6bits/s  1.01  S      3.44   S      0.177   0.603   usec/KB
>
> Thanks, Eric. It really looks like a windowing issue but here is the
> relevant information:
>
> We are measuring single flows. One side is an Intel GbE NIC connected
> to a 1 Gbps CIR Internet connection. The other side is an Intel 10 GbE
> NIC connected to a 40 Gbps Internet connection. RTT is ~=80ms
>
> The numbers I will post below are from a duplicated setup in our test
> lab where the systems are connected by GbE links with a netem router in
> the middle to introduce the latency. We are not varying the latency to
> ensure we eliminate packet re-ordering from the mix.
>
> We are measuring a single flow. Here are the non-GRE numbers:
>
> root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.224.2
>   666.3125 MB /  10.00 sec =  558.9370 Mbps     0 retrans
>  1122.2500 MB /  10.00 sec =  941.4151 Mbps     0 retrans
>   720.8750 MB /  10.00 sec =  604.7129 Mbps     0 retrans
>  1122.3125 MB /  10.00 sec =  941.4622 Mbps     0 retrans
>  1122.2500 MB /  10.00 sec =  941.4101 Mbps     0 retrans
>  1122.3125 MB /  10.00 sec =  941.4668 Mbps     0 retrans
>
>  5888.5000 MB /  60.19 sec =  820.6857 Mbps 4 %TX 13 %RX 0 retrans 80.28 msRTT
>
> For some reason, nuttcp does not show retransmissions in our
> environment even when they do exist.
>
> gro is active on the send side:
>
> root@gwhq-1:~# ethtool -k eth0
> Features for eth0:
> rx-checksumming: on
> tx-checksumming: on
>         tx-checksum-ipv4: on
>         tx-checksum-unneeded: off [fixed]
>         tx-checksum-ip-generic: off [fixed]
>         tx-checksum-ipv6: on
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: on
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: off [fixed]
>         tx-tcp6-segmentation: on
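A rough sanity check of the figures quoted in this thread can be done with bandwidth-delay arithmetic. The sketch below is only illustrative: it assumes a 1 Gbps path, the ~80 ms RTT reported above, and reads "500K" as 500 KiB; the function names are made up for this example and come from no tool mentioned in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be in flight to keep the path full:
 * rate (bit/s) / 8 * RTT (ms) / 1000. */
static uint64_t bdp_bytes(uint64_t rate_bps, uint64_t rtt_ms)
{
	return rate_bps / 8 * rtt_ms / 1000;
}

/* Upper bound on throughput (bit/s) when the window is fixed:
 * at most one window can be delivered per round trip. */
static uint64_t window_limited_bps(uint64_t window_bytes, uint64_t rtt_ms)
{
	assert(rtt_ms > 0);
	return window_bytes * 8 * 1000 / rtt_ms;
}
```

Under these assumptions bdp_bytes(1000000000, 80) is 10000000 bytes, so a 16MB window is enough to fill the link, while window_limited_bps(512000, 80) is 51200000 bit/s, roughly 5% of the ~941 Mbps measured without GRE. That is in line with the reported drop of over 90%, but it does not say why the window stays small.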
[PATCH] net:xen-netback - Change 1 to true for bool type variable.
The variable separate_tx_rx_irq is of type bool, so assign true instead of 1.

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com
---
 drivers/net/xen-netback/netback.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4de46aa..792ada6 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -52,7 +52,7 @@
  * event channels are limited resource. Split event channels are
  * enabled by default.
  */
-bool separate_tx_rx_irq = 1;
+bool separate_tx_rx_irq = true;
 module_param(separate_tx_rx_irq, bool, 0644);
 
 /* The time that packets can stay on the guest Rx internal queue
-- 
1.7.9.5
[PATCH] net:wireless:rndis_wlan - Use bool function return value
The function rndis_bss_info_update() has bool return type, so return the bool value false instead of NULL.

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com
---
 drivers/net/wireless/rndis_wlan.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/rndis_wlan.c b/drivers/net/wireless/rndis_wlan.c
index d72ff8e..b2bff5c 100644
--- a/drivers/net/wireless/rndis_wlan.c
+++ b/drivers/net/wireless/rndis_wlan.c
@@ -2000,7 +2000,7 @@ static bool rndis_bss_info_update(struct usbnet *usbdev,
 
 	if (bssid_len < sizeof(struct ndis_80211_bssid_ex) +
 			sizeof(struct ndis_80211_fixed_ies))
-		return NULL;
+		return false;
 
 	fixed = (struct ndis_80211_fixed_ies *)bssid->ies;
 
@@ -2009,13 +2009,13 @@ static bool rndis_bss_info_update(struct usbnet *usbdev,
 		(int)le32_to_cpu(bssid->ie_length));
 	ie_len -= sizeof(struct ndis_80211_fixed_ies);
 	if (ie_len < 0)
-		return NULL;
+		return false;
 
 	/* extract data for cfg80211_inform_bss */
 	channel = ieee80211_get_channel(priv->wdev.wiphy,
 			KHZ_TO_MHZ(le32_to_cpu(bssid->config.ds_config)));
 	if (!channel)
-		return NULL;
+		return false;
 
 	signal = level_to_qual(le32_to_cpu(bssid->rssi));
 	timestamp = le64_to_cpu(*(__le64 *)fixed->timestamp);
-- 
1.7.9.5
Re: [PATCH net-next V4 00/12] net/mlx5: ConnectX-4 100G Ethernet driver
On Wed, May 20, 2015 at 5:17 PM, Amir Vadai am...@mellanox.com wrote:
> On Tue, May 19, 2015 at 7:41 PM, David Miller da...@davemloft.net wrote:
>> From: Amir Vadai am...@mellanox.com
>> Date: Tue, 19 May 2015 12:25:12 +0300
>>
>>> On Sun, May 17, 2015 at 8:05 PM, David Miller da...@davemloft.net wrote:
>>>> From: Amir Vadai am...@mellanox.com
>>>> Date: Sun, 17 May 2015 16:02:11 +0300
>>>>
>>>>> We didn't get a response yet regarding your comment about the irq
>>>>> renaming [3].
>>>>
>>>> Well then, please hold off on resubmissions of this series until you
>>>> do get a response and that issue is firmly resolved.
>>>
>>> Hi,
>>>
>>> I don't mean to push you, I only want to understand what is expected
>>> from me and what are the next steps:
>>> How will the issue be resolved? Do you plan to answer my question [1]
>>> from last week, and are just too busy right now or something like that?
>>
>> I have not seen any response to me explaining why it's ok to change
>> the IRQ name strings in the context where this will occur.
>>
>> Once you explain that, we can make forward progress, but only at that
>> point.
>
> Hi Dave,
>
> Just to put us back on the same page, repeating a bit the previous
> chapters..
>
> You wrote [1] that if we change these names after the request_irq()
> call(s), the new name string will not propagate to /proc/interrupts
> output.
>
> So, indeed, request_irq() is called when the driver is loaded (and we
> don't know yet if the port types are Infiniband or Ethernet). Only
> later on, we rename the name when the Ethernet interface is up and we
> know its name.
>
> Fact is that the new name does propagate to /proc/interrupts.
>
> Also, looking in the code, I don't see a reason why it shouldn't be
> properly updated. When calling request_irq(), the name argument is not
> copied, but irq_desc->action->name points to it. This same pointer is
> being used by show_interrupts() when /proc/interrupts is shown.
>
> All in all, unless I somehow missed your precise question or I didn't
> explain myself clearly, I don't see what is still missing in my reply,
> can you please shed some light?
>
> What I did find is that the /proc/irq/N/handler directory name, which
> is a copy of the original action->name, doesn't change. However AFAIK,
> this directory isn't being used anywhere in the kernel.
>
> [1] - http://www.spinics.net/lists/netdev/msg328444.html

Hi Dave,

Going over this thread, Amir's response to your comments makes sense to me -- could you please let us know if you're convinced... and if not, where his arguments break? I also copied some more folks and will love it if more people will provide their opinion on the matter.

Or.
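The pointer-versus-copy distinction described in this message can be illustrated in plain userspace C. This is only a sketch of the aliasing behaviour, not kernel code: struct fake_irqaction, fake_request_irq() and the name strings are hypothetical stand-ins invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-IRQ action record. */
struct fake_irqaction {
	const char *name;   /* aliases the caller's buffer, no copy is made */
	char dir_name[32];  /* snapshot taken once, at registration time */
};

/* Hypothetical registration: store the pointer, copy the name once. */
static void fake_request_irq(struct fake_irqaction *act, const char *name)
{
	assert(act && name);
	act->name = name;
	snprintf(act->dir_name, sizeof(act->dir_name), "%s", name);
}

/* Returns 1 if a later rename is visible through the stored pointer
 * while the snapshot still holds the original name. */
static int rename_is_visible(void)
{
	char buf[32] = "mlx5_comp0";  /* made-up initial name */
	struct fake_irqaction act;

	fake_request_irq(&act, buf);
	snprintf(buf, sizeof(buf), "%s", "eth0-0");  /* made-up new name */

	return strcmp(act.name, "eth0-0") == 0 &&
	       strcmp(act.dir_name, "mlx5_comp0") == 0;
}
```

Anything that reads through the stored pointer sees the new string, while anything that copied the string at registration time keeps the old one, which is the same split the message reports between /proc/interrupts and the per-IRQ directory name.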
[PATCH] atm:he - Do not initialise statics to 0.
Static variables are initialised to 0 by GCC. Fixes the following checkpatch error:

ERROR: do not initialise statics to 0 or NULL
FILE: drivers/atm/he.c:120:
static bool sdh = 0;

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com
---
 drivers/atm/he.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 93dca2e..2cd6f17 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -117,7 +117,7 @@ static short nvpibits = -1;
 static short nvcibits = -1;
 static short rx_skb_reserve = 16;
 static bool irq_coalesce = 1;
-static bool sdh = 0;
+static bool sdh;
 
 /* Read from EEPROM = 0011b */
 static unsigned int readtab[] = {
-- 
1.7.9.5
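For context, zero-initialisation of objects with static storage duration is guaranteed by the C language itself rather than by a particular compiler, which is why dropping the "= 0" does not change behaviour. A minimal userspace sketch, reusing the variable names from the diff purely for illustration (the accessor functions are made up):

```c
#include <assert.h>
#include <stdbool.h>

static bool sdh;             /* no initializer: starts out as false */
static short nvpibits = -1;  /* non-zero defaults still need one */

static bool sdh_default(void)
{
	return sdh;
}

static short nvpibits_default(void)
{
	return nvpibits;
}

/* Aborts if the defaults are not what the declarations promise. */
static void check_defaults(void)
{
	assert(!sdh);
	assert(nvpibits == -1);
}
```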
Re: [PATCH net-next v5 00/11] ipv6: Only create RTF_CACHE route after encountering pmtu exception
From: Martin KaFai Lau ka...@fb.com
Date: Fri, 22 May 2015 20:55:55 -0700

> This series is to avoid creating a RTF_CACHE route whenever we are
> consulting the fib6 tree with a new destination. Instead, only create
> RTF_CACHE route when we see a pmtu exception.

Looks great, nice work.

Series applied to net-next, thanks!
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 09:58 -0700, Eric Dumazet wrote:
> 1) Non GRE session
>
> lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H lpaa24 -Cc -t OMNI
> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET
> tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
> tcpi_rtt 70 tcpi_rttvar 7 tcpi_snd_ssthresh 221 tpci_snd_cwnd 258
> tcpi_reordering 3 tcpi_total_retrans 711
> Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
> Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
> Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
> Final       Final                                             %     Method %      Method
> 1912320     6291456     16384  10.00   22386.89   10^6bits/s  1.20  S      2.60   S      0.211   0.456   usec/KB
>
> 2) GRE session
>
> lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 7.7.7.24 -Cc -t OMNI
> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.24 () port 0 AF_INET
> tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
> tcpi_rtt 76 tcpi_rttvar 7 tcpi_snd_ssthresh 176 tpci_snd_cwnd 249
> tcpi_reordering 3 tcpi_total_retrans 819
> Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
> Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
> Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
> Final       Final                                             %     Method %      Method
> 1815552     6291456     16384  10.00   22420.88   10^6bits/s  1.01  S      3.44   S      0.177   0.603   usec/KB

Scratch these numbers, they were quite wrong (7.7.7.24 was not using a GRE tunnel)

Correct experiment :

1) No GRE tunnel

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H lpaa24 -Cc -t OMNI -l 10
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
tcpi_rtt 82 tcpi_rttvar 11 tcpi_snd_ssthresh 356 tpci_snd_cwnd 358
tcpi_reordering 3 tcpi_total_retrans 288
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
2492928     6291456     16384  10.03   31426.29   10^6bits/s  1.14  S      4.82   S      0.143   0.603   usec/KB

2) GRE tunnel

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.7.8.152 -Cc -t OMNI -l 10
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.7.8.152 () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1476 tcpi_rcv_ssthresh 28720
tcpi_rtt 165 tcpi_rttvar 27 tcpi_snd_ssthresh 263 tpci_snd_cwnd 264
tcpi_reordering 81 tcpi_total_retrans 26
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1216512     6291456     16384  10.00   8471.24    10^6bits/s  2.82  S      2.56   S      1.308   1.190   usec/KB

Bottleneck here is the sender, because the NIC does not support GRE/TSO, so we spend a lot of time doing segmentation and TX checksum, consuming a full cpu core.
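The tcpi_pmtu values in the two corrected runs (1500 without the tunnel, 1476 with it) differ by 24 bytes, which matches a 20-byte outer IPv4 header plus the 4-byte base GRE header. A minimal sketch of that arithmetic, assuming IPv4 and TCP headers without options and GRE without key, checksum or sequence fields; the function names are made up for illustration:

```c
#include <assert.h>

enum {
	IPV4_HDR_LEN = 20,  /* IPv4 header, no options */
	TCP_HDR_LEN  = 20,  /* TCP header, no options */
	GRE_HDR_LEN  = 4,   /* base GRE header only */
};

/* MTU seen by traffic inside the tunnel for a given link MTU. */
static int gre_inner_mtu(int link_mtu)
{
	assert(link_mtu > IPV4_HDR_LEN + GRE_HDR_LEN);
	return link_mtu - IPV4_HDR_LEN - GRE_HDR_LEN;
}

/* Largest TCP payload per packet for a given MTU. */
static int tcp_mss(int mtu)
{
	assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
	return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}
```

With these assumptions gre_inner_mtu(1500) gives 1476, and the reported tcpi_rcv_ssthresh values (29200 and 28720) equal 20 times the resulting MSS of 1460 and 1436 bytes respectively.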
[PATCH] atm:he - Change 1 to true for bool type variable.
The variable irq_coalesce is bool type. So assign the value true instead of 1.

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com
---
 drivers/atm/he.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 93dca2e..0237271 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -116,7 +116,7 @@ static bool disable64;
 static short nvpibits = -1;
 static short nvcibits = -1;
 static short rx_skb_reserve = 16;
-static bool irq_coalesce = 1;
+static bool irq_coalesce = true;
 static bool sdh = 0;
 
 /* Read from EEPROM = 0011b */
-- 
1.7.9.5
Re: [PATCH] staging: r8712u: Fix kernel warning for improper call of del_timer_sync()
On Mon, 2015-05-25 at 12:17 +0300, Dan Carpenter wrote:
> These are special intern patches that don't go through the normal
> review process. The intern process is over this year.
>
> The lack of normal review introduced a number of bugs this year. I
> always complain to Greg about it and he says that I should join the
> intern mailing list if I care so much.

It'd be better if the approved patches from the intern list (no idea what that is) were sent to lkml/devel@driverdev lists for review before actually being applied.
Re: [PATCH v2 3/3] ARM: zynq: DT: Use the zynq binding with macb
On Fri, 2015-05-22 at 09:22AM -0500, Nathan Sullivan wrote:
> Use the new zynq binding for macb ethernet, since it will disable half
> duplex gigabit like the Zynq TRM says to do.
>
> Signed-off-by: Nathan Sullivan nathan.sulli...@ni.com
> ---
>  arch/arm/boot/dts/zynq-7000.dtsi |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/boot/dts/zynq-7000.dtsi b/arch/arm/boot/dts/zynq-7000.dtsi
> index a5cd2ed..9ea54b3 100644
> --- a/arch/arm/boot/dts/zynq-7000.dtsi
> +++ b/arch/arm/boot/dts/zynq-7000.dtsi
> @@ -193,7 +193,7 @@
>  		};
>
>  		gem0: ethernet@e000b000 {
> -			compatible = "cdns,gem";
> +			compatible = "cdns,zynq-gem";

Please, prepend the new string to the compatible list. Don't just replace it like this.

>  			reg = <0xe000b000 0x1000>;
>  			status = "disabled";
>  			interrupts = <0 22 4>;
> @@ -204,7 +204,7 @@
>  		};
>
>  		gem1: ethernet@e000c000 {
> -			compatible = "cdns,gem";
> +			compatible = "cdns,zynq-gem";

ditto

	Sören
Drops in qdisc on ifb interface
Hello, all. On one of our connections we are doing intensive traffic shaping with tc. We are using ifb interfaces for shaping ingress traffic and we also use ifb interfaces for egress so that we can apply the same set of rules to multiple interfaces (e.g., tun and eth interfaces operating on the same physical interface).

These are running on very powerful gateways; I have watched them handling 16 Gbps with CPU utilization at a handful of percent. Yet, I am seeing drops on the ifb interfaces when I do a tc -s qdisc show.

Why would this be? I would expect if there was some kind of problem that it would manifest as drops on the physical interfaces and not the IFB interface. We have played with queue lengths in both directions. We are using HFSC with SFQ leaves so I would imagine this overrides the very short qlen on the IFB interfaces (32). These are drops and not overlimits.

Ingress:

root@gwhq-2:~# tc -s qdisc show dev ifb0
qdisc hfsc 11: root refcnt 2 default 50
 Sent 198152831324 bytes 333838154 pkt (dropped 101509, overlimits 9850280 requeues 43871)
 backlog 0b 0p requeues 43871
qdisc sfq 1102: parent 11:10 limit 127p quantum 1514b divisor 4096
 Sent 208463490 bytes 1367761 pkt (dropped 234, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1202: parent 11:20 limit 127p quantum 1514b divisor 4096
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1302: parent 11:30 limit 127p quantum 1514b divisor 4096
 Sent 13498600307 bytes 203705301 pkt (dropped 23358, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1402: parent 11:40 limit 127p quantum 1514b divisor 4096
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1502: parent 11:50 limit 127p quantum 1514b divisor 4096
 Sent 184445767527 bytes 128765092 pkt (dropped 77990, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

root@gwhq-2:~# tc -s class show dev ifb0
class hfsc 11: root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 2

class hfsc 11:1 parent 11: ls m1 0bit d 0us m2 1000Mbit ul m1 0bit d 0us m2 1000Mbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 210766381 work 198152837828 bytes level 1

class hfsc 11:10 parent 11:1 leaf 1102: rt m1 0bit d 0us m2 1000Mbit
 Sent 208463490 bytes 1367761 pkt (dropped 234, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 work 208463490 bytes rtwork 208463490 bytes level 0

class hfsc 11:20 parent 11:1 leaf 1202: rt m1 186182Kbit d 2.2ms m2 10Kbit ls m1 0bit d 0us m2 10Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 11:30 parent 11:1 leaf 1302: rt m1 0bit d 0us m2 10Kbit ls m1 0bit d 0us m2 30Kbit
 Sent 13498600307 bytes 203705301 pkt (dropped 23358, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 200073586 work 13498600307 bytes rtwork 10035553945 bytes level 0

class hfsc 11:40 parent 11:1 leaf 1402: rt m1 0bit d 0us m2 20Kbit ls m1 0bit d 0us m2 50Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 11:50 parent 11:1 leaf 1502: rt m1 0bit d 0us m2 20Kbit ls m1 0bit d 0us m2 10Kbit
 Sent 184446394921 bytes 128765668 pkt (dropped 77917, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 11254219 work 184445774031 bytes rtwork 39040535823 bytes level 0

Egress:

root@gwhq-2:~# tc -s qdisc show dev ifb1
qdisc hfsc 1: root refcnt 2 default 40
 Sent 783335740812 bytes 551888729 pkt (dropped 9622, overlimits 8546933 requeues 7180)
 backlog 0b 0p requeues 7180
qdisc sfq 1101: parent 1:10 limit 127p quantum 1514b divisor 4096
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1201: parent 1:20 limit 127p quantum 1514b divisor 4096
 Sent 345678 bytes 2800 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1301: parent 1:30 limit 127p quantum 1514b divisor 4096
 Sent 573479513 bytes 8689797 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 1401: parent 1:40 limit 127p quantum 1514b divisor 4096
 Sent 782761915621 bytes 543196132 pkt (dropped 9692, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

root@gwhq-2:~# tc -s class show dev ifb1
class hfsc 1: root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 2

class hfsc 1:10 parent 1:1 leaf 1101: rt m1 186182Kbit d 2.2ms m2 10Kbit ls m1 0bit d 0us m2 10Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 1:1 parent 1: ls m1 0bit d 0us m2 1000Mbit ul m1 0bit d 0us m2 1000Mbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 27259167 work 783335741126 bytes level 1

class hfsc 1:20 parent 1:1 leaf 1201:
Re: [PATCH v3 -next] ipv6: don't increase size when refragmenting forwarded ipv6 skbs
From: Florian Westphal f...@strlen.de
Date: Fri, 22 May 2015 00:44:16 +0200

since commit 6aafeef03b9d (netfilter: push reasm skb through instead of original frag skbs) we will end up sometimes re-fragmenting skbs that we've reassembled.

ipv6 defrag preserves the original skbs using the skb frag list, i.e. as long as the skb frag list is preserved there is no problem since we keep original geometry of fragments intact.

However, in the rare case where the frag list is munged or skb is linearized, we might send larger fragments than what we originally received.

A router in the path might then send packet-too-big errors even if sender never sent fragments exceeding the reported mtu:

mtu 1500 - 1500:1400 - 1400:1280 - 1280
     A        R1         R2        B

1 - A sends to B, fragment size 1400
2 - R2 sends pkttoobig error for 1280
3 - A sends to B, fragment size 1280
4 - R2 sends pkttoobig error for 1280 again because it sees fragments of size 1400.

make sure ip6_fragment always caps MTU at largest packet size seen when defragmented skb is forwarded.

Acked-by: Hannes Frederic Sowa han...@stressinduktion.org
Signed-off-by: Florian Westphal f...@strlen.de

Applied to net-next, thanks.
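The capping rule described in the commit message can be sketched in plain C. This is a userspace illustration only, not the kernel's ip6_fragment code: the name cap_fragment_mtu, its parameters, and the convention that 0 means "no fragment size was recorded" are assumptions made for the example. The 1280-byte floor is the IPv6 minimum link MTU.

```c
#include <stdint.h>

/* IPv6 minimum link MTU; fragments are never sized below this. */
#define IPV6_MIN_MTU_BYTES 1280u

/*
 * Hypothetical helper, not a kernel function.
 * path_mtu:          MTU of the outgoing route.
 * largest_frag_seen: size of the largest fragment received before
 *                    reassembly, or 0 if none was recorded.
 * Returns the MTU to use when re-fragmenting a forwarded packet.
 */
uint32_t cap_fragment_mtu(uint32_t path_mtu, uint32_t largest_frag_seen)
{
    uint32_t mtu = path_mtu;

    /* Never emit fragments larger than what the sender produced. */
    if (largest_frag_seen != 0 && largest_frag_seen < mtu)
        mtu = largest_frag_seen;

    /* Respect the IPv6 minimum. */
    if (mtu < IPV6_MIN_MTU_BYTES)
        mtu = IPV6_MIN_MTU_BYTES;

    return mtu;
}
```

In the scenario from the commit message, R1 forwards over a 1400-byte link after reassembling 1280-byte fragments; cap_fragment_mtu(1400, 1280) yields 1280, so R2 no longer sees oversized fragments.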
[PATCH] net:wireless - Change 1 to true for bool type variable.
The variable translate is bool type. So assigning true instead of 1.

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com
---
 drivers/net/wireless/ray_cs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ray_cs.c b/drivers/net/wireless/ray_cs.c
index 477f863..0881ba8 100644
--- a/drivers/net/wireless/ray_cs.c
+++ b/drivers/net/wireless/ray_cs.c
@@ -143,7 +143,7 @@ static int psm;
 static char *essid;
 /* Default to encapsulation unless translation requested */
-static bool translate = 1;
+static bool translate = true;
 static int country = USA;
--
1.7.9.5
Re: [PATCH] net:wireless - Change 1 to true for bool type variable.
On Mon, 2015-05-25 at 23:25 +0530, Shailendra Verma wrote: The variable translate is bool type. So assigning true instead of 1.

There are a lot of these in the kernel.

$ git grep -P '^[ \t]*(?:static[ \t]+)?(?:const\s+)?bool\s+\w+\s*=\s*[01]\s*;' * | wc -l
161

Are you going to submit patches for all of them one at a time?

I suggest sending maybe 10 to 12 patches, one for each subsystem, and cc the trivial maintainer Jiri Kosina triv...@kernel.org
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 12:05 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 14:49 -0400, John A. Sullivan III wrote:
On Mon, 2015-05-25 at 09:58 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 11:42 -0400, John A. Sullivan III wrote:

Hello, all. I hope this is the correct list for this question. We are having serious problems on high BDP networks using GRE tunnels. Our traces show it to be a TCP Window problem. When we test without GRE, throughput is wire speed and traces show the window size to be 16MB which is what we configured for r/wmem_max and tcp_r/wmem. When we switch to GRE, we see over a 90% drop in throughput and the TCP window size seems to peak at around 500K. What causes this and how can we get the GRE tunnels to use the max window size? Thanks - John

Hi John

Is it for a single flow or multiple ones ? Which kernel versions on sender and receiver ? What is the nominal speed of non GRE traffic ? What is the brand/model of receiving NIC ? Is GRO enabled ?

It is possible receiver window is impacted because of GRE encapsulation making skb->len/skb->truesize ratio a bit smaller, but not by 90%. I suspect some more trivial issues, like receiver overwhelmed by the extra load of GRE encapsulation.
1) Non GRE session lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H lpaa24 -Cc -t OMNI OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200 tcpi_rtt 70 tcpi_rttvar 7 tcpi_snd_ssthresh 221 tpci_snd_cwnd 258 tcpi_reordering 3 tcpi_total_retrans 711 Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPUCPUService Service Demand SizeSizeSize (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1912320 6291456 16384 10.00 22386.89 10^6bits/s 1.20 S 2.60 S 0.211 0.456 usec/KB 2) GRE session lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 7.7.7.24 -Cc -t OMNI OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.24 () port 0 AF_INET tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200 tcpi_rtt 76 tcpi_rttvar 7 tcpi_snd_ssthresh 176 tpci_snd_cwnd 249 tcpi_reordering 3 tcpi_total_retrans 819 Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPUCPUService Service Demand SizeSizeSize (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1815552 6291456 16384 10.00 22420.88 10^6bits/s 1.01 S 3.44 S 0.177 0.603 usec/KB Thanks, Eric. It really looks like a windowing issue but here is the relevant information: We are measuring single flows. One side is an Intel GbE NIC connected to a 1 Gbps CIR Internet connection. The other side is an Intel 10 GbE NIC connected to a 40 Gbps Internet connection. RTT is ~=80ms The numbers I will post below are from a duplicated setup in our test lab where the systems are connected by GbE links with a netem router in the middle to introduce the latency. We are not varying the latency to ensure we eliminate packet re-ordering from the mix. We are measuring a single flow. 
Here are the non-GRE numbers: root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.224.2 666.3125 MB / 10.00 sec = 558.9370 Mbps 0 retrans 1122.2500 MB / 10.00 sec = 941.4151 Mbps 0 retrans 720.8750 MB / 10.00 sec = 604.7129 Mbps 0 retrans 1122.3125 MB / 10.00 sec = 941.4622 Mbps 0 retrans 1122.2500 MB / 10.00 sec = 941.4101 Mbps 0 retrans 1122.3125 MB / 10.00 sec = 941.4668 Mbps 0 retrans 5888.5000 MB / 60.19 sec = 820.6857 Mbps 4 %TX 13 %RX 0 retrans 80.28 msRTT For some reason, nuttcp does not show retransmissions in our environment even when they do exist. gro is active on the send side: root@gwhq-1:~# ethtool -k eth0 Features for eth0: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: on tx-checksum-unneeded: off [fixed] tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on tx-checksum-fcoe-crc: off [fixed] tx-checksum-sctp: on scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off
Re: [PATCH] net:xen-netback - Change 1 to true for bool type variable.
From: Shailendra Verma shailendra.capric...@gmail.com
Date: Mon, 25 May 2015 23:19:31 +0530

The variable separate_tx_rx_irq is bool type so assigning true instead of 1.

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com

Applied.
Re: [PATCH] atm:he - Change 1 to true for bool type variable.
From: Shailendra Verma shailendra.capric...@gmail.com
Date: Tue, 26 May 2015 01:17:23 +0530

The variable irq_coalesce is bool type. So assign the value true instead of 1.

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com

Applied.
Re: [PATCH] atm:he - Do not initialise statics to 0.
From: Shailendra Verma shailendra.capric...@gmail.com
Date: Tue, 26 May 2015 01:23:53 +0530

Static variables are initialised to 0 by GCC. Fixes the following checkpatch error:

ERROR: do not initialise statics to 0 or NULL
FILE: drivers/atm/he.c:120:
static bool sdh = 0;

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com

true is not necessarily '1' and false is not necessarily '0', therefore the correct fix would be to assign 'false' to this variable.

Furthermore you've submitted this in such a way that it cannot be applied alongside the other atm:he patch you submitted. This is why, when you submit multiple patches to the same file, you must group them and submit them relative to each other and in a specific order.
Re: [PATCH] net/ibm/emac: fix size of emac dump memory areas
From: Ivan Mikhaylov i...@ru.ibm.com
Date: Thu, 21 May 2015 19:11:02 +0400

Fix the send of the emac regs dump to ethtool, which was causing wrong data interpretation at the ethtool layer for MII and EMAC.

Signed-off-by: Ivan Mikhaylov i...@ru.ibm.com

Applied, thanks.
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 15:21 -0400, John A. Sullivan III wrote: Thanks, Eric. I really appreciate the help. This is a problem holding up a very high profile, major project and, for the life of me, I can't figure out why my TCP window size is reduced inside the GRE tunnel. Here is the netem setup although we are using this merely to reproduce what we are seeing in production. We see the same results bare metal to bare metal across the Internet. qdisc prio 10: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 32578077286 bytes 56349187 pkt (dropped 15361, overlimits 0 requeues 61323) backlog 0b 1p requeues 61323 qdisc netem 101: parent 10:1 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc netem 102: parent 10:2 limit 1000 delay 40.0ms Sent 32434562015 bytes 54180984 pkt (dropped 15361, overlimits 0 requeues 0) backlog 0b 1p requeues 0 qdisc netem 103: parent 10:3 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 root@router-001:~# tc -s qdisc show dev eth2 qdisc prio 2: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 296515482689 bytes 217794609 pkt (dropped 11719, overlimits 0 requeues 5307) backlog 0b 2p requeues 5307 qdisc netem 21: parent 2:1 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc netem 22: parent 2:2 limit 1000 delay 40.0ms Sent 289364020190 bytes 212892539 pkt (dropped 11719, overlimits 0 requeues 0) backlog 0b 2p requeues 0 qdisc netem 23: parent 2:3 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 I'm not sure how helpful these stats are as we did set this router up for packet loss at one point. We did suspect netem at some point and did things like change the limit but that had no effect. 80 ms at 1Gbps - you need to hold about packets in your netem qdisc, not 1000. tc qdisc ... 
netem ... limit 8000 ... (I see you added 40ms both ways, so you need packets in forward, and 1666 packets for the ACK packets) I tried a netem 80ms here and got following with default settings (no change in send/receive windows) lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.7.8.152 -Cc -t OMNI -l 20 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.7.8.152 () port 0 AF_INET tcpi_rto 281000 tcpi_ato 0 tcpi_pmtu 1476 tcpi_rcv_ssthresh 28720 tcpi_rtt 80431 tcpi_rttvar 304 tcpi_snd_ssthresh 2147483647 tpci_snd_cwnd 2215 tcpi_reordering 3 tcpi_total_retrans 0 Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPUCPU CPUService Service Demand SizeSizeSize (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 4194304 6291456 16384 20.17 149.54 10^6bits/s 0.40 S 0.78 S 10.467 20.554 usec/KB Now with 16MB I got : -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
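The sizing argument in the message above (the netem limit has to cover the packets held in the qdisc during the emulated delay) is a bandwidth-delay calculation. A minimal sketch in C follows; the function name packets_in_flight and the 1500-byte packet size used in the note below are assumptions for illustration, and the sketch does not attempt to restore the figures missing from the quoted text.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bandwidth-delay product expressed in packets.
 * rate_bits_per_sec: link rate.
 * delay_ms:          one-way delay added by the qdisc, in milliseconds.
 * packet_bytes:      assumed size of each queued packet.
 * Returns the number of whole packets held during the delay.
 */
uint64_t packets_in_flight(uint64_t rate_bits_per_sec,
                           uint64_t delay_ms,
                           uint64_t packet_bytes)
{
    /* Bits that arrive while one packet is being delayed. */
    uint64_t bits = rate_bits_per_sec * delay_ms / 1000;

    /* Convert to bytes, then to whole packets. */
    return bits / 8 / packet_bytes;
}
```

With 1 Gbit/s, a 40 ms delay and 1500-byte packets this gives 3333 packets, well above the limit 1000 shown in the posted netem configuration.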
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 13:41 -0700, Eric Dumazet wrote:

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.7.8.152 -Cc -t OMNI -l 20
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.7.8.152 () port 0 AF_INET
tcpi_rto 281000 tcpi_ato 0 tcpi_pmtu 1476 tcpi_rcv_ssthresh 28720
tcpi_rtt 80431 tcpi_rttvar 304 tcpi_snd_ssthresh 2147483647 tpci_snd_cwnd 2215
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units      CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                         Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                            %     Method %      Method
4194304     6291456     16384  20.17   149.54     10^6bits/s 0.40  S      0.78   S      10.467  20.554  usec/KB

Now with 16MB I got :

Sorry message was sent too soon :

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.7.8.152 -Cc -t OMNI -l 20
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.7.8.152 () port 0 AF_INET
tcpi_rto 281000 tcpi_ato 0 tcpi_pmtu 1476 tcpi_rcv_ssthresh 28720
tcpi_rtt 80438 tcpi_rttvar 25 tcpi_snd_ssthresh 2147483647 tpci_snd_cwnd 5895
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units      CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                         Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                            %     Method %      Method
16777216    16777216    16384  20.31   399.75     10^6bits/s 0.55  S      0.65   S      5.375   6.416   usec/KB
RE: [E1000-devel] [stable] e1000e fixes
Thanks, I will look into it shortly. I'm in the process of bringing the upstream driver up to date.

-----Original Message-----
From: Ben Hutchings [mailto:b...@decadent.org.uk]
Sent: Saturday, May 23, 2015 20:09
To: e1000-de...@lists.sourceforge.net
Cc: netdev; stable
Subject: [E1000-devel] [stable] e1000e fixes

I was looking through recent changes in e1000e, and thought these might be suitable for inclusion in stable updates:

commit 493004d04f56fd7d642bdbb2938e17e5f7d622d1
Author: David Ertman david.m.ert...@intel.com
Date: Fri Jul 4 01:44:32 2014 +
e1000e: Fix CRC errors with jumbo traffic

commit 47ccd1edc57ddabb81f6ba07e1e30201a8f578d6
Author: Vlad Yasevich vyasev...@gmail.com
Date: Mon Aug 25 10:34:48 2014 -0400
e1000e: Fix TSO with non-accelerated vlans

commit 6930895df994af212985396f12747125bc26
Author: Mathias Koehrer mathias.koeh...@etas.com
Date: Thu Aug 7 18:51:53 2014 +
e1000e: Fix 82572EI that has no hardware timestamp support

This is purely based on the commit messages, not on bug reports or my own experience of the bugs. I leave it to the e1000e maintainers to judge whether they're important enough.

Ben.

--
Ben Hutchings
If more than one person is responsible for a bug, no one is at fault.
Re: [PATCH] net:wireless - Change 1 to true for bool type variable.
On 05/25/2015 12:55 PM, Shailendra Verma wrote: The variable translate is bool type. So assigning true instead of 1.

Signed-off-by: Shailendra Verma shailendra.capric...@gmail.com

When you submit a patch for a particular driver in the drivers/net/wireless/ tree, it is preferred that the subject start with the driver name, not with net:wireless. Accordingly, your subject should be "ray_cs: Change 1 to true for bool type variable." In fact, you can probably choose a better wording for the rest of the title, but I do not insist. Among other advantages, this choice of subject allows easy searches for patches for a given driver using 'git log'.

Larry

---
 drivers/net/wireless/ray_cs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ray_cs.c b/drivers/net/wireless/ray_cs.c
index 477f863..0881ba8 100644
--- a/drivers/net/wireless/ray_cs.c
+++ b/drivers/net/wireless/ray_cs.c
@@ -143,7 +143,7 @@ static int psm;
 static char *essid;
 /* Default to encapsulation unless translation requested */
-static bool translate = 1;
+static bool translate = true;
 static int country = USA;
TX abort errors on GRE tunnels
Hello, all. We are seeing TX abort errors on all of our GRE tunnels. What would cause such a thing? We suspected it might be MTU because these are encapsulated in IPSec transport and the MTU on the GRE tunnel is still 1476 so we reduced the MTU but that did not eliminate the problem. By the way, I submitted a separate thread about TCP windowing problems inside GRE tunnels but those test tunnels are not generating the TX abort errors. Thanks - John
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 09:58 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 11:42 -0400, John A. Sullivan III wrote:

Hello, all. I hope this is the correct list for this question. We are having serious problems on high BDP networks using GRE tunnels. Our traces show it to be a TCP Window problem. When we test without GRE, throughput is wire speed and traces show the window size to be 16MB which is what we configured for r/wmem_max and tcp_r/wmem. When we switch to GRE, we see over a 90% drop in throughput and the TCP window size seems to peak at around 500K. What causes this and how can we get the GRE tunnels to use the max window size? Thanks - John

Hi John

Is it for a single flow or multiple ones ? Which kernel versions on sender and receiver ? What is the nominal speed of non GRE traffic ? What is the brand/model of receiving NIC ? Is GRO enabled ?

It is possible receiver window is impacted because of GRE encapsulation making skb->len/skb->truesize ratio a bit smaller, but not by 90%. I suspect some more trivial issues, like receiver overwhelmed by the extra load of GRE encapsulation.
1) Non GRE session lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H lpaa24 -Cc -t OMNI OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200 tcpi_rtt 70 tcpi_rttvar 7 tcpi_snd_ssthresh 221 tpci_snd_cwnd 258 tcpi_reordering 3 tcpi_total_retrans 711 Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPUCPUService Service Demand SizeSizeSize (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1912320 6291456 16384 10.00 22386.89 10^6bits/s 1.20 S 2.60 S 0.211 0.456 usec/KB 2) GRE session lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 7.7.7.24 -Cc -t OMNI OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.24 () port 0 AF_INET tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200 tcpi_rtt 76 tcpi_rttvar 7 tcpi_snd_ssthresh 176 tpci_snd_cwnd 249 tcpi_reordering 3 tcpi_total_retrans 819 Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPUCPUService Service Demand SizeSizeSize (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1815552 6291456 16384 10.00 22420.88 10^6bits/s 1.01 S 3.44 S 0.177 0.603 usec/KB Thanks, Eric. It really looks like a windowing issue but here is the relevant information: We are measuring single flows. One side is an Intel GbE NIC connected to a 1 Gbps CIR Internet connection. The other side is an Intel 10 GbE NIC connected to a 40 Gbps Internet connection. RTT is ~=80ms The numbers I will post below are from a duplicated setup in our test lab where the systems are connected by GbE links with a netem router in the middle to introduce the latency. We are not varying the latency to ensure we eliminate packet re-ordering from the mix. We are measuring a single flow. 
Here are the non-GRE numbers: root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.224.2 666.3125 MB / 10.00 sec = 558.9370 Mbps 0 retrans 1122.2500 MB / 10.00 sec = 941.4151 Mbps 0 retrans 720.8750 MB / 10.00 sec = 604.7129 Mbps 0 retrans 1122.3125 MB / 10.00 sec = 941.4622 Mbps 0 retrans 1122.2500 MB / 10.00 sec = 941.4101 Mbps 0 retrans 1122.3125 MB / 10.00 sec = 941.4668 Mbps 0 retrans 5888.5000 MB / 60.19 sec = 820.6857 Mbps 4 %TX 13 %RX 0 retrans 80.28 msRTT For some reason, nuttcp does not show retransmissions in our environment even when they do exist. gro is active on the send side: root@gwhq-1:~# ethtool -k eth0 Features for eth0: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: on tx-checksum-unneeded: off [fixed] tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on tx-checksum-fcoe-crc: off [fixed] tx-checksum-sctp: on scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off [fixed] tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: off [fixed] tx-tcp6-segmentation: on udp-fragmentation-offload: off [fixed] generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: off [fixed] rx-vlan-offload: on tx-vlan-offload: on
Re: [PATCH] irda: irda-usb: use msecs_to_jiffies for conversions
From: Nicholas Mc Guire hof...@osadl.org
Date: Sat, 23 May 2015 14:46:30 +0200

API compliance scanning with coccinelle flagged:

Converting milliseconds to jiffies by val * HZ / 1000 technically is not a clean solution as it does not handle all corner cases correctly. By changing the conversion to use msecs_to_jiffies(val) conversion is correct in all cases.

In the current code:

mod_timer(&self->rx_defer_timer, jiffies + (10 * HZ / 1000));

for HZ < 100 (e.g. CONFIG_HZ == 64|32 in alpha) this effectively results in no delay at all.

Patch was compile tested for x86_64_defconfig (implies CONFIG_USB_IRDA=m)

Patch is against 4.1-rc4 (localversion-next is -next-20150522)

Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Applied to net-next, thanks.
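The corner case in this commit message comes from integer truncation. The sketch below is a simplified userspace model in C, not the kernel's msecs_to_jiffies implementation: both function names are hypothetical, hz is passed as a parameter rather than taken from CONFIG_HZ, and overflow handling is left out. It contrasts the open-coded expression with a conversion that rounds up, which is how the kernel helper is understood to behave.

```c
/* Open-coded conversion as in the replaced code: the division truncates. */
unsigned long naive_ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return ms * hz / 1000;
}

/* Rounding up guarantees that a non-zero delay lasts at least one tick. */
unsigned long roundup_ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}
```

For a 10 ms delay at hz = 64, naive_ms_to_ticks returns 0 (no delay at all), while roundup_ms_to_ticks returns 1. At hz = 1000 both return 10.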
Re: [PATCH] net: stmmac: create one debugfs dir per net-device
From: Mathieu Olivari math...@codeaurora.org
Date: Fri, 22 May 2015 19:03:29 -0700

stmmac DebugFS entries are currently global to the driver. As a result, having more than one stmmac device in the system creates the following error:

* ERROR stmmaceth, debugfs create directory failed
* stmmac_hw_setup: failed debugFS registration

This also results in being able to access the debugfs information for the first registered device only.

This patch changes the debugfs structure to have one sub-directory per net-device. Files under /sys/kernel/debug/stmmaceth will now show-up under /sys/kernel/debug/stmmaceth/ethN/.

Signed-off-by: Mathieu Olivari math...@codeaurora.org

Applied, thank you.
Re: [PATCH V2] irda: use msecs_to_jiffies for conversion to jiffies
From: Nicholas Mc Guire hof...@osadl.org
Date: Mon, 25 May 2015 08:16:50 +0200

API compliance scanning with coccinelle flagged:
./net/irda/timer.c:63:35-37: use of msecs_to_jiffies probably perferable

Converting milliseconds to jiffies by val * HZ / 1000 technically is not a clean solution as it does not handle all corner cases correctly. By changing the conversion to use msecs_to_jiffies(val) conversion is correct in all cases. Further the () around the arithmetic expression was dropped.

Patch was compile tested for x86_64_defconfig + CONFIG_IRDA=m

Patch is against 4.1-rc4 (localversion-next is -next-20150522)

Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Applied, thank you.
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 16:19 -0700, Eric Dumazet wrote: On Mon, 2015-05-25 at 18:44 -0400, John A. Sullivan III wrote: On Mon, 2015-05-25 at 15:38 -0700, Eric Dumazet wrote: On Mon, 2015-05-25 at 18:22 -0400, John A. Sullivan III wrote: 2) Why do we still not negotiate the 16MB buffer that we get when we are not using GRE? What exact NIC handles receive side ? If drivers allocate a full 4KB page to hold each frame, plus sk_buff overhead, then 32MB of kernel memory translates to 8MB of TCP window space. Hi, Eric. I'm not sure I understand the question or how to obtain the information you've requested. The receive side system has 48GB of RAM but that does not sound like what you are requesting. I suspect the behavior is a protection mechanism, i.e., it is being calculated for good reason. When I set the buffer to 16MB manually in nuttcp, performance degraded so I assume I was overrunning something. I am still downloading the traces. But I'm still mystified by why this only affects GRE traffic. Thanks - GRE is quite expensive, some extra cpu load is needed. On receiver, can you please check what exact driver is loaded ? Is it igb, ixgbe, e1000e, i40e ? ethtool -i eth0 GRE has extra 28 bytes of encapsulation, this definitely can make skb a little bit fat. TCP has very simple heuristics (using power of two steps) and a 50% factor can be explained by this extra 28 bytes for some particular driver. You could emulate this at the sender (without GRE) by reducing the mtu for the route to your target. 
ip route add 192.x.y.z via gateway mtu 1450

The receiver as well as the gateway is using igb:

root@vserveringestst-01:~# ethtool -i eth0
driver: igb
version: 3.2.10-k
firmware-version: 1.4-3
bus-info: :01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

Changing the MTU does not show the same degradation as GRE:

root@gwhq-1:~# ip route add 192.168.224.2 via 192.168.128.1 mtu 1476
root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.224.2
connect failed: Connection timed out
interval option only supported for client/server mode
root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.224.2
 644.6875 MB / 10.00 sec = 540.7944 Mbps 0 retrans
1121.1875 MB / 10.00 sec = 940.5201 Mbps 0 retrans
1121.2500 MB / 10.00 sec = 940.5744 Mbps 0 retrans
1121.1250 MB / 10.00 sec = 940.4777 Mbps 0 retrans
1121.2500 MB / 10.00 sec = 940.5757 Mbps 0 retrans
1028.8750 MB / 10.00 sec = 863.0736 Mbps 0 retrans
6171.9375 MB / 60.70 sec = 852.9101 Mbps 5 %TX 12 %RX 0 retrans 80.27 msRTT

CPU does not seem to be an issue from what I can see. The systems are all sitting at 98% idle and even checking individual CPUs shows no overload. Thanks - John
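The remark quoted earlier in this message, that 32MB of kernel memory can translate to 8MB of TCP window space, follows from charging each received frame its full buffer footprint rather than its payload. The C sketch below is a rough model of that accounting, not the kernel's actual window heuristics: the function name usable_window_bytes is hypothetical, and the per-packet sizes in the note are illustrative values chosen to reproduce the 4:1 ratio, not figures measured on igb.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate how much TCP payload fits in a receive
 * buffer when every packet is charged truesize_bytes of kernel memory
 * but carries only payload_bytes of data.
 */
uint64_t usable_window_bytes(uint64_t rcvbuf_bytes,
                             uint64_t payload_bytes,
                             uint64_t truesize_bytes)
{
    /* Whole packets that fit in the buffer budget. */
    uint64_t packets = rcvbuf_bytes / truesize_bytes;

    /* Only the payload part of each packet counts toward the window. */
    return packets * payload_bytes;
}
```

For example, a 32 MiB buffer with an assumed 4096 bytes charged per packet and 1024 bytes of payload per packet leaves 8 MiB of window. A few extra bytes of encapsulation can push a frame into the next allocation size for some drivers, which is the effect being probed with the reduced-MTU route.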
Re: [PATCH net-next] ipv6: ipv6_select_ident() returns a __be32
On Mon, May 25, 2015 at 04:02:21PM -0700, Eric Dumazet wrote:
From: Eric Dumazet eduma...@google.com

ipv6_select_ident() returns a 32bit value in network order.

Fixes: 286c2349f666 (ipv6: Clean up ipv6_select_ident() and ip6_fragment())
Signed-off-by: Eric Dumazet eduma...@google.com
Reported-by: kbuild test robot fengguang...@intel.com

Thanks for fixing it.

Acked-by: Martin KaFai Lau ka...@fb.com
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 13:41 -0700, Eric Dumazet wrote: On Mon, 2015-05-25 at 15:21 -0400, John A. Sullivan III wrote: Thanks, Eric. I really appreciate the help. This is a problem holding up a very high profile, major project and, for the life of me, I can't figure out why my TCP window size is reduced inside the GRE tunnel. Here is the netem setup although we are using this merely to reproduce what we are seeing in production. We see the same results bare metal to bare metal across the Internet. qdisc prio 10: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 32578077286 bytes 56349187 pkt (dropped 15361, overlimits 0 requeues 61323) backlog 0b 1p requeues 61323 qdisc netem 101: parent 10:1 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc netem 102: parent 10:2 limit 1000 delay 40.0ms Sent 32434562015 bytes 54180984 pkt (dropped 15361, overlimits 0 requeues 0) backlog 0b 1p requeues 0 qdisc netem 103: parent 10:3 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 root@router-001:~# tc -s qdisc show dev eth2 qdisc prio 2: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 296515482689 bytes 217794609 pkt (dropped 11719, overlimits 0 requeues 5307) backlog 0b 2p requeues 5307 qdisc netem 21: parent 2:1 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc netem 22: parent 2:2 limit 1000 delay 40.0ms Sent 289364020190 bytes 212892539 pkt (dropped 11719, overlimits 0 requeues 0) backlog 0b 2p requeues 0 qdisc netem 23: parent 2:3 limit 1000 delay 40.0ms Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 I'm not sure how helpful these stats are as we did set this router up for packet loss at one point. We did suspect netem at some point and did things like change the limit but that had no effect. 
80 ms at 1Gbps - you need to hold about packets in your netem qdisc, not 1000. tc qdisc ... netem ... limit 8000 ... (I see you added 40ms both ways, so you need packets in forward, and 1666 packets for the ACK packets) I tried a netem 80ms here and got following with default settings (no change in send/receive windows) lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.7.8.152 -Cc -t OMNI -l 20 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.7.8.152 () port 0 AF_INET tcpi_rto 281000 tcpi_ato 0 tcpi_pmtu 1476 tcpi_rcv_ssthresh 28720 tcpi_rtt 80431 tcpi_rttvar 304 tcpi_snd_ssthresh 2147483647 tpci_snd_cwnd 2215 tcpi_reordering 3 tcpi_total_retrans 0 Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPUCPUService Service Demand SizeSizeSize (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 4194304 6291456 16384 20.17 149.54 10^6bits/s 0.40 S 0.78 S 10.467 20.554 usec/KB Now with 16MB I got : Hmm . . . 
I did:

tc qdisc replace dev eth0 parent 10:1 handle 101: netem delay 40ms limit 8000
tc qdisc replace dev eth0 parent 10:2 handle 102: netem delay 40ms limit 8000
tc qdisc replace dev eth0 parent 10:3 handle 103: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:1 handle 21: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:2 handle 22: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:3 handle 23: netem delay 40ms limit 8000

The gateway to gateway performance was still abysmal:

root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.126.1
   19.8750 MB /  10.00 sec =   16.6722 Mbps     0 retrans
   23.2500 MB /  10.00 sec =   19.5035 Mbps     0 retrans
   23.3125 MB /  10.00 sec =   19.5559 Mbps     0 retrans
   23.3750 MB /  10.00 sec =   19.6084 Mbps     0 retrans
   23.2500 MB /  10.00 sec =   19.5035 Mbps     0 retrans
   23.3125 MB /  10.00 sec =   19.5560 Mbps     0 retrans
  136.4375 MB /  60.13 sec =   19.0353 Mbps 0 %TX 0 %RX 0 retrans 80.25 msRTT

But the end to end was near wire speed!:

rita@vserver-002:~$ nuttcp -T 60 -i 10 192.168.8.20
  518.9375 MB /  10.00 sec =  435.3154 Mbps     0 retrans
  979.6875 MB /  10.00 sec =  821.8186 Mbps     0 retrans
  979.2500 MB /  10.00 sec =  821.4541 Mbps     0 retrans
  979.7500 MB /  10.00 sec =  821.8782 Mbps     0 retrans
  979.7500 MB /  10.00 sec =  821.8735 Mbps     0 retrans
  979.8750 MB /  10.00 sec =  821.9784 Mbps     0 retrans
 5419.8750 MB /  60.11 sec =  756.3881 Mbps 7 %TX 10 %RX 0 retrans 80.58 msRTT

I'm still downloading the trace to see what the window size
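The netem queue sizing discussed in this thread follows from the bandwidth-delay product. Below is a minimal sketch of that arithmetic. The 1500-byte packet size and the ratio of one ACK per two data packets are assumptions, not figures stated in the thread, and the function name is purely illustrative.

```python
def packets_in_flight(rate_bps: int, delay_ms: int, packet_bytes: int = 1500) -> int:
    """Whole packets a link carries during delay_ms at rate_bps."""
    bits_in_flight = rate_bps * delay_ms // 1000
    return bits_in_flight // (packet_bytes * 8)


GIGABIT = 1_000_000_000

# One netem instance delays each direction by 40 ms.
forward = packets_in_flight(GIGABIT, 40)
# Assumed: the receiver sends one ACK for every two data packets.
acks = forward // 2
# Full 80 ms round trip, if a single queue had to hold everything.
round_trip = packets_in_flight(GIGABIT, 80)

print(forward, acks, round_trip)
```

Under these assumptions the ACK count comes out at 1666, the same figure quoted for ACK packets above, and a netem limit of 1000 holds less than a third of what a 40 ms delay line needs at 1 Gbps.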
Re: [PATCH net-next v5 2/2] net: Adding support for Cavium ThunderX network controller
From: Aleksey Makarov aleksey.maka...@caviumnetworks.com
Date: Fri, 22 May 2015 18:28:16 -0700

+#ifdef NOT_SUPPORTED_FOR_NOW
+static int nicvf_set_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *cmd)
+{
...
+#ifdef NOT_SUPPORTED_FOR_NOW
+	.set_coalesce = nicvf_set_coalesce,
+#endif
+	.get_ringparam = nicvf_get_ringparam,

Remove this completely. When you actually make it work and want to enable it, you can re-add the code. The upstream kernel is not your personal workspace :-)

+#ifdef VNIC_RSS_SUPPORT

This is unconditionally defined in one of your headers, please remove these ifdefs as they will never be false.

+#ifdef VNIC_RSS_SUPPORT

Likewise.
Re: [PATCHv3 net-next] bridge: allow setting hash_max + multicast_router if interface is down
From: Linus Lüssing linus.luess...@c0d3.blue
Date: Sat, 23 May 2015 03:12:34 +0200

Network managers like netifd (used in OpenWRT for instance) try to configure interface options after creation but before setting the interface up. Unfortunately the sysfs / bridge currently only allows configuring the hash_max and multicast_router options when the bridge interface is up. But since br_multicast_init() doesn't start any timers and only sets default values and initializes timers it should be safe to reconfigure the default values after that, before things actually get active after the bridge is set up.

Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue

Applied to net-next, thanks.
Re: [PATCH v3 net-next 0/2] rocker: unused parameter and const cleanups
From: Simon Horman simon.hor...@netronome.com
Date: Mon, 25 May 2015 14:28:34 +0900

This series provides some minor though verbose cleanup of rocker. The second patch depends on the first though it could be rebased.

I had previously asked for v2 to be put on hold while some bugs I had found in the rocker driver were shaken out. That has now happened and the bugs turned out to be unrelated. Accordingly I am reposting the series.

* Changes v2 -> v3
  - Rebase and update for new variables and parameters that may be const

* Changes v1 -> v2
  - Found quite a few more variables and parameters to make const

Series applied, thanks Simon.
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 19:35 -0400, John A. Sullivan III wrote:
On Mon, 2015-05-25 at 16:19 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 18:44 -0400, John A. Sullivan III wrote:
On Mon, 2015-05-25 at 15:38 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 18:22 -0400, John A. Sullivan III wrote:

2) Why do we still not negotiate the 16MB buffer that we get when we are not using GRE?

What exact NIC handles receive side ?

If drivers allocate a full 4KB page to hold each frame, plus sk_buff overhead, then 32MB of kernel memory translates to 8MB of TCP window space.

Hi, Eric. I'm not sure I understand the question or how to obtain the information you've requested. The receive side system has 48GB of RAM but that does not sound like what you are requesting. I suspect the behavior is a protection mechanism, i.e., it is being calculated for good reason. When I set the buffer to 16MB manually in nuttcp, performance degraded so I assume I was overrunning something. I am still downloading the traces. But I'm still mystified by why this only affects GRE traffic. Thanks -

GRE is quite expensive, some extra cpu load is needed.

On receiver, can you please check what exact driver is loaded ? Is it igb, ixgbe, e1000e, i40e ?

ethtool -i eth0

GRE has extra 28 bytes of encapsulation, this definitely can make skb a little bit fat. TCP has very simple heuristics (using power of two steps) and a 50% factor can be explained by this extra 28 bytes for some particular driver.

You could emulate this at the sender (without GRE) by reducing the mtu for the route to your target.

ip route add 192.x.y.z via gateway mtu 1450

The receiver as well as the gateway is using igb:

root@vserveringestst-01:~# ethtool -i eth0
driver: igb
version: 3.2.10-k
firmware-version: 1.4-3
bus-info: :01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

Changing the MTU does not show the same degradation as GRE:

Then it is very possible igb was not able to dissect GRE packets, and driver skb allocation enters a 'slow path'

You might try a more recent version of linux kernel at receiver. igb current version is 5.2.15-k
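The remark about a full 4KB page per frame can be illustrated with a rough calculation. This is only a sketch and not the kernel's actual algorithm: the 1448-byte payload and the 256-byte sk_buff overhead are assumed values, and rounding down to a power of two merely stands in for the "power of two steps" heuristic mentioned in the thread.

```python
def usable_window(buffer_bytes: int, payload_bytes: int, truesize_bytes: int) -> int:
    """Estimate the TCP window that buffer_bytes of kernel memory can back."""
    # Only payload_bytes out of every truesize_bytes charged to the socket
    # are real TCP data; the rest is page slack and bookkeeping.
    raw = buffer_bytes * payload_bytes // truesize_bytes
    if raw <= 0:
        return 0
    # Round down to a power of two (assumed stand-in for the kernel heuristic).
    return 1 << (raw.bit_length() - 1)


MB = 1024 * 1024
PAGE = 4096          # one full page allocated per received frame
SKB_OVERHEAD = 256   # hypothetical sk_buff bookkeeping cost

window = usable_window(32 * MB, 1448, PAGE + SKB_OVERHEAD)
print(window // MB)
```

With these assumed numbers, 32MB of receive buffer backs roughly 10.6MB of payload, which rounds down to 8MB, in line with the estimate given in the thread.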
[net-next:master 741/754] net/ipv6/ip6_output.c:587:17: sparse: incorrect type in assignment (different base types)
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head: 376cd36dc7b68ec7f7de1428fa055ce706a33bbf
commit: 286c2349f6665c3e67f464a5faa14a0e28be4842 [741/754] ipv6: Clean up ipv6_select_ident() and ip6_fragment()
reproduce:
  # apt-get install sparse
  git checkout 286c2349f6665c3e67f464a5faa14a0e28be4842
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by )

net/ipv6/ip6_output.c:587:17: sparse: incorrect type in assignment (different base types)
net/ipv6/ip6_output.c:587:17:    expected restricted __be32 [usertype] frag_id
net/ipv6/ip6_output.c:587:17:    got unsigned int
net/ipv6/ip6_output.c:1105:38: sparse: incorrect type in assignment (different base types)
net/ipv6/ip6_output.c:1105:38:    expected restricted __be32 [usertype] ip6_frag_id
net/ipv6/ip6_output.c:1105:38:    got unsigned int
--
net/ipv6/output_core.c:72:16: sparse: incorrect type in return expression (different base types)
net/ipv6/output_core.c:72:16:    expected unsigned int
net/ipv6/output_core.c:72:16:    got restricted __be32 [usertype] noident

vim +587 net/ipv6/ip6_output.c

   571			sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
   572
   573		skb->dev = skb_dst(skb)->dev;
   574		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
   575		IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
   576			      IPSTATS_MIB_FRAGFAILS);
   577		kfree_skb(skb);
   578		return -EMSGSIZE;
   579	}
   580
   581	if (np && np->frag_size < mtu) {
   582		if (np->frag_size)
   583			mtu = np->frag_size;
   584	}
   585	mtu -= hlen + sizeof(struct frag_hdr);
   586
   587	frag_id = ipv6_select_ident(net, rt);
   588
   589	if (skb_has_frag_list(skb)) {
   590		int first_len = skb_pagelen(skb);
   591		struct sk_buff *frag2;
   592
   593		if (first_len - hlen > mtu ||
   594		    ((first_len - hlen) & 7) ||
   595		    skb_cloned(skb))

---
0-DAY kernel test infrastructure                Open Source Technology Center
http://lists.01.org/mailman/listinfo/kbuild                 Intel Corporation
Re: [PATCH net-next 0/4] cpsw cleanups
From: Richard Cochran richardcoch...@gmail.com
Date: Mon, 25 May 2015 11:02:12 +0200

While working on an out-of-tree customization, I noticed a few minor problems in the cpsw code. This series cleans up the issues I found.

Series applied, thanks Richard.
Re: Drops in qdisc on ifb interface
On Mon, 2015-05-25 at 16:05 -0400, John A. Sullivan III wrote:

Hello, all. On one of our connections we are doing intensive traffic shaping with tc. We are using ifb interfaces for shaping ingress traffic and we also use ifb interfaces for egress so that we can apply the same set of rules to multiple interfaces (e.g., tun and eth interfaces operating on the same physical interface).

These are running on very powerful gateways; I have watched them handling 16 Gbps with CPU utilization at a handful of percent. Yet, I am seeing drops on the ifb interfaces when I do a tc -s qdisc show.

Why would this be? I would expect if there was some kind of problem that it would manifest as drops on the physical interfaces and not the IFB interface. We have played with queue lengths in both directions. We are using HFSC with SFQ leaves so I would imagine this overrides the very short qlen on the IFB interfaces (32). These are drops and not overlimits.

IFB is single threaded and a serious bottleneck.

Don't use this on egress, this destroys multiqueue capability.

And SFQ is pretty limited (127 packets).

You might try to change your NIC to have a single queue for RX, so that you have a single cpu feeding your IFB queue.

(ethtool -L eth0 rx 1)
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 15:38 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 18:22 -0400, John A. Sullivan III wrote:

2) Why do we still not negotiate the 16MB buffer that we get when we are not using GRE?

What exact NIC handles receive side ?

If drivers allocate a full 4KB page to hold each frame, plus sk_buff overhead, then 32MB of kernel memory translates to 8MB of TCP window space.

Hi, Eric. I'm not sure I understand the question or how to obtain the information you've requested. The receive side system has 48GB of RAM but that does not sound like what you are requesting.

I suspect the behavior is a protection mechanism, i.e., it is being calculated for good reason. When I set the buffer to 16MB manually in nuttcp, performance degraded so I assume I was overrunning something. I am still downloading the traces.

But I'm still mystified by why this only affects GRE traffic. Thanks - John
[PATCH net-next] ipv6: ipv6_select_ident() returns a __be32
From: Eric Dumazet eduma...@google.com

ipv6_select_ident() returns a 32bit value in network order.

Fixes: 286c2349f666 (ipv6: Clean up ipv6_select_ident() and ip6_fragment())
Signed-off-by: Eric Dumazet eduma...@google.com
Reported-by: kbuild test robot fengguang...@intel.com
---
 include/net/ipv6.h     |    6 +++---
 net/ipv6/output_core.c |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index b950a2000b7f4608647b77dd029c94277a1afd97..35d485c780802cc28dedb4f889f3478be712b9df 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -671,9 +671,9 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
 	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
 }

-u32 ipv6_select_ident(struct net *net,
-		      const struct in6_addr *daddr,
-		      const struct in6_addr *saddr);
+__be32 ipv6_select_ident(struct net *net,
+			 const struct in6_addr *daddr,
+			 const struct in6_addr *saddr);
 void ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb);

 int ip6_dst_hoplimit(struct dst_entry *dst);
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 055e85cb7b6518f20796859e46fa043b186278e7..21678acd452165fae8a2c0b7c007ed1daa407344 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -61,9 +61,9 @@ void ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident);

-u32 ipv6_select_ident(struct net *net,
-		      const struct in6_addr *daddr,
-		      const struct in6_addr *saddr)
+__be32 ipv6_select_ident(struct net *net,
+			 const struct in6_addr *daddr,
+			 const struct in6_addr *saddr)
 {
 	static u32 ip6_idents_hashrnd __read_mostly;
 	u32 id;
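The distinction sparse enforces here is that a value in network byte order is not interchangeable with a host-order integer, even though both occupy 32 bits. A small sketch using Python's standard struct module shows the difference. It is an illustration only and has no connection to the kernel code; the sample identifier is arbitrary.

```python
import struct
import sys

ident = 0x12345678                   # arbitrary sample value

wire = struct.pack("!I", ident)      # network (big-endian) byte order
native = struct.pack("=I", ident)    # host byte order, standard 4-byte size

# Reading the wire bytes as if they were a host-order integer only gives
# back the original value on a big-endian machine.
misread = struct.unpack("=I", wire)[0]

print(wire.hex())
print(sys.byteorder, wire == native, hex(misread))
```

On a little-endian host the two byte strings differ, which is the kind of silent mix-up that annotating the return type as __be32 lets sparse catch at build time.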
[PATCH/RFC net-next] rocker: by default accept untagged packets
This will occur anyway if the 8021q module is loaded as it will call rocker_port_vlan_rx_add_vid for vlan 0. This code is here to handle the case where the 8021q module is not loaded. This patch also handles the case where the 8021q is unloaded removing all VLANs from all ports. This change should not affect bridging, although the rules are harmlessly installed anyway. This is in keeping with the behaviour for VLANs when the 8021q modules is loaded. To aid implementation of the above provide a helper and use it to replace some existing code. Signed-off-by: Simon Horman simon.hor...@netronome.com --- drivers/net/ethernet/rocker/rocker.c | 51 +++- 1 file changed, 39 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c index 36f7edfc3c7a..bc00e0abd8b6 100644 --- a/drivers/net/ethernet/rocker/rocker.c +++ b/drivers/net/ethernet/rocker/rocker.c @@ -3720,6 +3720,19 @@ static int rocker_port_router_mac(struct rocker_port *rocker_port, return err; } +static int rocker_port_vlan_rx_vid(struct rocker_port *rocker_port, + enum switchdev_trans trans, int flag, + u16 vid) +{ + int err; + + err = rocker_port_vlan(rocker_port, trans, flag, vid); + if (err) + return err; + + return rocker_port_router_mac(rocker_port, trans, flag, htons(vid)); +} + static int rocker_port_fwding(struct rocker_port *rocker_port, enum switchdev_trans trans) { @@ -4009,6 +4022,16 @@ static int rocker_port_open(struct net_device *dev) goto err_request_rx_irq; } + /* By default accept untagged vlan packets. +* +* This will occur anyway if the 8021q module is loaded as it will +* call rocker_port_vlan_rx_add_vid for vlan 0. This code is here +* to handle the case where the 8021q module is not loaded. 
+*/ + err = rocker_port_vlan_rx_vid(rocker_port, SWITCHDEV_TRANS_NONE, 0, 0); + if (err) + goto err_fwd_enable; + err = rocker_port_fwd_enable(rocker_port, SWITCHDEV_TRANS_NONE); if (err) goto err_fwd_enable; @@ -4187,29 +4210,33 @@ static int rocker_port_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid) { struct rocker_port *rocker_port = netdev_priv(dev); - int err; - - err = rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, 0, vid); - if (err) - return err; - return rocker_port_router_mac(rocker_port, SWITCHDEV_TRANS_NONE, - 0, htons(vid)); + return rocker_port_vlan_rx_vid(rocker_port, SWITCHDEV_TRANS_NONE, + 0, vid); } static int rocker_port_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid) { struct rocker_port *rocker_port = netdev_priv(dev); - int err; + int err, i; - err = rocker_port_router_mac(rocker_port, SWITCHDEV_TRANS_NONE, -ROCKER_OP_FLAG_REMOVE, htons(vid)); + err = rocker_port_vlan_rx_vid(rocker_port, SWITCHDEV_TRANS_NONE, + ROCKER_OP_FLAG_REMOVE, vid); if (err) return err; - return rocker_port_vlan(rocker_port, SWITCHDEV_TRANS_NONE, - ROCKER_OP_FLAG_REMOVE, vid); + /* If no vlans are set then the last one has been removed; +* restore the default behaviour of accepting untagged packets. +* +* This may occur if the 8021q module is unloaded. +*/ + for (i = 0; i ROCKER_VLAN_BITMAP_LEN; i++) + if (rocker_port-vlan_bitmap[i]) + return 0; + return rocker_port_vlan_rx_vid(rocker_port, SWITCHDEV_TRANS_NONE, + 0, 0); + } static int rocker_port_get_phys_port_name(struct net_device *dev, -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: netxen: correct sysfs bin attribute return code
If read() syscall requests unexpected number of bytes from dimm binary attribute file, return EINVAL instead of EPERM.

At the same time pin down sysfs file size to the fixed sizeof(struct netxen_dimm_cfg), which allows to exploit some missing sanity checks from kernfs (file boundary checks vs offset etc.)

Signed-off-by: Vladimir Zapolskiy v...@mleia.com
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index e0c31e3..6409a06 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -3025,9 +3025,9 @@ netxen_sysfs_read_dimm(struct file *filp, struct kobject *kobj,
 	u8 dw, rows, cols, banks, ranks;
 	u32 val;

-	if (size != sizeof(struct netxen_dimm_cfg)) {
+	if (size < attr->size) {
 		netdev_err(netdev, "Invalid size\n");
-		return -1;
+		return -EINVAL;
 	}

 	memset(&dimm, 0, sizeof(struct netxen_dimm_cfg));
@@ -3137,7 +3137,7 @@ out:
 static struct bin_attribute bin_attr_dimm = {
 	.attr = { .name = "dimm", .mode = (S_IRUGO | S_IWUSR) },
-	.size = 0,
+	.size = sizeof(struct netxen_dimm_cfg),
 	.read = netxen_sysfs_read_dimm,
 };

--
2.1.4
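The reason a bare -1 is misleading is that a negative return value from such a handler reaches userspace as an errno, so -1 reads as EPERM. A quick way to check the mapping is Python's standard errno module. This is a sketch, the helper name is illustrative, and the message strings printed depend on the platform's C library.

```python
import errno
import os


def errno_name(ret: int) -> str:
    """Symbolic errno name for a negative kernel-style return value."""
    return errno.errorcode[-ret]


# -1 and -2 are the raw values some drivers return; -EINVAL is the intended one.
for ret in (-1, -2, -errno.EINVAL):
    print(ret, errno_name(ret), os.strerror(-ret))
```

A read() that fails with "Operation not permitted" suggests a permissions problem, whereas "Invalid argument" points the caller at the request size, which is the actual cause.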
[PATCH] net: qlcnic: clean up sysfs error codes
Replace confusing QL_STATUS_INVALID_PARAM == -1 == -EPERM with -EINVAL and QLC_STATUS_UNSUPPORTED_CMD == -2 == -ENOENT with -EOPNOTSUPP, the latter error code is arguable, but it is already used in the driver, so let it be here as well. Also remove always false (!buf) check on read(), the driver should not care if userspace gets its EFAULT or not. Signed-off-by: Vladimir Zapolskiy v...@mleia.com --- drivers/net/ethernet/qlogic/qlcnic/qlcnic.h | 3 - drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 2 +- drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c | 77 +++ 3 files changed, 36 insertions(+), 46 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h index f221126..055f376 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h @@ -1326,9 +1326,6 @@ struct qlcnic_eswitch { }; -/* Return codes for Error handling */ -#define QL_STATUS_INVALID_PARAM-1 - #define MAX_BW 100 /* % of link speed */ #define MIN_BW 1 /* % of link speed */ #define MAX_VLAN_ID4095 diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index 367f397..2f6cc42 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -1031,7 +1031,7 @@ int qlcnic_init_pci_info(struct qlcnic_adapter *adapter) pfn = pci_info[i].id; if (pfn = ahw-max_vnic_func) { - ret = QL_STATUS_INVALID_PARAM; + ret = -EINVAL; dev_err(adapter-pdev-dev, %s: Invalid function 0x%x, max 0x%x\n, __func__, pfn, ahw-max_vnic_func); goto err_eswitch; diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c index 59a721f..05c28f2 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c @@ -24,8 +24,6 @@ #include linux/hwmon-sysfs.h #endif -#define QLC_STATUS_UNSUPPORTED_CMD -2 - int 
qlcnicvf_config_bridged_mode(struct qlcnic_adapter *adapter, u32 enable) { return -EOPNOTSUPP; @@ -166,7 +164,7 @@ static int qlcnic_82xx_store_beacon(struct qlcnic_adapter *adapter, u8 b_state, b_rate; if (len != sizeof(u16)) - return QL_STATUS_INVALID_PARAM; + return -EINVAL; memcpy(beacon, buf, sizeof(u16)); err = qlcnic_validate_beacon(adapter, beacon, b_state, b_rate); @@ -383,17 +381,17 @@ static int validate_pm_config(struct qlcnic_adapter *adapter, dest_pci_func = pm_cfg[i].dest_npar; src_index = qlcnic_is_valid_nic_func(adapter, src_pci_func); if (src_index 0) - return QL_STATUS_INVALID_PARAM; + return -EINVAL; dest_index = qlcnic_is_valid_nic_func(adapter, dest_pci_func); if (dest_index 0) - return QL_STATUS_INVALID_PARAM; + return -EINVAL; s_esw_id = adapter-npars[src_index].phy_port; d_esw_id = adapter-npars[dest_index].phy_port; if (s_esw_id != d_esw_id) - return QL_STATUS_INVALID_PARAM; + return -EINVAL; } return 0; @@ -414,7 +412,7 @@ static ssize_t qlcnic_sysfs_write_pm_config(struct file *filp, count = size / sizeof(struct qlcnic_pm_func_cfg); rem = size % sizeof(struct qlcnic_pm_func_cfg); if (rem) - return QL_STATUS_INVALID_PARAM; + return -EINVAL; qlcnic_swap32_buffer((u32 *)buf, size / sizeof(u32)); pm_cfg = (struct qlcnic_pm_func_cfg *)buf; @@ -427,7 +425,7 @@ static ssize_t qlcnic_sysfs_write_pm_config(struct file *filp, action = !!pm_cfg[i].action; index = qlcnic_is_valid_nic_func(adapter, pci_func); if (index 0) - return QL_STATUS_INVALID_PARAM; + return -EINVAL; id = adapter-npars[index].phy_port; ret = qlcnic_config_port_mirroring(adapter, id, @@ -440,7 +438,7 @@ static ssize_t qlcnic_sysfs_write_pm_config(struct file *filp, pci_func = pm_cfg[i].pci_func; index = qlcnic_is_valid_nic_func(adapter, pci_func); if (index 0) - return QL_STATUS_INVALID_PARAM; + return -EINVAL; id = adapter-npars[index].phy_port; adapter-npars[index].enable_pm = !!pm_cfg[i].action; adapter-npars[index].dest_npar = id; @@ -499,11 +497,11 @@ static int 
validate_esw_config(struct
Re: Drops in qdisc on ifb interface
On Mon, 2015-05-25 at 15:31 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 16:05 -0400, John A. Sullivan III wrote:

Hello, all. On one of our connections we are doing intensive traffic shaping with tc. We are using ifb interfaces for shaping ingress traffic and we also use ifb interfaces for egress so that we can apply the same set of rules to multiple interfaces (e.g., tun and eth interfaces operating on the same physical interface).

These are running on very powerful gateways; I have watched them handling 16 Gbps with CPU utilization at a handful of percent. Yet, I am seeing drops on the ifb interfaces when I do a tc -s qdisc show.

Why would this be? I would expect if there was some kind of problem that it would manifest as drops on the physical interfaces and not the IFB interface. We have played with queue lengths in both directions. We are using HFSC with SFQ leaves so I would imagine this overrides the very short qlen on the IFB interfaces (32). These are drops and not overlimits.

IFB is single threaded and a serious bottleneck.

Don't use this on egress, this destroys multiqueue capability.

And SFQ is pretty limited (127 packets).

You might try to change your NIC to have a single queue for RX, so that you have a single cpu feeding your IFB queue.

(ethtool -L eth0 rx 1)

Hmm . . . I've been thinking about that SFQ leaf qdisc. I see that newer kernels allow a much higher limit than 127 but it still seems that the queue depth limit for any one flow is still 127. When we do something like GRE/IPSec, I think the decrypted GRE traffic will distribute across the queues but the IPSec traffic will collapse all the packets initially into one queue. At 80ms RTT and 1 Gbps wire speed, I would need a queue of around 7500. Thus, can one say that SFQ is almost useless for high BDP connections?

Is there a similar round-robin type qdisc that does not have this limitation?

If I recall correctly, if one does not attach a qdisc explicitly to a class, it defaults to pfifo_fast. Is that correct? Thanks - John
Re: [PATCH 2/3 nf-next] netfilter: nf_tables: allow to bind table to net_device
Hi Pablo, On Mon, May 25, 2015 at 02:46:41PM +0200, Pablo Neira Ayuso wrote: This patch adds the internal NFT_AF_NEEDS_DEV flag to indicate that you must attach this table to a net_device. This change is required by the follow up patch that introduces the new netdev table. Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org --- include/net/netfilter/nf_tables.h|8 ++ include/uapi/linux/netfilter/nf_tables.h |2 ++ net/netfilter/nf_tables_api.c| 46 ++ 3 files changed, 51 insertions(+), 5 deletions(-) [snip] diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 5fa1cd0..89a671e 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h [snip] @@ -423,6 +425,10 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net, nla_put_be32(skb, NFTA_TABLE_USE, htonl(table-use))) goto nla_put_failure; + if (table-dev + nla_put_string(skb, NFTA_TABLE_DEV, table-dev-name)) + goto nla_put_failure; + nlmsg_end(skb, nlh); return 0; @@ -608,6 +614,11 @@ static int nf_tables_updtable(struct nft_ctx *ctx) if (flags == ctx-table-flags) return 0; + if ((ctx-afi-flags NFT_AF_NEEDS_DEV) + ctx-nla[NFTA_TABLE_DEV] + nla_strcmp(ctx-nla[NFTA_TABLE_DEV], ctx-table-dev-name)) + return -EOPNOTSUPP; + trans = nft_trans_alloc(ctx, NFT_MSG_NEWTABLE, sizeof(struct nft_trans_table)); if (trans == NULL) I'm a little unsure of the above logic. Is it ok for NFT_AF_NEEDS_DEV to be set but ctx-nla[NFTA_TABLE_DEV] to be absent? 
@@ -645,6 +656,7 @@ static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb, struct nft_table *table; struct net *net = sock_net(skb-sk); int family = nfmsg-nfgen_family; + struct net_device *dev = NULL; u32 flags = 0; struct nft_ctx ctx; int err; @@ -679,30 +691,50 @@ static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb, return -EINVAL; } + if (afi-flags NFT_AF_NEEDS_DEV) { + char ifname[IFNAMSIZ]; + + if (!nla[NFTA_TABLE_DEV]) + return -EOPNOTSUPP; + + nla_strlcpy(ifname, nla[NFTA_TABLE_DEV], IFNAMSIZ); + dev = dev_get_by_name(net, ifname); + if (!dev) + return -ENOENT; + } else if (nla[NFTA_TABLE_DEV]) { + return -EOPNOTSUPP; + } + + err = -EAFNOSUPPORT; if (!try_module_get(afi-owner)) - return -EAFNOSUPPORT; + goto err1; [snip] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [trivial PATCH] neterion: s2io: Fix kernel doc formatting
From: Joe Perches j...@perches.com
Date: Sat, 23 May 2015 10:32:55 -0700

These two uses seem to have had carriage returns removed. Make these entries like all the others in this file.

Signed-off-by: Joe Perches j...@perches.com

Applied, thanks.
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 18:44 -0400, John A. Sullivan III wrote:
On Mon, 2015-05-25 at 15:38 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 18:22 -0400, John A. Sullivan III wrote:

2) Why do we still not negotiate the 16MB buffer that we get when we are not using GRE?

What exact NIC handles receive side ?

If drivers allocate a full 4KB page to hold each frame, plus sk_buff overhead, then 32MB of kernel memory translates to 8MB of TCP window space.

Hi, Eric. I'm not sure I understand the question or how to obtain the information you've requested. The receive side system has 48GB of RAM but that does not sound like what you are requesting. I suspect the behavior is a protection mechanism, i.e., it is being calculated for good reason. When I set the buffer to 16MB manually in nuttcp, performance degraded so I assume I was overrunning something. I am still downloading the traces. But I'm still mystified by why this only affects GRE traffic. Thanks -

GRE is quite expensive, some extra cpu load is needed.

On receiver, can you please check what exact driver is loaded ? Is it igb, ixgbe, e1000e, i40e ?

ethtool -i eth0

GRE has extra 28 bytes of encapsulation, this definitely can make skb a little bit fat. TCP has very simple heuristics (using power of two steps) and a 50% factor can be explained by this extra 28 bytes for some particular driver.

You could emulate this at the sender (without GRE) by reducing the mtu for the route to your target.

ip route add 192.x.y.z via gateway mtu 1450
Re: [PATCH net-next] ipv6: ipv6_select_ident() returns a __be32
From: Eric Dumazet eric.duma...@gmail.com
Date: Mon, 25 May 2015 16:02:21 -0700

From: Eric Dumazet eduma...@google.com

ipv6_select_ident() returns a 32bit value in network order.

Fixes: 286c2349f666 (ipv6: Clean up ipv6_select_ident() and ip6_fragment())
Signed-off-by: Eric Dumazet eduma...@google.com
Reported-by: kbuild test robot fengguang...@intel.com

Applied.
Re: [PATCH net-next] pktgen: remove one sparse error
From: Eric Dumazet eric.duma...@gmail.com
Date: Mon, 25 May 2015 16:06:37 -0700

From: Eric Dumazet eduma...@google.com

net/core/pktgen.c:2672:43: warning: incorrect type in assignment (different base types)
net/core/pktgen.c:2672:43:    expected unsigned short [unsigned] [short] [usertype] noident
net/core/pktgen.c:2672:43:    got restricted __be16 [usertype] protocol

Let's use proper struct ethhdr instead of hard coding everything.

Signed-off-by: Eric Dumazet eduma...@google.com

Applied.
RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF
-Original Message- From: Rose, Gregory V Sent: Friday, May 22, 2015 8:08 AM To: Hiroshi Shimamoto; Skidmore, Donald C; Kirsher, Jeffrey T; intel-wired- l...@lists.osuosl.org Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List; Choi, Sy Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz; sassm...@redhat.com Subject: RE: [PATCH v5 3/3] ixgbe: Add new ndo to trust VF -Original Message- From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On Behalf Of Hiroshi Shimamoto Sent: Thursday, May 21, 2015 7:31 PM To: Skidmore, Donald C; Kirsher, Jeffrey T; intel-wired- l...@lists.osuosl.org Cc: nhor...@redhat.com; jogre...@redhat.com; Linux Netdev List; Choi, Sy Jong; Rony Efraim; David Miller; Edward Cree; Or Gerlitz; sassm...@redhat.com Subject: Re: [Intel-wired-lan] [PATCH v5 3/3] ixgbe: Add new ndo to trust VF [big snip] I think your concerns are related to some operational assumptions. My basic concept is, not to change the behavior of VM, existing user operation. I mean that I didn't think it's better that the user should check the both of the ixgbevf driver can deal with new API and the VF is trusted. Now, I think the point is who takes care whether the VF is trusted. Right? It seems that you think the VF user should handle that user is trusted and do something with a notice that you're trusted or untrusted from the host. Is that correct? I made it in PF side, because it looks easy to handle it. If something to do in VF side, I think ixgbevf driver should handle it. Setting the VF trusted mode feature should only be allowed through the PF as it is the only trusted entity from the start. We do not want the VF being able to decide for itself to be trusted. - Greg I completely agree with Greg and never meant to imply anything else. The PF should be where a given VF is made trusted. Likewise a VF can get promoted to MC Promiscuous buy requesting over 30 MC groups. I like this and your patch currently does this. 
So for example below:

        PF                                  VF
      ------                              ------
Set given VF as trusted
                                    Request 30+ MC groups via Mail Box
Put PF in MC Promiscuous mode

What I am concerned about is the following flow where we seem to store the fact that the VF requested more than 30 MC groups so that we can automatically enter MC Promisc Mode if that VF is ever made trusted.

        PF                                  VF
      ------                              ------
Currently VF is NOT trusted
                                    Request 30+ MC groups via Mail Box
Do NOT put PF in MC Promisc
(hw->mac.mc_promisc = true)

Some time later

Set given VF as trusted
(because mc_promisc set)
Put PF in MC Promisc

I don't like the fact that the PF remembers that the VF was denied MC Promiscuous mode in the past, and because of that automatically puts the VF in MC Promiscuous mode when it becomes trusted. Maybe showing in code what I would like removed/added would be more helpful, probably should have started doing that. :)

Do you mean that VF should care about it is trusted or not? Should VF request MC Promisc again when it's trusted? Or, do you mean VF never be trusted during its (or VM's) lifetime? And what do you think about being untrusted from trusted state?

I would remove this bit of code from ixgbe_ndo_set_vf_trust():

int ixgbe_ndo_set_vf_trust(struct net_device *netdev, int vf, bool setting)
{
	struct ixgbe_adapter *adapter = netdev_priv(netdev);

	if (vf >= adapter->num_vfs)
		return -EINVAL;

	/* nothing to do */
	if (adapter->vfinfo[vf].trusted == setting)
		return 0;

	adapter->vfinfo[vf].trusted = setting;

-	/* Reconfigure features which are only allowed for trusted VF */
-	/* VF multicast promiscuous mode */
-	if (adapter->vfinfo[vf].mc_promisc)
-		ixgbe_enable_vf_mc_promisc(adapter, vf);

I understand, you don't think we need to have a capability to enable/disable MC Promisc on the fly.

	return 0;
}

This of course would mean we should never set mc_promisc if we are NOT trusted (adapter->vfinfo[vf].trusted), so in ixgbe_set_vf_mc_promisc() I would add this or something like it:

static int ixgbe_set_vf_mc_promisc(struct ixgbe_adapter *adapter,
				   u32 *msgbuf, u32 vf)
{
	bool enable = !!msgbuf[1];	/* msgbuf contains the flag to enable */

	switch (adapter->vfinfo[vf].vf_api) {
	case ixgbe_mbox_api_12:
		break;
	default:
		return -1;
	}

+	/* have to be trusted */
+	if (!adapter->vfinfo[vf].trusted)
+		return 0;

Should we return an error to VF to inform it isn't trusted?

+	/* nothing to
[PATCH net-next] net: remove a sparse error in secure_dccpv6_sequence_number()
From: Eric Dumazet eduma...@google.com

make C=2 CF=-D__CHECK_ENDIAN__ net/core/secure_seq.o
net/core/secure_seq.c:157:50: warning: restricted __be32 degrades to integer

Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/core/secure_seq.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 51dd3193a33ebb26ea3c008e752d1f2832f791d8..fd3ce461fbe6210ab95fbcbb4b5e6c862a262898 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -154,7 +154,7 @@ u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 	net_secret_init();
 	memcpy(hash, saddr, 16);
 	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + daddr[i];
+		secret[i] = net_secret[i] + (__force u32)daddr[i];
 	secret[4] = net_secret[4] +
 		(((__force u16)sport << 16) + (__force u16)dport);
 	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
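For readers unfamiliar with the annotation, here is a minimal user-space sketch of why the `(__force u32)` cast silences sparse without changing behaviour. It assumes sparse-style `bitwise` and `force` attributes that are only active under `__CHECKER__`; the `sketch_` names are hypothetical and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's __bitwise/__force annotations.
 * Under a normal compiler they expand to nothing; only a checker that
 * defines __CHECKER__ sees the attributes. */
#ifdef __CHECKER__
#define sketch_bitwise __attribute__((bitwise))
#define sketch_force   __attribute__((force))
#else
#define sketch_bitwise
#define sketch_force
#endif

/* A "restricted" big-endian type: same storage as uint32_t. */
typedef uint32_t sketch_bitwise sketch_be32;

static_assert(sizeof(sketch_be32) == sizeof(uint32_t),
	      "the annotation must not change the storage size");

/* Mixing a restricted value into plain integer arithmetic needs an
 * explicit force cast; the generated code is a plain addition. */
uint32_t sketch_mix(uint32_t secret, sketch_be32 addr)
{
	return secret + (sketch_force uint32_t)addr;
}
```

The cast is a statement of intent for the checker: the value is deliberately used as raw bits (here as hash input), so no byte swap is wanted.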
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 17:34 -0400, John A. Sullivan III wrote:
On Mon, 2015-05-25 at 13:41 -0700, Eric Dumazet wrote:
On Mon, 2015-05-25 at 15:21 -0400, John A. Sullivan III wrote:

Thanks, Eric. I really appreciate the help. This is a problem holding up a very high profile, major project and, for the life of me, I can't figure out why my TCP window size is reduced inside the GRE tunnel.

Here is the netem setup although we are using this merely to reproduce what we are seeing in production. We see the same results bare metal to bare metal across the Internet.

qdisc prio 10: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 32578077286 bytes 56349187 pkt (dropped 15361, overlimits 0 requeues 61323)
 backlog 0b 1p requeues 61323
qdisc netem 101: parent 10:1 limit 1000 delay 40.0ms
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 102: parent 10:2 limit 1000 delay 40.0ms
 Sent 32434562015 bytes 54180984 pkt (dropped 15361, overlimits 0 requeues 0)
 backlog 0b 1p requeues 0
qdisc netem 103: parent 10:3 limit 1000 delay 40.0ms
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

root@router-001:~# tc -s qdisc show dev eth2
qdisc prio 2: root refcnt 17 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 296515482689 bytes 217794609 pkt (dropped 11719, overlimits 0 requeues 5307)
 backlog 0b 2p requeues 5307
qdisc netem 21: parent 2:1 limit 1000 delay 40.0ms
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc netem 22: parent 2:2 limit 1000 delay 40.0ms
 Sent 289364020190 bytes 212892539 pkt (dropped 11719, overlimits 0 requeues 0)
 backlog 0b 2p requeues 0
qdisc netem 23: parent 2:3 limit 1000 delay 40.0ms
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

I'm not sure how helpful these stats are as we did set this router up for packet loss at one point.
We did suspect netem at some point and did things like change the limit but that had no effect.

80 ms at 1Gbps - you need to hold about packets in your netem qdisc, not 1000.

tc qdisc ... netem ... limit 8000 ...

(I see you added 40ms both ways, so you need packets in forward, and 1666 packets for the ACK packets)

I tried a netem 80ms here and got following with default settings (no change in send/receive windows)

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.7.8.152 -Cc -t OMNI -l 20
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.7.8.152 () port 0 AF_INET
tcpi_rto 281000 tcpi_ato 0 tcpi_pmtu 1476 tcpi_rcv_ssthresh 28720
tcpi_rtt 80431 tcpi_rttvar 304 tcpi_snd_ssthresh 2147483647 tpci_snd_cwnd 2215
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
4194304     6291456     16384  20.17   149.54     10^6bits/s  0.40  S      0.78   S      10.467  20.554  usec/KB

Now with 16MB I got :

Hmm . . .
I did:

tc qdisc replace dev eth0 parent 10:1 handle 101: netem delay 40ms limit 8000
tc qdisc replace dev eth0 parent 10:2 handle 102: netem delay 40ms limit 8000
tc qdisc replace dev eth0 parent 10:3 handle 103: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:1 handle 21: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:2 handle 22: netem delay 40ms limit 8000
tc qdisc replace dev eth2 parent 2:3 handle 23: netem delay 40ms limit 8000

The gateway to gateway performance was still abysmal:

root@gwhq-1:~# nuttcp -T 60 -i 10 192.168.126.1
   19.8750 MB /  10.00 sec =   16.6722 Mbps     0 retrans
   23.2500 MB /  10.00 sec =   19.5035 Mbps     0 retrans
   23.3125 MB /  10.00 sec =   19.5559 Mbps     0 retrans
   23.3750 MB /  10.00 sec =   19.6084 Mbps     0 retrans
   23.2500 MB /  10.00 sec =   19.5035 Mbps     0 retrans
   23.3125 MB /  10.00 sec =   19.5560 Mbps     0 retrans
  136.4375 MB /  60.13 sec =   19.0353 Mbps 0 %TX 0 %RX 0 retrans 80.25 msRTT

But the end to end was near wire speed!:

rita@vserver-002:~$ nuttcp -T 60 -i 10 192.168.8.20
  518.9375 MB /  10.00 sec =  435.3154 Mbps     0 retrans
  979.6875 MB /  10.00 sec =  821.8186 Mbps     0 retrans
  979.2500 MB /  10.00 sec =  821.4541 Mbps     0 retrans
  979.7500 MB /  10.00 sec =  821.8782 Mbps     0 retrans
  979.7500 MB /  10.00 sec =  821.8735 Mbps     0 retrans
  979.8750 MB /  10.00
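The queue sizing advice in this thread (a netem limit well above the default 1000 for 80 ms at 1 Gbps) follows from the bandwidth-delay product. A minimal sketch of that arithmetic, assuming 1500-byte packets, which is an assumption of this example and is not stated in the thread; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Number of packets a delay queue must hold to keep a link busy:
 * (rate in bytes/s) * (delay in s) / (packet size in bytes).
 * pkt_bytes is an assumed average packet size. */
uint64_t sketch_bdp_packets(uint64_t rate_bps, uint64_t delay_ms,
			    uint64_t pkt_bytes)
{
	assert(pkt_bytes != 0);

	uint64_t bdp_bytes = rate_bps / 8 * delay_ms / 1000;

	return bdp_bytes / pkt_bytes;
}
```

Under these assumptions, `sketch_bdp_packets(1000000000, 80, 1500)` gives 6666 packets and a 40 ms one-way delay gives 3333. Half of 3333 is 1666, which is consistent with the ACK figure that survives in the message if one ACK is sent per two data packets. Either way, a limit of 1000 is too small and the suggested 8000 leaves headroom.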
Re: [PATCH net 0/3] phyter bug fixes
From: Richard Cochran richardcoch...@gmail.com
Date: Mon, 25 May 2015 11:55:42 +0200

While working on a project using the phyter, I noticed some bugs that have crept in over time. This series fixes those bugs. These patches are also meant for stable.

Series applied and queued up for -stable as well, thanks.
Re: TCP window auto-tuning sub-optimal in GRE tunnel
On Mon, 2015-05-25 at 18:22 -0400, John A. Sullivan III wrote:

2) Why do we still not negotiate the 16MB buffer that we get when we are not using GRE?

What exact NIC handles the receive side?

If drivers allocate a full 4KB page to hold each frame, plus sk_buff overhead, then 32MB of kernel memory translates to 8MB of TCP window space.
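The 32MB to 8MB figure above is a 4:1 ratio between the memory charged to the socket per frame and the payload that frame carries. A rough sketch of that proportion follows; the function name and the per-frame numbers in the usage note are hypothetical, and the real cost depends on the driver and on sk_buff overhead:

```c
#include <assert.h>
#include <stdint.h>

/* Payload bytes that fit in a receive buffer when each frame of
 * payload_bytes is charged cost_bytes of kernel memory. */
uint64_t sketch_window_bytes(uint64_t rcvbuf_bytes, uint64_t payload_bytes,
			     uint64_t cost_bytes)
{
	assert(cost_bytes != 0);

	return rcvbuf_bytes * payload_bytes / cost_bytes;
}
```

For instance, if every 1 KB of payload were charged 4 KB of memory, `sketch_window_bytes(32 << 20, 1024, 4096)` would leave 8 MB of usable window out of a 32 MB buffer. The closer the driver's per-frame allocation is to the actual payload size, the more of the buffer can be advertised as window.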
[PATCH net-next] pktgen: remove one sparse error
From: Eric Dumazet eduma...@google.com

net/core/pktgen.c:2672:43: warning: incorrect type in assignment (different base types)
net/core/pktgen.c:2672:43:    expected unsigned short [unsigned] [short] [usertype] <noident>
net/core/pktgen.c:2672:43:    got restricted __be16 [usertype] protocol

Let's use proper struct ethhdr instead of hard coding everything.

Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/core/pktgen.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 62f979984a23236db1109bb109f46a932840633f..46dfcfcb9c540dac0853b5648cce2d69e8fa3608 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2645,9 +2645,9 @@ static int process_ipsec(struct pktgen_dev *pkt_dev,
 	struct xfrm_state *x = pkt_dev->flows[pkt_dev->curfl].x;
 	int nhead = 0;
 	if (x) {
-		int ret;
-		__u8 *eth;
+		struct ethhdr *eth;
 		struct iphdr *iph;
+		int ret;
 
 		nhead = x->props.header_len - skb_headroom(skb);
 		if (nhead > 0) {
@@ -2667,9 +2667,9 @@ static int process_ipsec(struct pktgen_dev *pkt_dev,
 			goto err;
 		}
 		/* restore ll */
-		eth = (__u8 *) skb_push(skb, ETH_HLEN);
-		memcpy(eth, pkt_dev->hh, 12);
-		*(u16 *) &eth[12] = protocol;
+		eth = (struct ethhdr *)skb_push(skb, ETH_HLEN);
+		memcpy(eth, pkt_dev->hh, 2 * ETH_ALEN);
+		eth->h_proto = protocol;
 
 		/* Update IPv4 header len as well as checksum value */
 		iph = ip_hdr(skb);
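The patch replaces the magic numbers 12 and `eth[12]` with struct fields. A small user-space sketch of why the two forms address the same bytes; `struct sketch_ethhdr` and `sketch_restore_ll()` are hypothetical stand-ins that mirror the usual Ethernet header layout and are not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6	/* bytes in one MAC address */

/* Assumed layout: destination MAC, source MAC, then the EtherType. */
struct sketch_ethhdr {
	uint8_t  h_dest[SKETCH_ETH_ALEN];
	uint8_t  h_source[SKETCH_ETH_ALEN];
	uint16_t h_proto;	/* kept in network byte order */
} __attribute__((packed));

/* The hard-coded 12 is the two addresses; the header is 14 bytes. */
static_assert(offsetof(struct sketch_ethhdr, h_proto) == 2 * SKETCH_ETH_ALEN,
	      "h_proto must sit right after the two MAC addresses");
static_assert(sizeof(struct sketch_ethhdr) == 14, "Ethernet header size");

/* Restore a link-layer header: copy both addresses from a saved header,
 * then set the protocol field by name instead of by byte offset. */
void sketch_restore_ll(void *frame, const uint8_t *saved_hdr,
		       uint16_t proto_be)
{
	struct sketch_ethhdr *eth = frame;

	memcpy(eth, saved_hdr, 2 * SKETCH_ETH_ALEN);
	eth->h_proto = proto_be;
}
```

Besides readability, going through a typed field lets sparse compare the endianness annotation of the field against the value being assigned, which an untyped `u16` store at a byte offset does not allow.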
Re: [PATCH 1/3 nf-next] netfilter: default CONFIG_NETFILTER_INGRESS to y
Hi Pablo,

On Mon, May 25, 2015 at 02:46:40PM +0200, Pablo Neira Ayuso wrote:

Useful to compile-test all options.

Suggested-by: by Alexei Stavoroitov a...@plumgrid.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org

There seems to be a stray "by" between ':' and Alexei's name.

---
 net/netfilter/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index db1c674..9a89e7c 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -3,6 +3,7 @@ menu "Core Netfilter Configuration"
 
 config NETFILTER_INGRESS
 	bool "Netfilter ingress support"
+	default y
 	select NET_INGRESS
 	help
 	  This allows you to classify packets from ingress using the Netfilter
--
1.7.10.4
[PATCH net-next] net: fix inet_proto_csum_replace4() sparse errors
From: Eric Dumazet eduma...@google.com

make C=2 CF=-D__CHECK_ENDIAN__ net/core/utils.o
...
net/core/utils.c:307:72: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:307:72:    expected restricted __wsum [usertype] addend
net/core/utils.c:307:72:    got restricted __be32 [usertype] from
net/core/utils.c:308:34: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:308:34:    expected restricted __wsum [usertype] addend
net/core/utils.c:308:34:    got restricted __be32 [usertype] to
net/core/utils.c:310:70: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:310:70:    expected restricted __wsum [usertype] addend
net/core/utils.c:310:70:    got restricted __be32 [usertype] from
net/core/utils.c:310:77: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:310:77:    expected restricted __wsum [usertype] addend
net/core/utils.c:310:77:    got restricted __be32 [usertype] to
net/core/utils.c:312:72: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:312:72:    expected restricted __wsum [usertype] addend
net/core/utils.c:312:72:    got restricted __be32 [usertype] from
net/core/utils.c:313:35: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:313:35:    expected restricted __wsum [usertype] addend
net/core/utils.c:313:35:    got restricted __be32 [usertype] to

Note we can use csum_replace4() helper

Fixes: 58e3cac5613aa (net: optimise inet_proto_csum_replace4())
Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/core/utils.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/utils.c b/net/core/utils.c
index 7b803884c162..a7732a068043 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -304,13 +304,15 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 			      __be32 from, __be32 to, int pseudohdr)
 {
 	if (skb->ip_summed != CHECKSUM_PARTIAL) {
-		*sum = csum_fold(csum_add(csum_sub(~csum_unfold(*sum), from),
-					   to));
+		csum_replace4(sum, from, to);
 		if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
-			skb->csum = ~csum_add(csum_sub(~(skb->csum), from), to);
+			skb->csum = ~csum_add(csum_sub(~(skb->csum),
+						       (__force __wsum)from),
+					      (__force __wsum)to);
 	} else if (pseudohdr)
-		*sum = ~csum_fold(csum_add(csum_sub(csum_unfold(*sum), from),
-					    to));
+		*sum = ~csum_fold(csum_add(csum_sub(csum_unfold(*sum),
+						    (__force __wsum)from),
+					   (__force __wsum)to));
 }
 EXPORT_SYMBOL(inet_proto_csum_replace4);
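As background for the helper used in this patch, the sketch below shows the general idea of an incremental ones-complement checksum update in the style of RFC 1624 (HC' = ~(~HC + ~m + m')). It is a user-space illustration with hypothetical `sketch_` names working on 16-bit words, not the kernel's csum_replace4() implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t sketch_fold16(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Full checksum over n 16-bit words (n is assumed small enough that
 * the 32-bit accumulator cannot overflow). */
uint16_t sketch_csum(const uint16_t *words, size_t n)
{
	uint32_t acc = 0;

	for (size_t i = 0; i < n; i++)
		acc += words[i];
	return (uint16_t)~sketch_fold16(acc);
}

/* Update a checksum after one 16-bit word changes from old_w to new_w. */
uint16_t sketch_csum_replace2(uint16_t check, uint16_t old_w, uint16_t new_w)
{
	uint32_t acc = (uint16_t)~check;

	acc += (uint16_t)~old_w;	/* subtract the old word */
	acc += new_w;			/* add the new word */
	return (uint16_t)~sketch_fold16(acc);
}

/* A 32-bit field is two 16-bit words, updated one after the other. */
uint16_t sketch_csum_replace4(uint16_t check, const uint16_t old_w[2],
			      const uint16_t new_w[2])
{
	check = sketch_csum_replace2(check, old_w[0], new_w[0]);
	return sketch_csum_replace2(check, old_w[1], new_w[1]);
}
```

The point of the incremental form is that rewriting one address field does not require walking the whole packet again; only the old and new values of the field are needed.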
Re: [PATCH net-next] net: remove a sparse error in secure_dccpv6_sequence_number()
From: Eric Dumazet eric.duma...@gmail.com
Date: Mon, 25 May 2015 18:55:48 -0700

From: Eric Dumazet eduma...@google.com

make C=2 CF=-D__CHECK_ENDIAN__ net/core/secure_seq.o
net/core/secure_seq.c:157:50: warning: restricted __be32 degrades to integer

Signed-off-by: Eric Dumazet eduma...@google.com

Applied.
Re: [PATCH net-next] net: fix inet_proto_csum_replace4() sparse errors
From: Eric Dumazet eric.duma...@gmail.com
Date: Mon, 25 May 2015 18:50:01 -0700

From: Eric Dumazet eduma...@google.com

make C=2 CF=-D__CHECK_ENDIAN__ net/core/utils.o
...

Note we can use csum_replace4() helper

Fixes: 58e3cac5613aa (net: optimise inet_proto_csum_replace4())
Signed-off-by: Eric Dumazet eduma...@google.com

Also applied, thanks Eric.