Re: [PATCH net-next 4/6] net: hip04: Don't free the tx skb when the buffer is not ready for xmit
On 2015/4/15 22:19, Arnd Bergmann wrote: On Wednesday 15 April 2015 20:30:06 Ding Tianhong wrote: @@ -489,6 +487,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev) /* Ensure tx_head update visible to tx reclaim */ smp_wmb(); + count++; + priv-tx_head = TX_NEXT(tx_head); Something is wrong here, the comment does not match the code any more, because the smp_wmb() won't actually make the tx_head update visible. Arnd Yes, the smp_wmb() could only make sure the tx buffer was ready to xmit. . -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/6] net: hip04: use the big endian for tx and rx desc
On 2015/4/15 22:13, Arnd Bergmann wrote: On Wednesday 15 April 2015 20:30:04 Ding Tianhong wrote: The hip04 ethernet use the big endian for tx and rx, so set desc to big endian and remove the unused next_addr. I don't understand: diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index b0a7f03..6473462 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -132,19 +132,18 @@ #define HIP04_MIN_TX_COALESCE_FRAMES 1 struct tx_desc { - u32 send_addr; - u32 send_size; - u32 next_addr; - u32 cfg; - u32 wb_addr; + __be32 send_addr; + __be32 send_size; + __be32 cfg; + __be32 wb_addr; } __aligned(64); I would think this is a hardware structure, does this not break access to the cfg and wb_addr fields if you remove next_addr? Arnd good cache, I make a mistake here, thanks. . -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 0/6] net: hip04: fix some problem for hip04 drivers
On 2015/4/15 22:25, Arnd Bergmann wrote: On Wednesday 15 April 2015 20:30:02 Ding Tianhong wrote: The patches series was used to fix the issues of the hip04 driver, and added some optimizations according to some good suggestion. Thanks, that looks much better, except for patch 4 that I commented on. I will check and fix it. I had at some point sent a patch that is archived at http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/318120.html I believe that one is also still needed, but I have not tested whether it is correct. Can you have a look at the patch from back then and see if it works, of if you find something wrong about it? I'm sending the unmodified patch from then here again for you to apply or comment. It will have to be rebased on top of your current changes. Arnd Thanks for the suggestion, I notices that I miss this patch in my series from the maillist, I need time to check the code and try to test it, thanks for the work.:) Ding 8 Subject: [PATCH] net/hip04: refactor interrupt masking The hip04 ethernet driver currently acknowledges all interrupts directly in the interrupt handler, and leaves all interrupts except the RX data enabled the whole time. This causes multiple problems: - When more packets come in between the original interrupt and the NAPI poll function, we will get an extraneous RX interrupt as soon as interrupts are enabled again. - The two error interrupts are by definition combining all errors that may have happened since the last time they were handled, but just acknowledging the irq without dealing with the cause of the condition makes it come back immediately. In particular, when NAPI is intentionally stalling the rx queue, this causes a storm of rx dropped messages. - The access to the 'reg_inten' field in hip_priv is used for serializing access, but is in fact racy itself. To deal with these issues, the driver is changed to only acknowledge the IRQ at the point when it is being handled. The RX interrupts now get acked right before reading the number of received frames to keep spurious interrupts to the absolute minimum without losing interrupts. As there is no tx-complete interrupt, only an informational tx-dropped one, we now ack that in the tx reclaim handler, hoping that clearing the descriptors has also eliminated the error condition. The only place that reads the reg_inten now just relies on napi_schedule_prep() to return whether NAPI is active or not, and the poll() function is then going to ack and reenabled the IRQ if necessary. Finally, when we disable interrupts, they are now all disabled together, in order to simplify the logic and to avoid getting rx-dropped interrupts when NAPI is in control of the rx queue. Signed-off-by: Arnd Bergmann a...@arndb.de diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 525214ef5984..83773247a368 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -56,6 +56,8 @@ #define RCV_DROP BIT(7) #define TX_DROPBIT(6) #define DEF_INT_ERR(RCV_NOBUF | RCV_DROP | TX_DROP) +#define DEF_INT_RX (RCV_INT | RCV_NOBUF | RCV_DROP) +#define DEF_INT_TX (TX_DROP) #define DEF_INT_MASK (RCV_INT | DEF_INT_ERR) /* TX descriptor config */ @@ -154,7 +156,6 @@ struct hip04_priv { unsigned int port; unsigned int speed; unsigned int duplex; - unsigned int reg_inten; struct napi_struct napi; struct net_device *ndev; @@ -303,17 +304,15 @@ static void hip04_mac_enable(struct net_device *ndev) val |= GE_RX_PORT_EN | GE_TX_PORT_EN; writel_relaxed(val, priv-base + GE_PORT_EN); - /* clear rx int */ - val = RCV_INT; - writel_relaxed(val, priv-base + PPE_RINT); + /* clear prior interrupts */ + writel_relaxed(DEF_INT_MASK, priv-base + PPE_RINT); /* config recv int */ val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT; writel_relaxed(val, priv-base + PPE_CFG_RX_PKT_INT); /* enable interrupt */ - priv-reg_inten = DEF_INT_MASK; - writel_relaxed(priv-reg_inten, priv-base + PPE_INTEN); + writel_relaxed(DEF_INT_MASK, priv-base + PPE_INTEN); } static void hip04_mac_disable(struct net_device *ndev) @@ -322,8 +321,7 @@ static void hip04_mac_disable(struct net_device *ndev) u32 val; /* disable int */ - priv-reg_inten = ~(DEF_INT_MASK); - writel_relaxed(priv-reg_inten, priv-base + PPE_INTEN); + writel_relaxed(0, priv-base + PPE_INTEN); /* disable tx rx */ val = readl_relaxed(priv-base + GE_PORT_EN); @@ -403,6 +401,8 @@ static int hip04_tx_reclaim(struct net_device
Re: [v3] skbuff: Do not scrub skb mark within the same name space
On Thu, 16 Apr 2015, Herbert Xu wrote: On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote: Le 15/04/2015 15:57, Herbert Xu a écrit : On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote: [snip] Subject: skbuff: Do not scrub skb mark within the same name space The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 (tunnels: Maybe add a Fixes tag? Fixes: ea23192e8e57 (tunnels: harmonize cleanup done on skb on rx path) harmonize cleanup done on skb on rx path) broke anyone trying to use netfilter marking across IPv4 tunnels. While most of the fields that are cleared by skb_scrub_packet don't matter, the netfilter mark must be preserved. This patch rearranges skb_scurb_packet to preserve the mark field. nit: s/scurb/scrub Else it's fine for me. Sure. PS I used the wrong email for James the first time around. So let me repeat the question here. Should secmark be preserved or cleared across tunnels within the same name space? In fact, do our security models even support name spaces? They don't support namespaces, and maintaining the label is critical for SELinux, at least, which mediates security for the system as a whole. -- James Morris jmor...@namei.org
Re: Revert net: Reset secmark when scrubbing packet
On 04/16/15 at 04:12pm, Herbert Xu wrote: On Thu, Apr 16, 2015 at 05:02:15PM +1000, James Morris wrote: They don't support namespaces, and maintaining the label is critical for SELinux, at least, which mediates security for the system as a whole. Thanks for the confirmation James, I thought this looked a bit dodgy :) ---8--- This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602 because the secmark must be preserved even when a packet crosses namespace boundaries. The reason is that security labels apply to the system as a whole and is not per-namespace. No objection to reverting, _BUT_ just because security labels apply to the system as a whole does not mean that both the packet in the underlay and overlay belong to the same context. The point here was to not blindly inherit the security context of a packet based on the outer or inner header. Someone tagging all packets addressed to the host itself with a SElinux context may not expect that SELinux context to be preserved into a namespaced tenant. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On 04/16/2015 10:20 AM, Daniel Borkmann wrote: So mid term, it would be much more beneficial if you attempt fix the underlying driver issues that actually cause high tx completion delays, instead of reintroducing bufferbloat. So that we all can move forward and not backwards in time. Yes, I think we definitely see the need for this. I think we certainly agree that bufferbloat needs to be reduced, and minimizing the data we need in the pipe for full performance on xennet is an important part of that. It should be said, however, that any virtual device is always going to have higher latency than a physical device. Hopefully we'll be able to get the latency of xennet down to something that's more reasonable, but it may just not be possible. And in any case, if we're going to be cranking down these limits to just barely within the tolerance of physical NICs, virtual devices (either xennet or virtio_net) are never going to be able to catch up. (Without cheating that is.) What Eric described to you was that you introduce a new netdev member like netdev-needs_bufferbloat, set that indication from driver site, and cache that in the socket that binds to it, so you can adjust the test in tcp_xmit_size_goal(). It should merely be seen as a hint/indication for such devices. Hmm? He suggested that after he'd been prodded by 4 more e-mails in which two of us guessed what he was trying to get at. That's what I was complaining about. Having a per-device long transmit latency hint sounds like a sensible short-term solution to me. -George -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: make mandocs build failure with next-20150407
On Thu, 16 Apr 2015 08:33:15 +0900 Masanari Iida standby2...@gmail.com wrote: On Wed, Apr 8, 2015 at 7:41 AM, Jim Davis jim.ep...@gmail.com wrote: DOCPROC Documentation/DocBook/80211.xml Error(.//include/net/mac80211.h:329): Cannot parse enum! Error(.//include/net/mac80211.h:367): Cannot parse enum! Warning(.//include/net/mac80211.h:381): No description found for parameter 'type' -- This error start to appear on Linus's tree. make xmldocs stops after this error. Then run it again, make xmldocs process rest of the files. Add TO: netdev and patch authors for reporting the issue. It's the result of some incomplete changes in the wireless tree; I sent a fix to Johannes that he applied a couple of days ago. Once that hits mainline the problem should go away. Thanks, jon -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On 04/15/2015 07:19 PM, Eric Dumazet wrote: On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote: Maybe you should stop wasting all of our time and just tell us what you're thinking. I think you make me wasting my time. I already gave all the hints in prior discussions. Right, and I suggested these two options: Obviously one solution would be to allow the drivers themselves to set the tcp_limit_output_bytes, but that seems like a maintenance nightmare. Another simple solution would be to allow drivers to indicate whether they have a high transmit latency, and have the kernel use a higher value by default when that's the case. [1] Neither of which you commented on. Instead you pointed me to a comment that only partially described what the limitations were. (I.e., it described the two packets or 1ms, but not how they related, nor how they related to the max of 2 64k packets outstanding of the default tcp_limit_output_bytes setting.) -George [1] http://marc.info/?i=CAFLBxZYt7-v29ysm=f+5qmow64_qhesjzj98udba+1cs-pf...@mail.gmail.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On 04/16/2015 10:56 AM, George Dunlap wrote: On 04/15/2015 07:19 PM, Eric Dumazet wrote: On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote: Maybe you should stop wasting all of our time and just tell us what you're thinking. I think you make me wasting my time. I already gave all the hints in prior discussions. Right, and I suggested these two options: So mid term, it would be much more beneficial if you attempt fix the underlying driver issues that actually cause high tx completion delays, instead of reintroducing bufferbloat. So that we all can move forward and not backwards in time. Obviously one solution would be to allow the drivers themselves to set the tcp_limit_output_bytes, but that seems like a maintenance nightmare. Possible, but very hacky, as you penalize globally. Another simple solution would be to allow drivers to indicate whether they have a high transmit latency, and have the kernel use a higher value by default when that's the case. [1] Neither of which you commented on. Instead you pointed me to a comment What Eric described to you was that you introduce a new netdev member like netdev-needs_bufferbloat, set that indication from driver site, and cache that in the socket that binds to it, so you can adjust the test in tcp_xmit_size_goal(). It should merely be seen as a hint/indication for such devices. Hmm? that only partially described what the limitations were. (I.e., it described the two packets or 1ms, but not how they related, nor how they related to the max of 2 64k packets outstanding of the default tcp_limit_output_bytes setting.) -George [1] http://marc.info/?i=CAFLBxZYt7-v29ysm=f+5qmow64_qhesjzj98udba+1cs-pf...@mail.gmail.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 4/6] net: hip04: Don't free the tx skb when the buffer is not ready for xmit
On 2015/4/16 16:41, Arnd Bergmann wrote: On Thursday 16 April 2015 14:26:21 Ding Tianhong wrote: On 2015/4/15 22:19, Arnd Bergmann wrote: On Wednesday 15 April 2015 20:30:06 Ding Tianhong wrote: @@ -489,6 +487,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev) /* Ensure tx_head update visible to tx reclaim */ smp_wmb(); + count++; + priv-tx_head = TX_NEXT(tx_head); Something is wrong here, the comment does not match the code any more, because the smp_wmb() won't actually make the tx_head update visible. Arnd Yes, the smp_wmb() could only make sure the tx buffer was ready to xmit. The problem you need to avoid is that the tx reclaim function thinks it's done with all outstanding packets and does not retrigger, while the start_xmit thinks it will still get to that. This means you need a barrier between the priv-tx_head update and the napi_schedule_prep() call. But I still doubt about that, when the buffer is ready to xmit, it will add to fifo queue, but it doesn't mean it is finished xmit yet, the tx reclaim function will free the skb only depend on whether tx_head is updated. Ding Arnd . -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 0/5] stmmac: Correct flow control configuration
Hi Vince On 4/15/2015 6:17 PM, Vince Bridgers wrote: This series of patches corrects flow control configuration for the Synopsys GMAC driver. Thx for these patches For the series Acked-by: Giuseppe Cavallaro peppe.cavall...@st.com peppe Flow control is configured based on a configurable receive fifo size. If less than 4Kbytes flow control is left disabled and a warning is presented. If a receive fifo size is not specified, flow control is left disabled to maintain current behavior. Unicast pause detection was disabled, but is now enabled. The pause time was modified to be maximum time per a XON/XOFF flow control mode of operation. This patch was tested on an Altera Cyclone 5 and an Altera Arria 10 devkit, and verified that flow control operates as expected when enabled. Please consider this series for inclusion so that flow control will function as expected for the Synopsys GMAC controller. Vince Bridgers (5): stmmac: Add properties for transmit and receive fifo sizes stmmac: Add defines and documentation for enabling flow control stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree stmmac: Enable unicast pause frame detect in GMAC Register 6 stmmac: Configure Flow Control to work correctly based on rxfifo size Documentation/devicetree/bindings/net/ethernet.txt | 6 +++ Documentation/devicetree/bindings/net/stmmac.txt | 4 ++ drivers/net/ethernet/stmicro/stmmac/common.h | 5 ++- drivers/net/ethernet/stmicro/stmmac/dwmac1000.h| 51 ++ .../net/ethernet/stmicro/stmmac/dwmac1000_core.c | 5 ++- .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c| 26 ++- drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c | 2 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 16 --- .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 4 ++ include/linux/stmmac.h | 2 + 10 files changed, 111 insertions(+), 10 deletions(-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3] skbuff: Do not scrub skb mark within the same name space
On Thu, Apr 16, 2015 at 09:35:31AM +0200, Nicolas Dichtel wrote: Herbert, could you send a v4 of your patch with the secmark included? Actually this should go into a separate patch because it's a straight revert and also that unlike the mark this should never be cleared, even when you cross a namespace boundary. But I will send a patch. Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3] skbuff: Do not scrub skb mark within the same name space
Le 16/04/2015 09:02, James Morris a écrit : On Thu, 16 Apr 2015, Herbert Xu wrote: [snip] PS I used the wrong email for James the first time around. So let me repeat the question here. Should secmark be preserved or cleared across tunnels within the same name space? In fact, do our security models even support name spaces? They don't support namespaces, and maintaining the label is critical for SELinux, at least, which mediates security for the system as a whole. Herbert, could you send a v4 of your patch with the secmark included? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Revert net: Reset secmark when scrubbing packet
On Thu, Apr 16, 2015 at 05:02:15PM +1000, James Morris wrote: They don't support namespaces, and maintaining the label is critical for SELinux, at least, which mediates security for the system as a whole. Thanks for the confirmation James, I thought this looked a bit dodgy :) ---8--- This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602 because the secmark must be preserved even when a packet crosses namespace boundaries. The reason is that security labels apply to the system as a whole and is not per-namespace. Signed-off-by: Herbert Xu herb...@gondor.apana.org.au diff --git a/net/core/skbuff.c b/net/core/skbuff.c index a185427..d1967da 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4130,7 +4130,6 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) skb-ignore_df = 0; skb_dst_drop(skb); skb_sender_cpu_clear(skb); - skb_init_secmark(skb); secpath_reset(skb); nf_reset(skb); nf_reset_trace(skb); -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3] skbuff: Do not scrub skb mark within the same name space
On 04/16/15 at 09:03am, Herbert Xu wrote: The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 (tunnels: harmonize cleanup done on skb on rx path) broke anyone trying to use netfilter marking across IPv4 tunnels. While most of the fields that are cleared by skb_scrub_packet don't matter, the netfilter mark must be preserved. This patch rearranges skb_scrub_packet to preserve the mark field. Fixes: ea23192e8e57 (tunnels: harmonize cleanup done on skb on rx path) Signed-off-by: Herbert Xu herb...@gondor.apana.org.au Acked-by: Thomas Graf tg...@suug.ch We should also add a flag to veth which expclitly allows to preserve the mark into the namespace. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 4/6] net: hip04: Don't free the tx skb when the buffer is not ready for xmit
On Thursday 16 April 2015 14:26:21 Ding Tianhong wrote: On 2015/4/15 22:19, Arnd Bergmann wrote: On Wednesday 15 April 2015 20:30:06 Ding Tianhong wrote: @@ -489,6 +487,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev) /* Ensure tx_head update visible to tx reclaim */ smp_wmb(); + count++; + priv-tx_head = TX_NEXT(tx_head); Something is wrong here, the comment does not match the code any more, because the smp_wmb() won't actually make the tx_head update visible. Arnd Yes, the smp_wmb() could only make sure the tx buffer was ready to xmit. The problem you need to avoid is that the tx reclaim function thinks it's done with all outstanding packets and does not retrigger, while the start_xmit thinks it will still get to that. This means you need a barrier between the priv-tx_head update and the napi_schedule_prep() call. Arnd -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
Prashant writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]): Ian, using your config we are able to recreate the problem that you are seeing. The driver finds the RX data buffer to be all zero, with a analyzer trace we are seeing the chip is DMA'ing valid RX data buffer contents to the host but once the driver tries to read this DMA area, it is seeing all zero's which is the reason of the corruption. This is only for the RX data buffer, the RX descriptor and status block update DMA regions are having valid contents. I am no expert on this area, but this suggests that the driver is misoperating the Linux DMA management API. This is what I think Konrad suspected when he suggested the `iommu=soft swiotlb=force' command line option. Note in kernel-parameters.txt: swiotlb=[ARM,IA-64,PPC,MIPS,X86] Format: { int | force } int -- Number of I/O TLB slabs force -- force using of bounce buffers even if they wouldn't be automatically used by the kernel So with `swiotlb=force' the DMA is _expected_ to go to a bounce buffer managed by the kernel DMA API. This is unlikely to be a chip or driver issue, as the chip is doing the correct DMA but the corruption occurs before driver reads it. Would request iommu experts to take a look and suggest what can be done next. As I say above I think this is probably a driver bug. I have seen identical symptoms on a 5yo desktop box under my desk and on two brand new rackmount servers; I therefore doubt that it's a hardware problem. Ian. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
From: George Dunlap Sent: 16 April 2015 09:56 On 04/15/2015 07:19 PM, Eric Dumazet wrote: On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote: Maybe you should stop wasting all of our time and just tell us what you're thinking. I think you make me wasting my time. I already gave all the hints in prior discussions. Right, and I suggested these two options: Obviously one solution would be to allow the drivers themselves to set the tcp_limit_output_bytes, but that seems like a maintenance nightmare. Another simple solution would be to allow drivers to indicate whether they have a high transmit latency, and have the kernel use a higher value by default when that's the case. [1] Neither of which you commented on. Instead you pointed me to a comment that only partially described what the limitations were. (I.e., it described the two packets or 1ms, but not how they related, nor how they related to the max of 2 64k packets outstanding of the default tcp_limit_output_bytes setting.) ISTM that you are changing the wrong knob. You need to change something that affects the global amount of pending tx data, not the amount that can be buffered by a single connection. If you change tcp_limit_output_bytes and then have 1000 connections trying to send data you'll suffer 'bufferbloat'. If you call skb_orphan() in the tx setup path then the total number of buffers is limited, but a single connection can (and will) will the tx ring leading to incorrect RTT calculations and additional latency for other connections. This will give high single connection throughput but isn't ideal. One possibility might be to call skb_orphan() when enough time has elapsed since the packet was queued for transmit that it is very likely to have actually been transmitted - even though 'transmit done' has not yet been signalled. Not at all sure how this would fit in though... David
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On Thu, 2015-04-16 at 12:39 +0100, George Dunlap wrote: On 04/15/2015 07:17 PM, Eric Dumazet wrote: Do not expect me to fight bufferbloat alone. Be part of the challenge, instead of trying to get back to proven bad solutions. I tried that. I wrote a description of what I thought the situation was, so that you could correct me if my understanding was wrong, and then what I thought we could do about it. You apparently didn't even read it, but just pointed me to a single cryptic comment that doesn't give me enough information to actually figure out what the situation is. We all agree that bufferbloat is a problem for everybody, and I can definitely understand the desire to actually make the situation better rather than dying the death of a thousand exceptions. If you want help fighting bufferbloat, you have to educate people to help you; or alternately, if you don't want to bother educating people, you have to fight it alone -- or lose the battle due to having a thousand exceptions. So, back to TSQ limits. What's so magical about 2 packets being *in the device itself*? And what does 1ms, or 2*64k packets (the default for tcp_limit_output_bytes), have anything to do with it? Your comment lists three benefits: 1. better RTT estimation 2. faster recovery 3. high rates #3 is just marketing fluff; it's also contradicted by the statement that immediately follows it -- i.e., there are drivers for which the limitation does *not* give high rates. #1, as far as I can tell, has to do with measuring the *actual* minimal round trip time of an empty pipe, rather than the round trip time you get when there's 512MB of packets in the device buffer. If a device has a large internal buffer, then having a large number of packets outstanding means that the measured RTT is skewed. The goal here, I take it, is to have this pipe *exactly* full; having it significantly more than full is what leads to bufferbloat. #2 sounds like you're saying that if there are too many packets outstanding when you discover that you need to adjust things, that it takes a long time for your changes to have an effect; i.e., if you have 5ms of data in the pipe, it will take at least 5ms for your reduced transmmission rate to actually have an effect. Is that accurate, or have I misunderstood something? #2 means that : If you have an outstanding queue of 500 packets for a flow in qdisc. A rtx has to be done, because we receive a SACK. The rtx is queued _after_ the previous 500 packets. 500 packets have to be drained before rtx can be sent and eventually reach destination. These 500 packets will likely be dropped because the destination cannot process them before the rtx. 2 TSO packets are already 90 packets (MSS=1448). It is not small, but a good compromise allowing line rate even on 40Gbit NIC. #1 is not marketing. It is hugely relevant. You might use cubic as the default congestion control, you have to understand we work hard on delay based cc, as losses are no longer a way to measure congestion in modern networks. Vegas and delay gradient congestion depends on precise RTT measures. I added usec RTT estimations (instead of jiffies based rtt samples) to increase resolution by 3 order of magnitude, not for marketing, but because it had to be done when DC communications have typical rtt of 25 usec these days. And jitter in host queues is not nice and must be kept at the minimum. You do not have the whole picture, but this tight bufferbloat control is one step before we can replace cubic by new upcoming cc, that many companies are actively developing and testing. The steps are the following : 1) TCP Small queues 2) FQ/pacing 3) TSO auto sizing 3) usec rtt estimations 4) New revolutionary cc module currently under test at Google, but others have alternatives. The fact that few drivers have bugs should not stop this effort. If you guys are in the Bay area, we would be happy to host a meeting where we can present you how our work reduced packet drops in our networks by 2 order of magnitude, and increased capacity by 40 or 50%. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netns: deinline net_generic()
On Thu, 2015-04-16 at 13:14 +0200, Denys Vlasenko wrote: However, without BUG_ONs, function is still a bit big on PREEMPT configs. Only on allyesconfig builds, that nobody use but to prove some points about code size. If you look at net_generic(), it is mostly used from code that is normally in 3 modules. How many people really load them ? net/tipc : 91 call sites net/sunrpc : 57 fs/nfsd fs/lockd : 183 Then few remaining uses in tunnels. As we suggested, please just remove the BUG_ON(). Then, if you can come up with a solution that does not slow down non PREEMPT kernel, why not. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state
On Wed, Apr 15, 2015 at 10:12:42PM -0700, Guenter Roeck wrote: Return correct error code if _mv88e6xxx_reg_read returns an error. Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support) Signed-off-by: Guenter Roeck li...@roeck-us.net Hi Guenter Good catch. I'm surprised there is no compiler warning, possibly used before set. Reviewed-by: Andrew Lunn and...@lunn.ch Thanks Andrew --- drivers/net/dsa/mv88e6xxx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c index fc8d3b6ffe8e..f64186a5f634 100644 --- a/drivers/net/dsa/mv88e6xxx.c +++ b/drivers/net/dsa/mv88e6xxx.c @@ -908,8 +908,10 @@ static int mv88e6xxx_set_port_state(struct dsa_switch *ds, int port, u8 state) mutex_lock(ps-smi_mutex); reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_CONTROL); - if (reg 0) + if (reg 0) { + ret = reg; goto abort; + } oldstate = reg PORT_CONTROL_STATE_MASK; if (oldstate != state) { -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote: On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote: Netfilter may change the contents of the packet, even change its size. It is *really* hard to do this while keeping the original fragments intact. Perhaps we should provide better helpers to facilitate this? So instead of directly manipulating the content of the skb you would so so through helpers and the helpers can then try to do sensible things with the fragments. When Florian and me started discussing how to solve this we also wanted to be as transparent as possible. But looking at all possible fragmentation scenarios, this seems to be too complicated. Even imagine a fragment with overlapping offsets and some of the fragments got duplicated. If we had to keep this in frag_list and now netfilter has to change any of this contents, this will become a total mess, like changing one port in multiple skbs at different offsets. I doubt it is worth the effort. Thanks, Hannes -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netns: deinline net_generic()
Hi David, Eric, As you may have surmised, this patch wasn't a result of me looking at networking code; rather, it is a result of semi-automated search for huge inlines. The last step of this process would be the submission of patches. I am expecting a range of responses from maintainers: enthusiastic, no reply, go away, I don't care about code size, and everything in between. I can't invest a large amount of effort trying to push every deinlining patch through. If someone, for example, would flat out refuse to fix a huge inline in his driver, so be it. I will be happy to run at least a few iterations over a patch if maintainers do see a merit in deinlining, but have some reservations wrt performance. Your response falls into the latter case (you aren't refusing the patch outright, but want it to be changed in some way). It would help if you tell me how I should change the patches. (I also hope to have a semi-generic way of addressing performance concerns for future deinlining patch submissions.) On 04/14/2015 08:37 PM, David Miller wrote: From: Denys Vlasenko dvlas...@redhat.com Date: Tue, 14 Apr 2015 14:25:11 +0200 On x86 allyesconfig build: The function compiles to 130 bytes of machine code. It has 493 callsites. Total reduction of vmlinux size: 27906 bytes. text data bss dec hex filename 82447071 22255384 20627456 125329911 77861f7 vmlinux4 82419165 22255384 20627456 125302005 777f4f5 vmlinux5 Signed-off-by: Denys Vlasenko dvlas...@redhat.com That BUG_ON() was added 7 years ago, and I don't remember it ever triggering or helping us diagnose something, so just remove it and keep the function inlined. There are two BUG_ONs. I'll remove both of them in v2. Let me know if you want something else. However, without BUG_ONs, function is still a bit big on PREEMPT configs. Among almost 500 users, many probably aren't hot paths. How about having an inlined version, say, fast_net_generic(), and a deinlined one, and using one or another as appropriate. Is this ok with you? On 04/14/2015 03:19 PM, Eric Dumazet wrote: On Tue, 2015-04-14 at 14:25 +0200, Denys Vlasenko wrote: On x86 allyesconfig build: The function compiles to 130 bytes of machine code. It has 493 callsites. Total reduction of vmlinux size: 27906 bytes. text data bss dec hex filename 82447071 22255384 20627456 125329911 77861f7 vmlinux4 82419165 22255384 20627456 125302005 777f4f5 vmlinux5 This sounds a big hammer to me. These savings probably comes from the BUG_ON() that could simply be removed. The second one for sure has no purpose. First one looks defensive. For a typical (non allyesconfig) kernel, net_generic() would translate to : return net-gen[id - 1] Tunnels need this in fast path, so I presume we could introduce net_generic_rcu() to keep this stuff inlined where it matters. static inline void *net_generic_rcu(const struct net *net, int id) { struct net_generic *ng = rcu_dereference(net-gen); return ng-ptr[id - 1]; } I looked at the code and I'm not feeling confident enough to find places where replacing net_generic() with net_generic_rcu() is okay. Would it be okay if I add net_generic_rcu() as you suggest, but leave it to you (network people) to switch to it where appropriate? -- vda -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH linux-next 1/4] infiniband/ipoib: fix possible NULL pointer dereference in ipoib_get_iflink
On Wed, Apr 15, 2015 at 7:06 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Apr 15, 2015 at 09:17:14AM +0300, Erez Shitrit wrote: + /* parent interface */ + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + return dev-ifindex; + + /* child/vlan interface */ + if (!priv-parent) + return -1; Like was said for other drivers, I can't see how parent can be null while IPOIB_FLAG_SUBINTERFACE is set. Drop the last if. It can, at least for ipoib child interface (AKA vlan), you can't control the call for that ndo and it can be called before the parent was set. If the ndo can be called before the netdev private structures are fully prepared then we have another bug, and returning -1 or 0 is not the right answer anyhow. For safety, fold this into your patch. OK, will do that. diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 9fad7b5ac8b9..e62b007adf5d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, /* MTU will be reset when mcast join happens */ priv-dev-mtu = IPOIB_UD_MTU(priv-max_ib_mtu); priv-mcast_mtu = priv-admin_mtu = priv-dev-mtu; + priv-parent = ppriv-dev; set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags); result = ipoib_set_dev_features(priv, ppriv-ca); @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, goto register_failed; } - priv-parent = ppriv-dev; - ipoib_create_debug_files(priv-dev); /* RTNL childs don't need proprietary sysfs entries */ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
pppoe relay and MAC address filtering
I have create pppoe session over a pppoe relay socket: A B and C are Linux nodes, B impl. a relay socket so that A and B can create a pppoe session: A ifA-ifB0 B ifB1--ifC C Now I noticed that if ifB0 is in promisc mode it picks up other pppoe pkgs which are meant for some other pppoe session to another node(D) and relays this pkg to node C! To me this looks like the pppoe relay socket does not check if the DST MAC on pppoe pkgs received over ifB0 matches ifB0 MAC address? Instead the relay socket happily relays any pppoe pkg as long as the session id matches. This feels like a bug to me, comments? Jocke-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
On Thu, Apr 16, 2015 at 11:18:39AM +0100, Ian Jackson wrote: Prashant writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]): Ian, using your config we are able to recreate the problem that you are seeing. The driver finds the RX data buffer to be all zero, with a analyzer trace we are seeing the chip is DMA'ing valid RX data buffer contents to the host but once the driver tries to read this DMA area, it is seeing all zero's which is the reason of the corruption. This is only for the RX data buffer, the RX descriptor and status block update DMA regions are having valid contents. I am no expert on this area, but this suggests that the driver is misoperating the Linux DMA management API. This is what I think Konrad suspected when he suggested the `iommu=soft swiotlb=force' command line option. Note in kernel-parameters.txt: swiotlb=[ARM,IA-64,PPC,MIPS,X86] Format: { int | force } int -- Number of I/O TLB slabs force -- force using of bounce buffers even if they wouldn't be automatically used by the kernel So with `swiotlb=force' the DMA is _expected_ to go to a bounce buffer managed by the kernel DMA API. This is unlikely to be a chip or driver issue, as the chip is doing the correct DMA but the corruption occurs before driver reads it. Would request iommu experts to take a look and suggest what can be done next. As I say above I think this is probably a driver bug. Yes, this looks like the driver is not syncing the DMA buffers. Unmap is supposed to synchronize as well. Prashant, can you point to where in the code you see all zeroes after checking up the data? Cascardo. I have seen identical symptoms on a 5yo desktop box under my desk and on two brand new rackmount servers; I therefore doubt that it's a hardware problem. Ian. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On Thu, 2015-04-16 at 11:01 +0100, George Dunlap wrote: He suggested that after he'd been prodded by 4 more e-mails in which two of us guessed what he was trying to get at. That's what I was complaining about. My big complain is that I suggested to test to double the sysctl, which gave good results. Then you provided a patch using a 8x factor. How does that sound ? Next time I ask a raise, I should try a 8x factor as well, who knows, it might be accepted. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rocker: fix error return code in rocker_probe()
Thu, Apr 16, 2015 at 02:21:02PM CEST, weiyj...@163.com wrote: From: Wei Yongjun yongjun_...@trendmicro.com.cn Fix to return -EINVAL from the invalid PCI region size error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn Good catch. Acked-by: Jiri Pirko j...@resnulli.us -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] mac802154: llsec: fix return value check in llsec_key_alloc()
From: Wei Yongjun yongjun_...@trendmicro.com.cn In case of error, the functions crypto_alloc_aead() and crypto_alloc_blkcipher() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- net/mac802154/llsec.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/mac802154/llsec.c b/net/mac802154/llsec.c index dcf7395..5b2be12 100644 --- a/net/mac802154/llsec.c +++ b/net/mac802154/llsec.c @@ -134,7 +134,7 @@ llsec_key_alloc(const struct ieee802154_llsec_key *template) for (i = 0; i ARRAY_SIZE(key-tfm); i++) { key-tfm[i] = crypto_alloc_aead(ccm(aes), 0, CRYPTO_ALG_ASYNC); - if (!key-tfm[i]) + if (IS_ERR(key-tfm[i])) goto err_tfm; if (crypto_aead_setkey(key-tfm[i], template-key, IEEE802154_LLSEC_KEY_SIZE)) @@ -144,7 +144,7 @@ llsec_key_alloc(const struct ieee802154_llsec_key *template) } key-tfm0 = crypto_alloc_blkcipher(ctr(aes), 0, CRYPTO_ALG_ASYNC); - if (!key-tfm0) + if (IS_ERR(key-tfm0)) goto err_tfm; if (crypto_blkcipher_setkey(key-tfm0, template-key, -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On Thu, Apr 16, 2015 at 10:22 AM, David Laight david.lai...@aculab.com wrote: ISTM that you are changing the wrong knob. You need to change something that affects the global amount of pending tx data, not the amount that can be buffered by a single connection. Well it seems like the problem is that the global amount of pending tx data is high enough, but that the per-stream amount is too low for only a single stream. If you change tcp_limit_output_bytes and then have 1000 connections trying to send data you'll suffer 'bufferbloat'. Right -- so are you worried about the buffers in the local device here, or are you worried about buffers elsewhere in the network? If you're worried about buffers on the local device, don't you have a similar problem for physical NICs? i.e., if a NIC has a big buffer that you're trying to keep mostly empty, limiting a single TCP stream may keep that buffer empty, but if you have 1000 connections, 1000*limit will still fill up the buffer. Or am I missing something? If you call skb_orphan() in the tx setup path then the total number of buffers is limited, but a single connection can (and will) will the tx ring leading to incorrect RTT calculations and additional latency for other connections. This will give high single connection throughput but isn't ideal. One possibility might be to call skb_orphan() when enough time has elapsed since the packet was queued for transmit that it is very likely to have actually been transmitted - even though 'transmit done' has not yet been signalled. Not at all sure how this would fit in though... Right -- so it sounds like the problem with skb_orphan() is making sure that the tx ring is shared properly between different streams. That would mean that ideally we wouldn't call it until the tx ring actually had space to add more packets onto it. The Xen project is having a sort of developer meeting in a few weeks; if we can get a good picture of all the constraints, maybe we can hash out a solution that works for everyone. -George -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -next] ethernet: remove unused including linux/version.h
From: Wei Yongjun yongjun_...@trendmicro.com.cn Remove including linux/version.h that don't need it. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- drivers/net/ethernet/qualcomm/qca_spi.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 4a42e96..f66641d 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -41,7 +41,6 @@ #include linux/skbuff.h #include linux/spi/spi.h #include linux/types.h -#include linux/version.h #include qca_7k.h #include qca_debug.h -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On 04/15/2015 07:17 PM, Eric Dumazet wrote: Do not expect me to fight bufferbloat alone. Be part of the challenge, instead of trying to get back to proven bad solutions. I tried that. I wrote a description of what I thought the situation was, so that you could correct me if my understanding was wrong, and then what I thought we could do about it. You apparently didn't even read it, but just pointed me to a single cryptic comment that doesn't give me enough information to actually figure out what the situation is. We all agree that bufferbloat is a problem for everybody, and I can definitely understand the desire to actually make the situation better rather than dying the death of a thousand exceptions. If you want help fighting bufferbloat, you have to educate people to help you; or alternately, if you don't want to bother educating people, you have to fight it alone -- or lose the battle due to having a thousand exceptions. So, back to TSQ limits. What's so magical about 2 packets being *in the device itself*? And what does 1ms, or 2*64k packets (the default for tcp_limit_output_bytes), have anything to do with it? Your comment lists three benefits: 1. better RTT estimation 2. faster recovery 3. high rates #3 is just marketing fluff; it's also contradicted by the statement that immediately follows it -- i.e., there are drivers for which the limitation does *not* give high rates. #1, as far as I can tell, has to do with measuring the *actual* minimal round trip time of an empty pipe, rather than the round trip time you get when there's 512MB of packets in the device buffer. If a device has a large internal buffer, then having a large number of packets outstanding means that the measured RTT is skewed. The goal here, I take it, is to have this pipe *exactly* full; having it significantly more than full is what leads to bufferbloat. #2 sounds like you're saying that if there are too many packets outstanding when you discover that you need to adjust things, that it takes a long time for your changes to have an effect; i.e., if you have 5ms of data in the pipe, it will take at least 5ms for your reduced transmmission rate to actually have an effect. Is that accurate, or have I misunderstood something? -George -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -next] fou: Fix missing unlock on error in fou_nl_dump()
From: Wei Yongjun yongjun_...@trendmicro.com.cn Add the missing unlock before return from function fou_nl_dump() in the error handling case. Fixes: 7a6c8c34e5b7 (fou: implement FOU_CMD_GET) Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- net/ipv4/fou.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c index af150b4..6ca0182 100644 --- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -711,7 +711,7 @@ static int fou_nl_dump(struct sk_buff *skb, struct netlink_callback *cb) cb-nlh-nlmsg_seq, NLM_F_MULTI, skb, FOU_CMD_GET); if (ret) - goto done; + break; } mutex_unlock(fn-fou_lock); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] rocker: fix error return code in rocker_probe()
From: Wei Yongjun yongjun_...@trendmicro.com.cn Fix to return -EINVAL from the invalid PCI region size error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- drivers/net/ethernet/rocker/rocker.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c index 5cecec2..0cc17f5 100644 --- a/drivers/net/ethernet/rocker/rocker.c +++ b/drivers/net/ethernet/rocker/rocker.c @@ -4293,6 +4293,7 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (pci_resource_len(pdev, 0) ROCKER_PCI_BAR0_SIZE) { dev_err(pdev-dev, invalid PCI region size\n); + err = -EINVAL; goto err_pci_resource_len_check; } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state
On 04/16/2015 05:37 AM, Andrew Lunn wrote: On Wed, Apr 15, 2015 at 10:12:42PM -0700, Guenter Roeck wrote: Return correct error code if _mv88e6xxx_reg_read returns an error. Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support) Signed-off-by: Guenter Roeck li...@roeck-us.net Hi Guenter Good catch. I'm surprised there is no compiler warning, possibly used before set. There was, actually. No idea how I missed that. Fengguang's build bots caught it. Reviewed-by: Andrew Lunn and...@lunn.ch Thanks! Guenter -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state
On 04/15/2015 10:12 PM, Guenter Roeck wrote: Return correct error code if _mv88e6xxx_reg_read returns an error. Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support) Signed-off-by: Guenter Roeck li...@roeck-us.net I should have given proper credit. Reported-by: kbuild test robot fengguang...@intel.com For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem, at least not with gcc 4.9.1. Guenter -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] dsa: mv88e6xxx: Drop duplicate declaration of 'ret' variable
A duplicate declaration of 'ret' can result in hiding an error code. Drop it. Fixes: 17ee3e04ddbf (net: dsa: Provide additional RMON statistics) Signed-off-by: Guenter Roeck li...@roeck-us.net --- Found by compiling the file with W=2. drivers/net/dsa/mv88e6xxx.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c index f64186a5f634..12f186cb30cc 100644 --- a/drivers/net/dsa/mv88e6xxx.c +++ b/drivers/net/dsa/mv88e6xxx.c @@ -602,8 +602,6 @@ static void _mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, u32 high = 0; if (s-reg = 0x100) { - int ret; - ret = mv88e6xxx_reg_read(ds, REG_PORT(port), s-reg - 0x100); if (ret 0) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netns: deinline net_generic()
From: Denys Vlasenko dvlas...@redhat.com Date: Thu, 16 Apr 2015 13:14:14 +0200 It would help if you tell me how I should change the patches. Why ask Eric when I told you exactly how to change the patch to make it acceptable, so please do so. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
From: Hannes Frederic Sowa han...@stressinduktion.org Date: Thu, 16 Apr 2015 14:11:42 +0200 On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote: On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote: Netfilter may change the contents of the packet, even change its size. It is *really* hard to do this while keeping the original fragments intact. Perhaps we should provide better helpers to facilitate this? So instead of directly manipulating the content of the skb you would so so through helpers and the helpers can then try to do sensible things with the fragments. When Florian and me started discussing how to solve this we also wanted to be as transparent as possible. But looking at all possible fragmentation scenarios, this seems to be too complicated. Even imagine a fragment with overlapping offsets and some of the fragments got duplicated. If we had to keep this in frag_list and now netfilter has to change any of this contents, this will become a total mess, like changing one port in multiple skbs at different offsets. I doubt it is worth the effort. You guys keep talking about exceptional cases rather than what is unquestionable the common case, and the one worth handling in the fast paths. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink
On Thu, Apr 16, 2015 at 04:34:34PM +0300, Erez Shitrit wrote: Currently, iflink of the parent interface was always accessed, even when interface didn't have a parent and hence we crashed there. Handle the interface types properly: for a child interface, return the ifindex of the parent, for parent interface, return its ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(), this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: Honggang Li ho...@redhat.com Signed-off-by: Erez Shitrit ere...@mellanox.com Signed-off-by: Honggang Li ho...@redhat.com Reviewed-By: Jason Gunthorpe jguntho...@obsidianresearch.com+ changes from V0: - fixed two typos in the change-log drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 + drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 657b89b..915ad04 100644 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + /* parent interface */ + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + return dev-ifindex; + + /* child/vlan interface */ return priv-parent-ifindex; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 4dd1313..fca1a88 100644 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, /* MTU will be reset when mcast join happens */ priv-dev-mtu = IPOIB_UD_MTU(priv-max_ib_mtu); priv-mcast_mtu = priv-admin_mtu = priv-dev-mtu; + priv-parent = ppriv-dev; set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags); result = ipoib_set_dev_features(priv, ppriv-ca); @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, goto register_failed; } - priv-parent = ppriv-dev; - ipoib_create_debug_files(priv-dev); /* RTNL childs don't need proprietary sysfs entries */ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH stable 3.10-3.16] tcp: Fix crash in TCP Fast Open
On Wed, Apr 15, 2015 at 07:00:32PM +0100, Ben Hutchings wrote: Commit 355a901e6cf1 (tcp: make connect() mem charging friendly) changed tcp_send_syn_data() to perform an open-coded copy of the 'syn' skb rather than using skb_copy_expand(). The open-coded copy does not cover the skb_shared_info::gso_segs field, so in the new skb it is left set to 0. When this commit was backported into stable branches between 3.10.y and 3.16.7-ckty inclusive, it triggered the BUG() in tcp_transmit_skb(). Since Linux 3.18 the GSO segment count is kept in the tcp_skb_cb::tcp_gso_segs field and tcp_send_syn_data() does copy the tcp_skb_cb structure to the new skb, so mainline and newer stable branches are not affected. Set skb_shared_info::gso_segs to the correct value of 1. Signed-off-by: Ben Hutchings b...@decadent.org.uk Thanks a lot, Ben. I'll queue this for the next 3.16 kernel release. Cheers, -- Luís --- net/ipv4/tcp_output.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d5457e4..1ea0a07 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2992,6 +2992,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn) goto fallback; syn_data-ip_summed = CHECKSUM_PARTIAL; memcpy(syn_data-cb, syn-cb, sizeof(syn-cb)); + skb_shinfo(syn_data)-gso_segs = 1; if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space), fo-data-msg_iov, 0, space))) { kfree_skb(syn_data); -- Ben Hutchings Editing code like this is akin to sticking plasters on the bleeding stump of a severed limb. - me, 29 June 1999 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
From: Patrick McHardy ka...@trash.net Date: Thu, 16 Apr 2015 06:24:00 +0100 On 16.04, Herbert Xu wrote: David Miller da...@davemloft.net wrote: Because then there is no ambiguity at all, you preserve on output exactly what you had on input. The same geometry, the same everything. No special checks, no max frag len, none of this crap. Those are all hacks trying to work around the _fundamental_ issue which is that we potentially change the thing when we refrag. Agreed. Doing anything other than preserving the original geometry is simply wrong. However, this doesn't mean that netfilter has to process each fragment. What we could do is to preserve the original fragments in frag_list and then process the overall skb as a unit in netfilter. On output we simply fragment according to the original frag_list. The only thing to watch out for is to eliminate anything in the middle that tries to linearise the skb. Netfilter may change the contents of the packet, even change its size. It is *really* hard to do this while keeping the original fragments intact. I keep hearing a lot of it's hard as the only reason we shouldn't do this properly, and that frankly sucks. People aren't looking for a solution and to be honest it's quite tiring. The common case is that the rules processed are simple, the size of the overall packet does _not_ change, and therefore the best thing to do is pass the entire thing as a unit with the frags in tact. That's the fundamental fact. It's also the fastest way to process these packets and avoids all of these stupid max frag garbage. Only at the point where netfilter makes changes to the size of the packet does it take ownership and get to take on the responsibility of making sure the new resulting fragments are sane. But only at that point. It must not molest the geometry in any other set of circumstances, and I absolutely will not relent on this. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
Hi David, On Thu, Apr 16, 2015, at 17:43, David Miller wrote: From: Hannes Frederic Sowa han...@stressinduktion.org Date: Thu, 16 Apr 2015 14:11:42 +0200 On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote: On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote: Netfilter may change the contents of the packet, even change its size. It is *really* hard to do this while keeping the original fragments intact. Perhaps we should provide better helpers to facilitate this? So instead of directly manipulating the content of the skb you would so so through helpers and the helpers can then try to do sensible things with the fragments. When Florian and me started discussing how to solve this we also wanted to be as transparent as possible. But looking at all possible fragmentation scenarios, this seems to be too complicated. Even imagine a fragment with overlapping offsets and some of the fragments got duplicated. If we had to keep this in frag_list and now netfilter has to change any of this contents, this will become a total mess, like changing one port in multiple skbs at different offsets. I doubt it is worth the effort. You guys keep talking about exceptional cases rather than what is unquestionable the common case, and the one worth handling in the fast paths. So currently we have one fast path, that is: we are not fragmented, we get out non-fragmented, none of this code is ever touched and no problem. We don't want to mak this more complex, but every other solution would need to first expand sk_buff with a new member(!), then: 1) push the original fragments through netfilter - I think this will definitely break user space compatibility 2) add new wrappers into the fast path which now will make sure that we access the correct skb from the frag_list instead of just dereference the pointer - we will make fast path also more slowly and a lot more complex 3) use the reassembled skb for netfilter and emit the original ones - seems like a security hazard to me, discrepancy what is checked and what is used 4) just use the frag_list skb-len's to restore the reassembled skb to the original sizes, this would be possible but we would still harm fast-path with that (new skb member) we could switch to dynamically allocated array of int[] to store sizes and DF bit information In particular, even this would not achieve the goal to have perfect transparency. I can give you a more fundamental problem: We cannot even be transparent if asymmetric routing is in effect and we don't see all fragments. In case we have contrack rules active, we clear fragment queues after some time. So a bridge actually will toss offset!=0 packets if it never sees the head. If we let the packet pass, we might have a security problem. We can never become fully transparent. Bye, Hannes -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state
From: Guenter Roeck li...@roeck-us.net Date: Wed, 15 Apr 2015 22:12:42 -0700 Return correct error code if _mv88e6xxx_reg_read returns an error. Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support) Signed-off-by: Guenter Roeck li...@roeck-us.net Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next] netns: remove duplicated include from net_namespace.c
From: weiyj...@163.com Date: Thu, 16 Apr 2015 21:17:35 +0800 From: Wei Yongjun yongjun_...@trendmicro.com.cn Remove duplicated include. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] dsa: mv88e6xxx: Drop duplicate declaration of 'ret' variable
From: Guenter Roeck li...@roeck-us.net Date: Thu, 16 Apr 2015 06:49:50 -0700 A duplicate declaration of 'ret' can result in hiding an error code. Drop it. Fixes: 17ee3e04ddbf (net: dsa: Provide additional RMON statistics) Signed-off-by: Guenter Roeck li...@roeck-us.net Applied, thank you. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next] ethernet: remove unused including linux/version.h
From: weiyj...@163.com Date: Thu, 16 Apr 2015 21:06:47 +0800 From: Wei Yongjun yongjun_...@trendmicro.com.cn Remove including linux/version.h that don't need it. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] altera tse: Fix network-delays and -retransmissions after high throughput.
From: Andreas Oetken ennoerlan...@googlemail.com Date: Tue, 14 Apr 2015 22:25:26 +0200 From: Andreas Oetken andr...@oetken.name Fix bug which occurs when more than limit packets are available during napi-poll, leading to delays and retransmissions on the network. Check for (count limit) before checking the get_rx_status in tse_rx-function. get_rx_status is reading from the response-fifo. If there is currently a response in the fifo, reading the last byte of the response pops the value from the fifo. If the limit is check as second condition and the limit is reached the fifo is popped but the packet is not processed. Signed-off-by: Andreas Oetken ennoerlan...@gmail.com Your commit message lines are very long, format them to 80 columns or less. - while (((rxstatus = priv-dmaops-get_rx_status(priv)) != 0) -(count limit)) { + /* Check for limit first. get_rx_status is reading the from the + * response-fifo. If there is a currently a response in the fifo, + * reading the last byte of the response pops the value from the fifo. + */ + while ((count limit) + ((rxstatus = priv-dmaops-get_rx_status(priv)) != 0)) { You're corrupted the indentation. The second line of this while statement should start precisely at the first column after the openning parenthesis of the first line. You must use the appropriate number of TAB then SPACE characters necessary to achieve this. If you are purely using TAB characters, as you have done here, for indentation then it is very likely that you have indented things incorrectly. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/1] cxgb4: drop __GFP_NOFAIL allocation
From: a...@linux-foundation.org Date: Tue, 14 Apr 2015 13:24:33 -0700 From: Michal Hocko mho...@suse.cz Subject: cxgb4: drop __GFP_NOFAIL allocation set_filter_wr is requesting __GFP_NOFAIL allocation although it can return ENOMEM without any problems obviously (t4_l2t_set_switching does that already). So the non-failing requirement is too strong without any obvious reason. Drop __GFP_NOFAIL and reorganize the code to have the failure paths easier. The same applies to _c4iw_write_mem_dma_aligned which uses __GFP_NOFAIL and then checks the return value and returns -ENOMEM on failure. This doesn't make any sense what so ever. Either the allocation cannot fail or it can. del_filter_wr seems to be safe as well because the filter entry is not marked as pending and the return value is propagated up the stack up to c4iw_destroy_listen. Signed-off-by: Michal Hocko mho...@suse.cz ... Signed-off-by: Andrew Morton a...@linux-foundation.org Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 03/39] net: sched: Use hrtimer_resolution instead of hrtimer_get_res()
From: Thomas Gleixner t...@linutronix.de Date: Tue, 14 Apr 2015 21:08:28 - No point in converting a timespec now that the value is directly accessible. Signed-off-by: Thomas Gleixner t...@linutronix.de Acked-by: David S. Miller da...@davemloft.net -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 31/39] net: core: pktgen: Remove bogus hrtimer_active() check
From: Thomas Gleixner t...@linutronix.de Date: Tue, 14 Apr 2015 21:09:16 - The check for hrtimer_active() after starting the timer is pointless. If the timer is inactive it has expired already and therefor the task pointer is already NULL. Signed-off-by: Thomas Gleixner t...@linutronix.de Acked-by: David S. Miller da...@davemloft.net -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] bpf: fix verifier memory corruption
From: Alexei Starovoitov a...@plumgrid.com Date: Tue, 14 Apr 2015 15:57:13 -0700 Due to missing bounds check the DAG pass of the BPF verifier can corrupt the memory which can cause random crashes during program loading: [8.449451] BUG: unable to handle kernel paging request at [8.451293] IP: [811de33d] kmem_cache_alloc_trace+0x8d/0x2f0 [8.452329] Oops: [#1] SMP [8.452329] Call Trace: [8.452329] [8116cc82] bpf_check+0x852/0x2000 [8.452329] [8116b7e4] bpf_prog_load+0x1e4/0x310 [8.452329] [811b190f] ? might_fault+0x5f/0xb0 [8.452329] [8116c206] SyS_bpf+0x806/0xa30 Fixes: f1bca824dabb (bpf: add search pruning optimization to verifier) Signed-off-by: Alexei Starovoitov a...@plumgrid.com Applied and queued up for -stable, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr
From: Bryan O'Donoghue pure.lo...@nexus-software.ie Date: Wed, 15 Apr 2015 02:07:46 +0100 An oops exists in the flow of stmmac_release(). phy_ethtool_get_wol() depends on phydev-drv. phydev-drv will be null after stmmac_mdio_unreg() completes. Steps to reproduce on Quark X1000: 1. ifconfig eth0 192.168.0.1 2. rmmod stmmac_pci To fix this stmmac_mdio_unreg() should be run after unregister_netdev(). Signed-off-by: Bryan O'Donoghue pure.lo...@nexus-software.ie Reported-by: Dan O'Donovan dan.odono...@emutex.com This patch does not apply to the current tree, please respin. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch net-next v2] fou: avoid missing unlock in failure path
From: Cong Wang xiyou.wangc...@gmail.com Date: Wed, 15 Apr 2015 11:48:49 -0700 Fixes: 7a6c8c34e5b7 (fou: implement FOU_CMD_GET) Reported-by: Dan Carpenter dan.carpen...@oracle.com Cc: Dan Carpenter dan.carpen...@oracle.com Signed-off-by: Cong Wang xiyou.wangc...@gmail.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next] fou: Fix missing unlock on error in fou_nl_dump()
From: weiyj...@163.com Date: Thu, 16 Apr 2015 20:14:39 +0800 From: Wei Yongjun yongjun_...@trendmicro.com.cn Add the missing unlock before return from function fou_nl_dump() in the error handling case. Fixes: 7a6c8c34e5b7 (fou: implement FOU_CMD_GET) Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn Your patch adds a warning because you fail to remove the 'done' label which is no longer used. Cong Wang already submitted a patch which does remove the 'done' field properly, so I will apply his version. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rocker: fix error return code in rocker_probe()
From: weiyj...@163.com Date: Thu, 16 Apr 2015 20:21:02 +0800 From: Wei Yongjun yongjun_...@trendmicro.com.cn Fix to return -EINVAL from the invalid PCI region size error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr
On 16/04/15 17:07, David Miller wrote: This patch does not apply to the current tree, please respin. /facepalm Did this against : https://github.com/torvalds/linux.git I'll spin again against : https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr
From: Bryan O'Donoghue pure.lo...@nexus-software.ie Date: Thu, 16 Apr 2015 17:20:50 +0100 I'll spin again against : https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/ Currently active tree is 'net', not 'net-next' -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/1] stmmac: rmmod oops after ifconfig
We have an oops with stmmac on Quark X1000/Galileo, triggered by rmmod after ifconfig. Fix for issue contained in next mail against git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git root@clanton:~# ifconfig eth0 192.168.0.1 root@clanton:~# rmmod stmmac_pci [ 39.257871] stmmac_dvr_remove: removing driver [ 39.263618] BUG: unable to handle kernel NULL pointer dereference at 0060 [ 39.271030] IP: [c1254192] phy_ethtool_get_wol+0x2/0x20 [ 39.272391] *pdpt = 0e682001 *pde = [ 39.272391] Oops: [#1] [ 39.272391] Modules linked in: evdev usbhid btusb bluetooth iwlwifi cfg80211 rfkill ad7298 industrialio_triggered_buffer kfifo_buf industrialio spi_s [ 39.272391] CPU: 0 PID: 1245 Comm: rmmod Not tainted 4.0.0 #26 [ 39.272391] Hardware name: Intel Corp. QUARK/Galileo, BIOS 0x0350 01/01/2014 [ 39.272391] task: ce52a9e0 ti: cea64000 task.ti: cea64000 [ 39.272391] EIP: 0060:[c1254192] EFLAGS: 00010292 CPU: 0 [ 39.272391] EIP is at phy_ethtool_get_wol+0x2/0x20 [ 39.272391] EAX: ce59e400 EBX: ce59e400 ECX: EDX: cea65ddc [ 39.272391] ESI: EDI: cea65e80 EBP: cea65df8 ESP: cea65dd8 [ 39.272391] DS: 007b ES: 007b FS: GS: 0033 SS: 0068 [ 39.272391] CR0: 8005003b CR2: 0060 CR3: 0e641000 CR4: 00100030 [ 39.272391] Stack: [ 39.272391] c1255908 0005 ce59e400 ce85b4e0 [ 39.272391] cea65e04 c125597c ce59e400 cea65e10 c12559cf ce85b000 cea65e24 e07a9dd8 [ 39.272391] cea65e38 ce85b000 cea65e80 cea65e44 c12a7639 cea0cc60 cdd4a180 c009ad00 [ 39.272391] Call Trace: [ 39.272391] [c1255908] ? phy_suspend+0x38/0x70 [ 39.272391] [c125597c] phy_detach+0x3c/0x60 [ 39.272391] [c12559cf] phy_disconnect+0x2f/0x40 [ 39.272391] [e07a9dd8] stmmac_release+0x38/0x150 [stmmac] [ 39.272391] [c12a7639] __dev_close_many+0x69/0xb0 [ 39.272391] [c12a76dd] dev_close_many+0x5d/0xd0 [ 39.272391] [c12a859a] rollback_registered_many+0xda/0x240 [ 39.272391] [c12a871f] rollback_registered+0x1f/0x30 [ 39.272391] [c12a93b7] unregister_netdevice_queue+0x47/0xb0 [ 39.272391] [c132bb1b] ? mutex_lock+0xb/0x19 [ 39.272391] [c12a9434] unregister_netdev+0x14/0x20 [ 39.272391] [e07a695d] stmmac_dvr_remove+0x6d/0xb0 [stmmac] [ 39.272391] [e074e02e] stmmac_pci_remove+0xe/0x10 [stmmac_pci] [ 39.272391] [c1196c48] pci_device_remove+0x28/0xa0 [ 39.272391] [c121d8ea] __device_release_driver+0x5a/0xb0 [ 39.272391] [c121df4f] driver_detach+0x6f/0x80 [ 39.272391] [c121d68b] bus_remove_driver+0x3b/0x80 [ 39.272391] [c121e4c3] driver_unregister+0x23/0x60 [ 39.272391] [c10726da] ? find_module_all+0x5a/0x80 [ 39.272391] [c11968df] pci_unregister_driver+0xf/0x50 [ 39.272391] [e074e3d3] stmmac_pci_driver_exit+0xd/0xc3a [stmmac_pci] [ 39.272391] [c1073f5c] SyS_delete_module+0x15c/0x1b0 [ 39.272391] [c132d19d] syscall_call+0x7/0x7 [ 39.272391] Code: 42 12 31 c0 c3 8d 74 26 00 8b 08 8b 49 5c 85 c9 74 07 55 89 e5 ff d1 5d c3 b8 a1 ff ff ff c3 8d 76 00 8d bc 27 00 00 00 00 8b 08 0 [ 39.272391] EIP: [c1254192] phy_ethtool_get_wol+0x2/0x20 SS:ESP 0068:cea65dd8 [ 39.272391] CR2: 0060 [ 39.551363] ---[ end trace 98d1260353bf4fec ]--- Killed Bryan O'Donoghue (1): stmmac: fix oops on rmmod after assigning ip addr drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/1] stmmac: fix oops on rmmod after assigning ip addr
An oops exists in the flow of stmmac_release(). phy_ethtool_get_wol() depends on phydev-drv. phydev-drv will be null after stmmac_mdio_unreg() completes. Steps to reproduce on Quark X1000: 1. ifconfig eth0 192.168.0.1 2. rmmod stmmac_pci To fix this stmmac_mdio_unreg() should be run after unregister_netdev(). Signed-off-by: Bryan O'Donoghue pure.lo...@nexus-software.ie Reported-by: Dan O'Donovan dan.odono...@emutex.com --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 06103ca..6065173 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2970,15 +2970,15 @@ int stmmac_dvr_remove(struct net_device *ndev) priv-hw-dma-stop_tx(priv-ioaddr); stmmac_set_mac(priv-ioaddr, false); - if (priv-pcs != STMMAC_PCS_RGMII priv-pcs != STMMAC_PCS_TBI - priv-pcs != STMMAC_PCS_RTBI) - stmmac_mdio_unregister(ndev); netif_carrier_off(ndev); unregister_netdev(ndev); if (priv-stmmac_rst) reset_control_assert(priv-stmmac_rst); clk_disable_unprepare(priv-pclk); clk_disable_unprepare(priv-stmmac_clk); + if (priv-pcs != STMMAC_PCS_RGMII priv-pcs != STMMAC_PCS_TBI + priv-pcs != STMMAC_PCS_RTBI) + stmmac_mdio_unregister(ndev); free_netdev(ndev); return 0; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: Yes, this looks like the driver is not syncing the DMA buffers. Unmap is supposed to synchronize as well. For small rx packets ( 256 bytes), we sync the DMA buffer before we copy the data to another SKB. For larger packets, we unmap the DMA buffer. Do we see the corruption in both cases? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink
Hi, Le jeudi 16 avril 2015 à 16:34 +0300, Erez Shitrit a écrit : Currently, iflink of the parent interface was always accessed, even when interface didn't have a parent and hence we crashed there. + since commit 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink'). as there was no crash here before AFAIK. Handle the interface types properly: for a child interface, return the ifindex of the parent, for parent interface, return its ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(), this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink') Cc: Nicolas Dichtel nicolas.dich...@6wind.com Reported-by: Honggang Li ho...@redhat.com Signed-off-by: Erez Shitrit ere...@mellanox.com Signed-off-by: Honggang Li ho...@redhat.com --- changes from V0: - fixed two typos in the change-log drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 + drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 657b89b..915ad04 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + /* parent interface */ + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + return dev-ifindex; + + /* child/vlan interface */ return priv-parent-ifindex; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 4dd1313..fca1a88 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, /* MTU will be reset when mcast join happens */ priv-dev-mtu = IPOIB_UD_MTU(priv-max_ib_mtu); priv-mcast_mtu = priv-admin_mtu = priv-dev-mtu; + priv-parent = ppriv-dev; set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags); result = ipoib_set_dev_features(priv, ppriv-ca); @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, goto register_failed; } - priv-parent = ppriv-dev; - ipoib_create_debug_files(priv-dev); /* RTNL childs don't need proprietary sysfs entries */ Regards. -- Yann Droneaud OPTEYA -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] IB/ipoib: Fix ndo_get_iflink
Currently, iflink of the parent interface was always accessed, event when interface didn't have a parent and hence we rashed there. Handle the interface types properly: for a child interface, return the ifindex of the parent, for parent interface, return its ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(), this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: Honggang Li ho...@redhat.com Signed-off-by: Erez Shitrit ere...@mellanox.com Signed-off-by: Honggang Li ho...@redhat.com --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 + drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 657b89b..915ad04 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + /* parent interface */ + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + return dev-ifindex; + + /* child/vlan interface */ return priv-parent-ifindex; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 4dd1313..fca1a88 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, /* MTU will be reset when mcast join happens */ priv-dev-mtu = IPOIB_UD_MTU(priv-max_ib_mtu); priv-mcast_mtu = priv-admin_mtu = priv-dev-mtu; + priv-parent = ppriv-dev; set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags); result = ipoib_set_dev_features(priv, ppriv-ca); @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, goto register_failed; } - priv-parent = ppriv-dev; - ipoib_create_debug_files(priv-dev); /* RTNL childs don't need proprietary sysfs entries */ -- 1.7.11.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
At 12:39 +0100 on 16 Apr (1429187952), George Dunlap wrote: Your comment lists three benefits: 1. better RTT estimation 2. faster recovery 3. high rates #3 is just marketing fluff; it's also contradicted by the statement that immediately follows it -- i.e., there are drivers for which the limitation does *not* give high rates. AFAICT #3 is talking about throughput _under TCP_, where inflating the RTT will absolutely cause problems. Tim. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink
Currently, iflink of the parent interface was always accessed, even when interface didn't have a parent and hence we crashed there. Handle the interface types properly: for a child interface, return the ifindex of the parent, for parent interface, return its ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(), this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: Honggang Li ho...@redhat.com Signed-off-by: Erez Shitrit ere...@mellanox.com Signed-off-by: Honggang Li ho...@redhat.com --- changes from V0: - fixed two typos in the change-log drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 + drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 657b89b..915ad04 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + /* parent interface */ + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + return dev-ifindex; + + /* child/vlan interface */ return priv-parent-ifindex; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 4dd1313..fca1a88 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, /* MTU will be reset when mcast join happens */ priv-dev-mtu = IPOIB_UD_MTU(priv-max_ib_mtu); priv-mcast_mtu = priv-admin_mtu = priv-dev-mtu; + priv-parent = ppriv-dev; set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags); result = ipoib_set_dev_features(priv, ppriv-ca); @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, goto register_failed; } - priv-parent = ppriv-dev; - ipoib_create_debug_files(priv-dev); /* RTNL childs don't need proprietary sysfs entries */ -- 1.7.11.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next] netns: remove duplicated include from net_namespace.c
Le 16/04/2015 15:17, weiyj...@163.com a écrit : From: Wei Yongjun yongjun_...@trendmicro.com.cn Remove duplicated include. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn Acked-by: Nicolas Dichtel nicolas.dich...@6wind.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -next] netns: remove duplicated include from net_namespace.c
From: Wei Yongjun yongjun_...@trendmicro.com.cn Remove duplicated include. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- net/core/net_namespace.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index e5e96b0..9c43cf6 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -16,7 +16,6 @@ #include linux/export.h #include linux/user_namespace.h #include linux/net_namespace.h -#include linux/rtnetlink.h #include net/sock.h #include net/netlink.h #include net/net_namespace.h -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] hso: fix refcnt leak in recent patch.
On Tue, Apr 14, 2015 at 11:03:03AM +1000, NeilBrown wrote: On Tue, 14 Apr 2015 09:36:34 +1000 NeilBrown ne...@suse.de wrote: Prior to commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b hso: fix crash when device disappears while serial port is open hso_serial_open would always kref_get(serial-parent-ref) before returning zero. Since that commit, it only calls kref_get when returning 0 if serial-port.count was zero. This results in calls to kref_put(serial-parent-ref, hso_serial_ref_free); after hso_serial_ref_free has been called, which dereferences a freed pointer. This patch adds the missing kref_get(). Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b Cc: sta...@vger.kernel.org (v4.0) Cc: Olivier Sobrie oliv...@sobrie.be Signed-off-by: NeilBrown ne...@suse.de diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c index 75befc1bd816..6848fc903340 100644 --- a/drivers/net/usb/hso.c +++ b/drivers/net/usb/hso.c @@ -1299,6 +1299,7 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp) } } else { D1(Port was already open); + kref_get(serial-parent-ref); } usb_autopm_put_interface(serial-parent-interface); Sorry - that was wrong. I'm getting crashes which strongly suggest the kref_put is being called extra times, but I misunderstood the code and was hasty. Maybe this instead? I tested the patch and it looks fine :-) Thank you, Olivier Thanks, NeilBrown From: NeilBrown n...@brown.name Date: Tue, 14 Apr 2015 09:33:03 +1000 Subject: [PATCH] hso: fix refcnt leak in recent patch. Prior to commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b hso: fix crash when device disappears while serial port is open a kref_get on serial-parent-ref would be taken on each open, and it would be kref_put on each close. Now the kref_put happens when the tty_struct is finally put (via the 'cleanup') providing tty-driver_data has been set. So the kref_get must be called exact once when tty-driver_data is set. With the current code, if the first open fails the kref_get() is never called, but the kref_put() is called, leaving to a crash. So change the kref_get call to happen exactly when -driver_data is changed from NULL to non-NULL. Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b Cc: sta...@vger.kernel.org (v4.0) Cc: Olivier Sobrie oliv...@sobrie.be Tested-by: Olivier Sobrie oliv...@sobrie.be Signed-off-by: NeilBrown n...@brown.name diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c index 75befc1bd816..17fd3820263a 100644 --- a/drivers/net/usb/hso.c +++ b/drivers/net/usb/hso.c @@ -1278,6 +1278,8 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp) D1(Opening %d, serial-minor); /* setup */ + if (tty-driver_data == NULL) + kref_get(serial-parent-ref); tty-driver_data = serial; tty_port_tty_set(serial-port, tty); @@ -1294,8 +1296,6 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp) if (result) { hso_stop_serial_device(serial-parent); serial-port.count--; - } else { - kref_get(serial-parent-ref); } } else { D1(Port was already open); -- Olivier -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH iproute2 -next] tc: built-in eBPF exec proxy
On 4/15/15 7:52 AM, Daniel Borkmann wrote: File descriptors, based on their id, are being fetched from the same unix domain socket as demonstrated in the bpf_agent, the shell spawned via execvpe(2) and the map fds passed over the environment, and thus are made available to applications in the fashion of std{in,out,err} for read/write access, for example in case of iproute2's examples/bpf/: Amazing that it worked. Acked-by: Alexei Starovoitov a...@plumgrid.com +static void bpf_map_set_env(int *tfd) +{ + char key[64], *val; + int i; + for (i = 0; i BPF_MAP_ID_MAX; i++) { + memset(key, 0, sizeof(key)); + snprintf(key, sizeof(key), BPF_MAP%d, i); + val = secure_getenv(key); + assert(val != NULL); everything looks good. My only nit is that the name of the function reads as this function is setting env vars, whereas it's actually reading them. I guess in your mind it fits with the rest of 'bpf_map_set_*' functions, but the name is still confusing. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH iproute2 -next] tc: built-in eBPF exec proxy
On 04/16/2015 07:48 PM, Alexei Starovoitov wrote: On 4/15/15 7:52 AM, Daniel Borkmann wrote: File descriptors, based on their id, are being fetched from the same unix domain socket as demonstrated in the bpf_agent, the shell spawned via execvpe(2) and the map fds passed over the environment, and thus are made available to applications in the fashion of std{in,out,err} for read/write access, for example in case of iproute2's examples/bpf/: Amazing that it worked. Acked-by: Alexei Starovoitov a...@plumgrid.com +static void bpf_map_set_env(int *tfd) +{ +char key[64], *val; +int i; +for (i = 0; i BPF_MAP_ID_MAX; i++) { +memset(key, 0, sizeof(key)); +snprintf(key, sizeof(key), BPF_MAP%d, i); +val = secure_getenv(key); +assert(val != NULL); everything looks good. My only nit is that the name of the function reads as this function is setting env vars, whereas it's actually reading them. I guess in your mind it fits with the rest of 'bpf_map_set_*' functions, but the name is still confusing. Ok, since it's example code, I'll find a better function name for it, keep yours and Hannes' tags and resubmit. Thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
Michael Chan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]): On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: Yes, this looks like the driver is not syncing the DMA buffers. Unmap is supposed to synchronize as well. For small rx packets ( 256 bytes), we sync the DMA buffer before we copy the data to another SKB. For larger packets, we unmap the DMA buffer. Do we see the corruption in both cases? Yes, at least with swiotlb=force iommu=soft. Ian. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ethtool: return 1 as exit code on a settings(-s) failure.
On Thu, 2015-04-16 at 13:55 -0700, Sridhar Samudrala wrote: Currently 0 is returned on both success or failure. Previously discussed here: http://thread.gmane.org/gmane.linux.network/349717. Ben. Signed-off-by: Sridhar Samudrala sridhar.samudr...@intel.com --- ethtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ethtool.c b/ethtool.c index 01b13a6..163dff2 100644 --- a/ethtool.c +++ b/ethtool.c @@ -2352,7 +2352,7 @@ static int do_sset(struct cmd_context *ctx) int argc = ctx-argc; char **argp = ctx-argp; int i; - int err; + int err = 0; for (i = 0; i ARRAY_SIZE(flags_msglvl); i++) flag_to_cmdline_info(flags_msglvl[i].name, @@ -2665,7 +2665,7 @@ static int do_sset(struct cmd_context *ctx) } } - return 0; + return err ? 1 : 0; } static int do_gregs(struct cmd_context *ctx) -- Ben Hutchings Humour is the best antidote to reality. signature.asc Description: This is a digitally signed message part
[PATCH 1/1] altera tse: Fix network-delays and -retransmissions after high throughput.
From: Andreas Oetken andr...@oetken.name Fix bug which occurs when more than limit packets are available during napi-poll, leading to delays and retransmissions on the network. Check for (count limit) before checking the get_rx_status in tse_rx-function. Function get_rx_status is reading from the response-fifo. If there is currently a response in the fifo, reading the last byte of the response pops the value from the fifo. If the limit is checked as second condition and the limit is reached the fifo is popped but the packet is not processed. Signed-off-by: Andreas Oetken ennoerlan...@gmail.com --- drivers/net/ethernet/altera/altera_tse_main.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c index 6725dc0..dbbbd34 100644 --- a/drivers/net/ethernet/altera/altera_tse_main.c +++ b/drivers/net/ethernet/altera/altera_tse_main.c @@ -376,8 +376,13 @@ static int tse_rx(struct altera_tse_private *priv, int limit) u16 pktlength; u16 pktstatus; - while (((rxstatus = priv-dmaops-get_rx_status(priv)) != 0) - (count limit)) { + /* Check for count limit first as get_rx_status is changing + * the response-fifo so we must process the next packet + * after calling get_rx_status if a response is pending. + * (reading the last byte of the response pops the value from the fifo.) + */ + while ((count limit) + ((rxstatus = priv-dmaops-get_rx_status(priv)) != 0)) { pktstatus = rxstatus 16; pktlength = rxstatus 0x; -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
On Thu, Apr 16, 2015 at 01:18:37PM +0900, Hyong-Youb Kim wrote: On Thu, Apr 16, 2015 at 01:58:16AM +0200, Luis R. Rodriguez wrote: An alternative... is to just ioremap_wc() the entire region, including MMIO registers for these old devices. I see one ethernet driver that does this, myri10ge, and am curious how and why they ended up deciding this and if they have run into any issues. I wonder if this is a reasonable comrpomise for these 2 remaining corner cases. For myri10ge, it a performance thing. Descriptor rings are in NIC memory BAR0, not in host memory. Say, to send a packet, the driver writes the send descriptor to the ioremap'd NIC memory. It is a multi-word descriptor. So, to send it as one PCIE MWr transaction, the driver maps the whole BAR0 as WC and does copy descriptor; wmb. Interesting, so you burst write multi-word descriptor writes using write-combining here for the Ethernet device. Without WC, descriptors would end up as multiple 4B or 8B MWr packets to the NIC, which has a pretty big performance impact on this particular NIC. How big are the descriptors? Most registers that do not want WC are actually in BAR2, which is not mapped as WC. For registers that are in BAR0, we do write to the register; wmb. If we want to wait till the NIC has seen the write, we do write; wmb; read. Interesting, thanks, yeah using this as a work around to the problem sounds plausible however it still would require likely making just as many changes to the ivtv and ipath driver as to just do a proper split. I do wonder however if this sort of work around can be generalized somehow though so that others could use, if this sort of thing is going to become prevalent. If so then this would serve two purposes: work around for the corner cases of MTRR use on Linux and also these sorts of device constraints. In order to determine if this is likely to be generally useful could you elaborate a bit more about the detals of the performance issues of not bursting writes for the descriptor on this device. Even if that is done a conversion over to this work around seems it may require device specific nitpicks. For instance I note in myri10ge_submit_req() for small writes you just do a reverse write and do the first set last, then finally the last 32 bits are rewritten, I guess to trigger something? This approach has worked for this device for many years. I cannot say whether it works for other devices, though. I think it should but the more interesting question would be exactly *why* it was needed for this device, who determined that, and why? Luis -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state
On Thu, Apr 16, 2015 at 3:46 PM, Guenter Roeck li...@roeck-us.net wrote: On 04/15/2015 10:12 PM, Guenter Roeck wrote: Return correct error code if _mv88e6xxx_reg_read returns an error. Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support) Signed-off-by: Guenter Roeck li...@roeck-us.net I should have given proper credit. Reported-by: kbuild test robot fengguang...@intel.com For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem, at least not with gcc 4.9.1. Good old gcc 4.1.2 (which I still use for m68k builds, and won't retire anytime soon as it finds real bugs in every new kernel release) says: drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’: drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used uninitialized in this function Even after your patch, as it missed one case (patch sent). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max
Hello. On 04/16/2015 09:38 PM, Vivien Didelot wrote: Since commit da4759c, sysfs will only use the permissions returned by Please also specify that commit's summary line in parens. is_visible, instead of OR'ing them with the default file mode. This allows us to declare temp1_max with the DEVICE_ATTR_RW macro and just return the desired permissions for the hwmon sysfs attributes in dsa_hwmon_attrs_visible. Also, allow temp1_max to be write-only if set_temp_limit is provided, but not get_temp_limit. Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com WBR, Sergei -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
On 17.04, Herbert Xu wrote: On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote: So currently we have one fast path, that is: we are not fragmented, we get out non-fragmented, none of this code is ever touched and no problem. We don't want to mak this more complex, but You should read Dave's other email where he gives you an obvious solution. If you have to modify the skb then you don't have to worry about the original fragments. But if you only read the skb then don't linearise it completely and keep the original fragments. Yes, I was just responding to that. We need an additional member in the skb, but then this part is quite simple. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH iproute2 -next v2] tc: built-in eBPF exec proxy
This work follows upon commit 6256f8c9e45f (tc, bpf: finalize eBPF support for cls and act front-end) and takes up the idea proposed by Hannes Frederic Sowa to spawn a shell (or any other command) that holds generated eBPF map file descriptors. File descriptors, based on their id, are being fetched from the same unix domain socket as demonstrated in the bpf_agent, the shell spawned via execvpe(2) and the map fds passed over the environment, and thus are made available to applications in the fashion of std{in,out,err} for read/write access, for example in case of iproute2's examples/bpf/: # env | grep BPF BPF_NUM_MAPS=3 BPF_MAP1=6- BPF_MAP_ID_QUEUE (id 1) BPF_MAP0=5- BPF_MAP_ID_PROTO (id 0) BPF_MAP2=7- BPF_MAP_ID_DROPS (id 2) # ls -la /proc/self/fd [...] lrwx--. 1 root root 64 Apr 14 16:46 0 - /dev/pts/4 lrwx--. 1 root root 64 Apr 14 16:46 1 - /dev/pts/4 lrwx--. 1 root root 64 Apr 14 16:46 2 - /dev/pts/4 [...] lrwx--. 1 root root 64 Apr 14 16:46 5 - anon_inode:bpf-map lrwx--. 1 root root 64 Apr 14 16:46 6 - anon_inode:bpf-map lrwx--. 1 root root 64 Apr 14 16:46 7 - anon_inode:bpf-map The advantage (as opposed to the direct/native usage) is that now the shell is map fd owner and applications can terminate and easily reattach to descriptors w/o any kernel changes. Moreover, multiple applications can easily read/write eBPF maps simultaneously. To further allow users for experimenting with that, next step is to add a small helper that can get along with simple data types, so that also shell scripts can make use of bpf syscall, f.e to read/write into maps. Generally, this allows for prepopulating maps, or any runtime altering which could influence eBPF program behaviour (f.e. different run-time classifications, skb modifications, ...), dumping of statistics, etc. Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860 Suggested-by: Hannes Frederic Sowa han...@stressinduktion.org Signed-off-by: Daniel Borkmann dan...@iogearbox.net Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Acked-by: Alexei Starovoitov a...@plumgrid.com --- ( Stephen, this applies to current net-next branch of iproute2. ) v1 - v2: - s/bpf_map_set_env/bpf_map_get_from_env/ as suggested by Alexei - Rest as is, also kept reviewer acks examples/bpf/bpf_agent.c | 55 ++ examples/bpf/bpf_prog.c | 27 + tc/Makefile | 7 ++- tc/e_bpf.c | 147 +++ tc/f_bpf.c | 2 +- tc/m_action.c| 2 +- tc/m_bpf.c | 2 +- tc/tc.c | 9 +-- tc/tc_bpf.c | 98 --- tc/tc_bpf.h | 15 - tc/tc_common.h | 3 + tc/tc_exec.c | 109 +++ tc/tc_filter.c | 2 +- tc/tc_util.h | 16 -- 14 files changed, 454 insertions(+), 40 deletions(-) create mode 100644 tc/e_bpf.c create mode 100644 tc/tc_exec.c diff --git a/examples/bpf/bpf_agent.c b/examples/bpf/bpf_agent.c index 0f481b1..426b880 100644 --- a/examples/bpf/bpf_agent.c +++ b/examples/bpf/bpf_agent.c @@ -24,6 +24,8 @@ * -- Happy eBPF hacking! ;) */ +#define _GNU_SOURCE + #include stdio.h #include stdlib.h #include string.h @@ -31,6 +33,7 @@ #include unistd.h #include stdint.h #include assert.h + #include sys/un.h #include sys/types.h #include sys/stat.h @@ -102,6 +105,23 @@ static void bpf_dump_proto(int fd) printf(\n); } +static void bpf_dump_map_data(int *tfd) +{ + int i; + + for (i = 0; i 30; i++) { + const int period = 5; + + printf(data, period: %dsec\n, period); + + bpf_dump_drops(tfd[BPF_MAP_ID_DROPS]); + bpf_dump_queue(tfd[BPF_MAP_ID_QUEUE]); + bpf_dump_proto(tfd[BPF_MAP_ID_PROTO]); + + sleep(period); + } +} + static void bpf_info_loop(int *fds, struct bpf_map_aux *aux) { int i, tfd[BPF_MAP_ID_MAX]; @@ -122,16 +142,22 @@ static void bpf_info_loop(int *fds, struct bpf_map_aux *aux) tfd[aux-ent[i].id] = fds[i]; } - for (i = 0; i 30; i++) { - int period = 5; + bpf_dump_map_data(tfd); +} - printf(data, period: %dsec\n, period); +static void bpf_map_get_from_env(int *tfd) +{ + char key[64], *val; + int i; - bpf_dump_drops(tfd[BPF_MAP_ID_DROPS]); - bpf_dump_queue(tfd[BPF_MAP_ID_QUEUE]); - bpf_dump_proto(tfd[BPF_MAP_ID_PROTO]); + for (i = 0; i BPF_MAP_ID_MAX; i++) { + memset(key, 0, sizeof(key)); + snprintf(key, sizeof(key), BPF_MAP%d, i); - sleep(period); + val = secure_getenv(key); + assert(val != NULL); + + tfd[i] = atoi(val); } } @@
Re: [PATCH] Bluetooth: Pre-initialize variables in read_local_oob_ext_data_complete()
Hi Geert, net/bluetooth/mgmt.c: In function ‘read_local_oob_ext_data_complete’: net/bluetooth/mgmt.c:6474: warning: ‘r256’ may be used uninitialized in this function net/bluetooth/mgmt.c:6474: warning: ‘h256’ may be used uninitialized in this function net/bluetooth/mgmt.c:6474: warning: ‘r192’ may be used uninitialized in this function net/bluetooth/mgmt.c:6474: warning: ‘h192’ may be used uninitialized in this function While these are false positives, the code can be shortened by pre-initializing the hash table pointers and eir_len. This has the side effect of killing the compiler warnings. can you be a bit specific on which compiler version is this. I fixed one occurrence that seemed valid. However in this case the compiler seems to be just plain stupid. On a gcc 4.9, I am not seeing these for example. Regards Marcel -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote: So currently we have one fast path, that is: we are not fragmented, we get out non-fragmented, none of this code is ever touched and no problem. We don't want to mak this more complex, but You should read Dave's other email where he gives you an obvious solution. If you have to modify the skb then you don't have to worry about the original fragments. But if you only read the skb then don't linearise it completely and keep the original fragments. Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net: dsa: register hwmon for any provided function
On Thu, Apr 16, 2015 at 02:38:19PM -0400, Vivien Didelot wrote: A switch driver may only provide one of the temperature limit accessors, or the temperature alarm getter. So register the hwmon subsystem if any of the related functions is provided. Thus, check get_temp to set the visibility of temp1_input. Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com --- net/dsa/dsa.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 67d2983..6b68994 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -157,6 +157,10 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject *kobj, umode_t mode = 0; switch (index) { + case 0: /* temp1_input */ + if (drv-get_temp) + mode |= S_IRUGO; This should be mandatory. Sorry, I don't really understand what you are trying to accomplish here. Can you give me a real world example where a chip would support setting a limit but not reading it ? Thanks, Guenter + break; case 1: /* temp1_max */ if (drv-get_temp_limit) mode |= S_IRUGO; @@ -310,7 +314,8 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) * register with hardware monitoring subsystem. * Treat registration error as non-fatal and ignore it. */ - if (drv-get_temp) { + if (drv-get_temp || drv-get_temp_limit || drv-set_temp_limit || + drv-get_temp_alarm) { const char *netname = netdev_name(dst-master_netdev); char hname[IFNAMSIZ + 1]; int i, j; -- 2.3.5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Bluetooth: Pre-initialize variables in read_local_oob_ext_data_complete()
net/bluetooth/mgmt.c: In function ‘read_local_oob_ext_data_complete’: net/bluetooth/mgmt.c:6474: warning: ‘r256’ may be used uninitialized in this function net/bluetooth/mgmt.c:6474: warning: ‘h256’ may be used uninitialized in this function net/bluetooth/mgmt.c:6474: warning: ‘r192’ may be used uninitialized in this function net/bluetooth/mgmt.c:6474: warning: ‘h192’ may be used uninitialized in this function While these are false positives, the code can be shortened by pre-initializing the hash table pointers and eir_len. This has the side effect of killing the compiler warnings. Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org --- net/bluetooth/mgmt.c | 16 ++-- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index 7fd87e7135b52753..f8e13b6d58279463 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -6471,9 +6471,9 @@ static void read_local_oob_ext_data_complete(struct hci_dev *hdev, u8 status, { const struct mgmt_cp_read_local_oob_ext_data *mgmt_cp; struct mgmt_rp_read_local_oob_ext_data *mgmt_rp; - u8 *h192, *r192, *h256, *r256; + u8 *h192 = NULL, *r192 = NULL, *h256 = NULL, *r256 = NULL; struct mgmt_pending_cmd *cmd; - u16 eir_len; + u16 eir_len = 0; int err; BT_DBG(%s status %u, hdev-name, status); @@ -6486,18 +6486,11 @@ static void read_local_oob_ext_data_complete(struct hci_dev *hdev, u8 status, if (status) { status = mgmt_status(status); - eir_len = 0; - - h192 = NULL; - r192 = NULL; - h256 = NULL; - r256 = NULL; } else if (opcode == HCI_OP_READ_LOCAL_OOB_DATA) { struct hci_rp_read_local_oob_data *rp; if (skb-len != sizeof(*rp)) { status = MGMT_STATUS_FAILED; - eir_len = 0; } else { status = MGMT_STATUS_SUCCESS; rp = (void *)skb-data; @@ -6505,23 +6498,18 @@ static void read_local_oob_ext_data_complete(struct hci_dev *hdev, u8 status, eir_len = 5 + 18 + 18; h192 = rp-hash; r192 = rp-rand; - h256 = NULL; - r256 = NULL; } } else { struct hci_rp_read_local_oob_ext_data *rp; if (skb-len != sizeof(*rp)) { status = MGMT_STATUS_FAILED; - eir_len = 0; } else { status = MGMT_STATUS_SUCCESS; rp = (void *)skb-data; if (hci_dev_test_flag(hdev, HCI_SC_ONLY)) { eir_len = 5 + 18 + 18; - h192 = NULL; - r192 = NULL; } else { eir_len = 5 + 18 + 18 + 18 + 18; h192 = rp-hash192; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
On 16.04, David Miller wrote: Netfilter may change the contents of the packet, even change its size. It is *really* hard to do this while keeping the original fragments intact. I keep hearing a lot of it's hard as the only reason we shouldn't do this properly, and that frankly sucks. People aren't looking for a solution and to be honest it's quite tiring. The common case is that the rules processed are simple, the size of the overall packet does _not_ change, and therefore the best thing to do is pass the entire thing as a unit with the frags in tact. That's the fundamental fact. It's also the fastest way to process these packets and avoids all of these stupid max frag garbage. Only at the point where netfilter makes changes to the size of the packet does it take ownership and get to take on the responsibility of making sure the new resulting fragments are sane. But only at that point. Agreed, that part shouldn't be hard. We need to pass the defragmented skb through the ruleset, meaning we need to pass it through the stack. That's needed since the rules depend on this. If we don't make changes, we can spit out the original fragments, but for this we need to keep a reference to them from the skb. We still need the max_frag_size thing, once a modification is made we drop the frag list reference and just regulary refragment the modified skb according to the limits. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max
Hello Sergei, Since commit da4759c, sysfs will only use the permissions returned by Please also specify that commit's summary line in parens. Duly noted. da4759c is sysfs: Use only return value from is_visible for the file mode (see: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da4759c) Best, -v -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ethtool: return 1 as exit code on a settings(-s) failure.
Currently 0 is returned on both success or failure. Signed-off-by: Sridhar Samudrala sridhar.samudr...@intel.com --- ethtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ethtool.c b/ethtool.c index 01b13a6..163dff2 100644 --- a/ethtool.c +++ b/ethtool.c @@ -2352,7 +2352,7 @@ static int do_sset(struct cmd_context *ctx) int argc = ctx-argc; char **argp = ctx-argp; int i; - int err; + int err = 0; for (i = 0; i ARRAY_SIZE(flags_msglvl); i++) flag_to_cmdline_info(flags_msglvl[i].name, @@ -2665,7 +2665,7 @@ static int do_sset(struct cmd_context *ctx) } } - return 0; + return err ? 1 : 0; } static int do_gregs(struct cmd_context *ctx) -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max
On Thu, Apr 16, 2015 at 02:38:18PM -0400, Vivien Didelot wrote: Since commit da4759c, sysfs will only use the permissions returned by is_visible, instead of OR'ing them with the default file mode. This allows us to declare temp1_max with the DEVICE_ATTR_RW macro and just return the desired permissions for the hwmon sysfs attributes in dsa_hwmon_attrs_visible. Also, allow temp1_max to be write-only if set_temp_limit is provided, but not get_temp_limit. Hi Vivien, This would be a first for the entire hwmon subsystem and doesn't really make sense. Guenter Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com --- net/dsa/dsa.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 5eaadab..67d2983 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -124,7 +124,7 @@ static ssize_t temp1_max_store(struct device *dev, return count; } -static DEVICE_ATTR(temp1_max, S_IRUGO, temp1_max_show, temp1_max_store); +static DEVICE_ATTR_RW(temp1_max); static ssize_t temp1_max_alarm_show(struct device *dev, struct device_attribute *attr, char *buf) @@ -154,16 +154,24 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject *kobj, struct device *dev = container_of(kobj, struct device, kobj); struct dsa_switch *ds = dev_get_drvdata(dev); struct dsa_switch_driver *drv = ds-drv; - umode_t mode = attr-mode; + umode_t mode = 0; - if (index == 1) { - if (!drv-get_temp_limit) - mode = 0; - else if (drv-set_temp_limit) + switch (index) { + case 1: /* temp1_max */ + if (drv-get_temp_limit) + mode |= S_IRUGO; + if (drv-set_temp_limit) mode |= S_IWUSR; - } else if (index == 2 !drv-get_temp_alarm) { - mode = 0; + break; + case 2: /* temp1_max_alarm */ + if (drv-get_temp_alarm) + mode |= S_IRUGO; + break; + default: + mode = attr-mode; + break; } + return mode; } -- 2.3.5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink
On Thu, Apr 16, 2015, Yann Droneaud ydrone...@opteya.com wrote: Hi, Le jeudi 16 avril 2015 à 16:34 +0300, Erez Shitrit a écrit : Currently, iflink of the parent interface was always accessed, even when interface didn't have a parent and hence we crashed there. + since commit 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink'). as there was no crash here before AFAIK. Disagree, it's redundant, as the Fixes tag below points to this commit Handle the interface types properly: for a child interface, return the ifindex of the parent, for parent interface, return its ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(), this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink') Cc: Nicolas Dichtel nicolas.dich...@6wind.com Ditto, I find it enough to just copy Nicholas on the patch submission, as we did here. Or. Reported-by: Honggang Li ho...@redhat.com Signed-off-by: Erez Shitrit ere...@mellanox.com Signed-off-by: Honggang Li ho...@redhat.com --- changes from V0: - fixed two typos in the change-log drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 + drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 657b89b..915ad04 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + /* parent interface */ + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + return dev-ifindex; + + /* child/vlan interface */ return priv-parent-ifindex; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 4dd1313..fca1a88 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, /* MTU will be reset when mcast join happens */ priv-dev-mtu = IPOIB_UD_MTU(priv-max_ib_mtu); priv-mcast_mtu = priv-admin_mtu = priv-dev-mtu; + priv-parent = ppriv-dev; set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags); result = ipoib_set_dev_features(priv, ppriv-ca); @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, goto register_failed; } - priv-parent = ppriv-dev; - ipoib_create_debug_files(priv-dev); /* RTNL childs don't need proprietary sysfs entries */ Regards. -- Yann Droneaud OPTEYA -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
Am 16. April 2015 17:43:23 MESZ, schrieb David Miller da...@davemloft.net: From: Hannes Frederic Sowa han...@stressinduktion.org Date: Thu, 16 Apr 2015 14:11:42 +0200 On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote: On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote: Netfilter may change the contents of the packet, even change its size. It is *really* hard to do this while keeping the original fragments intact. Perhaps we should provide better helpers to facilitate this? So instead of directly manipulating the content of the skb you would so so through helpers and the helpers can then try to do sensible things with the fragments. When Florian and me started discussing how to solve this we also wanted to be as transparent as possible. But looking at all possible fragmentation scenarios, this seems to be too complicated. Even imagine a fragment with overlapping offsets and some of the fragments got duplicated. If we had to keep this in frag_list and now netfilter has to change any of this contents, this will become a total mess, like changing one port in multiple skbs at different offsets. I doubt it is worth the effort. You guys keep talking about exceptional cases rather than what is unquestionable the common case, and the one worth handling in the fast paths. True. But its not like we haven't tried. What I keep hearing is lots of well meaning opinions, and the fact is, I've looked into this countless times, you can find my first attempt when googling from 11 years ago, and the resolution was always - it's not worth it. It's not too hard, I wouldn't mind. And I fail to see the problem with that. We're not talking about a functional defect, it's something people (me included) think is not nice. I will try to recollect the problems the different approaches have in detail. I'm pretty sure you will come to the same conclusion as I have. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()
From: Geert Uytterhoeven ge...@linux-m68k.org Date: Thu, 16 Apr 2015 20:49:14 +0200 drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’: drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used uninitialized in this function If oldstate == state, mv88e6xxx_set_port_state() will return an uninitialized value. Pre-initialize ret to zero to fix this. Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org Indeed, applied, thanks Geert. N§²ζμrΈyϊθΨb²X¬ΆΗ§vΨ^)ήΊ{.nΗ+·§zΧ^Ύ)ν ζθw*jg¬±¨Άέ’j/κδzΉήΰ2ή¨θΪ’)ί‘«aΆΪώψ�G«ιh�ζj:+v¨wθΩ₯
Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state
On 04/16/2015 11:51 AM, Geert Uytterhoeven wrote: On Thu, Apr 16, 2015 at 3:46 PM, Guenter Roeck li...@roeck-us.net wrote: On 04/15/2015 10:12 PM, Guenter Roeck wrote: Return correct error code if _mv88e6xxx_reg_read returns an error. Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support) Signed-off-by: Guenter Roeck li...@roeck-us.net I should have given proper credit. Reported-by: kbuild test robot fengguang...@intel.com For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem, at least not with gcc 4.9.1. Good old gcc 4.1.2 (which I still use for m68k builds, and won't retire anytime soon as it finds real bugs in every new kernel release) says: drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’: drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used uninitialized in this function Even after your patch, as it missed one case (patch sent). Grumble. So much for trusting tools :-( Thanks for fixing this! Guenter -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v4 00/24] switchdev: spring cleanup
On 4/13/15, 10:47 PM, roopa wrote: On 4/12/15, 11:16 PM, sfel...@gmail.com wrote: From: Scott Feldman sfel...@gmail.com v4: Well, it was a lot of work, but now prepare-commit transaction model is how davem advises: if prepare fails, abort the transaction. The driver must do resource reservations up front in prepare phase and return those resources if aborting. Commit phase would use reserved resources. The good news is the driver code (for rocker) now handles resource allocation failures better by not leaving partially device or driver states. This is a side-effect of the prepare phase where state isn't modified; only validation of inputs and resource reservations happen in the prepare phase. Since we're supporting setting attrs and add objs across lower devs in the stacked case, we need to hold rtnl_lock (or ensure rtnl_lock is held) so lower devs don't move on us during the prepare-commit transaction. DSA driver code skips the prepare phase and goes straight for the commit phase since no up-front allocations are done and no device failures (that could be detected in the prepare phase) can happen. thanks for the series. It definitely does look cleaner and less confusing now!. I do love the abstraction but i was one of the people voting against duplicating the kernel objects into swdev objs which this patches does (which i am still not convinced we should have). Remove NETIF_F_HW_SWITCH_OFFLOAD from rocker and the swdev_attr_set/get wrappers. DSA doesn't set NETIF_F_HW_SWITCH_OFFLOAD, so it can't be in swdev_attr_set/get. rocker doesn't need it; or rather can't support NETIF_F_HW_SWITCH_OFFLOAD being set/cleared at run-time after the device port is already up and offloading L2/L3. NETIF_F_HW_SWITCH_OFFLOAD is still left as a feature flag for drivers that can use it. I see that this series removes all uses of it in the switchdev api later. I had summarized the need for the flag in reply to one of your questions a few weeks back. Since you have moved all ndo ops to swdev ops (including ndo_bridge_setlink/dellink), I don't want to hold on to the feature flag if no one is using it. yes, my userspace driver uses it today. I will come back with stronger justification to keep it or will submit a patch to remove it and add it back at a later point if needed. scott, I see that you will be spinning v5 of the series. In which case feel free to remove the feature flag during your code restructuring effort. If i need it again, i will resubmit. thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[v2] act_mirred: Fix bogus header when redirecting from VLAN
On Thu, Apr 16, 2015 at 07:40:30PM -0700, Alexei Starovoitov wrote: On Fri, Apr 17, 2015 at 10:15:01AM +0800, Herbert Xu wrote: seems the cleaner fix will be to push skb-mac_len instead? No skb-mac_len is the same as skb2-dev-hard_header_len. hmm. please help me understand the problem then. In the commit log you mentioned that your vlan dev and ifb have unequal hard header length. I think that can only happen if your master dev used to create vlan, didn't have vlan offload, so vlandev-hhl = 18 and ifb-hhl = 14. Then when tagged packet arrives on master device we call skb_vlan_untag() and skb-mac_len becomes 14, then vlan_do_receive() will do You are right. I should've just used mac_len. ---8--- When you redirect a VLAN device to any device, you end up with crap in af_packet on the xmit path because hard_header_len is not equal to skb-mac_len. So the redirected packet contains four extra bytes at the start which then gets interpreted as part of the MAC address. This patch fixes this by only pushing skb-mac_len. We also need to fix ifb because it tries to undo the pushing done by act_mirred. Signed-off-by: Herbert Xu herb...@gondor.apana.org.au diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 34f846b..94570aa 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev) if (from AT_EGRESS) { dev_queue_xmit(skb); } else if (from AT_INGRESS) { - skb_pull(skb, skb-dev-hard_header_len); + skb_pull(skb, skb-mac_len); netif_receive_skb(skb); } else BUG(); diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 5953517..3f63cea 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -157,7 +157,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, if (!(at AT_EGRESS)) { if (m-tcfm_ok_push) - skb_push(skb2, skb2-dev-hard_header_len); + skb_push(skb2, skb-mac_len); } /* mirror is always swallowed */ -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
On Thu, 2015-04-16 at 18:15 +0100, Ian Jackson wrote: Michael Chan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]): On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: Yes, this looks like the driver is not syncing the DMA buffers. Unmap is supposed to synchronize as well. For small rx packets ( 256 bytes), we sync the DMA buffer before we copy the data to another SKB. For larger packets, we unmap the DMA buffer. Do we see the corruption in both cases? Yes, at least with swiotlb=force iommu=soft. Ok this is what is causing the problem, the driver uses DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the dma mapping and dma_unmap_addr() to get the mapping value. On most of the platforms this is a no-op, but it appears with iommu=soft and swiotlb=force this house keeping is required, when I pass the correct dma_addr instead of 0 while calling pci_unmap_/pci_dma_sync_ I don't see the corruption. ie If you set CONFIG_NEED_DMA_MAP_STATE=y in your kernel config you should not see the problem. Can you confirm ? Thanks -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
On Thu, Apr 16, 2015, at 22:56, Patrick McHardy wrote: On 17.04, Herbert Xu wrote: On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote: So currently we have one fast path, that is: we are not fragmented, we get out non-fragmented, none of this code is ever touched and no problem. We don't want to mak this more complex, but You should read Dave's other email where he gives you an obvious solution. If you have to modify the skb then you don't have to worry about the original fragments. But if you only read the skb then don't linearise it completely and keep the original fragments. Yes, I was just responding to that. We need an additional member in the skb, but then this part is quite simple. I guess you refer to the solution of using the fragmented list of packets to do checks. I don't think it will be that simple to peek into all the different skbs if the fragments are split in the header. Splitting fragments in the header is the 1x1 of firewall by-passing attacks, so we should certainly handle it correctly. Logic off peeking into fragments and splitting fragments must be logically consent all the time, with all netfilter helpers in the tree. I don't think the additional checks in fast-path are in any way justified. Bye, Hannes -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting
On 17.04, Hannes Frederic Sowa wrote: On Thu, Apr 16, 2015, at 22:56, Patrick McHardy wrote: On 17.04, Herbert Xu wrote: On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote: So currently we have one fast path, that is: we are not fragmented, we get out non-fragmented, none of this code is ever touched and no problem. We don't want to mak this more complex, but You should read Dave's other email where he gives you an obvious solution. If you have to modify the skb then you don't have to worry about the original fragments. But if you only read the skb then don't linearise it completely and keep the original fragments. Yes, I was just responding to that. We need an additional member in the skb, but then this part is quite simple. I guess you refer to the solution of using the fragmented list of packets to do checks. I don't think it will be that simple to peek into all the different skbs if the fragments are split in the header. Splitting fragments in the header is the 1x1 of firewall by-passing attacks, so we should certainly handle it correctly. Also correct, that argument hasn't come up so far. Its actually not too hard to handle, you need to keep track of which parts have been inspected to make a classification decision and make sure those parts are not overwritten. The TCP match does this statically, but we certainly could do this dynamically. The question again is how to keep geometry if things are actually overwritten, especially partially. Apparently all this troubly is being seen as worthwhile. Logic off peeking into fragments and splitting fragments must be logically consent all the time, with all netfilter helpers in the tree. I don't think the additional checks in fast-path are in any way justified. I fully agree. Bye, Hannes -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: act_mirred: Fix bogus header when redirecting from VLAN
On Fri, Apr 17, 2015 at 10:15:01AM +0800, Herbert Xu wrote: seems the cleaner fix will be to push skb-mac_len instead? No skb-mac_len is the same as skb2-dev-hard_header_len. hmm. please help me understand the problem then. In the commit log you mentioned that your vlan dev and ifb have unequal hard header length. I think that can only happen if your master dev used to create vlan, didn't have vlan offload, so vlandev-hhl = 18 and ifb-hhl = 14. Then when tagged packet arrives on master device we call skb_vlan_untag() and skb-mac_len becomes 14, then vlan_do_receive() will do another_round, ingress qdisc will trigger and act_mirred eventually will be called. So existing act_mirred will be pushing 18 bytes, whereas only 14 bytes are valid and the lowest 4 have junk. By 'fixing it' using ifb's 14 byte the patch is only masking the problem, no? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
act_mirred: Fix bogus header when redirecting from VLAN
When you redirect a VLAN device to an ifb device, you end up with crap within the ifb device because they have unequal hard header lengths and the redirected packet contains four extra bytes at the start which then gets interpreted as part of the MAC address. This patch fixes this by only pushing on the hard header length of ifb. This assumes that the original packet actually has enough header for that so checks have been added to that effect. Signed-off-by: Herbert Xu herb...@gondor.apana.org.au diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 34f846b..1256d26 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev) if (from AT_EGRESS) { dev_queue_xmit(skb); } else if (from AT_INGRESS) { - skb_pull(skb, skb-dev-hard_header_len); + skb_pull(skb, _dev-hard_header_len); netif_receive_skb(skb); } else BUG(); diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 5953517..44e4d50 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -151,13 +151,25 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, } at = G_TC_AT(skb-tc_verd); + + if (!(at AT_EGRESS) m-tcfm_ok_push + (skb_headroom(skb) dev-hard_header_len || +skb-dev-hard_header_len dev-hard_header_len)) { + net_notice_ratelimited( + tc mirred: Insufficient headroom from %s to %s: + %d %d\n, + skb-dev-name, dev-name, + skb_headroom(skb), skb-dev-hard_header_len); + goto out; + } + skb2 = skb_act_clone(skb, GFP_ATOMIC, m-tcf_action); if (skb2 == NULL) goto out; if (!(at AT_EGRESS)) { if (m-tcfm_ok_push) - skb_push(skb2, skb2-dev-hard_header_len); + skb_push(skb2, dev-hard_header_len); } /* mirror is always swallowed */ -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net] tcp: tcp_get_info() should fetch socket fields once
From: Eric Dumazet eduma...@google.com tcp_get_info() can be called without holding socket lock, so any socket fields can change under us. Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate Fixes: 977cb0ecf82e (tcp: add pacing_rate information into tcp_info) Signed-off-by: Eric Dumazet eduma...@google.com --- David, I do not think this needs stable backport, this is a quite minor bug. I've added the 'Fixes' tag for reference only. net/ipv4/tcp.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 18e3a12eb1b2..59c8a027721b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2595,6 +2595,7 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info) const struct tcp_sock *tp = tcp_sk(sk); const struct inet_connection_sock *icsk = inet_csk(sk); u32 now = tcp_time_stamp; + u32 rate; memset(info, 0, sizeof(*info)); @@ -2655,10 +2656,11 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info) info-tcpi_total_retrans = tp-total_retrans; - info-tcpi_pacing_rate = sk-sk_pacing_rate != ~0U ? - sk-sk_pacing_rate : ~0ULL; - info-tcpi_max_pacing_rate = sk-sk_max_pacing_rate != ~0U ? - sk-sk_max_pacing_rate : ~0ULL; + rate = READ_ONCE(sk-sk_pacing_rate); + info-tcpi_pacing_rate = rate != ~0U ? rate : ~0ULL; + + rate = READ_ONCE(sk-sk_max_pacing_rate); + info-tcpi_max_pacing_rate = rate != ~0U ? rate : ~0ULL; } EXPORT_SYMBOL_GPL(tcp_get_info); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: act_mirred: Fix bogus header when redirecting from VLAN
On Fri, Apr 17, 2015 at 09:02:16AM +0800, Herbert Xu wrote: @@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev) if (from AT_EGRESS) { dev_queue_xmit(skb); } else if (from AT_INGRESS) { - skb_pull(skb, skb-dev-hard_header_len); + skb_pull(skb, _dev-hard_header_len); ... if (!(at AT_EGRESS)) { if (m-tcfm_ok_push) - skb_push(skb2, skb2-dev-hard_header_len); + skb_push(skb2, dev-hard_header_len); seems the cleaner fix will be to push skb-mac_len instead? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net: dsa: register hwmon for any provided function
hi Vivien, On 04/16/2015 03:05 PM, Vivien Didelot wrote: Hi Guenter, switch (index) { +case 0: /* temp1_input */ +if (drv-get_temp) +mode |= S_IRUGO; This should be mandatory. Sorry, I don't really understand what you are trying to accomplish here. Can you give me a real world example where a chip would support setting a limit but not reading it ? I have no such example. I just did not see why this couldn't be allowed (e.g. setting only set_temp_limit and get_temp_alarm looks fine to me). But if you say that get_temp should be mandatory, I'm OK with that. write-only attributes are not defined in the hwmon ABI. If the 'sensors' command encounters such an attribute, it will create an error message each time it executes. That doesn't sound very useful to me. If a chip - for whatever reason - does not have a limit register but an alarm register or flag, its temperature limit is usually hard-coded and can be reported this way (the AMD temperature sensor driver does this, for example). If there is ever a need to support the alarm-register-only situation for some odd reason, we can add the code at the time. For now, it just seems to me that you are adding complexity to solve some theoretic problem which is very unlikely to occur in the real world. The primary goal of this patchset was to use DEVICE_ATTR_RW to declare temp1_max, instead of reflecting the minimal permissions needed. Then why don't you just do that and nothing else ? The goal should be to simplify code, not to make it more complicated. If the result isn't less code, I don't think it is worth it. Thanks, Guenter -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html