date:20150416

Re: [PATCH net-next 4/6] net: hip04: Don't free the tx skb when the buffer is not ready for xmit

2015-04-16 Thread Ding Tianhong

On 2015/4/15 22:19, Arnd Bergmann wrote:
 On Wednesday 15 April 2015 20:30:06 Ding Tianhong wrote:
 @@ -489,6 +487,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, 
 struct net_device *ndev)
  
 /* Ensure tx_head update visible to tx reclaim */
 smp_wmb();
 +   count++;
 +   priv-tx_head = TX_NEXT(tx_head);
  

 
 Something is wrong here, the comment does not match the code any more, because
 the smp_wmb() won't actually make the tx_head update visible.
 
   Arnd
 
Yes, the smp_wmb() could only make sure the tx buffer was ready to xmit.

 .
 


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next 2/6] net: hip04: use the big endian for tx and rx desc

2015-04-16 Thread Ding Tianhong

On 2015/4/15 22:13, Arnd Bergmann wrote:
 On Wednesday 15 April 2015 20:30:04 Ding Tianhong wrote:
 The hip04 ethernet use the big endian for tx and rx, so set desc to
 big endian and remove the unused next_addr.
 
 I don't understand:
 
 diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
 b/drivers/net/ethernet/hisilicon/hip04_eth.c
 index b0a7f03..6473462 100644
 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c
 +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
 @@ -132,19 +132,18 @@
  #define HIP04_MIN_TX_COALESCE_FRAMES   1
  
  struct tx_desc {
 -   u32 send_addr;
 -   u32 send_size;
 -   u32 next_addr;
 -   u32 cfg;
 -   u32 wb_addr;
 +   __be32 send_addr;
 +   __be32 send_size;
 +   __be32 cfg;
 +   __be32 wb_addr;
  } __aligned(64);
 
 I would think this is a hardware structure, does this not break
 access to the cfg and wb_addr fields if you remove next_addr?
 
   Arnd
 
good cache, I make a mistake here, thanks.

 .
 


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next 0/6] net: hip04: fix some problem for hip04 drivers

2015-04-16 Thread Ding Tianhong

On 2015/4/15 22:25, Arnd Bergmann wrote:
 On Wednesday 15 April 2015 20:30:02 Ding Tianhong wrote:
 The patches series was used to fix the issues of the hip04 driver, and added
 some optimizations according to some good suggestion. 


 
 Thanks, that looks much better, except for patch 4 that I commented on.
 
I will check and fix it.

 I had at some point sent a patch that is archived at
 http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/318120.html
 
 I believe that one is also still needed, but I have not tested whether it
 is correct. Can you have a look at the patch from back then and see if it
 works, of if you find something wrong about it?
 
 I'm sending the unmodified patch from then here again for you to apply
 or comment. It will have to be rebased on top of your current changes.
 
   Arnd
 

Thanks for the suggestion, I notices that I miss this patch in my series from 
the maillist, 
I need time to check the code and try to test it, thanks for the work.:)

Ding

 8
 Subject: [PATCH] net/hip04: refactor interrupt masking
 
 The hip04 ethernet driver currently acknowledges all interrupts directly
 in the interrupt handler, and leaves all interrupts except the RX data
 enabled the whole time. This causes multiple problems:
 
 - When more packets come in between the original interrupt and the
   NAPI poll function, we will get an extraneous RX interrupt as soon
   as interrupts are enabled again.
 
 - The two error interrupts are by definition combining all errors that
   may have happened since the last time they were handled, but just
   acknowledging the irq without dealing with the cause of the condition
   makes it come back immediately. In particular, when NAPI is intentionally
   stalling the rx queue, this causes a storm of rx dropped messages.  
 
 - The access to the 'reg_inten' field in hip_priv is used for serializing
   access, but is in fact racy itself.
 
 To deal with these issues, the driver is changed to only acknowledge
 the IRQ at the point when it is being handled. The RX interrupts now get
 acked right before reading the number of received frames to keep spurious
 interrupts to the absolute minimum without losing interrupts.
 
 As there is no tx-complete interrupt, only an informational tx-dropped
 one, we now ack that in the tx reclaim handler, hoping that clearing
 the descriptors has also eliminated the error condition.
 
 The only place that reads the reg_inten now just relies on
 napi_schedule_prep() to return whether NAPI is active or not, and
 the poll() function is then going to ack and reenabled the IRQ if
 necessary.
 
 Finally, when we disable interrupts, they are now all disabled
 together, in order to simplify the logic and to avoid getting
 rx-dropped interrupts when NAPI is in control of the rx queue.
 
 Signed-off-by: Arnd Bergmann a...@arndb.de
 
 diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
 b/drivers/net/ethernet/hisilicon/hip04_eth.c
 index 525214ef5984..83773247a368 100644
 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c
 +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
 @@ -56,6 +56,8 @@
  #define RCV_DROP   BIT(7)
  #define TX_DROPBIT(6)
  #define DEF_INT_ERR(RCV_NOBUF | RCV_DROP | TX_DROP)
 +#define DEF_INT_RX (RCV_INT | RCV_NOBUF | RCV_DROP)
 +#define DEF_INT_TX (TX_DROP)
  #define DEF_INT_MASK   (RCV_INT | DEF_INT_ERR)
  
  /* TX descriptor config */
 @@ -154,7 +156,6 @@ struct hip04_priv {
 unsigned int port;
 unsigned int speed;
 unsigned int duplex;
 -   unsigned int reg_inten;
  
 struct napi_struct napi;
 struct net_device *ndev;
 @@ -303,17 +304,15 @@ static void hip04_mac_enable(struct net_device *ndev)
 val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
 writel_relaxed(val, priv-base + GE_PORT_EN);
  
 -   /* clear rx int */
 -   val = RCV_INT;
 -   writel_relaxed(val, priv-base + PPE_RINT);
 +   /* clear prior interrupts */
 +   writel_relaxed(DEF_INT_MASK, priv-base + PPE_RINT);
  
 /* config recv int */
 val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
 writel_relaxed(val, priv-base + PPE_CFG_RX_PKT_INT);
  
 /* enable interrupt */
 -   priv-reg_inten = DEF_INT_MASK;
 -   writel_relaxed(priv-reg_inten, priv-base + PPE_INTEN);
 +   writel_relaxed(DEF_INT_MASK, priv-base + PPE_INTEN);
  }
  
  static void hip04_mac_disable(struct net_device *ndev)
 @@ -322,8 +321,7 @@ static void hip04_mac_disable(struct net_device *ndev)
 u32 val;
  
 /* disable int */
 -   priv-reg_inten = ~(DEF_INT_MASK);
 -   writel_relaxed(priv-reg_inten, priv-base + PPE_INTEN);
 +   writel_relaxed(0, priv-base + PPE_INTEN);
  
 /* disable tx  rx */
 val = readl_relaxed(priv-base + GE_PORT_EN);
 @@ -403,6 +401,8 @@ static int hip04_tx_reclaim(struct net_device

Re: [v3] skbuff: Do not scrub skb mark within the same name space

2015-04-16 Thread James Morris

On Thu, 16 Apr 2015, Herbert Xu wrote:

 On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
  Le 15/04/2015 15:57, Herbert Xu a écrit :
  On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
  [snip]
  Subject: skbuff: Do not scrub skb mark within the same name space
  
  The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 (tunnels:
  Maybe add a Fixes tag?
  Fixes: ea23192e8e57 (tunnels: harmonize cleanup done on skb on rx path)
  
  harmonize cleanup done on skb on rx path) broke anyone trying to
  use netfilter marking across IPv4 tunnels.  While most of the
  fields that are cleared by skb_scrub_packet don't matter, the
  netfilter mark must be preserved.
  
  This patch rearranges skb_scurb_packet to preserve the mark field.
  nit: s/scurb/scrub
  
  Else it's fine for me.
 
 Sure.
 
 PS I used the wrong email for James the first time around.  So
 let me repeat the question here.  Should secmark be preserved
 or cleared across tunnels within the same name space? In fact,
 do our security models even support name spaces?

They don't support namespaces, and maintaining the label is critical for 
SELinux, at least, which mediates security for the system as a whole.


-- 
James Morris
jmor...@namei.org

Re: Revert net: Reset secmark when scrubbing packet

2015-04-16 Thread Thomas Graf

On 04/16/15 at 04:12pm, Herbert Xu wrote:
 On Thu, Apr 16, 2015 at 05:02:15PM +1000, James Morris wrote:
  
  They don't support namespaces, and maintaining the label is critical for 
  SELinux, at least, which mediates security for the system as a whole.
 
 Thanks for the confirmation James, I thought this looked a bit
 dodgy :)
 
 ---8---
 This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602
 because the secmark must be preserved even when a packet crosses
 namespace boundaries.  The reason is that security labels apply to
 the system as a whole and is not per-namespace.

No objection to reverting, _BUT_ just because security labels
apply to the system as a whole does not mean that both the packet
in the underlay and overlay belong to the same context.

The point here was to not blindly inherit the security context of a
packet based on the outer or inner header. Someone tagging all
packets addressed to the host itself with a SElinux context may not
expect that SELinux context to be preserved into a namespaced tenant.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread George Dunlap

On 04/16/2015 10:20 AM, Daniel Borkmann wrote:
 So mid term, it would be much more beneficial if you attempt fix the
 underlying driver issues that actually cause high tx completion delays,
 instead of reintroducing bufferbloat. So that we all can move forward
 and not backwards in time.

Yes, I think we definitely see the need for this.  I think we certainly
agree that bufferbloat needs to be reduced, and minimizing the data we
need in the pipe for full performance on xennet is an important part
of that.

It should be said, however, that any virtual device is always going to
have higher latency than a physical device.  Hopefully we'll be able to
get the latency of xennet down to something that's more reasonable,
but it may just not be possible.  And in any case, if we're going to be
cranking down these limits to just barely within the tolerance of
physical NICs, virtual devices (either xennet or virtio_net) are never
going to be able to catch up.  (Without cheating that is.)

 What Eric described to you was that you introduce a new netdev member
 like netdev-needs_bufferbloat, set that indication from driver site,
 and cache that in the socket that binds to it, so you can adjust the
 test in tcp_xmit_size_goal(). It should merely be seen as a hint/indication
 for such devices. Hmm?

He suggested that after he'd been prodded by 4 more e-mails in which two
of us guessed what he was trying to get at.  That's what I was
complaining about.

Having a per-device long transmit latency hint sounds like a sensible
short-term solution to me.

 -George
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: make mandocs build failure with next-20150407

2015-04-16 Thread Jonathan Corbet

On Thu, 16 Apr 2015 08:33:15 +0900
Masanari Iida standby2...@gmail.com wrote:

 On Wed, Apr 8, 2015 at 7:41 AM, Jim Davis jim.ep...@gmail.com wrote:
DOCPROC Documentation/DocBook/80211.xml
  Error(.//include/net/mac80211.h:329): Cannot parse enum!
  Error(.//include/net/mac80211.h:367): Cannot parse enum!
  Warning(.//include/net/mac80211.h:381): No description found for
  parameter 'type'
  --  
 
 This error start to appear on Linus's tree.
 make xmldocs stops after this error.
 Then run it again,  make xmldocs process rest of the files.
 
 Add TO:  netdev and patch authors for reporting the issue.

It's the result of some incomplete changes in the wireless tree; I sent a
fix to Johannes that he applied a couple of days ago.  Once that hits
mainline the problem should go away.

Thanks,

jon
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread George Dunlap

On 04/15/2015 07:19 PM, Eric Dumazet wrote:
 On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote:
 
 Maybe you should stop wasting all of our time and just tell us what
 you're thinking.
 
 I think you make me wasting my time.
 
 I already gave all the hints in prior discussions.

Right, and I suggested these two options:

Obviously one solution would be to allow the drivers themselves to set
the tcp_limit_output_bytes, but that seems like a maintenance
nightmare.

Another simple solution would be to allow drivers to indicate whether
they have a high transmit latency, and have the kernel use a higher
value by default when that's the case. [1]

Neither of which you commented on.  Instead you pointed me to a comment
that only partially described what the limitations were. (I.e., it
described the two packets or 1ms, but not how they related, nor how
they related to the max of 2 64k packets outstanding of the default
tcp_limit_output_bytes setting.)

 -George


[1]
http://marc.info/?i=CAFLBxZYt7-v29ysm=f+5qmow64_qhesjzj98udba+1cs-pf...@mail.gmail.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread Daniel Borkmann


On 04/16/2015 10:56 AM, George Dunlap wrote:

On 04/15/2015 07:19 PM, Eric Dumazet wrote:

On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote:


Maybe you should stop wasting all of our time and just tell us what
you're thinking.


I think you make me wasting my time.

I already gave all the hints in prior discussions.


Right, and I suggested these two options:


So mid term, it would be much more beneficial if you attempt fix the
underlying driver issues that actually cause high tx completion delays,
instead of reintroducing bufferbloat. So that we all can move forward
and not backwards in time.


Obviously one solution would be to allow the drivers themselves to set
the tcp_limit_output_bytes, but that seems like a maintenance
nightmare.


Possible, but very hacky, as you penalize globally.


Another simple solution would be to allow drivers to indicate whether
they have a high transmit latency, and have the kernel use a higher
value by default when that's the case. [1]

Neither of which you commented on.  Instead you pointed me to a comment


What Eric described to you was that you introduce a new netdev member
like netdev-needs_bufferbloat, set that indication from driver site,
and cache that in the socket that binds to it, so you can adjust the
test in tcp_xmit_size_goal(). It should merely be seen as a hint/indication
for such devices. Hmm?


that only partially described what the limitations were. (I.e., it
described the two packets or 1ms, but not how they related, nor how
they related to the max of 2 64k packets outstanding of the default
tcp_limit_output_bytes setting.)

  -George

[1]
http://marc.info/?i=CAFLBxZYt7-v29ysm=f+5qmow64_qhesjzj98udba+1cs-pf...@mail.gmail.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next 4/6] net: hip04: Don't free the tx skb when the buffer is not ready for xmit

2015-04-16 Thread Ding Tianhong

On 2015/4/16 16:41, Arnd Bergmann wrote:
 On Thursday 16 April 2015 14:26:21 Ding Tianhong wrote:
 On 2015/4/15 22:19, Arnd Bergmann wrote:
 On Wednesday 15 April 2015 20:30:06 Ding Tianhong wrote:
 @@ -489,6 +487,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, 
 struct net_device *ndev)
  
 /* Ensure tx_head update visible to tx reclaim */
 smp_wmb();
 +   count++;
 +   priv-tx_head = TX_NEXT(tx_head);
  


 Something is wrong here, the comment does not match the code any more, 
 because
 the smp_wmb() won't actually make the tx_head update visible.

   Arnd

 Yes, the smp_wmb() could only make sure the tx buffer was ready to xmit.
 
 The problem you need to avoid is that the tx reclaim function thinks it's
 done with all outstanding packets and does not retrigger, while the
 start_xmit thinks it will still get to that. This means you
 need a barrier between the priv-tx_head update and the
 napi_schedule_prep() call.
 

But I still doubt about that, when the buffer is ready to xmit, it will add to 
fifo queue, but it
doesn't mean it is finished xmit yet, the tx reclaim function will free the skb 
only
depend on whether tx_head is updated.

Ding

   Arnd
 
 .
 


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 0/5] stmmac: Correct flow control configuration

2015-04-16 Thread Giuseppe CAVALLARO


Hi Vince

On 4/15/2015 6:17 PM, Vince Bridgers wrote:

This series of patches corrects flow control configuration for the Synopsys
GMAC driver.


Thx for these patches

For the series

Acked-by: Giuseppe Cavallaro peppe.cavall...@st.com

peppe



Flow control is configured based on a configurable receive fifo size. If
less than 4Kbytes flow control is left disabled and a warning is presented. If
a receive fifo size is not specified, flow control is left disabled to
maintain current behavior. Unicast pause detection was disabled, but is now
enabled. The pause time was modified to be maximum time per a XON/XOFF
flow control mode of operation.

This patch was tested on an Altera Cyclone 5 and an Altera Arria 10 devkit,
and verified that flow control operates as expected when enabled.

Please consider this series for inclusion so that flow control will
function as expected for the Synopsys GMAC controller.

Vince Bridgers (5):
   stmmac: Add properties for transmit and receive fifo sizes
   stmmac: Add defines and documentation for enabling flow control
   stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree
   stmmac: Enable unicast pause frame detect in GMAC Register 6
   stmmac: Configure Flow Control to work correctly based on rxfifo size

  Documentation/devicetree/bindings/net/ethernet.txt |  6 +++
  Documentation/devicetree/bindings/net/stmmac.txt   |  4 ++
  drivers/net/ethernet/stmicro/stmmac/common.h   |  5 ++-
  drivers/net/ethernet/stmicro/stmmac/dwmac1000.h| 51 ++
  .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   |  5 ++-
  .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c| 26 ++-
  drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |  2 +-
  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 16 ---
  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  4 ++
  include/linux/stmmac.h |  2 +
  10 files changed, 111 insertions(+), 10 deletions(-)



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [v3] skbuff: Do not scrub skb mark within the same name space

2015-04-16 Thread Herbert Xu

On Thu, Apr 16, 2015 at 09:35:31AM +0200, Nicolas Dichtel wrote:
 Herbert, could you send a v4 of your patch with the secmark included?

Actually this should go into a separate patch because it's a
straight revert and also that unlike the mark this should never
be cleared, even when you cross a namespace boundary.

But I will send a patch.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [v3] skbuff: Do not scrub skb mark within the same name space

2015-04-16 Thread Nicolas Dichtel


Le 16/04/2015 09:02, James Morris a écrit :

On Thu, 16 Apr 2015, Herbert Xu wrote:

[snip]

PS I used the wrong email for James the first time around.  So
let me repeat the question here.  Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?


They don't support namespaces, and maintaining the label is critical for
SELinux, at least, which mediates security for the system as a whole.

Herbert, could you send a v4 of your patch with the secmark included?
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Revert net: Reset secmark when scrubbing packet

2015-04-16 Thread Herbert Xu

On Thu, Apr 16, 2015 at 05:02:15PM +1000, James Morris wrote:
 
 They don't support namespaces, and maintaining the label is critical for 
 SELinux, at least, which mediates security for the system as a whole.

Thanks for the confirmation James, I thought this looked a bit
dodgy :)

---8---
This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602
because the secmark must be preserved even when a packet crosses
namespace boundaries.  The reason is that security labels apply to
the system as a whole and is not per-namespace.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a185427..d1967da 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4130,7 +4130,6 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
skb-ignore_df = 0;
skb_dst_drop(skb);
skb_sender_cpu_clear(skb);
-   skb_init_secmark(skb);
secpath_reset(skb);
nf_reset(skb);
nf_reset_trace(skb);
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [v3] skbuff: Do not scrub skb mark within the same name space

2015-04-16 Thread Thomas Graf

On 04/16/15 at 09:03am, Herbert Xu wrote:
 The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 (tunnels:
 harmonize cleanup done on skb on rx path) broke anyone trying to
 use netfilter marking across IPv4 tunnels.  While most of the
 fields that are cleared by skb_scrub_packet don't matter, the
 netfilter mark must be preserved.
 
 This patch rearranges skb_scrub_packet to preserve the mark field.
 
 Fixes: ea23192e8e57 (tunnels: harmonize cleanup done on skb on rx path)
 Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

Acked-by: Thomas Graf tg...@suug.ch

We should also add a flag to veth which expclitly allows to preserve
the mark into the namespace.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next 4/6] net: hip04: Don't free the tx skb when the buffer is not ready for xmit

2015-04-16 Thread Arnd Bergmann

On Thursday 16 April 2015 14:26:21 Ding Tianhong wrote:
 On 2015/4/15 22:19, Arnd Bergmann wrote:
  On Wednesday 15 April 2015 20:30:06 Ding Tianhong wrote:
  @@ -489,6 +487,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, 
  struct net_device *ndev)
   
  /* Ensure tx_head update visible to tx reclaim */
  smp_wmb();
  +   count++;
  +   priv-tx_head = TX_NEXT(tx_head);
   
 
  
  Something is wrong here, the comment does not match the code any more, 
  because
  the smp_wmb() won't actually make the tx_head update visible.
  
Arnd
  
 Yes, the smp_wmb() could only make sure the tx buffer was ready to xmit.

The problem you need to avoid is that the tx reclaim function thinks it's
done with all outstanding packets and does not retrigger, while the
start_xmit thinks it will still get to that. This means you
need a barrier between the priv-tx_head update and the
napi_schedule_prep() call.

Arnd
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread Ian Jackson

Prashant writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more 
messages]):
 Ian, using your config we are able to recreate the problem that you are 
 seeing. The driver finds the RX data buffer to be all zero, with a 
 analyzer trace we are seeing the chip is DMA'ing valid RX data buffer 
 contents to the host but once the driver tries to read this DMA area, it 
 is seeing all zero's which is the reason of the corruption. This is only 
 for the RX data buffer, the RX descriptor and status block update DMA 
 regions are having valid contents.

I am no expert on this area, but this suggests that the driver is
misoperating the Linux DMA management API.  This is what I think
Konrad suspected when he suggested the `iommu=soft swiotlb=force'
command line option.

Note in kernel-parameters.txt:

swiotlb=[ARM,IA-64,PPC,MIPS,X86]
Format: { int | force }
int -- Number of I/O TLB slabs
force -- force using of bounce buffers even if they
 wouldn't be automatically used by the kernel

So with `swiotlb=force' the DMA is _expected_ to go to a bounce buffer
managed by the kernel DMA API.

 This is unlikely to be a chip or driver issue, as the chip is doing the 
 correct DMA but the corruption occurs before driver reads it. Would 
 request iommu experts to take a look and suggest what can be done next.

As I say above I think this is probably a driver bug.

I have seen identical symptoms on a 5yo desktop box under my desk and
on two brand new rackmount servers; I therefore doubt that it's a
hardware problem.

Ian.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread David Laight

From: George Dunlap
 Sent: 16 April 2015 09:56
 On 04/15/2015 07:19 PM, Eric Dumazet wrote:
  On Wed, 2015-04-15 at 19:04 +0100, George Dunlap wrote:
 
  Maybe you should stop wasting all of our time and just tell us what
  you're thinking.
 
  I think you make me wasting my time.
 
  I already gave all the hints in prior discussions.
 
 Right, and I suggested these two options:
 
 Obviously one solution would be to allow the drivers themselves to set
 the tcp_limit_output_bytes, but that seems like a maintenance
 nightmare.
 
 Another simple solution would be to allow drivers to indicate whether
 they have a high transmit latency, and have the kernel use a higher
 value by default when that's the case. [1]
 
 Neither of which you commented on.  Instead you pointed me to a comment
 that only partially described what the limitations were. (I.e., it
 described the two packets or 1ms, but not how they related, nor how
 they related to the max of 2 64k packets outstanding of the default
 tcp_limit_output_bytes setting.)

ISTM that you are changing the wrong knob.
You need to change something that affects the global amount of pending tx data,
not the amount that can be buffered by a single connection.

If you change tcp_limit_output_bytes and then have 1000 connections trying
to send data you'll suffer 'bufferbloat'.

If you call skb_orphan() in the tx setup path then the total number of
buffers is limited, but a single connection can (and will) will the tx
ring leading to incorrect RTT calculations and additional latency for
other connections.
This will give high single connection throughput but isn't ideal.

One possibility might be to call skb_orphan() when enough time has
elapsed since the packet was queued for transmit that it is very likely
to have actually been transmitted - even though 'transmit done' has
not yet been signalled.
Not at all sure how this would fit in though...

David

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread Eric Dumazet

On Thu, 2015-04-16 at 12:39 +0100, George Dunlap wrote:
 On 04/15/2015 07:17 PM, Eric Dumazet wrote:
  Do not expect me to fight bufferbloat alone. Be part of the challenge,
  instead of trying to get back to proven bad solutions.
 
 I tried that.  I wrote a description of what I thought the situation
 was, so that you could correct me if my understanding was wrong, and
 then what I thought we could do about it.  You apparently didn't even
 read it, but just pointed me to a single cryptic comment that doesn't
 give me enough information to actually figure out what the situation is.
 
 We all agree that bufferbloat is a problem for everybody, and I can
 definitely understand the desire to actually make the situation better
 rather than dying the death of a thousand exceptions.
 
 If you want help fighting bufferbloat, you have to educate people to
 help you; or alternately, if you don't want to bother educating people,
 you have to fight it alone -- or lose the battle due to having a
 thousand exceptions.
 
 So, back to TSQ limits.  What's so magical about 2 packets being *in the
 device itself*?  And what does 1ms, or 2*64k packets (the default for
 tcp_limit_output_bytes), have anything to do with it?
 
 Your comment lists three benefits:
 1. better RTT estimation
 2. faster recovery
 3. high rates
 
 #3 is just marketing fluff; it's also contradicted by the statement that
 immediately follows it -- i.e., there are drivers for which the
 limitation does *not* give high rates.
 
 #1, as far as I can tell, has to do with measuring the *actual* minimal
 round trip time of an empty pipe, rather than the round trip time you
 get when there's 512MB of packets in the device buffer.  If a device has
 a large internal buffer, then having a large number of packets
 outstanding means that the measured RTT is skewed.
 
 The goal here, I take it, is to have this pipe *exactly* full; having
 it significantly more than full is what leads to bufferbloat.
 
 #2 sounds like you're saying that if there are too many packets
 outstanding when you discover that you need to adjust things, that it
 takes a long time for your changes to have an effect; i.e., if you have
 5ms of data in the pipe, it will take at least 5ms for your reduced
 transmmission rate to actually have an effect.
 
 Is that accurate, or have I misunderstood something?

#2 means that :

If you have an outstanding queue of 500 packets for a flow in qdisc.

A rtx has to be done, because we receive a SACK.

The rtx is queued _after_ the previous 500 packets.

500 packets have to be drained before rtx can be sent and eventually
reach destination.


These 500 packets will likely be dropped because the destination cannot
process them before the rtx.

2 TSO packets are already 90 packets (MSS=1448). It is not small, but a
good compromise allowing line rate even on 40Gbit NIC.


#1 is not marketing. It is hugely relevant.

You might use cubic as the default congestion control, you have to
understand we work hard on delay based cc, as losses are no longer a way
to measure congestion in modern networks.

Vegas and delay gradient congestion depends on precise RTT measures.

I added usec RTT estimations (instead of jiffies based rtt samples) to
increase resolution by 3 order of magnitude, not for marketing, but
because it had to be done when DC communications have typical rtt of 25
usec these days.

And jitter in host queues is not nice and must be kept at the minimum.

You do not have the whole picture, but this tight bufferbloat control is
one step before we can replace cubic by new upcoming cc, that many
companies are actively developing and testing.

The steps are the following :

1) TCP Small queues
2) FQ/pacing
3) TSO auto sizing
3) usec rtt estimations
4) New revolutionary cc module currently under test at Google,
   but others have alternatives.


The fact that few drivers have bugs should not stop this effort.

If you guys are in the Bay area, we would be happy to host a meeting
where we can present you how our work reduced packet drops in our
networks by 2 order of magnitude, and increased capacity by 40 or 50%.



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netns: deinline net_generic()

2015-04-16 Thread Eric Dumazet

On Thu, 2015-04-16 at 13:14 +0200, Denys Vlasenko wrote:

 However, without BUG_ONs, function is still a bit big
 on PREEMPT configs.

Only on allyesconfig builds, that nobody use but to prove some points
about code size.

If you look at net_generic(), it is mostly used from code that is
normally in 3 modules. How many people really load them ?

net/tipc : 91 call sites
net/sunrpc : 57
fs/nfsd  fs/lockd : 183

Then few remaining uses in tunnels.

As we suggested, please just remove the BUG_ON().

Then, if you can come up with a solution that does not slow down non
PREEMPT kernel, why not.



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Andrew Lunn

On Wed, Apr 15, 2015 at 10:12:42PM -0700, Guenter Roeck wrote:
 Return correct error code if _mv88e6xxx_reg_read returns an error.
 
 Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support)
 Signed-off-by: Guenter Roeck li...@roeck-us.net

Hi Guenter

Good catch. I'm surprised there is no compiler warning, possibly used
before set.

Reviewed-by: Andrew Lunn and...@lunn.ch

Thanks
Andrew

 ---
  drivers/net/dsa/mv88e6xxx.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
 index fc8d3b6ffe8e..f64186a5f634 100644
 --- a/drivers/net/dsa/mv88e6xxx.c
 +++ b/drivers/net/dsa/mv88e6xxx.c
 @@ -908,8 +908,10 @@ static int mv88e6xxx_set_port_state(struct dsa_switch 
 *ds, int port, u8 state)
   mutex_lock(ps-smi_mutex);
  
   reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_CONTROL);
 - if (reg  0)
 + if (reg  0) {
 + ret = reg;
   goto abort;
 + }
  
   oldstate = reg  PORT_CONTROL_STATE_MASK;
   if (oldstate != state) {
 -- 
 2.1.0
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Hannes Frederic Sowa

On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote:
 On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote:
 
  Netfilter may change the contents of the packet, even change its size.
  It is *really* hard to do this while keeping the original fragments
  intact.
 
 Perhaps we should provide better helpers to facilitate this?
 
 So instead of directly manipulating the content of the skb you
 would so so through helpers and the helpers can then try to do
 sensible things with the fragments.

When Florian and me started discussing how to solve this we also wanted
to be as transparent as possible. But looking at all possible
fragmentation scenarios, this seems to be too complicated.

Even imagine a fragment with overlapping offsets and some of the
fragments got duplicated. If we had to keep this in frag_list and now
netfilter has to change any of this contents, this will become a total
mess, like changing one port in multiple skbs at different offsets.

I doubt it is worth the effort.

Thanks,
Hannes
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netns: deinline net_generic()

2015-04-16 Thread Denys Vlasenko

Hi David, Eric,

As you may have surmised, this patch wasn't a result of me looking
at networking code; rather, it is a result of semi-automated search
for huge inlines.

The last step of this process would be the submission of patches.
I am expecting a range of responses from maintainers: enthusiastic,
no reply, go away, I don't care about code size,
and everything in between.

I can't invest a large amount of effort trying to push
every deinlining patch through. If someone, for example, would
flat out refuse to fix a huge inline in his driver, so be it.

I will be happy to run at least a few iterations over a patch
if maintainers do see a merit in deinlining, but have some
reservations wrt performance.

Your response falls into the latter case (you aren't refusing the patch
outright, but want it to be changed in some way).

It would help if you tell me how I should change the patches.

(I also hope to have a semi-generic way of addressing
performance concerns for future deinlining
patch submissions.)


On 04/14/2015 08:37 PM, David Miller wrote:
 From: Denys Vlasenko dvlas...@redhat.com
 Date: Tue, 14 Apr 2015 14:25:11 +0200
 
 On x86 allyesconfig build:
 The function compiles to 130 bytes of machine code.
 It has 493 callsites.
 Total reduction of vmlinux size: 27906 bytes.

text  data  bss   dec hex filename
 82447071 22255384 20627456 125329911 77861f7 vmlinux4
 82419165 22255384 20627456 125302005 777f4f5 vmlinux5

 Signed-off-by: Denys Vlasenko dvlas...@redhat.com
 
 That BUG_ON() was added 7 years ago, and I don't remember it ever
 triggering or helping us diagnose something, so just remove it and
 keep the function inlined.

There are two BUG_ONs. I'll remove both of them in v2.
Let me know if you want something else.

However, without BUG_ONs, function is still a bit big
on PREEMPT configs.

Among almost 500 users, many probably aren't hot paths.

How about having an inlined version, say, fast_net_generic(),
and a deinlined one, and using one or another as appropriate.
Is this ok with you?


On 04/14/2015 03:19 PM, Eric Dumazet wrote:
 On Tue, 2015-04-14 at 14:25 +0200, Denys Vlasenko wrote:
 On x86 allyesconfig build:
 The function compiles to 130 bytes of machine code.
 It has 493 callsites.
 Total reduction of vmlinux size: 27906 bytes.

text   data  bss   dec hex filename
 82447071 22255384 20627456 125329911 77861f7 vmlinux4
 82419165 22255384 20627456 125302005 777f4f5 vmlinux5

 This sounds a big hammer to me.

 These savings probably comes from the BUG_ON() that could simply be
 removed.
 The second one for sure has no purpose. First one looks defensive.

 For a typical (non allyesconfig) kernel, net_generic() would translate
 to :

 return net-gen[id - 1]

 Tunnels need this in fast path, so I presume we could introduce
 net_generic_rcu() to keep this stuff inlined where it matters.

 static inline void *net_generic_rcu(const struct net *net, int id)
 {
   struct net_generic *ng = rcu_dereference(net-gen);

   return ng-ptr[id - 1];
 }

I looked at the code and I'm not feeling confident enough
to find places where replacing net_generic() with
net_generic_rcu() is okay.

Would it be okay if I add net_generic_rcu() as you suggest,
but leave it to you (network people) to switch to it where
appropriate?

-- 
vda
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH linux-next 1/4] infiniband/ipoib: fix possible NULL pointer dereference in ipoib_get_iflink

2015-04-16 Thread Erez Shitrit

On Wed, Apr 15, 2015 at 7:06 PM, Jason Gunthorpe
jguntho...@obsidianresearch.com wrote:
 On Wed, Apr 15, 2015 at 09:17:14AM +0300, Erez Shitrit wrote:
 +   /* parent interface */
 +   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
 +   return dev-ifindex;
 +
 +   /* child/vlan interface */
 +   if (!priv-parent)
 +   return -1;

 Like was said for other drivers, I can't see how parent can be null
 while IPOIB_FLAG_SUBINTERFACE is set. Drop the last if.

 It can, at least for ipoib child interface (AKA vlan), you can't
 control the call for that ndo and it can be called before the parent
 was set.

 If the ndo can be called before the netdev private structures are fully
 prepared then we have another bug, and returning -1 or 0 is not the right
 answer anyhow.

 For safety, fold this into your patch.

OK, will do that.


 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 index 9fad7b5ac8b9..e62b007adf5d 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
 /* MTU will be reset when mcast join happens */
 priv-dev-mtu   = IPOIB_UD_MTU(priv-max_ib_mtu);
 priv-mcast_mtu  = priv-admin_mtu = priv-dev-mtu;
 +   priv-parent = ppriv-dev;
 set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags);

 result = ipoib_set_dev_features(priv, ppriv-ca);
 @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
 goto register_failed;
 }

 -   priv-parent = ppriv-dev;
 -
 ipoib_create_debug_files(priv-dev);

 /* RTNL childs don't need proprietary sysfs entries */
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

pppoe relay and MAC address filtering

2015-04-16 Thread Joakim Tjernlund

I have create pppoe session over a pppoe relay socket:
A B and C are Linux nodes, B impl. a relay socket so
that A and B can create a pppoe session:

A ifA-ifB0 B ifB1--ifC C

Now I noticed that if ifB0 is in promisc mode it picks up other pppoe pkgs which
are meant for some other pppoe session to another node(D) and relays this
pkg to node C!

To me this looks like the pppoe relay socket does not check if the DST MAC on
pppoe pkgs received over ifB0 matches ifB0 MAC address?
Instead the relay socket happily relays any pppoe pkg as long as the session id 
matches.

This feels like a bug to me, comments?

  Jocke--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread cascardo

On Thu, Apr 16, 2015 at 11:18:39AM +0100, Ian Jackson wrote:
 Prashant writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more 
 messages]):
  Ian, using your config we are able to recreate the problem that you are 
  seeing. The driver finds the RX data buffer to be all zero, with a 
  analyzer trace we are seeing the chip is DMA'ing valid RX data buffer 
  contents to the host but once the driver tries to read this DMA area, it 
  is seeing all zero's which is the reason of the corruption. This is only 
  for the RX data buffer, the RX descriptor and status block update DMA 
  regions are having valid contents.
 
 I am no expert on this area, but this suggests that the driver is
 misoperating the Linux DMA management API.  This is what I think
 Konrad suspected when he suggested the `iommu=soft swiotlb=force'
 command line option.
 
 Note in kernel-parameters.txt:
 
 swiotlb=[ARM,IA-64,PPC,MIPS,X86]
 Format: { int | force }
 int -- Number of I/O TLB slabs
 force -- force using of bounce buffers even if they
  wouldn't be automatically used by the kernel
 
 So with `swiotlb=force' the DMA is _expected_ to go to a bounce buffer
 managed by the kernel DMA API.
 
  This is unlikely to be a chip or driver issue, as the chip is doing the 
  correct DMA but the corruption occurs before driver reads it. Would 
  request iommu experts to take a look and suggest what can be done next.
 
 As I say above I think this is probably a driver bug.
 

Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
supposed to synchronize as well.

Prashant, can you point to where in the code you see all zeroes after
checking up the data?

Cascardo.

 I have seen identical symptoms on a 5yo desktop box under my desk and
 on two brand new rackmount servers; I therefore doubt that it's a
 hardware problem.
 
 Ian.
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread Eric Dumazet

On Thu, 2015-04-16 at 11:01 +0100, George Dunlap wrote:

 He suggested that after he'd been prodded by 4 more e-mails in which two
 of us guessed what he was trying to get at.  That's what I was
 complaining about.

My big complain is that I suggested to test to double the sysctl, which
gave good results.

Then you provided a patch using a 8x factor. How does that sound ?

Next time I ask a raise, I should try a 8x factor as well, who knows,
it might be accepted.



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] rocker: fix error return code in rocker_probe()

2015-04-16 Thread Jiri Pirko

Thu, Apr 16, 2015 at 02:21:02PM CEST, weiyj...@163.com wrote:
From: Wei Yongjun yongjun_...@trendmicro.com.cn

Fix to return -EINVAL from the invalid PCI region size error
handling case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn


Good catch.

Acked-by: Jiri Pirko j...@resnulli.us
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] mac802154: llsec: fix return value check in llsec_key_alloc()

2015-04-16 Thread weiyj_lk

From: Wei Yongjun yongjun_...@trendmicro.com.cn

In case of error, the functions crypto_alloc_aead() and crypto_alloc_blkcipher()
returns ERR_PTR() and never returns NULL. The NULL test in the return value 
check
should be replaced with IS_ERR().

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 net/mac802154/llsec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac802154/llsec.c b/net/mac802154/llsec.c
index dcf7395..5b2be12 100644
--- a/net/mac802154/llsec.c
+++ b/net/mac802154/llsec.c
@@ -134,7 +134,7 @@ llsec_key_alloc(const struct ieee802154_llsec_key *template)
for (i = 0; i  ARRAY_SIZE(key-tfm); i++) {
key-tfm[i] = crypto_alloc_aead(ccm(aes), 0,
CRYPTO_ALG_ASYNC);
-   if (!key-tfm[i])
+   if (IS_ERR(key-tfm[i]))
goto err_tfm;
if (crypto_aead_setkey(key-tfm[i], template-key,
   IEEE802154_LLSEC_KEY_SIZE))
@@ -144,7 +144,7 @@ llsec_key_alloc(const struct ieee802154_llsec_key *template)
}
 
key-tfm0 = crypto_alloc_blkcipher(ctr(aes), 0, CRYPTO_ALG_ASYNC);
-   if (!key-tfm0)
+   if (IS_ERR(key-tfm0))
goto err_tfm;
 
if (crypto_blkcipher_setkey(key-tfm0, template-key,

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread George Dunlap

On Thu, Apr 16, 2015 at 10:22 AM, David Laight david.lai...@aculab.com wrote:
 ISTM that you are changing the wrong knob.
 You need to change something that affects the global amount of pending tx 
 data,
 not the amount that can be buffered by a single connection.

Well it seems like the problem is that the global amount of pending tx
data is high enough, but that the per-stream amount is too low for
only a single stream.

 If you change tcp_limit_output_bytes and then have 1000 connections trying
 to send data you'll suffer 'bufferbloat'.

Right -- so are you worried about the buffers in the local device
here, or are you worried about buffers elsewhere in the network?

If you're worried about buffers on the local device, don't you have a
similar problem for physical NICs?  i.e., if a NIC has a big buffer
that you're trying to keep mostly empty, limiting a single TCP stream
may keep that buffer empty, but if you have 1000 connections,
1000*limit will still fill up the buffer.

Or am I missing something?

 If you call skb_orphan() in the tx setup path then the total number of
 buffers is limited, but a single connection can (and will) will the tx
 ring leading to incorrect RTT calculations and additional latency for
 other connections.
 This will give high single connection throughput but isn't ideal.

 One possibility might be to call skb_orphan() when enough time has
 elapsed since the packet was queued for transmit that it is very likely
 to have actually been transmitted - even though 'transmit done' has
 not yet been signalled.
 Not at all sure how this would fit in though...

Right -- so it sounds like the problem with skb_orphan() is making
sure that the tx ring is shared properly between different streams.
That would mean that ideally we wouldn't call it until the tx ring
actually had space to add more packets onto it.

The Xen project is having a sort of developer meeting in a few weeks;
if we can get a good picture of all the constraints, maybe we can hash
out a solution that works for everyone.

 -George
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH -next] ethernet: remove unused including linux/version.h

2015-04-16 Thread weiyj_lk

From: Wei Yongjun yongjun_...@trendmicro.com.cn

Remove including linux/version.h that don't need it.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/net/ethernet/qualcomm/qca_spi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index 4a42e96..f66641d 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -41,7 +41,6 @@
 #include linux/skbuff.h
 #include linux/spi/spi.h
 #include linux/types.h
-#include linux/version.h
 
 #include qca_7k.h
 #include qca_debug.h

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread George Dunlap

On 04/15/2015 07:17 PM, Eric Dumazet wrote:
 Do not expect me to fight bufferbloat alone. Be part of the challenge,
 instead of trying to get back to proven bad solutions.

I tried that.  I wrote a description of what I thought the situation
was, so that you could correct me if my understanding was wrong, and
then what I thought we could do about it.  You apparently didn't even
read it, but just pointed me to a single cryptic comment that doesn't
give me enough information to actually figure out what the situation is.

We all agree that bufferbloat is a problem for everybody, and I can
definitely understand the desire to actually make the situation better
rather than dying the death of a thousand exceptions.

If you want help fighting bufferbloat, you have to educate people to
help you; or alternately, if you don't want to bother educating people,
you have to fight it alone -- or lose the battle due to having a
thousand exceptions.

So, back to TSQ limits.  What's so magical about 2 packets being *in the
device itself*?  And what does 1ms, or 2*64k packets (the default for
tcp_limit_output_bytes), have anything to do with it?

Your comment lists three benefits:
1. better RTT estimation
2. faster recovery
3. high rates

#3 is just marketing fluff; it's also contradicted by the statement that
immediately follows it -- i.e., there are drivers for which the
limitation does *not* give high rates.

#1, as far as I can tell, has to do with measuring the *actual* minimal
round trip time of an empty pipe, rather than the round trip time you
get when there's 512MB of packets in the device buffer.  If a device has
a large internal buffer, then having a large number of packets
outstanding means that the measured RTT is skewed.

The goal here, I take it, is to have this pipe *exactly* full; having
it significantly more than full is what leads to bufferbloat.

#2 sounds like you're saying that if there are too many packets
outstanding when you discover that you need to adjust things, that it
takes a long time for your changes to have an effect; i.e., if you have
5ms of data in the pipe, it will take at least 5ms for your reduced
transmmission rate to actually have an effect.

Is that accurate, or have I misunderstood something?

 -George
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH -next] fou: Fix missing unlock on error in fou_nl_dump()

2015-04-16 Thread weiyj_lk

From: Wei Yongjun yongjun_...@trendmicro.com.cn

Add the missing unlock before return from function fou_nl_dump()
in the error handling case.

Fixes: 7a6c8c34e5b7 (fou: implement FOU_CMD_GET)
Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 net/ipv4/fou.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index af150b4..6ca0182 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -711,7 +711,7 @@ static int fou_nl_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
cb-nlh-nlmsg_seq, NLM_F_MULTI,
skb, FOU_CMD_GET);
if (ret)
-   goto done;
+   break;
}
mutex_unlock(fn-fou_lock);
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] rocker: fix error return code in rocker_probe()

2015-04-16 Thread weiyj_lk

From: Wei Yongjun yongjun_...@trendmicro.com.cn

Fix to return -EINVAL from the invalid PCI region size error
handling case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/net/ethernet/rocker/rocker.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 5cecec2..0cc17f5 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4293,6 +4293,7 @@ static int rocker_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
 
if (pci_resource_len(pdev, 0)  ROCKER_PCI_BAR0_SIZE) {
dev_err(pdev-dev, invalid PCI region size\n);
+   err = -EINVAL;
goto err_pci_resource_len_check;
}
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Guenter Roeck


On 04/16/2015 05:37 AM, Andrew Lunn wrote:

On Wed, Apr 15, 2015 at 10:12:42PM -0700, Guenter Roeck wrote:

Return correct error code if _mv88e6xxx_reg_read returns an error.

Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support)
Signed-off-by: Guenter Roeck li...@roeck-us.net


Hi Guenter

Good catch. I'm surprised there is no compiler warning, possibly used
before set.


There was, actually. No idea how I missed that. Fengguang's build bots caught 
it.


Reviewed-by: Andrew Lunn and...@lunn.ch


Thanks!

Guenter

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Guenter Roeck


On 04/15/2015 10:12 PM, Guenter Roeck wrote:

Return correct error code if _mv88e6xxx_reg_read returns an error.

Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support)
Signed-off-by: Guenter Roeck li...@roeck-us.net


I should have given proper credit.

Reported-by: kbuild test robot fengguang...@intel.com

For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem,
at least not with gcc 4.9.1.

Guenter

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] dsa: mv88e6xxx: Drop duplicate declaration of 'ret' variable

2015-04-16 Thread Guenter Roeck

A duplicate declaration of 'ret' can result in hiding an error code.
Drop it.

Fixes: 17ee3e04ddbf (net: dsa: Provide additional RMON statistics)
Signed-off-by: Guenter Roeck li...@roeck-us.net
---
Found by compiling the file with W=2.

 drivers/net/dsa/mv88e6xxx.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index f64186a5f634..12f186cb30cc 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -602,8 +602,6 @@ static void _mv88e6xxx_get_ethtool_stats(struct dsa_switch 
*ds,
u32 high = 0;
 
if (s-reg = 0x100) {
-   int ret;
-
ret = mv88e6xxx_reg_read(ds, REG_PORT(port),
 s-reg - 0x100);
if (ret  0)
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netns: deinline net_generic()

2015-04-16 Thread David Miller

From: Denys Vlasenko dvlas...@redhat.com
Date: Thu, 16 Apr 2015 13:14:14 +0200

 It would help if you tell me how I should change the patches.

Why ask Eric when I told you exactly how to change the patch to make
it acceptable, so please do so.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread David Miller

From: Hannes Frederic Sowa han...@stressinduktion.org
Date: Thu, 16 Apr 2015 14:11:42 +0200

 On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote:
 On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote:

  Netfilter may change the contents of the packet, even change its size.
  It is *really* hard to do this while keeping the original fragments
  intact.

 Perhaps we should provide better helpers to facilitate this?

 So instead of directly manipulating the content of the skb you
 would so so through helpers and the helpers can then try to do
 sensible things with the fragments.

 When Florian and me started discussing how to solve this we also wanted
 to be as transparent as possible. But looking at all possible
 fragmentation scenarios, this seems to be too complicated.

 Even imagine a fragment with overlapping offsets and some of the
 fragments got duplicated. If we had to keep this in frag_list and now
 netfilter has to change any of this contents, this will become a total
 mess, like changing one port in multiple skbs at different offsets.

 I doubt it is worth the effort.

You guys keep talking about exceptional cases rather than what is
unquestionable the common case, and the one worth handling in the
fast paths.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Jason Gunthorpe

On Thu, Apr 16, 2015 at 04:34:34PM +0300, Erez Shitrit wrote:
 Currently, iflink of the parent interface was always accessed, even 
 when interface didn't have a parent and hence we crashed there.
 
 Handle the interface types properly: for a child interface, return
 the ifindex of the parent, for parent interface, return its ifindex.
 
 For child devices, make sure to set the parent pointer prior to
 invoking register_netdevice(), this allows the new ndo to be called
 by the stack immediately after the child device is registered.
 
 Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
 Reported-by: Honggang Li ho...@redhat.com
 Signed-off-by: Erez Shitrit ere...@mellanox.com
 Signed-off-by: Honggang Li ho...@redhat.com

Reviewed-By: Jason Gunthorpe jguntho...@obsidianresearch.com+

 changes from V0:
  - fixed two typos in the change-log
 
  drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
  drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
  2 files changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 index 657b89b..915ad04 100644
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
  {
   struct ipoib_dev_priv *priv = netdev_priv(dev);
  
 + /* parent interface */
 + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
 + return dev-ifindex;
 +
 + /* child/vlan interface */
   return priv-parent-ifindex;
  }
  
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 index 4dd1313..fca1a88 100644
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   /* MTU will be reset when mcast join happens */
   priv-dev-mtu   = IPOIB_UD_MTU(priv-max_ib_mtu);
   priv-mcast_mtu  = priv-admin_mtu = priv-dev-mtu;
 + priv-parent = ppriv-dev;
   set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags);
  
   result = ipoib_set_dev_features(priv, ppriv-ca);
 @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   goto register_failed;
   }
  
 - priv-parent = ppriv-dev;
 -
   ipoib_create_debug_files(priv-dev);
  
   /* RTNL childs don't need proprietary sysfs entries */
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH stable 3.10-3.16] tcp: Fix crash in TCP Fast Open

2015-04-16 Thread Luis Henriques

On Wed, Apr 15, 2015 at 07:00:32PM +0100, Ben Hutchings wrote:
 Commit 355a901e6cf1 (tcp: make connect() mem charging friendly)
 changed tcp_send_syn_data() to perform an open-coded copy of the 'syn'
 skb rather than using skb_copy_expand().
 
 The open-coded copy does not cover the skb_shared_info::gso_segs
 field, so in the new skb it is left set to 0.  When this commit was
 backported into stable branches between 3.10.y and 3.16.7-ckty
 inclusive, it triggered the BUG() in tcp_transmit_skb().
 
 Since Linux 3.18 the GSO segment count is kept in the
 tcp_skb_cb::tcp_gso_segs field and tcp_send_syn_data() does copy the
 tcp_skb_cb structure to the new skb, so mainline and newer stable
 branches are not affected.
 
 Set skb_shared_info::gso_segs to the correct value of 1.
 
 Signed-off-by: Ben Hutchings b...@decadent.org.uk

Thanks a lot, Ben.  I'll queue this for the next 3.16 kernel release.

Cheers,
--
Luís

 ---
  net/ipv4/tcp_output.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
 index d5457e4..1ea0a07 100644
 --- a/net/ipv4/tcp_output.c
 +++ b/net/ipv4/tcp_output.c
 @@ -2992,6 +2992,7 @@ static int tcp_send_syn_data(struct sock *sk, struct 
 sk_buff *syn)
   goto fallback;
   syn_data-ip_summed = CHECKSUM_PARTIAL;
   memcpy(syn_data-cb, syn-cb, sizeof(syn-cb));
 + skb_shinfo(syn_data)-gso_segs = 1;
   if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
fo-data-msg_iov, 0, space))) {
   kfree_skb(syn_data);
 
 -- 
 Ben Hutchings
 Editing code like this is akin to sticking plasters on the bleeding stump
 of a severed limb. - me, 29 June 1999

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread David Miller

From: Patrick McHardy ka...@trash.net
Date: Thu, 16 Apr 2015 06:24:00 +0100

 On 16.04, Herbert Xu wrote:
 David Miller da...@davemloft.net wrote:

  Because then there is no ambiguity at all, you preserve on output
  exactly what you had on input.  The same geometry, the same
  everything.  No special checks, no max frag len, none of this crap.
  Those are all hacks trying to work around the _fundamental_ issue
  which is that we potentially change the thing when we refrag.

 Agreed.  Doing anything other than preserving the original geometry
 is simply wrong.

 However, this doesn't mean that netfilter has to process each
 fragment.  What we could do is to preserve the original fragments
 in frag_list and then process the overall skb as a unit in netfilter.

 On output we simply fragment according to the original frag_list.

 The only thing to watch out for is to eliminate anything in the
 middle that tries to linearise the skb.

 Netfilter may change the contents of the packet, even change its size.
 It is *really* hard to do this while keeping the original fragments
 intact.

I keep hearing a lot of it's hard as the only reason we shouldn't do
this properly, and that frankly sucks.  People aren't looking for a
solution and to be honest it's quite tiring.

The common case is that the rules processed are simple, the size of
the overall packet does _not_ change, and therefore the best thing
to do is pass the entire thing as a unit with the frags in tact.

That's the fundamental fact.  It's also the fastest way to process
these packets and avoids all of these stupid max frag garbage.

Only at the point where netfilter makes changes to the size of the
packet does it take ownership and get to take on the responsibility
of making sure the new resulting fragments are sane.

But only at that point.

It must not molest the geometry in any other set of circumstances,
and I absolutely will not relent on this.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Hannes Frederic Sowa

Hi David,

On Thu, Apr 16, 2015, at 17:43, David Miller wrote:
 From: Hannes Frederic Sowa han...@stressinduktion.org
 Date: Thu, 16 Apr 2015 14:11:42 +0200

  On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote:
  On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote:

   Netfilter may change the contents of the packet, even change its size.
   It is *really* hard to do this while keeping the original fragments
   intact.

  Perhaps we should provide better helpers to facilitate this?

  So instead of directly manipulating the content of the skb you
  would so so through helpers and the helpers can then try to do
  sensible things with the fragments.

  When Florian and me started discussing how to solve this we also wanted
  to be as transparent as possible. But looking at all possible
  fragmentation scenarios, this seems to be too complicated.

  Even imagine a fragment with overlapping offsets and some of the
  fragments got duplicated. If we had to keep this in frag_list and now
  netfilter has to change any of this contents, this will become a total
  mess, like changing one port in multiple skbs at different offsets.

  I doubt it is worth the effort.

 You guys keep talking about exceptional cases rather than what is
 unquestionable the common case, and the one worth handling in the
 fast paths.

So currently we have one fast path, that is: we are not fragmented, we
get out non-fragmented, none of this code is ever touched and no
problem.

We don't want to mak this more complex, but

every other solution would need to first expand sk_buff with a new
member(!), then:
  1) push the original fragments through netfilter - I think this will
  definitely break user space compatibility
  2) add new wrappers into the fast path which now will make sure that
  we access the correct skb from the frag_list instead of just
  dereference the pointer - we will make fast path also more slowly
  and a lot more complex
  3) use the reassembled skb for netfilter and emit the original ones -
  seems like a security hazard to me, discrepancy what is 
  checked and what is used
  4) just use the frag_list skb-len's to restore the reassembled skb to
  the original sizes, this would be possible but we would still harm 
  fast-path with that (new skb member) we could switch to
  dynamically allocated array of int[] to store sizes and DF bit
  information
  In particular, even this would not achieve the goal to have
  perfect transparency.

I can give you a more fundamental problem:
We cannot even be transparent if asymmetric routing is in effect and we
don't see all fragments. In case we have contrack rules active, we clear
fragment queues after some time. So a bridge actually will toss
offset!=0 packets if it never sees the head. If we let the packet pass,
we might have a security problem. We can never become fully transparent.

Bye,
Hannes
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread David Miller

From: Guenter Roeck li...@roeck-us.net
Date: Wed, 15 Apr 2015 22:12:42 -0700

 Return correct error code if _mv88e6xxx_reg_read returns an error.
 
 Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging support)
 Signed-off-by: Guenter Roeck li...@roeck-us.net

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next] netns: remove duplicated include from net_namespace.c

2015-04-16 Thread David Miller

From: weiyj...@163.com
Date: Thu, 16 Apr 2015 21:17:35 +0800

 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 Remove duplicated include.
 
 Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] dsa: mv88e6xxx: Drop duplicate declaration of 'ret' variable

2015-04-16 Thread David Miller

From: Guenter Roeck li...@roeck-us.net
Date: Thu, 16 Apr 2015 06:49:50 -0700

 A duplicate declaration of 'ret' can result in hiding an error code.
 Drop it.
 
 Fixes: 17ee3e04ddbf (net: dsa: Provide additional RMON statistics)
 Signed-off-by: Guenter Roeck li...@roeck-us.net

Applied, thank you.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next] ethernet: remove unused including linux/version.h

2015-04-16 Thread David Miller

From: weiyj...@163.com
Date: Thu, 16 Apr 2015 21:06:47 +0800

 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 Remove including linux/version.h that don't need it.
 
 Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] altera tse: Fix network-delays and -retransmissions after high throughput.

2015-04-16 Thread David Miller

From: Andreas Oetken ennoerlan...@googlemail.com
Date: Tue, 14 Apr 2015 22:25:26 +0200

 From: Andreas Oetken andr...@oetken.name

 Fix bug which occurs when more than limit packets are available during 
 napi-poll,
 leading to delays and retransmissions on the network.

 Check for (count  limit) before checking the get_rx_status in 
 tse_rx-function. get_rx_status is reading from the response-fifo.
 If there is currently a response in the fifo, reading the last byte of the 
 response pops the value from the fifo.
 If the limit is check as second condition and the limit is reached the fifo 
 is popped but the packet is not processed.

 Signed-off-by: Andreas Oetken ennoerlan...@gmail.com

Your commit message lines are very long, format them to 80 columns
or less.

 - while (((rxstatus = priv-dmaops-get_rx_status(priv)) != 0) 
 -(count  limit))  {
 + /* Check for limit first. get_rx_status is reading the from the
 + * response-fifo. If there is a currently a response in the fifo,
 + * reading the last byte of the response pops the value from the fifo.
 + */
 + while ((count  limit) 
 + ((rxstatus = priv-dmaops-get_rx_status(priv)) != 0)) {

You're corrupted the indentation.

The second line of this while statement should start precisely at
the first column after the openning parenthesis of the first line.
You must use the appropriate number of TAB then SPACE characters
necessary to achieve this.

If you are purely using TAB characters, as you have done here, for
indentation then it is very likely that you have indented things
incorrectly.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 1/1] cxgb4: drop __GFP_NOFAIL allocation

2015-04-16 Thread David Miller

From: a...@linux-foundation.org
Date: Tue, 14 Apr 2015 13:24:33 -0700

 From: Michal Hocko mho...@suse.cz
 Subject: cxgb4: drop __GFP_NOFAIL allocation

 set_filter_wr is requesting __GFP_NOFAIL allocation although it can return
 ENOMEM without any problems obviously (t4_l2t_set_switching does that
 already).  So the non-failing requirement is too strong without any
 obvious reason.  Drop __GFP_NOFAIL and reorganize the code to have the
 failure paths easier.

 The same applies to _c4iw_write_mem_dma_aligned which uses __GFP_NOFAIL
 and then checks the return value and returns -ENOMEM on failure.  This
 doesn't make any sense what so ever.  Either the allocation cannot fail or
 it can.

 del_filter_wr seems to be safe as well because the filter entry is not
 marked as pending and the return value is propagated up the stack up to
 c4iw_destroy_listen.

 Signed-off-by: Michal Hocko mho...@suse.cz
 ...
 Signed-off-by: Andrew Morton a...@linux-foundation.org

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 03/39] net: sched: Use hrtimer_resolution instead of hrtimer_get_res()

2015-04-16 Thread David Miller

From: Thomas Gleixner t...@linutronix.de
Date: Tue, 14 Apr 2015 21:08:28 -

 No point in converting a timespec now that the value is directly
 accessible.
 
 Signed-off-by: Thomas Gleixner t...@linutronix.de

Acked-by: David S. Miller da...@davemloft.net
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 31/39] net: core: pktgen: Remove bogus hrtimer_active() check

2015-04-16 Thread David Miller

From: Thomas Gleixner t...@linutronix.de
Date: Tue, 14 Apr 2015 21:09:16 -

 The check for hrtimer_active() after starting the timer is
 pointless. If the timer is inactive it has expired already and
 therefor the task pointer is already NULL.

 Signed-off-by: Thomas Gleixner t...@linutronix.de

Acked-by: David S. Miller da...@davemloft.net
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net] bpf: fix verifier memory corruption

2015-04-16 Thread David Miller

From: Alexei Starovoitov a...@plumgrid.com
Date: Tue, 14 Apr 2015 15:57:13 -0700

 Due to missing bounds check the DAG pass of the BPF verifier can corrupt
 the memory which can cause random crashes during program loading:
 
 [8.449451] BUG: unable to handle kernel paging request at 
 [8.451293] IP: [811de33d] kmem_cache_alloc_trace+0x8d/0x2f0
 [8.452329] Oops:  [#1] SMP
 [8.452329] Call Trace:
 [8.452329]  [8116cc82] bpf_check+0x852/0x2000
 [8.452329]  [8116b7e4] bpf_prog_load+0x1e4/0x310
 [8.452329]  [811b190f] ? might_fault+0x5f/0xb0
 [8.452329]  [8116c206] SyS_bpf+0x806/0xa30
 
 Fixes: f1bca824dabb (bpf: add search pruning optimization to verifier)
 Signed-off-by: Alexei Starovoitov a...@plumgrid.com

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread David Miller

From: Bryan O'Donoghue pure.lo...@nexus-software.ie
Date: Wed, 15 Apr 2015 02:07:46 +0100

 An oops exists in the flow of stmmac_release().
 phy_ethtool_get_wol() depends on phydev-drv.
 phydev-drv will be null after stmmac_mdio_unreg() completes.
 
 Steps to reproduce on Quark X1000:
 
 1. ifconfig eth0 192.168.0.1
 2. rmmod stmmac_pci
 
 To fix this stmmac_mdio_unreg() should be run after unregister_netdev().
 
 Signed-off-by: Bryan O'Donoghue pure.lo...@nexus-software.ie
 Reported-by: Dan O'Donovan dan.odono...@emutex.com

This patch does not apply to the current tree, please respin.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Patch net-next v2] fou: avoid missing unlock in failure path

2015-04-16 Thread David Miller

From: Cong Wang xiyou.wangc...@gmail.com
Date: Wed, 15 Apr 2015 11:48:49 -0700

 Fixes: 7a6c8c34e5b7 (fou: implement FOU_CMD_GET)
 Reported-by: Dan Carpenter dan.carpen...@oracle.com
 Cc: Dan Carpenter dan.carpen...@oracle.com
 Signed-off-by: Cong Wang xiyou.wangc...@gmail.com

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next] fou: Fix missing unlock on error in fou_nl_dump()

2015-04-16 Thread David Miller

From: weiyj...@163.com
Date: Thu, 16 Apr 2015 20:14:39 +0800

 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 Add the missing unlock before return from function fou_nl_dump()
 in the error handling case.
 
 Fixes: 7a6c8c34e5b7 (fou: implement FOU_CMD_GET)
 Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn

Your patch adds a warning because you fail to remove the 'done'
label which is no longer used.

Cong Wang already submitted a patch which does remove the 'done'
field properly, so I will apply his version.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] rocker: fix error return code in rocker_probe()

2015-04-16 Thread David Miller

From: weiyj...@163.com
Date: Thu, 16 Apr 2015 20:21:02 +0800

 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 Fix to return -EINVAL from the invalid PCI region size error
 handling case instead of 0, as done elsewhere in this function.
 
 Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread Bryan O'Donoghue


On 16/04/15 17:07, David Miller wrote:

This patch does not apply to the current tree, please respin.



/facepalm

Did this against :
https://github.com/torvalds/linux.git

I'll spin again against :
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread David Miller

From: Bryan O'Donoghue pure.lo...@nexus-software.ie
Date: Thu, 16 Apr 2015 17:20:50 +0100

 I'll spin again against :
 https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/

Currently active tree is 'net', not 'net-next'
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 0/1] stmmac: rmmod oops after ifconfig

2015-04-16 Thread Bryan O'Donoghue

We have an oops with stmmac on Quark X1000/Galileo, triggered by rmmod after
ifconfig. Fix for issue contained in next mail against
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git


root@clanton:~# ifconfig eth0 192.168.0.1
root@clanton:~# rmmod stmmac_pci
[   39.257871] stmmac_dvr_remove:
removing driver
[   39.263618] BUG: unable to handle kernel NULL pointer dereference at
0060
[   39.271030] IP: [c1254192] phy_ethtool_get_wol+0x2/0x20
[   39.272391] *pdpt = 0e682001 *pde =  
[   39.272391] Oops:  [#1] 
[   39.272391] Modules linked in: evdev usbhid btusb bluetooth iwlwifi
cfg80211 rfkill ad7298 industrialio_triggered_buffer kfifo_buf industrialio
spi_s
[   39.272391] CPU: 0 PID: 1245 Comm: rmmod Not tainted 4.0.0 #26
[   39.272391] Hardware name: Intel Corp. QUARK/Galileo, BIOS 0x0350
01/01/2014
[   39.272391] task: ce52a9e0 ti: cea64000 task.ti: cea64000
[   39.272391] EIP: 0060:[c1254192] EFLAGS: 00010292 CPU: 0
[   39.272391] EIP is at phy_ethtool_get_wol+0x2/0x20
[   39.272391] EAX: ce59e400 EBX: ce59e400 ECX:  EDX: cea65ddc
[   39.272391] ESI:  EDI: cea65e80 EBP: cea65df8 ESP: cea65dd8
[   39.272391]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[   39.272391] CR0: 8005003b CR2: 0060 CR3: 0e641000 CR4: 00100030
[   39.272391] Stack:
[   39.272391]  c1255908 0005    
ce59e400 ce85b4e0
[   39.272391]  cea65e04 c125597c ce59e400 cea65e10 c12559cf ce85b000
cea65e24 e07a9dd8
[   39.272391]  cea65e38 ce85b000 cea65e80 cea65e44 c12a7639 cea0cc60
cdd4a180 c009ad00
[   39.272391] Call Trace:
[   39.272391]  [c1255908] ? phy_suspend+0x38/0x70
[   39.272391]  [c125597c] phy_detach+0x3c/0x60
[   39.272391]  [c12559cf] phy_disconnect+0x2f/0x40
[   39.272391]  [e07a9dd8] stmmac_release+0x38/0x150 [stmmac]
[   39.272391]  [c12a7639] __dev_close_many+0x69/0xb0
[   39.272391]  [c12a76dd] dev_close_many+0x5d/0xd0
[   39.272391]  [c12a859a] rollback_registered_many+0xda/0x240
[   39.272391]  [c12a871f] rollback_registered+0x1f/0x30
[   39.272391]  [c12a93b7] unregister_netdevice_queue+0x47/0xb0
[   39.272391]  [c132bb1b] ? mutex_lock+0xb/0x19
[   39.272391]  [c12a9434] unregister_netdev+0x14/0x20
[   39.272391]  [e07a695d] stmmac_dvr_remove+0x6d/0xb0 [stmmac]
[   39.272391]  [e074e02e] stmmac_pci_remove+0xe/0x10 [stmmac_pci]
[   39.272391]  [c1196c48] pci_device_remove+0x28/0xa0
[   39.272391]  [c121d8ea] __device_release_driver+0x5a/0xb0
[   39.272391]  [c121df4f] driver_detach+0x6f/0x80
[   39.272391]  [c121d68b] bus_remove_driver+0x3b/0x80
[   39.272391]  [c121e4c3] driver_unregister+0x23/0x60
[   39.272391]  [c10726da] ? find_module_all+0x5a/0x80
[   39.272391]  [c11968df] pci_unregister_driver+0xf/0x50
[   39.272391]  [e074e3d3] stmmac_pci_driver_exit+0xd/0xc3a [stmmac_pci]
[   39.272391]  [c1073f5c] SyS_delete_module+0x15c/0x1b0
[   39.272391]  [c132d19d] syscall_call+0x7/0x7
[   39.272391] Code: 42 12 31 c0 c3 8d 74 26 00 8b 08 8b 49 5c 85 c9 74 07
55 89 e5 ff d1 5d c3 b8 a1 ff ff ff c3 8d 76 00 8d bc 27 00 00 00 00 8b 08
0
[   39.272391] EIP: [c1254192] phy_ethtool_get_wol+0x2/0x20 SS:ESP
0068:cea65dd8
[   39.272391] CR2: 0060
[   39.551363] ---[ end trace 98d1260353bf4fec ]---
Killed

Bryan O'Donoghue (1):
  stmmac: fix oops on rmmod after assigning ip addr

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
1.9.1
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread Bryan O'Donoghue

An oops exists in the flow of stmmac_release().
phy_ethtool_get_wol() depends on phydev-drv.
phydev-drv will be null after stmmac_mdio_unreg() completes.

Steps to reproduce on Quark X1000:

1. ifconfig eth0 192.168.0.1
2. rmmod stmmac_pci

To fix this stmmac_mdio_unreg() should be run after unregister_netdev().

Signed-off-by: Bryan O'Donoghue pure.lo...@nexus-software.ie
Reported-by: Dan O'Donovan dan.odono...@emutex.com
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 06103ca..6065173 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2970,15 +2970,15 @@ int stmmac_dvr_remove(struct net_device *ndev)
priv-hw-dma-stop_tx(priv-ioaddr);
 
stmmac_set_mac(priv-ioaddr, false);
-   if (priv-pcs != STMMAC_PCS_RGMII  priv-pcs != STMMAC_PCS_TBI 
-   priv-pcs != STMMAC_PCS_RTBI)
-   stmmac_mdio_unregister(ndev);
netif_carrier_off(ndev);
unregister_netdev(ndev);
if (priv-stmmac_rst)
reset_control_assert(priv-stmmac_rst);
clk_disable_unprepare(priv-pclk);
clk_disable_unprepare(priv-stmmac_clk);
+   if (priv-pcs != STMMAC_PCS_RGMII  priv-pcs != STMMAC_PCS_TBI 
+   priv-pcs != STMMAC_PCS_RTBI)
+   stmmac_mdio_unregister(ndev);
free_netdev(ndev);
 
return 0;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread Michael Chan

On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: 
 Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
 supposed to synchronize as well.
 

For small rx packets ( 256 bytes), we sync the DMA buffer before we
copy the data to another SKB.  For larger packets, we unmap the DMA
buffer.  Do we see the corruption in both cases?

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Yann Droneaud

Hi,

Le jeudi 16 avril 2015 à 16:34 +0300, Erez Shitrit a écrit :
 Currently, iflink of the parent interface was always accessed, even 
 when interface didn't have a parent and hence we crashed there.

+ since commit 5aa7add8f14b ('infiniband/ipoib: implement
ndo_get_iflink').

as there was no crash here before AFAIK.

 
 Handle the interface types properly: for a child interface, return
 the ifindex of the parent, for parent interface, return its ifindex.
 
 For child devices, make sure to set the parent pointer prior to
 invoking register_netdevice(), this allows the new ndo to be called
 by the stack immediately after the child device is registered.
 
 Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')

Cc: Nicolas Dichtel nicolas.dich...@6wind.com

 Reported-by: Honggang Li ho...@redhat.com
 Signed-off-by: Erez Shitrit ere...@mellanox.com
 Signed-off-by: Honggang Li ho...@redhat.com
 ---
 
 changes from V0:
  - fixed two typos in the change-log
 
  drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
  drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
  2 files changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 index 657b89b..915ad04 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
  {
   struct ipoib_dev_priv *priv = netdev_priv(dev);
  
 + /* parent interface */
 + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
 + return dev-ifindex;
 +
 + /* child/vlan interface */
   return priv-parent-ifindex;
  }
  
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 index 4dd1313..fca1a88 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   /* MTU will be reset when mcast join happens */
   priv-dev-mtu   = IPOIB_UD_MTU(priv-max_ib_mtu);
   priv-mcast_mtu  = priv-admin_mtu = priv-dev-mtu;
 + priv-parent = ppriv-dev;
   set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags);
  
   result = ipoib_set_dev_features(priv, ppriv-ca);
 @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   goto register_failed;
   }
  
 - priv-parent = ppriv-dev;
 -
   ipoib_create_debug_files(priv-dev);
  
   /* RTNL childs don't need proprietary sysfs entries */

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Erez Shitrit

Currently, iflink of the parent interface was always accessed, event 
when interface didn't have a parent and hence we rashed there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li ho...@redhat.com
Signed-off-by: Erez Shitrit ere...@mellanox.com
Signed-off-by: Honggang Li ho...@redhat.com
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 657b89b..915ad04 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
 
+   /* parent interface */
+   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
+   return dev-ifindex;
+
+   /* child/vlan interface */
return priv-parent-ifindex;
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 4dd1313..fca1a88 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
/* MTU will be reset when mcast join happens */
priv-dev-mtu   = IPOIB_UD_MTU(priv-max_ib_mtu);
priv-mcast_mtu  = priv-admin_mtu = priv-dev-mtu;
+   priv-parent = ppriv-dev;
set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags);
 
result = ipoib_set_dev_features(priv, ppriv-ca);
@@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
goto register_failed;
}
 
-   priv-parent = ppriv-dev;
-
ipoib_create_debug_files(priv-dev);
 
/* RTNL childs don't need proprietary sysfs entries */
-- 
1.7.11.3

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-16 Thread Tim Deegan

At 12:39 +0100 on 16 Apr (1429187952), George Dunlap wrote:
 Your comment lists three benefits:
 1. better RTT estimation
 2. faster recovery
 3. high rates
 
 #3 is just marketing fluff; it's also contradicted by the statement that
 immediately follows it -- i.e., there are drivers for which the
 limitation does *not* give high rates.

AFAICT #3 is talking about throughput _under TCP_, where inflating the
RTT will absolutely cause problems.

Tim.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Erez Shitrit

Currently, iflink of the parent interface was always accessed, even 
when interface didn't have a parent and hence we crashed there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li ho...@redhat.com
Signed-off-by: Erez Shitrit ere...@mellanox.com
Signed-off-by: Honggang Li ho...@redhat.com
---

changes from V0:
 - fixed two typos in the change-log

 drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 657b89b..915ad04 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
 
+   /* parent interface */
+   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
+   return dev-ifindex;
+
+   /* child/vlan interface */
return priv-parent-ifindex;
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 4dd1313..fca1a88 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
/* MTU will be reset when mcast join happens */
priv-dev-mtu   = IPOIB_UD_MTU(priv-max_ib_mtu);
priv-mcast_mtu  = priv-admin_mtu = priv-dev-mtu;
+   priv-parent = ppriv-dev;
set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags);
 
result = ipoib_set_dev_features(priv, ppriv-ca);
@@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
goto register_failed;
}
 
-   priv-parent = ppriv-dev;
-
ipoib_create_debug_files(priv-dev);
 
/* RTNL childs don't need proprietary sysfs entries */
-- 
1.7.11.3

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next] netns: remove duplicated include from net_namespace.c

2015-04-16 Thread Nicolas Dichtel


Le 16/04/2015 15:17, weiyj...@163.com a écrit :

From: Wei Yongjun yongjun_...@trendmicro.com.cn

Remove duplicated include.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn

Acked-by: Nicolas Dichtel nicolas.dich...@6wind.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH -next] netns: remove duplicated include from net_namespace.c

2015-04-16 Thread weiyj_lk

From: Wei Yongjun yongjun_...@trendmicro.com.cn

Remove duplicated include.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 net/core/net_namespace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index e5e96b0..9c43cf6 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -16,7 +16,6 @@
 #include linux/export.h
 #include linux/user_namespace.h
 #include linux/net_namespace.h
-#include linux/rtnetlink.h
 #include net/sock.h
 #include net/netlink.h
 #include net/net_namespace.h

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] hso: fix refcnt leak in recent patch.

2015-04-16 Thread Olivier Sobrie

On Tue, Apr 14, 2015 at 11:03:03AM +1000, NeilBrown wrote:
 On Tue, 14 Apr 2015 09:36:34 +1000 NeilBrown ne...@suse.de wrote:
 
  
  
  Prior to
  commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
  hso: fix crash when device disappears while serial port is open
  
  hso_serial_open would always kref_get(serial-parent-ref) before
  returning zero.
  Since that commit, it only calls kref_get when returning 0 if
  serial-port.count was zero.
  
  This results in calls to
 kref_put(serial-parent-ref, hso_serial_ref_free);
  
  after hso_serial_ref_free has been called, which dereferences a freed
  pointer.
  
  This patch adds the missing kref_get().
  
  Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
  Cc: sta...@vger.kernel.org (v4.0)
  Cc: Olivier Sobrie oliv...@sobrie.be
  Signed-off-by: NeilBrown ne...@suse.de
  
  diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
  index 75befc1bd816..6848fc903340 100644
  --- a/drivers/net/usb/hso.c
  +++ b/drivers/net/usb/hso.c
  @@ -1299,6 +1299,7 @@ static int hso_serial_open(struct tty_struct *tty, 
  struct file *filp)
  }
  } else {
  D1(Port was already open);
  +   kref_get(serial-parent-ref);
  }
   
  usb_autopm_put_interface(serial-parent-interface);
 
 
 Sorry - that was wrong.
 I'm getting crashes which strongly suggest the kref_put is being called extra
 times, but I misunderstood the code and was hasty.
 
 Maybe this instead?

I tested the patch and it looks fine :-)
Thank you,
Olivier

 
 Thanks,
 NeilBrown
 
 From: NeilBrown n...@brown.name
 Date: Tue, 14 Apr 2015 09:33:03 +1000
 Subject: [PATCH] hso: fix refcnt leak in recent patch.
 
 Prior to
 commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
 hso: fix crash when device disappears while serial port is open
 
 a kref_get on serial-parent-ref would be taken on each open,
 and it would be kref_put on each close.
 
 Now the kref_put happens when the tty_struct is finally put (via
 the 'cleanup') providing tty-driver_data has been set.
 So the kref_get must be called exact once when tty-driver_data is
 set.
 
 With the current code, if the first open fails the kref_get() is never
 called, but the kref_put() is called, leaving to a crash.
 
 So change the kref_get call to happen exactly when -driver_data is
 changed from NULL to non-NULL.
 
 Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
 Cc: sta...@vger.kernel.org (v4.0)
 Cc: Olivier Sobrie oliv...@sobrie.be

Tested-by: Olivier Sobrie oliv...@sobrie.be

 Signed-off-by: NeilBrown n...@brown.name
 
 diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
 index 75befc1bd816..17fd3820263a 100644
 --- a/drivers/net/usb/hso.c
 +++ b/drivers/net/usb/hso.c
 @@ -1278,6 +1278,8 @@ static int hso_serial_open(struct tty_struct *tty, 
 struct file *filp)
   D1(Opening %d, serial-minor);
  
   /* setup */
 + if (tty-driver_data == NULL)
 + kref_get(serial-parent-ref);
   tty-driver_data = serial;
   tty_port_tty_set(serial-port, tty);
  
 @@ -1294,8 +1296,6 @@ static int hso_serial_open(struct tty_struct *tty, 
 struct file *filp)
   if (result) {
   hso_stop_serial_device(serial-parent);
   serial-port.count--;
 - } else {
 - kref_get(serial-parent-ref);
   }
   } else {
   D1(Port was already open);



-- 
Olivier
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH iproute2 -next] tc: built-in eBPF exec proxy

2015-04-16 Thread Alexei Starovoitov


On 4/15/15 7:52 AM, Daniel Borkmann wrote:


File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:


Amazing that it worked.
Acked-by: Alexei Starovoitov a...@plumgrid.com


+static void bpf_map_set_env(int *tfd)
+{
+   char key[64], *val;
+   int i;
+   for (i = 0; i  BPF_MAP_ID_MAX; i++) {
+   memset(key, 0, sizeof(key));
+   snprintf(key, sizeof(key), BPF_MAP%d, i);

+   val = secure_getenv(key);
+   assert(val != NULL);


everything looks good. My only nit is that the name of the function
reads as this function is setting env vars, whereas it's actually
reading them. I guess in your mind it fits with the rest of
'bpf_map_set_*' functions, but the name is still confusing.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH iproute2 -next] tc: built-in eBPF exec proxy

2015-04-16 Thread Daniel Borkmann


On 04/16/2015 07:48 PM, Alexei Starovoitov wrote:

On 4/15/15 7:52 AM, Daniel Borkmann wrote:


File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:


Amazing that it worked.
Acked-by: Alexei Starovoitov a...@plumgrid.com


+static void bpf_map_set_env(int *tfd)
+{
+char key[64], *val;
+int i;
+for (i = 0; i  BPF_MAP_ID_MAX; i++) {
+memset(key, 0, sizeof(key));
+snprintf(key, sizeof(key), BPF_MAP%d, i);

+val = secure_getenv(key);
+assert(val != NULL);


everything looks good. My only nit is that the name of the function
reads as this function is setting env vars, whereas it's actually
reading them. I guess in your mind it fits with the rest of
'bpf_map_set_*' functions, but the name is still confusing.


Ok, since it's example code, I'll find a better function name for it,
keep yours and Hannes' tags and resubmit. Thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread Ian Jackson

Michael Chan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more 
messages]):
 On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: 
  Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
  supposed to synchronize as well.
 
 For small rx packets ( 256 bytes), we sync the DMA buffer before we
 copy the data to another SKB.  For larger packets, we unmap the DMA
 buffer.  Do we see the corruption in both cases?

Yes, at least with swiotlb=force iommu=soft.

Ian.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ethtool: return 1 as exit code on a settings(-s) failure.

2015-04-16 Thread Ben Hutchings

On Thu, 2015-04-16 at 13:55 -0700, Sridhar Samudrala wrote:
 Currently 0 is returned on both success or failure.

Previously discussed here:
http://thread.gmane.org/gmane.linux.network/349717.

Ben.

 Signed-off-by: Sridhar Samudrala sridhar.samudr...@intel.com
 ---
  ethtool.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/ethtool.c b/ethtool.c
 index 01b13a6..163dff2 100644
 --- a/ethtool.c
 +++ b/ethtool.c
 @@ -2352,7 +2352,7 @@ static int do_sset(struct cmd_context *ctx)
   int argc = ctx-argc;
   char **argp = ctx-argp;
   int i;
 - int err;
 + int err = 0;
  
   for (i = 0; i  ARRAY_SIZE(flags_msglvl); i++)
   flag_to_cmdline_info(flags_msglvl[i].name,
 @@ -2665,7 +2665,7 @@ static int do_sset(struct cmd_context *ctx)
   }
   }
  
 - return 0;
 + return err ? 1 : 0;
  }
  
  static int do_gregs(struct cmd_context *ctx)

-- 
Ben Hutchings
Humour is the best antidote to reality.


signature.asc
Description: This is a digitally signed message part

[PATCH 1/1] altera tse: Fix network-delays and -retransmissions after high throughput.

2015-04-16 Thread Andreas Oetken

From: Andreas Oetken andr...@oetken.name

Fix bug which occurs when more than limit packets are available during
napi-poll, leading to delays and retransmissions on the network.

Check for (count  limit) before checking the get_rx_status in tse_rx-function.
Function get_rx_status is reading from the response-fifo.
If there is currently a response in the fifo,
reading the last byte of the response pops the value from the fifo.
If the limit is checked as second condition
and the limit is reached the fifo is popped but the packet is not processed.

Signed-off-by: Andreas Oetken ennoerlan...@gmail.com
---
 drivers/net/ethernet/altera/altera_tse_main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/altera/altera_tse_main.c 
b/drivers/net/ethernet/altera/altera_tse_main.c
index 6725dc0..dbbbd34 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -376,8 +376,13 @@ static int tse_rx(struct altera_tse_private *priv, int 
limit)
u16 pktlength;
u16 pktstatus;
 
-   while (((rxstatus = priv-dmaops-get_rx_status(priv)) != 0) 
-  (count  limit))  {
+   /* Check for count  limit first as get_rx_status is changing
+   * the response-fifo so we must process the next packet
+   * after calling get_rx_status if a response is pending.
+   * (reading the last byte of the response pops the value from the fifo.)
+   */
+   while ((count  limit) 
+  ((rxstatus = priv-dmaops-get_rx_status(priv)) != 0)) {
pktstatus = rxstatus  16;
pktlength = rxstatus  0x;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-16 Thread Luis R. Rodriguez

On Thu, Apr 16, 2015 at 01:18:37PM +0900, Hyong-Youb Kim wrote:
 On Thu, Apr 16, 2015 at 01:58:16AM +0200, Luis R. Rodriguez wrote:
  
  An alternative... is to just ioremap_wc() the entire region, including
  MMIO registers for these old devices. I see one ethernet driver that does
  this, myri10ge, and am curious how and why they ended up deciding this
  and if they have run into any issues. I wonder if this is a reasonable
  comrpomise for these 2 remaining corner cases.
  
 
 For myri10ge, it a performance thing.  Descriptor rings are in NIC
 memory BAR0, not in host memory.  Say, to send a packet, the driver
 writes the send descriptor to the ioremap'd NIC memory.  It is a
 multi-word descriptor.  So, to send it as one PCIE MWr transaction,
 the driver maps the whole BAR0 as WC and does copy descriptor; wmb.

Interesting, so you burst write multi-word descriptor writes using
write-combining here for the Ethernet device.

 Without WC, descriptors would end up as multiple 4B or 8B MWr packets
 to the NIC, which has a pretty big performance impact on this
 particular NIC.

How big are the descriptors?

 Most registers that do not want WC are actually in BAR2, which is not
 mapped as WC.  For registers that are in BAR0, we do write to the
 register; wmb.  If we want to wait till the NIC has seen the write,
 we do write; wmb; read.

Interesting, thanks, yeah using this as a work around to the problem sounds
plausible however it still would require likely making just as many changes to
the ivtv and ipath driver as to just do a proper split. I do wonder however if
this sort of work around can be generalized somehow though so that others could
use, if this sort of thing is going to become prevalent. If so then this would
serve two purposes: work around for the corner cases of MTRR use on Linux and
also these sorts of device constraints.

In order to determine if this is likely to be generally useful could you 
elaborate
a bit more about the detals of the performance issues of not bursting writes
for the descriptor on this device.

Even if that is done a conversion over to this work around seems it may require
device specific nitpicks. For instance I note in myri10ge_submit_req() for
small writes you just do a reverse write and do the first set last, then
finally the last 32 bits are rewritten, I guess to trigger something?

 This approach has worked for this device for many years.  I cannot say
 whether it works for other devices, though.

I think it should but the more interesting question would be exactly
*why* it was needed for this device, who determined that, and why?

  Luis
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Geert Uytterhoeven

On Thu, Apr 16, 2015 at 3:46 PM, Guenter Roeck li...@roeck-us.net wrote:
 On 04/15/2015 10:12 PM, Guenter Roeck wrote:

 Return correct error code if _mv88e6xxx_reg_read returns an error.

 Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging
 support)
 Signed-off-by: Guenter Roeck li...@roeck-us.net


 I should have given proper credit.

 Reported-by: kbuild test robot fengguang...@intel.com

 For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem,
 at least not with gcc 4.9.1.

Good old gcc 4.1.2 (which I still use for m68k builds, and won't retire anytime
soon as it finds real bugs in every new kernel release) says:

drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used
uninitialized in this function

Even after your patch, as it missed one case (patch sent).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max

2015-04-16 Thread Sergei Shtylyov


Hello.

On 04/16/2015 09:38 PM, Vivien Didelot wrote:


Since commit da4759c, sysfs will only use the permissions returned by


   Please also specify that commit's summary line in parens.


is_visible, instead of OR'ing them with the default file mode.



This allows us to declare temp1_max with the DEVICE_ATTR_RW macro and
just return the desired permissions for the hwmon sysfs attributes in
dsa_hwmon_attrs_visible.



Also, allow temp1_max to be write-only if set_temp_limit is provided,
but not get_temp_limit.



Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com


WBR, Sergei

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy

On 17.04, Herbert Xu wrote:
 On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:
 
  So currently we have one fast path, that is: we are not fragmented, we
  get out non-fragmented, none of this code is ever touched and no
  problem.
  
  We don't want to mak this more complex, but
 
 You should read Dave's other email where he gives you an obvious
 solution.  If you have to modify the skb then you don't have to
 worry about the original fragments.
 
 But if you only read the skb then don't linearise it completely
 and keep the original fragments.

Yes, I was just responding to that. We need an additional member
in the skb, but then this part is quite simple.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH iproute2 -next v2] tc: built-in eBPF exec proxy

2015-04-16 Thread Daniel Borkmann

This work follows upon commit 6256f8c9e45f (tc, bpf: finalize eBPF
support for cls and act front-end) and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.

File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:

  # env | grep BPF
  BPF_NUM_MAPS=3
  BPF_MAP1=6- BPF_MAP_ID_QUEUE (id 1)
  BPF_MAP0=5- BPF_MAP_ID_PROTO (id 0)
  BPF_MAP2=7- BPF_MAP_ID_DROPS (id 2)

  # ls -la /proc/self/fd
  [...]
  lrwx--. 1 root root 64 Apr 14 16:46 0 - /dev/pts/4
  lrwx--. 1 root root 64 Apr 14 16:46 1 - /dev/pts/4
  lrwx--. 1 root root 64 Apr 14 16:46 2 - /dev/pts/4
  [...]
  lrwx--. 1 root root 64 Apr 14 16:46 5 - anon_inode:bpf-map
  lrwx--. 1 root root 64 Apr 14 16:46 6 - anon_inode:bpf-map
  lrwx--. 1 root root 64 Apr 14 16:46 7 - anon_inode:bpf-map

The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.

To further allow users for experimenting with that, next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of bpf syscall, f.e to read/write into maps.

Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa han...@stressinduktion.org
Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org
Acked-by: Alexei Starovoitov a...@plumgrid.com
---
 ( Stephen, this applies to current net-next branch of iproute2. )
 v1 - v2:
  - s/bpf_map_set_env/bpf_map_get_from_env/ as suggested by Alexei
  - Rest as is, also kept reviewer acks

 examples/bpf/bpf_agent.c |  55 ++
 examples/bpf/bpf_prog.c  |  27 +
 tc/Makefile  |   7 ++-
 tc/e_bpf.c   | 147 +++
 tc/f_bpf.c   |   2 +-
 tc/m_action.c|   2 +-
 tc/m_bpf.c   |   2 +-
 tc/tc.c  |   9 +--
 tc/tc_bpf.c  |  98 ---
 tc/tc_bpf.h  |  15 -
 tc/tc_common.h   |   3 +
 tc/tc_exec.c | 109 +++
 tc/tc_filter.c   |   2 +-
 tc/tc_util.h |  16 --
 14 files changed, 454 insertions(+), 40 deletions(-)
 create mode 100644 tc/e_bpf.c
 create mode 100644 tc/tc_exec.c

diff --git a/examples/bpf/bpf_agent.c b/examples/bpf/bpf_agent.c
index 0f481b1..426b880 100644
--- a/examples/bpf/bpf_agent.c
+++ b/examples/bpf/bpf_agent.c
@@ -24,6 +24,8 @@
  *   -- Happy eBPF hacking! ;)
  */
 
+#define _GNU_SOURCE
+
 #include stdio.h
 #include stdlib.h
 #include string.h
@@ -31,6 +33,7 @@
 #include unistd.h
 #include stdint.h
 #include assert.h
+
 #include sys/un.h
 #include sys/types.h
 #include sys/stat.h
@@ -102,6 +105,23 @@ static void bpf_dump_proto(int fd)
printf(\n);
 }
 
+static void bpf_dump_map_data(int *tfd)
+{
+   int i;
+
+   for (i = 0; i  30; i++) {
+   const int period = 5;
+
+   printf(data, period: %dsec\n, period);
+
+   bpf_dump_drops(tfd[BPF_MAP_ID_DROPS]);
+   bpf_dump_queue(tfd[BPF_MAP_ID_QUEUE]);
+   bpf_dump_proto(tfd[BPF_MAP_ID_PROTO]);
+
+   sleep(period);
+   }
+}
+
 static void bpf_info_loop(int *fds, struct bpf_map_aux *aux)
 {
int i, tfd[BPF_MAP_ID_MAX];
@@ -122,16 +142,22 @@ static void bpf_info_loop(int *fds, struct bpf_map_aux 
*aux)
tfd[aux-ent[i].id] = fds[i];
}
 
-   for (i = 0; i  30; i++) {
-   int period = 5;
+   bpf_dump_map_data(tfd);
+}
 
-   printf(data, period: %dsec\n, period);
+static void bpf_map_get_from_env(int *tfd)
+{
+   char key[64], *val;
+   int i;
 
-   bpf_dump_drops(tfd[BPF_MAP_ID_DROPS]);
-   bpf_dump_queue(tfd[BPF_MAP_ID_QUEUE]);
-   bpf_dump_proto(tfd[BPF_MAP_ID_PROTO]);
+   for (i = 0; i  BPF_MAP_ID_MAX; i++) {
+   memset(key, 0, sizeof(key));
+   snprintf(key, sizeof(key), BPF_MAP%d, i);
 
-   sleep(period);
+   val = secure_getenv(key);
+   assert(val != NULL);
+
+   tfd[i] = atoi(val);
}
 }
 
@@

Re: [PATCH] Bluetooth: Pre-initialize variables in read_local_oob_ext_data_complete()

2015-04-16 Thread Marcel Holtmann

Hi Geert,

 net/bluetooth/mgmt.c: In function ‘read_local_oob_ext_data_complete’:
 net/bluetooth/mgmt.c:6474: warning: ‘r256’ may be used uninitialized in this 
 function
 net/bluetooth/mgmt.c:6474: warning: ‘h256’ may be used uninitialized in this 
 function
 net/bluetooth/mgmt.c:6474: warning: ‘r192’ may be used uninitialized in this 
 function
 net/bluetooth/mgmt.c:6474: warning: ‘h192’ may be used uninitialized in this 
 function
 
 While these are false positives, the code can be shortened by
 pre-initializing the hash table pointers and eir_len. This has the side
 effect of killing the compiler warnings.

can you be a bit specific on which compiler version is this. I fixed one 
occurrence that seemed valid. However in this case the compiler seems to be 
just plain stupid. On a gcc 4.9, I am not seeing these for example.

Regards

Marcel

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Herbert Xu

On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:

 So currently we have one fast path, that is: we are not fragmented, we
 get out non-fragmented, none of this code is ever touched and no
 problem.
 
 We don't want to mak this more complex, but

You should read Dave's other email where he gives you an obvious
solution.  If you have to modify the skb then you don't have to
worry about the original fragments.

But if you only read the skb then don't linearise it completely
and keep the original fragments.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Guenter Roeck

On Thu, Apr 16, 2015 at 02:38:19PM -0400, Vivien Didelot wrote:
 A switch driver may only provide one of the temperature limit accessors,
 or the temperature alarm getter. So register the hwmon subsystem if any
 of the related functions is provided.
 
 Thus, check get_temp to set the visibility of temp1_input.
 
 Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
 ---
  net/dsa/dsa.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)
 
 diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
 index 67d2983..6b68994 100644
 --- a/net/dsa/dsa.c
 +++ b/net/dsa/dsa.c
 @@ -157,6 +157,10 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject 
 *kobj,
   umode_t mode = 0;
  
   switch (index) {
 + case 0: /* temp1_input */
 + if (drv-get_temp)
 + mode |= S_IRUGO;

This should be mandatory. Sorry, I don't really understand what you are
trying to accomplish here.

Can you give me a real world example where a chip would support setting
a limit but not reading it ?

Thanks,
Guenter

 + break;
   case 1: /* temp1_max */
   if (drv-get_temp_limit)
   mode |= S_IRUGO;
 @@ -310,7 +314,8 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
 struct device *parent)
* register with hardware monitoring subsystem.
* Treat registration error as non-fatal and ignore it.
*/
 - if (drv-get_temp) {
 + if (drv-get_temp || drv-get_temp_limit || drv-set_temp_limit ||
 + drv-get_temp_alarm) {
   const char *netname = netdev_name(dst-master_netdev);
   char hname[IFNAMSIZ + 1];
   int i, j;
 -- 
 2.3.5
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Bluetooth: Pre-initialize variables in read_local_oob_ext_data_complete()

2015-04-16 Thread Geert Uytterhoeven

net/bluetooth/mgmt.c: In function ‘read_local_oob_ext_data_complete’:
net/bluetooth/mgmt.c:6474: warning: ‘r256’ may be used uninitialized in this 
function
net/bluetooth/mgmt.c:6474: warning: ‘h256’ may be used uninitialized in this 
function
net/bluetooth/mgmt.c:6474: warning: ‘r192’ may be used uninitialized in this 
function
net/bluetooth/mgmt.c:6474: warning: ‘h192’ may be used uninitialized in this 
function

While these are false positives, the code can be shortened by
pre-initializing the hash table pointers and eir_len. This has the side
effect of killing the compiler warnings.

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
---
 net/bluetooth/mgmt.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 7fd87e7135b52753..f8e13b6d58279463 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -6471,9 +6471,9 @@ static void read_local_oob_ext_data_complete(struct 
hci_dev *hdev, u8 status,
 {
const struct mgmt_cp_read_local_oob_ext_data *mgmt_cp;
struct mgmt_rp_read_local_oob_ext_data *mgmt_rp;
-   u8 *h192, *r192, *h256, *r256;
+   u8 *h192 = NULL, *r192 = NULL, *h256 = NULL, *r256 = NULL;
struct mgmt_pending_cmd *cmd;
-   u16 eir_len;
+   u16 eir_len = 0;
int err;
 
BT_DBG(%s status %u, hdev-name, status);
@@ -6486,18 +6486,11 @@ static void read_local_oob_ext_data_complete(struct 
hci_dev *hdev, u8 status,
 
if (status) {
status = mgmt_status(status);
-   eir_len = 0;
-
-   h192 = NULL;
-   r192 = NULL;
-   h256 = NULL;
-   r256 = NULL;
} else if (opcode == HCI_OP_READ_LOCAL_OOB_DATA) {
struct hci_rp_read_local_oob_data *rp;
 
if (skb-len != sizeof(*rp)) {
status = MGMT_STATUS_FAILED;
-   eir_len = 0;
} else {
status = MGMT_STATUS_SUCCESS;
rp = (void *)skb-data;
@@ -6505,23 +6498,18 @@ static void read_local_oob_ext_data_complete(struct 
hci_dev *hdev, u8 status,
eir_len = 5 + 18 + 18;
h192 = rp-hash;
r192 = rp-rand;
-   h256 = NULL;
-   r256 = NULL;
}
} else {
struct hci_rp_read_local_oob_ext_data *rp;
 
if (skb-len != sizeof(*rp)) {
status = MGMT_STATUS_FAILED;
-   eir_len = 0;
} else {
status = MGMT_STATUS_SUCCESS;
rp = (void *)skb-data;
 
if (hci_dev_test_flag(hdev, HCI_SC_ONLY)) {
eir_len = 5 + 18 + 18;
-   h192 = NULL;
-   r192 = NULL;
} else {
eir_len = 5 + 18 + 18 + 18 + 18;
h192 = rp-hash192;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy

On 16.04, David Miller wrote:
  Netfilter may change the contents of the packet, even change its size.
  It is *really* hard to do this while keeping the original fragments
  intact.
 
 I keep hearing a lot of it's hard as the only reason we shouldn't do
 this properly, and that frankly sucks.  People aren't looking for a
 solution and to be honest it's quite tiring.
 
 The common case is that the rules processed are simple, the size of
 the overall packet does _not_ change, and therefore the best thing
 to do is pass the entire thing as a unit with the frags in tact.
 
 That's the fundamental fact.  It's also the fastest way to process
 these packets and avoids all of these stupid max frag garbage.
 
 Only at the point where netfilter makes changes to the size of the
 packet does it take ownership and get to take on the responsibility
 of making sure the new resulting fragments are sane.
 
 But only at that point.

Agreed, that part shouldn't be hard. We need to pass the defragmented
skb through the ruleset, meaning we need to pass it through the stack.
That's needed since the rules depend on this.

If we don't make changes, we can spit out the original fragments, but
for this we need to keep a reference to them from the skb. We still
need the max_frag_size thing, once a modification is made we drop the
frag list reference and just regulary refragment the modified skb
according to the limits.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max

2015-04-16 Thread Vivien Didelot

Hello Sergei,

  Since commit da4759c, sysfs will only use the permissions returned by
 
 Please also specify that commit's summary line in parens.

Duly noted.

da4759c is sysfs: Use only return value from is_visible for the file mode
(see: 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da4759c)

Best,
-v
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] ethtool: return 1 as exit code on a settings(-s) failure.

2015-04-16 Thread Sridhar Samudrala

Currently 0 is returned on both success or failure.

Signed-off-by: Sridhar Samudrala sridhar.samudr...@intel.com
---
 ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 01b13a6..163dff2 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -2352,7 +2352,7 @@ static int do_sset(struct cmd_context *ctx)
int argc = ctx-argc;
char **argp = ctx-argp;
int i;
-   int err;
+   int err = 0;
 
for (i = 0; i  ARRAY_SIZE(flags_msglvl); i++)
flag_to_cmdline_info(flags_msglvl[i].name,
@@ -2665,7 +2665,7 @@ static int do_sset(struct cmd_context *ctx)
}
}
 
-   return 0;
+   return err ? 1 : 0;
 }
 
 static int do_gregs(struct cmd_context *ctx)
-- 
1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max

2015-04-16 Thread Guenter Roeck

On Thu, Apr 16, 2015 at 02:38:18PM -0400, Vivien Didelot wrote:
 Since commit da4759c, sysfs will only use the permissions returned by
 is_visible, instead of OR'ing them with the default file mode.
 
 This allows us to declare temp1_max with the DEVICE_ATTR_RW macro and
 just return the desired permissions for the hwmon sysfs attributes in
 dsa_hwmon_attrs_visible.
 
 Also, allow temp1_max to be write-only if set_temp_limit is provided,
 but not get_temp_limit.
 
Hi Vivien,

This would be a first for the entire hwmon subsystem and doesn't really
make sense.

Guenter

 Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
 ---
  net/dsa/dsa.c | 24 
  1 file changed, 16 insertions(+), 8 deletions(-)
 
 diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
 index 5eaadab..67d2983 100644
 --- a/net/dsa/dsa.c
 +++ b/net/dsa/dsa.c
 @@ -124,7 +124,7 @@ static ssize_t temp1_max_store(struct device *dev,
  
   return count;
  }
 -static DEVICE_ATTR(temp1_max, S_IRUGO, temp1_max_show, temp1_max_store);
 +static DEVICE_ATTR_RW(temp1_max);
  
  static ssize_t temp1_max_alarm_show(struct device *dev,
   struct device_attribute *attr, char *buf)
 @@ -154,16 +154,24 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject 
 *kobj,
   struct device *dev = container_of(kobj, struct device, kobj);
   struct dsa_switch *ds = dev_get_drvdata(dev);
   struct dsa_switch_driver *drv = ds-drv;
 - umode_t mode = attr-mode;
 + umode_t mode = 0;
  
 - if (index == 1) {
 - if (!drv-get_temp_limit)
 - mode = 0;
 - else if (drv-set_temp_limit)
 + switch (index) {
 + case 1: /* temp1_max */
 + if (drv-get_temp_limit)
 + mode |= S_IRUGO;
 + if (drv-set_temp_limit)
   mode |= S_IWUSR;
 - } else if (index == 2  !drv-get_temp_alarm) {
 - mode = 0;
 + break;
 + case 2: /* temp1_max_alarm */
 + if (drv-get_temp_alarm)
 + mode |= S_IRUGO;
 + break;
 + default:
 + mode = attr-mode;
 + break;
   }
 +
   return mode;
  }
  
 -- 
 2.3.5
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Or Gerlitz

On Thu, Apr 16, 2015, Yann Droneaud ydrone...@opteya.com wrote:
 Hi,

 Le jeudi 16 avril 2015 à 16:34 +0300, Erez Shitrit a écrit :
 Currently, iflink of the parent interface was always accessed, even
 when interface didn't have a parent and hence we crashed there.

 + since commit 5aa7add8f14b ('infiniband/ipoib: implement
 ndo_get_iflink').

 as there was no crash here before AFAIK.

Disagree, it's redundant, as the Fixes tag below points to this commit


 Handle the interface types properly: for a child interface, return
 the ifindex of the parent, for parent interface, return its ifindex.

 For child devices, make sure to set the parent pointer prior to
 invoking register_netdevice(), this allows the new ndo to be called
 by the stack immediately after the child device is registered.

 Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')

 Cc: Nicolas Dichtel nicolas.dich...@6wind.com

Ditto, I find it enough to just copy Nicholas on the patch submission,
as we did here.

Or.

 Reported-by: Honggang Li ho...@redhat.com
 Signed-off-by: Erez Shitrit ere...@mellanox.com
 Signed-off-by: Honggang Li ho...@redhat.com
 ---

 changes from V0:
  - fixed two typos in the change-log

  drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
  drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
  2 files changed, 6 insertions(+), 2 deletions(-)

 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 index 657b89b..915ad04 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device 
 *dev)
  {
   struct ipoib_dev_priv *priv = netdev_priv(dev);

 + /* parent interface */
 + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
 + return dev-ifindex;
 +
 + /* child/vlan interface */
   return priv-parent-ifindex;
  }

 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 index 4dd1313..fca1a88 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
 @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   /* MTU will be reset when mcast join happens */
   priv-dev-mtu   = IPOIB_UD_MTU(priv-max_ib_mtu);
   priv-mcast_mtu  = priv-admin_mtu = priv-dev-mtu;
 + priv-parent = ppriv-dev;
   set_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags);

   result = ipoib_set_dev_features(priv, ppriv-ca);
 @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
 ipoib_dev_priv *priv,
   goto register_failed;
   }

 - priv-parent = ppriv-dev;
 -
   ipoib_create_debug_files(priv-dev);

   /* RTNL childs don't need proprietary sysfs entries */

 Regards.

 --
 Yann Droneaud
 OPTEYA


 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy

Am 16. April 2015 17:43:23 MESZ, schrieb David Miller da...@davemloft.net:
From: Hannes Frederic Sowa han...@stressinduktion.org
Date: Thu, 16 Apr 2015 14:11:42 +0200

 On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote:
 On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote:

  Netfilter may change the contents of the packet, even change its
size.
  It is *really* hard to do this while keeping the original
fragments
  intact.

 Perhaps we should provide better helpers to facilitate this?

 So instead of directly manipulating the content of the skb you
 would so so through helpers and the helpers can then try to do
 sensible things with the fragments.

 When Florian and me started discussing how to solve this we also
wanted
 to be as transparent as possible. But looking at all possible
 fragmentation scenarios, this seems to be too complicated.

 Even imagine a fragment with overlapping offsets and some of the
 fragments got duplicated. If we had to keep this in frag_list and now
 netfilter has to change any of this contents, this will become a
total
 mess, like changing one port in multiple skbs at different offsets.

 I doubt it is worth the effort.

You guys keep talking about exceptional cases rather than what is
unquestionable the common case, and the one worth handling in the
fast paths.

True. But its not like we haven't tried. What I keep hearing is lots of well 
meaning opinions, and the fact is, I've looked into this countless times, you 
can find my first attempt when googling from 11 years ago, and the resolution 
was always - it's not worth it. It's not too hard, I wouldn't mind. And I 
fail to see the problem with that. We're not talking about a functional defect, 
it's something people (me included) think is not nice.

I will try to recollect the problems the different approaches have in detail. 
I'm pretty sure you will come to the same conclusion as I have.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()

2015-04-16 Thread David Miller

From: Geert Uytterhoeven ge...@linux-m68k.org
Date: Thu, 16 Apr 2015 20:49:14 +0200

 drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
 drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used uninitialized in 
 this function

 If oldstate == state, mv88e6xxx_set_port_state() will return an
 uninitialized value. Pre-initialize ret to zero to fix this.

 Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org

Indeed, applied, thanks Geert.
N§²ζμrΈyϊθΨb²X¬ΆΗ§vΨ^)ήΊ{.nΗ+·§zΧ^Ύ)ν
ζθw*jg¬±¨Άέ’j/κδzΉήΰ2ή¨θΪ’)ί‘«aΆΪώψ�G«ιh�ζj:+v¨wθΩ₯

Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Guenter Roeck


On 04/16/2015 11:51 AM, Geert Uytterhoeven wrote:

On Thu, Apr 16, 2015 at 3:46 PM, Guenter Roeck li...@roeck-us.net wrote:

On 04/15/2015 10:12 PM, Guenter Roeck wrote:


Return correct error code if _mv88e6xxx_reg_read returns an error.

Fixes: facd95b2e0ec0 (net: dsa: mv88e6xxx: Add Hardware bridging
support)
Signed-off-by: Guenter Roeck li...@roeck-us.net



I should have given proper credit.

Reported-by: kbuild test robot fengguang...@intel.com

For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem,
at least not with gcc 4.9.1.


Good old gcc 4.1.2 (which I still use for m68k builds, and won't retire anytime
soon as it finds real bugs in every new kernel release) says:

 drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
 drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used
uninitialized in this function

Even after your patch, as it missed one case (patch sent).



Grumble. So much for trusting tools :-(

Thanks for fixing this!

Guenter

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v4 00/24] switchdev: spring cleanup

2015-04-16 Thread roopa


On 4/13/15, 10:47 PM, roopa wrote:

On 4/12/15, 11:16 PM, sfel...@gmail.com wrote:

From: Scott Feldman sfel...@gmail.com

v4:

Well, it was a lot of work, but now prepare-commit transaction model 
is how
davem advises: if prepare fails, abort the transaction.  The driver 
must do
resource reservations up front in prepare phase and return those 
resources if
aborting.  Commit phase would use reserved resources.  The good news 
is the
driver code (for rocker) now handles resource allocation failures 
better by not

leaving partially device or driver states.  This is a side-effect of the
prepare phase where state isn't modified; only validation of inputs and
resource reservations happen in the prepare phase.  Since we're 
supporting
setting attrs and add objs across lower devs in the stacked case, we 
need to
hold rtnl_lock (or ensure rtnl_lock is held) so lower devs don't move 
on us
during the prepare-commit transaction.  DSA driver code skips the 
prepare phase
and goes straight for the commit phase since no up-front allocations 
are done

and no device failures (that could be detected in the prepare phase) can
happen.


thanks for the series. It definitely does look cleaner and less 
confusing now!.
I do love the abstraction but i was one of the people voting against 
duplicating the
kernel objects into swdev objs which this patches does (which i am 
still not convinced

we should have).



Remove NETIF_F_HW_SWITCH_OFFLOAD from rocker and the swdev_attr_set/get
wrappers.  DSA doesn't set NETIF_F_HW_SWITCH_OFFLOAD, so it can't be in
swdev_attr_set/get.  rocker doesn't need it; or rather can't support
NETIF_F_HW_SWITCH_OFFLOAD being set/cleared at run-time after the device
port is already up and offloading L2/L3. NETIF_F_HW_SWITCH_OFFLOAD is 
still

left as a feature flag for drivers that can use it.


I see that this series removes all uses of it in the switchdev api 
later. I had summarized
the need for the flag in reply to one of your questions a few weeks 
back. Since you have moved all
ndo ops to swdev ops (including ndo_bridge_setlink/dellink), I don't 
want to hold on to the
 feature flag if no one is using it. yes, my userspace driver uses it 
today.

I will come back with stronger justification to keep it or
will submit a patch to remove it and add it back at a later point if 
needed.


scott,  I see that you will be spinning v5 of the series. In which case 
feel free to remove the feature flag during your

code restructuring effort. If i need it again, i will resubmit. thanks.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[v2] act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Herbert Xu

On Thu, Apr 16, 2015 at 07:40:30PM -0700, Alexei Starovoitov wrote:
 On Fri, Apr 17, 2015 at 10:15:01AM +0800, Herbert Xu wrote:
   seems the cleaner fix will be to push skb-mac_len instead?
  
  No skb-mac_len is the same as skb2-dev-hard_header_len.
 
 hmm. please help me understand the problem then.
 In the commit log you mentioned that your vlan dev and ifb
 have unequal hard header length. I think that can only happen if
 your master dev used to create vlan, didn't have vlan offload,
 so vlandev-hhl = 18 and ifb-hhl = 14. Then when tagged packet
 arrives on master device we call skb_vlan_untag() and
 skb-mac_len becomes 14, then vlan_do_receive() will do

You are right.  I should've just used mac_len.

---8---
When you redirect a VLAN device to any device, you end up with
crap in af_packet on the xmit path because hard_header_len is
not equal to skb-mac_len.  So the redirected packet contains
four extra bytes at the start which then gets interpreted as
part of the MAC address.

This patch fixes this by only pushing skb-mac_len.  We also
need to fix ifb because it tries to undo the pushing done by
act_mirred.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 34f846b..94570aa 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev)
if (from  AT_EGRESS) {
dev_queue_xmit(skb);
} else if (from  AT_INGRESS) {
-   skb_pull(skb, skb-dev-hard_header_len);
+   skb_pull(skb, skb-mac_len);
netif_receive_skb(skb);
} else
BUG();
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5953517..3f63cea 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -157,7 +157,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
 
if (!(at  AT_EGRESS)) {
if (m-tcfm_ok_push)
-   skb_push(skb2, skb2-dev-hard_header_len);
+   skb_push(skb2, skb-mac_len);
}
 
/* mirror is always swallowed */

-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread Prashant Sreedharan

On Thu, 2015-04-16 at 18:15 +0100, Ian Jackson wrote:
 Michael Chan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more 
 messages]):
  On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: 
   Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
   supposed to synchronize as well.
  
  For small rx packets ( 256 bytes), we sync the DMA buffer before we
  copy the data to another SKB.  For larger packets, we unmap the DMA
  buffer.  Do we see the corruption in both cases?
 
 Yes, at least with swiotlb=force iommu=soft.

Ok this is what is causing the problem, the driver uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the dma
mapping and dma_unmap_addr() to get the mapping value. On most of
the platforms this is a no-op, but it appears with iommu=soft and
swiotlb=force this house keeping is required, when I pass the correct
dma_addr instead of 0 while calling pci_unmap_/pci_dma_sync_ I don't see
the corruption. ie If you set CONFIG_NEED_DMA_MAP_STATE=y in your kernel
config you should not see the problem. Can you confirm ? Thanks


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Hannes Frederic Sowa



On Thu, Apr 16, 2015, at 22:56, Patrick McHardy wrote:
 On 17.04, Herbert Xu wrote:
  On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:
  
   So currently we have one fast path, that is: we are not fragmented, we
   get out non-fragmented, none of this code is ever touched and no
   problem.
   
   We don't want to mak this more complex, but
  
  You should read Dave's other email where he gives you an obvious
  solution.  If you have to modify the skb then you don't have to
  worry about the original fragments.
  
  But if you only read the skb then don't linearise it completely
  and keep the original fragments.
 
 Yes, I was just responding to that. We need an additional member
 in the skb, but then this part is quite simple.

I guess you refer to the solution of using the fragmented list of
packets to do checks. I don't think it will be that simple to peek into
all the different skbs if the fragments are split in the header.
Splitting fragments in the header is the 1x1 of firewall by-passing
attacks, so we should certainly handle it correctly.

Logic off peeking into fragments and splitting fragments must be
logically consent all the time, with all netfilter helpers in the tree.

I don't think the additional checks in fast-path are in any way
justified.

Bye,
Hannes
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy

On 17.04, Hannes Frederic Sowa wrote:
 
 
 On Thu, Apr 16, 2015, at 22:56, Patrick McHardy wrote:
  On 17.04, Herbert Xu wrote:
   On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:
   
So currently we have one fast path, that is: we are not fragmented, we
get out non-fragmented, none of this code is ever touched and no
problem.

We don't want to mak this more complex, but
   
   You should read Dave's other email where he gives you an obvious
   solution.  If you have to modify the skb then you don't have to
   worry about the original fragments.
   
   But if you only read the skb then don't linearise it completely
   and keep the original fragments.
  
  Yes, I was just responding to that. We need an additional member
  in the skb, but then this part is quite simple.
 
 I guess you refer to the solution of using the fragmented list of
 packets to do checks. I don't think it will be that simple to peek into
 all the different skbs if the fragments are split in the header.
 Splitting fragments in the header is the 1x1 of firewall by-passing
 attacks, so we should certainly handle it correctly.

Also correct, that argument hasn't come up so far. Its actually not
too hard to handle, you need to keep track of which parts have been
inspected to make a classification decision and make sure those
parts are not overwritten. The TCP match does this statically, but
we certainly could do this dynamically. The question again is how
to keep geometry if things are actually overwritten, especially
partially. Apparently all this troubly is being seen as worthwhile.

 Logic off peeking into fragments and splitting fragments must be
 logically consent all the time, with all netfilter helpers in the tree.
 
 I don't think the additional checks in fast-path are in any way
 justified.

I fully agree.

 
 Bye,
 Hannes
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Alexei Starovoitov

On Fri, Apr 17, 2015 at 10:15:01AM +0800, Herbert Xu wrote:
  seems the cleaner fix will be to push skb-mac_len instead?
 
 No skb-mac_len is the same as skb2-dev-hard_header_len.

hmm. please help me understand the problem then.
In the commit log you mentioned that your vlan dev and ifb
have unequal hard header length. I think that can only happen if
your master dev used to create vlan, didn't have vlan offload,
so vlandev-hhl = 18 and ifb-hhl = 14. Then when tagged packet
arrives on master device we call skb_vlan_untag() and
skb-mac_len becomes 14, then vlan_do_receive() will do
another_round, ingress qdisc will trigger and act_mirred
eventually will be called. So existing act_mirred will be pushing
18 bytes, whereas only 14 bytes are valid and the lowest 4 have junk.
By 'fixing it' using ifb's 14 byte the patch is only masking the
problem, no?

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Herbert Xu

When you redirect a VLAN device to an ifb device, you end up with
crap within the ifb device because they have unequal hard header
lengths and the redirected packet contains four extra bytes at the
start which then gets interpreted as part of the MAC address.

This patch fixes this by only pushing on the hard header length
of ifb.  This assumes that the original packet actually has enough
header for that so checks have been added to that effect.

Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 34f846b..1256d26 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev)
if (from  AT_EGRESS) {
dev_queue_xmit(skb);
} else if (from  AT_INGRESS) {
-   skb_pull(skb, skb-dev-hard_header_len);
+   skb_pull(skb, _dev-hard_header_len);
netif_receive_skb(skb);
} else
BUG();
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5953517..44e4d50 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -151,13 +151,25 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
}
 
at = G_TC_AT(skb-tc_verd);
+
+   if (!(at  AT_EGRESS)  m-tcfm_ok_push 
+   (skb_headroom(skb)  dev-hard_header_len ||
+skb-dev-hard_header_len  dev-hard_header_len)) {
+   net_notice_ratelimited(
+   tc mirred: Insufficient headroom from %s to %s: 
+   %d %d\n,
+   skb-dev-name, dev-name,
+   skb_headroom(skb), skb-dev-hard_header_len);
+   goto out;
+   }
+
skb2 = skb_act_clone(skb, GFP_ATOMIC, m-tcf_action);
if (skb2 == NULL)
goto out;
 
if (!(at  AT_EGRESS)) {
if (m-tcfm_ok_push)
-   skb_push(skb2, skb2-dev-hard_header_len);
+   skb_push(skb2, dev-hard_header_len);
}
 
/* mirror is always swallowed */
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net] tcp: tcp_get_info() should fetch socket fields once

2015-04-16 Thread Eric Dumazet

From: Eric Dumazet eduma...@google.com

tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.

Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate

Fixes: 977cb0ecf82e (tcp: add pacing_rate information into tcp_info)
Signed-off-by: Eric Dumazet eduma...@google.com
---
David, I do not think this needs stable backport, this is a quite
minor bug. I've added the 'Fixes' tag for reference only.

 net/ipv4/tcp.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 18e3a12eb1b2..59c8a027721b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2595,6 +2595,7 @@ void tcp_get_info(const struct sock *sk, struct tcp_info 
*info)
const struct tcp_sock *tp = tcp_sk(sk);
const struct inet_connection_sock *icsk = inet_csk(sk);
u32 now = tcp_time_stamp;
+   u32 rate;
 
memset(info, 0, sizeof(*info));
 
@@ -2655,10 +2656,11 @@ void tcp_get_info(const struct sock *sk, struct 
tcp_info *info)
 
info-tcpi_total_retrans = tp-total_retrans;
 
-   info-tcpi_pacing_rate = sk-sk_pacing_rate != ~0U ?
-   sk-sk_pacing_rate : ~0ULL;
-   info-tcpi_max_pacing_rate = sk-sk_max_pacing_rate != ~0U ?
-   sk-sk_max_pacing_rate : ~0ULL;
+   rate = READ_ONCE(sk-sk_pacing_rate);
+   info-tcpi_pacing_rate = rate != ~0U ? rate : ~0ULL;
+
+   rate = READ_ONCE(sk-sk_max_pacing_rate);
+   info-tcpi_max_pacing_rate = rate != ~0U ? rate : ~0ULL;
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Alexei Starovoitov

On Fri, Apr 17, 2015 at 09:02:16AM +0800, Herbert Xu wrote:
 @@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev)
   if (from  AT_EGRESS) {
   dev_queue_xmit(skb);
   } else if (from  AT_INGRESS) {
 - skb_pull(skb, skb-dev-hard_header_len);
 + skb_pull(skb, _dev-hard_header_len);
...
   if (!(at  AT_EGRESS)) {
   if (m-tcfm_ok_push)
 - skb_push(skb2, skb2-dev-hard_header_len);
 + skb_push(skb2, dev-hard_header_len);

seems the cleaner fix will be to push skb-mac_len instead?

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Guenter Roeck


hi Vivien,

On 04/16/2015 03:05 PM, Vivien Didelot wrote:

Hi Guenter,


  switch (index) {
+case 0: /* temp1_input */
+if (drv-get_temp)
+mode |= S_IRUGO;


This should be mandatory. Sorry, I don't really understand what you are
trying to accomplish here.

Can you give me a real world example where a chip would support setting
a limit but not reading it ?


I have no such example. I just did not see why this couldn't be allowed
(e.g. setting only set_temp_limit and get_temp_alarm looks fine to me).
But if you say that get_temp should be mandatory, I'm OK with that.


write-only attributes are not defined in the hwmon ABI. If the 'sensors'
command encounters such an attribute, it will create an error message
each time it executes. That doesn't sound very useful to me.

If a chip - for whatever reason - does not have a limit register
but an alarm register or flag, its temperature limit is usually hard-coded
and can be reported this way (the AMD temperature sensor driver does this,
for example). If there is ever a need to support the alarm-register-only
situation for some odd reason, we can add the code at the time.
For now, it just seems to me that you are adding complexity to solve
some theoretic problem which is very unlikely to occur in the real world.


The primary goal of this patchset was to use DEVICE_ATTR_RW to declare
temp1_max, instead of reflecting the minimal permissions needed.


Then why don't you just do that and nothing else ? The goal should be
to simplify code, not to make it more complicated. If the result isn't
less code, I don't think it is worth it.

Thanks,
Guenter

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 128 matches

Mail list logo