Re: [RFC PATCH v3 09/10] lib: libos build scripts and documentation

2015-04-21 Thread Hajime Tazaki

Hi Paul,

many thanks for your review. 

all the fixes will be in the next patchset.
my comments are below.

At Mon, 20 Apr 2015 22:43:07 +0200,
Paul Bolle wrote:
> 
> Some random observations while I'm still trying to wrap my head around
> all this (which might take quite some time).
> 
> On Sun, 2015-04-19 at 22:28 +0900, Hajime Tazaki wrote:
> > --- /dev/null
> > +++ b/arch/lib/Kconfig
> > @@ -0,0 +1,124 @@
> > +menuconfig LIB
> > +   bool "LibOS-specific options"
> > +   def_bool n
> 
> This is the start of the Kconfig parse for lib. (That would basically
> still be true even if you didn't set KBUILD_KCONFIG, see below.) So why
> not do something like all arches do:
> 
> config LIB
>   def_bool y
>   select [...]
> 
> Ie, why would someone want to build for ARCH=lib and still not set LIB?

agreed. fixed.

> > +config EXPERIMENTAL
> > +   def_bool y
> 
> Unneeded: removed treewide in, I think, 2014.

thanks. fixed.

> > +config MMU
> > +def_bool n
> 
> Add empty line.
> 
> > +config FPU
> > +def_bool n
> 
> Ditto.

both are fixed.

> > +config KTIME_SCALAR
> > +   def_bool y
> 
> This one is unused.

deleted.

> > +config GENERIC_BUG
> > +   def_bool y
> > +   depends on BUG
> 
> Add empty line here.

fixed.

> > +config GENERIC_FIND_NEXT_BIT
> > +   def_bool y
> 
> This one is unused too.

deleted.

> > +config SLIB
> > +   def_bool y
> 
> You've also added SLIB to init/Kconfig in 02/10. But "make ARCH=lib
> *config" will never visit init/Kconfig, will it? And, apparently, none
> of SL[AOU]B are wanted for lib. So I think the entry for config SLIB in
> that file can be dropped (as other arches will never see it because it
> depends on LIB).
> 
> (Note that I haven't actually looked into all the Kconfig entries added
> above. Perhaps I might do that. But I'm pretty sure most of the time all
> I can say is: "I have no idea why this entry defaults to $VALUE".)

I intended SLIB to be a generic option, not only for
arch/lib, as we discussed during the v2 patch review.

but you're right: for the moment no one uses SLIB and we
don't visit init/Kconfig, so I dropped the config SLIB entry
from init/Kconfig.

> > +source "net/Kconfig"
> > +
> > +source "drivers/base/Kconfig"
> > +
> > +source "crypto/Kconfig"
> > +
> > +source "lib/Kconfig"
> > +
> > +
> 
> Trailing empty lines.

deleted. thanks.

> > diff --git a/arch/lib/Makefile b/arch/lib/Makefile
> > new file mode 100644
> > index 000..d8a0bf9
> > --- /dev/null
> > +++ b/arch/lib/Makefile
> > @@ -0,0 +1,251 @@
> > +ARCH_DIR := arch/lib
> > +SRCDIR=$(dir $(firstword $(MAKEFILE_LIST)))
> 
> Do you use SRCDIR?

no. deleted the line.

> > +DCE_TESTDIR=$(srctree)/tools/testing/libos/
> > +KBUILD_KCONFIG := arch/$(ARCH)/Kconfig
> 
> I think you copied this from arch/um/Makefile. But arch/um/ is, well,
> special. Why should lib not start the kconfig parse in the file named
> Kconfig? And if you want to start in arch/lib/Kconfig, it would be nice
> to add a mainmenu (just like arch/x86/um/Kconfig does).

right now, 'lib' only wants to parse arch/lib/Kconfig so that
it builds and links the files it needs, instead of a configurable set.

so I believe arch/lib is also special, as arch/um is.

I added a mainmenu btw. thanks.

> (I don't read Makefiles well enough to understand the rest of this
> file. I think it's scary.)

indeed. thank you again for reviewing these cryptic files.

> When I did
> make ARCH=lib menuconfig
> 
> I saw (among other things):
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `lzo/' given more than once in the same rule.
(snip)
> arch/lib/Makefile.print:41: target `ppp/' given more than once in the same rule.
> arch/lib/Makefile.print:41: target `slip/' given more than once in the same rule.
> 
> I have no idea why. Unclean tree?

this was due to inappropriate handling of the internal
directory listing procedure. fixed.

> > +.PHONY : core
> > +.NOTPARALLEL : print $(subdirs) $(final-obj-m)
> 
> > --- /dev/null
> > +++ b/arch/lib/processor.mk
> > @@ -0,0 +1,7 @@
> > +PROCESSOR=$(shell uname -m)
> > +PROCESSOR_x86_64=64
> > +PROCESSOR_i686=32
> > +PROCESSOR_i586=32
> > +PROCESSOR_i386=32
> > +PROCESSOR_i486=32
> > +PROCESSOR_SIZE=$(PROCESSOR_$(PROCESSOR))
> 
> The rest of the tree appears to use BITS instead of PROCESSOR_SIZE. And
> I do hope there's a cleaner way for lib to set PROCESSOR_SIZE than this.

the variable PROCESSOR_SIZE is only used by
arch/lib/Makefile, with the following lines.

> +ifeq ($(PROCESSOR_SIZE),64)
> +CFLAGS+= -DCONFIG_64BIT
> +endif

Thus it eventually uses CONFIG_64BIT.
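
For illustration only, here is a small C model of what processor.mk computes, next to one possible alternative. It assumes the only goal is deciding whether CONFIG_64BIT gets defined; the helper names are hypothetical. Note that `uname -m` describes the build host, so the table cannot describe a cross-compiled target, while a compiler-side check such as sizeof(void *) reflects the target actually being built.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical model of the PROCESSOR_$(uname -m) lookup in
 * arch/lib/processor.mk: 64 or 32 for listed machines, 0 otherwise
 * (the Makefile would leave PROCESSOR_SIZE empty in that case). */
static int processor_size(const char *machine)
{
	static const struct {
		const char *name;
		int bits;
	} table[] = {
		{ "x86_64", 64 },
		{ "i686", 32 }, { "i586", 32 },
		{ "i486", 32 }, { "i386", 32 },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(machine, table[i].name) == 0)
			return table[i].bits;
	return 0;
}

/* Alternative: ask the compiler for the pointer width of the target. */
static int target_pointer_bits(void)
{
	return (int)(sizeof(void *) * CHAR_BIT);
}
```

With this model an unlisted host such as aarch64 yields 0, so the `ifeq ($(PROCESSOR_SIZE),64)` test fails and CONFIG_64BIT is silently not set, which is one argument for deriving the width from the toolchain instead.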

I think a cleaner way is to follow the way of arch/um, like
below:

Re: [RFC 3/3] tc: cleanup tc_classify

2015-04-21 Thread Cong Wang
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov  wrote:
> introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce
> copy-paste among different qdiscs


I don't think qdisc_drop_bypass() is more readable than without it,
maybe you need a better name, or just leave the code as it is.

tc_classify_act() seems ok.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Renesas Ethernet AVB driver

2015-04-21 Thread MITSUHIRO KIMURA
Hello Sergei.

 (2015/04/15 6:37:28), Sergei Shtylyov wrote:
> >> +  if (!ravb_tx_free(ndev, q)) {
> >> +  netif_warn(priv, tx_queued, ndev, "TX FD exhausted.\n");
> >> +  netif_stop_queue(ndev);
> >> +  spin_unlock_irqrestore(&priv->lock, flags);
> >> +  return NETDEV_TX_BUSY;
> >> +  }
> >> +  }
> >> +  entry = priv->cur_tx[q] % priv->num_tx_ring[q];
> >> +  priv->cur_tx[q]++;
> >> +  spin_unlock_irqrestore(&priv->lock, flags);
> >> +
> >> +  if (skb_put_padto(skb, ETH_ZLEN))
> >> +  return NETDEV_TX_OK;
> >> +
> >> +  priv->tx_skb[q][entry] = skb;
> >> +  buffer = PTR_ALIGN(priv->tx_buffers[q][entry], RAVB_ALIGN);
> >> +  memcpy(buffer, skb->data, skb->len);
> 
> > ~1500 bytes memcpy(), not good...
> 
> I'm looking in the manual and not finding the hard requirement to have the
> buffer address aligned to 128 bytes (RAVB_ALIGN), sigh... Kimura-san?

There is a hardware requirement that the frame data must be aligned to
a 32-bit boundary in the URAM; see section 45A.3.3.1 Data Representation
in the manual.
I think the original skb->data is usually only 2-byte aligned because of
NET_IP_ALIGN, so we copy the original skb->data to a local aligned buffer.

In addition, section 45A.3.3.12 Tips for Optimizing Performance in Handling
Descriptors mentions that frame data is accessed in blocks of up to 128 bytes,
and that the number of 128-byte borders (addresses H'xxx00 and H'xxx80) inside
the frame data should be minimized.
So we set RAVB_ALIGN to 128 bytes.
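
A rough C sketch of the two constraints described above: rounding a buffer address up to RAVB_ALIGN, as PTR_ALIGN() does in the driver, and counting how many 128-byte borders fall inside a frame. The helper names are hypothetical, NET_IP_ALIGN is assumed to be 2, and the counting rule is a reading of the manual text quoted above rather than a hardware model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAVB_ALIGN   128u	/* value the driver uses, per the reply */
#define NET_IP_ALIGN 2u		/* assumed; architecture dependent */

/* Round addr up to a power-of-two boundary a, like the kernel's ALIGN(). */
static uintptr_t align_up(uintptr_t addr, uintptr_t a)
{
	return (addr + a - 1) & ~(a - 1);
}

/* Count 128-byte borders (addresses ending in 0x00 or 0x80) lying
 * strictly inside the frame [addr, addr + len). */
static unsigned int borders_inside(uintptr_t addr, size_t len)
{
	uintptr_t first = align_up(addr + 1, RAVB_ALIGN);
	uintptr_t end = addr + len;

	if (first >= end)
		return 0;
	return (unsigned int)((end - 1 - first) / RAVB_ALIGN + 1);
}
```

For example, a 128-byte frame starting on a 128-byte boundary crosses no border, while the same frame starting NET_IP_ALIGN bytes later crosses one; that is the kind of extra access the aligned copy avoids, at the cost of the memcpy() discussed above.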

Best Regards,
Mitsuhiro Kimura



Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-21 Thread Cong Wang
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov  wrote:
> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
> Get rid of redundant checks in all qdiscs.
> Instead do it once.

The current code can be easily extended, while yours cannot.
I don't see the need for this change.


Re: [RFC 1/3] tc: fix return values of ingress qdisc

2015-04-21 Thread Cong Wang
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov  wrote:
> ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
>

XMIT already means egress...

> Since it's invoked via qdisc_enqueue_root() (which is supposed to return
> only NET_XMIT_* values as well), it was working by accident,
> since TC_ACT_* values fit within NET_XMIT_MASK.
>

Why not just add a BUILD_BUG_ON() to capture this?
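
A userspace sketch of the BUILD_BUG_ON() idea, with C11 _Static_assert standing in for the kernel macro. The constant values are assumed from the kernel headers of this period (include/uapi/linux/pkt_cls.h and include/linux/netdevice.h) and should be checked against the tree; where such a check would live in the ingress qdisc is left open.

```c
#include <assert.h>

/* Assumed values from the kernel headers of this period. */
#define TC_ACT_OK		0
#define TC_ACT_RECLASSIFY	1
#define TC_ACT_SHOT		2
#define TC_ACT_PIPE		3
#define TC_ACT_STOLEN		4
#define TC_ACT_QUEUED		5
#define TC_ACT_REPEAT		6
#define NET_XMIT_MASK		0x0f

/* True if v survives masking with NET_XMIT_MASK unchanged. */
#define FITS_XMIT_MASK(v)	(((v) & NET_XMIT_MASK) == (v))

/* Compile-time checks: the userspace analogue of
 * BUILD_BUG_ON(!FITS_XMIT_MASK(x)). TC_ACT_REPEAT is the largest
 * value listed here, so it is the one that matters today. */
_Static_assert(FITS_XMIT_MASK(TC_ACT_STOLEN), "TC_ACT_STOLEN too large");
_Static_assert(FITS_XMIT_MASK(TC_ACT_QUEUED), "TC_ACT_QUEUED too large");
_Static_assert(FITS_XMIT_MASK(TC_ACT_REPEAT), "TC_ACT_REPEAT too large");

/* Runtime variant, handy for testing arbitrary values. */
static int tc_act_fits(int v)
{
	return FITS_XMIT_MASK(v);
}
```

The appeal of the compile-time form is that the "working by accident" assumption becomes a build failure the day a new TC_ACT_* value no longer fits.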


Re: [Patch net] igb: fix a typo in igb_reset_q_vector()

2015-04-21 Thread Cong Wang
On Tue, Apr 21, 2015 at 5:29 PM, Jeff Kirsher
 wrote:
> On Wed, 2015-04-22 at 09:26 +0900, Toshiaki Makita wrote:
>> Hi Cong Wang,
>>
>> I have already sent a patch to intel's tree.
>> http://git.kernel.org/cgit/linux/kernel/git/jkirsher/net-queue.git/commit/?h=dev-queue&id=02ac4b65689f8df824117395fd8d160c04161a7b
>>

I didn't know about this tree; I use the -net tree.

>>
>
> Oops, yeah.  I already have this in my queue.  I wondered why the patch
> seemed familiar. Cong, dropping your patch since I already have
> Toshiaki's patch in my queue.

Sure. Please queue it for -stable too, since it apparently fixes a bug.

Thanks!


[GIT] Networking

2015-04-21 Thread David Miller

Just a few fixes trickling in at this point.

1) If we see an attached socket on an skb in the ipv4 forwarding
   path, bail.  This can happen due to races with FIB rule addition,
   and deletion, and we should just drop such frames.  From Sebastian
   Pöhn.

2) pppoe receive should only accept packets destined for this host's
   MAC address.  From Joakim Tjernlund.

3) Handle checksum unwrapping properly in ppp receive when
   it's encapsulated in UDP in some way, fix from Tom Herbert.

4) Fix some bugs in mv88e6xxx DSA driver resulting from the conversion
   from register offset constants to mnemonic macros.  From Vivien
   Didelot.

5) Fix handling of HCA max message size in mlx4 adapters, from Eran
   Ben Elisha.
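
Item 2 can be pictured with a small model of how the Ethernet layer classifies a frame's destination (roughly what eth_type_trans() does when it sets skb->pkt_type) plus a receive check that keeps only frames addressed to this host. The enum, helpers and their names are hypothetical, and this is not claimed to be the code in Joakim's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical mirror of the kernel's PACKET_* classes. */
enum pkt_class { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/* Classify a destination MAC relative to this device's address. */
static enum pkt_class classify_dest(const unsigned char dest[ETH_ALEN],
				    const unsigned char dev_addr[ETH_ALEN])
{
	static const unsigned char bcast[ETH_ALEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	if (dest[0] & 0x01)	/* group bit set: broadcast or multicast */
		return memcmp(dest, bcast, ETH_ALEN) == 0 ? PKT_BROADCAST
							  : PKT_MULTICAST;
	if (memcmp(dest, dev_addr, ETH_ALEN) != 0)
		return PKT_OTHERHOST;	/* e.g. seen in promiscuous mode */
	return PKT_HOST;
}

/* PPPoE session data is accepted only when unicast to this host. */
static bool pppoe_accept(const unsigned char dest[ETH_ALEN],
			 const unsigned char dev_addr[ETH_ALEN])
{
	return classify_dest(dest, dev_addr) == PKT_HOST;
}
```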

Please pull, thanks a lot.

The following changes since commit 04b7fe6a4a231871ef681bc95e08fe66992f7b1f:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide (2015-04-17 16:36:59 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to fab9adfb71fc8690e20c3c280d39d49c8f4a3f0a:

  net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP (2015-04-21 17:36:08 -0400)


Andreas Oetken (1):
  altera tse: Error-Bit on tx-avalon-stream always set.

David S. Miller (1):
  Merge branch 'ppp_csum_unset'

Eran Ben Elisha (1):
  net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP

Joakim Tjernlund (1):
  pppoe: Lacks DST MAC address check

Sebastian Pöhn (1):
  ip_forward: Drop frames with attached skb->sk

Tom Herbert (2):
  net: add skb_checksum_complete_unset
  ppp: call skb_checksum_complete_unset in ppp_receive_frame

Vivien Didelot (2):
  net: dsa: mv88e6xxx: fix setup of port control 1
  net: dsa: mv88e6xxx: use PORT_DEFAULT_VLAN

jba...@akamai.com (1):
  tcp: add memory barriers to write space paths

 drivers/net/dsa/mv88e6xxx.c   |  6 +++---
 drivers/net/ethernet/altera/altera_msgdmahw.h |  1 -
 drivers/net/ethernet/mellanox/mlx4/fw.c   |  2 +-
 drivers/net/ppp/ppp_generic.c |  1 +
 drivers/net/ppp/pppoe.c   |  3 +++
 include/linux/skbuff.h| 12 
 net/ipv4/ip_forward.c |  3 +++
 net/ipv4/tcp.c|  4 +++-
 net/ipv4/tcp_input.c  |  2 ++
 9 files changed, 28 insertions(+), 6 deletions(-)


Re: [PATCH 0/3] mpls: ABI changes for security and correctness

2015-04-21 Thread David Miller
From: ebied...@xmission.com (Eric W. Biederman)
Date: Tue, 21 Apr 2015 19:29:42 -0500

> Robert Shearman  writes:
> 
>> These changes make mpls not be enabled by default on all
>> interfaces when in use for security, along with ensuring that a label
>> not valid as an outgoing label can be added in mpls routes.
>>
>> This series contains three ABI/behaviour-affecting changes which have
>> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
>> improvements" without any further modification. These changes need to
>> be considered for 4.1 otherwise we'll be stuck with the current
>> behaviour/ABI forever.
> 
> I don't like the difference in default between loopback and everything
> else.  That just seems like an extra arbitrary rule.
> 
> Otherwise:
> Acked-by: "Eric W. Biederman" 
> 
> Not that I expect Dave Miller is taking patches during the merge window.

Eric, you say you disagree with the loopback vs. everything else
behavior, yet you're ACK'ing this.

Please don't say something like that because it is confusing and
I can't tell what you want me to do.

If you're willing to accept the series as is, say: "Even though
I disagree with X, I'm ok with this series for now."

If you want changes before the series gets applied: "I want X
changed to Y, and with that I give my ACK."

Thanks.


Re: [PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails

2015-04-21 Thread David Miller
From: Herbert Xu 
Date: Wed, 22 Apr 2015 08:36:34 +0800

> On Tue, Apr 21, 2015 at 02:55:34PM +0200, Thomas Graf wrote:
>> When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
>> we can't allocate the necessary memory in the current context but the
>> limits as set by the user would still allow to grow.
>> 
>> Thus attempt an async resize in the background where we can allocate
>> using GFP_KERNEL which is more likely to succeed. The insertion itself
>> will still fail to indicate pressure.
>> 
>> This fixes a bug where the table would never continue growing once the
>> utilization is above 100%.
>> 
>> Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
>> Signed-off-by: Thomas Graf 
> 
> Good catch.  But I think this call should happen in
> rhashtable_insert_rehash since it's on the slow-path.

Ok, then I expect a respin of this series.


[PATCH net] tcp: fix possible deadlock in tcp_send_fin()

2015-04-21 Thread Eric Dumazet
From: Eric Dumazet 

Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.

To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.

In a follow-up patch, we might abort the tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as the memory allocator
has already done its best to get extra memory.

This patch reverts d22e15371811 ("tcp: fix tcp fin memory accounting")

Fixes: d22e15371811 ("tcp: fix tcp fin memory accounting")
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_output.c |   20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8c8d7e06b72f..2ade67b7cdb0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2812,6 +2812,21 @@ begin_fwd:
}
 }
 
+/* We allow to exceed memory limits for FIN packets to expedite
+ * connection tear down and (memory) recovery.
+ * Otherwise tcp_send_fin() could loop forever.
+ */
+static void sk_forced_wmem_schedule(struct sock *sk, int size)
+{
+   int amt, status;
+
+   if (size <= sk->sk_forward_alloc)
+   return;
+   amt = sk_mem_pages(size);
+   sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
+   sk_memory_allocated_add(sk, amt, &status);
+}
+
 /* Send a fin.  The caller locks the socket for us.  This cannot be
  * allowed to fail queueing a FIN frame under any circumstances.
  */
@@ -2834,11 +2849,14 @@ void tcp_send_fin(struct sock *sk)
} else {
/* Socket is locked, keep trying until memory is available. */
for (;;) {
-   skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+   skb = alloc_skb_fclone(MAX_TCP_HEADER,
+  sk->sk_allocation);
if (skb)
break;
yield();
}
+   skb_reserve(skb, MAX_TCP_HEADER);
+   sk_forced_wmem_schedule(sk, skb->truesize);
 /* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */
tcp_init_nondata_skb(skb, tp->write_seq,
 TCPHDR_ACK | TCPHDR_FIN);
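
To make the accounting in sk_forced_wmem_schedule() concrete, here is a small userspace model of it. It assumes SK_MEM_QUANTUM equals a 4 KiB page and replaces sk_memory_allocated_add() with a plain counter; the struct and helper names are hypothetical. What it shows is that the charge is rounded up to whole quanta and applied with no comparison against tcp_mem[2], which is what lets the FIN go out under memory pressure.

```c
#include <assert.h>

#define SK_MEM_QUANTUM 4096	/* assumption: PAGE_SIZE of 4 KiB */

/* Hypothetical stand-in for the relevant parts of struct sock. */
struct fake_sock {
	int sk_forward_alloc;	/* bytes already charged to this socket */
	long memory_allocated;	/* protocol-wide total, in quanta */
};

/* Round bytes up to whole quanta, as sk_mem_pages() does. */
static int mem_pages(int size)
{
	return (size + SK_MEM_QUANTUM - 1) / SK_MEM_QUANTUM;
}

/* Model of sk_forced_wmem_schedule(): charge unconditionally. */
static void forced_wmem_schedule(struct fake_sock *sk, int size)
{
	int amt;

	if (size <= sk->sk_forward_alloc)
		return;		/* already covered, nothing to charge */
	amt = mem_pages(size);
	sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
	sk->memory_allocated += amt;	/* no limit check on purpose */
}
```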




Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset

2015-04-21 Thread Michael Trimarchi
Hi

On Tue, Apr 21, 2015 at 08:31:34PM -0400, David Miller wrote:
> From: Michael Trimarchi 
> Date: Wed, 22 Apr 2015 01:13:47 +0200
> 
> > Hi
> > 
> > On Tue, Apr 21, 2015 at 05:35:40PM -0400, David Miller wrote:
> >> From: Michael Trimarchi 
> >> Date: Tue, 21 Apr 2015 13:16:13 +0200
> >> 
> >> > -udelay(data->delays[0]);
> >>  ...
> >> > +msleep(max(1U, data->delays[0] / 1000));
> >> 
> >> That looks very ugly with that max() expression in there.
> >>
> > 
> > Would DIV_ROUND_UP be fine with you?
> 
> Not inside of these simple msleep() calls, no.
> 
> How about adjusting the values either in the datastructure or
> in local variables instead?  That wasn't so hard to come up
> with now, was it?

Ok, that's easy, no problem at all. I will post a new version later today,
but I prefer local variables with DIV_ROUND_UP.
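
One way to read the suggestion above: convert the microsecond delays to milliseconds once, into local variables, so each msleep() call stays simple. DIV_ROUND_UP below has the same definition as the kernel macro; the helper and the driver snippet in the comment are hypothetical, not the posted patch. Unlike max(1U, us / 1000), rounding up never shortens a delay (1500 us becomes 2 ms rather than 1 ms), though a zero delay stays zero.

```c
#include <assert.h>

/* Same definition as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: microseconds to milliseconds, rounded up. */
static unsigned int us_to_ms_round_up(unsigned int us)
{
	return DIV_ROUND_UP(us, 1000U);
}

/*
 * In the driver this might look like (sketch only):
 *
 *	unsigned int delay0 = DIV_ROUND_UP(data->delays[0], 1000);
 *	...
 *	msleep(delay0);
 */
```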

Michael



Re: [PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further

2015-04-21 Thread Herbert Xu
On Tue, Apr 21, 2015 at 02:55:35PM +0200, Thomas Graf wrote:
> The current code only stops inserting rehashes into the
> chain when no resizes are currently scheduled. As long as resizes
> are scheduled and while inserting above the utilization watermark,
> more and more rehashes will be scheduled.
> 
> This led to a perfect DoS storm with thousands of rehashes
> scheduled, which led to thousands of spinlocks being taken
> sequentially.
> 
> Instead, only allow either a series of resizes or a single rehash.
> Drop any further rehashes and return -EBUSY.
> 
> Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
> Signed-off-by: Thomas Graf 

Acked-by: Herbert Xu 
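
A toy model of the policy in the commit message: either a series of resizes or a single rehash, with further rehashes refused with -EBUSY. The struct and functions are hypothetical and far simpler than rhashtable (and this is one reading of "either ... or"); they only illustrate why the number of scheduled rehashes stops growing with the number of inserts above the watermark.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_table {
	int pending_resizes;	/* grow/shrink operations queued */
	bool rehash_pending;	/* a same-size rehash is queued */
	int rehashes_scheduled;	/* total rehashes ever accepted */
};

/* Resizes may still be chained one after another. */
static int toy_request_resize(struct toy_table *t)
{
	t->pending_resizes++;
	return 0;
}

/* A rehash is accepted only if nothing else is already queued. */
static int toy_request_rehash(struct toy_table *t)
{
	if (t->rehash_pending || t->pending_resizes > 0)
		return -EBUSY;
	t->rehash_pending = true;
	t->rehashes_scheduled++;
	return 0;
}
```

In this model, a thousand inserts above the watermark schedule exactly one rehash instead of a thousand.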
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails

2015-04-21 Thread Herbert Xu
On Tue, Apr 21, 2015 at 02:55:34PM +0200, Thomas Graf wrote:
> When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
> we can't allocate the necessary memory in the current context but the
> limits as set by the user would still allow to grow.
> 
> Thus attempt an async resize in the background where we can allocate
> using GFP_KERNEL which is more likely to succeed. The insertion itself
> will still fail to indicate pressure.
> 
> This fixes a bug where the table would never continue growing once the
> utilization is above 100%.
> 
> Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
> Signed-off-by: Thomas Graf 

Good catch.  But I think this call should happen in
rhashtable_insert_rehash since it's on the slow-path.
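
A minimal sketch of the placement suggested here: the fallback to an asynchronous resize lives inside the slow-path rehash function itself, so the fast-path insert only sees the error code. Everything is hypothetical (names, struct, and the flag that simulates an atomic allocation failure); in the kernel the deferred part would run from a work item that can allocate with GFP_KERNEL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_ht {
	bool atomic_alloc_fails;	/* simulate an atomic-context allocation failure */
	bool async_resize_scheduled;	/* stands in for a queued work item */
};

/* Stand-in for queueing the deferred resize worker. */
static void toy_schedule_async_resize(struct toy_ht *ht)
{
	ht->async_resize_scheduled = true;
}

/* Slow path: try to rehash now; on -ENOMEM, ask the worker to grow the
 * table later, but still report the failure so the caller sees pressure. */
static int toy_insert_rehash(struct toy_ht *ht)
{
	if (ht->atomic_alloc_fails) {
		toy_schedule_async_resize(ht);
		return -ENOMEM;
	}
	return 0;
}

/* Fast path: unaware of the fallback. */
static int toy_insert(struct toy_ht *ht, bool above_watermark)
{
	return above_watermark ? toy_insert_rehash(ht) : 0;
}
```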

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/3] mpls: ABI changes for security and correctness

2015-04-21 Thread Eric W. Biederman
Robert Shearman  writes:

> These changes make mpls not be enabled by default on all
> interfaces when in use for security, along with ensuring that a label
> not valid as an outgoing label can be added in mpls routes.
>
> This series contains three ABI/behaviour-affecting changes which have
> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
> improvements" without any further modification. These changes need to
> be considered for 4.1 otherwise we'll be stuck with the current
> behaviour/ABI forever.

I don't like the difference in default between loopback and everything
else.  That just seems like an extra arbitrary rule.

Otherwise:
Acked-by: "Eric W. Biederman" 

Not that I expect Dave Miller is taking patches during the merge window.

> Robert Shearman (3):
>   mpls: Per-device MPLS state
>   mpls: Per-device enabling of packet input
>   mpls: Prevent use of implicit NULL label as outgoing label
>
>  Documentation/networking/mpls-sysctl.txt |   9 +++
>  include/linux/netdevice.h|   4 +
>  net/mpls/af_mpls.c   | 132 
> ++-
>  net/mpls/internal.h  |   6 ++
>  4 files changed, 148 insertions(+), 3 deletions(-)

Eric


Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset

2015-04-21 Thread David Miller
From: Michael Trimarchi 
Date: Wed, 22 Apr 2015 01:13:47 +0200

> Hi
> 
> On Tue, Apr 21, 2015 at 05:35:40PM -0400, David Miller wrote:
>> From: Michael Trimarchi 
>> Date: Tue, 21 Apr 2015 13:16:13 +0200
>> 
>> > -  udelay(data->delays[0]);
>>  ...
>> > +  msleep(max(1U, data->delays[0] / 1000));
>> 
>> That looks very ugly with that max() expression in there.
>>
> 
> Would DIV_ROUND_UP be fine with you?

Not inside of these simple msleep() calls, no.

How about adjusting the values either in the datastructure or
in local variables instead?  That wasn't so hard to come up
with now, was it?


Re: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread David Miller
From: Simon Xiao 
Date: Tue, 21 Apr 2015 22:14:14 +

> In the current netvsc driver, for each packet received, it will call
> dump_rndis_message() to try to dump the rndis packet information by
> netdev_dbg().  In non-debug mode, dump_rndis_message() will not dump
> anything, but it still initializes some local variables and runs
> the switch logic in dump_rndis_message(), which is
> unnecessary, especially in high network throughput situations.

See NETIF_MSG_* and use it properly in your driver, read other drivers
and learn how to properly use it for things like this.

I'm not going to explain this a third time.
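
For reference, the common pattern is to keep a msg_enable bitmask in the driver's private data and test it before doing any debug work, so the normal receive path costs a single bit test. The sketch models that in userspace; the NETIF_MSG_PKTDATA value is assumed from include/linux/netdevice.h, the helper mirrors the kernel's netif_msg_pktdata(), and the struct, counter and stub are hypothetical. Which NETIF_MSG_* class best fits an RNDIS dump is a judgement call for the driver.

```c
#include <assert.h>

/* Assumed value of the kernel's NETIF_MSG_PKTDATA bit. */
#define NETIF_MSG_PKTDATA 0x1000

struct fake_priv {
	unsigned int msg_enable;	/* bitmask of enabled NETIF_MSG_* classes */
};

static int dumps;	/* counts how often the expensive dump ran */

/* Hypothetical stand-in for dump_rndis_message(). */
static void dump_rndis_message_stub(const void *pkt)
{
	(void)pkt;
	dumps++;
}

/* Mirrors netif_msg_pktdata(): a cheap bit test. */
static int msg_pktdata(const struct fake_priv *p)
{
	return (p->msg_enable & NETIF_MSG_PKTDATA) != 0;
}

/* Receive path: only pay for the dump when it was asked for. */
static void on_receive(const struct fake_priv *p, const void *pkt)
{
	if (msg_pktdata(p))
		dump_rndis_message_stub(pkt);
}
```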


Re: [Patch net] igb: fix a typo in igb_reset_q_vector()

2015-04-21 Thread Jeff Kirsher
On Wed, 2015-04-22 at 09:26 +0900, Toshiaki Makita wrote:
> On 2015/04/22 8:20, Cong Wang wrote:
> > Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a single function")
> > Cc: Alexander Duyck 
> > Cc: Jeff Kirsher 
> > Signed-off-by: Cong Wang 
> > ---
> >  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 8457d03..85bbeb2 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -1036,7 +1036,7 @@ static void igb_reset_q_vector(struct igb_adapter *adapter, int v_idx)
> > adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
> >  
> > if (q_vector->rx.ring)
> > -   adapter->tx_ring[q_vector->rx.ring->queue_index] = NULL;
> > +   adapter->rx_ring[q_vector->rx.ring->queue_index] = NULL;
> >  
> > netif_napi_del(&q_vector->napi);
> >  
> 
> Hi Cong Wang,
> 
> I have already sent a patch to intel's tree.
> http://git.kernel.org/cgit/linux/kernel/git/jkirsher/net-queue.git/commit/?h=dev-queue&id=02ac4b65689f8df824117395fd8d160c04161a7b
> 
> Toshiaki Makita
> 

Oops, yeah.  I already have this in my queue.  I wondered why the patch
seemed familiar. Cong, dropping your patch since I already have
Toshiaki's patch in my queue.




Re: [Patch net] igb: fix a typo in igb_reset_q_vector()

2015-04-21 Thread Toshiaki Makita
On 2015/04/22 8:20, Cong Wang wrote:
> Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a single function")
> Cc: Alexander Duyck 
> Cc: Jeff Kirsher 
> Signed-off-by: Cong Wang 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 8457d03..85bbeb2 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -1036,7 +1036,7 @@ static void igb_reset_q_vector(struct igb_adapter *adapter, int v_idx)
>   adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
>  
>   if (q_vector->rx.ring)
> - adapter->tx_ring[q_vector->rx.ring->queue_index] = NULL;
> + adapter->rx_ring[q_vector->rx.ring->queue_index] = NULL;
>  
>   netif_napi_del(&q_vector->napi);
>  

Hi Cong Wang,

I have already sent a patch to intel's tree.
http://git.kernel.org/cgit/linux/kernel/git/jkirsher/net-queue.git/commit/?h=dev-queue&id=02ac4b65689f8df824117395fd8d160c04161a7b

Toshiaki Makita



Re: [Patch net] igb: fix a typo in igb_reset_q_vector()

2015-04-21 Thread Jeff Kirsher
On Tue, 2015-04-21 at 16:20 -0700, Cong Wang wrote:
> Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a single function")
> Cc: Alexander Duyck 
> Cc: Jeff Kirsher 
> Signed-off-by: Cong Wang 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks Cong, I will add your patch to my queue.




Re: [PATCH] fix tcp fin memory accounting

2015-04-21 Thread Eric Dumazet
On Tue, 2015-03-24 at 01:11 -0500, Josh Hunt wrote:
> On 03/24/2015 01:10 AM, David Miller wrote:
> > From: Josh Hunt 
> > Date: Fri, 20 Mar 2015 12:36:24 -0500
> >
> >> Would it be possible to queue up 355a901e6cf1 (tcp: make connect()
> >> mem charging friendly) for stable as well? That is the commit that
> >> fixes this problem in the tcp_connect()/tcp_send_syn_data() cases.
> >
> > Done.
> >
> 
> Thanks David.

Note that this patch adds a deadlock possibility in some stress
situations.

If a process owning some tcp socket dies, and tcp_mem[2] is already hit,
all sk_stream_alloc_skb() calls can return NULL and we loop in tcp_send_fin(),
making no progress because we cannot free any tcp memory.





Networking microconference at LPC15 in Seattle (Aug 19-21)

2015-04-21 Thread Thomas Graf
Hi,

Tom and I will be running another iteration of a networking
focused microconference [0] at LPC, Aug 19-21 in Seattle. Given that
a bunch of us will be in Seattle anyway we might as well have a
general networking session and spend some time together.

This year's focus is on: IPv6, Network Virtualization, and Security.

Please note that separate sessions are proposed for "Network offload",
"Wireless networking" and "Network management".

If you are interested in participating and have ideas for discussions,
please add them to the wiki [0]. If you are interested in attending,
please list your name on the wiki [0]. If you are interested in
helping us run the session, feel free to reach out to us.

[0] http://wiki.linuxplumbersconf.org/2015:networking

Thomas


[Patch net] igb: fix a typo in igb_reset_q_vector()

2015-04-21 Thread Cong Wang
Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a single function")
Cc: Alexander Duyck 
Cc: Jeff Kirsher 
Signed-off-by: Cong Wang 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8457d03..85bbeb2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1036,7 +1036,7 @@ static void igb_reset_q_vector(struct igb_adapter *adapter, int v_idx)
adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
 
if (q_vector->rx.ring)
-   adapter->tx_ring[q_vector->rx.ring->queue_index] = NULL;
+   adapter->rx_ring[q_vector->rx.ring->queue_index] = NULL;
 
netif_napi_del(&q_vector->napi);
 
-- 
1.8.3.1
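
A tiny userspace model of what the one-line fix changes: before it, the RX branch cleared a slot in tx_ring[] using the RX queue index, which wrongly cleared a TX entry and left the rx_ring[] entry dangling. The structures and names are hypothetical stand-ins for struct igb_adapter and its rings; only the fixed behaviour is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 4

struct fake_ring {
	int queue_index;
};

struct fake_adapter {
	struct fake_ring *tx_ring[MAX_QUEUES];
	struct fake_ring *rx_ring[MAX_QUEUES];
};

/* Fixed behaviour: each ring pointer is cleared in its own array. */
static void reset_q_vector_rings(struct fake_adapter *a,
				 struct fake_ring *tx, struct fake_ring *rx)
{
	if (tx)
		a->tx_ring[tx->queue_index] = NULL;
	if (rx)
		a->rx_ring[rx->queue_index] = NULL;	/* was tx_ring[] before the fix */
}
```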



Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-21 Thread Luis R. Rodriguez
On Tue, Apr 21, 2015 at 06:51:26PM -0400, Andy Walls wrote:
> Sorry for the top post; mobile work email account.
> 
> Luis,
> 
> You do the changes to remove MTRR and point me to your dev repo and branch.
> Also point me to the new functions/primitives I'll need.

There is nothing new actually needed for ivtv, unless of course
the ivtv driver is bound to use a large MTRR that includes
the non-framebuffer region; if so, then ioremap_uc() would
be needed, and you can just cherry-pick that patch:

https://marc.info/?l=linux-kernel&m=142964809110516&w=1

I'll bounce that patch to you as well. Might help reading this
patch too:

https://marc.info/?l=linux-kernel&m=142964809710517&w=1

If your write-combining area is not restricted by size constraints
that make it also include the non-framebuffer areas, then you can just
do a simple conversion of the driver to use ioremap_wc() on the
framebuffer followed by arch_phys_wc_add().
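
Why the size constraint matters can be shown with a little arithmetic. An x86 variable-range MTRR must cover a power-of-two sized, naturally aligned region, so the smallest MTRR covering a framebuffer can spill into registers that sit right after it. The sketch computes that covering region; the helper is hypothetical and the BAR address and sizes in the test are made up for illustration, not ivtv's real layout.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest naturally aligned power-of-two region containing
 * [start, start + len): the shape one variable-range MTRR needs.
 * Assumes len > 0. */
static void mtrr_cover(uint64_t start, uint64_t len,
		       uint64_t *base, uint64_t *size)
{
	uint64_t end = start + len;	/* exclusive */
	uint64_t s = 4096;		/* MTRRs have 4 KiB granularity */

	/* The only aligned s-sized block holding start begins at
	 * start & ~(s - 1); grow s until that block reaches end. */
	while ((start & ~(s - 1)) + s < end)
		s <<= 1;
	*base = start & ~(s - 1);
	*size = s;
}
```

With a hypothetical 2.5 MiB framebuffer at the start of a BAR, the covering MTRR is 4 MiB, so the 1.5 MiB that follows (registers, in this made-up layout) would be write-combined as well.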

An example driver that required changes to split this due to size
constraints is atyfb; here are the changes for it:

https://marc.info/?l=linux-kernel&m=142964818810539&w=1
https://marc.info/?l=linux-kernel&m=142964813610531&w=1
https://marc.info/?l=linux-kernel&m=142964811010524&w=1
https://marc.info/?l=linux-kernel&m=142964814810532&w=1

If you are not constrained by MTRR's limitation on size then
a simple trivial driver conversion is sufficient. For example:

https://marc.info/?l=linux-kernel&m=142964744610286&w=1

I should also note that we are striving to not use overlapping ioremap()
calls, as we want to avoid that mess. Overlapping ioremap() calls with different
types could in theory work, but it's best we just design clean drivers and avoid
this.

As per Andy Lutomirski, what ivtv likely needs is to ioremap() the
device for initial bring-up, then infer the framebuffer offset, and
only when the framebuffer is actually used, iounmap() and then
ioremap() again with the areas split in the driver, one with ioremap.

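The split can be sketched as plain address arithmetic. This is a hypothetical userspace helper, not ivtv code: given a BAR and the framebuffer's offset and length within it, it produces the region before the framebuffer, the framebuffer itself, and the region after it, and these never overlap. A driver could then map each piece once with its own memory type (for instance ioremap_wc() for the framebuffer) instead of creating overlapping mappings.

```c
#include <stdint.h>

struct region {
	uint64_t start;
	uint64_t len;
};

/* Split a BAR [bar, bar + bar_len) into three non-overlapping pieces: the
 * part before the framebuffer, the framebuffer, and the part after it.
 * Returns 0 on success, -1 if the framebuffer does not fit inside the BAR. */
static int split_bar(uint64_t bar, uint64_t bar_len,
		     uint64_t fb_off, uint64_t fb_len,
		     struct region *pre, struct region *fb, struct region *post)
{
	if (fb_off > bar_len || fb_len > bar_len - fb_off)
		return -1;

	pre->start = bar;
	pre->len = fb_off;

	fb->start = bar + fb_off;
	fb->len = fb_len;

	post->start = fb->start + fb_len;
	post->len = bar_len - fb_off - fb_len;
	return 0;
}
```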
> I'll do the changes to add write-combining back into ivtv and ivtvfb, test
> them with my hardware and push them to my linuxtv.org git repo.

Great! The above sounded like a complexity you did not wish to
take on, but if you're up for the change, that'd be awesome!

> I know there is at least one English speaking user in India using ivtv with
> old PVR hardware, and probably folks in less developed places also using it.

If the above is too much work for so few users, I'd hope we can just
have them use older kernels, for the sake of sane APIs and clean
driver architecture.

  Luis


Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset

2015-04-21 Thread Michael Trimarchi
Hi

On Tue, Apr 21, 2015 at 05:35:40PM -0400, David Miller wrote:
> From: Michael Trimarchi 
> Date: Tue, 21 Apr 2015 13:16:13 +0200
> 
> > -   udelay(data->delays[0]);
>  ...
> > +   msleep(max(1U, data->delays[0] / 1000));
> 
> That looks very ugly with that max() expression in there.
>

Would DIV_ROUND_UP be fine with you?

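A small userspace comparison of the two conversions, assuming the delay is in microseconds as in the udelay() call being replaced. The DIV_ROUND_UP() macro uses the same definition as the kernel's (in <linux/kernel.h> at the time); the function names are hypothetical.

```c
/* Same definition as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* The expression from the V2 patch: truncate to ms, but never below 1 ms. */
static unsigned int ms_via_max(unsigned int us)
{
	unsigned int ms = us / 1000;

	return ms > 1U ? ms : 1U;
}

/* The DIV_ROUND_UP() alternative: round any partial millisecond up. */
static unsigned int ms_via_round_up(unsigned int us)
{
	return DIV_ROUND_UP(us, 1000U);
}
```

The two only disagree at the edges: for a delay of 0 the max() form still asks for 1 ms while DIV_ROUND_UP() asks for 0, and for 1500 us the max() form truncates to 1 ms whereas DIV_ROUND_UP() rounds up to 2 ms, so it never requests less than the original delay.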
> Please find some clean way to get rid of it if you want to
> make this conversion.
>

Agree, I will repost it

Michael

> Thanks.

-- 
| Michael Nazzareno Trimarchi Amarula Solutions BV |
| COO  -  Founder  Cruquiuskade 47 |
| +31(0)851119172 Amsterdam 1018 AM NL |
|  [`as] http://www.amarulasolutions.com   |


Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()

2015-04-21 Thread Joe Perches
On Tue, 2015-04-21 at 23:44 +0200, Mateusz Kulikowski wrote:
> On 21.04.2015 23:22, Joe Perches wrote:
> > On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote:
> (...)
> >>
> >> Perhaps it would be smarter to use (for both patches) $stat instead.
> >> This applies also to existing checks (like PREFER_ETHER_ADDR_COPY) 
> >> so we can catch calls formatted like
> >>
> >> memset(very.long.structure->something.something_different42,
> >>0xFF, ETH_ALEN);
> > 
> > Yes, likely that's true.
> > 
> > checkpatch couldn't --fix it easily unless it's on a
> > single line though.
> 
> True, True; If you prefer $line and ability to --fix - I'll use that in v3

I suppose you could do both $line and $stat
and the fix would only work when it's on a
single line.

Perhaps something like this would work:

	if ($line =~ /whatever/ ||
	    (defined($stat) && $stat =~ /whatever/)) {
		if (WARN(...) &&
		    $fix) {
			$fixed[$fixlinenr] =~ s/whatever/appropriate/;
		}
	}

No worries about getting 'round to the list.
It'll get got eventually.
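To illustrate the trade-off between matching on $line and on $stat, here is a C sketch using POSIX regular expressions instead of checkpatch's Perl. The patterns are simplified stand-ins (the first memset() argument may not contain a comma, unlike what checkpatch's $FuncArg handles), and suggest_helper() is a hypothetical name. It matches the whole statement, so a call split across lines is still caught, but it only reports the call as fixable when the statement is on a single line.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Simplified POSIX ERE versions of the two memset() checks. */
static const char *const ZERO_RE =
	"memset[[:space:]]*\\([^,]+,[[:space:]]*(0|0[xX]0+)[[:space:]]*,"
	"[[:space:]]*ETH_ALEN[[:space:]]*\\)";
static const char *const BCAST_RE =
	"memset[[:space:]]*\\([^,]+,[[:space:]]*0[xX][fF][fF][[:space:]]*,"
	"[[:space:]]*ETH_ALEN[[:space:]]*\\)";

/* Returns 1 if pattern matches anywhere in text (newlines are ordinary
 * characters here, so a statement spanning lines can still match). */
static int matches(const char *pattern, const char *text)
{
	regex_t re;
	int hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return 0;
	hit = regexec(&re, text, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}

/* Suggest a helper for a whole statement (like $stat); set *fixable only
 * when the statement has no newline, i.e. when a one-line --fix is safe. */
static const char *suggest_helper(const char *stmt, int *fixable)
{
	const char *helper = NULL;

	if (matches(ZERO_RE, stmt))
		helper = "eth_zero_addr";
	else if (matches(BCAST_RE, stmt))
		helper = "eth_broadcast_addr";
	*fixable = helper != NULL && strchr(stmt, '\n') == NULL;
	return helper;
}
```

Fed only the first line of the split call from Mateusz's example, neither pattern matches; fed the whole statement, it suggests eth_broadcast_addr() but marks it as not fixable.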



RE: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread Simon Xiao

> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, April 21, 2015 2:49 PM
> To: Simon Xiao
> Cc: KY Srinivasan; Haiyang Zhang; de...@linuxdriverproject.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode
> 
> From: six...@microsoft.com
> Date: Tue, 21 Apr 2015 15:58:05 -0700
> 
> > From: Simon Xiao 
> >
> > Signed-off-by: Simon Xiao 
> > Reviewed-by: K. Y. Srinivasan 
> > Reviewed-by: Haiyang Zhang 
> 
> I just gave you feedback on this patch in response to your
> original submission, do not ignore it.

Thanks for your feedback, David.

In the current netvsc driver, dump_rndis_message() is called for
every received packet to try to dump the RNDIS packet information
via netdev_dbg(). In non-debug mode, dump_rndis_message() does not
dump anything, but it still initializes some local variables and runs
its switch logic, which is unnecessary work, especially under high
network throughput.

My change adds a run-time config flag that controls whether
dump_rndis_message() is executed, avoiding the above cost in
non-debug mode. Non-debug mode is the default, in which
rndis_filter_receive() does not call dump_rndis_message(), saving
that extra cost for each received packet.


Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-21 Thread Andy Lutomirski
On Tue, Apr 21, 2015 at 3:08 PM, Luis R. Rodriguez  wrote:
> On Tue, Apr 21, 2015 at 3:02 PM, Luis R. Rodriguez  wrote:
>> Andy, can we live without MTRR support on this driver for future kernels? This
>> would only leave ipath as the only offending driver.
>
> Sorry to be clear, can we live with removal of write-combining on this driver?
>

I personally think so, but a driver maintainer's ack would be nice.

--Andy


Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-21 Thread Luis R. Rodriguez
On Tue, Apr 21, 2015 at 3:02 PM, Luis R. Rodriguez  wrote:
> Andy, can we live without MTRR support on this driver for future kernels? This
> would only leave ipath as the only offending driver.

Sorry to be clear, can we live with removal of write-combining on this driver?

 Luis


Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-21 Thread Luis R. Rodriguez
On Wed, Apr 15, 2015 at 09:07:37PM -0400, Andy Walls wrote:
> On Thu, 2015-04-16 at 01:58 +0200, Luis R. Rodriguez wrote:
> > Hey Andy, thanks for your review,  adding Hyong-Youb Kim for  review of the
> > full range ioremap_wc() idea below.
> > 
> > On Wed, Apr 15, 2015 at 06:38:51PM -0400, Andy Walls wrote:
> > > Hi All,
> > > 
> > > On Mon, 2015-04-13 at 19:49 +0200, Luis R. Rodriguez wrote:
> > > > From the beginning it seems only framebuffer devices used MTRR/WC,
> > > [snip]
> > > >  The ivtv device is a good example of the worst type of
> > > > situations and these days. So perhaps __arch_phys_wc_add() and a
> > > > ioremap_ucminus() might be something more than transient unless
> > > > hardware folks get a good memo or already know how to just Do The
> > > > Right Thing (TM).
> > > 
> > > Just to reiterate a subtle point, use of the ivtvfb is *optional*.  A
> > > user may or may not load it.  When the user does load the ivtvfb driver,
> > > the ivtv driver has already been initialized and may have functions of
> > > the card already in use by userspace.
> > 
> > I suspected this and it's why I note that a rewrite to address a clean
> > split with separate ioremap seems rather difficult in this case.
> > 
> > > Hopefully no one is trying to use the OSD as framebuffer and the video
> > > decoder/output engine for video display at the same time. 
> > 
> > Worst case concern I have also is the implications of having overlapping
> > ioremap() calls (as proposed in my last reply) for different memory types
> > and having the different virtual memory addresses used by different parts
> > of the driver. It's not clear to me what the hardware implications of this
> > are.
> > 
> > >  But the video
> > > decoder/output device nodes may already be open for performing ioctl()
> > > functions so unmapping the decoder IO space out from under them, when
> > > loading the ivtvfb driver module, might not be a good thing. 
> > 
> > Using overlapping ioremap() calls with different memory types would address
> > this concern provided hardware won't barf both on the device and CPU.
> > Hardware folks could provide feedback or an ivtvfb user could test the
> > patch supplied on both non-PAT and PAT systems. Even so, who knows, this
> > might work on some systems while not on others, only hardware folks would
> > know.
> 
> The CX2341[56] firmware+hardware has a track record for being really
> picky about system hardware.  Its primary symptoms are for the DMA
> engine or Mailbox protocol to get hung up.  So yeah, it could barf
> easily on some users.
> 
> > An alternative... is to just ioremap_wc() the entire region, including
> > MMIO registers for these old devices.
> 
> That's my thought; as long as implementing PCI write then read can force
> writes to be posted and that setting that many pages as WC doesn't cause
> some sort of PAT resource exhaustion. (I know very little about PAT).

So, upon review, that strategy won't work well unless we implement some
sort of hack in the driver. That's also quite a bit of work.

Andy, can we live without MTRR support on this driver for future kernels? This
would only leave ipath as the only offending driver.

  Luis


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-21 Thread Stephen Hemminger
On Fri, 17 Apr 2015 13:17:12 -0400 (EDT)
David Miller  wrote:

> From: Tejun Heo 
> Date: Fri, 17 Apr 2015 12:28:26 -0400
> 
> > On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
> >> If the sender side can wait for retransmission, why can't we use
> >> userspace programs (e.g. rsyslogd)?
> > 
> > Because the system may be oopsing, ooming or thrashing excessively
> > rendering the userland inoperable and that's exactly when we want
> > those log messages to be transmitted out of the system.
> 
> If userland cannot run properly, it is almost certain that neither will
> your complex reliability layer logic.
> 
> I tend to agree with Tetsuo, that in-kernel netconsole should remain
> as simple as possible and once it starts to have any smarts and less
> trivial logic the job belongs in userspace.

Keep existing netconsole as simple as possible. It is not meant as
reliable, secure logging.

"Those who do not understand TCP are doomed to reinvent it"


Re: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread David Miller
From: Simon Xiao 
Date: Tue, 21 Apr 2015 21:47:32 +

> Sorry - this patch should be sent to net-next so please ignore it. 

Please do not top post.

First, provide exactly the necessary quoted material, and only the
most necessary quoted material.

Then place your response afterwards, rather than beforehand.

Again, please do not ever top-post or quote more material in
your response than necessary.

This is a very serious pet peeve of experienced people who read
this list every day, so please do your best to abide by this.



Re: [PATCH v3] ethernet: myri10ge: use arch_phys_wc_add()

2015-04-21 Thread David Miller
From: "Luis R. Rodriguez" 
Date: Tue, 21 Apr 2015 13:09:45 -0700

> From: "Luis R. Rodriguez" 
> 
> This driver already uses ioremap_wc() on the same range
> so when write-combining is available that will be used
> instead.
 ...
> Signed-off-by: Luis R. Rodriguez 

I'll apply this with a driver maintainer's ACK.


Re: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread David Miller
From: six...@microsoft.com
Date: Tue, 21 Apr 2015 15:58:05 -0700

> From: Simon Xiao 
> 
> Signed-off-by: Simon Xiao 
> Reviewed-by: K. Y. Srinivasan 
> Reviewed-by: Haiyang Zhang 

I just gave you feedback on this patch in response to your
original submission, do not ignore it.


[PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread sixiao
From: Simon Xiao 

Signed-off-by: Simon Xiao 
Reviewed-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   | 3 +++
 drivers/net/hyperv/netvsc_drv.c   | 8 
 drivers/net/hyperv/rndis_filter.c | 3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index a10b316..c9be35e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -28,6 +28,9 @@
 #include 
 #include 
 
+/* flag for netvsc debug mode */
+extern int debug_mode;
+
 /* RSS related */
 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203  /* query only */
 #define OID_GEN_RECEIVE_SCALE_PARAMETERS 0x00010204  /* query and set */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a3a9d38..7c41864 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -52,6 +52,10 @@ static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+int debug_mode = 0;
+module_param(debug_mode, int, S_IRUGO);
+MODULE_PARM_DESC(debug_mode, "debug mode: zero(0) for non-debug mode; non-zero for debug mode");
+
 static void do_set_multicast(struct work_struct *w)
 {
struct net_device_context *ndevctx =
@@ -999,6 +1003,10 @@ static int __init netvsc_drv_init(void)
pr_info("Increased ring_size to %d (min allowed)\n",
ring_size);
}
+
+   if (debug_mode != 0)
+   pr_info("Run netvsc in debug mode");
+
return vmbus_driver_register(&netvsc_drv);
 }
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 0d92efe..a3f43f6 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -429,7 +429,8 @@ int rndis_filter_receive(struct hv_device *dev,
 
rndis_msg = pkt->data;
 
-   dump_rndis_message(dev, rndis_msg);
+   if (debug_mode != 0)
+   dump_rndis_message(dev, rndis_msg);
 
switch (rndis_msg->ndis_msg_type) {
case RNDIS_MSG_PACKET:
-- 
1.8.5.2



RE: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread Simon Xiao
Sorry - this patch should be sent to net-next so please ignore it. 

Thanks,
Simon

-----Original Message-----
From: six...@microsoft.com [mailto:six...@microsoft.com] 
Sent: Tuesday, April 21, 2015 2:44 PM
To: KY Srinivasan; Haiyang Zhang; netdev@vger.kernel.org; linux-ker...@vger.kernel.org
Cc: Simon Xiao
Subject: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode

From: Simon Xiao 

Signed-off-by: Simon Xiao 
---
 drivers/net/hyperv/hyperv_net.h   | 3 +++
 drivers/net/hyperv/netvsc_drv.c   | 8 
 drivers/net/hyperv/rndis_filter.c | 3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h 
index a10b316..c9be35e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -28,6 +28,9 @@
 #include 
 #include 
 
+/* flag for netvsc debug mode */
+extern int debug_mode;
+
 /* RSS related */
 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203  /* query only */
 #define OID_GEN_RECEIVE_SCALE_PARAMETERS 0x00010204  /* query and set */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a3a9d38..7c41864 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -52,6 +52,10 @@ static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+int debug_mode = 0;
+module_param(debug_mode, int, S_IRUGO);
+MODULE_PARM_DESC(debug_mode, "debug mode: zero(0) for non-debug mode; non-zero for debug mode");
+
 static void do_set_multicast(struct work_struct *w)
 {
struct net_device_context *ndevctx =
@@ -999,6 +1003,10 @@ static int __init netvsc_drv_init(void)
pr_info("Increased ring_size to %d (min allowed)\n",
ring_size);
}
+
+   if (debug_mode != 0)
+   pr_info("Run netvsc in debug mode");
+
return vmbus_driver_register(&netvsc_drv);
 }
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 0d92efe..a3f43f6 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -429,7 +429,8 @@ int rndis_filter_receive(struct hv_device *dev,
 
rndis_msg = pkt->data;
 
-   dump_rndis_message(dev, rndis_msg);
+   if (debug_mode != 0)
+   dump_rndis_message(dev, rndis_msg);
 
switch (rndis_msg->ndis_msg_type) {
case RNDIS_MSG_PACKET:
--
1.8.5.2



Re: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread David Miller
From: six...@microsoft.com
Date: Tue, 21 Apr 2015 14:43:55 -0700

> From: Simon Xiao 
> 
> Signed-off-by: Simon Xiao 

This commit message is lacking an explanation why you want to do
what you are doing.

Also, we have an existing mechanism to control network device driver
debug logging output, please use it rather than invent your own
facility.
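David does not name the mechanism; presumably he means the netif_msg_*() helpers, which test bits in a driver's msg_enable field. Drivers commonly initialize that field with netif_msg_init() from a debug module parameter and, if they implement the get_msglevel/set_msglevel ethtool operations, let `ethtool -s <dev> msglvl` change it at run time. The userspace sketch below only models that pattern: the struct and function names are hypothetical stand-ins, and MSG_PKTDATA mirrors NETIF_MSG_PKTDATA (0x1000 in <linux/netdevice.h>).

```c
#include <stdint.h>

/* Stand-in for NETIF_MSG_PKTDATA from <linux/netdevice.h>. */
#define MSG_PKTDATA 0x1000u

/* Hypothetical driver-private state. */
struct fake_netdev_ctx {
	uint32_t msg_enable;	/* like a driver's msg_enable bitmask */
	unsigned int dumps;	/* how many times the dump routine ran */
};

/* Hypothetical stand-in for dump_rndis_message(). */
static void dump_msg(struct fake_netdev_ctx *ctx)
{
	ctx->dumps++;
}

/* Receive path: only pay for the dump when packet-data debugging is on,
 * mirroring the kernel's "if (netif_msg_pktdata(priv))" idiom. */
static void receive(struct fake_netdev_ctx *ctx)
{
	if (ctx->msg_enable & MSG_PKTDATA)
		dump_msg(ctx);
	/* ... normal packet processing would follow here ... */
}
```

With msg_enable left at 0, receive() never calls the dump routine, so the common-case per-packet cost is a single bit test, which is the saving the patch is after, without adding a new module parameter.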


Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()

2015-04-21 Thread Mateusz Kulikowski
On 21.04.2015 23:22, Joe Perches wrote:
> On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote:
(...)
>>
>> Perhaps it would be smarter to use (for both patches) $stat instead.
>> This applies also to existing checks (like PREFER_ETHER_ADDR_COPY) 
>> so we can catch calls formatted like
>>
>> memset(very.long.structure->something.something_different42,
>>0xFF, ETH_ALEN);
> 
> Yes, likely that's true.
> 
> checkpatch couldn't --fix it easily unless it's on a
> single line though.

True, True; If you prefer $line and ability to --fix - I'll use that in v3

> 
> As far as I can tell, there are ~120 of these "memcpy"s
> in the tree, but there aren't any "memset"s like that
> split into 2 or more lines.

Some of them are probably candidates for ether_addr_copy(_unaligned) :)
I'll probably take a look at them once I have both functions available.

Regards,
Mateusz

> 
> Here's a list of the multiple line memcpy(..., ETH_ALEN)
> uses that I found.
> 
> arch/arm/mach-davinci/board-mityomapl138.c:144:   
> memcpy(soc_info->emac_pdata->mac_addr,
>   factory_config.mac, ETH_ALEN);
> drivers/staging/rtl8712/rtl8712_recv.c:395:   
> memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src,
>   ETH_ALEN);
> drivers/staging/rtl8712/rtl8712_recv.c:396:   
> memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst,
>   ETH_ALEN);
> drivers/staging/rtl8712/rtl8712_recv.c:402:   
> memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src,
>   ETH_ALEN);
> drivers/staging/rtl8712/rtl8712_recv.c:403:   
> memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst,
>   ETH_ALEN);
> drivers/staging/rtl8712/os_intfs.c:398:   
> memcpy(pnetdev->dev_addr,
>   padapter->eeprompriv.mac_addr, ETH_ALEN);
> drivers/staging/rtl8712/os_intfs.c:414:   
> memcpy(padapter->eeprompriv.mac_addr,
>   pnetdev->dev_addr, ETH_ALEN);
> drivers/staging/rtl8712/rtl871x_ioctl_linux.c:107:
> memcpy(wrqu.ap_addr.sa_data, pmlmepriv->cur_network.network.MacAddress,
>   ETH_ALEN);
> drivers/staging/rtl8712/rtl871x_ioctl_linux.c:838:
> memcpy(psecuritypriv->PMKIDList[psecuritypriv->
>   PMKIDIndex].Bssid, strIssueBssid, ETH_ALEN);
> drivers/staging/rtl8712/rtl871x_xmit.c:489:   
> memcpy(pwlanhdr->addr1, get_bssid(pmlmepriv),
>   ETH_ALEN);
> drivers/staging/rtl8712/rtl871x_xmit.c:496:   
> memcpy(pwlanhdr->addr2, get_bssid(pmlmepriv),
>   ETH_ALEN);
> drivers/staging/rtl8712/rtl871x_xmit.c:503:   
> memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv),
>   ETH_ALEN);
> drivers/staging/rtl8712/rtl871x_xmit.c:507:   
> memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv),
>   ETH_ALEN);
> drivers/staging/rtl8192e/rtllib_softmac_wx.c:125: 
> memcpy(wrqu->ap_addr.sa_data,
>  ieee->current_network.bssid, ETH_ALEN);
> drivers/staging/rtl8192e/rtllib_tx.c:697: 
> memcpy(&header.addr1, ieee->current_network.bssid,
>  ETH_ALEN);
> drivers/staging/rtl8192e/rtllib_tx.c:700: 
> memcpy(&header.addr3,
>  ieee->current_network.bssid, ETH_ALEN);
> drivers/staging/rtl8192e/rtllib_tx.c:709: 
> memcpy(&header.addr3, ieee->current_network.bssid,
>  ETH_ALEN);
> drivers/staging/rtl8192e/rtllib_softmac.c:3748:   
> memcpy(wrqu.ap_addr.sa_data, ieee->current_network.bssid,
>  ETH_ALEN);
> drivers/staging/rtl8192u/ieee80211/ieee80211_softmac_wx.c:126:
> memcpy(wrqu->ap_addr.sa_data,
>  ieee->current_network.bssid, ETH_ALEN);
> drivers/staging/slicoss/slicoss.c:567:
> memcpy(adapter->currmacaddr, adapter->macaddr,
>  ETH_ALEN);
> drivers/staging/slicoss/slicoss.c:569:
> memcpy(adapter->netdev->dev_addr, adapter->currmacaddr,
>  ETH_ALEN);
> drivers/staging/rtl8723au/hal/usb_halinit.c:1020: 
> memcpy(pEEPROM->mac_addr, &hwinfo[EEPROM_MAC_ADDR_8723AU],
>  ETH_ALEN);
> drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:327: 
> memcpy(pwlanhdr->addr1,
>  get_my_bssid23a(&pmlmeinfo->network), ETH_ALEN);
> drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:328: 
> memcpy(pwlanhdr->addr2, myid(&padapter->eeprompriv),
>  ETH_ALEN);
> drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:335: 
> memcpy(pwlanhdr->addr2,
>  get_my

Re: [PATCH] altera tse: add support for lixed-links.

2015-04-21 Thread David Miller
From: Andreas Oetken 
Date: Tue, 21 Apr 2015 18:32:25 +0200

Subject typo "lixed --> fixed"

> + /* In the case of a fixed PHY, the DT node associated
> + * to the PHY is the Ethernet MAC DT node.
> + */

Not indented properly, the second and third line of this comment need
one extra space before the first "*".


Re: [PATCH net] net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP

2015-04-21 Thread David Miller
From: Or Gerlitz 
Date: Tue, 21 Apr 2015 15:46:34 +0300

> From: Eran Ben Elisha 
> 
> Currently we parse max_msg_sz from the wrong offset in QUERY_DEV_CAP,
> fix to use the right offset.
> 
> Fixes: 0b131561a7d6 ('net/mlx4_en: Add Flow control statistics [..]')
> Signed-off-by: Eran Ben Elisha 
> Signed-off-by: Or Gerlitz 

Applied, thanks.


Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset

2015-04-21 Thread David Miller
From: Michael Trimarchi 
Date: Tue, 21 Apr 2015 13:16:13 +0200

> - udelay(data->delays[0]);
 ...
> + msleep(max(1U, data->delays[0] / 1000));

That looks very ugly with that max() expression in there.

Please find some clean way to get rid of it if you want to
make this conversion.

Thanks.


Re: [PATCH] tcp: set SOCK_NOSPACE under memory presure

2015-04-21 Thread David Miller
From: Jason Baron 
Date: Mon, 20 Apr 2015 20:05:13 + (GMT)

> Under tcp memory pressure, calling epoll_wait() in edge triggered
> mode after -EAGAIN can result in an indefinite hang in epoll_wait(),
> even when there is sufficient memory available to continue making
> progress. The problem is that __sk_mem_schedule() can return 0
> under memory pressure without having set the SOCK_NOSPACE flag. Thus,
> even though all the outstanding packets have been acked, we never
> get the EPOLLOUT that we are expecting from epoll_wait().
> 
> This issue is currently limited to epoll when used in edge trigger
> mode, since tcp_poll() does in fact currently set SOCK_NOSPACE.
> This is sufficient for poll()/select() and epoll() in level trigger
> mode. However, in edge trigger mode, epoll() is relying on the write
> path to set SOCK_NOSPACE. So I view this patch as bringing us into
> sync with poll()/select() and epoll() level trigger behavior.

Can you explain exactly how epoll in edge trigger mode is
depending upon SOCK_NOSPACE being set in this way?  I tried
to read the epoll code and it just seems to call ->poll()
in the normal way when returning event state.
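A userspace sketch of the edge-triggered semantics at stake, using a pipe instead of a TCP socket (Linux-only; the helper names are hypothetical). After the writer hits EAGAIN, an EPOLLET registration reports nothing while the pipe stays full and reports EPOLLOUT only once the reader frees space and the kernel issues a write-space wakeup. By the patch author's account, it is that wakeup which never arrives for TCP under memory pressure when SOCK_NOSPACE was not set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Write into a non-blocking fd until it reports EAGAIN. */
static int fill_until_eagain(int wfd)
{
	char buf[4096] = {0};

	for (;;) {
		ssize_t n = write(wfd, buf, sizeof(buf));

		if (n > 0)
			continue;
		return (n < 0 && errno == EAGAIN) ? 0 : -1;
	}
}

/* Read everything currently buffered in a non-blocking fd. */
static int drain(int rfd)
{
	char buf[4096];
	ssize_t n;

	while ((n = read(rfd, buf, sizeof(buf))) > 0)
		;
	return (n < 0 && errno == EAGAIN) ? 0 : -1;
}

/* Register a full pipe's write end with EPOLLOUT | EPOLLET and record how
 * many events epoll_wait() returns before and after the reader drains it. */
static int et_demo(int *before, int *after)
{
	int p[2], ep, ret = -1;
	struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };
	struct epoll_event out;

	if (pipe2(p, O_NONBLOCK) < 0)
		return -1;
	ep = epoll_create1(0);
	if (ep < 0)
		goto close_pipe;
	ev.data.fd = p[1];
	if (fill_until_eagain(p[1]) < 0 ||
	    epoll_ctl(ep, EPOLL_CTL_ADD, p[1], &ev) < 0)
		goto close_ep;

	*before = epoll_wait(ep, &out, 1, 0);	 /* still full: no event */
	if (drain(p[0]) < 0)
		goto close_ep;
	*after = epoll_wait(ep, &out, 1, 1000); /* wakeup after space freed */
	ret = 0;
close_ep:
	close(ep);
close_pipe:
	close(p[0]);
	close(p[1]);
	return ret;
}
```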

Also, there are exactly two call sites of sk_stream_wait_memory()
for TCP, and they both look like this:


wait_for_sndbuf:
		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
		tcp_push(sk, flags & ~MSG_MORE, mss_now,
			 TCP_NAGLE_PUSH, size_goal);

		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
			goto do_error;


Definitely, the person who wrote this code intended SOCK_NOSPACE to be
set only when we are waiting for sndbuf space rather than just memory.

At a minimum, I need a more detailed commit log message for this,
showing the exact code paths in epoll() that have this requirement and
thus create the looping condition.  Because with a casual scan of the
epoll code I could not figure it out.

Thanks.


Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()

2015-04-21 Thread Joe Perches
On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote:
> Hi Joe, 
> 
> On 20.04.2015 03:13, Joe Perches wrote:
> > On Mon, 2015-04-20 at 00:16 +0200, Mateusz Kulikowski wrote:
> >> Suggest using eth_zero_addr() or eth_broadcast_addr() instead of memset().
> > 
> > Hi again Mateusz
> > 
> >> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > []
> >> @@ -5042,6 +5042,22 @@ sub process {
> >> "Prefer ether_addr_equal() or 
> >> ether_addr_equal_unaligned() over memcmp()\n" . $herecurr)
> >>}
> >>  
> >> +# check for memset(foo, 0x0, ETH_ALEN) that could be eth_zero_addr
> >> +# check for memset(foo, 0xFF, ETH_ALEN) that could be eth_broadcast_addr
> >> +  if ($^V && $^V ge 5.10.0 &&
> >> +  $line =~ 
> >> /^\+(?:.*?)\bmemset\s*\(\s*$FuncArg\s*,\s*$FuncArg\s*\,\s*ETH_ALEN\s*\)/s) 
> >> {
> > 
> > Because you are working with $line and not $stat,
> > the last /s isn't useful here.
> > 
> > $line is always a single line.
> 
> Perhaps it would be smarter to use (for both patches) $stat instead.
> This applies also to existing checks (like PREFER_ETHER_ADDR_COPY) 
> so we can catch calls formatted like
> 
> memset(very.long.structure->something.something_different42,
>0xFF, ETH_ALEN);

Yes, likely that's true.

checkpatch couldn't --fix it easily unless it's on a
single line though.

As far as I can tell, there are ~120 of these "memcpy"s
in the tree, but there aren't any "memset"s like that
split into 2 or more lines.

Here's a list of the multiple line memcpy(..., ETH_ALEN)
uses that I found.

arch/arm/mach-davinci/board-mityomapl138.c:144: 
memcpy(soc_info->emac_pdata->mac_addr,
factory_config.mac, ETH_ALEN);
drivers/staging/rtl8712/rtl8712_recv.c:395: 
memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src,
ETH_ALEN);
drivers/staging/rtl8712/rtl8712_recv.c:396: 
memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst,
ETH_ALEN);
drivers/staging/rtl8712/rtl8712_recv.c:402: 
memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src,
ETH_ALEN);
drivers/staging/rtl8712/rtl8712_recv.c:403: 
memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst,
ETH_ALEN);
drivers/staging/rtl8712/os_intfs.c:398: 
memcpy(pnetdev->dev_addr,
padapter->eeprompriv.mac_addr, ETH_ALEN);
drivers/staging/rtl8712/os_intfs.c:414: 
memcpy(padapter->eeprompriv.mac_addr,
pnetdev->dev_addr, ETH_ALEN);
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:107:  
memcpy(wrqu.ap_addr.sa_data, pmlmepriv->cur_network.network.MacAddress,
ETH_ALEN);
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:838:  
memcpy(psecuritypriv->PMKIDList[psecuritypriv->
PMKIDIndex].Bssid, strIssueBssid, ETH_ALEN);
drivers/staging/rtl8712/rtl871x_xmit.c:489: 
memcpy(pwlanhdr->addr1, get_bssid(pmlmepriv),
ETH_ALEN);
drivers/staging/rtl8712/rtl871x_xmit.c:496: 
memcpy(pwlanhdr->addr2, get_bssid(pmlmepriv),
ETH_ALEN);
drivers/staging/rtl8712/rtl871x_xmit.c:503: 
memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv),
ETH_ALEN);
drivers/staging/rtl8712/rtl871x_xmit.c:507: 
memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv),
ETH_ALEN);
drivers/staging/rtl8192e/rtllib_softmac_wx.c:125:   
memcpy(wrqu->ap_addr.sa_data,
   ieee->current_network.bssid, ETH_ALEN);
drivers/staging/rtl8192e/rtllib_tx.c:697:   
memcpy(&header.addr1, ieee->current_network.bssid,
   ETH_ALEN);
drivers/staging/rtl8192e/rtllib_tx.c:700:   
memcpy(&header.addr3,
   ieee->current_network.bssid, ETH_ALEN);
drivers/staging/rtl8192e/rtllib_tx.c:709:   
memcpy(&header.addr3, ieee->current_network.bssid,
   ETH_ALEN);
drivers/staging/rtl8192e/rtllib_softmac.c:3748: 
memcpy(wrqu.ap_addr.sa_data, ieee->current_network.bssid,
   ETH_ALEN);
drivers/staging/rtl8192u/ieee80211/ieee80211_softmac_wx.c:126:  
memcpy(wrqu->ap_addr.sa_data,
   ieee->current_network.bssid, ETH_ALEN);
drivers/staging/slicoss/slicoss.c:567:  
memcpy(adapter->currmacaddr, adapter->macaddr,
   ETH_ALEN);
drivers/staging/slicoss/slicoss.c:569:  
memcpy(adapter->netdev->dev_addr, adapter->currmacaddr,
   ETH_ALEN);
drivers/staging/rtl8723au/hal/usb_halinit.c:1020:   
memcpy(p

Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()

2015-04-21 Thread Mateusz Kulikowski
Hi Joe, 

On 20.04.2015 03:13, Joe Perches wrote:
> On Mon, 2015-04-20 at 00:16 +0200, Mateusz Kulikowski wrote:
>> Suggest using eth_zero_addr() or eth_broadcast_addr() instead of memset().
> 
> Hi again Mateusz
> 
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
>> @@ -5042,6 +5042,22 @@ sub process {
>>   "Prefer ether_addr_equal() or 
>> ether_addr_equal_unaligned() over memcmp()\n" . $herecurr)
>>  }
>>  
>> +# check for memset(foo, 0x0, ETH_ALEN) that could be eth_zero_addr
>> +# check for memset(foo, 0xFF, ETH_ALEN) that could be eth_broadcast_addr
>> +if ($^V && $^V ge 5.10.0 &&
>> +$line =~ 
>> /^\+(?:.*?)\bmemset\s*\(\s*$FuncArg\s*,\s*$FuncArg\s*\,\s*ETH_ALEN\s*\)/s) {
> 
> Because you are working with $line and not $stat,
> the last /s isn't useful here.
> 
> $line is always a single line.

Perhaps it would be smarter to use $stat instead (for both patches).
This also applies to existing checks (like PREFER_ETHER_ADDR_COPY),
so that we can catch calls formatted like

memset(very.long.structure->something.something_different42,
   0xFF, ETH_ALEN);
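For reference, the helpers that the new check suggests are thin wrappers around memset(). The following is a userspace sketch that mirrors their definitions in include/linux/etherdevice.h; the only assumption is that the kernel's `u8` is replaced by `uint8_t` so it builds outside the kernel.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* octets in one Ethernet address */

/* Userspace mirror of eth_zero_addr(): same effect as memset(addr, 0x00, ETH_ALEN). */
static inline void eth_zero_addr(uint8_t *addr)
{
	memset(addr, 0x00, ETH_ALEN);
}

/* Userspace mirror of eth_broadcast_addr(): same effect as memset(addr, 0xff, ETH_ALEN). */
static inline void eth_broadcast_addr(uint8_t *addr)
{
	memset(addr, 0xff, ETH_ALEN);
}
```

So a call split across lines, such as `memset(very.long.structure->something.something_different42, 0xFF, ETH_ALEN);`, would become `eth_broadcast_addr(very.long.structure->something.something_different42);`, which is why matching on the whole statement rather than a single line matters.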


Regards,
Mateusz
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] mpls: Per-device enabling of packet input

2015-04-21 Thread Robert Shearman
An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not to allow labeled traffic to be forwarded from untrusted
neighbours. This is achieved by allowing a per-device configuration of
whether MPLS traffic input from that interface should be processed or
not.

To be secure by default, MPLS is now disabled on all interfaces
(except loopback) unless explicitly enabled, and no global option is
provided to change this default. Whilst this differs from other
protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and since the number of links
to the MPLS core is typically fairly low, this doesn't present much of
a burden on operators.
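The input policy described above reduces to two rules: a device accepts labeled packets only if it has MPLS state and that state has input enabled, and newly created state starts disabled except on loopback. A minimal userspace sketch, assuming simplified stand-in structs (`fake_net_device` and `fake_mpls_dev` are hypothetical, not the kernel's `struct net_device`/`struct mpls_dev`), with `is_loopback` standing in for `dev->flags & IFF_LOOPBACK`:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device MPLS state. */
struct fake_mpls_dev {
	int input_enabled;	/* per-device "input" sysctl; 0 = disabled */
};

/* Hypothetical stand-in for a network device. */
struct fake_net_device {
	bool is_loopback;		/* stands in for dev->flags & IFF_LOOPBACK */
	struct fake_mpls_dev *mpls_ptr;	/* NULL if MPLS is unsupported here */
};

/* Defaults applied when the per-device state is created:
 * only loopback accepts MPLS input without explicit configuration. */
static void mpls_dev_init_defaults(const struct fake_net_device *dev,
				   struct fake_mpls_dev *mdev)
{
	mdev->input_enabled = dev->is_loopback ? 1 : 0;
}

/* Receive-path check: drop unless the device has MPLS state
 * and input has been enabled on it. */
static bool mpls_input_allowed(const struct fake_net_device *dev)
{
	const struct fake_mpls_dev *mdev = dev->mpls_ptr;

	return mdev != NULL && mdev->input_enabled;
}
```

Keeping the default in the state-creation path (rather than in the receive path) means the receive-path check stays a single load and compare, and the operator's sysctl write is the only way to flip it.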

Cc: "Eric W. Biederman" 
Signed-off-by: Robert Shearman 
---
 Documentation/networking/mpls-sysctl.txt |  9 
 net/mpls/af_mpls.c   | 75 +++-
 net/mpls/internal.h  |  3 ++
 3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 639ddf0ece9b..9ed15f86c17c 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -18,3 +18,12 @@ platform_labels - INTEGER
 
Possible values: 0 - 1048575
Default: 0
+
+conf//input - BOOL
+   Control whether packets can be input on this interface.
+
+   If disabled, packets will be discarded without further
+   processing.
+
+   0 - disabled (default)
+   not 0 - enabled
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ad45017eed99..7ac93082e3dc 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -150,7 +150,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
/* Careful this entire function runs inside of an rcu critical section */
 
mdev = mpls_dev_get(dev);
-   if (!mdev)
+   if (!mdev || !mdev->input_enabled)
goto drop;
 
if (skb->pkt_type != PACKET_HOST)
@@ -438,6 +438,60 @@ errout:
return err;
 }
 
+#define MPLS_PERDEV_SYSCTL_OFFSET(field)   \
+   (&((struct mpls_dev *)0)->field)
+
+static const struct ctl_table mpls_dev_table[] = {
+   {
+   .procname   = "input",
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   .data   = MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
+   },
+   { }
+};
+
+static int mpls_dev_sysctl_register(struct net_device *dev,
+   struct mpls_dev *mdev)
+{
+   char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
+   struct ctl_table *table;
+   int i;
+
+   table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
+   if (!table)
+   goto out;
+
+   /* Table data contains only offsets relative to the base of
+* the mdev at this point, so make them absolute.
+*/
+   for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
+   table[i].data = (char *)mdev + (uintptr_t)table[i].data;
+
+   snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
+
+   mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
+   if (!mdev->sysctl)
+   goto free;
+
+   return 0;
+
+free:
+   kfree(table);
+out:
+   return -ENOBUFS;
+}
+
+static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
+{
+   struct ctl_table *table;
+
+   table = mdev->sysctl->ctl_table_arg;
+   unregister_net_sysctl_table(mdev->sysctl);
+   kfree(table);
+}
+
 static struct mpls_dev *mpls_add_dev(struct net_device *dev)
 {
struct mpls_dev *mdev;
@@ -449,9 +503,24 @@ static struct mpls_dev *mpls_add_dev(struct net_device *dev)
if (!mdev)
return ERR_PTR(err);
 
+   /* Enable MPLS by default on loopback devices, since this
+* doesn't represent a security boundary and is required for the
+* lookup of inner labels for LSPs terminating on this router.
+*/
+   if (dev->flags & IFF_LOOPBACK)
+   mdev->input_enabled = 1;
+
+   err = mpls_dev_sysctl_register(dev, mdev);
+   if (err)
+   goto free;
+
rcu_assign_pointer(dev->mpls_ptr, mdev);
 
return mdev;
+
+free:
+   kfree(mdev);
+   return ERR_PTR(err);
 }
 
 static void mpls_ifdown(struct net_device *dev)
@@ -475,6 +544,8 @@ static void mpls_ifdown(struct net_device *dev)
if (!mdev)
return;
 
+   mpls_dev_sysctl_unregister(mdev);
+
RCU_INIT_POINTER(dev->mpls_ptr, NULL);
 
kfree(mdev);
@@ -958,7 +1029,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
return ret;
 }
 
-
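mpls_dev_sysctl_register() in the patch above uses a common sysctl idiom: a static template table whose `.data` fields hold offsets into `struct mpls_dev` (via MPLS_PERDEV_SYSCTL_OFFSET), which is duplicated per device and then rebased onto that device's `mdev`. A minimal userspace sketch of the same idea, assuming a hypothetical `struct knob` in place of `struct ctl_table`, a hypothetical `struct dev_conf` in place of `struct mpls_dev`, and `offsetof` instead of the null-pointer member trick:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device state with a couple of tunables. */
struct dev_conf {
	int input_enabled;
	int other_knob;
};

/* Hypothetical, cut-down analogue of struct ctl_table. */
struct knob {
	const char *name;
	void *data;	/* template: offset into dev_conf; per-device copy: real address */
};

/* Encode a field offset in the pointer slot, as MPLS_PERDEV_SYSCTL_OFFSET does. */
#define CONF_OFFSET(field) ((void *)offsetof(struct dev_conf, field))

static const struct knob knob_template[] = {
	{ "input", CONF_OFFSET(input_enabled) },
	{ "other", CONF_OFFSET(other_knob) },
};

#define NKNOBS (sizeof(knob_template) / sizeof(knob_template[0]))

/* Duplicate the template (like kmemdup()) and turn each stored offset
 * into an absolute pointer into *conf. Caller frees the result. */
static struct knob *knobs_for(struct dev_conf *conf)
{
	struct knob *t = malloc(sizeof(knob_template));
	size_t i;

	if (!t)
		return NULL;
	memcpy(t, knob_template, sizeof(knob_template));
	for (i = 0; i < NKNOBS; i++)
		t[i].data = (char *)conf + (uintptr_t)t[i].data;
	return t;
}
```

The template stays shared and read-only; each device only pays for its own copy, which is also why the unregister path has to recover and free the duplicated table.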

[PATCH 1/3] mpls: Per-device MPLS state

2015-04-21 Thread Robert Shearman
Add per-device MPLS state to supported interfaces. Use the presence of
this state in mpls_route_add to determine whether this is a supported
interface.

Use the presence of mpls_dev to drop packets that arrived on an
unsupported interface - previously they were allowed through.
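The lifecycle introduced here (allocate state on NETDEV_REGISTER for supported device types, use its presence as the "supported" test, free it on NETDEV_UNREGISTER) can be sketched in userspace as follows. This is an illustration only: the struct and function names are hypothetical, and RCU, RTNL locking and the notifier plumbing are deliberately left out. The numeric type codes match ARPHRD_ETHER (1) and ARPHRD_LOOPBACK (772); TYPE_OTHER is an arbitrary unsupported value.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Device type codes; the first two match ARPHRD_ETHER and ARPHRD_LOOPBACK. */
enum { TYPE_ETHER = 1, TYPE_LOOPBACK = 772, TYPE_OTHER = 0xfffe };

/* Hypothetical per-device MPLS state (empty in the kernel patch). */
struct sketch_mpls_state {
	int unused;
};

/* Hypothetical device: a type code plus an optional MPLS state pointer. */
struct sketch_dev {
	unsigned short type;
	struct sketch_mpls_state *mpls_ptr;
};

/* NETDEV_REGISTER: attach state only to supported types.
 * Returns 0 on success (including "unsupported, nothing to do"), -1 on OOM. */
static int on_register(struct sketch_dev *dev)
{
	if (dev->type != TYPE_ETHER && dev->type != TYPE_LOOPBACK)
		return 0;	/* unsupported: leave mpls_ptr NULL */
	dev->mpls_ptr = calloc(1, sizeof(*dev->mpls_ptr));
	return dev->mpls_ptr ? 0 : -1;
}

/* Route add: the presence of state, not the device type, decides support. */
static bool route_add_allowed(const struct sketch_dev *dev)
{
	return dev->mpls_ptr != NULL;
}

/* NETDEV_UNREGISTER: unpublish the pointer first, then free the state. */
static void on_unregister(struct sketch_dev *dev)
{
	struct sketch_mpls_state *state = dev->mpls_ptr;

	dev->mpls_ptr = NULL;
	free(state);	/* free(NULL) is a no-op */
}
```

Centralising the type check in the register path means later code (route add, receive) only has to test for a non-NULL pointer, so adding support for another device type touches one place.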

Cc: "Eric W. Biederman" 
Signed-off-by: Robert Shearman 
---
 include/linux/netdevice.h |  4 
 net/mpls/af_mpls.c| 50 +--
 net/mpls/internal.h   |  3 +++
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bcbde799ec69..dae106a3a998 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -60,6 +60,7 @@ struct phy_device;
 struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
+struct mpls_dev;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
const struct ethtool_ops *ops);
@@ -1627,6 +1628,9 @@ struct net_device {
void*ax25_ptr;
struct wireless_dev *ieee80211_ptr;
struct wpan_dev *ieee802154_ptr;
+#if IS_ENABLED(CONFIG_MPLS_ROUTING)
+   struct mpls_dev __rcu   *mpls_ptr;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index db8a2ea6d4de..ad45017eed99 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -53,6 +53,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
return rt;
 }
 
+static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
+{
+   return rcu_dereference_rtnl(dev->mpls_ptr);
+}
+
 static bool mpls_output_possible(const struct net_device *dev)
 {
return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
@@ -136,6 +141,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
struct mpls_route *rt;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
+   struct mpls_dev *mdev;
unsigned int hh_len;
unsigned int new_header_size;
unsigned int mtu;
@@ -143,6 +149,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 
/* Careful this entire function runs inside of an rcu critical section */
 
+   mdev = mpls_dev_get(dev);
+   if (!mdev)
+   goto drop;
+
if (skb->pkt_type != PACKET_HOST)
goto drop;
 
@@ -352,9 +362,9 @@ static int mpls_route_add(struct mpls_route_config *cfg)
if (!dev)
goto errout;
 
-   /* For now just support ethernet devices */
+   /* Ensure this is a supported device */
err = -EINVAL;
-   if ((dev->type != ARPHRD_ETHER) && (dev->type != ARPHRD_LOOPBACK))
+   if (!mpls_dev_get(dev))
goto errout;
 
err = -EINVAL;
@@ -428,10 +438,27 @@ errout:
return err;
 }
 
+static struct mpls_dev *mpls_add_dev(struct net_device *dev)
+{
+   struct mpls_dev *mdev;
+   int err = -ENOMEM;
+
+   ASSERT_RTNL();
+
+   mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+   if (!mdev)
+   return ERR_PTR(err);
+
+   rcu_assign_pointer(dev->mpls_ptr, mdev);
+
+   return mdev;
+}
+
 static void mpls_ifdown(struct net_device *dev)
 {
struct mpls_route __rcu **platform_label;
struct net *net = dev_net(dev);
+   struct mpls_dev *mdev;
unsigned index;
 
platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -443,14 +470,33 @@ static void mpls_ifdown(struct net_device *dev)
continue;
rt->rt_dev = NULL;
}
+
+   mdev = mpls_dev_get(dev);
+   if (!mdev)
+   return;
+
+   RCU_INIT_POINTER(dev->mpls_ptr, NULL);
+
+   kfree(mdev);
 }
 
 static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
   void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct mpls_dev *mdev;
 
switch(event) {
+   case NETDEV_REGISTER:
+   /* For now just support ethernet devices */
+   if ((dev->type == ARPHRD_ETHER) ||
+   (dev->type == ARPHRD_LOOPBACK)) {
+   mdev = mpls_add_dev(dev);
+   if (IS_ERR(mdev))
+   return notifier_from_errno(PTR_ERR(mdev));
+   }
+   break;
+
case NETDEV_UNREGISTER:
mpls_ifdown(dev);
break;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index fb6de92052c4..8090cb3099b4 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -22,6 +22,9 @@ struct mpls_entry_decoded {
u8 bos;
 };
 
+struct mpls_dev {
+};
+
 struct sk_buff;
 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4


[PATCH 3/3] mpls: Prevent use of implicit NULL label as outgoing label

2015-04-21 Thread Robert Shearman
The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.
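For context, an MPLS label stack entry packs a 20-bit label, a 3-bit traffic class, a bottom-of-stack bit and an 8-bit TTL into 32 bits (RFC 3032), and label value 3 is the reserved implicit-NULL label. A minimal userspace sketch of decoding an entry and refusing implicit NULL as an outgoing label, in the spirit of the nla_get_labels() check; the names are hypothetical, and the entry is assumed to already be in host byte order (the kernel decodes big-endian shim headers).

```c
#include <stdbool.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3	/* RFC 3032 reserved label value */

/* Hypothetical decoded form of one label stack entry. */
struct shim_decoded {
	uint32_t label;	/* 20 bits */
	uint8_t tc;	/* 3 bits, traffic class */
	uint8_t bos;	/* bottom-of-stack flag */
	uint8_t ttl;	/* 8 bits */
};

/* Decode a host-order entry laid out as label(20) | TC(3) | S(1) | TTL(8). */
static struct shim_decoded shim_decode(uint32_t entry)
{
	struct shim_decoded d = {
		.label = entry >> 12,
		.tc    = (entry >> 9) & 0x7,
		.bos   = (entry >> 8) & 0x1,
		.ttl   = entry & 0xff,
	};
	return d;
}

/* Validate one control-plane-supplied outgoing label: only the label and
 * BoS bit may be set, BoS must match its position, and implicit NULL is
 * refused because it never appears in an actual encapsulation. */
static bool outgoing_entry_ok(uint32_t entry, bool expect_bos)
{
	struct shim_decoded d = shim_decode(entry);

	if (d.bos != expect_bos || d.ttl || d.tc)
		return false;
	if (d.label == LABEL_IMPLICIT_NULL)
		return false;
	return true;
}
```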

Suggested-by: "Eric W. Biederman" 
Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7ac93082e3dc..eb8dc411859d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -653,6 +653,15 @@ int nla_get_labels(const struct nlattr *nla,
if ((dec.bos != bos) || dec.ttl || dec.tc)
return -EINVAL;
 
+   switch (dec.label) {
+   case LABEL_IMPLICIT_NULL:
+   /* RFC3032: This is a label that an LSR may
+* assign and distribute, but which never
+* actually appears in the encapsulation.
+*/
+   return -EINVAL;
+   }
+
label[i] = dec.label;
}
*labels = nla_labels;
-- 
2.1.4



[PATCH 0/3] mpls: ABI changes for security and correctness

2015-04-21 Thread Robert Shearman
For security, these changes stop MPLS from being enabled by default on
all interfaces when it is in use, and ensure that a label that is not
valid as an outgoing label cannot be added to MPLS routes.

This series contains three ABI/behaviour-affecting changes which have
been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
improvements" without any further modification. These changes need to
be considered for 4.1, otherwise we'll be stuck with the current
behaviour/ABI forever.

Robert Shearman (3):
  mpls: Per-device MPLS state
  mpls: Per-device enabling of packet input
  mpls: Prevent use of implicit NULL label as outgoing label

 Documentation/networking/mpls-sysctl.txt |   9 +++
 include/linux/netdevice.h|   4 +
 net/mpls/af_mpls.c   | 132 ++-
 net/mpls/internal.h  |   6 ++
 4 files changed, 148 insertions(+), 3 deletions(-)

-- 
2.1.4



Re: [RFC PATCH] tc filter not show last u32 filter

2015-04-21 Thread Cong Wang
On Tue, Apr 21, 2015 at 1:07 PM, Vitaly E. Lavrov  wrote:
>
> A similar patch already exists in
> git.kernel.org/linux-stable.git/master (commit
> b057df24a7536cce6c372efe9d0e3d1558afedf4)
> and linux-4.0.y.
>
> The patch can be applied to the 3.14, 3.18 and 3.19 kernel branches. Why
> was this not done at once?
>

That commit should already have been backported to all stable kernels, so why
do we need a different one?


[PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode

2015-04-21 Thread sixiao
From: Simon Xiao 

Signed-off-by: Simon Xiao 
---
 drivers/net/hyperv/hyperv_net.h   | 3 +++
 drivers/net/hyperv/netvsc_drv.c   | 8 
 drivers/net/hyperv/rndis_filter.c | 3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index a10b316..c9be35e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -28,6 +28,9 @@
 #include 
 #include 
 
+/* flag for netvsc debug mode */
+extern int debug_mode;
+
 /* RSS related */
 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203  /* query only */
 #define OID_GEN_RECEIVE_SCALE_PARAMETERS 0x00010204  /* query and set */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a3a9d38..7c41864 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -52,6 +52,10 @@ static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+int debug_mode = 0;
+module_param(debug_mode, int, S_IRUGO);
+MODULE_PARM_DESC(debug_mode, "debug mode: zero(0) for non-debug mode; non-zero for debug mode");
+
 static void do_set_multicast(struct work_struct *w)
 {
struct net_device_context *ndevctx =
@@ -999,6 +1003,10 @@ static int __init netvsc_drv_init(void)
pr_info("Increased ring_size to %d (min allowed)\n",
ring_size);
}
+
+   if (debug_mode != 0)
+   pr_info("Run netvsc in debug mode\n");
+
return vmbus_driver_register(&netvsc_drv);
 }
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 0d92efe..a3f43f6 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -429,7 +429,8 @@ int rndis_filter_receive(struct hv_device *dev,
 
rndis_msg = pkt->data;
 
-   dump_rndis_message(dev, rndis_msg);
+   if (debug_mode != 0)
+   dump_rndis_message(dev, rndis_msg);
 
switch (rndis_msg->ndis_msg_type) {
case RNDIS_MSG_PACKET:
-- 
1.8.5.2



Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

2015-04-21 Thread Thomas Gleixner
On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> I know there are concerns about this, in particular because C11 and
> POSIX both require tv_nsec to be 'long', unlike timeval->tv_usec,
> which is a 'suseconds_t' and can be defined as 'long long'.
>
> a)
> 
> struct timespec {
>   time_t tv_sec;
>   long long tv_nsec; /* or typedef long long snseconds_t */
> };
> 
> This is not directly compatible with C11 or POSIX.1-2008, but it
> matches what we do inside of 64-bit kernels, so probably has the
> highest chance of working correctly in practice

After reading Linus' rant in the x32 thread again (thanks for the
reminder), and looking at b/c/d - which rate between ugly and butt
ugly - I think we should go for a) and screw POSIX and C11, as those
committee dinosaurs seem to completely ignore the 2038 problem on
32bit machines. At least I have not found any hint that these folks
care at all. So why should we comply with something which is completely
useless?

That also makes the question about the upper 32bits check moot, so
it's the simplest and clearest of the possible solutions.
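Option a) can be illustrated with a minimal sketch. Assumptions: the struct name `timespec_a` is hypothetical, `tv_sec` is taken as a 64-bit integer (the point of the y2038 work), and validity is the usual 0 <= tv_nsec < 1,000,000,000. Because tv_nsec is a full 64-bit field, one range check covers all of its bits, which, as we read the thread, is what makes a separate upper-32-bit check unnecessary.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical layout for option a): both fields are 64 bits wide on
 * every ABI, so there is no padding word to check separately. */
struct timespec_a {
	int64_t tv_sec;
	long long tv_nsec;
};

/* With a full-width tv_nsec, a single range check covers all 64 bits:
 * garbage in the "upper half" simply makes the value out of range. */
static bool timespec_a_valid(const struct timespec_a *ts)
{
	return ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}
```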

Thanks,

tglx




[PATCH v3] ethernet: myri10ge: use arch_phys_wc_add()

2015-04-21 Thread Luis R. Rodriguez
From: "Luis R. Rodriguez" 

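Replace the open-coded mtrr_add()/mtrr_del() calls with
arch_phys_wc_add()/arch_phys_wc_del(). This driver already uses
ioremap_wc() on the same range, so when write-combining is available
through other means (e.g. PAT), that will be used instead and no MTRR
is needed.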
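The driver-side pattern the patch adopts (store one cookie, report "MTRR Enabled" only when the cookie is positive, and call the delete helper unconditionally on teardown) can be sketched in userspace. The `fake_*` functions are hypothetical stand-ins for arch_phys_wc_add()/arch_phys_wc_del(); we assume the contract the patch relies on: a positive return is a cookie for an MTRR that was added, 0 means nothing was needed (e.g. PAT already provides write-combining), a negative value is an error, and the delete helper ignores cookies below 1.

```c
#include <stdbool.h>
#include <stdio.h>

static bool fake_have_pat;	/* pretend PAT already provides write-combining */
static int fake_mtrrs_in_use;	/* how many fake MTRRs are currently set up */

/* Hypothetical stand-in for arch_phys_wc_add(). */
static int fake_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (fake_have_pat)
		return 0;	/* ioremap_wc() alone gives WC; no MTRR */
	fake_mtrrs_in_use++;
	return 1000 + fake_mtrrs_in_use;	/* opaque cookie; value is arbitrary here */
}

/* Hypothetical stand-in for arch_phys_wc_del(). */
static void fake_phys_wc_del(int cookie)
{
	if (cookie < 1)
		return;	/* nothing was added, nothing to undo */
	fake_mtrrs_in_use--;
}

/* Hypothetical driver state: one cookie replaces the old mtrr/wc_enabled pair. */
struct fake_priv {
	int wc_cookie;
};

static void fake_probe(struct fake_priv *p, unsigned long base, unsigned long span)
{
	p->wc_cookie = fake_phys_wc_add(base, span);
	printf("MTRR %s, WC Enabled\n", p->wc_cookie > 0 ? "Enabled" : "Disabled");
}

static void fake_remove(struct fake_priv *p)
{
	fake_phys_wc_del(p->wc_cookie);	/* safe to call unconditionally */
}
```

Because the delete helper tolerates a zero or negative cookie, the probe error path and remove path no longer need `#ifdef CONFIG_MTRR` or a separate "enabled" flag.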

Cc: Hyong-Youb Kim 
Cc: Andy Lutomirski 
Cc: Suresh Siddha 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Juergen Gross 
Cc: Daniel Vetter 
Cc: netdev@vger.kernel.org
Cc: Dave Airlie 
Cc: Antonino Daplas 
Cc: Jean-Christophe Plagniol-Villard 
Cc: Tomi Valkeinen 
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Luis R. Rodriguez 
---
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 38 ++--
 1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 1412f5a..2bae502 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -69,11 +69,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#ifdef CONFIG_MTRR
-#include 
-#endif
 #include 
 
 #include "myri10ge_mcp.h"
@@ -242,8 +238,7 @@ struct myri10ge_priv {
unsigned int rdma_tags_available;
int intr_coal_delay;
__be32 __iomem *intr_coal_delay_ptr;
-   int mtrr;
-   int wc_enabled;
+   int wc_cookie;
int down_cnt;
wait_queue_head_t down_wq;
struct work_struct watchdog_work;
@@ -1905,7 +1900,7 @@ static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = {
"tx_aborted_errors", "tx_carrier_errors", "tx_fifo_errors",
"tx_heartbeat_errors", "tx_window_errors",
/* device-specific stats */
-   "tx_boundary", "WC", "irq", "MSI", "MSIX",
+   "tx_boundary", "irq", "MSI", "MSIX",
"read_dma_bw_MBs", "write_dma_bw_MBs", "read_write_dma_bw_MBs",
"serial_number", "watchdog_resets",
 #ifdef CONFIG_MYRI10GE_DCA
@@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
data[i] = ((u64 *)&link_stats)[i];
 
data[i++] = (unsigned int)mgp->tx_boundary;
-   data[i++] = (unsigned int)mgp->wc_enabled;
data[i++] = (unsigned int)mgp->pdev->irq;
data[i++] = (unsigned int)mgp->msi_enabled;
data[i++] = (unsigned int)mgp->msix_enabled;
@@ -4040,14 +4034,7 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
mgp->board_span = pci_resource_len(pdev, 0);
mgp->iomem_base = pci_resource_start(pdev, 0);
-   mgp->mtrr = -1;
-   mgp->wc_enabled = 0;
-#ifdef CONFIG_MTRR
-   mgp->mtrr = mtrr_add(mgp->iomem_base, mgp->board_span,
-MTRR_TYPE_WRCOMB, 1);
-   if (mgp->mtrr >= 0)
-   mgp->wc_enabled = 1;
-#endif
+   mgp->wc_cookie = arch_phys_wc_add(mgp->iomem_base, mgp->board_span);
mgp->sram = ioremap_wc(mgp->iomem_base, mgp->board_span);
if (mgp->sram == NULL) {
dev_err(&pdev->dev, "ioremap failed for %ld bytes at 0x%lx\n",
@@ -4146,14 +4133,14 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto abort_with_state;
}
if (mgp->msix_enabled)
-   dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, WC %s\n",
+   dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, MTRR %s, WC Enabled\n",
 mgp->num_slices, mgp->tx_boundary, mgp->fw_name,
-(mgp->wc_enabled ? "Enabled" : "Disabled"));
+(mgp->wc_cookie > 0 ? "Enabled" : "Disabled"));
else
-   dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, WC %s\n",
+   dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, MTRR %s, WC Enabled\n",
 mgp->msi_enabled ? "MSI" : "xPIC",
 pdev->irq, mgp->tx_boundary, mgp->fw_name,
-(mgp->wc_enabled ? "Enabled" : "Disabled"));
+(mgp->wc_cookie > 0 ? "Enabled" : "Disabled"));
 
board_number++;
return 0;
@@ -4175,10 +4162,7 @@ abort_with_ioremap:
iounmap(mgp->sram);
 
 abort_with_mtrr:
-#ifdef CONFIG_MTRR
-   if (mgp->mtrr >= 0)
-   mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
-#endif
+   arch_phys_wc_del(mgp->wc_cookie);
dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd),
  mgp->cmd, mgp->cmd_bus);
 
@@ -4220,11 +4204,7 @@ static void myri10ge_remove(struct pci_dev *pdev)
pci_restore_state(pdev);
 
iounmap(mgp->sram);
-
-#ifdef CONFIG_MTRR
-   if (mgp->mtrr >= 0)
-   mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
-#endif
+   arch_phys_wc_del(mgp->wc_cookie);
myri10ge_free_slices(mgp);
kfree(mgp->msix_vectors);
dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd),
-- 
2.3.2.209.gd67f9d5.dirty


Re: [RFC PATCH] tc filter not show last u32 filter

2015-04-21 Thread Vitaly E. Lavrov

On 21.04.2015 21:17, Sergei Shtylyov wrote:
> Hello.
>
> On 04/21/2015 08:53 PM, Vitaly E. Lavrov wrote:
>
>> "tc filter show" does not show last U32 filter on 32-bit systems (tested on x86).
>>
>> Additional condition: filter does not have action and CONFIG_NET_CLS_ACT=y
>>
>> Example: tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20
>
> You need to send patches against the latest kernel. Look for DaveM's
> 'net.git' repo on git.kernel.org.
>
> Please run your patches thru scripts/checkpatch.pl; space is needed after *if*.
Thanks for the reminder about the rules.
Sorry for not having checked all branches of the kernel.

A similar patch already exists in git.kernel.org/linux-stable.git/master
(commit b057df24a7536cce6c372efe9d0e3d1558afedf4)
and linux-4.0.y.

The patch can be applied to the 3.14, 3.18 and 3.19 kernel branches. Why was
this not done at once?


Re: [PATCH] etherdevice: Add ether_addr_copy_unaligned

2015-04-21 Thread Mateusz Kulikowski
On 20.04.2015 20:19, David Miller wrote:
(...)
> I'd rather see something like this submitted in a patch series alongside
> some actual uses.
> 
> So I'm tossing this for now.
> 
Ok;

I'll add it to a series where I need it (rtl8192e)

Regards,
Mateusz


Re: [PATCH] tcp: add memory barriers to write space paths

2015-04-21 Thread David Miller
From: Jason Baron 
Date: Mon, 20 Apr 2015 20:05:07 + (GMT)

> Ensure that we either see that the buffer has write space
> in tcp_poll() or that we perform a wakeup from the input
> side. Did not run into any actual problem here, but thought
> that we should make things explicit.
> 
> Signed-off-by: Jason Baron 

This looks fine, applied, thanks.


[RFC 0/3] tc cleanup?

2015-04-21 Thread Alexei Starovoitov
Hi,

I've started cleaning TC a bit.
Before I go too far, I need your feedback on this RFC:

patch 1 - stop abuse of return values in ingress qdisc
patch 2 - deprecate TC_ACT_QUEUED
patch 3 - reduce copy-paste around tc_classify()

Lightly tested so far. Waiting for John's and other tc test scripts.

Alexei Starovoitov (3):
  tc: fix return values of ingress qdisc
  tc: deprecate TC_ACT_QUEUED
  tc: cleanup tc_classify

 include/net/pkt_sched.h  |2 ++
 include/net/sch_generic.h|7 +++
 include/uapi/linux/pkt_cls.h |2 +-
 net/core/dev.c   |8 ++--
 net/sched/sch_api.c  |   22 ++
 net/sched/sch_atm.c  |1 -
 net/sched/sch_cbq.c  |1 -
 net/sched/sch_choke.c|   17 +++--
 net/sched/sch_drr.c  |   18 +++---
 net/sched/sch_dsmark.c   |1 -
 net/sched/sch_fq_codel.c |   25 ++---
 net/sched/sch_hfsc.c |1 -
 net/sched/sch_htb.c  |1 -
 net/sched/sch_ingress.c  |   10 --
 net/sched/sch_multiq.c   |1 -
 net/sched/sch_prio.c |1 -
 net/sched/sch_qfq.c  |   16 ++--
 net/sched/sch_sfb.c  |   17 +++--
 net/sched/sch_sfq.c  |   26 ++
 19 files changed, 61 insertions(+), 116 deletions(-)

-- 
1.7.9.5



[RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-21 Thread Alexei Starovoitov
TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
Get rid of the redundant checks in all qdiscs and instead
translate it once, in tc_classify_compat().
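The idea of translating the alias once at the classification entry point, so that every qdisc only needs a single "stolen" case, can be sketched in userspace. The TC_ACT_* values below are the ones defined in include/uapi/linux/pkt_cls.h; `classify_normalized()`, `qdisc_disposition()` and the `disposition` enum are hypothetical names used only for illustration.

```c
/* Subset of the action verdicts from include/uapi/linux/pkt_cls.h. */
#define TC_ACT_OK	0
#define TC_ACT_SHOT	2
#define TC_ACT_STOLEN	4
#define TC_ACT_QUEUED	5	/* deprecated alias of TC_ACT_STOLEN */

/* Normalize once, where the classifier result is produced. */
static int classify_normalized(int raw_verdict)
{
	if (raw_verdict == TC_ACT_QUEUED)
		return TC_ACT_STOLEN;
	return raw_verdict;
}

/* Hypothetical summary of what a qdisc does with a verdict. */
enum disposition { DISP_ENQUEUE, DISP_STOLEN, DISP_DROP };

/* A qdisc's switch no longer needs a separate TC_ACT_QUEUED case. */
static enum disposition qdisc_disposition(int raw_verdict)
{
	switch (classify_normalized(raw_verdict)) {
	case TC_ACT_STOLEN:
		return DISP_STOLEN;
	case TC_ACT_SHOT:
		return DISP_DROP;
	default:
		return DISP_ENQUEUE;
	}
}
```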

Signed-off-by: Alexei Starovoitov 
---
 include/uapi/linux/pkt_cls.h |2 +-
 net/sched/sch_api.c  |2 ++
 net/sched/sch_atm.c  |1 -
 net/sched/sch_cbq.c  |1 -
 net/sched/sch_choke.c|1 -
 net/sched/sch_drr.c  |1 -
 net/sched/sch_dsmark.c   |1 -
 net/sched/sch_fq_codel.c |1 -
 net/sched/sch_hfsc.c |1 -
 net/sched/sch_htb.c  |1 -
 net/sched/sch_ingress.c  |1 -
 net/sched/sch_multiq.c   |1 -
 net/sched/sch_prio.c |1 -
 net/sched/sch_qfq.c  |1 -
 net/sched/sch_sfb.c  |1 -
 net/sched/sch_sfq.c  |1 -
 16 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index bf08e76bf505..208e5ed5256c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -102,7 +102,7 @@ enum {
 #define TC_ACT_SHOT2
 #define TC_ACT_PIPE3
 #define TC_ACT_STOLEN  4
-#define TC_ACT_QUEUED  5
+#define TC_ACT_QUEUED  5 /* deprecated. same as TC_ACT_STOLEN */
 #define TC_ACT_REPEAT  6
 #define TC_ACT_JUMP0x1000
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ad9eed70bc8f..f7950327bb22 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1820,6 +1820,8 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 #ifdef CONFIG_NET_CLS_ACT
if (err != TC_ACT_RECLASSIFY && skb->tc_verd)
skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
+   if (err == TC_ACT_QUEUED)
+   return TC_ACT_STOLEN;
 #endif
return err;
}
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index e3e2cc5fd068..7ef0bf4bdce6 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -396,7 +396,6 @@ done:
/*@@@ looks good ... but it's not supposed to work :-) */
 #ifdef CONFIG_NET_CLS_ACT
switch (result) {
-   case TC_ACT_QUEUED:
case TC_ACT_STOLEN:
kfree_skb(skb);
return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index beeb75f80fdb..eca4725e3273 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -258,7 +258,6 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
goto fallback;
 #ifdef CONFIG_NET_CLS_ACT
switch (result) {
-   case TC_ACT_QUEUED:
case TC_ACT_STOLEN:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
case TC_ACT_SHOT:
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index c009eb9045ce..a3bc7cf151d3 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -212,7 +212,6 @@ static bool choke_classify(struct sk_buff *skb,
 #ifdef CONFIG_NET_CLS_ACT
switch (result) {
case TC_ACT_STOLEN:
-   case TC_ACT_QUEUED:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
case TC_ACT_SHOT:
return false;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 338706092c27..1051c5d4e85b 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -335,7 +335,6 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
switch (result) {
-   case TC_ACT_QUEUED:
case TC_ACT_STOLEN:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
case TC_ACT_SHOT:
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 66700a6116aa..ce9d4123cbbe 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -236,7 +236,6 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
switch (result) {
 #ifdef CONFIG_NET_CLS_ACT
-   case TC_ACT_QUEUED:
case TC_ACT_STOLEN:
kfree_skb(skb);
return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 1e52decb7b59..4d00ece3243d 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -104,7 +104,6 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 #ifdef CONFIG_NET_CLS_ACT
switch (result) {
case TC_ACT_STOLEN:
-   case TC_ACT_QUEUED:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
case TC_ACT_SHOT:
return 0;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hf

[RFC 3/3] tc: cleanup tc_classify

2015-04-21 Thread Alexei Starovoitov
Introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce
copy-paste among the different qdiscs.
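The control flow of the two new helpers can be sketched in userspace to show how the error code chosen during classification later decides whether a drop is counted. The __NET_XMIT_* flag values match include/net/sch_generic.h; `classify_act_sketch()` and `drop_bypass_sketch()` are hypothetical names, and, unlike the real tc_classify_act(), the sketch takes the classifier's verdict as an argument instead of running any filters.

```c
#define NET_XMIT_SUCCESS	0x00
#define __NET_XMIT_STOLEN	0x00010000
#define __NET_XMIT_BYPASS	0x00020000

#define TC_ACT_SHOT	2
#define TC_ACT_STOLEN	4

/* Mirrors the shape of tc_classify_act(): default the error code to
 * "bypass" (an ordinary drop), override it for stolen packets, and
 * return -1 for both STOLEN and SHOT so the caller does not enqueue. */
static int classify_act_sketch(int verdict, int *qerr)
{
	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
	switch (verdict) {
	case TC_ACT_STOLEN:
		*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
		/* fall through */
	case TC_ACT_SHOT:
		return -1;
	}
	return verdict;
}

/* Mirrors qdisc_drop_bypass(): only count the drop when the BYPASS flag
 * is set; a stolen packet is not a drop from the qdisc's point of view.
 * (The real helper also frees the skb.) */
static void drop_bypass_sketch(int err, unsigned int *drops)
{
	if (err & __NET_XMIT_BYPASS)
		(*drops)++;
}
```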

Signed-off-by: Alexei Starovoitov 
---
 include/net/pkt_sched.h   |2 ++
 include/net/sch_generic.h |7 +++
 net/sched/sch_api.c   |   20 
 net/sched/sch_choke.c |   16 +++-
 net/sched/sch_drr.c   |   17 +++--
 net/sched/sch_fq_codel.c  |   24 ++--
 net/sched/sch_qfq.c   |   15 ++-
 net/sched/sch_sfb.c   |   16 +++-
 net/sched/sch_sfq.c   |   25 ++---
 9 files changed, 52 insertions(+), 90 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2342bf12cb78..7c73cbe95169 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -114,6 +114,8 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res);
 int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct tcf_result *res);
+int tc_classify_act(struct sk_buff *skb, const struct tcf_proto *tp,
+   struct tcf_result *res, int *qerr);
 
 static inline __be16 tc_skb_protocol(const struct sk_buff *skb)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6d778efcfdfd..9a50bad24b1d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -715,6 +715,13 @@ static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
return NET_XMIT_DROP;
 }
 
+static inline void qdisc_drop_bypass(struct sk_buff *skb, struct Qdisc *sch, int err)
+{
+   if (err & __NET_XMIT_BYPASS)
+   qdisc_qstats_drop(sch);
+   kfree_skb(skb);
+}
+
 static inline int qdisc_reshape_fail(struct sk_buff *skb, struct Qdisc *sch)
 {
qdisc_qstats_drop(sch);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f7950327bb22..c7c4a672eb35 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1860,6 +1860,26 @@ reclassify:
 }
 EXPORT_SYMBOL(tc_classify);
 
+int tc_classify_act(struct sk_buff *skb, const struct tcf_proto *tp,
+   struct tcf_result *res, int *qerr)
+{
+   int result;
+
+   *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+   result = tc_classify(skb, tp, res);
+
+#ifdef CONFIG_NET_CLS_ACT
+   switch (result) {
+   case TC_ACT_STOLEN:
+   *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   case TC_ACT_SHOT:
+   return -1;
+   }
+#endif
+   return result;
+}
+EXPORT_SYMBOL(tc_classify_act);
+
 bool tcf_destroy(struct tcf_proto *tp, bool force)
 {
if (tp->ops->destroy(tp, force)) {
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index a3bc7cf151d3..8d8ad5303497 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -207,16 +207,8 @@ static bool choke_classify(struct sk_buff *skb,
int result;
 
fl = rcu_dereference_bh(q->filter_list);
-   result = tc_classify(skb, fl, &res);
+   result = tc_classify_act(skb, fl, &res, qerr);
if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-   switch (result) {
-   case TC_ACT_STOLEN:
-   *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-   case TC_ACT_SHOT:
-   return false;
-   }
-#endif
choke_set_classid(skb, TC_H_MIN(res.classid));
return true;
}
@@ -268,9 +260,9 @@ static bool choke_match_random(const struct choke_sched_data *q,
 
 static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-   int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
struct choke_sched_data *q = qdisc_priv(sch);
const struct red_parms *p = &q->parms;
+   int ret;
 
if (rcu_access_pointer(q->filter_list)) {
/* If using external classifiers, get result and record it. */
@@ -343,9 +335,7 @@ congestion_drop:
return NET_XMIT_CN;
 
 other_drop:
-   if (ret & __NET_XMIT_BYPASS)
-   qdisc_qstats_drop(sch);
-   kfree_skb(skb);
+   qdisc_drop_bypass(skb, sch, ret);
return ret;
 }
 
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 1051c5d4e85b..36ab69375c79 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -329,18 +329,9 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
return cl;
}
 
-   *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
fl = rcu_dereference_bh(q->filter_list);
-   result = tc_classify(skb, fl, &res);
+   result = tc_classify_act(skb, fl, &res, qerr);
if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-   switch (result) {
-   case TC_ACT_STOLEN:
-   *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-   case TC_ACT_SHOT:
-   return NULL;
-   }
-#en
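The new tc_classify_act() helper folds the TC_ACT_STOLEN/TC_ACT_SHOT handling that each qdisc used to open-code into one place. A minimal userspace model of that mapping follows, assuming CONFIG_NET_CLS_ACT=y. The MODEL_* constants mirror the kernel's TC_ACT_* and NET_XMIT_* values of this era, and model_classify_act() is a hypothetical name, not kernel API; it takes the classifier verdict directly instead of calling tc_classify().

```c
#include <assert.h>

/* Values mirrored from linux/pkt_cls.h and the kernel's
 * netdevice.h/sch_generic.h of this era, copied so the model
 * builds in userspace. */
#define MODEL_TC_ACT_OK         0
#define MODEL_TC_ACT_SHOT       2
#define MODEL_TC_ACT_STOLEN     4
#define MODEL_NET_XMIT_SUCCESS  0x00
#define MODEL_NET_XMIT_STOLEN   0x00010000 /* __NET_XMIT_STOLEN */
#define MODEL_NET_XMIT_BYPASS   0x00020000 /* __NET_XMIT_BYPASS */

/* Hypothetical model of tc_classify_act(). */
static int model_classify_act(int result, int *qerr)
{
	assert(qerr != 0);

	/* Default: a drop the qdisc should count in its statistics. */
	*qerr = MODEL_NET_XMIT_SUCCESS | MODEL_NET_XMIT_BYPASS;

	switch (result) {
	case MODEL_TC_ACT_STOLEN:
		/* An action consumed the skb: not accounted as a drop. */
		*qerr = MODEL_NET_XMIT_SUCCESS | MODEL_NET_XMIT_STOLEN;
		/* fall through */
	case MODEL_TC_ACT_SHOT:
		return -1;	/* caller must not enqueue the skb */
	}
	return result;
}
```

With TC_ACT_SHOT the BYPASS flag stays set, so a drop path like the one the patch replaces (`if (ret & __NET_XMIT_BYPASS) qdisc_qstats_drop(sch);`) still counts the drop; with TC_ACT_STOLEN the flag is cleared and the skb is not counted as dropped.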

[RFC 1/3] tc: fix return values of ingress qdisc

2015-04-21 Thread Alexei Starovoitov
The ingress qdisc should return NET_XMIT_* values just like all other qdiscs.

Since it's invoked via qdisc_enqueue_root() (which is supposed to return
only NET_XMIT_* values as well), it was only working by accident,
because TC_ACT_* values fit within NET_XMIT_MASK.

Signed-off-by: Alexei Starovoitov 
---
 net/core/dev.c  |8 ++--
 net/sched/sch_ingress.c |9 -
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1796cef55ab5..ac6233f6f353 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3533,7 +3533,7 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 {
struct net_device *dev = skb->dev;
u32 ttl = G_TC_RTTL(skb->tc_verd);
-   int result = TC_ACT_OK;
+   int result = NET_XMIT_SUCCESS;
struct Qdisc *q;
 
if (unlikely(MAX_RED_LOOP < ttl++)) {
@@ -3570,12 +3570,8 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
*pt_prev = NULL;
}
 
-   switch (ing_filter(skb, rxq)) {
-   case TC_ACT_SHOT:
-   case TC_ACT_STOLEN:
-   kfree_skb(skb);
+   if (ing_filter(skb, rxq) == NET_XMIT_DROP)
return NULL;
-   }
 
return skb;
 }
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 4cdbfb85686a..e68f4a5dbeba 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -65,21 +65,20 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
result = tc_classify(skb, fl, &res);
 
-   qdisc_bstats_update(sch, skb);
switch (result) {
case TC_ACT_SHOT:
-   result = TC_ACT_SHOT;
qdisc_qstats_drop(sch);
-   break;
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
-   result = TC_ACT_STOLEN;
+   result = NET_XMIT_DROP;
+   kfree_skb(skb);
break;
case TC_ACT_RECLASSIFY:
case TC_ACT_OK:
skb->tc_index = TC_H_MIN(res.classid);
default:
-   result = TC_ACT_OK;
+   qdisc_bstats_update(sch, skb);
+   result = NET_XMIT_SUCCESS;
break;
}
 
-- 
1.7.9.5
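The "working by accident" remark can be checked arithmetically: TC_ACT_SHOT (2) and TC_ACT_STOLEN (4) are both below NET_XMIT_MASK (0x0f), so masking them raises no error, yet TC_ACT_SHOT happens to equal NET_XMIT_CN. Below is a userspace model of the verdict mapping performed by the patched ingress_enqueue(); the MODEL_* constants mirror the kernel values of this era and model_ingress_verdict() is a hypothetical name.

```c
#include <assert.h>

#define MODEL_TC_ACT_OK          0
#define MODEL_TC_ACT_RECLASSIFY  1
#define MODEL_TC_ACT_SHOT        2
#define MODEL_TC_ACT_STOLEN      4
#define MODEL_TC_ACT_QUEUED      5
#define MODEL_NET_XMIT_SUCCESS   0x00
#define MODEL_NET_XMIT_DROP      0x01
#define MODEL_NET_XMIT_CN        0x02
#define MODEL_NET_XMIT_MASK      0x0f

/* Why the old code "worked": TC_ACT_* values survive the mask... */
static_assert(MODEL_TC_ACT_STOLEN <= MODEL_NET_XMIT_MASK,
	      "TC_ACT_STOLEN fits within NET_XMIT_MASK");
/* ...but are not meaningful NET_XMIT_* codes. */
static_assert((MODEL_TC_ACT_SHOT & MODEL_NET_XMIT_MASK) == MODEL_NET_XMIT_CN,
	      "TC_ACT_SHOT aliases NET_XMIT_CN");

/* Hypothetical model of the patched ingress_enqueue() return value. */
static int model_ingress_verdict(int tc_result)
{
	switch (tc_result) {
	case MODEL_TC_ACT_SHOT:
	case MODEL_TC_ACT_STOLEN:
	case MODEL_TC_ACT_QUEUED:
		return MODEL_NET_XMIT_DROP;	/* skb freed by the qdisc */
	default:
		return MODEL_NET_XMIT_SUCCESS;	/* packet continues up the stack */
	}
}
```

handle_ing() then only needs to test for NET_XMIT_DROP, as in the patched dev.c hunk.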

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] ibmveth: Add support for Large Receive Offload

2015-04-21 Thread Thomas Falcon
On 04/14/2015 05:00 PM, Eric Dumazet wrote:
> On Tue, 2015-04-14 at 15:35 -0500, Thomas Falcon wrote:
>> Enables receiving large packets from other LPARs. These packets
>> have a -1 IP header checksum, so we must recalculate to have
>> a valid checksum.
>>
>> Signed-off-by: Brian King 
>> Signed-off-by: Thomas Falcon 
>> ---
>>  drivers/net/ethernet/ibm/ibmveth.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index 08970c7..05eaca6a 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1092,6 +1092,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  struct net_device *netdev = adapter->netdev;
>>  int frames_processed = 0;
>>  unsigned long lpar_rc;
>> +struct iphdr *iph;
>>  
>>  restart_poll:
>>  while (frames_processed < budget) {
>> @@ -1134,8 +1135,20 @@ restart_poll:
>>  skb_put(skb, length);
>>  skb->protocol = eth_type_trans(skb, netdev);
>>  
>> -if (csum_good)
>> +if (csum_good) {
>>  skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +if (be16_to_cpu(skb->protocol) == ETH_P_IP) {
>> +skb_set_network_header(skb, 0);
>> +skb_set_transport_header(skb, sizeof(struct iphdr));
>> +iph = ip_hdr(skb);
>> +
>> +/* If the IP checksum is not offloaded and if the packet
>> + *  is large send, the checksum must be rebuilt.
>> + */
>> +if (iph->check == 0x)
>> +iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>
> How can this possibly work ?
>
> Normally you would have to set iph->check to 0 before calling
> ip_fast_csum(), as done in ip_send_check()
I don't have an answer for why I wasn't seeing any problems while testing
this, but I'll go back and set iph->check to zero and retest. Thanks for
noticing this.
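Eric's point is that the checksum field must be zero while the one's-complement sum is computed, which is what ip_send_check() does before calling ip_fast_csum(). The userspace sketch below follows RFC 1071; ipv4_hdr_csum() and ipv4_hdr_set_csum() are hypothetical helpers, not kernel API. One arithmetic detail may explain why testing did not catch the problem: a field holding 0xffff (the "-1" checksum from the commit message) is negative zero in one's-complement arithmetic, so adding it to a nonzero sum leaves the folded result unchanged. Zeroing the field first is still the correct, conventional approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 checksum over an IPv4 header of 'ihl' 32-bit words,
 * read as big-endian 16-bit words. Hypothetical userspace helper. */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr, unsigned int ihl)
{
	uint32_t sum = 0;
	size_t len = (size_t)ihl * 4;

	assert(ihl >= 5);
	for (size_t i = 0; i < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold carries back into the low 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Same order as ip_send_check(): clear the field, then checksum. */
static void ipv4_hdr_set_csum(uint8_t *hdr)
{
	unsigned int ihl = hdr[0] & 0x0f;
	uint16_t csum;

	hdr[10] = 0;
	hdr[11] = 0;
	csum = ipv4_hdr_csum(hdr, ihl);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}
```

For example, the 20-byte header 45 00 00 73 00 00 40 00 40 11 xx xx c0 a8 00 01 c0 a8 00 c7 gets the checksum 0xb861, and running ipv4_hdr_csum() over the finished header yields 0.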



Re: [RFC PATCH] tc filter not show last u32 filter

2015-04-21 Thread Sergei Shtylyov

Hello.

On 04/21/2015 08:53 PM, Vitaly E. Lavrov wrote:


"tc filter show" does not show last U32 filter on 32-bit systems (tested on 
x86).



Additional condition: filter does not have action and CONFIG_NET_CLS_ACT=y



Example: tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20



Function tcf_action_copy_stats() returns an error because the (struct tc_action *)a->priv is NULL (for 32bit systems).



The sequence of calls:



u32_dump()
   cls_u32.c:1009 if (tcf_exts_dump_stats(skb, &n->exts) < 0) goto nla_put_failure;



   tcf_exts_dump_stats()
  cls_api.c:606  if (tcf_action_copy_stats(skb, a, 1) < 0) return -1;

  tcf_action_copy_stats()
 act_api.c:604 struct tcf_common *p = a->priv;
 act_api.c:606 if (p == NULL) goto errout; // return -1;



One way to correct this error is to verify that an action exists before
calling tcf_action_copy_stats().



Patch for kernel 3.18.10


   You need to send patches against the latest kernel. Look for DaveM's 'net.git' repo on git.kernel.org.



diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index aad6a67..8e7ad61 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -602,9 +602,11 @@ EXPORT_SYMBOL(tcf_exts_dump);
  int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
  {
  #ifdef CONFIG_NET_CLS_ACT
-   struct tc_action *a = tcf_exts_first_act(exts);
-   if (tcf_action_copy_stats(skb, a, 1) < 0)
-   return -1;
+   if(tcf_exts_is_available(exts)) {


   Please run your patches through scripts/checkpatch.pl; a space is needed after *if*.



+   struct tc_action *a = tcf_exts_first_act(exts);


   Empty line needed here.

WBR, Sergei



Re: Please backport dev_kfree_skb_any fixes for stable

2015-04-21 Thread David Miller
From: Vinson Lee 
Date: Wed, 25 Mar 2015 12:39:01 -0700

> Please backport the list of commits from the original stable request
> at http://permalink.gmane.org/gmane.linux.network/320390.
> 
> The patches are from 3.15. Please queue them up for kernels <= 3.14.

Ok, doing that right now, thanks.


[RFC PATCH] tc filter not show last u32 filter

2015-04-21 Thread Vitaly E. Lavrov

"tc filter show" does not show last U32 filter on 32-bit systems (tested on 
x86).

Additional condition: filter does not have action and CONFIG_NET_CLS_ACT=y

Example: tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20

Function tcf_action_copy_stats() returns an error because the (struct tc_action *)a->priv is NULL (for 32bit systems).

The sequence of calls:

u32_dump()
  cls_u32.c:1009 if (tcf_exts_dump_stats(skb, &n->exts) < 0) goto nla_put_failure;

  tcf_exts_dump_stats()
 cls_api.c:606  if (tcf_action_copy_stats(skb, a, 1) < 0) return -1;

 tcf_action_copy_stats()
act_api.c:604 struct tcf_common *p = a->priv;
act_api.c:606 if (p == NULL) goto errout; // return -1;

One way to correct this error is to verify that an action exists before
calling tcf_action_copy_stats().

Patch for kernel 3.18.10

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index aad6a67..8e7ad61 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -602,9 +602,11 @@ EXPORT_SYMBOL(tcf_exts_dump);
 int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
 {
 #ifdef CONFIG_NET_CLS_ACT
-   struct tc_action *a = tcf_exts_first_act(exts);
-   if (tcf_action_copy_stats(skb, a, 1) < 0)
-   return -1;
+   if(tcf_exts_is_available(exts)) {
+   struct tc_action *a = tcf_exts_first_act(exts);
+   if (tcf_action_copy_stats(skb, a, 1) < 0)
+   return -1;
+   }
 #endif
return 0;
 }


This will fix the bug 84661 https://bugzilla.kernel.org/show_bug.cgi?id=84661

On 64-bit systems a->priv is not NULL, but it is not a valid pointer either;
however, because a->type == 0 and compat_mode == 1, tcf_action_copy_stats()
returns 0.

"tc filter show dev eth0".

tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20
tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.3 flowid 1:30
tc filter show dev eth0

(1) filter parent 1: protocol ip pref 100 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:20
  match 0ac80202/ at 16
(2) filter parent 1: protocol ip pref 100 u32 fh 800::801 order 2049 key ht 800 bkt 0 flowid 1:30
  match 0ac80203/ at 16

64bit system
(1) a->priv == 0x8800
(2) a->priv == 0x8801

32bit system
(1) a->priv == 0x0
(2) a->priv == 0x0

I could not understand the reason for this difference between 32-bit and 64-bit systems.
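The failing path and the proposed guard can be modelled in userspace; every name below is a hypothetical stand-in, not kernel API. As an assumption not stated in the report: if tcf_exts_first_act() is a list_first_entry() on the filter's action list, calling it on an empty list yields a pointer derived from the list head itself, so a->priv reads whatever memory lies at that offset, which could plausibly account for the different values seen on 32-bit and 64-bit builds. The model simply represents that bogus entry as an action with an arbitrary priv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct tc_action and struct tcf_exts. */
struct model_action {
	void *priv;
};

struct model_exts {
	int nr_actions;			/* 0: filter has no action */
	struct model_action *first;	/* what the "first action" lookup yields */
};

/* Mirrors the tcf_action_copy_stats() check: fail when priv is NULL. */
static int model_copy_stats(const struct model_action *a)
{
	assert(a != NULL);
	return a->priv == NULL ? -1 : 0;
}

/* Mirrors tcf_exts_dump_stats(), with or without the proposed guard. */
static int model_dump_stats(const struct model_exts *exts, bool guarded)
{
	if (guarded && exts->nr_actions == 0)
		return 0;	/* no action: nothing to dump, not an error */
	if (model_copy_stats(exts->first) < 0)
		return -1;	/* u32_dump() would goto nla_put_failure */
	return 0;
}
```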


Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.

2015-04-21 Thread Florian Fainelli
On 21/04/15 10:39, Andrew Lunn wrote:
>>> I would however say that sysfs is the wrong API. The linux network
>>> stack uses netlink for most configuration activities. So i would
>>> suggest adding a netlink binding to DSA, and place the code in
>>> net/dsa/, not within an MDIO driver.
>>
>> I suppose we could do that, but that sounds like a pretty radical change
>> in how DSA is currently configured (that is statically at boot time),
>> partly in order to allow booting from DSA-enabled network devices (e.g.:
>> nfsroot).
> 
> We would keep both DT and platform device. But statically at boot does
> not work for a USB hotpluggable switch!

Is the switch really hotpluggable, or is it the USB-Ethernet adapter
connecting to it? If the former, then I agree, if not, I would imagine
that there is nothing that prevents creating the switch device first,
and wait for its "master_netdev" to show up later before it starts doing
anything useful?
-- 
Florian


Re: commit d0af71a3573 for 3.19.y stable

2015-04-21 Thread David Miller
From: Josh Boyer 
Date: Wed, 1 Apr 2015 20:34:22 -0400

> Another possible stable candidate.  We had a report[1] of deadlocks
> with tigon devices on 3.19.y and the commit below fixes it.  It
> cherry-picks cleanly on top of 3.19.3.  I don't see it queued up so I
> thought I would point it out.
> 
> commit d0af71a3573f1217b140c60b66f1a9b335fb058b
> Author: Jun'ichi Nomura \(NEC\) 
> Date:   Thu Feb 12 01:26:24 2015 +
> 
> tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()

Queued up, thanks Josh.


Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.

2015-04-21 Thread Florian Fainelli
On 21/04/15 10:30, Andrew Lunn wrote:
>> My goal in reworking this weird DSA device/driver model is that you
>> could just register your switch devices as an enhanced
>> phy_driver/spi_driver/pci_driver etc..., such that libphy-ready drivers
>> could just take advantage of that when they scan/detect their MDIO buses
>> and find a switch. We are not quite there yet, but some help could be
>> welcome, here are the WIP patches (tested with platform_driver only so far):
> 
> We are hijacking another thread, but...
> 
> I don't understand you here. Who calls dsa_switch_register()?

Any driver which is backing the underlying device, if this is a PCI(e)
switch, a pci_driver's probe function gets called, and then registers
with DSA a switch device, very much like this:

https://github.com/ffainelli/linux/commit/f94efc3d7b489955351c01efeafcc89939df388e

> 
> I know of a board coming soon which has three switch chips on
> it. There is one MDIO device in the SoC, but there is an external MDIO
> multiplexor controlled via gpio lines, such that each switch has its
> own MDIO bus. The DT binding does not support this currently, but the
> underlying data structures do.
> 
> How do you envisage dsa_switch_register() to work in such a setup?

I would envision something where we can scan all of these switches
individually using their respective device drivers, with the help of
Device Tree or platform_data, figure out which position in a
dsa_switch_tree they should have, and make sure that we create a
dsa_switch_tree which reflects that, taking probe ordering into account.
All of these switches would be phy_driver instances, like this:
https://github.com/ffainelli/linux/commit/4a5c6b17de36377f6a71423b91f80bc1c7fee7be

We can keep discussing the details in a separate thread, I think that
would be useful.
-- 
Florian
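The probe-ordering point can be illustrated with a small model: each switch driver's probe reports its position in the tree (taken from Device Tree or platform_data), and the tree is only set up once every expected member has arrived, in whatever order the buses probed. This is a hypothetical sketch; none of these names are DSA API, and it is not the code in the WIP branch linked in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SWITCHES 4	/* hypothetical upper bound */

struct model_switch {
	int index;		/* position in the tree, from DT/platform_data */
};

struct model_tree {
	int expected;		/* number of switches described */
	struct model_switch *slots[MODEL_MAX_SWITCHES];
};

/* Called from each switch's probe. Returns -1 on a bad or duplicate
 * index, 0 while members are still missing, 1 once the tree is
 * complete and could be set up. */
static int model_tree_add(struct model_tree *t, struct model_switch *sw)
{
	assert(t->expected > 0 && t->expected <= MODEL_MAX_SWITCHES);

	if (sw->index < 0 || sw->index >= t->expected || t->slots[sw->index])
		return -1;
	t->slots[sw->index] = sw;

	for (int i = 0; i < t->expected; i++)
		if (t->slots[i] == NULL)
			return 0;
	return 1;
}
```

Deferring setup until the last member arrives is one way to make the result independent of probe order; the same check could also wait for the master netdev to appear.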


Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.

2015-04-21 Thread Andrew Lunn
> > I would however say that sysfs is the wrong API. The linux network
> > stack uses netlink for most configuration activities. So I would
> > suggest adding a netlink binding to DSA, and place the code in
> > net/dsa/, not within an MDIO driver.
> 
> I suppose we could do that, but that sounds like a pretty radical change
> in how DSA is currently configured (that is statically at boot time),
> partly in order to allow booting from DSA-enabled network devices (e.g.:
> nfsroot).

We would keep both DT and platform device. But statically at boot does
not work for a USB hotpluggable switch!

Andrew


Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.

2015-04-21 Thread Andrew Lunn
> My goal in reworking this weird DSA device/driver model is that you
> could just register your switch devices as an enhanced
> phy_driver/spi_driver/pci_driver etc..., such that libphy-ready drivers
> could just take advantage of that when they scan/detect their MDIO buses
> and find a switch. We are not quite there yet, but some help could be
> welcome, here are the WIP patches (tested with platform_driver only so far):

We are hijacking another thread, but...

I don't understand you here. Who calls dsa_switch_register()?

I know of a board coming soon which has three switch chips on
it. There is one MDIO device in the SoC, but there is an external MDIO
multiplexor controlled via gpio lines, such that each switch has its
own MDIO bus. The DT binding does not support this currently, but the
underlying data structures do.

How do you envisage dsa_switch_register() to work in such a setup?

Thanks
Andrew


Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.

2015-04-21 Thread Florian Fainelli
On 21/04/15 05:47, Andrew Lunn wrote:
> Hi Jan
> 
> Interesting work, but I think the architecture is wrong.
> 
> DSA needs an Ethernet device, an MDIO bus, and information about ports
> on the switch. 

That requirement is completely artificial as it is today, and just comes
from arbitrary limitations imposed in the initial DSA design, something
that I am still trying to get away from.

> The MDIO bus and the Ethernet need no knowledge of
> DSA. So putting your DSA configuration code in the MDIO driver is
> wrong.

I agree with that.

> 
> The problem you have is where to put the configuration data. There
> are currently two choices: a platform driver, examples of which you
> can find in arch/arm/mach-orion5x, or device tree. Otherwise
> you need a new method.
> 
> Part of your problem is hotplug, since you have a USB device, and no
> stable names for the ethernet device nor the MDIO device. Your
> hardware is not fixed, you could hang any switch off the USB
> device. So it does sound like you need a user space API.
> 
> I would however say that sysfs is the wrong API. The linux network
> stack uses netlink for most configuration activities. So I would
> suggest adding a netlink binding to DSA, and place the code in
> net/dsa/, not within an MDIO driver.

I suppose we could do that, but that sounds like a pretty radical change
in how DSA is currently configured (that is statically at boot time),
partly in order to allow booting from DSA-enabled network devices (e.g.:
nfsroot).

> 
> Device tree overlays might be a solution, if you can dynamically load
> a blob as part of a USB hotplug event. What makes it easier is that
> both the Ethernet device and MDIO bus are on the same USB device, so
> all your phandles are within the blob.
> 
> What is your long term goal? Is this just a development tool? Are you
> thinking of making a product which integrates both the switch and the
> USB ethernet onto a USB dongle? This could also change the
> architecture, since it makes the configuration more fixed.

My goal in reworking this weird DSA device/driver model is that you
could just register your switch devices as an enhanced
phy_driver/spi_driver/pci_driver etc..., such that libphy-ready drivers
could just take advantage of that when they scan/detect their MDIO buses
and find a switch. We are not quite there yet, but some help could be
welcome, here are the WIP patches (tested with platform_driver only so far):

https://github.com/ffainelli/linux/tree/dsa-model-b53
-- 
Florian


Re: [RFC PATCH 1/3] net/dsa: Refactor dsa_probe()

2015-04-21 Thread Florian Fainelli
On 21/04/15 06:26, Jan Kaisrlik wrote:
> From: Jan Kaisrlik 
> 
> This patch refactors dsa_probe in order to simplify code in the patch 2/3.

It does not look like you are working on the latest net-next tree, that
part of the code has already been refactored to have separate helper
functions such as dsa_setup_dst(), dsa_switch_setup() and
dsa_switch_setup_one().
-- 
Florian


[PATCH] altera tse: add support for fixed-links.

2015-04-21 Thread Andreas Oetken
From: Andreas Oetken 

Add support for fixed-links in configurations without PHY.
(e.g. connection to a switch, SGMII point to point, SFPs)

Check: Documentation/devicetree/bindings/net/fixed-link.txt.
Signed-off-by: Andreas Oetken 
---
 drivers/net/ethernet/altera/altera_tse_main.c | 36 +--
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
index dbbbd34..a90262e 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -777,6 +777,7 @@ static int init_phy(struct net_device *dev)
struct altera_tse_private *priv = netdev_priv(dev);
struct phy_device *phydev;
struct device_node *phynode;
+   bool fixed_link = false;
 
/* Avoid init phy in case of no phy present */
if (!priv->phy_iface)
@@ -789,13 +790,32 @@ static int init_phy(struct net_device *dev)
phynode = of_parse_phandle(priv->device->of_node, "phy-handle", 0);
 
if (!phynode) {
-   netdev_dbg(dev, "no phy-handle found\n");
-   if (!priv->mdio) {
-   netdev_err(dev,
-  "No phy-handle nor local mdio specified\n");
-   return -ENODEV;
+   /* check if a fixed-link is defined in device-tree */
+   if (of_phy_is_fixed_link(priv->device->of_node)) {
+   if (of_phy_register_fixed_link(priv->device->of_node)
+   < 0) {
+   netdev_err(dev, "cannot register fixed PHY\n");
+   return -ENODEV;
+   }
+
+   /* In the case of a fixed PHY, the DT node associated
+   * to the PHY is the Ethernet MAC DT node.
+   */
+   phynode = of_node_get(priv->device->of_node);
+   fixed_link = true;
+
+   netdev_dbg(dev, "fixed-link detected\n");
+   phydev = of_phy_connect(dev, phynode,
+   &altera_tse_adjust_link,
+   0, priv->phy_iface);
+   } else {
+   netdev_dbg(dev, "no phy-handle found\n");
+   if (!priv->mdio) {
+   netdev_err(dev, "No phy-handle nor local mdio specified\n");
+   return -ENODEV;
+   }
+   phydev = connect_local_phy(dev);
}
-   phydev = connect_local_phy(dev);
} else {
netdev_dbg(dev, "phy-handle found\n");
phydev = of_phy_connect(dev, phynode,
@@ -819,10 +839,10 @@ static int init_phy(struct net_device *dev)
/* Broken HW is sometimes missing the pull-up resistor on the
 * MDIO line, which results in reads to non-existent devices returning
 * 0 rather than 0x. Catch this here and treat 0 as a non-existent
-* device as well.
+* device as well. If a fixed-link is used the phy_id is always 0.
 * Note: phydev->phy_id is the result of reading the UID PHY registers.
 */
-   if (phydev->phy_id == 0) {
+   if ((phydev->phy_id == 0) && !fixed_link) {
netdev_err(dev, "Bad PHY UID 0x%08x\n", phydev->phy_id);
phy_disconnect(phydev);
return -ENODEV;
-- 
2.1.4
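The PHY selection order that init_phy() ends up with after this patch can be summarised as a small decision function. This is a hypothetical userspace model for illustration only; the enum and function names do not exist in the driver. It also captures why the patch skips the phy_id == 0 sanity check for fixed links, whose phy_id is always 0 according to the patch's own comment.

```c
#include <assert.h>
#include <stdbool.h>

enum model_phy_source {
	MODEL_PHY_HANDLE,	/* "phy-handle" property: of_phy_connect() */
	MODEL_PHY_FIXED_LINK,	/* fixed-link: register, connect to MAC node */
	MODEL_PHY_LOCAL_MDIO,	/* fall back to connect_local_phy() */
	MODEL_PHY_NONE,		/* init_phy() returns -ENODEV */
};

/* Precedence: phy-handle, then fixed-link, then the local MDIO bus. */
static enum model_phy_source model_pick_phy(bool has_phy_handle,
					    bool is_fixed_link,
					    bool has_local_mdio)
{
	if (has_phy_handle)
		return MODEL_PHY_HANDLE;
	if (is_fixed_link)
		return MODEL_PHY_FIXED_LINK;
	if (has_local_mdio)
		return MODEL_PHY_LOCAL_MDIO;
	return MODEL_PHY_NONE;
}

/* A zero PHY ID usually means a missing MDIO pull-up, but a fixed
 * link always reports 0, so it must not be rejected. */
static bool model_phy_id_is_bad(unsigned int phy_id, bool fixed_link)
{
	return phy_id == 0 && !fixed_link;
}
```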



RE: [PATCH iproute2 1/3] tc: fix compilation warning on 32bits arch

2015-04-21 Thread David Laight
From: Nicolas Dichtel
> Sent: 21 April 2015 17:07
> The warning was:
> m_simple.c: In function parse_simple:
> m_simple.c:142:4: warning: format %ld expects argument of type long int, but argument 3 has type size_t [-Wformat]
> 
> Useful to be able to compile with -Werror.
> 
> Signed-off-by: Nicolas Dichtel 
> ---
>  tc/m_simple.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tc/m_simple.c b/tc/m_simple.c
> index 866552f559b3..3b6d7beb769c 100644
> --- a/tc/m_simple.c
> +++ b/tc/m_simple.c
> @@ -139,7 +139,7 @@ parse_simple(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
> 
>   if (strlen(simpdata) > (SIMP_MAX_DATA - 1)) {
>   fprintf(stderr, "simple: Illegal string len %ld <%s> \n",
> - strlen(simpdata), simpdata);
> + (long)strlen(simpdata), simpdata);
>   return -1;

Isn't the correct fix to use "%zu"?

David
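C99's z length modifier makes %zu match size_t on both 32-bit and 64-bit targets, so neither the (long) cast nor %ld is needed. A minimal sketch of the message formatting with %zu follows; format_len_error() is a hypothetical helper, not iproute2 code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the "illegal length" message; %zu is the conversion for size_t. */
static int format_len_error(char *buf, size_t bufsz, const char *simpdata)
{
	assert(buf != NULL && simpdata != NULL);
	return snprintf(buf, bufsz, "simple: Illegal string len %zu <%s> \n",
			strlen(simpdata), simpdata);
}
```

The same format string works unchanged with fprintf(stderr, ...), which is what parse_simple() uses.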



Re: randconfig build error with next-20150421, in net/ceph

2015-04-21 Thread Sage Weil
On Tue, 21 Apr 2015, Guenter Roeck wrote:
> On Tue, Apr 21, 2015 at 08:10:44AM -0700, Jim Davis wrote:
> > Building with the attached random configuration file,
> > 
> > ERROR: "__divdi3" [net/ceph/libceph.ko] undefined!
> 
> Commit 7321f19d ("crush: straw2 bucket type with an efficient 64-bit crush_ln()").
> 
> +   draw = ln / w;
> 
> where 'ln' is 64 bit.
> 
> Some other oddities in that patch, such as
> 
> +#if defined(__linux__)
> +#include 
> +#elif defined(__FreeBSD__)
> +#include 
> +#endif
> 
> and lots of coding style violations.

Thanks for the report--we'll fix it up!

sage


Re: randconfig build error with next-20150421, in net/ceph

2015-04-21 Thread Ilya Dryomov
On Tue, Apr 21, 2015 at 7:21 PM, Sage Weil  wrote:
> On Tue, 21 Apr 2015, Guenter Roeck wrote:
>> On Tue, Apr 21, 2015 at 08:10:44AM -0700, Jim Davis wrote:
>> > Building with the attached random configuration file,
>> >
>> > ERROR: "__divdi3" [net/ceph/libceph.ko] undefined!
>>
>> Commit 7321f19d ("crush: straw2 bucket type with an efficient 64-bit crush_ln()").
>>
>> +   draw = ln / w;
>>
>> where 'ln' is 64 bit.
>>
>> Some other oddities in that patch, such as
>>
>> +#if defined(__linux__)
>> +#include 
>> +#elif defined(__FreeBSD__)
>> +#include 
>> +#endif
>>
>> and lots of coding style violations.
>
> Thanks for the report--we'll fix it up!

Fixed in ceph-client/master, should make it to linux-next tomorrow.

Thanks,

Ilya


Re: randconfig build error with next-20150421, in net/ceph

2015-04-21 Thread Guenter Roeck
On Tue, Apr 21, 2015 at 08:10:44AM -0700, Jim Davis wrote:
> Building with the attached random configuration file,
> 
> ERROR: "__divdi3" [net/ceph/libceph.ko] undefined!

Commit 7321f19d ("crush: straw2 bucket type with an efficient 64-bit crush_ln()").

+   draw = ln / w;

where 'ln' is 64 bit.

Some other oddities in that patch, such as

+#if defined(__linux__)
+#include 
+#elif defined(__FreeBSD__)
+#include 
+#endif

and lots of coding style violations.

Guenter


[PATCH iproute2 1/3] tc: fix compilation warning on 32bits arch

2015-04-21 Thread Nicolas Dichtel
The warning was:
m_simple.c: In function ‘parse_simple’:
m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat]

Useful to be able to compile with -Werror.

Signed-off-by: Nicolas Dichtel 
---
 tc/m_simple.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/m_simple.c b/tc/m_simple.c
index 866552f559b3..3b6d7beb769c 100644
--- a/tc/m_simple.c
+++ b/tc/m_simple.c
@@ -139,7 +139,7 @@ parse_simple(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
 
if (strlen(simpdata) > (SIMP_MAX_DATA - 1)) {
fprintf(stderr, "simple: Illegal string len %ld <%s> \n",
-   strlen(simpdata), simpdata);
+   (long)strlen(simpdata), simpdata);
return -1;
}
 
-- 
2.2.2



[PATCH 3.16.y-ckt 093/144] net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}

2015-04-21 Thread Luis Henriques
3.16.7-ckt10 -stable review patch.  If anyone has any objections, please let me know.

--

From: Markos Chandras 

commit 87f966d97b89774162df04d2106c6350c8fe4cb3 upstream.

On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the FIFO sufficiently filled, which can result in Tx FIFO underflow
errors. It is therefore best to set up the SRAM on supported controllers
so that the NOUFLO bit can always be used.

Cc: 
Cc: 
Cc: Don Fry 
Signed-off-by: Markos Chandras 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques 
---
 drivers/net/ethernet/amd/pcnet32.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/pcnet32.c b/drivers/net/ethernet/amd/pcnet32.c
index e7cc9174e364..02d3b7975835 100644
--- a/drivers/net/ethernet/amd/pcnet32.c
+++ b/drivers/net/ethernet/amd/pcnet32.c
@@ -1552,7 +1552,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
 {
struct pcnet32_private *lp;
int i, media;
-   int fdx, mii, fset, dxsuflo;
+   int fdx, mii, fset, dxsuflo, sram;
int chip_version;
char *chipname;
struct net_device *dev;
@@ -1589,7 +1589,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
}
 
/* initialize variables */
-   fdx = mii = fset = dxsuflo = 0;
+   fdx = mii = fset = dxsuflo = sram = 0;
chip_version = (chip_version >> 12) & 0x;
 
switch (chip_version) {
@@ -1622,6 +1622,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
chipname = "PCnet/FAST III 79C973"; /* PCI */
fdx = 1;
mii = 1;
+   sram = 1;
break;
case 0x2626:
chipname = "PCnet/Home 79C978"; /* PCI */
@@ -1645,6 +1646,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
chipname = "PCnet/FAST III 79C975"; /* PCI */
fdx = 1;
mii = 1;
+   sram = 1;
break;
case 0x2628:
chipname = "PCnet/PRO 79C976";
@@ -1673,6 +1675,31 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
dxsuflo = 1;
}
 
+   /*
+* The Am79C973/Am79C975 controllers come with 12K of SRAM
+* which we can use for the Tx/Rx buffers but most importantly,
+* the use of SRAM allow us to use the BCR18:NOUFLO bit to avoid
+* Tx fifo underflows.
+*/
+   if (sram) {
+   /*
+* The SRAM is being configured in two steps. First we
+* set the SRAM size in the BCR25:SRAM_SIZE bits. According
+* to the datasheet, each bit corresponds to a 512-byte
+* page so we can have at most 24 pages. The SRAM_SIZE
+* holds the value of the upper 8 bits of the 16-bit SRAM size.
+* The low 8-bits start at 0x00 and end at 0xff. So the
+* address range is from 0x up to 0x17ff. Therefore,
+* the SRAM_SIZE is set to 0x17. The next step is to set
+* the BCR26:SRAM_BND midway through so the Tx and Rx
+* buffers can share the SRAM equally.
+*/
+   a->write_bcr(ioaddr, 25, 0x17);
+   a->write_bcr(ioaddr, 26, 0xc);
+   /* And finally enable the NOUFLO bit */
+   a->write_bcr(ioaddr, 18, a->read_bcr(ioaddr, 18) | (1 << 11));
+   }
+
dev = alloc_etherdev(sizeof(*lp));
if (!dev) {
ret = -ENOMEM;
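The register values in the SRAM comment follow from simple arithmetic if SRAM_SIZE is read, as the comment describes, in 512-byte page units: 12K of SRAM is 24 pages, the last page index is 24 - 1 = 0x17, and a midway boundary is 24 / 2 = 0xc; NOUFLO is bit 11 of BCR18. A small check of that arithmetic follows; the helper names are hypothetical, since the driver simply hard-codes the constants.

```c
#include <assert.h>

#define MODEL_SRAM_BYTES    (12 * 1024)	/* Am79C973/Am79C975 on-chip SRAM */
#define MODEL_SRAM_PAGE     512		/* bytes per SRAM_SIZE unit */
#define MODEL_BCR18_NOUFLO  (1u << 11)

/* BCR25:SRAM_SIZE: upper byte of the last SRAM address (last page index). */
static unsigned int model_sram_size(unsigned int bytes)
{
	assert(bytes >= MODEL_SRAM_PAGE && bytes % MODEL_SRAM_PAGE == 0);
	return bytes / MODEL_SRAM_PAGE - 1;
}

/* BCR26:SRAM_BND: placed midway so Tx and Rx share the SRAM equally. */
static unsigned int model_sram_bnd(unsigned int bytes)
{
	return (bytes / MODEL_SRAM_PAGE) / 2;
}

/* Read-modify-write of BCR18 that only sets NOUFLO. */
static unsigned int model_set_noufl(unsigned int bcr18)
{
	return bcr18 | MODEL_BCR18_NOUFLO;
}
```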


[PATCH iproute2 2/3] libnamespaces: fix warning about syscall()

2015-04-21 Thread Nicolas Dichtel
The warning was:
In file included from namespace.c:14:0:
../include/namespace.h: In function ‘setns’:
../include/namespace.h:37:2: warning: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration]

Signed-off-by: Nicolas Dichtel 
---
 include/namespace.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/namespace.h b/include/namespace.h
index a2ac7dccd0e1..5add9d266b7d 100644
--- a/include/namespace.h
+++ b/include/namespace.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
-- 
2.2.2
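The warning is the usual implicit-declaration diagnostic: syscall() is declared in <unistd.h> (glibc exposes it under _GNU_SOURCE or _DEFAULT_SOURCE), and the __NR_* numbers come from <sys/syscall.h>. A minimal sketch of a setns() fallback with both headers in place follows; compat_setns() is a hypothetical name, and this is not necessarily identical to iproute2's namespace.h.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Enter the namespace referred to by fd via the raw system call,
 * for C libraries that lack a setns() wrapper. */
static int compat_setns(int fd, int nstype)
{
#ifdef __NR_setns
	return (int)syscall(__NR_setns, fd, nstype);
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

Calling it with an invalid descriptor such as -1 simply fails with -1 and sets errno, which makes it safe to exercise without privileges.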



Re: [PATCH 2/7] turn Makefile more distribution friendly

2015-04-21 Thread Pavel Šimerda
On 04/20/2015 06:55 PM, Stephen Hemminger wrote:
> On Mon, 13 Apr 2015 16:00:56 +0200
> Pavel Šimerda  wrote:
> 
>> From: Pavel Šimerda 
>>
>> Changes:
>>
>>  * Accept directory settings from environment.
>>  * Remove redundant ROOTDIR variable.
>>  * Set KERNEL_INCLUDE default to '/usr/include'.
>>  * Use CFLAGS from environment.
>>
>> Note: In the long term it might be better to improve the configure
>> script to generate those parts of the Makefile in a manner similar
>> to autoconf. It might even be practical to autotoolize the package.
>>
>> Signed-off-by: Pavel Šimerda 
> 
> I will take this part.
> But don't want to start iproute2 down the autoconf/autotool sink hole.

Thanks! The changes I submitted should generally be good enough for
distro maintainers to avoid Makefile modifications.

Cheers,

Pavel

[PATCH iproute2 3/3] mroute: remove invalid check against NLM_F_MULTI

2015-04-21 Thread Nicolas Dichtel
This flag is only used by the netlink protocol to mark multi-part messages,
so there is no reason to reject messages without it.

Note that this flag was removed by the following kernel patches (v3.14):
65886f439ab0 ipmr: fix mfc notification flags
f518338b1603 ip6mr: fix mfc notification flags

Signed-off-by: Nicolas Dichtel 
---
 ip/ipmroute.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index 13ac892512d0..125a13f8c582 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -67,8 +67,7 @@ int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
int family;
 
if ((n->nlmsg_type != RTM_NEWROUTE &&
-n->nlmsg_type != RTM_DELROUTE) ||
-   !(n->nlmsg_flags & NLM_F_MULTI)) {
+n->nlmsg_type != RTM_DELROUTE)) {
fprintf(stderr, "Not a multicast route: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
return 0;
-- 
2.2.2
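After this patch, the check filters on the message type alone. Here it is as a standalone sketch. The helper name is hypothetical, and the constants come from the Linux UAPI header <linux/rtnetlink.h>.

```c
#include <stdbool.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: accept multicast route notifications by type only.
 * NLM_F_MULTI merely marks parts of a multi-part netlink reply, so its
 * absence says nothing about whether the message is an mroute. */
static bool is_mroute_msg(const struct nlmsghdr *n)
{
	return n->nlmsg_type == RTM_NEWROUTE ||
	       n->nlmsg_type == RTM_DELROUTE;
}
```

An RTM_NEWROUTE message is accepted whether or not NLM_F_MULTI is set. This matters for kernels since v3.14, whose mfc notifications no longer carry that flag.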


Re: [PATCH v2 07/11] IB/cm: Add network namespace support

2015-04-21 Thread Jason Gunthorpe
On Tue, Apr 21, 2015 at 03:07:47PM +0300, Haggai Eran wrote:

> Namespace is needed for RoCE address resolution, in cases where the
> driver doesn't report the MAC as part of the ib_wc.

This patch explicitly says it doesn't deal with RoCE, so why are we
adding namespaces to support RoCE paths in this series? Especially
since we have no idea how that should fit into verbs.

Frankly, that stuff is the most objectionable part of this series.

I suggest:
 1) Focus only on the RDMA-CM, and only on IB support, as the title
says
 2) Drop all changes to verbs and cm and otherwise that are not
directly related to IB
 3) Very, very, strongly justify why the remaining layer violations are
necessary, and think very carefully about doing something else.

For IB, it is very clear to me that only the RDMA-CM can possibly have
the knowledge to find the namespace, so only the RDMA-CM should be
touching it.

If the interface between the RDMA-CM and IB-CM layers is preventing
something, then extend the interface, don't drop RDMA-CM code into
IB-CM.

From that point, with working IB, we can revisit what is needed to
make iWarp and RoCE work at the verbs layer and ultimately at the CM
layer, in steps.

Your other questions:

> Using the GID alone is not enough to distinguish between namespaces,
> because you can have multiple IPoIB interfaces, all using the GID (and
> possibly the same P_Key), and each belonging to a different namespace.

Exactly, this is why IB GID layers can't possibly need to touch the net
namespace.

> The listener rbtree's key is currently the service ID, for
> instance. Now with namespaces, you can have multiple listeners
> listening on the same service ID, so we need to use (service ID,
> namespace) as the key.

CM doesn't care: a service ID is registered by RDMA-CM, and RDMA-CM can
demux the (service ID, IP) tuple to the right namespace. Having CM
snoop private data is a huge layering violation!
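Jason's point is that the IB CM only needs the service ID. The RDMA-CM, which already knows the IP address, can resolve the namespace. The following toy model of that split is purely illustrative: the struct, the integer namespace id, and the linear table are all hypothetical stand-ins, not RDMA-CM data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical binding kept by the RDMA-CM layer, not by the IB CM. */
struct rdma_cm_binding {
	uint64_t service_id;
	uint32_t ip;       /* IPv4 address, host byte order for simplicity */
	int      netns_id; /* stand-in for a struct net pointer */
};

/* Resolve the namespace from the (service ID, IP) tuple; -1 if unbound. */
static int rdma_cm_demux(const struct rdma_cm_binding *tbl, size_t n,
			 uint64_t service_id, uint32_t ip)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].service_id == service_id && tbl[i].ip == ip)
			return tbl[i].netns_id;
	return -1;
}
```

Two bindings can share a service ID and still resolve to different namespaces, which is the case Haggai raises for the listener rbtree.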

> looks at its private data and demuxes it to a net namespace.
> I don't see it there. The code seem to fetch the GID from the GRH.
> Because the IP address in the source GID can be the same for different
> namespaces, this is not enough to pick the right namespace.

For IB, ib_init_ah_from_wc does not need a namespace.

For RoCEE, the GID *MUST* be enough to find the namespace because each
namespace will create a unique GID table entry.

RoCEE and IB are going to be totally different in how this is
implemented...

I expect RoCEE to have namespace constraints at the verbs QP level,
while IB cannot - that feels like a huge journey...

Jason

Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

2015-04-21 Thread Arnd Bergmann
On Tuesday 21 April 2015 17:13:32 Thomas Gleixner wrote:
> On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> > On Tuesday 21 April 2015 16:14:26 Thomas Gleixner wrote:
> > > > Note the use of a separate __kernel_itimerspec64 for the user interface
> > > > here, which I think will be needed to hide the differences between the
> > > > normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit
> > > > platforms that will be defined differently (using 'long long').
> > > 
> > > Confused.
> > > 
> > > timespec64 / itimerspec64 should be the same independent of 64bit and
> > > 32bit. So why do we need another variant ?
> > 
> > There are multiple reasons:
> > 
> > * On 64-bit systems, timespec64 would always be defined in the same way
> >   as struct timespec { __kernel_time_t tv_sec; long tv_nsec; }, with
> >   __kernel_time_t being 'long'. On 32-bit, we probably need to make both
> >   members 'long long' for the user space side, in order to share the
> >   syscall implementation with the kernel side, but we may also want to
> >   keep the internal timespec64 using a 'long' for tv_nsec, as we do
> >   today. This means that both the binary layout (padding or no padding)
> >   and the basic types (long or long long) are different between 32-bit
> >   and 64-bit, and between kernel and user space
> 
> So you want to avoid a compat syscall for 32bit applications on a
> 64bit kernel, right?

That is the idea at the moment, yes. At least for the kernel-user
interface.

> That burdens 32bit with the extra 'long long' in user space. Not sure
> whether user space folks will be happy about it.

I know there are concerns about this, in particular because C11 and
POSIX both require tv_nsec to be 'long', unlike timeval->tv_usec,
which is a 'suseconds_t' and can be defined as 'long long'.

There are four possible ways that 32-bit user space could define
timespec based on the uncontroversial (I hope) 'typedef long long
time_t;'.

a)

struct timespec {
time_t tv_sec;
long long tv_nsec; /* or typedef long long snseconds_t */
};

This is not directly compatible with C11 or POSIX.1-2008, but it
matches what we do inside of 64-bit kernels, so it probably has the
highest chance of working correctly in practice.

b)

struct timespec {
time_t tv_sec;
long tv_nsec;
};

This is the definition according to C11 and POSIX, and the main problem
here is the padding, which may cause a 4-byte data leak and puts tv_nsec
in the wrong place when used together with the timespec we have in the kernel
on big-endian 64-bit machines.

c)

struct timespec {
time_t tv_sec;
#if __BITS_PER_LONG == 32 && __BYTE_ORDER == __BIG_ENDIAN
long __pad;
#endif
long tv_nsec; /* or typedef long long snseconds_t */
#if __BITS_PER_LONG == 32 && __BYTE_ORDER == __LITTLE_ENDIAN
long __pad;
#endif
};

This version could be used transparently by user space, but has
the potential to cause problems with existing user space doing
things like

struct timespec ts = { 0, 1000 }; /* one microsecond */

even though it is probably compliant.

d)

struct timespec {
time_t tv_sec;
#if __BITS_PER_LONG == 32 && __BYTE_ORDER == __BIG_ENDIAN
long :32;
#endif
long tv_nsec; /* or typedef long long snseconds_t */
#if __BITS_PER_LONG == 32 && __BYTE_ORDER == __LITTLE_ENDIAN
long :32;
#endif
};

This is very similar to c, but trades the problem I described
above for a dependency on a gcc extension that is not part of
C11 or any earlier standard.
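The initializer problem with c), and the way d) avoids it, can be seen with fixed-width stand-ins for the 32-bit big-endian layouts. This sketch is only an illustration. The struct names are hypothetical, and the unnamed padding uses int32_t (plain int on common targets) rather than the long bit-field shown in d).

```c
#include <stdint.h>

/* Option c) on a 32-bit big-endian target: named padding before tv_nsec. */
struct ts_c_be {
	int64_t tv_sec;
	int32_t pad;
	int32_t tv_nsec;
};

/* Option d) on the same target: unnamed 32-bit padding. */
struct ts_d_be {
	int64_t tv_sec;
	int32_t :32;
	int32_t tv_nsec;
};

/* The "one microsecond" initializer from the text, applied to each layout. */
static struct ts_c_be one_usec_c(void)
{
	struct ts_c_be ts = { 0, 1000 };	/* 1000 lands in pad */
	return ts;
}

static struct ts_d_be one_usec_d(void)
{
	struct ts_d_be ts = { 0, 1000 };	/* unnamed member is skipped */
	return ts;
}
```

In C, unnamed structure members do not take part in initialization. So { 0, 1000 } skips the unnamed field in d), but assigns 1000 to the named pad in c) and leaves tv_nsec at 0.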


From the kernel's point of view, b), c) and d) can all be
treated the same for output data, as we only ever pass back
normalized timespec structures that have zeroes in the upper
bits of tv_nsec. However, for input to the kernel, we would
require an extra conditional on 64-bit kernels to decide whether
we ignore the upper bits (doing tv->tv_nsec &= 0x) or whether
a structure that has the upper bits set needs to cause EINVAL.
We could still do that in get_timespec() if we decide not to
go with a).
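The extra input check can be modelled in userspace. This sketch rests on three assumptions. First, the user layout is 32-bit in the style of b) to d), with tv_nsec overlaying the low half of a 64-bit kernel field; the order is chosen with the GCC/Clang predefined macro __BYTE_ORDER__. Second, memcpy() stands in for copy_from_user(). Third, all names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical 32-bit user layout: tv_nsec overlays the low half of a
 * 64-bit field on this host, with 32 bits of padding beside it. */
struct user_ts32 {
	int64_t tv_sec;
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	int32_t pad;
	int32_t tv_nsec;
#else
	int32_t tv_nsec;
	int32_t pad;
#endif
};

/* How a 64-bit kernel would see the same 16 bytes. */
struct kernel_ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

_Static_assert(sizeof(struct user_ts32) == sizeof(struct kernel_ts64),
	       "user and kernel layouts must have the same size");

/* Reinterpret the user bytes, as the kernel would after copying them in. */
static struct kernel_ts64 kernel_view(const struct user_ts32 *u)
{
	struct kernel_ts64 k;

	memcpy(&k, u, sizeof(k));
	return k;
}

/* Either ignore the upper 32 bits of tv_nsec, or reject them. */
static int check_nsec(struct kernel_ts64 *k, int ignore_upper)
{
	if (ignore_upper)
		k->tv_nsec = (int64_t)(uint32_t)k->tv_nsec; /* keep low 32 bits */
	if (k->tv_nsec < 0 || k->tv_nsec >= NSEC_PER_SEC)
		return -EINVAL;
	return 0;
}
```

Garbage in the padding shows up as upper bits of the kernel's tv_nsec. check_nsec() either masks it away or rejects it with -EINVAL, the two choices described above.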

See also https://lwn.net/Articles/457089/ for an earlier discussion
on the topic when debating the x32 ABI.

Arnd

Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

2015-04-21 Thread Thomas Gleixner
On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> On Tuesday 21 April 2015 16:14:26 Thomas Gleixner wrote:
> > > Note the use of a separate __kernel_itimerspec64 for the user interface
> > > here, which I think will be needed to hide the differences between the
> > > normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit
> > > platforms that will be defined differently (using 'long long').
> > 
> > Confused.
> > 
> > timespec64 / itimerspec64 should be the same independent of 64bit and
> > 32bit. So why do we need another variant ?
> 
> There are multiple reasons:
> 
> * On 64-bit systems, timespec64 would always be defined in the same way
>   as struct timespec { __kernel_time_t tv_sec; long tv_nsec; }, with
>   __kernel_time_t being 'long'. On 32-bit, we probably need to make both
>   members 'long long' for the user space side, in order to share the
>   syscall implementation with the kernel side, but we may also want to
>   keep the internal timespec64 using a 'long' for tv_nsec, as we do
>   today. This means that both the binary layout (padding or no padding)
>   and the basic types (long or long long) are different between 32-bit
>   and 64-bit, and between kernel and user space

So you want to avoid a compat syscall for 32bit applications on a
64bit kernel, right?

That burdens 32bit with the extra 'long long' in user space. Not sure
whether user space folks will be happy about it.

> * We should not put 'struct timespec64' into the user space namespace,
>   as applications might already use that identifier. This is similar
>   to the __u32/u32 or __kernel_time_t/time_t tuple of types for interface
>   and in-kernel uses. This is particularly important when embedding a
>   timespec in another data structure.

Fair enough.

> * My plan is to use a temporary hack where I actually define
>   __kernel_timespec64 to look like the 32-bit version of timespec,
>   as an intermediate step when converting all 32-bit architectures over
>   to use the compat_*() syscalls in place of the existing ones, so
>   I can change over the normal syscalls to use __kernel_timespec64
>   without having to change all architectures at once, or having to
>   modify each syscall multiple times.

Makes sense.

Thanks,

tglx

randconfig build error with next-20150421, in net/ceph

2015-04-21 Thread Jim Davis
Building with the attached random configuration file,

ERROR: "__divdi3" [net/ceph/libceph.ko] undefined!
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.0.0 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_KDBUS=m
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
# CONFIG_AUDITSYSCALL is not set

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_KTHREAD_PRIO=0
# CONFIG_RCU_NOCB_CPU is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
# CONFIG_CGROUP_FREEZER is not set
CONFIG_CGROUP_DEVICE=y
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
# CONFIG_MEMCG_KMEM is not set
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_RD_GZIP is not set
CONFIG_RD_BZIP2=y
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
CONFIG_RD_LZ4=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
# CONFIG_LTO_MENU is not set
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
CONFIG_EXPERT=y
# CONFIG_MULTIUSER is not set
CONFIG_SGETMASK_SYSCALL=y
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
# CONFIG_PCSPKR_PLATFORM is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
# CONFIG_EPOLL is not set
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_PCI_QUIRKS=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CON

Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-21 Thread Ian Jackson
Prashant Sreedharan writes ("Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]"):
> On Fri, 2015-04-17 at 15:12 -0400, David Miller wrote:
> > From: Konrad Rzeszutek Wilk 
> > Date: Fri, 17 Apr 2015 15:04:48 -0400
> > > A huge amount of NIC drivers use the DMA API, however if
> > > compiled under 32-bit a very important part of the DMA API can
> > > be omitted leading to the drivers not working at all
> > > (especially if used with 'swiotlb=force iommu=soft').
...
> > > As such enable this even when using 32-bit kernels.
> > > 
> > > Reported-by: Ian Jackson 
> > > Signed-off-by: Konrad Rzeszutek Wilk 

Thanks.

Tested-by: Ian Jackson 

I'd appreciate it if this patch could make its way into the 3.14.y
stable branch, as well as all the other places it's applicable of
course.  I've included another copy below for convenience, with
acks etc. from this email thread folded in.

I have tested 3.14.34 plus just this patch, with my usual kernel
configuration input and the affected machine, and this fixes the
problem completely AFAICT.

I have tested both baremetal 32-bit with `iommu=soft swiotlb=force'
(which previously corrupted all received network packets) and 32-bit
Linux under 64-bit Xen without any special options (which previously
gave 25-30% packet loss).

Thanks,
Ian.

From 9e417af099e3cee2b219ab28ffc1e96b0564b213 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Fri, 17 Apr 2015 14:55:47 -0400
Subject: [PATCH] config: Enable NEED_DMA_MAP_STATE when SWIOTLB is selected

A large number of NIC drivers use the DMA API; however, if compiled
under 32-bit, a very important part of the DMA API can be omitted, leading
to the drivers not working at all (especially if used with
'swiotlb=force iommu=soft').

As Prashant Sreedharan explains it: "the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the dma
"mapping" and dma_unmap_addr() to get the "mapping" value. On most of
the platforms this is a no-op, but ... with "iommu=soft and
swiotlb=force" this house keeping is required, ... otherwise
we pass 0 while calling pci_unmap_/pci_dma_sync_ instead of the
DMA address."

As such enable this even when using 32-bit kernels.

Reported-by: Ian Jackson 
Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: David S. Miller 
Acked-by: Prashant Sreedharan 
Tested-by: Ian Jackson 
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..570c71d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -177,7 +177,7 @@ config SBUS
 
 config NEED_DMA_MAP_STATE
def_bool y
-   depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG
+   depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG || SWIOTLB
 
 config NEED_SG_DMA_LENGTH
def_bool y
-- 
2.1.0
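The no-op that Prashant describes can be reproduced in userspace. This sketch models the kernel's DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr() and dma_unmap_addr_set() helpers. MODEL_NEED_DMA_MAP_STATE stands in for CONFIG_NEED_DMA_MAP_STATE. dma_addr_t and struct ring_info are local stand-ins, not the tg3 definitions.

```c
#include <stdint.h>

typedef uint64_t dma_addr_t; /* userspace stand-in for the kernel type */

/* MODEL_NEED_DMA_MAP_STATE plays the role of CONFIG_NEED_DMA_MAP_STATE. */
#ifdef MODEL_NEED_DMA_MAP_STATE
#define DEFINE_DMA_UNMAP_ADDR(NAME)        dma_addr_t NAME
#define dma_unmap_addr(PTR, NAME)          ((PTR)->NAME)
#define dma_unmap_addr_set(PTR, NAME, VAL) (((PTR)->NAME) = (VAL))
#else
#define DEFINE_DMA_UNMAP_ADDR(NAME)
#define dma_unmap_addr(PTR, NAME)          (0)
#define dma_unmap_addr_set(PTR, NAME, VAL) do { } while (0)
#endif

/* Hypothetical per-descriptor bookkeeping in the style of a NIC driver. */
struct ring_info {
	void *data;
	DEFINE_DMA_UNMAP_ADDR(mapping);
};

/* Record a mapping, then fetch the address that would be passed to unmap. */
static dma_addr_t remember_and_fetch(struct ring_info *ri, dma_addr_t addr)
{
	dma_unmap_addr_set(ri, mapping, addr);
	return dma_unmap_addr(ri, mapping);
}
```

Built as-is, the fetched address is 0, which is the value the quote says ends up in the unmap/sync calls. Built with -DMODEL_NEED_DMA_MAP_STATE, the same call returns the stored 0x1234.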


[PATCH] tcp: set SOCK_NOSPACE under memory pressure

2015-04-21 Thread Jason Baron
Under tcp memory pressure, calling epoll_wait() in edge triggered
mode after -EAGAIN can result in an indefinite hang in epoll_wait(),
even when there is sufficient memory available to continue making
progress. The problem is that __sk_mem_schedule() can return 0
under memory pressure without having set the SOCK_NOSPACE flag. Thus,
even though all the outstanding packets have been acked, we never
get the EPOLLOUT that we are expecting from epoll_wait().

This issue is currently limited to epoll when used in edge trigger
mode, since tcp_poll() does in fact currently set SOCK_NOSPACE.
This is sufficient for poll()/select() and epoll() in level trigger
mode. However, in edge trigger mode, epoll() relies on the write
path to set SOCK_NOSPACE. So I view this patch as bringing us into
sync with poll()/select() and epoll() level trigger behavior.

I can reproduce this issue using SO_SNDBUF, since
__sk_mem_schedule() returns 0 (failure) more readily when
SO_SNDBUF is set:

1) create socket and set SO_SNDBUF to N
2) add socket as edge trigger
3) write to socket and block in epoll on -EAGAIN
4) cause tcp mem pressure via: echo "" > net.ipv4.tcp_mem

The fix here is simply to set SOCK_NOSPACE in sk_stream_wait_memory()
when the socket is non-blocking.

Note that we could still hang if sk->sk_wmem_queued is 0 when we get
the -EAGAIN. In this case the SOCK_NOSPACE bit will not help, since we
are waiting for an event that will never happen. I believe
that this case is hard to hit (and I did not hit it in my testing),
because over the 'soft' limit we continue to guarantee a minimum
write buffer size. Perhaps we could return -ENOSPC in this
case. Note that this case is not specific to epoll ET, but
would affect all blocking and non-blocking sockets as well,
so I think it's OK to treat it as a separate case.

Signed-off-by: Jason Baron 
---
 net/core/stream.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/stream.c b/net/core/stream.c
index 301c05f..d70f77a 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -119,6 +119,7 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
int err = 0;
long vm_wait = 0;
long current_timeo = *timeo_p;
+   bool noblock = (*timeo_p ? false : true);
DEFINE_WAIT(wait);
 
if (sk_stream_memory_free(sk))
@@ -131,8 +132,11 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 
if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
goto do_error;
-   if (!*timeo_p)
+   if (!*timeo_p) {
+   if (noblock)
+   set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
goto do_nonblock;
+   }
if (signal_pending(current))
goto do_interrupted;
clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
-- 
1.8.2.rc2
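Steps 1 to 3 of the reproducer can be sketched in userspace C as below. This is an illustration, not the author's test program. It assumes a loopback TCP connection whose accepting side never reads, a caller-chosen SO_SNDBUF, and hypothetical helper names. It leaves out step 4, because changing net.ipv4.tcp_mem requires root.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct et_repro {
	int client;     /* non-blocking writer, registered EPOLLOUT|EPOLLET */
	int server;     /* accepted peer; never reads, so data piles up */
	int epfd;
	size_t written; /* bytes accepted by write() before EAGAIN */
};

static void et_repro_close(struct et_repro *r)
{
	if (r->client >= 0) close(r->client);
	if (r->server >= 0) close(r->server);
	if (r->epfd >= 0) close(r->epfd);
	r->client = r->server = r->epfd = -1;
}

/* Steps 1-3 over loopback TCP. Returns 0 on success, -1 on error. */
static int et_repro_setup(struct et_repro *r, int sndbuf)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	struct epoll_event ev;
	char buf[4096];
	int lfd;

	memset(r, 0, sizeof(*r));
	r->client = r->server = r->epfd = -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks */

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto fail;

	/* 1) create socket and set SO_SNDBUF to N */
	r->client = socket(AF_INET, SOCK_STREAM, 0);
	if (r->client < 0 ||
	    setsockopt(r->client, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0 ||
	    connect(r->client, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;
	r->server = accept(lfd, NULL, NULL);
	if (r->server < 0 ||
	    fcntl(r->client, F_SETFL, fcntl(r->client, F_GETFL) | O_NONBLOCK) < 0)
		goto fail;

	/* 2) add socket as edge trigger */
	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLOUT | EPOLLET;
	ev.data.fd = r->client;
	r->epfd = epoll_create1(0);
	if (r->epfd < 0 || epoll_ctl(r->epfd, EPOLL_CTL_ADD, r->client, &ev) < 0)
		goto fail;

	/* 3) write to the socket until it reports EAGAIN */
	memset(buf, 'x', sizeof(buf));
	for (;;) {
		ssize_t n = write(r->client, buf, sizeof(buf));

		if (n > 0)
			r->written += (size_t)n;
		else if (n < 0 && errno == EINTR)
			continue;
		else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			break;
		else
			goto fail;
	}
	close(lfd);
	return 0;
fail:
	close(lfd);
	et_repro_close(r);
	return -1;
}
```

Once step 4 has been applied by hand, an epoll_wait() on r.epfd is where the missing EPOLLOUT described in the commit message would be observed.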


Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

2015-04-21 Thread Arnd Bergmann
On Tuesday 21 April 2015 16:14:26 Thomas Gleixner wrote:
> > Note the use of a separate __kernel_itimerspec64 for the user interface
> > here, which I think will be needed to hide the differences between the
> > normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit
> > platforms that will be defined differently (using 'long long').
> 
> Confused.
> 
> timespec64 / itimerspec64 should be the same independent of 64bit and
> 32bit. So why do we need another variant ?

There are multiple reasons:

* On 64-bit systems, timespec64 would always be defined in the same way
  as struct timespec { __kernel_time_t tv_sec; long tv_nsec; }, with
  __kernel_time_t being 'long'. On 32-bit, we probably need to make both
  members 'long long' for the user space side, in order to share the
  syscall implementation with the kernel side, but we may also want to
  keep the internal timespec64 using a 'long' for tv_nsec, as we do
  today. This means that both the binary layout (padding or no padding)
  and the basic types (long or long long) are different between 32-bit
  and 64-bit, and between kernel and user space
* We should not put 'struct timespec64' into the user space namespace,
  as applications might already use that identifier. This is similar
  to the __u32/u32 or __kernel_time_t/time_t tuple of types for interface
  and in-kernel uses. This is particularly important when embedding a
  timespec in another data structure.
* My plan is to use a temporary hack where I actually define
  __kernel_timespec64 to look like the 32-bit version of timespec,
  as an intermediate step when converting all 32-bit architectures over
  to use the compat_*() syscalls in place of the existing ones, so
  I can change over the normal syscalls to use __kernel_timespec64
  without having to change all architectures at once, or having to
  modify each syscall multiple times.

> > I would also prefer not too many people to work on the syscalls, and
> > would rather have Baolin not touch any of the syscall prototypes for
> > the moment.
> 
> I did not ask him to change any of the syscall prototypes. I just
> wanted him to split out the guts of the syscall into a separate static
> function to avoid all that code churn.

Ok, I wasn't sure about that part, thanks for clarifying.

Arnd

VM guest bridging issue with i40e

2015-04-21 Thread Stefan Assmann
There's some trouble with a KVM guest that is bridged to the host's
i40e (X710) NIC. The guest no longer receives DHCP replies, and
probably other traffic as well.
I bisected this back to the following commit.
commit 79c21a827e98081895a8b9650f1b0a8b37b16125
Author: Anjali Singhai Jain 
Date:   Thu Nov 13 03:06:14 2014 +

i40e: Add new update VSI flow to accommodate FW fix with VSI Loopback mode

All VSIs on a VEB should either have loopback enabled or disabled, a
mixed mode is not supported for a VEB. Since our driver supports multiple
VSIs per PF that need to talk to each other make sure to enable Loopback
for the PF and FDIR VSI as well.

Also, we now have to explicitly enable Loopback mode otherwise we fail
VSI creation for VMDq and VF VSIs.

Reproducer:
Adding bridge on the host
# brctl addbr br0
# brctl addif br0 eth6
virt XML for bridging

  

Now request IP via DHCP from the kvm guest.

The problem appears as soon as loopback gets enabled on the main VSI.
From the XL710 spec:
7.4.9.5.4.1.1 Add VSI Settings Recommendations
Table 7-86.  Add VSI Recommended settings
Allow Loopback 0 for PF

Is it correct that the main VSI represents the PF? If so, shouldn't
loopback stay disabled for the main VSI?

  Stefan

Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM

2015-04-21 Thread Haggai Eran
On 21/04/2015 17:11, Steve Wise wrote:
> On 4/21/2015 1:36 AM, Haggai Eran wrote:
>> On 20/04/2015 17:53, Steve Wise wrote:
>>> Hey Haggai,
>>>
>>> Did you check for changes needed in drivers/infiniband/core/iwcm.c?
>> We focused on namespace support for InfiniBand alone in this series. We
>> didn't handle iWARP, nor did we implement support for RoCE or other
>> transports.
>>
>>> I notice that it uses init_net here:
>>>
>>> static int __init iw_cm_init(void)
>>> {
>>>  iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
>>>  if (!iwcm_wq)
>>>  return -ENOMEM;
>>>
>>>  iwcm_ctl_table_hdr = register_net_sysctl(&init_net,
>>> "net/iw_cm",
>>>   iwcm_ctl_table);
>>>  if (!iwcm_ctl_table_hdr) {
>>>  pr_err("iw_cm: couldn't register sysctl paths\n");
>>>  destroy_workqueue(iwcm_wq);
>>>  return -ENOMEM;
>>>  }
>>>
>>>  return 0;
>>> }
>>>
>> I see the only thing in the iWARP sysctl registered here is the default
>> backlog. If you want to control this parameter per namespace, we could
>> store it per network namespace, and add a namespace parameter to
>> iw_cm_listen. I'm not sure how important this is though.
> 
> I don't think it needs to be per namespace, as long as it still applies
> across all name spaces.

It will, but it will currently only be visible and controllable through
init's namespace.



Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

2015-04-21 Thread Thomas Gleixner
On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> > COMPAT_SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
> >struct compat_itimerspec __user *, setting)
> 
> As a side note, I want to kill off the get_fs()/set_fs() calls in
> the process. These always make me dizzy when I try to work out whether
> there is a potential security hole (there is not in this case), and
> they can be slow on some architectures.

Yeah. I have to take a deep breath every time I look at those :)
 
> My preferred solution is one where we end up with the same syscalls
> for both 32-bit and 64-bit, and basically use the
> compat_sys_timer_gettime() implementation (or a simplified version)
> for the existing , something like this:

No objections from my side. I was not looking into the syscall magic
yet. I just wanted to avoid the code churn and have the guts of the
syscalls factored out for simple reuse.


 
> Note the use of a separate __kernel_itimerspec64 for the user interface
> here, which I think will be needed to hide the differences between the
> normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit
> platforms that will be defined differently (using 'long long').

Confused.

timespec64 / itimerspec64 should be the same independent of 64bit and
32bit. So why do we need another variant ?

> I would also prefer not too many people to work on the syscalls, and
> would rather have Baolin not touch any of the syscall prototypes for
> the moment.

I did not ask him to change any of the syscall prototypes. I just
wanted him to split out the guts of the syscall into a separate static
function to avoid all that code churn.
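tglx's suggestion, moving the body of the syscall into a separate static function, can be sketched as a userspace model. One helper works on 64-bit time, and the native and compat entry points only convert its result. All names, the struct layouts, and the fixed timer value are hypothetical. A real helper would look up the timer by its id.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical 64-bit internal representation. */
struct ts64  { int64_t tv_sec; long tv_nsec; };
struct its64 { struct ts64 it_interval, it_value; };

/* Hypothetical 32-bit compat representation. */
struct compat_ts  { int32_t tv_sec; int32_t tv_nsec; };
struct compat_its { struct compat_ts it_interval, it_value; };

/* The shared "guts": validate the id and fill in 64-bit values.
 * A fixed 1 s interval with 0.5 s remaining stands in for a real timer. */
static int timer_gettime_common(int timer_id, struct its64 *out)
{
	if (timer_id < 0)
		return -EINVAL;
	out->it_interval.tv_sec = 1;
	out->it_interval.tv_nsec = 0;
	out->it_value.tv_sec = 0;
	out->it_value.tv_nsec = 500000000;
	return 0;
}

/* Native entry point: no conversion needed. */
static int native_timer_gettime(int timer_id, struct its64 *out)
{
	return timer_gettime_common(timer_id, out);
}

/* Narrowing step used only by the compat path. */
static void its64_to_compat(const struct its64 *in, struct compat_its *out)
{
	out->it_interval.tv_sec  = (int32_t)in->it_interval.tv_sec;
	out->it_interval.tv_nsec = (int32_t)in->it_interval.tv_nsec;
	out->it_value.tv_sec     = (int32_t)in->it_value.tv_sec;
	out->it_value.tv_nsec    = (int32_t)in->it_value.tv_nsec;
}

/* Compat entry point: same helper, then convert the result. */
static int compat_timer_gettime(int timer_id, struct compat_its *out)
{
	struct its64 its;
	int ret = timer_gettime_common(timer_id, &its);

	if (ret)
		return ret;
	its64_to_compat(&its, out);
	return 0;
}
```

Both entry points share the validation and lookup, so only the conversion differs between them. The (int32_t) narrowing of tv_sec in the compat path is where the y2038 limit shows up.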

Thanks,

tglx

Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM

2015-04-21 Thread Steve Wise

On 4/21/2015 1:36 AM, Haggai Eran wrote:

On 20/04/2015 17:53, Steve Wise wrote:

Hey Haggai,

Did you check for changes needed in drivers/infiniband/core/iwcm.c?

We focused on namespace support for InfiniBand alone in this series. We
didn't handle iWARP, nor did we implement support for RoCE or other
transports.


I notice that it uses init_net here:

static int __init iw_cm_init(void)
{
 iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
 if (!iwcm_wq)
 return -ENOMEM;

 iwcm_ctl_table_hdr = register_net_sysctl(&init_net, "net/iw_cm",
  iwcm_ctl_table);
 if (!iwcm_ctl_table_hdr) {
 pr_err("iw_cm: couldn't register sysctl paths\n");
 destroy_workqueue(iwcm_wq);
 return -ENOMEM;
 }

 return 0;
}


I see the only thing in the iWARP sysctl registered here is the default
backlog. If you want to control this parameter per namespace, we could
store it per network namespace, and add a namespace parameter to
iw_cm_listen. I'm not sure how important this is though.


I don't think it needs to be per namespace, as long as it still applies 
across all namespaces.


Steve.


Re: [PATCH net-next V2 12/12] net/mlx5: Ethernet driver

2015-04-21 Thread Or Gerlitz

On 4/14/2015 9:51 PM, David Miller wrote:

From: Amir Vadai 
Date: Tue, 14 Apr 2015 11:20:35 +0300


Signed-off-by: Amir Vadai 

What does "Ethernet driver" mean?

Are you adding a new ethernet driver?  If so, what is it for and how
does it interact with the existing mlx5 driver?

It looks to me like you are adding a lot of code and objects to the
existing mlx5 module.  An incredible amount, in fact.  This seems very
suboptimal especially for users of the existing mlx5 chips.


Hi Dave,

To clarify things a bit, the mlx5 driver serves two NIC families: 
ConnectIB (device IDs 0x1011/12) and ConnectX4 (device IDs 0x1013-0x1016).


ConnectIB HW is IB only, ConnectX4 is IB and Ethernet.

This submission enhances the driver to support Ethernet for the 
ConnectX4 family.


In a similar manner to the mlx4 driver stacking, mlx5 has an IB driver 
and a core driver. Per your comments, in V3 the Ethernet functionality 
will be added under a dedicated config option, so that if someone 
wants only the IB driver to be functional, the Ethernet netdev code and 
related parts don't get built.


You haven't discussed this: what design decisions made you decide in the end to 
do it this way, etc.

You absolutely have to say something other than "Ethernet driver"
in this commit message, I expect several paragraphs of details
and the hows and whys of the change as it is a non-trivial amount
of code being added here.


Understood, will add more text to the cover letter, change-logs, etc.



I still consider this patch series not ready yet, and the merge
window is open thus closing the net-next tree.

You will therefore need to wait until the net-next tree opens
again before submitting this series again.


Sure, thanks for the review feedback provided so far.

BTW, sorry for the late reply; I only realized today that your comments 
had gone unanswered.


Or.


Re: [RFC PATCH 3/3] driver/net/usb: Add support for DSA to ax88772b

2015-04-21 Thread Bjørn Mork
Jan Kaisrlik  writes:

> From: Jan Kaisrlik 
>
> This patch adds the ability to use the RMII interface of the ax88772b
> instead of the Ethernet PHY which is currently used.
>
> Binding DSA to a USB device is done via sysfs.
>
> ---
>  drivers/net/usb/asix.h |   7 ++
>  drivers/net/usb/asix_devices.c | 258 -
>  2 files changed, 261 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h
> index 5d049d0..6b1a5f5 100644
> --- a/drivers/net/usb/asix.h
> +++ b/drivers/net/usb/asix.h
> @@ -174,6 +174,13 @@ struct asix_rx_fixup_info {
>  
>  struct asix_common_private {
>   struct asix_rx_fixup_info rx_fixup_info;
> +#ifdef CONFIG_NET_DSA
> + struct kobject kobj;
> + struct mii_bus *mdio;
> + int use_embphy;
> + bool dsa_up;
> + struct usbnet *dev;
> +#endif
>  };
>  
>  extern const struct driver_info ax88172a_info;
> diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
> index bf49792..57b3a34 100644
> --- a/drivers/net/usb/asix_devices.c
> +++ b/drivers/net/usb/asix_devices.c
> @@ -35,6 +35,88 @@
>  
>  #define  PHY_MODE_RTL8211CL  0x000C
>  
> +#ifdef CONFIG_NET_DSA
> +
> +#define AX88772_RMII 0x02
> +#define AX88772_MAX_PORTS 0x20
> +#define MV88e6065_ID  0x0c89
> +
> +#include 
> +#include 
> +
> +#define to_asix_obj(x) container_of(x, struct asix_common_private, kobj)
> +#define to_asix_attr(x) container_of(x, struct asix_attribute, attr)
> +
> +static int mii_asix_read(struct mii_bus *bus, int phy_id, int regnum)
> +{
> + return asix_mdio_read(((struct usbnet *)bus->priv)->net, phy_id, regnum);
> +}
> +
> +static int mii_asix_write(struct mii_bus *bus, int phy_id, int regnum, u16 val)
> +{
> + asix_mdio_write(((struct usbnet *)bus->priv)->net, phy_id, regnum, val);
> + return 0;
> +}
> +
> +static int ax88772_init_mdio(struct usbnet *dev){
> + struct asix_common_private *priv = dev->driver_priv;
> + int ret, i;
> +
> + priv->mdio = mdiobus_alloc();
> + if (!priv->mdio) {
> + netdev_err(dev->net, "Could not allocate mdio bus\n");
> + return -ENOMEM;
> + }
> +
> + priv->mdio->priv = (void *)dev;
> + priv->mdio->read = &mii_asix_read;
> + priv->mdio->write = &mii_asix_write;
> + priv->mdio->name = "ax88772 mdio bus";
> + priv->mdio->parent = &dev->udev->dev;
> +
> + /* mii bus name is usb-- */
> + snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",dev->udev->bus->busnum, dev->udev->devnum);
> +
> + priv->mdio->irq = kzalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
> + if (!priv->mdio->irq) {
> + ret = -ENOMEM;
> + goto free;
> + }
> +
> + for (i = 0; i < PHY_MAX_ADDR; i++)
> + priv->mdio->irq[i] = PHY_POLL;
> +
> + ret = mdiobus_register(priv->mdio);
> + if (ret) {
> + netdev_err(dev->net, "Could not register MDIO bus\n");
> + goto free_irq;
> + }
> +
> + netdev_info(dev->net, "registered mdio bus %s\n", priv->mdio->id);
> + return 0;
> +
> +free_irq:
> + kfree(priv->mdio->irq);
> +free:
> + mdiobus_free(priv->mdio);
> + return ret;
> +}

There is already identical code in drivers/net/usb/ax88172a.c.  Any
chance these ASIX devices can share some code, or does it all have to be
duplicated for each new chip?
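One small piece of that duplicated code is the bus ID formatting. A sketch of a helper both drivers could call, assuming a hypothetical name asix_format_mdio_id() and a stand-in buffer size BUS_ID_LEN (the drivers size the buffer with MII_BUS_ID_SIZE); unlike the open-coded snprintf() it also reports truncation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in size; the drivers use MII_BUS_ID_SIZE from the PHY headers. */
#define BUS_ID_LEN 17

/* Hypothetical shared helper building the "usb-%03d:%03d" bus ID. */
static int asix_format_mdio_id(char *id, size_t len, int busnum, int devnum)
{
	int n = snprintf(id, len, "usb-%03d:%03d", busnum, devnum);

	/* Report truncation instead of silently cutting the name short. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The rest of the MDIO setup (allocation, read/write callbacks, registration) could be shared the same way, with only the chip-specific parts left in each driver.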


> +//   dsa_free(); TODO

Probably not like that...


> +static ssize_t usb_dsa_store(struct asix_common_private *priv,
> + struct asix_attribute *attr, const char *buf, size_t count)
> +{
> + ax88772_set_bind_dsa(priv);
> + return count;
> +}
> +
> +static ssize_t usb_dsa_show(struct asix_common_private *priv,
> + struct asix_attribute *attr, char *buf)
> +{
> + return scnprintf(buf, PAGE_SIZE, "usb_dsa_binding.\n");
> +}

I'm not sure I understand this at all.  What kind of userspace API are
you trying to provide here? Maybe you could document these attributes in
Documentation/ABI/testing/ to make it clearer?

> +static void driver_release(struct kobject *kobj)
> +{
> + pr_debug("driver: '%s': %s\n", kobject_name(kobj), __func__);
> +//   kfree(drv_priv); TODO free
> +}

Ah, I guess you might have missed this section of
Documentation/kobject.txt ?:

  One important point cannot be overstated: every kobject must have a
  release() method, and the kobject must persist (in a consistent state)
  until that method is called. If these constraints are not met, the
  code is flawed.  Note that the kernel will warn you if you forget to
  provide a release() method.  Do not try to get rid of this warning by
  providing an "empty" release function; you will be mocked mercilessly
  by the kobject maintainer if you attempt this.

Better CC Greg KH on your next attempt to make sure you get the mocking
you deserve :-)
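To make the rule concrete, a userspace model (hypothetical names; this is not the kobject API) of a structure with an embedded, refcounted member whose release() frees the containing structure via container_of(). Freeing happens only in release(), after the last reference is dropped, rather than somewhere else while a reference may still be held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Userspace model of a refcounted, embedded object; not the kobject API. */
struct obj_ref {
	int refcount;
	void (*release)(struct obj_ref *r);
};

struct priv_model {
	int state;
	struct obj_ref ref; /* embedded: its lifetime decides the struct's */
};

static int releases; /* counts how many times release() has run */

/* release() frees the containing structure; it must not be empty. */
static void priv_release(struct obj_ref *r)
{
	struct priv_model *p = container_of(r, struct priv_model, ref);

	releases++;
	free(p);
}

static void ref_get(struct obj_ref *r)
{
	r->refcount++;
}

static void ref_put(struct obj_ref *r)
{
	assert(r->refcount > 0);
	if (--r->refcount == 0)
		r->release(r);
}

static struct priv_model *priv_create(void)
{
	struct priv_model *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->ref.refcount = 1;
	p->ref.release = priv_release;
	return p;
}
```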


> +static ssize_t usb_dsa_attr_show(struct kobject *kobj,
> + struct attribute *attr,
> + char *buf)
> +{
>

[PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails

2015-04-21 Thread Thomas Graf
When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
we can't allocate the necessary memory in the current context but the
limits as set by the user would still allow the table to grow.

Thus attempt an async resize in the background where we can allocate
using GFP_KERNEL which is more likely to succeed. The insertion itself
will still fail to indicate pressure.

This fixes a bug where the table would never continue growing once the
utilization is above 100%.

Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf 
---
 include/linux/rhashtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e23d242..7040b5c 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -593,6 +593,8 @@ slow_path:
spin_unlock_bh(lock);
err = rhashtable_insert_rehash(ht);
rcu_read_unlock();
+   if (err == -ENOMEM)
+   schedule_work(&ht->run_work);
if (err)
return err;
 
-- 
2.3.5
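A userspace model of the fallback this patch adds (the struct and function names are hypothetical; work_scheduled stands in for schedule_work(&ht->run_work)): the insertion still returns -ENOMEM, but growth is handed to a worker that can allocate with GFP_KERNEL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the table state relevant to patch 1/2. */
struct table_model {
	bool atomic_alloc_fails; /* simulates a GFP_ATOMIC allocation failure */
	bool work_scheduled;     /* stands in for schedule_work(&ht->run_work) */
};

/* Stand-in for rhashtable_insert_rehash(). */
static int insert_rehash_model(struct table_model *ht)
{
	return ht->atomic_alloc_fails ? -ENOMEM : 0;
}

/* Slow path after the patch: on -ENOMEM, defer growth to the worker. */
static int insert_slow_path(struct table_model *ht)
{
	int err = insert_rehash_model(ht);

	if (err == -ENOMEM)
		ht->work_scheduled = true;
	return err; /* non-zero: the insertion still fails, signalling pressure */
}
```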



[PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further

2015-04-21 Thread Thomas Graf
The current code only stops inserting rehashes into the
chain when no resizes are currently scheduled. As long as resizes
are scheduled and while inserting above the utilization watermark,
more and more rehashes will be scheduled.

This leads to a perfect DoS storm with thousands of rehashes
scheduled, which leads to thousands of spinlocks being taken
sequentially.

Instead, only allow either a series of resizes or a single rehash.
Drop any further rehashes and return -EBUSY.

Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf 
---
 lib/rhashtable.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4898442..cb819ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -405,8 +405,8 @@ int rhashtable_insert_rehash(struct rhashtable *ht)
 
if (rht_grow_above_75(ht, tbl))
size *= 2;
-   /* More than two rehashes (not resizes) detected. */
-   else if (WARN_ON(old_tbl != tbl && old_tbl->size == size))
+   /* Do not schedule more than one rehash */
+   else if (old_tbl != tbl)
return -EBUSY;
 
new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
-- 
2.3.5
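A sketch of the resulting size decision as a pure function, assuming grow_above_75 stands in for rht_grow_above_75() and rehash_pending for the old_tbl != tbl test; the name next_table_size() is hypothetical. Growing resizes can still be chained, but a same-size rehash is refused with -EBUSY once any future table exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the decision in rhashtable_insert_rehash() after
 * patch 2/2. grow_above_75 stands in for rht_grow_above_75(ht, tbl) and
 * rehash_pending for "old_tbl != tbl" (a future table already exists).
 */
static int next_table_size(size_t cur_size, bool grow_above_75,
			   bool rehash_pending, size_t *new_size)
{
	size_t size = cur_size;

	if (grow_above_75)
		size *= 2;      /* growing resizes may still be chained */
	else if (rehash_pending)
		return -EBUSY;  /* never queue a second same-size rehash */

	*new_size = size;
	return 0;
}
```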



[PATCH net 0/2] rhashtable rehashing fixes

2015-04-21 Thread Thomas Graf
Some rhashtable rehashing bugs found while testing with the
next rhashtable self-test queued up for the next devel cycle:

https://github.com/tgraf/net-next/commits/rht

Thomas Graf (2):
  rhashtable: Schedule async resize when sync realloc fails
  rhashtable: Do not schedule more than one rehash if we can't grow further

 include/linux/rhashtable.h | 2 ++
 lib/rhashtable.c   | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
2.3.5



Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.

2015-04-21 Thread Andrew Lunn
Hi Jan

Interesting work, but I think the architecture is wrong.

DSA needs an Ethernet device, an MDIO bus, and information about ports
on the switch. The MDIO bus and the Ethernet need no knowledge of
DSA. So putting your DSA configuration code in the MDIO driver is
wrong.

The problem you have is where to put the configuration data. There
are currently two choices: a platform driver, examples of which you
can find in arch/arm/mach-orion5x, or device tree. Otherwise
you need a new method.

Part of your problem is hotplug, since you have a USB device, and no
stable names for the ethernet device nor the MDIO device. Your
hardware is not fixed, you could hang any switch off the USB
device. So it does sound like you need a user space API.

I would however say that sysfs is the wrong API. The Linux network
stack uses netlink for most configuration activities. So I would
suggest adding a netlink binding to DSA, and placing the code in
net/dsa/, not within an MDIO driver.

Device tree overlays might be a solution, if you can dynamically load
a blob as part of a USB hotplug event. What makes it easier is that
both the Ethernet device and MDIO bus are on the same USB device, so
all your phandles are within the blob.

What is your long term goal? Is this just a development tool? Are you
thinking of making a product which integrates both the switch and the
USB ethernet onto a USB dongle? This could also change the
architecture, since it makes the configuration more fixed.

Andrew


[PATCH net] net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP

2015-04-21 Thread Or Gerlitz
From: Eran Ben Elisha 

Currently we parse max_msg_sz from the wrong offset in QUERY_DEV_CAP,
fix to use the right offset.

Fixes: 0b131561a7d6 ('net/mlx4_en: Add Flow control statistics [..]')
Signed-off-by: Eran Ben Elisha 
Signed-off-by: Or Gerlitz 
---

Hi Dave, 

Sending this fix early because this innocent-looking bug breaks RoCE applications
on SRIOV VFs, since the max message size there drops to two bytes. No need for
-stable here, as the bug was introduced in this merge window.

Or.

 drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index b9881fc..a407981 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -781,10 +781,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET);
dev_cap->num_ports = field & 0xf;
MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_MSG_SZ_OFFSET);
+   dev_cap->max_msg_sz = 1 << (field & 0x1f);
MLX4_GET(field, outbox, QUERY_DEV_CAP_PORT_FLOWSTATS_COUNTERS_OFFSET);
if (field & 0x10)
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_FLOWSTATS_EN;
-   dev_cap->max_msg_sz = 1 << (field & 0x1f);
MLX4_GET(field, outbox, QUERY_DEV_CAP_FLOW_STEERING_RANGE_EN_OFFSET);
if (field & 0x80)
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_FS_EN;
-- 
1.7.1
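A simplified model of why the ordering matters (hypothetical names and offsets, with plain byte reads in place of MLX4_GET()): field is reused for several capability bytes, so max_msg_sz, which is log2-encoded in the low five bits, has to be decoded before field is overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the parsed capabilities. */
struct caps_model {
	uint32_t max_msg_sz;
	int flowstats_en;
};

/* max_msg_sz is log2-encoded in the low five bits of its capability byte. */
static uint32_t decode_max_msg_sz(uint8_t field)
{
	return 1u << (field & 0x1f);
}

/* Offsets are hypothetical; plain byte reads stand in for MLX4_GET(). */
static void parse_caps(const uint8_t *outbox, size_t msg_sz_off,
		       size_t flowstats_off, struct caps_model *c)
{
	uint8_t field;

	field = outbox[msg_sz_off];
	c->max_msg_sz = decode_max_msg_sz(field); /* decode before reuse */

	field = outbox[flowstats_off];            /* field is overwritten here */
	c->flowstats_en = !!(field & 0x10);
}
```

In this model, the pre-fix order would have decoded the flow-statistics byte instead, giving 1u << (0x10 & 0x1f) for the same input.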



Re: [PATCH v2 07/11] IB/cm: Add network namespace support

2015-04-21 Thread Haggai Eran
On 20/04/2015 20:06, Jason Gunthorpe wrote:
> On Mon, Apr 20, 2015 at 12:03:38PM +0300, Haggai Eran wrote:
>> From: Guy Shapiro 
>>
>> Add namespace support to the IB-CM layer.
> 
>> - Each CM-ID now has a network namespace it is associated with, assigned at
>>   creation. This namespace is used as needed during subsequent action on the
>>   CM-ID or related objects.
> 
> There is really something weird about this layering. At the CM layer
> there should be no concept of an IP address, it only deals with GIDs.

Using the GID alone is not enough to distinguish between namespaces,
because you can have multiple IPoIB interfaces, all using the GID (and
possibly the same P_Key), and each belonging to a different namespace.

> So how can a CM object have a network namespace associated with it?

The listener rbtree's key is currently the service ID, for instance. Now
with namespaces, you can have multiple listeners listening on the same
service ID, so we need to use (service ID, namespace) as the key.
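A sketch of such a composite key comparator, with hypothetical types (struct listen_key, and struct net_ns standing in for struct net): service ID first, then the namespace pointer. This gives the total order an rbtree needs while letting two namespaces listen on the same service ID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net; only its address matters here. */
struct net_ns {
	int id;
};

/* Hypothetical listener key: the service ID alone is no longer unique. */
struct listen_key {
	uint64_t service_id;
	const struct net_ns *net;
};

/* Total order for an rbtree: compare service ID first, then namespace. */
static int listen_key_cmp(const struct listen_key *a,
			  const struct listen_key *b)
{
	if (a->service_id != b->service_id)
		return a->service_id < b->service_id ? -1 : 1;
	if (a->net != b->net)
		return (uintptr_t)a->net < (uintptr_t)b->net ? -1 : 1;
	return 0;
}
```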

> 
>>  {
>>  av->port = port;
>>  av->pkey_index = wc->pkey_index;
>>  ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
>> -   grh, &av->ah_attr, &init_net);
>> +   grh, &av->ah_attr, net);
> 
> There is something deeply wrong with adding network namespace
> arguments to verbs.
> 
> For rocee the gid index clearly specifies the network namespace
> to use, so much of this should go away and have rocee get the
> namespace from the gid index.
> 
> Ie in ib_init_ah_from_wc we have the ib_wc which contains the sgid
> index.

I don't see it there. The code seems to fetch the GID from the GRH.
Because the IP address in the source GID can be the same for different
namespaces, this is not enough to pick the right namespace.

Regards,
Haggai


[RFC PATCH 1/3] net/dsa: Refactor dsa_probe()

2015-04-21 Thread Jan Kaisrlik
From: Jan Kaisrlik 

This patch refactors dsa_probe() to simplify the code in patch 2/3.

---
 net/dsa/dsa.c | 82 ++-
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 3731714..e2c0703 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -699,12 +699,56 @@ static inline void dsa_of_remove(struct platform_device *pdev)
 }
 #endif
 
-static int dsa_probe(struct platform_device *pdev)
+static int dsa_probe_common(struct dsa_switch_tree *dst, struct device *parent)
 {
+   struct dsa_platform_data *pd = dst->pd;
+   struct net_device *dev = dst->master_netdev;
+   int i;
+
+   dst->cpu_switch = -1;
+   dst->cpu_port = -1;
+
+   for (i = 0; i < pd->nr_chips; i++) {
+   struct dsa_switch *ds;
+
+   ds = dsa_switch_setup(dst, i, parent, pd->chip[i].host_dev);
+   if (IS_ERR(ds)) {
+   netdev_err(dev, "[%d]: couldn't create dsa switch instance (error %ld)\n",
+  i, PTR_ERR(ds));
+   continue;
+   }
+
+   dst->ds[i] = ds;
+   if (ds->drv->poll_link != NULL)
+   dst->link_poll_needed = 1;
+   }
+
+   /*
+* If we use a tagging format that doesn't have an ethertype
+* field, make sure that all packets from this point on get
+* sent to the tag format's receive function.
+*/
+   wmb();
+   dev->dsa_ptr = (void *)dst;
+
+   if (dst->link_poll_needed) {
+   INIT_WORK(&dst->link_poll_work, dsa_link_poll_work);
+   init_timer(&dst->link_poll_timer);
+   dst->link_poll_timer.data = (unsigned long)dst;
+   dst->link_poll_timer.function = dsa_link_poll_timer;
+   dst->link_poll_timer.expires = round_jiffies(jiffies + HZ);
+   add_timer(&dst->link_poll_timer);
+   }
+
+   return 0;
+
+}
+
+static int dsa_probe(struct platform_device *pdev){
struct dsa_platform_data *pd = pdev->dev.platform_data;
struct net_device *dev;
struct dsa_switch_tree *dst;
-   int i, ret;
+   int ret;
 
pr_notice_once("Distributed Switch Architecture driver version %s\n",
   dsa_driver_version);
@@ -743,40 +787,8 @@ static int dsa_probe(struct platform_device *pdev)
 
dst->pd = pd;
dst->master_netdev = dev;
-   dst->cpu_switch = -1;
-   dst->cpu_port = -1;
-
-   for (i = 0; i < pd->nr_chips; i++) {
-   struct dsa_switch *ds;
-
-   ds = dsa_switch_setup(dst, i, &pdev->dev, pd->chip[i].host_dev);
-   if (IS_ERR(ds)) {
-   netdev_err(dev, "[%d]: couldn't create dsa switch instance (error %ld)\n",
-  i, PTR_ERR(ds));
-   continue;
-   }
-
-   dst->ds[i] = ds;
-   if (ds->drv->poll_link != NULL)
-   dst->link_poll_needed = 1;
-   }
-
-   /*
-* If we use a tagging format that doesn't have an ethertype
-* field, make sure that all packets from this point on get
-* sent to the tag format's receive function.
-*/
-   wmb();
-   dev->dsa_ptr = (void *)dst;
 
-   if (dst->link_poll_needed) {
-   INIT_WORK(&dst->link_poll_work, dsa_link_poll_work);
-   init_timer(&dst->link_poll_timer);
-   dst->link_poll_timer.data = (unsigned long)dst;
-   dst->link_poll_timer.function = dsa_link_poll_timer;
-   dst->link_poll_timer.expires = round_jiffies(jiffies + HZ);
-   add_timer(&dst->link_poll_timer);
-   }
+   dsa_probe_common(dst, &pdev->dev);
 
return 0;
 
-- 
2.1.3



[RFC PATCH 3/3] driver/net/usb: Add support for DSA to ax88772b

2015-04-21 Thread Jan Kaisrlik
From: Jan Kaisrlik 

This patch adds the ability to use the RMII interface of the ax88772b
instead of the Ethernet PHY which is currently used.

Binding DSA to a USB device is done via sysfs.

---
 drivers/net/usb/asix.h |   7 ++
 drivers/net/usb/asix_devices.c | 258 -
 2 files changed, 261 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h
index 5d049d0..6b1a5f5 100644
--- a/drivers/net/usb/asix.h
+++ b/drivers/net/usb/asix.h
@@ -174,6 +174,13 @@ struct asix_rx_fixup_info {
 
 struct asix_common_private {
struct asix_rx_fixup_info rx_fixup_info;
+#ifdef CONFIG_NET_DSA
+   struct kobject kobj;
+   struct mii_bus *mdio;
+   int use_embphy;
+   bool dsa_up;
+   struct usbnet *dev;
+#endif
 };
 
 extern const struct driver_info ax88172a_info;
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index bf49792..57b3a34 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -35,6 +35,88 @@
 
 #define PHY_MODE_RTL8211CL  0x000C
 
+#ifdef CONFIG_NET_DSA
+
+#define AX88772_RMII 0x02
+#define AX88772_MAX_PORTS 0x20
+#define MV88e6065_ID  0x0c89
+
+#include 
+#include 
+
+#define to_asix_obj(x) container_of(x, struct asix_common_private, kobj)
+#define to_asix_attr(x) container_of(x, struct asix_attribute, attr)
+
+static int mii_asix_read(struct mii_bus *bus, int phy_id, int regnum)
+{
+   return asix_mdio_read(((struct usbnet *)bus->priv)->net, phy_id, regnum);
+}
+
+static int mii_asix_write(struct mii_bus *bus, int phy_id, int regnum, u16 val)
+{
+   asix_mdio_write(((struct usbnet *)bus->priv)->net, phy_id, regnum, val);
+   return 0;
+}
+
+static int ax88772_init_mdio(struct usbnet *dev){
+   struct asix_common_private *priv = dev->driver_priv;
+   int ret, i;
+
+   priv->mdio = mdiobus_alloc();
+   if (!priv->mdio) {
+   netdev_err(dev->net, "Could not allocate mdio bus\n");
+   return -ENOMEM;
+   }
+
+   priv->mdio->priv = (void *)dev;
+   priv->mdio->read = &mii_asix_read;
+   priv->mdio->write = &mii_asix_write;
+   priv->mdio->name = "ax88772 mdio bus";
+   priv->mdio->parent = &dev->udev->dev;
+
+   /* mii bus name is usb-- */
+   snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",dev->udev->bus->busnum, dev->udev->devnum);
+
+   priv->mdio->irq = kzalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
+   if (!priv->mdio->irq) {
+   ret = -ENOMEM;
+   goto free;
+   }
+
+   for (i = 0; i < PHY_MAX_ADDR; i++)
+   priv->mdio->irq[i] = PHY_POLL;
+
+   ret = mdiobus_register(priv->mdio);
+   if (ret) {
+   netdev_err(dev->net, "Could not register MDIO bus\n");
+   goto free_irq;
+   }
+
+   netdev_info(dev->net, "registered mdio bus %s\n", priv->mdio->id);
+   return 0;
+
+free_irq:
+   kfree(priv->mdio->irq);
+free:
+   mdiobus_free(priv->mdio);
+   return ret;
+}
+
+static void ax88772_remove_mdio(struct usbnet *dev)
+{
+   struct asix_common_private *priv = dev->driver_priv;
+
+// dsa_free(); TODO
+
+   netdev_info(dev->net, "deregistering mdio bus %s\n", priv->mdio->id);
+   mdiobus_unregister(priv->mdio);
+   kfree(priv->mdio->irq);
+   mdiobus_free(priv->mdio);
+   kfree(priv);
+}
+
+#endif
+
 struct ax88172_int_data {
__le16 res1;
u8 link;
@@ -301,6 +383,27 @@ static int ax88772_reset(struct usbnet *dev)
int ret, embd_phy;
u16 rx_ctl;
 
+#ifdef CONFIG_NET_DSA
+   int temp = AX88772_RMII;
+   struct asix_common_private *priv = dev->driver_priv;
+
+   if (priv->use_embphy == 1) {
+   data->phymode = PHY_MODE_MARVELL;
+   data->ledmode = 0;
+
+   /* Set AX88772 to enable RMII interface for external PHY */
+   asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, 0, 0, 0, NULL);
+   asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, 0, 0, 1, &temp);
+
+   asix_sw_reset(dev, 0);
+   msleep(150);
+
+   asix_write_rx_ctl(dev, 0);
+   msleep(60);
+   }
+
+#endif
+
ret = asix_write_gpio(dev,
AX_GPIO_RSE | AX_GPIO_GPO_2 | AX_GPIO_GPO2EN, 5);
if (ret < 0)
@@ -415,11 +518,131 @@ static const struct net_device_ops ax88772_netdev_ops = {
.ndo_set_rx_mode = asix_set_multicast,
 };
 
+
+#ifdef CONFIG_NET_DSA
+struct asix_attribute {
+   struct attribute attr;
+   ssize_t (*show)(struct asix_common_private *priv, struct asix_attribute *attr, char *buf);
+   ssize_t (*store)(struct asix_common_private *priv, struct asix_attribute *attr, const char *buf, size_t count);
+};
+
+static int ax88772_set_bind_dsa(struct asix_common_private *priv)
+{
+   struct usbnet *dev = priv->dev;
+   int i, ret, embd_phy;
+   u32 phyid;
