Re: [RFC PATCH v3 09/10] lib: libos build scripts and documentation
Hi Paul,

many thanks for your review. All the fixes will be in the next
patchset. My comments are below.

At Mon, 20 Apr 2015 22:43:07 +0200,
Paul Bolle wrote:
>
> Some random observations while I'm still trying to wrap my head around
> all this (which might take quite some time).
>
> On Sun, 2015-04-19 at 22:28 +0900, Hajime Tazaki wrote:
> > --- /dev/null
> > +++ b/arch/lib/Kconfig
> > @@ -0,0 +1,124 @@
> > +menuconfig LIB
> > +	bool "LibOS-specific options"
> > +	def_bool n
>
> This is the start of the Kconfig parse for lib. (That would basically
> still be true even if you didn't set KBUILD_KCONFIG, see below.) So why
> not do something like all arches do:
>
>     config LIB
>         def_bool y
>         select [...]
>
> Ie, why would someone want to build for ARCH=lib and still not set LIB?

Agreed. Fixed.

> > +config EXPERIMENTAL
> > +	def_bool y
>
> Unneeded: removed treewide in, I think, 2014.

Thanks. Fixed.

> > +config MMU
> > +def_bool n
>
> Add empty line.
>
> > +config FPU
> > +def_bool n
>
> Ditto.

Both are fixed.

> > +config KTIME_SCALAR
> > +	def_bool y
>
> This one is unused.

Deleted.

> > +config GENERIC_BUG
> > +	def_bool y
> > +	depends on BUG
>
> Add empty line here.

Fixed.

> > +config GENERIC_FIND_NEXT_BIT
> > +	def_bool y
>
> This one is unused too.

Deleted.

> > +config SLIB
> > +	def_bool y
>
> You've also added SLIB to init/Kconfig in 02/10. But "make ARCH=lib
> *config" will never visit init/Kconfig, will it? And, apparently, none
> of SL[AOU]B are wanted for lib. So I think the entry for config SLIB in
> that file can be dropped (as other arches will never see it because it
> depends on LIB).
>
> (Note that I haven't actually looked into all the Kconfig entries added
> above. Perhaps I might do that. But I'm pretty sure most of the time all
> I can say is: "I have no idea why this entry defaults to $VALUE".)

I intended SLIB to be a generic option, not only for arch/lib, as we
discussed for the v2 patch.

But you're right: for the moment no one uses SLIB and we don't visit
init/Kconfig, so I dropped the config SLIB entry from init/Kconfig.

> > +source "net/Kconfig"
> > +
> > +source "drivers/base/Kconfig"
> > +
> > +source "crypto/Kconfig"
> > +
> > +source "lib/Kconfig"
> > +
> > +
>
> Trailing empty lines.

Deleted. Thanks.

> > diff --git a/arch/lib/Makefile b/arch/lib/Makefile
> > new file mode 100644
> > index 000..d8a0bf9
> > --- /dev/null
> > +++ b/arch/lib/Makefile
> > @@ -0,0 +1,251 @@
> > +ARCH_DIR := arch/lib
> > +SRCDIR=$(dir $(firstword $(MAKEFILE_LIST)))
>
> Do you use SRCDIR?

No. Deleted the line.

> > +DCE_TESTDIR=$(srctree)/tools/testing/libos/
> > +KBUILD_KCONFIG := arch/$(ARCH)/Kconfig
>
> I think you copied this from arch/um/Makefile. But arch/um/ is, well,
> special. Why should lib not start the kconfig parse in the file named
> Kconfig? And if you want to start in arch/lib/Kconfig, it would be nice
> to add a mainmenu (just like arch/x86/um/Kconfig does).

Right now, 'lib' only wants to parse arch/lib/Kconfig, so that it builds
and links just the files it wants rather than a configurable set.

So I believe arch/lib is special in the same way arch/um is.

I added a mainmenu, by the way. Thanks.

> (I don't read Makefilese well enough to understand the rest of this
> file. I think it's scary.)

Indeed. Thank you again for reviewing these cryptic files.

> When I did
>     make ARCH=lib menuconfig
>
> I saw (among other things):
>     arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
>     arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
>     arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
>     arch/lib/Makefile.print:41: target `trace/' given more than once in the same rule.
>     arch/lib/Makefile.print:41: target `lzo/' given more than once in the same rule.
(snip)
>     arch/lib/Makefile.print:41: target `ppp/' given more than once in the same rule.
>     arch/lib/Makefile.print:41: target `slip/' given more than once in the same rule.
>
> I have no idea why. Unclean tree?

This was due to incorrect handling in the internal directory-listing
procedure. Fixed.

> > +.PHONY : core
> > +.NOTPARALLEL : print $(subdirs) $(final-obj-m)
>
> > --- /dev/null
> > +++ b/arch/lib/processor.mk
> > @@ -0,0 +1,7 @@
> > +PROCESSOR=$(shell uname -m)
> > +PROCESSOR_x86_64=64
> > +PROCESSOR_i686=32
> > +PROCESSOR_i586=32
> > +PROCESSOR_i386=32
> > +PROCESSOR_i486=32
> > +PROCESSOR_SIZE=$(PROCESSOR_$(PROCESSOR))
>
> The rest of the tree appears to use BITS instead of PROCESSOR_SIZE. And
> I do hope there's a cleaner way for lib to set PROCESSOR_SIZE than this.

The variable PROCESSOR_SIZE is only used by arch/lib/Makefile, in the
following lines:

> +ifeq ($(PROCESSOR_SIZE),64)
> +CFLAGS+= -DCONFIG_64BIT
> +endif

Thus it eventually sets CONFIG_64BIT. I think a cleaner way is to follow
what arch/um does, like below:
Re: [RFC 3/3] tc: cleanup tc_classify
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov wrote:
> introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce
> copy-paste among different qdiscs

I don't think qdisc_drop_bypass() is more readable than without it,
maybe you need a better name, or just leave the code as it is.

tc_classify_act() seems ok.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Renesas Ethernet AVB driver
Hello Sergei.

(2015/04/15 6:37:28), Sergei Shtylyov wrote:
> >> +		if (!ravb_tx_free(ndev, q)) {
> >> +			netif_warn(priv, tx_queued, ndev, "TX FD exhausted.\n");
> >> +			netif_stop_queue(ndev);
> >> +			spin_unlock_irqrestore(&priv->lock, flags);
> >> +			return NETDEV_TX_BUSY;
> >> +		}
> >> +	}
> >> +	entry = priv->cur_tx[q] % priv->num_tx_ring[q];
> >> +	priv->cur_tx[q]++;
> >> +	spin_unlock_irqrestore(&priv->lock, flags);
> >> +
> >> +	if (skb_put_padto(skb, ETH_ZLEN))
> >> +		return NETDEV_TX_OK;
> >> +
> >> +	priv->tx_skb[q][entry] = skb;
> >> +	buffer = PTR_ALIGN(priv->tx_buffers[q][entry], RAVB_ALIGN);
> >> +	memcpy(buffer, skb->data, skb->len);
>
> > ~1500 bytes memcpy(), not good...
>
> I'm looking in the manual and not finding the hard requirement to have the
> buffer address aligned to 128 bytes (RAVB_ALIGN), sigh... Kimura-san?

There is a hardware requirement that the frame data must be aligned to
a 32-bit boundary in the URAM; see section 45A.3.3.1 "Data
Representation" in the manual.
I think the original skb->data is mostly aligned only to a 2-byte
boundary because of NET_IP_ALIGN, so we copy the original skb->data to
a local aligned buffer.

In addition, section 45A.3.3.12 "Tips for Optimizing Performance in
Handling Descriptors" mentions that frame data is accessed in blocks of
up to 128 bytes, and that the number of 128-byte borders (addresses
H'xxx00 and H'xxx80) inside the frame data should be minimized.
So we set RAVB_ALIGN to 128 bytes.

Best Regards,
Mitsuhiro Kimura
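
For readers following the alignment argument, here is a minimal userspace
sketch of the bounce-buffer idea discussed above. It is not the driver
code: ptr_align() is a stand-in for the kernel's PTR_ALIGN(), and
copy_to_aligned() is a hypothetical helper invented for illustration. The
only value taken from the thread is RAVB_ALIGN = 128.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RAVB_ALIGN 128 /* value taken from the discussion above */

/* Userspace stand-in for the kernel's PTR_ALIGN(): round p up to the
 * next multiple of a. a must be a non-zero power of two. */
static void *ptr_align(void *p, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (void *)(((uintptr_t)p + (a - 1)) & ~(a - 1));
}

/* Hypothetical helper: copy a frame into an over-allocated buffer so
 * that the copy starts on a RAVB_ALIGN boundary. The caller must make
 * sure raw has at least len + RAVB_ALIGN - 1 bytes available. */
static void *copy_to_aligned(void *raw, const void *frame, size_t len)
{
	void *buffer = ptr_align(raw, RAVB_ALIGN);

	memcpy(buffer, frame, len);
	return buffer;
}
```

The cost Sergei and David point at is visible here: every transmitted
frame pays for one memcpy() of its full length, in exchange for a start
address that never straddles more 128-byte borders than necessary.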
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov wrote:
> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
> Get rid of redundant checks in all qdiscs.
> Instead do it once.

The current code can be easily extended, while yours cannot.
I don't see the need for this change.
Re: [RFC 1/3] tc: fix return values of ingress qdisc
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov wrote:
> ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
>

XMIT already means egress...

> Since it's invoked via qdisc_enqueue_root() (which suppose to return
> only NET_XMIT_* values as well), it was working by accident,
> since TC_ACT_* values fit within NET_XMIT_MASK.
>

Why not just add a BUILD_BUG_ON() to capture this?
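
A compile-time check of the kind suggested above could look roughly like
the following userspace sketch. The constants are copied by hand from
memory of the kernel headers of that era and should be treated as
assumptions; BUILD_BUG_ON() is emulated with C11 static_assert, and
TC_ACT_MAX_RETURNED and fits_xmit_mask() are hypothetical names
introduced only for this illustration.

```c
#include <assert.h> /* static_assert (C11) and assert */

/* Mirrored by hand for illustration; the authoritative definitions
 * live in the kernel headers and may differ. */
#define NET_XMIT_MASK		0x0f
#define TC_ACT_OK		0
#define TC_ACT_RECLASSIFY	1
#define TC_ACT_SHOT		2
#define TC_ACT_PIPE		3
#define TC_ACT_STOLEN		4
#define TC_ACT_QUEUED		5
#define TC_ACT_REPEAT		6

/* Userspace stand-in for BUILD_BUG_ON(): break the build if cond holds. */
#define BUILD_BUG_ON(cond) \
	static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/* Hypothetical name: the largest TC_ACT_* code a classifier may return. */
enum { TC_ACT_MAX_RETURNED = TC_ACT_REPEAT };

/* Documents the "works by accident" assumption at compile time. */
BUILD_BUG_ON(TC_ACT_MAX_RETURNED > NET_XMIT_MASK);

/* Run-time version of the same check, for a single action code. */
static int fits_xmit_mask(int act)
{
	return act >= 0 && (act & ~NET_XMIT_MASK) == 0;
}
```

The point of such a check is that the coincidence stays harmless only as
long as nobody adds a TC_ACT_* value above the mask; the build then fails
instead of the ingress path silently misbehaving.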
Re: [Patch net] igb: fix a typo in igb_reset_q_vector()
On Tue, Apr 21, 2015 at 5:29 PM, Jeff Kirsher wrote:
> On Wed, 2015-04-22 at 09:26 +0900, Toshiaki Makita wrote:
>> Hi Cong Wang,
>>
>> I have already sent a patch to intel's tree.
>> http://git.kernel.org/cgit/linux/kernel/git/jkirsher/net-queue.git/commit/?h=dev-queue&id=02ac4b65689f8df824117395fd8d160c04161a7b
>>

I didn't know about this tree; I use the -net tree.

>>
>
> Oops, yeah. I already have this in my queue. I wondered why the patch
> seemed familiar, Cong dropping your patch since I already have
> Toshiaki's patch in my queue.

Sure. Please queue it for -stable too, since it apparently
fixes a bug.

Thanks!
[GIT] Networking
Just a few fixes trickling in at this point.

1) If we see an attached socket on an skb in the ipv4 forwarding
   path, bail. This can happen due to races with FIB rule addition
   and deletion, and we should just drop such frames. From
   Sebastian Pöhn.

2) pppoe receive should only accept packets destined for this
   host's MAC address. From Joakim Tjernlund.

3) Handle checksum unwrapping properly in ppp receive when it's
   encapsulated in UDP in some way, fix from Tom Herbert.

4) Fix some bugs in mv88e6xxx DSA driver resulting from the
   conversion from register offset constants to mnemonic
   macros. From Vivien Didelot.

5) Fix handling of HCA max message size in mlx4 adapters, from
   Eran Ben Elisha.

Please pull, thanks a lot.

The following changes since commit 04b7fe6a4a231871ef681bc95e08fe66992f7b1f:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide (2015-04-17 16:36:59 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to fab9adfb71fc8690e20c3c280d39d49c8f4a3f0a:

  net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP (2015-04-21 17:36:08 -0400)

Andreas Oetken (1):
      altera tse: Error-Bit on tx-avalon-stream always set.

David S. Miller (1):
      Merge branch 'ppp_csum_unset'

Eran Ben Elisha (1):
      net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP

Joakim Tjernlund (1):
      pppoe: Lacks DST MAC address check

Sebastian Pöhn (1):
      ip_forward: Drop frames with attached skb->sk

Tom Herbert (2):
      net: add skb_checksum_complete_unset
      ppp: call skb_checksum_complete_unset in ppp_receive_frame

Vivien Didelot (2):
      net: dsa: mv88e6xxx: fix setup of port control 1
      net: dsa: mv88e6xxx: use PORT_DEFAULT_VLAN

jba...@akamai.com (1):
      tcp: add memory barriers to write space paths

 drivers/net/dsa/mv88e6xxx.c                   |  6 +++---
 drivers/net/ethernet/altera/altera_msgdmahw.h |  1 -
 drivers/net/ethernet/mellanox/mlx4/fw.c       |  2 +-
 drivers/net/ppp/ppp_generic.c                 |  1 +
 drivers/net/ppp/pppoe.c                       |  3 +++
 include/linux/skbuff.h                        | 12 ++++++++++++
 net/ipv4/ip_forward.c                         |  3 +++
 net/ipv4/tcp.c                                |  4 +++-
 net/ipv4/tcp_input.c                          |  2 ++
 9 files changed, 28 insertions(+), 6 deletions(-)
Re: [PATCH 0/3] mpls: ABI changes for security and correctness
From: ebied...@xmission.com (Eric W. Biederman)
Date: Tue, 21 Apr 2015 19:29:42 -0500

> Robert Shearman writes:
>
>> These changes make mpls not be enabled by default on all
>> interfaces when in use for security, along with ensuring that a label
>> not valid as an outgoing label can be added in mpls routes.
>>
>> This series contains three ABI/behaviour-affecting changes which have
>> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
>> improvements" without any further modification. These changes need to
>> be considered for 4.1 otherwise we'll be stuck with the current
>> behaviour/ABI forever.
>
> I don't like the difference in default between loopback and everything
> else. That just seems like an extra arbitrary rule.
>
> Otherwise:
> Acked-by: "Eric W. Biederman"
>
> Not that I expect Dave Miller is taking patches during the merge window.

Eric, you say you disagree with the loopback vs. everything else
behavior, yet you're ACK'ing this.

Please don't say something like that, because it is confusing and
I can't tell what you want me to do.

If you're willing to accept the series as is, say:

	"Even though I disagree with X, I'm ok with this series
	 for now."

If you want changes before the series gets applied:

	"I want X changed to Y, and with that I give my ACK."

Thanks.
Re: [PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails
From: Herbert Xu
Date: Wed, 22 Apr 2015 08:36:34 +0800

> On Tue, Apr 21, 2015 at 02:55:34PM +0200, Thomas Graf wrote:
>> When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
>> we can't allocate the necessary memory in the current context but the
>> limits as set by the user would still allow to grow.
>>
>> Thus attempt an async resize in the background where we can allocate
>> using GFP_KERNEL which is more likely to succeed. The insertion itself
>> will still fail to indicate pressure.
>>
>> This fixes a bug where the table would never continue growing once the
>> utilization is above 100%.
>>
>> Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
>> Signed-off-by: Thomas Graf
>
> Good catch. But I think this call should happen in
> rhashtable_insert_rehash since it's on the slow-path.

Ok, then I expect a respin of this series.
[PATCH net] tcp: fix possible deadlock in tcp_send_fin()
From: Eric Dumazet

Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.

To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.

In a follow-up patch, we might abort tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as memory allocator
did its best getting extra memory already.

This patch reverts d22e15371811 ("tcp: fix tcp fin memory accounting")

Fixes: d22e15371811 ("tcp: fix tcp fin memory accounting")
Signed-off-by: Eric Dumazet
---
 net/ipv4/tcp_output.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8c8d7e06b72f..2ade67b7cdb0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2812,6 +2812,21 @@ begin_fwd:
 	}
 }
 
+/* We allow to exceed memory limits for FIN packets to expedite
+ * connection tear down and (memory) recovery.
+ * Otherwise tcp_send_fin() could loop forever.
+ */
+static void sk_forced_wmem_schedule(struct sock *sk, int size)
+{
+	int amt, status;
+
+	if (size <= sk->sk_forward_alloc)
+		return;
+	amt = sk_mem_pages(size);
+	sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
+	sk_memory_allocated_add(sk, amt, &status);
+}
+
 /* Send a fin. The caller locks the socket for us. This cannot be
  * allowed to fail queueing a FIN frame under any circumstances.
  */
@@ -2834,11 +2849,14 @@ void tcp_send_fin(struct sock *sk)
 	} else {
 		/* Socket is locked, keep trying until memory is available. */
 		for (;;) {
-			skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+			skb = alloc_skb_fclone(MAX_TCP_HEADER,
+					       sk->sk_allocation);
 			if (skb)
 				break;
 			yield();
 		}
+		skb_reserve(skb, MAX_TCP_HEADER);
+		sk_forced_wmem_schedule(sk, skb->truesize);
 		/* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */
 		tcp_init_nondata_skb(skb, tp->write_seq,
 				     TCPHDR_ACK | TCPHDR_FIN);
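
To make the difference between the normal and the forced accounting path
easier to see, here is a small userspace toy model. It is only a sketch
under simplifying assumptions: struct toy_sock, toy_wmem_schedule() and
toy_forced_wmem_schedule() are invented names, the quantum is assumed to
be 4096 bytes, and the real sk_wmem_schedule() logic is considerably more
involved than the limit check shown here.

```c
#include <assert.h>

#define SK_MEM_QUANTUM 4096 /* assumed page-sized quantum */

/* Toy model of per-socket and global accounting state. */
struct toy_sock {
	int forward_alloc;	/* bytes already reserved for this socket */
	long *allocated;	/* shared counter, in quanta */
	long hard_limit;	/* stands in for tcp_mem[2], in quanta */
};

/* Round a byte count up to whole quanta. */
static int toy_mem_pages(int size)
{
	assert(size >= 0);
	return (size + SK_MEM_QUANTUM - 1) / SK_MEM_QUANTUM;
}

/* Normal path (simplified): refuse once the hard limit would be passed. */
static int toy_wmem_schedule(struct toy_sock *sk, int size)
{
	int amt;

	if (size <= sk->forward_alloc)
		return 1;
	amt = toy_mem_pages(size);
	if (*sk->allocated + amt > sk->hard_limit)
		return 0;
	*sk->allocated += amt;
	sk->forward_alloc += amt * SK_MEM_QUANTUM;
	return 1;
}

/* Forced path, mirroring the shape of sk_forced_wmem_schedule() in the
 * patch: charge the memory unconditionally so the FIN can be queued. */
static void toy_forced_wmem_schedule(struct toy_sock *sk, int size)
{
	int amt;

	if (size <= sk->forward_alloc)
		return;
	amt = toy_mem_pages(size);
	sk->forward_alloc += amt * SK_MEM_QUANTUM;
	*sk->allocated += amt;
}
```

In this model, a socket that sits exactly at the hard limit is refused by
the normal path forever, which is the loop described in the commit
message; the forced path overshoots the limit by a bounded amount so that
teardown, and with it memory recovery, can proceed.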
Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset
Hi

On Tue, Apr 21, 2015 at 08:31:34PM -0400, David Miller wrote:
> From: Michael Trimarchi
> Date: Wed, 22 Apr 2015 01:13:47 +0200
>
> > Hi
> >
> > On Tue, Apr 21, 2015 at 05:35:40PM -0400, David Miller wrote:
> >> From: Michael Trimarchi
> >> Date: Tue, 21 Apr 2015 13:16:13 +0200
> >>
> >> > -udelay(data->delays[0]);
> >> ...
> >> > +msleep(max(1U, data->delays[0] / 1000));
> >>
> >> That looks very ugly with that max() expression in there.
> >>
> >
> > Is fine for you a DIV_ROUND_UP?
>
> Not inside of these simple msleep() calls, no.
>
> How about adjusting the values either in the datastructure or
> in local variables instead? That wasn't so hard to come up
> with now, was it?

OK, it's easy, no problem at all. I will post later today, but I prefer
to use local variables together with DIV_ROUND_UP.

Michael
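
The conversion being discussed can be sketched in userspace as follows.
DIV_ROUND_UP is written out the way the kernel macro is defined;
delay_us_to_ms() is a hypothetical helper name, and the eventual patch
may well look different.

```c
#include <assert.h>

/* Same arithmetic as the kernel macro: integer division rounding up. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: convert a microsecond delay to the number of
 * milliseconds to sleep, never sleeping for less than requested. */
static unsigned int delay_us_to_ms(unsigned int us)
{
	return DIV_ROUND_UP(us, 1000U);
}

/*
 * Intended use, with the conversion done once in a local variable so
 * that the sleep call itself stays trivial:
 *
 *	unsigned int pre_ms = delay_us_to_ms(data->delays[0]);
 *	...
 *	msleep(pre_ms);
 */
```

One difference from the max(1U, x / 1000) form is worth keeping in mind:
rounding up turns 1..1000 us into 1 ms and 1001 us into 2 ms, but a
configured delay of 0 stays 0 instead of being bumped to 1 ms.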
Re: [PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further
On Tue, Apr 21, 2015 at 02:55:35PM +0200, Thomas Graf wrote:
> The current code currently only stops inserting rehashes into the
> chain when no resizes are currently scheduled. As long as resizes
> are scheduled and while inserting above the utilization watermark,
> more and more rehashes will be scheduled.
>
> This lead to a perfect DoS storm with thousands of rehashes
> scheduled which lead to thousands of spinlocks to be taken
> sequentially.
>
> Instead, only allow either a series of resizes or a single rehash.
> Drop any further rehashes and return -EBUSY.
>
> Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
> Signed-off-by: Thomas Graf

Acked-by: Herbert Xu
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails
On Tue, Apr 21, 2015 at 02:55:34PM +0200, Thomas Graf wrote:
> When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
> we can't allocate the necessary memory in the current context but the
> limits as set by the user would still allow to grow.
>
> Thus attempt an async resize in the background where we can allocate
> using GFP_KERNEL which is more likely to succeed. The insertion itself
> will still fail to indicate pressure.
>
> This fixes a bug where the table would never continue growing once the
> utilization is above 100%.
>
> Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
> Signed-off-by: Thomas Graf

Good catch. But I think this call should happen in
rhashtable_insert_rehash since it's on the slow-path.

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/3] mpls: ABI changes for security and correctness
Robert Shearman writes:

> These changes make mpls not be enabled by default on all
> interfaces when in use for security, along with ensuring that a label
> not valid as an outgoing label can be added in mpls routes.
>
> This series contains three ABI/behaviour-affecting changes which have
> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
> improvements" without any further modification. These changes need to
> be considered for 4.1 otherwise we'll be stuck with the current
> behaviour/ABI forever.

I don't like the difference in default between loopback and everything
else. That just seems like an extra arbitrary rule.

Otherwise:
Acked-by: "Eric W. Biederman"

Not that I expect Dave Miller is taking patches during the merge window.

> Robert Shearman (3):
>   mpls: Per-device MPLS state
>   mpls: Per-device enabling of packet input
>   mpls: Prevent use of implicit NULL label as outgoing label
>
>  Documentation/networking/mpls-sysctl.txt |   9 +++
>  include/linux/netdevice.h                |   4 +
>  net/mpls/af_mpls.c                       | 132 ++-
>  net/mpls/internal.h                      |   6 ++
>  4 files changed, 148 insertions(+), 3 deletions(-)

Eric
Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset
From: Michael Trimarchi
Date: Wed, 22 Apr 2015 01:13:47 +0200

> Hi
>
> On Tue, Apr 21, 2015 at 05:35:40PM -0400, David Miller wrote:
>> From: Michael Trimarchi
>> Date: Tue, 21 Apr 2015 13:16:13 +0200
>>
>> > -	udelay(data->delays[0]);
>> ...
>> > +	msleep(max(1U, data->delays[0] / 1000));
>>
>> That looks very ugly with that max() expression in there.
>>
>
> Is fine for you a DIV_ROUND_UP?

Not inside of these simple msleep() calls, no.

How about adjusting the values either in the data structure or
in local variables instead? That wasn't so hard to come up
with now, was it?
Re: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode
From: Simon Xiao
Date: Tue, 21 Apr 2015 22:14:14 +

> In current netvsc driver, for each packet received, it will call
> dump_rndis_message() to try to dump the rndis packet information by
> netdev_dbg(). In non-debug mode, dump_rndis_message() will not dump
> anything but it still initialize some local variables and process
> the switch logic in the function of dump_rndis_message(), which is
> unnecessary, especially in high network throughput situation.

See NETIF_MSG_* and use it properly in your driver, read other drivers
and learn how to properly use it for things like this.

I'm not going to explain this a third time.
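
For context, the NETIF_MSG_* convention referred to above gates optional
diagnostics on a per-device message-level bitmask. The following
userspace sketch shows the general shape only. The bit value is copied by
hand and should be checked against netdevice.h; every toy_* name is
invented for illustration; and which NETIF_MSG_* bit suits this driver is
a judgment call, with RX_STATUS used here purely as an example.

```c
#include <assert.h>

/* Mirrored by hand for illustration; verify against netdevice.h. */
#define NETIF_MSG_RX_STATUS 0x0800

/* Toy stand-in for a driver's private data with a message level. */
struct toy_priv {
	unsigned int msg_enable;
};

/* Counts how often the expensive dump routine actually ran. */
static int toy_dump_calls;

static void toy_dump_rndis_message(void)
{
	toy_dump_calls++;
}

/* Same shape as the kernel's netif_msg_rx_status() helper. */
static int toy_netif_msg_rx_status(const struct toy_priv *priv)
{
	return (priv->msg_enable & NETIF_MSG_RX_STATUS) != 0;
}

/* Receive path: only pay for the dump when the message level asks. */
static void toy_receive(const struct toy_priv *priv)
{
	if (toy_netif_msg_rx_status(priv))
		toy_dump_rndis_message();
	/* ... normal receive processing would continue here ... */
}
```

With this pattern the per-packet cost in the default configuration is one
mask test, and the message level can typically be changed at run time
(for example through ethtool's msglvl setting) without a driver-private
flag.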
Re: [Patch net] igb: fix a typo in igb_reset_q_vector()
On Wed, 2015-04-22 at 09:26 +0900, Toshiaki Makita wrote:
> On 2015/04/22 8:20, Cong Wang wrote:
> > Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a
> > single function")
> > Cc: Alexander Duyck
> > Cc: Jeff Kirsher
> > Signed-off-by: Cong Wang
> > ---
> >  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 8457d03..85bbeb2 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -1036,7 +1036,7 @@ static void igb_reset_q_vector(struct igb_adapter
> > *adapter, int v_idx)
> >  		adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
> >
> >  	if (q_vector->rx.ring)
> > -		adapter->tx_ring[q_vector->rx.ring->queue_index] = NULL;
> > +		adapter->rx_ring[q_vector->rx.ring->queue_index] = NULL;
> >
> >  	netif_napi_del(&q_vector->napi);
> >
>
> Hi Cong Wang,
>
> I have already sent a patch to intel's tree.
> http://git.kernel.org/cgit/linux/kernel/git/jkirsher/net-queue.git/commit/?h=dev-queue&id=02ac4b65689f8df824117395fd8d160c04161a7b
>
> Toshiaki Makita
>

Oops, yeah. I already have this in my queue. I wondered why the patch
seemed familiar. Cong, I'm dropping your patch since I already have
Toshiaki's patch in my queue.
Re: [Patch net] igb: fix a typo in igb_reset_q_vector()
On 2015/04/22 8:20, Cong Wang wrote:
> Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a single
> function")
> Cc: Alexander Duyck
> Cc: Jeff Kirsher
> Signed-off-by: Cong Wang
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 8457d03..85bbeb2 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -1036,7 +1036,7 @@ static void igb_reset_q_vector(struct igb_adapter
> *adapter, int v_idx)
>  		adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
>
>  	if (q_vector->rx.ring)
> -		adapter->tx_ring[q_vector->rx.ring->queue_index] = NULL;
> +		adapter->rx_ring[q_vector->rx.ring->queue_index] = NULL;
>
>  	netif_napi_del(&q_vector->napi);
>

Hi Cong Wang,

I have already sent a patch to intel's tree.
http://git.kernel.org/cgit/linux/kernel/git/jkirsher/net-queue.git/commit/?h=dev-queue&id=02ac4b65689f8df824117395fd8d160c04161a7b

Toshiaki Makita
Re: [Patch net] igb: fix a typo in igb_reset_q_vector()
On Tue, 2015-04-21 at 16:20 -0700, Cong Wang wrote:
> Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a
> single function")
> Cc: Alexander Duyck
> Cc: Jeff Kirsher
> Signed-off-by: Cong Wang
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks Cong, I will add your patch to my queue.
Re: [PATCH] fix tcp fin memory accounting
On Tue, 2015-03-24 at 01:11 -0500, Josh Hunt wrote:
> On 03/24/2015 01:10 AM, David Miller wrote:
> > From: Josh Hunt
> > Date: Fri, 20 Mar 2015 12:36:24 -0500
> >
> >> Would it be possible to queue up 355a901e6cf1 (tcp: make connect()
> >> mem charging friendly) for stable as well? That is the commit that
> >> fixes this problem in the tcp_connect()/tcp_send_syn_data() cases.
> >
> > Done.
> >
>
> Thanks David.

Note that this patch adds a deadlock possibility in some stress
situations.

If a process owning some tcp socket dies, and tcp_mem[2] is already hit,
all sk_stream_alloc_skb() can return NULL and we loop in tcp_send_fin(),
making no progress because we can not free any tcp memory.
Networking microconference at LPC15 in Seattle (Aug 19-21)
Hi,

Tom and I will be running another iteration of a networking-focused
microconference [0] at LPC, Aug 19-21 in Seattle. Given that a bunch of
us will be in Seattle anyway, we might as well have a general networking
session and spend some time together.

This year's focus is on: IPv6, Network Virtualization, and Security.
Please note that separate sessions are proposed for "Network offload",
"Wireless networking" and "Network management".

If you are interested in participating and have ideas for discussions,
please add them to the wiki [0].

If you are interested in attending, please list your name on the
wiki [0].

If you are interested in helping us run the session, feel free to reach
out to us.

[0] http://wiki.linuxplumbersconf.org/2015:networking

Thomas
[Patch net] igb: fix a typo in igb_reset_q_vector()
Fixes: 5536d2102a2d ("igb: Combine q_vector and ring allocation into a single function")
Cc: Alexander Duyck
Cc: Jeff Kirsher
Signed-off-by: Cong Wang
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8457d03..85bbeb2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1036,7 +1036,7 @@ static void igb_reset_q_vector(struct igb_adapter *adapter, int v_idx)
 		adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
 
 	if (q_vector->rx.ring)
-		adapter->tx_ring[q_vector->rx.ring->queue_index] = NULL;
+		adapter->rx_ring[q_vector->rx.ring->queue_index] = NULL;
 
 	netif_napi_del(&q_vector->napi);
 
--
1.8.3.1
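
The effect of the one-character typo can be reproduced in a small
userspace toy model. All toy_* types and functions below are invented for
illustration and only mimic the two assignments touched by the patch;
they make no claim about the rest of igb_reset_q_vector().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_QUEUES 4

struct toy_ring {
	int queue_index;
};

struct toy_q_vector {
	struct toy_ring *tx_ring;
	struct toy_ring *rx_ring;
};

struct toy_adapter {
	struct toy_ring *tx_ring[TOY_NUM_QUEUES];
	struct toy_ring *rx_ring[TOY_NUM_QUEUES];
};

/* Before the patch: the rx branch clears a slot in the tx array. */
static void toy_reset_buggy(struct toy_adapter *a,
			    const struct toy_q_vector *q)
{
	if (q->tx_ring)
		a->tx_ring[q->tx_ring->queue_index] = NULL;
	if (q->rx_ring)
		a->tx_ring[q->rx_ring->queue_index] = NULL; /* typo */
}

/* After the patch: each branch clears its own array. */
static void toy_reset_fixed(struct toy_adapter *a,
			    const struct toy_q_vector *q)
{
	if (q->tx_ring)
		a->tx_ring[q->tx_ring->queue_index] = NULL;
	if (q->rx_ring)
		a->rx_ring[q->rx_ring->queue_index] = NULL;
}
```

In the buggy variant the adapter keeps a stale rx_ring pointer for the
vector being reset, and if the rx and tx queue indices differ it also
wipes a tx_ring slot that belongs to another vector.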
Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
On Tue, Apr 21, 2015 at 06:51:26PM -0400, Andy Walls wrote:
> Sorry for the top post; mobile work email account.
>
> Luis,
>
> You do the changes to remove MTRR and point me to your dev repo and branch.
> Also point me to the new functions/primitives I'll need.

Nothing new is actually needed for ivtv, unless of course the ivtv
driver is bound to use a large MTRR that includes the non-framebuffer
region. If so, ioremap_uc() would be needed, and you can just
cherry-pick that patch:

https://marc.info/?l=linux-kernel&m=142964809110516&w=1

I'll bounce that patch to you as well. It might also help to read this
patch:

https://marc.info/?l=linux-kernel&m=142964809710517&w=1

If your write-combining area is not forced by size constraints to also
include the non-framebuffer areas, then you can just do a simple
conversion of the driver to use ioremap_wc() on the framebuffer followed
by arch_phys_wc_add(). An example driver that required changes to split
this because of size constraints is atyfb; here are the changes for it:

https://marc.info/?l=linux-kernel&m=142964818810539&w=1
https://marc.info/?l=linux-kernel&m=142964813610531&w=1
https://marc.info/?l=linux-kernel&m=142964811010524&w=1
https://marc.info/?l=linux-kernel&m=142964814810532&w=1

If you are not constrained by MTRR's limitation on size, then a simple,
trivial driver conversion is sufficient. For example:

https://marc.info/?l=linux-kernel&m=142964744610286&w=1

I should also note that we are striving to avoid overlapping ioremap()
calls, as we want to stay out of that mess. Overlapping ioremap() calls
with different types could in theory work, but it's best that we just
design clean drivers and avoid this. As per Andy Lutomirski, what we'd
likely need done on ivtv is for it to ioremap() for an initial bring-up
of the device, then infer the framebuffer offset, and only when that is
being used iounmap and then ioremap() again split areas on the driver,
one with ioremap.

> I'll do the changes to add write-combining back into ivtv and ivtvfb, test
> them with my hardware and push them to my linuxtv.org git repo.

Great! The above sounded like a complexity you did not wish to take on,
but if you're up for the change, that'd be awesome!

> I know there is at least one English speaking user in India using ivtv with
> old PVR hardware, and probably folks in less developed places also using it.

If the above is too much work for so few users, I'd hope we can just
have them use older kernels, for the sake of sane APIs and clean driver
architecture.

Luis
Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset
Hi

On Tue, Apr 21, 2015 at 05:35:40PM -0400, David Miller wrote:
> From: Michael Trimarchi
> Date: Tue, 21 Apr 2015 13:16:13 +0200
>
> > -	udelay(data->delays[0]);
> ...
> > +	msleep(max(1U, data->delays[0] / 1000));
>
> That looks very ugly with that max() expression in there.
>

Would a DIV_ROUND_UP be fine for you?

> Please find some clean way to get rid of it if you want to
> make this conversion.
>

Agreed, I will repost it.

Michael

> Thanks.

--
| Michael Nazzareno Trimarchi    Amarula Solutions BV |
| COO - Founder                  Cruquiuskade 47      |
| +31(0)851119172                Amsterdam 1018 AM NL |
| [`as] http://www.amarulasolutions.com               |
Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
On Tue, 2015-04-21 at 23:44 +0200, Mateusz Kulikowski wrote:
> On 21.04.2015 23:22, Joe Perches wrote:
> > On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote:
> (...)
> >>
> >> Perhaps it would be smarter to use (for both patches) $stat instead.
> >> This applies also to existing checks (like PREFER_ETHER_ADDR_COPY)
> >> so we can catch calls formatted like
> >>
> >> memset(very.long.structure->something.something_different42,
> >>        0xFF, ETH_ALEN);
> >
> > Yes, likely that's true.
> >
> > checkpatch couldn't --fix it easily unless it's on a
> > single line though.
>
> True, True; If you prefer $line and ability to --fix - I'll use that in v3

I suppose you could do both $line and $stat and the fix
would only work when it's on a single line.

Perhaps something like this would work:

	if ($line =~ /whatever/ ||
	    (defined($stat) && $stat =~ /whatever/)) {
		if (WARN(...) &&
		    $fix) {
			$fixed[$fixlinenr] =~ s/whatever/appropriate/;
		}
	}

No worries about getting 'round the list.
It'll get got eventually.
RE: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode
> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, April 21, 2015 2:49 PM
> To: Simon Xiao
> Cc: KY Srinivasan; Haiyang Zhang; de...@linuxdriverproject.org;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in
> netvsc debug mode
>
> From: six...@microsoft.com
> Date: Tue, 21 Apr 2015 15:58:05 -0700
>
> > From: Simon Xiao
> >
> > Signed-off-by: Simon Xiao
> > Reviewed-by: K. Y. Srinivasan
> > Reviewed-by: Haiyang Zhang
>
> I just gave you feedback on this patch in response to your original
> submission, do not ignore it.

Thanks for your feedback, David.

In the current netvsc driver, dump_rndis_message() is called for each
packet received, to try to dump the RNDIS packet information via
netdev_dbg(). In non-debug mode, dump_rndis_message() will not dump
anything, but it still initializes some local variables and runs through
its switch logic, which is unnecessary, especially under high network
throughput.

My change adds a run-time config flag to control the execution of
dump_rndis_message() and avoid the above unnecessary cost in non-debug
mode.

By default the driver is in non-debug mode, and rndis_filter_receive()
will not call dump_rndis_message(), which saves the above extra cost for
each packet received.
Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
On Tue, Apr 21, 2015 at 3:08 PM, Luis R. Rodriguez wrote:
> On Tue, Apr 21, 2015 at 3:02 PM, Luis R. Rodriguez wrote:
>> Andy, can we live without MTRR support on this driver for future kernels?
>> This would only leave ipath as the only offending driver.
>
> Sorry to be clear, can we live with removal of write-combining on this driver?
>

I personally think so, but a driver maintainer's ack would be nice.

--Andy
Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
On Tue, Apr 21, 2015 at 3:02 PM, Luis R. Rodriguez wrote:
> Andy, can we live without MTRR support on this driver for future kernels? This
> would only leave ipath as the only offending driver.

Sorry, to be clear: can we live with removal of write-combining on this
driver?

  Luis
Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
On Wed, Apr 15, 2015 at 09:07:37PM -0400, Andy Walls wrote:
> On Thu, 2015-04-16 at 01:58 +0200, Luis R. Rodriguez wrote:
> > Hey Andy, thanks for your review, adding Hyong-Youb Kim for review of the
> > full range ioremap_wc() idea below.
> >
> > On Wed, Apr 15, 2015 at 06:38:51PM -0400, Andy Walls wrote:
> > > Hi All,
> > >
> > > On Mon, 2015-04-13 at 19:49 +0200, Luis R. Rodriguez wrote:
> > > > From the beginning it seems only framebuffer devices used MTRR/WC,
> > > [snip]
> > > > The ivtv device is a good example of the worst type of
> > > > situations and these days. So perhaps __arch_phys_wc_add() and a
> > > > ioremap_ucminus() might be something more than transient unless
> > > > hardware folks get a good memo or already know how to just Do
> > > > The Right Thing (TM).
> > >
> > > Just to reiterate a subtle point, use of the ivtvfb is *optional*. A
> > > user may or may not load it. When the user does load the ivtvfb driver,
> > > the ivtv driver has already been initialized and may have functions of
> > > the card already in use by userspace.
> >
> > I suspected this and it's why I note that a rewrite to address a clean
> > split with separate ioremap seems rather difficult in this case.
> >
> > > Hopefully no one is trying to use the OSD as framebuffer and the video
> > > decoder/output engine for video display at the same time.
> >
> > Worst case concern I have also is the implications of having overlapping
> > ioremap() calls (as proposed in my last reply) for different memory types
> > and having the different virtual memory addresses used by different parts
> > of the driver. It's not clear to me what the hardware implications of this
> > are.
> >
> > > But the video
> > > decoder/output device nodes may already be open for performing ioctl()
> > > functions so unmapping the decoder IO space out from under them, when
> > > loading the ivtvfb driver module, might not be a good thing.
> >
> > Using overlapping ioremap() calls with different memory types would address
> > this concern provided hardware won't barf both on the device and CPU.
> > Hardware folks could provide feedback or an ivtvfb user could test the
> > patch supplied on both non-PAT and PAT systems. Even so, who knows, this
> > might work on some systems while not on others, only hardware folks
> > would know.
>
> The CX2341[56] firmware+hardware has a track record for being really
> picky about system hardware. Its primary symptoms are for the DMA
> engine or Mailbox protocol to get hung up. So yeah, it could barf
> easily on some users.
>
> > An alternative... is to just ioremap_wc() the entire region, including
> > MMIO registers for these old devices.
>
> That's my thought; as long as implementing PCI write then read can force
> writes to be posted and that setting that many pages as WC doesn't cause
> some sort of PAT resource exhaustion. (I know very little about PAT).

So upon review that strategy won't work well unless we implement some sort
of hack on the driver. That's also quite a bit of work.

Andy, can we live without MTRR support on this driver for future kernels?
This would only leave ipath as the only offending driver.

  Luis
Re: [PATCHSET] printk, netconsole: implement reliable netconsole
On Fri, 17 Apr 2015 13:17:12 -0400 (EDT) David Miller wrote:

> From: Tejun Heo
> Date: Fri, 17 Apr 2015 12:28:26 -0400
>
> > On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
> >> If the sender side can wait for retransmission, why can't we use
> >> userspace programs (e.g. rsyslogd)?
> >
> > Because the system may be oopsing, ooming or thrashing excessively
> > rendering the userland inoperable and that's exactly when we want
> > those log messages to be transmitted out of the system.
>
> If userland cannot run properly, it is almost certain that neither will
> your complex reliability layer logic.
>
> I tend to agree with Tetsuo, that in-kernel netconsole should remain
> as simple as possible and once it starts to have any smarts and less
> trivial logic the job belongs in userspace.

Keep existing netconsole as simple as possible.
It is not meant as reliable, secure logging.

"Those who do not understand TCP are doomed to reinvent it"
Re: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode
From: Simon Xiao
Date: Tue, 21 Apr 2015 21:47:32 +

> Sorry - this patch should be sent to net-next so please ignore it.

Please do not top post.

First, provide exactly the necessary quoted material, and only the most
necessary quoted material.

Then place your response afterwards, rather than beforehand.

Again, please do not ever top-post or quote more material in your
response than necessary.

This is a very serious pet peeve of experienced people who read this list
every day, so please do your best to do this properly.
Re: [PATCH v3] ethernet: myri10ge: use arch_phys_wc_add()
From: "Luis R. Rodriguez"
Date: Tue, 21 Apr 2015 13:09:45 -0700

> From: "Luis R. Rodriguez"
>
> This driver already uses ioremap_wc() on the same range
> so when write-combining is available that will be used
> instead.
 ...
> Signed-off-by: Luis R. Rodriguez

I'll apply this with a driver maintainer's ACK.
Re: [PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode
From: six...@microsoft.com
Date: Tue, 21 Apr 2015 15:58:05 -0700

> From: Simon Xiao
>
> Signed-off-by: Simon Xiao
> Reviewed-by: K. Y. Srinivasan
> Reviewed-by: Haiyang Zhang

I just gave you feedback on this patch in response to your original
submission, do not ignore it.
[PATCH net-next,1/1] hv_netvsc: call dump_rndis_message() only in netvsc debug mode
From: Simon Xiao

Signed-off-by: Simon Xiao
Reviewed-by: K. Y. Srinivasan
Reviewed-by: Haiyang Zhang
---
 drivers/net/hyperv/hyperv_net.h   | 3 +++
 drivers/net/hyperv/netvsc_drv.c   | 8 ++++++++
 drivers/net/hyperv/rndis_filter.c | 3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index a10b316..c9be35e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -28,6 +28,9 @@
 #include
 #include
 
+/* flag for netvsc debug mode */
+extern int debug_mode;
+
 /* RSS related */
 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203 /* query only */
 #define OID_GEN_RECEIVE_SCALE_PARAMETERS 0x00010204  /* query and set */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a3a9d38..7c41864 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -52,6 +52,10 @@ static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+int debug_mode = 0;
+module_param(debug_mode, int, S_IRUGO);
+MODULE_PARM_DESC(debug_mode, "debug mode: zero(0) for non-debug mode; non-zero for debug mode");
+
 static void do_set_multicast(struct work_struct *w)
 {
 	struct net_device_context *ndevctx =
@@ -999,6 +1003,10 @@ static int __init netvsc_drv_init(void)
 		pr_info("Increased ring_size to %d (min allowed)\n",
 			ring_size);
 	}
+
+	if (debug_mode != 0)
+		pr_info("Run netvsc in debug mode");
+
 	return vmbus_driver_register(&netvsc_drv);
 }
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 0d92efe..a3f43f6 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -429,7 +429,8 @@ int rndis_filter_receive(struct hv_device *dev,
 
 	rndis_msg = pkt->data;
 
-	dump_rndis_message(dev, rndis_msg);
+	if (debug_mode != 0)
+		dump_rndis_message(dev, rndis_msg);
 
 	switch (rndis_msg->ndis_msg_type) {
 	case RNDIS_MSG_PACKET:
--
1.8.5.2
RE: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode
Sorry - this patch should be sent to net-next so please ignore it.

Thanks,
Simon

-----Original Message-----
From: six...@microsoft.com [mailto:six...@microsoft.com]
Sent: Tuesday, April 21, 2015 2:44 PM
To: KY Srinivasan; Haiyang Zhang; netdev@vger.kernel.org; linux-ker...@vger.kernel.org
Cc: Simon Xiao
Subject: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode

From: Simon Xiao

Signed-off-by: Simon Xiao
---
 drivers/net/hyperv/hyperv_net.h   | 3 +++
 drivers/net/hyperv/netvsc_drv.c   | 8 ++++++++
 drivers/net/hyperv/rndis_filter.c | 3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index a10b316..c9be35e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -28,6 +28,9 @@
 #include
 #include
 
+/* flag for netvsc debug mode */
+extern int debug_mode;
+
 /* RSS related */
 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203 /* query only */
 #define OID_GEN_RECEIVE_SCALE_PARAMETERS 0x00010204  /* query and set */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a3a9d38..7c41864 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -52,6 +52,10 @@ static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+int debug_mode = 0;
+module_param(debug_mode, int, S_IRUGO); MODULE_PARM_DESC(debug_mode,
+"debug mode: zero(0) for non-debug mode; non-zero for debug mode");
+
 static void do_set_multicast(struct work_struct *w)
 {
 	struct net_device_context *ndevctx =
@@ -999,6 +1003,10 @@ static int __init netvsc_drv_init(void)
 		pr_info("Increased ring_size to %d (min allowed)\n",
 			ring_size);
 	}
+
+	if (debug_mode != 0)
+		pr_info("Run netvsc in debug mode");
+
 	return vmbus_driver_register(&netvsc_drv);
 }
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 0d92efe..a3f43f6 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -429,7 +429,8 @@ int rndis_filter_receive(struct hv_device *dev,
 
 	rndis_msg = pkt->data;
 
-	dump_rndis_message(dev, rndis_msg);
+	if (debug_mode != 0)
+		dump_rndis_message(dev, rndis_msg);
 
 	switch (rndis_msg->ndis_msg_type) {
 	case RNDIS_MSG_PACKET:
--
1.8.5.2
Re: [PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode
From: six...@microsoft.com
Date: Tue, 21 Apr 2015 14:43:55 -0700

> From: Simon Xiao
>
> Signed-off-by: Simon Xiao

This commit message is lacking an explanation of why you want to do
what you are doing.

Also, we have an existing mechanism to control network device driver
debug logging output, please use it rather than invent your own
facility.
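As an illustration of the review comment above, here is a user-space sketch of gating an expensive dump behind a per-device message-level bitmask instead of a driver-private flag. It assumes the "existing mechanism" meant is the per-device msg_enable style of control; the thread does not spell this out. All names (sketch_dev, sketch_receive(), SKETCH_MSG_RX_STATUS) are hypothetical stand-ins, not netvsc or kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message-level bit, loosely modelled on NETIF_MSG_RX_STATUS. */
#define SKETCH_MSG_RX_STATUS 0x0800u

struct sketch_dev {
	unsigned int msg_enable;	/* per-device bitmask of enabled message classes */
	unsigned long dumps;		/* how many times the expensive dump ran */
};

/* Stand-in for an expensive per-packet dump routine. */
static void sketch_dump_message(struct sketch_dev *dev,
				const unsigned char *msg, size_t len)
{
	(void)msg;
	(void)len;
	dev->dumps++;
}

/* Receive path: the dump is skipped entirely unless its class is enabled. */
static void sketch_receive(struct sketch_dev *dev,
			   const unsigned char *msg, size_t len)
{
	assert(dev != NULL);
	if (dev->msg_enable & SKETCH_MSG_RX_STATUS)
		sketch_dump_message(dev, msg, len);
	/* ... normal packet processing would continue here ... */
}
```

The point of the design is that the check is per device and uses a shared vocabulary of message classes, so no new module parameter or global variable is needed.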
Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
On 21.04.2015 23:22, Joe Perches wrote: > On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote: (...) >> >> Perhaps it would be smarter to use (for both patches) $stat instead. >> This applies also to existing checks (like PREFER_ETHER_ADDR_COPY) >> so we can catch calls formatted like >> >> memset(very.long.structure->something.something_different42, >>0xFF, ETH_ALEN); > > Yes, likely that's true. > > checkpatch couldn't --fix it easily unless it's on a > single line though. True, True; If you prefer $line and ability to --fix - I'll use that in v3 > > As far as I can tell, there are ~120 of these "memcpy"s > in the tree, but there aren't any "memset"s like that > split into 2 or more lines. Some of them are probably candidates for ether_addr_copy(_unaligned) :) I'll probably take a look at them once I have both functions available. Regards, Mateusz > > Here's a list of the multiple line memcpy(..., ETH_ALEN) > uses that I found. > > arch/arm/mach-davinci/board-mityomapl138.c:144: > memcpy(soc_info->emac_pdata->mac_addr, > factory_config.mac, ETH_ALEN); > drivers/staging/rtl8712/rtl8712_recv.c:395: > memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src, > ETH_ALEN); > drivers/staging/rtl8712/rtl8712_recv.c:396: > memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst, > ETH_ALEN); > drivers/staging/rtl8712/rtl8712_recv.c:402: > memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src, > ETH_ALEN); > drivers/staging/rtl8712/rtl8712_recv.c:403: > memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst, > ETH_ALEN); > drivers/staging/rtl8712/os_intfs.c:398: > memcpy(pnetdev->dev_addr, > padapter->eeprompriv.mac_addr, ETH_ALEN); > drivers/staging/rtl8712/os_intfs.c:414: > memcpy(padapter->eeprompriv.mac_addr, > pnetdev->dev_addr, ETH_ALEN); > drivers/staging/rtl8712/rtl871x_ioctl_linux.c:107: > memcpy(wrqu.ap_addr.sa_data, pmlmepriv->cur_network.network.MacAddress, > ETH_ALEN); > drivers/staging/rtl8712/rtl871x_ioctl_linux.c:838: > memcpy(psecuritypriv->PMKIDList[psecuritypriv-> > 
PMKIDIndex].Bssid, strIssueBssid, ETH_ALEN); > drivers/staging/rtl8712/rtl871x_xmit.c:489: > memcpy(pwlanhdr->addr1, get_bssid(pmlmepriv), > ETH_ALEN); > drivers/staging/rtl8712/rtl871x_xmit.c:496: > memcpy(pwlanhdr->addr2, get_bssid(pmlmepriv), > ETH_ALEN); > drivers/staging/rtl8712/rtl871x_xmit.c:503: > memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv), > ETH_ALEN); > drivers/staging/rtl8712/rtl871x_xmit.c:507: > memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv), > ETH_ALEN); > drivers/staging/rtl8192e/rtllib_softmac_wx.c:125: > memcpy(wrqu->ap_addr.sa_data, > ieee->current_network.bssid, ETH_ALEN); > drivers/staging/rtl8192e/rtllib_tx.c:697: > memcpy(&header.addr1, ieee->current_network.bssid, > ETH_ALEN); > drivers/staging/rtl8192e/rtllib_tx.c:700: > memcpy(&header.addr3, > ieee->current_network.bssid, ETH_ALEN); > drivers/staging/rtl8192e/rtllib_tx.c:709: > memcpy(&header.addr3, ieee->current_network.bssid, > ETH_ALEN); > drivers/staging/rtl8192e/rtllib_softmac.c:3748: > memcpy(wrqu.ap_addr.sa_data, ieee->current_network.bssid, > ETH_ALEN); > drivers/staging/rtl8192u/ieee80211/ieee80211_softmac_wx.c:126: > memcpy(wrqu->ap_addr.sa_data, > ieee->current_network.bssid, ETH_ALEN); > drivers/staging/slicoss/slicoss.c:567: > memcpy(adapter->currmacaddr, adapter->macaddr, > ETH_ALEN); > drivers/staging/slicoss/slicoss.c:569: > memcpy(adapter->netdev->dev_addr, adapter->currmacaddr, > ETH_ALEN); > drivers/staging/rtl8723au/hal/usb_halinit.c:1020: > memcpy(pEEPROM->mac_addr, &hwinfo[EEPROM_MAC_ADDR_8723AU], > ETH_ALEN); > drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:327: > memcpy(pwlanhdr->addr1, > get_my_bssid23a(&pmlmeinfo->network), ETH_ALEN); > drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:328: > memcpy(pwlanhdr->addr2, myid(&padapter->eeprompriv), > ETH_ALEN); > drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:335: > memcpy(pwlanhdr->addr2, > get_my
Re: [PATCH] altera tse: add support for lixed-links.
From: Andreas Oetken
Date: Tue, 21 Apr 2015 18:32:25 +0200

Subject typo "lixed --> fixed"

> +	/* In the case of a fixed PHY, the DT node associated
> +	* to the PHY is the Ethernet MAC DT node.
> +	*/

Not indented properly, the second and third line of this comment
need one extra space before the first "*".
Re: [PATCH net] net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP
From: Or Gerlitz
Date: Tue, 21 Apr 2015 15:46:34 +0300

> From: Eran Ben Elisha
>
> Currently we parse max_msg_sz from the wrong offset in QUERY_DEV_CAP,
> fix to use the right offset.
>
> Fixes: 0b131561a7d6 ('net/mlx4_en: Add Flow control statistics [..]')
> Signed-off-by: Eran Ben Elisha
> Signed-off-by: Or Gerlitz

Applied, thanks.
Re: [PATCH V2] net: stmmac: use msleep instead of udelay for gpio reset
From: Michael Trimarchi
Date: Tue, 21 Apr 2015 13:16:13 +0200

> -		udelay(data->delays[0]);
 ...
> +		msleep(max(1U, data->delays[0] / 1000));

That looks very ugly with that max() expression in there.

Please find some clean way to get rid of it if you want to
make this conversion.

Thanks.
Re: [PATCH] tcp: set SOCK_NOSPACE under memory presure
From: Jason Baron
Date: Mon, 20 Apr 2015 20:05:13 + (GMT)

> Under tcp memory pressure, calling epoll_wait() in edge triggered
> mode after -EAGAIN, can result in an indefinite hang in epoll_wait(),
> even when there is sufficient memory available to continue making
> progress. The problem is that __sk_mem_schedule() can return 0,
> under memory pressure without having set the SOCK_NOSPACE flag. Thus,
> even though all the outstanding packets have been acked, we never
> get the EPOLLOUT that we are expecting from epoll_wait().
>
> This issue is currently limited to epoll when used in edge trigger
> mode, since 'tcp_poll()', does in fact currently set SOCK_NOSPACE.
> This is sufficient for poll()/select() and epoll() in level trigger
> mode. However, in edge trigger mode, epoll() is relying on the write
> path to set SOCK_NOSPACE. So I view this patch as bringing us into
> sync with poll()/select() and epoll() level trigger behavior.

Can you explain exactly how epoll in edge trigger mode is depending
upon SOCK_NOSPACE being set in this way?

I tried to read the epoll code and it just seems to call ->poll()
in the normal way when returning event state.

Also, there are exactly two call sites of sk_stream_wait_memory()
for TCP, and they both look like this:

wait_for_sndbuf:
			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
			tcp_push(sk, flags & ~MSG_MORE, mss_now,
				 TCP_NAGLE_PUSH, size_goal);

			if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
				goto do_error;

Definitely, the person who wrote this code intended SOCK_NOSPACE to
be set only when we are waiting for sndbuf space rather than just
memory.

At a minimum, I need a more detailed commit log message for this,
showing the exact code paths in epoll() that have this requirement
and thus create the looping condition. Because with a casual scan
of the epoll code I could not figure it out.

Thanks.
Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote: > Hi Joe, > > On 20.04.2015 03:13, Joe Perches wrote: > > On Mon, 2015-04-20 at 00:16 +0200, Mateusz Kulikowski wrote: > >> Suggest using eth_zero_addr() or eth_broadcast_addr() instead of memset(). > > > > Hi again Mateusz > > > >> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl > > [] > >> @@ -5042,6 +5042,22 @@ sub process { > >> "Prefer ether_addr_equal() or > >> ether_addr_equal_unaligned() over memcmp()\n" . $herecurr) > >>} > >> > >> +# check for memset(foo, 0x0, ETH_ALEN) that could be eth_zero_addr > >> +# check for memset(foo, 0xFF, ETH_ALEN) that could be eth_broadcast_addr > >> + if ($^V && $^V ge 5.10.0 && > >> + $line =~ > >> /^\+(?:.*?)\bmemset\s*\(\s*$FuncArg\s*,\s*$FuncArg\s*\,\s*ETH_ALEN\s*\)/s) > >> { > > > > Because you are working with $line and not $stat, > > the last /s isn't useful here. > > > > $line is always a single line. > > Perhaps it would be smarter to use (for both patches) $stat instead. > This applies also to existing checks (like PREFER_ETHER_ADDR_COPY) > so we can catch calls formatted like > > memset(very.long.structure->something.something_different42, >0xFF, ETH_ALEN); Yes, likely that's true. checkpatch couldn't --fix it easily unless it's on a single line though. As far as I can tell, there are ~120 of these "memcpy"s in the tree, but there aren't any "memset"s like that split into 2 or more lines. Here's a list of the multiple line memcpy(..., ETH_ALEN) uses that I found. 
arch/arm/mach-davinci/board-mityomapl138.c:144: memcpy(soc_info->emac_pdata->mac_addr, factory_config.mac, ETH_ALEN); drivers/staging/rtl8712/rtl8712_recv.c:395: memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src, ETH_ALEN); drivers/staging/rtl8712/rtl8712_recv.c:396: memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst, ETH_ALEN); drivers/staging/rtl8712/rtl8712_recv.c:402: memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->src, ETH_ALEN); drivers/staging/rtl8712/rtl8712_recv.c:403: memcpy(skb_push(sub_skb, ETH_ALEN), pattrib->dst, ETH_ALEN); drivers/staging/rtl8712/os_intfs.c:398: memcpy(pnetdev->dev_addr, padapter->eeprompriv.mac_addr, ETH_ALEN); drivers/staging/rtl8712/os_intfs.c:414: memcpy(padapter->eeprompriv.mac_addr, pnetdev->dev_addr, ETH_ALEN); drivers/staging/rtl8712/rtl871x_ioctl_linux.c:107: memcpy(wrqu.ap_addr.sa_data, pmlmepriv->cur_network.network.MacAddress, ETH_ALEN); drivers/staging/rtl8712/rtl871x_ioctl_linux.c:838: memcpy(psecuritypriv->PMKIDList[psecuritypriv-> PMKIDIndex].Bssid, strIssueBssid, ETH_ALEN); drivers/staging/rtl8712/rtl871x_xmit.c:489: memcpy(pwlanhdr->addr1, get_bssid(pmlmepriv), ETH_ALEN); drivers/staging/rtl8712/rtl871x_xmit.c:496: memcpy(pwlanhdr->addr2, get_bssid(pmlmepriv), ETH_ALEN); drivers/staging/rtl8712/rtl871x_xmit.c:503: memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv), ETH_ALEN); drivers/staging/rtl8712/rtl871x_xmit.c:507: memcpy(pwlanhdr->addr3, get_bssid(pmlmepriv), ETH_ALEN); drivers/staging/rtl8192e/rtllib_softmac_wx.c:125: memcpy(wrqu->ap_addr.sa_data, ieee->current_network.bssid, ETH_ALEN); drivers/staging/rtl8192e/rtllib_tx.c:697: memcpy(&header.addr1, ieee->current_network.bssid, ETH_ALEN); drivers/staging/rtl8192e/rtllib_tx.c:700: memcpy(&header.addr3, ieee->current_network.bssid, ETH_ALEN); drivers/staging/rtl8192e/rtllib_tx.c:709: memcpy(&header.addr3, ieee->current_network.bssid, ETH_ALEN); drivers/staging/rtl8192e/rtllib_softmac.c:3748: memcpy(wrqu.ap_addr.sa_data, ieee->current_network.bssid, ETH_ALEN); 
drivers/staging/rtl8192u/ieee80211/ieee80211_softmac_wx.c:126: memcpy(wrqu->ap_addr.sa_data, ieee->current_network.bssid, ETH_ALEN); drivers/staging/slicoss/slicoss.c:567: memcpy(adapter->currmacaddr, adapter->macaddr, ETH_ALEN); drivers/staging/slicoss/slicoss.c:569: memcpy(adapter->netdev->dev_addr, adapter->currmacaddr, ETH_ALEN); drivers/staging/rtl8723au/hal/usb_halinit.c:1020: memcpy(p
Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
Hi Joe,

On 20.04.2015 03:13, Joe Perches wrote:
> On Mon, 2015-04-20 at 00:16 +0200, Mateusz Kulikowski wrote:
>> Suggest using eth_zero_addr() or eth_broadcast_addr() instead of memset().
>
> Hi again Mateusz
>
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
>> @@ -5042,6 +5042,22 @@ sub process {
>>  			     "Prefer ether_addr_equal() or ether_addr_equal_unaligned() over memcmp()\n" . $herecurr)
>>  		}
>>
>> +# check for memset(foo, 0x0, ETH_ALEN) that could be eth_zero_addr
>> +# check for memset(foo, 0xFF, ETH_ALEN) that could be eth_broadcast_addr
>> +		if ($^V && $^V ge 5.10.0 &&
>> +		    $line =~ /^\+(?:.*?)\bmemset\s*\(\s*$FuncArg\s*,\s*$FuncArg\s*\,\s*ETH_ALEN\s*\)/s) {
>
> Because you are working with $line and not $stat,
> the last /s isn't useful here.
>
> $line is always a single line.

Perhaps it would be smarter to use (for both patches) $stat instead.
This applies also to existing checks (like PREFER_ETHER_ADDR_COPY)
so we can catch calls formatted like

memset(very.long.structure->something.something_different42,
       0xFF, ETH_ALEN);

Regards,
Mateusz
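For readers unfamiliar with the helpers the proposed check suggests, here is a small user-space sketch of what the memset() conversions amount to. The sketch_ functions are hypothetical stand-ins written for this note; they are assumed to mirror the behaviour of the kernel's eth_zero_addr() and eth_broadcast_addr(), which wrap exactly these memset() calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6	/* length of an Ethernet (MAC) address in octets */

/* Stand-in for eth_zero_addr(): replaces memset(addr, 0x00, ETH_ALEN). */
static void sketch_eth_zero_addr(unsigned char *addr)
{
	assert(addr != NULL);
	memset(addr, 0x00, ETH_ALEN);
}

/* Stand-in for eth_broadcast_addr(): replaces memset(addr, 0xff, ETH_ALEN). */
static void sketch_eth_broadcast_addr(unsigned char *addr)
{
	assert(addr != NULL);
	memset(addr, 0xff, ETH_ALEN);
}
```

A call split over two lines, like the memset() example above, is the case that a $line-only match would miss and a $stat match would catch.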
[PATCH 2/3] mpls: Per-device enabling of packet input
An MPLS network is a single trust domain where the edges must be in control of what labels make their way into the core. The simplest way of ensuring this is for the edge device to always impose the labels, and not allow forward labeled traffic from untrusted neighbours. This is achieved by allowing a per-device configuration of whether MPLS traffic input from that interface should be processed or not. To be secure by default, the default state is changed to MPLS being disabled on all interfaces (except the loopback) unless explicitly enabled and no global option is provided to change the default. Whilst this differs from other protocols (e.g. IPv6), network operators are used to explicitly enabling MPLS forwarding on interfaces, and with the number of links to the MPLS core typically fairly low this doesn't present too much of a burden on operators. Cc: "Eric W. Biederman" Signed-off-by: Robert Shearman --- Documentation/networking/mpls-sysctl.txt | 9 net/mpls/af_mpls.c | 75 +++- net/mpls/internal.h | 3 ++ 3 files changed, 85 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt index 639ddf0ece9b..9ed15f86c17c 100644 --- a/Documentation/networking/mpls-sysctl.txt +++ b/Documentation/networking/mpls-sysctl.txt @@ -18,3 +18,12 @@ platform_labels - INTEGER Possible values: 0 - 1048575 Default: 0 + +conf//input - BOOL + Control whether packets can be input on this interface. + + If disabled, packets will be discarded without further + processing. 
+ + 0 - disabled (default) + not 0 - enabled diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index ad45017eed99..7ac93082e3dc 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -150,7 +150,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev, /* Careful this entire function runs inside of an rcu critical section */ mdev = mpls_dev_get(dev); - if (!mdev) + if (!mdev || !mdev->input_enabled) goto drop; if (skb->pkt_type != PACKET_HOST) @@ -438,6 +438,60 @@ errout: return err; } +#define MPLS_PERDEV_SYSCTL_OFFSET(field) \ + (&((struct mpls_dev *)0)->field) + +static const struct ctl_table mpls_dev_table[] = { + { + .procname = "input", + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + .data = MPLS_PERDEV_SYSCTL_OFFSET(input_enabled), + }, + { } +}; + +static int mpls_dev_sysctl_register(struct net_device *dev, + struct mpls_dev *mdev) +{ + char path[sizeof("net/mpls/conf/") + IFNAMSIZ]; + struct ctl_table *table; + int i; + + table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL); + if (!table) + goto out; + + /* Table data contains only offsets relative to the base of +* the mdev at this point, so make them absolute. 
+*/ + for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++) + table[i].data = (char *)mdev + (uintptr_t)table[i].data; + + snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name); + + mdev->sysctl = register_net_sysctl(dev_net(dev), path, table); + if (!mdev->sysctl) + goto free; + + return 0; + +free: + kfree(table); +out: + return -ENOBUFS; +} + +static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev) +{ + struct ctl_table *table; + + table = mdev->sysctl->ctl_table_arg; + unregister_net_sysctl_table(mdev->sysctl); + kfree(table); +} + static struct mpls_dev *mpls_add_dev(struct net_device *dev) { struct mpls_dev *mdev; @@ -449,9 +503,24 @@ static struct mpls_dev *mpls_add_dev(struct net_device *dev) if (!mdev) return ERR_PTR(err); + /* Enable MPLS by default on loopback devices, since this +* doesn't represent a security boundary and is required for the +* lookup of inner labels for LSPs terminating on this router. +*/ + if (dev->flags & IFF_LOOPBACK) + mdev->input_enabled = 1; + + err = mpls_dev_sysctl_register(dev, mdev); + if (err) + goto free; + rcu_assign_pointer(dev->mpls_ptr, mdev); return mdev; + +free: + kfree(mdev); + return ERR_PTR(err); } static void mpls_ifdown(struct net_device *dev) @@ -475,6 +544,8 @@ static void mpls_ifdown(struct net_device *dev) if (!mdev) return; + mpls_dev_sysctl_unregister(mdev); + RCU_INIT_POINTER(dev->mpls_ptr, NULL); kfree(mdev); @@ -958,7 +1029,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write, return ret; } -
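The offset trick used by mpls_dev_sysctl_register() above can be sketched in userspace C. This is only an illustration under stated assumptions: struct dev_state, struct knob and knob_table_for() are hypothetical stand-ins for struct mpls_dev, struct ctl_table and the registration helper, and malloc()/memcpy() stand in for kmemdup(). The template stores each field's offset in the data pointer, and every per-device copy rebases that offset onto its own state.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct mpls_dev. */
struct dev_state {
	int input_enabled;
};

/* Hypothetical stand-in for struct ctl_table. */
struct knob {
	const char *name;
	void *data;	/* offset in the template, absolute pointer in a copy */
};

#define DEV_STATE_OFFSET(field) ((void *)offsetof(struct dev_state, field))

static const struct knob knob_template[] = {
	{ .name = "input", .data = DEV_STATE_OFFSET(input_enabled) },
	{ 0 }	/* terminator */
};

#define KNOB_COUNT (sizeof(knob_template) / sizeof(knob_template[0]))

/* Copy the template and rebase each offset onto one device's state. */
static struct knob *knob_table_for(struct dev_state *st)
{
	struct knob *table = malloc(sizeof(knob_template));
	size_t i;

	if (!table)
		return NULL;
	memcpy(table, knob_template, sizeof(knob_template));

	for (i = 0; i < KNOB_COUNT; i++) {
		if (!table[i].name)
			continue;	/* leave the terminator untouched */
		table[i].data = (char *)st + (uintptr_t)table[i].data;
	}
	return table;
}
```

Writing through one copy's data pointer then only touches the device that copy was built for, which is the property the per-interface "input" knob relies on.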
[PATCH 1/3] mpls: Per-device MPLS state
Add per-device MPLS state to supported interfaces. Use the presence of this state in mpls_route_add to determine that this is a supported interface. Use the presence of mpls_dev to drop packets that arrived on an unsupported interface - previously they were allowed through. Cc: "Eric W. Biederman" Signed-off-by: Robert Shearman --- include/linux/netdevice.h | 4 net/mpls/af_mpls.c| 50 +-- net/mpls/internal.h | 3 +++ 3 files changed, 55 insertions(+), 2 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index bcbde799ec69..dae106a3a998 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -60,6 +60,7 @@ struct phy_device; struct wireless_dev; /* 802.15.4 specific */ struct wpan_dev; +struct mpls_dev; void netdev_set_default_ethtool_ops(struct net_device *dev, const struct ethtool_ops *ops); @@ -1627,6 +1628,9 @@ struct net_device { void*ax25_ptr; struct wireless_dev *ieee80211_ptr; struct wpan_dev *ieee802154_ptr; +#if IS_ENABLED(CONFIG_MPLS_ROUTING) + struct mpls_dev __rcu *mpls_ptr; +#endif /* * Cache lines mostly used on receive path (including eth_type_trans()) diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index db8a2ea6d4de..ad45017eed99 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -53,6 +53,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index) return rt; } +static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev) +{ + return rcu_dereference_rtnl(dev->mpls_ptr); +} + static bool mpls_output_possible(const struct net_device *dev) { return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev); @@ -136,6 +141,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev, struct mpls_route *rt; struct mpls_entry_decoded dec; struct net_device *out_dev; + struct mpls_dev *mdev; unsigned int hh_len; unsigned int new_header_size; unsigned int mtu; @@ -143,6 +149,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev, /* 
Careful this entire function runs inside of an rcu critical section */ + mdev = mpls_dev_get(dev); + if (!mdev) + goto drop; + if (skb->pkt_type != PACKET_HOST) goto drop; @@ -352,9 +362,9 @@ static int mpls_route_add(struct mpls_route_config *cfg) if (!dev) goto errout; - /* For now just support ethernet devices */ + /* Ensure this is a supported device */ err = -EINVAL; - if ((dev->type != ARPHRD_ETHER) && (dev->type != ARPHRD_LOOPBACK)) + if (!mpls_dev_get(dev)) goto errout; err = -EINVAL; @@ -428,10 +438,27 @@ errout: return err; } +static struct mpls_dev *mpls_add_dev(struct net_device *dev) +{ + struct mpls_dev *mdev; + int err = -ENOMEM; + + ASSERT_RTNL(); + + mdev = kzalloc(sizeof(*mdev), GFP_KERNEL); + if (!mdev) + return ERR_PTR(err); + + rcu_assign_pointer(dev->mpls_ptr, mdev); + + return mdev; +} + static void mpls_ifdown(struct net_device *dev) { struct mpls_route __rcu **platform_label; struct net *net = dev_net(dev); + struct mpls_dev *mdev; unsigned index; platform_label = rtnl_dereference(net->mpls.platform_label); @@ -443,14 +470,33 @@ static void mpls_ifdown(struct net_device *dev) continue; rt->rt_dev = NULL; } + + mdev = mpls_dev_get(dev); + if (!mdev) + return; + + RCU_INIT_POINTER(dev->mpls_ptr, NULL); + + kfree(mdev); } static int mpls_dev_notify(struct notifier_block *this, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct mpls_dev *mdev; switch(event) { + case NETDEV_REGISTER: + /* For now just support ethernet devices */ + if ((dev->type == ARPHRD_ETHER) || + (dev->type == ARPHRD_LOOPBACK)) { + mdev = mpls_add_dev(dev); + if (IS_ERR(mdev)) + return notifier_from_errno(PTR_ERR(mdev)); + } + break; + case NETDEV_UNREGISTER: mpls_ifdown(dev); break; diff --git a/net/mpls/internal.h b/net/mpls/internal.h index fb6de92052c4..8090cb3099b4 100644 --- a/net/mpls/internal.h +++ b/net/mpls/internal.h @@ -22,6 +22,9 @@ struct mpls_entry_decoded { u8 bos; }; +struct mpls_dev { +}; + struct sk_buff; 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4
[PATCH 3/3] mpls: Prevent use of implicit NULL label as outgoing label
The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.

Suggested-by: "Eric W. Biederman"
Signed-off-by: Robert Shearman
---
 net/mpls/af_mpls.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7ac93082e3dc..eb8dc411859d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -653,6 +653,15 @@ int nla_get_labels(const struct nlattr *nla,
 		if ((dec.bos != bos) || dec.ttl || dec.tc)
 			return -EINVAL;
 
+		switch (dec.label) {
+		case LABEL_IMPLICIT_NULL:
+			/* RFC3032: This is a label that an LSR may
+			 * assign and distribute, but which never
+			 * actually appears in the encapsulation.
+			 */
+			return -EINVAL;
+		}
+
 		label[i] = dec.label;
 	}
 	*labels = nla_labels;
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
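The checks in this hunk can be illustrated with a small userspace sketch. It assumes label stack entries are already in host byte order and laid out as in RFC 3032 (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack, 8-bit TTL). The names lse_decode() and check_outgoing_labels() are hypothetical and are not the kernel's helpers; only the value 3 for the implicit NULL label comes from the RFC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3u	/* reserved label value, RFC 3032 */

struct lse_decoded {
	uint32_t label;
	uint8_t tc;
	bool bos;
	uint8_t ttl;
};

/* Split one 32-bit label stack entry (host byte order) into its fields. */
static struct lse_decoded lse_decode(uint32_t entry)
{
	struct lse_decoded d;

	d.label = entry >> 12;
	d.tc = (entry >> 9) & 0x7;
	d.bos = (entry >> 8) & 0x1;
	d.ttl = entry & 0xff;
	return d;
}

/*
 * Return 0 if the entries are acceptable as an outgoing label stack,
 * -1 otherwise: only the last entry may carry the bottom-of-stack bit,
 * TTL and TC must be zero, and implicit NULL is rejected.
 */
static int check_outgoing_labels(const uint32_t *entries, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct lse_decoded d = lse_decode(entries[i]);
		bool want_bos = (i == n - 1);

		if (d.bos != want_bos || d.ttl || d.tc)
			return -1;
		if (d.label == LABEL_IMPLICIT_NULL)
			return -1;
	}
	return 0;
}
```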
[PATCH 0/3] mpls: ABI changes for security and correctness
These changes stop MPLS from being enabled by default on all interfaces
when it is in use, for security, and ensure that a label that is not
valid as an outgoing label cannot be added in MPLS routes.

This series contains three ABI/behaviour-affecting changes which have
been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
improvements" without any further modification. These changes need to
be considered for 4.1, otherwise we'll be stuck with the current
behaviour/ABI forever.

Robert Shearman (3):
  mpls: Per-device MPLS state
  mpls: Per-device enabling of packet input
  mpls: Prevent use of implicit NULL label as outgoing label

 Documentation/networking/mpls-sysctl.txt |   9 +++
 include/linux/netdevice.h                |   4 +
 net/mpls/af_mpls.c                       | 132 ++-
 net/mpls/internal.h                      |   6 ++
 4 files changed, 148 insertions(+), 3 deletions(-)

-- 
2.1.4
Re: [RFC PATCH] tc filter not show last u32 filter
On Tue, Apr 21, 2015 at 1:07 PM, Vitaly E. Lavrov wrote:
>
> A similar patch is already exists in the
> git.kernel.org/linux-stable.git/master (commit
> b057df24a7536cce6c372efe9d0e3d1558afedf4)
> and linux-4.0.y.
>
> The patch can be applied to branches of the kernel 3.14, 3.18 and 3.19. Why
> is this not done at once?
>

That commit should already have been backported to all stable kernels,
so why do we need a different one?
[PATCH 1/1] v2: Driver: hv: netvsc: call dump_rndis_message() only in netvsc debug mode
From: Simon Xiao

Signed-off-by: Simon Xiao
---
 drivers/net/hyperv/hyperv_net.h   | 3 +++
 drivers/net/hyperv/netvsc_drv.c   | 8 
 drivers/net/hyperv/rndis_filter.c | 3 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index a10b316..c9be35e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -28,6 +28,9 @@
 #include
 #include
 
+/* flag for netvsc debug mode */
+extern int debug_mode;
+
 /* RSS related */
 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203 /* query only */
 #define OID_GEN_RECEIVE_SCALE_PARAMETERS 0x00010204 /* query and set */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a3a9d38..7c41864 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -52,6 +52,10 @@ static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+int debug_mode = 0;
+module_param(debug_mode, int, S_IRUGO);
+MODULE_PARM_DESC(debug_mode, "debug mode: zero(0) for non-debug mode; non-zero for debug mode");
+
 static void do_set_multicast(struct work_struct *w)
 {
 	struct net_device_context *ndevctx =
@@ -999,6 +1003,10 @@ static int __init netvsc_drv_init(void)
 		pr_info("Increased ring_size to %d (min allowed)\n",
 			ring_size);
 	}
+
+	if (debug_mode != 0)
+		pr_info("Run netvsc in debug mode");
+
 	return vmbus_driver_register(&netvsc_drv);
 }
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 0d92efe..a3f43f6 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -429,7 +429,8 @@ int rndis_filter_receive(struct hv_device *dev,
 
 	rndis_msg = pkt->data;
 
-	dump_rndis_message(dev, rndis_msg);
+	if (debug_mode != 0)
+		dump_rndis_message(dev, rndis_msg);
 
 	switch (rndis_msg->ndis_msg_type) {
 	case RNDIS_MSG_PACKET:
-- 
1.8.5.2
Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> I know there are concerns about this, in particular because C11 and
> POSIX both require tv_nsec to be 'long', unlike timeval->tv_usec,
> which is a 'suseconds_t' and can be defined as 'long long'.
>
> a)
>
> struct timespec {
> 	time_t tv_sec;
> 	long long tv_nsec; /* or typedef long long snseconds_t */
> };
>
> This is not directly compatible with C11 or POSIX.1-2008, but it
> matches what we do inside of 64-bit kernels, so probably has the
> highest chance of working correctly in practice

After reading Linus' rant in the x32 thread again (thanks for the
reminder), and looking at b/c/d - which rate between ugly and butt
ugly - I think we should go for a) and screw POSIX and C11 as those
committee dinosaurs seem to completely ignore the 2038 problem on
32bit machines. At least I have not found any hint that these folks
care at all.

So why should we comply with something which is completely useless?

That also makes the question about the upper 32bits check moot, so
it's the simplest and clearest of the possible solutions.

Thanks,

	tglx
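Option a) can be pictured with a small userspace sketch. The struct name timespec64_sketch and the helper below are hypothetical and are not a proposed kernel or libc interface. The point it tries to show is one possible reading of the "upper 32bits" remark: when tv_nsec is a full 'long long', the ordinary range check already rejects any value with bits set above bit 31, so no separate check of the upper half is needed.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical layout for option a): both members are 64 bits wide. */
struct timespec64_sketch {
	int64_t tv_sec;
	long long tv_nsec;
};

/* A value is valid when the nanosecond part lies in [0, 1e9). */
static bool timespec64_sketch_valid(const struct timespec64_sketch *ts)
{
	return ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}
```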
[PATCH v3] ethernet: myri10ge: use arch_phys_wc_add()
From: "Luis R. Rodriguez" This driver already uses ioremap_wc() on the same range so when write-combining is available that will be used instead. Cc: Hyong-Youb Kim Cc: Andy Lutomirski Cc: Suresh Siddha Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Juergen Gross Cc: Daniel Vetter Cc: netdev@vger.kernel.org Cc: Juergen Gross Cc: Daniel Vetter Cc: Andy Lutomirski Cc: Dave Airlie Cc: Antonino Daplas Cc: Jean-Christophe Plagniol-Villard Cc: Tomi Valkeinen Cc: linux-ker...@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Luis R. Rodriguez --- drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 38 ++-- 1 file changed, 9 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c index 1412f5a..2bae502 100644 --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c @@ -69,11 +69,7 @@ #include #include #include -#include #include -#ifdef CONFIG_MTRR -#include -#endif #include #include "myri10ge_mcp.h" @@ -242,8 +238,7 @@ struct myri10ge_priv { unsigned int rdma_tags_available; int intr_coal_delay; __be32 __iomem *intr_coal_delay_ptr; - int mtrr; - int wc_enabled; + int wc_cookie; int down_cnt; wait_queue_head_t down_wq; struct work_struct watchdog_work; @@ -1905,7 +1900,7 @@ static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = { "tx_aborted_errors", "tx_carrier_errors", "tx_fifo_errors", "tx_heartbeat_errors", "tx_window_errors", /* device-specific stats */ - "tx_boundary", "WC", "irq", "MSI", "MSIX", + "tx_boundary", "irq", "MSI", "MSIX", "read_dma_bw_MBs", "write_dma_bw_MBs", "read_write_dma_bw_MBs", "serial_number", "watchdog_resets", #ifdef CONFIG_MYRI10GE_DCA @@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev, data[i] = ((u64 *)&link_stats)[i]; data[i++] = (unsigned int)mgp->tx_boundary; - data[i++] = (unsigned int)mgp->wc_enabled; data[i++] = (unsigned int)mgp->pdev->irq; data[i++] = 
(unsigned int)mgp->msi_enabled; data[i++] = (unsigned int)mgp->msix_enabled; @@ -4040,14 +4034,7 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent) mgp->board_span = pci_resource_len(pdev, 0); mgp->iomem_base = pci_resource_start(pdev, 0); - mgp->mtrr = -1; - mgp->wc_enabled = 0; -#ifdef CONFIG_MTRR - mgp->mtrr = mtrr_add(mgp->iomem_base, mgp->board_span, -MTRR_TYPE_WRCOMB, 1); - if (mgp->mtrr >= 0) - mgp->wc_enabled = 1; -#endif + mgp->wc_cookie = arch_phys_wc_add(mgp->iomem_base, mgp->board_span); mgp->sram = ioremap_wc(mgp->iomem_base, mgp->board_span); if (mgp->sram == NULL) { dev_err(&pdev->dev, "ioremap failed for %ld bytes at 0x%lx\n", @@ -4146,14 +4133,14 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent) goto abort_with_state; } if (mgp->msix_enabled) - dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, WC %s\n", + dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, MTRR %s, WC Enabled\n", mgp->num_slices, mgp->tx_boundary, mgp->fw_name, -(mgp->wc_enabled ? "Enabled" : "Disabled")); +(mgp->wc_cookie > 0 ? "Enabled" : "Disabled")); else - dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, WC %s\n", + dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, MTRR %s, WC Enabled\n", mgp->msi_enabled ? "MSI" : "xPIC", pdev->irq, mgp->tx_boundary, mgp->fw_name, -(mgp->wc_enabled ? "Enabled" : "Disabled")); +(mgp->wc_cookie > 0 ? 
"Enabled" : "Disabled")); board_number++; return 0; @@ -4175,10 +4162,7 @@ abort_with_ioremap: iounmap(mgp->sram); abort_with_mtrr: -#ifdef CONFIG_MTRR - if (mgp->mtrr >= 0) - mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span); -#endif + arch_phys_wc_del(mgp->wc_cookie); dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd), mgp->cmd, mgp->cmd_bus); @@ -4220,11 +4204,7 @@ static void myri10ge_remove(struct pci_dev *pdev) pci_restore_state(pdev); iounmap(mgp->sram); - -#ifdef CONFIG_MTRR - if (mgp->mtrr >= 0) - mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span); -#endif + arch_phys_wc_del(mgp->wc_cookie); myri10ge_free_slices(mgp); kfree(mgp->msix_vectors); dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd), -- 2.3.2.209.gd67f9d5.dirty -- To unsubscribe from this list: send the line "unsubscribe netdev" in the bo
Re: [RFC PATCH] tc filter not show last u32 filter
On 21.04.2015 21:17, Sergei Shtylyov wrote:
> Hello.
>
> On 04/21/2015 08:53 PM, Vitaly E. Lavrov wrote:
>
>> "tc filter show" does not show last U32 filter on 32-bit systems (tested on x86).
>> Additional condition: filter does not have action and CONFIG_NET_CLS_ACT=y
>>
>> Example:
>>
>> tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20
>
> You need to send patches against the latest kernel. Look for DaveM's
> 'net.git' repo on git.kernel.org.
>
> Please run your patches thru scripts/checkpatch.pl; space is needed after *if*.

Thanks for the reminder about the rules.
Sorry that I have not checked all branches of the kernel.

A similar patch already exists in
git.kernel.org/linux-stable.git/master (commit
b057df24a7536cce6c372efe9d0e3d1558afedf4) and linux-4.0.y.

The patch can be applied to the 3.14, 3.18 and 3.19 kernel branches.
Why is this not done at once?
Re: [PATCH] etherdevice: Add ether_addr_copy_unaligned
On 20.04.2015 20:19, David Miller wrote:
(...)
> I'd rather see something like this submitted in a patch series alongside
> some actual uses.
>
> So I'm tossing this for now.
>

Ok; I'll add it to a series where I need it (rtl8192e)

Regards,
Mateusz
Re: [PATCH] tcp: add memory barriers to write space paths
From: Jason Baron
Date: Mon, 20 Apr 2015 20:05:07 + (GMT)

> Ensure that we either see that the buffer has write space
> in tcp_poll() or that we perform a wakeup from the input
> side. Did not run into any actual problem here, but thought
> that we should make things explicit.
>
> Signed-off-by: Jason Baron

This looks fine, applied, thanks.
[RFC 0/3] tc cleanup?
Hi,

I've started cleaning TC a bit. Before I go too far, need your
feedback on this RFC:

patch 1 - stop abuse of return values in ingress qdisc
patch 2 - deprecate TC_ACT_QUEUED
patch 3 - reduce copy-paste around tc_classify()

Lightly tested so far. Waiting for John's and other tc test scripts.

Alexei Starovoitov (3):
  tc: fix return values of ingress qdisc
  tc: deprecate TC_ACT_QUEUED
  tc: cleanup tc_classify

 include/net/pkt_sched.h      |    2 ++
 include/net/sch_generic.h    |    7 +++
 include/uapi/linux/pkt_cls.h |    2 +-
 net/core/dev.c               |    8 ++--
 net/sched/sch_api.c          |   22 ++
 net/sched/sch_atm.c          |    1 -
 net/sched/sch_cbq.c          |    1 -
 net/sched/sch_choke.c        |   17 +++--
 net/sched/sch_drr.c          |   18 +++---
 net/sched/sch_dsmark.c       |    1 -
 net/sched/sch_fq_codel.c     |   25 ++---
 net/sched/sch_hfsc.c         |    1 -
 net/sched/sch_htb.c          |    1 -
 net/sched/sch_ingress.c      |   10 --
 net/sched/sch_multiq.c       |    1 -
 net/sched/sch_prio.c         |    1 -
 net/sched/sch_qfq.c          |   16 ++--
 net/sched/sch_sfb.c          |   17 +++--
 net/sched/sch_sfq.c          |   26 ++
 19 files changed, 61 insertions(+), 116 deletions(-)

-- 
1.7.9.5
[RFC 2/3] tc: deprecate TC_ACT_QUEUED
TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN. Get rid of redundant checks in all qdiscs. Instead do it once. Signed-off-by: Alexei Starovoitov --- include/uapi/linux/pkt_cls.h |2 +- net/sched/sch_api.c |2 ++ net/sched/sch_atm.c |1 - net/sched/sch_cbq.c |1 - net/sched/sch_choke.c|1 - net/sched/sch_drr.c |1 - net/sched/sch_dsmark.c |1 - net/sched/sch_fq_codel.c |1 - net/sched/sch_hfsc.c |1 - net/sched/sch_htb.c |1 - net/sched/sch_ingress.c |1 - net/sched/sch_multiq.c |1 - net/sched/sch_prio.c |1 - net/sched/sch_qfq.c |1 - net/sched/sch_sfb.c |1 - net/sched/sch_sfq.c |1 - 16 files changed, 3 insertions(+), 15 deletions(-) diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index bf08e76bf505..208e5ed5256c 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -102,7 +102,7 @@ enum { #define TC_ACT_SHOT2 #define TC_ACT_PIPE3 #define TC_ACT_STOLEN 4 -#define TC_ACT_QUEUED 5 +#define TC_ACT_QUEUED 5 /* deprecated. same as TC_ACT_STOLEN */ #define TC_ACT_REPEAT 6 #define TC_ACT_JUMP0x1000 diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index ad9eed70bc8f..f7950327bb22 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1820,6 +1820,8 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp, #ifdef CONFIG_NET_CLS_ACT if (err != TC_ACT_RECLASSIFY && skb->tc_verd) skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0); + if (err == TC_ACT_QUEUED) + return TC_ACT_STOLEN; #endif return err; } diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c index e3e2cc5fd068..7ef0bf4bdce6 100644 --- a/net/sched/sch_atm.c +++ b/net/sched/sch_atm.c @@ -396,7 +396,6 @@ done: /*@@@ looks good ... 
but it's not supposed to work :-) */ #ifdef CONFIG_NET_CLS_ACT switch (result) { - case TC_ACT_QUEUED: case TC_ACT_STOLEN: kfree_skb(skb); return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index beeb75f80fdb..eca4725e3273 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -258,7 +258,6 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) goto fallback; #ifdef CONFIG_NET_CLS_ACT switch (result) { - case TC_ACT_QUEUED: case TC_ACT_STOLEN: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; case TC_ACT_SHOT: diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c index c009eb9045ce..a3bc7cf151d3 100644 --- a/net/sched/sch_choke.c +++ b/net/sched/sch_choke.c @@ -212,7 +212,6 @@ static bool choke_classify(struct sk_buff *skb, #ifdef CONFIG_NET_CLS_ACT switch (result) { case TC_ACT_STOLEN: - case TC_ACT_QUEUED: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; case TC_ACT_SHOT: return false; diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c index 338706092c27..1051c5d4e85b 100644 --- a/net/sched/sch_drr.c +++ b/net/sched/sch_drr.c @@ -335,7 +335,6 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch, if (result >= 0) { #ifdef CONFIG_NET_CLS_ACT switch (result) { - case TC_ACT_QUEUED: case TC_ACT_STOLEN: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; case TC_ACT_SHOT: diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c index 66700a6116aa..ce9d4123cbbe 100644 --- a/net/sched/sch_dsmark.c +++ b/net/sched/sch_dsmark.c @@ -236,7 +236,6 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch) switch (result) { #ifdef CONFIG_NET_CLS_ACT - case TC_ACT_QUEUED: case TC_ACT_STOLEN: kfree_skb(skb); return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index 1e52decb7b59..4d00ece3243d 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -104,7 +104,6 @@ static unsigned int 
fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch, #ifdef CONFIG_NET_CLS_ACT switch (result) { case TC_ACT_STOLEN: - case TC_ACT_QUEUED: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; case TC_ACT_SHOT: return 0; diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hf
[RFC 3/3] tc: cleanup tc_classify
introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce copy-paste among different qdiscs Signed-off-by: Alexei Starovoitov --- include/net/pkt_sched.h |2 ++ include/net/sch_generic.h |7 +++ net/sched/sch_api.c | 20 net/sched/sch_choke.c | 16 +++- net/sched/sch_drr.c | 17 +++-- net/sched/sch_fq_codel.c | 24 ++-- net/sched/sch_qfq.c | 15 ++- net/sched/sch_sfb.c | 16 +++- net/sched/sch_sfq.c | 25 ++--- 9 files changed, 52 insertions(+), 90 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 2342bf12cb78..7c73cbe95169 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -114,6 +114,8 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp, struct tcf_result *res); int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct tcf_result *res); +int tc_classify_act(struct sk_buff *skb, const struct tcf_proto *tp, + struct tcf_result *res, int *qerr); static inline __be16 tc_skb_protocol(const struct sk_buff *skb) { diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 6d778efcfdfd..9a50bad24b1d 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -715,6 +715,13 @@ static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch) return NET_XMIT_DROP; } +static inline void qdisc_drop_bypass(struct sk_buff *skb, struct Qdisc *sch, int err) +{ + if (err & __NET_XMIT_BYPASS) + qdisc_qstats_drop(sch); + kfree_skb(skb); +} + static inline int qdisc_reshape_fail(struct sk_buff *skb, struct Qdisc *sch) { qdisc_qstats_drop(sch); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index f7950327bb22..c7c4a672eb35 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1860,6 +1860,26 @@ reclassify: } EXPORT_SYMBOL(tc_classify); +int tc_classify_act(struct sk_buff *skb, const struct tcf_proto *tp, + struct tcf_result *res, int *qerr) +{ + int result; + + *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; + result = tc_classify(skb, tp, 
res); + +#ifdef CONFIG_NET_CLS_ACT + switch (result) { + case TC_ACT_STOLEN: + *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + case TC_ACT_SHOT: + return -1; + } +#endif + return result; +} +EXPORT_SYMBOL(tc_classify_act); + bool tcf_destroy(struct tcf_proto *tp, bool force) { if (tp->ops->destroy(tp, force)) { diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c index a3bc7cf151d3..8d8ad5303497 100644 --- a/net/sched/sch_choke.c +++ b/net/sched/sch_choke.c @@ -207,16 +207,8 @@ static bool choke_classify(struct sk_buff *skb, int result; fl = rcu_dereference_bh(q->filter_list); - result = tc_classify(skb, fl, &res); + result = tc_classify_act(skb, fl, &res, qerr); if (result >= 0) { -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_STOLEN: - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - case TC_ACT_SHOT: - return false; - } -#endif choke_set_classid(skb, TC_H_MIN(res.classid)); return true; } @@ -268,9 +260,9 @@ static bool choke_match_random(const struct choke_sched_data *q, static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch) { - int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; struct choke_sched_data *q = qdisc_priv(sch); const struct red_parms *p = &q->parms; + int ret; if (rcu_access_pointer(q->filter_list)) { /* If using external classifiers, get result and record it. 
*/ @@ -343,9 +335,7 @@ congestion_drop: return NET_XMIT_CN; other_drop: - if (ret & __NET_XMIT_BYPASS) - qdisc_qstats_drop(sch); - kfree_skb(skb); + qdisc_drop_bypass(skb, sch, ret); return ret; } diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c index 1051c5d4e85b..36ab69375c79 100644 --- a/net/sched/sch_drr.c +++ b/net/sched/sch_drr.c @@ -329,18 +329,9 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch, return cl; } - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; fl = rcu_dereference_bh(q->filter_list); - result = tc_classify(skb, fl, &res); + result = tc_classify_act(skb, fl, &res, qerr); if (result >= 0) { -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_STOLEN: - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - case TC_ACT_SHOT: - return NULL; - } -#en
[RFC 1/3] tc: fix return values of ingress qdisc
ingress qdisc should return NET_XMIT_* values just like all other qdiscs.

Since it's invoked via qdisc_enqueue_root() (which is supposed to return
only NET_XMIT_* values as well), it was working by accident, since
TC_ACT_* values fit within NET_XMIT_MASK.

Signed-off-by: Alexei Starovoitov
---
 net/core/dev.c          |    8 ++--
 net/sched/sch_ingress.c |    9 -
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1796cef55ab5..ac6233f6f353 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3533,7 +3533,7 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 {
 	struct net_device *dev = skb->dev;
 	u32 ttl = G_TC_RTTL(skb->tc_verd);
-	int result = TC_ACT_OK;
+	int result = NET_XMIT_SUCCESS;
 	struct Qdisc *q;
 
 	if (unlikely(MAX_RED_LOOP < ttl++)) {
@@ -3570,12 +3570,8 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		*pt_prev = NULL;
 	}
 
-	switch (ing_filter(skb, rxq)) {
-	case TC_ACT_SHOT:
-	case TC_ACT_STOLEN:
-		kfree_skb(skb);
+	if (ing_filter(skb, rxq) == NET_XMIT_DROP)
 		return NULL;
-	}
 
 	return skb;
 }
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 4cdbfb85686a..e68f4a5dbeba 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -65,21 +65,20 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	result = tc_classify(skb, fl, &res);
 
-	qdisc_bstats_update(sch, skb);
 	switch (result) {
 	case TC_ACT_SHOT:
-		result = TC_ACT_SHOT;
 		qdisc_qstats_drop(sch);
-		break;
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
-		result = TC_ACT_STOLEN;
+		result = NET_XMIT_DROP;
+		kfree_skb(skb);
 		break;
 	case TC_ACT_RECLASSIFY:
 	case TC_ACT_OK:
 		skb->tc_index = TC_H_MIN(res.classid);
 	default:
-		result = TC_ACT_OK;
+		qdisc_bstats_update(sch, skb);
+		result = NET_XMIT_SUCCESS;
 		break;
 	}
 
-- 
1.7.9.5
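The mapping this patch introduces in ingress_enqueue() can be sketched as a standalone function. The constants below are reproduced from memory of the kernel headers of that era and should be treated as assumptions, not as authoritative definitions. struct ingress_stats and ingress_verdict() are hypothetical names, and the skb handling is left out.

```c
/* Classifier verdicts (assumed values, as in include/uapi/linux/pkt_cls.h). */
#define TC_ACT_OK		0
#define TC_ACT_RECLASSIFY	1
#define TC_ACT_SHOT		2
#define TC_ACT_PIPE		3
#define TC_ACT_STOLEN		4
#define TC_ACT_QUEUED		5

/* Qdisc enqueue return codes (assumed values). */
#define NET_XMIT_SUCCESS	0x00
#define NET_XMIT_DROP		0x01
#define NET_XMIT_MASK		0x0f

/* Hypothetical counters standing in for the qdisc statistics. */
struct ingress_stats {
	unsigned long packets;
	unsigned long drops;
};

/* Translate a classifier verdict into a NET_XMIT_* code. */
static int ingress_verdict(int tc_result, struct ingress_stats *st)
{
	switch (tc_result) {
	case TC_ACT_SHOT:
		st->drops++;
		/* fall through */
	case TC_ACT_STOLEN:
	case TC_ACT_QUEUED:
		return NET_XMIT_DROP;
	default:
		st->packets++;
		return NET_XMIT_SUCCESS;
	}
}
```

With these values every TC_ACT_* verdict listed here fits inside NET_XMIT_MASK, which matches the commit message's explanation of why returning them directly went unnoticed.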
Re: [PATCH 4/5] ibmveth: Add support for Large Receive Offload
On 04/14/2015 05:00 PM, Eric Dumazet wrote:
> On Tue, 2015-04-14 at 15:35 -0500, Thomas Falcon wrote:
>> Enables receiving large packets from other LPARs. These packets
>> have a -1 IP header checksum, so we must recalculate to have
>> a valid checksum.
>>
>> Signed-off-by: Brian King
>> Signed-off-by: Thomas Falcon
>> ---
>>  drivers/net/ethernet/ibm/ibmveth.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index 08970c7..05eaca6a 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1092,6 +1092,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  	struct net_device *netdev = adapter->netdev;
>>  	int frames_processed = 0;
>>  	unsigned long lpar_rc;
>> +	struct iphdr *iph;
>>
>>  restart_poll:
>>  	while (frames_processed < budget) {
>> @@ -1134,8 +1135,20 @@ restart_poll:
>>  			skb_put(skb, length);
>>  			skb->protocol = eth_type_trans(skb, netdev);
>>
>> -			if (csum_good)
>> +			if (csum_good) {
>>  				skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +				if (be16_to_cpu(skb->protocol) == ETH_P_IP) {
>> +					skb_set_network_header(skb, 0);
>> +					skb_set_transport_header(skb, sizeof(struct iphdr));
>> +					iph = ip_hdr(skb);
>> +
>> +					/* If the IP checksum is not offloaded and if the packet
>> +					 * is large send, the checksum must be rebuilt.
>> +					 */
>> +					if (iph->check == 0x)
>> +						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>
> How can this possibly work ?
>
> Normally you would have to set iph->check to 0 before calling
> ip_fast_csum(), as done in ip_send_check()

I don't have an answer for why I wasn't seeing any problems while
testing this, but I'll go back and set iph->check to zero and retest.
Thanks for noticing this.
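The point about zeroing the checksum field first can be illustrated with a userspace sketch of the IPv4 header checksum (one's-complement sum of 16-bit words, as described in RFC 1071). The function names are hypothetical and this is not the kernel's ip_fast_csum(). One hedged observation, offered only as a possible explanation and not as something the thread confirms: a field holding 0xffff (the 16-bit form of -1) is negative zero in one's-complement arithmetic, so for a real header the sum comes out the same as with the field zeroed. Zeroing the field first remains the general rule.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over the header, read as big-endian words. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Rebuild the checksum in place; bytes 10 and 11 hold the check field. */
static void ipv4_rebuild_checksum(uint8_t *hdr, size_t len)
{
	uint16_t csum;

	hdr[10] = 0;	/* zero the field before summing */
	hdr[11] = 0;
	csum = ipv4_header_checksum(hdr, len);
	hdr[10] = csum >> 8;
	hdr[11] = csum & 0xff;
}
```

A receiver validates a header by running the same sum over all of it, including the stored checksum; a correct header yields zero.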
Re: [RFC PATCH] tc filter not show last u32 filter
Hello.

On 04/21/2015 08:53 PM, Vitaly E. Lavrov wrote:

> "tc filter show" does not show last U32 filter on 32-bit systems (tested on x86).
> Additional condition: filter does not have action and CONFIG_NET_CLS_ACT=y
>
> Example:
> tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20
>
> Function tcf_action_copy_stats() returns an error because the
> (struct tc_action *)a->priv is NULL (for 32bit systems).
>
> The sequence of calls:
>
> u32_dump()
>   cls_u32.c:1009 if (tcf_exts_dump_stats(skb, &n->exts) < 0) goto nla_put_failure;
> tcf_exts_dump_stats()
>   cls_api.c:606 if (tcf_action_copy_stats(skb, a, 1) < 0) return -1;
> tcf_action_copy_stats()
>   act_api.c:604 struct tcf_common *p = a->priv;
>   act_api.c:606 if (p == NULL) goto errout; // return -1;
>
> One of variants correcting this error is a verify the existence of action
> before calling tcf_action_copy_stats().
>
> Patch for kernel 3.18.10

You need to send patches against the latest kernel. Look for DaveM's 'net.git' repo on git.kernel.org.

> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index aad6a67..8e7ad61 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -602,9 +602,11 @@ EXPORT_SYMBOL(tcf_exts_dump);
>  int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
>  {
>  #ifdef CONFIG_NET_CLS_ACT
> - struct tc_action *a = tcf_exts_first_act(exts);
> - if (tcf_action_copy_stats(skb, a, 1) < 0)
> - return -1;
> + if(tcf_exts_is_available(exts)) {

Please run your patches thru scripts/checkpatch.pl; space is needed after *if*.

> + struct tc_action *a = tcf_exts_first_act(exts);

Empty line needed here.

WBR, Sergei
Re: Please backport dev_kfree_skb_any fixes for stable
From: Vinson Lee Date: Wed, 25 Mar 2015 12:39:01 -0700 > Please backport the list of commits from the original stable request > at http://permalink.gmane.org/gmane.linux.network/320390. > > The patches are from 3.15. Please queue them up for kernels <= 3.14. Ok, doing that right now, thanks.
[RFC PATCH] tc filter not show last u32 filter
"tc filter show" does not show last U32 filter on 32-bit systems (tested on x86). Additional condition: filter does not have action and CONFIG_NET_CLS_ACT=y Example: tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20 Function tcf_action_copy_stats() returns an error because the (struct tc_action *)a->priv is NULL (for 32bit systems). The sequence of calls: u32_dump() cls_u32.c:1009 if (tcf_exts_dump_stats(skb, &n->exts) < 0) goto nla_put_failure; tcf_exts_dump_stats() cls_api.c:606 if (tcf_action_copy_stats(skb, a, 1) < 0) return -1; tcf_action_copy_stats() act_api.c:604 struct tcf_common *p = a->priv; act_api.c:606 if (p == NULL) goto errout; // return -1; One of variants correcting this error is a verify the existence of action before calling tcf_action_copy_stats(). Patch for kernel 3.18.10 diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index aad6a67..8e7ad61 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -602,9 +602,11 @@ EXPORT_SYMBOL(tcf_exts_dump); int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts) { #ifdef CONFIG_NET_CLS_ACT - struct tc_action *a = tcf_exts_first_act(exts); - if (tcf_action_copy_stats(skb, a, 1) < 0) - return -1; + if(tcf_exts_is_available(exts)) { + struct tc_action *a = tcf_exts_first_act(exts); + if (tcf_action_copy_stats(skb, a, 1) < 0) + return -1; + } #endif return 0; } This will fix the bug 84661 https://bugzilla.kernel.org/show_bug.cgi?id=84661 In 64bit system a->priv is not NULL, but is not a valid pointer, but because of a->type == 0 and compat_mode == 1 returns a value 0. "tc filter show dev eth0". 
tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.2 flowid 1:20 tc filter add dev eth0 parent 1:0 protocol ip prio 100 u32 match ip dst 10.200.2.3 flowid 1:30 tc filter show dev eth0 (1) filter parent 1: protocol ip pref 100 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:20 match 0ac80202/ at 16 (2) filter parent 1: protocol ip pref 100 u32 fh 800::801 order 2049 key ht 800 bkt 0 flowid 1:30 match 0ac80203/ at 16 64bit system (1) a->priv == 0x8800 (2) a->priv == 0x8801 32bit system (1) a->priv == 0x0 (2) a->priv == 0x0 I could not understand the reason for this difference between 32-bit and 64-bit systems.
Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.
On 21/04/15 10:39, Andrew Lunn wrote: >>> I would however say that sysfs is the wrong API. The linux network >>> stack uses netlink for most configuration activities. So i would >>> suggest adding a netlink binding to DSA, and place the code in >>> net/dsa/, not within an MDIO driver. >> >> I suppose we could do that, but that sounds like a pretty radical change >> in how DSA is currently configured (that is statically at boot time), >> part in order to allow booting from DSA-enabled network devices (e.g: >> nfsroot). > > We would keep both DT and platform device. But statically at boot does > not work for a USB hotpluggable switch! Is the switch really hotpluggable, or is it the USB-Ethernet adapter connecting to it? If the former, then I agree; if not, I would imagine that there is nothing that prevents creating the switch device first, and waiting for its "master_netdev" to show up later before it starts doing anything useful? -- Florian
Re: commit d0af71a3573 for 3.19.y stable
From: Josh Boyer Date: Wed, 1 Apr 2015 20:34:22 -0400 > Another possible stable candidate. We had a report[1] of deadlocks > with tigon devices on 3.19.y and the commit below fixes it. It > cherry-picks cleanly on top of 3.19.3. I don't see it queued up so I > thought I would point it out. > > commit d0af71a3573f1217b140c60b66f1a9b335fb058b > Author: Jun'ichi Nomura (NEC) > Date: Thu Feb 12 01:26:24 2015 + > > tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() Queued up, thanks Josh.
Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.
On 21/04/15 10:30, Andrew Lunn wrote: >> My goal in reworking this weird DSA device/driver model is that you >> could just register your switch devices as an enhanced >> phy_driver/spi_driver/pci_driver etc..., such that libphy-ready drivers >> could just take advantage of that when they scan/detect their MDIO buses >> and find a switch. We are not quite there yet, but some help could be >> welcome, here are the WIP patches (tested with platform_driver only so far): > > We are hijacking another thread, but... > > I don't understand you here. Who calls dsa_switch_register()? Any driver which is backing the underlying device, if this is a PCI(e) switch, a pci_driver's probe function gets called, and then registers with DSA a switch device, very much like this: https://github.com/ffainelli/linux/commit/f94efc3d7b489955351c01efeafcc89939df388e > > I know of a board coming soon which has three switch chips on > it. There is one MDIO device in the Soc, but there is an external MDIO > multiplexor controlled via gpio lines, such that each switch has its > own MDIO bus. The DT binding does not support this currently, but the > underlying data structures do. > > How do you envisage dsa_switch_register() to work in such a setup? I would envision something where we can scan all of these switches individually using their respective device drivers, with the help of Device Tree or platform_data, figure out which position in a dsa_switch_tree they should have, and make sure that we create a dsa_switch_tree which reflects that, taking probe ordering into account. All of these switches would be phy_driver instances, like this: https://github.com/ffainelli/linux/commit/4a5c6b17de36377f6a71423b91f80bc1c7fee7be We can keep discussing the details in a separate thread, I think that would be useful. 
-- Florian
Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.
> > I would however say that sysfs is the wrong API. The linux network > > stack uses netlink for most configuration activities. So i would > > suggest adding a netlink binding to DSA, and place the code in > > net/dsa/, not within an MDIO driver. > > I suppose we could do that, but that sounds like a pretty radical change > in how DSA is currently configured (that is statically at boot time), > part in order to allow booting from DSA-enabled network devices (e.g: > nfsroot). We would keep both DT and platform device. But statically at boot does not work for a USB hotpluggable switch! Andrew
Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.
> My goal in reworking this weird DSA device/driver model is that you > could just register your switch devices as an enhanced > phy_driver/spi_driver/pci_driver etc..., such that libphy-ready drivers > could just take advantage of that when they scan/detect their MDIO buses > and find a switch. We are not quite there yet, but some help could be > welcome, here are the WIP patches (tested with platform_driver only so far): We are hijacking another thread, but... I don't understand you here. Who calls dsa_switch_register()? I know of a board coming soon which has three switch chips on it. There is one MDIO device in the Soc, but there is an external MDIO multiplexor controlled via gpio lines, such that each switch has its own MDIO bus. The DT binding does not support this currently, but the underlying data structures do. How do you envisage dsa_switch_register() to work in such a setup? Thanks Andrew
Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.
On 21/04/15 05:47, Andrew Lunn wrote: > Hi Jan > > Interesting work, but i think the architecture is wrong. > > DSA needs an Ethernet device, an MDIO bus, and information about ports > on the switch. That requirement is completely artificial as it is today, and just comes from arbitrary limitations imposed in the initial DSA design, something that I am still trying to get away from. > The MDIO bus and the Ethernet need no knowledge of > DSA. So putting your DSA configuration code in the MDIO driver is > wrong. I agree with that. > > The problem you have is where the put the configuration data. There > are the currently two choices, using a platform driver, which you can > find some examples of in arch/arm/mach-orion5x, or via device tree. Or > you need a new method. > > Part of your problem is hotplug, since you have a USB device, and no > stable names for the ethernet device nor the MDIO device. Your > hardware is not fixed, you could hang any switch off the USB > device. So it does sound like you need a user space API. > > I would however say that sysfs is the wrong API. The linux network > stack uses netlink for most configuration activities. So i would > suggest adding a netlink binding to DSA, and place the code in > net/dsa/, not within an MDIO driver. I suppose we could do that, but that sounds like a pretty radical change in how DSA is currently configured (that is statically at boot time), part in order to allow booting from DSA-enabled network devices (e.g: nfsroot). > > Device tree overlays might be a solution, if you can dynamically load > a blob as part of a USB hotplug event. What makes it easier is that > both the Ethernet device and MDIO bus are on the same USB device, so > all your phandles are within the blob. > > What is your long term goal? Is this just a development tool? Are you > thinking of making a product which integrates both the switch and the > USB ethernet onto a USB dongle? 
This could also change the > architecture, since it makes the configuration more fixed. My goal in reworking this weird DSA device/driver model is that you could just register your switch devices as an enhanced phy_driver/spi_driver/pci_driver etc..., such that libphy-ready drivers could just take advantage of that when they scan/detect their MDIO buses and find a switch. We are not quite there yet, but some help could be welcome, here are the WIP patches (tested with platform_driver only so far): https://github.com/ffainelli/linux/tree/dsa-model-b53 -- Florian
Re: [RFC PATCH 1/3] net/dsa: Refactor dsa_probe()
On 21/04/15 06:26, Jan Kaisrlik wrote: > From: Jan Kaisrlik > > This patch refactors dsa_probe in order to simplify code in the patch 2/3. It does not look like you are working on the latest net-next tree; that part of the code has already been refactored to have separate helper functions such as dsa_setup_dst(), dsa_switch_setup() and dsa_switch_setup_one(). -- Florian
[PATCH] altera tse: add support for fixed-links.
From: Andreas Oetken Add support for fixed-links in configurations without PHY. (e.g. connection to a switch, SGMII point to point, SFPs) Check: Documentation/devicetree/bindings/net/fixed-link.txt. Signed-off-by: Andreas Oetken --- drivers/net/ethernet/altera/altera_tse_main.c | 36 +-- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c index dbbbd34..a90262e 100644 --- a/drivers/net/ethernet/altera/altera_tse_main.c +++ b/drivers/net/ethernet/altera/altera_tse_main.c @@ -777,6 +777,7 @@ static int init_phy(struct net_device *dev) struct altera_tse_private *priv = netdev_priv(dev); struct phy_device *phydev; struct device_node *phynode; + bool fixed_link = false; /* Avoid init phy in case of no phy present */ if (!priv->phy_iface) @@ -789,13 +790,32 @@ static int init_phy(struct net_device *dev) phynode = of_parse_phandle(priv->device->of_node, "phy-handle", 0); if (!phynode) { - netdev_dbg(dev, "no phy-handle found\n"); - if (!priv->mdio) { - netdev_err(dev, - "No phy-handle nor local mdio specified\n"); - return -ENODEV; + /* check if a fixed-link is defined in device-tree */ + if (of_phy_is_fixed_link(priv->device->of_node)) { + if (of_phy_register_fixed_link(priv->device->of_node) + < 0) { + netdev_err(dev, "cannot register fixed PHY\n"); + return -ENODEV; + } + + /* In the case of a fixed PHY, the DT node associated + * to the PHY is the Ethernet MAC DT node. 
+ */ + phynode = of_node_get(priv->device->of_node); + fixed_link = true; + + netdev_dbg(dev, "fixed-link detected\n"); + phydev = of_phy_connect(dev, phynode, + &altera_tse_adjust_link, + 0, priv->phy_iface); + } else { + netdev_dbg(dev, "no phy-handle found\n"); + if (!priv->mdio) { + netdev_err(dev, "No phy-handle nor local mdio specified\n"); + return -ENODEV; + } + phydev = connect_local_phy(dev); } - phydev = connect_local_phy(dev); } else { netdev_dbg(dev, "phy-handle found\n"); phydev = of_phy_connect(dev, phynode, @@ -819,10 +839,10 @@ static int init_phy(struct net_device *dev) /* Broken HW is sometimes missing the pull-up resistor on the * MDIO line, which results in reads to non-existent devices returning * 0 rather than 0x. Catch this here and treat 0 as a non-existent -* device as well. +* device as well. If a fixed-link is used the phy_id is always 0. * Note: phydev->phy_id is the result of reading the UID PHY registers. */ - if (phydev->phy_id == 0) { + if ((phydev->phy_id == 0) && !fixed_link) { netdev_err(dev, "Bad PHY UID 0x%08x\n", phydev->phy_id); phy_disconnect(phydev); return -ENODEV; -- 2.1.4
RE: [PATCH iproute2 1/3] tc: fix compilation warning on 32bits arch
From: Nicolas Dichtel > Sent: 21 April 2015 17:07 > The warning was: > m_simple.c: In function parse_simple: > m_simple.c:142:4: warning: format %ld expects argument of type long int, but > argument 3 has type > size_t [-Wformat] > > Useful to be able to compile with -Werror. > > Signed-off-by: Nicolas Dichtel > --- > tc/m_simple.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tc/m_simple.c b/tc/m_simple.c > index 866552f559b3..3b6d7beb769c 100644 > --- a/tc/m_simple.c > +++ b/tc/m_simple.c > @@ -139,7 +139,7 @@ parse_simple(struct action_util *a, int *argc_p, char > ***argv_p, int tca_id, > > if (strlen(simpdata) > (SIMP_MAX_DATA - 1)) { > fprintf(stderr, "simple: Illegal string len %ld <%s> \n", > - strlen(simpdata), simpdata); > + (long)strlen(simpdata), simpdata); > return -1; Isn't the correct fix to use "%zu" ? David
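Editorial note on David's question: C99 defines the `z` length modifier for size_t arguments, so `%zu` prints the value with neither a cast nor a warning on 32-bit or 64-bit targets. The sketch below only illustrates the format string; format_len_error is a made-up helper, not a function from tc/m_simple.c, and it assumes a C99-capable libc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the "illegal length" message with %zu, the C99 conversion
 * for size_t. Hypothetical helper, not part of iproute2.
 */
static int format_len_error(char *buf, size_t buflen, const char *data)
{
	assert(buf != NULL && data != NULL);
	return snprintf(buf, buflen, "simple: Illegal string len %zu <%s>",
			strlen(data), data);
}
```

The `(long)` cast in the patch also silences the warning; `%zu` simply keeps the argument type and the format specifier in agreement.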
Re: randconfig build error with next-20150421, in net/ceph
On Tue, 21 Apr 2015, Guenter Roeck wrote: > On Tue, Apr 21, 2015 at 08:10:44AM -0700, Jim Davis wrote: > > Building with the attached random configuration file, > > > > ERROR: "__divdi3" [net/ceph/libceph.ko] undefined! > > Commit 7321f19d ("crush: straw2 bucket type with an efficient 64-bit > crush_ln()"). > > + draw = ln / w; > > where 'ln' is 64 bit. > > Some other oddities in that patch, such as > > +#if defined(__linux__) > +#include > +#elif defined(__FreeBSD__) > +#include > +#endif > > and lots of coding style violations. Thanks for the report--we'll fix it up! sage
Re: randconfig build error with next-20150421, in net/ceph
On Tue, Apr 21, 2015 at 7:21 PM, Sage Weil wrote: > On Tue, 21 Apr 2015, Guenter Roeck wrote: >> On Tue, Apr 21, 2015 at 08:10:44AM -0700, Jim Davis wrote: >> > Building with the attached random configuration file, >> > >> > ERROR: "__divdi3" [net/ceph/libceph.ko] undefined! >> >> Commit 7321f19d ("crush: straw2 bucket type with an efficient 64-bit >> crush_ln()"). >> >> + draw = ln / w; >> >> where 'ln' is 64 bit. >> >> Some other oddities in that patch, such as >> >> +#if defined(__linux__) >> +#include >> +#elif defined(__FreeBSD__) >> +#include >> +#endif >> >> and lots of coding style violations. > > Thanks for the report--we'll fix it up! Fixed in ceph-client/master, should make it to linux-next tomorrow. Thanks, Ilya
Re: randconfig build error with next-20150421, in net/ceph
On Tue, Apr 21, 2015 at 08:10:44AM -0700, Jim Davis wrote: > Building with the attached random configuration file, > > ERROR: "__divdi3" [net/ceph/libceph.ko] undefined! Commit 7321f19d ("crush: straw2 bucket type with an efficient 64-bit crush_ln()"). + draw = ln / w; where 'ln' is 64 bit. Some other oddities in that patch, such as +#if defined(__linux__) +#include +#elif defined(__FreeBSD__) +#include +#endif and lots of coding style violations. Guenter
[PATCH iproute2 1/3] tc: fix compilation warning on 32bits arch
The warning was: m_simple.c: In function ‘parse_simple’: m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat] Useful to be able to compile with -Werror. Signed-off-by: Nicolas Dichtel --- tc/m_simple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tc/m_simple.c b/tc/m_simple.c index 866552f559b3..3b6d7beb769c 100644 --- a/tc/m_simple.c +++ b/tc/m_simple.c @@ -139,7 +139,7 @@ parse_simple(struct action_util *a, int *argc_p, char ***argv_p, int tca_id, if (strlen(simpdata) > (SIMP_MAX_DATA - 1)) { fprintf(stderr, "simple: Illegal string len %ld <%s> \n", - strlen(simpdata), simpdata); + (long)strlen(simpdata), simpdata); return -1; } -- 2.2.2
[PATCH 3.16.y-ckt 093/144] net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}
3.16.7-ckt10 -stable review patch. If anyone has any objections, please let me know. -- From: Markos Chandras commit 87f966d97b89774162df04d2106c6350c8fe4cb3 upstream. On a MIPS Malta board, tons of fifo underflow errors have been observed when using u-boot as bootloader instead of YAMON. The reason for that is that YAMON used to set the pcnet device to SRAM mode but u-boot does not. As a result, the default Tx threshold (64 bytes) is now too small to keep the fifo relatively used and it can result to Tx fifo underflow errors. As a result of which, it's best to setup the SRAM on supported controllers so we can always use the NOUFLO bit. Cc: Cc: Cc: Don Fry Signed-off-by: Markos Chandras Signed-off-by: David S. Miller Signed-off-by: Luis Henriques --- drivers/net/ethernet/amd/pcnet32.c | 31 +-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amd/pcnet32.c b/drivers/net/ethernet/amd/pcnet32.c index e7cc9174e364..02d3b7975835 100644 --- a/drivers/net/ethernet/amd/pcnet32.c +++ b/drivers/net/ethernet/amd/pcnet32.c @@ -1552,7 +1552,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev) { struct pcnet32_private *lp; int i, media; - int fdx, mii, fset, dxsuflo; + int fdx, mii, fset, dxsuflo, sram; int chip_version; char *chipname; struct net_device *dev; @@ -1589,7 +1589,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev) } /* initialize variables */ - fdx = mii = fset = dxsuflo = 0; + fdx = mii = fset = dxsuflo = sram = 0; chip_version = (chip_version >> 12) & 0x; switch (chip_version) { @@ -1622,6 +1622,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev) chipname = "PCnet/FAST III 79C973"; /* PCI */ fdx = 1; mii = 1; + sram = 1; break; case 0x2626: chipname = "PCnet/Home 79C978"; /* PCI */ @@ -1645,6 +1646,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev) chipname = "PCnet/FAST III 79C975"; /* PCI */ fdx = 1; mii = 1; + sram = 1; break; case 
0x2628: chipname = "PCnet/PRO 79C976"; @@ -1673,6 +1675,31 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev) dxsuflo = 1; } + /* +* The Am79C973/Am79C975 controllers come with 12K of SRAM +* which we can use for the Tx/Rx buffers but most importantly, +* the use of SRAM allow us to use the BCR18:NOUFLO bit to avoid +* Tx fifo underflows. +*/ + if (sram) { + /* +* The SRAM is being configured in two steps. First we +* set the SRAM size in the BCR25:SRAM_SIZE bits. According +* to the datasheet, each bit corresponds to a 512-byte +* page so we can have at most 24 pages. The SRAM_SIZE +* holds the value of the upper 8 bits of the 16-bit SRAM size. +* The low 8-bits start at 0x00 and end at 0xff. So the +* address range is from 0x up to 0x17ff. Therefore, +* the SRAM_SIZE is set to 0x17. The next step is to set +* the BCR26:SRAM_BND midway through so the Tx and Rx +* buffers can share the SRAM equally. +*/ + a->write_bcr(ioaddr, 25, 0x17); + a->write_bcr(ioaddr, 26, 0xc); + /* And finally enable the NOUFLO bit */ + a->write_bcr(ioaddr, 18, a->read_bcr(ioaddr, 18) | (1 << 11)); + } + dev = alloc_etherdev(sizeof(*lp)); if (!dev) { ret = -ENOMEM;
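Editorial note on the register values in the patch comment above: the numbers 0x17, 0xc and bit 11 can be re-derived from the figures the comment gives (12K of SRAM, 512-byte pages, last address 0x17ff). The sketch below only redoes that arithmetic. It assumes the address range counts 16-bit words, which is an inference from 12K bytes spanning 0x1800 addresses and not something the patch states, so the datasheet remains the authority. All names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BYTES	(12u * 1024u)	/* 12K of on-chip SRAM, per the patch comment */
#define PAGE_BYTES	512u		/* page size, per the patch comment */
#define NOUFLO_BIT	(1u << 11)	/* BCR18 bit the patch sets */

/* Number of 512-byte pages that fit in the SRAM. */
static unsigned int sram_pages(void)
{
	assert(SRAM_BYTES % PAGE_BYTES == 0);
	return SRAM_BYTES / PAGE_BYTES;
}

/* BCR25 value: upper 8 bits of the last word address (assumed word addressing). */
static uint8_t sram_size_field(void)
{
	uint16_t last_addr = (uint16_t)(SRAM_BYTES / 2u - 1u);

	return (uint8_t)(last_addr >> 8);
}

/* BCR26 value: boundary halfway through, so Tx and Rx share the SRAM equally. */
static uint8_t sram_bnd_field(void)
{
	return (uint8_t)((sram_size_field() + 1u) / 2u);
}

/* Read-modify-write that sets NOUFLO and leaves the other BCR18 bits alone. */
static uint16_t set_nouflo(uint16_t bcr18)
{
	return (uint16_t)(bcr18 | NOUFLO_BIT);
}
```

Under these assumptions the SRAM_SIZE field equals the page count minus one, which matches the "at most 24 pages" remark in the comment.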
[PATCH iproute2 2/3] libnamespaces: fix warning about syscall()
The warning was: In file included from namespace.c:14:0: ../include/namespace.h: In function ‘setns’: ../include/namespace.h:37:2: warning: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration] Signed-off-by: Nicolas Dichtel --- include/namespace.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/namespace.h b/include/namespace.h index a2ac7dccd0e1..5add9d266b7d 100644 --- a/include/namespace.h +++ b/include/namespace.h @@ -3,6 +3,7 @@ #include #include +#include #include #include -- 2.2.2
Re: [PATCH 2/7] turn Makefile more distribution friendly
On 04/20/2015 06:55 PM, Stephen Hemminger wrote: > On Mon, 13 Apr 2015 16:00:56 +0200 > Pavel Šimerda wrote: > >> From: Pavel Šimerda >> >> Changes: >> >> * Accept directory settings from environment. >> * Remove redundant ROOTDIR variable. >> * Set KERNEL_INCLUDE default to '/usr/include'. >> * Use CFLAGS from environment. >> >> Note: In the long term it might be better to improve the configure >> script to generate those parts of the Makefile in a manner similar >> to autoconf. It might be even practical to autotoolize the package. >> >> Signed-off-by: Pavel Šimerda > > I will take this part. > But don't want to start iproute2 down the autoconf/autotool sink hole. Thanks! The changes I submitted should generally be good enough for distro maintainers to avoid Makefile modifications. Cheers, Pavel
[PATCH iproute2 3/3] mroute: remove invalid check against NLM_F_MULTI
This flag is only for the netlink protocol (multi-part messages), no reason to reject messages without it. Note that this flag was removed by the following kernel patches (v3.14) 65886f439ab0 ipmr: fix mfc notification flags f518338b1603 ip6mr: fix mfc notification flags Signed-off-by: Nicolas Dichtel --- ip/ipmroute.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/ip/ipmroute.c b/ip/ipmroute.c index 13ac892512d0..125a13f8c582 100644 --- a/ip/ipmroute.c +++ b/ip/ipmroute.c @@ -67,8 +67,7 @@ int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) int family; if ((n->nlmsg_type != RTM_NEWROUTE && -n->nlmsg_type != RTM_DELROUTE) || - !(n->nlmsg_flags & NLM_F_MULTI)) { +n->nlmsg_type != RTM_DELROUTE)) { fprintf(stderr, "Not a multicast route: %08x %08x %08x\n", n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags); return 0; -- 2.2.2
Re: [PATCH v2 07/11] IB/cm: Add network namespace support
On Tue, Apr 21, 2015 at 03:07:47PM +0300, Haggai Eran wrote: > Namespace is needed for RoCE address resolution, in cases where the > driver doesn't report the MAC as part of the ib_wc. This patch explicitly says it doesn't deal with RoCE, so why are we adding namespaces to support RoCE paths in this series? Especially since we have no idea how that should fit into verbs. Frankly, that stuff is the most objectionable part of this series. I suggest: 1) Focus only on the RDMA-CM, and only on IB support, as the title says 2) Drop all changes to verbs and cm and otherwise that are not directly related to IB 3) Very, very, strongly justify why the remaining layer violations are necessary, and think very carefully about doing something else. For IB, it is very clear to me that only the RDMA-CM can possibly have the knowledge to find the namespace, so only the RDMA-CM should be touching it. If the interface between the RDMA-CM and IB-CM layers is preventing something, then extend the interface, don't drop RDMA-CM code into IB-CM. From that point, with working IB, we can revisit what is needed to make iWarp and RoCE work at the verbs layer and ultimately at the CM layer, in steps. Your other questions: > Using the GID alone is not enough to distinguish between namespaces, > because you can have multiple IPoIB interfaces, all using the GID (and > possibly the same P_Key), and each belonging to a different namespace. Exactly, this is why IB GID layers can't possibly need to touch the net namespace. > The listener rbtree's key is currently the service ID, for > instance. Now with namespaces, you can have multiple listeners > listening on the same service ID, so we need to use (service ID, > namespace) as the key. CM doesn't care, a service ID is registered by RDMA-CM and RDMA-CM can demux the (service ID,IP) tuple to the right namespace. Having CM snoop private data is a huge layering violation! > looks at it's private data and demuxes it to a net namespace.
> I don't see it there. The code seem to fetch the GID from the GRH. > Because the IP address in the source GID can be the same for different > namespaces, this is not enough to pick the right namespace. For IB, ib_init_ah_from_wc does not need a namespace. For RoCEE, the GID *MUST* be enough to find the namespace because each namespace will create a unique GID table entry. RoCEE and IB are going to be totally different in how this is implemented... I expect RoCEE to have namespace constraints at the verbs QP level, while IB cannot - that feels like a huge journey... Jason
Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
On Tuesday 21 April 2015 17:13:32 Thomas Gleixner wrote: > On Tue, 21 Apr 2015, Arnd Bergmann wrote: > > On Tuesday 21 April 2015 16:14:26 Thomas Gleixner wrote: > > > > Note the use of a separate __kernel_itimerspec64 for the user interface > > > > here, which I think will be needed to hide the differences between the > > > > normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit > > > > platforms that will be defined differently (using 'long long'). > > > > > > Confused. > > > > > > timespec64 / itimerspec64 should be the same independent of 64bit and > > > 32bit. So why do we need another variant ? > > > > There are multiple reasons: > > > > * On 64-bit systems, timespec64 would always be defined in the same way > > as struct timespec { __kernel_time_t tv_sec; long tv_nsec; }, with > > __kernel_time_t being 'long'. On 32-bit, we probably need to make both > > members 'long long' for the user space side, in order to share the > > syscall implementation with the kernel side, but we may also want to > > keep the internal timespec64 using a 'long' for tv_nsec, as we do > > today. This means that both the binary layout (padding or no padding) > > and the basic types (long or long long) are different between 32-bit > > and 64-bit, and between kernel and user space > > So you want to avoid a compat syscall for 32bit applications on a > 64bit kernel, right? That is the idea at the moment, yes. At least for the kernel-user interface. > That burdens 32bit with the extra 'long long' in user space. Not sure > whether user space folks will be happy about it. I know there are concerns about this, in particular because C11 and POSIX both require tv_nsec to be 'long', unlike timeval->tv_usec, which is a 'suseconds_t' and can be defined as 'long long'. There are four possible ways that 32-bit user space could define timespec based on the uncontroversial (I hope) 'typedef long long time_t;'. 
a) struct timespec {
        time_t tv_sec;
        long long tv_nsec; /* or typedef long long snseconds_t */
   };

   This is not directly compatible with C11 or POSIX.1-2008, but it matches what we do inside of 64-bit kernels, so it probably has the highest chance of working correctly in practice.

b) struct timespec {
        time_t tv_sec;
        long tv_nsec;
   };

   This is the definition according to C11 and POSIX. The main problem here is the padding, which may cause a 4-byte data leak and puts tv_nsec in the wrong place when used together with the timespec we have in the kernel on big-endian 64-bit machines.

c) struct timespec {
        time_t tv_sec;
   #if __BITS_PER_LONG == 32 && __BYTE_ORDER == __BIG_ENDIAN
        long __pad;
   #endif
        long tv_nsec; /* or typedef long long snseconds_t */
   #if __BITS_PER_LONG == 32 && __BYTE_ORDER == __LITTLE_ENDIAN
        long __pad;
   #endif
   };

   This version could be used transparently by user space, but has the potential to cause problems with existing user space doing things like

        struct timespec ts = { 0, 1000 }; /* one microsecond */

   even though it is probably compliant.

d) struct timespec {
        time_t tv_sec;
   #if __BITS_PER_LONG == 32 && __BYTE_ORDER == __BIG_ENDIAN
        long :32;
   #endif
        long tv_nsec; /* or typedef long long snseconds_t */
   #if __BITS_PER_LONG == 32 && __BYTE_ORDER == __LITTLE_ENDIAN
        long :32;
   #endif
   };

   This is very similar to c), but trades the problem I described above for a dependency on a gcc extension that is not part of C11 or any earlier standard.

From the kernel's point of view, b), c) and d) can all be treated the same for output data, as we only ever pass back normalized timespec structures that have zeroes in the upper bits of tv_nsec. However, for input to the kernel, we would require an extra conditional on 64-bit kernels to decide whether we ignore the upper bits (doing tv->tv_nsec &= 0x) or whether a structure that has the upper bits set needs to cause EINVAL. We could still do that in get_timespec() if we decide not to go with a).
See also https://lwn.net/Articles/457089/ for an earlier discussion on the topic when debating the x32 ABI. Arnd -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
On Tue, 21 Apr 2015, Arnd Bergmann wrote: > On Tuesday 21 April 2015 16:14:26 Thomas Gleixner wrote: > > > Note the use of a separate __kernel_itimerspec64 for the user interface > > > here, which I think will be needed to hide the differences between the > > > normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit > > > platforms that will be defined differently (using 'long long'). > > > > Confused. > > > > timespec64 / itimerspec64 should be the same independent of 64bit and > > 32bit. So why do we need another variant ? > > There are multiple reasons: > > * On 64-bit systems, timespec64 would always be defined in the same way > as struct timespec { __kernel_time_t tv_sec; long tv_nsec; }, with > __kernel_time_t being 'long'. On 32-bit, we probably need to make both > members 'long long' for the user space side, in order to share the > syscall implementation with the kernel side, but we may also want to > keep the internal timespec64 using a 'long' for tv_nsec, as we do > today. This means that both the binary layout (padding or no padding) > and the basic types (long or long long) are different between 32-bit > and 64-bit, and between kernel and user space So you want to avoid a compat syscall for 32bit applications on a 64bit kernel, right? That burdens 32bit with the extra 'long long' in user space. Not sure whether user space folks will be happy about it. > * We should not put 'struct timespec64' into the user space namespace, > as applications might already use that identifier. This is similar > to the __u32/u32 or __kernel_time_t/time_t tuple of types for interface > and in-kernel uses. This is particularly important when embedding a > timespec in another data structure. Fair enough. 
> * My plan is to use a temporary hack where I actually define > __kernel_timespec64 to look like the 32-bit version of timespec, > as an intermediate step when converting all 32-bit architectures over > to use the compat_*() syscalls in place of the existing ones, so > I can change over the normal syscalls to use __kernel_timespec64 > without having to change all architectures at once, or having to > modify each syscall multiple times. Makes sense. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
randconfig build error with next-20150421, in net/ceph
Building with the attached random configuration file, ERROR: "__divdi3" [net/ceph/libceph.ko] undefined! # # Automatically generated file; DO NOT EDIT. # Linux/x86 4.0.0 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_SMP=y CONFIG_X86_HT=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" # CONFIG_SYSVIPC is not set CONFIG_POSIX_MQUEUE=y 
CONFIG_KDBUS=m CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y # CONFIG_AUDITSYSCALL is not set # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_CHIP=y CONFIG_IRQ_DOMAIN=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # # RCU Subsystem # CONFIG_TREE_RCU=y CONFIG_SRCU=y CONFIG_TASKS_RCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 # CONFIG_RCU_FANOUT_EXACT is not set # CONFIG_TREE_RCU_TRACE is not set CONFIG_RCU_KTHREAD_PRIO=0 # CONFIG_RCU_NOCB_CPU is not set # CONFIG_RCU_EXPEDITE_BOOT is not set # CONFIG_BUILD_BIN2C is not set # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_CGROUPS=y CONFIG_CGROUP_DEBUG=y # CONFIG_CGROUP_FREEZER is not set CONFIG_CGROUP_DEVICE=y # CONFIG_CPUSETS is not set # CONFIG_CGROUP_CPUACCT is not set CONFIG_PAGE_COUNTER=y CONFIG_MEMCG=y # CONFIG_MEMCG_KMEM is not set CONFIG_CGROUP_PERF=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y # CONFIG_RT_GROUP_SCHED is not set CONFIG_CHECKPOINT_RESTORE=y CONFIG_SCHED_AUTOGROUP=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" # CONFIG_RD_GZIP is not set CONFIG_RD_BZIP2=y # CONFIG_RD_LZMA is not set # CONFIG_RD_XZ 
is not set # CONFIG_RD_LZO is not set CONFIG_RD_LZ4=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set # CONFIG_LTO_MENU is not set CONFIG_ANON_INODES=y CONFIG_HAVE_UID16=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_BPF=y CONFIG_EXPERT=y # CONFIG_MULTIUSER is not set CONFIG_SGETMASK_SYSCALL=y # CONFIG_SYSFS_SYSCALL is not set CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y # CONFIG_PCSPKR_PLATFORM is not set CONFIG_BASE_FULL=y CONFIG_FUTEX=y # CONFIG_EPOLL is not set # CONFIG_SIGNALFD is not set # CONFIG_TIMERFD is not set CONFIG_EVENTFD=y # CONFIG_BPF_SYSCALL is not set CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_ADVISE_SYSCALLS=y CONFIG_PCI_QUIRKS=y # CONFIG_EMBEDDED is not set CONFIG_HAVE_PERF_EVENTS=y # # Kernel Performance Events And Counters # CONFIG_PERF_EVENTS=y # CONFIG_DEBUG_PERF_USE_VMALLOC is not set # CONFIG_VM_EVENT_COUNTERS is not set CONFIG_COMPAT_BRK=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CON
Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
Prashant Sreedharan writes ("Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]"):
> On Fri, 2015-04-17 at 15:12 -0400, David Miller wrote:
> > From: Konrad Rzeszutek Wilk
> > Date: Fri, 17 Apr 2015 15:04:48 -0400
> > > A huge amount of NIC drivers use the DMA API, however if
> > > compiled under 32-bit an very important part of the DMA API can
> > > be ommitted leading to the drivers not working at all
> > > (especially if used with 'swiotlb=force iommu=soft').
...
> > > As such enable this even when using 32-bit kernels.
> > >
> > > Reported-by: Ian Jackson
> > > Signed-off-by: Konrad Rzeszutek Wilk

Thanks.

Tested-by: Ian Jackson

I'd appreciate it if this patch could make its way into the 3.14.y stable branch, as well as all the other places it's applicable of course. I've included another copy below for convenience, with acks etc. from this email thread folded in.

I have tested 3.14.34 plus just this patch, with my usual kernel configuration input and the affected machine, and this fixes the problem completely AFAICT. I have tested both baremetal 32-bit with `iommu=soft swiotlb=force' (which previously corrupted all received network packets) and 32-bit Linux under 64-bit Xen without any special options (which previously gave 25-30% packet loss).

Thanks,
Ian.

From 9e417af099e3cee2b219ab28ffc1e96b0564b213 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk
Date: Fri, 17 Apr 2015 14:55:47 -0400
Subject: [PATCH] config: Enable NEED_DMA_MAP_STATE when SWIOTLB is selected

A huge amount of NIC drivers use the DMA API, however if compiled under 32-bit an very important part of the DMA API can be ommitted leading to the drivers not working at all (especially if used with 'swiotlb=force iommu=soft').

As Prashant Sreedharan explains it: "the driver [tg3] uses DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the dma "mapping" and dma_unmap_addr() to get the "mapping" value. On most of the platforms this is a no-op, but ... with "iommu=soft and swiotlb=force" this house keeping is required, ... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_ instead of the DMA address."

As such enable this even when using 32-bit kernels.

Reported-by: Ian Jackson
Signed-off-by: Konrad Rzeszutek Wilk
Acked-by: David S. Miller
Acked-by: Prashant Sreedharan
Tested-by: Ian Jackson
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..570c71d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -177,7 +177,7 @@ config SBUS

 config NEED_DMA_MAP_STATE
 	def_bool y
-	depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG
+	depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG || SWIOTLB

 config NEED_SG_DMA_LENGTH
 	def_bool y
-- 
2.1.0
[PATCH] tcp: set SOCK_NOSPACE under memory presure
Under tcp memory pressure, calling epoll_wait() in edge triggered mode after -EAGAIN can result in an indefinite hang in epoll_wait(), even when there is sufficient memory available to continue making progress. The problem is that __sk_mem_schedule() can return 0 under memory pressure without having set the SOCK_NOSPACE flag. Thus, even though all the outstanding packets have been acked, we never get the EPOLLOUT that we are expecting from epoll_wait().

This issue is currently limited to epoll when used in edge trigger mode, since 'tcp_poll()' does in fact currently set SOCK_NOSPACE. This is sufficient for poll()/select() and epoll() in level trigger mode. However, in edge trigger mode, epoll() is relying on the write path to set SOCK_NOSPACE. So I view this patch as bringing us into sync with poll()/select() and epoll() level trigger behavior.

I can reproduce this issue using SO_SNDBUF, since __sk_mem_schedule() will return 0, or failure, more readily with SO_SNDBUF:

1) create socket and set SO_SNDBUF to N
2) add socket as edge trigger
3) write to socket and block in epoll on -EAGAIN
4) cause tcp mem pressure via: echo "" > net.ipv4.tcp_mem

The fix here is simply to set SOCK_NOSPACE in sk_stream_wait_memory() when the socket is non-blocking.

Note that we could still hang if sk->sk_wmem_queued is 0 when we get the -EAGAIN. In this case the SOCK_NOSPACE bit will not help, since we are waiting for an event that will never happen. I believe that this case is hard to hit (and I did not hit it in my testing), in that over the 'soft' limit, we continue to guarantee a minimum write buffer size. Perhaps we could return -ENOSPC in this case. Note that this case is not specific to epoll ET, but rather would affect all blocking and non-blocking sockets as well, and thus I think it's ok to treat it as a separate case.
Signed-off-by: Jason Baron
---
 net/core/stream.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/stream.c b/net/core/stream.c
index 301c05f..d70f77a 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -119,6 +119,7 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 	int err = 0;
 	long vm_wait = 0;
 	long current_timeo = *timeo_p;
+	bool noblock = (*timeo_p ? false : true);
 	DEFINE_WAIT(wait);

 	if (sk_stream_memory_free(sk))
@@ -131,8 +132,11 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)

 		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 			goto do_error;
-		if (!*timeo_p)
+		if (!*timeo_p) {
+			if (noblock)
+				set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 			goto do_nonblock;
+		}
 		if (signal_pending(current))
 			goto do_interrupted;
 		clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
-- 
1.8.2.rc2
Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
On Tuesday 21 April 2015 16:14:26 Thomas Gleixner wrote: > > Note the use of a separate __kernel_itimerspec64 for the user interface > > here, which I think will be needed to hide the differences between the > > normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit > > platforms that will be defined differently (using 'long long'). > > Confused. > > timespec64 / itimerspec64 should be the same independent of 64bit and > 32bit. So why do we need another variant ? There are multiple reasons: * On 64-bit systems, timespec64 would always be defined in the same way as struct timespec { __kernel_time_t tv_sec; long tv_nsec; }, with __kernel_time_t being 'long'. On 32-bit, we probably need to make both members 'long long' for the user space side, in order to share the syscall implementation with the kernel side, but we may also want to keep the internal timespec64 using a 'long' for tv_nsec, as we do today. This means that both the binary layout (padding or no padding) and the basic types (long or long long) are different between 32-bit and 64-bit, and between kernel and user space * We should not put 'struct timespec64' into the user space namespace, as applications might already use that identifier. This is similar to the __u32/u32 or __kernel_time_t/time_t tuple of types for interface and in-kernel uses. This is particularly important when embedding a timespec in another data structure. * My plan is to use a temporary hack where I actually define __kernel_timespec64 to look like the 32-bit version of timespec, as an intermediate step when converting all 32-bit architectures over to use the compat_*() syscalls in place of the existing ones, so I can change over the normal syscalls to use __kernel_timespec64 without having to change all architectures at once, or having to modify each syscall multiple times. 
> > I would also prefer not too many people to work on the syscalls, and
> > would rather have Baolin not touch any of the syscall prototypes for
> > the moment.
>
> I did not ask him to change any of the syscall prototypes. I just
> wanted him to split out the guts of the syscall into a separate static
> function to avoid all that code churn.

Ok, I wasn't sure about that part, thanks for clarifying.

	Arnd
VM guest bridging issue with i40e
There's some trouble with a kvm guest that is bridged to the host network's i40e (X710) NIC. The guest no longer receives DHCP replies and probably other traffic as well. I bisected this back to the following commit.

commit 79c21a827e98081895a8b9650f1b0a8b37b16125
Author: Anjali Singhai Jain
Date: Thu Nov 13 03:06:14 2014 +

    i40e: Add new update VSI flow to accommodate FW fix with VSI Loopback mode

    All VSIs on a VEB should either have loopback enabled or disabled, a mixed mode is not supported for a VEB. Since our driver supports multiple VSIs per PF that need to talk to each other make sure to enable Loopback for the PF and FDIR VSI as well. Also, we now have to explicitly enable Loopback mode otherwise we fail VSI creation for VMDq and VF VSIs.

Reproducer:

Adding bridge on the host
# brctl addbr br0
# brctl addif br0 eth6

virt XML for bridging

Now request IP via DHCP from the kvm guest.

The problem appears as soon as loopback gets enabled on the main VSI.

From the XL710 spec:
7.4.9.5.4.1.1 Add VSI Settings Recommendations
Table 7-86. Add VSI Recommended settings
Allow Loopback 0 for PF

Is it correct that the main VSI represents the PF? If so, shouldn't loopback stay disabled for the main VSI?

Stefan
Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM
On 21/04/2015 17:11, Steve Wise wrote: > On 4/21/2015 1:36 AM, Haggai Eran wrote: >> On 20/04/2015 17:53, Steve Wise wrote: >>> Hey Haggai, >>> >>> Did you check for changes needed in drivers/infiniband/core/iwcm.c? >> We focused on namespace support for InfiniBand alone in this series. We >> didn't handle iWARP, nor did we implement support for RoCE or other >> transports. >> >>> I notice that it uses init_net here: >>> >>> static int __init iw_cm_init(void) >>> { >>> iwcm_wq = create_singlethread_workqueue("iw_cm_wq"); >>> if (!iwcm_wq) >>> return -ENOMEM; >>> >>> iwcm_ctl_table_hdr = register_net_sysctl(&init_net, >>> "net/iw_cm", >>> iwcm_ctl_table); >>> if (!iwcm_ctl_table_hdr) { >>> pr_err("iw_cm: couldn't register sysctl paths\n"); >>> destroy_workqueue(iwcm_wq); >>> return -ENOMEM; >>> } >>> >>> return 0; >>> } >>> >> I see the only thing in the iWARP sysctl registered here is the default >> backlog. If you want to control this parameter per namespace, we could >> store it per network namespace, and add a namespace parameter to >> iw_cm_listen. I'm not sure how important this is though. > > I don't think it needs to be per namespace, as long as it still applies > across all name spaces. It will, but it will currently only be visible and controllable through init's namespace. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> > COMPAT_SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
> >			struct compat_itimerspec __user *, setting)
>
> As a side note, I want to kill off the get_fs()/set_fs() calls in
> the process. These always make me dizzy when I try to work out whether
> there is a potential security hole (there is not in this case), and
> they can be slow on some architectures.

Yeah. I have to take a deep breath every time I look at those :)

> My preferred solution is one where we end up with the same syscalls
> for both 32-bit and 64-bit, and basically use the
> compat_sys_timer_gettime() implementation (or a simplified version)
> for the existing , something like this:

No objections from my side. I was not looking into the syscall magic yet. I just wanted to avoid the code churn and have the guts of the syscalls factored out for simple reuse.

> Note the use of a separate __kernel_itimerspec64 for the user interface
> here, which I think will be needed to hide the differences between the
> normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit
> platforms that will be defined differently (using 'long long').

Confused.

timespec64 / itimerspec64 should be the same independent of 64bit and 32bit. So why do we need another variant ?

> I would also prefer not too many people to work on the syscalls, and
> would rather have Baolin not touch any of the syscall prototypes for
> the moment.

I did not ask him to change any of the syscall prototypes. I just wanted him to split out the guts of the syscall into a separate static function to avoid all that code churn.

Thanks,

	tglx
Re: [PATCH v2 00/11] Add network namespace support in the RDMA-CM
On 4/21/2015 1:36 AM, Haggai Eran wrote:
> On 20/04/2015 17:53, Steve Wise wrote:
>> Hey Haggai,
>>
>> Did you check for changes needed in drivers/infiniband/core/iwcm.c?
> We focused on namespace support for InfiniBand alone in this series. We
> didn't handle iWARP, nor did we implement support for RoCE or other
> transports.
>
>> I notice that it uses init_net here:
>>
>> static int __init iw_cm_init(void)
>> {
>>         iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
>>         if (!iwcm_wq)
>>                 return -ENOMEM;
>>
>>         iwcm_ctl_table_hdr = register_net_sysctl(&init_net,
>>                                                  "net/iw_cm",
>>                                                  iwcm_ctl_table);
>>         if (!iwcm_ctl_table_hdr) {
>>                 pr_err("iw_cm: couldn't register sysctl paths\n");
>>                 destroy_workqueue(iwcm_wq);
>>                 return -ENOMEM;
>>         }
>>
>>         return 0;
>> }
>>
> I see the only thing in the iWARP sysctl registered here is the default
> backlog. If you want to control this parameter per namespace, we could
> store it per network namespace, and add a namespace parameter to
> iw_cm_listen. I'm not sure how important this is though.

I don't think it needs to be per namespace, as long as it still applies across all name spaces.

Steve.
Re: [PATCH net-next V2 12/12] net/mlx5: Ethernet driver
On 4/14/2015 9:51 PM, David Miller wrote:
> From: Amir Vadai
> Date: Tue, 14 Apr 2015 11:20:35 +0300
>
>> Signed-off-by: Amir Vadai
>
> What does "Ethernet driver" mean? Are you adding a new ethernet driver?
> If so, what is it for and how does it interact with the existing mlx5
> driver?
>
> It looks to me like you are adding a lot of code and objects to the
> existing mlx5 module. An incredible amount, in fact.
>
> This seems very suboptimal especially for users of the existing mlx5
> chips.

Hi Dave,

To clarify things a bit, the mlx5 driver serves two NIC families: ConnectIB (device IDs 0x1011/12) and ConnectX4 (device IDs 1013-1016). ConnectIB HW is IB only, ConnectX4 is IB and Ethernet. This submission enhances the driver to support Ethernet for the ConnectX4 family.

In a similar manner to the mlx4 driver stacking, mlx5 has an IB driver and a core driver. Per your comments, in V3 the Ethernet functionality will be added under a dedicated config directive, so that if someone wants only the IB driver to be functional, the Ethernet netdev code and such doesn't get built.

> You haven't discussed this, what design decisions made you decide in
> the end to do it this way, etc.
>
> You absolutely have to say something other than "Ethernet driver" in
> this commit message, I expect several paragraphs of details and the
> hows and whys of the change as it is a non-trivial amount of code
> being added here.

Understood, will add more text to the cover letter, change-logs, etc.

> I still consider this patch series not ready yet, and the merge window
> is open thus closing the net-next tree. You will therefore need to
> wait until the net-next tree opens again before submitting this series
> again.

Sure, thanks for the review feedback provided so far.

BTW, sorry for the late reply, I just realized today that there has been no response to your comments.

Or.
Re: [RFC PATCH 3/3] driver/net/usb: Add support for DSA to ax88772b
Jan Kaisrlik writes: > From: Jan Kaisrlik > > This patch adds a possibility to use the RMII interface of the ax88772b > instead of the Ethernet PHY which is used now. > > Binding DSA to a USB device is done via sysfs. > > --- > drivers/net/usb/asix.h | 7 ++ > drivers/net/usb/asix_devices.c | 258 > - > 2 files changed, 261 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h > index 5d049d0..6b1a5f5 100644 > --- a/drivers/net/usb/asix.h > +++ b/drivers/net/usb/asix.h > @@ -174,6 +174,13 @@ struct asix_rx_fixup_info { > > struct asix_common_private { > struct asix_rx_fixup_info rx_fixup_info; > +#ifdef CONFIG_NET_DSA > + struct kobject kobj; > + struct mii_bus *mdio; > + int use_embphy; > + bool dsa_up; > + struct usbnet *dev; > +#endif > }; > > extern const struct driver_info ax88172a_info; > diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c > index bf49792..57b3a34 100644 > --- a/drivers/net/usb/asix_devices.c > +++ b/drivers/net/usb/asix_devices.c > @@ -35,6 +35,88 @@ > > #define PHY_MODE_RTL8211CL 0x000C > > +#ifdef CONFIG_NET_DSA > + > +#define AX88772_RMII 0x02 > +#define AX88772_MAX_PORTS 0x20 > +#define MV88e6065_ID 0x0c89 > + > +#include > +#include > + > +#define to_asix_obj(x) container_of(x, struct asix_common_private, kobj) > +#define to_asix_attr(x) container_of(x, struct asix_attribute, attr) > + > +static int mii_asix_read(struct mii_bus *bus, int phy_id, int regnum) > +{ > + return asix_mdio_read(((struct usbnet *)bus->priv)->net, phy_id, > regnum); > +} > + > +static int mii_asix_write(struct mii_bus *bus, int phy_id, int regnum, u16 > val) > +{ > + asix_mdio_write(((struct usbnet *)bus->priv)->net, phy_id, regnum, val); > + return 0; > +} > + > +static int ax88772_init_mdio(struct usbnet *dev){ > + struct asix_common_private *priv = dev->driver_priv; > + int ret, i; > + > + priv->mdio = mdiobus_alloc(); > + if (!priv->mdio) { > + netdev_err(dev->net, "Could not allocate mdio 
bus\n"); > + return -ENOMEM; > + } > + > + priv->mdio->priv = (void *)dev; > + priv->mdio->read = &mii_asix_read; > + priv->mdio->write = &mii_asix_write; > + priv->mdio->name = "ax88772 mdio bus"; > + priv->mdio->parent = &dev->udev->dev; > + > + /* mii bus name is usb-- */ > + snprintf(priv->mdio->id, MII_BUS_ID_SIZE, > "usb-%03d:%03d",dev->udev->bus->busnum, dev->udev->devnum); > + > + priv->mdio->irq = kzalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL); > + if (!priv->mdio->irq) { > + ret = -ENOMEM; > + goto free; > + } > + > + for (i = 0; i < PHY_MAX_ADDR; i++) > + priv->mdio->irq[i] = PHY_POLL; > + > + ret = mdiobus_register(priv->mdio); > + if (ret) { > + netdev_err(dev->net, "Could not register MDIO bus\n"); > + goto free_irq; > + } > + > + netdev_info(dev->net, "registered mdio bus %s\n", priv->mdio->id); > + return 0; > + > +free_irq: > + kfree(priv->mdio->irq); > +free: > + mdiobus_free(priv->mdio); > + return ret; > +} There is already identical code in drivers/net/usb/ax88172a.c. Any chance these ASIX devices can share some code, or does it all have to be duplicated for each new chip? > +// dsa_free(); TODO Probably not like that... > +static ssize_t usb_dsa_store(struct asix_common_private *priv, > + struct asix_attribute *attr, const char *buf, size_t count) > +{ > + ax88772_set_bind_dsa(priv); > + return count; > +} > + > +static ssize_t usb_dsa_show(struct asix_common_private *priv, > + struct asix_attribute *attr, char *buf) > +{ > + return scnprintf(buf, PAGE_SIZE, "usb_dsa_binding.\n"); > +} I'm not sure I understand this at all. What kind of userspace API are you trying to provide here? Maybe you could document these attributes Documentation/ABI/testing/ to make it more clear? 
> +static void driver_release(struct kobject *kobj) > +{ > + pr_debug("driver: '%s': %s\n", kobject_name(kobj), __func__); > +// kfree(drv_priv); TODO free > +} Ah, I guess you might have missed this section of Documentation/kobject.txt ?: One important point cannot be overstated: every kobject must have a release() method, and the kobject must persist (in a consistent state) until that method is called. If these constraints are not met, the code is flawed. Note that the kernel will warn you if you forget to provide a release() method. Do not try to get rid of this warning by providing an "empty" release function; you will be mocked mercilessly by the kobject maintainer if you attempt this. Better CC Greg KH on your next attempt to make sure you get the mocking you deserve :-) > +static ssize_t usb_dsa_attr_show(struct kobject *kobj, > + struct attribute *attr, > + char *buf) > +{ >
[PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails
When rhashtable_insert_rehash() fails with ENOMEM, this indicates that we can't allocate the necessary memory in the current context but the limits as set by the user would still allow the table to grow.

Thus attempt an async resize in the background where we can allocate using GFP_KERNEL which is more likely to succeed. The insertion itself will still fail to indicate pressure.

This fixes a bug where the table would never continue growing once the utilization is above 100%.

Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf
---
 include/linux/rhashtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e23d242..7040b5c 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -593,6 +593,8 @@ slow_path:
 		spin_unlock_bh(lock);
 		err = rhashtable_insert_rehash(ht);
 		rcu_read_unlock();
+		if (err == -ENOMEM)
+			schedule_work(&ht->run_work);
 		if (err)
 			return err;

-- 
2.3.5
[PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further
The current code only stops inserting rehashes into the chain when no resizes are scheduled. As long as resizes are scheduled and insertions happen above the utilization watermark, more and more rehashes will be scheduled. This led to a perfect DoS storm with thousands of rehashes scheduled, which in turn led to thousands of spinlocks being taken sequentially. Instead, only allow either a series of resizes or a single rehash. Drop any further rehashes and return -EBUSY. Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion") Signed-off-by: Thomas Graf --- lib/rhashtable.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 4898442..cb819ed 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -405,8 +405,8 @@ int rhashtable_insert_rehash(struct rhashtable *ht) if (rht_grow_above_75(ht, tbl)) size *= 2; - /* More than two rehashes (not resizes) detected. */ - else if (WARN_ON(old_tbl != tbl && old_tbl->size == size)) + /* Do not schedule more than one rehash */ + else if (old_tbl != tbl) return -EBUSY; new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC); -- 2.3.5
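The decision made in the hunk above can be modelled in isolation. The sketch below is a user-space approximation, not the kernel function: `toy_tbl` and `toy_next_size()` are hypothetical names, and it assumes that `old_tbl` is the table currently in use while `tbl` is the newest table already chained behind it. Growth is always allowed; a same-size rehash is refused once another table is already pending.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bucket table: only the size matters for this sketch. */
struct toy_tbl {
	size_t size;
};

/*
 * Returns the size to allocate for the next table, or -EBUSY when a
 * rehash is requested while another table is already pending.
 * above_75 stands in for the utilization check in the patch.
 */
static long toy_next_size(const struct toy_tbl *old_tbl,
			  const struct toy_tbl *tbl, bool above_75)
{
	size_t size = tbl->size;

	if (above_75)
		size *= 2;          /* resize: a series of these is allowed */
	else if (old_tbl != tbl)
		return -EBUSY;      /* do not schedule more than one rehash */

	return (long)size;
}
```

With a single table and low utilization the function returns the unchanged size (one rehash), and any further call made while that rehash is pending returns -EBUSY instead of queueing more work.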
[PATCH net 0/2] rhashtable rehashing fixes
Some rhashtable rehashing bugs found while testing with the next rhashtable self-test queued up for the next devel cycle: https://github.com/tgraf/net-next/commits/rht Thomas Graf (2): rhashtable: Schedule async resize when sync realloc fails rhashtable: Do not schedule more than one rehash if we can't grow further include/linux/rhashtable.h | 2 ++ lib/rhashtable.c | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) -- 2.3.5
Re: [RFC PATCH 0/3] Enable connecting DSA-based switch to the USB RMII interface.
Hi Jan, Interesting work, but I think the architecture is wrong. DSA needs an Ethernet device, an MDIO bus, and information about ports on the switch. The MDIO bus and the Ethernet device need no knowledge of DSA. So putting your DSA configuration code in the MDIO driver is wrong. The problem you have is where to put the configuration data. There are currently two choices: using a platform driver, examples of which you can find in arch/arm/mach-orion5x, or via device tree. Or you need a new method. Part of your problem is hotplug, since you have a USB device, and no stable names for the Ethernet device nor the MDIO device. Your hardware is not fixed; you could hang any switch off the USB device. So it does sound like you need a user space API. I would however say that sysfs is the wrong API. The Linux network stack uses netlink for most configuration activities. So I would suggest adding a netlink binding to DSA, and placing the code in net/dsa/, not within an MDIO driver. Device tree overlays might be a solution, if you can dynamically load a blob as part of a USB hotplug event. What makes it easier is that both the Ethernet device and MDIO bus are on the same USB device, so all your phandles are within the blob. What is your long term goal? Is this just a development tool? Are you thinking of making a product which integrates both the switch and the USB Ethernet onto a USB dongle? This could also change the architecture, since it makes the configuration more fixed. Andrew
[PATCH net] net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP
From: Eran Ben Elisha Currently we parse max_msg_sz from the wrong offset in QUERY_DEV_CAP, fix to use the right offset. Fixes: 0b131561a7d6 ('net/mlx4_en: Add Flow control statistics [..]') Signed-off-by: Eran Ben Elisha Signed-off-by: Or Gerlitz --- Hi Dave, Sending this fix early as that innocent bug breaks RoCE applications on SRIOV VFs, since the max message size there gets down to two bytes. No need for -stable here as the bug was introduced in this merge window. Or. drivers/net/ethernet/mellanox/mlx4/fw.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c index b9881fc..a407981 100644 --- a/drivers/net/ethernet/mellanox/mlx4/fw.c +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c @@ -781,10 +781,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); dev_cap->num_ports = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_MSG_SZ_OFFSET); + dev_cap->max_msg_sz = 1 << (field & 0x1f); MLX4_GET(field, outbox, QUERY_DEV_CAP_PORT_FLOWSTATS_COUNTERS_OFFSET); if (field & 0x10) dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_FLOWSTATS_EN; - dev_cap->max_msg_sz = 1 << (field & 0x1f); MLX4_GET(field, outbox, QUERY_DEV_CAP_FLOW_STEERING_RANGE_EN_OFFSET); if (field & 0x80) dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_FS_EN; -- 1.7.1
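The bug fixed above is an ordering problem: a scratch variable is reloaded from a second offset before the value read from the first offset has been consumed. The following sketch reproduces that pattern in user space. Everything in it is hypothetical (the `toy_get()` helper, the two offsets and the sample mailbox contents); it is not the mlx4 firmware layout, only an illustration of why moving one line changes the result.

```c
#include <stdint.h>

/* Hypothetical mailbox offsets, chosen only for this sketch. */
enum {
	TOY_MAX_MSG_SZ_OFFSET = 0,
	TOY_FLOWSTATS_OFFSET  = 1,
};

/* Hypothetical helper: read one byte from the output mailbox. */
static uint8_t toy_get(const uint8_t *outbox, unsigned int offset)
{
	return outbox[offset];
}

/* Buggy order: 'field' is overwritten before the size is computed. */
static unsigned int toy_max_msg_sz_buggy(const uint8_t *outbox)
{
	uint8_t field = toy_get(outbox, TOY_MAX_MSG_SZ_OFFSET);

	field = toy_get(outbox, TOY_FLOWSTATS_OFFSET); /* clobbers the size field */
	return 1u << (field & 0x1f);
}

/* Fixed order: consume 'field' right after reading the size offset. */
static unsigned int toy_max_msg_sz_fixed(const uint8_t *outbox)
{
	uint8_t field = toy_get(outbox, TOY_MAX_MSG_SZ_OFFSET);
	unsigned int max_msg_sz = 1u << (field & 0x1f);

	field = toy_get(outbox, TOY_FLOWSTATS_OFFSET); /* safe to reuse now */
	(void)field;
	return max_msg_sz;
}
```

With a sample mailbox whose size byte is 20 and whose next byte is 1, the fixed version yields 1 << 20 while the buggy version yields 1 << 1, because it shifts by whatever happens to sit at the second offset.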
Re: [PATCH v2 07/11] IB/cm: Add network namespace support
On 20/04/2015 20:06, Jason Gunthorpe wrote: > On Mon, Apr 20, 2015 at 12:03:38PM +0300, Haggai Eran wrote: >> From: Guy Shapiro >> >> Add namespace support to the IB-CM layer. > >> - Each CM-ID now has a network namespace it is associated with, assigned at >> creation. This namespace is used as needed during subsequent action on the >> CM-ID or related objects. > > There is really something weird about this layering. At the CM layer > there should be no concept of an IP address, it only deals with GIDs. Using the GID alone is not enough to distinguish between namespaces, because you can have multiple IPoIB interfaces, all using the same GID (and possibly the same P_Key), and each belonging to a different namespace. > So how can a CM object have a network namespace associated with it? The listener rbtree's key is currently the service ID, for instance. Now with namespaces, you can have multiple listeners listening on the same service ID, so we need to use (service ID, namespace) as the key. > >> { >> av->port = port; >> av->pkey_index = wc->pkey_index; >> ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc, >> - grh, &av->ah_attr, &init_net); >> + grh, &av->ah_attr, net); > > There is something deeply wrong with adding network namespace > arguments to verbs. > > For rocee the gid index clearly specifies the network namespace > to use, so much of this should go away and have rocee get the > namespace from the gid index. > > Ie in ib_init_ah_from_wc we have the ib_wc which contains the sgid > index. I don't see it there. The code seems to fetch the GID from the GRH. Because the IP address in the source GID can be the same for different namespaces, this is not enough to pick the right namespace. Regards, Haggai
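The (service ID, namespace) key mentioned in the reply above can be pictured as a two-field comparison. This is only a sketch under stated assumptions: `toy_listen_key` and `toy_listen_key_cmp()` are hypothetical names, the namespace is represented by an opaque pointer, and the real CM listener lookup has more to consider than shown here. It merely shows how two listeners on the same service ID stay distinct once the namespace is part of the ordering.

```c
#include <stdint.h>

/* Hypothetical lookup key: service ID plus an opaque namespace handle. */
struct toy_listen_key {
	uint64_t service_id;
	const void *net; /* stands in for a network namespace pointer */
};

/* Three-way comparison suitable for ordering keys in a search tree. */
static int toy_listen_key_cmp(const struct toy_listen_key *a,
			      const struct toy_listen_key *b)
{
	if (a->service_id != b->service_id)
		return a->service_id < b->service_id ? -1 : 1;

	/* Same service ID: fall back to the namespace to break the tie. */
	if (a->net != b->net)
		return (uintptr_t)a->net < (uintptr_t)b->net ? -1 : 1;

	return 0;
}
```

Two keys with equal service IDs but different namespace handles compare as different, so both listeners can live in the same tree.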
[RFC PATCH 1/3] net/dsa: Refactor dsa_probe()
From: Jan Kaisrlik This patch refactors dsa_probe in order to simplify code in the patch 2/3. --- net/dsa/dsa.c | 82 ++- 1 file changed, 47 insertions(+), 35 deletions(-) diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 3731714..e2c0703 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -699,12 +699,56 @@ static inline void dsa_of_remove(struct platform_device *pdev) } #endif -static int dsa_probe(struct platform_device *pdev) +static int dsa_probe_common(struct dsa_switch_tree *dst, struct device *parent) { + struct dsa_platform_data *pd = dst->pd; + struct net_device *dev = dst->master_netdev; + int i; + + dst->cpu_switch = -1; + dst->cpu_port = -1; + + for (i = 0; i < pd->nr_chips; i++) { + struct dsa_switch *ds; + + ds = dsa_switch_setup(dst, i, parent, pd->chip[i].host_dev); + if (IS_ERR(ds)) { + netdev_err(dev, "[%d]: couldn't create dsa switch instance (error %ld)\n", + i, PTR_ERR(ds)); + continue; + } + + dst->ds[i] = ds; + if (ds->drv->poll_link != NULL) + dst->link_poll_needed = 1; + } + + /* +* If we use a tagging format that doesn't have an ethertype +* field, make sure that all packets from this point on get +* sent to the tag format's receive function. 
+*/ + wmb(); + dev->dsa_ptr = (void *)dst; + + if (dst->link_poll_needed) { + INIT_WORK(&dst->link_poll_work, dsa_link_poll_work); + init_timer(&dst->link_poll_timer); + dst->link_poll_timer.data = (unsigned long)dst; + dst->link_poll_timer.function = dsa_link_poll_timer; + dst->link_poll_timer.expires = round_jiffies(jiffies + HZ); + add_timer(&dst->link_poll_timer); + } + + return 0; + +} + +static int dsa_probe(struct platform_device *pdev){ struct dsa_platform_data *pd = pdev->dev.platform_data; struct net_device *dev; struct dsa_switch_tree *dst; - int i, ret; + int ret; pr_notice_once("Distributed Switch Architecture driver version %s\n", dsa_driver_version); @@ -743,40 +787,8 @@ static int dsa_probe(struct platform_device *pdev) dst->pd = pd; dst->master_netdev = dev; - dst->cpu_switch = -1; - dst->cpu_port = -1; - - for (i = 0; i < pd->nr_chips; i++) { - struct dsa_switch *ds; - - ds = dsa_switch_setup(dst, i, &pdev->dev, pd->chip[i].host_dev); - if (IS_ERR(ds)) { - netdev_err(dev, "[%d]: couldn't create dsa switch instance (error %ld)\n", - i, PTR_ERR(ds)); - continue; - } - - dst->ds[i] = ds; - if (ds->drv->poll_link != NULL) - dst->link_poll_needed = 1; - } - - /* -* If we use a tagging format that doesn't have an ethertype -* field, make sure that all packets from this point on get -* sent to the tag format's receive function. -*/ - wmb(); - dev->dsa_ptr = (void *)dst; - if (dst->link_poll_needed) { - INIT_WORK(&dst->link_poll_work, dsa_link_poll_work); - init_timer(&dst->link_poll_timer); - dst->link_poll_timer.data = (unsigned long)dst; - dst->link_poll_timer.function = dsa_link_poll_timer; - dst->link_poll_timer.expires = round_jiffies(jiffies + HZ); - add_timer(&dst->link_poll_timer); - } + dsa_probe_common(dst, &pdev->dev); return 0; -- 2.1.3
[RFC PATCH 3/3] driver/net/usb: Add support for DSA to ax88772b
From: Jan Kaisrlik This patch adds a possibility to use the RMII interface of the ax88772b instead of the Ethernet PHY which is used now. Binding DSA to a USB device is done via sysfs. --- drivers/net/usb/asix.h | 7 ++ drivers/net/usb/asix_devices.c | 258 - 2 files changed, 261 insertions(+), 4 deletions(-) diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h index 5d049d0..6b1a5f5 100644 --- a/drivers/net/usb/asix.h +++ b/drivers/net/usb/asix.h @@ -174,6 +174,13 @@ struct asix_rx_fixup_info { struct asix_common_private { struct asix_rx_fixup_info rx_fixup_info; +#ifdef CONFIG_NET_DSA + struct kobject kobj; + struct mii_bus *mdio; + int use_embphy; + bool dsa_up; + struct usbnet *dev; +#endif }; extern const struct driver_info ax88172a_info; diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index bf49792..57b3a34 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -35,6 +35,88 @@ #definePHY_MODE_RTL8211CL 0x000C +#ifdef CONFIG_NET_DSA + +#define AX88772_RMII 0x02 +#define AX88772_MAX_PORTS 0x20 +#define MV88e6065_ID 0x0c89 + +#include +#include + +#define to_asix_obj(x) container_of(x, struct asix_common_private, kobj) +#define to_asix_attr(x) container_of(x, struct asix_attribute, attr) + +static int mii_asix_read(struct mii_bus *bus, int phy_id, int regnum) +{ + return asix_mdio_read(((struct usbnet *)bus->priv)->net, phy_id, regnum); +} + +static int mii_asix_write(struct mii_bus *bus, int phy_id, int regnum, u16 val) +{ + asix_mdio_write(((struct usbnet *)bus->priv)->net, phy_id, regnum, val); + return 0; +} + +static int ax88772_init_mdio(struct usbnet *dev){ + struct asix_common_private *priv = dev->driver_priv; + int ret, i; + + priv->mdio = mdiobus_alloc(); + if (!priv->mdio) { + netdev_err(dev->net, "Could not allocate mdio bus\n"); + return -ENOMEM; + } + + priv->mdio->priv = (void *)dev; + priv->mdio->read = &mii_asix_read; + priv->mdio->write = &mii_asix_write; + priv->mdio->name = 
"ax88772 mdio bus"; + priv->mdio->parent = &dev->udev->dev; + + /* mii bus name is usb-- */ + snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",dev->udev->bus->busnum, dev->udev->devnum); + + priv->mdio->irq = kzalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL); + if (!priv->mdio->irq) { + ret = -ENOMEM; + goto free; + } + + for (i = 0; i < PHY_MAX_ADDR; i++) + priv->mdio->irq[i] = PHY_POLL; + + ret = mdiobus_register(priv->mdio); + if (ret) { + netdev_err(dev->net, "Could not register MDIO bus\n"); + goto free_irq; + } + + netdev_info(dev->net, "registered mdio bus %s\n", priv->mdio->id); + return 0; + +free_irq: + kfree(priv->mdio->irq); +free: + mdiobus_free(priv->mdio); + return ret; +} + +static void ax88772_remove_mdio(struct usbnet *dev) +{ + struct asix_common_private *priv = dev->driver_priv; + +// dsa_free(); TODO + + netdev_info(dev->net, "deregistering mdio bus %s\n", priv->mdio->id); + mdiobus_unregister(priv->mdio); + kfree(priv->mdio->irq); + mdiobus_free(priv->mdio); + kfree(priv); +} + +#endif + struct ax88172_int_data { __le16 res1; u8 link; @@ -301,6 +383,27 @@ static int ax88772_reset(struct usbnet *dev) int ret, embd_phy; u16 rx_ctl; +#ifdef CONFIG_NET_DSA + int temp = AX88772_RMII; + struct asix_common_private *priv = dev->driver_priv; + + if (priv->use_embphy == 1) { + data->phymode = PHY_MODE_MARVELL; + data->ledmode = 0; + + /* Set AX88772 to enable RMII interface for external PHY */ + asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, 0, 0, 0, NULL); + asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, 0, 0, 1, &temp); + + asix_sw_reset(dev, 0); + msleep(150); + + asix_write_rx_ctl(dev, 0); + msleep(60); + } + +#endif + ret = asix_write_gpio(dev, AX_GPIO_RSE | AX_GPIO_GPO_2 | AX_GPIO_GPO2EN, 5); if (ret < 0) @@ -415,11 +518,131 @@ static const struct net_device_ops ax88772_netdev_ops = { .ndo_set_rx_mode= asix_set_multicast, }; + +#ifdef CONFIG_NET_DSA +struct asix_attribute { + struct attribute attr; + ssize_t (*show)(struct asix_common_private *priv, 
struct asix_attribute *attr, char *buf); + ssize_t (*store)(struct asix_common_private *priv, struct asix_attribute *attr, const char *buf, size_t count); +}; + +static int ax88772_set_bind_dsa(struct asix_common_private *priv) +{ + struct usbnet *dev = priv->dev; + int i, ret, embd_phy; + u32 phyid;