Re: DSA support for Marvell 88e6065 switch
On Thu, Nov 22, 2018 at 09:27:24PM +0100, Pavel Machek wrote: > Hi! Hello! > > > > > If I wanted it to work, what do I need to do? AFAICT phy autoprobing > > > > > should just attach it as soon as it is compiled in? > > > > > > > > Nope. It is a switch, not a PHY. Switches are never auto-probed > > > > because they are not guaranteed to have ID registers. > > > > > > > > You need to use the legacy device tree binding. Look in > > > > Documentation/devicetree/bindings/net/dsa/dsa.txt, section Deprecated > > > > Binding. You can get more examples if you checkout old kernels. Or > > > > kirkwood-rd88f6281.dtsi, the dsa { } node which is disabled. > > > > > > Thanks; I ported code from mv88e66xx in the meantime, and switch > > > appears to be detected. > > > > > > But I'm running into problems with tagging code, and I guess I'd like > > > some help understanding. > > > > > > tag_trailer: allocates new skb, then copies data around. > > > > > > tag_qca: does dev->stats.tx_packets++, and reuses existing skb. > > > > > > tag_brcm: reuses existing skb. > > Any idea why tag trailer allocates new skb, I wrote this code over 10 years ago, so I don't remember all that well, but I think that it is because you have to do manual checksumming of the packet, as there's no way to pass down the stack that you don't want to checksum all the way down to the end of the data area (and you don't want the tag to be included in the checksum), and so you want to do that before you add the trailer tag, and you'll probably have to reallocate the data area to be able to add the tag, and you probably won't get an exclusive skb here anyway, so you might as well allocate a new one. > and what is going on with dev->stats.tx_packets++? trailer_xmit would be the hard_start_xmit function for the virtual (slave) network interface, so this would be the right thing to do? > > > Is qca wrong in adjusting the statistics? Why does trailer allocate > > > new skb? 
> > > > > > 6065 seems to use 2-byte header between "SFD" and "Destination > > > address" in the ethernet frame. That's ... strange place to put > > > header, as addresses are now shifted. I need to put ethernet in > > > promisc mode (by running tcpdump) to get data moving.. and can not > > > figure out what to do in tag_... > > > > Does this switch chip not also support trailer mode? > > > > There's basically four tagging modes for Marvell switch chips: header > > mode (the one you described), trailer mode (tag_trailer.c), DSA and > > ethertype DSA. The switch chips I worked on that didn't support > > (ethertype) DSA tagging did support both header and trailer modes, > > and I chose to run them in trailer mode for the reasons you describe > > above, but if your chip doesn't support trailer mode, then yes, > > you'll have to add support for header mode and put the underlying > > interface into promiscuous mode and such. > > It seems that 6060 supports both header (probably, parts of docs are > redacted) and trailer mode... but I'm working with 6065. That does not > support trailer mode... or at least word "trailer" does not appear > anywhere in the documentation. > > What chip were you working with? I may want to take a look on their > wording. I think I added trailer mode just for the 6060, since it doesn't (IIRC) support (ethertype) DSA tagging. > 6065 indeed has some kind of "egress tagging mode" (with four > options), but I have trouble understanding what it really does. What are the options?
Re: DSA support for Marvell 88e6065 switch
On Thu, Nov 22, 2018 at 02:21:23PM +0100, Pavel Machek wrote: > > > If I wanted it to work, what do I need to do? AFAICT phy autoprobing > > > should just attach it as soon as it is compiled in? > > > > Nope. It is a switch, not a PHY. Switches are never auto-probed > > because they are not guaranteed to have ID registers. > > > > You need to use the legacy device tree binding. Look in > > Documentation/devicetree/bindings/net/dsa/dsa.txt, section Deprecated > > Binding. You can get more examples if you checkout old kernels. Or > > kirkwood-rd88f6281.dtsi, the dsa { } node which is disabled. > > Thanks; I ported code from mv88e66xx in the meantime, and switch > appears to be detected. > > But I'm running into problems with tagging code, and I guess I'd like > some help understanding. > > tag_trailer: allocates new skb, then copies data around. > > tag_qca: does dev->stats.tx_packets++, and reuses existing skb. > > tag_brcm: reuses existing skb. > > Is qca wrong in adjusting the statistics? Why does trailer allocate > new skb? > > 6065 seems to use 2-byte header between "SFD" and "Destination > address" in the ethernet frame. That's ... strange place to put > header, as addresses are now shifted. I need to put ethernet in > promisc mode (by running tcpdump) to get data moving.. and can not > figure out what to do in tag_... Does this switch chip not also support trailer mode? There's basically four tagging modes for Marvell switch chips: header mode (the one you described), trailer mode (tag_trailer.c), DSA and ethertype DSA. The switch chips I worked on that didn't support (ethertype) DSA tagging did support both header and trailer modes, and I chose to run them in trailer mode for the reasons you describe above, but if your chip doesn't support trailer mode, then yes, you'll have to add support for header mode and put the underlying interface into promiscuous mode and such.
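To make the difference between header mode and trailer mode concrete, here is a minimal userspace sketch in C; it is not driver code. The function names, the buffer handling and the zeroed 2-byte header contents are assumptions made for illustration. The four trailer bytes follow what I believe tag_trailer.c emits, but treat them as an assumption too and check the source. The point of the sketch is only that header mode moves the destination address from offset 0 to offset 2, which is why the underlying interface's address filter stops matching, while trailer mode leaves the addresses where the NIC expects them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_TAG_LEN     2   /* header mode: 2 bytes in front of the destination MAC */
#define TRAILER_TAG_LEN 4   /* trailer mode: 4 bytes after the payload */

/*
 * Header mode: put a 2-byte tag in front of the frame, so the
 * destination address ends up at offset 2 instead of offset 0.
 * The tag contents are hypothetical placeholders.
 * Returns the new frame length, or 0 if 'out' is too small.
 */
size_t tag_header_mode(const uint8_t *frame, size_t len,
                       uint8_t *out, size_t out_size)
{
    assert(frame != NULL && out != NULL);
    if (out_size < len + HDR_TAG_LEN)
        return 0;
    out[0] = 0x00;
    out[1] = 0x00;
    memcpy(out + HDR_TAG_LEN, frame, len);
    return len + HDR_TAG_LEN;
}

/*
 * Trailer mode: copy the frame unchanged and append the tag, so the
 * destination address stays at offset 0.
 * Trailer byte values are an assumption (see the note above).
 * Returns the new frame length, or 0 on bad arguments.
 */
size_t tag_trailer_mode(const uint8_t *frame, size_t len, unsigned int port,
                        uint8_t *out, size_t out_size)
{
    assert(frame != NULL && out != NULL);
    if (out_size < len + TRAILER_TAG_LEN || port > 7)
        return 0;
    memcpy(out, frame, len);
    out[len + 0] = 0x80;
    out[len + 1] = (uint8_t)(1u << port);   /* destination port bitmap */
    out[len + 2] = 0x10;
    out[len + 3] = 0x00;
    return len + TRAILER_TAG_LEN;
}
```

With a broadcast frame as input, the first byte of the header-mode output is tag data rather than 0xff, so a NIC that filters on the first six bytes would not recognise it without promiscuous mode.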
Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference
On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote: > we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using > IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4. I looked into this with Jean-Philippe, and it appears to be crashing on a NULL pointer dereference in the inlined xfrm_policy_check() call in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in __xfrm_policy_check2(): return (!net->xfrm.policy_count[dir] && !skb->sp) || (skb_dst(skb)->flags & DST_NOPOLICY) || <= __xfrm_policy_check(sk, ndir, skb, family); Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.") fixes a very similar problem on the output and forward paths, but our issue seems to be triggering on the input path. This hack patch seems to make the crashes go away, and the printk added triggers with approximately the same regularity as the crashes used to occur, so the fix from 9e1437937807 probably needs to be extended to the input path somewhat like this. Thanks! diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 352abca2605f..c666e29441b4 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -381,6 +381,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) XFRM_SKB_CB(skb)->seq.input.hi = seq_hi; skb_dst_force(skb); + if (!skb_dst(skb)) { + if (net_ratelimit()) + printk(KERN_CRIT "OH CRAP\n"); + goto drop; + } + dev_hold(skb->dev); if (crypto_done) > Attached, the offended oops against 4.18. 
> > Output of decodedecode: > > [ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 > 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe 40 > 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20 > All code > >0: 8b 44 24 70 mov0x70(%rsp),%eax >4: 0f c8 bswap %eax >6: 89 87 b4 00 00 00 mov%eax,0xb4(%rdi) >c: 48 8b 86 20 05 00 00mov0x520(%rsi),%rax > 13: 8b 80 f8 14 00 00 mov0x14f8(%rax),%eax > 19: 85 c0 test %eax,%eax > 1b: 75 05 jne0x22 > 1d: 48 85 d2test %rdx,%rdx > 20: 74 0e je 0x30 > 22: 48 8b 43 58 mov0x58(%rbx),%rax > 26: 48 83 e0 fe and$0xfffe,%rax > 2a:* f6 40 38 04 testb $0x4,0x38(%rax) <-- trapping > instruction > 2e: 74 7d je 0xad > 30: 44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx) > 37: 49 8b 44 24 20 mov0x20(%r12),%rax > 3c: 48 rex.W > 3d: 39 .byte 0x39 > 3e: 86 20 xchg %ah,(%rax) > > Code starting with the faulting instruction > === >0: f6 40 38 04 testb $0x4,0x38(%rax) >4: 74 7d je 0x83 >6: 44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx) >d: 49 8b 44 24 20 mov0x20(%r12),%rax > 12: 48 rex.W > 13: 39 .byte 0x39 > 14: 86 20 xchg %ah,(%rax) > > > if my understanding is correct, we fail here: > > /build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h: > 1169return (!net->xfrm.policy_count[dir] && !skb->sp) || >0x0b19 <+185>: testb $0x4,0x38(%rax) >0x0b1d <+189>: je 0xb9c > > (gdb) list *0x0b19 > 0xb19 is in vti_rcv_cb > (/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169). > 1164int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0); > 1165 > 1166if (sk && sk->sk_policy[XFRM_POLICY_IN]) > 1167return __xfrm_policy_check(sk, ndir, skb, family); > 1168 > 1169return (!net->xfrm.policy_count[dir] && !skb->sp) || > 1170(skb_dst(skb)->flags & DST_NOPOLICY) || > 1171__xfrm_policy_check(sk, ndir, skb, family); > 1172} > 1173 > > I really have hard time to understand why skb seem to be freed twice. 
> > I'm not able to repeat the bug in lab, but it happened regulary in prod, > seem to depend of the workload. > > Any help will be appreciated. > > Let me know if you need further informations. > > Regards, > > Jean-Philippe > [ 31.154360] BUG: unable to handle kernel NULL pointer dereference at > 0038 > [ 31.162233] PGD 0 P4D 0 > [ 31.164786] Oops: [#1] SMP PTI > [ 31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic > #12~18.04.1-Ubuntu > [ 31.176854] Hardware name: Supermicro
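The failure mode discussed above can be modelled in userspace. The sketch below is an assumption-laden illustration, not kernel code: every `fake_` name is hypothetical and the structures are far simpler than the real sk_buff and dst_entry. It shows the three ingredients from the report: the dst pointer is stored with a low "no reference held" bit that is masked off on read, forcing a reference can fail and clear the pointer, and the caller therefore has to check for NULL before looking at the flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_DST_NOREF    ((uintptr_t)1)      /* low bit: no reference held */
#define FAKE_DST_PTRMASK  (~FAKE_DST_NOREF)
#define FAKE_DST_NOPOLICY 0x0004

struct fake_dst { int refcnt; unsigned short flags; };
struct fake_skb { uintptr_t refdst; };

/* Return the dst pointer with the tag bit masked off (may be NULL). */
struct fake_dst *fake_skb_dst(const struct fake_skb *skb)
{
    return (struct fake_dst *)(skb->refdst & FAKE_DST_PTRMASK);
}

/*
 * Turn a non-refcounted dst into a refcounted one. If the dst is
 * already on its way out (refcnt == 0 in this model), the pointer
 * is cleared instead.
 */
void fake_skb_dst_force(struct fake_skb *skb)
{
    struct fake_dst *dst = fake_skb_dst(skb);

    if (!(skb->refdst & FAKE_DST_NOREF))
        return;
    if (dst == NULL || dst->refcnt == 0) {
        skb->refdst = 0;
        return;
    }
    dst->refcnt++;
    skb->refdst = (uintptr_t)dst;
}

/*
 * Input-path check with the extra guard: returns false if the packet
 * has to be dropped because the dst went away, true otherwise.
 */
bool fake_input_check(struct fake_skb *skb, bool *nopolicy)
{
    fake_skb_dst_force(skb);
    if (fake_skb_dst(skb) == NULL)
        return false;               /* the "goto drop" case */
    *nopolicy = (fake_skb_dst(skb)->flags & FAKE_DST_NOPOLICY) != 0;
    return true;
}
```

Without the NULL check in fake_input_check(), the flags read would go through a NULL pointer, which corresponds to the small nonzero fault address in the oops (the offset of the field being read).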
Re: [PATCH net] packet: fix reserve calculation
On Thu, May 24, 2018 at 06:10:30PM -0400, Willem de Bruijn wrote: > From: Willem de Bruijn > > Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link > layer allocation") ensures that packet_snd always starts writing > the link layer header in reserved headroom allocated for this > purpose. > > This is needed because packets may be shorter than hard_header_len, > in which case the space up to hard_header_len may be zeroed. But > that necessary padding is not accounted for in skb->len. > > The fix, however, is buggy. It calls skb_push, which grows skb->len > when moving skb->data back. But in this case packet length should not > change. > > Instead, call skb_reserve, which moves both skb->data and skb->tail > back, without changing length. > > Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer > allocation") > Reported-by: Tariq Toukan > Signed-off-by: Willem de Bruijn > Acked-by: Soheil Hassas Yeganeh After upgrading my router from 4.16.11 to 4.16.12, it is failing to obtain a DHCP lease from my ISP, as it started sending out DHCP queries with 14 bytes of junk at the end (which is presumably causing RX csum failures on the DHCP server end): 13:08:39.292667 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) 0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from xx:xx:xx:xx:xx:xx, length 300, xid 0x, Flags [none] (0x) [...] -0x0150: +0x0150: e802 e802 +0x0160: This seems to be caused by (the -stable backport of) b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") and appears to have been fixed by this patch, as applying this patch to 4.16.12 makes DHCP work for me again. 
Tested-by: Lennert Buytenhek > --- > net/packet/af_packet.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index e9422fe45179..acb7b86574cd 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -2911,7 +2911,7 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > if (unlikely(offset < 0)) > goto out_free; > } else if (reserve) { > - skb_push(skb, reserve); > + skb_reserve(skb, -reserve); > } > > /* Returns -EFAULT on error */ > -- > 2.17.0.921.gf22659ad46-goog
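The difference between the two calls in the patch can be shown with a small userspace model. This is a sketch under stated assumptions: the `fake_` types and helpers are hypothetical, they track only offsets and lengths, and the 14/342 byte sizes are example values. What it mirrors is the semantics described in the commit message: skb_push() moves data back and grows len, while skb_reserve() with a negative argument moves data and tail back together and leaves len alone.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets are relative to the start of the buffer ("head"). */
struct fake_skb {
    size_t data;   /* start of packet data */
    size_t tail;   /* end of packet data */
    size_t len;    /* accounted packet length */
};

/* Model of skb_reserve(): shift data and tail, len unchanged. */
void fake_skb_reserve(struct fake_skb *skb, int n)
{
    assert(n >= 0 || skb->data >= (size_t)(-n));
    skb->data = (size_t)((long)skb->data + n);
    skb->tail = (size_t)((long)skb->tail + n);
}

/* Model of skb_put(): extend the data area at the tail. */
void fake_skb_put(struct fake_skb *skb, size_t n)
{
    skb->tail += n;
    skb->len += n;
}

/* Model of skb_push(): move data back and grow len. */
void fake_skb_push(struct fake_skb *skb, size_t n)
{
    assert(skb->data >= n);
    skb->data -= n;
    skb->len += n;
}

/* Allocate with 'headroom' reserved and 'payload' bytes of data. */
struct fake_skb fake_alloc(size_t headroom, size_t payload)
{
    struct fake_skb skb = { 0, 0, 0 };

    fake_skb_reserve(&skb, (int)headroom);
    fake_skb_put(&skb, payload);
    return skb;
}
```

Starting from fake_alloc(14, 342), pushing 14 bytes leaves a frame whose length is 14 larger than the data the sender supplied, which matches the 14 bytes of trailing junk seen on the wire; reserving -14 instead keeps the length at 342.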
Re: [PATCH net] net: ipv6: Compare lwstate in detecting duplicate nexthops
On Wed, Jul 05, 2017 at 02:14:33PM -0700, Roopa Prabhu wrote:

> > Lennert reported a failure to add different mpls encaps in a multipath
> > route:
> >
> >   $ ip -6 route add 1234::/16 \
> >     nexthop encap mpls 10 via fe80::1 dev ens3 \
> >     nexthop encap mpls 20 via fe80::1 dev ens3
> >   RTNETLINK answers: File exists
> >
> > The problem is that the duplicate nexthop detection does not compare
> > lwtunnel configuration. Add it.
> >
> > Fixes: 19e42e451506("ipv6: support for fib route lwtunnel encap attributes")
> > Signed-off-by: David Ahern <dsah...@gmail.com>
> > Reported-by: João Taveira Araújo <joao.tave...@gmail.com>
> > Reported-by: Lennert Buytenhek <buyt...@wantstofly.org>
> > Acked-by: Roopa Prabhu <ro...@cumulusnetworks.com>

Tested-by: Lennert Buytenhek <buyt...@wantstofly.org>

Seems to work! Thanks!
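As a rough illustration of the fix described in the quoted commit message, the sketch below compares two nexthops with and without taking the encapsulation into account. All names are hypothetical and the structures are simplified stand-ins (an MPLS label is the only encap state modelled); the real kernel code compares lwtunnel state through its own helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct fake_encap {
    unsigned int mpls_label;
};

struct fake_nexthop {
    int ifindex;                     /* outgoing device */
    unsigned char gateway[16];       /* IPv6 gateway address */
    const struct fake_encap *encap;  /* NULL when there is no encap */
};

/* Two encaps differ if only one is present or their labels differ. */
bool fake_encap_differs(const struct fake_encap *a, const struct fake_encap *b)
{
    if (a == NULL && b == NULL)
        return false;
    if (a == NULL || b == NULL)
        return true;
    return a->mpls_label != b->mpls_label;
}

/*
 * Duplicate detection. With compare_encap == false this models the
 * old behaviour (device and gateway only); with true it also
 * requires the encap to match.
 */
bool fake_nexthop_is_duplicate(const struct fake_nexthop *a,
                               const struct fake_nexthop *b,
                               bool compare_encap)
{
    if (a->ifindex != b->ifindex)
        return false;
    if (memcmp(a->gateway, b->gateway, sizeof(a->gateway)) != 0)
        return false;
    if (compare_encap && fake_encap_differs(a->encap, b->encap))
        return false;
    return true;
}
```

For two nexthops via the same gateway and device with labels 10 and 20, the old comparison reports a duplicate (hence "File exists"), while the comparison that includes the encap does not.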
Unable to add v6 multipath route with same nexthops but different MPLS labels
Hi!

FWIW, this doesn't work:

  # ip -6 route add 1234::/16 \
      nexthop encap mpls 10 via fe80::1 dev ens3 \
      nexthop encap mpls 20 via fe80::1 dev ens3
  RTNETLINK answers: File exists

While this does:

  # ip -6 route chg 1234::/16 nexthop encap mpls 10 via fe80::1 dev ens3 nexthop encap mpls 20 via fe80::2 dev ens3
  # ip -6 route
  1234::/16 encap mpls 10 via fe80::1 dev ens3 metric 1024 pref medium
  1234::/16 encap mpls 20 via fe80::2 dev ens3 metric 1024 pref medium
  [...]

ECMPing over different LSPs that share a nexthop router seems like a legitimate use case to me. Is this restriction intentional or just an accident? (The same thing works fine in v4 land, where multipath routes are handled differently.)

Thanks in advance!

Cheers,
Lennert
Re: [PATCH v2 net-next 06/12] ep93xx_eth: add GRO support
On Sat, Feb 04, 2017 at 03:24:56PM -0800, Eric Dumazet wrote:

> Use napi_complete_done() instead of __napi_complete() to :
>
> 1) Get support of gro_flush_timeout if opt-in
> 2) Not rearm interrupts for busy-polling users.
> 3) use standard NAPI API.
> 4) get rid of baroque code and ease maintenance.
>
> [...]
>
> @@ -310,35 +311,17 @@ static int ep93xx_rx(struct net_device *dev, int processed, int budget)
>  	return processed;
>  }
>
> -static int ep93xx_have_more_rx(struct ep93xx_priv *ep)
> -{
> -	struct ep93xx_rstat *rstat = ep->descs->rstat + ep->rx_pointer;
> -	return !!((rstat->rstat0 & RSTAT0_RFP) && (rstat->rstat1 & RSTAT1_RFP));
> -}
> -
>  static int ep93xx_poll(struct napi_struct *napi, int budget)
>  {
>  	struct ep93xx_priv *ep = container_of(napi, struct ep93xx_priv, napi);
>  	struct net_device *dev = ep->dev;
> -	int rx = 0;
> -
> -poll_some_more:
> -	rx = ep93xx_rx(dev, rx, budget);
> -	if (rx < budget) {
> -		int more = 0;
> +	int rx;
>
> +	rx = ep93xx_rx(dev, budget);
> +	if (rx < budget && napi_complete_done(napi, rx)) {
>  		spin_lock_irq(&ep->rx_lock);
> -		__napi_complete(napi);
>  		wrl(ep, REG_INTEN, REG_INTEN_TX | REG_INTEN_RX);
> -		if (ep93xx_have_more_rx(ep)) {
> -			wrl(ep, REG_INTEN, REG_INTEN_TX);
> -			wrl(ep, REG_INTSTSP, REG_INTSTS_RX);
> -			more = 1;
> -		}
>  		spin_unlock_irq(&ep->rx_lock);
> -
> -		if (more && napi_reschedule(napi))
> -			goto poll_some_more;
>  	}
>
>  	if (rx) {

This code was the way it was because the ep93xx hardware is somewhat braindead. If I remember correctly (but it's been a while since I wrote this code):

1. ep93xx netdev IRQs are edge-triggered, so if you re-enable IRQs while there was still work to be done, you will not get another IRQ.

2. Disabling an interrupt source in the interrupt mask register will cause its interrupt status bit to always return zero, so you cannot check whether an interrupt status is pending without having the interrupt source enabled.

(I'll admit that a comment explaining this would have been in order.)
I don't know if we really care about this hardware anymore (I don't), but the ep93xx platform is still listed as being maintained in the MAINTAINERS file -- adding Ryan and Hartley.
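The race described in points 1 and 2 can be illustrated with a toy model. Everything here is hypothetical: the `fake_nic` structure, the rule that an interrupt fires only on an empty-to-non-empty transition while enabled, and the function names are assumptions chosen to mimic an edge-triggered source, not a description of the real ep93xx registers. The sketch contrasts a completion path that only re-enables the interrupt with one that re-checks for work afterwards, as the removed code did.

```c
#include <stdbool.h>

struct fake_nic {
    int  rx_queued;        /* packets waiting in the RX ring */
    bool rx_irq_enabled;   /* RX interrupt source unmasked? */
    bool poll_scheduled;   /* is the poll routine scheduled? */
};

/*
 * Edge-triggered model: an interrupt fires only when the ring goes
 * from empty to non-empty while the source is enabled. The handler
 * masks the source and schedules polling.
 */
void fake_packet_arrives(struct fake_nic *nic)
{
    bool was_empty = (nic->rx_queued == 0);

    nic->rx_queued++;
    if (was_empty && nic->rx_irq_enabled) {
        nic->rx_irq_enabled = false;
        nic->poll_scheduled = true;
    }
}

/* Completion without a re-check: just unmask and hope for an IRQ. */
void fake_poll_done_naive(struct fake_nic *nic)
{
    nic->poll_scheduled = false;
    nic->rx_irq_enabled = true;
}

/*
 * Completion with a re-check: after unmasking, look at the ring
 * again, and if work slipped in, mask the source and reschedule.
 * Returns true if polling was rescheduled.
 */
bool fake_poll_done_recheck(struct fake_nic *nic)
{
    nic->poll_scheduled = false;
    nic->rx_irq_enabled = true;
    if (nic->rx_queued > 0) {
        nic->rx_irq_enabled = false;
        nic->poll_scheduled = true;
        return true;
    }
    return false;
}
```

In this model, a packet that arrives while the source is masked is never signalled after fake_poll_done_naive(), so the ring keeps filling with nothing scheduled, whereas fake_poll_done_recheck() notices it and reschedules.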
Re: problem with MPLS and TSO/GSO
On Wed, Jul 27, 2016 at 03:02:24PM +0800, zhuyj wrote: > On ubuntu16.04 server 64 bit > The attached script is run, the following will appear. > > Error: either "to" is duplicate, or "encap" is a garbage. Looks like your installed iproute2 package doesn't grok MPLS.
problem with MPLS and TSO/GSO
Hi!

I am seeing pretty horrible TCP transmit performance (anywhere between 1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a route that involves MPLS labeling, and this seems to be due to an interaction between MPLS and TSO/GSO that causes all segmentable TCP frames that are MPLS-labeled to be dropped on egress.

I initially ran into this issue with the ixgbe driver, but it is easily reproduced with veth interfaces, and the script attached below this email reproduces the issue. The script configures three network namespaces: one that transmits TCP data (netperf) with MPLS labels, one that takes the MPLS traffic and pops the labels and forwards the traffic on, and one that receives the traffic (netserver). When not using MPLS labeling, I get ~30000 Mb/s single-stream TCP performance in this setup on my test box, and with MPLS labeling, I get ~2 Mb/s.

Some investigating shows that egress TCP frames that need to be segmented are being dropped in validate_xmit_skb(), which calls skb_gso_segment() which calls skb_mac_gso_segment() which returns -EPROTONOSUPPORT because we apparently didn't have the right kernel module (mpls_gso) loaded.

(It's somewhat poor design, IMHO, to degrade network performance by 15000x if someone didn't load a kernel module they didn't know they should have loaded, and in a way that doesn't log any warnings or errors and can only be diagnosed by adding printk calls to net/core/ and recompiling your kernel.)

(Also, I'm not sure why mpls_gso is needed when ixgbe seems to be able to natively do TSO on MPLS-labeled traffic, maybe because ixgbe doesn't advertise the necessary features in ->mpls_features? But adding those bits doesn't seem to change much.)

But, loading mpls_gso doesn't change much -- skb_gso_segment() then starts returning -EINVAL instead, which is due to the skb_network_protocol() call in skb_mac_gso_segment() returning zero.
And looking at skb_network_protocol(), I don't see how this is supposed to work -- skb->protocol is 0 at this point, and there is no way to figure out that what we are encapsulating is IP traffic, because unlike what is the case with VLAN tags, MPLS labels aren't followed by an inner ethertype that says what kind of traffic is in here, you have to have explicit knowledge of the payload type for MPLS.

Any ideas?

Thanks in advance!

Cheers,
Lennert

=== problem.sh
#!/bin/sh

# ns0 sends out packets with mpls labels
# ns1 receives the labelled packets, pops the labels, and forwards to ns2
# ns2 receives the unlabelled packets and replies to ns0

ip netns add ns0
ip netns add ns1
ip netns add ns2

ip link add virt01 type veth peer name virt10
ip link set virt01 netns ns0
ip link set virt10 netns ns1

ip link add virt12 type veth peer name virt21
ip link set virt12 netns ns1
ip link set virt21 netns ns2

ip netns exec ns0 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns0 ip link set lo up
ip netns exec ns0 ip addr add 172.16.20.20/24 dev virt01
ip netns exec ns0 ip link set virt01 up

ip netns exec ns1 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip addr add 172.16.20.21/24 dev virt10
ip netns exec ns1 ip link set virt10 up
ip netns exec ns1 ip addr add 172.16.21.21/24 dev virt12
ip netns exec ns1 ip link set virt12 up

ip netns exec ns2 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns2 ip link set lo up
ip netns exec ns2 ip addr add 172.16.21.22/24 dev virt21
ip netns exec ns2 ip link set virt21 up

modprobe mpls_iptunnel

ip netns exec ns0 ip route add 10.10.10.10/32 encap mpls 100 via inet 172.16.20.21 mtu lock 1496
#ip netns exec ns0 ip route add 172.16.21.0/24 via 172.16.20.21
ip netns exec ns0 ip route add 172.16.21.0/24 via 172.16.20.21 mtu lock 1496

ip netns exec ns1 sysctl -w net.ipv4.conf.all.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.default.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.lo.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.virt10.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.virt12.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.ip_forward=1
ip netns exec ns1 sysctl -w net.mpls.conf.virt10.input=1
ip netns exec ns1 sysctl -w net.mpls.platform_labels=1000
ip netns exec ns1 ip -f mpls route add 100 via inet 172.16.21.22

ip netns exec ns2 ip addr add 10.10.10.10/32 dev lo
ip netns exec ns2 ip route add 172.16.20.0/24 via 172.16.21.21

ip netns exec ns0 ping -c 1 10.10.10.10

ip netns exec ns2 netserver

# non-mpls
ip netns exec ns0 netperf -c -C -H 172.16.21.22 -l 10 -t TCP_STREAM

# mpls (retry this with mpls_gso loaded)
ip netns exec ns0 netperf -c -C -H 10.10.10.10 -l 10 -t TCP_STREAM
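The point about VLAN tags versus MPLS labels can be shown with a small frame parser. This is a userspace sketch with a hypothetical function name, not what skb_network_protocol() actually does: it walks past any VLAN tags, each of which is followed by an inner ethertype, and gives up (returns 0) when it hits the MPLS unicast ethertype, because nothing in the label stack says what the payload is.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4    0x0800
#define ETHERTYPE_VLAN    0x8100
#define ETHERTYPE_MPLS_UC 0x8847

/* Read a big-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the ethertype of the innermost payload of an Ethernet
 * frame, or 0 if it cannot be determined from the frame itself.
 */
uint16_t fake_network_protocol(const uint8_t *frame, size_t len)
{
    size_t off = 12;            /* ethertype follows two 6-byte MACs */
    uint16_t type;

    if (len < 14)
        return 0;
    type = rd16(frame + off);

    /* A VLAN tag is 4 bytes and is followed by the inner ethertype. */
    while (type == ETHERTYPE_VLAN) {
        if (len < off + 6)
            return 0;
        off += 4;
        type = rd16(frame + off);
    }

    /* An MPLS label stack carries no payload type field. */
    if (type == ETHERTYPE_MPLS_UC)
        return 0;

    return type;
}
```

For a VLAN-tagged IPv4 frame this returns 0x0800; for an MPLS-labelled frame carrying the same IPv4 packet it returns 0, so the payload type has to come from somewhere outside the frame.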
[PATCH] neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
From: David Barroso <dbarr...@fastly.com>

neigh_xmit() expects to be called inside an RCU-bh read side critical section, and while one of its two current callers gets this right, the other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and mpls_output(), and while both callers call neigh_xmit() under rcu_read_lock(), this provides sufficient protection for neigh_xmit() only in the case of mpls_forward(), as that is always called from softirq context and therefore doesn't need explicit BH protection, while mpls_output() can be called from process context with softirqs enabled.

When mpls_output() is called from process context, with softirqs enabled, we can be preempted by a softirq at any time, and RCU-bh considers the completion of a softirq as signaling the end of any pending read-side critical sections, so if we do get a softirq while we are in the part of neigh_xmit() that expects to be run inside an RCU-bh read side critical section, we can end up with an unexpected RCU grace period running right in the middle of that critical section, making things go boom.

This patch fixes this impedance mismatch in the callee, by making neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that expects to be treated as an RCU-bh read side critical section, as this seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso <dbarr...@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>
Acked-by: David Ahern <d...@cumulusnetworks.com>
Acked-by: Robert Shearman <rshea...@brocade.com>
---
 net/core/neighbour.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 29dd8cc..510cd62 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2469,13 +2469,17 @@ int neigh_xmit(int index, struct net_device *dev,
 		tbl = neigh_tables[index];
 		if (!tbl)
 			goto out;
+		rcu_read_lock_bh();
 		neigh = __neigh_lookup_noref(tbl, addr, dev);
 		if (!neigh)
 			neigh = __neigh_create(tbl, addr, dev, false);
 		err = PTR_ERR(neigh);
-		if (IS_ERR(neigh))
+		if (IS_ERR(neigh)) {
+			rcu_read_unlock_bh();
 			goto out_kfree_skb;
+		}
 		err = neigh->output(neigh, skb);
+		rcu_read_unlock_bh();
 	}
 	else if (index == NEIGH_LINK_TABLE) {
 		err = dev_hard_header(skb, dev, ntohs(skb->protocol),
-- 
2.7.4
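The control flow of the patch, with the read-side section opened in the callee and closed on both the error path and the normal path, can be modelled in userspace. The sketch below is only that: the `fake_` lock functions are counters standing in for rcu_read_lock_bh() and rcu_read_unlock_bh(), the neighbour lookup is replaced by a pointer argument, and the error values are arbitrary. It demonstrates the balancing of lock and unlock, not RCU semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Nesting depth of the modelled read-side critical section. */
int fake_bh_depth;

void fake_rcu_read_lock_bh(void)
{
    fake_bh_depth++;
}

void fake_rcu_read_unlock_bh(void)
{
    assert(fake_bh_depth > 0);
    fake_bh_depth--;
}

struct fake_neigh {
    int (*output)(struct fake_neigh *neigh);
};

/* Output hook that reports whether it ran inside the section. */
int fake_output_checks_lock(struct fake_neigh *neigh)
{
    (void)neigh;
    return fake_bh_depth > 0 ? 0 : -2;
}

/*
 * Model of the patched function: 'neigh' stands in for the result of
 * the lookup, with NULL meaning the lookup/create step failed.
 */
int fake_neigh_xmit(struct fake_neigh *neigh)
{
    int err;

    fake_rcu_read_lock_bh();
    if (neigh == NULL) {
        /* Error path: leave the section before bailing out. */
        fake_rcu_read_unlock_bh();
        return -1;
    }
    err = neigh->output(neigh);
    fake_rcu_read_unlock_bh();
    return err;
}
```

Because the callee takes the section itself, a caller in process context and a caller in softirq context both get the same guarantee, at the cost of one extra lock/unlock pair for the caller that did not strictly need it.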
Re: [PATCH] mpls: Add missing RCU-bh read side critical section locking in output path
On Thu, Jun 23, 2016 at 12:00:55PM -0400, David Miller wrote: > > From: David Barroso <dbarr...@fastly.com> > > > > When locally originated IP traffic hits a route that says to push > > MPLS labels, we'll get a call chain dst_output() -> lwtunnel_output() > > -> mpls_output() -> neigh_xmit() -> ___neigh_lookup_noref() where the > > last function in this chain accesses a RCU-bh protected struct > > neigh_table pointer without us ever having declared an RCU-bh read > > side critical section. > > > > As in case of locally originated IP traffic we'll be running in process > > context, with softirqs enabled, we can be preempted by a softirq at any > > time, and RCU-bh considers the completion of a softirq as signaling > > the end of any pending read-side critical sections, so if we do get a > > softirq here, we can end up with an unexpected RCU grace period and > > all the nastiness that that comes with. > > > > This patch makes neigh_xmit() take rcu_read_{,un}lock_bh() around the > > code that expects to be treated as an RCU-bh read side critical section. > > > > Signed-off-by: David Barroso <dbarr...@fastly.com> > > Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com> > > Whilst the case that was used to discover this problem was MPLS, that > is not the subsystem where the bug exists and is being fixed. > > Therefore please fix your Subject line. > > Thanks. I'd say that the bug _is_ in the MPLS code, but that we're just fixing it in a helper function that lives elsewhere (and which is only used by MPLS), but yeah, the subject line and the patch body don't match up. :( I've resubmitted the patch with the commit message below, I hope that that'll do. Thanks! === [PATCH] neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() From: David Barroso <dbarr...@fastly.com> neigh_xmit() expects to be called inside an RCU-bh read side critical section, and while one of its two current callers gets this right, the other one doesn't. 
More specifically, neigh_xmit() has two callers, mpls_forward() and mpls_output(), and while both callers call neigh_xmit() under rcu_read_lock(), this provides sufficient protection for neigh_xmit() only in the case of mpls_forward(), as that is always called from softirq context and therefore doesn't need explicit BH protection, while mpls_output() can be called from process context with softirqs enabled. When mpls_output() is called from process context, with softirqs enabled, we can be preempted by a softirq at any time, and RCU-bh considers the completion of a softirq as signaling the end of any pending read-side critical sections, so if we do get a softirq while we are in the part of neigh_xmit() that expects to be run inside an RCU-bh read side critical section, we can end up with an unexpected RCU grace period running right in the middle of that critical section, making things go boom. This patch fixes this impedance mismatch in the callee, by making neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that expects to be treated as an RCU-bh read side critical section, as this seems a safer option than fixing it in the callers. Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit") Signed-off-by: David Barroso <dbarr...@fastly.com> Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com> Acked-by: David Ahern <d...@cumulusnetworks.com> Acked-by: Robert Shearman <rshea...@brocade.com>
Re: rcu locking issue in mpls output code?
On Mon, Jun 20, 2016 at 10:38:39AM -0600, David Ahern wrote:

> > OK, patch coming up. Thanks!
>
> can you build a kernel with rcu debugging enabled as well and run
> it through your tests?

git HEAD with

CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_PROVE_RCU=y

gives me a lockdep splat on the machine under my desk when I cause mpls_output() to be called.

The script I use for that is this one -- it creates a namespace that accepts MPLS tagged packets for one of its local IPs and then sends an MPLS tagged packet into that namespace. If you run the script on an unpatched kernel with lock debugging enabled, you should be able to see the issue as well, the lockdep splat happens on the very first packet.

=
#!/bin/sh

ip link add tons type veth peer name tempitf
ifconfig tons 172.16.20.20 netmask 255.255.255.0

ip netns add ns1
ip netns exec ns1 ifconfig lo 127.0.0.1 up
ip link set tempitf netns ns1
ip netns exec ns1 ip link set tempitf name eth0
ip netns exec ns1 ifconfig eth0 172.16.20.21 netmask 255.255.255.0

modprobe mpls_iptunnel

ip route add 10.10.10.10/32 encap mpls 100 via inet 172.16.20.21

ip netns exec ns1 sysctl -w net.ipv4.conf.all.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.lo.rp_filter=0
ip netns exec ns1 sysctl -w net.mpls.conf.eth0.input=1
ip netns exec ns1 sysctl -w net.mpls.platform_labels=1000
ip netns exec ns1 ip addr add 10.10.10.10/32 dev lo
ip netns exec ns1 ip -f mpls route add 100 dev lo

ping -c 1 10.10.10.10
=

The patch below (which I'll submit shortly with a proper commit message) makes this lockdep splat go away.

Enabling lock/rcu debugging gives you a lockdep splat on the first packet going out through mpls_output(), but then makes the packet loss / memory corruption issue stop appearing, both on my local space heater and on much more serious hardware, probably due to timing differences.

But, with lock/rcu debugging disabled and the patch below included, I don't see packet loss anymore in a production environment during a test that would fairly reliably show it before.

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..769cece 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2467,13 +2467,17 @@ int neigh_xmit(int index, struct net_device *dev,
 		tbl = neigh_tables[index];
 		if (!tbl)
 			goto out;
+		rcu_read_lock_bh();
 		neigh = __neigh_lookup_noref(tbl, addr, dev);
 		if (!neigh)
 			neigh = __neigh_create(tbl, addr, dev, false);
 		err = PTR_ERR(neigh);
-		if (IS_ERR(neigh))
+		if (IS_ERR(neigh)) {
+			rcu_read_unlock_bh();
 			goto out_kfree_skb;
+		}
 		err = neigh->output(neigh, skb);
+		rcu_read_unlock_bh();
 	}
 	else if (index == NEIGH_LINK_TABLE) {
 		err = dev_hard_header(skb, dev, ntohs(skb->protocol),
[PATCH] mpls: Add missing RCU-bh read side critical section locking in output path
From: David Barroso <dbarr...@fastly.com>

When locally originated IP traffic hits a route that says to push MPLS labels, we'll get a call chain dst_output() -> lwtunnel_output() -> mpls_output() -> neigh_xmit() -> ___neigh_lookup_noref() where the last function in this chain accesses a RCU-bh protected struct neigh_table pointer without us ever having declared an RCU-bh read side critical section.

As in case of locally originated IP traffic we'll be running in process context, with softirqs enabled, we can be preempted by a softirq at any time, and RCU-bh considers the completion of a softirq as signaling the end of any pending read-side critical sections, so if we do get a softirq here, we can end up with an unexpected RCU grace period and all the nastiness that that comes with.

This patch makes neigh_xmit() take rcu_read_{,un}lock_bh() around the code that expects to be treated as an RCU-bh read side critical section.

Signed-off-by: David Barroso <dbarr...@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..769cece 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2467,13 +2467,17 @@ int neigh_xmit(int index, struct net_device *dev,
 		tbl = neigh_tables[index];
 		if (!tbl)
 			goto out;
+		rcu_read_lock_bh();
 		neigh = __neigh_lookup_noref(tbl, addr, dev);
 		if (!neigh)
 			neigh = __neigh_create(tbl, addr, dev, false);
 		err = PTR_ERR(neigh);
-		if (IS_ERR(neigh))
+		if (IS_ERR(neigh)) {
+			rcu_read_unlock_bh();
 			goto out_kfree_skb;
+		}
 		err = neigh->output(neigh, skb);
+		rcu_read_unlock_bh();
 	}
 	else if (index == NEIGH_LINK_TABLE) {
 		err = dev_hard_header(skb, dev, ntohs(skb->protocol),
Re: rcu locking issue in mpls output code?
On Mon, Jun 20, 2016 at 09:13:36AM -0700, Roopa Prabhu wrote: > diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c > index fb31aa8..802956b 100644 > --- a/net/mpls/mpls_iptunnel.c > +++ b/net/mpls/mpls_iptunnel.c > @@ -105,12 +105,15 @@ static int mpls_output(struct net *net, struct > sock *sk, struct sk_buff *skb) > bos = false; > } > > + rcu_read_lock_bh(); > if (rt) > err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, > >rt_gateway, > skb); > else if (rt6) > err = neigh_xmit(NEIGH_ND_TABLE, out_dev, > >rt6i_gateway, > skb); > + rcu_read_unlock_bh(); > + > if (err) > net_dbg_ratelimited("%s: packet transmission failed: > %d\n", > __func__, err); > > >>> > >>> I think those need to be added to neigh_xmit in the > >>> > >>> if (likely(index < NEIGH_NR_TABLES)) { > >>> > >>> } > >> > >> > >> That'll force callers that don't need the extra protection (i.e. > >> mpls_forward(), since that always runs from softirq and it's enough > >> to protect the neigh state with rcu_read_lock() from softirq and we're > >> already running under rcu_read_lock() when we get to neigh_xmit()) to > >> eat the useless overhead of an extra rcu_read_{,un}lock_bh() pair, but > >> sure, functionally that's correct, I think, and in my workload I don't > >> care about MPLS forwarding performance anyway. ;-) > > > > > > __neigh_lookup_noref expects bh level protection. Since the if block in > > neigh_xmit requires the locking seems like this the appropriate place for > > it. > > > >> > >> Want me to send a patch moving it to neigh_xmit() ? > > > > > > Roopa/Robert: agree? > > yes, seems like an appropriate place for it. provided it does not add > unnecessary overhead for others. > But then neigh_xmit seems to be only called from mpls_output and mpls_forward. OK, patch coming up. Thanks!
Re: rcu locking issue in mpls output code?
On Sun, Jun 19, 2016 at 08:19:20PM -0600, David Ahern wrote:

> > diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
> > index fb31aa8..802956b 100644
> > --- a/net/mpls/mpls_iptunnel.c
> > +++ b/net/mpls/mpls_iptunnel.c
> > @@ -105,12 +105,15 @@ static int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  		bos = false;
> >  	}
> >
> > +	rcu_read_lock_bh();
> >  	if (rt)
> >  		err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gateway,
> >  				 skb);
> >  	else if (rt6)
> >  		err = neigh_xmit(NEIGH_ND_TABLE, out_dev, &rt6->rt6i_gateway,
> >  				 skb);
> > +	rcu_read_unlock_bh();
> > +
> >  	if (err)
> >  		net_dbg_ratelimited("%s: packet transmission failed: %d\n",
> >  				    __func__, err);
> >
>
> I think those need to be added to neigh_xmit in the
>
> if (likely(index < NEIGH_NR_TABLES)) {
>
> }

That'll force callers that don't need the extra protection (i.e.
mpls_forward(), since that always runs from softirq and it's enough
to protect the neigh state with rcu_read_lock() from softirq and we're
already running under rcu_read_lock() when we get to neigh_xmit()) to
eat the useless overhead of an extra rcu_read_{,un}lock_bh() pair, but
sure, functionally that's correct, I think, and in my workload I don't
care about MPLS forwarding performance anyway. ;-)

Want me to send a patch moving it to neigh_xmit() ?

Thank you for having a look!

Cheers,
Lennert
rcu locking issue in mpls output code?
Hi!

While trying to chase down a memory corruption issue that only occurs
when originating large amounts of MPLS tagged IP traffic, I came across
something in the MPLS output code for which I'm not entirely sure that
it's correct.

Specifically, there is the code path dst_output() -> lwtunnel_output()
-> mpls_output() -> neigh_xmit() -> __neigh_lookup_noref(), where the
latter accesses a RCU-bh protected struct neigh_table pointer, but there
is no RCU-bh protection being arranged anywhere in this call chain.

Since this is locally generated IP traffic, we're running in process
context, and while lwtunnel_output() holds rcu_read_lock() across its
call to lwtunnel_encap_ops::output() (which is mpls_output() here),
nothing in the chain disables BHs. In RCU-bh, the completion of a
softirq signals the end of any pending read-side critical sections,
and BHs can preempt this call chain at any time because it runs with
hardirqs and softirqs both enabled, so that would mean that neighbour
table entries can be zapped at any time even while we hold
rcu_read_lock(). I think.

The mpls_forward() path doesn't seem susceptible to the same issue, as
it runs from softirq, where rcu_read_lock() suffices, so I figured that
mpls_output() would be a good place to deal with this and that something
like the patch below would do the trick.

I can't say yet if this makes my memory corruption issues go away, as
they don't reproduce that easily, but I'll keep testing.

Any thoughts so far?

Thanks,
Lennert

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index fb31aa8..802956b 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -105,12 +105,15 @@ static int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		bos = false;
 	}
 
+	rcu_read_lock_bh();
 	if (rt)
 		err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gateway,
 				 skb);
 	else if (rt6)
 		err = neigh_xmit(NEIGH_ND_TABLE, out_dev, &rt6->rt6i_gateway,
 				 skb);
+	rcu_read_unlock_bh();
+
 	if (err)
 		net_dbg_ratelimited("%s: packet transmission failed: %d\n",
 				    __func__, err);
Re: [PATCH 0/2] macvlan: Avoid unnecessary multicast cloning
On Mon, May 30, 2016 at 04:17:52PM +0800, Herbert Xu wrote:

> > Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue")
> > moved processing of all macvlan multicasts into a work queue.  This
> > causes a noticeable performance regression when there is heavy multicast
> > traffic on the underlying interface for multicast groups that the
> > macvlan subinterfaces are not members of, in which case we end up
> > cloning all those packets and then freeing them again from a work queue
> > without really doing any useful work with them in between.
>
> OK so your motivation is to get rid of the unnecessary memory
> allocation, right?

That and stack switches to kworker threads and serialisation on the
bc_queue queue lock.
Re: [PATCH,RFC] macvlan: Handle broadcasts inline if we have only a few macvlans.
On Fri, May 27, 2016 at 10:56:44AM -0700, Cong Wang wrote:

> > Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue")
> > moved processing of all macvlan multicasts into a work queue.  This
> > causes a noticeable performance regression when there is heavy multicast
> > traffic on the underlying interface for multicast groups that the
> > macvlan subinterfaces are not members of, in which case we end up
> > cloning all those packets and then freeing them again from a work queue
> > without really doing any useful work with them in between.
>
> But we only queue up to 1000 packets in our backlog.
>
> How about adding a quick check before cloning it?
>
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index cb01023..1c73d0f 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -315,6 +315,9 @@ static void macvlan_broadcast_enqueue(struct macvlan_port *port,
>  	struct sk_buff *nskb;
>  	int err = -ENOMEM;
>
> +	if (skb_queue_len(&port->bc_queue) >= MACVLAN_BC_QUEUE_LEN)
> +		return;
> +
>  	nskb = skb_clone(skb, GFP_ATOMIC);
>  	if (!nskb)
>  		goto err;

We're not hitting the bc_queue skb limit in our environment, as the
machine can keep up with the traffic -- it's just that taking an extra
clone of the skb and queueing and running the work queue item to free
it again is eating up a lot of cycles.

But doing the queue length check before the clone might not be a bad
idea?  (You'd probably want to atomic_long_inc(&skb->dev->rx_dropped)
before returning, though?)
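To make the ordering suggested in this exchange concrete, here is a small userspace sketch, under the assumption that the check goes before the clone and that the early return is accounted as a drop. The sketch_ names, the counters and the queue limit of 4 are invented for illustration; they stand in for bc_queue, skb_clone() and rx_dropped and are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BC_QUEUE_LEN 4	/* small stand-in for MACVLAN_BC_QUEUE_LEN */

struct sketch_port {
	int queue_len;		/* packets currently queued */
	long rx_dropped;	/* drops accounted to the device */
	long clones;		/* how many clones were taken */
};

/* Returns true if the packet was cloned and queued. */
static bool sketch_enqueue(struct sketch_port *port)
{
	assert(port->queue_len >= 0);

	/* Check the backlog first, so a full queue costs no clone. */
	if (port->queue_len >= SKETCH_BC_QUEUE_LEN) {
		port->rx_dropped++;	/* account the early return as a drop */
		return false;
	}

	port->clones++;		/* stands in for skb_clone() */
	port->queue_len++;	/* stands in for queueing the clone */
	return true;
}
```

With this ordering, offering six packets to an empty port takes four clones and records two drops; with the check after the clone, all six would have been cloned first.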
[PATCH,RFC] macvlan: Handle broadcasts inline if we have only a few macvlans.
Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue")
moved processing of all macvlan multicasts into a work queue.  This
causes a noticeable performance regression when there is heavy multicast
traffic on the underlying interface for multicast groups that the
macvlan subinterfaces are not members of, in which case we end up
cloning all those packets and then freeing them again from a work queue
without really doing any useful work with them in between.

The commit message for commit 412ca1550cbecb2c says:

| Fundamentally, we need to ensure that the amount of work handled
| in each netif_rx backlog run is constrained.  As broadcasts are
| anything but constrained, it either needs to be limited per run
| or moved to process context.

This patch moves multicast handling back into macvlan_handle_frame()
context if there are 100 or fewer macvlan subinterfaces, while keeping
the work queue for if there are more macvlan subinterfaces than that.

I played around with keeping track of the number of macvlan
subinterfaces that have each multicast filter bit set, but that ended
up being more complicated than I liked.  Conditionalising the work
queue deferring on the total number of macvlan subinterfaces seems
like a fair compromise.

On a quickly whipped together test program that creates an ethertap
interface with a single macvlan subinterface and then blasts 16 Mi
multicast packets through the ethertap interface for a multicast
group that the macvlan subinterface is not a member of, run time goes
from (vanilla kernel):

	# time ./stress

	real	0m41.864s
	user	0m0.622s
	sys	0m20.754s

to (with this patch):

	# time ./stress

	real	0m16.539s
	user	0m0.519s
	sys	0m15.949s

Reported-by: Grant Zhang <gzh...@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>
---
 drivers/net/macvlan.c | 71 ---
 1 file changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index cb01023..02934a5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -231,7 +231,8 @@ static unsigned int mc_hash(const struct macvlan_dev *vlan,
 static void macvlan_broadcast(struct sk_buff *skb,
 			      const struct macvlan_port *port,
 			      struct net_device *src,
-			      enum macvlan_mode mode)
+			      enum macvlan_mode mode,
+			      bool do_rx_softirq)
 {
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
@@ -254,17 +255,49 @@ static void macvlan_broadcast(struct sk_buff *skb,
 
 			err = NET_RX_DROP;
 			nskb = skb_clone(skb, GFP_ATOMIC);
-			if (likely(nskb))
+			if (likely(nskb)) {
 				err = macvlan_broadcast_one(
 					nskb, vlan, eth,
-					mode == MACVLAN_MODE_BRIDGE) ?:
-				      netif_rx_ni(nskb);
+					mode == MACVLAN_MODE_BRIDGE);
+				if (err == 0) {
+					if (do_rx_softirq)
+						err = netif_rx_ni(nskb);
+					else
+						err = netif_rx(nskb);
+				}
+			}
 			macvlan_count_rx(vlan, skb->len + ETH_HLEN,
 					 err == NET_RX_SUCCESS, true);
 		}
 	}
 }
 
+static void macvlan_process_one(struct sk_buff *skb,
+				struct macvlan_port *port,
+				const struct macvlan_dev *src,
+				bool do_rx_softirq)
+{
+	if (!src)
+		/* frame comes from an external address */
+		macvlan_broadcast(skb, port, NULL,
+				  MACVLAN_MODE_PRIVATE |
+				  MACVLAN_MODE_VEPA    |
+				  MACVLAN_MODE_PASSTHRU|
+				  MACVLAN_MODE_BRIDGE, do_rx_softirq);
+	else if (src->mode == MACVLAN_MODE_VEPA)
+		/* flood to everyone except source */
+		macvlan_broadcast(skb, port, src->dev,
+				  MACVLAN_MODE_VEPA |
+				  MACVLAN_MODE_BRIDGE, do_rx_softirq);
+	else
+		/*
+		 * flood only to VEPA ports, bridge ports
+		 * already saw the frame on the way out.
+		 */
+		macvlan_broadcast(skb, port, src->dev,
+				  MACVLAN_MODE_VEPA, do_rx_softirq);
+}
+
 sta
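The dispatch rule described in the patch text (handle multicasts inline for 100 or fewer macvlan subinterfaces, defer to the work queue above that) can be sketched in userspace roughly as follows. The names sketch_path and sketch_choose_path are made up for this example, and the part of the diff that actually implements the decision is cut off above, so this shows only the stated rule, not the patch's code.

```c
/* Threshold taken from the patch description: 100 or fewer is inline. */
#define SKETCH_INLINE_MAX 100

enum sketch_path {
	SKETCH_INLINE,		/* handle in macvlan_handle_frame() context */
	SKETCH_DEFERRED		/* hand off to the work queue */
};

/*
 * The per-packet work of an inline broadcast grows with the number of
 * subinterfaces, so bounding that number bounds the work done per
 * backlog run; past the bound, fall back to deferring.
 */
static enum sketch_path sketch_choose_path(unsigned int nr_macvlans)
{
	if (nr_macvlans <= SKETCH_INLINE_MAX)
		return SKETCH_INLINE;
	return SKETCH_DEFERRED;
}
```

The boundary is inclusive: exactly 100 subinterfaces still takes the inline path, 101 defers.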
Re: [PATCH,RFC] ep93xx_eth: conversion to phylib framework
On Sun, Feb 24, 2008 at 09:21:53AM +0100, Herbert Valerio Riedel wrote:

Currently, the ep93xx_eth driver doesn't care about the PHY state, but it
should, in order to tell the MAC when full duplex operation is required;
failure to do so causes degraded performance on full duplex links.

This patch implements proper PHY handling via the phylib framework:

- clean up ep93xx_mdio_{read,write} to conform to ep93xx manual
- convert ep93xx_eth driver to phylib framework
- set full duplex bit in configuration of MAC when FDX link detected
- convert to use print_mac()

Looks good to me. My only comment is that we might want to have support
for checking preamble suppression support in the PHY Lib, itself.

Acked-by: Andy Fleming [EMAIL PROTECTED]

...as nothing happened for some months now just wondering, what I should
do next, to get this patch merged upstream :-)

ACK!

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rtl8150: use default MTU of 1500
On Thu, Jan 31, 2008 at 05:42:34PM +0200, Petko Manolov wrote:

The RTL8150 driver uses an MTU of 1540 by default, which causes a
bunch of problems -- it prevents booting from NFS root, for one.

Agreed, although it is a bit strange how this particular bug has sneaked up
for so long...

I posted this patch sometime in 2006, and you asked me a question
about it then (why we don't just set RTL8150_MTU to 1500 -- the answer
would be that RTL8150_MTU is used in a couple more places in the
driver, including for allocing skbuffs), but I failed to follow up
to that question at the time, which is why I assume it got dropped.

I have been carrying the patch in my own tree since then, and only
noticed recently that the patch never made it upstream.

cheers,
Lennert

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Cc: Petko Manolov [EMAIL PROTECTED]

--- linux-2.6.24-git7.orig/drivers/net/usb/rtl8150.c	2008-01-24 23:58:37.000000000 +0100
+++ linux-2.6.24-git7/drivers/net/usb/rtl8150.c	2008-01-30 20:29:00.000000000 +0100
@@ -925,9 +925,8 @@
 	netdev->hard_start_xmit = rtl8150_start_xmit;
 	netdev->set_multicast_list = rtl8150_set_multicast;
 	netdev->set_mac_address = rtl8150_set_mac_address;
 	netdev->get_stats = rtl8150_netdev_stats;
-	netdev->mtu = RTL8150_MTU;
 	SET_ETHTOOL_OPS(netdev, &ops);
 	dev->intr_interval = 100;	/* 100ms */
 
 	if (!alloc_all_urbs(dev)) {
rtl8150: use default MTU of 1500
The RTL8150 driver uses an MTU of 1540 by default, which causes a
bunch of problems -- it prevents booting from NFS root, for one.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Cc: Petko Manolov [EMAIL PROTECTED]

--- linux-2.6.24-git7.orig/drivers/net/usb/rtl8150.c	2008-01-24 23:58:37.000000000 +0100
+++ linux-2.6.24-git7/drivers/net/usb/rtl8150.c	2008-01-30 20:29:00.000000000 +0100
@@ -925,9 +925,8 @@
 	netdev->hard_start_xmit = rtl8150_start_xmit;
 	netdev->set_multicast_list = rtl8150_set_multicast;
 	netdev->set_mac_address = rtl8150_set_mac_address;
 	netdev->get_stats = rtl8150_netdev_stats;
-	netdev->mtu = RTL8150_MTU;
 	SET_ETHTOOL_OPS(netdev, &ops);
 	dev->intr_interval = 100;	/* 100ms */
 
 	if (!alloc_all_urbs(dev)) {
Re: [PATCH 3/4] [NETDEV] ixp2000: rtnl_lock out of loop will be faster
On Wed, Dec 12, 2007 at 04:48:28PM +0800, Wang Chen wrote:

[PATCH 3/4] [NETDEV] ixp2000: rtnl_lock out of loop will be faster

Before this patch, it gets and releases the lock at each
iteration of the loop. Changing unregister_netdev to
unregister_netdevice and locking outside of the loop will
be faster for this approach.

Since the number of net devices is typically either 2 or 3 (depending
on the specific model card you're using), and this is not in any kind
of hot path at all, I don't see a whole lot of benefit of acquiring
the RTNL separately.

Besides, I'm slightly worried about putting knowledge of the RTNL
into the driver directly.
Re: [PATCH] pegasos_eth.c: Fix compile error over MV643XX_ defines
On Mon, Oct 29, 2007 at 05:27:29PM -0400, Luis R. Rodriguez wrote:

This commit made an incorrect assumption:
--
Author: Lennert Buytenhek [EMAIL PROTECTED]
Date: Fri Oct 19 04:10:10 2007 +0200

    mv643xx_eth: Move ethernet register definitions into private header

    Move the mv643xx's ethernet-related register definitions from
    include/linux/mv643xx.h into drivers/net/mv643xx_eth.h, since
    they aren't of any use outside the ethernet driver.

    Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
    Acked-by: Tzachi Perelstein [EMAIL PROTECTED]
    Signed-off-by: Dale Farnsworth [EMAIL PROTECTED]
--

arch/powerpc/platforms/chrp/pegasos_eth.c made use of a 3 defines there.

[EMAIL PROTECTED]:~/devel/wireless-2.6$ git-describe
v2.6.24-rc1-138-g0119130

This patch fixes this by internalizing 3 defines onto pegasos which are
simply no longer available elsewhere. Without this your compile will fail
whenever you enable 'Common Hardware Reference Platform (CHRP) based machines',

[...]

diff --git a/arch/powerpc/platforms/chrp/pegasos_eth.c b/arch/powerpc/platforms/chrp/pegasos_eth.c
index 5bcc58d..1fc9e8c 100644
--- a/arch/powerpc/platforms/chrp/pegasos_eth.c
+++ b/arch/powerpc/platforms/chrp/pegasos_eth.c
@@ -24,6 +24,9 @@
 #define PEGASOS2_SRAM_BASE_ETH0	(PEGASOS2_SRAM_BASE)
 #define PEGASOS2_SRAM_BASE_ETH1	(PEGASOS2_SRAM_BASE_ETH0 + (PEGASOS2_SRAM_SIZE / 2) )
 
+#define PEGASOS2_ETH_BAR_4			0x2220
+#define PEGASOS2_ETH_SIZE_REG_4			0x2224
+#define PEGASOS2_ETH_BASE_ADDR_ENABLE_REG	0x2290
 
 #define PEGASOS2_SRAM_RXRING_SIZE	(PEGASOS2_SRAM_SIZE/4)
 #define PEGASOS2_SRAM_TXRING_SIZE	(PEGASOS2_SRAM_SIZE/4)
@@ -147,13 +150,13 @@ static int Enable_SRAM(void)
 
 	ALong = 0x02;
 	ALong |= PEGASOS2_SRAM_BASE & 0x;
-	MV_WRITE(MV643XX_ETH_BAR_4, ALong);
+	MV_WRITE(PEGASOS2_ETH_BAR_4, ALong);
 
-	MV_WRITE(MV643XX_ETH_SIZE_REG_4, (PEGASOS2_SRAM_SIZE-1) & 0x);
+	MV_WRITE(PEGASOS2_ETH_SIZE_REG_4, (PEGASOS2_SRAM_SIZE-1) & 0x);
 
-	MV_READ(MV643XX_ETH_BASE_ADDR_ENABLE_REG, ALong);
+	MV_READ(PEGASOS2_ETH_BASE_ADDR_ENABLE_REG, ALong);
 	ALong &= ~(1 << 4);
-	MV_WRITE(MV643XX_ETH_BASE_ADDR_ENABLE_REG, ALong);
+	MV_WRITE(PEGASOS2_ETH_BASE_ADDR_ENABLE_REG, ALong);
 
 #ifdef BE_VERBOSE
 	printk("Pegasos II/Marvell MV64361: register unmapped\n");

Al Viro sent a patch for this breakage a couple of days ago:

	http://marc.info/?l=linux-kernel&m=119351541706811&w=2

(FWIW, I think that code outside of mv643xx_eth.c should not be
poking into the mv643xx's registers directly.  Ideally, this info
should just be passed by pegasos_eth into mv643xx_eth via platform
data, and then mv643xx_eth can write the relevant hardware registers.)
Re: [PATCH] Remove pointless casts from void pointers,
On Fri, Oct 26, 2007 at 05:40:22AM -0400, Jeff Garzik wrote:

 arch/arm/mach-pxa/ssp.c            |    2 +-
 arch/arm/mach-s3c2410/usb-simtec.c |    2 +-
 arch/arm/plat-omap/mailbox.c       |    2 +-

FWIW

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
Re: [PATCH,RFC] Marvell Orion SoC ethernet driver
On Thu, Oct 25, 2007 at 05:12:04AM -0400, Jeff Garzik wrote:

+struct rx_desc {
+	u32 cmd_sts;
+	u16 size;
+	u16 count;
+	u32 buf;
+	u32 next;
+};
+
+struct tx_desc {
+	u32 cmd_sts;
+	u16 l4i_chk;
+	u16 count;
+	u32 buf;
+	u32 next;
+};

should use sparse type (__le32, etc.) and make sure this driver passes
sparse checks

ditto for checkpatch (except for the excessively anal stuff)

Sorry if it wasn't clear from the thread -- the mainline mv643xx_eth
driver turns out to support the same silicon block (but as part of a
different chip), so we've dropped orion_eth and submitted patches to
make mv643xx_eth work on both the Discovery (what it was originally
written for) and the Orion, and these patches are in -rc1 already.
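As a rough illustration of the review comment about sparse types, the sketch below shows a descriptor whose fields are declared with dedicated fixed-endian typedefs and only touched through conversion helpers. It is a userspace approximation: sketch_le32, sketch_le16 and the two helpers are stand-ins invented here for the kernel's __le32, __le16, cpu_to_le32() and le32_to_cpu(), and the little-endian layout is an assumption taken from the reviewer's suggestion rather than from the hardware manual.

```c
#include <stdint.h>
#include <string.h>

/*
 * Stand-ins for the kernel's __le32/__le16.  In the kernel, sparse
 * treats those as distinct types and flags unconverted accesses;
 * plain typedefs like these only document the intent.
 */
typedef uint32_t sketch_le32;
typedef uint16_t sketch_le16;

/* Same field order as the quoted rx_desc. */
struct sketch_rx_desc {
	sketch_le32 cmd_sts;
	sketch_le16 size;
	sketch_le16 count;
	sketch_le32 buf;
	sketch_le32 next;
};

/* Store a CPU-order value as little-endian bytes, on any host. */
static sketch_le32 sketch_cpu_to_le32(uint32_t v)
{
	uint8_t b[4];
	sketch_le32 out;

	b[0] = (uint8_t)(v & 0xff);
	b[1] = (uint8_t)((v >> 8) & 0xff);
	b[2] = (uint8_t)((v >> 16) & 0xff);
	b[3] = (uint8_t)((v >> 24) & 0xff);
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Read little-endian bytes back into a CPU-order value. */
static uint32_t sketch_le32_to_cpu(sketch_le32 v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Going through the helpers on every access is what keeps the in-memory layout identical on big-endian and little-endian hosts.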
Re: [PATCH 2/8] [MV643XX_ETH] Move ethernet register definitions into private header
On Fri, Oct 19, 2007 at 05:56:54AM -0700, Dale Farnsworth wrote:

Isn't it a little too confusing to have two headers with the same name,
one in drivers/net and one in include/linux?

Perhaps we can fold the drivers/net one into drivers/net/mv643xx_eth.c?
Since nothing else includes drivers/net/mv643xx_eth.h anyway, there's
not much point in having it separate.

Sounds good to me.  Please add a patch to do so.

Okay.
Re: [PATCH 2/8] [MV643XX_ETH] Move ethernet register definitions into private header
On Fri, Oct 19, 2007 at 09:30:48AM +0100, Christoph Hellwig wrote:

Move the mv643xx's ethernet-related register definitions from
include/linux/mv643xx.h into drivers/net/mv643xx_eth.h, since
they aren't of any use outside the ethernet driver.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -7,7 +7,7 @@
 #include <linux/workqueue.h>
 #include <linux/mii.h>
 
-#include <linux/mv643xx.h>
+#include <linux/mv643xx_eth.h>

Isn't it a little too confusing to have two headers with the same name,
one in drivers/net and one in include/linux?

Perhaps we can fold the drivers/net one into drivers/net/mv643xx_eth.c?
Since nothing else includes drivers/net/mv643xx_eth.h anyway, there's
not much point in having it separate.
[PATCH 9/8] [MV643XX_ETH] Merge drivers/net/mv643xx_eth.h into mv643xx_eth.c
Since drivers/net/mv643xx_eth.c is the only user of drivers/net/mv643xx_eth.h, there's not much use in having the header file as a separate file, so merge the header into the driver. Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED] Index: linux-2.6/drivers/net/mv643xx_eth.c === --- linux-2.6.orig/drivers/net/mv643xx_eth.c +++ linux-2.6/drivers/net/mv643xx_eth.c @@ -43,14 +43,570 @@ #include linux/ethtool.h #include linux/platform_device.h +#include linux/module.h +#include linux/kernel.h +#include linux/spinlock.h +#include linux/workqueue.h +#include linux/mii.h + +#include linux/mv643xx_eth.h + #include asm/io.h #include asm/types.h #include asm/pgtable.h #include asm/system.h #include asm/delay.h -#include mv643xx_eth.h +#include asm/dma-mapping.h + +/* Checksum offload for Tx works for most packets, but + * fails if previous packet sent did not use hw csum + */ +#define MV643XX_CHECKSUM_OFFLOAD_TX +#define MV643XX_NAPI +#define MV643XX_TX_FAST_REFILL +#undef MV643XX_COAL + +/* + * Number of RX / TX descriptors on RX / TX rings. + * Note that allocating RX descriptors is done by allocating the RX + * ring AND a preallocated RX buffers (skb's) for each descriptor. + * The TX descriptors only allocates the TX descriptors ring, + * with no pre allocated TX buffers (skb's are allocated by higher layers. 
+ */ + +/* Default TX ring size is 1000 descriptors */ +#define MV643XX_DEFAULT_TX_QUEUE_SIZE 1000 + +/* Default RX ring size is 400 descriptors */ +#define MV643XX_DEFAULT_RX_QUEUE_SIZE 400 + +#define MV643XX_TX_COAL 100 +#ifdef MV643XX_COAL +#define MV643XX_RX_COAL 100 +#endif + +#ifdef MV643XX_CHECKSUM_OFFLOAD_TX +#define MAX_DESCS_PER_SKB (MAX_SKB_FRAGS + 1) +#else +#define MAX_DESCS_PER_SKB 1 +#endif + +#define ETH_VLAN_HLEN 4 +#define ETH_FCS_LEN4 +#define ETH_HW_IP_ALIGN2 /* hw aligns IP header */ +#define ETH_WRAPPER_LEN(ETH_HW_IP_ALIGN + ETH_HLEN + \ + ETH_VLAN_HLEN + ETH_FCS_LEN) +#define ETH_RX_SKB_SIZE(dev-mtu + ETH_WRAPPER_LEN + \ + dma_get_cache_alignment()) + +/* + * Registers shared between all ports. + */ +#define PHY_ADDR_REG 0x +#define SMI_REG0x0004 + +/* + * Per-port registers. + */ +#define PORT_CONFIG_REG(p) (0x0400 + ((p) 10)) +#define PORT_CONFIG_EXTEND_REG(p) (0x0404 + ((p) 10)) +#define MAC_ADDR_LOW(p)(0x0414 + ((p) 10)) +#define MAC_ADDR_HIGH(p) (0x0418 + ((p) 10)) +#define SDMA_CONFIG_REG(p) (0x041c + ((p) 10)) +#define PORT_SERIAL_CONTROL_REG(p) (0x043c + ((p) 10)) +#define PORT_STATUS_REG(p) (0x0444 + ((p) 10)) +#define TRANSMIT_QUEUE_COMMAND_REG(p) (0x0448 + ((p) 10)) +#define MAXIMUM_TRANSMIT_UNIT(p) (0x0458 + ((p) 10)) +#define INTERRUPT_CAUSE_REG(p) (0x0460 + ((p) 10)) +#define INTERRUPT_CAUSE_EXTEND_REG(p) (0x0464 + ((p) 10)) +#define INTERRUPT_MASK_REG(p) (0x0468 + ((p) 10)) +#define INTERRUPT_EXTEND_MASK_REG(p) (0x046c + ((p) 10)) +#define TX_FIFO_URGENT_THRESHOLD_REG(p)(0x0474 + ((p) 10)) +#define RX_CURRENT_QUEUE_DESC_PTR_0(p) (0x060c + ((p) 10)) +#define RECEIVE_QUEUE_COMMAND_REG(p) (0x0680 + ((p) 10)) +#define TX_CURRENT_QUEUE_DESC_PTR_0(p) (0x06c0 + ((p) 10)) +#define MIB_COUNTERS_BASE(p) (0x1000 + ((p) 7)) +#define DA_FILTER_SPECIAL_MULTICAST_TABLE_BASE(p) (0x1400 + ((p) 10)) +#define DA_FILTER_OTHER_MULTICAST_TABLE_BASE(p)(0x1500 + ((p) 10)) +#define DA_FILTER_UNICAST_TABLE_BASE(p)(0x1600 + ((p) 10)) + +/* These macros 
describe Ethernet Port configuration reg (Px_cR) bits */ +#define UNICAST_NORMAL_MODE(0 0) +#define UNICAST_PROMISCUOUS_MODE (1 0) +#define DEFAULT_RX_QUEUE(queue)((queue) 1) +#define DEFAULT_RX_ARP_QUEUE(queue)((queue) 4) +#define RECEIVE_BC_IF_NOT_IP_OR_ARP(0 7) +#define REJECT_BC_IF_NOT_IP_OR_ARP (1 7) +#define RECEIVE_BC_IF_IP (0 8) +#define REJECT_BC_IF_IP(1 8) +#define RECEIVE_BC_IF_ARP (0 9) +#define REJECT_BC_IF_ARP (1 9) +#define TX_AM_NO_UPDATE_ERROR_SUMMARY (1 12) +#define CAPTURE_TCP_FRAMES_DIS (0 14) +#define CAPTURE_TCP_FRAMES_EN (1 14) +#define CAPTURE_UDP_FRAMES_DIS (0 15) +#define CAPTURE_UDP_FRAMES_EN (1 15) +#define DEFAULT_RX_TCP_QUEUE(queue)((queue) 16) +#define
Re: [PATCH 5/8] [MV643XX_ETH] Remove SHARED_REGS register address bias
On Thu, Oct 18, 2007 at 08:46:58PM -0700, Roland Dreier wrote:

+static void __iomem *mv643xx_eth_base;

+	return readl(((void __iomem *)mv643xx_eth_base) + offset);

Given the declaration of mv643xx_eth_base as void __iomem * already, I
don't understand why you need the cast to the same type here (and
elsewhere in the driver).

Makes sense, fixed.
[PATCH 8/8] [MV643XX_ETH] Remove unused register defines
Most of the register defines in drivers/net/mv643xx_eth.h aren't used at all. Nuke them -- we can always re-add them if/when we need them, and meanwhile, they unnecessarily clutter up the header file. Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED] Acked-by: Tzachi Perelstein [EMAIL PROTECTED] Index: linux-2.6/drivers/net/mv643xx_eth.h === --- linux-2.6.orig/drivers/net/mv643xx_eth.h +++ linux-2.6/drivers/net/mv643xx_eth.h @@ -57,115 +57,28 @@ */ #define PHY_ADDR_REG 0x #define SMI_REG0x0004 -#define UNIT_DEFAULT_ADDR_REG 0x0008 -#define UNIT_DEFAULTID_REG 0x000c -#define UNIT_INTERRUPT_CAUSE_REG 0x0080 -#define UNIT_INTERRUPT_MASK_REG0x0084 -#define UNIT_INTERNAL_USE_REG 0x04fc -#define UNIT_ERROR_ADDR_REG0x0094 -#define BAR_0 0x0200 -#define BAR_1 0x0208 -#define BAR_2 0x0210 -#define BAR_3 0x0218 -#define BAR_4 0x0220 -#define BAR_5 0x0228 -#define SIZE_REG_0 0x0204 -#define SIZE_REG_1 0x020c -#define SIZE_REG_2 0x0214 -#define SIZE_REG_3 0x021c -#define SIZE_REG_4 0x0224 -#define SIZE_REG_5 0x022c -#define HEADERS_RETARGET_BASE_REG 0x0230 -#define HEADERS_RETARGET_CONTROL_REG 0x0234 -#define HIGH_ADDR_REMAP_REG_0 0x0280 -#define HIGH_ADDR_REMAP_REG_1 0x0284 -#define HIGH_ADDR_REMAP_REG_2 0x0288 -#define HIGH_ADDR_REMAP_REG_3 0x028c -#define BASE_ADDR_ENABLE_REG 0x0290 /* * Per-port registers. 
*/ -#define ACCESS_PROTECTION_REG(p) (0x0294 + ((p) 2)) #define PORT_CONFIG_REG(p) (0x0400 + ((p) 10)) #define PORT_CONFIG_EXTEND_REG(p) (0x0404 + ((p) 10)) -#define MII_SERIAL_PARAMETRS_REG(p)(0x0408 + ((p) 10)) -#define GMII_SERIAL_PARAMETRS_REG(p) (0x040c + ((p) 10)) -#define VLAN_ETHERTYPE_REG(p) (0x0410 + ((p) 10)) #define MAC_ADDR_LOW(p)(0x0414 + ((p) 10)) #define MAC_ADDR_HIGH(p) (0x0418 + ((p) 10)) #define SDMA_CONFIG_REG(p) (0x041c + ((p) 10)) -#define DSCP_0(p) (0x0420 + ((p) 10)) -#define DSCP_1(p) (0x0424 + ((p) 10)) -#define DSCP_2(p) (0x0428 + ((p) 10)) -#define DSCP_3(p) (0x042c + ((p) 10)) -#define DSCP_4(p) (0x0430 + ((p) 10)) -#define DSCP_5(p) (0x0434 + ((p) 10)) -#define DSCP_6(p) (0x0438 + ((p) 10)) #define PORT_SERIAL_CONTROL_REG(p) (0x043c + ((p) 10)) -#define VLAN_PRIORITY_TAG_TO_PRIORITY(p) (0x0440 + ((p) 10)) #define PORT_STATUS_REG(p) (0x0444 + ((p) 10)) #define TRANSMIT_QUEUE_COMMAND_REG(p) (0x0448 + ((p) 10)) -#define TX_QUEUE_FIXED_PRIORITY(p) (0x044c + ((p) 10)) -#define PORT_TX_TOKEN_BUCKET_RATE_CONFIG(p)(0x0450 + ((p) 10)) #define MAXIMUM_TRANSMIT_UNIT(p) (0x0458 + ((p) 10)) -#define PORT_MAXIMUM_TOKEN_BUCKET_SIZE(p) (0x045c + ((p) 10)) #define INTERRUPT_CAUSE_REG(p) (0x0460 + ((p) 10)) #define INTERRUPT_CAUSE_EXTEND_REG(p) (0x0464 + ((p) 10)) #define INTERRUPT_MASK_REG(p) (0x0468 + ((p) 10)) #define INTERRUPT_EXTEND_MASK_REG(p) (0x046c + ((p) 10)) -#define RX_FIFO_URGENT_THRESHOLD_REG(p)(0x0470 + ((p) 10)) #define TX_FIFO_URGENT_THRESHOLD_REG(p)(0x0474 + ((p) 10)) -#define RX_MINIMAL_FRAME_SIZE_REG(p) (0x047c + ((p) 10)) -#define RX_DISCARDED_FRAMES_COUNTER(p) (0x0484 + ((p) 10)) -#define PORT_DEBUG_0_REG(p)(0x048c + ((p) 10)) -#define PORT_DEBUG_1_REG(p)(0x0490 + ((p) 10)) -#define PORT_INTERNAL_ADDR_ERROR_REG(p)(0x0494 + ((p) 10)) -#define INTERNAL_USE_REG(p)(0x04fc + ((p) 10)) #define RX_CURRENT_QUEUE_DESC_PTR_0(p) (0x060c + ((p) 10)) -#define RX_CURRENT_QUEUE_DESC_PTR_1(p
[PATCH 7/8] [MV643XX_ETH] Clean up mv643xx_eth.h
Apply the following cleanups to drivers/net/mv643xx_eth.h:
* Change "#define<tab>" to "#define<space>".
* Fix comment block style.
* Wrap lines to fit in 80 columns.
* Change "foo<<1" to "foo << 1".
* Align addresses in the same column.
* Parenthesize macro arguments.
* Replace "(1<<24) | (1<<23) | (1<<22)" type constructs with "(7 << 22)".

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -14,9 +14,9 @@
 /* Checksum offload for Tx works for most packets, but
  * fails if previous packet sent did not use hw csum
  */
-#define	MV643XX_CHECKSUM_OFFLOAD_TX
-#define	MV643XX_NAPI
-#define	MV643XX_TX_FAST_REFILL
+#define MV643XX_CHECKSUM_OFFLOAD_TX
+#define MV643XX_NAPI
+#define MV643XX_TX_FAST_REFILL
 #undef	MV643XX_COAL
 
 /*
@@ -49,230 +49,199 @@
 #define ETH_HW_IP_ALIGN		2		/* hw aligns IP header */
 #define ETH_WRAPPER_LEN		(ETH_HW_IP_ALIGN + ETH_HLEN + \
 					ETH_VLAN_HLEN + ETH_FCS_LEN)
-#define ETH_RX_SKB_SIZE		(dev->mtu + ETH_WRAPPER_LEN + dma_get_cache_alignment())
+#define ETH_RX_SKB_SIZE		(dev->mtu + ETH_WRAPPER_LEN + \
+					dma_get_cache_alignment())
 
-/****************************************/
-/*        Ethernet Unit Registers       */
-/****************************************/
-
-#define PHY_ADDR_REG				0x0000
-#define SMI_REG					0x0004
-#define UNIT_DEFAULT_ADDR_REG			0x0008
-#define UNIT_DEFAULTID_REG			0x000c
-#define UNIT_INTERRUPT_CAUSE_REG		0x0080
-#define UNIT_INTERRUPT_MASK_REG			0x0084
-#define UNIT_INTERNAL_USE_REG			0x04fc
-#define UNIT_ERROR_ADDR_REG			0x0094
-#define BAR_0					0x0200
-#define BAR_1					0x0208
-#define BAR_2					0x0210
-#define BAR_3					0x0218
-#define BAR_4					0x0220
-#define BAR_5					0x0228
-#define SIZE_REG_0				0x0204
-#define SIZE_REG_1				0x020c
-#define SIZE_REG_2				0x0214
-#define SIZE_REG_3				0x021c
-#define SIZE_REG_4				0x0224
-#define SIZE_REG_5				0x022c
-#define HEADERS_RETARGET_BASE_REG		0x0230
-#define HEADERS_RETARGET_CONTROL_REG		0x0234
-#define HIGH_ADDR_REMAP_REG_0			0x0280
-#define HIGH_ADDR_REMAP_REG_1			0x0284
-#define HIGH_ADDR_REMAP_REG_2			0x0288
-#define HIGH_ADDR_REMAP_REG_3			0x028c
-#define BASE_ADDR_ENABLE_REG			0x0290
-#define ACCESS_PROTECTION_REG(port)		(0x0294 + (port<<2))
-#define MIB_COUNTERS_BASE(port)			(0x1000 + (port<<7))
-#define PORT_CONFIG_REG(port)			(0x0400 + (port<<10))
-#define PORT_CONFIG_EXTEND_REG(port)		(0x0404 + (port<<10))
-#define MII_SERIAL_PARAMETRS_REG(port)		(0x0408 + (port<<10))
-#define GMII_SERIAL_PARAMETRS_REG(port)		(0x040c + (port<<10))
-#define VLAN_ETHERTYPE_REG(port)		(0x0410 + (port<<10))
-#define MAC_ADDR_LOW(port)			(0x0414 + (port<<10))
-#define MAC_ADDR_HIGH(port)			(0x0418 + (port<<10))
-#define SDMA_CONFIG_REG(port)			(0x041c + (port<<10))
-#define DSCP_0(port)				(0x0420 + (port<<10))
-#define DSCP_1(port)				(0x0424 + (port<<10))
-#define DSCP_2(port)				(0x0428 + (port<<10))
-#define DSCP_3(port)				(0x042c + (port<<10))
-#define DSCP_4(port)				(0x0430 + (port<<10))
-#define DSCP_5(port)				(0x0434 + (port<<10))
-#define DSCP_6(port)				(0x0438 + (port<<10))
-#define PORT_SERIAL_CONTROL_REG(port)		(0x043c + (port<<10))
-#define VLAN_PRIORITY_TAG_TO_PRIORITY(port)	(0x0440 + (port<<10))
-#define PORT_STATUS_REG(port)			(0x0444 + (port<<10))
-#define TRANSMIT_QUEUE_COMMAND_REG(port)	(0x0448 + (port<<10))
-#define TX_QUEUE_FIXED_PRIORITY(port)		(0x044c + (port<<10
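Two of the cleanups listed in this patch are easy to check in isolation: parenthesizing macro arguments, and replacing an OR of adjacent single-bit shifts with one shifted constant. The sketch below uses made-up OLD_/NEW_ macro names modelled on the per-port register macros; only the 0x0400 base and the shift by 10 are taken from the diff.

```c
/* Old style: the argument is pasted into the shift unparenthesized. */
#define OLD_PORT_CONFIG_REG(port)	(0x0400 + (port << 10))

/* Cleaned-up style: the argument is parenthesized before shifting. */
#define NEW_PORT_CONFIG_REG(p)		(0x0400 + ((p) << 10))

/* Three adjacent bits written out one by one ... */
#define OLD_THREE_BITS	((1 << 24) | (1 << 23) | (1 << 22))

/* ... and the same mask as a single shifted constant. */
#define NEW_THREE_BITS	(7 << 22)

/*
 * For a plain argument both register macros agree.  They differ once
 * the argument contains an operator that binds less tightly than <<,
 * such as &: NEW_PORT_CONFIG_REG(port & 1) shifts (port & 1), while
 * the old form would shift only the 1.
 */
```

For port = 3, NEW_PORT_CONFIG_REG(port & 1) evaluates to 0x0800, whereas the old form would expand to 0x0400 + (3 & (1 << 10)), which is 0x0400.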
[PATCH 3/8] [MV643XX_ETH] Disable RX/TX byte swapping on little-endian systems
On little-endian systems, configure the SDMA unit with
MV643XX_ETH_BLM_RX_NO_SWAP and MV643XX_ETH_BLM_TX_NO_SWAP.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -266,10 +266,21 @@
 
 #define	MV643XX_ETH_IPG_INT_RX(value)			((value & 0x3fff) << 8)
 
+#if defined(__BIG_ENDIAN)
 #define	MV643XX_ETH_PORT_SDMA_CONFIG_DEFAULT_VALUE	\
 		MV643XX_ETH_RX_BURST_SIZE_4_64BIT	|	\
 		MV643XX_ETH_IPG_INT_RX(0)		|	\
 		MV643XX_ETH_TX_BURST_SIZE_4_64BIT
+#elif defined(__LITTLE_ENDIAN)
+#define	MV643XX_ETH_PORT_SDMA_CONFIG_DEFAULT_VALUE	\
+		MV643XX_ETH_RX_BURST_SIZE_4_64BIT	|	\
+		MV643XX_ETH_BLM_RX_NO_SWAP		|	\
+		MV643XX_ETH_BLM_TX_NO_SWAP		|	\
+		MV643XX_ETH_IPG_INT_RX(0)		|	\
+		MV643XX_ETH_TX_BURST_SIZE_4_64BIT
+#else
+#error One of __BIG_ENDIAN or __LITTLE_ENDIAN must be defined
+#endif
 
 /* These macros describe Ethernet Port serial control reg (PSCR) bits */
 #define MV643XX_ETH_SERIAL_PORT_DISABLE			0
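The same compile-time selection can be tried out in userspace with a small sketch. It assumes a GCC- or Clang-style compiler that predefines __BYTE_ORDER__ (the kernel uses its own __BIG_ENDIAN / __LITTLE_ENDIAN macros instead), and the SK_ bit values are placeholders chosen for the example, not the hardware's bit positions.

```c
#include <stdint.h>

/* Placeholder bit positions for the sketch; not the hardware values. */
#define SK_RX_BURST	(1u << 0)
#define SK_RX_NO_SWAP	(1u << 1)
#define SK_TX_NO_SWAP	(1u << 2)
#define SK_TX_BURST	(1u << 3)

/* Pick the default once, at compile time, from the target byte order. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SK_SDMA_CONFIG_DEFAULT	(SK_RX_BURST | SK_TX_BURST)
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SK_SDMA_CONFIG_DEFAULT	(SK_RX_BURST | SK_RX_NO_SWAP | \
				 SK_TX_NO_SWAP | SK_TX_BURST)
#else
#error "byte order not known at compile time"
#endif

/* Runtime probe of the host byte order, to cross-check the choice. */
static int sk_host_is_little_endian(void)
{
	const uint16_t probe = 1;

	/* On a little-endian host the low byte is stored first. */
	return *(const unsigned char *)&probe == 1;
}
```

Unlike the macro in the patch, the sketch wraps the OR chain in parentheses, so the value can be used safely inside a larger expression.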
[PATCH 2/8] [MV643XX_ETH] Move ethernet register definitions into private header
Move the mv643xx's ethernet-related register definitions from
include/linux/mv643xx.h into drivers/net/mv643xx_eth.h, since
they aren't of any use outside the ethernet driver.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -7,7 +7,7 @@
 #include <linux/workqueue.h>
 #include <linux/mii.h>
-#include <linux/mv643xx.h>
+#include <linux/mv643xx_eth.h>
 #include <asm/dma-mapping.h>
@@ -51,6 +51,312 @@
 					ETH_VLAN_HLEN + ETH_FCS_LEN)
 #define ETH_RX_SKB_SIZE	(dev->mtu + ETH_WRAPPER_LEN + dma_get_cache_alignment())
+/****************************************/
+/*        Ethernet Unit Registers       */
+/****************************************/
+
+#define MV643XX_ETH_PHY_ADDR_REG 0x2000
+#define MV643XX_ETH_SMI_REG 0x2004
+#define MV643XX_ETH_UNIT_DEFAULT_ADDR_REG 0x2008
+#define MV643XX_ETH_UNIT_DEFAULTID_REG 0x200c
+#define MV643XX_ETH_UNIT_INTERRUPT_CAUSE_REG 0x2080
+#define MV643XX_ETH_UNIT_INTERRUPT_MASK_REG 0x2084
+#define MV643XX_ETH_UNIT_INTERNAL_USE_REG 0x24fc
+#define MV643XX_ETH_UNIT_ERROR_ADDR_REG 0x2094
+#define MV643XX_ETH_BAR_0 0x2200
+#define MV643XX_ETH_BAR_1 0x2208
+#define MV643XX_ETH_BAR_2 0x2210
+#define MV643XX_ETH_BAR_3 0x2218
+#define MV643XX_ETH_BAR_4 0x2220
+#define MV643XX_ETH_BAR_5 0x2228
+#define MV643XX_ETH_SIZE_REG_0 0x2204
+#define MV643XX_ETH_SIZE_REG_1 0x220c
+#define MV643XX_ETH_SIZE_REG_2 0x2214
+#define MV643XX_ETH_SIZE_REG_3 0x221c
+#define MV643XX_ETH_SIZE_REG_4 0x2224
+#define MV643XX_ETH_SIZE_REG_5 0x222c
+#define MV643XX_ETH_HEADERS_RETARGET_BASE_REG 0x2230
+#define MV643XX_ETH_HEADERS_RETARGET_CONTROL_REG 0x2234
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_0 0x2280
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_1 0x2284
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_2 0x2288
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_3 0x228c
+#define MV643XX_ETH_BASE_ADDR_ENABLE_REG 0x2290
+#define MV643XX_ETH_ACCESS_PROTECTION_REG(port) (0x2294 + (port<<2))
+#define MV643XX_ETH_MIB_COUNTERS_BASE(port) (0x3000 + (port<<7))
+#define MV643XX_ETH_PORT_CONFIG_REG(port) (0x2400 + (port<<10))
+#define MV643XX_ETH_PORT_CONFIG_EXTEND_REG(port) (0x2404 + (port<<10))
+#define MV643XX_ETH_MII_SERIAL_PARAMETRS_REG(port) (0x2408 + (port<<10))
+#define MV643XX_ETH_GMII_SERIAL_PARAMETRS_REG(port) (0x240c + (port<<10))
+#define MV643XX_ETH_VLAN_ETHERTYPE_REG(port) (0x2410 + (port<<10))
+#define MV643XX_ETH_MAC_ADDR_LOW(port) (0x2414 + (port<<10))
+#define MV643XX_ETH_MAC_ADDR_HIGH(port) (0x2418 + (port<<10))
+#define MV643XX_ETH_SDMA_CONFIG_REG(port) (0x241c + (port<<10))
+#define MV643XX_ETH_DSCP_0(port) (0x2420 + (port<<10))
+#define MV643XX_ETH_DSCP_1(port) (0x2424 + (port<<10))
+#define MV643XX_ETH_DSCP_2(port) (0x2428 + (port<<10))
+#define MV643XX_ETH_DSCP_3(port) (0x242c + (port<<10))
+#define MV643XX_ETH_DSCP_4(port) (0x2430 + (port<<10))
+#define MV643XX_ETH_DSCP_5(port) (0x2434 + (port<<10))
+#define MV643XX_ETH_DSCP_6(port) (0x2438 + (port<<10))
+#define MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port) (0x243c + (port<<10))
+#define MV643XX_ETH_VLAN_PRIORITY_TAG_TO_PRIORITY(port) (0x2440 + (port<<10))
+#define MV643XX_ETH_PORT_STATUS_REG(port) (0x2444 + (port<<10))
+#define MV643XX_ETH_TRANSMIT_QUEUE_COMMAND_REG(port) (0x2448 + (port<<10))
+#define MV643XX_ETH_TX_QUEUE_FIXED_PRIORITY(port) (0x244c + (port<<10))
+#define
[PATCH 0/8] [MV643XX_ETH] Add Orion support, and assorted cleanups
This patch series adds support for the Orion's ethernet MAC (which
is the same MAC as in the Discovery 643xx) to the mv643xx_eth driver,
and performs various random cleanups all over the driver.

Patches 1-3 are cleanups necessary to be able to support Orion.
Patch 4 enables mv643xx_eth for ARCH_ORION.
Patches 5-8 are more cleanups.
Re: [PATCH,RFC] Marvell Orion SoC ethernet driver
On Thu, Oct 18, 2007 at 03:15:36AM +0200, Lennert Buytenhek wrote:

> > > +#define PORT_CONF 0x400
> > > +#define PORT_CONF_EXT 0x404
> > > +#define PORT_MAC_LO 0x414
> > > +#define PORT_MAC_HI 0x418
> > > +#define PORT_SDMA 0x41c
> > > +#define PORT_SERIAL 0x43c
> > > +#define PORT_STAT 0x444
> > > +#define PORT_TXQ_CMD 0x448
> > > +#define PORT_MTU 0x458
> > > +#define PORT_CAUSE 0x460
> > > +#define PORT_CAUSE_EXT 0x464
> > > +#define PORT_MASK 0x468
> > > +#define PORT_MASK_EXT 0x46c
> > > +#define PORT_TX_THRESH 0x474
> >
> > This driver seems to support the same hardware as mv643xx_eth, any
> > chance you could use it to avoid code duplication ?
>
> Interesting. After some asking around, it appears that the mv643xx
> ethernet silicon block is indeed very similar to the ethernet silicon
> block found in the Orion ARM SoCs.
>
> We'll work on getting Orion to use mv643xx_eth. Thanks for pointing
> this out.

Okay, patchset coming up.
[PATCH 6/8] [MV643XX_ETH] Remove MV643XX_ETH_ register prefix
Now that all register address and bit defines are in private
namespace (drivers/net/mv643xx_eth.h), we can safely remove the
MV643XX_ETH_ prefix to conserve horizontal space.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.c
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6/drivers/net/mv643xx_eth.c
@@ -80,7 +80,7 @@ static char mv643xx_driver_version[] =
 static void __iomem *mv643xx_eth_base;
-/* used to protect MV643XX_ETH_SMI_REG, which is shared across ports */
+/* used to protect SMI_REG, which is shared across ports */
 static DEFINE_SPINLOCK(mv643xx_eth_phy_lock);
 static inline u32 mv_read(int offset)
@@ -214,12 +214,12 @@ static void mv643xx_eth_set_rx_mode(stru
 	struct mv643xx_private *mp = netdev_priv(dev);
 	u32 config_reg;
-	config_reg = mv_read(MV643XX_ETH_PORT_CONFIG_REG(mp->port_num));
+	config_reg = mv_read(PORT_CONFIG_REG(mp->port_num));
 	if (dev->flags & IFF_PROMISC)
-		config_reg |= (u32) MV643XX_ETH_UNICAST_PROMISCUOUS_MODE;
+		config_reg |= (u32) UNICAST_PROMISCUOUS_MODE;
 	else
-		config_reg &= ~(u32) MV643XX_ETH_UNICAST_PROMISCUOUS_MODE;
-	mv_write(MV643XX_ETH_PORT_CONFIG_REG(mp->port_num), config_reg);
+		config_reg &= ~(u32) UNICAST_PROMISCUOUS_MODE;
+	mv_write(PORT_CONFIG_REG(mp->port_num), config_reg);
 	eth_port_set_multicast_list(dev);
 }
@@ -455,41 +455,37 @@ static void mv643xx_eth_update_pscr(stru
 	u32 o_pscr, n_pscr;
 	unsigned int queues;
-	o_pscr = mv_read(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num));
+	o_pscr = mv_read(PORT_SERIAL_CONTROL_REG(port_num));
 	n_pscr = o_pscr;
 	/* clear speed, duplex and rx buffer size fields */
-	n_pscr &= ~(MV643XX_ETH_SET_MII_SPEED_TO_100 |
-		   MV643XX_ETH_SET_GMII_SPEED_TO_1000 |
-		   MV643XX_ETH_SET_FULL_DUPLEX_MODE |
-		   MV643XX_ETH_MAX_RX_PACKET_MASK);
+	n_pscr &= ~(SET_MII_SPEED_TO_100 |
+		   SET_GMII_SPEED_TO_1000 |
+		   SET_FULL_DUPLEX_MODE |
+		   MAX_RX_PACKET_MASK);
 	if (ecmd->duplex == DUPLEX_FULL)
-		n_pscr |= MV643XX_ETH_SET_FULL_DUPLEX_MODE;
+		n_pscr |= SET_FULL_DUPLEX_MODE;
 	if (ecmd->speed == SPEED_1000)
-		n_pscr |= MV643XX_ETH_SET_GMII_SPEED_TO_1000 |
-			  MV643XX_ETH_MAX_RX_PACKET_9700BYTE;
+		n_pscr |= SET_GMII_SPEED_TO_1000 |
+			  MAX_RX_PACKET_9700BYTE;
 	else {
 		if (ecmd->speed == SPEED_100)
-			n_pscr |= MV643XX_ETH_SET_MII_SPEED_TO_100;
-		n_pscr |= MV643XX_ETH_MAX_RX_PACKET_1522BYTE;
+			n_pscr |= SET_MII_SPEED_TO_100;
+		n_pscr |= MAX_RX_PACKET_1522BYTE;
 	}
 	if (n_pscr != o_pscr) {
-		if ((o_pscr & MV643XX_ETH_SERIAL_PORT_ENABLE) == 0)
-			mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-								n_pscr);
+		if ((o_pscr & SERIAL_PORT_ENABLE) == 0)
+			mv_write(PORT_SERIAL_CONTROL_REG(port_num), n_pscr);
 		else {
 			queues = mv643xx_eth_port_disable_tx(port_num);
-			o_pscr &= ~MV643XX_ETH_SERIAL_PORT_ENABLE;
-			mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-								o_pscr);
-			mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-								n_pscr);
-			mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-								n_pscr);
+			o_pscr &= ~SERIAL_PORT_ENABLE;
+			mv_write(PORT_SERIAL_CONTROL_REG(port_num), o_pscr);
+			mv_write(PORT_SERIAL_CONTROL_REG(port_num), n_pscr);
+			mv_write(PORT_SERIAL_CONTROL_REG(port_num), n_pscr);
 			if (queues)
 				mv643xx_eth_port_enable_tx(port_num, queues);
 		}
@@ -515,13 +511,13 @@ static irqreturn_t mv643xx_eth_int_handl
 	unsigned int port_num = mp->port_num;
 	/* Read interrupt cause registers */
-	eth_int_cause = mv_read(MV643XX_ETH_INTERRUPT_CAUSE_REG(port_num)) &
+	eth_int_cause = mv_read(INTERRUPT_CAUSE_REG(port_num)) &
 						ETH_INT_UNMASK_ALL;
 	if (eth_int_cause & ETH_INT_CAUSE_EXT) {
 		eth_int_cause_ext = mv_read(
-			MV643XX_ETH_INTERRUPT_CAUSE_EXTEND_REG(port_num
[PATCH 1/8] [MV643XX_ETH] Split off mv643xx_eth platform device data
The mv643xx ethernet silicon block is also found in a couple of other
Marvell chips. As a first step towards splitting off the mv643xx_eth
bits from the rest of the mv643xx bits, this patch splits the mv643xx
ethernet platform device data struct in linux/mv643xx.h off into
linux/mv643xx_eth.h, and includes the latter from the former.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/include/linux/mv643xx.h
===================================================================
--- linux-2.6.orig/include/linux/mv643xx.h
+++ linux-2.6/include/linux/mv643xx.h
@@ -14,6 +14,7 @@
 #define __ASM_MV643XX_H
 #include <asm/types.h>
+#include <linux/mv643xx_eth.h>
 /****************************************/
 /* Processor Address Space */
@@ -658,9 +659,6 @@
 /* Ethernet Unit Registers */
 /****************************************/
-#define MV643XX_ETH_SHARED_REGS 0x2000
-#define MV643XX_ETH_SHARED_REGS_SIZE 0x2000
-
 #define MV643XX_ETH_PHY_ADDR_REG 0x2000
 #define MV643XX_ETH_SMI_REG 0x2004
 #define MV643XX_ETH_UNIT_DEFAULT_ADDR_REG 0x2008
@@ -1280,28 +1278,6 @@ struct mv64xxx_i2c_pdata {
 #define MV643XX_ETH_DESC_SIZE 64
-#define MV643XX_ETH_SHARED_NAME "mv643xx_eth_shared"
-#define MV643XX_ETH_NAME "mv643xx_eth"
-
-struct mv643xx_eth_platform_data {
-	int port_number;
-	u16 force_phy_addr; /* force override if phy_addr == 0 */
-	u16 phy_addr;
-
-	/* If speed is 0, then speed and duplex are autonegotiated. */
-	int speed; /* 0, SPEED_10, SPEED_100, SPEED_1000 */
-	int duplex; /* DUPLEX_HALF or DUPLEX_FULL */
-
-	/* non-zero values of the following fields override defaults */
-	u32 tx_queue_size;
-	u32 rx_queue_size;
-	u32 tx_sram_addr;
-	u32 tx_sram_size;
-	u32 rx_sram_addr;
-	u32 rx_sram_size;
-	u8 mac_addr[6]; /* mac address if non-zero*/
-};
-
 /* Watchdog Platform Device, Driver Data */
 #define MV64x60_WDT_NAME "mv64x60_wdt"
Index: linux-2.6/include/linux/mv643xx_eth.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/mv643xx_eth.h
@@ -0,0 +1,31 @@
+/*
+ * MV-643XX ethernet platform device data definition file.
+ */
+#ifndef __LINUX_MV643XX_ETH_H
+#define __LINUX_MV643XX_ETH_H
+
+#define MV643XX_ETH_SHARED_NAME "mv643xx_eth_shared"
+#define MV643XX_ETH_NAME "mv643xx_eth"
+#define MV643XX_ETH_SHARED_REGS 0x2000
+#define MV643XX_ETH_SHARED_REGS_SIZE 0x2000
+
+struct mv643xx_eth_platform_data {
+	int port_number;
+	u16 force_phy_addr; /* force override if phy_addr == 0 */
+	u16 phy_addr;
+
+	/* If speed is 0, then speed and duplex are autonegotiated. */
+	int speed; /* 0, SPEED_10, SPEED_100, SPEED_1000 */
+	int duplex; /* DUPLEX_HALF or DUPLEX_FULL */
+
+	/* non-zero values of the following fields override defaults */
+	u32 tx_queue_size;
+	u32 rx_queue_size;
+	u32 tx_sram_addr;
+	u32 tx_sram_size;
+	u32 rx_sram_addr;
+	u32 rx_sram_size;
+	u8 mac_addr[6]; /* mac address if non-zero*/
+};
+
+#endif /* __LINUX_MV643XX_ETH_H */
[PATCH 4/8] [MV643XX_ETH] Enable use on Orion platforms
Allow Orion ARM platforms to use the mv643xx_eth driver.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/Kconfig
===================================================================
--- linux-2.6.orig/drivers/net/Kconfig
+++ linux-2.6/drivers/net/Kconfig
@@ -2392,13 +2392,16 @@ config UGETH_TX_ON_DEMAND
 	depends on UCC_GETH
 config MV643XX_ETH
-	tristate "MV-643XX Ethernet support"
-	depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM && PPC32)
+	tristate "Marvell Discovery (643XX) and Orion ethernet support"
+	depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM && PPC32) || ARCH_ORION
 	select MII
 	help
-	  This driver supports the gigabit Ethernet on the Marvell MV643XX
-	  chipset which is used in the Momenco Ocelot C and Jaguar ATX and
-	  Pegasos II, amongst other PPC and MIPS boards.
+	  This driver supports the gigabit ethernet MACs in the
+	  Marvell Discovery PPC/MIPS chipset family (MV643XX) and
+	  in the Marvell Orion ARM SoC family.
+
+	  Some boards that use the Discovery chipset are the Momenco
+	  Ocelot C and Jaguar ATX and Pegasos II.
 config QLA3XXX
 	tristate "QLogic QLA3XXX Network Driver Support"
Index: linux-2.6/drivers/net/mv643xx_eth.c
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6/drivers/net/mv643xx_eth.c
@@ -1,5 +1,5 @@
 /*
- * drivers/net/mv643xx_eth.c - Driver for MV643XX ethernet ports
+ * Driver for Marvell Discovery (MV643XX) and Marvell Orion ethernet ports
  * Copyright (C) 2002 Matthew Dharm [EMAIL PROTECTED]
  *
  * Based on the 64360 driver from:
[PATCH 5/8] [MV643XX_ETH] Remove SHARED_REGS register address bias
Start counting mv643xx_eth register addresses from zero, instead of
from 0x2000 (MV643XX_ETH_SHARED_REGS.)

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.c
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6/drivers/net/mv643xx_eth.c
@@ -78,26 +78,19 @@ static const struct ethtool_ops mv643xx_
 static char mv643xx_driver_name[] = "mv643xx_eth";
 static char mv643xx_driver_version[] = "1.0";
-static void __iomem *mv643xx_eth_shared_base;
+static void __iomem *mv643xx_eth_base;
 /* used to protect MV643XX_ETH_SMI_REG, which is shared across ports */
 static DEFINE_SPINLOCK(mv643xx_eth_phy_lock);
 static inline u32 mv_read(int offset)
 {
-	void __iomem *reg_base;
-
-	reg_base = mv643xx_eth_shared_base - MV643XX_ETH_SHARED_REGS;
-
-	return readl(reg_base + offset);
+	return readl(((void __iomem *)mv643xx_eth_base) + offset);
 }
 static inline void mv_write(int offset, u32 data)
 {
-	void __iomem *reg_base;
-
-	reg_base = mv643xx_eth_shared_base - MV643XX_ETH_SHARED_REGS;
-	writel(data, reg_base + offset);
+	writel(data, ((void __iomem *)mv643xx_eth_base) + offset);
 }
 /*
@@ -1470,9 +1463,8 @@ static int mv643xx_eth_shared_probe(stru
 	if (res == NULL)
 		return -ENODEV;
-	mv643xx_eth_shared_base = ioremap(res->start,
-						MV643XX_ETH_SHARED_REGS_SIZE);
-	if (mv643xx_eth_shared_base == NULL)
+	mv643xx_eth_base = ioremap(res->start, res->end - res->start + 1);
+	if (mv643xx_eth_base == NULL)
 		return -ENOMEM;
 	return 0;
@@ -1481,8 +1473,8 @@ static int mv643xx_eth_shared_probe(stru
 static int mv643xx_eth_shared_remove(struct platform_device *pdev)
 {
-	iounmap(mv643xx_eth_shared_base);
-	mv643xx_eth_shared_base = NULL;
+	iounmap(mv643xx_eth_base);
+	mv643xx_eth_base = NULL;
 	return 0;
 }
Index: linux-2.6/drivers/net/mv643xx_eth.h
===================================================================
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -55,116 +55,116 @@
 /* Ethernet Unit Registers */
 /****************************************/
-#define MV643XX_ETH_PHY_ADDR_REG 0x2000
-#define MV643XX_ETH_SMI_REG 0x2004
-#define MV643XX_ETH_UNIT_DEFAULT_ADDR_REG 0x2008
-#define MV643XX_ETH_UNIT_DEFAULTID_REG 0x200c
-#define MV643XX_ETH_UNIT_INTERRUPT_CAUSE_REG 0x2080
-#define MV643XX_ETH_UNIT_INTERRUPT_MASK_REG 0x2084
-#define MV643XX_ETH_UNIT_INTERNAL_USE_REG 0x24fc
-#define MV643XX_ETH_UNIT_ERROR_ADDR_REG 0x2094
-#define MV643XX_ETH_BAR_0 0x2200
-#define MV643XX_ETH_BAR_1 0x2208
-#define MV643XX_ETH_BAR_2 0x2210
-#define MV643XX_ETH_BAR_3 0x2218
-#define MV643XX_ETH_BAR_4 0x2220
-#define MV643XX_ETH_BAR_5 0x2228
-#define MV643XX_ETH_SIZE_REG_0 0x2204
-#define MV643XX_ETH_SIZE_REG_1 0x220c
-#define MV643XX_ETH_SIZE_REG_2 0x2214
-#define MV643XX_ETH_SIZE_REG_3 0x221c
-#define MV643XX_ETH_SIZE_REG_4 0x2224
-#define MV643XX_ETH_SIZE_REG_5 0x222c
-#define MV643XX_ETH_HEADERS_RETARGET_BASE_REG 0x2230
-#define MV643XX_ETH_HEADERS_RETARGET_CONTROL_REG 0x2234
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_0 0x2280
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_1 0x2284
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_2 0x2288
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_3 0x228c
-#define MV643XX_ETH_BASE_ADDR_ENABLE_REG 0x2290
-#define MV643XX_ETH_ACCESS_PROTECTION_REG(port) (0x2294 + (port<<2))
-#define MV643XX_ETH_MIB_COUNTERS_BASE(port) (0x3000 + (port<<7))
-#define MV643XX_ETH_PORT_CONFIG_REG(port) (0x2400 + (port<<10))
-#define MV643XX_ETH_PORT_CONFIG_EXTEND_REG(port) (0x2404 + (port<<10))
-#define MV643XX_ETH_MII_SERIAL_PARAMETRS_REG(port) (0x2408 + (port<<10))
-#define MV643XX_ETH_GMII_SERIAL_PARAMETRS_REG(port) (0x240c + (port<<10))
-#define
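The address arithmetic that patch 5/8 removes can be checked in isolation. What follows is only a userspace sketch, assuming plain integer addresses in place of the ioremap()ed window; SHARED_REGS_BIAS, biased_addr() and unbiased_addr() are hypothetical names that do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old MV643XX_ETH_SHARED_REGS value. */
#define SHARED_REGS_BIAS 0x2000u

/*
 * Old scheme: register offsets carry the 0x2000 bias, so the accessor
 * subtracts the bias from the mapped base before adding the offset.
 */
static inline uintptr_t biased_addr(uintptr_t base, uint32_t biased_off)
{
	return base - SHARED_REGS_BIAS + biased_off;
}

/* New scheme: offsets count from zero and are added to the base directly. */
static inline uintptr_t unbiased_addr(uintptr_t base, uint32_t off)
{
	return base + off;
}
```

For any mapped base, biased_addr(base, 0x2004) and unbiased_addr(base, 0x0004) name the same location, which is why the register defines can be renumbered from zero without changing which register is accessed.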
Re: [PATCH,RFC] Marvell Orion SoC ethernet driver
On Tue, Oct 16, 2007 at 11:31:15PM +0200, Maxime Bizon wrote:

> Hello,

Hi,

> > +#define PORT_CONF 0x400
> > +#define PORT_CONF_EXT 0x404
> > +#define PORT_MAC_LO 0x414
> > +#define PORT_MAC_HI 0x418
> > +#define PORT_SDMA 0x41c
> > +#define PORT_SERIAL 0x43c
> > +#define PORT_STAT 0x444
> > +#define PORT_TXQ_CMD 0x448
> > +#define PORT_MTU 0x458
> > +#define PORT_CAUSE 0x460
> > +#define PORT_CAUSE_EXT 0x464
> > +#define PORT_MASK 0x468
> > +#define PORT_MASK_EXT 0x46c
> > +#define PORT_TX_THRESH 0x474
>
> This driver seems to support the same hardware as mv643xx_eth, any
> chance you could use it to avoid code duplication ?

Interesting. After some asking around, it appears that the mv643xx
ethernet silicon block is indeed very similar to the ethernet silicon
block found in the Orion ARM SoCs.

We'll work on getting Orion to use mv643xx_eth. Thanks for pointing
this out.

thanks,
Lennert
[PATCH,RFC] Marvell Orion SoC ethernet driver
Attached is a driver for the built-in 10/100/1000 ethernet MAC in
the Marvell Orion series of ARM SoCs.

This ethernet MAC supports the MII/GMII/RGMII PCS interface types,
and offers a pretty standard set of MAC features, such as RX/TX
checksum offload, scatter-gather, interrupt coalescing, PAUSE,
jumbo frames, etc.

This patch is against 2.6.22.1, and the driver has not yet been
adapted to the recent NAPI changes. Nevertheless, we wanted to
get this out there for feedback/review.

Comments appreciated!

Signed-off-by: Tzachi Perelstein [EMAIL PROTECTED]
Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Signed-off-by: Nicolas Pitre [EMAIL PROTECTED]

Index: linux-2.6.22.1-orion.3.3/drivers/net/Kconfig
===================================================================
--- linux-2.6.22.1-orion.3.3.orig/drivers/net/Kconfig
+++ linux-2.6.22.1-orion.3.3/drivers/net/Kconfig
@@ -1995,6 +1995,12 @@ config E1000_DISABLE_PACKET_SPLIT
 source "drivers/net/ixp2000/Kconfig"
+config ORION_ETH
+	tristate "Marvell Orion Gigabit Ethernet support"
+	depends on ARCH_ORION
+	---help---
+	  This driver supports the Orion's on chip gigabit ethernet port.
+
 config MYRI_SBUS
 	tristate "MyriCOM Gigabit Ethernet support"
 	depends on SBUS
Index: linux-2.6.22.1-orion.3.3/drivers/net/Makefile
===================================================================
--- linux-2.6.22.1-orion.3.3.orig/drivers/net/Makefile
+++ linux-2.6.22.1-orion.3.3/drivers/net/Makefile
@@ -221,6 +221,7 @@ obj-$(CONFIG_HAMRADIO) += hamradio/
 obj-$(CONFIG_IRDA) += irda/
 obj-$(CONFIG_ETRAX_ETHERNET) += cris/
 obj-$(CONFIG_ENP2611_MSF_NET) += ixp2000/
+obj-$(CONFIG_ORION_ETH) += orion_eth.o
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
Index: linux-2.6.22.1-orion.3.3/drivers/net/orion_eth.c
===================================================================
--- /dev/null
+++ linux-2.6.22.1-orion.3.3/drivers/net/orion_eth.c
@@ -0,0 +1,1506 @@
+/*
+ * Marvell Orion Gigabit Ethernet network device driver
+ *
+ * Maintainer: Tzachi Perelstein [EMAIL PROTECTED]
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/mii.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/ip.h>
+#include <linux/in.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/delay.h>
+#include <asm/arch/platform.h>
+#include <asm/io.h>
+
+#define DRV_NAME "orion-eth"
+#define DRV_VERSION "0.3"
+
+/*
+ * Orion Gigabit Ethernet Registers
+ */
+#define rdl(op, off) __raw_readl((op)->base_addr + (off))
+#define wrl(op, off, val) __raw_writel((val), (op)->base_addr + (off))
+#define wrb(op, off, val) __raw_writeb((val), (op)->base_addr + (off))
+
+/*
+ * Unit Global Registers
+ */
+#define ETH_PHY_ID 0x000
+#define ETH_SMI 0x004
+#define ETH_CAUSE 0x080
+#define ETH_MASK 0x084
+#define ETH_CTRL 0x0b0
+
+/*
+ * Port Registers
+ */
+#define PORT_CONF 0x400
+#define PORT_CONF_EXT 0x404
+#define PORT_MAC_LO 0x414
+#define PORT_MAC_HI 0x418
+#define PORT_SDMA 0x41c
+#define PORT_SERIAL 0x43c
+#define PORT_STAT 0x444
+#define PORT_TXQ_CMD 0x448
+#define PORT_MTU 0x458
+#define PORT_CAUSE 0x460
+#define PORT_CAUSE_EXT 0x464
+#define PORT_MASK 0x468
+#define PORT_MASK_EXT 0x46c
+#define PORT_TX_THRESH 0x474
+#define PORT_CURR_RXD 0x60c
+#define PORT_RXQ_CMD 0x680
+#define PORT_CURR_TXD 0x6c0
+#define PORT_MIB_BASE 0x1000
+#define PORT_MIB_SIZE 128
+#define PORT_SPEC_MCAST_BASE 0x1400
+#define PORT_SPEC_MCAST_SIZE 256
+#define PORT_OTHER_MCAST_BASE 0x1500
+#define PORT_OTHER_MCAST_SIZE 256
+#define PORT_UCAST_BASE 0x1600
+#define PORT_UCAST_SIZE 16
+
+/*
+ * ETH_SMI bits
+ */
+#define SMI_DEV_OFFS 16
+#define SMI_REG_OFFS 21
+#define SMI_READ (1 << 26)
+#define SMI_READ_VALID (1 << 27)
+#define SMI_BUSY (1 << 28)
+
+/*
+ * PORT_STAT bits
+ */
+#define STAT_LINK_UP (1 << 1)
+#define STAT_FULL_DUPLEX (1 << 2)
+#define STAT_SPEED_1000 (1 << 4)
+#define STAT_SPEED_100 (1 << 5)
+
+/*
+ * PORT_[T/R]XQ_CMD bits
+ */
+#define PORT_EN_TXQ0 1
+#define PORT_EN_RXQ0 1
+#define PORT_DIS_RXQ0 (1 << 8)
+#define PORT_DIS_TXQ0 (1 << 8)
+
+/*
+ * Descriptors bits
+ */
+#define
Re: Problem with implementation of TCP_DEFER_ACCEPT?
On Fri, Aug 24, 2007 at 01:08:25AM +0100, TJ wrote:

> An RFC 793 standard TCP handshake requires three packets:
>
> client SYN --> server LISTENING
> client <-- SYN ACK server SYN_RECEIVED
> client ACK --> server ESTABLISHED
>
> client PSH ACK + data --> server
>
> TCP_DEFER_ACCEPT is designed to increase performance by reducing the
> number of TCP packets exchanged before the client can pass data:
>
> client SYN --> server LISTENING
> client <-- SYN ACK server SYN_RECEIVED
>
> client PSH ACK + data --> server ESTABLISHED
>
> At present with TCP_DEFER_ACCEPT the kernel treats the RFC 793 handshake
> as invalid; dropping the ACK from the client without replying so the
> client doesn't know the server has in fact set its internal ACKed flag.
>
> If the client doesn't send a packet containing data before the SYN_ACK
> time-outs finally expire the connection will be dropped.

I brought this up a long, long time ago, and I seem to remember
Alexey Kuznetsov explained to me at the time that this was intentional.

I can't find the thread in the mailing list archives anymore, though,
and my memory might be failing me.
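For readers unfamiliar with the option under discussion, this is roughly how a server requests the behaviour. It is a minimal Linux-only sketch, not code from the thread; enable_defer_accept() is a hypothetical helper name, and the timeout is given in seconds, which the kernel treats only as an approximate bound.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Ask the kernel not to wake the listener for a new connection until
 * the client has sent data, or until roughly `seconds` have elapsed.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_defer_accept(int listen_fd, int seconds)
{
	return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
			  &seconds, sizeof(seconds));
}
```

A server would call this on its listening socket, typically between socket() and listen(). With the option set, a client that completes the handshake but never sends data is the case TJ describes above.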
Re: BUG: when using 'brctl stp'
On Tue, Aug 14, 2007 at 02:11:05PM +0100, Stephen Hemminger wrote:

> Bridge locking for /sys/class/net/br0/bridge/stp_enabled was wrong.
> Another bug in bridge utilities makes it such that this interface,
> meant it wasn't being used. The locking needs to be removed from
> set_stp_state(), the lock is already acquired down in
> br_stp_start()/br_stp_stop.

The 'locking' in set_stp_state() is actually dropping the lock around
the br_stp_set_enabled() invocation, not acquiring it:

> @@ -150,9 +150,7 @@ static ssize_t show_stp_state(struct dev
>  static void set_stp_state(struct net_bridge *br, unsigned long val)
>  {
>  	rtnl_lock();
> -	spin_unlock_bh(&br->lock);
>  	br_stp_set_enabled(br, val);
> -	spin_lock_bh(&br->lock);
>  	rtnl_unlock();
>  }
Re: [PATCH] make atomic_t volatile on all architectures
On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:

> From: Chris Snook [EMAIL PROTECTED]
>
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile. This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t. Avoiding such bugs requires using barrier(), which causes re-loads
> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary. Since we generally
> want to re-read the contents of an atomic variable on every access anyway,
> let's standardize the behavior across all architectures and avoid the
> performance and correctness problems of requiring the use of barrier() in
> loops that expect atomic_t variables to change externally. This is relevant
> even on non-smp architectures, since drivers may use atomic operations in
> interrupt handlers.
>
> Signed-off-by: Chris Snook [EMAIL PROTECTED]

Documentation/atomic_ops.txt would need updating:

	[...]

	One very important aspect of these two routines is that they DO NOT
	require any explicit memory barriers. They need only perform the
	atomic_t counter update in an SMP safe manner.
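The two approaches that the patch description contrasts can be sketched in userspace C. This is an illustration only, not the kernel's definitions: my_atomic_t, my_atomic_read() and my_barrier() are hypothetical names, the barrier uses GCC/Clang inline-asm syntax, and neither construct provides SMP ordering or atomicity.

```c
/* Hypothetical counter type whose member is NOT declared volatile. */
typedef struct {
	int counter;
} my_atomic_t;

/*
 * Approach 1: read through a volatile-qualified lvalue, so the compiler
 * must emit a load for this one object on every use.
 */
#define my_atomic_read(v) (*(volatile int *)&(v)->counter)

/*
 * Approach 2: a compiler barrier. It forces the compiler to discard
 * every value it has cached in registers, not just the counter.
 */
#define my_barrier() __asm__ __volatile__("" ::: "memory")
```

A loop such as `while (my_atomic_read(&flag) == 0) ;` reloads only `flag` on each iteration, whereas a plain read followed by `my_barrier()` makes the compiler reload everything the loop touches, which is the cost the patch description refers to.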
Re: [2.6 patch] EP93XX_ETH must select MII
On Fri, Jul 13, 2007 at 02:12:08AM +0200, Adrian Bunk wrote:

> From: John Donoghue [EMAIL PROTECTED]
>
> CONFIG_EP93XX_ETH=y, CONFIG_MII=n results in an obvious link error.
>
> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
Re: [PATCH] b44: power down PHY when interface down
On Sun, Jul 01, 2007 at 12:23:16PM +0200, Michael Buesch wrote:

> > More or less. You can't add the resistances like that, since the
> > bus isolation chip buffers the IDSEL signal, but it is correct that
> > if the host's IDSEL resistor is larger than a certain value, the
> > combination of the resistive coupling of IDSEL plus the extra buffer
> > in the isolator might be causing the IDSEL input on the 'guest' PCI
> > board to assert too late (or not assert at all), causing config
> > accesses to fail.
> >
> > (This also depends on the specific 'guest' PCI board used, as you
> > noted, due to differing IDSEL trace lengths/capacitances and input
> > pin capacitances on different PCI boards. Also, it might work at
> > 33 MHz but not work at 66 MHz, etc.)
>
> It doesn't work on any of my boards :(

What extender board is this? Do you have docs/schematics? And what
motherboard brand/type?

> > If you feel adventurous, you could try to hack around this by
> > figuring out which AD[31:16] line this PCI slot's IDSEL line is
> > resistively coupled to (depends on the slot), and then adding
> > another parallel resistor on the board itself to make the bus
> > isolator's input buffer charge faster. Note that this does increase
> > the load on that specific AD[] line, which might cause other funny
> > effects.
>
> Well, but how to find out to which address line it's connected to?
> Pretty hard to follow the PCB traces, especially since it's
> multilayered.

Actually, the IDSEL resistor would be on the computer's motherboard,
not on the PCI board. And to which address line the IDSEL line is
connected depends on which PCI slot on the motherboard you're looking
at.

A multimeter should do the trick, but I would advise against this if
you're not totally comfortable with hacking hardware.
Re: [PATCH] b44: power down PHY when interface down
On Sat, Jun 30, 2007 at 04:19:23PM +0100, Matthew Garrett wrote:

> I'd agree that there's a need for a state where we power down as much
> as possible (even at the cost of functionality), but where possible it
> would also be nice to offer a state where the mac is powered down and
> the phy left up.

There are PHYs which can detect that someone's on the other end even
when powered down..
Re: [PATCH] b44: power down PHY when interface down
On Sat, Jun 30, 2007 at 11:53:25PM +0200, Michael Buesch wrote:

> > When the interface is down (or driver removed), the BroadCom 44xx card
> > remains powered on, and both its MAC and PHY are using up power.
> > This patch makes the driver issue a MAC_CTRL_PHY_PDOWN when the
> > interface is halted, and does a partial chip reset that turns off the
> > activity LEDs too.
> >
> > Applies to 2.6.22-rc6, or current git head. Tested on a Broadcom
> > BCM4401-B0 card, it saves ~0.5W (measured using powertop).
>
> Hm, I was going to measure the real power advantage with a
> PCI-extender card. But my B44B0 card doesn't seem to work in
> that extender card. It works perfectly fine stuck directly
> into the motherboard, though, and other cards like a BCM4318 work
> in the extender, too. Not sure what this is.
> The extender has an application note about nonworking cards
> in the extender and a too big resistor on the board IDSEL pin
> being the cause of this.

Does the card show up in lspci at all? IDSEL drive strength issues
should only affect config space accesses.

Does the extender board have a PCI-PCI bridge on it? (If not, there's
not really any reason to resistively couple the IDSEL line to the host,
since the host should take care of that.)

> Maybe I can try with another machine tomorrow.

That would only make a difference if there is no PCI-PCI bridge on
the extender board.

If the extender resistively couples the host's IDSEL line, you might
see different results on a different host bridge, since different host
bridges can use different numbers of IDSEL stepping cycles.
Re: [PATCH] b44: power down PHY when interface down
On Sun, Jul 01, 2007 at 12:24:40AM +0200, Michael Buesch wrote:

> > > Hm, I was going to measure the real power advantage with a
> > > PCI-extender card. But my B44B0 card doesn't seem to work in
> > > that extender card. It works perfectly fine stuck directly
> > > into the motherboard, though, and other cards like a BCM4318 work
> > > in the extender, too. Not sure what this is.
> > > The extender has an application note about nonworking cards
> > > in the extender and a too big resistor on the board IDSEL pin
> > > being the cause of this.
> >
> > Does the card show up in lspci at all?
>
> No it doesn't.

Right, so it sounds like it might be this issue.

> > Does the extender board have a PCI-PCI bridge on it? (If not, there's
> > not really any reason to resistively couple the IDSEL line to the host,
> > since the host should take care of that.)
>
> There's no bridge. It just decouples all voltage lines, so you can
> drive it from external supply and/or measure voltages and current.
> On the PCB it looks like the IDSEL line is rather directly routed
> to the host IDSEL. It just goes through one of the bus isolation chips.
> So I guess (just my guess) that this chip has some resistance and if
> the total resistance of the chip + the IDSEL resistor on the mainboard
> goes above some threshold it doesn't work anymore for some cards.
> In the application note they write about trouble for IDSEL
> resistors >51ohms.

More or less. You can't add the resistances like that, since the
bus isolation chip buffers the IDSEL signal, but it is correct that
if the host's IDSEL resistor is larger than a certain value, the
combination of the resistive coupling of IDSEL plus the extra buffer
in the isolator might be causing the IDSEL input on the 'guest' PCI
board to assert too late (or not assert at all), causing config
accesses to fail.

(This also depends on the specific 'guest' PCI board used, as you
noted, due to differing IDSEL trace lengths/capacitances and input
pin capacitances on different PCI boards. Also, it might work at
33 MHz but not work at 66 MHz, etc.)

If you feel adventurous, you could try to hack around this by
figuring out which AD[31:16] line this PCI slot's IDSEL line is
resistively coupled to (depends on the slot), and then adding
another parallel resistor on the board itself to make the bus
isolator's input buffer charge faster. Note that this does increase
the load on that specific AD[] line, which might cause other funny
effects.

> > > Maybe I can try with another machine tomorrow.
> >
> > That would only make a difference if there is no PCI-PCI bridge on
> > the extender board.
>
> Well, they suggest it in the application note as a possible fix. ;)

The bus isolation chip doesn't count as a PCI-PCI bridge. :)

I'm just saying that you wouldn't see the issue you are seeing now
if the extender board had a real PCI-PCI bridge on it, since in that
case the type 0 config access to the guest PCI board would be
generated by the bridge instead of by the host.
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Wed, May 16, 2007 at 08:13:01AM +0100, Christoph Hellwig wrote: +#ifndef __ARMEB__ +#warning Little endian mode not supported +#endif Personally I'm less fussed about WAN / LE support. Anyone with any sense will run ixp4xx boards doing such a specialised network operation as BE. Also, NSLU2-Linux can't test this functionality with our LE setup as we don't have this hardware on-board. You may just want to declare a depends on ARMEB in Kconfig (with or without OR (ARM || BROKEN) ) and have done with it - it's up to you. Christian Hohnstaedt's work did support LE though. Not all ixp4xx boards are by definition doing such a specialised network operation. Krzysztof, why is LE not supported? Do you need access to ixp4xx that starts in LE mode? Not even trying to support LE is a clear merge blocker. Maybe Krzysztof can't actually test it himself, which is fine - but not even pretending to be endian clean is not what proper Linux drivers do. The issue is not that the driver is not 'endian clean'. This is a driver for an on-chip ethernet MAC on an ARM CPU. I.e. the ethernet MAC is on the CPU itself, it's not some kind of PCI device or something like that. The ARM CPU in question can be run in either little endian or big endian mode. Making a driver work in both modes of operation is generally not just an issue of adding a couple of be32_to_cpu()s in the right places. For example, intel IXP2000 and IXP23xx CPU support in arch/arm only supports big-endian mode of operation, and none of the associated drivers support little-endian mode. Most of the other CPU support in arch/arm only supports little-endian mode, and none of the associated drivers support big-endian mode. According to your criterion, that would mean that most of the ARM drivers (alsa, usb, framebuffer, networking, etc.) should never have been accepted in the kernel tree in the first place. 
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Wed, May 16, 2007 at 08:16:38PM +0930, Rod Whitby wrote: So, if the author of these patches wishes to concentrate on big-endian support first, then we will not say (and have not said) anything which will block inclusion of a big-endian only version of this driver. The NSLU2 people are the ones here that are saying that the driver should really support LE (because that is what they happen to be using, the rest of the world runs the ixp4xx in BE), and they keep saying that it would be so easy to make a patch to add LE support, but so far they haven't produced such a patch. Please just write the patch and let's get this over with. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Wed, May 16, 2007 at 09:05:18PM +0930, Rod Whitby wrote: So, if the author of these patches wishes to concentrate on big-endian support first, then we will not say (and have not said) anything which will block inclusion of a big-endian only version of this driver. The NSLU2 people are the ones here that are saying that the driver should really support LE (because that is what they happen to be using, the rest of the world runs the ixp4xx in BE) I'll repeat again. NSLU2-Linux supports both BE and LE. We have about 5,000 users running BE and about 5,000 users running LE. Perhaps, but somehow I don't think that we'd have seen any reaction if the submitted driver had only supported LE and not BE. Please just write the patch and let's get this over with. Please let's just stop arguing about it. If a patch appears before it gets merged, then great. If it doesn't then it will appear at a later date. Great. I agree. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Wed, May 09, 2007 at 03:45:53PM +0100, Michael-Luke Jones wrote: No-one is saying that this driver should not be mainlined before it has LE support. All that I said was: Personally I'd like LE ethernet tested and working before we push. The alternative would be to explicitly state in Kconfig that LE arm is broken with this driver, so that this could be fixed later. The driver does bomb out during compile if __ARMEB__ isn't defined, but that apparently wasn't good enough. Please can we not blow this out of proportion, it really isn't that big a deal. The irony is that fixing Krzysztof's driver to work on LE will probably be quite easy, given that we already have a working LE driver from Christian. I'm looking forward to your patch. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Wed, May 09, 2007 at 10:58:06AM +0200, Marcus Better wrote: There _is_ an ARM BE version of Debian. It's not an official port, but it's not maintained any worse than the 'official' LE ARM Debian port is. Hmm... That changes a bit. Perhaps we should forget about that LE thing then, and (at best) put that trivial workaround? Please keep in mind that users are unlikely to install an unofficial port which lacks integration with the Debian infrastructure, security support and other services. The arm architecture (LE) is currently the third most popular in Debian, whereas I suspect (?) there are very few BE Debian systems out there. Note that all of your arguments also apply to the experimental EABI little-endian ARM port. I.e.: 1. The EABI port is an unofficial port. 2. The EABI port is not integrated with the Debian infrastructure. 3. The EABI port lacks security support. You could also argue that: 4. There is no reason to use EABI -- old-ABI works just as well. 5. The perceived floating point speedups that EABI gives are completely drowned out by the slowness of the rest of the system. 6. A lot of programs assume old-ABI behavior, it is too much work to patch them all. Does that mean that the Debian ARM people have their heads so far up their collective asses that they think that every form of change is bad and are unable to accept that some forms of change might be for the better? I think you've just summarised why I don't like working on Debian. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR
On Tue, May 08, 2007 at 06:59:36PM +0200, Krzysztof Halasa wrote:

There may be up to 6 Ethernet ports (not sure about hardware status, not yet supported even by Intel) - 7 queues * 128 entries each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something for TX, and then crypto, and maybe other things.

You're unlikely to be using all of those at the same time, though.

That's the point.

And what do you do if the user does compile all of these features into his kernel and then tries to use them all at the same time? Return -ENOMEM?

If he is able to do so, yes - there is nothing we can do. But I suspect a single machine would not have all possible hardware. The problem is, we don't know what it would have, so it must be dynamic.

Well, you _would_ like to have a way to make sure that all the capabilities on the board can be used. If you have a future ixp4xx based board with 16 ethernet ports, you don't want 'ifconfig eth7 up' to give you -ENOMEM just because we ran out of SRAM. The way I see it, that means that you do want to scale back your other SRAM allocations if you know that you're going to need a lot of SRAM (say, for ethernet RX/TX queues.) Either you can do this with an ugly hack a la:

/*
 * The FOO board has many ethernet ports, and runs out of
 * SRAM prematurely if we use the default TX/RX ring sizes.
 */
#ifdef CONFIG_MACH_IXP483_FOO_BOARD
#define IXP4XX_ETH_RXTX_QUEUE_SIZE 32
#else
#define IXP4XX_ETH_RXTX_QUEUE_SIZE 256
#endif

Or you can put this knowledge in the board support code (cleaner, IMHO.) E.g. let arch/arm/mach-ixp4xx/nslu2.c decide, at platform device instantiation time, which region of queue SRAM can be used by which queue, and take static allocations for things like the crypto unit into account. (This is just one form of that idea, there are many different variations.)
That way, you can _guarantee_ that you'll always have enough SRAM to be able to use the functionality that is exposed on the board you are running on (which is a desirable property, IMHO), which is something that you can't achieve with an allocator, as far as I can see. I'm not per se against the allocator, I just think that there are problems (running out of SRAM, fragmentation) that can't be solved by the allocator alone (SRAM users have to be aware which other SRAM users there are in the system, while the idea of the allocator is to insulate these users from each other), and any solution that solves those two problems IMHO also automatically solves the problem that the allocator is trying to solve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
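To make the "static map in the board support code" idea a bit more concrete, here is a rough userspace sketch of what checking such a map could look like. None of these names exist in the kernel: struct queue_sram_region, sram_map_valid() and the 2048-dword SRAM size (128 pages of 16 dwords, going by the comment in the quoted patch) are assumptions made purely for illustration.

```c
#include <stddef.h>

#define QMGR_SRAM_DWORDS 2048	/* assumed: 128 pages of 16 dwords */

/* Hypothetical per-board description of one queue's SRAM region. */
struct queue_sram_region {
	int queue_nr;
	unsigned int base;	/* offset into queue SRAM, in dwords */
	unsigned int size;	/* length, in dwords */
};

/*
 * Return 1 if every region fits in the SRAM and no two regions
 * overlap, 0 otherwise.  A board file could run a check like this
 * once, at platform device instantiation time.
 */
static int sram_map_valid(const struct queue_sram_region *map, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		if (map[i].size == 0 ||
		    map[i].base >= QMGR_SRAM_DWORDS ||
		    map[i].size > QMGR_SRAM_DWORDS - map[i].base)
			return 0;

		for (j = i + 1; j < n; j++)
			if (map[i].base < map[j].base + map[j].size &&
			    map[j].base < map[i].base + map[i].size)
				return 0;
	}

	return 1;
}
```

With a table like this, running out of SRAM turns into something you find out when you bring up the board, not when somebody ups the last interface.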
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Wed, May 09, 2007 at 11:35:03AM +0200, Marcus Better wrote:

Does that mean that the Debian ARM people have their heads so far up their collective asses that they think that every form of change is bad and are unable to accept that some forms of change might be for the better?

Well, I am not one of the Debian ARM people, just a user... and I do hope the EABI port becomes supported in the future! But in the meantime there is a crowd of users running Debian on consumer devices like the NSLU2, and they need a LE network driver.

There's a crowd of users running Linux on TCP offload capable cards, and they need TCP offload support in Linux.

The people who need a LE network driver can use Christian's driver, as Christian's driver works in LE just fine. The people who care about LE support can add LE support to the driver that Krzysztof wrote. I don't think that not supporting LE is a reason not to merge Krzysztof's driver. Don't make supporting LE systems Krzysztof's problem. Krzysztof has written an excellent driver, and while it would be 100% Debian style to reject his driver just because it doesn't support LE[*], thankfully, Linux is not Debian. Please don't turn Linux into Debian.

[*] And if he were to complain about this, he would get slapped with the standard "Our priorities are our users and free software" Debian Social Contract rhetoric -- thank $DEITY we don't have a Linux Kernel Social Contract with the same bullshit in it.
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Wed, May 09, 2007 at 12:35:40PM +0200, Mikael Pettersson wrote:

Does that mean that the Debian ARM people have their heads so far up their collective asses that they think that every form of change is bad and are unable to accept that some forms of change might be for the better?

Well, I am not one of the Debian ARM people, just a user... and I do hope the EABI port becomes supported in the future! But in the meantime there is a crowd of users running Debian on consumer devices like the NSLU2, and they need a LE network driver.

1) Development _should_ happen in small individually-manageable steps. It's wrong to delay integration of the new IXP4xx eth driver just because it's not yet LE-compatible.

Exactly.

2) LE Debian/ARM users do have alternatives: they can use USB-Ethernet adapters, for instance.

Or just use Christian's driver.
Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR
I'm not sure what the latest versions are, so I'm not sure which patches to review and which patches are obsolete.

On Tue, May 08, 2007 at 02:46:28AM +0200, Krzysztof Halasa wrote:

+struct qmgr_regs __iomem *qmgr_regs;
+static struct resource *mem_res;
+static spinlock_t qmgr_lock;
+static u32 used_sram_bitmap[4]; /* 128 16-dword pages */
+static void (*irq_handlers[HALF_QUEUES])(void *pdev);
+static void *irq_pdevs[HALF_QUEUES];
+
+void qmgr_set_irq(unsigned int queue, int src,
+		  void (*handler)(void *pdev), void *pdev)
+{
+	u32 __iomem *reg = &qmgr_regs->irqsrc[queue / 8]; /* 8 queues / u32 */
+	int bit = (queue % 8) * 4; /* 3 bits + 1 reserved bit per queue */
+	unsigned long flags;
+
+	src &= 7;
+	spin_lock_irqsave(&qmgr_lock, flags);
+	__raw_writel((__raw_readl(reg) & ~(7 << bit)) | (src << bit), reg);
+	irq_handlers[queue] = handler;
+	irq_pdevs[queue] = pdev;
+	spin_unlock_irqrestore(&qmgr_lock, flags);
+}

The queue manager interrupts should probably be implemented as an irqchip, in the same way that GPIO interrupts are implemented. (I.e. allocate 'real' interrupt numbers for them, and use the interrupt cascade mechanism.) You probably want to have separate irqchips for the upper and lower halves, too. This way, drivers can just use request_irq() instead of having to bother with platform-specific qmgr_set_irq() methods. I think I also made this review comment with Christian's driver.
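For readers following along, the register update in qmgr_set_irq() boils down to replacing one 3-bit field inside a 32-bit word. A minimal userspace sketch of just that arithmetic is below; irqsrc_update() is a made-up name, and the layout (eight queues per word, four bits per queue, low three bits used) is taken from the comments in the quoted patch, not from the hardware documentation.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return 'reg' with the
 * interrupt source field for 'queue' replaced by 'src'.
 */
static uint32_t irqsrc_update(uint32_t reg, unsigned int queue,
			      unsigned int src)
{
	unsigned int bit = (queue % 8) * 4;	/* field position in the word */

	src &= 7;				/* only 3 bits are valid */

	return (reg & ~(UINT32_C(7) << bit)) | ((uint32_t)src << bit);
}
```

Note that this only models the masking and shifting. The read-modify-write on the real register is still not atomic, which is why the patch takes qmgr_lock around it.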
+int qmgr_request_queue(unsigned int queue, unsigned int len /* dwords */,
+		       unsigned int nearly_empty_watermark,
+		       unsigned int nearly_full_watermark)
+{
+	u32 cfg, addr = 0, mask[4]; /* in 16-dwords */
+	int err;
+
+	if (queue >= HALF_QUEUES)
+		return -ERANGE;
+
+	if ((nearly_empty_watermark | nearly_full_watermark) & ~7)
+		return -EINVAL;
+
+	switch (len) {
+	case 16:
+		cfg = 0 << 24;
+		mask[0] = 0x1;
+		break;
+	case 32:
+		cfg = 1 << 24;
+		mask[0] = 0x3;
+		break;
+	case 64:
+		cfg = 2 << 24;
+		mask[0] = 0xF;
+		break;
+	case 128:
+		cfg = 3 << 24;
+		mask[0] = 0xFF;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	cfg |= nearly_empty_watermark << 26;
+	cfg |= nearly_full_watermark << 29;
+	len /= 16; /* in 16-dwords: 1, 2, 4 or 8 */
+	mask[1] = mask[2] = mask[3] = 0;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	spin_lock_irq(&qmgr_lock);
+	if (__raw_readl(&qmgr_regs->sram[queue])) {
+		err = -EBUSY;
+		goto err;
+	}
+
+	while (1) {
+		if (!(used_sram_bitmap[0] & mask[0]) &&
+		    !(used_sram_bitmap[1] & mask[1]) &&
+		    !(used_sram_bitmap[2] & mask[2]) &&
+		    !(used_sram_bitmap[3] & mask[3]))
+			break; /* found free space */
+
+		addr++;
+		shift_mask(mask);
+		if (addr + len > ARRAY_SIZE(qmgr_regs->sram)) {
+			printk(KERN_ERR "qmgr: no free SRAM space for"
+			       " queue %i\n", queue);
+			err = -ENOMEM;
+			goto err;
+		}
+	}
+
+	used_sram_bitmap[0] |= mask[0];
+	used_sram_bitmap[1] |= mask[1];
+	used_sram_bitmap[2] |= mask[2];
+	used_sram_bitmap[3] |= mask[3];
+	__raw_writel(cfg | (addr << 14), &qmgr_regs->sram[queue]);
+	spin_unlock_irq(&qmgr_lock);
+
+#if DEBUG
+	printk(KERN_DEBUG "qmgr: requested queue %i, addr = 0x%02X\n",
+	       queue, addr);
+#endif
+	return 0;
+
+err:
+	spin_unlock_irq(&qmgr_lock);
+	module_put(THIS_MODULE);
+	return err;
+}

As with Christian's driver, I don't know whether an SRAM allocator makes much sense. We can just set up a static allocation map for the in-tree drivers and leave out the allocator altogether. I.e. I don't think it's worth the complexity (and just because the butt-ugly Intel code has an allocator isn't a very good reason. :-)

I.e. an API a la:

	ixp4xx_qmgr_config_queue(int queue_nr, int sram_base_address, int queue_size, ...);

might simply suffice.
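For comparison, this is roughly what the first-fit search in the quoted qmgr_request_queue() does, stripped of the register access and locking. It is a simplified userspace sketch, not the patch's code: sram_alloc_pages() and page_used() are hypothetical names, and it tests pages one at a time where the patch slides a 128-bit mask along with shift_mask().

```c
#include <stdint.h>

#define SRAM_PAGES 128		/* 128 pages of 16 dwords, as in the patch */

static uint32_t used_pages[SRAM_PAGES / 32];

static int page_used(unsigned int page)
{
	return (used_pages[page / 32] >> (page % 32)) & 1;
}

/*
 * Hypothetical first-fit search: find 'npages' consecutive free
 * pages, mark them used and return the first page number, or -1 if
 * no such range exists.
 */
static int sram_alloc_pages(unsigned int npages)
{
	unsigned int start, i;

	if (npages == 0 || npages > SRAM_PAGES)
		return -1;

	for (start = 0; start + npages <= SRAM_PAGES; start++) {
		for (i = 0; i < npages; i++)
			if (page_used(start + i))
				break;
		if (i < npages)
			continue;	/* range not free, slide on */

		for (i = 0; i < npages; i++)
			used_pages[(start + i) / 32] |=
				UINT32_C(1) << ((start + i) % 32);

		return (int)start;
	}

	return -1;
}
```

Even in this small form you can see the two failure modes discussed in this thread: the result depends on the order in which users show up, and a request can fail although enough pages are free in total, just not contiguously.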
Re: [PATCH 3/3] Intel IXP4xx network drivers
On Mon, May 07, 2007 at 02:07:16AM +0200, Krzysztof Halasa wrote:

+ * Ethernet port config (0x00 is not present on IXP42X):
+ *
+ * logical port       0x00         0x10         0x20
+ * NPE                0 (NPE-A)    1 (NPE-B)    2 (NPE-C)
+ * physical PortId    2            0            1
+ * TX queue           23           24           25
+ * RX-free queue      26           27           28
+ * TX-done queue is always 31, RX queue is configurable

(Note that this assignment depends on the firmware, and different firmware versions use different queues -- you might want to add a note about which firmware version this holds for.)
Re: [PATCH 3/3] Intel IXP4xx network drivers
On Mon, May 07, 2007 at 09:18:00PM +0100, Michael-Luke Jones wrote: Well, I'm told that (compatible) NPEs are present on other IXP CPUs. Not sure about details. If, by a combined effort, we ever manage to create a generic NPE driver for the NPEs found in IXP42x/43x/46x/2000/23xx then the driver should go in arch/arm/npe.c (Note that the ixp2000 doesn't have NPEs.) (Both the 2000 and the 23xx have microengines, which are both supported by arch/arm/common/uengine.c.) It's possible, but hard due to the differences in hardware design The ixp23xx NPEs seem pretty much identical to me to the ixp4xx NPEs. There are some minor differences between the ixp2000 and ixp23xx uengines, but those are easy enough to deal with. and the fact that boards based on anything other than 42x are few and far between. The vast majority of 'independent' users following mainline are likely running on 42x boards. Sure, ixp23xx hardware is harder to get. I'm not sure what you mean by 'independent' users, though. Are people with non-42x hardware 'dependent' users, and why? Thus, for now, I would drop the NPE / QMGR code in arch/arm/mach- ixp4xx/ and concentrate on making it 42x/43x/46x agnostic. One step at a time :) I'd say that it's up to those who are interested in ixp23xx support (probably only myself at this point) to add ixp23xx support. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] Intel IXP4xx network drivers
On Mon, May 07, 2007 at 10:00:20PM +0200, Krzysztof Halasa wrote: - the NPE can also be used as DMA engine and for crypto operations. Both are not network related. Additionally, the NPE is not only ixp4xx related, but is also used in IXP23xx CPUs, so it could be placed in arch/arm/common or arch/arm/xscale ? - The MAC is used on IXP23xx, too. So the drivers for both CPU familys only differ in the way they exchange network packets between the NPE and the kernel. Hmm... perhaps someone have a spare device with such IXP23xx and wants to make it a donation for science? :-) I have a couple of ixp23xx boards at home, but I'm not sure whether I can give them away. I can give you remote access to them, though. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR
On Tue, May 08, 2007 at 04:12:17PM +0200, Krzysztof Halasa wrote:

The queue manager interrupts should probably be implemented as an irqchip, in the same way that GPIO interrupts are implemented. (I.e. allocate 'real' interrupt numbers for them, and use the interrupt cascade mechanism.) You probably want to have separate irqchips for the upper and lower halves, too. This way, drivers can just use request_irq() instead of having to bother with platform-specific qmgr_set_irq() methods.

Is there a sample somewhere?

See for example arch/arm/mach-ep93xx/core.c, handling of the A/B/F port GPIO interrupts. In a nutshell, it goes like this.

1) Allocate a set of IRQ numbers. E.g. in include/asm-arm/arch-ixp4xx/irqs.h:

	#define IRQ_IXP4XX_QUEUE_0	64
	#define IRQ_IXP4XX_QUEUE_1	65
	[...]

   Adjust NR_IRQS, too.

2) Implement interrupt chip functions:

	static void ixp4xx_queue_low_irq_mask_ack(unsigned int irq)
	{
		[...]
	}

	static void ixp4xx_queue_low_irq_mask(unsigned int irq)
	{
		[...]
	}

	static void ixp4xx_queue_low_irq_unmask(unsigned int irq)
	{
		[...]
	}

	static void ixp4xx_queue_low_irq_set_type(unsigned int irq)
	{
		[...]
	}

	static struct irq_chip ixp4xx_queue_low_irq_chip = {
		.name		= "QMGR low",
		.ack		= ixp4xx_queue_low_irq_mask_ack,
		.mask		= ixp4xx_queue_low_irq_mask,
		.unmask		= ixp4xx_queue_low_irq_unmask,
		.set_type	= ixp4xx_queue_low_irq_set_type,
	};

3) Hook up the queue interrupts:

	for (i = IRQ_IXP4XX_QUEUE_0; i <= IRQ_IXP4XX_QUEUE_31; i++) {
		set_irq_chip(i, &ixp4xx_queue_low_irq_chip);
		set_irq_handler(i, handle_level_irq);
		set_irq_flags(i, IRQF_VALID);
	}

4) Implement an interrupt handler for the parent interrupt:

	static void ixp4xx_qmgr_low_irq_handler(unsigned int irq, struct irq_desc *desc)
	{
		u32 status;
		int i;

		status = __raw_readl(IXP4XX_WHATEVER_QMGR_LOW_STATUS_REGISTER);
		for (i = 0; i < 32; i++) {
			if (status & (1 << i)) {
				desc = irq_desc + IRQ_IXP4XX_QUEUE_0 + i;
				desc_handle_irq(IRQ_IXP4XX_QUEUE_0 + i, desc);
			}
		}
	}

5) Hook up the parent interrupt:

	set_irq_chained_handler(IRQ_IXP4XX_QM1, ixp4xx_qmgr_low_irq_handler);

Or something like that.

As with Christian's driver, I don't know whether an SRAM allocator makes much sense. We can just set up a static allocation map for the in-tree drivers and leave out the allocator altogether. I.e. I don't think it's worth the complexity (and just because the butt-ugly Intel code has an allocator isn't a very good reason. :-)

It's a very simple allocator. I don't think we have enough SRAM without it. For now it would work but it's probably too small for all potential users at a time. There may be up to 6 Ethernet ports (not sure about hardware status, not yet supported even by Intel) - 7 queues * 128 entries each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something for TX, and then crypto, and maybe other things.

You're unlikely to be using all of those at the same time, though. And what do you do if the user does compile all of these features into his kernel and then tries to use them all at the same time? Return -ENOMEM? Shouldn't we make sure that at least the features that are compiled in can be used at the same time? If you want that guarantee, then you might as well determine the SRAM map at compile time.
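Step 4 above (the parent interrupt handler fanning out to per-queue interrupts) can be tried out in userspace with something like the sketch below. It is only a model of the demultiplexing loop: queue_handlers[], demux_queue_irqs() and record_queue() are invented for this example and stand in for irq_desc[], the chained handler and a driver's interrupt handler respectively.

```c
#include <stdint.h>

#define NR_QUEUE_IRQS 32

typedef void (*queue_handler_t)(unsigned int queue);

/* Stand-in for the per-IRQ descriptors of the 32 'low' queues. */
static queue_handler_t queue_handlers[NR_QUEUE_IRQS];

/* Demo handler: remember which queues fired. */
static uint32_t seen_mask;

static void record_queue(unsigned int queue)
{
	seen_mask |= UINT32_C(1) << queue;
}

/*
 * Model of the chained handler: walk the status bits and call the
 * registered handler for every pending queue.  Returns the number of
 * handlers that were invoked.
 */
static int demux_queue_irqs(uint32_t status)
{
	int i, handled = 0;

	for (i = 0; i < NR_QUEUE_IRQS; i++) {
		if ((status & (UINT32_C(1) << i)) && queue_handlers[i]) {
			queue_handlers[i]((unsigned int)i);
			handled++;
		}
	}

	return handled;
}
```

E.g. after setting queue_handlers[3] = record_queue, a status word of 0x8 ends up calling record_queue(3), which is the userspace equivalent of a driver having done request_irq() on the IRQ number for queue 3.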
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Tue, May 08, 2007 at 04:31:12PM +0200, Krzysztof Halasa wrote:

+/* Built-in 10/100 Ethernet MAC interfaces */
+static struct mac_plat_info ixdp425_plat_mac[] = {
+	{
+		.phy	= 0,
+		.rxq	= 3,
+	}, {
+		.phy	= 1,
+		.rxq	= 4,
+	}
+};

As with Christian's driver (I'm feeling like a bit of a broken record here :-), putting knowledge of which queue to use (which is firmware-specific) in the _board_ support file is almost certainly wrong. I would just put the port number in there, and let the ethernet driver map the port number to the hardware queue number. After all, the ethernet driver knows which queues the firmware uses, while the board support code doesn't.

No, quite the opposite. The board code knows its set of hardware interfaces etc. and can let Ethernet driver use, say, HSS queues. The driver can't know that.

You are attacking a point that I did not make. The board support code knows such things as that the front ethernet port on the board is connected to the CPU's MII port number #2, but the board support code does _not_ know that MII port number #2 corresponds to ixp4xx hardware queue #5.

If Intel puts out a firmware update next month, and your ethernet driver is modified to take advantage of the new features in that firmware and starts depending on the newer version of that firmware, we will have to modify every ixp4xx board support file in case the firmware update modifies the ixp4xx queue numbers in use.

The mapping from hardware ports (MII port #0, MII port #6, HSS port #42, whatever) to ixp4xx hardware queue numbers (0-63) should _not_ be put in every single ixp4xx board support file. Even if you only change the

	(in board support file)
	.rxq	= 4,

line to something like this instead:

	(in some ixp4xx-specific or driver-specific header file)
	#define IXP4XX_MII_PORT_1_RX_QUEUE	4

	(in board support file)
	.rxq	= IXP4XX_MII_PORT_1_RX_QUEUE,

then you have removed this dependency, and then you only have to update one place if you move to a newer firmware version.

I generally discourage the use of such wrappers, as it often makes people forget that the set and clear operations are not atomic, and it ignores the fact that some of the other bits in the register you are modifying might have side-effects.

Without them the code in question is hardly readable,

You can read Polish, how can you complain about code readability. :-)) *runs*

I pick the need to remember about non-atomicity and possible side effects instead :-)

Sure, point taken, it's just that the person after you might not remember..
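A sketch of the other variant mentioned above, where the board file only names the port and the driver owns the (firmware-specific) port-to-queue mapping, could look like the following. Everything here is hypothetical: enum mii_port, struct mac_plat_info_sketch and rx_queue_for() do not exist in the posted driver, the queue numbers 3 and 4 are simply borrowed from the quoted ixdp425 table, and pairing them with ports 0 and 1 is an assumption.

```c
/* Ports as the board support code would name them. */
enum mii_port {
	MII_PORT_0,
	MII_PORT_1,
	MII_PORT_COUNT
};

/*
 * Firmware-specific mapping, kept in one place in the driver.  A new
 * firmware version that moves queues around only touches this table.
 */
static const int mii_port_rx_queue[MII_PORT_COUNT] = {
	[MII_PORT_0] = 3,
	[MII_PORT_1] = 4,
};

/* What a board file would fill in: no queue numbers at all. */
struct mac_plat_info_sketch {
	int phy;
	enum mii_port port;
};

/* Return the RX queue for a port, or -1 for an unknown port. */
static int rx_queue_for(const struct mac_plat_info_sketch *info)
{
	if ((unsigned int)info->port >= MII_PORT_COUNT)
		return -1;

	return mii_port_rx_queue[info->port];
}
```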
Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS
On Tue, May 08, 2007 at 05:28:21PM +0200, Krzysztof Halasa wrote: I was always curious, why do people want to run ixp4xx in LE mode? What are the benefits that overweight the obvious performance degradation? Debian is indeed a valid reason. I wonder if it would be much work to create BE Debian as well. There _is_ an ARM BE version of Debian. It's not an official port, but it's not maintained any worse than the 'official' LE ARM Debian port is. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SMC on pxa3xx (was: pxa3xx base patch [5/5] - net)
On Fri, Apr 27, 2007 at 12:52:08PM +0400, dmitry pervushin wrote:

+#elif defined(CONFIG_PXA3xx)
+#define SMC_CAN_USE_8BIT	1
+#define SMC_CAN_USE_16BIT	1
+#define SMC_CAN_USE_32BIT	0
+#define SMC_IO_SHIFT		0
+#define SMC_NOWAIT		1
+#define SMC_USE_PXA_DMA		1
+#define SMC_inb(a, r)		readb((a) + (r))
+#define SMC_outb(v, a, r)	writeb(v, (a) + (r))
+#define SMC_inw(a, r)		readw((a) + (r))
+#define SMC_outw(v, a, r)	writew(v, (a) + (r))
+#define SMC_insw(a, r, p, l)	insw((a) + (r), p, l)
+#define SMC_outsw(a, r, p, l)	outsw((a) + (r), p, l)

This is bogus, please don't apply. The fact that the SMC might be hooked up in a certain way on one certain PXA3xx board doesn't mean that it will be hooked up in that way on every PXA3xx board.

Everything I've seen of the PXA3xx patch set so far is a disaster. MontaVista is flooding every corner of the internet with these crap patches. This idiocy has got to stop.
Re: [RFT] e100 driver on ARM
On Thu, Apr 26, 2007 at 09:41:22AM -0400, David Acker wrote:

Here is a quote from Russell that describes what I believe is the main problem: http://www-gatago.com/linux/kernel/15457063.html

Has e100 actually been fixed to use the PCI DMA API correctly yet? Looking at it, it doesn't look like it, so until it does, eepro100 is the far better bet for platforms needing working DMA API. What I'm talking about is e100's apparent belief that it can modify rfd's in the receive ring on a non-cache coherent architecture and expect the data around it to remain unaffected (see e100_rx_alloc_skb):

struct rfd {
	u16 status;
	u16 command;
	u32 link;
	u32 rbd;
	u16 actual_size;
	u16 size;
};

it touches command and link. This means that the whole rfd plus maybe the following or preceding 16 bytes get loaded into a cache line (assuming cache lines of 32 bytes), and that data written out again at sync. However, it does this on what seems to be an active receive chain. So, both the CPU _and_ the device own the same data. Which is a violation of the DMA API.

I think that the S-bit patch fixes it because the hardware spins on the s-bit instead of using the packet. With just the el-bit, the hardware tries to use the same cache line that the software is updating. Can someone from Intel let us know if I understand the hardware's handling of the S and EL bits? If my interpretation is correct, can the s-bit patch be applied? It seems like the correct way to lock out the hardware while a packet is being updated. I have not seen a reason given not to apply the patch.

This is all a while ago now, but wasn't the e100 S-bit patch originally written by Intel people in response to the very same quote by Russell King that you've quoted above? The S-bit patch should probably just be applied, IMHO.
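The cache line arithmetic behind Russell's point is easy to check in userspace. The sketch below uses the struct rfd layout from the quote with fixed-width types and assumes 32-byte cache lines, as the quote does; first_line(), last_line() and share_cache_line() are helper names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32	/* assumed line size, as in the quoted text */

struct rfd {
	uint16_t status;
	uint16_t command;
	uint32_t link;
	uint32_t rbd;
	uint16_t actual_size;
	uint16_t size;
};

/* Index of the first cache line touched by a byte range. */
static size_t first_line(size_t offset)
{
	return offset / CACHE_LINE;
}

/* Index of the last cache line touched by [offset, offset + len). */
static size_t last_line(size_t offset, size_t len)
{
	return (offset + len - 1) / CACHE_LINE;
}

/* Return 1 if the two byte ranges share at least one cache line. */
static int share_cache_line(size_t off_a, size_t len_a,
			    size_t off_b, size_t len_b)
{
	return first_line(off_a) <= last_line(off_b, len_b) &&
	       first_line(off_b) <= last_line(off_a, len_a);
}
```

The rfd is 16 bytes, so with 32-byte lines the fields the CPU writes (command, link) always sit in the same line as the rest of the descriptor, and the line also covers 16 bytes of whatever is adjacent to the rfd. Writing that line back while the device may be updating any part of it is the ownership conflict described above.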
Re: [RFC] bridge: if no STP then forward all BPDU's
On Tue, Apr 24, 2007 at 04:12:26PM -0700, Stephen Hemminger wrote:

The bridge code by default captures all spanning tree packets and doesn't forward them. I propose that this might not be a good idea.

As far as I remember, the original bridge code did pass through BPDUs when STP was disabled. I think that that is the only right way to behave.

--- bridge-2.6.22.orig/net/bridge/br_input.c
+++ bridge-2.6.22/net/bridge/br_input.c
@@ -131,8 +131,16 @@ struct sk_buff *br_handle_frame(struct n
 	if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
 		goto drop;
 
-	if (unlikely(is_link_local(dest))) {
-		skb->pkt_type = PACKET_HOST;
+	/*
+	 * If STP is running, then trap all link-local (802.1x) frames
+	 * process through normal receive path.
+	 *
+	 * For safety, if not running STP then act as a completely transparent
+	 * device. This means that if STP is running on another machine, it
+	 * can still detect cycles.
+	 */
+	if (p->br->stp_enabled != BR_NO_STP && is_link_local(dest)) {
+		/* skb->pkt_type should already be PACKET_MULTICAST */

(Does this check include PAUSE frames? We still don't want to forward PAUSE frames in any case..)
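To illustrate the PAUSE question: PAUSE frames are sent to 01:80:C2:00:00:01, which falls inside the same reserved 01:80:C2:00:00:0X range that the link-local test matches, so a plain "forward all link-local frames when STP is off" rule would forward them too. The sketch below shows one possible way to special-case that address. It is a userspace model only; is_link_local_addr(), is_pause_addr() and trap_locally() are names made up here, and the policy is one suggestion, not what the posted patch does.

```c
#include <string.h>

/* Return 1 for the reserved 01:80:C2:00:00:0X group addresses. */
static int is_link_local_addr(const unsigned char *dest)
{
	static const unsigned char prefix[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };

	return memcmp(dest, prefix, 5) == 0 && (dest[5] & 0xF0) == 0;
}

/* 01:80:C2:00:00:01 is the destination address of PAUSE frames. */
static int is_pause_addr(const unsigned char *dest)
{
	return is_link_local_addr(dest) && dest[5] == 0x01;
}

/*
 * Hypothetical policy: return 1 if the frame should be trapped
 * locally (and so not forwarded), 0 if it should be forwarded.
 * PAUSE frames are never forwarded; other link-local frames are
 * only trapped while STP is enabled.
 */
static int trap_locally(const unsigned char *dest, int stp_enabled)
{
	if (!is_link_local_addr(dest))
		return 0;
	if (is_pause_addr(dest))
		return 1;

	return stp_enabled ? 1 : 0;
}
```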
Re: [PATCH] Add support for running the Marvell m88e1111 PHY in RGMII mode
On Wed, Apr 11, 2007 at 04:36:49PM -0500, Kim Phillips wrote: On Tue, Apr 10, 2007 at 04:57:23PM -0500, Kim Phillips wrote: also adds RX TX delay bits to help boards with clock skew problems. snip [...] + + temp |= (MII_M_RX_DELAY | MII_M_TX_DELAY); Enabling this unconditionally is just wrong. I agree. There needs to be a way for the platform code to communicate board specific quirkiness to the phylib (I'm not sure whether it's really a quirk, as the RGMII spec allows both modes.) (I just haven't figured out how to yet). Maybe offer separate RGMII and RGMII-ID[*] mode choices? [*] RGMII with Internal Delay (RGMII specification nomenclature) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
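As a rough idea of what "separate RGMII and RGMII-ID mode choices" could mean for the PHY driver: the platform code picks a mode, and the driver sets the delay bits only for the internal-delay mode. The sketch below is userspace C with invented names (enum rgmii_mode, apply_rgmii_mode()) and placeholder bit values; it is not phylib API and the bit positions are not taken from the m88e1111 datasheet.

```c
/* Hypothetical interface mode selection handed down by platform code. */
enum rgmii_mode {
	RGMII_PLAIN,	/* delays provided by the board layout */
	RGMII_ID	/* RGMII with Internal Delay in the PHY */
};

#define SKETCH_RX_DELAY	0x80u	/* placeholder bit values */
#define SKETCH_TX_DELAY	0x02u

/* Return the register value with the delay bits set per 'mode'. */
static unsigned int apply_rgmii_mode(unsigned int reg, enum rgmii_mode mode)
{
	reg &= ~(SKETCH_RX_DELAY | SKETCH_TX_DELAY);
	if (mode == RGMII_ID)
		reg |= SKETCH_RX_DELAY | SKETCH_TX_DELAY;

	return reg;
}
```

The point is only that the delay setting follows from a mode the board chose, instead of being ORed in unconditionally.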
Re: phylib usage
On Tue, Apr 10, 2007 at 05:20:52PM -0500, Kim Phillips wrote:

> (note I'm coming from an embedded world here.)

Please read this:

	http://marc.info/?l=linux-netdev&m=116527863300952&w=2
Re: routing question under invisible bridge
On Thu, Mar 22, 2007 at 03:52:55PM -0500, Bin He wrote:

> Dear sir,

Hi,

> I found your email address from kernel bridge source codes. I would
> appreciate if you could look into my question a little bit.

The netdev@ mailing list is a better forum to ask such questions, I've
CC'ed this email there.

> I have an invisible bridge (br0) which contains eth0 and eth1. None of
> them have an IP address because I want to it to be transparent to the
> existing network. So there is no entries in kernel routing table.

If you have an IP address assigned to br0, your kernel will likely have
(at least) one entry in its routing table even if you didn't put any
routes in there yourself.

> The problem is how does it handle the routing, i.e., which eth
> interface will a packet be sent to?

(The decision which bridge sub-device to send a packet to isn't called
'routing', as it doesn't involve an IP routing decision -- that decision
has already been made at that point.)

> For example, I can create a packet and bind it to a device by
> SO_BINDTODEVICE socket option. I did some tests and found:
> 1) if the socket is bound to eth0 or eth1, the packet cannot be sent out.
> 2) if the socket is bound to br0, it seems that the packet is only
>    sent out to eth0.

Check out your system's ARP table (run /sbin/arp) and your br0 bridge's
MAC address table (run 'brctl showmacs br0' or something like that.)

When your machine wants to communicate with a remote IP address, it
first sends an ARP packet to figure out what the ethernet address is
that corresponds to that remote IP address. When your machine then
sends an IP packet on the br0 interface to that ethernet address, the
bridge code checks the MAC address table to find out whether to send it
to eth0 or eth1 (if the MAC address is a known MAC address) or to both
(if we have never seen the MAC address before or if it has timed out.)

> So is there a way to send out a packet on a particular device?

I'm not sure exactly what you are trying to do?
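The forwarding decision described in this reply (send to one port if the destination MAC address is known, to all ports if it is unknown or has timed out) can be sketched in plain C. This is a simplified userspace illustration and not the kernel's bridge code: `struct fdb_entry`, `fdb_learn()`, `fdb_lookup()`, `PORT_FLOOD` and the 300 second ageing time are all assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

#define FDB_SIZE   16
#define FDB_AGEING 300   /* seconds; assumed value for illustration */
#define PORT_FLOOD (-1)  /* hypothetical marker: send on every port */

struct fdb_entry {
	uint8_t mac[6];
	int     port;        /* port the address was last seen on */
	long    last_seen;   /* timestamp in seconds */
	int     used;
};

static struct fdb_entry fdb[FDB_SIZE];

/* Learn (or refresh) the port a source MAC address was seen on. */
static void fdb_learn(const uint8_t *mac, int port, long now)
{
	int i, free_slot = -1;

	for (i = 0; i < FDB_SIZE; i++) {
		if (fdb[i].used && memcmp(fdb[i].mac, mac, 6) == 0) {
			fdb[i].port = port;
			fdb[i].last_seen = now;
			return;
		}
		if (!fdb[i].used && free_slot < 0)
			free_slot = i;
	}

	/* New address: store it if there is room, otherwise drop it. */
	if (free_slot >= 0) {
		memcpy(fdb[free_slot].mac, mac, 6);
		fdb[free_slot].port = port;
		fdb[free_slot].last_seen = now;
		fdb[free_slot].used = 1;
	}
}

/* Return the output port for a destination MAC address, or PORT_FLOOD
 * if the address was never seen or its entry has timed out. */
static int fdb_lookup(const uint8_t *mac, long now)
{
	int i;

	for (i = 0; i < FDB_SIZE; i++) {
		if (fdb[i].used && memcmp(fdb[i].mac, mac, 6) == 0) {
			if (now - fdb[i].last_seen > FDB_AGEING)
				return PORT_FLOOD;
			return fdb[i].port;
		}
	}
	return PORT_FLOOD;
}
```

In this sketch a socket bound to the bridge never picks a port itself; the port follows from whichever interface the destination address was last learned on, which is consistent with the "only sent out to eth0" observation in the question.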
Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
On Sun, Jan 07, 2007 at 03:11:34PM +0100, Harald Welte wrote:

> > So instead of using NAT to dynamically redirect traffic to local
> > addresses, we now rely on native non-locally-bound sockets and do
> > early socket lookups for inbound IPv4 packets.
>
> It's good to see a solid implementation of this 'old idea'.
>
> Just as a quick historical note to netdev: This is the way how the
> netfilter project advised the balabit guys to implement fully
> transparent proxy support, after having seen the complexity of the old
> nat-based TPROXY patches.

Didn't rusty tell the balabit guys to use the NAT approach?
Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
On Thu, Jan 04, 2007 at 01:13:27PM +0100, KOVACS Krisztian wrote:

> > I'd also love to see the old tproxy API go away entirely. It was
> > always a bit of a pain to use.
>
> It's gone with these patches: all you need is to bind() to foreign
> addresses, like in the Linux 2.2 days.

That's how I understood it. Great.
Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
On Wed, Jan 03, 2007 at 05:33:57PM +0100, KOVACS Krisztian wrote:

> The following set of patches implement transparent proxying support
> loosely modeled on the Linux 2.2 transparent proxying functionality.

In a transparent http proxy server I wrote a while ago, we used to use
tproxy for making outgoing connections appear to be originating from a
foreign IP address, but moved to inserting an iptables nat rule from the
proxy app every time an outgoing connection needs to be made, due to the
pain of having to patch in the tproxy patches every time we needed to do
a kernel update.

I'd love to see working tproxy functionality merged upstream for that
reason alone.

I'd also love to see the old tproxy API go away entirely. It was always
a bit of a pain to use.
Re: [patch 5/7] ep93xx: some minor cleanups to the ep93xx eth driver
On Tue, Dec 26, 2006 at 04:41:17PM -0500, Jeff Garzik wrote:

> Small cleanup in the Cirrus Logic EP93xx ethernet driver: Check for NULL
> pointer before dereferencing it instead of after. Remove unreferenced
> variable.
>
> Signed-off-by: Yan Burman [EMAIL PROTECTED]
> Cc: Jeff Garzik [EMAIL PROTECTED]
> Cc: Russell King [EMAIL PROTECTED]
> Signed-off-by: Andrew Morton [EMAIL PROTECTED]

Why wasn't I CC'ed on this?
Re: [patch 5/7] ep93xx: some minor cleanups to the ep93xx eth driver
On Tue, Dec 26, 2006 at 10:42:27PM +0100, Lennert Buytenhek wrote:

> > Small cleanup in the Cirrus Logic EP93xx ethernet driver: Check for NULL
> > pointer before dereferencing it instead of after. Remove unreferenced
> > variable.
> >
> > Signed-off-by: Yan Burman [EMAIL PROTECTED]
> > Cc: Jeff Garzik [EMAIL PROTECTED]
> > Cc: Russell King [EMAIL PROTECTED]
> > Signed-off-by: Andrew Morton [EMAIL PROTECTED]
>
> Why wasn't I CC'ed on this?

Sorry, meant to ask Yan Burman, not Jeff.
Re: [PATCH] r8169: use the broken_parity_status field in pci_dev
On Mon, Dec 18, 2006 at 12:04:19AM +0100, Francois Romieu wrote:

> The former option is removed and platform code can now specify the
> expected behavior.

Thanks a lot.

FYI, I submitted this patch for the n2100 side:

Index: linux-2.6.19/arch/arm/mach-iop32x/n2100.c
===
--- linux-2.6.19.orig/arch/arm/mach-iop32x/n2100.c
+++ linux-2.6.19/arch/arm/mach-iop32x/n2100.c
@@ -123,9 +123,26 @@ static struct hw_pci n2100_pci __initdat
 
 static int __init n2100_pci_init(void)
 {
-	if (machine_is_n2100())
+	if (machine_is_n2100()) {
+		int i;
+
 		pci_common_init(&n2100_pci);
 
+		/*
+		 * Both r8169 chips on the n2100 exhibit PCI parity
+		 * problems.  Set the ->broken_parity_status flag for
+		 * both ports so that the r8169 driver knows it should
+		 * ignore error interrupts.
+		 */
+		for (i = 1; i <= 2; i++) {
+			struct pci_dev *dev;
+
+			dev = pci_get_bus_and_slot(0, PCI_DEVFN(i, 0));
+			if (dev != NULL)
+				dev->broken_parity_status = 1;
+		}
+	}
+
 	return 0;
 }
Re: Bridge it's MAC address question
On Mon, Oct 30, 2006 at 07:28:37AM -0800, Stephen Hemminger wrote:

> > Could somebody explain, why bridge uses minimal MAC of the attached
> > devices? It makes this address instable, variable during bridge
> > life-cycle, which is not good for DHCP. For example, I want to attach
> > multiple virtual devices to one physical. Then, I need to make sure
> > that after each virtual device addition, bridge addr is not changed
> > and still addr of the physical device. Why not to use MAC of the
> > first attached device?
>
> The bridge physical address is the minimum of all the attached devices.
> This is done because the STP standard requires it. You can reset it to
> be the same as any of the attached devices. This will not cause a
> problem unless using STP.

You can in fact use any MAC address. The STP standard recommends using
the minimum address, as that is deterministic, and so it doesn't depend
on the order in which you enslave subdevices.
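To illustrate why the minimum address is order-independent, here is a small standalone C sketch. It is not taken from the bridge code; `min_mac()` is a hypothetical helper that compares addresses byte by byte, most significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* Copy the numerically lowest of n 6-byte MAC addresses into out.
 * Returns 0 on success, -1 if the list is empty.  Hypothetical helper,
 * for illustration only. */
static int min_mac(const uint8_t macs[][6], size_t n, uint8_t out[6])
{
	size_t i, best = 0;

	if (n == 0)
		return -1;

	/* memcmp() orders the addresses by their first differing byte. */
	for (i = 1; i < n; i++)
		if (memcmp(macs[i], macs[best], 6) < 0)
			best = i;

	memcpy(out, macs[best], 6);
	return 0;
}
```

Whatever order the sub-devices are listed in, the result is the same address, which is the determinism referred to above. The flip side, as the original question points out, is that the result can change when a device with a lower address is added.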
Re: [Bridge] Bridge it's MAC address question
[ dropped subscriber-only openvz.org list ]

On Fri, Dec 15, 2006 at 07:52:36AM -0800, Stephen Hemminger wrote:

> > > The bridge physical address is the minimum of all the attached
> > > devices. This is done because the STP standard requires it. You can
> > > reset it to be the same as any of the attached devices. This will
> > > not cause a problem unless using STP.
> >
> > You can in fact use any MAC address. The STP standard recommends
> > using the minimum address, as that is deterministic, and so it
> > doesn't depend on the order in which you enslave subdevices.
>
> So should restriction be lifted?

We should definitely allow users to override the MAC address of a
bridge interface.

> Please update wiki page FAQ, or I'll do it

Please do.
Re: [RFC patch] driver for the Opencores Ethernet Controller
On Mon, Dec 04, 2006 at 10:01:01AM -0800, Dan Nicolaescu wrote:

> The Opencores Ethernet Controller is Verilog code that can be used to
> implement an Ethernet device in hardware. It needs to be coupled with
> a PHY and some buffer memory. Because of that devices that implement
> this controller can be very different. The code here tries to support
> that by having some parameters that need to be defined at compile time.

Considering this, why don't you make it a platform driver?
Re: [RFC patch] driver for the Opencores Ethernet Controller
On Mon, Dec 04, 2006 at 10:27:52AM -0800, Dan Nicolaescu wrote:

> > > The Opencores Ethernet Controller is Verilog code that can be used
> > > to implement an Ethernet device in hardware. It needs to be coupled
> > > with a PHY and some buffer memory. Because of that devices that
> > > implement this controller can be very different. The code here
> > > tries to support that by having some parameters that need to be
> > > defined at compile time.
> >
> > Considering this, why don't you make it a platform driver?
>
> I didn't know about platform drivers before your mail. I guess I could
> convert it to that if that is the right thing to do.

I definitely think so. Check the ep93xx_eth driver for an example.

> (It might be an overkill given that the device is kind of simple and
> embedded people prefer small code...)

..until someone decides that he wants to build a design with two of
these ethernet cores instead of just one, at which point the entire
"Let's use #defines for everything" plan breaks down badly.

> Any comments on the driver itself?

Sorry, no, I didn't look at it.
'embedded people' and the 'embedded world' (was: Re: [RFC patch] driver for the Opencores Ethernet Controller)
On Mon, Dec 04, 2006 at 10:27:52AM -0800, Dan Nicolaescu wrote:

> I didn't know about platform drivers before your mail. I guess I could
> convert it to that if that is the right thing to do. (It might be an
> overkill given that the device is kind of simple and embedded people
> prefer small code...)

BTW (and this is not specifically directed to you.)

I count myself as an 'embedded person', having contributed a thing or
two to the Linux ARM kernel port and doing most of my Linux hacking on
ARM platforms, but I certainly don't share your opinion w.r.t. what
'embedded people' want or don't want. Nor do I share any of the
opinions of most 'embedded people' who proclaim to be representing 'the
embedded world' to 'the outside world', the opinions that have gotten
the embedded crowd the bad reputation that we have gotten over the years.

- We can't use existing kernel infrastructure because we are special.

- Let's save 8 bytes and 2 cycles in a slow path by throwing all
  established and sane kernel design principles out of the window.

- If we code this in a really ugly, unmaintainable, incompatible,
  incomprehensible way, we can save 3 cycles in the slow path. Let's
  do it.

- All our code lives in a separately maintained tree, and everyone
  wanting to use Linux on our CPUs will have to use our 'Cirrus Logic
  Linux 2.6.8.1 version 1.3.2' release[*]. We can't be bothered to
  merge with upstream because we are special.

- There's nothing wrong with a function having 500 lines.

- We are using 2.4.6 because it is more stable than all that newfangled
  2.6 stuff.

- etc.

It does piss me off from time to time that these people are out there,
and that they claim to be speaking in my name when they spout their
nonsense.

For all the 'non-embedded' folks out there: the next time you hear
someone claiming to be representing 'the embedded world', please take
whatever they say with a bag of salt.

[*] Seriously, I didn't make this up.
[PATCH,try2 2/3] ep93xx_eth: fix unlikely(x) > y test
Fix unlikely(x) > y test in ep93xx_eth.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
===
--- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
+++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
@@ -334,7 +334,7 @@ static int ep93xx_xmit(struct sk_buff *s
 	struct ep93xx_priv *ep = netdev_priv(dev);
 	int entry;
 
-	if (unlikely(skb->len) > MAX_PKT_SIZE) {
+	if (unlikely(skb->len > MAX_PKT_SIZE)) {
 		ep->stats.tx_dropped++;
 		dev_kfree_skb(skb);
 		return NETDEV_TX_OK;
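Why the misplaced parenthesis matters: the kernel's unlikely() reduces its argument to 0 or 1 before handing it to the compiler hint, so comparing the result against a packet size can never be true. The following standalone sketch (GCC or clang assumed, because of `__builtin_expect`) shows both forms; `too_long_buggy()` and `too_long_fixed()` are names made up for the illustration.

```c
/* Same shape as the kernel macro: the argument is reduced to 0 or 1. */
#define unlikely(x)	__builtin_expect(!!(x), 0)

#define MAX_PKT_SIZE	2044

/* Buggy form: compares the 0/1 result of unlikely() with the limit,
 * so the test is false for every length. */
static int too_long_buggy(unsigned int len)
{
	return unlikely(len) > MAX_PKT_SIZE;
}

/* Fixed form: the whole comparison is the hinted expression. */
static int too_long_fixed(unsigned int len)
{
	return unlikely(len > MAX_PKT_SIZE);
}
```

With the buggy form, oversized packets are never dropped by this check, which is the behaviour the patch corrects.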
[PATCH,try3 3/3] ep93xx_eth: don't report RX errors
Flooding the console with error messages for every RX FIFO overrun,
checksum error and framing error isn't very sensible. Each of these
errors can occur during normal operation, so stop printk'ing error
messages for RX errors at all.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
===
--- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
+++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
@@ -230,9 +230,6 @@ static int ep93xx_rx(struct net_device *
 					 " %.8x %.8x\n", rstat0, rstat1);
 
 		if (!(rstat0 & RSTAT0_RWE)) {
-			printk(KERN_NOTICE "ep93xx_rx: receive error "
-					 " %.8x %.8x\n", rstat0, rstat1);
-
 			ep->stats.rx_errors++;
 			if (rstat0 & RSTAT0_OE)
 				ep->stats.rx_fifo_errors++;
[PATCH,try2 0/3] ep93xx_eth: three fixes for 2.6.19
This patchset fixes three issues in ep93xx_eth.

The first fix is for an RX/TX lockup bug due to mishandling of the
RX/TXstatus rings in the driver, and is a showstopper. The second and
third aren't really showstopper bugs, but real issues nevertheless, and
easy enough to fix.

In this new queue, I've replaced the third patch, which modified
ep93xx_eth to only printk for non-FIFO overrun-type RX errors, by a
patch which stops ep93xx_eth reporting RX errors at all, since it makes
little sense, and the regular error counters should be sufficient.

Please apply for 2.6.19 -- thanks!
[PATCH 0/3] ep93xx_eth: three fixes for 2.6.19
This patchset fixes three issues in ep93xx_eth.

The first fix is for an RX/TX lockup bug due to mishandling of the
RX/TXstatus rings in the driver, and is a showstopper. The second and
third aren't really showstopper bugs, but real issues nevertheless, and
easy enough to fix.

Please apply for 2.6.19 -- thanks!
[PATCH 1/3] ep93xx_eth: fix RX/TXstatus ring full handling
Ray Lehtiniemi reported that an incoming UDP packet flood can lock up
the ep93xx ethernet driver. Herbert Valerio Riedel noted that due to
the way ep93xx_eth manages the RX/TXstatus rings, it cannot distinguish
a full ring from an empty one, and correctly suggested that this was
likely to be causing this lockup to occur.

Instead of looking at the hardware's RX/TXstatus ring write pointers to
determine when to stop reading from those rings, we should just check
every individual RX/TXstatus descriptor's valid bit instead, since
there is no other way to distinguish an empty ring from a full ring,
and if there is a descriptor waiting, we take the hit of reading the
descriptor from memory anyway.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
===
--- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
+++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
@@ -193,12 +193,9 @@ static struct net_device_stats *ep93xx_g
 static int ep93xx_rx(struct net_device *dev, int *budget)
 {
 	struct ep93xx_priv *ep = netdev_priv(dev);
-	int tail_offset;
 	int rx_done;
 	int processed;
 
-	tail_offset = rdl(ep, REG_RXSTSQCURADD) - ep->descs_dma_addr;
-
 	rx_done = 0;
 	processed = 0;
 	while (*budget > 0) {
@@ -211,28 +208,23 @@ static int ep93xx_rx(struct net_device *
 
 		entry = ep->rx_pointer;
 		rstat = ep->descs->rstat + entry;
-		if ((void *)rstat - (void *)ep->descs == tail_offset) {
+
+		rstat0 = rstat->rstat0;
+		rstat1 = rstat->rstat1;
+		if (!(rstat0 & RSTAT0_RFP) || !(rstat1 & RSTAT1_RFP)) {
 			rx_done = 1;
 			break;
 		}
 
-		rstat0 = rstat->rstat0;
-		rstat1 = rstat->rstat1;
 		rstat->rstat0 = 0;
 		rstat->rstat1 = 0;
 
-		if (!(rstat0 & RSTAT0_RFP))
-			printk(KERN_CRIT "ep93xx_rx: buffer not done "
-					 " %.8x %.8x\n", rstat0, rstat1);
 		if (!(rstat0 & RSTAT0_EOF))
 			printk(KERN_CRIT "ep93xx_rx: not end-of-frame "
 					 " %.8x %.8x\n", rstat0, rstat1);
 		if (!(rstat0 & RSTAT0_EOB))
 			printk(KERN_CRIT "ep93xx_rx: not end-of-buffer "
 					 " %.8x %.8x\n", rstat0, rstat1);
-		if (!(rstat1 & RSTAT1_RFP))
-			printk(KERN_CRIT "ep93xx_rx: buffer1 not done "
-					 " %.8x %.8x\n", rstat0, rstat1);
 		if ((rstat1 & RSTAT1_BUFFER_INDEX) >> 16 != entry)
 			printk(KERN_CRIT "ep93xx_rx: entry mismatch "
 					 " %.8x %.8x\n", rstat0, rstat1);
@@ -301,13 +293,8 @@ err:
 
 static int ep93xx_have_more_rx(struct ep93xx_priv *ep)
 {
-	struct ep93xx_rstat *rstat;
-	int tail_offset;
-
-	rstat = ep->descs->rstat + ep->rx_pointer;
-	tail_offset = rdl(ep, REG_RXSTSQCURADD) - ep->descs_dma_addr;
-
-	return !((void *)rstat - (void *)ep->descs == tail_offset);
+	struct ep93xx_rstat *rstat = ep->descs->rstat + ep->rx_pointer;
+	return !!((rstat->rstat0 & RSTAT0_RFP) && (rstat->rstat1 & RSTAT1_RFP));
 }
 
 static int ep93xx_poll(struct net_device *dev, int *budget)
@@ -379,10 +366,8 @@ static int ep93xx_xmit(struct sk_buff *s
 static void ep93xx_tx_complete(struct net_device *dev)
 {
 	struct ep93xx_priv *ep = netdev_priv(dev);
-	int tail_offset;
 	int wake;
 
-	tail_offset = rdl(ep, REG_TXSTSQCURADD) - ep->descs_dma_addr;
 	wake = 0;
 
 	spin_lock(&ep->tx_pending_lock);
@@ -393,15 +378,13 @@ static void ep93xx_tx_complete(struct ne
 
 		entry = ep->tx_clean_pointer;
 		tstat = ep->descs->tstat + entry;
-		if ((void *)tstat - (void *)ep->descs == tail_offset)
-			break;
 
 		tstat0 = tstat->tstat0;
+		if (!(tstat0 & TSTAT0_TXFP))
+			break;
+
 		tstat->tstat0 = 0;
 
-		if (!(tstat0 & TSTAT0_TXFP))
-			printk(KERN_CRIT "ep93xx_tx_complete: buffer not done "
-					 " %.8x\n", tstat0);
 		if (tstat0 & TSTAT0_FA)
 			printk(KERN_CRIT "ep93xx_tx_complete: frame aborted "
 					 " %.8x\n", tstat0);
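The full-versus-empty ambiguity that this patch removes can be shown with a small standalone model. This is a sketch, not driver code: `struct status_ring`, `ring_produce()`, `ring_consume()` and the `STAT_VALID` bit are hypothetical stand-ins for the status queue, the hardware writing an entry, the driver's processing loop and the RFP/TXFP "processed" bits.

```c
#include <stdint.h>

#define RING_ENTRIES	4
#define STAT_VALID	0x80000000u	/* hypothetical "entry written" bit */

struct status_ring {
	uint32_t	stat[RING_ENTRIES];
	unsigned int	hw_write;	/* producer (hardware) index */
	unsigned int	sw_read;	/* consumer (driver) index */
};

/* Stand-in for the hardware writing one status entry. */
static void ring_produce(struct status_ring *r, uint32_t payload)
{
	r->stat[r->hw_write] = STAT_VALID | payload;
	r->hw_write = (r->hw_write + 1) % RING_ENTRIES;
}

/* Old-style test: the indices are equal both when the ring is empty
 * and when the producer has wrapped all the way around. */
static int ring_empty_by_pointer(const struct status_ring *r)
{
	return r->sw_read == r->hw_write;
}

/* New-style loop: consume entries for as long as the valid bit is set,
 * clearing each entry afterwards.  Returns the number processed; the
 * count is capped at one pass over the ring. */
static int ring_consume(struct status_ring *r)
{
	int n = 0;

	while (n < RING_ENTRIES && (r->stat[r->sw_read] & STAT_VALID)) {
		r->stat[r->sw_read] = 0;
		r->sw_read = (r->sw_read + 1) % RING_ENTRIES;
		n++;
	}
	return n;
}
```

After RING_ENTRIES entries have been produced without any being consumed, ring_empty_by_pointer() reports "empty" although every slot is waiting to be processed, which models the lockup condition; ring_consume() still finds all of them.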
Re: [PATCH 3/3] ep93xx_eth: don't report RX FIFO overrun errors
On Sun, Oct 29, 2006 at 11:15:28AM -0700, Ray Lehtiniemi wrote:

> > Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
> > ===
> > --- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
> > +++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
> > @@ -230,8 +230,9 @@ static int ep93xx_rx(struct net_device *
> >  					 " %.8x %.8x\n", rstat0, rstat1);
> >
> >  		if (!(rstat0 & RSTAT0_RWE)) {
> > -			printk(KERN_NOTICE "ep93xx_rx: receive error "
> > -					 " %.8x %.8x\n", rstat0, rstat1);
> > +			if (!(rstat0 & RSTAT_OE))
> > +				printk(KERN_NOTICE "ep93xx_rx: receive error "
> > +					 " %.8x %.8x\n", rstat0, rstat1);
> >
> >  			ep->stats.rx_errors++;
> >  			if (rstat0 & RSTAT0_OE)
>
> i got a compile error: please s/RSTAT_OE/RSTAT0_OE/ in this patch.

Whoops, I thought I sent the right one. :(

> Also, is it possible for any other error bits to be set at the same time
> as OE? such bits would not be printed to the log in this case.

Not sure, but arguably, this wouldn't be very interesting. Actually,
now I'm wondering whether we should just remove the printk altogether.

cheers,
Lennert
Re: 2.4/2.6 share in linux routers ?
On Fri, Oct 27, 2006 at 11:47:52PM +0200, Yakov Lerner wrote:

> I'd like to find/gather estimates about 2.4 vs 2.6 share in [small]
> linux routers in 2006. Can anyone offer estimates and/or references ?

For ARM devices, 2.4 is still definitely in the majority. The reason
for that appears to be that embedded linux distro vendors like locking
their customers into their own patched-to-hell
once-looked-like-something-2.4-ish kernels, under the guise of a load
of "2.6 is too unstable" FUD.
[PATCH,RFC] bridge: call eth_type_trans() in br_pass_frame_up()
Hi,

I've been seeing a failure to reply to incoming ARP packets on a bridge
interface until after the first few packets have been transmitted over
that interface, and the patch below seems to fix the issue, the 'issue'
being that the incoming ARP packets are marked with PACKET_OTHERHOST,
and there not being anything to set that back to PACKET_HOST even if
the destination MAC address matches the bridge interface's MAC address.

If this looks good, I'll prepare a proper commit message.

cheers,
Lennert

Signed-off-by: Tom Billman [EMAIL PROTECTED]
Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

--- linux-2.6.19-rc2.orig/net/bridge/br_input.c	2006-10-18 11:11:08.0 +0200
+++ linux-2.6.19-rc2/net/bridge/br_input.c	2006-10-18 11:10:08.0 +0200
@@ -32,6 +32,9 @@
 	indev = skb->dev;
 	skb->dev = br->dev;
 
+	skb_push(skb, ETH_HLEN);
+	skb->protocol = eth_type_trans(skb, skb->dev);
+
 	NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
 		netif_receive_skb);
 }
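The effect described above depends on which device's address the destination MAC is compared against. The following standalone sketch models that classification in userspace C; it is a simplification for illustration, not the kernel's eth_type_trans(), and `enum pkt_class` and `classify_dest()` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

enum pkt_class { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/* Classify a frame by its destination address relative to the address
 * of the device it is being received on. */
static enum pkt_class classify_dest(const uint8_t dest[6],
				    const uint8_t dev_addr[6])
{
	static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

	if (memcmp(dest, bcast, 6) == 0)
		return PKT_BROADCAST;
	if (dest[0] & 0x01)		/* group bit set */
		return PKT_MULTICAST;
	if (memcmp(dest, dev_addr, 6) == 0)
		return PKT_HOST;
	return PKT_OTHERHOST;
}
```

A frame addressed to the bridge's MAC address is "other host" when classified against a port whose own address differs, and "host" once it is classified again against the bridge device, which is the re-evaluation the patch asks for.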
Re: Suppress / delay SYN-ACK
On Thu, Oct 12, 2006 at 10:08:53AM +0200, Martin Schiller wrote:

> I'm searching for a solution to suppress / delay the SYN-ACK packet of a
> listening server (-application) until he has decided (e.g. analysed the
> requesting ip-address or checked if the corresponding other end of a
> connection is available) if he wants to accept the connect request of
> the client. If not, it should be possible to reject the connect request.

I wrote something like this a couple of years ago:

	http://marc.theaimsgroup.com/?l=linux-netdev&m=103666165629419&w=2
	http://marc.theaimsgroup.com/?l=linux-netdev&m=106089519611631&w=2

There wasn't a whole lot of external interest, and my need for it
disappeared, so I never really finished it, and there's a couple of
unfixed bugs,

cheers,
Lennert
[PATCH] cirrus logic ep93xx ethernet driver
The cirrus ep93xx is an ARM SoC that includes an ethernet MAC -- this
patch adds a driver for that ethernet MAC.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.18/drivers/net/arm/Kconfig
===
--- linux-2.6.18.orig/drivers/net/arm/Kconfig
+++ linux-2.6.18/drivers/net/arm/Kconfig
@@ -39,3 +39,10 @@ config ARM_AT91_ETHER
 	help
 	  If you wish to compile a kernel for the AT91RM9200 and enable
 	  ethernet support, then you should always answer Y to this.
+
+config EP93XX_ETH
+	tristate "EP93xx Ethernet support"
+	depends on NET_ETHERNET && ARM && ARCH_EP93XX
+	help
+	  This is a driver for the ethernet hardware included in EP93xx CPUs.
+	  Say Y if you are building a kernel for EP93xx based devices.
Index: linux-2.6.18/drivers/net/arm/Makefile
===
--- linux-2.6.18.orig/drivers/net/arm/Makefile
+++ linux-2.6.18/drivers/net/arm/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_ARM_ETHERH)	+= etherh.o
 obj-$(CONFIG_ARM_ETHER3)	+= ether3.o
 obj-$(CONFIG_ARM_ETHER1)	+= ether1.o
 obj-$(CONFIG_ARM_AT91_ETHER)	+= at91_ether.o
+obj-$(CONFIG_EP93XX_ETH)	+= ep93xx_eth.o
Index: linux-2.6.18/drivers/net/arm/ep93xx_eth.c
===
--- /dev/null
+++ linux-2.6.18/drivers/net/arm/ep93xx_eth.c
@@ -0,0 +1,856 @@
+/*
+ * EP93xx ethernet network device driver
+ * Copyright (C) 2006 Lennert Buytenhek [EMAIL PROTECTED]
+ * Dedicated to Marija Kulikova.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/mii.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/platform_device.h>
+#include <linux/delay.h>
+#include <asm/arch/ep93xx-regs.h>
+#include <asm/arch/platform.h>
+#include <asm/io.h>
+#include "ep93xx_eth.h"
+
+#define DRV_MODULE_NAME		"ep93xx-eth"
+#define DRV_MODULE_VERSION	"0.1"
+
+#define RX_QUEUE_ENTRIES	64
+#define TX_QUEUE_ENTRIES	8
+
+#define MAX_PKT_SIZE		2044
+#define PKT_BUF_SIZE		2048
+
+struct ep93xx_descs
+{
+	struct ep93xx_rdesc	rdesc[RX_QUEUE_ENTRIES];
+	struct ep93xx_tdesc	tdesc[TX_QUEUE_ENTRIES];
+	struct ep93xx_rstat	rstat[RX_QUEUE_ENTRIES];
+	struct ep93xx_tstat	tstat[TX_QUEUE_ENTRIES];
+};
+
+struct ep93xx_priv
+{
+	struct resource		*res;
+	void			*base_addr;
+	int			irq;
+
+	struct ep93xx_descs	*descs;
+	dma_addr_t		descs_dma_addr;
+
+	void			*rx_buf[RX_QUEUE_ENTRIES];
+	void			*tx_buf[TX_QUEUE_ENTRIES];
+
+	spinlock_t		rx_lock;
+	int			rx_pointer;
+	int			tx_clean_pointer;
+	int			tx_pointer;
+	spinlock_t		tx_pending_lock;
+	int			tx_pending;
+
+	struct net_device_stats	stats;
+
+	struct mii_if_info	mii;
+	u8			mdc_divisor;
+};
+
+#define rdb(ep, off)		__raw_readb((ep)->base_addr + (off))
+#define rdw(ep, off)		__raw_readw((ep)->base_addr + (off))
+#define rdl(ep, off)		__raw_readl((ep)->base_addr + (off))
+#define wrb(ep, off, val)	__raw_writeb((val), (ep)->base_addr + (off))
+#define wrw(ep, off, val)	__raw_writew((val), (ep)->base_addr + (off))
+#define wrl(ep, off, val)	__raw_writel((val), (ep)->base_addr + (off))
+
+static int ep93xx_mdio_read(struct net_device *dev, int phy_id, int reg);
+
+static struct net_device_stats *ep93xx_get_stats(struct net_device *dev)
+{
+	struct ep93xx_priv *ep = netdev_priv(dev);
+	return &(ep->stats);
+}
+
+static int ep93xx_rx(struct net_device *dev, int *budget)
+{
+	struct ep93xx_priv *ep = netdev_priv(dev);
+	int tail_offset;
+	int rx_done;
+	int processed;
+
+	tail_offset = rdl(ep, REG_RXSTSQCURADD) - ep->descs_dma_addr;
+
+	rx_done = 0;
+	processed = 0;
+	while (*budget > 0) {
+		int entry;
+		struct ep93xx_rstat *rstat;
+		u32 rstat0;
+		u32 rstat1;
+		int length;
+		struct sk_buff *skb;
+
+		entry = ep->rx_pointer;
+		rstat = ep->descs->rstat + entry;
+		if ((void *)rstat - (void *)ep->descs == tail_offset) {
+			rx_done = 1;
+			break
[PATCH] Cirrus Logic ep93xx ethernet driver
The Cirrus Logic ep93xx is an ARM SoC that includes an ethernet MAC --
this patch adds a driver for that ethernet MAC.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.18/drivers/net/arm/Kconfig
===
--- linux-2.6.18.orig/drivers/net/arm/Kconfig
+++ linux-2.6.18/drivers/net/arm/Kconfig
@@ -39,3 +39,10 @@ config ARM_AT91_ETHER
 	help
 	  If you wish to compile a kernel for the AT91RM9200 and enable
 	  ethernet support, then you should always answer Y to this.
+
+config EP93XX_ETH
+	tristate "EP93xx Ethernet support"
+	depends on NET_ETHERNET && ARM && ARCH_EP93XX
+	help
+	  This is a driver for the ethernet hardware included in EP93xx CPUs.
+	  Say Y if you are building a kernel for EP93xx based devices.
Index: linux-2.6.18/drivers/net/arm/Makefile
===
--- linux-2.6.18.orig/drivers/net/arm/Makefile
+++ linux-2.6.18/drivers/net/arm/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_ARM_ETHERH)	+= etherh.o
 obj-$(CONFIG_ARM_ETHER3)	+= ether3.o
 obj-$(CONFIG_ARM_ETHER1)	+= ether1.o
 obj-$(CONFIG_ARM_AT91_ETHER)	+= at91_ether.o
+obj-$(CONFIG_EP93XX_ETH)	+= ep93xx_eth.o
Index: linux-2.6.18/drivers/net/arm/ep93xx_eth.c
===
--- /dev/null
+++ linux-2.6.18/drivers/net/arm/ep93xx_eth.c
@@ -0,0 +1,944 @@
+/*
+ * EP93xx ethernet network device driver
+ * Copyright (C) 2006 Lennert Buytenhek [EMAIL PROTECTED]
+ * Dedicated to Marija Kulikova.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/mii.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/platform_device.h>
+#include <linux/delay.h>
+#include <asm/arch/ep93xx-regs.h>
+#include <asm/arch/platform.h>
+#include <asm/io.h>
+
+#define DRV_MODULE_NAME		"ep93xx-eth"
+#define DRV_MODULE_VERSION	"0.1"
+
+#define RX_QUEUE_ENTRIES	64
+#define TX_QUEUE_ENTRIES	8
+
+#define MAX_PKT_SIZE		2044
+#define PKT_BUF_SIZE		2048
+
+#define REG_RXCTL		0x
+#define REG_RXCTL_DEFAULT	0x00073800
+#define REG_TXCTL		0x0004
+#define REG_TXCTL_ENABLE	0x0001
+#define REG_MIICMD		0x0010
+#define REG_MIICMD_READ		0x8000
+#define REG_MIICMD_WRITE	0x4000
+#define REG_MIIDATA		0x0014
+#define REG_MIISTS		0x0018
+#define REG_MIISTS_BUSY		0x0001
+#define REG_SELFCTL		0x0020
+#define REG_SELFCTL_RESET	0x0001
+#define REG_INTEN		0x0024
+#define REG_INTEN_TX		0x0008
+#define REG_INTEN_RX		0x0007
+#define REG_INTSTSP		0x0028
+#define REG_INTSTS_TX		0x0008
+#define REG_INTSTS_RX		0x0004
+#define REG_INTSTSC		0x002c
+#define REG_AFP			0x004c
+#define REG_INDAD0		0x0050
+#define REG_INDAD1		0x0051
+#define REG_INDAD2		0x0052
+#define REG_INDAD3		0x0053
+#define REG_INDAD4		0x0054
+#define REG_INDAD5		0x0055
+#define REG_GIINTMSK		0x0064
+#define REG_GIINTMSK_ENABLE	0x8000
+#define REG_BMCTL		0x0080
+#define REG_BMCTL_ENABLE_TX	0x0100
+#define REG_BMCTL_ENABLE_RX	0x0001
+#define REG_BMSTS		0x0084
+#define REG_BMSTS_RX_ACTIVE	0x0008
+#define REG_RXDQBADD		0x0090
+#define REG_RXDQBLEN		0x0094
+#define REG_RXDCURADD		0x0098
+#define REG_RXDENQ		0x009c
+#define REG_RXSTSQBADD		0x00a0
+#define REG_RXSTSQBLEN		0x00a4
+#define REG_RXSTSQCURADD	0x00a8
+#define REG_RXSTSENQ		0x00ac
+#define REG_TXDQBADD		0x00b0
+#define REG_TXDQBLEN		0x00b4
+#define REG_TXDQCURADD		0x00b8
+#define REG_TXDENQ		0x00bc
+#define REG_TXSTSQBADD		0x00c0
+#define REG_TXSTSQBLEN		0x00c4
+#define REG_TXSTSQCURADD	0x00c8
+#define REG_MAXFRMLEN		0x00e8
+
+struct ep93xx_rdesc
+{
+	u32	buf_addr;
+	u32	rdesc1;
+};
+
+#define RDESC1_NSOF		0x8000
+#define RDESC1_BUFFER_INDEX	0x7fff
+#define RDESC1_BUFFER_LENGTH	0x
+
+struct ep93xx_rstat
+{
+	u32	rstat0;
+	u32	rstat1;
+};
+
+#define RSTAT0_RFP		0x8000
+#define RSTAT0_RWE		0x4000
+#define RSTAT0_EOF		0x2000
+#define RSTAT0_EOB		0x1000
+#define RSTAT0_AM
Re: [PATCH] cirrus logic ep93xx ethernet driver
On Thu, Sep 21, 2006 at 07:10:02PM -0400, Jeff Garzik wrote:

> > +		if (!(rstat0 & RSTAT0_RFP)) {
> > +			printk(KERN_CRIT "ep93xx_rx: buffer not done "
> > +					 " %.8x %.8x\n", rstat0, rstat1);
> > +			BUG();
> > +		}
> > +		if (!(rstat0 & RSTAT0_EOF)) {
> > +			printk(KERN_CRIT "ep93xx_rx: not end-of-frame "
> > +					 " %.8x %.8x\n", rstat0, rstat1);
> > +			BUG();
> > +		}
> > +		if (!(rstat0 & RSTAT0_EOB)) {
> > +			printk(KERN_CRIT "ep93xx_rx: not end-of-buffer "
> > +					 " %.8x %.8x\n", rstat0, rstat1);
> > +			BUG();
> > +		}
> > +		if (!(rstat1 & RSTAT1_RFP)) {
> > +			printk(KERN_CRIT "ep93xx_rx: buffer1 not done "
> > +					 " %.8x %.8x\n", rstat0, rstat1);
> > +			BUG();
> > +		}
> > +		if ((rstat1 & RSTAT1_BUFFER_INDEX) >> 16 != entry) {
> > +			printk(KERN_CRIT "ep93xx_rx: entry mismatch "
> > +					 " %.8x %.8x\n", rstat0, rstat1);
> > +			BUG();
> > +		}
>
> NAK all these BUGs.  Very unfriendly

If any of these checks trigger, we are in a very bad state, and
something is likely trampling over random bits of memory, but OK,
removed.

> > +		if (tstat0 & TSTAT0_TXWE) {
> > +			int length = ep->descs->tdesc[entry].tdesc1 & 0xfff;
> > +
> > +			ep->stats.tx_packets++;
> > +			ep->stats.tx_bytes += length;
> > +		} else {
> > +			ep->stats.tx_errors++;
> > +		}
> > +#if 0
> > +		/* This is only valid in half duplex mode.  */
> > +		if (tstat0 & TSTAT0_LCRS)
> > +			ep->stats.tx_carrier_errors++;
> > +#endif
>
> why #if 0'd?

The CRS bit will be set in the tx completion entry if the MII CRS
(Carrier Sense) signal wasn't asserted after the first N nibbles have
been transmitted.  However, if the PHY is running in full duplex mode,
the CRS signal isn't supposed to assert at all, and so we ended up
counting tx_carrier_errors for every transmitted packet if the
interface was in full duplex mode.

I removed this bit of code rather than #if 0'ing it out.

> > +static irqreturn_t ep93xx_irq(int irq, void *dev_id, struct pt_regs *regs)
> > +{
> > +	struct net_device *dev = dev_id;
> > +	struct ep93xx_priv *ep = netdev_priv(dev);
> > +	u32 status;
> > +
> > +	status = rdl(ep, REG_INTSTSC);
> > +	if (status == 0)
> > +		return IRQ_NONE;
>
> also check for status == 0x

As the ethernet controller is on the CPU die itself, it's not very
likely to be unplugged?

> > +static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
> > +{
> > +	int i;
> > +
> > +	ep->descs = dma_alloc_coherent(NULL, sizeof(struct ep93xx_descs),
> > +				&ep->descs_dma_addr, GFP_KERNEL | GFP_DMA);
> > +	if (ep->descs == NULL)
> > +		return 1;
> > +
> > +	for (i = 0; i < RX_QUEUE_ENTRIES; i += 2) {
> > +		void *page;
> > +		dma_addr_t d;
> > +
> > +		page = (void *)get_zeroed_page(GFP_KERNEL | GFP_DMA);
> > +		if (page == NULL)
> > +			goto err;
>
> do you really need a zeroed page?

No, any page will do -- fixed.

> > +static int ep93xx_eth_remove(struct platform_device *pdev)
> > +{
> > +	struct net_device *dev;
> > +	struct ep93xx_priv *ep;
> > +
> > +	dev = platform_get_drvdata(pdev);
> > +	if (dev == NULL)
> > +		return 0;
> > +	platform_set_drvdata(pdev, NULL);
> > +
> > +	ep = netdev_priv(dev);
> > +
> > +	/* @@@ Force down.  */
> > +	unregister_netdev(dev);
> > +	ep93xx_free_buffers(ep);
> > +
> > +	if (ep->base_addr != NULL)
> > +		iounmap(ep->base_addr);
> > +
> > +	if (ep->res != NULL) {
> > +		release_resource(ep->res);
> > +		kfree(ep->res);
> > +	}
>
> when will these ever be NULL ?

ep93xx_eth_remove is called from ep93xx_eth_probe's error path (see
below.)

All other issues fixed as well.  Thanks for your time.

> > +	free_netdev(dev);
> > +
> > +	return 0;
> > +}
> > +
> > +static int ep93xx_eth_probe(struct platform_device *pdev)
> > +{
> > +	struct ep93xx_eth_data *data;
> > +	struct net_device *dev;
> > +	struct ep93xx_priv *ep;
> > +	int err;
> > +
> > +	data = pdev->dev.platform_data;
> > +	if (pdev == NULL)
> > +		return -ENODEV;
> > +
> > +	dev = ep93xx_dev_alloc(data);
> > +	if (dev == NULL) {
> > +		err = -ENOMEM;
> > +		goto err_out;
> > +	}
> > +	ep = netdev_priv(dev);
> > +
> > +	platform_set_drvdata(pdev, dev);
> > +
> > +	ep->res = request_mem_region(pdev->resource[0].start,
> > +			pdev->resource[0].end - pdev->resource[0].start + 1,
> > +			pdev->dev.bus_id);
> > +	if (ep->res == NULL) {
> > +		dev_err(&pdev->dev, "Could not reserve memory region\n");
> > +		err = -ENOMEM;
> > +		goto err_out;
> > +	}
> > +
> > +	ep->base_addr = ioremap(pdev->resource[0].start,
> > +			pdev->resource[0].end -
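The carrier-sense reasoning in the reply above (the lost-carrier bit is only meaningful in half duplex) can be sketched as a small standalone C helper. This is only an illustration under stated assumptions, not the driver's code: the thread says the check was removed outright, and the bit value, the ex_ names and the full_duplex flag below are all invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real TSTAT0_LCRS value is not shown in this thread. */
#define EX_TSTAT0_LCRS	0x00000800u

/* Minimal stand-in for the driver's statistics structure. */
struct ex_tx_stats {
	unsigned long tx_carrier_errors;
};

/*
 * Account for a lost-carrier indication in a tx completion word.
 * In full duplex the PHY is not expected to assert CRS during
 * transmission, so the bit says nothing about errors there and is
 * ignored; in half duplex a set bit counts as a carrier error.
 */
static void ex_account_lcrs(struct ex_tx_stats *stats, uint32_t tstat0,
			    bool full_duplex)
{
	if (full_duplex)
		return;

	if (tstat0 & EX_TSTAT0_LCRS)
		stats->tx_carrier_errors++;
}
```

Calling ex_account_lcrs() with full_duplex set leaves the counter untouched no matter what the status word contains, which is the behaviour the unconditional check in the first version of the patch was missing.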
Re: [PATCH] EtherIP tunnel driver (RFC 3378)
On Mon, Sep 11, 2006 at 10:41:29PM +0200, Joerg Roedel wrote:

> This driver implements the tunneling of Ethernet packets over IPv4
> networks for Linux. It uses the protocol defined in RFC 3378.

Check out the thread "[PATCH][RFC] etherip: Ethernet-in-IPv4 tunneling"
that was on netdev in January of 2005 -- a number of arguments against
etherip (and for tunneling ethernet in GRE) were raised back then.

One of the most significant ones, IMHO:

	Another argument against etherip would be that OpenBSD
	apparently mis-implemented etherip by putting the etherip
	version nibble in the second nibble of the etherip header
	instead of the first, which would probably prevent the linux
	and OpenBSD versions from interoperating, negating the
	advantage of using etherip in the first place.


cheers,
Lennert
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
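To make the nibble placement discussed above concrete, here is a minimal C sketch of the two header encodings. It assumes the RFC 3378 layout (a 16-bit header whose first four bits carry version 3 and whose remaining twelve bits are reserved and zero); the function names are made up for the example, and the "low nibble" variant only models the behaviour attributed to OpenBSD in the message, it is not taken from OpenBSD sources.

```c
#include <stdint.h>

#define ETHERIP_VERSION	3	/* version number used by RFC 3378 */

/* RFC 3378 layout: version in the first (high) nibble, rest reserved. */
static void etherip_hdr_rfc3378(uint8_t hdr[2])
{
	hdr[0] = ETHERIP_VERSION << 4;	/* 0x30 */
	hdr[1] = 0;
}

/* Variant described in the message: version in the second (low) nibble. */
static void etherip_hdr_low_nibble(uint8_t hdr[2])
{
	hdr[0] = ETHERIP_VERSION;	/* 0x03 */
	hdr[1] = 0;
}

/* A strict receiver following RFC 3378 accepts only the first layout. */
static int etherip_hdr_valid(const uint8_t hdr[2])
{
	return (hdr[0] >> 4) == ETHERIP_VERSION &&
	       (hdr[0] & 0x0f) == 0 &&
	       hdr[1] == 0;
}
```

A receiver that validates the header this way would drop every frame built by etherip_hdr_low_nibble(), which is the interoperability problem the message points at.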
[PATCH,RESEND] rtl8150: use default MTU of 1500
The rtl8150 (ethernet) driver uses a default MTU of 1540, which causes
all kinds of problems with, for example, booting off NFS root.  There
isn't really any reason why we shouldn't use the default of 1500.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.18-rc2/drivers/usb/net/rtl8150.c
===================================================================
--- linux-2.6.18-rc2.orig/drivers/usb/net/rtl8150.c
+++ linux-2.6.18-rc2/drivers/usb/net/rtl8150.c
@@ -867,9 +867,8 @@
 	netdev->hard_start_xmit = rtl8150_start_xmit;
 	netdev->set_multicast_list = rtl8150_set_multicast;
 	netdev->set_mac_address = rtl8150_set_mac_address;
 	netdev->get_stats = rtl8150_netdev_stats;
-	netdev->mtu = RTL8150_MTU;
 	SET_ETHTOOL_OPS(netdev, &ops);
 	dev->intr_interval = 100;	/* 100ms */
 
 	if (!alloc_all_urbs(dev)) {
Re: Pull request for 'r8169-20060912-00' branch
On Tue, Sep 12, 2006 at 09:35:31PM +0200, Francois Romieu wrote:

> +	/*
> +	 * Magic spell: some iop3xx ARM board needs the TxDescAddrHigh
> +	 * register to be written before TxDescAddrLow to work.
> +	 * Switching from MMIO to I/O access fixes the issue as well.
> +	 */

Not that it matters much either way, but my impression was that this
was an 8110SB bug rather than an iop3xx bug, as the same iop3xx ARM
board sports a VIA PCI USB controller and a Silicon Image PCI SATA
controller, which both work without any problems.


cheers,
Lennert
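The ordering constraint in the quoted comment (upper half of the descriptor address written before the lower half) can be sketched in plain C. This is a user-space illustration only: the register offsets, the ex_ names and the write log standing in for real MMIO accesses are assumptions made for the example, not the r8169 driver's code.

```c
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
enum {
	EX_TX_DESC_ADDR_LOW	= 0x20,
	EX_TX_DESC_ADDR_HIGH	= 0x24,
};

#define EX_MAX_WRITES	8

/* Records register writes in the order they were issued. */
struct ex_mmio_log {
	uint32_t offset[EX_MAX_WRITES];
	uint32_t value[EX_MAX_WRITES];
	int count;
};

/* Stand-in for a 32-bit MMIO write: it only logs the access. */
static void ex_write32(struct ex_mmio_log *wl, uint32_t offset, uint32_t value)
{
	if (wl->count < EX_MAX_WRITES) {
		wl->offset[wl->count] = offset;
		wl->value[wl->count] = value;
		wl->count++;
	}
}

/*
 * Program a 64-bit tx descriptor ring address.  The upper 32 bits are
 * deliberately written first, as the quoted comment requires; swapping
 * the two calls is what triggered the problem on the affected board.
 */
static void ex_set_tx_desc_addr(struct ex_mmio_log *wl, uint64_t dma_addr)
{
	ex_write32(wl, EX_TX_DESC_ADDR_HIGH, (uint32_t)(dma_addr >> 32));
	ex_write32(wl, EX_TX_DESC_ADDR_LOW, (uint32_t)(dma_addr & 0xffffffffu));
}
```

Keeping both writes in one helper makes the required order hard to break by accident, and gives an obvious place for the explanatory comment.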
Re: [PATCH,RFC] Re: r8169 driver problem with RTL8110SB chip (on iop3xx ARM board)
On Fri, Sep 08, 2006 at 10:23:36PM +0200, Francois Romieu wrote:

> > I suspect it's a chip bug.  I rechecked with I/O space, and that
> > works okay, so this artifact (bug) only manifests itself when you
> > do the upper write in MMIO space.
> >
> > Are there any plans to switch r8169 to the iomap API?  Would you
> > take a patch if I'd write one?
>
> Given the current state of the r8169 driver, I do not see a lot of
> benefit from the iomap() API in itself. It could make the switch to
> I/O read/write easier for strange bugs like yours but I have an
> epidermic defiance against I/O ops (much too synchronizing for me:
> people forget that MMIO will post). I may change my mind if bugs
> start popping up like mushrooms but we are hopefully not there yet.
>
> An ordered write with a big sign in front of it to comment the issue
> is good enough for me.
>
> Don't hesitate to protest if you think that I need a clue.

What you say makes sense -- in my case it would have been useful to
have a knob to switch the driver to use I/O ops (since that is what
the vendor driver uses, and the vendor driver works), but bugs like
these are generally rare anyway, and so the added benefit isn't too
big.  OK.


cheers,
Lennert
Re: [RFT] e100 driver on ARM
On Mon, Sep 04, 2006 at 06:39:29AM -0400, Jeff Garzik wrote:

> 1) Does e100 driver work on ARM?

FWIW, e100 seems to work okay for me on an intel ixp2400 (xscale
based) board, an ixp2850 (xscale based) board and an ixp2350 (xscale3
based) board.  ixp2350 works both with hardware coherency turned on
(cpu snoops bus) and turned off (manual dma cache clean/invalidate as
usual.)

As for the other ARM platforms that I'm interested in / have hardware
for / maintain, the at91/ep93xx/pxa270 don't have PCI, and the other
two (iop32x/iop33x) I can't test because I don't have such systems
with e100 NICs, but I expect those would work, since they're both
xscale based like the ixp2400, and the ixp2400 works.


cheers,
Lennert
Re: [PATCH] DM9000 interrupt is hardware dependant
On Mon, Sep 04, 2006 at 10:17:08PM +0200, Jürgen Schindele wrote:

> I made a patch for a PXA270 evalboard with a DM9000 ethernet
> controller. The interrupt can be high- or low-active, depending on
> the wiring of the MDC (57) pin. Because of this hardware dependency
> you should be able to configure this behaviour in
> struct resource dm9000_resources[]

Putting it in 'struct dm9000_plat_data' sounds like a much better
idea to me...  Overloading the IRQ number in 'struct resource' is
just ugly.


cheers,
Lennert
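The suggestion above, carrying the interrupt polarity in platform data instead of encoding it in the IRQ resource, might look roughly like the following standalone C sketch. Everything here is hypothetical: the struct stands in for 'struct dm9000_plat_data', and the field name, the enum and the active-low default are assumptions chosen for the example rather than anything the driver defines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the board-supplied platform data. */
struct ex_dm9000_plat_data {
	bool irq_active_high;	/* set by the board file to match the wiring */
};

enum ex_irq_trigger {
	EX_IRQ_TRIGGER_LOW,
	EX_IRQ_TRIGGER_HIGH,
};

/*
 * Pick the interrupt trigger from platform data.  The IRQ number
 * itself stays untouched in the resource table; only the polarity
 * travels through the platform data.
 */
static enum ex_irq_trigger
ex_dm9000_irq_trigger(const struct ex_dm9000_plat_data *pdata)
{
	/* No platform data: fall back to active-low (arbitrary default). */
	if (pdata == NULL)
		return EX_IRQ_TRIGGER_LOW;

	return pdata->irq_active_high ? EX_IRQ_TRIGGER_HIGH
				      : EX_IRQ_TRIGGER_LOW;
}
```

A board file would then fill in one boolean next to its other DM9000 settings, and the resource array keeps describing nothing but addresses and the IRQ line.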