from:"Lennert Buytenhek"

Re: DSA support for Marvell 88e6065 switch

2018-12-02 Thread Lennert Buytenhek

On Thu, Nov 22, 2018 at 09:27:24PM +0100, Pavel Machek wrote:

> Hi!

Hello!


> > > > > If I wanted it to work, what do I need to do? AFAICT phy autoprobing
> > > > > should just attach it as soon as it is compiled in?
> > > > 
> > > > Nope. It is a switch, not a PHY. Switches are never auto-probed
> > > > because they are not guaranteed to have ID registers.
> > > > 
> > > > You need to use the legacy device tree binding. Look in
> > > > Documentation/devicetree/bindings/net/dsa/dsa.txt, section Deprecated
> > > > Binding. You can get more examples if you checkout old kernels. Or
> > > > kirkwood-rd88f6281.dtsi, the dsa { } node which is disabled.
> > > 
> > > Thanks; I ported code from mv88e66xx in the meantime, and switch
> > > appears to be detected.
> > > 
> > > But I'm running into problems with tagging code, and I guess I'd like
> > > some help understanding.
> > > 
> > > tag_trailer: allocates new skb, then copies data around.
> > > 
> > > tag_qca: does dev->stats.tx_packets++, and reuses existing skb.
> > > 
> > > tag_brcm: reuses existing skb.
> 
> Any idea why tag trailer allocates new skb,

I wrote this code over 10 years ago, so I don't remember all that
well, but I think that it is because you have to do manual checksumming
of the packet, as there's no way to pass down the stack that you don't
want to checksum all the way down to the end of the data area (and you
don't want the tag to be included in the checksum), and so you want to
do that before you add the trailer tag, and you'll probably have to
reallocate the data area to be able to add the tag, and you probably
won't get an exclusive skb here anyway, so you might as well allocate
a new one.


> and what is going on with dev->stats.tx_packets++?

trailer_xmit would be the hard_start_xmit function for the virtual
(slave) network interface, so this would be the right thing to do?


> > > Is qca wrong in adjusting the statistics? Why does trailer allocate
> > > new skb?
> > > 
> > > 6065 seems to use 2-byte header between "SFD" and "Destination
> > > address" in the ethernet frame. That's ... strange place to put
> > > header, as addresses are now shifted. I need to put ethernet in
> > > promisc mode (by running tcpdump) to get data moving.. and can not
> > > figure out what to do in tag_...
> > 
> > Does this switch chip not also support trailer mode?
> > 
> > There's basically four tagging modes for Marvell switch chips: header
> > mode (the one you described), trailer mode (tag_trailer.c), DSA and
> > ethertype DSA.  The switch chips I worked on that didn't support
> > (ethertype) DSA tagging did support both header and trailer modes,
> > and I chose to run them in trailer mode for the reasons you describe
> > above, but if your chip doesn't support trailer mode, then yes,
> > you'll have to add support for header mode and put the underlying
> > interface into promiscuous mode and such.
> 
> It seems that 6060 supports both header (probably, parts of docs are
> redacted) and trailer mode... but I'm working with 6065. That does not
> support trailer mode... or at least word "trailer" does not appear
> anywhere in the documentation.
> 
> What chip were you working with? I may want to take a look on their
> wording.

I think I added trailer mode just for the 6060, since it doesn't (IIRC)
support (ethertype) DSA tagging.


> 6065 indeed has some kind of "egress tagging mode" (with four
> options), but I have trouble understanding what it really does.

What are the options?

Re: DSA support for Marvell 88e6065 switch

2018-11-22 Thread Lennert Buytenhek

On Thu, Nov 22, 2018 at 02:21:23PM +0100, Pavel Machek wrote:

> > > If I wanted it to work, what do I need to do? AFAICT phy autoprobing
> > > should just attach it as soon as it is compiled in?
> > 
> > Nope. It is a switch, not a PHY. Switches are never auto-probed
> > because they are not guaranteed to have ID registers.
> > 
> > You need to use the legacy device tree binding. Look in
> > Documentation/devicetree/bindings/net/dsa/dsa.txt, section Deprecated
> > Binding. You can get more examples if you checkout old kernels. Or
> > kirkwood-rd88f6281.dtsi, the dsa { } node which is disabled.
> 
> Thanks; I ported code from mv88e66xx in the meantime, and switch
> appears to be detected.
> 
> But I'm running into problems with tagging code, and I guess I'd like
> some help understanding.
> 
> tag_trailer: allocates new skb, then copies data around.
> 
> tag_qca: does dev->stats.tx_packets++, and reuses existing skb.
> 
> tag_brcm: reuses existing skb.
> 
> Is qca wrong in adjusting the statistics? Why does trailer allocate
> new skb?
> 
> 6065 seems to use 2-byte header between "SFD" and "Destination
> address" in the ethernet frame. That's ... strange place to put
> header, as addresses are now shifted. I need to put ethernet in
> promisc mode (by running tcpdump) to get data moving.. and can not
> figure out what to do in tag_...

Does this switch chip not also support trailer mode?

There's basically four tagging modes for Marvell switch chips: header
mode (the one you described), trailer mode (tag_trailer.c), DSA and
ethertype DSA.  The switch chips I worked on that didn't support
(ethertype) DSA tagging did support both header and trailer modes,
and I chose to run them in trailer mode for the reasons you describe
above, but if your chip doesn't support trailer mode, then yes,
you'll have to add support for header mode and put the underlying
interface into promiscuous mode and such.

Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference

2018-11-16 Thread Lennert Buytenhek

On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote:

> we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using
> IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4.

I looked into this with Jean-Philippe, and it appears to be crashing
on a NULL pointer dereference in the inlined xfrm_policy_check() call
in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in
__xfrm_policy_check2():

return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
(skb_dst(skb)->flags & DST_NOPOLICY) || <=
__xfrm_policy_check(sk, ndir, skb, family);

Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when
skb_dst_force clears the dst_entry.") fixes a very similar problem on
the output and forward paths, but our issue seems to be triggering on
the input path.

This hack patch seems to make the crashes go away, and the printk added
triggers with approximately the same regularity as the crashes used
to occur, so the fix from 9e1437937807 probably needs to be extended
to the input path somewhat like this.

Thanks!


diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 352abca2605f..c666e29441b4 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -381,6 +381,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
XFRM_SKB_CB(skb)->seq.input.hi = seq_hi;
 
skb_dst_force(skb);
+   if (!skb_dst(skb)) {
+   if (net_ratelimit())
+   printk(KERN_CRIT "OH CRAP\n");
+   goto drop;
+   }
+
dev_hold(skb->dev);
 
if (crypto_done)



> Attached, the offended oops against 4.18.
> 
> Output of decodedecode:
> 
> [ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00
> 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe  40
> 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20
> All code
> 
>0:   8b 44 24 70 mov0x70(%rsp),%eax
>4:   0f c8   bswap  %eax
>6:   89 87 b4 00 00 00   mov%eax,0xb4(%rdi)
>c:   48 8b 86 20 05 00 00mov0x520(%rsi),%rax
>   13:   8b 80 f8 14 00 00   mov0x14f8(%rax),%eax
>   19:   85 c0   test   %eax,%eax
>   1b:   75 05   jne0x22
>   1d:   48 85 d2test   %rdx,%rdx
>   20:   74 0e   je 0x30
>   22:   48 8b 43 58 mov0x58(%rbx),%rax
>   26:   48 83 e0 fe and$0xfffe,%rax
>   2a:*  f6 40 38 04 testb  $0x4,0x38(%rax)  <-- trapping
> instruction
>   2e:   74 7d   je 0xad
>   30:   44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx)
>   37:   49 8b 44 24 20  mov0x20(%r12),%rax
>   3c:   48  rex.W
>   3d:   39  .byte 0x39
>   3e:   86 20   xchg   %ah,(%rax)
> 
> Code starting with the faulting instruction
> ===
>0:   f6 40 38 04 testb  $0x4,0x38(%rax)
>4:   74 7d   je 0x83
>6:   44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx)
>d:   49 8b 44 24 20  mov0x20(%r12),%rax
>   12:   48  rex.W
>   13:   39  .byte 0x39
>   14:   86 20   xchg   %ah,(%rax)
> 
> 
> if my understanding is correct, we fail here:
> 
> /build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:
> 1169return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
>0x0b19 <+185>:   testb  $0x4,0x38(%rax)
>0x0b1d <+189>:   je 0xb9c 
> 
> (gdb) list *0x0b19
> 0xb19 is in vti_rcv_cb
> (/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169).
> 1164int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0);
> 1165
> 1166if (sk && sk->sk_policy[XFRM_POLICY_IN])
> 1167return __xfrm_policy_check(sk, ndir, skb, family);
> 1168
> 1169return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
> 1170(skb_dst(skb)->flags & DST_NOPOLICY) ||
> 1171__xfrm_policy_check(sk, ndir, skb, family);
> 1172}
> 1173
> 
> I really have hard time to understand why skb seem to be freed twice.
> 
> I'm not able to repeat the bug in lab, but it happened regulary in prod,
> seem to depend of the workload.
> 
> Any help will be appreciated.
> 
> Let me know if you need further informations.
> 
> Regards,
> 
> Jean-Philippe

> [   31.154360] BUG: unable to handle kernel NULL pointer dereference at 
> 0038
> [   31.162233] PGD 0 P4D 0
> [   31.164786] Oops:  [#1] SMP PTI
> [   31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic 
> #12~18.04.1-Ubuntu
> [   31.176854] Hardware name: Supermicro

Re: [PATCH net] packet: fix reserve calculation

2018-06-02 Thread Lennert Buytenhek

On Thu, May 24, 2018 at 06:10:30PM -0400, Willem de Bruijn wrote:

> From: Willem de Bruijn 
> 
> Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
> layer allocation") ensures that packet_snd always starts writing
> the link layer header in reserved headroom allocated for this
> purpose.
> 
> This is needed because packets may be shorter than hard_header_len,
> in which case the space up to hard_header_len may be zeroed. But
> that necessary padding is not accounted for in skb->len.
> 
> The fix, however, is buggy. It calls skb_push, which grows skb->len
> when moving skb->data back. But in this case packet length should not
> change.
> 
> Instead, call skb_reserve, which moves both skb->data and skb->tail
> back, without changing length.
> 
> Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer 
> allocation")
> Reported-by: Tariq Toukan 
> Signed-off-by: Willem de Bruijn 
> Acked-by: Soheil Hassas Yeganeh 

After upgrading my router from 4.16.11 to 4.16.12, it is failing to
obtain a DHCP lease from my ISP, as it started sending out DHCP queries
with 14 bytes of junk at the end (which is presumably causing RX csum
failures on the DHCP server end):

 13:08:39.292667 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP 
(17), length 328)
 0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 
xx:xx:xx:xx:xx:xx, length 300, xid 0x, Flags [none] (0x)
[...]
-0x0150:    
+0x0150:     e802    e802
+0x0160:   

This seems to be caused by (the -stable backport of) b84bbaf7a6c8
("packet: in packet_snd start writing at link layer allocation") and
appears to have been fixed by this patch, as applying this patch to
4.16.12 makes DHCP work for me again.

Tested-by: Lennert Buytenhek 



> ---
>  net/packet/af_packet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index e9422fe45179..acb7b86574cd 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2911,7 +2911,7 @@ static int packet_snd(struct socket *sock, struct 
> msghdr *msg, size_t len)
>   if (unlikely(offset < 0))
>   goto out_free;
>   } else if (reserve) {
> - skb_push(skb, reserve);
> + skb_reserve(skb, -reserve);
>   }
>  
>   /* Returns -EFAULT on error */
> -- 
> 2.17.0.921.gf22659ad46-goog

Re: [PATCH net] net: ipv6: Compare lwstate in detecting duplicate nexthops

2017-07-05 Thread Lennert Buytenhek

On Wed, Jul 05, 2017 at 02:14:33PM -0700, Roopa Prabhu wrote:

> > Lennert reported a failure to add different mpls encaps in a multipath
> > route:
> >
> >   $ ip -6 route add 1234::/16 \
> > nexthop encap mpls 10 via fe80::1 dev ens3 \
> > nexthop encap mpls 20 via fe80::1 dev ens3
> >   RTNETLINK answers: File exists
> >
> > The problem is that the duplicate nexthop detection does not compare
> > lwtunnel configuration. Add it.
> >
> > Fixes: 19e42e451506("ipv6: support for fib route lwtunnel encap attributes")
> > Signed-off-by: David Ahern <dsah...@gmail.com>
> 
> Reported-by: João Taveira Araújo <joao.tave...@gmail.com>
> 
> Reported-by: Lennert Buytenhek <buyt...@wantstofly.org>
> 
> Acked-by: Roopa Prabhu <ro...@cumulusnetworks.com>

Tested-by: Lennert Buytenhek <buyt...@wantstofly.org>

Seems to work!  Thanks!

Unable to add v6 multipath route with same nexthops but different MPLS labels

2017-07-05 Thread Lennert Buytenhek

Hi!

FWIW, this doesn't work:

# ip -6 route add 1234::/16 \
nexthop encap mpls 10 via fe80::1 dev ens3 \
nexthop encap mpls 20 via fe80::1 dev ens3
RTNETLINK answers: File exists

While this does:

# ip -6 route chg 1234::/16
nexthop encap mpls 10 via fe80::1 dev ens3
nexthop encap mpls 20 via fe80::2 dev ens3
# ip -6 route
1234::/16  encap mpls  10 via fe80::1 dev ens3 metric 1024 pref medium
1234::/16  encap mpls  20 via fe80::2 dev ens3 metric 1024 pref medium
[...]

ECMPing over different LSPs that share a nexthop router seems like a
legitimate use case to me.  Is this restriction intentional or just an
accident?  (The same thing works fine in v4 land, where multipath
routes are handled differently.)

Thanks in advance!


Cheers,
Lennert

Re: [PATCH v2 net-next 06/12] ep93xx_eth: add GRO support

2017-05-15 Thread Lennert Buytenhek

On Sat, Feb 04, 2017 at 03:24:56PM -0800, Eric Dumazet wrote:

> Use napi_complete_done() instead of __napi_complete() to :
> 
> 1) Get support of gro_flush_timeout if opt-in
> 2) Not rearm interrupts for busy-polling users.
> 3) use standard NAPI API.
> 4) get rid of baroque code and ease maintenance.
>
> [...]
>
> @@ -310,35 +311,17 @@ static int ep93xx_rx(struct net_device *dev, int 
> processed, int budget)
>   return processed;
>  }
>  
> -static int ep93xx_have_more_rx(struct ep93xx_priv *ep)
> -{
> - struct ep93xx_rstat *rstat = ep->descs->rstat + ep->rx_pointer;
> - return !!((rstat->rstat0 & RSTAT0_RFP) && (rstat->rstat1 & RSTAT1_RFP));
> -}
> -
>  static int ep93xx_poll(struct napi_struct *napi, int budget)
>  {
>   struct ep93xx_priv *ep = container_of(napi, struct ep93xx_priv, napi);
>   struct net_device *dev = ep->dev;
> - int rx = 0;
> -
> -poll_some_more:
> - rx = ep93xx_rx(dev, rx, budget);
> - if (rx < budget) {
> - int more = 0;
> + int rx;
>  
> + rx = ep93xx_rx(dev, budget);
> + if (rx < budget && napi_complete_done(napi, rx)) {
>   spin_lock_irq(>rx_lock);
> - __napi_complete(napi);
>   wrl(ep, REG_INTEN, REG_INTEN_TX | REG_INTEN_RX);
> - if (ep93xx_have_more_rx(ep)) {
> - wrl(ep, REG_INTEN, REG_INTEN_TX);
> - wrl(ep, REG_INTSTSP, REG_INTSTS_RX);
> - more = 1;
> - }
>   spin_unlock_irq(>rx_lock);
> -
> - if (more && napi_reschedule(napi))
> - goto poll_some_more;
>   }
>  
>   if (rx) {

This code was the way it was because the ep93xx hardware is somewhat
braindead.  If I remember correctly (but it's been a while since I wrote
this code):

1. ep93xx netdev IRQs are edge-triggered, so if you re-enable IRQs
   while there was still work to be done, you will not get another IRQ.

2. Disabling an interrupt source in the interrupt mask register will
   cause its interrupt status bit to always return zero, so you cannot
   check whether an interrupt status is pending without having the
   interrupt source enabled.

(I'll admit that a comment explaining this would have been in order.)

I don't know if we really care about this hardware anymore (I don't),
but the ep93xx platform is still listed as being maintained in the
MAINTAINERS file -- adding Ryan and Hartley.

Re: problem with MPLS and TSO/GSO

2016-07-27 Thread Lennert Buytenhek

On Wed, Jul 27, 2016 at 03:02:24PM +0800, zhuyj wrote:

> On ubuntu16.04 server 64 bit
> The attached script is run, the following will appear.
> 
> Error: either "to" is duplicate, or "encap" is a garbage.

Looks like your installed iproute2 package doesn't grok MPLS.

problem with MPLS and TSO/GSO

2016-07-25 Thread Lennert Buytenhek

Hi!

I am seeing pretty horrible TCP transmit performance (anywhere between
1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a
route that involves MPLS labeling, and this seems to be due to an
interaction between MPLS and TSO/GSO that causes all segmentable TCP
frames that are MPLS-labeled to be dropped on egress.

I initially ran into this issue with the ixgbe driver, but it is easily
reproduced with veth interfaces, and the script attached below this
email reproduces the issue.  The script configures three network
namespaces: one that transmits TCP data (netperf) with MPLS labels,
one that takes the MPLS traffic and pops the labels and forwards the
traffic on, and one that receives the traffic (netserver).  When not
using MPLS labeling, I get ~3 Mb/s single-stream TCP performance
in this setup on my test box, and with MPLS labeling, I get ~2 Mb/s.

Some investigating shows that egress TCP frames that need to be
segmented are being dropped in validate_xmit_skb(), which calls
skb_gso_segment() which calls skb_mac_gso_segment() which returns
-EPROTONOSUPPORT because we apparently didn't have the right kernel
module (mpls_gso) loaded.

(It's somewhat poor design, IMHO, to degrade network performance by
15000x if someone didn't load a kernel module they didn't know they
should have loaded, and in a way that doesn't log any warnings or
errors and can only be diagnosed by adding printk calls to net/core/
and recompiling your kernel.)

(Also, I'm not sure why mpls_gso is needed when ixgbe seems to be
able to natively do TSO on MPLS-labeled traffic, maybe because ixgbe
doesn't advertise the necessary features in ->mpls_features?  But
adding those bits doesn't seem to change much.)

But, loading mpls_gso doesn't change much -- skb_gso_segment() then
starts return -EINVAL instead, which is due to the
skb_network_protocol() call in skb_mac_gso_segment() returning zero.
And looking at skb_network_protocol(), I don't see how this is
supposed to work -- skb->protocol is 0 at this point, and there is no
way to figure out that what we are encapsulating is IP traffic, because
unlike what is the case with VLAN tags, MPLS labels aren't followed by
an inner ethertype that says what kind of traffic is in here, you have
to have explicit knowledge of the payload type for MPLS.

Any ideas?

Thanks in advance!


Cheers,
Lennert



=== problem.sh
#!/bin/sh

# ns0 sends out packets with mpls labels
# ns1 receives the labelled packets, pops the labels, and forwards to ns2
# ns2 receives the unlabelled packets and replies to ns0

ip netns add ns0
ip netns add ns1
ip netns add ns2

ip link add virt01 type veth peer name virt10
ip link set virt01 netns ns0
ip link set virt10 netns ns1

ip link add virt12 type veth peer name virt21
ip link set virt12 netns ns1
ip link set virt21 netns ns2

ip netns exec ns0 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns0 ip link set lo up
ip netns exec ns0 ip addr add 172.16.20.20/24 dev virt01
ip netns exec ns0 ip link set virt01 up

ip netns exec ns1 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip addr add 172.16.20.21/24 dev virt10
ip netns exec ns1 ip link set virt10 up
ip netns exec ns1 ip addr add 172.16.21.21/24 dev virt12
ip netns exec ns1 ip link set virt12 up

ip netns exec ns2 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns2 ip link set lo up
ip netns exec ns2 ip addr add 172.16.21.22/24 dev virt21
ip netns exec ns2 ip link set virt21 up

modprobe mpls_iptunnel

ip netns exec ns0 ip route add 10.10.10.10/32 encap mpls 100 via inet 
172.16.20.21 mtu lock 1496
#ip netns exec ns0 ip route add 172.16.21.0/24 via 172.16.20.21
ip netns exec ns0 ip route add 172.16.21.0/24 via 172.16.20.21 mtu lock 1496

ip netns exec ns1 sysctl -w net.ipv4.conf.all.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.default.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.lo.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.virt10.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.virt12.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.ip_forward=1
ip netns exec ns1 sysctl -w net.mpls.conf.virt10.input=1
ip netns exec ns1 sysctl -w net.mpls.platform_labels=1000
ip netns exec ns1 ip -f mpls route add 100 via inet 172.16.21.22

ip netns exec ns2 ip addr add 10.10.10.10/32 dev lo
ip netns exec ns2 ip route add 172.16.20.0/24 via 172.16.21.21

ip netns exec ns0 ping -c 1 10.10.10.10

ip netns exec ns2 netserver

# non-mpls
ip netns exec ns0 netperf -c -C -H 172.16.21.22 -l 10 -t TCP_STREAM

# mpls (retry this with mpls_gso loaded)
ip netns exec ns0 netperf -c -C -H 10.10.10.10 -l 10 -t TCP_STREAM

[PATCH] neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()

2016-06-28 Thread Lennert Buytenhek

From: David Barroso <dbarr...@fastly.com>

neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso <dbarr...@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>
Acked-by: David Ahern <d...@cumulusnetworks.com>
Acked-by: Robert Shearman <rshea...@brocade.com>
---
 net/core/neighbour.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 29dd8cc..510cd62 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2469,13 +2469,17 @@ int neigh_xmit(int index, struct net_device *dev,
tbl = neigh_tables[index];
if (!tbl)
goto out;
+   rcu_read_lock_bh();
neigh = __neigh_lookup_noref(tbl, addr, dev);
if (!neigh)
neigh = __neigh_create(tbl, addr, dev, false);
err = PTR_ERR(neigh);
-   if (IS_ERR(neigh))
+   if (IS_ERR(neigh)) {
+   rcu_read_unlock_bh();
goto out_kfree_skb;
+   }
err = neigh->output(neigh, skb);
+   rcu_read_unlock_bh();
}
else if (index == NEIGH_LINK_TABLE) {
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
-- 
2.7.4

Re: [PATCH] mpls: Add missing RCU-bh read side critical section locking in output path

2016-06-28 Thread Lennert Buytenhek

On Thu, Jun 23, 2016 at 12:00:55PM -0400, David Miller wrote:

> > From: David Barroso <dbarr...@fastly.com>
> > 
> > When locally originated IP traffic hits a route that says to push
> > MPLS labels, we'll get a call chain dst_output() -> lwtunnel_output()
> > -> mpls_output() -> neigh_xmit() -> ___neigh_lookup_noref() where the
> > last function in this chain accesses a RCU-bh protected struct
> > neigh_table pointer without us ever having declared an RCU-bh read
> > side critical section.
> > 
> > As in case of locally originated IP traffic we'll be running in process
> > context, with softirqs enabled, we can be preempted by a softirq at any
> > time, and RCU-bh considers the completion of a softirq as signaling
> > the end of any pending read-side critical sections, so if we do get a
> > softirq here, we can end up with an unexpected RCU grace period and
> > all the nastiness that that comes with.
> > 
> > This patch makes neigh_xmit() take rcu_read_{,un}lock_bh() around the
> > code that expects to be treated as an RCU-bh read side critical section.
> > 
> > Signed-off-by: David Barroso <dbarr...@fastly.com>
> > Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>
> 
> Whilst the case that was used to discover this problem was MPLS, that
> is not the subsystem where the bug exists and is being fixed.
> 
> Therefore please fix your Subject line.
> 
> Thanks.

I'd say that the bug _is_ in the MPLS code, but that we're just fixing
it in a helper function that lives elsewhere (and which is only used by
MPLS), but yeah, the subject line and the patch body don't match up. :(
I've resubmitted the patch with the commit message below, I hope that
that'll do.

Thanks!

===

[PATCH] neigh: Explicitly declare RCU-bh read side critical section in 
neigh_xmit()

From: David Barroso <dbarr...@fastly.com>

neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso <dbarr...@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>
Acked-by: David Ahern <d...@cumulusnetworks.com>
Acked-by: Robert Shearman <rshea...@brocade.com>

Re: rcu locking issue in mpls output code?

2016-06-20 Thread Lennert Buytenhek

On Mon, Jun 20, 2016 at 10:38:39AM -0600, David Ahern wrote:

> > OK, patch coming up.  Thanks!
> 
> can you build a kernel with rcu debugging enabled as well and run
> it through your tests?

git HEAD with CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y CONFIG_PROVE_RCU=y gives me a lockdep splat on the
machine under my desk when I cause mpls_output() to be called.

The script I use for that is this one -- it creates a namespace that
accepts MPLS tagged packets for one of its local IPs and then sends
an MPLS tagged packet into that namespace.  If you run the script on
an unpatched kernel with lock debugging enabled, you should be able
to see the issue as well, the lockdep splat happens on the very first
packet.

=
#!/bin/sh

ip link add tons type veth peer name tempitf

ifconfig tons 172.16.20.20 netmask 255.255.255.0

ip netns add ns1
ip netns exec ns1 ifconfig lo 127.0.0.1 up
ip link set tempitf netns ns1
ip netns exec ns1 ip link set tempitf name eth0
ip netns exec ns1 ifconfig eth0 172.16.20.21 netmask 255.255.255.0

modprobe mpls_iptunnel

ip route add 10.10.10.10/32 encap mpls 100 via inet 172.16.20.21

ip netns exec ns1 sysctl -w net.ipv4.conf.all.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.lo.rp_filter=0
ip netns exec ns1 sysctl -w net.mpls.conf.eth0.input=1
ip netns exec ns1 sysctl -w net.mpls.platform_labels=1000
ip netns exec ns1 ip addr add 10.10.10.10/32 dev lo
ip netns exec ns1 ip -f mpls route add 100 dev lo

ping -c 1 10.10.10.10
=

The patch below (which I'll submit shortly with a proper commit
message) makes this lockdep splat go away.

Enabling lock/rcu debugging gives you a lockdep splat on the first
packet going out through mpls_output(), but then makes the packet
loss / memory corruption issue stop appearing, both on my local
space heater and on much more serious hardware, probably due to
timing differences.

But, with lock/rcu debugging disabled and the patch below included,
I don't see packet loss anymore in a production environment during a
test that would fairly reliably show it before.

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..769cece 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2467,13 +2467,17 @@ int neigh_xmit(int index, struct net_device *dev,
tbl = neigh_tables[index];
if (!tbl)
goto out;
+   rcu_read_lock_bh();
neigh = __neigh_lookup_noref(tbl, addr, dev);
if (!neigh)
neigh = __neigh_create(tbl, addr, dev, false);
err = PTR_ERR(neigh);
-   if (IS_ERR(neigh))
+   if (IS_ERR(neigh)) {
+   rcu_read_unlock_bh();
goto out_kfree_skb;
+   }
err = neigh->output(neigh, skb);
+   rcu_read_unlock_bh();
}
else if (index == NEIGH_LINK_TABLE) {
err = dev_hard_header(skb, dev, ntohs(skb->protocol),

[PATCH] mpls: Add missing RCU-bh read side critical section locking in output path

2016-06-20 Thread Lennert Buytenhek

From: David Barroso <dbarr...@fastly.com>

When locally originated IP traffic hits a route that says to push
MPLS labels, we'll get a call chain dst_output() -> lwtunnel_output()
-> mpls_output() -> neigh_xmit() -> ___neigh_lookup_noref() where the
last function in this chain accesses a RCU-bh protected struct
neigh_table pointer without us ever having declared an RCU-bh read
side critical section.

As in case of locally originated IP traffic we'll be running in process
context, with softirqs enabled, we can be preempted by a softirq at any
time, and RCU-bh considers the completion of a softirq as signaling
the end of any pending read-side critical sections, so if we do get a
softirq here, we can end up with an unexpected RCU grace period and
all the nastiness that that comes with.

This patch makes neigh_xmit() take rcu_read_{,un}lock_bh() around the
code that expects to be treated as an RCU-bh read side critical section.

Signed-off-by: David Barroso <dbarr...@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..769cece 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2467,13 +2467,17 @@ int neigh_xmit(int index, struct net_device *dev,
tbl = neigh_tables[index];
if (!tbl)
goto out;
+   rcu_read_lock_bh();
neigh = __neigh_lookup_noref(tbl, addr, dev);
if (!neigh)
neigh = __neigh_create(tbl, addr, dev, false);
err = PTR_ERR(neigh);
-   if (IS_ERR(neigh))
+   if (IS_ERR(neigh)) {
+   rcu_read_unlock_bh();
goto out_kfree_skb;
+   }
err = neigh->output(neigh, skb);
+   rcu_read_unlock_bh();
}
else if (index == NEIGH_LINK_TABLE) {
err = dev_hard_header(skb, dev, ntohs(skb->protocol),

Re: rcu locking issue in mpls output code?

2016-06-20 Thread Lennert Buytenhek

On Mon, Jun 20, 2016 at 09:13:36AM -0700, Roopa Prabhu wrote:

>  diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
>  index fb31aa8..802956b 100644
>  --- a/net/mpls/mpls_iptunnel.c
>  +++ b/net/mpls/mpls_iptunnel.c
>  @@ -105,12 +105,15 @@ static int mpls_output(struct net *net, struct
>  sock *sk, struct sk_buff *skb)
>  bos = false;
>  }
> 
>  +   rcu_read_lock_bh();
>  if (rt)
>  err = neigh_xmit(NEIGH_ARP_TABLE, out_dev,
>  >rt_gateway,
>   skb);
>  else if (rt6)
>  err = neigh_xmit(NEIGH_ND_TABLE, out_dev,
>  >rt6i_gateway,
>   skb);
>  +   rcu_read_unlock_bh();
>  +
>  if (err)
>  net_dbg_ratelimited("%s: packet transmission failed:
>  %d\n",
>  __func__, err);
> 
> >>>
> >>> I think those need to be added to neigh_xmit in the
> >>>
> >>> if (likely(index < NEIGH_NR_TABLES)) {
> >>>
> >>> }
> >>
> >>
> >> That'll force callers that don't need the extra protection (i.e.
> >> mpls_forward(), since that always runs from softirq and it's enough
> >> to protect the neigh state with rcu_read_lock() from softirq and we're
> >> already running under rcu_read_lock() when we get to neigh_xmit()) to
> >> eat the useless overhead of an extra rcu_read_{,un}lock_bh() pair, but
> >> sure, functionally that's correct, I think, and in my workload I don't
> >> care about MPLS forwarding performance anyway. ;-)
> >
> >
> > __neigh_lookup_noref expects bh level protection. Since the if block in
> > neigh_xmit requires the locking seems like this the appropriate place for
> > it.
> >
> >>
> >> Want me to send a patch moving it to neigh_xmit() ?
> >
> >
> > Roopa/Robert: agree?
> 
> yes, seems like an appropriate place for it.  provided it does not add
> unnecessary overhead for others.
> But then neigh_xmit seems to be only called from mpls_output and mpls_forward.

OK, patch coming up.  Thanks!

Re: rcu locking issue in mpls output code?

2016-06-20 Thread Lennert Buytenhek

On Sun, Jun 19, 2016 at 08:19:20PM -0600, David Ahern wrote:

> > diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
> > index fb31aa8..802956b 100644
> > --- a/net/mpls/mpls_iptunnel.c
> > +++ b/net/mpls/mpls_iptunnel.c
> > @@ -105,12 +105,15 @@ static int mpls_output(struct net *net, struct sock 
> > *sk, struct sk_buff *skb)
> > bos = false;
> > }
> > 
> > +   rcu_read_lock_bh();
> > if (rt)
> > err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, >rt_gateway,
> >  skb);
> > else if (rt6)
> > err = neigh_xmit(NEIGH_ND_TABLE, out_dev, >rt6i_gateway,
> >  skb);
> > +   rcu_read_unlock_bh();
> > +
> > if (err)
> > net_dbg_ratelimited("%s: packet transmission failed: %d\n",
> > __func__, err);
> > 
> 
> I think those need to be added to neigh_xmit in the
> 
>   if (likely(index < NEIGH_NR_TABLES)) {
> 
>   }

That'll force callers that don't need the extra protection (i.e.
mpls_forward(), since that always runs from softirq and it's enough
to protect the neigh state with rcu_read_lock() from softirq and we're
already running under rcu_read_lock() when we get to neigh_xmit()) to
eat the useless overhead of an extra rcu_read_{,un}lock_bh() pair, but
sure, functionally that's correct, I think, and in my workload I don't
care about MPLS forwarding performance anyway. ;-)

Want me to send a patch moving it to neigh_xmit() ?

Thank you for having a look!

Cheers,
Lennert

rcu locking issue in mpls output code?

2016-06-19 Thread Lennert Buytenhek

Hi!

While trying to chase down a memory corruption issue that only occurs
when originating large amounts of MPLS tagged IP traffic, I came across
something in the MPLS output code for which I'm not entirely sure that
it's correct.

Specifically, there is the code path dst_output() -> lwtunnel_output()
-> mpls_output() -> neigh_xmit() -> ___neigh_lookup_noref(), where the
latter accesses a RCU-bh protected struct neigh_table pointer, but there
is no RCU-bh protection being arranged anywhere in this call chain.

Since this is locally generated IP traffic, we're running in process
context, and while lwtunnel_output() holds rcu_read_lock() across its
call to lwtunnel_encap_ops::output() (which is mpls_output() here),
nothing in the chain disables BHs, and in RCU-bh, the completion of a
softirq signals the end of any pending read-side critical sections,
and BHs can preempt this call chain at any time because it runs with
hardirqs and softirqs both enabled, so that would mean that neighbour
table entries can be zapped at any time even while we hold
rcu_read_lock().  I think.

The mpls_forward() path doesn't seem susceptible to the same issue,
as it runs from softirq, where rcu_read_lock() suffices, so I figured
that mpls_output() would be a good place to deal with this and that
something like the patch below would do the trick.  I can't say yet if
this makes my memory corruption issues go away, as they don't reproduce
that easily, but I'll keep testing.  Any thoughts so far?


Thanks,
Lennert



diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index fb31aa8..802956b 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -105,12 +105,15 @@ static int mpls_output(struct net *net, struct sock *sk, 
struct sk_buff *skb)
bos = false;
}
 
+   rcu_read_lock_bh();
if (rt)
err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, >rt_gateway,
 skb);
else if (rt6)
err = neigh_xmit(NEIGH_ND_TABLE, out_dev, >rt6i_gateway,
 skb);
+   rcu_read_unlock_bh();
+
if (err)
net_dbg_ratelimited("%s: packet transmission failed: %d\n",
__func__, err);

Re: [PATCH 0/2] macvlan: Avoid unnecessary multicast cloning

2016-05-30 Thread Lennert Buytenhek

On Mon, May 30, 2016 at 04:17:52PM +0800, Herbert Xu wrote:

> > Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue")
> > moved processing of all macvlan multicasts into a work queue.  This
> > causes a noticable performance regression when there is heavy multicast
> > traffic on the underlying interface for multicast groups that the
> > macvlan subinterfaces are not members of, in which case we end up
> > cloning all those packets and then freeing them again from a work queue
> > without really doing any useful work with them in between.
> 
> OK so your motivation is to get rid of the unnecessary memory
> allocation, right?

That and stack switches to kworker threads and serialisation on
the bc_queue queue lock.

Re: [PATCH,RFC] macvlan: Handle broadcasts inline if we have only a few macvlans.

2016-05-27 Thread Lennert Buytenhek

On Fri, May 27, 2016 at 10:56:44AM -0700, Cong Wang wrote:

> > Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue")
> > moved processing of all macvlan multicasts into a work queue.  This
> > causes a noticable performance regression when there is heavy multicast
> > traffic on the underlying interface for multicast groups that the
> > macvlan subinterfaces are not members of, in which case we end up
> > cloning all those packets and then freeing them again from a work queue
> > without really doing any useful work with them in between.
> 
> But we only queue up to 1000 packets in our backlog.
> 
> How about adding a quick check before cloning it?
> 
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index cb01023..1c73d0f 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -315,6 +315,9 @@ static void macvlan_broadcast_enqueue(struct
> macvlan_port *port,
> struct sk_buff *nskb;
> int err = -ENOMEM;
> 
> +   if (skb_queue_len(>bc_queue) >= MACVLAN_BC_QUEUE_LEN)
> +   return;
> +
> nskb = skb_clone(skb, GFP_ATOMIC);
> if (!nskb)
> goto err;

We're not hitting the bc_queue skb limit in our environment, as the
machine can keep up with the traffic -- it's just that taking an
extra clone of the skb and queueing and running the work queue item
to free it again is eating up a lot of cycles.

But doing the queue length check before the clone might not be a bad
idea?  (You'd probably want to atomic_long_inc(>dev->rx_dropped)
before returning, though?)

[PATCH,RFC] macvlan: Handle broadcasts inline if we have only a few macvlans.

2016-05-26 Thread Lennert Buytenhek

Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue")
moved processing of all macvlan multicasts into a work queue.  This
causes a noticable performance regression when there is heavy multicast
traffic on the underlying interface for multicast groups that the
macvlan subinterfaces are not members of, in which case we end up
cloning all those packets and then freeing them again from a work queue
without really doing any useful work with them in between.

The commit message for commit 412ca1550cbecb2c says:

|   Fundamentally, we need to ensure that the amount of work handled
|   in each netif_rx backlog run is constrained.  As broadcasts are
|   anything but constrained, it either needs to be limited per run
|   or moved to process context.

This patch moves multicast handling back into macvlan_handle_frame()
context if there are 100 or fewer macvlan subinterfaces, while keeping
the work queue for if there are more macvlan subinterfaces than that.

I played around with keeping track of the number of macvlan
subinterfaces that have each multicast filter bit set, but that ended
up being more complicated than I liked.  Conditionalising the work
queue deferring on the total number of macvlan subinterfaces seems
like a fair compromise.

On a quickly whipped together test program that creates an ethertap
interface with a single macvlan subinterface and then blasts 16 Mi
multicast packets through the ethertap interface for a multicast
group that the macvlan subinterface is not a member of, run time goes
from (vanilla kernel):

# time ./stress
real0m41.864s
user0m0.622s
sys 0m20.754s

to (with this patch):

# time ./stress
real0m16.539s
user0m0.519s
sys 0m15.949s

Reported-by: Grant Zhang <gzh...@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuyten...@fastly.com>
---
 drivers/net/macvlan.c | 71 ---
 1 file changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index cb01023..02934a5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -231,7 +231,8 @@ static unsigned int mc_hash(const struct macvlan_dev *vlan,
 static void macvlan_broadcast(struct sk_buff *skb,
  const struct macvlan_port *port,
  struct net_device *src,
- enum macvlan_mode mode)
+ enum macvlan_mode mode,
+ bool do_rx_softirq)
 {
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
@@ -254,17 +255,49 @@ static void macvlan_broadcast(struct sk_buff *skb,
 
err = NET_RX_DROP;
nskb = skb_clone(skb, GFP_ATOMIC);
-   if (likely(nskb))
+   if (likely(nskb)) {
err = macvlan_broadcast_one(
nskb, vlan, eth,
-   mode == MACVLAN_MODE_BRIDGE) ?:
- netif_rx_ni(nskb);
+   mode == MACVLAN_MODE_BRIDGE);
+   if (err == 0) {
+   if (do_rx_softirq)
+   err = netif_rx_ni(nskb);
+   else
+   err = netif_rx(nskb);
+   }
+   }
macvlan_count_rx(vlan, skb->len + ETH_HLEN,
 err == NET_RX_SUCCESS, true);
}
}
 }
 
+static void macvlan_process_one(struct sk_buff *skb,
+   struct macvlan_port *port,
+   const struct macvlan_dev *src,
+   bool do_rx_softirq)
+{
+   if (!src)
+   /* frame comes from an external address */
+   macvlan_broadcast(skb, port, NULL,
+ MACVLAN_MODE_PRIVATE |
+ MACVLAN_MODE_VEPA|
+ MACVLAN_MODE_PASSTHRU|
+ MACVLAN_MODE_BRIDGE, do_rx_softirq);
+   else if (src->mode == MACVLAN_MODE_VEPA)
+   /* flood to everyone except source */
+   macvlan_broadcast(skb, port, src->dev,
+ MACVLAN_MODE_VEPA |
+ MACVLAN_MODE_BRIDGE, do_rx_softirq);
+   else
+   /*
+* flood only to VEPA ports, bridge ports
+* already saw the frame on the way out.
+*/
+   macvlan_broadcast(skb, port, src->dev,
+ MACVLAN_MODE_VEPA, do_rx_softirq);
+}
+
 sta

Re: [PATCH,RFC] ep93xx_eth: conversion to phylib framework

2008-02-24 Thread Lennert Buytenhek

On Sun, Feb 24, 2008 at 09:21:53AM +0100, Herbert Valerio Riedel wrote:

   Currently, the ep93xx_eth driver doesn't care about the PHY state,
   but it should, in order to tell the MAC when full duplex operation is
   required; failure to do so causes degraded performance on full duplex
   links. This patch implements proper PHY handling via the phylib  
   framework:
  
- clean up ep93xx_mdio_{read,write} to conform to ep93xx manual
- convert ep93xx_eth driver to phylib framework
- set full duplex bit in configuration of MAC when FDX link detected
- convert to use print_mac()
  
  Looks good to me.  My only comment is that we might want to have  
  support for checking preamble suppression support in the PHY Lib,  
  itself.
  
  Acked-by: Andy Fleming [EMAIL PROTECTED]
 
 ...as nothing happend for some months now just wondering, what I should
 do next, to get this patch merged upstream :-)

ACK!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: rtl8150: use default MTU of 1500

2008-01-31 Thread Lennert Buytenhek

On Thu, Jan 31, 2008 at 05:42:34PM +0200, Petko Manolov wrote:

  The RTL8150 driver uses an MTU of 1540 by default, which causes a
  bunch of problems -- it prevents booting from NFS root, for one.
 
 Agreed, although it is a bit strange how this particular bug has
 sneaked up for so long...

I posted this patch sometime in 2006, and you asked me a question
about it then (why we don't just set RTL8150_MTU to 1500 -- the
answer would be that RTL8150_MTU is used in a couple more places
in the driver, including for allocing skbuffs), but I failed to
follow up to that question at the time, which is why I assume it got
dropped.

I have been carrying the patch in my own tree since then, and only
noticed recently that the patch never made it upstream.


cheers,
Lennert


 Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
 Cc: Petko Manolov [EMAIL PROTECTED]
 
 --- linux-2.6.24-git7.orig/drivers/net/usb/rtl8150.c 2008-01-24 
 23:58:37.0 +0100
 +++ linux-2.6.24-git7/drivers/net/usb/rtl8150.c  2008-01-30 
 20:29:00.0 +0100
 @@ -925,9 +925,8 @@
  netdev-hard_start_xmit = rtl8150_start_xmit;
  netdev-set_multicast_list = rtl8150_set_multicast;
  netdev-set_mac_address = rtl8150_set_mac_address;
  netdev-get_stats = rtl8150_netdev_stats;
 -netdev-mtu = RTL8150_MTU;
  SET_ETHTOOL_OPS(netdev, ops);
  dev-intr_interval = 100;   /* 100ms */
 
  if (!alloc_all_urbs(dev)) {
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

rtl8150: use default MTU of 1500

2008-01-30 Thread Lennert Buytenhek

The RTL8150 driver uses an MTU of 1540 by default, which causes a
bunch of problems -- it prevents booting from NFS root, for one.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Cc: Petko Manolov [EMAIL PROTECTED]

--- linux-2.6.24-git7.orig/drivers/net/usb/rtl8150.c2008-01-24 
23:58:37.0 +0100
+++ linux-2.6.24-git7/drivers/net/usb/rtl8150.c 2008-01-30 20:29:00.0 
+0100
@@ -925,9 +925,8 @@
netdev-hard_start_xmit = rtl8150_start_xmit;
netdev-set_multicast_list = rtl8150_set_multicast;
netdev-set_mac_address = rtl8150_set_mac_address;
netdev-get_stats = rtl8150_netdev_stats;
-   netdev-mtu = RTL8150_MTU;
SET_ETHTOOL_OPS(netdev, ops);
dev-intr_interval = 100;   /* 100ms */
 
if (!alloc_all_urbs(dev)) {
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/4] [NETDEV] ixp2000: rtnl_lock out of loop will be faster

2007-12-12 Thread Lennert Buytenhek

On Wed, Dec 12, 2007 at 04:48:28PM +0800, Wang Chen wrote:

 [PATCH 3/4] [NETDEV] ixp2000: rtnl_lock out of loop will be faster
 
 Before this patch, it gets and releases the lock at each
 iteration of the loop. Changing unregister_netdev to
 unregister_netdevice and locking outside of the loop will
 be faster for this approach.

Since the number of net devices is typically either 2 or 3 (depending
on the specific model card you're using), and this is not in any kind
of hot path at all, I don't see a whole lot of benefit of acquiring
the RTNL separately.  Besides, I'm slightly worried about putting
knowledge of the RTNL into the driver directly.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] pegasos_eth.c: Fix compile error over MV643XX_ defines

2007-10-29 Thread Lennert Buytenhek

On Mon, Oct 29, 2007 at 05:27:29PM -0400, Luis R. Rodriguez wrote:

 This commit made an incorrect assumption:
 --
 Author: Lennert Buytenhek [EMAIL PROTECTED]
  Date:   Fri Oct 19 04:10:10 2007 +0200
 
 mv643xx_eth: Move ethernet register definitions into private header
 
 Move the mv643xx's ethernet-related register definitions from
 include/linux/mv643xx.h into drivers/net/mv643xx_eth.h, since
 they aren't of any use outside the ethernet driver.
 
 Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
 Acked-by: Tzachi Perelstein [EMAIL PROTECTED]
 Signed-off-by: Dale Farnsworth [EMAIL PROTECTED]
 --
 
 arch/powerpc/platforms/chrp/pegasos_eth.c made use of a 3 defines there.
 
 [EMAIL PROTECTED]:~/devel/wireless-2.6$ git-describe 
 
 v2.6.24-rc1-138-g0119130
 
 This patch fixes this by internalizing 3 defines onto pegasos which are
 simply no longer available elsewhere. Without this your compile will fail
 whenever you enable 'Common Hardware Reference Platform (CHRP) based 
 machines',

 [...]
 
 diff --git a/arch/powerpc/platforms/chrp/pegasos_eth.c 
 b/arch/powerpc/platforms/chrp/pegasos_eth.c
 index 5bcc58d..1fc9e8c 100644
 --- a/arch/powerpc/platforms/chrp/pegasos_eth.c
 +++ b/arch/powerpc/platforms/chrp/pegasos_eth.c
 @@ -24,6 +24,9 @@
  #define PEGASOS2_SRAM_BASE_ETH0  (PEGASOS2_SRAM_BASE)
  #define PEGASOS2_SRAM_BASE_ETH1  
 (PEGASOS2_SRAM_BASE_ETH0 + (PEGASOS2_SRAM_SIZE / 2) )
  
 +#define PEGASOS2_ETH_BAR_4   0x2220
 +#define PEGASOS2_ETH_SIZE_REG_4  0x2224
 +#define PEGASOS2_ETH_BASE_ADDR_ENABLE_REG0x2290
  
  #define PEGASOS2_SRAM_RXRING_SIZE(PEGASOS2_SRAM_SIZE/4)
  #define PEGASOS2_SRAM_TXRING_SIZE(PEGASOS2_SRAM_SIZE/4)
 @@ -147,13 +150,13 @@ static int Enable_SRAM(void)
  
   ALong = 0x02;
   ALong |= PEGASOS2_SRAM_BASE  0x;
 - MV_WRITE(MV643XX_ETH_BAR_4, ALong);
 + MV_WRITE(PEGASOS2_ETH_BAR_4, ALong);
  
 - MV_WRITE(MV643XX_ETH_SIZE_REG_4, (PEGASOS2_SRAM_SIZE-1)  0x);
 + MV_WRITE(PEGASOS2_ETH_SIZE_REG_4, (PEGASOS2_SRAM_SIZE-1)  0x);
  
 - MV_READ(MV643XX_ETH_BASE_ADDR_ENABLE_REG, ALong);
 + MV_READ(PEGASOS2_ETH_BASE_ADDR_ENABLE_REG, ALong);
   ALong = ~(1  4);
 - MV_WRITE(MV643XX_ETH_BASE_ADDR_ENABLE_REG, ALong);
 + MV_WRITE(PEGASOS2_ETH_BASE_ADDR_ENABLE_REG, ALong);
  
  #ifdef BE_VERBOSE
   printk(Pegasos II/Marvell MV64361: register unmapped\n);

Al Viro sent a patch for this breakage a couple of days ago:

http://marc.info/?l=linux-kernelm=119351541706811w=2

(FWIW, I think that code outside of mv643xx_eth.c should not be poking
into the mv643xx's registers directly.  Ideally, this info should just
be passed by pegasos_eth into mv643xx_eth via platform data, and then
mv643xx_eth can write the relevant hardware registers.)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Remove pointless casts from void pointers,

2007-10-26 Thread Lennert Buytenhek

On Fri, Oct 26, 2007 at 05:40:22AM -0400, Jeff Garzik wrote:

  arch/arm/mach-pxa/ssp.c|2 +-
  arch/arm/mach-s3c2410/usb-simtec.c |2 +-
  arch/arm/plat-omap/mailbox.c   |2 +-

FWIW

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH,RFC] Marvell Orion SoC ethernet driver

2007-10-25 Thread Lennert Buytenhek

On Thu, Oct 25, 2007 at 05:12:04AM -0400, Jeff Garzik wrote:

 +struct rx_desc {
 +u32 cmd_sts;
 +u16 size;
 +u16 count;
 +u32 buf;
 +u32 next;
 +};
 +
 +struct tx_desc {
 +u32 cmd_sts;
 +u16 l4i_chk;
 +u16 count;
 +u32 buf;
 +u32 next;
 +};
 
 should use sparse type (__le32, etc.) and make sure this driver passes 
 sparse checks
 
 ditto for checkpatch (except for the excessively anal stuff)

Sorry if it wasn't clear from the thread -- the mainline mv643xx_eth
driver turns out to support the same silicon block (but as part of a
different chip), so we've dropped orion_eth and submitted patches to
make mv643xx_eth work on both the Discovery (what it was originally
written for) and the Orion, and these patches are in -rc1 already.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/8] [MV643XX_ETH] Move ethernet register definitions into private header

2007-10-19 Thread Lennert Buytenhek

On Fri, Oct 19, 2007 at 05:56:54AM -0700, Dale Farnsworth wrote:

   Isn't it a little too confusing to have two headers with the same name,
   one in drivers/net and one in include/linux?
  
  Perhaps we can fold the drivers/net one into drivers/net/mv643xx_eth.c?
  Since nothing else includes drivers/net/mv643xx_eth.h anyway, there's
  not much point in having it separate.
 
 Sounds good to me.  Please add a patch to do so.

Okay.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/8] [MV643XX_ETH] Move ethernet register definitions into private header

2007-10-19 Thread Lennert Buytenhek

On Fri, Oct 19, 2007 at 09:30:48AM +0100, Christoph Hellwig wrote:

  Move the mv643xx's ethernet-related register definitions from
  include/linux/mv643xx.h into drivers/net/mv643xx_eth.h, since
  they aren't of any use outside the ethernet driver.
  
  Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
  Acked-by: Tzachi Perelstein [EMAIL PROTECTED]
  
  Index: linux-2.6/drivers/net/mv643xx_eth.h
  ===
  --- linux-2.6.orig/drivers/net/mv643xx_eth.h
  +++ linux-2.6/drivers/net/mv643xx_eth.h
  @@ -7,7 +7,7 @@
   #include linux/workqueue.h
   #include linux/mii.h
   
  -#include linux/mv643xx.h
  +#include linux/mv643xx_eth.h
 
 Isn't it a little too confusing to have two headers with the same name,
 one in drivers/net and one in include/linux?

Perhaps we can fold the drivers/net one into drivers/net/mv643xx_eth.c?
Since nothing else includes drivers/net/mv643xx_eth.h anyway, there's
not much point in having it separate.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 9/8] [MV643XX_ETH] Merge drivers/net/mv643xx_eth.h into mv643xx_eth.c

2007-10-19 Thread Lennert Buytenhek

Since drivers/net/mv643xx_eth.c is the only user of
drivers/net/mv643xx_eth.h, there's not much use in having the header
file as a separate file, so merge the header into the driver.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.c
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6/drivers/net/mv643xx_eth.c
@@ -43,14 +43,570 @@
 #include linux/ethtool.h
 #include linux/platform_device.h
 
+#include linux/module.h
+#include linux/kernel.h
+#include linux/spinlock.h
+#include linux/workqueue.h
+#include linux/mii.h
+
+#include linux/mv643xx_eth.h
+
 #include asm/io.h
 #include asm/types.h
 #include asm/pgtable.h
 #include asm/system.h
 #include asm/delay.h
-#include mv643xx_eth.h
+#include asm/dma-mapping.h
+
+/* Checksum offload for Tx works for most packets, but
+ * fails if previous packet sent did not use hw csum
+ */
+#define MV643XX_CHECKSUM_OFFLOAD_TX
+#define MV643XX_NAPI
+#define MV643XX_TX_FAST_REFILL
+#undef MV643XX_COAL
+
+/*
+ * Number of RX / TX descriptors on RX / TX rings.
+ * Note that allocating RX descriptors is done by allocating the RX
+ * ring AND a preallocated RX buffers (skb's) for each descriptor.
+ * The TX descriptors only allocates the TX descriptors ring,
+ * with no pre allocated TX buffers (skb's are allocated by higher layers.
+ */
+
+/* Default TX ring size is 1000 descriptors */
+#define MV643XX_DEFAULT_TX_QUEUE_SIZE 1000
+
+/* Default RX ring size is 400 descriptors */
+#define MV643XX_DEFAULT_RX_QUEUE_SIZE 400
+
+#define MV643XX_TX_COAL 100
+#ifdef MV643XX_COAL
+#define MV643XX_RX_COAL 100
+#endif
+
+#ifdef MV643XX_CHECKSUM_OFFLOAD_TX
+#define MAX_DESCS_PER_SKB  (MAX_SKB_FRAGS + 1)
+#else
+#define MAX_DESCS_PER_SKB  1
+#endif
+
+#define ETH_VLAN_HLEN  4
+#define ETH_FCS_LEN4
+#define ETH_HW_IP_ALIGN2   /* hw aligns IP header 
*/
+#define ETH_WRAPPER_LEN(ETH_HW_IP_ALIGN + ETH_HLEN + \
+   ETH_VLAN_HLEN + ETH_FCS_LEN)
+#define ETH_RX_SKB_SIZE(dev-mtu + ETH_WRAPPER_LEN + \
+   dma_get_cache_alignment())
+
+/*
+ * Registers shared between all ports.
+ */
+#define PHY_ADDR_REG   0x
+#define SMI_REG0x0004
+
+/*
+ * Per-port registers.
+ */
+#define PORT_CONFIG_REG(p) (0x0400 + ((p)  10))
+#define PORT_CONFIG_EXTEND_REG(p)  (0x0404 + ((p)  10))
+#define MAC_ADDR_LOW(p)(0x0414 + ((p) 
 10))
+#define MAC_ADDR_HIGH(p)   (0x0418 + ((p)  10))
+#define SDMA_CONFIG_REG(p) (0x041c + ((p)  10))
+#define PORT_SERIAL_CONTROL_REG(p) (0x043c + ((p)  10))
+#define PORT_STATUS_REG(p) (0x0444 + ((p)  10))
+#define TRANSMIT_QUEUE_COMMAND_REG(p)  (0x0448 + ((p)  10))
+#define MAXIMUM_TRANSMIT_UNIT(p)   (0x0458 + ((p)  10))
+#define INTERRUPT_CAUSE_REG(p) (0x0460 + ((p)  10))
+#define INTERRUPT_CAUSE_EXTEND_REG(p)  (0x0464 + ((p)  10))
+#define INTERRUPT_MASK_REG(p)  (0x0468 + ((p)  10))
+#define INTERRUPT_EXTEND_MASK_REG(p)   (0x046c + ((p)  10))
+#define TX_FIFO_URGENT_THRESHOLD_REG(p)(0x0474 + ((p) 
 10))
+#define RX_CURRENT_QUEUE_DESC_PTR_0(p) (0x060c + ((p)  10))
+#define RECEIVE_QUEUE_COMMAND_REG(p)   (0x0680 + ((p)  10))
+#define TX_CURRENT_QUEUE_DESC_PTR_0(p) (0x06c0 + ((p)  10))
+#define MIB_COUNTERS_BASE(p)   (0x1000 + ((p)  7))
+#define DA_FILTER_SPECIAL_MULTICAST_TABLE_BASE(p)  (0x1400 + ((p)  10))
+#define DA_FILTER_OTHER_MULTICAST_TABLE_BASE(p)(0x1500 + ((p) 
 10))
+#define DA_FILTER_UNICAST_TABLE_BASE(p)(0x1600 + ((p) 
 10))
+
+/* These macros describe Ethernet Port configuration reg (Px_cR) bits */
+#define UNICAST_NORMAL_MODE(0  0)
+#define UNICAST_PROMISCUOUS_MODE   (1  0)
+#define DEFAULT_RX_QUEUE(queue)((queue)  1)
+#define DEFAULT_RX_ARP_QUEUE(queue)((queue)  4)
+#define RECEIVE_BC_IF_NOT_IP_OR_ARP(0  7)
+#define REJECT_BC_IF_NOT_IP_OR_ARP (1  7)
+#define RECEIVE_BC_IF_IP   (0  8)
+#define REJECT_BC_IF_IP(1  8)
+#define RECEIVE_BC_IF_ARP  (0  9)
+#define REJECT_BC_IF_ARP   (1  9)
+#define TX_AM_NO_UPDATE_ERROR_SUMMARY  (1  12)
+#define CAPTURE_TCP_FRAMES_DIS (0  14)
+#define CAPTURE_TCP_FRAMES_EN  (1  14)
+#define CAPTURE_UDP_FRAMES_DIS (0  15)
+#define CAPTURE_UDP_FRAMES_EN  (1  15)
+#define DEFAULT_RX_TCP_QUEUE(queue)((queue)  16)
+#define

Re: [PATCH 5/8] [MV643XX_ETH] Remove SHARED_REGS register address bias

2007-10-19 Thread Lennert Buytenhek

On Thu, Oct 18, 2007 at 08:46:58PM -0700, Roland Dreier wrote:

   +static void __iomem *mv643xx_eth_base;
 
   +  return readl(((void __iomem *)mv643xx_eth_base) + offset);
 
 Given the declaration of mv643xx_eth_base as void __iomem * already, I
 don't understand why you need the cast to the same type here (and
 elsewhere in the driver).

Makes sense, fixed.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 8/8] [MV643XX_ETH] Remove unused register defines

2007-10-18 Thread Lennert Buytenhek

Most of the register defines in drivers/net/mv643xx_eth.h aren't
used at all.  Nuke them -- we can always re-add them if/when we
need them, and meanwhile, they unnecessarily clutter up the
header file.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -57,115 +57,28 @@
  */
 #define PHY_ADDR_REG   0x
 #define SMI_REG0x0004
-#define UNIT_DEFAULT_ADDR_REG  0x0008
-#define UNIT_DEFAULTID_REG 0x000c
-#define UNIT_INTERRUPT_CAUSE_REG   0x0080
-#define UNIT_INTERRUPT_MASK_REG0x0084
-#define UNIT_INTERNAL_USE_REG  0x04fc
-#define UNIT_ERROR_ADDR_REG0x0094
-#define BAR_0  0x0200
-#define BAR_1  0x0208
-#define BAR_2  0x0210
-#define BAR_3  0x0218
-#define BAR_4  0x0220
-#define BAR_5  0x0228
-#define SIZE_REG_0 0x0204
-#define SIZE_REG_1 0x020c
-#define SIZE_REG_2 0x0214
-#define SIZE_REG_3 0x021c
-#define SIZE_REG_4 0x0224
-#define SIZE_REG_5 0x022c
-#define HEADERS_RETARGET_BASE_REG  0x0230
-#define HEADERS_RETARGET_CONTROL_REG   0x0234
-#define HIGH_ADDR_REMAP_REG_0  0x0280
-#define HIGH_ADDR_REMAP_REG_1  0x0284
-#define HIGH_ADDR_REMAP_REG_2  0x0288
-#define HIGH_ADDR_REMAP_REG_3  0x028c
-#define BASE_ADDR_ENABLE_REG   0x0290
 
 
 /*
  * Per-port registers.
  */
-#define ACCESS_PROTECTION_REG(p)   (0x0294 + ((p)  2))
 #define PORT_CONFIG_REG(p) (0x0400 + ((p)  10))
 #define PORT_CONFIG_EXTEND_REG(p)  (0x0404 + ((p)  10))
-#define MII_SERIAL_PARAMETRS_REG(p)(0x0408 + ((p)  10))
-#define GMII_SERIAL_PARAMETRS_REG(p)   (0x040c + ((p)  10))
-#define VLAN_ETHERTYPE_REG(p)  (0x0410 + ((p)  10))
 #define MAC_ADDR_LOW(p)(0x0414 + ((p) 
 10))
 #define MAC_ADDR_HIGH(p)   (0x0418 + ((p)  10))
 #define SDMA_CONFIG_REG(p) (0x041c + ((p)  10))
-#define DSCP_0(p)  (0x0420 + ((p)  10))
-#define DSCP_1(p)  (0x0424 + ((p)  10))
-#define DSCP_2(p)  (0x0428 + ((p)  10))
-#define DSCP_3(p)  (0x042c + ((p)  10))
-#define DSCP_4(p)  (0x0430 + ((p)  10))
-#define DSCP_5(p)  (0x0434 + ((p)  10))
-#define DSCP_6(p)  (0x0438 + ((p)  10))
 #define PORT_SERIAL_CONTROL_REG(p) (0x043c + ((p)  10))
-#define VLAN_PRIORITY_TAG_TO_PRIORITY(p)   (0x0440 + ((p)  10))
 #define PORT_STATUS_REG(p) (0x0444 + ((p)  10))
 #define TRANSMIT_QUEUE_COMMAND_REG(p)  (0x0448 + ((p)  10))
-#define TX_QUEUE_FIXED_PRIORITY(p) (0x044c + ((p)  10))
-#define PORT_TX_TOKEN_BUCKET_RATE_CONFIG(p)(0x0450 + ((p)  10))
 #define MAXIMUM_TRANSMIT_UNIT(p)   (0x0458 + ((p)  10))
-#define PORT_MAXIMUM_TOKEN_BUCKET_SIZE(p)  (0x045c + ((p)  10))
 #define INTERRUPT_CAUSE_REG(p) (0x0460 + ((p)  10))
 #define INTERRUPT_CAUSE_EXTEND_REG(p)  (0x0464 + ((p)  10))
 #define INTERRUPT_MASK_REG(p)  (0x0468 + ((p)  10))
 #define INTERRUPT_EXTEND_MASK_REG(p)   (0x046c + ((p)  10))
-#define RX_FIFO_URGENT_THRESHOLD_REG(p)(0x0470 + ((p) 
 10))
 #define TX_FIFO_URGENT_THRESHOLD_REG(p)(0x0474 + ((p) 
 10))
-#define RX_MINIMAL_FRAME_SIZE_REG(p)   (0x047c + ((p)  10))
-#define RX_DISCARDED_FRAMES_COUNTER(p) (0x0484 + ((p)  10))
-#define PORT_DEBUG_0_REG(p)(0x048c + ((p)  10))
-#define PORT_DEBUG_1_REG(p)(0x0490 + ((p)  10))
-#define PORT_INTERNAL_ADDR_ERROR_REG(p)(0x0494 + ((p) 
 10))
-#define INTERNAL_USE_REG(p)(0x04fc + ((p)  10))
 #define RX_CURRENT_QUEUE_DESC_PTR_0(p) (0x060c + ((p)  10))
-#define RX_CURRENT_QUEUE_DESC_PTR_1(p

[PATCH 7/8] [MV643XX_ETH] Clean up mv643xx_eth.h

2007-10-18 Thread Lennert Buytenhek

Apply the following cleanups to drivers/net/mv643xx_eth.h:
* Change #definetab to #definespace.
* Fix comment block style.
* Wrap lines to fit in 80 columns.
* Change foo1 to foo  1.
* Align addresses in the same column.
* Parenthesize macro arguments.
* Replace (124) | (123) | (122) type constructs with (7  22).

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -14,9 +14,9 @@
 /* Checksum offload for Tx works for most packets, but
  * fails if previous packet sent did not use hw csum
  */
-#defineMV643XX_CHECKSUM_OFFLOAD_TX
-#defineMV643XX_NAPI
-#defineMV643XX_TX_FAST_REFILL
+#define MV643XX_CHECKSUM_OFFLOAD_TX
+#define MV643XX_NAPI
+#define MV643XX_TX_FAST_REFILL
 #undef MV643XX_COAL
 
 /*
@@ -49,230 +49,199 @@
 #define ETH_HW_IP_ALIGN2   /* hw aligns IP header 
*/
 #define ETH_WRAPPER_LEN(ETH_HW_IP_ALIGN + ETH_HLEN + \
ETH_VLAN_HLEN + ETH_FCS_LEN)
-#define ETH_RX_SKB_SIZE(dev-mtu + ETH_WRAPPER_LEN + 
dma_get_cache_alignment())
+#define ETH_RX_SKB_SIZE(dev-mtu + ETH_WRAPPER_LEN + \
+   dma_get_cache_alignment())
 
-//
-/*Ethernet Unit Registers  */
-//
-
-#define PHY_ADDR_REG0x
-#define SMI_REG 0x0004
-#define UNIT_DEFAULT_ADDR_REG   0x0008
-#define UNIT_DEFAULTID_REG  0x000c
-#define UNIT_INTERRUPT_CAUSE_REG0x0080
-#define UNIT_INTERRUPT_MASK_REG 0x0084
-#define UNIT_INTERNAL_USE_REG   0x04fc
-#define UNIT_ERROR_ADDR_REG 0x0094
-#define BAR_0   0x0200
-#define BAR_1   0x0208
-#define BAR_2   0x0210
-#define BAR_3   0x0218
-#define BAR_4   0x0220
-#define BAR_5   0x0228
-#define SIZE_REG_0  0x0204
-#define SIZE_REG_1  0x020c
-#define SIZE_REG_2  0x0214
-#define SIZE_REG_3  0x021c
-#define SIZE_REG_4  0x0224
-#define SIZE_REG_5  0x022c
-#define HEADERS_RETARGET_BASE_REG   0x0230
-#define HEADERS_RETARGET_CONTROL_REG0x0234
-#define HIGH_ADDR_REMAP_REG_0   0x0280
-#define HIGH_ADDR_REMAP_REG_1   0x0284
-#define HIGH_ADDR_REMAP_REG_2   0x0288
-#define HIGH_ADDR_REMAP_REG_3   0x028c
-#define BASE_ADDR_ENABLE_REG0x0290
-#define ACCESS_PROTECTION_REG(port)(0x0294 + (port2))
-#define MIB_COUNTERS_BASE(port)(0x1000 + (port7))
-#define PORT_CONFIG_REG(port)  (0x0400 + (port10))
-#define PORT_CONFIG_EXTEND_REG(port)   (0x0404 + (port10))
-#define MII_SERIAL_PARAMETRS_REG(port) (0x0408 + (port10))
-#define GMII_SERIAL_PARAMETRS_REG(port)(0x040c + (port10))
-#define VLAN_ETHERTYPE_REG(port)   (0x0410 + (port10))
-#define MAC_ADDR_LOW(port) (0x0414 + (port10))
-#define MAC_ADDR_HIGH(port)(0x0418 + (port10))
-#define SDMA_CONFIG_REG(port)  (0x041c + (port10))
-#define DSCP_0(port)   (0x0420 + (port10))
-#define DSCP_1(port)   (0x0424 + (port10))
-#define DSCP_2(port)   (0x0428 + (port10))
-#define DSCP_3(port)   (0x042c + (port10))
-#define DSCP_4(port)   (0x0430 + (port10))
-#define DSCP_5(port)   (0x0434 + (port10))
-#define DSCP_6(port)   (0x0438 + (port10))
-#define PORT_SERIAL_CONTROL_REG(port)  (0x043c + (port10))
-#define VLAN_PRIORITY_TAG_TO_PRIORITY(port)(0x0440 + (port10))
-#define PORT_STATUS_REG(port)  (0x0444 + (port10))
-#define TRANSMIT_QUEUE_COMMAND_REG(port)   (0x0448 + (port10))
-#define TX_QUEUE_FIXED_PRIORITY(port)  (0x044c + (port10

[PATCH 3/8] [MV643XX_ETH] Disable RX/TX byte swapping on little-endian systems

2007-10-18 Thread Lennert Buytenhek

On little-endian systems, configure the SDMA unit with
MV643XX_ETH_BLM_RX_NO_SWAP and MV643XX_ETH_BLM_TX_NO_SWAP.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -266,10 +266,21 @@
 
 #defineMV643XX_ETH_IPG_INT_RX(value) ((value  0x3fff)  8)
 
+#if defined(__BIG_ENDIAN)
 #defineMV643XX_ETH_PORT_SDMA_CONFIG_DEFAULT_VALUE  \
MV643XX_ETH_RX_BURST_SIZE_4_64BIT   |   \
MV643XX_ETH_IPG_INT_RX(0)   |   \
MV643XX_ETH_TX_BURST_SIZE_4_64BIT
+#elif defined(__LITTLE_ENDIAN)
+#defineMV643XX_ETH_PORT_SDMA_CONFIG_DEFAULT_VALUE  \
+   MV643XX_ETH_RX_BURST_SIZE_4_64BIT   |   \
+   MV643XX_ETH_BLM_RX_NO_SWAP  |   \
+   MV643XX_ETH_BLM_TX_NO_SWAP  |   \
+   MV643XX_ETH_IPG_INT_RX(0)   |   \
+   MV643XX_ETH_TX_BURST_SIZE_4_64BIT
+#else
+#error One of __BIG_ENDIAN or __LITTLE_ENDIAN must be defined
+#endif
 
 /* These macros describe Ethernet Port serial control reg (PSCR) bits */
 #define MV643XX_ETH_SERIAL_PORT_DISABLE0
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/8] [MV643XX_ETH] Move ethernet register definitions into private header

2007-10-18 Thread Lennert Buytenhek

Move the mv643xx's ethernet-related register definitions from
include/linux/mv643xx.h into drivers/net/mv643xx_eth.h, since
they aren't of any use outside the ethernet driver.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.h
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -7,7 +7,7 @@
 #include linux/workqueue.h
 #include linux/mii.h
 
-#include linux/mv643xx.h
+#include linux/mv643xx_eth.h
 
 #include asm/dma-mapping.h
 
@@ -51,6 +51,312 @@
ETH_VLAN_HLEN + ETH_FCS_LEN)
 #define ETH_RX_SKB_SIZE(dev-mtu + ETH_WRAPPER_LEN + 
dma_get_cache_alignment())
 
+//
+/*Ethernet Unit Registers  */
+//
+
+#define MV643XX_ETH_PHY_ADDR_REG0x2000
+#define MV643XX_ETH_SMI_REG 0x2004
+#define MV643XX_ETH_UNIT_DEFAULT_ADDR_REG   0x2008
+#define MV643XX_ETH_UNIT_DEFAULTID_REG  0x200c
+#define MV643XX_ETH_UNIT_INTERRUPT_CAUSE_REG0x2080
+#define MV643XX_ETH_UNIT_INTERRUPT_MASK_REG 0x2084
+#define MV643XX_ETH_UNIT_INTERNAL_USE_REG   0x24fc
+#define MV643XX_ETH_UNIT_ERROR_ADDR_REG 0x2094
+#define MV643XX_ETH_BAR_0   0x2200
+#define MV643XX_ETH_BAR_1   0x2208
+#define MV643XX_ETH_BAR_2   0x2210
+#define MV643XX_ETH_BAR_3   0x2218
+#define MV643XX_ETH_BAR_4   0x2220
+#define MV643XX_ETH_BAR_5   0x2228
+#define MV643XX_ETH_SIZE_REG_0  0x2204
+#define MV643XX_ETH_SIZE_REG_1  0x220c
+#define MV643XX_ETH_SIZE_REG_2  0x2214
+#define MV643XX_ETH_SIZE_REG_3  0x221c
+#define MV643XX_ETH_SIZE_REG_4  0x2224
+#define MV643XX_ETH_SIZE_REG_5  0x222c
+#define MV643XX_ETH_HEADERS_RETARGET_BASE_REG   0x2230
+#define MV643XX_ETH_HEADERS_RETARGET_CONTROL_REG0x2234
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_0   0x2280
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_1   0x2284
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_2   0x2288
+#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_3   0x228c
+#define MV643XX_ETH_BASE_ADDR_ENABLE_REG0x2290
+#define MV643XX_ETH_ACCESS_PROTECTION_REG(port)(0x2294 + 
(port2))
+#define MV643XX_ETH_MIB_COUNTERS_BASE(port)(0x3000 + 
(port7))
+#define MV643XX_ETH_PORT_CONFIG_REG(port)  (0x2400 + 
(port10))
+#define MV643XX_ETH_PORT_CONFIG_EXTEND_REG(port)   (0x2404 + 
(port10))
+#define MV643XX_ETH_MII_SERIAL_PARAMETRS_REG(port) (0x2408 + 
(port10))
+#define MV643XX_ETH_GMII_SERIAL_PARAMETRS_REG(port)(0x240c + 
(port10))
+#define MV643XX_ETH_VLAN_ETHERTYPE_REG(port)   (0x2410 + 
(port10))
+#define MV643XX_ETH_MAC_ADDR_LOW(port) (0x2414 + 
(port10))
+#define MV643XX_ETH_MAC_ADDR_HIGH(port)(0x2418 + 
(port10))
+#define MV643XX_ETH_SDMA_CONFIG_REG(port)  (0x241c + 
(port10))
+#define MV643XX_ETH_DSCP_0(port)   (0x2420 + 
(port10))
+#define MV643XX_ETH_DSCP_1(port)   (0x2424 + 
(port10))
+#define MV643XX_ETH_DSCP_2(port)   (0x2428 + 
(port10))
+#define MV643XX_ETH_DSCP_3(port)   (0x242c + 
(port10))
+#define MV643XX_ETH_DSCP_4(port)   (0x2430 + 
(port10))
+#define MV643XX_ETH_DSCP_5(port)   (0x2434 + 
(port10))
+#define MV643XX_ETH_DSCP_6(port)   (0x2438 + 
(port10))
+#define MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port)  (0x243c + 
(port10))
+#define MV643XX_ETH_VLAN_PRIORITY_TAG_TO_PRIORITY(port)(0x2440 + 
(port10))
+#define MV643XX_ETH_PORT_STATUS_REG(port)  (0x2444 + 
(port10))
+#define MV643XX_ETH_TRANSMIT_QUEUE_COMMAND_REG(port)   (0x2448 + 
(port10))
+#define MV643XX_ETH_TX_QUEUE_FIXED_PRIORITY(port)  (0x244c + 
(port10))
+#define

[PATCH 0/8] [MV643XX_ETH] Add Orion support, and assorted cleanups

2007-10-18 Thread Lennert Buytenhek

This patch series adds support for the Orion's ethernet MAC
(which is the same MAC as in the Discovery 643xx) to the mv643xx_eth
driver, and performs various random cleanups all over the driver.

Patches 1-3 are cleanups necessary to be able to support Orion.

Patch 4 enables mv643xx_eth for ARCH_ORION.

Patches 5-8 are more cleanups.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH,RFC] Marvell Orion SoC ethernet driver

2007-10-18 Thread Lennert Buytenhek

On Thu, Oct 18, 2007 at 03:15:36AM +0200, Lennert Buytenhek wrote:

   +#define PORT_CONF0x400
   +#define PORT_CONF_EXT0x404
   +#define PORT_MAC_LO  0x414
   +#define PORT_MAC_HI  0x418
   +#define PORT_SDMA0x41c
   +#define PORT_SERIAL  0x43c
   +#define PORT_STAT0x444
   +#define PORT_TXQ_CMD 0x448
   +#define PORT_MTU 0x458
   +#define PORT_CAUSE   0x460
   +#define PORT_CAUSE_EXT   0x464
   +#define PORT_MASK0x468
   +#define PORT_MASK_EXT0x46c
   +#define PORT_TX_THRESH   0x474
  
  This driver seems to support the same hardware as mv643xx_eth, any
  chance you could use it to avoid code duplication ?
 
 Interesting.  After some asking around, it appears that the mv643xx
 ethernet silicon block is indeed very similar to the ethernet silicon
 block found the in Orion ARM SoCs.
 
 We'll work on getting Orion to use mv643xx_eth.  Thanks for pointing
 this out.

Okay, patchset coming up.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/8] [MV643XX_ETH] Remove MV643XX_ETH_ register prefix

2007-10-18 Thread Lennert Buytenhek

Now that all register address and bit defines are in private
namespace (drivers/net/mv643xx_eth.h), we can safely remove the
MV643XX_ETH_ prefix to conserve horizontal space.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.c
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6/drivers/net/mv643xx_eth.c
@@ -80,7 +80,7 @@ static char mv643xx_driver_version[] = 
 
 static void __iomem *mv643xx_eth_base;
 
-/* used to protect MV643XX_ETH_SMI_REG, which is shared across ports */
+/* used to protect SMI_REG, which is shared across ports */
 static DEFINE_SPINLOCK(mv643xx_eth_phy_lock);
 
 static inline u32 mv_read(int offset)
@@ -214,12 +214,12 @@ static void mv643xx_eth_set_rx_mode(stru
struct mv643xx_private *mp = netdev_priv(dev);
u32 config_reg;
 
-   config_reg = mv_read(MV643XX_ETH_PORT_CONFIG_REG(mp-port_num));
+   config_reg = mv_read(PORT_CONFIG_REG(mp-port_num));
if (dev-flags  IFF_PROMISC)
-   config_reg |= (u32) MV643XX_ETH_UNICAST_PROMISCUOUS_MODE;
+   config_reg |= (u32) UNICAST_PROMISCUOUS_MODE;
else
-   config_reg = ~(u32) MV643XX_ETH_UNICAST_PROMISCUOUS_MODE;
-   mv_write(MV643XX_ETH_PORT_CONFIG_REG(mp-port_num), config_reg);
+   config_reg = ~(u32) UNICAST_PROMISCUOUS_MODE;
+   mv_write(PORT_CONFIG_REG(mp-port_num), config_reg);
 
eth_port_set_multicast_list(dev);
 }
@@ -455,41 +455,37 @@ static void mv643xx_eth_update_pscr(stru
u32 o_pscr, n_pscr;
unsigned int queues;
 
-   o_pscr = mv_read(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num));
+   o_pscr = mv_read(PORT_SERIAL_CONTROL_REG(port_num));
n_pscr = o_pscr;
 
/* clear speed, duplex and rx buffer size fields */
-   n_pscr = ~(MV643XX_ETH_SET_MII_SPEED_TO_100  |
-  MV643XX_ETH_SET_GMII_SPEED_TO_1000 |
-  MV643XX_ETH_SET_FULL_DUPLEX_MODE   |
-  MV643XX_ETH_MAX_RX_PACKET_MASK);
+   n_pscr = ~(SET_MII_SPEED_TO_100  |
+  SET_GMII_SPEED_TO_1000 |
+  SET_FULL_DUPLEX_MODE   |
+  MAX_RX_PACKET_MASK);
 
if (ecmd-duplex == DUPLEX_FULL)
-   n_pscr |= MV643XX_ETH_SET_FULL_DUPLEX_MODE;
+   n_pscr |= SET_FULL_DUPLEX_MODE;
 
if (ecmd-speed == SPEED_1000)
-   n_pscr |= MV643XX_ETH_SET_GMII_SPEED_TO_1000 |
- MV643XX_ETH_MAX_RX_PACKET_9700BYTE;
+   n_pscr |= SET_GMII_SPEED_TO_1000 |
+ MAX_RX_PACKET_9700BYTE;
else {
if (ecmd-speed == SPEED_100)
-   n_pscr |= MV643XX_ETH_SET_MII_SPEED_TO_100;
-   n_pscr |= MV643XX_ETH_MAX_RX_PACKET_1522BYTE;
+   n_pscr |= SET_MII_SPEED_TO_100;
+   n_pscr |= MAX_RX_PACKET_1522BYTE;
}
 
if (n_pscr != o_pscr) {
-   if ((o_pscr  MV643XX_ETH_SERIAL_PORT_ENABLE) == 0)
-   mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-   n_pscr);
+   if ((o_pscr  SERIAL_PORT_ENABLE) == 0)
+   mv_write(PORT_SERIAL_CONTROL_REG(port_num), n_pscr);
else {
queues = mv643xx_eth_port_disable_tx(port_num);
 
-   o_pscr = ~MV643XX_ETH_SERIAL_PORT_ENABLE;
-   mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-   o_pscr);
-   mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-   n_pscr);
-   mv_write(MV643XX_ETH_PORT_SERIAL_CONTROL_REG(port_num),
-   n_pscr);
+   o_pscr = ~SERIAL_PORT_ENABLE;
+   mv_write(PORT_SERIAL_CONTROL_REG(port_num), o_pscr);
+   mv_write(PORT_SERIAL_CONTROL_REG(port_num), n_pscr);
+   mv_write(PORT_SERIAL_CONTROL_REG(port_num), n_pscr);
if (queues)
mv643xx_eth_port_enable_tx(port_num, queues);
}
@@ -515,13 +511,13 @@ static irqreturn_t mv643xx_eth_int_handl
unsigned int port_num = mp-port_num;
 
/* Read interrupt cause registers */
-   eth_int_cause = mv_read(MV643XX_ETH_INTERRUPT_CAUSE_REG(port_num)) 
+   eth_int_cause = mv_read(INTERRUPT_CAUSE_REG(port_num)) 
ETH_INT_UNMASK_ALL;
if (eth_int_cause  ETH_INT_CAUSE_EXT) {
eth_int_cause_ext = mv_read(
-   MV643XX_ETH_INTERRUPT_CAUSE_EXTEND_REG(port_num

[PATCH 1/8] [MV643XX_ETH] Split off mv643xx_eth platform device data

2007-10-18 Thread Lennert Buytenhek

The mv643xx ethernet silicon block is also found in a couple of other
Marvell chips.  As a first step towards splitting off the mv643xx_eth
bits from the rest of the mv643xx bits, this patch splits the mv643xx
ethernet platform device data struct in linux/mv643xx.h off into
linux/mv643xx_eth.h, and includes the latter from the former.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/include/linux/mv643xx.h
===
--- linux-2.6.orig/include/linux/mv643xx.h
+++ linux-2.6/include/linux/mv643xx.h
@@ -14,6 +14,7 @@
 #define __ASM_MV643XX_H
 
 #include asm/types.h
+#include linux/mv643xx_eth.h
 
 //
 /* Processor Address Space  */
@@ -658,9 +659,6 @@
 /*Ethernet Unit Registers  */
 //
 
-#define MV643XX_ETH_SHARED_REGS 0x2000
-#define MV643XX_ETH_SHARED_REGS_SIZE0x2000
-
 #define MV643XX_ETH_PHY_ADDR_REG0x2000
 #define MV643XX_ETH_SMI_REG 0x2004
 #define MV643XX_ETH_UNIT_DEFAULT_ADDR_REG   0x2008
@@ -1280,28 +1278,6 @@ struct mv64xxx_i2c_pdata {
 
 #define MV643XX_ETH_DESC_SIZE  64
 
-#define MV643XX_ETH_SHARED_NAMEmv643xx_eth_shared
-#define MV643XX_ETH_NAME   mv643xx_eth
-
-struct mv643xx_eth_platform_data {
-   int port_number;
-   u16 force_phy_addr; /* force override if phy_addr == 0 */
-   u16 phy_addr;
-
-   /* If speed is 0, then speed and duplex are autonegotiated. */
-   int speed;  /* 0, SPEED_10, SPEED_100, SPEED_1000 */
-   int duplex; /* DUPLEX_HALF or DUPLEX_FULL */
-
-   /* non-zero values of the following fields override defaults */
-   u32 tx_queue_size;
-   u32 rx_queue_size;
-   u32 tx_sram_addr;
-   u32 tx_sram_size;
-   u32 rx_sram_addr;
-   u32 rx_sram_size;
-   u8  mac_addr[6];/* mac address if non-zero*/
-};
-
 /* Watchdog Platform Device, Driver Data */
 #defineMV64x60_WDT_NAMEmv64x60_wdt
 
Index: linux-2.6/include/linux/mv643xx_eth.h
===
--- /dev/null
+++ linux-2.6/include/linux/mv643xx_eth.h
@@ -0,0 +1,31 @@
+/*
+ * MV-643XX ethernet platform device data definition file.
+ */
+#ifndef __LINUX_MV643XX_ETH_H
+#define __LINUX_MV643XX_ETH_H
+
+#define MV643XX_ETH_SHARED_NAMEmv643xx_eth_shared
+#define MV643XX_ETH_NAME   mv643xx_eth
+#define MV643XX_ETH_SHARED_REGS0x2000
+#define MV643XX_ETH_SHARED_REGS_SIZE   0x2000
+
+struct mv643xx_eth_platform_data {
+   int port_number;
+   u16 force_phy_addr; /* force override if phy_addr == 0 */
+   u16 phy_addr;
+
+   /* If speed is 0, then speed and duplex are autonegotiated. */
+   int speed;  /* 0, SPEED_10, SPEED_100, SPEED_1000 */
+   int duplex; /* DUPLEX_HALF or DUPLEX_FULL */
+
+   /* non-zero values of the following fields override defaults */
+   u32 tx_queue_size;
+   u32 rx_queue_size;
+   u32 tx_sram_addr;
+   u32 tx_sram_size;
+   u32 rx_sram_addr;
+   u32 rx_sram_size;
+   u8  mac_addr[6];/* mac address if non-zero*/
+};
+
+#endif /* __LINUX_MV643XX_ETH_H */
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/8] [MV643XX_ETH] Enable use on Orion platforms

2007-10-18 Thread Lennert Buytenhek

Allow Orion ARM platforms to use the mv643xx_eth driver.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig
+++ linux-2.6/drivers/net/Kconfig
@@ -2392,13 +2392,16 @@ config UGETH_TX_ON_DEMAND
depends on UCC_GETH
 
 config MV643XX_ETH
-   tristate MV-643XX Ethernet support
-   depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM  PPC32)
+   tristate Marvell Discovery (643XX) and Orion ethernet support
+   depends on MV64360 || MV64X60 || (PPC_MULTIPLATFORM  PPC32) || 
ARCH_ORION
select MII
help
- This driver supports the gigabit Ethernet on the Marvell MV643XX
- chipset which is used in the Momenco Ocelot C and Jaguar ATX and
- Pegasos II, amongst other PPC and MIPS boards.
+ This driver supports the gigabit ethernet MACs in the
+ Marvell Discovery PPC/MIPS chipset family (MV643XX) and
+ in the Marvell Orion ARM SoC family.
+
+ Some boards that use the Discovery chipset are the Momenco
+ Ocelot C and Jaguar ATX and Pegasos II.
 
 config QLA3XXX
tristate QLogic QLA3XXX Network Driver Support
Index: linux-2.6/drivers/net/mv643xx_eth.c
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6/drivers/net/mv643xx_eth.c
@@ -1,5 +1,5 @@
 /*
- * drivers/net/mv643xx_eth.c - Driver for MV643XX ethernet ports
+ * Driver for Marvell Discovery (MV643XX) and Marvell Orion ethernet ports
  * Copyright (C) 2002 Matthew Dharm [EMAIL PROTECTED]
  *
  * Based on the 64360 driver from:
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/8] [MV643XX_ETH] Remove SHARED_REGS register address bias

2007-10-18 Thread Lennert Buytenhek

Start counting mv643xx_eth register addresses from zero, instead of
from 0x2000 (MV643XX_ETH_SHARED_REGS.)

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Acked-by: Tzachi Perelstein [EMAIL PROTECTED]

Index: linux-2.6/drivers/net/mv643xx_eth.c
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6/drivers/net/mv643xx_eth.c
@@ -78,26 +78,19 @@ static const struct ethtool_ops mv643xx_
 static char mv643xx_driver_name[] = mv643xx_eth;
 static char mv643xx_driver_version[] = 1.0;
 
-static void __iomem *mv643xx_eth_shared_base;
+static void __iomem *mv643xx_eth_base;
 
 /* used to protect MV643XX_ETH_SMI_REG, which is shared across ports */
 static DEFINE_SPINLOCK(mv643xx_eth_phy_lock);
 
 static inline u32 mv_read(int offset)
 {
-   void __iomem *reg_base;
-
-   reg_base = mv643xx_eth_shared_base - MV643XX_ETH_SHARED_REGS;
-
-   return readl(reg_base + offset);
+   return readl(((void __iomem *)mv643xx_eth_base) + offset);
 }
 
 static inline void mv_write(int offset, u32 data)
 {
-   void __iomem *reg_base;
-
-   reg_base = mv643xx_eth_shared_base - MV643XX_ETH_SHARED_REGS;
-   writel(data, reg_base + offset);
+   writel(data, ((void __iomem *)mv643xx_eth_base) + offset);
 }
 
 /*
@@ -1470,9 +1463,8 @@ static int mv643xx_eth_shared_probe(stru
if (res == NULL)
return -ENODEV;
 
-   mv643xx_eth_shared_base = ioremap(res-start,
-   MV643XX_ETH_SHARED_REGS_SIZE);
-   if (mv643xx_eth_shared_base == NULL)
+   mv643xx_eth_base = ioremap(res-start, res-end - res-start + 1);
+   if (mv643xx_eth_base == NULL)
return -ENOMEM;
 
return 0;
@@ -1481,8 +1473,8 @@ static int mv643xx_eth_shared_probe(stru
 
 static int mv643xx_eth_shared_remove(struct platform_device *pdev)
 {
-   iounmap(mv643xx_eth_shared_base);
-   mv643xx_eth_shared_base = NULL;
+   iounmap(mv643xx_eth_base);
+   mv643xx_eth_base = NULL;
 
return 0;
 }
Index: linux-2.6/drivers/net/mv643xx_eth.h
===
--- linux-2.6.orig/drivers/net/mv643xx_eth.h
+++ linux-2.6/drivers/net/mv643xx_eth.h
@@ -55,116 +55,116 @@
 /*Ethernet Unit Registers  */
 //
 
-#define MV643XX_ETH_PHY_ADDR_REG0x2000
-#define MV643XX_ETH_SMI_REG 0x2004
-#define MV643XX_ETH_UNIT_DEFAULT_ADDR_REG   0x2008
-#define MV643XX_ETH_UNIT_DEFAULTID_REG  0x200c
-#define MV643XX_ETH_UNIT_INTERRUPT_CAUSE_REG0x2080
-#define MV643XX_ETH_UNIT_INTERRUPT_MASK_REG 0x2084
-#define MV643XX_ETH_UNIT_INTERNAL_USE_REG   0x24fc
-#define MV643XX_ETH_UNIT_ERROR_ADDR_REG 0x2094
-#define MV643XX_ETH_BAR_0   0x2200
-#define MV643XX_ETH_BAR_1   0x2208
-#define MV643XX_ETH_BAR_2   0x2210
-#define MV643XX_ETH_BAR_3   0x2218
-#define MV643XX_ETH_BAR_4   0x2220
-#define MV643XX_ETH_BAR_5   0x2228
-#define MV643XX_ETH_SIZE_REG_0  0x2204
-#define MV643XX_ETH_SIZE_REG_1  0x220c
-#define MV643XX_ETH_SIZE_REG_2  0x2214
-#define MV643XX_ETH_SIZE_REG_3  0x221c
-#define MV643XX_ETH_SIZE_REG_4  0x2224
-#define MV643XX_ETH_SIZE_REG_5  0x222c
-#define MV643XX_ETH_HEADERS_RETARGET_BASE_REG   0x2230
-#define MV643XX_ETH_HEADERS_RETARGET_CONTROL_REG0x2234
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_0   0x2280
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_1   0x2284
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_2   0x2288
-#define MV643XX_ETH_HIGH_ADDR_REMAP_REG_3   0x228c
-#define MV643XX_ETH_BASE_ADDR_ENABLE_REG0x2290
-#define MV643XX_ETH_ACCESS_PROTECTION_REG(port)(0x2294 + 
(port2))
-#define MV643XX_ETH_MIB_COUNTERS_BASE(port)(0x3000 + 
(port7))
-#define MV643XX_ETH_PORT_CONFIG_REG(port)  (0x2400 + 
(port10))
-#define MV643XX_ETH_PORT_CONFIG_EXTEND_REG(port)   (0x2404 + 
(port10))
-#define MV643XX_ETH_MII_SERIAL_PARAMETRS_REG(port) (0x2408 + 
(port10))
-#define MV643XX_ETH_GMII_SERIAL_PARAMETRS_REG(port)(0x240c + 
(port10))
-#define

Re: [PATCH,RFC] Marvell Orion SoC ethernet driver

2007-10-17 Thread Lennert Buytenhek

On Tue, Oct 16, 2007 at 11:31:15PM +0200, Maxime Bizon wrote:

 Hello,

Hi,


  +#define PORT_CONF  0x400
  +#define PORT_CONF_EXT  0x404
  +#define PORT_MAC_LO0x414
  +#define PORT_MAC_HI0x418
  +#define PORT_SDMA  0x41c
  +#define PORT_SERIAL0x43c
  +#define PORT_STAT  0x444
  +#define PORT_TXQ_CMD   0x448
  +#define PORT_MTU   0x458
  +#define PORT_CAUSE 0x460
  +#define PORT_CAUSE_EXT 0x464
  +#define PORT_MASK  0x468
  +#define PORT_MASK_EXT  0x46c
  +#define PORT_TX_THRESH 0x474
 
 This driver seems to support the same hardware as mv643xx_eth, any
 chance you could use it to avoid code duplication ?

Interesting.  After some asking around, it appears that the mv643xx
ethernet silicon block is indeed very similar to the ethernet silicon
block found the in Orion ARM SoCs.

We'll work on getting Orion to use mv643xx_eth.  Thanks for pointing
this out.


thanks,
Lennert
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH,RFC] Marvell Orion SoC ethernet driver

2007-10-16 Thread Lennert Buytenhek

Attached is a driver for the built-in 10/100/1000 ethernet MAC in
the Marvell Orion series of ARM SoCs.
 
This ethernet MAC supports the MII/GMII/RGMII PCS interface types,
and offers a pretty standard set of MAC features, such as RX/TX
checksum offload, scatter-gather, interrupt coalescing, PAUSE,
jumbo frames, etc.
 
This patch is against 2.6.22.1, and the driver has not yet been
adapted to the recent NAPI changes.  Nevertheless, we wanted to
get this out there for feedback/review.

Comments appreciated!

Signed-off-by: Tzachi Perelstein [EMAIL PROTECTED]
Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]
Signed-off-by: Nicolas Pitre [EMAIL PROTECTED]


Index: linux-2.6.22.1-orion.3.3/drivers/net/Kconfig
===
--- linux-2.6.22.1-orion.3.3.orig/drivers/net/Kconfig
+++ linux-2.6.22.1-orion.3.3/drivers/net/Kconfig
@@ -1995,6 +1995,12 @@ config E1000_DISABLE_PACKET_SPLIT
 
 source drivers/net/ixp2000/Kconfig
 
+config ORION_ETH
+   tristate Marvell Orion Gigabit Ethernet support
+   depends on ARCH_ORION
+   ---help---
+ This driver supports the Orion's on chip gigabit ethernet port.
+
 config MYRI_SBUS
tristate MyriCOM Gigabit Ethernet support
depends on SBUS
Index: linux-2.6.22.1-orion.3.3/drivers/net/Makefile
===
--- linux-2.6.22.1-orion.3.3.orig/drivers/net/Makefile
+++ linux-2.6.22.1-orion.3.3/drivers/net/Makefile
@@ -221,6 +221,7 @@ obj-$(CONFIG_HAMRADIO) += hamradio/
 obj-$(CONFIG_IRDA) += irda/
 obj-$(CONFIG_ETRAX_ETHERNET) += cris/
 obj-$(CONFIG_ENP2611_MSF_NET) += ixp2000/
+obj-$(CONFIG_ORION_ETH) += orion_eth.o
 
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
 
Index: linux-2.6.22.1-orion.3.3/drivers/net/orion_eth.c
===
--- /dev/null
+++ linux-2.6.22.1-orion.3.3/drivers/net/orion_eth.c
@@ -0,0 +1,1506 @@
+/*
+ * Marvell Orion Gigabit Ethernet network device driver
+ *
+ * Maintainer: Tzachi Perelstein [EMAIL PROTECTED]
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed as is without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include linux/dma-mapping.h
+#include linux/module.h
+#include linux/kernel.h
+#include linux/netdevice.h
+#include linux/mii.h
+#include linux/etherdevice.h
+#include linux/ethtool.h
+#include linux/ip.h
+#include linux/in.h
+#include linux/init.h
+#include linux/platform_device.h
+#include linux/delay.h
+#include asm/arch/platform.h
+#include asm/io.h
+
+#define DRV_NAME   orion-eth
+#define DRV_VERSION0.3
+
+/*
+ * Orion Gigabit Ethernet Registers
+ /
+#define rdl(op, off)   __raw_readl((op)-base_addr + (off))
+#define wrl(op, off, val)  __raw_writel((val), (op)-base_addr + (off))
+#define wrb(op, off, val)  __raw_writeb((val), (op)-base_addr + (off))
+
+/*
+ * Unit Global Registers
+ */
+#define ETH_PHY_ID 0x000
+#define ETH_SMI0x004
+#define ETH_CAUSE  0x080
+#define ETH_MASK   0x084
+#define ETH_CTRL   0x0b0
+
+/*
+ * Port Registers
+ */
+#define PORT_CONF  0x400
+#define PORT_CONF_EXT  0x404
+#define PORT_MAC_LO0x414
+#define PORT_MAC_HI0x418
+#define PORT_SDMA  0x41c
+#define PORT_SERIAL0x43c
+#define PORT_STAT  0x444
+#define PORT_TXQ_CMD   0x448
+#define PORT_MTU   0x458
+#define PORT_CAUSE 0x460
+#define PORT_CAUSE_EXT 0x464
+#define PORT_MASK  0x468
+#define PORT_MASK_EXT  0x46c
+#define PORT_TX_THRESH 0x474
+#define PORT_CURR_RXD  0x60c
+#define PORT_RXQ_CMD   0x680
+#define PORT_CURR_TXD  0x6c0
+#define PORT_MIB_BASE  0x1000
+#define PORT_MIB_SIZE  128
+#define PORT_SPEC_MCAST_BASE   0x1400
+#define PORT_SPEC_MCAST_SIZE   256
+#define PORT_OTHER_MCAST_BASE  0x1500
+#define PORT_OTHER_MCAST_SIZE  256
+#define PORT_UCAST_BASE0x1600
+#define PORT_UCAST_SIZE16
+
+/*
+ * ETH_SMI bits
+ */
+#define SMI_DEV_OFFS   16
+#define SMI_REG_OFFS   21
+#define SMI_READ   (1  26)
+#define SMI_READ_VALID (1  27)
+#define SMI_BUSY   (1  28)
+
+/*
+ * PORT_STAT bits
+ */
+#define STAT_LINK_UP   (1  1)
+#define STAT_FULL_DUPLEX   (1  2)
+#define STAT_SPEED_1000(1  4)
+#define STAT_SPEED_100 (1  5)
+
+/*
+ * PORT_[T/R]XQ_CMD bits
+ */
+#define PORT_EN_TXQ0   1
+#define PORT_EN_RXQ0   1
+#define PORT_DIS_RXQ0  (1  8)
+#define PORT_DIS_TXQ0  (1  8)
+
+/*
+ * Descriptors bits
+ */
+#define

Re: Problem with implementation of TCP_DEFER_ACCEPT?

2007-08-24 Thread Lennert Buytenhek

On Fri, Aug 24, 2007 at 01:08:25AM +0100, TJ wrote:

 An RFC 793 standard TCP handshake requires three packets:
 
 client SYN  server LISTENING
 client  SYN ACK server SYN_RECEIVED
 client ACK  server ESTABLISHED
 
 client PSH ACK + data  server
 
 TCP_DEFER_ACCEPT is designed to increase performance by reducing the
 number of TCP packets exchanged before the client can pass data:
 
 client SYN  server LISTENING
 client  SYN ACK server SYN_RECEIVED
 
 client PSH ACK + data  server ESTABLISHED
 
 At present with TCP_DEFER_ACCEPT the kernel treats the RFC 793 handshake
 as invalid; dropping the ACK from the client without replying so the
 client doesn't know the server has in fact set it's internal ACKed flag.
 
 If the client doesn't send a packet containing data before the SYN_ACK
 time-outs finally expire the connection will be dropped.

A brought this up a long, long time ago, and I seem to remember
Alexey Kuznetsov explained me at the time that this was intentional.

I can't find the thread in the mailing list archives anymore, though
-- and my memory might be failing me.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BUG: when using 'brctl stp'

2007-08-14 Thread Lennert Buytenhek

On Tue, Aug 14, 2007 at 02:11:05PM +0100, Stephen Hemminger wrote:

 Bridge locking for /sys/class/net/br0/bridge/stp_enabled
 was wrong.  Another bug in bridge utilities makes it such that
 this interface, meant it wasn't being used.  The locking needs
 to be removed from set_stp_state(), the lock is already acquired
 down in br_stp_start()/br_stp_stop.

The 'locking' in set_stp_state() is actually dropping the lock
around the br_stp_set_enabled() invocation, not acquiring it:


 @@ -150,9 +150,7 @@ static ssize_t show_stp_state(struct dev
  static void set_stp_state(struct net_bridge *br, unsigned long val)
  {
   rtnl_lock();
 - spin_unlock_bh(br-lock);
   br_stp_set_enabled(br, val);
 - spin_lock_bh(br-lock);
   rtnl_unlock();
  }
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Lennert Buytenhek

On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:

 From: Chris Snook [EMAIL PROTECTED]
 
 Some architectures currently do not declare the contents of an atomic_t to be
 volatile.  This causes confusion since atomic_read() might not actually read
 anything if an optimizing compiler re-uses a value stored in a register, which
 can break code that loops until something external changes the value of an
 atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
 of all registers used in the loop, thus hurting performance instead of helping
 it, particularly on architectures where it's unnecessary.  Since we generally
 want to re-read the contents of an atomic variable on every access anyway,
 let's standardize the behavior across all architectures and avoid the
 performance and correctness problems of requiring the use of barrier() in
 loops that expect atomic_t variables to change externally.  This is relevant
 even on non-smp architectures, since drivers may use atomic operations in
 interrupt handlers.
 
 Signed-off-by: Chris Snook [EMAIL PROTECTED]

Documentation/atomic_ops.txt would need updating:

[...]

One very important aspect of these two routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6 patch] EP93XX_ETH must select MII

2007-07-13 Thread Lennert Buytenhek

On Fri, Jul 13, 2007 at 02:12:08AM +0200, Adrian Bunk wrote:

 From: John Donoghue [EMAIL PROTECTED]
 
 CONFIG_EP93XX_ETH=y, CONFIG_MII=n results in an obvious link error.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] b44: power down PHY when interface down

2007-07-01 Thread Lennert Buytenhek

On Sun, Jul 01, 2007 at 12:23:16PM +0200, Michael Buesch wrote:

  More or less.  You can't add the resistances like that, since the
  bus isolation chip buffers the IDSEL signal, but it is correct that
  if the host's IDSEL resistor is larger than a certain value, the
  combination of the resistive coupling of IDSEL plus the extra buffer
  in the isolator might be causing the IDSEL input on the 'guest' PCI
  board to assert too late (or not assert at all), causing config
  accesses to fail.
  
  (This also depends on the specific 'guest' PCI board used, as you
  noted, due to differing IDSEL trace lengths/capacitances and input
  pin capacitances on different PCI boards.  Also, it might work at
  33 MHz but not work at 66 MHz, etc.)
 
 It doesn't work on any of my boards :(

What extender board is this?  Do you have docs/schematics?
And what motherboard brand/type?


  If you feel adventurous, you could try to hack around this by
  figuring out which AD[31:16] line this PCI slot's IDSEL line is
  resistively coupled to (depends on the slot), and then adding
  another parallel resistor on the board itself to make the bus
  isolator's input buffer charge faster.  Note that this does
  increase the load on that specific AD[] line, which might cause
  other funny effects.
 
 Well, but how to find out to which address line it's connected to?
 Pretty hard to follow the PCB traces, especially since it's
 multilayered.

Actually, the IDSEL resistor would be on the computer's
motherboard, not on the PCI board.  And to which address line
the IDSEL line is connected depends on which PCI slot on the
motherboard you're looking at.

A multimeter should do the trick, but I would advise against this
if you're not totally comfortable with hacking hardware.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek

On Sat, Jun 30, 2007 at 04:19:23PM +0100, Matthew Garrett wrote:

 I'd agree that there's a need for a state where we power down as much as 
 possible (even at the cost of functionality), but where possible it 
 would also be nice to offer a state where the mac is powered down and 
 the phy left up.

There are PHYs which can detect that someone's on the other end even
when powered down..
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek

On Sat, Jun 30, 2007 at 11:53:25PM +0200, Michael Buesch wrote:

  When the interface is down (or driver removed), the BroadCom 44xx card 
  remains
  powered on, and both its MAC and PHY is using up power.
  This patch makes the driver issue a MAC_CTRL_PHY_PDOWN when the interface
  is halted, and does a partial chip reset turns off the activity LEDs too.
  
  Applies to 2.6.22-rc6, or current git head.
  
  Tested on a Broadcom BCM4401-B0 card, it saves ~0.5W (measured using 
  powertop).
 
 Hm, I was going to measure the real power advantage with a
 PCI-extender card. But my B44B0 card doesn't seem to work in
 that extender card. It works perfectly fine sticked directly into
 the motherboard, though, and other cards like a BCM4318 work in
 the extender, too.
 Not sure what this is.
 The extender has an application note about nonworking cards in the
 extender and a too big resistor on the board IDSEL pin being the
 cause of this.

Does the card show up in lspci at all?  IDSEL drive strength
issues should only affect config space accesses.

Does the extender board have a PCI-PCI bridge on it?  (If not,
there's not really any reason to resistively couple the IDSEL
line to the host, since the host should take care of that.)


 Maybe I can try with another machine tomorrow.

That would only make a difference if there is no PCI-PCI bridge on
the extender board.

If the extender resistively couples the host's IDSEL line, you might
see different results on a different host bridge, since different
host bridges can use different numbers of IDSEL stepping cycles.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek

On Sun, Jul 01, 2007 at 12:24:40AM +0200, Michael Buesch wrote:

   Hm, I was going to measure the real power advantage with a
   PCI-extender card. But my B44B0 card doesn't seem to work in
   that extender card. It works perfectly fine sticked directly into
   the motherboard, though, and other cards like a BCM4318 work in
   the extender, too.
   Not sure what this is.
   The extender has an application note about nonworking cards in the
   extender and a too big resistor on the board IDSEL pin being the
   cause of this.
  
  Does the card show up in lspci at all?
 
 No it doesn't.

Right, so it sounds like it might be this issue.


  Does the extender board have a PCI-PCI bridge on it?  (If not,
  there's not really any reason to resistively couple the IDSEL
  line to the host, since the host should take care of that.)
 
 There's no bridge. It just decouples all voltage lines, so you can
 drive it from external supply and/or measure voltages and current.
 On the PCB it looks like the the IDSEL line is rather directly
 routed to the host IDSEL. It just goes through one of the bus
 isolation chips. So I guess (just my guess) that this chip has some
 resistance and if the total resistance of the chip + the IDSEL
 resistor on the mainboard goes above some threshold it doesn't work
 anymore for some cards. In the application note they write
 about trouble for IDSEL resistors 51ohms.

More or less.  You can't add the resistances like that, since the
bus isolation chip buffers the IDSEL signal, but it is correct that
if the host's IDSEL resistor is larger than a certain value, the
combination of the resistive coupling of IDSEL plus the extra buffer
in the isolator might be causing the IDSEL input on the 'guest' PCI
board to assert too late (or not assert at all), causing config
accesses to fail.

(This also depends on the specific 'guest' PCI board used, as you
noted, due to differing IDSEL trace lengths/capacitances and input
pin capacitances on different PCI boards.  Also, it might work at
33 MHz but not work at 66 MHz, etc.)

If you feel adventurous, you could try to hack around this by
figuring out which AD[31:16] line this PCI slot's IDSEL line is
resistively coupled to (depends on the slot), and then adding
another parallel resistor on the board itself to make the bus
isolator's input buffer charge faster.  Note that this does
increase the load on that specific AD[] line, which might cause
other funny effects.


   Maybe I can try with another machine tomorrow.
  
  That would only make a difference if there is no PCI-PCI bridge on
  the extender board.
 
 Well, they suggest it in the application note as a possible fix. ;)

The bus isolation chip doesn't count as a PCI-PCI bridge. :)  I'm just
saying that you wouldn't see the issue you are seeing now if the
extender board had a real PCI-PCI bridge on it, since in that case the
type 0 config access to the guest PCI board would be generated by the
bridge instead of by the host.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek

On Wed, May 16, 2007 at 08:13:01AM +0100, Christoph Hellwig wrote:

  +#ifndef __ARMEB__
  +#warning Little endian mode not supported
  +#endif
  
  Personally I'm less fussed about WAN / LE support. Anyone with any  
  sense will run ixp4xx boards doing such a specialised network  
  operation as BE. Also, NSLU2-Linux can't test this functionality with  
  our LE setup as we don't have this hardware on-board. You may just  
  want to declare a depends on ARMEB in Kconfig (with or without OR  
  (ARM || BROKEN) ) and have done with it - it's up to you.
  
  Christian Hohnstaedt's work did support LE though.
  
  Not all ixp4xx boards are by definition doing such a specialised 
  network operation.
  
  Krzysztof, why is LE not supported?
  
  Do you need access to ixp4xx that starts in LE mode?
 
 Not even trying to support LE is a clear merge blocker.  Maybe
 Krzysztof can't actually test it himself, which is fine - but
 not even pretending to be endian clean is not what proper Linux
 drivers do.

The issue is not that the driver is not 'endian clean'.

This is a driver for an on-chip ethernet MAC on an ARM CPU.  I.e. the
ethernet MAC is on the CPU itself, it's not some kind of PCI device or
something like that.  The ARM CPU in question can be run in either
little endian or big endian mode.  Making a driver work in both modes
of operation is generally not just an issue of adding a couple of
be32_to_cpu()s in the right places.

For example, intel IXP2000 and IXP23xx CPU support in arch/arm only
supports big-endian mode of operation, and none of the associated
drivers support little-endian mode.

Most of the other CPU support in arch/arm only supports
little-endian mode, and none of the associated drivers support
big-endian mode.  According to your criterion, that would mean that
most of the ARM drivers (alsa, usb, framebuffer, networking, etc.)
should never have been accepted in the kernel tree in the first place.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek

On Wed, May 16, 2007 at 08:16:38PM +0930, Rod Whitby wrote:

 So, if the author of these patches wishes to concentrate on big-endian
 support first, then we will not say (and have not said) anything which
 will block inclusion of a big-endian only version of this driver.

The NSLU2 people are the ones here that are saying that the driver
should really support LE (because that is what they happen to be
using, the rest of the world runs the ixp4xx in BE), and they keep
saying that it would be so easy to make a patch to add LE support,
but so far they haven't produced such a patch.

Please just write the patch and let's get this over with.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek

On Wed, May 16, 2007 at 09:05:18PM +0930, Rod Whitby wrote:

  So, if the author of these patches wishes to concentrate on big-endian
  support first, then we will not say (and have not said) anything which
  will block inclusion of a big-endian only version of this driver.
  
  The NSLU2 people are the ones here that are saying that the driver
  should really support LE (because that is what they happen to be
  using, the rest of the world runs the ixp4xx in BE)
 
 I'll repeat again.  NSLU2-Linux supports both BE and LE.  We have
 about 5,000 users running BE and about 5,000 users running LE.

Perhaps, but somehow I don't think that we'd have seen any reaction
if the submitted driver had only supported LE and not BE.


  Please just write the patch and let's get this over with.
 
 Please let's just stop arguing about it.  If a patch appears before
 it gets merged, then great.  If it doesn't then it will appear at a
 later date.

Great.  I agree.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-15 Thread Lennert Buytenhek

On Wed, May 09, 2007 at 03:45:53PM +0100, Michael-Luke Jones wrote:

 No-one is saying that this driver should not be mainlined before it  
 has LE support. All that I said was:
 
  Personally I'd like LE ethernet tested and working before we push.
 
 The alternative would be to explicitly state in Kconfig that LE arm  
 is broken with this driver, so that this could be fixed later.

The driver does bomb out during compile if __ARMEB__ isn't defined,
but that apparently wasn't good enough.


 Please can we not blow this out of proportion, it really isn't that  
 big a deal. The irony is that fixing Krzysztof's driver to work on LE  
 will probably be quite easy, given that we already have a working LE  
 driver from Christian.

I'm looking forward to your patch.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek

On Wed, May 09, 2007 at 10:58:06AM +0200, Marcus Better wrote:

  There _is_ an ARM BE version of Debian.
 
  It's not an official port, but it's not maintained any worse than
  the 'official' LE ARM Debian port is.
 
  Hmm... That changes a bit. Perhaps we should forget about
  that LE thing then, and (at best) put that trivial workaround?
 
 Please keep in mind that users are unlikely to install an unofficial port
 which lacks integration with the Debian infrastructure, security support
 and other services. The arm architecture (LE) is currently the third most
 popular in Debian, whereas I suspect (?) there are very few BE Debian
 systems out there.

Note that all of your arguments also apply to the experimental
EABI little-endian ARM port.

I.e.:
1. The EABI port is an unofficial port.
2. The EABI port is not integrated with the Debian infrastructure.
3. The EABI port lacks security support.

You could also argue that:
4. There is no reason to use EABI -- old-ABI works just as well.
5. The perceived floating point speedups that EABI gives are
   completely drowned out by the slowness of the rest of the system.
6. A lot of programs assume old-ABI behavior, it is too much work
   to patch them all.

Does that mean that the Debian ARM people have their heads so far
up their collective asses that they think that every form of change
is bad and are unable to accept that some forms of change might be
for the better?

I think you've just summarised why I don't like working on Debian.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-09 Thread Lennert Buytenhek

On Tue, May 08, 2007 at 06:59:36PM +0200, Krzysztof Halasa wrote:

  There may be up to 6 Ethernet ports (not sure about hardware
  status, not yet supported even by Intel) - 7 queues * 128 entries
  each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something
  for TX, and then crypto, and maybe other things.
 
  You're unlikely to be using all of those at the same time, though.
 
 That's the point.
 
  And what do you do if the user does compile all of these features into
  his kernel and then tries to use them all at the same time?  Return
  -ENOMEM?
 
 If he is able to do so, yes - there is nothing we can do. But
 I suspect a single machine would not have all possible hardware.
 The problem is, we don't know what would it have, so it must be
 dynamic.

Well, you _would_ like to have a way to make sure that all the
capabilities on the board can be used.  If you have a future ixp4xx
based board with 16 ethernet ports, you don't want 'ifconfig eth7 up'
to give you -ENOMEM just because we ran out of SRAM.

The way I see it, that means that you do want to scale back your
other SRAM allocations if you know that you're going to need a lot
of SRAM (say, for ethernet RX/TX queues.)

Either you can do this with an ugly hack a la:

/*
 * The FOO board has many ethernet ports, and runs out of
 * SRAM prematurely if we use the default TX/RX ring sizes.
 */
#ifdef CONFIG_MACH_IXP483_FOO_BOARD
#define IXP4XX_ETH_RXTX_QUEUE_SIZE  32
#else
#define IXP4XX_ETH_RXTX_QUEUE_SIZE  256
#endif

Or you can put this knowledge in the board support code (cleaner, IMHO.)

E.g. let arch/arm/mach-ixp4xx/nslu2.c decide, at platform device
instantiation time, which region of queue SRAM can be used by which
queue, and take static allocations for things like the crypto unit
into account.  (This is just one form of that idea, there are many
different variations.)

That way, you can _guarantee_ that you'll always have enough SRAM
to be able to use the functionality that is exposed on the board you
are running on (which is a desirable property, IMHO), which is
something that you can't achieve with an allocator, as far as I can
see.

I'm not per se against the allocator, I just think that there are
problems (running out of SRAM, fragmentation) that can't be solved
by the allocator alone (SRAM users have to be aware which other SRAM
users there are in the system, while the idea of the allocator is to
insulate these users from each other), and any solution that solves
those two problems IMHO also automatically solves the problem that
the allocator is trying to solve.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek

On Wed, May 09, 2007 at 11:35:03AM +0200, Marcus Better wrote:

  Does that mean that the Debian ARM people have their heads so far
  up their collective asses that they think that every form of change
  is bad and are unable to accept that some forms of change might be
  for the better?
 
 Well, I am not one of the Debian ARM people, just a user... and I
 do hope the EABI port becomes supported in the future! But in the
 meatime there is a crowd of users running Debian on consumer devices
 like the NSLU2, and they need a LE network driver.

There's a crowd of users running Linux on TCP offload capable
cards, and they need TCP offload support in Linux.

The people who need a LE network driver can use Christian's driver,
as Christian's driver works in LE just fine.  The people who care
about LE support can add LE support to the driver that Krzysztof wrote.

I don't think that not supporting LE is a reason not to merge
Krzysztof's driver.  Don't make supporting LE systems Krzysztof's
problem.

Krzysztof has written an excellent driver, and while it would be 100%
Debian style to reject his driver just because it doesn't support LE[*],
thankfully, Linux is not Debian.  Please don't turn Linux into Debian.


[*] And if he were to complain about this, he would get slapped with
the standard Our priorities are our users and free software
Debian Social Contract rhetoric -- thank $DEITY we don't have a
Linux Kernel Social Contract with the same bullshit in it.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek

On Wed, May 09, 2007 at 12:35:40PM +0200, Mikael Pettersson wrote:

   Does that mean that the Debian ARM people have their heads so far
   up their collective asses that they think that every form of change
   is bad and are unable to accept that some forms of change might be
   for the better?
  
  Well, I am not one of the Debian ARM people, just a user... and I
  do hope the EABI port becomes supported in the future! But in the
  meatime there is a crowd of users running Debian on consumer devices
  like the NSLU2, and they need a LE network driver.
 
 1) Development _should_ happen in small individually-manageable steps.
It's wrong to delay integration of the new IXP4xx eth driver just
because it's not yet LE-compatible.

Exactly.


 2) LE Debian/ARM users do have alternatives: they can use USB-Ethernet
adapters, for instance.

Or just use Christian's driver.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek

I'm not sure what the latest versions are, so I'm not sure which
patches to review and which patches are obsolete.


On Tue, May 08, 2007 at 02:46:28AM +0200, Krzysztof Halasa wrote:

 +struct qmgr_regs __iomem *qmgr_regs;
 +static struct resource *mem_res;
 +static spinlock_t qmgr_lock;
 +static u32 used_sram_bitmap[4]; /* 128 16-dword pages */
 +static void (*irq_handlers[HALF_QUEUES])(void *pdev);
 +static void *irq_pdevs[HALF_QUEUES];
 +
 +void qmgr_set_irq(unsigned int queue, int src,
 +   void (*handler)(void *pdev), void *pdev)
 +{
 + u32 __iomem *reg = qmgr_regs-irqsrc[queue / 8]; /* 8 queues / u32 */
 + int bit = (queue % 8) * 4; /* 3 bits + 1 reserved bit per queue */
 + unsigned long flags;
 +
 + src = 7;
 + spin_lock_irqsave(qmgr_lock, flags);
 + __raw_writel((__raw_readl(reg)  ~(7  bit)) | (src  bit), reg);
 + irq_handlers[queue] = handler;
 + irq_pdevs[queue] = pdev;
 + spin_unlock_irqrestore(qmgr_lock, flags);
 +}

The queue manager interrupts should probably be implemented as an
irqchip, in the same way that GPIO interrupts are implemented.  (I.e.
allocate 'real' interrupt numbers for them, and use the interrupt
cascade mechanism.)  You probably want to have separate irqchips for
the upper and lower halves, too.  This way, drivers can just use
request_irq() instead of having to bother with platform-specific
qmgr_set_irq() methods.  I think I also made this review comment
with Christian's driver.


 +int qmgr_request_queue(unsigned int queue, unsigned int len /* dwords */,
 +unsigned int nearly_empty_watermark,
 +unsigned int nearly_full_watermark)
 +{
 + u32 cfg, addr = 0, mask[4]; /* in 16-dwords */
 + int err;
 +
 + if (queue = HALF_QUEUES)
 + return -ERANGE;
 +
 + if ((nearly_empty_watermark | nearly_full_watermark)  ~7)
 + return -EINVAL;
 +
 + switch (len) {
 + case  16:
 + cfg = 0  24;
 + mask[0] = 0x1;
 + break;
 + case  32:
 + cfg = 1  24;
 + mask[0] = 0x3;
 + break;
 + case  64:
 + cfg = 2  24;
 + mask[0] = 0xF;
 + break;
 + case 128:
 + cfg = 3  24;
 + mask[0] = 0xFF;
 + break;
 + default:
 + return -EINVAL;
 + }
 +
 + cfg |= nearly_empty_watermark  26;
 + cfg |= nearly_full_watermark  29;
 + len /= 16;  /* in 16-dwords: 1, 2, 4 or 8 */
 + mask[1] = mask[2] = mask[3] = 0;
 +
 + if (!try_module_get(THIS_MODULE))
 + return -ENODEV;
 +
 + spin_lock_irq(qmgr_lock);
 + if (__raw_readl(qmgr_regs-sram[queue])) {
 + err = -EBUSY;
 + goto err;
 + }
 +
 + while (1) {
 + if (!(used_sram_bitmap[0]  mask[0]) 
 + !(used_sram_bitmap[1]  mask[1]) 
 + !(used_sram_bitmap[2]  mask[2]) 
 + !(used_sram_bitmap[3]  mask[3]))
 + break; /* found free space */
 +
 + addr++;
 + shift_mask(mask);
 + if (addr + len  ARRAY_SIZE(qmgr_regs-sram)) {
 + printk(KERN_ERR qmgr: no free SRAM space for
 + queue %i\n, queue);
 + err = -ENOMEM;
 + goto err;
 + }
 + }
 +
 + used_sram_bitmap[0] |= mask[0];
 + used_sram_bitmap[1] |= mask[1];
 + used_sram_bitmap[2] |= mask[2];
 + used_sram_bitmap[3] |= mask[3];
 + __raw_writel(cfg | (addr  14), qmgr_regs-sram[queue]);
 + spin_unlock_irq(qmgr_lock);
 +
 +#if DEBUG
 + printk(KERN_DEBUG qmgr: requested queue %i, addr = 0x%02X\n,
 +queue, addr);
 +#endif
 + return 0;
 +
 +err:
 + spin_unlock_irq(qmgr_lock);
 + module_put(THIS_MODULE);
 + return err;
 +}

As with Christian's driver, I don't know whether an SRAM allocator
makes much sense.  We can just set up a static allocation map for the
in-tree drivers and leave out the allocator altogether.  I.e. I don't
think it's worth the complexity (and just because the butt-ugly Intel
code has an allocator isn't a very good reason. :-)

I.e. an API a la:

ixp4xx_qmgr_config_queue(int queue_nr, int sram_base_address, int 
queue_size, ...);

might simply suffice.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek

On Mon, May 07, 2007 at 02:07:16AM +0200, Krzysztof Halasa wrote:

 + * Ethernet port config (0x00 is not present on IXP42X):
 + *
 + * logical port  0x000x100x20
 + * NPE   0 (NPE-A)   1 (NPE-B)   2 (NPE-C)
 + * physical PortId   2   0   1
 + * TX queue  23  24  25
 + * RX-free queue 26  27  28
 + * TX-done queue is always 31, RX queue is configurable

(Note that this assignment depends on the firmware, and different
firmware versions use different queues -- you might want to add a
note about which firmware version this holds for.)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek

On Mon, May 07, 2007 at 09:18:00PM +0100, Michael-Luke Jones wrote:

 Well, I'm told that (compatible) NPEs are present on other IXP CPUs.
 Not sure about details.
 
 If, by a combined effort, we ever manage to create a generic NPE  
 driver for the NPEs found in IXP42x/43x/46x/2000/23xx then the driver  
 should go in arch/arm/npe.c

(Note that the ixp2000 doesn't have NPEs.)

(Both the 2000 and the 23xx have microengines, which are both
supported by arch/arm/common/uengine.c.)


 It's possible, but hard due to the differences in hardware design

The ixp23xx NPEs seem pretty much identical to me to the ixp4xx
NPEs.  There are some minor differences between the ixp2000 and
ixp23xx uengines, but those are easy enough to deal with.


 and the fact that boards based on anything other than 42x are few
 and far between. The vast majority of 'independent' users following
 mainline are likely running on 42x boards.

Sure, ixp23xx hardware is harder to get.  I'm not sure what you mean
by 'independent' users, though.  Are people with non-42x hardware
'dependent' users, and why?


 Thus, for now, I would drop the NPE / QMGR code in arch/arm/mach- 
 ixp4xx/ and concentrate on making it 42x/43x/46x agnostic. One step  
 at a time :)

I'd say that it's up to those who are interested in ixp23xx support
(probably only myself at this point) to add ixp23xx support.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek

On Mon, May 07, 2007 at 10:00:20PM +0200, Krzysztof Halasa wrote:

  - the NPE can also be used as DMA engine and for crypto operations.
Both are not network related.
Additionally, the NPE is not only ixp4xx related, but is
also used in IXP23xx CPUs, so it could be placed in
arch/arm/common or arch/arm/xscale ?
 
  - The MAC is used on IXP23xx, too. So the drivers for
both CPU familys only differ in the way they exchange
network packets between the NPE and the kernel.
 
 Hmm... perhaps someone have a spare device with such IXP23xx
 and wants to make it a donation for science? :-)

I have a couple of ixp23xx boards at home, but I'm not sure whether I
can give them away.  I can give you remote access to them, though.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek

On Tue, May 08, 2007 at 04:12:17PM +0200, Krzysztof Halasa wrote:

  The queue manager interrupts should probably be implemented as an
  irqchip, in the same way that GPIO interrupts are implemented.  (I.e.
  allocate 'real' interrupt numbers for them, and use the interrupt
  cascade mechanism.)  You probably want to have separate irqchips for
  the upper and lower halves, too.  This way, drivers can just use
  request_irq() instead of having to bother with platform-specific
  qmgr_set_irq() methods.
 
 Is there a sample somewhere?

See for example arch/arm/mach-ep93xx/core.c, handling of the A/B/F
port GPIO interrupts.

In a nutshell, it goes like this.

1) Allocate a set of IRQ numbers.  E.g. in include/asm-arm/arch-ixp4xx/irqs.h:

#define IRQ_IXP4XX_QUEUE_0  64
#define IRQ_IXP4XX_QUEUE_1  65
[...]

   Adjust NR_IRQS, too.

2) Implement interrupt chip functions:

static void ixp4xx_queue_low_irq_mask_ack(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_mask(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_unmask(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_set_type(unsigned int irq)
{
[...]
}

static struct irq_chip ixp4xx_queue_low_irq_chip = {
.name   = QMGR low,
.ack= ixp4xx_queue_low_irq_mask_ack,
.mask   = ixp4xx_queue_low_irq_mask,
.unmask = ixp4xx_queue_low_irq_unmask,
.set_type   = ixp4xx_queue_low_irq_set_type,
};

3) Hook up the queue interrupts:

for (i = IRQ_IXP4XX_QUEUE_0; i = IRQ_IXP4XX_QUEUE_31; i++) {
set_irq_chip(i, ixp4xx_queue_low_irq_chip);
set_irq_handler(i, handle_level_irq);
set_irq_flags(i, IRQF_VALID);
}

4) Implement an interrupt handler for the parent interrupt:

static void ixp4xx_qmgr_low_irq_handler(unsigned int irq, struct 
irq_des c *desc)
{
u32 status;
int i;

status = __raw_readl(IXP4XX_WHATEVER_QMGR_LOW_STATUS_REGISTER);
for (i = 0; i  32; i++) {
if (status  (1  i)) {
desc = irq_desc + IRQ_IXP4XX_QUEUE_0 + i;
desc_handle_irq(IRQ_IXP4XX_QUEUE_0 + i, desc);
}
}
}

5) Hook up the parent interrupt:

set_irq_chained_handler(IRQ_IXP4XX_QM1, ixp4xx_qmgr_low_irq_handler);


Or something like that.


  As with Christian's driver, I don't know whether an SRAM allocator
  makes much sense.  We can just set up a static allocation map for the
  in-tree drivers and leave out the allocator altogether.  I.e. I don't
  think it's worth the complexity (and just because the butt-ugly Intel
  code has an allocator isn't a very good reason. :-)
 
 It's a very simple allocator. I don't whink we have enough SRAM
 without it. For now it would work but it's probably too small
 for all potential users at a time.
 
 There may be up to 6 Ethernet ports (not sure about hardware
 status, not yet supported even by Intel) - 7 queues * 128 entries
 each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something
 for TX, and then crypto, and maybe other things.

You're unlikely to be using all of those at the same time, though.

And what do you do if the user does compile all of these features into
his kernel and then tries to use them all at the same time?  Return
-ENOMEM?

Shouldn't we make sure that at least the features that are compiled in
can be used at the same time?  If you want that guarantee, then you
might as well determine the SRAM map at compile time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek

On Tue, May 08, 2007 at 04:31:12PM +0200, Krzysztof Halasa wrote:

  +/* Built-in 10/100 Ethernet MAC interfaces */
  +static struct mac_plat_info ixdp425_plat_mac[] = {
  +  {
  +  .phy= 0,
  +  .rxq= 3,
  +  }, {
  +  .phy= 1,
  +  .rxq= 4,
  +  }
  +};
 
  As with Christian's driver (I'm feeling like a bit of a broken record
  here :-), putting knowledge of which queue to use (which is firmware-
  specific) in the _board_ support file is almost certainly wrong.
 
  I would just put the port number in there, and let the ethernet
  driver map the port number to the hardware queue number.  After all,
  the ethernet driver knows which queues the firmware uses, while the
  board support code doesn't.
 
 No, quite the opposite. The board code knows its set of hardware
 interfaces etc. and can let Ethernet driver use, say, HSS queues.
 The driver can't know that.

You are attacking a point that I did not make.

The board support code knows such things as that the front ethernet
port on the board is connected to the CPU's MII port number #2, but
the board support code does _not_ know that MII port number #2
corresponds to ixp4xx hardware queue #5.

If Intel puts out a firmware update next month, and your ethernet
driver is modified to take advantage of the new features in that
firmware and starts depending on the newer version of that firmware,
we will have to modify every ixp4xx board support file in case the
firmware update modifies the ixp4xx queue numbers in use.  The
mapping from hardware ports (MII port #0, MII port #6, HSS port
#42, whatever) to ixp4xx hardware queue numbers (0-63) should _not_
be put in every single ixp4xx board support file.

Even if you only change the

(in board support file)
.rxq= 4,

line to something like this instead:

(in some ixp4xx-specific or driver-specific header file)
#define IXP4XX_MII_PORT_1_RX_QUEUE  4

(in board support file)
.rxq= IXP4XX_MII_PORT_1_RX_QUEUE,

then you have remved this dependency, and then you only have to update
one place if you move to a newer firmware version.


  I generally discourage the use of such wrappers, as it often makes
  people forget that the set and clear operations are not atomic, and
  it ignores the fact that some of the other bits in the register you
  are modifying might have side-effects.
 
 Without them the code in question is hardly readable,

You can read Polish, how can you complain about code readability. :-))

*runs*


 I pick the need to remember about non-atomicity and possible side
 effects instead :-)

Sure, point taken, it's just that the person after you might not
remember..
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek

On Tue, May 08, 2007 at 05:28:21PM +0200, Krzysztof Halasa wrote:

  I was always curious, why do people want to run ixp4xx in LE mode? What
  are the benefits that overweight the obvious performance degradation?
 
 Debian is indeed a valid reason.
 I wonder if it would be much work to create BE Debian as well.

There _is_ an ARM BE version of Debian.

It's not an official port, but it's not maintained any worse than
the 'official' LE ARM Debian port is.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SMC on pxa3xx (was: pxa3xx base patch [5/5] - net)

2007-04-27 Thread Lennert Buytenhek

On Fri, Apr 27, 2007 at 12:52:08PM +0400, dmitry pervushin wrote:

 +#elif defined(CONFIG_PXA3xx)
 +#define SMC_CAN_USE_8BIT 1
 +#define SMC_CAN_USE_16BIT1
 +#define SMC_CAN_USE_32BIT0
 +#define SMC_IO_SHIFT 0
 +#define SMC_NOWAIT   1
 +#define SMC_USE_PXA_DMA  1
 +#define SMC_inb(a, r)readb((a) + (r))
 +#define SMC_outb(v, a, r)writeb(v, (a) + (r))
 +#define SMC_inw(a, r)readw((a) + (r))
 +#define SMC_outw(v, a, r)writew(v, (a) + (r))
 +#define SMC_insw(a, r, p, l) insw((a) + (r), p, l)
 +#define SMC_outsw(a, r, p, l)outsw((a) + (r), p, l)

This is bogus, please don't apply.

The fact that the SMC might be hooked up in a certain way on one
certain PXA3xx board doesn't mean that it will be hooked up in that
way on every PXA3xx board.

Everything I've seen of the PXA3xx patch set so far is a disaster.
MontaVista is flooding every corner of the internet with these crap
patches.  This idiocy has got to stop.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFT] e100 driver on ARM

2007-04-26 Thread Lennert Buytenhek

On Thu, Apr 26, 2007 at 09:41:22AM -0400, David Acker wrote:

 Here is a quote from Russell that describes what I believe is the main 
 problem:
 http://www-gatago.com/linux/kernel/15457063.html
 
 Has e100 actually been fixed to use the PCI DMA API correctly yet?
 Looking at it, it doesn't look like it, so until it does, eepro100
 is the far better bet for platforms needing working DMA API.
 
 What I'm talking about is e100's apparant belief that it can modify
 rfd's in the receive ring on a non-cache coherent architecture and
 expect the data around it to remain unaffected (see e100_rx_alloc_skb):
 
 struct rfd {
 u16 status;
 u16 command;
 u32 link;
 u32 rbd;
 u16 actual_size;
 u16 size;
 };
 
 it touches command and link. This means that the whole rfd plus
 maybe the following or preceding 16 bytes get loaded into a cache
 line (assuming cache lines of 32 bytes), and that data written
 out again at sync. However, it does this on what seems to be an
 active receive chain.
 
 So, both the CPU _and_ the device own the same data. Which is a
 violation of the DMA API.
 
 
 I think that the S-bit patch fixes it because the hardware spins on the 
 s-bit instead of using the packet.  With just the el-bit, the hardware 
 tries to use the same cache line that the software is updating.
 
 Can someone from Intel let us know if I understand the hardware's 
 handling of the S and EL bits?  If my interpretation is correct, can the 
 s-bit patch be applied?  It seems like the correct way to lock out the 
 hardware while a packet is being updated.  I have not seen a reason 
 given not to apply the patch.

This is all a while ago now, but wasn't the e100 S-bit patch originally
written by Intel people in response to the very same quote by Russell
King that you've quoted above?  The S-bit patch should probably just
be applied, IMHO.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] bridge: if no STP then forward all BPDU's

2007-04-24 Thread Lennert Buytenhek

On Tue, Apr 24, 2007 at 04:12:26PM -0700, Stephen Hemminger wrote:

 The bridge code by default captures all spanning tree packets and
 doesn't forward them. I propose that this might not be a good idea.

As far as I remember, the original bridge code did pass through BPDUs
when STP was disabled.  I think that that is the only right way to
behave.


 --- bridge-2.6.22.orig/net/bridge/br_input.c
 +++ bridge-2.6.22/net/bridge/br_input.c
 @@ -131,8 +131,16 @@ struct sk_buff *br_handle_frame(struct n
   if (!is_valid_ether_addr(eth_hdr(skb)-h_source))
   goto drop;
  
 - if (unlikely(is_link_local(dest))) {
 - skb-pkt_type = PACKET_HOST;
 + /*
 +  * If STP is running, then trap all link-local (802.1x) frames
 +  * process through normal receive path.
 +  *
 +  * For safety, if not running STP then act as a completely transparent
 +  * device. This means that if STP is running on another machine, it
 +  * can still detect cycles.
 +  */
 + if (p-br-stp_enabled != BR_NO_STP  is_link_local(dest)) {
 + /* skb-pkt_type should already be PACKET_MULTICAST */

(Does this check include PAUSE frames?  We still don't want to
forward PAUSE frames in any case..)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Add support for running the Marvell m88e1111 PHY in RGMII mode

2007-04-11 Thread Lennert Buytenhek

On Wed, Apr 11, 2007 at 04:36:49PM -0500, Kim Phillips wrote:

  On Tue, Apr 10, 2007 at 04:57:23PM -0500, Kim Phillips wrote:
  
   also adds RX  TX delay bits to help boards with clock skew
   problems.
  
 snip
  
   [...]
   +
   + temp |= (MII_M_RX_DELAY | MII_M_TX_DELAY);
  
  Enabling this unconditionally is just wrong.
 
 I agree.  There needs to be a way for the platform code to
 communicate board specific quirkiness to the phylib

(I'm not sure whether it's really a quirk, as the RGMII spec allows
both modes.)


 (I just haven't figured out how to yet).

Maybe offer separate RGMII and RGMII-ID[*] mode choices?


[*] RGMII with Internal Delay (RGMII specification nomenclature)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: phylib usage

2007-04-10 Thread Lennert Buytenhek

On Tue, Apr 10, 2007 at 05:20:52PM -0500, Kim Phillips wrote:

 (note I'm coming from an embedded world here.)

Please read this:

http://marc.info/?l=linux-netdevm=116527863300952w=2
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: routing question under invisible bridge

2007-03-23 Thread Lennert Buytenhek

On Thu, Mar 22, 2007 at 03:52:55PM -0500, Bin He wrote:

 Dear sir,

Hi,


 I found your email address from kernel bridge source codes. I would
 appreciate if you could look into my question a little bit.

The netdev@ mailing list is a better forum to ask such questions,
I've CC'ed this email there.


 I have an invisible bridge (br0) which contains eth0 and eth1. None
 of them have an IP address because I want to it to be transparent to
 the existing network. So there is no entries in kernel routing table.

If you have an IP address assigned to br0, your kernel will likely have
(at least) one entry in its routing table even if you didn't put any
routes in there yourself.


 The problem is how does it handle the routing, i.e., which eth
 interface will a packet be sent to?

(The decision which bridge sub-device to send a packet to isn't
called 'routing', as it doesn't involve an IP routing decision --
that decision has already been made at that point.)


 For example, I can create a packet and bind it to a device by
 SO_BINDTODEVICE socket option. I did some tests and found:
 1) if the socket is bound to eth0 or eth1, the packet cannot be sent out.
 2) if the socket is bound to br0, it seems that the packet is only
 sent out to eth0.

Check out your system's ARP table (run /sbin/arp) and your br0
bridge's MAC address table (run 'brctl showmacs br0' or something
like that.)

When your machine wants to communicate with a remote IP address, it
first sends an ARP packet to figure out what the ethernet address is
that corresponds to that remote IP address.

When your machine then sends an IP packet on the br0 interface to that
ethernet address, the bridge code checks the MAC address table to find
out whether to send it to eth0 or eth1 (if the MAC address is a known
MAC address) or to both (if we have never seen the MAC address before
or if it has timed out.)


 So is there a way to send out a packet on a particular device?

I'm not sure exactly what you are trying to do?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH/RFC 00/10] Transparent proxying patches version 4

2007-01-07 Thread Lennert Buytenhek

On Sun, Jan 07, 2007 at 03:11:34PM +0100, Harald Welte wrote:

  So instead of using NAT to dynamically redirect traffic to local
  addresses, we now rely on native non-locally-bound sockets and do
  early socket lookups for inbound IPv4 packets. 
 
 It's good to see a solid implementation of this 'old idea'.  
 
 Just as a quick historical note to netdev:  This is the way how the
 netfilter project  advised the balabit guys to implement fully
 transparent proxy support, after having seen the complexity of the old
 nat-based TPROXY patches.

Didn't rusty tell the balabit guys to use the NAT approach?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH/RFC 00/10] Transparent proxying patches version 4

2007-01-04 Thread Lennert Buytenhek

On Thu, Jan 04, 2007 at 01:13:27PM +0100, KOVACS Krisztian wrote:

  I'd also love to see the old tproxy API go away entirely.  It was
  always a bit of a pain to use.
 
   It's gone with these patches: all you need is to bind() to foreign 
 addresses, like in the Linux 2.2 days.

That's how I understood it.  Great.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH/RFC 00/10] Transparent proxying patches version 4

2007-01-03 Thread Lennert Buytenhek

On Wed, Jan 03, 2007 at 05:33:57PM +0100, KOVACS Krisztian wrote:

 The following set of patches implement transparent proxying support
 loosely modeled on the Linux 2.2 transparent proxying functionality.

In a transparent http proxy server I wrote a while ago, we used to use
tproxy for making outgoing connections appear to be originating from a
foreign IP address, but moved to inserting an iptables nat rule from
the proxy app every time an outgoing connection needs to be made, due
to the pain of having to patch in the tproxy patches every time we
needed to do a kernel update.

I'd love to see working tproxy functionality merged upstream for that
reason alone.

I'd also love to see the old tproxy API go away entirely.  It was
always a bit of a pain to use.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 5/7] ep93xx: some minor cleanups to the ep93xx eth driver

2006-12-26 Thread Lennert Buytenhek

On Tue, Dec 26, 2006 at 04:41:17PM -0500, Jeff Garzik wrote:

 Small cleanup in the Cirrus Logic EP93xx ethernet driver: Check for NULL
 pointer before dereferencing it instead of after.  Remove unreferenced
 variable.

 Signed-off-by: Yan Burman [EMAIL PROTECTED]
 Cc: Jeff Garzik [EMAIL PROTECTED]
 Cc: Russell King [EMAIL PROTECTED]
 Signed-off-by: Andrew Morton [EMAIL PROTECTED]

Why wasn't I CC'ed on this?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 5/7] ep93xx: some minor cleanups to the ep93xx eth driver

2006-12-26 Thread Lennert Buytenhek

On Tue, Dec 26, 2006 at 10:42:27PM +0100, Lennert Buytenhek wrote:

  Small cleanup in the Cirrus Logic EP93xx ethernet driver: Check for NULL
  pointer before dereferencing it instead of after.  Remove unreferenced
  variable.

  Signed-off-by: Yan Burman [EMAIL PROTECTED]
  Cc: Jeff Garzik [EMAIL PROTECTED]
  Cc: Russell King [EMAIL PROTECTED]
  Signed-off-by: Andrew Morton [EMAIL PROTECTED]

 Why wasn't I CC'ed on this?

Sorry, meant to ask Yan Burman, not Jeff.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] r8169: use the broken_parity_status field in pci_dev

2006-12-17 Thread Lennert Buytenhek

On Mon, Dec 18, 2006 at 12:04:19AM +0100, Francois Romieu wrote:

 The former option is removed and platform code can now specify the
 expected behavior.

Thanks a lot.

FYI, I submitted this patch for the n2100 side:


Index: linux-2.6.19/arch/arm/mach-iop32x/n2100.c
===
--- linux-2.6.19.orig/arch/arm/mach-iop32x/n2100.c
+++ linux-2.6.19/arch/arm/mach-iop32x/n2100.c
@@ -123,9 +123,26 @@ static struct hw_pci n2100_pci __initdat
 
 static int __init n2100_pci_init(void)
 {
-   if (machine_is_n2100())
+   if (machine_is_n2100()) {
+   int i;
+
pci_common_init(n2100_pci);
 
+   /*
+* Both r8169 chips on the n2100 exhibit PCI parity
+* problems.  Set the -broken_parity_status flag for
+* both ports so that the r8169 driver knows it should
+* ignore error interrupts.
+*/
+   for (i = 1; i = 2; i++) {
+   struct pci_dev *dev;
+
+   dev = pci_get_bus_and_slot(0, PCI_DEVFN(i, 0));
+   if (dev != NULL)
+   dev-broken_parity_status = 1;
+   }
+   }
+
return 0;
 }
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bridge it's MAC address question

2006-12-15 Thread Lennert Buytenhek

On Mon, Oct 30, 2006 at 07:28:37AM -0800, Stephen Hemminger wrote:

  Could somebody explain, why bridge uses minimal MAC of the attached devices?
  It makes this address instable, variable during bridge life-cycle, which is 
  not good for DHCP. For example, I want to attach multiple virtual devices 
  to 
  one physical. Then, I need to make sure that after each virtual device 
  addition, bridge addr is not changed and still addr of the physical device. 
   
  Why not to use MAC of the first attached device?
 
 The bridge physical address is the minimum of all the attached devices.
 This is done because the STP standard requires it.  You can reset it
 to be the same as any of the attached devices. This will not cause a
 problem unless using STP.

You can in fact use any MAC address.  The STP standard recommends using
the minimum address, as that is deterministic, and so it doesn't depend
on the order in which you enslave subdevices.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Bridge] Bridge it's MAC address question

2006-12-15 Thread Lennert Buytenhek

[ dropped subscriber-only openvz.org list ]

On Fri, Dec 15, 2006 at 07:52:36AM -0800, Stephen Hemminger wrote:

   The bridge physical address is the minimum of all the attached devices.
   This is done because the STP standard requires it.  You can reset it
   to be the same as any of the attached devices. This will not cause a
   problem unless using STP.
  
  You can in fact use any MAC address.  The STP standard recommends using
  the minimum address, as that is deterministic, and so it doesn't depend
  on the order in which you enslave subdevices.
 
 So should restriction be lifted?

We should definitely allow users to override the MAC address of a
bridge interface.


 Please update wiki page FAQ, or I'll do it

Please do.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC patch] driver for the Opencores Ethernet Controller

2006-12-04 Thread Lennert Buytenhek

On Mon, Dec 04, 2006 at 10:01:01AM -0800, Dan Nicolaescu wrote:

 The Opencores Ethernet Controller is Verilog code that can be used to
 implement an Ethernet device in hardware. It needs to be coupled with
 a PHY and some buffer memory. Because of that devices that implement
 this controller can be very different. The code here tries to support
 that by having some parameters that need to be defined at compile
 time.

Considering this, why don't you make it a platform driver?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC patch] driver for the Opencores Ethernet Controller

2006-12-04 Thread Lennert Buytenhek

On Mon, Dec 04, 2006 at 10:27:52AM -0800, Dan Nicolaescu wrote:

   The Opencores Ethernet Controller is Verilog code that can be used to
   implement an Ethernet device in hardware. It needs to be coupled with
   a PHY and some buffer memory. Because of that devices that implement
   this controller can be very different. The code here tries to support
   that by having some parameters that need to be defined at compile
   time.
  
  Considering this, why don't you make it a platform driver?
 
 I didn't know about platform drivers before your mail. I guess I
 could convert it to that if that is the right thing to do. 

I definitely think so.  Check the ep93xx_eth driver for an example.


 (It might be an overkill given that the device is kind of simple and
 embedded people prefer small code...)

..until someone decides that he wants to build a design with two of
these ethernet cores instead of just one, at which point the entire
Let's use #defines for everything plan breaks down badly.


 Any comments on the driver itself? 

Sorry, no, I didn't look at it.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

'embedded people' and the 'embedded world' (was: Re: [RFC patch] driver for the Opencores Ethernet Controller)

2006-12-04 Thread Lennert Buytenhek

On Mon, Dec 04, 2006 at 10:27:52AM -0800, Dan Nicolaescu wrote:

 I didn't know about platform drivers before your mail. I guess I
 could convert it to that if that is the right thing to do. 
 (It might be an overkill given that the device is kind of simple and
 embedded people prefer small code...)

BTW (and this is not specifically directed to you.)

I count myself as an 'embedded person', having contributed a thing or
two to the Linux ARM kernel port and doing most of my Linux hacking on
ARM platforms, but I certainly don't share your opinion w.r.t. what
'embedded people' want or don't want.

Nor do I share any of the opinions of most 'embedded people' who
proclaim to be representing 'the embedded world' to 'the outside world',
the opinions that have gotten the embedded crowd the bad reputation
that we have gotten over the years.

- We can't use existing kernel infrastructure because we are special.

- Let's save 8 bytes and 2 cycles in a slow path by throwing all
  established and sane kernel design principles out of the window.

- If we code this in a really ugly, unmaintainable, incompatible,
  incomprehensible way, we can save 3 cycles in the slow path.  Let's
  do it.

- All our code lives in a separately maintained tree, and everyone
  wanting to use Linux on our CPUs will have to use our 'Cirrus Logic
  Linux 2.6.8.1 version 1.3.2' release[*].  We can't be bothered to
  merge with upstream because we are special.

- There's nothing wrong with a function having 500 lines.

- We are using 2.4.6 because it is more stable than all that
  newfangled 2.6 stuff.

- etc.

It does piss me off from time to time that these people are out there,
and that they claim to be speaking in my name when they spout their
nonsense.

For all the 'non-embedded' folks out there: the next time you hear
someone claiming to be representing 'the embedded world', please take
whatever they say with a bag of salt.


[*] Seriously, I didn't make this up.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH,try2 2/3] ep93xx_eth: fix unlikely(x) y test

2006-10-30 Thread Lennert Buytenhek

Fix unlikely(x)  y test in ep93xx_eth.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
===
--- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
+++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
@@ -334,7 +334,7 @@ static int ep93xx_xmit(struct sk_buff *s
struct ep93xx_priv *ep = netdev_priv(dev);
int entry;
 
-   if (unlikely(skb-len)  MAX_PKT_SIZE) {
+   if (unlikely(skb-len  MAX_PKT_SIZE)) {
ep-stats.tx_dropped++;
dev_kfree_skb(skb);
return NETDEV_TX_OK;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH,try3 3/3] ep93xx_eth: don't report RX errors

2006-10-30 Thread Lennert Buytenhek

Flooding the console with error messages for every RX FIFO overrun,
checksum error and framing error isn't very sensible.  Each of these
errors can occur during normal operation, so stop printk'ing error
messages for RX errors at all.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
===
--- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
+++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
@@ -230,9 +230,6 @@ static int ep93xx_rx(struct net_device *
  %.8x %.8x\n, rstat0, rstat1);
 
if (!(rstat0  RSTAT0_RWE)) {
-   printk(KERN_NOTICE ep93xx_rx: receive error 
- %.8x %.8x\n, rstat0, rstat1);
-
ep-stats.rx_errors++;
if (rstat0  RSTAT0_OE)
ep-stats.rx_fifo_errors++;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH,try2 0/3] ep93xx_eth: three fixes for 2.6.19

2006-10-30 Thread Lennert Buytenhek

This patchset fixes three issues in ep93xx_eth.  The first fix is for
an RX/TX lockup bug due to mishandling of the RX/TXstatus rings in the
driver, and is a showstopper.  The second and third aren't really
showstopper bugs, but real issues nevertheless, and easy enough to fix.

In this new queue, I've replaced the third patch, which modified
ep93xx_eth to only printk for non-FIFO overrun-type RX errors, by a
patch which stops ep93xx_eth reporting RX errors at all, since it makes
little sense, and the regular error counters should be sufficient.

Please apply for 2.6.19 -- thanks!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/3] ep93xx_eth: three fixes for 2.6.19

2006-10-29 Thread Lennert Buytenhek

This patchset fixes three issues in ep93xx_eth.  The first fix is for
an RX/TX lockup bug due to mishandling of the RX/TXstatus rings in the
driver, and is a showstopper.  The second and third aren't really
showstopper bugs, but real issues nevertheless, and easy enough to fix.

Please apply for 2.6.19 -- thanks!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] ep93xx_eth: fix RX/TXstatus ring full handling

2006-10-29 Thread Lennert Buytenhek

Ray Lehtiniemi reported that an incoming UDP packet flood can lock up
the ep93xx ethernet driver.  Herbert Valerio Riedel noted that due to
the way ep93xx_eth manages the RX/TXstatus rings, it cannot distinguish
a full ring from an empty one, and correctly suggested that this was
likely to be causing this lockup to occur.

Instead of looking at the hardware's RX/TXstatus ring write pointers
to determine when to stop reading from those rings, we should just check
every individual RX/TXstatus descriptor's valid bit instead, since there
is no other way to distinguish an empty ring from a full ring, and if
there is a descriptor waiting, we take the hit of reading the descriptor
from memory anyway.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
===
--- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
+++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
@@ -193,12 +193,9 @@ static struct net_device_stats *ep93xx_g
 static int ep93xx_rx(struct net_device *dev, int *budget)
 {
struct ep93xx_priv *ep = netdev_priv(dev);
-   int tail_offset;
int rx_done;
int processed;
 
-   tail_offset = rdl(ep, REG_RXSTSQCURADD) - ep-descs_dma_addr;
-
rx_done = 0;
processed = 0;
while (*budget  0) {
@@ -211,28 +208,23 @@ static int ep93xx_rx(struct net_device *
 
entry = ep-rx_pointer;
rstat = ep-descs-rstat + entry;
-   if ((void *)rstat - (void *)ep-descs == tail_offset) {
+
+   rstat0 = rstat-rstat0;
+   rstat1 = rstat-rstat1;
+   if (!(rstat0  RSTAT0_RFP) || !(rstat1  RSTAT1_RFP)) {
rx_done = 1;
break;
}
 
-   rstat0 = rstat-rstat0;
-   rstat1 = rstat-rstat1;
rstat-rstat0 = 0;
rstat-rstat1 = 0;
 
-   if (!(rstat0  RSTAT0_RFP))
-   printk(KERN_CRIT ep93xx_rx: buffer not done 
- %.8x %.8x\n, rstat0, rstat1);
if (!(rstat0  RSTAT0_EOF))
printk(KERN_CRIT ep93xx_rx: not end-of-frame 
  %.8x %.8x\n, rstat0, rstat1);
if (!(rstat0  RSTAT0_EOB))
printk(KERN_CRIT ep93xx_rx: not end-of-buffer 
  %.8x %.8x\n, rstat0, rstat1);
-   if (!(rstat1  RSTAT1_RFP))
-   printk(KERN_CRIT ep93xx_rx: buffer1 not done 
- %.8x %.8x\n, rstat0, rstat1);
if ((rstat1  RSTAT1_BUFFER_INDEX)  16 != entry)
printk(KERN_CRIT ep93xx_rx: entry mismatch 
  %.8x %.8x\n, rstat0, rstat1);
@@ -301,13 +293,8 @@ err:
 
 static int ep93xx_have_more_rx(struct ep93xx_priv *ep)
 {
-   struct ep93xx_rstat *rstat;
-   int tail_offset;
-
-   rstat = ep-descs-rstat + ep-rx_pointer;
-   tail_offset = rdl(ep, REG_RXSTSQCURADD) - ep-descs_dma_addr;
-
-   return !((void *)rstat - (void *)ep-descs == tail_offset);
+   struct ep93xx_rstat *rstat = ep-descs-rstat + ep-rx_pointer;
+   return !!((rstat-rstat0  RSTAT0_RFP)  (rstat-rstat1  RSTAT1_RFP));
 }
 
 static int ep93xx_poll(struct net_device *dev, int *budget)
@@ -379,10 +366,8 @@ static int ep93xx_xmit(struct sk_buff *s
 static void ep93xx_tx_complete(struct net_device *dev)
 {
struct ep93xx_priv *ep = netdev_priv(dev);
-   int tail_offset;
int wake;
 
-   tail_offset = rdl(ep, REG_TXSTSQCURADD) - ep-descs_dma_addr;
wake = 0;
 
spin_lock(ep-tx_pending_lock);
@@ -393,15 +378,13 @@ static void ep93xx_tx_complete(struct ne
 
entry = ep-tx_clean_pointer;
tstat = ep-descs-tstat + entry;
-   if ((void *)tstat - (void *)ep-descs == tail_offset)
-   break;
 
tstat0 = tstat-tstat0;
+   if (!(tstat0  TSTAT0_TXFP))
+   break;
+
tstat-tstat0 = 0;
 
-   if (!(tstat0  TSTAT0_TXFP))
-   printk(KERN_CRIT ep93xx_tx_complete: buffer not done 
- %.8x\n, tstat0);
if (tstat0  TSTAT0_FA)
printk(KERN_CRIT ep93xx_tx_complete: frame aborted 
  %.8x\n, tstat0);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] ep93xx_eth: don't report RX FIFO overrun errors

2006-10-29 Thread Lennert Buytenhek

On Sun, Oct 29, 2006 at 11:15:28AM -0700, Ray Lehtiniemi wrote:

  Index: linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
  ===
  --- linux-2.6.19-rc3.orig/drivers/net/arm/ep93xx_eth.c
  +++ linux-2.6.19-rc3/drivers/net/arm/ep93xx_eth.c
  @@ -230,8 +230,9 @@ static int ep93xx_rx(struct net_device *
%.8x %.8x\n, rstat0, rstat1);
 
  if (!(rstat0  RSTAT0_RWE)) {
  -   printk(KERN_NOTICE ep93xx_rx: receive error 
  - %.8x %.8x\n, rstat0, rstat1);
  +   if (!(rstat0  RSTAT_OE))
  +   printk(KERN_NOTICE ep93xx_rx: receive error 
  +   %.8x %.8x\n, rstat0, rstat1);
 
  ep-stats.rx_errors++;
  if (rstat0  RSTAT0_OE)
 
 i got a compile error: please s/RSTAT_OE/RSTAT0_OE/ in this patch.

Whoops, I thought I sent the right one.  :(


 Also, is it possible for any other error bits to be set at the same
 time as OE?  such bits would not be printed to the log in this case.

Not sure, but arguably, this wouldn't be very interesting.  Actually,
now I'm wondering whether we should just remove the printk altogether.


cheers,
Lennert
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.4/2.6 share in linux routers ?

2006-10-28 Thread Lennert Buytenhek

On Fri, Oct 27, 2006 at 11:47:52PM +0200, Yakov Lerner wrote:

 I'd like to find/gather estimates about 2.4 vs 2.6 share in  [small]
 linux routers in 2006. Can anyone offer estimates and/or references ?

For ARM devices, 2.4 is still definitely in the majority.

The reason for that appears to be that embedded linux distro vendors
like locking their customers into their own patched-to-hell once-looked-
like-something-2.4-ish kernels, under the guise of a load of 2.6 is too
unstable FUD.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH,RFC] bridge: call eth_type_trans() in br_pass_frame_up()

2006-10-18 Thread Lennert Buytenhek

Hi,

I've been seeing a failure to reply to incoming ARP packets on a bridge
interface until after the first few packets have been transmitted over
that interface, and the patch below seems to fix the issue, the 'issue'
being that the incoming ARP packets are marked with PACKET_OTHERHOST,
and there not being anything to set that back to PACKET_HOST even if
the destination MAC address matches the bridge interface's MAC address.

If this looks good, I'll prepare a proper commit message.


cheers,
Lennert

Signed-off-by: Tom Billman [EMAIL PROTECTED]
Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

--- linux-2.6.19-rc2.orig/net/bridge/br_input.c 2006-10-18 11:11:08.0 
+0200
+++ linux-2.6.19-rc2/net/bridge/br_input.c  2006-10-18 11:10:08.0 
+0200
@@ -32,6 +32,9 @@
indev = skb-dev;
skb-dev = br-dev;
 
+   skb_push(skb, ETH_HLEN);
+   skb-protocol = eth_type_trans(skb, skb-dev);
+
NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
netif_receive_skb);
 }
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Suppress / delay SYN-ACK

2006-10-16 Thread Lennert Buytenhek

On Thu, Oct 12, 2006 at 10:08:53AM +0200, Martin Schiller wrote:

 I'm searching for a solution to suppress / delay the SYN-ACK packet of a
 listening server (-application) until he has decided (e.g. analysed the
 requesting ip-address or checked if the corresponding other end of a
 connection is available) if he wants to accept the connect request of the
 client. If not, it should be possible to reject the connect request.

I wrote something like this a couple of years ago:

http://marc.theaimsgroup.com/?l=linux-netdevm=103666165629419w=2
http://marc.theaimsgroup.com/?l=linux-netdevm=106089519611631w=2

There wasn't a whole lot of external interest, and my need for it
disappeared, so I never really finished it, and there's a couple of
unfixed bugs,


cheers,
Lennert
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] cirrus logic ep93xx ethernet driver

2006-09-21 Thread Lennert Buytenhek

The cirrus ep93xx is an ARM SoC that includes an ethernet MAC --
this patch adds a driver for that ethernet MAC.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.18/drivers/net/arm/Kconfig
===
--- linux-2.6.18.orig/drivers/net/arm/Kconfig
+++ linux-2.6.18/drivers/net/arm/Kconfig
@@ -39,3 +39,10 @@ config ARM_AT91_ETHER
help
  If you wish to compile a kernel for the AT91RM9200 and enable
  ethernet support, then you should always answer Y to this.
+
+config EP93XX_ETH
+   tristate EP93xx Ethernet support
+   depends on NET_ETHERNET  ARM  ARCH_EP93XX
+   help
+ This is a driver for the ethernet hardware included in EP93xx CPUs.
+ Say Y if you are building a kernel for EP93xx based devices.
Index: linux-2.6.18/drivers/net/arm/Makefile
===
--- linux-2.6.18.orig/drivers/net/arm/Makefile
+++ linux-2.6.18/drivers/net/arm/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_ARM_ETHERH)+= etherh.o
 obj-$(CONFIG_ARM_ETHER3)   += ether3.o
 obj-$(CONFIG_ARM_ETHER1)   += ether1.o
 obj-$(CONFIG_ARM_AT91_ETHER)   += at91_ether.o
+obj-$(CONFIG_EP93XX_ETH)   += ep93xx_eth.o
Index: linux-2.6.18/drivers/net/arm/ep93xx_eth.c
===
--- /dev/null
+++ linux-2.6.18/drivers/net/arm/ep93xx_eth.c
@@ -0,0 +1,856 @@
+/*
+ * EP93xx ethernet network device driver
+ * Copyright (C) 2006 Lennert Buytenhek [EMAIL PROTECTED]
+ * Dedicated to Marija Kulikova.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include linux/config.h
+#include linux/dma-mapping.h
+#include linux/module.h
+#include linux/kernel.h
+#include linux/netdevice.h
+#include linux/mii.h
+#include linux/etherdevice.h
+#include linux/ethtool.h
+#include linux/init.h
+#include linux/moduleparam.h
+#include linux/platform_device.h
+#include linux/delay.h
+#include asm/arch/ep93xx-regs.h
+#include asm/arch/platform.h
+#include asm/io.h
+#include ep93xx_eth.h
+
+#define DRV_MODULE_NAMEep93xx-eth
+#define DRV_MODULE_VERSION 0.1
+
+#define RX_QUEUE_ENTRIES   64
+#define TX_QUEUE_ENTRIES   8
+
+#define MAX_PKT_SIZE   2044
+#define PKT_BUF_SIZE   2048
+
+struct ep93xx_descs
+{
+   struct ep93xx_rdesc rdesc[RX_QUEUE_ENTRIES];
+   struct ep93xx_tdesc tdesc[TX_QUEUE_ENTRIES];
+   struct ep93xx_rstat rstat[RX_QUEUE_ENTRIES];
+   struct ep93xx_tstat tstat[TX_QUEUE_ENTRIES];
+};
+
+struct ep93xx_priv
+{
+   struct resource *res;
+   void*base_addr;
+   int irq;
+
+   struct ep93xx_descs *descs;
+   dma_addr_t  descs_dma_addr;
+
+   void*rx_buf[RX_QUEUE_ENTRIES];
+   void*tx_buf[TX_QUEUE_ENTRIES];
+
+   spinlock_t  rx_lock;
+   int rx_pointer;
+   int tx_clean_pointer;
+   int tx_pointer;
+   spinlock_t  tx_pending_lock;
+   int tx_pending;
+
+   struct net_device_stats stats;
+
+   struct mii_if_info  mii;
+   u8  mdc_divisor;
+};
+
+#define rdb(ep, off)   __raw_readb((ep)-base_addr + (off))
+#define rdw(ep, off)   __raw_readw((ep)-base_addr + (off))
+#define rdl(ep, off)   __raw_readl((ep)-base_addr + (off))
+#define wrb(ep, off, val)  __raw_writeb((val), (ep)-base_addr + (off))
+#define wrw(ep, off, val)  __raw_writew((val), (ep)-base_addr + (off))
+#define wrl(ep, off, val)  __raw_writel((val), (ep)-base_addr + (off))
+
+static int ep93xx_mdio_read(struct net_device *dev, int phy_id, int reg);
+
+static struct net_device_stats *ep93xx_get_stats(struct net_device *dev)
+{
+   struct ep93xx_priv *ep = netdev_priv(dev);
+   return (ep-stats);
+}
+
+static int ep93xx_rx(struct net_device *dev, int *budget)
+{
+   struct ep93xx_priv *ep = netdev_priv(dev);
+   int tail_offset;
+   int rx_done;
+   int processed;
+
+   tail_offset = rdl(ep, REG_RXSTSQCURADD) - ep-descs_dma_addr;
+
+   rx_done = 0;
+   processed = 0;
+   while (*budget  0) {
+   int entry;
+   struct ep93xx_rstat *rstat;
+   u32 rstat0;
+   u32 rstat1;
+   int length;
+   struct sk_buff *skb;
+
+   entry = ep-rx_pointer;
+   rstat = ep-descs-rstat + entry;
+   if ((void *)rstat - (void *)ep-descs == tail_offset) {
+   rx_done = 1;
+   break

[PATCH] Cirrus Logic ep93xx ethernet driver

2006-09-21 Thread Lennert Buytenhek

The Cirrus Logic ep93xx is an ARM SoC that includes an ethernet MAC
-- this patch adds a driver for that ethernet MAC.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.18/drivers/net/arm/Kconfig
===
--- linux-2.6.18.orig/drivers/net/arm/Kconfig
+++ linux-2.6.18/drivers/net/arm/Kconfig
@@ -39,3 +39,10 @@ config ARM_AT91_ETHER
help
  If you wish to compile a kernel for the AT91RM9200 and enable
  ethernet support, then you should always answer Y to this.
+
+config EP93XX_ETH
+   tristate EP93xx Ethernet support
+   depends on NET_ETHERNET  ARM  ARCH_EP93XX
+   help
+ This is a driver for the ethernet hardware included in EP93xx CPUs.
+ Say Y if you are building a kernel for EP93xx based devices.
Index: linux-2.6.18/drivers/net/arm/Makefile
===
--- linux-2.6.18.orig/drivers/net/arm/Makefile
+++ linux-2.6.18/drivers/net/arm/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_ARM_ETHERH)+= etherh.o
 obj-$(CONFIG_ARM_ETHER3)   += ether3.o
 obj-$(CONFIG_ARM_ETHER1)   += ether1.o
 obj-$(CONFIG_ARM_AT91_ETHER)   += at91_ether.o
+obj-$(CONFIG_EP93XX_ETH)   += ep93xx_eth.o
Index: linux-2.6.18/drivers/net/arm/ep93xx_eth.c
===
--- /dev/null
+++ linux-2.6.18/drivers/net/arm/ep93xx_eth.c
@@ -0,0 +1,944 @@
+/*
+ * EP93xx ethernet network device driver
+ * Copyright (C) 2006 Lennert Buytenhek [EMAIL PROTECTED]
+ * Dedicated to Marija Kulikova.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include linux/config.h
+#include linux/dma-mapping.h
+#include linux/module.h
+#include linux/kernel.h
+#include linux/netdevice.h
+#include linux/mii.h
+#include linux/etherdevice.h
+#include linux/ethtool.h
+#include linux/init.h
+#include linux/moduleparam.h
+#include linux/platform_device.h
+#include linux/delay.h
+#include asm/arch/ep93xx-regs.h
+#include asm/arch/platform.h
+#include asm/io.h
+
+#define DRV_MODULE_NAMEep93xx-eth
+#define DRV_MODULE_VERSION 0.1
+
+#define RX_QUEUE_ENTRIES   64
+#define TX_QUEUE_ENTRIES   8
+
+#define MAX_PKT_SIZE   2044
+#define PKT_BUF_SIZE   2048
+
+#define REG_RXCTL  0x
+#define  REG_RXCTL_DEFAULT 0x00073800
+#define REG_TXCTL  0x0004
+#define  REG_TXCTL_ENABLE  0x0001
+#define REG_MIICMD 0x0010
+#define  REG_MIICMD_READ   0x8000
+#define  REG_MIICMD_WRITE  0x4000
+#define REG_MIIDATA0x0014
+#define REG_MIISTS 0x0018
+#define  REG_MIISTS_BUSY   0x0001
+#define REG_SELFCTL0x0020
+#define  REG_SELFCTL_RESET 0x0001
+#define REG_INTEN  0x0024
+#define  REG_INTEN_TX  0x0008
+#define  REG_INTEN_RX  0x0007
+#define REG_INTSTSP0x0028
+#define  REG_INTSTS_TX 0x0008
+#define  REG_INTSTS_RX 0x0004
+#define REG_INTSTSC0x002c
+#define REG_AFP0x004c
+#define REG_INDAD0 0x0050
+#define REG_INDAD1 0x0051
+#define REG_INDAD2 0x0052
+#define REG_INDAD3 0x0053
+#define REG_INDAD4 0x0054
+#define REG_INDAD5 0x0055
+#define REG_GIINTMSK   0x0064
+#define  REG_GIINTMSK_ENABLE   0x8000
+#define REG_BMCTL  0x0080
+#define  REG_BMCTL_ENABLE_TX   0x0100
+#define  REG_BMCTL_ENABLE_RX   0x0001
+#define REG_BMSTS  0x0084
+#define  REG_BMSTS_RX_ACTIVE   0x0008
+#define REG_RXDQBADD   0x0090
+#define REG_RXDQBLEN   0x0094
+#define REG_RXDCURADD  0x0098
+#define REG_RXDENQ 0x009c
+#define REG_RXSTSQBADD 0x00a0
+#define REG_RXSTSQBLEN 0x00a4
+#define REG_RXSTSQCURADD   0x00a8
+#define REG_RXSTSENQ   0x00ac
+#define REG_TXDQBADD   0x00b0
+#define REG_TXDQBLEN   0x00b4
+#define REG_TXDQCURADD 0x00b8
+#define REG_TXDENQ 0x00bc
+#define REG_TXSTSQBADD 0x00c0
+#define REG_TXSTSQBLEN 0x00c4
+#define REG_TXSTSQCURADD   0x00c8
+#define REG_MAXFRMLEN  0x00e8
+
+struct ep93xx_rdesc
+{
+   u32 buf_addr;
+   u32 rdesc1;
+};
+
+#define RDESC1_NSOF0x8000
+#define RDESC1_BUFFER_INDEX0x7fff
+#define RDESC1_BUFFER_LENGTH   0x
+
+struct ep93xx_rstat
+{
+   u32 rstat0;
+   u32 rstat1;
+};
+
+#define RSTAT0_RFP 0x8000
+#define RSTAT0_RWE 0x4000
+#define RSTAT0_EOF 0x2000
+#define RSTAT0_EOB 0x1000
+#define RSTAT0_AM

Re: [PATCH] cirrus logic ep93xx ethernet driver

2006-09-21 Thread Lennert Buytenhek

On Thu, Sep 21, 2006 at 07:10:02PM -0400, Jeff Garzik wrote:

 +if (!(rstat0  RSTAT0_RFP)) {
 +printk(KERN_CRIT ep93xx_rx: buffer not done 
 +  %.8x %.8x\n, rstat0, rstat1);
 +BUG();
 +}
 +if (!(rstat0  RSTAT0_EOF)) {
 +printk(KERN_CRIT ep93xx_rx: not end-of-frame 
 +  %.8x %.8x\n, rstat0, rstat1);
 +BUG();
 +}
 +if (!(rstat0  RSTAT0_EOB)) {
 +printk(KERN_CRIT ep93xx_rx: not end-of-buffer 
 +  %.8x %.8x\n, rstat0, rstat1);
 +BUG();
 +}
 +if (!(rstat1  RSTAT1_RFP)) {
 +printk(KERN_CRIT ep93xx_rx: buffer1 not done 
 +  %.8x %.8x\n, rstat0, rstat1);
 +BUG();
 +}
 +if ((rstat1  RSTAT1_BUFFER_INDEX)  16 != entry) {
 +printk(KERN_CRIT ep93xx_rx: entry mismatch 
 +  %.8x %.8x\n, rstat0, rstat1);
 +BUG();
 +}
 
 NAK all these BUGs.
 
 Very unfriendly

If any of these checks trigger, we are in a very bad state, and
something is likely trampling over random bits of memory, but OK,
removed.


 +if (tstat0  TSTAT0_TXWE) {
 +int length = ep-descs-tdesc[entry].tdesc1  0xfff;
 +
 +ep-stats.tx_packets++;
 +ep-stats.tx_bytes += length;
 +} else {
 +ep-stats.tx_errors++;
 +}
 +#if 0
 +/* This is only valid in half duplex mode.  */
 +if (tstat0  TSTAT0_LCRS)
 +ep-stats.tx_carrier_errors++;
 +#endif
 
 why #if 0'd?

The CRS bit will be set in the tx completion entry if the MII CRS
(Carrier Sense) signal wasn't asserted after the first N nibbles have
been transmitted.  However, if the PHY is running in full duplex mode,
the CRS signal isn't supposed to assert at all, and so we ended up
counting tx_carrier_errors for every transmitted packet if the
interface was in full duplex mode.

I removed this bit of code rather than #if 0'ing it out.


 +static irqreturn_t ep93xx_irq(int irq, void *dev_id, struct pt_regs *regs)
 +{
 +struct net_device *dev = dev_id;
 +struct ep93xx_priv *ep = netdev_priv(dev);
 +u32 status;
 +
 +status = rdl(ep, REG_INTSTSC);
 +if (status == 0)
 +return IRQ_NONE;
 
 also check for status == 0x

As the ethernet controller is on the CPU die itself, it's not very
likely to be unplugged?


 +static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
 +{
 +int i;
 +
 +ep-descs = dma_alloc_coherent(NULL, sizeof(struct ep93xx_descs),
 +ep-descs_dma_addr, GFP_KERNEL | GFP_DMA);
 +if (ep-descs == NULL)
 +return 1;
 +
 +for (i = 0; i  RX_QUEUE_ENTRIES; i += 2) {
 +void *page;
 +dma_addr_t d;
 +
 +page = (void *)get_zeroed_page(GFP_KERNEL | GFP_DMA);
 +if (page == NULL)
 +goto err;
 
 do you really need a zeroed page?

No, any page will do -- fixed.


 +static int ep93xx_eth_remove(struct platform_device *pdev)
 +{
 +struct net_device *dev;
 +struct ep93xx_priv *ep;
 +
 +dev = platform_get_drvdata(pdev);
 +if (dev == NULL)
 +return 0;
 +platform_set_drvdata(pdev, NULL);
 +
 +ep = netdev_priv(dev);
 +
 +/* @@@ Force down.  */
 +unregister_netdev(dev);
 +ep93xx_free_buffers(ep);
 +
 +if (ep-base_addr != NULL)
 +iounmap(ep-base_addr);
 +
 +if (ep-res != NULL) {
 +release_resource(ep-res);
 +kfree(ep-res);
 +}
 
 when will these ever be NULL ?

ep93xx_eth_remove is called from ep93xx_eth_probe's error path (see
below.)

All other issues fixed as well.  Thanks for your time.


 +free_netdev(dev);
 +
 +return 0;
 +}
 +
 +static int ep93xx_eth_probe(struct platform_device *pdev)
 +{
 +struct ep93xx_eth_data *data;
 +struct net_device *dev;
 +struct ep93xx_priv *ep;
 +int err;
 +
 +data = pdev-dev.platform_data;
 +if (pdev == NULL)
 +return -ENODEV;
 +
 +dev = ep93xx_dev_alloc(data);
 +if (dev == NULL) {
 +err = -ENOMEM;
 +goto err_out;
 +}
 +ep = netdev_priv(dev);
 +
 +platform_set_drvdata(pdev, dev);
 +
 +ep-res = request_mem_region(pdev-resource[0].start,
 +pdev-resource[0].end - pdev-resource[0].start + 1,
 +pdev-dev.bus_id);
 +if (ep-res == NULL) {
 +dev_err(pdev-dev, Could not reserve memory region\n);
 +err = -ENOMEM;
 +goto err_out;
 +}
 +
 +ep-base_addr = ioremap(pdev-resource[0].start,
 +pdev-resource[0].end -

Re: [PATCH] EtherIP tunnel driver (RFC 3378)

2006-09-18 Thread Lennert Buytenhek

On Mon, Sep 11, 2006 at 10:41:29PM +0200, Joerg Roedel wrote:

 This driver implements the tunneling of Ethernet packets over IPv4
 networks for Linux. It uses the protocol defined in RFC 3378.

Check out the thread [PATCH][RFC] etherip: Ethernet-in-IPv4 tunneling
that was on netdev in January of 2005 -- a number of arguments against
etherip (and for tunneling ethernet in GRE) were raised back then.

One of the most significant ones, IMHO:

 Another argument against etherip would be that OpenBSD apparently
 mis-implemented etherip by putting the etherip version nibble in the
 second nibble of the etherip header instead of the first, which would
 probably prevent the linux and OpenBSD versions from interoperating,
 negating the advantage of using etherip in the first place.


cheers,
Lennert
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH,RESEND] rtl8150: use default MTU of 1500

2006-09-17 Thread Lennert Buytenhek

The rtl8150 (ethernet) driver uses a default MTU of 1540, which causes
all kinds of problems with for example booting off NFS root.  There isn't
really any reason why we shouldn't use the default of 1500.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

Index: linux-2.6.18-rc2/drivers/usb/net/rtl8150.c
===
--- linux-2.6.18-rc2.orig/drivers/usb/net/rtl8150.c
+++ linux-2.6.18-rc2/drivers/usb/net/rtl8150.c
@@ -867,9 +867,8 @@
netdev-hard_start_xmit = rtl8150_start_xmit;
netdev-set_multicast_list = rtl8150_set_multicast;
netdev-set_mac_address = rtl8150_set_mac_address;
netdev-get_stats = rtl8150_netdev_stats;
-   netdev-mtu = RTL8150_MTU;
SET_ETHTOOL_OPS(netdev, ops);
dev-intr_interval = 100;   /* 100ms */
 
if (!alloc_all_urbs(dev)) {
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Pull request for 'r8169-20060912-00' branch

2006-09-12 Thread Lennert Buytenhek

On Tue, Sep 12, 2006 at 09:35:31PM +0200, Francois Romieu wrote:

 + /*
 +  * Magic spell: some iop3xx ARM board needs the TxDescAddrHigh
 +  * register to be written before TxDescAddrLow to work.
 +  * Switching from MMIO to I/O access fixes the issue as well.
 +  */

Not that it matters much either way, but my impression was that this
was an 8110SB bug rather than an iop3xx bug, as the same iop3xx ARM
board sports a VIA PCI USB controller and a Silicon Image PCI SATA
controller, which both work without any problems.


cheers,
Lennert
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH,RFC] Re: r8169 driver problem with RTL8110SB chip (on iop3xx ARM board)

2006-09-08 Thread Lennert Buytenhek

On Fri, Sep 08, 2006 at 10:23:36PM +0200, Francois Romieu wrote:

  I suspect it's a chip bug.  I rechecked with I/O space, and that
  works okay, so this artifact (bug) only manifests itself when you
  do the upper write in MMIO space.
 
  Are there any plans to switch r8169 to the iomap API?  Would you
  take a patch if I'd write one?
 
 Given the current state of the r8169 driver, I do not see a lot of
 benefit from the iomap() API in itself.
 
 It could make the switch to I/O read/write easier for strange bugs
 like your but I have an epidermic defiance against I/O ops (much too 
 synchronizing for me: people forget that MMIO will post). I may
 change my mind if bugs start poping up like mushrooms but we are
 hopefully not there yet.
 
 An ordered write with a big sign in front of it to comment the issue
 is good enough for me.
 
 Don't hesitate to protest if you think that I need a clue.

What you say makes sense -- in my case it would have been useful to
have a knob to switch the driver to use I/O ops (since that is what
the vendor driver uses, and the vendor driver works), but bugs like
these are generally rare anyway. and so the added benefit isn't too
big.  OK.


cheers,
Lennert
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFT] e100 driver on ARM

2006-09-04 Thread Lennert Buytenhek

On Mon, Sep 04, 2006 at 06:39:29AM -0400, Jeff Garzik wrote:

 1) Does e100 driver work on ARM?

FWIW, e100 seems to work okay for me on an intel ixp2400 (xscale based)
board, an ixp2850 (xscale based) board and an ixp2350 (xscale3 based)
board.  ixp2350 works both with hardware coherency turned on (cpu
snoops bus) and turned off (manual dma cache clean/invalidate as usual.)

As for the other ARM platforms that I'm interested in / have hardware
for / maintain, the at91/ep93xx/pxa270 don't have PCI, and the other
two (iop32x/iop33x) I can't test because I don't have such systems with
e100 NICs, but I expect those would work, since they're both xscale
based like the ixp2400, and the ixp2400 works.


cheers,
Lennert

-- 
VGER BF report: H 6.97804e-11
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] DM9000 interrupt is hardware dependant

2006-09-04 Thread Lennert Buytenhek

On Mon, Sep 04, 2006 at 10:17:08PM +0200, Jürgen Schindele wrote:

 i made a patch for an PXA270-evalboard with DM9000
 ethernet contoller. The Interrupt can be high- or low-
 active dependant of the wiring of the MDC-(57)pin.
 
 Because of this hardware dependency you shoud be able
 to configure this behaviour in struct resource dm9000_resources[]

Putting it in 'struct dm9000_plat_data' sounds like a much better
idea to me...

Overloading the IRQ number in 'struct resource' is just ugly.


cheers,
Lennert
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 167 matches

Mail list logo