date:20070604

Re: [PATCH 4/4] [IPV4]: Restore old behaviour of default config values

2007-06-04 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 05 Jun 2007 16:31:04 +1000

> [IPV4]: Restore old behaviour of default config values
> 
> Previously inet devices were only constructed when addresses are added
> (or rarely in ipmr).  Therefore the default config values they get are
> the ones at the time of these operations.
> 
> Now that we're creating inet devices earlier, this changes the behaviour
> of default config values in an incompatible way (see bug #8519).
> 
> This patch creates a compromise by setting the default values at the
> same point as before but only for those that have not been explicitly
> set by the user since the inet device's creation.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Also applied, thanks a lot.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/4] [IPV4]: Add default config support after inetdev_init

2007-06-04 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 05 Jun 2007 16:31:03 +1000

> [IPV4]: Add default config support after inetdev_init
> 
> Previously once inetdev_init has been called on a device any changes made
> to ipv4_devconf_dflt would have no effect on that device's configuration.
> 
> This creates a problem since we have moved the point where inetdev_init
> is called from when an address is added to where the device is registered.
> 
> This patch is the first half of a set that tries to mimic the old behaviour
> while still calling inetdev_init.
> 
> It propagates any changes to ipv4_devconf_dflt to those devices that have
> not had the corresponding attribute set.
> 
> The next patch will forcibly set all values at the point where inetdev_init
> was previously called.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Looks good, applied.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/4] [IPV4]: Convert IPv4 devconf to an array

2007-06-04 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 05 Jun 2007 16:31:02 +1000

> [IPV4]: Convert IPv4 devconf to an array
> 
> This patch converts the ipv4_devconf config members (everything except
> sysctl) to an array.  This allows easier manipulation which will be
> needed later on to provide better management of default config values.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/4] [IPV4]: Only panic if inetdev_init fails for loopback

2007-06-04 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 05 Jun 2007 16:31:01 +1000

> [IPV4]: Only panic if inetdev_init fails for loopback
> 
> When I made the inetdev_init call work on all devices I incorrectly
> left in the panic call as well.  It is obviously undesirable to
> panic on an allocation failure for a normal network device.  This
> patch moves the panic call under the loopback if clause.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/4] [IPV4]: Add default config support after inetdev_init

2007-06-04 Thread Herbert Xu

[IPV4]: Add default config support after inetdev_init

Previously once inetdev_init has been called on a device any changes made
to ipv4_devconf_dflt would have no effect on that device's configuration.

This creates a problem since we have moved the point where inetdev_init
is called from when an address is added to where the device is registered.

This patch is the first half of a set that tries to mimic the old behaviour
while still calling inetdev_init.

It propagates any changes to ipv4_devconf_dflt to those devices that have
not had the corresponding attribute set.

The next patch will forcibly set all values at the point where inetdev_init
was previously called.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
---

 include/linux/inetdevice.h |3 +
 net/ipv4/devinet.c |  133 +
 2 files changed, 101 insertions(+), 35 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -3,6 +3,7 @@
 
 #ifdef __KERNEL__
 
+#include 
 #include 
 #include 
 #include 
@@ -12,6 +13,7 @@ struct ipv4_devconf
 {
void*sysctl;
int data[__NET_IPV4_CONF_MAX - 1];
+   DECLARE_BITMAP(state, __NET_IPV4_CONF_MAX - 1);
 };
 
 extern struct ipv4_devconf ipv4_devconf;
@@ -53,6 +55,7 @@ static inline void ipv4_devconf_set(stru
int val)
 {
index--;
+   set_bit(index, in_dev->cnf.state);
in_dev->cnf.data[index] = val;
 }
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1244,6 +1244,91 @@ errout:
 
 #ifdef CONFIG_SYSCTL
 
+static void devinet_copy_dflt_conf(int i)
+{
+   struct net_device *dev;
+
+   read_lock(&dev_base_lock);
+   for_each_netdev(dev) {
+   struct in_device *in_dev;
+   rcu_read_lock();
+   in_dev = __in_dev_get_rcu(dev);
+   if (in_dev && !test_bit(i, in_dev->cnf.state))
+   in_dev->cnf.data[i] = ipv4_devconf_dflt.data[i];
+   rcu_read_unlock();
+   }
+   read_unlock(&dev_base_lock);
+}
+
+static int devinet_conf_proc(ctl_table *ctl, int write,
+struct file* filp, void __user *buffer,
+size_t *lenp, loff_t *ppos)
+{
+   int ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
+
+   if (write) {
+   struct ipv4_devconf *cnf = ctl->extra1;
+   int i = (int *)ctl->data - cnf->data;
+
+   set_bit(i, cnf->state);
+
+   if (cnf == &ipv4_devconf_dflt)
+   devinet_copy_dflt_conf(i);
+   }
+
+   return ret;
+}
+
+static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen,
+  void __user *oldval, size_t __user *oldlenp,
+  void __user *newval, size_t newlen)
+{
+   struct ipv4_devconf *cnf;
+   int *valp = table->data;
+   int new;
+   int i;
+
+   if (!newval || !newlen)
+   return 0;
+
+   if (newlen != sizeof(int))
+   return -EINVAL;
+
+   if (get_user(new, (int __user *)newval))
+   return -EFAULT;
+
+   if (new == *valp)
+   return 0;
+
+   if (oldval && oldlenp) {
+   size_t len;
+
+   if (get_user(len, oldlenp))
+   return -EFAULT;
+
+   if (len) {
+   if (len > table->maxlen)
+   len = table->maxlen;
+   if (copy_to_user(oldval, valp, len))
+   return -EFAULT;
+   if (put_user(len, oldlenp))
+   return -EFAULT;
+   }
+   }
+
+   *valp = new;
+
+   cnf = table->extra1;
+   i = (int *)table->data - cnf->data;
+
+   set_bit(i, cnf->state);
+
+   if (cnf == &ipv4_devconf_dflt)
+   devinet_copy_dflt_conf(i);
+
+   return 1;
+}
+
 void inet_forward_change(void)
 {
struct net_device *dev;
@@ -1302,40 +1387,13 @@ int ipv4_doint_and_flush_strategy(ctl_ta
  void __user *oldval, size_t __user *oldlenp,
  void __user *newval, size_t newlen)
 {
-   int *valp = table->data;
-   int new;
-
-   if (!newval || !newlen)
-   return 0;
-
-   if (newlen != sizeof(int))
-   return -EINVAL;
+   int ret = devinet_conf_sysctl(table, name, nlen, oldval, oldlenp,
+ newval, newlen);
 
-   if (get_user(new, (int __user *)newval))
-   return -EFAULT;
-
-   if (new == *valp)
-   return 0;
-
-   if (oldval && oldlenp) {
-   size_t len;
-
-   if (get_user(len, oldlenp))
-   return -EFAULT;
-
-

[PATCH 2/4] [IPV4]: Convert IPv4 devconf to an array

2007-06-04 Thread Herbert Xu

[IPV4]: Convert IPv4 devconf to an array

This patch converts the ipv4_devconf config members (everything except
sysctl) to an array.  This allows easier manipulation which will be
needed later on to provide better management of default config values.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
---

 drivers/infiniband/hw/amso1100/c2.c |2 
 include/linux/inetdevice.h  |   94 +++-
 net/ipv4/arp.c  |   11 -
 net/ipv4/devinet.c  |  264 ++--
 net/ipv4/igmp.c |   18 +-
 net/ipv4/ipmr.c |   12 -
 net/ipv4/proc.c |2 
 net/ipv4/route.c|   14 -
 net/ipv4/sysctl_net_ipv4.c  |6 
 9 files changed, 163 insertions(+), 260 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2.c 
b/drivers/infiniband/hw/amso1100/c2.c
--- a/drivers/infiniband/hw/amso1100/c2.c
+++ b/drivers/infiniband/hw/amso1100/c2.c
@@ -672,7 +672,7 @@ static int c2_up(struct net_device *netd
 * rdma interface.
 */
in_dev = in_dev_get(netdev);
-   in_dev->cnf.arp_ignore = 1;
+   IN_DEV_CONF_SET(in_dev, ARP_IGNORE, 1);
in_dev_put(in_dev);
 
return 0;
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -10,28 +10,8 @@
 
 struct ipv4_devconf
 {
-   int accept_redirects;
-   int send_redirects;
-   int secure_redirects;
-   int shared_media;
-   int accept_source_route;
-   int rp_filter;
-   int proxy_arp;
-   int bootp_relay;
-   int log_martians;
-   int forwarding;
-   int mc_forwarding;
-   int tag;
-   int arp_filter;
-   int arp_announce;
-   int arp_ignore;
-   int arp_accept;
-   int medium_id;
-   int no_xfrm;
-   int no_policy;
-   int force_igmp_version;
-   int promote_secondaries;
void*sysctl;
+   int data[__NET_IPV4_CONF_MAX - 1];
 };
 
 extern struct ipv4_devconf ipv4_devconf;
@@ -60,30 +40,64 @@ struct in_device
struct rcu_head rcu_head;
 };
 
-#define IN_DEV_FORWARD(in_dev) ((in_dev)->cnf.forwarding)
-#define IN_DEV_MFORWARD(in_dev)(ipv4_devconf.mc_forwarding && 
(in_dev)->cnf.mc_forwarding)
-#define IN_DEV_RPFILTER(in_dev)(ipv4_devconf.rp_filter && 
(in_dev)->cnf.rp_filter)
-#define IN_DEV_SOURCE_ROUTE(in_dev)(ipv4_devconf.accept_source_route && 
(in_dev)->cnf.accept_source_route)
-#define IN_DEV_BOOTP_RELAY(in_dev) (ipv4_devconf.bootp_relay && 
(in_dev)->cnf.bootp_relay)
-
-#define IN_DEV_LOG_MARTIANS(in_dev)(ipv4_devconf.log_martians || 
(in_dev)->cnf.log_martians)
-#define IN_DEV_PROXY_ARP(in_dev)   (ipv4_devconf.proxy_arp || 
(in_dev)->cnf.proxy_arp)
-#define IN_DEV_SHARED_MEDIA(in_dev)(ipv4_devconf.shared_media || 
(in_dev)->cnf.shared_media)
-#define IN_DEV_TX_REDIRECTS(in_dev)(ipv4_devconf.send_redirects || 
(in_dev)->cnf.send_redirects)
-#define IN_DEV_SEC_REDIRECTS(in_dev)   (ipv4_devconf.secure_redirects || 
(in_dev)->cnf.secure_redirects)
-#define IN_DEV_IDTAG(in_dev)   ((in_dev)->cnf.tag)
-#define IN_DEV_MEDIUM_ID(in_dev)   ((in_dev)->cnf.medium_id)
-#define IN_DEV_PROMOTE_SECONDARIES(in_dev) 
(ipv4_devconf.promote_secondaries || (in_dev)->cnf.promote_secondaries)
+#define IPV4_DEVCONF(cnf, attr) ((cnf).data[NET_IPV4_CONF_ ## attr - 1])
+#define IPV4_DEVCONF_ALL(attr) IPV4_DEVCONF(ipv4_devconf, attr)
+
+static inline int ipv4_devconf_get(struct in_device *in_dev, int index)
+{
+   index--;
+   return in_dev->cnf.data[index];
+}
+
+static inline void ipv4_devconf_set(struct in_device *in_dev, int index,
+   int val)
+{
+   index--;
+   in_dev->cnf.data[index] = val;
+}
+
+#define IN_DEV_CONF_GET(in_dev, attr) \
+   ipv4_devconf_get((in_dev), NET_IPV4_CONF_ ## attr)
+#define IN_DEV_CONF_SET(in_dev, attr, val) \
+   ipv4_devconf_set((in_dev), NET_IPV4_CONF_ ## attr, (val))
+
+#define IN_DEV_ANDCONF(in_dev, attr) \
+   (IPV4_DEVCONF_ALL(attr) && IN_DEV_CONF_GET((in_dev), attr))
+#define IN_DEV_ORCONF(in_dev, attr) \
+   (IPV4_DEVCONF_ALL(attr) || IN_DEV_CONF_GET((in_dev), attr))
+#define IN_DEV_MAXCONF(in_dev, attr) \
+   (max(IPV4_DEVCONF_ALL(attr), IN_DEV_CONF_GET((in_dev), attr)))
+
+#define IN_DEV_FORWARD(in_dev) IN_DEV_CONF_GET((in_dev), FORWARDING)
+#define IN_DEV_MFORWARD(in_dev)
(IPV4_DEVCONF_ALL(MC_FORWARDING) && \
+IPV4_DEVCONF((in_dev)->cnf, \
+ MC_FORWARDING))
+#define IN_DEV_RPFILTER(in_dev)IN_DEV_ANDCONF((in_dev), 
RP_FILTER)
+#define IN_DEV_SOURCE_ROUTE(in_dev)IN_DEV_ANDCONF((in_dev), \
+

[PATCH 4/4] [IPV4]: Restore old behaviour of default config values

2007-06-04 Thread Herbert Xu

[IPV4]: Restore old behaviour of default config values

Previously inet devices were only constructed when addresses are added
(or rarely in ipmr).  Therefore the default config values they get are
the ones at the time of these operations.

Now that we're creating inet devices earlier, this changes the behaviour
of default config values in an incompatible way (see bug #8519).

This patch creates a compromise by setting the default values at the
same point as before but only for those that have not been explicitly
set by the user since the inet device's creation.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
---

 include/linux/inetdevice.h |6 +-
 net/ipv4/devinet.c |   19 ---
 net/ipv4/ipmr.c|   15 +++
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -59,6 +59,11 @@ static inline void ipv4_devconf_set(stru
in_dev->cnf.data[index] = val;
 }
 
+static inline void ipv4_devconf_setall(struct in_device *in_dev)
+{
+   bitmap_fill(in_dev->cnf.state, __NET_IPV4_CONF_MAX - 1);
+}
+
 #define IN_DEV_CONF_GET(in_dev, attr) \
ipv4_devconf_get((in_dev), NET_IPV4_CONF_ ## attr)
 #define IN_DEV_CONF_SET(in_dev, attr, val) \
@@ -125,7 +130,6 @@ extern struct net_device*ip_dev_find(_
 extern int inet_addr_onlink(struct in_device *in_dev, __be32 a, 
__be32 b);
 extern int devinet_ioctl(unsigned int cmd, void __user *);
 extern voiddevinet_init(void);
-extern struct in_device *inetdev_init(struct net_device *dev);
 extern struct in_device*inetdev_by_index(int);
 extern __be32  inet_select_addr(const struct net_device *dev, __be32 
dst, int scope);
 extern __be32  inet_confirm_addr(const struct net_device *dev, __be32 
dst, __be32 local, int scope);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -147,7 +147,7 @@ void in_dev_finish_destroy(struct in_dev
}
 }
 
-struct in_device *inetdev_init(struct net_device *dev)
+static struct in_device *inetdev_init(struct net_device *dev)
 {
struct in_device *in_dev;
 
@@ -405,12 +405,10 @@ static int inet_set_ifa(struct net_devic
ASSERT_RTNL();
 
if (!in_dev) {
-   in_dev = inetdev_init(dev);
-   if (!in_dev) {
-   inet_free_ifa(ifa);
-   return -ENOBUFS;
-   }
+   inet_free_ifa(ifa);
+   return -ENOBUFS;
}
+   ipv4_devconf_setall(in_dev);
if (ifa->ifa_dev != in_dev) {
BUG_TRAP(!ifa->ifa_dev);
in_dev_hold(in_dev);
@@ -520,13 +518,12 @@ static struct in_ifaddr *rtm_to_ifaddr(s
 
in_dev = __in_dev_get_rtnl(dev);
if (in_dev == NULL) {
-   in_dev = inetdev_init(dev);
-   if (in_dev == NULL) {
-   err = -ENOBUFS;
-   goto errout;
-   }
+   err = -ENOBUFS;
+   goto errout;
}
 
+   ipv4_devconf_setall(in_dev);
+
ifa = inet_alloc_ifa();
if (ifa == NULL) {
/*
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -152,9 +152,11 @@ struct net_device *ipmr_new_tunnel(struc
dev->flags |= IFF_MULTICAST;
 
in_dev = __in_dev_get_rtnl(dev);
-   if (in_dev == NULL && (in_dev = inetdev_init(dev)) == 
NULL)
+   if (in_dev == NULL)
goto failure;
-   IN_DEV_CONF_SET(in_dev, RP_FILTER, 0);
+
+   ipv4_devconf_setall(in_dev);
+   IPV4_DEVCONF(in_dev->cnf, RP_FILTER) = 0;
 
if (dev_open(dev))
goto failure;
@@ -218,10 +220,15 @@ static struct net_device *ipmr_reg_vif(v
}
dev->iflink = 0;
 
-   if ((in_dev = inetdev_init(dev)) == NULL)
+   rcu_read_lock();
+   if ((in_dev = __in_dev_get_rcu(dev)) == NULL) {
+   rcu_read_unlock();
goto failure;
+   }
 
-   IN_DEV_CONF_SET(in_dev, RP_FILTER, 0);
+   ipv4_devconf_setall(in_dev);
+   IPV4_DEVCONF(in_dev->cnf, RP_FILTER) = 0;
+   rcu_read_unlock();
 
if (dev_open(dev))
goto failure;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/4] [IPV4]: Only panic if inetdev_init fails for loopback

2007-06-04 Thread Herbert Xu

[IPV4]: Only panic if inetdev_init fails for loopback

When I made the inetdev_init call work on all devices I incorrectly
left in the panic call as well.  It is obviously undesirable to
panic on an allocation failure for a normal network device.  This
patch moves the panic call under the loopback if clause.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
---

 net/ipv4/devinet.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1057,9 +1057,10 @@ static int inetdev_event(struct notifier
if (!in_dev) {
if (event == NETDEV_REGISTER) {
in_dev = inetdev_init(dev);
-   if (!in_dev)
-   panic("devinet: Failed to create loopback\n");
if (dev == &loopback_dev) {
+   if (!in_dev)
+   panic("devinet: "
+ "Failed to create loopback\n");
in_dev->cnf.no_xfrm = 1;
in_dev->cnf.no_policy = 1;
}
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2/4] [IPV4]: Convert IPv4 devconf to an array

2007-06-04 Thread Herbert Xu

On Mon, Jun 04, 2007 at 11:17:54PM -0700, David Miller wrote:
> 
> These array indexes are off by one.

Good catch.  I'll repost this.

> This is the danger in using this "x-1" indexing style.
> Such a mistake is way too easy to make.

Yes, unfortunately the NET_IPV4_* constants are exposed to user-space
so I can't easily change them.  Inventing a new set of constants didn't
seem to be worthwhile.  The other option would be to keep the symbolic
names with a union or explicit pointer calculations for the bitmap, but
this seemed to me to be the least ugly of all the alternatives.

If it's any consolation, this should be the only spot where we use
these constants directly.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2/4] [IPV4]: Convert IPv4 devconf to an array

2007-06-04 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Sat, 2 Jun 2007 20:02:52 +1000

> @@ -64,20 +64,26 @@
>  #include 
>  
>  struct ipv4_devconf ipv4_devconf = {
> - .accept_redirects = 1,
> - .send_redirects =  1,
> - .secure_redirects = 1,
> - .shared_media =   1,
> + .data = {
> + [NET_IPV4_CONF_ACCEPT_REDIRECTS] = 1,
> + [NET_IPV4_CONF_SEND_REDIRECTS] = 1,
> + [NET_IPV4_CONF_SECURE_REDIRECTS] = 1,
> + [NET_IPV4_CONF_SHARED_MEDIA] = 1,
> + },
>  };

These array indexes are off by one.

This is the danger in using this "x-1" indexing style.
Such a mistake is way too easy to make.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Stephen Hemminger

I hope soon to add suspend/resume to the network device class
and remove driver specific suspend/resume from lots of devices.

The class suspend routine would just be:
pci_save_state
dev->stop

resume is
pci_restore_state
dev->open

for many devices that is all they need.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [5/5][BNX2]: Update version and reldate.

2007-06-04 Thread David Miller

From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Mon, 04 Jun 2007 17:01:18 -0700

> [BNX2]: Update version and reldate.
> 
> Update to version 1.5.11.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Also applied.

I'd like to note in passing since I saw it in the context of this diff
that I find the RUN_AT() macro confusing.

It's name implies that the argument given to it is absolute.  Instead,
the argument is relative and it's return value is absolute.

Maybe it's just me :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [4/5][BNX2]: Fix occasional counter corruption on 5708.

2007-06-04 Thread David Miller

From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Mon, 04 Jun 2007 17:01:08 -0700

> [BNX2]: Fix occasional counter corruption on 5708.
> 
> The statistics block DMA on 5708 can be messed up occasionally on the
> average of about once per hour.  If the user is reading the counters
> within one second after the corruption, the counters will be all
> messed up.  One second later, the counters will be ok again until the
> next corruption occurs.
> 
> The workaround is to disable the periodic statistics DMA.  Instead,
> we manually trigger the DMA once a second in bnx2_timer().  This
> manual trigger of the DMA avoids the problem.
> 
> As a consequence, we can only allow 0 or 1 second settings for
> ethtool -C statistics block.
> 
> Thanks to Jean-Daniel Pauget <[EMAIL PROTECTED]> and
> CaT <[EMAIL PROTECTED]> for reporting this rare problem.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Applied.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [3/5][BNX2]: Enable DMA on 5709.

2007-06-04 Thread David Miller

From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Mon, 04 Jun 2007 17:00:48 -0700

> [BNX2]: Enable DMA on 5709.
> 
> Add missing code to enable DMA on 5709 A1.  The bit is a no-op on
> A0 and therefore can be set on all 5709 chips.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Applied, thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2/5 revised][BNX2]: Add missing wait in bnx2_init_5709_context().

2007-06-04 Thread David Miller

From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Mon, 4 Jun 2007 21:23:51 -0400

> On Mon, Jun 04, 2007 at 05:58:49PM -0700, Michael Chan wrote:
> > [BNX2]: Add missing wait in bnx2_init_5709_context().
> > 
> > For correctness, we need to wait for the MEM_INIT bit to be cleared
> > in the BNX2_CTX_COMMAND register before proceeding.
> > 
> > [Added return -EBUSY when the MEM_INIT bit doesn't clear, suggested
> > by Jeff Garzik.]
> > 
> > Signed-off-by: Michael Chan <[EMAIL PROTECTED]>
> 
> ACK

Applied, thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [1/5][BNX2]: Fix netdev watchdog on 5708.

2007-06-04 Thread David Miller

From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Mon, 04 Jun 2007 17:00:03 -0700

> [BNX2]: Fix netdev watchdog on 5708.
> 
> There's a bug in the driver that only initializes half of the context
> memory on the 5708.  Surprisingly, this works most of the time except
> for some occasional netdev watchdogs when sending a lot of 64-byte
> packets.  The fix is to add the missing code to initialize the 2nd
> halves of all context memory.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Applied, thanks Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] bugfix GFP_KERNEL -> GFP_ATOMIC in spin_locked region

2007-06-04 Thread Andrew Morton

On Mon, 4 Jun 2007 21:00:18 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

> > diff --git a/drivers/usb/serial/io_ti.c b/drivers/usb/serial/io_ti.c
> > index 544098d..9ec38e3 100644
> > --- a/drivers/usb/serial/io_ti.c
> > +++ b/drivers/usb/serial/io_ti.c
> > @@ -2351,7 +2351,7 @@ static int restart_read(struct edgeport_
> > urb->complete = edge_bulk_in_callback;
> > urb->context = edge_port;
> > urb->dev = edge_port->port->serial->dev;
> > -   status = usb_submit_urb(urb, GFP_KERNEL);
> > +   status = usb_submit_urb(urb, GFP_ATOMIC);
> > }
> > edge_port->ep_read_urb_state = EDGE_READ_URB_RUNNING;
> > edge_port->shadow_mcr |= MCR_RTS;
> > diff --git a/drivers/usb/serial/ti_usb_3410_5052.c 
> > b/drivers/usb/serial/ti_usb_3410_5052.c
> > index 4203e2b..10dc36f 100644
> > --- a/drivers/usb/serial/ti_usb_3410_5052.c
> > +++ b/drivers/usb/serial/ti_usb_3410_5052.c
> > @@ -1559,7 +1559,7 @@ static int ti_restart_read(struct ti_por
> > urb->complete = ti_bulk_in_callback;
> > urb->context = tport;
> > urb->dev = tport->tp_port->serial->dev;
> > -   status = usb_submit_urb(urb, GFP_KERNEL);
> > +   status = usb_submit_urb(urb, GFP_ATOMIC);
> > }
> > tport->tp_read_urb_state = TI_READ_URB_RUNNING;
> >  
> > diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
> > index 27c5f8f..1b01207 100644
> > --- a/drivers/usb/serial/whiteheat.c
> > +++ b/drivers/usb/serial/whiteheat.c
> > @@ -1116,7 +1116,7 @@ static int firm_send_command (struct usb
> > memcpy (&transfer_buffer[1], data, datasize);
> > command_port->write_urb->transfer_buffer_length = datasize + 1;
> > command_port->write_urb->dev = port->serial->dev;
> > -   retval = usb_submit_urb (command_port->write_urb, GFP_KERNEL);
> > +   retval = usb_submit_urb (command_port->write_urb, GFP_ATOMIC);
> > if (retval) {
> > dbg("%s - submit urb failed", __FUNCTION__);
> > goto exit;
> > @@ -1316,7 +1316,7 @@ static int start_command_port(struct usb
> > usb_clear_halt(serial->dev, command_port->read_urb->pipe);
> >  
> > command_port->read_urb->dev = serial->dev;
> > -   retval = usb_submit_urb(command_port->read_urb, GFP_KERNEL);
> > +   retval = usb_submit_urb(command_port->read_urb, GFP_ATOMIC);
> > if (retval) {
> > err("%s - failed submitting read urb, error %d", 
> > __FUNCTION__, retval);
> > goto exit;
> > @@ -1363,7 +1363,7 @@ static int start_port_read(struct usb_se
> > wrap = list_entry(tmp, struct whiteheat_urb_wrap, list);
> > urb = wrap->urb;
> > urb->dev = port->serial->dev;
> > -   retval = usb_submit_urb(urb, GFP_KERNEL);
> > +   retval = usb_submit_urb(urb, GFP_ATOMIC);
> > if (retval) {
> > list_add(tmp, &info->rx_urbs_free);
> > list_for_each_safe(tmp, tmp2, &info->rx_urbs_submitted) 
> > {
> 
> This part might make sense so I'll queue it for the USB guys to look at.
> 
> 

Everything in USB appears to already be fixed, apart from the io_ti.c bug.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] bugfix GFP_KERNEL -> GFP_ATOMIC in spin_locked region

2007-06-04 Thread Andrew Morton

On Mon, 04 Jun 2007 18:25:28 +0200 Yoann Padioleau <[EMAIL PROTECTED]> wrote:

> 
> In a few files a function such as usb_submit_urb is taking GFP_KERNEL
> as an argument whereas this function call is inside a
> spin_lock_irqsave region of code. Documentation says that it must be
> GFP_ATOMIC instead.
> 
> Me and my colleagues have made a tool targeting program
> transformations in device drivers. We have designed a scripting
> language for specifying easily program transformations and a
> transformation engine for performing them. In the spirit of Linux
> development practice, the language is based on the patch syntax. We
> call our scripts "semantic patches" because as opposed to traditional
> patches, our semantic patches do not work at the line level but on a
> high level representation of the C program.
> 
> FYI here is our semantic patch detecting invalid use of GFP_KERNEL and
> fixing the problem:
> 
> @@
> identifier fn;
> @@
> 
>  spin_lock_irqsave(...)
>  ... when != spin_unlock_irqrestore(...)
>  fn(...,
> - GFP_KERNEL
> + GFP_ATOMIC
>,...
>)

I think I read that paper.

> And now the real patch resulting from the automated transformation:
> 
>  net/wan/lmc/lmc_main.c|2 +-
>  scsi/megaraid/megaraid_mm.c   |2 +-
>  usb/serial/io_ti.c|2 +-
>  usb/serial/ti_usb_3410_5052.c |2 +-
>  usb/serial/whiteheat.c|6 +++---
>  5 files changed, 7 insertions(+), 7 deletions(-)

This patch covers three areas of maintainer responsibility, so poor old me
has to split it up and redo the changelogs.  Oh well.

> 
> diff --git a/drivers/net/wan/lmc/lmc_main.c b/drivers/net/wan/lmc/lmc_main.c
> index ae132c1..750b3ef 100644
> --- a/drivers/net/wan/lmc/lmc_main.c
> +++ b/drivers/net/wan/lmc/lmc_main.c
> @@ -483,7 +483,7 @@ #endif /* end ifdef _DBG_EVENTLOG */
>  break;
>  }
>  
> -data = kmalloc(xc.len, GFP_KERNEL);
> +data = kmalloc(xc.len, GFP_ATOMIC);
>  if(data == 0x0){
>  printk(KERN_WARNING "%s: Failed to allocate 
> memory for copy\n", dev->name);
>  ret = -ENOMEM;

A few lines later we do:

if(copy_from_user(data, xc.data, xc.len))

which also is illegal under spinlock.


Frankly, I think that a better use of this tool would be to just report on
the errors, let humans fix them up.

Nobody maintains this ATM code afaik.

> index e075a52..edee220 100644
> --- a/drivers/scsi/megaraid/megaraid_mm.c
> +++ b/drivers/scsi/megaraid/megaraid_mm.c
> @@ -547,7 +547,7 @@ mraid_mm_attach_buf(mraid_mmadp_t *adp, 
>  
>   kioc->pool_index= right_pool;
>   kioc->free_buf  = 1;
> - kioc->buf_vaddr = pci_pool_alloc(pool->handle, GFP_KERNEL,
> + kioc->buf_vaddr = pci_pool_alloc(pool->handle, GFP_ATOMIC,
>   &kioc->buf_paddr);
>   spin_unlock_irqrestore(&pool->lock, flags);

Again, a better fix would probably be to move the pci_pool_alloc() to
before the spin_lock_irqsave(), so we can continue to use the stronger
GFP_KERNEL.

But the locking in there looks basically nonsensical or wrong anyway.  It
appears that local variable `right_pool' cannot be validly used unless
we're holding pool->lock for the whole duration.

Somebody does maintain the megaraid driver, but I'm not sure who, and 
the MAINTAINERS file isn't very useful.  So I'll spray it around a bit.
We definitely have bugs in there.

> diff --git a/drivers/usb/serial/io_ti.c b/drivers/usb/serial/io_ti.c
> index 544098d..9ec38e3 100644
> --- a/drivers/usb/serial/io_ti.c
> +++ b/drivers/usb/serial/io_ti.c
> @@ -2351,7 +2351,7 @@ static int restart_read(struct edgeport_
>   urb->complete = edge_bulk_in_callback;
>   urb->context = edge_port;
>   urb->dev = edge_port->port->serial->dev;
> - status = usb_submit_urb(urb, GFP_KERNEL);
> + status = usb_submit_urb(urb, GFP_ATOMIC);
>   }
>   edge_port->ep_read_urb_state = EDGE_READ_URB_RUNNING;
>   edge_port->shadow_mcr |= MCR_RTS;
> diff --git a/drivers/usb/serial/ti_usb_3410_5052.c 
> b/drivers/usb/serial/ti_usb_3410_5052.c
> index 4203e2b..10dc36f 100644
> --- a/drivers/usb/serial/ti_usb_3410_5052.c
> +++ b/drivers/usb/serial/ti_usb_3410_5052.c
> @@ -1559,7 +1559,7 @@ static int ti_restart_read(struct ti_por
>   urb->complete = ti_bulk_in_callback;
>   urb->context = tport;
>   urb->dev = tport->tp_port->serial->dev;
> - status = usb_submit_urb(urb, GFP_KERNEL);
> + status = usb_submit_urb(urb, GFP_ATOMIC);
>   }
>   tport->tp_read_urb_state = TI_READ_URB_RUNNING;
>  
> diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
> index 27c5f8f..1b01207 100644
> --- a/drivers/usb/serial/whiteheat.c
> +++ b/drivers/usb/serial/whiteheat.c
> @@ -1116,7

[PATCH] sundance: PHY address form 0, only for device ID 0x0200 (IP100A) (20070605)

2007-06-04 Thread Jesse Huang

From: Jesse Huang <[EMAIL PROTECTED]>

Change Logs:
Search PHY address form 0, only for device ID 0x0200 (IP100A). Other device are 
from PHY address 1.

Signed-off-by: Jesse Huang <[EMAIL PROTECTED]>
---

 drivers/net/sundance.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

a093fe5e3e7b9f58458171e481b5096cc1d3abfe
diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
index e1f912d..23d049d 100644
--- a/drivers/net/sundance.c
+++ b/drivers/net/sundance.c
@@ -562,7 +562,11 @@ #endif
 * It seems some phys doesn't deal well with address 0 being accessed
 * first, so leave address zero to the end of the loop (32 & 31).
 */
-   for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
+   if(sundance_pci_tbl[np->chip_id].device == 0x0200) 
+   phy = 0;
+   else 
+   phy = 1;
+   for (; phy <= 32 && phy_idx < MII_CNT; phy++) {
int phyx = phy & 0x1f;
int mii_status = mdio_read(dev, phyx, MII_BMSR);
if (mii_status != 0x  &&  mii_status != 0x) {
-- 
1.3.GIT

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] sundance: PHY address form 0, only for device ID 0x0200 (IP100A) (20070605)

2007-06-04 Thread Jesse Huang

From: Jesse Huang <[EMAIL PROTECTED]>

Change Logs:
Search PHY address form 0, only for device ID 0x0200 (IP100A). Other device are 
from PHY address 1.

Signed-off-by: Jesse Huang <[EMAIL PROTECTED]>
---

 drivers/net/sundance.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

a093fe5e3e7b9f58458171e481b5096cc1d3abfe
diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
index e1f912d..23d049d 100644
--- a/drivers/net/sundance.c
+++ b/drivers/net/sundance.c
@@ -562,7 +562,11 @@ #endif
 * It seems some phys doesn't deal well with address 0 being accessed
 * first, so leave address zero to the end of the loop (32 & 31).
 */
-   for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
+   if(sundance_pci_tbl[np->chip_id].device == 0x0200) 
+   phy = 0;
+   else 
+   phy = 1;
+   for (; phy <= 32 && phy_idx < MII_CNT; phy++) {
int phyx = phy & 0x1f;
int mii_status = mdio_read(dev, phyx, MII_BMSR);
if (mii_status != 0x  &&  mii_status != 0x) {
-- 
1.3.GIT



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2/5 revised][BNX2]: Add missing wait in bnx2_init_5709_context().

2007-06-04 Thread Jeff Garzik

On Mon, Jun 04, 2007 at 05:58:49PM -0700, Michael Chan wrote:
> [BNX2]: Add missing wait in bnx2_init_5709_context().
> 
> For correctness, we need to wait for the MEM_INIT bit to be cleared
> in the BNX2_CTX_COMMAND register before proceeding.
> 
> [Added return -EBUSY when the MEM_INIT bit doesn't clear, suggested
> by Jeff Garzik.]
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

ACK


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Valerie Henson

On Sun, Jun 03, 2007 at 12:37:35PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Add suspend/resume support to the uli526x network driver (tested on x86_64,
> with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev 
> 40').
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

Nice work!  Looks good to me.

Signed-off-by: Val Henson <[EMAIL PROTECTED]>

-VAL

>  drivers/net/tulip/uli526x.c |  120 
> ++--
>  1 file changed, 105 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6.22-rc3/drivers/net/tulip/uli526x.c
> ===
> --- linux-2.6.22-rc3.orig/drivers/net/tulip/uli526x.c 2007-06-03 
> 12:10:41.0 +0200
> +++ linux-2.6.22-rc3/drivers/net/tulip/uli526x.c  2007-06-03 
> 12:14:33.0 +0200
> @@ -220,6 +220,10 @@ static int mode = 8;
>  static int uli526x_open(struct net_device *);
>  static int uli526x_start_xmit(struct sk_buff *, struct net_device *);
>  static int uli526x_stop(struct net_device *);
> +#ifdef CONFIG_PM
> +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state);
> +static int uli526x_resume(struct pci_dev *pdev);
> +#endif
>  static struct net_device_stats * uli526x_get_stats(struct net_device *);
>  static void uli526x_set_filter_mode(struct net_device *);
>  static const struct ethtool_ops netdev_ethtool_ops;
> @@ -425,21 +429,14 @@ static void __devexit uli526x_remove_one
>  
>  
>  /*
> - *   Open the interface.
> - *   The interface is opened whenever "ifconfig" activates it.
> + *   Bring the interface up.
> + *   Used by uli526x_open and uli526x_resume.
>   */
>  
> -static int uli526x_open(struct net_device *dev)
> +static void uli526x_up(struct net_device *dev)
>  {
> - int ret;
>   struct uli526x_board_info *db = netdev_priv(dev);
>  
> - ULI526X_DBUG(0, "uli526x_open", 0);
> -
> - ret = request_irq(dev->irq, &uli526x_interrupt, IRQF_SHARED, dev->name, 
> dev);
> - if (ret)
> - return ret;
> -
>   /* system variable init */
>   db->cr6_data = CR6_DEFAULT | uli526x_cr6_user_set;
>   db->tx_packet_cnt = 0;
> @@ -467,7 +464,25 @@ static int uli526x_open(struct net_devic
>   db->timer.data = (unsigned long)dev;
>   db->timer.function = &uli526x_timer;
>   add_timer(&db->timer);
> +}
> +
> +
> +/*
> + *   Open the interface.
> + *   The interface is opened whenever "ifconfig" activates it.
> + */
>  
> +static int uli526x_open(struct net_device *dev)
> +{
> + int ret;
> +
> + ULI526X_DBUG(0, "uli526x_open", 0);
> +
> + ret = request_irq(dev->irq, &uli526x_interrupt, IRQF_SHARED, dev->name, 
> dev);
> + if (ret)
> + return ret;
> +
> + uli526x_up(dev);
>   return 0;
>  }
>  
> @@ -613,17 +628,15 @@ static int uli526x_start_xmit(struct sk_
>  
>  
>  /*
> - *   Stop the interface.
> - *   The interface is stopped when it is brought.
> + *   Take the interface down.
> + *   Used by uli526x_stop and uli526x_suspend.
>   */
>  
> -static int uli526x_stop(struct net_device *dev)
> +static void uli526x_down(struct net_device *dev)
>  {
>   struct uli526x_board_info *db = netdev_priv(dev);
>   unsigned long ioaddr = dev->base_addr;
>  
> - ULI526X_DBUG(0, "uli526x_stop", 0);
> -
>   /* disable system */
>   netif_stop_queue(dev);
>  
> @@ -634,6 +647,21 @@ static int uli526x_stop(struct net_devic
>   outl(ULI526X_RESET, ioaddr + DCR0);
>   udelay(5);
>   phy_write(db->ioaddr, db->phy_addr, 0, 0x8000, db->chip_id);
> +}
> +
> +
> +/*
> + *   Stop the interface.
> + *   The interface is stopped when it is brought.
> + */
> +
> +static int uli526x_stop(struct net_device *dev)
> +{
> + struct uli526x_board_info *db = netdev_priv(dev);
> +
> + ULI526X_DBUG(0, "uli526x_stop", 0);
> +
> + uli526x_down(dev);
>  
>   /* free interrupt */
>   free_irq(dev->irq, dev);
> @@ -654,6 +682,64 @@ static int uli526x_stop(struct net_devic
>  }
>  
>  
> +#ifdef CONFIG_PM
> +
> +/*
> + *   Suspend the interface.
> + */
> +
> +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state)
> +{
> + struct net_device *dev = pci_get_drvdata(pdev);
> +
> + ULI526X_DBUG(0, "uli526x_suspend", 0);
> +
> + if (dev && netdev_priv(dev)) {
> + if (netif_running(dev)) {
> + netif_device_detach(dev);
> + uli526x_down(dev);
> + }
> + pci_save_state(pdev);
> + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0);
> + pci_disable_device(pdev);
> + pci_set_power_state(pdev, pci_choose_state(pdev, state));
> + }
> + return 0;
> +}
> +
> +/*
> + *   Resume the interface.
> + */
> +
> +static int uli526x_resume(struct pci_dev *pdev)
> +{
> + struct net_device *dev = pci_get_drvdata(pdev);
> + struct uli526x_board_info *db = netdev_priv(dev);
> + int er

[PATCH 6/7] sky2: Yukon Extreme (88e8071) support.

2007-06-04 Thread Stephen Hemminger

Enable support for Yukon EX chipset (88e8071).
Most of changes are related to new commands to chip for transmit,
and change in status and checksumming.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |  176 +++--
 drivers/net/sky2.h |   43 
 2 files changed, 148 insertions(+), 71 deletions(-)

--- a/drivers/net/sky2.c2007-06-04 08:46:47.0 -0700
+++ b/drivers/net/sky2.c2007-06-04 08:46:54.0 -0700
@@ -130,7 +130,7 @@ static const struct pci_device_id sky2_i
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) }, /* 88EC034 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4369) }, /* 88EC042 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436A) }, /* 88E8058 */
-// { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436B) }, /* 88E8071 */
+   { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436B) }, /* 88E8071 */
{ 0 }
 };
 
@@ -661,6 +661,30 @@ static void sky2_wol_init(struct sky2_po
 
 }
 
+static void sky2_set_tx_stfwd(struct sky2_hw *hw, unsigned port)
+{
+   if (hw->chip_id == CHIP_ID_YUKON_EX && hw->chip_rev != 
CHIP_REV_YU_EX_A0) {
+   sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
+TX_STFW_ENA |
+(hw->dev[port]->mtu > ETH_DATA_LEN) ? TX_JUMBO_ENA 
: TX_JUMBO_DIS);
+   } else {
+   if (hw->dev[port]->mtu > ETH_DATA_LEN) {
+   /* set Tx GMAC FIFO Almost Empty Threshold */
+   sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR),
+(ECU_JUMBO_WM << 16) | ECU_AE_THR);
+
+   sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
+TX_JUMBO_ENA | TX_STFW_DIS);
+
+   /* Can't do offload because of lack of store/forward */
+   hw->dev[port]->features &= ~(NETIF_F_TSO | NETIF_F_SG
+| NETIF_F_ALL_CSUM);
+   } else
+   sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
+TX_JUMBO_DIS | TX_STFW_ENA);
+   }
+}
+
 static void sky2_mac_init(struct sky2_hw *hw, unsigned port)
 {
struct sky2_port *sky2 = netdev_priv(hw->dev[port]);
@@ -741,8 +765,11 @@ static void sky2_mac_init(struct sky2_hw
 
/* Configure Rx MAC FIFO */
sky2_write8(hw, SK_REG(port, RX_GMF_CTRL_T), GMF_RST_CLR);
-   sky2_write32(hw, SK_REG(port, RX_GMF_CTRL_T),
-GMF_OPER_ON | GMF_RX_F_FL_ON);
+   reg = GMF_OPER_ON | GMF_RX_F_FL_ON;
+   if (hw->chip_id == CHIP_ID_YUKON_EX)
+   reg |= GMF_RX_OVER_ON;
+
+   sky2_write32(hw, SK_REG(port, RX_GMF_CTRL_T), reg);
 
/* Flush Rx MAC FIFO on any flow control or error */
sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), GMR_FS_ANY_ERR);
@@ -758,16 +785,7 @@ static void sky2_mac_init(struct sky2_hw
sky2_write8(hw, SK_REG(port, RX_GMF_LP_THR), 768/8);
sky2_write8(hw, SK_REG(port, RX_GMF_UP_THR), 1024/8);
 
-   /* set Tx GMAC FIFO Almost Empty Threshold */
-   sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR),
-(ECU_JUMBO_WM << 16) | ECU_AE_THR);
-
-   if (hw->dev[port]->mtu > ETH_DATA_LEN)
-   sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
-TX_JUMBO_ENA | TX_STFW_DIS);
-   else
-   sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
-TX_JUMBO_DIS | TX_STFW_ENA);
+   sky2_set_tx_stfwd(hw, port);
}
 
 }
@@ -950,14 +968,16 @@ static void rx_set_checksum(struct sky2_
 {
struct sky2_rx_le *le;
 
-   le = sky2_next_rx(sky2);
-   le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN);
-   le->ctrl = 0;
-   le->opcode = OP_TCPSTART | HW_OWNER;
-
-   sky2_write32(sky2->hw,
-Q_ADDR(rxqaddr[sky2->port], Q_CSR),
-sky2->rx_csum ? BMU_ENA_RX_CHKSUM : BMU_DIS_RX_CHKSUM);
+   if (sky2->hw->chip_id != CHIP_ID_YUKON_EX) {
+   le = sky2_next_rx(sky2);
+   le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN);
+   le->ctrl = 0;
+   le->opcode = OP_TCPSTART | HW_OWNER;
+
+   sky2_write32(sky2->hw,
+Q_ADDR(rxqaddr[sky2->port], Q_CSR),
+sky2->rx_csum ? BMU_ENA_RX_CHKSUM : 
BMU_DIS_RX_CHKSUM);
+   }
 
 }
 
@@ -1296,6 +1316,10 @@ static int sky2_up(struct net_device *de
 
sky2_qset(hw, txqaddr[port]);
 
+   /* This is copied from sk98lin 10.0.5.3; no one tells me about 
erratta's */
+   if (hw->chip_id == CHIP_ID_YUKON_EX && hw->chip_rev == 
CHIP_REV_YU_EX_B0)
+   sky2_write32(hw, Q_ADDR(txqaddr[port], Q_TEST), 
F_TX_CHK_AUTO_OFF);
+
/* Set almost em

[PATCH 7/7] sky2: version 1.15

2007-06-04 Thread Stephen Hemminger

New version because of new chip support.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/sky2.c2007-05-31 11:27:07.0 -0700
+++ b/drivers/net/sky2.c2007-05-31 11:27:08.0 -0700
@@ -50,7 +50,7 @@
 #include "sky2.h"
 
 #define DRV_NAME   "sky2"
-#define DRV_VERSION"1.14"
+#define DRV_VERSION"1.15"
 #define PFXDRV_NAME " "
 
 /*

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/7] sky2: Add PCI device specfic register 4 & 5

2007-06-04 Thread Stephen Hemminger

Need to setup more PCI control control registers are on Yukon EX.
Some of these also exist on Yukon EC-U as well.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |   18 
 drivers/net/sky2.h |   77 +
 2 files changed, 89 insertions(+), 6 deletions(-)

--- a/drivers/net/sky2.c2007-06-04 08:46:34.0 -0700
+++ b/drivers/net/sky2.c2007-06-04 08:46:41.0 -0700
@@ -217,13 +217,19 @@ static void sky2_power_on(struct sky2_hw
sky2_write8(hw, B2_Y2_CLK_GATE, 0);
 
if (hw->chip_id == CHIP_ID_YUKON_EC_U || hw->chip_id == 
CHIP_ID_YUKON_EX) {
-   u32 reg1;
+   u32 reg;
 
-   sky2_pci_write32(hw, PCI_DEV_REG3, 0);
-   reg1 = sky2_pci_read32(hw, PCI_DEV_REG4);
-   reg1 &= P_ASPM_CONTROL_MSK;
-   sky2_pci_write32(hw, PCI_DEV_REG4, reg1);
-   sky2_pci_write32(hw, PCI_DEV_REG5, 0);
+   reg = sky2_pci_read32(hw, PCI_DEV_REG4);
+   /* set all bits to 0 except bits 15..12 and 8 */
+   reg &= P_ASPM_CONTROL_MSK;
+   sky2_pci_write32(hw, PCI_DEV_REG4, reg);
+
+   reg = sky2_pci_read32(hw, PCI_DEV_REG5);
+   /* set all bits to 0 except bits 28 & 27 */
+   reg &= P_CTL_TIM_VMAIN_AV_MSK;
+   sky2_pci_write32(hw, PCI_DEV_REG5, reg);
+
+   sky2_pci_write32(hw, PCI_CFG_REG_1, 0);
}
 }
 
--- a/drivers/net/sky2.h2007-06-04 08:45:25.0 -0700
+++ b/drivers/net/sky2.h2007-06-04 08:46:41.0 -0700
@@ -14,6 +14,8 @@ enum {
PCI_DEV_REG3= 0x80,
PCI_DEV_REG4= 0x84,
PCI_DEV_REG5= 0x88,
+   PCI_CFG_REG_0   = 0x90,
+   PCI_CFG_REG_1   = 0x94,
 };
 
 enum {
@@ -28,6 +30,7 @@ enum {
 enum pci_dev_reg_1 {
PCI_Y2_PIG_ENA   = 1<<31, /* Enable Plug-in-Go (YUKON-2) */
PCI_Y2_DLL_DIS   = 1<<30, /* Disable PCI DLL (YUKON-2) */
+   PCI_SW_PWR_ON_RST= 1<<30, /* SW Power on Reset (Yukon-EX) */
PCI_Y2_PHY2_COMA = 1<<29, /* Set PHY 2 to Coma Mode (YUKON-2) */
PCI_Y2_PHY1_COMA = 1<<28, /* Set PHY 1 to Coma Mode (YUKON-2) */
PCI_Y2_PHY2_POWD = 1<<27, /* Set PHY 2 to Power Down (YUKON-2) */
@@ -67,6 +70,80 @@ enum pci_dev_reg_4 {
  | P_ASPM_CLKRUN_REQUEST | 
P_ASPM_INT_FIFO_EMPTY,
 };
 
+/* PCI_OUR_REG_5   32 bit  Our Register 5 (Yukon-ECU only) */
+enum pci_dev_reg_5 {
+   /* Bit 31..27:  for A3 & later */
+   P_CTL_DIV_CORE_CLK_ENA  = 1<<31, /* Divide Core Clock Enable */
+   P_CTL_SRESET_VMAIN_AV   = 1<<30, /* Soft Reset for Vmain_av De-Glitch */
+   P_CTL_BYPASS_VMAIN_AV   = 1<<29, /* Bypass En. for Vmain_av De-Glitch */
+   P_CTL_TIM_VMAIN_AV_MSK  = 3<<27, /* Bit 28..27: Timer Vmain_av Mask */
+/* Bit 26..16: Release Clock on Event 
*/
+   P_REL_PCIE_RST_DE_ASS   = 1<<26, /* PCIe Reset De-Asserted */
+   P_REL_GPHY_REC_PACKET   = 1<<25, /* GPHY Received Packet */
+   P_REL_INT_FIFO_N_EMPTY  = 1<<24, /* Internal FIFO Not Empty */
+   P_REL_MAIN_PWR_AVAIL= 1<<23, /* Main Power Available */
+   P_REL_CLKRUN_REQ_REL= 1<<22, /* CLKRUN Request Release */
+   P_REL_PCIE_RESET_ASS= 1<<21, /* PCIe Reset Asserted */
+   P_REL_PME_ASSERTED  = 1<<20, /* PME Asserted */
+   P_REL_PCIE_EXIT_L1_ST   = 1<<19, /* PCIe Exit L1 State */
+   P_REL_LOADER_NOT_FIN= 1<<18, /* EPROM Loader Not Finished */
+   P_REL_PCIE_RX_EX_IDLE   = 1<<17, /* PCIe Rx Exit Electrical Idle State 
*/
+   P_REL_GPHY_LINK_UP  = 1<<16, /* GPHY Link Up */
+
+   /* Bit 10.. 0: Mask for Gate Clock */
+   P_GAT_PCIE_RST_ASSERTED = 1<<10,/* PCIe Reset Asserted */
+   P_GAT_GPHY_N_REC_PACKET = 1<<9, /* GPHY Not Received Packet */
+   P_GAT_INT_FIFO_EMPTY= 1<<8, /* Internal FIFO Empty */
+   P_GAT_MAIN_PWR_N_AVAIL  = 1<<7, /* Main Power Not Available */
+   P_GAT_CLKRUN_REQ_REL= 1<<6, /* CLKRUN Not Requested */
+   P_GAT_PCIE_RESET_ASS= 1<<5, /* PCIe Reset Asserted */
+   P_GAT_PME_DE_ASSERTED   = 1<<4, /* PME De-Asserted */
+   P_GAT_PCIE_ENTER_L1_ST  = 1<<3, /* PCIe Enter L1 State */
+   P_GAT_LOADER_FINISHED   = 1<<2, /* EPROM Loader Finished */
+   P_GAT_PCIE_RX_EL_IDLE   = 1<<1, /* PCIe Rx Electrical Idle State */
+   P_GAT_GPHY_LINK_DOWN= 1<<0, /* GPHY Link Down */
+
+   PCIE_OUR5_EVENT_CLK_D3_SET = P_REL_GPHY_REC_PACKET |
+P_REL_INT_FIFO_N_EMPTY |
+P_REL_PCIE_EXIT_L1_ST |
+P_REL_PCIE_RX_EX_IDLE |
+P_GAT_GPHY_N_REC_PACKET |
+P_GAT_INT_FIFO_EMPTY |
+

[PATCH 3/7] sky2: rename BMU register

2007-06-04 Thread Stephen Hemminger

This register is more of a test and control register on Yukon2.
So rename it to Q_TEST and give some bit definitions.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |2 +-
 drivers/net/sky2.h |   29 +++--
 2 files changed, 12 insertions(+), 19 deletions(-)

--- a/drivers/net/sky2.c2007-06-04 08:46:41.0 -0700
+++ b/drivers/net/sky2.c2007-06-04 08:46:43.0 -0700
@@ -1140,7 +1140,7 @@ static int sky2_rx_start(struct sky2_por
if (hw->chip_id == CHIP_ID_YUKON_EC_U &&
(hw->chip_rev == CHIP_REV_YU_EC_U_A1
 || hw->chip_rev == CHIP_REV_YU_EC_U_B0))
-   sky2_write32(hw, Q_ADDR(rxq, Q_F), F_M_RX_RAM_DIS);
+   sky2_write32(hw, Q_ADDR(rxq, Q_TEST), F_M_RX_RAM_DIS);
 
sky2_prefetch_init(hw, rxq, sky2->rx_le_map, RX_LE_SIZE - 1);
 
--- a/drivers/net/sky2.h2007-06-04 08:46:41.0 -0700
+++ b/drivers/net/sky2.h2007-06-04 08:46:43.0 -0700
@@ -592,23 +592,15 @@ enum {
 enum {
B8_Q_REGS = 0x0400, /* base of Queue registers */
Q_D = 0x00, /* 8*32 bit Current Descriptor */
-   Q_DA_L  = 0x20, /* 32 bit   Current Descriptor Address Low dWord */
-   Q_DA_H  = 0x24, /* 32 bit   Current Descriptor Address High dWord */
+   Q_VLAN  = 0x20, /* 16 bit   Current VLAN Tag */
+   Q_DONE  = 0x24, /* 16 bit   Done Index */
Q_AC_L  = 0x28, /* 32 bit   Current Address Counter Low dWord */
Q_AC_H  = 0x2c, /* 32 bit   Current Address Counter High dWord */
Q_BC= 0x30, /* 32 bit   Current Byte Counter */
Q_CSR   = 0x34, /* 32 bit   BMU Control/Status Register */
-   Q_F = 0x38, /* 32 bit   Flag Register */
-   Q_T1= 0x3c, /* 32 bit   Test Register 1 */
-   Q_T1_TR = 0x3c, /*  8 bit   Test Register 1 Transfer SM */
-   Q_T1_WR = 0x3d, /*  8 bit   Test Register 1 Write Descriptor SM */
-   Q_T1_RD = 0x3e, /*  8 bit   Test Register 1 Read Descriptor SM */
-   Q_T1_SV = 0x3f, /*  8 bit   Test Register 1 Supervisor SM */
-   Q_T2= 0x40, /* 32 bit   Test Register 2 */
-   Q_T3= 0x44, /* 32 bit   Test Register 3 */
+   Q_TEST  = 0x38, /* 32 bit   Test/Control Register */
 
 /* Yukon-2 */
-   Q_DONE  = 0x24, /* 16 bit   Done Index  (Yukon-2 only) 
*/
Q_WM= 0x40, /* 16 bit   FIFO Watermark */
Q_AL= 0x42, /*  8 bit   FIFO Alignment */
Q_RSP   = 0x44, /* 16 bit   FIFO Read Shadow Pointer */
@@ -622,15 +614,16 @@ enum {
 };
 #define Q_ADDR(reg, offs) (B8_Q_REGS + (reg) + (offs))
 
-/* Q_F 32 bit  Flag Register */
+/* Q_TEST  32 bit  Test Register */
 enum {
-   F_ALM_FULL  = 1<<27, /* Rx FIFO: almost full */
-   F_EMPTY = 1<<27, /* Tx FIFO: empty flag */
-   F_FIFO_EOF  = 1<<26, /* Tag (EOF Flag) bit in FIFO */
-   F_WM_REACHED= 1<<25, /* Watermark reached */
+   /* Transmit */
+   F_TX_CHK_AUTO_OFF = 1<<31, /* Tx checksum auto calc off (Yukon EX) */
+   F_TX_CHK_AUTO_ON  = 1<<30, /* Tx checksum auto calc off (Yukon EX) */
+
+   /* Receive */
F_M_RX_RAM_DIS  = 1<<24, /* MAC Rx RAM Read Port disable */
-   F_FIFO_LEVEL= 0x1fL<<16, /* Bit 23..16: # of Qwords in FIFO */
-   F_WATER_MARK= 0x0007ffL, /* Bit 10.. 0: Watermark */
+
+   /* Hardware testbits not used */
 };
 
 /* Queue Prefetch Unit Offsets, use Y2_QADDR() to address (Yukon-2 only)*/

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/7] sky2: enable clocks before probe

2007-06-04 Thread Stephen Hemminger

Catch-22: On Yukon EX (88E8071) need to have internal clocks enabled
before reading chip id. It is harmless on other chips.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/net/sky2.c2007-06-04 08:46:43.0 -0700
+++ b/drivers/net/sky2.c2007-06-04 08:46:45.0 -0700
@@ -2519,6 +2519,9 @@ static int __devinit sky2_init(struct sk
 {
u8 t8;
 
+   /* Enable all clocks */
+   sky2_pci_write32(hw, PCI_DEV_REG3, 0);
+
sky2_write8(hw, B0_CTST, CS_RST_CLR);
 
hw->chip_id = sky2_read8(hw, B2_CHIP_ID);
@@ -2532,10 +2535,6 @@ static int __devinit sky2_init(struct sk
dev_warn(&hw->pdev->dev, "this driver not yet tested on this 
chip type\n"
 "Please report success or failure to 
\n");
 
-   /* Make sure and enable all clocks */
-   if (hw->chip_id == CHIP_ID_YUKON_EX || hw->chip_id == 
CHIP_ID_YUKON_EC_U)
-   sky2_pci_write32(hw, PCI_DEV_REG3, 0);
-
hw->chip_rev = (sky2_read8(hw, B2_MAC_CFG) & CFG_CHIP_R_MSK) >> 4;
 
/* This rev is really old, and requires untested workarounds */

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/7] sky2: GPIO register

2007-06-04 Thread Stephen Hemminger

The General Purpose I/O register is yet another hardware workaround
catchall. Enable workaround that vendor driver does to stay
but for bug compatiable.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |5 +
 drivers/net/sky2.h |   14 ++
 2 files changed, 19 insertions(+)

--- a/drivers/net/sky2.h2007-06-04 08:46:43.0 -0700
+++ b/drivers/net/sky2.h2007-06-04 08:46:47.0 -0700
@@ -441,6 +441,20 @@ enum {
TST_CFG_WRITE_OFF= 1<<0, /* Disable Config Reg WR */
 };
 
+/* B2_GPIO */
+enum {
+   GLB_GPIO_CLK_DEB_ENA = 1<<31,   /* Clock Debug Enable */
+   GLB_GPIO_CLK_DBG_MSK = 0xf<<26, /* Clock Debug */
+
+   GLB_GPIO_INT_RST_D3_DIS = 1<<15, /* Disable Internal Reset After D3 to 
D0 */
+   GLB_GPIO_LED_PAD_SPEED_UP = 1<<14, /* LED PAD Speed Up */
+   GLB_GPIO_STAT_RACE_DIS  = 1<<13, /* Status Race Disable */
+   GLB_GPIO_TEST_SEL_MSK   = 3<<11, /* Testmode Select */
+   GLB_GPIO_TEST_SEL_BASE  = 1<<11,
+   GLB_GPIO_RAND_ENA   = 1<<10, /* Random Enable */
+   GLB_GPIO_RAND_BIT_1 = 1<<9,  /* Random Bit 1 */
+};
+
 /* B2_MAC_CFG   8 bit  MAC Configuration / Chip Revision */
 enum {
CFG_CHIP_R_MSK= 0xf<<4, /* Bit 7.. 4: Chip Revision */
--- a/drivers/net/sky2.c2007-06-04 08:46:45.0 -0700
+++ b/drivers/net/sky2.c2007-06-04 08:46:47.0 -0700
@@ -230,6 +230,11 @@ static void sky2_power_on(struct sky2_hw
sky2_pci_write32(hw, PCI_DEV_REG5, reg);
 
sky2_pci_write32(hw, PCI_CFG_REG_1, 0);
+
+   /* Enable workaround for dev 4.107 on Yukon-Ultra & Extreme */
+   reg = sky2_read32(hw, B2_GP_IO);
+   reg |= GLB_GPIO_STAT_RACE_DIS;
+   sky2_write32(hw, B2_GP_IO, reg);
}
 }
 

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/7] sky2 version 1.15 (88e8071) support

2007-06-04 Thread Stephen Hemminger

These changes are to enable the Yukon Extreme (88e8071)
chipset. This chip is similar to earlier chip but has a different
set of offloading operations and some other minor quirks.

Marvell has given me some evaluation boards with the 88e8071
chip set. The chip is available now, but haven't seen a systems
with the hardware yet. The support fot this is exprerimental
at this point, but haven't encountered any new problems.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/7] sky2: avoid reserved regions on ethtool reg dump

2007-06-04 Thread Stephen Hemminger

On Yukon EX reading some of the undocumented places in the
memory space will cause a hang. Since they don't provide useful
information, just skip the reserved areas.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |   14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

--- a/drivers/net/sky2.c2007-06-04 08:45:25.0 -0700
+++ b/drivers/net/sky2.c2007-06-04 08:46:34.0 -0700
@@ -3330,7 +3330,7 @@ static int sky2_get_regs_len(struct net_
 
 /*
  * Returns copy of control register region
- * Note: access to the RAM address register set will cause timeouts.
+ * Note: ethtool_get_regs always provides full size (16k) buffer
  */
 static void sky2_get_regs(struct net_device *dev, struct ethtool_regs *regs,
  void *p)
@@ -3338,15 +3338,19 @@ static void sky2_get_regs(struct net_dev
const struct sky2_port *sky2 = netdev_priv(dev);
const void __iomem *io = sky2->hw->regs;
 
-   BUG_ON(regs->len < B3_RI_WTO_R1);
regs->version = 1;
memset(p, 0, regs->len);
 
memcpy_fromio(p, io, B3_RAM_ADDR);
 
-   memcpy_fromio(p + B3_RI_WTO_R1,
- io + B3_RI_WTO_R1,
- regs->len - B3_RI_WTO_R1);
+   /* skip diagnostic ram region */
+   memcpy_fromio(p + B3_RI_WTO_R1, io + B3_RI_WTO_R1, 0x2000 - 
B3_RI_WTO_R1);
+
+   /* copy GMAC registers */
+   memcpy_fromio(p + BASE_GMAC_1, io + BASE_GMAC_1, 0x1000);
+   if (sky2->hw->ports > 1)
+   memcpy_fromio(p + BASE_GMAC_2, io + BASE_GMAC_2, 0x1000);
+
 }
 
 /* In order to do Jumbo packets on these chips, need to turn off the

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2/5 revised][BNX2]: Add missing wait in bnx2_init_5709_context().

2007-06-04 Thread Michael Chan

[BNX2]: Add missing wait in bnx2_init_5709_context().

For correctness, we need to wait for the MEM_INIT bit to be cleared
in the BNX2_CTX_COMMAND register before proceeding.

[Added return -EBUSY when the MEM_INIT bit doesn't clear, suggested
by Jeff Garzik.]

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 5d69e5b..0f861e2 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1778,6 +1778,15 @@ bnx2_init_5709_context(struct bnx2 *bp)
val = BNX2_CTX_COMMAND_ENABLED | BNX2_CTX_COMMAND_MEM_INIT | (1 << 12);
val |= (BCM_PAGE_BITS - 8) << 16;
REG_WR(bp, BNX2_CTX_COMMAND, val);
+   for (i = 0; i < 10; i++) {
+   val = REG_RD(bp, BNX2_CTX_COMMAND);
+   if (!(val & BNX2_CTX_COMMAND_MEM_INIT))
+   break;
+   udelay(2);
+   }
+   if (val & BNX2_CTX_COMMAND_MEM_INIT)
+   return -EBUSY;
+
for (i = 0; i < bp->ctx_pages; i++) {
int j;
 
@@ -3696,9 +3705,11 @@ bnx2_init_chip(struct bnx2 *bp)
 
/* Initialize context mapping and zero out the quick contexts.  The
 * context block must have already been enabled. */
-   if (CHIP_NUM(bp) == CHIP_NUM_5709)
-   bnx2_init_5709_context(bp);
-   else
+   if (CHIP_NUM(bp) == CHIP_NUM_5709) {
+   rc = bnx2_init_5709_context(bp);
+   if (rc)
+   return rc;
+   } else
bnx2_init_context(bp);
 
if ((rc = bnx2_init_cpus(bp)) != 0)


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: r8169 in 2.6.18 silently corrupting data, 2.6.22-rc3 link not detected at all

2007-06-04 Thread Andrew Paprocki


Francois,

Using the latest patch you provided in the previous e-mail under
2.6.22-rc3 I still see data corruption after I dropped the entire
network to MTU 1500. (The receive corruption is way more frequent than
under 2.6.18.) I'll attempt to find out what caused the link detection
to stop working. Do you have any idea about the data corruption? I can
easily put in trace anywhere in the driver if you'd like to see any
output. My latest build was without NAPI support, and running with or
without it didn't seem to affect the receive corruption.

Thanks, -Andrew

On 6/4/07, Francois Romieu <[EMAIL PROTECTED]> wrote:

Francois Romieu <[EMAIL PROTECTED]> :
[...]

Andrew, have you had any time to do some testing with an uniform 7200
bytes MTU through your LAN ?

You will find a backport of the r8169 driver for the 2.6.18 kernel at
http://www.fr.zoreil.com/people/francois/backport/r8169/20070604-00

Could you do a binary search of the patch which breaks the link detection ?
From a different report that I have received, the current testing branch of
the 8169 driver would improve things for MTU beyond 1500 bytes: nailing down
your broken link regression would be a welcome improvement.

If you are confortable with git, you can do a git bisect after pulling
the branch 'r8169-backport-test-20070604' at:
git://electric-eye.fr.zoreil.com/home/romieu/linux/linux-2.6-out

(please do the initial clone from elsewhere if you use git, thanks)

--
Ueimor



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH]: Add security check before flushing SAD/SPD

2007-06-04 Thread David Miller

From: James Morris <[EMAIL PROTECTED]>
Date: Mon, 4 Jun 2007 19:13:10 -0400 (EDT)

> I've applied this patch to 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-davem
> 
> 
> Dave, feel free to pull from that branch.

Thanks James, I'll pull that in for my next round of
networking fixes.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2/5][BNX2]: Add missing wait in bnx2_init_5709_context().

2007-06-04 Thread Jeff Garzik

On Mon, Jun 04, 2007 at 05:00:35PM -0700, Michael Chan wrote:
> [BNX2]: Add missing wait in bnx2_init_5709_context().
> 
> For correctness, we need to wait for the MEM_INIT bit to be cleared
> in the BNX2_CTX_COMMAND register before proceeding.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 5d69e5b..daa62d9 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -1778,6 +1778,12 @@ bnx2_init_5709_context(struct bnx2 *bp)
>   val = BNX2_CTX_COMMAND_ENABLED | BNX2_CTX_COMMAND_MEM_INIT | (1 << 12);
>   val |= (BCM_PAGE_BITS - 8) << 16;
>   REG_WR(bp, BNX2_CTX_COMMAND, val);
> + for (i = 0; i < 10; i++) {
> + val = REG_RD(bp, BNX2_CTX_COMMAND);
> + if (!(val & BNX2_CTX_COMMAND_MEM_INIT))
> + break;
> + udelay(2);
> + }
>   for (i = 0; i < bp->ctx_pages; i++) {
>   int j;

If the bit does not clear after 10 loops, it seems to me you should
complain or somehow otherwise make note of the failure.

Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [1/5][BNX2]: Fix netdev watchdog on 5708.

2007-06-04 Thread Jeff Garzik

On Mon, Jun 04, 2007 at 05:00:03PM -0700, Michael Chan wrote:
> [BNX2]: Fix netdev watchdog on 5708.
> 
> There's a bug in the driver that only initializes half of the context
> memory on the 5708.  Surprisingly, this works most of the time except
> for some occasional netdev watchdogs when sending a lot of 64-byte
> packets.  The fix is to add the missing code to initialize the 2nd
> halves of all context memory.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

ACK patch 1, and 3-5


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[2/5][BNX2]: Add missing wait in bnx2_init_5709_context().

2007-06-04 Thread Michael Chan

[BNX2]: Add missing wait in bnx2_init_5709_context().

For correctness, we need to wait for the MEM_INIT bit to be cleared
in the BNX2_CTX_COMMAND register before proceeding.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 5d69e5b..daa62d9 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1778,6 +1778,12 @@ bnx2_init_5709_context(struct bnx2 *bp)
val = BNX2_CTX_COMMAND_ENABLED | BNX2_CTX_COMMAND_MEM_INIT | (1 << 12);
val |= (BCM_PAGE_BITS - 8) << 16;
REG_WR(bp, BNX2_CTX_COMMAND, val);
+   for (i = 0; i < 10; i++) {
+   val = REG_RD(bp, BNX2_CTX_COMMAND);
+   if (!(val & BNX2_CTX_COMMAND_MEM_INIT))
+   break;
+   udelay(2);
+   }
for (i = 0; i < bp->ctx_pages; i++) {
int j;
 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[4/5][BNX2]: Fix occasional counter corruption on 5708.

2007-06-04 Thread Michael Chan

[BNX2]: Fix occasional counter corruption on 5708.

The statistics block DMA on 5708 can be messed up occasionally on the
average of about once per hour.  If the user is reading the counters
within one second after the corruption, the counters will be all
messed up.  One second later, the counters will be ok again until the
next corruption occurs.

The workaround is to disable the periodic statistics DMA.  Instead,
we manually trigger the DMA once a second in bnx2_timer().  This
manual trigger of the DMA avoids the problem.

As a consequence, we can only allow 0 or 1 second settings for
ethtool -C statistics block.

Thanks to Jean-Daniel Pauget <[EMAIL PROTECTED]> and
CaT <[EMAIL PROTECTED]> for reporting this rare problem.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index c03282c..5654cc7 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3783,7 +3783,10 @@ bnx2_init_chip(struct bnx2 *bp)
REG_WR(bp, BNX2_HC_CMD_TICKS,
   (bp->cmd_ticks_int << 16) | bp->cmd_ticks);
 
-   REG_WR(bp, BNX2_HC_STATS_TICKS, bp->stats_ticks & 0x00);
+   if (CHIP_NUM(bp) == CHIP_NUM_5708)
+   REG_WR(bp, BNX2_HC_STATS_TICKS, 0);
+   else
+   REG_WR(bp, BNX2_HC_STATS_TICKS, bp->stats_ticks & 0x00);
REG_WR(bp, BNX2_HC_STAT_COLLECT_TICKS, 0xbb8);  /* 3ms */
 
if (CHIP_ID(bp) == CHIP_ID_5706_A1)
@@ -4636,6 +4639,11 @@ bnx2_timer(unsigned long data)
 
bp->stats_blk->stat_FwRxDrop = REG_RD_IND(bp, BNX2_FW_RX_DROP_COUNT);
 
+   /* workaround occasional corrupted counters */
+   if (CHIP_NUM(bp) == CHIP_NUM_5708 && bp->stats_ticks)
+   REG_WR(bp, BNX2_HC_COMMAND, bp->hc_cmd |
+   BNX2_HC_COMMAND_STATS_NOW);
+
if (bp->phy_flags & PHY_SERDES_FLAG) {
if (CHIP_NUM(bp) == CHIP_NUM_5706)
bnx2_5706_serdes_timer(bp);
@@ -5446,6 +5454,10 @@ bnx2_set_coalesce(struct net_device *dev, struct 
ethtool_coalesce *coal)
0xff;
 
bp->stats_ticks = coal->stats_block_coalesce_usecs;
+   if (CHIP_NUM(bp) == CHIP_NUM_5708) {
+   if (bp->stats_ticks != 0 && bp->stats_ticks != USEC_PER_SEC)
+   bp->stats_ticks = USEC_PER_SEC;
+   }
if (bp->stats_ticks > 0x00) bp->stats_ticks = 0x00;
bp->stats_ticks &= 0x00;
 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[5/5][BNX2]: Update version and reldate.

2007-06-04 Thread Michael Chan

[BNX2]: Update version and reldate.

Update to version 1.5.11.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 58e6f49..a9b833b 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 
 #define DRV_MODULE_NAME"bnx2"
 #define PFX DRV_MODULE_NAME": "
-#define DRV_MODULE_VERSION "1.5.10"
-#define DRV_MODULE_RELDATE "May 1, 2007"
+#define DRV_MODULE_VERSION "1.5.11"
+#define DRV_MODULE_RELDATE "June 4, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[1/5][BNX2]: Fix netdev watchdog on 5708.

2007-06-04 Thread Michael Chan

[BNX2]: Fix netdev watchdog on 5708.

There's a bug in the driver that only initializes half of the context
memory on the 5708.  Surprisingly, this works most of the time except
for some occasional netdev watchdogs when sending a lot of 64-byte
packets.  The fix is to add the missing code to initialize the 2nd
halves of all context memory.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 88b33c6..5d69e5b 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1811,6 +1811,7 @@ bnx2_init_context(struct bnx2 *bp)
vcid = 96;
while (vcid) {
u32 vcid_addr, pcid_addr, offset;
+   int i;
 
vcid--;
 
@@ -1831,16 +1832,20 @@ bnx2_init_context(struct bnx2 *bp)
pcid_addr = vcid_addr;
}
 
-   REG_WR(bp, BNX2_CTX_VIRT_ADDR, 0x00);
-   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
+   for (i = 0; i < (CTX_SIZE / PHY_CTX_SIZE); i++) {
+   vcid_addr += (i << PHY_CTX_SHIFT);
+   pcid_addr += (i << PHY_CTX_SHIFT);
 
-   /* Zero out the context. */
-   for (offset = 0; offset < PHY_CTX_SIZE; offset += 4) {
-   CTX_WR(bp, 0x00, offset, 0);
-   }
+   REG_WR(bp, BNX2_CTX_VIRT_ADDR, 0x00);
+   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
 
-   REG_WR(bp, BNX2_CTX_VIRT_ADDR, vcid_addr);
-   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
+   /* Zero out the context. */
+   for (offset = 0; offset < PHY_CTX_SIZE; offset += 4)
+   CTX_WR(bp, 0x00, offset, 0);
+
+   REG_WR(bp, BNX2_CTX_VIRT_ADDR, vcid_addr);
+   REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr);
+   }
}
 }
 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[3/5][BNX2]: Enable DMA on 5709.

2007-06-04 Thread Michael Chan

[BNX2]: Enable DMA on 5709.

Add missing code to enable DMA on 5709 A1.  The bit is a no-op on
A0 and therefore can be set on all 5709 chips.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index daa62d9..c03282c 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3810,6 +3810,11 @@ bnx2_init_chip(struct bnx2 *bp)
/* Initialize the receive filter. */
bnx2_set_rx_mode(bp->dev);
 
+   if (CHIP_NUM(bp) == CHIP_NUM_5709) {
+   val = REG_RD(bp, BNX2_MISC_NEW_CORE_CTL);
+   val |= BNX2_MISC_NEW_CORE_CTL_DMA_ENABLE;
+   REG_WR(bp, BNX2_MISC_NEW_CORE_CTL, val);
+   }
rc = bnx2_fw_sync(bp, BNX2_DRV_MSG_DATA_WAIT2 | BNX2_DRV_MSG_CODE_RESET,
  0);
 
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index bd6288d..49a5de2 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -1373,6 +1373,7 @@ struct l2_fhdr {
 #define BNX2_MISC_NEW_CORE_CTL 0x08c8
 #define BNX2_MISC_NEW_CORE_CTL_LINK_HOLDOFF_SUCCESS (1L<<0)
 #define BNX2_MISC_NEW_CORE_CTL_LINK_HOLDOFF_REQ (1L<<1)
+#define BNX2_MISC_NEW_CORE_CTL_DMA_ENABLE   (1L<<16)
 #define BNX2_MISC_NEW_CORE_CTL_RESERVED_CMN (0x3fffL<<2)
 #define BNX2_MISC_NEW_CORE_CTL_RESERVED_TC  (0xL<<16)
 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH]: Add security check before flushing SAD/SPD

2007-06-04 Thread James Morris

I've applied this patch to 

git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-davem


Dave, feel free to pull from that branch.



- James
-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Nigel Cunningham

Hi.

On Tue, 2007-06-05 at 00:53 +0200, Pavel Machek wrote:
> On Tue 2007-06-05 08:49:04, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Tue, 2007-06-05 at 00:41 +0200, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > > > > +#endif /* CONFIG_PM */
> > > > > > 
> > > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
> > > > > > >   .id_table   = uli526x_pci_tbl,
> > > > > > >   .probe  = uli526x_init_one,
> > > > > > >   .remove = __devexit_p(uli526x_remove_one),
> > > > > > > +#ifdef CONFIG_PM
> > > > > > > + .suspend= uli526x_suspend,
> > > > > > > + .resume = uli526x_resume,
> > > > > > > +#endif
> > > > > > 
> > > > > > ...so that this ifdef is not needed?
> > > > > 
> > > > > OK, why not.
> > > > 
> > > > Because it's uglier and #ifdef is the established way of doing things?
> > > 
> > > Actually the way I suggested is nicer, IIRC akpm invented it. It keeps
> > > ifdefs localized around the block that _needs_ to be ifdefed.
> > 
> > The localised point is true. I'll also admit that 'nicer'/'uglier' is a
> > matter of aesthetics and therefore personal opinion.
> > 
> > I guess that leaves the question, "What's the precedent to follow?" or
> > "Is there a driver that's already got this new format?".
> 
> for example net/ne.c ...
> 
> [EMAIL PROTECTED]:/data/l/linux/drivers$ grep "_suspend NULL" */*.c
> leds/leds-ams-delta.c:#define ams_delta_led_suspend NULL
> leds/leds-net48xx.c:#define net48xx_led_suspend NULL
> leds/leds-s3c24xx.c:#define s3c24xx_led_suspend NULL
> leds/leds-tosa.c:#define tosaled_suspend NULL
> leds/leds-wrap.c:#define wrap_led_suspend NULL
> misc/tifm_7xx1.c:#define tifm_7xx1_suspend NULL
> misc/tifm_core.c:#define tifm_device_suspend NULL
> net/3c509.c:#define el3_suspend NULL
> net/forcedeth.c:#define nv_suspend NULL
> net/ne.c:#define ne_drv_suspend NULL
> parport/parport_ax88796.c:#define parport_ax88796_suspend NULL
> rtc/rtc-at91rm9200.c:#define at91_rtc_suspend NULL
> rtc/rtc-omap.c:#define omap_rtc_suspend NULL
> rtc/rtc-s3c.c:#define s3c_rtc_suspend NULL
> serial/8250_pnp.c:#define serial_pnp_suspend NULL
> serial/atmel_serial.c:#define atmel_serial_suspend NULL
> serial/s3c2410.c:#define s3c24xx_serial_suspend NULL
> spi/pxa2xx_spi.c:#define pxa2xx_spi_suspend NULL
> spi/spi_bfin5xx.c:#define bfin5xx_spi_suspend NULL
> spi/spi_imx.c:#define spi_imx_suspend NULL
> spi/spi_s3c24xx.c:#define s3c24xx_spi_suspend NULL
> spi/spi_s3c24xx_gpio.c:#define s3c2410_spigpio_suspend NULL
> video/arkfb.c:#define ark_pci_suspend NULL
> video/au1100fb.c:#define au1100fb_drv_suspend NULL
> video/s3c2410fb.c:#define s3c2410fb_suspend NULL
> video/skeletonfb.c:#define xxxfb_suspend NULL
> video/skeletonfb.c:#define xxxfb_suspend NULL
> video/sm501fb.c:#define sm501fb_suspend NULL

Cool. That's a lot more than I would have expected. Ok. I'll drop my
objection then. (Still think it's ugly though :>).

Regards,

Nigel


signature.asc
Description: This is a digitally signed message part

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Pavel Machek

On Tue 2007-06-05 08:49:04, Nigel Cunningham wrote:
> Hi.
> 
> On Tue, 2007-06-05 at 00:41 +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > > > > +#endif /* CONFIG_PM */
> > > > > 
> > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
> > > > > > .id_table   = uli526x_pci_tbl,
> > > > > > .probe  = uli526x_init_one,
> > > > > > .remove = __devexit_p(uli526x_remove_one),
> > > > > > +#ifdef CONFIG_PM
> > > > > > +   .suspend= uli526x_suspend,
> > > > > > +   .resume = uli526x_resume,
> > > > > > +#endif
> > > > > 
> > > > > ...so that this ifdef is not needed?
> > > > 
> > > > OK, why not.
> > > 
> > > Because it's uglier and #ifdef is the established way of doing things?
> > 
> > Actually the way I suggested is nicer, IIRC akpm invented it. It keeps
> > ifdefs localized around the block that _needs_ to be ifdefed.
> 
> The localised point is true. I'll also admit that 'nicer'/'uglier' is a
> matter of aesthetics and therefore personal opinion.
> 
> I guess that leaves the question, "What's the precedent to follow?" or
> "Is there a driver that's already got this new format?".

for example net/ne.c ...

[EMAIL PROTECTED]:/data/l/linux/drivers$ grep "_suspend NULL" */*.c
leds/leds-ams-delta.c:#define ams_delta_led_suspend NULL
leds/leds-net48xx.c:#define net48xx_led_suspend NULL
leds/leds-s3c24xx.c:#define s3c24xx_led_suspend NULL
leds/leds-tosa.c:#define tosaled_suspend NULL
leds/leds-wrap.c:#define wrap_led_suspend NULL
misc/tifm_7xx1.c:#define tifm_7xx1_suspend NULL
misc/tifm_core.c:#define tifm_device_suspend NULL
net/3c509.c:#define el3_suspend NULL
net/forcedeth.c:#define nv_suspend NULL
net/ne.c:#define ne_drv_suspend NULL
parport/parport_ax88796.c:#define parport_ax88796_suspend NULL
rtc/rtc-at91rm9200.c:#define at91_rtc_suspend NULL
rtc/rtc-omap.c:#define omap_rtc_suspend NULL
rtc/rtc-s3c.c:#define s3c_rtc_suspend NULL
serial/8250_pnp.c:#define serial_pnp_suspend NULL
serial/atmel_serial.c:#define atmel_serial_suspend NULL
serial/s3c2410.c:#define s3c24xx_serial_suspend NULL
spi/pxa2xx_spi.c:#define pxa2xx_spi_suspend NULL
spi/spi_bfin5xx.c:#define bfin5xx_spi_suspend NULL
spi/spi_imx.c:#define spi_imx_suspend NULL
spi/spi_s3c24xx.c:#define s3c24xx_spi_suspend NULL
spi/spi_s3c24xx_gpio.c:#define s3c2410_spigpio_suspend NULL
video/arkfb.c:#define ark_pci_suspend NULL
video/au1100fb.c:#define au1100fb_drv_suspend NULL
video/s3c2410fb.c:#define s3c2410fb_suspend NULL
video/skeletonfb.c:#define xxxfb_suspend NULL
video/skeletonfb.c:#define xxxfb_suspend NULL
video/sm501fb.c:#define sm501fb_suspend NULL

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Nigel Cunningham

Hi.

On Tue, 2007-06-05 at 00:41 +0200, Pavel Machek wrote:
> Hi!
> 
> > > > > +#endif /* CONFIG_PM */
> > > > 
> > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
> > > > >   .id_table   = uli526x_pci_tbl,
> > > > >   .probe  = uli526x_init_one,
> > > > >   .remove = __devexit_p(uli526x_remove_one),
> > > > > +#ifdef CONFIG_PM
> > > > > + .suspend= uli526x_suspend,
> > > > > + .resume = uli526x_resume,
> > > > > +#endif
> > > > 
> > > > ...so that this ifdef is not needed?
> > > 
> > > OK, why not.
> > 
> > Because it's uglier and #ifdef is the established way of doing things?
> 
> Actually the way I suggested is nicer, IIRC akpm invented it. It keeps
> ifdefs localized around the block that _needs_ to be ifdefed.

The localised point is true. I'll also admit that 'nicer'/'uglier' is a
matter of aesthetics and therefore personal opinion.

I guess that leaves the question, "What's the precedent to follow?" or
"Is there a driver that's already got this new format?".

Regards,

Nigel


signature.asc
Description: This is a digitally signed message part

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Pavel Machek

Hi!

> > > > +#endif /* CONFIG_PM */
> > > 
> > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
> > > > .id_table   = uli526x_pci_tbl,
> > > > .probe  = uli526x_init_one,
> > > > .remove = __devexit_p(uli526x_remove_one),
> > > > +#ifdef CONFIG_PM
> > > > +   .suspend= uli526x_suspend,
> > > > +   .resume = uli526x_resume,
> > > > +#endif
> > > 
> > > ...so that this ifdef is not needed?
> > 
> > OK, why not.
> 
> Because it's uglier and #ifdef is the established way of doing things?

Actually the way I suggested is nicer, IIRC akpm invented it. It keeps
ifdefs localized around the block that _needs_ to be ifdefed.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-mm patch] e1000: #if 0 two functions

2007-06-04 Thread Adrian Bunk

e1000_{read,write}_pci_cfg() are no longer used.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/net/e1000/e1000_hw.h   |2 --
 drivers/net/e1000/e1000_main.c |4 
 2 files changed, 4 insertions(+), 2 deletions(-)

--- linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_hw.h.old   2007-06-04 
22:03:05.0 +0200
+++ linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_hw.h   2007-06-04 
22:03:14.0 +0200
@@ -421,8 +421,6 @@ void e1000_tbi_adjust_stats(struct e1000
 void e1000_get_bus_info(struct e1000_hw *hw);
 void e1000_pci_set_mwi(struct e1000_hw *hw);
 void e1000_pci_clear_mwi(struct e1000_hw *hw);
-void e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
-void e1000_write_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
 int32_t e1000_read_pcie_cap_reg(struct e1000_hw *hw, uint32_t reg, uint16_t 
*value);
 void e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc);
 int e1000_pcix_get_mmrbc(struct e1000_hw *hw);
--- linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_main.c.old 2007-06-04 
22:03:24.0 +0200
+++ linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_main.c 2007-06-04 
22:03:40.0 +0200
@@ -4888,6 +4888,8 @@ e1000_pci_clear_mwi(struct e1000_hw *hw)
pci_clear_mwi(adapter->pdev);
 }
 
+#if 0
+
 void
 e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t *value)
 {
@@ -4904,6 +4906,8 @@ e1000_write_pci_cfg(struct e1000_hw *hw,
pci_write_config_word(adapter->pdev, reg, *value);
 }
 
+#endif  /*  0  */
+
 int
 e1000_pcix_get_mmrbc(struct e1000_hw *hw)
 {

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: skb->priority on arp packets

2007-06-04 Thread Ben Greear


Waskiewicz Jr, Peter P wrote:
is it possible to set the skb->priority on arp packets 
generated by the kernel?
I want to to set the 802.1p priority on arp and ip packets on 
an interface. On ip packets, this can be done by the iptables 
CLASSIFY target and the 
skb->priority mapping from the vlan implementation.


any ideas?


Currently unclassified ip packets have skb->priority set in ipsockglue,
by translating the IP TOS field into a Linux priority.  This is not the
same priority as 802.1p, rather, it's a Linux-based OS priority
classification for dequeuing priority in schedulers such as sch_prio.
802.1p lives in the VLAN tag, which is separate from skb->priority.  I'd
suggest reading http://lartc.org/howto/lartc.qdisc.classless.html to see
what exactly skb->priority is used for in sch_prio and pfifo_fast, and
how the ip layer determines a packet's priority.  The skb->priority
field is also used in other ways in other qdiscs I'm not completely
familiar with, but it's not directly related to priority of the packet.
I'd need to look closer at those qdiscs to be specific.

That being said, the only way I can think of manipulating skb->priority
on arp packets (and actually setting it) is modifying a qdisc to set
skb->priority if it matches a filter created using the tc command.


If you do manage to get the skb->priority set, then you can use vconfig
to set up mapping between skb->priority and the VLAN priority field.

Thanks,
Ben



Cheers,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: skb->priority on arp packets

2007-06-04 Thread Waskiewicz Jr, Peter P

> is it possible to set the skb->priority on arp packets 
> generated by the kernel?
> I want to to set the 802.1p priority on arp and ip packets on 
> an interface. On ip packets, this can be done by the iptables 
> CLASSIFY target and the 
> skb->priority mapping from the vlan implementation.
> 
> any ideas?

Currently unclassified ip packets have skb->priority set in ipsockglue,
by translating the IP TOS field into a Linux priority.  This is not the
same priority as 802.1p, rather, it's a Linux-based OS priority
classification for dequeuing priority in schedulers such as sch_prio.
802.1p lives in the VLAN tag, which is separate from skb->priority.  I'd
suggest reading http://lartc.org/howto/lartc.qdisc.classless.html to see
what exactly skb->priority is used for in sch_prio and pfifo_fast, and
how the ip layer determines a packet's priority.  The skb->priority
field is also used in other ways in other qdiscs I'm not completely
familiar with, but it's not directly related to priority of the packet.
I'd need to look closer at those qdiscs to be specific.

That being said, the only way I can think of manipulating skb->priority
on arp packets (and actually setting it) is modifying a qdisc to set
skb->priority if it matches a filter created using the tc command.

Cheers,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] b44 driver improvements

2007-06-04 Thread John W. Linville

On Mon, Jun 04, 2007 at 01:25:36PM -0700, Stephen Hemminger wrote:
> While researching why the wired networking was so slow
> on my laptop (not a driver problem); spotted these small
> changes to b44 driver. Nothing urgent, maybe 2.6.23
> or later material.

ACK to the lot

John
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: r8169: hard freezes on TX

2007-06-04 Thread Francois Romieu

Rolf Eike Beer <[EMAIL PROTECTED]> :
[...]
> I just had another freeze using your patches. After 512kB over smb it was 
> dead.

In-kernel smb/cifs ?

-- 
Ueimor
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] iproute2: Added support for RR qdisc (sch_rr)

2007-06-04 Thread PJ Waskiewicz

Add tc support for the sch_rr qdisc.  This qdisc supports multiple queues
on hardware.  The syntax for sch_rr is the same as sch_prio.

Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 include/linux/pkt_sched.h |   11 
 tc/Makefile   |1 
 tc/q_rr.c |  113 +
 3 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index d10f353..907412b 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -22,6 +22,7 @@
 #define TC_PRIO_CONTROL7
 
 #define TC_PRIO_MAX15
+#define TC_RR_MAX  15
 
 /* Generic queue statistics, available for all the elements.
Particular schedulers may have also their private records.
@@ -90,6 +91,16 @@ struct tc_fifo_qopt
__u32   limit;  /* Queue length: bytes for bfifo, packets for pfifo */
 };
 
+/* RR section */
+#define TCQ_RR_BANDS   16
+#define TCQ_MIN_RR_BANDS 2
+
+struct tc_rr_qopt
+{
+   int bands;
+   __u8priomap[TC_RR_MAX + 1];
+};
+
 /* PRIO section */
 
 #define TCQ_PRIO_BANDS 16
diff --git a/tc/Makefile b/tc/Makefile
index 9d618ff..62e2697 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -9,6 +9,7 @@ TCMODULES += q_fifo.o
 TCMODULES += q_sfq.o
 TCMODULES += q_red.o
 TCMODULES += q_prio.o
+TCMODULES += q_rr.o
 TCMODULES += q_tbf.o
 TCMODULES += q_cbq.o
 TCMODULES += f_rsvp.o
diff --git a/tc/q_rr.c b/tc/q_rr.c
new file mode 100644
index 000..8eecac9
--- /dev/null
+++ b/tc/q_rr.c
@@ -0,0 +1,113 @@
+/*
+ * q_rr.c  RR.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:PJ Waskiewicz, <[EMAIL PROTECTED]>
+ * Original Authors:   Alexey Kuznetsov, <[EMAIL PROTECTED]> (from PRIO)
+ *
+ * Changes:
+ *
+ * Ole Husgaard <[EMAIL PROTECTED]>: 990513: prio2band map was always reset.
+ * J Hadi Salim <[EMAIL PROTECTED]>: 990609: priomap fix.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... rr bands NUMBER priomap P1 P2...\n");
+}
+
+#define usage() return(-1)
+
+static int rr_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct 
nlmsghdr *n)
+{
+   int ok = 0;
+   int pmap_mode = 0;
+   int idx = 0;
+   struct tc_rr_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 
1 }};
+
+   while (argc > 0) {
+   if (strcmp(*argv, "bands") == 0) {
+   if (pmap_mode)
+   explain();
+   NEXT_ARG();
+   if (get_integer(&opt.bands, *argv, 10)) {
+   fprintf(stderr, "Illegal \"bands\"\n");
+   return -1;
+   }
+   ok++;
+   } else if (strcmp(*argv, "priomap") == 0) {
+   if (pmap_mode) {
+   fprintf(stderr, "Error: duplicate priomap\n");
+   return -1;
+   }
+   pmap_mode = 1;
+   } else if (strcmp(*argv, "help") == 0) {
+   explain();
+   return -1;
+   } else {
+   unsigned band;
+   if (!pmap_mode) {
+   fprintf(stderr, "What is \"%s\"?\n", *argv);
+   explain();
+   return -1;
+   }
+   if (get_unsigned(&band, *argv, 10)) {
+   fprintf(stderr, "Illegal \"priomap\" 
element\n");
+   return -1;
+   }
+   if (band > opt.bands) {
+   fprintf(stderr, "\"priomap\" element is out of 
bands\n");
+   return -1;
+   }
+   if (idx > TC_RR_MAX) {
+   fprintf(stderr, "\"priomap\" index > 
TC_RR_MAX=%u\n", TC_RR_MAX);
+   return -1;
+   }
+   opt.priomap[idx++] = band;
+   }
+   argc--; argv++;
+   }
+
+   addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
+   return 0;
+}
+
+int rr_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+   int i;
+   struct tc_rr_qopt *qopt;
+
+   if (opt == NULL)
+   return 0;
+
+   if (RTA_PAYLOAD(opt)  < sizeof(*qopt))
+

[RFC] iproute2: sch_rr support in tc

2007-06-04 Thread PJ Waskiewicz

This patch is to support the new sch_rr (round-robin) qdisc being proposed
in NET for multiqueue network device support in the Linux network stack.
It uses q_prio.c as the template, since the qdiscs are nearly identical,
outside of the ->dequeue() routine.

I'm soliciting feedback for a 2.6.23 multiqueue submission.  Thanks.

-- 
PJ Waskiewicz <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] NET: Multiqueue network device support.

2007-06-04 Thread PJ Waskiewicz

API added to support multiple hardware queues on an ethernet device.
Round-robin scheduler added (sch_rr) to provide a no-scheduling policy
qdisc for hardware with multiple queues.

Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 include/linux/etherdevice.h |3 
 include/linux/netdevice.h   |   62 +
 include/linux/pkt_sched.h   |   11 +
 include/linux/skbuff.h  |2 
 net/core/dev.c  |   27 ++
 net/core/skbuff.c   |3 
 net/ethernet/eth.c  |9 -
 net/sched/Kconfig   |   22 ++
 net/sched/Makefile  |1 
 net/sched/sch_generic.c |4 
 net/sched/sch_prio.c|   66 +-
 net/sched/sch_rr.c  |  516 +++
 12 files changed, 706 insertions(+), 20 deletions(-)

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 071c67a..283e687 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -39,7 +39,8 @@ extern void   eth_header_cache_update(struct hh_cache 
*hh, struct net_device *dev
 extern int eth_header_cache(struct neighbour *neigh,
 struct hh_cache *hh);
 
-extern struct net_device *alloc_etherdev(int sizeof_priv);
+extern struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count);
+#define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
 static inline void eth_copy_and_sum (struct sk_buff *dest, 
 const unsigned char *src, 
 int len, int base)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f671cd2..376a0d2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -108,6 +108,14 @@ struct wireless_dev;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
+struct net_device_subqueue
+{
+   /* Give a control state for each queue.  This struct may contain
+* per-queue locks in the future.
+*/
+   unsigned long   state;
+};
+
 /*
  * Network device statistics. Akin to the 2.0 ether stats but
  * with byte counters.
@@ -325,6 +333,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED1024/* Device cannot handle VLAN 
packets */
 #define NETIF_F_GSO2048/* Enable software GSO. */
 #define NETIF_F_LLTX   4096/* LockLess TX */
+#define NETIF_F_MULTI_QUEUE16384   /* Has multiple TX/RX queues */
 
/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT  16
@@ -540,6 +549,10 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   /* The TX queue control structures */
+   struct net_device_subqueue  *egress_subqueue;
+   int egress_subqueue_count;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -702,6 +715,48 @@ static inline int netif_running(const struct net_device 
*dev)
return test_bit(__LINK_STATE_START, &dev->state);
 }
 
+/*
+ * Routines to manage the subqueues on a device.  We only need start
+ * stop, and a check if it's stopped.  All other device management is
+ * done at the overall netdevice level.
+ * Also test the device if we're multiqueue.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, u16 
queue_index)
+{
+   clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+ u16 queue_index)
+{
+   return test_bit(__LINK_STATE_XOFF,
+   &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   if (test_and_clear_bit(__LINK_STATE_XOFF,
+  &dev->egress_subqueue[queue_index].state))
+   __netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+   return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
 
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.
@@ -995,8 +1050,11 @@ static inline void netif_tx_disable(struct net_device 
*dev)
 extern voidether_setup(struct net_device *dev);
 
 /* Support for loadable net-drivers */
-extern struct net_device *alloc_netdev(int sizeof_priv, const char *name,
-  void (*setup)(struct net_device *));
+extern struct net_device *al

[RFC] NET: Multiple queue hardware support

2007-06-04 Thread PJ Waskiewicz

This patchset is an updated version of previous multiqueue network device
support patches.  The general approach of introducing a new API for multiqueue
network devices to register with the stack has remained.  The changes include
adding a round-robin qdisc, heavily based on sch_prio, which will allow
queueing to hardware with no OS-enforced queuing policy.  sch_prio still has
the multiqueue code in it, but has a Kconfig option to compile it out of the
qdisc.  This allows people with hardware containing scheduling policies to
use sch_rr (round-robin), and others without scheduling policies in hardware
to continue using sch_prio if they wish to have some notion of scheduling
priority.

The patches to iproute2 for tc will be sent separately, to support sch_rr.

I'm soliciting feedback for a 2.6.23 submission.  Thanks.

-- 
PJ Waskiewicz <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

skb->priority on arp packets

2007-06-04 Thread Marc Donner

hi @all,

is it possible to set the skb->priority on arp packets generated by the 
kernel?
I want to to set the 802.1p priority on arp and ip packets on an interface. On 
ip packets, this can be done by the iptables CLASSIFY target and the 
skb->priority mapping from the vlan implementation.

any ideas?

thx
Marc
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Rafael J. Wysocki

Hi,

On Monday, 4 June 2007 23:16, Nigel Cunningham wrote:
> Hi.
> 
> On Mon, 2007-06-04 at 15:49 +0200, Rafael J. Wysocki wrote:
> > On Monday, 4 June 2007 13:11, Pavel Machek wrote:
> > > > From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > > 
> > > > Add suspend/resume support to the uli526x network driver (tested on 
> > > > x86_64,
> > > > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, 
> > > > rev 40').
> > > > 
> > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > 
> > > Looks ok to me.
> > > 
> > > > +#ifdef CONFIG_PM
> > > > +
> > > > +/*
> > > > + * Suspend the interface.
> > > > + */
> > > > +
> > > > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state)
> > > > +{
> > > > +   struct net_device *dev = pci_get_drvdata(pdev);
> > > > +
> > > > +   ULI526X_DBUG(0, "uli526x_suspend", 0);
> > > > +
> > > > +   if (dev && netdev_priv(dev)) {
> > > > +   if (netif_running(dev)) {
> > > > +   netif_device_detach(dev);
> > > > +   uli526x_down(dev);
> > > > +   }
> > > > +   pci_save_state(pdev);
> > > > +   pci_enable_wake(pdev, pci_choose_state(pdev, state), 0);
> > > > +   pci_disable_device(pdev);
> > > > +   pci_set_power_state(pdev, pci_choose_state(pdev, 
> > > > state));
> > > > +   }
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Resume the interface.
> > > > + */
> > > > +
> > > > +static int uli526x_resume(struct pci_dev *pdev)
> > > > +{
> > > > +   struct net_device *dev = pci_get_drvdata(pdev);
> > > > +   struct uli526x_board_info *db = netdev_priv(dev);
> > > > +   int err;
> > > > +
> > > > +   ULI526X_DBUG(0, "uli526x_resume", 0);
> > > > +
> > > > +   if (dev && db) {
> > > > +   pci_set_power_state(pdev, PCI_D0);
> > > > +   err = pci_enable_device(pdev);
> > > > +   if (err) {
> > > > +   printk(KERN_WARNING "%s: Could not enable 
> > > > device \n",
> > > > +   dev->name);
> > > > +   return err;
> > > > +   }
> > > > +   pci_restore_state(pdev);
> > > > +   pci_set_master(pdev);
> > > > +   if (netif_running(dev)) {
> > > > +   uli526x_up(dev);
> > > > +   netif_device_attach(dev);
> > > > +   }
> > > > +   }
> > > > +   return 0;
> > > > +}
> > > > +
> > > #else 
> > > #define *_resume NULL
> > > #define *_suspend NULL
> > 
> > > > +#endif /* CONFIG_PM */
> > > 
> > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
> > > > .id_table   = uli526x_pci_tbl,
> > > > .probe  = uli526x_init_one,
> > > > .remove = __devexit_p(uli526x_remove_one),
> > > > +#ifdef CONFIG_PM
> > > > +   .suspend= uli526x_suspend,
> > > > +   .resume = uli526x_resume,
> > > > +#endif
> > > 
> > > ...so that this ifdef is not needed?
> > 
> > OK, why not.
> 
> Because it's uglier and #ifdef is the established way of doing things?

Hmm, I have no strong opinion.  I slightly prefer the #ifdef, but well.

Perhaps I'll send it to Andrew 'as is' if no one else has any more comments. ;-)

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Nigel Cunningham

Hi.

On Mon, 2007-06-04 at 15:49 +0200, Rafael J. Wysocki wrote:
> On Monday, 4 June 2007 13:11, Pavel Machek wrote:
> > > From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > 
> > > Add suspend/resume support to the uli526x network driver (tested on 
> > > x86_64,
> > > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev 
> > > 40').
> > > 
> > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > 
> > Looks ok to me.
> > 
> > > +#ifdef CONFIG_PM
> > > +
> > > +/*
> > > + *   Suspend the interface.
> > > + */
> > > +
> > > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state)
> > > +{
> > > + struct net_device *dev = pci_get_drvdata(pdev);
> > > +
> > > + ULI526X_DBUG(0, "uli526x_suspend", 0);
> > > +
> > > + if (dev && netdev_priv(dev)) {
> > > + if (netif_running(dev)) {
> > > + netif_device_detach(dev);
> > > + uli526x_down(dev);
> > > + }
> > > + pci_save_state(pdev);
> > > + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0);
> > > + pci_disable_device(pdev);
> > > + pci_set_power_state(pdev, pci_choose_state(pdev, state));
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/*
> > > + *   Resume the interface.
> > > + */
> > > +
> > > +static int uli526x_resume(struct pci_dev *pdev)
> > > +{
> > > + struct net_device *dev = pci_get_drvdata(pdev);
> > > + struct uli526x_board_info *db = netdev_priv(dev);
> > > + int err;
> > > +
> > > + ULI526X_DBUG(0, "uli526x_resume", 0);
> > > +
> > > + if (dev && db) {
> > > + pci_set_power_state(pdev, PCI_D0);
> > > + err = pci_enable_device(pdev);
> > > + if (err) {
> > > + printk(KERN_WARNING "%s: Could not enable device \n",
> > > + dev->name);
> > > + return err;
> > > + }
> > > + pci_restore_state(pdev);
> > > + pci_set_master(pdev);
> > > + if (netif_running(dev)) {
> > > + uli526x_up(dev);
> > > + netif_device_attach(dev);
> > > + }
> > > + }
> > > + return 0;
> > > +}
> > > +
> > #else 
> > #define *_resume NULL
> > #define *_suspend NULL
> 
> > > +#endif /* CONFIG_PM */
> > 
> > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
> > >   .id_table   = uli526x_pci_tbl,
> > >   .probe  = uli526x_init_one,
> > >   .remove = __devexit_p(uli526x_remove_one),
> > > +#ifdef CONFIG_PM
> > > + .suspend= uli526x_suspend,
> > > + .resume = uli526x_resume,
> > > +#endif
> > 
> > ...so that this ifdef is not needed?
> 
> OK, why not.

Because it's uglier and #ifdef is the established way of doing things?

Regards,

Nigel


signature.asc
Description: This is a digitally signed message part

Re: [PATCH 0/4] b44 driver improvements

2007-06-04 Thread Jeff Garzik

On Mon, Jun 04, 2007 at 01:25:36PM -0700, Stephen Hemminger wrote:
> While researching why the wired networking was so slow
> on my laptop (not a driver problem); spotted these small
> changes to b44 driver. Nothing urgent, maybe 2.6.23
> or later material.

Seems sane to me.  I'm travelling and will merge this at the beginning
of next week.  I agree netdev#upstream is appropriate, though the
b44_resume() fix may want upstreaming for 2.6.22.

Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: r8169 in 2.6.18 silently corrupting data, 2.6.22-rc3 link not detected at all

2007-06-04 Thread Francois Romieu

Francois Romieu <[EMAIL PROTECTED]> :
[...]

Andrew, have you had any time to do some testing with an uniform 7200
bytes MTU through your LAN ?

You will find a backport of the r8169 driver for the 2.6.18 kernel at
http://www.fr.zoreil.com/people/francois/backport/r8169/20070604-00

Could you do a binary search of the patch which breaks the link detection ?
>From a different report that I have received, the current testing branch of
the 8169 driver would improve things for MTU beyond 1500 bytes: nailing down
your broken link regression would be a welcome improvement.

If you are confortable with git, you can do a git bisect after pulling
the branch 'r8169-backport-test-20070604' at:
git://electric-eye.fr.zoreil.com/home/romieu/linux/linux-2.6-out

(please do the initial clone from elsewhere if you use git, thanks)

-- 
Ueimor
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Maximilian Engelhardt

On Monday 04 June 2007, Stephen Hemminger wrote:
> On Mon, 4 Jun 2007 21:47:59 +0200
>
> Maximilian Engelhardt <[EMAIL PROTECTED]> wrote:
> > On Monday 04 June 2007, Ingo Molnar wrote:
> > > * Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > > > Yes, the following patch makes iperf work better than ever. But are
> > > > other broken applications going to have same problem. Sounds like the
> > > > old "who runs first" fork() problems.
> > >
> > > this is the first such app and really, and even for this app: i've been
> > > frequently running iperf on -rt kernels for _years_ and never noticed
> > > how buggy its 'locking' code was, and that it would under some
> > > circumstances use up the whole CPU on high-res timers.
> >
> > I must admit I don't know much about that topic, but there is one thing I
> > don't understand. Why is iperf (even if it's buggy) able to use up the
> > whole cpu? I didn't run it as root but as my normal user so it should
> > have limited rights. Shouldn't the linux scheduler distribute cpu time
> > among all running processes?
>
> In this case, there are two threads. One is receiving data and the other
> is spinning checking on progress. If the spinning thread doesn't yield,
> it will end up using it's whole quantum (10ms at 100hz), before the
> scheduler lets the receiver run again. If the receiving thread doesn't
> get to run then on a UP the performance stinks.
>
Ok, let's see if I got this right:
If there are other processes that want cpu time they will get it after the 
quantum for the iperf thread is used up. So cpu time will be distributed 
among other processes, but it takes some time until they get it and this 
increases latency.

> The problem only showed up laptop because most of my other systems are
> SMP (or fake SMP/HT), and usually set HZ to 1000 not 100.

Hm, on my laptop (Pentium M) I have configured CONFIG_HZ_300 and CONFIG_NO_HZ.
On my desktop PC (Athlon 2000+, also UP) I also have CONFIG_HZ_300 and 
CONFIG_NO_HZ but didn't notice the problem.

Maxi


signature.asc
Description: This is a digitally signed message part.

[PATCH 2/4] b44: tx bounce sizing.

2007-06-04 Thread Stephen Hemminger

No need to grap full size MTU buffer for possibly small transmit
bounce buffers.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/b44.c |7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/net/b44.c 2007-06-04 12:46:54.0 -0700
+++ b/drivers/net/b44.c 2007-06-04 12:48:21.0 -0700
@@ -69,7 +69,6 @@
 #define NEXT_TX(N) (((N) + 1) & (B44_TX_RING_SIZE - 1))
 
 #define RX_PKT_BUF_SZ  (1536 + bp->rx_offset + 64)
-#define TX_PKT_BUF_SZ  (B44_MAX_MTU + ETH_HLEN + 8)
 
 /* minimum number of free TX descriptors required to wake up TX process */
 #define B44_TX_WAKEUP_THRESH   (B44_TX_RING_SIZE / 4)
@@ -968,7 +967,6 @@ static void b44_tx_timeout(struct net_de
 static int b44_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct b44 *bp = netdev_priv(dev);
-   struct sk_buff *bounce_skb;
int rc = NETDEV_TX_OK;
dma_addr_t mapping;
u32 len, entry, ctrl;
@@ -986,12 +984,13 @@ static int b44_start_xmit(struct sk_buff
 
mapping = pci_map_single(bp->pdev, skb->data, len, PCI_DMA_TODEVICE);
if (dma_mapping_error(mapping) || mapping + len > DMA_30BIT_MASK) {
+   struct sk_buff *bounce_skb;
+
/* Chip can't handle DMA to/from >1GB, use bounce buffer */
if (!dma_mapping_error(mapping))
pci_unmap_single(bp->pdev, mapping, len, 
PCI_DMA_TODEVICE);
 
-   bounce_skb = __dev_alloc_skb(TX_PKT_BUF_SZ,
-GFP_ATOMIC|GFP_DMA);
+   bounce_skb = __dev_alloc_skb(len, GFP_ATOMIC | GFP_DMA);
if (!bounce_skb)
goto err_out;
 
@@ -1000,13 +999,12 @@ static int b44_start_xmit(struct sk_buff
if (dma_mapping_error(mapping) || mapping + len > 
DMA_30BIT_MASK) {
if (!dma_mapping_error(mapping))
pci_unmap_single(bp->pdev, mapping,
-len, PCI_DMA_TODEVICE);
+len, PCI_DMA_TODEVICE);
dev_kfree_skb_any(bounce_skb);
goto err_out;
}
 
-   skb_copy_from_linear_data(skb, skb_put(bounce_skb, len),
- skb->len);
+   skb_copy_from_linear_data(skb, skb_put(bounce_skb, len), len);
dev_kfree_skb_any(skb);
skb = bounce_skb;
}

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/4] b44: use netdev_alloc_skb

2007-06-04 Thread Stephen Hemminger

Use netdev_alloc_skb rather than dev_alloc_skb when allocating
receive buffers.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/b44.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/drivers/net/b44.c 2007-06-04 12:49:11.0 -0700
+++ b/drivers/net/b44.c 2007-06-04 12:49:26.0 -0700
@@ -653,7 +653,7 @@ static int b44_alloc_rx_skb(struct b44 *
src_map = &bp->rx_buffers[src_idx];
dest_idx = dest_idx_unmasked & (B44_RX_RING_SIZE - 1);
map = &bp->rx_buffers[dest_idx];
-   skb = dev_alloc_skb(RX_PKT_BUF_SZ);
+   skb = netdev_alloc_skb(bp->dev, RX_PKT_BUF_SZ);
if (skb == NULL)
return -ENOMEM;
 
@@ -669,7 +669,7 @@ static int b44_alloc_rx_skb(struct b44 *
if (!dma_mapping_error(mapping))
pci_unmap_single(bp->pdev, mapping, 
RX_PKT_BUF_SZ,PCI_DMA_FROMDEVICE);
dev_kfree_skb_any(skb);
-   skb = __dev_alloc_skb(RX_PKT_BUF_SZ,GFP_DMA);
+   skb = __netdev_alloc_skb(bp->dev, RX_PKT_BUF_SZ, 
GFP_ATOMIC|GFP_DMA);
if (skb == NULL)
return -ENOMEM;
mapping = pci_map_single(bp->pdev, skb->data,
@@ -684,7 +684,6 @@ static int b44_alloc_rx_skb(struct b44 *
}
}
 
-   skb->dev = bp->dev;
rh = (struct rx_header *) skb->data;
skb_reserve(skb, RX_PKT_OFFSET);
 

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/4] b44: packet offset is constant

2007-06-04 Thread Stephen Hemminger

The receive buffer offset is constant in this driver.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/b44.c |   29 -
 drivers/net/b44.h |2 --
 2 files changed, 12 insertions(+), 19 deletions(-)

Index: lifebook/drivers/net/b44.c
===
--- lifebook.orig/drivers/net/b44.c 2007-06-04 13:22:12.0 -0700
+++ lifebook/drivers/net/b44.c  2007-06-04 13:27:47.0 -0700
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -68,7 +69,8 @@
  (BP)->tx_cons - (BP)->tx_prod - TX_RING_GAP(BP))
 #define NEXT_TX(N) (((N) + 1) & (B44_TX_RING_SIZE - 1))
 
-#define RX_PKT_BUF_SZ  (1536 + bp->rx_offset + 64)
+#define RX_PKT_OFFSET  30
+#define RX_PKT_BUF_SZ  (1536 + RX_PKT_OFFSET + 64)
 
 /* minimum number of free TX descriptors required to wake up TX process */
 #define B44_TX_WAKEUP_THRESH   (B44_TX_RING_SIZE / 4)
@@ -683,10 +685,9 @@
}
 
skb->dev = bp->dev;
-   skb_reserve(skb, bp->rx_offset);
+   rh = (struct rx_header *) skb->data;
+   skb_reserve(skb, RX_PKT_OFFSET);
 
-   rh = (struct rx_header *)
-   (skb->data - bp->rx_offset);
rh->len = 0;
rh->flags = 0;
 
@@ -696,13 +697,13 @@
if (src_map != NULL)
src_map->skb = NULL;
 
-   ctrl  = (DESC_CTRL_LEN & (RX_PKT_BUF_SZ - bp->rx_offset));
+   ctrl  = (DESC_CTRL_LEN & (RX_PKT_BUF_SZ - RX_PKT_OFFSET));
if (dest_idx == (B44_RX_RING_SIZE - 1))
ctrl |= DESC_CTRL_EOT;
 
dp = &bp->rx_ring[dest_idx];
dp->ctrl = cpu_to_le32(ctrl);
-   dp->addr = cpu_to_le32((u32) mapping + bp->rx_offset + bp->dma_offset);
+   dp->addr = cpu_to_le32((u32) mapping + RX_PKT_OFFSET + bp->dma_offset);
 
if (bp->flags & B44_FLAG_RX_RING_HACK)
b44_sync_dma_desc_for_device(bp->pdev, bp->rx_ring_dma,
@@ -781,7 +782,7 @@
PCI_DMA_FROMDEVICE);
rh = (struct rx_header *) skb->data;
len = le16_to_cpu(rh->len);
-   if ((len > (RX_PKT_BUF_SZ - bp->rx_offset)) ||
+   if ((len > (RX_PKT_BUF_SZ - RX_PKT_OFFSET)) ||
(rh->flags & cpu_to_le16(RX_FLAG_ERRORS))) {
drop_it:
b44_recycle_rx(bp, cons, bp->rx_prod);
@@ -813,8 +814,8 @@
pci_unmap_single(bp->pdev, map,
 skb_size, PCI_DMA_FROMDEVICE);
/* Leave out rx_header */
-   skb_put(skb, len+bp->rx_offset);
-   skb_pull(skb,bp->rx_offset);
+   skb_put(skb, len + RX_PKT_OFFSET);
+   skb_pull(skb, RX_PKT_OFFSET);
} else {
struct sk_buff *copy_skb;
 
@@ -826,7 +827,7 @@
skb_reserve(copy_skb, 2);
skb_put(copy_skb, len);
/* DMA sync done above, copy just the actual packet */
-   skb_copy_from_linear_data_offset(skb, bp->rx_offset,
+   skb_copy_from_linear_data_offset(skb, RX_PKT_OFFSET,
 copy_skb->data, len);
skb = copy_skb;
}
@@ -1393,12 +1394,12 @@
bw32(bp, B44_TX_WMARK, 56); /* XXX magic */
if (reset_kind == B44_PARTIAL_RESET) {
bw32(bp, B44_DMARX_CTRL, (DMARX_CTRL_ENABLE |
- (bp->rx_offset << DMARX_CTRL_ROSHIFT)));
+ (RX_PKT_OFFSET << DMARX_CTRL_ROSHIFT)));
} else {
bw32(bp, B44_DMATX_CTRL, DMATX_CTRL_ENABLE);
bw32(bp, B44_DMATX_ADDR, bp->tx_ring_dma + bp->dma_offset);
bw32(bp, B44_DMARX_CTRL, (DMARX_CTRL_ENABLE |
- (bp->rx_offset << DMARX_CTRL_ROSHIFT)));
+ (RX_PKT_OFFSET << DMARX_CTRL_ROSHIFT)));
bw32(bp, B44_DMARX_ADDR, bp->rx_ring_dma + bp->dma_offset);
 
bw32(bp, B44_DMARX_PTR, bp->rx_pending);
@@ -2090,11 +2091,6 @@
 
bp->phy_addr = eeprom[90] & 0x1f;
 
-   /* With this, plus the rx_header prepended to the data by the
-* hardware, we'll land the ethernet header on a 2-byte boundary.
-*/
-   bp->rx_offset = 30;
-
bp->imask = IMASK_DEF;
 
bp->core_unit = ssb_core_unit(bp);
Index: lifebook/drivers/net/b44.h
===
--- lifebook.orig/drivers/net/b44.h 2007-06-04 13:18:25.0 -0700
+++ lifebook/drivers/net/b44.h  2007-06-04 13:22:15.0 -0700
@@ -443,8 +443,6 @@
 #define B44_FLAG_TX_RING_HACK  0x4000
 #define B44_FLAG_WOL_ENABLE0x80

[PATCH 0/4] b44 driver improvements

2007-06-04 Thread Stephen Hemminger

While researching why the wired networking was so slow
on my laptop (not a driver problem); spotted these small
changes to b44 driver. Nothing urgent, maybe 2.6.23
or later material.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/4] b44: timer power saving

2007-06-04 Thread Stephen Hemminger

Make the PHY and statistic timer run on one second boundary
for powersaving.

On resume, the driver should check for link up immediately, to
get online faster (rather than waiting for the next second).

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


---
 drivers/net/b44.c |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/net/b44.c 2007-06-04 12:31:27.0 -0700
+++ b/drivers/net/b44.c 2007-06-04 12:31:34.0 -0700
@@ -599,8 +599,7 @@ static void b44_timer(unsigned long __op
 
spin_unlock_irq(&bp->lock);
 
-   bp->timer.expires = jiffies + HZ;
-   add_timer(&bp->timer);
+   mod_timer(&bp->timer, round_jiffies(jiffies + HZ));
 }
 
 static void b44_tx(struct b44 *bp)
@@ -2348,11 +2347,11 @@ static int b44_resume(struct pci_dev *pd
netif_device_attach(bp->dev);
spin_unlock_irq(&bp->lock);
 
-   bp->timer.expires = jiffies + HZ;
-   add_timer(&bp->timer);
-
b44_enable_ints(bp);
netif_wake_queue(dev);
+
+   mod_timer(&bp->timer, jiffies + 1);
+
return 0;
 }
 

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Stephen Hemminger

On Mon, 4 Jun 2007 21:47:59 +0200
Maximilian Engelhardt <[EMAIL PROTECTED]> wrote:

> On Monday 04 June 2007, Ingo Molnar wrote:
> > * Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > > Yes, the following patch makes iperf work better than ever. But are
> > > other broken applications going to have same problem. Sounds like the
> > > old "who runs first" fork() problems.
> >
> > this is the first such app and really, and even for this app: i've been
> > frequently running iperf on -rt kernels for _years_ and never noticed
> > how buggy its 'locking' code was, and that it would under some
> > circumstances use up the whole CPU on high-res timers.
> 
> I must admit I don't know much about that topic, but there is one thing I 
> don't understand. Why is iperf (even if it's buggy) able to use up the whole 
> cpu? I didn't run it as root but as my normal user so it should have limited 
> rights. Shouldn't the linux scheduler distribute cpu time among all running 
> processes?

In this case, there are two threads. One is receiving data and the other
is spinning checking on progress. If the spinning thread doesn't yield,
it will end up using it's whole quantum (10ms at 100hz), before the
scheduler lets the receiver run again. If the receiving thread doesn't
get to run then on a UP the performance stinks.

The problem only showed up laptop because most of my other systems are
SMP (or fake SMP/HT), and usually set HZ to 1000 not 100.

It kind of reminds me of the family road trip with the kids in the back
seat going "Are we there yet?"

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Maximilian Engelhardt

On Monday 04 June 2007, Ingo Molnar wrote:
> * Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > Yes, the following patch makes iperf work better than ever. But are
> > other broken applications going to have same problem. Sounds like the
> > old "who runs first" fork() problems.
>
> this is the first such app and really, and even for this app: i've been
> frequently running iperf on -rt kernels for _years_ and never noticed
> how buggy its 'locking' code was, and that it would under some
> circumstances use up the whole CPU on high-res timers.

I must admit I don't know much about that topic, but there is one thing I 
don't understand. Why is iperf (even if it's buggy) able to use up the whole 
cpu? I didn't run it as root but as my normal user so it should have limited 
rights. Shouldn't the linux scheduler distribute cpu time among all running 
processes?

Maxi

signature.asc
Description: This is a digitally signed message part.

Re: [PATCH]: Add security check before flushing SAD/SPD

2007-06-04 Thread James Morris

On Mon, 4 Jun 2007, Eric Paris wrote:

> Some time ago this thread bounced back and forth and seemed to come to
> rest with the patch below, I cleaned up the comments and put all the
> ACKs it received in one place below, but so much time has passed I doubt
> if they should still count for free.  I also rediffed the patch against
> the latest miller tree.  Is the idea or patch in any way flawed or
> unacceptable to people at the moment?
> 
> Anyone willing to step up an re-ack the patch to get it moving into the
> tree?

Looks good to me.

Acked-by: James Morris <[EMAIL PROTECTED]>


-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Ingo Molnar

* Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> Yes, the following patch makes iperf work better than ever. But are 
> other broken applications going to have same problem. Sounds like the 
> old "who runs first" fork() problems.

this is the first such app and really, and even for this app: i've been 
frequently running iperf on -rt kernels for _years_ and never noticed 
how buggy its 'locking' code was, and that it would under some 
circumstances use up the whole CPU on high-res timers.

If this were a widespread problem then the right solution would be to 
preserve the old behavior. The child-runs-first thing was an unspecified 
detail and many apps grew to depend on how the kernel did it - and when 
we changed it all hell broke lose. Even today, when i switch off 
child-runs-first, Gnome would not start up because some of its startup 
threads are racy and hang :-/

I googled for usleep(0) quickly (on google.com/codesearch and on 
google.com) and it didnt seem to suggest that the problem is widespread. 
(3 hits on the code-search site, all were false positives.)

so ... if this situation were even just a little bit more serious i'd 
argue for working this around and implementing some API quirk. But right 
now i'm leaning towards just saying that the iperf fix exists and fixes 
the problem, and that we already have kernels out with the new behavior. 
Maybe we should add a once-per-boot printk to flesh out any other apps? 
I'd be surprised if there was more than iperf.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Thomas Gleixner

On Mon, 2007-06-04 at 21:00 +0200, Thomas Gleixner wrote:
> > 
> > Yes, the following patch makes iperf work better than ever.
> > But are other broken applications going to have same problem.
> > Sounds like the old "who runs first" fork() problems.
> 
> Not really. The fork() "who runs first" problem is nowhere specified.
> 
> usleep(0) is well defined:
> 
>  If the value of useconds is 0, then the call has no effect.
> 
> So the call into the kernel has been wrong for quite a time.
> 

Just for clarification: I'm not saying that we should break the (broken)
user space ABI. I'm going to work out a patch which prints out a warning
(limited number per boot) and emulating the old behavior by a call to
yield() along with an entry into (mis)feature-removal.txt.

tglx


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH]: Add security check before flushing SAD/SPD

2007-06-04 Thread Eric Paris

Some time ago this thread bounced back and forth and seemed to come to
rest with the patch below, I cleaned up the comments and put all the
ACKs it received in one place below, but so much time has passed I doubt
if they should still count for free.  I also rediffed the patch against
the latest miller tree.  Is the idea or patch in any way flawed or
unacceptable to people at the moment?

Anyone willing to step up an re-ack the patch to get it moving into the
tree?

-Eric

Currently we check for permission before deleting entries from SAD and
SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete())
However we are not checking for authorization when flushing the SPD and
the SAD completely. It was perhaps missed in the original security hooks
patch.

This patch adds a security check when flushing entries from the SAD and
SPD.  It runs the entire database and checks each entry for a denial.
If the process attempting the flush is unable to remove all of the
entries a denial is logged the the flush function returns an error
without removing anything.

This is particularly useful when a process may need to create or delete
its own xfrm entries used for things like labeled networking but that
same process should not be able to delete other entries or flush the
entire database.

WAS Signed-off-by: Signed-off-by: Joy Latten<[EMAIL PROTECTED]> NOT NOW
WAS Acked-by: James Morris <[EMAIL PROTECTED]> NOT NOW
WAS Acked-by: Eric Paris <[EMAIL PROTECTED]> NOT NOW

---

 include/net/xfrm.h |6 ++--
 net/key/af_key.c   |   10 ++-
 net/xfrm/xfrm_policy.c |   63 +--
 net/xfrm/xfrm_state.c  |   46 --
 net/xfrm/xfrm_user.c   |9 +-
 5 files changed, 121 insertions(+), 13 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 90185e8..311f25a 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -964,7 +964,7 @@ struct xfrmk_spdinfo {
 
 extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq);
 extern int xfrm_state_delete(struct xfrm_state *x);
-extern void xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info);
+extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info);
 extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si);
 extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si);
 extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq);
@@ -1020,13 +1020,13 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(u8 type, int 
dir,
  struct xfrm_sec_ctx *ctx, int delete,
  int *err);
 struct xfrm_policy *xfrm_policy_byid(u8, int dir, u32 id, int delete, int 
*err);
-void xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info);
+int xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info);
 u32 xfrm_get_acqseq(void);
 void xfrm_alloc_spi(struct xfrm_state *x, __be32 minspi, __be32 maxspi);
 struct xfrm_state * xfrm_find_acq(u8 mode, u32 reqid, u8 proto,
  xfrm_address_t *daddr, xfrm_address_t *saddr,
  int create, unsigned short family);
-extern void xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info);
+extern int xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info);
 extern int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy 
*pol);
 extern int xfrm_bundle_ok(struct xfrm_policy *pol, struct xfrm_dst *xdst,
  struct flowi *fl, int family, int strict);
diff --git a/net/key/af_key.c b/net/key/af_key.c
index d302dda..0f8304b 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1682,6 +1682,7 @@ static int pfkey_flush(struct sock *sk, struct sk_buff 
*skb, struct sadb_msg *hd
unsigned proto;
struct km_event c;
struct xfrm_audit audit_info;
+   int err;
 
proto = pfkey_satype2proto(hdr->sadb_msg_satype);
if (proto == 0)
@@ -1689,7 +1690,9 @@ static int pfkey_flush(struct sock *sk, struct sk_buff 
*skb, struct sadb_msg *hd
 
audit_info.loginuid = audit_get_loginuid(current->audit_context);
audit_info.secid = 0;
-   xfrm_state_flush(proto, &audit_info);
+   err = xfrm_state_flush(proto, &audit_info);
+   if (err)
+   return err;
c.data.proto = proto;
c.seq = hdr->sadb_msg_seq;
c.pid = hdr->sadb_msg_pid;
@@ -2683,10 +2686,13 @@ static int pfkey_spdflush(struct sock *sk, struct 
sk_buff *skb, struct sadb_msg
 {
struct km_event c;
struct xfrm_audit audit_info;
+   int err;
 
audit_info.loginuid = audit_get_loginuid(current->audit_context);
audit_info.secid = 0;
-   xfrm_policy_flush(XFRM_POLICY_TYPE_MAIN, &audit_info);
+   err = xfrm_policy_flush(XFRM_POLICY_TYPE_MAIN, &audit_info);
+   if (err)
+   return err;
c.data.type = XFRM_POLICY_TYPE_MAIN;
c.event = XFRM_MSG_FLUSHPOLICY;
c.pid = hdr->sadb_msg_pid;
d

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Thomas Gleixner

On Mon, 2007-06-04 at 10:51 -0700, Stephen Hemminger wrote:
> > I doubt that. This is in the iperf code itself.
> > 
> > void thread_rest ( void ) {
> > #if defined( HAVE_THREAD )
> > #if defined( HAVE_POSIX_THREAD )
> > // TODO add checks for sched_yield or pthread_yield and call that
> > // if available
> > usleep( 0 );
> > 
> > --
> > 
> > It results in a nanosleep({0,0}, NULL)
> > 
> > tglx
> > 
> 
> Yes, the following patch makes iperf work better than ever.
> But are other broken applications going to have same problem.
> Sounds like the old "who runs first" fork() problems.

Not really. The fork() "who runs first" problem is nowhere specified.

usleep(0) is well defined:

 If the value of useconds is 0, then the call has no effect.

So the call into the kernel has been wrong for quite a time.

tglx


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros

2007-06-04 Thread Andy Gospodarek

On Sun, Jun 03, 2007 at 11:50:29AM -0400, Jeff Garzik wrote:
> Mithlesh Thukral wrote:
> >NetXen: Add NETXEN prefix to a macro
> >This patch will add the "NETXEN" prefix to "USER_START" macro.
> >
> >Signed-off by: Wen Xiong <[EMAIL PROTECTED]>
> >Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]>
> >---
> >
> > drivers/net/netxen/netxen_nic.h |4 ++--
> > drivers/net/netxen/netxen_nic_ethtool.c |2 +-
> > drivers/net/netxen/netxen_nic_hw.c  |4 ++--
> > drivers/net/netxen/netxen_nic_init.c|4 ++--
> > 4 files changed, 7 insertions(+), 7 deletions(-)
> 
> Your patch description is useless.  Clearly we know -what- it does, 
> simply by reading the patch.
> 
> But it does not answer the simple question:  why?  why is this needed in 
> a bug fix Release Candidate series?
> 

I can't see why this is needed in an RC branch, but the use of
generically named macros/variables has been problematic with this driver
on more 'obscure' arches.  

I submitted a bigger cleanup patch a in March that never got taken so
this is still a problem.  Here is that patch:

Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
---

 netxen_nic.h |   47 ---
 netxen_nic_ethtool.c |8 
 netxen_nic_hw.c  |   10 +-
 netxen_nic_init.c|   23 ---
 4 files changed, 45 insertions(+), 43 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index dd8ce35..d5f0c06 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -68,9 +68,10 @@
 #define _NETXEN_NIC_LINUX_SUBVERSION 3
 #define NETXEN_NIC_LINUX_VERSIONID  "3.3.3"
 
-#define NUM_FLASH_SECTORS (64)
-#define FLASH_SECTOR_SIZE (64 * 1024)
-#define FLASH_TOTAL_SIZE  (NUM_FLASH_SECTORS * FLASH_SECTOR_SIZE)
+#define NETXEN_NUM_FLASH_SECTORS (64)
+#define NETXEN_FLASH_SECTOR_SIZE (64 * 1024)
+#define NETXEN_FLASH_TOTAL_SIZE  (NETXEN_NUM_FLASH_SECTORS \
+   * NETXEN_FLASH_SECTOR_SIZE)
 
 #define PHAN_VENDOR_ID 0x4040
 
@@ -671,28 +672,28 @@ struct netxen_new_user_info {
 
 /* Flash memory map */
 typedef enum {
-   CRBINIT_START = 0,  /* Crbinit section */
-   BRDCFG_START = 0x4000,  /* board config */
-   INITCODE_START = 0x6000,/* pegtune code */
-   BOOTLD_START = 0x1, /* bootld */
-   IMAGE_START = 0x43000,  /* compressed image */
-   SECONDARY_START = 0x20, /* backup images */
-   PXE_START = 0x3E,   /* user defined region */
-   USER_START = 0x3E8000,  /* User defined region for new boards */
-   FIXED_START = 0x3F  /* backup of crbinit */
+   NETXEN_CRBINIT_START = 0,   /* Crbinit section */
+   NETXEN_BRDCFG_START = 0x4000,   /* board config */
+   NETXEN_INITCODE_START = 0x6000, /* pegtune code */
+   NETXEN_BOOTLD_START = 0x1,  /* bootld */
+   NETXEN_IMAGE_START = 0x43000,   /* compressed image */
+   NETXEN_SECONDARY_START = 0x20,  /* backup images */
+   NETXEN_PXE_START = 0x3E,/* user defined region */
+   NETXEN_USER_START = 0x3E8000,   /* User defined region for new boards */
+   NETXEN_FIXED_START = 0x3F   /* backup of crbinit */
 } netxen_flash_map_t;
 
-#define USER_START_OLD PXE_START   /* for backward compatibility */
-
-#define FLASH_START(CRBINIT_START)
-#define INIT_SECTOR(0)
-#define PRIMARY_START  (BOOTLD_START)
-#define FLASH_CRBINIT_SIZE (0x4000)
-#define FLASH_BRDCFG_SIZE  (sizeof(struct netxen_board_info))
-#define FLASH_USER_SIZE(sizeof(struct 
netxen_user_info)/sizeof(u32))
-#define FLASH_SECONDARY_SIZE   (USER_START-SECONDARY_START)
-#define NUM_PRIMARY_SECTORS(0x20)
-#define NUM_CONFIG_SECTORS (1)
+#define NETXEN_USER_START_OLD NETXEN_PXE_START /* for backward compatibility */
+
+#define NETXEN_FLASH_START (NETXEN_CRBINIT_START)
+#define NETXEN_INIT_SECTOR (0)
+#define NETXEN_PRIMARY_START   (NETXEN_BOOTLD_START)
+#define NETXEN_FLASH_CRBINIT_SIZE  (0x4000)
+#define NETXEN_FLASH_BRDCFG_SIZE   (sizeof(struct netxen_board_info))
+#define NETXEN_FLASH_USER_SIZE (sizeof(struct 
netxen_user_info)/sizeof(u32))
+#define NETXEN_FLASH_SECONDARY_SIZE
(NETXEN_USER_START-NETXEN_SECONDARY_START)
+#define NETXEN_NUM_PRIMARY_SECTORS (0x20)
+#define NETXEN_NUM_CONFIG_SECTORS  (1)
 #define PFX "NetXen: "
 extern char netxen_nic_driver_name[];
 
diff --git a/drivers/net/netxen/netxen_nic_ethtool.c 
b/drivers/net/netxen/netxen_nic_ethtool.c
index ee1b5a2..4dfa76b 100644
--- a/drivers/net/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/netxen/netxen_nic_ethtool.c
@@ -94,7 +94,7 @@ static const char netxen_nic_gstrings_test[][ETH_GSTRING_LEN] 
= {
 
 static int netxen_nic_get_eeprom_len(struct net_device *dev)
 {
-   return FLASH_TOTAL_SIZE;
+   return NETXEN_FLASH_TOTAL_SIZE;
 }
 
 static void
@@

2.6.22-rc3-mm1 - pppd hanging in netdev_run_todo while holding mutex

2007-06-04 Thread Valdis . Kletnieks

On Wed, 30 May 2007 23:58:23 PDT, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc3/2.6.22-rc3-mm1/

Under 22-rc2-mm1, if my VPN connection got reset, ppp0 just quietly went away.

Under 22-rc3-mm1, it seems to end up wedged and waiting for references to
go away:

Jun  4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice: 
waiting for ppp0 to become free. Usage count = 8
Jun  4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice: 
waiting for ppp0 to become free. Usage count = 8
Jun  4 09:23:21 turing-police kernel: [90109.520574] unregister_netdevice: 
waiting for ppp0 to become free. Usage count = 8
Jun  4 09:23:32 turing-police kernel: [90119.653129] unregister_netdevice: 
waiting for ppp0 to become free. Usage count = 8

'echo t > /proc/sysrq_trigger' shows pppd hung up here:

Jun  4 10:52:57 turing-police kernel: [95478.047892] pppd  D 
000105ad3830  4968  3815  1 (NOTLB)
Jun  4 10:52:57 turing-police kernel: [95478.047902]  810008d5fd78 
0086  81000349
Jun  4 10:52:57 turing-police kernel: [95478.047911]  810008d5fd28 
810008d4a040 810003461820 810008d4a2b0
Jun  4 10:52:57 turing-police kernel: [95478.047920]  000105ad3733 
0202 00ff 80239795
Jun  4 10:52:57 turing-police kernel: [95478.047928] Call Trace:
Jun  4 10:52:57 turing-police kernel: [95478.047936]  [] 
schedule_timeout+0x8d/0xb4
Jun  4 10:52:57 turing-police kernel: [95478.047945]  [] 
schedule_timeout_uninterruptible+0x19/0x1b
Jun  4 10:52:57 turing-police kernel: [95478.047954]  [] 
msleep+0x14/0x1e
Jun  4 10:52:57 turing-police kernel: [95478.047963]  [] 
netdev_run_todo+0x12f/0x234 
Jun  4 10:52:57 turing-police kernel: [95478.047972]  [] 
rtnl_unlock+0x35/0x37
Jun  4 10:52:57 turing-police kernel: [95478.047981]  [] 
unregister_netdev+0x1e/0x23
Jun  4 10:52:57 turing-police kernel: [95478.047994]  [] 
:ppp_generic:ppp_shutdown_interface+0x67/0xbb
Jun  4 10:52:57 turing-police kernel: [95478.048018]  [] 
:ppp_generic:ppp_release+0x33/0x65
Jun  4 10:52:57 turing-police kernel: [95478.048028]  [] 
__fput+0xac/0x176
Jun  4 10:52:57 turing-police kernel: [95478.048036]  [] 
fput+0x14/0x16
Jun  4 10:52:57 turing-police kernel: [95478.048045]  [] 
filp_close+0x66/0x71
Jun  4 10:52:57 turing-police kernel: [95478.048054]  [] 
sys_close+0x98/0xd7
Jun  4 10:52:57 turing-police kernel: [95478.048062]  [] 
tracesys+0xdc/0xe1
Jun  4 10:52:57 turing-police kernel: [95478.048073]  [<2b45cd2429a0>]

Which in itself wouldn't be so bad, except that it's holding a mutex and
lots of other stuff gets wedged up waiting for it (here's 1 of 6 processes
that was wedged this morning):

Jun  4 10:52:58 turing-police kernel: [95478.051129] ifconfig  D 
810005e19820  5800  9787  20510 (NOTLB)
Jun  4 10:52:58 turing-police kernel: [95478.051141]  81000868fd08 
0082 81000868fec8 0246
Jun  4 10:52:58 turing-police kernel: [95478.051150]  00010101 
810005e19820 810003fe0820 810005e19a90
Jun  4 10:52:58 turing-police kernel: [95478.051159]  0a3f26c0 
0006 81000868ff28 8028aacc
Jun  4 10:52:58 turing-police kernel: [95478.051167] Call Trace:
Jun  4 10:52:58 turing-police kernel: [95478.051176]  [] 
__mutex_lock_slowpath+0x74/0xb6
Jun  4 10:52:58 turing-police kernel: [95478.051185]  [] 
mutex_lock+0xe/0x10
Jun  4 10:52:58 turing-police kernel: [95478.051193]  [] 
netdev_run_todo+0x19/0x234
Jun  4 10:52:58 turing-police kernel: [95478.051202]  [] 
rtnl_unlock+0x35/0x37
Jun  4 10:52:58 turing-police kernel: [95478.051210]  [] 
dev_ioctl+0x3e3/0x483
Jun  4 10:52:58 turing-police kernel: [95478.051218]  [] 
sock_ioctl+0x1ef/0x1fc
Jun  4 10:52:58 turing-police kernel: [95478.051227]  [] 
do_ioctl+0x2a/0x77
Jun  4 10:52:58 turing-police kernel: [95478.051235]  [] 
vfs_ioctl+0x247/0x264
Jun  4 10:52:58 turing-police kernel: [95478.051243]  [] 
sys_ioctl+0x5f/0x85
Jun  4 10:52:58 turing-police kernel: [95478.051252]  [] 
tracesys+0xdc/0xe1

(And of course, you can't shutdown cleanly, because /etc/init.d/network tries
to down other interfaces on the way out, and)

I'd bisect this, except I don't have a better way to replicate it than "wait for
our VPN box to reset the connection after 24 hours of connect" - basically means
I get 2 tries per weekend..)

An hour or so of digging through the -rc3-mm1 broken-out/ didn't find any
obvious-to-me culprits.  Any ideas/suggestions?


pgpgLKOKJ5mzu.pgp
Description: PGP signature

Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-04 Thread Oliver Hartkopp

Oliver Hartkopp wrote:
> I think, it goes into the right direction to use skb->pkt_type. The flag
> should really be somewhere inside the skb as all back references into
> the sk would become sticky in the implementation.
>
> But regarding the use of skb->pkt_type i would suggest to take a closer
> look on the definitions in include/linux/if_packet.h and how the
> pkt_type is to be used inside the kernel. In my opinion we should use ...
>
> * TX-Path:
> PACKET_OTHERHOST: send the CAN frame without loopback
> PACKET_BROADCAST : send the CAN frame with loopback (aka local broadcast)
>
> See an example of this approach in drivers/net/arcnet/rfc1051.c :
> http://www.linux-m32r.org/lxr/http/source/drivers/net/arcnet/rfc1051.c?a=i386#L99
>
> * RX-Path:
> PACKET_HOST : just an incoming CAN frame for this host
>
> Any comments? ACKs?
>
> Best regards,
> Oliver
>
>
>   

The updated changes would look like this. Of course it has been tested
and is working fine :-)

Regards,
Oliver

Index: net/can/af_can.c
===
--- net/can/af_can.c(revision 325)
+++ net/can/af_can.c(working copy)
@@ -257,7 +257,6 @@
  */
 int can_send(struct sk_buff *skb, int loop)
 {
-struct sock **tx_sk = (struct sock **)skb->cb;
 int err;
 
 if (skb->dev->type != ARPHRD_CAN) {
@@ -265,30 +264,43 @@
 return -EPERM;
 }
 
+skb->protocol = htons(ETH_P_CAN);
+
 if (loop) {
 /* local loopback of sent CAN frames (default) */
 
-/* indication for the CAN driver: do loopback */
-*tx_sk = skb->sk;
+/*
+ * This packet is not only sent on the CAN bus but
+ * also broadcasted to local subscribers on this host.
+ */
+skb->pkt_type = PACKET_BROADCAST;
 
 /*
- * The reference to the originating sock may be also required
+ * The reference to the originating sock may be required
  * by the receiving socket to indicate (and ignore) his own
- * sent data. Example: can_raw sockopt CAN_RAW_RECV_OWN_MSGS
+ * sent data. Example: can_raw sockopt CAN_RAW_RECV_OWN_MSGS.
+ * Therefore we have to ensure that skb->sk remains the
+ * reference to the originating sock by restoring skb->sk
+ * after each skb_clone() or skb_orphan() usage.
+ * skb->sk is usually unused and unset in the rx path.
  */
 
 /* interface not capabable to do the loopback itself? */
 if (!(skb->dev->flags & IFF_LOOPBACK)) {
+struct sock *srcsk = skb->sk;
 struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
 
 /* perform the local loopback here */
-newskb->protocol  = htons(ETH_P_CAN);
 newskb->ip_summed = CHECKSUM_UNNECESSARY;
+newskb->sk = srcsk;
 netif_rx(newskb);
 }
 } else {
-/* indication for the CAN driver: no loopback required */
-*tx_sk = NULL;
+/*
+ * Indication for the CAN driver: No loopback required!
+ * This packet is only transmitted to the CAN bus.
+ */
+skb->pkt_type = PACKET_OTHERHOST;
 }
 
 if (!(skb->dev->flags & IFF_UP))
@@ -581,10 +593,12 @@
 
 static inline void deliver(struct sk_buff *skb, struct receiver *r)
 {
+struct sock *srcsk = skb->sk;
 struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);
 
 DBG("skbuff %p cloned to %p\n", skb, clone);
 if (clone) {
+clone->sk = srcsk;
 r->func(clone, r->data);
 r->matches++;
 }
Index: net/can/raw.c
===
--- net/can/raw.c(revision 325)
+++ net/can/raw.c(working copy)
@@ -154,7 +154,7 @@
 
 if (!ro->recv_own_msgs) {
 /* check the received tx sock reference */
-if (*(struct sock **)skb->cb == sk) {
+if (skb->sk == sk) {
 DBG("trashed own tx msg\n");
 kfree_skb(skb);
 return;
Index: drivers/net/can/vcan.c
===
--- drivers/net/can/vcan.c(revision 325)
+++ drivers/net/can/vcan.c(working copy)
@@ -133,6 +133,7 @@
 skb->protocol  = htons(ETH_P_CAN);
 skb->dev   = dev;
 skb->ip_summed = CHECKSUM_UNNECESSARY;
+skb->pkt_type  = PACKET_HOST;
 
 DBG("received skbuff on interface %d\n", dev->ifindex);
 
@@ -149,8 +150,8 @@
 stats->tx_packets++;
 stats->tx_bytes += skb->len;
 
-/* tx socket reference pointer: Loopback required if not NULL */
-loop = *(struct sock **)skb->cb != NULL;
+/* indication for CAN netdevice drivers that loopback is required */
+loop = (skb->pkt_type == PACKET_BROADCAST);
 
 if (!loopback) {
 /* no loopback handling available inside this driver */
@@ -170,6 +171,8 @@
 /* perform standard loopback handling for CAN network interfaces */
 
 if (loop) {
+struct sock *srcsk = skb->sk;
+

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Stephen Hemminger

On Mon, 04 Jun 2007 19:32:48 +0200
Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-06-04 at 09:59 -0700, Stephen Hemminger wrote:
> > > > gettimeofday({1180973726, 982754}, NULL) = 0
> > > > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 
> > > > 8192, 0) = 8192
> > > > gettimeofday({1180973726, 983790}, NULL) = 0
> > > 
> > > Well, gettimeofday() is not affected by the highres code, but
> > > 
> > > > nanosleep({0, 0}, NULL) = 0
> > > > nanosleep({0, 0}, NULL) = 0
> > > 
> > > is. The nanosleep call with a relative timeout of 0 returns immediately
> > > with highres enabled, while it sleeps at least until the next tick
> > > arrives when highres is off. Are there more of those stupid sleeps in
> > > the code ?
> > 
> > GLIBC pthread_mutex does it, YES it is a problem!
> > Looks like the old behavior is required for ABI compatibility.
> >
> > iperf server has several threads. One thread is using pthread_mutex_lock
> > to wait for the other thread.  It looks like pthread_mutex_lock is using
> > nanosleep as yield().
> 
> I doubt that. This is in the iperf code itself.
> 
> void thread_rest ( void ) {
> #if defined( HAVE_THREAD )
> #if defined( HAVE_POSIX_THREAD )
> // TODO add checks for sched_yield or pthread_yield and call that
> // if available
> usleep( 0 );
> 
> --
> 
> It results in a nanosleep({0,0}, NULL)
> 
>   tglx
> 

Yes, the following patch makes iperf work better than ever.
But are other broken applications going to have same problem.
Sounds like the old "who runs first" fork() problems.

--- iperf-2.0.2/compat/Thread.c.orig2005-05-03 08:15:51.0 -0700
+++ iperf-2.0.2/compat/Thread.c 2007-06-04 10:54:21.0 -0700
@@ -405,9 +405,13 @@
 void thread_rest ( void ) {
 #if defined( HAVE_THREAD )
 #if defined( HAVE_POSIX_THREAD )
-// TODO add checks for sched_yield or pthread_yield and call that
-// if available
+
+#if defined( _POSIX_PRIORITY_SCHEDULING )
+sched_yield();
+#else
 usleep( 0 );
+#endif
+
 #else // Win32
 SwitchToThread( );
 #endif




-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Thomas Gleixner

On Mon, 2007-06-04 at 09:59 -0700, Stephen Hemminger wrote:
> > > gettimeofday({1180973726, 982754}, NULL) = 0
> > > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 
> > > 8192, 0) = 8192
> > > gettimeofday({1180973726, 983790}, NULL) = 0
> > 
> > Well, gettimeofday() is not affected by the highres code, but
> > 
> > > nanosleep({0, 0}, NULL) = 0
> > > nanosleep({0, 0}, NULL) = 0
> > 
> > is. The nanosleep call with a relative timeout of 0 returns immediately
> > with highres enabled, while it sleeps at least until the next tick
> > arrives when highres is off. Are there more of those stupid sleeps in
> > the code ?
> 
> GLIBC pthread_mutex does it, YES it is a problem!
> Looks like the old behavior is required for ABI compatibility.
>
> iperf server has several threads. One thread is using pthread_mutex_lock
> to wait for the other thread.  It looks like pthread_mutex_lock is using
> nanosleep as yield().

I doubt that. This is in the iperf code itself.

void thread_rest ( void ) {
#if defined( HAVE_THREAD )
#if defined( HAVE_POSIX_THREAD )
// TODO add checks for sched_yield or pthread_yield and call that
// if available
usleep( 0 );

--

It results in a nanosleep({0,0}, NULL)

tglx


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

iperf: performance regression (was b44 driver problem?)

2007-06-04 Thread Stephen Hemminger

On Mon, 04 Jun 2007 18:35:58 +0200
Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-06-04 at 09:09 -0700, Stephen Hemminger wrote:
> > > > I did the test with an 2.6.22-rc3-git4 kernel and the p54 driver built 
> > > > external as module. 
> > > 
> > > Can you look at iperf to figure out, whether it does some weird timer
> > > stuff (high frequency interval timer or such) ? Either check the code or
> > > strace it.
> >
> > It is the receiver doing a tight loop doing gettimeofday/recv calls.
> > 
> > 
> > sendto(-1227715616, 0xc, 3085438964, 0, {...}, 3067249832) = 0
> > gettimeofday({1180973726, 981615}, NULL) = 0
> > gettimeofday({1180973726, 981751}, NULL) = 0
> > futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1
> > futex(0x8055c90, FUTEX_WAKE, 1) = 0
> > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 
> > 0) = 8192
> > gettimeofday({1180973726, 982754}, NULL) = 0
> > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 
> > 0) = 8192
> > gettimeofday({1180973726, 983790}, NULL) = 0
> 
> Well, gettimeofday() is not affected by the highres code, but
> 
> > nanosleep({0, 0}, NULL) = 0
> > nanosleep({0, 0}, NULL) = 0
> 
> is. The nanosleep call with a relative timeout of 0 returns immediately
> with highres enabled, while it sleeps at least until the next tick
> arrives when highres is off. Are there more of those stupid sleeps in
> the code ?

GLIBC pthread_mutex does it, YES it is a problem!
Looks like the old behavior is required for ABI compatibility.

iperf server has several threads. One thread is using pthread_mutex_lock
to wait for the other thread.  It looks like pthread_mutex_lock is using
nanosleep as yield().

Multi-thread strace shows how this could kill performance.
These are with old 2.6.20 kernel that doesn't have the problem:

sendto(-1210930208, 0xc, 3085438964, 0, {...}, 3084035240) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(5001), 
sin_addr=inet_addr("0.0.0.0")}, 16)   = 0
listen(3, 5)= 0
futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1
accept(3, {sa_family=AF_INET, sin_port=htons(49973), 
sin_addr=inet_addr("192.168.0.14")}, [16]) = 4
getsockname(4, {sa_family=AF_INET, sin_port=htons(5001), 
sin_addr=inet_addr("192.168.0.12")}, [16]) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 24, 0) = 
24
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xb6528000
mprotect(0xb6528000, 4096, PROT_NONE)   = 0
clone(child_stack=0xb6d284a4, 
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
 parent_tidptr=0xb6d28bd8, {entry_number:6, base_addr:0xb6d28b90, 
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, 
seg_not_present:0, useable:1}, child_tidptr=0xb6d28bd8) = 5622
accept(3, 0x8058b98, [128]) = ? ERESTARTSYS (To be restarted)

==

sendto(-1219322912, 0xc, 3085438964, 0, {...}, 3075642536) = 0
futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1
futex(0x8055d30, FUTEX_WAKE, 1) = 1
futex(0x8055c64, FUTEX_WAIT, 1, NULL) = 0
futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1
futex(0x8055d30, FUTEX_WAKE, 1) = 1
futex(0x8055c90, FUTEX_WAKE, 1) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [87380], [4]) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xb7fb
write(1, ""..., 61) = 61
write(1, "Server listening on TCP port 500"..., 34) = 34
write(1, "TCP window size: 85.3 KByte (def"..., 38) = 38
write(1, ""..., 61) = 61
nanosleep({0, 0}, NULL) = 0
futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1
futex(0x8055d30, FUTEX_WAKE, 1) = 1
futex(0x8055c64, FUTEX_WAIT, 3, NULL) = 0
futex(0x8055c90, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily 
unavailable)
futex(0x8055d04, 0x5 /* FUTEX_??? */, 1)   = 1
futex(0x8055d30, FUTEX_WAKE, 1)  = 0
futex(0x8055c90, FUTEX_WAKE, 1)   = 0
write(1, "[  4] local 192.168.0.12 port 50"..., 74)  = 
74
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL)  = 0
nanosleep({0, 0}, NULL)   = 0
nanosleep({0, 0}, NULL) = 0

...

nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
write(1, "[  4]  0.0-30.2 sec336 MByte"..., 50) = 50
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
futex(0x8055d04, 0x5 /*

[PATCH 1/1] myri10ge: limit the number of recoveries

2007-06-04 Thread Brice Goglin

Limit the number of recoveries from a NIC hw watchdog reset to 1 by default.
It enables detection of defective NICs immediately since these memory parity
errors are expected to happen very rarely (less than once per century*NIC).

Signed-off-by: Brice Goglin <[EMAIL PROTECTED]>
---
 drivers/net/myri10ge/myri10ge.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c   2007-05-30 
20:58:22.0 +0200
+++ linux-rc/drivers/net/myri10ge/myri10ge.c2007-06-04 19:01:54.0 
+0200
@@ -279,6 +279,8 @@
 module_param(myri10ge_fill_thresh, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(myri10ge_fill_thresh, "Number of empty rx slots allowed\n");
 
+static int myri10ge_reset_recover = 1;
+
 static int myri10ge_wcfifo = 0;
 module_param(myri10ge_wcfifo, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_wcfifo, "Enable WC Fifo when WC is enabled\n");
@@ -2730,8 +2732,14 @@
 * For now, just report it */
reboot = myri10ge_read_reboot(mgp);
printk(KERN_ERR
-  "myri10ge: %s: NIC rebooted (0x%x), resetting\n",
-  mgp->dev->name, reboot);
+  "myri10ge: %s: NIC rebooted (0x%x),%s resetting\n",
+  mgp->dev->name, reboot,
+  myri10ge_reset_recover ? " " : " not");
+   if (myri10ge_reset_recover == 0)
+   return;
+
+   myri10ge_reset_recover--;
+
/*
 * A rebooted nic will come back with config space as
 * it was after power was applied to PCIe bus.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/1] myri10ge: limit the number of recoveries

2007-06-04 Thread Brice Goglin

Jeff Garzik wrote:
> Brice Goglin wrote:
>> Limit the number of recoveries from a NIC hw watchdog reset to 1 by
>> default.
>> It enables detection of defective NICs immediately since these memory
>> parity
>> errors are expected to happen very rarely (less than once per
>> century*NIC).
>> However, a defective NIC (very rare, fortunately) can see such an error
>> quite often, ie. every few minutes under high load.
>>
>> Make the limit tunable to allow people with mission critical
>> installations
>> to crank up the tunable and recover an INTMAX number of times while
>> waiting
>> for a downtime window to replace the NIC. The performance won't be
>> optimal,
>> but at least, it will still work.
>>
>> Signed-off-by: Brice Goglin <[EMAIL PROTECTED]>
>> ---
>>  drivers/net/myri10ge/myri10ge.c |   15 +--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> NAK.

Ok...
Then please apply the following patch which limits the number of
recovery to 1 without making it tunable. It will at least enable
detection of bad NICs.

Brice


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: b44: regression in 2.6.22 (resend)

2007-06-04 Thread Thomas Gleixner

On Mon, 2007-06-04 at 09:09 -0700, Stephen Hemminger wrote:
> > > I did the test with an 2.6.22-rc3-git4 kernel and the p54 driver built 
> > > external as module. 
> > 
> > Can you look at iperf to figure out, whether it does some weird timer
> > stuff (high frequency interval timer or such) ? Either check the code or
> > strace it.
>
> It is the receiver doing a tight loop doing gettimeofday/recv calls.
> 
> 
> sendto(-1227715616, 0xc, 3085438964, 0, {...}, 3067249832) = 0
> gettimeofday({1180973726, 981615}, NULL) = 0
> gettimeofday({1180973726, 981751}, NULL) = 0
> futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1
> futex(0x8055c90, FUTEX_WAKE, 1) = 0
> recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 
> 0) = 8192
> gettimeofday({1180973726, 982754}, NULL) = 0
> recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 
> 0) = 8192
> gettimeofday({1180973726, 983790}, NULL) = 0

Well, gettimeofday() is not affected by the highres code, but

> nanosleep({0, 0}, NULL) = 0
> nanosleep({0, 0}, NULL) = 0

is. The nanosleep call with a relative timeout of 0 returns immediately
with highres enabled, while it sleeps at least until the next tick
arrives when highres is off. Are there more of those stupid sleeps in
the code ?

tglx


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-04 Thread Oliver Hartkopp

Urs Thuermann wrote:
> Oliver Hartkopp <[EMAIL PROTECTED]> writes:
>
>   
>> 2. The loopback indication is done by using the unused skb->protocol in
>> the tx path.
>> 
>
> I don't think we should (mis-)use skb->protocol as a loopback flag.
>   

Yep! After reading the comments from Patrick and Urs i definitely agree
to their concerns.

> IMO it would be better to skb->pkt_type.  This is used to indicate
> packet type to rcv functions registered by dev_add_pack().  It is set
> by netdevice drivers to PACKET_{MULTICAST,BROADCAST,HOST,OTHER} for
> received packets.  In the send path it is set to PACKET_OUTGOING on
> the copy of the skbuff that is delivered to the sockets registered on
> ptype_all (typically packet sockets from tcpdump or other sniffers).
> AFAICS, pkt_type is not used otherwise in the send path.
>
> We could set skb->pkt_type = PACKET_LOOPBACK to flag to the CAN
> netdevice driver whether to loop back the packet.
>   

I think, it goes into the right direction to use skb->pkt_type. The flag
should really be somewhere inside the skb as all back references into
the sk would become sticky in the implementation.

But regarding the use of skb->pkt_type i would suggest to take a closer
look on the definitions in include/linux/if_packet.h and how the
pkt_type is to be used inside the kernel. In my opinion we should use ...

* TX-Path:
PACKET_OTHERHOST: send the CAN frame without loopback
PACKET_BROADCAST : send the CAN frame with loopback (aka local broadcast)

See an example of this approach in drivers/net/arcnet/rfc1051.c :
http://www.linux-m32r.org/lxr/http/source/drivers/net/arcnet/rfc1051.c?a=i386#L99

* RX-Path:
PACKET_HOST : just an incoming CAN frame for this host

Any comments? ACKs?

Best regards,
Oliver





-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: b44: regression in 2.6.22 (resend)

2007-06-04 Thread Stephen Hemminger

On Mon, 04 Jun 2007 08:39:48 +0200
Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> On Sun, 2007-06-03 at 18:26 +0200, Maximilian Engelhardt wrote:
> > > Is there any other strange behavior of the high res enabled kernel than
> > > the b44 problem ?
> > 
> > I didn't notice anything in the past (as I wrote). But today I did some 
> > tests 
> > for an updated version of the p54 mac80211 wlan driver and I noticed 
> > exactly 
> > the same problem:
> > 
> > when booting with highres=off everything is fine.
> > But when I boot an highres enabled kernel and I do the iperf-test with the 
> > p54 
> > driver, my systems becomes unresponsive during the test. It seems to be 
> > exactly the same problem I have with the b44 driver.
> > So this might not be a bug in the b44 code but a bug somewhere in the linux 
> > networking code.
> > 
> > I did the test with an 2.6.22-rc3-git4 kernel and the p54 driver built 
> > external as module. 
> 
> Can you look at iperf to figure out, whether it does some weird timer
> stuff (high frequency interval timer or such) ? Either check the code or
> strace it.
> 
>   tglx
> 
> 


It is the receiver doing a tight loop doing gettimeofday/recv calls.


sendto(-1227715616, 0xc, 3085438964, 0, {...}, 3067249832) = 0
gettimeofday({1180973726, 981615}, NULL) = 0
gettimeofday({1180973726, 981751}, NULL) = 0
futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1
futex(0x8055c90, FUTEX_WAKE, 1) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 982754}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 983790}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 984355}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 984706}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 985111}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 985499}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 986088}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 986436}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 986916}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 987397}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 987872}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 988440}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 988823}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 989314}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 990029}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 990890}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 991803}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 992616}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 993105}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 993585}, NULL) = 0
recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) 
= 8192
gettimeofday({1180973726, 994014}, NULL) = 0
...

recv(4, "45678901234567890123456789012345"..., 8192, 0) = 1448
gettimeofday({1180973757, 172437}, NULL) = 0
recv(4, "23456789012345678901234567890123"..., 8192, 0) = 1448
gettimeofday({1180973757, 172576}, NULL) = 0
recv(4, "01234567890123456789012345678901"..., 8192, 0) = 1632
gettimeofday({1180973757, 172752}, NULL) = 0
recv(4, "", 8192, 0)= 0
gettimeofday({1180973757, 172797}, NULL) = 0
gettimeofday({1180973757, 172817}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
nanosleep({0, 0}, NULL) = 0
close(4) = 0
futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1
futex(0x8055d30, FUTEX_WAKE, 1) = 0
_exit(0)

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to

Re: [PATCH] sundance: PHY address form 0, only for device ID 0x0200 (IP100A)

2007-06-04 Thread Randy Dunlap

On Mon, 04 Jun 2007 17:04:39 -0400 Jesse Huang wrote:

> From: Jesse Huang <[EMAIL PROTECTED]>
> 
> Change Logs:
> Search PHY address form 0, only for device ID 0x0200 (IP100A). Other device 
> are from PHY address 1.
> 
> Signed-off-by: Jesse Huang <[EMAIL PROTECTED]>
> ---
> 
>  drivers/net/sundance.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> 5724a72722dfc9cafbb8f273cb82dbf577bd9ad0
> diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
> index e1f912d..914ab29 100644
> --- a/drivers/net/sundance.c
> +++ b/drivers/net/sundance.c
> @@ -562,7 +562,9 @@ #endif
>* It seems some phys doesn't deal well with address 0 being accessed
>* first, so leave address zero to the end of the loop (32 & 31).
>*/
> - for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
> + if(sundance_pci_tbl[np->chip_id].device == 0x0200) phy = 0;

kernel style:
if (sundance_pci_tbl[np->chip_id].device == 0x0200)
phy = 0;
else
phy = 1;

> + else phy = 1;
> + for (; phy <= 32 && phy_idx < MII_CNT; phy++) {
>   int phyx = phy & 0x1f;
>   int mii_status = mdio_read(dev, phyx, MII_BMSR);
>   if (mii_status != 0x  &&  mii_status != 0x) {
> -- 


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: Definition and usage of NETIF_F_HW_SUM?

2007-06-04 Thread Ron Mercer

I was out of town last week and did not have a chance to respond.  Yes,
qla3xxx is (before Stephen's fix) broken on IPV6.  I will review the
changes and post a patch if necessary. 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Stephen Hemminger
> Sent: Tuesday, May 29, 2007 4:46 PM
> To: Michael Chan
> Cc: Herbert Xu; netdev
> Subject: Re: Definition and usage of NETIF_F_HW_SUM?
> 
> On Tue, 29 May 2007 17:10:52 -0700
> "Michael Chan" <[EMAIL PROTECTED]> wrote:
> 
> > On Wed, 2007-05-30 at 07:36 +1000, Herbert Xu wrote:
> > 
> > > 
> > > I just checked e1000 and it's correct as it does use the 
> csum_offset
> > > when doing TX offload.  However, you're definitely right that bnx2
> > > seems to be broken.
> > > 
> > > > A few devices take a offset, starting point, and 
> insertion point. This looks like
> > > > the correct model. But no upper layer protocols other 
> than IPV4/IPV6 can do checksum
> > > > offload at present, so it seems moot.
> > > 
> > > I could easily whip up a patch to get GRE to use it for a start :)
> > > 
> > > > IMHO the correct solution would be to get rid if 
> NETIF_F_HW_SUM and make a new flag
> > > > NETIF_F_IPV6_SUM. Devices that can checksum both could 
> do NETIF_F_IPV4_SUM|NETI_F_IPV6_SUM.
> > > 
> > > We should definitely keep NETIF_F_HW_SUM for sane 
> hardware such as the
> > > e1000.  Unfortunately we may just have to invent IPV6_SUM 
> for the broken
> > > ones.
> > > 
> > > Ccing Michael to see if the bnx2 chip can actually do offset-based
> > > checksum offload.
> > > 
> > 
> > bnx2 and tg3 cannot do offset-based checksumming because 
> the hardware
> > doesn't have room in the buffer descriptors to specify the 
> offsets.  So
> > regrettably, the NETIF_F_HW_SUM flag has been misused in 
> these drivers.
> > A new NETIF_F_IPV6_SUM flag will be very useful for us.
> 
> Look furthur many drivers are just plain broken and use F_HW_SUM
> and can't even do IPV6 properly.  I'll fix
> 
> The worst code award goes to: qla3xxx.c
> which is broken on IPV6 and goes to trouble of computing all the
> offsets and they are already there in skb...
> 
> 
> 
> -- 
> Stephen Hemminger <[EMAIL PROTECTED]>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack

2007-06-04 Thread Patrick McHardy

Patrick McHardy wrote:
> [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack
> 
> A time_wait socket inherits sk_bound_dev_if from the original socket,
> but it is not used when sending ACK packets using ip_send_reply.
> 
> Fix by passing the oif to ip_send_reply in struct ip_reply_arg and
> use it for output routing.


The patch was missing a NULL pointer check in tcp_v4_send_ack for
a different code path where twsk == NULL. Fixed version attached.

[TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack

A time_wait socket inherits sk_bound_dev_if from the original socket,
but it is not used when sending ACK packets using ip_send_reply.

Fix by passing the oif to ip_send_reply in struct ip_reply_arg and
use it for output routing.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 9f66116df33984809a5cc0ea66a358db684d4346
tree a0b163d17181f1d1fafb4b7517726b5bb414b579
parent ea1601c50bf23af25094511e2a9ce1b755ab9669
author Patrick McHardy <[EMAIL PROTECTED]> Mon, 04 Jun 2007 17:10:41 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Mon, 04 Jun 2007 17:13:17 +0200

 include/net/ip.h |1 +
 net/ipv4/ip_output.c |4 +++-
 net/ipv4/tcp_ipv4.c  |2 ++
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index bb207db..abf2820 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -143,6 +143,7 @@ struct ip_reply_arg {
__wsum  csum;
int csumoffset; /* u16 offset of csum in iov[0].iov_base */
/* -1 if not needed */ 
+   int bound_dev_if;
 }; 
 
 void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg 
*arg,
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d6427d9..34ea454 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1352,7 +1352,8 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, 
struct ip_reply_arg *ar
}
 
{
-   struct flowi fl = { .nl_u = { .ip4_u =
+   struct flowi fl = { .oif = arg->bound_dev_if,
+   .nl_u = { .ip4_u =
  { .daddr = daddr,
.saddr = rt->rt_spec_dst,
.tos = RT_TOS(ip_hdr(skb)->tos) 
} },
@@ -1376,6 +1377,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, 
struct ip_reply_arg *ar
inet->tos = ip_hdr(skb)->tos;
sk->sk_priority = skb->priority;
sk->sk_protocol = ip_hdr(skb)->protocol;
+   sk->sk_bound_dev_if = arg->bound_dev_if;
ip_append_data(sk, ip_reply_glue_bits, arg->iov->iov_base, len, 0,
   &ipc, rt, MSG_DONTWAIT);
if ((skb = skb_peek(&sk->sk_write_queue)) != NULL) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 47c6105..97e294e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -705,6 +705,8 @@ static void tcp_v4_send_ack(struct tcp_timewait_sock *twsk,
  ip_hdr(skb)->saddr, /* XXX */
  arg.iov[0].iov_len, IPPROTO_TCP, 0);
arg.csumoffset = offsetof(struct tcphdr, check) / 2;
+   if (twsk)
+   arg.bound_dev_if = twsk->tw_sk.tw_bound_dev_if;
 
ip_send_reply(tcp_socket->sk, skb, &arg, arg.iov[0].iov_len);

Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)

2007-06-04 Thread Lennart Sorensen

On Mon, May 07, 2007 at 01:45:11PM -0400, Lennart Sorensen wrote:
> Hmm, I thought I saw it on two systems already, but I should go try that
> again.

Hmm, still haven't figured this out.  I just saw this one this morning:

BUG: soft lockup detected on CPU#0!
 [] dump_stack+0x24/0x30
 [] softlockup_tick+0x7e/0xc0
 [] update_process_times+0x33/0x80
 [] timer_interrupt+0x39/0x80
 [] handle_IRQ_event+0x3d/0x70
 [] __do_IRQ+0xa9/0x150
 [] do_IRQ+0x25/0x60
 [] common_interrupt+0x1a/0x20
 [] handle_IRQ_event+0x18/0x70
 [] __do_IRQ+0xa9/0x150
 [] do_IRQ+0x25/0x60
 [] common_interrupt+0x1a/0x20
 [] __do_softirq+0x3a/0xa0
 [] do_softirq+0x2d/0x30
 [] irq_exit+0x37/0x40
 [] do_IRQ+0x2a/0x60
 [] common_interrupt+0x1a/0x20
 [] setup_irq+0xce/0x1e0
 [] request_irq+0x97/0xb0
 [] pcnet32_open+0x4d/0x3d0 [pcnet32]
 [] dev_open+0x39/0x80
 [] dev_change_flags+0xfa/0x130
 [] devinet_ioctl+0x4ff/0x6f0
 [] sock_ioctl+0xf1/0x1f0
 [] do_ioctl+0x2c/0x80
 [] vfs_ioctl+0x52/0x2f0
 [] sys_ioctl+0x6f/0x80
 [] syscall_call+0x7/0xb
 [] 0xb7f41d04

And it is happening on multiple systems.  I am starting to wonder if it
is a bug in the soft lockup detection.  Maybe it really isn't locked up
but just momentarily appears to be.  I will try turning off the soft
lockup detection and see what happens.

--
Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Failover-friendly TCP retransmission

2007-06-04 Thread Andi Kleen

[EMAIL PROTECTED] writes:

> Please note first that I want to address physical failures by
> the failover-capable network devices, which are increasingly
> becoming important as Xen-based VM systems are getting popular.
> Reducing a single-point-of-failure (physical device) is vital on
> such VM systems.

Just you typically still have lots of other single points of failures in 
a single system, some of them quite less reliable than your typical
NIC. But at least it gives impressive demos when pulling ethernet cables @)

> 1. Network device layer detects a failure first and switch to a
>backup device (say, in 20sec).
> 
> 2. TCP layer timeout & retransmission comes next, _hopefully_
>before the application layer timeout.
> 
> 3. Application layer detects a network failure last (by, say,
>30sec timeout) and may trigger a system-level failover.
> 
> It should be noted that the timeouts for #1 and #2 are handled
> independently and there is no relationship between them.

> If TCP retransmission misses the time frame between event #1 and
> #3 in Background above (between 20 and 30sec since network
> failure), a failure causes the system-level failover where the
> network-device-level failover should be enough.

You should probably make sure that the device ends up returning the
right NET_XMIT_* code for such drops to TCP, in particular
NET_XMIT_DROP. This might require slight driver interface
changes. Also right now it only affects the congestion window, I think, 
it  might be reasonable to let it affect the timer backoff too.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Rafael J. Wysocki

Hi,

On Monday, 4 June 2007 13:11, Pavel Machek wrote:
> Hi!
> 
> > From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > 
> > Add suspend/resume support to the uli526x network driver (tested on x86_64,
> > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev 
> > 40').
> > 
> > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Looks ok to me.
> 
> > +#ifdef CONFIG_PM
> > +
> > +/*
> > + * Suspend the interface.
> > + */
> > +
> > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state)
> > +{
> > +   struct net_device *dev = pci_get_drvdata(pdev);
> > +
> > +   ULI526X_DBUG(0, "uli526x_suspend", 0);
> > +
> > +   if (dev && netdev_priv(dev)) {
> > +   if (netif_running(dev)) {
> > +   netif_device_detach(dev);
> > +   uli526x_down(dev);
> > +   }
> > +   pci_save_state(pdev);
> > +   pci_enable_wake(pdev, pci_choose_state(pdev, state), 0);
> > +   pci_disable_device(pdev);
> > +   pci_set_power_state(pdev, pci_choose_state(pdev, state));
> > +   }
> > +   return 0;
> > +}
> > +
> > +/*
> > + * Resume the interface.
> > + */
> > +
> > +static int uli526x_resume(struct pci_dev *pdev)
> > +{
> > +   struct net_device *dev = pci_get_drvdata(pdev);
> > +   struct uli526x_board_info *db = netdev_priv(dev);
> > +   int err;
> > +
> > +   ULI526X_DBUG(0, "uli526x_resume", 0);
> > +
> > +   if (dev && db) {
> > +   pci_set_power_state(pdev, PCI_D0);
> > +   err = pci_enable_device(pdev);
> > +   if (err) {
> > +   printk(KERN_WARNING "%s: Could not enable device \n",
> > +   dev->name);
> > +   return err;
> > +   }
> > +   pci_restore_state(pdev);
> > +   pci_set_master(pdev);
> > +   if (netif_running(dev)) {
> > +   uli526x_up(dev);
> > +   netif_device_attach(dev);
> > +   }
> > +   }
> > +   return 0;
> > +}
> > +
> #else 
> #define *_resume NULL
> #define *_suspend NULL

> > +#endif /* CONFIG_PM */
> 
> > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
> > .id_table   = uli526x_pci_tbl,
> > .probe  = uli526x_init_one,
> > .remove = __devexit_p(uli526x_remove_one),
> > +#ifdef CONFIG_PM
> > +   .suspend= uli526x_suspend,
> > +   .resume = uli526x_resume,
> > +#endif
> 
> ...so that this ifdef is not needed?

OK, why not.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: r8169: hard freezes on TX

2007-06-04 Thread Rolf Eike Beer

Rolf Eike Beer wrote:
> Francois Romieu wrote:
> > Rolf Eike Beer <[EMAIL PROTECTED]> :
> > [...]
>
> Ok, just tested. I used a file of 200MB and copied it to another host on
> the LAN. If I used our 100 MBit switch nothing happened. When I put a 10
> MBit hub in the middle it died at 77 MB.

I just had another freeze using your patches. After 512kB over smb it was 
dead.

Eike

signature.asc
Description: This is a digitally signed message part.

[RFC] Failover-friendly TCP retrans mission

2007-06-04 Thread noboru . obata . ar

Hi all,

I would like to hear comments on how TCP retransmission can be
done better on failover-capable network devices, such as an
active-backup bonding device.


Premise
===

Please note first that I want to address physical failures by
the failover-capable network devices, which are increasingly
becoming important as Xen-based VM systems are getting popular.
Reducing a single-point-of-failure (physical device) is vital on
such VM systems.

And the failover here is not going to address overloaded or
congested networks here, which should be solved separately.


Background
==

When designing a TCP/IP based network system on failover-capable
network devices, people want to set timeouts hierarchically in
three layers, network device layer, TCP layer, and application
layer (bottom-up order), such that:

1. Network device layer detects a failure first and switch to a
   backup device (say, in 20sec).

2. TCP layer timeout & retransmission comes next, _hopefully_
   before the application layer timeout.

3. Application layer detects a network failure last (by, say,
   30sec timeout) and may trigger a system-level failover.

It should be noted that the timeouts for #1 and #2 are handled
independently and there is no relationship between them.

Also note that the actual timeout settings (20sec or 30sec in
this example) are often determined by systems requirement and so
setting them to certain "safe values" (if any) are usually not
possible.


Problem
===

If TCP retransmission misses the time frame between event #1 and
#3 in Background above (between 20 and 30sec since network
failure), a failure causes the system-level failover where the
network-device-level failover should be enough.

The problem in this hierarchical timeout scheme is that TCP
layer does not guarantee the next retransmission to occur in
certain period of time.  In the above example, people expect TCP
to retransmit a packet between 20 and 30sec since network
failure, but it may not happen.

Starting from RTO=0.5sec for example, retransmission will occur
at time 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 as indicated by 'o'
in the following diagram, but miss the time frame between time
20 and 30.

   time: 0 102030sec
 | | | |
  App. layer |-+-+-X  ==> system failover
   TCP layer oo-o---o--+o+-+o <== expects retrans. b/w 20~30
Netdev layer |-+-X==> network failover


Solution


It seems reasonable for me to solve this problem by capping
TCP_RTO_MAX, i.e., making TCP_RTO_MAX a sysctl variable and set
it to a small number.

In this example, setting to (10 - RTT)[sec] at most should work
because retransmission will take place between time 20 and 30.

My rationale follows.

* This solution is simple and so less error-prone.

* The solution does not violate RFC 2988 in maximum RTO value,
  because RFC 2988's requirement in maximum RTO value, "at least
  60 seconds (in (2.5))," is OPTIONAL.

* The solution adds a system-wide setting, which is preferable
  to per-socket setting (by, say, setsockopt or something),
  because all application benefits from the solution.


Before posting patches, I would like to hear comments here.
Any comments or suggestions, to make TCP retransmission work
better on failover-capable network devices, are welcome.

Regards,

-- 
OBATA Noboru ([EMAIL PROTECTED])

Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-04 Thread Urs Thuermann

Patrick McHardy <[EMAIL PROTECTED]> writes:

> I was thinking about this, don't CAN frames include the identity
> of the sender? That would be the easiest way to determine whether
> the frame needs to be delivered.

No, CAN is quite a broken networking technology (at least in my view,
coming from tcp/ip).  There is no station addresses and therefore no
sender or receiver addresses.  Everything is broadcast.

> Actually its not unused, it needs to be set on TX frames once the
> packet leaves the protocol specific code.

Oops, yes, this is something still missing in our code.  Needs to be
fixed.

> I don't get why you can't directly check the socket option on the
> TX path.

We have several types of sockets in the PF_CAN family, two of which
are GPL'ed and which are in the patch series.  These are CAN_RAW and
CAN_BCM.  The protocol implementations use can_send() in af_can.c to
send a CAN frame and indicate to can_send() in an int argument,
whether this frame should be looped back.  Only the raw protocol has a
socket option (setsockopt(2)) in struct raw_sock for this, bcm always
sets this to 1 to have the frame looped back.  There is no option in
struct bcm_sock for this.  In can_send() and in the driver we don't
know what type of socket skb->sk points to and can't check that
option.  Changing this would mean we have to add such an option in the
same position in all CAN socket types and set it to fixed values in
some of them (e.g. to 1 for bcm).  While it's doable, I wouldn't like
that very much.

Is there anything that prevents can_send() from using skb->pkt_type to
pass the loopback flag down to the driver?

urs
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] ehea: Receive SKB Aggregation

2007-06-04 Thread Christoph Raisch


Christoph Hellwig wrote on 31.05.2007 15:41:18:

> I'm still very unhappy with having all this in various drivers.  There's
> a lot of code that can be turned into generic library functions, and even
> more code that could be made generic with some amount of refactoring.

Yes, we'd also prefer to use a generic function, but we first would want to
get
some "real world" experience how our driver behaves with LRO to be even
able to
define requirements for such a generic function. A lot of this is tied into
pathlengths,
caching, and why does that help compared to a different TCP receive side
processing?
In a perfect world we shouldn't see a diffference if this is enabled or
not,
but measurements indicate something completely different at 10gbit.

Gruss / Regards
Christoph Raisch


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] ehea: Receive SKB Aggregation

2007-06-04 Thread Christoph Raisch



Stephen Hemminger  wrote on 31.05.2007 18:37:03:

>
> >
> >
> > +static int try_get_ip_tcp_hdr(struct ehea_cqe *cqe, struct sk_buff
*skb,
> > +   struct iphdr **iph, struct tcphdr **tcph)
> > +{
> > +   int ip_len;
> > +
> > +   /* non tcp/udp packets */
> > +   if (!cqe->header_length)
> > +  return -1;
> > +
> > +   /* non tcp packet */
> > +   *iph = (struct iphdr *)(skb->data);
>
> Why the indirection, copying of headers..
This interacts with the header split function in the hardware.
>
> > +   if ((*iph)->protocol != IPPROTO_TCP)
> > +  return -1;
> > +
> > +   ip_len = (u8)((*iph)->ihl);
> > +   ip_len <<= 2;
> > +   *tcph = (struct tcphdr *)(((u64)*iph) + ip_len);
> > +
> > +   return 0;
> > +}
> > +
> >
>
> This code seems to be duplicating a lot  (but not all) of the TCP/IP
> input path validation checks. This is a security problem if nothing
else...
>
We should only do aggregation in the driver if this really is a TCP header,
otherwise things will get worse.
You're right, we should at least check that tcph is within the received
frame.

> Also, how do you prevent DoS attacks from hostile TCP senders that send
> huge number of back to back frames?

Actually a huge number of back to back frames is what we would want to
receive
at 10 gbit ;-)
How is it possible to figure out if this is what you want or just DoS?
It doesn't change anything compared to a non LRO driver, we process a
certain
maximum amount of frames before waiting for the next interrupt,
the packet filters/DoS should still see all traffic (which is above the
driver).
Any suggestions how to handle this better/different?

>
> --
> Stephen Hemminger

Gruss / Regards
Christoph Raisch

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] ps3: add wireless LAN support

2007-06-04 Thread MOKUNO Masakazu

Hi

Thank you for your comments!

It seems to be pretty preferable we make the driver wpa_supplicant
compatible, I'll plan to rework and resubmit the patch (to
linux-wireless)

On Fri, 01 Jun 2007 12:35:49 -0400
Dan Williams <[EMAIL PROTECTED]> wrote:
> There are a few basic
> issues, rehashed here:
> 
> 1) Need endian annotations and conversions.

OK, I'll fix. 

> 2) How much WPA does the device really support?  If it only supports
> WPA[2]-PSK, I guess that's OK, but the driver _MUST_ return the WPA and
> RSN information elements from each BSS that provides them using the
> IWEVGENIE tag from the get scan handler.  A hack might be to construct a
> suitable synthetic IE depending on the contents of the 'security' field
> the firmware passes back.  But userspace needs to know that a particular
> BSS supports WPA in a standard manner.

It supports WEP and WPA[2]-PSKs(TKIP and AES) only. 

I'll consider incorporating wpa_supplicant stuff, ie to support IWEVGENIE
parameters.

> 3) Should remove the WPA passphrase bits from the encode functions,
> these functions should operate only on raw key material.  Userspace
> tools do the hashing.

The hypervisor (firmware) has functionality of hashing WPA passphrase. 
So the driver code just passes it to hypervisor.  With that feature, end
users can easily set their passphase.  I thought it was convenient.
Anyway I can remove it to delegate to user space.

> 4) I'd like an explanation of the scanning behavior too, looks fishy.

Well, it basically start to scan as requested.  But since scanning is
time consuming job and it seems to block normal messages, rate control
code you mentioned was inserted.
Background scan is performed for roaming as you thought.

> 
> 5) Should be using a list of scanned BSSs, should be aging and culling
> them, instead of a static array.

OK.

> 
> 6) The private ioctl needs to die.

These private ioctls were made so that we can setup the driver simply
with wireless-tools, like the answer to 3) above.
Anyway I'll try to incorporate wpa_supplicant support.

> 
> 7) Does the device work with wpa_supplicant's 'wext' driver?

I don't think it works with wext of wpa_supplicant, although I didn't test
it with wpa_supplicant well.

--
Masakazu MOKUNO

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-04 Thread Urs Thuermann

Oliver Hartkopp <[EMAIL PROTECTED]> writes:

> 2. The loopback indication is done by using the unused skb->protocol in
> the tx path.

I don't think we should (mis-)use skb->protocol as a loopback flag.
As the name says, it's the protocol number and not a flag whether
loopback is to be done by the driver.  Even if it's not used otherwise
in tx path (haven't checked this), in my experience it is asking for
future trouble to use variables other than for it's intended purpose.

IMO it would be better to skb->pkt_type.  This is used to indicate
packet type to rcv functions registered by dev_add_pack().  It is set
by netdevice drivers to PACKET_{MULTICAST,BROADCAST,HOST,OTHER} for
received packets.  In the send path it is set to PACKET_OUTGOING on
the copy of the skbuff that is delivered to the sockets registered on
ptype_all (typically packet sockets from tcpdump or other sniffers).
AFAICS, pkt_type is not used otherwise in the send path.

We could set skb->pkt_type = PACKET_LOOPBACK to flag to the CAN
netdevice driver whether to loop back the packet.

Any objections?

urs
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-04 Thread Patrick McHardy

Oliver Hartkopp wrote:
> 1. The needed skb->sk is preserved for the rx path (the way 'up').

I was thinking about this, don't CAN frames include the identity
of the sender? That would be the easiest way to determine whether
the frame needs to be delivered.

> 2. The loopback indication is done by using the unused skb->protocol in
> the tx path.

Actually its not unused, it needs to be set on TX frames once the
packet leaves the protocol specific code. Its used for example by
packet sockets (sk_run_filter), qdisc classification, neighbour
resolution, gso segmentation, ...

I don't get why you can't directly check the socket option on the
TX path.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] cfg80211: fix signed macaddress in sysfs

2007-06-04 Thread Johannes Berg

On Mon, 2007-06-04 at 00:06 +0200, David Lamparter wrote:
> Fix signedness mixup making mac addresses show up strangely
> (like 00:11:22:33:44:ffaa) in /sys/class/ieee80211/*/macaddress.

Huh, my mistake, patch looks good. Do we care for this in .22 still?

Acked-by: Johannes Berg <[EMAIL PROTECTED]>


> Signed-off-by: David Lamparter <[EMAIL PROTECTED]>
> ---
> 
>  net/wireless/sysfs.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/wireless/sysfs.c b/net/wireless/sysfs.c
> index 7844be4..374d16d 100644
> --- a/net/wireless/sysfs.c
> +++ b/net/wireless/sysfs.c
> @@ -33,7 +33,7 @@ static ssize_t _show_permaddr(struct device *dev,
>   struct device_attribute *attr,
>   char *buf)
>  {
> -   char *addr = dev_to_rdev(dev)->wiphy.perm_addr;
> +   unsigned char *addr = dev_to_rdev(dev)->wiphy.perm_addr;
>  
> return sprintf(buf, "%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\n",
>addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);

This probably should also use MAC_FMT/MAC_ARG, hah. Oh well, I'll fix it
up later.

johannes


signature.asc
Description: This is a digitally signed message part

Re: [RFC][PATCH] Add suspend and resume support to uli526x

2007-06-04 Thread Pavel Machek

Hi!

> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Add suspend/resume support to the uli526x network driver (tested on x86_64,
> with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev 
> 40').
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

Looks ok to me.

> +#ifdef CONFIG_PM
> +
> +/*
> + *   Suspend the interface.
> + */
> +
> +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state)
> +{
> + struct net_device *dev = pci_get_drvdata(pdev);
> +
> + ULI526X_DBUG(0, "uli526x_suspend", 0);
> +
> + if (dev && netdev_priv(dev)) {
> + if (netif_running(dev)) {
> + netif_device_detach(dev);
> + uli526x_down(dev);
> + }
> + pci_save_state(pdev);
> + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0);
> + pci_disable_device(pdev);
> + pci_set_power_state(pdev, pci_choose_state(pdev, state));
> + }
> + return 0;
> +}
> +
> +/*
> + *   Resume the interface.
> + */
> +
> +static int uli526x_resume(struct pci_dev *pdev)
> +{
> + struct net_device *dev = pci_get_drvdata(pdev);
> + struct uli526x_board_info *db = netdev_priv(dev);
> + int err;
> +
> + ULI526X_DBUG(0, "uli526x_resume", 0);
> +
> + if (dev && db) {
> + pci_set_power_state(pdev, PCI_D0);
> + err = pci_enable_device(pdev);
> + if (err) {
> + printk(KERN_WARNING "%s: Could not enable device \n",
> + dev->name);
> + return err;
> + }
> + pci_restore_state(pdev);
> + pci_set_master(pdev);
> + if (netif_running(dev)) {
> + uli526x_up(dev);
> + netif_device_attach(dev);
> + }
> + }
> + return 0;
> +}
> +
#else 
#define *_resume NULL
#define *_suspend NULL

> +#endif /* CONFIG_PM */

> @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver 
>   .id_table   = uli526x_pci_tbl,
>   .probe  = uli526x_init_one,
>   .remove = __devexit_p(uli526x_remove_one),
> +#ifdef CONFIG_PM
> + .suspend= uli526x_suspend,
> + .resume = uli526x_resume,
> +#endif

...so that this ifdef is not needed?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fix race in AF_UNIX

2007-06-04 Thread Miklos Szeredi

(resend, sorry, fscked up the address list)

> A recv() on an AF_UNIX, SOCK_STREAM socket can race with a
> send()+close() on the peer, causing recv() to return zero, even though
> the sent data should be received.
> 
> This happens if the send() and the close() is performed between
> skb_dequeue() and checking sk->sk_shutdown in unix_stream_recvmsg():
> 
> process A  skb_dequeue() returns NULL, there's no data in the socket queue
> process B  new data is inserted onto the queue by unix_stream_sendmsg()
> process B  sk->sk_shutdown is set to SHUTDOWN_MASK by unix_release_sock()
> process A  sk->sk_shutdown is checked, unix_release_sock() returns zero

This is only part of the story.  It turns out, there are other races
involving the garbage collector, that can throw away perfectly good
packets with AF_UNIX sockets in them.

The problems arise when a socket goes from installed to in-flight or
vica versa during garbage collection.  Since gc is done with a
spinlock held, this only shows up on SMP.

The following patch fixes it for me, but it's possibly the wrong
approach.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux-2.6.22-rc2/net/unix/garbage.c
===
--- linux-2.6.22-rc2.orig/net/unix/garbage.c2007-06-03 23:58:11.0 
+0200
+++ linux-2.6.22-rc2/net/unix/garbage.c 2007-06-04 11:39:42.0 +0200
@@ -90,6 +90,7 @@
 static struct sock *gc_current = GC_HEAD; /* stack of objects to mark */
 
 atomic_t unix_tot_inflight = ATOMIC_INIT(0);
+DECLARE_RWSEM(unix_gc_sem);
 
 
 static struct sock *unix_get_socket(struct file *filp)
@@ -169,7 +170,7 @@ static void maybe_unmark_and_push(struct
 
 void unix_gc(void)
 {
-   static DEFINE_MUTEX(unix_gc_sem);
+   static DEFINE_MUTEX(unix_gc_local_lock);
int i;
struct sock *s;
struct sk_buff_head hitlist;
@@ -179,9 +180,22 @@ void unix_gc(void)
 *  Avoid a recursive GC.
 */
 
-   if (!mutex_trylock(&unix_gc_sem))
+   if (!mutex_trylock(&unix_gc_local_lock))
return;
 
+
+   /*
+* unix_gc_sem protects against sockets going from in-flight to
+* installed
+*
+* Can't sleep on this, because skb_recv_datagram could be
+* waiting for a packet that is to be sent by the thread which
+* invoked the gc
+*/
+   if (!down_write_trylock(&unix_gc_sem)) {
+   mutex_unlock(&unix_gc_local_lock);
+   return;
+   }
spin_lock(&unix_table_lock);
 
forall_unix_sockets(i, s)
@@ -207,8 +221,6 @@ void unix_gc(void)
 
forall_unix_sockets(i, s)
{
-   int open_count = 0;
-
/*
 *  If all instances of the descriptor are not
 *  in flight we are in use.
@@ -218,10 +230,20 @@ void unix_gc(void)
 *  In this case (see unix_create1()) we set artificial
 *  negative inflight counter to close race window.
 *  It is trick of course and dirty one.
+*
+*  Get the inflight counter first, then the open
+*  counter.  This avoids problems if racing with
+*  sendmsg
+*
+*  If just created socket is not yet attached to
+*  a file descriptor, assume open_count of 1
 */
+   int inflight_count = atomic_read(&unix_sk(s)->inflight);
+   int open_count = 1;
+
if (s->sk_socket && s->sk_socket->file)
open_count = file_count(s->sk_socket->file);
-   if (open_count > atomic_read(&unix_sk(s)->inflight))
+   if (open_count > inflight_count)
maybe_unmark_and_push(s);
}
 
@@ -302,11 +324,12 @@ void unix_gc(void)
u->gc_tree = GC_ORPHAN;
}
spin_unlock(&unix_table_lock);
+   up_write(&unix_gc_sem);
 
/*
 *  Here we are. Hitlist is filled. Die.
 */
 
__skb_queue_purge(&hitlist);
-   mutex_unlock(&unix_gc_sem);
+   mutex_unlock(&unix_gc_local_lock);
 }
Index: linux-2.6.22-rc2/include/net/af_unix.h
===
--- linux-2.6.22-rc2.orig/include/net/af_unix.h 2007-04-26 05:08:32.0 
+0200
+++ linux-2.6.22-rc2/include/net/af_unix.h  2007-06-04 09:13:56.0 
+0200
@@ -14,6 +14,7 @@ extern void unix_gc(void);
 
 extern struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
 extern spinlock_t unix_table_lock;
+extern struct rw_semaphore unix_gc_sem;
 
 extern atomic_t unix_tot_inflight;
 
Index: linux-2.6.22-rc2/net/unix/af_unix.c
===
--- linux-2.6.22-rc2.orig/net/unix/af_unix.c2007-06-03 23:58:11.0 
+0200
+++ linux-2.6.22-rc2/net/unix/af_u

1 2 >

1 - 100 of 104 matches

Mail list logo