Re: [PATCH 4/4] [IPV4]: Restore old behaviour of default config values
From: Herbert Xu <[EMAIL PROTECTED]> Date: Tue, 05 Jun 2007 16:31:04 +1000 > [IPV4]: Restore old behaviour of default config values > > Previously inet devices were only constructed when addresses are added > (or rarely in ipmr). Therefore the default config values they get are > the ones at the time of these operations. > > Now that we're creating inet devices earlier, this changes the behaviour > of default config values in an incompatible way (see bug #8519). > > This patch creates a compromise by setting the default values at the > same point as before but only for those that have not been explicitly > set by the user since the inet device's creation. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Also applied, thanks a lot. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] [IPV4]: Add default config support after inetdev_init
From: Herbert Xu <[EMAIL PROTECTED]> Date: Tue, 05 Jun 2007 16:31:03 +1000 > [IPV4]: Add default config support after inetdev_init > > Previously once inetdev_init has been called on a device any changes made > to ipv4_devconf_dflt would have no effect on that device's configuration. > > This creates a problem since we have moved the point where inetdev_init > is called from when an address is added to where the device is registered. > > This patch is the first half of a set that tries to mimic the old behaviour > while still calling inetdev_init. > > It propagates any changes to ipv4_devconf_dflt to those devices that have > not had the corresponding attribute set. > > The next patch will forcibly set all values at the point where inetdev_init > was previously called. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Looks good, applied. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] [IPV4]: Convert IPv4 devconf to an array
From: Herbert Xu <[EMAIL PROTECTED]> Date: Tue, 05 Jun 2007 16:31:02 +1000 > [IPV4]: Convert IPv4 devconf to an array > > This patch converts the ipv4_devconf config members (everything except > sysctl) to an array. This allows easier manipulation which will be > needed later on to provide better management of default config values. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Applied, thanks. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] [IPV4]: Only panic if inetdev_init fails for loopback
From: Herbert Xu <[EMAIL PROTECTED]> Date: Tue, 05 Jun 2007 16:31:01 +1000 > [IPV4]: Only panic if inetdev_init fails for loopback > > When I made the inetdev_init call work on all devices I incorrectly > left in the panic call as well. It is obviously undesirable to > panic on an allocation failure for a normal network device. This > patch moves the panic call under the loopback if clause. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Applied. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] [IPV4]: Add default config support after inetdev_init
[IPV4]: Add default config support after inetdev_init Previously once inetdev_init has been called on a device any changes made to ipv4_devconf_dflt would have no effect on that device's configuration. This creates a problem since we have moved the point where inetdev_init is called from when an address is added to where the device is registered. This patch is the first half of a set that tries to mimic the old behaviour while still calling inetdev_init. It propagates any changes to ipv4_devconf_dflt to those devices that have not had the corresponding attribute set. The next patch will forcibly set all values at the point where inetdev_init was previously called. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- include/linux/inetdevice.h |3 + net/ipv4/devinet.c | 133 + 2 files changed, 101 insertions(+), 35 deletions(-) diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -3,6 +3,7 @@ #ifdef __KERNEL__ +#include #include #include #include @@ -12,6 +13,7 @@ struct ipv4_devconf { void*sysctl; int data[__NET_IPV4_CONF_MAX - 1]; + DECLARE_BITMAP(state, __NET_IPV4_CONF_MAX - 1); }; extern struct ipv4_devconf ipv4_devconf; @@ -53,6 +55,7 @@ static inline void ipv4_devconf_set(stru int val) { index--; + set_bit(index, in_dev->cnf.state); in_dev->cnf.data[index] = val; } diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1244,6 +1244,91 @@ errout: #ifdef CONFIG_SYSCTL +static void devinet_copy_dflt_conf(int i) +{ + struct net_device *dev; + + read_lock(&dev_base_lock); + for_each_netdev(dev) { + struct in_device *in_dev; + rcu_read_lock(); + in_dev = __in_dev_get_rcu(dev); + if (in_dev && !test_bit(i, in_dev->cnf.state)) + in_dev->cnf.data[i] = ipv4_devconf_dflt.data[i]; + rcu_read_unlock(); + } + read_unlock(&dev_base_lock); +} + +static int devinet_conf_proc(ctl_table *ctl, int write, +struct file* filp, void __user *buffer, +size_t *lenp, loff_t *ppos) +{ + int ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos); + + if (write) { + struct ipv4_devconf *cnf = ctl->extra1; + int i = (int *)ctl->data - cnf->data; + + set_bit(i, cnf->state); + + if (cnf == &ipv4_devconf_dflt) + devinet_copy_dflt_conf(i); + } + + return ret; +} + +static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen, + void __user *oldval, size_t __user *oldlenp, + void __user *newval, size_t newlen) +{ + struct ipv4_devconf *cnf; + int *valp = table->data; + int new; + int i; + + if (!newval || !newlen) + return 0; + + if (newlen != sizeof(int)) + return -EINVAL; + + if (get_user(new, (int __user *)newval)) + return -EFAULT; + + if (new == *valp) + return 0; + + if (oldval && oldlenp) { + size_t len; + + if (get_user(len, oldlenp)) + return -EFAULT; + + if (len) { + if (len > table->maxlen) + len = table->maxlen; + if (copy_to_user(oldval, valp, len)) + return -EFAULT; + if (put_user(len, oldlenp)) + return -EFAULT; + } + } + + *valp = new; + + cnf = table->extra1; + i = (int *)table->data - cnf->data; + + set_bit(i, cnf->state); + + if (cnf == &ipv4_devconf_dflt) + devinet_copy_dflt_conf(i); + + return 1; +} + void inet_forward_change(void) { struct net_device *dev; @@ -1302,40 +1387,13 @@ int ipv4_doint_and_flush_strategy(ctl_ta void __user *oldval, size_t __user *oldlenp, void __user *newval, size_t newlen) { - int *valp = table->data; - int new; - - if (!newval || !newlen) - return 0; - - if (newlen != sizeof(int)) - return -EINVAL; + int ret = devinet_conf_sysctl(table, name, nlen, oldval, oldlenp, + newval, newlen); - if (get_user(new, (int __user *)newval)) - return -EFAULT; - - if (new == *valp) - return 0; - - if (oldval && oldlenp) { - size_t len; - - if (get_user(len, oldlenp)) - return -EFAULT; - -
[PATCH 2/4] [IPV4]: Convert IPv4 devconf to an array
[IPV4]: Convert IPv4 devconf to an array This patch converts the ipv4_devconf config members (everything except sysctl) to an array. This allows easier manipulation which will be needed later on to provide better management of default config values. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- drivers/infiniband/hw/amso1100/c2.c |2 include/linux/inetdevice.h | 94 +++- net/ipv4/arp.c | 11 - net/ipv4/devinet.c | 264 ++-- net/ipv4/igmp.c | 18 +- net/ipv4/ipmr.c | 12 - net/ipv4/proc.c |2 net/ipv4/route.c| 14 - net/ipv4/sysctl_net_ipv4.c |6 9 files changed, 163 insertions(+), 260 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2.c b/drivers/infiniband/hw/amso1100/c2.c --- a/drivers/infiniband/hw/amso1100/c2.c +++ b/drivers/infiniband/hw/amso1100/c2.c @@ -672,7 +672,7 @@ static int c2_up(struct net_device *netd * rdma interface. */ in_dev = in_dev_get(netdev); - in_dev->cnf.arp_ignore = 1; + IN_DEV_CONF_SET(in_dev, ARP_IGNORE, 1); in_dev_put(in_dev); return 0; diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -10,28 +10,8 @@ struct ipv4_devconf { - int accept_redirects; - int send_redirects; - int secure_redirects; - int shared_media; - int accept_source_route; - int rp_filter; - int proxy_arp; - int bootp_relay; - int log_martians; - int forwarding; - int mc_forwarding; - int tag; - int arp_filter; - int arp_announce; - int arp_ignore; - int arp_accept; - int medium_id; - int no_xfrm; - int no_policy; - int force_igmp_version; - int promote_secondaries; void*sysctl; + int data[__NET_IPV4_CONF_MAX - 1]; }; extern struct ipv4_devconf ipv4_devconf; @@ -60,30 +40,64 @@ struct in_device struct rcu_head rcu_head; }; -#define IN_DEV_FORWARD(in_dev) ((in_dev)->cnf.forwarding) -#define IN_DEV_MFORWARD(in_dev)(ipv4_devconf.mc_forwarding && (in_dev)->cnf.mc_forwarding) -#define IN_DEV_RPFILTER(in_dev)(ipv4_devconf.rp_filter && (in_dev)->cnf.rp_filter) -#define IN_DEV_SOURCE_ROUTE(in_dev)(ipv4_devconf.accept_source_route && (in_dev)->cnf.accept_source_route) -#define IN_DEV_BOOTP_RELAY(in_dev) (ipv4_devconf.bootp_relay && (in_dev)->cnf.bootp_relay) - -#define IN_DEV_LOG_MARTIANS(in_dev)(ipv4_devconf.log_martians || (in_dev)->cnf.log_martians) -#define IN_DEV_PROXY_ARP(in_dev) (ipv4_devconf.proxy_arp || (in_dev)->cnf.proxy_arp) -#define IN_DEV_SHARED_MEDIA(in_dev)(ipv4_devconf.shared_media || (in_dev)->cnf.shared_media) -#define IN_DEV_TX_REDIRECTS(in_dev)(ipv4_devconf.send_redirects || (in_dev)->cnf.send_redirects) -#define IN_DEV_SEC_REDIRECTS(in_dev) (ipv4_devconf.secure_redirects || (in_dev)->cnf.secure_redirects) -#define IN_DEV_IDTAG(in_dev) ((in_dev)->cnf.tag) -#define IN_DEV_MEDIUM_ID(in_dev) ((in_dev)->cnf.medium_id) -#define IN_DEV_PROMOTE_SECONDARIES(in_dev) (ipv4_devconf.promote_secondaries || (in_dev)->cnf.promote_secondaries) +#define IPV4_DEVCONF(cnf, attr) ((cnf).data[NET_IPV4_CONF_ ## attr - 1]) +#define IPV4_DEVCONF_ALL(attr) IPV4_DEVCONF(ipv4_devconf, attr) + +static inline int ipv4_devconf_get(struct in_device *in_dev, int index) +{ + index--; + return in_dev->cnf.data[index]; +} + +static inline void ipv4_devconf_set(struct in_device *in_dev, int index, + int val) +{ + index--; + in_dev->cnf.data[index] = val; +} + +#define IN_DEV_CONF_GET(in_dev, attr) \ + ipv4_devconf_get((in_dev), NET_IPV4_CONF_ ## attr) +#define IN_DEV_CONF_SET(in_dev, attr, val) \ + ipv4_devconf_set((in_dev), NET_IPV4_CONF_ ## attr, (val)) + +#define IN_DEV_ANDCONF(in_dev, attr) \ + (IPV4_DEVCONF_ALL(attr) && IN_DEV_CONF_GET((in_dev), attr)) +#define IN_DEV_ORCONF(in_dev, attr) \ + (IPV4_DEVCONF_ALL(attr) || IN_DEV_CONF_GET((in_dev), attr)) +#define IN_DEV_MAXCONF(in_dev, attr) \ + (max(IPV4_DEVCONF_ALL(attr), IN_DEV_CONF_GET((in_dev), attr))) + +#define IN_DEV_FORWARD(in_dev) IN_DEV_CONF_GET((in_dev), FORWARDING) +#define IN_DEV_MFORWARD(in_dev) (IPV4_DEVCONF_ALL(MC_FORWARDING) && \ +IPV4_DEVCONF((in_dev)->cnf, \ + MC_FORWARDING)) +#define IN_DEV_RPFILTER(in_dev)IN_DEV_ANDCONF((in_dev), RP_FILTER) +#define IN_DEV_SOURCE_ROUTE(in_dev)IN_DEV_ANDCONF((in_dev), \ +
[PATCH 4/4] [IPV4]: Restore old behaviour of default config values
[IPV4]: Restore old behaviour of default config values Previously inet devices were only constructed when addresses are added (or rarely in ipmr). Therefore the default config values they get are the ones at the time of these operations. Now that we're creating inet devices earlier, this changes the behaviour of default config values in an incompatible way (see bug #8519). This patch creates a compromise by setting the default values at the same point as before but only for those that have not been explicitly set by the user since the inet device's creation. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- include/linux/inetdevice.h |6 +- net/ipv4/devinet.c | 19 --- net/ipv4/ipmr.c| 15 +++ 3 files changed, 24 insertions(+), 16 deletions(-) diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -59,6 +59,11 @@ static inline void ipv4_devconf_set(stru in_dev->cnf.data[index] = val; } +static inline void ipv4_devconf_setall(struct in_device *in_dev) +{ + bitmap_fill(in_dev->cnf.state, __NET_IPV4_CONF_MAX - 1); +} + #define IN_DEV_CONF_GET(in_dev, attr) \ ipv4_devconf_get((in_dev), NET_IPV4_CONF_ ## attr) #define IN_DEV_CONF_SET(in_dev, attr, val) \ @@ -125,7 +130,6 @@ extern struct net_device*ip_dev_find(_ extern int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b); extern int devinet_ioctl(unsigned int cmd, void __user *); extern voiddevinet_init(void); -extern struct in_device *inetdev_init(struct net_device *dev); extern struct in_device*inetdev_by_index(int); extern __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope); extern __be32 inet_confirm_addr(const struct net_device *dev, __be32 dst, __be32 local, int scope); diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -147,7 +147,7 @@ void in_dev_finish_destroy(struct in_dev } } -struct in_device *inetdev_init(struct net_device *dev) +static struct in_device *inetdev_init(struct net_device *dev) { struct in_device *in_dev; @@ -405,12 +405,10 @@ static int inet_set_ifa(struct net_devic ASSERT_RTNL(); if (!in_dev) { - in_dev = inetdev_init(dev); - if (!in_dev) { - inet_free_ifa(ifa); - return -ENOBUFS; - } + inet_free_ifa(ifa); + return -ENOBUFS; } + ipv4_devconf_setall(in_dev); if (ifa->ifa_dev != in_dev) { BUG_TRAP(!ifa->ifa_dev); in_dev_hold(in_dev); @@ -520,13 +518,12 @@ static struct in_ifaddr *rtm_to_ifaddr(s in_dev = __in_dev_get_rtnl(dev); if (in_dev == NULL) { - in_dev = inetdev_init(dev); - if (in_dev == NULL) { - err = -ENOBUFS; - goto errout; - } + err = -ENOBUFS; + goto errout; } + ipv4_devconf_setall(in_dev); + ifa = inet_alloc_ifa(); if (ifa == NULL) { /* diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -152,9 +152,11 @@ struct net_device *ipmr_new_tunnel(struc dev->flags |= IFF_MULTICAST; in_dev = __in_dev_get_rtnl(dev); - if (in_dev == NULL && (in_dev = inetdev_init(dev)) == NULL) + if (in_dev == NULL) goto failure; - IN_DEV_CONF_SET(in_dev, RP_FILTER, 0); + + ipv4_devconf_setall(in_dev); + IPV4_DEVCONF(in_dev->cnf, RP_FILTER) = 0; if (dev_open(dev)) goto failure; @@ -218,10 +220,15 @@ static struct net_device *ipmr_reg_vif(v } dev->iflink = 0; - if ((in_dev = inetdev_init(dev)) == NULL) + rcu_read_lock(); + if ((in_dev = __in_dev_get_rcu(dev)) == NULL) { + rcu_read_unlock(); goto failure; + } - IN_DEV_CONF_SET(in_dev, RP_FILTER, 0); + ipv4_devconf_setall(in_dev); + IPV4_DEVCONF(in_dev->cnf, RP_FILTER) = 0; + rcu_read_unlock(); if (dev_open(dev)) goto failure; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] [IPV4]: Only panic if inetdev_init fails for loopback
[IPV4]: Only panic if inetdev_init fails for loopback When I made the inetdev_init call work on all devices I incorrectly left in the panic call as well. It is obviously undesirable to panic on an allocation failure for a normal network device. This patch moves the panic call under the loopback if clause. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- net/ipv4/devinet.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1057,9 +1057,10 @@ static int inetdev_event(struct notifier if (!in_dev) { if (event == NETDEV_REGISTER) { in_dev = inetdev_init(dev); - if (!in_dev) - panic("devinet: Failed to create loopback\n"); if (dev == &loopback_dev) { + if (!in_dev) + panic("devinet: " + "Failed to create loopback\n"); in_dev->cnf.no_xfrm = 1; in_dev->cnf.no_policy = 1; } - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/4] [IPV4]: Convert IPv4 devconf to an array
On Mon, Jun 04, 2007 at 11:17:54PM -0700, David Miller wrote: > > These array indexes are off by one. Good catch. I'll repost this. > This is the danger in using this "x-1" indexing style. > Such a mistake is way too easy to make. Yes, unfortunately the NET_IPV4_* constants are exposed to user-space so I can't easily change them. Inventing a new set of constants didn't seem to be worthwhile. The other option would be to keep the symbolic names with a union or explicit pointer calculations for the bitmap, but this seemed to me to be the least ugly of all the alternatives. If it's any consolation, this should be the only spot where we use these constants directly. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/4] [IPV4]: Convert IPv4 devconf to an array
From: Herbert Xu <[EMAIL PROTECTED]> Date: Sat, 2 Jun 2007 20:02:52 +1000 > @@ -64,20 +64,26 @@ > #include > > struct ipv4_devconf ipv4_devconf = { > - .accept_redirects = 1, > - .send_redirects = 1, > - .secure_redirects = 1, > - .shared_media = 1, > + .data = { > + [NET_IPV4_CONF_ACCEPT_REDIRECTS] = 1, > + [NET_IPV4_CONF_SEND_REDIRECTS] = 1, > + [NET_IPV4_CONF_SECURE_REDIRECTS] = 1, > + [NET_IPV4_CONF_SHARED_MEDIA] = 1, > + }, > }; These array indexes are off by one. This is the danger in using this "x-1" indexing style. Such a mistake is way too easy to make. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add suspend and resume support to uli526x
I hope soon to add suspend/resume to the network device class and remove driver specific suspend/resume from lots of devices. The class suspend routine would just be: pci_save_state dev->stop resume is pci_restore_state dev->open for many devices that is all they need. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [5/5][BNX2]: Update version and reldate.
From: "Michael Chan" <[EMAIL PROTECTED]> Date: Mon, 04 Jun 2007 17:01:18 -0700 > [BNX2]: Update version and reldate. > > Update to version 1.5.11. > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> Also applied. I'd like to note in passing since I saw it in the context of this diff that I find the RUN_AT() macro confusing. It's name implies that the argument given to it is absolute. Instead, the argument is relative and it's return value is absolute. Maybe it's just me :-) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4/5][BNX2]: Fix occasional counter corruption on 5708.
From: "Michael Chan" <[EMAIL PROTECTED]> Date: Mon, 04 Jun 2007 17:01:08 -0700 > [BNX2]: Fix occasional counter corruption on 5708. > > The statistics block DMA on 5708 can be messed up occasionally on the > average of about once per hour. If the user is reading the counters > within one second after the corruption, the counters will be all > messed up. One second later, the counters will be ok again until the > next corruption occurs. > > The workaround is to disable the periodic statistics DMA. Instead, > we manually trigger the DMA once a second in bnx2_timer(). This > manual trigger of the DMA avoids the problem. > > As a consequence, we can only allow 0 or 1 second settings for > ethtool -C statistics block. > > Thanks to Jean-Daniel Pauget <[EMAIL PROTECTED]> and > CaT <[EMAIL PROTECTED]> for reporting this rare problem. > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> Applied. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3/5][BNX2]: Enable DMA on 5709.
From: "Michael Chan" <[EMAIL PROTECTED]> Date: Mon, 04 Jun 2007 17:00:48 -0700 > [BNX2]: Enable DMA on 5709. > > Add missing code to enable DMA on 5709 A1. The bit is a no-op on > A0 and therefore can be set on all 5709 chips. > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> Applied, thanks. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/5 revised][BNX2]: Add missing wait in bnx2_init_5709_context().
From: Jeff Garzik <[EMAIL PROTECTED]> Date: Mon, 4 Jun 2007 21:23:51 -0400 > On Mon, Jun 04, 2007 at 05:58:49PM -0700, Michael Chan wrote: > > [BNX2]: Add missing wait in bnx2_init_5709_context(). > > > > For correctness, we need to wait for the MEM_INIT bit to be cleared > > in the BNX2_CTX_COMMAND register before proceeding. > > > > [Added return -EBUSY when the MEM_INIT bit doesn't clear, suggested > > by Jeff Garzik.] > > > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> > > ACK Applied, thanks. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/5][BNX2]: Fix netdev watchdog on 5708.
From: "Michael Chan" <[EMAIL PROTECTED]> Date: Mon, 04 Jun 2007 17:00:03 -0700 > [BNX2]: Fix netdev watchdog on 5708. > > There's a bug in the driver that only initializes half of the context > memory on the 5708. Surprisingly, this works most of the time except > for some occasional netdev watchdogs when sending a lot of 64-byte > packets. The fix is to add the missing code to initialize the 2nd > halves of all context memory. > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> Applied, thanks Michael. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bugfix GFP_KERNEL -> GFP_ATOMIC in spin_locked region
On Mon, 4 Jun 2007 21:00:18 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote: > > diff --git a/drivers/usb/serial/io_ti.c b/drivers/usb/serial/io_ti.c > > index 544098d..9ec38e3 100644 > > --- a/drivers/usb/serial/io_ti.c > > +++ b/drivers/usb/serial/io_ti.c > > @@ -2351,7 +2351,7 @@ static int restart_read(struct edgeport_ > > urb->complete = edge_bulk_in_callback; > > urb->context = edge_port; > > urb->dev = edge_port->port->serial->dev; > > - status = usb_submit_urb(urb, GFP_KERNEL); > > + status = usb_submit_urb(urb, GFP_ATOMIC); > > } > > edge_port->ep_read_urb_state = EDGE_READ_URB_RUNNING; > > edge_port->shadow_mcr |= MCR_RTS; > > diff --git a/drivers/usb/serial/ti_usb_3410_5052.c > > b/drivers/usb/serial/ti_usb_3410_5052.c > > index 4203e2b..10dc36f 100644 > > --- a/drivers/usb/serial/ti_usb_3410_5052.c > > +++ b/drivers/usb/serial/ti_usb_3410_5052.c > > @@ -1559,7 +1559,7 @@ static int ti_restart_read(struct ti_por > > urb->complete = ti_bulk_in_callback; > > urb->context = tport; > > urb->dev = tport->tp_port->serial->dev; > > - status = usb_submit_urb(urb, GFP_KERNEL); > > + status = usb_submit_urb(urb, GFP_ATOMIC); > > } > > tport->tp_read_urb_state = TI_READ_URB_RUNNING; > > > > diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c > > index 27c5f8f..1b01207 100644 > > --- a/drivers/usb/serial/whiteheat.c > > +++ b/drivers/usb/serial/whiteheat.c > > @@ -1116,7 +1116,7 @@ static int firm_send_command (struct usb > > memcpy (&transfer_buffer[1], data, datasize); > > command_port->write_urb->transfer_buffer_length = datasize + 1; > > command_port->write_urb->dev = port->serial->dev; > > - retval = usb_submit_urb (command_port->write_urb, GFP_KERNEL); > > + retval = usb_submit_urb (command_port->write_urb, GFP_ATOMIC); > > if (retval) { > > dbg("%s - submit urb failed", __FUNCTION__); > > goto exit; > > @@ -1316,7 +1316,7 @@ static int start_command_port(struct usb > > usb_clear_halt(serial->dev, command_port->read_urb->pipe); > > > > command_port->read_urb->dev = serial->dev; > > - retval = usb_submit_urb(command_port->read_urb, GFP_KERNEL); > > + retval = usb_submit_urb(command_port->read_urb, GFP_ATOMIC); > > if (retval) { > > err("%s - failed submitting read urb, error %d", > > __FUNCTION__, retval); > > goto exit; > > @@ -1363,7 +1363,7 @@ static int start_port_read(struct usb_se > > wrap = list_entry(tmp, struct whiteheat_urb_wrap, list); > > urb = wrap->urb; > > urb->dev = port->serial->dev; > > - retval = usb_submit_urb(urb, GFP_KERNEL); > > + retval = usb_submit_urb(urb, GFP_ATOMIC); > > if (retval) { > > list_add(tmp, &info->rx_urbs_free); > > list_for_each_safe(tmp, tmp2, &info->rx_urbs_submitted) > > { > > This part might make sense so I'll queue it for the USB guys to look at. > > Everything in USB appears to already be fixed, apart from the io_ti.c bug. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bugfix GFP_KERNEL -> GFP_ATOMIC in spin_locked region
On Mon, 04 Jun 2007 18:25:28 +0200 Yoann Padioleau <[EMAIL PROTECTED]> wrote: > > In a few files a function such as usb_submit_urb is taking GFP_KERNEL > as an argument whereas this function call is inside a > spin_lock_irqsave region of code. Documentation says that it must be > GFP_ATOMIC instead. > > Me and my colleagues have made a tool targeting program > transformations in device drivers. We have designed a scripting > language for specifying easily program transformations and a > transformation engine for performing them. In the spirit of Linux > development practice, the language is based on the patch syntax. We > call our scripts "semantic patches" because as opposed to traditional > patches, our semantic patches do not work at the line level but on a > high level representation of the C program. > > FYI here is our semantic patch detecting invalid use of GFP_KERNEL and > fixing the problem: > > @@ > identifier fn; > @@ > > spin_lock_irqsave(...) > ... when != spin_unlock_irqrestore(...) > fn(..., > - GFP_KERNEL > + GFP_ATOMIC >,... >) I think I read that paper. > And now the real patch resulting from the automated transformation: > > net/wan/lmc/lmc_main.c|2 +- > scsi/megaraid/megaraid_mm.c |2 +- > usb/serial/io_ti.c|2 +- > usb/serial/ti_usb_3410_5052.c |2 +- > usb/serial/whiteheat.c|6 +++--- > 5 files changed, 7 insertions(+), 7 deletions(-) This patch covers three areas of maintainer responsibility, so poor old me has to split it up and redo the changelogs. Oh well. > > diff --git a/drivers/net/wan/lmc/lmc_main.c b/drivers/net/wan/lmc/lmc_main.c > index ae132c1..750b3ef 100644 > --- a/drivers/net/wan/lmc/lmc_main.c > +++ b/drivers/net/wan/lmc/lmc_main.c > @@ -483,7 +483,7 @@ #endif /* end ifdef _DBG_EVENTLOG */ > break; > } > > -data = kmalloc(xc.len, GFP_KERNEL); > +data = kmalloc(xc.len, GFP_ATOMIC); > if(data == 0x0){ > printk(KERN_WARNING "%s: Failed to allocate > memory for copy\n", dev->name); > ret = -ENOMEM; A few lines later we do: if(copy_from_user(data, xc.data, xc.len)) which also is illegal under spinlock. Frankly, I think that a better use of this tool would be to just report on the errors, let humans fix them up. Nobody maintains this ATM code afaik. > index e075a52..edee220 100644 > --- a/drivers/scsi/megaraid/megaraid_mm.c > +++ b/drivers/scsi/megaraid/megaraid_mm.c > @@ -547,7 +547,7 @@ mraid_mm_attach_buf(mraid_mmadp_t *adp, > > kioc->pool_index= right_pool; > kioc->free_buf = 1; > - kioc->buf_vaddr = pci_pool_alloc(pool->handle, GFP_KERNEL, > + kioc->buf_vaddr = pci_pool_alloc(pool->handle, GFP_ATOMIC, > &kioc->buf_paddr); > spin_unlock_irqrestore(&pool->lock, flags); Again, a better fix would probably be to move the pci_pool_alloc() to before the spin_lock_irqsave(), so we can continue to use the stronger GFP_KERNEL. But the locking in there looks basically nonsensical or wrong anyway. It appears that local variable `right_pool' cannot be validly used unless we're holding pool->lock for the whole duration. Somebody does maintain the megaraid driver, but I'm not sure who, and the MAINTAINERS file isn't very useful. So I'll spray it around a bit. We definitely have bugs in there. > diff --git a/drivers/usb/serial/io_ti.c b/drivers/usb/serial/io_ti.c > index 544098d..9ec38e3 100644 > --- a/drivers/usb/serial/io_ti.c > +++ b/drivers/usb/serial/io_ti.c > @@ -2351,7 +2351,7 @@ static int restart_read(struct edgeport_ > urb->complete = edge_bulk_in_callback; > urb->context = edge_port; > urb->dev = edge_port->port->serial->dev; > - status = usb_submit_urb(urb, GFP_KERNEL); > + status = usb_submit_urb(urb, GFP_ATOMIC); > } > edge_port->ep_read_urb_state = EDGE_READ_URB_RUNNING; > edge_port->shadow_mcr |= MCR_RTS; > diff --git a/drivers/usb/serial/ti_usb_3410_5052.c > b/drivers/usb/serial/ti_usb_3410_5052.c > index 4203e2b..10dc36f 100644 > --- a/drivers/usb/serial/ti_usb_3410_5052.c > +++ b/drivers/usb/serial/ti_usb_3410_5052.c > @@ -1559,7 +1559,7 @@ static int ti_restart_read(struct ti_por > urb->complete = ti_bulk_in_callback; > urb->context = tport; > urb->dev = tport->tp_port->serial->dev; > - status = usb_submit_urb(urb, GFP_KERNEL); > + status = usb_submit_urb(urb, GFP_ATOMIC); > } > tport->tp_read_urb_state = TI_READ_URB_RUNNING; > > diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c > index 27c5f8f..1b01207 100644 > --- a/drivers/usb/serial/whiteheat.c > +++ b/drivers/usb/serial/whiteheat.c > @@ -1116,7
[PATCH] sundance: PHY address form 0, only for device ID 0x0200 (IP100A) (20070605)
From: Jesse Huang <[EMAIL PROTECTED]> Change Logs: Search PHY address form 0, only for device ID 0x0200 (IP100A). Other device are from PHY address 1. Signed-off-by: Jesse Huang <[EMAIL PROTECTED]> --- drivers/net/sundance.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) a093fe5e3e7b9f58458171e481b5096cc1d3abfe diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c index e1f912d..23d049d 100644 --- a/drivers/net/sundance.c +++ b/drivers/net/sundance.c @@ -562,7 +562,11 @@ #endif * It seems some phys doesn't deal well with address 0 being accessed * first, so leave address zero to the end of the loop (32 & 31). */ - for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) { + if(sundance_pci_tbl[np->chip_id].device == 0x0200) + phy = 0; + else + phy = 1; + for (; phy <= 32 && phy_idx < MII_CNT; phy++) { int phyx = phy & 0x1f; int mii_status = mdio_read(dev, phyx, MII_BMSR); if (mii_status != 0x && mii_status != 0x) { -- 1.3.GIT - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] sundance: PHY address form 0, only for device ID 0x0200 (IP100A) (20070605)
From: Jesse Huang <[EMAIL PROTECTED]> Change Logs: Search PHY address form 0, only for device ID 0x0200 (IP100A). Other device are from PHY address 1. Signed-off-by: Jesse Huang <[EMAIL PROTECTED]> --- drivers/net/sundance.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) a093fe5e3e7b9f58458171e481b5096cc1d3abfe diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c index e1f912d..23d049d 100644 --- a/drivers/net/sundance.c +++ b/drivers/net/sundance.c @@ -562,7 +562,11 @@ #endif * It seems some phys doesn't deal well with address 0 being accessed * first, so leave address zero to the end of the loop (32 & 31). */ - for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) { + if(sundance_pci_tbl[np->chip_id].device == 0x0200) + phy = 0; + else + phy = 1; + for (; phy <= 32 && phy_idx < MII_CNT; phy++) { int phyx = phy & 0x1f; int mii_status = mdio_read(dev, phyx, MII_BMSR); if (mii_status != 0x && mii_status != 0x) { -- 1.3.GIT - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/5 revised][BNX2]: Add missing wait in bnx2_init_5709_context().
On Mon, Jun 04, 2007 at 05:58:49PM -0700, Michael Chan wrote: > [BNX2]: Add missing wait in bnx2_init_5709_context(). > > For correctness, we need to wait for the MEM_INIT bit to be cleared > in the BNX2_CTX_COMMAND register before proceeding. > > [Added return -EBUSY when the MEM_INIT bit doesn't clear, suggested > by Jeff Garzik.] > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> ACK - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add suspend and resume support to uli526x
On Sun, Jun 03, 2007 at 12:37:35PM +0200, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > Add suspend/resume support to the uli526x network driver (tested on x86_64, > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev > 40'). > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> Nice work! Looks good to me. Signed-off-by: Val Henson <[EMAIL PROTECTED]> -VAL > drivers/net/tulip/uli526x.c | 120 > ++-- > 1 file changed, 105 insertions(+), 15 deletions(-) > > Index: linux-2.6.22-rc3/drivers/net/tulip/uli526x.c > === > --- linux-2.6.22-rc3.orig/drivers/net/tulip/uli526x.c 2007-06-03 > 12:10:41.0 +0200 > +++ linux-2.6.22-rc3/drivers/net/tulip/uli526x.c 2007-06-03 > 12:14:33.0 +0200 > @@ -220,6 +220,10 @@ static int mode = 8; > static int uli526x_open(struct net_device *); > static int uli526x_start_xmit(struct sk_buff *, struct net_device *); > static int uli526x_stop(struct net_device *); > +#ifdef CONFIG_PM > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state); > +static int uli526x_resume(struct pci_dev *pdev); > +#endif > static struct net_device_stats * uli526x_get_stats(struct net_device *); > static void uli526x_set_filter_mode(struct net_device *); > static const struct ethtool_ops netdev_ethtool_ops; > @@ -425,21 +429,14 @@ static void __devexit uli526x_remove_one > > > /* > - * Open the interface. > - * The interface is opened whenever "ifconfig" activates it. > + * Bring the interface up. > + * Used by uli526x_open and uli526x_resume. > */ > > -static int uli526x_open(struct net_device *dev) > +static void uli526x_up(struct net_device *dev) > { > - int ret; > struct uli526x_board_info *db = netdev_priv(dev); > > - ULI526X_DBUG(0, "uli526x_open", 0); > - > - ret = request_irq(dev->irq, &uli526x_interrupt, IRQF_SHARED, dev->name, > dev); > - if (ret) > - return ret; > - > /* system variable init */ > db->cr6_data = CR6_DEFAULT | uli526x_cr6_user_set; > db->tx_packet_cnt = 0; > @@ -467,7 +464,25 @@ static int uli526x_open(struct net_devic > db->timer.data = (unsigned long)dev; > db->timer.function = &uli526x_timer; > add_timer(&db->timer); > +} > + > + > +/* > + * Open the interface. > + * The interface is opened whenever "ifconfig" activates it. > + */ > > +static int uli526x_open(struct net_device *dev) > +{ > + int ret; > + > + ULI526X_DBUG(0, "uli526x_open", 0); > + > + ret = request_irq(dev->irq, &uli526x_interrupt, IRQF_SHARED, dev->name, > dev); > + if (ret) > + return ret; > + > + uli526x_up(dev); > return 0; > } > > @@ -613,17 +628,15 @@ static int uli526x_start_xmit(struct sk_ > > > /* > - * Stop the interface. > - * The interface is stopped when it is brought. > + * Take the interface down. > + * Used by uli526x_stop and uli526x_suspend. > */ > > -static int uli526x_stop(struct net_device *dev) > +static void uli526x_down(struct net_device *dev) > { > struct uli526x_board_info *db = netdev_priv(dev); > unsigned long ioaddr = dev->base_addr; > > - ULI526X_DBUG(0, "uli526x_stop", 0); > - > /* disable system */ > netif_stop_queue(dev); > > @@ -634,6 +647,21 @@ static int uli526x_stop(struct net_devic > outl(ULI526X_RESET, ioaddr + DCR0); > udelay(5); > phy_write(db->ioaddr, db->phy_addr, 0, 0x8000, db->chip_id); > +} > + > + > +/* > + * Stop the interface. > + * The interface is stopped when it is brought. > + */ > + > +static int uli526x_stop(struct net_device *dev) > +{ > + struct uli526x_board_info *db = netdev_priv(dev); > + > + ULI526X_DBUG(0, "uli526x_stop", 0); > + > + uli526x_down(dev); > > /* free interrupt */ > free_irq(dev->irq, dev); > @@ -654,6 +682,64 @@ static int uli526x_stop(struct net_devic > } > > > +#ifdef CONFIG_PM > + > +/* > + * Suspend the interface. > + */ > + > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state) > +{ > + struct net_device *dev = pci_get_drvdata(pdev); > + > + ULI526X_DBUG(0, "uli526x_suspend", 0); > + > + if (dev && netdev_priv(dev)) { > + if (netif_running(dev)) { > + netif_device_detach(dev); > + uli526x_down(dev); > + } > + pci_save_state(pdev); > + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0); > + pci_disable_device(pdev); > + pci_set_power_state(pdev, pci_choose_state(pdev, state)); > + } > + return 0; > +} > + > +/* > + * Resume the interface. > + */ > + > +static int uli526x_resume(struct pci_dev *pdev) > +{ > + struct net_device *dev = pci_get_drvdata(pdev); > + struct uli526x_board_info *db = netdev_priv(dev); > + int er
[PATCH 6/7] sky2: Yukon Extreme (88e8071) support.
Enable support for Yukon EX chipset (88e8071). Most of changes are related to new commands to chip for transmit, and change in status and checksumming. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/sky2.c | 176 +++-- drivers/net/sky2.h | 43 2 files changed, 148 insertions(+), 71 deletions(-) --- a/drivers/net/sky2.c2007-06-04 08:46:47.0 -0700 +++ b/drivers/net/sky2.c2007-06-04 08:46:54.0 -0700 @@ -130,7 +130,7 @@ static const struct pci_device_id sky2_i { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) }, /* 88EC034 */ { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4369) }, /* 88EC042 */ { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436A) }, /* 88E8058 */ -// { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436B) }, /* 88E8071 */ + { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436B) }, /* 88E8071 */ { 0 } }; @@ -661,6 +661,30 @@ static void sky2_wol_init(struct sky2_po } +static void sky2_set_tx_stfwd(struct sky2_hw *hw, unsigned port) +{ + if (hw->chip_id == CHIP_ID_YUKON_EX && hw->chip_rev != CHIP_REV_YU_EX_A0) { + sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), +TX_STFW_ENA | +(hw->dev[port]->mtu > ETH_DATA_LEN) ? TX_JUMBO_ENA : TX_JUMBO_DIS); + } else { + if (hw->dev[port]->mtu > ETH_DATA_LEN) { + /* set Tx GMAC FIFO Almost Empty Threshold */ + sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR), +(ECU_JUMBO_WM << 16) | ECU_AE_THR); + + sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), +TX_JUMBO_ENA | TX_STFW_DIS); + + /* Can't do offload because of lack of store/forward */ + hw->dev[port]->features &= ~(NETIF_F_TSO | NETIF_F_SG +| NETIF_F_ALL_CSUM); + } else + sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), +TX_JUMBO_DIS | TX_STFW_ENA); + } +} + static void sky2_mac_init(struct sky2_hw *hw, unsigned port) { struct sky2_port *sky2 = netdev_priv(hw->dev[port]); @@ -741,8 +765,11 @@ static void sky2_mac_init(struct sky2_hw /* Configure Rx MAC FIFO */ sky2_write8(hw, SK_REG(port, RX_GMF_CTRL_T), GMF_RST_CLR); - sky2_write32(hw, SK_REG(port, RX_GMF_CTRL_T), -GMF_OPER_ON | GMF_RX_F_FL_ON); + reg = GMF_OPER_ON | GMF_RX_F_FL_ON; + if (hw->chip_id == CHIP_ID_YUKON_EX) + reg |= GMF_RX_OVER_ON; + + sky2_write32(hw, SK_REG(port, RX_GMF_CTRL_T), reg); /* Flush Rx MAC FIFO on any flow control or error */ sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), GMR_FS_ANY_ERR); @@ -758,16 +785,7 @@ static void sky2_mac_init(struct sky2_hw sky2_write8(hw, SK_REG(port, RX_GMF_LP_THR), 768/8); sky2_write8(hw, SK_REG(port, RX_GMF_UP_THR), 1024/8); - /* set Tx GMAC FIFO Almost Empty Threshold */ - sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR), -(ECU_JUMBO_WM << 16) | ECU_AE_THR); - - if (hw->dev[port]->mtu > ETH_DATA_LEN) - sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), -TX_JUMBO_ENA | TX_STFW_DIS); - else - sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), -TX_JUMBO_DIS | TX_STFW_ENA); + sky2_set_tx_stfwd(hw, port); } } @@ -950,14 +968,16 @@ static void rx_set_checksum(struct sky2_ { struct sky2_rx_le *le; - le = sky2_next_rx(sky2); - le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN); - le->ctrl = 0; - le->opcode = OP_TCPSTART | HW_OWNER; - - sky2_write32(sky2->hw, -Q_ADDR(rxqaddr[sky2->port], Q_CSR), -sky2->rx_csum ? BMU_ENA_RX_CHKSUM : BMU_DIS_RX_CHKSUM); + if (sky2->hw->chip_id != CHIP_ID_YUKON_EX) { + le = sky2_next_rx(sky2); + le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN); + le->ctrl = 0; + le->opcode = OP_TCPSTART | HW_OWNER; + + sky2_write32(sky2->hw, +Q_ADDR(rxqaddr[sky2->port], Q_CSR), +sky2->rx_csum ? BMU_ENA_RX_CHKSUM : BMU_DIS_RX_CHKSUM); + } } @@ -1296,6 +1316,10 @@ static int sky2_up(struct net_device *de sky2_qset(hw, txqaddr[port]); + /* This is copied from sk98lin 10.0.5.3; no one tells me about erratta's */ + if (hw->chip_id == CHIP_ID_YUKON_EX && hw->chip_rev == CHIP_REV_YU_EX_B0) + sky2_write32(hw, Q_ADDR(txqaddr[port], Q_TEST), F_TX_CHK_AUTO_OFF); + /* Set almost em
[PATCH 7/7] sky2: version 1.15
New version because of new chip support. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/sky2.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/net/sky2.c2007-05-31 11:27:07.0 -0700 +++ b/drivers/net/sky2.c2007-05-31 11:27:08.0 -0700 @@ -50,7 +50,7 @@ #include "sky2.h" #define DRV_NAME "sky2" -#define DRV_VERSION"1.14" +#define DRV_VERSION"1.15" #define PFXDRV_NAME " " /* -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] sky2: Add PCI device specfic register 4 & 5
Need to setup more PCI control control registers are on Yukon EX. Some of these also exist on Yukon EC-U as well. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/sky2.c | 18 drivers/net/sky2.h | 77 + 2 files changed, 89 insertions(+), 6 deletions(-) --- a/drivers/net/sky2.c2007-06-04 08:46:34.0 -0700 +++ b/drivers/net/sky2.c2007-06-04 08:46:41.0 -0700 @@ -217,13 +217,19 @@ static void sky2_power_on(struct sky2_hw sky2_write8(hw, B2_Y2_CLK_GATE, 0); if (hw->chip_id == CHIP_ID_YUKON_EC_U || hw->chip_id == CHIP_ID_YUKON_EX) { - u32 reg1; + u32 reg; - sky2_pci_write32(hw, PCI_DEV_REG3, 0); - reg1 = sky2_pci_read32(hw, PCI_DEV_REG4); - reg1 &= P_ASPM_CONTROL_MSK; - sky2_pci_write32(hw, PCI_DEV_REG4, reg1); - sky2_pci_write32(hw, PCI_DEV_REG5, 0); + reg = sky2_pci_read32(hw, PCI_DEV_REG4); + /* set all bits to 0 except bits 15..12 and 8 */ + reg &= P_ASPM_CONTROL_MSK; + sky2_pci_write32(hw, PCI_DEV_REG4, reg); + + reg = sky2_pci_read32(hw, PCI_DEV_REG5); + /* set all bits to 0 except bits 28 & 27 */ + reg &= P_CTL_TIM_VMAIN_AV_MSK; + sky2_pci_write32(hw, PCI_DEV_REG5, reg); + + sky2_pci_write32(hw, PCI_CFG_REG_1, 0); } } --- a/drivers/net/sky2.h2007-06-04 08:45:25.0 -0700 +++ b/drivers/net/sky2.h2007-06-04 08:46:41.0 -0700 @@ -14,6 +14,8 @@ enum { PCI_DEV_REG3= 0x80, PCI_DEV_REG4= 0x84, PCI_DEV_REG5= 0x88, + PCI_CFG_REG_0 = 0x90, + PCI_CFG_REG_1 = 0x94, }; enum { @@ -28,6 +30,7 @@ enum { enum pci_dev_reg_1 { PCI_Y2_PIG_ENA = 1<<31, /* Enable Plug-in-Go (YUKON-2) */ PCI_Y2_DLL_DIS = 1<<30, /* Disable PCI DLL (YUKON-2) */ + PCI_SW_PWR_ON_RST= 1<<30, /* SW Power on Reset (Yukon-EX) */ PCI_Y2_PHY2_COMA = 1<<29, /* Set PHY 2 to Coma Mode (YUKON-2) */ PCI_Y2_PHY1_COMA = 1<<28, /* Set PHY 1 to Coma Mode (YUKON-2) */ PCI_Y2_PHY2_POWD = 1<<27, /* Set PHY 2 to Power Down (YUKON-2) */ @@ -67,6 +70,80 @@ enum pci_dev_reg_4 { | P_ASPM_CLKRUN_REQUEST | P_ASPM_INT_FIFO_EMPTY, }; +/* PCI_OUR_REG_5 32 bit Our Register 5 (Yukon-ECU only) */ +enum pci_dev_reg_5 { + /* Bit 31..27: for A3 & later */ + P_CTL_DIV_CORE_CLK_ENA = 1<<31, /* Divide Core Clock Enable */ + P_CTL_SRESET_VMAIN_AV = 1<<30, /* Soft Reset for Vmain_av De-Glitch */ + P_CTL_BYPASS_VMAIN_AV = 1<<29, /* Bypass En. for Vmain_av De-Glitch */ + P_CTL_TIM_VMAIN_AV_MSK = 3<<27, /* Bit 28..27: Timer Vmain_av Mask */ +/* Bit 26..16: Release Clock on Event */ + P_REL_PCIE_RST_DE_ASS = 1<<26, /* PCIe Reset De-Asserted */ + P_REL_GPHY_REC_PACKET = 1<<25, /* GPHY Received Packet */ + P_REL_INT_FIFO_N_EMPTY = 1<<24, /* Internal FIFO Not Empty */ + P_REL_MAIN_PWR_AVAIL= 1<<23, /* Main Power Available */ + P_REL_CLKRUN_REQ_REL= 1<<22, /* CLKRUN Request Release */ + P_REL_PCIE_RESET_ASS= 1<<21, /* PCIe Reset Asserted */ + P_REL_PME_ASSERTED = 1<<20, /* PME Asserted */ + P_REL_PCIE_EXIT_L1_ST = 1<<19, /* PCIe Exit L1 State */ + P_REL_LOADER_NOT_FIN= 1<<18, /* EPROM Loader Not Finished */ + P_REL_PCIE_RX_EX_IDLE = 1<<17, /* PCIe Rx Exit Electrical Idle State */ + P_REL_GPHY_LINK_UP = 1<<16, /* GPHY Link Up */ + + /* Bit 10.. 0: Mask for Gate Clock */ + P_GAT_PCIE_RST_ASSERTED = 1<<10,/* PCIe Reset Asserted */ + P_GAT_GPHY_N_REC_PACKET = 1<<9, /* GPHY Not Received Packet */ + P_GAT_INT_FIFO_EMPTY= 1<<8, /* Internal FIFO Empty */ + P_GAT_MAIN_PWR_N_AVAIL = 1<<7, /* Main Power Not Available */ + P_GAT_CLKRUN_REQ_REL= 1<<6, /* CLKRUN Not Requested */ + P_GAT_PCIE_RESET_ASS= 1<<5, /* PCIe Reset Asserted */ + P_GAT_PME_DE_ASSERTED = 1<<4, /* PME De-Asserted */ + P_GAT_PCIE_ENTER_L1_ST = 1<<3, /* PCIe Enter L1 State */ + P_GAT_LOADER_FINISHED = 1<<2, /* EPROM Loader Finished */ + P_GAT_PCIE_RX_EL_IDLE = 1<<1, /* PCIe Rx Electrical Idle State */ + P_GAT_GPHY_LINK_DOWN= 1<<0, /* GPHY Link Down */ + + PCIE_OUR5_EVENT_CLK_D3_SET = P_REL_GPHY_REC_PACKET | +P_REL_INT_FIFO_N_EMPTY | +P_REL_PCIE_EXIT_L1_ST | +P_REL_PCIE_RX_EX_IDLE | +P_GAT_GPHY_N_REC_PACKET | +P_GAT_INT_FIFO_EMPTY | +
[PATCH 3/7] sky2: rename BMU register
This register is more of a test and control register on Yukon2. So rename it to Q_TEST and give some bit definitions. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/sky2.c |2 +- drivers/net/sky2.h | 29 +++-- 2 files changed, 12 insertions(+), 19 deletions(-) --- a/drivers/net/sky2.c2007-06-04 08:46:41.0 -0700 +++ b/drivers/net/sky2.c2007-06-04 08:46:43.0 -0700 @@ -1140,7 +1140,7 @@ static int sky2_rx_start(struct sky2_por if (hw->chip_id == CHIP_ID_YUKON_EC_U && (hw->chip_rev == CHIP_REV_YU_EC_U_A1 || hw->chip_rev == CHIP_REV_YU_EC_U_B0)) - sky2_write32(hw, Q_ADDR(rxq, Q_F), F_M_RX_RAM_DIS); + sky2_write32(hw, Q_ADDR(rxq, Q_TEST), F_M_RX_RAM_DIS); sky2_prefetch_init(hw, rxq, sky2->rx_le_map, RX_LE_SIZE - 1); --- a/drivers/net/sky2.h2007-06-04 08:46:41.0 -0700 +++ b/drivers/net/sky2.h2007-06-04 08:46:43.0 -0700 @@ -592,23 +592,15 @@ enum { enum { B8_Q_REGS = 0x0400, /* base of Queue registers */ Q_D = 0x00, /* 8*32 bit Current Descriptor */ - Q_DA_L = 0x20, /* 32 bit Current Descriptor Address Low dWord */ - Q_DA_H = 0x24, /* 32 bit Current Descriptor Address High dWord */ + Q_VLAN = 0x20, /* 16 bit Current VLAN Tag */ + Q_DONE = 0x24, /* 16 bit Done Index */ Q_AC_L = 0x28, /* 32 bit Current Address Counter Low dWord */ Q_AC_H = 0x2c, /* 32 bit Current Address Counter High dWord */ Q_BC= 0x30, /* 32 bit Current Byte Counter */ Q_CSR = 0x34, /* 32 bit BMU Control/Status Register */ - Q_F = 0x38, /* 32 bit Flag Register */ - Q_T1= 0x3c, /* 32 bit Test Register 1 */ - Q_T1_TR = 0x3c, /* 8 bit Test Register 1 Transfer SM */ - Q_T1_WR = 0x3d, /* 8 bit Test Register 1 Write Descriptor SM */ - Q_T1_RD = 0x3e, /* 8 bit Test Register 1 Read Descriptor SM */ - Q_T1_SV = 0x3f, /* 8 bit Test Register 1 Supervisor SM */ - Q_T2= 0x40, /* 32 bit Test Register 2 */ - Q_T3= 0x44, /* 32 bit Test Register 3 */ + Q_TEST = 0x38, /* 32 bit Test/Control Register */ /* Yukon-2 */ - Q_DONE = 0x24, /* 16 bit Done Index (Yukon-2 only) */ Q_WM= 0x40, /* 16 bit FIFO Watermark */ Q_AL= 0x42, /* 8 bit FIFO Alignment */ Q_RSP = 0x44, /* 16 bit FIFO Read Shadow Pointer */ @@ -622,15 +614,16 @@ enum { }; #define Q_ADDR(reg, offs) (B8_Q_REGS + (reg) + (offs)) -/* Q_F 32 bit Flag Register */ +/* Q_TEST 32 bit Test Register */ enum { - F_ALM_FULL = 1<<27, /* Rx FIFO: almost full */ - F_EMPTY = 1<<27, /* Tx FIFO: empty flag */ - F_FIFO_EOF = 1<<26, /* Tag (EOF Flag) bit in FIFO */ - F_WM_REACHED= 1<<25, /* Watermark reached */ + /* Transmit */ + F_TX_CHK_AUTO_OFF = 1<<31, /* Tx checksum auto calc off (Yukon EX) */ + F_TX_CHK_AUTO_ON = 1<<30, /* Tx checksum auto calc off (Yukon EX) */ + + /* Receive */ F_M_RX_RAM_DIS = 1<<24, /* MAC Rx RAM Read Port disable */ - F_FIFO_LEVEL= 0x1fL<<16, /* Bit 23..16: # of Qwords in FIFO */ - F_WATER_MARK= 0x0007ffL, /* Bit 10.. 0: Watermark */ + + /* Hardware testbits not used */ }; /* Queue Prefetch Unit Offsets, use Y2_QADDR() to address (Yukon-2 only)*/ -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] sky2: enable clocks before probe
Catch-22: On Yukon EX (88E8071) need to have internal clocks enabled before reading chip id. It is harmless on other chips. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/sky2.c |7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) --- a/drivers/net/sky2.c2007-06-04 08:46:43.0 -0700 +++ b/drivers/net/sky2.c2007-06-04 08:46:45.0 -0700 @@ -2519,6 +2519,9 @@ static int __devinit sky2_init(struct sk { u8 t8; + /* Enable all clocks */ + sky2_pci_write32(hw, PCI_DEV_REG3, 0); + sky2_write8(hw, B0_CTST, CS_RST_CLR); hw->chip_id = sky2_read8(hw, B2_CHIP_ID); @@ -2532,10 +2535,6 @@ static int __devinit sky2_init(struct sk dev_warn(&hw->pdev->dev, "this driver not yet tested on this chip type\n" "Please report success or failure to \n"); - /* Make sure and enable all clocks */ - if (hw->chip_id == CHIP_ID_YUKON_EX || hw->chip_id == CHIP_ID_YUKON_EC_U) - sky2_pci_write32(hw, PCI_DEV_REG3, 0); - hw->chip_rev = (sky2_read8(hw, B2_MAC_CFG) & CFG_CHIP_R_MSK) >> 4; /* This rev is really old, and requires untested workarounds */ -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] sky2: GPIO register
The General Purpose I/O register is yet another hardware workaround catchall. Enable workaround that vendor driver does to stay but for bug compatiable. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/sky2.c |5 + drivers/net/sky2.h | 14 ++ 2 files changed, 19 insertions(+) --- a/drivers/net/sky2.h2007-06-04 08:46:43.0 -0700 +++ b/drivers/net/sky2.h2007-06-04 08:46:47.0 -0700 @@ -441,6 +441,20 @@ enum { TST_CFG_WRITE_OFF= 1<<0, /* Disable Config Reg WR */ }; +/* B2_GPIO */ +enum { + GLB_GPIO_CLK_DEB_ENA = 1<<31, /* Clock Debug Enable */ + GLB_GPIO_CLK_DBG_MSK = 0xf<<26, /* Clock Debug */ + + GLB_GPIO_INT_RST_D3_DIS = 1<<15, /* Disable Internal Reset After D3 to D0 */ + GLB_GPIO_LED_PAD_SPEED_UP = 1<<14, /* LED PAD Speed Up */ + GLB_GPIO_STAT_RACE_DIS = 1<<13, /* Status Race Disable */ + GLB_GPIO_TEST_SEL_MSK = 3<<11, /* Testmode Select */ + GLB_GPIO_TEST_SEL_BASE = 1<<11, + GLB_GPIO_RAND_ENA = 1<<10, /* Random Enable */ + GLB_GPIO_RAND_BIT_1 = 1<<9, /* Random Bit 1 */ +}; + /* B2_MAC_CFG 8 bit MAC Configuration / Chip Revision */ enum { CFG_CHIP_R_MSK= 0xf<<4, /* Bit 7.. 4: Chip Revision */ --- a/drivers/net/sky2.c2007-06-04 08:46:45.0 -0700 +++ b/drivers/net/sky2.c2007-06-04 08:46:47.0 -0700 @@ -230,6 +230,11 @@ static void sky2_power_on(struct sky2_hw sky2_pci_write32(hw, PCI_DEV_REG5, reg); sky2_pci_write32(hw, PCI_CFG_REG_1, 0); + + /* Enable workaround for dev 4.107 on Yukon-Ultra & Extreme */ + reg = sky2_read32(hw, B2_GP_IO); + reg |= GLB_GPIO_STAT_RACE_DIS; + sky2_write32(hw, B2_GP_IO, reg); } } -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/7] sky2 version 1.15 (88e8071) support
These changes are to enable the Yukon Extreme (88e8071) chipset. This chip is similar to earlier chip but has a different set of offloading operations and some other minor quirks. Marvell has given me some evaluation boards with the 88e8071 chip set. The chip is available now, but haven't seen a systems with the hardware yet. The support fot this is exprerimental at this point, but haven't encountered any new problems. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] sky2: avoid reserved regions on ethtool reg dump
On Yukon EX reading some of the undocumented places in the memory space will cause a hang. Since they don't provide useful information, just skip the reserved areas. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/sky2.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) --- a/drivers/net/sky2.c2007-06-04 08:45:25.0 -0700 +++ b/drivers/net/sky2.c2007-06-04 08:46:34.0 -0700 @@ -3330,7 +3330,7 @@ static int sky2_get_regs_len(struct net_ /* * Returns copy of control register region - * Note: access to the RAM address register set will cause timeouts. + * Note: ethtool_get_regs always provides full size (16k) buffer */ static void sky2_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *p) @@ -3338,15 +3338,19 @@ static void sky2_get_regs(struct net_dev const struct sky2_port *sky2 = netdev_priv(dev); const void __iomem *io = sky2->hw->regs; - BUG_ON(regs->len < B3_RI_WTO_R1); regs->version = 1; memset(p, 0, regs->len); memcpy_fromio(p, io, B3_RAM_ADDR); - memcpy_fromio(p + B3_RI_WTO_R1, - io + B3_RI_WTO_R1, - regs->len - B3_RI_WTO_R1); + /* skip diagnostic ram region */ + memcpy_fromio(p + B3_RI_WTO_R1, io + B3_RI_WTO_R1, 0x2000 - B3_RI_WTO_R1); + + /* copy GMAC registers */ + memcpy_fromio(p + BASE_GMAC_1, io + BASE_GMAC_1, 0x1000); + if (sky2->hw->ports > 1) + memcpy_fromio(p + BASE_GMAC_2, io + BASE_GMAC_2, 0x1000); + } /* In order to do Jumbo packets on these chips, need to turn off the -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/5 revised][BNX2]: Add missing wait in bnx2_init_5709_context().
[BNX2]: Add missing wait in bnx2_init_5709_context(). For correctness, we need to wait for the MEM_INIT bit to be cleared in the BNX2_CTX_COMMAND register before proceeding. [Added return -EBUSY when the MEM_INIT bit doesn't clear, suggested by Jeff Garzik.] Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c index 5d69e5b..0f861e2 100644 --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -1778,6 +1778,15 @@ bnx2_init_5709_context(struct bnx2 *bp) val = BNX2_CTX_COMMAND_ENABLED | BNX2_CTX_COMMAND_MEM_INIT | (1 << 12); val |= (BCM_PAGE_BITS - 8) << 16; REG_WR(bp, BNX2_CTX_COMMAND, val); + for (i = 0; i < 10; i++) { + val = REG_RD(bp, BNX2_CTX_COMMAND); + if (!(val & BNX2_CTX_COMMAND_MEM_INIT)) + break; + udelay(2); + } + if (val & BNX2_CTX_COMMAND_MEM_INIT) + return -EBUSY; + for (i = 0; i < bp->ctx_pages; i++) { int j; @@ -3696,9 +3705,11 @@ bnx2_init_chip(struct bnx2 *bp) /* Initialize context mapping and zero out the quick contexts. The * context block must have already been enabled. */ - if (CHIP_NUM(bp) == CHIP_NUM_5709) - bnx2_init_5709_context(bp); - else + if (CHIP_NUM(bp) == CHIP_NUM_5709) { + rc = bnx2_init_5709_context(bp); + if (rc) + return rc; + } else bnx2_init_context(bp); if ((rc = bnx2_init_cpus(bp)) != 0) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: r8169 in 2.6.18 silently corrupting data, 2.6.22-rc3 link not detected at all
Francois, Using the latest patch you provided in the previous e-mail under 2.6.22-rc3 I still see data corruption after I dropped the entire network to MTU 1500. (The receive corruption is way more frequent than under 2.6.18.) I'll attempt to find out what caused the link detection to stop working. Do you have any idea about the data corruption? I can easily put in trace anywhere in the driver if you'd like to see any output. My latest build was without NAPI support, and running with or without it didn't seem to affect the receive corruption. Thanks, -Andrew On 6/4/07, Francois Romieu <[EMAIL PROTECTED]> wrote: Francois Romieu <[EMAIL PROTECTED]> : [...] Andrew, have you had any time to do some testing with an uniform 7200 bytes MTU through your LAN ? You will find a backport of the r8169 driver for the 2.6.18 kernel at http://www.fr.zoreil.com/people/francois/backport/r8169/20070604-00 Could you do a binary search of the patch which breaks the link detection ? From a different report that I have received, the current testing branch of the 8169 driver would improve things for MTU beyond 1500 bytes: nailing down your broken link regression would be a welcome improvement. If you are confortable with git, you can do a git bisect after pulling the branch 'r8169-backport-test-20070604' at: git://electric-eye.fr.zoreil.com/home/romieu/linux/linux-2.6-out (please do the initial clone from elsewhere if you use git, thanks) -- Ueimor - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: Add security check before flushing SAD/SPD
From: James Morris <[EMAIL PROTECTED]> Date: Mon, 4 Jun 2007 19:13:10 -0400 (EDT) > I've applied this patch to > > git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-davem > > > Dave, feel free to pull from that branch. Thanks James, I'll pull that in for my next round of networking fixes. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/5][BNX2]: Add missing wait in bnx2_init_5709_context().
On Mon, Jun 04, 2007 at 05:00:35PM -0700, Michael Chan wrote: > [BNX2]: Add missing wait in bnx2_init_5709_context(). > > For correctness, we need to wait for the MEM_INIT bit to be cleared > in the BNX2_CTX_COMMAND register before proceeding. > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> > > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c > index 5d69e5b..daa62d9 100644 > --- a/drivers/net/bnx2.c > +++ b/drivers/net/bnx2.c > @@ -1778,6 +1778,12 @@ bnx2_init_5709_context(struct bnx2 *bp) > val = BNX2_CTX_COMMAND_ENABLED | BNX2_CTX_COMMAND_MEM_INIT | (1 << 12); > val |= (BCM_PAGE_BITS - 8) << 16; > REG_WR(bp, BNX2_CTX_COMMAND, val); > + for (i = 0; i < 10; i++) { > + val = REG_RD(bp, BNX2_CTX_COMMAND); > + if (!(val & BNX2_CTX_COMMAND_MEM_INIT)) > + break; > + udelay(2); > + } > for (i = 0; i < bp->ctx_pages; i++) { > int j; If the bit does not clear after 10 loops, it seems to me you should complain or somehow otherwise make note of the failure. Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/5][BNX2]: Fix netdev watchdog on 5708.
On Mon, Jun 04, 2007 at 05:00:03PM -0700, Michael Chan wrote: > [BNX2]: Fix netdev watchdog on 5708. > > There's a bug in the driver that only initializes half of the context > memory on the 5708. Surprisingly, this works most of the time except > for some occasional netdev watchdogs when sending a lot of 64-byte > packets. The fix is to add the missing code to initialize the 2nd > halves of all context memory. > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]> ACK patch 1, and 3-5 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[2/5][BNX2]: Add missing wait in bnx2_init_5709_context().
[BNX2]: Add missing wait in bnx2_init_5709_context(). For correctness, we need to wait for the MEM_INIT bit to be cleared in the BNX2_CTX_COMMAND register before proceeding. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c index 5d69e5b..daa62d9 100644 --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -1778,6 +1778,12 @@ bnx2_init_5709_context(struct bnx2 *bp) val = BNX2_CTX_COMMAND_ENABLED | BNX2_CTX_COMMAND_MEM_INIT | (1 << 12); val |= (BCM_PAGE_BITS - 8) << 16; REG_WR(bp, BNX2_CTX_COMMAND, val); + for (i = 0; i < 10; i++) { + val = REG_RD(bp, BNX2_CTX_COMMAND); + if (!(val & BNX2_CTX_COMMAND_MEM_INIT)) + break; + udelay(2); + } for (i = 0; i < bp->ctx_pages; i++) { int j; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[4/5][BNX2]: Fix occasional counter corruption on 5708.
[BNX2]: Fix occasional counter corruption on 5708. The statistics block DMA on 5708 can be messed up occasionally on the average of about once per hour. If the user is reading the counters within one second after the corruption, the counters will be all messed up. One second later, the counters will be ok again until the next corruption occurs. The workaround is to disable the periodic statistics DMA. Instead, we manually trigger the DMA once a second in bnx2_timer(). This manual trigger of the DMA avoids the problem. As a consequence, we can only allow 0 or 1 second settings for ethtool -C statistics block. Thanks to Jean-Daniel Pauget <[EMAIL PROTECTED]> and CaT <[EMAIL PROTECTED]> for reporting this rare problem. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c index c03282c..5654cc7 100644 --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -3783,7 +3783,10 @@ bnx2_init_chip(struct bnx2 *bp) REG_WR(bp, BNX2_HC_CMD_TICKS, (bp->cmd_ticks_int << 16) | bp->cmd_ticks); - REG_WR(bp, BNX2_HC_STATS_TICKS, bp->stats_ticks & 0x00); + if (CHIP_NUM(bp) == CHIP_NUM_5708) + REG_WR(bp, BNX2_HC_STATS_TICKS, 0); + else + REG_WR(bp, BNX2_HC_STATS_TICKS, bp->stats_ticks & 0x00); REG_WR(bp, BNX2_HC_STAT_COLLECT_TICKS, 0xbb8); /* 3ms */ if (CHIP_ID(bp) == CHIP_ID_5706_A1) @@ -4636,6 +4639,11 @@ bnx2_timer(unsigned long data) bp->stats_blk->stat_FwRxDrop = REG_RD_IND(bp, BNX2_FW_RX_DROP_COUNT); + /* workaround occasional corrupted counters */ + if (CHIP_NUM(bp) == CHIP_NUM_5708 && bp->stats_ticks) + REG_WR(bp, BNX2_HC_COMMAND, bp->hc_cmd | + BNX2_HC_COMMAND_STATS_NOW); + if (bp->phy_flags & PHY_SERDES_FLAG) { if (CHIP_NUM(bp) == CHIP_NUM_5706) bnx2_5706_serdes_timer(bp); @@ -5446,6 +5454,10 @@ bnx2_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal) 0xff; bp->stats_ticks = coal->stats_block_coalesce_usecs; + if (CHIP_NUM(bp) == CHIP_NUM_5708) { + if (bp->stats_ticks != 0 && bp->stats_ticks != USEC_PER_SEC) + bp->stats_ticks = USEC_PER_SEC; + } if (bp->stats_ticks > 0x00) bp->stats_ticks = 0x00; bp->stats_ticks &= 0x00; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[5/5][BNX2]: Update version and reldate.
[BNX2]: Update version and reldate. Update to version 1.5.11. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c index 58e6f49..a9b833b 100644 --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -54,8 +54,8 @@ #define DRV_MODULE_NAME"bnx2" #define PFX DRV_MODULE_NAME": " -#define DRV_MODULE_VERSION "1.5.10" -#define DRV_MODULE_RELDATE "May 1, 2007" +#define DRV_MODULE_VERSION "1.5.11" +#define DRV_MODULE_RELDATE "June 4, 2007" #define RUN_AT(x) (jiffies + (x)) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[1/5][BNX2]: Fix netdev watchdog on 5708.
[BNX2]: Fix netdev watchdog on 5708. There's a bug in the driver that only initializes half of the context memory on the 5708. Surprisingly, this works most of the time except for some occasional netdev watchdogs when sending a lot of 64-byte packets. The fix is to add the missing code to initialize the 2nd halves of all context memory. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c index 88b33c6..5d69e5b 100644 --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -1811,6 +1811,7 @@ bnx2_init_context(struct bnx2 *bp) vcid = 96; while (vcid) { u32 vcid_addr, pcid_addr, offset; + int i; vcid--; @@ -1831,16 +1832,20 @@ bnx2_init_context(struct bnx2 *bp) pcid_addr = vcid_addr; } - REG_WR(bp, BNX2_CTX_VIRT_ADDR, 0x00); - REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr); + for (i = 0; i < (CTX_SIZE / PHY_CTX_SIZE); i++) { + vcid_addr += (i << PHY_CTX_SHIFT); + pcid_addr += (i << PHY_CTX_SHIFT); - /* Zero out the context. */ - for (offset = 0; offset < PHY_CTX_SIZE; offset += 4) { - CTX_WR(bp, 0x00, offset, 0); - } + REG_WR(bp, BNX2_CTX_VIRT_ADDR, 0x00); + REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr); - REG_WR(bp, BNX2_CTX_VIRT_ADDR, vcid_addr); - REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr); + /* Zero out the context. */ + for (offset = 0; offset < PHY_CTX_SIZE; offset += 4) + CTX_WR(bp, 0x00, offset, 0); + + REG_WR(bp, BNX2_CTX_VIRT_ADDR, vcid_addr); + REG_WR(bp, BNX2_CTX_PAGE_TBL, pcid_addr); + } } } - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[3/5][BNX2]: Enable DMA on 5709.
[BNX2]: Enable DMA on 5709. Add missing code to enable DMA on 5709 A1. The bit is a no-op on A0 and therefore can be set on all 5709 chips. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c index daa62d9..c03282c 100644 --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -3810,6 +3810,11 @@ bnx2_init_chip(struct bnx2 *bp) /* Initialize the receive filter. */ bnx2_set_rx_mode(bp->dev); + if (CHIP_NUM(bp) == CHIP_NUM_5709) { + val = REG_RD(bp, BNX2_MISC_NEW_CORE_CTL); + val |= BNX2_MISC_NEW_CORE_CTL_DMA_ENABLE; + REG_WR(bp, BNX2_MISC_NEW_CORE_CTL, val); + } rc = bnx2_fw_sync(bp, BNX2_DRV_MSG_DATA_WAIT2 | BNX2_DRV_MSG_CODE_RESET, 0); diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h index bd6288d..49a5de2 100644 --- a/drivers/net/bnx2.h +++ b/drivers/net/bnx2.h @@ -1373,6 +1373,7 @@ struct l2_fhdr { #define BNX2_MISC_NEW_CORE_CTL 0x08c8 #define BNX2_MISC_NEW_CORE_CTL_LINK_HOLDOFF_SUCCESS (1L<<0) #define BNX2_MISC_NEW_CORE_CTL_LINK_HOLDOFF_REQ (1L<<1) +#define BNX2_MISC_NEW_CORE_CTL_DMA_ENABLE (1L<<16) #define BNX2_MISC_NEW_CORE_CTL_RESERVED_CMN (0x3fffL<<2) #define BNX2_MISC_NEW_CORE_CTL_RESERVED_TC (0xL<<16) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: Add security check before flushing SAD/SPD
I've applied this patch to git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-davem Dave, feel free to pull from that branch. - James -- James Morris <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add suspend and resume support to uli526x
Hi. On Tue, 2007-06-05 at 00:53 +0200, Pavel Machek wrote: > On Tue 2007-06-05 08:49:04, Nigel Cunningham wrote: > > Hi. > > > > On Tue, 2007-06-05 at 00:41 +0200, Pavel Machek wrote: > > > Hi! > > > > > > > > > > +#endif /* CONFIG_PM */ > > > > > > > > > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > > > > > > > .id_table = uli526x_pci_tbl, > > > > > > > .probe = uli526x_init_one, > > > > > > > .remove = __devexit_p(uli526x_remove_one), > > > > > > > +#ifdef CONFIG_PM > > > > > > > + .suspend= uli526x_suspend, > > > > > > > + .resume = uli526x_resume, > > > > > > > +#endif > > > > > > > > > > > > ...so that this ifdef is not needed? > > > > > > > > > > OK, why not. > > > > > > > > Because it's uglier and #ifdef is the established way of doing things? > > > > > > Actually the way I suggested is nicer, IIRC akpm invented it. It keeps > > > ifdefs localized around the block that _needs_ to be ifdefed. > > > > The localised point is true. I'll also admit that 'nicer'/'uglier' is a > > matter of aesthetics and therefore personal opinion. > > > > I guess that leaves the question, "What's the precedent to follow?" or > > "Is there a driver that's already got this new format?". > > for example net/ne.c ... > > [EMAIL PROTECTED]:/data/l/linux/drivers$ grep "_suspend NULL" */*.c > leds/leds-ams-delta.c:#define ams_delta_led_suspend NULL > leds/leds-net48xx.c:#define net48xx_led_suspend NULL > leds/leds-s3c24xx.c:#define s3c24xx_led_suspend NULL > leds/leds-tosa.c:#define tosaled_suspend NULL > leds/leds-wrap.c:#define wrap_led_suspend NULL > misc/tifm_7xx1.c:#define tifm_7xx1_suspend NULL > misc/tifm_core.c:#define tifm_device_suspend NULL > net/3c509.c:#define el3_suspend NULL > net/forcedeth.c:#define nv_suspend NULL > net/ne.c:#define ne_drv_suspend NULL > parport/parport_ax88796.c:#define parport_ax88796_suspend NULL > rtc/rtc-at91rm9200.c:#define at91_rtc_suspend NULL > rtc/rtc-omap.c:#define omap_rtc_suspend NULL > rtc/rtc-s3c.c:#define s3c_rtc_suspend NULL > serial/8250_pnp.c:#define serial_pnp_suspend NULL > serial/atmel_serial.c:#define atmel_serial_suspend NULL > serial/s3c2410.c:#define s3c24xx_serial_suspend NULL > spi/pxa2xx_spi.c:#define pxa2xx_spi_suspend NULL > spi/spi_bfin5xx.c:#define bfin5xx_spi_suspend NULL > spi/spi_imx.c:#define spi_imx_suspend NULL > spi/spi_s3c24xx.c:#define s3c24xx_spi_suspend NULL > spi/spi_s3c24xx_gpio.c:#define s3c2410_spigpio_suspend NULL > video/arkfb.c:#define ark_pci_suspend NULL > video/au1100fb.c:#define au1100fb_drv_suspend NULL > video/s3c2410fb.c:#define s3c2410fb_suspend NULL > video/skeletonfb.c:#define xxxfb_suspend NULL > video/skeletonfb.c:#define xxxfb_suspend NULL > video/sm501fb.c:#define sm501fb_suspend NULL Cool. That's a lot more than I would have expected. Ok. I'll drop my objection then. (Still think it's ugly though :>). Regards, Nigel signature.asc Description: This is a digitally signed message part
Re: [RFC][PATCH] Add suspend and resume support to uli526x
On Tue 2007-06-05 08:49:04, Nigel Cunningham wrote: > Hi. > > On Tue, 2007-06-05 at 00:41 +0200, Pavel Machek wrote: > > Hi! > > > > > > > > +#endif /* CONFIG_PM */ > > > > > > > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > > > > > > .id_table = uli526x_pci_tbl, > > > > > > .probe = uli526x_init_one, > > > > > > .remove = __devexit_p(uli526x_remove_one), > > > > > > +#ifdef CONFIG_PM > > > > > > + .suspend= uli526x_suspend, > > > > > > + .resume = uli526x_resume, > > > > > > +#endif > > > > > > > > > > ...so that this ifdef is not needed? > > > > > > > > OK, why not. > > > > > > Because it's uglier and #ifdef is the established way of doing things? > > > > Actually the way I suggested is nicer, IIRC akpm invented it. It keeps > > ifdefs localized around the block that _needs_ to be ifdefed. > > The localised point is true. I'll also admit that 'nicer'/'uglier' is a > matter of aesthetics and therefore personal opinion. > > I guess that leaves the question, "What's the precedent to follow?" or > "Is there a driver that's already got this new format?". for example net/ne.c ... [EMAIL PROTECTED]:/data/l/linux/drivers$ grep "_suspend NULL" */*.c leds/leds-ams-delta.c:#define ams_delta_led_suspend NULL leds/leds-net48xx.c:#define net48xx_led_suspend NULL leds/leds-s3c24xx.c:#define s3c24xx_led_suspend NULL leds/leds-tosa.c:#define tosaled_suspend NULL leds/leds-wrap.c:#define wrap_led_suspend NULL misc/tifm_7xx1.c:#define tifm_7xx1_suspend NULL misc/tifm_core.c:#define tifm_device_suspend NULL net/3c509.c:#define el3_suspend NULL net/forcedeth.c:#define nv_suspend NULL net/ne.c:#define ne_drv_suspend NULL parport/parport_ax88796.c:#define parport_ax88796_suspend NULL rtc/rtc-at91rm9200.c:#define at91_rtc_suspend NULL rtc/rtc-omap.c:#define omap_rtc_suspend NULL rtc/rtc-s3c.c:#define s3c_rtc_suspend NULL serial/8250_pnp.c:#define serial_pnp_suspend NULL serial/atmel_serial.c:#define atmel_serial_suspend NULL serial/s3c2410.c:#define s3c24xx_serial_suspend NULL spi/pxa2xx_spi.c:#define pxa2xx_spi_suspend NULL spi/spi_bfin5xx.c:#define bfin5xx_spi_suspend NULL spi/spi_imx.c:#define spi_imx_suspend NULL spi/spi_s3c24xx.c:#define s3c24xx_spi_suspend NULL spi/spi_s3c24xx_gpio.c:#define s3c2410_spigpio_suspend NULL video/arkfb.c:#define ark_pci_suspend NULL video/au1100fb.c:#define au1100fb_drv_suspend NULL video/s3c2410fb.c:#define s3c2410fb_suspend NULL video/skeletonfb.c:#define xxxfb_suspend NULL video/skeletonfb.c:#define xxxfb_suspend NULL video/sm501fb.c:#define sm501fb_suspend NULL Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add suspend and resume support to uli526x
Hi. On Tue, 2007-06-05 at 00:41 +0200, Pavel Machek wrote: > Hi! > > > > > > +#endif /* CONFIG_PM */ > > > > > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > > > > > .id_table = uli526x_pci_tbl, > > > > > .probe = uli526x_init_one, > > > > > .remove = __devexit_p(uli526x_remove_one), > > > > > +#ifdef CONFIG_PM > > > > > + .suspend= uli526x_suspend, > > > > > + .resume = uli526x_resume, > > > > > +#endif > > > > > > > > ...so that this ifdef is not needed? > > > > > > OK, why not. > > > > Because it's uglier and #ifdef is the established way of doing things? > > Actually the way I suggested is nicer, IIRC akpm invented it. It keeps > ifdefs localized around the block that _needs_ to be ifdefed. The localised point is true. I'll also admit that 'nicer'/'uglier' is a matter of aesthetics and therefore personal opinion. I guess that leaves the question, "What's the precedent to follow?" or "Is there a driver that's already got this new format?". Regards, Nigel signature.asc Description: This is a digitally signed message part
Re: [RFC][PATCH] Add suspend and resume support to uli526x
Hi! > > > > +#endif /* CONFIG_PM */ > > > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > > > > .id_table = uli526x_pci_tbl, > > > > .probe = uli526x_init_one, > > > > .remove = __devexit_p(uli526x_remove_one), > > > > +#ifdef CONFIG_PM > > > > + .suspend= uli526x_suspend, > > > > + .resume = uli526x_resume, > > > > +#endif > > > > > > ...so that this ifdef is not needed? > > > > OK, why not. > > Because it's uglier and #ifdef is the established way of doing things? Actually the way I suggested is nicer, IIRC akpm invented it. It keeps ifdefs localized around the block that _needs_ to be ifdefed. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[-mm patch] e1000: #if 0 two functions
e1000_{read,write}_pci_cfg() are no longer used. Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> --- drivers/net/e1000/e1000_hw.h |2 -- drivers/net/e1000/e1000_main.c |4 2 files changed, 4 insertions(+), 2 deletions(-) --- linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_hw.h.old 2007-06-04 22:03:05.0 +0200 +++ linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_hw.h 2007-06-04 22:03:14.0 +0200 @@ -421,8 +421,6 @@ void e1000_tbi_adjust_stats(struct e1000 void e1000_get_bus_info(struct e1000_hw *hw); void e1000_pci_set_mwi(struct e1000_hw *hw); void e1000_pci_clear_mwi(struct e1000_hw *hw); -void e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value); -void e1000_write_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value); int32_t e1000_read_pcie_cap_reg(struct e1000_hw *hw, uint32_t reg, uint16_t *value); void e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc); int e1000_pcix_get_mmrbc(struct e1000_hw *hw); --- linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_main.c.old 2007-06-04 22:03:24.0 +0200 +++ linux-2.6.22-rc3-mm1/drivers/net/e1000/e1000_main.c 2007-06-04 22:03:40.0 +0200 @@ -4888,6 +4888,8 @@ e1000_pci_clear_mwi(struct e1000_hw *hw) pci_clear_mwi(adapter->pdev); } +#if 0 + void e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t *value) { @@ -4904,6 +4906,8 @@ e1000_write_pci_cfg(struct e1000_hw *hw, pci_write_config_word(adapter->pdev, reg, *value); } +#endif /* 0 */ + int e1000_pcix_get_mmrbc(struct e1000_hw *hw) { - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: skb->priority on arp packets
Waskiewicz Jr, Peter P wrote: is it possible to set the skb->priority on arp packets generated by the kernel? I want to to set the 802.1p priority on arp and ip packets on an interface. On ip packets, this can be done by the iptables CLASSIFY target and the skb->priority mapping from the vlan implementation. any ideas? Currently unclassified ip packets have skb->priority set in ipsockglue, by translating the IP TOS field into a Linux priority. This is not the same priority as 802.1p, rather, it's a Linux-based OS priority classification for dequeuing priority in schedulers such as sch_prio. 802.1p lives in the VLAN tag, which is separate from skb->priority. I'd suggest reading http://lartc.org/howto/lartc.qdisc.classless.html to see what exactly skb->priority is used for in sch_prio and pfifo_fast, and how the ip layer determines a packet's priority. The skb->priority field is also used in other ways in other qdiscs I'm not completely familiar with, but it's not directly related to priority of the packet. I'd need to look closer at those qdiscs to be specific. That being said, the only way I can think of manipulating skb->priority on arp packets (and actually setting it) is modifying a qdisc to set skb->priority if it matches a filter created using the tc command. If you do manage to get the skb->priority set, then you can use vconfig to set up mapping between skb->priority and the VLAN priority field. Thanks, Ben Cheers, -PJ Waskiewicz - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: skb->priority on arp packets
> is it possible to set the skb->priority on arp packets > generated by the kernel? > I want to to set the 802.1p priority on arp and ip packets on > an interface. On ip packets, this can be done by the iptables > CLASSIFY target and the > skb->priority mapping from the vlan implementation. > > any ideas? Currently unclassified ip packets have skb->priority set in ipsockglue, by translating the IP TOS field into a Linux priority. This is not the same priority as 802.1p, rather, it's a Linux-based OS priority classification for dequeuing priority in schedulers such as sch_prio. 802.1p lives in the VLAN tag, which is separate from skb->priority. I'd suggest reading http://lartc.org/howto/lartc.qdisc.classless.html to see what exactly skb->priority is used for in sch_prio and pfifo_fast, and how the ip layer determines a packet's priority. The skb->priority field is also used in other ways in other qdiscs I'm not completely familiar with, but it's not directly related to priority of the packet. I'd need to look closer at those qdiscs to be specific. That being said, the only way I can think of manipulating skb->priority on arp packets (and actually setting it) is modifying a qdisc to set skb->priority if it matches a filter created using the tc command. Cheers, -PJ Waskiewicz - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] b44 driver improvements
On Mon, Jun 04, 2007 at 01:25:36PM -0700, Stephen Hemminger wrote: > While researching why the wired networking was so slow > on my laptop (not a driver problem); spotted these small > changes to b44 driver. Nothing urgent, maybe 2.6.23 > or later material. ACK to the lot John -- John W. Linville [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: r8169: hard freezes on TX
Rolf Eike Beer <[EMAIL PROTECTED]> : [...] > I just had another freeze using your patches. After 512kB over smb it was > dead. In-kernel smb/cifs ? -- Ueimor - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] iproute2: Added support for RR qdisc (sch_rr)
Add tc support for the sch_rr qdisc. This qdisc supports multiple queues on hardware. The syntax for sch_rr is the same as sch_prio. Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]> --- include/linux/pkt_sched.h | 11 tc/Makefile |1 tc/q_rr.c | 113 + 3 files changed, 125 insertions(+), 0 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index d10f353..907412b 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -22,6 +22,7 @@ #define TC_PRIO_CONTROL7 #define TC_PRIO_MAX15 +#define TC_RR_MAX 15 /* Generic queue statistics, available for all the elements. Particular schedulers may have also their private records. @@ -90,6 +91,16 @@ struct tc_fifo_qopt __u32 limit; /* Queue length: bytes for bfifo, packets for pfifo */ }; +/* RR section */ +#define TCQ_RR_BANDS 16 +#define TCQ_MIN_RR_BANDS 2 + +struct tc_rr_qopt +{ + int bands; + __u8priomap[TC_RR_MAX + 1]; +}; + /* PRIO section */ #define TCQ_PRIO_BANDS 16 diff --git a/tc/Makefile b/tc/Makefile index 9d618ff..62e2697 100644 --- a/tc/Makefile +++ b/tc/Makefile @@ -9,6 +9,7 @@ TCMODULES += q_fifo.o TCMODULES += q_sfq.o TCMODULES += q_red.o TCMODULES += q_prio.o +TCMODULES += q_rr.o TCMODULES += q_tbf.o TCMODULES += q_cbq.o TCMODULES += f_rsvp.o diff --git a/tc/q_rr.c b/tc/q_rr.c new file mode 100644 index 000..8eecac9 --- /dev/null +++ b/tc/q_rr.c @@ -0,0 +1,113 @@ +/* + * q_rr.c RR. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Authors:PJ Waskiewicz, <[EMAIL PROTECTED]> + * Original Authors: Alexey Kuznetsov, <[EMAIL PROTECTED]> (from PRIO) + * + * Changes: + * + * Ole Husgaard <[EMAIL PROTECTED]>: 990513: prio2band map was always reset. + * J Hadi Salim <[EMAIL PROTECTED]>: 990609: priomap fix. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "utils.h" +#include "tc_util.h" + +static void explain(void) +{ + fprintf(stderr, "Usage: ... rr bands NUMBER priomap P1 P2...\n"); +} + +#define usage() return(-1) + +static int rr_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n) +{ + int ok = 0; + int pmap_mode = 0; + int idx = 0; + struct tc_rr_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 }}; + + while (argc > 0) { + if (strcmp(*argv, "bands") == 0) { + if (pmap_mode) + explain(); + NEXT_ARG(); + if (get_integer(&opt.bands, *argv, 10)) { + fprintf(stderr, "Illegal \"bands\"\n"); + return -1; + } + ok++; + } else if (strcmp(*argv, "priomap") == 0) { + if (pmap_mode) { + fprintf(stderr, "Error: duplicate priomap\n"); + return -1; + } + pmap_mode = 1; + } else if (strcmp(*argv, "help") == 0) { + explain(); + return -1; + } else { + unsigned band; + if (!pmap_mode) { + fprintf(stderr, "What is \"%s\"?\n", *argv); + explain(); + return -1; + } + if (get_unsigned(&band, *argv, 10)) { + fprintf(stderr, "Illegal \"priomap\" element\n"); + return -1; + } + if (band > opt.bands) { + fprintf(stderr, "\"priomap\" element is out of bands\n"); + return -1; + } + if (idx > TC_RR_MAX) { + fprintf(stderr, "\"priomap\" index > TC_RR_MAX=%u\n", TC_RR_MAX); + return -1; + } + opt.priomap[idx++] = band; + } + argc--; argv++; + } + + addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt)); + return 0; +} + +int rr_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) +{ + int i; + struct tc_rr_qopt *qopt; + + if (opt == NULL) + return 0; + + if (RTA_PAYLOAD(opt) < sizeof(*qopt)) +
[RFC] iproute2: sch_rr support in tc
This patch is to support the new sch_rr (round-robin) qdisc being proposed in NET for multiqueue network device support in the Linux network stack. It uses q_prio.c as the template, since the qdiscs are nearly identical, outside of the ->dequeue() routine. I'm soliciting feedback for a 2.6.23 multiqueue submission. Thanks. -- PJ Waskiewicz <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] NET: Multiqueue network device support.
API added to support multiple hardware queues on an ethernet device. Round-robin scheduler added (sch_rr) to provide a no-scheduling policy qdisc for hardware with multiple queues. Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]> --- include/linux/etherdevice.h |3 include/linux/netdevice.h | 62 + include/linux/pkt_sched.h | 11 + include/linux/skbuff.h |2 net/core/dev.c | 27 ++ net/core/skbuff.c |3 net/ethernet/eth.c |9 - net/sched/Kconfig | 22 ++ net/sched/Makefile |1 net/sched/sch_generic.c |4 net/sched/sch_prio.c| 66 +- net/sched/sch_rr.c | 516 +++ 12 files changed, 706 insertions(+), 20 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index 071c67a..283e687 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -39,7 +39,8 @@ extern void eth_header_cache_update(struct hh_cache *hh, struct net_device *dev extern int eth_header_cache(struct neighbour *neigh, struct hh_cache *hh); -extern struct net_device *alloc_etherdev(int sizeof_priv); +extern struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count); +#define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1) static inline void eth_copy_and_sum (struct sk_buff *dest, const unsigned char *src, int len, int base) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f671cd2..376a0d2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -108,6 +108,14 @@ struct wireless_dev; #define MAX_HEADER (LL_MAX_HEADER + 48) #endif +struct net_device_subqueue +{ + /* Give a control state for each queue. This struct may contain +* per-queue locks in the future. +*/ + unsigned long state; +}; + /* * Network device statistics. Akin to the 2.0 ether stats but * with byte counters. @@ -325,6 +333,7 @@ struct net_device #define NETIF_F_VLAN_CHALLENGED1024/* Device cannot handle VLAN packets */ #define NETIF_F_GSO2048/* Enable software GSO. */ #define NETIF_F_LLTX 4096/* LockLess TX */ +#define NETIF_F_MULTI_QUEUE16384 /* Has multiple TX/RX queues */ /* Segmentation offload features */ #define NETIF_F_GSO_SHIFT 16 @@ -540,6 +549,10 @@ struct net_device struct device dev; /* space for optional statistics and wireless sysfs groups */ struct attribute_group *sysfs_groups[3]; + + /* The TX queue control structures */ + struct net_device_subqueue *egress_subqueue; + int egress_subqueue_count; }; #define to_net_dev(d) container_of(d, struct net_device, dev) @@ -702,6 +715,48 @@ static inline int netif_running(const struct net_device *dev) return test_bit(__LINK_STATE_START, &dev->state); } +/* + * Routines to manage the subqueues on a device. We only need start + * stop, and a check if it's stopped. All other device management is + * done at the overall netdevice level. + * Also test the device if we're multiqueue. + */ +static inline void netif_start_subqueue(struct net_device *dev, u16 queue_index) +{ + clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state); +} + +static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index) +{ +#ifdef CONFIG_NETPOLL_TRAP + if (netpoll_trap()) + return; +#endif + set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state); +} + +static inline int netif_subqueue_stopped(const struct net_device *dev, + u16 queue_index) +{ + return test_bit(__LINK_STATE_XOFF, + &dev->egress_subqueue[queue_index].state); +} + +static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index) +{ +#ifdef CONFIG_NETPOLL_TRAP + if (netpoll_trap()) + return; +#endif + if (test_and_clear_bit(__LINK_STATE_XOFF, + &dev->egress_subqueue[queue_index].state)) + __netif_schedule(dev); +} + +static inline int netif_is_multiqueue(const struct net_device *dev) +{ + return (!!(NETIF_F_MULTI_QUEUE & dev->features)); +} /* Use this variant when it is known for sure that it * is executing from interrupt context. @@ -995,8 +1050,11 @@ static inline void netif_tx_disable(struct net_device *dev) extern voidether_setup(struct net_device *dev); /* Support for loadable net-drivers */ -extern struct net_device *alloc_netdev(int sizeof_priv, const char *name, - void (*setup)(struct net_device *)); +extern struct net_device *al
[RFC] NET: Multiple queue hardware support
This patchset is an updated version of previous multiqueue network device support patches. The general approach of introducing a new API for multiqueue network devices to register with the stack has remained. The changes include adding a round-robin qdisc, heavily based on sch_prio, which will allow queueing to hardware with no OS-enforced queuing policy. sch_prio still has the multiqueue code in it, but has a Kconfig option to compile it out of the qdisc. This allows people with hardware containing scheduling policies to use sch_rr (round-robin), and others without scheduling policies in hardware to continue using sch_prio if they wish to have some notion of scheduling priority. The patches to iproute2 for tc will be sent separately, to support sch_rr. I'm soliciting feedback for a 2.6.23 submission. Thanks. -- PJ Waskiewicz <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
skb->priority on arp packets
hi @all, is it possible to set the skb->priority on arp packets generated by the kernel? I want to to set the 802.1p priority on arp and ip packets on an interface. On ip packets, this can be done by the iptables CLASSIFY target and the skb->priority mapping from the vlan implementation. any ideas? thx Marc - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add suspend and resume support to uli526x
Hi, On Monday, 4 June 2007 23:16, Nigel Cunningham wrote: > Hi. > > On Mon, 2007-06-04 at 15:49 +0200, Rafael J. Wysocki wrote: > > On Monday, 4 June 2007 13:11, Pavel Machek wrote: > > > > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > > > > > > > Add suspend/resume support to the uli526x network driver (tested on > > > > x86_64, > > > > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, > > > > rev 40'). > > > > > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> > > > > > > Looks ok to me. > > > > > > > +#ifdef CONFIG_PM > > > > + > > > > +/* > > > > + * Suspend the interface. > > > > + */ > > > > + > > > > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state) > > > > +{ > > > > + struct net_device *dev = pci_get_drvdata(pdev); > > > > + > > > > + ULI526X_DBUG(0, "uli526x_suspend", 0); > > > > + > > > > + if (dev && netdev_priv(dev)) { > > > > + if (netif_running(dev)) { > > > > + netif_device_detach(dev); > > > > + uli526x_down(dev); > > > > + } > > > > + pci_save_state(pdev); > > > > + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0); > > > > + pci_disable_device(pdev); > > > > + pci_set_power_state(pdev, pci_choose_state(pdev, > > > > state)); > > > > + } > > > > + return 0; > > > > +} > > > > + > > > > +/* > > > > + * Resume the interface. > > > > + */ > > > > + > > > > +static int uli526x_resume(struct pci_dev *pdev) > > > > +{ > > > > + struct net_device *dev = pci_get_drvdata(pdev); > > > > + struct uli526x_board_info *db = netdev_priv(dev); > > > > + int err; > > > > + > > > > + ULI526X_DBUG(0, "uli526x_resume", 0); > > > > + > > > > + if (dev && db) { > > > > + pci_set_power_state(pdev, PCI_D0); > > > > + err = pci_enable_device(pdev); > > > > + if (err) { > > > > + printk(KERN_WARNING "%s: Could not enable > > > > device \n", > > > > + dev->name); > > > > + return err; > > > > + } > > > > + pci_restore_state(pdev); > > > > + pci_set_master(pdev); > > > > + if (netif_running(dev)) { > > > > + uli526x_up(dev); > > > > + netif_device_attach(dev); > > > > + } > > > > + } > > > > + return 0; > > > > +} > > > > + > > > #else > > > #define *_resume NULL > > > #define *_suspend NULL > > > > > > +#endif /* CONFIG_PM */ > > > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > > > > .id_table = uli526x_pci_tbl, > > > > .probe = uli526x_init_one, > > > > .remove = __devexit_p(uli526x_remove_one), > > > > +#ifdef CONFIG_PM > > > > + .suspend= uli526x_suspend, > > > > + .resume = uli526x_resume, > > > > +#endif > > > > > > ...so that this ifdef is not needed? > > > > OK, why not. > > Because it's uglier and #ifdef is the established way of doing things? Hmm, I have no strong opinion. I slightly prefer the #ifdef, but well. Perhaps I'll send it to Andrew 'as is' if no one else has any more comments. ;-) Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add suspend and resume support to uli526x
Hi. On Mon, 2007-06-04 at 15:49 +0200, Rafael J. Wysocki wrote: > On Monday, 4 June 2007 13:11, Pavel Machek wrote: > > > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > > > > > Add suspend/resume support to the uli526x network driver (tested on > > > x86_64, > > > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev > > > 40'). > > > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> > > > > Looks ok to me. > > > > > +#ifdef CONFIG_PM > > > + > > > +/* > > > + * Suspend the interface. > > > + */ > > > + > > > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state) > > > +{ > > > + struct net_device *dev = pci_get_drvdata(pdev); > > > + > > > + ULI526X_DBUG(0, "uli526x_suspend", 0); > > > + > > > + if (dev && netdev_priv(dev)) { > > > + if (netif_running(dev)) { > > > + netif_device_detach(dev); > > > + uli526x_down(dev); > > > + } > > > + pci_save_state(pdev); > > > + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0); > > > + pci_disable_device(pdev); > > > + pci_set_power_state(pdev, pci_choose_state(pdev, state)); > > > + } > > > + return 0; > > > +} > > > + > > > +/* > > > + * Resume the interface. > > > + */ > > > + > > > +static int uli526x_resume(struct pci_dev *pdev) > > > +{ > > > + struct net_device *dev = pci_get_drvdata(pdev); > > > + struct uli526x_board_info *db = netdev_priv(dev); > > > + int err; > > > + > > > + ULI526X_DBUG(0, "uli526x_resume", 0); > > > + > > > + if (dev && db) { > > > + pci_set_power_state(pdev, PCI_D0); > > > + err = pci_enable_device(pdev); > > > + if (err) { > > > + printk(KERN_WARNING "%s: Could not enable device \n", > > > + dev->name); > > > + return err; > > > + } > > > + pci_restore_state(pdev); > > > + pci_set_master(pdev); > > > + if (netif_running(dev)) { > > > + uli526x_up(dev); > > > + netif_device_attach(dev); > > > + } > > > + } > > > + return 0; > > > +} > > > + > > #else > > #define *_resume NULL > > #define *_suspend NULL > > > > +#endif /* CONFIG_PM */ > > > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > > > .id_table = uli526x_pci_tbl, > > > .probe = uli526x_init_one, > > > .remove = __devexit_p(uli526x_remove_one), > > > +#ifdef CONFIG_PM > > > + .suspend= uli526x_suspend, > > > + .resume = uli526x_resume, > > > +#endif > > > > ...so that this ifdef is not needed? > > OK, why not. Because it's uglier and #ifdef is the established way of doing things? Regards, Nigel signature.asc Description: This is a digitally signed message part
Re: [PATCH 0/4] b44 driver improvements
On Mon, Jun 04, 2007 at 01:25:36PM -0700, Stephen Hemminger wrote: > While researching why the wired networking was so slow > on my laptop (not a driver problem); spotted these small > changes to b44 driver. Nothing urgent, maybe 2.6.23 > or later material. Seems sane to me. I'm travelling and will merge this at the beginning of next week. I agree netdev#upstream is appropriate, though the b44_resume() fix may want upstreaming for 2.6.22. Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: r8169 in 2.6.18 silently corrupting data, 2.6.22-rc3 link not detected at all
Francois Romieu <[EMAIL PROTECTED]> : [...] Andrew, have you had any time to do some testing with an uniform 7200 bytes MTU through your LAN ? You will find a backport of the r8169 driver for the 2.6.18 kernel at http://www.fr.zoreil.com/people/francois/backport/r8169/20070604-00 Could you do a binary search of the patch which breaks the link detection ? >From a different report that I have received, the current testing branch of the 8169 driver would improve things for MTU beyond 1500 bytes: nailing down your broken link regression would be a welcome improvement. If you are confortable with git, you can do a git bisect after pulling the branch 'r8169-backport-test-20070604' at: git://electric-eye.fr.zoreil.com/home/romieu/linux/linux-2.6-out (please do the initial clone from elsewhere if you use git, thanks) -- Ueimor - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: iperf: performance regression (was b44 driver problem?)
On Monday 04 June 2007, Stephen Hemminger wrote: > On Mon, 4 Jun 2007 21:47:59 +0200 > > Maximilian Engelhardt <[EMAIL PROTECTED]> wrote: > > On Monday 04 June 2007, Ingo Molnar wrote: > > > * Stephen Hemminger <[EMAIL PROTECTED]> wrote: > > > > Yes, the following patch makes iperf work better than ever. But are > > > > other broken applications going to have same problem. Sounds like the > > > > old "who runs first" fork() problems. > > > > > > this is the first such app and really, and even for this app: i've been > > > frequently running iperf on -rt kernels for _years_ and never noticed > > > how buggy its 'locking' code was, and that it would under some > > > circumstances use up the whole CPU on high-res timers. > > > > I must admit I don't know much about that topic, but there is one thing I > > don't understand. Why is iperf (even if it's buggy) able to use up the > > whole cpu? I didn't run it as root but as my normal user so it should > > have limited rights. Shouldn't the linux scheduler distribute cpu time > > among all running processes? > > In this case, there are two threads. One is receiving data and the other > is spinning checking on progress. If the spinning thread doesn't yield, > it will end up using it's whole quantum (10ms at 100hz), before the > scheduler lets the receiver run again. If the receiving thread doesn't > get to run then on a UP the performance stinks. > Ok, let's see if I got this right: If there are other processes that want cpu time they will get it after the quantum for the iperf thread is used up. So cpu time will be distributed among other processes, but it takes some time until they get it and this increases latency. > The problem only showed up laptop because most of my other systems are > SMP (or fake SMP/HT), and usually set HZ to 1000 not 100. Hm, on my laptop (Pentium M) I have configured CONFIG_HZ_300 and CONFIG_NO_HZ. On my desktop PC (Athlon 2000+, also UP) I also have CONFIG_HZ_300 and CONFIG_NO_HZ but didn't notice the problem. Maxi signature.asc Description: This is a digitally signed message part.
[PATCH 2/4] b44: tx bounce sizing.
No need to grap full size MTU buffer for possibly small transmit bounce buffers. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/b44.c |7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) --- a/drivers/net/b44.c 2007-06-04 12:46:54.0 -0700 +++ b/drivers/net/b44.c 2007-06-04 12:48:21.0 -0700 @@ -69,7 +69,6 @@ #define NEXT_TX(N) (((N) + 1) & (B44_TX_RING_SIZE - 1)) #define RX_PKT_BUF_SZ (1536 + bp->rx_offset + 64) -#define TX_PKT_BUF_SZ (B44_MAX_MTU + ETH_HLEN + 8) /* minimum number of free TX descriptors required to wake up TX process */ #define B44_TX_WAKEUP_THRESH (B44_TX_RING_SIZE / 4) @@ -968,7 +967,6 @@ static void b44_tx_timeout(struct net_de static int b44_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct b44 *bp = netdev_priv(dev); - struct sk_buff *bounce_skb; int rc = NETDEV_TX_OK; dma_addr_t mapping; u32 len, entry, ctrl; @@ -986,12 +984,13 @@ static int b44_start_xmit(struct sk_buff mapping = pci_map_single(bp->pdev, skb->data, len, PCI_DMA_TODEVICE); if (dma_mapping_error(mapping) || mapping + len > DMA_30BIT_MASK) { + struct sk_buff *bounce_skb; + /* Chip can't handle DMA to/from >1GB, use bounce buffer */ if (!dma_mapping_error(mapping)) pci_unmap_single(bp->pdev, mapping, len, PCI_DMA_TODEVICE); - bounce_skb = __dev_alloc_skb(TX_PKT_BUF_SZ, -GFP_ATOMIC|GFP_DMA); + bounce_skb = __dev_alloc_skb(len, GFP_ATOMIC | GFP_DMA); if (!bounce_skb) goto err_out; @@ -1000,13 +999,12 @@ static int b44_start_xmit(struct sk_buff if (dma_mapping_error(mapping) || mapping + len > DMA_30BIT_MASK) { if (!dma_mapping_error(mapping)) pci_unmap_single(bp->pdev, mapping, -len, PCI_DMA_TODEVICE); +len, PCI_DMA_TODEVICE); dev_kfree_skb_any(bounce_skb); goto err_out; } - skb_copy_from_linear_data(skb, skb_put(bounce_skb, len), - skb->len); + skb_copy_from_linear_data(skb, skb_put(bounce_skb, len), len); dev_kfree_skb_any(skb); skb = bounce_skb; } -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] b44: use netdev_alloc_skb
Use netdev_alloc_skb rather than dev_alloc_skb when allocating receive buffers. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/b44.c |5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) --- a/drivers/net/b44.c 2007-06-04 12:49:11.0 -0700 +++ b/drivers/net/b44.c 2007-06-04 12:49:26.0 -0700 @@ -653,7 +653,7 @@ static int b44_alloc_rx_skb(struct b44 * src_map = &bp->rx_buffers[src_idx]; dest_idx = dest_idx_unmasked & (B44_RX_RING_SIZE - 1); map = &bp->rx_buffers[dest_idx]; - skb = dev_alloc_skb(RX_PKT_BUF_SZ); + skb = netdev_alloc_skb(bp->dev, RX_PKT_BUF_SZ); if (skb == NULL) return -ENOMEM; @@ -669,7 +669,7 @@ static int b44_alloc_rx_skb(struct b44 * if (!dma_mapping_error(mapping)) pci_unmap_single(bp->pdev, mapping, RX_PKT_BUF_SZ,PCI_DMA_FROMDEVICE); dev_kfree_skb_any(skb); - skb = __dev_alloc_skb(RX_PKT_BUF_SZ,GFP_DMA); + skb = __netdev_alloc_skb(bp->dev, RX_PKT_BUF_SZ, GFP_ATOMIC|GFP_DMA); if (skb == NULL) return -ENOMEM; mapping = pci_map_single(bp->pdev, skb->data, @@ -684,7 +684,6 @@ static int b44_alloc_rx_skb(struct b44 * } } - skb->dev = bp->dev; rh = (struct rx_header *) skb->data; skb_reserve(skb, RX_PKT_OFFSET); -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] b44: packet offset is constant
The receive buffer offset is constant in this driver. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/b44.c | 29 - drivers/net/b44.h |2 -- 2 files changed, 12 insertions(+), 19 deletions(-) Index: lifebook/drivers/net/b44.c === --- lifebook.orig/drivers/net/b44.c 2007-06-04 13:22:12.0 -0700 +++ lifebook/drivers/net/b44.c 2007-06-04 13:27:47.0 -0700 @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -68,7 +69,8 @@ (BP)->tx_cons - (BP)->tx_prod - TX_RING_GAP(BP)) #define NEXT_TX(N) (((N) + 1) & (B44_TX_RING_SIZE - 1)) -#define RX_PKT_BUF_SZ (1536 + bp->rx_offset + 64) +#define RX_PKT_OFFSET 30 +#define RX_PKT_BUF_SZ (1536 + RX_PKT_OFFSET + 64) /* minimum number of free TX descriptors required to wake up TX process */ #define B44_TX_WAKEUP_THRESH (B44_TX_RING_SIZE / 4) @@ -683,10 +685,9 @@ } skb->dev = bp->dev; - skb_reserve(skb, bp->rx_offset); + rh = (struct rx_header *) skb->data; + skb_reserve(skb, RX_PKT_OFFSET); - rh = (struct rx_header *) - (skb->data - bp->rx_offset); rh->len = 0; rh->flags = 0; @@ -696,13 +697,13 @@ if (src_map != NULL) src_map->skb = NULL; - ctrl = (DESC_CTRL_LEN & (RX_PKT_BUF_SZ - bp->rx_offset)); + ctrl = (DESC_CTRL_LEN & (RX_PKT_BUF_SZ - RX_PKT_OFFSET)); if (dest_idx == (B44_RX_RING_SIZE - 1)) ctrl |= DESC_CTRL_EOT; dp = &bp->rx_ring[dest_idx]; dp->ctrl = cpu_to_le32(ctrl); - dp->addr = cpu_to_le32((u32) mapping + bp->rx_offset + bp->dma_offset); + dp->addr = cpu_to_le32((u32) mapping + RX_PKT_OFFSET + bp->dma_offset); if (bp->flags & B44_FLAG_RX_RING_HACK) b44_sync_dma_desc_for_device(bp->pdev, bp->rx_ring_dma, @@ -781,7 +782,7 @@ PCI_DMA_FROMDEVICE); rh = (struct rx_header *) skb->data; len = le16_to_cpu(rh->len); - if ((len > (RX_PKT_BUF_SZ - bp->rx_offset)) || + if ((len > (RX_PKT_BUF_SZ - RX_PKT_OFFSET)) || (rh->flags & cpu_to_le16(RX_FLAG_ERRORS))) { drop_it: b44_recycle_rx(bp, cons, bp->rx_prod); @@ -813,8 +814,8 @@ pci_unmap_single(bp->pdev, map, skb_size, PCI_DMA_FROMDEVICE); /* Leave out rx_header */ - skb_put(skb, len+bp->rx_offset); - skb_pull(skb,bp->rx_offset); + skb_put(skb, len + RX_PKT_OFFSET); + skb_pull(skb, RX_PKT_OFFSET); } else { struct sk_buff *copy_skb; @@ -826,7 +827,7 @@ skb_reserve(copy_skb, 2); skb_put(copy_skb, len); /* DMA sync done above, copy just the actual packet */ - skb_copy_from_linear_data_offset(skb, bp->rx_offset, + skb_copy_from_linear_data_offset(skb, RX_PKT_OFFSET, copy_skb->data, len); skb = copy_skb; } @@ -1393,12 +1394,12 @@ bw32(bp, B44_TX_WMARK, 56); /* XXX magic */ if (reset_kind == B44_PARTIAL_RESET) { bw32(bp, B44_DMARX_CTRL, (DMARX_CTRL_ENABLE | - (bp->rx_offset << DMARX_CTRL_ROSHIFT))); + (RX_PKT_OFFSET << DMARX_CTRL_ROSHIFT))); } else { bw32(bp, B44_DMATX_CTRL, DMATX_CTRL_ENABLE); bw32(bp, B44_DMATX_ADDR, bp->tx_ring_dma + bp->dma_offset); bw32(bp, B44_DMARX_CTRL, (DMARX_CTRL_ENABLE | - (bp->rx_offset << DMARX_CTRL_ROSHIFT))); + (RX_PKT_OFFSET << DMARX_CTRL_ROSHIFT))); bw32(bp, B44_DMARX_ADDR, bp->rx_ring_dma + bp->dma_offset); bw32(bp, B44_DMARX_PTR, bp->rx_pending); @@ -2090,11 +2091,6 @@ bp->phy_addr = eeprom[90] & 0x1f; - /* With this, plus the rx_header prepended to the data by the -* hardware, we'll land the ethernet header on a 2-byte boundary. -*/ - bp->rx_offset = 30; - bp->imask = IMASK_DEF; bp->core_unit = ssb_core_unit(bp); Index: lifebook/drivers/net/b44.h === --- lifebook.orig/drivers/net/b44.h 2007-06-04 13:18:25.0 -0700 +++ lifebook/drivers/net/b44.h 2007-06-04 13:22:15.0 -0700 @@ -443,8 +443,6 @@ #define B44_FLAG_TX_RING_HACK 0x4000 #define B44_FLAG_WOL_ENABLE0x80
[PATCH 0/4] b44 driver improvements
While researching why the wired networking was so slow on my laptop (not a driver problem); spotted these small changes to b44 driver. Nothing urgent, maybe 2.6.23 or later material. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] b44: timer power saving
Make the PHY and statistic timer run on one second boundary for powersaving. On resume, the driver should check for link up immediately, to get online faster (rather than waiting for the next second). Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/net/b44.c |9 - 1 file changed, 4 insertions(+), 5 deletions(-) --- a/drivers/net/b44.c 2007-06-04 12:31:27.0 -0700 +++ b/drivers/net/b44.c 2007-06-04 12:31:34.0 -0700 @@ -599,8 +599,7 @@ static void b44_timer(unsigned long __op spin_unlock_irq(&bp->lock); - bp->timer.expires = jiffies + HZ; - add_timer(&bp->timer); + mod_timer(&bp->timer, round_jiffies(jiffies + HZ)); } static void b44_tx(struct b44 *bp) @@ -2348,11 +2347,11 @@ static int b44_resume(struct pci_dev *pd netif_device_attach(bp->dev); spin_unlock_irq(&bp->lock); - bp->timer.expires = jiffies + HZ; - add_timer(&bp->timer); - b44_enable_ints(bp); netif_wake_queue(dev); + + mod_timer(&bp->timer, jiffies + 1); + return 0; } -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: iperf: performance regression (was b44 driver problem?)
On Mon, 4 Jun 2007 21:47:59 +0200 Maximilian Engelhardt <[EMAIL PROTECTED]> wrote: > On Monday 04 June 2007, Ingo Molnar wrote: > > * Stephen Hemminger <[EMAIL PROTECTED]> wrote: > > > Yes, the following patch makes iperf work better than ever. But are > > > other broken applications going to have same problem. Sounds like the > > > old "who runs first" fork() problems. > > > > this is the first such app and really, and even for this app: i've been > > frequently running iperf on -rt kernels for _years_ and never noticed > > how buggy its 'locking' code was, and that it would under some > > circumstances use up the whole CPU on high-res timers. > > I must admit I don't know much about that topic, but there is one thing I > don't understand. Why is iperf (even if it's buggy) able to use up the whole > cpu? I didn't run it as root but as my normal user so it should have limited > rights. Shouldn't the linux scheduler distribute cpu time among all running > processes? In this case, there are two threads. One is receiving data and the other is spinning checking on progress. If the spinning thread doesn't yield, it will end up using it's whole quantum (10ms at 100hz), before the scheduler lets the receiver run again. If the receiving thread doesn't get to run then on a UP the performance stinks. The problem only showed up laptop because most of my other systems are SMP (or fake SMP/HT), and usually set HZ to 1000 not 100. It kind of reminds me of the family road trip with the kids in the back seat going "Are we there yet?" -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: iperf: performance regression (was b44 driver problem?)
On Monday 04 June 2007, Ingo Molnar wrote: > * Stephen Hemminger <[EMAIL PROTECTED]> wrote: > > Yes, the following patch makes iperf work better than ever. But are > > other broken applications going to have same problem. Sounds like the > > old "who runs first" fork() problems. > > this is the first such app and really, and even for this app: i've been > frequently running iperf on -rt kernels for _years_ and never noticed > how buggy its 'locking' code was, and that it would under some > circumstances use up the whole CPU on high-res timers. I must admit I don't know much about that topic, but there is one thing I don't understand. Why is iperf (even if it's buggy) able to use up the whole cpu? I didn't run it as root but as my normal user so it should have limited rights. Shouldn't the linux scheduler distribute cpu time among all running processes? Maxi signature.asc Description: This is a digitally signed message part.
Re: [PATCH]: Add security check before flushing SAD/SPD
On Mon, 4 Jun 2007, Eric Paris wrote: > Some time ago this thread bounced back and forth and seemed to come to > rest with the patch below, I cleaned up the comments and put all the > ACKs it received in one place below, but so much time has passed I doubt > if they should still count for free. I also rediffed the patch against > the latest miller tree. Is the idea or patch in any way flawed or > unacceptable to people at the moment? > > Anyone willing to step up an re-ack the patch to get it moving into the > tree? Looks good to me. Acked-by: James Morris <[EMAIL PROTECTED]> -- James Morris <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: iperf: performance regression (was b44 driver problem?)
* Stephen Hemminger <[EMAIL PROTECTED]> wrote: > Yes, the following patch makes iperf work better than ever. But are > other broken applications going to have same problem. Sounds like the > old "who runs first" fork() problems. this is the first such app and really, and even for this app: i've been frequently running iperf on -rt kernels for _years_ and never noticed how buggy its 'locking' code was, and that it would under some circumstances use up the whole CPU on high-res timers. If this were a widespread problem then the right solution would be to preserve the old behavior. The child-runs-first thing was an unspecified detail and many apps grew to depend on how the kernel did it - and when we changed it all hell broke lose. Even today, when i switch off child-runs-first, Gnome would not start up because some of its startup threads are racy and hang :-/ I googled for usleep(0) quickly (on google.com/codesearch and on google.com) and it didnt seem to suggest that the problem is widespread. (3 hits on the code-search site, all were false positives.) so ... if this situation were even just a little bit more serious i'd argue for working this around and implementing some API quirk. But right now i'm leaning towards just saying that the iperf fix exists and fixes the problem, and that we already have kernels out with the new behavior. Maybe we should add a once-per-boot printk to flesh out any other apps? I'd be surprised if there was more than iperf. Ingo - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: iperf: performance regression (was b44 driver problem?)
On Mon, 2007-06-04 at 21:00 +0200, Thomas Gleixner wrote: > > > > Yes, the following patch makes iperf work better than ever. > > But are other broken applications going to have same problem. > > Sounds like the old "who runs first" fork() problems. > > Not really. The fork() "who runs first" problem is nowhere specified. > > usleep(0) is well defined: > > If the value of useconds is 0, then the call has no effect. > > So the call into the kernel has been wrong for quite a time. > Just for clarification: I'm not saying that we should break the (broken) user space ABI. I'm going to work out a patch which prints out a warning (limited number per boot) and emulating the old behavior by a call to yield() along with an entry into (mis)feature-removal.txt. tglx - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: Add security check before flushing SAD/SPD
Some time ago this thread bounced back and forth and seemed to come to rest with the patch below, I cleaned up the comments and put all the ACKs it received in one place below, but so much time has passed I doubt if they should still count for free. I also rediffed the patch against the latest miller tree. Is the idea or patch in any way flawed or unacceptable to people at the moment? Anyone willing to step up an re-ack the patch to get it moving into the tree? -Eric Currently we check for permission before deleting entries from SAD and SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete()) However we are not checking for authorization when flushing the SPD and the SAD completely. It was perhaps missed in the original security hooks patch. This patch adds a security check when flushing entries from the SAD and SPD. It runs the entire database and checks each entry for a denial. If the process attempting the flush is unable to remove all of the entries a denial is logged the the flush function returns an error without removing anything. This is particularly useful when a process may need to create or delete its own xfrm entries used for things like labeled networking but that same process should not be able to delete other entries or flush the entire database. WAS Signed-off-by: Signed-off-by: Joy Latten<[EMAIL PROTECTED]> NOT NOW WAS Acked-by: James Morris <[EMAIL PROTECTED]> NOT NOW WAS Acked-by: Eric Paris <[EMAIL PROTECTED]> NOT NOW --- include/net/xfrm.h |6 ++-- net/key/af_key.c | 10 ++- net/xfrm/xfrm_policy.c | 63 +-- net/xfrm/xfrm_state.c | 46 -- net/xfrm/xfrm_user.c |9 +- 5 files changed, 121 insertions(+), 13 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 90185e8..311f25a 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -964,7 +964,7 @@ struct xfrmk_spdinfo { extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq); extern int xfrm_state_delete(struct xfrm_state *x); -extern void xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info); +extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info); extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si); extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si); extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq); @@ -1020,13 +1020,13 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(u8 type, int dir, struct xfrm_sec_ctx *ctx, int delete, int *err); struct xfrm_policy *xfrm_policy_byid(u8, int dir, u32 id, int delete, int *err); -void xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info); +int xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info); u32 xfrm_get_acqseq(void); void xfrm_alloc_spi(struct xfrm_state *x, __be32 minspi, __be32 maxspi); struct xfrm_state * xfrm_find_acq(u8 mode, u32 reqid, u8 proto, xfrm_address_t *daddr, xfrm_address_t *saddr, int create, unsigned short family); -extern void xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info); +extern int xfrm_policy_flush(u8 type, struct xfrm_audit *audit_info); extern int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol); extern int xfrm_bundle_ok(struct xfrm_policy *pol, struct xfrm_dst *xdst, struct flowi *fl, int family, int strict); diff --git a/net/key/af_key.c b/net/key/af_key.c index d302dda..0f8304b 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1682,6 +1682,7 @@ static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hd unsigned proto; struct km_event c; struct xfrm_audit audit_info; + int err; proto = pfkey_satype2proto(hdr->sadb_msg_satype); if (proto == 0) @@ -1689,7 +1690,9 @@ static int pfkey_flush(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hd audit_info.loginuid = audit_get_loginuid(current->audit_context); audit_info.secid = 0; - xfrm_state_flush(proto, &audit_info); + err = xfrm_state_flush(proto, &audit_info); + if (err) + return err; c.data.proto = proto; c.seq = hdr->sadb_msg_seq; c.pid = hdr->sadb_msg_pid; @@ -2683,10 +2686,13 @@ static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, struct sadb_msg { struct km_event c; struct xfrm_audit audit_info; + int err; audit_info.loginuid = audit_get_loginuid(current->audit_context); audit_info.secid = 0; - xfrm_policy_flush(XFRM_POLICY_TYPE_MAIN, &audit_info); + err = xfrm_policy_flush(XFRM_POLICY_TYPE_MAIN, &audit_info); + if (err) + return err; c.data.type = XFRM_POLICY_TYPE_MAIN; c.event = XFRM_MSG_FLUSHPOLICY; c.pid = hdr->sadb_msg_pid; d
Re: iperf: performance regression (was b44 driver problem?)
On Mon, 2007-06-04 at 10:51 -0700, Stephen Hemminger wrote: > > I doubt that. This is in the iperf code itself. > > > > void thread_rest ( void ) { > > #if defined( HAVE_THREAD ) > > #if defined( HAVE_POSIX_THREAD ) > > // TODO add checks for sched_yield or pthread_yield and call that > > // if available > > usleep( 0 ); > > > > -- > > > > It results in a nanosleep({0,0}, NULL) > > > > tglx > > > > Yes, the following patch makes iperf work better than ever. > But are other broken applications going to have same problem. > Sounds like the old "who runs first" fork() problems. Not really. The fork() "who runs first" problem is nowhere specified. usleep(0) is well defined: If the value of useconds is 0, then the call has no effect. So the call into the kernel has been wrong for quite a time. tglx - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros
On Sun, Jun 03, 2007 at 11:50:29AM -0400, Jeff Garzik wrote: > Mithlesh Thukral wrote: > >NetXen: Add NETXEN prefix to a macro > >This patch will add the "NETXEN" prefix to "USER_START" macro. > > > >Signed-off by: Wen Xiong <[EMAIL PROTECTED]> > >Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> > >--- > > > > drivers/net/netxen/netxen_nic.h |4 ++-- > > drivers/net/netxen/netxen_nic_ethtool.c |2 +- > > drivers/net/netxen/netxen_nic_hw.c |4 ++-- > > drivers/net/netxen/netxen_nic_init.c|4 ++-- > > 4 files changed, 7 insertions(+), 7 deletions(-) > > Your patch description is useless. Clearly we know -what- it does, > simply by reading the patch. > > But it does not answer the simple question: why? why is this needed in > a bug fix Release Candidate series? > I can't see why this is needed in an RC branch, but the use of generically named macros/variables has been problematic with this driver on more 'obscure' arches. I submitted a bigger cleanup patch a in March that never got taken so this is still a problem. Here is that patch: Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]> --- netxen_nic.h | 47 --- netxen_nic_ethtool.c |8 netxen_nic_hw.c | 10 +- netxen_nic_init.c| 23 --- 4 files changed, 45 insertions(+), 43 deletions(-) diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h index dd8ce35..d5f0c06 100644 --- a/drivers/net/netxen/netxen_nic.h +++ b/drivers/net/netxen/netxen_nic.h @@ -68,9 +68,10 @@ #define _NETXEN_NIC_LINUX_SUBVERSION 3 #define NETXEN_NIC_LINUX_VERSIONID "3.3.3" -#define NUM_FLASH_SECTORS (64) -#define FLASH_SECTOR_SIZE (64 * 1024) -#define FLASH_TOTAL_SIZE (NUM_FLASH_SECTORS * FLASH_SECTOR_SIZE) +#define NETXEN_NUM_FLASH_SECTORS (64) +#define NETXEN_FLASH_SECTOR_SIZE (64 * 1024) +#define NETXEN_FLASH_TOTAL_SIZE (NETXEN_NUM_FLASH_SECTORS \ + * NETXEN_FLASH_SECTOR_SIZE) #define PHAN_VENDOR_ID 0x4040 @@ -671,28 +672,28 @@ struct netxen_new_user_info { /* Flash memory map */ typedef enum { - CRBINIT_START = 0, /* Crbinit section */ - BRDCFG_START = 0x4000, /* board config */ - INITCODE_START = 0x6000,/* pegtune code */ - BOOTLD_START = 0x1, /* bootld */ - IMAGE_START = 0x43000, /* compressed image */ - SECONDARY_START = 0x20, /* backup images */ - PXE_START = 0x3E, /* user defined region */ - USER_START = 0x3E8000, /* User defined region for new boards */ - FIXED_START = 0x3F /* backup of crbinit */ + NETXEN_CRBINIT_START = 0, /* Crbinit section */ + NETXEN_BRDCFG_START = 0x4000, /* board config */ + NETXEN_INITCODE_START = 0x6000, /* pegtune code */ + NETXEN_BOOTLD_START = 0x1, /* bootld */ + NETXEN_IMAGE_START = 0x43000, /* compressed image */ + NETXEN_SECONDARY_START = 0x20, /* backup images */ + NETXEN_PXE_START = 0x3E,/* user defined region */ + NETXEN_USER_START = 0x3E8000, /* User defined region for new boards */ + NETXEN_FIXED_START = 0x3F /* backup of crbinit */ } netxen_flash_map_t; -#define USER_START_OLD PXE_START /* for backward compatibility */ - -#define FLASH_START(CRBINIT_START) -#define INIT_SECTOR(0) -#define PRIMARY_START (BOOTLD_START) -#define FLASH_CRBINIT_SIZE (0x4000) -#define FLASH_BRDCFG_SIZE (sizeof(struct netxen_board_info)) -#define FLASH_USER_SIZE(sizeof(struct netxen_user_info)/sizeof(u32)) -#define FLASH_SECONDARY_SIZE (USER_START-SECONDARY_START) -#define NUM_PRIMARY_SECTORS(0x20) -#define NUM_CONFIG_SECTORS (1) +#define NETXEN_USER_START_OLD NETXEN_PXE_START /* for backward compatibility */ + +#define NETXEN_FLASH_START (NETXEN_CRBINIT_START) +#define NETXEN_INIT_SECTOR (0) +#define NETXEN_PRIMARY_START (NETXEN_BOOTLD_START) +#define NETXEN_FLASH_CRBINIT_SIZE (0x4000) +#define NETXEN_FLASH_BRDCFG_SIZE (sizeof(struct netxen_board_info)) +#define NETXEN_FLASH_USER_SIZE (sizeof(struct netxen_user_info)/sizeof(u32)) +#define NETXEN_FLASH_SECONDARY_SIZE (NETXEN_USER_START-NETXEN_SECONDARY_START) +#define NETXEN_NUM_PRIMARY_SECTORS (0x20) +#define NETXEN_NUM_CONFIG_SECTORS (1) #define PFX "NetXen: " extern char netxen_nic_driver_name[]; diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c index ee1b5a2..4dfa76b 100644 --- a/drivers/net/netxen/netxen_nic_ethtool.c +++ b/drivers/net/netxen/netxen_nic_ethtool.c @@ -94,7 +94,7 @@ static const char netxen_nic_gstrings_test[][ETH_GSTRING_LEN] = { static int netxen_nic_get_eeprom_len(struct net_device *dev) { - return FLASH_TOTAL_SIZE; + return NETXEN_FLASH_TOTAL_SIZE; } static void @@
2.6.22-rc3-mm1 - pppd hanging in netdev_run_todo while holding mutex
On Wed, 30 May 2007 23:58:23 PDT, Andrew Morton said: > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc3/2.6.22-rc3-mm1/ Under 22-rc2-mm1, if my VPN connection got reset, ppp0 just quietly went away. Under 22-rc3-mm1, it seems to end up wedged and waiting for references to go away: Jun 4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8 Jun 4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8 Jun 4 09:23:21 turing-police kernel: [90109.520574] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8 Jun 4 09:23:32 turing-police kernel: [90119.653129] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8 'echo t > /proc/sysrq_trigger' shows pppd hung up here: Jun 4 10:52:57 turing-police kernel: [95478.047892] pppd D 000105ad3830 4968 3815 1 (NOTLB) Jun 4 10:52:57 turing-police kernel: [95478.047902] 810008d5fd78 0086 81000349 Jun 4 10:52:57 turing-police kernel: [95478.047911] 810008d5fd28 810008d4a040 810003461820 810008d4a2b0 Jun 4 10:52:57 turing-police kernel: [95478.047920] 000105ad3733 0202 00ff 80239795 Jun 4 10:52:57 turing-police kernel: [95478.047928] Call Trace: Jun 4 10:52:57 turing-police kernel: [95478.047936] [] schedule_timeout+0x8d/0xb4 Jun 4 10:52:57 turing-police kernel: [95478.047945] [] schedule_timeout_uninterruptible+0x19/0x1b Jun 4 10:52:57 turing-police kernel: [95478.047954] [] msleep+0x14/0x1e Jun 4 10:52:57 turing-police kernel: [95478.047963] [] netdev_run_todo+0x12f/0x234 Jun 4 10:52:57 turing-police kernel: [95478.047972] [] rtnl_unlock+0x35/0x37 Jun 4 10:52:57 turing-police kernel: [95478.047981] [] unregister_netdev+0x1e/0x23 Jun 4 10:52:57 turing-police kernel: [95478.047994] [] :ppp_generic:ppp_shutdown_interface+0x67/0xbb Jun 4 10:52:57 turing-police kernel: [95478.048018] [] :ppp_generic:ppp_release+0x33/0x65 Jun 4 10:52:57 turing-police kernel: [95478.048028] [] __fput+0xac/0x176 Jun 4 10:52:57 turing-police kernel: [95478.048036] [] fput+0x14/0x16 Jun 4 10:52:57 turing-police kernel: [95478.048045] [] filp_close+0x66/0x71 Jun 4 10:52:57 turing-police kernel: [95478.048054] [] sys_close+0x98/0xd7 Jun 4 10:52:57 turing-police kernel: [95478.048062] [] tracesys+0xdc/0xe1 Jun 4 10:52:57 turing-police kernel: [95478.048073] [<2b45cd2429a0>] Which in itself wouldn't be so bad, except that it's holding a mutex and lots of other stuff gets wedged up waiting for it (here's 1 of 6 processes that was wedged this morning): Jun 4 10:52:58 turing-police kernel: [95478.051129] ifconfig D 810005e19820 5800 9787 20510 (NOTLB) Jun 4 10:52:58 turing-police kernel: [95478.051141] 81000868fd08 0082 81000868fec8 0246 Jun 4 10:52:58 turing-police kernel: [95478.051150] 00010101 810005e19820 810003fe0820 810005e19a90 Jun 4 10:52:58 turing-police kernel: [95478.051159] 0a3f26c0 0006 81000868ff28 8028aacc Jun 4 10:52:58 turing-police kernel: [95478.051167] Call Trace: Jun 4 10:52:58 turing-police kernel: [95478.051176] [] __mutex_lock_slowpath+0x74/0xb6 Jun 4 10:52:58 turing-police kernel: [95478.051185] [] mutex_lock+0xe/0x10 Jun 4 10:52:58 turing-police kernel: [95478.051193] [] netdev_run_todo+0x19/0x234 Jun 4 10:52:58 turing-police kernel: [95478.051202] [] rtnl_unlock+0x35/0x37 Jun 4 10:52:58 turing-police kernel: [95478.051210] [] dev_ioctl+0x3e3/0x483 Jun 4 10:52:58 turing-police kernel: [95478.051218] [] sock_ioctl+0x1ef/0x1fc Jun 4 10:52:58 turing-police kernel: [95478.051227] [] do_ioctl+0x2a/0x77 Jun 4 10:52:58 turing-police kernel: [95478.051235] [] vfs_ioctl+0x247/0x264 Jun 4 10:52:58 turing-police kernel: [95478.051243] [] sys_ioctl+0x5f/0x85 Jun 4 10:52:58 turing-police kernel: [95478.051252] [] tracesys+0xdc/0xe1 (And of course, you can't shutdown cleanly, because /etc/init.d/network tries to down other interfaces on the way out, and) I'd bisect this, except I don't have a better way to replicate it than "wait for our VPN box to reset the connection after 24 hours of connect" - basically means I get 2 tries per weekend..) An hour or so of digging through the -rc3-mm1 broken-out/ didn't find any obvious-to-me culprits. Any ideas/suggestions? pgpgLKOKJ5mzu.pgp Description: PGP signature
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Oliver Hartkopp wrote: > I think, it goes into the right direction to use skb->pkt_type. The flag > should really be somewhere inside the skb as all back references into > the sk would become sticky in the implementation. > > But regarding the use of skb->pkt_type i would suggest to take a closer > look on the definitions in include/linux/if_packet.h and how the > pkt_type is to be used inside the kernel. In my opinion we should use ... > > * TX-Path: > PACKET_OTHERHOST: send the CAN frame without loopback > PACKET_BROADCAST : send the CAN frame with loopback (aka local broadcast) > > See an example of this approach in drivers/net/arcnet/rfc1051.c : > http://www.linux-m32r.org/lxr/http/source/drivers/net/arcnet/rfc1051.c?a=i386#L99 > > * RX-Path: > PACKET_HOST : just an incoming CAN frame for this host > > Any comments? ACKs? > > Best regards, > Oliver > > > The updated changes would look like this. Of course it has been tested and is working fine :-) Regards, Oliver Index: net/can/af_can.c === --- net/can/af_can.c(revision 325) +++ net/can/af_can.c(working copy) @@ -257,7 +257,6 @@ */ int can_send(struct sk_buff *skb, int loop) { -struct sock **tx_sk = (struct sock **)skb->cb; int err; if (skb->dev->type != ARPHRD_CAN) { @@ -265,30 +264,43 @@ return -EPERM; } +skb->protocol = htons(ETH_P_CAN); + if (loop) { /* local loopback of sent CAN frames (default) */ -/* indication for the CAN driver: do loopback */ -*tx_sk = skb->sk; +/* + * This packet is not only sent on the CAN bus but + * also broadcasted to local subscribers on this host. + */ +skb->pkt_type = PACKET_BROADCAST; /* - * The reference to the originating sock may be also required + * The reference to the originating sock may be required * by the receiving socket to indicate (and ignore) his own - * sent data. Example: can_raw sockopt CAN_RAW_RECV_OWN_MSGS + * sent data. Example: can_raw sockopt CAN_RAW_RECV_OWN_MSGS. + * Therefore we have to ensure that skb->sk remains the + * reference to the originating sock by restoring skb->sk + * after each skb_clone() or skb_orphan() usage. + * skb->sk is usually unused and unset in the rx path. */ /* interface not capabable to do the loopback itself? */ if (!(skb->dev->flags & IFF_LOOPBACK)) { +struct sock *srcsk = skb->sk; struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC); /* perform the local loopback here */ -newskb->protocol = htons(ETH_P_CAN); newskb->ip_summed = CHECKSUM_UNNECESSARY; +newskb->sk = srcsk; netif_rx(newskb); } } else { -/* indication for the CAN driver: no loopback required */ -*tx_sk = NULL; +/* + * Indication for the CAN driver: No loopback required! + * This packet is only transmitted to the CAN bus. + */ +skb->pkt_type = PACKET_OTHERHOST; } if (!(skb->dev->flags & IFF_UP)) @@ -581,10 +593,12 @@ static inline void deliver(struct sk_buff *skb, struct receiver *r) { +struct sock *srcsk = skb->sk; struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC); DBG("skbuff %p cloned to %p\n", skb, clone); if (clone) { +clone->sk = srcsk; r->func(clone, r->data); r->matches++; } Index: net/can/raw.c === --- net/can/raw.c(revision 325) +++ net/can/raw.c(working copy) @@ -154,7 +154,7 @@ if (!ro->recv_own_msgs) { /* check the received tx sock reference */ -if (*(struct sock **)skb->cb == sk) { +if (skb->sk == sk) { DBG("trashed own tx msg\n"); kfree_skb(skb); return; Index: drivers/net/can/vcan.c === --- drivers/net/can/vcan.c(revision 325) +++ drivers/net/can/vcan.c(working copy) @@ -133,6 +133,7 @@ skb->protocol = htons(ETH_P_CAN); skb->dev = dev; skb->ip_summed = CHECKSUM_UNNECESSARY; +skb->pkt_type = PACKET_HOST; DBG("received skbuff on interface %d\n", dev->ifindex); @@ -149,8 +150,8 @@ stats->tx_packets++; stats->tx_bytes += skb->len; -/* tx socket reference pointer: Loopback required if not NULL */ -loop = *(struct sock **)skb->cb != NULL; +/* indication for CAN netdevice drivers that loopback is required */ +loop = (skb->pkt_type == PACKET_BROADCAST); if (!loopback) { /* no loopback handling available inside this driver */ @@ -170,6 +171,8 @@ /* perform standard loopback handling for CAN network interfaces */ if (loop) { +struct sock *srcsk = skb->sk; +
Re: iperf: performance regression (was b44 driver problem?)
On Mon, 04 Jun 2007 19:32:48 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote: > On Mon, 2007-06-04 at 09:59 -0700, Stephen Hemminger wrote: > > > > gettimeofday({1180973726, 982754}, NULL) = 0 > > > > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., > > > > 8192, 0) = 8192 > > > > gettimeofday({1180973726, 983790}, NULL) = 0 > > > > > > Well, gettimeofday() is not affected by the highres code, but > > > > > > > nanosleep({0, 0}, NULL) = 0 > > > > nanosleep({0, 0}, NULL) = 0 > > > > > > is. The nanosleep call with a relative timeout of 0 returns immediately > > > with highres enabled, while it sleeps at least until the next tick > > > arrives when highres is off. Are there more of those stupid sleeps in > > > the code ? > > > > GLIBC pthread_mutex does it, YES it is a problem! > > Looks like the old behavior is required for ABI compatibility. > > > > iperf server has several threads. One thread is using pthread_mutex_lock > > to wait for the other thread. It looks like pthread_mutex_lock is using > > nanosleep as yield(). > > I doubt that. This is in the iperf code itself. > > void thread_rest ( void ) { > #if defined( HAVE_THREAD ) > #if defined( HAVE_POSIX_THREAD ) > // TODO add checks for sched_yield or pthread_yield and call that > // if available > usleep( 0 ); > > -- > > It results in a nanosleep({0,0}, NULL) > > tglx > Yes, the following patch makes iperf work better than ever. But are other broken applications going to have same problem. Sounds like the old "who runs first" fork() problems. --- iperf-2.0.2/compat/Thread.c.orig2005-05-03 08:15:51.0 -0700 +++ iperf-2.0.2/compat/Thread.c 2007-06-04 10:54:21.0 -0700 @@ -405,9 +405,13 @@ void thread_rest ( void ) { #if defined( HAVE_THREAD ) #if defined( HAVE_POSIX_THREAD ) -// TODO add checks for sched_yield or pthread_yield and call that -// if available + +#if defined( _POSIX_PRIORITY_SCHEDULING ) +sched_yield(); +#else usleep( 0 ); +#endif + #else // Win32 SwitchToThread( ); #endif -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: iperf: performance regression (was b44 driver problem?)
On Mon, 2007-06-04 at 09:59 -0700, Stephen Hemminger wrote: > > > gettimeofday({1180973726, 982754}, NULL) = 0 > > > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., > > > 8192, 0) = 8192 > > > gettimeofday({1180973726, 983790}, NULL) = 0 > > > > Well, gettimeofday() is not affected by the highres code, but > > > > > nanosleep({0, 0}, NULL) = 0 > > > nanosleep({0, 0}, NULL) = 0 > > > > is. The nanosleep call with a relative timeout of 0 returns immediately > > with highres enabled, while it sleeps at least until the next tick > > arrives when highres is off. Are there more of those stupid sleeps in > > the code ? > > GLIBC pthread_mutex does it, YES it is a problem! > Looks like the old behavior is required for ABI compatibility. > > iperf server has several threads. One thread is using pthread_mutex_lock > to wait for the other thread. It looks like pthread_mutex_lock is using > nanosleep as yield(). I doubt that. This is in the iperf code itself. void thread_rest ( void ) { #if defined( HAVE_THREAD ) #if defined( HAVE_POSIX_THREAD ) // TODO add checks for sched_yield or pthread_yield and call that // if available usleep( 0 ); -- It results in a nanosleep({0,0}, NULL) tglx - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
iperf: performance regression (was b44 driver problem?)
On Mon, 04 Jun 2007 18:35:58 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote: > On Mon, 2007-06-04 at 09:09 -0700, Stephen Hemminger wrote: > > > > I did the test with an 2.6.22-rc3-git4 kernel and the p54 driver built > > > > external as module. > > > > > > Can you look at iperf to figure out, whether it does some weird timer > > > stuff (high frequency interval timer or such) ? Either check the code or > > > strace it. > > > > It is the receiver doing a tight loop doing gettimeofday/recv calls. > > > > > > sendto(-1227715616, 0xc, 3085438964, 0, {...}, 3067249832) = 0 > > gettimeofday({1180973726, 981615}, NULL) = 0 > > gettimeofday({1180973726, 981751}, NULL) = 0 > > futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1 > > futex(0x8055c90, FUTEX_WAKE, 1) = 0 > > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, > > 0) = 8192 > > gettimeofday({1180973726, 982754}, NULL) = 0 > > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, > > 0) = 8192 > > gettimeofday({1180973726, 983790}, NULL) = 0 > > Well, gettimeofday() is not affected by the highres code, but > > > nanosleep({0, 0}, NULL) = 0 > > nanosleep({0, 0}, NULL) = 0 > > is. The nanosleep call with a relative timeout of 0 returns immediately > with highres enabled, while it sleeps at least until the next tick > arrives when highres is off. Are there more of those stupid sleeps in > the code ? GLIBC pthread_mutex does it, YES it is a problem! Looks like the old behavior is required for ABI compatibility. iperf server has several threads. One thread is using pthread_mutex_lock to wait for the other thread. It looks like pthread_mutex_lock is using nanosleep as yield(). Multi-thread strace shows how this could kill performance. These are with old 2.6.20 kernel that doesn't have the problem: sendto(-1210930208, 0xc, 3085438964, 0, {...}, 3084035240) = 0 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(5001), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 listen(3, 5)= 0 futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1 accept(3, {sa_family=AF_INET, sin_port=htons(49973), sin_addr=inet_addr("192.168.0.14")}, [16]) = 4 getsockname(4, {sa_family=AF_INET, sin_port=htons(5001), sin_addr=inet_addr("192.168.0.12")}, [16]) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 24, 0) = 24 mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6528000 mprotect(0xb6528000, 4096, PROT_NONE) = 0 clone(child_stack=0xb6d284a4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0xb6d28bd8, {entry_number:6, base_addr:0xb6d28b90, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb6d28bd8) = 5622 accept(3, 0x8058b98, [128]) = ? ERESTARTSYS (To be restarted) == sendto(-1219322912, 0xc, 3085438964, 0, {...}, 3075642536) = 0 futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1 futex(0x8055d30, FUTEX_WAKE, 1) = 1 futex(0x8055c64, FUTEX_WAIT, 1, NULL) = 0 futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1 futex(0x8055d30, FUTEX_WAKE, 1) = 1 futex(0x8055c90, FUTEX_WAKE, 1) = 0 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [87380], [4]) = 0 fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fb write(1, ""..., 61) = 61 write(1, "Server listening on TCP port 500"..., 34) = 34 write(1, "TCP window size: 85.3 KByte (def"..., 38) = 38 write(1, ""..., 61) = 61 nanosleep({0, 0}, NULL) = 0 futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1 futex(0x8055d30, FUTEX_WAKE, 1) = 1 futex(0x8055c64, FUTEX_WAIT, 3, NULL) = 0 futex(0x8055c90, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1 futex(0x8055d30, FUTEX_WAKE, 1) = 0 futex(0x8055c90, FUTEX_WAKE, 1) = 0 write(1, "[ 4] local 192.168.0.12 port 50"..., 74) = 74 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 ... nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 write(1, "[ 4] 0.0-30.2 sec336 MByte"..., 50) = 50 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 futex(0x8055d04, 0x5 /*
[PATCH 1/1] myri10ge: limit the number of recoveries
Limit the number of recoveries from a NIC hw watchdog reset to 1 by default. It enables detection of defective NICs immediately since these memory parity errors are expected to happen very rarely (less than once per century*NIC). Signed-off-by: Brice Goglin <[EMAIL PROTECTED]> --- drivers/net/myri10ge/myri10ge.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) Index: linux-rc/drivers/net/myri10ge/myri10ge.c === --- linux-rc.orig/drivers/net/myri10ge/myri10ge.c 2007-05-30 20:58:22.0 +0200 +++ linux-rc/drivers/net/myri10ge/myri10ge.c2007-06-04 19:01:54.0 +0200 @@ -279,6 +279,8 @@ module_param(myri10ge_fill_thresh, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(myri10ge_fill_thresh, "Number of empty rx slots allowed\n"); +static int myri10ge_reset_recover = 1; + static int myri10ge_wcfifo = 0; module_param(myri10ge_wcfifo, int, S_IRUGO); MODULE_PARM_DESC(myri10ge_wcfifo, "Enable WC Fifo when WC is enabled\n"); @@ -2730,8 +2732,14 @@ * For now, just report it */ reboot = myri10ge_read_reboot(mgp); printk(KERN_ERR - "myri10ge: %s: NIC rebooted (0x%x), resetting\n", - mgp->dev->name, reboot); + "myri10ge: %s: NIC rebooted (0x%x),%s resetting\n", + mgp->dev->name, reboot, + myri10ge_reset_recover ? " " : " not"); + if (myri10ge_reset_recover == 0) + return; + + myri10ge_reset_recover--; + /* * A rebooted nic will come back with config space as * it was after power was applied to PCIe bus. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/1] myri10ge: limit the number of recoveries
Jeff Garzik wrote: > Brice Goglin wrote: >> Limit the number of recoveries from a NIC hw watchdog reset to 1 by >> default. >> It enables detection of defective NICs immediately since these memory >> parity >> errors are expected to happen very rarely (less than once per >> century*NIC). >> However, a defective NIC (very rare, fortunately) can see such an error >> quite often, ie. every few minutes under high load. >> >> Make the limit tunable to allow people with mission critical >> installations >> to crank up the tunable and recover an INTMAX number of times while >> waiting >> for a downtime window to replace the NIC. The performance won't be >> optimal, >> but at least, it will still work. >> >> Signed-off-by: Brice Goglin <[EMAIL PROTECTED]> >> --- >> drivers/net/myri10ge/myri10ge.c | 15 +-- >> 1 file changed, 13 insertions(+), 2 deletions(-) > > NAK. Ok... Then please apply the following patch which limits the number of recovery to 1 without making it tunable. It will at least enable detection of bad NICs. Brice - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Mon, 2007-06-04 at 09:09 -0700, Stephen Hemminger wrote: > > > I did the test with an 2.6.22-rc3-git4 kernel and the p54 driver built > > > external as module. > > > > Can you look at iperf to figure out, whether it does some weird timer > > stuff (high frequency interval timer or such) ? Either check the code or > > strace it. > > It is the receiver doing a tight loop doing gettimeofday/recv calls. > > > sendto(-1227715616, 0xc, 3085438964, 0, {...}, 3067249832) = 0 > gettimeofday({1180973726, 981615}, NULL) = 0 > gettimeofday({1180973726, 981751}, NULL) = 0 > futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1 > futex(0x8055c90, FUTEX_WAKE, 1) = 0 > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, > 0) = 8192 > gettimeofday({1180973726, 982754}, NULL) = 0 > recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, > 0) = 8192 > gettimeofday({1180973726, 983790}, NULL) = 0 Well, gettimeofday() is not affected by the highres code, but > nanosleep({0, 0}, NULL) = 0 > nanosleep({0, 0}, NULL) = 0 is. The nanosleep call with a relative timeout of 0 returns immediately with highres enabled, while it sleeps at least until the next tick arrives when highres is off. Are there more of those stupid sleeps in the code ? tglx - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Urs Thuermann wrote: > Oliver Hartkopp <[EMAIL PROTECTED]> writes: > > >> 2. The loopback indication is done by using the unused skb->protocol in >> the tx path. >> > > I don't think we should (mis-)use skb->protocol as a loopback flag. > Yep! After reading the comments from Patrick and Urs i definitely agree to their concerns. > IMO it would be better to skb->pkt_type. This is used to indicate > packet type to rcv functions registered by dev_add_pack(). It is set > by netdevice drivers to PACKET_{MULTICAST,BROADCAST,HOST,OTHER} for > received packets. In the send path it is set to PACKET_OUTGOING on > the copy of the skbuff that is delivered to the sockets registered on > ptype_all (typically packet sockets from tcpdump or other sniffers). > AFAICS, pkt_type is not used otherwise in the send path. > > We could set skb->pkt_type = PACKET_LOOPBACK to flag to the CAN > netdevice driver whether to loop back the packet. > I think, it goes into the right direction to use skb->pkt_type. The flag should really be somewhere inside the skb as all back references into the sk would become sticky in the implementation. But regarding the use of skb->pkt_type i would suggest to take a closer look on the definitions in include/linux/if_packet.h and how the pkt_type is to be used inside the kernel. In my opinion we should use ... * TX-Path: PACKET_OTHERHOST: send the CAN frame without loopback PACKET_BROADCAST : send the CAN frame with loopback (aka local broadcast) See an example of this approach in drivers/net/arcnet/rfc1051.c : http://www.linux-m32r.org/lxr/http/source/drivers/net/arcnet/rfc1051.c?a=i386#L99 * RX-Path: PACKET_HOST : just an incoming CAN frame for this host Any comments? ACKs? Best regards, Oliver - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Mon, 04 Jun 2007 08:39:48 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote: > On Sun, 2007-06-03 at 18:26 +0200, Maximilian Engelhardt wrote: > > > Is there any other strange behavior of the high res enabled kernel than > > > the b44 problem ? > > > > I didn't notice anything in the past (as I wrote). But today I did some > > tests > > for an updated version of the p54 mac80211 wlan driver and I noticed > > exactly > > the same problem: > > > > when booting with highres=off everything is fine. > > But when I boot an highres enabled kernel and I do the iperf-test with the > > p54 > > driver, my systems becomes unresponsive during the test. It seems to be > > exactly the same problem I have with the b44 driver. > > So this might not be a bug in the b44 code but a bug somewhere in the linux > > networking code. > > > > I did the test with an 2.6.22-rc3-git4 kernel and the p54 driver built > > external as module. > > Can you look at iperf to figure out, whether it does some weird timer > stuff (high frequency interval timer or such) ? Either check the code or > strace it. > > tglx > > It is the receiver doing a tight loop doing gettimeofday/recv calls. sendto(-1227715616, 0xc, 3085438964, 0, {...}, 3067249832) = 0 gettimeofday({1180973726, 981615}, NULL) = 0 gettimeofday({1180973726, 981751}, NULL) = 0 futex(0x8055c64, 0x5 /* FUTEX_??? */, 1) = 1 futex(0x8055c90, FUTEX_WAKE, 1) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 982754}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 983790}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 984355}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 984706}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 985111}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 985499}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 986088}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 986436}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 986916}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 987397}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 987872}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 988440}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 988823}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 989314}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 990029}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 990890}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 991803}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 992616}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 993105}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 993585}, NULL) = 0 recv(4, "\0\0\0\0\0\0\0\1\0\0\23\211\0\0\0\0\0\0\0\0\377\377\364"..., 8192, 0) = 8192 gettimeofday({1180973726, 994014}, NULL) = 0 ... recv(4, "45678901234567890123456789012345"..., 8192, 0) = 1448 gettimeofday({1180973757, 172437}, NULL) = 0 recv(4, "23456789012345678901234567890123"..., 8192, 0) = 1448 gettimeofday({1180973757, 172576}, NULL) = 0 recv(4, "01234567890123456789012345678901"..., 8192, 0) = 1632 gettimeofday({1180973757, 172752}, NULL) = 0 recv(4, "", 8192, 0)= 0 gettimeofday({1180973757, 172797}, NULL) = 0 gettimeofday({1180973757, 172817}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 nanosleep({0, 0}, NULL) = 0 close(4) = 0 futex(0x8055d04, 0x5 /* FUTEX_??? */, 1) = 1 futex(0x8055d30, FUTEX_WAKE, 1) = 0 _exit(0) -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to
Re: [PATCH] sundance: PHY address form 0, only for device ID 0x0200 (IP100A)
On Mon, 04 Jun 2007 17:04:39 -0400 Jesse Huang wrote: > From: Jesse Huang <[EMAIL PROTECTED]> > > Change Logs: > Search PHY address form 0, only for device ID 0x0200 (IP100A). Other device > are from PHY address 1. > > Signed-off-by: Jesse Huang <[EMAIL PROTECTED]> > --- > > drivers/net/sundance.c |4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > 5724a72722dfc9cafbb8f273cb82dbf577bd9ad0 > diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c > index e1f912d..914ab29 100644 > --- a/drivers/net/sundance.c > +++ b/drivers/net/sundance.c > @@ -562,7 +562,9 @@ #endif >* It seems some phys doesn't deal well with address 0 being accessed >* first, so leave address zero to the end of the loop (32 & 31). >*/ > - for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) { > + if(sundance_pci_tbl[np->chip_id].device == 0x0200) phy = 0; kernel style: if (sundance_pci_tbl[np->chip_id].device == 0x0200) phy = 0; else phy = 1; > + else phy = 1; > + for (; phy <= 32 && phy_idx < MII_CNT; phy++) { > int phyx = phy & 0x1f; > int mii_status = mdio_read(dev, phyx, MII_BMSR); > if (mii_status != 0x && mii_status != 0x) { > -- --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Definition and usage of NETIF_F_HW_SUM?
I was out of town last week and did not have a chance to respond. Yes, qla3xxx is (before Stephen's fix) broken on IPV6. I will review the changes and post a patch if necessary. > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Stephen Hemminger > Sent: Tuesday, May 29, 2007 4:46 PM > To: Michael Chan > Cc: Herbert Xu; netdev > Subject: Re: Definition and usage of NETIF_F_HW_SUM? > > On Tue, 29 May 2007 17:10:52 -0700 > "Michael Chan" <[EMAIL PROTECTED]> wrote: > > > On Wed, 2007-05-30 at 07:36 +1000, Herbert Xu wrote: > > > > > > > > I just checked e1000 and it's correct as it does use the > csum_offset > > > when doing TX offload. However, you're definitely right that bnx2 > > > seems to be broken. > > > > > > > A few devices take a offset, starting point, and > insertion point. This looks like > > > > the correct model. But no upper layer protocols other > than IPV4/IPV6 can do checksum > > > > offload at present, so it seems moot. > > > > > > I could easily whip up a patch to get GRE to use it for a start :) > > > > > > > IMHO the correct solution would be to get rid if > NETIF_F_HW_SUM and make a new flag > > > > NETIF_F_IPV6_SUM. Devices that can checksum both could > do NETIF_F_IPV4_SUM|NETI_F_IPV6_SUM. > > > > > > We should definitely keep NETIF_F_HW_SUM for sane > hardware such as the > > > e1000. Unfortunately we may just have to invent IPV6_SUM > for the broken > > > ones. > > > > > > Ccing Michael to see if the bnx2 chip can actually do offset-based > > > checksum offload. > > > > > > > bnx2 and tg3 cannot do offset-based checksumming because > the hardware > > doesn't have room in the buffer descriptors to specify the > offsets. So > > regrettably, the NETIF_F_HW_SUM flag has been misused in > these drivers. > > A new NETIF_F_IPV6_SUM flag will be very useful for us. > > Look furthur many drivers are just plain broken and use F_HW_SUM > and can't even do IPV6 properly. I'll fix > > The worst code award goes to: qla3xxx.c > which is broken on IPV6 and goes to trouble of computing all the > offsets and they are already there in skb... > > > > -- > Stephen Hemminger <[EMAIL PROTECTED]> > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack
Patrick McHardy wrote: > [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack > > A time_wait socket inherits sk_bound_dev_if from the original socket, > but it is not used when sending ACK packets using ip_send_reply. > > Fix by passing the oif to ip_send_reply in struct ip_reply_arg and > use it for output routing. The patch was missing a NULL pointer check in tcp_v4_send_ack for a different code path where twsk == NULL. Fixed version attached. [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack A time_wait socket inherits sk_bound_dev_if from the original socket, but it is not used when sending ACK packets using ip_send_reply. Fix by passing the oif to ip_send_reply in struct ip_reply_arg and use it for output routing. Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 9f66116df33984809a5cc0ea66a358db684d4346 tree a0b163d17181f1d1fafb4b7517726b5bb414b579 parent ea1601c50bf23af25094511e2a9ce1b755ab9669 author Patrick McHardy <[EMAIL PROTECTED]> Mon, 04 Jun 2007 17:10:41 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Mon, 04 Jun 2007 17:13:17 +0200 include/net/ip.h |1 + net/ipv4/ip_output.c |4 +++- net/ipv4/tcp_ipv4.c |2 ++ 3 files changed, 6 insertions(+), 1 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index bb207db..abf2820 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -143,6 +143,7 @@ struct ip_reply_arg { __wsum csum; int csumoffset; /* u16 offset of csum in iov[0].iov_base */ /* -1 if not needed */ + int bound_dev_if; }; void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *arg, diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index d6427d9..34ea454 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1352,7 +1352,8 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar } { - struct flowi fl = { .nl_u = { .ip4_u = + struct flowi fl = { .oif = arg->bound_dev_if, + .nl_u = { .ip4_u = { .daddr = daddr, .saddr = rt->rt_spec_dst, .tos = RT_TOS(ip_hdr(skb)->tos) } }, @@ -1376,6 +1377,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar inet->tos = ip_hdr(skb)->tos; sk->sk_priority = skb->priority; sk->sk_protocol = ip_hdr(skb)->protocol; + sk->sk_bound_dev_if = arg->bound_dev_if; ip_append_data(sk, ip_reply_glue_bits, arg->iov->iov_base, len, 0, &ipc, rt, MSG_DONTWAIT); if ((skb = skb_peek(&sk->sk_write_queue)) != NULL) { diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 47c6105..97e294e 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -705,6 +705,8 @@ static void tcp_v4_send_ack(struct tcp_timewait_sock *twsk, ip_hdr(skb)->saddr, /* XXX */ arg.iov[0].iov_len, IPPROTO_TCP, 0); arg.csumoffset = offsetof(struct tcphdr, check) / 2; + if (twsk) + arg.bound_dev_if = twsk->tw_sk.tw_bound_dev_if; ip_send_reply(tcp_socket->sk, skb, &arg, arg.iov[0].iov_len);
Re: Strange soft lockup detected message (looks like spin_lock bug in pcnet32)
On Mon, May 07, 2007 at 01:45:11PM -0400, Lennart Sorensen wrote: > Hmm, I thought I saw it on two systems already, but I should go try that > again. Hmm, still haven't figured this out. I just saw this one this morning: BUG: soft lockup detected on CPU#0! [] dump_stack+0x24/0x30 [] softlockup_tick+0x7e/0xc0 [] update_process_times+0x33/0x80 [] timer_interrupt+0x39/0x80 [] handle_IRQ_event+0x3d/0x70 [] __do_IRQ+0xa9/0x150 [] do_IRQ+0x25/0x60 [] common_interrupt+0x1a/0x20 [] handle_IRQ_event+0x18/0x70 [] __do_IRQ+0xa9/0x150 [] do_IRQ+0x25/0x60 [] common_interrupt+0x1a/0x20 [] __do_softirq+0x3a/0xa0 [] do_softirq+0x2d/0x30 [] irq_exit+0x37/0x40 [] do_IRQ+0x2a/0x60 [] common_interrupt+0x1a/0x20 [] setup_irq+0xce/0x1e0 [] request_irq+0x97/0xb0 [] pcnet32_open+0x4d/0x3d0 [pcnet32] [] dev_open+0x39/0x80 [] dev_change_flags+0xfa/0x130 [] devinet_ioctl+0x4ff/0x6f0 [] sock_ioctl+0xf1/0x1f0 [] do_ioctl+0x2c/0x80 [] vfs_ioctl+0x52/0x2f0 [] sys_ioctl+0x6f/0x80 [] syscall_call+0x7/0xb [] 0xb7f41d04 And it is happening on multiple systems. I am starting to wonder if it is a bug in the soft lockup detection. Maybe it really isn't locked up but just momentarily appears to be. I will try turning off the soft lockup detection and see what happens. -- Len Sorensen - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Failover-friendly TCP retransmission
[EMAIL PROTECTED] writes: > Please note first that I want to address physical failures by > the failover-capable network devices, which are increasingly > becoming important as Xen-based VM systems are getting popular. > Reducing a single-point-of-failure (physical device) is vital on > such VM systems. Just you typically still have lots of other single points of failures in a single system, some of them quite less reliable than your typical NIC. But at least it gives impressive demos when pulling ethernet cables @) > 1. Network device layer detects a failure first and switch to a >backup device (say, in 20sec). > > 2. TCP layer timeout & retransmission comes next, _hopefully_ >before the application layer timeout. > > 3. Application layer detects a network failure last (by, say, >30sec timeout) and may trigger a system-level failover. > > It should be noted that the timeouts for #1 and #2 are handled > independently and there is no relationship between them. > If TCP retransmission misses the time frame between event #1 and > #3 in Background above (between 20 and 30sec since network > failure), a failure causes the system-level failover where the > network-device-level failover should be enough. You should probably make sure that the device ends up returning the right NET_XMIT_* code for such drops to TCP, in particular NET_XMIT_DROP. This might require slight driver interface changes. Also right now it only affects the congestion window, I think, it might be reasonable to let it affect the timer backoff too. -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add suspend and resume support to uli526x
Hi, On Monday, 4 June 2007 13:11, Pavel Machek wrote: > Hi! > > > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > > > Add suspend/resume support to the uli526x network driver (tested on x86_64, > > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev > > 40'). > > > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> > > Looks ok to me. > > > +#ifdef CONFIG_PM > > + > > +/* > > + * Suspend the interface. > > + */ > > + > > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state) > > +{ > > + struct net_device *dev = pci_get_drvdata(pdev); > > + > > + ULI526X_DBUG(0, "uli526x_suspend", 0); > > + > > + if (dev && netdev_priv(dev)) { > > + if (netif_running(dev)) { > > + netif_device_detach(dev); > > + uli526x_down(dev); > > + } > > + pci_save_state(pdev); > > + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0); > > + pci_disable_device(pdev); > > + pci_set_power_state(pdev, pci_choose_state(pdev, state)); > > + } > > + return 0; > > +} > > + > > +/* > > + * Resume the interface. > > + */ > > + > > +static int uli526x_resume(struct pci_dev *pdev) > > +{ > > + struct net_device *dev = pci_get_drvdata(pdev); > > + struct uli526x_board_info *db = netdev_priv(dev); > > + int err; > > + > > + ULI526X_DBUG(0, "uli526x_resume", 0); > > + > > + if (dev && db) { > > + pci_set_power_state(pdev, PCI_D0); > > + err = pci_enable_device(pdev); > > + if (err) { > > + printk(KERN_WARNING "%s: Could not enable device \n", > > + dev->name); > > + return err; > > + } > > + pci_restore_state(pdev); > > + pci_set_master(pdev); > > + if (netif_running(dev)) { > > + uli526x_up(dev); > > + netif_device_attach(dev); > > + } > > + } > > + return 0; > > +} > > + > #else > #define *_resume NULL > #define *_suspend NULL > > +#endif /* CONFIG_PM */ > > > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > > .id_table = uli526x_pci_tbl, > > .probe = uli526x_init_one, > > .remove = __devexit_p(uli526x_remove_one), > > +#ifdef CONFIG_PM > > + .suspend= uli526x_suspend, > > + .resume = uli526x_resume, > > +#endif > > ...so that this ifdef is not needed? OK, why not. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: r8169: hard freezes on TX
Rolf Eike Beer wrote: > Francois Romieu wrote: > > Rolf Eike Beer <[EMAIL PROTECTED]> : > > [...] > > Ok, just tested. I used a file of 200MB and copied it to another host on > the LAN. If I used our 100 MBit switch nothing happened. When I put a 10 > MBit hub in the middle it died at 77 MB. I just had another freeze using your patches. After 512kB over smb it was dead. Eike signature.asc Description: This is a digitally signed message part.
[RFC] Failover-friendly TCP retrans mission
Hi all, I would like to hear comments on how TCP retransmission can be done better on failover-capable network devices, such as an active-backup bonding device. Premise === Please note first that I want to address physical failures by the failover-capable network devices, which are increasingly becoming important as Xen-based VM systems are getting popular. Reducing a single-point-of-failure (physical device) is vital on such VM systems. And the failover here is not going to address overloaded or congested networks here, which should be solved separately. Background == When designing a TCP/IP based network system on failover-capable network devices, people want to set timeouts hierarchically in three layers, network device layer, TCP layer, and application layer (bottom-up order), such that: 1. Network device layer detects a failure first and switch to a backup device (say, in 20sec). 2. TCP layer timeout & retransmission comes next, _hopefully_ before the application layer timeout. 3. Application layer detects a network failure last (by, say, 30sec timeout) and may trigger a system-level failover. It should be noted that the timeouts for #1 and #2 are handled independently and there is no relationship between them. Also note that the actual timeout settings (20sec or 30sec in this example) are often determined by systems requirement and so setting them to certain "safe values" (if any) are usually not possible. Problem === If TCP retransmission misses the time frame between event #1 and #3 in Background above (between 20 and 30sec since network failure), a failure causes the system-level failover where the network-device-level failover should be enough. The problem in this hierarchical timeout scheme is that TCP layer does not guarantee the next retransmission to occur in certain period of time. In the above example, people expect TCP to retransmit a packet between 20 and 30sec since network failure, but it may not happen. Starting from RTO=0.5sec for example, retransmission will occur at time 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 as indicated by 'o' in the following diagram, but miss the time frame between time 20 and 30. time: 0 102030sec | | | | App. layer |-+-+-X ==> system failover TCP layer oo-o---o--+o+-+o <== expects retrans. b/w 20~30 Netdev layer |-+-X==> network failover Solution It seems reasonable for me to solve this problem by capping TCP_RTO_MAX, i.e., making TCP_RTO_MAX a sysctl variable and set it to a small number. In this example, setting to (10 - RTT)[sec] at most should work because retransmission will take place between time 20 and 30. My rationale follows. * This solution is simple and so less error-prone. * The solution does not violate RFC 2988 in maximum RTO value, because RFC 2988's requirement in maximum RTO value, "at least 60 seconds (in (2.5))," is OPTIONAL. * The solution adds a system-wide setting, which is preferable to per-socket setting (by, say, setsockopt or something), because all application benefits from the solution. Before posting patches, I would like to hear comments here. Any comments or suggestions, to make TCP retransmission work better on failover-capable network devices, are welcome. Regards, -- OBATA Noboru ([EMAIL PROTECTED])
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Patrick McHardy <[EMAIL PROTECTED]> writes: > I was thinking about this, don't CAN frames include the identity > of the sender? That would be the easiest way to determine whether > the frame needs to be delivered. No, CAN is quite a broken networking technology (at least in my view, coming from tcp/ip). There is no station addresses and therefore no sender or receiver addresses. Everything is broadcast. > Actually its not unused, it needs to be set on TX frames once the > packet leaves the protocol specific code. Oops, yes, this is something still missing in our code. Needs to be fixed. > I don't get why you can't directly check the socket option on the > TX path. We have several types of sockets in the PF_CAN family, two of which are GPL'ed and which are in the patch series. These are CAN_RAW and CAN_BCM. The protocol implementations use can_send() in af_can.c to send a CAN frame and indicate to can_send() in an int argument, whether this frame should be looped back. Only the raw protocol has a socket option (setsockopt(2)) in struct raw_sock for this, bcm always sets this to 1 to have the frame looped back. There is no option in struct bcm_sock for this. In can_send() and in the driver we don't know what type of socket skb->sk points to and can't check that option. Changing this would mean we have to add such an option in the same position in all CAN socket types and set it to fixed values in some of them (e.g. to 1 for bcm). While it's doable, I wouldn't like that very much. Is there anything that prevents can_send() from using skb->pkt_type to pass the loopback flag down to the driver? urs - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] ehea: Receive SKB Aggregation
Christoph Hellwig wrote on 31.05.2007 15:41:18: > I'm still very unhappy with having all this in various drivers. There's > a lot of code that can be turned into generic library functions, and even > more code that could be made generic with some amount of refactoring. Yes, we'd also prefer to use a generic function, but we first would want to get some "real world" experience how our driver behaves with LRO to be even able to define requirements for such a generic function. A lot of this is tied into pathlengths, caching, and why does that help compared to a different TCP receive side processing? In a perfect world we shouldn't see a diffference if this is enabled or not, but measurements indicate something completely different at 10gbit. Gruss / Regards Christoph Raisch - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] ehea: Receive SKB Aggregation
Stephen Hemminger wrote on 31.05.2007 18:37:03: > > > > > > > +static int try_get_ip_tcp_hdr(struct ehea_cqe *cqe, struct sk_buff *skb, > > + struct iphdr **iph, struct tcphdr **tcph) > > +{ > > + int ip_len; > > + > > + /* non tcp/udp packets */ > > + if (!cqe->header_length) > > + return -1; > > + > > + /* non tcp packet */ > > + *iph = (struct iphdr *)(skb->data); > > Why the indirection, copying of headers.. This interacts with the header split function in the hardware. > > > + if ((*iph)->protocol != IPPROTO_TCP) > > + return -1; > > + > > + ip_len = (u8)((*iph)->ihl); > > + ip_len <<= 2; > > + *tcph = (struct tcphdr *)(((u64)*iph) + ip_len); > > + > > + return 0; > > +} > > + > > > > This code seems to be duplicating a lot (but not all) of the TCP/IP > input path validation checks. This is a security problem if nothing else... > We should only do aggregation in the driver if this really is a TCP header, otherwise things will get worse. You're right, we should at least check that tcph is within the received frame. > Also, how do you prevent DoS attacks from hostile TCP senders that send > huge number of back to back frames? Actually a huge number of back to back frames is what we would want to receive at 10 gbit ;-) How is it possible to figure out if this is what you want or just DoS? It doesn't change anything compared to a non LRO driver, we process a certain maximum amount of frames before waiting for the next interrupt, the packet filters/DoS should still see all traffic (which is above the driver). Any suggestions how to handle this better/different? > > -- > Stephen Hemminger Gruss / Regards Christoph Raisch - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] ps3: add wireless LAN support
Hi Thank you for your comments! It seems to be pretty preferable we make the driver wpa_supplicant compatible, I'll plan to rework and resubmit the patch (to linux-wireless) On Fri, 01 Jun 2007 12:35:49 -0400 Dan Williams <[EMAIL PROTECTED]> wrote: > There are a few basic > issues, rehashed here: > > 1) Need endian annotations and conversions. OK, I'll fix. > 2) How much WPA does the device really support? If it only supports > WPA[2]-PSK, I guess that's OK, but the driver _MUST_ return the WPA and > RSN information elements from each BSS that provides them using the > IWEVGENIE tag from the get scan handler. A hack might be to construct a > suitable synthetic IE depending on the contents of the 'security' field > the firmware passes back. But userspace needs to know that a particular > BSS supports WPA in a standard manner. It supports WEP and WPA[2]-PSKs(TKIP and AES) only. I'll consider incorporating wpa_supplicant stuff, ie to support IWEVGENIE parameters. > 3) Should remove the WPA passphrase bits from the encode functions, > these functions should operate only on raw key material. Userspace > tools do the hashing. The hypervisor (firmware) has functionality of hashing WPA passphrase. So the driver code just passes it to hypervisor. With that feature, end users can easily set their passphase. I thought it was convenient. Anyway I can remove it to delegate to user space. > 4) I'd like an explanation of the scanning behavior too, looks fishy. Well, it basically start to scan as requested. But since scanning is time consuming job and it seems to block normal messages, rate control code you mentioned was inserted. Background scan is performed for roaming as you thought. > > 5) Should be using a list of scanned BSSs, should be aging and culling > them, instead of a static array. OK. > > 6) The private ioctl needs to die. These private ioctls were made so that we can setup the driver simply with wireless-tools, like the answer to 3) above. Anyway I'll try to incorporate wpa_supplicant support. > > 7) Does the device work with wpa_supplicant's 'wext' driver? I don't think it works with wext of wpa_supplicant, although I didn't test it with wpa_supplicant well. -- Masakazu MOKUNO - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Oliver Hartkopp <[EMAIL PROTECTED]> writes: > 2. The loopback indication is done by using the unused skb->protocol in > the tx path. I don't think we should (mis-)use skb->protocol as a loopback flag. As the name says, it's the protocol number and not a flag whether loopback is to be done by the driver. Even if it's not used otherwise in tx path (haven't checked this), in my experience it is asking for future trouble to use variables other than for it's intended purpose. IMO it would be better to skb->pkt_type. This is used to indicate packet type to rcv functions registered by dev_add_pack(). It is set by netdevice drivers to PACKET_{MULTICAST,BROADCAST,HOST,OTHER} for received packets. In the send path it is set to PACKET_OUTGOING on the copy of the skbuff that is delivered to the sockets registered on ptype_all (typically packet sockets from tcpdump or other sniffers). AFAICS, pkt_type is not used otherwise in the send path. We could set skb->pkt_type = PACKET_LOOPBACK to flag to the CAN netdevice driver whether to loop back the packet. Any objections? urs - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Oliver Hartkopp wrote: > 1. The needed skb->sk is preserved for the rx path (the way 'up'). I was thinking about this, don't CAN frames include the identity of the sender? That would be the easiest way to determine whether the frame needs to be delivered. > 2. The loopback indication is done by using the unused skb->protocol in > the tx path. Actually its not unused, it needs to be set on TX frames once the packet leaves the protocol specific code. Its used for example by packet sockets (sk_run_filter), qdisc classification, neighbour resolution, gso segmentation, ... I don't get why you can't directly check the socket option on the TX path. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cfg80211: fix signed macaddress in sysfs
On Mon, 2007-06-04 at 00:06 +0200, David Lamparter wrote: > Fix signedness mixup making mac addresses show up strangely > (like 00:11:22:33:44:ffaa) in /sys/class/ieee80211/*/macaddress. Huh, my mistake, patch looks good. Do we care for this in .22 still? Acked-by: Johannes Berg <[EMAIL PROTECTED]> > Signed-off-by: David Lamparter <[EMAIL PROTECTED]> > --- > > net/wireless/sysfs.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/wireless/sysfs.c b/net/wireless/sysfs.c > index 7844be4..374d16d 100644 > --- a/net/wireless/sysfs.c > +++ b/net/wireless/sysfs.c > @@ -33,7 +33,7 @@ static ssize_t _show_permaddr(struct device *dev, > struct device_attribute *attr, > char *buf) > { > - char *addr = dev_to_rdev(dev)->wiphy.perm_addr; > + unsigned char *addr = dev_to_rdev(dev)->wiphy.perm_addr; > > return sprintf(buf, "%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\n", >addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]); This probably should also use MAC_FMT/MAC_ARG, hah. Oh well, I'll fix it up later. johannes signature.asc Description: This is a digitally signed message part
Re: [RFC][PATCH] Add suspend and resume support to uli526x
Hi! > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > Add suspend/resume support to the uli526x network driver (tested on x86_64, > with 'Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev > 40'). > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> Looks ok to me. > +#ifdef CONFIG_PM > + > +/* > + * Suspend the interface. > + */ > + > +static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state) > +{ > + struct net_device *dev = pci_get_drvdata(pdev); > + > + ULI526X_DBUG(0, "uli526x_suspend", 0); > + > + if (dev && netdev_priv(dev)) { > + if (netif_running(dev)) { > + netif_device_detach(dev); > + uli526x_down(dev); > + } > + pci_save_state(pdev); > + pci_enable_wake(pdev, pci_choose_state(pdev, state), 0); > + pci_disable_device(pdev); > + pci_set_power_state(pdev, pci_choose_state(pdev, state)); > + } > + return 0; > +} > + > +/* > + * Resume the interface. > + */ > + > +static int uli526x_resume(struct pci_dev *pdev) > +{ > + struct net_device *dev = pci_get_drvdata(pdev); > + struct uli526x_board_info *db = netdev_priv(dev); > + int err; > + > + ULI526X_DBUG(0, "uli526x_resume", 0); > + > + if (dev && db) { > + pci_set_power_state(pdev, PCI_D0); > + err = pci_enable_device(pdev); > + if (err) { > + printk(KERN_WARNING "%s: Could not enable device \n", > + dev->name); > + return err; > + } > + pci_restore_state(pdev); > + pci_set_master(pdev); > + if (netif_running(dev)) { > + uli526x_up(dev); > + netif_device_attach(dev); > + } > + } > + return 0; > +} > + #else #define *_resume NULL #define *_suspend NULL > +#endif /* CONFIG_PM */ > @@ -1689,6 +1775,10 @@ static struct pci_driver uli526x_driver > .id_table = uli526x_pci_tbl, > .probe = uli526x_init_one, > .remove = __devexit_p(uli526x_remove_one), > +#ifdef CONFIG_PM > + .suspend= uli526x_suspend, > + .resume = uli526x_resume, > +#endif ...so that this ifdef is not needed? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix race in AF_UNIX
(resend, sorry, fscked up the address list) > A recv() on an AF_UNIX, SOCK_STREAM socket can race with a > send()+close() on the peer, causing recv() to return zero, even though > the sent data should be received. > > This happens if the send() and the close() is performed between > skb_dequeue() and checking sk->sk_shutdown in unix_stream_recvmsg(): > > process A skb_dequeue() returns NULL, there's no data in the socket queue > process B new data is inserted onto the queue by unix_stream_sendmsg() > process B sk->sk_shutdown is set to SHUTDOWN_MASK by unix_release_sock() > process A sk->sk_shutdown is checked, unix_release_sock() returns zero This is only part of the story. It turns out, there are other races involving the garbage collector, that can throw away perfectly good packets with AF_UNIX sockets in them. The problems arise when a socket goes from installed to in-flight or vica versa during garbage collection. Since gc is done with a spinlock held, this only shows up on SMP. The following patch fixes it for me, but it's possibly the wrong approach. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux-2.6.22-rc2/net/unix/garbage.c === --- linux-2.6.22-rc2.orig/net/unix/garbage.c2007-06-03 23:58:11.0 +0200 +++ linux-2.6.22-rc2/net/unix/garbage.c 2007-06-04 11:39:42.0 +0200 @@ -90,6 +90,7 @@ static struct sock *gc_current = GC_HEAD; /* stack of objects to mark */ atomic_t unix_tot_inflight = ATOMIC_INIT(0); +DECLARE_RWSEM(unix_gc_sem); static struct sock *unix_get_socket(struct file *filp) @@ -169,7 +170,7 @@ static void maybe_unmark_and_push(struct void unix_gc(void) { - static DEFINE_MUTEX(unix_gc_sem); + static DEFINE_MUTEX(unix_gc_local_lock); int i; struct sock *s; struct sk_buff_head hitlist; @@ -179,9 +180,22 @@ void unix_gc(void) * Avoid a recursive GC. */ - if (!mutex_trylock(&unix_gc_sem)) + if (!mutex_trylock(&unix_gc_local_lock)) return; + + /* +* unix_gc_sem protects against sockets going from in-flight to +* installed +* +* Can't sleep on this, because skb_recv_datagram could be +* waiting for a packet that is to be sent by the thread which +* invoked the gc +*/ + if (!down_write_trylock(&unix_gc_sem)) { + mutex_unlock(&unix_gc_local_lock); + return; + } spin_lock(&unix_table_lock); forall_unix_sockets(i, s) @@ -207,8 +221,6 @@ void unix_gc(void) forall_unix_sockets(i, s) { - int open_count = 0; - /* * If all instances of the descriptor are not * in flight we are in use. @@ -218,10 +230,20 @@ void unix_gc(void) * In this case (see unix_create1()) we set artificial * negative inflight counter to close race window. * It is trick of course and dirty one. +* +* Get the inflight counter first, then the open +* counter. This avoids problems if racing with +* sendmsg +* +* If just created socket is not yet attached to +* a file descriptor, assume open_count of 1 */ + int inflight_count = atomic_read(&unix_sk(s)->inflight); + int open_count = 1; + if (s->sk_socket && s->sk_socket->file) open_count = file_count(s->sk_socket->file); - if (open_count > atomic_read(&unix_sk(s)->inflight)) + if (open_count > inflight_count) maybe_unmark_and_push(s); } @@ -302,11 +324,12 @@ void unix_gc(void) u->gc_tree = GC_ORPHAN; } spin_unlock(&unix_table_lock); + up_write(&unix_gc_sem); /* * Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist); - mutex_unlock(&unix_gc_sem); + mutex_unlock(&unix_gc_local_lock); } Index: linux-2.6.22-rc2/include/net/af_unix.h === --- linux-2.6.22-rc2.orig/include/net/af_unix.h 2007-04-26 05:08:32.0 +0200 +++ linux-2.6.22-rc2/include/net/af_unix.h 2007-06-04 09:13:56.0 +0200 @@ -14,6 +14,7 @@ extern void unix_gc(void); extern struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1]; extern spinlock_t unix_table_lock; +extern struct rw_semaphore unix_gc_sem; extern atomic_t unix_tot_inflight; Index: linux-2.6.22-rc2/net/unix/af_unix.c === --- linux-2.6.22-rc2.orig/net/unix/af_unix.c2007-06-03 23:58:11.0 +0200 +++ linux-2.6.22-rc2/net/unix/af_u