kernel 2.4 vs 2.6 Traffic Controller performance
Hello,

This is a repost; there seems to have been a misunderstanding before. I hope this is the right place to ask.

Does anyone know if there is a substantial difference in the performance of the traffic controller between kernel 2.4 and 2.6? We tested it using 1 iperf server with 250 and 500 clients, altering the burst. This is the set-up:

iperf client - router (w/ traffic controller) - iperf server

We used the top command on the router to check its idle time. With the 2.4 kernel we saw around 65-70% idle time, while with 2.6 we saw 60-65% idle time. We tried to use MRTG, but we are not getting any results from it either.

We want to know if we could improve the bandwidth by upgrading the kernel; otherwise we would have to get a new bandwidth manager. Has anyone performed a similar test, or can anyone suggest a better way to do this?

Thanks in advance.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] rtnl: Simplify ASSERT_RTNL
On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote:
>
> I think this doesn't completely fix it, when dev_unicast_add is
> interrupted by dev_mc_add before the unicast changes are performed,
> they will get committed in the dev_mc_add context, so we might still
> call change_flags with BH disabled. Taking the TX lock around the
> dev->uc_count and dev->uc_promisc checks and changes in __dev_set_rx_mode
> should fix this.

Good catch. Digging back in history it seems that you added the change_rx_flags function so that the driver didn't have to do it under TX lock, right?

The problem with this is that the stack can now call change_rx_flags and set_multicast_list simultaneously which presents a potential headache for the driver author (if they were to use change_rx_flags).

It seems to me what we could do is in fact separate out the part that adds the address and the part that syncs it with hardware. That way we can call the hardware from a process context later and use the RTNL to guarantee that we only enter the driver once.

So dev_mc_add would look like:

1) Hold some form of lock L.
2) Modify mc list A (a copy of the current mc list).
3) Drop lock.
4) Schedule an update to the hardware.

The update to the hardware would look like:

1) Hold RTNL.
2) Hold lock L.
3) Copy list A to list B (B would be our current list).
4) Drop lock L.
5) Call the hardware.
6) Drop RTNL.

For compatibility, set_multicast_list would still be invoked under the TX lock while set_rx_mode would do exactly the same thing but would only hold the RTNL.

What do you think about this approach?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
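The two step lists in the proposal above can be sketched in userspace C. This is only an illustrative model, not kernel code: pthread mutexes stand in for the RTNL and for "lock L", and every name in it (struct mc_dev, mc_dev_add, mc_dev_sync, fake_program_hw, MC_MAX) is hypothetical. The "schedule an update" step is reduced to setting a flag; a real implementation would presumably queue work to run in process context.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MC_MAX   16 /* hypothetical capacity, for the sketch only */
#define ADDR_LEN 6  /* Ethernet address length */

struct mc_list {
	unsigned char addr[MC_MAX][ADDR_LEN];
	size_t count;
};

struct mc_dev {
	pthread_mutex_t rtnl;   /* stands in for the RTNL */
	pthread_mutex_t lock;   /* "lock L" from the proposal */
	struct mc_list pending; /* list A: modified by mc_dev_add() */
	struct mc_list active;  /* list B: what the hardware was given */
	int sync_scheduled;     /* stands in for a scheduled work item */
	void (*program_hw)(const struct mc_list *list);
};

/* Stand-in "driver" hook: records how many entries it was asked to program. */
static size_t fake_hw_entries;
static void fake_program_hw(const struct mc_list *list)
{
	fake_hw_entries = list->count;
}

void mc_dev_init(struct mc_dev *dev, void (*hw)(const struct mc_list *))
{
	memset(dev, 0, sizeof(*dev));
	pthread_mutex_init(&dev->rtnl, NULL);
	pthread_mutex_init(&dev->lock, NULL);
	dev->program_hw = hw;
}

/* Add path: steps 1-4 of the first list. Never calls the driver. */
int mc_dev_add(struct mc_dev *dev, const unsigned char *addr)
{
	int rc = -1;

	pthread_mutex_lock(&dev->lock);            /* 1) hold lock L */
	if (dev->pending.count < MC_MAX) {         /* 2) modify list A */
		memcpy(dev->pending.addr[dev->pending.count++], addr, ADDR_LEN);
		dev->sync_scheduled = 1;           /* 4) "schedule" the update */
		rc = 0;
	}
	pthread_mutex_unlock(&dev->lock);          /* 3) drop lock */
	return rc;
}

/* Sync path: steps 1-6 of the second list. The only place the driver is entered. */
void mc_dev_sync(struct mc_dev *dev)
{
	pthread_mutex_lock(&dev->rtnl);            /* 1) hold RTNL */
	pthread_mutex_lock(&dev->lock);            /* 2) hold lock L */
	dev->active = dev->pending;                /* 3) copy A to B */
	dev->sync_scheduled = 0;
	pthread_mutex_unlock(&dev->lock);          /* 4) drop lock L */
	if (dev->program_hw)
		dev->program_hw(&dev->active);     /* 5) call the hardware */
	pthread_mutex_unlock(&dev->rtnl);          /* 6) drop RTNL */
}
```

In this model the driver hook runs only under the outer mutex, so it is entered by one caller at a time, and it works on list B, which the add path never touches while the hook is running.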
Re: [PATCH 2/3][NET_BATCH] net core use batching
On Tue, 02 Oct 2007, jamal wrote:

> On Tue, 2007-02-10 at 00:25 -0400, Bill Fink wrote:
>
> > One reason I ask, is that on an earlier set of alternative batching
> > xmit patches by Krishna Kumar, his performance testing showed a 30 %
> > performance hit for TCP for a single process and a size of 4 KB, and
> > a performance hit of 5 % for a single process and a size of 16 KB
> > (a size of 8 KB wasn't tested). Unfortunately I was too busy at the
> > time to inquire further about it, but it would be a major potential
> > concern for me in my 10-GigE network testing with 9000-byte jumbo
> > frames. Of course the single process and 4 KB or larger size was
> > the only case that showed a significant performance hit in Krishna
> > Kumar's latest reported test results, so it might be acceptable to
> > just have a switch to disable the batching feature for that specific
> > usage scenario. So it would be useful to know if your xmit batching
> > changes would have similar issues.
>
> There were many times while testing that i noticed inconsistencies and
> in each case when i analysed[1], i found it to be due to some variable
> other than batching which needed some resolving, always via some
> parametrization or other. I suspect what KK posted is in the same class.
> To give you an example, with UDP, batching was giving worse results at
> around 256B compared to 64B or 512B; investigating i found that the
> receiver just wasnt able to keep up and the udp layer dropped a lot of
> packets so both iperf and netperf reported bad numbers. Fixing the
> receiver ended up with consistency coming back. On why 256B was the one
> that overwhelmed the receiver more than 64B(which sent more pps)? On
> some limited investigation, it seemed to me to be the effect of the
> choice of the tg3 driver's default tx mitigation parameters as well tx
> ring size; which is something i plan to revisit (but neutralizing it
> helps me focus on just batching). In the end i dropped both netperf and
> iperf for similar reasons and wrote my own app. What i am trying to
> achieve is demonstrate if batching is a GoodThing. In experimentation
> like this, it is extremely valuable to reduce the variables. Batching
> may expose other orthogonal issues - those need to be resolved or fixed
> as they are found. I hope that sounds sensible.

It does sound sensible. My own decidedly non-expert speculation was that the big 30 % performance hit right at 4 KB may be related to memory allocation issues or having to split the skb across multiple 4 KB pages. And perhaps it only affected the single process case because with multiple processes lock contention may be a bigger issue and the xmit batching changes would presumably help with that. I am admittedly a novice when it comes to the detailed internals of TCP/skb processing, although I have been slowly slogging my way through parts of the TCP kernel code to try and get a better understanding, so I don't know if these thoughts have any merit.

BTW does anyone know of a good book they would recommend that has substantial coverage of the Linux kernel TCP code, that's fairly up-to-date and gives both an overall view of the code and packet flow as well as details on individual functions and algorithms, and hopefully covers basic issues like locking and synchronization, concurrency of different parts of the stack, and memory allocation. I have several books already on Linux kernel and networking internals, but they seem to only cover the IP (and perhaps UDP) portions of the network stack, and none have more than a cursory reference to TCP. The most useful documentation on the Linux TCP stack that I have found thus far is some of Dave Miller's excellent web pages and a few other web references, but overall it seems fairly skimpy for such an important part of the Linux network code.

> Back to the >=9K packet size you raise above:
> I dont have a 10Gige card so iam theorizing. Given that theres an
> observed benefit to batching for a saturated link with "smaller" packets
> (in my results "small" is anything below 256B which maps to about
> 380Kpps anything above that seems to approach wire speed and the link is
> the bottleneck); then i theorize that 10Gige with 9K jumbo frames if
> already achieving wire rate, should continue to do so. And sizes below
> that will see improvements if they were not already hitting wire rate.
> So i would say that with 10G NICS, there will be more observed
> improvements with batching with apps that do bulk transfers (assuming
> those apps are not seeing wire speed already). Note that this hasnt been
> quiet the case even with TSO given the bottlenecks in the Linux
> receivers that J Heffner put nicely in a response to some results you
> posted - but that exposes an issue with Linux receivers rather than TSO.

It would be good to see some empirical evidence that there aren't any unforeseen gotchas for larger packet sizes, that at least the same level of performance can be obt
Re: [PATCH] sky2: jumbo frame regression fix
On Wed, 03 Oct 2007 03:34:34 +0200 Ian Kumlien <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> > Remove unneeded check that caused problems with jumbo frame sizes.
> > The check was recently added and is wrong.
> > When using jumbo frames the sky2 driver does fragmentation, so
> > rx_data_size is less than mtu.
>
> Confirmed working.
>
> Now running with 9k mtu with no errors, =)
>
> It also seems that the FIFO bug was the one that affected me before,
> damn odd race that one.

Does the workaround (forced reset) work? Ian, you are the first person to report triggering it. I haven't found a way to make it happen. What combination of flow control and speeds are you using?

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: [PATCH] sky2: jumbo frame regression fix
Stephen Hemminger wrote:
> On Tue, 02 Oct 2007 21:07:22 -0400 Jeff Garzik <[EMAIL PROTECTED]> wrote:
> > Stephen Hemminger wrote:
> > > Remove unneeded check that caused problems with jumbo frame sizes.
> > > The check was recently added and is wrong.
> > > When using jumbo frames the sky2 driver does fragmentation, so
> > > rx_data_size is less than mtu.
> > >
> > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> > >
> > > --- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
> > > +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
> > > @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> > >  	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> > >  	prefetch(sky2->rx_ring + sky2->rx_next);
> > >
> > > -	if (length < ETH_ZLEN || length > sky2->rx_data_size)
> > > -		goto len_error;
> > > -
> >
> > 2.6.23? 2.6.24? enquiring minds want to know...
>
> 2.6.23, since it is a regression

You can have regressions in behavior in net-2.6.24.git, too. _Please_ be specific about where you want your patches to go.

Thanks.

	Jeff
Re: [PATCH] sky2: jumbo frame regression fix
On Tue, 02 Oct 2007 21:07:22 -0400 Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > Remove unneeded check that caused problems with jumbo frame sizes.
> > The check was recently added and is wrong.
> > When using jumbo frames the sky2 driver does fragmentation, so
> > rx_data_size is less than mtu.
> >
> > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> >
> > --- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
> > +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
> > @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> >  	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> >  	prefetch(sky2->rx_ring + sky2->rx_next);
> >
> > -	if (length < ETH_ZLEN || length > sky2->rx_data_size)
> > -		goto len_error;
> > -
>
> 2.6.23? 2.6.24? enquiring minds want to know...

2.6.23, since it is a regression

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: Please pull 'upstream-davem' branch of wireless-2.6
Of course, these are intended for 2.6.24. Also, I forgot to mention that the individual patches are available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream-davem/

I also preserved the net-2.6.24 commit I based from as 'master-davem' in case you need it for reference.

Hth!

John

On Tue, Oct 02, 2007 at 09:25:52PM -0400, John W. Linville wrote:
> The following changes since commit d3adbde754a9ae7a6f87612055cb20db856f0721:
>   Ilpo Järvinen (1):
>         [TCP]: Wrap-safed reordering detection FRTO check
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-davem
>
> Daniel Drake (1):
>       hostap: set netdev type before registering AP interface
>
> Johannes Berg (9):
>       mac80211: add "invalid" interface type
>       mac80211: remove management interface
>       mac80211: move sta_process rx handler later
>       mac80211: consolidate decryption more
>       mac80211: use RX_FLAG_DECRYPTED for sw decrypted as well
>       mac80211: remove ALG_NONE
>       mac80211: improve radiotap injection
>       mac80211: make userspace-mlme a per-interface setting
>       mac80211: implement cfg80211's change_interface hook
>
> Michael Buesch (9):
>       rfkill: Add support for an rfkill LED.
>       rfkill: Add support for hardware-only rfkill buttons
>       b43: LED triggers support
>       b43: RF-kill support
>       b43: Use input-polldev for the rfkill switch
>       b43: Rewrite pwork locking policy.
>       mac80211: Check open_count before calling config callback.
>       mac80211: Add association LED trigger
>       mac80211: Update beacon_update callback documentation
>
> Tomas Winkler (1):
>       mac80211: add sta_notify callback
>
> Ulrich Kunitz (1):
>       zd1211rw: Removed zd_util.c and zd_util.h
>
>  Documentation/networking/mac80211-injection.txt | 32 ++-
>  drivers/net/wireless/adm8211.c | 8 +-
>  drivers/net/wireless/b43/Kconfig | 12 +
>  drivers/net/wireless/b43/Makefile | 5 +-
>  drivers/net/wireless/b43/b43.h | 11 +-
>  drivers/net/wireless/b43/leds.c | 399 ++-
>  drivers/net/wireless/b43/leds.h | 63 ++--
>  drivers/net/wireless/b43/main.c | 205
>  drivers/net/wireless/b43/phy.c | 13 +-
>  drivers/net/wireless/b43/phy.h | 2 +-
>  drivers/net/wireless/b43/rfkill.c | 184 +++
>  drivers/net/wireless/b43/rfkill.h | 58
>  drivers/net/wireless/hostap/hostap.h | 2 +-
>  drivers/net/wireless/hostap/hostap_hw.c | 2 +-
>  drivers/net/wireless/hostap/hostap_main.c | 19 +-
>  drivers/net/wireless/iwlwifi/iwl3945-base.c | 4 -
>  drivers/net/wireless/iwlwifi/iwl4965-base.c | 4 -
>  drivers/net/wireless/p54common.c | 4 +-
>  drivers/net/wireless/p54pci.c | 4 +-
>  drivers/net/wireless/rt2x00/rt2x00.h | 2 +-
>  drivers/net/wireless/zd1211rw/Makefile | 2 +-
>  drivers/net/wireless/zd1211rw/zd_chip.c | 1 -
>  drivers/net/wireless/zd1211rw/zd_mac.c | 4 +-
>  drivers/net/wireless/zd1211rw/zd_usb.c | 1 -
>  drivers/net/wireless/zd1211rw/zd_util.c | 82 -
>  drivers/net/wireless/zd1211rw/zd_util.h | 29 --
>  include/linux/rfkill.h | 24 ++
>  include/net/mac80211.h | 46 +++-
>  net/mac80211/cfg.c | 75 -
>  net/mac80211/ieee80211.c | 189 +---
>  net/mac80211/ieee80211_i.h | 17 +-
>  net/mac80211/ieee80211_iface.c | 68 +
>  net/mac80211/ieee80211_ioctl.c | 31 +-
>  net/mac80211/ieee80211_led.c | 67 +++-
>  net/mac80211/ieee80211_led.h | 6 +
>  net/mac80211/ieee80211_rate.c | 3 +-
>  net/mac80211/ieee80211_rate.h | 2 -
>  net/mac80211/ieee80211_sta.c | 7 +-
>  net/mac80211/key.c | 1 -
>  net/mac80211/rx.c | 122 +++-
>  net/mac80211/sta_info.c | 13 +-
>  net/mac80211/tx.c | 211 ++--
>  net/mac80211/wme.c | 10 +-
>  net/rfkill/Kconfig | 7 +
>  net/rfkill/rfkill.c | 49 +++-
>  45 files changed, 1022 insertions(+), 1078 deletions(-)
>  create mode 100644 drivers/net/wireless/b43/rfkill.c
>  create mode 100644 drivers/net/wireless/b43/rfkill.h
>  delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.c
>  delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.h
Re: Please pull 'upstream-davem' branch of wireless-2.6
From: "John W. Linville" <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 21:25:52 -0400

> git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-davem

This doesn't pull cleanly.

Probably you used a recently cloned Linus tree, pulled net-2.6.24 into that (and resolved the conflicts), and then put your patches in.

Please don't do it like that, I don't want to pull from a tree that has linus vs. net-2.6.24 conflict handling in it. That's why I usually rebase frequently, to minimize that as much as is humanly possible.

What you can do is figure out what linus's HEAD was at the last rebase (basically 'origin' or parent of net-2.6.24), clone that then pull in net-2.6.24, then add your patches.

That way I can always do a clean pull. My pull from Jeff today was very clean, for example.

I'll add these wireless bits by hand as patches. Thanks John.
Re: [PATCH] baycom epp header ops
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 17:41:03 -0700

> Update baycom epp driver for new header ops in net-2.6.24
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.
Please pull 'fixes-jgarzik' branch of wireless-2.6
The following changes since commit 3146b39c185f8a436d430132457e84fa1d8f8208:
  Linus Torvalds (1):
        Linux 2.6.23-rc9

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-jgarzik

Joe Perches (1):
      bcm43xx: Correct printk with PFX before KERN_

Richard Knutsson (1):
      softmac: Fix compiler-warning

 drivers/net/wireless/bcm43xx/bcm43xx_wx.c | 2 +-
 net/ieee80211/softmac/ieee80211softmac_wx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
index d6d9413..6acfdc4 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
@@ -444,7 +444,7 @@ static int bcm43xx_wx_set_xmitpower(struct net_device *net_dev,
 	u16 maxpower;
 	if ((data->txpower.flags & IW_TXPOW_TYPE) != IW_TXPOW_DBM) {
-		printk(PFX KERN_ERR "TX power not in dBm.\n");
+		printk(KERN_ERR PFX "TX power not in dBm.\n");
 		return -EOPNOTSUPP;
 	}
diff --git a/net/ieee80211/softmac/ieee80211softmac_wx.c b/net/ieee80211/softmac/ieee80211softmac_wx.c
index 442b987..5742dc8 100644
--- a/net/ieee80211/softmac/ieee80211softmac_wx.c
+++ b/net/ieee80211/softmac/ieee80211softmac_wx.c
@@ -114,7 +114,7 @@ check_assoc_again:
 	sm->associnfo.associating = 1;
 	/* queue lower level code to do work (if necessary) */
 	schedule_delayed_work(&sm->associnfo.work, 0);
-out:
+
 	mutex_unlock(&sm->associnfo.mutex);
 	return 0;
-- 
John W. Linville
[EMAIL PROTECTED]
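Why the order of KERN_ERR and PFX matters can be shown with a small standalone sketch. In kernels of this era KERN_ERR expands to the string literal "<3>", and the log level is only recognized when that marker sits at the very start of the message. The helper parse_loglevel() below is a hypothetical simplification written for this illustration, not the kernel's parser, and the PFX text is an assumed value.

```c
#include <stddef.h>

#define KERN_ERR "<3>"     /* level marker as a string literal */
#define PFX "bcm43xx: "    /* driver prefix; exact text is an assumption */

/* Hypothetical, simplified level parser: returns 0-7 when the message
 * begins with "<N>", or -1 when no level marker leads the message. */
int parse_loglevel(const char *msg)
{
	if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
		return msg[1] - '0';
	return -1;
}
```

With adjacent string literals concatenated by the compiler, parse_loglevel(KERN_ERR PFX "TX power not in dBm.\n") sees "<3>bcm43xx: ..." and finds level 3, while parse_loglevel(PFX KERN_ERR "TX power not in dBm.\n") sees "bcm43xx: <3>..." and finds no level, so the marker would end up as literal text in the message.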
Please pull 'upstream-davem' branch of wireless-2.6
The following changes since commit d3adbde754a9ae7a6f87612055cb20db856f0721:
  Ilpo Järvinen (1):
        [TCP]: Wrap-safed reordering detection FRTO check

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-davem

Daniel Drake (1):
      hostap: set netdev type before registering AP interface

Johannes Berg (9):
      mac80211: add "invalid" interface type
      mac80211: remove management interface
      mac80211: move sta_process rx handler later
      mac80211: consolidate decryption more
      mac80211: use RX_FLAG_DECRYPTED for sw decrypted as well
      mac80211: remove ALG_NONE
      mac80211: improve radiotap injection
      mac80211: make userspace-mlme a per-interface setting
      mac80211: implement cfg80211's change_interface hook

Michael Buesch (9):
      rfkill: Add support for an rfkill LED.
      rfkill: Add support for hardware-only rfkill buttons
      b43: LED triggers support
      b43: RF-kill support
      b43: Use input-polldev for the rfkill switch
      b43: Rewrite pwork locking policy.
      mac80211: Check open_count before calling config callback.
      mac80211: Add association LED trigger
      mac80211: Update beacon_update callback documentation

Tomas Winkler (1):
      mac80211: add sta_notify callback

Ulrich Kunitz (1):
      zd1211rw: Removed zd_util.c and zd_util.h

 Documentation/networking/mac80211-injection.txt | 32 ++-
 drivers/net/wireless/adm8211.c | 8 +-
 drivers/net/wireless/b43/Kconfig | 12 +
 drivers/net/wireless/b43/Makefile | 5 +-
 drivers/net/wireless/b43/b43.h | 11 +-
 drivers/net/wireless/b43/leds.c | 399 ++-
 drivers/net/wireless/b43/leds.h | 63 ++--
 drivers/net/wireless/b43/main.c | 205
 drivers/net/wireless/b43/phy.c | 13 +-
 drivers/net/wireless/b43/phy.h | 2 +-
 drivers/net/wireless/b43/rfkill.c | 184 +++
 drivers/net/wireless/b43/rfkill.h | 58
 drivers/net/wireless/hostap/hostap.h | 2 +-
 drivers/net/wireless/hostap/hostap_hw.c | 2 +-
 drivers/net/wireless/hostap/hostap_main.c | 19 +-
 drivers/net/wireless/iwlwifi/iwl3945-base.c | 4 -
 drivers/net/wireless/iwlwifi/iwl4965-base.c | 4 -
 drivers/net/wireless/p54common.c | 4 +-
 drivers/net/wireless/p54pci.c | 4 +-
 drivers/net/wireless/rt2x00/rt2x00.h | 2 +-
 drivers/net/wireless/zd1211rw/Makefile | 2 +-
 drivers/net/wireless/zd1211rw/zd_chip.c | 1 -
 drivers/net/wireless/zd1211rw/zd_mac.c | 4 +-
 drivers/net/wireless/zd1211rw/zd_usb.c | 1 -
 drivers/net/wireless/zd1211rw/zd_util.c | 82 -
 drivers/net/wireless/zd1211rw/zd_util.h | 29 --
 include/linux/rfkill.h | 24 ++
 include/net/mac80211.h | 46 +++-
 net/mac80211/cfg.c | 75 -
 net/mac80211/ieee80211.c | 189 +---
 net/mac80211/ieee80211_i.h | 17 +-
 net/mac80211/ieee80211_iface.c | 68 +
 net/mac80211/ieee80211_ioctl.c | 31 +-
 net/mac80211/ieee80211_led.c | 67 +++-
 net/mac80211/ieee80211_led.h | 6 +
 net/mac80211/ieee80211_rate.c | 3 +-
 net/mac80211/ieee80211_rate.h | 2 -
 net/mac80211/ieee80211_sta.c | 7 +-
 net/mac80211/key.c | 1 -
 net/mac80211/rx.c | 122 +++-
 net/mac80211/sta_info.c | 13 +-
 net/mac80211/tx.c | 211 ++--
 net/mac80211/wme.c | 10 +-
 net/rfkill/Kconfig | 7 +
 net/rfkill/rfkill.c | 49 +++-
 45 files changed, 1022 insertions(+), 1078 deletions(-)
 create mode 100644 drivers/net/wireless/b43/rfkill.c
 create mode 100644 drivers/net/wireless/b43/rfkill.h
 delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.c
 delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.h

Omnibus patch attached as upstream-davem.patch.bz2

-- 
John W. Linville
[EMAIL PROTECTED]
Re: [PATCH] sky2: jumbo frame regression fix
On Tue, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> Remove unneeded check that caused problems with jumbo frame sizes.
> The check was recently added and is wrong.
> When using jumbo frames the sky2 driver does fragmentation, so
> rx_data_size is less than mtu.

Confirmed working.

Now running with 9k mtu with no errors, =)

It also seems that the FIFO bug was the one that affected me before, damn odd race that one.

> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Tested-by: Ian Kumlien <[EMAIL PROTECTED]>
(if that tag exists now)

Btw, sorry, but all mail sent directly to you will be blocked. I have yet to fix the relaying properly with ISPs blocking port 25 etc., so for some of you this mail will only show up on the ML.

> --- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
> +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
> @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
>  	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
>  	prefetch(sky2->rx_ring + sky2->rx_next);
>
> -	if (length < ETH_ZLEN || length > sky2->rx_data_size)
> -		goto len_error;
> -
>  	/* This chip has hardware problems that generates bogus status.
>  	 * So do only marginal checking and expect higher level protocols
>  	 * to handle crap frames.

-- 
Ian Kumlien -- http://pomac.netswarm.net
Re: [PATCH] sky2: jumbo frame regression fix
Stephen Hemminger wrote:
> Remove unneeded check that caused problems with jumbo frame sizes.
> The check was recently added and is wrong.
> When using jumbo frames the sky2 driver does fragmentation, so
> rx_data_size is less than mtu.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>
> --- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
> +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
> @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
>  	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
>  	prefetch(sky2->rx_ring + sky2->rx_next);
>
> -	if (length < ETH_ZLEN || length > sky2->rx_data_size)
> -		goto len_error;
> -

2.6.23? 2.6.24? enquiring minds want to know...
[PATCH] sky2: jumbo frame regression fix
Remove unneeded check that caused problems with jumbo frame sizes.
The check was recently added and is wrong.
When using jumbo frames the sky2 driver does fragmentation, so
rx_data_size is less than mtu.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
+++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
@@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
 	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
 	prefetch(sky2->rx_ring + sky2->rx_next);

-	if (length < ETH_ZLEN || length > sky2->rx_data_size)
-		goto len_error;
-
 	/* This chip has hardware problems that generates bogus status.
 	 * So do only marginal checking and expect higher level protocols
 	 * to handle crap frames.
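To see why the removed check misfires on jumbo frames, here is a minimal standalone restatement of it. ETH_ZLEN (60) is the standard minimum Ethernet frame length without the FCS; the helper name old_length_ok() and the 2048-byte buffer size used in the note below are assumptions made for illustration, not values taken from the driver.

```c
#include <stdbool.h>

#define ETH_ZLEN 60 /* minimum Ethernet frame length, excluding FCS */

/* Hypothetical restatement of the removed check: true means the frame
 * would have been accepted, false means it would have hit len_error. */
bool old_length_ok(unsigned int length, unsigned int rx_data_size)
{
	return !(length < ETH_ZLEN || length > rx_data_size);
}
```

If the driver splits a received jumbo frame across several buffers, the total frame length can legitimately exceed the size of one buffer. With an assumed per-buffer size of 2048, old_length_ok(9000, 2048) is false, so a valid 9000-byte frame would have been rejected, which is consistent with the regression the patch describes.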
[PATCH] baycom epp header ops
Update baycom epp driver for new header ops in net-2.6.24

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/hamradio/baycom_epp.c | 3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index 355c6cf..1a5a75a 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -1159,8 +1159,7 @@ static void baycom_probe(struct net_device *dev)
 	/* Fill in the fields of the device structure */
 	bc->skb = NULL;
-	dev->hard_header = ax25_hard_header;
-	dev->rebuild_header = ax25_rebuild_header;
+	dev->header_ops = &ax25_header_ops;
 	dev->set_mac_address = baycom_set_mac_address;
 	dev->type = ARPHRD_AX25; /* AF_AX25 device */
-- 
1.5.2.5
Kernel 2.4 vs 2.6 Traffic Controller Performance
Hello,

I hope this is the right place to ask this. Does anyone know if there is a substantial difference in the performance of the traffic controller between kernel 2.4 and 2.6? We tested it using 1 iperf server with 250 and 500 clients, altering the burst. We used the top command to check the idle time of our router. With the 2.4 kernel we saw around 65-70% idle time, while with 2.6 we saw 60-65% idle time. We tried to use MRTG, but we are not getting any results from it either.

We want to know if we could improve the bandwidth by upgrading the kernel; otherwise we would have to get a new bandwidth manager. Has anyone run a similar test, or can anyone suggest a better way to do this?

Thanks in advance.
Re: [git patches] net driver updates
From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 13:41:50 -0400

> Please pull from the 'upstream' branch of
> master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream

Pulled and pushed back out to net-2.6.24, thanks Jeff!
Re: [PATCH 2.6.24 3/3][BNX2]: Update version to 1.6.6.
Michael Chan wrote:
> [BNX2]: Update version to 1.6.6.
>
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

ACK patches 1-3
Re: [PATCH 2.6.24 3/3][BNX2]: Update version to 1.6.6.
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 17:24:06 -0700

> [BNX2]: Update version to 1.6.6.
>
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Also applied to net-2.6.24, thanks!
Re: [PATCH 2.6.24 2/3][BNX2]: Optimize firmware loading.
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 17:23:43 -0700

> [BNX2]: Optimize firmware loading.
>
> This is a follow up to the patches from Denys Vlasenkos
> <[EMAIL PROTECTED]> to further optimize firmware loading.
>
> 1. In bnx2_init_cpus(), we allocate memory for decompression once
> and use it repeatedly instead of doing this for every firmware image.
>
> 2. We eliminate the BSS and SBSS firmware sections in bnx2_fw*.h since
> these are always zeros.
>
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Applied, thanks for following up on this Michael.
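Point 1 of the quoted description (allocate the decompression buffer once and reuse it for every image) follows a common C pattern: one allocation up front, one shared buffer inside the loop, and a single exit label that frees it. The sketch below is a userspace illustration under stated assumptions, not the bnx2 code: struct image, inflate_stub() (which only copies bytes and stands in for real decompression), load_all(), and the FW_BUF_SIZE value are all hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define FW_BUF_SIZE 4096 /* hypothetical scratch size for the sketch */

struct image {
	const unsigned char *data;
	size_t len;
};

/* Stand-in for decompression: copies the image into buf.
 * Returns the number of bytes produced, or -1 if it does not fit. */
static int inflate_stub(unsigned char *buf, size_t cap, const struct image *img)
{
	if (img->len > cap)
		return -1;
	memcpy(buf, img->data, img->len);
	return (int)img->len;
}

/* Loads every image through one shared scratch buffer.
 * Returns 0 on success or -1 on failure; *loaded counts bytes handled. */
int load_all(const struct image *imgs, size_t n, size_t *loaded)
{
	unsigned char *buf = malloc(FW_BUF_SIZE); /* allocated once */
	int rc = 0;
	size_t i;

	*loaded = 0;
	if (!buf)
		return -1;

	for (i = 0; i < n; i++) {
		rc = inflate_stub(buf, FW_BUF_SIZE, &imgs[i]);
		if (rc < 0)
			goto out; /* no per-image cleanup needed */
		*loaded += (size_t)rc;
		rc = 0;
	}
out:
	free(buf); /* freed once, on both success and error paths */
	return rc;
}
```

Because every error path jumps to the same label, the buffer cannot be leaked or freed twice, and the per-image allocate/free pair disappears from the loop.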
Re: [PATCH 2.6.24 1/3][BNX2]: Add missing napi_disable() in bnx2_close().
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 17:23:09 -0700

> [BNX2]: Add missing napi_disable() in bnx2_close().
>
> bnx2_close() -> bnx2_netif_stop() will not call napi_disable() because
> the netif_state is not running in bnx2_close(). To avoid confusion,
> we change it to disable interrupt and napi directly in bnx2_close().
>
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Applied to net-2.6.24
[PATCH 2.6.24 3/3][BNX2]: Update version to 1.6.6.
[BNX2]: Update version to 1.6.6.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index c50e4c8..db14f35 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -56,8 +56,8 @@
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.6.5"
-#define DRV_MODULE_RELDATE	"September 20, 2007"
+#define DRV_MODULE_VERSION	"1.6.6"
+#define DRV_MODULE_RELDATE	"October 2, 2007"
 #define RUN_AT(x) (jiffies + (x))
[PATCH 2.6.24 2/3][BNX2]: Optimize firmware loading.
[BNX2]: Optimize firmware loading.

This is a follow up to the patches from Denys Vlasenkos <[EMAIL PROTECTED]> to further optimize firmware loading.

1. In bnx2_init_cpus(), we allocate memory for decompression once and use it repeatedly instead of doing this for every firmware image.

2. We eliminate the BSS and SBSS firmware sections in bnx2_fw*.h since these are always zeros.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4887c31..c50e4c8 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2810,21 +2810,16 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_reg *cpu_reg, struct fw_info *fw)
 	/* Load the Text area. */
 	offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base);
 	if (fw->gz_text) {
-		u32 *text;
 		int j;

-		text = vmalloc(FW_BUF_SIZE);
-		if (!text)
-			return -ENOMEM;
-		rc = zlib_inflate_blob(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len);
-		if (rc < 0) {
-			vfree(text);
+		rc = zlib_inflate_blob(fw->text, FW_BUF_SIZE, fw->gz_text,
+				       fw->gz_text_len);
+		if (rc < 0)
 			return rc;
-		}
+
 		for (j = 0; j < (fw->text_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
+			REG_WR_IND(bp, offset, cpu_to_le32(fw->text[j]));
 		}
-		vfree(text);
 	}

 	/* Load the Data area. */
@@ -2839,21 +2834,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_reg *cpu_reg, struct fw_info *fw)

 	/* Load the SBSS area. */
 	offset = cpu_reg->spad_base + (fw->sbss_addr - cpu_reg->mips_view_base);
-	if (fw->sbss) {
+	if (fw->sbss_len) {
 		int j;

 		for (j = 0; j < (fw->sbss_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, fw->sbss[j]);
+			REG_WR_IND(bp, offset, 0);
 		}
 	}

 	/* Load the BSS area. */
 	offset = cpu_reg->spad_base + (fw->bss_addr - cpu_reg->mips_view_base);
-	if (fw->bss) {
+	if (fw->bss_len) {
 		int j;

 		for (j = 0; j < (fw->bss_len/4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, fw->bss[j]);
+			REG_WR_IND(bp, offset, 0);
 		}
 	}

@@ -2894,19 +2889,16 @@ bnx2_init_cpus(struct bnx2 *bp)
 	if (!text)
 		return -ENOMEM;
 	rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
-	if (rc < 0) {
-		vfree(text);
+	if (rc < 0)
 		goto init_cpu_err;
-	}
+
 	load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1);

 	rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2));
-	if (rc < 0) {
-		vfree(text);
+	if (rc < 0)
 		goto init_cpu_err;
-	}
+
 	load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC2);
-	vfree(text);

 	/* Initialize the RX Processor. */
 	cpu_reg.mode = BNX2_RXP_CPU_MODE;
@@ -2927,6 +2919,7 @@ bnx2_init_cpus(struct bnx2 *bp)
 	else
 		fw = &bnx2_rxp_fw_06;

+	fw->text = text;
 	rc = load_cpu_fw(bp, &cpu_reg, fw);
 	if (rc)
 		goto init_cpu_err;
@@ -2950,6 +2943,7 @@ bnx2_init_cpus(struct bnx2 *bp)
 	else
 		fw = &bnx2_txp_fw_06;

+	fw->text = text;
 	rc = load_cpu_fw(bp, &cpu_reg, fw);
 	if (rc)
 		goto init_cpu_err;
@@ -2973,6 +2967,7 @@ bnx2_init_cpus(struct bnx2 *bp)
 	else
 		fw = &bnx2_tpat_fw_06;

+	fw->text = text;
 	rc = load_cpu_fw(bp, &cpu_reg, fw);
 	if (rc)
 		goto init_cpu_err;
@@ -2996,6 +2991,7 @@ bnx2_init_cpus(struct bnx2 *bp)
 	else
 		fw = &bnx2_com_fw_06;

+	fw->text = text;
 	rc = load_cpu_fw(bp, &cpu_reg, fw);
 	if (rc)
 		goto init_cpu_err;
@@ -3017,11 +3013,13 @@ bnx2_init_cpus(struct bnx2 *bp)
 	if (CHIP_NUM(bp) == CHIP_NUM_5709) {
 		fw = &bnx2_cp_fw_09;

+		fw->text = text;
 		rc = load_cpu_fw(bp, &cpu_reg, fw);
 		if (rc)
 			goto init_cpu_err;
 	}
 init_cpu_err:
+	vfree(text);
 	return rc;
 }

diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index a717459..56c190f 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6738,7 +6738,7 @@ struct fw_info {
 	const u32 text_addr;
 	const u32 text_len;
 	const u32 text_index;
-/*	u32 *text;*/
+	u32 *text;
 	u8 *gz_text;
 	const u32 gz_text_len;

@@ -6752,13 +6752,11 @@ struct fw_info {
 	const
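The allocate-once idea in point 1 of the patch description can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fw_image`, `fake_inflate`, `load_one`, `load_all` and the `FW_BUF_SIZE` value are hypothetical stand-ins, `malloc`/`free` replace `vmalloc`/`vfree`, and the "decompression" is a plain copy rather than zlib.

```c
#include <stdlib.h>
#include <string.h>

#define FW_BUF_SIZE 4096	/* hypothetical size, not the driver's value */

struct fw_image {		/* stand-in for the driver's struct fw_info */
	const unsigned char *blob;	/* "compressed" image */
	size_t blob_len;
	unsigned char *text;		/* scratch buffer lent by the caller */
};

/* Stand-in for the decompressor: copies the blob, fails if it does not fit. */
static int fake_inflate(unsigned char *dst, size_t dst_len,
			const unsigned char *src, size_t src_len)
{
	if (src_len > dst_len)
		return -1;
	memcpy(dst, src, src_len);
	return (int)src_len;
}

/* Per-image loader: uses the buffer it was handed, never allocates. */
static int load_one(struct fw_image *fw)
{
	int rc = fake_inflate(fw->text, FW_BUF_SIZE, fw->blob, fw->blob_len);

	return rc < 0 ? rc : 0;
}

/*
 * Allocate the scratch buffer once, lend it to every image in turn,
 * and free it on the single exit path. *allocs counts allocations.
 */
static int load_all(struct fw_image *imgs, size_t n, int *allocs)
{
	unsigned char *text = malloc(FW_BUF_SIZE);
	int rc = 0;
	size_t i;

	if (!text)
		return -1;
	(*allocs)++;

	for (i = 0; i < n; i++) {
		imgs[i].text = text;
		rc = load_one(&imgs[i]);
		if (rc)
			goto out;
	}
out:
	for (i = 0; i < n; i++)
		imgs[i].text = NULL;	/* do not leave dangling pointers */
	free(text);
	return rc;
}
```

Because every error path funnels through one label, the per-call cleanup that the patch deletes from the loader is no longer needed in the sketch either.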
[PATCH 2.6.24 1/3][BNX2]: Add missing napi_disable() in bnx2_close().
[BNX2]: Add missing napi_disable() in bnx2_close().

bnx2_close() -> bnx2_netif_stop() will not call napi_disable() because the netif_state is not running in bnx2_close(). To avoid confusion, we change it to disable interrupt and napi directly in bnx2_close().

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index cd5f1b7..4887c31 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5212,8 +5212,8 @@ bnx2_close(struct net_device *dev)
 	while (bp->in_reset_task)
 		msleep(1);

-	/* This does napi_disable() for us. */
-	bnx2_netif_stop(bp);
+	bnx2_disable_int_sync(bp);
+	napi_disable(&bp->napi);
 	del_timer_sync(&bp->timer);
 	if (bp->flags & NO_WOL_FLAG)
 		reset_code = BNX2_DRV_MSG_CODE_UNLOAD_LNK_DN;
Re: tcp bw in 2.6
Larry McVoy wrote:
> On Tue, Oct 02, 2007 at 03:32:16PM -0700, David Miller wrote:
>> I'm starting to have a theory about what the bad case might
>> be.
>>
>> A strong sender going to an even stronger receiver which can
>> pull out packets into the process as fast as they arrive.
>> This might be part of what keeps the receive window from
>> growing.
>
> I can back you up on that. When I straced the receiving side that goes
> slowly, all the reads were short, like 1-2K. The way that works the reads
> were a lot larger as I recall.

Indeed I was getting more like 8K on each recv() call per netperf's -v 2 stats, but the system was more than fast enough to stay ahead of the traffic. On the hunch that it was the interrupt throttling which was keeping the recv's large rather than the speed of the system(s) I nuked the InterruptThrottleRate to 0 and was able to get between 1900 and 2300 byte recvs on the TCP_STREAM and TCP_MAERTS tests and still had 940 Mbit/s in each direction.

hpcpc106:~# netperf -H 192.168.7.107 -t TCP_STREAM -v 2 -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.7.107 (192.168.7.107) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  87380  87380    10.02       940.95   10.75    21.65    3.743   7.540

Alignment      Offset         Bytes      Bytes       Sends   Bytes      Recvs
Local  Remote  Local  Remote  Xfered     Per                 Per
Send   Recv    Send   Recv               Send (avg)          Recv (avg)
    8       8      0       0  1.179e+09  87386.29    13491   1965.77    599729

Maximum
Segment
Size (bytes)
  1448

hpcpc106:~# netperf -H 192.168.7.107 -t TCP_MAERTS -v 2 -c -C
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.7.107 (192.168.7.107) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  87380  87380    10.02       940.82   20.44    10.61    7.117   3.696

Alignment      Offset         Bytes      Bytes       Recvs   Bytes      Sends
Local  Remote  Local  Remote  Xfered     Per                 Per
Recv   Send    Recv   Send               Recv (avg)          Send (avg)
    8       8      0       0  1.178e+09  2352.26     500931  87380.00   13485

Maximum
Segment
Size (bytes)
  1448

The systems above had four 1.6 GHz cores; netperf reports CPU as 0 to 100% regardless of core count.

And then my systems with the 3.0 GHz cores:

[EMAIL PROTECTED] netperf2_trunk]# netperf -H sweb20 -v 2 -t TCP_STREAM -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to sweb20.cup.hp.com (16.89.133.20) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.03       941.37   6.40     13.26    2.229   4.615

Alignment      Offset         Bytes      Bytes       Sends   Bytes      Recvs
Local  Remote  Local  Remote  Xfered     Per                 Per
Send   Recv    Send   Recv               Send (avg)          Recv (avg)
    8       8      0       0  1.18e+09   16384.06    72035   1453.85    811793

Maximum
Segment
Size (bytes)
  1448

[EMAIL PROTECTED] netperf2_trunk]# netperf -H sweb20 -v 2 -t TCP_MAERTS -c -C
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to sweb20.cup.hp.com (16.89.133.20) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.03       941.35   12.13    5.80     4.221   2.018

Alignment      Offset         Bytes      Bytes       Recvs   Bytes      Sends
Local  Remote  Local  Remote  Xfered     Per                 Per
Recv   Send    Recv   Send               Recv (avg)          Send (avg)
    8       8      0       0  1.181e+09  1452.38     812953  16384.00   72065

Maximum
Segment
Size (bytes)
  1448

rick jones
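For readers checking the "Bytes Per Recv (avg)" column against the other figures, it appears to be simply bytes transferred divided by the number of recv() calls. The helper below is a hypothetical illustration (the name `avg_bytes_per_call` is not from netperf), and because the "Bytes Xfered" column is printed rounded (1.179e+09), the result only matches the reported value to within about one byte.

```c
/*
 * Average bytes moved per send()/recv() call.
 * Returns 0.0 when no calls were made, to avoid dividing by zero.
 */
static double avg_bytes_per_call(double bytes_xfered, unsigned long calls)
{
	return calls ? bytes_xfered / (double)calls : 0.0;
}
```

With the first TCP_STREAM run above, `avg_bytes_per_call(1.179e9, 599729)` lands just under 1966, in line with the reported 1965.77.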
Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address
Brian,

I don't think a few instructions is a performance issue in the redirect paths (it'd be pretty broken if you're getting or generating lots of them), but I know there are lots of other checks similar to that that will break with new attributes, so doing that as a general clean-up separately is ok with me, too. With the error message changes, you can add:

Acked-by: David L Stevens <[EMAIL PROTECTED]>

FWIW. :-)

+-DLS
Re: tcp bw in 2.6
On Tue, Oct 02, 2007 at 03:32:16PM -0700, David Miller wrote:
> I'm starting to have a theory about what the bad case might
> be.
>
> A strong sender going to an even stronger receiver which can
> pull out packets into the process as fast as they arrive.
> This might be part of what keeps the receive window from
> growing.

I can back you up on that. When I straced the receiving side that goes slowly, all the reads were short, like 1-2K. The way that works the reads were a lot larger as I recall.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com
Re: [PATCH][TG3]Some cleanups
On Tue, 2007-10-02 at 08:37 -0400, jamal wrote:
> The simplest solution seems to me to modify the definition of TG3_SKB_CB
> as i did for e1000 from:
> (struct tg3_tx_cbdata *)&((__skb)->cb[0])
> to:
> (struct tg3_tx_cbdata *)&((__skb)->cb[8])
>
> that way the vlan tags are always present and no need to recreate them.
> What do you think?

Seems ok to me. I think we should make it more clear that we're skipping over the VLAN tag:

(struct tg3_tx_cbdata *)&((__skb)->cb[sizeof(struct vlan_skb_tx_cookie)])
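The difference between the magic offset 8 and the sizeof() form can be shown with a small userspace model. Everything here is a hypothetical stand-in: the `*_sketch` types only imitate the shape of the kernel structures, the two-u32 cookie layout and the 48-byte cb[] size are assumptions, and the macro name is invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the VLAN cookie: two 32-bit fields (8 bytes). */
struct vlan_cookie_sketch {
	uint32_t magic;
	uint32_t vlan_tag;
};

/* Made-up driver-private record that wants to live in cb[] too. */
struct tx_cbdata_sketch {
	uint32_t base_flags;
	uint32_t mss;
};

/* Models only the scratch area of an skb; 48 is an assumed size. */
struct skb_sketch {
	_Alignas(8) char cb[48];
};

/* Driver data starts right after the VLAN cookie instead of at cb[0]. */
#define SKETCH_SKB_CB(skb) \
	((struct tx_cbdata_sketch *)&((skb)->cb[sizeof(struct vlan_cookie_sketch)]))

/* Byte offset of the driver record inside cb[], for checking the layout. */
static size_t sketch_cb_offset(struct skb_sketch *skb)
{
	return (size_t)((char *)SKETCH_SKB_CB(skb) - skb->cb);
}

/* Both records must fit in the scratch area. */
_Static_assert(sizeof(struct vlan_cookie_sketch) +
	       sizeof(struct tx_cbdata_sketch) <= sizeof(((struct skb_sketch *)0)->cb),
	       "cookie plus driver data must fit in cb[]");
```

Under these assumptions the offset comes out as 8, the same as the literal in jamal's version, but it would follow the cookie automatically if its size ever changed.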
Re: tcp bw in 2.6
From: Rick Jones <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 15:17:35 -0700

> Stranger still, with a mix of a 2.6.23-rc5ish kernel and a net-2.6.24 one
> (pulled oh middle of last week?) I get link-rate and I see no asymmetry between
> TCP_STREAM and TCP_MAERTS over an "e1000" link with no switch or tg3 with a
> ProCurve on my rx2660's.
>
> I can also run bw_tcp from lmbench 3.0a8 and get 106 MB/s.
>
> I don't have a netgear switch to try in all this...

I'm starting to have a theory about what the bad case might be.

A strong sender going to an even stronger receiver which can pull out packets into the process as fast as they arrive. This might be part of what keeps the receive window from growing.
Re: tcp bw in 2.6
David Miller wrote:
> From: [EMAIL PROTECTED] (Larry McVoy)
> Date: Tue, 2 Oct 2007 14:26:08 -0700
>
>> And note that sky2 doesn't have this problem. Does the broadcom do TSO?
>> And sky2 not? I noticed a much higher CPU load for sky2.
>
> Yes the broadcoms (the revisions I have) do TSO and it is enabled
> on both sides.
>
> Which makes the mis-matched performance even stranger :)

Stranger still, with a mix of a 2.6.23-rc5ish kernel and a net-2.6.24 one (pulled oh middle of last week?) I get link-rate and I see no asymmetry between TCP_STREAM and TCP_MAERTS over an "e1000" link with no switch or tg3 with a ProCurve on my rx2660's.

I can also run bw_tcp from lmbench 3.0a8 and get 106 MB/s.

I don't have a netgear switch to try in all this...

rick jones
Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 14:52:36 -0700

> Please consider using netif_msg_xxx() and module parameter to set
> default message level, like other real network drivers already do.

I keep seeing this recommendation, but the two supposedly most mature and actively used drivers in the tree, tg3 and e1000 and e1000e, all do not use this scheme.

In fact there are tons of drivers that even hook up the ethtool msg_level setting function and never even use the value.

If people aren't using netif_msg_xxx() and the ethtool msg_level facilities properly, it's because there is a severe dearth of good example drivers to learn about it from.
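Since the complaint is a lack of examples, here is a rough userspace model of how a message-level bitmap of this kind is usually derived from a "debug" module parameter and then tested. It is a sketch, not the kernel's code: the `sketch_` names are invented, and the bit values and the "debug level N enables the low N bits" rule are written from memory of the netif_msg helpers, so treat them as assumptions.

```c
#include <stdint.h>

/* Assumed message-class bits, modelled on the NETIF_MSG_* values. */
enum {
	SKETCH_MSG_DRV   = 0x0001,
	SKETCH_MSG_PROBE = 0x0002,
	SKETCH_MSG_LINK  = 0x0004,
};

/*
 * Turn a "debug" module parameter into a bitmap:
 *   out of range (e.g. -1) -> driver default bits
 *   0                      -> no output
 *   N                      -> low N bits set
 */
static uint32_t sketch_msg_init(int debug, uint32_t default_bits)
{
	if (debug < 0 || debug >= 32)
		return default_bits;
	if (debug == 0)
		return 0;
	return ((uint32_t)1 << debug) - 1;
}

/* Models a netif_msg_link()-style test on the stored bitmap. */
static int sketch_msg_link(uint32_t msg_enable)
{
	return (msg_enable & SKETCH_MSG_LINK) != 0;
}
```

A driver following this scheme would guard a "no link" printk with the link test, so that the message can be switched on or off at load time or later through ethtool's msg_level setting.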
Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver
On Tue, 02 Oct 2007 23:02:53 +0200 Oliver Hartkopp <[EMAIL PROTECTED]> wrote:

> Arnaldo Carvalho de Melo wrote:
> > On Tue, Oct 02, 2007 at 03:10:11PM +0200, Urs Thuermann wrote:
> >
> >> +
> >> +#ifdef CONFIG_CAN_DEBUG_DEVICES
> >> +static int debug;
> >> +module_param(debug, int, S_IRUGO);
> >> +#endif
> >>
> >
> > Can debug be a boolean? Like its counterpart on DCCP:
> >
> > net/dccp/proto.c:
> >
> > module_param(dccp_debug, bool, 0444);
> >
>
> 'debug' should remain an integer to be able to specify debug-levels or
> bit-fields for different Debug outputs.
>
> > Where we also use a namespace prefix, for those of us who use ctags or
> > cscope.
> >
>
> Even if i don't have any general objections to rename this 'debug' to
> 'vcan_debug', it looks like an 'overnamed' module parameter for me. Is
> this a general naming scheme recommendation for debug module_params?
>

Please consider using netif_msg_xxx() and module parameter to set default message level, like other real network drivers already do.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: tcp bw in 2.6
On Tue, 2 Oct 2007, Wayne Scott wrote:
>
> The slow set was done like this:
>
> on ia64: netcat -l -p > /dev/null
> on work: netcat ia64 < /dev/zero

That sounds wrong. Larry claims the slow case is when the side that did "accept()" does the sending, the above has the listener just reading.

> The fast set was done like this:
>
> on work: netcat -l -p > /dev/null
> on ia64: netcat ia64 < /dev/zero

This one is guaranteed wrong too, since you have the listener reading (fine), but the sender now doesn't go over the network at all, but sends to itself.

That said, let's assume that only your description was bogus, the TCP dumps themselves are ok.

I find the window scaling differences interesting. This is the opening of the fast sequence from the receiver:

13:35:13.929349 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: S 2592471184:2592471184(0) ack 3363219397 win 5792
13:35:13.929702 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 1449 win 68
13:35:13.929712 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 2897 win 91
13:35:13.929724 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 4345 win 114
13:35:13.929941 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 5793 win 136
13:35:13.929951 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 7241 win 159
13:35:13.929960 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 8689 win 181
13:35:13.929970 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 10137 win 204
13:35:13.929981 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 11585 win 227
13:35:13.929992 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 13033 win 249
13:35:13.930331 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 14481 win 272
...

ie we use a window scale of 7, and we started with a window of 5792 bytes, and after ten packets it has grown to 272<<7 (34816) bytes.

The slow case is

13:34:16.761034 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: S 3299922549:3299922549(0) ack 2548837296 win 5792
13:34:16.761533 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 1449 win 2172
13:34:16.761553 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 2897 win 2896
13:34:16.761782 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 4345 win 3620
13:34:16.761908 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 5793 win 4344
13:34:16.761916 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 7241 win 5068
13:34:16.762157 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 8689 win 5792
13:34:16.762164 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 10137 win 6516
13:34:16.762283 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 11585 win 7240
13:34:16.762290 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 13033 win 7964
13:34:16.762303 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 14481 win 8688
...

so after the same ten packets, it too has grown to about the same size (8688<<2 = 34752 bytes).

But the slow case has a smaller window scale, and it actually stops opening the window at that point: the window stays at 8688<<2 for a long time (and eventually grows to 9412<<2 and then 16652<<2 in the steady case, and is basically limited at that 66kB window size).

But the fast one that had a window scale of 7 can keep growing, and will do so quite aggressively. It grows the window to (1442<<7 = 180kB) in the first fifty packets.

But in your dump, it doesn't seem to be about who is listening and who is connecting. It seems to be about the fact that your machine 10.3.1.10 uses a window scale of 2, while 10.3.1.1 uses a scale of 7.

		Linus
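The window arithmetic in the message above is just the 16-bit "win" field from each ACK shifted left by the scale negotiated at connection setup (the value in the SYN itself is not scaled). A tiny helper, with a hypothetical name, makes the figures easy to re-check:

```c
#include <stdint.h>

/*
 * Effective receive window in bytes: the 16-bit window field from a
 * non-SYN segment, shifted left by the peer's advertised window scale.
 */
static uint32_t window_bytes(uint16_t win_field, unsigned int wscale)
{
	return (uint32_t)win_field << wscale;
}
```

For example, `window_bytes(272, 7)` gives 34816 and `window_bytes(8688, 2)` gives 34752, matching the two ten-packet figures; `window_bytes(16652, 2)` is 66608 (the "66kB" limit) and `window_bytes(1442, 7)` is 184576 (about 180kB).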
Re: [patch 05/13] skge: remove broken and unused PHY_M_PC_MDI_XMODE macro
Stephen Hemminger wrote:
> On Tue, 02 Oct 2007 14:11:38 -0700 [EMAIL PROTECTED] wrote:
>
>> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
>>
>> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
>> Cc: Stephen Hemminger <[EMAIL PROTECTED]>
>> Cc: Jeff Garzik <[EMAIL PROTECTED]>
>> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>> ---
>
> Already in netdev tree isn't it?

Yep.

	Jeff
Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver
From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 18:43:25 -0300

> I think that helping ctags to find the definition for the debug variable
> to see, for instance, if it is a bitmask or a boolean without having to
> chose from tons of 'debug' variables is a good thing.

I completely agree.
Re: [patch 09/13] forcedeth: "no link" is informational
On Tue, 02 Oct 2007 14:11:41 -0700 [EMAIL PROTECTED] wrote:

> From: "Ed Swierk" <[EMAIL PROTECTED]>
>
> Log "no link during initialization" at KERN_INFO as it's not an error, and
> occurs every time the interface comes up (when the forcedeth-phy-power-down
> patch is applied).
>
> Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
> Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
> Cc: Jeff Garzik <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  drivers/net/forcedeth.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -puN drivers/net/forcedeth.c~forcedeth-no-link-is-informational drivers/net/forcedeth.c
> --- a/drivers/net/forcedeth.c~forcedeth-no-link-is-informational
> +++ a/drivers/net/forcedeth.c
> @@ -4921,7 +4921,7 @@ static int nv_open(struct net_device *de
>  	if (ret) {
>  		netif_carrier_on(dev);
>  	} else {
> -		printk("%s: no link during initialization.\n", dev->name);
> +		printk(KERN_INFO "%s: no link during initialization.\n", dev->name);
>  		netif_carrier_off(dev);
>  	}
>  	if (oom)

Driver should use netif_msg_link_up()

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: 2.6.23-rc8-mm2 - tcp_fastretrans_alert() WARNING
> On Tue, 2 Oct 2007, Ilpo Järvinen wrote:
>
> > I'm currently out of ideas where it could come from...

Hmm, there seems to be off-by-one in tcp_retrans_try_collapse after all, or in fact, two of them. I'll post patch for this tomorrow...

-- 
 i.
Re: tcp bw in 2.6
From: [EMAIL PROTECTED] (Larry McVoy)
Date: Tue, 2 Oct 2007 14:26:08 -0700

> And note that sky2 doesn't have this problem. Does the broadcom do TSO?
> And sky2 not? I noticed a much higher CPU load for sky2.

Yes the broadcoms (the revisions I have) do TSO and it is enabled on both sides.

Which makes the mis-matched performance even stranger :)
Re: [patch 05/13] skge: remove broken and unused PHY_M_PC_MDI_XMODE macro
On Tue, 02 Oct 2007 14:11:38 -0700 [EMAIL PROTECTED] wrote:

> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
>
> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
> Cc: Stephen Hemminger <[EMAIL PROTECTED]>
> Cc: Jeff Garzik <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>

Already in netdev tree isn't it?

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver
On Tue, Oct 02, 2007 at 11:02:53PM +0200, Oliver Hartkopp wrote:
> Arnaldo Carvalho de Melo wrote:
>> On Tue, Oct 02, 2007 at 03:10:11PM +0200, Urs Thuermann wrote:
>>
>>> +
>>> +#ifdef CONFIG_CAN_DEBUG_DEVICES
>>> +static int debug;
>>> +module_param(debug, int, S_IRUGO);
>>> +#endif
>>>
>>
>> Can debug be a boolean? Like its counterpart on DCCP:
>>
>> net/dccp/proto.c:
>>
>> module_param(dccp_debug, bool, 0444);
>>
>
> 'debug' should remain an integer to be able to specify debug-levels or
> bit-fields for different Debug outputs.
>
>> Where we also use a namespace prefix, for those of us who use ctags or
>> cscope.
>>
>
> Even if i don't have any general objections to rename this 'debug' to
> 'vcan_debug', it looks like an 'overnamed' module parameter for me. Is this
> a general naming scheme recommendation for debug module_params?

[EMAIL PROTECTED] linux-2.6.23-rc9-rt1]$ find . -name "*.c" | xargs grep 'module_param(.\+debug,' | wc -l
112
[EMAIL PROTECTED] linux-2.6.23-rc9-rt1]$ find . -name "*.c" | xargs grep 'module_param(debug,' | wc -l
233
[EMAIL PROTECTED] linux-2.6.23-rc9-rt1]$

I think that helping ctags to find the definition for the debug variable to see, for instance, if it is a bitmask or a boolean without having to choose from tons of 'debug' variables is a good thing.

- Arnaldo
Re: tcp bw in 2.6
From: [EMAIL PROTECTED] (Larry McVoy)
Date: Tue, 2 Oct 2007 11:40:32 -0700

> I doubt it, the same test works fine in one direction and poorly in the other.
> Wouldn't the flow control squelch either way?

HW controls for these things are typically:

1) Generates flow control frames
2) Listens for them

So you can have flow control operational in one direction and not the other.
Re: tcp bw in 2.6
On Tue, Oct 02, 2007 at 02:16:56PM -0700, David Miller wrote:
> We absolutely depend upon people like you to report when there are
> anomalies like this. It's the only thing that scales.

Well cool, finally doing something useful :)

Is this issue no test setup? Because this does seem like something we'd want to have work well.

> FWIW I have a t1000 Niagara box and an Ultra45 going through a netgear
> gigabit switch. I'm getting 85MB/sec in one direction and 10MB/sec in
> the other (using bw_tcp from lmbench3).

Note that bw_tcp mucks with SND/RCVBUF. It probably shouldn't, it's been 12 years since that code went in there and I dunno if it is still needed.

> Both are using identical
> broadcom tigon3 gigabit chips and identical current kernels so that is
> a truly strange result.
>
> I'll investigate, it may be the same thing you're seeing.

Wow, sounds very similar. In my case I was seeing pretty close to 3x consistently. You're more like 8x, but I was all e1000 not broadcom.

And note that sky2 doesn't have this problem. Does the broadcom do TSO? And sky2 not? I noticed a much higher CPU load for sky2.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com
[PATCH] [10/11] pasemi_mac: use buffer index pointer in clean_rx()
pasemi_mac: use buffer index pointer in clean_rx()

Use the new features in B0 for buffer ring index on the receive side. This means we no longer have to search in the ring for where the buffer came from.

Also cleanup the RX cleaning side a little, while I was at it.

Note: Pre-B0 hardware is no longer supported, and needs a pile of other workarounds that are not being submitted for mainline inclusion. So the fact that this breaks old hardware is not a problem at this time.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -243,9 +243,9 @@ static int pasemi_mac_setup_rx_resources
 		      PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3));

 	write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac->dma_if),
-		      PAS_DMA_RXINT_CFG_DHL(3) |
-		      PAS_DMA_RXINT_CFG_L2 |
-		      PAS_DMA_RXINT_CFG_LW);
+		      PAS_DMA_RXINT_CFG_DHL(3) | PAS_DMA_RXINT_CFG_L2 |
+		      PAS_DMA_RXINT_CFG_LW | PAS_DMA_RXINT_CFG_RBP |
+		      PAS_DMA_RXINT_CFG_HEN);

 	ring->next_to_fill = 0;
 	ring->next_to_clean = 0;
@@ -402,13 +402,12 @@ static void pasemi_mac_free_rx_resources
 static void pasemi_mac_replenish_rx_ring(struct net_device *dev, int limit)
 {
 	struct pasemi_mac *mac = netdev_priv(dev);
-	int start = mac->rx->next_to_fill;
-	unsigned int fill, count;
+	int fill, count;

 	if (limit <= 0)
 		return;

-	fill = start;
+	fill = mac->rx->next_to_fill;
 	for (count = 0; count < limit; count++) {
 		struct pasemi_mac_buffer *info = &RX_RING_INFO(mac, fill);
 		u64 *buff = &RX_BUFF(mac, fill);
@@ -446,10 +445,10 @@ static void pasemi_mac_replenish_rx_ring

 	wmb();

-	write_dma_reg(mac, PAS_DMA_RXCHAN_INCR(mac->dma_rxch), count);
 	write_dma_reg(mac, PAS_DMA_RXINT_INCR(mac->dma_if), count);

-	mac->rx->next_to_fill += count;
+	mac->rx->next_to_fill = (mac->rx->next_to_fill + count) &
+				(RX_RING_SIZE - 1);
 }

 static void pasemi_mac_restart_rx_intr(struct pasemi_mac *mac)
@@ -517,15 +516,19 @@ static int pasemi_mac_clean_rx(struct pa
 	int count;
 	struct pasemi_mac_buffer *info;
 	struct sk_buff *skb;
-	unsigned int i, len;
+	unsigned int len;
 	u64 macrx;
 	dma_addr_t dma;
+	int buf_index;
+	u64 eval;

 	spin_lock(&mac->rx->lock);

 	n = mac->rx->next_to_clean;

-	for (count = limit; count; count--) {
+	prefetch(RX_RING(mac, n));
+
+	for (count = 0; count < limit; count++) {
 		macrx = RX_RING(mac, n);

 		if ((macrx & XCT_MACRX_E) ||
@@ -537,21 +540,14 @@ static int pasemi_mac_clean_rx(struct pa

 		info = NULL;

-		/* We have to scan for our skb since there's no way
-		 * to back-map them from the descriptor, and if we
-		 * have several receive channels then they might not
-		 * show up in the same order as they were put on the
-		 * interface ring.
-		 */
+		BUG_ON(!(macrx & XCT_MACRX_RR_8BRES));

-		dma = (RX_RING(mac, n+1) & XCT_PTR_ADDR_M);
-		for (i = mac->rx->next_to_fill;
-		     i < (mac->rx->next_to_fill + RX_RING_SIZE);
-		     i++) {
-			info = &RX_RING_INFO(mac, i);
-			if (info->dma == dma)
-				break;
-		}
+		eval = (RX_RING(mac, n+1) & XCT_RXRES_8B_EVAL_M) >>
+			XCT_RXRES_8B_EVAL_S;
+		buf_index = eval-1;
+
+		dma = (RX_RING(mac, n+2) & XCT_PTR_ADDR_M);
+		info = &RX_RING_INFO(mac, buf_index);

 		skb = info->skb;

@@ -600,9 +596,9 @@ static int pasemi_mac_clean_rx(struct pa
 		/* Need to zero it out since hardware doesn't, since the
 		 * replenish loop uses it to tell when it's done.
 		 */
-		RX_BUFF(mac, i) = 0;
+		RX_BUFF(mac, buf_index) = 0;

-		n += 2;
+		n += 4;
 	}

 	if (n > RX_RING_SIZE) {
@@ -610,8 +606,16 @@ static int pasemi_mac_clean_rx(struct pa
 		write_iob_reg(mac, PAS_IOB_COM_PKTHDRCNT, 0);
 		n &= (RX_RING_SIZE-1);
 	}
+
 	mac->rx->next_to_clean = n;
-	pasemi_mac_replenish_rx_ring(mac->netdev, limit-count);
+
+	/* Increase is in number of 16-byte entries, and since each descriptor
+	 * with an 8BRES takes up 3x8 bytes (padded to 4x8), increase with
+	 * count*2.
+	 */
+	write_dma_reg(mac, PAS_DMA_RXCHAN_INCR(mac->dma_rxch), c
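The replenish hunk in this patch replaces an unbounded "+= count" with a masked add, which keeps the fill index inside the ring as long as the ring size is a power of two. A minimal sketch of that wrap, using a hypothetical 8-entry ring and invented names rather than the driver's RX_RING_SIZE:

```c
/* Hypothetical ring size for illustration; the mask trick needs a power of two. */
#define SKETCH_RING_SIZE 8u

_Static_assert((SKETCH_RING_SIZE & (SKETCH_RING_SIZE - 1)) == 0,
	       "ring size must be a power of two for masking to work");

/* Advance a ring index by count entries, wrapping at the ring size. */
static unsigned int ring_advance(unsigned int idx, unsigned int count)
{
	return (idx + count) & (SKETCH_RING_SIZE - 1);
}
```

With an 8-entry ring, advancing index 6 by 3 wraps to 1, and advancing by a full ring length returns to the starting index.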
[PATCH] [11/11] pasemi_mac: enable iommu support
[PATCH] [8/11] pasemi_mac: update todo list
pasemi_mac: update todo list

Remove some stale todo items that have been taken care of. Add a couple
of upcoming ones.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: 2.6.23/drivers/net/pasemi_mac.c
===================================================================
--- 2.6.23.orig/drivers/net/pasemi_mac.c
+++ 2.6.23/drivers/net/pasemi_mac.c
@@ -46,12 +46,10 @@
 
 /* TODO list
  *
- * - Get rid of pci_{read,write}_config(), map registers with ioremap
- *   for performance
- * - PHY support
  * - Multicast support
  * - Large MTU support
- * - Other performance improvements
+ * - SW LRO
+ * - Multiqueue RX/TX
  */
[PATCH] [9/11] pasemi_mac: clear out old errors on interface open
pasemi_mac: clear out old errors on interface open Clear out any pending errors when an interface is brought up. Since the bits are sticky, they might be from interface shutdown time after firmware has used it, etc. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: k.org/drivers/net/pasemi_mac.c === --- k.org.orig/drivers/net/pasemi_mac.c +++ k.org/drivers/net/pasemi_mac.c @@ -903,16 +903,27 @@ static int pasemi_mac_open(struct net_de /* enable rx if */ write_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if), - PAS_DMA_RXINT_RCMDSTA_EN); + PAS_DMA_RXINT_RCMDSTA_EN | + PAS_DMA_RXINT_RCMDSTA_DROPS_M | + PAS_DMA_RXINT_RCMDSTA_BP | + PAS_DMA_RXINT_RCMDSTA_OO | + PAS_DMA_RXINT_RCMDSTA_BT); /* enable rx channel */ write_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch), PAS_DMA_RXCHAN_CCMDSTA_EN | - PAS_DMA_RXCHAN_CCMDSTA_DU); + PAS_DMA_RXCHAN_CCMDSTA_DU | + PAS_DMA_RXCHAN_CCMDSTA_OD | + PAS_DMA_RXCHAN_CCMDSTA_FD | + PAS_DMA_RXCHAN_CCMDSTA_DT); /* enable tx channel */ write_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch), - PAS_DMA_TXCHAN_TCMDSTA_EN); + PAS_DMA_TXCHAN_TCMDSTA_EN | + PAS_DMA_TXCHAN_TCMDSTA_SZ | + PAS_DMA_TXCHAN_TCMDSTA_DB | + PAS_DMA_TXCHAN_TCMDSTA_DE | + PAS_DMA_TXCHAN_TCMDSTA_DA); pasemi_mac_replenish_rx_ring(dev, RX_RING_SIZE); @@ -987,7 +998,7 @@ out_rx_resources: static int pasemi_mac_close(struct net_device *dev) { struct pasemi_mac *mac = netdev_priv(dev); - unsigned int stat; + unsigned int sta; int retries; if (mac->phydev) { @@ -998,6 +1009,26 @@ static int pasemi_mac_close(struct net_d netif_stop_queue(dev); napi_disable(&mac->napi); + sta = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if)); + if (sta & (PAS_DMA_RXINT_RCMDSTA_BP | + PAS_DMA_RXINT_RCMDSTA_OO | + PAS_DMA_RXINT_RCMDSTA_BT)) + printk(KERN_DEBUG "pasemi_mac: rcmdsta error: 0x%08x\n", sta); + + sta = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch)); + if (sta & (PAS_DMA_RXCHAN_CCMDSTA_DU | +PAS_DMA_RXCHAN_CCMDSTA_OD | +PAS_DMA_RXCHAN_CCMDSTA_FD | 
+PAS_DMA_RXCHAN_CCMDSTA_DT)) + printk(KERN_DEBUG "pasemi_mac: ccmdsta error: 0x%08x\n", sta); + + sta = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch)); + if (sta & (PAS_DMA_TXCHAN_TCMDSTA_SZ | + PAS_DMA_TXCHAN_TCMDSTA_DB | + PAS_DMA_TXCHAN_TCMDSTA_DE | + PAS_DMA_TXCHAN_TCMDSTA_DA)) + printk(KERN_DEBUG "pasemi_mac: tcmdsta error: 0x%08x\n", sta); + /* Clean out any pending buffers */ pasemi_mac_clean_tx(mac); pasemi_mac_clean_rx(mac, RX_RING_SIZE); @@ -1008,33 +1039,33 @@ static int pasemi_mac_close(struct net_d write_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch), PAS_DMA_RXCHAN_CCMDSTA_ST); for (retries = 0; retries < MAX_RETRIES; retries++) { - stat = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch)); - if (!(stat & PAS_DMA_TXCHAN_TCMDSTA_ACT)) + sta = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch)); + if (!(sta & PAS_DMA_TXCHAN_TCMDSTA_ACT)) break; cond_resched(); } - if (stat & PAS_DMA_TXCHAN_TCMDSTA_ACT) + if (sta & PAS_DMA_TXCHAN_TCMDSTA_ACT) dev_err(&mac->dma_pdev->dev, "Failed to stop tx channel\n"); for (retries = 0; retries < MAX_RETRIES; retries++) { - stat = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch)); - if (!(stat & PAS_DMA_RXCHAN_CCMDSTA_ACT)) + sta = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch)); + if (!(sta & PAS_DMA_RXCHAN_CCMDSTA_ACT)) break; cond_resched(); } - if (stat & PAS_DMA_RXCHAN_CCMDSTA_ACT) + if (sta & PAS_DMA_RXCHAN_CCMDSTA_ACT) dev_err(&mac->dma_pdev->dev, "Failed to stop rx channel\n"); for (retries = 0; retries < MAX_RETRIES; retries++) { - stat = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if)); - if (!(stat & PAS_DMA_RXINT_RCMDSTA_ACT)) + sta = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if)); + if (!(sta & PAS_DMA_RXINT_R
[PATCH] [6/11] pasemi_mac: add local skb alignment
pasemi_mac: add local skb alignment Add local SKB alignment to pasemi_mac, since ppc64 in general has it at 0 because of design flaws in some of the IBM server bridge chips. However, for PWRficient doing the unaligned copies is more expensive than doing unaligned DMA so make sure the data is aligned instead. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: k.org/drivers/net/pasemi_mac.c === --- k.org.orig/drivers/net/pasemi_mac.c +++ k.org/drivers/net/pasemi_mac.c @@ -37,6 +37,12 @@ #include "pasemi_mac.h" +/* We have our own align, since ppc64 in general has it at 0 because + * of design flaws in some of the server bridge chips. However, for + * PWRficient doing the unaligned copies is more expensive than doing + * unaligned DMA, so make sure the data is aligned instead. + */ +#define LOCAL_SKB_ALIGN2 /* TODO list * @@ -409,13 +415,16 @@ static void pasemi_mac_replenish_rx_ring /* skb might still be in there for recycle on short receives */ if (info->skb) skb = info->skb; - else + else { skb = dev_alloc_skb(BUF_SIZE); + skb_reserve(skb, LOCAL_SKB_ALIGN); + } if (unlikely(!skb)) break; - dma = pci_map_single(mac->dma_pdev, skb->data, BUF_SIZE, + dma = pci_map_single(mac->dma_pdev, skb->data, +BUF_SIZE - LOCAL_SKB_ALIGN, PCI_DMA_FROMDEVICE); if (unlikely(dma_mapping_error(dma))) { @@ -553,10 +562,12 @@ static int pasemi_mac_clean_rx(struct pa len = (macrx & XCT_MACRX_LLEN_M) >> XCT_MACRX_LLEN_S; if (len < 256) { - struct sk_buff *new_skb = - netdev_alloc_skb(mac->netdev, len + NET_IP_ALIGN); + struct sk_buff *new_skb; + + new_skb = netdev_alloc_skb(mac->netdev, + len + LOCAL_SKB_ALIGN); if (new_skb) { - skb_reserve(new_skb, NET_IP_ALIGN); + skb_reserve(new_skb, LOCAL_SKB_ALIGN); memcpy(new_skb->data, skb->data, len); /* save the skb in buffer_info as good */ skb = new_skb; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
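A small sketch of why a 2-byte reserve aligns the payload, assuming an untagged Ethernet frame (14-byte header) and a buffer whose start is at least 4-byte aligned. The names are hypothetical and only illustrate the arithmetic; they are not taken from the driver.

```c
#include <assert.h>

/* Hypothetical constant for this sketch: dst MAC + src MAC + ethertype. */
#define ETH_HEADER_LEN 14u

/* Offset of the IP header from the buffer start when 'reserve' bytes are skipped. */
unsigned int ip_header_offset(unsigned int reserve)
{
	return reserve + ETH_HEADER_LEN;
}

/* Nonzero if the IP header would start on a 4-byte boundary. */
int ip_header_aligned(unsigned int reserve)
{
	unsigned int off = ip_header_offset(reserve);

	assert(off >= ETH_HEADER_LEN);
	return (off % 4u) == 0;
}
```

With no reserve the IP header lands at offset 14, which is not a multiple of 4. With a reserve of 2 it lands at offset 16, so the stack reads an aligned header at the cost of the device writing to an unaligned DMA address.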
[PATCH] [7/11] pasemi_mac: further performance tweaks
pasemi_mac: further performance tweaks Misc driver tweaks for pasemi_mac: * Increase ring size (really needed mostly on 10G) * Take out an unneeded barrier * Move around a few prefetches and reorder a few calls * Don't try to clean on full tx buffer, just let things take their course and stop the queue directly * Avoid filling on the same line as the interface is working on to reduce cache line bouncing * Avoid unneeded clearing of software state (and make the interface shutdown code handle it) * Fix up some of the tx ring wrap logic. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: k.org/drivers/net/pasemi_mac.c === --- k.org.orig/drivers/net/pasemi_mac.c +++ k.org/drivers/net/pasemi_mac.c @@ -56,8 +56,8 @@ /* Must be a power of two */ -#define RX_RING_SIZE 512 -#define TX_RING_SIZE 512 +#define RX_RING_SIZE 4096 +#define TX_RING_SIZE 4096 #define DEFAULT_MSG_ENABLE \ (NETIF_MSG_DRV | \ @@ -336,8 +336,16 @@ static void pasemi_mac_free_tx_resources struct pasemi_mac_buffer *info; dma_addr_t dmas[MAX_SKB_FRAGS+1]; int freed; + int start, limit; - for (i = 0; i < TX_RING_SIZE; i += freed) { + start = mac->tx->next_to_clean; + limit = mac->tx->next_to_fill; + + /* Compensate for when fill has wrapped and clean has not */ + if (start > limit) + limit += TX_RING_SIZE; + + for (i = start; i < limit; i += freed) { info = &TX_RING_INFO(mac, i+1); if (info->dma && info->skb) { for (j = 0; j <= skb_shinfo(info->skb)->nr_frags; j++) @@ -520,9 +528,6 @@ static int pasemi_mac_clean_rx(struct pa n = mac->rx->next_to_clean; for (count = limit; count; count--) { - - rmb(); - macrx = RX_RING(mac, n); if ((macrx & XCT_MACRX_E) || @@ -550,14 +555,10 @@ static int pasemi_mac_clean_rx(struct pa break; } - prefetchw(info); - skb = info->skb; - prefetchw(skb); - info->dma = 0; - pci_unmap_single(mac->dma_pdev, dma, skb->len, -PCI_DMA_FROMDEVICE); + prefetch(skb); + prefetch(&skb->data_len); len = (macrx & XCT_MACRX_LLEN_M) >> XCT_MACRX_LLEN_S; @@ -576,10 +577,9 @@ static int 
pasemi_mac_clean_rx(struct pa } else info->skb = NULL; - /* Need to zero it out since hardware doesn't, since the -* replenish loop uses it to tell when it's done. -*/ - RX_BUFF(mac, i) = 0; + pci_unmap_single(mac->dma_pdev, dma, len, PCI_DMA_FROMDEVICE); + + info->dma = 0; skb_put(skb, len); @@ -599,6 +599,11 @@ static int pasemi_mac_clean_rx(struct pa RX_RING(mac, n) = 0; RX_RING(mac, n+1) = 0; + /* Need to zero it out since hardware doesn't, since the +* replenish loop uses it to tell when it's done. +*/ + RX_BUFF(mac, i) = 0; + n += 2; } @@ -621,27 +626,33 @@ static int pasemi_mac_clean_rx(struct pa static int pasemi_mac_clean_tx(struct pasemi_mac *mac) { int i, j; - struct pasemi_mac_buffer *info; - unsigned int start, descr_count, buf_count, limit; + unsigned int start, descr_count, buf_count, batch_limit; + unsigned int ring_limit; unsigned int total_count; unsigned long flags; struct sk_buff *skbs[TX_CLEAN_BATCHSIZE]; dma_addr_t dmas[TX_CLEAN_BATCHSIZE][MAX_SKB_FRAGS+1]; total_count = 0; - limit = TX_CLEAN_BATCHSIZE; + batch_limit = TX_CLEAN_BATCHSIZE; restart: spin_lock_irqsave(&mac->tx->lock, flags); start = mac->tx->next_to_clean; + ring_limit = mac->tx->next_to_fill; + + /* Compensate for when fill has wrapped but clean has not */ + if (start > ring_limit) + ring_limit += TX_RING_SIZE; buf_count = 0; descr_count = 0; for (i = start; -descr_count < limit && i < mac->tx->next_to_fill; +descr_count < batch_limit && i < ring_limit; i += buf_count) { u64 mactx = TX_RING(mac, i); + struct sk_buff *skb; if ((mactx & XCT_MACTX_E) || (*mac->tx_status & PAS_STATUS_ERROR)) @@ -651,19 +662,15 @@ restart: /* Not yet transmitted */ break; - info = &TX_RING_INFO(mac, i+1); - skbs[descr_count] = info->skb; + skb = TX_RING_INFO(mac, i+1).skb; +
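The wrap compensation used in this patch ("fill has wrapped but clean has not") can be sketched in isolation. This assumes both indices are kept reduced to the range [0, ring size) and that the ring size is a power of two. The ring size of 8 and the function names are hypothetical, chosen to keep the example small.

```c
#include <assert.h>

/* Hypothetical ring size for this sketch; must be a power of two. */
#define RING_SIZE 8u

/*
 * Exclusive upper bound for a scan that starts at 'clean' and ends at
 * 'fill'. When fill has wrapped and clean has not, fill is numerically
 * smaller than clean, so one ring length is added to it.
 */
unsigned int scan_limit(unsigned int clean, unsigned int fill)
{
	assert(clean < RING_SIZE && fill < RING_SIZE);
	return (clean > fill) ? fill + RING_SIZE : fill;
}

/* Map an index that may exceed the ring size back to a slot, as a masking macro would. */
unsigned int ring_slot(unsigned int idx)
{
	return idx & (RING_SIZE - 1);
}

/* Number of entries between clean and fill. */
unsigned int pending_entries(unsigned int clean, unsigned int fill)
{
	return scan_limit(clean, fill) - clean;
}
```

A loop of the form `for (i = clean; i < scan_limit(clean, fill); i++)` then visits every pending entry, with `ring_slot(i)` selecting the actual slot once `i` runs past the end of the ring.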
[PATCH] [4/11] pasemi_mac: implement sg support
pasemi_mac: implement sg support Implement SG support for pasemi_mac Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: k.org/drivers/net/pasemi_mac.c === --- k.org.orig/drivers/net/pasemi_mac.c +++ k.org/drivers/net/pasemi_mac.c @@ -160,6 +160,30 @@ static int pasemi_get_mac_addr(struct pa return 0; } +static int pasemi_mac_unmap_tx_skb(struct pasemi_mac *mac, + struct sk_buff *skb, + dma_addr_t *dmas) +{ + int f; + int nfrags = skb_shinfo(skb)->nr_frags; + + pci_unmap_single(mac->dma_pdev, dmas[0], skb_headlen(skb), +PCI_DMA_TODEVICE); + + for (f = 0; f < nfrags; f++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[f]; + + pci_unmap_page(mac->dma_pdev, dmas[f+1], frag->size, + PCI_DMA_TODEVICE); + } + dev_kfree_skb_irq(skb); + + /* Freed descriptor slot + main SKB ptr + nfrags additional ptrs, +* aligned up to a power of 2 +*/ + return (nfrags + 3) & ~1; +} + static int pasemi_mac_setup_rx_resources(struct net_device *dev) { struct pasemi_mac_rxring *ring; @@ -300,24 +324,24 @@ out_ring: static void pasemi_mac_free_tx_resources(struct net_device *dev) { struct pasemi_mac *mac = netdev_priv(dev); - unsigned int i; + unsigned int i, j; struct pasemi_mac_buffer *info; + dma_addr_t dmas[MAX_SKB_FRAGS+1]; + int freed; - for (i = 0; i < TX_RING_SIZE; i += 2) { + for (i = 0; i < TX_RING_SIZE; i += freed) { info = &TX_RING_INFO(mac, i+1); if (info->dma && info->skb) { - pci_unmap_single(mac->dma_pdev, -info->dma, -info->skb->len, -PCI_DMA_TODEVICE); - dev_kfree_skb_any(info->skb); - } - TX_RING(mac, i) = 0; - TX_RING(mac, i+1) = 0; - info->dma = 0; - info->skb = NULL; + for (j = 0; j <= skb_shinfo(info->skb)->nr_frags; j++) + dmas[j] = TX_RING_INFO(mac, i+1+j).dma; + freed = pasemi_mac_unmap_tx_skb(mac, info->skb, dmas); + } else + freed = 2; } + for (i = 0; i < TX_RING_SIZE; i++) + TX_RING(mac, i) = 0; + dma_free_coherent(&mac->dma_pdev->dev, TX_RING_SIZE * sizeof(u64), mac->tx->ring, mac->tx->dma); @@ -573,27 +597,34 @@ static int pasemi_mac_clean_rx(struct pa 
return count; } +/* Can't make this too large or we blow the kernel stack limits */ +#define TX_CLEAN_BATCHSIZE (128/MAX_SKB_FRAGS) + static int pasemi_mac_clean_tx(struct pasemi_mac *mac) { - int i; + int i, j; struct pasemi_mac_buffer *info; - unsigned int start, count, limit; + unsigned int start, descr_count, buf_count, limit; unsigned int total_count; unsigned long flags; - struct sk_buff *skbs[32]; - dma_addr_t dmas[32]; + struct sk_buff *skbs[TX_CLEAN_BATCHSIZE]; + dma_addr_t dmas[TX_CLEAN_BATCHSIZE][MAX_SKB_FRAGS+1]; total_count = 0; + limit = TX_CLEAN_BATCHSIZE; restart: spin_lock_irqsave(&mac->tx->lock, flags); start = mac->tx->next_to_clean; - limit = min(mac->tx->next_to_fill, start+32); - count = 0; + buf_count = 0; + descr_count = 0; - for (i = start; i < limit; i += 2) { + for (i = start; +descr_count < limit && i < mac->tx->next_to_fill; +i += buf_count) { u64 mactx = TX_RING(mac, i); + if ((mactx & XCT_MACTX_E) || (*mac->tx_status & PAS_STATUS_ERROR)) pasemi_mac_tx_error(mac, mactx); @@ -603,30 +634,38 @@ restart: break; info = &TX_RING_INFO(mac, i+1); - skbs[count] = info->skb; - dmas[count] = info->dma; + skbs[descr_count] = info->skb; + + buf_count = 2 + skb_shinfo(info->skb)->nr_frags; + for (j = 0; j <= skb_shinfo(info->skb)->nr_frags; j++) + dmas[descr_count][j] = TX_RING_INFO(mac, i+1+j).dma; + info->dma = 0; TX_RING(mac, i) = 0; TX_RING(mac, i+1) = 0; + TX_RING_INFO(mac, i+1).skb = 0; + TX_RING_INFO(mac, i+1).dma = 0; - - count++; + /* Since we always fill with an even number of entries, make +* sure we skip any unused one at the end as well. +*/ + if (buf_count & 1) +
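A worked sketch of the slot accounting returned by pasemi_mac_unmap_tx_skb(), using only the expression `(nfrags + 3) & ~1` from the patch. The function name is hypothetical. Note that the expression rounds up to an even number of slots, which is not always a power of two (three fragments give six slots).

```c
#include <assert.h>

/*
 * Ring slots consumed by one packet: one descriptor word, one pointer
 * for the linear part of the skb, and one pointer per fragment, rounded
 * up to an even count. Equivalent to (nfrags + 3) & ~1.
 */
unsigned int tx_slots_for(unsigned int nfrags)
{
	unsigned int used = 1 + 1 + nfrags;      /* descriptor + head + frags */
	unsigned int slots = (used + 1) & ~1u;   /* round up to the next even number */

	assert(slots % 2 == 0 && slots >= used);
	return slots;
}
```

A packet without fragments therefore frees 2 slots, one or two fragments free 4, and three or four fragments free 6.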
[PATCH] [5/11] pasemi_mac: workaround for erratum 5971
pasemi_mac: workaround for erratum 5971 Implement workarounds for erratum 5971, where L2 hints aren't considered properly unless the way hint is enabled on the interface. Since L2 isn't setup to dedicate a way to headers, we need to reset the packet count by hand so it won't run out of credits. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: k.org/drivers/net/pasemi_mac.c === --- k.org.orig/drivers/net/pasemi_mac.c +++ k.org/drivers/net/pasemi_mac.c @@ -239,7 +239,9 @@ static int pasemi_mac_setup_rx_resources PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3)); write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac->dma_if), - PAS_DMA_RXINT_CFG_DHL(2)); + PAS_DMA_RXINT_CFG_DHL(3) | + PAS_DMA_RXINT_CFG_L2 | + PAS_DMA_RXINT_CFG_LW); ring->next_to_fill = 0; ring->next_to_clean = 0; @@ -589,6 +591,11 @@ static int pasemi_mac_clean_rx(struct pa n += 2; } + if (n > RX_RING_SIZE) { + /* Errata 5971 workaround: L2 target of headers */ + write_iob_reg(mac, PAS_IOB_COM_PKTHDRCNT, 0); + n &= (RX_RING_SIZE-1); + } mac->rx->next_to_clean = n; pasemi_mac_replenish_rx_ring(mac->netdev, limit-count); Index: k.org/drivers/net/pasemi_mac.h === --- k.org.orig/drivers/net/pasemi_mac.h +++ k.org/drivers/net/pasemi_mac.h @@ -210,6 +210,8 @@ enum { #definePAS_DMA_RXINT_CFG_DHL_S 24 #definePAS_DMA_RXINT_CFG_DHL(x)(((x) << PAS_DMA_RXINT_CFG_DHL_S) & \ PAS_DMA_RXINT_CFG_DHL_M) +#definePAS_DMA_RXINT_CFG_LW0x0020 +#definePAS_DMA_RXINT_CFG_L20x0010 #definePAS_DMA_RXINT_CFG_WIF 0x0002 #definePAS_DMA_RXINT_CFG_WIL 0x0001 @@ -315,6 +317,12 @@ enum { #definePAS_STATUS_SOFT 0x4000ull #definePAS_STATUS_INT 0x8000ull +#define PAS_IOB_COM_PKTHDRCNT 0x120 +#definePAS_IOB_COM_PKTHDRCNT_PKTHDR1_M 0x0fff +#definePAS_IOB_COM_PKTHDRCNT_PKTHDR1_S 16 +#definePAS_IOB_COM_PKTHDRCNT_PKTHDR0_M 0x0fff +#definePAS_IOB_COM_PKTHDRCNT_PKTHDR0_S 0 + #define PAS_IOB_DMA_RXCH_CFG(i)(0x1100 + (i)*4) #definePAS_IOB_DMA_RXCH_CFG_CNTTH_M0x0fff #definePAS_IOB_DMA_RXCH_CFG_CNTTH_S0 - To unsubscribe from this list: send the line 
"unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 13/13] ax88796: add 93cx6 eeprom support
From: Magnus Damm <[EMAIL PROTECTED]> Hook up the 93cx6 eeprom code to the ax88796 driver and modify the ax88796 driver to read out the mac address from the eeprom. We need this for the ax88796 on certain SuperH boards. The pin configuration used to connect the eeprom to the ax88796 on these boards is the same as pointed out by the ax88796 datasheet, so we can probably reuse this code for multiple platforms in the future. Signed-off-by: Magnus Damm <[EMAIL PROTECTED]> Cc: Ben Dooks <[EMAIL PROTECTED]> Cc: Paul Mundt <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- drivers/net/Kconfig |7 drivers/net/ax88796.c| 49 + include/linux/eeprom_93cx6.h |3 +- include/net/ax88796.h|1 4 files changed, 59 insertions(+), 1 deletion(-) diff -puN drivers/net/Kconfig~ax88796-add-93cx6-eeprom-support drivers/net/Kconfig --- a/drivers/net/Kconfig~ax88796-add-93cx6-eeprom-support +++ a/drivers/net/Kconfig @@ -240,6 +240,13 @@ config AX88796 AX88796 driver, using platform bus to provide chip detection and resources +config AX88796_93CX6 + bool "ASIX AX88796 external 93CX6 eeprom support" + depends on AX88796 + select EEPROM_93CX6 + help + Select this if your platform comes with an external 93CX6 eeprom. 
+ config MACE tristate "MACE (Power Mac ethernet) support" depends on PPC_PMAC && PPC32 diff -puN drivers/net/ax88796.c~ax88796-add-93cx6-eeprom-support drivers/net/ax88796.c --- a/drivers/net/ax88796.c~ax88796-add-93cx6-eeprom-support +++ a/drivers/net/ax88796.c @@ -24,6 +24,7 @@ #include #include #include +#include #include @@ -582,6 +583,37 @@ static const struct ethtool_ops ax_ethto .get_link = ax_get_link, }; +#ifdef CONFIG_AX88796_93CX6 +static void ax_eeprom_register_read(struct eeprom_93cx6 *eeprom) +{ + struct ei_device *ei_local = eeprom->data; + u8 reg = ei_inb(ei_local->mem + AX_MEMR); + + eeprom->reg_data_in = reg & AX_MEMR_EEI; + eeprom->reg_data_out = reg & AX_MEMR_EEO; /* Input pin */ + eeprom->reg_data_clock = reg & AX_MEMR_EECLK; + eeprom->reg_chip_select = reg & AX_MEMR_EECS; +} + +static void ax_eeprom_register_write(struct eeprom_93cx6 *eeprom) +{ + struct ei_device *ei_local = eeprom->data; + u8 reg = ei_inb(ei_local->mem + AX_MEMR); + + reg &= ~(AX_MEMR_EEI | AX_MEMR_EECLK | AX_MEMR_EECS); + + if (eeprom->reg_data_in) + reg |= AX_MEMR_EEI; + if (eeprom->reg_data_clock) + reg |= AX_MEMR_EECLK; + if (eeprom->reg_chip_select) + reg |= AX_MEMR_EECS; + + ei_outb(reg, ei_local->mem + AX_MEMR); + udelay(10); +} +#endif + /* setup code */ static void ax_initial_setup(struct net_device *dev, struct ei_device *ei_local) @@ -640,6 +672,23 @@ static int ax_init_dev(struct net_device memcpy(dev->dev_addr, SA_prom, 6); } +#ifdef CONFIG_AX88796_93CX6 + if (first_init && ax->plat->flags & AXFLG_HAS_93CX6) { + unsigned char mac_addr[6]; + struct eeprom_93cx6 eeprom; + + eeprom.data = ei_local; + eeprom.register_read = ax_eeprom_register_read; + eeprom.register_write = ax_eeprom_register_write; + eeprom.width = PCI_EEPROM_WIDTH_93C56; + + eeprom_93cx6_multiread(&eeprom, 0, + (__le16 __force *)mac_addr, + sizeof(mac_addr) >> 1); + + memcpy(dev->dev_addr, mac_addr, 6); + } +#endif if (ax->plat->wordlength == 2) { /* We must set the 8390 for word mode. 
*/ ei_outb(ax->plat->dcr_val, ei_local->mem + EN0_DCFG); diff -puN include/linux/eeprom_93cx6.h~ax88796-add-93cx6-eeprom-support include/linux/eeprom_93cx6.h --- a/include/linux/eeprom_93cx6.h~ax88796-add-93cx6-eeprom-support +++ a/include/linux/eeprom_93cx6.h @@ -21,13 +21,14 @@ /* Module: eeprom_93cx6 Abstract: EEPROM reader datastructures for 93cx6 chipsets. - Supported chipsets: 93c46 & 93c66. + Supported chipsets: 93c46, 93c56 and 93c66. */ /* * EEPROM operation defines. */ #define PCI_EEPROM_WIDTH_93C46 6 +#define PCI_EEPROM_WIDTH_93C56 8 #define PCI_EEPROM_WIDTH_93C66 8 #define PCI_EEPROM_WIDTH_OPCODE3 #define PCI_EEPROM_WRITE_OPCODE0x05 diff -puN include/net/ax88796.h~ax88796-add-93cx6-eeprom-support include/net/ax88796.h --- a/include/net/ax88796.h~ax88796-add-93cx6-eeprom-support +++ a/include/net/ax88796.h @@ -14,6 +14,7 @@ #define AXFLG_HAS_EEPROM (1<<0) #define AXFLG_MAC_FROMDEV (1<<1) /* device already has MAC */ +#define AXFLG_HAS_93CX6(1<<2) /* use eeprom_93cx6 driver */ struct ax_plat_data { unsigned int flags; _ - To unsub
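The patch reads `sizeof(mac_addr) >> 1`, that is three, 16-bit words into a 6-byte buffer through a `__le16` cast. A sketch of the resulting byte layout, assuming (as the cast suggests) that each word is stored low byte first; the helper name is hypothetical and this is not the eeprom_93cx6 code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store 'nwords' 16-bit values into 'out' in little-endian byte order,
 * independent of the endianness of the host running this sketch.
 */
void words_to_le_bytes(const uint16_t *words, size_t nwords, uint8_t *out)
{
	size_t i;

	assert(words != NULL && out != NULL);
	for (i = 0; i < nwords; i++) {
		out[2 * i]     = (uint8_t)(words[i] & 0xff);  /* low byte first */
		out[2 * i + 1] = (uint8_t)(words[i] >> 8);    /* then high byte */
	}
}
```

Under that assumption the words 0x1100, 0x3322, 0x5544 would yield the address 00:11:22:33:44:55.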
[patch 11/13] PHYLIB: fix an interrupt loop potential when halting
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]>

Ensure the PHY_HALTED state is not entered with the IRQ asserted as it
could lead to an interrupt loop.

There is a small window in phy_stop(), where the state of the PHY machine
indicates it has been halted, but its interrupt output might still be
unmasked. If an interrupt goes active right at this moment it will loop
as the phy_interrupt() handler exits immediately with IRQ_NONE if the
halted state is seen.

It is unsafe to extend the phydev spinlock to cover phy_interrupt(). It
is safe to swap the order of the actions though as all the competing
places to unmask the interrupt output of the PHY, which are phy_change()
and phy_timer() are already covered with the lock as is the sequence in
question.

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
Cc: Andy Fleming <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/phy.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN drivers/net/phy/phy.c~phylib-fix-an-interrupt-loop-potential-when-halting drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~phylib-fix-an-interrupt-loop-potential-when-halting
+++ a/drivers/net/phy/phy.c
@@ -737,8 +737,6 @@ void phy_stop(struct phy_device *phydev)
 	if (PHY_HALTED == phydev->state)
 		goto out_unlock;
 
-	phydev->state = PHY_HALTED;
-
 	if (phydev->irq != PHY_POLL) {
 		/* Disable PHY Interrupts */
 		phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
@@ -747,6 +745,8 @@ void phy_stop(struct phy_device *phydev)
 		phy_clear_interrupt(phydev);
 	}
 
+	phydev->state = PHY_HALTED;
+
 out_unlock:
 	spin_unlock_bh(&phydev->lock);
 
_
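The window described in this patch can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the struct and function names are hypothetical, two booleans stand in for the PHY state and its interrupt mask, and no real locking or interrupt delivery is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two pieces of state involved; not the kernel's struct phy_device. */
struct toy_phy {
	bool halted;        /* stands in for state == PHY_HALTED */
	bool irq_unmasked;  /* PHY interrupt output still enabled */
};

/* The hazard: the handler would bail out with IRQ_NONE while the line can still fire. */
bool in_hazard_window(const struct toy_phy *p)
{
	assert(p);
	return p->halted && p->irq_unmasked;
}

/* Old order: mark halted first, then mask. Returns true if the hazard was reachable. */
bool stop_old_order(struct toy_phy *p)
{
	bool hazard;

	p->halted = true;
	hazard = in_hazard_window(p);   /* state visible between the two steps */
	p->irq_unmasked = false;
	return hazard;
}

/* New order: mask first, then mark halted. Returns true if the hazard was reachable. */
bool stop_new_order(struct toy_phy *p)
{
	bool hazard;

	p->irq_unmasked = false;
	hazard = in_hazard_window(p);   /* state visible between the two steps */
	p->halted = true;
	return hazard || in_hazard_window(p);
}
```

In this model the old order passes through a state that is both halted and unmasked, while the new order never does, which is the property the reordering in phy_stop() is meant to provide.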
[PATCH] [3/11] pasemi_mac: rework ring management
pasemi_mac: rework ring management Rework ring management, switching to an opaque ring format instead of the struct-based descriptor+pointer setup, since it will be needed for SG support. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: k.org/drivers/net/pasemi_mac.c === --- k.org.orig/drivers/net/pasemi_mac.c +++ k.org/drivers/net/pasemi_mac.c @@ -63,10 +63,10 @@ NETIF_MSG_RX_ERR | \ NETIF_MSG_TX_ERR) -#define TX_DESC(mac, num) ((mac)->tx->desc[(num) & (TX_RING_SIZE-1)]) -#define TX_DESC_INFO(mac, num) ((mac)->tx->desc_info[(num) & (TX_RING_SIZE-1)]) -#define RX_DESC(mac, num) ((mac)->rx->desc[(num) & (RX_RING_SIZE-1)]) -#define RX_DESC_INFO(mac, num) ((mac)->rx->desc_info[(num) & (RX_RING_SIZE-1)]) +#define TX_RING(mac, num) ((mac)->tx->ring[(num) & (TX_RING_SIZE-1)]) +#define TX_RING_INFO(mac, num) ((mac)->tx->ring_info[(num) & (TX_RING_SIZE-1)]) +#define RX_RING(mac, num) ((mac)->rx->ring[(num) & (RX_RING_SIZE-1)]) +#define RX_RING_INFO(mac, num) ((mac)->rx->ring_info[(num) & (RX_RING_SIZE-1)]) #define RX_BUFF(mac, num) ((mac)->rx->buffers[(num) & (RX_RING_SIZE-1)]) #define RING_USED(ring)(((ring)->next_to_fill - (ring)->next_to_clean) \ @@ -174,22 +174,21 @@ static int pasemi_mac_setup_rx_resources spin_lock_init(&ring->lock); ring->size = RX_RING_SIZE; - ring->desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) * + ring->ring_info = kzalloc(sizeof(struct pasemi_mac_buffer) * RX_RING_SIZE, GFP_KERNEL); - if (!ring->desc_info) - goto out_desc_info; + if (!ring->ring_info) + goto out_ring_info; /* Allocate descriptors */ - ring->desc = dma_alloc_coherent(&mac->dma_pdev->dev, - RX_RING_SIZE * - sizeof(struct pas_dma_xct_descr), + ring->ring = dma_alloc_coherent(&mac->dma_pdev->dev, + RX_RING_SIZE * sizeof(u64), &ring->dma, GFP_KERNEL); - if (!ring->desc) - goto out_desc; + if (!ring->ring) + goto out_ring_desc; - memset(ring->desc, 0, RX_RING_SIZE * sizeof(struct pas_dma_xct_descr)); + memset(ring->ring, 0, RX_RING_SIZE * sizeof(u64)); ring->buffers = 
dma_alloc_coherent(&mac->dma_pdev->dev, RX_RING_SIZE * sizeof(u64), @@ -203,7 +202,7 @@ static int pasemi_mac_setup_rx_resources write_dma_reg(mac, PAS_DMA_RXCHAN_BASEU(chan_id), PAS_DMA_RXCHAN_BASEU_BRBH(ring->dma >> 32) | - PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2)); + PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 3)); write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id), PAS_DMA_RXCHAN_CFG_HBU(2)); @@ -229,11 +228,11 @@ static int pasemi_mac_setup_rx_resources out_buffers: dma_free_coherent(&mac->dma_pdev->dev, - RX_RING_SIZE * sizeof(struct pas_dma_xct_descr), - mac->rx->desc, mac->rx->dma); -out_desc: - kfree(ring->desc_info); -out_desc_info: + RX_RING_SIZE * sizeof(u64), + mac->rx->ring, mac->rx->dma); +out_ring_desc: + kfree(ring->ring_info); +out_ring_info: kfree(ring); out_ring: return -ENOMEM; @@ -254,25 +253,24 @@ static int pasemi_mac_setup_tx_resources spin_lock_init(&ring->lock); ring->size = TX_RING_SIZE; - ring->desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) * + ring->ring_info = kzalloc(sizeof(struct pasemi_mac_buffer) * TX_RING_SIZE, GFP_KERNEL); - if (!ring->desc_info) - goto out_desc_info; + if (!ring->ring_info) + goto out_ring_info; /* Allocate descriptors */ - ring->desc = dma_alloc_coherent(&mac->dma_pdev->dev, - TX_RING_SIZE * - sizeof(struct pas_dma_xct_descr), + ring->ring = dma_alloc_coherent(&mac->dma_pdev->dev, + TX_RING_SIZE * sizeof(u64), &ring->dma, GFP_KERNEL); - if (!ring->desc) - goto out_desc; + if (!ring->ring) + goto out_ring_desc; - memset(ring->desc, 0, TX_RING_SIZE * sizeof(struct pas_dma_xct_descr)); + memset(ring->ring, 0, TX_RING_SIZE * sizeof(u64)); write_dma_reg(mac, PAS_DMA_TXCHAN_BASEL(chan_id), PAS_DMA_TXCHAN_BASEL_BRBL(ring->dma)); val = PAS_DMA_TXCHAN_BASEU_BRBH(ring->dma >> 32); - val |= PAS_DMA_TXCHAN_BASEU_SIZ(TX_RING_SIZE >> 2); + val |=
[PATCH] [2/11] pasemi_mac: fix bug in receive buffer dma mapping
pasemi_mac: fix bug in receive buffer dma mapping

skb->len isn't actually set to the size of the allocated skb, so don't try
to use it when figuring out how much to map.

(This hasn't surfaced as a real bug because we effectively disable
translation for the interface, but it still needs fixing for the future)

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===================================================================
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -396,7 +396,7 @@ static void pasemi_mac_replenish_rx_ring
 		if (unlikely(!skb))
 			break;
 
-		dma = pci_map_single(mac->dma_pdev, skb->data, skb->len,
+		dma = pci_map_single(mac->dma_pdev, skb->data, BUF_SIZE,
 				     PCI_DMA_FROMDEVICE);
 
 		if (unlikely(dma_mapping_error(dma))) {
[patch 12/13] Clean up redundant PHY write line for ULi526x Ethernet driver
From: Roy Zang <[EMAIL PROTECTED]>

Clean up redundant PHY write line for ULi526x Ethernet Driver.

Signed-off-by: Roy Zang <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Acked-by: Grant Grundler <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/tulip/uli526x.c |    1 -
 1 file changed, 1 deletion(-)

diff -puN drivers/net/tulip/uli526x.c~clean-up-redundant-phy-write-line-for-uli526x-ethernet drivers/net/tulip/uli526x.c
--- a/drivers/net/tulip/uli526x.c~clean-up-redundant-phy-write-line-for-uli526x-ethernet
+++ a/drivers/net/tulip/uli526x.c
@@ -1599,7 +1599,6 @@ static void uli526x_process_mode(struct
 			case ULI526X_100MFD: phy_reg = 0x2100; break;
 			}
 			phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);
-			phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);
 		}
 	}
 }
_
Re: tcp bw in 2.6
> We fixed a lot of bugs in TSO last year.
>
> It would be really great to see numbers with a more recent kernel
> than 2.6.18

More data, sky2 works fine (really really fine, like 79MB/sec) between

Linux dylan.bitmover.com 2.6.18.1 #5 SMP Mon Oct 23 17:36:00 PDT 2006 i686
Linux steele 2.6.20-16-generic #2 SMP Sun Sep 23 18:31:23 UTC 2007 x86_64

So this is looking like an e1000 bug. I'll try to upgrade the kernel on
the ia64 box and see what happens.
--
---
Larry McVoy    lm at bitmover.com    http://www.bitkeeper.com
[PATCH] [1/11] pasemi_mac: basic error checking
pasemi_mac: basic error checking Add some rudimentary error checking to pasemi_mac. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: k.org/drivers/net/pasemi_mac.c === --- k.org.orig/drivers/net/pasemi_mac.c +++ k.org/drivers/net/pasemi_mac.c @@ -445,6 +445,38 @@ static void pasemi_mac_restart_tx_intr(s } +static inline void pasemi_mac_rx_error(struct pasemi_mac *mac, u64 macrx) +{ + unsigned int rcmdsta, ccmdsta; + + if (!netif_msg_rx_err(mac)) + return; + + rcmdsta = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if)); + ccmdsta = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch)); + + printk(KERN_ERR "pasemi_mac: rx error. macrx %016lx, rx status %lx\n", + macrx, *mac->rx_status); + + printk(KERN_ERR "pasemi_mac: rcmdsta %08x ccmdsta %08x\n", + rcmdsta, ccmdsta); +} + +static inline void pasemi_mac_tx_error(struct pasemi_mac *mac, u64 mactx) +{ + unsigned int cmdsta; + + if (!netif_msg_tx_err(mac)) + return; + + cmdsta = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch)); + + printk(KERN_ERR "pasemi_mac: tx error. 
mactx 0x%016lx, "\ + "tx status 0x%016lx\n", mactx, *mac->tx_status); + + printk(KERN_ERR "pasemi_mac: tcmdsta 0x%08x\n", cmdsta); +} + static int pasemi_mac_clean_rx(struct pasemi_mac *mac, int limit) { unsigned int n; @@ -468,10 +500,13 @@ static int pasemi_mac_clean_rx(struct pa prefetchw(dp); macrx = dp->macrx; + if ((macrx & XCT_MACRX_E) || + (*mac->rx_status & PAS_STATUS_ERROR)) + pasemi_mac_rx_error(mac, macrx); + if (!(macrx & XCT_MACRX_O)) break; - info = NULL; /* We have to scan for our skb since there's no way @@ -563,6 +598,10 @@ restart: for (i = start; i < limit; i++) { dp = &TX_DESC(mac, i); + if ((dp->mactx & XCT_MACTX_E) || + (*mac->tx_status & PAS_STATUS_ERROR)) + pasemi_mac_tx_error(mac, dp->mactx); + if (unlikely(dp->mactx & XCT_MACTX_O)) /* Not yet transmitted */ break; @@ -607,9 +646,6 @@ static irqreturn_t pasemi_mac_rx_intr(in if (!(*mac->rx_status & PAS_STATUS_CAUSE_M)) return IRQ_NONE; - if (*mac->rx_status & PAS_STATUS_ERROR) - printk("rx_status reported error\n"); - /* Don't reset packet count so it won't fire again but clear * all others. */ @@ -1230,7 +1266,7 @@ pasemi_mac_probe(struct pci_dev *pdev, c dev_err(&mac->pdev->dev, "register_netdev failed with error %d\n", err); goto out; - } else + } else if netif_msg_probe(mac) printk(KERN_INFO "%s: PA Semi %s: intf %d, txch %d, rxch %d, " "hw addr %s\n", dev->name, mac->type == MAC_TYPE_GMAC ? "GMAC" : "XAUI", - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 06/13] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
On Tue, Oct 02, 2007 at 02:11:38PM -0700, [EMAIL PROTECTED] wrote:
> From: Micah Gruber <[EMAIL PROTECTED]>
>
> This patch fixes an apparent potential null dereference bug where we
> dereference dev before a null check. This patch simply removes the
> can't-happen test for a null pointer.
>
> Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
> Cc: Grant Grundler <[EMAIL PROTECTED]>

Acked-by: Grant Grundler <[EMAIL PROTECTED]>

thanks!
grant

> Acked-by: Jeff Garzik <[EMAIL PROTECTED]>
> Acked-by: Kyle McMartin <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  drivers/net/tulip/uli526x.c |    5 -----
>  1 file changed, 5 deletions(-)
>
> diff -puN drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt drivers/net/tulip/uli526x.c
> --- a/drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt
> +++ a/drivers/net/tulip/uli526x.c
> @@ -664,11 +664,6 @@ static irqreturn_t uli526x_interrupt(int
>  	unsigned long ioaddr = dev->base_addr;
>  	unsigned long flags;
>
> -	if (!dev) {
> -		ULI526X_DBUG(1, "uli526x_interrupt() without DEVICE arg", 0);
> -		return IRQ_NONE;
> -	}
> -
>  	spin_lock_irqsave(&db->lock, flags);
>  	outl(0, ioaddr + DCR7);
>
> _
[PATCH] [0/11] pasemi_mac: Patches for 2.6.24
Hi,

This series of patches goes on top of the previous fixes that were sent out and picked up. It's a series of mostly feature-related changes, but also a couple of bugfixes:

[1/11] pasemi_mac: basic error checking
[2/11] pasemi_mac: fix bug in receive buffer dma mapping
[3/11] pasemi_mac: rework ring management
[4/11] pasemi_mac: implement sg support
[5/11] pasemi_mac: workaround for erratum 5971
[6/11] pasemi_mac: add local skb alignment
[7/11] pasemi_mac: further performance tweaks
[8/11] pasemi_mac: update todo list
[9/11] pasemi_mac: clear out old errors on interface open
[10/11] pasemi_mac: use buffer index pointer in clean_rx()
[11/11] pasemi_mac: enable iommu support

Thanks,

-Olof
Re: tcp bw in 2.6
From: [EMAIL PROTECTED] (Larry McVoy)
Date: Tue, 2 Oct 2007 09:48:58 -0700

> Isn't this something so straightforward that you would have tests for it?
> This is the basic FTP server loop, doesn't someone have a big machine with
> 10gig cards and test that sending/recving data doesn't regress?

Nobody is really doing this, or they aren't talking about it.

Sometimes the crash fixes and other work completely consume us. Add in travel to conferences and real life, and it's no surprise stuff like this slips through the cracks.

We absolutely depend upon people like you to report when there are anomalies like this. It's the only thing that scales.

FWIW I have a t1000 Niagara box and an Ultra45 going through a netgear gigabit switch. I'm getting 85MB/sec in one direction and 10MB/sec in the other (using bw_tcp from lmbench3). Both are using identical broadcom tigon3 gigabit chips and identical current kernels so that is a truly strange result.

I'll investigate, it may be the same thing you're seeing.
Re: [PATCH 2.6.24] tg3: fix ethtool autonegotiate flags
On Tue, Oct 02, 2007 at 03:02:56PM -0700, Michael Chan wrote:
> On Tue, 2007-10-02 at 16:16 -0400, Andy Gospodarek wrote:
> > Adding that flag in tg3_set_settings seemed like the most logical place
> > since the driver works fine on boot. This is just an issue when
> > re-enabling autonegotiation, so we should probably nip it there.
> >
> > Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
>
> We also noticed this issue recently, but didn't pay too much attention
> to it since it was more of a "cosmetic" issue. The driver behaves the
> same since we rely on cmd->autoneg to decide whether to enable autoneg
> or not. Your fix seems reasonable to me. Thanks.
>
> Acked-by: Michael Chan <[EMAIL PROTECTED]>
>

I completely agree that it's cosmetic, it just seems like something decent to toss in there since it's the kind of thing others will start complaining about.
[patch 10/13] PHYLIB: IRQ event workqueue handling fixes
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]>

Keep track of disable_irq_nosync() invocations and call enable_irq() the right number of times if work has been cancelled that would include them.

Now that the call to flush_work_keventd() (problematic because of rtnl_mutex being held) has been replaced by cancel_work_sync() another issue has arisen and been left unresolved. As the MDIO bus cannot be accessed from the interrupt context the PHY interrupt handler uses disable_irq_nosync() to prevent looping and schedules some work to be done as a softirq, which, apart from handling the state change of the originating PHY, is responsible for reenabling the interrupt. Now if the interrupt line is shared by another device and a call to the softirq handler has been cancelled, that call to enable_irq() never happens and the other device cannot use its interrupt anymore as it's stuck disabled.

I decided to use a counter rather than a flag because there may be more than one call to phy_change() cancelled in the queue -- a real one and a fake one triggered by free_irq() if DEBUG_SHIRQ is used, if nothing else. Therefore because of its nesting property enable_irq() has to be called the right number of times to match the number of times disable_irq_nosync() was called and restore the original state. This DEBUG_SHIRQ feature is also the reason why free_irq() has to be called before cancel_work_sync().

While at it I updated the comment about phy_stop_interrupts() being called from `keventd' -- this is no longer relevant as the use of cancel_work_sync() makes such an approach unnecessary. OTOH a similar comment referring to flush_scheduled_work() in phy_stop() still applies as using cancel_work_sync() there would be dangerous.

Checked with checkpatch.pl and at the run time (with and without DEBUG_SHIRQ).

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
Cc: Andy Fleming <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/phy.c |   24 +++++++++++++++++++-----
 include/linux/phy.h   |    3 +++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff -puN drivers/net/phy/phy.c~phylib-irq-event-workqueue-handling-fixes drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~phylib-irq-event-workqueue-handling-fixes
+++ a/drivers/net/phy/phy.c
@@ -7,7 +7,7 @@
  * Author: Andy Fleming
  *
  * Copyright (c) 2004 Freescale Semiconductor, Inc.
- * Copyright (c) 2006 Maciej W. Rozycki
+ * Copyright (c) 2006, 2007 Maciej W. Rozycki
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
@@ -35,6 +35,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -562,6 +563,7 @@ static irqreturn_t phy_interrupt(int irq
 	 * queue will write the PHY to disable and clear the
 	 * interrupt, and then reenable the irq line. */
 	disable_irq_nosync(irq);
+	atomic_inc(&phydev->irq_disable);
 
 	schedule_work(&phydev->phy_queue);
 
@@ -632,6 +634,7 @@ int phy_start_interrupts(struct phy_devi
 
 	INIT_WORK(&phydev->phy_queue, phy_change);
 
+	atomic_set(&phydev->irq_disable, 0);
 	if (request_irq(phydev->irq, phy_interrupt,
 				IRQF_SHARED,
 				"phy_interrupt",
@@ -662,13 +665,22 @@ int phy_stop_interrupts(struct phy_devic
 	if (err)
 		phy_error(phydev);
 
+	free_irq(phydev->irq, phydev);
+
 	/*
-	 * Finish any pending work; we might have been scheduled to be called
-	 * from keventd ourselves, but cancel_work_sync() handles that.
+	 * Cannot call flush_scheduled_work() here as desired because
+	 * of rtnl_lock(), but we do not really care about what would
+	 * be done, except from enable_irq(), so cancel any work
+	 * possibly pending and take care of the matter below.
 	 */
 	cancel_work_sync(&phydev->phy_queue);
-
-	free_irq(phydev->irq, phydev);
+	/*
+	 * If work indeed has been cancelled, disable_irq() will have
+	 * been left unbalanced from phy_interrupt() and enable_irq()
+	 * has to be called so that other devices on the line work.
+	 */
+	while (atomic_dec_return(&phydev->irq_disable) >= 0)
+		enable_irq(phydev->irq);
 
 	return err;
 }
@@ -695,6 +707,7 @@ static void phy_change(struct work_struc
 		phydev->state = PHY_CHANGELINK;
 	spin_unlock_bh(&phydev->lock);
 
+	atomic_dec(&phydev->irq_disable);
 	enable_irq(phydev->irq);
 
 	/* Reenable interrupts */
@@ -708,6 +721,7 @@ static void phy_change(struct work_struc
 
 irq_enable_err:
 	disable_irq(phydev->irq);
+	atomic_inc(&phydev->irq_disable);
 phy_err:
 	phy_error(phydev);
 }
diff -puN include/linux/phy.h~phylib-irq-event-workqueue-handling-fixes include/linux/phy.h
--- a/include/linux/phy.h~p
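For readers following the counting scheme in the patch above, here is a minimal user-space sketch of the idea. It is not kernel code: `fake_disable_irq()`, `fake_enable_irq()`, `struct fake_phy` and `simulate()` are hypothetical stand-ins, the nesting depth of the IRQ line is modelled as a plain integer, and C11 atomics stand in for the kernel's `atomic_t` helpers.

```c
#include <stdatomic.h>

/* Hypothetical model of a shared IRQ line: a depth above zero means masked. */
static int line_disable_depth;

static void fake_disable_irq(void) { line_disable_depth++; }
static void fake_enable_irq(void)  { line_disable_depth--; }

/* Stand-in for the one field the patch adds to struct phy_device. */
struct fake_phy {
	atomic_int irq_disable;	/* disables not yet balanced by the work handler */
};

/* Interrupt handler: mask the line and record that one enable is owed. */
static void fake_phy_interrupt(struct fake_phy *phy)
{
	fake_disable_irq();
	atomic_fetch_add(&phy->irq_disable, 1);
}

/* Work handler that actually ran: settle one owed enable. */
static void fake_phy_change(struct fake_phy *phy)
{
	atomic_fetch_sub(&phy->irq_disable, 1);
	fake_enable_irq();
}

/* Stop path: cancelled work never ran, so settle whatever is left here. */
static void fake_phy_stop_interrupts(struct fake_phy *phy)
{
	/* atomic_fetch_sub() returns the old value, so "old - 1" plays the
	 * role of the kernel's atomic_dec_return() in this sketch. */
	while (atomic_fetch_sub(&phy->irq_disable, 1) - 1 >= 0)
		fake_enable_irq();
}

/* Fire `interrupts` interrupts, let `work_run` handlers run, then stop.
 * Returns the remaining disable depth of the line (0 means balanced). */
static int simulate(int interrupts, int work_run)
{
	struct fake_phy phy;
	int i;

	atomic_init(&phy.irq_disable, 0);
	line_disable_depth = 0;
	for (i = 0; i < interrupts; i++)
		fake_phy_interrupt(&phy);
	for (i = 0; i < work_run; i++)
		fake_phy_change(&phy);
	fake_phy_stop_interrupts(&phy);
	return line_disable_depth;
}
```

Calling `simulate(2, 0)` models two interrupts whose work was cancelled before it ran; with a flag instead of a counter only one of the two disables would be undone. The counter ends at -1 after the loop, which is why the patch resets it with atomic_set() when interrupts are started again.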
[patch 07/13] PHYLIB: Spinlock fixes for softirqs
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]>

Use spin_lock_bh()/spin_unlock_bh() for the phydev lock throughout as it is used in phy_timer() that is called as a softirq and all the other operations may happen in the user context.

There has been a change recently that did such a conversion for some of the operations on the lock, but some have been left intact. Many of them, perhaps all, may be called in the user context and I was able to trigger recursive spinlock acquisition indeed, so I think for the sake of long-term maintenance it is best to convert them all, even if unnecessarily for one or two -- better safe than sorry.

Perhaps one in phy_timer() could actually be skipped as only called as a softirq -- I can send an update if that sounds like a good idea.

Checked with checkpatch.pl and at the runtime.

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/phy.c        |   24 ++++++++++++------------
 drivers/net/phy/phy_device.c |    4 ++--
 2 files changed, 14 insertions(+), 14 deletions(-)

diff -puN drivers/net/phy/phy.c~phylib-spinlock-fixes-for-softirqs drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~phylib-spinlock-fixes-for-softirqs
+++ a/drivers/net/phy/phy.c
@@ -424,7 +424,7 @@ int phy_start_aneg(struct phy_device *ph
 {
 	int err;
 
-	spin_lock(&phydev->lock);
+	spin_lock_bh(&phydev->lock);
 
 	if (AUTONEG_DISABLE == phydev->autoneg)
 		phy_sanitize_settings(phydev);
@@ -445,7 +445,7 @@ int phy_start_aneg(struct phy_device *ph
 	}
 
 out_unlock:
-	spin_unlock(&phydev->lock);
+	spin_unlock_bh(&phydev->lock);
 	return err;
 }
 EXPORT_SYMBOL(phy_start_aneg);
@@ -490,10 +490,10 @@ void phy_stop_machine(struct phy_device
 {
 	del_timer_sync(&phydev->phy_timer);
 
-	spin_lock(&phydev->lock);
+	spin_lock_bh(&phydev->lock);
 	if (phydev->state > PHY_UP)
 		phydev->state = PHY_UP;
-	spin_unlock(&phydev->lock);
+	spin_unlock_bh(&phydev->lock);
 
 	phydev->adjust_state = NULL;
 }
@@ -537,9 +537,9 @@ static void phy_force_reduction(struct p
  */
 void phy_error(struct phy_device *phydev)
 {
-	spin_lock(&phydev->lock);
+	spin_lock_bh(&phydev->lock);
 	phydev->state = PHY_HALTED;
-	spin_unlock(&phydev->lock);
+	spin_unlock_bh(&phydev->lock);
 }
 
 /**
@@ -690,10 +690,10 @@ static void phy_change(struct work_struc
 	if (err)
 		goto phy_err;
 
-	spin_lock(&phydev->lock);
+	spin_lock_bh(&phydev->lock);
 	if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state))
 		phydev->state = PHY_CHANGELINK;
-	spin_unlock(&phydev->lock);
+	spin_unlock_bh(&phydev->lock);
 
 	enable_irq(phydev->irq);
 
@@ -718,7 +718,7 @@ phy_err:
  */
 void phy_stop(struct phy_device *phydev)
 {
-	spin_lock(&phydev->lock);
+	spin_lock_bh(&phydev->lock);
 
 	if (PHY_HALTED == phydev->state)
 		goto out_unlock;
@@ -734,7 +734,7 @@ void phy_stop(struct phy_device *phydev)
 	}
 
 out_unlock:
-	spin_unlock(&phydev->lock);
+	spin_unlock_bh(&phydev->lock);
 
 	/*
 	 * Cannot call flush_scheduled_work() here as desired because
@@ -782,7 +782,7 @@ static void phy_timer(unsigned long data
 	int needs_aneg = 0;
 	int err = 0;
 
-	spin_lock(&phydev->lock);
+	spin_lock_bh(&phydev->lock);
 
 	if (phydev->adjust_state)
 		phydev->adjust_state(phydev->attached_dev);
@@ -948,7 +948,7 @@ static void phy_timer(unsigned long data
 		break;
 	}
 
-	spin_unlock(&phydev->lock);
+	spin_unlock_bh(&phydev->lock);
 
 	if (needs_aneg)
 		err = phy_start_aneg(phydev);
diff -puN drivers/net/phy/phy_device.c~phylib-spinlock-fixes-for-softirqs drivers/net/phy/phy_device.c
--- a/drivers/net/phy/phy_device.c~phylib-spinlock-fixes-for-softirqs
+++ a/drivers/net/phy/phy_device.c
@@ -670,9 +670,9 @@ static int phy_remove(struct device *dev
 
 	phydev = to_phy_device(dev);
 
-	spin_lock(&phydev->lock);
+	spin_lock_bh(&phydev->lock);
 	phydev->state = PHY_DOWN;
-	spin_unlock(&phydev->lock);
+	spin_unlock_bh(&phydev->lock);
 
 	if (phydev->drv->remove)
 		phydev->drv->remove(phydev);
_
[patch 06/13] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
From: Micah Gruber <[EMAIL PROTECTED]>

This patch fixes an apparent potential null dereference bug where we dereference dev before a null check. This patch simply removes the can't-happen test for a null pointer.

Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
Cc: Grant Grundler <[EMAIL PROTECTED]>
Acked-by: Jeff Garzik <[EMAIL PROTECTED]>
Acked-by: Kyle McMartin <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/tulip/uli526x.c |    5 -----
 1 file changed, 5 deletions(-)

diff -puN drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt drivers/net/tulip/uli526x.c
--- a/drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt
+++ a/drivers/net/tulip/uli526x.c
@@ -664,11 +664,6 @@ static irqreturn_t uli526x_interrupt(int
 	unsigned long ioaddr = dev->base_addr;
 	unsigned long flags;
 
-	if (!dev) {
-		ULI526X_DBUG(1, "uli526x_interrupt() without DEVICE arg", 0);
-		return IRQ_NONE;
-	}
-
 	spin_lock_irqsave(&db->lock, flags);
 	outl(0, ioaddr + DCR7);
 
_
[patch 03/13] drivers/net/cxgb3/xgmac.c: remove dead code
From: Adrian Bunk <[EMAIL PROTECTED]>

This patch removes dead code ("tx_xcnt" can never be != 0 at this place) spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/xgmac.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff -puN drivers/net/cxgb3/xgmac.c~drivers-net-cxgb3-xgmacc-remove-dead-code drivers/net/cxgb3/xgmac.c
--- a/drivers/net/cxgb3/xgmac.c~drivers-net-cxgb3-xgmacc-remove-dead-code
+++ a/drivers/net/cxgb3/xgmac.c
@@ -522,10 +522,7 @@ int t3b2_mac_watchdog_task(struct cmac *
 		goto rxcheck;
 	}
 
-	if (((tx_tcnt != mac->tx_tcnt) &&
-	     (tx_xcnt == 0) && (mac->tx_xcnt == 0)) ||
-	    ((mac->tx_mcnt == tx_mcnt) &&
-	     (tx_xcnt != 0) && (mac->tx_xcnt != 0))) {
+	if ((tx_tcnt != mac->tx_tcnt) && (mac->tx_xcnt == 0)) {
 		if (mac->toggle_cnt > 4) {
 			status = 2;
 			goto out;
_
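The simplification in the patch above can be checked mechanically. The sketch below is a stand-alone illustration, not driver code: the counters are passed as plain arguments, `cond_old()`, `cond_new()` and `count_mismatches()` are hypothetical names, and the premise that tx_xcnt is always 0 at this point is taken from the patch description rather than verified against the driver.

```c
#include <stdbool.h>

/* The condition as it stood before the patch. */
static bool cond_old(unsigned tx_tcnt, unsigned tx_mcnt, unsigned tx_xcnt,
		     unsigned mac_tcnt, unsigned mac_mcnt, unsigned mac_xcnt)
{
	return ((tx_tcnt != mac_tcnt) && (tx_xcnt == 0) && (mac_xcnt == 0)) ||
	       ((mac_mcnt == tx_mcnt) && (tx_xcnt != 0) && (mac_xcnt != 0));
}

/* The condition after the patch. */
static bool cond_new(unsigned tx_tcnt, unsigned mac_tcnt, unsigned mac_xcnt)
{
	return (tx_tcnt != mac_tcnt) && (mac_xcnt == 0);
}

/* Compare both forms over a small exhaustive range with tx_xcnt fixed
 * at 0 (the patch's premise). Returns the number of disagreements. */
static int count_mismatches(void)
{
	unsigned tt, tm, mt, mm, mx;
	int bad = 0;

	for (tt = 0; tt < 3; tt++)
	for (tm = 0; tm < 3; tm++)
	for (mt = 0; mt < 3; mt++)
	for (mm = 0; mm < 3; mm++)
	for (mx = 0; mx < 3; mx++)
		if (cond_old(tt, tm, 0, mt, mm, mx) != cond_new(tt, mt, mx))
			bad++;
	return bad;
}
```

With tx_xcnt pinned to 0 the second half of the old condition can never be true, and the first half reduces to the new expression, so the two forms agree on every input in the range.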
[patch 02/13] PCI-X/PCI-Express read control interfaces: use them in e1000
From: "Peter Oruba" <[EMAIL PROTECTED]>

These driver changes incorporate the proposed PCI-X / PCI-Express read byte count interface. Reading and setting those values doesn't take place "manually", instead wrapping functions are called to allow quirks for some PCI bridges.

[EMAIL PROTECTED]: e1000: #if 0 two functions]
Signed-off-by: Peter Oruba <[EMAIL PROTECTED]>
Based on work by Stephen Hemminger <[EMAIL PROTECTED]>
Acked-by: Auke Kok <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/e1000/e1000_hw.c   |   25 +++----------------------
 drivers/net/e1000/e1000_hw.h   |    4 ++--
 drivers/net/e1000/e1000_main.c |   18 ++++++++++++++++++
 3 files changed, 23 insertions(+), 24 deletions(-)

diff -puN drivers/net/e1000/e1000_hw.c~pci-x-pci-express-read-control-interfaces-e1000 drivers/net/e1000/e1000_hw.c
--- a/drivers/net/e1000/e1000_hw.c~pci-x-pci-express-read-control-interfaces-e1000
+++ a/drivers/net/e1000/e1000_hw.c
@@ -871,10 +871,6 @@ e1000_init_hw(struct e1000_hw *hw)
     uint32_t ctrl;
     uint32_t i;
     int32_t ret_val;
-    uint16_t pcix_cmd_word;
-    uint16_t pcix_stat_hi_word;
-    uint16_t cmd_mmrbc;
-    uint16_t stat_mmrbc;
     uint32_t mta_size;
     uint32_t reg_data;
     uint32_t ctrl_ext;
@@ -964,24 +960,9 @@ e1000_init_hw(struct e1000_hw *hw)
         break;
     default:
         /* Workaround for PCI-X problem when BIOS sets MMRBC incorrectly. */
-        if (hw->bus_type == e1000_bus_type_pcix) {
-            e1000_read_pci_cfg(hw, PCIX_COMMAND_REGISTER, &pcix_cmd_word);
-            e1000_read_pci_cfg(hw, PCIX_STATUS_REGISTER_HI,
-                &pcix_stat_hi_word);
-            cmd_mmrbc = (pcix_cmd_word & PCIX_COMMAND_MMRBC_MASK) >>
-                PCIX_COMMAND_MMRBC_SHIFT;
-            stat_mmrbc = (pcix_stat_hi_word & PCIX_STATUS_HI_MMRBC_MASK) >>
-                PCIX_STATUS_HI_MMRBC_SHIFT;
-            if (stat_mmrbc == PCIX_STATUS_HI_MMRBC_4K)
-                stat_mmrbc = PCIX_STATUS_HI_MMRBC_2K;
-            if (cmd_mmrbc > stat_mmrbc) {
-                pcix_cmd_word &= ~PCIX_COMMAND_MMRBC_MASK;
-                pcix_cmd_word |= stat_mmrbc << PCIX_COMMAND_MMRBC_SHIFT;
-                e1000_write_pci_cfg(hw, PCIX_COMMAND_REGISTER,
-                    &pcix_cmd_word);
-            }
-        }
-        break;
+	if (hw->bus_type == e1000_bus_type_pcix && e1000_pcix_get_mmrbc(hw) > 2048)
+		e1000_pcix_set_mmrbc(hw, 2048);
+	break;
     }
 
     /* More time needed for PHY to initialize */
diff -puN drivers/net/e1000/e1000_hw.h~pci-x-pci-express-read-control-interfaces-e1000 drivers/net/e1000/e1000_hw.h
--- a/drivers/net/e1000/e1000_hw.h~pci-x-pci-express-read-control-interfaces-e1000
+++ a/drivers/net/e1000/e1000_hw.h
@@ -421,9 +421,9 @@ void e1000_tbi_adjust_stats(struct e1000
 void e1000_get_bus_info(struct e1000_hw *hw);
 void e1000_pci_set_mwi(struct e1000_hw *hw);
 void e1000_pci_clear_mwi(struct e1000_hw *hw);
-void e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
-void e1000_write_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
 int32_t e1000_read_pcie_cap_reg(struct e1000_hw *hw, uint32_t reg, uint16_t *value);
+void e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc);
+int e1000_pcix_get_mmrbc(struct e1000_hw *hw);
 /* Port I/O is only supported on 82544 and newer */
 void e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value);
 int32_t e1000_disable_pciex_master(struct e1000_hw *hw);
diff -puN drivers/net/e1000/e1000_main.c~pci-x-pci-express-read-control-interfaces-e1000 drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c~pci-x-pci-express-read-control-interfaces-e1000
+++ a/drivers/net/e1000/e1000_main.c
@@ -4887,6 +4887,8 @@ e1000_pci_clear_mwi(struct e1000_hw *hw)
 	pci_clear_mwi(adapter->pdev);
 }
 
+#if 0
+
 void
 e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t *value)
 {
@@ -4903,6 +4905,22 @@ e1000_write_pci_cfg(struct e1000_hw *hw,
 	pci_write_config_word(adapter->pdev, reg, *value);
 }
 
+#endif  /*  0  */
+
+int
+e1000_pcix_get_mmrbc(struct e1000_hw *hw)
+{
+	struct e1000_adapter *adapter = hw->back;
+	return pcix_get_mmrbc(adapter->pdev);
+}
+
+void
+e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc)
+{
+	struct e1000_adapter *adapter = hw->back;
+	pcix_set_mmrbc(adapter->pdev, mmrbc);
+}
+
 int32_t
 e1000_read_pcie_cap_reg(struct e1000_hw *hw, uint32_t reg, uint16_t *value)
 {
_
[patch 08/13] forcedeth: power down phy when interface is down
From: "Ed Swierk" <[EMAIL PROTECTED]>

Bring the physical link down when the interface is down, by placing the PHY in power-down state. This mirrors the behavior of other drivers including e1000 and tg3.

Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>

On Sat, 29 Sep 2007 01:57:04 -0400 Jeff Garzik <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
> > From: "Ed Swierk" <[EMAIL PROTECTED]>
> >
> > Bring the physical link down when the interface is down, by placing the PHY in
> > power-down state. This mirrors the behavior of other drivers including e1000
> > and tg3.
> >
> > Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
> > Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
> > Cc: Jeff Garzik <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>
>
> HOLD -- waiting a bit for comment from others, particularly NVIDIA.
>
> I'm not opposed to applying it, the patch looks correct, but I would
> also like to see testing results and general "it's ok for this hardware"
> comments.
>

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/forcedeth.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff -puN drivers/net/forcedeth.c~forcedeth-power-down-phy-when-interface-is-down drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c~forcedeth-power-down-phy-when-interface-is-down
+++ a/drivers/net/forcedeth.c
@@ -1313,9 +1313,9 @@ static int phy_init(struct net_device *d
 	/* some phys clear out pause advertisment on reset, set it back */
 	mii_rw(dev, np->phyaddr, MII_ADVERTISE, reg);
 
-	/* restart auto negotiation */
+	/* restart auto negotiation, power down phy */
 	mii_control = mii_rw(dev, np->phyaddr, MII_BMCR, MII_READ);
-	mii_control |= (BMCR_ANRESTART | BMCR_ANENABLE);
+	mii_control |= (BMCR_ANRESTART | BMCR_ANENABLE | BMCR_PDOWN);
 	if (mii_rw(dev, np->phyaddr, MII_BMCR, mii_control)) {
 		return PHY_ERROR;
 	}
@@ -4791,6 +4791,10 @@ static int nv_open(struct net_device *de
 
 	dprintk(KERN_DEBUG "nv_open: begin\n");
 
+	/* power up phy */
+	mii_rw(dev, np->phyaddr, MII_BMCR,
+		mii_rw(dev, np->phyaddr, MII_BMCR, MII_READ) & ~BMCR_PDOWN);
+
 	/* erase previous misconfiguration */
 	if (np->driver_data & DEV_HAS_POWER_CNTRL)
 		nv_mac_reset(dev);
@@ -4975,6 +4979,10 @@ static int nv_close(struct net_device *d
 		nv_start_rx(dev);
 	}
 
+	/* power down phy */
+	mii_rw(dev, np->phyaddr, MII_BMCR,
+		mii_rw(dev, np->phyaddr, MII_BMCR, MII_READ) | BMCR_PDOWN);
+
 	/* FIXME: power down nic */
 
 	return 0;
_
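The read-modify-write pattern the patch applies to the BMCR register can be sketched outside the driver as follows. This is only an illustration: `fake_bmcr`, `fake_mii_read()`, `fake_mii_write()`, `phy_power_down()` and `phy_power_up()` are hypothetical stand-ins for the mii_rw() accesses, and the bit values are the ones I believe <linux/mii.h> defines, so please treat them as an assumption.

```c
#include <stdint.h>

/* BMCR bits, assumed to match <linux/mii.h>. */
#define BMCR_ANRESTART	0x0200	/* restart autonegotiation */
#define BMCR_PDOWN	0x0800	/* power the PHY down */
#define BMCR_ANENABLE	0x1000	/* enable autonegotiation */

/* Hypothetical stand-in for the PHY's BMCR register. */
static uint16_t fake_bmcr;

static uint16_t fake_mii_read(void)	{ return fake_bmcr; }
static void fake_mii_write(uint16_t v)	{ fake_bmcr = v; }

/* Interface close: set the power-down bit, leave all other bits alone. */
static void phy_power_down(void)
{
	fake_mii_write((uint16_t)(fake_mii_read() | BMCR_PDOWN));
}

/* Interface open: clear the power-down bit, leave all other bits alone. */
static void phy_power_up(void)
{
	fake_mii_write((uint16_t)(fake_mii_read() & ~BMCR_PDOWN));
}
```

Only the power-down bit changes in either direction, so autonegotiation settings written earlier survive a down/up cycle in this model.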
[patch 09/13] forcedeth: "no link" is informational
From: "Ed Swierk" <[EMAIL PROTECTED]>

Log "no link during initialization" at KERN_INFO as it's not an error, and occurs every time the interface comes up (when the forcedeth-phy-power-down patch is applied).

Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/forcedeth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/forcedeth.c~forcedeth-no-link-is-informational drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c~forcedeth-no-link-is-informational
+++ a/drivers/net/forcedeth.c
@@ -4921,7 +4921,7 @@ static int nv_open(struct net_device *de
 	if (ret) {
 		netif_carrier_on(dev);
 	} else {
-		printk("%s: no link during initialization.\n", dev->name);
+		printk(KERN_INFO "%s: no link during initialization.\n", dev->name);
 		netif_carrier_off(dev);
 	}
 	if (oom)
_
[patch 05/13] skge: remove broken and unused PHY_M_PC_MDI_XMODE macro
From: Mariusz Kozlowski <[EMAIL PROTECTED]>

Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
Cc: Stephen Hemminger <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/skge.h |    2 --
 1 file changed, 2 deletions(-)

diff -puN drivers/net/skge.h~skge-remove-broken-and-unused-phy_m_pc_mdi_xmode-macro drivers/net/skge.h
--- a/drivers/net/skge.h~skge-remove-broken-and-unused-phy_m_pc_mdi_xmode-macro
+++ a/drivers/net/skge.h
@@ -1351,8 +1351,6 @@ enum {
 	PHY_M_PC_EN_DET_PLUS	= 3<<8, /* Energy Detect Plus (Mode 2) */
 };
 
-#define PHY_M_PC_MDI_XMODE(x)	u16)(x)<<5) & PHY_M_PC_MDIX_MSK)
-
 enum {
 	PHY_M_PC_MAN_MDI	= 0, /* 00 = Manual MDI configuration */
 	PHY_M_PC_MAN_MDIX	= 1, /* 01 = Manual MDIX configuration */
_
[patch 01/13] PHY fixed driver: rework release path and update phy_id notation
From: Vitaly Bordug <[EMAIL PROTECTED]>

The return of the device_bind_driver() error code has been fixed. A release() function has been written to free resources correctly; the release path is now clean.

Before the rework, it used to cause

Device '[EMAIL PROTECTED]:1' does not have a release() function, it is broken and must be fixed.
BUG: at drivers/base/core.c:104 device_release()

Call Trace:
 [] kobject_cleanup+0x53/0x7e
 [] kobject_release+0x0/0x9
 [] kref_put+0x74/0x81
 [] fixed_mdio_register_device+0x230/0x265
 [] fixed_init+0x1f/0x35
 [] init+0x147/0x2fb
 [] schedule_tail+0x36/0x92
 [] child_rip+0xa/0x12
 [] acpi_ds_init_one_object+0x0/0x83
 [] init+0x0/0x2fb
 [] child_rip+0x0/0x12

Also changed the notation of the fixed phy definition on mdio bus to the form of + to make it able to be used by gianfar and ucc_geth that define phy_id strictly as "%d:%d" and cleaned up the whitespace issues.

Signed-off-by: Vitaly Bordug <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/Kconfig   |   14 +
 drivers/net/phy/fixed.c   |  310 ++--
 include/linux/phy_fixed.h |   38
 3 files changed, 207 insertions(+), 155 deletions(-)

diff -puN drivers/net/phy/Kconfig~phy-fixed-driver-rework-release-path-and-update drivers/net/phy/Kconfig
--- a/drivers/net/phy/Kconfig~phy-fixed-driver-rework-release-path-and-update
+++ a/drivers/net/phy/Kconfig
@@ -76,4 +76,18 @@ config FIXED_MII_100_FDX
 	bool "Emulation for 100M Fdx fixed PHY behavior"
 	depends on FIXED_PHY
 
+config FIXED_MII_1000_FDX
+	bool "Emulation for 1000M Fdx fixed PHY behavior"
+	depends on FIXED_PHY
+
+config FIXED_MII_AMNT
+	int "Number of emulated PHYs to allocate "
+	depends on FIXED_PHY
+	default "1"
+	---help---
+	Sometimes it is required to have several independent emulated
+	PHYs on the bus (in case of multi-eth but phy-less HW for instance).
+	This control will have specified number allocated for each fixed
+	PHY type enabled.
+
 endif # PHYLIB
diff -puN drivers/net/phy/fixed.c~phy-fixed-driver-rework-release-path-and-update drivers/net/phy/fixed.c
--- a/drivers/net/phy/fixed.c~phy-fixed-driver-rework-release-path-and-update
+++ a/drivers/net/phy/fixed.c
@@ -30,53 +30,31 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 
-#define MII_REGS_NUM 7
-
-/*
-The idea is to emulate normal phy behavior by responding with
-pre-defined values to mii BMCR read, so that read_status hook could
-take all the needed info.
-*/
-
-struct fixed_phy_status {
-	u8 link;
-	u16 speed;
-	u8 duplex;
-};
-
-/*-
- * Private information hoder for mii_bus
- *-*/
-struct fixed_info {
-	u16 *regs;
-	u8 regs_num;
-	struct fixed_phy_status phy_status;
-	struct phy_device *phydev; /* pointer to the container */
-	/* link & speed cb */
-	int(*link_update)(struct net_device*, struct fixed_phy_status*);
-
-};
+/* we need to track the allocated pointers in order to free them on exit */
+static struct fixed_info *fixed_phy_ptrs[CONFIG_FIXED_MII_AMNT*MAX_PHY_AMNT];
 
 /*-
  * If something weird is required to be done with link/speed,
  * network driver is able to assign a function to implement this.
  * May be useful for PHY's that need to be software-driven.
  *-*/
-int fixed_mdio_set_link_update(struct phy_device* phydev,
-		int(*link_update)(struct net_device*, struct fixed_phy_status*))
+int fixed_mdio_set_link_update(struct phy_device *phydev,
+			       int (*link_update) (struct net_device *,
+						   struct fixed_phy_status *))
 {
 	struct fixed_info *fixed;
 
-	if(link_update == NULL)
+	if (link_update == NULL)
 		return -EINVAL;
 
-	if(phydev) {
-		if(phydev->bus) {
+	if (phydev) {
+		if (phydev->bus) {
 			fixed = phydev->bus->priv;
 			fixed->link_update = link_update;
 			return 0;
@@ -84,54 +62,64 @@ int fixed_mdio_set_link_update(struct ph
 	}
 	return -EINVAL;
 }
+
 EXPORT_SYMBOL(fixed_mdio_set_link_update);
 
+struct fixed_info *fixed_mdio_get_phydev (int phydev_ind)
+{
+	if (phydev_ind >= MAX_PHY_AMNT)
+		return NULL;
+	return fixed_phy_ptrs[phydev_ind];
+}
+
+EXPORT_SYMBOL(fixed_mdio_get_phydev);
+
 /*-
  * This is used for updating internal mii regs from the sta
[patch 04/13] Avoid possible NULL pointer deref in 3c359 driver
From: Jesper Juhl <[EMAIL PROTECTED]>

In xl_freemem(), if dev_if is NULL, the line

	struct xl_private *xl_priv =(struct xl_private *)dev->priv;

will cause a NULL pointer dereference.

(akpm: don't try to fix it: just delete the pointless test-for-null)

Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/tokenring/3c359.c |    5 -----
 1 file changed, 5 deletions(-)

diff -puN drivers/net/tokenring/3c359.c~avoid-possible-null-pointer-deref-in-3c359-driver drivers/net/tokenring/3c359.c
--- a/drivers/net/tokenring/3c359.c~avoid-possible-null-pointer-deref-in-3c359-driver
+++ a/drivers/net/tokenring/3c359.c
@@ -1045,11 +1045,6 @@ static irqreturn_t xl_interrupt(int irq,
 	u8 __iomem * xl_mmio = xl_priv->xl_mmio ;
 	u16 intstatus, macstatus ;
 
-	if (!dev) {
-		printk(KERN_WARNING "Device structure dead, aaa !\n") ;
-		return IRQ_NONE;
-	}
-
 	intstatus = readw(xl_mmio + MMIO_INTSTATUS) ;
 
 	if (!(intstatus & 1)) /* We didn't generate the interrupt */
_
[patch 3/3] git-net: sctp build fix (not for applying)
From: Andrew Morton <[EMAIL PROTECTED]>

net/sctp/sm_statetable.c:551: error: 'sctp_sf_tabort_8_4_8' undeclared here (not in a function)

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 net/sctp/sm_statetable.c |    2 --
 1 file changed, 2 deletions(-)

diff -puN net/sctp/sm_statetable.c~git-net-sctp-hack net/sctp/sm_statetable.c
--- a/net/sctp/sm_statetable.c~git-net-sctp-hack
+++ a/net/sctp/sm_statetable.c
@@ -527,8 +527,6 @@ static const sctp_sm_table_entry_t prsct
 	/* SCTP_STATE_EMPTY */ \
 	TYPE_SCTP_FUNC(sctp_sf_ootb), \
 	/* SCTP_STATE_CLOSED */ \
-	TYPE_SCTP_FUNC(sctp_sf_tabort_8_4_8), \
-	/* SCTP_STATE_COOKIE_WAIT */ \
 	TYPE_SCTP_FUNC(sctp_sf_discard_chunk), \
 	/* SCTP_STATE_COOKIE_ECHOED */ \
 	TYPE_SCTP_FUNC(sctp_sf_eat_auth), \
_
[patch 2/3] ipg.c doesn't compile with with CONFIG_HIGHMEM64G
From: trem <[EMAIL PROTECTED]>

I've tried to compile 2.6.23-rc8-mm2, but it fails on ipg.c with the error:

ERROR: "__udivdi3" [drivers/net/ipg.ko] undefined!

I've investigated a bit, and I've found this code in ipg.c:

static void ipg_nic_txfree(struct net_device *dev)
{
	struct ipg_nic_private *sp = netdev_priv(dev);
	void __iomem *ioaddr = sp->ioaddr;
	const unsigned int curr = ipg_r32(TFD_LIST_PTR_0) -
		(sp->txd_map / sizeof(struct ipg_tx)) - 1;
	unsigned int released, pending;

sp->txd_map is a u64 because:

	dma_addr_t txd_map;

And in asm-i386/types.h, I see:

#ifdef CONFIG_HIGHMEM64G
typedef u64 dma_addr_t;
#else
typedef u32 dma_addr_t;
#endif

In my config, I use CONFIG_HIGHMEM64G. sizeof(struct ipg_tx) is a u32, so the division fails to link on i386 because it is a u64 / u32 division.

[EMAIL PROTECTED]: cleanups]
Cc: Sorbica Shieh <[EMAIL PROTECTED]>
Cc: Jesse Huang <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Cc: "David S. Miller" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/ipg.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff -puN drivers/net/ipg.c~ipgc-doesnt-compile-with-with-config_highmem64g drivers/net/ipg.c
--- a/drivers/net/ipg.c~ipgc-doesnt-compile-with-with-config_highmem64g
+++ a/drivers/net/ipg.c
@@ -25,6 +25,8 @@
 #include
 #include
+#include
+
 #define IPG_RX_RING_BYTES	(sizeof(struct ipg_rx) * IPG_RFDLIST_LENGTH)
 #define IPG_TX_RING_BYTES	(sizeof(struct ipg_tx) * IPG_TFDLIST_LENGTH)
 #define IPG_RESET_MASK \
@@ -836,10 +838,14 @@ static void ipg_nic_txfree(struct net_de
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
 	void __iomem *ioaddr = sp->ioaddr;
-	const unsigned int curr = ipg_r32(TFD_LIST_PTR_0) -
-		(sp->txd_map / sizeof(struct ipg_tx)) - 1;
+	unsigned int curr;
+	u64 txd_map;
 	unsigned int released, pending;
 
+	txd_map = (u64)sp->txd_map;
+	curr = ipg_r32(TFD_LIST_PTR_0) -
+		do_div(txd_map, sizeof(struct ipg_tx)) - 1;
+
 	IPG_DEBUG_MSG("_nic_txfree\n");
 
 	pending = sp->tx_current - sp->tx_dirty;
_
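One point that may be worth double-checking in the hunk above: as far as I know, the kernel's do_div(n, base) leaves the quotient in n and returns the remainder, so an expression that subtracts its return value would be using the remainder. A small userspace sketch of that convention follows; emulate_do_div() and txd_index() are hypothetical stand-ins for illustration, not the kernel macro or the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the do_div() convention:
 * divide *n in place (quotient stays in *n) and return the remainder. */
uint32_t emulate_do_div(uint64_t *n, uint32_t base)
{
	assert(base != 0);
	uint32_t rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Compute an index from a 64-bit offset by using the quotient that
 * the division leaves behind, not the value it returns. */
unsigned int txd_index(uint64_t txd_map, uint32_t entry_size)
{
	(void)emulate_do_div(&txd_map, entry_size);
	return (unsigned int)txd_map;
}
```

With this convention the caller reads the quotient from the variable it passed in, which is the value the original 64-bit division expression wanted.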
Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address
Hi David,

David Stevens wrote:
> ipv6_addr_type() returns a mask, so checking for equality will fail to
> match if any other (irrelevant) attributes are set. How about using
> bitwise operators for that?

ipv6_addr_type() does return a mask, but there's a lot of code that just checks for equality since some things are mutually-exclusive - this code is actually identical to what ip6_route_add() does. I don't particularly like this duality, but it's there - I'd gladly volunteer to clean this up everywhere if I didn't think there might be some performance reason it was done like that.

> Also, the error message is no longer descriptive of the failure if it's a
> link-local multicast, but you could make it "target address is not
> link-local unicast.\n" (in both places).

I can do that, thanks.

-Brian
[patch 1/3] git-net: make it compile (not for applying?)
From: Andrew Morton <[EMAIL PROTECTED]>

drivers/net/hamradio/baycom_epp.c: In function 'baycom_probe':
drivers/net/hamradio/baycom_epp.c:1162: error: 'struct net_device' has no member named 'hard_header'
drivers/net/hamradio/baycom_epp.c:1163: error: 'struct net_device' has no member named 'rebuild_header'

Cc: "David S. Miller" <[EMAIL PROTECTED]>
Cc: Stephen Hemminger <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/hamradio/baycom_epp.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN drivers/net/hamradio/baycom_epp.c~git-net-more-bustage drivers/net/hamradio/baycom_epp.c
--- a/drivers/net/hamradio/baycom_epp.c~git-net-more-bustage
+++ a/drivers/net/hamradio/baycom_epp.c
@@ -1159,8 +1159,8 @@ static void baycom_probe(struct net_devi
 	/* Fill in the fields of the device structure */
 	bc->skb = NULL;
-	dev->hard_header = ax25_hard_header;
-	dev->rebuild_header = ax25_rebuild_header;
+//	dev->hard_header = ax25_hard_header;
+//	dev->rebuild_header = ax25_rebuild_header;
 	dev->set_mac_address = baycom_set_mac_address;
 	dev->type = ARPHRD_AX25;	/* AF_AX25 device */
_
Re: [PATCH 2.6.24] tg3: fix ethtool autonegotiate flags
On Tue, 2007-10-02 at 16:16 -0400, Andy Gospodarek wrote:
> Adding that flag in tg3_set_settings seemed like the most logical place
> since the driver works fine on boot. This is just an issue when
> re-enabling autonegotiation, so we should probably nip it there.
>
> Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>

We also noticed this issue recently, but didn't pay too much attention to it since it was more of a "cosmetic" issue. The driver behaves the same since we rely on cmd->autoneg to decide whether to enable autoneg or not.

Your fix seems reasonable to me. Thanks.

Acked-by: Michael Chan <[EMAIL PROTECTED]>
Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver
Arnaldo Carvalho de Melo wrote:
> On Tue, Oct 02, 2007 at 03:10:11PM +0200, Urs Thuermann wrote:
>> +
>> +#ifdef CONFIG_CAN_DEBUG_DEVICES
>> +static int debug;
>> +module_param(debug, int, S_IRUGO);
>> +#endif
>
> Can debug be a boolean? Like its counterpart on DCCP:
>
> net/dccp/proto.c:
> module_param(dccp_debug, bool, 0444);

'debug' should remain an integer so that it can specify debug levels or bit-fields for different debug outputs.

> Where we also use a namespace prefix, for those of us who use ctags or
> cscope.

I don't have any general objection to renaming 'debug' to 'vcan_debug', but it looks like an 'overnamed' module parameter to me. Is this a general naming scheme recommendation for debug module_params?

>> +/*
>> + * CAN test feature:
>> + * Enable the echo on driver level for testing the CAN core echo modes.
>> + * See Documentation/networking/can.txt for details.
>> + */
>> +
>> +static int echo; /* echo testing. Default: 0 (Off) */
>> +module_param(echo, int, S_IRUGO);
>> +MODULE_PARM_DESC(echo, "Echo sent frames (for testing). Default: 0 (Off)");
>
> echo also seems to be a boolean

Yes. This is definitely a boolean candidate. We'll change that.

Thanks,
Oliver
Re: tcp bw in 2.6
> It would be really great to see numbers with a more recent kernel
> than 2.6.18

FWIW Debian has binaries for 2.6.21 in testing and for 2.6.22 in unstable so it should be very easy for Larry to try at least those.

- R.
Re: tcp bw in 2.6
> Alternatively, take your favorite test programs, such as John's, and make
> a second pair that reverses the direction the data is sent. So one pair
> is server sends, the other is server receives, try both.
>
> That's where we started, BitKeeper, my stripped down test, and John's test
> all exhibit the same behavior. And the rsh test is just a really simple
> way to demonstrate it.

Netperf TCP_STREAM - server receives. TCP_MAERTS (STREAM backwards) - server sends:

[EMAIL PROTECTED] ~]# netperf -H 192.168.2.107
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.107 (192.168.2.107) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.17     941.46

[EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -t TCP_MAERTS
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.107 (192.168.2.107) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.15     941.35

The above took all the defaults for socket buffers and such.

[EMAIL PROTECTED] ~]# uname -a
Linux hpcpc106.cup.hp.com 2.6.18-8.el5 #1 SMP Fri Jan 26 14:16:09 EST 2007 ia64 ia64 ia64 GNU/Linux

[EMAIL PROTECTED] ~]# ethtool -i eth2
driver: e1000
version: 7.2.7-k2-NAPI
firmware-version: N/A
bus-info: :06:01.0

between a pair of 1.6 GHz itanium2 montecito rx2660's with a dual-port HP A9900A (Intel 82546GB) in slot 3 of the io cage on each. Connection is actually back-to-back rather than through a switch. I'm afraid I've nothing older installed.
sysctl settings attached

Where I do have things connected via a switch (HP ProCurve 3500 IIRC, perhaps a 2724) is through the core BCM5704:

[EMAIL PROTECTED] netperf2_work]# netperf -H hpcpc107
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to hpcpc107.cup.hp.com (16.89.84.107) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.03     941.41

[EMAIL PROTECTED] netperf2_work]# netperf -H hpcpc107 -t TCP_MAERTS
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to hpcpc107.cup.hp.com (16.89.84.107) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.03     941.37

[EMAIL PROTECTED] netperf2_work]# ethtool -i eth0
driver: tg3
version: 3.65-rh
firmware-version: 5704-v3.27
bus-info: :01:02.0

rick jones

net.ipv6.conf.eth2.router_probe_interval = 60
net.ipv6.conf.eth2.accept_ra_rtr_pref = 1
net.ipv6.conf.eth2.accept_ra_pinfo = 1
net.ipv6.conf.eth2.accept_ra_defrtr = 1
net.ipv6.conf.eth2.max_addresses = 16
net.ipv6.conf.eth2.max_desync_factor = 600
net.ipv6.conf.eth2.regen_max_retry = 5
net.ipv6.conf.eth2.temp_prefered_lft = 86400
net.ipv6.conf.eth2.temp_valid_lft = 604800
net.ipv6.conf.eth2.use_tempaddr = 0
net.ipv6.conf.eth2.force_mld_version = 0
net.ipv6.conf.eth2.router_solicitation_delay = 1
net.ipv6.conf.eth2.router_solicitation_interval = 4
net.ipv6.conf.eth2.router_solicitations = 3
net.ipv6.conf.eth2.dad_transmits = 1
net.ipv6.conf.eth2.autoconf = 1
net.ipv6.conf.eth2.accept_redirects = 1
net.ipv6.conf.eth2.accept_ra = 1
net.ipv6.conf.eth2.mtu = 1500
net.ipv6.conf.eth2.hop_limit = 64
net.ipv6.conf.eth2.forwarding = 0
net.ipv6.conf.eth0.router_probe_interval = 60
net.ipv6.conf.eth0.accept_ra_rtr_pref = 1
net.ipv6.conf.eth0.accept_ra_pinfo = 1
net.ipv6.conf.eth0.accept_ra_defrtr = 1
net.ipv6.conf.eth0.max_addresses = 16
net.ipv6.conf.eth0.max_desync_factor = 600
net.ipv6.conf.eth0.regen_max_retry = 5
net.ipv6.conf.eth0.temp_prefered_lft = 86400
net.ipv6.conf.eth0.temp_valid_lft = 604800
net.ipv6.conf.eth0.use_tempaddr = 0
net.ipv6.conf.eth0.force_mld_version = 0
net.ipv6.conf.eth0.router_solicitation_delay = 1
net.ipv6.conf.eth0.router_solicitation_interval = 4
net.ipv6.conf.eth0.router_solicitations = 3
net.ipv6.conf.eth0.dad_transmits = 1
net.ipv6.conf.eth0.autoconf = 1
net.ipv6.conf.eth0.accept_redirects = 1
net.ipv6.conf.eth0.accept_ra = 1
net.ipv6.conf.eth0.mtu = 1500
net.ipv6.conf.eth0.hop_limit = 64
net.ipv6.conf.eth0.forwarding = 0
net.ipv6.conf.default.router_probe_interval = 60
net.ipv6.conf.default.accept_ra_rtr_pref = 1
net.ipv6.conf.default.accept_ra_pinfo = 1
net.ipv6.conf.default.accept_ra_defrtr = 1
net.ipv6.conf.default.max_addresses = 16
net.ipv6.conf.default.max_desync_factor = 600
net.ipv6.conf.default.regen_max_retry = 5
net.ipv6.conf.default.temp_prefered_lft = 86400
net.ipv6.conf.default.temp_valid_lft = 604800
net.ipv6.conf.default.use_tempaddr = 0
net.ipv6.conf.default.force_mld_version = 0
net.ipv6.conf.default.router_solicitation_delay = 1
net.ipv6.conf.default.router_solicitation_interv
Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address
Brian,

ipv6_addr_type() returns a mask, so checking for equality will fail to match if any other (irrelevant) attributes are set. How about using bitwise operators for that?

Also, the error message is no longer descriptive of the failure if it's a link-local multicast, but you could make it "target address is not link-local unicast.\n" (in both places).

+-DLS
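To make the equality-versus-mask point concrete, here is a minimal userspace sketch. The flag values and the extra attribute bit are illustrative assumptions (ADDR_EXTRA is made up), and the two helper functions are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; ADDR_EXTRA is a made-up unrelated attribute. */
enum {
	ADDR_UNICAST   = 0x0001,
	ADDR_MULTICAST = 0x0002,
	ADDR_LINKLOCAL = 0x0020,
	ADDR_EXTRA     = 0x1000
};

/* Equality test: matches only when no other bit at all is set. */
bool ll_unicast_by_equality(int type)
{
	assert(type >= 0);
	return type == (ADDR_UNICAST | ADDR_LINKLOCAL);
}

/* Bitwise test: requires the unicast and link-local bits, rejects
 * multicast, and ignores any unrelated attribute bits. */
bool ll_unicast_by_mask(int type)
{
	assert(type >= 0);
	const int want = ADDR_UNICAST | ADDR_LINKLOCAL;
	return (type & want) == want && !(type & ADDR_MULTICAST);
}
```

The two tests agree as long as no other attribute is ever combined with the link-local unicast bits; they diverge as soon as one is.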
Re: tcp bw in 2.6
From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 12:27:53 -0700 (PDT)

> We see a single packet containing 16060 bytes, which seems to be because
> of TSO on the sending side (you did your tcpdump on the sender, no?), so
> it will actually be broken up into 11 1460-byte regular frames by the
> network card, since they started out agreeing on a standard 1460-byte MSS.
> So the above is not a jumbo frame, it just kind of looks like one when you
> capture it on the sender side.
>
> And maybe a 32kB window is not big enough when it causes the networking
> code to basically just have a single packet outstanding.

We fixed a lot of bugs in TSO last year.

It would be really great to see numbers with a more recent kernel than 2.6.18
Re: tcp bw in 2.6
From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 12:29:50 -0700 (PDT)

> On Tue, 2 Oct 2007, Larry McVoy wrote:
> >
> > No HP in the mix. It's got nothing to do with hp, nor to do with rsh, it
> > has everything to do with the direction the data is flowing.
>
> Can you tcpdump both cases and send snippets (both of steady-state, and
> the initial connect)?

Another thing I'd like to see is if something more recent than 2.6.18 also reproduces the problem.

It could be just some bug we've fixed in the past year :)
[PATCH 2.6.24] tg3: fix ethtool autonegotiate flags
I recently noticed that when calling:

# ethtool -s eth0 autoneg on

on a 5722 (though I'm sure it's not specific to that card) that subsequent checks of the cards status looked like this:

# ethtool eth0
Settings for eth0:
	Supported ports: [ MII ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	Advertised auto-negotiation: No    <-- This seems odd?!?
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: g
	Wake-on: d
	Current message level: 0x00ff (255)
	Link detected: yes

I noticed that the following commit:

commit 3600d918d870456ea8e7bb9d47f327de5c20f3d6
Author: Michael Chan <[EMAIL PROTECTED]>
Date: Thu Dec 7 00:21:48 2006 -0800

    [TG3]: Allow partial speed advertisement.

    Honor the advertisement bitmask from ethtool. We used to always
    advertise the full capability when autoneg was set to on.

changed things around so that ethtool speed settings were strictly followed. Unfortunately ethtool doesn't seem to set ADVERTISED_Autoneg in the advertising field (and maybe it shouldn't have to). I'd vote that it should be fixed there, but it should also be added here just in case someone using ethtool ioctls in their own application gets what they want.

Adding that flag in tg3_set_settings seemed like the most logical place since the driver works fine on boot. This is just an issue when re-enabling autonegotiation, so we should probably nip it there.

Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
---

 tg3.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index d4ac6e9..0a414be 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -8070,7 +8070,8 @@ static int tg3_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 	tp->link_config.autoneg = cmd->autoneg;
 	if (cmd->autoneg == AUTONEG_ENABLE) {
-		tp->link_config.advertising = cmd->advertising;
+		tp->link_config.advertising = (cmd->advertising |
+					       ADVERTISED_Autoneg);
 		tp->link_config.speed = SPEED_INVALID;
 		tp->link_config.duplex = DUPLEX_INVALID;
 	} else {
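The effect of the one-line change can be sketched in userspace as plain bitmask handling. The ADV_* values below are illustrative stand-ins for the ADVERTISED_* bits (assumed, not copied from a header), and both functions are hypothetical helpers rather than driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for ADVERTISED_* bits; values are assumptions. */
#define ADV_100baseT_Full  (1u << 3)
#define ADV_1000baseT_Full (1u << 5)
#define ADV_Autoneg        (1u << 6)

/* What the patched branch stores when autoneg is being enabled:
 * the caller's link modes plus the autoneg bit itself. */
uint32_t advertising_with_autoneg(uint32_t requested)
{
	assert((requested & ~0xffffu) == 0);	/* keep the sketch to low bits */
	return requested | ADV_Autoneg;
}

/* What a status query would report as "Advertised auto-negotiation". */
bool reports_autoneg(uint32_t advertising)
{
	return (advertising & ADV_Autoneg) != 0;
}
```

A caller that passes only link-mode bits would otherwise get "No" reported back even though autonegotiation is on, which matches the odd ethtool output described in the message.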
Re: tcp bw in 2.6
> I think I'm still missing some basic data here (probably because this
> thread did not originate on netdev). Let me try to nail down some of
> the basics. You have a linux ia64 box (running 2.6.12 or 2.6.18?) that
> sends slowly, and receives faster, but not quite a 1 Gbps? And this is
> true regardless of which peer it sends or receives from? And the
> behavior is different depending on which kernel? How, and which kernel
> versions? Do you have other hardware running the same kernel that
> behaves the same or differently?

Just got off the phone with Linus and he thinks the side that does the accept is the problem side, i.e., if you are the server, you do the accept, and you send the data, you'll go slow. But as I'm writing this I realize he's wrong, because it is the combination of accept & send. accept & recv goes fast.

A trivial way to see the problem is to take two linux boxes, on each

	apt-get install rsh-client rsh-server

set up your .rhosts, and then do

	dd if=/dev/zero count=10 | rsh OTHER_BOX dd of=/dev/null
	rsh OTHER_BOX dd if=/dev/zero count=10 | dd of=/dev/null

See if you get balanced results. For me, I get 45MB/sec one way, and 15-19MB/sec the other way.

I've tried the same test linux - linux and linux - hpux. Same results. The test setup I have is

	work:     2ghz x 2 Athlons, e1000, 2.6.18
	ia64:     900mhz Itanium, e1000, 2.6.12
	hp-ia64:  900mhz Itanium, e1000, hpux 11
	glibc*:   1-2ghz athlons running various linux releases

all connected through a netgear 724T 10/100/1000 switch (a linksys showed identical results).

I tested

	work <-> hp-ia64
	work <-> ia64
	ia64 <-> hp-ia64

and in all cases, one direction worked fast and the other didn't.

It would be good if people tried the same simple test. You have to use rsh, ssh will slow things down way too much.

Alternatively, take your favorite test programs, such as John's, and make a second pair that reverses the direction the data is sent. So one pair is server sends, the other is server receives, try both. That's where we started, BitKeeper, my stripped down test, and John's test all exhibit the same behavior. And the rsh test is just a really simple way to demonstrate it.

Wayne, Linus asked for tcp dumps from just one side, with the first 100 packets and then wait 10 seconds or so for the window to open up, and then a snapshot of another 100 packets. Do that for both directions and send them to the list. Can you do that? I want to get lunch, I'm starving.
--
---
Larry McVoy   lm at bitmover.com   http://www.bitkeeper.com
Re: MSI interrupts and disable_irq
Ayaz Abdulla wrote:
> I am trying to track down a forcedeth driver issue described by bug 9047
> in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load).
> I added a patch to synchronize the timer handlers so that one handler
> doesn't accidentally enable the IRQ while another timer handler is running
> (see attachment 'Add timer lock' in bug report) and for other processing
> protection. However, the system still had an Oops. So I added a lock
> around the nv_rx_process_optimized() and the Oops has not happened (see
> attachment 'New patch for locking' in bug report). This would imply a
> synchronization issue. However, the only callers of that function are
> the IRQ handler and the timer handlers (in non-NAPI case). The timer
> handlers use disable_irq so that the IRQ handler does not contend with
> them. It looks as if disable_irq is not working properly.

Either disable_irq() is not working properly or interrupts are nested, i.e. the irq handler is called again while running.

Which timer handler do you mean? I only see disable_irq() in the configuration paths (set mtu, change ring size, ...) and in the tx timeout case. Neither one should happen during normal operation.

--
Manfred
Re: 2.6.23-rc8-mm2 - tcp_fastretrans_alert() WARNING
On Tue, 2 Oct 2007, Ilpo Järvinen wrote:

> I'm currently out of ideas where it could come from... so lets try
> brute-force checking as your test case is not very high-speed... This
> could hide it though... :-(
>
> Please put the patch below on top of clean rc8-mm2 (it includes the patch
> I gave you last time) and try to reproduce. These counter bugs can
> survive for sometime until !sacked_out condition occurs, so the patch
> below tries to find that out when inconsistency occurs for the first time
> regardless of sacked_out (I also removed some statics which hopefully
> reduces compiler inlining for easier reading of the output). I tried this
> myself (except for verify()s in frto funcs and minor printout
> modifications), didn't trigger for me.

In case you haven't gotten started yet (or it's easy enough to replace), please use the one below instead (I forgot one counter from printout in the last patch, which might turn out useful...).

--
 i.

---
 include/net/tcp.h     |    3 +
 net/ipv4/tcp_input.c  |   23 +--
 net/ipv4/tcp_ipv4.c   |  103 +
 net/ipv4/tcp_output.c |    6 ++-
 4 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 991ccdc..54a0d91 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -43,6 +43,9 @@
 #include
+extern void tcp_verify_fackets(struct sock *sk);
+extern void tcp_print_queue(struct sock *sk);
+
 extern struct inet_hashinfo tcp_hashinfo;
 extern atomic_t tcp_orphan_count;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e22ffe7..1d7367d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1140,7 +1140,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 	return dup_sack;
 }
-static int
+int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1160,6 +1160,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	int first_sack_index;
 	if (!tp->sacked_out) {
+		if (WARN_ON(tp->fackets_out))
+			tcp_print_queue(sk);
 		tp->fackets_out = 0;
 		tp->highest_sack = tp->snd_una;
 	}
@@ -1420,6 +1422,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 			}
 		}
 	}
+	tcp_verify_fackets(sk);
 	/* Check for lost retransmit. This superb idea is
 	 * borrowed from "ratehalving". Event "C".
@@ -1632,13 +1635,14 @@ void tcp_enter_frto(struct sock *sk)
 	tcp_set_ca_state(sk, TCP_CA_Disorder);
 	tp->high_seq = tp->snd_nxt;
 	tp->frto_counter = 1;
+	tcp_verify_fackets(sk);
 }
 /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO,
  * which indicates that we should follow the traditional RTO recovery,
  * i.e. mark everything lost and do go-back-N retransmission.
  */
-static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
+void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
@@ -1675,6 +1679,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_fackets(sk);
 	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
@@ -1753,6 +1758,7 @@ void tcp_enter_loss(struct sock *sk, int how)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_fackets(sk);
 	tp->reordering = min_t(unsigned int, tp->reordering,
 			       sysctl_tcp_reordering);
@@ -2308,7 +2314,7 @@ static void tcp_mtup_probe_success(struct sock *sk, struct sk_buff *skb)
  * It does _not_ decide what to send, it is made in function
  * tcp_xmit_retransmit_queue().
  */
-static void
+void
 tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2322,8 +2328,11 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (!tp->packets_out)
 		tp->sacked_out = 0;
-	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
+	if (WARN_ON(!tp->sacked_out && tp->fackets_out)) {
+		printk(KERN_ERR "TCP %d\n", tcp_is_reno(tp));
+		tcp_print_queue(sk);
 		tp->fackets_out = 0;
+	}
 	/* Now state machine starts.
 	 * A. ECE, hence prohibit cwnd undoing, the reduction is required. */
@@ -2333,6 +2342,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	/* B. In all the states check for reneging SACKs. */
 	if (tp->sacked_out && tcp_check_sack_reneging(sk))
[PATCH] - trivial - Correct printk with PFX before KERN_ in drivers/net/wireless/bcm43xx/bcm43xx_wx.c
Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
index d6d9413..6acfdc4 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
@@ -444,7 +444,7 @@ static int bcm43xx_wx_set_xmitpower(struct net_device *net_dev,
 	u16 maxpower;
 	if ((data->txpower.flags & IW_TXPOW_TYPE) != IW_TXPOW_DBM) {
-		printk(PFX KERN_ERR "TX power not in dBm.\n");
+		printk(KERN_ERR PFX "TX power not in dBm.\n");
 		return -EOPNOTSUPP;
 	}
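Why the order matters can be shown with a small userspace sketch, assuming the 2.6-era convention that the log level is a textual "<N>" marker at the very start of the format string. The PFX value and parse_loglevel() below are assumptions for illustration only, not the driver's definition or the kernel's parser.

```c
#include <assert.h>
#include <stddef.h>

#define KERN_ERR "<3>"		/* textual log-level marker (2.6-era style) */
#define PFX      "bcm43xx: "	/* assumed driver prefix, for illustration */

/* Return the level encoded at the very start of msg, or -1 if the
 * string does not begin with a "<digit>" marker. */
int parse_loglevel(const char *msg)
{
	assert(msg != NULL);
	if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
		return msg[1] - '0';
	return -1;
}
```

Adjacent string literals are concatenated by the compiler, so with PFX first the marker ends up in the middle of the message, where a parser that looks only at the start will not treat it as a level.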
Re: tcp bw in 2.6
Larry McVoy wrote:
> More data, we've conclusively eliminated the card / cpu from the mix.
> We've got 2 ia64 boxes with e1000 interfaces. One box is running
> linux 2.6.12 and the other is running hpux 11.
>
> I made sure the linux one was running at gigabit and reran the tests
> from the linux/ia64 <=> hp/ia64. Same results, when linux sends
> it is slow, when it receives it is fast.
>
> And note carefully: we've removed hpux from the equation, we can do
> the same tests from linux to multiple linux clients and see the same
> thing, sending from the server is slow, receiving on the server is
> fast.

I think I'm still missing some basic data here (probably because this thread did not originate on netdev). Let me try to nail down some of the basics. You have a linux ia64 box (running 2.6.12 or 2.6.18?) that sends slowly, and receives faster, but not quite a 1 Gbps? And this is true regardless of which peer it sends or receives from? And the behavior is different depending on which kernel? How, and which kernel versions? Do you have other hardware running the same kernel that behaves the same or differently?

Have you done ethernet cable tests? Have you tried measuring the udp sending rate? (Iperf can do this.) Are there any error counters on the interface?

-John
Re: tcp bw in 2.6
> I also would have expected more ACK's from the HP box. It's been a long
> time since I did TCP, but I thought the rule was still that you were
> supposed to ACK at least every other full frame - but the HP box is acking
> roughly every 16K (and it's *not* always at TSO boundaries: the earlier
> ACK's in the sequence are at 1460-byte packet boundaries, but it does seem
> to end up getting into that pattern later on).

Drift...

The RFC's say "SHOULD" (emphasis theirs) rather than "MUST."

Both HP-UX and Solaris have rather robust ACK avoidance heuristics to cut down on the CPU overhead of bulk transfers. (That they both have them stems from their being cousins, sharing a common TCP stack ancestor long ago - both of course have been diverging since then).

rick jones
Re: tcp bw in 2.6
Larry McVoy wrote:
> On Tue, Oct 02, 2007 at 11:01:47AM -0700, Rick Jones wrote:
>> has anyone already asked whether link-layer flow-control is enabled?
>
> I doubt it, the same test works fine in one direction and poorly in the
> other. Wouldn't the flow control squelch either way?

While I am often guilty of it, a wise old engineer tried to teach me that the proper spelling is ass-u-me :) I wouldn't count on it hitting in both directions, depends on the specifics of the situation.

WRT the HP-UX ACK avoidance heuristic, the default HP-UX socket buffer/window is 32768, and tcp_deferred_ack_max defaults to 22. That isn't really all that good a combination - with a window of 32768 11 for the deferred ack would be better. You could also go ahead and try it with a value of 2. Or, bump the window size defaults - tcp_recv_hiwater_def and tcp_xmit_hiwater_def - to say 65535 or 128K or something - or use the setsockopt() calls to effect that.

rick jones
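A minimal sketch of the setsockopt() route mentioned above, plus the arithmetic that presumably makes 22 deferred ACKs an awkward match for a 32768-byte window (about 22 full 1460-byte segments fit in it). The function names are hypothetical, the 1460-byte MSS is taken from elsewhere in the thread, and the stack may round or cap whatever buffer size is requested.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask for send and receive socket buffers of `bytes` on socket `fd`.
 * Returns 0 on success, -1 if either setsockopt() call fails. */
int request_socket_buffers(int fd, int bytes)
{
	assert(bytes > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	return 0;
}

/* Number of full MSS-sized segments that fit in a window. */
unsigned int full_segments_in_window(unsigned int window, unsigned int mss)
{
	assert(mss > 0);
	return window / mss;
}
```

An application would typically call request_socket_buffers(fd, 128 * 1024) before connect() or listen(), since the window scale is negotiated when the connection is set up.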
Re: tcp bw in 2.6
More data, we've conclusively eliminated the card / cpu from the mix. We've got 2 ia64 boxes with e1000 interfaces. One box is running linux 2.6.12 and the other is running hpux 11.

I made sure the linux one was running at gigabit and reran the tests from the linux/ia64 <=> hp/ia64. Same results, when linux sends it is slow, when it receives it is fast.

And note carefully: we've removed hpux from the equation, we can do the same tests from linux to multiple linux clients and see the same thing, sending from the server is slow, receiving on the server is fast.
--
---
Larry McVoy   lm at bitmover.com   http://www.bitkeeper.com
Re: tcp bw in 2.6
On Tue, 2 Oct 2007, Larry McVoy wrote:
>
> No HP in the mix. It's got nothing to do with hp, nor to do with rsh, it
> has everything to do with the direction the data is flowing.

Can you tcpdump both cases and send snippets (both of steady-state, and the initial connect)?

		Linus
Re: tcp bw in 2.6
On Tue, 2 Oct 2007, Larry McVoy wrote:
>
> tcpdump is a good idea, take a look at this. The window starts out
> at 46 and never opens up in my test case, but in the rsh case it
> starts out the same but does open up. Ideas?

I don't think that's an issue, since you only send one way. The window opening up only matters for the receiver. Also, you missed the "wscale=7" at the beginning, so the window of "46" looks like it actually is 5888 (ie fits four segments - and it's not grown because it never gets any data).

However, I think this is some strange TSO artifact:

...
> 08:08:18.843942 IP work-cluster.bitmover.com.31235 > hp-ia64.bitmover.com.49614: P 48181:64241(16060) ack 0 win 46
> 08:08:18.844681 IP hp-ia64.bitmover.com.49614 > work-cluster.bitmover.com.31235: . ack 48181 win 32768
> 08:08:18.844690 IP work-cluster.bitmover.com.31235 > hp-ia64.bitmover.com.49614: P 64241:80301(16060) ack 0 win 46
> 08:08:18.845556 IP hp-ia64.bitmover.com.49614 > work-cluster.bitmover.com.31235: . ack 64241 win 32768
> 08:08:18.845566 IP work-cluster.bitmover.com.31235 > hp-ia64.bitmover.com.49614: . 80301:96361(16060) ack 0 win 46
> 08:08:18.846304 IP hp-ia64.bitmover.com.49614 > work-cluster.bitmover.com.31235: . ack 80301 win 32768
...

We see a single packet containing 16060 bytes, which seems to be because of TSO on the sending side (you did your tcpdump on the sender, no?), so it will actually be broken up into 11 1460-byte regular frames by the network card, since they started out agreeing on a standard 1460-byte MSS. So the above is not a jumbo frame, it just kind of looks like one when you capture it on the sender side.

And maybe a 32kB window is not big enough when it causes the networking code to basically just have a single packet outstanding.

I also would have expected more ACK's from the HP box. It's been a long time since I did TCP, but I thought the rule was still that you were supposed to ACK at least every other full frame - but the HP box is acking roughly every 16K (and it's *not* always at TSO boundaries: the earlier ACK's in the sequence are at 1460-byte packet boundaries, but it does seem to end up getting into that pattern later on).

So I'm wondering if we get into some bad pattern with the networking code trying to make big TSO packets for e1000, but because they are *so* big that there's only room for two such packets per window, you don't get into any smooth pattern with lots of outstanding packets, but it starts stuttering.

Larry, try turning off TSO. Or rather, make the kernel use a smaller limit for the large packets. The easiest way to do that should be to just change the value in /proc/sys/net/ipv4/tcp_tso_win_divisor. It defaults to 3, try doing

	echo 6 > /proc/sys/net/ipv4/tcp_tso_win_divisor

and see if that changes anything.

And maybe I'm just whistling in the dark. In fact, it looks like for you it's not 3, but 2 (window of 32768, but the TSO frames are half the size). So maybe I'm just totally confused and I'm not reading that tcp dump correctly at all!

		Linus
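The arithmetic behind the numbers quoted in this message can be checked with a short sketch. The helper names are made up for illustration; the inputs (window field 46, wscale 7, 1460-byte MSS, 16060-byte TSO packets, 32768-byte window) all come from the tcpdump discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Actual window in bytes for a raw window field and a window scale. */
uint32_t scaled_window(uint16_t win_field, unsigned int wscale)
{
	assert(wscale <= 14);	/* largest shift TCP window scaling allows */
	return (uint32_t)win_field << wscale;
}

/* Number of MSS-sized frames one large TSO packet is split into. */
uint32_t frames_per_tso(uint32_t tso_bytes, uint32_t mss)
{
	assert(mss > 0);
	return (tso_bytes + mss - 1) / mss;	/* round up */
}

/* How many whole TSO packets of a given size fit in a window. */
uint32_t tso_packets_in_window(uint32_t window, uint32_t tso_bytes)
{
	assert(tso_bytes > 0);
	return window / tso_bytes;
}
```

So 46 << 7 gives 5888 bytes (four full 1460-byte segments), each 16060-byte capture is 11 frames on the wire, and only two such packets fit in the 32768-byte window.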
[IPv6] Fix ICMPv6 redirect handling with target multicast address
When the ICMPv6 Target address is multicast, Linux processes the
redirect instead of dropping it. The problem is in this code in
ndisc_redirect_rcv():

	if (ipv6_addr_equal(dest, target)) {
		on_link = 1;
	} else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
		ND_PRINTK2(KERN_WARNING
			   "ICMPv6 Redirect: target address is not link-local.\n");
		return;
	}

This second check will succeed if the Target address is, for example,
FF02::1 because it has link-local scope. Instead, it should be checking
if it's a unicast link-local address, as stated in RFC 2461/4861
Section 8.1:

  - The ICMP Target Address is either a link-local address (when
    redirected to a router) or the same as the ICMP Destination
    Address (when redirected to the on-link destination).

I know this doesn't explicitly say unicast link-local address, but it's
implied.

This bug is preventing Linux kernels from achieving IPv6 Logo Phase II
certification because of a recent error that was found in the TAHI test
suite - Neighbor Discovery suite test 206 (v6LC.2.3.6_G) had the
multicast address in the Destination field instead of Target field, so
we were passing the test. This won't be the case anymore.

The patch below fixes this problem, and also fixes ndisc_send_redirect()
to not send an invalid redirect with a multicast address in the Target
field. I re-ran the TAHI Neighbor Discovery section to make sure Linux
passes all 245 tests now.
-Brian

Signed-off-by: Brian Haley <[EMAIL PROTECTED]>

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 74c4d8d..a0a6406 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1267,7 +1267,8 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 
 	if (ipv6_addr_equal(dest, target)) {
 		on_link = 1;
-	} else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
+	} else if (ipv6_addr_type(target) !=
+		   (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
 		ND_PRINTK2(KERN_WARNING
 			   "ICMPv6 Redirect: target address is not link-local.\n");
 		return;
@@ -1343,7 +1344,7 @@ void ndisc_send_redirect(struct sk_buff *skb, struct neighbour *neigh,
 	}
 
 	if (!ipv6_addr_equal(&ipv6_hdr(skb)->daddr, target) &&
-	    !(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
+	    ipv6_addr_type(target) != (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
 		ND_PRINTK2(KERN_WARNING
 			   "ICMPv6 Redirect: target address is not link-local.\n");
 		return;
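To see why the old test lets FF02::1 through, here is a small user-space model of the two checks. It is only a sketch: the flag values and the addr_type() classifier are illustrative stand-ins for the kernel's IPV6_ADDR_* bits and ipv6_addr_type(), and only the link-local cases are modelled.

```python
import ipaddress

# Illustrative flag bits (assumed values, standing in for IPV6_ADDR_*).
ADDR_UNICAST = 0x01
ADDR_MULTICAST = 0x02
ADDR_LINKLOCAL = 0x20

LINK_LOCAL_UNICAST = ipaddress.IPv6Network("fe80::/10")


def addr_type(text):
    """Simplified classifier: multicast vs unicast, plus link-local scope."""
    addr = ipaddress.IPv6Address(text)
    packed = addr.packed
    if packed[0] == 0xFF:                  # ff00::/8 is multicast
        flags = ADDR_MULTICAST
        if packed[1] & 0x0F == 0x02:       # scope nibble 2 = link-local scope
            flags |= ADDR_LINKLOCAL
        return flags
    flags = ADDR_UNICAST
    if addr in LINK_LOCAL_UNICAST:
        flags |= ADDR_LINKLOCAL
    return flags


def old_check_accepts(target):
    """Pre-patch test: any address carrying the link-local bit passes."""
    return bool(addr_type(target) & ADDR_LINKLOCAL)


def new_check_accepts(target):
    """Post-patch test: the type must be exactly unicast link-local."""
    return addr_type(target) == (ADDR_UNICAST | ADDR_LINKLOCAL)


if __name__ == "__main__":
    for target in ("fe80::1", "ff02::1", "2001:db8::1"):
        print(target, old_check_accepts(target), new_check_accepts(target))
```

The bitwise AND only asks whether the link-local bit is set, so a link-local-scope multicast address satisfies it; the equality comparison in the patch requires the unicast bit as well.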
Re: tcp bw in 2.6
> Looks like you have TSO enabled. Does it behave differently if it's
> disabled?

It cranks the interrupts/sec up to 8K instead of 5K. No difference in
performance other than that.

> I think Rick Jones is on to something with the HP ack avoidance.

I sincerely doubt it. I'm only using the HP box because it has gigabit
so it's a single connection. I can produce almost identical results by
doing the same sorts of tests with several linux clients. One direction
goes fast and the other goes slow. 3x performance difference depending
on the direction of data flow:

# Server is receiving, goes fast
$ for i in 22 24 25 26; do rsh -n glibc$i dd if=/dev/zero|dd of=/dev/null & done

load free cach swap pgin pgou dk0 dk1 dk2 dk3 ipkt opkt  int  ctx usr sys idl
0.98    0    0    0    0    0   0   0   0   0  30K  15K 8.1K  68K  12  66  22
0.98    0    0    0    0    0   0   0   0   0  29K  15K 8.2K  67K  11  64  25
0.98    0    0    0    0    0   0   0   0   0  29K  15K 8.2K  67K  12  66  22

# Server is sending, goes slow
$ for i in 22 24 25 26; do dd if=/dev/zero|rsh glibc$i dd of=/dev/null & done

load free cach swap pgin pgou dk0 dk1 dk2 dk3 ipkt opkt  int  ctx usr sys idl
1.06    0    0    0    0    0   0   0   0   0 5.0K  10K 4.4K 8.4K  21  17  62
0.97    0    0    0    0    0   0   0   0   0 5.1K  10K 4.4K 8.9K   2  15  83
0.97    0    0    0    0    0   0   0   0   0 5.0K  10K 4.4K 8.6K  21  26  53

$ for i in 22 24 25 26; do rsh glibc$i cat /etc/motd; done | grep Welcome
Welcome to redhat71.bitmover.com, a 2Ghz Athlon running Red Hat 7.1.
Welcome to glibc24.bitmover.com, a 1.2Ghz Athlon running SUSE 10.1.
Welcome to glibc25.bitmover.com, a 2Ghz Athlon running Fedora Core 6
Welcome to glibc26.bitmover.com, a 2Ghz Athlon running Fedora Core 7

$ for i in 22 24 25 26; do rsh glibc$i uname -r; done
2.4.2-2
2.6.16.13-4-default
2.6.18-1.2798.fc6
2.6.22.4-65.fc7

No HP in the mix. It's got nothing to do with hp, nor to do with rsh, it
has everything to do with the direction the data is flowing.
--
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com
RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
>Umm... this is a difficult situation for me to merge the changes then.
>We're changing the CM retry behavior blind here. How do we know that
>the MRA changes don't make the scalability issue worse?

What's currently upstream doesn't work for Intel MPI on our larger
clusters. The connection requests time out on the active side before the
passive side can respond. The OFED release works because it provides a
kernel patch to make the timeout a module parameter. I'm trying to avoid
adding a module parameter, and the MRA is designed for this situation.

I tested this by simulating a slow passive side responder, and it worked
as expected for those tests. Using an MRA does add another MAD to the CM
exchange, which is why it is sent only after seeing a duplicate request.

Alternatively, we can take the OFED module parameter patch.

- Sean
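The "send an MRA only after a duplicate request" rule described above can be pictured with a toy model of the passive side. This is a sketch with hypothetical names, not the ib_cm code: it only captures that the first copy of a request costs no extra MAD, and that a repeated copy (a sign that the active side is retrying) triggers the MRA. What happens on further duplicates is not stated in the message and is not modelled.

```python
class PassiveSide:
    """Toy passive-side responder (hypothetical; not the kernel's CM)."""

    def __init__(self):
        # Request IDs that have been received and are still being processed.
        self.pending = set()

    def on_request(self, req_id):
        """Return the action taken for an incoming connection request."""
        if req_id not in self.pending:
            # First copy: start handling it, send nothing extra.
            self.pending.add(req_id)
            return "process"
        # Duplicate: the active side has retried, so ask it to keep waiting.
        return "send_mra"


if __name__ == "__main__":
    side = PassiveSide()
    print(side.on_request(7))   # first copy of request 7
    print(side.on_request(7))   # retry of request 7 while still pending
    print(side.on_request(8))   # unrelated new request
```

In the common case where the passive side answers in time, no duplicate arrives and the exchange carries no additional MAD.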
Re: tcp bw in 2.6
On Tue, Oct 02, 2007 at 11:01:47AM -0700, Rick Jones wrote:
> has anyone already asked whether link-layer flow-control is enabled?

I doubt it, the same test works fine in one direction and poorly in the
other. Wouldn't the flow control squelch either way?
--
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com
Re: tcp bw in 2.6
> Make sure you don't have slab debugging turned on. It kills performance.

It's a stock debian kernel, so unless they turn it on it's off.
--
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com
Re: tcp bw in 2.6
Larry McVoy wrote:
> On Tue, Oct 02, 2007 at 06:52:54PM +0800, Herbert Xu wrote:
>>> One of my clients also has gigabit so I played around with just that
>>> one and it (itanium running hpux w/ broadcom gigabit) can push the load
>>> as well. One weird thing is that it is dependent on the direction the
>>> data is flowing. If the hp is sending then I get 46MB/sec, if linux is
>>> sending then I get 18MB/sec. Weird. Linux is debian, running
>>
>> First of all check the CPU load on both sides to see if either
>> of them is saturating. If the CPU's fine then look at the tcpdump
>> output to see if both receivers are using the same window settings.
>
> tcpdump is a good idea, take a look at this. The window starts out
> at 46 and never opens up in my test case, but in the rsh case it
> starts out the same but does open up. Ideas?

(Binary tcpdumps are always better than ascii.)

The window on the sender (linux box) starts at 46. It doesn't open up,
but it's not receiving data so it doesn't matter, and you don't expect
it to. The HP box always announces a window of 32768.

Looks like you have TSO enabled. Does it behave differently if it's
disabled? I think Rick Jones is on to something with the HP ack
avoidance. Looks like a pretty low ack ratio, and it might not be
interacting well with TSO, especially at such a small window size.

-John
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> >OK -- just to make sure I'm understanding what you're saying: have you
> >confirmed that your proposed [CM MRA] patches actually fix the issue?
>
> Not directly. I cannot easily test kernel patches on our larger, production
> clusters. We've seen the issue with specific applications on 512 and 1024
> cores, but I've only been able to test the patch on a 48-core cluster. I have
> verified that it successfully increases the timeout to where it *should* work,
> but cannot absolutely confirm that it will fix the problem. I'm unlikely to
> know that until the production clusters move to an OFED release (1.3?)
> containing this patch.

Umm... this is a difficult situation for me to merge the changes then.
We're changing the CM retry behavior blind here. How do we know that
the MRA changes don't make the scalability issue worse?

 - R.