kernel 2.4 vs 2.6 Traffic Controller performance

2007-10-02 Thread Sonny
Hello,
This is a repost; there seems to have been a misunderstanding before.

I hope this is the right place to ask this. Does anyone know if there is a
substantial difference in the performance of the traffic controller
between kernel 2.4 and 2.6? We tested it using 1 iperf server and
250 and 500 clients, altering the burst.

This is the set-up:
iperf client - router (w/ traffic controller) - iperf server

We use the top command on the router to check its idle time.
The results from the 2.4 kernel show around 65-70% idle time, while
the 2.6 kernel shows 60-65% idle time. We tried to use MRTG and we're
not getting any results either. We want to know if we could improve the
bandwidth by upgrading the kernel; otherwise we would have to get a new
bandwidth manager. Has anyone performed a similar test, or can anyone
suggest a better way to do this? Thanks in advance.
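
One way to get idle figures that are easier to compare between runs than
reading them off top is to sample the aggregate "cpu" line of /proc/stat
before and after each test. The following is only a minimal sketch under
that assumption; the names cpu_sample, parse_cpu_line and idle_percent are
made up for illustration and are not part of any tool mentioned here.

```c
#include <stdio.h>
#include <string.h>

/* Counters taken from the aggregate "cpu" line of /proc/stat. */
struct cpu_sample {
	unsigned long long idle;	/* 4th field: idle ticks */
	unsigned long long total;	/* sum of all fields that were parsed */
};

/*
 * Parse one "cpu ..." line. Older kernels print only four fields
 * (user nice system idle); newer ones append more, so anything from
 * four to eight fields is accepted. Returns 0 on success, -1 otherwise.
 */
static int parse_cpu_line(const char *line, struct cpu_sample *out)
{
	unsigned long long v[8] = { 0 };
	int n, i;

	/* "cpu " matches the aggregate line only, not "cpu0", "cpu1", ... */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;

	n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
	if (n < 4)
		return -1;

	out->idle = v[3];
	out->total = 0;
	for (i = 0; i < n; i++)
		out->total += v[i];
	return 0;
}

/* Idle share, in percent, of the interval between two samples. */
static double idle_percent(const struct cpu_sample *before,
			   const struct cpu_sample *after)
{
	unsigned long long dt = after->total - before->total;

	if (dt == 0)
		return 0.0;
	return 100.0 * (double)(after->idle - before->idle) / (double)dt;
}
```

Reading the first line of /proc/stat with fgets() at the start and end of
an iperf run and passing both samples to idle_percent() gives one number
per run, averaged over the whole run rather than over top's refresh period.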
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-02 Thread Herbert Xu
On Tue, Oct 02, 2007 at 05:29:11PM +0200, Patrick McHardy wrote:
>
> I think this doesn't completely fix it, when dev_unicast_add is
> interrupted by dev_mc_add before the unicast changes are performed,
> they will get committed in the dev_mc_add context, so we might still
> call change_flags with BH disabled. Taking the TX lock around the
> dev->uc_count and dev->uc_promisc checks and changes in __dev_set_rx_mode
> should fix this.

Good catch.  Digging back in history it seems that you added
the change_rx_flags function so that the driver didn't have to
do it under TX lock, right?

The problem with this is that the stack can now call
change_rx_flags and set_multicast_list simultaneously
which presents a potential headache for the driver
author (if they were to use change_rx_flags).

It seems to me what we could do is in fact separate out the
part that adds the address and the part that syncs it with
hardware.

That way we can call the hardware from a process context later
and use the RTNL to guarantee that we only enter the driver
once.

So dev_mc_add would look like:

1) Hold some form of lock L.
2) Modify mc list A (a copy of the current mc list).
3) Drop lock.
4) Schedule an update to the hardware.

The update to the hardware would look like:

1) Hold RTNL.
2) Hold lock L.
3) Copy list A to list B (B would be our current list).
4) Drop lock L.
5) Call the hardware.
6) Drop RTNL.

For compatibility, set_multicast_list would still be invoked
under the TX lock while set_rx_mode would do exactly the same
thing but would only hold the RTNL.
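
The two sequences above can be sketched in user-space C. This is only an
illustration of the locking order, not kernel code: the pthread mutexes
stand in for lock L and the RTNL, and every name below (mc_add, mc_sync,
hw_program, pending_list, active_list) is hypothetical.

```c
#include <pthread.h>
#include <string.h>

#define MAX_ADDRS 16
#define ADDR_LEN  6

struct mc_list {
	unsigned char addr[MAX_ADDRS][ADDR_LEN];
	int count;
};

/* list_lock plays the role of "lock L", rtnl the role of the RTNL. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

static struct mc_list pending_list;	/* list A: edited by callers */
static struct mc_list active_list;	/* list B: last list given to hardware */
static int sync_needed;			/* stands in for "schedule an update" */
static int hw_calls;			/* counts entries into the "driver" */

/* Hypothetical driver hook; only ever called with rtnl held. */
static void hw_program(const struct mc_list *list)
{
	(void)list;
	hw_calls++;
}

/* Add path: take L, modify list A, request an update, drop L. */
static int mc_add(const unsigned char *addr)
{
	int ret = -1;

	pthread_mutex_lock(&list_lock);
	if (pending_list.count < MAX_ADDRS) {
		memcpy(pending_list.addr[pending_list.count++], addr, ADDR_LEN);
		/* Simplification: the request flag is set under L as well. */
		sync_needed = 1;
		ret = 0;
	}
	pthread_mutex_unlock(&list_lock);
	return ret;
}

/* Update path: RTNL, then L, copy A to B, drop L, call hardware, drop RTNL. */
static void mc_sync(void)
{
	int do_sync;

	pthread_mutex_lock(&rtnl);
	pthread_mutex_lock(&list_lock);
	do_sync = sync_needed;
	sync_needed = 0;
	active_list = pending_list;
	pthread_mutex_unlock(&list_lock);

	if (do_sync)
		hw_program(&active_list);	/* L is not held here */
	pthread_mutex_unlock(&rtnl);
}
```

The point of the split is that hw_program() runs with only the outer mutex
held, so the add path never waits on the hardware and the driver is entered
by one caller at a time.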

What do you think about this approach?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 2/3][NET_BATCH] net core use batching

2007-10-02 Thread Bill Fink
On Tue, 02 Oct 2007, jamal wrote:

> On Tue, 2007-02-10 at 00:25 -0400, Bill Fink wrote:
> 
> > One reason I ask, is that on an earlier set of alternative batching
> > xmit patches by Krishna Kumar, his performance testing showed a 30 %
> > performance hit for TCP for a single process and a size of 4 KB, and
> > a performance hit of 5 % for a single process and a size of 16 KB
> > (a size of 8 KB wasn't tested).  Unfortunately I was too busy at the
> > time to inquire further about it, but it would be a major potential
> > concern for me in my 10-GigE network testing with 9000-byte jumbo
> > frames.  Of course the single process and 4 KB or larger size was
> > the only case that showed a significant performance hit in Krishna
> > Kumar's latest reported test results, so it might be acceptable to
> > just have a switch to disable the batching feature for that specific
> > usage scenario.  So it would be useful to know if your xmit batching
> > changes would have similar issues.
> 
> There were many times while testing that i noticed inconsistencies and
> in each case when i analysed[1], i found it to be due to some variable
> other than batching which needed some resolving, always via some
> parametrization or other. I suspect what KK posted is in the same class.
> To give you an example, with UDP, batching was giving worse results at
> around 256B compared to 64B or 512B; investigating i found that the
> receiver just wasnt able to keep up and the udp layer dropped a lot of
> packets so both iperf and netperf reported bad numbers. Fixing the
> receiver ended up with consistency coming back. On why 256B was the one
> that overwhelmed the receiver more than 64B(which sent more pps)? On
> some limited investigation, it seemed to me to be the effect of the
> choice of the tg3 driver's default tx mitigation parameters as well tx
> ring size; which is something i plan to revisit (but neutralizing it
> helps me focus on just batching). In the end i dropped both netperf and
> iperf for similar reasons and wrote my own app. What i am trying to
> achieve is demonstrate if batching is a GoodThing. In experimentation
> like this, it is extremely valuable to reduce the variables. Batching
> may expose other orthogonal issues - those need to be resolved or fixed
> as they are found. I hope that sounds sensible.

It does sound sensible.  My own decidedly non-expert speculation
was that the big 30 % performance hit right at 4 KB may be related
to memory allocation issues or having to split the skb across
multiple 4 KB pages.  And perhaps it only affected the single
process case because with multiple processes lock contention may
be a bigger issue and the xmit batching changes would presumably
help with that.  I am admittedly a novice when it comes to the
detailed internals of TCP/skb processing, although I have been
slowly slogging my way through parts of the TCP kernel code to
try and get a better understanding, so I don't know if these
thoughts have any merit.

BTW does anyone know of a good book they would recommend that has
substantial coverage of the Linux kernel TCP code, that's fairly
up-to-date and gives both an overall view of the code and packet
flow as well as details on individual functions and algorithms,
and hopefully covers basic issues like locking and synchronization,
concurrency of different parts of the stack, and memory allocation.
I have several books already on Linux kernel and networking internals,
but they seem to only cover the IP (and perhaps UDP) portions of the
network stack, and none have more than a cursory reference to TCP.  
The most useful documentation on the Linux TCP stack that I have
found thus far is some of Dave Miller's excellent web pages and
a few other web references, but overall it seems fairly skimpy
for such an important part of the Linux network code.

> Back to the >=9K packet size you raise above:
> I dont have a 10Gige card so iam theorizing. Given that theres an
> observed benefit to batching for a saturated link with "smaller" packets
> (in my results "small" is anything below 256B which maps to about
> 380Kpps anything above that seems to approach wire speed and the link is
> the bottleneck); then i theorize that 10Gige with 9K jumbo frames if
> already achieving wire rate, should continue to do so. And sizes below
> that will see improvements if they were not already hitting wire rate.
> So i would say that with 10G NICS, there will be more observed
> improvements with batching with apps that do bulk transfers (assuming
> those apps are not seeing wire speed already). Note that this hasnt been
> quiet the case even with TSO given the bottlenecks in the Linux
> receivers that J Heffner put nicely in a response to some results you
> posted - but that exposes an issue with Linux receivers rather than TSO.

It would be good to see some empirical evidence that there aren't
any unforeseen gotchas for larger packet sizes, that at least the
same level of performance can be obt

Re: [PATCH] sky2: jumbo frame regression fix

2007-10-02 Thread Stephen Hemminger
On Wed, 03 Oct 2007 03:34:34 +0200
Ian Kumlien <[EMAIL PROTECTED]> wrote:

> On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> > Remove unneeded check that caused problems with jumbo frame sizes.
> > The check was recently added and is wrong.
> > When using jumbo frames the sky2 driver does fragmentation, so
> > rx_data_size is less than mtu.
> 
> Confirmed working.
> 
> Now running with 9k mtu with no errors, =)
> 
> It also seems that the FIFO bug was the one that affected me before,
> damn odd race that one.

Does the workaround (forced reset) work? Ian, you are the first person to
report triggering it.  I haven't found a way to make it happen.
What combination of flow control and speeds are you using?


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] sky2: jumbo frame regression fix

2007-10-02 Thread Jeff Garzik

Stephen Hemminger wrote:
> On Tue, 02 Oct 2007 21:07:22 -0400
> Jeff Garzik <[EMAIL PROTECTED]> wrote:
>
> > Stephen Hemminger wrote:
> > > Remove unneeded check that caused problems with jumbo frame sizes.
> > > The check was recently added and is wrong.
> > > When using jumbo frames the sky2 driver does fragmentation, so
> > > rx_data_size is less than mtu.
> > >
> > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> > >
> > > --- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
> > > +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
> > > @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> > > sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> > > prefetch(sky2->rx_ring + sky2->rx_next);
> > >
> > > -   if (length < ETH_ZLEN || length > sky2->rx_data_size)
> > > -   goto len_error;
> > > -
> >
> > 2.6.23?  2.6.24?  enquiring minds want to know...
>
> 2.6.23, since it is a regression

You can have regressions in behavior in net-2.6.24.git, too.  _Please_
be specific about where you want your patches to go.  Thanks.

Jeff





Re: [PATCH] sky2: jumbo frame regression fix

2007-10-02 Thread Stephen Hemminger
On Tue, 02 Oct 2007 21:07:22 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > Remove unneeded check that caused problems with jumbo frame sizes.
> > The check was recently added and is wrong.
> > When using jumbo frames the sky2 driver does fragmentation, so
> > rx_data_size is less than mtu.
> > 
> > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> > 
> > --- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
> > +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
> > @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> > sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> > prefetch(sky2->rx_ring + sky2->rx_next);
> >  
> > -   if (length < ETH_ZLEN || length > sky2->rx_data_size)
> > -   goto len_error;
> > -
> 
> 2.6.23?  2.6.24?  enquiring minds want to know...

2.6.23, since it is a regression

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Please pull 'upstream-davem' branch of wireless-2.6

2007-10-02 Thread John W. Linville
Of course, these are intended for 2.6.24.  Also, I forgot to mention
that the individual patches are available here:


http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream-davem/

I also preserved the net-2.6.24 commit I based from as 'master-davem'
in case you need it for reference.

Hth!

John

On Tue, Oct 02, 2007 at 09:25:52PM -0400, John W. Linville wrote:
> The following changes since commit d3adbde754a9ae7a6f87612055cb20db856f0721:
>   Ilpo Järvinen (1):
> [TCP]: Wrap-safed reordering detection FRTO check
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-davem
> 
> Daniel Drake (1):
>   hostap: set netdev type before registering AP interface
> 
> Johannes Berg (9):
>   mac80211: add "invalid" interface type
>   mac80211: remove management interface
>   mac80211: move sta_process rx handler later
>   mac80211: consolidate decryption more
>   mac80211: use RX_FLAG_DECRYPTED for sw decrypted as well
>   mac80211: remove ALG_NONE
>   mac80211: improve radiotap injection
>   mac80211: make userspace-mlme a per-interface setting
>   mac80211: implement cfg80211's change_interface hook
> 
> Michael Buesch (9):
>   rfkill: Add support for an rfkill LED.
>   rfkill: Add support for hardware-only rfkill buttons
>   b43: LED triggers support
>   b43: RF-kill support
>   b43: Use input-polldev for the rfkill switch
>   b43: Rewrite pwork locking policy.
>   mac80211: Check open_count before calling config callback.
>   mac80211: Add association LED trigger
>   mac80211: Update beacon_update callback documentation
> 
> Tomas Winkler (1):
>   mac80211: add sta_notify callback
> 
> Ulrich Kunitz (1):
>   zd1211rw: Removed zd_util.c and zd_util.h
> 
>  Documentation/networking/mac80211-injection.txt |   32 ++-
>  drivers/net/wireless/adm8211.c  |8 +-
>  drivers/net/wireless/b43/Kconfig|   12 +
>  drivers/net/wireless/b43/Makefile   |5 +-
>  drivers/net/wireless/b43/b43.h  |   11 +-
>  drivers/net/wireless/b43/leds.c |  399 ++-
>  drivers/net/wireless/b43/leds.h |   63 ++--
>  drivers/net/wireless/b43/main.c |  205 
>  drivers/net/wireless/b43/phy.c  |   13 +-
>  drivers/net/wireless/b43/phy.h  |2 +-
>  drivers/net/wireless/b43/rfkill.c   |  184 +++
>  drivers/net/wireless/b43/rfkill.h   |   58 
>  drivers/net/wireless/hostap/hostap.h|2 +-
>  drivers/net/wireless/hostap/hostap_hw.c |2 +-
>  drivers/net/wireless/hostap/hostap_main.c   |   19 +-
>  drivers/net/wireless/iwlwifi/iwl3945-base.c |4 -
>  drivers/net/wireless/iwlwifi/iwl4965-base.c |4 -
>  drivers/net/wireless/p54common.c|4 +-
>  drivers/net/wireless/p54pci.c   |4 +-
>  drivers/net/wireless/rt2x00/rt2x00.h|2 +-
>  drivers/net/wireless/zd1211rw/Makefile  |2 +-
>  drivers/net/wireless/zd1211rw/zd_chip.c |1 -
>  drivers/net/wireless/zd1211rw/zd_mac.c  |4 +-
>  drivers/net/wireless/zd1211rw/zd_usb.c  |1 -
>  drivers/net/wireless/zd1211rw/zd_util.c |   82 -
>  drivers/net/wireless/zd1211rw/zd_util.h |   29 --
>  include/linux/rfkill.h  |   24 ++
>  include/net/mac80211.h  |   46 +++-
>  net/mac80211/cfg.c  |   75 -
>  net/mac80211/ieee80211.c|  189 +---
>  net/mac80211/ieee80211_i.h  |   17 +-
>  net/mac80211/ieee80211_iface.c  |   68 +
>  net/mac80211/ieee80211_ioctl.c  |   31 +-
>  net/mac80211/ieee80211_led.c|   67 +++-
>  net/mac80211/ieee80211_led.h|6 +
>  net/mac80211/ieee80211_rate.c   |3 +-
>  net/mac80211/ieee80211_rate.h   |2 -
>  net/mac80211/ieee80211_sta.c|7 +-
>  net/mac80211/key.c  |1 -
>  net/mac80211/rx.c   |  122 +++-
>  net/mac80211/sta_info.c |   13 +-
>  net/mac80211/tx.c   |  211 ++--
>  net/mac80211/wme.c  |   10 +-
>  net/rfkill/Kconfig  |7 +
>  net/rfkill/rfkill.c |   49 +++-
>  45 files changed, 1022 insertions(+), 1078 deletions(-)
>  create mode 100644 drivers/net/wireless/b43/rfkill.c
>  create mode 100644 drivers/net/wireless/b43/rfkill.h
>  delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.c
>  delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.h

Re: Please pull 'upstream-davem' branch of wireless-2.6

2007-10-02 Thread David Miller
From: "John W. Linville" <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 21:25:52 -0400

>   git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-davem

This doesn't pull cleanly.

Probably you used a recently cloned Linus tree, pulled
net-2.6.24 into that (and resolved the conflicts), and
then put your patches in.

Please don't do it like that, I don't want to pull from
a tree that has linus vs. net-2.6.24 conflict handling
in it.  That's why I usually rebase frequently, to minimize
that as much as is humanly possible.

What you can do is figure out what linus's HEAD was at the last rebase
(basically 'origin' or parent of net-2.6.24), clone that then pull in
net-2.6.24, then add your patches.

That way I can always do a clean pull.

My pull from Jeff today was very clean, for example.

I'll add these wireless bits by hand as patches.

Thanks John.



Re: [PATCH] baycom epp header ops

2007-10-02 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 17:41:03 -0700

> Update baycom epp driver for new header ops in net-2.6.24
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.


Please pull 'fixes-jgarzik' branch of wireless-2.6

2007-10-02 Thread John W. Linville
The following changes since commit 3146b39c185f8a436d430132457e84fa1d8f8208:
  Linus Torvalds (1):
Linux 2.6.23-rc9

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-jgarzik

Joe Perches (1):
  bcm43xx: Correct printk with PFX before KERN_

Richard Knutsson (1):
  softmac: Fix compiler-warning

 drivers/net/wireless/bcm43xx/bcm43xx_wx.c   |2 +-
 net/ieee80211/softmac/ieee80211softmac_wx.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
index d6d9413..6acfdc4 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
@@ -444,7 +444,7 @@ static int bcm43xx_wx_set_xmitpower(struct net_device *net_dev,
 	u16 maxpower;
 
 	if ((data->txpower.flags & IW_TXPOW_TYPE) != IW_TXPOW_DBM) {
-		printk(PFX KERN_ERR "TX power not in dBm.\n");
+		printk(KERN_ERR PFX "TX power not in dBm.\n");
 		return -EOPNOTSUPP;
 	}
 
diff --git a/net/ieee80211/softmac/ieee80211softmac_wx.c b/net/ieee80211/softmac/ieee80211softmac_wx.c
index 442b987..5742dc8 100644
--- a/net/ieee80211/softmac/ieee80211softmac_wx.c
+++ b/net/ieee80211/softmac/ieee80211softmac_wx.c
@@ -114,7 +114,7 @@ check_assoc_again:
 	sm->associnfo.associating = 1;
 	/* queue lower level code to do work (if necessary) */
 	schedule_delayed_work(&sm->associnfo.work, 0);
-out:
+
 	mutex_unlock(&sm->associnfo.mutex);
 
 	return 0;
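
For readers wondering why the order of PFX and KERN_ERR in the first hunk
matters: in kernels of this era the log level is a short string such as
"<3>" that has to sit at the very start of the message. The user-space
sketch below models that with a simplified check. The value of PFX is an
assumption made for illustration, and leading_loglevel() is a hypothetical
helper, not a kernel function.

```c
#include <stdio.h>
#include <string.h>

/* KERN_ERR as defined in 2.6.23-era kernels. */
#define KERN_ERR "<3>"
/* Assumed driver prefix, for illustration only. */
#define PFX      "bcm43xx: "

/*
 * Simplified model of the level detection: a message carries a level
 * only if it begins with '<', a digit from 0 to 7, and '>'.
 * Returns the level, or -1 if no level marker leads the message.
 */
static int leading_loglevel(const char *msg)
{
	if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
		return msg[1] - '0';
	return -1;
}
```

With these definitions, leading_loglevel(KERN_ERR PFX "...") sees the
marker, while leading_loglevel(PFX KERN_ERR "...") does not, because the
marker ends up in the middle of the string and would be logged as literal
text at the default level.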
-- 
John W. Linville
[EMAIL PROTECTED]


Please pull 'upstream-davem' branch of wireless-2.6

2007-10-02 Thread John W. Linville
The following changes since commit d3adbde754a9ae7a6f87612055cb20db856f0721:
  Ilpo Järvinen (1):
[TCP]: Wrap-safed reordering detection FRTO check

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-davem

Daniel Drake (1):
  hostap: set netdev type before registering AP interface

Johannes Berg (9):
  mac80211: add "invalid" interface type
  mac80211: remove management interface
  mac80211: move sta_process rx handler later
  mac80211: consolidate decryption more
  mac80211: use RX_FLAG_DECRYPTED for sw decrypted as well
  mac80211: remove ALG_NONE
  mac80211: improve radiotap injection
  mac80211: make userspace-mlme a per-interface setting
  mac80211: implement cfg80211's change_interface hook

Michael Buesch (9):
  rfkill: Add support for an rfkill LED.
  rfkill: Add support for hardware-only rfkill buttons
  b43: LED triggers support
  b43: RF-kill support
  b43: Use input-polldev for the rfkill switch
  b43: Rewrite pwork locking policy.
  mac80211: Check open_count before calling config callback.
  mac80211: Add association LED trigger
  mac80211: Update beacon_update callback documentation

Tomas Winkler (1):
  mac80211: add sta_notify callback

Ulrich Kunitz (1):
  zd1211rw: Removed zd_util.c and zd_util.h

 Documentation/networking/mac80211-injection.txt |   32 ++-
 drivers/net/wireless/adm8211.c  |8 +-
 drivers/net/wireless/b43/Kconfig|   12 +
 drivers/net/wireless/b43/Makefile   |5 +-
 drivers/net/wireless/b43/b43.h  |   11 +-
 drivers/net/wireless/b43/leds.c |  399 ++-
 drivers/net/wireless/b43/leds.h |   63 ++--
 drivers/net/wireless/b43/main.c |  205 
 drivers/net/wireless/b43/phy.c  |   13 +-
 drivers/net/wireless/b43/phy.h  |2 +-
 drivers/net/wireless/b43/rfkill.c   |  184 +++
 drivers/net/wireless/b43/rfkill.h   |   58 
 drivers/net/wireless/hostap/hostap.h|2 +-
 drivers/net/wireless/hostap/hostap_hw.c |2 +-
 drivers/net/wireless/hostap/hostap_main.c   |   19 +-
 drivers/net/wireless/iwlwifi/iwl3945-base.c |4 -
 drivers/net/wireless/iwlwifi/iwl4965-base.c |4 -
 drivers/net/wireless/p54common.c|4 +-
 drivers/net/wireless/p54pci.c   |4 +-
 drivers/net/wireless/rt2x00/rt2x00.h|2 +-
 drivers/net/wireless/zd1211rw/Makefile  |2 +-
 drivers/net/wireless/zd1211rw/zd_chip.c |1 -
 drivers/net/wireless/zd1211rw/zd_mac.c  |4 +-
 drivers/net/wireless/zd1211rw/zd_usb.c  |1 -
 drivers/net/wireless/zd1211rw/zd_util.c |   82 -
 drivers/net/wireless/zd1211rw/zd_util.h |   29 --
 include/linux/rfkill.h  |   24 ++
 include/net/mac80211.h  |   46 +++-
 net/mac80211/cfg.c  |   75 -
 net/mac80211/ieee80211.c|  189 +---
 net/mac80211/ieee80211_i.h  |   17 +-
 net/mac80211/ieee80211_iface.c  |   68 +
 net/mac80211/ieee80211_ioctl.c  |   31 +-
 net/mac80211/ieee80211_led.c|   67 +++-
 net/mac80211/ieee80211_led.h|6 +
 net/mac80211/ieee80211_rate.c   |3 +-
 net/mac80211/ieee80211_rate.h   |2 -
 net/mac80211/ieee80211_sta.c|7 +-
 net/mac80211/key.c  |1 -
 net/mac80211/rx.c   |  122 +++-
 net/mac80211/sta_info.c |   13 +-
 net/mac80211/tx.c   |  211 ++--
 net/mac80211/wme.c  |   10 +-
 net/rfkill/Kconfig  |7 +
 net/rfkill/rfkill.c |   49 +++-
 45 files changed, 1022 insertions(+), 1078 deletions(-)
 create mode 100644 drivers/net/wireless/b43/rfkill.c
 create mode 100644 drivers/net/wireless/b43/rfkill.h
 delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.c
 delete mode 100644 drivers/net/wireless/zd1211rw/zd_util.h

Omnibus patch attached as upstream-davem.patch.bz2
-- 
John W. Linville
[EMAIL PROTECTED]


upstream-davem.patch.bz2
Description: BZip2 compressed data


Re: [PATCH] sky2: jumbo frame regression fix

2007-10-02 Thread Ian Kumlien
On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> Remove unneeded check that caused problems with jumbo frame sizes.
> The check was recently added and is wrong.
> When using jumbo frames the sky2 driver does fragmentation, so
> rx_data_size is less than mtu.

Confirmed working.

Now running with 9k mtu with no errors, =)

It also seems that the FIFO bug was the one that affected me before,
damn odd race that one.

> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
Tested-by: Ian Kumlien <[EMAIL PROTECTED]>

(if that tag exists now)

Btw, Sorry but all mail directly to you will be blocked. I have yet to
fix the relaying properly with isp:s blocking port 25 etc so for some of
you this mail will only show up on the ML.

> --- a/drivers/net/sky2.c  2007-10-02 17:56:31.0 -0700
> +++ b/drivers/net/sky2.c  2007-10-02 17:58:56.0 -0700
> @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
>   sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
>   prefetch(sky2->rx_ring + sky2->rx_next);
>  
> - if (length < ETH_ZLEN || length > sky2->rx_data_size)
> - goto len_error;
> -
>   /* This chip has hardware problems that generates bogus status.
>* So do only marginal checking and expect higher level protocols
>* to handle crap frames.
-- 
Ian Kumlien  -- http://pomac.netswarm.net




Re: [PATCH] sky2: jumbo frame regression fix

2007-10-02 Thread Jeff Garzik

Stephen Hemminger wrote:
> Remove unneeded check that caused problems with jumbo frame sizes.
> The check was recently added and is wrong.
> When using jumbo frames the sky2 driver does fragmentation, so
> rx_data_size is less than mtu.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>
> --- a/drivers/net/sky2.c 2007-10-02 17:56:31.0 -0700
> +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.0 -0700
> @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> prefetch(sky2->rx_ring + sky2->rx_next);
>
> -   if (length < ETH_ZLEN || length > sky2->rx_data_size)
> -   goto len_error;
> -

2.6.23?  2.6.24?  enquiring minds want to know...




[PATCH] sky2: jumbo frame regression fix

2007-10-02 Thread Stephen Hemminger
Remove unneeded check that caused problems with jumbo frame sizes.
The check was recently added and is wrong.
When using jumbo frames the sky2 driver does fragmentation, so
rx_data_size is less than mtu.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- a/drivers/net/sky2.c	2007-10-02 17:56:31.0 -0700
+++ b/drivers/net/sky2.c	2007-10-02 17:58:56.0 -0700
@@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
 	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
 	prefetch(sky2->rx_ring + sky2->rx_next);
 
-	if (length < ETH_ZLEN || length > sky2->rx_data_size)
-		goto len_error;
-
 	/* This chip has hardware problems that generates bogus status.
 	 * So do only marginal checking and expect higher level protocols
 	 * to handle crap frames.
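
To illustrate why the removed check breaks jumbo frames, the condition can
be pulled out into a plain function and fed example numbers. This is a
sketch only: the buffer size and frame lengths used in the note below are
made-up illustrative values, not the driver's actual sizes, and
old_len_check_rejects() is a hypothetical name.

```c
#include <stdbool.h>

/* Minimum Ethernet frame length without FCS, as in <linux/if_ether.h>. */
#define ETH_ZLEN 60

/*
 * The condition from the removed lines, as a pure function.
 * Returns true when the old code would have jumped to len_error.
 */
static bool old_len_check_rejects(unsigned int length,
				  unsigned int rx_data_size)
{
	return length < ETH_ZLEN || length > rx_data_size;
}
```

Assuming a receive data buffer of 2048 bytes, a 1514-byte frame passes,
but a 9014-byte jumbo frame that the driver has split across fragments is
rejected even though it is valid, because its total length is compared
against the size of a single buffer.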


[PATCH] baycom epp header ops

2007-10-02 Thread Stephen Hemminger
Update baycom epp driver for new header ops in net-2.6.24

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 drivers/net/hamradio/baycom_epp.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index 355c6cf..1a5a75a 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -1159,8 +1159,7 @@ static void baycom_probe(struct net_device *dev)
 	/* Fill in the fields of the device structure */
 	bc->skb = NULL;
 
-	dev->hard_header = ax25_hard_header;
-	dev->rebuild_header = ax25_rebuild_header;
+	dev->header_ops = &ax25_header_ops;
 	dev->set_mac_address = baycom_set_mac_address;
 
 	dev->type = ARPHRD_AX25;   /* AF_AX25 device */
-- 
1.5.2.5



Kernel 2.4 vs 2.6 Traffic Controller Performance

2007-10-02 Thread Sonny
Hello
I hope this is the right place to ask this. Does anyone know if there is a
substantial difference in the performance of the traffic controller
between kernel 2.4 and 2.6? We tested it using 1 iperf server and
250 and 500 clients, altering the burst. We use the top command to
check the idle time of our router. The results from the 2.4 kernel
show around 65-70% idle time, while the 2.6 kernel shows
60-65% idle time. We tried to use MRTG and we're not getting any
results either. We want to know if we could improve the bandwidth by
upgrading the kernel; otherwise we would have to get a new bandwidth
manager. Has anyone run a similar test, or can anyone suggest
a better way to do this? Thanks in advance.


Re: [git patches] net driver updates

2007-10-02 Thread David Miller
From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 13:41:50 -0400

> Please pull from the 'upstream' branch of
> master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream

Pulled and pushed back out to net-2.6.24, thanks Jeff!


Re: [PATCH 2.6.24 3/3][BNX2]: Update version to 1.6.6.

2007-10-02 Thread Jeff Garzik

Michael Chan wrote:
> [BNX2]: Update version to 1.6.6.
>
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

ACK patches 1-3




Re: [PATCH 2.6.24 3/3][BNX2]: Update version to 1.6.6.

2007-10-02 Thread David Miller
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 17:24:06 -0700

> [BNX2]: Update version to 1.6.6.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Also applied to net-2.6.24, thanks!


Re: [PATCH 2.6.24 2/3][BNX2]: Optimize firmware loading.

2007-10-02 Thread David Miller
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 17:23:43 -0700

> [BNX2]: Optimize firmware loading.
> 
> This is a follow up to the patches from Denys Vlasenkos
> <[EMAIL PROTECTED]> to further optimize firmware loading.
> 
> 1. In bnx2_init_cpus(), we allocate memory for decompression once
> and use it repeatedly instead of doing this for every firmware image.
> 
> 2. We eliminate the BSS and SBSS firmware sections in bnx2_fw*.h since
> these are always zeros.
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Applied, thanks for following up on this Michael.
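
The first point in the quoted description, allocating the decompression
buffer once and reusing it for every firmware image, can be sketched in
user-space C as follows. This is an illustration under stated assumptions
and not the driver's code: the FW_BUF_SIZE value, struct fw_image, unpack()
(a plain copy standing in for zlib_inflate_blob) and the load_* helpers
are all hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define FW_BUF_SIZE 0x8000	/* assumed scratch size, for illustration */

struct fw_image {
	const unsigned char *packed;	/* compressed text section */
	size_t packed_len;
	unsigned char *text;		/* shared scratch buffer, set by caller */
};

static int alloc_calls;		/* counts scratch buffer allocations */

/* Stand-in for the inflate step: copies instead of decompressing. */
static int unpack(unsigned char *dst, size_t dst_len,
		  const unsigned char *src, size_t src_len)
{
	if (src_len > dst_len)
		return -1;
	memcpy(dst, src, src_len);
	return (int)src_len;
}

/* Per-image load: no allocation here, only use of the shared buffer. */
static int load_one(struct fw_image *fw)
{
	int rc = unpack(fw->text, FW_BUF_SIZE, fw->packed, fw->packed_len);

	return rc < 0 ? rc : 0;
}

/* Allocate the scratch buffer once, then reuse it for every image. */
static int load_all(struct fw_image *imgs, int n)
{
	unsigned char *buf = malloc(FW_BUF_SIZE);
	int i, rc = 0;

	if (!buf)
		return -1;
	alloc_calls++;

	for (i = 0; i < n && rc == 0; i++) {
		imgs[i].text = buf;
		rc = load_one(&imgs[i]);
	}

	/* Do not leave pointers to the freed buffer behind. */
	for (i = 0; i < n; i++)
		imgs[i].text = NULL;
	free(buf);
	return rc;
}
```

Compared with allocating inside the per-image function, this trades one
pointer in the image descriptor for a single allocation and a single
failure path.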


Re: [PATCH 2.6.24 1/3][BNX2]: Add missing napi_disable() in bnx2_close().

2007-10-02 Thread David Miller
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 17:23:09 -0700

> [BNX2]: Add missing napi_disable() in bnx2_close().
> 
> bnx2_close() -> bnx2_netif_stop() will not call napi_disable() because
> the netif_state is not running in bnx2_close().  To avoid confusion,
> we change it to disable interrupt and napi directly in bnx2_close().
> 
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Applied to net-2.6.24


[PATCH 2.6.24 3/3][BNX2]: Update version to 1.6.6.

2007-10-02 Thread Michael Chan
[BNX2]: Update version to 1.6.6.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index c50e4c8..db14f35 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -56,8 +56,8 @@
 
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME": "
-#define DRV_MODULE_VERSION "1.6.5"
-#define DRV_MODULE_RELDATE "September 20, 2007"
+#define DRV_MODULE_VERSION "1.6.6"
+#define DRV_MODULE_RELDATE "October 2, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 




[PATCH 2.6.24 2/3][BNX2]: Optimize firmware loading.

2007-10-02 Thread Michael Chan
[BNX2]: Optimize firmware loading.

This is a follow up to the patches from Denys Vlasenko
<[EMAIL PROTECTED]> to further optimize firmware loading.

1. In bnx2_init_cpus(), we allocate memory for decompression once
and use it repeatedly instead of doing this for every firmware image.

2. We eliminate the BSS and SBSS firmware sections in bnx2_fw*.h since
these are always zeros.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4887c31..c50e4c8 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2810,21 +2810,16 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_reg *cpu_reg, struct fw_info *fw)
/* Load the Text area. */
offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base);
if (fw->gz_text) {
-   u32 *text;
int j;
 
-   text = vmalloc(FW_BUF_SIZE);
-   if (!text)
-   return -ENOMEM;
-   rc = zlib_inflate_blob(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len);
-   if (rc < 0) {
-   vfree(text);
+   rc = zlib_inflate_blob(fw->text, FW_BUF_SIZE, fw->gz_text,
+  fw->gz_text_len);
+   if (rc < 0)
return rc;
-   }
+
for (j = 0; j < (fw->text_len / 4); j++, offset += 4) {
-   REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
+   REG_WR_IND(bp, offset, cpu_to_le32(fw->text[j]));
}
-   vfree(text);
}
 
/* Load the Data area. */
@@ -2839,21 +2834,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_reg *cpu_reg, struct fw_info *fw)
 
/* Load the SBSS area. */
offset = cpu_reg->spad_base + (fw->sbss_addr - cpu_reg->mips_view_base);
-   if (fw->sbss) {
+   if (fw->sbss_len) {
int j;
 
for (j = 0; j < (fw->sbss_len / 4); j++, offset += 4) {
-   REG_WR_IND(bp, offset, fw->sbss[j]);
+   REG_WR_IND(bp, offset, 0);
}
}
 
/* Load the BSS area. */
offset = cpu_reg->spad_base + (fw->bss_addr - cpu_reg->mips_view_base);
-   if (fw->bss) {
+   if (fw->bss_len) {
int j;
 
for (j = 0; j < (fw->bss_len/4); j++, offset += 4) {
-   REG_WR_IND(bp, offset, fw->bss[j]);
+   REG_WR_IND(bp, offset, 0);
}
}
 
@@ -2894,19 +2889,16 @@ bnx2_init_cpus(struct bnx2 *bp)
if (!text)
return -ENOMEM;
rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
-   if (rc < 0) {
-   vfree(text);
+   if (rc < 0)
goto init_cpu_err;
-   }
+
load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1);
 
rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2));
-   if (rc < 0) {
-   vfree(text);
+   if (rc < 0)
goto init_cpu_err;
-   }
+
load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC2);
-   vfree(text);
 
/* Initialize the RX Processor. */
cpu_reg.mode = BNX2_RXP_CPU_MODE;
@@ -2927,6 +2919,7 @@ bnx2_init_cpus(struct bnx2 *bp)
else
fw = &bnx2_rxp_fw_06;
 
+   fw->text = text;
rc = load_cpu_fw(bp, &cpu_reg, fw);
if (rc)
goto init_cpu_err;
@@ -2950,6 +2943,7 @@ bnx2_init_cpus(struct bnx2 *bp)
else
fw = &bnx2_txp_fw_06;
 
+   fw->text = text;
rc = load_cpu_fw(bp, &cpu_reg, fw);
if (rc)
goto init_cpu_err;
@@ -2973,6 +2967,7 @@ bnx2_init_cpus(struct bnx2 *bp)
else
fw = &bnx2_tpat_fw_06;
 
+   fw->text = text;
rc = load_cpu_fw(bp, &cpu_reg, fw);
if (rc)
goto init_cpu_err;
@@ -2996,6 +2991,7 @@ bnx2_init_cpus(struct bnx2 *bp)
else
fw = &bnx2_com_fw_06;
 
+   fw->text = text;
rc = load_cpu_fw(bp, &cpu_reg, fw);
if (rc)
goto init_cpu_err;
@@ -3017,11 +3013,13 @@ bnx2_init_cpus(struct bnx2 *bp)
if (CHIP_NUM(bp) == CHIP_NUM_5709) {
fw = &bnx2_cp_fw_09;
 
+   fw->text = text;
rc = load_cpu_fw(bp, &cpu_reg, fw);
if (rc)
goto init_cpu_err;
}
 init_cpu_err:
+   vfree(text);
return rc;
 }
 
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index a717459..56c190f 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6738,7 +6738,7 @@ struct fw_info {
const u32 text_addr;
const u32 text_len;
const u32 text_index;
-/* u32 *text;*/
+   u32 *text;
u8 *gz_text;
const u32 gz_text_len;
 
@@ -6752,13 +6752,11 @@ struct fw_info {
const 

[PATCH 2.6.24 1/3][BNX2]: Add missing napi_disable() in bnx2_close().

2007-10-02 Thread Michael Chan
[BNX2]: Add missing napi_disable() in bnx2_close().

bnx2_close() -> bnx2_netif_stop() will not call napi_disable() because
the netif_state is not running in bnx2_close().  To avoid confusion,
we change it to disable interrupt and napi directly in bnx2_close().

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index cd5f1b7..4887c31 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5212,8 +5212,8 @@ bnx2_close(struct net_device *dev)
while (bp->in_reset_task)
msleep(1);
 
-   /* This does napi_disable() for us.  */
-   bnx2_netif_stop(bp);
+   bnx2_disable_int_sync(bp);
+   napi_disable(&bp->napi);
del_timer_sync(&bp->timer);
if (bp->flags & NO_WOL_FLAG)
reset_code = BNX2_DRV_MSG_CODE_UNLOAD_LNK_DN;




Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones

Larry McVoy wrote:
> On Tue, Oct 02, 2007 at 03:32:16PM -0700, David Miller wrote:
>
>> I'm starting to have a theory about what the bad case might
>> be.
>>
>> A strong sender going to an even stronger receiver which can
>> pull out packets into the process as fast as they arrive.
>> This might be part of what keeps the receive window from
>> growing.
>
> I can back you up on that.  When I straced the receiving side that goes
> slowly, all the reads were short, like 1-2K.  The way that works the
> reads were a lot larger as I recall.


Indeed I was getting more like 8K on each recv() call per netperf's -v 2 stats, 
but the system was more than fast enough to stay ahead of the traffic.  On the 
hunch that it was the interrupt throttling which was keeping the recv's large 
rather than the speed of the system(s) I nuked the InterruptThrottleRate to 0 
and was able to get between 1900 and 2300 byte recvs on the TCP_STREAM and 
TCP_MAERTS tests and still had 940 Mbit/s in each direction.


hpcpc106:~# netperf -H 192.168.7.107 -t TCP_STREAM -v 2 -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.7.107 (192.168.7.107) port 0 AF_INET

Recv   Send   Send                         Utilization      Service Demand
Socket Socket Message Elapsed              Send    Recv     Send    Recv
Size   Size   Size    Time     Throughput  local   remote   local   remote
bytes  bytes  bytes   secs.    10^6bits/s  % S     % S      us/KB   us/KB

87380  87380  87380   10.02    940.95      10.75   21.65    3.743   7.540

Alignment      Offset         Bytes      Bytes       Sends   Bytes       Recvs
Local  Remote  Local  Remote  Xfered     Per                 Per
Send   Recv    Send   Recv               Send (avg)          Recv (avg)
8      8       0      0       1.179e+09  87386.29    13491   1965.77     599729

Maximum
Segment
Size (bytes)
  1448
hpcpc106:~# netperf -H 192.168.7.107 -t TCP_MAERTS -v 2 -c -C
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.7.107 (192.168.7.107) port 0 AF_INET

Recv   Send   Send                         Utilization      Service Demand
Socket Socket Message Elapsed              Send    Recv     Send    Recv
Size   Size   Size    Time     Throughput  local   remote   local   remote
bytes  bytes  bytes   secs.    10^6bits/s  % S     % S      us/KB   us/KB

87380  87380  87380   10.02    940.82      20.44   10.61    7.117   3.696

Alignment      Offset         Bytes      Bytes       Recvs   Bytes       Sends
Local  Remote  Local  Remote  Xfered     Per                 Per
Recv   Send    Recv   Send               Recv (avg)          Send (avg)
8      8       0      0       1.178e+09  2352.26     500931  87380.00    13485

Maximum
Segment
Size (bytes)
  1448

the systems above had four 1.6 GHz cores, netperf reports CPU as 0 to 100% 
regardless of core count.
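
The "Service Demand" column in the first pair of runs appears to follow from
the other columns once the core count is taken into account. The sketch below
is an assumption-laden reconstruction, not netperf source: it assumes 1 KB is
1024 bytes and that the reported utilization is an average across all cores.
The function name service_demand_us_per_kb is made up for this example.

```c
#include <assert.h>

/*
 * CPU microseconds consumed per KB transferred.
 *   mbit_per_s: throughput in 10^6 bits/s, as printed by netperf
 *   cpu_pct:    utilization on the 0-100% scale (averaged over all cores)
 *   cores:      number of cores in the system
 */
static double service_demand_us_per_kb(double mbit_per_s, double cpu_pct,
				       int cores)
{
	double kb_per_s, cpu_us_per_s;

	assert(mbit_per_s > 0.0 && cores > 0);
	kb_per_s = mbit_per_s * 1e6 / 8.0 / 1024.0;	/* bits -> KB */
	cpu_us_per_s = cpu_pct / 100.0 * cores * 1e6;	/* busy CPU time */
	return cpu_us_per_s / kb_per_s;
}
```

For the TCP_STREAM run on the four-core 1.6 GHz systems,
service_demand_us_per_kb(940.95, 10.75, 4) gives about 3.74 and
service_demand_us_per_kb(940.95, 21.65, 4) gives about 7.54, close to the
3.743 and 7.540 us/KB that netperf printed.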


and then my systems with the 3.0 GHz cores:

[EMAIL PROTECTED] netperf2_trunk]# netperf -H sweb20 -v 2 -t TCP_STREAM -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to sweb20.cup.hp.com (16.89.133.20) port 0 AF_INET

Recv   Send   Send                         Utilization      Service Demand
Socket Socket Message Elapsed              Send    Recv     Send    Recv
Size   Size   Size    Time     Throughput  local   remote   local   remote
bytes  bytes  bytes   secs.    10^6bits/s  % S     % S      us/KB   us/KB

87380  16384  16384   10.03    941.37      6.40    13.26    2.229   4.615

Alignment      Offset         Bytes      Bytes       Sends   Bytes       Recvs
Local  Remote  Local  Remote  Xfered     Per                 Per
Send   Recv    Send   Recv               Send (avg)          Recv (avg)
8      8       0      0       1.18e+09   16384.06    72035   1453.85     811793

Maximum
Segment
Size (bytes)
  1448
[EMAIL PROTECTED] netperf2_trunk]# netperf -H sweb20 -v 2 -t TCP_MAERTS -c -C
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to sweb20.cup.hp.com (16.89.133.20) port 0 AF_INET

Recv   Send   Send                         Utilization      Service Demand
Socket Socket Message Elapsed              Send    Recv     Send    Recv
Size   Size   Size    Time     Throughput  local   remote   local   remote
bytes  bytes  bytes   secs.    10^6bits/s  % S     % S      us/KB   us/KB

87380  16384  16384   10.03    941.35      12.13   5.80     4.221   2.018

Alignment      Offset         Bytes      Bytes       Recvs   Bytes       Sends
Local  Remote  Local  Remote  Xfered     Per                 Per
Recv   Send    Recv   Send               Recv (avg)          Send (avg)
8      8       0      0       1.181e+09  1452.38     812953  16384.00    72065

Maximum
Segment
Size (bytes)
  1448


rick jones


Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address

2007-10-02 Thread David Stevens
Brian,
I don't think a few instructions is a performance issue in the redirect
paths (it'd be pretty broken if you're getting or generating lots of them),
but I know there are lots of other checks similar to that that will break
with new attributes, so doing that as a general clean-up separately is ok
with me, too.

With the error message changes, you can add:

Acked-by: David L Stevens <[EMAIL PROTECTED]>

FWIW. :-)

+-DLS




Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 03:32:16PM -0700, David Miller wrote:
> I'm starting to have a theory about what the bad case might
> be.
> 
> A strong sender going to an even stronger receiver which can
> pull out packets into the process as fast as they arrive.
> This might be part of what keeps the receive window from
> growing.

I can back you up on that.  When I straced the receiving side that goes
slowly, all the reads were short, like 1-2K.  The way that works the 
reads were a lot larger as I recall.
-- 
---
Larry McVoy    lm at bitmover.com    http://www.bitkeeper.com


Re: [PATCH][TG3]Some cleanups

2007-10-02 Thread Michael Chan
On Tue, 2007-10-02 at 08:37 -0400, jamal wrote:
> The simplest solution seems to me to modify the definition of
> TG3_SKB_CB
> as i did for e1000 from:
> (struct tg3_tx_cbdata *)&((__skb)->cb[0])
> to:
> (struct tg3_tx_cbdata *)&((__skb)->cb[8])
> 
> that way the vlan tags are always present and no need to recreate
> them.
> What do you think?

Seems ok to me.  I think we should make it more clear that we're
skipping over the VLAN tag:

(struct tg3_tx_cbdata *)&((__skb)->cb[sizeof(struct vlan_skb_tx_cookie)])
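
The idea of deriving the offset from sizeof() rather than hard-coding 8 can
be sketched in userspace as below. Everything here is hypothetical: the
demo_* structures only imitate the shape of the kernel ones (the cookie is
assumed to be two 32-bit words, matching the cb[8] in the quoted text), and
the sketch copies with memcpy() instead of casting into the buffer so that
it stays portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CB_SIZE 48			/* hypothetical control-buffer size */

struct demo_vlan_cookie {		/* hypothetical stand-in for the VLAN cookie */
	unsigned int magic;
	unsigned int vlan_tag;
};

struct demo_tx_cbdata {			/* hypothetical driver-private data */
	unsigned int flags;
	unsigned int mss;
};

struct demo_skb {
	unsigned char cb[DEMO_CB_SIZE];
};

/* The driver data starts right after the cookie, whatever its size is. */
#define DEMO_TX_CB_OFFSET sizeof(struct demo_vlan_cookie)

/* Compile-time check: cookie plus driver data must fit in cb[]. */
typedef char demo_cb_fits[(DEMO_TX_CB_OFFSET + sizeof(struct demo_tx_cbdata)
			   <= DEMO_CB_SIZE) ? 1 : -1];

/* Store driver data behind the cookie without touching the cookie bytes. */
static void demo_tx_cb_store(struct demo_skb *skb,
			     const struct demo_tx_cbdata *d)
{
	assert(skb != NULL && d != NULL);
	memcpy(skb->cb + DEMO_TX_CB_OFFSET, d, sizeof(*d));
}

/* Read the driver data back from behind the cookie. */
static struct demo_tx_cbdata demo_tx_cb_load(const struct demo_skb *skb)
{
	struct demo_tx_cbdata d;

	assert(skb != NULL);
	memcpy(&d, skb->cb + DEMO_TX_CB_OFFSET, sizeof(d));
	return d;
}
```

If the cookie structure ever grows, the offset and the size check follow
automatically, which is the readability gain the sizeof() form is after.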



Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: Rick Jones <[EMAIL PROTECTED]>
Date: Tue, 02 Oct 2007 15:17:35 -0700

> Stranger still, with a mix of a 2.6.23-rc5ish kernel and a net-2.6.24 one
> (pulled oh middle of last week?) I get link-rate and I see no asymmetry
> between TCP_STREAM and TCP_MAERTS over an "e1000" link with no switch or
> tg3 with a ProCurve on my rx2660's.
> 
> I can also run bw_tcp from lmbench 3.0a8 and get 106 MB/s.
> 
> I don't have a netgear switch to try in all this...

I'm starting to have a theory about what the bad case might
be.

A strong sender going to an even stronger receiver which can
pull out packets into the process as fast as they arrive.
This might be part of what keeps the receive window from
growing.


Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones

David Miller wrote:
> From: [EMAIL PROTECTED] (Larry McVoy)
> Date: Tue, 2 Oct 2007 14:26:08 -0700
>
>> And note that sky2 doesn't have this problem.  Does the broadcom do TSO?
>> And sky2 not?  I noticed a much higher CPU load for sky2.
>
> Yes the broadcoms (the revisions I have) do TSO and it is enabled
> on both sides.
>
> Which makes the mis-matched performance even stranger :)


Stranger still, with a mix of a 2.6.23-rc5ish kernel and a net-2.6.24 one 
(pulled oh middle of last week?) I get link-rate and I see no asymmetry between 
TCP_STREAM and TCP_MAERTS over an "e1000" link with no switch or tg3 with a 
ProCurve on my rx2660's.


I can also run bw_tcp from lmbench 3.0a8 and get 106 MB/s.

I don't have a netgear switch to try in all this...

rick jones


Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-10-02 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 14:52:36 -0700

> Please consider using netif_msg_xxx() and module parameter to set
> default message level, like other real network drivers already do.

I keep seeing this recommendation, but the two supposedly most mature
and actively used drivers in the tree, tg3 and e1000 and e1000e, all
do not use this scheme.

In fact there are tons of drivers that even hook up the ethtool
msg_level setting function and never even use the value.

If people aren't using netif_msg_xxx() and the ethtool msg_level
facilities properly, it's because there is a severe dearth of good
example drivers to learn about it from.
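
For readers unfamiliar with the scheme under discussion, here is a small
userspace sketch of a per-device message-level bitmask. It is modeled loosely
on the netif_msg_xxx() idea and is not kernel code: the DEMO_MSG_* bits,
struct demo_priv, demo_msg_init() and demo_report_no_link() are all invented
for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical message classes, one bit each. */
enum {
	DEMO_MSG_DRV   = 0x0001,
	DEMO_MSG_PROBE = 0x0002,
	DEMO_MSG_LINK  = 0x0004
};

struct demo_priv {
	unsigned int msg_enable;	/* bitmask of enabled message classes */
};

/*
 * Turn a "debug" parameter into a bitmask: an out-of-range value selects
 * the driver default, 0 silences everything, and N enables the low N bits.
 */
static unsigned int demo_msg_init(int debug, unsigned int default_bits)
{
	if (debug < 0 || debug >= (int)(sizeof(unsigned int) * 8))
		return default_bits;
	if (debug == 0)
		return 0;
	return (1u << debug) - 1;
}

#define demo_msg_link(p) ((p)->msg_enable & DEMO_MSG_LINK)

/* Print the link message only if its class is enabled; returns 1 if printed. */
static int demo_report_no_link(const struct demo_priv *p, const char *name)
{
	assert(p != NULL && name != NULL);
	if (!demo_msg_link(p))
		return 0;
	printf("%s: no link during initialization.\n", name);
	return 1;
}
```

With debug=2 only the DRV and PROBE classes are enabled and the link message
stays quiet; with debug=3 the LINK bit is set and the message is printed.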


Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-10-02 Thread Stephen Hemminger
On Tue, 02 Oct 2007 23:02:53 +0200
Oliver Hartkopp <[EMAIL PROTECTED]> wrote:

> Arnaldo Carvalho de Melo wrote:
> > Em Tue, Oct 02, 2007 at 03:10:11PM +0200, Urs Thuermann escreveu:
> >   
> >> +
> >> +#ifdef CONFIG_CAN_DEBUG_DEVICES
> >> +static int debug;
> >> +module_param(debug, int, S_IRUGO);
> >> +#endif
> >> 
> >
> > Can debug be a boolean? Like its counterpart on DCCP:
> >
> > net/dccp/proto.c:
> >
> > module_param(dccp_debug, bool, 0444);
> >   
> 
> 'debug' should remain an integer to be able to specify debug-levels or 
> bit-fields for different Debug outputs.
> 
> > Where we also use a namespace prefix, for those of us who use ctags or
> > cscope.
> >   
> 
> Even if i don't have any general objections to rename this 'debug' to 
> 'vcan_debug', it looks like an 'overnamed' module parameter for me. Is 
> this a general naming scheme recommendation for debug module_params?
> 

Please consider using netif_msg_xxx() and module parameter to set
default message level, like other real network drivers already do.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: tcp bw in 2.6

2007-10-02 Thread Linus Torvalds


On Tue, 2 Oct 2007, Wayne Scott wrote:
> 
> The slow set was done like this:
> 
>  on ia64:  netcat -l -p > /dev/null
>  on work:  netcat ia64  < /dev/zero

That sounds wrong. Larry claims the slow case is when the side that did 
"accept()" does the sending, the above has the listener just reading.

> The fast set was done like this:
> 
>  on work:  netcat -l -p > /dev/null
>  on ia64:  netcat ia64  < /dev/zero

This one is guaranteed wrong too, since you have the listener reading 
(fine), but the sender now doesn't go over the network at all, but sends to 
itself.

That said, let's assume that only your description was bogus, the TCP 
dumps themselves are ok. 

I find the window scaling differences interesting. This is the opening of 
the fast sequence from the receiver:

13:35:13.929349 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: S 2592471184:2592471184(0) ack 3363219397 win 5792
13:35:13.929702 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 1449 win 68
13:35:13.929712 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 2897 win 91
13:35:13.929724 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 4345 win 114
13:35:13.929941 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 5793 win 136
13:35:13.929951 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 7241 win 159
13:35:13.929960 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 8689 win 181
13:35:13.929970 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 10137 win 204
13:35:13.929981 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 11585 win 227
13:35:13.929992 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 13033 win 249
13:35:13.930331 IP 10.3.1.1.ddi-tcp-1 > 10.3.1.10.58415: . ack 14481 win 272
 ...

ie we use a window scale of 7, and we started with a window of 5792 bytes, 
and after ten packets it has grown to 272<<7 (34816) bytes.

The slow case is 

13:34:16.761034 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: S 3299922549:3299922549(0) ack 2548837296 win 5792
13:34:16.761533 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 1449 win 2172
13:34:16.761553 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 2897 win 2896
13:34:16.761782 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 4345 win 3620
13:34:16.761908 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 5793 win 4344
13:34:16.761916 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 7241 win 5068
13:34:16.762157 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 8689 win 5792
13:34:16.762164 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 10137 win 6516
13:34:16.762283 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 11585 win 7240
13:34:16.762290 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 13033 win 7964
13:34:16.762303 IP 10.3.1.10.ddi-tcp-1 > 10.3.1.1.49864: . ack 14481 win 8688
...

so after the same ten packets, it too has grown to about the same 
size (8688<<2 = 34752 bytes). 

But the slow case has a smaller window scale, and it actually stops 
opening the window at that point: the window stays at 8688<<2 for a long 
time (and eventually grows to 9412<<2 and then 16652<<2 in the steady 
case, and is basically limited at that 66kB window size).

But the fast one that had a window scale of 7 can keep growing, and will 
do so quite aggressively. It grows the window to (1442<<7 = 180kB) in the 
first fifty packets.

But in your dump, it doesn't seem to be about who is listening and who is 
connecting. It seems to be about the fact that your machine 10.3.1.10 uses 
a window scale of 2, while 10.3.1.1 uses a scale of 7.

Linus
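
The window arithmetic in the message above is the TCP window-scale rule: the
16-bit "win" value on the wire is shifted left by the scale negotiated at
connection setup. A minimal sketch that reproduces the numbers quoted here;
the helper name effective_window is made up for this example, and the upper
bound of 14 on the shift is the limit set by the TCP window scale option.

```c
#include <assert.h>

/*
 * Effective receive window in bytes: the 16-bit window field from the TCP
 * header shifted left by the negotiated window scale (0..14).
 */
static unsigned long effective_window(unsigned int win_field,
				      unsigned int wscale)
{
	assert(win_field <= 0xffffu && wscale <= 14);
	return (unsigned long)win_field << wscale;
}
```

effective_window(272, 7) is 34816 and effective_window(8688, 2) is 34752,
the two "after ten packets" figures; effective_window(16652, 2) is 66608
(the roughly 66kB plateau) and effective_window(1442, 7) is 184576 (the
roughly 180kB reached by the fast case).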


Re: [patch 05/13] skge: remove broken and unused PHY_M_PC_MDI_XMODE macro

2007-10-02 Thread Jeff Garzik

Stephen Hemminger wrote:
> On Tue, 02 Oct 2007 14:11:38 -0700
> [EMAIL PROTECTED] wrote:
>
>> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
>>
>> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
>> Cc: Stephen Hemminger <[EMAIL PROTECTED]>
>> Cc: Jeff Garzik <[EMAIL PROTECTED]>
>> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>> ---
>
> Already in netdev tree isn't it?

Yep.

Jeff



Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-10-02 Thread David Miller
From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 18:43:25 -0300

> I think that helping ctags to find the definition for the debug variable
> to see, for instance, if it is a bitmask or a boolean without having to
> choose from tons of 'debug' variables is a good thing.

I completely agree.


Re: [patch 09/13] forcedeth: "no link" is informational

2007-10-02 Thread Stephen Hemminger
On Tue, 02 Oct 2007 14:11:41 -0700
[EMAIL PROTECTED] wrote:

> From: "Ed Swierk" <[EMAIL PROTECTED]>
> 
> Log "no link during initialization" at KERN_INFO as it's not an error, and
> occurs every time the interface comes up (when the forcedeth-phy-power-down
> patch is applied).
> 
> Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
> Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
> Cc: Jeff Garzik <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
> 
>  drivers/net/forcedeth.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -puN drivers/net/forcedeth.c~forcedeth-no-link-is-informational drivers/net/forcedeth.c
> --- a/drivers/net/forcedeth.c~forcedeth-no-link-is-informational
> +++ a/drivers/net/forcedeth.c
> @@ -4921,7 +4921,7 @@ static int nv_open(struct net_device *de
>   if (ret) {
>   netif_carrier_on(dev);
>   } else {
> - printk("%s: no link during initialization.\n", dev->name);
> + printk(KERN_INFO "%s: no link during initialization.\n", dev->name);
>   netif_carrier_off(dev);
>   }
>   if (oom)

Driver should use netif_msg_link_up()


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: 2.6.23-rc8-mm2 - tcp_fastretrans_alert() WARNING

2007-10-02 Thread Ilpo Järvinen
> On Tue, 2 Oct 2007, Ilpo Järvinen wrote:
> 
> > I'm currently out of ideas where it could come from...

Hmm, there seems to be an off-by-one in tcp_retrans_try_collapse after
all, or in fact, two of them. I'll post a patch for this tomorrow...


-- 
 i.

Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy)
Date: Tue, 2 Oct 2007 14:26:08 -0700

> And note that sky2 doesn't have this problem.  Does the broadcom do TSO?
> And sky2 not?  I noticed a much higher CPU load for sky2.

Yes the broadcoms (the revisions I have) do TSO and it is enabled
on both sides.

Which makes the mis-matched performance even stranger :)


Re: [patch 05/13] skge: remove broken and unused PHY_M_PC_MDI_XMODE macro

2007-10-02 Thread Stephen Hemminger
On Tue, 02 Oct 2007 14:11:38 -0700
[EMAIL PROTECTED] wrote:

> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
> 
> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
> Cc: Stephen Hemminger <[EMAIL PROTECTED]>
> Cc: Jeff Garzik <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>

Already in netdev tree isn't it?


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-10-02 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 02, 2007 at 11:02:53PM +0200, Oliver Hartkopp escreveu:
> Arnaldo Carvalho de Melo wrote:
>> Em Tue, Oct 02, 2007 at 03:10:11PM +0200, Urs Thuermann escreveu:
>>   
>>> +
>>> +#ifdef CONFIG_CAN_DEBUG_DEVICES
>>> +static int debug;
>>> +module_param(debug, int, S_IRUGO);
>>> +#endif
>>> 
>>
>> Can debug be a boolean? Like its counterpart on DCCP:
>>
>> net/dccp/proto.c:
>>
>> module_param(dccp_debug, bool, 0444);
>>   
>
> 'debug' should remain an integer to be able to specify debug-levels or 
> bit-fields for different Debug outputs.
>
>> Where we also use a namespace prefix, for those of us who use ctags or
>> cscope.
>>   
>
> Even if i don't have any general objections to rename this 'debug' to 
> 'vcan_debug', it looks like an 'overnamed' module parameter for me. Is this 
> a general naming scheme recommendation for debug module_params?

[EMAIL PROTECTED] linux-2.6.23-rc9-rt1]$ find . -name "*.c" | xargs grep
'module_param(.\+debug,' | wc -l
112
[EMAIL PROTECTED] linux-2.6.23-rc9-rt1]$ find . -name "*.c" | xargs grep
'module_param(debug,' | wc -l
233
[EMAIL PROTECTED] linux-2.6.23-rc9-rt1]$

I think that helping ctags to find the definition for the debug variable
to see, for instance, if it is a bitmask or a boolean without having to
choose from tons of 'debug' variables is a good thing.

- Arnaldo


Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy)
Date: Tue, 2 Oct 2007 11:40:32 -0700

> I doubt it, the same test works fine in one direction and poorly in the other.
> Wouldn't the flow control squelch either way?

HW controls for these things are typically:

1) Generates flow control frames
2) Listens for them

So you can have flow control operational in one direction
and not the other.


Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 02:16:56PM -0700, David Miller wrote:
> We absolutely depend upon people like you to report when there are
> anomalies like this.  It's the only thing that scales.

Well cool, finally doing something useful :)

Is the issue that there is no test setup?  Because this does seem like something we'd
want to have work well.

> FWIW I have a t1000 Niagara box and an Ultra45 going through a netgear
> gigabit switch.  I'm getting 85MB/sec in one direction and 10MB/sec in
> the other (using bw_tcp from lmbench3).  

Note that bw_tcp mucks with SND/RCVBUF.  It probably shouldn't, it's been
12 years since that code went in there and I dunno if it is still needed.

> Both are using identical
> broadcom tigon3 gigabit chips and identical current kernels so that is
> a truly strange result.
> 
> I'll investigate, it may be the same thing you're seeing.

Wow, sounds very similar.  In my case I was seeing pretty close to 3x
consistently.  You're more like 8x, but I was all e1000 not broadcom.

And note that sky2 doesn't have this problem.  Does the broadcom do TSO?
And sky2 not?  I noticed a much higher CPU load for sky2.
-- 
---
Larry McVoy    lm at bitmover.com    http://www.bitkeeper.com


[PATCH] [10/11] pasemi_mac: use buffer index pointer in clean_rx()

2007-10-02 Thread Olof Johansson
pasemi_mac: use buffer index pointer in clean_rx()

Use the new features in B0 for buffer ring index on the receive side. This
means we no longer have to search in the ring for where the buffer
came from.

Also cleanup the RX cleaning side a little, while I was at it.

Note: Pre-B0 hardware is no longer supported, and needs a pile of other
workarounds that are not being submitted for mainline inclusion. So the
fact that this breaks old hardware is not a problem at this time.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -243,9 +243,9 @@ static int pasemi_mac_setup_rx_resources
   PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3));
 
write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac->dma_if),
-  PAS_DMA_RXINT_CFG_DHL(3) |
-  PAS_DMA_RXINT_CFG_L2 |
-  PAS_DMA_RXINT_CFG_LW);
+ PAS_DMA_RXINT_CFG_DHL(3) | PAS_DMA_RXINT_CFG_L2 |
+ PAS_DMA_RXINT_CFG_LW | PAS_DMA_RXINT_CFG_RBP |
+ PAS_DMA_RXINT_CFG_HEN);
 
ring->next_to_fill = 0;
ring->next_to_clean = 0;
@@ -402,13 +402,12 @@ static void pasemi_mac_free_rx_resources
 static void pasemi_mac_replenish_rx_ring(struct net_device *dev, int limit)
 {
struct pasemi_mac *mac = netdev_priv(dev);
-   int start = mac->rx->next_to_fill;
-   unsigned int fill, count;
+   int fill, count;
 
if (limit <= 0)
return;
 
-   fill = start;
+   fill = mac->rx->next_to_fill;
for (count = 0; count < limit; count++) {
struct pasemi_mac_buffer *info = &RX_RING_INFO(mac, fill);
u64 *buff = &RX_BUFF(mac, fill);
@@ -446,10 +445,10 @@ static void pasemi_mac_replenish_rx_ring
 
wmb();
 
-   write_dma_reg(mac, PAS_DMA_RXCHAN_INCR(mac->dma_rxch), count);
write_dma_reg(mac, PAS_DMA_RXINT_INCR(mac->dma_if), count);
 
-   mac->rx->next_to_fill += count;
+   mac->rx->next_to_fill = (mac->rx->next_to_fill + count) &
+   (RX_RING_SIZE - 1);
 }
 
 static void pasemi_mac_restart_rx_intr(struct pasemi_mac *mac)
@@ -517,15 +516,19 @@ static int pasemi_mac_clean_rx(struct pa
int count;
struct pasemi_mac_buffer *info;
struct sk_buff *skb;
-   unsigned int i, len;
+   unsigned int len;
u64 macrx;
dma_addr_t dma;
+   int buf_index;
+   u64 eval;
 
spin_lock(&mac->rx->lock);
 
n = mac->rx->next_to_clean;
 
-   for (count = limit; count; count--) {
+   prefetch(RX_RING(mac, n));
+
+   for (count = 0; count < limit; count++) {
macrx = RX_RING(mac, n);
 
if ((macrx & XCT_MACRX_E) ||
@@ -537,21 +540,14 @@ static int pasemi_mac_clean_rx(struct pa
 
info = NULL;
 
-   /* We have to scan for our skb since there's no way
-* to back-map them from the descriptor, and if we
-* have several receive channels then they might not
-* show up in the same order as they were put on the
-* interface ring.
-*/
+   BUG_ON(!(macrx & XCT_MACRX_RR_8BRES));
 
-   dma = (RX_RING(mac, n+1) & XCT_PTR_ADDR_M);
-   for (i = mac->rx->next_to_fill;
-i < (mac->rx->next_to_fill + RX_RING_SIZE);
-i++) {
-   info = &RX_RING_INFO(mac, i);
-   if (info->dma == dma)
-   break;
-   }
+   eval = (RX_RING(mac, n+1) & XCT_RXRES_8B_EVAL_M) >>
+   XCT_RXRES_8B_EVAL_S;
+   buf_index = eval-1;
+
+   dma = (RX_RING(mac, n+2) & XCT_PTR_ADDR_M);
+   info = &RX_RING_INFO(mac, buf_index);
 
skb = info->skb;
 
@@ -600,9 +596,9 @@ static int pasemi_mac_clean_rx(struct pa
/* Need to zero it out since hardware doesn't, since the
 * replenish loop uses it to tell when it's done.
 */
-   RX_BUFF(mac, i) = 0;
+   RX_BUFF(mac, buf_index) = 0;
 
-   n += 2;
+   n += 4;
}
 
if (n > RX_RING_SIZE) {
@@ -610,8 +606,16 @@ static int pasemi_mac_clean_rx(struct pa
write_iob_reg(mac, PAS_IOB_COM_PKTHDRCNT, 0);
n &= (RX_RING_SIZE-1);
}
+
mac->rx->next_to_clean = n;
-   pasemi_mac_replenish_rx_ring(mac->netdev, limit-count);
+
+   /* Increase is in number of 16-byte entries, and since each descriptor
+* with an 8BRES takes up 3x8 bytes (padded to 4x8), increase with
+* count*2.
+*/
+   write_dma_reg(mac, PAS_DMA_RXCHAN_INCR(mac->dma_rxch), c

[PATCH] [11/11] pasemi_mac: enable iommu support

2007-10-02 Thread Olof Johansson
pasemi_mac: use buffer index pointer in clean_rx()

Use the new features in B0 for buffer ring index on the receive side. This
means we no longer have to search in the ring for where the buffer
came from.

Also cleanup the RX cleaning side a little, while I was at it.

Note: Pre-B0 hardware is no longer supported, and needs a pile of other
workarounds that are not being submitted for mainline inclusion. So the
fact that this breaks old hardware is not a problem at this time.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -243,9 +243,9 @@ static int pasemi_mac_setup_rx_resources
   PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3));
 
write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac->dma_if),
-  PAS_DMA_RXINT_CFG_DHL(3) |
-  PAS_DMA_RXINT_CFG_L2 |
-  PAS_DMA_RXINT_CFG_LW);
+ PAS_DMA_RXINT_CFG_DHL(3) | PAS_DMA_RXINT_CFG_L2 |
+ PAS_DMA_RXINT_CFG_LW | PAS_DMA_RXINT_CFG_RBP |
+ PAS_DMA_RXINT_CFG_HEN);
 
ring->next_to_fill = 0;
ring->next_to_clean = 0;
@@ -402,13 +402,12 @@ static void pasemi_mac_free_rx_resources
 static void pasemi_mac_replenish_rx_ring(struct net_device *dev, int limit)
 {
struct pasemi_mac *mac = netdev_priv(dev);
-   int start = mac->rx->next_to_fill;
-   unsigned int fill, count;
+   int fill, count;
 
if (limit <= 0)
return;
 
-   fill = start;
+   fill = mac->rx->next_to_fill;
for (count = 0; count < limit; count++) {
struct pasemi_mac_buffer *info = &RX_RING_INFO(mac, fill);
u64 *buff = &RX_BUFF(mac, fill);
@@ -446,10 +445,10 @@ static void pasemi_mac_replenish_rx_ring
 
wmb();
 
-   write_dma_reg(mac, PAS_DMA_RXCHAN_INCR(mac->dma_rxch), count);
write_dma_reg(mac, PAS_DMA_RXINT_INCR(mac->dma_if), count);
 
-   mac->rx->next_to_fill += count;
+   mac->rx->next_to_fill = (mac->rx->next_to_fill + count) &
+   (RX_RING_SIZE - 1);
 }
 
 static void pasemi_mac_restart_rx_intr(struct pasemi_mac *mac)
@@ -517,15 +516,19 @@ static int pasemi_mac_clean_rx(struct pa
int count;
struct pasemi_mac_buffer *info;
struct sk_buff *skb;
-   unsigned int i, len;
+   unsigned int len;
u64 macrx;
dma_addr_t dma;
+   int buf_index;
+   u64 eval;
 
spin_lock(&mac->rx->lock);
 
n = mac->rx->next_to_clean;
 
-   for (count = limit; count; count--) {
+   prefetch(RX_RING(mac, n));
+
+   for (count = 0; count < limit; count++) {
macrx = RX_RING(mac, n);
 
if ((macrx & XCT_MACRX_E) ||
@@ -537,21 +540,14 @@ static int pasemi_mac_clean_rx(struct pa
 
info = NULL;
 
-   /* We have to scan for our skb since there's no way
-* to back-map them from the descriptor, and if we
-* have several receive channels then they might not
-* show up in the same order as they were put on the
-* interface ring.
-*/
+   BUG_ON(!(macrx & XCT_MACRX_RR_8BRES));
 
-   dma = (RX_RING(mac, n+1) & XCT_PTR_ADDR_M);
-   for (i = mac->rx->next_to_fill;
-i < (mac->rx->next_to_fill + RX_RING_SIZE);
-i++) {
-   info = &RX_RING_INFO(mac, i);
-   if (info->dma == dma)
-   break;
-   }
+   eval = (RX_RING(mac, n+1) & XCT_RXRES_8B_EVAL_M) >>
+   XCT_RXRES_8B_EVAL_S;
+   buf_index = eval-1;
+
+   dma = (RX_RING(mac, n+2) & XCT_PTR_ADDR_M);
+   info = &RX_RING_INFO(mac, buf_index);
 
skb = info->skb;
 
@@ -600,9 +596,9 @@ static int pasemi_mac_clean_rx(struct pa
/* Need to zero it out since hardware doesn't, since the
 * replenish loop uses it to tell when it's done.
 */
-   RX_BUFF(mac, i) = 0;
+   RX_BUFF(mac, buf_index) = 0;
 
-   n += 2;
+   n += 4;
}
 
if (n > RX_RING_SIZE) {
@@ -610,8 +606,16 @@ static int pasemi_mac_clean_rx(struct pa
write_iob_reg(mac, PAS_IOB_COM_PKTHDRCNT, 0);
n &= (RX_RING_SIZE-1);
}
+
mac->rx->next_to_clean = n;
-   pasemi_mac_replenish_rx_ring(mac->netdev, limit-count);
+
+   /* Increase is in number of 16-byte entries, and since each descriptor
+* with an 8BRES takes up 3x8 bytes (padded to 4x8), increase with
+* count*2.
+*/
+   write_dma_reg(mac, PAS_DMA_RXCHAN_INCR(mac->dma_rxch), c
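
Note: the indexing arithmetic in this patch can be illustrated with a small standalone sketch. It is not driver code. The mask and shift below are placeholders (the real XCT_RXRES_8B_EVAL_M/_S definitions are not shown in this mail), and the helper names rx_buf_index() and rx_chan_increment() are hypothetical. It only mirrors what the hunks above show: the buffer index is taken from a field in the second ring word minus one, each packet advances the ring by four 8-byte words, and the channel increment is counted in 16-byte units.

```c
#include <stdint.h>

/* Placeholder field layout: NOT the real XCT_RXRES_8B_EVAL_M/_S values. */
#define EVAL_SHIFT 32
#define EVAL_MASK  (0xffffULL << EVAL_SHIFT)

/* Each received packet occupies four 8-byte ring words (3 used, 1 pad). */
#define WORDS_PER_DESC 4

/* Recover the buffer ring index from the 8-byte result word.
 * The patch subtracts 1 from the extracted field, so do the same here. */
static int rx_buf_index(uint64_t result_word)
{
	uint64_t eval = (result_word & EVAL_MASK) >> EVAL_SHIFT;

	return (int)eval - 1;
}

/* The channel increment counts 16-byte units, so every 32-byte
 * descriptor accounts for two of them (hence "count*2"). */
static unsigned int rx_chan_increment(unsigned int descs_cleaned)
{
	return descs_cleaned * (WORDS_PER_DESC * 8 / 16);
}
```

With this layout no search through the ring info array is needed: the descriptor itself says which buffer slot it came from.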

[PATCH] [8/11] pasemi_mac: update todo list

2007-10-02 Thread Olof Johansson
pasemi_mac: update todo list

Remove some stale todo items that have been taken care of. Add a couple
of upcoming ones.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: 2.6.23/drivers/net/pasemi_mac.c
===
--- 2.6.23.orig/drivers/net/pasemi_mac.c
+++ 2.6.23/drivers/net/pasemi_mac.c
@@ -46,12 +46,10 @@
 
 /* TODO list
  *
- * - Get rid of pci_{read,write}_config(), map registers with ioremap
- *   for performance
- * - PHY support
  * - Multicast support
  * - Large MTU support
- * - Other performance improvements
+ * - SW LRO
+ * - Multiqueue RX/TX
  */
 
 


[PATCH] [9/11] pasemi_mac: clear out old errors on interface open

2007-10-02 Thread Olof Johansson
pasemi_mac: clear out old errors on interface open

Clear out any pending errors when an interface is brought up. Since the bits
are sticky, they might be from interface shutdown time after firmware has
used it, etc.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -903,16 +903,27 @@ static int pasemi_mac_open(struct net_de
 
/* enable rx if */
write_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if),
-  PAS_DMA_RXINT_RCMDSTA_EN);
+  PAS_DMA_RXINT_RCMDSTA_EN |
+  PAS_DMA_RXINT_RCMDSTA_DROPS_M |
+  PAS_DMA_RXINT_RCMDSTA_BP |
+  PAS_DMA_RXINT_RCMDSTA_OO |
+  PAS_DMA_RXINT_RCMDSTA_BT);
 
/* enable rx channel */
write_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch),
   PAS_DMA_RXCHAN_CCMDSTA_EN |
-  PAS_DMA_RXCHAN_CCMDSTA_DU);
+  PAS_DMA_RXCHAN_CCMDSTA_DU |
+  PAS_DMA_RXCHAN_CCMDSTA_OD |
+  PAS_DMA_RXCHAN_CCMDSTA_FD |
+  PAS_DMA_RXCHAN_CCMDSTA_DT);
 
/* enable tx channel */
write_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch),
-  PAS_DMA_TXCHAN_TCMDSTA_EN);
+  PAS_DMA_TXCHAN_TCMDSTA_EN |
+  PAS_DMA_TXCHAN_TCMDSTA_SZ |
+  PAS_DMA_TXCHAN_TCMDSTA_DB |
+  PAS_DMA_TXCHAN_TCMDSTA_DE |
+  PAS_DMA_TXCHAN_TCMDSTA_DA);
 
pasemi_mac_replenish_rx_ring(dev, RX_RING_SIZE);
 
@@ -987,7 +998,7 @@ out_rx_resources:
 static int pasemi_mac_close(struct net_device *dev)
 {
struct pasemi_mac *mac = netdev_priv(dev);
-   unsigned int stat;
+   unsigned int sta;
int retries;
 
if (mac->phydev) {
@@ -998,6 +1009,26 @@ static int pasemi_mac_close(struct net_d
netif_stop_queue(dev);
napi_disable(&mac->napi);
 
+   sta = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if));
+   if (sta & (PAS_DMA_RXINT_RCMDSTA_BP |
+ PAS_DMA_RXINT_RCMDSTA_OO |
+ PAS_DMA_RXINT_RCMDSTA_BT))
+   printk(KERN_DEBUG "pasemi_mac: rcmdsta error: 0x%08x\n", sta);
+
+   sta = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch));
+   if (sta & (PAS_DMA_RXCHAN_CCMDSTA_DU |
+PAS_DMA_RXCHAN_CCMDSTA_OD |
+PAS_DMA_RXCHAN_CCMDSTA_FD |
+PAS_DMA_RXCHAN_CCMDSTA_DT))
+   printk(KERN_DEBUG "pasemi_mac: ccmdsta error: 0x%08x\n", sta);
+
+   sta = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch));
+   if (sta & (PAS_DMA_TXCHAN_TCMDSTA_SZ |
+ PAS_DMA_TXCHAN_TCMDSTA_DB |
+ PAS_DMA_TXCHAN_TCMDSTA_DE |
+ PAS_DMA_TXCHAN_TCMDSTA_DA))
+   printk(KERN_DEBUG "pasemi_mac: tcmdsta error: 0x%08x\n", sta);
+
/* Clean out any pending buffers */
pasemi_mac_clean_tx(mac);
pasemi_mac_clean_rx(mac, RX_RING_SIZE);
@@ -1008,33 +1039,33 @@ static int pasemi_mac_close(struct net_d
write_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch), 
PAS_DMA_RXCHAN_CCMDSTA_ST);
 
for (retries = 0; retries < MAX_RETRIES; retries++) {
-   stat = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch));
-   if (!(stat & PAS_DMA_TXCHAN_TCMDSTA_ACT))
+   sta = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch));
+   if (!(sta & PAS_DMA_TXCHAN_TCMDSTA_ACT))
break;
cond_resched();
}
 
-   if (stat & PAS_DMA_TXCHAN_TCMDSTA_ACT)
+   if (sta & PAS_DMA_TXCHAN_TCMDSTA_ACT)
dev_err(&mac->dma_pdev->dev, "Failed to stop tx channel\n");
 
for (retries = 0; retries < MAX_RETRIES; retries++) {
-   stat = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch));
-   if (!(stat & PAS_DMA_RXCHAN_CCMDSTA_ACT))
+   sta = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch));
+   if (!(sta & PAS_DMA_RXCHAN_CCMDSTA_ACT))
break;
cond_resched();
}
 
-   if (stat & PAS_DMA_RXCHAN_CCMDSTA_ACT)
+   if (sta & PAS_DMA_RXCHAN_CCMDSTA_ACT)
dev_err(&mac->dma_pdev->dev, "Failed to stop rx channel\n");
 
for (retries = 0; retries < MAX_RETRIES; retries++) {
-   stat = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if));
-   if (!(stat & PAS_DMA_RXINT_RCMDSTA_ACT))
+   sta = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if));
+   if (!(sta & PAS_DMA_RXINT_R

[PATCH] [6/11] pasemi_mac: add local skb alignment

2007-10-02 Thread Olof Johansson
pasemi_mac: add local skb alignment

Add local SKB alignment to pasemi_mac, since ppc64 in general has it at 0
because of design flaws in some of the IBM server bridge chips. However,
for PWRficient, doing the unaligned copies is more expensive than doing
unaligned DMA, so make sure the data is aligned instead.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -37,6 +37,12 @@
 
 #include "pasemi_mac.h"
 
+/* We have our own align, since ppc64 in general has it at 0 because
+ * of design flaws in some of the server bridge chips. However, for
+ * PWRficient doing the unaligned copies is more expensive than doing
+ * unaligned DMA, so make sure the data is aligned instead.
+ */
+#define LOCAL_SKB_ALIGN 2
 
 /* TODO list
  *
@@ -409,13 +415,16 @@ static void pasemi_mac_replenish_rx_ring
/* skb might still be in there for recycle on short receives */
if (info->skb)
skb = info->skb;
-   else
+   else {
skb = dev_alloc_skb(BUF_SIZE);
+   skb_reserve(skb, LOCAL_SKB_ALIGN);
+   }
 
if (unlikely(!skb))
break;
 
-   dma = pci_map_single(mac->dma_pdev, skb->data, BUF_SIZE,
+   dma = pci_map_single(mac->dma_pdev, skb->data,
+BUF_SIZE - LOCAL_SKB_ALIGN,
 PCI_DMA_FROMDEVICE);
 
if (unlikely(dma_mapping_error(dma))) {
@@ -553,10 +562,12 @@ static int pasemi_mac_clean_rx(struct pa
len = (macrx & XCT_MACRX_LLEN_M) >> XCT_MACRX_LLEN_S;
 
if (len < 256) {
-   struct sk_buff *new_skb =
-   netdev_alloc_skb(mac->netdev, len + NET_IP_ALIGN);
+   struct sk_buff *new_skb;
+
+   new_skb = netdev_alloc_skb(mac->netdev,
+  len + LOCAL_SKB_ALIGN);
if (new_skb) {
-   skb_reserve(new_skb, NET_IP_ALIGN);
+   skb_reserve(new_skb, LOCAL_SKB_ALIGN);
memcpy(new_skb->data, skb->data, len);
/* save the skb in buffer_info as good */
skb = new_skb;
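
Note: a standalone sketch of why two bytes of headroom matter, assuming a standard 14-byte Ethernet header in front of the IP header. ETH_HLEN is defined locally here and the helper names are hypothetical. With no headroom the IP header starts at offset 14, which is not 4-byte aligned; with LOCAL_SKB_ALIGN bytes reserved it starts at offset 16.

```c
#include <stddef.h>

#define ETH_HLEN         14  /* Ethernet header length in bytes */
#define LOCAL_SKB_ALIGN   2  /* headroom reserved by the patch */

/* Offset of the IP header from the start of an aligned buffer when
 * `reserve` bytes of headroom are skipped before the Ethernet header. */
static size_t ip_header_offset(size_t reserve)
{
	return reserve + ETH_HLEN;
}

/* Returns 1 if `offset` is a multiple of `alignment`, 0 otherwise. */
static int is_aligned(size_t offset, size_t alignment)
{
	return (offset % alignment) == 0;
}
```

The trade-off described in the commit message is that the DMA target then starts at an odd multiple of two bytes, which this hardware apparently handles more cheaply than the CPU handles misaligned header accesses.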


[PATCH] [7/11] pasemi_mac: further performance tweaks

2007-10-02 Thread Olof Johansson
pasemi_mac: further performance tweaks

Misc driver tweaks for pasemi_mac:
* Increase ring size (really needed mostly on 10G)
* Take out an unneeded barrier
* Move around a few prefetches and reorder a few calls
* Don't try to clean on full tx buffer, just let things
  take their course and stop the queue directly
* Avoid filling on the same line as the interface is
  working on to reduce cache line bouncing
* Avoid unneeded clearing of software state (and make the
  interface shutdown code handle it)
* Fix up some of the tx ring wrap logic.


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -56,8 +56,8 @@
 
 
 /* Must be a power of two */
-#define RX_RING_SIZE 512
-#define TX_RING_SIZE 512
+#define RX_RING_SIZE 4096
+#define TX_RING_SIZE 4096
 
 #define DEFAULT_MSG_ENABLE   \
(NETIF_MSG_DRV  | \
@@ -336,8 +336,16 @@ static void pasemi_mac_free_tx_resources
struct pasemi_mac_buffer *info;
dma_addr_t dmas[MAX_SKB_FRAGS+1];
int freed;
+   int start, limit;
 
-   for (i = 0; i < TX_RING_SIZE; i += freed) {
+   start = mac->tx->next_to_clean;
+   limit = mac->tx->next_to_fill;
+
+   /* Compensate for when fill has wrapped and clean has not */
+   if (start > limit)
+   limit += TX_RING_SIZE;
+
+   for (i = start; i < limit; i += freed) {
info = &TX_RING_INFO(mac, i+1);
if (info->dma && info->skb) {
for (j = 0; j <= skb_shinfo(info->skb)->nr_frags; j++)
@@ -520,9 +528,6 @@ static int pasemi_mac_clean_rx(struct pa
n = mac->rx->next_to_clean;
 
for (count = limit; count; count--) {
-
-   rmb();
-
macrx = RX_RING(mac, n);
 
if ((macrx & XCT_MACRX_E) ||
@@ -550,14 +555,10 @@ static int pasemi_mac_clean_rx(struct pa
break;
}
 
-   prefetchw(info);
-
skb = info->skb;
-   prefetchw(skb);
-   info->dma = 0;
 
-   pci_unmap_single(mac->dma_pdev, dma, skb->len,
-PCI_DMA_FROMDEVICE);
+   prefetch(skb);
+   prefetch(&skb->data_len);
 
len = (macrx & XCT_MACRX_LLEN_M) >> XCT_MACRX_LLEN_S;
 
@@ -576,10 +577,9 @@ static int pasemi_mac_clean_rx(struct pa
} else
info->skb = NULL;
 
-   /* Need to zero it out since hardware doesn't, since the
-* replenish loop uses it to tell when it's done.
-*/
-   RX_BUFF(mac, i) = 0;
+   pci_unmap_single(mac->dma_pdev, dma, len, PCI_DMA_FROMDEVICE);
+
+   info->dma = 0;
 
skb_put(skb, len);
 
@@ -599,6 +599,11 @@ static int pasemi_mac_clean_rx(struct pa
RX_RING(mac, n) = 0;
RX_RING(mac, n+1) = 0;
 
+   /* Need to zero it out since hardware doesn't, since the
+* replenish loop uses it to tell when it's done.
+*/
+   RX_BUFF(mac, i) = 0;
+
n += 2;
}
 
@@ -621,27 +626,33 @@ static int pasemi_mac_clean_rx(struct pa
 static int pasemi_mac_clean_tx(struct pasemi_mac *mac)
 {
int i, j;
-   struct pasemi_mac_buffer *info;
-   unsigned int start, descr_count, buf_count, limit;
+   unsigned int start, descr_count, buf_count, batch_limit;
+   unsigned int ring_limit;
unsigned int total_count;
unsigned long flags;
struct sk_buff *skbs[TX_CLEAN_BATCHSIZE];
dma_addr_t dmas[TX_CLEAN_BATCHSIZE][MAX_SKB_FRAGS+1];
 
total_count = 0;
-   limit = TX_CLEAN_BATCHSIZE;
+   batch_limit = TX_CLEAN_BATCHSIZE;
 restart:
spin_lock_irqsave(&mac->tx->lock, flags);
 
start = mac->tx->next_to_clean;
+   ring_limit = mac->tx->next_to_fill;
+
+   /* Compensate for when fill has wrapped but clean has not */
+   if (start > ring_limit)
+   ring_limit += TX_RING_SIZE;
 
buf_count = 0;
descr_count = 0;
 
for (i = start;
-descr_count < limit && i < mac->tx->next_to_fill;
+descr_count < batch_limit && i < ring_limit;
 i += buf_count) {
u64 mactx = TX_RING(mac, i);
+   struct sk_buff *skb;
 
if ((mactx  & XCT_MACTX_E) ||
(*mac->tx_status & PAS_STATUS_ERROR))
@@ -651,19 +662,15 @@ restart:
/* Not yet transmitted */
break;
 
-   info = &TX_RING_INFO(mac, i+1);
-   skbs[descr_count] = info->skb;
+   skb = TX_RING_INFO(mac, i+1).skb;
+   
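
Note: the "fill has wrapped but clean has not" compensation in this patch can be sketched in isolation. This is an illustration only, under the assumption that both indices are kept within [0, RING_SIZE); the ring size is shrunk to 8 and the helper names are hypothetical.

```c
#include <assert.h>

#define RING_SIZE 8u   /* must be a power of two; small for illustration */

/* Number of ring slots between clean and fill, computed the way the
 * patch's loop bounds do: if fill has wrapped past the end of the ring
 * but clean has not, push the limit out by one ring length so that a
 * plain "i < limit" comparison still works. */
static unsigned int slots_to_scan(unsigned int next_to_clean,
				  unsigned int next_to_fill)
{
	unsigned int start = next_to_clean;
	unsigned int limit = next_to_fill;

	assert(start < RING_SIZE && limit < RING_SIZE);

	if (start > limit)
		limit += RING_SIZE;

	return limit - start;
}

/* Ring accessors mask the index, so any i in [start, limit) maps back
 * into [0, RING_SIZE). */
static unsigned int ring_slot(unsigned int i)
{
	return i & (RING_SIZE - 1);
}
```

For example, with clean at 6 and fill at 2 the loop runs i = 6..9, and masking maps 8 and 9 back to slots 0 and 1.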

[PATCH] [4/11] pasemi_mac: implement sg support

2007-10-02 Thread Olof Johansson
pasemi_mac: implement sg support

Implement SG support for pasemi_mac

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -160,6 +160,30 @@ static int pasemi_get_mac_addr(struct pa
return 0;
 }
 
+static int pasemi_mac_unmap_tx_skb(struct pasemi_mac *mac,
+   struct sk_buff *skb,
+   dma_addr_t *dmas)
+{
+   int f;
+   int nfrags = skb_shinfo(skb)->nr_frags;
+
+   pci_unmap_single(mac->dma_pdev, dmas[0], skb_headlen(skb),
+PCI_DMA_TODEVICE);
+
+   for (f = 0; f < nfrags; f++) {
+   skb_frag_t *frag = &skb_shinfo(skb)->frags[f];
+
+   pci_unmap_page(mac->dma_pdev, dmas[f+1], frag->size,
+  PCI_DMA_TODEVICE);
+   }
+   dev_kfree_skb_irq(skb);
+
+   /* Freed descriptor slot + main SKB ptr + nfrags additional ptrs,
+* aligned up to a power of 2
+*/
+   return (nfrags + 3) & ~1;
+}
+
 static int pasemi_mac_setup_rx_resources(struct net_device *dev)
 {
struct pasemi_mac_rxring *ring;
@@ -300,24 +324,24 @@ out_ring:
 static void pasemi_mac_free_tx_resources(struct net_device *dev)
 {
struct pasemi_mac *mac = netdev_priv(dev);
-   unsigned int i;
+   unsigned int i, j;
struct pasemi_mac_buffer *info;
+   dma_addr_t dmas[MAX_SKB_FRAGS+1];
+   int freed;
 
-   for (i = 0; i < TX_RING_SIZE; i += 2) {
+   for (i = 0; i < TX_RING_SIZE; i += freed) {
info = &TX_RING_INFO(mac, i+1);
if (info->dma && info->skb) {
-   pci_unmap_single(mac->dma_pdev,
-info->dma,
-info->skb->len,
-PCI_DMA_TODEVICE);
-   dev_kfree_skb_any(info->skb);
-   }
-   TX_RING(mac, i) = 0;
-   TX_RING(mac, i+1) = 0;
-   info->dma = 0;
-   info->skb = NULL;
+   for (j = 0; j <= skb_shinfo(info->skb)->nr_frags; j++)
+   dmas[j] = TX_RING_INFO(mac, i+1+j).dma;
+   freed = pasemi_mac_unmap_tx_skb(mac, info->skb, dmas);
+   } else
+   freed = 2;
}
 
+   for (i = 0; i < TX_RING_SIZE; i++)
+   TX_RING(mac, i) = 0;
+
dma_free_coherent(&mac->dma_pdev->dev,
  TX_RING_SIZE * sizeof(u64),
  mac->tx->ring, mac->tx->dma);
@@ -573,27 +597,34 @@ static int pasemi_mac_clean_rx(struct pa
return count;
 }
 
+/* Can't make this too large or we blow the kernel stack limits */
+#define TX_CLEAN_BATCHSIZE (128/MAX_SKB_FRAGS)
+
 static int pasemi_mac_clean_tx(struct pasemi_mac *mac)
 {
-   int i;
+   int i, j;
struct pasemi_mac_buffer *info;
-   unsigned int start, count, limit;
+   unsigned int start, descr_count, buf_count, limit;
unsigned int total_count;
unsigned long flags;
-   struct sk_buff *skbs[32];
-   dma_addr_t dmas[32];
+   struct sk_buff *skbs[TX_CLEAN_BATCHSIZE];
+   dma_addr_t dmas[TX_CLEAN_BATCHSIZE][MAX_SKB_FRAGS+1];
 
total_count = 0;
+   limit = TX_CLEAN_BATCHSIZE;
 restart:
spin_lock_irqsave(&mac->tx->lock, flags);
 
start = mac->tx->next_to_clean;
-   limit = min(mac->tx->next_to_fill, start+32);
 
-   count = 0;
+   buf_count = 0;
+   descr_count = 0;
 
-   for (i = start; i < limit; i += 2) {
+   for (i = start;
+descr_count < limit && i < mac->tx->next_to_fill;
+i += buf_count) {
u64 mactx = TX_RING(mac, i);
+
if ((mactx  & XCT_MACTX_E) ||
(*mac->tx_status & PAS_STATUS_ERROR))
pasemi_mac_tx_error(mac, mactx);
@@ -603,30 +634,38 @@ restart:
break;
 
info = &TX_RING_INFO(mac, i+1);
-   skbs[count] = info->skb;
-   dmas[count] = info->dma;
+   skbs[descr_count] = info->skb;
+
+   buf_count = 2 + skb_shinfo(info->skb)->nr_frags;
+   for (j = 0; j <= skb_shinfo(info->skb)->nr_frags; j++)
+   dmas[descr_count][j] = TX_RING_INFO(mac, i+1+j).dma;
+
 
info->dma = 0;
TX_RING(mac, i) = 0;
TX_RING(mac, i+1) = 0;
+   TX_RING_INFO(mac, i+1).skb = 0;
+   TX_RING_INFO(mac, i+1).dma = 0;
 
-
-   count++;
+   /* Since we always fill with an even number of entries, make
+* sure we skip any unused one at the end as well.
+*/
+   if (buf_count & 1)
+   
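
Note: the return value of pasemi_mac_unmap_tx_skb() above is plain arithmetic and can be checked standalone. The helper name tx_slots_for() is hypothetical. As written, the expression rounds up to an even number of entries (a multiple of 2) rather than to a power of two: three fragments give 6.

```c
/* Ring entries consumed by one transmitted skb: one descriptor word,
 * one pointer for the linear part, one pointer per fragment, rounded
 * up to an even number of entries. */
static int tx_slots_for(int nfrags)
{
	return (nfrags + 3) & ~1;
}
```

So an skb with no fragments frees 2 entries, one or two fragments free 4, and three fragments free 6.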

[PATCH] [5/11] pasemi_mac: workaround for erratum 5971

2007-10-02 Thread Olof Johansson
pasemi_mac: workaround for erratum 5971

Implement workarounds for erratum 5971, where L2 hints aren't considered
properly unless the way hint is enabled on the interface. Since L2 isn't
set up to dedicate a way to headers, we need to reset the packet count
by hand so it won't run out of credits.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -239,7 +239,9 @@ static int pasemi_mac_setup_rx_resources
   PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3));
 
write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac->dma_if),
-  PAS_DMA_RXINT_CFG_DHL(2));
+  PAS_DMA_RXINT_CFG_DHL(3) |
+  PAS_DMA_RXINT_CFG_L2 |
+  PAS_DMA_RXINT_CFG_LW);
 
ring->next_to_fill = 0;
ring->next_to_clean = 0;
@@ -589,6 +591,11 @@ static int pasemi_mac_clean_rx(struct pa
n += 2;
}
 
+   if (n > RX_RING_SIZE) {
+   /* Errata 5971 workaround: L2 target of headers */
+   write_iob_reg(mac, PAS_IOB_COM_PKTHDRCNT, 0);
+   n &= (RX_RING_SIZE-1);
+   }
mac->rx->next_to_clean = n;
pasemi_mac_replenish_rx_ring(mac->netdev, limit-count);
 
Index: k.org/drivers/net/pasemi_mac.h
===
--- k.org.orig/drivers/net/pasemi_mac.h
+++ k.org/drivers/net/pasemi_mac.h
@@ -210,6 +210,8 @@ enum {
 #define PAS_DMA_RXINT_CFG_DHL_S 24
 #define PAS_DMA_RXINT_CFG_DHL(x) (((x) << PAS_DMA_RXINT_CFG_DHL_S) & \
  PAS_DMA_RXINT_CFG_DHL_M)
+#define PAS_DMA_RXINT_CFG_LW 0x0020
+#define PAS_DMA_RXINT_CFG_L2 0x0010
 #define PAS_DMA_RXINT_CFG_WIF   0x0002
 #define PAS_DMA_RXINT_CFG_WIL   0x0001
 
@@ -315,6 +317,12 @@ enum {
 #define PAS_STATUS_SOFT 0x4000ull
 #define PAS_STATUS_INT  0x8000ull
 
+#define PAS_IOB_COM_PKTHDRCNT  0x120
+#define PAS_IOB_COM_PKTHDRCNT_PKTHDR1_M 0x0fff
+#define PAS_IOB_COM_PKTHDRCNT_PKTHDR1_S 16
+#define PAS_IOB_COM_PKTHDRCNT_PKTHDR0_M 0x0fff
+#define PAS_IOB_COM_PKTHDRCNT_PKTHDR0_S 0
+
 #define PAS_IOB_DMA_RXCH_CFG(i) (0x1100 + (i)*4)
 #define PAS_IOB_DMA_RXCH_CFG_CNTTH_M 0x0fff
 #define PAS_IOB_DMA_RXCH_CFG_CNTTH_S 0


[patch 13/13] ax88796: add 93cx6 eeprom support

2007-10-02 Thread akpm
From: Magnus Damm <[EMAIL PROTECTED]>

Hook up the 93cx6 eeprom code to the ax88796 driver and modify the ax88796
driver to read out the mac address from the eeprom.  We need this for the
ax88796 on certain SuperH boards.  The pin configuration used to connect
the eeprom to the ax88796 on these boards is the same as pointed out by the
ax88796 datasheet, so we can probably reuse this code for multiple
platforms in the future.

Signed-off-by: Magnus Damm <[EMAIL PROTECTED]>
Cc: Ben Dooks <[EMAIL PROTECTED]>
Cc: Paul Mundt <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig          |    7
 drivers/net/ax88796.c        |   49 +
 include/linux/eeprom_93cx6.h |    3 +-
 include/net/ax88796.h        |    1
 4 files changed, 59 insertions(+), 1 deletion(-)

diff -puN drivers/net/Kconfig~ax88796-add-93cx6-eeprom-support drivers/net/Kconfig
--- a/drivers/net/Kconfig~ax88796-add-93cx6-eeprom-support
+++ a/drivers/net/Kconfig
@@ -240,6 +240,13 @@ config AX88796
  AX88796 driver, using platform bus to provide
  chip detection and resources
 
+config AX88796_93CX6
+   bool "ASIX AX88796 external 93CX6 eeprom support"
+   depends on AX88796
+   select EEPROM_93CX6
+   help
+ Select this if your platform comes with an external 93CX6 eeprom.
+
 config MACE
tristate "MACE (Power Mac ethernet) support"
depends on PPC_PMAC && PPC32
diff -puN drivers/net/ax88796.c~ax88796-add-93cx6-eeprom-support drivers/net/ax88796.c
--- a/drivers/net/ax88796.c~ax88796-add-93cx6-eeprom-support
+++ a/drivers/net/ax88796.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -582,6 +583,37 @@ static const struct ethtool_ops ax_ethto
.get_link   = ax_get_link,
 };
 
+#ifdef CONFIG_AX88796_93CX6
+static void ax_eeprom_register_read(struct eeprom_93cx6 *eeprom)
+{
+   struct ei_device *ei_local = eeprom->data;
+   u8 reg = ei_inb(ei_local->mem + AX_MEMR);
+
+   eeprom->reg_data_in = reg & AX_MEMR_EEI;
+   eeprom->reg_data_out = reg & AX_MEMR_EEO; /* Input pin */
+   eeprom->reg_data_clock = reg & AX_MEMR_EECLK;
+   eeprom->reg_chip_select = reg & AX_MEMR_EECS;
+}
+
+static void ax_eeprom_register_write(struct eeprom_93cx6 *eeprom)
+{
+   struct ei_device *ei_local = eeprom->data;
+   u8 reg = ei_inb(ei_local->mem + AX_MEMR);
+
+   reg &= ~(AX_MEMR_EEI | AX_MEMR_EECLK | AX_MEMR_EECS);
+
+   if (eeprom->reg_data_in)
+   reg |= AX_MEMR_EEI;
+   if (eeprom->reg_data_clock)
+   reg |= AX_MEMR_EECLK;
+   if (eeprom->reg_chip_select)
+   reg |= AX_MEMR_EECS;
+
+   ei_outb(reg, ei_local->mem + AX_MEMR);
+   udelay(10);
+}
+#endif
+
 /* setup code */
 
 static void ax_initial_setup(struct net_device *dev, struct ei_device *ei_local)
@@ -640,6 +672,23 @@ static int ax_init_dev(struct net_device
memcpy(dev->dev_addr,  SA_prom, 6);
}
 
+#ifdef CONFIG_AX88796_93CX6
+   if (first_init && ax->plat->flags & AXFLG_HAS_93CX6) {
+   unsigned char mac_addr[6];
+   struct eeprom_93cx6 eeprom;
+
+   eeprom.data = ei_local;
+   eeprom.register_read = ax_eeprom_register_read;
+   eeprom.register_write = ax_eeprom_register_write;
+   eeprom.width = PCI_EEPROM_WIDTH_93C56;
+
+   eeprom_93cx6_multiread(&eeprom, 0,
+  (__le16 __force *)mac_addr,
+  sizeof(mac_addr) >> 1);
+
+   memcpy(dev->dev_addr,  mac_addr, 6);
+   }
+#endif
if (ax->plat->wordlength == 2) {
/* We must set the 8390 for word mode. */
ei_outb(ax->plat->dcr_val, ei_local->mem + EN0_DCFG);
diff -puN include/linux/eeprom_93cx6.h~ax88796-add-93cx6-eeprom-support include/linux/eeprom_93cx6.h
--- a/include/linux/eeprom_93cx6.h~ax88796-add-93cx6-eeprom-support
+++ a/include/linux/eeprom_93cx6.h
@@ -21,13 +21,14 @@
 /*
Module: eeprom_93cx6
Abstract: EEPROM reader datastructures for 93cx6 chipsets.
-   Supported chipsets: 93c46 & 93c66.
+   Supported chipsets: 93c46, 93c56 and 93c66.
  */
 
 /*
  * EEPROM operation defines.
  */
 #define PCI_EEPROM_WIDTH_93C46 6
+#define PCI_EEPROM_WIDTH_93C56 8
 #define PCI_EEPROM_WIDTH_93C66 8
 #define PCI_EEPROM_WIDTH_OPCODE 3
 #define PCI_EEPROM_WRITE_OPCODE 0x05
diff -puN include/net/ax88796.h~ax88796-add-93cx6-eeprom-support include/net/ax88796.h
--- a/include/net/ax88796.h~ax88796-add-93cx6-eeprom-support
+++ a/include/net/ax88796.h
@@ -14,6 +14,7 @@
 
 #define AXFLG_HAS_EEPROM   (1<<0)
 #define AXFLG_MAC_FROMDEV  (1<<1)  /* device already has MAC */
+#define AXFLG_HAS_93CX6    (1<<2)  /* use eeprom_93cx6 driver */
 
 struct ax_plat_data {
unsigned int flags;
_
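
Note: the multiread call above fills a 6-byte MAC address from three 16-bit EEPROM words. The __le16 cast in the patch suggests the words end up in memory in little-endian byte order; the sketch below illustrates that layout in portable C. It is an assumption-labelled illustration, not the eeprom_93cx6 implementation, and words_to_le_bytes() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Store `count` 16-bit EEPROM words into `out` in little-endian byte
 * order, independent of host endianness: low byte first, then high. */
static void words_to_le_bytes(const uint16_t *words, size_t count,
			      uint8_t *out)
{
	size_t i;

	for (i = 0; i < count; i++) {
		out[2 * i]     = (uint8_t)(words[i] & 0xff);
		out[2 * i + 1] = (uint8_t)(words[i] >> 8);
	}
}
```

Under that assumption, EEPROM words 0x1100, 0x3322, 0x5544 would yield the address 00:11:22:33:44:55.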

[patch 11/13] PHYLIB: fix an interrupt loop potential when halting

2007-10-02 Thread akpm
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]>

Ensure the PHY_HALTED state is not entered with the IRQ asserted as it
could lead to an interrupt loop.

There is a small window in phy_stop(), where the state of the PHY machine
indicates it has been halted, but its interrupt output might still be
unmasked.  If an interrupt goes active right at this moment it will loop as
the phy_interrupt() handler exits immediately with IRQ_NONE if the halted
state is seen.  It is unsafe to extend the phydev spinlock to cover
phy_interrupt().  It is safe to swap the order of the actions though as all
the competing places to unmask the interrupt output of the PHY, which are
phy_change() and phy_timer() are already covered with the lock as is the
sequence in question.

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
Cc: Andy Fleming <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/phy.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN drivers/net/phy/phy.c~phylib-fix-an-interrupt-loop-potential-when-halting drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~phylib-fix-an-interrupt-loop-potential-when-halting
+++ a/drivers/net/phy/phy.c
@@ -737,8 +737,6 @@ void phy_stop(struct phy_device *phydev)
if (PHY_HALTED == phydev->state)
goto out_unlock;
 
-   phydev->state = PHY_HALTED;
-
if (phydev->irq != PHY_POLL) {
/* Disable PHY Interrupts */
phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
@@ -747,6 +745,8 @@ void phy_stop(struct phy_device *phydev)
phy_clear_interrupt(phydev);
}
 
+   phydev->state = PHY_HALTED;
+
 out_unlock:
spin_unlock_bh(&phydev->lock);
 
_
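
Note: the ordering argument in the changelog can be modelled with a toy userspace state machine. This is a sketch only; struct fake_phy and all function names are hypothetical and nothing here is phylib code. It checks the state between the two steps of the stop sequence: with the old order there is a moment where the PHY is marked halted while its interrupt output is still unmasked, and with the new order there is not.

```c
#include <stdbool.h>

enum phy_state { PHY_RUNNING, PHY_HALTED };
enum irq_result { IRQ_NONE, IRQ_HANDLED };

struct fake_phy {
	enum phy_state state;
	bool irq_unmasked;   /* PHY interrupt output enabled */
	bool irq_asserted;   /* PHY is currently driving the line */
};

/* As described above: a halted PHY's handler bails out with IRQ_NONE
 * and therefore never acknowledges the interrupt. */
static enum irq_result fake_phy_interrupt(struct fake_phy *phy)
{
	if (phy->state == PHY_HALTED)
		return IRQ_NONE;
	phy->irq_asserted = false;   /* acknowledged */
	return IRQ_HANDLED;
}

/* A loop is possible whenever the PHY can still raise an interrupt
 * that the handler will refuse to service. */
static bool can_loop(const struct fake_phy *phy)
{
	return phy->state == PHY_HALTED && phy->irq_unmasked;
}

/* Old order: mark halted first, mask afterwards. Returns whether the
 * intermediate state allows a loop. */
static bool stop_old_order_has_window(struct fake_phy *phy)
{
	bool window;

	phy->state = PHY_HALTED;
	window = can_loop(phy);
	phy->irq_unmasked = false;
	phy->irq_asserted = false;
	return window;
}

/* New order: mask and clear first, mark halted last. */
static bool stop_new_order_has_window(struct fake_phy *phy)
{
	bool window;

	phy->irq_unmasked = false;
	phy->irq_asserted = false;
	window = can_loop(phy);
	phy->state = PHY_HALTED;
	return window;
}
```

Both sequences end in the same final state; only the intermediate state differs, which is why swapping two statements is enough.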


[PATCH] [3/11] pasemi_mac: rework ring management

2007-10-02 Thread Olof Johansson
pasemi_mac: rework ring management

Rework ring management, switching to an opaque ring format instead of
the struct-based descriptor+pointer setup, since it will be needed for
SG support.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -63,10 +63,10 @@
 NETIF_MSG_RX_ERR   | \
 NETIF_MSG_TX_ERR)
 
-#define TX_DESC(mac, num)  ((mac)->tx->desc[(num) & (TX_RING_SIZE-1)])
-#define TX_DESC_INFO(mac, num) ((mac)->tx->desc_info[(num) & (TX_RING_SIZE-1)])
-#define RX_DESC(mac, num)  ((mac)->rx->desc[(num) & (RX_RING_SIZE-1)])
-#define RX_DESC_INFO(mac, num) ((mac)->rx->desc_info[(num) & (RX_RING_SIZE-1)])
+#define TX_RING(mac, num)  ((mac)->tx->ring[(num) & (TX_RING_SIZE-1)])
+#define TX_RING_INFO(mac, num) ((mac)->tx->ring_info[(num) & (TX_RING_SIZE-1)])
+#define RX_RING(mac, num)  ((mac)->rx->ring[(num) & (RX_RING_SIZE-1)])
+#define RX_RING_INFO(mac, num) ((mac)->rx->ring_info[(num) & (RX_RING_SIZE-1)])
 #define RX_BUFF(mac, num)  ((mac)->rx->buffers[(num) & (RX_RING_SIZE-1)])
 
 #define RING_USED(ring) (((ring)->next_to_fill - (ring)->next_to_clean) \
@@ -174,22 +174,21 @@ static int pasemi_mac_setup_rx_resources
spin_lock_init(&ring->lock);
 
ring->size = RX_RING_SIZE;
-   ring->desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) *
+   ring->ring_info = kzalloc(sizeof(struct pasemi_mac_buffer) *
  RX_RING_SIZE, GFP_KERNEL);
 
-   if (!ring->desc_info)
-   goto out_desc_info;
+   if (!ring->ring_info)
+   goto out_ring_info;
 
/* Allocate descriptors */
-   ring->desc = dma_alloc_coherent(&mac->dma_pdev->dev,
-   RX_RING_SIZE *
-   sizeof(struct pas_dma_xct_descr),
+   ring->ring = dma_alloc_coherent(&mac->dma_pdev->dev,
+   RX_RING_SIZE * sizeof(u64),
&ring->dma, GFP_KERNEL);
 
-   if (!ring->desc)
-   goto out_desc;
+   if (!ring->ring)
+   goto out_ring_desc;
 
-   memset(ring->desc, 0, RX_RING_SIZE * sizeof(struct pas_dma_xct_descr));
+   memset(ring->ring, 0, RX_RING_SIZE * sizeof(u64));
 
ring->buffers = dma_alloc_coherent(&mac->dma_pdev->dev,
   RX_RING_SIZE * sizeof(u64),
@@ -203,7 +202,7 @@ static int pasemi_mac_setup_rx_resources
 
write_dma_reg(mac, PAS_DMA_RXCHAN_BASEU(chan_id),
   PAS_DMA_RXCHAN_BASEU_BRBH(ring->dma >> 32) |
-  PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2));
+  PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 3));
 
write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id),
   PAS_DMA_RXCHAN_CFG_HBU(2));
@@ -229,11 +228,11 @@ static int pasemi_mac_setup_rx_resources
 
 out_buffers:
dma_free_coherent(&mac->dma_pdev->dev,
- RX_RING_SIZE * sizeof(struct pas_dma_xct_descr),
- mac->rx->desc, mac->rx->dma);
-out_desc:
-   kfree(ring->desc_info);
-out_desc_info:
+ RX_RING_SIZE * sizeof(u64),
+ mac->rx->ring, mac->rx->dma);
+out_ring_desc:
+   kfree(ring->ring_info);
+out_ring_info:
kfree(ring);
 out_ring:
return -ENOMEM;
@@ -254,25 +253,24 @@ static int pasemi_mac_setup_tx_resources
spin_lock_init(&ring->lock);
 
ring->size = TX_RING_SIZE;
-   ring->desc_info = kzalloc(sizeof(struct pasemi_mac_buffer) *
+   ring->ring_info = kzalloc(sizeof(struct pasemi_mac_buffer) *
  TX_RING_SIZE, GFP_KERNEL);
-   if (!ring->desc_info)
-   goto out_desc_info;
+   if (!ring->ring_info)
+   goto out_ring_info;
 
/* Allocate descriptors */
-   ring->desc = dma_alloc_coherent(&mac->dma_pdev->dev,
-   TX_RING_SIZE *
-   sizeof(struct pas_dma_xct_descr),
+   ring->ring = dma_alloc_coherent(&mac->dma_pdev->dev,
+   TX_RING_SIZE * sizeof(u64),
&ring->dma, GFP_KERNEL);
-   if (!ring->desc)
-   goto out_desc;
+   if (!ring->ring)
+   goto out_ring_desc;
 
-   memset(ring->desc, 0, TX_RING_SIZE * sizeof(struct pas_dma_xct_descr));
+   memset(ring->ring, 0, TX_RING_SIZE * sizeof(u64));
 
write_dma_reg(mac, PAS_DMA_TXCHAN_BASEL(chan_id),
   PAS_DMA_TXCHAN_BASEL_BRBL(ring->dma));
val = PAS_DMA_TXCHAN_BASEU_BRBH(ring->dma >> 32);
-   val |= PAS_DMA_TXCHAN_BASEU_SIZ(TX_RING_SIZE >> 2);
+   val |= 

[PATCH] [2/11] pasemi_mac: fix bug in receive buffer dma mapping

2007-10-02 Thread Olof Johansson
pasemi_mac: fix bug in receive buffer dma mapping

skb->len isn't actually set to the size of the allocated skb, so don't
try to use it when figuring out how much to map.

(This hasn't surfaced as a real bug because we effectively disable
translation for the interface, but it still needs fixing for the future)


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -396,7 +396,7 @@ static void pasemi_mac_replenish_rx_ring
if (unlikely(!skb))
break;
 
-   dma = pci_map_single(mac->dma_pdev, skb->data, skb->len,
+   dma = pci_map_single(mac->dma_pdev, skb->data, BUF_SIZE,
 PCI_DMA_FROMDEVICE);
 
if (unlikely(dma_mapping_error(dma))) {


[patch 12/13] Clean up redundant PHY write line for ULi526x Ethernet driver

2007-10-02 Thread akpm
From: Roy Zang <[EMAIL PROTECTED]>

Clean up redundant PHY write line for ULi526x Ethernet Driver.

Signed-off-by: Roy Zang <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Acked-by: Grant Grundler <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/tulip/uli526x.c |1 -
 1 file changed, 1 deletion(-)

diff -puN drivers/net/tulip/uli526x.c~clean-up-redundant-phy-write-line-for-uli526x-ethernet drivers/net/tulip/uli526x.c
--- a/drivers/net/tulip/uli526x.c~clean-up-redundant-phy-write-line-for-uli526x-ethernet
+++ a/drivers/net/tulip/uli526x.c
@@ -1599,7 +1599,6 @@ static void uli526x_process_mode(struct 
case ULI526X_100MFD: phy_reg = 0x2100; break;
}
phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);
-   phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);
}
}
 }
_


Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
> We fixed a lot of bugs in TSO last year.
> 
> It would be really great to see numbers with a more recent kernel
> than 2.6.18

More data, sky2 works fine (really really fine, like 79MB/sec) between
Linux dylan.bitmover.com 2.6.18.1 #5 SMP Mon Oct 23 17:36:00 PDT 2006 i686
Linux steele 2.6.20-16-generic #2 SMP Sun Sep 23 18:31:23 UTC 2007 x86_64

So this is looking like an e1000 bug.  I'll try to upgrade the kernel on 
the ia64 box and see what happens.
-- 
---
Larry McVoy    lm at bitmover.com    http://www.bitkeeper.com


[PATCH] [1/11] pasemi_mac: basic error checking

2007-10-02 Thread Olof Johansson
pasemi_mac: basic error checking

Add some rudimentary error checking to pasemi_mac.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: k.org/drivers/net/pasemi_mac.c
===
--- k.org.orig/drivers/net/pasemi_mac.c
+++ k.org/drivers/net/pasemi_mac.c
@@ -445,6 +445,38 @@ static void pasemi_mac_restart_tx_intr(s
 }
 
 
+static inline void pasemi_mac_rx_error(struct pasemi_mac *mac, u64 macrx)
+{
+   unsigned int rcmdsta, ccmdsta;
+
+   if (!netif_msg_rx_err(mac))
+   return;
+
+   rcmdsta = read_dma_reg(mac, PAS_DMA_RXINT_RCMDSTA(mac->dma_if));
+   ccmdsta = read_dma_reg(mac, PAS_DMA_RXCHAN_CCMDSTA(mac->dma_rxch));
+
+   printk(KERN_ERR "pasemi_mac: rx error. macrx %016lx, rx status %lx\n",
+   macrx, *mac->rx_status);
+
+   printk(KERN_ERR "pasemi_mac: rcmdsta %08x ccmdsta %08x\n",
+   rcmdsta, ccmdsta);
+}
+
+static inline void pasemi_mac_tx_error(struct pasemi_mac *mac, u64 mactx)
+{
+   unsigned int cmdsta;
+
+   if (!netif_msg_tx_err(mac))
+   return;
+
+   cmdsta = read_dma_reg(mac, PAS_DMA_TXCHAN_TCMDSTA(mac->dma_txch));
+
+   printk(KERN_ERR "pasemi_mac: tx error. mactx 0x%016lx, "\
+   "tx status 0x%016lx\n", mactx, *mac->tx_status);
+
+   printk(KERN_ERR "pasemi_mac: tcmdsta 0x%08x\n", cmdsta);
+}
+
 static int pasemi_mac_clean_rx(struct pasemi_mac *mac, int limit)
 {
unsigned int n;
@@ -468,10 +500,13 @@ static int pasemi_mac_clean_rx(struct pa
prefetchw(dp);
macrx = dp->macrx;
 
+   if ((macrx & XCT_MACRX_E) ||
+   (*mac->rx_status & PAS_STATUS_ERROR))
+   pasemi_mac_rx_error(mac, macrx);
+
if (!(macrx & XCT_MACRX_O))
break;
 
-
info = NULL;
 
/* We have to scan for our skb since there's no way
@@ -563,6 +598,10 @@ restart:
for (i = start; i < limit; i++) {
dp = &TX_DESC(mac, i);
 
+   if ((dp->mactx & XCT_MACTX_E) ||
+   (*mac->tx_status & PAS_STATUS_ERROR))
+   pasemi_mac_tx_error(mac, dp->mactx);
+
if (unlikely(dp->mactx & XCT_MACTX_O))
/* Not yet transmitted */
break;
@@ -607,9 +646,6 @@ static irqreturn_t pasemi_mac_rx_intr(in
if (!(*mac->rx_status & PAS_STATUS_CAUSE_M))
return IRQ_NONE;
 
-   if (*mac->rx_status & PAS_STATUS_ERROR)
-   printk("rx_status reported error\n");
-
/* Don't reset packet count so it won't fire again but clear
 * all others.
 */
@@ -1230,7 +1266,7 @@ pasemi_mac_probe(struct pci_dev *pdev, c
dev_err(&mac->pdev->dev, "register_netdev failed with error %d\n",
err);
goto out;
-   } else
+   } else if netif_msg_probe(mac)
printk(KERN_INFO "%s: PA Semi %s: intf %d, txch %d, rxch %d, "
   "hw addr %s\n",
   dev->name, mac->type == MAC_TYPE_GMAC ? "GMAC" : "XAUI",


Re: [patch 06/13] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-10-02 Thread Grant Grundler
On Tue, Oct 02, 2007 at 02:11:38PM -0700, [EMAIL PROTECTED] wrote:
> From: Micah Gruber <[EMAIL PROTECTED]>
> 
> This patch fixes an apparent potential null dereference bug where we
> dereference dev before a null check.  This patch simply removes the
> can't-happen test for a null pointer.
> 
> Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
> Cc: Grant Grundler <[EMAIL PROTECTED]>

Acked-by: Grant Grundler <[EMAIL PROTECTED]>

thanks!
grant

> Acked-by: Jeff Garzik <[EMAIL PROTECTED]>
> Acked-by: Kyle McMartin <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
> 
>  drivers/net/tulip/uli526x.c |5 -
>  1 file changed, 5 deletions(-)
> 
> diff -puN drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt drivers/net/tulip/uli526x.c
> --- a/drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt
> +++ a/drivers/net/tulip/uli526x.c
> @@ -664,11 +664,6 @@ static irqreturn_t uli526x_interrupt(int
>   unsigned long ioaddr = dev->base_addr;
>   unsigned long flags;
>  
> - if (!dev) {
> - ULI526X_DBUG(1, "uli526x_interrupt() without DEVICE arg", 0);
> - return IRQ_NONE;
> - }
> -
>   spin_lock_irqsave(&db->lock, flags);
>   outl(0, ioaddr + DCR7);
>  
> _


[PATCH] [0/11] pasemi_mac: Patches for 2.6.24

2007-10-02 Thread Olof Johansson
Hi,

This series of patches goes on top of the previous fixes that were sent
out and picked up.

It's a series of mostly feature-related changes, but also a couple
of bugfixes:

[1/11] pasemi_mac: basic error checking
[2/11] pasemi_mac: fix bug in receive buffer dma mapping
[3/11] pasemi_mac: rework ring management
[4/11] pasemi_mac: implement sg support
[5/11] pasemi_mac: workaround for erratum 5971
[6/11] pasemi_mac: add local skb alignment
[7/11] pasemi_mac: further performance tweaks
[8/11] pasemi_mac: update todo list
[9/11] pasemi_mac: clear out old errors on interface open
[10/11] pasemi_mac: use buffer index pointer in clean_rx()
[11/11] pasemi_mac: enable iommu support


Thanks,

-Olof


Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy)
Date: Tue, 2 Oct 2007 09:48:58 -0700

> Isn't this something so straightforward that you would have tests for it?
> This is the basic FTP server loop, doesn't someone have a big machine with
> 10gig cards and test that sending/recving data doesn't regress?

Nobody is really doing this, or they aren't talking about it.
Sometimes the crash fixes and other work completely consumes us.  Add
in travel to conferences and real life, and it's no surprise stuff
like this slips through the cracks.

We absolutely depend upon people like you to report when there are
anomalies like this.  It's the only thing that scales.

FWIW I have a t1000 Niagara box and an Ultra45 going through a netgear
gigabit switch.  I'm getting 85MB/sec in one direction and 10MB/sec in
the other (using bw_tcp from lmbench3).  Both are using identical
broadcom tigon3 gigabit chips and identical current kernels so that is
a truly strange result.

I'll investigate, it may be the same thing you're seeing.


Re: [PATCH 2.6.24] tg3: fix ethtool autonegotiate flags

2007-10-02 Thread Andy Gospodarek
On Tue, Oct 02, 2007 at 03:02:56PM -0700, Michael Chan wrote:
> On Tue, 2007-10-02 at 16:16 -0400, Andy Gospodarek wrote:
> > Adding that flag in tg3_set_settings seemed like the most logical
> > place
> > since the driver works fine on boot.  This is just an issue when
> > re-enabling autonegotiation, so we should probably nip it there.
> > 
> > Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
> 
> We also noticed this issue recently, but didn't pay too much attention
> to it since it was more of a "cosmetic" issue.  The driver behaves the
> same since we rely on cmd->autoneg to decide whether to enable autoneg
> or not.  Your fix seems reasonable to me.  Thanks.
> 
> Acked-by: Michael Chan <[EMAIL PROTECTED]>
> 

I completely agree that it's cosmetic, it just seems like something
decent to toss in there since it's the kind of thing others will start
complaining about. 




[patch 10/13] PHYLIB: IRQ event workqueue handling fixes

2007-10-02 Thread akpm
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]>

Keep track of disable_irq_nosync() invocations and call enable_irq() the
right number of times if work has been cancelled that would include them.

Now that the call to flush_work_keventd() (problematic because of
rtnl_mutex being held) has been replaced by cancel_work_sync(), another
issue has arisen and been left unresolved.  As the MDIO bus cannot be
accessed from the interrupt context, the PHY interrupt handler uses
disable_irq_nosync() to prevent looping and schedules some work to be
done as a softirq, which, apart from handling the state change of the
originating PHY, is responsible for reenabling the interrupt.  Now if the
interrupt line is shared by another device and a call to the softirq
handler has been cancelled, that call to enable_irq() never happens and the
other device cannot use its interrupt anymore as it's stuck disabled.

I decided to use a counter rather than a flag because there may be more
than one call to phy_change() cancelled in the queue -- a real one and a
fake one triggered by free_irq() if DEBUG_SHIRQ is used, if nothing else. 
Therefore, because of its nesting property, enable_irq() has to be called
the right number of times to match the number of times disable_irq_nosync()
was called and restore the original state.  This DEBUG_SHIRQ feature is also the
reason why free_irq() has to be called before cancel_work_sync().
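
The counting scheme described above can be sketched in userland C.  This
is only a model under stated assumptions, not the phylib code: struct
fake_irq_line, struct fake_phy and all fake_*() helpers are hypothetical
names, the interrupt line is reduced to a nesting depth, and C11 atomics
stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a shared interrupt line: depth > 0 means disabled. */
struct fake_irq_line {
	int disable_depth;
};

/* Hypothetical model of the per-PHY bookkeeping. */
struct fake_phy {
	struct fake_irq_line *line;
	atomic_int irq_disable;	/* disables not yet balanced by a work item */
};

static void fake_disable_irq(struct fake_irq_line *l)
{
	l->disable_depth++;
}

static void fake_enable_irq(struct fake_irq_line *l)
{
	assert(l->disable_depth > 0);	/* an enable must balance a disable */
	l->disable_depth--;
}

/* Interrupt handler side: disable the line and record that we did so. */
static void fake_phy_interrupt(struct fake_phy *phy)
{
	fake_disable_irq(phy->line);
	atomic_fetch_add(&phy->irq_disable, 1);
}

/* Work item side (runs only if it was not cancelled): balance one disable. */
static void fake_phy_change(struct fake_phy *phy)
{
	atomic_fetch_sub(&phy->irq_disable, 1);
	fake_enable_irq(phy->line);
}

/*
 * Teardown side: after pending work has been cancelled, re-enable the line
 * once per outstanding disable.  atomic_fetch_sub() returns the old value,
 * so "old - 1 >= 0" mirrors a decrement-and-return-new-value test.
 */
static void fake_phy_stop_interrupts(struct fake_phy *phy)
{
	while (atomic_fetch_sub(&phy->irq_disable, 1) - 1 >= 0)
		fake_enable_irq(phy->line);
}
```

If two work items are cancelled, a boolean flag would trigger a single
enable and leave the line one level deep in "disabled"; the counter
drains both.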

While at it I updated the comment about phy_stop_interrupts() being called
from `keventd' -- this is no longer relevant as the use of
cancel_work_sync() makes such an approach unnecessary.  OTOH a similar
comment referring to flush_scheduled_work() in phy_stop() still applies as
using cancel_work_sync() there would be dangerous.

Checked with checkpatch.pl and at run time (with and without
DEBUG_SHIRQ).

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
Cc: Andy Fleming <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/phy.c |   24 +++-
 include/linux/phy.h   |3 +++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff -puN drivers/net/phy/phy.c~phylib-irq-event-workqueue-handling-fixes drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~phylib-irq-event-workqueue-handling-fixes
+++ a/drivers/net/phy/phy.c
@@ -7,7 +7,7 @@
  * Author: Andy Fleming
  *
  * Copyright (c) 2004 Freescale Semiconductor, Inc.
- * Copyright (c) 2006  Maciej W. Rozycki
+ * Copyright (c) 2006, 2007  Maciej W. Rozycki
  *
  * This program is free software; you can redistribute  it and/or modify it
  * under  the terms of  the GNU General  Public License as published by the
@@ -35,6 +35,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -562,6 +563,7 @@ static irqreturn_t phy_interrupt(int irq
 * queue will write the PHY to disable and clear the
 * interrupt, and then reenable the irq line. */
disable_irq_nosync(irq);
+   atomic_inc(&phydev->irq_disable);
 
schedule_work(&phydev->phy_queue);
 
@@ -632,6 +634,7 @@ int phy_start_interrupts(struct phy_devi
 
INIT_WORK(&phydev->phy_queue, phy_change);
 
+   atomic_set(&phydev->irq_disable, 0);
if (request_irq(phydev->irq, phy_interrupt,
IRQF_SHARED,
"phy_interrupt",
@@ -662,13 +665,22 @@ int phy_stop_interrupts(struct phy_devic
if (err)
phy_error(phydev);
 
+   free_irq(phydev->irq, phydev);
+
/*
-* Finish any pending work; we might have been scheduled to be called
-* from keventd ourselves, but cancel_work_sync() handles that.
+* Cannot call flush_scheduled_work() here as desired because
+* of rtnl_lock(), but we do not really care about what would
+* be done, except from enable_irq(), so cancel any work
+* possibly pending and take care of the matter below.
 */
cancel_work_sync(&phydev->phy_queue);
-
-   free_irq(phydev->irq, phydev);
+   /*
+* If work indeed has been cancelled, disable_irq() will have
+* been left unbalanced from phy_interrupt() and enable_irq()
+* has to be called so that other devices on the line work.
+*/
+   while (atomic_dec_return(&phydev->irq_disable) >= 0)
+   enable_irq(phydev->irq);
 
return err;
 }
@@ -695,6 +707,7 @@ static void phy_change(struct work_struc
phydev->state = PHY_CHANGELINK;
spin_unlock_bh(&phydev->lock);
 
+   atomic_dec(&phydev->irq_disable);
enable_irq(phydev->irq);
 
/* Reenable interrupts */
@@ -708,6 +721,7 @@ static void phy_change(struct work_struc
 
 irq_enable_err:
disable_irq(phydev->irq);
+   atomic_inc(&phydev->irq_disable);
 phy_err:
phy_error(phydev);
 }
diff -puN include/linux/phy.h~phylib-irq-event-workqueue-handling-fixes include/linux/phy.h
--- a/include/linux/phy.h~p

[patch 07/13] PHYLIB: Spinlock fixes for softirqs

2007-10-02 Thread akpm
From: "Maciej W. Rozycki" <[EMAIL PROTECTED]>

Use spin_lock_bh()/spin_unlock_bh() for the phydev lock throughout as it
is used in phy_timer() that is called as a softirq and all the other
operations may happen in the user context.

There has been a change recently that did such a conversion for some of the
operations on the lock, but some have been left intact.  Many of them,
perhaps all, may be called in the user context and I was able to trigger
recursive spinlock acquisition indeed, so I think for the sake of long-term
maintenance it is best to convert them all, even if unnecessarily for one
or two -- better safe than sorry.

Perhaps one in phy_timer() could actually be skipped as only called as a
softirq -- I can send an update if that sounds like a good idea.

Checked with checkpatch.pl and at run time.

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/phy.c|   24 
 drivers/net/phy/phy_device.c |4 ++--
 2 files changed, 14 insertions(+), 14 deletions(-)

diff -puN drivers/net/phy/phy.c~phylib-spinlock-fixes-for-softirqs drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~phylib-spinlock-fixes-for-softirqs
+++ a/drivers/net/phy/phy.c
@@ -424,7 +424,7 @@ int phy_start_aneg(struct phy_device *ph
 {
int err;
 
-   spin_lock(&phydev->lock);
+   spin_lock_bh(&phydev->lock);
 
if (AUTONEG_DISABLE == phydev->autoneg)
phy_sanitize_settings(phydev);
@@ -445,7 +445,7 @@ int phy_start_aneg(struct phy_device *ph
}
 
 out_unlock:
-   spin_unlock(&phydev->lock);
+   spin_unlock_bh(&phydev->lock);
return err;
 }
 EXPORT_SYMBOL(phy_start_aneg);
@@ -490,10 +490,10 @@ void phy_stop_machine(struct phy_device 
 {
del_timer_sync(&phydev->phy_timer);
 
-   spin_lock(&phydev->lock);
+   spin_lock_bh(&phydev->lock);
if (phydev->state > PHY_UP)
phydev->state = PHY_UP;
-   spin_unlock(&phydev->lock);
+   spin_unlock_bh(&phydev->lock);
 
phydev->adjust_state = NULL;
 }
@@ -537,9 +537,9 @@ static void phy_force_reduction(struct p
  */
 void phy_error(struct phy_device *phydev)
 {
-   spin_lock(&phydev->lock);
+   spin_lock_bh(&phydev->lock);
phydev->state = PHY_HALTED;
-   spin_unlock(&phydev->lock);
+   spin_unlock_bh(&phydev->lock);
 }
 
 /**
@@ -690,10 +690,10 @@ static void phy_change(struct work_struc
if (err)
goto phy_err;
 
-   spin_lock(&phydev->lock);
+   spin_lock_bh(&phydev->lock);
if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state))
phydev->state = PHY_CHANGELINK;
-   spin_unlock(&phydev->lock);
+   spin_unlock_bh(&phydev->lock);
 
enable_irq(phydev->irq);
 
@@ -718,7 +718,7 @@ phy_err:
  */
 void phy_stop(struct phy_device *phydev)
 {
-   spin_lock(&phydev->lock);
+   spin_lock_bh(&phydev->lock);
 
if (PHY_HALTED == phydev->state)
goto out_unlock;
@@ -734,7 +734,7 @@ void phy_stop(struct phy_device *phydev)
}
 
 out_unlock:
-   spin_unlock(&phydev->lock);
+   spin_unlock_bh(&phydev->lock);
 
/*
 * Cannot call flush_scheduled_work() here as desired because
@@ -782,7 +782,7 @@ static void phy_timer(unsigned long data
int needs_aneg = 0;
int err = 0;
 
-   spin_lock(&phydev->lock);
+   spin_lock_bh(&phydev->lock);
 
if (phydev->adjust_state)
phydev->adjust_state(phydev->attached_dev);
@@ -948,7 +948,7 @@ static void phy_timer(unsigned long data
break;
}
 
-   spin_unlock(&phydev->lock);
+   spin_unlock_bh(&phydev->lock);
 
if (needs_aneg)
err = phy_start_aneg(phydev);
diff -puN drivers/net/phy/phy_device.c~phylib-spinlock-fixes-for-softirqs drivers/net/phy/phy_device.c
--- a/drivers/net/phy/phy_device.c~phylib-spinlock-fixes-for-softirqs
+++ a/drivers/net/phy/phy_device.c
@@ -670,9 +670,9 @@ static int phy_remove(struct device *dev
 
phydev = to_phy_device(dev);
 
-   spin_lock(&phydev->lock);
+   spin_lock_bh(&phydev->lock);
phydev->state = PHY_DOWN;
-   spin_unlock(&phydev->lock);
+   spin_unlock_bh(&phydev->lock);
 
if (phydev->drv->remove)
phydev->drv->remove(phydev);
_


[patch 06/13] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-10-02 Thread akpm
From: Micah Gruber <[EMAIL PROTECTED]>

This patch fixes an apparent potential null dereference bug where we
dereference dev before a null check.  This patch simply removes the
can't-happen test for a null pointer.

Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
Cc: Grant Grundler <[EMAIL PROTECTED]>
Acked-by: Jeff Garzik <[EMAIL PROTECTED]>
Acked-by: Kyle McMartin <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/tulip/uli526x.c |5 -
 1 file changed, 5 deletions(-)

diff -puN drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt drivers/net/tulip/uli526x.c
--- a/drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt
+++ a/drivers/net/tulip/uli526x.c
@@ -664,11 +664,6 @@ static irqreturn_t uli526x_interrupt(int
unsigned long ioaddr = dev->base_addr;
unsigned long flags;
 
-   if (!dev) {
-   ULI526X_DBUG(1, "uli526x_interrupt() without DEVICE arg", 0);
-   return IRQ_NONE;
-   }
-
spin_lock_irqsave(&db->lock, flags);
outl(0, ioaddr + DCR7);
 
_


[patch 03/13] drivers/net/cxgb3/xgmac.c: remove dead code

2007-10-02 Thread akpm
From: Adrian Bunk <[EMAIL PROTECTED]>

This patch removes dead code ("tx_xcnt" can never be != 0 at this place)
spotted by the Coverity checker.
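
As a sanity check of the simplification, here is a small standalone
sketch.  It takes the patch's premise (tx_xcnt is always 0 at this point)
as an assumption and compares the old and new conditions over a small
range of values; old_cond(), new_cond() and the checker are hypothetical
helpers written for this note, with parameter names borrowed from the
diff.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as it stood before the patch. */
static bool old_cond(unsigned tx_tcnt, unsigned mac_tx_tcnt,
		     unsigned tx_mcnt, unsigned mac_tx_mcnt,
		     unsigned tx_xcnt, unsigned mac_tx_xcnt)
{
	return ((tx_tcnt != mac_tx_tcnt) &&
		(tx_xcnt == 0) && (mac_tx_xcnt == 0)) ||
	       ((mac_tx_mcnt == tx_mcnt) &&
		(tx_xcnt != 0) && (mac_tx_xcnt != 0));
}

/* The condition after the patch. */
static bool new_cond(unsigned tx_tcnt, unsigned mac_tx_tcnt,
		     unsigned mac_tx_xcnt)
{
	return (tx_tcnt != mac_tx_tcnt) && (mac_tx_xcnt == 0);
}

/*
 * Compare both conditions for every combination of small values while
 * holding tx_xcnt at 0.  Returns the number of combinations checked.
 */
static int check_equivalent_when_xcnt_zero(void)
{
	int checked = 0;
	unsigned t, mt, m, mm, mx;

	for (t = 0; t < 3; t++)
	for (mt = 0; mt < 3; mt++)
	for (m = 0; m < 3; m++)
	for (mm = 0; mm < 3; mm++)
	for (mx = 0; mx < 3; mx++) {
		assert(old_cond(t, mt, m, mm, 0, mx) == new_cond(t, mt, mx));
		checked++;
	}
	return checked;
}
```

The two conditions only diverge when tx_xcnt is nonzero, which is exactly
the case the patch description says cannot occur here.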

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/xgmac.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff -puN drivers/net/cxgb3/xgmac.c~drivers-net-cxgb3-xgmacc-remove-dead-code drivers/net/cxgb3/xgmac.c
--- a/drivers/net/cxgb3/xgmac.c~drivers-net-cxgb3-xgmacc-remove-dead-code
+++ a/drivers/net/cxgb3/xgmac.c
@@ -522,10 +522,7 @@ int t3b2_mac_watchdog_task(struct cmac *
goto rxcheck;
}
 
-   if (((tx_tcnt != mac->tx_tcnt) &&
-(tx_xcnt == 0) && (mac->tx_xcnt == 0)) ||
-   ((mac->tx_mcnt == tx_mcnt) &&
-(tx_xcnt != 0) && (mac->tx_xcnt != 0))) {
+   if ((tx_tcnt != mac->tx_tcnt) && (mac->tx_xcnt == 0))  {
if (mac->toggle_cnt > 4) {
status = 2;
goto out;
_


[patch 02/13] PCI-X/PCI-Express read control interfaces: use them in e1000

2007-10-02 Thread akpm
From: "Peter Oruba" <[EMAIL PROTECTED]>

These driver changes incorporate the proposed PCI-X / PCI-Express read byte
count interface.  Reading and setting those values doesn't take place
"manually", instead wrapping functions are called to allow quirks for some
PCI bridges.
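
A rough userland sketch of the "wrapper instead of manual register
twiddling" idea follows.  Everything named fake_* is hypothetical, and the
register layout is an assumption on my part: the maximum memory read byte
count is taken to live in bits 3:2 of the PCI-X command word, encoded as
512 << n.  A real wrapper would also be the place to apply bridge quirks,
which this model omits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: MMRBC in bits 3:2 of the PCI-X command word. */
#define FAKE_PCIX_CMD_MMRBC_MASK	0x000c
#define FAKE_PCIX_CMD_MMRBC_SHIFT	2

/* Hypothetical device model holding only the PCI-X command word. */
struct fake_pdev {
	uint16_t pcix_cmd;
};

/* Decode the maximum memory read byte count: 512, 1024, 2048 or 4096. */
static int fake_pcix_get_mmrbc(const struct fake_pdev *pdev)
{
	return 512 << ((pdev->pcix_cmd & FAKE_PCIX_CMD_MMRBC_MASK) >>
		       FAKE_PCIX_CMD_MMRBC_SHIFT);
}

/* Encode a new byte count; returns 0 on success, -1 on an invalid value. */
static int fake_pcix_set_mmrbc(struct fake_pdev *pdev, int mmrbc)
{
	unsigned int v;

	switch (mmrbc) {
	case 512:  v = 0; break;
	case 1024: v = 1; break;
	case 2048: v = 2; break;
	case 4096: v = 3; break;
	default:   return -1;
	}
	pdev->pcix_cmd = (uint16_t)((pdev->pcix_cmd & ~FAKE_PCIX_CMD_MMRBC_MASK) |
				    (v << FAKE_PCIX_CMD_MMRBC_SHIFT));
	return 0;
}

/* The driver-side workaround: never leave MMRBC above 2048. */
static void fake_clamp_mmrbc(struct fake_pdev *pdev)
{
	if (fake_pcix_get_mmrbc(pdev) > 2048)
		fake_pcix_set_mmrbc(pdev, 2048);
	assert(fake_pcix_get_mmrbc(pdev) <= 2048);
}
```

The caller only deals in byte counts; the shift-and-mask details stay in
one place.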

[EMAIL PROTECTED]: e1000: #if 0 two functions]
Signed-off-by: Peter Oruba <[EMAIL PROTECTED]>
Based on work by Stephen Hemminger <[EMAIL PROTECTED]>
Acked-by: Auke Kok <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/e1000/e1000_hw.c   |   25 +++--
 drivers/net/e1000/e1000_hw.h   |4 ++--
 drivers/net/e1000/e1000_main.c |   18 ++
 3 files changed, 23 insertions(+), 24 deletions(-)

diff -puN drivers/net/e1000/e1000_hw.c~pci-x-pci-express-read-control-interfaces-e1000 drivers/net/e1000/e1000_hw.c
--- a/drivers/net/e1000/e1000_hw.c~pci-x-pci-express-read-control-interfaces-e1000
+++ a/drivers/net/e1000/e1000_hw.c
@@ -871,10 +871,6 @@ e1000_init_hw(struct e1000_hw *hw)
 uint32_t ctrl;
 uint32_t i;
 int32_t ret_val;
-uint16_t pcix_cmd_word;
-uint16_t pcix_stat_hi_word;
-uint16_t cmd_mmrbc;
-uint16_t stat_mmrbc;
 uint32_t mta_size;
 uint32_t reg_data;
 uint32_t ctrl_ext;
@@ -964,24 +960,9 @@ e1000_init_hw(struct e1000_hw *hw)
 break;
 default:
 /* Workaround for PCI-X problem when BIOS sets MMRBC incorrectly. */
-if (hw->bus_type == e1000_bus_type_pcix) {
-e1000_read_pci_cfg(hw, PCIX_COMMAND_REGISTER, &pcix_cmd_word);
-e1000_read_pci_cfg(hw, PCIX_STATUS_REGISTER_HI,
-&pcix_stat_hi_word);
-cmd_mmrbc = (pcix_cmd_word & PCIX_COMMAND_MMRBC_MASK) >>
-PCIX_COMMAND_MMRBC_SHIFT;
-stat_mmrbc = (pcix_stat_hi_word & PCIX_STATUS_HI_MMRBC_MASK) >>
-PCIX_STATUS_HI_MMRBC_SHIFT;
-if (stat_mmrbc == PCIX_STATUS_HI_MMRBC_4K)
-stat_mmrbc = PCIX_STATUS_HI_MMRBC_2K;
-if (cmd_mmrbc > stat_mmrbc) {
-pcix_cmd_word &= ~PCIX_COMMAND_MMRBC_MASK;
-pcix_cmd_word |= stat_mmrbc << PCIX_COMMAND_MMRBC_SHIFT;
-e1000_write_pci_cfg(hw, PCIX_COMMAND_REGISTER,
-&pcix_cmd_word);
-}
-}
-break;
+   if (hw->bus_type == e1000_bus_type_pcix && e1000_pcix_get_mmrbc(hw) > 2048)
+   e1000_pcix_set_mmrbc(hw, 2048);
+   break;
 }
 
 /* More time needed for PHY to initialize */
diff -puN drivers/net/e1000/e1000_hw.h~pci-x-pci-express-read-control-interfaces-e1000 drivers/net/e1000/e1000_hw.h
--- a/drivers/net/e1000/e1000_hw.h~pci-x-pci-express-read-control-interfaces-e1000
+++ a/drivers/net/e1000/e1000_hw.h
@@ -421,9 +421,9 @@ void e1000_tbi_adjust_stats(struct e1000
 void e1000_get_bus_info(struct e1000_hw *hw);
 void e1000_pci_set_mwi(struct e1000_hw *hw);
 void e1000_pci_clear_mwi(struct e1000_hw *hw);
-void e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
-void e1000_write_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
 int32_t e1000_read_pcie_cap_reg(struct e1000_hw *hw, uint32_t reg, uint16_t *value);
+void e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc);
+int e1000_pcix_get_mmrbc(struct e1000_hw *hw);
 /* Port I/O is only supported on 82544 and newer */
 void e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value);
 int32_t e1000_disable_pciex_master(struct e1000_hw *hw);
diff -puN drivers/net/e1000/e1000_main.c~pci-x-pci-express-read-control-interfaces-e1000 drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c~pci-x-pci-express-read-control-interfaces-e1000
+++ a/drivers/net/e1000/e1000_main.c
@@ -4887,6 +4887,8 @@ e1000_pci_clear_mwi(struct e1000_hw *hw)
pci_clear_mwi(adapter->pdev);
 }
 
+#if 0
+
 void
 e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t *value)
 {
@@ -4903,6 +4905,22 @@ e1000_write_pci_cfg(struct e1000_hw *hw,
pci_write_config_word(adapter->pdev, reg, *value);
 }
 
+#endif  /*  0  */
+
+int
+e1000_pcix_get_mmrbc(struct e1000_hw *hw)
+{
+   struct e1000_adapter *adapter = hw->back;
+   return pcix_get_mmrbc(adapter->pdev);
+}
+
+void
+e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc)
+{
+   struct e1000_adapter *adapter = hw->back;
+   pcix_set_mmrbc(adapter->pdev, mmrbc);
+}
+
 int32_t
 e1000_read_pcie_cap_reg(struct e1000_hw *hw, uint32_t reg, uint16_t *value)
 {
_


[patch 08/13] forcedeth: power down phy when interface is down

2007-10-02 Thread akpm
From: "Ed Swierk" <[EMAIL PROTECTED]>

Bring the physical link down when the interface is down, by placing the PHY in
power-down state.  This mirrors the behavior of other drivers including e1000
and tg3.
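
For readers unfamiliar with the mechanism, a minimal sketch of the
read-modify-write on the PHY's basic mode control register is shown
below.  It is a userland model, not forcedeth code: struct fake_phy_regs
and the fake_*() helpers are hypothetical, and FAKE_BMCR_PDOWN is assumed
to match the usual MII power-down bit (0x0800).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the standard MII BMCR power-down bit. */
#define FAKE_BMCR_PDOWN 0x0800

/* Hypothetical model of a PHY exposing only its control register. */
struct fake_phy_regs {
	uint16_t bmcr;
};

/* Interface going down: set the power-down bit, keep all other bits. */
static void fake_phy_power_down(struct fake_phy_regs *phy)
{
	phy->bmcr |= FAKE_BMCR_PDOWN;
	assert(phy->bmcr & FAKE_BMCR_PDOWN);
}

/* Interface coming up: clear the power-down bit, keep all other bits. */
static void fake_phy_power_up(struct fake_phy_regs *phy)
{
	phy->bmcr &= (uint16_t)~FAKE_BMCR_PDOWN;
	assert(!(phy->bmcr & FAKE_BMCR_PDOWN));
}
```

Only the power-down bit changes in either direction, so settings such as
autonegotiation enable survive a down/up cycle in this model.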

Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>

On Sat, 29 Sep 2007 01:57:04 -0400 Jeff Garzik <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
> > From: "Ed Swierk" <[EMAIL PROTECTED]>
> > 
> > Bring the physical link down when the interface is down, by placing the
> > PHY in power-down state.  This mirrors the behavior of other drivers
> > including e1000 and tg3.
> > 
> > Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
> > Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
> > Cc: Jeff Garzik <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> 
> 
> HOLD -- waiting a bit for comment from others, particularly NVIDIA.
> 
> I'm not opposed to applying it, the patch looks correct, but I would 
> also like to see testing results and general "it's ok for this hardware" 
> comments.
> 

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/forcedeth.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff -puN drivers/net/forcedeth.c~forcedeth-power-down-phy-when-interface-is-down drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c~forcedeth-power-down-phy-when-interface-is-down
+++ a/drivers/net/forcedeth.c
@@ -1313,9 +1313,9 @@ static int phy_init(struct net_device *d
/* some phys clear out pause advertisment on reset, set it back */
mii_rw(dev, np->phyaddr, MII_ADVERTISE, reg);
 
-   /* restart auto negotiation */
+   /* restart auto negotiation, power down phy */
mii_control = mii_rw(dev, np->phyaddr, MII_BMCR, MII_READ);
-   mii_control |= (BMCR_ANRESTART | BMCR_ANENABLE);
+   mii_control |= (BMCR_ANRESTART | BMCR_ANENABLE | BMCR_PDOWN);
if (mii_rw(dev, np->phyaddr, MII_BMCR, mii_control)) {
return PHY_ERROR;
}
@@ -4791,6 +4791,10 @@ static int nv_open(struct net_device *de
 
dprintk(KERN_DEBUG "nv_open: begin\n");
 
+   /* power up phy */
+   mii_rw(dev, np->phyaddr, MII_BMCR,
+  mii_rw(dev, np->phyaddr, MII_BMCR, MII_READ) & ~BMCR_PDOWN);
+
/* erase previous misconfiguration */
if (np->driver_data & DEV_HAS_POWER_CNTRL)
nv_mac_reset(dev);
@@ -4975,6 +4979,10 @@ static int nv_close(struct net_device *d
nv_start_rx(dev);
}
 
+   /* power down phy */
+   mii_rw(dev, np->phyaddr, MII_BMCR,
+  mii_rw(dev, np->phyaddr, MII_BMCR, MII_READ) | BMCR_PDOWN);
+
/* FIXME: power down nic */
 
return 0;
_


[patch 09/13] forcedeth: "no link" is informational

2007-10-02 Thread akpm
From: "Ed Swierk" <[EMAIL PROTECTED]>

Log "no link during initialization" at KERN_INFO as it's not an error, and
occurs every time the interface comes up (when the forcedeth-phy-power-down
patch is applied).

Signed-off-by: Ed Swierk <[EMAIL PROTECTED]>
Cc: Ayaz Abdulla <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/forcedeth.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/forcedeth.c~forcedeth-no-link-is-informational drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c~forcedeth-no-link-is-informational
+++ a/drivers/net/forcedeth.c
@@ -4921,7 +4921,7 @@ static int nv_open(struct net_device *de
if (ret) {
netif_carrier_on(dev);
} else {
-   printk("%s: no link during initialization.\n", dev->name);
+   printk(KERN_INFO "%s: no link during initialization.\n", dev->name);
netif_carrier_off(dev);
}
if (oom)
_


[patch 05/13] skge: remove broken and unused PHY_M_PC_MDI_XMODE macro

2007-10-02 Thread akpm
From: Mariusz Kozlowski <[EMAIL PROTECTED]>

Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
Cc: Stephen Hemminger <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/skge.h |2 --
 1 file changed, 2 deletions(-)

diff -puN drivers/net/skge.h~skge-remove-broken-and-unused-phy_m_pc_mdi_xmode-macro drivers/net/skge.h
--- a/drivers/net/skge.h~skge-remove-broken-and-unused-phy_m_pc_mdi_xmode-macro
+++ a/drivers/net/skge.h
@@ -1351,8 +1351,6 @@ enum {
PHY_M_PC_EN_DET_PLUS= 3<<8, /* Energy Detect Plus (Mode 2) */
 };
 
-#define PHY_M_PC_MDI_XMODE(x)  u16)(x)<<5) & PHY_M_PC_MDIX_MSK)
-
 enum {
PHY_M_PC_MAN_MDI= 0, /* 00 = Manual MDI configuration */
PHY_M_PC_MAN_MDIX   = 1, /* 01 = Manual MDIX configuration */
_


[patch 01/13] PHY fixed driver: rework release path and update phy_id notation

2007-10-02 Thread akpm
From: Vitaly Bordug <[EMAIL PROTECTED]>

The error code returned by device_bind_driver() is now handled correctly.
A release() function has been added so that resources are freed in the
correct way; the release path is now clean.

Before the rework, it used to cause
 Device '[EMAIL PROTECTED]:1' does not have a release() function, it is broken
 and must be fixed.
 BUG: at drivers/base/core.c:104 device_release()

 Call Trace:
  [] kobject_cleanup+0x53/0x7e
  [] kobject_release+0x0/0x9
  [] kref_put+0x74/0x81
  [] fixed_mdio_register_device+0x230/0x265
  [] fixed_init+0x1f/0x35
  [] init+0x147/0x2fb
  [] schedule_tail+0x36/0x92
  [] child_rip+0xa/0x12
  [] acpi_ds_init_one_object+0x0/0x83
  [] init+0x0/0x2fb
  [] child_rip+0x0/0x12


Also changed the notation of the fixed phy definition on
mdio bus to the form of + to make it able to be used by
gianfar and ucc_geth that define phy_id strictly as "%d:%d" and cleaned up
the whitespace issues.

Signed-off-by: Vitaly Bordug <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/phy/Kconfig   |   14 +
 drivers/net/phy/fixed.c   |  310 ++--
 include/linux/phy_fixed.h |   38 
 3 files changed, 207 insertions(+), 155 deletions(-)

diff -puN drivers/net/phy/Kconfig~phy-fixed-driver-rework-release-path-and-update drivers/net/phy/Kconfig
--- a/drivers/net/phy/Kconfig~phy-fixed-driver-rework-release-path-and-update
+++ a/drivers/net/phy/Kconfig
@@ -76,4 +76,18 @@ config FIXED_MII_100_FDX
bool "Emulation for 100M Fdx fixed PHY behavior"
depends on FIXED_PHY
 
+config FIXED_MII_1000_FDX
+   bool "Emulation for 1000M Fdx fixed PHY behavior"
+   depends on FIXED_PHY
+
+config FIXED_MII_AMNT
+int "Number of emulated PHYs to allocate "
+depends on FIXED_PHY
+default "1"
+---help---
+Sometimes it is required to have several independent emulated
+PHYs on the bus (in case of multi-eth but phy-less HW for instance).
+This control will have specified number allocated for each fixed
+PHY type enabled.
+
 endif # PHYLIB
diff -puN 
drivers/net/phy/fixed.c~phy-fixed-driver-rework-release-path-and-update 
drivers/net/phy/fixed.c
--- a/drivers/net/phy/fixed.c~phy-fixed-driver-rework-release-path-and-update
+++ a/drivers/net/phy/fixed.c
@@ -30,53 +30,31 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 
-#define MII_REGS_NUM   7
-
-/*
-The idea is to emulate normal phy behavior by responding with
-pre-defined values to mii BMCR read, so that read_status hook could
-take all the needed info.
-*/
-
-struct fixed_phy_status {
-   u8  link;
-   u16 speed;
-   u8  duplex;
-};
-
-/*-
- *  Private information hoder for mii_bus
- 
*-*/
-struct fixed_info {
-   u16 *regs;
-   u8 regs_num;
-   struct fixed_phy_status phy_status;
-   struct phy_device *phydev; /* pointer to the container */
-   /* link & speed cb */
-   int(*link_update)(struct net_device*, struct fixed_phy_status*);
-
-};
+/* we need to track the allocated pointers in order to free them on exit */
+static struct fixed_info *fixed_phy_ptrs[CONFIG_FIXED_MII_AMNT*MAX_PHY_AMNT];
 
 /*-
  *  If something weird is required to be done with link/speed,
  * network driver is able to assign a function to implement this.
  * May be useful for PHY's that need to be software-driven.
  
*-*/
-int fixed_mdio_set_link_update(struct phy_device* phydev,
-   int(*link_update)(struct net_device*, struct fixed_phy_status*))
+int fixed_mdio_set_link_update(struct phy_device *phydev,
+  int (*link_update) (struct net_device *,
+  struct fixed_phy_status *))
 {
struct fixed_info *fixed;
 
-   if(link_update == NULL)
+   if (link_update == NULL)
return -EINVAL;
 
-   if(phydev) {
-   if(phydev->bus) {
+   if (phydev) {
+   if (phydev->bus) {
fixed = phydev->bus->priv;
fixed->link_update = link_update;
return 0;
@@ -84,54 +62,64 @@ int fixed_mdio_set_link_update(struct ph
}
return -EINVAL;
 }
+
 EXPORT_SYMBOL(fixed_mdio_set_link_update);
 
+struct fixed_info *fixed_mdio_get_phydev (int phydev_ind)
+{
+   if (phydev_ind >= MAX_PHY_AMNT)
+   return NULL;
+   return fixed_phy_ptrs[phydev_ind];
+}
+
+EXPORT_SYMBOL(fixed_mdio_get_phydev);
+
 /*-
  *  This is used for updating internal mii regs from the sta

[patch 04/13] Avoid possible NULL pointer deref in 3c359 driver

2007-10-02 Thread akpm
From: Jesper Juhl <[EMAIL PROTECTED]>

In xl_freemem(), if dev_if is NULL, the line

  struct xl_private *xl_priv =(struct xl_private *)dev->priv;

will cause a NULL pointer dereference.

(akpm: don't try to fix it: just delete the pointless test-for-null)

Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/tokenring/3c359.c |5 -
 1 file changed, 5 deletions(-)

diff -puN 
drivers/net/tokenring/3c359.c~avoid-possible-null-pointer-deref-in-3c359-driver 
drivers/net/tokenring/3c359.c
--- 
a/drivers/net/tokenring/3c359.c~avoid-possible-null-pointer-deref-in-3c359-driver
+++ a/drivers/net/tokenring/3c359.c
@@ -1045,11 +1045,6 @@ static irqreturn_t xl_interrupt(int irq,
u8 __iomem * xl_mmio = xl_priv->xl_mmio ; 
u16 intstatus, macstatus  ;
 
-   if (!dev) { 
-   printk(KERN_WARNING "Device structure dead, aaa !\n") ;
-   return IRQ_NONE; 
-   }
-
intstatus = readw(xl_mmio + MMIO_INTSTATUS) ;  
 
if (!(intstatus & 1)) /* We didn't generate the interrupt */
_


[patch 3/3] git-net: sctp build fix (not for applying)

2007-10-02 Thread akpm
From: Andrew Morton <[EMAIL PROTECTED]>

net/sctp/sm_statetable.c:551: error: 'sctp_sf_tabort_8_4_8' undeclared here 
(not in a function)


Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 net/sctp/sm_statetable.c |2 --
 1 file changed, 2 deletions(-)

diff -puN net/sctp/sm_statetable.c~git-net-sctp-hack net/sctp/sm_statetable.c
--- a/net/sctp/sm_statetable.c~git-net-sctp-hack
+++ a/net/sctp/sm_statetable.c
@@ -527,8 +527,6 @@ static const sctp_sm_table_entry_t prsct
/* SCTP_STATE_EMPTY */ \
TYPE_SCTP_FUNC(sctp_sf_ootb), \
/* SCTP_STATE_CLOSED */ \
-   TYPE_SCTP_FUNC(sctp_sf_tabort_8_4_8), \
-   /* SCTP_STATE_COOKIE_WAIT */ \
TYPE_SCTP_FUNC(sctp_sf_discard_chunk), \
/* SCTP_STATE_COOKIE_ECHOED */ \
TYPE_SCTP_FUNC(sctp_sf_eat_auth), \
_


[patch 2/3] ipg.c doesn't compile with with CONFIG_HIGHMEM64G

2007-10-02 Thread akpm
From: trem <[EMAIL PROTECTED]>

I've tried to compile 2.6.23-rc8-mm2, but it fails on ipg.c with the
error : ERROR: "__udivdi3" [drivers/net/ipg.ko] undefined!

I've investigated a bit, and I've found this code in ipg.c :

static void ipg_nic_txfree(struct net_device *dev)
{
   struct ipg_nic_private *sp = netdev_priv(dev);
   void __iomem *ioaddr = sp->ioaddr;
   const unsigned int curr = ipg_r32(TFD_LIST_PTR_0) -
   (sp->txd_map / sizeof(struct ipg_tx)) - 1;
   unsigned int released, pending;

sp->txd_map is a u64
because :
dma_addr_t txd_map;

And in asm-i386/types.h, I see :
#ifdef CONFIG_HIGHMEM64G
typedef u64 dma_addr_t;
#else
typedef u32 dma_addr_t;
#endif
In my config, I use CONFIG_HIGHMEM64G

sizeof(struct ipg_tx) is a u32
So the division fails on i386: a u64 / u32 division makes gcc emit a call
to __udivdi3, which the kernel does not provide.

[EMAIL PROTECTED]: cleanups]
Cc: Sorbica Shieh <[EMAIL PROTECTED]>
Cc: Jesse Huang <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Cc: "David S. Miller" <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/ipg.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff -puN drivers/net/ipg.c~ipgc-doesnt-compile-with-with-config_highmem64g 
drivers/net/ipg.c
--- a/drivers/net/ipg.c~ipgc-doesnt-compile-with-with-config_highmem64g
+++ a/drivers/net/ipg.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#include 
+
 #define IPG_RX_RING_BYTES  (sizeof(struct ipg_rx) * IPG_RFDLIST_LENGTH)
 #define IPG_TX_RING_BYTES  (sizeof(struct ipg_tx) * IPG_TFDLIST_LENGTH)
 #define IPG_RESET_MASK \
@@ -836,10 +838,14 @@ static void ipg_nic_txfree(struct net_de
 {
struct ipg_nic_private *sp = netdev_priv(dev);
void __iomem *ioaddr = sp->ioaddr;
-   const unsigned int curr = ipg_r32(TFD_LIST_PTR_0) -
-   (sp->txd_map / sizeof(struct ipg_tx)) - 1;
+   unsigned int curr;
+   u64 txd_map;
unsigned int released, pending;
 
+   txd_map = (u64)sp->txd_map;
+   curr = ipg_r32(TFD_LIST_PTR_0) -
+   do_div(txd_map, sizeof(struct ipg_tx)) - 1;
+
IPG_DEBUG_MSG("_nic_txfree\n");
 
pending = sp->tx_current - sp->tx_dirty;
_
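
For readers unfamiliar with do_div(): as far as I know, the kernel macro
divides its first argument in place (the variable ends up holding the
quotient) and evaluates to the remainder. If that holds for the tree this
patch targets, an expression that uses the macro's value gets the remainder,
and the quotient has to be read back from the variable. The following is
only a user-space sketch of those semantics; model_do_div() and ring_index()
are hypothetical names, not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of do_div(n, base):
 * *n is replaced by the quotient and the remainder is returned.
 */
static uint32_t model_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Index of an entry inside a ring: byte offset divided by entry size. */
static uint32_t ring_index(uint64_t byte_offset, uint32_t entry_size)
{
	uint64_t q = byte_offset;

	(void)model_do_div(&q, entry_size);	/* remainder not needed here */
	return (uint32_t)q;			/* quotient is left in q */
}
```

With this model, dividing 100 by 16 leaves 6 in the variable and returns 4.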


Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address

2007-10-02 Thread Brian Haley

Hi David,

David Stevens wrote:
> ipv6_addr_type() returns a mask, so checking for equality will fail to
> match if any other (irrelevant) attributes are set. How about using
> bitwise operators for that?


ipv6_addr_type() does return a mask, but there's a lot of code that just 
checks for equality since some things are mutually-exclusive - this code 
is actually identical to what ip6_route_add() does.  I don't 
particularly like this duality, but it's there - I'd gladly volunteer to 
clean this up everywhere if I didn't think there might be some 
performance reason it was done like that.
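
To make the difference between the two styles concrete, here is a small
stand-alone sketch. The flag values and helper names are illustrative
assumptions, not the kernel's definitions; the point is only how an
equality test and a bitwise test behave when an extra attribute bit is set.

```c
#include <stdbool.h>

/* Illustrative flag values only; the real ones live in the kernel headers. */
#define EX_ADDR_UNICAST		0x0001U
#define EX_ADDR_MULTICAST	0x0002U
#define EX_ADDR_LINKLOCAL	0x0020U
#define EX_ADDR_OTHER_ATTR	0x0100U	/* hypothetical unrelated attribute */

/* Equality test: true only when no other attribute bit is set. */
static bool is_ll_unicast_eq(unsigned int type)
{
	return type == (EX_ADDR_LINKLOCAL | EX_ADDR_UNICAST);
}

/* Bitwise test: true whenever both bits are set, whatever else is set. */
static bool is_ll_unicast_mask(unsigned int type)
{
	const unsigned int want = EX_ADDR_LINKLOCAL | EX_ADDR_UNICAST;

	return (type & want) == want;
}
```

Both helpers agree as long as the attributes really are mutually exclusive;
they differ only when some additional bit shows up in the mask.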



> Also, the error message is no longer descriptive of the failure if it's
> a link-local multicast, but you could make it "target address is not
> link-local unicast.\n" (in both places).


I can do that, thanks.

-Brian


[patch 1/3] git-net: make it compile (not for applying?)

2007-10-02 Thread akpm
From: Andrew Morton <[EMAIL PROTECTED]>

drivers/net/hamradio/baycom_epp.c: In function 'baycom_probe':
drivers/net/hamradio/baycom_epp.c:1162: error: 'struct net_device' has no 
member named 'hard_header'
drivers/net/hamradio/baycom_epp.c:1163: error: 'struct net_device' has no 
member named 'rebuild_header'

Cc: "David S. Miller" <[EMAIL PROTECTED]>
Cc: Stephen Hemminger <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/net/hamradio/baycom_epp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN drivers/net/hamradio/baycom_epp.c~git-net-more-bustage 
drivers/net/hamradio/baycom_epp.c
--- a/drivers/net/hamradio/baycom_epp.c~git-net-more-bustage
+++ a/drivers/net/hamradio/baycom_epp.c
@@ -1159,8 +1159,8 @@ static void baycom_probe(struct net_devi
/* Fill in the fields of the device structure */
bc->skb = NULL;

-   dev->hard_header = ax25_hard_header;
-   dev->rebuild_header = ax25_rebuild_header;
+// dev->hard_header = ax25_hard_header;
+// dev->rebuild_header = ax25_rebuild_header;
dev->set_mac_address = baycom_set_mac_address;

dev->type = ARPHRD_AX25;   /* AF_AX25 device */
_


Re: [PATCH 2.6.24] tg3: fix ethtool autonegotiate flags

2007-10-02 Thread Michael Chan
On Tue, 2007-10-02 at 16:16 -0400, Andy Gospodarek wrote:
> Adding that flag in tg3_set_settings seemed like the most logical place
> since the driver works fine on boot.  This is just an issue when
> re-enabling autonegotiation, so we should probably nip it there.
> 
> Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>

We also noticed this issue recently, but didn't pay too much attention
to it since it was more of a "cosmetic" issue.  The driver behaves the
same since we rely on cmd->autoneg to decide whether to enable autoneg
or not.  Your fix seems reasonable to me.  Thanks.

Acked-by: Michael Chan <[EMAIL PROTECTED]>



Re: [PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-10-02 Thread Oliver Hartkopp

Arnaldo Carvalho de Melo wrote:
> On Tue, Oct 02, 2007 at 03:10:11PM +0200, Urs Thuermann wrote:
> > +
> > +#ifdef CONFIG_CAN_DEBUG_DEVICES
> > +static int debug;
> > +module_param(debug, int, S_IRUGO);
> > +#endif
>
> Can debug be a boolean? Like its counterpart on DCCP:
>
> net/dccp/proto.c:
>
> module_param(dccp_debug, bool, 0444);

'debug' should remain an integer so that it can specify debug levels or
bit-fields for different debug outputs.

> Where we also use a namespace prefix, for those of us who use ctags or
> cscope.

I have no general objection to renaming this 'debug' to 'vcan_debug',
but it looks like an 'overnamed' module parameter to me. Is this a
general naming scheme recommendation for debug module_params?

> > +/*
> > + * CAN test feature:
> > + * Enable the echo on driver level for testing the CAN core echo modes.
> > + * See Documentation/networking/can.txt for details.
> > + */
> > +
> > +static int echo; /* echo testing. Default: 0 (Off) */
> > +module_param(echo, int, S_IRUGO);
> > +MODULE_PARM_DESC(echo, "Echo sent frames (for testing). Default: 0 (Off)");
>
> echo also seems to be a boolean

Yes. This is definitely a boolean candidate. We'll change that.

Thanks,
Oliver



Re: tcp bw in 2.6

2007-10-02 Thread Roland Dreier
 > It would be really great to see numbers with a more recent kernel
 > than 2.6.18

FWIW Debian has binaries for 2.6.21 in testing and for 2.6.22 in
unstable so it should be very easy for Larry to try at least those.

 - R.


Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones

> Alternatively, take your favorite test programs, such as John's,
> and make a second pair that reverses the direction the data is
> sent.  So one pair is server sends, the other is server receives,
> try both.  That's where we started, BitKeeper, my stripped down test,
> and John's test all exhibit the same behavior.  And the rsh test
> is just a really simple way to demonstrate it.


Netperf TCP_STREAM - server receives.  TCP_MAERTS (STREAM backwards) - server 
sends:

[EMAIL PROTECTED] ~]# netperf -H 192.168.2.107
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.107 
(192.168.2.107) port 0 AF_INET : demo

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.17     941.46
[EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -t TCP_MAERTS
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.107 
(192.168.2.107) port 0 AF_INET : demo

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.15     941.35

The above took all the defaults for socket buffers and such.
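
For context, roughly 941 Mbit/s is what one would expect as the ceiling for
a single TCP stream on gigabit Ethernet. The sketch below works that out
under stated assumptions: IPv4, a 1500-byte MTU, no VLAN tag, and 12 bytes
of TCP options per segment (timestamps enabled). tcp_goodput_mbps() is a
hypothetical helper written for this illustration.

```c
/*
 * Upper bound on TCP payload throughput in Mbit/s for a given link rate,
 * assuming every frame carries a full-sized segment.
 */
static double tcp_goodput_mbps(double link_mbps, int mtu, int tcp_opt_bytes)
{
	const int ip_hdr = 20, tcp_hdr = 20;
	/* preamble+SFD 8, Ethernet header 14, FCS 4, inter-frame gap 12 */
	const int eth_overhead = 8 + 14 + 4 + 12;
	int payload = mtu - ip_hdr - tcp_hdr - tcp_opt_bytes;
	int on_wire = mtu + eth_overhead;

	return link_mbps * payload / on_wire;
}
```

With those assumptions each 1538-byte slot on the wire carries 1448 bytes of
payload, so tcp_goodput_mbps(1000.0, 1500, 12) comes out at about 941.5,
which is in line with the netperf figures shown here.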

 [EMAIL PROTECTED] ~]# uname -a
Linux hpcpc106.cup.hp.com 2.6.18-8.el5 #1 SMP Fri Jan 26 14:16:09 EST 2007 ia64 
ia64 ia64 GNU/Linux


[EMAIL PROTECTED] ~]# ethtool -i eth2
driver: e1000
version: 7.2.7-k2-NAPI
firmware-version: N/A
bus-info: :06:01.0

between a pair of 1.6 GHz itanium2 montecito rx2660's with a dual-port HP A9900A 
(Intel 82546GB) in slot 3 of the io cage on each.  Connection is actually 
back-to-back rather than through a switch.  I'm afraid I've nothing older installed.


sysctl settings attached

Where I do have things connected via a switch (HP ProCurve 3500 IIRC, perhaps a 
2724) is through the core BCM5704:


[EMAIL PROTECTED] netperf2_work]# netperf -H hpcpc107
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to hpcpc107.cup.hp.com 
(16.89.84.107) port 0 AF_INET : demo

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.03     941.41

[EMAIL PROTECTED] netperf2_work]# netperf -H hpcpc107 -t TCP_MAERTS
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to hpcpc107.cup.hp.com 
(16.89.84.107) port 0 AF_INET : demo

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  87380  87380    10.03     941.37

[EMAIL PROTECTED] netperf2_work]# ethtool -i eth0
driver: tg3
version: 3.65-rh
firmware-version: 5704-v3.27
bus-info: :01:02.0

rick jones
net.ipv6.conf.eth2.router_probe_interval = 60
net.ipv6.conf.eth2.accept_ra_rtr_pref = 1
net.ipv6.conf.eth2.accept_ra_pinfo = 1
net.ipv6.conf.eth2.accept_ra_defrtr = 1
net.ipv6.conf.eth2.max_addresses = 16
net.ipv6.conf.eth2.max_desync_factor = 600
net.ipv6.conf.eth2.regen_max_retry = 5
net.ipv6.conf.eth2.temp_prefered_lft = 86400
net.ipv6.conf.eth2.temp_valid_lft = 604800
net.ipv6.conf.eth2.use_tempaddr = 0
net.ipv6.conf.eth2.force_mld_version = 0
net.ipv6.conf.eth2.router_solicitation_delay = 1
net.ipv6.conf.eth2.router_solicitation_interval = 4
net.ipv6.conf.eth2.router_solicitations = 3
net.ipv6.conf.eth2.dad_transmits = 1
net.ipv6.conf.eth2.autoconf = 1
net.ipv6.conf.eth2.accept_redirects = 1
net.ipv6.conf.eth2.accept_ra = 1
net.ipv6.conf.eth2.mtu = 1500
net.ipv6.conf.eth2.hop_limit = 64
net.ipv6.conf.eth2.forwarding = 0
net.ipv6.conf.eth0.router_probe_interval = 60
net.ipv6.conf.eth0.accept_ra_rtr_pref = 1
net.ipv6.conf.eth0.accept_ra_pinfo = 1
net.ipv6.conf.eth0.accept_ra_defrtr = 1
net.ipv6.conf.eth0.max_addresses = 16
net.ipv6.conf.eth0.max_desync_factor = 600
net.ipv6.conf.eth0.regen_max_retry = 5
net.ipv6.conf.eth0.temp_prefered_lft = 86400
net.ipv6.conf.eth0.temp_valid_lft = 604800
net.ipv6.conf.eth0.use_tempaddr = 0
net.ipv6.conf.eth0.force_mld_version = 0
net.ipv6.conf.eth0.router_solicitation_delay = 1
net.ipv6.conf.eth0.router_solicitation_interval = 4
net.ipv6.conf.eth0.router_solicitations = 3
net.ipv6.conf.eth0.dad_transmits = 1
net.ipv6.conf.eth0.autoconf = 1
net.ipv6.conf.eth0.accept_redirects = 1
net.ipv6.conf.eth0.accept_ra = 1
net.ipv6.conf.eth0.mtu = 1500
net.ipv6.conf.eth0.hop_limit = 64
net.ipv6.conf.eth0.forwarding = 0
net.ipv6.conf.default.router_probe_interval = 60
net.ipv6.conf.default.accept_ra_rtr_pref = 1
net.ipv6.conf.default.accept_ra_pinfo = 1
net.ipv6.conf.default.accept_ra_defrtr = 1
net.ipv6.conf.default.max_addresses = 16
net.ipv6.conf.default.max_desync_factor = 600
net.ipv6.conf.default.regen_max_retry = 5
net.ipv6.conf.default.temp_prefered_lft = 86400
net.ipv6.conf.default.temp_valid_lft = 604800
net.ipv6.conf.default.use_tempaddr = 0
net.ipv6.conf.default.force_mld_version = 0
net.ipv6.conf.default.router_solicitation_delay = 1
net.ipv6.conf.default.router_solicitation_interv

Re: [IPv6] Fix ICMPv6 redirect handling with target multicast address

2007-10-02 Thread David Stevens
Brian,
ipv6_addr_type() returns a mask, so checking for equality will fail to
match if any other (irrelevant) attributes are set. How about using bitwise
operators for that? Also, the error message is no longer descriptive of the
failure if it's a link-local multicast, but you could make it "target
address is not link-local unicast.\n" (in both places).

+-DLS



Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 12:27:53 -0700 (PDT)

> We see a single packet containing 16060 bytes, which seems to be because 
> of TSO on the sending side (you did your tcpdump on the sender, no?), so 
> it will actually be broken up into 11 1460-byte regular frames by the 
> network card, since they started out agreeing on a standard 1460-byte MSS. 
> So the above is not a jumbo frame, it just kind of looks like one when you 
> capture it on the sender side.
> 
> And maybe a 32kB window is not big enough when it causes the networking 
> code to basically just have a single packet outstanding.

We fixed a lot of bugs in TSO last year.

It would be really great to see numbers with a more recent kernel
than 2.6.18
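
The segmentation arithmetic in the quoted text is easy to check: a
16060-byte TSO super-packet with a 1460-byte MSS is split into exactly 11
frames. A minimal sketch, with tso_segments() being a hypothetical helper
name chosen for this illustration:

```c
/* Number of MSS-sized frames needed to carry payload_len bytes. */
static unsigned int tso_segments(unsigned int payload_len, unsigned int mss)
{
	return (payload_len + mss - 1) / mss;	/* round up */
}
```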



Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Tue, 2 Oct 2007 12:29:50 -0700 (PDT)

> On Tue, 2 Oct 2007, Larry McVoy wrote:
> > 
> > No HP in the mix.  It's got nothing to do with hp, nor to do with rsh, it 
> > has everything to do with the direction the data is flowing.  
> 
> Can you tcpdump both cases and send snippets (both of steady-state, and 
> the initial connect)? 

Another thing I'd like to see is if something more recent than 2.6.18
also reproduces the problem.

It could be just some bug we've fixed in the past year :)


[PATCH 2.6.24] tg3: fix ethtool autonegotiate flags

2007-10-02 Thread Andy Gospodarek

I recently noticed that when calling:

# ethtool -s eth0 autoneg on

on a 5722 (though I'm sure it's not specific to that card),
subsequent checks of the card's status looked like this:

# ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: No   <--- This seems odd?!?
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x00ff (255)
        Link detected: yes

I noticed that the following commit:

commit 3600d918d870456ea8e7bb9d47f327de5c20f3d6
Author: Michael Chan <[EMAIL PROTECTED]>
Date:   Thu Dec 7 00:21:48 2006 -0800

[TG3]: Allow partial speed advertisement.

Honor the advertisement bitmask from ethtool.  We used to always
advertise the full capability when autoneg was set to on.

changed things around so that ethtool speed settings were strictly
followed.  Unfortunately ethtool doesn't seem to set ADVERTISED_Autoneg
in the advertising field (and maybe it shouldn't have to).  I'd vote
that it should be fixed there, but it should also be added here so that
someone using ethtool ioctls in their own application gets what they
want.

Adding that flag in tg3_set_settings seemed like the most logical place
since the driver works fine on boot.  This is just an issue when
re-enabling autonegotiation, so we should probably nip it there.

Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
---

 tg3.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index d4ac6e9..0a414be 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -8070,7 +8070,8 @@ static int tg3_set_settings(struct net_device *dev, 
struct ethtool_cmd *cmd)
 
tp->link_config.autoneg = cmd->autoneg;
if (cmd->autoneg == AUTONEG_ENABLE) {
-   tp->link_config.advertising = cmd->advertising;
+   tp->link_config.advertising = (cmd->advertising |
+ ADVERTISED_Autoneg);
tp->link_config.speed = SPEED_INVALID;
tp->link_config.duplex = DUPLEX_INVALID;
} else {
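
The effect of the change can be sketched outside the driver. The constant
below is a local stand-in for ADVERTISED_Autoneg (bit 6 in the ethtool
advertising mask, as far as I know), and both helpers are hypothetical
names, not tg3 or ethtool functions.

```c
#include <stdint.h>

/* Local stand-in for the ethtool ADVERTISED_Autoneg bit (assumed value). */
#define EX_ADVERTISED_AUTONEG	(1U << 6)

/* Mask to store when autonegotiation is being enabled. */
static uint32_t advertising_for_autoneg(uint32_t requested)
{
	return requested | EX_ADVERTISED_AUTONEG;
}

/* What a later status query would report as "Advertised auto-negotiation". */
static int reports_autoneg(uint32_t advertising)
{
	return (advertising & EX_ADVERTISED_AUTONEG) != 0;
}
```

If user space passes only the six speed/duplex bits (0x3f), the stored mask
lacks the autoneg bit unless the driver ORs it in, which is the case the
patch addresses.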


Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
> I think I'm still missing some basic data here (probably because this 
> thread did not originate on netdev).  Let me try to nail down some of 
> the basics.  You have a linux ia64 box (running 2.6.12 or 2.6.18?) that 
> sends slowly, and receives faster, but not quite 1 Gbps?  And this is 
> true regardless of which peer it sends or receives from?  And the 
> behavior is different depending on which kernel?  How, and which kernel 
> versions?  Do you have other hardware running the same kernel that 
> behaves the same or differently?

Just got off the phone with Linus and he thinks the side that does
the accept is the problem side, i.e., if you are the server, you do the
accept, and you send the data, you'll go slow.  But as I'm writing this
I realize he's wrong, because it is the combination of accept & send.
accept & recv goes fast.

A trivial way to see the problem is to take two linux boxes, on each
apt-get install rsh-client rsh-server
set up your .rhosts,
and then do

dd if=/dev/zero count=10 | rsh OTHER_BOX dd of=/dev/null
rsh OTHER_BOX dd if=/dev/zero count=10 | dd of=/dev/null

See if you get balanced results.  For me, I get 45MB/sec one way, and
15-19MB/sec the other way.

I've tried the same test linux - linux and linux - hpux.  Same results.
The test setup I have is

work:     2ghz x 2 Athlons, e1000, 2.6.18
ia64:     900mhz Itanium, e1000, 2.6.12
hp-ia64:  900mhz Itanium, e1000, hpux 11
glibc*:   1-2ghz athlons running various linux releases

all connected through a netgear 724T 10/100/1000 switch (a linksys showed
identical results).

I tested 

work <-> hp-ia64
work <-> ia64
ia64 <-> hp-ia64

and in all cases, one direction worked fast and the other didn't.

It would be good if people tried the same simple test.  You have to
use rsh, ssh will slow things down way too much.

Alternatively, take your favorite test programs, such as John's,
and make a second pair that reverses the direction the data is 
sent.  So one pair is server sends, the other is server receives,
try both.  That's where we started, BitKeeper, my stripped down test,
and John's test all exhibit the same behavior.  And the rsh test
is just a really simple way to demonstrate it.

Wayne, Linus asked for tcp dumps from just one side, with the first 100
packets and then wait 10 seconds or so for the window to open up, and then
a snapshot of another 100 packets.  Do that for both directions
and send them to the list.  Can you do that?  I want to get lunch, I'm
starving.
-- 
---
Larry McVoy   lm at bitmover.com   http://www.bitkeeper.com


Re: MSI interrupts and disable_irq

2007-10-02 Thread Manfred Spraul

Ayaz Abdulla wrote:
> I am trying to track down a forcedeth driver issue described by bug
> 9047 in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy
> load). I added a patch to synchronize the timer handlers so that one
> handler doesn't accidentally enable the IRQ while another timer handler
> is running (see attachment 'Add timer lock' in bug report) and for
> other processing protection.
>
> However, the system still had an Oops. So I added a lock around the
> nv_rx_process_optimized() and the Oops has not happened (see
> attachment 'New patch for locking' in bug report). This would imply a
> synchronization issue. However, the only callers of that function are
> the IRQ handler and the timer handlers (in non-NAPI case). The timer
> handlers use disable_irq so that the IRQ handler does not contend
> with them. It looks as if disable_irq is not working properly.

Either disable_irq() is not working properly or interrupts are nested, 
i.e. the irq handler is called again while running.
Which timer handler do you mean? I only see disable_irq() in the 
configuration paths (set mtu, change ring size, ...) and in the tx 
timeout case.

Neither one should happen during normal operation.

--
   Manfred


Re: 2.6.23-rc8-mm2 - tcp_fastretrans_alert() WARNING

2007-10-02 Thread Ilpo Järvinen
On Tue, 2 Oct 2007, Ilpo Järvinen wrote:

> I'm currently out of ideas where it could come from... so let's try
> brute-force checking as your test case is not very high-speed... This
> could hide it though... :-(
>
> Please put the patch below on top of clean rc8-mm2 (it includes the patch
> I gave you last time) and try to reproduce. These counter bugs can
> survive for some time until the !sacked_out condition occurs, so the patch
> below tries to find out when the inconsistency occurs for the first time,
> regardless of sacked_out (I also removed some statics, which hopefully
> reduces compiler inlining for easier reading of the output). I tried this
> myself (except for verify()s in frto funcs and minor printout
> modifications); it didn't trigger for me.

In case you haven't gotten started yet (or it's easy enough to replace),
please use the one below instead (I forgot one counter from the printout
in the last patch, which might turn out to be useful...).

-- 
 i.


---
 include/net/tcp.h |3 +
 net/ipv4/tcp_input.c  |   23 +--
 net/ipv4/tcp_ipv4.c   |  103 +
 net/ipv4/tcp_output.c |6 ++-
 4 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 991ccdc..54a0d91 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -43,6 +43,9 @@
 
 #include 
 
+extern void tcp_verify_fackets(struct sock *sk);
+extern void tcp_print_queue(struct sock *sk);
+
 extern struct inet_hashinfo tcp_hashinfo;
 
 extern atomic_t tcp_orphan_count;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e22ffe7..1d7367d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1140,7 +1140,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct 
sk_buff *ack_skb,
return dup_sack;
 }
 
-static int
+int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 
prior_snd_una)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1160,6 +1160,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
int first_sack_index;
 
if (!tp->sacked_out) {
+   if (WARN_ON(tp->fackets_out))
+   tcp_print_queue(sk);
tp->fackets_out = 0;
tp->highest_sack = tp->snd_una;
}
@@ -1420,6 +1422,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
}
}
}
+   tcp_verify_fackets(sk);
 
/* Check for lost retransmit. This superb idea is
 * borrowed from "ratehalving". Event "C".
@@ -1632,13 +1635,14 @@ void tcp_enter_frto(struct sock *sk)
tcp_set_ca_state(sk, TCP_CA_Disorder);
tp->high_seq = tp->snd_nxt;
tp->frto_counter = 1;
+   tcp_verify_fackets(sk);
 }
 
 /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO,
  * which indicates that we should follow the traditional RTO recovery,
  * i.e. mark everything lost and do go-back-N retransmission.
  */
-static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int 
flag)
+void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 {
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
@@ -1675,6 +1679,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int 
allowed_segments, int flag)
}
}
tcp_verify_left_out(tp);
+   tcp_verify_fackets(sk);
 
tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
tp->snd_cwnd_cnt = 0;
@@ -1753,6 +1758,7 @@ void tcp_enter_loss(struct sock *sk, int how)
}
}
tcp_verify_left_out(tp);
+   tcp_verify_fackets(sk);
 
tp->reordering = min_t(unsigned int, tp->reordering,
 sysctl_tcp_reordering);
@@ -2308,7 +2314,7 @@ static void tcp_mtup_probe_success(struct sock *sk, 
struct sk_buff *skb)
  * It does _not_ decide what to send, it is made in function
  * tcp_xmit_retransmit_queue().
  */
-static void
+void
 tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2322,8 +2328,11 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, 
int flag)
if (!tp->packets_out)
tp->sacked_out = 0;
 
-   if (WARN_ON(!tp->sacked_out && tp->fackets_out))
+   if (WARN_ON(!tp->sacked_out && tp->fackets_out)) {
+   printk(KERN_ERR "TCP %d\n", tcp_is_reno(tp));
+   tcp_print_queue(sk);
tp->fackets_out = 0;
+   }
 
/* Now state machine starts.
 * A. ECE, hence prohibit cwnd undoing, the reduction is required. */
@@ -2333,6 +2342,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, 
int flag)
/* B. In all the states check for reneging SACKs. */
if (tp->sacked_out && tcp_check_sack_reneging(sk))
  

[PATCH] - trivial - Correct printk with PFX before KERN_ in drivers/net/wireless/bcm43xx/bcm43xx_wx.c

2007-10-02 Thread Joe Perches
Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
index d6d9413..6acfdc4 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_wx.c
@@ -444,7 +444,7 @@ static int bcm43xx_wx_set_xmitpower(struct net_device *net_dev,
 	u16 maxpower;
 
 	if ((data->txpower.flags & IW_TXPOW_TYPE) != IW_TXPOW_DBM) {
-		printk(PFX KERN_ERR "TX power not in dBm.\n");
+		printk(KERN_ERR PFX "TX power not in dBm.\n");
 		return -EOPNOTSUPP;
 	}
 




Re: tcp bw in 2.6

2007-10-02 Thread John Heffner

Larry McVoy wrote:
> More data, we've conclusively eliminated the card / cpu from the mix.
> We've got 2 ia64 boxes with e1000 interfaces.  One box is running
> linux 2.6.12 and the other is running hpux 11.
>
> I made sure the linux one was running at gigabit and reran the tests
> from the linux/ia64 <=> hp/ia64.  Same results, when linux sends
> it is slow, when it receives it is fast.
>
> And note carefully: we've removed hpux from the equation, we can do
> the same tests from linux to multiple linux clients and see the same
> thing, sending from the server is slow, receiving on the server is
> fast.



I think I'm still missing some basic data here (probably because this 
thread did not originate on netdev).  Let me try to nail down some of 
the basics.  You have a linux ia64 box (running 2.6.12 or 2.6.18?) that 
sends slowly, and receives faster, but not quite at 1 Gbps?  And this is 
true regardless of which peer it sends or receives from?  And the 
behavior is different depending on which kernel?  How, and which kernel 
versions?  Do you have other hardware running the same kernel that 
behaves the same or differently?


Have you done ethernet cable tests?  Have you tried measuring the udp 
sending rate?  (Iperf can do this.)  Are there any error counters on the 
interface?


  -John


Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
> I also would have expected more ACK's from the HP box. It's been a long 
> time since I did TCP, but I thought the rule was still that you were 
> supposed to ACK at least every other full frame - but the HP box is acking 
> roughly every 16K (and it's *not* always at TSO boundaries: the earlier 
> ACK's in the sequence are at 1460-byte packet boundaries, but it does seem 
> to end up getting into that pattern later on).


Drift...

The RFC's say "SHOULD" (emphasis theirs) rather than "MUST."

Both HP-UX and Solaris have rather robust ACK avoidance heuristics to cut down 
on the CPU overhead of bulk transfers.  (That they both have them stems from 
their being cousins, sharing a common TCP stack ancestor long ago - both of 
course have been diverging since then).


rick jones


Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones

Larry McVoy wrote:
> On Tue, Oct 02, 2007 at 11:01:47AM -0700, Rick Jones wrote:
>> has anyone already asked whether link-layer flow-control is enabled?
>
> I doubt it, the same test works fine in one direction and poorly in the other.
> Wouldn't the flow control squelch either way?


While I am often guilty of it, a wise old engineer tried to teach me that the 
proper spelling is ass-u-me :)  I wouldn't count on it hitting in both 
directions, depends on the specifics of the situation.


WRT the HP-UX ACK avoidance heuristic, the default HP-UX socket buffer/window is 
32768, and tcp_deferred_ack_max defaults to 22.  That isn't really all that good 
a combination - with a window of 32768, a deferred ack max of 11 would be better. 
You could also go ahead and try it with a value of 2.  Or, bump the window 
size defaults - tcp_recv_hiwater_def and tcp_xmit_hiwater_def - to, say, 65535 or 
128K or something - or use the setsockopt() calls to effect that.
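
For the setsockopt() route, a minimal sketch in C might look like the following. The helper names set_socket_buffers and get_recv_buffer are hypothetical and not from this thread; SO_RCVBUF and SO_SNDBUF are the standard socket-level options. The stack may round, double, or clamp the requested size, so it is worth reading the value back rather than assuming it took effect.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the thread): request the same send and
 * receive buffer size on a socket.  Returns 0 on success, -1 on failure
 * (errno is set by setsockopt()).
 */
int set_socket_buffers(int fd, int bytes)
{
	assert(bytes > 0);	/* a non-positive size is a caller bug */

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	return 0;
}

/* Read back the receive buffer size the stack actually granted, or -1. */
int get_recv_buffer(int fd)
{
	int bytes = 0;
	socklen_t len = sizeof(bytes);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, &len) < 0)
		return -1;
	return bytes;
}
```

A caller would typically invoke set_socket_buffers(fd, 65535) before connect() or listen(), since the window advertised during the handshake is derived from the buffer size in place at that point.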


rick jones


Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
More data, we've conclusively eliminated the card / cpu from the mix.
We've got 2 ia64 boxes with e1000 interfaces.  One box is running
linux 2.6.12 and the other is running hpux 11.

I made sure the linux one was running at gigabit and reran the tests
from the linux/ia64 <=> hp/ia64.  Same results, when linux sends
it is slow, when it receives it is fast.

And note carefully: we've removed hpux from the equation, we can do
the same tests from linux to multiple linux clients and see the same
thing, sending from the server is slow, receiving on the server is
fast.
-- 
---
Larry McVoy            lm at bitmover.com           http://www.bitkeeper.com


Re: tcp bw in 2.6

2007-10-02 Thread Linus Torvalds


On Tue, 2 Oct 2007, Larry McVoy wrote:
> 
> No HP in the mix.  It's got nothing to do with hp, nor to do with rsh, it 
> has everything to do with the direction the data is flowing.  

Can you tcpdump both cases and send snippets (both of steady-state, and 
the initial connect)? 

Linus


Re: tcp bw in 2.6

2007-10-02 Thread Linus Torvalds


On Tue, 2 Oct 2007, Larry McVoy wrote:
> 
> tcpdump is a good idea, take a look at this.  The window starts out
> at 46 and never opens up in my test case, but in the rsh case it 
> starts out the same but does open up.  Ideas?

I don't think that's an issue, since you only send one way. The window 
opening up only matters for the receiver. Also, you missed the "wscale=7" 
at the beginning, so the window of "46" looks like it actually is 5888 (ie 
fits four segments - and it's not grown because it never gets any data).
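
The arithmetic behind that reading of the trace can be sketched in C as below. The function names are made up for illustration; the only inputs taken from the thread are the window field of 46, the scale factor of 7, and the 1460-byte MSS.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective window in bytes: the 16-bit window field from the TCP header
 * shifted left by the scale factor negotiated in the SYN exchange
 * (the window scale option allows shifts of 0..14).
 */
uint32_t scaled_window(uint16_t win_field, unsigned int wscale)
{
	assert(wscale <= 14);
	return (uint32_t)win_field << wscale;
}

/* Number of full-sized segments that fit in a window of `window` bytes. */
uint32_t full_segments(uint32_t window, uint32_t mss)
{
	assert(mss > 0);
	return window / mss;
}
```

With the values from the dump, scaled_window(46, 7) gives 5888 and full_segments(5888, 1460) gives 4, matching the "fits four segments" remark.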

However, I think this is some strange TSO artifact:

...
> 08:08:18.843942 IP work-cluster.bitmover.com.31235 > hp-ia64.bitmover.com.49614: P 48181:64241(16060) ack 0 win 46
> 08:08:18.844681 IP hp-ia64.bitmover.com.49614 > work-cluster.bitmover.com.31235: . ack 48181 win 32768
> 08:08:18.844690 IP work-cluster.bitmover.com.31235 > hp-ia64.bitmover.com.49614: P 64241:80301(16060) ack 0 win 46
> 08:08:18.845556 IP hp-ia64.bitmover.com.49614 > work-cluster.bitmover.com.31235: . ack 64241 win 32768
> 08:08:18.845566 IP work-cluster.bitmover.com.31235 > hp-ia64.bitmover.com.49614: . 80301:96361(16060) ack 0 win 46
> 08:08:18.846304 IP hp-ia64.bitmover.com.49614 > work-cluster.bitmover.com.31235: . ack 80301 win 32768
...

We see a single packet containing 16060 bytes, which seems to be because 
of TSO on the sending side (you did your tcpdump on the sender, no?), so 
it will actually be broken up into 11 1460-byte regular frames by the 
network card, since they started out agreeing on a standard 1460-byte MSS. 
So the above is not a jumbo frame, it just kind of looks like one when you 
capture it on the sender side.

And maybe a 32kB window is not big enough when it causes the networking 
code to basically just have a single packet outstanding.

I also would have expected more ACK's from the HP box. It's been a long 
time since I did TCP, but I thought the rule was still that you were 
supposed to ACK at least every other full frame - but the HP box is acking 
roughly every 16K (and it's *not* always at TSO boundaries: the earlier 
ACK's in the sequence are at 1460-byte packet boundaries, but it does seem 
to end up getting into that pattern later on).

So I'm wondering if we get into some bad pattern with the networking code 
trying to make big TSO packets for e1000, but because they are *so* big 
that there's only room for two such packets per window, you don't get into 
any smooth pattern with lots of outstanding packets, but it starts 
stuttering.
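
To make the numbers in that argument concrete, here is a small C sketch. The helper names are hypothetical; the inputs are the 16060-byte captured packet, the 1460-byte MSS, and the 32768-byte window announced by the HP box.

```c
#include <assert.h>
#include <stdint.h>

/* Wire frames the NIC cuts one TSO super-packet into (rounding up). */
uint32_t tso_frames(uint32_t payload_bytes, uint32_t mss)
{
	assert(mss > 0);
	return (payload_bytes + mss - 1) / mss;
}

/* Whole super-packets of `payload_bytes` that fit in the peer's window. */
uint32_t packets_per_window(uint32_t window, uint32_t payload_bytes)
{
	assert(payload_bytes > 0);
	return window / payload_bytes;
}
```

Here tso_frames(16060, 1460) is 11 and packets_per_window(32768, 16060) is 2, which is the "only room for two such packets per window" situation described above.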

Larry, try turning off TSO. Or rather, make the kernel use a smaller limit 
for the large packets. The easiest way to do that should be to just change 
the value in /proc/sys/net/ipv4/tcp_tso_win_divisor. It defaults to 3, try 
doing

echo 6 > /proc/sys/net/ipv4/tcp_tso_win_divisor

and see if that changes anything.

And maybe I'm just whistling in the dark. In fact, it looks like for you 
it's not 3, but 2 (window of 32768, but the TSO frames are half the size). 
So maybe I'm just totally confused and I'm not reading that tcp dump 
correctly at all!
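
One way to sanity-check the "2 rather than 3" guess is a deliberately simplified model: cap a single TSO burst at window / divisor and round down to whole MSS-sized segments. This is an assumption made for illustration, not the kernel's actual logic, and the function name tso_burst_bytes is invented; `window` stands for whichever window the stack applies the divisor to.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model, NOT the kernel's code: cap one TSO burst at
 * window / divisor, rounded down to a whole number of MSS-sized segments.
 */
uint32_t tso_burst_bytes(uint32_t window, uint32_t divisor, uint32_t mss)
{
	assert(divisor > 0 && mss > 0);
	return (window / divisor) / mss * mss;
}
```

Under this model a 32768-byte window with a 1460-byte MSS yields 16060 bytes for a divisor of 2, 10220 bytes for a divisor of 3, and 4380 bytes for a divisor of 6. The first figure happens to equal the packet size in the trace, but the thread does not establish that the kernel arrives at it this way.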

Linus



[IPv6] Fix ICMPv6 redirect handling with target multicast address

2007-10-02 Thread Brian Haley
When the ICMPv6 Target address is multicast, Linux processes the 
redirect instead of dropping it.  The problem is in this code in 
ndisc_redirect_rcv():


	if (ipv6_addr_equal(dest, target)) {
		on_link = 1;
	} else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
		ND_PRINTK2(KERN_WARNING
			   "ICMPv6 Redirect: target address is not link-local.\n");
		return;
	}

This second check will succeed if the Target address is, for example, 
FF02::1 because it has link-local scope.  Instead, it should be checking 
if it's a unicast link-local address, as stated in RFC 2461/4861 Section 
8.1:


  - The ICMP Target Address is either a link-local address (when
redirected to a router) or the same as the ICMP Destination
Address (when redirected to the on-link destination).

I know this doesn't explicitly say unicast link-local address, but it's 
implied.
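
The difference between the bitmask test and the exact-match test can be seen in a standalone C sketch. Everything here is a stand-in: toy_addr_type is a hypothetical, much-reduced substitute for ipv6_addr_type(), and the TOY_ADDR_* values are assumed to mirror the kernel's IPV6_ADDR_* flag bits purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values, assumed to mirror the kernel's IPV6_ADDR_* bits. */
#define TOY_ADDR_UNICAST   0x0001U
#define TOY_ADDR_MULTICAST 0x0002U
#define TOY_ADDR_LINKLOCAL 0x0020U

/* Toy classifier covering only the cases discussed here (hypothetical). */
unsigned int toy_addr_type(const uint8_t addr[16])
{
	assert(addr != NULL);

	if (addr[0] == 0xff) {			/* ff00::/8 is multicast */
		unsigned int type = TOY_ADDR_MULTICAST;

		if ((addr[1] & 0x0f) == 0x02)	/* scope 2 is link-local */
			type |= TOY_ADDR_LINKLOCAL;
		return type;
	}
	if (addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80)	/* fe80::/10 */
		return TOY_ADDR_UNICAST | TOY_ADDR_LINKLOCAL;
	return TOY_ADDR_UNICAST;
}

/* Old test: any address carrying the link-local bit is accepted. */
bool old_check_accepts(unsigned int type)
{
	return (type & TOY_ADDR_LINKLOCAL) != 0;
}

/* New test: only a unicast link-local address is accepted. */
bool new_check_accepts(unsigned int type)
{
	return type == (TOY_ADDR_UNICAST | TOY_ADDR_LINKLOCAL);
}
```

For FF02::1 the toy type is multicast plus link-local, so old_check_accepts() returns true while new_check_accepts() returns false; for fe80::1 both return true.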


This bug is preventing Linux kernels from achieving IPv6 Logo Phase II 
certification because of a recent error that was found in the TAHI test 
suite - Neighbor Discovery suite test 206 (v6LC.2.3.6_G) had the 
multicast address in the Destination field instead of the Target field, so 
we were passing the test.  This won't be the case anymore.


The patch below fixes this problem, and also fixes ndisc_send_redirect() 
to not send an invalid redirect with a multicast address in the Target 
field.  I re-ran the TAHI Neighbor Discovery section to make sure Linux 
passes all 245 tests now.


-Brian


Signed-off-by: Brian Haley <[EMAIL PROTECTED]>

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 74c4d8d..a0a6406 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1267,7 +1267,8 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 
 	if (ipv6_addr_equal(dest, target)) {
 		on_link = 1;
-	} else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
+	} else if (ipv6_addr_type(target) !=
+		   (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
 		ND_PRINTK2(KERN_WARNING
 			   "ICMPv6 Redirect: target address is not link-local.\n");
 		return;
@@ -1343,7 +1344,7 @@ void ndisc_send_redirect(struct sk_buff *skb, struct neighbour *neigh,
 	}
 
 	if (!ipv6_addr_equal(&ipv6_hdr(skb)->daddr, target) &&
-	!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
+	ipv6_addr_type(target) != (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
 		ND_PRINTK2(KERN_WARNING
 			"ICMPv6 Redirect: target address is not link-local.\n");
 		return;


Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
> Looks like you have TSO enabled.  Does it behave differently if it's 
> disabled?  

It cranks the interrupts/sec up to 8K instead of 5K.  No difference in
performance other than that.

> I think Rick Jones is on to something with the HP ack avoidance.  

I sincerely doubt it.  I'm only using the HP box because it has gigabit
so it's a single connection.  I can produce almost identical results by
doing the same sorts of tests with several linux clients.  One direction
goes fast and the other goes slow.

3x performance difference depending on the direction of data flow:

# Server is receiving, goes fast
$ for i in 22 24 25 26; do rsh -n glibc$i dd if=/dev/zero|dd of=/dev/null & done
load free cach swap pgin  pgou dk0 dk1 dk2 dk3 ipkt opkt  int  ctx  usr sys idl
0.98    0    0    0    0     0   0   0   0   0  30K  15K 8.1K  68K   12  66  22
0.98    0    0    0    0     0   0   0   0   0  29K  15K 8.2K  67K   11  64  25
0.98    0    0    0    0     0   0   0   0   0  29K  15K 8.2K  67K   12  66  22

# Server is sending, goes slow
$ for i in 22 24 25 26; do dd if=/dev/zero|rsh glibc$i dd of=/dev/null & done
load free cach swap pgin  pgou dk0 dk1 dk2 dk3 ipkt opkt  int  ctx  usr sys idl
1.06    0    0    0    0     0   0   0   0   0 5.0K  10K 4.4K 8.4K   21  17  62
0.97    0    0    0    0     0   0   0   0   0 5.1K  10K 4.4K 8.9K    2  15  83
0.97    0    0    0    0     0   0   0   0   0 5.0K  10K 4.4K 8.6K   21  26  53

$ for i in 22 24 25 26; do rsh glibc$i cat /etc/motd; done | grep Welcome
Welcome to redhat71.bitmover.com, a 2Ghz Athlon running Red Hat 7.1.
Welcome to glibc24.bitmover.com, a 1.2Ghz Athlon running SUSE 10.1.
Welcome to glibc25.bitmover.com, a 2Ghz Athlon running Fedora Core 6
Welcome to glibc26.bitmover.com, a 2Ghz Athlon running Fedora Core 7

$ for i in 22 24 25 26; do rsh glibc$i uname -r; done
2.4.2-2
2.6.16.13-4-default
2.6.18-1.2798.fc6
2.6.22.4-65.fc7

No HP in the mix.  It's got nothing to do with hp, nor to do with rsh, it 
has everything to do with the direction the data is flowing.  
-- 
---
Larry McVoy            lm at bitmover.com           http://www.bitkeeper.com


RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-10-02 Thread Sean Hefty
>Umm... this is a difficult situation for me to merge the changes then.
>We're changing the CM retry behavior blind here.  How do we know that
>the MRA changes don't make the scalability issue worse?

What's currently upstream doesn't work for Intel MPI on our larger clusters.
The connection requests time out on the active side before the passive side can
respond.

The OFED release works because it provides a kernel patch to make the timeout a
module parameter.  I'm trying to avoid adding a module parameter, and the MRA is
designed for this situation.

I tested this by simulating a slow passive side responder, and it worked as
expected for those tests.  Using an MRA does add another MAD to the CM exchange,
which is why it is sent only after seeing a duplicate request.  Alternatively,
we can take the OFED module parameter patch.

- Sean


Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 11:01:47AM -0700, Rick Jones wrote:
> has anyone already asked whether link-layer flow-control is enabled?

I doubt it, the same test works fine in one direction and poorly in the other.
Wouldn't the flow control squelch either way?
-- 
---
Larry McVoy            lm at bitmover.com           http://www.bitkeeper.com


Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
> Make sure you don't have slab debugging turned on. It kills performance.

It's a stock debian kernel, so unless they turn it on it's off.
-- 
---
Larry McVoy            lm at bitmover.com           http://www.bitkeeper.com


Re: tcp bw in 2.6

2007-10-02 Thread John Heffner

Larry McVoy wrote:
> On Tue, Oct 02, 2007 at 06:52:54PM +0800, Herbert Xu wrote:
>>> One of my clients also has gigabit so I played around with just that
>>> one and it (itanium running hpux w/ broadcom gigabit) can push the load
>>> as well.  One weird thing is that it is dependent on the direction the
>>> data is flowing.  If the hp is sending then I get 46MB/sec, if linux is
>>> sending then I get 18MB/sec.  Weird.  Linux is debian, running 
>>
>> First of all check the CPU load on both sides to see if either
>> of them is saturating.  If the CPU's fine then look at the tcpdump
>> output to see if both receivers are using the same window settings.
>
> tcpdump is a good idea, take a look at this.  The window starts out
> at 46 and never opens up in my test case, but in the rsh case it 
> starts out the same but does open up.  Ideas?


(Binary tcpdumps are always better than ascii.)

The window on the sender (linux box) starts at 46.  It doesn't open up, 
but it's not receiving data so it doesn't matter, and you don't expect 
it to.  The HP box always announces a window of 32768.


Looks like you have TSO enabled.  Does it behave differently if it's 
disabled?  I think Rick Jones is on to something with the HP ack 
avoidance.  Looks like a pretty low ack ratio, and it might not be 
interacting well with TSO, especially at such a small window size.


  -John


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-10-02 Thread Roland Dreier
 > >OK -- just to make sure I'm understanding what you're saying: have you
 > >confirmed that your proposed [CM MRA] patches actually fix the issue?
 > 
 > Not directly.  I cannot easily test kernel patches on our larger, production
 > clusters.  We've seen the issue with specific applications on 512 and 1024
 > cores, but I've only been able to test the patch on a 48-core cluster.  I have
 > verified that it successfully increases the timeout to where it *should* work,
 > but cannot absolutely confirm that it will fix the problem.  I'm unlikely to
 > know that until the production clusters move to an OFED release (1.3?)
 > containing this patch.

Umm... this is a difficult situation for me to merge the changes then.
We're changing the CM retry behavior blind here.  How do we know that
the MRA changes don't make the scalability issue worse?

 - R.

