Re: pktgen question

2007-09-24 Thread Ben Greear

jamal wrote:

On Sun, 2007-23-09 at 12:55 -0500, Steve Wise wrote:

  
It's a hack that breaks cxgb3 because cxgb3 uses the skb->cb area for
each skb passed down.  So cxgb3 is at fault then?  IE a driver cannot
use the skb->cb field if the users count is > 1?  Or maybe a driver can
_never_ use the cb field?



any layer can use the cb structure whichever way they wish. There are
violations, e.g.:
the vlan code also uses the cb field to pass vlan details for hardware
acceleration. How does pktgen affect it though? clone() will just copy
the cb field and pktgen doesn't touch it.
In retrospect, pktgen may have to use clone - cc'ing Robert Olsson.
  
Pktgen abuses the ref count, as far as I can tell.  It works with most
ethernet drivers, but the multi-pkt option will also fail with things
like vlans because the skb->dev is changed as the skb is transmitted
(to the lower-level device).

I'd say just don't use multi-pkt with pktgen on devices that can't
handle it.


Thanks,
Ben


--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com
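
For reference, a minimal sketch of the clone-per-transmit idea raised in
this thread ("pktgen may have to use clone"). This is illustrative only
and not the actual pktgen code; the function name, the NETDEV_TX_OK
handling and the assumption that the caller already holds the device's
TX lock are all made up for the example.

/* Illustrative sketch: instead of bumping skb->users and handing the
 * same skb to the driver again and again, clone the template once per
 * transmit so every driver gets a private skb (its own cb[] and its
 * own skb->dev).
 */
static int xmit_clones(struct sk_buff *template, struct net_device *dev,
		       int count)
{
	int i, sent = 0;

	for (i = 0; i < count; i++) {
		struct sk_buff *skb = skb_clone(template, GFP_ATOMIC);

		if (!skb)
			break;			/* out of memory, stop early */
		skb->dev = dev;			/* safe: the clone is private */
		if (dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
			kfree_skb(skb);		/* driver busy, drop this clone */
			break;
		}
		sent++;
	}
	return sent;
}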





Re: [PATCHES] TX batching

2007-09-24 Thread Kok, Auke
jamal wrote:
 On Sun, 2007-23-09 at 12:36 -0700, Kok, Auke wrote:
 
 please be reminded that we're going to strip down e1000, and most of the
 features should go into e1000e, which has far fewer hardware workarounds.
 I'm still reluctant to put new stuff into e1000 - I really want to chop
 it down first ;)
 
 sure - the question then is, will you take those changes if I use
 e1000e? There are a few cleanups that have nothing to do with batching;
 take a look at the modified e1000 in the git tree.

that's bad to begin with :) - please send those separately so I can
fast-track them into e1000e and e1000 where applicable.

But yes, I'm very inclined to merge more features into e1000e than e1000.
I intend to put multiqueue support into e1000e, as *all* of the hardware
that it will support has multiple queues. Putting in any other performance
feature like tx batching would absolutely be interesting.

Auke


[PATCH] fix mace_handle_misc_intrs compilation

2007-09-24 Thread Olaf Hering

Fix compilation after incomplete struct net_device changes.

Signed-off-by: Olaf Hering [EMAIL PROTECTED]

---
 drivers/net/mace.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/net/mace.c
+++ b/drivers/net/mace.c
@@ -633,7 +633,7 @@ static void mace_set_multicast(struct ne
     spin_unlock_irqrestore(&mp->lock, flags);
 }
 
-static void mace_handle_misc_intrs(struct mace_data *mp, int intr)
+static void mace_handle_misc_intrs(struct mace_data *mp, int intr, struct net_device *dev)
 {
     volatile struct mace __iomem *mb = mp->mace;
     static int mace_babbles, mace_jabbers;
@@ -669,7 +669,7 @@ static irqreturn_t mace_interrupt(int ir
     spin_lock_irqsave(&mp->lock, flags);
     intr = in_8(&mb->ir); /* read interrupt register */
     in_8(&mb->xmtrc);     /* get retries */
-    mace_handle_misc_intrs(mp, intr);
+    mace_handle_misc_intrs(mp, intr, dev);
 
     i = mp->tx_empty;
     while (in_8(&mb->pr) & XMTSV) {
@@ -682,7 +682,7 @@ static irqreturn_t mace_interrupt(int ir
 	 */
 	intr = in_8(&mb->ir);
 	if (intr != 0)
-	    mace_handle_misc_intrs(mp, intr);
+	    mace_handle_misc_intrs(mp, intr, dev);
 	if (mp->tx_bad_runt) {
 	    fs = in_8(&mb->xmtfs);
 	    mp->tx_bad_runt = 0;
@@ -817,7 +817,7 @@ static void mace_tx_timeout(unsigned lon
 	goto out;
 
     /* update various counters */
-    mace_handle_misc_intrs(mp, in_8(&mb->ir));
+    mace_handle_misc_intrs(mp, in_8(&mb->ir), dev);
 
     cp = mp->tx_cmds + NCMDS_TX * mp->tx_empty;
 


Re: [PATCH] fix mace_handle_misc_intrs compilation

2007-09-24 Thread Andrew Morton
On Mon, 24 Sep 2007 10:15:01 +0200 Olaf Hering [EMAIL PROTECTED] wrote:

 Fix compilation after incomplete struct net_device changes.

yup, thanks, Kamalesh Babulal has already sent in an identical
patch.


Re: change the way e1000 is handling short VLAN frames

2007-09-24 Thread Emil Micek
Thanks for your replies.

We contacted Arris (the manufacturer of our CMTS) about this issue, with
links to the relevant parts of the specification about the minimum size of
VLAN-tagged frames, and it seems they'll acknowledge the problem and fix it
in the next firmware.

In the meantime I tried the different suggestions posted here (the patch
from Chris, replacing ETH_ZLEN with VLAN_ETH_ZLEN in e1000_main.c, and
disabling HW VLAN tagging in netdev->features) and combinations of them,
but none of them worked and 64B VLAN frames are still generated by our
Intel NIC.

I'll now consider this issue closed on the e1000 and Intel NIC side,
because there was no problem with them in the first place, and I'll
concentrate instead on resolving it on the CMTS side, which doesn't act
according to spec.

Cheers,

Emil.



[PATCH] Clean up redundant PHY write line for ULi526x Ethernet driver

2007-09-24 Thread Zang Roy-r61911
From: Roy Zang [EMAIL PROTECTED]

Clean up redundant PHY write line for ULi526x Ethernet
Driver.

Signed-off-by: Roy Zang [EMAIL PROTECTED]
---
 drivers/net/tulip/uli526x.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tulip/uli526x.c b/drivers/net/tulip/uli526x.c
index ca2548e..53a8e65 100644
--- a/drivers/net/tulip/uli526x.c
+++ b/drivers/net/tulip/uli526x.c
@@ -1512,7 +1512,6 @@ static void uli526x_process_mode(struct uli526x_board_info *db)
 		case ULI526X_100MFD: phy_reg = 0x2100; break;
 		}
 		phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);
-		phy_write(db->ioaddr, db->phy_addr, 0, phy_reg, db->chip_id);
 		}
 	}
 }
-- 
1.5.2






[RFC PATCH 3/3] [TCP] MIB: Count FRTO's successfully detected spurious RTOs

2007-09-24 Thread Ilpo Järvinen
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/snmp.h |1 +
 net/ipv4/proc.c  |1 +
 net/ipv4/tcp_input.c |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/snmp.h b/include/linux/snmp.h
index d8fd3ec..89f0c2b 100644
--- a/include/linux/snmp.h
+++ b/include/linux/snmp.h
@@ -213,6 +213,7 @@ enum
LINUX_MIB_TCPSACKDISCARD,   /* TCPSACKDiscard */
LINUX_MIB_TCPDSACKIGNOREDOLD,   /* TCPSACKIgnoredOld */
LINUX_MIB_TCPDSACKIGNOREDNOUNDO,/* TCPSACKIgnoredNoUndo */
+   LINUX_MIB_TCPSPURIOUSRTOS,  /* TCPSpuriousRTOs */
__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 2015148..9dee70e 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -245,6 +245,7 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM(TCPSACKDiscard, LINUX_MIB_TCPSACKDISCARD),
SNMP_MIB_ITEM(TCPDSACKIgnoredOld, LINUX_MIB_TCPDSACKIGNOREDOLD),
SNMP_MIB_ITEM(TCPDSACKIgnoredNoUndo, LINUX_MIB_TCPDSACKIGNOREDNOUNDO),
+   SNMP_MIB_ITEM(TCPSpuriousRTOs, LINUX_MIB_TCPSPURIOUSRTOS),
SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ae06b94..259f517 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2935,6 +2935,7 @@ static int tcp_process_frto(struct sock *sk, int flag)
}
 	tp->frto_counter = 0;
 	tp->undo_marker = 0;
+   NET_INC_STATS_BH(LINUX_MIB_TCPSPURIOUSRTOS);
}
return 0;
 }
-- 
1.5.0.6



[RFC PATCH 2/3] [TCP]: Reordered ACK's (old) SACKs not included to discarded MIB

2007-09-24 Thread Ilpo Järvinen
In case of ACK reordering, the SACK block might be valid in its
time but is already obsolete, since we've received another kind
of confirmation about the arrival of the segments through the
snd_una advancement of an earlier packet.

I didn't bother to distinguish valid from invalid reordered SACK
blocks, but simply made reordered SACK blocks that are too old
never be counted, regardless of their real validity, which could
be determined by using the ack field of the reordered packet
(it won't be significant IMHO).

DSACKs can very well be considered useful even in this situation,
so we won't do any of this for them.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 532288b..ae06b94 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1250,8 +1250,13 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 					NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDNOUNDO);
 				else
 					NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDOLD);
-			} else
+			} else {
+				/* Don't count olds caused by ACK reordering */
+				if ((TCP_SKB_CB(ack_skb)->ack_seq != tp->snd_una) &&
+				    !after(end_seq, tp->snd_una))
+					continue;
 				NET_INC_STATS_BH(LINUX_MIB_TCPSACKDISCARD);
+			}
 				continue;
 			}
 
-- 
1.5.0.6



[PATCH 1/3] [TCP]: Re-place highest_sack check to a more robust position

2007-09-24 Thread Ilpo Järvinen
I previously added the check at a position that is rather poor, as the
state has already been adjusted quite a bit by then. Re-placing it above
all the state changes should be more robust, though the return should
never get executed regardless of its place :-).

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cbb83ac..94c8011 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1718,6 +1718,10 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 	BUG_ON(tcp_skb_pcount(skb) != 1 ||
 	       tcp_skb_pcount(next_skb) != 1);
 
+	if (WARN_ON(tp->sacked_out &&
+	    (TCP_SKB_CB(next_skb)->seq == tp->highest_sack)))
+		return;
+
 	/* Ok.  We will be able to collapse the packet. */
 	tcp_unlink_write_queue(next_skb, sk);
 
@@ -1734,10 +1738,6 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 	/* Update sequence range on original skb. */
 	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
 
-	if (WARN_ON(tp->sacked_out &&
-	    (TCP_SKB_CB(next_skb)->seq == tp->highest_sack)))
-		return;
-
 	/* Merge over control information. */
 	flags |= TCP_SKB_CB(next_skb)->flags; /* This moves PSH/FIN etc. over */
 	TCP_SKB_CB(skb)->flags = flags;
-- 
1.5.0.6



[PATCH net-2.6.24 0/3]: TCP tweaks

2007-09-24 Thread Ilpo Järvinen
Hi Dave,

Here are some minor tweaks to the recent net-2.6.24 changes, more or
less mandatory. The first one is the most important, because if the
not-supposed-to-happen case occurs, TCP would be left in an
inconsistent state. As regards the second, I'm fine with either way.
The third one's MIB for FRTO is sort of nice to have but definitely
not mandatory. I'm leaving the decision about those two to you; if
you're fine with them, feel free to apply :-).
  
-- 
 i.




Re: incorrect cksum with tcp/udp on lo with 2.6.20/2.6.21/2.6.22

2007-09-24 Thread Krzysztof Oledzki



On Mon, 24 Sep 2007, Herbert Xu wrote:


On Sun, Sep 23, 2007 at 11:18:58PM +0200, Krzysztof Oledzki wrote:


Thank you for the information. Is there any easy way to turn them on? I
need it for LVS.


Do you really need it?


Yes. I would like to use a LVS redirector as both a client and a director:
 
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.LVS-DR.html#director_as_client_in_LVS-DR


The packets should be checksummed at the point where they physically
leave the host.


So, with DR mode, the packet goes via the lo device (with a bad checksum)
and then gets redirected outside. Unfortunately, when it leaves the host
it has a bad checksum, too. :(


Best regards,

Krzysztof Olędzki

[PATCH 3/5] [TCP]: Convert highest_sack to sk_buff to allow direct access

2007-09-24 Thread Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= [EMAIL PROTECTED]

It is going to replace the sack fastpath hint quite soon... :-)

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h   |6 --
 include/net/tcp.h |   13 +
 net/ipv4/tcp_input.c  |   12 ++--
 net/ipv4/tcp_output.c |   19 ++-
 4 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f8cf090..1d6be2a 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -332,8 +332,10 @@ struct tcp_sock {
 
struct tcp_sack_block_wire recv_sack_cache[4];
 
-	u32	highest_sack;	/* Start seq of globally highest revd SACK
-				 * (validity guaranteed only if sacked_out > 0) */
+	struct sk_buff *highest_sack;	/* highest skb with SACK received
+					 * (validity guaranteed only if
+					 * sacked_out > 0)
+					 */
 
/* from STCP, retrans queue hinting */
struct sk_buff* lost_skb_hint;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 991ccdc..8bc64b7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1308,6 +1308,19 @@ static inline int tcp_write_queue_empty(struct sock *sk)
 	return skb_queue_empty(&sk->sk_write_queue);
 }
 
+/* Start sequence of the highest skb with SACKed bit, valid only if
+ * sacked > 0 or when the caller has ensured validity by itself.
+ */
+static inline u32 tcp_highest_sack_seq(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (WARN_ON(!tp->sacked_out &&
+		    tp->highest_sack != tcp_write_queue_head(sk)))
+		return tp->snd_una;
+	return TCP_SKB_CB(tp->highest_sack)->seq;
+}
+
+
 /* /proc */
 enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 76e9c9b..85dd4b0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1143,10 +1143,11 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 	return dup_sack;
 }
 
-static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
+static void tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
 			    struct tcp_sacktag_state *state, int in_sack,
 			    int dup_sack, int fack_count, u32 end_seq)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
 	u8 sacked = TCP_SKB_CB(skb)->sacked;
 
 	/* Account D-SACK for retransmitted packet. */
@@ -1231,9 +1232,8 @@ static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
 		if (fack_count > tp->fackets_out)
 			tp->fackets_out = fack_count;
 
-		if (after(TCP_SKB_CB(skb)->seq,
-			  tp->highest_sack))
-			tp->highest_sack = TCP_SKB_CB(skb)->seq;
+		if (after(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(sk)))
+			tp->highest_sack = skb;
 	} else {
 		if (dup_sack && (sacked&TCPCB_RETRANS))
 			state->reord = min(fack_count, state->reord);
@@ -1271,7 +1271,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 
 	if (!tp->sacked_out) {
 		tp->fackets_out = 0;
-		tp->highest_sack = tp->snd_una;
+		tp->highest_sack = tcp_write_queue_head(sk);
 	}
 
 	found_dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
@@ -1424,7 +1424,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 
 			fack_count += pcount;
 
-			tcp_sacktag_one(skb, tp, state, in_sack,
+			tcp_sacktag_one(skb, sk, state, in_sack,
 					dup_sack, fack_count, end_seq);
 		}
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 94c8011..fd51692 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -657,13 +657,15 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb, unsigned
  * tweak SACK fastpath hint too as it would overwrite all changes unless
  * hint is also changed.
  */
-static void tcp_adjust_fackets_out(struct tcp_sock *tp, struct sk_buff *skb,
+static void tcp_adjust_fackets_out(struct sock *sk, struct sk_buff *skb,
 				   int decr)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+
 	if (!tp->sacked_out)
 		return;
 
-	if (!before(tp->highest_sack, TCP_SKB_CB(skb)->seq))
+	if (!before(tcp_highest_sack_seq(sk), TCP_SKB_CB(skb)->seq))
 		tp->fackets_out -= decr;
 
 	/* cnt_hint is off-by-one compared with fackets_out (see sacktag) */
@@ -712,8 +714,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	TCP_SKB_CB(buff)->end_seq = 

[RFC PATCH net-2.6.24 0/5]: TCP sacktag cache usage recoded

2007-09-24 Thread Ilpo Järvinen
Hi all,

After a couple of wrong-way before/after()s and one infinitely
loopy version, here's the current trial version of a sacktag
cache usage recode.

The first two patches come from tcp-2.6 (rebased and rotated).
This series applies cleanly only on top of the other three patch
series I posted earlier today. The last debug patch provides
some statistics for those interested enough.

Dave, please DO NOT apply! ...Some thoughts could be nice
though :-).

It should considerably improve the processing of likely events such as
cumulative ACKs and a new SACK above the forward hole, because a full
walk is not necessary anymore (the old code could have been tweaked to
cover them, but it's better to drop each special-case handling
altogether and do a generic solution). The redundancy of the fastpath
hints and the highest_sack stuff is also addressed; however, that might
have only a slight effect, as the hint could occasionally point to
something lower than highest_sack; whether that's significant remains
to be seen... In all cases except a hint below highest_sack, the new
solution should perform at least as well as the old code (with a bit
larger constant, though no additional cache misses) because the SACK
blocks the old code chose not to process should all fall into the
LINUX_MIB_TCP_FULLSKIP category.

It's possible to improve it easily with RB-tree stuff, though this
version is based on code using linked lists. I'm not yet too sure
that I got everything 100% correct, as I tweak start/end_seqs and
exit skb loops in a way that is prone to off-by-one errors, so I
could miss an skb here and there. I'll probably also recode DSACK
handling to avoid recursion.
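
As an aside, here is a toy illustration (plain userspace C, not the
kernel code) of the range-cache idea described above: given the SACK
blocks remembered from the previous ACK, only the sub-ranges of the
current block that were not already covered need the expensive per-skb
walk. Sequence-number wraparound and all other TCP details are ignored,
and every name here is made up for the example.

/* Toy model only: 'cache' holds the sorted, non-overlapping blocks seen
 * on the previous ACK; 'cur' is one block from the current ACK.  Emit
 * the parts of 'cur' that the cache does not already cover.
 */
#include <stdio.h>

struct sack_block { unsigned int start, end; };

static int new_ranges(const struct sack_block *cur,
		      const struct sack_block *cache, int ncache,
		      struct sack_block *out)
{
	unsigned int pos = cur->start;
	int i, n = 0;

	for (i = 0; i < ncache && pos < cur->end; i++) {
		if (cache[i].end <= pos || cache[i].start >= cur->end)
			continue;		/* no overlap with this cached block */
		if (cache[i].start > pos) {	/* gap before the cached block */
			out[n].start = pos;
			out[n].end = cache[i].start;
			n++;
		}
		pos = cache[i].end;		/* skip the already-seen part */
	}
	if (pos < cur->end) {			/* tail not covered by the cache */
		out[n].start = pos;
		out[n].end = cur->end;
		n++;
	}
	return n;
}

int main(void)
{
	struct sack_block cache[] = { { 100, 200 }, { 300, 400 } };
	struct sack_block cur = { 150, 450 };
	struct sack_block out[4];
	int i, n = new_ranges(&cur, cache, 2, out);

	for (i = 0; i < n; i++)
		printf("walk %u-%u\n", out[i].start, out[i].end);
	return 0;	/* prints: walk 200-300, walk 400-450 */
}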

Stephen, Sangtae, and others experiencing those unexpected RTOs
during recovery with large-windowed TCP: could you please give it
a try and see if it helps any...

--
 i.
   
ps. Our net-2.6.24 (and mainline?) DSACK processing code could
be broken, btw. A DSACK in the middle of another SACK block during
the in-order walk seems not to be processed at all, as the earlier
(sorted) block caused the skb to advance past it already? (This just
occurred to me; I'll see what I can do about it if I have enough
time.)




[PATCH 1/5] [TCP]: Create tcp_sacktag_state.

2007-09-24 Thread Ilpo Järvinen
From: David S. Miller [EMAIL PROTECTED]

It is difficult to break out the inner-logic of
tcp_sacktag_write_queue() into worker functions because
so many local variables get updated in-place.

Start to overcome this by creating a structure block
of state variables that can be passed around into
worker routines.

[I made minor tweaks due to rebase/reordering of stuff
in the tcp-2.6 tree, and dropped found_dup_sack & dup_sack
from state -ij]

Signed-off-by: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   89 ++---
 1 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 259f517..04ff465 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1102,6 +1102,13 @@ static int tcp_is_sackblock_valid(struct tcp_sock *tp, int is_dsack,
 	return !before(start_seq, end_seq - tp->max_window);
 }
 
+struct tcp_sacktag_state {
+   unsigned int flag;
+   int reord;
+   int prior_fackets;
+   u32 lost_retrans;
+   int first_sack_index;
+};
 
 static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
   struct tcp_sack_block_wire *sp, int num_sacks,
@@ -1146,25 +1153,22 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
 	struct sk_buff *cached_skb;
 	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
-	int reord = tp->packets_out;
-	int prior_fackets;
-	u32 lost_retrans = 0;
-	int flag = 0;
-	int found_dup_sack = 0;
+	struct tcp_sacktag_state state;
+	int found_dup_sack;
 	int cached_fack_count;
 	int i;
-	int first_sack_index;
+	int force_one_sack;
+
+	state.flag = 0;
 
 	if (!tp->sacked_out) {
 		tp->fackets_out = 0;
 		tp->highest_sack = tp->snd_una;
 	}
-	prior_fackets = tp->fackets_out;
 
-	found_dup_sack = tcp_check_dsack(tp, ack_skb, sp,
-					 num_sacks, prior_snd_una);
+	found_dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
 	if (found_dup_sack)
-		flag |= FLAG_DSACKING_ACK;
+		state.flag |= FLAG_DSACKING_ACK;
 
/* Eliminate too old ACKs, but take into
 * account more or less fresh ones, they can
@@ -1177,18 +1181,18 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	 * if the only SACK change is the increase of the end_seq of
 	 * the first block then only apply that SACK block
 	 * and use retrans queue hinting otherwise slowpath */
-	flag = 1;
+	force_one_sack = 1;
 	for (i = 0; i < num_sacks; i++) {
 		__be32 start_seq = sp[i].start_seq;
 		__be32 end_seq = sp[i].end_seq;
 
 		if (i == 0) {
 			if (tp->recv_sack_cache[i].start_seq != start_seq)
-				flag = 0;
+				force_one_sack = 0;
 		} else {
 			if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
 			    (tp->recv_sack_cache[i].end_seq != end_seq))
-				flag = 0;
+				force_one_sack = 0;
 		}
 		tp->recv_sack_cache[i].start_seq = start_seq;
 		tp->recv_sack_cache[i].end_seq = end_seq;
@@ -1199,8 +1203,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 		tp->recv_sack_cache[i].end_seq = 0;
 	}
 
-	first_sack_index = 0;
-	if (flag)
+	state.first_sack_index = 0;
+	if (force_one_sack)
 		num_sacks = 1;
 	else {
 		int j;
@@ -1218,17 +1222,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 				sp[j+1] = tmp;
 
 				/* Track where the first SACK block goes to */
-				if (j == first_sack_index)
-					first_sack_index = j+1;
+				if (j == state.first_sack_index)
+					state.first_sack_index = j+1;
 			}
 
 		}
 	}
 }
 
-	/* clear flag as used for different purpose in following code */
-	flag = 0;
-
 	/* Use SACK fastpath hint if valid */
 	cached_skb = tp->fastpath_skb_hint;
 	cached_fack_count = tp->fastpath_cnt_hint;
@@ -1237,12 +1238,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 		cached_fack_count = 0;
 	}
 
+	state.reord = 

[PATCH 2/5] [TCP]: Create tcp_sacktag_one().

2007-09-24 Thread Ilpo Järvinen
From: David S. Miller [EMAIL PROTECTED]

Worker function that implements the main logic of
the inner-most loop of tcp_sacktag_write_queue().

Signed-off-by: David S. Miller [EMAIL PROTECTED]
Acked-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |  213 ++
 1 files changed, 110 insertions(+), 103 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 04ff465..76e9c9b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1143,6 +1143,114 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 	return dup_sack;
 }
 
+static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
+			    struct tcp_sacktag_state *state, int in_sack,
+			    int dup_sack, int fack_count, u32 end_seq)
+{
+	u8 sacked = TCP_SKB_CB(skb)->sacked;
+
+	/* Account D-SACK for retransmitted packet. */
+	if ((dup_sack && in_sack) &&
+	    (sacked & TCPCB_RETRANS) &&
+	    after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
+		tp->undo_retrans--;
+
+	/* The frame is ACKed. */
+	if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) {
+		if (sacked & TCPCB_RETRANS) {
+			if ((dup_sack && in_sack) &&
+			    (sacked & TCPCB_SACKED_ACKED))
+				state->reord = min(fack_count, state->reord);
+		} else {
+			/* If it was in a hole, we detected reordering. */
+			if (fack_count < state->prior_fackets &&
+			    !(sacked & TCPCB_SACKED_ACKED))
+				state->reord = min(fack_count, state->reord);
+		}
+
+		/* Nothing to do; acked frame is about to be dropped. */
+		return;
+	}
+
+	if ((sacked & TCPCB_SACKED_RETRANS) &&
+	    after(end_seq, TCP_SKB_CB(skb)->ack_seq) &&
+	    (!state->lost_retrans || after(end_seq, state->lost_retrans)))
+		state->lost_retrans = end_seq;
+
+	if (!in_sack)
+		return;
+
+	if (!(sacked & TCPCB_SACKED_ACKED)) {
+		if (sacked & TCPCB_SACKED_RETRANS) {
+			/* If the segment is not tagged as lost,
+			 * we do not clear RETRANS, believing
+			 * that retransmission is still in flight.
+			 */
+			if (sacked & TCPCB_LOST) {
+				TCP_SKB_CB(skb)->sacked &=
+					~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
+				tp->lost_out -= tcp_skb_pcount(skb);
+				tp->retrans_out -= tcp_skb_pcount(skb);
+
+				/* clear lost hint */
+				tp->retransmit_skb_hint = NULL;
+			}
+		} else {
+			/* New sack for not retransmitted frame,
+			 * which was in hole. It is reordering.
+			 */
+			if (!(sacked & TCPCB_RETRANS) &&
+			    fack_count < state->prior_fackets)
+				state->reord = min(fack_count, state->reord);
+
+			if (sacked & TCPCB_LOST) {
+				TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
+				tp->lost_out -= tcp_skb_pcount(skb);
+
+				/* clear lost hint */
+				tp->retransmit_skb_hint = NULL;
+			}
+			/* SACK enhanced F-RTO detection.
+			 * Set flag if and only if non-rexmitted
+			 * segments below frto_highmark are
+			 * SACKed (RFC4138; Appendix B).
+			 * Clearing correct due to in-order walk
+			 */
+			if (after(end_seq, tp->frto_highmark)) {
+				state->flag &= ~FLAG_ONLY_ORIG_SACKED;
+			} else {
+				if (!(sacked & TCPCB_RETRANS))
+					state->flag |= FLAG_ONLY_ORIG_SACKED;
+			}
+		}
+
+		TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
+		state->flag |= FLAG_DATA_SACKED;
+		tp->sacked_out += tcp_skb_pcount(skb);
+
+		if (fack_count > tp->fackets_out)
+			tp->fackets_out = fack_count;
+
+		if (after(TCP_SKB_CB(skb)->seq,
+			  tp->highest_sack))
+			tp->highest_sack = TCP_SKB_CB(skb)->seq;
+	} else {
+		if (dup_sack && (sacked&TCPCB_RETRANS))
+			state->reord = min(fack_count, state->reord);
+	}
+
+	/* D-SACK. We can detect redundant retransmission
+	 * in S|R and plain R frames and clear 

[RFC PATCH 4/5] [TCP]: Rewrite sack_recv_cache (WIP)

2007-09-24 Thread Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= [EMAIL PROTECTED]

Previously a number of cases in TCP SACK processing failed to take
advantage of costly stored information in sack_recv_cache. Most
importantly, expected events such as cumulative ACKs, new hole
ACKs and the first ACK after RTO fall into this category. Processing
of such ACKs results in rather long walks that build up latencies
(which easily get nasty when the window is large), which are
completely unnecessary; usually no new information was gathered
except the new SACK block above the hole in the respective case.

Since the inclusion of highest_sack, there's a lot of information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same the whole time (in all unearthly
scenarios). Take advantage of this too and drop the fastpath hint.
Effectively this drops the special-cased fastpath. This change
adds some complexity to introduce a fastpath with better coverage.
The current ACK's SACK blocks are compared against each cached
block individually and only the ranges that are new are then scanned
by the high-constant walk. For other parts of the write queue, even
when inside a previously known part of the SACK blocks, a faster skip
function is used. In addition, whenever possible, TCP
fast-forwards to the highest_sack skb that was made available
earlier. In the typical case, nothing but this fast-forward
and the mandatory markings after it occur, making the access
pattern quite similar to the former fastpath. DSACKs are a special
case that must always be walked.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h   |4 +-
 include/net/tcp.h |1 -
 net/ipv4/tcp_input.c  |  320 ++--
 net/ipv4/tcp_output.c |   12 +--
 4 files changed, 202 insertions(+), 135 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 1d6be2a..8d91eac 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -330,7 +330,7 @@ struct tcp_sock {
struct tcp_sack_block duplicate_sack[1]; /* D-SACK block */
struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/
 
-   struct tcp_sack_block_wire recv_sack_cache[4];
+   struct tcp_sack_block recv_sack_cache[4];
 
struct sk_buff *highest_sack;   /* highest skb with SACK received
 * (validity guaranteed only if
@@ -343,9 +343,7 @@ struct tcp_sock {
struct sk_buff *scoreboard_skb_hint;
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
-   struct sk_buff *fastpath_skb_hint;
 
-   int fastpath_cnt_hint;
int lost_cnt_hint;
int retransmit_cnt_hint;
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8bc64b7..d5def9b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1078,7 +1078,6 @@ static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp)
 static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
 {
 	tcp_clear_retrans_hints_partial(tp);
-	tp->fastpath_skb_hint = NULL;
 }
 
 /* MD5 Signature */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 85dd4b0..9dfdd67 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1106,11 +1106,15 @@ struct tcp_sacktag_state {
unsigned int flag;
int reord;
int prior_fackets;
+   int fack_count;
u32 lost_retrans;
-   int first_sack_index;
+   u32 dup_start;
+   u32 dup_end;
 };
 
-static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
+static int tcp_check_dsack(struct tcp_sock *tp,
+  struct tcp_sacktag_state *state,
+  struct sk_buff *ack_skb,
   struct tcp_sack_block_wire *sp, int num_sacks,
   u32 prior_snd_una)
 {
@@ -1120,6 +1124,8 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 
 	if (before(start_seq_0, TCP_SKB_CB(ack_skb)->ack_seq)) {
 		dup_sack = 1;
+		state->dup_start = start_seq_0;
+		state->dup_end = end_seq_0;
 		tcp_dsack_seen(tp);
 		NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
 	} else if (num_sacks > 1) {
@@ -1129,6 +1135,8 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 		if (!after(end_seq_0, end_seq_1) &&
 		    !before(start_seq_0, start_seq_1)) {
 			dup_sack = 1;
+			state->dup_start = start_seq_1;
+			state->dup_end = end_seq_1;
 			tcp_dsack_seen(tp);
 			NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
 		}
@@ -1251,6 +1259,104 @@ static void tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
 	}
 }
 
+static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk,
+ 

[DEVELOPER PATCH 5/5] [TCP]: Track sacktag

2007-09-24 Thread Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= [EMAIL PROTECTED]

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/snmp.h |   20 +++
 net/ipv4/proc.c  |   20 +++
 net/ipv4/tcp_input.c |   52 +++--
 3 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/include/linux/snmp.h b/include/linux/snmp.h
index 89f0c2b..42b8c07 100644
--- a/include/linux/snmp.h
+++ b/include/linux/snmp.h
@@ -214,6 +214,26 @@ enum
LINUX_MIB_TCPDSACKIGNOREDOLD,   /* TCPSACKIgnoredOld */
LINUX_MIB_TCPDSACKIGNOREDNOUNDO,/* TCPSACKIgnoredNoUndo */
LINUX_MIB_TCPSPURIOUSRTOS,  /* TCPSpuriousRTOs */
+   LINUX_MIB_TCP_SACKTAG,
+   LINUX_MIB_TCP_SACK0,
+   LINUX_MIB_TCP_SACK1,
+   LINUX_MIB_TCP_SACK2,
+   LINUX_MIB_TCP_SACK3,
+   LINUX_MIB_TCP_SACK4,
+   LINUX_MIB_TCP_WALKEDSKBS,
+   LINUX_MIB_TCP_WALKEDDSACKS,
+   LINUX_MIB_TCP_SKIPPEDSKBS,
+   LINUX_MIB_TCP_NOCACHE,
+   LINUX_MIB_TCP_FULLWALK,
+   LINUX_MIB_TCP_HEADWALK,
+   LINUX_MIB_TCP_TAILWALK,
+   LINUX_MIB_TCP_FULLSKIP,
+   LINUX_MIB_TCP_TAILSKIP,
+   LINUX_MIB_TCP_HEADSKIP,
+   LINUX_MIB_TCP_FULLSKIP_TOHIGH,
+   LINUX_MIB_TCP_TAILSKIP_TOHIGH,
+   LINUX_MIB_TCP_NEWSKIP,
+   LINUX_MIB_TCP_CACHEREMAINING,
__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9dee70e..4808f82 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -246,6 +246,26 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM(TCPDSACKIgnoredOld, LINUX_MIB_TCPDSACKIGNOREDOLD),
SNMP_MIB_ITEM(TCPDSACKIgnoredNoUndo, LINUX_MIB_TCPDSACKIGNOREDNOUNDO),
SNMP_MIB_ITEM(TCPSpuriousRTOs, LINUX_MIB_TCPSPURIOUSRTOS),
+   SNMP_MIB_ITEM(TCP_SACKTAG, LINUX_MIB_TCP_SACKTAG),
+   SNMP_MIB_ITEM(TCP_SACK0, LINUX_MIB_TCP_SACK0),
+   SNMP_MIB_ITEM(TCP_SACK1, LINUX_MIB_TCP_SACK1),
+   SNMP_MIB_ITEM(TCP_SACK2, LINUX_MIB_TCP_SACK2),
+   SNMP_MIB_ITEM(TCP_SACK3, LINUX_MIB_TCP_SACK3),
+   SNMP_MIB_ITEM(TCP_SACK4, LINUX_MIB_TCP_SACK4),
+   SNMP_MIB_ITEM(TCP_WALKEDSKBS, LINUX_MIB_TCP_WALKEDSKBS),
+   SNMP_MIB_ITEM(TCP_WALKEDDSACKS, LINUX_MIB_TCP_WALKEDDSACKS),
+   SNMP_MIB_ITEM(TCP_SKIPPEDSKBS, LINUX_MIB_TCP_SKIPPEDSKBS),
+   SNMP_MIB_ITEM(TCP_NOCACHE, LINUX_MIB_TCP_NOCACHE),
+   SNMP_MIB_ITEM(TCP_FULLWALK, LINUX_MIB_TCP_FULLWALK),
+   SNMP_MIB_ITEM(TCP_HEADWALK, LINUX_MIB_TCP_HEADWALK),
+   SNMP_MIB_ITEM(TCP_TAILWALK, LINUX_MIB_TCP_TAILWALK),
+   SNMP_MIB_ITEM(TCP_FULLSKIP, LINUX_MIB_TCP_FULLSKIP),
+   SNMP_MIB_ITEM(TCP_TAILSKIP, LINUX_MIB_TCP_TAILSKIP),
+   SNMP_MIB_ITEM(TCP_HEADSKIP, LINUX_MIB_TCP_HEADSKIP),
+   SNMP_MIB_ITEM(TCP_FULLSKIP_TOHIGH, LINUX_MIB_TCP_FULLSKIP_TOHIGH),
+   SNMP_MIB_ITEM(TCP_TAILSKIP_TOHIGH, LINUX_MIB_TCP_TAILSKIP_TOHIGH),
+   SNMP_MIB_ITEM(TCP_NEWSKIP, LINUX_MIB_TCP_NEWSKIP),
+   SNMP_MIB_ITEM(TCP_CACHEREMAINING, LINUX_MIB_TCP_CACHEREMAINING),
SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9dfdd67..38045c8 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1306,6 +1306,10 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff 
*skb, struct sock *sk,
 
 	tcp_sacktag_one(skb, sk, state, in_sack, dup_sack,
 			state->fack_count, end_seq);
+
+   NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDSKBS);
+   if (dup_sack)
+   NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDDSACKS);
}
return skb;
 }
@@ -1329,6 +1333,7 @@ static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
 		skb = tcp_sacktag_walk(skb, sk, state, state->dup_start,
 				       state->dup_end, 1);
 	}
+   NET_INC_STATS_BH(LINUX_MIB_TCP_SKIPPEDSKBS);
}
return skb;
 }
@@ -1458,9 +1463,22 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 		cache++;
 	}
 
+   NET_INC_STATS_BH(LINUX_MIB_TCP_SACKTAG);
+   switch (used_sacks) {
+   case 0: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK0); break;
+   case 1: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK1); break;
+   case 2: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK2); break;
+   case 3: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK3); break;
+   case 4: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK4); break;
+   }
+
+   if (!tcp_sack_cache_ok(tp, cache))
+   NET_INC_STATS_BH(LINUX_MIB_TCP_NOCACHE);
+
 	while (i < used_sacks) {
 		u32 start_seq = sp[i].start_seq;
 		u32 end_seq = sp[i].end_seq;
+		int fullwalk = 0;
 
 		/* Event B in the comment above. */
 		if (after(end_seq, tp->high_seq))
@@ -1473,41 +1491,69 @@ 

Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig

2007-09-24 Thread Mel Gorman
On (22/09/07 12:24), Satyam Sharma didst pronounce:
 
 
 On Thu, 20 Sep 2007, Satyam Sharma wrote:
  
  BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
  IIRC I got build failures in:
 
  drivers/net/spider_net.c
 
 
 [PATCH -mm] spider_net: Misc build fixes after recent netdev stats changes
 
 Unbreak the following:
 
 drivers/net/spider_net.c: In function 'spider_net_release_tx_chain':
 drivers/net/spider_net.c:818: error: 'dev' undeclared (first use in this 
 function)
 drivers/net/spider_net.c:818: error: (Each undeclared identifier is reported 
 only once
 drivers/net/spider_net.c:818: error: for each function it appears in.)
 drivers/net/spider_net.c: In function 'spider_net_xmit':
 drivers/net/spider_net.c:922: error: 'dev' undeclared (first use in this 
 function)
 drivers/net/spider_net.c: In function 'spider_net_pass_skb_up':
 drivers/net/spider_net.c:1018: error: 'dev' undeclared (first use in this 
 function)
 drivers/net/spider_net.c: In function 'spider_net_decode_one_descr':
 drivers/net/spider_net.c:1215: error: 'dev' undeclared (first use in this 
 function)
 make[2]: *** [drivers/net/spider_net.o] Error 1
 
 Signed-off-by: Satyam Sharma [EMAIL PROTECTED]

I've confirmed that this patch fixes the build error in question.

Acked-by: Mel Gorman [EMAIL PROTECTED]

 
 ---
 
  drivers/net/spider_net.c |   24 +++-
  1 file changed, 11 insertions(+), 13 deletions(-)
 
 diff -ruNp a/drivers/net/spider_net.c b/drivers/net/spider_net.c
 --- a/drivers/net/spider_net.c2007-09-22 06:26:39.0 +0530
 +++ b/drivers/net/spider_net.c2007-09-22 12:12:23.0 +0530
 @@ -795,6 +795,7 @@ spider_net_set_low_watermark(struct spid
  static int
  spider_net_release_tx_chain(struct spider_net_card *card, int brutal)
  {
 + struct net_device *dev = card->netdev;
   struct spider_net_descr_chain *chain = &card->tx_chain;
   struct spider_net_descr *descr;
   struct spider_net_hw_descr *hwdescr;
 @@ -919,7 +920,7 @@ spider_net_xmit(struct sk_buff *skb, str
   spider_net_release_tx_chain(card, 0);
  
   if (spider_net_prepare_tx_descr(card, skb) != 0) {
 - dev->stats.tx_dropped++;
 + netdev->stats.tx_dropped++;
   netif_stop_queue(netdev);
   return NETDEV_TX_BUSY;
   }
 @@ -979,16 +980,12 @@ static void
  spider_net_pass_skb_up(struct spider_net_descr *descr,
  struct spider_net_card *card)
  {
 - struct spider_net_hw_descr *hwdescr= descr->hwdescr;
 - struct sk_buff *skb;
 - struct net_device *netdev;
 - u32 data_status, data_error;
 -
 - data_status = hwdescr->data_status;
 - data_error = hwdescr->data_error;
 - netdev = card->netdev;
 + struct spider_net_hw_descr *hwdescr = descr->hwdescr;
 + struct sk_buff *skb = descr->skb;
 + struct net_device *netdev = card->netdev;
 + u32 data_status = hwdescr->data_status;
 + u32 data_error = hwdescr->data_error;
 
 - skb = descr->skb;
   skb_put(skb, hwdescr->valid_size);
  
   /* the card seems to add 2 bytes of junk in front
 @@ -1015,8 +1012,8 @@ spider_net_pass_skb_up(struct spider_net
   }
  
   /* update netdevice statistics */
 - dev->stats.rx_packets++;
 - dev->stats.rx_bytes += skb->len;
 + netdev->stats.rx_packets++;
 + netdev->stats.rx_bytes += skb->len;
  
   /* pass skb up to stack */
   netif_receive_skb(skb);
 @@ -1184,6 +1181,7 @@ static int spider_net_resync_tail_ptr(st
  static int
  spider_net_decode_one_descr(struct spider_net_card *card)
  {
 + struct net_device *dev = card->netdev;
   struct spider_net_descr_chain *chain = &card->rx_chain;
   struct spider_net_descr *descr = chain->tail;
   struct spider_net_hw_descr *hwdescr = descr->hwdescr;
 @@ -1210,7 +1208,7 @@ spider_net_decode_one_descr(struct spide
(status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
(status == SPIDER_NET_DESCR_FORCE_END) ) {
   if (netif_msg_rx_err(card))
 - dev_err(&card->netdev->dev,
 + dev_err(&dev->dev,
   "dropping RX descriptor with state %d\n", status);
   dev->stats.rx_dropped++;
   goto bad_desc;
 

-- 
-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: pktgen question

2007-09-24 Thread Steve Wise
I think pktgen should be cloning the skbs using skb_clone().  Then it 
will work for all devices, eh?



Ben Greear wrote:

jamal wrote:

On Sun, 2007-23-09 at 12:55 -0500, Steve Wise wrote:

 
It's a hack that breaks cxgb3 because cxgb3 uses the skb->cb area for
each skb passed down.  So cxgb3 is at fault then?  IE a driver cannot
use the skb->cb field if the users count is > 1?  Or maybe a driver
can _never_ use the cb field?



any layer can use the cb structure whichever way they wish. There are
violations, e.g.:
the vlan code also uses the cb field to pass vlan details for hardware
acceleration. How does pktgen affect it though? clone() will just copy
the cb field and pktgen doesn't touch it.
In retrospect, pktgen may have to use clone - cc'ing Robert Olsson.
  
Pktgen abuses the ref count, as far as I can tell.  It works with most
ethernet drivers, but the multi-pkt option will also fail with things
like vlans because the skb->dev is changed as the skb is transmitted
(to the lower-level device).

I'd say just don't use multi-pkt with pktgen on devices that can't
handle it.


Thanks,
Ben





Re: [ofa-general] Re: [PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister

2007-09-24 Thread Moni Shoua
Roland Dreier wrote:
   The action in bonding to a detach of slave is to unregister the master 
 (see patch 10).
   This can't be done from the context of unregister_netdevice itself (it is 
 protected by rtnl_lock).
 
 I'm confused.  Your patch has:
 
  +  ipoib_slave_detach(cpriv->dev);
     unregister_netdev(cpriv->dev);
 
 And ipoib_slave_detach() is:
 
   +static inline void ipoib_slave_detach(struct net_device *dev)
   +{
   +  rtnl_lock();
   +  netdev_slave_detach(dev);
   +  rtnl_unlock();
   +}
 
 so you are calling netdev_slave_detach() with the rtnl lock held.
 Why can't you make the same call from the start of unregister_netdevice()?
 
 Anyway, if the rtnl lock is a problem, can you just add the call to
 netdev_slave_detach() to unregister_netdev() before it takes the rtnl lock?
 
  - R.
 

Your comment made me do a little rethinking.
In bonding, the device is released by calling unregister_netdevice(), which
doesn't take the rtnl_lock (unlike unregister_netdev(), which does). I guess
this made me confused into thinking that this is not possible. So, I guess I
could put the detach notification in unregister_netdev(), and the reaction to
the notification in the bonding driver would not block.
However, I looked one more time at the code of unregister_netdevice() and
found out that nothing prevents calling unregister_netdevice() again when the
NETDEV_GOING_DOWN notification is sent. I tried that and it works.
I have a new set of patches without sending a slave detach and I will send
them soon.

Thanks for the comment Roland. It makes this patch simpler.

I'd also like to give credit to Jay for the idea of using the
NETDEV_GOING_DOWN notification instead of NETDEV_CHANGE+IFF_SLAVE_DETACH.
He suggested it a while ago but I wrongly thought that it wouldn't work.
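
For reference, a minimal sketch of reacting to NETDEV_GOING_DOWN through
a netdevice notifier. This is illustrative only, not the actual bonding
code; the handler and notifier_block names are made up. In 2.6-era
kernels the notifier's ptr argument is the struct net_device itself.

/* Illustrative only: react when a device is about to go down. */
static int sample_netdev_event(struct notifier_block *nb,
			       unsigned long event, void *ptr)
{
	struct net_device *dev = ptr;

	if (event == NETDEV_GOING_DOWN) {
		/* e.g. detach or unregister whatever is bound to 'dev'
		 * before the device actually goes away */
		printk(KERN_INFO "%s is going down\n", dev->name);
	}
	return NOTIFY_DONE;
}

static struct notifier_block sample_netdev_nb = {
	.notifier_call = sample_netdev_event,
};

/* registered once, e.g. at module init:
 *	register_netdevice_notifier(&sample_netdev_nb);
 */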



Re: pktgen question

2007-09-24 Thread Ben Greear

Steve Wise wrote:
I think pktgen should be cloning the skbs using skb_clone().  Then it 
will work for all devices, eh?
That might work, but it would decrease performance slightly (or, 
increase CPU load at least).


Maybe a new option:  multi_clone

Ben

--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com





Re: [PATCH] sb1250-mac: Driver model phylib update

2007-09-24 Thread Maciej W. Rozycki
On Fri, 21 Sep 2007, Andrew Morton wrote:

   A driver model and phylib update.
 
 akpm:/usr/src/25 diffstat patches/git-net.patch | tail -n 1
  1013 files changed, 187667 insertions(+), 23587 deletions(-)
 
 Sorry, but raising networking patches against Linus's crufty
 old mainline tree just isn't viable at present.

 Well, this is against Jeff's netdev-2.6 tree which hopefully is not as 
crufty as Linus's old mainline; if it is not possible to queue this change 
for 2.6.25 or suchlike, then I will try to resubmit later.  Thanks for 
your attention though.

  Maciej


Re: pktgen question

2007-09-24 Thread Steve Wise


Ben Greear wrote:

Steve Wise wrote:
I think pktgen should be cloning the skbs using skb_clone().  Then it 
will work for all devices, eh?
That might work, but it would decrease performance slightly (or, 
increase CPU load at least).


Perf-wise, you could clone the skbs up front, then deliver them to the 
nic in a tight loop.  This would mitigate the added overhead introduced 
by calling skb_clone() in the loop doing transmits...




Maybe a new option:  multi_clone



If the current code is busted, I think it should be fixed.

My 2 cents.

Steve.
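
A rough sketch of the "clone up front, then transmit in a tight loop"
idea. Again this is illustrative only and not pktgen code; the batch
size, the function name, and the assumption that the caller holds the
device's TX lock are all made up for the example.

#define BATCH	64	/* arbitrary batch size for the illustration */

static void xmit_batch(struct sk_buff *template, struct net_device *dev)
{
	struct sk_buff *batch[BATCH];
	int i, n = 0;

	/* clone outside the hot loop... */
	for (i = 0; i < BATCH; i++) {
		struct sk_buff *skb = skb_clone(template, GFP_ATOMIC);

		if (!skb)
			break;
		skb->dev = dev;
		batch[n++] = skb;
	}

	/* ...then push the batch to the NIC in a tight loop */
	for (i = 0; i < n; i++)
		if (dev->hard_start_xmit(batch[i], dev) != NETDEV_TX_OK)
			kfree_skb(batch[i]);	/* driver busy: drop the clone */
}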


[PATCH V6 0/9] net/bonding: ADD IPoIB support for the bonding driver

2007-09-24 Thread Moni Shoua
This patch series is the sixth version (see the link to V5 below) of the
suggested changes to the bonding driver so that it can support
non-ARPHRD_ETHER netdevices in its High-Availability (active-backup) mode.

Patches 1-8 were originally submitted in V5, and patch 9 is an addition by Jay.


Major changes from the previous version:


1. Remove the patches to net/core. Bonding will use the NETDEV_GOING_DOWN
   notification instead of NETDEV_CHANGE+IFF_SLAVE_DETACH. This reduces the
   number of patches from 11 to 9.

Links to earlier discussion:


1. A discussion in netdev about bonding support for IPoIB.
http://lists.openwall.net/netdev/2006/11/30/46

2. V5 series
http://lists.openfabrics.org/pipermail/general/2007-September/040996.html



[PATCH V6 1/9] IB/ipoib: Bound the net device to the ipoib_neigh structue

2007-09-24 Thread Moni Shoua
IPoIB uses a two-layer neighbouring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand on the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device; it changes this skb to point to the
slave device and calls the slave's hard_start_xmit function.

Under this scheme, the ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be cast to struct ipoib_dev_priv, is buggy.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour's dev, when n->dev->flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib.h   |4 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   24 +++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |3 ++-
 3 files changed, 20 insertions(+), 11 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h   2007-09-18 
17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-09-18 
17:09:26.534874404 +0200
@@ -328,6 +328,7 @@ struct ipoib_neigh {
struct sk_buff_head queue;
 
struct neighbour   *neighbour;
+   struct net_device *dev;
 
struct list_headlist;
 };
@@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip
 INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+ struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2007-09-18 
17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-09-18 
17:23:54.725744661 +0200
@@ -511,7 +511,7 @@ static void neigh_add_path(struct sk_buf
struct ipoib_path *path;
struct ipoib_neigh *neigh;
 
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
dev_kfree_skb_any(skb);
@@ -830,6 +830,13 @@ static void ipoib_neigh_cleanup(struct n
unsigned long flags;
struct ipoib_ah *ah = NULL;
 
+	neigh = *to_ipoib_neigh(n);
+	if (neigh) {
+		priv = netdev_priv(neigh->dev);
+		ipoib_dbg(priv, "neigh_destructor for bonding device: %s\n",
+			  n->dev->name);
+	} else
+		return;
 	ipoib_dbg(priv,
 		  "neigh_cleanup for %06x " IPOIB_GID_FMT "\n",
 		  IPOIB_QPN(n->ha),
@@ -837,13 +844,10 @@ static void ipoib_neigh_cleanup(struct n
 
 	spin_lock_irqsave(&priv->lock, flags);
 
-	neigh = *to_ipoib_neigh(n);
-	if (neigh) {
-		if (neigh->ah)
-			ah = neigh->ah;
-		list_del(&neigh->list);
-		ipoib_neigh_free(n->dev, neigh);
-	}
+	if (neigh->ah)
+		ah = neigh->ah;
+	list_del(&neigh->list);
+	ipoib_neigh_free(n->dev, neigh);
 
 	spin_unlock_irqrestore(&priv->lock, flags);
 
@@ -851,7 +855,8 @@ static void ipoib_neigh_cleanup(struct n
 		ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+ struct net_device *dev)
 {
struct ipoib_neigh *neigh;
 
@@ -860,6 +865,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
return NULL;
 
 	neigh->neighbour = neighbour;
+	neigh->dev = dev;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
 	ipoib_cm_set(neigh, NULL);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-09-18 
17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c  2007-09-18 
17:09:26.536874045 +0200
@@ -727,7 +727,8 @@ out:
 	if (skb->dst &&
 	    skb->dst->neighbour &&
 	    !*to_ipoib_neigh(skb->dst->neighbour)) {
-  

[PATCH V6 2/9] IB/ipoib: Verify address handle validity on send

2007-09-24 Thread Moni Shoua
When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happening.

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2007-09-18 
17:09:26.535874225 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-09-18 
17:10:22.375853147 +0200
@@ -686,9 +686,10 @@ static int ipoib_start_xmit(struct sk_bu
goto out;
}
 	} else if (neigh->ah) {
-		if (unlikely(memcmp(&neigh->dgid.raw,
 				    skb->dst->neighbour->ha + 4,
-				    sizeof(union ib_gid)))) {
+		if (unlikely((memcmp(&neigh->dgid.raw,
 				     skb->dst->neighbour->ha + 4,
+				     sizeof(union ib_gid))) ||
+			     (neigh->dev != dev))) {
 			spin_lock(&priv->lock);
 			/*
 * It's safe to call ipoib_put_ah() inside



[PATCH V6 3/9] net/bonding: Enable bonding to enslave non ARPHRD_ETHER

2007-09-24 Thread Moni Shoua
This patch changes some of the bond netdevice attributes and functions
to be those of the active slave for the case of the enslaved device not being
of ARPHRD_ETHER type. Basically it overrides the settings done by ether_setup(),
which are netdevice **type** dependent and hence might not be appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
of dissimilar ether types, as was concluded in the v1 discussion.

The IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a
3-byte IB QP (Queue Pair) number and the 16-byte IB port GID (Global ID) of the
port this IPoIB device is bound to. The QP is a resource created by the IB HW
and the GID is an identifier burned into the HCA (I have omitted some details
here which are not important for the bonding RFC).
---
 drivers/net/bonding/bond_main.c |   39 +++
 1 files changed, 39 insertions(+)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 
10:08:59.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:13.424688411 
+0300
@@ -1237,6 +1237,26 @@ static int bond_compute_features(struct 
return 0;
 }
 
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+   struct net_device *slave_dev)
+{
+	bond_dev->hard_header       = slave_dev->hard_header;
+	bond_dev->rebuild_header    = slave_dev->rebuild_header;
+	bond_dev->hard_header_cache = slave_dev->hard_header_cache;
+	bond_dev->header_cache_update = slave_dev->header_cache_update;
+	bond_dev->hard_header_parse = slave_dev->hard_header_parse;
+
+	bond_dev->neigh_setup       = slave_dev->neigh_setup;
+
+	bond_dev->type              = slave_dev->type;
+	bond_dev->hard_header_len   = slave_dev->hard_header_len;
+	bond_dev->addr_len          = slave_dev->addr_len;
+
+	memcpy(bond_dev->broadcast, slave_dev->broadcast,
+		slave_dev->addr_len);
+}
+
 /* enslave device slave to bond device master */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond
 		goto err_undo_flags;
 	}
 
+	/* set bonding device ether type by slave - bonding netdevices are
+	 * created with ether_setup, so when the slave type is not ARPHRD_ETHER
+	 * there is a need to override some of the type dependent attribs/funcs.
+	 *
+	 * bond ether type mutual exclusion - don't allow slaves of dissimilar
+	 * ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
+	 */
+	if (bond->slave_cnt == 0) {
+		if (slave_dev->type != ARPHRD_ETHER)
+			bond_setup_by_slave(bond_dev, slave_dev);
+	} else if (bond_dev->type != slave_dev->type) {
+		printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different "
+			"from other slaves (%d), can not enslave it.\n",
+			slave_dev->name,
+			slave_dev->type, bond_dev->type);
+		res = -EINVAL;
+		goto err_undo_flags;
+	}
+
 	if (slave_dev->set_mac_address == NULL) {
 		printk(KERN_ERR DRV_NAME
 			": %s: Error: The slave device you specified does 



[PATCH V6 4/9] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()

2007-09-24 Thread Moni Shoua
This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond MAC address is that
of the active slave, and remote peers are notified of the MAC address
(neighbour) change by a gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c |   87 +++-
 drivers/net/bonding/bonding.h   |1 
 2 files changed, 60 insertions(+), 28 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 
10:54:13.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:41.971632881 
+0300
@@ -1095,6 +1095,14 @@ void bond_change_active_slave(struct bon
if (new_active) {
bond_set_slave_active_flags(new_active);
}
+
+   /* when bonding does not set the slave MAC address, the bond MAC
+* address is the one of the active slave.
+*/
+   if (new_active  !bond-do_set_mac_addr)
+   memcpy(bond-dev-dev_addr,  new_active-dev-dev_addr,
+   new_active-dev-addr_len);
+
bond_send_gratuitous_arp(bond);
}
 }
@@ -1351,13 +1359,22 @@ int bond_enslave(struct net_device *bond
}
 
if (slave_dev-set_mac_address == NULL) {
-   printk(KERN_ERR DRV_NAME
-   : %s: Error: The slave device you specified does 
-   not support setting the MAC address. 
-   Your kernel likely does not support slave 
-   devices.\n, bond_dev-name);
-   res = -EOPNOTSUPP;
-   goto err_undo_flags;
+   if (bond-slave_cnt == 0) {
+   printk(KERN_WARNING DRV_NAME
+   : %s: Warning: The first slave device you 
+   specified does not support setting the MAC 
+   address. This bond MAC address would be that 
+   of the active slave.\n, bond_dev-name);
+   bond-do_set_mac_addr = 0;
+   } else if (bond-do_set_mac_addr) {
+   printk(KERN_ERR DRV_NAME
+   : %s: Error: The slave device you specified 
+   does not support setting the MAC address, 
+   but this bond uses this practice.\n
+   , bond_dev-name);
+   res = -EOPNOTSUPP;
+   goto err_undo_flags;
+   }
}
 
new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
@@ -1378,16 +1395,18 @@ int bond_enslave(struct net_device *bond
 */
memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN);
 
-   /*
-* Set slave to master's mac address.  The application already
-* set the master's mac address to that of the first slave
-*/
-   memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len);
-   addr.sa_family = slave_dev-type;
-   res = dev_set_mac_address(slave_dev, addr);
-   if (res) {
-   dprintk(Error %d calling set_mac_address\n, res);
-   goto err_free;
+   if (bond-do_set_mac_addr) {
+   /*
+* Set slave to master's mac address.  The application already
+* set the master's mac address to that of the first slave
+*/
+   memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len);
+   addr.sa_family = slave_dev-type;
+   res = dev_set_mac_address(slave_dev, addr);
+   if (res) {
+   dprintk(Error %d calling set_mac_address\n, res);
+   goto err_free;
+   }
}
 
res = netdev_set_master(slave_dev, bond_dev);
@@ -1612,9 +1631,11 @@ err_close:
dev_close(slave_dev);
 
 err_restore_mac:
-   memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN);
-   addr.sa_family = slave_dev-type;
-   dev_set_mac_address(slave_dev, addr);
+   if (bond-do_set_mac_addr) {
+   memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN);
+   addr.sa_family = slave_dev-type;
+   dev_set_mac_address(slave_dev, addr);
+   }
 
 err_free:
kfree(new_slave);
@@ -1792,10 +1813,12 @@ int bond_release(struct net_device *bond
/* close slave before restoring its mac address */
dev_close(slave_dev);
 
-   /* restore original (permanent) mac address */
-   memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN);
-   

Re: pktgen question

2007-09-24 Thread Ben Greear

Steve Wise wrote:


Ben Greear wrote:

Steve Wise wrote:
I think pktgen should be cloning the skbs using skb_clone().  Then 
it will work for all devices, eh?
That might work, but it would decrease performance slightly (or, 
increase CPU load at least).


Perf-wise, you could clone the skbs up front, then deliver them to the 
nic in a tight loop.  This would mitigate the added overhead 
introduced by calling skb_clone() in the loop doing transmits...
That only works if you are sending a small number of skbs.  You can't 
pre-clone several minutes worth of 10Gbe traffic

with any normal amount of RAM.



Maybe a new option:  multi_clone



If the current code is busted, I think it should be fixed.

Well, it works fine when used correctly :)

Ben

--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V6 5/9] net/bonding: Enable IP multicast for bonding IPoIB devices

2007-09-24 Thread Moni Shoua
Allow enslaving devices while the bonding device is not up. Based on the
discussion held on the previous post, this seemed to be the cleanest way to
go and is not expected to cause instabilities.

Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack acts to have it join some multicast
groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() has set the
bonding device type to ARPHRD_ETHER and the address length to ETH_ALEN, the
net core code computes a wrong multicast link address. This is because
ip_eth_mc_map() is called, whereas for multicast joins taking place after the
enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the
bond type is ARPHRD_INFINIBAND).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c  |5 +++--
 drivers/net/bonding/bond_sysfs.c |6 ++
 2 files changed, 5 insertions(+), 6 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 
10:54:41.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:55:48.431862446 
+0300
@@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond
 
/* bond must be initialized by bond_open() before enslaving */
if (!(bond_dev-flags  IFF_UP)) {
-   dprintk(Error, master_dev is not up\n);
-   return -EPERM;
+   printk(KERN_WARNING DRV_NAME
+%s: master_dev is not up in bond_enslave\n,
+   bond_dev-name);
}
 
/* already enslaved */
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c   2007-08-15 
10:08:58.0 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-08-15 10:55:48.432862269 
+0300
@@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru
 
/* Quick sanity check -- is the bond interface up? */
if (!(bond-dev-flags  IFF_UP)) {
-   printk(KERN_ERR DRV_NAME
-  : %s: Unable to update slaves because interface is 
down.\n,
+   printk(KERN_WARNING DRV_NAME
+  : %s: doing slave updates when interface is down.\n,
   bond-dev-name);
-   ret = -EPERM;
-   goto out;
}
 
/* Note:  We can't hold bond-lock here, as bond_create grabs it. */

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V6 6/9] net/bonding: Handle wrong assumptions that the slave is always an Ethernet device

2007-09-24 Thread Moni Shoua
bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not correct when it enslaves non-Ethernet devices (such as InfiniBand).

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c  |3 ++-
 drivers/net/bonding/bond_sysfs.c |   10 --
 drivers/net/bonding/bonding.h|1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-09-24 
12:52:33.0 +0200
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-09-24 12:57:33.411459811 
+0200
@@ -1224,7 +1224,8 @@ static int bond_compute_features(struct 
struct slave *slave;
struct net_device *bond_dev = bond-dev;
unsigned long features = bond_dev-features;
-   unsigned short max_hard_header_len = ETH_HLEN;
+   unsigned short max_hard_header_len = max((u16)ETH_HLEN,
+   bond_dev->hard_header_len);
int i;
 
features = ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c   2007-09-24 
12:55:09.0 +0200
+++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-09-24 13:00:23.752680721 
+0200
@@ -260,6 +260,7 @@ static ssize_t bonding_store_slaves(stru
char command[IFNAMSIZ + 1] = { 0, };
char *ifname;
int i, res, found, ret = count;
+   u32 original_mtu;
struct slave *slave;
struct net_device *dev = NULL;
struct bonding *bond = to_bond(d);
@@ -325,6 +326,7 @@ static ssize_t bonding_store_slaves(stru
}
 
/* Set the slave's MTU to match the bond */
+   original_mtu = dev-mtu;
if (dev-mtu != bond-dev-mtu) {
if (dev-change_mtu) {
res = dev-change_mtu(dev,
@@ -339,6 +341,9 @@ static ssize_t bonding_store_slaves(stru
}
rtnl_lock();
res = bond_enslave(bond-dev, dev);
+   bond_for_each_slave(bond, slave, i)
+   if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0)
+   slave-original_mtu = original_mtu;
rtnl_unlock();
if (res) {
ret = res;
@@ -351,6 +356,7 @@ static ssize_t bonding_store_slaves(stru
bond_for_each_slave(bond, slave, i)
if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) {
dev = slave-dev;
+   original_mtu = slave-original_mtu;
break;
}
if (dev) {
@@ -365,9 +371,9 @@ static ssize_t bonding_store_slaves(stru
}
/* set the slave MTU to the default */
if (dev-change_mtu) {
-   dev-change_mtu(dev, 1500);
+   dev-change_mtu(dev, original_mtu);
} else {
-   dev-mtu = 1500;
+   dev-mtu = original_mtu;
}
}
else {
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-09-24 12:55:09.0 
+0200
+++ net-2.6/drivers/net/bonding/bonding.h   2007-09-24 12:57:33.412459636 
+0200
@@ -156,6 +156,7 @@ struct slave {
s8 link;/* one of BOND_LINK_ */
s8 state;   /* one of BOND_STATE_ */
u32original_flags;
+   u32original_mtu;
u32link_failure_count;
u16speed;
u8 duplex;

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pktgen question

2007-09-24 Thread Robert Olsson

Hi,

Steve Wise writes:
  I think pktgen should be cloning the skbs using skb_clone().  Then it 
  will work for all devices, eh?

 pktgen assumes for fastpath sending exclusive ownership of
 the skb. And does a skb_get to avoid final skb destruction so 
 the same skb can be sent over and over. The idea is to avoid 
 memory allocation and keep things in cache to give very high
 packet rates with identical packets.
 But if you need to alter the packet then the skb_get trick can't 
 be done. And you have to turn off fastpath with clone_skb 

  Perf-wise, you could clone the skbs up front, then deliver them to the 
  nic in a tight loop.  This would mitigate the added overhead introduced 
  by calling skb_clone() in the loop doing transmits...

 Sure, it can be done. It could replay sequences etc but it will not
 beat the skb_get trick in sending identical packets. It has been
 proposed before but I've avoided such efforts to keep things relatively 
 small and simple. Really pktgen should be reworked to have a small 
 shim in the kernel and move the rest of the stuff to userland.
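
 To illustrate the difference being discussed, a minimal sketch (this is
 not pktgen's actual code; the helper names are made up, and requeueing and
 the hard_start_xmit() return value are ignored):

 static void send_identical_fast(struct net_device *dev,
                                 struct sk_buff *skb, int count)
 {
         int i;

         for (i = 0; i < count; i++) {
                 skb_get(skb);                   /* only bumps the refcount */
                 dev->hard_start_xmit(skb, dev); /* the driver's kfree_skb()
                                                  * just drops that extra ref */
         }
         kfree_skb(skb);                         /* drop our own reference */
 }

 static void send_with_clone(struct net_device *dev,
                             struct sk_buff *skb, int count)
 {
         int i;

         for (i = 0; i < count; i++) {
                 struct sk_buff *c = skb_clone(skb, GFP_ATOMIC);

                 if (!c)
                         break;
                 /* each clone has its own skb header, so a driver may
                  * safely scribble on c->cb[] or change c->dev */
                 dev->hard_start_xmit(c, dev);
         }
         kfree_skb(skb);
 }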

 Cheers.
--ro
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V6 7/9] net/bonding: Delay sending of gratuitous ARP to avoid failure

2007-09-24 Thread Moni Shoua
Delay sending a gratuitous ARP when the __LINK_STATE_LINKWATCH_PENDING bit
in the dev->state field is set. This improves the chances for the ARP packet to
be transmitted.

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c |   24 +---
 drivers/net/bonding/bonding.h   |1 +
 2 files changed, 22 insertions(+), 3 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 
10:56:33.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 11:04:37.221123652 
+0300
@@ -1102,8 +1102,14 @@ void bond_change_active_slave(struct bon
if (new_active  !bond-do_set_mac_addr)
memcpy(bond-dev-dev_addr,  new_active-dev-dev_addr,
new_active-dev-addr_len);
-
-   bond_send_gratuitous_arp(bond);
+   if (bond->curr_active_slave &&
+   test_bit(__LINK_STATE_LINKWATCH_PENDING,
+   &bond->curr_active_slave->dev->state)) {
+   dprintk(delaying gratuitous arp on %s\n,
+   bond-curr_active_slave-dev-name);
+   bond-send_grat_arp = 1;
+   } else
+   bond_send_gratuitous_arp(bond);
}
 }
 
@@ -2083,6 +2089,17 @@ void bond_mii_monitor(struct net_device 
 * program could monitor the link itself if needed.
 */
 
+   if (bond-send_grat_arp) {
+   if (bond-curr_active_slave  
test_bit(__LINK_STATE_LINKWATCH_PENDING,
+   bond-curr_active_slave-dev-state))
+   dprintk(Needs to send gratuitous arp but not yet\n);
+   else {
+   dprintk(sending delayed gratuitous arp on on %s\n,
+   bond-curr_active_slave-dev-name);
+   bond_send_gratuitous_arp(bond);
+   bond-send_grat_arp = 0;
+   }
+   }
read_lock(bond-curr_slave_lock);
oldcurrent = bond-curr_active_slave;
read_unlock(bond-curr_slave_lock);
@@ -2484,7 +2501,7 @@ static void bond_send_gratuitous_arp(str
 
if (bond-master_ip) {
bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip,
- bond-master_ip, 0);
+   bond-master_ip, 0);
}
 
list_for_each_entry(vlan, bond-vlan_list, vlan_list) {
@@ -4293,6 +4310,7 @@ static int bond_init(struct net_device *
bond-current_arp_slave = NULL;
bond-primary_slave = NULL;
bond-dev = bond_dev;
+   bond-send_grat_arp = 0;
INIT_LIST_HEAD(bond-vlan_list);
 
/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-08-15 10:56:33.0 
+0300
+++ net-2.6/drivers/net/bonding/bonding.h   2007-08-15 11:05:41.516451497 
+0300
@@ -187,6 +187,7 @@ struct bonding {
struct   timer_list arp_timer;
s8   kill_timers;
s8   do_set_mac_addr;
+   s8   send_grat_arp;
struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
struct   proc_dir_entry *proc_entry;

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V6 8/9] net/bonding: Destroy bonding master when last slave is gone

2007-09-24 Thread Moni Shoua
When bonding enslaves non-Ethernet devices it takes pointers to functions 
in the module that owns the slaves. In this case it becomes unsafe
to keep the bonding master registered after the last slave was released, 
because we don't know if the pointers are still valid. Destroying the bond 
when slave_cnt is zero
ensures that these functions won't be used anymore.

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c  |   37 +
 drivers/net/bonding/bond_sysfs.c |9 +
 drivers/net/bonding/bonding.h|3 +++
 3 files changed, 45 insertions(+), 4 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-09-24 
14:01:24.055441842 +0200
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-09-24 14:05:05.658979207 
+0200
@@ -1256,6 +1256,7 @@ static int bond_compute_features(struct 
 static void bond_setup_by_slave(struct net_device *bond_dev,
struct net_device *slave_dev)
 {
+   struct bonding *bond = bond_dev-priv;
bond_dev-hard_header   = slave_dev-hard_header;
bond_dev-rebuild_header= slave_dev-rebuild_header;
bond_dev-hard_header_cache = slave_dev-hard_header_cache;
@@ -1270,6 +1271,7 @@ static void bond_setup_by_slave(struct n
 
memcpy(bond_dev-broadcast, slave_dev-broadcast,
slave_dev-addr_len);
+   bond-setup_by_slave = 1;
 }
 
 /* enslave device slave to bond device master */
@@ -1838,6 +1840,35 @@ int bond_release(struct net_device *bond
 }
 
 /*
+* Destroy a bonding device.
+* Must be under rtnl_lock when this function is called.
+*/
+void bond_destroy(struct bonding *bond)
+{
+   bond_deinit(bond-dev);
+   bond_destroy_sysfs_entry(bond);
+   unregister_netdevice(bond-dev);
+}
+
+/*
+* First release a slave and then destroy the bond if no more slaves are left.
+* Must be under rtnl_lock when this function is called.
+*/
+int  bond_release_and_destroy(struct net_device *bond_dev, struct net_device 
*slave_dev)
+{
+   struct bonding *bond = bond_dev-priv;
+   int ret;
+
+   ret = bond_release(bond_dev, slave_dev);
+   if ((ret == 0)  (bond-slave_cnt == 0)) {
+   printk(KERN_INFO DRV_NAME  %s: destroying bond %s.\n,
+   bond_dev-name);
+   bond_destroy(bond);
+   }
+   return ret;
+}
+
+/*
  * This function releases all slaves.
  */
 static int bond_release_all(struct net_device *bond_dev)
@@ -3337,6 +3368,11 @@ static int bond_slave_netdev_event(unsig
 * ... Or is it this?
 */
break;
+   case NETDEV_GOING_DOWN:
+   dprintk(slave %s is going down\n, slave_dev-name);
+   if (bond-setup_by_slave)
+   bond_release_and_destroy(bond_dev, slave_dev);
+   break;
case NETDEV_CHANGEMTU:
/*
 * TODO: Should slaves be allowed to
@@ -4311,6 +4347,7 @@ static int bond_init(struct net_device *
bond-primary_slave = NULL;
bond-dev = bond_dev;
bond-send_grat_arp = 0;
+   bond-setup_by_slave = 0;
INIT_LIST_HEAD(bond-vlan_list);
 
/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-09-24 14:01:24.055441842 
+0200
+++ net-2.6/drivers/net/bonding/bonding.h   2007-09-24 14:01:24.627340013 
+0200
@@ -188,6 +188,7 @@ struct bonding {
s8   kill_timers;
s8   do_set_mac_addr;
s8   send_grat_arp;
+   s8   setup_by_slave;
struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
struct   proc_dir_entry *proc_entry;
@@ -295,6 +296,8 @@ static inline void bond_unset_master_alb
 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry 
*curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct 
net_device *slave_dev);
 int bond_create(char *name, struct bond_params *params, struct bonding 
**newbond);
+void bond_destroy(struct bonding *bond);
+int  bond_release_and_destroy(struct net_device *bond_dev, struct net_device 
*slave_dev);
 void bond_deinit(struct net_device *bond_dev);
 int bond_create_sysfs(void);
 void bond_destroy_sysfs(void);
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c   2007-09-24 
14:01:23.523536550 +0200
+++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-09-24 14:01:24.628339835 
+0200
@@ -164,9 +164,7 @@ static ssize_t bonding_store_bonds(struc
printk(KERN_INFO DRV_NAME
: %s 

[PATCH 9/9] bonding: Optionally allow ethernet slaves to keep own MAC

2007-09-24 Thread Moni Shoua
Update the "don't change MAC of slaves" functionality added in
previous changes to be a generic option, rather than something tied to IB
devices, as it's occasionally useful for regular ethernet devices as well.

Adds fail_over_mac option (which is automatically enabled for IB
slaves), applicable only to active-backup mode.

Includes documentation update.

Updates bonding driver version to 3.2.0.

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]
---
 Documentation/networking/bonding.txt |   33 +++
 drivers/net/bonding/bond_main.c  |   57 +
 drivers/net/bonding/bond_sysfs.c |   49 +
 drivers/net/bonding/bonding.h|6 ++--
 4 files changed, 121 insertions(+), 24 deletions(-)

diff --git a/Documentation/networking/bonding.txt 
b/Documentation/networking/bonding.txt
index 1da5666..1134062 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -281,6 +281,39 @@ downdelay
will be rounded down to the nearest multiple.  The default
value is 0.
 
+fail_over_mac
+
+   Specifies whether active-backup mode should set all slaves to
+   the same MAC address (the traditional behavior), or, when
+   enabled, change the bond's MAC address when changing the
+   active interface (i.e., fail over the MAC address itself).
+
+   Fail over MAC is useful for devices that cannot ever alter
+   their MAC address, or for devices that refuse incoming
+   broadcasts with their own source MAC (which interferes with
+   the ARP monitor).
+
+   The down side of fail over MAC is that every device on the
+   network must be updated via gratuitous ARP, vs. just updating
+   a switch or set of switches (which often takes place for any
+   traffic, not just ARP traffic, if the switch snoops incoming
+   traffic to update its tables) for the traditional method.  If
+   the gratuitous ARP is lost, communication may be disrupted.
+
+   When fail over MAC is used in conjunction with the mii monitor,
+   devices which assert link up prior to being able to actually
+   transmit and receive are particularly susceptible to loss of
+   the gratuitous ARP, and an appropriate updelay setting may be
+   required.
+
+   A value of 0 disables fail over MAC, and is the default.  A
+   value of 1 enables fail over MAC.  This option is enabled
+   automatically if the first slave added cannot change its MAC
+   address.  This option may be modified via sysfs only when no
+   slaves are present in the bond.
+
+   This option was added in bonding version 3.2.0.
+
 lacp_rate
 
Option specifying the rate in which we'll ask our link partner
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77caca3..c01ff9d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -97,6 +97,7 @@ static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
 static char *arp_validate = NULL;
+static int fail_over_mac = 0;
 struct bond_params bonding_defaults;
 
 module_param(max_bonds, int, 0);
@@ -130,6 +131,8 @@ module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, arp targets in n.n.n.n form);
 module_param(arp_validate, charp, 0);
 MODULE_PARM_DESC(arp_validate, validate src/dst of ARP probes: none 
(default), active, backup or all);
+module_param(fail_over_mac, int, 0);
+MODULE_PARM_DESC(fail_over_mac, For active-backup, do not set all slaves to 
the same MAC.  0 for off (default), 1 for on.);
 
 /*- Global variables */
 
@@ -1099,7 +1102,7 @@ void bond_change_active_slave(struct bonding *bond, 
struct slave *new_active)
/* when bonding does not set the slave MAC address, the bond MAC
 * address is the one of the active slave.
 */
-   if (new_active  !bond-do_set_mac_addr)
+   if (new_active  bond-params.fail_over_mac)
memcpy(bond-dev-dev_addr,  new_active-dev-dev_addr,
new_active-dev-addr_len);
if (bond-curr_active_slave 
@@ -1371,16 +1374,16 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev)
if (slave_dev-set_mac_address == NULL) {
if (bond-slave_cnt == 0) {
printk(KERN_WARNING DRV_NAME
-   : %s: Warning: The first slave device you 
-   specified does not support setting the MAC 
-   address. This bond MAC address would be that 
-   of the active slave.\n, bond_dev-name);
-   bond-do_set_mac_addr = 0;
-   } else 

Re: [PATCH V6 5/9] net/bonding: Enable IP multicast for bonding IPoIB devices

2007-09-24 Thread Stephen Hemminger
On Mon, 24 Sep 2007 17:37:00 +0200
Moni Shoua [EMAIL PROTECTED] wrote:

 Allow to enslave devices when the bonding device is not up. Over the 
 discussion
 held at the previous post this seemed to be the most clean way to go, where it
 is not expected to cause instabilities.
 
 Normally, the bonding driver is UP before any enslavement takes place.
 Once a netdevice is UP, the network stack acts to have it join some multicast 
 groups
 (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding 
 device
 type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
 computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
 where for multicast joins taking place after the enslavement another 
 ip_xxx_mc_map()
 is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)
 
 Signed-off-by: Moni Shoua monis at voltaire.com
 Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
 ---
  drivers/net/bonding/bond_main.c  |5 +++--
  drivers/net/bonding/bond_sysfs.c |6 ++
  2 files changed, 5 insertions(+), 6 deletions(-)
 
 Index: net-2.6/drivers/net/bonding/bond_main.c
 ===
 --- net-2.6.orig/drivers/net/bonding/bond_main.c  2007-08-15 
 10:54:41.0 +0300
 +++ net-2.6/drivers/net/bonding/bond_main.c   2007-08-15 10:55:48.431862446 
 +0300
 @@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond
  
   /* bond must be initialized by bond_open() before enslaving */
   if (!(bond_dev-flags  IFF_UP)) {
 - dprintk(Error, master_dev is not up\n);
 - return -EPERM;
 + printk(KERN_WARNING DRV_NAME
 +  %s: master_dev is not up in bond_enslave\n,
 + bond_dev-name);
   }
  
   /* already enslaved */
 Index: net-2.6/drivers/net/bonding/bond_sysfs.c
 ===
 --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-08-15 
 10:08:58.0 +0300
 +++ net-2.6/drivers/net/bonding/bond_sysfs.c  2007-08-15 10:55:48.432862269 
 +0300
 @@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru
  
   /* Quick sanity check -- is the bond interface up? */
   if (!(bond-dev-flags  IFF_UP)) {
 - printk(KERN_ERR DRV_NAME
 -: %s: Unable to update slaves because interface is 
 down.\n,
 + printk(KERN_WARNING DRV_NAME
 +: %s: doing slave updates when interface is down.\n,
  bond-dev-name);
 - ret = -EPERM;
 - goto out;
   }
  

Please get rid of the warning. Make bonding work correctly and allow 
enslave/remove
of device when bonding is down.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sb1250-mac: Driver model phylib update

2007-09-24 Thread Andrew Morton
On Mon, 24 Sep 2007 15:41:54 +0100 (BST) Maciej W. Rozycki [EMAIL 
PROTECTED] wrote:

 On Fri, 21 Sep 2007, Andrew Morton wrote:
 
A driver model and phylib update.
  
  akpm:/usr/src/25 diffstat patches/git-net.patch | tail -n 1
   1013 files changed, 187667 insertions(+), 23587 deletions(-)
  
  Sorry, but raising networking patches against Linus's crufty
  old mainline tree just isn't viable at present.
 
  Well, this is against Jeff's netdev-2.6 tree which hopefully is not as 
 crufty as Linus's old mainline; if it is not possible to queue this change 
 for 2.6.25 or suchlike, then I will try to resubmit later.

Most of Jeff's netdev tree got dumped into Dave's net-2.6.24 tree.  That's
the one you want to be raising patches against for the next few weeks.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG

2007-09-24 Thread Joe Perches
On Wed, 2007-09-19 at 12:54 -0700, David Miller wrote:
 Applied to net-2.6.24, thanks Joe!

Here is a patch that adds some type safety to print_mac
by using a struct print_mac_buf * instead of char *.

It also reduces the defconfig vmlinux size by 8 bytes.

Signed-off-by:  Joe Perches [EMAIL PROTECTED]

--

 include/linux/if_ether.h |   12 ++--
 net/ethernet/eth.c   |6 +++---
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 57abca1..620d6b1 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -126,7 +126,15 @@ extern struct ctl_table ether_table[];
  * Display a 6 byte device address (MAC) in a readable format.
  */
 #define MAC_FMT %02x:%02x:%02x:%02x:%02x:%02x
-extern char *print_mac(char *buf, const u8 *addr);
-#define DECLARE_MAC_BUF(var) char var[18] __maybe_unused
+
+struct print_mac_buf {
+   char formatted_mac_addr[18];
+};
+
+#define DECLARE_MAC_BUF(var) \
+   struct print_mac_buf __maybe_unused _##var; \
+   struct print_mac_buf __maybe_unused *var = &_##var
+
+extern char *print_mac(struct print_mac_buf *buf, const u8 *addr);
 
 #endif /* _LINUX_IF_ETHER_H */
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 2aaf6fa..ad82613 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -338,10 +338,10 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, 
unsigned int queue_count)
 }
 EXPORT_SYMBOL(alloc_etherdev_mq);
 
-char *print_mac(char *buf, const u8 *addr)
+char *print_mac(struct print_mac_buf *buf, const u8 *addr)
 {
-   sprintf(buf, MAC_FMT,
+   sprintf(buf->formatted_mac_addr, MAC_FMT,
addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
-   return buf;
+   return buf->formatted_mac_addr;
 }
 EXPORT_SYMBOL(print_mac);
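
For illustration, a hypothetical caller of the proposed interface (not part
of the patch; example_show_addr() is a made-up name):

static void example_show_addr(struct net_device *dev)
{
	DECLARE_MAC_BUF(mac);	/* declares a struct print_mac_buf and a
				 * pointer to it named 'mac' */

	printk(KERN_INFO "%s: MAC %s\n", dev->name,
	       print_mac(mac, dev->dev_addr));
}

Passing a plain char buffer here would now fail to compile, which is the
point of the added type safety.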


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-24 Thread Lennart Sorensen
On Fri, Sep 21, 2007 at 11:37:52PM +0100, Denys Vlasenko wrote:
 But I compile net/* into bzImage. I like netbooting :)

Isn't it possible to netboot with an initramfs image?  I am pretty sure
I have seen some systems do exactly that.

--
Len Sorensen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pktgen question

2007-09-24 Thread David Miller
From: Steve Wise [EMAIL PROTECTED]
Date: Mon, 24 Sep 2007 08:54:13 -0500

 I think pktgen should be cloning the skbs using skb_clone().  Then it 
 will work for all devices, eh?

The problem is that skb_clone() is (relatively) expensive and
pktgen is trying to just grab a reference to the SKB in
the absolutely cheapest way possible.

In my personal opinion, I think what the drivers that don't
work with pktgen are doing is wrong and they should find
another way to pass state around other than to depend upon
being able to use the ->cb[] area at-will.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pktgen question

2007-09-24 Thread Rick Jones
Perf-wise, you could clone the skbs up front, then deliver them to the 
nic in a tight loop.  This would mitigate the added overhead 
introduced by calling skb_clone() in the loop doing transmits...


That only works if you are sending a small number of skbs.  You can't 
pre-clone several minutes worth of 10Gbe traffic

with any normal amount of RAM.


Does pktgen really need to allocate anything more than some smallish fraction 
more than the depth of the driver's transmit queue?


Of course, even that won't fit in the L1 cache of a processor, so if one is 
really just trying to max-out the NIC it might still have too much overhead, but 
then does pktgen+driver et al actually fit in an L1 cache?


rick jones
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pktgen question

2007-09-24 Thread Ben Greear

Rick Jones wrote:
Perf-wise, you could clone the skbs up front, then deliver them to 
the nic in a tight loop.  This would mitigate the added overhead 
introduced by calling skb_clone() in the loop doing transmits...


That only works if you are sending a small number of skbs.  You can't 
pre-clone several minutes worth of 10Gbe traffic

with any normal amount of RAM.


Does pktgen really need to allocate anything more than some smallish 
fraction more than the depth of the driver's transmit queue?


If you want to send sustained high rates of traffic, for more than
just a trivial amount of time, then you either have to play the current
trick with the skb_get(), or you have to allocate a real packet each time
(maybe with skb_clone() or similar, but it's still more overhead than the 
skb_get
which only bumps a reference count.)

I see no other way, but if you can think of one, please let me know.

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock

2007-09-24 Thread Waskiewicz Jr, Peter P
 I have submitted this before; but here it is again.
 Against net-2.6.24 from yesterday for this and all following patches. 
 
 
 cheers,
 jamal

Hi Jamal,
I've been (slowly) working on resurrecting the original design
of my multiqueue patches to address this exact issue of the queue_lock
being a hot item.  I added a queue_lock to each queue in the subqueue
struct, and in the enqueue and dequeue, just lock that queue instead of
the global device queue_lock.  The only two issues to overcome are the
QDISC_RUNNING state flag, since that also serializes entry into the
qdisc_restart() function, and the qdisc statistics maintenance, which
needs to be serialized.  Do you think this work along with your patch
will benefit from one another?  I apologize for not having working
patches right now, but I am working on them slowly as I have some blips
of spare time.

Thanks,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-09-24 Thread Urs Thuermann
Joe Perches [EMAIL PROTECTED] writes:

 I'd prefer something like this, which removes the unnecessary
 kmalloc/kfree pairs or the equivalent conversions to functions.

I have changed this to a static buffer.  Since this is only in
#ifdef CONFIG_CAN_DEBUG_CORE, it shouldn't hurt.

 #define can_debug_cframe(cf, fmt, arg...) \
 do { \
   char buf[20]; \
   int dlc = cf->can_dlc; \
   if (dlc > 8) \
   dlc = 8; \
   if (cf->can_id & CAN_EFF_FLAG) \
   sprintf(buf, "%08X [%X] ", cf->can_id & CAN_EFF_MASK, dlc); \
   else \
   sprintf(buf, "%03X [%X] ", cf->can_id & CAN_SFF_MASK, dlc); \
   printk(KERN_DEBUG fmt, ##arg); \
   print_hex_dump(buf, DUMP_PREFIX_NONE, cf->data, dlc); \
 } while (0)

The line isn't printed atomically, i.e. it could happen that something
is printed between the fmt+args and the hexdump, i.e. in between the
line.

 and
 
 #define can_debug_skb(skb) \
 do { \
   pr_debug("skbuff at %p, dev: %d, proto: %04x\n", \
 skb, skb->dev ? skb->dev->ifindex : -1, \
 ntohs(skb->protocol)); \
   pr_debug("users: %d, dataref: %d, nr_frags: %d, " \
 "h,d,t,e,l: %p %+d %+d %+d, %d\n", \
 atomic_read(&skb->users), \
 atomic_read(&(skb_shinfo(skb)->dataref)), \
 skb_shinfo(skb)->nr_frags, \
 skb->head, skb->data - skb->head, \
 skb->tail - skb->head, skb->end - skb->head, skb->len); \
   print_hex_dump(KERN_DEBUG, "skb_head: ", DUMP_PREFIX_NONE, \
  16, 1, skb->head, skb->end - skb->head); \ 
 } while (0)

Here, the consecutive lines aren't printed atomically.

I think sprintf() to a buffer first and then do one printk() is
better.  But I will use hex_dump_to_buffer().
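
Roughly something like this (a rough sketch only, not the final code; it
assumes the seven-argument hex_dump_to_buffer() signature, that the caller's
fmt does not already end in a newline, and the buffer sizes are illustrative):

#define can_debug_cframe(cf, fmt, arg...) \
do { \
	char id[24], hex[3 * 8 + 1]; \
	int dlc = (cf)->can_dlc; \
	if (dlc > 8) \
		dlc = 8; \
	if ((cf)->can_id & CAN_EFF_FLAG) \
		sprintf(id, "%08X [%X] ", (cf)->can_id & CAN_EFF_MASK, dlc); \
	else \
		sprintf(id, "%03X [%X] ", (cf)->can_id & CAN_SFF_MASK, dlc); \
	hex_dump_to_buffer((cf)->data, dlc, 16, 1, hex, sizeof(hex), 0); \
	printk(KERN_DEBUG fmt "%s%s\n", ##arg, id, hex); \
} while (0)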


urs
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


deadlock in genetlink?

2007-09-24 Thread Ben Pfaff
[Sorry about the duplicate mail, Thomas: I got the netdev address
wrong on the first try.]

lockdep reported a circular dependency between cb_mutex and
genl_mutex, as follows:

- #1 (nlk-cb_mutex){--..}:
   [c0127b5d] __lock_acquire+0x9c8/0xb98
   [c01280f6] lock_acquire+0x5d/0x75
   [c027adc1] __mutex_lock_slowpath+0xdb/0x247
   [c027af49] mutex_lock+0x1c/0x1f
   [c0236aa0] netlink_dump_start+0xa9/0x12f
  --- takes cb_mutex
   [c0237fa6] genl_rcv_msg+0xa3/0x14c
   [c0235aa5] netlink_run_queue+0x6f/0xe1
   [c0237772] genl_rcv+0x2d/0x4e
  --- trylocks genl_mutex
   [c0235ef0] netlink_data_ready+0x15/0x55
   [c0234e98] netlink_sendskb+0x1f/0x36
   [c0235772] netlink_unicast+0x1a6/0x1c0
   [c0235ecf] netlink_sendmsg+0x238/0x244
   [c021a92e] sock_sendmsg+0xcb/0xe4
   [c021aa98] sys_sendmsg+0x151/0x1af
   [c021b850] sys_socketcall+0x203/0x222
   [c01025d2] syscall_call+0x7/0xb
   [] 0x

- #0 (genl_mutex){--..}:
   [c0127a46] __lock_acquire+0x8b1/0xb98
   [c01280f6] lock_acquire+0x5d/0x75
   [c027adc1] __mutex_lock_slowpath+0xdb/0x247
   [c027af49] mutex_lock+0x1c/0x1f
   [c023717d] genl_lock+0xd/0xf
   [c023806f] ctrl_dumpfamily+0x20/0xdd
  --- takes genl_mutex
   [c0234bfa] netlink_dump+0x50/0x168
  --- takes cb_mutex
   [c02360f2] netlink_recvmsg+0x15f/0x22f
   [c021a84a] sock_recvmsg+0xd5/0xee
   [c021b24e] sys_recvmsg+0xf5/0x187
   [c021b865] sys_socketcall+0x218/0x222
   [c01025d2] syscall_call+0x7/0xb
   [] 0x

The trylock in genl_rcv doesn't prevent the deadlock, because
it's not the last lock acquired.  Perhaps this can be fixed by
not acquiring the genl_mutex in ctrl_dumpfamily; it silences
lockdep, at least.  It is not clear to me what the value of
taking the mutex is there.

If this is an appropriate fix, here is a patch for it.

Signed-off-by: Ben Pfaff [EMAIL PROTECTED]

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 8c11ca4..2e79035 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -616,9 +616,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct 
netlink_callback *cb)
int chains_to_skip = cb-args[0];
int fams_to_skip = cb-args[1];
 
-   if (chains_to_skip != 0)
-   genl_lock();
-
for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
if (i < chains_to_skip)
continue;
@@ -636,9 +633,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct 
netlink_callback *cb)
}
 
 errout:
-   if (chains_to_skip != 0)
-   genl_unlock();
-
cb-args[0] = i;
cb-args[1] = n;
 

-- 
Ben Pfaff
Nicira Networks, Inc.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/9] fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.

2007-09-24 Thread Scott Wood
The existing OF glue code was crufty and broken.  Rather than fix it, it
will be removed, and the ethernet driver now talks to the device tree
directly.

The old, non-CONFIG_PPC_CPM_NEW_BINDING code can go away once CPM
platforms are dropped from arch/ppc (which will hopefully be soon), and
existing arch/powerpc boards that I wasn't able to test on for this
patchset get converted (which should be even sooner).

Signed-off-by: Scott Wood [EMAIL PROTECTED]
---
Resent, with linux/of_platform.h instead of asm/of_platform.h

 drivers/net/fs_enet/Kconfig|1 +
 drivers/net/fs_enet/fs_enet-main.c |  259 ---
 drivers/net/fs_enet/fs_enet.h  |   55 +---
 drivers/net/fs_enet/mac-fcc.c  |   89 +
 drivers/net/fs_enet/mac-fec.c  |   19 +++-
 drivers/net/fs_enet/mac-scc.c  |   53 +--
 drivers/net/fs_enet/mii-bitbang.c  |  269 +++-
 drivers/net/fs_enet/mii-fec.c  |  143 +++-
 include/linux/fs_enet_pd.h |5 +
 9 files changed, 714 insertions(+), 179 deletions(-)

diff --git a/drivers/net/fs_enet/Kconfig b/drivers/net/fs_enet/Kconfig
index e27ee21..2765e49 100644
--- a/drivers/net/fs_enet/Kconfig
+++ b/drivers/net/fs_enet/Kconfig
@@ -11,6 +11,7 @@ config FS_ENET_HAS_SCC
 config FS_ENET_HAS_FCC
bool Chip has an FCC usable for ethernet
depends on FS_ENET  CPM2
+   select MDIO_BITBANG
default y
 
 config FS_ENET_HAS_FEC
diff --git a/drivers/net/fs_enet/fs_enet-main.c 
b/drivers/net/fs_enet/fs_enet-main.c
index a4b76cd..c5abdea 100644
--- a/drivers/net/fs_enet/fs_enet-main.c
+++ b/drivers/net/fs_enet/fs_enet-main.c
@@ -44,12 +44,18 @@
 #include <asm/irq.h>
 #include <asm/uaccess.h>
 
+#ifdef CONFIG_PPC_CPM_NEW_BINDING
+#include <linux/of_platform.h>
+#endif
+
 #include "fs_enet.h"
 
 /*/
 
+#ifndef CONFIG_PPC_CPM_NEW_BINDING
 static char version[] __devinitdata =
 DRV_MODULE_NAME .c:v DRV_MODULE_VERSION  ( DRV_MODULE_RELDATE ) \n;
+#endif
 
 MODULE_AUTHOR(Pantelis Antoniou [EMAIL PROTECTED]);
 MODULE_DESCRIPTION(Freescale Ethernet Driver);
@@ -953,6 +959,7 @@ static int fs_ioctl(struct net_device *dev, struct ifreq 
*rq, int cmd)
 extern int fs_mii_connect(struct net_device *dev);
 extern void fs_mii_disconnect(struct net_device *dev);
 
+#ifndef CONFIG_PPC_CPM_NEW_BINDING
 static struct net_device *fs_init_instance(struct device *dev,
struct fs_platform_info *fpi)
 {
@@ -1132,6 +1139,7 @@ static int fs_cleanup_instance(struct net_device *ndev)
 
return 0;
 }
+#endif
 
 
/**/
 
@@ -1140,35 +1148,250 @@ void *fs_enet_immap = NULL;
 
 static int setup_immap(void)
 {
-   phys_addr_t paddr = 0;
-   unsigned long size = 0;
-
 #ifdef CONFIG_CPM1
-   paddr = IMAP_ADDR;
-   size = 0x1; /* map 64K */
-#endif
-
-#ifdef CONFIG_CPM2
-   paddr = CPM_MAP_ADDR;
-   size = 0x4; /* map 256 K */
+   fs_enet_immap = ioremap(IMAP_ADDR, 0x4000);
+   WARN_ON(!fs_enet_immap);
+#elif defined(CONFIG_CPM2)
+   fs_enet_immap = cpm2_immr;
 #endif
-   fs_enet_immap = ioremap(paddr, size);
-   if (fs_enet_immap == NULL)
-   return -EBADF;  /* XXX ahem; maybe just BUG_ON? */
 
return 0;
 }
 
 static void cleanup_immap(void)
 {
-   if (fs_enet_immap != NULL) {
-   iounmap(fs_enet_immap);
-   fs_enet_immap = NULL;
-   }
+#if defined(CONFIG_CPM1)
+   iounmap(fs_enet_immap);
+#endif
 }
 
 
/**/
 
+#ifdef CONFIG_PPC_CPM_NEW_BINDING
+static int __devinit find_phy(struct device_node *np,
+  struct fs_platform_info *fpi)
+{
+   struct device_node *phynode, *mdionode;
+   struct resource res;
+   int ret = 0, len;
+
+   const u32 *data = of_get_property(np, "phy-handle", &len);
+   if (!data || len != 4)
+   return -EINVAL;
+
+   phynode = of_find_node_by_phandle(*data);
+   if (!phynode)
+   return -EINVAL;
+
+   mdionode = of_get_parent(phynode);
+   if (!phynode)
+   goto out_put_phy;
+
+   ret = of_address_to_resource(mdionode, 0, &res);
+   if (ret)
+   goto out_put_mdio;
+
+   data = of_get_property(phynode, "reg", &len);
+   if (!data || len != 4)
+   goto out_put_mdio;
+
+   snprintf(fpi->bus_id, 16, PHY_ID_FMT, res.start, *data);
+
+out_put_mdio:
+   of_node_put(mdionode);
+out_put_phy:
+   of_node_put(phynode);
+   return ret;
+}
+
+#ifdef CONFIG_FS_ENET_HAS_FEC
+#define IS_FEC(match) ((match)->data == &fs_fec_ops)
+#else
+#define IS_FEC(match) 0
+#endif
+
+static int __devinit fs_enet_probe(struct of_device *ofdev,
+   const struct of_device_id *match)
+{
+   struct 

[SCTP PULL Request]: Bug fixes for 2.6.23

2007-09-24 Thread Vlad Yasevich
Hi David

Can you please pull the following changes since commit 
a41d3015c11a4e864b95cb57f579f2d8f40cd41b:
  David S. Miller (1):
Revert PCI: disable MSI by default on systems with Serverworks HT1000 
chips

which are available in the git repository at:

  [EMAIL PROTECTED]:/pub/scm/linux/kernel/git/vxy/lksctp-dev.git master

Vlad Yasevich (4):
  SCTP: Validate buffer room when processing sequential chunks
  SCTP: Explicitly discard OOTB chunks
  SCTP: Clean up OOTB handling and fix infinite loop processing
  SCTP: Discard OOTB packets with bundled INIT early.

Wei Yongjun (2):
  SCTP: Send ABORT chunk with correct tag in response to INIT ACK
  SCTP : Add paramters validity check for ASCONF chunk

 include/net/sctp/sm.h  |4 +-
 include/net/sctp/structs.h |1 +
 net/sctp/input.c   |8 ++
 net/sctp/inqueue.c |8 ++
 net/sctp/sm_make_chunk.c   |   46 
 net/sctp/sm_statefuns.c|  243 +---
 net/sctp/sm_statetable.c   |   16 ++--
 7 files changed, 278 insertions(+), 48 deletions(-)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] Zero-length write() does not generate a datagram on connected socket

2007-09-24 Thread Stephen Hemminger
The bug http://bugzilla.kernel.org/show_bug.cgi?id=5731
describes an issue where write() can't be used to generate a zero-length
datagram (but send, and sendto do work).

I think the following is needed:

--- a/net/socket.c  2007-08-20 09:54:28.0 -0700
+++ b/net/socket.c  2007-09-24 15:31:25.0 -0700
@@ -777,8 +777,11 @@ static ssize_t sock_aio_write(struct kio
if (pos != 0)
return -ESPIPE;
 
-   if (iocb->ki_left == 0) /* Match SYS5 behaviour */
-   return 0;
+   if (unlikely(iocb->ki_left == 0)) {
+   struct socket *sock = iocb->ki_filp->private_data;
+   if (sock->type == SOCK_STREAM)
+   return 0;
+   }
 
x = alloc_sock_iocb(iocb, &siocb);
if (!x)
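
For reference, a userspace sketch of the behaviour in question (assumes an
already-connected SOCK_DGRAM socket; with the change above, all three calls
should produce a zero-length datagram):

#include <sys/socket.h>
#include <unistd.h>

static void poke(int fd)
{
	send(fd, "", 0, 0);		/* emits a 0-byte datagram today */
	sendto(fd, "", 0, 0, NULL, 0);	/* likewise */
	write(fd, "", 0);		/* returns 0 without sending, pre-patch */
}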
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHES] TX batching

2007-09-24 Thread jamal
On Mon, 2007-24-09 at 00:00 -0700, Kok, Auke wrote:

 that's bad to begin with :) - please send those separately so I can fasttrack 
 them
 into e1000e and e1000 where applicable.

I've been CCing you ;- Most of the changes are readability and
reusability with the batching.

 But yes, I'm very inclined to merge more features into e1000e than e1000. I 
 intend
 to put multiqueue support into e1000e, as *all* of the hardware that it will
 support has multiple queues. Putting in any other performance feature like tx
 batching would absolutely be interesting.

I looked at the e1000e and it is very close to e1000 so i should be able
to move the changes easily. Most importantly, can i kill LLTX?
For tx batching, we have to wait to see how Dave wants to move forward;
i will have the patches but it is not something you need to push until
we see where that is going.

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock

2007-09-24 Thread jamal
On Mon, 2007-24-09 at 12:12 -0700, Waskiewicz Jr, Peter P wrote:

 Hi Jamal,
   I've been (slowly) working on resurrecting the original design
 of my multiqueue patches to address this exact issue of the queue_lock
 being a hot item.  I added a queue_lock to each queue in the subqueue
 struct, and in the enqueue and dequeue, just lock that queue instead of
 the global device queue_lock.  The only two issues to overcome are the
 QDISC_RUNNING state flag, since that also serializes entry into the
 qdisc_restart() function, and the qdisc statistics maintenance, which
 needs to be serialized.  Do you think this work along with your patch
 will benefit from one another? 

The one thing that seems obvious is to use dev->hard_prep_xmit() in the
patches I posted to select the xmit ring. You should be able to
figure out the xmit ring without holding any lock. 
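
Something along these lines, perhaps (purely illustrative; the signature,
skb->queue_mapping, dev->egress_subqueue_count and the jhash-based policy
are all assumptions here, not code from any posted patch; needs
linux/jhash.h):

static int mydrv_hard_prep_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* pick a ring without taking the queue lock or any ring lock;
	 * the hash is only a placeholder for a real selection policy */
	skb->queue_mapping =
		jhash_1word(skb->priority, 0) % dev->egress_subqueue_count;
	return 0;
}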

I lost track of how/where things went since the last discussion; so I
need to wrap my mind around it to make sensible suggestions - I know
the core patches are in the kernel but haven't paid attention to details
and if you look at my second patch you'd see a comment in
dev_batch_xmit() which says I need to scrutinize multiqueue more. 

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[DOC] Net batching driver howto

2007-09-24 Thread jamal

I have updated the driver howto to match the patches i posted yesterday.
attached. 

cheers,
jamal
Here's the beginning of a howto for driver authors.

The intended audience for this howto is people already
familiar with netdevices.

1.0  Netdevice Pre-requisites
--

For hardware based netdevices, you must have at least hardware that 
is capable of doing DMA with many descriptors; i.e having hardware 
with a queue length of 3 (as in some fscked ethernet hardware) is 
not very useful in this case.

2.0  What is new in the driver API
---

There are 3 new methods and one new variable introduced. These are:
1)dev->hard_prep_xmit()
2)dev->hard_end_xmit()
3)dev->hard_batch_xmit()
4)dev->xmit_win

2.1 Using Core driver changes
-

To provide context, let's look at a typical driver abstraction
for dev->hard_start_xmit(). It has 4 parts:
a) packet formatting (example vlan, mss, descriptor counting etc)
b) chip specific formatting
c) enqueueing the packet on a DMA ring
d) IO operations to complete packet transmit, tell DMA engine to chew 
on, tx completion interrupts etc

[For code cleanliness/readability sake, regardless of this work,
one should break the dev->hard_start_xmit() into those 4 functions
anyways].
A driver which has all 4 parts and needs to support batching is 
advised to split its dev->hard_start_xmit() in the following manner:
1)use its dev->hard_prep_xmit() method to achieve #a
2)use its dev->hard_end_xmit() method to achieve #d
3)#b and #c can stay in ->hard_start_xmit() (or whichever way you 
want to do this)
Note: There are drivers which may not need to support either of the two
methods (example the tun driver I patched) so the two methods are
essentially optional.
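
As a rough illustration of such a split (a hypothetical driver; the
mydrv_* helpers below are made-up names standing in for the four parts
listed above):

---
/* #a: format-only work; called by the core with no TX lock held */
static int mydrv_hard_prep_xmit(struct sk_buff *skb, struct net_device *dev)
{
	return mydrv_vlan_and_csum_setup(skb, dev);
}

/* #b and #c: chip specific setup plus DMA ring enqueue; called by the
 * core under the TX lock, once per packet in the batch */
static int mydrv_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct mydrv_priv *priv = netdev_priv(dev);

	return mydrv_queue_on_ring(priv, skb);
}

/* #d: one doorbell/IO kick for the whole batch, tx completion etc */
static void mydrv_hard_end_xmit(struct net_device *dev)
{
	struct mydrv_priv *priv = netdev_priv(dev);

	mydrv_kick_dma(priv);
}
---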

2.1.1 Theory of operation
--

The core will first do the packet formatting by invoking your 
supplied dev->hard_prep_xmit() method. It will then pass you the packets 
via your dev->hard_start_xmit() method, for as many packets as you
have advertised (via dev->xmit_win) you can consume. Lastly it will 
invoke your dev->hard_end_xmit() when it completes passing you all the 
packets queued for you. 


2.1.1.1 Locking rules
-

dev->hard_prep_xmit() is invoked without holding any
tx lock but the rest are under TX_LOCK(). So you have to ensure that
whatever you put in dev->hard_prep_xmit() doesn't require locking.

2.1.1.2 The slippery LLTX
-

LLTX drivers present a challenge in that we have to introduce a deviation
from the norm and require the ->hard_batch_xmit() method. An LLTX
driver presents us with ->hard_batch_xmit() to which we pass it a list
of packets in a dev->blist skb queue. It is then the responsibility
of the ->hard_batch_xmit() to exercise steps #b and #c for all packets
passed in the dev->blist.
Steps #a and #d are done by the core should you register presence of
dev->hard_prep_xmit() and dev->hard_end_xmit() in your setup.
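
A rough sketch of such a method (hypothetical driver; it assumes dev->blist
is an sk_buff_head as described above, and that the driver keeps its own
private TX lock because it is LLTX):

---
static int mydrv_hard_batch_xmit(struct net_device *dev)
{
	struct mydrv_priv *priv = netdev_priv(dev);
	struct sk_buff *skb;
	unsigned long flags;

	spin_lock_irqsave(&priv->tx_lock, flags);
	while ((skb = __skb_dequeue(&dev->blist)) != NULL)
		mydrv_queue_on_ring(priv, skb);	/* steps #b and #c */
	spin_unlock_irqrestore(&priv->tx_lock, flags);

	return 0;
}
---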

2.1.1.3 xmit_win


dev->xmit_win variable is set by the driver to tell us how
much space it has in its rings/queues. dev->xmit_win is introduced to 
ensure that when we pass the driver a list of packets it will swallow 
all of them - which is useful because we don't requeue to the qdisc (and 
avoids burning unnecessary cpu cycles or introducing any strange 
re-ordering). The driver tells us, whenever it invokes netif_wake_queue,
how much space it has for descriptors by setting this variable.

3.0 Driver Essentials
-

The typical driver tx state machine is:


-1- +Core sends packets
 +-- Driver puts packet onto hardware queue
 +if hardware queue is full, netif_stop_queue(dev)
 +
-2- +core stops sending because of netif_stop_queue(dev)
..
.. time passes ...
..
-3- +--- driver has transmitted packets, opens up tx path by
  invoking netif_wake_queue(dev)
-1- +Cycle repeats and core sends more packets (step 1).


3.1  Driver pre-requisite
--

This is _a very important_ requirement in making batching useful.
The pre-requisite for batching changes is that the driver should 
provide a low threshold to open up the tx path.
Drivers such as tg3 and e1000 already do this.
Before you invoke netif_wake_queue(dev) you check if there is a
threshold of space reached to insert new packets.

Here's an example of how I added it to the tun driver. Observe the
setting of dev->xmit_win

---
+#define NETDEV_LTT 4 /* the low threshold to open up the tx path */
..
..
u32 t = skb_queue_len(&tun->readq);
if (netif_queue_stopped(tun->dev) && t < NETDEV_LTT) {
tun->dev->xmit_win = tun->dev->tx_queue_len;
netif_wake_queue(tun->dev);
}
---

Here's how the batching e1000 driver does it:

--
if (unlikely(cleaned && netif_carrier_ok(netdev) &&
 E1000_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD)) {

if (netif_queue_stopped(netdev)) {
   int rspace =  E1000_DESC_UNUSED(tx_ring) - 

Re: [PATCHES] TX batching

2007-09-24 Thread Kok, Auke
jamal wrote:
 On Mon, 2007-24-09 at 00:00 -0700, Kok, Auke wrote:
 
 that's bad to begin with :) - please send those separately so I can 
 fasttrack them
 into e1000e and e1000 where applicable.
 
 Ive been CCing you ;- Most of the changes are readability and
 reusability with the batching.
 
 But yes, I'm very inclined to merge more features into e1000e than e1000. I 
 intend
 to put multiqueue support into e1000e, as *all* of the hardware that it will
 support has multiple queues. Putting in any other performance feature like tx
 batching would absolutely be interesting.
 
 I looked at the e1000e and it is very close to e1000 so i should be able
 to move the changes easily. Most importantly, can i kill LLTX?
 For tx batching, we have to wait to see how Dave wants to move forward;
 i will have the patches but it is not something you need to push until
 we see where that is going.

hmm, I thought I already removed that, but now I see some remnants from that.

By all means, please send a separate patch for that!

Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock

2007-09-24 Thread Waskiewicz Jr, Peter P
 The one thing that seems obvious is to use 
 dev-hard_prep_xmit() in the patches i posted to select the 
 xmit ring. You should be able to do figure out the txmit ring 
 without holding any lock. 

I've looked at that as a candidate to use.  The lock for enqueue would
be needed when actually placing the skb into the appropriate software
queue for the qdisc, so it'd be quick.

 I lost track of how/where things went since the last 
 discussion; so i need to wrap my mind around it to make 
 sensisble suggestions - I know the core patches are in the 
 kernel but havent paid attention to details and if you look 
 at my second patch youd see a comment in
 dev_batch_xmit() which says i need to scrutinize multiqueue more. 

No worries.  I'll try to get things together on my end and provide some
patches to add a per-queue lock.  In the meantime, I'll take a much
closer look at the batching code, since I've stopped looking at the
patches in-depth about a month ago.  :-(

Thanks,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock

2007-09-24 Thread jamal
On Mon, 2007-24-09 at 15:57 -0700, Waskiewicz Jr, Peter P wrote:

 I've looked at that as a candidate to use.  The lock for enqueue would
 be needed when actually placing the skb into the appropriate software
 queue for the qdisc, so it'd be quick.

The enqueue is easy to comprehend. The single device queue lock should
suffice. The dequeue is interesting:
Maybe you can point me to some doc or describe to me the dequeue aspect;
are you planning to have an array of tx locks, one per ring?
What is the policy for how the qdisc queues are locked/mapped to tx rings? 

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock

2007-09-24 Thread Waskiewicz Jr, Peter P
 On Mon, 2007-24-09 at 15:57 -0700, Waskiewicz Jr, Peter P wrote:
 
  I've looked at that as a candidate to use.  The lock for 
 enqueue would 
  be needed when actually placing the skb into the 
 appropriate software 
  queue for the qdisc, so it'd be quick.
 
 The enqueue is easy to comprehend. The single device queue 
 lock should suffice. The dequeue is interesting:

We should make sure we're symmetric with the locking on enqueue to
dequeue.  If we use the single device queue lock on enqueue, then
dequeue will also need to check that lock in addition to the individual
queue lock.  The details of this are more trivial than the actual
dequeue to make it efficient though.

 Maybe you can point me to some doc or describe to me the 
 dequeue aspect; are you planning to have an array of txlocks 
 per, one per ring?
 How is the policy to define the qdisc queues locked/mapped to 
 tx rings? 

The dequeue locking would be pushed into the qdisc itself.  This is how
I had it originally, and it did make the code more complex, but it was
successful at breaking the heavily-contended queue_lock apart.  I have a
subqueue structure right now in netdev, which only has queue_state (for
netif_{start|stop}_subqueue).  This state is checked in sch_prio right
now in the dequeue for both prio and rr.  My approach is to add a
queue_lock in that struct, so each queue allocated by the driver would
have a lock per queue.  Then in dequeue, that lock would be taken when
the skb is about to be dequeued.

The skb->queue_mapping field also maps directly to the queue index
itself, so it can be unlocked easily outside of the context of the
dequeue function.  The policy would be to use a spin_trylock() in
dequeue, so that dequeue can still do work if enqueue or another dequeue
is busy.  And the allocation of qdisc queues to device queues is assumed
to be one-to-one (that's how the qdisc behaves now).
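
In rough pseudo-code (illustrative only; the struct, field and helper names
below are made up and not from any posted patch), the dequeue side would
look something like:

struct my_subqueue {
	spinlock_t	queue_lock;	/* protects this band's queue only */
	u16		queue_state;
};

static struct sk_buff *my_dequeue_band(struct Qdisc *sch, int band)
{
	struct my_subqueue *q = my_subqueue_get(sch->dev, band);
	struct sk_buff *skb;

	if (!spin_trylock(&q->queue_lock))
		return NULL;		/* band busy; caller tries another */

	skb = my_band_dequeue(sch, band);	/* actual list removal */
	spin_unlock(&q->queue_lock);
	return skb;
}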

I really just need to put my nose to the grindstone and get the patches
together and to the list...stay tuned.

Thanks,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHES] TX batching

2007-09-24 Thread Jeff Garzik

jamal wrote:
If the Intel folks will accept the patch I'd really like to kill 
the e1000 LLTX interface.



If I understood DaveM correctly, it sounds like we want to 
deprecate all use of LLTX on real hardware?  If so, several such 
projects might be considered, as well as possibly simplifying the TX 
batching work.
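
As context for what killing LLTX would mean for a driver, here is a
rough sketch (not the e1000/e1000e code) of the two hard_start_xmit
shapes being compared, assuming the 2.6.23-era return codes and a
hypothetical driver-private struct:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>

struct my_priv {			/* hypothetical private area */
	spinlock_t tx_lock;
};

/* With NETIF_F_LLTX the core takes no tx lock, so the driver must
 * serialize itself and be prepared to punt. */
static int lltx_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);
	unsigned long flags;

	if (!spin_trylock_irqsave(&priv->tx_lock, flags))
		return NETDEV_TX_LOCKED;	/* core will requeue and retry */

	/* ... map the skb and post it to the hardware ring ... */

	spin_unlock_irqrestore(&priv->tx_lock, flags);
	return NETDEV_TX_OK;
}

/* Without LLTX the core already holds netif_tx_lock(dev) here, so the
 * private lock and the trylock/retry dance simply go away. */
static int plain_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* ... map the skb and post it to the hardware ring ... */
	return NETDEV_TX_OK;
}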


Also, WRT e1000 specifically, I was hoping to minimize changes, and 
focus people on e1000e.


e1000e replaces (deprecates) large portions of e1000, namely the support 
for the modern PCI Express chips.  When e1000e has proven itself in the 
field, we can potentially look at several e1000 simplifications during 
the large-scale code removal that becomes possible.


Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock

2007-09-24 Thread Stephen Hemminger
On Mon, 24 Sep 2007 16:47:06 -0700
Waskiewicz Jr, Peter P [EMAIL PROTECTED] wrote:

  On Mon, 2007-24-09 at 15:57 -0700, Waskiewicz Jr, Peter P wrote:
  
   I've looked at that as a candidate to use.  The lock for 
  enqueue would 
   be needed when actually placing the skb into the 
  appropriate software 
   queue for the qdisc, so it'd be quick.
  
  The enqueue is easy to comprehend. The single device queue 
  lock should suffice. The dequeue is interesting:
 
 We should make sure we're symmetric with the locking on enqueue to
 dequeue.  If we use the single device queue lock on enqueue, then
 dequeue will also need to check that lock in addition to the individual
 queue lock.  The details of this are more trivial than making the actual
 dequeue efficient, though.
 
  Maybe you can point me to some doc or describe to me the 
  dequeue aspect; are you planning to have an array of tx locks, 
  one per ring?
  What is the policy for how the qdisc queues are locked/mapped to 
  tx rings? 
 
 The dequeue locking would be pushed into the qdisc itself.  This is how
 I had it originally, and it did make the code more complex, but it was
 successful at breaking the heavily-contended queue_lock apart.  I have a
 subqueue structure right now in netdev, which only has queue_state (for
 netif_{start|stop}_subqueue).  This state is checked in sch_prio right
 now in the dequeue for both prio and rr.  My approach is to add a
 queue_lock in that struct, so each queue allocated by the driver would
 have a lock per queue.  Then in dequeue, that lock would be taken when
 the skb is about to be dequeued.
 
 The skb->queue_mapping field also maps directly to the queue index
 itself, so it can be unlocked easily outside of the context of the
 dequeue function.  The policy would be to use a spin_trylock() in
 dequeue, so that dequeue can still do work if enqueue or another dequeue
 is busy.  And the allocation of qdisc queues to device queues is assumed
 to be one-to-one (that's how the qdisc behaves now).
 
 I really just need to put my nose to the grindstone and get the patches
 together and to the list...stay tuned.
 
 Thanks,
 -PJ Waskiewicz
 -


Since we are redoing this, is there any way to make the whole TX path
more lockless?  The existing model seems to be more of a monitor than
a real locking model.
-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock

2007-09-24 Thread Waskiewicz Jr, Peter P
  I really just need to put my nose to the grindstone and get the 
  patches together and to the list...stay tuned.
  
  Thanks,
  -PJ Waskiewicz
  -
 
 
 Since we are redoing this, is there any way to make the whole 
 TX path more lockless?  The existing model seems to be more 
 of a monitor than a real locking model.

That seems quite reasonable.  I will certainly see what I can do.

Thanks Stephen,

-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] sky2: FE+ Phy initialization

2007-09-24 Thread Stephen Hemminger
One more snippet of PHY initialization required for FE+ chips.
Discovered in latest sk98lin 10.21.1.3 driver.

Please apply to 2.6.23.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/drivers/net/sky2.c2007-09-24 09:10:36.0 -0700
+++ b/drivers/net/sky2.c2007-09-24 10:22:28.0 -0700
@@ -338,6 +338,16 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
 	if (!(hw->flags & SKY2_HW_GIGABIT)) {
 		/* enable automatic crossover */
 		ctrl |= PHY_M_PC_MDI_XMODE(PHY_M_PC_ENA_AUTO) >> 1;
+
+		if (hw->chip_id == CHIP_ID_YUKON_FE_P &&
+		    hw->chip_rev == CHIP_REV_YU_FE2_A0) {
+			u16 spec;
+
+			/* Enable Class A driver for FE+ A0 */
+			spec = gm_phy_read(hw, port, PHY_MARV_FE_SPEC_2);
+			spec |= PHY_M_FESC_SEL_CL_A;
+			gm_phy_write(hw, port, PHY_MARV_FE_SPEC_2, spec);
+		}
 	} else {
 		/* disable energy detect */
 		ctrl &= ~PHY_M_PC_EN_DET_MSK;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sky2: FE+ Phy initialization

2007-09-24 Thread Jeff Garzik

Stephen Hemminger wrote:

One more snippet of PHY initialization required for FE+ chips.
Discovered in latest sk98lin 10.21.1.3 driver.

Please apply to 2.6.23.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sky2: be more selective about FIFO watchdog

2007-09-24 Thread Jeff Garzik

Stephen Hemminger wrote:

Be more selective about when to enable the ram buffer watchdog code.
It is unnecessary on XL A3 or later revs, and with Yukon FE
the buffer is so small (4K) that the watchdog detects false positives.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
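
The shape of that selectivity, as a rough sketch rather than the
applied patch (the helper name is made up; the chip constants and
sky2_read8()/B2_E_0 are from the existing driver):

static int sky2_needs_fifo_watchdog(const struct sky2_hw *hw)
{
	/* XL A3 and later revisions do not need the workaround */
	if (hw->chip_id == CHIP_ID_YUKON_XL &&
	    hw->chip_rev >= CHIP_REV_YU_XL_A3)
		return 0;

	/* Yukon FE's 4K ram buffer is too small; the check misfires */
	if (hw->chip_id == CHIP_ID_YUKON_FE)
		return 0;

	/* only meaningful when a ram buffer exists at all */
	return sky2_read8(hw, B2_E_0) != 0;
}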


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Remove broken netfilter binary sysctls from bridging code

2007-09-24 Thread Patrick McHardy
Stephen Hemminger wrote:
 On Mon, 24 Sep 2007 18:55:38 +0200
 Patrick McHardy [EMAIL PROTECTED] wrote:
 
Eric W. Biederman wrote:

A really good fix would be to remove the binary side and then to
modify brnf_sysctl_call_tables to allocate a temporary ctl_table and
integer on the stack and only set ctl->data after we have normalized
the written value.  But since in practice nothing cares about
the race, a better fix probably isn't worth it.


I seem to be missing something: the entire brnf_sysctl_call_tables
thing looks purely cosmetic to me, so wouldn't it be better to simply
remove it?
 
 
 I agree, removing seems like a better option.  But we'd probably need to go
 through a 3-6 month warning period, since sysctls are technically an API.


I meant removing the brnf_sysctl_call_tables function, not the sysctls
themselves; all it does is change values != 0 to 1.  Or did you
actually mean that something in userspace might depend on reading
back the value 1 after writing a value != 0?
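
For concreteness, a minimal sketch of the stack-copy approach Eric
describes, assuming the 2.6.23-era proc handler signature; the name
mirrors the existing function but this is not the in-tree code:

#include <linux/fs.h>
#include <linux/sysctl.h>

static int brnf_sysctl_call_tables(ctl_table *ctl, int write,
				   struct file *filp, void __user *buffer,
				   size_t *lenp, loff_t *ppos)
{
	int val = *(int *)ctl->data;	/* start from the current value */
	ctl_table tmp = *ctl;		/* temporary table on the stack */
	int ret;

	tmp.data = &val;
	ret = proc_dointvec(&tmp, write, filp, buffer, lenp, ppos);

	if (write && ret == 0)
		*(int *)ctl->data = !!val;	/* publish only the normalized 0/1 */
	return ret;
}

Whether that is worth doing still comes down to the question above:
does anything in userspace actually rely on reading back 1 after
writing a non-zero value?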
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-24 Thread Kyle Moffett

On Sep 24, 2007, at 13:32:23, Lennart Sorensen wrote:

On Fri, Sep 21, 2007 at 11:37:52PM +0100, Denys Vlasenko wrote:

But I compile net/* into bzImage. I like netbooting :)


Isn't it possible to netboot with an initramfs image?  I am pretty  
sure I have seen some systems do exactly that.


Yeah, I've got Debian boxes that have never *not* netbooted (one Dell  
Craptiplex box whose BIOS and ACPI suck so badly it can't even  
load GRUB/LILO, although Windows somehow works fine).  So they boot  
PXELinux using the PXE boot ROM on the NICs, and it loads both a  
kernel and an initramfs into memory.  The kernel is stock Debian and  
hardly has enough built in to spit at you, let alone find network or  
disks, but it manages to load everything it needs off the  
automagically-generated initramfs.


Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[git patches] net driver fixes

2007-09-24 Thread Jeff Garzik

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
upstream-linus

to receive the following updates:

 drivers/net/pcmcia/3c589_cs.c |2 +-
 drivers/net/r8169.c   |   14 +-
 drivers/net/sky2.c|   37 -
 drivers/net/sky2.h|2 +-
 4 files changed, 39 insertions(+), 16 deletions(-)

Edward Hsu (1):
  r8169: correct phy parameters for the 8110SC

Francois Romieu (1):
  r8169: workaround against ignored TxPoll writes (8168)

Jeff Garzik (1):
  Revert drivers/net/pcmcia/3c589_cs: fix port configuration switcheroo

Stephen Hemminger (2):
  sky2: FE+ Phy initialization
  sky2: be more selective about FIFO watchdog

diff --git a/drivers/net/pcmcia/3c589_cs.c b/drivers/net/pcmcia/3c589_cs.c
index c06cae3..503f268 100644
--- a/drivers/net/pcmcia/3c589_cs.c
+++ b/drivers/net/pcmcia/3c589_cs.c
@@ -116,7 +116,7 @@ struct el3_private {
 spinlock_t lock;
 };
 
-static const char *if_names[] = { "auto", "10base2", "10baseT", "AUI" };
+static const char *if_names[] = { "auto", "10baseT", "10base2", "AUI" };
 
 /**/
 
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b85ab4a..c921ec3 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1228,7 +1228,10 @@ static void rtl8169_hw_phy_config(struct net_device *dev)
return;
}
 
-   /* phy config for RTL8169s mac_version C chip */
+	if ((tp->mac_version != RTL_GIGA_MAC_VER_02) &&
+	    (tp->mac_version != RTL_GIGA_MAC_VER_03))
+   return;
+
mdio_write(ioaddr, 31, 0x0001); //w 31 2 0 1
mdio_write(ioaddr, 21, 0x1000); //w 21 15 0 1000
mdio_write(ioaddr, 24, 0x65c7); //w 24 15 0 65c7
@@ -2567,6 +2570,15 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
 	    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
netif_wake_queue(dev);
}
+   /*
+* 8168 hack: TxPoll requests are lost when the Tx packets are
+* too close. Let's kick an extra TxPoll request when a burst
+* of start_xmit activity is detected (if it is not detected,
+* it is slow enough). -- FR
+*/
+   smp_rmb();
+	if (tp->cur_tx != dirty_tx)
+   RTL_W8(TxPoll, NPQ);
}
 }
 
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index eaffe55..0792031 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -338,6 +338,16 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
 	if (!(hw->flags & SKY2_HW_GIGABIT)) {
 		/* enable automatic crossover */
 		ctrl |= PHY_M_PC_MDI_XMODE(PHY_M_PC_ENA_AUTO) >> 1;
+
+		if (hw->chip_id == CHIP_ID_YUKON_FE_P &&
+		    hw->chip_rev == CHIP_REV_YU_FE2_A0) {
+			u16 spec;
+
+			/* Enable Class A driver for FE+ A0 */
+			spec = gm_phy_read(hw, port, PHY_MARV_FE_SPEC_2);
+			spec |= PHY_M_FESC_SEL_CL_A;
+			gm_phy_write(hw, port, PHY_MARV_FE_SPEC_2, spec);
+		}
 	} else {
 		/* disable energy detect */
 		ctrl &= ~PHY_M_PC_EN_DET_MSK;
@@ -816,7 +826,8 @@ static void sky2_mac_init(struct sky2_hw *hw, unsigned port)
sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_CLR);
sky2_write16(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_OPER_ON);
 
-	if (!(hw->flags & SKY2_HW_RAMBUFFER)) {
+   /* On chips without ram buffer, pause is controled by MAC level */
+   if (sky2_read8(hw, B2_E_0) == 0) {
sky2_write8(hw, SK_REG(port, RX_GMF_LP_THR), 768/8);
sky2_write8(hw, SK_REG(port, RX_GMF_UP_THR), 1024/8);
 
@@ -1271,7 +1282,7 @@ static int sky2_up(struct net_device *dev)
struct sky2_port *sky2 = netdev_priv(dev);
 	struct sky2_hw *hw = sky2->hw;
 	unsigned port = sky2->port;
-   u32 imask;
+   u32 imask, ramsize;
int cap, err = -ENOMEM;
 	struct net_device *otherdev = hw->dev[sky2->port^1];
 
@@ -1326,13 +1337,12 @@ static int sky2_up(struct net_device *dev)
 
sky2_mac_init(hw, port);
 
-	if (hw->flags & SKY2_HW_RAMBUFFER) {
-   /* Register is number of 4K blocks on internal RAM buffer. */
-   u32 ramsize = sky2_read8(hw, B2_E_0) * 4;
+   /* Register is number of 4K blocks on internal RAM buffer. */
+   ramsize = sky2_read8(hw, B2_E_0) * 4;
+	if (ramsize > 0) {
u32 rxspace;
 
-	printk(KERN_DEBUG PFX "%s: ram buffer %dK\n", dev->name, ramsize);
-
+