Re: [IPROUTE] 2nd try

2006-10-19 Thread Michael Prokop
[Note: I'm the contributor of the manpage]

* Stephen Hemminger [EMAIL PROTECTED] wrote:
> Alexander Wirt wrote:

[manpage for ss]

>> \fBss\fP is another utility to investigate sockets. Functionally it is
>> NOT better than netstat combined with some perl/awk scripts and though it is
>> surely faster it is not enough to make it much better. :-) So, stop reading
>> this now and do not waste your time. Well, certainly, it proposes some
>> functionality, which current netstat is still not able to do, but surely will
>> soon.

> Amusing but hardly in keeping with the style and tone of other man pages.

The lines above are a 1:1 copy of the text by Alexey Kuznetsov I found in
the iproute-doc package at /usr/share/doc/iproute-doc/ss.html
[see http://www.math.ias.edu/doc/iproute-2.6.9/ss.ps].
Of course feel free to drop/replace the lines.

-mika-
-- 
 ,'`. http://www.michael-prokop.at/
(  grml.org -» Linux Live-CD for texttool-users and sysadmins
 `._,' http://www.grml.org/

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [NET] Size listen hash tables using backlog hint

2006-10-19 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 19 Oct 2006 07:12:58 +0200

> A 66 MHz 486 can perform 1.000.000 divisions per second. Is it a 'slow' cpu ?

Sparc and some other embedded chips have no division/modulus integer
instruction and do it in software.

> So... what do you prefer :
>
> 1) Keep the modulus
> 2) allocate two blocks of ram (power-of-two hash size, but one extra
> indirection)
> 3) waste near half of ram because one block allocated, and power-of-two hash
> size.

I thought the problem was that you use a modulus and non-power-of-2
hash table size because rounding up to the next power of 2 wastes
a lot of space?  Given that, my suggestion is simply to not round
up to the next power-of-2, or only do so when we are very very close
to that next power-of-2.


[PATCH wireless-dev] set freq in ieee80211_rx_status

2006-10-19 Thread Michael Wu
set freq in ieee80211_rx_status

This patch fixes the RX handler in adm8211 and p54 to report the current 
frequency and channel. Should probably be handled in d80211 instead, but this 
will fix things for now. It also eliminates some definitions in adm8211.h 
that are no longer necessary.

Signed-off-by: Michael Wu [EMAIL PROTECTED]

---

 drivers/net/wireless/d80211/adm8211/adm8211.c   |3 ++
 drivers/net/wireless/d80211/adm8211/adm8211.h   |   42 
---
 drivers/net/wireless/d80211/p54/prism54.h   |1 +
 drivers/net/wireless/d80211/p54/prism54common.c |6 +++
 4 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/net/wireless/d80211/adm8211/adm8211.c b/drivers/net/wireless/d80211/adm8211/adm8211.c
index 3bce55f..9f965d3 100644
--- a/drivers/net/wireless/d80211/adm8211/adm8211.c
+++ b/drivers/net/wireless/d80211/adm8211/adm8211.c
@@ -551,6 +551,9 @@ static void adm8211_interrupt_rci(struct
if (rate = 4)
rx_status.rate = rate_tbl[rate];
 
+   rx_status.channel = priv->channel;
+   rx_status.freq = adm8211_channels[priv->channel - 1].freq;
+
/* remove FCS */
/* TODO: remove this and set flag in ieee80211_hw instead? */
if (dev->flags & IFF_PROMISC)
diff --git a/drivers/net/wireless/d80211/adm8211/adm8211.h b/drivers/net/wireless/d80211/adm8211/adm8211.h
index 89e0fdf..a579a90 100644
--- a/drivers/net/wireless/d80211/adm8211/adm8211.h
+++ b/drivers/net/wireless/d80211/adm8211/adm8211.h
@@ -590,47 +590,33 @@ static const struct ieee80211_chan_range
 
 static const struct ieee80211_channel adm8211_channels[] = {
{ .chan = 1,
- .freq = 2412,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2412},
{ .chan = 2,
- .freq = 2417,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2417},
{ .chan = 3,
- .freq = 2422,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2422},
{ .chan = 4,
- .freq = 2427,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2427},
{ .chan = 5,
- .freq = 2432,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2432},
{ .chan = 6,
- .freq = 2437,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2437},
{ .chan = 7,
- .freq = 2442,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2442},
{ .chan = 8,
- .freq = 2447,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2447},
{ .chan = 9,
- .freq = 2452,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2452},
{ .chan = 10,
- .freq = 2457,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2457},
{ .chan = 11,
- .freq = 2462,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2462},
{ .chan = 12,
- .freq = 2467,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2467},
{ .chan = 13,
- .freq = 2472,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS},
+ .freq = 2472},
{ .chan = 14,
- .freq = 2484,
- .flag = IEEE80211_CHAN_W_SCAN | IEEE80211_CHAN_W_ACTIVE_SCAN | IEEE80211_CHAN_W_IBSS}
+ .freq = 2484},
 };
 
 #endif /* ADM8211_H */
diff --git a/drivers/net/wireless/d80211/p54/prism54.h b/drivers/net/wireless/d80211/p54/prism54.h
index a165cc5..f2aaafe 100644
--- a/drivers/net/wireless/d80211/p54/prism54.h
+++ b/drivers/net/wireless/d80211/p54/prism54.h
@@ -53,6 +53,7 @@ struct p54_common {
int (*open)(struct net_device *dev);
void (*stop)(struct net_device *dev);
int mode;
+   int channel;
struct pda_iq_autocal_entry *iq_autocal;
unsigned int iq_autocal_len;
struct pda_channel_output_limit *output_limit;
diff --git a/drivers/net/wireless/d80211/p54/prism54common.c b/drivers/net/wireless/d80211/p54/prism54common.c
index a96a4fc..34bccee 100644
--- a/drivers/net/wireless/d80211/p54/prism54common.c
+++ b/drivers/net/wireless/d80211/p54/prism54common.c
@@ -206,11 +206,14 @@ EXPORT_SYMBOL_GPL(p54_parse_eeprom);
 
 static void 

Re: [IPROUTE] 2nd try

2006-10-19 Thread David Miller
From: Michael Prokop [EMAIL PROTECTED]
Date: Thu, 19 Oct 2006 08:00:57 +0200

> The lines above are a 1:1 copy of the text by Alexey Kuznetsov I found in
> the iproute-doc package at /usr/share/doc/iproute-doc/ss.html
> [see http://www.math.ias.edu/doc/iproute-2.6.9/ss.ps].
> Of course feel free to drop/replace the lines.

Alexey should have more confidence in the software he writes
:-)


[PATCH wireless-dev] p54: fix stalling in TX queue

2006-10-19 Thread Michael Wu
p54: fix stalling in TX queue

This patch makes the p54 TX queue not stall anymore. Probably not the most 
efficient thing to do, but it's better than stalling.

It also adds a small comment to prism54common.h about the origin of the pda 
definitions and inserts a missing verb in the comment for p54_assign_address.

Signed-off-by: Michael Wu [EMAIL PROTECTED]

---

 drivers/net/wireless/d80211/p54/prism54common.c |   10 +-
 drivers/net/wireless/d80211/p54/prism54common.h |2 ++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/d80211/p54/prism54common.c b/drivers/net/wireless/d80211/p54/prism54common.c
index 34bccee..6f4cb5f 100644
--- a/drivers/net/wireless/d80211/p54/prism54common.c
+++ b/drivers/net/wireless/d80211/p54/prism54common.c
@@ -261,8 +261,8 @@ static void p54_rx_frame_sent(struct net
entry = entry->next;
}

-   if (freed > IEEE80211_MAX_RTS_THRESHOLD + 0x170 + sizeof(struct p54_control_hdr))
-       ieee80211_wake_queue(dev, 0);
+   if (skb_queue_empty(&priv->tx_queue))
+       ieee80211_wake_queue(dev, 0);
 }
 
 static void p54_rx_control(struct net_device *dev, struct sk_buff *skb)
@@ -308,9 +308,9 @@ EXPORT_SYMBOL_GPL(p54_rx);
 /*
  * So, the firmware is somewhat stupid and doesn't know what places in its
  * memory incoming data should go to. By poking around in the firmware, we
- * can some unused memory to upload our packets to. However, data that we want
- * the card to TX needs to stay intact until the card has told us that it is
- * done with it. This function finds empty places we can upload to and
+ * can find some unused memory to upload our packets to. However, data that we
+ * want the card to TX needs to stay intact until the card has told us that
+ * it is done with it. This function finds empty places we can upload to and
  * marks allocated areas as reserved if necessary. p54_rx_frame_sent frees
  * allocated areas.
  */
diff --git a/drivers/net/wireless/d80211/p54/prism54common.h b/drivers/net/wireless/d80211/p54/prism54common.h
index e2c1ff4..15199f6 100644
--- a/drivers/net/wireless/d80211/p54/prism54common.h
+++ b/drivers/net/wireless/d80211/p54/prism54common.h
@@ -43,6 +43,8 @@ #define FW_LM86 0x4c4d3836
 #define FW_LM87 0x4c4d3837
 #define FW_LM20 0x4c4d3230
 
+/* PDA defines are Copyright (C) 2005 Nokia Corporation (taken from islsm_pda.h) */
+
 struct pda_entry {
u16 len;/* includes both code and data */
u16 code;




Re: (usagi-core 31424) Re: [PATCH 7/13] [RFC] [IPV6] Move source address selection into route lookup.

2006-10-19 Thread Ville Nuorvala
On 10/19/06 05:27, Mitsuru Chinen wrote:

Hello Mitsuru-san,

> At the first Echo Request, its source address is a global address.
>
> | # ping6 -n -c 1 -i 1 -s 1452 -p 00 -w 2 -I eth1 FF1E::1:2
> | PATTERN: 0x00
> | PING FF1E::1:2(ff1e::1:2) from 3ffe:501::100:2d0:b7ff:fe9a:455b eth1: 1452 data bytes
> |
> | --- FF1E::1:2 ping statistics ---
> | 3 packets transmitted, 0 received, 100% loss, time 1999ms
>
> However, after receiving a Too Big message, the source address
> of Echo Request is changed to :: at the failure case.

thank you for the thorough test report! I'll try to track down the reason.

Regards,
Ville


Re: [PATCH] [NET] Size listen hash tables using backlog hint

2006-10-19 Thread Eric Dumazet

David Miller wrote:
> From: Eric Dumazet [EMAIL PROTECTED]
> Date: Thu, 19 Oct 2006 07:12:58 +0200
>
>> A 66 MHz 486 can perform 1.000.000 divisions per second. Is it a 'slow' cpu ?
>
> Sparc and some other embedded chips have no division/modulus integer
> instruction and do it in software.


How many times will this division be done? As I said, only at TCP session
establishment.

Are you aware that a division is done in the slab code every time you kfree()
a network frame? That is much more problematic than SYN packets.

>> So... what do you prefer :
>>
>> 1) Keep the modulus
>> 2) allocate two blocks of ram (power-of-two hash size, but one extra
>> indirection)
>> 3) waste near half of ram because one block allocated, and power-of-two hash
>> size.
>
> I thought the problem was that you use a modulus and non-power-of-2
> hash table size because rounding up to the next power of 2 wastes
> a lot of space?  Given that, my suggestion is simply to not round
> up to the next power-of-2, or only do so when we are very very close
> to that next power-of-2.


My main problem is being able to use a large hash table on big servers.

With the power-of-two constraint plus the kmalloc() maximum size constraint,
we can only use half the size we otherwise could.


Are you suggesting something like :

Allocation time:

if (cpu is very_very_slow or hash size small_enough) {
  ptr->size = power_of_two;
  ptr->size_mask = (power_of_two - 1);
} else {
  ptr->size = somevalue;
  ptr->size_mask = ~0;
}

Lookup time :
---
if (ptr->size_mask != ~0)
  slot = hash & ptr->size_mask;
else
  slot = hash % ptr->size;

The extra conditional branch may be more expensive than just doing division on 
99% of cpus...
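
For readers following the thread, the two-mode scheme in the pseudocode above can be sketched in plain userspace C. This is only an illustration: `struct hash_tbl`, `hash_tbl_init()` and `hash_tbl_slot()` are made-up names, and the sketch picks the mask mode whenever the size happens to be a power of two rather than by CPU speed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the table header discussed above. */
struct hash_tbl {
	uint32_t size;      /* number of slots, must be non-zero */
	uint32_t size_mask; /* size - 1 for power-of-two sizes, else ~0 */
};

/* Allocation time: use a mask only when size is a power of two. */
static void hash_tbl_init(struct hash_tbl *t, uint32_t size)
{
	assert(size != 0);
	t->size = size;
	if ((size & (size - 1)) == 0)
		t->size_mask = size - 1;
	else
		t->size_mask = ~(uint32_t)0;
}

/* Lookup time: one extra branch, then either an AND or a modulus. */
static uint32_t hash_tbl_slot(const struct hash_tbl *t, uint32_t hash)
{
	if (t->size_mask != ~(uint32_t)0)
		return hash & t->size_mask;
	return hash % t->size;
}
```

With a 512-slot table the lookup takes the AND path; with 500 slots it falls back to the modulus, which is the extra conditional branch the paragraph above worries about.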




Re: [PATCH] [NET] Size listen hash tables using backlog hint

2006-10-19 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 19 Oct 2006 08:34:53 +0200

> My main problem is being able to use a large hash table on big servers.
>
> With power-of two constraint, plus kmalloc max size constraint, we can use
> half the size we could.

Switch to vmalloc() at the kmalloc() cut-off point, just like
I did for the other hashes in the tree.

> Are you suggesting something like :

Not at all.

BTW, this all reminds me that we need to be careful that this
isn't allowing arbitrary users to eat up a ton of unswappable
ram.  It's pretty easy to open up a lot of listening sockets :)


Re: [PATCH] [NET] Size listen hash tables using backlog hint

2006-10-19 Thread Eric Dumazet
On Thursday 19 October 2006 08:57, David Miller wrote:

> Switch to vmalloc() at the kmalloc() cut-off point, just like
> I did for the other hashes in the tree.

Yes, so you basically want option 4) :)


4) Use vmalloc() if size_lopt > PAGE_SIZE

keep a power of two :
nr_table_entries = 2 ^ X;

size_lopt = sizeof(listen_sock) + nr_table_entries*sizeof(void*);
if (size_lopt > PAGE_SIZE)
  ptr = vmalloc(size_lopt);
else
  ptr = kmalloc(size_lopt, GFP_KERNEL);

Pros :
Less than one page is wasted (i.e. allocated but not used)
vmalloc() is nicer for NUMA, so I am pleased :)
vmalloc() has more chances to succeed when memory is fragmented
keep a power-of-two hash table size

Cons :
TLB cost

// for reference
struct listen_sock {
u8  max_qlen_log;
/* 3 bytes hole, try to use */
int qlen;
int qlen_young;
int clock_hand;
u32 hash_rnd;
u32 nr_table_entries;
struct request_sock *syn_table[0]; /* hash table follows this header */
};
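
As a rough illustration of option 4, the size computation and the allocator choice can be modelled in userspace C. Everything here is an assumption made for the sketch: `SKETCH_PAGE_SIZE` is fixed at 4 KiB, `struct listen_sock_hdr` mirrors the header above without the trailing array, and `lopt_size()` / `pick_allocator()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Userspace mirror of the header quoted above (trailing array omitted). */
struct listen_sock_hdr {
	uint8_t  max_qlen_log;
	int      qlen;
	int      qlen_young;
	int      clock_hand;
	uint32_t hash_rnd;
	uint32_t nr_table_entries;
};

enum sketch_allocator { SKETCH_KMALLOC, SKETCH_VMALLOC };

/* Total bytes needed: header plus one pointer per hash slot. */
static size_t lopt_size(uint32_t nr_table_entries)
{
	/* the table size is expected to be a power of two */
	assert(nr_table_entries != 0 &&
	       (nr_table_entries & (nr_table_entries - 1)) == 0);
	return sizeof(struct listen_sock_hdr) +
	       (size_t)nr_table_entries * sizeof(void *);
}

/* Above one page the sketch picks vmalloc(), otherwise kmalloc(). */
static enum sketch_allocator pick_allocator(size_t size)
{
	return size > SKETCH_PAGE_SIZE ? SKETCH_VMALLOC : SKETCH_KMALLOC;
}
```

Under these assumptions a 128-entry table stays below one page and would use kmalloc(), while a 1024-entry table crosses the page boundary and would use vmalloc().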



> BTW, this all reminds me that we need to be careful that this
> isn't allowing arbitrary users to eat up a ton of unswappable
> ram.  It's pretty easy to open up a lot of listening sockets :)

With the current somaxconn=128 limit, my patch ends up allocating less RAM
(half a page) than the current x86_64 kernel (2 pages).

Thank you



Re: [PATCH] [NET] Size listen hash tables using backlog hint

2006-10-19 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 19 Oct 2006 10:29:00 +0200

> Cons :
> TLB cost

For those hot x86 and x86_64 cpus you tend to be using, this
particular cost is relatively small.  :-) It's effectively like
another memory reference in the worst case, in the best case
it's free.

> With actual somaxconn=128 limit, my patch ends in allocating less
> ram (half of a page) than current x86_64 kernel (2 pages)

Understood.  But the issue is that there are greater security
implications than before when increasing this sysctl.

To be honest, it's probably water under the bridge, because
you can already stuff SOMAXCONN sockets into the
system per listening socket, which eats up a lot more memory than the
hash table does. :-)



Re: [PATCH] [NET] Size listen hash tables using backlog hint

2006-10-19 Thread Eric Dumazet
On Thursday 19 October 2006 10:41, David Miller wrote:
> From: Eric Dumazet [EMAIL PROTECTED]
> Date: Thu, 19 Oct 2006 10:29:00 +0200
>
>> Cons :
>> TLB cost
>
> For those hot x86 and x86_64 cpus you tend to be using, this
> particular cost is relatively small.  :-) It's effectively like
> another memory reference in the worst case, in the best case
> it's free.

Well, it was a private joke with you, as you *use* machines that take a fault
on a TLB miss :)

BTW I do care about old machines too...


Re: (usagi-core 31424) Re: [PATCH 7/13] [RFC] [IPV6] Move source address selection into route lookup.

2006-10-19 Thread Ville Nuorvala
Mitsuru-san,

could you apply patch #12 and rerun the test? I think this particular
problem is caused by the routing cache entry not having a valid source
address in the first place. As ip6_dst_lookup() doesn't do the source
address lookup anymore, it is critical that we only store cache entries
with complete address data for both destination and source.

These two patches apparently shouldn't have been split in the first
place. Sorry!

Regards,
Ville


[PATCH] [NET] Size listen hash tables using backlog hint

2006-10-19 Thread Eric Dumazet
Hi David

Here is the second try for this patch. Many thanks for your feedback.

[PATCH] [NET] Size listen hash tables using backlog hint

We currently allocate a fixed-size hash table of 512 (TCP_SYNQ_HSIZE) slots for
each LISTEN socket, regardless of various parameters (the listen backlog, for
example).

On x86_64, this means order-1 allocations (which might fail), even for 'small'
sockets expecting few connections. Conversely, a huge server wanting a
backlog of 5 is slowed down a bit because of this fixed limit.

This patch makes the size of the listen hash table a dynamic parameter,
depending on :
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
- backlog value given by user application  (2nd parameter of listen())

For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of 
kmalloc().

We still limit memory allocation with the two existing tunables (somaxconn and
tcp_max_syn_backlog).
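
The sizing rule described above can be followed in a small userspace sketch. It assumes the same three steps the diff below applies to nr_table_entries (cap at the tcp_max_syn_backlog tunable, floor at 8, round n + 1 up to a power of two); `roundup_pow2()` and `listen_table_entries()` are hypothetical names, and the `backlog` argument is assumed to have been limited by somaxconn already.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= n, valid for 1 <= n <= 2^31. */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	assert(n >= 1 && n <= 0x80000000u);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * backlog: listen() backlog, assumed already limited by somaxconn.
 * max_syn_backlog: value of the tcp_max_syn_backlog tunable.
 */
static uint32_t listen_table_entries(uint32_t backlog, uint32_t max_syn_backlog)
{
	uint32_t n = backlog < max_syn_backlog ? backlog : max_syn_backlog;

	if (n < 8)
		n = 8;
	return roundup_pow2(n + 1);
}
```

For example, a backlog of 128 with tcp_max_syn_backlog at 256 gives a 256-slot table in this model, and a tiny backlog of 1 still gets 16 slots.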

 include/net/request_sock.h  |8 
 include/net/tcp.h   |1 -
 net/core/request_sock.c |   38 +-
 net/dccp/ipv4.c |2 +-
 net/dccp/proto.c|6 +++---
 net/ipv4/af_inet.c  |2 +-
 net/ipv4/inet_connection_sock.c |2 +-
 net/ipv4/tcp_ipv4.c |6 +++---
 net/ipv6/tcp_ipv6.c |2 +-
 9 files changed, 43 insertions(+), 24 deletions(-)

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
--- linux-2.6.19-rc2/net/core/request_sock.c 2006-10-13 18:25:04.0 +0200
+++ linux-2.6.19-rc2-ed/net/core/request_sock.c 2006-10-19 11:05:56.0 +0200
@@ -15,6 +15,7 @@
 #include <linux/random.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/vmalloc.h>
 
 #include <net/request_sock.h>
 
@@ -29,22 +30,31 @@
  * it is absolutely not enough even at 100conn/sec. 256 cures most
  * of problems. This value is adjusted to 128 for very small machines
  * (<=32Mb of memory) and to 1024 on normal or better ones (>=256Mb).
- * Further increasing requires to change hash table size.
+ * Note : Dont forget somaxconn that may limit backlog too.
  */
 int sysctl_max_syn_backlog = 256;
 
 int reqsk_queue_alloc(struct request_sock_queue *queue,
- const int nr_table_entries)
+ unsigned int nr_table_entries)
 {
-   const int lopt_size = sizeof(struct listen_sock) +
- nr_table_entries * sizeof(struct request_sock *);
-   struct listen_sock *lopt = kzalloc(lopt_size, GFP_KERNEL);
+   size_t lopt_size = sizeof(struct listen_sock);
+   struct listen_sock *lopt;
 
+   nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
+   nr_table_entries = max_t(u32, nr_table_entries, 8);
+   nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
+   lopt_size += nr_table_entries * sizeof(struct request_sock *);
+   if (lopt_size > PAGE_SIZE)
+   lopt = __vmalloc(lopt_size,
+   GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
+   PAGE_KERNEL);
+   else
+   lopt = kzalloc(lopt_size, GFP_KERNEL);
if (lopt == NULL)
return -ENOMEM;
 
-   for (lopt->max_qlen_log = 6;
-        (1 << lopt->max_qlen_log) < sysctl_max_syn_backlog;
+   for (lopt->max_qlen_log = 3;
+        (1 << lopt->max_qlen_log) < nr_table_entries;
         lopt->max_qlen_log++);
 
    get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
@@ -52,6 +62,11 @@
    queue->rskq_accept_head = NULL;
    lopt->nr_table_entries = nr_table_entries;
 
+   /*
+* This write_lock_bh()/write_unlock_bh() pair forces this CPU to commit
+* its memory changes and let readers (which acquire syn_wait_lock in
+* reader mode) operate without seeing random content.
+*/
    write_lock_bh(&queue->syn_wait_lock);
    queue->listen_opt = lopt;
    write_unlock_bh(&queue->syn_wait_lock);
@@ -65,9 +80,11 @@
 {
/* make all the listen_opt local to us */
struct listen_sock *lopt = reqsk_queue_yank_listen_sk(queue);
+   size_t lopt_size = sizeof(struct listen_sock) +
+       lopt->nr_table_entries * sizeof(struct request_sock *);
 
    if (lopt->qlen != 0) {
-       int i;
+       unsigned int i;
 
        for (i = 0; i < lopt->nr_table_entries; i++) {
struct request_sock *req;
@@ -81,7 +98,10 @@
}
 
    BUG_TRAP(lopt->qlen == 0);
-   kfree(lopt);
+   if (lopt_size > PAGE_SIZE)
+   vfree(lopt);
+   else
+   kfree(lopt);
 }
 
 EXPORT_SYMBOL(reqsk_queue_destroy);
--- linux-2.6.19-rc2/net/ipv4/af_inet.c 2006-10-13 18:25:04.0 +0200
+++ linux-2.6.19-rc2-ed/net/ipv4/af_inet.c  2006-10-17 10:32:22.0 +0200
@@ -204,7 +204,7 @@
 * we can only allow the backlog to be adjusted.
 */
   

watchdog timeout panic in e1000 driver

2006-10-19 Thread Kenzo Iwami
Hi,

A watchdog timeout panic occurred in e1000 driver (7.2.9-NAPI).
If e1000_watchdog is called when processing ioctl from ethtool, the system
could stop inside e1000_watchdog interrupt handler for about 16 seconds,
and the system panicked as a result of a watchdog timeout.

This problem only occurs on a server using ethernet controller inside
631xESB/632xESB, and NMI watchdog enabled.

Environment:
  OS : RHEL4U3(x86_64)
  kernel : 2.6.9-34.ELsmp
  e1000  : 7.2.9-NAPI
  Ethernet controller : Intel Corporation 631x/632xESB DPT LAN Controller
Copper (rev 01)
  Watchdog timer should be enabled with a timeout period of less than 16
  seconds.

Steps to reproduce:
  Please apply the attached patch (ethtool.patch) to ethtool (VERSION 5) source
  code. Run make, and rename the freshly built ethtool as gsetloop.
  Put gsetloop and the attached shell script (gloop.sh) in the same directory,
  and execute gloop.sh. The problem should occur within about 5 minutes.

Cause:
  The problem occurs in the following steps.
   - ioctl is executed in ethtool.
  - e1000_read_phy_reg() is called from ioctl to read the value from phy
register.
  - e1000_get_hw_eeprom_semaphore() is called from e1000_read_phy_reg() to
acquire a semaphore.
  - E1000_SWSM_SWESMBI bit that is FW semaphore bit is set in
e1000_get_hw_eeprom_semaphore().
  - When this bit was set, E1000_SWSM_SMBI bit that is driver's semaphore
bit is also set.
   - e1000_watchdog() of interrupt handler is executed before the
 E1000_SWSM_SMBI bit is unset.
  - e1000_read_phy_reg() is called from e1000_watchdog() to read the value
from phy register.
  - e1000_get_software_semaphore() is called from e1000_watchdog() to check
whether the interrupt handler can acquire the semaphore.
This function checks whether the E1000_SWSM_SMBI bit is set.
  - Therefore the process loops for hw->eeprom.word_size + 1 msec
in e1000_get_software_semaphore().
The value of hw->eeprom.word_size + 1 was 16385 on my system.
In other words, it loops for 16.385 sec in
e1000_get_software_semaphore().
If the NMI watchdog is enabled, the system will be panicked by the NMI
watchdog within this loop.
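
The timing in the last step can be reproduced with a simplified userspace model of the wait loop. This is not the driver code: `struct sketch_hw`, `SKETCH_SWSM_SMBI` (a placeholder bit value) and `sketch_get_software_semaphore()` are invented for the illustration, and each 1 ms delay is only counted, not actually slept.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SWSM_SMBI 0x1u /* placeholder bit value, not taken from the driver */

struct sketch_hw {
	uint32_t swsm;             /* simulated semaphore register */
	uint32_t eeprom_word_size; /* models hw->eeprom.word_size */
};

/*
 * Poll for the driver semaphore bit once per simulated millisecond.
 * Returns 0 when the bit was free and has been taken, -1 on timeout.
 * *waited_ms receives the number of 1 ms delays that would have elapsed.
 */
static int sketch_get_software_semaphore(struct sketch_hw *hw,
					 uint32_t *waited_ms)
{
	uint32_t timeout = hw->eeprom_word_size + 1;

	assert(waited_ms);
	*waited_ms = 0;
	while (timeout--) {
		if (!(hw->swsm & SKETCH_SWSM_SMBI)) {
			hw->swsm |= SKETCH_SWSM_SMBI; /* bit was free: take it */
			return 0;
		}
		(*waited_ms)++; /* stands in for a 1 ms delay */
	}
	return -1;
}
```

With the bit already held and a word size of 16384, the model counts 16385 delays before giving up, which matches the 16.385 seconds reported above.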

Fix:
  In kernels before 2.6.17, the e1000_watchdog() interrupt handler schedules
  e1000_watchdog_task(). The semaphore is acquired within this task, after
  ioctl processing for ethtool is finished, and this problem is avoided.

  e1000_watchdog_task() was removed by the following patch.

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2db10a081c5c1082d58809a1bcf1a6073f4db160
 e1000: rework driver hardware reset locking
 After studying the driver mac reset code it was found that there
 were multiple race conditions possible to reset the unit twice or
 bring it e1000_up() double. This fixes all occurences where the
 driver needs to reset the mac.
 
 We also remove irq requesting/releasing into _open and _close so
 that while the device is _up we will never touch the irq's. This fixes
 the double free irq bug that people saw.
 
 To make sure that the watchdog task doesn't cause another race we let
 it run as a non-scheduled task.

  I'm not sure whether there was any reason to actively remove
  e1000_watchdog_task(). I think that removing e1000_watchdog_task() was a
  mistake, and it should be brought back in.
--
  Kenzo Iwami ([EMAIL PROTECTED])


#!/bin/sh

n=64
i=1
[ $# -ge 1 ] && n=$1

while [ $i -le $n ]
do
    ./gsetloop eth0 &
    i=`expr $i + 1`
done
diff -urpN ethtool_git/ethtool.c my-ethtool/ethtool.c
--- ethtool_git/ethtool.c   2006-10-18 14:17:10.0 +0900
+++ my-ethtool/ethtool.c2006-10-18 14:18:45.0 +0900
@@ -1576,7 +1576,9 @@ static int do_gset(int fd, struct ifreq 
 
    ecmd.cmd = ETHTOOL_GSET;
    ifr->ifr_data = (caddr_t)&ecmd;
-   err = ioctl(fd, SIOCETHTOOL, ifr);
+   for (;;) {
+       err = ioctl(fd, SIOCETHTOOL, ifr);
+   }
    if (err == 0) {
        err = dump_ecmd(&ecmd);
        if (err)


Re: (usagi-core 31424) Re: [PATCH 7/13] [RFC] [IPV6] Move source address selection into route lookup.

2006-10-19 Thread Mitsuru Chinen

Hello Ville,

Ville Nuorvala wrote:
> Mitsuru-san,
>
> could you apply patch #12 and rerun the test? I think this particular
> problem is caused by the routing cache entry not having a valid source
> address in the first place. As ip6_dst_lookup() doesn't do the source
> address lookup anymore, it is critical that we only store cache entries
> with complete address data for both destination and source.
>
> These two patches apparently shouldn't have been split in the first
> place. Sorry!


I'm afraid patch #12 doesn't fix this issue.

Even with patch #12 applied, the test still fails.
The test log is the same as the last one.
With all patches applied, the test fails as well.

Best Regards,
--
Mitsuru Chinen [EMAIL PROTECTED]



[PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS

2006-10-19 Thread Eric Dumazet
Hi David

A lot of routers/embedded devices still use CPUs with 16/32-byte cache lines
(486, Pentium, ... PIII).
It makes sense to group together the fields used at lookup time so they fit in
one cache line.
This reduces the cache footprint and speeds up lookups.
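
The effect of the reordering can be checked with offsetof() on userspace mirrors of the two layouts. The structs below are assumptions for illustration only: they model just the leading fields, use `int` in place of atomic_t, and the names `peer_old`, `peer_new`, `lookup_span_old()`, `lookup_span_new()` and `lookup_fits()` are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order before the patch (only the leading fields are modelled). */
struct peer_old {
	struct peer_old *avl_left, *avl_right;
	struct peer_old *unused_next, **unused_prevp;
	uint32_t dtime;
	int      refcnt;      /* stands in for atomic_t */
	uint32_t v4daddr;
	uint16_t avl_height;
	uint16_t ip_id_count;
};

/* Field order after the patch. */
struct peer_new {
	struct peer_new *avl_left, *avl_right;
	uint32_t v4daddr;
	uint16_t avl_height;
	uint16_t ip_id_count;
	struct peer_new *unused_next, **unused_prevp;
	uint32_t dtime;
	int      refcnt;      /* stands in for atomic_t */
};

/* Byte offset just past v4daddr: the span a tree lookup has to touch. */
static size_t lookup_span_old(void)
{
	return offsetof(struct peer_old, v4daddr) + sizeof(uint32_t);
}

static size_t lookup_span_new(void)
{
	return offsetof(struct peer_new, v4daddr) + sizeof(uint32_t);
}

/* Returns 1 when the new layout's lookup fields fit in one cache line. */
static int lookup_fits(size_t cache_line)
{
	assert(cache_line > 0);
	return lookup_span_new() <= cache_line;
}
```

On a machine with 4-byte pointers the new span is 12 bytes, so it fits even a 16-byte line; with 8-byte pointers it is 20 bytes and fits a 32-byte line, whereas the old order pushes v4daddr further into the structure.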

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
--- net-2.6/include/net/inetpeer.h  2006-10-19 12:50:29.0 +0200
+++ net-2.6-ed/include/net/inetpeer.h   2006-10-19 12:52:08.0 +0200
@@ -17,14 +17,15 @@
 
 struct inet_peer
 {
+   /* group together avl_left,avl_right,v4daddr to speedup lookups */
    struct inet_peer    *avl_left, *avl_right;
+   __be32              v4daddr;        /* peer's address */
+   __u16               avl_height;
+   __u16               ip_id_count;    /* IP ID for the next packet */
    struct inet_peer    *unused_next, **unused_prevp;
    __u32               dtime;          /* the time of last use of not
                                         * referenced entries */
    atomic_t            refcnt;
-   __be32              v4daddr;        /* peer's address */
-   __u16               avl_height;
-   __u16               ip_id_count;    /* IP ID for the next packet */
    atomic_t            rid;            /* Frag reception counter */
    __u32               tcp_ts;
    unsigned long       tcp_ts_stamp;


Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)

2006-10-19 Thread Patrick McHardy
jamal wrote:
> ACKed-by: Jamal Hadi Salim
>
> When Patrick has his patch ready after this goes in we can revisit.

NACK.

I still think this patch shouldn't go in. There's no point in doing the
same thing twice, and I haven't heard a compelling argument why it has
to be done in a way that only helps qdiscs using rtabs while ignoring
statistics and estimators (I even provided a patch to show how to do
it without these limitations).

Besides that:

+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
+{
+   int slot = pktlen + rtab->rate.cell_align;
+   if (slot < 0)
+       slot = 0;

Why would it go negative? A negative cell_align doesn't make sense I
guess.

+   slot >>= rtab->rate.cell_log;
+   if (slot > 255)
+       return rtab->data[255] + 1;

What's the point of this? Is it just to keep htb giant statistics
working?
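
To make the two questions above easier to follow, here is a userspace sketch that assembles the quoted fragments into one function. It is an approximation: `struct sketch_rate_table` and `sketch_l2t()` are hypothetical names, and the final `return rtab->data[slot]` is an assumption, since that line is not part of the quoted fragments.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_rate_table {
	int      cell_align; /* added to the packet length before the shift */
	unsigned cell_log;   /* log2 of the cell size */
	uint32_t data[256];  /* one table entry per slot */
};

static uint32_t sketch_l2t(const struct sketch_rate_table *rtab, int pktlen)
{
	int slot = pktlen + rtab->cell_align;

	assert(rtab->cell_log < 31);
	if (slot < 0)           /* only reachable with a negative cell_align */
		slot = 0;
	slot >>= rtab->cell_log;
	if (slot > 255)         /* packet larger than the table covers */
		return rtab->data[255] + 1;
	return rtab->data[slot]; /* assumed; not in the quoted fragment */
}
```

With cell_align = -1 and cell_log = 3, a zero-length packet hits the clamp to slot 0, and a 5000-byte packet overflows the table and returns data[255] + 1.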



Re: watchdog timeout panic in e1000 driver

2006-10-19 Thread Auke Kok

Kenzo Iwami wrote:

> A watchdog timeout panic occurred in e1000 driver (7.2.9-NAPI).


where's the panic message ?

Please CC the maintainers of the driver at all times. Our e-mail addresses are widely 
visible everywhere.



> If e1000_watchdog is called when processing ioctl from ethtool, the system
> could stop inside e1000_watchdog interrupt handler for about 16 seconds,
> and the system panicked as a result of a watchdog timeout.
>
> This problem only occurs on a server using ethernet controller inside
> 631xESB/632xESB, and NMI watchdog enabled.


why only this system? have you seen/tried it on other machines?


> Environment:
>   OS : RHEL4U3(x86_64)
>   kernel : 2.6.9-34.ELsmp
>   e1000  : 7.2.9-NAPI
>   Ethernet controller : Intel Corporation 631x/632xESB DPT LAN Controller
> Copper (rev 01)
>   Watchdog timer should be enabled with a timeout period of less than 16
>   seconds.
> Steps to reproduce:
>   Please apply the attached patch (ethtool.patch) to ethtool (VERSION 5) source
>   code. Run make, and rename the freshly built ethtool as gsetloop.
>   Put gsetloop and the attached shell script (gloop.sh) in the same directory,
>   and execute gloop.sh. The problem should occur within about 5 minutes.
>
> Cause:
>   The problem occurs in the following steps.
>    - ioctl is executed in ethtool.
>   - e1000_read_phy_reg() is called from ioctl to read the value from phy
> register.
>   - e1000_get_hw_eeprom_semaphore() is called from e1000_read_phy_reg() to
> acquire a semaphore.
>   - E1000_SWSM_SWESMBI bit that is FW semaphore bit is set in
> e1000_get_hw_eeprom_semaphore().
>   - When this bit was set, E1000_SWSM_SMBI bit that is driver's semaphore
> bit is also set.
>    - e1000_watchdog() of interrupt handler is executed before the
>  E1000_SWSM_SMBI bit is unset.
>   - e1000_read_phy_reg() is called from e1000_watchdog() to read the value
> from phy register.
>   - e1000_get_software_semaphore() is called from e1000_watchdog to confirm
> whether interruption handler can acquire a semaphore.
> This function confirms whether E1000_SWSM_SMBI bit is being set.
>   - Therefore the process does loop for hw->eeprom.word_size + 1 msec
> in e1000_get_software_semaphore().
> The value of hw->eeprom.word_size + 1 was 16385 on my system.
> In other words it loops for 16.385 sec in
> e1000_get_software_semaphore().
> If NMI watchdog is enabled, the system will panic by NMI watchdog
> within this loop.
>
> Fix:
>   In kernels before 2.6.17, the e1000_watchdog() interrupt handler schedules
>   e1000_watchdog_task(). The semaphore is acquired within this task, after
>   ioctl processing for ethtool is finished, and this problem is avoided.
>
>   e1000_watchdog_task() was remove by the following patch.
>
> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2db10a081c5c1082d58809a1bcf1a6073f4db160
>  e1000: rework driver hardware reset locking
>  After studying the driver mac reset code it was found that there
>  were multiple race conditions possible to reset the unit twice or
>  bring it e1000_up() double. This fixes all occurences where the
>  driver needs to reset the mac.
>
>  We also remove irq requesting/releasing into _open and _close so
>  that while the device is _up we will never touch the irq's. This fixes
>  the double free irq bug that people saw.
>
>  To make sure that the watchdog task doesn't cause another race we let
>  it run as a non-scheduled task.
>
>   I'm not sure whether there was any reason to actively remove
>   e1000_watchdog_task(). I think that removing e1000_watchdog_task() was a
>   mistake, and it should be brought back in.



Reverting this code would not be a fix, but only a workaround that leaves the problem 
still in the code, and as such is not progress in the right direction.


I find this report extremely edgy, but I'll look into the fact that the driver attempts 
to sleep for 16384 + 1 msec, which seems overly long :)


As a side note, most other e1000 NICs use hardcoded word_size numbers, but esb2 systems 
read it from a register/eeprom. Can you send me the output of `ethtool -e ethX`? 
Off-list is OK, it might be large.


Thanks,

Auke


[PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err()

2006-10-19 Thread Eric Dumazet
I believe this NET_INC_STATS() call can be replaced by  NET_INC_STATS_BH(), a 
little bit cheaper.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
--- linux/net/ipv4/tcp_ipv4.c.orig  2006-10-19 17:37:22.0 +0200
+++ linux-ed/net/ipv4/tcp_ipv4.c2006-10-19 17:37:43.0 +0200
@@ -373,7 +373,7 @@
seq = ntohl(th->seq);
if (sk->sk_state != TCP_LISTEN &&
!between(seq, tp->snd_una, tp->snd_nxt)) {
-   NET_INC_STATS(LINUX_MIB_OUTOFWINDOWICMPS);
+   NET_INC_STATS_BH(LINUX_MIB_OUTOFWINDOWICMPS);
goto out;
}
 


Re: [PATCH] d80211: ieee80211_hw handlers should be allowed to sleep

2006-10-19 Thread Jiri Benc
On Wed, 18 Oct 2006 19:27:37 +0200, Ivo van Doorn wrote:
 Would something like the patch below be better?
 It keeps the flush_scheduled_work() at the same location, but a second
 is added in case local->scan_work.data != sdata->dev

Applied to my tree, thanks for the patch!

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [RFC] Ethernet Cheap Cryptography

2006-10-19 Thread Pawel Foremski
Stephen J. Bevan wrote:

 Dawid Ciezarkiewicz writes:
   It forces the use of upper-level encryption with internal
   fragmentation, which is a problem because of the extra frames that
   those bridges have to handle, bigger traffic etc.
 
 Where did the fragmentation come from?  If you are sending TCP over
 IPsec then the ESP/AH code should decrease the TCP MSS as it goes
 through to take account of the extra space that IPsec will take up.
 Thus neither end-point will ever send a frame on that session that
 will require fragmentation.  Granted you can still have a problem if
 someone sends a UDP packet that is close to the 1500 MTU, but RFCs
 recommend against it (e.g. DNSSIP) and applications should try to
 avoid it.

First, ccrypt's task is to secure Ethernet, not IP. Secondly, IPsec won't
decrease MSS in TCP encapsulated in PPPoE traffic, for example.
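
The MSS adjustment under discussion is plain arithmetic. A minimal sketch,
assuming IPv4 and TCP headers without options (20 bytes each);
clamped_mss() is a hypothetical helper and the overhead argument is
whatever the encapsulation in use adds per frame.

```c
enum { IPV4_HDR_LEN = 20, TCP_HDR_LEN = 20 };

/*
 * Largest TCP payload that still fits in one frame once `overhead`
 * bytes of encapsulation are taken out of the link MTU.
 * Returns 0 if nothing fits.
 */
static int clamped_mss(int link_mtu, int overhead)
{
    int mss = link_mtu - overhead - IPV4_HDR_LEN - TCP_HDR_LEN;

    return mss > 0 ? mss : 0;
}
```

clamped_mss(1500, 0) gives the usual 1460, and with the 8 bytes that PPPoE
adds it gives 1452. The objection above is that this subtraction only
happens where IPsec actually processes the TCP segment; for traffic that
crosses the link as Ethernet payload, nothing performs it.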

 Thus the argument for ccrypt should say :-
 
 a) why IPsec is not suitable for securing IP traffic in WIFI scenarios.

It's suitable. But for IP.

 b) what traffic other than IP traffic needs to be encrypted.

PPPoE; Ethernet in general.

   This allows key switching without losing any frames. It should be done
   quickly, since when in key transition state all invalid/spoofed
   frames have  double cpu impact on receiver. Shouldn't be a problem
   because attacker should have no clue about when key is being switched.
 
 If the keying is done manually an attacker won't know when the keys
 are changed.  However, if keying is coordinated over the same link via
 a protocol (as is done with IKE for IPsec) then the attacker can see
 (or at least guess) the packets carrying the keying protocol thus know
 re-keying is going to occur.

Only if the rekeying traffic is the only traffic being transmitted. IMHO a
border case.

 Indeed, in IPsec, the equivalent of ccrypt is ESP and that's rather
 straightforward.  The complicated part is IKE, the userspace component
 that handles keying.  It is certainly possible to create something
 simpler than IKE (e.g. IKEv2 is somewhat simpler) but the devil is in
 the details.

Sure, but that's IMHO a little bit off-topic in regard to ccrypt, which is
just an encryption back-end (eventually the rekeying daemon will sit in
userspace).

Oh, and of course I agree IKE (v1) is too complicated :-).

   I was not aware of that. Thanks. I will add this info to
   documentation. There is nothing actually I can do about that in the
   form that ccrypt is meant to be now.
 
 For completeness there are also switches that :-
 
 * take notice of the TOS/DiffServ bits in an IP header and will
   re-order based on them
 
 * will re-order frames due to redundancy, load-balancing,
   spanning-tree changes ... etc.

I'll only add to what Dawid has said that ccrypt has been designed for
direct P2P links, with a single path (and no such switches on its way).
Later it turned out to be applicable to e.g. small (simple) LANs or
wireless ad-hoc networks.

Thanks for your remarks!

Bye,

-- 
Pawel Foremski  
[EMAIL PROTECTED]



Re: [PATCH 2.6.18] AT91RM9200 Ethernet update

2006-10-19 Thread Andrew Victor
hi,

 ACK, but patch doesn't apply to 2.6.18

The patch failed due to AT91_ID_EMAC being renamed to
AT91RM9200_ID_EMAC.

Updated patch for 2.6.19-rc2 attached.

Regards,
  Andrew Victor


-

This patch contains a few updates to the AT91RM9200 Ethernet driver.

The changes are:
1. Remove the global 'at91_dev' variable.
2. The global 'check_timer' moved into the private data structure.
3. Use dev_alloc_skb() rather than alloc_skb().
4. It is not necessary to adjust skb->len manually.
5. The I/O base-address and IRQ are no longer hard-coded, but are passed
via platform_device resources.


Signed-off-by: Andrew Victor [EMAIL PROTECTED]




diff -urN -x CVS linux-2.6.19-rc2.orig/drivers/net/arm/at91_ether.c 
linux-2.6.19/drivers/net/arm/at91_ether.c
--- linux-2.6.19-rc2.orig/drivers/net/arm/at91_ether.c  Thu Oct 19 16:31:49 2006
+++ linux-2.6.19/drivers/net/arm/at91_ether.c   Mon Oct 16 14:19:00 2006
@@ -41,9 +41,6 @@
 #define DRV_NAME   "at91_ether"
 #define DRV_VERSION    "1.0"
 
-static struct net_device *at91_dev;
-
-static struct timer_list check_timer;
 #define LINK_POLL_INTERVAL (HZ)
 
 /* . */
@@ -252,8 +249,8 @@
 * PHY doesn't have an IRQ pin (RTL8201, DP83847, AC101L),
 * or board does not have it connected.
 */
-   check_timer.expires = jiffies + LINK_POLL_INTERVAL;
-   add_timer(&check_timer);
+   lp->check_timer.expires = jiffies + LINK_POLL_INTERVAL;
+   add_timer(&lp->check_timer);
return;
}
 
@@ -300,7 +297,7 @@
 
irq_number = lp->board_data.phy_irq_pin;
if (!irq_number) {
-   del_timer_sync(&check_timer);
+   del_timer_sync(&lp->check_timer);
return;
}
 
@@ -362,13 +359,14 @@
 static void at91ether_check_link(unsigned long dev_id)
 {
struct net_device *dev = (struct net_device *) dev_id;
+   struct at91_private *lp = (struct at91_private *) dev->priv;
 
enable_mdi();
update_linkspeed(dev, 1);
disable_mdi();
 
-   check_timer.expires = jiffies + LINK_POLL_INTERVAL;
-   add_timer(&check_timer);
+   lp->check_timer.expires = jiffies + LINK_POLL_INTERVAL;
+   add_timer(&lp->check_timer);
 }
 
 /* . ADDRESS MANAGEMENT  */
@@ -857,14 +855,13 @@
while (dlist->descriptors[lp->rxBuffIndex].addr & EMAC_DESC_DONE) {
p_recv = dlist->recv_buf[lp->rxBuffIndex];
pktlen = dlist->descriptors[lp->rxBuffIndex].size & 0x7ff;  /* Length of frame including FCS */
-   skb = alloc_skb(pktlen + 2, GFP_ATOMIC);
+   skb = dev_alloc_skb(pktlen + 2);
if (skb != NULL) {
skb_reserve(skb, 2);
memcpy(skb_put(skb, pktlen), p_recv, pktlen);
 
skb->dev = dev;
skb->protocol = eth_type_trans(skb, dev);
-   skb->len = pktlen;
dev->last_rx = jiffies;
lp->stats.rx_bytes += pktlen;
netif_rx(skb);
@@ -937,17 +934,22 @@
struct net_device *dev;
struct at91_private *lp;
unsigned int val;
-   int res;
-
-   if (at91_dev)   /* already initialized */
-   return 0;
+   struct resource *res;
+   int ret;
 
dev = alloc_etherdev(sizeof(struct at91_private));
if (!dev)
return -ENOMEM;
 
-   dev->base_addr = AT91_VA_BASE_EMAC;
-   dev->irq = AT91RM9200_ID_EMAC;
+   /* Get I/O base address and IRQ */
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!res) {
+   free_netdev(dev);
+   return -ENODEV;
+   }
+   dev->base_addr = res->start;
+   dev->irq = platform_get_irq(pdev, 0);
+
SET_MODULE_OWNER(dev);
 
/* Install the interrupt handler */
@@ -1017,14 +1019,13 @@
lp->phy_address = phy_address;  /* MDI address of PHY */
 
/* Register the network interface */
-   res = register_netdev(dev);
-   if (res) {
+   ret = register_netdev(dev);
+   if (ret) {
free_irq(dev->irq, dev);
free_netdev(dev);
dma_free_coherent(NULL, sizeof(struct recv_desc_bufs), lp->dlist, (dma_addr_t)lp->dlist_phys);
-   return res;
+   return ret;
}
-   at91_dev = dev;
 
/* Determine current link speed */
spin_lock_irq(&lp->lock);
@@ -1036,9 +1037,9 @@
 
/* If board has no PHY IRQ, use a timer to poll the PHY */
if (!lp->board_data.phy_irq_pin) {
-   init_timer(&check_timer);
-   check_timer.data = (unsigned long)dev;
-   check_timer.function = at91ether_check_link;
+   init_timer(&lp->check_timer);
+   

Re: [PATCH] d80211: extend extra_hdr_room to be a bytecount

2006-10-19 Thread Jiri Benc
On Wed, 11 Oct 2006 07:59:23 -0700, David Kimdon wrote:
 Perhaps rename it to extra_tx_headroom?
  - existing users would then need to take notice of the change
  - the name 'extra_tx_headroom' is more descriptive of what it actually is

Extend ieee80211_hw's extra_hdr_room to be a bytecount for
a device specific TX header instead of being a hardcoded
0/2 byte choice.

Based on the patch by Michael Buesch [EMAIL PROTECTED].

Signed-off-by: Jiri Benc [EMAIL PROTECTED]

---
 drivers/net/wireless/d80211/adm8211/adm8211.c  |2 +-
 drivers/net/wireless/d80211/rt2x00/rt2400pci.c |2 +-
 drivers/net/wireless/d80211/rt2x00/rt2500pci.c |2 +-
 drivers/net/wireless/d80211/rt2x00/rt2500usb.c |2 +-
 drivers/net/wireless/d80211/rt2x00/rt61pci.c   |2 +-
 drivers/net/wireless/d80211/rt2x00/rt73usb.c   |2 +-
 include/net/d80211.h   |7 +++
 net/d80211/ieee80211.c |2 +-
 8 files changed, 10 insertions(+), 11 deletions(-)

--- dscape.orig/drivers/net/wireless/d80211/adm8211/adm8211.c
+++ dscape/drivers/net/wireless/d80211/adm8211/adm8211.c
@@ -2018,7 +2018,7 @@ static int __devinit adm8211_probe(struc
hw->wep_include_iv = 1;
hw->data_nullfunc_ack = 0;
hw->no_tkip_wmm_hwaccel = 1;
-   hw->extra_hdr_room = 0;
+   hw->extra_tx_headroom = 0;
hw->device_strips_mic = 0;
hw->monitor_during_oper = 0;
hw->fraglist = 0;
--- dscape.orig/drivers/net/wireless/d80211/rt2x00/rt2400pci.c
+++ dscape/drivers/net/wireless/d80211/rt2x00/rt2400pci.c
@@ -2578,7 +2578,7 @@ static int rt2400pci_init_hw(struct rt2x
hw->wep_include_iv = 1;
hw->data_nullfunc_ack = 1;
hw->no_tkip_wmm_hwaccel = 1;
-   hw->extra_hdr_room = 0;
+   hw->extra_tx_headroom = 0;
hw->device_strips_mic = 0;
hw->monitor_during_oper = 1;
hw->fraglist = 0;
--- dscape.orig/drivers/net/wireless/d80211/rt2x00/rt2500pci.c
+++ dscape/drivers/net/wireless/d80211/rt2x00/rt2500pci.c
@@ -2732,7 +2732,7 @@ static int rt2500pci_init_hw(struct rt2x
hw->wep_include_iv = 1;
hw->data_nullfunc_ack = 1;
hw->no_tkip_wmm_hwaccel = 1;
-   hw->extra_hdr_room = 0;
+   hw->extra_tx_headroom = 0;
hw->device_strips_mic = 0;
hw->monitor_during_oper = 1;
hw->fraglist = 0;
--- dscape.orig/drivers/net/wireless/d80211/rt2x00/rt2500usb.c
+++ dscape/drivers/net/wireless/d80211/rt2x00/rt2500usb.c
@@ -2419,7 +2419,7 @@ static int rt2500usb_init_hw(struct rt2x
hw->wep_include_iv = 1;
hw->data_nullfunc_ack = 1;
hw->no_tkip_wmm_hwaccel = 1;
-   hw->extra_hdr_room = 0;
+   hw->extra_tx_headroom = 0;
hw->device_strips_mic = 0;
hw->monitor_during_oper = 1;
hw->fraglist = 0;
--- dscape.orig/drivers/net/wireless/d80211/rt2x00/rt61pci.c
+++ dscape/drivers/net/wireless/d80211/rt2x00/rt61pci.c
@@ -3252,7 +3252,7 @@ static int rt61pci_init_hw(struct rt2x00
hw->wep_include_iv = 1;
hw->data_nullfunc_ack = 1;
hw->no_tkip_wmm_hwaccel = 1;
-   hw->extra_hdr_room = 0;
+   hw->extra_tx_headroom = 0;
hw->device_strips_mic = 0;
hw->monitor_during_oper = 1;
hw->fraglist = 0;
--- dscape.orig/drivers/net/wireless/d80211/rt2x00/rt73usb.c
+++ dscape/drivers/net/wireless/d80211/rt2x00/rt73usb.c
@@ -2792,7 +2792,7 @@ static int rt73usb_init_hw(struct rt2x00
hw->wep_include_iv = 1;
hw->data_nullfunc_ack = 1;
hw->no_tkip_wmm_hwaccel = 1;
-   hw->extra_hdr_room = 0;
+   hw->extra_tx_headroom = 0;
hw->device_strips_mic = 0;
hw->monitor_during_oper = 1;
hw->fraglist = 0;
--- dscape.orig/include/net/d80211.h
+++ dscape/include/net/d80211.h
@@ -456,10 +456,6 @@ struct ieee80211_hw {
/* Force software encryption for TKIP packets if WMM is enabled. */
unsigned int no_tkip_wmm_hwaccel:1;
 
-   /* set if the payload needs to be padded at even boundaries after the
-* header */
-   unsigned int extra_hdr_room:1;
-
/* Some devices handle Michael MIC internally and do not include MIC in
 * the received packets passed up. device_strips_mic must be set
 * for such devices. The 'encryption' frame control bit is expected to
@@ -476,6 +472,9 @@ struct ieee80211_hw {
 * i.e. more than one skb per frame */
unsigned int fraglist:1;
 
+   /* Set to the size of a needed device specific skb headroom for TX skbs. */
+   unsigned int extra_tx_headroom;
+
 /* This is the time in us to change channels
  */
 int channel_change_time;
--- dscape.orig/net/d80211/ieee80211.c
+++ dscape/net/d80211/ieee80211.c
@@ -1511,7 +1511,7 @@ static int ieee80211_subif_start_xmit(st
 * build in headroom in __dev_alloc_skb() (linux/skbuff.h) and
 * alloc_skb() (net/core/skbuff.c)
 */
-   head_need = hdrlen + encaps_len + (local->hw->extra_hdr_room ? 2 : 0);
+   head_need = hdrlen + encaps_len + 
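
The last hunk is cut off in the archive, so the replacement expression is
not shown. As a rough user-space sketch of the difference the description
implies: head_need_old() mirrors the removed line, while head_need_new()
is only an assumption based on the commit message (a byte count instead of
a 0/2 choice). The struct and function names are hypothetical.

```c
/* Old style: a one-bit flag meaning "reserve 2 extra bytes". */
struct hw_old { unsigned int extra_hdr_room:1; };

/* New style: the driver states how many bytes of TX headroom it needs. */
struct hw_new { unsigned int extra_tx_headroom; };

/* Mirrors the removed line: the flag selects between 0 and 2 bytes. */
static int head_need_old(int hdrlen, int encaps_len, const struct hw_old *hw)
{
    return hdrlen + encaps_len + (hw->extra_hdr_room ? 2 : 0);
}

/* Assumed replacement: add the device-specific byte count directly. */
static int head_need_new(int hdrlen, int encaps_len, const struct hw_new *hw)
{
    return hdrlen + encaps_len + (int)hw->extra_tx_headroom;
}
```

All six drivers touched by the patch set the new field to 0, so under this
reading their headroom requirement stays what it was with the flag cleared.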

Re: sky2 crash

2006-10-19 Thread Stephen Hemminger
On Thu, 19 Oct 2006 08:10:33 -0700
Shane [EMAIL PROTECTED] wrote:

 Hello,
 
 I am experiencing an intermittent crash with a Gigabit
 controller using the sky2 driver under load.  Confirmed on
 2.6.19-rc2 but also present with 2.6.18.  After the lockup,
 the system works normally but the sky2 interface no
 longer processes traffic.  Here's the printk output:
 NETDEV WATCHDOG: eth0: transmit timed out
 sky2 eth0: tx timeout
 sky2 eth0: transmit ring 401 .. 378 report=403 done=403
 sky2 status report lost?

See below, most likely lost IRQ.

 BUG: soft lockup detected on CPU#0!
  [781447d5]  [78128ad6]  [78113b91]  [7810390b]
 [782b2477]  [f8986e43]  [7825fc69]  [7825fce5]
 [78128a30]  [78124eb1]  [78124f49]  [78113b96]
 [7810390b]  [78101265]  [78101281]  [78101d4f]
 [7837578c]  [783751e0]  ===
 
 And the controller:
 03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
 Subsystem: Giga-byte Technology Marvell 88E8053 Gigabit Ethernet Controller (Gigabyte)
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
 Latency: 0, Cache Line Size: 32 bytes
 Interrupt: pin A routed to IRQ 218
 Region 0: Memory at e800 (64-bit, non-prefetchable) [size=16K]
 Region 2: I/O ports at 8000 [size=256]
 [virtual] Expansion ROM at 8800 [disabled] [size=128K]
 Capabilities: [48] Power Management version 2
 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
 Status: D0 PME-Enable- DSel=0 DScale=1 PME-
 Capabilities: [50] Vital Product Data
 Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
 Address: fee0300c  Data: 4142
 Capabilities: [e0] Express Legacy Endpoint IRQ 0
 Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
 Device: Latency L0s unlimited, L1 unlimited
 Device: AtnBtn- AtnInd- PwrInd-
 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
 Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
 Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
 Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
 Link: Latency L0s 256ns, L1 unlimited
 Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
 Link: Speed 2.5Gb/s, Width x1
 Capabilities: [100] Advanced Error Reporting
 
 The system is a dual core Conroe system running with 2gb of
 ram and a memory split of 2gb/2gb.  Kernel preemption is
 voluntary.  I can disable preempt but it may take a day or
 two for the lockup to show up again.
 
 Also, the mtu on this iface is set to 9k btu and the lockup
 seems more frequent at a good network load.
 
 Shane
 

1. What is the interrupt usage: cat /proc/interrupts

2. Try with the workaround for lost IRQ's
modprobe sky2 idle_timeout=100



AF_KEY extended xfrm_state selector handling

2006-10-19 Thread Michal Ruzicka
Hello

In an effort to configure an L2TP/IPsec server on Linux capable of supporting
multiple clients behind a single NAT device I ran into difficulties with pf_key
protocol implementation not being able to exploit all the information
passed to it as a SADB_EXT_ADDRESS_PROXY info. Perhaps as the original source
suggested (/* Nobody uses this, but we try. */) this info has never been used
before.

So I propose the included patch. It changes the following:
1) the amount of information that is stored into the struct xfrm_state's
   selector from the SADB_EXT_ADDRESS_PROXY info received from userspace
   in the pfkey_msg2xfrm_state() function
   The information stored now is:
- the address (stored as the selector's source address, including family)
- the prefix length (stored as the selector's source address prefix length)
- the protocol (stored as selector's protocol)
- the port (stored as selector's source port)
   Previously only the address and the prefix length were stored.

2) the conditions under which the SADB_EXT_ADDRESS_PROXY info
   is included while converting a struct xfrm_state into a pf_key message
   in the pfkey_xfrm_state2msg() function
   The conditions now are:
     - the selector's protocol family is non-zero (i.e. the selector is defined)
     and
     (
       - the selector's protocol is non-zero (i.e. the protocol is specified)
       or
       - the selector's source port is non-zero (i.e. the port is specified)
       or
       - the selector's source address is different from the xfrm_state source
         address
     )
   Further, the case where the selector's address is of a different family from
   the xfrm_state address is now handled.

3) the way port information is obtained from struct sadb_address
   The port information extraction is now part of
   the pfkey_sadb_addr2xfrm_addr() function, which handles it in a protocol
   family safe manner, instead of using the ((struct sockaddr_in *)x)->sin_port
   construct irrespective of the protocol family
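
Items 2 and 3 can be illustrated with a user-space sketch. It uses the
standard socket address structures rather than the kernel's sadb/xfrm
types: sockaddr_port(), struct toy_sel, struct toy_state and
needs_proxy_ext() are hypothetical names, and the toy structures only
carry an IPv4-sized address.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Family-aware port lookup, in the spirit of item 3: check sa_family
 * first, then read the port field that belongs to that family.
 * Returns the address family, or 0 if the family is not handled.
 * The port stays in network byte order, as stored in the sockaddr.
 */
static int sockaddr_port(const struct sockaddr *sa, uint16_t *port)
{
    switch (sa->sa_family) {
    case AF_INET:
        if (port)
            *port = ((const struct sockaddr_in *)sa)->sin_port;
        return AF_INET;
    case AF_INET6:
        if (port)
            *port = ((const struct sockaddr_in6 *)sa)->sin6_port;
        return AF_INET6;
    default:
        return 0;
    }
}

/* Hypothetical, simplified selector and state (IPv4-sized address only). */
struct toy_sel   { int family; int proto; uint16_t sport; uint32_t saddr; };
struct toy_state { int family; uint32_t saddr; struct toy_sel sel; };

/*
 * Conditions from item 2: the selector is defined AND at least one of
 * protocol, source port, a differing family or a differing source
 * address is present.
 */
static int needs_proxy_ext(const struct toy_state *x)
{
    if (x->sel.family == 0)
        return 0;
    return x->sel.proto != 0 ||
           x->sel.sport != 0 ||
           x->sel.family != x->family ||
           x->sel.saddr != x->saddr;
}
```

Reading sin_port through a sockaddr_in cast on an AF_INET6 address, as the
old construct did, only works if the two layouts happen to agree, which is
the reason for switching on the family first.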

NOTES:
  - This patch should cause no problems, since as the original source says
nobody uses that info.
  - I've also created a patch for racoon (ipsec-tools) to actually pass
that info. Eventually I've been able to establish L2TP/IPSec connections
from multiple clients behind the same NAT to the same L2TP/IPSec linux 2.6
based server.
(The procedure required a manual insertion of certain SPD entries during
the connection establishment but this will hopefully be handled by
the L2TP daemon automatically soon.)

Here comes the patch (it is against 2.6.17.11):
Signed-off-by: Michal Ruzicka [EMAIL PROTECTED]

diff -Naur linux-2.6.17.11.orig/net/key/af_key.c 
linux-2.6.17.11/net/key/af_key.c
--- linux-2.6.17.11.orig/net/key/af_key.c   2006-08-23 23:16:33.0 +0200
+++ linux-2.6.17.11/net/key/af_key.c2006-10-18 16:53:48.0 +0200
@@ -552,19 +552,28 @@
 }
 
 static int pfkey_sadb_addr2xfrm_addr(struct sadb_address *addr,
-xfrm_address_t *xaddr)
+xfrm_address_t *xaddr, __u16 *port)
 {
switch (((struct sockaddr*)(addr + 1))->sa_family) {
case AF_INET:
-   xaddr->a4 = 
-   ((struct sockaddr_in *)(addr + 1))->sin_addr.s_addr;
+   {
+   struct sockaddr_in *in = (struct sockaddr_in *)(addr + 1);
+
+   xaddr->a4 = in->sin_addr.s_addr;
+   if (port)
+   *port = in->sin_port;
return AF_INET;
+   }
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
case AF_INET6:
-   memcpy(xaddr->a6, 
-  &((struct sockaddr_in6 *)(addr + 1))->sin6_addr,
-  sizeof(struct in6_addr));
+   {
+   struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)(addr + 1);
+
+   memcpy(xaddr->a6, &in6->sin6_addr, sizeof(struct in6_addr));
+   if (port)
+   *port = in6->sin6_port;
return AF_INET6;
+   }
 #endif
default:
return 0;
@@ -651,6 +660,7 @@
int encrypt_key_size = 0;
int sockaddr_size;
struct xfrm_encap_tmpl *natt = NULL;
+   int proxy_size;
 
/* address family check */
sockaddr_size = pfkey_sockaddr_size(x->props.family);
@@ -674,14 +684,25 @@
 
/* identity & sensitivity */
 
-   if ((x->props.family == AF_INET &&
-x->sel.saddr.a4 != x->props.saddr.a4)
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-   || (x->props.family == AF_INET6 &&
-   memcmp (x->sel.saddr.a6, x->props.saddr.a6, sizeof (struct in6_addr)))
-#endif
-   )
-   size += sizeof(struct sadb_address) + sockaddr_size;
+   if (x->sel.family != 0 &&
+   (x->sel.sport != 0 || x->sel.proto != 0
+|| x->sel.family != x->props.family
+|| (x->sel.family == AF_INET &&
+x->sel.saddr.a4 != x->props.saddr.a4)
+

Re: BCM5461 phy issue in 10M/Full duplex

2006-10-19 Thread Maciej W. Rozycki
On Wed, 18 Oct 2006, Rick Jones wrote:

  I believe, but need to double check, that if I leave the BCM5461 in
  autoneg, and force the switch to 10M/full that the BCM5461 will autoneg at
  10M/half duplex.
 
 Indeed, if one side is hardcoded, autoneg will fail and the side trying to
 autoneg is required by the specs (not that I know chapter and verse to quote
 from the IEEE stuff :( to go into half-duplex.

 Rather than forcing a PHY into 10Mbps, you may limit its advertised list 
of speeds/duplex settings and let autonegotiation do the rest.  It depends 
on how smart the code managing your switch is; it's certainly doable with 
Linux (you can do it with `mii-tool'; not sure about `ethtool') as long as 
the NIC driver supports the necessary ioctls.
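
Why restricting the advertisement works can be shown with a small sketch
of the resolution step. It is only a model: the PHY does this in
hardware, resolve_link() is a hypothetical name, only the 10/100 bits of
the MII advertisement register are covered (values as in linux/mii.h),
and gigabit abilities, which live in a separate register, are ignored.

```c
#include <stdint.h>

/* 10/100 advertisement bits of the MII auto-negotiation register. */
#define ADVERTISE_10HALF   0x0020
#define ADVERTISE_10FULL   0x0040
#define ADVERTISE_100HALF  0x0080
#define ADVERTISE_100FULL  0x0100

/*
 * Pick the best mode that both ends advertise, in decreasing order of
 * preference. Returns 0 when the two sides have no mode in common.
 */
static uint16_t resolve_link(uint16_t local_adv, uint16_t partner_adv)
{
    static const uint16_t prio[] = {
        ADVERTISE_100FULL, ADVERTISE_100HALF,
        ADVERTISE_10FULL, ADVERTISE_10HALF,
    };
    uint16_t common = local_adv & partner_adv;
    unsigned int i;

    for (i = 0; i < sizeof(prio) / sizeof(prio[0]); i++)
        if (common & prio[i])
            return prio[i];
    return 0;
}
```

If one side advertises only the two 10Mbps modes and the partner
advertises everything, the result is 10M/full, with both ends agreeing on
the duplex setting. That is the outcome forcing one side cannot give.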

 Was 10M/Full duplex ever standardized?  If not I could see where kit might not
 be willing/able to autoneg to that.

 It is standard.

  Maciej


[PATCH 1/3] netpoll: initialize skb for UDP

2006-10-19 Thread Stephen Hemminger
Need to fully initialize skb to keep lower layers and queueing happy.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.orig/net/core/netpoll.c   2006-10-18 15:26:36.0 -0700
+++ linux-2.6/net/core/netpoll.c2006-10-19 08:28:04.0 -0700
@@ -331,13 +331,13 @@
memcpy(skb->data, msg, len);
skb->len += len;
 
-   udph = (struct udphdr *) skb_push(skb, sizeof(*udph));
+   skb->h.uh = udph = (struct udphdr *) skb_push(skb, sizeof(*udph));
udph->source = htons(np->local_port);
udph->dest = htons(np->remote_port);
udph->len = htons(udp_len);
udph->check = 0;
 
-   iph = (struct iphdr *)skb_push(skb, sizeof(*iph));
+   skb->nh.iph = iph = (struct iphdr *)skb_push(skb, sizeof(*iph));
 
/* iph->version = 4; iph->ihl = 5; */
put_unaligned(0x45, (unsigned char *)iph);
@@ -353,8 +353,8 @@
iph->check= ip_fast_csum((unsigned char *)iph, iph->ihl);
 
eth = (struct ethhdr *) skb_push(skb, ETH_HLEN);
-
-   eth->h_proto = htons(ETH_P_IP);
+   skb->mac.raw = skb->data;
+   skb->protocol = eth->h_proto = htons(ETH_P_IP);
memcpy(eth->h_source, np->local_mac, 6);
memcpy(eth->h_dest, np->remote_mac, 6);
 

--



[PATCH 3/3] netpoll: use skb_buff_head for skb cache

2006-10-19 Thread Stephen Hemminger
The private skb cache should be managed with a normal sk_buff_head rather
than a DIY queue. If the pool is exhausted, don't print anything; that just
makes the problem worse. After a number of attempts, punt and drop
the message (callers handle it already).

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 net/core/netpoll.c |   55 +
 1 file changed, 22 insertions(+), 33 deletions(-)

--- linux-2.6.orig/net/core/netpoll.c   2006-10-19 09:49:03.0 -0700
+++ linux-2.6/net/core/netpoll.c2006-10-19 10:06:39.0 -0700
@@ -36,9 +36,7 @@
 #define MAX_QUEUE_DEPTH (MAX_SKBS / 2)
 #define MAX_RETRIES 2
 
-static DEFINE_SPINLOCK(skb_list_lock);
-static int nr_skbs;
-static struct sk_buff *skbs;
+static struct sk_buff_head skb_list;
 
 static atomic_t trapped;
 
@@ -51,6 +49,7 @@
 
 static void zap_completion_queue(void);
 static void arp_reply(struct sk_buff *skb);
+static void refill_skbs(void);
 
 static void netpoll_run(unsigned long arg)
 {
@@ -79,6 +78,7 @@
break;
}
}
+   refill_skbs();
 }
 
 static int checksum_udp(struct sk_buff *skb, struct udphdr *uh,
@@ -169,19 +169,14 @@
 static void refill_skbs(void)
 {
struct sk_buff *skb;
-   unsigned long flags;
 
-   spin_lock_irqsave(&skb_list_lock, flags);
-   while (nr_skbs < MAX_SKBS) {
+   while (skb_queue_len(&skb_list) < MAX_SKBS) {
skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
if (!skb)
break;
 
-   skb->next = skbs;
-   skbs = skb;
-   nr_skbs++;
+   skb_queue_tail(&skb_list, skb);
}
-   spin_unlock_irqrestore(&skb_list_lock, flags);
 }
 
 static void zap_completion_queue(void)
@@ -210,37 +205,24 @@
put_cpu_var(softnet_data);
 }
 
-static struct sk_buff * find_skb(struct netpoll *np, int len, int reserve)
+static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
 {
-   int once = 1, count = 0;
-   unsigned long flags;
-   struct sk_buff *skb = NULL;
+   struct sk_buff *skb;
+   int tries = 0;
 
zap_completion_queue();
-repeat:
-   if (nr_skbs < MAX_SKBS)
-   refill_skbs();
 
+repeat:
skb = alloc_skb(len, GFP_ATOMIC);
-
-   if (!skb) {
-   spin_lock_irqsave(&skb_list_lock, flags);
-   skb = skbs;
-   if (skb) {
-   skbs = skb->next;
-   skb->next = NULL;
-   nr_skbs--;
-   }
-   spin_unlock_irqrestore(&skb_list_lock, flags);
-   }
+   if (!skb)
+   skb = skb_dequeue(&skb_list);
 
if(!skb) {
-   count++;
-   if (once && (count == 100)) {
-   printk("out of netpoll skbs!\n");
-   once = 0;
-   }
+   if (++tries > MAX_RETRIES)
+   return NULL;
+
netpoll_poll(np);
+   tasklet_schedule(&np->dev->npinfo->tx_task);
goto repeat;
}
 
@@ -589,6 +571,13 @@
return -1;
 }
 
+static __init int netpoll_init(void)
+{
+   skb_queue_head_init(&skb_list);
+   return 0;
+}
+core_initcall(netpoll_init);
+
 int netpoll_setup(struct netpoll *np)
 {
struct net_device *ndev = NULL;
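
The allocation strategy in the reworked find_skb() (try the normal
allocator, fall back to the private cache, retry a bounded number of
times, then give up quietly) can be sketched in user space. Everything
here is a stand-in: struct pool, pool_take() and find_buf() are
hypothetical, the primary_fails flag simulates memory pressure, and the
polls counter stands in for the call that gives the system a chance to
free buffers. Unlike skb_dequeue(), this toy reserve hands out buffers
last-in first-out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_RETRIES 2
#define POOL_SIZE   4

/* Hypothetical reserve pool standing in for the private skb cache. */
struct pool {
    void *slot[POOL_SIZE];
    int count;            /* buffers currently held in reserve */
    int polls;            /* how often we "polled" between attempts */
    bool primary_fails;   /* simulate the normal allocator failing */
};

/* Take one buffer from the reserve, or NULL if it is empty. */
static void *pool_take(struct pool *p)
{
    return p->count > 0 ? p->slot[--p->count] : NULL;
}

/* Primary allocator first, reserve second, then bounded retries. */
static void *find_buf(struct pool *p, size_t len)
{
    int tries = 0;
    void *buf;

    for (;;) {
        buf = p->primary_fails ? NULL : malloc(len);
        if (!buf)
            buf = pool_take(p);
        if (buf)
            return buf;
        if (++tries > MAX_RETRIES)
            return NULL;  /* drop silently; the caller handles it */
        p->polls++;       /* stands in for polling the device */
    }
}
```

The point of returning NULL without printing is that, in the netconsole
case, a message about buffer exhaustion would itself need a buffer.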

--



[PATCH 2/3] netpoll: rework skb transmit queue

2006-10-19 Thread Stephen Hemminger
The original skb management for netpoll was a mess, it had two queue paths
and a callback. This changes it to have a per-instance transmit queue
and use a tasklet rather than a work queue for the congested case.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/netconsole.c |1 
 include/linux/netpoll.h  |3 -
 net/core/netpoll.c   |  123 +++
 3 files changed, 42 insertions(+), 85 deletions(-)

--- linux-2.6.orig/drivers/net/netconsole.c 2006-10-16 14:15:01.0 -0700
+++ linux-2.6/drivers/net/netconsole.c  2006-10-19 08:28:25.0 -0700
@@ -60,7 +60,6 @@
.local_port = 6665,
.remote_port = 6666,
.remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
-   .drop = netpoll_queue,
 };
 static int configured = 0;
 
--- linux-2.6.orig/include/linux/netpoll.h  2006-10-16 14:15:04.0 -0700
+++ linux-2.6/include/linux/netpoll.h   2006-10-19 08:28:25.0 -0700
@@ -27,11 +27,12 @@
 struct netpoll_info {
spinlock_t poll_lock;
int poll_owner;
-   int tries;
int rx_flags;
spinlock_t rx_lock;
struct netpoll *rx_np; /* netpoll that registered an rx_hook */
struct sk_buff_head arp_tx; /* list of arp requests to reply to */
+   struct sk_buff_head tx_q;
+   struct tasklet_struct tx_task;
 };
 
 void netpoll_poll(struct netpoll *np);
--- linux-2.6.orig/net/core/netpoll.c   2006-10-19 08:28:04.0 -0700
+++ linux-2.6/net/core/netpoll.c2006-10-19 09:47:38.0 -0700
@@ -40,10 +40,6 @@
 static int nr_skbs;
 static struct sk_buff *skbs;
 
-static DEFINE_SPINLOCK(queue_lock);
-static int queue_depth;
-static struct sk_buff *queue_head, *queue_tail;
-
 static atomic_t trapped;
 
 #define NETPOLL_RX_ENABLED  1
@@ -56,49 +52,33 @@
 static void zap_completion_queue(void);
 static void arp_reply(struct sk_buff *skb);
 
-static void queue_process(void *p)
+static void netpoll_run(unsigned long arg)
 {
-   unsigned long flags;
+   struct net_device *dev = (struct net_device *) arg;
+   struct netpoll_info *npinfo = dev->npinfo;
struct sk_buff *skb;
 
-   while (queue_head) {
-   spin_lock_irqsave(&queue_lock, flags);
-
-   skb = queue_head;
-   queue_head = skb->next;
-   if (skb == queue_tail)
-   queue_head = NULL;
-
-   queue_depth--;
-
-   spin_unlock_irqrestore(&queue_lock, flags);
+   while ((skb = skb_dequeue(&npinfo->tx_q))) {
+   int rc;
 
-   dev_queue_xmit(skb);
-   }
-}
-
-static DECLARE_WORK(send_queue, queue_process, NULL);
+   if (!netif_running(dev) || !netif_device_present(dev)) {
+   __kfree_skb(skb);
+   continue;
+   }
 
-void netpoll_queue(struct sk_buff *skb)
-{
-   unsigned long flags;
+   netif_tx_lock(dev);
+   if (netif_queue_stopped(dev))
+   rc = NETDEV_TX_BUSY;
+   else
+   rc = dev->hard_start_xmit(skb, dev);
+   netif_tx_unlock(dev);
 
-   if (queue_depth == MAX_QUEUE_DEPTH) {
-   __kfree_skb(skb);
-   return;
+   if (rc != NETDEV_TX_OK) {
+   skb_queue_head(&npinfo->tx_q, skb);
+   tasklet_schedule(&npinfo->tx_task);
+   break;
+   }
}
-   WARN_ON(skb->protocol == 0);
-
-   spin_lock_irqsave(&queue_lock, flags);
-   if (!queue_head)
-   queue_head = skb;
-   else
-   queue_tail->next = skb;
-   queue_tail = skb;
-   queue_depth++;
-   spin_unlock_irqrestore(&queue_lock, flags);
-
-   schedule_work(&send_queue);
 }
 
 static int checksum_udp(struct sk_buff *skb, struct udphdr *uh,
@@ -232,6 +212,7 @@
 
 static struct sk_buff * find_skb(struct netpoll *np, int len, int reserve)
 {
+   int once = 1, count = 0;
unsigned long flags;
struct sk_buff *skb = NULL;
 
@@ -254,6 +235,11 @@
}
 
if(!skb) {
+   count++;
+   if (once && (count == 100)) {
+   printk("out of netpoll skbs!\n");
+   once = 0;
+   }
netpoll_poll(np);
goto repeat;
}
@@ -265,51 +251,17 @@
 
 static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
 {
-   int status;
-   struct netpoll_info *npinfo;
+   struct net_device *dev = np->dev;
+   struct netpoll_info *npinfo = dev->npinfo;
 
-   if (!np || !np->dev || !netif_device_present(np->dev) || !netif_running(np->dev)) {
-   __kfree_skb(skb);
-   return;
-   }
-
-   npinfo = np->dev->npinfo;
+   skb_queue_tail(&npinfo->tx_q, skb);
 
/* avoid recursion */
if (npinfo->poll_owner == smp_processor_id() ||
-
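
The diff is cut off here in the archive. The core of the new
netpoll_run() shown earlier (dequeue, try to transmit, and on a busy
device put the packet back at the head and stop) can be sketched in user
space as follows. All names are hypothetical: struct pkt and struct txq
stand in for sk_buff and sk_buff_head, and struct toy_dev is a device
that accepts a fixed number of packets before reporting busy. Locking and
the tasklet itself are left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct pkt { struct pkt *next; int id; };
struct txq { struct pkt *head, *tail; };

/* Append at the tail (normal enqueue). */
static void txq_tail(struct txq *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Put a packet back at the head so ordering is preserved. */
static void txq_head(struct txq *q, struct pkt *p)
{
    p->next = q->head;
    q->head = p;
    if (!q->tail)
        q->tail = p;
}

/* Remove and return the first packet, or NULL if the queue is empty. */
static struct pkt *txq_dequeue(struct txq *q)
{
    struct pkt *p = q->head;

    if (p) {
        q->head = p->next;
        if (!q->head)
            q->tail = NULL;
        p->next = NULL;
    }
    return p;
}

/* Hypothetical device: accepts `budget` packets, then reports busy. */
struct toy_dev { int budget; int sent; };

static bool toy_xmit(struct toy_dev *d, struct pkt *p)
{
    (void)p;
    if (d->budget == 0)
        return false;
    d->budget--;
    d->sent++;
    return true;
}

/*
 * Drain the queue. Returns true if it was emptied, false if the device
 * went busy; in that case the caller would reschedule itself.
 */
static bool txq_run(struct txq *q, struct toy_dev *d)
{
    struct pkt *p;

    while ((p = txq_dequeue(q)) != NULL) {
        if (!toy_xmit(d, p)) {
            txq_head(q, p);
            return false;
        }
    }
    return true;
}
```

Requeueing at the head rather than the tail is what keeps console
messages in order when the device is congested.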

[PATCH 0/3] netpoll/netconsole fixes

2006-10-19 Thread Stephen Hemminger
The netpoll transmit skb management is a mess, it has two
paths and it's on Txq. These patches try and clean this up.

--



Re: [RFC] wrr (weighted round-robin) bonding

2006-10-19 Thread Andy Gospodarek
On Tue, Oct 17, 2006 at 10:16:21AM +0200, Dawid Ciezarkiewicz wrote:
> 
> In fact - as the default weight is set to 1, without changing it the wrr
> bonding mode works like the plain round-robin one. But it has a little more
> overhead (recharging tokens), and the code is a bit more complicated. I was
> not sure if some tools could assume that in mode 0 all interfaces work with
> the same weights and because of that behave strangely with this patch in use.
> 
> It was written as a solution for some problem, and I'm still not sure if such
> a change will always stay a patch to the Linux kernel or may some day go into
> mainline. For compatibility I've decided to keep those modes separate.
> 
> Because of that I haven't replaced mode 0. If this patch is considered
> useful, and my concerns are not a problem - I'd like to replace mode 0 if
> possible.

It would seem to me that extending an existing mode would be more
desirable than adding yet another mode to worry about.  I don't even
like the fact that there are as many as there are, but I understand why
they are there.  

I recently extended rr mode to allow an additional parameter called
rr_repeat that would allow someone to send more than a single frame out
of each device before moving to the next one.  It seemed this could be
helpful when dealing with switches that constantly re-learned source MAC
addresses.  Network performance would suffer whenever rr_repeat was > 1,
but box performance might be better if there weren't so many locks
taken.

This patch is pretty bad (in fact even the math could be done better to
avoid the expensive modulo), but since I did it as a proof of
concept I wasn't too worried about it at the time.  The functionality
might be interesting to add to your weighted rr concept.  It's also
against an older kernel, but it should apply to an upstream one with
minimal, if any, porting.
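
For illustration only, and not part of the patch: the "expensive modulo"
mentioned above can be avoided by comparing a per-slave counter against
the repeat value and resetting it. The sketch below is plain userspace C;
struct rr_state and rr_next_slave() are hypothetical names that stand in
for the bonding state and do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for the bonding state; not the kernel's struct bonding. */
struct rr_state {
	unsigned int nslaves;	/* number of usable slaves */
	unsigned int current;	/* index of the slave currently in use */
	unsigned int sent;	/* frames already sent on the current slave */
	unsigned int repeat;	/* frames per slave before switching (rr_repeat) */
};

/* Return the slave index to use for the next frame and advance the state. */
static unsigned int rr_next_slave(struct rr_state *st)
{
	unsigned int chosen;

	assert(st->nslaves > 0 && st->repeat > 0);

	chosen = st->current;
	if (++st->sent >= st->repeat) {
		/* Quota used up: reset the counter and move to the next slave. */
		st->sent = 0;
		if (++st->current >= st->nslaves)
			st->current = 0;	/* wrap by compare, no division */
	}
	return chosen;
}
```

With repeat set to 1 this degenerates to plain round-robin, which matches
the default of BOND_RR_REPEAT in the patch.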


--- linux/drivers/net/bonding/bond_main.c.orig  2006-10-11 10:41:07.611562000 -0400
+++ linux/drivers/net/bonding/bond_main.c   2006-10-11 13:40:54.767425000 -0400
@@ -543,6 +543,7 @@
 /* monitor all links that often (in milliseconds). =0 disables monitoring */
 #define BOND_LINK_MON_INTERV   0
 #define BOND_LINK_ARP_INTERV   0
+#define BOND_RR_REPEAT 1
 
 static int max_bonds   = BOND_DEFAULT_MAX_BONDS;
 static int miimon  = BOND_LINK_MON_INTERV;
@@ -555,6 +556,8 @@ static char *lacp_rate  = NULL;
 static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
+static int rr_repeat   = BOND_RR_REPEAT;
+static int rr_repeat_count = 0;
 
 MODULE_PARM(max_bonds, "i");
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
@@ -578,6 +581,8 @@ MODULE_PARM(arp_interval, "i");
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 MODULE_PARM(arp_ip_target, "1-" __MODULE_STRING(BOND_MAX_ARP_TARGETS) "s");
 MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
+MODULE_PARM(rr_repeat, "i");
+MODULE_PARM_DESC(rr_repeat, "number of frames to send on round-robin bonds before switching interfaces");
 
 /*- Global variables */
 
@@ -4390,21 +4395,27 @@ static int bond_xmit_roundrobin(struct s
goto out;
}
 
-   bond_for_each_slave_from(bond, slave, i, start_at) {
-   if (IS_UP(slave->dev) &&
-   (slave->link == BOND_LINK_UP) &&
-   (slave->state == BOND_STATE_ACTIVE)) {
-   res = bond_dev_queue_xmit(bond, skb, slave->dev);
+   /* just xmit if we haven't hit the repeat val */
+   if (!(++rr_repeat_count % rr_repeat)) {
 
-   write_lock(&bond->curr_slave_lock);
-   bond->curr_active_slave = slave->next;
-   write_unlock(&bond->curr_slave_lock);
+   rr_repeat_count = 0;
+   bond_for_each_slave_from(bond, slave, i, start_at) {
+   if (IS_UP(slave->dev) &&
+   (slave->link == BOND_LINK_UP) &&
+   (slave->state == BOND_STATE_ACTIVE)) {
+   res = bond_dev_queue_xmit(bond, skb, slave->dev);
 
-   break;
+   write_lock(&bond->curr_slave_lock);
+   bond->curr_active_slave = slave->next;
+   write_unlock(&bond->curr_slave_lock);
+
+   break;
+   }
}
+   } else {
+   res = bond_dev_queue_xmit(bond, skb, slave->dev);
}
 
-
 out:
if (res) {
/* no suitable interface, frame not sent */


Re: [IPROUTE] manpage for ss

2006-10-19 Thread Stephen Hemminger
Revised version now in iproute2 repository.


-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [IPROUTE] manpage for ss

2006-10-19 Thread Alexander Wirt
Stephen Hemminger wrote on Thursday, 19 October 2006:

> Revised version now in iproute2 repository.
Great, thank you very much. 

Alex



[PATCH] ethtool: more sky2 decode

2006-10-19 Thread Stephen Hemminger
More marvell register decode. Add chip table, pci config info
and other bits useful for debug.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 marvell.c |   80 +
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/marvell.c b/marvell.c
index e867521..9a14dd2 100644
--- a/marvell.c
+++ b/marvell.c
@@ -141,6 +141,8 @@ static void dump_gmac_fifo(const char *n
 
 static void dump_mac(const u8 *r)
 {
+   u8 id;
+
printf("\nMAC Addresses\n");
printf("---\n");
dump_addr(1, r + 0x100);
@@ -148,10 +150,30 @@ static void dump_mac(const u8 *r)
dump_addr(3, r + 0x110);
printf("\n");
 
-   printf("Connector type   0x%02X\n", r[0x118]);
-   printf("PMD type 0x%02X\n", r[0x119]);
-   printf("Configuration0x%02X\n", r[0x11a]);
-   printf("Chip Revision0x%02X\n", r[0x11b]);
+   printf("Connector type   0x%02X (%c)\n",
+  r[0x118], (char)r[0x118]);
+   printf("PMD type 0x%02X (%c)\n",
+  r[0x119], (char)r[0x119]);
+   printf("PHY type 0x%02X\n", r[0x11d]);
+
+   id = r[0x11b];
+   printf("Chip Id  0x%02X ", id);
+   switch (id) {
+   case 0x0a:  puts("Genesis");break;
+   case 0xb0:  puts("Yukon");  break;
+   case 0xb1:  puts("Yukon-Lite"); break;
+   case 0xb2:  puts("Yukon-LP");   break;
+   case 0xb3:  puts("Yukon-2 XL"); break;
+   case 0xb4:  puts("Yukon-2 EC Ultra");   break;
+   case 0xb6:  puts("Yukon-2 EC"); break;
+   case 0xb7:  puts("Yukon-2 FE"); break;
+   default:puts("Unknown");
+   }
+
+   printf(" (rev %d)\n", r[0x11a] & 0xf);
+
+   printf("Ram Buffer   0x%02X\n", r[0x11c]);
+  
 }

 static void dump_gma(const char *name, const u8 *r)
@@ -182,21 +204,43 @@ static void dump_gmac(const char *name, 
dump_gma("Physical", data + 0x28);
 }
 
+static void dump_pci(const u8 *cfg)
+{
+   int i;
+
+   printf("\nPCI config\n--\n");
+   for(i = 0; i < 0x80; i++) {
+   if (!(i & 15))
+   printf("%02x:", i);
+   printf(" %02x", cfg[i]);
+   if ((i & 15) == 15)
+   putchar('\n');
+   }
+   putchar('\n');
+}
+
+static void dump_control(u8 *r)
+{
+   printf("Control Registers\n");
+   printf("-\n");
+
+   printf("Register Access Port 0x%02X\n", *r);
+   printf("LED Control/Status   0x%08X\n", *(u32 *) (r + 4));
+
+   printf("Interrupt Source 0x%08X\n", *(u32 *) (r + 8));
+   printf("Interrupt Mask   0x%08X\n", *(u32 *) (r + 0xc));
+   printf("Interrupt Hardware Error Source  0x%08X\n", *(u32 *) (r + 0x10));
+   printf("Interrupt Hardware Error Mask0x%08X\n", *(u32 *) (r + 0x14));
+}
+
 int skge_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 {
const u32 *r = (const u32 *) regs->data;
int dual = !(regs->data[0x11a] & 1);
 
-   printf("Control Registers\n");
-   printf("-\n");
+   dump_pci(regs->data + 0x380);
 
-   printf("Register Access Port 0x%08X\n", r[0]);
-   printf("LED Control/Status   0x%08X\n", r[1]);
-   printf("Interrupt Source 0x%08X\n", r[2]);
-   printf("Interrupt Mask   0x%08X\n", r[3]);
-   printf("Interrupt Hardware Error Source  0x%08X\n", r[4]);
-   printf("Interrupt Hardware Error Mask0x%08X\n", r[5]);
-   printf("Special Interrupt Source 0x%08X\n", r[6]);
+   dump_control(regs->data);
 
printf("\nBus Management Unit\n");
printf("---\n");
@@ -296,15 +340,9 @@ int sky2_dump_regs(struct ethtool_drvinf
const u32 *r = (const u32 *) regs->data;
int dual;
 
-   printf("Control Registers\n");
-   printf("-\n");
+   dump_pci(regs->data + 0x1c00);
 
-   printf("Control/Status   0x%08X\n", r[1]);
-   printf("Interrupt Source 0x%08X\n", r[2]);
-   printf("Interrupt Mask   0x%08X\n", r[3]);
-   printf("Interrupt Hardware Error Source  0x%08X\n", r[4]);
-   printf("Interrupt Hardware Error Mask0x%08X\n", r[5]);
-   printf("Special Interrupt Source 0x%08X\n", r[6]);
+   dump_control(regs->data);
 
printf("\nBus Management Unit\n");
printf("---\n");
-- 
1.4.1



Please pull 'we21-fix' branch of wireless-2.6.git

2006-10-19 Thread John W. Linville
Jeff,

Here is my ugly patch to fix userland ABI compatibility for WE-21.
It tries to detect WE <= 20 by the request length or the inclusion of
'\0' in the length for the ESSID and NICKN ioctls.  If it finds that,
it temporarily adjusts the length value and puts it back before
reporting back to userland.

It is possible that there are more elegant solutions that work, but
I'm not totally convinced that they truly preserve the ABI as desired.
Please see the discussion earlier in this thread if you are interested.

I'm aware that checking for '\0' is problematic, since that technically
is a valid SSID character even at the end of the SSID.  I'm afraid
we will just have to live with that limitation.
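
For illustration only: the detection heuristic described above can be
tried out in userspace. This is a sketch under stated assumptions, not
the kernel code; essid_needs_compat() and ESSID_MAX are hypothetical
names, with ESSID_MAX standing in for the maximum token count of the
ioctl (32 is assumed here).

```c
#include <assert.h>
#include <stddef.h>

#define ESSID_MAX 32	/* assumed maximum ESSID length for this sketch */

/*
 * Return 1 if a request looks like it came from a WE <= 20 caller,
 * i.e. the length appears to count a trailing '\0', otherwise 0.
 * buf is only inspected for SET requests with a non-zero length.
 */
static int essid_needs_compat(const char *buf, size_t len, int is_set)
{
	if (len == ESSID_MAX + 1)
		return 1;	/* only reachable if the terminator was counted */

	if (is_set && len != 0) {
		assert(buf != NULL);
		if (buf[len - 1] == '\0')
			return 1;	/* SET request whose last byte is NUL */
	}
	return 0;
}
```

A caller would subtract the returned value from the length before
handing the request on, and add it back afterwards. The limitation
mentioned above shows up directly: an SSID that legitimately ends in
'\0' is indistinguishable from an old-style request.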

I have the single patch on its own clean branch from Linus's tree of
a few days ago.  That way you can pull just this fix without concern
about picking-up the other fixes which I posted a few days ago,
in case you haven't reviewed them yet.

Obviously, this is intended for 2.6.19.

Thanks,

John
---

The following changes since commit 51018b0a3160d253283173c2f54f16746cee5852:
  Ulrich Drepper:
make UML compile (FC6/x86-64)

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/wireless-2.6.git we21-fix

John W. Linville:
  wireless: WE-20 compatibility for ESSID and NICKN ioctls

 net/core/wireless.c |   33 -
 1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/net/core/wireless.c b/net/core/wireless.c
index 0da..cb1b872 100644
--- a/net/core/wireless.c
+++ b/net/core/wireless.c
@@ -748,11 +748,39 @@ #endif/* WE_SET_EVENT */
int extra_size;
int user_length = 0;
int err;
+   int essid_compat = 0;
 
/* Calculate space needed by arguments. Always allocate
 * for max space. Easier, and won't last long... */
extra_size = descr->max_tokens * descr->token_size;
 
+   /* Check need for ESSID compatibility for WE < 21 */
+   switch (cmd) {
+   case SIOCSIWESSID:
+   case SIOCGIWESSID:
+   case SIOCSIWNICKN:
+   case SIOCGIWNICKN:
+   if (iwr->u.data.length == descr->max_tokens + 1)
+   essid_compat = 1;
+   else if (IW_IS_SET(cmd) && (iwr->u.data.length != 0)) {
+   char essid[IW_ESSID_MAX_SIZE + 1];
+
+   err = copy_from_user(essid, iwr->u.data.pointer,
+                        iwr->u.data.length *
+                        descr->token_size);
+   if (err)
+   return -EFAULT;
+
+   if (essid[iwr->u.data.length - 1] == '\0')
+   essid_compat = 1;
+   }
+   break;
+   default:
+   break;
+   }
+
+   iwr->u.data.length -= essid_compat;
+
/* Check what user space is giving us */
if(IW_IS_SET(cmd)) {
/* Check NULL pointer */
@@ -795,7 +823,8 @@ #ifdef WE_IOCTL_DEBUG
 #endif /* WE_IOCTL_DEBUG */
 
/* Create the kernel buffer */
-   extra = kmalloc(extra_size, GFP_KERNEL);
+   /*kzalloc ensures NULL-termination for essid_compat */
+   extra = kzalloc(extra_size, GFP_KERNEL);
if (extra == NULL) {
return -ENOMEM;
}
@@ -819,6 +848,8 @@ #endif  /* WE_IOCTL_DEBUG */
/* Call the handler */
ret = handler(dev, &info, &(iwr->u), extra);
 
+   iwr->u.data.length += essid_compat;
+
/* If we have something to return to the user */
if (!ret && IW_IS_GET(cmd)) {
/* Check if there is enough buffer up there */
-- 
John W. Linville
[EMAIL PROTECTED]


Re: Please pull 'we21-fix' branch of wireless-2.6.git

2006-10-19 Thread Jean Tourrilhes
On Thu, Oct 19, 2006 at 05:56:53PM -0400, John W. Linville wrote:
> Jeff,
> 
> Here is my ugly patch to fix userland ABI compatibility for WE-21.
> It tries to detect WE <= 20 by the request length or the inclusion of
> '\0' in the length for the ESSID and NICKN ioctls.  If it finds that,
> it temporarily adjusts the length value and puts it back before
> reporting back to userland.
> 
> It is possible that there are more elegant solutions that work, but
> I'm not totally convinced that they truly preserve the ABI as desired.
> Please see the discussion earlier in this thread if you are interested.

I agree that this is a good compromise.
Thanks a lot for your work, John !

Jean


Re: [PATCH 4/6] net: use bitrev8

2006-10-19 Thread Andrew Morton
On Thu, 19 Oct 2006 01:46:47 +0900
Akinobu Mita [EMAIL PROTECTED] wrote:

> Use bitrev8 for bmac, mace, macmace, macsonic, and skfp drivers.
> 
> Cc: Jeff Garzik [EMAIL PROTECTED]
> Cc: Paul Mackerras [EMAIL PROTECTED]
> Cc: Mirko Lindner [EMAIL PROTECTED]
> Cc: Thomas Bogendoerfer [EMAIL PROTECTED]
> Signed-off-by: Akinobu Mita [EMAIL PROTECTED]
> 
>  drivers/net/Kconfig        |    1 
>  drivers/net/bmac.c         |   20 ++
>  drivers/net/mace.c         |   16 +---
>  drivers/net/macmace.c      |   18 +
>  drivers/net/macsonic.c     |    6 ---
>  drivers/net/skfp/can.c     |   83 -
>  drivers/net/skfp/drvfbi.c  |   21 ---
>  drivers/net/skfp/fplustm.c |    4 +-
>  drivers/net/skfp/smt.c     |    7 +--

A bunch of drivers.

> ===
> --- work-fault-inject.orig/drivers/net/Kconfig
> +++ work-fault-inject/drivers/net/Kconfig
> @@ -2500,6 +2500,7 @@ config DEFXX
>  config SKFP
>     tristate "SysKonnect FDDI PCI support"
>     depends on FDDI && PCI
> +   select BITREVERSE
>     ---help---
>       Say Y here if you have a SysKonnect FDDI PCI adapter.
>       The following adapters are supported by this driver:

But only one of them selects the library.

The patchset adds a large number of `select' statements.  afaict everything
_seems_ to work OK with that (as long as all the needed selects are there).

But select is problematic and I do wonder whether it'd be simpler to just
link the thing into vmlinux.

Oh well, we'll see how it goes.



[PATCH] bcm43xx: Drain TX status before starting IRQs

2006-10-19 Thread Michael Buesch
Drain the Microcode TX-status-FIFO before we enable IRQs.
This is required, because the FIFO may still have entries left
from a previous run. Those would immediately fire after enabling
IRQs and would lead to an oops in the DMA TXstatus handling code.

Signed-off-by: Michael Buesch [EMAIL PROTECTED]

--

Please consider also pushing this into the -stable tree.
The bug is not likely to trigger, but at least Ben
triggered it in the past. Anyway, it can't hurt much to
drain the FIFO before running the device.

Note that this is diffed against 2.6.18.1 and not 2.6.18
as the diff prolog suggests. I just forgot to rename
the directory. ;)
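
For illustration only: the drain-before-enable idea can be modelled in
userspace. This is a toy sketch, not the driver code; struct fake_fifo
and its read helpers are hypothetical, and the model assumes that
reading the first status register pops one entry and returns 0 once the
FIFO is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a status FIFO left over from a previous run. */
struct fake_fifo {
	const uint32_t *entries;	/* stale, non-zero status words */
	size_t count;			/* number of entries */
	size_t pos;			/* next entry to pop */
};

/* Model of the first status register: pops an entry, 0 means "empty". */
static uint32_t fifo_read_stat0(struct fake_fifo *f)
{
	return f->pos < f->count ? f->entries[f->pos++] : 0;
}

/* Model of the second status register: read only to complete the access. */
static uint32_t fifo_read_stat1(struct fake_fifo *f)
{
	(void)f;
	return 0;
}

/* Read and discard every pending entry; returns how many were dropped. */
static unsigned int drain_fifo(struct fake_fifo *f)
{
	unsigned int dropped = 0;

	assert(f != NULL);
	while (fifo_read_stat0(f) != 0) {
		(void)fifo_read_stat1(f);
		dropped++;
	}
	return dropped;
}
```

Draining twice in a row is harmless in this model: the second pass sees
an empty FIFO and drops nothing, which is the state wanted before
interrupts are enabled.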


Index: linux-2.6.18/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===
--- linux-2.6.18.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c   2006-10-15 21:10:37.0 +0200
+++ linux-2.6.18/drivers/net/wireless/bcm43xx/bcm43xx_main.c    2006-10-19 17:17:16.0 +0200
@@ -1463,6 +1463,21 @@ static void handle_irq_transmit_status(s
}
 }
 
+static void drain_txstatus_queue(struct bcm43xx_private *bcm)
+{
+   u32 dummy;
+
+   /* Read all entries from the microcode TXstatus FIFO
+* and throw them away.
+*/
+   while (1) {
+   dummy = bcm43xx_read32(bcm, BCM43xx_MMIO_XMITSTAT_0);
+   if (!dummy)
+   break;
+   dummy = bcm43xx_read32(bcm, BCM43xx_MMIO_XMITSTAT_1);
+   }
+}
+
 static void bcm43xx_generate_noise_sample(struct bcm43xx_private *bcm)
 {
bcm43xx_shm_write16(bcm, BCM43xx_SHM_SHARED, 0x408, 0x7F7F);
@@ -3509,6 +3524,7 @@ int bcm43xx_select_wireless_core(struct 
bcm43xx_macfilter_clear(bcm, BCM43xx_MACFILTER_ASSOC);
bcm43xx_macfilter_set(bcm, BCM43xx_MACFILTER_SELF, (u8 *)(bcm->net_dev->dev_addr));
bcm43xx_security_init(bcm);
+   drain_txstatus_queue(bcm);
ieee80211softmac_start(bcm->net_dev);
 
/* Let's go! Be careful after enabling the IRQs.


-- 
Greetings Michael.


Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)

2006-10-19 Thread jamal
On Thu, 2006-19-10 at 16:38 +0200, Patrick McHardy wrote:
> jamal wrote:
> > ACKed-by: Jamal Hadi Salim
> > 
> > When Patrick has his patch ready after this goes in we can revisit.
> 
> NACK.
> 
> I still think this patch shouldn't go in. There's no point in doing the
> same thing twice, and I haven't heard a compelling argument why it has
> to be done in a way that only helps qdiscs using rtabs while ignoring
> statistics and estimators (I even provided a patch to show how to do
> it without these limitations).

The poor guy has been persistent and has some good ideas and we need to
encourage him to stick around.  Why don't you help him get the patch into
the shape you think is reasonable? I know you are busy elsewhere and
it has been a while since you last promised your patch. I will try to help
as well. 

> Besides that:

I will let Russell respond to the critique (As well as the concept of
stats and estimator); Russell please try to be brief and to the point (I
still have to learn that lesson myself ;-).

cheers,
jamal



Re: AF_KEY extended xfrm_state selector handling

2006-10-19 Thread jamal
On Thu, 2006-19-10 at 18:26 +0100, Michal Ruzicka wrote:
> Hello
> 
> In an effort to configure an L2TP/IPsec server on Linux capable of supporting
> multiple clients behind a single NAT device I ran into difficulties with the
> pf_key protocol implementation not being able to exploit all the information
> passed to it as a SADB_EXT_ADDRESS_PROXY info. Perhaps as the original source
> suggested (/* Nobody uses this, but we try. */) this info has never been used
> before.


BTW, why not use xfrm instead? Then you don't have to worry about racoon.
Unless you care about running this in some other OS (I suspect these
OSes probably have made use of SADB_EXT_ADDRESS_PROXY so that may be a
futile effort in any case).


cheers,
jamal

PS:- Nothing stands out for me in your patch, so I have no comment; I
wasn't sure if the concept of tcp/udp port meant much to the concept of a
security association




Re: [RFC] Ethernet Cheap Cryptography

2006-10-19 Thread Stephen J. Bevan
Pawel Foremski writes:
> First, ccrypt's task is to secure Ethernet, not IP.

Understood, but the vast majority of traffic running over Ethernet
that a user cares about is IP and so IPsec does the job.  Obviously
IPsec cannot handle non-IP traffic but the question is what non-IP
traffic do users want encrypted?


> Secondly, IPsec won't decrease MSS in TCP encapsulated in PPPoE
> traffic, for example.

Various, commercial, IPsec products decrease the MSS for TCP
encapsulated in PPPoE.  I've not checked the Linux 2.6 IPsec code to
see if it does or if it can easily be made to.


> > b) what traffic other than IP traffic needs to be encrypted.
> 
> PPPoE; Ethernet in general.

PPPoE carrying IP can be handled by IPsec as noted above.  That leaves
Ethernet in general.


> > If the keying is done manually an attacker won't know when the keys
> > are changed.  However, if keying is coordinated over the same link via
> > a protocol (as is done with IKE for IPsec) then the attacker can see
> > (or at least guess) the packets carrying the keying protocol and thus know
> > re-keying is going to occur.
> 
> Only if the rekeying traffic is the only traffic being transmitted. IMHO a
> border case.

Unless you mask the size of your (re-)keying traffic by randomly
padding the packets then they can be detected even in the middle of
other traffic.


> > Indeed, in IPsec, the equivalent of ccrypt is ESP and that's rather
> > straightforward.  The complicated part is IKE, the userspace component
> > that handles keying.  It is certainly possible to create something
> > simpler than IKE (e.g. IKEv2 is somewhat simpler) but the devil is in
> > the details.
> 
> Sure, but that's IMHO a little bit off-topic in regard to ccrypt, which is
> just an encryption back-end (eventually the rekeying daemon will sit in
> userspace).

Sure.  However, there has to be a user-kernel API and the question
is whether what you have now is sufficient when a daemon is added or
whether it will need to change.  If it does need to change, will it
need to be backwards compatible, or will it need to be a separate API?

Also at least for IPsec, the kernel knows something about IKE in that
generally IKE traffic is not encrypted by IPsec.  Instead IKE has its
own encryption which it bootstraps using
shared-secrets/certificates/public/private key pairs.  In the case of
ccrypt either the ccryptKE protocol would need to bypass ccrypt or you
need a way to start off with known keys, but not the same keys every
time or that can be exploited.


smallish review of acx1xx-wireless-driver.patch

2006-10-19 Thread Alexey Dobriyan
Main griefs:
a) home-grown lock debugger (what can it do that lockdep can't?)
b) lack of endian annotations.
c) driver falls back to the USA regulatory domain if it can't find a valid one
   in the list. My gut feeling is that this can bring an inattentive Linux
   user to the police on a bad day.

Lesser griefs are scattered over the rest of the email.
-
+/* Locking: */
+/* very talkative */
+/* #define PARANOID_LOCKING 1 */
+/* normal (use when bug-free) */
+#define DO_LOCKING 1
+/* else locking is disabled! */

lock debugging.

+#define CLEAR_BIT(val, mask) ((val) &= ~(mask))
+#define SET_BIT(val, mask) ((val) |= (mask))

silly macros and even misused in code like

CLEAR_BIT(feat.feature_options, cpu_to_le32(feature_options));

Please, open code.
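
For illustration only: open-coded, the same operation is a plain
compound assignment on the little-endian field. The sketch below is
userspace C under stated assumptions; to_le32() is a hypothetical
stand-in for the kernel's cpu_to_le32(), and struct feature_cfg is made
up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace stand-in for cpu_to_le32(): returns a value whose in-memory
 * bytes are the little-endian encoding of v, whatever the host order.
 */
static uint32_t to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff), (uint8_t)((v >> 24) & 0xff)
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical device structure holding a little-endian option word. */
struct feature_cfg {
	uint32_t feature_options;	/* stored in little-endian byte order */
};

/* Clear and set option bits with plain operators, no wrapper macros. */
static void update_features(struct feature_cfg *cfg, uint32_t clear, uint32_t set)
{
	assert(cfg != NULL);
	cfg->feature_options &= ~to_le32(clear);
	cfg->feature_options |= to_le32(set);
}
```

Converting the mask once and applying it to the stored field keeps the
byte order visible at the point of use, which is also where endian
annotations would go.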

+/* These functions *must* be inline or they will break horribly on SPARC, due
+ * to its weird semantics for save/restore flags */
+
+#if defined(PARANOID_LOCKING) /* Lock debugging */
+
+void acx_lock_debug(acx_device_t *adev, const char* where);
+void acx_unlock_debug(acx_device_t *adev, const char* where);
+void acx_down_debug(acx_device_t *adev, const char* where);
+void acx_up_debug(acx_device_t *adev, const char* where);
+void acx_lock_unhold(void);
+void acx_sem_unhold(void);
+
+static inline void
+acx_lock_helper(acx_device_t *adev, unsigned long *fp, const char* where)
+{
+   acx_lock_debug(adev, where);
+   spin_lock_irqsave(&adev->lock, *fp);
+}
+static inline void
+acx_unlock_helper(acx_device_t *adev, unsigned long *fp, const char* where)
+{
+   acx_unlock_debug(adev, where);
+   spin_unlock_irqrestore(&adev->lock, *fp);
+}
+static inline void
+acx_down_helper(acx_device_t *adev, const char* where)
+{
+   acx_down_debug(adev, where);
+}
+static inline void
+acx_up_helper(acx_device_t *adev, const char* where)
+{
+   acx_up_debug(adev, where);
+}
+#define acx_lock(adev, flags)  acx_lock_helper(adev, &(flags), __FILE__ ":" STRING(__LINE__))
+#define acx_unlock(adev, flags)    acx_unlock_helper(adev, &(flags), __FILE__ ":" STRING(__LINE__))
+#define acx_sem_lock(adev) acx_down_helper(adev, __FILE__ ":" STRING(__LINE__))
+#define acx_sem_unlock(adev)   acx_up_helper(adev, __FILE__ ":" STRING(__LINE__))
+
+#elif defined(DO_LOCKING)
+
+#define acx_lock(adev, flags)  spin_lock_irqsave(&adev->lock, flags)
+#define acx_unlock(adev, flags)    spin_unlock_irqrestore(&adev->lock, flags)
+#define acx_sem_lock(adev) down(&adev->sem)
+#define acx_sem_unlock(adev)   up(&adev->sem)
+#define acx_lock_unhold()  ((void)0)
+#define acx_sem_unhold()   ((void)0)
+
+#else /* no locking! :( */
+
+#define acx_lock(adev, flags)  ((void)0)
+#define acx_unlock(adev, flags)((void)0)
+#define acx_sem_lock(adev) ((void)0)
+#define acx_sem_unlock(adev)   ((void)0)
+#define acx_lock_unhold()  ((void)0)
+#define acx_sem_unhold()   ((void)0)
+
+#endif

+enum {
+   L_LOCK  = (ACX_DEBUG>1)*0x0001, /* locking debug log */

lock debugging

+#define ACX_PACKED __WLAN_ATTRIB_PACK__

Just add __packed in kernel.h I guess.

+#define VEC_SIZE(a) (sizeof(a)/sizeof(a[0]))

That would be already existing ARRAY_SIZE()
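
For illustration only: a userspace sketch of what the existing macro
does. ARRAY_SIZE() is re-defined here as a stand-in for the kernel's
definition in linux/kernel.h; rate_table and the helper functions are
made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE(). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative table; the values are made up for this sketch. */
static const int rate_table[] = { 10, 20, 55, 110 };

/* Sum every entry, letting the compiler supply the element count. */
static int rate_table_sum(void)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(rate_table); i++)
		sum += rate_table[i];
	return sum;
}

/* Bounds-checked lookup built on the same macro. */
static int rate_at(size_t i)
{
	assert(i < ARRAY_SIZE(rate_table));
	return rate_table[i];
}
```

The macro only works on true arrays; applied to a pointer it silently
yields the wrong count, which is one more reason to use the shared
definition instead of a private copy.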

+/***
+** Constants
+*/
+#define OK 0
+#define NOT_OK 1

That's not OK.

+#if !defined(CONFIG_ACX_PCI) && !defined(CONFIG_ACX_USB)
+#error Driver must include PCI and/or USB support. You selected neither.
+#endif

Can it be done via some Kconfig magic? I don't know.

+/* An opaque typesafe helper type
+ *
+ * Some hardware fields are actually pointers,
+ * but they have to remain u32, since using ptr instead
+ * (8 bytes on 64bit systems!) would disrupt the fixed descriptor
+ * format the acx firmware expects in the non-user area.
+ * Since we cannot cram an 8 byte ptr into 4 bytes, we need to
+ * enforce that pointed to data remains in low memory
+ * (address value needs to fit in 4 bytes) on 64bit systems.
+ *
+ * This is easy to get wrong, thus we are using a small struct
+ * and special macros to access it. Macros will check for
+ * attempts to overflow an acx_ptr with value > 0xffffffff.
+ *
+ * Attempts to use acx_ptr without macros result in compile-time errors */
+
+typedef struct {
+   u32 v;
+} ACX_PACKED acx_ptr;
+
+#if ACX_DEBUG
+#define CHECK32(n) BUG_ON(sizeof(n)>4 && (long)(n)>0xffffff00)
+#else
+#define CHECK32(n) ((void)0)
+#endif
+
+/* acx_ptr <-> integer conversion */
+#define cpu2acx(n) ({ CHECK32(n); ((acx_ptr){ .v = cpu_to_le32(n) }); })
+#define acx2cpu(a) (le32_to_cpu(a.v))
+
+/* acx_ptr <-> pointer conversion */
+#define ptr2acx(p) ({ CHECK32(p); ((acx_ptr){ .v = cpu_to_le32((u32)(long)(p)) }); })
+#define acx2ptr(a) ((void*)le32_to_cpu(a.v))

Duh!

+struct acx_device {
+   /* most frequent accesses first (dereferencing and cache line!) */
+
+   /*** Locking ***/
+   struct semaphore

Re: [PATCH] bcm43xx: Drain TX status before starting IRQs

2006-10-19 Thread Benjamin Herrenschmidt
On Thu, 2006-10-19 at 17:29 +0200, Michael Buesch wrote:
> Drain the Microcode TX-status-FIFO before we enable IRQs.
> This is required, because the FIFO may still have entries left
> from a previous run. Those would immediately fire after enabling
> IRQs and would lead to an oops in the DMA TXstatus handling code.
> 
> Signed-off-by: Michael Buesch [EMAIL PROTECTED]

Great, thanks. Note that I haven't yet hit the bug since I updated the
firmware, so it could be a mix of firmware and that problem, though it
hasn't been long enough to give a clear result.

In any case, that patch should go in.

Cheers,
Ben.




Re: [PATCH 4/6] net: use bitrev8

2006-10-19 Thread Akinobu Mita
On Thu, Oct 19, 2006 at 01:39:51PM -0700, Andrew Morton wrote:

> A bunch of drivers.
> 
> > ===
> > --- work-fault-inject.orig/drivers/net/Kconfig
> > +++ work-fault-inject/drivers/net/Kconfig
> > @@ -2500,6 +2500,7 @@ config DEFXX
> >  config SKFP
> >     tristate "SysKonnect FDDI PCI support"
> >     depends on FDDI && PCI
> > +   select BITREVERSE
> >     ---help---
> >       Say Y here if you have a SysKonnect FDDI PCI adapter.
> >       The following adapters are supported by this driver:
> 
> But only one of them selects the library.

Other drivers already select CRC32 and CRC32 selects BITREVERSE.

> But select is problematic and I do wonder whether it'd be simpler to just
> link the thing into vmlinux.

OK. I'll try.



Re: [RFC] Ethernet Cheap Cryptography

2006-10-19 Thread David Miller
From: [EMAIL PROTECTED] (Stephen J. Bevan)
Date: Thu, 19 Oct 2006 19:18:41 -0700

> Pawel Foremski writes:
> > Secondly, IPsec won't decrease MSS in TCP encapsulated in PPPoE
> > traffic, for example.
> 
> Various, commercial, IPsec products decrease the MSS for TCP
> encapsulated in PPPoE.  I've not checked the Linux 2.6 IPsec code to
> see if it does or if it can easily be made to.

Linux will for local TCP connections over IPSEC transports, since it
knows the path MTU; for IPSEC gateways the source system will adjust
the MSS after it notes via path-MTU what the decreased MTU is.

I think this is just a big list of excuses for not using IPSEC as the
solution for whatever problem is trying to be solved.


Re: [PATCH 0/4][CRYPTO][IPsec] support XCBC

2006-10-19 Thread Herbert Xu
On Wed, Oct 18, 2006 at 11:25:55AM +0900, Kazunori MIYAZAWA wrote:
> 
> I send patches to support XCBC mode of IPsec.
> This patch is for linux-2.6.19-rc2. I checked to also apply
> this to net-2.6.

Thanks a lot for the patches Miyazawa-san.  I'll put them into
cryptodev-2.6 for 2.6.20.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: revert mv643xx change from ubuntu tree

2006-10-19 Thread Ben Collins
On Thu, 2006-10-19 at 14:18 +0200, Olaf Hering wrote:
> Somehow the Ubuntu guys managed to sneak this compile error into the
> tree:
> 
> commit ce9e3d9953c8cb67001719b5516da2928e956be4
> 
>   [mv643xx] Add pci device table for auto module loading.
> 
> drivers/net/mv643xx_eth.c:1560: error: array type has incomplete element type
> drivers/net/mv643xx_eth.c:1561: warning: implicit declaration of function ‘PCI_DEVICE’
> drivers/net/mv643xx_eth.c:1561: error: ‘PCI_VENDOR_ID_MARVELL’ undeclared here (not in a function)
> drivers/net/mv643xx_eth.c:1561: error: ‘PCI_DEVICE_ID_MARVELL_MV64360’ undeclared here (not in a function)

Correct, I missed the include for linux/pci.h.

This patch has been trailing our tree since 2.6.12. Could you help me to
understand what in this driver will cause it to be autoloaded by udev
when compiled as a module?


Re: revert mv643xx change from ubuntu tree

2006-10-19 Thread Olaf Hering
On Fri, Oct 20, Ben Collins wrote:

> On Thu, 2006-10-19 at 14:18 +0200, Olaf Hering wrote:
> > Somehow the Ubuntu guys managed to sneak this compile error into the
> > tree:
> > 
> > commit ce9e3d9953c8cb67001719b5516da2928e956be4
> > 
> >   [mv643xx] Add pci device table for auto module loading.
> > 
> > drivers/net/mv643xx_eth.c:1560: error: array type has incomplete element type
> > drivers/net/mv643xx_eth.c:1561: warning: implicit declaration of function ‘PCI_DEVICE’
> > drivers/net/mv643xx_eth.c:1561: error: ‘PCI_VENDOR_ID_MARVELL’ undeclared here (not in a function)
> > drivers/net/mv643xx_eth.c:1561: error: ‘PCI_DEVICE_ID_MARVELL_MV64360’ undeclared here (not in a function)
> 
> Correct, I missed the include for linux/pci.h.
> 
> This patch has been trailing our tree since 2.6.12. Could you help me to
> understand what in this driver will cause it to be autoloaded by udev
> when compiled as a module?

See commit ce9e3d9953c8cb67001719b5516da2928e956be4, platform devices
now have a modalias entry in sysfs. The network card is not a PCI
device.