Re: [PATCH 3/4] [NETLINK]: Dont set socket error for failed event notifications

2006-08-11 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 21:23:23 +0200

 Dave, please revert the whole patchset.

Done and I've rebased the net-2.6.19 tree as well at:

kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.19.git

Thomas, there was a conflict which I did expect since I added
all the 2.6.18 bug fixes into the tree.  The conflict is for
your netlink conversion of the iflink stuff.

I had to fix the -stable tree IFLA_ADDRESS handling because
dev->set_mac_address expects a sockaddr not a raw MAC
address.
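
As a rough illustration of the sockaddr point (a userspace sketch, not the actual -stable fix): a raw MAC has to be wrapped in a struct sockaddr before it can be handed to a handler with set_mac_address-style semantics. The helper name mac_to_sockaddr() and the constant MAC_LEN_SKETCH are made up for this example.

```c
#include <string.h>
#include <sys/socket.h>

#define MAC_LEN_SKETCH 6        /* hypothetical name: bytes in an Ethernet MAC */

/*
 * Wrap a raw MAC address in a struct sockaddr.  The hardware type goes
 * into sa_family and the address bytes go into sa_data; the rest of the
 * structure is zeroed.
 */
static void mac_to_sockaddr(const unsigned char *mac, unsigned short hwtype,
                            struct sockaddr *sa)
{
        memset(sa, 0, sizeof(*sa));
        sa->sa_family = hwtype;
        memcpy(sa->sa_data, mac, MAC_LEN_SKETCH);
}
```

Passing the six raw bytes directly would make the callee read the first two of them as sa_family, which is the kind of mismatch described above.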

I think I resolved the conflict while applying the patch
properly, but if you could take a look I'd appreciate it.

Thanks.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET 01/06]: Use u32 for routing table IDs

2006-08-11 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 21:29:59 +0200 (MEST)

 [NET]: Use u32 for routing table IDs
 
 Use u32 for routing table IDs in net/ipv4 and net/decnet in preparation of
 support for a larger number of routing tables. net/ipv6 already uses u32
 everywhere and needs no further changes. No functional changes are made by
 this patch.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

Applied.


Re: [PATCH] neighbor: use ALIGN() macro

2006-08-11 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 11:34:03 -0700

 Rather than opencoding the mask, it looks better to use ALIGN()
 macro from kernel.h.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Applied, thanks Stephen.
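
For readers who have not seen the idiom, here is a small sketch of an open-coded mask next to a macro of the same shape as the kernel's ALIGN(). The 16-byte alignment and the names ALIGN_SKETCH, align_open_coded() and align_with_macro() are assumptions for illustration; the patch itself is not quoted in this thread.

```c
#include <stddef.h>

/*
 * Userspace stand-in with the same shape as the kernel's ALIGN() for
 * power-of-two alignments: round x up to the next multiple of a.
 */
#define ALIGN_SKETCH(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Open-coded variant: the magic numbers hide the intent. */
static size_t align_open_coded(size_t len)
{
        return (len + 15) & ~(size_t)15;
}

/* Same computation, but the alignment is spelled out once. */
static size_t align_with_macro(size_t len)
{
        return ALIGN_SKETCH(len, 16);
}
```

Both functions return the same value for every input; the macro form only makes the intended alignment visible.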


Re: [NET 02/06]: Introduce RTA_TABLE/FRA_TABLE attributes

2006-08-11 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 21:30:00 +0200 (MEST)

 [NET]: Introduce RTA_TABLE/FRA_TABLE attributes
 
 Introduce RTA_TABLE route attribute and FRA_TABLE routing rule attribute
 to hold 32 bit routing table IDs. Userspace compatibility is provided by
 continuing to accept and send the rtm_table field, but because of its
 limited size it can only carry the low 8 bits of the table ID. This
 implies that if larger IDs are used, _all_ userspace programs using them
 need to use RTA_TABLE.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

Applied, thanks Patrick.
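
A simplified model of the compatibility rule described in the patch text, assuming a flattened message structure rather than real netlink attributes. The type route_msg_sketch and the helpers encode_table() and decode_table() are hypothetical; only the names rtm_table and RTA_TABLE come from the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of a route message. */
struct route_msg_sketch {
        uint8_t  rtm_table;       /* legacy field: only 8 bits wide */
        bool     has_rta_table;   /* was an RTA_TABLE attribute present? */
        uint32_t rta_table;       /* full 32-bit table ID */
};

/* Sender side: always fill both, the legacy field gets the low 8 bits. */
static struct route_msg_sketch encode_table(uint32_t id)
{
        struct route_msg_sketch m;

        m.rtm_table = (uint8_t)(id & 0xff);
        m.has_rta_table = true;
        m.rta_table = id;
        return m;
}

/* Receiver side: prefer the attribute, fall back to the legacy field. */
static uint32_t decode_table(const struct route_msg_sketch *m)
{
        return m->has_rta_table ? m->rta_table : m->rtm_table;
}
```

A program that reads only rtm_table sees table 1000 as 232, which is why every program that uses large IDs has to look at RTA_TABLE.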


Re: [NET 06/06]: Increate RT_TABLE_MAX to 2^32

2006-08-11 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 21:30:06 +0200 (MEST)

 [NET]: Increate RT_TABLE_MAX to 2^32
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

2^32-1 :-)

Applied, thanks Patrick.


Re: [DECNET 05/06]: Increase number of possible routing tables to 2^32

2006-08-11 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 21:30:04 +0200 (MEST)

 [DECNET]: Increase number of possible routing tables to 2^32
 
 Increase the number of possible routing tables to 2^32 by replacing the
 fixed sized array of pointers by a hash table and replacing iterations
 over all possible table IDs by hash table walking.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

Applied.
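
The idea of replacing a fixed array of table pointers with a hash table can be sketched in userspace as follows. This is not the DECnet code; the bucket count, the structure rt_table_sketch and the helpers table_get() and table_count() are assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define TABLE_HASH_SIZE 256     /* assumed bucket count, power of two */

struct rt_table_sketch {
        uint32_t id;
        struct rt_table_sketch *next;   /* chain within one bucket */
};

static struct rt_table_sketch *table_hash[TABLE_HASH_SIZE];

/* Look a table up by its 32-bit ID, optionally creating it. */
static struct rt_table_sketch *table_get(uint32_t id, int create)
{
        unsigned int h = id & (TABLE_HASH_SIZE - 1);
        struct rt_table_sketch *t;

        for (t = table_hash[h]; t != NULL; t = t->next)
                if (t->id == id)
                        return t;
        if (!create)
                return NULL;

        t = calloc(1, sizeof(*t));
        if (t == NULL)
                return NULL;
        t->id = id;
        t->next = table_hash[h];
        table_hash[h] = t;
        return t;
}

/* Walk the hash instead of iterating over every possible table ID. */
static unsigned int table_count(void)
{
        const struct rt_table_sketch *t;
        unsigned int h, n = 0;

        for (h = 0; h < TABLE_HASH_SIZE; h++)
                for (t = table_hash[h]; t != NULL; t = t->next)
                        n++;
        return n;
}
```

With 2^32 possible IDs an array indexed by ID is out of the question, while the walk above costs only the bucket count plus the number of tables that actually exist.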


Re: [NET 00/06]: Increase number of possible routing tables

2006-08-11 Thread Michael Tokarev
Patrick McHardy wrote:
 These are the updated patches (against net-2.6.19) to increase the number
 of possible routing tables to 2^32. They basically consist of four parts:
 
 - Use u32 for routing table IDs everywhere inside the kernel

Just out of curiosity: why isn't the current limit of 2^31 sufficient?
Or am I missing the point?

Thanks.

/mjt


Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread Evgeniy Polyakov
On Thu, Aug 10, 2006 at 05:56:39PM -0700, Andrew Morton ([EMAIL PROTECTED]) 
wrote:
  Per kevent fd.
  I have some ideas about a better mmap ring implementation, which would
  dynamically grow its buffer when events are added and reuse the same
  place for next events, but there are some nitpicks unresolved yet.
  Let's not look there in the next releases (no merge, of course) until a
  better solution is ready. I will change that area when other things are ready.
 
 This is not a problem with the mmap interface per-se.  If the proposed
 event code permits each user to pin 160MB of kernel memory then that would
 be a serious problem.

The main disadvantage is that all memory is allocated at the start even
if it will not be used later. I think dynamic growth is an appropriate
solution: the user will have that memory in use anyway once kevents
are allocated; only part of them would be allocated from possibly
mmapped memory.

-- 
Evgeniy Polyakov


Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread Andrew Morton
On Fri, 11 Aug 2006 10:15:35 +0400
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 On Thu, Aug 10, 2006 at 05:56:39PM -0700, Andrew Morton ([EMAIL PROTECTED]) 
 wrote:
   Per kevent fd.
   I have some ideas about a better mmap ring implementation, which would
   dynamically grow its buffer when events are added and reuse the same
   place for next events, but there are some nitpicks unresolved yet.
   Let's not look there in the next releases (no merge, of course) until a
   better solution is ready. I will change that area when other things are ready.
  
  This is not a problem with the mmap interface per-se.  If the proposed
  event code permits each user to pin 160MB of kernel memory then that would
  be a serious problem.
 
 The main disadvantage is that all memory is allocated at the start even
 if it will not be used later. I think dynamic growth is an appropriate
 solution: the user will have that memory in use anyway once kevents
 are allocated; only part of them would be allocated from possibly
 mmapped memory.

But the worst-case remains the same, doesn't it?  160MB of pinned kernel
memory per user?



Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread Evgeniy Polyakov
On Thu, Aug 10, 2006 at 11:23:40PM -0700, Andrew Morton ([EMAIL PROTECTED]) 
wrote:
 On Fri, 11 Aug 2006 10:15:35 +0400
 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
  On Thu, Aug 10, 2006 at 05:56:39PM -0700, Andrew Morton ([EMAIL PROTECTED]) 
  wrote:
    Per kevent fd.
    I have some ideas about a better mmap ring implementation, which would
    dynamically grow its buffer when events are added and reuse the same
    place for next events, but there are some nitpicks unresolved yet.
    Let's not look there in the next releases (no merge, of course) until a
    better solution is ready. I will change that area when other things are ready.
   
   This is not a problem with the mmap interface per-se.  If the proposed
   event code permits each user to pin 160MB of kernel memory then that would
   be a serious problem.
  
  The main disadvantage is that all memory is allocated at the start even
  if it will not be used later. I think dynamic growth is an appropriate
  solution: the user will have that memory in use anyway once kevents
  are allocated; only part of them would be allocated from possibly
  mmapped memory.
 
 But the worst-case remains the same, doesn't it?  160MB of pinned kernel
 memory per user?

Yes. And now I think dynamic growing is not a good solution, since the
user cannot know when to call mmap() again to get additional pages
(although I have some hacks to dynamically replace previously mmapped
pages with new ones).

This area can be reduced to 70MB by reducing the amount of
information placed into the buffer (only the user's data and flags) without
additional hints.

-- 
Evgeniy Polyakov


Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread Ulrich Drepper
Evgeniy Polyakov wrote:
 The main disadvantage is that all memory is allocated at the start even
 if it will not be used later. I think dynamic growth is an appropriate
 solution: the user will have that memory in use anyway once kevents
 are allocated,

If you _allocate_ memory at startup you're doing something wrong.  All
you should do is allocate address space.  Memory should be allocated
when it is needed.

Growing a memory region is always hard because it means you cannot keep
any addresses around and always have to reload a base pointer.  That's
not ideal.

Especially on 64-bit machines address space really is no limitation
anymore.  So, allocate as much as needed, allocate memory when it's
needed, and don't resize.
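
A userspace sketch of the "reserve address space, allocate memory on demand" approach, assuming Linux and an anonymous private mapping. The helper names reserve_region() and release_region() are made up; mmap(), munmap() and the MAP_NORESERVE flag are the real interfaces. This only illustrates the principle and is not the kevent ring itself.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS / MAP_NORESERVE */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Reserve len bytes of address space.  No physical pages are committed
 * here; the kernel supplies a page the first time each one is touched.
 */
static void *reserve_region(size_t len)
{
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

        return p == MAP_FAILED ? NULL : p;
}

/* Give the whole region back. */
static int release_region(void *p, size_t len)
{
        return munmap(p, len);
}
```

Because the region never moves or grows, a pointer into it stays valid for its whole lifetime, so no base pointer has to be reloaded.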

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread David Miller
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 10:33:53 +0400

 That requires mmap hacks to substitute pages in run-time without user
 notifications. I do not expect it is a good solution, since on x86 it
 requires full TLB flush (at least when I did it there were no exported
 methods to flush separate addresses).

You just need to provide a do_no_page method, the VM layer will
take care of the page level flushing or whatever else might be
needed.



Re: [PATCH 0/6] htb: cleanup

2006-08-11 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 14:36:02 +0200

 * David Miller [EMAIL PROTECTED] 2006-08-02 15:18
  From: Stephen Hemminger [EMAIL PROTECTED]
  Date: Wed, 2 Aug 2006 12:56:36 -0700
  
   The HTB scheduler code is a mess, this patch set does some basic
   house cleaning.  The first four should cause no code change, but the
   last two need more testing.
  
  These patches look fine to me.  Once everyone thinks they
  are ready just let me know and I'll push them into net-2.6.19.
 
 I think they are ready.

Thanks for taking a look.

I've merged all 6 of Stephen's HTB patches into net-2.6.19.


Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-11 Thread David Miller
From: Krzysztof Oledzki [EMAIL PROTECTED]
Date: Thu, 10 Aug 2006 20:18:23 +0200 (CEST)

 OK, this patch really solves the bug from my report. Is there any chance
 of a similar fix in net-2.6.19.git?

I'm still thinking about this patch and what various people have
explained about the situation.


Re: [NET 00/06]: Increase number of possible routing tables

2006-08-11 Thread David Miller
From: Michael Tokarev [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 10:39:19 +0400

 Patrick McHardy wrote:
  These are the updated patches (against net-2.6.19) to increase the number
  of possible routing tables to 2^32. They basically consist of four parts:
  
  - Use u32 for routing table IDs everywhere inside the kernel
 
  Just out of curiosity: why isn't the current limit of 2^31 sufficient?
 Or am I missing the point?

The current limit is 256 because the table member of the struct
used to configure them is an 8-bit quantity.

That's the whole purpose of Patrick's patch set, to provide a new
optional attribute that allows specifying a 32-bit rather than
the 8-bit table ID.


Re: [NET 00/06]: Increase number of possible routing tables

2006-08-11 Thread Michael Tokarev
David Miller wrote:
 From: Michael Tokarev [EMAIL PROTECTED]
[]
 - Use u32 for routing table IDs everywhere inside the kernel
 Just out of curiosity: why isn't the current limit of 2^31 sufficient?
 Or am I missing the point?
 
 The current limit is 256 because the table member of the struct
 used to configure them is an 8-bit quantity.
 
 That's the whole purpose of Patrick's patch set, to provide a new
 optional attribute that allows specifying a 32-bit rather than
 the 8-bit table ID.

Aha, it was 256, not 2^31.  I remember now.

So the question probably should have been: why u32 and an additional
attribute (to represent the former -1) instead of the current int?  I mean,
it probably makes no difference whether there are 2^32 or 2^31 tables
(both values are pretty large), but 2^32 requires more changes to the
existing code.

And while we're at it...  How about using table *names* instead of
numbers in the kernel too, a la iptables?  Once the possible number of tables
is large, and we're using hashes for tables now anyway, keeping a
name inside the table structure won't hurt ;)

/mjt


Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread Evgeniy Polyakov
On Thu, Aug 10, 2006 at 11:38:26PM -0700, David Miller ([EMAIL PROTECTED]) 
wrote:
 From: Evgeniy Polyakov [EMAIL PROTECTED]
 Date: Fri, 11 Aug 2006 10:33:53 +0400
 
  That requires mmap hacks to substitute pages in run-time without user
  notifications. I do not expect it is a good solution, since on x86 it
  requires full TLB flush (at least when I did it there were no exported
  methods to flush separate addresses).
 
 You just need to provide a do_no_page method, the VM layer will
 take care of the page level flushing or whatever else might be
 needed.

Yes, it is the simplest way to extend a mapping, though not to replace
pages which are already mapped. But such hacks are not needed for kevent,
which only expects to extend the mapping when the number of ready kevents
increases.

So I will create such an implementation and will place a reduced amount of
info into those pages.

-- 
Evgeniy Polyakov


Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread Andrew Morton
On Fri, 11 Aug 2006 10:30:21 +0400
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 On Thu, Aug 10, 2006 at 11:23:40PM -0700, Andrew Morton ([EMAIL PROTECTED]) 
 wrote:
  On Fri, 11 Aug 2006 10:15:35 +0400
  Evgeniy Polyakov [EMAIL PROTECTED] wrote:
  
   On Thu, Aug 10, 2006 at 05:56:39PM -0700, Andrew Morton ([EMAIL PROTECTED]) wrote:
     Per kevent fd.
     I have some ideas about a better mmap ring implementation, which would
     dynamically grow its buffer when events are added and reuse the same
     place for next events, but there are some nitpicks unresolved yet.
     Let's not look there in the next releases (no merge, of course) until a
     better solution is ready. I will change that area when other things are
     ready.
    
    This is not a problem with the mmap interface per-se.  If the proposed
    event code permits each user to pin 160MB of kernel memory then that
    would be a serious problem.
   
   The main disadvantage is that all memory is allocated at the start even
   if it will not be used later. I think dynamic growth is an appropriate
   solution: the user will have that memory in use anyway once kevents
   are allocated; only part of them would be allocated from possibly
   mmapped memory.
  
  But the worst-case remains the same, doesn't it?  160MB of pinned kernel
  memory per user?
 
 Yes. And now I think dynamic growing is not a good solution, since the
 user cannot know when to call mmap() again to get additional pages
 (although I have some hacks to dynamically replace previously mmapped
 pages with new ones).
 
 This area can be reduced to 70MB by reducing the amount of
 information placed into the buffer (only the user's data and flags) without
 additional hints.
 

70MB is still very bad, naturally.

There are other ways in which users can do this sort of thing - passing
fd's across sockets, allocating zillions of pagetables come to mind.  But
we don't want to add more.

Possible options:

- Add a new rlimit for the number of kevent fd's

- Add a new rlimit for the amount of kevent memory

- Add a new rlimit for the total amount of pinned kernel memory.  First
  user is kevent.

- Account a kevent fd as being worth 100 regular fds, so the naughty user
  hits EMFILE early (ug).

A new rlimit is attractive, and they're easy to add.  Problem is, userspace
support is hard (I think).  afaik a standard Linux system doesn't have
global and per-user rlimit config files which are parsed and acted upon at
login.  That would make rlimits more useful.
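
For reference, this is roughly what using an existing rlimit looks like from userspace. The example uses RLIMIT_NOFILE, which exists today; the kevent-specific and pinned-memory rlimits discussed above are only proposals in this thread and have no constant to show. The helper names are made up.

```c
#include <sys/resource.h>

/* Read the current soft limit on open file descriptors. */
static rlim_t current_soft_nofile(void)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return 0;
        return rl.rlim_cur;
}

/*
 * Lower (or raise, up to the hard limit) the soft limit.  An
 * unprivileged process cannot go above rlim_max, so clamp to it.
 */
static int set_soft_nofile(rlim_t new_soft)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;
        if (new_soft > rl.rlim_max)
                new_soft = rl.rlim_max;
        rl.rlim_cur = new_soft;
        return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Once the soft limit is reached, further descriptor allocations in the process fail with EMFILE, which is the behaviour the "worth 100 regular fds" option above would lean on.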


Re: [PATCH 3/6] ehea: queue management

2006-08-11 Thread Thomas Klein

Michael Neuling wrote:

    +static inline u32 map_swqe_size(u8 swqe_enc_size)
    +{
    +   return 128 << swqe_enc_size;
    +}
    +
    +static inline u32 map_rwqe_size(u8 rwqe_enc_size)
    +{
    +   return 128 << rwqe_enc_size;
    +}

   Snap!  These are identical...

  No, they aren't.
  [ASCII-art arrows, garbled in the archive, apparently pointing at the
  letter that differs between the two function names]

 Functionally identical.

 Mikey

Agreed. Functions were replaced by a single map_wqe_size() function.

Thomas
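
A sketch of what the merged helper presumably looks like, assuming the operator lost from the quoted code in this archive is a left shift. The function name map_wqe_size() is the one Thomas mentions; the body and the userspace types are a guess for illustration.

```c
#include <stdint.h>

/*
 * One helper for both send and receive work queue elements: the encoded
 * size is a power-of-two exponent applied to a 128-byte base.
 */
static inline uint32_t map_wqe_size(uint8_t wqe_enc_size)
{
        return (uint32_t)128 << wqe_enc_size;
}
```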



Re: [take6 1/3] kevent: Core files.

2006-08-11 Thread Evgeniy Polyakov
On Fri, Aug 11, 2006 at 12:04:54AM -0700, Andrew Morton ([EMAIL PROTECTED]) 
wrote:
  This area can be decreased down to 70mb by reducing amount of
  information placed into the buffer (only user's data and flags) without
  additional hints.
  
 
 70MB is still very bad, naturally.

Actually I do not think that 4k events is a good choice. I expect people
will scale it to tens of thousands at least, so we definitely do not want to
allow a user to create too many kevent fds.

 There are other ways in which users can do this sort of thing - passing
 fd's across sockets, allocating zillions of pagetables come to mind.  But
 we don't want to add more.
 
 Possible options:
 
 - Add a new rlimit for the number of kevent fd's
 
 - Add a new rlimit for the amount of kevent memory
 
 - Add a new rlimit for the total amount of pinned kernel memory.  First
   user is kevent.

I think this rlimit and the first one are the best choices.

 - Account a kevent fd as being worth 100 regular fds, so the naughty user
   hits EMFILE early (ug).
 
 A new rlimit is attractive, and they're easy to add.  Problem is, userspace
 support is hard (I think).  afaik a standard Linux system doesn't have
 global and per-user rlimit config files which are parsed and acted upon at
 login.  That would make rlimits more useful.

For now it is possible to use the stack size rlimit, for example.

-- 
Evgeniy Polyakov


Re: [NET 02/06]: Introduce RTA_TABLE/FRA_TABLE attributes

2006-08-11 Thread David Miller
From: Michael Tokarev [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 11:02:43 +0400

 Patrick McHardy wrote:
  --- a/include/linux/rtnetlink.h
  +++ b/include/linux/rtnetlink.h
 
  +static inline u32 rtm_get_table(struct rtattr **rta, u8 table)
  +{
  +   return RTA_GET_U32(rta[RTA_TABLE-1]);
  +rtattr_failure:
  +   return table;
  +}
  +
 
 What's that ?  ;)

The RTA_GET_U32 macro internally branches to rtattr_failure.
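
To make the hidden jump visible, here is a self-contained imitation of the pattern. GET_U32_SKETCH and struct attr_sketch are simplified stand-ins invented for this example, not the real RTA_GET_U32 or struct rtattr; the sketch relies on GCC statement expressions, so it needs GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified attribute: payload length plus a value. */
struct attr_sketch {
        uint16_t len;
        uint32_t value;
};

/*
 * Yields the attribute value, or jumps to a label named rtattr_failure
 * that the *calling* function must provide.
 */
#define GET_U32_SKETCH(attr)                                         \
({      if (!(attr) || (attr)->len < sizeof(uint32_t))               \
                goto rtattr_failure;                                 \
        (attr)->value; })

static uint32_t get_table(const struct attr_sketch *attr, uint8_t table)
{
        return GET_U32_SKETCH(attr);
rtattr_failure:
        /* Reached only through the goto hidden inside the macro. */
        return table;
}
```

Read without the macro body in view, the label looks like dead code after a return, which is what prompted the question.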


[RFC] d80211 LED handling

2006-08-11 Thread Johannes Berg
Attached (sorry, can't seem to figure out how to convince thunderbird to 
not mangle things under windows) is a patch to make d80211 have some LED 
triggers.


However, I'm not sure where I should call the _init and _exit functions 
and if I really should be using the local struct. It seems I shouldn't 
be and should be using the master interface directly or something.


Anyway, food for thought, and comments are welcome.

johannes
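
The transmit LED logic that the patch below moves behind ieee80211_led_tx() is a small reference count: switch on when the first frame is queued, switch off when the last one completes. A userspace model, assuming the operator lost from the archived "tx_led_counter  (tx_led_counter-- == 1)" test is a logical AND; struct led_state_sketch and led_tx() are invented names:

```c
/* Hypothetical stand-in for the LED-related part of the local struct. */
struct led_state_sketch {
        int tx_led_counter;     /* frames queued but not yet completed */
        int led_on;             /* what the LED trigger would show */
};

/* queued != 0: a frame was handed to the hardware; 0: tx status came back. */
static void led_tx(struct led_state_sketch *s, int queued)
{
        if (queued) {
                /* First outstanding frame turns the LED on. */
                if (s->tx_led_counter++ == 0)
                        s->led_on = 1;
        } else {
                /* Last outstanding frame turns it off; never go below 0. */
                if (s->tx_led_counter && (s->tx_led_counter-- == 1))
                        s->led_on = 0;
        }
}
```
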
--- wireless-dev.orig/include/net/d80211.h  2006-08-10 20:02:20.159652863 +0200
+++ wireless-dev/include/net/d80211.h   2006-08-10 20:02:22.439652863 +0200
@@ -884,16 +884,6 @@ enum {
        IEEE80211_TEST_PARAM_TX_ANT_SEL_RAW = 5,
 };
 
-/* ieee80211_tx_led called with state == 1 when the first frame is queued
- *   with state == 0 when the last frame is transmitted and tx queue is empty
- */
-void ieee80211_tx_led(int state, struct net_device *dev);
-/* ieee80211_rx_led is called each time frame is received, state is not used
- * (== 2)
- */
-void ieee80211_rx_led(int state, struct net_device *dev);
-
-
 /* IEEE 802.11 defines */
 
 #define FCS_LEN 4
--- wireless-dev.orig/net/d80211/Kconfig    2006-08-10 20:02:20.199652863 +0200
+++ wireless-dev/net/d80211/Kconfig 2006-08-10 20:02:22.439652863 +0200
@@ -7,6 +7,13 @@ config D80211
        This option enables the hardware independent IEEE 802.11
        networking stack.
 
+config D80211_LEDS
+       bool "Enable LED triggers"
+       select LEDS_TRIGGERS
+       ---help---
+       This option enables a few LED triggers for different
+       packet receive/transmit events.
+
 config D80211_DEBUG
        bool "Enable debugging output"
        depends on D80211
--- wireless-dev.orig/net/d80211/ieee80211.c    2006-08-10 20:02:20.279652863 +0200
+++ wireless-dev/net/d80211/ieee80211.c 2006-08-10 20:02:22.449652863 +0200
@@ -31,7 +31,7 @@
 #include "tkip.h"
 #include "wme.h"
 #include "aes_ccm.h"
-
+#include "ieee80211_led.h"
 
 /* See IEEE 802.1H for LLC/SNAP encapsulation/decapsulation */
 /* Ethernet-II snap header (RFC1042 for most EtherTypes) */
@@ -1181,11 +1181,7 @@ static int __ieee80211_tx(struct ieee802
                ret = local->hw->tx(local->mdev, skb, control);
                if (ret)
                        return IEEE80211_TX_AGAIN;
-#ifdef IEEE80211_LEDS
-               if (local->tx_led_counter++ == 0) {
-                       ieee80211_tx_led(1, local->mdev);
-               }
-#endif /* IEEE80211_LEDS */
+               ieee80211_led_tx(local, 1);
        }
        if (tx->u.tx.extra_frag) {
                control->use_rts_cts = 0;
@@ -1210,11 +1206,7 @@ static int __ieee80211_tx(struct ieee802
                                            control);
                        if (ret)
                                return IEEE80211_TX_FRAG_AGAIN;
-#ifdef IEEE80211_LEDS
-                       if (local->tx_led_counter++ == 0) {
-                               ieee80211_tx_led(1, local->mdev);
-                       }
-#endif /* IEEE80211_LEDS */
+                       ieee80211_led_tx(local, 1);
                        tx->u.tx.extra_frag[i] = NULL;
                }
                kfree(tx->u.tx.extra_frag);
@@ -2998,10 +2990,8 @@ ieee80211_rx_h_defragment(struct ieee802
        rx->sta->rx_packets++;
        if (is_multicast_ether_addr(hdr->addr1))
                rx->local->dot11MulticastReceivedFrameCount++;
-#ifdef IEEE80211_LEDS
        else
-               ieee80211_rx_led(2, rx->dev);
-#endif /* IEEE80211_LEDS */
+               ieee80211_led_rx(rx->local);
        return TXRX_CONTINUE;
 }
 
@@ -4104,11 +4094,8 @@ void ieee80211_tx_status(struct net_devi
                rate_control_tx_status(dev, skb, status);
        }
 
-#ifdef IEEE80211_LEDS
-        if (local->tx_led_counter && (local->tx_led_counter-- == 1)) {
-                ieee80211_tx_led(0, dev);
-        }
-#endif /* IEEE80211_LEDS */
+       ieee80211_led_tx(local, 0);
+
         /* SNMP counters
         * Fragments are passed to low-level drivers as separate skbs, so these
         * are actually fragments, not frames. Update frame counters only for
--- wireless-dev.orig/net/d80211/ieee80211_dev.c    2006-08-10 20:02:20.309652863 +0200
+++ wireless-dev/net/d80211/ieee80211_dev.c 2006-08-10 21:54:44.300216256 +0200
@@ -13,6 +13,7 @@
 #include <linux/netdevice.h>
 #include <net/d80211.h>
 #include "ieee80211_i.h"
+#include "ieee80211_led.h"
 
 struct ieee80211_dev_list {
        struct list_head list;
--- wireless-dev.orig/net/d80211/ieee80211_i.h  2006-08-10 20:02:20.399652863 +0200
+++ wireless-dev/net/d80211/ieee80211_i.h   2006-08-10 20:03:22.939652863 +0200
@@ -460,7 +460,10 @@ struct ieee80211_local {
         u32 dot11TransmittedFrameCount;
         u32 dot11WEPUndecryptableCount;
 
-        int tx_led_counter;
+#ifdef CONFIG_D80211_LEDS
+       int tx_led_counter, rx_led_counter;
+       struct led_trigger *tx_led, *rx_led;
+#endif
 
        u32 channel_use;
        u32 channel_use_raw;
--- wireless-dev.orig/net/d80211/ieee80211_led.c

[take8 1/2] kevent: Core files.

2006-08-11 Thread Evgeniy Polyakov

Core files.

This patch includes core kevent files:
 - userspace controlling
 - kernelspace interfaces
 - initialization
 - notification state machines

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index dd63d47..091ff42 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -317,3 +317,5 @@ ENTRY(sys_call_table)
.long sys_tee   /* 315 */
.long sys_vmsplice
.long sys_move_pages
+   .long sys_kevent_get_events
+   .long sys_kevent_ctl
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 5d4a7d1..b2af4a8 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -713,4 +713,6 @@ #endif
.quad sys_tee
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
+   .quad sys_kevent_get_events
+   .quad sys_kevent_ctl
 ia32_syscall_end:  
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index fc1c8dd..c9dde13 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -323,10 +323,12 @@ #define __NR_sync_file_range  314
 #define __NR_tee               315
 #define __NR_vmsplice          316
 #define __NR_move_pages        317
+#define __NR_kevent_get_events 318
+#define __NR_kevent_ctl        319
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 318
+#define NR_syscalls 320
 
 /*
  * user-visible error numbers are in the range -1 - -128: see
diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 94387c9..61363e0 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -619,10 +619,14 @@ #define __NR_vmsplice 278
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages        279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_kevent_get_events 280
+__SYSCALL(__NR_kevent_get_events, sys_kevent_get_events)
+#define __NR_kevent_ctl        281
+__SYSCALL(__NR_kevent_ctl, sys_kevent_ctl)
 
 #ifdef __KERNEL__
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_kevent_ctl
 
 #ifndef __NO_STUBS
 
diff --git a/include/linux/kevent.h b/include/linux/kevent.h
new file mode 100644
index 000..64ef706
--- /dev/null
+++ b/include/linux/kevent.h
@@ -0,0 +1,309 @@
+/*
+ * kevent.h
+ * 
+ * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __KEVENT_H
+#define __KEVENT_H
+
+/*
+ * Kevent request flags.
+ */
+
+#define KEVENT_REQ_ONESHOT     0x1     /* Process this event only once and then dequeue. */
+
+/*
+ * Kevent return flags.
+ */
+#define KEVENT_RET_BROKEN      0x1     /* Kevent is broken. */
+#define KEVENT_RET_DONE        0x2     /* Kevent processing was finished successfully. */
+
+/*
+ * Kevent type set.
+ */
+#define KEVENT_SOCKET          0
+#define KEVENT_INODE           1
+#define KEVENT_TIMER           2
+#define KEVENT_POLL            3
+#define KEVENT_NAIO            4
+#define KEVENT_AIO             5
+#define KEVENT_MAX             6
+
+/*
+ * Per-type event sets.
+ * Number of per-event sets should be exactly as number of kevent types.
+ */
+
+/*
+ * Timer events.
+ */
+#define KEVENT_TIMER_FIRED     0x1
+
+/*
+ * Socket/network asynchronous IO events.
+ */
+#define KEVENT_SOCKET_RECV     0x1
+#define KEVENT_SOCKET_ACCEPT   0x2
+#define KEVENT_SOCKET_SEND     0x4
+
+/*
+ * Inode events.
+ */
+#define KEVENT_INODE_CREATE    0x1
+#define KEVENT_INODE_REMOVE    0x2
+
+/*
+ * Poll events.
+ */
+#define KEVENT_POLL_POLLIN     0x0001
+#define KEVENT_POLL_POLLPRI    0x0002
+#define KEVENT_POLL_POLLOUT    0x0004
+#define KEVENT_POLL_POLLERR    0x0008
+#define KEVENT_POLL_POLLHUP    0x0010
+#define KEVENT_POLL_POLLNVAL   0x0020
+
+#define KEVENT_POLL_POLLRDNORM 0x0040
+#define KEVENT_POLL_POLLRDBAND 0x0080
+#define KEVENT_POLL_POLLWRNORM 0x0100
+#define KEVENT_POLL_POLLWRBAND 0x0200
+#define KEVENT_POLL_POLLMSG    0x0400
+#define KEVENT_POLL_POLLREMOVE 0x1000
+

[take8 0/2] kevent: Generic event handling mechanism.

2006-08-11 Thread Evgeniy Polyakov

Generic event handling mechanism.

Changes from 'take7' patchset:
 * new mmap interface (not tested, waiting for other changes to be acked)
- use nopage() method to dynamically substitue pages
- allocate new page for events only when new added kevent requres it
- do not use ugly index dereferencing, use structure instead
- reduced amount of data in the ring (id and flags), 
maximum 12 pages on x86 per kevent fd

Changes from 'take6' patchset:
 * a lot of comments!
 * do not use list poisoning for detection of the fact, that entry is in the 
list
 * return number of ready kevents even if copy*user() fails
 * strict check for number of kevents in syscall
 * use ARRAY_SIZE for array size calculation
 * changed superblock magic number
 * use SLAB_PANIC instead of direct panic() call
 * changed -E* return values
 * a lot of small cleanups and indent fixes

Changes from 'take5' patchset:
 * removed compilation warnings about unused wariables when lockdep is not 
turned on
 * do not use internal socket structures, use appropriate (exported) wrappers 
instead
 * removed default 1 second timeout
 * removed AIO stuff from patchset

Changes from 'take4' patchset:
 * use miscdevice instead of chardevice
 * comments fixes

Changes from 'take3' patchset:
 * removed serializing mutex from kevent_user_wait()
 * moved storage list processing to RCU
 * removed lockdep screaming - all storage locks are initialized in the same
   function, so lockdep was taught to differentiate between various cases
 * remove kevent from storage if it is marked as broken after callback
 * fixed a typo in mmaped buffer implementation which would end up in wrong
   index calculation

Changes from 'take2' patchset:
 * split kevent_finish_user() to locked and unlocked variants
 * do not use KEVENT_STAT ifdefs, use inline functions instead
 * use array of callbacks of each type instead of each kevent callback
   initialization
 * changed name of ukevent guarding lock
 * use only one kevent lock in kevent_user for all hash buckets instead of
   per-bucket locks
 * do not use kevent_user_ctl structure, instead provide needed arguments as
   syscall parameters
 * various indent cleanups
 * added optimisation, which is aimed to help when a lot of kevents are being
   copied from userspace
 * mapped buffer (initial) implementation (no userspace yet)

Changes from 'take1' patchset:
 - rebased against 2.6.18-git tree
 - removed ioctl controlling
 - added new syscall kevent_get_events(int fd, unsigned int min_nr,
   unsigned int max_nr, unsigned int timeout, void __user *buf,
   unsigned flags)
 - use old syscall kevent_ctl for creation/removing, modification and initial
   kevent initialization
 - use mutexes instead of semaphores
 - added file descriptor check and return error if provided descriptor does not
   match kevent file operations
 - various indent fixes
 - removed aio_sendfile() declarations.

Thank you.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]




lockdep rt2500usb report

2006-08-11 Thread Johannes Berg
This is running wireless-dev from yesterday. All I did was plug a
rt2500usb device into a USB port on a freshly booted system. I have a
feeling that this could be one of the problems reported earlier with
the d80211 stack, but I haven't mastered the art of picking through
these traces yet... What's swapper doing in there?


[ 1806.889513] usb 5-2: new high speed USB device using ehci_hcd and address 2
[ 1807.164838] usb 5-2: configuration #1 chosen from 1 choice
[ 1807.252880] Loading module: rt2500usb - CVS (N/A) by http://rt2x00.serialmonkey.com.
[ 1807.338966] wmaster0: Selected rate control algorithm 'simple'
[ 1807.364971] usbcore: registered new driver rt2500usb
[ 1807.658580]
[ 1807.658582] ===
[ 1807.658586] [ INFO: possible circular locking dependency detected ]
[ 1807.658588] ---
[ 1807.658591] swapper/0 is trying to acquire lock:
[ 1807.658593]  (dev->queue_lock){-+..}, at: [c0297518] dev_queue_xmit+0x52/0x24f
[ 1807.658603]
[ 1807.658604] but task is already holding lock:
[ 1807.658606]  (dev->_xmit_lock){-+..}, at: [c02976b4] dev_queue_xmit+0x1ee/0x24f
[ 1807.658611]
[ 1807.658612] which lock already depends on the new lock.
[ 1807.658613]
[ 1807.658615]
[ 1807.658616] the existing dependency chain (in reverse order) is:
[ 1807.658618]
[ 1807.658619] -> #1 (dev->_xmit_lock){-+..}:
[ 1807.658622]        [c01322b7] lock_acquire+0x5c/0x79
[ 1807.658631]        [c02f5560] _spin_lock_bh+0x3b/0x48
[ 1807.658639]        [c02a3231] dev_activate+0x5e/0x10f
[ 1807.658646]        [c0295c89] dev_open+0x5c/0x73
[ 1807.658652]        [c029538f] dev_change_flags+0x51/0x107
[ 1807.658659]        [c029e60a] do_setlink+0x182/0x378
[ 1807.658665]        [c029dac2] rtnetlink_rcv_msg+0x163/0x214
[ 1807.658671]        [c02a5b9e] netlink_run_queue+0x83/0x114
[ 1807.658678]        [c029d833] rtnetlink_rcv+0x2c/0x49
[ 1807.658684]        [c02a5c44] netlink_data_ready+0x15/0x59
[ 1807.658691]        [c02a3893] netlink_sendskb+0x1f/0x36
[ 1807.658697]        [c02a5673] netlink_unicast+0x190/0x1f2
[ 1807.658703]        [c02a588f] netlink_sendmsg+0x1ba/0x29d
[ 1807.658709]        [c028cb46] sock_sendmsg+0xcf/0xf3
[ 1807.658717]        [c028cc60] sys_sendmsg+0xf6/0x1fb
[ 1807.658723]        [c028d4af] sys_socketcall+0x232/0x253
[ 1807.658729]        [c0102ead] sysenter_past_esp+0x56/0x8d
[ 1807.658737]
[ 1807.658737] -> #0 (dev->queue_lock){-+..}:
[ 1807.658740]        [c01322b7] lock_acquire+0x5c/0x79
[ 1807.658748]        [c02f5763] _spin_lock+0x36/0x43
[ 1807.658754]        [c0297518] dev_queue_xmit+0x52/0x24f
[ 1807.658760]        [e2cb2aff] ieee80211_subif_start_xmit+0x299/0x49f [80211]
[ 1807.658777]        [c02973d9] dev_hard_start_xmit+0x15f/0x24c
[ 1807.658783]        [c02976cd] dev_queue_xmit+0x207/0x24f
[ 1807.658789]        [e351b8a4] mld_sendpack+0x228/0x29f [ipv6]
[ 1807.658812]        [e351bf49] mld_ifc_timer_expire+0x217/0x260 [ipv6]
[ 1807.658829]        [c0121a6d] run_timer_softirq+0xbf/0x1ae
[ 1807.658836]        [c011e30c] __do_softirq+0x50/0xc1
[ 1807.658844]        [c011e3c6] do_softirq+0x49/0x4b
[ 1807.658849]        [c011e50f] irq_exit+0x42/0x44
[ 1807.658855]        [c01055d4] do_IRQ+0x3c/0x78
[ 1807.658861]        [c01039cd] common_interrupt+0x25/0x2c
[ 1807.658867]        [c0101c27] cpu_idle+0x41/0x69
[ 1807.658873]        [c0100295] rest_init+0x39/0x3b
[ 1807.658878]        [c03ee718] start_kernel+0x2a6/0x31e
[ 1807.658885]        [c0100199] 0xc0100199
[ 1807.658894]
[ 1807.658894] other info that might help us debug this:
[ 1807.658895]
[ 1807.658898] 1 lock held by swapper/0:
[ 1807.658899]  #0:  (dev->_xmit_lock){-+..}, at: [c02976b4] dev_queue_xmit+0x1ee/0x24f
[ 1807.658905]
[ 1807.658906] stack backtrace:
[ 1807.658908]  [c01053a0] show_trace+0x12/0x14
[ 1807.658911]  [c01053bb] dump_stack+0x19/0x1e
[ 1807.658914]  [c01300f4] print_circular_bug_tail+0x5d/0x66
[ 1807.658918]  [c0131d9c] __lock_acquire+0xb89/0xd8a
[ 1807.658921]  [c01322b7] lock_acquire+0x5c/0x79
[ 1807.658925]  [c02f5763] _spin_lock+0x36/0x43
[ 1807.658928]  [c0297518] dev_queue_xmit+0x52/0x24f
[ 1807.658931]  [e2cb2aff] ieee80211_subif_start_xmit+0x299/0x49f [80211]
[ 1807.658942]  [c02973d9] dev_hard_start_xmit+0x15f/0x24c
[ 1807.658946]  [c02976cd] dev_queue_xmit+0x207/0x24f
[ 1807.658949]  [e351b8a4] mld_sendpack+0x228/0x29f [ipv6]
[ 1807.658964]  [e351bf49] mld_ifc_timer_expire+0x217/0x260 [ipv6]
[ 1807.658979]  [c0121a6d] run_timer_softirq+0xbf/0x1ae
[ 1807.658982]  [c011e30c] __do_softirq+0x50/0xc1
[ 1807.658986]  [c011e3c6] do_softirq+0x49/0x4b
[ 1807.658989]  [c011e50f] irq_exit+0x42/0x44
[ 1807.658992]  [c01055d4] do_IRQ+0x3c/0x78
[ 1807.658995]  [c01039cd] common_interrupt+0x25/0x2c
[ 1807.658998]  [c0101c27] cpu_idle+0x41/0x69
[ 1807.659001]  [c0100295] rest_init+0x39/0x3b
[ 1807.659004]  [c03ee718] start_kernel+0x2a6/0x31e
[ 1807.659007]  [c0100199] 0xc0100199


Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-11 Thread David Miller
From: Christophe Devriese [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 10:50:44 +0200

 What can I do to get you to apply this then ? This patch is about
 fixing a bug which is bothering me a lot.

You need to be patient while I review the problem.

Nothing you say will allow my brain to operate any faster, sorry to
say.


Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-11 Thread Christophe Devriese
On Friday 11 August 2006 08:45, David Miller wrote:
 From: Krzysztof Oledzki [EMAIL PROTECTED]
 Date: Thu, 10 Aug 2006 20:18:23 +0200 (CEST)

  OK, this patch really solves the bug from my report. Are there any
  chances for similar fix in the net-2.6.19.git?

 I'm still thinking about this patch and what various people have
 explained about the situation.

What can I do to get you to apply this then ? This patch is about fixing a bug 
which is bothering me a lot.

Regards,

Christophe Devriese


Re: [NET 00/06]: Increase number of possible routing tables

2006-08-11 Thread Thomas Graf
* Michael Tokarev [EMAIL PROTECTED] 2006-08-11 10:56
 And while we're at it...  How about using table *names* instead of
 numbers in kernel too, a-la iptables?  Once possible number of tables
 is large, and we're using hashes for tables now anyway, keeping a
 name inside the table structure wont hurt ;)

This is and should be implemented in userspace.


Re: Possible leak of multicast source filter sctructure

2006-08-11 Thread Michal Ruzicka
 Michal,
 This looks correct, but I think a better way to do it is:
 
 in_dev = inetdev_by_index(...)
 (void) ip_mc_leave_src()
 if (in_dev) {
 ip_mc_dec_group()
 in_dev_put()
 }
 
 That way, sflist internal details aren't visible at this
 level, and ip_mc_leave_src() collapses to the sock_kfree_s()
 when in_dev is NULL.

You are absolutely right, I just failed to notice that -ENODEV return value
from ip_mc_del_src()/ip_mc_leave_src() is ignored.
Here comes the patch:

--- linux-2.6.17.8/net/ipv4/igmp.c.orig 2006-08-11 11:45:56.0 +0200
+++ linux-2.6.17.8/net/ipv4/igmp.c  2006-08-11 11:51:56.0 +0200
@@ -2202,13 +2202,13 @@
 		struct in_device *in_dev;
 		inet->mc_list = iml->next;
 
-		if ((in_dev = inetdev_by_index(iml->multi.imr_ifindex)) != NULL) {
-			(void) ip_mc_leave_src(sk, iml, in_dev);
+		in_dev = inetdev_by_index(iml->multi.imr_ifindex);
+		(void) ip_mc_leave_src(sk, iml, in_dev);
+		if (in_dev != NULL) {
 			ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
 			in_dev_put(in_dev);
 		}
 		sock_kfree_s(sk, iml, sizeof(*iml));
-
 	}
 	rtnl_unlock();
 }


 Also, ip_mc_leave_group() has the same issue; looks
 like it just needs the if (in_dev) removed before the call to
 ip_mc_leave_src().

In fact it is a slightly different issue, there is no leak in this
function. Rather the function completely fails to leave a multicast
group joined on an interface that does not exist any more. Actually
this is how I discovered the bug as I was tracking down a problem
with ripd daemon of routing software quagga which failed to join
a multicast group (with -ENOBUFS) on an interface after there were
several (20 to be precise which corresponds to the default value
[IP_MAX_MEMBERSHIPS] of sysctl_igmp_max_memberships) interfaces
added/removed.
Here comes a patch for that:

--- linux-2.6.17.8/net/ipv4/igmp.c.leak 2006-08-11 11:50:46.0 +0200
+++ linux-2.6.17.8/net/ipv4/igmp.c  2006-08-11 11:52:33.0 +0200
@@ -1799,19 +1799,15 @@
 
 	rtnl_lock();
 	in_dev = ip_mc_find_dev(imr);
-	if (!in_dev) {
-		rtnl_unlock();
-		return -ENODEV;
-	}
 	ifindex = imr->imr_ifindex;
 	for (imlp = &inet->mc_list; (iml = *imlp) != NULL; imlp = &iml->next) {
 		if (iml->multi.imr_multiaddr.s_addr == group &&
 		    iml->multi.imr_ifindex == ifindex) {
-			(void) ip_mc_leave_src(sk, iml, in_dev);
-
 			*imlp = iml->next;
 
-			ip_mc_dec_group(in_dev, group);
+			(void) ip_mc_leave_src(sk, iml, in_dev);
+			if (in_dev != NULL)
+				ip_mc_dec_group(in_dev, group);
 			rtnl_unlock();
 			sock_kfree_s(sk, iml, sizeof(*iml));
 			return 0;


 +-DLS
 

Michal


the mystery that is sock_fasync

2006-08-11 Thread David Miller

I was studying sock_fasync() and it definitely has a bunch
of questionable issues.

Well firstly, it duplicates fasync_helper() entirely.
The only difference is that sock_fasync() does socket
local locking which is better for performance.  fasync_helper()
uses a global spinlock to protect the fasync list it is given.

Secondly, and I think more importantly, this thing acts as
if it is possible to have more than one file -- socket
mapping.  That is simply impossible.

There can indeed be many file descriptors that point to the
file object that points to the socket inode, but that's
different.

This invariant is maintained by the fact that socket
creation creates and maps one file object to point
to the socket's inode in sock_create.

Furthermore we block any attempt to open sockets by name
via things like /proc/$PID/fds/$sock_fdnum

In fact when sock_close() runs, it calls sock_fasync(-1, file, 0) and
the subsequent sock_release() bug checks that fasync_list is NULL.

If my analysis is correct we can incredibly simplify sock_fasync().

Did I miss some way that multiple file objects can point to the
same socket inode?



Re: the mystery that is sock_fasync

2006-08-11 Thread Evgeniy Polyakov
On Fri, Aug 11, 2006 at 03:15:16AM -0700, David Miller ([EMAIL PROTECTED]) 
wrote:
 Did I miss some way that multiple file objects can point to the
 same socket inode?

What about dup and pipe?

-- 
Evgeniy Polyakov


Re: the mystery that is sock_fasync

2006-08-11 Thread David Miller
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 14:28:20 +0400

 On Fri, Aug 11, 2006 at 03:15:16AM -0700, David Miller ([EMAIL PROTECTED]) 
 wrote:
  Did I miss some way that multiple file objects can point to the
  same socket inode?
 
 What about dup and pipe?

Dup makes new file descriptor references to the file object.

It does not create a new file object reference to a socket inode,
which is what we're concerned with here.

Pipe files do not point to socket inodes.
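
The distinction between descriptors and file objects can be observed from
userspace. The following is only a minimal sketch, not something posted in
this thread: the helper name dup_shares_file_object() is made up for
illustration. It relies on the fact that O_NONBLOCK is a file status flag
kept in the file object, so setting it through one descriptor should be
visible through a dup()ed one.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a status flag set through one
 * descriptor is visible through its dup()ed twin, 0 if it is not,
 * and -1 on any error.
 */
int dup_shares_file_object(void)
{
	int fd, fd2, flags, shared;

	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* New descriptor, same file object (and thus same socket inode). */
	fd2 = dup(fd);
	if (fd2 < 0) {
		close(fd);
		return -1;
	}

	/* Set O_NONBLOCK through the original descriptor only. */
	flags = fcntl(fd, F_GETFL);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd2);
		close(fd);
		return -1;
	}

	/* Read the status flags back through the duplicate. */
	flags = fcntl(fd2, F_GETFL);
	shared = (flags < 0) ? -1 : !!(flags & O_NONBLOCK);

	close(fd2);
	close(fd);
	return shared;
}
```

If the two descriptors referred to separate file objects, the flag would not
carry over and the helper would return 0.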


Re: [PATCH 3/4] [NETLINK]: Dont set socket error for failed event notifications

2006-08-11 Thread Thomas Graf
* David Miller [EMAIL PROTECTED] 2006-08-10 23:02
 Thomas, there was a conflict which I did expect since I added
 all the 2.6.18 bug fixes into the tree.  The conflict is for
 your netlink conversion of the iflink stuff.
 
 I had to fix the -stable tree IFLA_ADDRESS handling because
 dev->set_mac_address expects a sockaddr not a raw MAC
 address.
 
 I think I resolved the conflict while applying the patch
 properly, but if you could take a look I'd appreciate it.

Looks good. You could have used nla_memcpy() instead of
memcpy() but there is no functional difference in this case.


Re: [PATCH 3/4] [NETLINK]: Dont set socket error for failed event notifications

2006-08-11 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 12:38:22 +0200

 Looks good. You could have used nla_memcpy() instead of
 memcpy() but there is no functional difference in this case.

Right, because the code already verifies that the attribute
length is at least dev->addr_len, and that is the size of
the memcpy we do.

Thanks for checking things out.


race condition leading to segfault in d80211

2006-08-11 Thread Johannes Berg

What was that about locking not having problems? :P

I was writing a small program that (using ioctls)
* creates a new interface (using sysfs)
* sets the interface to monitor mode
* sets IFF_UP
* (1)
* sets IFF_DOWN
* (2)
* destroy interface (using sysfs)


That was fine, but then I wanted to see this happening and added
system("iwconfig") at the two places marked (1) and (2), which
triggered the bug below. Note the address, I have slab debugging enabled.


[12143.789779] BUG: unable to handle kernel paging request at virtual address 
6b6b752f
[12143.789785]  printing eip:
[12143.789787] e2cc1df0
[12143.789789] *pde = 
[12143.789792] Oops:  [#1]
[12143.789794] PREEMPT
[12143.789796] Modules linked in: arc4 rate_control rt2500usb 80211 ipv6 
af_packet speedstep_lib cpufreq_userspace cpufreq_stats freq_table 
cpufreq_powersave cpufreq_ondemand cpufreq_conservative video sbs thermal 
i2c_ec i2c_core processor fan button battery container ac asus_acpi sr_mod sbp2 
snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm 
snd_timer 8250_pnp snd soundcore floppy 8250 serial_core psmouse snd_page_alloc 
skge crc32 ohci1394 ieee1394 rtc pcspkr ehci_hcd uhci_hcd usbcore sg evdev
[12143.789831] CPU:0
[12143.789832] EIP:0060:[e2cc1df0]Not tainted VLI
[12143.789833] EFLAGS: 00210282   (2.6.18-rc4 #2)
[12143.789850] EIP is at ieee80211_sta_scan_work+0x1a/0x406 [80211]
[12143.789853] eax: d517c320   ebx: cda019d8   ecx: c0128a7e   edx: c149
[12143.789856] esi: cda019dc   edi: 6b6b6b6b   ebp: c1491f4c   esp: c1491eec
[12143.789859] ds: 007b   es: 007b   ss: 0068
[12143.789862] Process events/0 (pid: 4, ti=c149 task=c1488070 
task.ti=c149)
[12143.789864] Stack: 00200046 00200046 00200046  c042653c 00200046 
 c1476888
[12143.789872]d517c000 d517c320 00200046 0002 0001 c0128a28 
c147686c c0128a7e
[12143.789879]00200046 c147686c c147686c 00200292 c1491f4c cda019d8 
cda019dc c147686c
[12143.789887] Call Trace:
[12143.789889]  [c010418f] show_stack_log_lvl+0xa8/0xe5
[12143.789895]  [c0104365] show_registers+0x199/0x229
[12143.789899]  [c0104844] die+0x118/0x2ac
[12143.789902]  [c0113db9] do_page_fault+0x280/0x599
[12143.789908]  [c0103ad5] error_code+0x39/0x40
[12143.789912]  [c0128a8e] run_workqueue+0x76/0xea
[12143.789917]  [c0128c88] worker_thread+0xe4/0x11c
[12143.789921]  [c012b82e] kthread+0xcf/0xd3
[12143.789925]  [c0101005] kernel_thread_helper+0x5/0xb
[12143.789928] Code: ba 03 00 00 00 89 d8 e8 9c de 5c dd e9 e6 fe ff ff 55 89 e5 57 
56 53 83 ec 54 89 45 c0 8b b8 c0 00 00 00 05 20 03 00 00 89 45 c4 8b 87 c4 09 
00 00 89 45 b4 85 c0 0f 84 18 01 00 00 8b 87 d0 09
[12143.789964] EIP: [e2cc1df0] ieee80211_sta_scan_work+0x1a/0x406 [80211] 
SS:ESP 0068:c1491eec
[12143.789977]



sender throttling for unreliable protocols not garuanteed? (different units in sock-wmem_alloc and net_devive-tx_queue_len)

2006-08-11 Thread Steffen Maier
Hello,

a while ago I wrote a simple network load generator to inject datagrams or 
frames at maximum rate into a network. Maybe I was mistaken but I expected 
the socket's send operation to block, if the transmitting network device 
becomes saturated (no matter if using UDP or PF_PACKET). However, 
sometimes the send operation just returned ENOBUFS immediately without 
blocking.

If I understood Wright & Stevens' TCP/IP Illustrated Vol. 2 correctly, BSD
(at least 4.4 BSD Lite 1) never throttles a UDP sender, since it does not
account the bytes to transmit against any queue on the egress path that it
could block on. On the other hand, Linux does in certain cases (details
later). Even though I found out about the implementation details, I would
still like to know if there is any specification or common agreement on the
semantics of socket send operation blocking (back pressure) with saturated
network devices?

Please keep me in CC since I lurk and am not subscribed at the moment.

In order to understand why and under what circumstances blocking or 
non-blocking happens, I dug into the protocol stack code. The 
corresponding call traces look as follows (Linux 2.6, similar in 2.4):

sock_sendmsg
 __sock_sendmsg
  socket->ops->sendmsg: e.g. inet_sendmsg or packet_sendmsg
either:
   inet_sendmsg
    sock->sk_prot->sendmsg: e.g. udp_sendmsg
     udp_sendmsg
      ip_append_data
       sock_alloc_send_skb
        sock_alloc_send_pskb
         sock_wait_for_wmem
or:
   packet_sendmsg
    sock_alloc_send_skb
     sock_alloc_send_pskb
      sock_wait_for_wmem

Now this is where a process might block, if the socket send buffer is full
(atomic_read(&sk->sk_wmem_alloc) >= sk->sk_sndbuf). Suppose sndbuf is
large enough and it won't block. Then the allocated sk_buff will be
processed further in udp_sendmsg or packet_sendmsg and finally find its
way into the device queue. Since we were using an unreliable (transport)
protocol, the sk_buffs are not actually stored in the socket send buffer
(there is no need for possible retransmissions). They are only charged
against sndbuf, but they are stored in the device queue:

dev_queue_xmit
 q->enqueue: e.g. pfifo_fast_enqueue
  pfifo_fast_enqueue

This is where the sk_buff may be dropped, if the device queue is full
(list->qlen >= qdisc->dev->tx_queue_len). Suppose this bad case(?)
happens, then the code path would return NET_XMIT_DROP. packet_sendmsg
would convert this via net_xmit_errno into -ENOBUFS and finally return
this as the result of the socket send operation to the calling user process.
A similar thing with the same effect is done for UDP messages.

This means, that there are cases where a socket send operation may just 
not block and immediately return ENOBUFS. If a process wanted to inject 
messages into the network at maximum line speed (or whatever less the NIC 
supports), this would in turn lead to busy sending. Even worse, I didn't 
come up with a sane configuration of sndbuf and txqueuelen, that could 
prevent this possibly unexpected behavior. If there was only one socket
transmitting over a certain network device, you could roughly configure
sndbuf = txqueuelen * MTU. For a fixed number of sockets we could use
sndbuf = txqueuelen * MTU / #sockets. But this breaks as soon as you have
arbitrarily many sockets transmitting over the same device.

In other words this all just happens because sndbuf accounts in bytes 
but the device queue measures in frames. But frames can have arbitrary 
size within an interval given by the network technology and thus there is 
no fixed relation between those two measurements.
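
To make the unit mismatch concrete, here is a small back-of-the-envelope
sketch. It is not kernel code and the function names are invented for
illustration. It assumes that each queued frame is charged a fixed number of
bytes against sndbuf (in the kernel the charge is the skb's truesize, which
includes bookkeeping overhead), and it ignores that the blocking check
happens before allocation, so the real count can be off by one.

```c
/*
 * Hypothetical helper: roughly how many frames one socket can have in
 * flight before the sndbuf limit makes the sender block, if every frame
 * is charged `charge` bytes.
 */
unsigned long frames_admitted(unsigned long sndbuf, unsigned long charge)
{
	return charge ? sndbuf / charge : 0;
}

/*
 * Hypothetical helper: 1 if `nsockets` senders together can put more
 * frames into the device queue than txqueuelen allows before any of
 * them blocks on sndbuf, i.e. drops (and ENOBUFS) are possible.
 */
int queue_can_overflow(unsigned long sndbuf, unsigned long charge,
		       unsigned long nsockets, unsigned long txqueuelen)
{
	return frames_admitted(sndbuf, charge) * nsockets > txqueuelen;
}
```

With made-up numbers: a 100000 byte sndbuf and a 2000 byte charge per large
frame admits 50 frames, so four such senders stay well below a txqueuelen of
1000. With a 300 byte charge per small frame the same sndbuf admits 333
frames, and four senders can exceed the 1000 frame queue before anyone
blocks.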

I'd be interested in any opinions on the above mentioned effect.

Thanks,
Steffen.


Re: the mystery that is sock_fasync

2006-08-11 Thread Alexey Kuznetsov
Hello!

 Did I miss some way that multiple file objects can point to the
 same socket inode?

Absolutely prohibited. Always was.

Apparently, sock_fasync() was cloned from tty_fasync(), that's the only
reason why it is so creepy.

Alexey


Re: skb_shared_info()

2006-08-11 Thread Alexey Kuznetsov
Hello!

 management schemes and to just wrap SKB's around
 arbitrary pieces of data.
+
 and something clever like a special page_offset encoding
 means use data, not page.

But for what purpose do you plan to use it?


 The e1000 issue is just one example of this, another

What is this issue?


What's about aggregated tcp queue, I can guess you did not find place
where to add protocol headers, but cannot figure out how adding non-pagecache
references could help.

You would rather want more than one skb_shared_info(): at least two,
one is immutable, another is for headers.

I think Evgeniy's idea about inlining skb_shared_info to skb head
is promising and simple enough. All the point of shared skb_shared_info
was to make cloning fast. But it makes lots of sense to inline some short
vector into skb head (and, probably, even a MAX_HEADER space _instead_
of space for fclone).

With aggregated tcp send queue, when transmitting a segment, you could
allocate new skb head with space for header and either take existing
skb_shared_info from queue, attach it to head and set offset/length.
Or, alternatively, set one or two of page pointers in array, inlined in head.
(F.e. in the case of AF_UNIX socket, mentioned by Evgeniy, we would keep data
in pages and attach it directly to skb head).

Cloning becomes more expensive, but who needs it cheap, if tcp does not?



Returning to arbitrary pieces of data.

Page cache references in skb_shared_info are unique thing,
get_page()/page_cache_release() are enough to clone data.

But it is not enough even for such simple thing as splice().
It wants we remembered some strange pipe_buffer, where each page
is wrapped together with some ops, flags and even pipe_inode_info :-),
and called some destructor, when it is released. First thought is that
it is insane: it does not respect page cache logic, requires we implemented
additional level of refcounting, abuses amount of information, which
have to be stored in skb beyond all the limits of sanity.

But the second thought is that something like this is required in any case.
At least we must report to someone when a page is not in use and
can be recycled. I think Evgeniy knows more about this, AIO has
the same issue. But this is simpler, because release callback can be done
not per fragment or even per-skb, but actually per-transaction.

One idea is to announce (some) skb_shared_info completely immutable,
force each layer who needs to add a header or to fragment to refer
to original skb_shared_info as whole, using for modifications
another skb_shared_info() or area inlined in skb head.
And if someone is not able to, he must reallocate all the pages.
In this case destructor/notification can be done not for fragment,
but for whole aggregated skb_shared_info. Seems, it will work both
with aggregated tcp queue and with udp.

Alexey


Re: sender throttling for unreliable protocols not garuanteed? (different units in sock-wmem_alloc and net_devive-tx_queue_len)

2006-08-11 Thread Alexey Kuznetsov
Hello!

 I'd be interested in any opinions on the above mentioned effect.

Everything is right, it is exactly how it works.

Well, use another qdisc, which counts in bytes rather than in frames
(f.e. bfifo)

Set sndbuf small enough.

And if sndbuf*#senders is still too large, you have to use fair queueing,
sfq is quite good for this purpose.
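
For the "set sndbuf small enough" part, the userspace side might look like
the sketch below. This is an illustration only: the helper names and the
16384 byte figure used in the note afterwards are arbitrary. It uses the
standard SO_SNDBUF socket option and reads the value back, because the
kernel is free to adjust the request (Linux typically doubles it and
enforces a minimum).

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper: request a send buffer of `bytes` on `fd` and
 * return the size the kernel actually reports, or -1 on error.
 */
int set_sndbuf(int fd, int bytes)
{
	int effective = 0;
	socklen_t len = sizeof(effective);

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0)
		return -1;
	return effective;
}

/*
 * Hypothetical helper: open a throwaway UDP socket, apply the limit,
 * and report the effective value (or -1 on error).
 */
int probe_udp_sndbuf(int bytes)
{
	int fd, effective;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	effective = set_sndbuf(fd, bytes);
	close(fd);
	return effective;
}
```

Calling probe_udp_sndbuf(16384) shows what a sender would really get; the
byte-counting qdisc and its limit still have to be configured separately on
the device.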

Alexey


[DECNET] Fix to multiple tables routing

2006-08-11 Thread Steven Whitehouse

Here is a fix to Patrick McHardy's "increase number of routing tables" patch
for DECnet. I did just test this and it appears to be working fine with
this patch.

Cc: Patrick McHardy [EMAIL PROTECTED]
Cc: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]

diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 878312f..c8d9411 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -116,6 +116,7 @@ static struct nla_policy dn_fib_rule_pol
 	[FRA_SRC]	= { .type = NLA_U16 },
 	[FRA_DST]	= { .type = NLA_U16 },
 	[FRA_FWMARK]	= { .type = NLA_U32 },
+	[FRA_TABLE]	= { .type = NLA_U32 },
 };
 
 static int dn_fib_rule_match(struct fib_rule *rule, struct flowi *fl, int flags)


[Patch/RfC] remove broken URLs from net drivers' output

2006-08-11 Thread Markus Dahms
Remove broken URLs (www.scyld.com) from network drivers' logging output.
URLs in comments and other strings are left intact.

Signed-off-by: Markus Dahms [EMAIL PROTECTED]

---

I was tired of always seeing a URL that no longer works on initialization
of 3c59x and natsemi. So this is an attempt to get rid of these messages.

The patch is against 2.6.18-rc4-git
(c4e321b85a89d7cd392d3105b2c033a6c58ed337 from .../gregkh/linux-2.6).

Btw: Is there a policy for message output for drivers with respect to
author name, email addresses, copyright messages or home pages?

Markus

 3c509.c |5 ++---
 3c59x.c |2 +-
 atp.c   |8 +++-
 eepro100.c  |2 +-
 epic100.c   |   10 --
 natsemi.c   |1 -
 ne2k-pci.c  |3 +--
 sundance.c  |3 +--
 yellowfin.c |1 -
 9 files changed, 13 insertions(+), 22 deletions(-)


diff --git a/drivers/net/3c509.c b/drivers/net/3c509.c
index cbdae54..dccf142 100644
--- a/drivers/net/3c509.c
+++ b/drivers/net/3c509.c
@@ -96,8 +96,7 @@ #include asm/uaccess.h
 #include asm/io.h
 #include asm/irq.h
 
-static char versionA[] __initdata = DRV_NAME .c: DRV_VERSION   DRV_RELDATE 
 [EMAIL PROTECTED];
-static char versionB[] __initdata = 
http://www.scyld.com/network/3c509.html\n;;
+static char version[] __initdata = DRV_NAME .c: DRV_VERSION   DRV_RELDATE 
 [EMAIL PROTECTED];
 
 #if defined(CONFIG_PM)  (defined(CONFIG_MCA) || defined(CONFIG_EISA))
 #define EL3_SUSPEND
@@ -360,7 +359,7 @@ #endif
printk(, IRQ %d.\n, dev-irq);
 
if (el3_debug  0)
-   printk(KERN_INFO %s KERN_INFO %s, versionA, versionB);
+   printk(KERN_INFO %s, version);
return 0;
 
 }
diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 80e8ca0..098c7aa 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -103,7 +103,7 @@ #include linux/delay.h
 
 
 static char version[] __devinitdata =
-DRV_NAME : Donald Becker and others. www.scyld.com/network/vortex.html\n;
+DRV_NAME : Donald Becker and others.\n;
 
 MODULE_AUTHOR(Donald Becker [EMAIL PROTECTED]);
 MODULE_DESCRIPTION(3Com 3c59x/3c9xx ethernet driver );
diff --git a/drivers/net/atp.c b/drivers/net/atp.c
index bfa674e..697967f 100644
--- a/drivers/net/atp.c
+++ b/drivers/net/atp.c
@@ -31,10 +31,8 @@
 
 */
 
-static const char versionA[] =
+static const char version[] =
 atp.c:v1.09=ac 2002/10/01 Donald Becker [EMAIL PROTECTED]\n;
-static const char versionB[] =
-  http://www.scyld.com/network/atp.html\n;;
 
 /* The user-configurable values.
These may be modified when a driver module is loaded.*/
@@ -324,7 +322,7 @@ #endif
 
 #ifndef MODULE
if (net_debug)
-   printk(KERN_INFO %s KERN_INFO %s, versionA, versionB);
+   printk(KERN_INFO %s, version);
 #endif
 
printk(KERN_NOTICE %s: Pocket adapter found at %#3lx, IRQ %d, SAPROM 
@@ -932,7 +930,7 @@ static void set_rx_mode_8012(struct net_
 
 static int __init atp_init_module(void) {
if (debug)  /* Emit version even if 
no cards detected. */
-   printk(KERN_INFO %s KERN_INFO %s, versionA, versionB);
+   printk(KERN_INFO %s, version);
return atp_init();
 }
 
diff --git a/drivers/net/eepro100.c b/drivers/net/eepro100.c
index e445988..0f4b495 100644
--- a/drivers/net/eepro100.c
+++ b/drivers/net/eepro100.c
@@ -28,7 +28,7 @@
 */
 
 static const char * const version =
-eepro100.c:v1.09j-t 9/29/99 Donald Becker 
http://www.scyld.com/network/eepro100.html\n;
+eepro100.c:v1.09j-t 9/29/99 Donald Becker\n
 eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin 
[EMAIL PROTECTED] and others\n;
 
 /* A few user-configurable values that apply to all boards.
diff --git a/drivers/net/epic100.c b/drivers/net/epic100.c
index a67650c..ebef8ae 100644
--- a/drivers/net/epic100.c
+++ b/drivers/net/epic100.c
@@ -93,8 +93,6 @@ #include asm/uaccess.h
 static char version[] __devinitdata =
 DRV_NAME .c:v1.11 1/7/2001 Written by Donald Becker [EMAIL PROTECTED]\n;
 static char version2[] __devinitdata =
-  http://www.scyld.com/network/epic100.html\n;;
-static char version3[] __devinitdata =
   (unofficial 2.4.x kernel port, version  DRV_VERSION ,  DRV_RELDATE )\n;
 
 MODULE_AUTHOR(Donald Becker [EMAIL PROTECTED]);
@@ -323,8 +321,8 @@ static int __devinit epic_init_one (stru
 #ifndef MODULE
static int printed_version;
if (!printed_version++)
-   printk (KERN_INFO %s KERN_INFO %s KERN_INFO %s,
-   version, version2, version3);
+   printk (KERN_INFO %s KERN_INFO %s,
+   version, version2);
 #endif
 
card_idx++;
@@ -1600,8 +1598,8 @@ static int __init epic_init (void)
 {
 /* when a module, this is printed whether or not devices are found in probe */
 #ifdef MODULE
-   printk (KERN_INFO "%s" KERN_INFO "%s" KERN_INFO "%s",
-   version, version2, version3);
+   printk (KERN_INFO "%s" KERN_INFO "%s",
+   version, 

Re: [DECNET] Fix to multiple tables routing

2006-08-11 Thread Patrick McHardy
Steven Whitehouse wrote:
 Here is a fix to Patrick McHardy's increase number of routing tables patch
 for DECnet. I did just test this and it appears to be working fine with
 this patch.
 
 Cc: Patrick McHardy [EMAIL PROTECTED]
 Cc: Patrick Caulfield [EMAIL PROTECTED]
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
 
 diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
 index 878312f..c8d9411 100644
 --- a/net/decnet/dn_rules.c
 +++ b/net/decnet/dn_rules.c
 @@ -116,6 +116,7 @@ static struct nla_policy dn_fib_rule_pol
   [FRA_SRC]   = { .type = NLA_U16 },
   [FRA_DST]   = { .type = NLA_U16 },
   [FRA_FWMARK]= { .type = NLA_U32 },
 + [FRA_TABLE] = { .type = NLA_U32 },
  };

Looks good. BTW, I noticed something in the DecNET fib_rule conversion
that looks like a bug:

The policy includes this for FRA_SRC/FRA_DST:

[FRA_SRC]   = { .type = NLA_U16 },
[FRA_DST]   = { .type = NLA_U16 },

But in dn_fib_rule_compare it is used like this:

if (tb[FRA_SRC] && (r->src != nla_get_u32(tb[FRA_SRC])))
	return 0;

if (tb[FRA_DST] && (r->dst != nla_get_u32(tb[FRA_DST])))
	return 0;

I think this might create problems depending on the endianness.
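
A minimal userspace sketch of why the width mismatch matters. It assumes
the attribute payload carries two valid bytes followed by zero padding;
get_u16() and get_u32() are hypothetical stand-ins for the netlink
accessors, not the kernel functions themselves:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for nla_get_u16()/nla_get_u32(): copy the
 * first 2 or 4 bytes of an attribute payload in host byte order. */
static uint16_t get_u16(const unsigned char *payload)
{
	uint16_t v;

	memcpy(&v, payload, sizeof(v));
	return v;
}

static uint32_t get_u32(const unsigned char *payload)
{
	uint32_t v;

	memcpy(&v, payload, sizeof(v));
	return v;
}

/* Returns 1 if reading a 16-bit payload through the 32-bit accessor
 * yields the same value as the 16-bit accessor, 0 otherwise.  The two
 * bytes after the payload are zero here (an assumption of this sketch). */
static int wide_read_matches(uint16_t addr)
{
	unsigned char buf[4] = { 0 };

	memcpy(buf, &addr, sizeof(addr));
	return get_u32(buf) == get_u16(buf);
}
```

On a little-endian host the wide read happens to match as long as the
padding is zero; on a big-endian host the 16-bit value lands in the upper
half of the 32-bit result, so the comparison fails for any non-zero
address.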


Re: [DECNET] Fix to multiple tables routing

2006-08-11 Thread Steven Whitehouse
Hi,

On Fri, Aug 11, 2006 at 05:22:17PM +0200, Patrick McHardy wrote:
 Steven Whitehouse wrote:
  Here is a fix to Patrick McHardy's increase number of routing tables patch
  for DECnet. I did just test this and it appears to be working fine with
  this patch.
  
  Cc: Patrick McHardy [EMAIL PROTECTED]
  Cc: Patrick Caulfield [EMAIL PROTECTED]
  Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
  
  diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
  index 878312f..c8d9411 100644
  --- a/net/decnet/dn_rules.c
  +++ b/net/decnet/dn_rules.c
  @@ -116,6 +116,7 @@ static struct nla_policy dn_fib_rule_pol
  [FRA_SRC]   = { .type = NLA_U16 },
  [FRA_DST]   = { .type = NLA_U16 },
  [FRA_FWMARK]= { .type = NLA_U32 },
  +   [FRA_TABLE] = { .type = NLA_U32 },
   };
 
 Looks good. BTW, I noticed something in the DecNET fib_rule conversion
 that looks like a bug:
 
 The policy includes this for FRA_SRC/FRA_DST:
 
 [FRA_SRC]   = { .type = NLA_U16 },
 [FRA_DST]   = { .type = NLA_U16 },
 
 But in dn_fib_rule_compare it is used like this:
 
 if (tb[FRA_SRC] && (r->src != nla_get_u32(tb[FRA_SRC])))
 	return 0;
 
 if (tb[FRA_DST] && (r->dst != nla_get_u32(tb[FRA_DST])))
 	return 0;
 
 I think this might create problems depending on the endianness.

Yes, good spotting :-) I'll send a patch shortly,

Steve.



Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-11 Thread Alexey Kuznetsov
Hello!

 I get your point and I see the value. Unfortunately, probably due to
 lack of documentation, this feature isn't used by any applications I
 know of.

Well, tc was supposed to use it, but this did not happen and
it remained deficient.


 We even put in the hacks to make identification of own caused
 notifications easier by storing the netlink pid of the originator in
 the notification message.

Actually, it was supposed to be done everywhere, but originator info
did not propagate deep enough in many cases, especially in IPv6.
So, this is not a hack, it is a good work. :-)


BTW I have just remembered why it was especially important,
this should be documented as well.

Each socket, which subscribes to multicasts becomes sensitive
to rcvbuf overflows. F.e. when you do control operations on a socket,
which is subscribed to multicasts, the response can be lost in stream
of events and -ENOBUFS generated instead. If it is a daemon, it can resync
the state, but if it is a simple utility, it cannot recover.

Probably, unicasts sent due to NLM_F_ECHO should somehow override
rcvbuf limits.

This reminded me about a capital problem, found by openvz people.
Frankly speaking, I still have no idea how to repair this, probably you
will find a solution.

Look: during a dump, skb allocation can fail (for many reasons,
the most obvious is that rcvbuf space was eaten by multicasts).
But error is not reported! Oops. The worst thing is that even if an error
is reported, iproute would ignore it.

Alexey




[DECNET] Fix to decnet rules compare function

2006-08-11 Thread Steven Whitehouse


Here is a fix to the DECnet rules compare function where we used
32bit values rather than 16bit values. Spotted by Patrick McHardy.

Cc: Patrick McHardy [EMAIL PROTECTED]
Cc: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]


diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 878312f..977bb56 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -196,10 +197,10 @@ static int dn_fib_rule_compare(struct fi
return 0;
 #endif
 
-   if (tb[FRA_SRC] && (r->src != nla_get_u32(tb[FRA_SRC])))
+   if (tb[FRA_SRC] && (r->src != nla_get_u16(tb[FRA_SRC])))
 		return 0;
 
-   if (tb[FRA_DST] && (r->dst != nla_get_u32(tb[FRA_DST])))
+   if (tb[FRA_DST] && (r->dst != nla_get_u16(tb[FRA_DST])))
return 0;
 
return 1;


Re: [IPROUTE]: Fix struct alignment with cris architecture

2006-08-11 Thread Stephen Hemminger
On Thu, 10 Aug 2006 20:25:40 -0400
Andy Gay [EMAIL PROTECTED] wrote:

 [IPROUTE]: Fix struct alignment with cris architecture
 
 gcc for the cris arch does not pad structures to the next multiple of 4
 bytes, as the i386 gcc does.
 
 This causes errors like this when displaying xfrm policies:
 
 # ip x p
 !!!Deficit 3, rta_len=300
 src 192.168.251.32/29 dst 192.168.251.32/29 
 dir in priority 0 
 !!!Deficit 3, rta_len=180
 src 0.0.0.0/0 dst 192.168.251.32/29 
 dir in priority 2208 
 
 
 Similar errors are seen from ip x s.
 
 This patch fixes the errors when printing. I'm not sure whether we
 should worry about other uses of the affected structs, I've not seen any
 other bad effects from this though, so hopefully this is enough.
 
 (Thanks to Herbert Xu for pointing out that NLMSG_SPACE is the correct
 macro to use here.)
 
 Tested against 2.6.17.6 kernel on i386, and 2.6.16.1 kernel on cris.
 
 Signed-off-by: Andy Gay [EMAIL PROTECTED]

Applied


Re: lockdep rt2500usb report

2006-08-11 Thread Ivo van Doorn
On Friday 11 August 2006 10:34, Johannes Berg wrote:
 This is running wireless-dev from yesterday. All I did was plug in a 
 rt2500usb device into a usb port on a freshly booted system. I have a 
 feeling that this could be one of the problems reported earlier with 
 the d80211 stack, but I haven't mastered the art of picking through 
 these traces yet... What's swapper doing in there?

Mhh, looking through the trace this might be a dscape bug.
The problem seems to come from the TX handler, and nothing has come out
of rt2500usb yet.

Jiri, do you have any ideas about this one?

Ivo

 [ 1806.889513] usb 5-2: new high speed USB device using ehci_hcd and 
 address 2
 [ 1807.164838] usb 5-2: configuration #1 chosen from 1 choice
 [ 1807.252880] Loading module: rt2500usb - CVS (N/A) by 
 http://rt2x00.serialmonkey.com.
 [ 1807.338966] wmaster0: Selected rate control algorithm 'simple'
 [ 1807.364971] usbcore: registered new driver rt2500usb
 [ 1807.658580]
 [ 1807.658582] ===
 [ 1807.658586] [ INFO: possible circular locking dependency detected ]
 [ 1807.658588] ---
 [ 1807.658591] swapper/0 is trying to acquire lock:
 [ 1807.658593]  (dev->queue_lock){-+..}, at: [c0297518] dev_queue_xmit+0x52/0x24f
 [ 1807.658603]
 [ 1807.658604] but task is already holding lock:
 [ 1807.658606]  (dev->_xmit_lock){-+..}, at: [c02976b4] dev_queue_xmit+0x1ee/0x24f
 [ 1807.658611]
 [ 1807.658612] which lock already depends on the new lock.
 [ 1807.658613]
 [ 1807.658615]
 [ 1807.658616] the existing dependency chain (in reverse order) is:
 [ 1807.658618]
 [ 1807.658619] -> #1 (dev->_xmit_lock){-+..}:
 [ 1807.658622][c01322b7] lock_acquire+0x5c/0x79
 [ 1807.658631][c02f5560] _spin_lock_bh+0x3b/0x48
 [ 1807.658639][c02a3231] dev_activate+0x5e/0x10f
 [ 1807.658646][c0295c89] dev_open+0x5c/0x73
 [ 1807.658652][c029538f] dev_change_flags+0x51/0x107
 [ 1807.658659][c029e60a] do_setlink+0x182/0x378
 [ 1807.658665][c029dac2] rtnetlink_rcv_msg+0x163/0x214
 [ 1807.658671][c02a5b9e] netlink_run_queue+0x83/0x114
 [ 1807.658678][c029d833] rtnetlink_rcv+0x2c/0x49
 [ 1807.658684][c02a5c44] netlink_data_ready+0x15/0x59
 [ 1807.658691][c02a3893] netlink_sendskb+0x1f/0x36
 [ 1807.658697][c02a5673] netlink_unicast+0x190/0x1f2
 [ 1807.658703][c02a588f] netlink_sendmsg+0x1ba/0x29d
 [ 1807.658709][c028cb46] sock_sendmsg+0xcf/0xf3
 [ 1807.658717][c028cc60] sys_sendmsg+0xf6/0x1fb
 [ 1807.658723][c028d4af] sys_socketcall+0x232/0x253
 [ 1807.658729][c0102ead] sysenter_past_esp+0x56/0x8d
 [ 1807.658737]
 [ 1807.658737] -> #0 (dev->queue_lock){-+..}:
 [ 1807.658740][c01322b7] lock_acquire+0x5c/0x79
 [ 1807.658748][c02f5763] _spin_lock+0x36/0x43
 [ 1807.658754][c0297518] dev_queue_xmit+0x52/0x24f
 [ 1807.658760][e2cb2aff] ieee80211_subif_start_xmit+0x299/0x49f [80211]
 [ 1807.658777][c02973d9] dev_hard_start_xmit+0x15f/0x24c
 [ 1807.658783][c02976cd] dev_queue_xmit+0x207/0x24f
 [ 1807.658789][e351b8a4] mld_sendpack+0x228/0x29f [ipv6]
 [ 1807.658812][e351bf49] mld_ifc_timer_expire+0x217/0x260 [ipv6]
 [ 1807.658829][c0121a6d] run_timer_softirq+0xbf/0x1ae
 [ 1807.658836][c011e30c] __do_softirq+0x50/0xc1
 [ 1807.658844][c011e3c6] do_softirq+0x49/0x4b
 [ 1807.658849][c011e50f] irq_exit+0x42/0x44
 [ 1807.658855][c01055d4] do_IRQ+0x3c/0x78
 [ 1807.658861][c01039cd] common_interrupt+0x25/0x2c
 [ 1807.658867][c0101c27] cpu_idle+0x41/0x69
 [ 1807.658873][c0100295] rest_init+0x39/0x3b
 [ 1807.658878][c03ee718] start_kernel+0x2a6/0x31e
 [ 1807.658885][c0100199] 0xc0100199
 [ 1807.658894]
 [ 1807.658894] other info that might help us debug this:
 [ 1807.658895]
 [ 1807.658898] 1 lock held by swapper/0:
 [ 1807.658899]  #0:  (dev->_xmit_lock){-+..}, at: [c02976b4] dev_queue_xmit+0x1ee/0x24f
 [ 1807.658905]
 [ 1807.658906] stack backtrace:
 [ 1807.658908]  [c01053a0] show_trace+0x12/0x14
 [ 1807.658911]  [c01053bb] dump_stack+0x19/0x1e
 [ 1807.658914]  [c01300f4] print_circular_bug_tail+0x5d/0x66
 [ 1807.658918]  [c0131d9c] __lock_acquire+0xb89/0xd8a
 [ 1807.658921]  [c01322b7] lock_acquire+0x5c/0x79
 [ 1807.658925]  [c02f5763] _spin_lock+0x36/0x43
 [ 1807.658928]  [c0297518] dev_queue_xmit+0x52/0x24f
 [ 1807.658931]  [e2cb2aff] ieee80211_subif_start_xmit+0x299/0x49f [80211]
 [ 1807.658942]  [c02973d9] dev_hard_start_xmit+0x15f/0x24c
 [ 1807.658946]  [c02976cd] dev_queue_xmit+0x207/0x24f
 [ 1807.658949]  [e351b8a4] mld_sendpack+0x228/0x29f [ipv6]
 [ 1807.658964]  [e351bf49] mld_ifc_timer_expire+0x217/0x260 [ipv6]
 [ 1807.658979]  [c0121a6d] run_timer_softirq+0xbf/0x1ae
 [ 1807.658982]  [c011e30c] __do_softirq+0x50/0xc1
 [ 1807.658986]  [c011e3c6] do_softirq+0x49/0x4b
 [ 1807.658989]  

Re: [PATCH 0/4]: powerpc/cell spidernet ethernet driver fixes

2006-08-11 Thread Sam Ravnborg
On Fri, Aug 11, 2006 at 12:03:37PM -0500, Linas Vepstas wrote:
 
 The following series of patches implement some fixes and performance
 improvements for the Spidernet ethernet device driver. The high point
 of the patch series is some code to implement a low-watermark interrupt
 on the transmit queue. The bundle of patches raises transmit performance 
 from some embarrassingly low value to a reasonable 730 Megabits per
 second for 1500 byte packets.
 
 Please note that the spider is an ethernet controller that is 
 integrated into the south bridge, and is thus available only on
 Cell-based platforms.
 
 These have been well-tested over the last few weeks. Please apply. 
Hi Linas.
Just noticed a nit-pick detail.
The general rule is to add your Signed-off-by: at the bottom of the
patch, so the top-most Signed-off-by: is also the original author whereas
the last Signed-off-by: is the one that added this patch to the kernel.
Likewise you add Cc: before your Signed-off-by: line.

See patches from Andrew Morton, for example.

Sam


Re: [PATCH 0/4]: powerpc/cell spidernet ethernet driver fixes

2006-08-11 Thread Linas Vepstas
On Fri, Aug 11, 2006 at 07:44:39PM +0200, Sam Ravnborg wrote:
  
  These have been well-tested over the last few weeks. Please apply. 
 Hi Linas.
 Just noticed a nit-pick detail.
 The general rule is to add your Signed-off-by: at the bottom of the
 patch, so the top-most Signed-off-by: is also the original author whereas
 the last Signed-off-by: is the one that added this patch to the kernel.

I put my name at the top when I was the primary author. 
I put Jim's name at the top when he was the primary author. 

Both names are there because I sat in Jim's office and used
his keyboard. I got him to compile and run the tests on
his hardware, and we'd then debate the results.

 Likewise you add Cc: before your Signed-off-by: line.

The patches I have from akpm have the Cc:'s after the 
Signed-off-by: line, not before.

--linas


Re: [PATCH 4/4]: powerpc/cell spidernet ethtool -i version number info.

2006-08-11 Thread Linas Vepstas

Hi Olof,

Olof Johansson [EMAIL PROTECTED] writes:
 On Fri, Aug 11, 2006 at 12:11:17PM -0500, Linas Vepstas wrote:
 
  This patch adds version information as reported by 
  ethtool -i to the Spidernet driver.
 
 Why does a driver that's in the mainline kernel need to have a version
 number besides the kernel version?

I'll let Jim be the primary defender. From what I can tell, that's the
way it's done.  For example:

linux-2.6.18-rc3-mm2 $ grep MODULE_VERSION */*/*.c |wc
    164     245    9081

 I can understand it for drivers like e1000 that Intel maintain outside
 of the kernel as well. But spidernet is a fully mainline maintained
 driver, right?

Yes, the spidernet is a Linux-kernel only driver.

--linas

p.s. very strange, but I did not see your original email;  
only saw Jim's reply.




Re: [3/4] kevent: AIO, aio_sendfile() implementation.

2006-08-11 Thread Ulrich Drepper
Sébastien Dugué wrote:
aio completion notification

I looked over this now but I don't think I understand everything.  Or I
don't see how it all is integrated.  And no, I'm not looking at the
proposed glibc code since that would mean being tainted.


 Details:
 ---
 
   A struct sigevent *aio_sigeventp is added to struct iocb in
 include/linux/aio_abi.h
 
   An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
 include/linux/aio.h:
 
   - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
 requesting thread 
 
   - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
 specific thread.

This has been proved to be sufficient in the timer code which basically
has the same problem.  But why do you need separate constants?  We have
the various SIGEV_* constants, among them SIGEV_THREAD_ID.  Just use
these constants for the values of ki_notify.


   The following fields are added to struct kiocb in include/linux/aio.h:
 
   - pid_t ki_pid: target of the signal
 
   - __u16 ki_signo: signal number
 
   - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
  IO_NOTIFY_THREAD_ID
 
   - uid_t ki_uid, ki_euid: filled with the submitter credentials

These two fields aren't needed for the POSIX interfaces.  Where does the
requirement come from?  I don't say they should be removed, they might
be useful, but if the costs are non-negligible then they could go away.


   - check whether the submitting thread wants to be notified directly
 (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent
 to another thread.
 In the latter case a check is made to assert that the target thread
 is in the same thread group

Is this really how it's implemented?  This is not how it should be.
Either a signal is sent to a specific thread in the same process (this
is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
process.  Sending a signal to the process means that from the kernel's
POV any thread which doesn't have the signal blocked can receive it.
The final decision is made by the kernel.  There is no mechanism to send
the signal to another process.

So, for the purpose of the POSIX AIO code the ki_pid value is only
needed when the SIGEV_THREAD_ID bit is set.

It could be an extension and I don't mind it being introduced.  But
again, it's not necessary and if it adds costs then it could be left
out.  It is something which could easily be introduced later if the need
arises.


   listio support
 

I really don't understand the kernel interface for this feature.


 Details:
 ---
 
   An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
 
   A struct lio_event is added in include/linux/aio.h
 
   A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h

So you have a pointer in the structure for the individual requests.  I
assume you use the atomic counter to trigger the final delivery.  I
further assume that if lio_wait is set the calling thread is suspended
until all requests are handled and that the final notification in this
case means that thread gets woken.

This is all fine.
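
To make that assumption concrete, here is a small userspace sketch of
"last completion delivers the notification", written with C11 atomics.
The struct and function names are hypothetical and only model the idea;
they are not taken from the proposed kernel patch:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a per-list control block: one counter shared
 * by every request that was submitted as part of the same list. */
struct lio_group {
	atomic_int pending;	/* requests not yet completed */
};

static void lio_group_init(struct lio_group *g, int nr_requests)
{
	atomic_init(&g->pending, nr_requests);
}

/* Called once per completed request.  Returns true only for the
 * completion that brings the counter to zero, i.e. for the caller
 * that should deliver the final notification or wake the waiter. */
static bool lio_request_done(struct lio_group *g)
{
	return atomic_fetch_sub(&g->pending, 1) == 1;
}
```

Because atomic_fetch_sub() returns the previous value, exactly one
completer observes 1 regardless of the order in which requests finish.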

But how do you pass the requests to the kernel?  If you have a new
lio_listio-like syscall it'll be easy.  But I haven't seen anything like
this mentioned.

The alternative is to pass the requests one-by-one in which case I don't
see how you create the reference to the lio_listio control block.  This
approach seems to be slower.

If all requests are passed at once, do you have the equivalent of
LIO_NOP entries?


How can we support the extension where we wait for a number of requests
which need not be all of them?  I.e., I submit N requests and want to be
notified when at least M (M <= N) have completed.  I am not yet clear about
the actual semantics we should implement (e.g., do we send another
notification after the first one?) but it's something which IMO should
be taken into account in the design.


Finally, and this is very important, does your code send out the
individual request notifications and then in the end the lio_listio
completion?  I think Suparna wrote this is the case but I want to make sure.


Overall, this looks much better than the old code.  If the answers to my
questions show that the behavior is compatible with the POSIX AIO code
I'm certainly very much in favor of adding the kernel code.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





[PATCH -mm] constify Tigon3 ether firmware structs

2006-08-11 Thread Andreas Mohr
Constify largish areas of firmware data in Tigon3 ethernet driver.


non-const:

lsmod:
tg3   101404  0

objdump -x:
.rodata 03e8
.data 4a0c

ls -l:
-rw-r--r-- 1 root root 114404 2006-08-19 21:36 drivers/net/tg3.ko


const:

lsmod:
tg3   101404  0

objdump -x:
.rodata 42c8
.data 0b4c

ls -l:
-rw-r--r-- 1 root root 114532 2006-08-19 21:06 drivers/net/tg3.ko



Compile- and run-tested with tg3 card on 2.6.18-rc3-mm2.

Any objections?
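
For illustration, a minimal sketch of the pattern the patch applies: a
table marked const can be placed in a read-only section, and any struct
member pointing at it has to become pointer-to-const. The array contents
and the fw_word_sum() helper are made up for this example and are not
taken from tg3.c; text_len counts words here purely for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up words standing in for a firmware image. */
static const uint32_t fw_text[] = { 0x10000003, 0x00000000, 0x0000000d };

struct fw_info {
	unsigned int text_len;		/* number of 32-bit words (sketch only) */
	const uint32_t *text_data;	/* pointer-to-const, so it may reference fw_text */
};

/* Readers only need const access, so no other code has to change. */
static uint32_t fw_word_sum(const struct fw_info *info)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < info->text_len; i++)
		sum += info->text_data[i];
	return sum;
}
```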


Signed-off-by: Andreas Mohr [EMAIL PROTECTED]


diff -urN linux-2.6.18-rc3-mm2.orig/drivers/net/tg3.c 
linux-2.6.18-rc3-mm2/drivers/net/tg3.c
--- linux-2.6.18-rc3-mm2.orig/drivers/net/tg3.c 2006-08-07 11:15:17.0 
+0200
+++ linux-2.6.18-rc3-mm2/drivers/net/tg3.c  2006-08-08 16:49:43.0 
+0200
@@ -267,7 +267,7 @@
 
 MODULE_DEVICE_TABLE(pci, tg3_pci_tbl);
 
-static struct {
+static const struct {
const char string[ETH_GSTRING_LEN];
 } ethtool_stats_keys[TG3_NUM_STATS] = {
{ "rx_octets" },
@@ -348,7 +348,7 @@
{ "nic_tx_threshold_hit" }
 };
 
-static struct {
+static const struct {
const char string[ETH_GSTRING_LEN];
 } ethtool_test_keys[TG3_NUM_TEST] = {
{ nvram test (online)  },
@@ -4963,7 +4963,7 @@
 #define TG3_FW_BSS_ADDR0x08000a70
 #define TG3_FW_BSS_LEN 0x10
 
-static u32 tg3FwText[(TG3_FW_TEXT_LEN / sizeof(u32)) + 1] = {
+static const u32 tg3FwText[(TG3_FW_TEXT_LEN / sizeof(u32)) + 1] = {
0x, 0x1003, 0x, 0x000d, 0x000d, 0x3c1d0800,
0x37bd3ffc, 0x03a0f021, 0x3c100800, 0x2610, 0x0e18, 0x,
0x000d, 0x3c1d0800, 0x37bd3ffc, 0x03a0f021, 0x3c100800, 0x26100034,
@@ -5057,7 +5057,7 @@
0x27bd0008, 0x03e8, 0x, 0x, 0x
 };
 
-static u32 tg3FwRodata[(TG3_FW_RODATA_LEN / sizeof(u32)) + 1] = {
+static const u32 tg3FwRodata[(TG3_FW_RODATA_LEN / sizeof(u32)) + 1] = {
0x35373031, 0x726c7341, 0x, 0x, 0x53774576, 0x656e7430,
0x, 0x726c7045, 0x76656e74, 0x3100, 0x556e6b6e, 0x45766e74,
0x, 0x, 0x, 0x, 0x66617461, 0x6c457272,
@@ -5122,13 +5122,13 @@
 struct fw_info {
unsigned int text_base;
unsigned int text_len;
-   u32 *text_data;
+   const u32 *text_data;
unsigned int rodata_base;
unsigned int rodata_len;
-   u32 *rodata_data;
+   const u32 *rodata_data;
unsigned int data_base;
unsigned int data_len;
-   u32 *data_data;
+   const u32 *data_data;
 };
 
 /* tp-lock is held. */
@@ -5260,7 +5260,7 @@
 #define TG3_TSO_FW_BSS_ADDR0x08001b80
 #define TG3_TSO_FW_BSS_LEN 0x894
 
-static u32 tg3TsoFwText[(TG3_TSO_FW_TEXT_LEN / 4) + 1] = {
+static const u32 tg3TsoFwText[(TG3_TSO_FW_TEXT_LEN / 4) + 1] = {
0x0e03, 0x, 0x08001b24, 0x, 0x1003, 0x,
0x000d, 0x000d, 0x3c1d0800, 0x37bd4000, 0x03a0f021, 0x3c100800,
0x2610, 0x0e10, 0x, 0x000d, 0x27bdffe0, 0x3c04fefe,
@@ -5547,7 +5547,7 @@
0xac470014, 0xac4a0018, 0x03e8, 0xac4b001c, 0x, 0x,
 };
 
-static u32 tg3TsoFwRodata[] = {
+static const u32 tg3TsoFwRodata[] = {
0x4d61696e, 0x43707542, 0x, 0x4d61696e, 0x43707541, 0x,
0x, 0x, 0x73746b6f, 0x6c64, 0x496e, 0x73746b6f,
0x2a2a, 0x, 0x53774576, 0x656e7430, 0x, 0x,
@@ -,7 +,7 @@
0x,
 };
 
-static u32 tg3TsoFwData[] = {
+static const u32 tg3TsoFwData[] = {
0x, 0x73746b6f, 0x6c64, 0x5f76312e, 0x362e3000, 0x,
0x, 0x, 0x, 0x, 0x, 0x,
0x,
@@ -5577,7 +5577,7 @@
 #define TG3_TSO5_FW_BSS_ADDR   0x00010f50
 #define TG3_TSO5_FW_BSS_LEN0x88
 
-static u32 tg3Tso5FwText[(TG3_TSO5_FW_TEXT_LEN / 4) + 1] = {
+static const u32 tg3Tso5FwText[(TG3_TSO5_FW_TEXT_LEN / 4) + 1] = {
0x0c004003, 0x, 0x00010f04, 0x, 0x1003, 0x,
0x000d, 0x000d, 0x3c1d0001, 0x37bde000, 0x03a0f021, 0x3c11,
0x2610, 0x0c004010, 0x, 0x000d, 0x27bdffe0, 0x3c04fefe,
@@ -5736,14 +5736,14 @@
0x, 0x, 0x,
 };
 
-static u32 tg3Tso5FwRodata[(TG3_TSO5_FW_RODATA_LEN / 4) + 1] = {
+static const u32 tg3Tso5FwRodata[(TG3_TSO5_FW_RODATA_LEN / 4) + 1] = {
0x4d61696e, 0x43707542, 0x, 0x4d61696e, 0x43707541, 0x,
0x, 0x, 0x73746b6f, 0x6c64, 0x, 0x,
0x73746b6f, 0x6c64, 0x, 0x, 0x66617461, 0x6c457272,
0x, 0x, 0x,
 };
 
-static u32 tg3Tso5FwData[(TG3_TSO5_FW_DATA_LEN / 4) + 1] = {
+static const u32 tg3Tso5FwData[(TG3_TSO5_FW_DATA_LEN / 4) + 1] = {
0x, 0x73746b6f, 0x6c64, 

Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-11 Thread Thomas Graf
* Alexey Kuznetsov [EMAIL PROTECTED] 2006-08-11 19:35
 Well, tc was supposed to use it, but this did not happen and
 it remained deficient.

Makes sense, especially for auto generated handles. I've been listening
to the notifications on a separate socket for this purpose. It would
make sense however to extend a wait_for_ack() function and report
back any eventual echoed objects to have a blocking operation as well.

 Actually, it was supposed to be done everywhere, but originator info
 did not propagate deep enough in many cases, especially in IPv6.
 So, this is not a hack, it is a good work. :-)

It does make sense; the way it has been implemented, if at all, is
creepy. Even worse, IPv6 is using current->pid, some other code
has been using the pid from NETLINK_CREDS() :-)

 Each socket, which subscribes to multicasts becomes sensitive
 to rcvbuf overflows. F.e. when you do control operations on a socket,
 which is subscribed to multicasts, the response can be lost in stream
 of events and -ENOBUFS generated instead. If it is a daemon, it can resync
 the state, but if it is a simple utility, it cannot recover.

Yes, for that reason it is recommended to use a separate socket
when receiving multicasts. Also because some of the multicast
code is buggy and provides the pid of the requestor's socket to
netlink_broadcast() leading to excluding that socket.

 Probably, unicasts sent due to NLM_F_ECHO should somehow override
 rcvbuf limits.
 
 This reminded me about a capital problem, found by openvz people.
 Frankly speaking, I still have no idea how to repair this, probably you
 will find a solution.
 
 Look: during a dump, skb allocation can fail (for many reasons,
 the most obvious is that rcvbuf space was eaten by multicasts).
 But error is not reported! Oops. The worst thing is that even if an error
 is reported, iproute would ignore it.

I'm not sure I understand this correctly, if rcvbuf space was eaten
by multicasts subsequent recvmsg() will follow invoking netlink_dump()
again and the dump continues.


[PATCH] netdev: change name length

2006-08-11 Thread Stephen Hemminger
Some improvements to robust name interface.  These API's are safe
now by convention, but it is worth providing some safety checks
against future bugs.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
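
For illustration only, a userspace sketch of the name check after this
change. valid_name() is a hypothetical copy of the kernel function, and
IFNAMSIZ is hard-coded to 16, the value Linux uses:

```c
#include <string.h>

#define IFNAMSIZ 16	/* same value as in Linux; hard-coded for this sketch */

/* Reject empty names, names that do not fit in IFNAMSIZ bytes
 * including the terminating NUL, "." and "..", and anything
 * containing a '/'. */
static int valid_name(const char *name)
{
	return !(*name == '\0'
		 || strlen(name) >= IFNAMSIZ
		 || !strcmp(name, ".")
		 || !strcmp(name, "..")
		 || strchr(name, '/'));
}
```

With this, a 15-character name is accepted and a 16-character one is
rejected, since the latter would leave no room for the NUL.
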
 net/core/dev.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- net-2.6.19.orig/net/core/dev.c
+++ net-2.6.19/net/core/dev.c
@@ -636,7 +636,8 @@ struct net_device * dev_get_by_flags(uns
  */
 int dev_valid_name(const char *name)
 {
-   return !(*name == '\0' 
+   return !(*name == '\0'
+|| strlen(name) >= IFNAMSIZ
 || !strcmp(name, ".")
 || !strcmp(name, "..")
 || strchr(name, '/'));
@@ -3198,13 +3199,15 @@ struct net_device *alloc_netdev(int size
struct net_device *dev;
int alloc_size;
 
+   BUG_ON(strlen(name) >= sizeof(dev->name));
+
/* ensure 32-byte alignment of both the device and private area */
alloc_size = (sizeof(*dev) + NETDEV_ALIGN_CONST) & ~NETDEV_ALIGN_CONST;
alloc_size += sizeof_priv + NETDEV_ALIGN_CONST;
 
p = kzalloc(alloc_size, GFP_KERNEL);
if (!p) {
-   printk(KERN_ERR "alloc_dev: Unable to allocate device.\n");
+   printk(KERN_ERR "alloc_netdev: Unable to allocate device.\n");
return NULL;
}
 


Re: [PATCH] netdev: change name length

2006-08-11 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 15:13:29 -0700

 Some improvements to robust name interface.  These API's are safe
 now by convention, but it is worth providing some safety checks
 against future bugs.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Applied, thanks Stephen.


Re: [DECNET] Fix to decnet rules compare function

2006-08-11 Thread David Miller
From: Steven Whitehouse [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 16:54:06 +0100

 Here is a fix to the DECnet rules compare function where we used
 32bit values rather than 16bit values. Spotted by Patrick McHardy.
 
 Cc: Patrick McHardy [EMAIL PROTECTED]
 Cc: Patrick Caulfield [EMAIL PROTECTED]
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]

Applied, thanks a lot.


Re: [DECNET] Fix to multiple tables routing

2006-08-11 Thread David Miller
From: Steven Whitehouse [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 15:27:42 +0100

 Here is a fix to Patrick McHardy's increase number of routing tables patch
 for DECnet. I did just test this and it appears to be working fine with
 this patch.
 
 Cc: Patrick McHardy [EMAIL PROTECTED]
 Cc: Patrick Caulfield [EMAIL PROTECTED]
 Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]

Applied, thanks Steven.


Re: skb_shared_info()

2006-08-11 Thread David Miller
From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Fri, 11 Aug 2006 18:00:19 +0400

  The e1000 issue is just one example of this, another
 
 What is this issue?

E1000 wants 16K buffers for jumbo MTU settings.

The reason is that the chip can only handle power-of-2 buffer
sizes, and next hop from 9K is 16K.

It is not possible to tell the chip to only accept 9K packets, you
must give it the whole next power of 2 buffer size for the MTU you
wish to use.

With skb_shared_info() overhead this becomes a 32K allocation
in the simplest implementation.
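
A rough sketch of the arithmetic, assuming the allocator rounds requests
of this size up to the next power of two. The helper names are
hypothetical and the trailer length stands in for the skb_shared_info()
overhead:

```c
#include <stddef.h>

/* Round n up to the next power of two (n > 0). */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Size of the allocation backing a receive buffer when a trailer of
 * trailer_len bytes must live in the same power-of-two sized chunk
 * as the hw_buf_len bytes handed to the chip. */
static size_t backing_alloc_size(size_t hw_buf_len, size_t trailer_len)
{
	return round_up_pow2(hw_buf_len + trailer_len);
}
```

A 9000 byte MTU rounds to a 16384 byte hardware buffer, and adding even
one trailing byte to that pushes the backing allocation to 32768.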

Whichever hardware person was trying to save some trace lines on the
chip should have consulted software folks before implementing things
this way :-)

 What's about aggregated tcp queue, I can guess you did not find
 place where to add protocol headers, but cannot figure out how
 adding non-pagecache references could help.

This is not the idea.  I'm trying to see if we can salvage non-SG
paths in the design.

The idea is that struct retransmit_queue entries could hold either
paged or non-paged data, based upon the capabilities of the transmit
device.

If we store raw kmalloc buffers, we cannot attach them to an arbitrary
skb because of skb_shared_info().  This is true even if we
purposefully allocate the necessary head room for these kmalloc based
buffers.

Its requirement to live in the skb->data area really does preclude
any kind of clever buffering schemes.

 I think Evgeniy's idea about inlining skb_shared_info to skb head
 is promising and simple enough.

I think you are talking about struct sk_buff area when you say skb
head.  It is confusing talk because when I hear this phrase my brain
says skb->head, which is exactly where we want to move skb_shared_info()
away from! :-)

 But it makes lots of sense to inline some short vector into skb head
 (and, probably, even a MAX_HEADER space _instead_ of space for
 fclone).

If inlined, one implementation of retransmit queue becomes apparent.
struct retransmit_queue is just list of data blobs, each represented
by usual vector of pages and offset and length.

Then transmission from write queue is merely propagating pages to
inline skb vector.

Some silly sample datastructure:

struct retransmit_block {
	struct list_head	rblk_node;
	void			*rblk_data;
	unsigned short		rblk_frags;
	skb_frag_t		frags[MAX_SKB_FRAGS];
};

struct retransmit_queue {
	struct list_head	rqueue_head;
	struct retransmit_block	*rqueue_send_head;
	int			rqueue_send_head_frag;
	unsigned int		rqueue_send_head_off;
};

tcp_sendmsg() and tcp_sendpage() just accumulate into the tail
retransmit_block until all of MAX_SKB_FRAGS are consumed.

tcp_write_xmit() and friends build skbs like this:

struct sk_buff *tcp_xmit_build(struct retransmit_block *rblk,
			       int frag, unsigned int off,
			       unsigned int len)
{
	struct sk_buff *skb = alloc_skb(MAX_HEADER, GFP_KERNEL);
	int ent;

	if (unlikely(!skb))
		return NULL;
	ent = 0;
	while (len) {
		unsigned int this_off = rblk->frags[frag].page_offset + off;
		unsigned int this_len = rblk->frags[frag].size - off;

		/* clamp to what is left of the request */
		if (this_len > len)
			this_len = len;

		skb->inline_info.frags[ent].page =
			rblk->frags[frag].page;
		skb->inline_info.frags[ent].page_offset = this_off;
		skb->inline_info.frags[ent].size = this_len;
		ent++;

		/* later frags are consumed from their start */
		frag++;
		off = 0;
		len -= this_len;
	}
	return skb;
}

(sorry, another outer loop is also needed to traverse to subsequent
 retransmit_blocks in the list when all of rblk_frags of current
 retransmit_block are consumed by inner loop)
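
The missing outer step can be modelled in userspace roughly as follows. This is only a sketch under assumptions: the model_* types stand in for retransmit_block and skb_frag_t, pages are left out entirely, blocks are chained with a plain next pointer instead of a list_head, and the bound on output entries is something the sample above does not have.

```c
#include <stddef.h>

#define MODEL_MAX_FRAGS 4	/* stand-in for MAX_SKB_FRAGS */

struct model_frag {
	unsigned int page_offset;
	unsigned int size;
};

struct model_rblk {
	struct model_rblk *next;	/* NULL at the end of the queue */
	unsigned short rblk_frags;	/* number of frags in use */
	struct model_frag frags[MODEL_MAX_FRAGS];
};

/*
 * Collect up to `max` fragment descriptors covering `len` bytes, starting
 * at byte `off` of fragment `frag` in `rblk`.  Returns the number of
 * entries written to `out`; *left receives the bytes not yet covered.
 */
static int model_gather(const struct model_rblk *rblk, int frag,
			unsigned int off, unsigned int len,
			struct model_frag *out, int max,
			unsigned int *left)
{
	int ent = 0;

	while (len && rblk && ent < max) {
		unsigned int this_len;

		if (frag >= rblk->rblk_frags) {
			/* outer step: block exhausted, move to the next one */
			rblk = rblk->next;
			frag = 0;
			continue;
		}
		this_len = rblk->frags[frag].size - off;
		if (this_len > len)
			this_len = len;

		out[ent].page_offset = rblk->frags[frag].page_offset + off;
		out[ent].size = this_len;
		ent++;

		frag++;
		off = 0;
		len -= this_len;
	}
	*left = len;
	return ent;
}
```

A nonzero *left on return means either the queue ran out of data or the output vector filled up, so the caller would have to send what it has and continue from the position reached.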

Depending upon how we do completion callbacks, as you discuss below,
either we'll need a get_page() refcount grab in that inner loop
or we won't.

 With aggregated tcp send queue, when transmitting a segment, you could
 allocate new skb head with space for header and either take existing
 skb_shared_info from queue, attach it to head and set offset/length.
 Or, alternatively, set one or two of page pointers in array, inlined in head.
 (F.e. in the case of AF_UNIX socket, mentioned by Evgeniy, we would keep data
 in pages and attach it directly to skb head).

The latter scheme is closer to what I was thinking about.  Why
not inline this entire fraglist thing alongside sk_buff?

 Cloning becomes more expensive, but who needs it cheap, if tcp does not?

Exactly :)

 One idea is to announce (some) skb_shared_info completely immutable,
 force each layer who needs to add a header or to fragment to refer
 to original skb_shared_info as whole, using for modifications
 another skb_shared_info() or area inlined in skb head.
 And if 

Re: skb_shared_info()

2006-08-11 Thread Herbert Xu
On Fri, Aug 11, 2006 at 05:27:30PM -0700, David Miller wrote:
 
 E1000 wants 16K buffers for jumbo MTU settings.
 
 The reason is that the chip can only handle power-of-2 buffer
 sizes, and next hop from 9K is 16K.
 
 It is not possible to tell the chip to only accept 9K packets, you
 must give it the whole next power of 2 buffer size for the MTU you
 wish to use.
 
 With skb_shared_info() overhead this becomes a 32K allocation
 in the simplest implementation.

I think this is no longer an issue because we've all come to the conclusion
that E1000 supports SG and therefore we can and should use 4K pages, no?

Whatever we do here, allocating 16K via kmalloc is very unlikely to
succeed in the near future so we have to go via the SG route anyway.
In which case we must have a head area which can accommodate the
skb_shared_info as we do now.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] CONFIG_PM=n slim: drivers/net/*

2006-08-11 Thread Alexey Dobriyan
Compile out dead code for CONFIG_PM=n users.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 drivers/net/amd8111e.c |   10 +-
 drivers/net/b44.c  |4 
 drivers/net/bnx2.c |4 
 drivers/net/tg3.c  |4 
 4 files changed, 21 insertions(+), 1 deletion(-)

--- a/drivers/net/amd8111e.c
+++ b/drivers/net/amd8111e.c
@@ -1769,6 +1769,8 @@ static void amd8111e_vlan_rx_kill_vid(st
 	spin_unlock_irq(&lp->lock);
 }
 #endif
+
+#ifdef CONFIG_PM
 static int amd8111e_enable_magicpkt(struct amd8111e_priv* lp)
 {
 	writel( VAL1|MPPLBA, lp->mmio + CMD3);
@@ -1789,6 +1791,8 @@ static int amd8111e_enable_link_change(s
 	readl(lp->mmio + CMD7);
return 0;
 }  
+#endif
+
 /* This function is called when a packet transmission fails to complete within 
a  resonable period, on the assumption that an interrupts have been failed or 
the  interface is locked up. This function will reinitialize the hardware */
 
 static void amd8111e_tx_timeout(struct net_device *dev)
@@ -1804,6 +1808,8 @@ static void amd8111e_tx_timeout(struct n
if(!err)
netif_wake_queue(dev);
 }
+
+#ifdef CONFIG_PM
 static int amd8111e_suspend(struct pci_dev *pci_dev, pm_message_t state)
 {  
struct net_device *dev = pci_get_drvdata(pci_dev);
@@ -1873,7 +1879,7 @@ static int amd8111e_resume(struct pci_de
 
return 0;
 }
-
+#endif
 
 static void __devexit amd8111e_remove_one(struct pci_dev *pdev)
 {
@@ -2152,8 +2158,10 @@ static struct pci_driver amd8111e_driver
.id_table   = amd8111e_pci_tbl,
.probe  = amd8111e_probe_one,
.remove = __devexit_p(amd8111e_remove_one),
+#ifdef CONFIG_PM
.suspend= amd8111e_suspend,
.resume = amd8111e_resume
+#endif
 };
 
 static int __init amd8111e_init(void)
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -2279,6 +2279,7 @@ static void __devexit b44_remove_one(str
pci_set_drvdata(pdev, NULL);
 }
 
+#ifdef CONFIG_PM
 static int b44_suspend(struct pci_dev *pdev, pm_message_t state)
 {
struct net_device *dev = pci_get_drvdata(pdev);
@@ -2336,14 +2337,17 @@ static int b44_resume(struct pci_dev *pd
netif_wake_queue(dev);
return 0;
 }
+#endif
 
 static struct pci_driver b44_driver = {
.name   = DRV_MODULE_NAME,
.id_table   = b44_pci_tbl,
.probe  = b44_init_one,
.remove = __devexit_p(b44_remove_one),
+#ifdef CONFIG_PM
 .suspend= b44_suspend,
 .resume = b44_resume,
+#endif
 };
 
 static int __init b44_init(void)
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5962,6 +5962,7 @@ bnx2_remove_one(struct pci_dev *pdev)
pci_set_drvdata(pdev, NULL);
 }
 
+#ifdef CONFIG_PM
 static int
 bnx2_suspend(struct pci_dev *pdev, pm_message_t state)
 {
@@ -6003,14 +6004,17 @@ bnx2_resume(struct pci_dev *pdev)
bnx2_netif_start(bp);
return 0;
 }
+#endif
 
 static struct pci_driver bnx2_pci_driver = {
.name   = DRV_MODULE_NAME,
.id_table   = bnx2_pci_tbl,
.probe  = bnx2_init_one,
.remove = __devexit_p(bnx2_remove_one),
+#ifdef CONFIG_PM
.suspend= bnx2_suspend,
.resume = bnx2_resume,
+#endif
 };
 
 static int __init bnx2_init(void)
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -11722,6 +11722,7 @@ static void __devexit tg3_remove_one(str
}
 }
 
+#ifdef CONFIG_PM
 static int tg3_suspend(struct pci_dev *pdev, pm_message_t state)
 {
struct net_device *dev = pci_get_drvdata(pdev);
@@ -11802,14 +11803,17 @@ out:
 
return err;
 }
+#endif
 
 static struct pci_driver tg3_driver = {
.name   = DRV_MODULE_NAME,
.id_table   = tg3_pci_tbl,
.probe  = tg3_init_one,
.remove = __devexit_p(tg3_remove_one),
+#ifdef CONFIG_PM
.suspend= tg3_suspend,
.resume = tg3_resume
+#endif
 };
 
 static int __init tg3_init(void)



Re: [PATCH] CONFIG_PM=n slim: drivers/net/*

2006-08-11 Thread Andrew Morton
On Sat, 12 Aug 2006 04:46:23 +0400
Alexey Dobriyan [EMAIL PROTECTED] wrote:

 +#ifdef CONFIG_PM
  static int amd8111e_suspend(struct pci_dev *pci_dev, pm_message_t state)
  {
   struct net_device *dev = pci_get_drvdata(pci_dev);
 @@ -1873,7 +1879,7 @@ static int amd8111e_resume(struct pci_de
  
   return 0;
  }
 -
 +#endif
  
  static void __devexit amd8111e_remove_one(struct pci_dev *pdev)
  {
 @@ -2152,8 +2158,10 @@ static struct pci_driver amd8111e_driver
   .id_table   = amd8111e_pci_tbl,
   .probe  = amd8111e_probe_one,
   .remove = __devexit_p(amd8111e_remove_one),
 +#ifdef CONFIG_PM
   .suspend= amd8111e_suspend,
   .resume = amd8111e_resume
 +#endif
  };

The preferred way is

#ifdef CONFIG_PM
static int amd8111e_suspend(...)
{
}
#else
#define amd8111e_suspend NULL
#define amd8111e_resume NULL
#endif
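
For reference, the pattern can be tried outside the kernel with a mock driver structure. This is a sketch only: struct model_driver and the model_* functions are invented stand-ins for struct pci_driver and the amd8111e callbacks, and CONFIG_PM is simply left undefined to model a CONFIG_PM=n build.

```c
#include <stddef.h>

struct model_dev;

/* Stand-in for struct pci_driver: the PM hooks are always present. */
struct model_driver {
	int (*suspend)(struct model_dev *dev);
	int (*resume)(struct model_dev *dev);
};

#ifdef CONFIG_PM
static int model_suspend(struct model_dev *dev) { (void)dev; return 0; }
static int model_resume(struct model_dev *dev) { (void)dev; return 0; }
#else
/* No PM: the names still exist, so the initializer needs no #ifdef. */
#define model_suspend NULL
#define model_resume NULL
#endif

static struct model_driver model_drv = {
	.suspend = model_suspend,
	.resume  = model_resume,
};

/* Returns 1 when both PM hooks are wired up, 0 otherwise. */
static int model_has_pm(const struct model_driver *drv)
{
	return drv->suspend != NULL && drv->resume != NULL;
}
```

The point of the design is that the only #ifdef sits around the function bodies; the driver structure initializer reads the same in both configurations.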


Re: [PATCH] CONFIG_PM=n slim: drivers/net/*

2006-08-11 Thread Alexey Dobriyan
On Fri, Aug 11, 2006 at 06:49:43PM -0700, Andrew Morton wrote:
 On Sat, 12 Aug 2006 04:46:23 +0400
 Alexey Dobriyan [EMAIL PROTECTED] wrote:

  +#ifdef CONFIG_PM
   static int amd8111e_suspend(struct pci_dev *pci_dev, pm_message_t state)
   {
  struct net_device *dev = pci_get_drvdata(pci_dev);
  @@ -1873,7 +1879,7 @@ static int amd8111e_resume(struct pci_de
 
  return 0;
   }
  -
  +#endif
 
   static void __devexit amd8111e_remove_one(struct pci_dev *pdev)
   {
  @@ -2152,8 +2158,10 @@ static struct pci_driver amd8111e_driver
  .id_table   = amd8111e_pci_tbl,
  .probe  = amd8111e_probe_one,
  .remove = __devexit_p(amd8111e_remove_one),
  +#ifdef CONFIG_PM
  .suspend= amd8111e_suspend,
  .resume = amd8111e_resume
  +#endif
   };

 The preferred way is

 #ifdef CONFIG_PM
 static int amd8111e_suspend(...)
 {
 }
 #else
 #define amd8111e_suspend NULL
 #define amd8111e_resume NULL
 #endif

Plenty of drivers already use first variant. Also this won't allow

struct pci_driver {
...
#ifdef CONFIG_PM
int (*suspend)(...);
int (*resume)(...);
#endif
...
};

which is good for a) space savings in CONFIG_PM=n case, and
b) making drivers care about CONFIG_PM=n users hard way aka compilation
failure.



Re: [PATCH] CONFIG_PM=n slim: drivers/net/*

2006-08-11 Thread Andrew Morton
On Sat, 12 Aug 2006 06:30:14 +0400
Alexey Dobriyan [EMAIL PROTECTED] wrote:

 On Fri, Aug 11, 2006 at 06:49:43PM -0700, Andrew Morton wrote:
  On Sat, 12 Aug 2006 04:46:23 +0400
  Alexey Dobriyan [EMAIL PROTECTED] wrote:
 
   +#ifdef CONFIG_PM
static int amd8111e_suspend(struct pci_dev *pci_dev, pm_message_t state)
{
 struct net_device *dev = pci_get_drvdata(pci_dev);
   @@ -1873,7 +1879,7 @@ static int amd8111e_resume(struct pci_de
  
 return 0;
}
   -
   +#endif
  
static void __devexit amd8111e_remove_one(struct pci_dev *pdev)
{
   @@ -2152,8 +2158,10 @@ static struct pci_driver amd8111e_driver
 .id_table   = amd8111e_pci_tbl,
 .probe  = amd8111e_probe_one,
 .remove = __devexit_p(amd8111e_remove_one),
   +#ifdef CONFIG_PM
 .suspend= amd8111e_suspend,
 .resume = amd8111e_resume
   +#endif
};
 
  The preferred way is
 
  #ifdef CONFIG_PM
  static int amd8111e_suspend(...)
  {
  }
  #else
  #define amd8111e_suspend NULL
  #define amd8111e_resume NULL
  #endif
 
 Plenty of drivers already use first variant.

That can be fixed.

 Also this won't allow
 
   struct pci_driver {
   ...
   #ifdef CONFIG_PM
   int (*suspend)(...);
   int (*resume)(...);
   #endif
   ...
   };
 
 which is good for a) space savings in CONFIG_PM=n case, and
 b) making drivers care about CONFIG_PM=n users hard way aka compilation
 failure.

eh?  Both versions will generate identical code.


Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-11 Thread Herbert Xu
Alexey Kuznetsov [EMAIL PROTECTED] wrote:
 
 Probably, unicasts sent due to NLM_F_ECHO should somehow override
 rcvbuf limits.

Actually I think the only safe solution is to allocate a separate
socket for multicast messages.  In other words, if you want reliable
unicast reception on a socket, don't bind it to a multicast group.
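
From userspace this could look roughly like the sketch below: one rtnetlink socket bound with no groups for requests and replies, and a second one bound to a multicast group for events. The helper names model_nl_addr() and model_nl_open() are made up, and RTMGRP_LINK is only an example group.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Build a netlink address; `groups` is the multicast group bitmask
 * (0 means unicast only).
 */
static struct sockaddr_nl model_nl_addr(unsigned int groups)
{
	struct sockaddr_nl sa;

	memset(&sa, 0, sizeof(sa));
	sa.nl_family = AF_NETLINK;
	sa.nl_groups = groups;
	return sa;
}

/* Open and bind one NETLINK_ROUTE socket; returns the fd or -1. */
static int model_nl_open(unsigned int groups)
{
	struct sockaddr_nl sa = model_nl_addr(groups);
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

An application would call model_nl_open(0) for its request socket and model_nl_open(RTMGRP_LINK) for its event socket, so a burst of multicast events cannot fill the receive buffer that unicast replies depend on.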

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 3/6] ehea: queue management

2006-08-11 Thread Jörn Engel
On Fri, 11 August 2006 09:28:26 +0200, Thomas Klein wrote:
 Michael Neuling wrote:
 +static inline u32 map_swqe_size(u8 swqe_enc_size)
 +static inline u32|map_rwqe_size(u8 rwqe_enc_size)
   
 Agreed. Functions were replaced by a single map_wqe_size() function.

Just a general thing, try to avoid having two identifiers that are
near-100% identical.  As seen in this thread, they are _very_ easy to
confuse.

IME, there are two methods to avoid this.  One is to make the
identifiers longer, something like map_seek_wqe_size and
map_read_wqe_size.  The other is to make them shorter; just "s" and
"r" is less confusing than the above.

Which method works best depends on many things, including personal
taste.

Jörn

-- 
And spam is a useful source of entropy for /dev/random too!
-- Jasmine Strong


Re: [take8 2/2] kevent: poll/select() notifications. Timer notifications.

2006-08-11 Thread Andrew Morton
On Fri, 11 Aug 2006 12:40:10 +0400
Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 
 poll/select() notifications. Timer notifications.
 
 This patch includes generic poll/select and timer notifications.
 
 kevent_poll works similar to epoll and has the same issues (callback
 is invoked not from internal state machine of the caller, but through
 process awake).
 
 Timer notifications can be used for fine grained per-process time 
 management, since interval timers are very inconvenient to use, 
 and they are limited.
 
 ...

 +static struct lock_class_key kevent_poll_key;
 +
 +void kevent_poll_reinit(struct file *file)
 +{
 + lockdep_set_class(&file->st.lock, &kevent_poll_key);
 +}

Why is this necessary?

 +#include <linux/kernel.h>
 +#include <linux/types.h>
 +#include <linux/list.h>
 +#include <linux/slab.h>
 +#include <linux/spinlock.h>
 +#include <linux/timer.h>
 +#include <linux/jiffies.h>
 +#include <linux/kevent.h>
 +
 +static void kevent_timer_func(unsigned long data)
 +{
 + struct kevent *k = (struct kevent *)data;
 + struct timer_list *t = k->st->origin;
 +
 + kevent_storage_ready(k->st, NULL, KEVENT_MASK_ALL);
 + mod_timer(t, jiffies + msecs_to_jiffies(k->event.id.raw[0]));
 +}
 +
 +static struct lock_class_key kevent_timer_key;
 +
 +static int kevent_timer_enqueue(struct kevent *k)
 +{
 + struct timer_list *t;
 + struct kevent_storage *st;
 + int err;
 +
 + t = kmalloc(sizeof(struct timer_list) + sizeof(struct kevent_storage), 
 + GFP_KERNEL);
 + if (!t)
 + return -ENOMEM;
 +
 + init_timer(t);
 + t->function = kevent_timer_func;
 + t->expires = jiffies + msecs_to_jiffies(k->event.id.raw[0]);
 + t->data = (unsigned long)k;

setup_timer().

 + st = (struct kevent_storage *)(t+1);

It would be cleaner to create

struct something {
struct timer_list timer;
struct kevent_storage storage;
};
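
A userspace sketch of that suggestion, with mock types standing in for struct timer_list and struct kevent_storage (all model_* names are invented for illustration): the wrapper gives both objects a name and lets the compiler compute the layout, instead of relying on (t+1) pointer arithmetic.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mock stand-ins for struct timer_list and struct kevent_storage. */
struct model_timer {
	unsigned long expires;
	void (*function)(unsigned long);
	unsigned long data;
};

struct model_storage {
	void *origin;
	int nready;
};

/* One allocation, two named members, no pointer arithmetic. */
struct model_timer_kevent {
	struct model_timer timer;
	struct model_storage storage;
};

static struct model_timer_kevent *model_timer_kevent_alloc(void)
{
	struct model_timer_kevent *tk = calloc(1, sizeof(*tk));

	if (!tk)
		return NULL;
	tk->storage.origin = &tk->timer;
	return tk;
}

/* Recover the wrapper from a pointer to its embedded storage. */
static struct model_timer_kevent *
model_from_storage(struct model_storage *st)
{
	return (struct model_timer_kevent *)
		((char *)st - offsetof(struct model_timer_kevent, storage));
}
```

In kernel code the last helper would normally be written with container_of(); the open-coded offsetof() form is used here only to keep the sketch self-contained.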

 + err = kevent_storage_init(t, st);
 + if (err)
 + goto err_out_free;
 + lockdep_set_class(&st->lock, &kevent_timer_key);

Why is this necessary?

 + 
 + kevent_storage_dequeue(st, k);
 + 
 + kfree(t);
 +
 + return 0;
 +}
 +
 +static int kevent_timer_callback(struct kevent *k)
 +{
 + struct kevent_storage *st = k->st;
 + struct timer_list *t = st->origin;
 +
 + if (!t)
 + return -ENODEV;
 + 
 + k->event.ret_data[0] = (__u32)jiffies;

What does this do?

Does it expose jiffies to userspace?

It truncates jiffies on 64-bit machines.
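
The truncation is easy to demonstrate in userspace. The sketch below uses fixed-width types in place of jiffies and __u32; model_store_ticks() and model_ticks_after() are made-up names, and the second helper only illustrates that truncated values can still be ordered if they are compared wrap-safely.

```c
#include <stdint.h>

/* Model of storing a 64-bit tick counter into a 32-bit event field. */
static uint32_t model_store_ticks(uint64_t ticks)
{
	return (uint32_t)ticks;	/* drops the upper 32 bits */
}

/* Wrap-safe "a is after b" comparison on the truncated values. */
static int model_ticks_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```

A reader of the 32-bit field cannot tell 0x100000005 from 5, so absolute values are meaningless across a wrap and only differences between nearby samples remain usable.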

 +late_initcall(kevent_init_timer);

module_init() would be more typical.  If there was a reason for using
late_initcall(), that reason should be commented.



Re: [PATCH 3/6] ehea: queue management

2006-08-11 Thread Thomas Klein

Mikey,

first of all thanks a lot for the effort you invested to review our code.
We're quite happy about the improvements we made due to your comments.
See our answers below.

Kind regards
Thomas



Michael Neuling wrote:

Please add comments to make the code more readable, especially at the
start of functions/structures to describe what they do.  A large readme
at the start of ehea_main.c which gave an overview of the driver design
would be really useful.


We'll improve comments over the next patch iterations.


+static int ipz_queue_ctor(struct ipz_queue *queue,
+  const u32 nr_of_pages,
+  const u32 pagesize, const u32 qe_size,
+  const u32 nr_of_sg)
+{
+int f;
+EDEB_EN(7, "nr_of_pages=%x pagesize=%x qe_size=%x",
+nr_of_pages, pagesize, qe_size);
+queue->queue_length = nr_of_pages * pagesize;
+queue->queue_pages = vmalloc(nr_of_pages * sizeof(void *));



+	if (!queue->queue_pages) {
+		EDEB(4, "ERROR!! didn't get the memory");
+		return 0;
+	}
+	memset(queue->queue_pages, 0, nr_of_pages * sizeof(void *));
+
+	for (f = 0; f < nr_of_pages; f++) {
+		(queue->queue_pages)[f] =
+			(struct ipz_page *)get_zeroed_page(GFP_KERNEL);
+		if (!(queue->queue_pages)[f]) {
+			break;
+		}
+	}
+	if (f < nr_of_pages) {
+		int g;
+		EDEB_ERR(4, "couldn't get 0ed pages queue=%p f=%x "
+			 "nr_of_pages=%x", queue, f, nr_of_pages);
+		for (g = 0; g < f; g++) {
+			free_page((unsigned long)(queue->queue_pages)[g]);
+		}
+		return 0;


If you return here when calling from ehea_create_eq, I think you are
leaking the queue->queue_pages allocation (the pages they point to are
freed correctly).


You're right. Fixed it.
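
The fixed patch is not shown in this thread. As an illustration only, a userspace model of an error path that releases both the pages and the pointer array might look like this; malloc/calloc stand in for vmalloc() and get_zeroed_page(), the model_* names are invented, and the fail_at parameter exists purely to force a failure for demonstration. The return convention (1 on success, 0 on failure) follows the quoted code.

```c
#include <stdlib.h>

struct model_queue {
	void **pages;
	unsigned int nr;
};

/*
 * fail_at: index at which the page allocation is forced to fail,
 * negative for "never".  Returns 1 on success, 0 on failure.
 */
static int model_queue_ctor(struct model_queue *q, unsigned int nr_pages,
			    size_t pagesize, int fail_at)
{
	unsigned int f;

	q->pages = calloc(nr_pages, sizeof(void *));
	if (!q->pages)
		return 0;

	for (f = 0; f < nr_pages; f++) {
		if (fail_at >= 0 && f == (unsigned int)fail_at)
			q->pages[f] = NULL;
		else
			q->pages[f] = calloc(1, pagesize);
		if (!q->pages[f])
			goto out_free_pages;
	}
	q->nr = nr_pages;
	return 1;

out_free_pages:
	while (f--)
		free(q->pages[f]);
	free(q->pages);		/* the array itself, not only the pages */
	q->pages = NULL;
	q->nr = 0;
	return 0;
}
```

Using a single unwind label keeps the two frees next to each other, which makes it harder to forget the outer allocation.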



+void ehea_cq_delete(struct ehea_cq *cq)
+{
+vfree(cq);
+}


This is used in only two places.  Do we need it?


No. The ehea_?q_new()/ehea_?q_delete() functions were entirely removed.



If we do... can we static inline?





+hret = ehea_h_alloc_resource_cq(adapter-handle,
+cq,
+cq-attr,
+cq-ipz_cq_handle, cq-galpas);


hret set twice...


Fixed.






+if (hret != H_SUCCESS) {
+EDEB_ERR(4, ehea_h_alloc_resource_cq failed. hret=%lx, hret);
+goto create_cq_exit1;
+}
+
+ipz_rc = ipz_queue_ctor(cq-ipz_queue, cq-attr.nr_pages,
+EHEA_PAGESIZE, sizeof(struct ehea_cqe), 0);
+if (!ipz_rc)
+goto create_cq_exit2;
+
+hret = H_SUCCESS;
+
+for (counter = 0; counter  cq-attr.nr_pages; counter++) {
+vpage = ipz_qpageit_get_inc(cq-ipz_queue);


vpage set twice...


vpage gets assigned to rpage via the virt_to_abs() call. So using
it again afterwards is ok.




+if (!vpage) {
+EDEB_ERR(4, ipz_qpageit_get_inc() 
+ returns NULL adapter=%p, adapter);
+goto create_cq_exit3;
+}
+
+rpage = virt_to_abs(vpage);
+
+hret = ehea_h_register_rpage_cq(adapter-handle,
+cq-ipz_cq_handle,
+0,
+HIPZ_CQ_REGISTER_ORIG,
+rpage, 1, cq-galpas.kernel);
+
+if (hret  H_SUCCESS) {
+EDEB_ERR(4, ehea_h_register_rpage_cq() failed 
+ ehea_cq=%p hret=%lx 
+ counter=%i act_pages=%i,
+ cq, hret, counter, cq-attr.nr_pages);
+goto create_cq_exit3;
+}
+
+if (counter == (cq-attr.nr_pages - 1)) {
+vpage = ipz_qpageit_get_inc(cq-ipz_queue);
+
+if ((hret != H_SUCCESS) || (vpage)) {
+EDEB_ERR(4, Registration of pages not 
+ complete ehea_cq=%p hret=%lx,
+ cq, hret)
+goto create_cq_exit3;
+}
+} else {
+if ((hret != H_PAGE_REGISTERED) || (vpage == 0)) {
+EDEB_ERR(4, Registration of page failed 
+ ehea_cq=%p hret=%lx
+ counter=%i act_pages=%i,
+ cq, hret, counter, cq-attr.nr_pages);
+goto create_cq_exit3;
+}
+}
+}




+void ehea_eq_delete(struct ehea_eq *eq)
+{
+vfree(eq);
+}


Again, is this really needed and what about static inline?


removed. see above.



+struct ehea_qp *ehea_qp_new(void) {
+struct ehea_qp *qp = vmalloc(sizeof(*qp));
+

[2.6.18-rc4] lockdep warning at inet6_addr_add

2006-08-11 Thread Luca
Hi,
I get a warning from lockdep during boot; 2.6.18-rc3 doesn't have this
warning. I see a similar report in the archive (I haven't found time to
test the patch...):

http://marc.theaimsgroup.com/?l=linux-netdev&m=115506258902757&w=2

but my stacktrace is a bit different, so I'm reporting this one too:

=
[ INFO: inconsistent lock state ]
-
inconsistent {in-softirq-W} -> {softirq-on-W} usage.
ifconfig/1812 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (ifa-lock){-+..}, at: [f1a0a4b9] inet6_addr_add+0xd9/0x160 [ipv6]
{in-softirq-W} state was registered at:
  [b01342dd] lock_acquire+0x5d/0x80
  [b030bafa] _spin_lock_bh+0x3a/0x50
  [f1a0b76b] addrconf_dad_timer+0x5b/0x100 [ipv6]
  [b0122bb9] run_timer_softirq+0x149/0x170
  [b011ee62] __do_softirq+0x62/0xc0
  [b011ef15] do_softirq+0x55/0x60
  [b011f18b] irq_exit+0x4b/0x50
  [b01059ec] do_IRQ+0x4c/0x90
  [b0103c15] common_interrupt+0x25/0x2c
  [b0101aa1] cpu_idle+0x41/0x70
  [b0100537] rest_init+0x37/0x40
  [b03fe7aa] start_kernel+0x2ba/0x360
  [b0100199] 0xb0100199
irq event stamp: 4501
hardirqs last  enabled at (4501): [b011f525] local_bh_enable_ip+0x95/0x110
hardirqs last disabled at (4499): [b011f4bf] local_bh_enable_ip+0x2f/0x110
softirqs last  enabled at (4500): [f1a0729e] ipv6_add_addr+0x3e/0x270 [ipv6]
softirqs last disabled at (4488): [b030bc6e] _read_lock_bh+0xe/0x50

other info that might help us debug this:
1 lock held by ifconfig/1812:
 #0:  (rtnl_mutex){--..}, at: [b030a73c] mutex_lock+0x1c/0x20

stack backtrace:
 [b0104312] show_trace+0x12/0x20
 [b01048e9] dump_stack+0x19/0x20
 [b01320de] print_usage_bug+0x23e/0x250
 [b0132a39] mark_lock+0x5a9/0x5c0
 [b0133a86] __lock_acquire+0x806/0xd20
 [b01342dd] lock_acquire+0x5d/0x80
 [b030baa5] _spin_lock+0x35/0x50
 [f1a0a4b9] inet6_addr_add+0xd9/0x160 [ipv6]
 [f1a0a7d9] addrconf_add_ifaddr+0x69/0x80 [ipv6]
 [f1a02342] inet6_ioctl+0x72/0x90 [ipv6]
 [b02aaa9e] sock_ioctl+0xfe/0x1f0
 [b01720f8] do_ioctl+0x28/0x80
 [b01721a7] vfs_ioctl+0x57/0x2c0
 [b0172449] sys_ioctl+0x39/0x60
 [b0103173] syscall_call+0x7/0xb


Luca
-- 
Home: http://kronoz.cjb.net
Il tempo speso
a coltivare sogni
non è sprecato.


[PATCH 0/4]: powerpc/cell spidernet ethernet driver fixes

2006-08-11 Thread Linas Vepstas

The following series of patches implement some fixes and performance
improvements for the Spidernet ethernet device driver. The high point
of the patch series is some code to implement a low-watermark interrupt
on the transmit queue. The bundle of patches raises transmit performance
from some embarrassingly low value to a reasonable 730 Megabits per
second for 1500 byte packets.

Please note that the spider is an ethernet controller that is 
integrated into the south bridge, and is thus available only on
Cell-based platforms.

These have been well-tested over the last few weeks. Please apply. 

--linas


[PATCH 2/4]: powerpc/cell spidernet low watermark patch.

2006-08-11 Thread Linas Vepstas


Implement basic low-watermark support for the transmit queue.

The basic idea of a low-watermark interrupt is as follows.
The device driver queues up a bunch of packets for the hardware
to transmit, and then kicks the hardware to get it started.
As the hardware drains the queue of pending, untransmitted
packets, the device driver will want to know when the queue
is almost empty, so that it can queue some more packets.

This is accomplished by setting the DESCR_TXDESFLG flag in
one of the packets. When the hardware sees this flag, it will
interrupt the device driver. Because this flag is on a fixed
packet, rather than at a fixed location in the queue, the
code below needs to move the flag as more packets are
queued up. This implementation attempts to keep the flag
at about 3/4 of the way into the queue.
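
The flag-moving step can be exercised on its own with a small userspace model of a descriptor ring. This is a sketch, not the driver code: the model_* types and the two flag values are invented, and the return value (how many descriptors from the tail the flag landed on, or -1 for an empty queue) is added only to make the result observable.

```c
#include <stddef.h>

#define MODEL_NOT_IN_USE 0x1u	/* descriptor holds no packet */
#define MODEL_TXDESFLG   0x2u	/* interrupt when hardware reaches it */

struct model_descr {
	struct model_descr *next;
	unsigned int status;
};

struct model_chain {
	struct model_descr *head;		/* next free slot */
	struct model_descr *tail;		/* oldest queued descriptor */
	struct model_descr *low_watermark;	/* currently flagged, or NULL */
};

static int model_set_low_watermark(struct model_chain *c)
{
	struct model_descr *d = c->tail;
	int cnt = 0;
	int i;

	/* Measure the length of the queue. */
	while (d != c->head && !(d->status & MODEL_NOT_IN_USE)) {
		d = d->next;
		cnt++;
	}
	if (cnt == 0)
		return -1;

	/* Walk 3/4 of the way from the tail towards the head. */
	d = c->tail;
	cnt = (cnt * 3) / 4;
	for (i = 0; i < cnt; i++)
		d = d->next;

	/* Flag the new position, then clear the previous one. */
	d->status |= MODEL_TXDESFLG;
	if (c->low_watermark && c->low_watermark != d)
		c->low_watermark->status &= ~MODEL_TXDESFLG;
	c->low_watermark = d;
	return cnt;
}
```

With four queued descriptors the flag ends up on the fourth one (index 3 from the tail), so the interrupt fires while one packet is still pending.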

This patch boosts driver performance from about 
300-400Mbps for 1500 byte packets, to about 710-740Mbps.


Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: James K Lewis [EMAIL PROTECTED]
Cc: Utz Bacher [EMAIL PROTECTED]
Cc: Jens Osterkamp [EMAIL PROTECTED]
Cc: Arnd Bergmann [EMAIL PROTECTED]


 drivers/net/spider_net.c |   56 ++-
 drivers/net/spider_net.h |6 +++--
 2 files changed, 55 insertions(+), 7 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c  2006-08-07 
14:39:38.0 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c   2006-08-11 
11:23:24.0 -0500
@@ -700,6 +700,39 @@ spider_net_release_tx_descr(struct spide
dev_kfree_skb_any(skb);
 }
 
+static void
+spider_net_set_low_watermark(struct spider_net_card *card)
+{
+	int status;
+	int cnt=0;
+	int i;
+	struct spider_net_descr *descr = card->tx_chain.tail;
+
+	/* Measure the length of the queue. */
+	while (descr != card->tx_chain.head) {
+		status = descr->dmac_cmd_status & SPIDER_NET_DESCR_NOT_IN_USE;
+		if (status == SPIDER_NET_DESCR_NOT_IN_USE)
+			break;
+		descr = descr->next;
+		cnt++;
+	}
+	if (cnt == 0)
+		return;
+
+	/* Set low-watermark 3/4th's of the way into the queue. */
+	descr = card->tx_chain.tail;
+	cnt = (cnt*3)/4;
+	for (i=0; i<cnt; i++)
+		descr = descr->next;
+
+	/* Set the new watermark, clear the old watermark */
+	descr->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
+	if (card->low_watermark && card->low_watermark != descr)
+		card->low_watermark->dmac_cmd_status =
+		     card->low_watermark->dmac_cmd_status &
+		     ~SPIDER_NET_DESCR_TXDESFLG;
+	card->low_watermark = descr;
+}
+
+
 /**
  * spider_net_release_tx_chain - processes sent tx descriptors
  * @card: adapter structure
@@ -717,6 +750,7 @@ spider_net_release_tx_chain(struct spide
 {
 	struct spider_net_descr_chain *chain = &card->tx_chain;
int status;
+   int rc=0;
 
spider_net_read_reg(card, SPIDER_NET_GDTDMACCNTR);
 
@@ -729,8 +763,10 @@ spider_net_release_tx_chain(struct spide
break;
 
case SPIDER_NET_DESCR_CARDOWNED:
-   if (!brutal)
-   return 1;
+   if (!brutal) {
+   rc = 1;
+   goto done;
+   }
/* fallthrough, if we release the descriptors
 * brutally (then we don't care about
 * SPIDER_NET_DESCR_CARDOWNED) */
@@ -747,12 +783,15 @@ spider_net_release_tx_chain(struct spide
 
default:
card-netdev_stats.tx_dropped++;
-   return 1;
+   rc = 1;
+   goto done;
}
spider_net_release_tx_descr(card);
}
-
-   return 0;
+done:
+   if (rc == 1)
+   spider_net_set_low_watermark(card);
+   return rc;
 }
 
 /**
@@ -1453,6 +1492,10 @@ spider_net_interrupt(int irq, void *ptr,
spider_net_rx_irq_off(card);
netif_rx_schedule(netdev);
}
+	if (status_reg & SPIDER_NET_TXINT ) {
+		spider_net_cleanup_tx_ring(card);
+		netif_wake_queue(netdev);
+	}
 
 	if (status_reg & SPIDER_NET_ERRINT )
 		spider_net_handle_error_irq(card, status_reg);
@@ -1615,6 +1658,9 @@ spider_net_open(struct net_device *netde
card-descr,
PCI_DMA_TODEVICE, tx_descriptors))
goto alloc_tx_failed;
+
+   card->low_watermark = NULL;
+
if (spider_net_init_chain(card, card-rx_chain,
card-descr + tx_descriptors,
PCI_DMA_FROMDEVICE, rx_descriptors))

[PATCH 1/4]: powerpc/cell spidernet burst alignment patch.

2006-08-11 Thread Linas Vepstas


This patch increases the Burst Address alignment from 64 to 1024 in the
Spidernet driver. This improves transmit performance for large packets
from about 100Mbps to 300-400Mbps.

Signed-off-by: James K Lewis [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Utz Bacher [EMAIL PROTECTED]
Cc: Jens Osterkamp [EMAIL PROTECTED]
Cc: Arnd Bergmann [EMAIL PROTECTED]


 drivers/net/spider_net.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h  2006-08-07 
14:37:10.0 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h   2006-08-11 
11:09:57.0 -0500
@@ -209,7 +209,7 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_DMA_RX_FEND_VALUE   0x00030003
 /* to set TX_DMA_EN */
 #define SPIDER_NET_TX_DMA_EN   0x8000
-#define SPIDER_NET_GDTDCEIDIS  0x0002
+#define SPIDER_NET_GDTDCEIDIS  0x0302
 #define SPIDER_NET_DMA_TX_VALUE        SPIDER_NET_TX_DMA_EN | \
SPIDER_NET_GDTDCEIDIS
 #define SPIDER_NET_DMA_TX_FEND_VALUE   0x00030003


[PATCH 4/4]: powerpc/cell spidernet ethtool -i version number info.

2006-08-11 Thread Linas Vepstas


This patch adds version information as reported by 
ethtool -i to the Spidernet driver.

Signed-off-by: James K Lewis [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Utz Bacher [EMAIL PROTECTED]
Cc: Jens Osterkamp [EMAIL PROTECTED]


 drivers/net/spider_net.c |3 +++
 drivers/net/spider_net.h |2 ++
 drivers/net/spider_net_ethtool.c |2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c  2006-08-11 
11:34:16.0 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c   2006-08-11 
11:38:48.0 -0500
@@ -55,6 +55,7 @@ MODULE_AUTHOR(Utz Bacher [EMAIL PROTECTED]
  [EMAIL PROTECTED]);
 MODULE_DESCRIPTION(Spider Southbridge Gigabit Ethernet driver);
 MODULE_LICENSE(GPL);
+MODULE_VERSION(VERSION);
 
 static int rx_descriptors = SPIDER_NET_RX_DESCRIPTORS_DEFAULT;
 static int tx_descriptors = SPIDER_NET_TX_DESCRIPTORS_DEFAULT;
@@ -2293,6 +2294,8 @@ static struct pci_driver spider_net_driv
  */
 static int __init spider_net_init(void)
 {
+   printk("spidernet Version %s.\n",VERSION);
+
if (rx_descriptors  SPIDER_NET_RX_DESCRIPTORS_MIN) {
rx_descriptors = SPIDER_NET_RX_DESCRIPTORS_MIN;
pr_info(adjusting rx descriptors to %i.\n, rx_descriptors);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h  2006-08-11 
11:19:47.0 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h   2006-08-11 
11:38:48.0 -0500
@@ -24,6 +24,8 @@
 #ifndef _SPIDER_NET_H
 #define _SPIDER_NET_H
 
+#define VERSION "1.1 A"
+
 #include sungem_phy.h
 
 extern int spider_net_stop(struct net_device *netdev);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net_ethtool.c
===
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net_ethtool.c  2006-06-17 
20:49:35.0 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net_ethtool.c   2006-08-11 
11:38:48.0 -0500
@@ -55,7 +55,7 @@ spider_net_ethtool_get_drvinfo(struct ne
/* clear and fill out info */
memset(drvinfo, 0, sizeof(struct ethtool_drvinfo));
    strncpy(drvinfo->driver, spider_net_driver_name, 32);
-   strncpy(drvinfo->version, "0.1", 32);
+   strncpy(drvinfo->version, VERSION, 32);
    strcpy(drvinfo->fw_version, "no information");
    strncpy(drvinfo->bus_info, pci_name(card->pdev), 32);
 }


[PATCH 3/4]: powerpc/cell spidernet stop error printing patch.

2006-08-11 Thread Linas Vepstas

Turn off mis-interpretation of the queue-empty interrupt
status bit as an error. This bit is set as a part of 
the previous low-watermark patch.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: James K Lewis [EMAIL PROTECTED]
Cc: Utz Bacher [EMAIL PROTECTED]
Cc: Jens Osterkamp [EMAIL PROTECTED]
Cc: Arnd Bergmann [EMAIL PROTECTED]


 drivers/net/spider_net.c |   13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c  2006-08-11 
11:23:24.0 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c   2006-08-11 
11:34:16.0 -0500
@@ -1275,12 +1275,15 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_PHYINT:
case SPIDER_NET_GMAC2INT:
case SPIDER_NET_GMAC1INT:
-   case SPIDER_NET_GIPSINT:
case SPIDER_NET_GFIFOINT:
case SPIDER_NET_DMACINT:
case SPIDER_NET_GSYSINT:
break; */
 
+   case SPIDER_NET_GIPSINT:
+   show_error = 0;
+   break;
+
case SPIDER_NET_GPWOPCMPINT:
/* PHY write operation completed */
show_error = 0;
@@ -1339,9 +1342,10 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDTDCEINT:
/* chain end. If a descriptor should be sent, kick off
 * tx dma
-   if (card->tx_chain.tail == card->tx_chain.head)
+   if (card->tx_chain.tail != card->tx_chain.head)
spider_net_kick_tx_dma(card);
-   show_error = 0; */
+   */
+   show_error = 0;
break;
 
/* case SPIDER_NET_G1TMCNTINT: not used. print a message */
@@ -1455,8 +1459,9 @@ spider_net_handle_error_irq(struct spide
}
 
if ((show_error) && (netif_msg_intr(card)))
-   pr_err("Got error interrupt, GHIINT0STS = 0x%08x, "
+   pr_err("Got error interrupt on %s, GHIINT0STS = 0x%08x, "
   "GHIINT1STS = 0x%08x, GHIINT2STS = 0x%08x\n",
+  card->netdev->name,
   status_reg, error_reg1, error_reg2);
 
/* clear interrupt sources */


Re: [PATCH 0/4]: powerpc/cell spidernet ethernet driver fixes

2006-08-11 Thread jschopp

Linas Vepstas wrote:

The following series of patches implement some fixes and performance
improvements for the Spidernet ethernet device driver. The high point
of the patch series is some code to implement a low-watermark interrupt
on the transmit queue. The bundle of patches raises transmit performance
from some embarrassingly low value to a reasonable 730 Megabits per
second for 1500 byte packets.

Please note that the spider is an ethernet controller that is
integrated into the south bridge, and is thus available only on
Cell-based platforms.

These have been well-tested over the last few weeks. Please apply.


With these patches the spidernet driver performance goes from bad to usable.  They are all
good changes.  I'd expect some more bottlenecks to be identified now that these are
cleared.  Maybe Jim and Linas can get the driver performance up from usable to good next.


Acked-by: Joel Schopp [EMAIL PROTECTED]


Re: [PATCH 0/4]: powerpc/cell spidernet ethernet driver fixes

2006-08-11 Thread Arnd Bergmann
On Friday 11 August 2006 21:31, Linas Vepstas wrote:
 I put my name at the top when I was the primary author. 
 I put Jim's name at the top when he was the primary author. 
 
 Both names are there because I sat in Jim's office and used
 his keyboard. I got him to compile and run the tests on
 his hardware, and we'd then debate the results.

When the patches get added to a git repository, they end up
having your name on them, because the author is determined
from the person who sent the patch.

For the patches where Jim is the main author, you should put a 

From: James K Lewis [EMAIL PROTECTED]

into the first line of the email body. That will make the
scripts do the right thing. The order of the Signed-off-by:
lines is independent of authorship and should
list the name of the submitter last.

Arnd 


Re: [PATCH 1/6] ehea: interface to network stack

2006-08-11 Thread Anton Blanchard

Hi,

 --- linux-2.6.18-rc4-orig/drivers/net/ehea/ehea_main.c1969-12-31 

 +#define DEB_PREFIX main

Doesn't appear to be used.

 +static struct net_device_stats *ehea_get_stats(struct net_device *dev)
...
 + cb2 = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);

I can't see where this gets freed.

 +
 + skb_index = ((index - i
 +   + port_res->skb_arr_sq_len)
 +  % port_res->skb_arr_sq_len);

This is going to force an expensive divide. It's much better to change
this to the simpler and quicker:

i++;
if (i  max)
i = 0;

There are a few places in the driver that can be changed to do this.
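
A minimal user-space sketch of the wrap-around stepping suggested above,
assuming len is the number of slots in the ring (valid indices 0..len-1).
The helper names ring_next and ring_prev are hypothetical and are not taken
from the ehea driver; they only illustrate replacing the modulo with a
compare.

```c
#include <assert.h>

/* Advance a ring index by one slot without using the modulo operator.
 * Assumes len > 0 and 0 <= i < len (hypothetical helper, not driver code). */
static unsigned int ring_next(unsigned int i, unsigned int len)
{
	assert(len > 0 && i < len);
	i++;
	if (i >= len)	/* stepped past the last slot, wrap to the start */
		i = 0;
	return i;
}

/* Step a ring index back by one slot, again without a divide. */
static unsigned int ring_prev(unsigned int i, unsigned int len)
{
	assert(len > 0 && i < len);
	return (i == 0) ? len - 1 : i - 1;
}
```

Walking backwards from index with ring_prev() visits the same slots as the
quoted (index - i + len) % len expression, one compare per step instead of
one divide.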

 +static int ehea_setup_single_port(struct ehea_adapter *adapter,
 +   int portnum, struct device_node *dn)
...
 + cb4 = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);

I can't see where this is freed.
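
One common way to avoid this kind of leak is a single exit label that
releases the buffer on every path. The following is only a user-space sketch
of that pattern: calloc()/free() stand in for kzalloc()/kfree(), and
demo_cb, demo_query and the simulated failure flag are hypothetical names,
not part of the driver.

```c
#include <stdlib.h>

/* Hypothetical control block; stands in for the buffer handed to the hcall. */
struct demo_cb {
	unsigned long counters[8];
};

/* Returns 0 on success, -1 on failure. Every path that allocated cb
 * leaves through the 'out' label, so the buffer is always released. */
static int demo_query(int simulate_hcall_failure, unsigned long *result)
{
	struct demo_cb *cb;
	int ret = -1;

	cb = calloc(1, sizeof(*cb));	/* user-space analogue of kzalloc() */
	if (cb == NULL)
		return -1;

	if (simulate_hcall_failure)
		goto out;

	*result = cb->counters[0];	/* zero, since the buffer was cleared */
	ret = 0;
out:
	free(cb);			/* analogue of kfree() */
	return ret;
}
```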

Anton


Re: [PATCH 2/6] ehea: pHYP interface

2006-08-11 Thread Anton Blanchard

Hi,

 --- linux-2.6.18-rc4-orig/drivers/net/ehea/ehea_phyp.c1969-12-31 
 16:00:00.0 -0800

 +u64 ehea_h_alloc_resource_eq(const u64 hcp_adapter_handle,
...
 +u64 hipz_h_reregister_pmr(const u64 adapter_handle,
...
 +static inline int hcp_galpas_ctor(struct h_galpas *galpas,

Be nice to have some consistent names, hipz_ and hcp_ is kind of
cryptic.

 +#define H_QP_CR_STATE_RESET  0x0100  /*  Reset */

Probably want ULL on here and the other 64-bit constants.
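
A small sketch of why the suffix matters, assuming a platform where int is
32 bits wide. The FLAG_PLAIN and FLAG_ULL values are made up for
illustration and are not the constants from the patch.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define FLAG_PLAIN	0x80000000	/* type: unsigned int (32 bits here) */
#define FLAG_ULL	0x80000000ULL	/* type: unsigned long long (64 bits) */

/* Build a 64-bit mask that clears the flag bit. */
static uint64_t clear_mask_plain(void)
{
	return ~FLAG_PLAIN;	/* complement in 32 bits, then zero-extended */
}

static uint64_t clear_mask_ull(void)
{
	return ~FLAG_ULL;	/* complement in 64 bits, upper half stays set */
}
```

With the unsuffixed constant the upper 32 bits of the mask are silently
lost, which is the kind of surprise the ULL suffix avoids when the values
are combined with u64 fields.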

Anton


Re: [PATCH 2/6] ehea: pHYP interface

2006-08-11 Thread Nathan Lynch
Nathan Lynch wrote:

 Hope all the callers of this function are in non-atomic context (but I
 wasn't able to find any callers?).

Never mind, I somehow missed the users of ehea_hcall_9arg_9ret in this
patch, sorry.


Re: [PATCH 2/6] ehea: pHYP interface

2006-08-11 Thread Nathan Lynch
Hi-

Jan-Bernd Themann wrote:
 +static inline long ehea_hcall_9arg_9ret(unsigned long opcode,
 + unsigned long arg1,
 + unsigned long arg2,
 + unsigned long arg3,
 + unsigned long arg4,
 + unsigned long arg5,
 + unsigned long arg6,
 + unsigned long arg7,
 + unsigned long arg8,
 + unsigned long arg9,
 + unsigned long *out1,
 + unsigned long *out2,
 + unsigned long *out3,
 + unsigned long *out4,
 + unsigned long *out5,
 + unsigned long *out6,
 + unsigned long *out7,
 + unsigned long *out8,
 + unsigned long *out9)
 +{
 + long hret = H_SUCCESS;
 + int i, sleep_msecs;
 +
 + EDEB_EN(7, "opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
 + "arg5=%lx arg6=%lx arg7=%lx arg8=%lx arg9=%lx",
 + opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
 + arg8, arg9);
 +
 +
 + for (i = 0; i < 5; i++) {
 + hret = plpar_hcall_9arg_9ret(opcode,
 + arg1, arg2, arg3, arg4,
 + arg5, arg6, arg7, arg8,
 + arg9,
 + out1, out2, out3, out4,
 + out5, out6, out7, out8,
 + out9);
 +
 + if (H_IS_LONG_BUSY(hret)) {
 + sleep_msecs = get_longbusy_msecs(hret);
 + msleep_interruptible(sleep_msecs);
 + continue;
 + }

Looping five times before giving up seems arbitrary and failure-prone
on busy systems.

Is msleep_interruptible (as opposed to msleep) really appropriate?

Hope all the callers of this function are in non-atomic context (but I
wasn't able to find any callers?).

And this function is too big to be inline.



Re: [PATCH 4/6] ehea: header files

2006-08-11 Thread Anton Blanchard

Hi,

  drivers/net/ehea/ehea.h|  452 

 +#define EHEA_DRIVER_NAME "IBM eHEA"

You are using this for ethtool get_drvinfo. I'm not sure if it should
match the module name, and I worry about having a space in the name. Any
ideas on what we should be doing here?

 +#define NET_IP_ALIGN 0

Shouldn't override this in your driver.

 +#define EDEB_P_GENERIC(level, idstring, format, args...) \
 +#define EDEB_P_GENERIC(level,idstring,format,args...) \
 +#define EDEB(level, format, args...) \
 +#define EDEB_ERR(level, format, args...) \
 +#define EDEB_EN(level, format, args...) \
 +#define EDEB_EX(level, format, args...) \
 +#define EDEB_DMP(level, adr, len, format, args...) \

There are a lot of debug statements in the driver. When doing a review
I stripped them all out to make it easier to read. As suggested by
others, using the standard debug macros (where still required) would be
a good idea.

Anton


Re: [PATCH 3/6] ehea: queue management

2006-08-11 Thread Anton Blanchard

Hi,

 --- linux-2.6.18-rc4-orig/drivers/net/ehea/ehea_ethtool.c 1969-12-31 

 +static void netdev_get_pauseparam(struct net_device *dev,
 +   struct ethtool_pauseparam *pauseparam)
 +{
 + printk("get pauseparam\n");
 +}

There are a number of stubbed out ethtool functions like this. Best not
to implement them and allow the upper layers to return a correct error.
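
A simplified model of what leaving the hook unset buys you: the caller can
tell "not implemented" apart from "implemented" and report an error. The
demo_ops table and the core_* dispatchers below are hypothetical stand-ins,
not the kernel's struct ethtool_ops or its ethtool core.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for an ops table (not the kernel's struct ethtool_ops). */
struct demo_ops {
	int (*get_link)(void);
	int (*get_pauseparam)(int *autoneg);
};

static int demo_get_link(void)
{
	return 1;	/* pretend the link is up */
}

/* Only the implemented operation is filled in; get_pauseparam stays NULL. */
static const struct demo_ops demo_driver_ops = {
	.get_link = demo_get_link,
};

/* Upper layer: report "not supported" when the driver left the hook unset. */
static int core_get_pauseparam(const struct demo_ops *ops, int *autoneg)
{
	if (ops->get_pauseparam == NULL)
		return -EOPNOTSUPP;
	return ops->get_pauseparam(autoneg);
}

static int core_get_link(const struct demo_ops *ops)
{
	if (ops->get_link == NULL)
		return -EOPNOTSUPP;
	return ops->get_link();
}
```

A stub that only prints a message would instead report success with
uninitialized data, which is harder for user space to detect.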

Anton


Re: [PATCH 4/6] ehea: header files

2006-08-11 Thread Anton Blanchard

 --- linux-2.6.18-rc4-orig/drivers/net/ehea/ehea.h 1969-12-31 

 +extern void exit(int);

Should be able to remove that prototype :) 

Anton


Re: [2.6.18-rc4] lockdep warning at inet6_addr_add

2006-08-11 Thread Herbert Xu
Luca [EMAIL PROTECTED] wrote:
 I get a warning from lockdep during boot; 2.6.18-rc3 doesn't have this
 warning. I see a similar report in the archive (I haven't found time to
 test the patch...):
 
 http://marc.theaimsgroup.com/?l=linux-netdev&m=115506258902757&w=2

It's the same issue.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


crash with bcm43xx_dscape and multiple interfaces

2006-08-11 Thread Johannes Berg

Hey,

I managed to crash it again :P

Here's approximately what I did:


johannes:/home/johannes# ifconfig wlan0 down
johannes:/home/johannes# cd /sys/class/ieee80211/phy0/
johannes:/sys/class/ieee80211/phy0# echo -n moni0 > add_iface
johannes:/sys/class/ieee80211/phy0# iwconfig wlan0 mode master
johannes:/sys/class/ieee80211/phy0# iwconfig wlan0 essid test
johannes:/sys/class/ieee80211/phy0# ifconfig moni0 down
johannes:/sys/class/ieee80211/phy0# iwconfig moni0 mode monitor
johannes:/sys/class/ieee80211/phy0# ifconfig wlan0 up
johannes:/sys/class/ieee80211/phy0# ifconfig moni0 up
Segmentation fault

bcm43xx_d80211: ASSERTION FAILED (bcm->cached_beacon) at: drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c:1754:bcm43xx_update_templates()
bcm43xx_d80211: ASSERTION FAILED (bcm->cached_beacon) at: drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c:1603:bcm43xx_write_beacon_template()
Unable to handle kernel paging request for data at address 0x0060
Faulting instruction address: 0xf24cf308
Oops: Kernel access of bad area, sig: 11 [#1]
Aug 10 20:55:18 johannes kernel: [ 1095.326784]
Modules linked in: af_packet radeon drm binfmt_misc hci_usb rfcomm l2cap
bluetooth nls_utf8 hfsplus nls_base joydev appletouch usbhid snd_aoa_codec_tas
snd_aoa_fabric_layout snd_aoa arc4 rate_control evdev bcm43xx_d80211
firmware_class snd_aoa_i2sbus snd_pcm snd_timer snd_page_alloc snd uninorth_agp
ohci1394 ieee1394 agpgart soundcore snd_aoa_soundbus yenta_socket
rsrc_nonstatic pcmcia_core ohci_hcd ehci_hcd usbcore 80211 unix
NIP: F24CF308 LR: F24CF348 CTR: C01BACA4
REGS: c1e83c70 TRAP: 0300   Not tainted  (2.6.18-rc4)
MSR: 1032 ME,IR,DR  CR: 24008422  XER:
DAR: 0060, DSISR: 4000
TASK = e8b88070[3419] 'ifconfig' THREAD: c1e82000
GPR00: F24CF348 C1E83D20 E8B88070 00A4 A2E8  C056 0020
GPR08: 0033  0020 C051 44008488 10018A14 28004422
GPR16: 1023D638 100D 100B 100D 10010474 E5A48000 C1E83E58 8914
GPR24: E5A48280 EFEC2400 0004  0068 0002 0018 EFEC2400
NIP [F24CF308] bcm43xx_write_beacon_template+0x50/0x98 [bcm43xx_d80211]
LR [F24CF348] bcm43xx_write_beacon_template+0x90/0x98 [bcm43xx_d80211]
Call Trace:
[C1E83D20] [F24CF348] bcm43xx_write_beacon_template+0x90/0x98 [bcm43xx_d80211] (unreliable)
[C1E83D40] [F24CFE44] bcm43xx_refresh_templates+0x48/0x268 [bcm43xx_d80211]
[C1E83D70] [F24D24AC] bcm43xx_add_interface+0xe4/0x118 [bcm43xx_d80211]
[C1E83DA0] [F20BDFD4] ieee80211_open+0x120/0x398 [80211]
[C1E83DF0] [C021E8C8] dev_open+0x78/0xcc
[C1E83E10] [C021C7E0] dev_change_flags+0x13c/0x168
[C1E83E30] [C0261C80] devinet_ioctl+0x5bc/0x71c
[C1E83EA0] [C02623F8] inet_ioctl+0xb0/0xdc
[C1E83EB0] [C0210810] sock_ioctl+0x160/0x28c
[C1E83ED0] [C0096604] do_ioctl+0x38/0x84
[C1E83EE0] [C00966D4] vfs_ioctl+0x84/0x43c
[C1E83F10] [C0096ACC] sys_ioctl+0x40/0x74
[C1E83F40] [C0010C88] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff62780
    LR = 0xffecf54
Instruction dump:
3c80f24f 7cbe2b78 3ca0f24f 7cdd3378 3884c650 38a5c3f8 812304f8 3c60f24f
38c00643 3863c3b8 2f89 419e0040 80a90060 7fe3fb78 7f86e378 7fc7f378



Re: [PATCH 1/6] ehea: interface to network stack

2006-08-11 Thread Jan-Bernd Themann

Hi Christian,

thanks for your comments, we'll send an updated patch set soon.

Jan-Bernd

Christian Borntraeger wrote:

Hi Jan-Bernd,

I had some minutes, here are some findings after a quick look.

On Wednesday 09 August 2006 10:38, you wrote:

+static struct net_device_stats *ehea_get_stats(struct net_device *dev)
+{
+   int i;
+   u64 hret = H_HARDWARE;
+   u64 rx_packets = 0;
+   struct ehea_port *port = (struct ehea_port*)dev->priv;


dev->priv is a void pointer, so this cast is unnecessary. While we are at it, have
you considered the netdev_priv macro? This will require some prep in
alloc_netdev and might not always be possible.


good point, we'll use alloc_etherdev / netdev_priv
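
A tiny illustration of the point about the cast, using hypothetical
stand-in types rather than the real struct net_device or struct ehea_port:
in C a void pointer converts to any object pointer type implicitly.

```c
/* Hypothetical stand-ins for struct net_device and the driver's private data. */
struct demo_port {
	int num_def_qps;
};

struct demo_netdev {
	void *priv;	/* opaque pointer to driver-private data */
};

static int demo_num_qps(const struct demo_netdev *dev)
{
	/* void * converts to const struct demo_port * without a cast. */
	const struct demo_port *port = dev->priv;

	return port->num_def_qps;
}
```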


+
+	EDEB_DMP(7, (u8*)cb2,
+		 sizeof(struct hcp_query_ehea_port_cb_2), "After HCALL");
+
+	for (i = 0; i < port->num_def_qps; i++) {
+		rx_packets += port->port_res[i].rx_packets;
+	}
+
+	stats->tx_packets = cb2->txucp + cb2->txmcp + cb2->txbcp;
+	stats->multicast = cb2->rxmcp;
+	stats->rx_errors = cb2->rxuerr;
+	stats->rx_bytes = cb2->rxo;
+	stats->tx_bytes = cb2->txo;
+	stats->rx_packets = rx_packets;
+
+get_stat_exit:
+	EDEB_EX(7, "");
+   return stats;
+}


again, cb2 is not freed.
[...]


yep, done




+static inline u64 get_swqe_addr(u64 tmp_addr, int addr_seg)
+{
+   u64 addr;
+   addr = tmp_addr;
+   return addr;
+}


Is this supposed to change in the future? If not you can get rid of it.


+
+static inline u64 get_rwqe_addr(u64 tmp_addr)
+{
+   return tmp_addr;
+}


same here. 


removed



+ ehea_poll()


The poll function seems too long and therefore hard to review. Please consider 
splitting it. 





done



Re: [PATCH 3/6] ehea: queue management

2006-08-11 Thread Jan-Bernd Themann

Hi,




+#define EHEA_EQE_SM_MECH_NUMBER  EHEA_BMASK_IBM(48, 55)
+#define EHEA_EQE_SM_PORT_NUMBER  EHEA_BMASK_IBM(56, 63)
+
+struct ehea_eqe {
+   u64 entry;
+};


ehea_eqe.. what is that and why a struct if only 1 item?  Comments
please.



There are send / receive queue elements (ehea_swqe, ehea_rwqe),
completion queue elements (ehea_cqe) and event queue elements (ehea_eqe).
We introduced struct ehea_eqe to get a consistent description for all
queue elements.

Jan-Bernd



[RFC PPP]: carrier/operstate support

2006-08-11 Thread Patrick McHardy
[PPP]: carrier/operstate support

Add support for setting carrier state and operstate of ppp devices.
The patch works as follows:

- Initially, ppp devices come up with carrier off and npmode set to NPMODE_DROP

- In non-demand mode (dialin or permanent connection), when the IP*CP
  protocols transition to UP state, the ppp daemon will change the npmode
  to NPMODE_PASS, at which point the carrier state is set to on.
  On hangup the ppp daemon will change it back to NPMODE_DROP and the
  carrier state is changed back to off.

- In demand mode, the PPP daemon will set npmode to NPMODE_PASS immediately
  (moving the interface to carrier up state) and set the SC_LOOP_TRAFFIC
  flag until demand is signaled. While SC_LOOP_TRAFFIC is set, the interface
  is put in dormant state. As soon as demand is signaled this flag will get
  cleared and npmode set to NPMODE_QUEUE (which is unsupported by the kernel),
  so the carrier state is changed to off until the IP*CP protocols
  transition to UP.

So simply put: a non-demand interface is in carrier off state until it's
able to pass traffic; demand interfaces are in carrier on, dormant state
while waiting for demand, carrier off state while connecting, and carrier on
state when able to pass traffic.
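
The rules above can be modelled in a few lines of user-space C. This is only
a sketch of the described behaviour, not the patch itself: the enum values,
the NUM_NP count, struct link_state and ppp_link_state() are placeholders
chosen for this model, and loop_traffic stands in for the SC_LOOP_TRAFFIC
flag.

```c
#include <stdbool.h>

/* Names follow the description; numeric values and NUM_NP are placeholders. */
enum npmode { NPMODE_PASS, NPMODE_DROP, NPMODE_ERROR, NPMODE_QUEUE };
#define NUM_NP 6

struct link_state {
	bool carrier;	/* true = carrier on */
	bool dormant;	/* true = waiting for demand */
};

/* Carrier is on if any protocol is in NPMODE_PASS; the interface is
 * dormant while loop_traffic (standing in for SC_LOOP_TRAFFIC) is set. */
static struct link_state ppp_link_state(const enum npmode modes[NUM_NP],
					bool loop_traffic)
{
	struct link_state st = { .carrier = false, .dormant = loop_traffic };
	int i;

	for (i = 0; i < NUM_NP; i++) {
		if (modes[i] == NPMODE_PASS) {
			st.carrier = true;
			break;
		}
	}
	return st;
}
```

In this model a freshly created interface (all modes NPMODE_DROP) has
carrier off, a demand interface waiting for traffic has carrier on plus
dormant, and a link that is still connecting (NPMODE_QUEUE) has carrier off.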

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 821791c11bee40ec1bedbb86a03a995475c85b2c
tree 4c466550537e6bf4bc4b8f32171caf7d7349ca81
parent 99c4451081b0ea2107ba4827f7d518e1c739cf1b
author Patrick McHardy [EMAIL PROTECTED] Fri, 11 Aug 2006 16:01:15 +0200
committer Patrick McHardy [EMAIL PROTECTED] Fri, 11 Aug 2006 16:01:15 +0200

 drivers/net/ppp_generic.c |   24 +---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 0ec6e9d..e01b7e9 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -639,6 +639,12 @@ static int ppp_ioctl(struct inode *inode
 		ppp_lock(ppp);
 		cflags = ppp->flags & ~val;
 		ppp->flags = val & SC_FLAG_BITS;
+		if (ppp->dev != NULL) {
+			if (ppp->flags & SC_LOOP_TRAFFIC)
+				netif_dormant_on(ppp->dev);
+			else
+				netif_dormant_off(ppp->dev);
+		}
 		ppp_unlock(ppp);
 		if (cflags & SC_CCP_OPEN)
 			ppp_ccp_closed(ppp);
@@ -719,9 +725,20 @@ static int ppp_ioctl(struct inode *inode
 			if (copy_to_user(argp, &npi, sizeof(npi)))
 				break;
 		} else {
+			ppp_lock(ppp);
 			ppp->npmode[i] = npi.mode;
-			/* we may be able to transmit more packets now (??) */
-			netif_wake_queue(ppp->dev);
+			if (ppp->dev != NULL) {
+				for (i = 0; i < NUM_NP; i++) {
+					if (ppp->npmode[i] == NPMODE_PASS) {
+						netif_carrier_on(ppp->dev);
+						netif_wake_queue(ppp->dev);
+						break;
+					}
+				}
+				if (i == NUM_NP)
+					netif_carrier_off(ppp->dev);
+			}
+			ppp_unlock(ppp);
 		}
 		err = 0;
 		break;
@@ -2420,7 +2437,8 @@ ppp_create_interface(int unit, int *retp
 	init_ppp_file(&ppp->file, INTERFACE);
 	ppp->file.hdrlen = PPP_HDRLEN - 2;	/* don't count proto bytes */
 	for (i = 0; i < NUM_NP; ++i)
-		ppp->npmode[i] = NPMODE_PASS;
+		ppp->npmode[i] = NPMODE_DROP;
+	netif_carrier_off(dev);
 	INIT_LIST_HEAD(&ppp->channels);
 	spin_lock_init(&ppp->rlock);
 	spin_lock_init(&ppp->wlock);