Re: FACK and CWND

2006-08-01 Thread David Miller
From: Ma Lin [EMAIL PROTECTED]
Date: Fri, 28 Jul 2006 18:37:15 +0800

> In FACK, the holes between SACK blocks are considered as loss. To a
> sender, when SACK comes in, lost_out would be non-zero. According to
> linux-2.6.17.7/net/ipv4/tcp_input.c, function tcp_time_to_recover(),
> this non-zero lost_out will send the sender into Recovery state,
> in which CWND could be reduced. In short, it seems that FACK
> would allow SACK holes to reduce CWND.

That's right, because when tp->lost_out is set we have some form
of absolute proof that packets were lost.

Note that even when not receiving SACK blocks, ie. pure Reno,
we emulate the SACK information the best we can.

So, if we have real SACK blocks, tcp_update_scoreboard() will
mark all packets in the retransmit queue up to fackets_out
minus reordering as lost.

Else, for non-SACK, only the head packet in the retransmit queue
will be marked as lost.

> However, in the paper "Congestion Control in Linux TCP", Section 3,
> subsection "Recovery", it says that Recovery state is triggered by
> a sufficient number of successive duplicate ACKs; to my understanding,
> that means 3 dupacks.

Under Linux it has a more complicated definition.  We wait until
we see at least tp->reordering packets lost.

Dynamically we try to determine how deeply packets are being
reordered on the connection.

Using this value, we use tp->fackets_out - tp->reordering
as how many packets we think have been proven as lost.
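
A rough user-space sketch of that estimate may help. It assumes a
hypothetical struct fack_counters and helper proven_lost(); neither is
kernel code, and the floor of one packet in the SACK case is an assumption
of the sketch rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the tcp_sock counters discussed above. */
struct fack_counters {
	uint32_t fackets_out;	/* packets up to and including the highest SACKed one */
	uint32_t reordering;	/* current reordering estimate, in packets */
	bool     sack_ok;	/* true if the peer sends real SACK blocks */
};

/* How many packets at the head of the retransmit queue to treat as lost. */
static uint32_t proven_lost(const struct fack_counters *c)
{
	if (!c->sack_ok)
		return 1;	/* non-SACK: only the head packet */
	if (c->fackets_out > c->reordering)
		return c->fackets_out - c->reordering;
	return 1;		/* assumed floor: at least the head packet */
}
```

With fackets_out = 10 and reordering = 3 the sketch treats 7 packets as
lost; without SACK it treats only the head packet as lost.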

You will note that any code path that falls through to the end of
tcp_fastretrans_alert() will retransmit one packet using a call
to tcp_xmit_retransmit_queue().  And one such code path is the
transition to TCP_CA_Recovery which is guarded by the
tcp_time_to_recover() check, which encapsulates the two tests
we've discussed as:

	if (tp->lost_out)
		return 1;
	if (tcp_fackets_out(tp) > tp->reordering)
		return 1;

The next few checks try to handle some fringe cases, such as the head
packet in the retransmit queue having been sent more than an RTO ago,
and also having so few packets in the retransmit queue that normal
recovery mechanisms cannot function properly:

	if (tcp_head_timedout(sk, tp))
		return 1;
	packets_out = tp->packets_out;
	if (packets_out <= tp->reordering &&
	    tp->sacked_out >= max_t(__u32, packets_out/2,
				    sysctl_tcp_reordering) &&
	    !tcp_may_send_now(sk, tp)) {
		return 1;
	}
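
For experimenting with these conditions outside the kernel, the four checks
can be modelled on plain counters. This is only a sketch: struct
recover_state and time_to_recover() are hypothetical names, and the two
booleans stand in for tcp_head_timedout() and tcp_may_send_now(), which in
the kernel inspect the socket itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state the checks above look at. */
struct recover_state {
	uint32_t lost_out;		/* packets already marked lost */
	uint32_t fackets_out;		/* packets up to the highest SACKed one */
	uint32_t sacked_out;		/* packets SACKed by the receiver */
	uint32_t packets_out;		/* packets currently in flight */
	uint32_t reordering;		/* per-connection reordering estimate */
	uint32_t sysctl_reordering;	/* system-wide default reordering */
	bool     head_timed_out;	/* stand-in for tcp_head_timedout() */
	bool     may_send_now;		/* stand-in for tcp_may_send_now() */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Returns true when the model would enter Recovery. */
static bool time_to_recover(const struct recover_state *s)
{
	if (s->lost_out)			/* loss is already proven */
		return true;
	if (s->fackets_out > s->reordering)	/* hole deeper than reordering */
		return true;
	if (s->head_timed_out)			/* head sent more than an RTO ago */
		return true;
	/* Too few packets in flight for the normal rule to ever fire. */
	if (s->packets_out <= s->reordering &&
	    s->sacked_out >= max_u32(s->packets_out / 2, s->sysctl_reordering) &&
	    !s->may_send_now)
		return true;
	return false;
}
```

With reordering = 3, a snapshot with fackets_out = 3 stays out of Recovery
while fackets_out = 4 enters it.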

Hope this helps.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 31 Jul 2006 20:36:58 +0200

> Herbert Xu wrote:
> > So I'd rather see a patch to disable the warnings for 2.6.18 so that
> > the proper fix can be tested more thoroughly.  We should remember that
> > the 2.6.18 minus the warning is still going to be heaps better in this
> > regard compared to 2.6.17 where all TSO packets were essentially
> > discarded due to the incorrect checksum (when the NAT module is loaded).
>
> I'm fine either way.

I'm going to kill the warning for 2.6.18, using the patch below.
We can queue up Patrick's changes for 2.6.19, just give me the
word and I'll apply it to net-2.6.19

commit 9133d3d0619e637f2e05bf574af4b7a16319e3ca
Author: David S. Miller [EMAIL PROTECTED]
Date:   Tue Aug 1 00:00:12 2006 -0700

[NET]: Kill the WARN_ON() calls for checksum fixups.

We have a more complete solution in the works, involving
the separation of CHECKSUM_HW on input vs. output, and
having netfilter properly do incremental checksums.

But that is a very involved patch and is thus 2.6.19
material.

What we have now is infinitely better than the past,
wherein all TSO packets were dropped due to corrupt
checksums as soon as the NAT module was loaded.  At
least now, the checksums do get fixed up, it just
isn't the cleanest nor most optimal solution.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/core/dev.c b/net/core/dev.c
index 4d2b516..5b630ce 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1166,11 +1166,6 @@ int skb_checksum_help(struct sk_buff *sk
 		goto out_set_summed;
 
 	if (unlikely(skb_shinfo(skb)->gso_size)) {
-		static int warned;
-
-		WARN_ON(!warned);
-		warned = 1;
-
 		/* Let GSO fix up the checksum. */
 		goto out_set_summed;
 	}
@@ -1220,11 +1215,6 @@ struct sk_buff *skb_gso_segment(struct s
 	__skb_pull(skb, skb->mac_len);
 
 	if (unlikely(skb->ip_summed != CHECKSUM_HW)) {
-		static int warned;
-
-		WARN_ON(!warned);
-		warned = 1;
-
 		if (skb_header_cloned(skb) &&
 		    (err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
 			return ERR_PTR(err);


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
On Tue, Aug 01, 2006 at 12:00:59AM -0700, David Miller wrote:
 
> I'm going to kill the warning for 2.6.18, using the patch below.
> We can queue up Patrick's changes for 2.6.19, just give me the
> word and I'll apply it to net-2.6.19

Thanks Dave.  This should give us plenty of time to produce a
complete solution to the problem.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Patrick McHardy
Patrick McHardy wrote:
> [NETFILTER]: nf_queue: handle GSO packets
>
> Handle GSO packets in nf_queue by segmenting them before queueing to
> avoid breaking GSO in case they get mangled.

While testing this patch I noticed that some meta-data is lost when
segmenting packets. With the attached patch it works fine. Some of
the fields may not appear necessary, so here's my reasoning:

- nfct/nfctinfo/nfct_reasm: the xfrm output path does an immediate
  nf_reset, so they were not necessary until now. Queueing can happen
  on any hook, so we need to preserve them.

- nf_bridge: needed for GSO on a bridge device until the deferred
  hooks are removed

- tc_verd/tc_index/input_dev: when directing a packet from a device
  supporting GSO to a device not supporting GSO using tc actions,
  these fields may be set.

Herbert, does this look sane to you?


Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3a12ff1..9c6ef32 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1948,6 +1948,31 @@ struct sk_buff *skb_segment(struct sk_bu
 
 		nskb->dev = skb->dev;
 		nskb->priority = skb->priority;
+#ifdef CONFIG_NETFILTER
+		nskb->nfmark = skb->nfmark;
+		nskb->nfct = skb->nfct;
+		nf_conntrack_get(skb->nfct);
+		nskb->nfctinfo = skb->nfctinfo;
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+		nskb->nfct_reasm = skb->nfct_reasm;
+		nf_conntrack_get_reasm(skb->nfct_reasm);
+#endif
+#ifdef CONFIG_BRIDGE_NETFILTER
+		nskb->nf_bridge = skb->nf_bridge;
+		nf_bridge_get(skb->nf_bridge);
+#endif
+#endif
+#ifdef CONFIG_NET_SCHED
+#ifdef CONFIG_NET_CLS_ACT
+		nskb->input_dev = skb->input_dev;
+		nskb->tc_verd = skb->tc_verd;
+#endif
+		nskb->tc_index = skb->tc_index;
+#endif
+		skb_copy_secmark(nskb, skb);
 		nskb->protocol = skb->protocol;
 		nskb->dst = dst_clone(skb->dst);
 		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Patrick McHardy
David Miller wrote:
> I'm going to kill the warning for 2.6.18, using the patch below.
> We can queue up Patrick's changes for 2.6.19, just give me the
> word and I'll apply it to net-2.6.19


Besides the problem I mentioned in the mail I just wrote, everything
looks good so far. I'll probably send you the final version sometime
today.


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
Hi Patrick:

On Tue, Aug 01, 2006 at 09:19:34AM +0200, Patrick McHardy wrote:
 
> - nfct/nfctinfo/nfct_reasm: the xfrm output path does an immediate
>   nf_reset, so they were not necessary until now. Queueing can happen
>   on any hook, so we need to preserve them.
>
> - nf_bridge: needed for GSO on a bridge device until the deferred
>   hooks are removed

I need to think a bit more about these two cases.

> - tc_verd/tc_index/input_dev: when directing a packet from a device
>   supporting GSO to a device not supporting GSO using tc actions,
>   these fields may be set.

This doesn't look right though.  GSO should occur just before
hard_start_xmit (after all tc actions have taken place) so we
shouldn't have any more tc actions to perform.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Tue, 1 Aug 2006 17:23:58 +1000

> > - tc_verd/tc_index/input_dev: when directing a packet from a device
> >   supporting GSO to a device not supporting GSO using tc actions,
> >   these fields may be set.
>
> This doesn't look right though.  GSO should occur just before
> hard_start_xmit (after all tc actions have taken place) so we
> shouldn't have any more tc actions to perform.

Hmmm, what about loopback?  Yeah yeah, LOOPBACK_TSO is not defined :)
but what I'm really referring to is that loopback preserves the
traffic classifier bits of the skb.


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Patrick McHardy
Herbert Xu wrote:
> > - tc_verd/tc_index/input_dev: when directing a packet from a device
> >   supporting GSO to a device not supporting GSO using tc actions,
> >   these fields may be set.
>
> This doesn't look right though.  GSO should occur just before
> hard_start_xmit (after all tc actions have taken place) so we
> shouldn't have any more tc actions to perform.

You're right, this part shouldn't be necessary. It might be possible
to construct some case with loopback, but I doubt it would be very
realistic.



Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
On Tue, Aug 01, 2006 at 12:36:14AM -0700, David Miller wrote:
 
> > > - tc_verd/tc_index/input_dev: when directing a packet from a device
> > >   supporting GSO to a device not supporting GSO using tc actions,
> > >   these fields may be set.
> >
> > This doesn't look right though.  GSO should occur just before
> > hard_start_xmit (after all tc actions have taken place) so we
> > shouldn't have any more tc actions to perform.
>
> Hmmm, what about loopback?  Yeah yeah, LOOPBACK_TSO is not defined :)
> but what I'm really referring to is that loopback preserves the
> traffic classifier bits of the skb.

I don't know.  Jamal, is there a scenario where these three attributes
are needed for loopback packets?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC 1/4] kevent: core files.

2006-08-01 Thread Ulrich Drepper
Herbert Xu wrote:
> The other thing to consider is that events don't come from the hardware.
> Events are written by the kernel.  So if user-space is just reading
> the events that we've written, then there are no cache misses at all.

Not quite true.  The ring buffer can be written to from another
processor.  The kernel thread responsible for generating the event
(receiving data from network or disk, expired timer) can run
independently on another CPU.

This is the case to keep in mind here.  I thought Zach and the others
involved in the discussions in Ottawa said this has been shown to be a
problem and that a ring buffer implementation with something other than
simple front and back pointers is preferable.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: [RFC 1/4] kevent: core files.

2006-08-01 Thread David Miller
From: Ulrich Drepper [EMAIL PROTECTED]
Date: Tue, 01 Aug 2006 00:53:10 -0700

> This is the case to keep in mind here.  I thought Zach and the others
> involved in the discussions in Ottawa said this has been shown to be a
> problem and that a ring buffer implementation with something other than
> simple front and back pointers is preferable.

This is part of why I suggested a VJ-style channel data
structure.  At worst, the cachelines for the entries get
into shared modified state when the remote userland cpu
reads the slot.
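
To make the cacheline argument concrete, here is a minimal user-space
sketch of a single-producer/single-consumer ring in which the producer and
consumer indices each sit in their own cache line, so in the common case
only the entry lines move between CPUs. It is loosely in the spirit of the
channel idea, not the kevent or netchannel code: struct event_ring,
ring_put(), ring_get(), the 64-byte line size and the 8-slot depth are all
assumptions made for illustration.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u	/* assumed depth; must be a power of two */
#define CACHELINE  64	/* assumed cache line size in bytes */

struct event_ring {
	alignas(CACHELINE) atomic_uint head;	/* written only by the producer */
	alignas(CACHELINE) atomic_uint tail;	/* written only by the consumer */
	alignas(CACHELINE) uint64_t slot[RING_SLOTS];
};

/* Producer side: returns false if the ring is full. */
static bool ring_put(struct event_ring *r, uint64_t ev)
{
	unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SLOTS)
		return false;
	r->slot[head & (RING_SLOTS - 1)] = ev;
	/* Publish the slot contents before the new head becomes visible. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_get(struct event_ring *r, uint64_t *ev)
{
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*ev = r->slot[tail & (RING_SLOTS - 1)];
	/* Hand the slot back to the producer only after it has been read. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Each index has exactly one writer, so neither side needs a lock; the
release/acquire pairs only order the slot contents against the index
updates.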


Re: Linux TCP in the presence of delays or drops...

2006-08-01 Thread Ilpo Järvinen
On Mon, 31 Jul 2006, Oumer Teyeb wrote:

> - If multiple timeouts occur for one packet, then even if we are using the
> timestamp option or FRTO, Linux TCP is not able to detect spurious
> retransmissions. Linux TCP is able to detect spurious retransmissions
> only for a single timeout for one packet, or for fast retransmissions that
> are caused by duplicate ACK reception. I have some traces that show this
> behaviour; let me know if you are interested.

I have come across this same issue. I can confirm that multiple RTOs are
not handled correctly. But there are some other issues in FRTO as
well... nothing extremely dangerous though. In one of the cases, the
current FRTO algorithm could miss real losses, but luckily you need to be
quite clever to trigger that, and due to the very conservative response used
when a spurious RTO is detected, there is no significant danger in it even
then. The other flaws cause overly conservative behavior.

We have a set of fixes to F-RTO, but some of them have not yet been
tested. Since the fixes include 4-5 independent changes that also handle
rare cases, it takes some time to test. Besides, I'll probably have to
talk with Pasi Sarolahti (author of FRTO) about a couple of interpretation
issues in RFC 4138 as soon as his vacation ends (mid August if I remember
correctly) to verify some of the changes.
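
For readers without RFC 4138 at hand, the basic (non-SACK) F-RTO decision
can be sketched as a small state machine: after the RTO retransmission, the
first two ACKs decide whether the timeout was spurious. This is a simplified
reading of the RFC, not the Linux implementation and not the fixes mentioned
above; struct frto, frto_on_rto() and frto_on_ack() are hypothetical names,
and congestion window handling is left out. A further RTO simply restarts
the sketch, which takes no position on the multiple-RTO problem.

```c
#include <stdbool.h>
#include <stdint.h>

enum frto_step { FRTO_IDLE, FRTO_FIRST_ACK, FRTO_SECOND_ACK };
enum frto_verdict { FRTO_UNDECIDED, FRTO_CONVENTIONAL, FRTO_SPURIOUS };

struct frto {
	enum frto_step step;
	uint32_t snd_una;	/* highest cumulative ACK seen so far */
	uint32_t rtx_end;	/* end of the segment retransmitted on RTO */
	uint32_t recover;	/* highest sequence sent before the RTO */
};

/* Sequence-space comparison that tolerates 32-bit wraparound. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* RTO fired: the first unacknowledged segment has been retransmitted. */
static void frto_on_rto(struct frto *f, uint32_t snd_una,
			uint32_t seg_len, uint32_t snd_nxt)
{
	f->step = FRTO_FIRST_ACK;
	f->snd_una = snd_una;
	f->rtx_end = snd_una + seg_len;
	f->recover = snd_nxt;
}

/* Feed each ACK arriving after the RTO retransmission. */
static enum frto_verdict frto_on_ack(struct frto *f, uint32_t ack)
{
	bool advances = seq_before(f->snd_una, ack);

	switch (f->step) {
	case FRTO_FIRST_ACK:
		/* Duplicate ACK, ACK of everything sent before the RTO, or
		 * partial ACK of the retransmission: conventional recovery. */
		if (!advances || ack == f->recover ||
		    seq_before(ack, f->rtx_end)) {
			f->step = FRTO_IDLE;
			return FRTO_CONVENTIONAL;
		}
		f->snd_una = ack;
		f->step = FRTO_SECOND_ACK;	/* sender may send up to two new segments */
		return FRTO_UNDECIDED;
	case FRTO_SECOND_ACK:
		f->step = FRTO_IDLE;
		if (!advances)
			return FRTO_CONVENTIONAL;	/* duplicate ACK: real loss */
		f->snd_una = ack;
		return FRTO_SPURIOUS;			/* window advanced again */
	default:
		return FRTO_UNDECIDED;
	}
}
```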

I expect that I'll get some actual results after two weeks or so... In case
you are in a hurry and are interested in the fixes, I could prepare an
independent patch quite soon and release it (untested) on our project's web
site (if you are interested, ask off-list so that we don't bother others
:-)). But the kernel inclusion of the fixes should IMO wait at least until
I get some decent test cases analyzed, which will take at least two weeks
or so due to my schedule.

> - In the cases where TCP timestamp or FRTO is not able to detect spurious
> retransmissions, the performance degrades even more than when the TCP
> timestamp or FRTO options are not used

That's one of the FRTO features, we have a fix (I cannot say about 
timestamps since we've been running our tests without tstamps for years).


-- 
 i.


[take2 2/4] kevent: network AIO, socket notifications.

2006-08-01 Thread Evgeniy Polyakov

This patchset includes socket notifications and network asynchronous IO.
Network AIO is based on kevent and works as usual kevent storage on top
of inode.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/include/asm-i386/socket.h b/include/asm-i386/socket.h
index 5755d57..d4d2f5c 100644
--- a/include/asm-i386/socket.h
+++ b/include/asm-i386/socket.h
@@ -50,4 +50,6 @@ #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
 #define SO_PASSSEC 34
 
+#define SO_ASYNC_SOCK  35
+
 #endif /* _ASM_SOCKET_H */

diff --git a/include/asm-x86_64/socket.h b/include/asm-x86_64/socket.h
index b467026..fc2b49d 100644
--- a/include/asm-x86_64/socket.h
+++ b/include/asm-x86_64/socket.h
@@ -50,4 +50,6 @@ #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
 #define SO_PASSSEC 34
 
+#define SO_ASYNC_SOCK  35
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4307e76..9267873 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1283,6 +1283,8 @@ extern struct sk_buff *skb_recv_datagram
 				int noblock, int *err);
 extern unsigned int	datagram_poll(struct file *file, struct socket *sock,
 				struct poll_table_struct *wait);
+extern int		skb_copy_datagram(const struct sk_buff *from,
+				int offset, void *dst, int size);
 extern int		skb_copy_datagram_iovec(const struct sk_buff *from,
 					       int offset, struct iovec *to,
 					       int size);
diff --git a/include/net/sock.h b/include/net/sock.h
index 324b3ea..c43a153 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -48,6 +48,7 @@ #include <linux/lockdep.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>	/* struct sk_buff */
 #include <linux/security.h>
+#include <linux/kevent.h>
 
 #include <linux/filter.h>
 
@@ -391,6 +392,8 @@ enum sock_flags {
SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+   SOCK_ASYNC,
+   SOCK_ASYNC_INUSE,
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -450,6 +453,21 @@ static inline int sk_stream_memory_free(
 
 extern void sk_stream_rfree(struct sk_buff *skb);
 
+struct socket_alloc {
+   struct socket socket;
+   struct inode vfs_inode;
+};
+
+static inline struct socket *SOCKET_I(struct inode *inode)
+{
+   return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
+}
+
+static inline struct inode *SOCK_INODE(struct socket *socket)
+{
+   return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
+}
+
 static inline void sk_stream_set_owner_r(struct sk_buff *skb, struct sock *sk)
 {
 	skb->sk = sk;
@@ -477,6 +495,7 @@ static inline void sk_add_backlog(struct
 		sk->sk_backlog.tail = skb;
 	}
 	skb->next = NULL;
+   kevent_socket_notify(sk, KEVENT_SOCKET_RECV);
 }
 
 #define sk_wait_event(__sk, __timeo, __condition)  \
@@ -548,6 +567,12 @@ struct proto {
 
 	int			(*backlog_rcv) (struct sock *sk,
 						struct sk_buff *skb);
+
+	int			(*async_recv) (struct sock *sk,
+						void *dst, size_t size);
+	int			(*async_send) (struct sock *sk,
+						struct page **pages,
+						unsigned int poffset,
+						size_t size);
 
 	/* Keeping track of sk's, looking them up, and port selection methods. */
 	void			(*hash)(struct sock *sk);
@@ -679,21 +704,6 @@ static inline struct kiocb *siocb_to_kio
 	return si->kiocb;
 }
 
-struct socket_alloc {
-   struct socket socket;
-   struct inode vfs_inode;
-};
-
-static inline struct socket *SOCKET_I(struct inode *inode)
-{
-   return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
-}
-
-static inline struct inode *SOCK_INODE(struct socket *socket)
-{
-   return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
-}
-
 extern void __sk_stream_mem_reclaim(struct sock *sk);
 extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0720bdd..5a1899b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -364,6 +364,8 @@ extern int  compat_tcp_setsockopt(struc
 					int level, int optname,
 					char __user *optval, int optlen);
 extern void			tcp_set_keepalive(struct sock *sk, int val);
+extern int 

[take2 3/4] kevent: AIO, aio_sendfile() implementation.

2006-08-01 Thread Evgeniy Polyakov

This patch includes asynchronous propagation of file's data into VFS
cache and aio_sendfile() implementation.
Network aio_sendfile() works lazily - it asynchronously populates pages
into the VFS cache (which can be used for various tricks with adaptive
readahead) and then uses the usual ->sendfile() callback.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/fs/bio.c b/fs/bio.c
index 6a0b9ad..a3ee530 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -119,7 +119,7 @@ void bio_free(struct bio *bio, struct bi
 /*
  * default destructor for a bio allocated with bio_alloc_bioset()
  */
-static void bio_fs_destructor(struct bio *bio)
+void bio_fs_destructor(struct bio *bio)
 {
bio_free(bio, fs_bio_set);
 }
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index fb4d322..9316551 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -685,6 +685,7 @@ ext2_writepages(struct address_space *ma
 }
 
 const struct address_space_operations ext2_aops = {
+   .get_block  = ext2_get_block,
.readpage   = ext2_readpage,
.readpages  = ext2_readpages,
.writepage  = ext2_writepage,
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index c5ee9f0..d9210d4 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1699,6 +1699,7 @@ static int ext3_journalled_set_page_dirt
 }
 
 static const struct address_space_operations ext3_ordered_aops = {
+   .get_block  = ext3_get_block,
.readpage   = ext3_readpage,
.readpages  = ext3_readpages,
.writepage  = ext3_ordered_writepage,
diff --git a/fs/file_table.c b/fs/file_table.c
index 0131ba0..b649317 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -112,6 +112,9 @@ struct file *get_empty_filp(void)
 	if (security_file_alloc(f))
 		goto fail_sec;
 
+#ifdef CONFIG_KEVENT_POLL
+	kevent_storage_init(f, &f->st);
+#endif
 	tsk = current;
 	INIT_LIST_HEAD(&f->f_u.fu_list);
 	atomic_set(&f->f_count, 1);
@@ -159,6 +162,9 @@ void fastcall __fput(struct file *file)
might_sleep();
 
fsnotify_close(file);
+#ifdef CONFIG_KEVENT_POLL
+   kevent_storage_fini(&file->st);
+#endif
/*
 * The function eventpoll_release() should be the first called
 * in the file cleanup chain.
diff --git a/fs/inode.c b/fs/inode.c
index 0bf9f04..fdbd0ba 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@ #include <linux/pagemap.h>
 #include <linux/cdev.h>
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
+#include <linux/kevent.h>
 #include <linux/mount.h>
 
 /*
@@ -165,12 +166,18 @@ #endif
 		}
 		memset(&inode->u, 0, sizeof(inode->u));
 		inode->i_mapping = mapping;
+#if defined CONFIG_KEVENT
+		kevent_storage_init(inode, &inode->st);
+#endif
 	}
 	return inode;
 }
 
 void destroy_inode(struct inode *inode) 
 {
+#if defined CONFIG_KEVENT_INODE || defined CONFIG_KEVENT_SOCKET
+	kevent_storage_fini(&inode->st);
+#endif
 	BUG_ON(inode_has_buffers(inode));
 	security_inode_free(inode);
 	if (inode->i_sb->s_op->destroy_inode)
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 12dfdcf..f8dca72 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3001,6 +3001,7 @@ int reiserfs_setattr(struct dentry *dent
 }
 
 const struct address_space_operations reiserfs_address_space_operations = {
+   .get_block = reiserfs_get_block,
.writepage = reiserfs_writepage,
.readpage = reiserfs_readpage,
.readpages = reiserfs_readpages,

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2561020..65eb438 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -240,6 +240,9 @@ #include <linux/mutex.h>
 #include <asm/atomic.h>
 #include <asm/semaphore.h>
 #include <asm/byteorder.h>
+#ifdef CONFIG_KEVENT
+#include <linux/kevent_storage.h>
+#endif
 
 struct hd_geometry;
 struct iovec;
@@ -352,6 +355,8 @@ struct address_space;
 struct writeback_control;
 
 struct address_space_operations {
+   int  (*get_block)(struct inode *inode, sector_t iblock,
+   struct buffer_head *bh_result, int create);
int (*writepage)(struct page *page, struct writeback_control *wbc);
int (*readpage)(struct file *, struct page *);
void (*sync_page)(struct page *);
@@ -546,6 +551,10 @@ #ifdef CONFIG_INOTIFY
 	struct mutex		inotify_mutex;	/* protects the watches list */
 #endif
 
+#ifdef CONFIG_KEVENT_INODE
+   struct kevent_storage   st;
+#endif
+
unsigned long   i_state;
unsigned long   dirtied_when;   /* jiffies of first dirtying */
 
@@ -698,6 +707,9 @@ #ifdef CONFIG_EPOLL
 	struct list_head	f_ep_links;
 	spinlock_t		f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
+#ifdef CONFIG_KEVENT_POLL
+   struct kevent_storage   st;
+#endif
 	struct address_space	*f_mapping;
 };
 extern spinlock_t files_lock;
diff --git 

[take2 0/4] kevent: introduction.

2006-08-01 Thread Evgeniy Polyakov

I am sending this patchset for comments and review. It still contains the AIO
and aio_sendfile() implementation on top of the get_block() abstraction, which
it was decided to postpone for a while (it is simpler right now to generate
the patchset as a whole; when kevent is ready for merge, I will generate a
patchset without the AIO stuff).
It does not contain the mapped buffer implementation, since its design is not
100% complete; I will present that implementation in the third patchset.

Changes from previous patchset:
 - rebased against 2.6.18-git tree
 - removed ioctl controlling
 - added new syscall kevent_get_events(int fd, unsigned int min_nr,
   unsigned int max_nr, unsigned int timeout, void __user *buf,
   unsigned flags)
 - use old syscall kevent_ctl for creation/removing, modification and initial
   kevent initialization
 - use mutexes instead of semaphores
 - added file descriptor check and return error if provided descriptor does
   not match kevent file operations
 - various indent fixes
 - removed aio_sendfile() declarations.

Thank you.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]




[take2 1/4] kevent: core files.

2006-08-01 Thread Evgeniy Polyakov

This patch includes core kevent files:
 - userspace controlling
 - kernelspace interfaces
 - initialization
 - notification state machines

It might also include parts from other subsystems (like network related
syscalls), so it is possible that it will not compile without other
patches applied.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]


diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index dd63d47..0af988a 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -317,3 +317,7 @@ ENTRY(sys_call_table)
.long sys_tee   /* 315 */
.long sys_vmsplice
.long sys_move_pages
+   .long sys_aio_recv
+   .long sys_aio_send
+   .long sys_kevent_get_events
+   .long sys_kevent_ctl
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 5d4a7d1..e157ad4 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -713,4 +713,8 @@ #endif
.quad sys_tee
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
+   .quad sys_aio_recv
+   .quad sys_aio_send
+   .quad sys_kevent_get_events
+   .quad sys_kevent_ctl
 ia32_syscall_end:  

diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index fc1c8dd..a76e50d 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -323,10 +323,14 @@ #define __NR_sync_file_range  314
 #define __NR_tee		315
 #define __NR_vmsplice		316
 #define __NR_move_pages		317
+#define __NR_aio_recv		318
+#define __NR_aio_send		319
+#define __NR_kevent_get_events	320
+#define __NR_kevent_ctl		321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 318
+#define NR_syscalls 322
 
 /*
  * user-visible error numbers are in the range -1 - -128: see

diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 94387c9..9e61299 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -619,10 +619,18 @@ #define __NR_vmsplice 278
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_aio_recv  280
+__SYSCALL(__NR_aio_recv, sys_aio_recv)
+#define __NR_aio_send  281
+__SYSCALL(__NR_aio_send, sys_aio_send)
+#define __NR_kevent_get_events	282
+__SYSCALL(__NR_kevent_get_events, sys_kevent_get_events)
+#define __NR_kevent_ctl		283
+__SYSCALL(__NR_kevent_ctl, sys_kevent_ctl)
 
 #ifdef __KERNEL__
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_kevent_ctl
 
 #ifndef __NO_STUBS
 
diff --git a/include/linux/kevent.h b/include/linux/kevent.h
new file mode 100644
index 000..6c36f3f
--- /dev/null
+++ b/include/linux/kevent.h
@@ -0,0 +1,259 @@
+/*
+ * kevent.h
+ * 
+ * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __KEVENT_H
+#define __KEVENT_H
+
+/*
+ * Kevent request flags.
+ */
+
+#define KEVENT_REQ_ONESHOT	0x1		/* Process this event only once and then dequeue. */
+
+/*
+ * Kevent return flags.
+ */
+#define KEVENT_RET_BROKEN	0x1		/* Kevent is broken. */
+#define KEVENT_RET_DONE		0x2		/* Kevent processing was finished successfully. */
+
+/*
+ * Kevent type set.
+ */
+#define KEVENT_SOCKET		0
+#define KEVENT_INODE		1
+#define KEVENT_TIMER		2
+#define KEVENT_POLL		3
+#define KEVENT_NAIO		4
+#define KEVENT_AIO		5
+#define	KEVENT_MAX		6
+
+/*
+ * Per-type event sets.
+ * Number of per-event sets should be exactly as number of kevent types.
+ */
+
+/*
+ * Timer events.
+ */
+#define	KEVENT_TIMER_FIRED	0x1
+
+/*
+ * Socket/network asynchronous IO events.
+ */
+#define	KEVENT_SOCKET_RECV	0x1
+#define	KEVENT_SOCKET_ACCEPT	0x2
+#define	KEVENT_SOCKET_SEND	0x4
+
+/*
+ * Inode events.
+ */
+#define	KEVENT_INODE_CREATE	0x1
+#define	KEVENT_INODE_REMOVE	0x2
+
+/*
+ * Poll events.
+ */
+#define	KEVENT_POLL_POLLIN	0x0001
+#define	KEVENT_POLL_POLLPRI	0x0002

[take2 4/4] kevent: poll/select() notifications. Timer notifications.

2006-08-01 Thread Evgeniy Polyakov

This patch includes generic poll/select and timer notifications.

kevent_poll works similarly to epoll and has the same issues (the callback
is invoked not from the internal state machine of the caller, but through
process wakeup).

Timer notifications can be used for fine-grained per-process time
management, since interval timers are very inconvenient to use,
and they are limited.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c
new file mode 100644
index 000..4950e7c
--- /dev/null
+++ b/kernel/kevent/kevent_poll.c
@@ -0,0 +1,223 @@
+/*
+ * kevent_poll.c
+ * 
+ * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/timer.h>
+#include <linux/file.h>
+#include <linux/kevent.h>
+#include <linux/poll.h>
+#include <linux/fs.h>
+
+static kmem_cache_t *kevent_poll_container_cache;
+static kmem_cache_t *kevent_poll_priv_cache;
+
+struct kevent_poll_ctl
+{
+	struct poll_table_struct	pt;
+	struct kevent			*k;
+};
+
+struct kevent_poll_wait_container
+{
+	struct list_head		container_entry;
+	wait_queue_head_t		*whead;
+	wait_queue_t			wait;
+	struct kevent			*k;
+};
+
+struct kevent_poll_private
+{
+	struct list_head		container_list;
+	spinlock_t			container_lock;
+};
+
+static int kevent_poll_enqueue(struct kevent *k);
+static int kevent_poll_dequeue(struct kevent *k);
+static int kevent_poll_callback(struct kevent *k);
+
+static int kevent_poll_wait_callback(wait_queue_t *wait,
+		unsigned mode, int sync, void *key)
+{
+	struct kevent_poll_wait_container *cont =
+		container_of(wait, struct kevent_poll_wait_container, wait);
+	struct kevent *k = cont->k;
+	struct file *file = k->st->origin;
+	unsigned long flags;
+	u32 revents, event;
+
+	revents = file->f_op->poll(file, NULL);
+	spin_lock_irqsave(&k->lock, flags);
+	event = k->event.event;
+	spin_unlock_irqrestore(&k->lock, flags);
+
+	kevent_storage_ready(k->st, NULL, revents);
+
+	return 0;
+}
+
+static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead,
+		struct poll_table_struct *poll_table)
+{
+	struct kevent *k =
+		container_of(poll_table, struct kevent_poll_ctl, pt)->k;
+	struct kevent_poll_private *priv = k->priv;
+	struct kevent_poll_wait_container *cont;
+	unsigned long flags;
+
+	cont = kmem_cache_alloc(kevent_poll_container_cache, SLAB_KERNEL);
+	if (!cont) {
+		kevent_break(k);
+		return;
+	}
+
+	cont->k = k;
+	init_waitqueue_func_entry(&cont->wait, kevent_poll_wait_callback);
+	cont->whead = whead;
+
+	spin_lock_irqsave(&priv->container_lock, flags);
+	list_add_tail(&cont->container_entry, &priv->container_list);
+	spin_unlock_irqrestore(&priv->container_lock, flags);
+
+	add_wait_queue(whead, &cont->wait);
+}
+
+static int kevent_poll_enqueue(struct kevent *k)
+{
+	struct file *file;
+	int err, ready = 0;
+	unsigned int revents;
+	struct kevent_poll_ctl ctl;
+	struct kevent_poll_private *priv;
+
+	file = fget(k->event.id.raw[0]);
+	if (!file)
+		return -ENODEV;
+
+	err = -EINVAL;
+	if (!file->f_op || !file->f_op->poll)
+		goto err_out_fput;
+
+	err = -ENOMEM;
+	priv = kmem_cache_alloc(kevent_poll_priv_cache, SLAB_KERNEL);
+	if (!priv)
+		goto err_out_fput;
+
+	spin_lock_init(&priv->container_lock);
+	INIT_LIST_HEAD(&priv->container_list);
+
+	k->priv = priv;
+
+	ctl.k = k;
+	init_poll_funcptr(&ctl.pt, kevent_poll_qproc);
+
+	err = kevent_storage_enqueue(&file->st, k);
+	if (err)
+		goto err_out_free;
+
+	revents = file->f_op->poll(file, &ctl.pt);
+	if (revents & k->event.event) {
+		ready = 1;
+		kevent_poll_dequeue(k);
+	}
+
+	return ready;
+
+err_out_free:
+	kmem_cache_free(kevent_poll_priv_cache, priv);
+err_out_fput:
+	fput(file);
+	return err;
+}
+
+static int kevent_poll_dequeue(struct 

Re: [RFC] gre: transparent ethernet bridging

2006-08-01 Thread Philip Craig
Stephen Hemminger wrote:
> I am not against making the bridge code smarter to handle other
> encapsulation.

Do you mean something like this patch?

The only drawback I see for this approach is that it means you
can only encapsulate the ethernet header if the gre interface is
bridged.  That's not too bad a restriction though.

This patch only works for local packets so far, and doesn't
handle the LLC_SAP_BSPAN packets.

Also, if the gre interface is the only port on the bridge, then
we have no mac address.


--- linux-2.6.x/net/bridge/br_device.c  18 Jun 2006 23:30:55 -  1.1.1.14
+++ linux-2.6.x/net/bridge/br_device.c  1 Aug 2006 09:12:42 -
@@ -17,6 +17,7 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
+#include <linux/if_arp.h>
 
 #include <asm/uaccess.h>
 #include "br_private.h"
@@ -95,7 +96,9 @@ static int br_set_mac_address(struct net
 
 	spin_lock_bh(&br->lock);
 	list_for_each_entry(port, &br->port_list, list) {
-		if (!compare_ether_addr(port->dev->dev_addr, addr->sa_data)) {
+		if (port->dev->type == ARPHRD_ETHER &&
+		    !compare_ether_addr(port->dev->dev_addr,
+					addr->sa_data)) {
 			br_stp_change_bridge_id(br, addr->sa_data);
 			err = 0;
 			break;
--- linux-2.6.x/net/bridge/br_fdb.c 18 Jun 2006 23:30:55 -  1.1.1.13
+++ linux-2.6.x/net/bridge/br_fdb.c 1 Aug 2006 09:12:42 -
@@ -20,6 +20,7 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/jhash.h>
+#include <linux/if_arp.h>
 #include <asm/atomic.h>
 #include "br_private.h"
 
@@ -86,6 +87,7 @@ void br_fdb_changeaddr(struct net_bridge
 				struct net_bridge_port *op;
 				list_for_each_entry(op, &br->port_list, list) {
 					if (op != p &&
+					    op->dev->type == ARPHRD_ETHER &&
 					    !compare_ether_addr(op->dev->dev_addr,
 								f->addr.addr)) {
 						f->dst = op;
@@ -151,6 +153,7 @@ void br_fdb_delete_by_port(struct net_br
 				struct net_bridge_port *op;
 				list_for_each_entry(op, &br->port_list, list) {
 					if (op != p &&
+					    op->dev->type == ARPHRD_ETHER &&
 					    !compare_ether_addr(op->dev->dev_addr,
 								f->addr.addr)) {
 						f->dst = op;
--- linux-2.6.x/net/bridge/br_forward.c 18 Jun 2006 23:30:55 -  1.1.1.15
+++ linux-2.6.x/net/bridge/br_forward.c 1 Aug 2006 09:12:42 -
@@ -18,6 +18,7 @@
 #include <linux/skbuff.h>
 #include <linux/if_vlan.h>
 #include <linux/netfilter_bridge.h>
+#include <linux/if_arp.h>
 #include "br_private.h"
 
 static inline int should_deliver(const struct net_bridge_port *p,
@@ -46,6 +47,8 @@ int br_dev_queue_push_xmit(struct sk_buf
 			nf_bridge_maybe_copy_header(skb);
 #endif
 			skb_push(skb, ETH_HLEN);
+			if (skb->dev->type == ARPHRD_IPGRE)
+				skb->protocol = htons(ETH_P_BRIDGE);
 
 			dev_queue_xmit(skb);
 		}
--- linux-2.6.x/net/bridge/br_if.c  18 Jun 2006 23:30:55 -  1.1.1.23
+++ linux-2.6.x/net/bridge/br_if.c  1 Aug 2006 09:12:42 -
@@ -391,7 +391,10 @@ int br_add_if(struct net_bridge *br, str
 	struct net_bridge_port *p;
 	int err = 0;
 
-	if (dev->flags & IFF_LOOPBACK || dev->type != ARPHRD_ETHER)
+	if (dev->flags & IFF_LOOPBACK)
+		return -EINVAL;
+
+	if (dev->type != ARPHRD_ETHER && dev->type != ARPHRD_IPGRE)
 		return -EINVAL;
 
 	if (dev->hard_start_xmit == br_dev_xmit)
@@ -408,9 +411,11 @@ int br_add_if(struct net_bridge *br, str
 	if (err)
 		goto err0;
 
-	err = br_fdb_insert(br, p, dev->dev_addr);
-	if (err)
-		goto err1;
+	if (dev->type == ARPHRD_ETHER) {
+		err = br_fdb_insert(br, p, dev->dev_addr);
+		if (err)
+			goto err1;
+	}
 
 	err = br_sysfs_addif(p);
 	if (err)
--- linux-2.6.x/net/bridge/br_input.c   18 Jun 2006 23:30:55 -  1.1.1.18
+++ linux-2.6.x/net/bridge/br_input.c   1 Aug 2006 09:12:42 -
@@ -17,6 +17,7 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/netfilter_bridge.h>
+#include <linux/if_arp.h>
 #include "br_private.h"
 
 /* Bridge group multicast address 802.1d (pg 51). */
@@ -124,11 +125,22 @@ static inline int is_link_local(const un
 int br_handle_frame(struct net_bridge_port *p, struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
-   const unsigned 

Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread Christoph Hellwig
On Mon, Jul 31, 2006 at 01:51:31PM -0700, Michael Wu wrote:
> On Monday 31 July 2006 13:31, John W. Linville wrote:
> > As usual I'll depend on Jiri to merge d80211 stack patches, then
> > send me a pull request.  If I apply your Switch drivers to d80211
> > series now, that will undoubtedly cause a breakage when Jiri asks me
> > to pull this later.
>
> Yeah, there needs to be a new (and smaller) set of patches to switch drivers
> to the d80211.h header.

NACK again.  Drivers should continue to use the ieee80211.h header forever.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


e1000 speed/duplex error

2006-08-01 Thread a1
I'm trying to force my NIC to 100Mb/FD, but I constantly get 100/HD on
the other side of the link.

The command is:
ethtool -s eth0 speed 100 duplex full autoneg off

e1000 driver version 7.1.9 (latest) downloaded from sourceforge.

Are there any problems?

-- 
 alkot  mailto:[EMAIL PROTECTED]



Re: e1000 speed/duplex error

2006-08-01 Thread Jeff Kirsher

On 8/1/06, a1 [EMAIL PROTECTED] wrote:
> I'm trying to set my nic to force 100Mb/FD, but I'm constantly getting 100/HD on
> other side of the link.
>
> The command is:
> ethtool -s eth0 speed 100 duplex full autoneg off
>
> e1000 driver version 7.1.9 (latest) downloaded from sourceforge.

What are you linking to? And what is the link partner set to?

If one link partner is set to auto-negotiate, and the other partner
forced, it is common to see this issue no matter what the two link
partners are.

> Are there any problems?

Not that I am aware of.

--
Cheers,
Jeff


Re: Regarding offloading IPv6 addrconf and ndisc

2006-08-01 Thread Hugo Santos
David,

> So all of you userland control-plane fanatics, how will you handle
> things like NFS root with these daemon-required variants of NDISC and
> ARP?

   Do it in the initial ramdisk, we only need the daemon to setup the
 NDISC entries to talk to the NFS server. :-)

   There is obviously a cost associated with this, a deployment cost.
 But there are additional factors we must consider. In a later e-mail
 you state that Linux is a generic purpose operating system; how many
 users need to boot from a NFS root (besides myself :-)? I think that we
 must take into consideration that currently Linux is used in lots of
 distinct environments, not only Desktop computers, and servers, but
 also smaller devices. Configuration/Flexibility vs. optimization is
 something that varies a lot depending on the deployment you are talking
 about, and in most of my scenarios, a small mobile device isn't
 required at the moment to push 100Mbps (optimization) but must be
 capable of verifying its peers and maintaining secure connections
 (flexibility). So, let's be generic?

   I might have some cycles during the month to code up something in
 this direction, at least for an initial review, i'll try to do so.

   Also, the reliability of a system depends on a lot of things, but
 please, let's not use the assumption that because everything sits in
 the kernel, it will be stable as the number of 'points of failure' is
 smaller; this is only true as long as people work to have stable
 components -- and this is independent of where the components sit. A
 few kernel versions ago (2.6.8 if i remember correctly) i couldn't even
 remove a used network interface safely from the system without hanging
 the network stack. It is possible to have stable user-space code, if
 people developing it work to and make sure it is stable.

   Hugo




Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
On Tue, Aug 01, 2006 at 09:19:34AM +0200, Patrick McHardy wrote:
>
> - nfct/nfctinfo/nfct_reasm: the xfrm output path does an immediate
>   nf_reset, so they were not necessary until now. Queueing can happen
>   on any hook, so we need to preserve them.

This looks OK to me.  However, we should do this in a wrapper around
skb_gso_segment rather than in skb_segment.

The idea with skb_gso_segment is that it's the most generic (or basic)
level of service which is needed by everyone.  If you need anything
more specific, you can do it around skb_gso_segment.

For example, dev_gso_segment is the version that's tailored for
device transmission.

So I'd imagine something like

nf_gso_segment(skb)
{
	segs = skb_gso_segment(skb, 0);
	if (IS_ERR(segs))
		goto out;

	nskb = segs;
	do {
		copy_nf_attributes(nskb, skb);
	} while ((nskb = nskb->next));

out:
	kfree_skb(skb);
	return segs;
}
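
A minimal userspace sketch of the wrapper idea above, assuming a
simplified packet type. struct pkt, copy_attrs() and propagate_attrs()
are hypothetical stand-ins for illustration, not kernel APIs; real code
would operate on struct sk_buff and the list returned by
skb_gso_segment().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct sk_buff. */
struct pkt {
	struct pkt *next;       /* next segment in the list */
	unsigned int nfmark;    /* stand-in for netfilter metadata */
	void *conntrack;        /* stand-in for the conntrack reference */
};

/* Copy only the metadata fields; the list linkage is left untouched. */
static void copy_attrs(struct pkt *dst, const struct pkt *src)
{
	dst->nfmark = src->nfmark;
	dst->conntrack = src->conntrack;
}

/*
 * Walk the segment list and stamp each segment with the metadata of the
 * original packet. Returns the number of segments visited.
 */
static size_t propagate_attrs(struct pkt *segs, const struct pkt *orig)
{
	size_t n = 0;

	assert(orig != NULL);
	for (struct pkt *p = segs; p != NULL; p = p->next) {
		copy_attrs(p, orig);
		n++;
	}
	return n;
}
```

Unlike the do/while loop in the outline, this version also tolerates an
empty segment list; whether that case can occur in the kernel path is
not something this sketch claims.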

> - nf_bridge: needed for GSO on a bridge device until the deferred
>   hooks are removed

I presume this is only needed for netfilter queueing as well?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Linux TCP in the presence of delays or drops...

2006-08-01 Thread Oumer Teyeb

Thanks Ilpo for the info!

I am now running the tests using timestamps only (without FRTO) and
vice versa, to see if there is any change.
Actually, in some of the traces I have also noticed this FRTO behaviour,
where it mistook a loss for a spurious retransmission, which leads to
further performance degradation. But I was also using timestamps, so I
don't know if it also occurs without timestamps. I will try it out and let
you know. I will send you the traces I just mentioned (FRTO+timestamps
leading to a loss being mistaken for a spurious one).


Regards,
Oumer

Ilpo Järvinen wrote:

> On Mon, 31 Jul 2006, Oumer Teyeb wrote:
>
>> -If multiple timeouts occur for one packet then even if we are using the
>> timestamp option or FRTO TCP linux is not able to detect spurious
>> retransmissions... and TCP linux is able to detect spurious retransmissions
>> only for a single timeout for one packet or fast retransmissions that are
>> caused by duplicate ACK reception. I have some traces that show this
>> behaviour, let me know if you are interested.
>
> I have come across this same issue. I can confirm that multiple RTOs are
> not handled correctly. But there are some other issues in FRTO as
> well... nothing extremely dangerous though. In one of the cases, the
> current FRTO algorithm could miss real losses, but you luckily need to be
> quite clever to trigger that, and due to the very conservative response used
> in case a spurious RTO is detected, it has no significant danger in it even
> then. The other flaws cause too conservative behavior.
>
> We have a set of fixes to F-RTO, but part of them have not yet been
> tested. Since the fixes include 4-5 independent changes to handle also
> rare cases, it takes some time to test. Besides, I'll probably have to
> talk with Pasi Sarolahti (author of FRTO) on a couple of interpretation
> issues in RFC4138 as soon as his vacation ends (mid August if I remember
> correctly) to verify some of the changes.
>
> I expect that I'll get some actual results after two weeks or so... In case
> you're in a hurry and are interested in the fixes, I could prepare an
> independent patch quite soon and release it (untested) on our project's web
> site (if you are interested, ask off-list so that we don't bother others
> :-)). But the kernel inclusion of the fixes should IMO wait at least until
> I get some decent test cases analyzed, which will take at least two weeks
> or so due to my schedule.
>
>> -In the cases where TCP timestamp or FRTO is not able to detect spurious
>> retransmissions, the performance degrades even more than when TCP timestamp
>> or FRTO option are not used
>
> That's one of the FRTO features, we have a fix (I cannot say about
> timestamps since we've been running our tests without tstamps for years).



Re: Regarding offloading IPv6 addrconf and ndisc

2006-08-01 Thread Hugo Santos
David,

> To drive this home even more, I do not believe that the people who
> advocate pushing NDISC and ARP policy into userspace would be very
> happy if something like the RAID transformations were moved into
> userspace and they were not able to access their disks if the RAID
> transformer process in userspace died.

   How would you restart the RAID controller daemon if it's stored
 in the RAID itself? Also, assuming the same code quality (and ignoring
 OOM killer for a moment), if the RAID controller daemon dies, if that
 code was in the kernel, it would also possibly crash the whole kernel.

> At that point, network access equals disk access.  It would be amusing
> to need to restart such an NDISC/ARP daemon if it were to live on a
> remote volume. :-)

   What you are saying is that, well, the NDISC handling is already in
 the host's memory (kernel text), so the connection could be restarted
 with the remote storage facility. So, let's be fair, and say that
 somehow the NDISC daemon would be available locally?

> I understand full well that on special purpose network devices this
> control vs. data plane separation into userspace might make a lot of
> sense.  But for a general purpose operating system, such as Linux, the
> greater concern is resiliency to failures and each piece of core
> functionality you move to userspace is a new potential point of
> failure.

   I think 100% of Linux's users want stability. Resiliency to failure
 is not something that depends on the kernel. If the code in question is
 in the kernel, and it crashes, how will you recover?

   Please note that i'm not making this a monolithic vs. micro- kernel
 discussion (i wouldn't want Linus to step in and kick me to hell), but
 if we have the possibility of not having _complex_ interactions within
 the kernel, we are making the kernel itself more resilient to failure.

   Hugo





Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Jamal Hadi Salim
On Tue, 2006-01-08 at 17:45 +1000, Herbert Xu wrote:
> On Tue, Aug 01, 2006 at 12:36:14AM -0700, David Miller wrote:
> >
> > > > - tc_verd/tc_index/input_dev: when directing a packet from a device
> > > >   supporting GSO to a device not supporting GSO using tc actions,
> > > >   these fields may be set.
> > >
> > > This doesn't look right though.  GSO should occur just before
> > > hard_start_xmit (after all tc actions have taken place) so we
> > > shouldn't have any more tc actions to perform.
> >
> > Hmmm, what about loopback?  Yeah yeah, LOOPBACK_TSO is not defined :)
> > but what I'm really referring to is that loopback preserves the
> > traffic classifier bits of the skb.
>
> I don't know.  Jamal, is there a scenario where these three attributes
> are needed for loopback packets?


I didn't fully understand the scope of the discussion, but a little
explanation may help answer the question:

If the only spot where GSO kicks in is at or after the qdisc_is_running
level, then the redirect will happen way before that point, at
qdisc->enqueue time.
When we redirect or mirror, the packet is cloned, and any metadata
preserved at clone time needs to stay intact until dev_queue_xmit is
hit for the destination device.
And of course we have no checks for whether we are redirecting from a
device that is GSO capable to one that is not (or vice-versa). If that's
important, we could add a check.

I am not sure if I answered the question ;->

cheers,
jamal





Re[2]: e1000 speed/duplex error

2006-08-01 Thread a1
Hi, Jeff.

Thanks for the quick reply.

JK> On 8/1/06, a1 [EMAIL PROTECTED] wrote:
>> I'm trying to set my nic to force 100Mb/FD, but I'm constantly getting
>> 100/HD on other side of the link.
>>
>> The command is:
>> ethtool -s eth0 speed 100 duplex full autoneg off
>>
>> e1000 driver version 7.1.9 (latest) downloaded from sourceforge.

JK> What are you linking to? And what is the link partner set to?
I am linking to my 3com gigabit ethernet switch. And it shows
100Mbit/halfduplex.

JK> If one link partner is set to auto-negotiate, and the other partner
JK> forced, it is common to see this issue no matter what the two link
JK> partners are.
I thought the common behavior is that if one side forces any particular
parameter, the other side should sense that and go to that mode too.

In the current case there is a mismatch: one side (linux box)
reports 100/FD and the other (switch) reports 100/HD.

Maybe I don't understand something...

>> Are there any problems?

JK> Not that I am aware of.


-- 
Best regards,
 a1  mailto:[EMAIL PROTECTED]



Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread John W. Linville
On Mon, Jul 31, 2006 at 09:39:08PM -0400, Jamal Hadi Salim wrote:
> On Mon, 2006-31-07 at 08:30 -0400, John W. Linville wrote:
>
> > Do we hold the view that our L2 code is on par with the rest of
> > our code?  Is there an appetite for a clean-up?  Or is it just me?
> >
> > </rant>
> >
> > If you made it this far, thanks for listening...I feel better now. :-)
>
> Yes, I made it this far and you do make a good argument (or i may be
> over-dosed ;->).
> I have seen the following setups that are useful:
>
> 1) Vlans with bridges; in which one or more vlans exist per ethernet
> port. Broadcast packets within such vlans are restricted to just those
> vlans by the bridge.
> 2) complicate the above a little by having multiple spanning trees.
> 3) Add to the above link layer HA (802.1ad or otherwise as presented
> today by Bonding).
>
> To answer your question; i think yes we need all 3.

Oh, don't get me wrong -- I definitely think we need all three.

I'm just not sure we need every conceivable combination of a) bonds
of vlan interfaces; b) vlan interfaces on top of bonds; c) bridged
vlan interfaces w/ disparate vlan IDs; d) bonded vlan interfaces w/
disparate vlan IDs; e) bonded bridge interfaces (does this work?) f)
bonded bonds (seen customers trying to do it); g) bridged vlan
interfaces; h) bridged bonds; i) bridged bridges (probably doesn't
work, but someone probably wants it); j) vlan interfaces on top of
bridges; k) vlan interfaces on top of vlans (double vlan tagging);
and, l) what am I leaving out?

Most (actually all afaik) L2 networking equipment enforces some
hierarchy on the relationship between these L2 entities.  I am more
and more convinced we should do the same, although I do acknowledge
that the current situation does allow for some cleverness.

I'm just not sure that cleverness is worth the headache, especially
since the most clever things usually only work by accident...

> Unfortunately the 3 above are all done by different people with
> different intentions altogether. I think BGrears end goal was VLANs for
> an end host. I think Lennert wrote the original Bridge code and for a
> while had some VLAN code that worked well with bridging (that code died
> as far as i know). Then bonding - there's some pre-historic relation to
> it since D Becker days and then the good folks from Intel adding about
> 1M features to it. Yes, the fact all 3 need to work together is a
> mess ;-> (but there are good pragmatic reasons for them to work
> together)...

I'm sure you are correct -- each entity was developed to serve its
purpose, and each does so admirably on its own.  The fact that they
work together is a desirable miracle.

There is no doubt that we need to be able to do all three (vlan,
bridge, bond) at once.  I'm just not convinced we need to support
stacking them in every conceivable order.  And, I think that a
reconsideration of all three functions as a group could lead to
better/cleaner functionality with easier support for extension (e.g.
802.1s).

Well, I'll quit now before I get sent off to the visionaries list.
I am putting some thought to this, but I'm not yet far enough along
to sound coherent... :-)

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: Regarding offloading IPv6 addrconf and ndisc

2006-08-01 Thread Hugo Santos
Jamal,

> nice to know ;-> At least you can protect some apps if you need to.
> Only racoon and quagga are important for me.
> But what happens then if you have a beast that just chews memory
> forever? I suppose other poor apps will just get shot.

   You should push QoS and differentiation into the memory-subsystem :-)
 Give a priority flag to mmap(). It's not simple to degrade existing
 allocations, but taking into consideration the OOM killer...

   Hugo




Re: e1000 speed/duplex error

2006-08-01 Thread Andy Gospodarek
On Tue, Aug 01, 2006 at 04:03:28PM +0400, a1 wrote:
> Hi, Jeff.
>
> Thanks for the quick reply.
>
> JK> On 8/1/06, a1 [EMAIL PROTECTED] wrote:
> >> I'm trying to set my nic to force 100Mb/FD, but I'm constantly getting
> >> 100/HD on other side of the link.
> >>
> >> The command is:
> >> ethtool -s eth0 speed 100 duplex full autoneg off
> >>
> >> e1000 driver version 7.1.9 (latest) downloaded from sourceforge.
>
> JK> What are you linking to? And what is the link partner set to?
> I am linking to my 3com gigabit ethernet switch. And it shows
> 100Mbit/halfduplex.
>
> JK> If one link partner is set to auto-negotiate, and the other partner
> JK> forced, it is common to see this issue no matter what the two link
> JK> partners are.
> I thought the common behavior is that if one side forces any particular
> parameter, the other side should sense that and go to that mode too.
>
> In the current case there is a mismatch: one side (linux box)
> reports 100/FD and the other (switch) reports 100/HD.
>
> Maybe I don't understand something...

Jeff is correct, but the behavior can change depending on the switch
hardware.  The results you see are expected since I'm quite sure the
IEEE standard says that a device should default to half-duplex when
auto-negotiation fails (which is why your 3com switch defaults to
half-duplex).

-andy



Re[4]: e1000 speed/duplex error

2006-08-01 Thread a1
Hi, Jeff.


JK> OPTION 2: Turn auto-negotiate on the e1000 card and tell it to only
JK> advertise 100 Full Duplex.  This will allow negotiation between the
JK> two link partners and the e1000 will advertise that it is only able to
JK> do 100 Full duplex.

Is there any way I could do this with ethtool? It only allows forcing
speed/duplex, not setting what is advertised...

If I do as follows, the other side reports 1000/FD:

ethtool -s eth0 speed 100 duplex full autoneg on
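
For reference, a sketch of how a restricted advertisement mask could be
computed. The bit positions below are local copies that are assumed to
match the ADVERTISED_* values in linux/ethtool.h; advertise_mask() is a
hypothetical helper, not part of ethtool or the e1000 driver.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies; assumed to mirror ADVERTISED_* in linux/ethtool.h. */
enum {
	ADV_10_HALF   = 1 << 0,
	ADV_10_FULL   = 1 << 1,
	ADV_100_HALF  = 1 << 2,
	ADV_100_FULL  = 1 << 3,
	ADV_1000_HALF = 1 << 4,
	ADV_1000_FULL = 1 << 5,
};

/*
 * Return a mask advertising exactly one speed/duplex combination,
 * or 0 if the speed is not one of 10, 100 or 1000 Mbit/s.
 */
static uint32_t advertise_mask(unsigned int speed_mbps, int full_duplex)
{
	assert(full_duplex == 0 || full_duplex == 1);

	switch (speed_mbps) {
	case 10:
		return full_duplex ? ADV_10_FULL : ADV_10_HALF;
	case 100:
		return full_duplex ? ADV_100_FULL : ADV_100_HALF;
	case 1000:
		return full_duplex ? ADV_1000_FULL : ADV_1000_HALF;
	default:
		return 0;
	}
}
```

For 100 Mbit/s full duplex the mask is 0x008. ethtool releases that
support the advertise keyword accept this as
"ethtool -s eth0 advertise 0x008 autoneg on"; whether the ethtool
version and driver in use here support it would need to be checked.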

-- 
Best Regards,
 a1  mailto:[EMAIL PROTECTED]



Re: Re[2]: e1000 speed/duplex error

2006-08-01 Thread Jamal Hadi Salim
On Tue, 2006-01-08 at 16:03 +0400, a1 wrote:

> I thought the common behavior is that if one side force any particular
> parameter, other side should sense that and go to that mode too.

You _cannot_ depend on that behavior at all. IOW, if one side is not
forced, the other side's setting is undefined and falls back to whatever
that side defines as its default.

> In current case there is misunderstanding - one side (linux box)
> reports 100/FD and other (switch) reports 100/HD.
>
> Maybe I don't understand something...

You must force both sides for predictable behavior. I would say that
doing HD as default is more common. So Linux may need to change just
that one bit.

cheers,
jamal




Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread John W. Linville
On Tue, Aug 01, 2006 at 10:56:50AM +0100, Christoph Hellwig wrote:
> On Mon, Jul 31, 2006 at 01:51:31PM -0700, Michael Wu wrote:
> > On Monday 31 July 2006 13:31, John W. Linville wrote:
> > > As usual I'll depend on Jiri to merge d80211 stack patches, then
> > > send me a pull request.  If I apply your Switch drivers to d80211
> > > series now, that will undoubtedly cause a breakage when Jiri asks me
> > > to pull this later.
> >
> > Yeah, there needs to be a new (and smaller) set of patches to switch
> > drivers to the d80211.h header.
>
> NACK again.  Driver should continue to use the ieee80211.h header forever.

I don't anticipate the d80211 naming conventions to ever make it out
of wireless-dev.  By the time we are ready to push that stuff upstream,
we will have cleaned-up our messes.

This does raise the question: Should we start taking patches to
wireless-dev that migrate the current (i.e. ieee80211/softmac) stack
out of the kernel?  This would include (re-)moving the current stack
code, pointing non-migrated drivers (ipw2[12]00, zd1211rw) at the old
code, moving drivers out of drivers/net/wireless/d80211 up a level,
removing the softmac-based version of the bcm43xx driver, etc.

Are we ready for this?  Who wants to be the wireless janitor?

Whether Michael's patches come before or after this clean-up
really doesn't matter.  I'd probably rather have them now.  It only
complicates the migration slightly, while accomplishing something
useful.

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
On Mon, Jul 31, 2006 at 08:36:58PM +0200, Patrick McHardy wrote:

> [NETFILTER]: Get rid of HW checksum invalidation
>
> Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

It all looks great except for the csum update function.

> diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> index 5d29d5e..a7c42c8 100644
> --- a/net/netfilter/core.c
> +++ b/net/netfilter/core.c
> @@ -222,6 +222,29 @@ copy_skb:
>  }
>  EXPORT_SYMBOL(skb_make_writable);
>
> +u_int16_t nf_csum_update(u_int32_t oldval, u_int32_t newval, u_int16_t csum)
> +{
> +	u_int32_t diff[] = { oldval, newval };
> +
> +	return csum_fold(csum_partial((char *)diff, sizeof(diff),
> +				      csum ^ 0x));
> +}
> +EXPORT_SYMBOL(nf_csum_update);
> +
> +u_int16_t nf_proto_csum_update(struct sk_buff *skb,
> +			       u_int32_t oldval, u_int32_t newval,
> +			       u_int16_t csum, int pseudohdr)
> +{
> +	if (skb->ip_summed != CHECKSUM_PARTIAL) {
> +		csum = nf_csum_update(oldval, newval, csum);
> +		if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)

Shouldn't that be !pseudohdr?

> +			skb->csum = nf_csum_update(oldval, newval, skb->csum);
>                                                               ^^^^^^^^^

This is a 32-bit quantity so nf_csum_update should eat a 32-bit quantity
as well.  Also, this checksum is not inverted so you need

	skb->csum = ~nf_csum_update(oldval, newval, ~skb->csum);

Of course nf_csum_update will need ~csum instead of csum^0x.
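
For readers following the inversion argument, here is a standalone
sketch of an incremental one's-complement checksum update in the style
of RFC 1624 (HC' = ~(~HC + ~m + m')). It is an illustration under
stated assumptions, not the kernel's csum_partial()/csum_fold() code;
fold32(), full_csum() and csum_update32() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum: inverted one's-complement sum of 16-bit words. */
static uint16_t full_csum(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	assert(words != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~fold32(sum);
}

/*
 * Incremental update after a 32-bit field changes from oldval to
 * newval. The stored checksum is inverted, so it is un-inverted first,
 * the old halves are subtracted (added inverted), the new halves are
 * added, and the result is inverted again.
 */
static uint16_t csum_update32(uint16_t csum, uint32_t oldval, uint32_t newval)
{
	uint32_t sum = (uint16_t)~csum;

	sum += (uint16_t)~(oldval & 0xffff);
	sum += (uint16_t)~(oldval >> 16);
	sum += newval & 0xffff;
	sum += newval >> 16;
	return (uint16_t)~fold32(sum);
}
```

A running sum that is kept un-inverted has to be inverted before and
after such an update, which is the point made above about skb->csum.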

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 2/2] forcedeth: mac address corrected

2006-08-01 Thread Andy Gospodarek
On Mon, Jul 31, 2006 at 12:05:01PM -0400, Ayaz Abdulla wrote:
 This patch will correct the mac address and set a flag to indicate that 
 it is already corrected in case nv_probe is called again. For example, 
 when you use kexec to restart the kernel.
 
 Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED]
 

Have you found other situations where this patch is critical other than
when kevec is used to restart the kernel?



Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread Jamal Hadi Salim
On Tue, 2006-01-08 at 08:08 -0400, John W. Linville wrote:
[..]
 There is no doubt that we need to be able to do all three (vlan,
 bridge, bond) at once.  I'm just not convinced we need to support
 stacking them in every conceivable order.  

In theory there should be no issues stacking netdevices in any order
you want. In other words the hooks for doing so exist (albeit in some
limited way[1]). Practically, some of the topologies of interconnected
netdevices don't make a lot of sense. The danger is in restricting how
the stacking happens and overlooking some future creative use.
Safer to let the user own the policy and configure any way they want aka
shoot themselves in the foot.

 And, I think that a
 reconsideration of all three functions as a group could lead to
 better/cleaner functionality with easier support for extension (e.g.
 802.1s).

Agreed. I have some very strong opinions on this subject that i could
share with you if you want. For example, IMO, I think it would be a lot
more reasonable to assume that a VLAN or VLANS are attributes of a netdevice
(just like IP addresses or MAC addresses are). 

cheers,
jamal



Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
On Tue, Aug 01, 2006 at 08:00:48AM -0400, Jamal Hadi Salim wrote:
   
 - tc_verd/tc_index/input_dev: when directing a packet from a device
   supporting GSO to a device not supporting GSO using tc actions,
   these fields may be set.

This doesn't look right though.  GSO should occur just before
hard_start_xmit (after all tc actions have taken place) so we
shouldn't have any more tc actions to perform.
   
   Hmmm, what about loopback?  Yeah yeah, LOOPBACK_TSO is not defined :)
   but what I'm really referring to is that loopback preserves the
   traffic classifier bits of the skb.
  
  I don't know.  Jamal, is there a scenario where these three attributes
  are needed for loopback packets?
 
 I didn't fully understand the scope of the discussion, but a little
 explanation may help answer the question:

My question isn't really about GSO as such.

What I'd like to know is do we really need to preserve

tc_verd/tc_index/input_dev

for a packet crossing loopback's xmit function?

The reason I'm asking is because currently they're preserved as a matter
of course since loopback just shoves the same packet down the other end.

If we were to apply GSO to loopback (purely as a thought exercise since
loopback can handle large frames just fine without segmenting it :) would
we need to copy those attributes?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Jamal Hadi Salim
On Tue, 2006-01-08 at 22:34 +1000, Herbert Xu wrote:

 What I'd like to know is do we really need to preserve
 
   tc_verd/tc_index/input_dev
 
 for a packet crossing loopback's xmit function?
 

My instinctive reaction is to say no. Here's a (slightly complex)
example:

-- eth0(GSO ON) --- lo -- eth1(GSO off) -- eth3(GSO ON)

When we get to lo in the above graph, input_dev=eth0 and when we leave
that info will be overwritten to be input_dev=lo. 

I just added the GSO markers (in case that info is useful) to show that
we could move in the same topology between GSO and non-GSO devices.

I believe it would be fine for lo not to preserve. Not sure if it is ok
as a general rule though. 

cheers,
jamal



Re: neigh_lookup lockdep bug.

2006-08-01 Thread Arjan van de Ven
On Mon, 2006-07-31 at 14:02 -0700, David Miller wrote:
 From: Dave Jones [EMAIL PROTECTED]
 Date: Mon, 31 Jul 2006 16:50:04 -0400
 
  2.6.18rc2-gitSomething on my firewall box just triggered this..
 
 Lockdep is perhaps confused.
 
  [515613.904945] swapper/0 is trying to acquire lock:
  [515613.931489]  (tbl->lock){-+-+}, at: [c05b5d63] neigh_lookup+0x50/0xaf
  [515613.964369] 
  [515613.964373] but task is already holding lock:
  [515614.006550]  (skb_queue_lock_key){-+..}, at: [c05b741c] 
  neigh_proxy_process+0x20/0xc2
 
 The skb_queue_lock in question is tbl->proxy_queue.lock
 
  [515614.103459] the existing dependency chain (in reverse order) is:
  [515614.148752] 
  [515614.148755] - #2 (skb_queue_lock_key){-+..}:
  [515614.10][c043bf43] lock_acquire+0x4b/0x6c
  [515614.215554][c06089a7] _spin_lock_irqsave+0x22/0x32
  [515614.243606][c05ac2e3] skb_dequeue+0x12/0x43
  [515614.269657][c05acffe] skb_queue_purge+0x14/0x1b
  [515614.296565][c05b673e] neigh_update+0x317/0x353
 
 This is a different queue lock, namely neigh->arp_queue.lock
 
 Like the ipv6 trace we got yesterday from Matt Domsch, lockdep
 is apparently confusing two instances of the skb_queue_lock_key

we fixed lockdep to have this lock key to be per skb queue ... didn't
you put that patch in rawhide Dave (J) ?
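
As a rough illustration of why a shared lock class key can produce this kind of report, here is a toy model. It is not lockdep, and it simplifies heavily: it assumes a validator that only compares class keys, so two distinct locks initialised with one static key are indistinguishable to it. All names (toy_lock, toy_lock_init_shared, toy_lock_init_per_instance, toy_looks_recursive) are hypothetical.

```c
#include <stdbool.h>

/* The toy validator identifies a lock only by its class key. */
struct toy_lock {
    const void *class_key;
};

/* One key shared by every queue lock: the situation described above. */
static char shared_queue_key;

static void toy_lock_init_shared(struct toy_lock *l)
{
    l->class_key = &shared_queue_key;
}

/* Per-instance key: each lock forms its own class. */
static void toy_lock_init_per_instance(struct toy_lock *l)
{
    l->class_key = l;
}

/* Would taking "next" while holding "held" look like one lock taken twice? */
static bool toy_looks_recursive(const struct toy_lock *held,
                                const struct toy_lock *next)
{
    return held->class_key == next->class_key;
}
```

With the shared key, a proxy queue lock and an arp queue lock compare equal in this model; with per-instance keys they do not, which is the effect the per-queue key change is meant to have.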



Re: [PATCH 2/2] forcedeth: mac address corrected

2006-08-01 Thread Andy Gospodarek
On Tue, Aug 01, 2006 at 08:27:27AM -0400, Andy Gospodarek wrote:
 On Mon, Jul 31, 2006 at 12:05:01PM -0400, Ayaz Abdulla wrote:
  This patch will correct the mac address and set a flag to indicate that 
  it is already corrected in case nv_probe is called again. For example, 
  when you use kexec to restart the kernel.
  
  Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED]
  
 
 Have you found other situations where this patch is critical other than
 when kevec is used to restart the kernel?
 

s/kevec/kexec/g



Re: [take2 1/4] kevent: core files.

2006-08-01 Thread James Morris
On Tue, 1 Aug 2006, Evgeniy Polyakov wrote:

 + u->ready_num = 0;
 +#ifdef CONFIG_KEVENT_USER_STAT
 + u->wait_num = u->im_num = u->total = 0;
 +#endif

Generally, #ifdefs in the body of the kernel code are discouraged.  Can 
you abstract these out as static inlines?


- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread Jiri Benc
 On Tue, Aug 01, 2006 at 10:56:50AM +0100, Christoph Hellwig wrote:
  NACK again.  Driver should continue to use the ieee80211.h header forever.

When a patch that renames constants in d80211 is merged and d80211 stack
is moved to ieee80211/ directory, there will be only slight changes (if
any) needed for most drivers. So yes, fullmac drivers will continue to
use ieee80211.h, constants and structures they use will be still the
same, but the header itself will be completely different.

One notable exception is ipw2100 and ipw2200 drivers, but that's another
story.

On Tue, 1 Aug 2006 08:21:49 -0400, John W. Linville wrote:
 This does raise the question: Should we start taking patches to
 wireless-dev that migrate the current (i.e. ieee80211/softmac) stack
 out of the kernel?

I think we should, at least some of them.

 This would include (re-)moving the current stack
 code,

This can be done easily just before merging, no reason for one more
breakage of everyone's drivers now. Furthermore, if we do this now, it
will be more difficult to track Linus' tree.

 pointing non-migrated drivers (ipw2[12]00, zd1211rw) at the old
 code,

Yes. Rather than moving, zd1211 should be ported to d80211 - this will
also allow using of more advanced features of the hw.

 moving drivers out of drivers/net/wireless/d80211 up a level,

I think they should be moved along with the stack - i.e. just before
merging.

 removing the softmac-based version of the bcm43xx driver, etc.

Ditto.

 Whether Michael's patches come before or after this clean-up
 really doesn't matter.  I'd probably rather have them now.  It only
 complicates the migration slightly, while accomplishing something
 useful.

Do you have a plan when you will merge rt2x00 patches so I can apply
Michael's renaming patch(es) without risk of conflicts?

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread John W. Linville
On Tue, Aug 01, 2006 at 03:58:37PM +0200, Jiri Benc wrote:

 Do you have a plan when you will merge rt2x00 patches so I can apply
 Michael's renaming patch(es) without risk of conflicts?

Working on it ~now.  I hit a snag in that Ivo's patches seem to rely on
his radio button patch, which I had ignored until now.  I'll probably
pull that in this morning (or figure-out how to not need it) and get
something pushed to wireless-dev sometime today.

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread Ivo Van Doorn

Hi,


 Do you have a plan when you will merge rt2x00 patches so I can apply
 Michael's renaming patch(es) without risk of conflicts?

Working on it ~now.  I hit a snag in that Ivo's patches seem to rely on
his radio button patch, which I had ignored until now.  I'll probably
pull that in this morning (or figure-out how to not need it) and get
something pushed to wireless-dev sometime today.


Ehm, that should not have happened. The rt2x00 version I had sent as patches
had the radio button integrated as they had been in wireless-dev already.
The patch to convert it to the rfkill driver is one of the patches
I'll send later.

Ivo


Re: e1000 speed/duplex error

2006-08-01 Thread Auke Kok

Jeff Kirsher wrote:

On 8/1/06, a1 [EMAIL PROTECTED] wrote:

Hi, Jeff.


JK OPTION 2: Turn auto-negotiate on the e1000 card and tell it to only
JK advertise 100 Full Duplex.  This will allow negotiation between the
JK two link partners and the e1000 will advertise that it is only able to
JK do 100 Full duplex.

Is there any way i could do this with ethtool? It only allows force
spd/dplx , but not set it for advertising...


Not currently.  That would be a nice feature though... :)



If i do as follows other side reports 1000/FD:

ethtool -s eth0 speed 100 duplex full autoneg on



Which is what I would expect.  I have to step away for a bit, but if
no one responds with how to load the driver with auto-negotiate
advertising only 100 Full Duplex, I will do so when I return.


Here's that part of the driver documentation:

$ modprobe e1000 AutoNeg=0x08
e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD


/* Auto-negotiation Advertisement Override
 *
 * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
 *
 * The AutoNeg value is a bit mask describing which speed and duplex
 * combinations should be advertised during auto-negotiation.
 * The supported speed and duplex modes are listed below
 *
 * Bit            7      6      5      4      3      2      1      0
 * Speed (Mbps)   N/A    N/A    1000   N/A    100    100    10     10
 * Duplex                       Full          Full   Half   Full   Half
 *
 * Default Value: 0x2F (copper); 0x20 (fiber)
 */
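
To make the bit mask concrete, here is a small user-space sketch that decodes an AutoNeg value using the table in the driver comment. It is not driver code; the names autoneg_modes and autoneg_decode are invented for this example, and the short mode labels are just a convenient spelling.

```c
#include <stddef.h>

/* Bit positions follow the table in the driver comment. */
static const char *const autoneg_modes[8] = {
    "10/HD",    /* bit 0 */
    "10/FD",    /* bit 1 */
    "100/HD",   /* bit 2 */
    "100/FD",   /* bit 3 */
    NULL,       /* bit 4: N/A */
    "1000/FD",  /* bit 5 */
    NULL,       /* bit 6: N/A */
    NULL,       /* bit 7: N/A */
};

/* Fill out[] with the modes selected by mask; return how many were set. */
static size_t autoneg_decode(unsigned int mask, const char *out[8])
{
    size_t n = 0;

    for (unsigned int bit = 0; bit < 8; bit++)
        if ((mask & (1u << bit)) && autoneg_modes[bit])
            out[n++] = autoneg_modes[bit];
    return n;
}
```

Under this reading, AutoNeg=0x08 selects only bit 3 (100 Mbps, full duplex), and the copper default 0x2F selects five modes.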

hth,

Auke


Re: Runtime power management for network interfaces

2006-08-01 Thread David Brownell
On Monday 31 July 2006 9:17 am, Stephen Hemminger wrote:
 On Tue, 25 Jul 2006 11:59:52 -0400 (EDT)
 Alan Stern [EMAIL PROTECTED] wrote:
 
  During a Power Management session at the Ottawa Linux Symposium, it was
  generally agreed that network interface drivers ought to automatically
  suspend their devices (if possible) whenever:
  
  (1) The interface is ifconfig'ed down, or
  
  (2) No link is available.
 
 This is hard because most of the power may be consumed by the PHY interface
 and it needs to be alive to see link.

True only for #2, yes?  I think #1 could be adopted pretty widely, but
no driver I've yet come across implements that policy.  I think maybe
Don Becker didn't have power management on the brain nearly enough.  :)


  Has any progress been made in this direction?  If not, a natural approach 
  would be to start with a reference implementation in one driver which 
  could then be copied to other drivers.
  
 
 The problem is not generic, it really is specific to each device.

For #2, yes.  Much less so for #1; if the hardware has a low power mode,
there's no point in being in any other mode when the interface is down.

This might actually be a good time to start rethinking power management for
network interfaces.  Upcoming kernels (see the MM tree) have new class
methods for suspend() and resume(), with the notion that they should be
offloading tasks from drivers.  For network links, that would most
naturally be netif_device_detach() on suspend, etc; if netdevices were
to provide suspend() and resume() methods, those could be called via
class suspend/resume as well as when they're configured down/up.  Just
an idea of course ... but it might well be possible that some changes
like that would be a nice incremental power savings on many systems,
while simplifying some tasks that often confuse driver writers.


 We have all the necessary infrastructure to do the right thing in the
 network device driver, but in many cases we don't have the code or the
 technical information to do proper power management.

I think that's more true for wakeup events than for PM in general.  After
all, quite a lot of network drivers do have suspend() methods that do
something even if it's just going into PCI_D3, and resume() methods that
are fully capable of re-initializing from power-off.

- Dave



E1000: bug on error path in e1000_probe()

2006-08-01 Thread Stephane Doyon

Hi,

The e1000_probe() function passes references to the netdev structure 
before it's actually registered. In the (admittedly obscure) case where 
the netdev registration fails, we are left with a dangling reference.


Specifically, e1000_probe() calls
netif_carrier_off(netdev);
before register_netdev(netdev).

(It also calls pci_set_drvdata(pdev, netdev) rather early, not sure how 
important that is.)


netif_carrier_off() does linkwatch_fire_event(dev);, which in turn does 
dev_hold(dev); and queues up an event with a reference to the netdev.


But the net_device reference counting mechanism only works on registered 
netdevs.


Should the register_netdev() call fail, the error path does 
free_netdev(netdev);, and when the event goes off, it accesses random 
memory through the dangling reference.


I would recommend moving the register_netdev() call earlier.
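
The ordering problem can be sketched with a toy user-space model. This is only an illustration of the sequence described above, not e1000 or net core code; toy_netdev, toy_carrier_off, toy_register, toy_free and toy_probe are all hypothetical names, and the one-slot pending_event pointer stands in for the queued linkwatch event.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_netdev {
    int  refcnt;
    bool registered;
    bool freed;
};

/* One-slot stand-in for the queued link-watch event. */
static struct toy_netdev *pending_event;

/* Takes a reference and queues an event that points at the device. */
static void toy_carrier_off(struct toy_netdev *dev)
{
    dev->refcnt++;
    pending_event = dev;
}

static int toy_register(struct toy_netdev *dev, bool fail)
{
    if (fail)
        return -1;
    dev->registered = true;
    return 0;
}

static void toy_free(struct toy_netdev *dev)
{
    dev->freed = true;
}

/* Returns true if probe left a queued event pointing at a freed device. */
static bool toy_probe(bool register_first, bool register_fails)
{
    static struct toy_netdev dev;

    dev = (struct toy_netdev){0};
    pending_event = NULL;

    if (!register_first)
        toy_carrier_off(&dev);          /* reference taken too early */

    if (toy_register(&dev, register_fails) != 0) {
        toy_free(&dev);                 /* error path ignores refcnt */
        return pending_event != NULL && pending_event->freed;
    }

    if (register_first)
        toy_carrier_off(&dev);
    return false;
}
```

In this model, toy_probe(false, true) reports a dangling event, while registering first and failing leaves nothing queued.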

Thanks



Re: Runtime power management for network interfaces

2006-08-01 Thread Lennert Buytenhek
On Mon, Jul 31, 2006 at 09:17:28AM -0700, Stephen Hemminger wrote:

  During a Power Management session at the Ottawa Linux Symposium, it was
  generally agreed that network interface drivers ought to automatically
  suspend their devices (if possible) whenever:
  
  (1) The interface is ifconfig'ed down, or
  
  (2) No link is available.
 
 This is hard because most of the power may be consumed by the PHY
 interface and it needs to be alive to see link.

At least some Davicom PHYs (IIRC) have an 'energy detect' bit, which
allows you to very quickly see whether there is a link partner without
waiting for autonegotiation to complete.

So, you could just power on the PHY only once every couple of seconds,
and instantly power it down again if the 'energy detect' bit doesn't go
on within some short time interval.
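
A minimal sketch of one such poll cycle, written against a simulated PHY so it can run in user space. The struct toy_phy, its fields, and the helpers phy_power, phy_energy_detect and poll_link_once are assumptions made for this example; real hardware would need register accesses and a settle delay that are not modelled here.

```c
#include <stdbool.h>

struct toy_phy {
    bool powered;
    bool partner_present;   /* what the wire would report if we looked */
};

static void phy_power(struct toy_phy *phy, bool on)
{
    phy->powered = on;
}

/* The 'energy detect' bit can only be read while the PHY is powered. */
static bool phy_energy_detect(const struct toy_phy *phy)
{
    return phy->powered && phy->partner_present;
}

/*
 * One poll cycle: power up, sample energy detect, and power straight
 * back down if nothing is there. Returns true (PHY left on) when a
 * link partner was seen.
 */
static bool poll_link_once(struct toy_phy *phy)
{
    phy_power(phy, true);
    if (phy_energy_detect(phy))
        return true;
    phy_power(phy, false);
    return false;
}
```

A driver following this idea would call something like poll_link_once() from a timer every couple of seconds while no link is up.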


cheers,
Lennert


Re: [take2 1/4] kevent: core files.

2006-08-01 Thread Evgeniy Polyakov
On Tue, Aug 01, 2006 at 09:46:58AM -0400, James Morris ([EMAIL PROTECTED]) 
wrote:
 On Tue, 1 Aug 2006, Evgeniy Polyakov wrote:
 
  +   u->ready_num = 0;
  +#ifdef CONFIG_KEVENT_USER_STAT
  +   u->wait_num = u->im_num = u->total = 0;
  +#endif
 
 Generally, #ifdefs in the body of the kernel code are discouraged.  Can 
 you abstract these out as static inlines?

Yes, it is possible.
I would ask is it needed at all? It contains number of immediately fired
events (i.e. those which were ready when event was added and thus
syscall returned immediately showing that it is ready), total number of
events, which were inserted in the given queue and number of events
which were marked as ready after they were inserted.
Currently it is compilation option which ends up in printk with above
info when kevent queue is removed.
 
 - James
 -- 
 James Morris
 [EMAIL PROTECTED]

-- 
Evgeniy Polyakov


Re: [take2 1/4] kevent: core files.

2006-08-01 Thread James Morris
On Tue, 1 Aug 2006, Evgeniy Polyakov wrote:

 On Tue, Aug 01, 2006 at 09:46:58AM -0400, James Morris ([EMAIL PROTECTED]) 
 wrote:
  On Tue, 1 Aug 2006, Evgeniy Polyakov wrote:
  
   + u->ready_num = 0;
   +#ifdef CONFIG_KEVENT_USER_STAT
   + u->wait_num = u->im_num = u->total = 0;
   +#endif
  
  Generally, #ifdefs in the body of the kernel code are discouraged.  Can 
  you abstract these out as static inlines?
 
 Yes, it is possible.
 I would ask is it needed at all?

Yes, please, it is standard kernel development practice.

Otherwise, the kernel will turn into an unmaintainable #ifdef jungle.

 It contains number of immediately fired
 events (i.e. those which were ready when event was added and thus
 syscall returned immediately showing that it is ready), total number of
 events, which were inserted in the given queue and number of events
 which were marked as ready after they were inserted.
 Currently it is compilation option which ends up in printk with above
 info when kevent queue is removed.

Fine, make 

static inline void kevent_user_stat_reset(u);

etc.

which compile to nothing when it's not configured.
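
A minimal sketch of that pattern, assuming a struct kevent_user with the fields from the quoted hunk. The field types and the kevent_user_init() wrapper are guesses made for illustration, not the actual kevent code; only kevent_user_stat_reset and CONFIG_KEVENT_USER_STAT come from the thread.

```c
/* Field types are assumed; only the names come from the quoted patch. */
struct kevent_user {
    unsigned int ready_num;
#ifdef CONFIG_KEVENT_USER_STAT
    unsigned long im_num;
    unsigned long wait_num;
    unsigned long total;
#endif
};

#ifdef CONFIG_KEVENT_USER_STAT
static inline void kevent_user_stat_reset(struct kevent_user *u)
{
    u->wait_num = u->im_num = u->total = 0;
}
#else
/* Empty stub: the call in the function body compiles to nothing. */
static inline void kevent_user_stat_reset(struct kevent_user *u)
{
    (void)u;
}
#endif

/* Hypothetical caller: no #ifdef is needed in the function body. */
static inline void kevent_user_init(struct kevent_user *u)
{
    u->ready_num = 0;
    kevent_user_stat_reset(u);
}
```

The conditional compilation is confined to the declarations, so the code that uses the statistics reads the same whether or not the option is enabled.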


-- 
James Morris
[EMAIL PROTECTED]


Re: [take2 1/4] kevent: core files.

2006-08-01 Thread Evgeniy Polyakov
On Tue, Aug 01, 2006 at 10:27:36AM -0400, James Morris ([EMAIL PROTECTED]) 
wrote:
+   u->ready_num = 0;
+#ifdef CONFIG_KEVENT_USER_STAT
+   u->wait_num = u->im_num = u->total = 0;
+#endif
   
   Generally, #ifdefs in the body of the kernel code are discouraged.  Can 
   you abstract these out as static inlines?
  
  Yes, it is possible.
  I would ask is it needed at all?
 
 Yes, please, it is standard kernel development practice.

Will do.
Thanks, James.

 -- 
 James Morris
 [EMAIL PROTECTED]

-- 
Evgeniy Polyakov


Re: [IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls

2006-08-01 Thread Andrew Morton
On Mon, 31 Jul 2006 19:04:33 +1000
Herbert Xu [EMAIL PROTECTED] wrote:

 2) There is something broken in the x86_64 unwind code which is causing
 it to panic just about every time somebody calls dump_stack().
 
 Andi, this is the second time I've seen a report where an otherwise
 harmless dump_stack call (the other one was caused by a WARN_ON) gets
 turned into a panic by the stack unwind code on x86_64.  This particular
 report is with 2.6.18-rc3 so it looks like whatever bug is causing it
 hasn't been fixed yet.
 
 Could you please have a look at it? Thanks.

Jan thinks this might have been fixed by a patch which he sent Andi a
couple of days ago.  Andi has sent that patch to Linus but I'm not sure
which patch it was and I'm not sure whether it has been merged into
mainline.

But yes, -rc3 unwind has problems.


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Phil Oester
On Tue, Aug 01, 2006 at 12:00:59AM -0700, David Miller wrote:
 What we have now is infinitely better than the past,
 wherein all TSO packets were dropped due to corrupt
 checksums as soon as the NAT module was loaded.

At what point did this problem begin?  2.6.18-rc or prior?

Phil


Re: Runtime power management for network interfaces

2006-08-01 Thread Stephen Hemminger
On Mon, 31 Jul 2006 17:29:41 -0700
David Brownell [EMAIL PROTECTED] wrote:

 On Monday 31 July 2006 9:17 am, Stephen Hemminger wrote:
  On Tue, 25 Jul 2006 11:59:52 -0400 (EDT)
  Alan Stern [EMAIL PROTECTED] wrote:
  
   During a Power Management session at the Ottawa Linux Symposium,
   it was generally agreed that network interface drivers ought to
   automatically suspend their devices (if possible) whenever:
   
   (1) The interface is ifconfig'ed down, or
   
   (2) No link is available.
  
  This is hard because most of the power may be consumed by the PHY
  interface and it needs to be alive to see link.
 
 True only for #2, yes?  I think #1 could be adopted pretty widely, but
 no driver I've yet come across implements that policy.  I think maybe
 Don Becker didn't have power management on the brain nearly
 enough.  :)
 
 
   Has any progress been made in this direction?  If not, a natural
   approach would be to start with a reference implementation in one
   driver which could then be copied to other drivers.
   
  
  The problem is not generic, it really is specific to each device.
 
 For #2, yes.  Much less so for #1; if the hardware has a low power
 mode, there's no point in being in any other mode when the interface
 is down.
 
 This might actually be a good time to start rethinking power
 management for network interfaces.  Upcoming kernels (see the MM
 tree) have new class methods for suspend() and resume(), with the
 notion that they should be offloading tasks from drivers.  For
 network links, that would most naturally be netif_device_detach() on
 suspend, etc; if netdevices were to provide suspend() and resume()
 methods, those could be called via class suspend/resume as well as
 when they're configured down/up.  Just an idea of course ... but it
 might well be possible that some changes like that would be a nice
 incremental power savings on many systems, while simplifying some
 tasks that often confuse driver writers.

The tg3 and sky2 drivers both do power saving when not up.
It doesn't need special generic support. Just don't bring chip
up if the interface is config'd down.
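
As a rough sketch of that policy (not taken from tg3 or sky2), a driver can leave the chip in its low-power state after probe and only power it in the open path. The struct toy_nic and the nic_probe, nic_open and nic_stop helpers are hypothetical stand-ins for a driver's probe, open and stop methods.

```c
#include <stdbool.h>

struct toy_nic {
    bool chip_powered;
    bool if_up;
};

/* Probe discovers the device but leaves it in low power. */
static void nic_probe(struct toy_nic *nic)
{
    nic->if_up = false;
    nic->chip_powered = false;
}

/* Interface configured up: power the chip, then start it. */
static int nic_open(struct toy_nic *nic)
{
    nic->chip_powered = true;
    nic->if_up = true;
    return 0;
}

/* Interface configured down: stop it, then drop back to low power. */
static int nic_stop(struct toy_nic *nic)
{
    nic->if_up = false;
    nic->chip_powered = false;
    return 0;
}
```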

 
  We have all the necessary infrastructure to do the right thing in
  the network device driver, but in many cases we don't have the code
  or the technical information to do proper power management.
 
 I think that's more true for wakeup events than for PM in general.
 After all, quite a lot of network drivers do have suspend() methods
 that do something even if it's just going into PCI_D3, and resume()
 methods that are fully capable of re-initializing from power-off.
 
 - Dave
 


Re: [DSCAPE] rate_control needs some form of reference counting

2006-08-01 Thread Jiri Benc
On Tue, 1 Aug 2006 17:16:32 +0200, Karol Lewandowski wrote:
 Without that ieee80211_register_hw was returning value from previous
 check:

I missed that. Thanks for spotting this.

 This fixes that oops.  Problem remains, though.  I suppose that oops
 would happen anyway if someone would do:
 
   # modprobe rate_control   # loads 80211
   # modprobe rt2500pci
 
 (Driver needs to initialize cleanly here, mine has some (different)
 problem so I'm just guessing here...)
 
  # rmmod rate_control
 
 oops here, I think.

This is the reason I wrote it's not easy to fix.

But your patch is correct and will be applied, thanks.

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread Ben Greear

Jamal Hadi Salim wrote:

On Tue, 2006-01-08 at 08:08 -0400, John W. Linville wrote:
[..]


There is no doubt that we need to be able to do all three (vlan,
bridge, bond) at once.  I'm just not convinced we need to support
stacking them in every conceivable order.  



In theory there should be no issues stacking netdevices in any order
you want. In other words the hooks for doing so exist (albeit in some
limited way[1]). Practically, some of the topologies of interconnected
netdevices don't make a lot of sense. The danger is in restricting how
the stacking happens and overlooking some future creative use.
Safer to let the user own the policy and configure any way they want aka
shoot themselves in the foot.



And, I think that a
reconsideration of all three functions as a group could lead to
better/cleaner functionality with easier support for extension (e.g.
802.1s).



Agreed. I have some very strong opinions on this subject that i could
share with you if you want. For example, IMO, I think it would be a lot
more reasonable to assume that a VLAN or VLANS are attributes of a netdevice
(just like IP addresses or MAC addresses are). 


As might be expected, I feel that VLANs are much more
useful as full-featured net devices.  I do not believe it is worth
decreasing functionality to try to 'clean up' the code.  There are people
doing interesting things with the mentioned virtual devices that the original
developers of the individual parts never envisioned, but I see that only
as a resounding success and validation of the architecture.

It is true that there are some interesting issues about where you add
the hooks to slurp up vlan, bridged, bonded and other type of virtual
device packets.

At least for some of my out-of-the-tree virtual lan devices (mac-vlan, in
particular), I thought it would be useful to allow dynamic registration of
the layer-2 hooks such as bridging.  This way, where there is no logical way
to determine which virtual interface should get first chance at a packet,
the user can provide the ordering by adjusting where the hooks sit in the
chain.

Last time I mentioned this feature, it was pointed out that the net-filter
hooks provide, or come close to providing, this ability to stack.  If someone
wants to work on virtual device priority, it might be worth investigating
this further and create an API that makes this usable from kernel and
user-space.
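
One way to picture a dynamically registered, user-ordered hook chain is the user-space sketch below. It is only an illustration of the idea, not an existing kernel or netfilter API; l2_hook_register, l2_deliver, the HOOK_ID_* values and the two sample hooks are all invented for this example.

```c
#include <stddef.h>

#define MAX_HOOKS       8
#define HOOK_ID_VLAN    1
#define HOOK_ID_BRIDGE  2

enum hook_verdict { HOOK_PASS, HOOK_CONSUMED };
typedef enum hook_verdict (*l2_hook_fn)(const unsigned char *frame, size_t len);

struct l2_hook {
    int        priority;   /* lower value gets first chance at a frame */
    int        id;
    l2_hook_fn fn;
};

static struct l2_hook chain[MAX_HOOKS];
static size_t chain_len;

/* Insert a hook so the chain stays sorted by priority. */
static int l2_hook_register(int priority, int id, l2_hook_fn fn)
{
    size_t i;

    if (chain_len == MAX_HOOKS)
        return -1;
    for (i = chain_len; i > 0 && chain[i - 1].priority > priority; i--)
        chain[i] = chain[i - 1];
    chain[i] = (struct l2_hook){ priority, id, fn };
    chain_len++;
    return 0;
}

/* Walk the chain in order; return the id of the hook that took the frame. */
static int l2_deliver(const unsigned char *frame, size_t len)
{
    for (size_t i = 0; i < chain_len; i++)
        if (chain[i].fn(frame, len) == HOOK_CONSUMED)
            return chain[i].id;
    return -1;
}

/* Sample hook: takes only 802.1Q-tagged frames (EtherType 0x8100). */
static enum hook_verdict vlan_hook(const unsigned char *frame, size_t len)
{
    if (len >= 14 && frame[12] == 0x81 && frame[13] == 0x00)
        return HOOK_CONSUMED;
    return HOOK_PASS;
}

/* Sample hook: takes whatever reaches it. */
static enum hook_verdict bridge_hook(const unsigned char *frame, size_t len)
{
    (void)frame;
    (void)len;
    return HOOK_CONSUMED;
}
```

Registering the VLAN hook with a lower priority value than the bridge hook gives it first chance at tagged frames; swapping the two numbers reverses the order without touching either hook.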

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread Ben Greear

John W. Linville wrote:

On Mon, Jul 31, 2006 at 09:39:08PM -0400, Jamal Hadi Salim wrote:


On Mon, 2006-31-07 at 08:30 -0400, John W. Linville wrote:




Do we hold the view that our L2 code is on par with the rest of
our code?  Is there an appetite for a clean-up?  Or is it just me?

/rant

If you made it this far, thanks for listening...I feel better now. :-)


Yes, I made it this far and you do make good argument (or i may be
over-dosed ;-).
I have seen the following setups that are useful:

1) Vlans with bridges; in which one or more vlans exist per ethernet
port. Broadcast packets within such vlans are restricted to just those
vlans by the bridge.
2) complicate the above a little by having multiple spanning trees. 
3) Add to the above link layer HA (802.1ad or otherwise as presented
today by Bonding).

To answer your question; i think yes we need all 3.



Oh, don't get me wrong -- I definitely think we need all three.

I'm just not sure we need every conceivable combination of a) bonds
of vlan interfaces; b) vlan interfaces on top of bonds; c) bridged
vlan interfaces w/ disparate vlan IDs; d) bonded vlan interfaces w/
disparate vlan IDs; e) bonded bridge interfaces (does this work?) f)
bonded bonds (seen customers trying to do it); g) bridged vlan
interfaces; h) bridged bonds; i) bridged bridges (probably doesn't
work, but someone probably wants it); j) vlan interfaces on top of
bridges; k) vlan interfaces on top of vlans (double vlan tagging);
and, l) what am I leaving out?


Well, if it makes you feel better, I can't see a good way to do
vlans-over-vlans cleanly, backwards compatibly, and functional with
bridging, etc.  I would not plan to add such a feature to the kernel
unless from its moment of inclusion it could handle at least bridging,
either.  So that feature will probably not see the light of day
any time soon :)


Most (actually all afaik) L2 networking equipment enforces some
hierarchy on the relationship between these L2 entities.  I am more
and more convinced we should do the same, although I do acknowledge
that the current situation does allow for some cleverness.


Very often, the answer to difficult networking issues is to 'get a linux box',
since that very flexibility is often key to interesting problems.


I'm just not sure that cleverness is worth the headache, especially
since the most clever things usually only work by accident...


Or, work by solid, modular design and small tweaks!

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread John W. Linville
On Tue, Aug 01, 2006 at 08:33:34AM -0400, Jamal Hadi Salim wrote:
 On Tue, 2006-01-08 at 08:08 -0400, John W. Linville wrote:

  And, I think that a
  reconsideration of all three functions as a group could lead to
  better/cleaner functionality with easier support for extension (e.g.
  802.1s).
 
 Agreed. I have some very strong opinions on this subject that i could
 share with you if you want. For example, IMO, I think it would be a lot
 more reasonable to assume that a VLAN or VLANs are attributes of a netdevice
 (just like IP addresses or MAC addresses are). 

I'd love to hear them.  Feel free to send them off list, since I know
how shy you can be... :-)

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread John W. Linville
On Tue, Aug 01, 2006 at 09:10:06AM -0700, Ben Greear wrote:
 Jamal Hadi Salim wrote:

 Agreed. I have some very strong opinions on this subject that i could
 share with you if you want. For example, IMO, I think it would be a lot
 more reasonable to assume that a VLAN or VLANs are attributes of a netdevice
 (just like IP addresses or MAC addresses are). 
 
 As might be expected, I feel that VLANs are much more
 useful as full-featured net devices.  I do not believe it is worth
 decreasing functionality to try to 'clean up' the code.

In general, I agree that we shouldn't lose functionality.

I'm curious as to what particular functionality you fear would be
lost if VLANs were not implemented the way they are now?

Thanks,

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread Ben Greear

John W. Linville wrote:

On Tue, Aug 01, 2006 at 09:10:06AM -0700, Ben Greear wrote:


Jamal Hadi Salim wrote:




Agreed. I have some very strong opinions on this subject that i could
share with you if you want. For example, IMO, I think it would be a lot
more reasonable to assume that a VLAN or VLANs are attributes of a netdevice
(just like IP addresses or MAC addresses are). 


As might be expected, I feel that VLANs are much more
useful as full-featured net devices.  I do not believe it is worth
decreasing functionality to try to 'clean up' the code.



In general, I agree that we shouldn't lose functionality.

I'm curious as to what particular functionality you fear would be
lost if VLANs were not implemented the way they are now?


Well, Jamal and I and others discussed this in depth in the 2.4.12 time frame
when VLANs were about to go into the kernel.  Basically, my point is that
if VLANs are true devices, they will just work with all of the user-space
protocols and they will easily handle abstractions such as bridges,
(multiple) IP addresses, MAC addresses, net-filter, and all the rest.

Sounds like Jamal still remembers his reasons for wanting it otherwise...so
will let him describe his reasons.

Nothing is set in stone in Linux, and I am certainly not the final arbiter of
what gets into the linux kernel, but in my opinion, the current VLAN
architecture is superior to the ip-alias model, and I see no reason to make
any significant changes.

Ben



Thanks,

John



--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: e1000 speed/duplex error

2006-08-01 Thread Rick Jones

I thought the common behavior is that if one side forces any particular
parameter, the other side should sense that and go to that mode too.


Nope.  That is a common misconception and perhaps the source of many 
duplex mismatch problems today.  Here is some boilerplate I bring-out 
from time to time that may be of help:


$ cat usenet_replies/duplex
How 100Base-T Autoneg is supposed to work:

When both sides of the link are set to autoneg, they will negotiate
the duplex setting and select full-duplex if both sides can do
full-duplex.

If one side is hardcoded and not using autoneg, the autoneg process
will fail and the side trying to autoneg is required by spec to use
half-duplex mode.

If one side is using half-duplex, and the other is using full-duplex,
sorrow and woe is the usual result.

So, the following table shows what will happen given various settings
on each side:

            Auto       Half       Full

   Auto     Happiness  Lucky      Sorrow

   Half     Lucky      Happiness  Sorrow

   Full     Sorrow     Sorrow     Happiness

Happiness means that there is a good shot of everything going well.
Lucky means that things will likely go well, but not because you did
anything correctly :) Sorrow means that there _will_ be a duplex
mis-match.
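
For anyone who prefers code to tables, the rules above can be sketched
in C roughly as follows. This is only an illustration of the three
paragraphs and the table: the enum and function names are made up here
and do not come from any driver, and the sketch assumes both ends are
capable of full-duplex.

```c
/* Configuration of one end of a 100Base-T link (names are illustrative). */
enum port_cfg { CFG_AUTO, CFG_HALF, CFG_FULL };

/* Duplex mode a port actually ends up running. */
enum duplex { DUPLEX_HALF, DUPLEX_FULL };

/*
 * Duplex a port uses, given its own setting and its partner's.
 * A hardcoded port uses what it was told.  An autonegotiating port
 * selects full-duplex only when the partner negotiates too; otherwise
 * negotiation fails and it must fall back to half-duplex.
 */
enum duplex resolve_duplex(enum port_cfg self, enum port_cfg peer)
{
	if (self == CFG_HALF)
		return DUPLEX_HALF;
	if (self == CFG_FULL)
		return DUPLEX_FULL;
	return peer == CFG_AUTO ? DUPLEX_FULL : DUPLEX_HALF;
}

/* Nonzero when the two ends disagree: the "Sorrow" cells of the table. */
int duplex_mismatch(enum port_cfg a, enum port_cfg b)
{
	return resolve_duplex(a, b) != resolve_duplex(b, a);
}
```

With this, duplex_mismatch(CFG_AUTO, CFG_FULL) is nonzero while
duplex_mismatch(CFG_AUTO, CFG_HALF) is zero, which is the "Lucky" case:
it works, but only because the fallback happens to agree.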

When there is a duplex mismatch, on the side running half-duplex you
will see various errors and probably a number of _LATE_ collisions
(normal collisions don't count here).  On the side running
full-duplex you will see things like FCS errors.  Note that those
errors are not necessarily conclusive, they are simply indicators.

Further, it is important to keep in mind that a clean ping (or the
like - eg linkloop or default netperf TCP_RR) test result is
inconclusive here - a duplex mismatch causes lost traffic _only_ when
both sides of the link try to speak at the same time. A typical ping
test, being synchronous, one at a time request/response, never tries
to have both sides talking at the same time.

Finally, when/if you migrate to 1000Base-T, everything has to be set
to auto-neg anyway.

rick jones


[PATCH] IPv6: only set err in rawv6_bind() when necessary

2006-08-01 Thread Brian Haley
The variable 'err' is set in rawv6_bind() before the address check is made
rather than after it fails; move the assignment inside the if() statement.


Signed-off-by: Brian Haley [EMAIL PROTECTED]
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 8a30cd8..072b28b 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -240,10 +240,10 @@ static int rawv6_bind(struct sock *sk, s
 		 */
 		v4addr = LOOPBACK4_IPV6;
 		if (!(addr_type & IPV6_ADDR_MULTICAST))	{
-			err = -EADDRNOTAVAIL;
 			if (!ipv6_chk_addr(&addr->sin6_addr, dev, 0)) {
 				if (dev)
 					dev_put(dev);
+				err = -EADDRNOTAVAIL;
 				goto out;
 			}
 		}


Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread John W. Linville
On Tue, Aug 01, 2006 at 04:25:05PM +0200, Ivo Van Doorn wrote:
 Hi,
 
  Do you have a plan when you will merge rt2x00 patches so I can apply
  Michael's renaming patch(es) without risk of conflicts?
 
 Working on it ~now.  I hit a snag in that Ivo's patches seem to rely on
 his radio button patch, which I had ignored until now.  I'll probably
 pull that in this morning (or figure-out how to not need it) and get
 something pushed to wireless-dev sometime today.
 
 Ehm, that should not have happened. The rt2x00 version I had sent as patches
 had the radio button integrated as they had been in wireless-dev already.
 The patch to convert it to the rfkill driver is one of the patches
 I'll send later.

The rfkill stuff seems to be behind a CONFIG_RT2X00_BUTTON define,
which keys off some driver-specific Kconfig stuff.  They were
getting turned-on by default w/ 'allmodconfig' and 'allyesconfig'.
Turning them off in .config lets them compile.

In general, I'd prefer not to have Kconfig-settable options that
don't compile.  But in this case I guess we'll just live with it.
After all, it is a development tree... :-)

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: e1000 speed/duplex error

2006-08-01 Thread Jeff Kirsher

On 8/1/06, Auke Kok [EMAIL PROTECTED] wrote:

Jeff Kirsher wrote:
 On 8/1/06, a1 [EMAIL PROTECTED] wrote:
 Hi, Jeff.


 JK OPTION 2: Turn auto-negotiate on the e1000 card and tell it to only
 JK advertise 100 Full Duplex.  This will allow negotiation between the
 JK two link partners and the e1000 will advertise that it is only able to
 JK do 100 Full duplex.

 Is there any way i could do this with ethtool? It only allows force
 spd/dplx , but not set it for advertising...

 Not currently.  That would be a nice feature though... :)


 If i do as follows other side reports 1000/FD:

 ethtool -s eth0 speed 100 duplex full autoneg on


 Which is what I would expect.  I have to step away for a bit, but if
 no one responds with how to load the driver with auto-negotiate
 advertising only 100 Full Duplex, I will do so when I return.

Here's that part of the driver documentation:

$ modprobe e1000 AutoNeg=0x08
e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD


/* Auto-negotiation Advertisement Override
 *
 * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
 *
 * The AutoNeg value is a bit mask describing which speed and duplex
 * combinations should be advertised during auto-negotiation.
 * The supported speed and duplex modes are listed below
 *
 * Bit           7     6     5      4      3     2     1      0
 * Speed (Mbps)  N/A   N/A   1000   N/A    100   100   10     10
 * Duplex                    Full          Full  Half  Full   Half
 *
 * Default Value: 0x2F (copper); 0x20 (fiber)
 */
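
To make the bit mask concrete, here is a small sketch that decodes an
AutoNeg value into the modes it advertises. It only follows the bit
layout in the comment above; the table and the autoneg_describe() helper
are made up for illustration and are not part of the e1000 driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One advertised mode per mask bit, per the driver comment. */
struct adv_mode {
	unsigned int bit;
	const char *name;
};

static const struct adv_mode adv_modes[] = {
	{ 0x01, "10/HD" },
	{ 0x02, "10/FD" },
	{ 0x04, "100/HD" },
	{ 0x08, "100/FD" },
	{ 0x20, "1000/FD" },
};

/*
 * Write the space-separated modes selected by 'mask' into 'buf' and
 * return how many were written.  Bits marked N/A are ignored.
 */
int autoneg_describe(unsigned int mask, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int count = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(adv_modes) / sizeof(adv_modes[0]); i++) {
		int n;

		if (!(mask & adv_modes[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     count ? " " : "", adv_modes[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Decoding 0x08 this way gives the single mode 100/FD, matching the
"AutoNeg advertising 100/FD" message in the modprobe output above, and
the copper default 0x2F selects all five modes.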

hth,

Auke


Thanks Auke.  I got tied up longer than expected.

--
Cheers,
Jeff


Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread Ben Greear

John W. Linville wrote:


I'm just not sure that cleverness is worth the headache, especially
since the most clever things usually only work by accident...


Or, work by solid, modular design and small tweaks!



Point taken.  But stashing little hacks in the networking core for
specific virtual drivers isn't totally modular either.  And even if
it were, modular design probably belongs on the list of things
that can be taken too far, like everything in userland, never
use ioctl, and microkernels are superior. :-)


To be honest, I'm not overjoyed to see bridging hooks included
in the VLAN code, but if that is what it takes to get bridging
and VLANs to play well and be flexible, I think it is a fair price.

It certainly wouldn't hurt to have someone take a holistic view of the
various L2 device interactions.  Just documenting current functionality
on, say, the netdev wiki would be a good first step.

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread Ivo van Doorn
On Tuesday 01 August 2006 19:11, John W. Linville wrote:
 On Tue, Aug 01, 2006 at 04:25:05PM +0200, Ivo Van Doorn wrote:
  Hi,
  
   Do you have a plan when you will merge rt2x00 patches so I can apply
   Michael's renaming patch(es) without risk of conflicts?
  
  Working on it ~now.  I hit a snag in that Ivo's patches seem to rely on
  his radio button patch, which I had ignored until now.  I'll probably
  pull that in this morning (or figure-out how to not need it) and get
  something pushed to wireless-dev sometime today.
  
  Ehm, that should not have happened. The rt2x00 version I had sent as patches
  had the radio button integrated as they had been in wireless-dev already.
  The patch to convert it to the rfkill driver is one of the patches
  I'll send later.
 
 The rfkill stuff seems to be behind a CONFIG_RT2X00_BUTTON define,
 which keys off some driver-specific Kconfig stuff.  They were
 getting turned-on by default w/ 'allmodconfig' and 'allyesconfig'.
 Turning them off in .config lets them compile.

Ah, I have spotted the problem. That was indeed accidentally added during the
last patch series. It was part of the patch I was holding back.
I'll send the rfkill patch as soon as possible; that should fix the problem.
In case rfkill is considered not good enough for inclusion yet, I'll create a
patch to fix this issue correctly.

 In general, I'd prefer not to have Kconfig-settable options that
 don't compile.  But in this case I guess we'll just live with it.
 After all, it is a development tree... :-)

Hehe, true. But I'll do my best to fix this issue soon. :)

Ivo


Re: [RFC 1/4] kevent: core files.

2006-08-01 Thread Zach Brown

 I do not think if we do a ring buffer that events should be obtainable
 via a syscall at all.  Rather, I think this system call should be
 purely sleep until ring is not empty.

Mmm, yeah, of course.  That's much simpler.  I'm looking forward to
Evgeniy's next patch set.

 The ring buffer size, as Evgeniy also tried to describe, is bounded
 purely by the number of registered events.

Yeah.  fwiw, fs/aio.c has this property today.
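
A rough sketch of why that bound holds, assuming each registered event
can occupy at most one ring slot at a time. Everything here (MAX_EVENTS,
the struct and function names) is hypothetical and is not taken from the
kevent patches or from fs/aio.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 4	/* hypothetical registration limit, also the ring size */

struct event {
	int id;
	int queued;	/* nonzero while the event already holds a ring slot */
};

struct ring {
	struct event *slot[MAX_EVENTS];
	unsigned int head;	/* next slot to consume */
	unsigned int tail;	/* next slot to fill */
};

/* Mark an event ready.  Returns 1 if a slot was used, 0 if already queued. */
int ring_post(struct ring *r, struct event *ev)
{
	if (ev->queued)
		return 0;
	/* Cannot overflow: at most MAX_EVENTS events, one slot each. */
	assert(r->tail - r->head < MAX_EVENTS);
	r->slot[r->tail++ % MAX_EVENTS] = ev;
	ev->queued = 1;
	return 1;
}

/* Take the oldest ready event, or NULL when the ring is empty. */
struct event *ring_take(struct ring *r)
{
	struct event *ev;

	if (r->head == r->tail)
		return NULL;	/* a real consumer would sleep here */
	ev = r->slot[r->head++ % MAX_EVENTS];
	ev->queued = 0;
	return ev;
}
```

The NULL return in ring_take() is the only place a syscall would be
needed in this model: to sleep until the ring is not empty.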

- z


Re: [IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls

2006-08-01 Thread Andi Kleen
On Tuesday 01 August 2006 16:41, Andrew Morton wrote:
 On Mon, 31 Jul 2006 19:04:33 +1000
 Herbert Xu [EMAIL PROTECTED] wrote:
 
  2) There is something broken in the x86_64 unwind code which is causing
  it to panic just about everytime somebody calls dump_stack().
  
  Andi, this is the second time I've seen a report where an otherwise
  harmless dump_stack call (the other one was caused by a WARN_ON) gets
  turned into a panic by the stack unwind code on x86_64.  This particular
  report is with 2.6.18-rc3 so it looks like whatever bug is causing it
  hasn't been fixed yet.
  
  Could you please have a look at it? Thanks.
 
 Jan thinks this might have been fixed by a patch which he sent Andi a
 couple of days ago.  Andi has sent that patch to Linus

I didn't send that particular patch before, just queued it, because I hadn't
noticed that particular crash, but I sent it yesterday. So far L. hasn't merged
it unfortunately, but I will resend.

 but I'm not sure 
 which patch it was
 
entry-more-unwind was my version, there was another one from Jan

 and I'm not sure whether it has been merged into 
 mainline.
 
 But yes, -rc3 unwind has problems.

unwinder stuck messages are expected and not really fatal because they
don't lose any information. I expect it will need some releases to work
them all out fully, but then we'll hopefully have a much better unwinder
that doesn't generate any false positives anymore.

New crashes during unwinding are fatal though and I plan to fix them.
So far this one was the only known one.

I already got a lot of patches queued for .19 that fix more unwind
information in a lot of assembly files. Still not fully complete though.
Fixing it all properly unfortunately requires undoing some stuff, e.g.
the unwinder cannot deal with separate lock sections, so I was slowly
removing them.

-Andi


[patch] RFC: matching interface groups

2006-08-01 Thread Balazs Scheidler
Hi,

I would like to easily match a set of dynamically created interfaces
from my packet filter rules. The attached patch forms the basis of my
implementation and I would like to know whether something like this is
mergeable to mainline.

The use-case is as follows:

* I have two different subsystems creating interfaces dynamically (for
example pptpd and serial pppd lines, each creating dynamic pppX
interfaces),
* I would like to assign a different set of iptables rules for these
clients,
* I would like to react to a new interface being added to a specific set
in a userspace application,

The reasons I see this needs new kernel functionality:

* iptables supports wildcard interface matching (for example iptables
-i ppp+), but as the names of the interfaces used by PPTPD and PPPD
cannot be distinguished this way, this is not enough,
* Reloading the iptables ruleset every time a new interface comes up is
not really feasible, as it interrupts packet processing, and validating the
ruleset in the kernel can take a significant amount of time,
* the kernel change is very simple, adapting userspace to this change is
also very simple, and in userspace various software packages can easily
interoperate with each-other once this is merged.

The implementation:

Each interface can belong to a single group at a time; an interface
comes up without being a member of any group.

Userspace can assign interfaces to groups after they are created; this
would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.

In spirit, an interface group is somewhat similar to the routing
protocol field for routing entries, which contains information on which
routing daemon was responsible for adding the given route entry.
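
To illustrate the intended use, a group match could look something like
the sketch below. This is a toy model only: the structs, the verdict
codes and match_ifgroup() are hypothetical, and the real iptables match
is still on the to-do list further down. It assumes group 0 means "not
in any group", as in the kernel patch.

```c
#include <stddef.h>

#define IFGROUP_NONE 0u	/* interface not assigned to any group */

/* Only the fields this sketch needs; not the kernel's struct net_device. */
struct toy_iface {
	const char *name;
	unsigned int ifgroup;
};

struct toy_rule {
	unsigned int ifgroup;	/* group the rule applies to */
	int verdict;		/* hypothetical verdict code */
};

/* Verdict of the first rule whose group matches the interface, else 'dflt'. */
int match_ifgroup(const struct toy_rule *rules, size_t n,
		  const struct toy_iface *dev, int dflt)
{
	size_t i;

	if (dev->ifgroup == IFGROUP_NONE)
		return dflt;	/* ungrouped interfaces match no group rule */
	for (i = 0; i < n; i++)
		if (rules[i].ifgroup == dev->ifgroup)
			return rules[i].verdict;
	return dflt;
}
```

The point is that the rules never mention pppX names, so a new interface
only needs its group set from the ip-up script and the ruleset stays
untouched.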

Things to be done if you like this approach:

* interface group match in iptables,
* support for naming interface groups in userspace, a'la routing
protocols,
* emitting a netlink notification when the group of an interface
changes,
* possibly converting the ip link command to use NETLINK messages,
instead of using ioctl()

What do you think?

kernel patch:
-

* add a numeric ID to each interface in the system, denoting its
interface group,


index df0cdd4..19a103a 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -736,6 +736,8 @@ enum
 #define IFLA_WEIGHT IFLA_WEIGHT
IFLA_OPERSTATE,
IFLA_LINKMODE,
+#define IFLA_IFGROUP IFLA_IFGROUP
+   IFLA_IFGROUP,
__IFLA_MAX
 };

diff --git a/include/linux/sockios.h b/include/linux/sockios.h
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3fcfa9c..26849af 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -279,6 +279,11 @@ static int rtnetlink_fill_ifinfo(struct
 		u32 iflink = dev->iflink;
 		RTA_PUT(skb, IFLA_LINK, sizeof(iflink), &iflink);
 	}
+
+	if (dev->ifgroup) {
+		u32 ifgroup = dev->ifgroup;
+		RTA_PUT(skb, IFLA_IFGROUP, sizeof(ifgroup), &ifgroup);
+	}
 
 	if (dev->qdisc_sleeping)
 		RTA_PUT(skb, IFLA_QDISC,
@@ -459,6 +464,12 @@ static int do_setlink(struct sk_buff *sk
 		dev->link_mode = *((u8 *) RTA_DATA(ida[IFLA_LINKMODE - 1]));
 		write_unlock_bh(&dev_base_lock);
 	}
+
+	if (ida[IFLA_IFGROUP - 1]) {
+		if (ida[IFLA_IFGROUP - 1]->rta_len != RTA_LENGTH(sizeof(u32)))
+			goto out;
+		dev->ifgroup = *((u32 *) RTA_DATA(ida[IFLA_IFGROUP - 1]));
+	}
 
 	if (ifm->ifi_index >= 0 && ida[IFLA_IFNAME - 1]) {
 		char ifname[IFNAMSIZ];


ip route patch:
---

* added a group option to ip link set to make it possible to set this
id, and a way to print this option


diff --git a/ip/iplink.c b/ip/iplink.c
index ffc9f06..e694475 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -26,6 +26,7 @@
 #include <string.h>
 #include <sys/ioctl.h>
 #include <linux/sockios.h>
+#include <linux/rtnetlink.h>
 
 #include "rt_names.h"
 #include "utils.h"
@@ -44,6 +45,7 @@ void iplink_usage(void)
fprintf(stderr, promisc { on | off } |\n);
fprintf(stderr, trailers { on | off } 
|\n);
fprintf(stderr, txqueuelen PACKETS |\n);
+   fprintf(stderr, group GROUP |\n);
fprintf(stderr, name NEWNAME |\n);
fprintf(stderr, address LLADDR | broadcast 
LLADDR |\n);
fprintf(stderr, mtu MTU }\n);
@@ -174,6 +176,28 @@ static int set_mtu(const char *dev, int
return 0;
 }

+static int set_group(const char *dev, int ifgroup)
+{
+	struct {
+		struct nlmsghdr		n;
+		struct ifinfomsg	ifi;
+		char			buf[256];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.n.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifi));
+   

Re: [PATCH] SMSC LAN911x and LAN921x vendor driver

2006-08-01 Thread Scott Murray

On Tue, 1 Aug 2006, Steve Glendinning wrote:


Attached is a driver patch for SMSC911x family of ethernet chips,
generated against 2.6.18-rc1 sources. There's a similar driver in the
tree; this one has been tested by SMSC on all flavors of the chip and
claimed to be efficient.


Updated after feedback from Stephen Hemminger.

Driver updated to also support LAN921x family.  Workarounds added for
known hardware issues.


Many improvements following feedback from Stephen Hemminger and
Francois Romieu:
- Tasklet removed, NAPI poll used instead
- Multiple device support
- style fixes & minor improvements


Sorry to be coming in late, but I'm curious about why this work is being
submitted as a separate driver, rather than as patches against the driver
from Dustin McIntire that was added a few months ago.  Is the intention to 
go forward with two different drivers for these chips?


Scott


--
==
Scott Murray, [EMAIL PROTECTED]

 Good, bad ... I'm the guy with the gun. - Ash, Army of Darkness


Re: [patch] RFC: matching interface groups

2006-08-01 Thread Phil Oester
On Tue, Aug 01, 2006 at 07:10:09PM +0200, Balazs Scheidler wrote:
 Each interface can belong to a single group at a time, an interface
 comes up without being a member in any of the groups.
 
 Userspace can assign interfaces to groups after being created, this
 would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.

Since in this scenario userspace is able to determine ppp vs pptp, 
could you not also do something like have an inbound_ppp and inbound_pptp
chain, then jump to the appropriate chain depending on type?  If you
need per-interface rules, then create an inbound_pppX chain, populate
it with rules, then jump to that chain if -i pppX.  In ip-down, just
delete the chain as well as the jump.

Phil


Re: E1000: bug on error path in e1000_probe()

2006-08-01 Thread Auke Kok

Stephane Doyon wrote:
The e1000_probe() function passes references to the netdev structure 
before it's actually registered. In the (admittedly obscure) case where 
the netdev registration fails, we are left with a dangling reference.


Specifically, e1000_probe() calls
netif_carrier_off(netdev);
before register_netdev(netdev).

(It also calls pci_set_drvdata(pdev, netdev) rather early, not sure how 
important that is.)


netif_carrier_off() does linkwatch_fire_event(dev);, which in turn does 
dev_hold(dev); and queues up an event with a reference to the netdev.


But the net_device reference counting mechanism only works on registered 
netdevs.


Should the register_netdev() call fail, the error path does 
free_netdev(netdev);, and when the event goes off, it accesses random 
memory through the dangling reference.


I would recommend moving the register_netdev() call earlier.
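
The ordering problem described above can be shown with a toy model.
None of this is e1000 or netdev code: the toy_* names are invented for
illustration, with a 'freed' flag standing in for free_netdev() and a
single pointer standing in for the linkwatch event queue.

```c
#include <stddef.h>

/* Toy stand-in for the net_device. */
struct toy_dev {
	int registered;
	int freed;		/* set where free_netdev() would run */
};

/* Toy stand-in for the linkwatch queue. */
struct toy_linkwatch {
	struct toy_dev *pending;	/* device referenced by a queued event */
};

/* Stands in for netif_carrier_off(): queues an event holding 'dev'. */
void toy_carrier_off(struct toy_linkwatch *lw, struct toy_dev *dev)
{
	lw->pending = dev;
}

/* Stands in for register_netdev(); 'fail' forces the error path. */
int toy_register(struct toy_dev *dev, int fail)
{
	if (fail)
		return -1;
	dev->registered = 1;
	return 0;
}

/* Ordering from the report: the event is queued before registration. */
int toy_probe_event_first(struct toy_linkwatch *lw, struct toy_dev *dev,
			  int fail)
{
	toy_carrier_off(lw, dev);
	if (toy_register(dev, fail) < 0) {
		dev->freed = 1;
		return -1;
	}
	return 0;
}

/* Suggested ordering: nothing references the device until it is registered. */
int toy_probe_register_first(struct toy_linkwatch *lw, struct toy_dev *dev,
			     int fail)
{
	if (toy_register(dev, fail) < 0) {
		dev->freed = 1;
		return -1;
	}
	toy_carrier_off(lw, dev);
	return 0;
}

/* Nonzero when a queued event still points at a freed device. */
int toy_event_dangling(const struct toy_linkwatch *lw)
{
	return lw->pending != NULL && lw->pending->freed;
}
```

When registration fails, the first ordering leaves a queued event
pointing at a freed device; the second leaves nothing queued.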


We agree that this may be an issue and we're looking at how this mis-ordering
entered the code in the first place. I'm probably going to send a patch later
today or include it in this week's worth of upstream patches later this week.


However, we were wondering how you encountered this problem. Did you see a case
where this race actually happened? It might be an interesting case to look at.
Or did you find this by code review only?


Auke


[PATCH 1/5] d80211: make sleeping in hw->config possible

2006-08-01 Thread Jiri Benc
This patch makes sleeping in the hw->config callback possible by removing
the only atomic caller. The atomic caller was a timer and is replaced by
a workqueue.

This is based on a patch from Michael Buesch [EMAIL PROTECTED].

Signed-off-by: Jiri Benc [EMAIL PROTECTED]

---

 net/d80211/ieee80211.c   |   23 +++
 net/d80211/ieee80211_i.h |3 ++-
 net/d80211/ieee80211_iface.c |   10 --
 net/d80211/ieee80211_sta.c   |   37 +
 4 files changed, 42 insertions(+), 31 deletions(-)

d0d2b7a8ddc378ddea499f1537f6aea83d96d003
diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 4e80767..9f883a4 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -4552,14 +4552,6 @@ void ieee80211_unregister_hw(struct net_
 	tasklet_disable(&local->tasklet);
 	/* TODO: skb_queue should be empty here, no need to do anything? */
 
-	if (local->rate_limit)
-		del_timer_sync(&local->rate_limit_timer);
-	if (local->stat_time)
-		del_timer_sync(&local->stat_timer);
-	if (local->scan_timer.data)
-		del_timer_sync(&local->scan_timer);
-	ieee80211_rx_bss_list_deinit(dev);
-
 	rtnl_lock();
 	local->reg_state = IEEE80211_DEV_UNREGISTERED;
 	if (local->apdev)
@@ -4572,6 +4564,21 @@ void ieee80211_unregister_hw(struct net_
 	}
 	rtnl_unlock();
 
+	if (local->rate_limit)
+		del_timer_sync(&local->rate_limit_timer);
+	if (local->stat_time)
+		del_timer_sync(&local->stat_timer);
+	if (local->scan_work.data) {
+		local->sta_scanning = 0;
+		cancel_delayed_work(&local->scan_work);
+		flush_scheduled_work();
+		/* The scan_work is guaranteed not to be called at this
+		 * point. It is not scheduled and not running now. It can be
+		 * scheduled again only by some sta_timer (all of them are
+		 * stopped by now) or under rtnl lock. */
+	}
+
+	ieee80211_rx_bss_list_deinit(dev);
 	ieee80211_clear_tx_pending(local);
 	sta_info_stop(local);
 	rate_control_remove_attrs(local, local->rate_ctrl_priv,
diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index 016a2b1..5a2c6e8 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -17,6 +17,7 @@ #include <linux/interrupt.h>
 #include <linux/list.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
+#include <linux/workqueue.h>
 #include "ieee80211_key.h"
 #include "sta_info.h"
 
@@ -425,7 +426,7 @@ #define IEEE80211_IRQSAFE_QUEUE_LIMIT 12
 	int scan_channel_idx;
 	enum { SCAN_SET_CHANNEL, SCAN_SEND_PROBE } scan_state;
 	unsigned long last_scan_completed;
-	struct timer_list scan_timer;
+	struct work_struct scan_work;
 	int scan_oper_channel;
 	int scan_oper_channel_val;
 	int scan_oper_power_level;
diff --git a/net/d80211/ieee80211_iface.c b/net/d80211/ieee80211_iface.c
index fa3d9e2..12b9d4f 100644
--- a/net/d80211/ieee80211_iface.c
+++ b/net/d80211/ieee80211_iface.c
@@ -287,8 +287,14 @@ #endif /* CONFIG_D80211_VERBOSE_DEBUG */
 	case IEEE80211_IF_TYPE_STA:
 	case IEEE80211_IF_TYPE_IBSS:
 		del_timer_sync(&sdata->u.sta.timer);
-		if (local->scan_timer.data == (unsigned long) sdata->dev)
-			del_timer_sync(&local->scan_timer);
+		if (local->scan_work.data == sdata->dev) {
+			local->sta_scanning = 0;
+			cancel_delayed_work(&local->scan_work);
+			flush_scheduled_work();
+			/* see comment in ieee80211_unregister_hw to
+			 * understand why this works */
+			local->scan_work.data = NULL;
+		}
 		kfree(sdata->u.sta.extra_ie);
 		sdata->u.sta.extra_ie = NULL;
 		kfree(sdata->u.sta.assocreq_ies);
diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c
index b0cfff1..22f9599 100644
--- a/net/d80211/ieee80211_sta.c
+++ b/net/d80211/ieee80211_sta.c
@@ -2417,15 +2417,16 @@ static int ieee80211_active_scan(struct 
 }
 
 
-static void ieee80211_sta_scan_timer(unsigned long ptr)
+static void ieee80211_sta_scan_work(void *ptr)
 {
-	struct net_device *dev = (struct net_device *) ptr;
+	struct net_device *dev = ptr;
 	struct ieee80211_local *local = dev->ieee80211_ptr;
 	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
 	struct ieee80211_hw_modes *mode;
 	struct ieee80211_channel *chan;
 	int skip;
 	union iwreq_data wrqu;
+	unsigned long next_delay = 0;
 
 	if (!local->sta_scanning)
 		return;
@@ -2493,31 +2494,30 @@ #endif
 		local->scan_channel_idx = 0;
 	}
 
-	if (skip) {
-		local->scan_timer.expires = jiffies;
+ 

[PATCH 2/5] d80211: return correct error codes for scan requests

2006-08-01 Thread Jiri Benc
Do not allow scanning when the network interface is down. Return 0 instead
of -EBUSY when scanning is in progress on the same network interface.

Signed-off-by: Jiri Benc [EMAIL PROTECTED]

---

 net/d80211/ieee80211_ioctl.c |6 ++
 net/d80211/ieee80211_sta.c   |5 -
 2 files changed, 10 insertions(+), 1 deletions(-)

2cf10f1a78222a375297d01a919d55d1a3c2a5a6
diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index d73693e..e43e3b0 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -1091,6 +1091,9 @@ static int ieee80211_ioctl_scan_req(stru
 	if (local->user_space_mlme)
 		return -EOPNOTSUPP;
 
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
 	if (left < len || len > IEEE80211_MAX_SSID_LEN)
 		return -EINVAL;
 
@@ -1914,6 +1917,9 @@ static int ieee80211_ioctl_siwscan(struc
 	u8 *ssid = NULL;
 	size_t ssid_len = 0;
 
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
 	if (local->scan_flags & IEEE80211_SCAN_MATCH_SSID) {
 		if (sdata->type == IEEE80211_IF_TYPE_STA ||
 		    sdata->type == IEEE80211_IF_TYPE_IBSS) {
diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c
index 22f9599..13dcdae 100644
--- a/net/d80211/ieee80211_sta.c
+++ b/net/d80211/ieee80211_sta.c
@@ -2548,8 +2548,11 @@ int ieee80211_sta_req_scan(struct net_de
 	/* TODO: if assoc, move to power save mode for the duration of the
 	 * scan */
 
-	if (local->sta_scanning)
+	if (local->sta_scanning) {
+		if (local->scan_work.data == dev)
+			return 0;
 		return -EBUSY;
+	}
 
 	printk(KERN_DEBUG "%s: starting scan\n", dev->name);
 
-- 
1.3.0



Re: [PATCH dscape] d80211: Switch d80211.h to IEEE80211_ style names

2006-08-01 Thread Ulrich Kunitz
On 06-08-01 15:58 Jiri Benc wrote:

  pointing non-migrated drivers (ipw2[12]00, zd1211rw) at the old
  code,
 
 Yes. Rather than moving, zd1211 should be ported to d80211 - this will
 also allow using of more advanced features of the hw.

I currently have no idea when this will happen. At the moment we are
still working on the basic plumbing of the driver.

However, I would support the dscape-preparing clean-up if
pointless renaming is minimized. Ideally, only header
includes should be changed. I would also support a split between
protocol-related headers and stack-related code.

Cheers,

Uli

-- 
Uli Kunitz


[PATCH 3/5] d80211: return correct value when loading of rate control module fails

2006-08-01 Thread Jiri Benc
From: Karol Lewandowski [EMAIL PROTECTED]

When loading of rate_control module fails, ieee80211_register_hw returns
value from previous check. This patch fixes that.

Signed-off-by: Jiri Benc [EMAIL PROTECTED]

---

 net/d80211/ieee80211.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

2a053059b358f64991ac003c48f3de1da86c33ab
diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 9f883a4..41c292b 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -4462,7 +4462,8 @@ int ieee80211_register_hw(struct net_dev
 	if (result < 0)
 		goto fail_if_sysfs;
 
-	if (rate_control_initialize(local) < 0) {
+	result = rate_control_initialize(local);
+	if (result < 0) {
 		printk(KERN_DEBUG "%s: Failed to initialize rate control "
 		       "algorithm\n", dev->name);
 		goto fail_rate;
-- 
1.3.0



Re: E1000: bug on error path in e1000_probe()

2006-08-01 Thread Stephane Doyon

On Tue, 1 Aug 2006, Auke Kok wrote:


Stephane Doyon wrote:

 The e1000_probe() function passes references to the netdev structure
 before it's actually registered. In the (admittedly obscure) case where
 the netdev registration fails, we are left with a dangling reference.

 Specifically, e1000_probe() calls
 netif_carrier_off(netdev);
 before register_netdev(netdev).

 (It also calls pci_set_drvdata(pdev, netdev) rather early, not sure how
 important that is.)

 netif_carrier_off() does linkwatch_fire_event(dev);, which in turn does
 dev_hold(dev); and queues up an event with a reference to the netdev.

 But the net_device reference counting mechanism only works on registered
 netdevs.

 Should the register_netdev() call fail, the error path does
 free_netdev(netdev);, and when the event goes off, it accesses random
 memory through the dangling reference.

 I would recommend moving the register_netdev() call earlier.


We agree that this may be an issue and we're looking at how this mis-ordering 
entered the code in the first place. I'm probably going to send a patch later 
today or include it in this week's worth of upstream patches later this week.


Thank you for looking at this.

We were wondering however how you encountered this problem? Did you see a 
case where this race actually happened? it might be an interesting case to 
look at. Or did you do this by code review only?


Yes I did see a case where this happened, but the failure in 
register_netdev() was due to some unrelated kernel code modifications I 
was working with. The effect of the dangling reference was an unhandled 
kernel paging request somewhere in the USB EHCI driver where some pointer 
got corrupted. The USB modules were being inserted soon after the e1000. I 
moved things around and eventually I put a sleep after the modprobe e1000, 
and then I got a BUG at kernel/timer.c:279! instead, and the backtrace 
showed mod_timer() <= __netdev_watchdog_up() <= ... <= dev_activate() <= 
linkwatch_run_queue()... I figured out the problem from there. In 
e1000_probe(), I moved netif_carrier_off() (and pci_set_drvdata() and 
netif_stop_queue() too for good measure) to after the register_netdev() 
call, and that made the weird effects go away. The error path in question 
is pretty obscure, but it wasn't easy working backward from the memory 
corruption effect, so that's my extra incentive for reporting the problem.
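
The ordering described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_dev`, `toy_register()`, `toy_carrier_off()`, `toy_probe()` and `pending_event` are hypothetical stand-ins for the net_device, register_netdev(), netif_carrier_off(), e1000_probe() and the link-watch queue, and none of it is the actual e1000 code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy device; not the real struct net_device. */
struct toy_dev {
	int registered;
	int carrier_ok;
};

/* One-slot stand-in for the link-watch event queue. */
static struct toy_dev *pending_event;

/* Stand-in for netif_carrier_off(): queues an event that references dev. */
static void toy_carrier_off(struct toy_dev *dev)
{
	/* In this model, events may only reference registered devices. */
	assert(dev->registered);
	dev->carrier_ok = 0;
	pending_event = dev;
}

/* Stand-in for register_netdev(); fail_register simulates the error. */
static int toy_register(struct toy_dev *dev, int fail_register)
{
	if (fail_register)
		return -1;
	dev->registered = 1;
	return 0;
}

/*
 * Probe with the ordering suggested in the report: nothing outside this
 * function gets a pointer to dev until registration has succeeded, so the
 * error path can free it without leaving a queued reference behind.
 */
static struct toy_dev *toy_probe(int fail_register)
{
	struct toy_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	if (toy_register(dev, fail_register) < 0) {
		free(dev);
		return NULL;
	}
	toy_carrier_off(dev);
	return dev;
}
```

With the two calls swapped, the failing path would free the device while `pending_event` still pointed at it, which is the dangling reference the report describes.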




Re: [PATCH] SMSC LAN911x and LAN921x vendor driver

2006-08-01 Thread Steve . Glendinning
Hi Scott

 Sorry to be coming in late, but I'm curious about why this work is being
 submitted as a separate driver, rather than as patches against the driver
 from Dustin McIntire that was added a few months ago.  Is the intention to
 go forward with two different drivers for these chips?

I was waiting for someone to ask this!

This driver has been developed by SMSC and ARM, and has several advantages 
over the already merged smc911x:

- The current driver is ARM-specific; our smsc911x driver is tested and 
supported on arm, sh, and i386
- smsc911x contains support for the new LAN921x family, as well as LAN911x
- smsc911x contains important workarounds for currently known hardware 
issues
- It's shorter, and I believe the coding style to be cleaner and easier to 
follow.

So I'm presenting this as an alternative.  Thoughts?

Regards,
--
Steve Glendinning
SMSC GmbH
m: +44 777 933 9124
e: [EMAIL PROTECTED]




Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-01 Thread Krzysztof Halasa
Ben Greear [EMAIL PROTECTED] writes:

 Basically, my point is that
 if VLANs are true devices, they will just work with all of the
 user-space protocols
 and they will easily handle abstractions such as bridges, (multiple)
 IP addresses, MAC addresses,
 net-filter, and all the rest.

AOL mode I think the same.
-- 
Krzysztof Halasa


[PATCH] ipv4: don't call upper-layer disconnect function if not connected

2006-08-01 Thread Brian Haley
Calling connect() with AF_UNSPEC will disconnect a socket, but we don't 
need to do any work if the socket isn't currently connected.


Signed-off-by: Brian Haley [EMAIL PROTECTED]
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c84a320..b294b92 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -480,12 +480,16 @@ int inet_dgram_connect(struct socket *so
 {
 	struct sock *sk = sock->sk;
 
-	if (uaddr->sa_family == AF_UNSPEC)
-		return sk->sk_prot->disconnect(sk, flags);
+	if (uaddr->sa_family == AF_UNSPEC) {
+		if (sock->state != SS_UNCONNECTED)
+			return sk->sk_prot->disconnect(sk, flags);
+		else
+			return 0;
+	}
 
 	if (!inet_sk(sk)->num && inet_autobind(sk))
 		return -EAGAIN;
-	return sk->sk_prot->connect(sk, (struct sockaddr *)uaddr, addr_len);
+	return sk->sk_prot->connect(sk, uaddr, addr_len);
 }
 
 static long inet_wait_for_connect(struct sock *sk, long timeo)
@@ -525,8 +529,11 @@ int inet_stream_connect(struct socket *s
 	lock_sock(sk);
 
 	if (uaddr->sa_family == AF_UNSPEC) {
-		err = sk->sk_prot->disconnect(sk, flags);
-		sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
+		if (sock->state != SS_UNCONNECTED) {
+			err = sk->sk_prot->disconnect(sk, flags);
+			sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
+		} else
+			err = 0;
 		goto out;
 	}
 


Re: [PATCH] NET: fix kernel panic from no dev-hard_header_len space

2006-08-01 Thread Alexey Kuznetsov
Hello!

  Alexey, any suggestions on how to handle this kind of thing?

Device, which adds something at head must check for space.
Anyone, who adds something at head, must check.
Otherwise, it will remain buggy forever.


 What's wrong with my patch?

As I already said there is nothing wrong with the first chunk.
Except that it hides the real problem.


 hardly their author's fault. I don't think we've ever advertised
 hard_header_len is valid only with non-NULL hard_header.

Do not get it wrong. dev->hard_header_len is _NEVER_ guaranteed.
The places which allocate an skb take it as a hint to avoid reallocation.
But each place which stuffs something at the head still must check the space.

The only difference in the situation with dev->hard_header
is that when dev->hard_header != NULL, the header is added by IP itself.
That's why IP checks it.

Alexey
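
The rule stated above (the allocation hint is advice, and whoever prepends a header must check the space) can be sketched in user space as follows. Everything here is hypothetical: `toy_buf` is only loosely modelled on an skb, and `toy_alloc()`, `toy_headroom()`, `toy_expand_head()` and `toy_push_header()` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy packet buffer; only loosely modelled on an skb. */
struct toy_buf {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of the packet data */
	size_t len;		/* bytes of packet data */
	size_t size;		/* total allocation size */
};

static size_t toy_headroom(const struct toy_buf *b)
{
	return (size_t)(b->data - b->head);
}

/* Allocate with a headroom hint; the hint is advice, not a guarantee. */
static int toy_alloc(struct toy_buf *b, size_t hint, size_t payload)
{
	b->head = malloc(hint + payload);
	if (!b->head)
		return -1;
	b->size = hint + payload;
	b->data = b->head + hint;
	b->len = payload;
	memset(b->data, 0, payload);
	return 0;
}

/* Grow the headroom by reallocating and copying the packet data. */
static int toy_expand_head(struct toy_buf *b, size_t need)
{
	unsigned char *n = malloc(need + b->len);

	if (!n)
		return -1;
	memcpy(n + need, b->data, b->len);
	free(b->head);
	b->head = n;
	b->data = n + need;
	b->size = need + b->len;
	return 0;
}

/* Prepend a header, checking the space first instead of trusting the hint. */
static int toy_push_header(struct toy_buf *b, const void *hdr, size_t hlen)
{
	if (toy_headroom(b) < hlen && toy_expand_head(b, hlen) < 0)
		return -1;
	assert(toy_headroom(b) >= hlen);
	b->data -= hlen;
	b->len += hlen;
	memcpy(b->data, hdr, hlen);
	return 0;
}
```

A caller that allocated with a 4-byte hint can still push a 14-byte header: the check notices the shortfall and reallocates instead of writing in front of the buffer.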


Re: [PATCH] NET: fix kernel panic from no dev-hard_header_len space

2006-08-01 Thread Alexey Kuznetsov
Hello!

 Do the semantics (I'm not talking about bugs) allow skb passed
 to dev->hard_header() (if defined)

No. dev->hard_header() should get enough space, which is
dev->hard_header_len.

Actually, it is a historical hole in the design, inherited from ancient
times. The calling conventions of dev->hard_header() just did not allow
it to reallocate. BTW in 2.6 it can, if it uses pskb_expand_head().


 and then to dev->hard_start_xmit()
 to have less link layer header space than dev->hard_header_len?

Absolutely. It used to happen all the time. All those devices,
which occasionally forget to check for space must be fixed.


 I.e., is dev->hard_header_len only advisory?

For initial allocator it is an advice. For layers, which add something
at head, it is just nothing, if there is enough space. And it is again
an advice, when skb is reallocated.


 Anyway, the issue with kernel panic is real so I think we better
 fix it before 2.6.18, and propagate to stable series as well.

:-) Know what? This problem followed us since prehistoric times.
It happened in 2.4-stablest, 2.2-stable, 2.0... The same devices,
the same problem, no matter how much of space it is given to them,
they managed to find a hole and to crash. :-)

Alexey


[PATCH 5/6] zd1211rw: Fixed endianess issue with length info tag detection

2006-08-01 Thread Ulrich Kunitz
Discovered a problem while accessing www.python.org on my PPC32.
The problem was pretty consistent for all sticks. The reason was
that while testing for the length info tag, I ignored the
endianness of the host system.

Note that by converting the constant to little endian, rather than
the received value to CPU order, we create faster code.

Signed-off-by: Ulrich Kunitz [EMAIL PROTECTED]
---
 drivers/net/wireless/zd1211rw/zd_usb.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_usb.c 
b/drivers/net/wireless/zd1211rw/zd_usb.c
index 2e3d77e..96551da 100644
--- a/drivers/net/wireless/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zd1211rw/zd_usb.c
@@ -546,11 +546,11 @@ static void handle_rx_packet(struct zd_u
 	 * be padded. Unaligned access might also happen if the length_info
 	 * structure is not present.
 	 */
-	if (get_unaligned(&length_info->tag) == RX_LENGTH_INFO_TAG) {
+	if (get_unaligned(&length_info->tag) == cpu_to_le16(RX_LENGTH_INFO_TAG))
+	{
 		unsigned int l, k, n;
 		for (i = 0, l = 0;; i++) {
-			k = le16_to_cpu(get_unaligned(
-				&length_info->length[i]));
+			k = le16_to_cpu(get_unaligned(&length_info->length[i]));
 			n = l+k;
 			if (n > length)
 				return;
-- 
1.4.1
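
The endianness fix in the patch above can be illustrated with a portable user-space sketch. The helpers `toy_get_unaligned16()` and `toy_cpu_to_le16()` and the value of `TOY_TAG` are hypothetical stand-ins chosen for this example; they are not the kernel's get_unaligned() or cpu_to_le16(). As far as I understand, the speed argument is that the kernel can convert a constant at compile time, whereas converting the received field costs a byte swap per packet on big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_TAG 0x697e	/* hypothetical tag value, in CPU byte order */

/* Read a possibly unaligned 16-bit field without interpreting its bytes. */
static uint16_t toy_get_unaligned16(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Return the host value whose in-memory bytes are v in little-endian order. */
static uint16_t toy_cpu_to_le16(uint16_t v)
{
	const unsigned char bytes[2] = {
		(unsigned char)(v & 0xff),
		(unsigned char)(v >> 8),
	};
	uint16_t out;

	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* Wrong on big-endian hosts: raw wire bytes vs. a CPU-order constant. */
static int toy_tag_matches_naive(const unsigned char *wire)
{
	return toy_get_unaligned16(wire) == TOY_TAG;
}

/* Correct on any host: the constant is converted to wire order first. */
static int toy_tag_matches(const unsigned char *wire)
{
	assert(wire != NULL);
	return toy_get_unaligned16(wire) == toy_cpu_to_le16(TOY_TAG);
}
```

On a little-endian machine both variants agree, which is presumably why the problem only showed up on PPC32.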



[PATCH 2/6] zd1211rw: Pass more management frame types up to host

2006-08-01 Thread Ulrich Kunitz
From: Daniel Drake [EMAIL PROTECTED]

We'll be needing these at some point...

Signed-off-by: Daniel Drake [EMAIL PROTECTED]
Signed-off-by: Ulrich Kunitz [EMAIL PROTECTED]
---
 drivers/net/wireless/zd1211rw/zd_chip.h |4 +++-
 drivers/net/wireless/zd1211rw/zd_mac.c  |6 --
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_chip.h 
b/drivers/net/wireless/zd1211rw/zd_chip.h
index 8051210..0eb9c8f 100644
--- a/drivers/net/wireless/zd1211rw/zd_chip.h
+++ b/drivers/net/wireless/zd1211rw/zd_chip.h
@@ -461,10 +461,12 @@ #define CR_UNDERRUN_CNT   CTL_REG(0x0688
 
 #define CR_RX_FILTER   CTL_REG(0x068c)
 #define RX_FILTER_ASSOC_RESPONSE   0x0002
+#define RX_FILTER_REASSOC_RESPONSE 0x0008
 #define RX_FILTER_PROBE_RESPONSE   0x0020
 #define RX_FILTER_BEACON   0x0100
+#define RX_FILTER_DISASSOC 0x0400
 #define RX_FILTER_AUTH 0x0800
-/* Sniff modus sets filter to 0xf */
+/* Monitor mode sets filter to 0xf */
 
 #define CR_ACK_TIMEOUT_EXT CTL_REG(0x0690)
 #define CR_BCN_FIFO_SEMAPHORE  CTL_REG(0x0694)
diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c 
b/drivers/net/wireless/zd1211rw/zd_mac.c
index b394303..1cf1fda 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -136,8 +136,10 @@ static int reset_mode(struct zd_mac *mac
 {
struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
struct zd_ioreq32 ioreqs[3] = {
-   { CR_RX_FILTER, RX_FILTER_BEACON|RX_FILTER_PROBE_RESPONSE|
-   RX_FILTER_AUTH|RX_FILTER_ASSOC_RESPONSE },
+   { CR_RX_FILTER, RX_FILTER_BEACON | RX_FILTER_PROBE_RESPONSE |
+   RX_FILTER_AUTH | RX_FILTER_ASSOC_RESPONSE |
+   RX_FILTER_REASSOC_RESPONSE |
+   RX_FILTER_DISASSOC },
{ CR_SNIFFER_ON, 0U },
{ CR_ENCRYPTION_TYPE, NO_WEP },
};
-- 
1.4.1



[PATCH 1/6] zd1211rw: Fixes radiotap header

2006-08-01 Thread Ulrich Kunitz
There has been a problem in the radiotap header. Monitor mode
works now with tcpdump 3.94 + libpcap 0.9.4. ethereal 0.99.0 +
libpcap 0.9.4 is broken, because it doesn't find the right offset
for the IEEE 802.11 header.

Signed-off-by: Ulrich Kunitz [EMAIL PROTECTED]
---
 drivers/net/wireless/zd1211rw/zd_mac.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c 
b/drivers/net/wireless/zd1211rw/zd_mac.c
index 3bdc54d..b394303 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -713,10 +713,10 @@ static int zd_mac_tx(struct zd_mac *mac,
 struct zd_rt_hdr {
struct ieee80211_radiotap_header rt_hdr;
u8  rt_flags;
+   u8  rt_rate;
u16 rt_channel;
u16 rt_chbitmask;
-   u16 rt_rate;
-};
+} __attribute__((packed));
 
 static void fill_rt_header(void *buffer, struct zd_mac *mac,
   const struct ieee80211_rx_stats *stats,
@@ -735,14 +735,14 @@ static void fill_rt_header(void *buffer,
 	if (status->decryption_type & (ZD_RX_WEP64|ZD_RX_WEP128|ZD_RX_WEP256))
 		hdr->rt_flags |= IEEE80211_RADIOTAP_F_WEP;
 
+	hdr->rt_rate = stats->rate / 5;
+
 	/* FIXME: 802.11a */
 	hdr->rt_channel = cpu_to_le16(ieee80211chan2mhz(
 		_zd_chip_get_channel(&mac->chip)));
 	hdr->rt_chbitmask = cpu_to_le16(IEEE80211_CHAN_2GHZ |
 		((status->frame_status & ZD_RX_FRAME_MODULATION_MASK) ==
 		ZD_RX_OFDM ? IEEE80211_CHAN_OFDM : IEEE80211_CHAN_CCK));
-
-	hdr->rt_rate = stats->rate / 5;
 }
 
 /* Returns 1 if the data packet is for us and 0 otherwise. */
-- 
1.4.1
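
The layout change in the radiotap patch above can be checked with a small user-space sketch. The struct and field names prefixed with `toy_` are hypothetical simplifications, the `__attribute__((packed))` syntax assumes GCC or Clang, and the assumption that the input rate is in 100 kbit/s units is mine; radiotap itself carries the rate as one byte in 500 kbit/s units, placed directly after the flags byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the 8-byte radiotap header. */
struct toy_radiotap_header {
	uint8_t  version;
	uint8_t  pad;
	uint16_t len;
	uint32_t present;
};

/* Old layout: 16-bit rate at the end, struct not packed. */
struct toy_rt_hdr_old {
	struct toy_radiotap_header rt_hdr;
	uint8_t  rt_flags;
	uint16_t rt_channel;
	uint16_t rt_chbitmask;
	uint16_t rt_rate;
};

/* New layout: 8-bit rate directly after the flags, struct packed. */
struct toy_rt_hdr_new {
	struct toy_radiotap_header rt_hdr;
	uint8_t  rt_flags;
	uint8_t  rt_rate;
	uint16_t rt_channel;
	uint16_t rt_chbitmask;
} __attribute__((packed));

/* Fill the rate field; the input is assumed to be in 100 kbit/s units. */
static void toy_set_rate(struct toy_rt_hdr_new *hdr, unsigned int rate_100kbps)
{
	assert(rate_100kbps / 5 <= UINT8_MAX);
	hdr->rt_rate = (uint8_t)(rate_100kbps / 5);	/* 500 kbit/s units */
}
```

In the new layout the rate byte lands at offset 9 and the channel frequency at offset 10, so a parser walking the fields in radiotap order finds the 802.11 header at the expected position.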



[PATCH 3/6] zd1211rw: Fix software encryption/decryption

2006-08-01 Thread Ulrich Kunitz
From: Daniel Drake [EMAIL PROTECTED]

Apparently the ZD1211 doesn't mind, but the ZD1211B absolutely must be
told that encryption is happening in software.

Signed-off-by: Daniel Drake [EMAIL PROTECTED]
Signed-off-by: Ulrich Kunitz [EMAIL PROTECTED]
---
 drivers/net/wireless/zd1211rw/zd_mac.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c 
b/drivers/net/wireless/zd1211rw/zd_mac.c
index 1cf1fda..a66625c 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -108,7 +108,9 @@ int zd_mac_init_hw(struct zd_mac *mac, u
if (r)
goto disable_int;
 
-   r = zd_set_encryption_type(chip, NO_WEP);
+   /* We must inform the device that we are doing encryption/decryption in
+* software at the moment. */
+   r = zd_set_encryption_type(chip, ENC_SNIFFER);
if (r)
goto disable_int;
 
@@ -141,7 +143,6 @@ static int reset_mode(struct zd_mac *mac
RX_FILTER_REASSOC_RESPONSE |
RX_FILTER_DISASSOC },
{ CR_SNIFFER_ON, 0U },
-   { CR_ENCRYPTION_TYPE, NO_WEP },
};
 
if (ieee-iw_mode == IW_MODE_MONITOR) {
-- 
1.4.1



[PATCH] zd1211rw: six urgent upstream fixes

2006-08-01 Thread Ulrich Kunitz
Here are six fixes for the zd1211rw driver.

They fix
 - radiotap issues for the monitor mode
 - WEP encryption
 - an endianness problem in the rx path
 - reassociation after disassociation by the AP

If possible, these patches should be included in 2.6.18.

-- Uli


[PATCH 4/6] zd1211rw: Remove bogus assert

2006-08-01 Thread Ulrich Kunitz
From: Daniel Drake [EMAIL PROTECTED]

This function is never called in interrupt context, and it doesn't
matter if it is called in IRQ context or not.

Signed-off-by: Daniel Drake [EMAIL PROTECTED]
Signed-off-by: Ulrich Kunitz [EMAIL PROTECTED]
---
 drivers/net/wireless/zd1211rw/zd_usb.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_usb.c 
b/drivers/net/wireless/zd1211rw/zd_usb.c
index c68b9f8..2e3d77e 100644
--- a/drivers/net/wireless/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zd1211rw/zd_usb.c
@@ -325,7 +325,6 @@ static void disable_read_regs_int(struct
 {
 	struct zd_usb_interrupt *intr = &usb->intr;
 
-	ZD_ASSERT(in_interrupt());
 	spin_lock(&intr->lock);
 	intr->read_regs_enabled = 0;
 	spin_unlock(&intr->lock);
-- 
1.4.1



[PATCH 6/6] zd1211rw: Packet filter fix for managed (STA) mode

2006-08-01 Thread Ulrich Kunitz
I had problems with my AVM Fritz!Box access point. It appeared
that the AP deauthorized me and the softmac didn't reconnect me.
This patch handles the problem.

Signed-off-by: Ulrich Kunitz [EMAIL PROTECTED]
---
 drivers/net/wireless/zd1211rw/zd_chip.c |4 ++--
 drivers/net/wireless/zd1211rw/zd_chip.h |6 +++---
 drivers/net/wireless/zd1211rw/zd_mac.c  |5 +
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_chip.c 
b/drivers/net/wireless/zd1211rw/zd_chip.c
index efc9c4b..da9d06b 100644
--- a/drivers/net/wireless/zd1211rw/zd_chip.c
+++ b/drivers/net/wireless/zd1211rw/zd_chip.c
@@ -797,7 +797,7 @@ static int zd1211_hw_init_hmac(struct zd
{ CR_ADDA_MBIAS_WARMTIME,   0x3808 },
{ CR_ZD1211_RETRY_MAX,  0x2 },
{ CR_SNIFFER_ON,0 },
-   { CR_RX_FILTER, AP_RX_FILTER },
+   { CR_RX_FILTER, STA_RX_FILTER },
{ CR_GROUP_HASH_P1, 0x00 },
{ CR_GROUP_HASH_P2, 0x8000 },
{ CR_REG1,  0xa4 },
@@ -844,7 +844,7 @@ static int zd1211b_hw_init_hmac(struct z
{ CR_ZD1211B_AIFS_CTL2, 0x008C003C },
{ CR_ZD1211B_TXOP,  0x01800824 },
{ CR_SNIFFER_ON,0 },
-   { CR_RX_FILTER, AP_RX_FILTER },
+   { CR_RX_FILTER, STA_RX_FILTER },
{ CR_GROUP_HASH_P1, 0x00 },
{ CR_GROUP_HASH_P2, 0x8000 },
{ CR_REG1,  0xa4 },
diff --git a/drivers/net/wireless/zd1211rw/zd_chip.h 
b/drivers/net/wireless/zd1211rw/zd_chip.h
index 0eb9c8f..069d2b4 100644
--- a/drivers/net/wireless/zd1211rw/zd_chip.h
+++ b/drivers/net/wireless/zd1211rw/zd_chip.h
@@ -466,6 +466,9 @@ #define RX_FILTER_PROBE_RESPONSE0x0020
 #define RX_FILTER_BEACON   0x0100
 #define RX_FILTER_DISASSOC 0x0400
 #define RX_FILTER_AUTH 0x0800
+#define AP_RX_FILTER   0x0400feff
+#define STA_RX_FILTER  0x
+
 /* Monitor mode sets filter to 0xf */
 
 #define CR_ACK_TIMEOUT_EXT CTL_REG(0x0690)
@@ -548,9 +551,6 @@ #define CR_ZD1211B_AIFS_CTL2CTL_REG(0x
 #define CR_ZD1211B_TXOPCTL_REG(0x0b20)
 #define CR_ZD1211B_RETRY_MAX   CTL_REG(0x0b28)
 
-#define AP_RX_FILTER   0x0400feff
-#define STA_RX_FILTER  0x
-
 #define CWIN_SIZE  0x007f043f
 
 
diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c 
b/drivers/net/wireless/zd1211rw/zd_mac.c
index a66625c..d6f3e02 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -138,10 +138,7 @@ static int reset_mode(struct zd_mac *mac
 {
struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
struct zd_ioreq32 ioreqs[3] = {
-   { CR_RX_FILTER, RX_FILTER_BEACON | RX_FILTER_PROBE_RESPONSE |
-   RX_FILTER_AUTH | RX_FILTER_ASSOC_RESPONSE |
-   RX_FILTER_REASSOC_RESPONSE |
-   RX_FILTER_DISASSOC },
+   { CR_RX_FILTER, STA_RX_FILTER },
{ CR_SNIFFER_ON, 0U },
};
 
-- 
1.4.1



Re: Regarding offloading IPv6 addrconf and ndisc

2006-08-01 Thread David Miller
From: Hugo Santos [EMAIL PROTECTED]
Date: Tue, 1 Aug 2006 12:50:02 +0100

I might have some cycles during the month to code up something in
  this direction, at least for an initial review, i'll try to do so.

Great.  I prefer to talk about code anyways :)

Also, the reliability of a system depends on a lot of things, but
  please, let's not use the assumption that because everything sits in
  the kernel, it will be stable as the number of 'points of failure' is
  smaller; this is only true as long as people work to have stable
  components -- and this is independent of where the components sit.

This disagrees with my experience.  Things in the kernel tend to get
noticed fast and fixed, whereas things in userspace can stay broken
for a long period of time.

Everything is about momentum, and the kernel is where all the
development momentum is.  It's not in these userland components.

People are running semantic checkers on the kernel constantly,
the kernel has all sorts of automatic locking, memory allocation,
et al. verifications and assertions.

A particular userland component might have this treatment and these checks,
but the kernel has them going all the time and people are looking at
the output of these tools and checks constantly.  You cannot get the
kind of coverage the kernel gets.

As Andrew Morton says, userland is just a testsuite for the kernel.
:-)


Re: Regarding offloading IPv6 addrconf and ndisc

2006-08-01 Thread David Miller
From: Hugo Santos [EMAIL PROTECTED]
Date: Tue, 1 Aug 2006 13:00:03 +0100

  Resiliency to failure is not something that depends on the
  kernel. If the code in question is in the kernel, and it crashes,
  how will you recover?

Developer momentum means that the kernel is likely to get fixed
whereas the userland component will more likely rot and not get
fixed.

So in this sense resiliency does depend upon something being in
the kernel or not.


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
On Tue, Aug 01, 2006 at 08:34:05AM -0700, Phil Oester wrote:
 On Tue, Aug 01, 2006 at 12:00:59AM -0700, David Miller wrote:
  What we have now is infinitely better than the past,
  wherein all TSO packets were dropped due to corrupt
  checksums as soon as the NAT module was loaded.
 
 At what point did this problem begin?  2.6.18-rc or prior?

Since we had TSO.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: warning at net/core/dev.c:1171/skb_checksum_help() 2.6.18-rc3

2006-08-01 Thread Herbert Xu
On Mon, Jul 31, 2006 at 08:36:58PM +0200, Patrick McHardy wrote:

 diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
 index 662a869..df1f4e5 100644
 --- a/net/netfilter/nf_queue.c
 +++ b/net/netfilter/nf_queue.c

I presume we need similar patches for the old ipv4/ipv6 versions as well?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] IPv6: only set err in rawv6_bind() when necessary

2006-08-01 Thread David Miller
From: Brian Haley [EMAIL PROTECTED]
Date: Tue, 01 Aug 2006 13:06:03 -0400

 The variable 'err' is set in rawv6_bind() before the address check fails 
 instead of after, moved inside if() statement.
 
 Signed-off-by: Brian Haley [EMAIL PROTECTED]

This is a common C idiom in the kernel:

	err = -EWHATEVER;
	if (error_condition)
		goto out;

	err = 0;
out:
	unlock_stuff();
	return err;

Every other path going from this location in rawv6_bind()
will clear err to zero, so your patch also doesn't fix any
bug.
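
A compilable user-space sketch of the idiom described above, for readers who have not met it before. The names `toy_bind()`, `toy_lock()`, `toy_unlock()` and `lock_depth` are hypothetical and only mirror the shape of rawv6_bind(); the error codes are chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a socket lock; tracks lock/unlock balance. */
static int lock_depth;

static void toy_lock(void)
{
	lock_depth++;
}

static void toy_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/*
 * Hypothetical bind-like function. err is preset to the failure code
 * before each check, so every "goto out" already carries the right value,
 * and the success path clears it to zero just before the common exit.
 */
static int toy_bind(int addr_len, int addr_ok)
{
	int err;

	toy_lock();

	err = -EINVAL;
	if (addr_len < 4)
		goto out;

	err = -EADDRNOTAVAIL;
	if (!addr_ok)
		goto out;

	err = 0;
out:
	toy_unlock();
	return err;
}
```

Because every path reaches `out:` with err already set, the unlock is written once and no failure branch needs its own braces or assignment.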


Re: [PATCH] ipv4: don't call upper-layer disconnect function if not connected

2006-08-01 Thread David Miller
From: Brian Haley [EMAIL PROTECTED]
Date: Tue, 01 Aug 2006 15:48:54 -0400

 Calling connect() with AF_UNSPEC will disconnect a socket, but we don't 
 need to do any work if the socket isn't currently connected.
 
 Signed-off-by: Brian Haley [EMAIL PROTECTED]

The socket could have been bind()'d to, in which case it will
not move to connected state and we still need to invoke
the disconnect methods such as udp_disconnect() to clear out
that binding.

You seem to be groveling in random areas of the ipv4 and ipv6 stack,
what are you working on?


[RFC 1/3] secid reconciliation on inbound

2006-08-01 Thread Venkat Yekkirala

Currently a packet accumulates multiple security identifiers, each of a
different class, as it enters the system. This patch set reconciles these
identifiers into a single identifier while also allowing LSM (SELinux is
addressed in this patch set) to impose flow control checks based on the
identifiers.

The reconciliation steps for SELinux are explained in the Labeled Networking
document at:
http://marc.theaimsgroup.com/?l=linux-netdev&m=115136637800361&w=2

The following are the identifiers handled here:

1. secmark on the skb
2. xfrm security identifier associated with the skb if it used any xfrms,
  a zero secid otherwise.

This patch: Add new flask definitions to SELinux

Adds a new avperm come_thru to arbitrate among the identifiers on the
inbound (input/forward). Also adds a new avperm go_thru to enable flow
control checks on the outbound (output/forward), addressed in a later
patch.

Signed-off-by: Venkat Yekkirala [EMAIL PROTECTED]
---
av_perm_to_string.h |2 ++
av_permissions.h|2 ++
2 files changed, 4 insertions(+)

--- linux-2.6.17.child_sock/security/selinux/include/av_permissions.h   
2006-07-31 09:36:24.0 -0500
+++ linux-2.6.17/security/selinux/include/av_permissions.h  2006-07-31 
10:20:16.0 -0500
@@ -962,6 +962,8 @@
#define PACKET__SEND  0x0001UL
#define PACKET__RECV  0x0002UL
#define PACKET__RELABELTO 0x0004UL
+#define PACKET__COME_THRU 0x0008UL
+#define PACKET__GO_THRU   0x0010UL

#define KEY__VIEW 0x0001UL
#define KEY__READ 0x0002UL
--- linux-2.6.17.child_sock/security/selinux/include/av_perm_to_string.h
2006-07-31 09:36:24.0 -0500
+++ linux-2.6.17/security/selinux/include/av_perm_to_string.h   2006-07-31 
10:20:16.0 -0500
@@ -245,6 +245,8 @@
   S_(SECCLASS_PACKET, PACKET__SEND, send)
   S_(SECCLASS_PACKET, PACKET__RECV, recv)
   S_(SECCLASS_PACKET, PACKET__RELABELTO, relabelto)
+   S_(SECCLASS_PACKET, PACKET__COME_THRU, come_thru)
+   S_(SECCLASS_PACKET, PACKET__GO_THRU, go_thru)
   S_(SECCLASS_KEY, KEY__VIEW, view)
   S_(SECCLASS_KEY, KEY__READ, read)
   S_(SECCLASS_KEY, KEY__WRITE, write)


[RFC 3/3] secid reconciliation on inbound: core networking changes

2006-08-01 Thread Venkat Yekkirala

Invoke the skb_policy_check LSM hook from within networking code.

This is being done at the same time and as a part of checking
xfrm policy. This is hopefully adequate (not anticipating
IP protos that don't use xfrm).

Signed-off-by: Venkat Yekkirala [EMAIL PROTECTED]
---
xfrm.h |   50 +-
1 file changed, 27 insertions(+), 23 deletions(-)

--- linux-2.6.17.child_sock/include/net/xfrm.h  2006-07-31 09:55:23.0 
-0500
+++ linux-2.6.17/include/net/xfrm.h 2006-08-01 15:08:20.0 -0500
@@ -690,22 +690,20 @@ extern int __xfrm_policy_check(struct so

static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff 
*skb, unsigned short family)
{
-   if (sk && sk->sk_policy[XFRM_POLICY_IN])
-   return __xfrm_policy_check(sk, dir, skb, family);
-   
-   return  (!xfrm_policy_list[dir] && !skb->sp) ||
-   (skb->dst->flags & DST_NOPOLICY) ||
-   __xfrm_policy_check(sk, dir, skb, family);
-}
-
-static inline int xfrm4_policy_check(struct sock *sk, int dir, struct sk_buff 
*skb)
-{
-   return xfrm_policy_check(sk, dir, skb, AF_INET);
-}
+   int ret;

-static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff 
*skb)
-{
-   return xfrm_policy_check(sk, dir, skb, AF_INET6);
+   if (sk && sk->sk_policy[XFRM_POLICY_IN])
+   ret =  __xfrm_policy_check(sk, dir, skb, family);
+   else
+   ret = (!xfrm_policy_list[dir] && !skb->sp) ||
+   (skb->dst->flags & DST_NOPOLICY) ||
+   __xfrm_policy_check(sk, dir, skb, family);
+
+#ifdef CONFIG_SECURITY_NETWORK
+   if (ret)
+   ret = security_skb_policy_check(skb, family);
+#endif /* CONFIG_SECURITY_NETWORK */
+   return ret;
}

extern int xfrm_decode_session(struct sk_buff *skb, struct flowi *fl, unsigned 
short family);
@@ -757,20 +755,26 @@ static inline void xfrm_sk_free_policy(s
static inline int xfrm_sk_clone_policy(struct sock *sk) { return 0; }
static inline int xfrm6_route_forward(struct sk_buff *skb) { return 1; }  
static inline int xfrm4_route_forward(struct sk_buff *skb) { return 1; } 
-static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff *skb)
-{ 
-	return 1; 
-} 
-static inline int xfrm4_policy_check(struct sock *sk, int dir, struct sk_buff *skb)

-{
-   return 1;
-}
static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff 
*skb, unsigned short family)
{
+#ifdef CONFIG_SECURITY_NETWORK
+   return security_skb_policy_check(skb, family);
+#else
return 1;
+#endif /* CONFIG_SECURITY_NETWORK */
}
#endif

+static inline int xfrm4_policy_check(struct sock *sk, int dir, struct sk_buff 
*skb)
+{
+   return xfrm_policy_check(sk, dir, skb, AF_INET);
+}
+
+static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff *skb)
+{
+   return xfrm_policy_check(sk, dir, skb, AF_INET6);
+}
+
static __inline__
xfrm_address_t *xfrm_flowi_daddr(struct flowi *fl, unsigned short family)
{
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] secid reconciliation on inbound: add LSM hooks

2006-08-01 Thread James Morris
On Tue, 1 Aug 2006, Venkat Yekkirala wrote:

 - if (err)
 - goto out;
 + /* if (err) */
 + /*  goto out; */
 
 - err = selinux_xfrm_sock_rcv_skb(sksec->sid, skb, &ad);
 -out:
 + /* err = selinux_xfrm_sock_rcv_skb(sksec->sid, skb, &ad); */
 +out:
   return err;
 }


Did you mean to leave the call to selinux_xfrm_sock_rcv_skb() commented 
out?



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [RFC 1/3] secid reconciliation on inbound

2006-08-01 Thread James Morris
On Tue, 1 Aug 2006, Venkat Yekkirala wrote:

 +#define PACKET__COME_THRU 0x0008UL
 +#define PACKET__GO_THRU   0x0010UL

These names seem awkward, and do we really need a separate perm for each 
direction?



- James
-- 
James Morris
[EMAIL PROTECTED]


[PATCH] Fix more per-cpu typos

2006-08-01 Thread Alexey Dobriyan
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 arch/x86_64/kernel/smp.c |2 +-
 include/net/netdma.h |2 +-
 net/core/dev.c   |4 ++--
 net/ipv4/tcp.c   |2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86_64/kernel/smp.c
+++ b/arch/x86_64/kernel/smp.c
@@ -203,7 +203,7 @@ int __cpuinit init_smp_flush(void)
 {
int i;
for_each_cpu_mask(i, cpu_possible_map) {
-   spin_lock_init(&per_cpu(flush_state.tlbstate_lock, i));
+   spin_lock_init(&per_cpu(flush_state, i).tlbstate_lock);
}
return 0;
 }
--- a/include/net/netdma.h
+++ b/include/net/netdma.h
@@ -29,7 +29,7 @@ static inline struct dma_chan *get_softn
 {
struct dma_chan *chan;
rcu_read_lock();
-   chan = rcu_dereference(__get_cpu_var(softnet_data.net_dma));
+   chan = rcu_dereference(__get_cpu_var(softnet_data).net_dma);
if (chan)
dma_chan_get(chan);
rcu_read_unlock();
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3433,7 +3433,7 @@ static void net_dma_rebalance(void)
 
if (net_dma_count == 0) {
for_each_online_cpu(cpu)
-   rcu_assign_pointer(per_cpu(softnet_data.net_dma, cpu), NULL);
+   rcu_assign_pointer(per_cpu(softnet_data, cpu).net_dma, NULL);
unlock_cpu_hotplug();
return;
}
@@ -3447,7 +3447,7 @@ static void net_dma_rebalance(void)
   + (i < (num_online_cpus() % net_dma_count) ? 1 : 0));
 
while(n) {
-   per_cpu(softnet_data.net_dma, cpu) = chan;
+   per_cpu(softnet_data, cpu).net_dma = chan;
cpu = next_cpu(cpu, cpu_online_map);
n--;
}
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1132,7 +1132,7 @@ #ifdef CONFIG_NET_DMA
tp->ucopy.dma_chan = NULL;
preempt_disable();
if ((len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
-   !sysctl_tcp_low_latency && __get_cpu_var(softnet_data.net_dma)) {
+   !sysctl_tcp_low_latency && __get_cpu_var(softnet_data).net_dma) {
preempt_enable_no_resched();
tp->ucopy.pinned_list = dma_pin_iovec_pages(msg->msg_iov, len);
} else
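
[Editorial illustration, not part of the posted patch.] The diff above moves
the member access outside the accessor in each case, because per_cpu() and
__get_cpu_var() take the name of a per-CPU variable rather than an arbitrary
expression. A minimal user-space sketch of that idea follows. Everything in
it is an assumption made for illustration: the array-backed storage, the
NR_CPUS value, the int-typed net_dma field and the two helper functions are
hypothetical and are not the kernel's definitions.

```c
#define NR_CPUS 4                       /* hypothetical CPU count */

/* Hypothetical stand-in for the kernel's struct softnet_data. */
struct softnet_data {
	int net_dma;                    /* the kernel uses a pointer here */
};

/* Mock per-CPU storage: one copy per CPU under a prefixed name. */
static struct softnet_data per_cpu__softnet_data[NR_CPUS];

/* Mock accessor: pastes the prefix onto the variable NAME, then indexes. */
#define per_cpu(var, cpu) (per_cpu__##var[(cpu)])

/* Correct form: select this CPU's copy first, then the member. */
static void set_net_dma(int cpu, int chan)
{
	per_cpu(softnet_data, cpu).net_dma = chan;
}

static int get_net_dma(int cpu)
{
	return per_cpu(softnet_data, cpu).net_dma;
}
```

With this mock, the old spelling per_cpu(softnet_data.net_dma, cpu) would
expand to per_cpu__softnet_data.net_dma[(cpu)] and fail to compile. The
in-tree code presumably built only because of how the real macro happened to
expand, which is why the spelling is worth fixing even where behaviour does
not change.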


