VPN vs. conntrack

2007-03-03 Thread Pete Zaitcev
Dear All:

Our IS/IT, in their infinite wisdom, made us switch from ESP to UDP
encapsulation for the VPN. It worked OK for a while, but the following
seems to happen now (not sure how recently it started, sorry).
At first, everything works fine. A VPN client sends its packets through
a masquerading box to a Cisco concentrator. The concentrator sends packets
back, and they get properly forwarded by the masquerading box.

Traffic looks like this (on the outside interface of the masquerading box):

22:05:50.383442 IP 65.181.30.74.ipsec-nat-t > 66.197.233.252.ipsec-nat-t: 
UDP-encap: ESP(spi=0x5e12c961,seq=0xb), length 76
22:05:50.414700 IP 66.197.233.252.ipsec-nat-t > 65.181.30.74.ipsec-nat-t: 
UDP-encap: ESP(spi=0x69017f05,seq=0x8), length 84

Conntrack looks like this:

udp  17 144 src=192.168.128.11 dst=66.197.233.252 sport=4500 dport=4500 
packets=20 bytes=3704 src=66.197.233.252 dst=65.181.30.74 sport=4500 dport=4500 
packets=14 bytes=4248 [ASSURED] mark=0 secmark=0 use=1

However, sometimes the masquerading box loses the conntrack entry. This
happens if there has not been enough activity across the VPN. After that,
it cannot re-establish the entry. I don't know how it manages that, but the
masquerading system makes the concentrator reply to port 1024:

21:56:51.136174 IP 65.181.30.74.ipsec-nat-t > 66.197.233.252.ipsec-nat-t: 
UDP-encap: ESP(spi=0x3d02e5ec,seq=0x26), length 92
21:56:51.224938 IP 66.197.233.252.ipsec-nat-t > 65.181.30.74.1024: UDP-encap: 
ESP(spi=0x7d35e369,seq=0x37), length 92

The conntrack entry looks like this (note port 1024):

udp  17 9 src=66.197.233.252 dst=65.181.30.74 sport=4500 dport=1024 
packets=1 bytes=120 [UNREPLIED] src=65.181.30.74 dst=66.197.233.252 sport=1024 
dport=4500 packets=0 bytes=0 mark=0 secmark=0 use=1

Restarting the client (!) makes conntrack catch up and work again.
I looked at the ip_conntrack code and I just don't understand anything...
Does anyone have any ideas?
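
One possible explanation, offered only as a hypothesis: once the UDP entry
has timed out, a packet from the concentrator to 65.181.30.74:4500 may
arrive before the client sends anything and be tracked as a new inbound
flow. The client's next outgoing 4500 -> 4500 packet would then clash with
that tuple, so NAT has to pick a different public source port. The toy
model below is not netfilter code; nat_port_taken and nat_pick_port are
made-up names, and it assumes the fallback search simply starts at 1024
for source ports at or above 1024.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not netfilter code: which public UDP ports on the
 * masquerading box's outside address are already claimed by a tracked
 * flow toward the same remote endpoint. */
#define NPORTS 65536
bool nat_port_taken[NPORTS];

/* Hypothetical helper, loosely modelled on NAT keeping the original
 * source port when it is free and otherwise searching 1024..65535
 * (this model only covers original ports >= 1024). */
uint16_t nat_pick_port(uint16_t wanted)
{
    if (!nat_port_taken[wanted]) {
        nat_port_taken[wanted] = true;
        return wanted;
    }
    for (uint32_t p = 1024; p < NPORTS; p++) {
        if (!nat_port_taken[p]) {
            nat_port_taken[p] = true;
            return (uint16_t)p;
        }
    }
    return 0; /* every port in the range is in use */
}
```

With port 4500 already held by the inbound entry, nat_pick_port(4500)
returns 1024 in this model, which would be consistent with the port seen
in the trace above.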

Greetings,
-- Pete
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface

2007-03-03 Thread David Miller
From: Florian Zumbiehl <[EMAIL PROTECTED]>
Date: Sun, 4 Mar 2007 02:55:16 +0100

> Below you find a slightly changed version of the patch

I already applied your first patch, so if you have any
fixes to submit please provide them as relative patches
to your original change.

Thank you.


Re: Session ID 0 with PPPoE

2007-03-03 Thread David Miller
From: Florian Zumbiehl <[EMAIL PROTECTED]>
Date: Sun, 4 Mar 2007 03:30:00 +0100

> I noticed that the PPPoE code doesn't allow session id 0x0000 to be used
> for an actual session but rather considers 0 a special value denoting
> that the socket is unbound. Now, when reading RFC 2516, I couldn't really
> find anything that would forbid 0x0000 as a session id. Only 0xffff "is
> reserved for future use and MUST NOT be used", while 0x0000 is specified
> as the only allowed value for the session id field on certain types of
> packets, but I can neither find any statement that forbids 0x0000 as
> an ordinary session identifier, nor any reason that would
> prevent PPPoE from functioning properly with a session id of 0x0000.
> 
> Do any of you see any reason why a server would not be allowed to
> select 0x0000 as the session id for a PPPoE session?

I can't; feel free to provide a patch to remove this limitation
if it's important to you.
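
Removing the limitation would mostly mean no longer using sid == 0 as the
"unbound" marker. A minimal sketch of that idea, assuming a separate
"bound" flag; struct pppoe_binding and the functions below are
hypothetical and are not the driver's data structures. Following RFC 2516,
only 0xffff is rejected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* RFC 2516: 0xffff is reserved and MUST NOT be used as a session ID;
 * 0x0000 is the value discovery packets such as PADI carry. */
#define PPPOE_SID_RESERVED 0xffffu

/* Hypothetical per-socket state: "bound" is tracked explicitly instead
 * of being inferred from sid != 0, leaving 0x0000 usable as an ordinary
 * session ID. */
struct pppoe_binding {
    bool     bound;
    uint16_t sid;
};

int pppoe_bind_session(struct pppoe_binding *b, uint16_t sid)
{
    if (sid == PPPOE_SID_RESERVED)
        return -EINVAL;
    b->sid = sid;
    b->bound = true;
    return 0;
}

void pppoe_unbind(struct pppoe_binding *b)
{
    b->bound = false;
    b->sid = 0; /* value no longer matters once bound is false */
}
```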


Re: Wifi support for iproute2

2007-03-03 Thread Stephen Hemminger
On Sun, 4 Mar 2007 02:26:53 +0100
Stephan Maka <[EMAIL PROTECTED]> wrote:

> Hello
> 
> I've always felt uncomfortable with the usability of the wireless-tools
> (iwconfig, iwlist), but I really love iproute2. That's why I started
> to implement "ip wifi".
> 
> My GIT tree is located at:
>   http://cthulhu.c3d2.de/~astro/git/iproute2.git/
> 
> I wonder whether this has any chance of being merged into iproute2 at
> some point, or whether this is completely inappropriate functionality
> for this tool.
> 
> Note that this is my first contribution to a Linux-specific tool; *any*
> hints about the style and conventions expected here are welcome.
> 
> I'm sure not everyone will use this instead of wireless-tools, and a
> few may not even want it compiled in and linked against iwlib. Should I
> make this optional through a Makefile variable?
> 
> If no one else contributes, not all of wireless-tools will be
> implemented; it's just too much. But the basic functionality for
> configuring a Wifi card is already there.
> 
> Everyone told me dscape is near, but I hope iwlib will be ported then.

Don't waste your time with a tool that uses the existing wext API.
But a tool that could use cfg80211 would be useful. After the wireless
summit in Jan, I put it on my "interesting ideas" list.


Session ID 0 with PPPoE

2007-03-03 Thread Florian Zumbiehl
Hi,

I noticed that the PPPoE code doesn't allow session id 0x0000 to be used
for an actual session but rather considers 0 a special value denoting
that the socket is unbound. Now, when reading RFC 2516, I couldn't really
find anything that would forbid 0x0000 as a session id. Only 0xffff "is
reserved for future use and MUST NOT be used", while 0x0000 is specified
as the only allowed value for the session id field on certain types of
packets, but I can neither find any statement that forbids 0x0000 as
an ordinary session identifier, nor any reason that would
prevent PPPoE from functioning properly with a session id of 0x0000.

Do any of you see any reason why a server would not be allowed to
select 0x0000 as the session id for a PPPoE session?

Florian


Wifi support for iproute2

2007-03-03 Thread Stephan Maka
Hello

I've always felt uncomfortable with the usability of the wireless-tools
(iwconfig, iwlist), but I really love iproute2. That's why I started to
implement "ip wifi".

My GIT tree is located at:
  http://cthulhu.c3d2.de/~astro/git/iproute2.git/

I wonder whether this has any chance of being merged into iproute2 at some
point, or whether this is completely inappropriate functionality for this tool.

Note that this is my first contribution to a Linux-specific tool; *any*
hints about the style and conventions expected here are welcome.

I'm sure not everyone will use this instead of wireless-tools, and a few
may not even want it compiled in and linked against iwlib. Should I make this
optional through a Makefile variable?

If no one else contributes, not all of wireless-tools will be implemented;
it's just too much. But the basic functionality for configuring a Wifi card
is already there.

Everyone told me dscape is near, but I hope iwlib will be ported then.


Stephan




Re: [PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface

2007-03-03 Thread Florian Zumbiehl
Hi,

> From: Florian Zumbiehl <[EMAIL PROTECTED]>
> Date: Wed, 28 Feb 2007 13:38:44 +0100
> 
> > As no one seems to have an opinion on this: Here is a patch that does
> > work for me and that should solve the problem as far as that is easily
> > possible. It is based on the assumption that an interface's ifindex is
> > basically an alias for a local MAC address, so incoming packets now are
> > matched to sockets based on remote MAC, session id, and ifindex of the
> > interface the packet came in on/the socket was bound to by connect().
> 
> I agree with your analysis and have applied your patch.

Below you find a slightly changed version of the patch that avoids
a possible NULL pointer dereference in case pppoe_device_event()/
pppoe_flush_dev() dev_put()s dev and sets it to NULL before pppoe_connect()
tries to unbind from the previous address, in which case it would
dereference the NULL pointer in dev.

It now saves the ifindex in the socket's data structure upon connect(),
so that it's still available for finding the entry to remove from the
hash table in case pppoe_device_event() should have dropped the socket's
reference to dev.
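
The matching rule the patch introduces can be illustrated outside the
kernel. In this sketch, struct session is a simplified, hypothetical
stand-in for struct pppox_sock, and hash_key() is a toy hash rather than
the driver's hash_item(). The point is only that two sessions with the
same remote MAC and session id can coexist as long as their ifindex
differs, and that the ifindex lives in the socket itself, so no device
reference is needed to find the entry again.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN  6
#define HASH_SIZE 16

/* Simplified stand-in for struct pppox_sock: only what the lookup needs.
 * ifindex is copied in at connect() time. */
struct session {
    unsigned long sid;
    unsigned char remote[ETH_ALEN];
    int ifindex;
    struct session *next;
};

static struct session *table[HASH_SIZE];

/* Toy hash over sid and remote MAC (not the driver's hash_item()). */
static unsigned hash_key(unsigned long sid, const unsigned char *addr)
{
    unsigned h = (unsigned)sid;
    for (int i = 0; i < ETH_ALEN; i++)
        h ^= addr[i];
    return h % HASH_SIZE;
}

/* A match requires sid, remote MAC and ifindex to all agree. */
static int same_key(const struct session *s, unsigned long sid,
                    const unsigned char *addr, int ifindex)
{
    return s->sid == sid && memcmp(s->remote, addr, ETH_ALEN) == 0 &&
           s->ifindex == ifindex;
}

struct session *session_lookup(unsigned long sid, const unsigned char *addr,
                               int ifindex)
{
    struct session *s = table[hash_key(sid, addr)];
    while (s && !same_key(s, sid, addr, ifindex))
        s = s->next;
    return s;
}

int session_insert(struct session *po)
{
    unsigned h = hash_key(po->sid, po->remote);
    for (struct session *s = table[h]; s; s = s->next)
        if (same_key(s, po->sid, po->remote, po->ifindex))
            return -EALREADY; /* same error the driver's __set_item() uses */
    po->next = table[h];
    table[h] = po;
    return 0;
}
```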

> Another way to implement this would have been to store the
> pre-computed ifindex on the kernel side sockaddr.

Well, that probably depends on the intended semantics. There isn't any
documentation somewhere that specifies what the intended behaviour is,
is there?!

Florian

--- linux-2.6.20/drivers/net/pppoe.c.orig   2007-02-25 19:23:51.0 +0100
+++ linux-2.6.20/drivers/net/pppoe.c   2007-03-04 02:11:51.0 +0100
@@ -7,6 +7,12 @@
  *
  * Version:0.7.0
  *
+ * 070228 :Fix to allow multiple sessions with same remote MAC and same
+ * session id by including the local device ifindex in the
+ * tuple identifying a session. This also ensures packets can't
+ * be injected into a session from interfaces other than the one
+ * specified by userspace. Florian Zumbiehl <[EMAIL PROTECTED]>
+ * (Oh, BTW, this one is YYMMDD, in case you were wondering ...)
  * 220102 :Fix module use count on failure in pppoe_create, pppox_sk -acme
  * 030700 :Fixed connect logic to allow for disconnect.
  * 270700 :Fixed potential SMP problems; we must protect against
@@ -127,14 +133,14 @@
  *  Set/get/delete/rehash items  (internal versions)
  *
  **/
-static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr)
+static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr, int ifindex)
 {
int hash = hash_item(sid, addr);
struct pppox_sock *ret;
 
ret = item_hash_table[hash];
 
-   while (ret && !cmp_addr(&ret->pppoe_pa, sid, addr))
+   while (ret && !(cmp_addr(&ret->pppoe_pa, sid, addr) && ret->pppoe_ifindex == ifindex))
ret = ret->next;
 
return ret;
@@ -147,21 +153,19 @@
 
ret = item_hash_table[hash];
while (ret) {
-   if (cmp_2_addr(&ret->pppoe_pa, &po->pppoe_pa))
+   if (cmp_2_addr(&ret->pppoe_pa, &po->pppoe_pa) && ret->pppoe_ifindex == po->pppoe_ifindex)
return -EALREADY;
 
ret = ret->next;
}
 
-   if (!ret) {
-   po->next = item_hash_table[hash];
-   item_hash_table[hash] = po;
-   }
+   po->next = item_hash_table[hash];
+   item_hash_table[hash] = po;
 
return 0;
 }
 
-static struct pppox_sock *__delete_item(unsigned long sid, char *addr)
+static struct pppox_sock *__delete_item(unsigned long sid, char *addr, int ifindex)
 {
int hash = hash_item(sid, addr);
struct pppox_sock *ret, **src;
@@ -170,7 +174,7 @@
src = &item_hash_table[hash];
 
while (ret) {
-   if (cmp_addr(&ret->pppoe_pa, sid, addr)) {
+   if (cmp_addr(&ret->pppoe_pa, sid, addr) && ret->pppoe_ifindex == ifindex) {
*src = ret->next;
break;
}
@@ -188,12 +192,12 @@
  *
  **/
 static inline struct pppox_sock *get_item(unsigned long sid,
-unsigned char *addr)
+unsigned char *addr, int ifindex)
 {
struct pppox_sock *po;
 
read_lock_bh(&pppoe_hash_lock);
-   po = __get_item(sid, addr);
+   po = __get_item(sid, addr, ifindex);
if (po)
sock_hold(sk_pppox(po));
read_unlock_bh(&pppoe_hash_lock);
@@ -203,7 +207,15 @@
 
 static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp)
 {
-   return get_item(sp->sa_addr.pppoe.sid, sp->sa_addr.pppoe.remote);
+   struct net_device *dev = NULL;
+   int ifindex;
+
+   dev = dev_get_by_name(sp->sa_addr.pppoe.dev);
+   if(!dev)
+   return NULL;
+   ifin

Re: SWS for rcvbuf < MTU

2007-03-03 Thread John Heffner

David Miller wrote:

From: John Heffner <[EMAIL PROTECTED]>
Date: Fri, 02 Mar 2007 16:16:39 -0500

Please don't apply the patch I sent.  I've been thinking about this a 
bit harder, and it may not fix this particular problem.  (Hard to say 
without knowing exactly what it is.)  As the comment above 
__tcp_select_window() states, we do not do full receive-side SWS 
avoidance because of header prediction.


Alex, you're right, I missed that special zero-window case.  I'm still 
not quite sure I'm completely happy with this patch.  I'd like to think 
about this a little bit harder...


Ok


Alright, I've thought about it a bit more, and I think the patch I sent 
should work.  Alex, any opinion?  Any way you can test this out?


Thanks,
  -John
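
For reference, the receiver-side SWS avoidance rule suggested in RFC 1122
(section 4.2.3.3) keeps the right window edge fixed until the window can be
opened by at least min(Fr * RCV.BUFF, Eff.snd.MSS), with Fr = 1/2
recommended. Because of the min(), the rule still makes progress when the
receive buffer is smaller than one MSS. A sketch of that rule only; it is
not Linux's __tcp_select_window(), and the function name and parameters
are assumptions.

```c
#include <stdint.h>

/* rcv_buff: total receive buffer space
 * rcv_user: data queued but not yet read by the application
 * cur_wnd:  window currently offered to the sender
 * mss:      effective send MSS of the peer
 * Returns the window to advertise (Fr = 1/2). Assumes rcv_user <= rcv_buff. */
uint32_t sws_select_window(uint32_t rcv_buff, uint32_t rcv_user,
                           uint32_t cur_wnd, uint32_t mss)
{
    uint32_t half = rcv_buff / 2;
    uint32_t threshold = half < mss ? half : mss;
    uint32_t free_space = rcv_buff - rcv_user;

    /* Only move the right edge once the gain is worth advertising. */
    if (free_space >= cur_wnd && free_space - cur_wnd >= threshold)
        return free_space;
    return cur_wnd;
}
```

For example, with a 1000-byte buffer and a 1460-byte MSS the threshold is
500 bytes, so a window of 100 is kept until at least 600 bytes are free.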


Re: [patch] at76_usb wireless driver

2007-03-03 Thread Johannes Berg
Looks pretty good, a few more comments.

KEVENT_* constants and functions are really really confusing when the
"kevent" subsystem is being discussed on netdev all the time. They're
also quite meaningless, please rename them to something like
AT76_DEVEVENT_* or whatever. While you're at it, the kevent() function really
could use splitting up into sub-functions; it's a pretty large mess.

The same about constant names goes for PM_* constants since linux/pm.h
defines a whole bunch of PM_* constants too (I initially thought you
were using those and was really confused what the driver does!)

Having a whole bunch of module parameters that only set initial settings
for things you later configure with iwconfig seems pretty useless but if
you really think that they're absolutely required I don't care too much.

PRIV_IOCTL_SET_MONITOR_MODE is wrong as far as I can tell, there's
"iwconfig ... mode monitor" which you should use instead. (Btw, why are
the priv ioctls spaced so strangely??)

There doesn't seem to be a need to include rtnetlink.h, what made you
think you'd need rtnl_lock()?
"putting this inside rtnl_lock() - rtnl_unlock() hangs modprobe" is
pretty obvious -- register_netdev does rtnl_lock()! And don't use
unregister_netdevice(), use unregister_netdev().

Your init_new_device() function can return an error but those errors are
not always checked, maybe a printk would be appropriate. Also,
generally, subfunctions are allowed to return errors directly, i.e.
instead of returning -1 return -ENODEV from there and just hand it up
from the caller.
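
A minimal sketch of the suggested error-handling style, in userspace C:
the subfunction returns a negative errno value such as -ENODEV instead of
-1, and the caller logs it and hands it up unchanged. read_fw_version()
and init_new_device_sketch() are hypothetical stand-ins, and fprintf()
stands in for printk()/dev_err().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical subfunction: 0 on success, negative errno on failure. */
int read_fw_version(int device_present)
{
    return device_present ? 0 : -ENODEV;
}

/* Hypothetical caller: check the result, report it, propagate it as-is. */
int init_new_device_sketch(int device_present)
{
    int ret = read_fw_version(device_present);

    if (ret < 0) {
        fprintf(stderr, "init_new_device: failed with %d\n", ret);
        return ret;
    }
    return 0;
}
```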

at76c503_do_probe could use splitting into two functions, the two huge
parts of the if, if only to unindent the whole code a bit and get it to
adhere to 80 chars/line instead of 99.

"Use our own dbg macro", but maybe use dev_dbg?
Same for err() (which I didn't even know existed..) how about dev_err()?

static u8 snapsig/rfc1042sig/bc_addr/off_addr/hw_rates/channel_frequency
etc etc etc don't belong in a header file.



Some other (mostly style) issues:
 * kernel code prefers a space before the brace in
   "struct at76c503_command{" et al.
 * both your header and code files contain lots of trailing whitespace
 * kernel code prefers no space between function names and the opening
   parenthesis
 * use compare_ether_addr() instead of memcmp (look for ETH_ALEN, lots
   of places)
 * all the PROC = stuff looks pretty strange to my eyes but hey
   that's just me I guess :)
 * why does at76c503_get_fw_info take such a huge number of parameters?
   Couldn't you pass in a struct at76c503 * and have that filled?
 * struct reg_domain should be tab-indented.
 * don't use typedefs for structs
 * attribute packed on members of a struct is pretty weird
 * the various frame definitions like struct ieee802_11_beacon_data are
   useless, the same stuff is in struct ieee80211_mgmt (at least on
   wireless dev kernel, that might not be true on other kernels?)
 * hex2str wants proper indentation
 * get_hw_config has weird indentation
 * wait_completion sounds far too generic (wait_for_completion in
   completion.h!)



Hey, I need to go but that probably gives you a lot to review...

johannes




[PATCH] Au1000 link beat detection

2007-03-03 Thread Florian Fainelli
Hi all

This patch fixes the link beat detection when the cable is not plugged in
at startup with the au1000_eth driver.

Signed-off-by: Florian Fainelli <[EMAIL PROTECTED]>
-- 
diff -urN linux-2.6.16.7/drivers/net/au1000_eth.c linux-2.6.16.7.new/drivers/net/au1000_eth.c
--- linux-2.6.16.7/drivers/net/au1000_eth.c 2006-04-17 23:53:25.0 +0200
+++ linux-2.6.16.7.new/drivers/net/au1000_eth.c 2006-04-23 01:42:48.0 +0200
@@ -12,6 +12,9 @@
  * Author: MontaVista Software, Inc.
  * [EMAIL PROTECTED] or [EMAIL PROTECTED]
  *
+ * Bjoern Riemer 2004
+ *   [EMAIL PROTECTED] or [EMAIL PROTECTED]
+ * // fixed the link beat detection with ioctls (SIOCGMIIPHY)
  * 
  *
  *  This program is free software; you can distribute it and/or modify it
@@ -1672,6 +1675,10 @@
aup->phy_ops->phy_status(dev, aup->phy_addr, &link, &speed);
control = MAC_DISABLE_RX_OWN | MAC_RX_ENABLE | MAC_TX_ENABLE;
+   /*riemer: fix for startup without cable */
+   if (!link)
+   dev->flags &= ~IFF_RUNNING;
+
 #ifndef CONFIG_CPU_LITTLE_ENDIAN
control |= MAC_BIG_ENDIAN;
 #endif
if (link && (dev->if_port == IF_PORT_100BASEFX)) {


Re: [patch] at76_usb wireless driver

2007-03-03 Thread Johannes Berg
Pavel should know better and should have told you that wireless has its
own list ;)
Quoting fully to make linux-wireless aware. I guess we'll want to take
out netdev on replies to this.

johannes

On Sat, 2007-03-03 at 16:00 +0100, Guido Guenther wrote:
> Hi,
> I'd be glad if someone could review the at76_usb wireless driver; it
> adds support for the at76c503, at76c505 and at76c505a wireless USB
> adapters. Since it exceeds the list's size limit, please
> git-clone http://honk.sigxcpu.org/git/at76c503a.git/
> 
> The projects homepage is:
> http://at76c503a.berlios.de/
> The above git tree is functionally equivalent to the CVS version at
> berlios but removes support for older kernels, older wireless
> extensions, C99 style comments, commented out code and shifts things
> around a little for better readability. This is only done to ease
> reviewing - there are no functional changes over the CVS archive.
> 
> The driver was tested (and is in practical use) on at least i386, amd64
> and powerpc.
> 
> Please note that I'm not the author of the driver, the driver itself
> lists these copyright holders (but none of them showed up on the
> mailing list during the last couple of months):
> 
> Copyright (c) 2002 - 2003 Oliver Kurth
> Copyright (c) 2004 Joerg Albert <[EMAIL PROTECTED]>
> Copyright (c) 2004 Nick Jones
> Copyright (c) 2004 Balint Seeber <[EMAIL PROTECTED]>
> 
> Pavel Roskin, Maxim Grechkin and I have been committing to CVS recently.
> Cheers,
>  -- Guido
> 
> 




Re: [patch] at76_usb wireless driver

2007-03-03 Thread Guido Guenther
Hi,
I'd be glad if someone could review the at76_usb wireless driver; it
adds support for the at76c503, at76c505 and at76c505a wireless USB
adapters. Since it exceeds the list's size limit, please
git-clone http://honk.sigxcpu.org/git/at76c503a.git/

The projects homepage is:
http://at76c503a.berlios.de/
The above git tree is functionally equivalent to the CVS version at
berlios but removes support for older kernels, older wireless
extensions, C99 style comments, commented out code and shifts things
around a little for better readability. This is only done to ease
reviewing - there are no functional changes over the CVS archive.

The driver was tested (and is in practical use) on at least i386, amd64
and powerpc.

Please note that I'm not the author of the driver, the driver itself
lists these copyright holders (but none of them showed up on the
mailing list during the last couple of months):

Copyright (c) 2002 - 2003 Oliver Kurth
Copyright (c) 2004 Joerg Albert <[EMAIL PROTECTED]>
Copyright (c) 2004 Nick Jones
Copyright (c) 2004 Balint Seeber <[EMAIL PROTECTED]>

Pavel Roskin, Maxim Grechkin and I have been committing to CVS recently.
Cheers,
 -- Guido



Re: [PATCH] ehea: Optional TX/RX path optimized for SMP

2007-03-03 Thread Andi Kleen
On Sat, Mar 03, 2007 at 09:28:28AM +0100, Benjamin Herrenschmidt wrote:
> On Sat, 2007-03-03 at 04:06 +0100, Andi Kleen wrote:
> > Jan-Bernd Themann <[EMAIL PROTECTED]> writes:
> > > 
> > > Are there any concerns about this approach?
> > 
> > Yes. You should fix the NAPI code instead of trying to work
> > around it.
> 
> NAPI is being fixed but the fix will take time to get in. In the
> meantime, the solution to get something working is the workaround

If it works right now, just with a little less efficiency, there
is no pressing need for workarounds before the real solution arrives.

Besides, I doubt that patch would have made .21 anyway, and in .22
you might already have multiqueue NAPI.

-Andi



Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-03 Thread Andi Kleen
On Fri, Mar 02, 2007 at 08:22:11PM -0800, David Miller wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: 03 Mar 2007 03:14:29 +0100
> 
> > That's pretty common with many x86 server boards because 
> > they come with two NICs by default but most people only
> > plug the cable into one. However the distro installers
> > run DHCP on all.
> 
> Nope, that's not what I've seen them do, instead they run dhcp on
> interfaces that report a link being present.

I've seen otherwise.

And that's also bad. It means that when the user moves the machine
and happens to plug the Ethernet cable into the other port, the network
will be a notwork until the configuration is manually changed.
The same happens when the cable is not plugged in yet at install time.
All not good.

Allowing low-overhead DHCP is useful imho. The main problem
with running it always is that it will use more power because
it will IFF_UP the interface. But longer term, that can only be
properly solved by real idle network power management, I think.

-Andi


Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-03 Thread Ilpo Järvinen
On Sat, 3 Mar 2007, Ilpo Järvinen wrote:
> On Fri, 2 Mar 2007, David Miller wrote:
> > From: Baruch Even <[EMAIL PROTECTED]>
> > Date: Thu, 1 Mar 2007 20:13:40 +0200
> 
> > > One drawback for this approach is that you now walk the entire sack
> > > block when you advance one packet. If you consider a 10,000 packet queue
> > > which had several losses at the beginning and a large sack block that
> > > advances from the middle to the end you'll walk a lot of packets for
> > > that one last stretch of a sack block.
> 
> Maybe the information in the last ACK could even be used to our advantage to 
> speed up the start-point lookup, since the last ACK very likely contains 
> the highest SACK block (this might not hold under attack), no matter how 
> big it is, and it never needs to be processed if we know for sure 
> that it was the global highest rather than a local one. Using that, it's 
> trivial to find the starting point below the SACK block, and that could be 
> passed to the head_lost marking on the fly using the stack rather than the
> structs. Sort of a fastpath too :-)... Maybe even some auto-tuning thingie 
> which conditionally enables skipping large SACK blocks to reuse all of 
> this last-ACK information in mark_head, but that would get rather 
> complicated I think.

Even better: adding more to the globally highest SACKed block range, once 
it is already larger than tp->reordering, does nothing in mark_head_lost 
unless ca_state or high_seq also change (just a quick thought; it might be 
possible to exclude those too with careful analysis of the state), 
isn't that so? But taking advantage of this might require inter-ACK state 
and would then be a less useful optimization. So the only thing that would 
remain is the check for timed-out skbs above the highest SACKed block...

-- 
 i.

Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-03 Thread Stefan Rompf
On Friday, 2 March 2007 22:26, David Miller wrote:

> The DHCP client should only care about a particular interface's
> traffic, the one it wants to listen on.

Also, a DHCP client should close the socket between address acquisition and 
renewal. The only interesting events in that period are operstate changes.
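
As a rough illustration of reacting to operational state instead of
keeping a packet socket open, the sketch below reads the state Linux
exposes at /sys/class/net/<ifname>/operstate. link_is_up() is a
hypothetical helper; a real DHCP client would more likely subscribe to
RTM_NEWLINK notifications on an rtnetlink socket than poll a file.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if operstate reads "up", 0 for any other state,
 * and -1 if the attribute cannot be read (e.g. no such interface). */
int link_is_up(const char *ifname)
{
    char path[256], state[32];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/net/%s/operstate", ifname);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(state, sizeof(state), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    state[strcspn(state, "\n")] = '\0'; /* drop trailing newline */
    return strcmp(state, "up") == 0;
}
```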

Stefan



Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-03 Thread Ilpo Järvinen
On Fri, 2 Mar 2007, David Miller wrote:

> From: Baruch Even <[EMAIL PROTECTED]>
> Date: Thu, 1 Mar 2007 20:13:40 +0200

> > One drawback for this approach is that you now walk the entire sack
> > block when you advance one packet. If you consider a 10,000 packet queue
> > which had several losses at the beginning and a large sack block that
> > advances from the middle to the end you'll walk a lot of packets for
> > that one last stretch of a sack block.

Maybe the information in the last ACK could even be used to our advantage to 
speed up the start-point lookup, since the last ACK very likely contains 
the highest SACK block (this might not hold under attack), no matter how 
big it is, and it never needs to be processed if we know for sure 
that it was the global highest rather than a local one. Using that, it's 
trivial to find the starting point below the SACK block, and that could be 
passed to the head_lost marking on the fly using the stack rather than the
structs. Sort of a fastpath too :-)... Maybe even some auto-tuning thingie 
which conditionally enables skipping large SACK blocks to reuse all of 
this last-ACK information in mark_head, but that would get rather 
complicated I think.

> > One way to handle that is to use the still existing sack fast path to
> > detect this case and calculate what is the sequence number to search
> > for. Since you know what was the end_seq that was handled last, you can
> > search for it as the start_seq and go on from there. Does it make sense?
> 
> BTW, I think I figured out a way to get rid of
> lost_{skb,cnt}_hint.  The fact of the matter in this case is that
> the setting of the tag bits always propagates from front of the queue
> onward.  We don't get holes mid-way.
> 
> So what we can do is search the RB-tree for high_seq and walk
> backwards.  Once we hit something with TCPCB_TAGBITS set, we
> stop processing as there are no earlier SKBs which we'd need
> to do anything with.

If TCP knows the highest SACK block skb (or its seq) and high_seq, finding 
the starting point is really trivial, as you can always take the min of them 
without any walking. Then walk backwards, skipping the first tp->reordering 
skbs, until the first LOST one is encountered. ...The only overhead then 
comes from the skipped skbs, which depends on the current reordering (of 
course that would not be overhead if they had timed out). This would not 
even require knowledge of the per-skb fack_count, because the skipping 
serves that purpose and it can be done on the fly.
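
The backward walk described above can be modelled on a plain array. This
is a toy, not tcp_input.c: seg[] stands for the retransmit queue in
sequence order, SEG_SACKED and SEG_LOST are made-up flags, start is
assumed to be the index just below min(high_seq, highest SACK) as computed
by the caller, and the first `reordering` entries of the walk are left
unmarked.

```c
#define SEG_SACKED 0x1
#define SEG_LOST   0x2

/* Walk backwards from start, skip the first `reordering` entries, then
 * mark un-SACKed entries LOST until one already marked LOST is found
 * (everything before it was handled on an earlier ACK).
 * Returns the number of entries newly marked. */
int mark_head_lost_backwards(unsigned char *seg, int start, int reordering)
{
    int marked = 0;
    int skipped = 0;

    for (int i = start; i >= 0; i--) {
        if (seg[i] & SEG_LOST)
            break;
        if (skipped < reordering) {
            skipped++;
            continue;
        }
        if (!(seg[i] & SEG_SACKED)) {
            seg[i] |= SEG_LOST;
            marked++;
        }
    }
    return marked;
}
```

A second call with the same arguments marks nothing, because the walk
stops at the first entry the previous call already marked.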

> scoreboard_skb_hint is a little bit trickier, but it is a similar
> case to the tcp_lost_skb_hint case.  Except here the termination
> condition is a relative timeout instead of a sequence number and
> packet count test.
> 
> Perhaps for that we can remember some state from the
> tcp_mark_head_lost() we do first.  In fact, we can start
> the queue walk from the latest packet which tcp_mark_head_lost()
> marked with a tag bit.

Yes, the problem with this case compared to head_lost seems to be that
we don't know whether the first skb (in the backwards walk) must be marked 
until we have walked past it (actually to the point where the first 
SACKED_RETRANS is encountered; timestamps should be in order except for 
the discontinuity that occurs at the SACKED_RETRANS edge). So it seems to me 
that any backwards walk could be a bit problematic in this case exactly 
because of this discontinuity? Armed with this knowledge, however, a 
backwards walk could start checking for timed-out marking right at the 
first SACKED_RETRANS skb, and continue later from that point forward if 
the skb at the edge was also marked due to timeout. ...Actually, I think 
another discontinuity can exist at the EVER_RETRANS edge, but it 
suffers from the same problem as the non-RETRANS skbs before the 
SACKED_RETRANS edge when first encountered, and is therefore probably 
pretty useless.

> Basically these two algorithms are saying:
> 
> 1) Mark up to smallest of 'lost' or tp->high_seq.
> 2) Mark packets after those processed in #1 which have
>timed out.
> 
> Right?

So I would take another angle on this problem (basically combine the 
lost calculation from tcp_update_scoreboard and the mark_head_lost stuff, 
and ignore this lost altogether). Basically, what I propose is something 
like this (hopefully I'm not stepping on your toes by coding it this much, 
but a pseudocode approach seemed to get almost as complex :-)):

void tcp_update_scoreboard_fack(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
int reord_count = 0;
int had_retrans = 0;
struct sk_buff *timedout_reentry_skb = NULL;

/* How to get this highest_sack? */
skb = get_min_skb_wrapsafe(tp->high_seq, highest_sack);

walk_backwards_from(sk, skb) {
if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
break;

if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
had_retrans = 1;

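The pseudocode leaves get_min_skb_wrapsafe() undefined. Whatever form it
takes, picking the smaller of two sequence numbers has to survive 32-bit
wraparound; the kernel's before() helper in include/net/tcp.h does this
with a signed difference. A sketch on plain sequence numbers;
seq_before() and seq_min_wrapsafe() are illustrative names, not kernel
functions.

```c
#include <stdint.h>

/* Serial-number comparison in the style of the kernel's before():
 * correct across wraparound while the values are < 2^31 apart. */
int seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* The earlier of two sequence numbers, wraparound-safe. */
uint32_t seq_min_wrapsafe(uint32_t a, uint32_t b)
{
    return seq_before(a, b) ? a : b;
}
```
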
Re: anthropology of linux: help needed

2007-03-03 Thread Roel Bindels
m a a [EMAIL PROTECTED] ac uk wrote:
> Hello,
> 
> This is a rather strange e-mail for these mailing
> lists, I know. I am a third year Social Anthropology
> student in the University of Durham doing my
> dissertation (thesis) on the Anthropology of
> GNU/Linux. I would really appreciate if you could help
> me out and offer some of your time to fill in the
> questionnaire below- it will only take 2 minutes.
> Replies will be confidential and everything in the
> dissertation will be anonymous. Results will be
> e-mailed to participants upon request.
> 
> Thanks in advance, and... enjoy!
> Maria Kastrinou
> 
> 
> 
> QUESTIONS:
> 
> 1. When did you start using GNU/Linux OS?
6 years ago
> 
> 
> 2. What is your level of involvement?
> newbie/ user/ developer (delete as appropriate)
developer
> 
> 
> 3. Why are you using Linux?
I like it more than the alternatives
> 
> 
> 4. Is Linux fun? How?
It's safe and flexible
> 
> 
> 5. Which distribution of Linux do you use?
For clients I normally use SuSE, sometimes Ubuntu; for servers I use Debian.
> 
> 
> 6. What in your opinion constitutes a "good hack"?
Since hacking is not a negative thing, I think a good hack is some code
that does the job and is also very nicely developed.
> 
> 
> 7. Would you describe yourself as a "hacker"?
If hacker is equivalent to software developer, yes. If it is somebody who
cracks code (a cracker), then no.
> 
> 
> 8. Which super-hero (apart from Tux) do you think
> would represent Linux best?
I don't believe superheroes exist. Linux does..
> 
> 
> 9. Describe Microsoft OS in one word.
buggy
> 
> 
> 10. How do you view the recent patent agreement
> between Microsoft and Novell?
I think NOVELL is a big boy and can play its own game.
> 
> 
> 11. GNU GPL, copyleft and freedom of speech: good, bad
> or irrelevant?
very good
> 
> 
> 12. How many Linux mailing lists are you a member of?
10

> 
> 
> 13. Would you reply to a question sent through Linux
> mailing lists and why?
No ;), open communication
> 
> 
> 14. Microsoft vs. Linux: cathedral vs. bazaar or
> something altogether different?
Commercial vs Non-Commercial

> 
> 
> 15. Open Source or Free Software?
Open Source
> 
> 
> Personal Information:
> a) Age:
29
> b) Gender:
Male
> c) Occupation:
Tutor
> d) Your current geographic location:
Netherlands
> 
> Any other comments?
Nope
> 
> 
> Would you like me to e-mail you the results? YES/ NO
Put them on the list ;)
> 
> 
> Hope you enjoyed it!
> 
> THANK YOU!
> 
> Maria
> 
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-gcc" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Extensible hashing and RCU

2007-03-03 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 12:45:36PM -0800, Michael K. Edwards ([EMAIL PROTECTED]) wrote:
> On 3/2/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> >Thank you for this report. (Still avoiding cache misses studies, while they
> >obviously are the limiting factor)
> 
> 1)  The entire point of going to a tree-like structure would be to
> allow the leaves to age out of cache (or even forcibly evict them)
> when the structure bloats (generally under DDoS attack), on the theory
> that most of them are bogus and won't be referenced again.  It's not
> about the speed of the data structure -- it's about managing its
> impact on the rest of the system.
> 
> 2)  The other entire point of going to a tree-like structure is that
> they're drastically simpler to RCU than hashes, and more generally
> they don't involve individual atomic operations (RCU reaping passes,
> resizing, etc.) that cause big latency hiccups and evict a bunch of
> other stuff from cache.
> 
> 3)  The third entire point of going to a tree-like structure is to
> have a richer set of efficient operations, since you can give them a
> second "priority"-type index and have "pluck-highest-priority-item",
> three-sided search, and bulk delete operations.  These aren't that
> much harder to RCU than the basic modify-existing-node operation.
> 
> Now can we give these idiotic micro-benchmarks a rest until Robert's
> implementation is tuned and ready for stress-testing?

Mmm, you have learnt new words from other threads :)
It is not a benchmark; it is an analysis of how the structure is processed.
Everything you have written above is correct (it has been said in this
thread multiple times), but it does not change the fact that tree traversal
is slower than list traversal. So to compete, a tree (or some other
algorithm, which would be even more interesting) implementation must keep
that in mind and be faster under any (or at least most) load. As is, the
tree/trie does not have that (it is a feature of the algorithm), but it has
another advantage you did not mention (it is extremely well suited to the
routing cache, which requires wildcard support and scaling): the ability to
scale without recreating the structure.

Btw, you could try to implement some of what you have written above to show
its merits, so that it would not be just empty words :)
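A rough way to see the traversal-cost point is to count how many nodes a lookup touches, since each node visit is a potential cache miss. The C sketch below is illustrative only and not code from this thread: it compares a chained hash table against a perfectly balanced binary search tree holding the same keys. The key set (0..n-1), the modulo hash and all function names are assumptions; sequential keys with a modulo hash are the best case for the hash table.

```c
#include <stdlib.h>

/* Chained hash table node: one singly linked list per bucket. */
struct hnode {
	unsigned key;
	struct hnode *next;
};

/* Binary search tree node: smaller keys on the left. */
struct tnode {
	unsigned key;
	struct tnode *left, *right;
};

/* Build a perfectly balanced BST from the sorted keys[lo..hi). */
static struct tnode *build_tree(const unsigned *keys, int lo, int hi)
{
	if (lo >= hi)
		return NULL;
	int mid = lo + (hi - lo) / 2;
	struct tnode *n = malloc(sizeof(*n));	/* error handling omitted */
	n->key = keys[mid];
	n->left = build_tree(keys, lo, mid);
	n->right = build_tree(keys, mid + 1, hi);
	return n;
}

static void free_tree(struct tnode *n)
{
	if (!n)
		return;
	free_tree(n->left);
	free_tree(n->right);
	free(n);
}

/* Nodes touched while searching the tree for key. */
static int tree_visits(const struct tnode *n, unsigned key)
{
	int visits = 0;
	while (n) {
		visits++;
		if (key == n->key)
			break;
		n = key < n->key ? n->left : n->right;
	}
	return visits;
}

/* Nodes touched while scanning key's hash chain. */
static int chain_visits(struct hnode **buckets, unsigned nbuckets,
			unsigned key)
{
	int visits = 0;
	for (const struct hnode *h = buckets[key % nbuckets]; h; h = h->next) {
		visits++;
		if (h->key == key)
			break;
	}
	return visits;
}

/* Average node visits per successful lookup of keys 0..n-1. */
void average_visits(unsigned n, unsigned nbuckets,
		    double *hash_avg, double *tree_avg)
{
	unsigned *keys = malloc(n * sizeof(*keys));
	struct hnode *pool = malloc(n * sizeof(*pool));
	struct hnode **buckets = calloc(nbuckets, sizeof(*buckets));
	long hsum = 0, tsum = 0;

	for (unsigned i = 0; i < n; i++) {
		keys[i] = i;				/* sorted, as build_tree needs */
		pool[i].key = i;
		pool[i].next = buckets[i % nbuckets];	/* push onto chain head */
		buckets[i % nbuckets] = &pool[i];
	}
	struct tnode *root = build_tree(keys, 0, (int)n);

	for (unsigned i = 0; i < n; i++) {
		hsum += chain_visits(buckets, nbuckets, keys[i]);
		tsum += tree_visits(root, keys[i]);
	}
	*hash_avg = (double)hsum / n;
	*tree_avg = (double)tsum / n;

	free_tree(root);
	free(buckets);
	free(pool);
	free(keys);
}
```

With 1024 keys in 1024 buckets this gives one visit per hash lookup against about nine for the balanced tree. It captures only the pointer-chasing cost, not the aging, RCU or bulk-operation benefits listed in the quoted message.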

> Cheers,
> - Michael

-- 
Evgeniy Polyakov


Re: [PATCH] ehea: Optional TX/RX path optimized for SMP

2007-03-03 Thread Benjamin Herrenschmidt
On Sat, 2007-03-03 at 04:06 +0100, Andi Kleen wrote:
> Jan-Bernd Themann <[EMAIL PROTECTED]> writes:
> > 
> > Are there any concerns about this approach?
> 
> Yes. You should fix the NAPI code instead of trying to work
> around it.

NAPI is being fixed, but the fix will take time to get in. In the
meantime, the workaround is the way to get something working.

Ben.




Re: [PATCH] spidernet: Fix problem sending IP fragments

2007-03-03 Thread Benjamin Herrenschmidt
Geoff, I suspect gelic_net might have the same problem...

Cheers,
Ben.

On Fri, 2007-03-02 at 18:39 +0100, Norbert Eicker wrote:
> On Fri 2.3.2007 00:34, Linas Vepstas wrote:
> > On Thu, Mar 01, 2007 at 04:52:54PM -0600, Chris Engel wrote:
> > > I tried to apply this patch to 2.6.21-rc2 and CHECKSUM_HW appears
> > > to be changed to CHECKSUM_COMPLETE
> 
> Oops. I did not test this on the actual 2.6.21-rc2 before sending it.
> It worked fine for me on 2.6.18.
> 
> In the meantime I tested the patch below on 2.6.21.
> 
> > The use of CHECKSUM_HW was replaced by CHECKSUM_PARTIAL and
> CHECKSUM_COMPLETE on a case-by-case basis, in the patch series leading
> > up to 2.6.19.  In this case, I'm not sure which should have been
> > used.
> 
> In fact CHECKSUM_COMPLETE seems to be used on the receiving side, while
> CHECKSUM_PARTIAL is the one to be used when sending frames. Thus the
> latter is the one to choose.
> 
> > Norbert, can you resubmit a patch that applies to a more recent
> > kernel? p.s. your emailer replaced tabs by spaces ...
> 
> so here's the new one:
> 
> Fix problem sending IP fragments on spidernet.
> 
> Signed-off-by: Norbert Eicker <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
> index 3b91af8..e3019d5 100644
> --- a/drivers/net/spider_net.c
> +++ b/drivers/net/spider_net.c
> @@ -719,7 +719,7 @@ spider_net_prepare_tx_descr(struct spide
>   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
>   spin_unlock_irqrestore(&chain->lock, flags);
> 
> - if (skb->protocol == htons(ETH_P_IP))
> + if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_PARTIAL)
>   switch (skb->nh.iph->protocol) {
>   case IPPROTO_TCP:
>   hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
> 
> ___
> Linuxppc-dev mailing list
> [EMAIL PROTECTED]
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
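The rule Norbert describes (ask the hardware for a transmit checksum only when the stack left it to the device, i.e. `ip_summed == CHECKSUM_PARTIAL`) can be modelled in plain user-space C. This is a hypothetical sketch, not driver code: `enum csum_state`, its values and `choose_tx_offload()` are invented stand-ins, and only the shape of the condition follows the patch above. The protocol numbers 6 (TCP) and 17 (UDP) are the standard IP protocol values.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the skb->ip_summed states (values invented). */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };

/* What the (hypothetical) TX descriptor should ask the hardware to do. */
enum tx_offload { TX_NO_OFFLOAD, TX_TCP_CSUM, TX_UDP_CSUM };

/* Standard IP protocol numbers. */
#define PROTO_TCP 6
#define PROTO_UDP 17

/*
 * Patched rule: request a hardware L4 checksum only for IPv4 packets whose
 * checksum the stack left to the device (CHECKSUM_PARTIAL).  Any other
 * state means the device must leave the checksum field alone.
 */
enum tx_offload choose_tx_offload(bool is_ipv4, enum csum_state ip_summed,
				  int l4proto)
{
	if (!is_ipv4 || ip_summed != CSUM_PARTIAL)
		return TX_NO_OFFLOAD;

	switch (l4proto) {
	case PROTO_TCP:
		return TX_TCP_CSUM;
	case PROTO_UDP:
		return TX_UDP_CSUM;
	default:
		return TX_NO_OFFLOAD;
	}
}
```

Before the patch, the driver entered the `switch` for every IPv4 packet regardless of `ip_summed`, so packets whose checksum the stack had already finalized (the patch title points at IP fragments) could still get hardware checksum flags.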



Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-03 Thread Baruch Even
* David Miller <[EMAIL PROTECTED]> [070303 08:22]:
> BTW, I think I figured out a way to get rid of
> lost_{skb,cnt}_hint.  The fact of the matter in this case is that
> the setting of the tag bits always propagates from front of the queue
> onward.  We don't get holes mid-way.
> 
> So what we can do is search the RB-tree for high_seq and walk
> backwards.  Once we hit something with TCPCB_TAGBITS set, we
> stop processing as there are no earlier SKBs which we'd need
> to do anything with.
> 
> Do you see any problems with that idea?

I think this will be a fairly long walk initially.

You can try an augmented walk. If you are on a node which is tagged,
anything on the right side will be tagged as well since it is smaller,
so you need to go left. This way you can find the first non-tagged item
in O(log n).
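The O(log n) search works because tagging is a prefix property: every packet before the first untagged one is tagged, as David notes above. Below is a minimal sketch using a plain binary search tree as a stand-in for the kernel's RB-tree (the O(log n) bound assumes the tree is balanced). Which way to descend depends on which side holds the smaller sequence numbers; this sketch assumes the common convention of smaller keys on the left, so a tagged node sends the search right. `struct qnode`, `first_untagged()` and `build_balanced()` are hypothetical names, and the tree is assumed to be ordered by plain sequence number, so wraparound is ignored here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an skb in the retransmit queue (hypothetical). */
struct qnode {
	unsigned seq;		/* start sequence number, used as the key */
	bool tagged;		/* stands in for (sacked & TCPCB_TAGBITS) != 0 */
	struct qnode *left;	/* smaller sequence numbers */
	struct qnode *right;	/* larger sequence numbers */
};

/*
 * Return the untagged node with the smallest seq, or NULL if all are
 * tagged.  Correct only because tagged nodes form a prefix in seq order.
 */
struct qnode *first_untagged(struct qnode *root)
{
	struct qnode *best = NULL;

	while (root) {
		if (root->tagged) {
			root = root->right;	/* boundary lies at larger seq */
		} else {
			best = root;		/* candidate; look for an earlier one */
			root = root->left;
		}
	}
	return best;
}

/* Link nodes[lo..hi), already sorted by seq, into a balanced tree. */
struct qnode *build_balanced(struct qnode *nodes, int lo, int hi)
{
	if (lo >= hi)
		return NULL;
	int mid = lo + (hi - lo) / 2;
	nodes[mid].left = build_balanced(nodes, lo, mid);
	nodes[mid].right = build_balanced(nodes, mid + 1, hi);
	return &nodes[mid];
}
```

Each step discards one subtree, so the cost is the tree height rather than the number of untagged packets a backwards walk from high_seq would visit.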

A bug in this logic is that sequence numbers can and do wrap around.
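The usual remedy for wraparound is to compare sequence numbers modulo 2^32: `a` comes before `b` when the signed 32-bit value of `a - b` is negative. The kernel's `before()`/`after()` helpers in `include/net/tcp.h` are built on this idea; the functions below are a user-space re-implementation with our own names. The comparison is only meaningful while the two numbers are less than 2^31 apart, so a tree ordered this way stays consistent only if all of its keys fit in such a window.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a comes before b, modulo 2^32. */
bool seq_before(uint32_t a, uint32_t b)
{
	/*
	 * Unsigned subtraction wraps; reading the result as signed gives the
	 * shorter distance around the sequence space.  Converting an
	 * out-of-range value to int32_t is implementation-defined before C23,
	 * but GCC and Clang reduce it modulo 2^32, which is what we want.
	 */
	return (int32_t)(a - b) < 0;
}

/* True if sequence number a comes after b, modulo 2^32. */
bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}
```

With a plain `<`, 0xfffffff0 would compare as later than 0x10 even though it was sent first.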

If you are willing to change the logic of the tree, you can remove any
sacked element from it. Many of the operations are really only
interested in the non-sacked skbs. This would be similar to my patches
with the non-sacked list, but I still needed the hints since the number
of lost packets could still be large and some operations (e.g.
retransmit) need to get to the end of the list.


> scoreboard_skb_hint is a little bit trickier, but it is a similar
> case to the tcp_lost_skb_hint case.  Except here the termination
> condition is a relative timeout instead of a sequence number and
> packet count test.
> 
> Perhaps for that we can remember some state from the
> tcp_mark_head_lost() we do first.  In fact, we can start
> the queue walk from the latest packet which tcp_mark_head_lost()
> marked with a tag bit.
> 
> Basically these two algorithms are saying:
> 
> 1) Mark up to smallest of 'lost' or tp->high_seq.
> 2) Mark packets after those processed in #1 which have
>    timed out.
> 
> Right?

Yes, this makes sense; the two algorithms start from the same place. I'd
even consider merging them into a single walk, unless we know that one
usually happens without the other.
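A single walk doing both of David's steps might look like the following. This is a simplified model, not tcp_input.c: `struct pkt`, its fields, the `budget` count (standing in for 'lost'), the timestamp units, and the rule that the timeout step stops at the first packet that has not timed out are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified retransmit-queue entry (oldest first in q[]). */
struct pkt {
	uint32_t end_seq;	/* sequence number just past this packet */
	uint32_t sent_at;	/* send timestamp, arbitrary units */
	bool lost;		/* stands in for the TCPCB_LOST tag bit */
};

/* a is at or before b, modulo 2^32 (signed-difference comparison). */
static bool seq_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/*
 * One pass over q[0..n) doing both jobs with a single cursor:
 *   1) mark at most 'budget' packets, none ending beyond high_seq;
 *   2) from where step 1 stopped, keep marking packets older than
 *      'timeout', stopping at the first one that is not.
 */
size_t mark_lost_single_walk(struct pkt *q, size_t n, size_t budget,
			     uint32_t high_seq, uint32_t now, uint32_t timeout)
{
	size_t i = 0;

	/* Step 1: bounded by packet count and by high_seq. */
	for (; i < n && budget > 0 && seq_before_eq(q[i].end_seq, high_seq);
	     i++, budget--)
		q[i].lost = true;

	/* Step 2: same cursor, timeout rule.  Unsigned subtraction keeps
	 * the age correct across timestamp wrap. */
	for (; i < n && now - q[i].sent_at > timeout; i++)
		q[i].lost = true;

	return i;	/* where the walk stopped: usable as a hint next time */
}
```

Returning the stop position lets a caller keep it as a hint, in the spirit of the existing *_hint fields, instead of starting from the head on every call.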

There is another case like that for tcp_xmit_retrans, where the forward
transmission should only start at the position where the retransmit
finished. I had that in my old patches and it improved performance at
the time.
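Starting the forward pass where the retransmit pass stopped can be sketched as two loops sharing one cursor. The old patches mentioned above are not shown in this thread, so this is only an illustration: `struct qent`, `xmit_retransmit()` and the `budget` (think of it as free congestion-window slots) are hypothetical, and the sketch assumes lost marks form a prefix of the unsacked packets, so the first pass can stop at the first unsacked packet that is not marked lost.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified retransmit-queue entry (oldest first). */
struct qent {
	bool sacked;	/* receiver already has this data */
	bool lost;	/* marked lost: retransmit in the first pass */
	bool resent;	/* set when this sketch retransmits it */
};

/*
 * Retransmit up to 'budget' packets.  Pass 1 resends lost packets from the
 * head; pass 2 ("forward" retransmission of other unsacked packets) resumes
 * from pass 1's cursor instead of rescanning the queue from the head.
 * Returns how many packets were resent.
 */
size_t xmit_retransmit(struct qent *q, size_t n, size_t budget)
{
	size_t i = 0, sent = 0;

	/* Pass 1: the first unsacked, unmarked packet ends this pass. */
	for (; i < n && sent < budget; i++) {
		if (q[i].sacked)
			continue;
		if (!q[i].lost)
			break;		/* cursor i is where pass 2 starts */
		q[i].resent = true;
		sent++;
	}

	/* Pass 2: forward retransmission from the saved position. */
	for (; i < n && sent < budget; i++) {
		if (!q[i].sacked && !q[i].resent) {
			q[i].resent = true;
			sent++;
		}
	}
	return sent;
}
```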

Baruch