s390 allmodconfig

2007-03-02 Thread Andrew Morton

Not sure who to blame for all of this...

net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol 'BT_HIDP' refer to undefined symbol 'HID'
net/mac80211/Kconfig:17:warning: 'select' used by config symbol 'MAC80211_LEDS' refer to undefined symbol 'NEW_LEDS'
net/mac80211/Kconfig:18:warning: 'select' used by config symbol 'MAC80211_LEDS' refer to undefined symbol 'LEDS_TRIGGERS'
drivers/net/Kconfig:1435:warning: 'select' used by config symbol 'B44' refer to undefined symbol 'SSB'
drivers/net/wireless/bcm43xx/Kconfig:5:warning: 'select' used by config symbol 'BCM43XX' refer to undefined symbol 'HW_RANDOM'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:13:warning: 'select' used by config symbol 'BCM43XX_MAC80211_PCI' refer to undefined symbol 'SSB_PCIHOST'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:14:warning: 'select' used by config symbol 'BCM43XX_MAC80211_PCI' refer to undefined symbol 'SSB_DRIVER_PCICORE'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:27:warning: 'select' used by config symbol 'BCM43XX_MAC80211_PCMCIA' refer to undefined symbol 'SSB_PCMCIAHOST'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:5:warning: 'select' used by config symbol 'BCM43XX_MAC80211' refer to undefined symbol 'SSB'
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
net/mac80211/ieee80211_led.c: In function 'ieee80211_led_init':
net/mac80211/ieee80211_led.c:38: error: invalid application of 'sizeof' to incomplete type 'struct led_trigger'
net/mac80211/ieee80211_led.c:43: error: dereferencing pointer to incomplete type
net/mac80211/ieee80211_led.c:44: warning: implicit declaration of function 'led_trigger_register'
net/mac80211/ieee80211_led.c:49: error: invalid application of 'sizeof' to incomplete type 'struct led_trigger'
net/mac80211/ieee80211_led.c:54: error: dereferencing pointer to incomplete type
net/mac80211/ieee80211_led.c: In function 'ieee80211_led_exit':
net/mac80211/ieee80211_led.c:64: warning: implicit declaration of function 'led_trigger_unregister'

akpm2:/usr/src/25> grep LED .config
CONFIG_NF_CONNTRACK_ENABLED=m
CONFIG_MAC80211_LEDS=y

Probably related to the Kconfig problems.


Re: Extensible hashing and RCU

2007-03-02 Thread Evgeniy Polyakov
On Sat, Feb 17, 2007 at 04:13:02PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
> > >I noticed in an LCA talk mention that apprently extensible hashing
> > >with RCU access is an unsolved problem.  Here's an idea for solving it.
> > >
> > 
> > Yes, I have been playing around with the same idea for
> > doing dynamic resizing of the TCP hashtable.
> > 
> > Did a prototype "toy" implementation, and I have a
> > "half-finished" patch which resizes the TCP hashtable
> > at runtime. Hmmm, your mail may be the impetus to get
> > me to finally finish this thing
> 
> Why anyone do not want to use trie - for socket-like loads it has
> exactly constant search/insert/delete time and scales as hell.

Ok, I've run an analysis of linked-list and trie traversals and found
that (at least on x86) one optimized list traversal step is about 4 (!)
times faster than one bit lookup in a trie traversal (or actually one
lookup in any binary-tree-like structure). That is because a trie
traversal needs more instructions per lookup, and at least one
additional branch which cannot be predicted.

Tests with rdtsc show that one bit lookup in a trie (actually any
lookup in binary tree structures) is about 3-4 times slower than one
lookup in a linked list.

Since a hash table usually has up to 4 elements in each hash entry,
a competing binary tree/trie structure must get to an entry in one lookup,
which is essentially impossible with the usual tree/trie implementations.

Things change dramatically when the linked list becomes too long, but that
should not happen with proper resizing of the hash table. A wildcard
implementation also introduces additional requirements, which cannot be
easily met in hash tables.

So I take back my words about a tree/trie implementation instead of a
hash table for socket lookup.

Interested reader can find more details on tests, asm outputs and
conclusions at:
http://tservice.net.ru/~s0mbre/blog/2007/03/01#2007_03_01
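
For readers following the comparison, the two per-step operations being timed can be sketched roughly as below. This is only an illustration and not the code used in the tests above; `struct chain_node`, `struct trie_node`, `chain_lookup` and `trie_lookup` are hypothetical names, and the trie is assumed to store a key in a leaf as soon as a subtree holds a single entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hash-chain element: one pointer chase and one compare per step. */
struct chain_node {
    struct chain_node *next;
    uint32_t key;
};

/* Hypothetical binary trie node: one bit test plus one pointer chase per step. */
struct trie_node {
    struct trie_node *child[2];
    uint32_t key;       /* meaningful in leaves only */
    int is_leaf;
};

/* Walk a single hash chain until the key is found or the chain ends. */
static struct chain_node *chain_lookup(struct chain_node *head, uint32_t key)
{
    for (; head != NULL; head = head->next)
        if (head->key == key)
            return head;
    return NULL;
}

/* Descend from the most significant bit; stop at the first leaf or dead end. */
static struct trie_node *trie_lookup(struct trie_node *root, uint32_t key)
{
    struct trie_node *n = root;
    int bit;

    for (bit = 31; n != NULL && !n->is_leaf && bit >= 0; --bit)
        n = n->child[(key >> bit) & 1];
    return (n != NULL && n->is_leaf && n->key == key) ? n : NULL;
}
```

With chains of at most 4 elements, `chain_lookup` runs at most 4 steps, while `trie_lookup` may need up to 32 steps for a 32-bit key unless leaves sit high in the tree.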

-- 
Evgeniy Polyakov


Re: need some help on a backport of r8169

2007-03-02 Thread Pascal GREGIS
[EMAIL PROTECTED] wrote, on Thu 01 Mar 2007 at 10:57:11AM:
> Hello Ueimor,
> [...] 
> > Once you have logged the ifconfig/ethtool dump, you can try the serie
> > or the patch at:
> > 
> > http://www.fr.zoreil.com/people/francois/backport/r8169/20070228-00
> Hum... ok I might have enough time to check it, not sure though, I
> have a point with my boss this morning.
Indeed, I wasn't able to test it yesterday, and I won't be able to today
either, since the hardware is needed for other tests. But don't worry, I
haven't forgotten you; I'll test it as soon as I can, probably next week.

> 
> > 
> > Btw:
> > 
> > [...dmesg dump...]
> > > Enabling fast FPU save and restore... done.
> > > Enabling unmasked SIMD FPU exception support... done.
> > > Checking 'hlt' instruction... OK.
> > > ACPI: setting ELCR to 0200 (from 0c08)
> > > NET: Registered protocol family 16
> > > PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
> > > PCI: Using MMCONFIG
> > 
> > Please disable MMCONFIG.
> In the BIOS?
> 
> > 
> > If you have any PCI latency option in your bios, set it to 64.
> I'm not the BIOS-master, I'll suggest it.
> 
> > 
> > -- 
> > Ueimor
> 
> Sigerg


Re: [RFC] Arp announce (for Xen)

2007-03-02 Thread Pekka Savola

On Thu, 1 Mar 2007, Stephen Hemminger wrote:

What about implementing the unused arp_announce flag on the inetdevice?
Something like the following.  Totally untested...

Looks like it either was there (and got removed) or was planned but
never implemented.


If something like this goes in, it wouldn't hurt to do similar with 
IPv6 (RFC2461 section 7.2.6).


There are very popular hardware-based routers which refresh their NDP 
caches only every 24 hours or 20 minutes (depending on the software 
version).  Sending unsolicited NAs would eliminate traffic 
blackholing.



diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e10794d..cefc339 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1089,6 +1089,16 @@ static int inetdev_event(struct notifier
 			}
 		}
 		ip_mc_up(in_dev);
+		/* fallthru */
+
+	case NETDEV_CHANGEADDR:
+		/* Send gratuitous ARP in case of address change or new device */
+		if (IN_DEV_ARP_ANNOUNCE(in_dev))
+			arp_send(ARPOP_REQUEST, ETH_P_ARP,
+				 in_dev->ifa_list->ifa_address, dev,
+				 in_dev->ifa_list->ifa_address, NULL,
+				 dev->dev_addr, NULL);
+
 		break;
 	case NETDEV_DOWN:
 		ip_mc_down(in_dev);




--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings


Re: Extensible hashing and RCU

2007-03-02 Thread Eric Dumazet
On Friday 02 March 2007 09:52, Evgeniy Polyakov wrote:

> Ok, I've ran an analysis of linked lists and trie traversals and found
> that (at least on x86) optimized one list traversal is about 4 (!)
> times faster than one bit lookup in trie traversal (or actually one
> lookup in binary tree-like structure) - that is because of the fact
> that trie traversal needs to have more instructions per lookup, and at
> least one additional branch which can not be predicted.
>
> Tests with rdtsc shows that one bit lookup in trie (actually it is any
> lookup in binary tree structures) is about 3-4 times slower than one
> lookup in linked list.
>
> Since hash table usually has upto 4 elements in each hash entry,
> competing binary tree/trie stucture must get an entry in one lookup,
> which is essentially impossible with usual tree/trie implementations.
>
> Things dramatically change when linked list became too long, but it
> should not happend with proper resizing of the hash table, wildcards
> implementation also introduce additional requirements, which can not be
> easily solved in hash tables.
>
> So I get my words about tree/trie implementation instead of hash table
> for socket lookup back.
>
> Interested reader can find more details on tests, asm outputs and
> conclusions at:
> http://tservice.net.ru/~s0mbre/blog/2007/03/01#2007_03_01

Thank you for this report. (It still avoids studying cache misses, while they
obviously are the limiting factor.)

Anyway, if the data is in cache and you want optimum performance from your CPU,
you may try an algorithm without conditional branches
(well, 4 in this case for the whole 32-bit test):

gcc -O2 -S -march=i686 test1.c

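struct node {
	struct node *left;
	struct node *right;
	int value;
	};
struct node *head;
int v1;

#define PASS2(bit) \
	n2 = n1->left; \
	right = n1->right; \
	if (value & (1<<bit)) \
		n2 = right; \
	n1 = n2->left; \
	right = n2->right; \
	if (value & (2<<bit)) \
		n1 = right;

main()
{
int j;
unsigned int value = v1;
struct node *n1 = head, *n2, *right;
for (j=0; j<4; ++j) {
	PASS2(0)
	PASS2(2)
	PASS2(4)
	PASS2(6)
	value >>= 8;
	}
printf("result=%p\n", n1);
}
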
	.file	"test1.c"
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"result=%p\n"
	.text
	.p2align 4,,15
.globl main
	.type	main, @function
main:
	leal	4(%esp), %ecx
	andl	$-16, %esp
	pushl	-4(%ecx)
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	xorl	%ebx, %ebx
	pushl	%ecx
	subl	$16, %esp
	movl	v1, %ecx
	movl	head, %edx
	.p2align 4,,7
.L2:
	movl	4(%edx), %eax
	testb	$1, %cl
	cmove	(%edx), %eax
	testb	$2, %cl
	movl	4(%eax), %edx
	cmove	(%eax), %edx
	testb	$4, %cl
	movl	4(%edx), %eax
	cmove	(%edx), %eax
	testb	$8, %cl
	movl	4(%eax), %edx
	cmove	(%eax), %edx
	testb	$16, %cl
	movl	4(%edx), %eax
	cmove	(%edx), %eax
	testb	$32, %cl
	movl	4(%eax), %edx
	cmove	(%eax), %edx
	testb	$64, %cl
	movl	4(%edx), %eax
	cmove	(%edx), %eax
	testb	%cl, %cl
	movl	4(%eax), %edx
	cmovns	(%eax), %edx
	addl	$1, %ebx
	cmpl	$4, %ebx
	je	.L19
	shrl	$8, %ecx
	jmp	.L2
	.p2align 4,,7
.L19:
	movl	%edx, 4(%esp)
	movl	$.LC0, (%esp)
	call	printf
	addl	$16, %esp
	popl	%ecx
	popl	%ebx
	popl	%ebp
	leal	-4(%ecx), %esp
	ret
	.size	main, .-main
	.comm	head,4,4
	.comm	v1,4,4
	.ident	"GCC: (GNU) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)"
	.section	.note.GNU-stack,"",@progbits
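
The same branch-avoiding idea can also be expressed without depending on the compiler choosing cmov, by indexing a two-element child array with the key bit. The following is a minimal sketch under that assumption; `struct bnode` and `walk_bits` are hypothetical and use a different node layout from test1.c above.

```c
#include <stddef.h>

/* Hypothetical node layout: child[0] plays the role of left, child[1] of right. */
struct bnode {
    struct bnode *child[2];
    int value;
};

/* Walk nbits levels, consuming the key from the least significant bit,
 * which is the same order the PASS2 macro uses. Every node on the path
 * must have both children populated for the bits visited (a complete
 * tree or a sentinel node), otherwise a NULL pointer is dereferenced.
 */
static struct bnode *walk_bits(struct bnode *n, unsigned int key, int nbits)
{
    int i;

    for (i = 0; i < nbits; ++i) {
        n = n->child[key & 1];  /* selection by indexing, not by if/else */
        key >>= 1;
    }
    return n;
}
```

The only conditional branch left in the source is the loop counter, which is well predicted; whether the generated code is actually faster would have to be measured.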


Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-03-02 Thread Eric Dumazet
On Friday 02 March 2007 10:26, John wrote:
> Eric Dumazet wrote:

> > Anyway, if you want to play, you can apply this patch on top of
> > linux-2.6.21-rc2  (nanosecond resolution infrastructure needs 2.6.21)
> > I let you do the adjustments for rt kernel.
>
> Why does it require 2.6.21?

Well, this patch was done on top of the latest kernel for obvious practical
reasons, but you can probably adapt it to the kernel of your choice.



Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-03-02 Thread John

Eric Dumazet wrote:

> John wrote:
>
> > Consider an idle Linux 2.6.20-rt8 system, equipped with a single PCI-E
> > gigabit Ethernet NIC, running on a modern CPU (e.g. Core 2 Duo E6700).
> > All this system does is time stamp 1000 packets per second.
> >
> > Are you claiming that this platform *cannot* handle most packets within
> > less than 1 microsecond of their arrival?
>
> Yes I claim it. You expect too much of this platform, unless "most" means
> 10 % for you ;)

By "most" I meant more than 50%.

Has someone tried to measure interrupt latency in Linux? I'd like to
plot the distribution of network IRQ to interrupt handler latencies.

> If you replace "1 us" by "50 us", then yes, it probably can do it, if "most"
> means 99%, (not 99.999 %)

I think we need cold, hard numbers at this point :-)

> Anyway, if you want to play, you can apply this patch on top of
> linux-2.6.21-rc2  (nanosecond resolution infrastructure needs 2.6.21)
>
> I let you do the adjustments for rt kernel.

Why does it require 2.6.21?

> This patch converts sk_buff timestamp to use new nanosecond infra
> (added in 2.6.21)

Is this mentioned somewhere in the 2.6.21-rc1 ChangeLog?
http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.21-rc1

Regards.


Re: s390 allmodconfig

2007-03-02 Thread Richard Purdie
On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:
> net/mac80211/ieee80211_led.c: In function 'ieee80211_led_init':
> net/mac80211/ieee80211_led.c:38: error: invalid application of 'sizeof' to 
> incomplete type 'struct led_trigger' 
> net/mac80211/ieee80211_led.c:43: error: dereferencing pointer to incomplete 
> type
> net/mac80211/ieee80211_led.c:44: warning: implicit declaration of function 
> 'led_trigger_register'
> net/mac80211/ieee80211_led.c:49: error: invalid application of 'sizeof' to 
> incomplete type 'struct led_trigger' 
> net/mac80211/ieee80211_led.c:54: error: dereferencing pointer to incomplete 
> type
> net/mac80211/ieee80211_led.c: In function 'ieee80211_led_exit':
> net/mac80211/ieee80211_led.c:64: warning: implicit declaration of function 
> 'led_trigger_unregister'
> 
> akpm2:/usr/src/25> grep LED .config
> CONFIG_NF_CONNTRACK_ENABLED=m
> CONFIG_MAC80211_LEDS=y
> 
> Probably related to the Kconfig problems.

Almost certainly. Someone is building some LED trigger/driver without
the LED core enabled which is what that Kconfig warning was about.
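
The "incomplete type" errors are what one would expect if a header only forward-declares `struct led_trigger` unless the LED trigger option is enabled. Below is a minimal user-space sketch of that failure mode, assuming such a guard exists; the macro `CONFIG_LEDS_TRIGGERS_DEMO`, the struct `led_trigger_demo` and its field are illustrative names, not copied from the kernel header.

```c
#include <stddef.h>

/* Illustrative stand-in for the config symbol; comment this out to
 * reproduce the "incomplete type" error at the sizeof below.
 */
#define CONFIG_LEDS_TRIGGERS_DEMO 1

struct led_trigger_demo;            /* forward declaration, always visible */

#ifdef CONFIG_LEDS_TRIGGERS_DEMO
struct led_trigger_demo {           /* full definition only when enabled */
    const char *name;
};
#endif

/* sizeof needs the complete type, so this function only compiles when the
 * definition above is present.
 */
static size_t trigger_size(void)
{
    return sizeof(struct led_trigger_demo);
}
```

A driver option that is switched on while the option providing the definition is off ends up in exactly this situation, which matches CONFIG_MAC80211_LEDS=y appearing in the .config without any LED core symbols.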

Nobody's ever mentioned this driver to me...

Richard
(LED Maintainer)







Re: Extensible hashing and RCU

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 10:56:23AM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Friday 02 March 2007 09:52, Evgeniy Polyakov wrote:
> 
> > Ok, I've ran an analysis of linked lists and trie traversals and found
> > that (at least on x86) optimized one list traversal is about 4 (!)
> > times faster than one bit lookup in trie traversal (or actually one
> > lookup in binary tree-like structure) - that is because of the fact
> > that trie traversal needs to have more instructions per lookup, and at
> > least one additional branch which can not be predicted.
> >
> > Tests with rdtsc shows that one bit lookup in trie (actually it is any
> > lookup in binary tree structures) is about 3-4 times slower than one
> > lookup in linked list.
> >
> > Since hash table usually has upto 4 elements in each hash entry,
> > competing binary tree/trie stucture must get an entry in one lookup,
> > which is essentially impossible with usual tree/trie implementations.
> >
> > Things dramatically change when linked list became too long, but it
> > should not happend with proper resizing of the hash table, wildcards
> > implementation also introduce additional requirements, which can not be
> > easily solved in hash tables.
> >
> > So I get my words about tree/trie implementation instead of hash table
> > for socket lookup back.
> >
> > Interested reader can find more details on tests, asm outputs and
> > conclusions at:
> > http://tservice.net.ru/~s0mbre/blog/2007/03/01#2007_03_01
> 
> Thank you for this report. (Still avoiding cache misses studies, while they 
> obviously are the limiting factor)
> 
> Anyqay, if data is in cache and you want optimum performance from your cpu,
> you may try to use an algorithm without conditional branches :
> (well 4 in this case for the whole 32 bits tests)

Tests were always for the no-cache-miss case.
I also ran them in kernel mode (to eliminate TLB flushes on rescheduling
and to take into account that the kernel TLB covers 8 MB while userspace
covers only 4 KB), but the results were essentially the same (within a few
percent). I only tested the trie; in my implementation its memory usage is
smaller than that of a hash table for 2^20 entries.

> gcc -O2 -S -march=i686 test1.c
> 

> struct node {
>   struct node *left;
>   struct node *right;
>   int value;
>   };
> struct node *head;
> int v1;
> 
> #define PASS2(bit) \
>   n2 = n1->left; \
>   right = n1->right; \
>   if (value & (1<<bit)) \
>     n2 = right; \
>   n1 = n2->left; \
>   right = n2->right; \
>   if (value & (2<<bit)) \
>     n1 = right;
> 
> main()
> {
> int j;
> unsigned int value = v1;
> struct node *n1 = head, *n2, *right;
> for (j=0; j<4; ++j) {
>   PASS2(0)
>   PASS2(2)
>   PASS2(4)
>   PASS2(6)
>   value >>= 8;
>   }
> printf("result=%p\n", n1);
> }

This one resulted in 10*4 instructions and 2*4 branches per loop iteration,
so 32 branches in total (instead of 64 in the simpler code) and 160
instructions (instead of 128 in the simpler code).
Given that, according to the tests, a branch takes twice as long to execute
(a rather strange statement, I must admit; I have not read the x86 processor
manual at all, only the ppc32 one), we do not get any gain for a 32-bit
value (32 lookups): 64*2+128 in the old case, 32*2+160 in the new one.

I also have an advanced trie implementation, which caches values in nodes
if there are no child entries, and it _greatly_ decreases the number of
lookups and memory usage for smaller sets. But in the long run, with a huge
number of entries in the trie, it does not matter, since only the
lowest layer caches values.

-- 
Evgeniy Polyakov


Re: s390 allmodconfig

2007-03-02 Thread Johannes Berg
On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:

> Probably related to the Kconfig problems.

Yeah, it is. s390 is funny: it doesn't include drivers/Kconfig. I don't
think any of us would have suspected that.

There doesn't seem to be a reason why it shouldn't have drivers/leds
though. drivers/ssb I don't know about, does s390 have pci or pcmcia?
And the bluetooth stuff is also plain weird, I suppose s390 really
should include drivers/hid/Kconfig :)

Same with drivers/char that includes hw_random.

Is there any reason it isn't including drivers/Kconfig? 


I can offer the patch below to fix the LED trigger problem; it's probably
cleaner to depend on LEDS_TRIGGERS rather than selecting it and
NEW_LEDS.

johannes

--- wireless-dev.orig/net/mac80211/Kconfig	2007-03-02 11:18:45.464333268 +0100
+++ wireless-dev/net/mac80211/Kconfig	2007-03-02 11:33:24.534333268 +0100
@@ -13,9 +13,7 @@ config MAC80211
 
 config MAC80211_LEDS
 	bool "Enable LED triggers"
-	depends on MAC80211
-	select NEW_LEDS
-	select LEDS_TRIGGERS
+	depends on MAC80211 && LEDS_TRIGGERS
 	---help---
 	  This option enables a few LED triggers for different
 	  packet receive/transmit events.




Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
On Fri, 02 Mar 2007 10:32:32 + Richard Purdie <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:
> > net/mac80211/ieee80211_led.c: In function 'ieee80211_led_init':
> > net/mac80211/ieee80211_led.c:38: error: invalid application of 'sizeof' to 
> > incomplete type 'struct led_trigger' 
> > net/mac80211/ieee80211_led.c:43: error: dereferencing pointer to incomplete 
> > type
> > net/mac80211/ieee80211_led.c:44: warning: implicit declaration of function 
> > 'led_trigger_register'
> > net/mac80211/ieee80211_led.c:49: error: invalid application of 'sizeof' to 
> > incomplete type 'struct led_trigger' 
> > net/mac80211/ieee80211_led.c:54: error: dereferencing pointer to incomplete 
> > type
> > net/mac80211/ieee80211_led.c: In function 'ieee80211_led_exit':
> > net/mac80211/ieee80211_led.c:64: warning: implicit declaration of function 
> > 'led_trigger_unregister'
> > 
> > akpm2:/usr/src/25> grep LED .config
> > CONFIG_NF_CONNTRACK_ENABLED=m
> > CONFIG_MAC80211_LEDS=y
> > 
> > Probably related to the Kconfig problems.
> 
> Almost certainly. Someone is building some LED trigger/driver without
> the LED core enabled which is what that Kconfig warning was about.
> 
> Nobody's ever mentioned this driver to me...
> 

It's a mountain of new wireless code in the just-released 2.6.21-rc2-mm1.



Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
On Fri, 02 Mar 2007 11:38:24 +0100 Johannes Berg <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:
> 
> > Probably related to the Kconfig problems.
> 
> Yeah, it is. s390 is funny, it doesn't include drivers/Kconfig, I don't
> think anybody of us would have suspected that.
> 
> There doesn't seem to be a reason why it shouldn't have drivers/leds
> though. drivers/ssb I don't know about, does s390 have pci or pcmcia?

No, s390 doesn't have PCI.

> And the bluetooth stuff is also plain weird, I suppose s390 really
> should include drivers/hid/Kconfig :)
> 
> Same with drivers/char that includes hw_random.
> 
> Is there any reason it isn't including drivers/Kconfig? 
> 

s390 is weird ;)   There's no way it'll support any of the hardware which you're
working on (until they release the s390 laptop).  So all we really want to
do here is to avoid breaking s390 allmodconfig.
 
> I can offer below patch to fix the LED trigger problem, it's probably
> cleaner to depend on LEDS_TRIGGERS rather than selecting it and
> NEW_LEDS.
> 
> johannes
> 
> --- wireless-dev.orig/net/mac80211/Kconfig2007-03-02 11:18:45.464333268 
> +0100
> +++ wireless-dev/net/mac80211/Kconfig 2007-03-02 11:33:24.534333268 +0100
> @@ -13,9 +13,7 @@ config MAC80211
>  
>  config MAC80211_LEDS
>   bool "Enable LED triggers"
> - depends on MAC80211
> - select NEW_LEDS
> - select LEDS_TRIGGERS
> + depends on MAC80211 && LEDS_TRIGGERS
>   ---help---
>   This option enables a few LED triggers for different
>   packet receive/transmit events.

OK, I'll try that, thanks.


Re: s390 allmodconfig

2007-03-02 Thread Johannes Berg
On Fri, 2007-03-02 at 03:06 -0800, Andrew Morton wrote:

> No, s390 doesn't have PCI.

Ok.

> s390 is weird ;)   There's no way it'll support any of the hardware which 
> you're
> working on (until they release the s390 laptop).  So all we really want to
> do here is to avoid breaking s390 allmodconfig.

Alright. I think we'll probably have to make bcm43xx and b44 depend on
SSB instead of selecting it like the LED trigger stuff below.

But I don't see why s390 can't include hw random, led trigger or even
hid, those are all software features afaict.
 

> OK, I'll try that, thanks.

Not that it'll actually help get the compile through... bcm43xx will
still fail, and bluetooth probably as well.

johannes




Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
On Fri, 02 Mar 2007 12:11:48 +0100 Johannes Berg <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-03-02 at 03:06 -0800, Andrew Morton wrote:
> 
> > No, s390 doesn't have PCI.
> 
> Ok.
> 
> > s390 is weird ;)   There's no way it'll support any of the hardware which 
> > you're
> > working on (until they release the s390 laptop).  So all we really want to
> > do here is to avoid breaking s390 allmodconfig.
> 
> Alright. I think we'll probably have to make bcm43xx and b44 depend on
> SSB instead of selecting it like the LED trigger stuff below.
> 
> But I don't see why s390 can't include hw random, led trigger or even
> hid, those are all software features afaict.
>  
> 
> > OK, I'll try that, thanks.
> 
> Not that it'll actually help get the compile through... bcm43xx will
> drop fail and bluetooth probably as well.
> 

OK, thanks.

fwiw, http://userweb.kernel.org/~akpm/cross-compilers/ has an s390
cross-compiler binary.



[PATCH] [TCP]: FRTO undo response falls back to ratehalving one if ECEd

2007-03-02 Thread Ilpo Järvinen
Undoing ssthresh is disabled in fastretrans_alert whenever
FLAG_ECE is set by clearing prior_ssthresh. This clearing does
not protect FRTO because FRTO operates before fastretrans_alert.
Moving the clearing of prior_ssthresh earlier seems to be a
suboptimal solution to the FRTO case because then FLAG_ECE will
cause a second ssthresh reduction in try_to_open (the first
occurred when FRTO was entered). So instead, FRTO falls back
immediately to the rate halving response, which switches TCP to
CA_CWR state preventing the latter reduction of ssthresh.

If the first ECE arrived before the ACK after which FRTO is able
to decide RTO as spurious, prior_ssthresh is already cleared.
Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
set also in the following ACKs resulting in rate halving response
that sees TCP already in CA_CWR, which again prevents an extra
ssthresh reduction on that round-trip.

If the first ECE arrived before RTO, ssthresh has already been
adapted and prior_ssthresh remains cleared on entry because TCP
is in CA_CWR (the same applies also to a case where FRTO is
entered more than once and ECE comes in the middle).

I believe that after this patch, FRTO should be ECN-safe and
even able to take advantage of synergy benefits.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_input.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index dc221a3..bdd6172 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2592,9 +2592,12 @@ static void tcp_ratehalving_spur_to_resp
 	tp->high_seq = tp->frto_highmark; 	/* Smoother w/o this? - ij */
 }
 
-static void tcp_undo_spur_to_response(struct sock *sk)
+static void tcp_undo_spur_to_response(struct sock *sk, int flag)
 {
-	tcp_undo_cwr(sk, 1);
+	if (flag&FLAG_ECE)
+		tcp_ratehalving_spur_to_response(sk);
+	else
+		tcp_undo_cwr(sk, 1);
 }
 
 /* F-RTO spurious RTO detection algorithm (RFC4138)
@@ -2680,7 +2683,7 @@ static int tcp_process_frto(struct sock 
 		return 1;
 	} else /* frto_counter == 2 */ {
 		switch (sysctl_tcp_frto_response) {
-		case 2: tcp_undo_spur_to_response(sk); break;
+		case 2: tcp_undo_spur_to_response(sk, flag); break;
 		case 1: tcp_conservative_spur_to_response(tp); break;
 		default: tcp_ratehalving_spur_to_response(sk); break;
 		}
-- 
1.4.2
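
The decision the patch introduces can be modelled in isolation as below. This is only a user-space sketch of the control flow; `DEMO_FLAG_ECE`, `enum demo_action` and `frto_response` are hypothetical names, and the flag value is a placeholder rather than the kernel's definition.

```c
/* Placeholder bit for "ECE seen on this ACK"; not the kernel's value. */
#define DEMO_FLAG_ECE 0x40

/* The three spurious-RTO responses selectable via the sysctl in the patch. */
enum demo_action { DEMO_UNDO_CWR, DEMO_CONSERVATIVE, DEMO_RATEHALVING };

/* Mirrors the switch in tcp_process_frto after the patch: response 2
 * normally undoes the window reduction, but falls back to rate halving
 * when the ACK carried ECE.
 */
static enum demo_action frto_response(int sysctl_response, int flag)
{
    switch (sysctl_response) {
    case 2:
        return (flag & DEMO_FLAG_ECE) ? DEMO_RATEHALVING : DEMO_UNDO_CWR;
    case 1:
        return DEMO_CONSERVATIVE;
    default:
        return DEMO_RATEHALVING;
    }
}
```

Only response 2 changes behaviour; responses 0 and 1 ignore the flag, as in the diff.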

[PATCH v2] [TCP]: FRTO undo response falls back to ratehalving one if ECEd

2007-03-02 Thread Ilpo Järvinen
Undoing ssthresh is disabled in fastretrans_alert whenever
FLAG_ECE is set by clearing prior_ssthresh. The clearing does
not protect FRTO because FRTO operates before fastretrans_alert.
Moving the clearing of prior_ssthresh earlier seems to be a
suboptimal solution to the FRTO case because then FLAG_ECE will
cause a second ssthresh reduction in try_to_open (the first
occurred when FRTO was entered). So instead, FRTO falls back
immediately to the rate halving response, which switches TCP to
CA_CWR state preventing the latter reduction of ssthresh.

If the first ECE arrived before the ACK after which FRTO is able
to decide RTO as spurious, prior_ssthresh is already cleared.
Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
set also in the following ACKs resulting in rate halving response
that sees TCP is already in CA_CWR, which again prevents an extra
ssthresh reduction on that round-trip.

If the first ECE arrived before RTO, ssthresh has already been
adapted and prior_ssthresh remains cleared on entry because TCP
is in CA_CWR (the same applies also to a case where FRTO is
entered more than once and ECE comes in the middle).

High_seq must not be touched after tcp_enter_cwr because CWR
round-trip calculation depends on it.

I believe that after this patch, FRTO should be ECN-safe and
even able to take advantage of synergy benefits.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---

Of course I forgot to fix also the high_seq thing I had in mind last 
evening, so here is this again now with it too.


diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index dc221a3..6b268dc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2587,14 +2587,15 @@ static void tcp_conservative_spur_to_res
  */
 static void tcp_ratehalving_spur_to_response(struct sock *sk)
 {
-	struct tcp_sock *tp = tcp_sk(sk);
 	tcp_enter_cwr(sk, 0);
-	tp->high_seq = tp->frto_highmark; 	/* Smoother w/o this? - ij */
 }
 
-static void tcp_undo_spur_to_response(struct sock *sk)
+static void tcp_undo_spur_to_response(struct sock *sk, int flag)
 {
-	tcp_undo_cwr(sk, 1);
+	if (flag&FLAG_ECE)
+		tcp_ratehalving_spur_to_response(sk);
+	else
+		tcp_undo_cwr(sk, 1);
 }
 
 /* F-RTO spurious RTO detection algorithm (RFC4138)
@@ -2680,7 +2681,7 @@ static int tcp_process_frto(struct sock 
 		return 1;
 	} else /* frto_counter == 2 */ {
 		switch (sysctl_tcp_frto_response) {
-		case 2: tcp_undo_spur_to_response(sk); break;
+		case 2: tcp_undo_spur_to_response(sk, flag); break;
 		case 1: tcp_conservative_spur_to_response(tp); break;
 		default: tcp_ratehalving_spur_to_response(sk); break;
 		}
-- 
1.4.2


Re: Network activity LED trigger

2007-03-02 Thread Florian Fainelli
Hi All,

Some more thoughts. The IDE activity LED trigger is currently fired when a
function is called in the IDE read/write routines.

In a similar way, we could call the trigger function from netif_receive_skb
and netif_rx in net/core/dev.c?

I was also thinking that some NICs already have LEDs, so for those models
it is not necessary to "overload" the user with lights everywhere.
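
To make the idea a bit more concrete, here is a rough user-space model of such a trigger: the receive path only bumps a counter, and something periodic decides whether the LED should blink. All names (`struct net_led_demo`, `net_led_demo_rx`, `net_led_demo_poll`) are hypothetical and this is not the kernel LED API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a network activity trigger. */
struct net_led_demo {
    bool enabled;           /* off for NICs that already have their own LEDs */
    unsigned long events;   /* packets seen since the last poll */
};

/* Would be called once per packet from the receive path. */
static void net_led_demo_rx(struct net_led_demo *t)
{
    if (t->enabled)
        t->events++;
}

/* Called periodically: reports whether the LED should blink, then resets. */
static bool net_led_demo_poll(struct net_led_demo *t)
{
    bool active = t->events != 0;

    t->events = 0;
    return active;
}
```

Keeping the per-packet work to a flag test and an increment is the point of the split; anything heavier in the receive path would probably be considered too intrusive.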

Regards, Florian

On Thursday 1 March 2007, Florian Fainelli wrote:
> Hi All,
>
> I have been talking a bit with Richard, who is the LED API maintainer, and
> a LED trigger based on network activity would be something great.
>
> There are somethings that concern the network stack :
>
> - should we specify if the network driver is allowed to contribute to
> the LED activity, just like it is done for random generation, at compile
> time
>
> - I would like to trigger the LED based on one or several network
> interfaces, maybe specify via sysfs which interface triggers which LED,
> and also maybe differentiate the layer-2 activity from the layer-3
> activity for instance
>
> - A led driver could by default be bound to a network driver, or an
> interface name
>
> As it could be very intrusive in the network stack, you might want to
> specify a bit more how you imagine a network activity trigger.
>
> Thanks


Re: Network activity LED trigger

2007-03-02 Thread jamal

Where are these LEDs typically located? Are you talking about LEDs on a
network card for example? can you light them up in different colors?

cheers,
jamal

On Fri, 2007-02-03 at 13:58 +0100, Florian Fainelli wrote:
> Hi All,
> 
> Some more thoughts. The IDE activity LED trigger is currently triggered when 
> a 
> function is called in the IDE writing/reading routines.
> 
> In a similar way, we could call the trigger function in net/core/dev.c in 
> netif_receive_skb and netif_rx ?
> 
> I was also thinking that some network NIC already have LEDs, so it is not 
> necessary for those models to "overload" the user with lights everywhere.
> 
> R



Re: Network activity LED trigger

2007-03-02 Thread Florian Fainelli
Hi,

On Friday 2 March 2007, jamal wrote:
> Where are these LEDs typically located? Are you talking about LEDs on a
> network card for example? can you light them up in different colors?

Those LEDs are typically driven by GPIO lines and are visible on the front of 
the device. This is mostly targeted at embedded devices, for which you do not 
necessarily want to assign an LED to a given network interface.

>
> cheers,
> jamal
>
> On Fri, 2007-02-03 at 13:58 +0100, Florian Fainelli wrote:
> > Hi All,
> >
> > Some more thoughts. The IDE activity LED trigger is currently triggered
> > when a function is called in the IDE writing/reading routines.
> >
> > In a similar way, we could call the trigger function in net/core/dev.c in
> > netif_receive_skb and netif_rx ?
> >
> > I was also thinking that some network NIC already have LEDs, so it is not
> > necessary for those models to "overload" the user with lights everywhere.
> >
> > R
>



-- 
Cordialement, Florian Fainelli
-
5, rue Charles Fourier
Chambre 1202
91011 Evry
http://www.alphacore.net
(+33) 01 60 76 64 21
(+33) 06 09 02 64 95
-
Association MiNET
http://www.minet.net
-
Institut National des Télécommunication
http://www.int-evry.fr/telecomint
-


[PATCH 2/2] tc35815 driver update (part 2)

2007-03-02 Thread Atsushi Nemoto
More updates for tc35815 driver, including:

* TX4939 support.
* NETPOLL support.
* NAPI support. (disabled by default)
* Reduce memcpy on receiving.
* PM support.
* Many cleanups and bugfixes.

Signed-off-by: Atsushi Nemoto <[EMAIL PROTECTED]>
---
 drivers/net/tc35815.c   |  827 +++---
 include/linux/pci_ids.h |1 
 2 files changed, 632 insertions(+), 196 deletions(-)

diff --git a/drivers/net/tc35815.c b/drivers/net/tc35815.c
index 0cf1f87..ec888db 100644
--- a/drivers/net/tc35815.c
+++ b/drivers/net/tc35815.c
@@ -38,9 +38,33 @@
  *	Add workaround for 100MHalf HUB.
  * 1.22	Minor fix.
  * 1.23	Minor cleanup.
+ * 1.24	Remove tc35815_setup since new stype option
+ *	("tc35815.speed=10", etc.) can be used for 2.6 kernel.
+ * 1.25	TX4939 support.
+ * 1.26	Minor cleanup.
+ * 1.27	Move TX4939 PCFG.SPEEDn control code out from this driver.
+ *	Cleanup init_dev_addr. (NETDEV_REGISTER event notifier
+ *	can overwrite dev_addr)
+ *	support ETHTOOL_GPERMADDR.
+ * 1.28	Minor cleanup.
+ * 1.29	support netpoll.
+ * 1.30	Minor cleanup.
+ * 1.31	NAPI support. (disabled by default)
+ *	Use DMA_RxAlign_2 if possible.
+ *	Do not use PackedBuffer.
+ *	Cleanup.
+ * 1.32	Fix free buffer management on non-PackedBuffer mode.
+ * 1.33	Fix netpoll build.
+ * 1.34	Fix netpoll locking.  "BH rule" for NAPI is not enough with
+ *	netpoll, hard_start_xmit might be called from irq context.
+ *	PM support.
  */
 
-#define DRV_VERSION	"1.23"
+#ifdef TC35815_NAPI
+#define DRV_VERSION	"1.34-NAPI"
+#else
+#define DRV_VERSION	"1.34"
+#endif
 static const char *version = "tc35815.c:v" DRV_VERSION "\n";
 #define MODNAME	"tc35815"
 
@@ -71,23 +95,27 @@ static const char *version = "tc35815.c:
 #define GATHER_TXINT   /* On-Demand Tx Interrupt */
 #define WORKAROUND_LOSTCAR
 #define WORKAROUND_100HALF_PROMISC
+/* #define TC35815_USE_PACKEDBUFFER */
 
 typedef enum {
TC35815CF = 0,
TC35815_NWU,
+   TC35815_TX4939,
 } board_t;
 
 /* indexed by board_t, above */
-static struct {
+static const struct {
const char *name;
 } board_info[] __devinitdata = {
{ "TOSHIBA TC35815CF 10/100BaseTX" },
{ "TOSHIBA TC35815 with Wake on LAN" },
+   { "TOSHIBA TC35815/TX4939" },
 };
 
-static struct pci_device_id tc35815_pci_tbl[] = {
-   {PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815CF, PCI_ANY_ID, PCI_ANY_ID, 0, 0, TC35815CF },
-   {PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815_NWU, PCI_ANY_ID, PCI_ANY_ID, 0, 0, TC35815_NWU },
+static const struct pci_device_id tc35815_pci_tbl[] = {
+   {PCI_DEVICE(PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815CF), .driver_data = TC35815CF },
+   {PCI_DEVICE(PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815_NWU), .driver_data = TC35815_NWU },
+   {PCI_DEVICE(PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815_TX4939), .driver_data = TC35815_TX4939 },
{0,}
 };
 MODULE_DEVICE_TABLE (pci, tc35815_pci_tbl);
@@ -140,6 +168,11 @@ struct tc35815_regs {
  * Bit assignments
  */
 /* DMA_Ctl bit asign --- */
+#define DMA_RxAlign	0x00c0 /* 1:Reception Alignment   */
+#define DMA_RxAlign_1	0x0040
+#define DMA_RxAlign_2	0x0080
+#define DMA_RxAlign_3	0x00c0
+#define DMA_M66EnStat	0x0008 /* 1:66MHz Enable State*/
 #define DMA_IntMask	0x0004 /* 1:Interupt mask */
 #define DMA_SWIntReq	0x0002 /* 1:Software Interrupt request*/
 #define DMA_TxWakeUp	0x0001 /* 1:Transmit Wake Up  */
@@ -351,6 +384,8 @@ struct BDesc {
Int_SSysErrEn  | Int_RMasAbtEn | Int_RTargAbtEn | \
Int_STargAbtEn | \
Int_BLExEn  | Int_FDAExEn) /* maybe 0xb7f*/
+#define DMA_CTL_CMD	DMA_BURST_SIZE
+#define HAVE_DMA_RXALIGN(lp)   likely((lp)->boardtype != TC35815CF)
 
 /* Tuning parameters */
 #define DMA_BURST_SIZE 32
@@ -358,12 +393,28 @@ struct BDesc {
 #define TX_THRESHOLD_MAX 1536   /* used threshold with packet max byte for 
low pci transfer ability.*/
 #define TX_THRESHOLD_KEEP_LIMIT 10  /* setting threshold max value when 
overrun error occured this count. */
 
+/* 16 + RX_BUF_NUM * 8 + RX_FD_NUM * 16 + TX_FD_NUM * 32 <= 
PAGE_SIZE*FD_PAGE_NUM */
+#ifdef TC35815_USE_PACKEDBUFFER
 #define FD_PAGE_NUM 2
-#define FD_PAGE_ORDER 1
-/* 16 + RX_BUF_PAGES * 8 + RX_FD_NUM * 16 + TX_FD_NUM * 32 <= PAGE_SIZE*2 */
-#define RX_BUF_PAGES   8   /* >= 2 */
+#define RX_BUF_NUM 8   /* >= 2 */
 #define RX_FD_NUM  250 /* >= 32 */
 #define TX_FD_NUM  128
+#define RX_BUF_SIZE	PAGE_SIZE
+#else /* TC35815_USE_PACKEDBUFFER */
+#define FD_PA

[PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Eric Dumazet
We currently use a special structure (struct skb_timeval) and plain 'struct 
timeval' to store packet timestamps in sk_buffs and struct sock.

This has some drawbacks:
- Resolution is fixed at one microsecond.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution time 
services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits an 8-byte 
shrink of this structure on 64bit architectures. Some other structures also 
benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct 
frag_queue in ipv6/reassembly.c, ...)


Once this ktime infrastructure is adopted, we can more easily provide nanosecond 
resolution on top of it. (ioctl SIOCGSTAMPNS and/or 
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note : this patch includes a bug correction in compat_sock_get_timestamp() 
where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 
0)
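
To illustrate the size and resolution argument, here is a minimal user-space 
sketch, not kernel code. It assumes a timestamp stored as a single signed 
64-bit count of nanoseconds; the names model_ktime_t, model_ktime_set and 
model_ktime_to_timeval are hypothetical stand-ins, and the real ktime_t is an 
opaque kernel type whose layout may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for ktime_t: nanoseconds in one 64-bit scalar. */
typedef int64_t model_ktime_t;

#define MODEL_NSEC_PER_SEC  1000000000LL
#define MODEL_NSEC_PER_USEC 1000LL

/* Build a model timestamp from whole seconds plus nanoseconds. */
model_ktime_t model_ktime_set(int64_t secs, long nsecs)
{
	assert(nsecs >= 0 && nsecs < MODEL_NSEC_PER_SEC);
	return secs * MODEL_NSEC_PER_SEC + nsecs;
}

/*
 * Convert to struct timeval, truncating to microseconds.
 * Only non-negative timestamps are handled in this sketch.
 */
struct timeval model_ktime_to_timeval(model_ktime_t kt)
{
	struct timeval tv;

	assert(kt >= 0);
	tv.tv_sec = (time_t)(kt / MODEL_NSEC_PER_SEC);
	tv.tv_usec = (suseconds_t)((kt % MODEL_NSEC_PER_SEC) / MODEL_NSEC_PER_USEC);
	return tv;
}
```

The 8-byte scalar keeps the nanosecond part internally; the conversion to 
struct timeval is where the sub-microsecond digits are dropped.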

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
CC: Stephen Hemminger <[EMAIL PROTECTED]>
CC: John find <[EMAIL PROTECTED]>

 include/linux/skbuff.h  |   26 --
 include/net/sock.h  |   18 +++
 net/bridge/netfilter/ebt_ulog.c |6 +++--
 net/compat.c|   15 
 net/core/dev.c  |   19 +++-
 net/core/sock.c |   16 +++--
 net/econet/af_econet.c  |2 -
 net/ipv4/ip_fragment.c  |6 ++---
 net/ipv4/netfilter/ip_queue.c   |6 +++--
 net/ipv4/netfilter/ipt_ULOG.c   |8 --
 net/ipv6/exthdrs.c  |2 -
 net/ipv6/netfilter/ip6_queue.c  |6 +++--
 net/ipv6/netfilter/nf_conntrack_reasm.c |6 ++---
 net/ipv6/reassembly.c   |6 ++---
 net/ipx/af_ipx.c|4 +--
 net/netfilter/nfnetlink_log.c   |8 +++---
 net/netfilter/nfnetlink_queue.c |8 +++---
 net/packet/af_packet.c  |8 --
 18 files changed, 80 insertions(+), 90 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..24dcbb3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -27,6 +27,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #define HAVE_ALLOC_SKB /* For the drivers to know */
 #define HAVE_ALIGNABLE_SKB /* Ditto 8)*/
@@ -156,11 +157,6 @@ struct skb_shared_info {
 #define SKB_DATAREF_SHIFT 16
 #define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)
 
-struct skb_timeval {
-   u32 off_sec;
-   u32 off_usec;
-};
-
 
 enum {
SKB_FCLONE_UNAVAILABLE,
@@ -233,7 +229,7 @@ struct sk_buff {
struct sk_buff  *prev;
 
struct sock *sk;
-   struct skb_timeval  tstamp;
+   ktime_t tstamp;
struct net_device   *dev;
struct net_device   *input_dev;
 
@@ -1360,26 +1356,14 @@ extern void skb_add_mtu(int mtu);
  */
 static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval 
*stamp)
 {
-   stamp->tv_sec  = skb->tstamp.off_sec;
-   stamp->tv_usec = skb->tstamp.off_usec;
+   *stamp = ktime_to_timeval(skb->tstamp);
 }
 
-/**
- * skb_set_timestamp - set timestamp of a skb
- * @skb: skb to set stamp of
- * @stamp: pointer to struct timeval to get stamp from
- *
- * Timestamps are stored in the skb as offsets to a base timestamp.
- * This function converts a struct timeval to an offset and stores
- * it in the skb.
- */
-static inline void skb_set_timestamp(struct sk_buff *skb, const struct timeval 
*stamp)
+static inline void __net_timestamp(struct sk_buff *skb)
 {
-   skb->tstamp.off_sec  = stamp->tv_sec;
-   skb->tstamp.off_usec = stamp->tv_usec;
+   skb->tstamp = ktime_get_real();
 }
 
-extern void __net_timestamp(struct sk_buff *skb);
 
 extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 2c7d60c..19f6540 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -244,7 +244,7 @@ #define sk_prot __sk_common.skc_prot
struct sk_filter*sk_filter;
void*sk_protinfo;
struct timer_list   sk_timer;
-   struct timeval  sk_stamp;
+   ktime_t sk_stamp;
struct socket   *sk_socket;
void*sk_user_data;
struct page *sk_sndmsg_page;
@@ -1307,19 +1307,19 @@ static inline int sock_intr_errno(long t
 static __inline__ void
 sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 {
-   struct timeval stamp;
+   ktime_t kt = skb->tstamp;
 
-   skb_get_timestamp(skb, &stamp);
if (sock_flag(sk, SOCK_RCVTSTAMP)) {

Re: Network activity LED trigger

2007-03-02 Thread jamal
On Fri, 2007-02-03 at 15:16 +0100, Florian Fainelli wrote:
> Hi,
> 
> On Friday 2 March 2007, jamal wrote:
> > Where are these LEDs typically located? Are you talking about LEDs on a
> > network card for example? can you light them up in different colors?
> 
> Those LEDS are typically controlled by GPIO lines visible in front of the 
> device. It is mostly targeted to embedded devices for which you do not 
> necessarily want to assign a LED to a given network interface
> 

Ah, ok - I've worked with a not-so-embedded board that had something that
was accessible via the ICH; I recall writing a user-space program to
handle it. So instead of calling this just LED, probably find a more
descriptive name for it; for example GPIO-LED.

Those things are tricky to have in generic code though, no? I.e. each
chipset/board will have different address mappings for where to
read/write for a specific LED. So you need to deal with that problem
without requiring a kernel change every time an address changes.
I actually came across an exactly similar board (some manufacturer) where
the firmware was slightly different.

Here's my view of what would be useful:
Have them accessible via the kernel, but also have an API from user
space. This way user space apps can control the LED, but if I wanted to
do it from the kernel I could as well. In my case I was actually
monitoring the health of a daemon; the LED would be off if the daemon was
not running, green if it was happy, yellow if semi-healthy and red if it
was in trouble.

here are some operations/messages i can see that are useful which you
probably already have in your API:

turn on LED at #x color somecolor
turn off LED at #y
query LED info at #x
dump all LEDs on board - think of this as a discovery
flicker LED at #z at frequency y color green
maybe even: "I am a wireless card with no LED, I claim LED #x"
which is matched by "tell me if anyone owns LED code"

In other words, if you just provide mechanisms, let people write the
policies.
This way if I wanted to tie it to my eth0 I can. 

Hope that helps.

cheers,
jamal




Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread Krzysztof Halasa
Andrew Morton <[EMAIL PROTECTED]> writes:

>> However, in drivers/net/wan/hdlc_cisco.c, in function static int
>> cisco_ioctl(struct net_device *dev, struct ifreq *ifr), where
>> dev->hard_header is assigned a valid function, and dev->hard_header_cache
>> is assigned a known value (NULL), dev->header_cache_update is not set to
>> a known value:

Right, it seems I was never aware that dev->header_cache_update existed.
I wonder where the non-NULL value comes from? Never mind.

> diff -puN drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update drivers/net/wan/hdlc_cisco.c
> --- a/drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update
> +++ a/drivers/net/wan/hdlc_cisco.c
> @@ -366,6 +366,7 @@ static int cisco_ioctl(struct net_device
>   dev->hard_start_xmit = hdlc->xmit;
>   dev->hard_header = cisco_hard_header;
>   dev->hard_header_cache = NULL;
> + dev->header_cache_update = NULL;
>   dev->type = ARPHRD_CISCO;
>   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
>   dev->addr_len = 0;
> _

ACK, I think it's the best place.

Is it OK to leave this (and hard_header_cache) set to a random value
when dev->hard_header is NULL (as with other protocols)?
-- 
Krzysztof Halasa


Re: Network activity LED trigger

2007-03-02 Thread Richard Purdie
On Fri, 2007-03-02 at 10:16 -0500, jamal wrote:
> Heres my view of what would be useful:
> Have them accessible via the kernel, but also have an API from user
> space. This way user space apps can control the LED, but if i wanted to
> do it from the kernel i could as well. In my case i was actually
> monitoring the health of a daemon; it would show off if the daemon was
> not running, green if it was happy, yellow if semi-healthy and Red if it
> was in trouble.

We already have this API, see drivers/leds ;-)

> here are some operations/messages i can see that are useful which you
> probably already have in your API:
> 
> turn on LED at #x color somecolor
> turn off LED at #y
> query LED info at #x
> dump all LEDs on board - think of this as a discovery
> flicker LED at #z at frequency y color green
> maybe even: "I am a wireless card with no LED, I claim LED #x"
> which is matched by "tell me if anyone owns LED code"
> 
> In other words, if you just provide mechanims let people write the
> policies.
> This way if i wanted to tie it to my eth0 i can. 

We have LEDs which show up in sysfs and can be controlled by userspace
from there. They can also choose to be controlled by kernel LED
'triggers'. For example, we have an IDE disk trigger which shows
activity on IDE disks. Florian would like to see a network trigger.
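
For readers who have not used the sysfs interface, a minimal user-space 
sketch follows. It assumes each LED appears as a directory under 
/sys/class/leds with "brightness" and "trigger" attribute files; the helper 
names led_write_attr and led_set_brightness, and any LED name passed to 
them, are made up for illustration and are not part of the LED API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write `value` to <class_dir>/<led>/<attr>.
 * class_dir is normally "/sys/class/leds"; it is a parameter so the
 * helper can be pointed at another directory when experimenting.
 * Returns 0 on success, -1 on any failure.
 */
int led_write_attr(const char *class_dir, const char *led,
		   const char *attr, const char *value)
{
	char path[512];
	FILE *f;
	int n;

	assert(class_dir && led && attr && value);
	n = snprintf(path, sizeof(path), "%s/%s/%s", class_dir, led, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Set the LED brightness: 0 turns it off, non-zero turns it on. */
int led_set_brightness(const char *class_dir, const char *led,
		       unsigned int level)
{
	char buf[16];

	snprintf(buf, sizeof(buf), "%u", level);
	return led_write_attr(class_dir, led, "brightness", buf);
}
```

Handing an LED over to a kernel trigger would then be one more write, for 
instance led_write_attr("/sys/class/leds", "some-led", "trigger", "ide-disk"), 
assuming that trigger is built into the running kernel.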

The LED trigger code is quite generic and designed to have little impact
on the subsystem it's added to, at least in terms of code. As always,
there will be some runtime overhead though. Ultimately it depends how
complex you make the trigger (eg. how many options it has) and where and
how you hook it into the network subsystem. I know little about the
network subsystem so this is something others will have to advise on.

Cheers,

Richard 
(LED Maintainer)




Re: [git patches] net driver fixes

2007-03-02 Thread Linus Torvalds


On Thu, 1 Mar 2007, Kok, Auke wrote:

> Linus Torvalds wrote:
> > 
> > Ok, here's an interesting one: my e1000 card no longer worked for a while.
> > 
> > The green link-light blinks on/off once a second, and in time to that, my
> > dmesg fills up with an endless supply of
> > 
> > e1000: eth0: e1000_watchdog: NIC Link is Down
> > e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex, Flow 
> > Control: None
> > e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
> > 
> > and networking obviously doesn't actually work.
> 
> Just out of curiosity, which e1000 chipset+motherboard are you running this
> on?

The kernel prints out:

e1000: :00:19.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 
00:16:76:c7:eb:fe
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

and lspci says:

00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network 
Connection (rev 02)
Subsystem: Intel Corporation Unknown device 0001
Flags: bus master, fast devsel, latency 0, IRQ 506
Memory at e040 (32-bit, non-prefetchable) [size=128K]
Memory at e0424000 (32-bit, non-prefetchable) [size=4K]
I/O ports at 20c0 [size=32]
Capabilities: 
00: 86 80 4a 10 07 04 10 00 02 00 00 02 00 00 00 00
10: 00 00 40 e0 00 40 42 e0 c1 20 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 01 00
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00

It's an Intel system (Host bridge: Intel Corporation 82Q963/Q965) with 
integrated graphics: PCI ID 8086:2990 (rev 02) for the host bridge.

DMI info isn't very interesting, but it's an all-Intel board:

OEM-specific Type
Strings:
Intel_ASF
Intel_ASF_001
..
Base Board Information
Manufacturer: Intel Corporation
Product Name: DQ965GF
Version: AAD41676-305
Serial Number: BQGF635009R2
...
BIOS Information
Vendor: Intel Corp.
Version: CO96510J.86A.4462.2006.0804.2059
Release Date: 08/04/2006

so it's all-intel chipset, all-intel board, and all-intel BIOS ;)

> there have been problems reported with AMT2 on several chipsets (AMT2 is
> not supported under linux, unlike AMT1), and having it enabled in the BIOS
> produces this phenomenon.

Is there some way to at least disable AMT2 from the Linux driver (ie I 
assume this is some issue of Intel not documenting it all - but maybe you 
can add a "turn off that bit" to the affected chip).

If I'm not the only one to see it, it's obviously not just my personal 
ethernet switch bug, but apparently the e1000 becoming confused by some 
link detection event (and powering down the switch probably just gets it 
out of its confusion).

Linus


Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-03-02 Thread Paul Moore
On Wednesday, February 28 2007 3:01:31 pm Paul Moore wrote:
> The current CIPSO engine has a problem where it does not verify that the
> given sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI
> type is used.  The end result is that bad packets are sent on the wire
> which should have never been sent in the first place.  This patch corrects
> this problem by verifying the sensitivity level mapping similar to what is
> done with the category mapping.  This patch also changes the returned error
> code in this case to -EPERM to better match what the category mapping
> verification code returns.
>
> Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
> ---
>  net/ipv4/cipso_ipv4.c |7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

I probably should have been more clear in the original patch posting ... this 
is a bugfix patch which I believe should go into 2.6.21 (as well as 
the -stable tree, but I know they like to see it hit Linus' tree first).

-- 
paul moore
linux security @ hp


Re: [git patches] net driver fixes

2007-03-02 Thread Kok, Auke

Linus Torvalds wrote:
> On Thu, 1 Mar 2007, Kok, Auke wrote:
> and lspci says:
>
> 00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network 
> Connection (rev 02)
>
> DMI info isn't very interesting, but it's an all-Intel board:
>
> so it's all-intel chipset, all-intel board, and all-intel BIOS ;)

It's like the devil plays with it. We just discussed adding a piece of text 
about this issue to our README.

>> there have been problems reported with AMT2 on several chipsets (AMT2 is
>> not supported under linux, unlike AMT1), and having it enabled in the BIOS
>> produces this phenomenon.
>
> Is there some way to at least disable AMT2 from the Linux driver (ie I 
> assume this is some issue of Intel not documenting it all - but maybe you 
> can add a "turn off that bit" to the affected chip).

Our suggestion is (IOW will be in the README) to turn AMT2 off completely in the 
BIOS, but I'll investigate if your suggestion is possible. It may be another 
workaround but this one indeed hurts.

> If I'm not the only one to see it, it's obviously not just my personal 
> ethernet switch bug, but apparently the e1000 becoming confused by some 
> link detection event (and powering down the switch probably just gets it 
> out of its confusion).

No, this fits the description perfectly of this issue. I'll get right on it and 
owe you a patch for the `e1000: not ready for irq` problem too, which seems to 
hold out after tests...


Cheers,

Auke


Re: Network activity LED trigger

2007-03-02 Thread jamal
On Fri, 2007-02-03 at 16:03 +, Richard Purdie wrote:
> On Fri, 2007-03-02 at 10:16 -0500, jamal wrote:

> We already have this API, see drivers/leds ;-)

Very cool ;-> I was not aware of the existence of this API.
Actually i dont think it was available around 2.6.10.

> We have LEDs which show up in sysfs and can be controlled by userspace
> from there. They can also choose to be controlled by kernel LED
> 'triggers', for example. we have an IDE disk trigger which shows up
> activity on IDE disks. Florian would like to see a network trigger.
> 

This literally covers most of what i wanted; it may be too late to get
rid of that user space program but it is something i see you already
support;->

> The LED trigger code is quite generic and designed to have little impact
> on the subsystem its added to, at least in terms of code. As always,
> there will be some runtime overhead though. Ultimately it depends how
> complex you make the trigger (eg. how many options it has) and where 


Well, give me pointers and i will send you a patch for a board i
currently use:
http://download.intel.com/design/telecom/techspec/9635.pdf
which has GPIO LED.
I take it i would have to write a "driver" using your API?

> and how you hook it into the network subsystem. 
> I know little about the
> network subsystem so this is something others will have to advise on.

Other people may have different opinions: I can't think of something
useful from a network perspective, mostly because you can't make it
generic enough, i.e. some boards will have LEDs for their NICs and some
won't. Just as some boards have activity LEDs for their IDE disks. IOW, I
think general purpose LEDs will probably be very dependent on the
shipping product.

other than that, great work!

cheers,
jamal



Re: [PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Stephen Hemminger
On Fri, 2 Mar 2007 15:38:41 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

> We currently use a special structure (struct skb_timeval) and plain 'struct 
> timeval' to store packet timestamps in sk_buffs and struct sock.
> 
> This has some drawbacks :
> - Fixed resolution of micro second.
> - Waste of space on 64bit platforms where sizeof(struct timeval)=16
> 
> I suggest using ktime_t that is a nice abstraction of high resolution time 
> services, currently capable of nanosecond resolution.
> 
> As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 
> byte 
> shrink of this structure on 64bit architectures. Some other structures also 
> benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct 
> frag_queue in ipv6/reassembly.c, ...)

This is even better. Also comparing ktime_t's is easier if some code needs
to do that.
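
The comparison point can be sketched in user-space C. This is only an 
illustration under the assumption that a ktime_t value can be treated as one 
signed 64-bit nanosecond count; model_timeval_compare and model_ktime_compare 
are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Comparing two struct timeval values needs both fields:
 * seconds first, then microseconds. Returns <0, 0 or >0.
 */
int model_timeval_compare(const struct timeval *a, const struct timeval *b)
{
	assert(a && b);
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_usec != b->tv_usec)
		return a->tv_usec < b->tv_usec ? -1 : 1;
	return 0;
}

/* Comparing two scalar nanosecond timestamps is a plain integer compare. */
int model_ktime_compare(int64_t a, int64_t b)
{
	return (a > b) - (a < b);
}
```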


SWS for rcvbuf < MTU

2007-03-02 Thread Alex Sidorenko
Hello,

this is a rare corner case met by one of HP partners on 2.4.20 on IA64. 
Inspecting the sources of the latest 2.6.20.1 (net/ipv4/tcp_output.c) we can 
see that the bug is still there.

Here is a description of the bug and the suggested fix.

The problem occurs when the remote host (not necessarily Linux - in our case 
it was Solaris) does not implement SWS avoidance on sender side. If Linux 
connection socket has rcvbuf mtu. But if we use small rcvbuf (set by 
SO_RCVBUF), we can go into SWS mode. Let us for simplicity look only at the 
case when we don't have WS enabled. If we have free_space above full_space/2, 
we reach the following section:


/* Don't do rounding if we are using window scaling, since the
 * scaled window will not line up with the MSS boundary anyway.
 */
window = tp->rcv_wnd;
if (tp->rx_opt.rcv_wscale) {

} else {
/* Get the largest window that is a nice multiple of mss.
 * Window clamp already applied above.
 * If our current window offering is within 1 mss of the
 * free space we just keep it. This prevents the divide
 * and multiply from happening most of the time.
 * We also don't do any window rounding when the free space
 * is too small.
 */
(1)  if (window <= free_space - mss || window > free_space)
window = (free_space/mss)*mss;
}

return window;

What happens if we have a small tp->rcv_wnd and rcvbuf <= mss? In this case 
condition (1) is almost always false and as a result we'll return 
unmodified 'window' set to tp->rcv_wnd.  If tp->rcv_wnd is small, it can be 
reused over and over again.

For the case rcvbuf <= mss  __tcp_select_window() returns:

  0             if we have free_space < full_space/2    OK
  mss           if rcvbuf is empty                      OK
  tp->rcv_wnd   in other case                           Bad
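
The three cases can be reproduced with a small user-space model. This is a 
simplified sketch of the no-window-scaling path as described in this mail, 
not the kernel function: model_select_window is a hypothetical name, memory 
pressure and window clamping are left out, and the early return follows the 
summary above rather than the exact kernel code.

```c
#include <assert.h>

/*
 * Simplified model of window selection without window scaling.
 * free_space: free receive buffer space
 * full_space: total receive buffer space
 * mss:        segment size (clamped to full_space, as the kernel does)
 * rcv_wnd:    the window currently being advertised
 */
int model_select_window(int free_space, int full_space, int mss, int rcv_wnd)
{
	int window = rcv_wnd;

	assert(mss > 0 && full_space > 0);
	if (mss > full_space)
		mss = full_space;

	/* Less than half the buffer is free: advertise nothing. */
	if (free_space < full_space / 2)
		return 0;

	/* Condition (1): round to a multiple of mss only when the
	 * current offer is far from the free space. */
	if (window <= free_space - mss || window > free_space)
		window = (free_space / mss) * mss;

	return window;
}
```

With full_space = 1000 and a larger mss, a call with free_space = 700 and 
rcv_wnd = 100 returns 100 again, which is the "Bad" row: the small window is 
simply re-advertised.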


If there is no SWS avoidance on sender side, we can see Linux advertising the 
same small rcv_wnd over and over again. The problem here is that we never 
advertise one-half the receiver's buffer space as described e.g. in

"TCP/IP Illustrated" by Stevens (v.1, Chapter 22.3):

"The normal algorithm is for the receiver not to advertise a larger window 
than it is currently advertising (which can be 0) until the window can be 
increased by either one full-sized segment (i.e. the MSS being received) or by 
one-half the receiver's buffer space, whichever is smaller"
^^

The fix.


We have not been able to reproduce the problem inside HP as it is unclear what 
conditions are needed to bring system into SWS mode (this needs very special 
event timing). HP customer was seeing it every 2-3 days while running a 
custom application (Solaris<->Linux) that was running with low priority on a 
busy host running other custom applications with SCHED_RR. After going into 
SWS mode, his application stayed in it until restarted.

We provided to customer a fix for 2.4.20 only (used by customer in production) 
by adding another test and returning rcvbuf/2 when needed:

--- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
+++ net/ipv4/tcp_output.c   Tue Jan 30 14:24:56 2007
@@ -641,6 +641,7 @@
  * Note, we don't "adjust" for TIMESTAMP or SACK option bytes.
  * Regular options like TIMESTAMP are taken into account.
  */
+static const char *SWS_id_string="@#SWS-fix-2";
 u32 __tcp_select_window(struct sock *sk)
 {
struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
@@ -682,6 +683,9 @@
window = tp->rcv_wnd;
if (window <= free_space - mss || window > free_space)
window = (free_space/mss)*mss;
+/* A fix for small rcvbuf [EMAIL PROTECTED] */
+   else if (mss == full_space && window < full_space/2)
+   window = full_space/2;

return window;
 }
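
The effect of the extra branch can be checked with a user-space model of the 
patched logic. As before this is a sketch under simplifying assumptions (no 
window scaling, no memory pressure handling, early return as summarised 
earlier in this mail), and model_select_window_fixed is a hypothetical name.

```c
#include <assert.h>

/*
 * Model of the patched window selection without window scaling.
 * The added branch only fires when mss was clamped to the buffer size
 * (rcvbuf <= mss) and the current offer is below half the buffer.
 */
int model_select_window_fixed(int free_space, int full_space, int mss,
			      int rcv_wnd)
{
	int window = rcv_wnd;

	assert(mss > 0 && full_space > 0);
	if (mss > full_space)
		mss = full_space;

	if (free_space < full_space / 2)
		return 0;

	if (window <= free_space - mss || window > free_space)
		window = (free_space / mss) * mss;
	else if (mss == full_space && window < full_space / 2)
		window = full_space / 2;	/* the small-rcvbuf fix */

	return window;
}
```

For full_space = 1000, free_space = 700 and rcv_wnd = 100 the model now 
offers 500 instead of repeating 100, while a buffer much larger than the mss 
still takes the original rounding branch.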


The customer has confirmed that this resolves the problem and decreases the CPU 
usage of his custom application - even when there is no SWS.


This is a rare corner case and most users will never meet it. But as the fix 
is trivial, I think it makes sense to include it in upstream sources. 

Regards,
Alex

-- 
--
Alexandre Sidorenko email: [EMAIL PROTECTED]
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
--


Re: [PATCH] spidernet: Fix problem sending IP fragments

2007-03-02 Thread Norbert Eicker
On Fri 2.3.2007 00:34, Linas Vepstas wrote:
> On Thu, Mar 01, 2007 at 04:52:54PM -0600, Chris Engel wrote:
> > I tried to apply this patch to 2.6.21-rc2 and CHECKSUM_HW appears
> > to be changed to CHECKSUM_COMPLETE

Oops. I did not test this on the actual 2.6.21-rc2 before sending it.
It worked fine for me on 2.6.18.

In the meantime I tested the patch below on 2.6.21.

> The use of CHECKSUM_HW was replaced by CHECKSUM_PARTIAL and
> CHECKSUM_COMPLETE on a cae-by-case basis, in the patch series leading
> up to 2.6.19.  In this case, I'm not sure which should have been
> used.

In fact CHECKSUM_COMPLETE seems to be used on the receiving side while
CHECKSUM_PARTIAL is the one to be used while sending frames. Thus the
latter is the one to choose.
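
As a rough sketch of the intended transmit-side decision: hardware 
checksumming is requested only when the stack left the checksum unfinished 
(CHECKSUM_PARTIAL). Everything below is a user-space model with hypothetical 
names and arbitrary enum values; the UDP case is an assumption, and the 
reading that already-checksummed frames such as IP fragments must be left 
alone is an interpretation of the bug, not something taken from the driver.

```c
#include <assert.h>

/* Hypothetical stand-ins for skb->ip_summed states; values are arbitrary. */
enum model_ip_summed {
	MODEL_CHECKSUM_NONE,
	MODEL_CHECKSUM_PARTIAL,
	MODEL_CHECKSUM_UNNECESSARY,
	MODEL_CHECKSUM_COMPLETE
};

/* IP protocol numbers for TCP and UDP. */
enum model_l4_proto { MODEL_PROTO_TCP = 6, MODEL_PROTO_UDP = 17 };

/* What the descriptor asks the hardware to do (model only). */
enum model_csum_cmd { MODEL_CMD_NOCS, MODEL_CMD_TCP, MODEL_CMD_UDP };

/*
 * Pick the checksum command for an outgoing frame. Offload is requested
 * only for IPv4 frames that are marked as needing a checksum.
 */
enum model_csum_cmd model_tx_csum_cmd(int is_ipv4,
				      enum model_ip_summed ip_summed,
				      int l4_proto)
{
	assert(l4_proto >= 0 && l4_proto <= 255);

	if (!is_ipv4 || ip_summed != MODEL_CHECKSUM_PARTIAL)
		return MODEL_CMD_NOCS;	/* leave the frame untouched */

	switch (l4_proto) {
	case MODEL_PROTO_TCP:
		return MODEL_CMD_TCP;
	case MODEL_PROTO_UDP:
		return MODEL_CMD_UDP;
	default:
		return MODEL_CMD_NOCS;
	}
}
```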

> Norbert, can you resubmit a patch that applies to a more recent
> kernel? p.s. your emailer replaced tabs by spaces ...

so here's the new one:

Fix problem sending IP fragments on spidernet.

Signed-off-by: Norbert Eicker <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 3b91af8..e3019d5 100644
--- a/drivers/net/spider_net.c
+++ b/drivers/net/spider_net.c
@@ -719,7 +719,7 @@ spider_net_prepare_tx_descr(struct spide
SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
spin_unlock_irqrestore(&chain->lock, flags);

-   if (skb->protocol == htons(ETH_P_IP))
+   if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_PARTIAL)
switch (skb->nh.iph->protocol) {
case IPPROTO_TCP:
hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;



Re: s390 allmodconfig

2007-03-02 Thread Martin Schwidefsky
On Fri, 2007-03-02 at 12:11 +0100, Johannes Berg wrote:
> On Fri, 2007-03-02 at 03:06 -0800, Andrew Morton wrote:
> > s390 is weird ;)   There's no way it'll support any of the hardware which
> > you're working on (until they release the s390 laptop).  So all we really
> > want to do here is to avoid breaking s390 allmodconfig.

Well, I would not say "weird" but different. None of the usual device
attachments is present on an s390. That includes memory mapped i/o (!).

> Alright. I think we'll probably have to make bcm43xx and b44 depend on
> SSB instead of selecting it like the LED trigger stuff below.
> 
> But I don't see why s390 can't include hw random, led trigger or even
> hid, those are all software features afaict.

True. I'm still sitting on a couple of patches that make s390 use the
standard drivers/Kconfig. The downside of these patches is that I have
to add a lot of "depends on !S390" all over the place.

> > OK, I'll try that, thanks.
> 
> Not that it'll actually help get the compile through... bcm43xx will
> drop fail and bluetooth probably as well.

No bcm43xx, no bluetooth on s390..

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.




Re: [RFC] Arp announce (for Xen)

2007-03-02 Thread Ben Greear

Pekka Savola wrote:
> On Thu, 1 Mar 2007, Stephen Hemminger wrote:
>> What about implementing the unused arp_announce flag on the inetdevice?
>> Something like the following.  Totally untested...
>>
>> Looks like it either was there (and got removed) or was planned but
>> never implemented.

IN_DEV_ARP_ANNOUNCE is in 2.6.18, at least..used in arp_solicit in arp.c

I really hope this didn't get removed because I find it very useful!

But, you could certainly add another sysctl...

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com





Re: SWS for rcvbuf < MTU

2007-03-02 Thread John Heffner

Alex Sidorenko wrote:
[snip]

> --- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
> +++ net/ipv4/tcp_output.c   Tue Jan 30 14:24:56 2007
> @@ -641,6 +641,7 @@
>   * Note, we don't "adjust" for TIMESTAMP or SACK option bytes.
>   * Regular options like TIMESTAMP are taken into account.
>   */
> +static const char *SWS_id_string="@#SWS-fix-2";
>  u32 __tcp_select_window(struct sock *sk)
>  {
> struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
> @@ -682,6 +683,9 @@
> window = tp->rcv_wnd;
> if (window <= free_space - mss || window > free_space)
> window = (free_space/mss)*mss;
> +/* A fix for small rcvbuf [EMAIL PROTECTED] */
> +   else if (mss == full_space && window < full_space/2)
> +   window = full_space/2;
>
> return window;
>  }


Good analysis of the problem, but the patch does not look quite right. 
In particular, you can't ever announce a zero window. :)


I think this attached patch does the correct SWS avoidance.

Thanks,
  -John

Do receiver-side SWS avoidance for rcvbuf < MSS.

Signed-off-by: John Heffner <[EMAIL PROTECTED]>

---
commit 38d33181c93a28cf7fb2f9f3377305a04636c054
tree 503f8a9de6e78694bae9fc2eb1c9dd5d26a0b5ed
parent 562aa1d4c6a874373f9a48ac184f662fbbb06a04
author John Heffner <[EMAIL PROTECTED]> Fri, 02 Mar 2007 13:47:44 -0500
committer John Heffner <[EMAIL PROTECTED]> Fri, 02 Mar 2007 13:47:44 -0500

 net/ipv4/tcp_output.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..688b955 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1607,6 +1607,9 @@ u32 __tcp_select_window(struct sock *sk)
 */
if (window <= free_space - mss || window > free_space)
window = (free_space/mss)*mss;
+   else if (mss == full_space &&
+free_space > window + full_space/2)
+   window = free_space;
}
 
return window;
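
To see what the added branch changes, here is a simplified userspace model of the non-window-scaled tail of the function. It is a sketch under stated assumptions, not the kernel code: the function name `select_window` and its parameter list are made up for illustration, the rcv_ssthresh clamp and memory-pressure handling are left out, and it assumes the MSS is clamped to full_space earlier in the function, as the `mss == full_space` test implies.

```c
#include <assert.h>

/*
 * Simplified model of receiver-side window selection for a socket
 * without window scaling. cur_window is the window currently offered.
 */
static int select_window(int mss, int free_space, int full_space,
			 int cur_window)
{
	int window = cur_window;

	assert(mss > 0 && full_space > 0);
	if (mss > full_space)
		mss = full_space;

	/* Less than half the buffer and less than one MSS free: close. */
	if (free_space < full_space / 2 && free_space < mss)
		return 0;

	if (window <= free_space - mss || window > free_space)
		window = (free_space / mss) * mss;
	else if (mss == full_space &&
		 free_space > window + full_space / 2)
		window = free_space;	/* the branch added by the patch */

	return window;
}
```

For example, with full_space = 708 and a larger MSS, `select_window(1460, 300, 708, 288)` still returns a zero window, while `select_window(1460, 700, 708, 288)` opens the window to 700 instead of leaving it stuck at 288.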


Netem tfifo implementation

2007-03-02 Thread Ritesh Kumar

Hi,
   I recently saw the qdisc "tfifo" in the netem module
(net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
to 2.6.20. As I understand, tfifo helps in keeping the queue of
packets sorted according to their "time_to_send". [tfifo was not
present in 2.6.14 perhaps because arrival order of packets was always
equal to the departure order]. However, tfifo uses a linear search in
the packet queue to find where to enqueue the packet.
   Quite some time ago (2.6.14 era), I needed a similar functionality
from the netem module and I ended up coding a pointer based min-heap
for the same. I was wondering if the community was interested in using
the min-heap implementation to replace the linear search
implementation. I have tested the min-heap quite a few times and it
seems to work.
   The implementation is slightly non-trivial because it uses
pointers to maintain the heap structure instead of using good old
fixed size arrays. I did this mainly so that the limit of the netem
qdisc could be changed on the fly. However, because every sk_buff now
needs two pointers for its children nodes, I added an extra
(sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
be changed to a pointer inside netem_skb_cb.  Also, because I needed
this for personal work and 2.6.14 didn't contain tfifo, I basically
removed the embedded qdisc and made netem a classless qdisc with my
min heap as the native "queue" (sorry again! :) )
My patch on sch_netem.c is included. If there is interest, I will
be glad to make this into a proper tfifo patch along with any more of
your suggestions.


diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 79542af..66881ab 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -24,6 +24,9 @@

#include 


#define VERSION "1.2"

/* Network Emulation Queuing algorithm.
@@ -53,7 +56,11 @@
*/

 struct netem_sched_data {
-   struct Qdisc*qdisc;
+   struct sk_buff  *root;  /* The root of the heap of packets */
+   struct sk_buff  *end;   /* The last element in the heap */
+   int heap_size;  /* The current size of the heap */
+   __u64   num_arrivals;   /* records the number of arrivals; to be used
+  for stable ordering in the heap */
   struct timer_list timer;

   u32 latency;
@@ -75,11 +82,15 @@ struct netem_sched_data {
   u32  size;
   s16 table[0];
   } *delay_dist;
};

/* Time stamp put into socket buffer control block */
 struct netem_skb_cb {
   psched_time_t   time_to_send;
+   __u64   arrival_order;
};

/* init_crandom - initialize correlated random number generator
@@ -139,6 +150,210 @@ static long tabledist(unsigned long mu, long sigma,
   return  x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu;
}

+int netem_time_less(struct sk_buff *skb1, struct sk_buff *skb2)
+{
+   int r;
+   struct netem_skb_cb *cb1 = (struct netem_skb_cb *)skb1->cb;
+   struct netem_skb_cb *cb2 = (struct netem_skb_cb *)skb2->cb;
+   r = PSCHED_TDIFF(cb1->time_to_send, cb2->time_to_send);
+   if(r == 0) return (cb1->arrival_order < cb2->arrival_order);
+   else return (r < 0);
+}
+
+int netem_insert_heap(struct sk_buff *skb, struct netem_sched_data *q)
+{
+   struct sk_buff *tmp;
+   struct netem_skb_cb *cb = (struct netem_skb_cb *)skb->cb;
+
+   if(q->heap_size >= q->limit)
+   return NET_XMIT_DROP;
+
+   skb->next = NULL;
+   skb->next2 = NULL;
+   //Use the arrival order in the heap to maintain stability. cb->arrival_order
+   //is 64 bits... so it should take a few years before this wraps around.
+   cb->arrival_order = q->num_arrivals++;
+   //root is the root of the heap. end is the last element of the heap.
+   if(q->root == NULL){
+   q->root = skb;
+   skb->prev = NULL;
+   q->end = skb;
+   goto success;
+   }
+   tmp = q->end;
+   //Note that the pointer next is left and next2 is right.
+   while(tmp->prev != NULL && tmp == tmp->prev->next2) tmp = tmp->prev;
+   if(tmp->prev == NULL){
+   //Complete tree: make a new node at a new level. Also now, tmp == q->root
+   while(tmp->next != NULL) tmp = tmp->next;
+   tmp->next = skb;
+   skb->prev = tmp;
+   }else if(tmp->prev->next2 == NULL){
+   tmp->prev->next2 = skb;
+   skb->prev = tmp->prev;
+   }else{
+   tmp = tmp->prev->next2;
+   while(tmp->next != NULL) tmp = tmp->next;
+   tmp->next = skb;
+   skb->prev = tmp;
+   }
+
+   //Now skb is at the end of the heap though q->end is not adjusted as yet.
+   if(netem_time_less(skb, skb->prev))
+   q->end = skb->prev;
+   else
+   q->end =

Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread David Miller
From: Krzysztof Halasa <[EMAIL PROTECTED]>
Date: Fri, 02 Mar 2007 16:29:06 +0100

> Andrew Morton <[EMAIL PROTECTED]> writes:
> 
> >> However, in drivers/net/wan/hdlc_cisco.c, in function static int
> >> cisco_ioctl(struct net_device *dev, struct ifreq *ifr), where
> >> dev->hard_header is assigned a valid function, and
> >> dev->hard_header_cache is assigned a known value (NULL),
> >> dev->header_cache_update is not set to a known value:
> 
> Right, it seems I was never aware of dev->header_cache_update existence.
> I wonder where does the non-NULL value come from? Nevermind.
> 
> > diff -puN drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update drivers/net/wan/hdlc_cisco.c
> > --- a/drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update
> > +++ a/drivers/net/wan/hdlc_cisco.c
> > @@ -366,6 +366,7 @@ static int cisco_ioctl(struct net_device
> > dev->hard_start_xmit = hdlc->xmit;
> > dev->hard_header = cisco_hard_header;
> > dev->hard_header_cache = NULL;
> > +   dev->header_cache_update = NULL;
> > dev->type = ARPHRD_CISCO;
> > dev->flags = IFF_POINTOPOINT | IFF_NOARP;
> > dev->addr_len = 0;
> > _
> 
> ACK, I think it's the best place.

I disagree, you can't leave dangling references to functions
which are potentially inside of unloaded modules, as this code
does.

Rather, HDLC Cisco should implement a proper protocol destructor
method to clean up these function pointers.
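
A rough userspace sketch of that idea, with everything in it hypothetical: `struct net_device_sketch` is a mock holding only the three hooks discussed here, and the attach/detach function names are invented for illustration rather than taken from the HDLC layer. It only shows the shape of the cleanup: whatever the protocol installs on attach, its destructor clears again, so no pointer into the module survives an unload.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the header hooks discussed above; not struct net_device. */
struct net_device_sketch {
	int (*hard_header)(void);
	int (*hard_header_cache)(void);
	void (*header_cache_update)(void);
};

/* Stand-in for a function that lives inside the protocol module. */
static int cisco_hard_header_sketch(void)
{
	return 0;
}

/* Attach: install the hooks this protocol provides, clear the rest. */
static void proto_attach_sketch(struct net_device_sketch *dev)
{
	assert(dev != NULL);
	dev->hard_header = cisco_hard_header_sketch;
	dev->hard_header_cache = NULL;
	dev->header_cache_update = NULL;
}

/* Detach (the "destructor"): drop every pointer into this module. */
static void proto_detach_sketch(struct net_device_sketch *dev)
{
	assert(dev != NULL);
	dev->hard_header = NULL;
	dev->hard_header_cache = NULL;
	dev->header_cache_update = NULL;
}
```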



Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-03-02 Thread David Miller
From: Paul Moore <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 11:12:12 -0500

> On Wednesday, February 28 2007 3:01:31 pm Paul Moore wrote:
> > The current CIPSO engine has a problem where it does not verify that the
> > given sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI
> > type is used.  The end result is that bad packets are sent on the wire
> > which should have never been sent in the first place.  This patch corrects
> > this problem by verifying the sensitivity level mapping similar to what is
> > done with the category mapping.  This patch also changes the returned error
> > code in this case to -EPERM to better match what the category mapping
> > verification code returns.
> >
> > Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
> > ---
> >  net/ipv4/cipso_ipv4.c |7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> I probably should have been more clear in the original patch posting ... this 
> is a bugfix patch which I believe should go into 2.6.21 (as well as 
> the -stable tree, but I know they like to see it hit Linus' tree first).

I realize this and plan to apply the patch, I'm just backlogged
at the moment.


Re: SWS for rcvbuf < MTU

2007-03-02 Thread David Miller
From: Alex Sidorenko <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 11:28:28 -0500

> Customer has confirmed that this resolves the problem and decreases
> CPU usage by his custom application - even when there is no SWS.

There is rarely ever a reason to set explicit socket receive
buffer sizes, since the kernel dynamically sizes them based
upon how the connection is used.

Why do they set it so low?

It is just as easy to fix their performance bug by simply removing
SO_RCVBUF setting in the application.
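
For reference, the application-side setting in question looks roughly like the sketch below. The helper name `rcvbuf_after_request` is made up for illustration; the socket calls themselves are the standard ones. Passing 0 leaves the buffer under kernel control, and a positive value pins it with SO_RCVBUF, which on Linux also turns off receive buffer autotuning for that socket.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Report the receive buffer size of a fresh TCP socket, optionally after
 * requesting an explicit size (request > 0). Returns -1 on any error.
 */
static int rcvbuf_after_request(int request)
{
	int fd, val = 0;
	socklen_t len = sizeof(val);

	assert(request >= 0);
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* An explicit SO_RCVBUF overrides the kernel's dynamic sizing. */
	if (request > 0 &&
	    setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
		       &request, sizeof(request)) < 0) {
		close(fd);
		return -1;
	}

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
		val = -1;
	close(fd);
	return val;
}
```

The value read back may differ from the one requested, since the kernel applies its own minimum and bookkeeping overhead.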


Re: Netem tfifo implementation

2007-03-02 Thread Patrick McHardy
Ritesh Kumar wrote:
> Hi,
>I recently saw the qdisc "tfifo" in the netem module
> (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
> to 2.6.20. As I understand, tfifo helps in keeping the queue of
> packets sorted according to their "time_to_send". [tfifo was not
> present in 2.6.14 perhaps because arrival order of packets was always
> equal to the departure order]. However, tfifo uses a linear search in
> the packet queue to find where to enqueue the packet.
>Quite some time ago (2.6.14 era), I needed a similar functionality
> from the netem module and I ended up coding a pointer based min-heap
> for the same. I was wondering if the community was interested in using
> the min-heap implementation to replace the linear search
> implementation. I have tested the min-heap quite a few times and it
> seems to work.
>The implementation is slightly non-trivial because it uses
> pointers to maintain the heap structure instead of using good old
> fixed size arrays. I did this mainly so that the limit of the netem
> qdisc could be changed on the fly. However, because every sk_buff now
> needs two pointers for its children nodes, I added an extra
> (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
> be changed to a pointer inside netem_skb_cb.  Also, because I needed
> this for personal work and 2.6.14 didn't contain tfifo, I basically
> removed the embedded qdisc and made netem a classless qdisc with my
> min heap as the native "queue" (sorry again! :) )

The tfifo qdisc has a limit, why not just allocate a fixed-size heap
based on that?
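
A fixed-size variant could look something like the following userspace sketch. It is not netem code: `HEAP_LIMIT`, `struct entry` and the function names are placeholders, and a real qdisc would store sk_buff pointers rather than bare timestamps. It keeps the same ordering rule as the posted patch, time_to_send first and arrival order as the tie-breaker, so packets with equal send times leave in the order they arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_LIMIT 1000	/* stand-in for the qdisc limit */

struct entry {
	int64_t  time_to_send;
	uint64_t arrival_order;	/* tie-breaker for stable ordering */
};

struct fixed_heap {
	struct entry slot[HEAP_LIMIT];
	size_t size;
	uint64_t num_arrivals;
};

static int entry_less(const struct entry *a, const struct entry *b)
{
	if (a->time_to_send != b->time_to_send)
		return a->time_to_send < b->time_to_send;
	return a->arrival_order < b->arrival_order;
}

static void entry_swap(struct entry *a, struct entry *b)
{
	struct entry t = *a;

	*a = *b;
	*b = t;
}

/* Returns 0 on success, -1 when full (the packet would be dropped). */
static int heap_push(struct fixed_heap *h, int64_t time_to_send)
{
	size_t i;

	assert(h != NULL);
	if (h->size >= HEAP_LIMIT)
		return -1;
	i = h->size++;
	h->slot[i].time_to_send = time_to_send;
	h->slot[i].arrival_order = h->num_arrivals++;
	/* Sift the new entry up while it sorts before its parent. */
	while (i > 0 && entry_less(&h->slot[i], &h->slot[(i - 1) / 2])) {
		entry_swap(&h->slot[i], &h->slot[(i - 1) / 2]);
		i = (i - 1) / 2;
	}
	return 0;
}

/* Removes the earliest entry into *out; returns -1 if empty. */
static int heap_pop(struct fixed_heap *h, struct entry *out)
{
	size_t i = 0;

	assert(h != NULL && out != NULL);
	if (h->size == 0)
		return -1;
	*out = h->slot[0];
	h->slot[0] = h->slot[--h->size];
	/* Sift the moved entry down to restore the heap property. */
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, m = i;

		if (l < h->size && entry_less(&h->slot[l], &h->slot[m]))
			m = l;
		if (r < h->size && entry_less(&h->slot[r], &h->slot[m]))
			m = r;
		if (m == i)
			break;
		entry_swap(&h->slot[i], &h->slot[m]);
		i = m;
	}
	return 0;
}
```

Parent and child positions are computed from the index, so no extra pointer is needed per packet.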



Re: [RFC] Arp announce (for Xen)

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 10:29:53 -0800
Ben Greear <[EMAIL PROTECTED]> wrote:

> Pekka Savola wrote:
> > On Thu, 1 Mar 2007, Stephen Hemminger wrote:
> >> What about implementing the unused arp_announce flag on the inetdevice?
> >> Something like the following.  Totally untested...
> >>
> >> Looks like it either was there (and got removed) or was planned but
> >> never implemented.
> IN_DEV_ARP_ANNOUNCE is in 2.6.18, at least..used in arp_solicit in arp.c
> 
> I really hope this didn't get removed because I find it very useful!
> 
> But, you could certainly add another sysctl...
> 
> Thanks,
> Ben
> 

yeah, something new like arp_notify? or arp_gratuitous

There are other drivers that do their own arp, they need to be fixed.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: SWS for rcvbuf < MTU

2007-03-02 Thread Alex Sidorenko
On March 2, 2007 02:25:42 pm David Miller wrote:
> From: Alex Sidorenko <[EMAIL PROTECTED]>
> Date: Fri, 2 Mar 2007 11:28:28 -0500
>
> > Customer has confirmed that this resolves the problem and decreases
> > CPU usage by his custom application - even when there is no SWS.
>
> There is rarely ever a reason to set explicit socket receive
> buffer sizes, since the kernel dynamically sizes them based
> upon how the connection is used.
>
> Why do they set it so low?
>
> It is just as easy to fix their performance bug by simply removing
> SO_RCVBUF setting in the application.

Hi David,

they told us that they use small rcvbuf to throttle bandwidth for this 
application. I explained it would be better to use TC for this purpose. They 
agreed and will probably redesign their application in the future, but they 
cannot do it right now. For the same reason they have to use the old 2.4.20 
for a while - in big companies the important production software cannot be 
changed quickly. 

The fix I suggested is trivial and should have no impact on the case of 
rcvbuf > MTU, so I think it makes sense to include it in the upstream kernel.

Regards,
Alex


-- 
--
Alexandre Sidorenko email: [EMAIL PROTECTED]
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
--


Re: Network access fails unless tcpdump is running?

2007-03-02 Thread Andy Gospodarek
On Thu, Mar 01, 2007 at 06:27:18PM -0500, Marc D Ronell wrote:
> That's correct. It's the wired interface, eth0, which is having the
> problem.  I have turned the wireless interface, eth2 off with both
> ifconfig and ifdown, and still, the connection to the outside only
> works when tcpdump is running.
> 

Good to know.

> > Can you post the output from `ethtool -i ethX` (where ethX is the wired
> > interface).  I ask because that tells me what version of the b44/ipw3945
> > driver you are using.
> >
> >
> 
> # ethtool -i eth0
> driver: b44
> version: 1.01
> firmware-version:
> bus-info: :03:00.0
> 
> 
> The system was originally working fine, but something changed.
> Perhaps through a Debian aptitude update.

Any chance you can boot back to the old kernel (the one where it was
working) and run ethtool -i eth0 on that one to see what version of
the driver was used there?  It's hard to know what may have changed
between the 2 versions of the driver since I don't know the starting
point.

It's also hard to know if this is fixed already since you aren't running
the latest upstream kernel.  Downloading, building, and testing the
latest from kernel.org would be a good way to know if this is already
fixed.



Re: SWS for rcvbuf < MTU

2007-03-02 Thread Alex Sidorenko
On March 2, 2007 01:54:45 pm John Heffner wrote:
> Alex Sidorenko wrote:
> [snip]
>
> > --- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
> > +++ net/ipv4/tcp_output.c   Tue Jan 30 14:24:56 2007
> > @@ -641,6 +641,7 @@
> >   * Note, we don't "adjust" for TIMESTAMP or SACK option bytes.
> >   * Regular options like TIMESTAMP are taken into account.
> >   */
> > +static const char *SWS_id_string="@#SWS-fix-2";
> >  u32 __tcp_select_window(struct sock *sk)
> >  {
> > struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
> > @@ -682,6 +683,9 @@
> > window = tp->rcv_wnd;
> > if (window <= free_space - mss || window > free_space)
> > window = (free_space/mss)*mss;
> > +/* A fix for small rcvbuf [EMAIL PROTECTED] */
> > +   else if (mss == full_space && window < full_space/2)
> > +   window = full_space/2;
> >
> > return window;
> >  }
>
> Good analysis of the problem, but the patch does not look quite right.
> In particular, you can't ever announce a zero window. :)

Hi John,

in case when (free_space < full_space/2) we do not reach the modified code and
we will return zero:

	if (free_space < full_space/2) {
		icsk->icsk_ack.quick = 0;
		if (tcp_memory_pressure)
			tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U*tp->advmss);
		if (free_space < mss)
			return 0;
	}

Here is how windows look with the fixed kernel (from customer's test):

20:59:45.320758 Node1.logical.40171 > 11.0.0.1.39909: win = 708
20:59:45.322758 Node1.logical.40171 > 11.0.0.1.39909: win = 288
20:59:45.714567 Node1.logical.40171 > 11.0.0.1.39909: win = 354
20:59:45.717110 Node1.logical.40171 > 11.0.0.1.39909: win = 0
20:59:45.719110 Node1.logical.40171 > 11.0.0.1.39909: win = 708
...

Regards,
Alex

> I think this attached patch does the correct SWS avoidance.
>
> Thanks,
>-John



-- 
--
Alexandre Sidorenko email: [EMAIL PROTECTED]
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
--


Re: SWS for rcvbuf < MTU

2007-03-02 Thread David Miller
From: Alex Sidorenko <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 15:21:58 -0500

> they told us that they use small rcvbuf to throttle bandwidth for this 
> application. I explained it would be better to use TC for this purpose. They 
> agreed and will probably redesign their application in the future, but they 
> cannot do it right now. For the same reason they have to use the old 2.4.20 
> for a while - in big companies the important production software cannot be 
> changed quickly. 
> 
> The fix I suggested is trivial and should have no impact on the case of 
> rcvbuf > MTU, so I think it makes sense to include it in the upstream kernel.

I have no objection to the fix, especially John's version.

I was just curious about the app, thanks for the info :)


Re: Extensible hashing and RCU

2007-03-02 Thread Michael K. Edwards

On 3/2/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:

> Thank you for this report. (Still avoiding cache misses studies, while they
> obviously are the limiting factor)


1)  The entire point of going to a tree-like structure would be to
allow the leaves to age out of cache (or even forcibly evict them)
when the structure bloats (generally under DDoS attack), on the theory
that most of them are bogus and won't be referenced again.  It's not
about the speed of the data structure -- it's about managing its
impact on the rest of the system.

2)  The other entire point of going to a tree-like structure is that
they're drastically simpler to RCU than hashes, and more generally
they don't involve individual atomic operations (RCU reaping passes,
resizing, etc.) that cause big latency hiccups and evict a bunch of
other stuff from cache.

3)  The third entire point of going to a tree-like structure is to
have a richer set of efficient operations, since you can give them a
second "priority"-type index and have "pluck-highest-priority-item",
three-sided search, and bulk delete operations.  These aren't that
much harder to RCU than the basic modify-existing-node operation.

Now can we give these idiotic micro-benchmarks a rest until Robert's
implementation is tuned and ready for stress-testing?

Cheers,
- Michael


Re: Fix bugs in "Whether sock accept queue is full" checking

2007-03-02 Thread David Miller
From: weidong <[EMAIL PROTECTED]>
Date: Wed, 14 Feb 2007 11:30:57 -0500

> diff -ruN old/include/net/sock.h new/include/net/sock.h
> --- old/include/net/sock.h2007-02-03 08:38:21.0 -0500
> +++ new/include/net/sock.h2007-02-03 08:38:30.0 -0500
> @@ -426,7 +426,7 @@
>  
>  static inline int sk_acceptq_is_full(struct sock *sk)
>  {
> - return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
> + return sk->sk_ack_backlog >= sk->sk_max_ack_backlog;
>  }
>  
>  /*

I've applied this patch, and also fixed a similar case
I spotted in AF_UNIX after doing a quick audit.

Thank you.

commit 626d548a8d145a032cff9237245f8ac9d9056ac1
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Fri Mar 2 12:49:23 2007 -0800

[AF_UNIX]: Test against sk_max_ack_backlog properly.

This brings things in line with the sk_acceptq_is_full() bug
fix.  The limit test should be x >= sk_max_ack_backlog.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 6069716..51ca438 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -934,7 +934,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo)
 
sched = !sock_flag(other, SOCK_DEAD) &&
!(other->sk_shutdown & RCV_SHUTDOWN) &&
-   (skb_queue_len(&other->sk_receive_queue) >
+   (skb_queue_len(&other->sk_receive_queue) >=
 other->sk_max_ack_backlog);
 
unix_state_runlock(other);
@@ -1008,7 +1008,7 @@ restart:
if (other->sk_state != TCP_LISTEN)
goto out_unlock;
 
-   if (skb_queue_len(&other->sk_receive_queue) >
+   if (skb_queue_len(&other->sk_receive_queue) >=
other->sk_max_ack_backlog) {
err = -EAGAIN;
if (!timeo)
@@ -1381,7 +1381,7 @@ restart:
}
 
if (unix_peer(other) != sk &&
-   (skb_queue_len(&other->sk_receive_queue) >
+   (skb_queue_len(&other->sk_receive_queue) >=
 other->sk_max_ack_backlog)) {
if (!timeo) {
err = -EAGAIN;
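
The off-by-one is easy to see in a toy model. The function below is purely illustrative (the name `admitted` and its parameters are invented here): it counts how many pending connections a listener ends up queueing when the "full" test uses `>` versus `>=`.

```c
#include <assert.h>

/*
 * Toy listener: try to queue `attempts` connections against a limit of
 * `max_backlog`, using either the old (>) or the fixed (>=) full test.
 */
static unsigned int admitted(unsigned int max_backlog,
			     unsigned int attempts, int use_ge)
{
	unsigned int queued = 0, i;

	for (i = 0; i < attempts; i++) {
		int full = use_ge ? (queued >= max_backlog)
				  : (queued > max_backlog);

		if (!full)
			queued++;
	}
	/* Neither test can let the queue grow past max_backlog + 1. */
	assert(queued <= max_backlog + 1);
	return queued;
}
```

With a limit of 5 and many attempts, the `>` test admits 6 entries while the `>=` test stops at 5.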


Re: Network access fails unless tcpdump is running?

2007-03-02 Thread Marc D Ronell
"Andy Gospodarek" <[EMAIL PROTECTED]> writes:

>
> Any chance you can boot back to the old kernel (the one where it was
> working) and run ethtool -i eth0 on that one to see what version of
> the driver was used there?  It's hard to know what may have changed
> between the 2 versions of the driver since I don't know the starting
> point.
>
> It's also hard to know if this is fixed already since you aren't running
> the latest upstream kernel.  Downloading, building, and testing the
> latest from kernel.org would be a good way to know if this is already
> fixed.
>

I had already loaded,  compiled, and tested linux-2.6.20.1.  There was
no change with the newer kernel.  Network connections only worked when
tcpdump was running.

Similar for booting with an  older kernel 2.6.17.  I think the problem
is not with the kernel, but with other system software.  It could take
a while to debug, so I am just rebuilding.

Thanks for your help.

marc




Re: Netem tfifo implementation

2007-03-02 Thread Ritesh Kumar

On 3/2/07, Patrick McHardy <[EMAIL PROTECTED]> wrote:

> Ritesh Kumar wrote:
> > Hi,
> >    I recently saw the qdisc "tfifo" in the netem module
> > (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
> > to 2.6.20. As I understand, tfifo helps in keeping the queue of
> > packets sorted according to their "time_to_send". [tfifo was not
> > present in 2.6.14 perhaps because arrival order of packets was always
> > equal to the departure order]. However, tfifo uses a linear search in
> > the packet queue to find where to enqueue the packet.
> >    Quite some time ago (2.6.14 era), I needed a similar functionality
> > from the netem module and I ended up coding a pointer based min-heap
> > for the same. I was wondering if the community was interested in using
> > the min-heap implementation to replace the linear search
> > implementation. I have tested the min-heap quite a few times and it
> > seems to work.
> >    The implementation is slightly non-trivial because it uses
> > pointers to maintain the heap structure instead of using good old
> > fixed size arrays. I did this mainly so that the limit of the netem
> > qdisc could be changed on the fly. However, because every sk_buff now
> > needs two pointers for its children nodes, I added an extra
> > (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
> > be changed to a pointer inside netem_skb_cb.  Also, because I needed
> > this for personal work and 2.6.14 didn't contain tfifo, I basically
> > removed the embedded qdisc and made netem a classless qdisc with my
> > min heap as the native "queue" (sorry again! :) )
>
> The tfifo qdisc has a limit, why not just allocate a fixed-size heap
> based on that?




The tfifo queue limit itself can be changed and that creates the
problem. If we use a fixed heap (say implemented using a fixed size
array) then we will have to copy over all pointers from the first
array to a reallocated array whenever the queue limit is changed.
In retrospect, moving just a few 10s of kilobytes of data doesn't seem
that much of a problem... now I feel stupid having put so much effort
:).

Ritesh


Re: [PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Stephen Hemminger
On Fri, 2 Mar 2007 15:38:41 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

> We currently use a special structure (struct skb_timeval) and plain 'struct 
> timeval' to store packet timestamps in sk_buffs and struct sock.
> 
> This has some drawbacks :
> - Fixed resolution of one microsecond.
> - Waste of space on 64bit platforms where sizeof(struct timeval)=16
> 
> I suggest using ktime_t that is a nice abstraction of high resolution time 
> services, currently capable of nanosecond resolution.
> 
> As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits an
> 8-byte shrink of this structure on 64bit architectures. Some other
> structures also benefit from this size reduction (struct ipq in
> ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)
> 
> 

You missed a couple of spots.

--- tcp-2.6.orig/net/sunrpc/svcsock.c   2007-03-02 12:50:45.0 -0800
+++ tcp-2.6/net/sunrpc/svcsock.c2007-03-02 12:58:28.0 -0800
@@ -805,16 +805,9 @@
/* possibly an icmp error */
dprintk("svc: recvfrom returned error %d\n", -err);
}
-   if (skb->tstamp.off_sec == 0) {
-   struct timeval tv;
 
-   tv.tv_sec = xtime.tv_sec;
-   tv.tv_usec = xtime.tv_nsec / NSEC_PER_USEC;
-   skb_set_timestamp(skb, &tv);
-   /* Don't enable netstamp, sunrpc doesn't
-  need that much accuracy */
-   }
-   skb_get_timestamp(skb, &svsk->sk_sk->sk_stamp);
+   svsk->sk_sk->sk_stamp = (skb->tstamp.tv64 != 0) ? skb->tstamp
+   : ktime_get_real();
set_bit(SK_DATA, &svsk->sk_flags); /* there may be more data... */
 
/*
--- tcp-2.6.orig/kernel/time.c  2007-03-02 12:59:55.0 -0800
+++ tcp-2.6/kernel/time.c   2007-03-02 13:00:08.0 -0800
@@ -469,6 +469,8 @@
 
return tv;
 }
+EXPORT_SYMBOL(ns_to_timeval);
+
 
 /*
  * Convert jiffies to milliseconds and back.
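
The fallback in the sunrpc hunk, and the conversion that needs the export, can be mimicked in userspace along these lines. This is a sketch only: `ns_stamp_t`, `now_real_ns`, `socket_stamp` and `ns_to_tv` are illustrative names standing in for ktime_t, ktime_get_real() and ns_to_timeval(), and the conversion here handles non-negative values only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Userspace stand-in for a nanosecond stamp; 0 means "not stamped". */
typedef int64_t ns_stamp_t;

/* Hypothetical helper: current wall-clock time in nanoseconds. */
static ns_stamp_t now_real_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return (ns_stamp_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Keep the packet's stamp if it has one, otherwise read the clock. */
static ns_stamp_t socket_stamp(ns_stamp_t pkt_stamp)
{
	return pkt_stamp != 0 ? pkt_stamp : now_real_ns();
}

/* Convert non-negative nanoseconds for microsecond-based callers. */
static struct timeval ns_to_tv(ns_stamp_t ns)
{
	struct timeval tv;

	assert(ns >= 0);
	tv.tv_sec = ns / 1000000000LL;
	tv.tv_usec = (ns % 1000000000LL) / 1000;
	return tv;
}
```

Storing a single 64-bit nanosecond count and converting only at the edges is what allows the stored stamp to shrink to 8 bytes.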



-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Netem tfifo implementation

2007-03-02 Thread Stephen Hemminger
On Fri, 2 Mar 2007 15:56:54 -0500
"Ritesh Kumar" <[EMAIL PROTECTED]> wrote:

> On 3/2/07, Patrick McHardy <[EMAIL PROTECTED]> wrote:
> > Ritesh Kumar wrote:
> > > Hi,
> > >I recently saw the qdisc "tfifo" in the netem module
> > > (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
> > > to 2.6.20. As I understand, tfifo helps in keeping the queue of
> > > packets sorted according to their "time_to_send". [tfifo was not
> > > present in 2.6.14 perhaps because arrival order of packets was always
> > > equal to the departure order]. However, tfifo uses a linear search in
> > > the packet queue to find where to enqueue the packet.
> > >Quite some time ago (2.6.14 era), I needed a similar functionality
> > > from the netem module and I ended up coding a pointer based min-heap
> > > for the same. I was wondering if the community was interested in using
> > > the min-heap implementation to replace the linear search
> > > implementation. I have tested the min-heap quite a few times and it
> > > seems to work.
> > >The implementation is slightly non-trivial because it uses
> > > pointers to maintain the heap structure instead of using good old
> > > fixed size arrays. I did this mainly so that the limit of the netem
> > > qdisc could be changed on the fly. However, because every sk_buff now
> > > needs two pointers for its children nodes, I added an extra
> > > (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
> > > be changed to a pointer inside netem_skb_cb.  Also, because I needed
> > > this for personal work and 2.6.14 didn't contain tfifo, I basically
> > > removed the embedded qdisc and made netem a classless qdisc with my
> > > min heap as the native "queue" (sorry again! :) )
> >
> > The tfifo qdisc has a limit, why not just allocate a fixed-size heap
> > based on that?
> >
> >
> 
> The tfifo queue limit itself can be changed and that creates the
> problem. If we use a fixed heap (say implemented using a fixed size
> array) then we will have to copy over all pointers from the first
> array to a reallocated array whenever the queue limit is changed.
> In retrospect, moving just a few 10s of kilobytes of data doesn't seem
> that much of a problem... now I feel stupid having put so much effort
> :).
> 

Tfifo is a special case because:
  * timestamps are stored in skb->cb, so it is only really usable inside
    netem, which adds the timestamps.
  * insertions are cheap because it walks backwards and netem usually has
    tnext > tlast.  Only with huge jitter that causes massive reordering,
    which is unrealistic, would you see a problem.

You can always make a new qdisc and since netem is classless use yours.
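
For illustration, the backwards walk can be sketched in userspace C. This is
only a sketch under assumptions: struct pkt, struct pkt_queue, queue_init()
and tfifo_insert() are hypothetical names standing in for the skb list and
netem's time_to_send; it is not the code from sch_netem.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb that carries netem's time_to_send. */
struct pkt {
	long long time_to_send;
	struct pkt *prev, *next;
};

struct pkt_queue {
	struct pkt head;	/* sentinel: head.next is first, head.prev is last */
	unsigned int len;
};

static void queue_init(struct pkt_queue *q)
{
	q->head.prev = q->head.next = &q->head;
	q->len = 0;
}

/*
 * Insert p so the queue stays sorted by time_to_send, scanning from the
 * tail.  Returns how many queued packets had to be stepped over.
 */
static unsigned int tfifo_insert(struct pkt_queue *q, struct pkt *p)
{
	struct pkt *pos = q->head.prev;
	unsigned int steps = 0;

	assert(p != NULL);
	while (pos != &q->head && pos->time_to_send > p->time_to_send) {
		pos = pos->prev;
		steps++;
	}

	/* Link p right after pos; equal timestamps keep arrival order. */
	p->prev = pos;
	p->next = pos->next;
	pos->next->prev = p;
	pos->next = p;
	q->len++;
	return steps;
}
```

When timestamps mostly increase, the loop exits on its first test, so the
common insert costs a constant amount of work; only heavy reordering makes
the walk long.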


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: SWS for rcvbuf < MTU

2007-03-02 Thread John Heffner

David Miller wrote:
> From: Alex Sidorenko <[EMAIL PROTECTED]>
> Date: Fri, 2 Mar 2007 15:21:58 -0500
>
> > they told us that they use small rcvbuf to throttle bandwidth for this
> > application. I explained it would be better to use TC for this purpose.
> > They agreed and will probably redesign their application in the future,
> > but they cannot do it right now. For the same reason they have to use the
> > old 2.4.20 for a while - in big companies the important production
> > software cannot be changed quickly.
> >
> > The fix I suggested is trivial and should have no impact on the case of
> > rcvbuf > MTU, so I think it makes sense to include it in the upstream
> > kernel.
>
> I have no objection to the fix, especially John's version.
>
> I was just curious about the app, thanks for the info :)


Please don't apply the patch I sent.  I've been thinking about this a 
bit harder, and it may not fix this particular problem.  (Hard to say 
without knowing exactly what it is.)  As the comment above 
__tcp_select_window() states, we do not do full receive-side SWS 
avoidance because of header prediction.


Alex, you're right I missed that special zero-window case.  I'm still 
not quite sure I'm completely happy with this patch.  I'd like to think 
about this a little bit harder...


Thanks,
  -John


Re: [PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface

2007-03-02 Thread David Miller
From: Florian Zumbiehl <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 13:38:44 +0100

> As no one seems to have an opinion on this: Here is a patch that does
> work for me and that should solve the problem as far as that is easily
> possible. It is based on the assumption that an interface's ifindex is
> basically an alias for a local MAC address, so incoming packets now are
> matched to sockets based on remote MAC, session id, and ifindex of the
> interface the packet came in on/the socket was bound to by connect().

I agree with your analysis and have applied your patch.

Another way to implement this would have been to store the
pre-computed ifindex on the kernel side sockaddr.

Thanks.


Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-03-02 Thread David Miller
From: James Morris <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 15:45:07 -0500 (EST)

> On Wed, 28 Feb 2007, Paul Moore wrote:
> 
> > The current CIPSO engine has a problem where it does not verify that the
> > given sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI
> > type is used.  The end result is that bad packets are sent on the wire which
> > should have never been sent in the first place.  This patch corrects this
> > problem by verifying the sensitivity level mapping similar to what is done
> > with the category mapping.  This patch also changes the returned error code
> > in this case to -EPERM to better match what the category mapping
> > verification code returns.
> > 
> > Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
> 
> [removed redhat-lspp, which is subscriber only]
> 
> Acked-by: James Morris <[EMAIL PROTECTED]>

Applied, thanks everyone.

If -stable inclusion is desired, please submit this patch there.
You can add my signoff if you want:

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>


Re: PATCH: Second try at vlan mailing list patch.

2007-03-02 Thread David Miller
From: Ben Greear <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 15:25:59 -0800

> Hopefully, by attaching it as a file it will not screw up the tabs & spaces.
> 
> Signed-off-by:  Ben Greear <[EMAIL PROTECTED]>

Nope still doesn't apply.

I can guess that you didn't try emailing the patch to yourself and
applying it?  If so I'm basically still your guinea pig each time you
correct this problem.  How nice that is :-/


Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 17:18:46 -0800

> I was measuring bridging/routing performance and noticed this.
> 
> The current code runs the "all packet" type handlers before calling the
> bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
> this means that each received packet gets run through the Berkeley Packet
> Filter code in sk_run_filter (slow).

I know we closed this out by saying that even though performance
sucks, we can't really apply this without breaking things.

What would be broken is if the DHCP client isn't specifying
a device ifindex when it binds the AF_PACKET socket.  That
would be an easy way to fix this performance problem at the
application level.

The DHCP client should only care about a particular interface's
traffic, the one it wants to listen on.
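
To illustrate, here is a minimal userspace sketch of the device check that
the "all packet" delivery loop applies: a handler with no device sees every
packet, a handler bound to a device sees only that device's packets.  The
names struct net_dev_sketch, struct tap, tap_matches() and deliver_to_taps()
are hypothetical, and the sketch models only the match, not the socket
filter that runs afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct net_dev_sketch {
	int ifindex;
};

/* Hypothetical stand-in for an "all packet" handler such as a tap socket. */
struct tap {
	const struct net_dev_sketch *dev;	/* NULL: listen on every device */
	unsigned long delivered;
};

/* Unbound taps match any receiving device; bound taps match only their own. */
static int tap_matches(const struct tap *t, const struct net_dev_sketch *rx)
{
	return !t->dev || t->dev == rx;
}

/* Hand one received packet to every matching tap; returns how many took it. */
static unsigned int deliver_to_taps(struct tap *taps, size_t n,
				    const struct net_dev_sketch *rx)
{
	unsigned int hits = 0;
	size_t i;

	assert(rx != NULL);
	for (i = 0; i < n; i++) {
		if (tap_matches(&taps[i], rx)) {
			taps[i].delivered++;	/* the socket filter would run here */
			hits++;
		}
	}
	return hits;
}
```

A tap bound to eth0 is skipped for packets arriving on any other port, so
it never pays the filter cost for them.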


Re: [PATCH 1/2] [TCP]: Add two new spurious RTO responses to FRTO

2007-03-02 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Thu, 1 Mar 2007 13:30:20 +0200 (EET)

> [PATCH] [TCP]: Complete icsk-to-local-variable change (in tcp_enter_cwr)
> 
> A local variable for icsk was created but this change was
> missing. Spotted by Jarek Poplawski.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied to tcp-2.6, thank you.


Re: [RFC PATCH] [TCP]: Move clearing of the prior_ssthresh due to ECE earlier

2007-03-02 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Thu, 1 Mar 2007 22:26:57 +0200 (EET)

> I think that doing it in the response is better than this approach,
> since it knows that the ssthresh has been halved already within that
> round-trip, so there is no need to do that again... I'll submit the
> patch tomorrow... With this prior_ssthresh clearing move alone, the
> ssthresh ends up being halved twice if I thought it right (first in
> tcp_enter_frto and then again in tcp_enter_cwr that is called from
> fastretrans_alert)... So please, drop this patch.

Ok.


Re: [PATCH v2] [TCP]: FRTO undo response falls back to ratehalving one if ECEd

2007-03-02 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 14:34:36 +0200 (EET)

> Undoing ssthresh is disabled in fastretrans_alert whenever
> FLAG_ECE is set by clearing prior_ssthresh. The clearing does
> not protect FRTO because FRTO operates before fastretrans_alert.
> Moving the clearing of prior_ssthresh earlier seems to be a
> suboptimal solution to the FRTO case because then FLAG_ECE will
> cause a second ssthresh reduction in try_to_open (the first
> occurred when FRTO was entered). So instead, FRTO falls back
> immediately to the rate halving response, which switches TCP to
> CA_CWR state preventing the latter reduction of ssthresh.
> 
> If the first ECE arrived before the ACK after which FRTO is able
> to decide RTO as spurious, prior_ssthresh is already cleared.
> Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
> set also in the following ACKs resulting in rate halving response
> that sees TCP is already in CA_CWR, which again prevents an extra
> ssthresh reduction on that round-trip.
> 
> If the first ECE arrived before RTO, ssthresh has already been
> adapted and prior_ssthresh remains cleared on entry because TCP
> is in CA_CWR (the same applies also to a case where FRTO is
> entered more than once and ECE comes in the middle).
> 
> High_seq must not be touched after tcp_enter_cwr because CWR
> round-trip calculation depends on it.
> 
> I believe that after this patch, FRTO should be ECN-safe and
> even able to take advantage of synergy benefits.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied, but I had to apply this by hand, you did not generate
this diff against tcp-2.6

And I'm very angry about this specific case because I told you
EXPLICITLY that I reformatted the switch() statement when I applied
the earlier FRTO patches.

Not only are people expected to patch against tcp-2.6, BUT I TOLD
YOU specifically that I modified your patch in this specific area.

What else do I need to do in order for people to generate clean
patches? :-(  Tell me, I'll do it!!!



Re: SWS for rcvbuf < MTU

2007-03-02 Thread David Miller
From: John Heffner <[EMAIL PROTECTED]>
Date: Fri, 02 Mar 2007 16:16:39 -0500

> Please don't apply the patch I sent.  I've been thinking about this a 
> bit harder, and it may not fix this particular problem.  (Hard to say 
> without knowing exactly what it is.)  As the comment above 
> __tcp_select_window() states, we do not do full receive-side SWS 
> avoidance because of header prediction.
> 
> Alex, you're right I missed that special zero-window case.  I'm still 
> not quite sure I'm completely happy with this patch.  I'd like to think 
> about this a little bit harder...

Ok


Re: [PATCH] vlan & net drivers: avoid a 4-order allocation]

2007-03-02 Thread David Miller
From: Dan Aloni <[EMAIL PROTECTED]>
Date: Thu, 1 Mar 2007 12:02:17 +0200

> This patch splits the vlan_group struct into a multi-allocated struct. On
> x86_64, the size of the original struct is a little more than 32KB, causing
> a 4-order allocation, which is prone to problems caused by buddy-system
> external fragmentation conditions.
> 
> I couldn't just use vmalloc() because vfree() cannot be called in the
> softirq context of the RCU callback.
> 
> Signed-off-by: Dan Aloni <[EMAIL PROTECTED]>

No objections, this really needs to be fixed, applied.

Thank you.
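
For readers following along, the idea of the split can be sketched in
userspace C.  The sketch assumes 4096 VLAN IDs and 8-byte pointers, so a
flat array is 32 KB; with 4 KB pages that is already 8 pages (order 3), and
anything slightly larger rounds up to 16 pages (order 4).  Splitting it into
8 parts of 512 pointers makes each part exactly one page.  All names below
(struct vlan_group_sketch, SPLIT_PARTS, group_alloc() and so on) are
hypothetical and calloc() stands in for the kernel allocator; this is not
the applied patch.

```c
#include <assert.h>
#include <stdlib.h>

#define VLAN_IDS	4096				/* 12-bit VLAN ID space */
#define SPLIT_PARTS	8				/* hypothetical split factor */
#define PART_LEN	(VLAN_IDS / SPLIT_PARTS)	/* 512 slots per part */

struct vdev;	/* opaque stand-in for struct net_device */

struct vlan_group_sketch {
	struct vdev **parts[SPLIT_PARTS];	/* each part allocated separately */
};

static void group_free(struct vlan_group_sketch *g)
{
	int i;

	for (i = 0; i < SPLIT_PARTS; i++) {
		free(g->parts[i]);
		g->parts[i] = NULL;
	}
}

/* Returns 0 on success, -1 if any part could not be allocated. */
static int group_alloc(struct vlan_group_sketch *g)
{
	int i;

	for (i = 0; i < SPLIT_PARTS; i++)
		g->parts[i] = NULL;
	for (i = 0; i < SPLIT_PARTS; i++) {
		g->parts[i] = calloc(PART_LEN, sizeof(struct vdev *));
		if (!g->parts[i]) {
			group_free(g);
			return -1;
		}
	}
	return 0;
}

static struct vdev *group_get(const struct vlan_group_sketch *g, unsigned int id)
{
	assert(id < VLAN_IDS);
	return g->parts[id / PART_LEN][id % PART_LEN];
}

static void group_set(struct vlan_group_sketch *g, unsigned int id,
		      struct vdev *d)
{
	assert(id < VLAN_IDS);
	g->parts[id / PART_LEN][id % PART_LEN] = d;
}
```

Lookups cost one extra division and one extra pointer load, and since every
part is an ordinary small allocation it can be freed from any context where
the original single array could.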


[PATCH] udp: whitespace fixes

2007-03-02 Thread Stephen Hemminger

The udp code is full of bad indenting, extra whitespace and other
style confusion.  It makes no sense to declare functions that are used
outside the current file (extern) as inline.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/ipv4/udp.c |  402 -
 net/ipv6/udp.c |  175 +---
 2 files changed, 295 insertions(+), 282 deletions(-)

--- tcp-2.6.orig/net/ipv4/udp.c 2007-03-02 12:08:06.0 -0800
+++ tcp-2.6/net/ipv4/udp.c  2007-03-02 12:37:38.0 -0800
@@ -120,8 +120,8 @@
struct hlist_node *node;
 
sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
-   if (sk->sk_hash == num)
-   return 1;
+   if (sk->sk_hash == num)
+   return 1;
return 0;
 }
 
@@ -136,13 +136,13 @@
  */
 int __udp_lib_get_port(struct sock *sk, unsigned short snum,
   struct hlist_head udptable[], int *port_rover,
-  int (*saddr_comp)(const struct sock *sk1,
-const struct sock *sk2 ))
+  int (*saddr_comp) (const struct sock * sk1,
+ const struct sock * sk2))
 {
struct hlist_node *node;
struct hlist_head *head;
struct sock *sk2;
-   interror = 1;
+   int error = 1;
 
write_lock_bh(&udp_hash_lock);
if (snum == 0) {
@@ -160,8 +160,9 @@
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
-   ((result - sysctl_local_port_range[0]) &
-(UDP_HTABLE_SIZE - 1));
+   ((result -
+ sysctl_local_port_range[0]) &
+(UDP_HTABLE_SIZE - 1));
goto gotit;
}
size = 0;
@@ -175,12 +176,13 @@
;
}
result = best;
-   for(i = 0; i < (1 << 16) / UDP_HTABLE_SIZE; i++, result += UDP_HTABLE_SIZE) {
+   for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE;
+i++, result += UDP_HTABLE_SIZE) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0]
-   + ((result - sysctl_local_port_range[0]) &
-  (UDP_HTABLE_SIZE - 1));
-   if (! __udp_lib_lport_inuse(result, udptable))
+   + ((result - sysctl_local_port_range[0]) &
+  (UDP_HTABLE_SIZE - 1));
+   if (!__udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
@@ -191,13 +193,13 @@
head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
 
sk_for_each(sk2, node, head)
-   if (sk2->sk_hash == snum &&
-   sk2 != sk&&
-   (!sk2->sk_reuse|| !sk->sk_reuse) &&
-   (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
-|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-   (*saddr_comp)(sk, sk2) )
-   goto fail;
+   if (sk2->sk_hash == snum &&
+   sk2 != sk &&
+   (!sk2->sk_reuse || !sk->sk_reuse) &&
+   (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
+|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
+   (*saddr_comp) (sk, sk2))
+   goto fail;
}
inet_sk(sk)->num = snum;
sk->sk_hash = snum;
@@ -212,19 +214,19 @@
return error;
 }
 
-__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
-   int (*scmp)(const struct sock *, const struct sock *))
+int udp_get_port(struct sock *sk, unsigned short snum,
+int (*scmp) (const struct sock *, const struct sock *))
 {
-   return  __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+   return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
 }
 
-inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
+int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
 {
-   struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
+   const struct in

[RFC 2/2] bridge: per device promiscuous taps

2007-03-02 Thread Stephen Hemminger
Part of the next set of bridge patches includes this.

It allows packet capture by interface on a bridge:
tcpdump -i eth0

will work as expected.

@@ -128,34 +125,45 @@ static inline int is_link_local(const un
 int br_handle_frame(struct net_bridge_port *p, struct sk_buff **pskb)
 {
struct sk_buff *skb = *pskb;
+   struct sk_buff *skb2 = NULL;
const unsigned char *dest = eth_hdr(skb)->h_dest;
 
if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
goto err;

if (unlikely(is_link_local(dest))) {
skb->pkt_type = PACKET_HOST;
return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
   NULL, br_handle_local_finish) != 0;
}
+
+   if (unlikely(p->dev->promiscuity > 1))
+   skb2 = skb_clone(skb, GFP_ATOMIC);
 
-   if (p->state == BR_STATE_FORWARDING || p->state == BR_STATE_LEARNING) {
+   switch (p->state) {
+   case BR_STATE_FORWARDING:
if (br_should_route_hook) {
-   if (br_should_route_hook(pskb))
+   if (br_should_route_hook(pskb)) {
+   kfree_skb(skb2);
return 0;
+   }
skb = *pskb;
dest = eth_hdr(skb)->h_dest;
}
 
if (!compare_ether_addr(p->br->dev->dev_addr, dest))
skb->pkt_type = PACKET_HOST;
+   /* fall thru */
 
+   case BR_STATE_LEARNING:
NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
br_handle_frame_finish);
-   return 1;
+   break;
+
+   default:
+   kfree_skb(skb);
}
 
-err:
-   kfree_skb(skb);
-   return 1;
+   if (likely(!skb2))
+   return 1;
+
+   *pskb = skb2;
+   return 0;
 }


[RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 13:26:38 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> From: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Wed, 28 Feb 2007 17:18:46 -0800
> 
> > I was measuring bridging/routing performance and noticed this.
> > 
> > The current code runs the "all packet" type handlers before calling the
> > bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
> > this means that each received packet gets run through the Berkeley Packet
> > Filter code in sk_run_filter (slow).
> 
> I know we closed this out by saying that even though performance
> sucks, we can't really apply this without breaking things.

wrong.

> What would be broken is if the DHCP client isn't specifying
> a device ifindex when it binds the AF_PACKET socket.  That
> would be an easy way to fix this performance problem at the
> application level.
> 
> The DHCP client should only care about a particular interface's
> traffic, the one it wants to listen on.


My assumption is that when bridging, the normal stack path only has
to receive those packets that it would receive if it was not doing
bridging.

A better version of the patch is:
==

The current code runs the "all packet" type handlers before calling the
bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
this means that each received packet gets run through the Berkeley Packet Filter
code in sk_run_filter. This is significant overhead.

By moving the bridging hook to run first, the packets flowing through
the bridge get filtered out there first. This results in a 14%
improvement in performance.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/core/dev.c |   24 
 1 file changed, 12 insertions(+), 12 deletions(-)

--- netem.orig/net/core/dev.c
+++ netem/net/core/dev.c
@@ -1702,9 +1702,12 @@ struct net_bridge_fdb_entry *(*br_fdb_ge
unsigned char *addr);
 void (*br_fdb_put_hook)(struct net_bridge_fdb_entry *ent);
 
-static __inline__ int handle_bridge(struct sk_buff **pskb,
-   struct packet_type **pt_prev, int *ret,
-   struct net_device *orig_dev)
+/*
+ * If bridge module is loaded call bridging hook.
+ * when it returns 1, this is a non-local packet
+ */
+int (*br_handle_frame_hook)(struct net_bridge_port *p, struct sk_buff **pskb) __read_mostly;
+static int handle_bridge(struct sk_buff **pskb)
 {
struct net_bridge_port *port;
 
@@ -1712,15 +1715,10 @@ static __inline__ int handle_bridge(stru
(port = rcu_dereference((*pskb)->dev->br_port)) == NULL)
return 0;
 
-   if (*pt_prev) {
-   *ret = deliver_skb(*pskb, *pt_prev, orig_dev);
-   *pt_prev = NULL;
-   }
-
return br_handle_frame_hook(port, pskb);
 }
 #else
-#define handle_bridge(skb, pt_prev, ret, orig_dev) (0)
+#define handle_bridge(pskb)0
 #endif
 
 #ifdef CONFIG_NET_CLS_ACT
@@ -1799,6 +1797,9 @@ int netif_receive_skb(struct sk_buff *sk
}
 #endif
 
+   if (handle_bridge(&skb))
+   goto out;
+
list_for_each_entry_rcu(ptype, &ptype_all, list) {
if (!ptype->dev || ptype->dev == skb->dev) {
if (pt_prev)
@@ -1826,9 +1827,6 @@ int netif_receive_skb(struct sk_buff *sk
 ncls:
 #endif
 
-   if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
-   goto out;
-
type = skb->protocol;
list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) {
if (ptype->type == type &&



-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Eric Dumazet

Stephen Hemminger wrote:
> On Fri, 2 Mar 2007 15:38:41 +0100
> Eric Dumazet <[EMAIL PROTECTED]> wrote:
>
> > We currently use a special structure (struct skb_timeval) and plain
> > 'struct timeval' to store packet timestamps in sk_buffs and struct sock.
> >
> > This has some drawbacks:
> > - Fixed resolution of one microsecond.
> > - Waste of space on 64-bit platforms, where sizeof(struct timeval) = 16.
> >
> > I suggest using ktime_t, which is a nice abstraction of high resolution
> > time services, currently capable of nanosecond resolution.
> >
> > As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits an
> > 8-byte shrink of this structure on 64-bit architectures. Some other
> > structures also benefit from this size reduction (struct ipq in
> > ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)
>
> You missed a couple of spots.

Arg yes...

> --- tcp-2.6.orig/net/sunrpc/svcsock.c   2007-03-02 12:50:45.0 -0800
> +++ tcp-2.6/net/sunrpc/svcsock.c    2007-03-02 12:58:28.0 -0800
> @@ -805,16 +805,9 @@
> /* possibly an icmp error */
> dprintk("svc: recvfrom returned error %d\n", -err);
> }
> -   if (skb->tstamp.off_sec == 0) {
> -   struct timeval tv;
>
> -   tv.tv_sec = xtime.tv_sec;
> -   tv.tv_usec = xtime.tv_nsec / NSEC_PER_USEC;
> -   skb_set_timestamp(skb, &tv);
> -   /* Don't enable netstamp, sunrpc doesn't
> -  need that much accuracy */
> -   }
> -   skb_get_timestamp(skb, &svsk->sk_sk->sk_stamp);
> +   svsk->sk_sk->sk_stamp = (skb->tstamp.tv64 != 0) ? skb->tstamp
> +   : ktime_get_real();

Well, if we want to stay in the spirit of the old code, we probably want to
use current_kernel_time() (+ timespec_to_ktime()), because it's less expensive.

And also setting the skb tstamp, no ?
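
A rough userspace sketch of the representation being discussed, assuming a
timestamp is kept as a signed 64-bit nanosecond count.  kt_sketch_t and the
helpers below are hypothetical names, not the kernel's ktime API; they only
show the conversion to and from struct timeval and the "use the packet stamp
if it is set, else fall back to the current time" choice made in the hunk
above.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

#define KT_NSEC_PER_SEC		1000000000LL
#define KT_NSEC_PER_USEC	1000LL

/* Hypothetical stand-in for ktime_t: nanoseconds in one 64-bit word. */
typedef struct {
	int64_t tv64;
} kt_sketch_t;

static kt_sketch_t timeval_to_kt(struct timeval tv)
{
	kt_sketch_t kt;

	assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
	kt.tv64 = (int64_t)tv.tv_sec * KT_NSEC_PER_SEC +
		  (int64_t)tv.tv_usec * KT_NSEC_PER_USEC;
	return kt;
}

/* Valid for non-negative times; sub-microsecond digits are truncated. */
static struct timeval kt_to_timeval(kt_sketch_t kt)
{
	struct timeval tv;

	tv.tv_sec = kt.tv64 / KT_NSEC_PER_SEC;
	tv.tv_usec = (kt.tv64 % KT_NSEC_PER_SEC) / KT_NSEC_PER_USEC;
	return tv;
}

/* Keep the packet's stamp when it has one, else fall back to "now". */
static kt_sketch_t pick_stamp(kt_sketch_t pkt_stamp, kt_sketch_t now)
{
	return pkt_stamp.tv64 != 0 ? pkt_stamp : now;
}
```

The whole timestamp fits in 8 bytes regardless of word size, which is where
the saving over a 16-byte struct timeval on 64-bit platforms comes from.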




Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 14:09:29 -0800

> On Fri, 02 Mar 2007 13:26:38 -0800 (PST)
> David Miller <[EMAIL PROTECTED]> wrote:
> 
> > From: Stephen Hemminger <[EMAIL PROTECTED]>
> > Date: Wed, 28 Feb 2007 17:18:46 -0800
> > 
> > > I was measuring bridging/routing performance and noticed this.
> > > 
> > > The current code runs the "all packet" type handlers before calling the
> > > > bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
> > > > this means that each received packet gets run through the Berkeley Packet
> > > > Filter code in sk_run_filter (slow).
> > 
> > I know we closed this out by saying that even though performance
> > sucks, we can't really apply this without breaking things.
> 
> wrong.

I disagree, and your patch is still broken because as Jamal
pointed out (which you didn't address in any way) this breaks
traffic classification of bridged traffic as well.

If someone wants their network tap to hear all traffic, they do mean
all traffic, and this includes potentially seeing it multiple times
when things like bridging and virtual devices decap incoming frames.

We can't apply this.

Back to a workable solution, why doesn't DHCP specify a specific
device?  It would fix this performance problem completely, at the
application level.


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST)

> Back to a workable solution, why doesn't DHCP specify a specific
> device?  It would fix this performance problem completely, at the
> application level.

Since nobody seems to be able to be bothered to actually look
at what DHCP clients are doing, I actually did and it's no
surprise that broken stuff is happening here.

Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c:

	struct sockaddr sa;
 ...
	/* Bind to the interface name */
	memset (&sa, 0, sizeof sa);
	sa.sa_family = AF_PACKET;
	strncpy (sa.sa_data, (const char *)info -> ifp, sizeof sa.sa_data);
	if (bind (sock, &sa, sizeof sa)) {
		if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
		    errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
		    errno == EAFNOSUPPORT || errno == EINVAL) {
			log_error ("socket: %m - make sure");
			log_error ("CONFIG_PACKET (Packet socket) %s",
				   "and CONFIG_FILTER");
			log_error ("(Socket Filtering) are enabled %s",
				   "in your kernel");
			log_fatal ("configuration!");
		}
		log_fatal ("Bind socket to interface: %m");
	}

So it puts a string into the sockaddr data, and there
is no mention of sockaddr_ll, which is what is supposed to be
provided as the socket address here, in the entire DHCP tree.

I'm tempted to say I must be missing something here, since I can't see
how this could possibly work at all.  The string passed in should
be interpreted as the ifindex value, and thus trigger a -ENODEV
return from AF_PACKET's bind() implementation.

My suspicions are confirmed by the patch here:

http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch

Really, this bogus bind() explains everything.
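
For comparison, a bind through sockaddr_ll could look roughly like the
sketch below.  It assumes an AF_PACKET socket created with SOCK_RAW or
SOCK_DGRAM and uses if_nametoindex() to turn the interface name into an
ifindex.  packet_addr_for() and bind_packet_socket() are hypothetical helper
names, not taken from the dhcp tree or from the patch above.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

/*
 * Fill *sll so that bind() attaches an AF_PACKET socket to one interface.
 * Returns 0 on success, -1 if ifname does not name an existing interface.
 */
int packet_addr_for(const char *ifname, struct sockaddr_ll *sll)
{
	unsigned int ifindex;

	assert(ifname != NULL && sll != NULL);
	memset(sll, 0, sizeof(*sll));
	sll->sll_family = AF_PACKET;
	sll->sll_protocol = htons(ETH_P_ALL);

	ifindex = if_nametoindex(ifname);	/* 0 means "no such interface" */
	if (ifindex == 0)
		return -1;
	sll->sll_ifindex = (int)ifindex;
	return 0;
}

/* Bind an already-open AF_PACKET socket to the named interface. */
int bind_packet_socket(int sock, const char *ifname)
{
	struct sockaddr_ll sll;

	if (packet_addr_for(ifname, &sll) < 0)
		return -1;
	return bind(sock, (struct sockaddr *)&sll, sizeof(sll));
}
```

With the ifindex set, the socket only has to look at frames from that one
interface instead of at every packet the machine receives.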




Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 15:18:03 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> From: David Miller <[EMAIL PROTECTED]>
> Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST)
> 
> > Back to a workable solution, why doesn't DHCP specify a specific
> > device?  It would fix this performance problem completely, at the
> > application level.
> 
> Since nobody seems to be able to be bothered to actually look
> at what DHCP clients are doing, I actually did and it's no
> surprise that broken stuff is happening here.

I was in the middle of checking that...

> Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c:
> 
>   struct sockaddr sa;
>  ...
>   /* Bind to the interface name */
>   memset (&sa, 0, sizeof sa);
>   sa.sa_family = AF_PACKET;
>   strncpy (sa.sa_data, (const char *)info -> ifp, sizeof sa.sa_data);
>   if (bind (sock, &sa, sizeof sa)) {
>   if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
>   errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
>   errno == EAFNOSUPPORT || errno == EINVAL) {
>   log_error ("socket: %m - make sure");
>   log_error ("CONFIG_PACKET (Packet socket) %s",
>  "and CONFIG_FILTER");
>   log_error ("(Socket Filtering) are enabled %s",
>  "in your kernel");
>   log_fatal ("configuration!");
>   }
>   log_fatal ("Bind socket to interface: %m");
>   }
> 
> So it puts a string into the sockaddr data, and there
> is no mention of sockaddr_ll, which is what is supposed to be
> provided as the socket address here, in the entire DHCP tree.
> 
> I'm tempted to say I must be missing something here, since I can't see
> how this could possibly work at all.  The string passed in should
> be interpreted as the ifindex value, and thus trigger a -ENODEV
> return from AF_PACKET's bind() implementation.
> 
> My suspicions are confirmed by the patch here:
> 
> http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch

Can you get FC fixed?

> Really, this bogus bind() explains everything.

Should we add a warning to the kernel log, to make distros fix it?

It might make sense to add a per-device ptype_dev list to the network device?



-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread Krzysztof Halasa
Switching HDLC devices from Ethernet-framing mode caused stale ethernet
function assignments within net_device.

Signed-off-by: Krzysztof Halasa <[EMAIL PROTECTED]>

diff --git a/drivers/net/wan/hdlc.c b/drivers/net/wan/hdlc.c
index db354e0..f6e6b63 100644
--- a/drivers/net/wan/hdlc.c
+++ b/drivers/net/wan/hdlc.c
@@ -38,7 +38,7 @@
 #include 
 
 
-static const char* version = "HDLC support module revision 1.20";
+static const char* version = "HDLC support module revision 1.21";
 
 #undef DEBUG_LINK
 
@@ -222,19 +222,30 @@ int hdlc_ioctl(struct net_device *dev, struct ifreq *ifr, 
int cmd)
return -EINVAL;
 }
 
+static void hdlc_setup_dev(struct net_device *dev)
+{
+/* Re-init all variables changed by HDLC protocol drivers,
+   including ether_setup() called from hdlc_raw_eth.c. */
+   dev->get_stats   = hdlc_get_stats;
+   dev->flags   = IFF_POINTOPOINT | IFF_NOARP;
+   dev->mtu = HDLC_MAX_MTU;
+   dev->type= ARPHRD_RAWHDLC;
+   dev->hard_header_len = 16;
+   dev->addr_len= 0;
+   dev->hard_header = NULL;
+   dev->rebuild_header  = NULL;
+   dev->set_mac_address = NULL;
+   dev->hard_header_cache   = NULL;
+   dev->header_cache_update = NULL;
+   dev->change_mtu  = hdlc_change_mtu;
+   dev->hard_header_parse   = NULL;
+}
+
 void hdlc_setup(struct net_device *dev)
 {
hdlc_device *hdlc = dev_to_hdlc(dev);
 
-   dev->get_stats = hdlc_get_stats;
-   dev->change_mtu = hdlc_change_mtu;
-   dev->mtu = HDLC_MAX_MTU;
-
-   dev->type = ARPHRD_RAWHDLC;
-   dev->hard_header_len = 16;
-
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-
+   hdlc_setup_dev(dev);
hdlc->carrier = 1;
hdlc->open = 0;
spin_lock_init(&hdlc->state_lock);
@@ -294,6 +305,7 @@ void detach_hdlc_protocol(struct net_device *dev)
}
kfree(hdlc->state);
hdlc->state = NULL;
+   hdlc_setup_dev(dev);
 }
 
 
diff --git a/drivers/net/wan/hdlc_cisco.c b/drivers/net/wan/hdlc_cisco.c
index b0bc5dd..c9664fd 100644
--- a/drivers/net/wan/hdlc_cisco.c
+++ b/drivers/net/wan/hdlc_cisco.c
@@ -365,10 +365,7 @@ static int cisco_ioctl(struct net_device *dev, struct 
ifreq *ifr)
memcpy(&state(hdlc)->settings, &new_settings, size);
dev->hard_start_xmit = hdlc->xmit;
dev->hard_header = cisco_hard_header;
-   dev->hard_header_cache = NULL;
dev->type = ARPHRD_CISCO;
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-   dev->addr_len = 0;
netif_dormant_on(dev);
return 0;
}
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index b45ab68..c6c3c75 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1289,10 +1289,7 @@ static int fr_ioctl(struct net_device *dev, struct ifreq 
*ifr)
memcpy(&state(hdlc)->settings, &new_settings, size);
 
dev->hard_start_xmit = hdlc->xmit;
-   dev->hard_header = NULL;
dev->type = ARPHRD_FRAD;
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-   dev->addr_len = 0;
return 0;
 
case IF_PROTO_FR_ADD_PVC:
diff --git a/drivers/net/wan/hdlc_ppp.c b/drivers/net/wan/hdlc_ppp.c
index e9f7170..4591437 100644
--- a/drivers/net/wan/hdlc_ppp.c
+++ b/drivers/net/wan/hdlc_ppp.c
@@ -127,9 +127,7 @@ static int ppp_ioctl(struct net_device *dev, struct ifreq 
*ifr)
if (result)
return result;
dev->hard_start_xmit = hdlc->xmit;
-   dev->hard_header = NULL;
dev->type = ARPHRD_PPP;
-   dev->addr_len = 0;
netif_dormant_off(dev);
return 0;
}
diff --git a/drivers/net/wan/hdlc_raw.c b/drivers/net/wan/hdlc_raw.c
index fe3cae5..e23bc66 100644
--- a/drivers/net/wan/hdlc_raw.c
+++ b/drivers/net/wan/hdlc_raw.c
@@ -88,10 +88,7 @@ static int raw_ioctl(struct net_device *dev, struct ifreq 
*ifr)
return result;
memcpy(hdlc->state, &new_settings, size);
dev->hard_start_xmit = hdlc->xmit;
-   dev->hard_header = NULL;
dev->type = ARPHRD_RAWHDLC;
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-   dev->addr_len = 0;
netif_dormant_off(dev);
return 0;
}
diff --git a/drivers/net/wan/hdlc_x25.c b/drivers/net/wan/hdlc_x25.c
index e4bb9f8..cd7b22f 100644
--- a/drivers/net/wan/hdlc_x25.c
+++ b/drivers/net/wan/hdlc_x25.c
@@ -215,9 +215,7 @@ static int x25_ioctl(struct net_device *dev, struct ifreq 
*ifr)
   x25_rx, 0)) != 0)
return result;
dev->hard_start_xmit = x25_xmit;
-   dev->hard_header = NULL;
d

Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread Krzysztof Halasa
David Miller <[EMAIL PROTECTED]> writes:

> I disagree, you can't leave dangling references to functions
> which are potentially inside of unloaded modules, as this code
> does.

All such pointers were thought to be initialized by all HDLC protocol
handlers before device activation, but they were actually used by the
hdlc* code, and this one doesn't seem to...

> Rather, HDLC Cisco should implement a proper protocol destructor
> method to clean up these function pointers.

No, it wouldn't work - hdlc_cisco doesn't use it at all, it's just
a victim. But now I think there may be other victims.

It seems the only way to become non-NULL is through ether_setup()
from hdlc_raw_eth.c (Ethernet framing over HDLC).

I think it's best to NULLify it and the like in hdlc.c
unconditionally; it's a slow path and we don't need yet more useless
EXPORT_SYMBOL()s. It would fix all such problems forever.
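
The idea can be modelled in userspace roughly as follows.  This is only
a toy sketch: struct toy_dev and every toy_* function are invented
stand-ins for net_device, ether_setup() and detach_hdlc_protocol(), and
only three fields are shown.

```c
#include <stddef.h>

/* Toy stand-in for the few net_device fields discussed here. */
struct toy_dev {
	int  (*hard_header)(void);
	void (*header_cache_update)(void);
	unsigned int addr_len;
};

static int  eth_header_stub(void)       { return 14; }
static void eth_cache_update_stub(void) { }

/* Models ether_setup(): installs Ethernet-specific callbacks. */
static void toy_ether_setup(struct toy_dev *dev)
{
	dev->hard_header         = eth_header_stub;
	dev->header_cache_update = eth_cache_update_stub;
	dev->addr_len            = 6;
}

/* Models a common reset routine: every field any protocol may have
 * touched is put back to its plain-HDLC default in one place. */
static void toy_hdlc_setup_dev(struct toy_dev *dev)
{
	dev->hard_header         = NULL;
	dev->header_cache_update = NULL;
	dev->addr_len            = 0;
}

/* Models protocol detach: always reset, regardless of which protocol
 * was attached, so no stale pointer can survive a mode switch. */
static void toy_detach_protocol(struct toy_dev *dev)
{
	toy_hdlc_setup_dev(dev);
}
```

Because the reset is unconditional, protocols that never set these
pointers do not need destructors of their own.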

Compile-tested only but it seems pretty obvious and of course I check
if the packets still flow after regular kernel upgrades (and I run
automatic tests checking all protos except X.25 from time to time as
well).

(the patch is in the next message).

Not sure if 2.6.21 material.
-- 
Krzysztof Halasa


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 15:34:14 -0800

> Can you get FC fixed?

I am not the DHCP package maintainer. :-)

I'm up to my ears already dealing with people trying
to slip broken patches into the kernel networking code that paper
around application bugs. ;)

> Should we add a warning to kernel log, to make distro's fix it?

Unfortunately it looks like a properly formed sockaddr_ll whose
ifindex is in fact zero, so there is nothing we can do
to warn about this case.

The sockaddr_ll sits after the first sockaddr string in the ifreq, and
the rest remains initialized to zeros, thus the bind() succeeds.


Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread David Miller
From: Krzysztof Halasa <[EMAIL PROTECTED]>
Date: Sat, 03 Mar 2007 00:38:05 +0100

> Switching HDLC devices from Ethernet-framing mode caused stale ethernet
> function assignments within net_device.
> 
> Signed-off-by: Krzysztof Halasa <[EMAIL PROTECTED]>

This looks good to me, I think I'll apply it :-)

Thanks!


Re: [PATCH] udp: whitespace fixes

2007-03-02 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 14:04:49 -0800

> 
> The udp code is full of bad indenting, extra whitespace and other
> style confusion.  It makes no sense to declare functions that are used
> outside the current file (extern) as inline.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> ---
>  net/ipv4/udp.c |  402 
> -
>  net/ipv6/udp.c |  175 +---
>  2 files changed, 295 insertions(+), 282 deletions(-)
> 
> --- tcp-2.6.orig/net/ipv4/udp.c   2007-03-02 12:08:06.0 -0800
> +++ tcp-2.6/net/ipv4/udp.c2007-03-02 12:37:38.0 -0800
> @@ -120,8 +120,8 @@
>   struct hlist_node *node;
>  
>   sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
> - if (sk->sk_hash == num)
> - return 1;
> + if (sk->sk_hash == num)
> + return 1;

This turns tabs into spaces, it cannot be correct.

Yoshifuji fixed all the whitespace problems under net/ already
for 2.6.21


tree plans...

2007-03-02 Thread David Miller

I plan to cut a net-2.6.22 tree after I finish pushing the
current round of 2.6.21 networking bug fixes to Linus.

I'll load the tcp-2.6 tree changes into net-2.6.22, and then
we'll do all non-bug-fix development in the net-2.6.22 tree.

It may take some time for me to push out the bug fixes for today
because, due to the VLAN group allocation fix, I need to do an
exhaustive build test with allmodconfig and the like to
make sure no driver builds got accidentally broken by that change.

Thanks!


Re: [PATCH] net: Convert xtime.tv_sec to get_seconds()

2007-03-02 Thread David Miller
From: James Morris <[EMAIL PROTECTED]>
Date: Tue, 27 Feb 2007 16:24:49 -0500 (EST)

> Where appropriate, convert references to xtime.tv_sec to the
> get_seconds() helper function.
> 
> Signed-off-by: James Morris <[EMAIL PROTECTED]>

This looks great, James. I'll apply it to net-2.6.22 once I set
that tree up.

Thanks again.


Re: [PATCH 4/4] pktgen: fix device name handling

2007-03-02 Thread David Miller
From: Robert Olsson <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 18:07:09 +0100

> Yes, it seems to handle dev name changes. So configuration scripts should
> use ifindex now :)
> 
> Signed-off-by: Robert Olsson <[EMAIL PROTECTED]>

I will apply all 4 of these patches to net-2.6.22, thanks everyone.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Netem tfifo implementation

2007-03-02 Thread Ritesh Kumar

On 3/2/07, Stephen Hemminger <[EMAIL PROTECTED]> wrote:

On Fri, 2 Mar 2007 15:56:54 -0500
"Ritesh Kumar" <[EMAIL PROTECTED]> wrote:

> On 3/2/07, Patrick McHardy <[EMAIL PROTECTED]> wrote:
> > Ritesh Kumar wrote:
> > > Hi,
> > >I recently saw the qdisc "tfifo" in the netem module
> > > (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
> > > to 2.6.20. As I understand, tfifo helps in keeping the queue of
> > > packets sorted according to their "time_to_send". [tfifo was not
> > > present in 2.6.14 perhaps because arrival order of packets was always
> > > equal to the departure order]. However, tfifo uses a linear search in
> > > the packet queue to find where to enqueue the packet.
> > >Quite some time ago (2.6.14 era), I needed a similar functionality
> > > from the netem module and I ended up coding a pointer based min-heap
> > > for the same. I was wondering if the community was interested in using
> > > the min-heap implementation to replace the linear search
> > > implementation. I have tested the min-heap quite a few times and it
> > > seems to work.
> > >The implementation is slightly non-trivial because it uses
> > > > pointers to maintain the heap structure instead of using good old
> > > fixed size arrays. I did this mainly so that the limit of the netem
> > > qdisc could be changed on the fly. However, because every sk_buff now
> > > needs two pointers for its children nodes, I added an extra
> > > (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
> > > be changed to a pointer inside netem_skb_cb.  Also, because I needed
> > > this for personal work and 2.6.14 didn't contain tfifo, I basically
> > > removed the embedded qdisc and made netem a classless qdisc with my
> > > min heap as the native "queue" (sorry again! :) )
> >
> > The tfifo qdisc has a limit, why not just allocate a fixed-size heap
> > based on that?
> >
> >
>
> The tfifo queue limit itself can be changed and that creates the
> problem. If we use a fixed heap (say implemented using a fixed size
> array) then we will have to copy over all pointers from the first
> array to a reallocated array whenever the queue limit is changed.
> In retrospect, moving just a few 10s of kilobytes of data doesn't seem
> that much of a problem... now I feel stupid having put so much effort
> :).
>

Tfifo is a special case because:
  * timestamps are stored in skb->cb so it is only really usable inside
    netem that adds timestamps.
  * insertions are cheap because it walks backwards and netem usually has
    tnext > tlast.  Only if you have a huge jitter which causes massive
    reordering, and that is unrealistic, would you see a problem.
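
The backward walk described above can be sketched in userspace like
this.  It is not the sch_netem.c code: struct pkt, queue_init() and
enqueue_sorted() are hypothetical names, and a plain circular list with
a sentinel head stands in for the sk_buff queue.

```c
#include <stddef.h>

/* Toy packet: only the send time and the list links. */
struct pkt {
	unsigned long time_to_send;
	struct pkt *prev, *next;
};

/* Circular list with a sentinel head, kept sorted by time_to_send. */
static void queue_init(struct pkt *head)
{
	head->prev = head->next = head;
}

/* Insert by walking backwards from the tail until an entry that is
 * not later than the new packet is found.  Returns how many entries
 * were stepped over; 0 is the common in-order case. */
static unsigned int enqueue_sorted(struct pkt *head, struct pkt *nskb)
{
	struct pkt *pos = head->prev;
	unsigned int steps = 0;

	while (pos != head && pos->time_to_send > nskb->time_to_send) {
		pos = pos->prev;
		steps++;
	}

	/* Link nskb right after pos. */
	nskb->prev = pos;
	nskb->next = pos->next;
	pos->next->prev = nskb;
	pos->next = nskb;
	return steps;
}
```

With mostly increasing send times the loop exits immediately, so the
cost only grows with the amount of reordering, not with the queue
length.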



You are right. A huge jitter inside a given flow is unrealistic in
real networks. It can also cause artificial reordering. However, in
our lab we use netem (with my changes) to enable per-flow delays. The
per-flow delays that we use vary a lot and hence we have to go through
some optimizations.

Thanks for all the feedback.

Ritesh


Re: [PATCH] udp: whitespace fixes

2007-03-02 Thread Stephen Hemminger
Resend with less garbage...

The udp code is full of bad indenting, extra whitespace and other
style confusion.  It makes no sense to declare functions that are used
outside the current file (extern) as inline.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/ipv4/udp.c |  312 +++--
 net/ipv6/udp.c |  153 +++
 2 files changed, 236 insertions(+), 229 deletions(-)

--- tcp-2.6.orig/net/ipv4/udp.c 2007-03-02 16:25:12.0 -0800
+++ tcp-2.6/net/ipv4/udp.c  2007-03-02 16:41:04.0 -0800
@@ -136,13 +136,13 @@
  */
 int __udp_lib_get_port(struct sock *sk, unsigned short snum,
   struct hlist_head udptable[], int *port_rover,
-  int (*saddr_comp)(const struct sock *sk1,
-const struct sock *sk2 ))
+  int (*saddr_comp)(const struct sock * sk1,
+const struct sock * sk2))
 {
struct hlist_node *node;
struct hlist_head *head;
struct sock *sk2;
-   interror = 1;
+   int error = 1;
 
write_lock_bh(&udp_hash_lock);
if (snum == 0) {
@@ -160,7 +160,8 @@
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
-   ((result - 
sysctl_local_port_range[0]) &
+   ((result -
+ sysctl_local_port_range[0]) &
 (UDP_HTABLE_SIZE - 1));
goto gotit;
}
@@ -175,12 +176,13 @@
;
}
result = best;
-   for(i = 0; i < (1 << 16) / UDP_HTABLE_SIZE; i++, result += 
UDP_HTABLE_SIZE) {
+   for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE;
+i++, result += UDP_HTABLE_SIZE) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0]
+ ((result - 
sysctl_local_port_range[0]) &
   (UDP_HTABLE_SIZE - 1));
-   if (! __udp_lib_lport_inuse(result, udptable))
+   if (!__udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
@@ -194,9 +196,8 @@
if (sk2->sk_hash == snum &&
sk2 != sk&&
(!sk2->sk_reuse|| !sk->sk_reuse) &&
-   (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
 || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-   (*saddr_comp)(sk, sk2) )
+   (*saddr_comp)(sk, sk2))
goto fail;
}
inet_sk(sk)->num = snum;
@@ -212,19 +213,19 @@
return error;
 }
 
-__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
-   int (*scmp)(const struct sock *, const struct sock *))
+int udp_get_port(struct sock *sk, unsigned short snum,
+int (*scmp)(const struct sock *, const struct sock *))
 {
-   return  __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+   return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
 }
 
-inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
+int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
 {
-   struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
+   const struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
 
-   return  ( !ipv6_only_sock(sk2)  &&
- (!inet1->rcv_saddr || !inet2->rcv_saddr ||
-  inet1->rcv_saddr == inet2->rcv_saddr  ));
+   return !ipv6_only_sock(sk2) &&
+   (!inet1->rcv_saddr || !inet2->rcv_saddr ||
+inet1->rcv_saddr == inet2->rcv_saddr);
 }
 
 static inline int udp_v4_get_port(struct sock *sk, unsigned short snum)
@@ -253,27 +254,27 @@
if (inet->rcv_saddr) {
if (inet->rcv_saddr != daddr)
continue;
-   score+=2;
+   score += 2;
}
if (inet->daddr) {
if (inet->daddr != saddr)
continue;
-   score+=2;
+   

Re: [PATCH] udp: whitespace fixes

2007-03-02 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 2 Mar 2007 16:47:19 -0800

> Resend with less garbage...
> 
> The udp code is full of bad indenting, extra whitespace and other
> style confusion.  It makes no sense to declare functions that are used
> outside the current file (extern) as inline.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Looks good, I'll try to apply this when I cut the net-2.6.22
tree.

Thanks.


Re: [PATCH 1/3]: Updates, removal of unsupported features and minor bug fixes.

2007-03-02 Thread Jeff Garzik

Linsys Contractor Mithlesh Thukral wrote:

NetXen: Updates, removal of unsupported features and minor bug fixes.

Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---
 netxen_nic.h  |4 +
 netxen_nic_ethtool.c  |  144 +-
 netxen_nic_main.c |4 -
 netxen_nic_phan_reg.h |3 +
 4 files changed, 34 insertions(+), 121 deletions(-)


applied patches 1-2 of 3




Re: [PATCH 3/3] NetXen: Make driver use multi PCI functions

2007-03-02 Thread Jeff Garzik

Linsys Contractor Mithlesh Thukral wrote:

NetXen: Make driver use multi PCI functions.

Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---

 netxen_nic.h  |  126 +---
 netxen_nic_ethtool.c  |   80 +++
 netxen_nic_hdr.h  |8 
 netxen_nic_hw.c   |  213 +++-

 netxen_nic_hw.h   |   18 -
 netxen_nic_init.c |  115 +++---
 netxen_nic_isr.c  |   80 +++
 netxen_nic_main.c |  523 +-
 netxen_nic_niu.c  |   27 +-
 netxen_nic_phan_reg.h |  125 ---
 10 files changed, 631 insertions(+), 684 deletions(-)


all three patches in this patchset contained nothing but one-line 
summaries of the changes included in them, and are overall very poorly 
and vaguely described.


This patch is far too big, with far too little description and 
justification to go along with it.


If you are not going to make the effort to write a paragraph or two 
describing such huge changes, then I'm not going to make the effort to 
review and apply it.  NAK.





Re: [NET] Add support for Seeq 8003 on Challenge S Mezz board.

2007-03-02 Thread Jeff Garzik

Ralf Baechle wrote:

From: Ladislav Michl <[EMAIL PROTECTED]>

Thanks to Jö Fahlke for donating hardware.

Signed-off-by: Ladislav Michl <[EMAIL PROTECTED]>

Forward porting of Ladis' 2.4 patch.

Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>


applied to #upstream (2.6.22)




Re: [PATCH 1/2] tc35815 driver update (part 1)

2007-03-02 Thread Jeff Garzik

Atsushi Nemoto wrote:

The current tc35815 driver is obsolete and has been poorly maintained for a
long time.  Replace it with a new driver based on the one from the CELF patch
archive.  It was for the 2.6.10 kernel, so some adjustments and cleanups are
added. (remove config.h, SA_ to IRQF_ conversion, etc.)

Major advantages are:

* Independent of JMR3927.
  (Actually independent of MIPS, but AFAIK the chip is used only on
   MIPS platforms)
* TX4938 support.
* 64-bit proof.
* Asynchronous and on-demand auto negotiation.
* High performance on non-coherent architecture.
* ethtool support.
* Many bugfixes and cleanups.

And the next patch adds further improvements/bugfixes/cleanups.

Signed-off-by: Atsushi Nemoto <[EMAIL PROTECTED]>
---
This is a patch against current linux-mips.org git-tree.

 drivers/net/Kconfig |3 
 drivers/net/tc35815.c   | 2070 +++---
 include/linux/pci_ids.h |1 
 3 files changed, 1440 insertions(+), 634 deletions(-)


Would you be kind enough to

a) provide a URL to a .c file (or post it, if it's under 100K) so that 
we may more easily review this


b) combine both patches into a single patch.  might as well, since it's 
a rewrite.


c) rediff your patch against linux-2.6.git + Ralf's killall removal 
patch, and resend.  There were some minor conflicting changes that 
appeared, though these changes will certainly become irrelevant once 
your new driver is merged.





Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread Andi Kleen
David Miller <[EMAIL PROTECTED]> writes:
> 
> And in fact that effectively makes the new socket option
> pointless, since it doesn't buy us anything since we have
> to support the old stuff fully anyways.

I don't think it's pointless because it would still allow
newer DHCP clients to have less impact on other packets
when they are active. 

This can matter when you have a system with multiple
interfaces where DHCP doesn't get an address on one.

That's pretty common with many x86 server boards because 
they come with two NICs by default but most people only
plug the cable into one. However the distro installers
run DHCP on all.

When this happens all packets are always forced through
ptype_all chains before being rejected by AF_PACKETs device
bind, which adds some overhead to them. 

-Andi


Re: [PATCH] qla3xxx: bugfix for line omitted in previous patch.

2007-03-02 Thread Jeff Garzik

Ron Mercer wrote:

From 01751a39d7327acc28dabf4f68930b7e20b279d1 Mon Sep 17 00:00:00 2001

From: Ron Mercer <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 16:42:17 -0800
Subject: [PATCH] [PATCH] qla3xxx: bugfix for line omitted in previous patch.

This missing line caused transmit errors on the Qlogic 4032 chip.

Signed-off-by: Ron Mercer <[EMAIL PROTECTED]>


applied




Re: Network activity LED trigger

2007-03-02 Thread Andi Kleen
Florian Fainelli <[EMAIL PROTECTED]> writes:

> Hi All,
> 
> I have been talking a bit with Richard, who is the LED API maintainer, and a 
> LED trigger based on network activity would be something great.

You should be aware that normally the kernel doesn't see all packets
on an ethernet unless promiscuous mode is enabled (which it is normally 
not). That is because the hardware filters out all packets
not for this host. A software controlled LED wouldn't be equivalent
to the activity LEDs you normally have on network cards,
but only show local traffic.

That said if you want to get events for any in/outgoing packets
you can use the same hooks as PF_PACKET uses for sniffing;
using dev_add_pack with ETH_P_ALL.
That will get you all incoming and outgoing packets that 
are local.

And when someone runs tcpdump it will suddenly see all packets, which
might be unexpected.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] bonding: only receive ARPs for us

2007-03-02 Thread Jeff Garzik

Jay Vosburgh wrote:

The ARP validation code only needs ARPs for the bonding device.

Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>


I seem to have lost the context of this.  Did this get discussed, and 
need further revision?


The three patches from 2/28/2007 look OK to me, and I just wanted to 
make sure before applying them.


Jeff





[PATCH] div64_64 consolidate (rev3)

2007-03-02 Thread Stephen Hemminger
Here is the current version of the 64 bit divide common code.
Since it is used three times by networking code, can we put it in the
net-2.6.22 tree?
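
For readers who want to try the arithmetic outside the kernel, here is
a rough userspace model of the approach taken in lib/div64.c below.  It
assumes the threshold is the 32-bit maximum (0xffffffff), replaces
do_div() with plain C division, and uses the made-up names toy_fls()
and toy_div64_64().

```c
#include <stdint.h>

/* Find last set bit, 1-based; returns 0 for x == 0 (the same
 * convention as the kernel's fls()). */
static unsigned int toy_fls(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Userspace model: if the divisor does not fit in 32 bits, shift both
 * operands right until it does, then do a 64-by-32 division.  The low
 * bits shifted out are dropped, so the result can be approximate. */
static uint64_t toy_div64_64(uint64_t dividend, uint64_t divisor)
{
	uint32_t d = (uint32_t)divisor;

	if (divisor > 0xffffffffULL) {
		unsigned int shift = toy_fls((uint32_t)(divisor >> 32));

		d = (uint32_t)(divisor >> shift);
		dividend >>= shift;
	}

	return dividend / d;
}
```

As in the patch, a zero divisor is the caller's problem.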

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 include/asm-arm/div64.h  |2 ++
 include/asm-generic/div64.h  |7 +++
 include/asm-i386/div64.h |2 ++
 include/asm-m68k/div64.h |1 +
 include/asm-mips/div64.h |2 ++
 include/asm-um/div64.h   |1 +
 include/asm-xtensa/div64.h   |4 
 lib/Makefile |5 +++--
 lib/div64.c  |   22 ++
 net/ipv4/tcp_cubic.c |   23 ---
 net/ipv4/tcp_yeah.c  |   21 -
 net/ipv4/tcp_yeah.h  |1 +
 net/netfilter/xt_connbytes.c |   16 
 13 files changed, 45 insertions(+), 62 deletions(-)

--- tcp-2.6.orig/include/asm-arm/div64.h2007-03-02 17:21:27.0 
-0800
+++ tcp-2.6/include/asm-arm/div64.h 2007-03-02 17:22:38.0 -0800
@@ -223,4 +223,6 @@
 
 #endif
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #endif
--- tcp-2.6.orig/include/asm-generic/div64.h2007-03-02 17:21:27.0 
-0800
+++ tcp-2.6/include/asm-generic/div64.h 2007-03-02 17:22:38.0 -0800
@@ -30,6 +30,11 @@
__rem;  \
  })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
+
 #elif BITS_PER_LONG == 32
 
 extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
@@ -49,6 +54,8 @@
__rem;  \
  })
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #else /* BITS_PER_LONG == ?? */
 
 # error do_div() does not yet support the C64
--- tcp-2.6.orig/include/asm-i386/div64.h   2007-03-02 17:21:27.0 
-0800
+++ tcp-2.6/include/asm-i386/div64.h2007-03-02 17:22:38.0 -0800
@@ -45,4 +45,6 @@
return dum2;
 
 }
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif
--- tcp-2.6.orig/include/asm-m68k/div64.h   2007-03-02 17:21:27.0 
-0800
+++ tcp-2.6/include/asm-m68k/div64.h2007-03-02 17:22:38.0 -0800
@@ -23,4 +23,5 @@
__rem;  \
 })
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif /* _M68K_DIV64_H */
--- tcp-2.6.orig/include/asm-mips/div64.h   2007-03-02 17:21:27.0 
-0800
+++ tcp-2.6/include/asm-mips/div64.h2007-03-02 17:22:38.0 -0800
@@ -78,6 +78,8 @@
__quot = __quot << 32 | __low; \
(n) = __quot; \
__mod; })
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif /* (_MIPS_SZLONG == 32) */
 
 #if (_MIPS_SZLONG == 64)
--- tcp-2.6.orig/include/asm-um/div64.h 2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-um/div64.h  2007-03-02 17:22:38.0 -0800
@@ -3,4 +3,5 @@
 
 #include "asm/arch/div64.h"
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif
--- tcp-2.6.orig/include/asm-xtensa/div64.h 2007-03-02 17:21:27.0 
-0800
+++ tcp-2.6/include/asm-xtensa/div64.h  2007-03-02 17:22:38.0 -0800
@@ -16,4 +16,8 @@
n /= (unsigned int) base; \
__res; })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
 #endif
--- tcp-2.6.orig/lib/Makefile   2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/lib/Makefile2007-03-02 17:22:38.0 -0800
@@ -4,7 +4,7 @@
 
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 rbtree.o radix-tree.o dump_stack.o \
-idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
+idr.o int_sqrt.o bitmap.o extable.o prio_tree.o \
 sha1.o irq_regs.o reciprocal_div.o
 
 lib-$(CONFIG_MMU) += ioremap.o
@@ -12,7 +12,8 @@
 
 lib-y  += kobject.o kref.o kobject_uevent.o klist.o
 
-obj-y += sort.o parser.o halfmd4.o debug_locks.o random32.o bust_spinlocks.o
+obj-y += div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
+bust_spinlocks.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
--- tcp-2.6.orig/lib/div64.c2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/lib/div64.c 2007-03-02 17:22:38.0 -0800
@@ -58,4 +58,26 @@
 
 EXPORT_SYMBOL(__div64_32);
 
+/* 64bit divisor, dividend and result. dynamic precision */
+uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   uint32_t d = divisor;
+
+   if (divisor > 0xffffffffULL) {
+   unsigned int shift = fls(divisor >> 32);
+
+   d = divisor >> shift;
+   dividend >>= shift;
+   }
+
+   /* avoid 64 bit division if possible */
+   if (dividend >> 32)
+   do_div(dividend, d);
+   else
+   dividend = (uint32_t) dividend / d;
+
+   return dividend;
+}
+EXPORT_SYMBOL(div64_64);
+
 #endif /* BITS_PER

Re: [PATCH] [USBNET] DM9501: Add Corega FEther USB-TXC support.

2007-03-02 Thread Jeff Garzik

YOSHIFUJI Hideaki / 吉藤英明 wrote:

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
---
 drivers/usb/net/dm9601.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/net/dm9601.c b/drivers/usb/net/dm9601.c
index 4a932e1..c0bc52b 100644
--- a/drivers/usb/net/dm9601.c
+++ b/drivers/usb/net/dm9601.c
@@ -571,6 +571,10 @@ static const struct driver_info dm9601_info = {
 
 static const struct usb_device_id products[] = {

{
+USB_DEVICE(0x07aa, 0x9601),/* Corega FEther USB-TXC */
+.driver_info = (unsigned long)&dm9601_info,
+},
+   {



ACK the patch, though I wonder if this shouldn't instead go to Greg.

Honestly, I would prefer that the USB net drivers were moved into 
drivers/net with the other net drivers, /then/ I would merge such 
patches.  We don't add drivers for PCI-based hardware to 
drivers/pci/net, after all...




Re: [Bonding-devel] [PATCH 2/3] bonding: only receive ARPs for us

2007-03-02 Thread Jay Vosburgh
Jeff Garzik <[EMAIL PROTECTED]> wrote:

>Jay Vosburgh wrote:
>>  The ARP validation code only needs ARPs for the bonding device.
>> 
>> Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
>
>I seem to have lost the context of this.  Did this get discussed, and 
>need further revision?

The further discussion can be (loosely) paraphrased as:

Andy Gospodarek <[EMAIL PROTECTED]>: "Hey, this no workee with IPv6."

Me: "True, but bonding no workee with IPv6 at all."

Andy: "Oh, ok.  Ack."

After which followed some preliminary yakkage about fixing up
said non-workee IPv6 support.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [Bonding-devel] [PATCH 2/3] bonding: only receive ARPs for us

2007-03-02 Thread Jeff Garzik

Jay Vosburgh wrote:
> Jeff Garzik <[EMAIL PROTECTED]> wrote:
>
>> Jay Vosburgh wrote:
>>> The ARP validation code only needs ARPs for the bonding device.
>>>
>>> Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
>>
>> I seem to have lost the context of this.  Did this get discussed, and
>> need further revision?
>
> The further discussion can be (loosely) paraphrased as:
>
> Andy Gospodarek <[EMAIL PROTECTED]>: "Hey, this no workee with IPv6."
>
> Me: "True, but bonding no workee with IPv6 at all."
>
> Andy: "Oh, ok.  Ack."
>
> After which followed some preliminary yakkage about fixing up
> said non-workee IPv6 support.


thanks :)  I'll make sure the 3 patches go into #upstream-fixes




Re: [PATCH] [USBNET] DM9501: Add Corega FEther USB-TXC support.

2007-03-02 Thread Greg KH
On Fri, Mar 02, 2007 at 08:33:55PM -0500, Jeff Garzik wrote:
> YOSHIFUJI Hideaki / 吉藤英明 wrote:
> >Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
> >---
> > drivers/usb/net/dm9601.c |    4 ++++
> > 1 files changed, 4 insertions(+), 0 deletions(-)
> >
> >diff --git a/drivers/usb/net/dm9601.c b/drivers/usb/net/dm9601.c
> >index 4a932e1..c0bc52b 100644
> >--- a/drivers/usb/net/dm9601.c
> >+++ b/drivers/usb/net/dm9601.c
> >@@ -571,6 +571,10 @@ static const struct driver_info dm9601_info = {
> > 
> > static const struct usb_device_id products[] = {
> > {
> >+ USB_DEVICE(0x07aa, 0x9601),/* Corega FEther USB-TXC */
> >+ .driver_info = (unsigned long)&dm9601_info,
> >+ },
> >+{
> 
> 
> ACK the patch, though I wonder if this shouldn't instead go to Greg.
> 
> Honestly, I would prefer that the USB net drivers were moved into 
> drivers/net with the other net drivers, /then/ I would merge such 
> patches.  We don't add drivers for PCI-based hardware to 
> drivers/pci/net, after all...

I have no objection to that.  Things have been moving out of the
drivers/usb/ directory over time, and if you want to take these under
your umbrella too, that's fine with me.

David, any objections?

thanks,

greg k-h


Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: 03 Mar 2007 03:14:29 +0100

> That's pretty common with many x86 server boards because 
> they come with two NICs by default but most people only
> plug the cable into one. However the distro installers
> run DHCP on all.

Nope, that's not what I've seen them do, instead they run dhcp on
interfaces that report a link being present.


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Herbert Xu
David Miller <[EMAIL PROTECTED]> wrote:
> 
> I'm tempted to say I must be missing something here, since I can't see
> how this could possibly work at all.  The string passed in should
> be interpreted as the ifindex value, and thus trigger a -ENODEV
> return from AF_PACKET's bind() implementation.

This is using packet_bind_spkt which uses a name instead of ifindex.

As you may recall, I've made a patch to convert it to use the new
(actually it's not-so-new anymore) AF_PACKET interface.
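
For reference, the ifindex-based addressing of the newer AF_PACKET interface can be sketched roughly as below. This only fills in a struct sockaddr_ll from an interface name; fill_ll_addr() is a hypothetical helper for illustration, and creating the socket and calling bind() (which needs CAP_NET_RAW) is left out.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: translate an interface name into the
 * ifindex-based link-layer address used by the sockaddr_ll form of
 * AF_PACKET. Returns 0 on success, -1 if the name is unknown. */
static int fill_ll_addr(const char *ifname, struct sockaddr_ll *sll)
{
	unsigned int ifindex;

	assert(ifname != NULL && sll != NULL);

	ifindex = if_nametoindex(ifname);
	if (ifindex == 0)
		return -1;	/* no such device */

	memset(sll, 0, sizeof(*sll));
	sll->sll_family = AF_PACKET;
	sll->sll_protocol = htons(ETH_P_ALL);	/* all protocols, as a tap would */
	sll->sll_ifindex = (int)ifindex;	/* the device is named by index */
	return 0;
}
```

The resulting structure is what would be handed to bind() on a socket(AF_PACKET, SOCK_RAW, ...) descriptor, in contrast to the name-in-sockaddr form that packet_bind_spkt handles.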

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Sat, 03 Mar 2007 16:38:45 +1100

> This is using packet_bind_spkt which uses a name instead of ifindex.

So it should be just fine, it should be binding to a specific
device (by name instead of ifindex) and therefore it should
only trigger the pt_all hook when the packet arrives on that
specific device.

> As you may recall, I've made a patch to convert it to use the new
> (actually it's not-so-new anymore) AF_PACKET interface.

That's right.

So it's still a mystery why dhcp is causing bridge devices
to trigger the network tap paths on Stephen's machine.


Re: ppp and routing table rules.

2007-03-02 Thread Bill Fink
On Thu, 01 Mar 2007, Ben Greear wrote:

> Ben Greear wrote:
> 
> I am sending udp packets through ppp400, and I see them appear on ppp401 
> as expected.
> 
> The thing that is bothering me is that all I see on rddVR4 (172.1.2.1) 
> is arps for 172.1.2.2, but the 'tell' IP is that of the
> originating ppp400 link, not the IP of rddVR4, as I expected:
> 
> 21:47:16.119640 arp who-has 172.1.2.2 tell 11.1.1.3
> 21:47:17.119371 arp who-has 172.1.2.2 tell 11.1.1.3
> 21:47:18.119254 arp who-has 172.1.2.2 tell 11.1.1.3
> 21:47:19.273118 arp who-has 172.1.2.2 tell 11.1.1.3
> 
> Unless I'm missing something dumb, a similar setup with all ethernet-ish 
> network devices
> works fine.
> 
> I have also enabled arp filtering:
> # Only answer ARPs if it is for the IP on our own interface.
> echo 2 > /proc/sys/net/ipv4/conf/all/arp_ignore
> and for every device used in these routing tables:
> echo 1 > /proc/sys/net/ipv4/conf/[dev]/arp_filter
> 
> Any idea what I need to do in order to make  the source IP for the ARP 
> packet correct?

Wouldn't that be controlled by arp_announce?

arp_announce - INTEGER
    Define different restriction levels for announcing the local
    source IP address from IP packets in ARP requests sent on
    interface:
    0 - (default) Use any local address, configured on any interface
    1 - Try to avoid local addresses that are not in the target's
        subnet for this interface. This mode is useful when target
        hosts reachable via this interface require the source IP
        address in ARP requests to be part of their logical network
        configured on the receiving interface. When we generate the
        request we will check all our subnets that include the
        target IP and will preserve the source address if it is from
        such subnet. If there is no such subnet we select source
        address according to the rules for level 2.
    2 - Always use the best local address for this target.
        In this mode we ignore the source address in the IP packet
        and try to select local address that we prefer for talks with
        the target host. Such local address is selected by looking
        for primary IP addresses on all our subnets on the outgoing
        interface that include the target IP address. If no suitable
        local address is found we select the first local address
        we have on the outgoing interface or on all other interfaces,
        with the hope we will receive reply for our request and
        even sometimes no matter the source IP address we announce.

    The max value from conf/{all,interface}/arp_announce is used.

    Increasing the restriction level gives more chance for
    receiving answer from the resolved target while decreasing
    the level announces more valid sender's information.
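
The "max value" rule is easy to misread, so here is a minimal sketch of how the effective level would be derived from the two sysctls. The enum names and effective_arp_announce() are invented for this example and are not kernel symbols.

```c
#include <assert.h>

/* The three levels described in the text above (names are made up). */
enum arp_announce_level {
	ARP_ANNOUNCE_ANY = 0,		/* any local address */
	ARP_ANNOUNCE_PREFER_SUBNET = 1,	/* avoid addresses outside target's subnet */
	ARP_ANNOUNCE_BEST_LOCAL = 2,	/* always pick the best local address */
};

/* Effective level for one device: the larger of conf/all/arp_announce
 * and conf/<dev>/arp_announce. */
static int effective_arp_announce(int conf_all, int conf_iface)
{
	assert(conf_all >= ARP_ANNOUNCE_ANY && conf_all <= ARP_ANNOUNCE_BEST_LOCAL);
	assert(conf_iface >= ARP_ANNOUNCE_ANY && conf_iface <= ARP_ANNOUNCE_BEST_LOCAL);

	return conf_all > conf_iface ? conf_all : conf_iface;
}
```

So with conf/all/arp_announce left at 0, setting 2 on the outgoing device alone should be enough to get level 2 behaviour on that device.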

-Bill


Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-02 Thread David Miller
From: Baruch Even <[EMAIL PROTECTED]>
Date: Thu, 1 Mar 2007 20:13:40 +0200

> If you take this approach it makes sense to also remove the sorting of
> SACKs, the traversal of the SACK blocks will not start from the
> beginning anyway which was the reason for this sorting in the first
> place.
> 
> One drawback for this approach is that you now walk the entire sack
> block when you advance one packet. If you consider a 10,000 packet queue
> which had several losses at the beginning and a large sack block that
> advances from the middle to the end you'll walk a lot of packets for
> that one last stretch of a sack block.
> 
> One way to handle that is to use the still existing sack fast path to
> detect this case and calculate what is the sequence number to search
> for. Since you know what was the end_seq that was handled last, you can
> search for it as the start_seq and go on from there. Does it make sense?

Thanks for the feedback and these great ideas.

BTW, I think I figured out a way to get rid of
lost_{skb,cnt}_hint.  The fact of the matter in this case is that
the setting of the tag bits always propagates from front of the queue
onward.  We don't get holes mid-way.

So what we can do is search the RB-tree for high_seq and walk
backwards.  Once we hit something with TCPCB_TAGBITS set, we
stop processing as there are no earlier SKBs which we'd need
to do anything with.

Do you see any problems with that idea?
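
A rough sketch of that backward walk, to make the termination condition concrete. It assumes a sorted array in place of the RB-tree, a single "tagged" flag in place of TCPCB_TAGBITS, and ignores sequence number wraparound; struct seg, find_below() and tag_backwards() are hypothetical names, not anything in the TCP code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg {
	uint32_t seq;	/* first sequence number of the segment */
	int tagged;	/* nonzero once any tag bit has been set */
};

/* Index of the last segment with seq < high_seq, or -1 if none.
 * Binary search over the sorted array stands in for the tree lookup. */
static long find_below(const struct seg *q, size_t n, uint32_t high_seq)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (q[mid].seq < high_seq)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (long)lo - 1;
}

/* Walk backwards from high_seq, tagging segments, and stop at the
 * first one that is already tagged: tags only ever spread from the
 * front of the queue, so everything earlier is already done.
 * Returns the number of segments newly tagged. */
static size_t tag_backwards(struct seg *q, size_t n, uint32_t high_seq)
{
	size_t marked = 0;
	long i;

	for (i = find_below(q, n, high_seq); i >= 0 && !q[i].tagged; i--) {
		q[i].tagged = 1;
		marked++;
	}
	return marked;
}
```

The point of the sketch is that the work per call is bounded by the number of untagged segments below high_seq, with no hint pointer to keep in sync.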

scoreboard_skb_hint is a little bit trickier, but it is a similar
case to the tcp_lost_skb_hint case.  Except here the termination
condition is a relative timeout instead of a sequence number and
packet count test.

Perhaps for that we can remember some state from the
tcp_mark_head_lost() we do first.  In fact, we can start
the queue walk from the latest packet which tcp_mark_head_lost()
marked with a tag bit.

Basically these two algorithms are saying:

1) Mark up to smallest of 'lost' or tp->high_seq.
2) Mark packets after those processed in #1 which have
   timed out.

Right?


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Herbert Xu
On Fri, Mar 02, 2007 at 09:59:05PM -0800, David Miller wrote:
> 
> So it's still a mystery why dhcp is causing bridge devices
> to trigger the network tap paths on Stephen's machine.

If this is the ISC DHCP daemon then perhaps it's because Stephen
didn't specify an interface for it to listen on? By default it'll
enumerate all broadcast interfaces and listen to each one of them.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger

David Miller wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: 03 Mar 2007 03:14:29 +0100
>
>> That's pretty common with many x86 server boards because
>> they come with two NICs by default but most people only
>> plug the cable into one. However the distro installers
>> run DHCP on all.
>
> Nope, that's not what I've seen them do, instead they run dhcp on
> interfaces that report a link being present.

Actually, it may be even simpler... I start bridge with a script and
there was still a dhclient left over running on the original interface.
It was an interesting exercise, and I have new tools to help, but still
no magic bullet to get up to full line rate.

