date:20070306

Re: [GIT PATCH (TAKE 2)] [NET]: Use {htons,htonl,cpu_to_le16}() where appropriate.

2007-03-06 Thread David Miller

From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 16:08:21 +0900 (JST)

> Dave,
> 
> In article <[EMAIL PROTECTED]> (at Wed, 07 Mar 2007 14:58:07 +0900 (JST)), 
> YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]> says:
> 
> > Please consider pulling following changesets from the
> > "net-2.6.22-20070307a-byteorder-20070307" branch at
> > .
> 
> Argh, I found more places to convert in bluetooth.
> 
> I've made a new branch "net-2.6.22-20070307a-byteorder-20070307a" at
> .

Pulled, thank you very much.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] NETLINK: convert NLMSG_GOODSIZE to constant expression.

2007-03-06 Thread David Miller

From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 16:34:01 +0900 (JST)

> This fixes the following error:
> :
> |  CC [M]  net/ipv4/netfilter/ipt_ULOG.o
> |net/ipv4/netfilter/ipt_ULOG.c:82: error: braced-group within expression allowed only inside a function
> |net/ipv4/netfilter/ipt_ULOG.c:82: error: syntax error before "void"
> |make[1]: *** [net/ipv4/netfilter/ipt_ULOG.o] Error 1
> |make: *** [net/ipv4/netfilter/] Error 2
> 
> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Sorry about this :-/

Thanks for the fix, applied.

[PATCH] NETLINK: convert NLMSG_GOODSIZE to constant expression.

2007-03-06 Thread YOSHIFUJI Hideaki / 吉藤英明

This fixes the following error:
:
|  CC [M]  net/ipv4/netfilter/ipt_ULOG.o
|net/ipv4/netfilter/ipt_ULOG.c:82: error: braced-group within expression allowed only inside a function
|net/ipv4/netfilter/ipt_ULOG.c:82: error: syntax error before "void"
|make[1]: *** [net/ipv4/netfilter/ipt_ULOG.o] Error 1
|make: *** [net/ipv4/netfilter/] Error 2

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 54597e4..a9d3ad5 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -175,8 +175,12 @@ int netlink_sendskb(struct sock *sk, struct sk_buff *skb, int protocol);
  * use enormous buffer sizes on recvmsg() calls just to avoid
  * MSG_TRUNC when PAGE_SIZE is very large.
  */
-#define NLMSG_GOODSIZE \
-   SKB_WITH_OVERHEAD(min(PAGE_SIZE,8192UL))
+#if PAGE_SIZE < 8192UL
+#define NLMSG_GOODSIZE SKB_WITH_OVERHEAD(PAGE_SIZE)
+#else
+#define NLMSG_GOODSIZE SKB_WITH_OVERHEAD(8192UL)
+#endif
+
 #define NLMSG_DEFAULT_SIZE (NLMSG_GOODSIZE - NLMSG_HDRLEN)
 
 

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

Re: Multipath routing in Linux 2.6

2007-03-06 Thread Jarek Poplawski

On 06-03-2007 21:36, Tore Anderson wrote:
> 
>   Hello list,
> 
>   I've been trying to figure out how to make equal-cost multipath
>  routing work, with no luck.  Asked on the LARTC list with no success,

It is probably one of the most frequently asked questions
on LARTC, so I'd suggest looking at its archives.

>   I'm sending traffic from a relatively busy network to this table:
...
>   I've tried loading and unloading the multipath_{wrandom,rr,random,drr}

Multipath with caching doesn't work with forwarding.
...
>   I feel I'm missing something essential here but I have no idea what.
>  Google only tells me about others having roughly the same problem but
>  never any solution. [...]

It must be this fake google.

Some wrandom suggestions:

CONFIG_IP_ROUTE_MULTIPATH = y
CONFIG_IP_ROUTE_MULTIPATH_CACHED = n
rp_filter turned off
iptables CONNMARK or Julian Anastasov's patch
more ip route ... & ip rule ...
go back to LARTC asking why it still doesn't work...

Regards,
Jarek P.

[GIT PATCH (TAKE 2)] [NET]: Use {htons,htonl,cpu_to_le16}() where appropriate.

2007-03-06 Thread YOSHIFUJI Hideaki / 吉藤英明

Dave,

In article <[EMAIL PROTECTED]> (at Wed, 07 Mar 2007 14:58:07 +0900 (JST)), 
YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]> says:

> Please consider pulling following changesets from the
> "net-2.6.22-20070307a-byteorder-20070307" branch at
> .

Argh, I found more places to convert in bluetooth.

I've made a new branch "net-2.6.22-20070307a-byteorder-20070307a" at
.

Thanks.

HEADLINES
-

[NET] 802: Use hton{s,l}() where appropriate.
[NET] 8021Q: Use htons() where appropriate.
[NET] ATM: Use htons() where appropriate.
[NET] BLUETOOTH: Use cpu_to_le{16,32}() where appropriate.
[NET] CORE: Use htons() where appropriate.
[NET] ETHERNET: Use htons() where appropriate.
[NET] IEEE80211: Use htons() where appropriate.
[NET] IPV4: Use hton{s,l}() where appropriate.
[NET] NETFILTER: Use htonl() where appropriate.
[NET] SCHED: Use htons() where appropriate.
[NET] TIPC: Use htons() where appropriate.

DIFFSTAT


 net/802/fddi.c                              |    4 +-
 net/802/hippi.c                             |    4 +-
 net/8021q/vlan_dev.c                        |    6 +-
 net/atm/br2684.c                            |    4 +-
 net/bluetooth/hci_conn.c                    |   36 +++---
 net/bluetooth/hci_core.c                    |   20
 net/bluetooth/hci_event.c                   |    8 ++-
 net/bluetooth/l2cap.c                       |   70 ++-
 net/core/netpoll.c                          |    2 -
 net/ethernet/eth.c                          |    2 -
 net/ieee80211/ieee80211_tx.c                |    2 -
 net/ipv4/ipvs/ip_vs_core.c                  |   10 ++--
 net/ipv4/ipvs/ip_vs_proto_ah.c              |   16 +++---
 net/ipv4/ipvs/ip_vs_xmit.c                  |   16 +++---
 net/ipv4/netfilter/ip_conntrack_proto_tcp.c |    9 ++-
 net/netfilter/nf_conntrack_proto_tcp.c      |    9 ++-
 net/sched/cls_rsvp.h                        |    2 -
 net/sched/sch_api.c                         |    2 -
 net/tipc/eth_media.c                        |    2 -
 19 files changed, 111 insertions(+), 113 deletions(-)

CHANGESETS
--

commit 9f72aaaed003b2dda0fc5fa28c2caccc4ae358e3
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:33 2007 +0900

[NET] 802: Use hton{s,l}() where appropriate.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/802/fddi.c b/net/802/fddi.c
index ace6386..8c86216 100644
--- a/net/802/fddi.c
+++ b/net/802/fddi.c
@@ -100,7 +100,7 @@ static int fddi_rebuild_header(struct sk_buff   *skb)
struct fddihdr *fddi = (struct fddihdr *)skb->data;
 
 #ifdef CONFIG_INET
-   if (fddi->hdr.llc_snap.ethertype == __constant_htons(ETH_P_IP))
+   if (fddi->hdr.llc_snap.ethertype == htons(ETH_P_IP))
/* Try to get ARP to resolve the header and fill destination address */
return arp_find(fddi->daddr, skb);
else
@@ -135,7 +135,7 @@ __be16 fddi_type_trans(struct sk_buff *skb, struct net_device *dev)
if(fddi->hdr.llc_8022_1.dsap==0xe0)
{
skb_pull(skb, FDDI_K_8022_HLEN-3);
-   type = __constant_htons(ETH_P_802_2);
+   type = htons(ETH_P_802_2);
}
else
{
diff --git a/net/802/hippi.c b/net/802/hippi.c
index 578f2a3..35dd938 100644
--- a/net/802/hippi.c
+++ b/net/802/hippi.c
@@ -60,7 +60,7 @@ static int hippi_header(struct sk_buff *skb, struct net_device *dev,
 * Due to the stupidity of the little endian byte-order we
 * have to set the fp field this way.
 */
-   hip->fp.fixed   = __constant_htonl(0x04800018);
+   hip->fp.fixed   = htonl(0x04800018);
hip->fp.d2_size = htonl(len + 8);
hip->le.fc  = 0;
hip->le.double_wide = 0;/* only HIPPI 800 for the time being */
@@ -104,7 +104,7 @@ static int hippi_rebuild_header(struct sk_buff *skb)
 * Only IP is currently supported
 */
 
-   if(hip->snap.ethertype != __constant_htons(ETH_P_IP))
+   if(hip->snap.ethertype != htons(ETH_P_IP))
{
printk(KERN_DEBUG "%s: unable to resolve type %X addresses.\n",skb->dev->name,ntohs(hip->snap.ethertype));
return 0;

---
commit d186c0a25b23079f5faac2500d3dcbd8b7d6e166
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:35 2007 +0900

[NET] 8021Q: Use htons() where appropriate.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 2fc8fe2..e961d59 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -258,7 +258,7 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 * won't work for fault tolerant netware but does for the rest.
 */
if (*(unsigned short *)rawp == 0x) {
-   skb->protocol = __constant_htons(ETH_P_802_3);
+   skb->pr

Re: [RFC] ARP notify option

2007-03-06 Thread Pekka Savola


On Tue, 6 Mar 2007, Chris Friesen wrote:

> Stephen Hemminger wrote:
>
>>  +arp_notify - BOOLEAN
>>  +  Define mode for notification of address and device changes.
>>  +  0 - (default): do nothing
>>  +  1 - Generate gratuitous arp replies when device is brought up
>>  +  or hardware address changes.
>
> Did you consider using gratuitous arp requests instead?  I remember reading
> about some hardware that updated its arp cache on gratuitous requests but not
> gratuitous replies.


You might be interested in taking a look at:

http://tools.ietf.org/id/draft-cheshire-ipv4-acd

There has been some follow-up discussion on this in the thread 
starting at:


http://www1.ietf.org/mail-archive/web/int-area/current/msg00611.html

In particular, you may be interested in this comment about ARP 
request and ARP reply for gratuitous ARP:


http://www1.ietf.org/mail-archive/web/int-area/current/msg00669.html

--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings

Update to cube root benchmark code

2007-03-06 Thread Willy Tarreau

Hi Stephen,

Thanks for this code, it's easy to experiment with it.
Let me propose this simple update with a variation on your ncubic() function.
I noticed that all intermediate results fit well within 32 bits, so I wrote a
new version which is 30% faster on my Athlon with the same results. This is
because we only use x and a/x^2 in the function, with x very close to cbrt(a).
So a/x^2 is also very close to cbrt(a), which needs at most 22 bits (the cube
root of a 64-bit value is below 2^22). So we only use the lower 32 bits of the
result of div64_64(), and all intermediate computations can be done on 32 bits
(including multiplies and divides).

[EMAIL PROTECTED]:~$ ./bictcp 
Calibrating
Function  clocks  mean(us)  max(us)  std(us)  Avg error
bictcp      1085      0.70    28.19     2.30     0.172%
ocubic       869      0.56    22.76     1.23     0.274%
ncubic       637      0.41    16.29     1.41     0.247%
ncubic32     435      0.28    11.18     1.03     0.247%
acbrt        824      0.53    21.03     0.85     0.275%
hcbrt        547      0.35    13.96     0.42     1.580%

I also tried to improve it a bit by checking for early convergence and
returning before the last divide, but it is worthless: early convergence
almost never happens, so it does not make the code any faster.

Here's the code. I think it would be fine to merge this version, since it
should behave better on most 32-bit machines.

Best regards,
Willy

/*
Here is a better version of the benchmark code.
It has the original code used in 2.4 version of Cubic for comparison

---
*/
/* Test and measure perf of cube root algorithms.  */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

#ifdef __x86_64

#define rdtscll(val) do { \
 unsigned int __a,__d; \
 asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
 (val) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \
} while(0)

# define do_div(n,base) ({  \
uint32_t __base = (base);   \
uint32_t __rem; \
__rem = ((uint64_t)(n)) % __base;   \
(n) = ((uint64_t)(n)) / __base; \
__rem;  \
 })


/**
 * __ffs - find first bit in word.
 * @word: The word to search
 *
 * Undefined if no bit exists, so code should check against 0 first.
 */
static __inline__ unsigned long __ffs(unsigned long word)
{
__asm__("bsfq %1,%0"
:"=r" (word)
:"rm" (word));
return word;
}

/*
 * __fls: find last bit set.
 * @word: The word to search
 *
 * Undefined if no set bit exists, so code should check against 0 first.
 */
static inline unsigned long __fls(unsigned long word)
{
__asm__("bsrq %1,%0"
:"=r" (word)
:"rm" (word));
return word;
}

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz (man ffs).
 */
static __inline__ int ffs(int x)
{
int r;

__asm__("bsfl %1,%0\n\t"
"cmovzl %2,%0" 
: "=r" (r) : "rm" (x), "r" (-1));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs.
 */
static inline int fls(int x)
{
int r;

__asm__("bsrl %1,%0\n\t"
"cmovzl %2,%0"
: "=&r" (r) : "rm" (x), "rm" (-1));
return r+1;
}

/**
 * fls64 - find last bit set in 64 bit word
 * @x: the word to search
 *
 * This is defined the same way as fls.
 */
static inline int fls64(uint64_t x)
{
if (x == 0)
return 0;
return __fls(x) + 1;
}

static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
{
return dividend / divisor;
}

#elif __i386

#define rdtscll(val) \
 __asm__ __volatile__("rdtsc" : "=A" (val))

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz() (man ffs).
 */
static inline int ffs(int x)
{
int r;

__asm__("bsfl %1,%0\n\t"
"jnz 1f\n\t"
"movl $-1,%0\n"
"1:" : "=r" (r) : "rm" (x));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs().
 */
static inline int fls(int x)
{
int r;

__asm__("bsrl %1,%0\n\t"
"jnz 1f\n\t"
"movl $-1,%0\n"
"1:" : "=r" (r) : "rm" (x));
return r+1;
}

static inline int fls64(uint64_t x)
{
uint32_t h = x >> 32;
if (h)
return fls(h) + 32;
return fls(x);
}


#define do_div(n,base) ({ \
unsigned long __upper, __low, __high, __mod, __base; \
__base =

[NET]: Use {htons,htonl,cpu_to_le16}() where appropriate.

2007-03-06 Thread YOSHIFUJI Hideaki / 吉藤英明

Dave,

Please consider pulling following changesets from the
"net-2.6.22-20070307a-byteorder-20070307" branch at
.

Thank you.

HEADLINES
-

[NET] 802: Use hton{s,l}() where appropriate.
[NET] 8021Q: Use htons() where appropriate.
[NET] ATM: Use htons() where appropriate.
[NET] BLUETOOTH: Use cpu_to_le16() where appropriate.
[NET] CORE: Use htons() where appropriate.
[NET] ETHERNET: Use htons() where appropriate.
[NET] IEEE80211: Use htons() where appropriate.
[NET] IPV4: Use hton{s,l}() where appropriate.
[NET] NETFILTER: Use htonl() where appropriate.
[NET] SCHED: Use htons() where appropriate.
[NET] TIPC: Use htons() where appropriate.

DIFFSTAT


 net/802/fddi.c                              |    4 ++--
 net/802/hippi.c                             |    4 ++--
 net/8021q/vlan_dev.c                        |    6 +++---
 net/atm/br2684.c                            |    4 ++--
 net/bluetooth/hci_conn.c                    |   18 +-
 net/core/netpoll.c                          |    2 +-
 net/ethernet/eth.c                          |    2 +-
 net/ieee80211/ieee80211_tx.c                |    2 +-
 net/ipv4/ipvs/ip_vs_core.c                  |   10 +-
 net/ipv4/ipvs/ip_vs_proto_ah.c              |   16
 net/ipv4/ipvs/ip_vs_xmit.c                  |   16
 net/ipv4/netfilter/ip_conntrack_proto_tcp.c |    9 -
 net/netfilter/nf_conntrack_proto_tcp.c      |    9 -
 net/sched/cls_rsvp.h                        |    2 +-
 net/sched/sch_api.c                         |    2 +-
 net/tipc/eth_media.c                        |    2 +-
 16 files changed, 53 insertions(+), 55 deletions(-)

CHANGESETS
--

commit 2893c534ffba19c31c0321f8221eaf844306e951
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:33 2007 +0900

[NET] 802: Use hton{s,l}() where appropriate.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/802/fddi.c b/net/802/fddi.c
index ace6386..8c86216 100644
--- a/net/802/fddi.c
+++ b/net/802/fddi.c
@@ -100,7 +100,7 @@ static int fddi_rebuild_header(struct sk_buff   *skb)
struct fddihdr *fddi = (struct fddihdr *)skb->data;
 
 #ifdef CONFIG_INET
-   if (fddi->hdr.llc_snap.ethertype == __constant_htons(ETH_P_IP))
+   if (fddi->hdr.llc_snap.ethertype == htons(ETH_P_IP))
/* Try to get ARP to resolve the header and fill destination address */
return arp_find(fddi->daddr, skb);
else
@@ -135,7 +135,7 @@ __be16 fddi_type_trans(struct sk_buff *skb, struct net_device *dev)
if(fddi->hdr.llc_8022_1.dsap==0xe0)
{
skb_pull(skb, FDDI_K_8022_HLEN-3);
-   type = __constant_htons(ETH_P_802_2);
+   type = htons(ETH_P_802_2);
}
else
{
diff --git a/net/802/hippi.c b/net/802/hippi.c
index 578f2a3..35dd938 100644
--- a/net/802/hippi.c
+++ b/net/802/hippi.c
@@ -60,7 +60,7 @@ static int hippi_header(struct sk_buff *skb, struct net_device *dev,
 * Due to the stupidity of the little endian byte-order we
 * have to set the fp field this way.
 */
-   hip->fp.fixed   = __constant_htonl(0x04800018);
+   hip->fp.fixed   = htonl(0x04800018);
hip->fp.d2_size = htonl(len + 8);
hip->le.fc  = 0;
hip->le.double_wide = 0;/* only HIPPI 800 for the time being */
@@ -104,7 +104,7 @@ static int hippi_rebuild_header(struct sk_buff *skb)
 * Only IP is currently supported
 */
 
-   if(hip->snap.ethertype != __constant_htons(ETH_P_IP))
+   if(hip->snap.ethertype != htons(ETH_P_IP))
{
printk(KERN_DEBUG "%s: unable to resolve type %X addresses.\n",skb->dev->name,ntohs(hip->snap.ethertype));
return 0;

---
commit 60e71cd6525fdf2b35c71158bb5441a252e62a0e
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:35 2007 +0900

[NET] 8021Q: Use htons() where appropriate.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 2fc8fe2..e961d59 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -258,7 +258,7 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 * won't work for fault tolerant netware but does for the rest.
 */
if (*(unsigned short *)rawp == 0x) {
-   skb->protocol = __constant_htons(ETH_P_802_3);
+   skb->protocol = htons(ETH_P_802_3);
/* place it back on the queue to be handled by true layer 3 protocols.
 */
 
@@ -281,7 +281,7 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
/*
 *  Real 802.2 LLC
 */
-   skb->protocol = __constant_htons(ETH_P_802_2);
+   skb->protocol = htons(ETH_P_802_2);
/* place it back

Re: [ubuntu-marketing] Why should we teach students Linux??

2007-03-06 Thread Melissa Draper

Roel Bindels wrote:
> Hello listers,
>
> I'm a tutor at the Faculty of ICT, department NID. This is a bachelor's degree
> and we are preparing our students to become something more than just
> System Administrators (such as managers, consultants, etc.). Since this
> department is part of the Microsoft camp, the students are educated
> mostly in this direction, which I think is not a bad thing. A better
> thing would be if we could give our students the opportunity to meet
> both systems on the same level; at least, that is my opinion.
>
> To change a study's curriculum, I need a solid case. So if somebody
> knows a link/document about why we should educate our students in the
> Linux OS, please send it, or an article about the usage of Linux in companies.
>
> I hope you will all take some time to send me your best links/documents.
>
> with best regards
>
> Roel Bindels
>   
Roel,

I recently interviewed Richard Weideman (who I am adding to the CC so he
can comment directly) for an article about Edubuntu. One of the comments
he made was:

"Kids who learn to use a computer from scratch are not afraid of
Linux or OpenOffice.org. They concentrate on the learning task at
hand, and they learn to use whatever the tool is put in front of
them. If some of those kids graduate to a work environment using
Linux or OpenOffice.org, they will have no problem. If the new work
environment uses Windows, they will adjust without any issues. Some
of them will even propose OpenOffice.org or Linux at work, and help
their new company to migrate and save money."

You can read the full article at
http://www.linux.com/article.pl?sid=07/02/20/197251

-- 
Sincerely
Melissa Draper

http://www.meldraweb.com

Phone: 0404 595 395
(intl): +61 404 595 395

P.O Box 1412
Lavington, NSW 2641

Re: [UDP]: Clean up UDP-Lite receive checksum

2007-03-06 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 12:41:00 +1100

> Hi Dave:
> 
> [UDP]: Clean up UDP-Lite receive checksum
> 
> This patch eliminates some duplicate code for the verification of
> receive checksums between UDP-Lite and UDP.  It does this by
> introducing __skb_checksum_complete_head which is identical to
> __skb_checksum_complete apart from the fact that it takes
> a length parameter rather than computing the first skb->len bytes.
> 
> As a result UDP-Lite will be able to use hardware checksum offload
> for packets which do not use partial coverage checksums.  It also
> means that UDP-Lite loopback no longer does unnecessary checksum
> verification.
> 
> If any NICs start supporting UDP-Lite, this would also start working
> automatically.
> 
> This patch removes the assumption that msg_flags has MSG_TRUNC clear
> upon entry in recvmsg.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Also applied, thanks a lot.

Re: [UDP6]: Restore sk_filter optimisation

2007-03-06 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 12:20:10 +1100

> Hi Dave:
> 
> [UDP6]: Restore sk_filter optimisation
> 
> This reverts the changeset
> 
> [IPV6]: UDPv6 checksum.
> 
> We always need to check UDPv6 checksum because it is mandatory.
> 
> The sk_filter optimisation has nothing to do with whether we verify the
> checksum.  It simply postpones it to the point when the user calls
> recv or poll.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks.

Re: [UDP]: Reread uh pointer after pskb_trim

2007-03-06 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 12:00:20 +1100

> Hi Dave:
> 
> [UDP]: Reread uh pointer after pskb_trim
> 
> The header may have moved when trimming.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

IPV6 got this case right :-)

Applied, and I'll push this to -stable too.

Re: [PATCH] NET : Optimizes inet_getpeer()

2007-03-06 Thread David Miller

From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 11:33:20 +0100

> [PATCH] NET : Optimizes inet_getpeer()
> 
> 1) Some sysctl vars are declared __read_mostly
> 
> 2) We can avoid updating stack[] when doing an AVL lookup only.
> 
> lookup() macro is extended to receive a second parameter, that may be NULL
> in case of a pure lookup (no need to save the AVL path). This removes
> unnecessary instructions, because the compiler knows if this _stack
> parameter is NULL or not.
> 
> text size of net/ipv4/inetpeer.o is 2063 bytes instead of 2107 on x86_64
> 
> Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

Applied, thanks Eric.

Re: [PATCH 1/2] pcnet32: only allocate init_block dma consistent

2007-03-06 Thread Ralf Baechle

On Tue, Mar 06, 2007 at 07:39:21PM -0800, Michael K. Edwards wrote:

> On 3/6/07, Ralf Baechle <[EMAIL PROTECTED]> wrote:
> >This small change btw. delivers about 3% extra performance on a very
> >slow test system.
> 
> Has this change been tested / benchmarked under VMware?  pcnet32 is
> the (default?) virtual device presented by VMware Workstation, and
> that's probably a large fraction of its use in the field these days.
> But then Don probably already knows that.  :-)

Prize question: why would this patch make a difference under VMware? :-)

  Ralf

Re: [PATCH] TCP Yeah: cleanup

2007-03-06 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:56:05 -0800

> 
> Eliminate need for full 64/64 divide to compute queue.
> Variable maxqueue was really a constant.
> Fix indentation.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.

Re: [PATCH] tcp_cubic: faster cube root

2007-03-06 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:47:06 -0800

> The Newton-Raphson method is quadratically convergent so
> only a small fixed number of steps are necessary.
> Therefore it is faster to unroll the loop. Since div64_64 is no longer
> inline it won't cause code explosion.
> 
> Also fixes a bug that can occur if x^2 was bigger than 32 bits.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.
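For reference, the iteration behind these cube-root patches is Newton-Raphson applied to f(x) = x^3 - a; the update is the combination of x and a/x^2 that Willy's benchmark message mentions. The standard error estimate below shows why a small fixed number of steps suffices: once the estimate is close to r = cbrt(a), the relative error e_k/r is roughly squared at every step.

```latex
\begin{aligned}
f(x) &= x^3 - a, \qquad f'(x) = 3x^2, \qquad f''(x) = 6x \\
x_{k+1} &= x_k - \frac{f(x_k)}{f'(x_k)}
         = \frac{1}{3}\left(2x_k + \frac{a}{x_k^2}\right) \\
e_{k+1} &\approx \frac{f''(r)}{2f'(r)}\, e_k^2 = \frac{e_k^2}{r},
  \qquad r = \sqrt[3]{a},\quad e_k = x_k - r \\
\frac{e_{k+1}}{r} &\approx \left(\frac{e_k}{r}\right)^2
\end{aligned}
```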

Re: [ATM] ENI: Convert to struct timeval to ktime_t.

2007-03-06 Thread David Miller

From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 11:31:39 +0900 (JST)

> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Applied, thanks a lot.

Re: IPv6 Davelopment Tree

2007-03-06 Thread David Miller

From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 11:30:30 +0900 (JST)

> Please pull from "net-2.6.22-20070307-FOR_DAVEM-20070307" branch at
>   .

Pulled, thank you very much.

Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Rick Jones wrote:
The timeout is also to cover datagrams which just got "stuck" somewhere 
too (IIRC) and may not necessarily require a multiple path situation.


I guess that's a fair point. Originally, the only possible place for a
packet to get "stuck" was in a router but I suppose that may no longer
be true.

True.  Thankfully, the web learned to use persistent connections so 
later versions of SPECweb benchmarking make use of persistent connections.


As a complete aside, I think it's about time for a SPECldap benchmark...
--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sunhttp://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/


Re: [PATCH 1/2] pcnet32: only allocate init_block dma consistent

2007-03-06 Thread Michael K. Edwards


On 3/6/07, Ralf Baechle <[EMAIL PROTECTED]> wrote:

This small change btw. delivers about 3% extra performance on a very
slow test system.


Has this change been tested / benchmarked under VMware?  pcnet32 is
the (default?) virtual device presented by VMware Workstation, and
that's probably a large fraction of its use in the field these days.
But then Don probably already knows that.  :-)

Cheers,
- Michael

Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread David Stevens

> > Marking the master down would, I believe, issue notifiers that
> > the device has gone down.  Various things, network manager sort of
> > applications in particular, listen to those, so I'm not sure it's a good
> > idea.  I think there are other side effects as well, I'm thinking it
> > would flush routes associated with the interface as well.

[BTW, you can call ip_mc_down()/ip_mc_up() directly w/o getting there
from the notifiers -- then no side-effects.]

Andy Gospodarek wrote:
> 
> I agree with Jay here.  I hate that bonding has to have so much
> knowledge about upper layer protocols, but for the ones that are
> stateful like IGMP we will need fixes like the one proposed.

I have no problem with bonding having knowledge of ULPs (I
don't like it, but I don't have to look at it :-) ), but the
patch is doing it the other way around. What I don't like about the
proposed patch is that it's adding knowledge of bonding to IGMP.
And IGMP does work fine in this case, w/o flooding or the
proposed patch. It just has the risk of losing multicast packets
during one query interval, and that only happens if you're
using a switch that does IGMP snooping.
I'd like the patch a lot better if it were basically this:

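static void mc_bond_fudge(struct in_device *masterdev)
{
	/* caller holds RTNL; masterdev is the bond master's in_device */
	ip_mc_down(masterdev);
	/* do whatever you need to do to switch the slave */
	ip_mc_up(masterdev);
}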

That doesn't go through the notifier chain, uses existing
functions, doesn't have any refcnt issues, and most importantly
could/should reside in a bonding source file and not in igmp.c. :-)
But RTNL is required whether you use up/down or roll your
own variant, so it sounds like you have other issues to resolve too.

+-DLS


Re: [PATCH 1/2] pcnet32: only allocate init_block dma consistent

2007-03-06 Thread Ralf Baechle

On Tue, Mar 06, 2007 at 10:45:23AM -0800, Don Fry wrote:

> The patch below moves the init_block out of the private struct and
> only allocates init block with pci_alloc_consistent. 
> 
> This has two effects:
> 
> 1. Performance increase for non cache coherent machines, because the
>CPU only data in the private struct are now cached

This small change btw. delivers about 3% extra performance on a very
slow test system.

  Ralf

[ATM] ENI: Convert to struct timeval to ktime_t.

2007-03-06 Thread YOSHIFUJI Hideaki / 吉藤英明

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

--- 
diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index 8fccf01..0d3a38b 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -536,7 +536,7 @@ static int rx_aal0(struct atm_vcc *vcc)
return 0;
}
skb_put(skb,length);
-   skb_set_timestamp(skb, &eni_vcc->timestamp);
+   skb->tstamp = eni_vcc->timestamp;
DPRINTK("got len %ld\n",length);
if (do_rx_dma(vcc,skb,1,length >> 2,length >> 2)) return 1;
eni_vcc->rxing++;
@@ -701,7 +701,7 @@ static void get_service(struct atm_dev *dev)
DPRINTK("Grr, servicing VCC %ld twice\n",vci);
continue;
}
-   do_gettimeofday(&ENI_VCC(vcc)->timestamp);
+   ENI_VCC(vcc)->timestamp = ktime_get_real();
ENI_VCC(vcc)->next = NULL;
if (vcc->qos.rxtp.traffic_class == ATM_CBR) {
if (eni_dev->fast)
diff --git a/drivers/atm/eni.h b/drivers/atm/eni.h
index 385090c..d04fefb 100644
--- a/drivers/atm/eni.h
+++ b/drivers/atm/eni.h
@@ -59,7 +59,7 @@ struct eni_vcc {
int rxing;  /* number of pending PDUs */
int servicing;  /* number of waiting VCs (0 or 1) */
int txing;  /* number of pending TX bytes */
-   struct timeval timestamp;   /* for RX timing */
+   ktime_t timestamp;  /* for RX timing */
struct atm_vcc *next;   /* next pending RX */
struct sk_buff *last;   /* last PDU being DMAed (used to carry
   discard information) */
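
For readers more at home in user space: the patch replaces a (seconds, microseconds) struct timeval with ktime_t. The sketch below models ktime_t as a single signed 64-bit nanosecond count; ktime_sketch_t, timeval_to_ktime_sketch() and ktime_get_real_sketch() are hypothetical names that only approximate the kernel's ktime_t and ktime_get_real(), and the kernel's internal ktime_t representation may differ by architecture and version.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical stand-in for ktime_t: one signed 64-bit nanosecond count. */
typedef int64_t ktime_sketch_t;

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_USEC 1000LL

/* Convert the old (seconds, microseconds) timestamp to nanoseconds. */
static ktime_sketch_t timeval_to_ktime_sketch(const struct timeval *tv)
{
    return (ktime_sketch_t)tv->tv_sec * NSEC_PER_SEC +
           (ktime_sketch_t)tv->tv_usec * NSEC_PER_USEC;
}

/* Rough user-space analogue of ktime_get_real(): wall-clock time in ns,
 * or -1 if the clock cannot be read. */
static ktime_sketch_t ktime_get_real_sketch(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    return (ktime_sketch_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}
```

Keeping one integer means later arithmetic on the timestamp (differences, comparisons) is a plain subtraction instead of carrying microseconds into seconds.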

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

Re: IPv6 Davelopment Tree

2007-03-06 Thread YOSHIFUJI Hideaki / 吉藤英明

Dave,

In article <[EMAIL PROTECTED]> (at Tue, 06 Mar 2007 13:36:11 -0800 (PST)), 
David Miller <[EMAIL PROTECTED]> says:

> From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
> Date: Fri, 23 Feb 2007 12:53:01 +0900 (JST)
> 
> > I have cooked up new git tree for IPv6 development.
> > It is available as branch named
> > 2.6.21-rc1-net-2.6-20070223-FOR_DAVEM-20070223
> > at
> > .
> > 
> > I will shift to new branch time to time (e.g. every -rc releases) in order
> > to chase the latest tree.
> 
> What is the current branch name?  I'd like to pull whatever
> you have into my net-2.6.22 tree.

Please pull from "net-2.6.22-20070307-FOR_DAVEM-20070307" branch at
.

Thank you.

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

Re: wireless extensions vs. 64-bit architectures

2007-03-06 Thread Jean Tourrilhes

On Tue, Mar 06, 2007 at 07:43:06PM +0100, Michael Buesch wrote:
> 
> Ok, it is wrapping the following ioctls:
> 
> HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl)
> 
> What about SIOCSIWSCAN, SIOCSIWENCODEEXT, SIOCGIWENCODEEXT
> and some others that also use iw_point?

Ok, please check the patch attached. I don't have a box to
test that on, and on my 32 bit kernel it is not even compiled, but I
believe I got everything all right.
Please push that to the usual channels...

> Greetings Michael.

Thanks again,

Jean

Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>

--

diff -u -p linux/fs/compat_ioctl.j1.c  linux/fs/compat_ioctl.c
--- linux/fs/compat_ioctl.j1.c  2007-03-06 17:49:33.0 -0800
+++ linux/fs/compat_ioctl.c 2007-03-06 17:56:19.0 -0800
@@ -2553,11 +2553,15 @@ HANDLE_IOCTL(I2C_RDWR, do_i2c_rdwr_ioctl
 HANDLE_IOCTL(I2C_SMBUS, do_i2c_smbus_ioctl)
 /* wireless */
 HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCGIWPRIV, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCGIWSTATS, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCSIWMLME, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCSIWSCAN, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl)
@@ -2565,6 +2569,11 @@ HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_i
 HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCSIWGENIE, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCGIWGENIE, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCSIWENCODEEXT, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCGIWENCODEEXT, do_wireless_ioctl)
+HANDLE_IOCTL(SIOCSIWPMKSA, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCSIFBR, old_bridge_ioctl)
 HANDLE_IOCTL(SIOCGIFBR, old_bridge_ioctl)
 HANDLE_IOCTL(RTC_IRQP_READ32, rtc_ioctl)
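
Why the list matters: struct iw_point carries a user-space pointer plus a 16-bit length and 16-bit flags, so a 32-bit process on a 64-bit kernel passes a smaller structure than the kernel expects, and each ioctl whose argument contains an iw_point has to be translated. A minimal user-space sketch of that widening step, assuming the three-field layout just described; iw_point_native, iw_point_compat and iw_point_from_compat() are hypothetical stand-ins, not the kernel's own compat types or helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Layout on a 64-bit kernel: the pointer is 8 bytes wide. */
struct iw_point_native {
    void     *pointer;
    uint16_t  length;
    uint16_t  flags;
};

/* Layout as sent by 32-bit user space: the pointer is only 4 bytes. */
struct iw_point_compat {
    uint32_t  pointer;
    uint16_t  length;
    uint16_t  flags;
};

/* Widen a 32-bit caller's iw_point into the native layout. */
static void iw_point_from_compat(struct iw_point_native *dst,
                                 const struct iw_point_compat *src)
{
    dst->pointer = (void *)(uintptr_t)src->pointer;
    dst->length  = src->length;
    dst->flags   = src->flags;
}
```

On an LP64 build the native struct is 16 bytes and the compat one 8, which is why an ioctl left out of the wrapper list silently misreads its argument.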

Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread Andy Gospodarek

On Tue, Mar 06, 2007 at 03:15:41PM -0800, Jay Vosburgh wrote:
> 
> David Stevens <[EMAIL PROTECTED]> wrote:
> 
> >It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better
> >to call that than add a nearly identical function.
> 
>   Won't ip_mc_up() acquire an additional reference (via
> ip_mc_inc_group) to the IGMP_ALL_HOSTS im->users that would never be
> released (in the case of bonding calling the function out of the blue)?
> 
>   In looking at it, the ip_mc_rejoin_group function (the new one
> added with the patch) is a lot more like igmp_group_added() than
> ip_mc_up().  I'm not sure if the extra bits in igmp_group_added() are
> worthy of concern; I'm thinking not, since im->loaded shouldn't be zero
> coming in for the bonding case.
> 
>   I think the meat that the "rejoin" wants is what's in
> igmpv3_send_cr(), which appears to do the actual sending stuff.  I'm not
> sure if that's better to call directly (and risk locking adventures) or
> to just trip the timer via igmp_ifc_event().
> 
>   Anyway, it looks like all of this needs to be done under RTNL,
> which isn't the case, so I need to go off and look into reworking it
> again.
> 
>   Andy: do you have any work in progress on the sleep / rtnl stuff
> we've been discussing?

Jay, 

I do, but unfortunately it's much closer to the code I'd proposed
originally than the code you sent me.  The more I audited your code, the
more I liked the design -- until I discovered that every time you pause
the timers you need to flush the workqueue.  This is bad since you are
regularly stopping the timers in places where the rtnl lock is taken
and the currently running work item may need that lock to complete.
With a small enough monitor interval I could deadlock pretty quickly.
Without the benefit of the full stop, I couldn't justify the major
conversion just yet (plus it feels like keeping a list of the timers is
re-implementing what workqueues are designed to do for you).

I've got a patch that seems decent so far, but it's really just a
timer->workqueue conversion with some bits thrown in to correctly stop the
queues when taking the interface down or when removing the module.
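
One pattern that sidesteps the flush-while-holding-the-lock deadlock described above is to have the periodic work only try the lock and back off when it is busy, so the side holding the lock can wait for the worker safely. This is a swapped-in technique, not what either patch in this thread does. The sketch uses POSIX threads; big_lock stands in for RTNL and monitor_thread() for a bonding monitor work item, and both names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: big_lock plays the role of RTNL and
 * monitor_thread() the role of a periodic bonding work item. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int stop_requested;
static atomic_int runs_done;     /* passes that got the lock */
static atomic_int runs_deferred; /* passes that found it busy */

static void *monitor_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        if (pthread_mutex_trylock(&big_lock) == 0) {
            atomic_fetch_add(&runs_done, 1);   /* real work goes here */
            pthread_mutex_unlock(&big_lock);
        } else {
            /* Lock busy: skip this pass instead of blocking on it. */
            atomic_fetch_add(&runs_deferred, 1);
        }
        sched_yield();
    }
    return NULL;
}

/* Stop the monitor while holding big_lock. This cannot deadlock because
 * monitor_thread() never blocks waiting for big_lock. */
static void stop_monitor_locked(pthread_t tid)
{
    pthread_mutex_lock(&big_lock);
    atomic_store(&stop_requested, 1);
    pthread_join(tid, NULL);
    pthread_mutex_unlock(&big_lock);
}
```

In a workqueue-based driver the "deferred" branch would re-queue the work item rather than loop, but the property is the same: the waiter may hold the lock because the worker never sleeps on it.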

> >Also, real interfaces already do gratuitous IGMP advertisements when
> >they are bounced (the reason there is an ip_mc_up()). Could bonding,
> >when failing over, simply mark the master interface as down, switch, and
> >then mark the master as up again? In addition to doing the right
> >thing for both IPv4 and IPv6 multicasting w/o any code changes in those
> >layers, it may have similar benefits for ARP and neighbor discovery, 
> >right?
> 
>   Marking the master down would, I believe, issue notifiers that
> the device has gone down.  Various things, network manager sort of
> applications in particular, listen to those, so I'm not sure it's a good
> idea.  I think there are other side effects as well, I'm thinking it
> would flush routes associated with the interface as well.

I agree with Jay here.  I hate that bonding has to have so much
knowledge about upper layer protocols, but for the ones that are
stateful like IGMP we will need fixes like the one proposed.

Re: wireless extensions vs. 64-bit architectures

2007-03-06 Thread Jean Tourrilhes

On Tue, Mar 06, 2007 at 07:43:06PM +0100, Michael Buesch wrote:
> > 
> > Yep, and it's even in fs/compat_ioctl.c. Hint, hint ;-)
> 
> Ok, it is wrapping the following ioctls:
> 
> HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl)
> HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl)
> 
> What about SIOCSIWSCAN, SIOCSIWENCODEEXT, SIOCGIWENCODEEXT
> and some others that also use iw_point?

Yep, good point.
SIOCSIWSCAN is up there. I did not realise that all the WPA
ioctls are missing. That's easy enough to fix, remember that you have
the full description of the ioctls in wireless.c.
I'll try to do a patch if I find 5 min, but feel free to
forward something to John L.
Thanks a lot !

> Greetings Michael.

Jean

[PATCH 2/2] S2IO: Save/Restore unused buffer mappings in 2/3 buffer mode

2007-03-06 Thread Ramkrishna Vepa

- Save/Restore unused buffer mappings in 2/3 buffer mode to avoid
frequent mapping

- Save/Restore adapter reset count during adapter reset

  (Resending; forgot to cc netdev)

Signed-off-by: Santosh Rastapur <[EMAIL PROTECTED]>
---
diff -Nurp patch1/drivers/net/s2io.c patch2/drivers/net/s2io.c
--- patch1/drivers/net/s2io.c   2007-03-06 03:29:18.0 -0800
+++ patch2/drivers/net/s2io.c   2007-03-06 04:00:41.0 -0800
@@ -84,7 +84,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "2.0.17.1"
+#define DRV_VERSION "2.0.19.1"
 
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -2242,6 +2242,7 @@ static int fill_rx_buffers(struct s2io_n
struct buffAdd *ba;
unsigned long flags;
struct RxD_t *first_rxdp = NULL;
+   u64 Buffer0_ptr = 0, Buffer1_ptr = 0;
 
mac_control = &nic->mac_control;
config = &nic->config;
@@ -2342,7 +2343,14 @@ static int fill_rx_buffers(struct s2io_n
 * payload
 */
 
+   /* save the buffer pointers to avoid frequent dma mapping */
+   Buffer0_ptr = ((struct RxD3*)rxdp)->Buffer0_ptr;
+   Buffer1_ptr = ((struct RxD3*)rxdp)->Buffer1_ptr;
memset(rxdp, 0, sizeof(struct RxD3));
+   /* restore the buffer pointers for dma sync*/
+   ((struct RxD3*)rxdp)->Buffer0_ptr = Buffer0_ptr;
+   ((struct RxD3*)rxdp)->Buffer1_ptr = Buffer1_ptr;
+
ba = &mac_control->rings[ring_no].ba[block_no][off];
skb_reserve(skb, BUF0_LEN);
tmp = (u64)(unsigned long) skb->data;
@@ -3307,6 +3315,7 @@ static void s2io_reset(struct s2io_nic *
u16 subid, pci_cmd;
int i;
u16 val16;
+   unsigned long long reset_cnt = 0;
DBG_PRINT(INIT_DBG,"%s - Resetting XFrame card %s\n",
__FUNCTION__, sp->dev->name);
 
@@ -3372,6 +3381,11 @@ new_way:
 
/* Reset device statistics maintained by OS */
memset(&sp->stats, 0, sizeof (struct net_device_stats));
+   /* save reset count */
+   reset_cnt = sp->mac_control.stats_info->sw_stat.soft_reset_cnt;
+   memset(sp->mac_control.stats_info, 0, sizeof(struct stat_block));
+   /* restore reset count */
+   sp->mac_control.stats_info->sw_stat.soft_reset_cnt = reset_cnt;
 
/* SXE-002: Configure link and activity LED to turn it off */
subid = sp->pdev->subsystem_device;
@@ -4279,9 +4293,7 @@ static void s2io_updt_stats(struct s2io_
if (cnt == 5)
break; /* Updt failed */
} while(1);
-   } else {
-   memset(sp->mac_control.stats_info, 0, sizeof(struct stat_block));
-   }
+   } 
 }
 
 /**
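
The fill_rx_buffers() hunk above keeps the two DMA buffer pointers across the memset() that clears the rest of the descriptor, so the existing mappings can be reused instead of remapped. A minimal user-space sketch of that save/clear/restore step; struct rxd3_sketch and rxd3_reset_keep_buffers() are hypothetical, and the field layout is an assumption that only borrows the Buffer0_ptr/Buffer1_ptr names from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct RxD3; layout is assumed. */
struct rxd3_sketch {
    uint64_t Control_1;
    uint64_t Control_2;
    uint64_t Buffer0_ptr;
    uint64_t Buffer1_ptr;
    uint64_t Buffer2_ptr;
};

/* Clear a descriptor but keep its existing DMA buffer addresses, so
 * they can be reused (e.g. for a DMA sync) rather than mapped again. */
static void rxd3_reset_keep_buffers(struct rxd3_sketch *rxdp)
{
    uint64_t b0 = rxdp->Buffer0_ptr;
    uint64_t b1 = rxdp->Buffer1_ptr;

    memset(rxdp, 0, sizeof(*rxdp));
    rxdp->Buffer0_ptr = b0;
    rxdp->Buffer1_ptr = b1;
}
```

The s2io_reset() hunk applies the same shape to soft_reset_cnt when it clears the stats block.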




[PATCH 1/2] S2IO: Remove unused variables

2007-03-06 Thread Ramkrishna Vepa

- Remove unused variables from s2io_nic structure

- Changed the memory failure printk messages to print only in debug mode

- Updated the copyright messages

  (Resending; forgot to cc netdev)

Signed-off-by: Santosh Rastapur <[EMAIL PROTECTED]>
---
diff -Nurp patch/drivers/net/s2io.c patch1/drivers/net/s2io.c
--- patch/drivers/net/s2io.c2007-03-06 03:28:39.0 -0800
+++ patch1/drivers/net/s2io.c   2007-03-06 03:29:18.0 -0800
@@ -1,6 +1,6 @@

/
  * s2io.c: A Linux PCI-X Ethernet driver for Neterion 10GbE Server NIC
- * Copyright(c) 2002-2005 Neterion Inc.
+ * Copyright(c) 2002-2007 Neterion Inc.
 
  * This software may be used and distributed according to the terms of
  * the GNU General Public License (GPL), incorporated herein by reference.
@@ -516,7 +516,7 @@ static int init_shared_mem(struct s2io_n
mac_control->fifos[i].list_info = kmalloc(list_holder_size,
  GFP_KERNEL);
if (!mac_control->fifos[i].list_info) {
-   DBG_PRINT(ERR_DBG,
+   DBG_PRINT(INFO_DBG,
  "Malloc failed for list_info\n");
return -ENOMEM;
}
@@ -542,9 +542,9 @@ static int init_shared_mem(struct s2io_n
tmp_v = pci_alloc_consistent(nic->pdev,
 PAGE_SIZE, &tmp_p);
if (!tmp_v) {
-   DBG_PRINT(ERR_DBG,
+   DBG_PRINT(INFO_DBG,
  "pci_alloc_consistent ");
-   DBG_PRINT(ERR_DBG, "failed for TxDL\n");
+   DBG_PRINT(INFO_DBG, "failed for TxDL\n");
return -ENOMEM;
}
/* If we got a zero DMA address(can happen on
@@ -561,9 +561,9 @@ static int init_shared_mem(struct s2io_n
tmp_v = pci_alloc_consistent(nic->pdev,
 PAGE_SIZE, &tmp_p);
if (!tmp_v) {
-   DBG_PRINT(ERR_DBG,
+   DBG_PRINT(INFO_DBG,
  "pci_alloc_consistent ");
-   DBG_PRINT(ERR_DBG, "failed for TxDL\n");
+   DBG_PRINT(INFO_DBG, "failed for TxDL\n");
return -ENOMEM;
}
}
@@ -2187,7 +2187,7 @@ static int fill_rxd_3buf(struct s2io_nic
/* skb_shinfo(skb)->frag_list will have L4 data payload */
skb_shinfo(skb)->frag_list = dev_alloc_skb(dev->mtu + ALIGN_SIZE);
if (skb_shinfo(skb)->frag_list == NULL) {
-   DBG_PRINT(ERR_DBG, "%s: dev_alloc_skb failed\n ", dev->name);
+   DBG_PRINT(INFO_DBG, "%s: dev_alloc_skb failed\n ", dev->name);
return -ENOMEM ;
}
frag_list = skb_shinfo(skb)->frag_list;
@@ -2313,8 +2313,8 @@ static int fill_rx_buffers(struct s2io_n
/* allocate skb */
skb = dev_alloc_skb(size);
if(!skb) {
-   DBG_PRINT(ERR_DBG, "%s: Out of ", dev->name);
-   DBG_PRINT(ERR_DBG, "memory to allocate SKBs\n");
+   DBG_PRINT(INFO_DBG, "%s: Out of ", dev->name);
+   DBG_PRINT(INFO_DBG, "memory to allocate SKBs\n");
if (first_rxdp) {
wmb();
first_rxdp->Control_1 |= RXD_OWN_XENA;
@@ -2573,8 +2573,8 @@ static int s2io_poll(struct net_device *
 
for (i = 0; i < config->rx_ring_num; i++) {
if (fill_rx_buffers(nic, i) == -ENOMEM) {
-   DBG_PRINT(ERR_DBG, "%s:Out of memory", dev->name);
-   DBG_PRINT(ERR_DBG, " in Rx Poll!!\n");
+   DBG_PRINT(INFO_DBG, "%s:Out of memory", dev->name);
+   DBG_PRINT(INFO_DBG, " in Rx Poll!!\n");
break;
}
}
@@ -2590,8 +2590,8 @@ no_rx:
 
for (i = 0; i < config->rx_ring_num; i++) {
if (fill_rx_buffers(nic, i) == -ENOMEM) {
-   DBG_PRINT(ERR_DBG, "%s:Out of memory", dev->name);
-   DBG_PRINT(ERR_DBG, " in Rx Poll!!\n");
+   DBG_PRINT(INFO_DBG, "%s:Out of memory", dev->name);
+   DBG_PRINT(INFO_DBG, " in Rx Poll!!\n");
break;
}
}
@@ -2640,8 +2640,8 @@ static void s2io_netpoll(struct net_devi
 
for (i = 0; i < config->rx_ring_num; i++) {
if (fill_rx_buffers(nic, i)

Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread David Stevens

[EMAIL PROTECTED] wrote on 03/06/2007 03:15:41 PM:

> 
> David Stevens <[EMAIL PROTECTED]> wrote:
> 
> >It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better
> >to call that than add a nearly identical function.
> 
>Won't ip_mc_up() acquire an additional reference (via
> ip_mc_inc_group) to the IGMP_ALL_HOSTS im->users that would never be
> released (in the case of bonding calling the function out of the blue)?

Yes. I'm not sure that matters-- destroy_dev doesn't care how many
references to a group, and IGMP_ALL_HOSTS isn't advertised (so wouldn't
get a "leave" when you only down the interface, like other groups do).
But since ip_mc_up() is *entirely* that join plus group_added() on all
the existing groups, there really shouldn't be another. But the new device
will need the all-hosts group in its hardware multicast filter, too, if
it hasn't already been using multicasting. Your "reload" caller could just
dec_group that group after calling ip_mc_up().

>In looking at it, the ip_mc_rejoin_group function (the new one
> added with the patch) is a lot more like igmp_group_added() than
> ip_mc_up().
No, group_added() is one group. mc_up() just calls group_added on all
of them, which I think is what the rejoin was trying to do.

>I'm not sure if the extra bits in igmp_group_added() are
> worthy of concern; I'm thinking not, since im->loaded shouldn't be zero
> coming in for the bonding case.
"im->loaded" means the device has it in its multicast address filter.
If you're switching devices, and didn't do all the multicast stuff on all
the devices originally, then you want it to be 0 (and should make it so,
like ip_mc_down() does). :-)

>I think the meat that the "rejoin" wants is what's in
> igmpv3_send_cr(), which appears to do the actual sending stuff.  I'm not
> sure if that's better to call directly (and risk locking adventures) or
> to just trip the timer via igmp_ifc_event().

No, no, no -- please don't mess with those directly. It'd be a maintenance
nightmare, and multicasting is device independent right now. :-) I'd hope there
wouldn't be any bonding-specific code needed at this layer, which is why I hope
something like using up/down would work out.

+-DLS

RE: [PATCH] s2io: add PCI error recovery support

2007-03-06 Thread Ramkrishna Vepa

Jeff,

Please apply and forward this patch upstream.

Ram
> -Original Message-
> From: Ramkrishna Vepa
> Sent: Monday, March 05, 2007 2:34 PM
> To: 'Linas Vepstas'
> Cc: Wen Xiong; linux-kernel@vger.kernel.org; linux-
> [EMAIL PROTECTED]; netdev@vger.kernel.org; Jeff Garzik; Andrew Morton
> Subject: RE: [PATCH] s2io: add PCI error recovery support
> 
> Comments on this patch -
> 
> 1. device_close_flag is unused and is not required.
> > +static pci_ers_result_t s2io_io_error_detected(struct pci_dev *pdev,
> > +   pci_channel_state_t state)
> > +{
>   ...
> > +   do_s2io_card_down(sp, 0);
> > +   sp->device_close_flag = TRUE;   /* Device is shut down. */
> 
> 2. s2io_reset can fail to reset the device. Ideally s2io_reset should
> return a failure in this case (return is void now) and in this case could
> s2io_io_slot_reset() be called again, maybe try thrice, in total, before
> failing to reset the slot?
> 
> Ram
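
Ram's second point can be sketched as a bounded retry around a reset that reports failure. In the driver as posted s2io_reset() returns void, so the int-returning hook type, reset_with_retries() and the flaky_reset() test hook below are all hypothetical; with max_tries set to 3 this makes at most three attempts in total, as suggested.

```c
#include <assert.h>

/* Hypothetical reset hook: returns 0 on success, nonzero on failure. */
typedef int (*reset_fn_t)(void *ctx);

/* Call reset_fn up to max_tries times in total. Returns 0 on the first
 * success, or the last failure code if every attempt fails. */
static int reset_with_retries(reset_fn_t reset_fn, void *ctx, int max_tries)
{
    int err = -1;
    int i;

    for (i = 0; i < max_tries; i++) {
        err = reset_fn(ctx);
        if (err == 0)
            return 0;
    }
    return err;
}

/* Test hook: fails while *ctx (remaining failures) is above zero. */
static int flaky_reset(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -5;      /* stand-in for an -EIO style error */
    }
    return 0;
}
```
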
> > -Original Message-
> > From: Linas Vepstas [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, February 15, 2007 3:09 PM
> > To: Ramkrishna Vepa; Raghavendra Koushik; Ananda Raju
> > Cc: Wen Xiong; linux-kernel@vger.kernel.org; linux-
> > [EMAIL PROTECTED]; netdev@vger.kernel.org; Jeff Garzik; Andrew Morton
> > Subject: [PATCH] s2io: add PCI error recovery support
> >
> >
> > Koushik, Raju,
> >
> > Please review, comment, and if you find this acceptable,
> > please forward upstream. This patch incorporates all of
> > fixes resulting from the last set of discussions, circa
> > November 2006.
> >
> > --linas
> >
> > This patch adds PCI error recovery support to the
> > s2io 10-Gigabit ethernet device driver. Fourth revision,
> > blocks interrupts and the watchdog. Adds a flag to
> > s2io_down(), to avoid doing I/O when PCI bus is offline.
> >
> > Tested, seems to work well.
> >
> > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> > Acked-by: Ramkrishna Vepa <[EMAIL PROTECTED]>
> > Cc: Raghavendra Koushik <[EMAIL PROTECTED]>
> > Cc: Ananda Raju <[EMAIL PROTECTED]>
> > Cc: Wen Xiong <[EMAIL PROTECTED]>
> >
> > 
> >  drivers/net/s2io.c |  116 ++---
> >  drivers/net/s2io.h |5 ++
> >  2 files changed, 116 insertions(+), 5 deletions(-)
> >
> > Index: linux-2.6.20-git4/drivers/net/s2io.c
> > ===
> > --- linux-2.6.20-git4.orig/drivers/net/s2io.c   2007-02-15 15:39:35.0 -0600
> > +++ linux-2.6.20-git4/drivers/net/s2io.c        2007-02-15 16:15:10.0 -0600
> > @@ -435,11 +435,18 @@ static struct pci_device_id s2io_tbl[] _
> >
> >  MODULE_DEVICE_TABLE(pci, s2io_tbl);
> >
> > +static struct pci_error_handlers s2io_err_handler = {
> > +   .error_detected = s2io_io_error_detected,
> > +   .slot_reset = s2io_io_slot_reset,
> > +   .resume = s2io_io_resume,
> > +};
> > +
> >  static struct pci_driver s2io_driver = {
> >.name = "S2IO",
> >.id_table = s2io_tbl,
> >.probe = s2io_init_nic,
> >.remove = __devexit_p(s2io_rem_nic),
> > +  .err_handler = &s2io_err_handler,
> >  };
> >
> >  /* A simplifier macro used both by init and free shared_mem Fns(). */
> > @@ -2577,6 +2584,9 @@ static void s2io_netpoll(struct net_devi
> > u64 val64 = 0xULL;
> > int i;
> >
> > +   if (pci_channel_offline(nic->pdev))
> > +   return;
> > +
> > disable_irq(dev->irq);
> >
> > atomic_inc(&nic->isr_cnt);
> > @@ -3079,6 +3089,8 @@ static void alarm_intr_handler(struct s2
> > int i;
> > if (atomic_read(&nic->card_state) == CARD_DOWN)
> > return;
> > +   if (pci_channel_offline(nic->pdev))
> > +   return;
> > nic->mac_control.stats_info->sw_stat.ring_full_cnt = 0;
> > /* Handling the XPAK counters update */
> > if(nic->mac_control.stats_info->xpak_stat.xpak_timer_count < 72000) {
> > @@ -4117,6 +4129,10 @@ static irqreturn_t s2io_isr(int irq, voi
> > struct mac_info *mac_control;
> > struct config_param *config;
> >
> > +   /* Pretend we handled any irq's from a disconnected card */
> > +   if (pci_channel_offline(sp->pdev))
> > +   return IRQ_NONE;
> > +
> > atomic_inc(&sp->isr_cnt);
> > mac_control = &sp->mac_control;
> > config = &sp->config;
> > @@ -6188,7 +6204,7 @@ static void s2io_rem_isr(struct s2io_nic
> > } while(cnt < 5);
> >  }
> >
> > -static void s2io_card_down(struct s2io_nic * sp)
> > +static void do_s2io_card_down(struct s2io_nic * sp, int do_io)
> >  {
> > int cnt = 0;
> > struct XENA_dev_config __iomem *bar0 = sp->bar0;
> > @@ -6203,7 +6219,8 @@ static void s2io_card_down(struct s2io_n
> > atomic_set(&sp->card_state, CARD_DOWN);
> >
> > /* disable Tx and Rx traffic on the NIC */
> > -   stop_nic(sp);
> > +   if (do_io)
> > +   stop_nic(sp);
> >
> > s2io_rem_isr(sp);
> >
> > @@ -6211,7 +6228,7 @@ static voi

Re: [patch 2/2] div64_64: common code

2007-03-06 Thread Andrew Morton

On Tue, 06 Mar 2007 15:21:40 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Tue, 6 Mar 2007 14:32:06 -0800
> 
> > ho hum, I didn't know that, so we missed rc2-mm2.
> > 
> > Could I have symlinks in /pub/scm/linux/kernel/git/davem/ to net-latest and
> > sparc-latest please?
> 
> When we are in a merge window, I have only one tree for sparc
> and networking.
> 
> But when we're in RC mode, I've got multiple trees, one each
> for bug fixes and one for stuff which will get submitted in
> the next merge window.
> 
> When Linus pulls in the bug fixes, I rebase the merge window trees so
> that all the fixes get integrated to the merge tree and I can resolve
> any conflicts, if any.
> 
> So, which one(s) do you want? :-)

The merge-window things, generally.

I assume from the above that the merge-window tree doesn't contain the
material in the bugfixes tree?  If so, I guess I'd need both.  If not, the
merge-window tree should contain everything?

I dunno - you know your trees better than I.  The bottom line is I want
everything you've got, and it'd be nice to fix this problem where I don't
know that a new tree has been opened up, so I miss it - how can we do this?

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread Herbert Xu

On Tue, Mar 06, 2007 at 04:05:57PM -0800, David Miller wrote:
>
> Actually, more accurately it's using PAGE_SIZE. :)

Aha it's you non-i386 people :)

> I see, so the better fix would be to make glibc's
> netlink_request() function start with a getpagesize()'d
> buffer.

Yes that's a good idea.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 7 Mar 2007 11:04:19 +1100

> On Tue, Mar 06, 2007 at 04:02:02PM -0800, David Miller wrote:
> >
> > Create a lot of interfaces, try to dump them :-)
> 
> Dumps should be done using 4K (NLMSG_GOODSIZE) skb's, where is the problem?

Actually, more accurately it's using PAGE_SIZE. :)

> > GLIBC can even hit this via its ifaddrs.c code.
> 
> Do you have a simple test case that I can run?

I see, so the better fix would be to make glibc's
netlink_request() function start with a getpagesize()'d
buffer.
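
A minimal sketch of sizing the first receive buffer to one page, as suggested for glibc's netlink_request(). It uses sysconf(_SC_PAGESIZE), the POSIX way to query what getpagesize() returns; nl_initial_bufsize() and nl_alloc_recvbuf() are hypothetical names, and the 4096 fallback is an assumption.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* One page, which per this thread is what the kernel uses for netlink
 * dump skbs; falls back to 4096 if sysconf() cannot report it. */
static size_t nl_initial_bufsize(void)
{
    long pg = sysconf(_SC_PAGESIZE);

    return pg > 0 ? (size_t)pg : 4096;
}

/* Allocate the first receive buffer; *len receives its size. */
static char *nl_alloc_recvbuf(size_t *len)
{
    *len = nl_initial_bufsize();
    return malloc(*len);
}
```

On a box with 8K pages this yields an 8K buffer, where a hard-coded 4K one would see MSG_TRUNC on a full dump skb.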

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread Herbert Xu

On Tue, Mar 06, 2007 at 03:57:50PM -0800, Stephen Hemminger wrote:
> 
> I know some commands send big blocks down of configuration information.
> One example is netem statistical data, but there are others.

You mean dumps? Unless someone is coalescing them I don't see a problem
there.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread Herbert Xu

On Tue, Mar 06, 2007 at 04:02:02PM -0800, David Miller wrote:
>
> Create a lot of interfaces, try to dump them :-)

Dumps should be done using 4K (NLMSG_GOODSIZE) skb's, where is the problem?

> GLIBC can even hit this via its ifaddrs.c code.

Do you have a simple test case that I can run?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 10:49:07 +1100

> Which netlink family generates (or needs to generate) unbounded
> messages to user-space? Or indeed which ones generate messages
> greater than 64K (or 4K for that matter)?

Create a lot of interfaces, try to dump them :-)

GLIBC can even hit this via its ifaddrs.c code.

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread Stephen Hemminger

On Wed, 07 Mar 2007 10:49:07 +1100
Herbert Xu <[EMAIL PROTECTED]> wrote:

> David Miller <[EMAIL PROTECTED]> wrote:
> > 
> > I guess one thing the user could do when it sees MSG_TRUNC
> > is keep calling recvmsg() until the receive queue is emptied
> > of packets, in order to get that pesky nlk->cb cleared to
> > NULL, then resubmit.
> > 
> > But that's ridiculous and complicated.
> > 
> > Any ideas?
> 
> Which netlink family generates (or needs to generate) unbounded
> messages to user-space? Or indeed which ones generate messages
> greater than 64K (or 4K for that matter)?
> 
> Cheers,

I know some commands send big blocks down of configuration information.
One example is netem statistical data, but there are others.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread Herbert Xu

David Miller <[EMAIL PROTECTED]> wrote:
> 
> I guess one thing the user could do when it sees MSG_TRUNC
> is keep calling recvmsg() until the receive queue is emptied
> of packets, in order to get that pesky nlk->cb cleared to
> NULL, then resubmit.
> 
> But that's ridiculous and complicated.
> 
> Any ideas?

Which netlink family generates (or needs to generate) unbounded
messages to user-space? Or indeed which ones generate messages
greater than 64K (or 4K for that matter)?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread James Morris

On Tue, 6 Mar 2007, David Miller wrote:

> I guess one thing the user could do when it sees MSG_TRUNC
> is keep calling recvmsg() until the receive queue is emptied
> of packets, in order to get that pesky nlk->cb cleared to
> NULL, then resubmit.
> 
> But that's ridiculous and complicated.
> 
> Any ideas?

Only slightly less complicated: user calls recvmsg() once with a new flag 
MSG_FLUSH, which causes the queue to be flushed, then resubmits ?
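
MSG_FLUSH above is only a proposal, not an existing flag. The drain-then-resubmit workaround David describes can be sketched with flags that do exist: read and discard datagrams with MSG_DONTWAIT until the call fails with EAGAIN. On a datagram socket a 1-byte buffer is enough, because the unread remainder of each datagram is dropped. drain_socket() is a hypothetical helper; per David's description, on a netlink socket this is what lets the pending dump finish so nlk->cb is cleared before the request is resubmitted.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Discard every datagram currently queued on fd without blocking.
 * Returns the number discarded, or -1 on an error other than EAGAIN. */
static int drain_socket(int fd)
{
    char scratch[1];
    int count = 0;

    for (;;) {
        ssize_t n = recv(fd, scratch, sizeof(scratch), MSG_DONTWAIT);

        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return count;   /* queue is empty */
            if (errno == EINTR)
                continue;
            return -1;
        }
        count++;                /* rest of this datagram is dropped */
    }
}
```
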


- James
-- 
James Morris
<[EMAIL PROTECTED]>

Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin

On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
...
> And I found a bug in gcc-4.1.2; it gave 0 for ncubic results
> when doing 1000 loops test... gcc-4.0.3 works.

Found it.

--- cbrt-test.c~2007-03-07 00:20:54.735248105 +0200
+++ cbrt-test.c 2007-03-07 00:21:03.964864343 +0200
@@ -209,7 +209,7 @@
 
__asm__("bsrl %1,%0\n\t"
"cmovzl %2,%0"
-   : "=&r" (r) : "rm" (x), "rm" (-1));
+   : "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
return r+1;
 }
 
Now Linux 2.6 does not have "memory" in fls, maybe it causes
some gcc funnies some people are seeing.
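
For comparison, an fls() with the same contract as the bsrl/cmovzl version (1-based index of the highest set bit, 0 for 0) can be written without inline asm using GCC's __builtin_clz, which leaves no clobber list to get wrong. This is an alternative technique, not what the kernel or the test program uses, and whether it avoids the gcc-4.1.2 problem is untested here; fls_portable() is a hypothetical name and assumes a 32-bit unsigned int.

```c
#include <assert.h>

/* 1-based index of the most significant set bit; 0 when x == 0.
 * __builtin_clz(0) is undefined, hence the explicit zero check. */
static int fls_portable(unsigned int x)
{
    return x ? 32 - __builtin_clz(x) : 0;
}
```
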

-- 

Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Stephen Hemminger wrote:

> TCP can not assume anything about the path that a packet may take.
> We have declared a moratorium on loopback benchmark foolishness.
> Go optimize the idle loop instead ;-)

Sure - a delay loop with fewer instructions is a worthwhile optimization
because it has less impact on a CPU's instruction cache...

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sunhttp://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

Re: [patch 2/2] div64_64: common code

2007-03-06 Thread David Miller

From: Andrew Morton <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:32:06 -0800

> ho hum, I didn't know that, so we missed rc2-mm2.
> 
> Could I have symlinks in /pub/scm/linux/kernel/git/davem/ to net-latest and
> sparc-latest please?

When we are in a merge window, I have only one tree for sparc
and networking.

But when we're in RC mode, I've got multiple trees: one for bug fixes
and one for stuff which will get submitted in the next merge window.

When Linus pulls in the bug fixes, I rebase the merge window trees so
that all the fixes get integrated into the merge tree and I can resolve
any conflicts.

So, which one(s) do you want? :-)

netlink recvmsg() and MSG_TRUNC

2007-03-06 Thread David Miller


So if you don't give a large enough buffer to
recvmsg() for the netlink response a few things
happen:

1) MSG_TRUNC is set
2) The length returned and the amount of data copied is the
   size given in the recvmsg() call
3) If enough other packets remain in the receive buffer,
   nlk->cb is left at non-NULL for a partial dump.  This
   means that you can't just immediately resubmit the
   original request else you'll get NLMSG_ERROR with error
   set to -EBUSY.  This is what netlink_dump_start() does
   when it sees nlk->cb non-NULL.

Now, the user is basically stuck and there is no real
way to recover from this besides doing something like
opening up a new netlink socket and then doing the recvmsg()
with a larger buffer; wash, rinse, repeat.

I looked at how some of our standard userspace code handles
this and it's not pretty:

1) iproute2 basically just uses a 16K buffer, signals an error
   when it sees MSG_TRUNC, and that's it, whoopee

2) Thomas's libnl believes that recvmsg() will return the
   true length necessary to receive the whole message, and
   uses that as a signal to double the buffer size and try
   recvmsg() again.  As mentioned, recvmsg() never returns
   a length larger than the given buffer size, so this code
   never triggers, and if it did it would lose entries because
   netlink_recvmsg() drops the SKB even when it signals
   MSG_TRUNC.

The behavior of dropping the SKB matches what UDP does in
the case of MSG_TRUNC.

I guess one thing the user could do when it sees MSG_TRUNC
is keep calling recvmsg() until the receive queue is emptied
of packets, in order to get that pesky nlk->cb cleared to
NULL, then resubmit.

But that's ridiculous and complicated.

Any ideas?
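A user-space sketch of the workaround described above: check msg_flags for MSG_TRUNC after recvmsg(), and if it is set, drain whatever is queued with MSG_DONTWAIT before resubmitting the request. The helpers recv_one() and drain_queue() are hypothetical names; the test exercises them on an AF_UNIX datagram socketpair as a stand-in, since it reports truncation through msg_flags in the same way.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive one datagram into buf.  Returns the number of bytes copied
 * (never more than len), sets *truncated if msg_flags has MSG_TRUNC,
 * and returns -1 on error with errno set. */
static ssize_t recv_one(int fd, void *buf, size_t len, int flags,
			int *truncated)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t n = recvmsg(fd, &msg, flags);

	if (n >= 0)
		*truncated = (msg.msg_flags & MSG_TRUNC) != 0;
	return n;
}

/* Throw away everything currently queued on fd without blocking, so the
 * original request can be resubmitted.  Returns the number of datagrams
 * dropped, or -1 on an error other than "queue empty". */
static int drain_queue(int fd)
{
	char scratch[256];
	int dropped = 0, trunc;

	for (;;) {
		ssize_t n = recv_one(fd, scratch, sizeof(scratch),
				     MSG_DONTWAIT, &trunc);
		if (n < 0)
			return (errno == EAGAIN || errno == EWOULDBLOCK) ?
				dropped : -1;
		dropped++;
	}
}
```

On a netlink socket, fd would be the socket the dump request was sent on. Whether draining is enough for netlink_dump_start() to accept the resubmitted request depends on the partial dump having completed, which is exactly the clumsiness being complained about.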

Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread Jay Vosburgh

David Stevens <[EMAIL PROTECTED]> wrote:

>It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better
>to call that than add a nearly identical function.

Won't ip_mc_up() acquire an additional reference (via
ip_mc_inc_group) to the IGMP_ALL_HOSTS im->users that would never be
released (in the case of bonding calling the function out of the blue)?

In looking at it, the ip_mc_rejoin_group function (the new one
added with the patch) is a lot more like igmp_group_added() than
ip_mc_up().  I'm not sure if the extra bits in igmp_group_added() are
worthy of concern; I'm thinking not, since im->loaded shouldn't be zero
coming in for the bonding case.

I think the meat that the "rejoin" wants is what's in
igmpv3_send_cr(), which appears to do the actual sending stuff.  I'm not
sure if that's better to call directly (and risk locking adventures) or
to just trip the timer via igmp_ifc_event().

Anyway, it looks like all of this needs to be done under RTNL,
which isn't the case, so I need to go off and look into reworking it
again.

Andy: do you have any work in progress on the sleep / rtnl stuff
we've been discussing?

>Also, real interfaces already do gratuitous IGMP advertisements when
>they are bounced (the reason there is an ip_mc_up()). Could bonding,
>when failing over, simply mark the master interface as down, switch, and
>then mark the master as up again? In addition to doing the right
>thing for both IPv4 and IPv6 multicasting w/o any code changes in those
>layers, it may have similar benefits for ARP and neighbor discovery, 
>right?

Marking the master down would, I believe, issue notifiers that
the device has gone down.  Various things, network-manager-style
applications in particular, listen to those, so I'm not sure it's a good
idea.  There are other side effects as well; I think it would also
flush routes associated with the interface.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin

On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.
> 
> Slightly gross test program (with original cubic wraparound bug fixed).
...
>   {~0, 2097151},
 ^^^
this should be 2642245.
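A quick way to confirm the corrected reference value is an exact integer cube root. Below is a minimal sketch of the classic bit-by-bit method (the same idea as acbrt() in the benchmark code posted in this thread), assuming a 64-bit accumulator and testing x >> s >= b instead of x >= b << s so the shifted term cannot overflow. The name icbrt64 is hypothetical:

```c
#include <stdint.h>

/* Exact floor(cbrt(x)) for any 64-bit x, one result bit per step. */
static uint32_t icbrt64(uint64_t x)
{
	uint64_t y = 0;
	int s;

	for (s = 63; s >= 0; s -= 3) {
		uint64_t b;

		y = 2 * y;
		/* (y + 1)^3 - y^3; y < 2^22, so this fits easily in 64 bits */
		b = 3 * y * (y + 1) + 1;
		/* same as x >= (b << s), but without overflowing b << s */
		if ((x >> s) >= b) {
			x -= b << s;	/* safe: b << s <= x here */
			y++;
		}
	}
	return (uint32_t)y;
}
```

With this, icbrt64(~0ULL) evaluates to 2642245, matching the corrected table entry.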

Without a serializing instruction before rdtsc, and with only one loop,
I do not get very accurate results (104 for ncubic, > 1000 for others).

#define rdtscll_serialize(val) \
  __asm__ __volatile__("movl $0, %%eax\n\tcpuid\n\trdtsc\n" : "=A" (val) : : "ebx", "ecx")

Here are Pentium D timings for 1000 loops:

~0, 2097151

Function  clocks  mean(us)  max(us)  std(us)  total error
ocubic       912     0.306   20.317    0.730       545101
ncubic       777     0.261   14.799    0.486       576263
acbrt       1168     0.392   21.681    0.547       547562
hcbrt        827     0.278   15.244    0.387         2410

~0, 2642245

Function  clocks  mean(us)  max(us)  std(us)  total error
ocubic       908     0.305   20.210    0.656            7
ncubic       775     0.260   14.792    0.550        31169
acbrt       1176     0.395   22.017    0.970         2468
hcbrt        826     0.278   15.326    0.670       547504

And I found a bug in gcc-4.1.2: it gave 0 for ncubic results
when doing the 1000-loop test... gcc-4.0.3 works.

-- 

cube root benchmark code

2007-03-06 Thread Stephen Hemminger

Here is a better version of the benchmark code.
It includes the original code used in the 2.4 version of Cubic for comparison.

---
/* Test and measure perf of cube root algorithms.  */
#include 
#include 
#include 
#include 
#include 

#ifdef __x86_64

#define rdtscll(val) do { \
 unsigned int __a,__d; \
 asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
 (val) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \
} while(0)

# define do_div(n,base) ({  \
uint32_t __base = (base);   \
uint32_t __rem; \
__rem = ((uint64_t)(n)) % __base;   \
(n) = ((uint64_t)(n)) / __base; \
__rem;  \
 })


/**
 * __ffs - find first bit in word.
 * @word: The word to search
 *
 * Undefined if no bit exists, so code should check against 0 first.
 */
static __inline__ unsigned long __ffs(unsigned long word)
{
__asm__("bsfq %1,%0"
:"=r" (word)
:"rm" (word));
return word;
}

/*
 * __fls: find last bit set.
 * @word: The word to search
 *
 * Undefined if no bit is set, so code should check against 0 first.
 */
static inline unsigned long __fls(unsigned long word)
{
__asm__("bsrq %1,%0"
:"=r" (word)
:"rm" (word));
return word;
}

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz (man ffs).
 */
static __inline__ int ffs(int x)
{
int r;

__asm__("bsfl %1,%0\n\t"
"cmovzl %2,%0" 
: "=r" (r) : "rm" (x), "r" (-1));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs.
 */
static inline int fls(int x)
{
int r;

__asm__("bsrl %1,%0\n\t"
"cmovzl %2,%0"
: "=&r" (r) : "rm" (x), "rm" (-1));
return r+1;
}

/**
 * fls64 - find last bit set in 64 bit word
 * @x: the word to search
 *
 * This is defined the same way as fls.
 */
static inline int fls64(uint64_t x)
{
if (x == 0)
return 0;
return __fls(x) + 1;
}

static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
{
return dividend / divisor;
}

#elif __i386

#define rdtscll(val) \
 __asm__ __volatile__("rdtsc" : "=A" (val))

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz() (man ffs).
 */
static inline int ffs(int x)
{
int r;

__asm__("bsfl %1,%0\n\t"
"jnz 1f\n\t"
"movl $-1,%0\n"
"1:" : "=r" (r) : "rm" (x));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs().
 */
static inline int fls(int x)
{
int r;

__asm__("bsrl %1,%0\n\t"
"jnz 1f\n\t"
"movl $-1,%0\n"
"1:" : "=r" (r) : "rm" (x));
return r+1;
}

static inline int fls64(uint64_t x)
{
uint32_t h = x >> 32;
if (h)
return fls(h) + 32;
return fls(x);
}


#define do_div(n,base) ({ \
unsigned long __upper, __low, __high, __mod, __base; \
__base = (base); \
asm("":"=a" (__low), "=d" (__high):"A" (n)); \
__upper = __high; \
if (__high) { \
__upper = __high % (__base); \
__high = __high / (__base); \
} \
asm("divl %2":"=a" (__low), "=d" (__mod):"rm" (__base), "0" (__low), "1" (__upper)); \
asm("":"=A" (n):"a" (__low),"d" (__high)); \
__mod; \
})


/* 64bit divisor, dividend and result. dynamic precision */
static uint64_t div64_64(uint64_t dividend, uint64_t divisor)
{
uint32_t d = divisor;

if (divisor > 0xFFFFFFFFULL) {
unsigned int shift = fls(divisor >> 32);

d = divisor >> shift;
dividend >>= shift;
}

/* avoid 64 bit division if possible */
if (dividend >> 32)
do_div(dividend, d);
else
dividend = (uint32_t) dividend / d;

return dividend;
}
#endif

/* Andi Kleen's version */
uint32_t acbrt(uint64_t x)
{
uint32_t y = 0;
int s;

for (s = 63; s >= 0; s -= 3) {
uint64_t b, bs;

y = 2 * y;
b = 3 * y * (y+1) + 1;
bs = b << s;
if (x >= bs && (b == (bs>>s))) {  /* avoid overflow */
x -= bs;
y++;
}
}

[PATCH] TCP Yeah: cleanup

2007-03-06 Thread Stephen Hemminger


Eliminate need for full 64/64 divide to compute queue.
Variable maxqueue was really a constant.
Fix indentation.
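The new code forms snd_cwnd * (rtt - baseRTT) in 64 bits and divides it by the 32-bit rtt with do_div(), a 64-by-32 division, instead of a 64-by-64 div64_64(). A user-space model of that computation, assuming rtt >= baseRTT; the function name yeah_queue() is hypothetical, and plain / on a uint64_t stands in for do_div():

```c
#include <stdint.h>

/* queue = snd_cwnd * (rtt - base_rtt) / rtt
 * The product can exceed 32 bits, so it is formed in 64 bits, but the
 * divisor is only 32 bits wide: a 64/32 divide is all that is needed. */
static uint32_t yeah_queue(uint32_t snd_cwnd, uint32_t rtt, uint32_t base_rtt)
{
	uint64_t bw = snd_cwnd;

	bw *= rtt - base_rtt;	/* assumes rtt >= base_rtt */
	bw /= rtt;		/* stands in for do_div(bw, rtt) */
	return (uint32_t)bw;
}
```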

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
  
---
 net/ipv4/tcp_yeah.c |   42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

--- net-2.6.22.orig/net/ipv4/tcp_yeah.c 2007-03-06 11:46:34.0 -0800
+++ net-2.6.22/net/ipv4/tcp_yeah.c  2007-03-06 11:54:54.0 -0800
@@ -74,7 +74,7 @@
 }
 
 static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack,
-u32 seq_rtt, u32 in_flight, int flag)
+   u32 seq_rtt, u32 in_flight, int flag)
 {
struct tcp_sock *tp = tcp_sk(sk);
struct yeah *yeah = inet_csk_ca(sk);
@@ -142,8 +142,8 @@
 */
 
if (yeah->cntRTT > 2) {
-   u32 rtt;
-   u32 queue, maxqueue;
+   u32 rtt, queue;
+   u64 bw;
 
/* We have enough RTT samples, so, using the Vegas
 * algorithm, we determine if we should increase or
@@ -158,32 +158,36 @@
 */
rtt = yeah->minRTT;
 
-   queue = (u32)div64_64((u64)tp->snd_cwnd * (rtt - yeah->baseRTT), rtt);
-
-   maxqueue = TCP_YEAH_ALPHA;
-
-   if (queue > maxqueue ||
-   rtt - yeah->baseRTT > (yeah->baseRTT / TCP_YEAH_PHY)) {
-
-   if (queue > maxqueue && tp->snd_cwnd > yeah->reno_count) {
-   u32 reduction = min( queue / TCP_YEAH_GAMMA ,
-tp->snd_cwnd >> TCP_YEAH_EPSILON );
+   /* Compute excess number of packets above bandwidth
+* Avoid doing full 64 bit divide.
+*/
+   bw = tp->snd_cwnd;
+   bw *= rtt - yeah->baseRTT;
+   do_div(bw, rtt);
+   queue = bw;
+
+   if (queue > TCP_YEAH_ALPHA ||
+   rtt - yeah->baseRTT > (yeah->baseRTT / TCP_YEAH_PHY)) {
+   if (queue > TCP_YEAH_ALPHA
+   && tp->snd_cwnd > yeah->reno_count) {
+   u32 reduction = min(queue / TCP_YEAH_GAMMA ,
+   tp->snd_cwnd >> TCP_YEAH_EPSILON);
 
tp->snd_cwnd -= reduction;
 
-   tp->snd_cwnd = max( tp->snd_cwnd, yeah->reno_count);
+   tp->snd_cwnd = max(tp->snd_cwnd,
+  yeah->reno_count);
 
tp->snd_ssthresh = tp->snd_cwnd;
-   }
+   }
 
if (yeah->reno_count <= 2)
-   yeah->reno_count = max( tp->snd_cwnd>>1, 2U);
+   yeah->reno_count = max(tp->snd_cwnd>>1, 2U);
else
yeah->reno_count++;
 
-   yeah->doing_reno_now =
-  min_t( u32, yeah->doing_reno_now + 1 , 0xff);
-
+   yeah->doing_reno_now = min(yeah->doing_reno_now + 1,
+  0xffU);
} else {
yeah->fast_count++;
 

Re: TCP 2MSL on loopback

2007-03-06 Thread Stephen Hemminger

On Tue, 06 Mar 2007 14:07:09 -0800
Howard Chu <[EMAIL PROTECTED]> wrote:

> David Miller wrote:
> > From: Rick Jones <[EMAIL PROTECTED]>
> > Date: Tue, 06 Mar 2007 13:25:35 -0800
> > 
> >>> On the other hand, being able to configure a small MSL for the loopback 
> >>> device is perfectly safe. Being able to configure a small MSL for other 
> >>> interfaces may be safe, depending on the rest of the network layout.
> >> A peanut gallery question - I seem to recall prior discussions about how 
> >> one cannot assume that a packet destined for a given IP address will 
> >> remain destined for that given IP address as it could go through a module
> >> that will rewrite headers etc.
> > 
> > That's right, both netfilter and the packet scheduler actions
> > can do that, that's why this whole idea about changing the MSL
> > on loopback by default is wrong and pointless.
> 
> If the headers get rewritten and the packet gets directed elsewhere, 
> then we're no longer talking about a loopback connection, so that's 
> outside the discussion.
> 
> If the packet gets munged by multiple filters but still eventually gets 
> to the specified destination, OK. But regardless, if both endpoints of 
> the connection are on the loopback device, then there is nothing wrong 
> with the idea. Those filters can only do so much, they still have to 
> preserve the reliable in-order delivery semantics of TCP, otherwise the 
> system is broken.
> 
> It may not have much use, sure, I admitted that much from the outset.
> 
> So I'll leave it at this, thanks for the feedback.


TCP can not assume anything about the path that a packet may take.
We have declared a moratorium on loopback benchmark foolishness.
Go optimize the idle loop instead ;-)

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

Re: [RFC] ARP notify option

2007-03-06 Thread Stephen Hemminger

On Tue, 06 Mar 2007 15:18:07 -0600
"Chris Friesen" <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> 
> > +arp_notify - BOOLEAN
> > +   Define mode for notification of address and device changes.
> > +   0 - (default): do nothing
> > +   1 - Generate gratuitous arp replies when device is brought up
> > +   or hardware address changes.
> 
> Did you consider using gratuitous arp requests instead?  I remember 
> reading about some hardware that updated its arp cache on gratuitous 
> requests but not gratuitous replies.
> 
> Chris

I copied the ARP generation from other places that were doing
gratuitous ARP already: Xen and irlan.
Our local switch used REPLYs to do the same thing.

One could imagine making it a ternary value and having 2 generate
REQUESTs.
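To make the request-versus-reply distinction concrete, here is a sketch of the 28-byte Ethernet/IPv4 ARP payload (RFC 826 field layout) filled in as a gratuitous ARP, where sender and target protocol address are both the host's own address and only the opcode differs. The struct and helper names are hypothetical, this is not the kernel's code path, and leaving the target hardware address zeroed is an assumption of the sketch:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* ARP payload for Ethernet/IPv4: 28 bytes, no padding with this layout. */
struct arp_eth_ipv4 {
	uint16_t htype;		/* hardware type, 1 = Ethernet */
	uint16_t ptype;		/* protocol type, 0x0800 = IPv4 */
	uint8_t  hlen;		/* hardware address length, 6 */
	uint8_t  plen;		/* protocol address length, 4 */
	uint16_t oper;		/* 1 = request, 2 = reply */
	uint8_t  sha[6];	/* sender hardware address */
	uint8_t  spa[4];	/* sender protocol address */
	uint8_t  tha[6];	/* target hardware address (zeroed here) */
	uint8_t  tpa[4];	/* target protocol address */
};

/* Gratuitous ARP: announce our own mac/ip pair.  oper picks REQUEST (1)
 * or REPLY (2); multi-byte fields are stored in network byte order. */
static void build_garp(struct arp_eth_ipv4 *a, uint16_t oper,
		       const uint8_t mac[6], const uint8_t ip[4])
{
	memset(a, 0, sizeof(*a));
	a->htype = htons(1);
	a->ptype = htons(0x0800);
	a->hlen = 6;
	a->plen = 4;
	a->oper = htons(oper);
	memcpy(a->sha, mac, 6);
	memcpy(a->spa, ip, 4);
	memcpy(a->tpa, ip, 4);	/* target IP == sender IP */
}
```

A ternary arp_notify, as suggested above, would only need to choose which opcode to pass.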


-- 
Stephen Hemminger <[EMAIL PROTECTED]>

[PATCH] tcp_cubic: faster cube root

2007-03-06 Thread Stephen Hemminger

The Newton-Raphson method is quadratically convergent so
only a small fixed number of steps are necessary.
Therefore it is faster to unroll the loop. Since div64_64 is no longer
inline it won't cause code explosion.

Also fixes a bug that could occur when x*x exceeded 32 bits (x was a u32).

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 net/ipv4/tcp_cubic.c |   16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

--- net-2.6.22.orig/net/ipv4/tcp_cubic.c 2007-03-06 12:24:34.0 -0800
+++ net-2.6.22/net/ipv4/tcp_cubic.c 2007-03-06 14:43:37.0 -0800
@@ -96,23 +96,17 @@
  */
 static u32 cubic_root(u64 a)
 {
-   u32 x, x1;
+   u64 x;
 
/* Initial estimate is based on:
 * cbrt(x) = exp(log(x) / 3)
 */
x = 1u << (fls64(a)/3);
 
-   /*
-* Iteration based on:
-* 2
-* x= ( 2 * x  +  a / x  ) / 3
-*  k+1  k k
-*/
-   do {
-   x1 = x;
-   x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
-   } while (abs(x1 - x) > 1);
+   /* converges to 32 bits in 3 iterations */
+   x = (2 * x + div64_64(a, x*x)) / 3;
+   x = (2 * x + div64_64(a, x*x)) / 3;
+   x = (2 * x + div64_64(a, x*x)) / 3;
 
return x;
 }
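A user-space version of the unrolled iteration, for experimenting outside the kernel. Assumptions of this sketch: fls64 is built on __builtin_clzll(), plain 64-bit / replaces div64_64(), and the a == 0 guard is an addition (with a = 0 the initial estimate 1 becomes 0 after one step, and the next step would divide by zero). cubic_root_demo() and fls64_demo() are hypothetical names:

```c
#include <stdint.h>

static int fls64_demo(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/* Newton-Raphson cube root, unrolled to three steps:
 *   x' = (2x + a / x^2) / 3
 * x is 64 bits wide so x * x cannot overflow. */
static uint32_t cubic_root_demo(uint64_t a)
{
	uint64_t x;

	if (a == 0)
		return 0;

	/* initial estimate: cbrt(a) = exp(log(a) / 3) ~ 2^(fls64(a) / 3) */
	x = 1ULL << (fls64_demo(a) / 3);

	x = (2 * x + a / (x * x)) / 3;
	x = (2 * x + a / (x * x)) / 3;
	x = (2 * x + a / (x * x)) / 3;

	return (uint32_t)x;
}
```

Three steps give exact results for cubes such as 27 and 1000000, but the result is not always the floor of the cube root: for a = 7 this sketch returns 2.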

Re: [patch 2/2] div64_64: common code

2007-03-06 Thread Andrew Morton

On Tue, 06 Mar 2007 10:11:40 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> From: [EMAIL PROTECTED]
> Date: Tue, 06 Mar 2007 02:42:28 -0800
> 
> > From: Stephen Hemminger <[EMAIL PROTECTED]>
> > 
> > Implement div64_64(): 64-bit by 64-bit division.  Needed by networking (at
> > least).
> 
> This patch, with the types.h fixes of yours, is already in my
> net-2.6.22 GIT tree if you'd like to start pulling from there
> Andrew.

ho hum, I didn't know that, so we missed rc2-mm2.

Could I have symlinks in /pub/scm/linux/kernel/git/davem/ to net-latest and
sparc-latest please?


Re: [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread David Stevens

It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better
to call that than add a nearly identical function.

Also, real interfaces already do gratuitous IGMP advertisements when
they are bounced (the reason there is an ip_mc_up()). Could bonding,
when failing over, simply mark the master interface as down, switch, and
then mark the master as up again? In addition to doing the right
thing for both IPv4 and IPv6 multicasting w/o any code changes in those
layers, it may have similar benefits for ARP and neighbor discovery, 
right?
Maybe not-- haven't looked at it...

One downside for IPv6 (which apparently bonding doesn't support) is that
static addresses are lost when the device goes down, but that's a
difference from IPv4 that should be fixed.
+-DLS


Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread David Miller

From: Robert Olsson <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:26:04 +0100

> David Miller writes:
>  
>  > Actually, more accurately, the conflict exists in how this GC
>  > logic is implemented.  The core issue is that hash table size
>  > guides the GC processing, and hash table growth therefore
>  > modifies those GC goals.  So with the patch below we'll just
>  > keep growing the hash table instead of giving GC some time to
>  > try to keep the working set in equilibrium before doing the
>  > hash grow.
>  
>  AFIK the equilibrium is resizing function as well but using fixed 
>  hash table. So can we do without equilibrium resizing if tables 
>  are dynamic?  I think so
> 
>  With the hash data structure we could monitor the average chain 
>  length or just size and resize hash after that.

I'm not so sure; it may be a mistake to eliminate the equilibrium
logic.  One error I think it does have is the usage of chain length.

Even a nearly perfect hash has small lumps in distribution, and we
should not penalize entries which fall into these lumps.

Let us call T the threshold at which we would grow the routing hash
table.  As we approach T we start to GC.  Let's assume the hash table
has shift = 2, so N = 4 buckets, and T would (with the T=N+(N>>1)
algorithm) therefore be 6.

TABLE:  [0] DST1, DST2
[1] DST3, DST4, DST5

DST6 arrives, what should we do?

If we just accept it and don't GC some existing entries, we
will grow the hash table.  This is the wrong thing to do if
our true working set is smaller than 6 entries and thus some
of the existing entries are unlikely to be reused and thus
could be purged to keep us from hitting T.

If they are all active, growing is the right thing to do.

This is the crux of the whole routing cache problem.
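The decision in the example can be written down as a small model: with N = 1 << shift buckets the growth threshold is T = N + (N >> 1), so shift = 2 gives T = 6. In this hypothetical sketch, idle stands for the number of entries judged unlikely to be reused (by whatever LRU-style test is chosen, which is exactly the open question); all names are made up for illustration:

```c
/* Possible outcomes when a new routing cache entry arrives. */
enum rt_action { RT_INSERT, RT_GC_THEN_INSERT, RT_GROW };

/* Growth threshold T = N + N/2 for a table of N = 2^shift buckets. */
static unsigned int grow_threshold(unsigned int shift)
{
	unsigned int n = 1u << shift;

	return n + (n >> 1);
}

/* entries: routes currently cached; idle: how many of them look unlikely
 * to be reused (an assumed input, e.g. from an LRU age test). */
static enum rt_action on_new_entry(unsigned int shift, unsigned int entries,
				   unsigned int idle)
{
	if (entries + 1 < grow_threshold(shift))
		return RT_INSERT;		/* still below T */
	if (idle > 0)
		return RT_GC_THEN_INSERT;	/* working set is smaller than T */
	return RT_GROW;				/* everything is active */
}
```

In the example above (shift = 2, five entries, DST6 arriving), the outcome hinges entirely on whether idle is nonzero.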

I am of the opinion that LRU, for routes not attached to sockets, is
probably the best thing to do here.

Furthermore, at high packet rates the current rt_may_expire() logic
probably is not very effective, since its granularity is limited to
jiffies.  We can quite easily create 100,000 or more entries per
jiffy when HZ=100 during rDOS, for example.  So perhaps some global
LRU algorithm using ktime is more appropriate.

Global LRU is not easy without touching a lot of memory.  But I'm
sure some clever trick can be discovered by someone :)

It is amusing, but it seems that for an rDOS workload the most optimal
routing hash would be a tiny one like my example above.  All packets
essentially miss the routing cache and create a new entry.  So
keeping the working set as small as possible is what you want
to do, since no matter how large you grow, your hit rate will be
zero :-)


Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu

David Miller wrote:

> From: Rick Jones <[EMAIL PROTECTED]>
> Date: Tue, 06 Mar 2007 13:25:35 -0800
>
>>> On the other hand, being able to configure a small MSL for the loopback
>>> device is perfectly safe. Being able to configure a small MSL for other
>>> interfaces may be safe, depending on the rest of the network layout.
>> A peanut gallery question - I seem to recall prior discussions about how
>> one cannot assume that a packet destined for a given IP address will
>> remain destined for that given IP address as it could go through a module
>> that will rewrite headers etc.
>
> That's right, both netfilter and the packet scheduler actions
> can do that, that's why this whole idea about changing the MSL
> on loopback by default is wrong and pointless.

If the headers get rewritten and the packet gets directed elsewhere,
then we're no longer talking about a loopback connection, so that's
outside the discussion.

If the packet gets munged by multiple filters but still eventually gets
to the specified destination, OK. But regardless, if both endpoints of
the connection are on the loopback device, then there is nothing wrong
with the idea. Those filters can only do so much, they still have to
preserve the reliable in-order delivery semantics of TCP, otherwise the
system is broken.

It may not have much use, sure, I admitted that much from the outset.

So I'll leave it at this, thanks for the feedback.
--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sunhttp://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

Re: [RFC] ARP notify option

2007-03-06 Thread Chris Friesen


Stephen Hemminger wrote:

> +arp_notify - BOOLEAN
> +   Define mode for notification of address and device changes.
> +   0 - (default): do nothing
> +   1 - Generate gratuitous arp replies when device is brought up
> +   or hardware address changes.

Did you consider using gratuitous arp requests instead?  I remember
reading about some hardware that updated its arp cache on gratuitous
requests but not gratuitous replies.


Chris

Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions

2007-03-06 Thread David Miller

From: Baruch Even <[EMAIL PROTECTED]>
Date: Wed, 7 Mar 2007 00:01:46 +0200

> * David Miller <[EMAIL PROTECTED]> [070306 23:47]:
> > From: Baruch Even <[EMAIL PROTECTED]>
> > Date: Tue, 6 Mar 2007 21:42:59 +0200
> > 
> > > * Ilpo J?rvinen <[EMAIL PROTECTED]> [070306 14:52]:
> > > > +   newtp->highest_sack = treq->snt_isn + 1;
> > > 
> > > That's the only initialization that you have for highest_sack, I think
> > > that you should initialize it when a loss is detected to the start_seq
> > > of the first packet that wasn't acked.
> > 
> > He also sets it in tcp_sacktag_write_queue() like this:
> > 
> > +
> > +   if (after(TCP_SKB_CB(skb)->seq,
> > +   tp->highest_sack))
> > +   tp->highest_sack = TCP_SKB_CB(skb)->seq;
> 
> Yes, but that's still not enough if between the start of the connection
> and the first sack block we already wrapped around to before the old
> highest_sack. It might not be a common occurrence but it's still
> something to take care of.

Aha, I see, yes good point.  That would need to be fixed.
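The concern is a property of modulo-2^32 sequence comparison: the kernel's before()/after() decide order from the sign of the 32-bit difference, so a highest_sack value that falls more than 2^31 behind the current sequence space starts comparing the wrong way. A user-space sketch of that comparison; seq_before() and seq_after() are hypothetical copies written for illustration:

```c
#include <stdint.h>

/* a is "before" b if the signed 32-bit difference a - b is negative. */
static inline int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static inline int seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}
```

For example, seq_after(10, 0xFFFFFFF0) is true across the wrap, but seq_after(0x80000011, 0x10) is false even though the first value lies further along: the two are more than 2^31 apart, so a stale highest_sack initialized from snt_isn + 1 would never be advanced past it.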

Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions

2007-03-06 Thread Baruch Even

* David Miller <[EMAIL PROTECTED]> [070306 23:47]:
> From: Baruch Even <[EMAIL PROTECTED]>
> Date: Tue, 6 Mar 2007 21:42:59 +0200
> 
> > * Ilpo J?rvinen <[EMAIL PROTECTED]> [070306 14:52]:
> > > + newtp->highest_sack = treq->snt_isn + 1;
> > 
> > That's the only initialization that you have for highest_sack, I think
> > that you should initialize it when a loss is detected to the start_seq
> > of the first packet that wasn't acked.
> 
> He also sets it in tcp_sacktag_write_queue() like this:
> 
> +
> + if (after(TCP_SKB_CB(skb)->seq,
> + tp->highest_sack))
> + tp->highest_sack = TCP_SKB_CB(skb)->seq;

Yes, but that's still not enough if between the start of the connection
and the first sack block we already wrapped around to before the old
highest_sack. It might not be a common occurrence but it's still
something to take care of.

Baruch

Re: [RFC] div64_64 support

2007-03-06 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 10:29:41 -0800

> /* calculate the cubic root of x using Newton-Raphson */
> static uint32_t ncubic(uint64_t a)
> {
>   uint64_t x;
> 
>   /* Initial estimate is based on:
>* cbrt(x) = exp(log(x) / 3)
>*/
>   x = 1u << (fls64(a)/3);
> 
>   /* Converges in 3 iterations to > 32 bits */
> 
>   x = (2 * x + div64_64(a, x*x)) / 3;
>   x = (2 * x + div64_64(a, x*x)) / 3;
>   x = (2 * x + div64_64(a, x*x)) / 3;
> 
>   return x;
> }

Indeed that will be the fastest variant for cpus with hw
integer division.

I did a quick sparc64 port; here is what I got:

Function  clocks  mean(us)  max(us)  std(us)  total error
ocubic       529      0.35    15.16     0.66       545101
ncubic       498      0.33    12.83     0.36       576263
acbrt        427      0.28    11.04     0.33       547562
hcbrt        393      0.26    10.18     0.47         2410

Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions

2007-03-06 Thread David Miller

From: Baruch Even <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 21:42:59 +0200

> * Ilpo J?rvinen <[EMAIL PROTECTED]> [070306 14:52]:
> > +   newtp->highest_sack = treq->snt_isn + 1;
> 
> That's the only initialization that you have for highest_sack, I think
> that you should initialize it when a loss is detected to the start_seq
> of the first packet that wasn't acked.

He also sets it in tcp_sacktag_write_queue() like this:

+
+   if (after(TCP_SKB_CB(skb)->seq,
+   tp->highest_sack))
+   tp->highest_sack = TCP_SKB_CB(skb)->seq;

Re: [patch 2/2] div64_64: common code

2007-03-06 Thread David Miller

From: Ralf Baechle <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 19:46:32 +

> On Tue, Mar 06, 2007 at 02:42:28AM -0800, [EMAIL PROTECTED] wrote:
> 
> > Implement div64_64(): 64-bit by 64-bit division.  Needed by networking (at
> > least).
> 
> Your patch only implements div64_64() for 32-bit MIPS.  Below patch adds
> the trivial 64-bit bits.
> 
>   Ralf
> 
> Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>

Applied to net-2.6.22, thanks Ralf.
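On a 64-bit CPU div64_64() really is trivial (the x86_64 branch of the benchmark code in this thread is just dividend / divisor). On 32-bit machines, the benchmark's i386 version uses "dynamic precision": if the divisor needs more than 32 bits, both operands are shifted right until it fits, then a 64/32 division is done. A portable-C sketch of that idea; the *_demo names are hypothetical, the divisor must be nonzero, and the result is exact only when the divisor fits in 32 bits:

```c
#include <stdint.h>

static int fls32_demo(uint32_t x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* 64-by-64 division reduced to a 64-by-32 step.  Exact when divisor fits
 * in 32 bits; otherwise an approximation, because the low bits shifted
 * out of both operands are discarded. */
static uint64_t div64_64_demo(uint64_t dividend, uint64_t divisor)
{
	uint32_t d = (uint32_t)divisor;

	if (divisor > 0xFFFFFFFFULL) {
		unsigned int shift = fls32_demo((uint32_t)(divisor >> 32));

		d = (uint32_t)(divisor >> shift);	/* now fits in 32 bits */
		dividend >>= shift;
	}

	/* the 64/32 step; a 32-bit kernel would use do_div() here */
	return dividend / d;
}
```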

Re: [PATCH] fix compat_sock_common_getsockopt typo

2007-03-06 Thread David Miller

From: James Morris <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 10:06:00 -0500 (EST)

> On Tue, 6 Mar 2007, Johannes Berg wrote:
> 
> > This patch fixes a typo in compat_sock_common_getsockopt.
> > 
> > Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>
> > 
> > --- wireless-dev.orig/net/core/sock.c   2007-03-06 15:44:15.618565674 +0100
> > +++ wireless-dev/net/core/sock.c2007-03-06 15:44:25.948565674 +0100
> > @@ -1597,7 +1597,7 @@ int compat_sock_common_getsockopt(struct
> >  {
> > struct sock *sk = sock->sk;
> >  
> > -   if (sk->sk_prot->compat_setsockopt != NULL)
> > +   if (sk->sk_prot->compat_getsockopt != NULL)
> > return sk->sk_prot->compat_getsockopt(sk, level, optname,
> >   optval, optlen);
> > return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
> 
> 
> Acked-by: James Morris <[EMAIL PROTECTED]>

Applied, thanks everyone.

Re: IPv6 Development Tree

2007-03-06 Thread David Miller

From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Fri, 23 Feb 2007 12:53:01 +0900 (JST)

> I have cooked up new git tree for IPv6 development.
> It is available as branch named
>   2.6.21-rc1-net-2.6-20070223-FOR_DAVEM-20070223
> at
>   .
> 
> I will shift to new branch time to time (e.g. every -rc releases) in order
> to chase the latest tree.

What is the current branch name?  I'd like to pull whatever
you have into my net-2.6.22 tree.

Thank you.

Re: TCP 2MSL on loopback

2007-03-06 Thread David Miller

From: Rick Jones <[EMAIL PROTECTED]>
Date: Tue, 06 Mar 2007 13:25:35 -0800

> > On the other hand, being able to configure a small MSL for the loopback 
> > device is perfectly safe. Being able to configure a small MSL for other 
> > interfaces may be safe, depending on the rest of the network layout.
> 
> A peanut gallery question - I seem to recall prior discussions about how 
> one cannot assume that a packet destined for a given IP address will 
> remain destined for that given IP address as it could go through a module
> that will rewrite headers etc.

That's right: both netfilter and the packet scheduler actions
can do that, which is why this whole idea of changing the MSL
on loopback by default is wrong and pointless.

Re: many sockets, slow sendto

2007-03-06 Thread Zacco


Hi,

Thanks a lot for the advice; I'll give it a try.
And sorry about the stupid webmail; I will not use it again.

Zacco


Andi Kleen wrote:

> Zaccomer Lajos <[EMAIL PROTECTED]> writes:
>
>> I'm playing around with a simulation, in which many thousands of IP
>> addresses (on interface aliases) are used to send/receive TCP/UDP
>
> Something seems to be wrong with your emailer. It adds an empty
> line between each real line.
>
>> packets. I noticed that the time of send/sendto increased linearly
>> with the number of file descriptors, and I found it rather strange.
>
> Yes that is strange. I would suggest you use oprofile to identify which
> parts of the kernel use the CPU time with many descriptors.
>
> -Andi


Re: TCP 2MSL on loopback

2007-03-06 Thread Rick Jones

On the other hand, being able to configure a small MSL for the loopback 
device is perfectly safe. Being able to configure a small MSL for other 
interfaces may be safe, depending on the rest of the network layout.


A peanut gallery question - I seem to recall prior discussions about how 
one cannot assume that a packet destined for a given IP address will 
remain destined for that given IP address as it could go through a module
that will rewrite headers etc.


Is traffic destined for 127.0.0.1 immune from that?

rick jones

Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Eric Dumazet wrote:


Arf... don't tell me you forgot to do this...

echo 1 >/proc/sys/net/ipv4/tcp_tw_recycle
echo 1 >/proc/sys/net/ipv4/tcp_tw_reuse


That does not appear to me to be a safe thing to do on a production 
machine. Tweaks that are only good in a test environment really don't 
help the testing effort; they just mask a problem that will surface 
later at deployment time.


We could run our benchmarks this way and get high rates but no one 
deploying the server for real use would ever get anything like that, 
which makes the benchmark figure rather pointless.


On the other hand, being able to configure a small MSL for the loopback 
device is perfectly safe. Being able to configure a small MSL for other 
interfaces may be safe, depending on the rest of the network layout.

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

Multipath routing in Linux 2.6

2007-03-06 Thread Tore Anderson



  Hello list,

  I've been trying to figure out how to make equal-cost multipath
 routing work, with no luck.  Asked on the LARTC list with no success,
 and attempts to contact the two authors privately yielded one bounce
 while the other declined to answer in private and pointed me to this
 list.  So I'll just include the rest of the mail I sent them here
 (after doing a search-and-replace on the first three octets of all
 addresses, I'm a bit paranoid), hopefully someone has some suggestions
 for me...

  I'm using 2.6.20 on an x86_64 machine.  I'm adding my route thusly:

ip route add table 100 default \
  nexthop via 1.1.1.1 nexthop via 1.1.1.9

  It shows correctly up in the routing table:

[EMAIL PROTECTED]:~# ip route show table 100
default
nexthop via 1.1.1.1  dev vlan11 weight 1
nexthop via 1.1.1.9  dev vlan12 weight 1
[...]

  I'm sending traffic from a relatively busy network to this table:

[EMAIL PROTECTED]:~# ip rule
[...]
21000:  from 1.1.2.128/26 lookup 100
[...]

  I can verify with tcpdump that the rule works correctly and that the
 route is used.  However, the traffic is without exception routed via
 1.1.1.9; not a single packet is sent to 1.1.1.1.  If, however, I swap
 the two nexthops while adding the route, all traffic is sent to
 1.1.1.1, and nothing ends up at 1.1.1.9.

  I've tried loading and unloading the multipath_{wrandom,rr,random,drr}
 modules, removing and re-adding the route, and flushing the routing
 cache.  Several times and in different order.  Nothing affects the
 behaviour though, all of the traffic is sent to the router specified as
 the second nexthop on the "ip route add" command line.

  I feel I'm missing something essential here but I have no idea what.
 Google only tells me about others having roughly the same problem but
 never any solution.  Do you have any suggestions for me?  If I can make
 this work I will be happy to document how and try to have that included
 in the next kernel/iproute release and hopefully nobody will bother you
 about it again.

  Thanks for your time!

Kind regards
--
Tore Anderson

Re: [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread Andy Gospodarek

On 3/6/07, Jay Vosburgh <[EMAIL PROTECTED]> wrote:

Brian Haley <[EMAIL PROTECTED]> wrote:

>Andy Gospodarek wrote:
>> If we are easily able to differentiate between the multicast addresses
>> in the mc_list as to which are for ipv4 and which are for ipv6 then it
>> would be easy to call-out to something in the ipv6 mcast code when
>> needed instead of always calling out to ipv4 code.
>
>I've been unable to figure out exactly what you're referring to in the
>code (bond_main.c), it seems to failover all multicast addresses,
>regardless of what address family they are.  I might have missed something
>in 4K lines of code though?

I believe Andy is talking about bond_resend_igmp_join_requests
being only effective for IGMP v4 and not IGMP v6.  The reason being that
there is (a) no discrimination between v4 and v6 multicast addresses,
and (b) for the v6 case, there's no "rejoin" type function as was
created for IPv4 with the patch.

/me nods

Re: TCP 2MSL on loopback

2007-03-06 Thread Rick Jones

With transparent bridging, nobody knows how long the datagram may be
out there.  Admittedly, the chances of a datagram living for a full 
two minutes these days is probably nil, but just being in the same IP 
subnet doesn't really mean anything when it comes to physical locality.



Bridging isn't necessarily a problem though. The 2MSL timeout is 
designed to prevent problems from delayed packets that got sent through 
multiple paths. In a bridging setup you don't allow multiple paths, 
that's what STP is designed to prevent. If you want to configure a 
network that allows multiple paths, you need to use a router, not a bridge.


Well, there is trunking at the data link layer, and in theory there 
could be an active-standby where the standby took a somewhat different path.


The timeout is also to cover datagrams which just got "stuck" somewhere 
too (IIRC) and may not necessarily require a multiple path situation.




SPECweb benchmarking has had to deal with the issue of attempted 
TIME_WAIT reuse going back to 1997.  It deals with it by not relying 
on the client's configured local/anonymous/ephemeral port number range 
and instead making explicit bind() calls in the (more or less) entire 
unpriv port range (actually it may just be from 5000 to 65535 but still)
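A minimal sketch of that explicit-bind approach, assuming IPv4 and a caller-chosen port window: try `bind()` on each port in turn, skip ports that report `EADDRINUSE`, and remember where to resume. The helper `bind_in_range` and its cursor argument are hypothetical; this is not SPECweb's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: bind fd to the first free port in [lo, hi],
 * starting at *cursor and wrapping around, so successive calls walk the
 * whole window instead of the kernel's ephemeral range.
 * Returns the bound port, or -1 with errno set. */
static int bind_in_range(int fd, uint32_t addr_be, uint16_t lo, uint16_t hi,
                         uint16_t *cursor)
{
    assert(lo <= hi && *cursor >= lo && *cursor <= hi);
    uint32_t span = (uint32_t)hi - lo + 1;

    for (uint32_t i = 0; i < span; i++) {
        uint16_t port = (uint16_t)(lo + ((uint32_t)(*cursor - lo) + i) % span);
        struct sockaddr_in sin;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = addr_be;
        sin.sin_port = htons(port);
        if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0) {
            /* The next call resumes just after this port. */
            *cursor = (port == hi) ? lo : (uint16_t)(port + 1);
            return port;
        }
        if (errno != EADDRINUSE)
            return -1;          /* a real error, not merely a busy port */
    }
    errno = EADDRINUSE;         /* every port in the window is taken */
    return -1;
}
```

Ports whose previous connection is still in TIME_WAIT normally fail with `EADDRINUSE` and are skipped. This widens the pool of usable ports but does not remove the 2MSL limit itself.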



That still doesn't solve the problem, it only ~doubles the available 
port range. That means it takes 0.6 seconds to trigger the problem 
instead of only 0.3 seconds...


True.  Thankfully, the web learned to use persistent connections so 
later versions of SPECweb benchmarking make use of persistent connections.


In an environment where connections are opened and closed very quickly 
with only a small amount of data carried per connection, it might make 
sense to remember the last sequence number used on a port and use that 
as the floor of the next randomly generated ISN. Monotonically 
increasing sequence numbers aren't a security risk if there's still a 
randomly determined gap from one connection to the next. But I don't 
think it's necessary to consider this at the moment.
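A sketch of that idea, with hypothetical names and an assumed gap bound: keep the last ISN used on a port and add a random positive gap. The result is compared in modulo-2^32 sequence space, the way TCP compares sequence numbers. `rand()` is only a self-contained placeholder and is not suitable for real ISN generation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* True if sequence number a comes after b in modulo-2^32 sequence space. */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Assumed bound on the random gap; it must stay well below 2^31 so the
 * new ISN still compares as "after" the previous one on this port. */
#define MAX_ISN_GAP (1u << 24)

/* Hypothetical: previous ISN on this port plus a random gap in
 * [1, MAX_ISN_GAP].  The addition wraps mod 2^32, like sequence numbers. */
static uint32_t next_isn(uint32_t last_isn, uint32_t (*rand_gap)(void))
{
    uint32_t gap = rand_gap() % MAX_ISN_GAP + 1;

    assert(gap >= 1 && gap <= MAX_ISN_GAP);
    return last_isn + gap;
}

/* Placeholder randomness for the demo only; real ISNs need a
 * cryptographically strong source. */
static uint32_t demo_gap(void)
{
    return (uint32_t)rand();
}
```

The gap bound trades randomness against monotonicity. A larger bound makes the next ISN harder to guess, but it must stay far enough below 2^31 that the new value still lands "after" the old one.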


I thought that all the "security types" started squawking if the ISN 
wasn't completely random?


I've not tried this, but if a client does want to cycle through 
thousands of connections per second, and if it is the one to initiate 
connection close, would it be sufficient to only use something like:


socket()
bind()
loop:
connect()
request()
response()
shutdown(SHUT_RDWR)
goto loop

i.e., not call close() on the FD, so there is still a direct link to the
connection in TIME_WAIT so one could in theory initiate a new connection 
from TIME_WAIT?  Then in theory the randomness could be _almost_ the 
entire sequence space, less the previous connection's window (IIRC).


rick jones


Re: TCP 2MSL on loopback

2007-03-06 Thread Eric Dumazet


Howard Chu a écrit :

Eric Dumazet wrote:

Let me see, any chance you can try the prog on 2.6.20 ?


Not any time soon.


If not, please send :

grep . /proc/sys/net/ipv4/*


This is the output on the laptop:
/proc/sys/net/ipv4/icmp_echo_ignore_all:0
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts:1
/proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr:0
/proc/sys/net/ipv4/icmp_ignore_bogus_error_responses:1
/proc/sys/net/ipv4/icmp_ratelimit:250
/proc/sys/net/ipv4/icmp_ratemask:6168
/proc/sys/net/ipv4/igmp_max_memberships:20
/proc/sys/net/ipv4/igmp_max_msf:10
/proc/sys/net/ipv4/inet_peer_gc_maxtime:120
/proc/sys/net/ipv4/inet_peer_gc_mintime:10
/proc/sys/net/ipv4/inet_peer_maxttl:600
/proc/sys/net/ipv4/inet_peer_minttl:120
/proc/sys/net/ipv4/inet_peer_threshold:65664
/proc/sys/net/ipv4/ip_default_ttl:64
/proc/sys/net/ipv4/ip_dynaddr:0
/proc/sys/net/ipv4/ip_forward:0
/proc/sys/net/ipv4/ipfrag_high_thresh:262144
/proc/sys/net/ipv4/ipfrag_low_thresh:196608
/proc/sys/net/ipv4/ipfrag_max_dist:64
/proc/sys/net/ipv4/ipfrag_secret_interval:600
/proc/sys/net/ipv4/ipfrag_time:30
/proc/sys/net/ipv4/ip_local_port_range:32768	61000
/proc/sys/net/ipv4/ip_nonlocal_bind:0
/proc/sys/net/ipv4/ip_no_pmtu_disc:0
/proc/sys/net/ipv4/tcp_abc:0
/proc/sys/net/ipv4/tcp_abort_on_overflow:0
/proc/sys/net/ipv4/tcp_adv_win_scale:2
/proc/sys/net/ipv4/tcp_app_win:31
/proc/sys/net/ipv4/tcp_base_mss:512
/proc/sys/net/ipv4/tcp_congestion_control:reno
/proc/sys/net/ipv4/tcp_dma_copybreak:4096
/proc/sys/net/ipv4/tcp_dsack:1
/proc/sys/net/ipv4/tcp_ecn:0
/proc/sys/net/ipv4/tcp_fack:1
/proc/sys/net/ipv4/tcp_fin_timeout:60
/proc/sys/net/ipv4/tcp_frto:0
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200
/proc/sys/net/ipv4/tcp_low_latency:0
/proc/sys/net/ipv4/tcp_max_orphans:32768
/proc/sys/net/ipv4/tcp_max_syn_backlog:1024
/proc/sys/net/ipv4/tcp_max_tw_buckets:180000
/proc/sys/net/ipv4/tcp_mem:98304	131072	196608
/proc/sys/net/ipv4/tcp_moderate_rcvbuf:1
/proc/sys/net/ipv4/tcp_mtu_probing:0
/proc/sys/net/ipv4/tcp_no_metrics_save:0
/proc/sys/net/ipv4/tcp_orphan_retries:0
/proc/sys/net/ipv4/tcp_reordering:3
/proc/sys/net/ipv4/tcp_retrans_collapse:1
/proc/sys/net/ipv4/tcp_retries1:3
/proc/sys/net/ipv4/tcp_retries2:15
/proc/sys/net/ipv4/tcp_rfc1337:0
/proc/sys/net/ipv4/tcp_rmem:4096	87380	4194304
/proc/sys/net/ipv4/tcp_sack:1
/proc/sys/net/ipv4/tcp_slow_start_after_idle:1
/proc/sys/net/ipv4/tcp_stdurg:0
/proc/sys/net/ipv4/tcp_synack_retries:5
/proc/sys/net/ipv4/tcp_syncookies:1
/proc/sys/net/ipv4/tcp_syn_retries:5
/proc/sys/net/ipv4/tcp_timestamps:1
/proc/sys/net/ipv4/tcp_tso_win_divisor:3
/proc/sys/net/ipv4/tcp_tw_recycle:0
/proc/sys/net/ipv4/tcp_tw_reuse:0
/proc/sys/net/ipv4/tcp_window_scaling:1
/proc/sys/net/ipv4/tcp_wmem:4096	16384	4194304
/proc/sys/net/ipv4/tcp_workaround_signed_windows:0


Arf... don't tell me you forgot to do this...

echo 1 >/proc/sys/net/ipv4/tcp_tw_recycle
echo 1 >/proc/sys/net/ipv4/tcp_tw_reuse


Re: [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread Jay Vosburgh

Brian Haley <[EMAIL PROTECTED]> wrote:

>Andy Gospodarek wrote:
>> If we are easily able to differentiate between the multicast addresses
>> in the mc_list as to which are for ipv4 and which are for ipv6 then it
>> would be easy to call-out to something in the ipv6 mcast code when
>> needed instead of always calling out to ipv4 code.
>
>I've been unable to figure out exactly what you're referring to in the
>code (bond_main.c), it seems to failover all multicast addresses,
>regardless of what address family they are.  I might have missed something
>in 4K lines of code though?

I believe Andy is talking about bond_resend_igmp_join_requests
being only effective for IGMP v4 and not IGMP v6.  The reason being that
there is (a) no discrimination between v4 and v6 multicast addresses,
and (b) for the v6 case, there's no "rejoin" type function as was
created for IPv4 with the patch.
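This is one way the two families could be told apart. The sketch assumes the entries in question are link-layer multicast addresses. IPv4 groups map to Ethernet addresses beginning 01:00:5e, followed by a zero bit (RFC 1112). IPv6 groups map to addresses beginning 33:33 (RFC 2464). The enum and function names are hypothetical; this is not code from bond_main.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mc_family { MC_OTHER, MC_IPV4, MC_IPV6 };

/* Classify a 6-byte Ethernet multicast address by its well-known prefix. */
static enum mc_family classify_mc_mac(const uint8_t mac[6])
{
    assert(mac != NULL);
    /* IPv4: 01:00:5e, then a 0 bit and the low 23 bits of the group. */
    if (mac[0] == 0x01 && mac[1] == 0x00 && mac[2] == 0x5e && !(mac[3] & 0x80))
        return MC_IPV4;
    /* IPv6: 33:33, then the low 32 bits of the group. */
    if (mac[0] == 0x33 && mac[1] == 0x33)
        return MC_IPV6;
    return MC_OTHER;
}
```

A rejoin routine could then resend IGMP reports only for `MC_IPV4` entries. `MC_IPV6` entries would be left for an IPv6 counterpart (MLD is the IPv6 equivalent of IGMP).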

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

[NET] netxen: fix warnings

2007-03-06 Thread Ralf Baechle

  CC [M]  drivers/net/netxen/netxen_nic_hw.o
drivers/net/netxen/netxen_nic_hw.c: In function 'netxen_nic_hw_resources':
drivers/net/netxen/netxen_nic_hw.c:231: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'dma_addr_t'
drivers/net/netxen/netxen_nic_hw.c:250: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'dma_addr_t'

u64 is unsigned long on 64-bit builds, so casting to u64 still does not match
the %llx format in the printk arguments and triggers a warning.  So cast to
unsigned long long instead.

Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>

diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index a2877f3..1be5570 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -228,7 +228,7 @@ int netxen_nic_hw_resources(struct netxen_adapter *adapter)
&adapter->ctx_desc_pdev);
 
printk("ctx_desc_phys_addr: 0x%llx\n",
-  (u64) adapter->ctx_desc_phys_addr);
+  (unsigned long long) adapter->ctx_desc_phys_addr);
if (addr == NULL) {
DPRINTK(ERR, "bad return from pci_alloc_consistent\n");
err = -ENOMEM;
@@ -247,7 +247,8 @@ int netxen_nic_hw_resources(struct netxen_adapter *adapter)
adapter->max_tx_desc_count,
(dma_addr_t *) & hw->cmd_desc_phys_addr,
&adapter->ahw.cmd_desc_pdev);
-   printk("cmd_desc_phys_addr: 0x%llx\n", (u64) hw->cmd_desc_phys_addr);
+   printk("cmd_desc_phys_addr: 0x%llx\n",
+  (unsigned long long) hw->cmd_desc_phys_addr);
 
if (addr == NULL) {
DPRINTK(ERR, "bad return from pci_alloc_consistent\n");
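The same rule applies in userspace. `%llx` always expects `unsigned long long`, so the portable fix is to cast to exactly that type rather than to a fixed-width alias whose underlying type varies by architecture. This sketch assumes a 64-bit address type; `fake_dma_addr_t` and `format_phys_addr` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for dma_addr_t on a 64-bit platform. */
typedef uint64_t fake_dma_addr_t;

/* Format an address the way the patched printk() calls do:
 * %llx always takes unsigned long long, so cast to exactly that. */
static int format_phys_addr(char *buf, size_t len, fake_dma_addr_t addr)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "ctx_desc_phys_addr: 0x%llx\n",
                    (unsigned long long)addr);
}
```

In userspace, `PRIx64` from `<inttypes.h>` is another common fix. Kernel code conventionally uses the explicit cast instead.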

Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Eric Dumazet wrote:

Let me see, any chance you can try the prog on 2.6.20 ?


Not any time soon.


If not, please send :

grep . /proc/sys/net/ipv4/*


This is the output on the laptop:
/proc/sys/net/ipv4/icmp_echo_ignore_all:0
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts:1
/proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr:0
/proc/sys/net/ipv4/icmp_ignore_bogus_error_responses:1
/proc/sys/net/ipv4/icmp_ratelimit:250
/proc/sys/net/ipv4/icmp_ratemask:6168
/proc/sys/net/ipv4/igmp_max_memberships:20
/proc/sys/net/ipv4/igmp_max_msf:10
/proc/sys/net/ipv4/inet_peer_gc_maxtime:120
/proc/sys/net/ipv4/inet_peer_gc_mintime:10
/proc/sys/net/ipv4/inet_peer_maxttl:600
/proc/sys/net/ipv4/inet_peer_minttl:120
/proc/sys/net/ipv4/inet_peer_threshold:65664
/proc/sys/net/ipv4/ip_default_ttl:64
/proc/sys/net/ipv4/ip_dynaddr:0
/proc/sys/net/ipv4/ip_forward:0
/proc/sys/net/ipv4/ipfrag_high_thresh:262144
/proc/sys/net/ipv4/ipfrag_low_thresh:196608
/proc/sys/net/ipv4/ipfrag_max_dist:64
/proc/sys/net/ipv4/ipfrag_secret_interval:600
/proc/sys/net/ipv4/ipfrag_time:30
/proc/sys/net/ipv4/ip_local_port_range:32768	61000
/proc/sys/net/ipv4/ip_nonlocal_bind:0
/proc/sys/net/ipv4/ip_no_pmtu_disc:0
/proc/sys/net/ipv4/tcp_abc:0
/proc/sys/net/ipv4/tcp_abort_on_overflow:0
/proc/sys/net/ipv4/tcp_adv_win_scale:2
/proc/sys/net/ipv4/tcp_app_win:31
/proc/sys/net/ipv4/tcp_base_mss:512
/proc/sys/net/ipv4/tcp_congestion_control:reno
/proc/sys/net/ipv4/tcp_dma_copybreak:4096
/proc/sys/net/ipv4/tcp_dsack:1
/proc/sys/net/ipv4/tcp_ecn:0
/proc/sys/net/ipv4/tcp_fack:1
/proc/sys/net/ipv4/tcp_fin_timeout:60
/proc/sys/net/ipv4/tcp_frto:0
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200
/proc/sys/net/ipv4/tcp_low_latency:0
/proc/sys/net/ipv4/tcp_max_orphans:32768
/proc/sys/net/ipv4/tcp_max_syn_backlog:1024
/proc/sys/net/ipv4/tcp_max_tw_buckets:180000
/proc/sys/net/ipv4/tcp_mem:98304	131072	196608
/proc/sys/net/ipv4/tcp_moderate_rcvbuf:1
/proc/sys/net/ipv4/tcp_mtu_probing:0
/proc/sys/net/ipv4/tcp_no_metrics_save:0
/proc/sys/net/ipv4/tcp_orphan_retries:0
/proc/sys/net/ipv4/tcp_reordering:3
/proc/sys/net/ipv4/tcp_retrans_collapse:1
/proc/sys/net/ipv4/tcp_retries1:3
/proc/sys/net/ipv4/tcp_retries2:15
/proc/sys/net/ipv4/tcp_rfc1337:0
/proc/sys/net/ipv4/tcp_rmem:4096	87380	4194304
/proc/sys/net/ipv4/tcp_sack:1
/proc/sys/net/ipv4/tcp_slow_start_after_idle:1
/proc/sys/net/ipv4/tcp_stdurg:0
/proc/sys/net/ipv4/tcp_synack_retries:5
/proc/sys/net/ipv4/tcp_syncookies:1
/proc/sys/net/ipv4/tcp_syn_retries:5
/proc/sys/net/ipv4/tcp_timestamps:1
/proc/sys/net/ipv4/tcp_tso_win_divisor:3
/proc/sys/net/ipv4/tcp_tw_recycle:0
/proc/sys/net/ipv4/tcp_tw_reuse:0
/proc/sys/net/ipv4/tcp_window_scaling:1
/proc/sys/net/ipv4/tcp_wmem:4096	16384	4194304
/proc/sys/net/ipv4/tcp_workaround_signed_windows:0



--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

Re: [PATCH 3/3] bonding: Improve IGMP join processing

2007-03-06 Thread Brian Haley


Andy Gospodarek wrote:

If we are easily able to differentiate between the multicast addresses
in the mc_list as to which are for ipv4 and which are for ipv6 then it
would be easy to call-out to something in the ipv6 mcast code when
needed instead of always calling out to ipv4 code.


I've been unable to figure out exactly what you're referring to in the 
code (bond_main.c), it seems to failover all multicast addresses, 
regardless of what address family they are.  I might have missed 
something in 4K lines of code though?


-Brian

Re: [RFC] div64_64 support

2007-03-06 Thread Stephen Hemminger

On Tue, 6 Mar 2007 20:48:41 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote:
> > Don't count the existing Newton-Raphson out. It turns out that to get enough
> > precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> > gets much better timing.
> 
> But did you fix the >2^43 bug too?

It was caused by not doing x^2 in 64 bit.

> 
> SGI has already shipped 10TB Altixen, so it's not entirely theoretical.
> 
> -Andi
> 
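This sketch shows why the width of x² matters. It assumes the iteration under discussion computes an integer cube root, as TCP CUBIC's helper does, using the update x = (2x + a/x²)/3. Once x exceeds 2^16, x² no longer fits in 32 bits, so the product must be formed in 64 bits. This version iterates to convergence from a starting estimate at or above the root rather than unrolling a fixed four steps. `cube_root` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Integer cube root, floor(cbrt(a)), by Newton-Raphson.
 * x * x is computed in 64 bits: x can start as large as 2^22, and
 * squaring that in 32-bit arithmetic would silently overflow. */
static uint32_t cube_root(uint64_t a)
{
    unsigned bits = 0;
    uint64_t x, y;

    if (a == 0)
        return 0;
    for (uint64_t t = a; t != 0; t >>= 1)
        bits++;                                 /* now a < 2^bits */
    x = (uint64_t)1 << ((bits + 2) / 3);        /* 2^ceil(bits/3) >= cbrt(a) */

    for (;;) {
        y = (2 * x + a / (x * x)) / 3;          /* x * x done in 64 bits */
        if (y >= x)
            break;                              /* converged from above */
        x = y;
    }
    assert(x <= UINT32_MAX);
    return (uint32_t)x;
}
```

Starting at or above the root makes every step move downward, so the loop can stop as soon as an update fails to decrease x. If `x * x` were done in 32-bit arithmetic, it would wrap once x exceeds 65535.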


-- 
Stephen Hemminger <[EMAIL PROTECTED]>

Re: TCP 2MSL on loopback

2007-03-06 Thread Eric Dumazet


Howard Chu a écrit :

Eric Dumazet wrote:

On Tuesday 06 March 2007 10:22, Howard Chu wrote:


It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range -
on my system the default port range is 32768-61000. That means if I use
up 28232 ports in less than 2MSL then everything stops. netstat will
show that all the available port numbers are in TIME_WAIT state. And
this is particularly bad because while waiting for the timeout, I can't
initiate any new outbound connections of any kind at all - telnet, ssh,
whatever, you have to wait for at least one port to free up.
(Interesting denial of service there)

Granted, I was running my test on 2.6.18, perhaps 2.6.21 behaves
differently.
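Here is a back-of-the-envelope sketch of that limit. Suppose a client has `n` usable local ports, and each closed connection holds its port in TIME_WAIT for `t` seconds. It can then sustain at most about `n / t` new connections per second to a single destination before `connect()` starts failing. The sketch makes two assumptions. First, the 60 s figure is Linux's `TCP_TIMEWAIT_LEN`. Second, both ends of `ip_local_port_range` are counted as usable.

```c
#include <assert.h>

/* Assumed: Linux holds TIME_WAIT for TCP_TIMEWAIT_LEN = 60 seconds. */
#define TIME_WAIT_SECS 60u

/* Number of ports in an inclusive [lo, hi] range, e.g. the two values
 * in /proc/sys/net/ipv4/ip_local_port_range. */
static unsigned port_count(unsigned lo, unsigned hi)
{
    assert(lo <= hi);
    return hi - lo + 1;
}

/* Upper bound on new connections per second to one destination
 * before every local port is stuck in TIME_WAIT. */
static double max_conn_rate(unsigned ports, unsigned time_wait_secs)
{
    assert(time_wait_secs > 0);
    return (double)ports / time_wait_secs;
}
```

With the default 32768-61000 range this comes to roughly 470 connections per second. A benchmark that opens connections faster than that exhausts the range well within the TIME_WAIT period.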


Could you try the attached program and tell me what happens?

$ gcc -O2 -o socktest socktest.c -lpthread
$ time ./socktest -n 10
nb_conn=9 nb_accp=9

real	0m5.058s
user	0m0.212s
sys 0m4.844s

(on my small machine, dell d610 :) )


On my Asus laptop (2GHz Pentium M) the first time I ran it it completed 
in about 51 seconds, with no errors. I then copied it to another machine 
and started it up there, and got connect errors right away. I then went 
back to my laptop and ran it again, and got errors that time.


This is the laptop run with errors:
viola:~/src> uname -a
Linux viola 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 i686 i686 i386 GNU/Linux

viola:~/src> time ./socktest -n 100
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
nb_conn=993757 nb_accp=993757
1.408u 88.649s 1:42.76 87.6%	0+0k 0+0io 0pf+0w

This is my other system, an AMD X2 3800+ (dual core)
mandolin:~/src> uname -a
Linux mandolin 2.6.18.3SMP #9 SMP Sat Nov 25 10:08:51 PST 2006 x86_64 x86_64 x86_64 GNU/Linux

mandolin:~/src> gcc -O2 -o socktest socktest.c -lpthread
mandolin:~/src> time ./socktest -n 100
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
nb_conn=957088 nb_accp=957088
1.012u 630.991s 5:18.05 198.7%  0+0k 0+0io 0pf+0w
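On these x86 Linux systems, errno 99 is `EADDRNOTAVAIL`. `connect()` reports this error when it cannot find a free local ephemeral port, which is exactly the exhaustion under discussion. The sketch below shows how a test client might single out that case. Both helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: did connect() fail because no local port was free? */
static int is_port_exhaustion(int err)
{
    return err == EADDRNOTAVAIL;
}

/* Print a connect() failure with its symbolic meaning, not just the number. */
static void report_connect_error(int err)
{
    assert(err != 0);   /* only called after connect() has failed */
    fprintf(stderr, "connect error %d (%s)%s\n", err, strerror(err),
            is_port_exhaustion(err) ? ": local ports exhausted" : "");
}
```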


Let me see, any chance you can try the program on 2.6.20?

If not, please send :

grep . /proc/sys/net/ipv4/*

Thank you

Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED

2007-03-06 Thread Greg KH

On Tue, Mar 06, 2007 at 12:10:09AM -0600, Matt Mackall wrote:
> On Mon, Mar 05, 2007 at 08:03:50PM -0800, Greg KH wrote:
> > On Mon, Mar 05, 2007 at 09:39:47PM -0600, Matt Mackall wrote:
> > > On Mon, Mar 05, 2007 at 06:48:50PM -0800, Greg KH wrote:
> > > > If so, can you disable the option and strace it to see what program is
> > > > trying to access what?  That will put the
> > > > HAL/NetworkManager/libsysfs/distro script finger pointing to rest pretty
> > > > quickly :)
> > > 
> > > Ok, I've got straces of both good and bad (>5M each). Filtered out
> > > random pointer values and the like, diffed, and filtered for /sys/,
> > > and the result's still 1.5M. What should I be looking for?
> > 
> > Failures when trying to read from /sys/class/net/
> > 
> > Or opening the directory and iterating over the subdirs in there.  Or
> > something like that.
> > 
> > But the /sys/class/net/ stuff should hopefully help narrow it down.
> 
> Works:
> 
> 6857  open("/sys/class/net", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
> 6857  fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> 6857  fcntl64(13, F_SETFD, FD_CLOEXEC)  = 0
> 6857  getdents64(13, /* 5 entries */, 4096) = 120
> 6857  readlink("/sys/class/net/eth1", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857  readlink("/sys/class/net/eth1/device", "../../../devices/pci0000:00/0000:00:1e.0/0000:02:02.0", 256) = 53
> 6857  readlink("/sys/class/net/lo", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857  readlink("/sys/class/net/lo/device", 0x80a2450, 256) = -1 ENOENT (No such file or directory)
> 6857  readlink("/sys/class/net/eth0", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857  readlink("/sys/class/net/eth0/device", "../../../devices/pci0000:00/0000:00:1e.0/0000:02:01.0", 256) = 53
> 6857  getdents64(13, /* 0 entries */, 4096) = 0
> 6857  close(13) = 0
> 
> Breaks:
> 
> 3620  open("/sys/class/net", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
> 3620  fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> 3620  fcntl64(13, F_SETFD, FD_CLOEXEC)  = 0
> 3620  getdents64(13, /* 5 entries */, 4096) = 120
> 3620  readlink("/sys/class/net/eth1", "../../devices/pci0000:00/0000:00:1e.0/0000:02:02.0/eth1", 256) = 55
> 3620  readlink("/sys/devices/pci0000:00/0000:00:1e.0/0000:02:02.0/eth1/device", 0x809e910, 256) = -1 ENOENT (No such file or directory)
> 3620  readlink("/sys/class/net/lo", "../../devices/virtual/net/lo", 256) = 28
> 3620  readlink("/sys/devices/virtual/net/lo/device", 0x809e960, 256) = -1 ENOENT (No such file or directory)
> 3620  readlink("/sys/class/net/eth0", "../../devices/pci0000:00/0000:00:1e.0/0000:02:01.0/eth0", 256) = 55
> 3620  readlink("/sys/devices/pci0000:00/0000:00:1e.0/0000:02:01.0/eth0/device", 0x809e960, 256) = -1 ENOENT (No such file or directory)
> 3620  getdents64(13, /* 0 entries */, 4096) = 0
> 3620  close(13) = 0

Can you try the patch below?  And enable CONFIG_SYSFS_DEPRECATED.  It
should cause HAL to see the network devices again, as the symlink is now
back (it shouldn't have gone away, that was my fault...)
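The traces show what HAL depends on: a `device` symlink inside each network interface's sysfs directory. Below is a minimal sketch of that lookup. It builds `<ifdir>/device` and resolves it with `readlink()`. If the link is missing, it returns -1 with `errno` set, `ENOENT` as in the failing trace. The helper name `read_device_link` is hypothetical and is not taken from HAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: resolve "<ifdir>/device" into buf, NUL-terminated.
 * Returns the link length, or -1 with errno set (ENOENT if the link is
 * missing). */
static ssize_t read_device_link(const char *ifdir, char *buf, size_t len)
{
    char path[4096];
    ssize_t n;
    int w;

    assert(ifdir != NULL && buf != NULL && len > 0);
    w = snprintf(path, sizeof(path), "%s/device", ifdir);
    if (w < 0 || (size_t)w >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    n = readlink(path, buf, len - 1);   /* readlink() does not NUL-terminate */
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

Called as `read_device_link("/sys/class/net/eth0", buf, sizeof buf)`, it would succeed on the working configuration in the traces. On the broken one, it would fail with `ENOENT`.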

I tried this with HAL 0.5.7, which is pretty old, and hal-device-manager
shows my network devices properly.

thanks for your patience,

greg k-h


---
 drivers/base/core.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

--- gregkh-2.6.orig/drivers/base/core.c
+++ gregkh-2.6/drivers/base/core.c
@@ -584,17 +584,17 @@ int device_add(struct device *dev)
if (dev->kobj.parent != &dev->class->subsys.kset.kobj)
sysfs_create_link(&dev->class->subsys.kset.kobj,
  &dev->kobj, dev->bus_id);
-#ifdef CONFIG_SYSFS_DEPRECATED
if (parent) {
sysfs_create_link(&dev->kobj, &dev->parent->kobj,
"device");
+#ifdef CONFIG_SYSFS_DEPRECATED
class_name = make_class_name(dev->class->name,
&dev->kobj);
if (class_name)
sysfs_create_link(&dev->parent->kobj,
  &dev->kobj, class_name);
-   }
 #endif
+   }
}
 
if ((error = device_add_attrs(dev)))
@@ -651,17 +651,17 @@ int device_add(struct device *dev)
if (dev->kobj.parent != &dev->class->subsys.kset.kobj)
sysfs_remove_link(&dev->class->subsys.kset.kobj,
  dev->bus_id);
-#ifdef CONFIG_SYSFS_DEPRECATED
if (parent) {
+#ifdef CONFIG_SYSFS_DEPRECATED
char *class_name = make_class_name(dev->class->name,
   &dev->kobj);
if (class_name)
sysfs_remove_

Re: [patch 2/2] div64_64: common code

2007-03-06 Thread Ralf Baechle

On Tue, Mar 06, 2007 at 02:42:28AM -0800, [EMAIL PROTECTED] wrote:

> Implement div64_64(): 64-bit by 64-bit division.  Needed by networking (at
> least).

Your patch only implements div64_64() for 32-bit MIPS.  The patch below adds
the trivial 64-bit version.

  Ralf

Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>

 include/asm-mips/div64.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-mips/include/asm-mips/div64.h
===
--- linux-mips.orig/include/asm-mips/div64.h
+++ linux-mips/include/asm-mips/div64.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) 2000, 2004  Maciej W. Rozycki
- * Copyright (C) 2003 Ralf Baechle
+ * Copyright (C) 2003, 07 Ralf Baechle ([EMAIL PROTECTED])
  *
  * This file is subject to the terms and conditions of the GNU General Public
  * License.  See the file "COPYING" in the main directory of this archive
@@ -105,6 +105,11 @@ extern uint64_t div64_64(uint64_t divide
(n) = __quot; \
__mod; })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
+
 #endif /* (_MIPS_SZLONG == 64) */
 
 #endif /* _ASM_DIV64_H */
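For comparison, here is what a 64-by-64 division has to do on a CPU without a native 64-bit divide. The sketch uses plain shift-and-subtract long division, producing one quotient bit per step. It is an illustrative, hypothetical `div64_64_sketch`, not the 32-bit MIPS implementation from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: 64-by-64 unsigned division by binary long
 * division, producing one quotient bit per iteration. */
static uint64_t div64_64_sketch(uint64_t dividend, uint64_t divisor)
{
    uint64_t quot = 0, rem = 0;

    assert(divisor != 0);
    for (int i = 63; i >= 0; i--) {
        uint64_t carry = rem >> 63;             /* bit shifted out of rem */
        rem = (rem << 1) | ((dividend >> i) & 1);
        /* With carry set, the true remainder is 2^64 + rem, which always
         * exceeds divisor; unsigned wraparound makes rem -= divisor exact. */
        if (carry || rem >= divisor) {
            rem -= divisor;
            quot |= (uint64_t)1 << i;
        }
    }
    return quot;
}
```

On 64-bit MIPS none of this is needed, because a single hardware divide handles `dividend / divisor` directly, which is what the patch uses.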

Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen

On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.

But did you fix the >2^43 bug too?

SGI has already shipped 10TB Altixen, so it's not entirely theoretical.

-Andi


Re: [ofa-general] [PATCH 2.6.21-rc2] iw_cxgb3: Don't use mm after its freed in iwch_mmap().

2007-03-06 Thread Roland Dreier

Thanks, applied.

Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions

2007-03-06 Thread Baruch Even

* Ilpo Järvinen <[EMAIL PROTECTED]> [070306 14:52]:
> Complete rewrite for update_scoreboard and mark_head_lost. Couple
> of hints became unnecessary because of this change. Changes
> !TCPCB_TAGBITS check from the original to !(S|L) but it shouldn't
> make a difference, and if there ever is an R only skb TCP will
> mark it as LOST too. The algorithm uses some ideas presented by
> David Miller and Baruch Even.
> 
> Seqno lookups require fast lookups that are provided using
> RB-tree patch(+abstraction) from DaveM.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> ---
> 
> I'm sorry about the poorly chunked diff. Is it possible to force git to
> produce better (large block) diffs when a complete function is rewritten
> from scratch in the patch? (The manpage of git-diff-files hints at -B, but
> it did not work; perhaps it only affects whole-file rewrites.)
> 
> This probably conflicts with the other patches in the rbtree patchset of 
> DaveM (two first are required) because I tested this one (at least the 
> non-timedout part worked) and didn't want some random breakage 
> from the other patches (as such was reported).
> 
>  include/linux/tcp.h  |6 -
>  include/net/tcp.h|6 +
>  net/ipv4/tcp_input.c |  194 
> +-
>  net/ipv4/tcp_minisocks.c |1 
>  4 files changed, 130 insertions(+), 77 deletions(-)
> 

[snip]

> + newtp->highest_sack = treq->snt_isn + 1;

That's the only initialization you have for highest_sack. I think you
should initialize it, when a loss is detected, to the start_seq of the
first packet that wasn't acked.

I didn't review the rest; I still need to set up a proper tree with the
prerequisite patches to apply it on. Could you note which kernel you based
it on and include all the patches applied before it?

Baruch

Re: when having to acquire an SA, ipsec drops the packet

2007-03-06 Thread James Morris

On Tue, 6 Mar 2007, Joy Latten wrote:

> > I saw something similar to this some time ago when testing various 
> > failure modes, and discussed it with Herbert.
> > 
> > IIRC, there's a larval SA which is not torn down properly by Racoon once 
> > the full SA is established, and the larval SA keeps resending until it 
> > times out.
> > 
> Ok, good to know. 
> I thought a bit more about this last night but am not
> sure of the best way to fix it. Perhaps a way to keep the larval
> SA around until all SAs resulting from xfrm_vec[xfrm_nr]
> are established... oh well, just thinking out loud... :-) 

I think the solution, if this is actually the problem, is for the userland 
code to maintain the SAs.


- James
-- 
James Morris
<[EMAIL PROTECTED]>

Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Rick Jones wrote:
This is probably not something that happens in real world deployments. 
I But it's not 60,000 concurrent connections, it's 60,000 within a 2 
minute span.


Sounds like a case of Doctor! Doctor! It hurts when I do this.


I guess. In the cases where it matters, we use LDAP over Unix Domain 
Sockets instead of TCP. Smarter clients that do connection pooling would 
help too, but the fact that this even came to our attention is because 
not all clients out there are smart enough.


Since we have an alternative that works, I'm not really worried about 
it. I just thought it was worthwhile to raise the question.


I'm not saying this is a high priority problem, I only encountered it 
in a test scenario where I was deliberately trying to max out the server.



Ideally the 2MSL parameter would be dynamically adjusted based on the
route to the destination and the weights associated with those routes.
In the simplest case, connections between machines on the same subnet
(i.e., no router hops involved) should have a much smaller default value
than connections that traverse any routers. I'd settle for a two-level
setting - with no router hops, use the small value; with any router hops
use the large value.


With transparent bridging, nobody knows how long the datagram may be out 
there.  Admittedly, the chances of a datagram living for a full two 
minutes these days is probably nil, but just being in the same IP subnet 
doesn't really mean anything when it comes to physical locality.


Bridging isn't necessarily a problem though. The 2MSL timeout is 
designed to prevent problems from delayed packets that got sent through 
multiple paths. In a bridging setup you don't allow multiple paths; 
that's what STP is designed to prevent. If you want to configure a 
network that allows multiple paths, you need to use a router, not a bridge.


SPECweb benchmarking has had to deal with the issue of attempted 
TIME_WAIT reuse going back to 1997.  It deals with it by not relying on 
the client's configured local/anonymous/ephemeral port number range and 
instead making explicit bind() calls in the (more or less) entire 
unprivileged port range (actually it may just be from 5000 to 65535, but still).


That still doesn't solve the problem; it only roughly doubles the available 
port range. That means it takes 0.6 seconds to trigger the problem 
instead of only 0.3 seconds...


Now, if it weren't necessary to fully randomize the ISNs, the chances of 
a successful transition from TIME_WAIT to ESTABLISHED might be greater, 
but going back to the good old days of more or less purely clock-driven 
ISNs isn't likely.


In an environment where connections are opened and closed very quickly 
with only a small amount of data carried per connection, it might make 
sense to remember the last sequence number used on a port and use that 
as the floor of the next randomly generated ISN. Monotonically 
increasing sequence numbers aren't a security risk if there's still a 
randomly determined gap from one connection to the next. But I don't 
think it's necessary to consider this at the moment.
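Howard's floor-plus-random-gap idea can be sketched in a few lines. This is an illustration under stated assumptions, not kernel code: the per-port "last sequence number" and the random gap are passed in by the caller (a real implementation would draw the gap from a proper RNG), the function names are hypothetical, and the gap is masked below 2^30 so that a wraparound-safe comparison, written in the style of the kernel's before()/after() helpers, still orders the new ISN after the old sequence number.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" in 32-bit sequence space. */
static int seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Hypothetical ISN choice: the last sequence number used on this port
 * acts as a floor, and a random gap of 1 .. 2^30 is added on top, so the
 * next ISN is still unpredictable but moves forward monotonically.
 */
static uint32_t next_isn(uint32_t last_seq, uint32_t random_gap)
{
	uint32_t isn = last_seq + 1u + (random_gap & 0x3fffffffu);

	assert(seq_after(isn, last_seq));	/* sanity check on the ordering */
	return isn;
}
```

The second usage case worth noting is wraparound: with last_seq = 0xfffffff0 and a gap of 0x20, the new ISN is 0x11, which is numerically smaller but still "after" in sequence space.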

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sunhttp://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

[PATCH]: Revert accept queue backlog change.

2007-03-06 Thread David Miller


Wei, I have to revert your change, it is incorrect as pointed
out by other people here on netdev.

BSD sockets basically define the 'backlog' parameter to listen()
to mean "allow backlog + 1" connections to be queued to the socket.
This allows a backlog parameter of "0" to allow 1 connection, and
there are real applications which do this.
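A user-space model of the comparison the revert restores shows why a backlog of 0 still admits one connection. The function names are made up and this is not the kernel code; the model assumes the "full" check runs before a new connection is queued, and "full" means strictly more queued connections than the backlog.

```c
#include <assert.h>
#include <limits.h>

/* Model of the restored test: full only once the queue exceeds the backlog. */
static int acceptq_is_full(unsigned int queued, unsigned int max_backlog)
{
	return queued > max_backlog;
}

/*
 * Count how many connections end up queued when each new one is admitted
 * only if the queue is not already full.
 */
static unsigned int acceptq_capacity(unsigned int max_backlog)
{
	unsigned int queued = 0;

	assert(max_backlog < UINT_MAX);	/* keep the loop finite */
	while (!acceptq_is_full(queued, max_backlog))
		queued++;
	return queued;
}
```

With this ">" test, acceptq_capacity(0) is 1 and acceptq_capacity(128) is 129, i.e. backlog + 1. With the reverted ">=" test the same loop would return 0 for a backlog of 0, which is the behaviour the revert avoids.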

diff --git a/include/net/sock.h b/include/net/sock.h
index 849c7df..2c7d60c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -426,7 +426,7 @@ static inline void sk_acceptq_added(struct sock *sk)
 
 static inline int sk_acceptq_is_full(struct sock *sk)
 {
-   return sk->sk_ack_backlog >= sk->sk_max_ack_backlog;
+   return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
 }
 
 /*
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 51ca438..6069716 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -934,7 +934,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo)
 
sched = !sock_flag(other, SOCK_DEAD) &&
!(other->sk_shutdown & RCV_SHUTDOWN) &&
-   (skb_queue_len(&other->sk_receive_queue) >=
+   (skb_queue_len(&other->sk_receive_queue) >
 other->sk_max_ack_backlog);
 
unix_state_runlock(other);
@@ -1008,7 +1008,7 @@ restart:
if (other->sk_state != TCP_LISTEN)
goto out_unlock;
 
-   if (skb_queue_len(&other->sk_receive_queue) >=
+   if (skb_queue_len(&other->sk_receive_queue) >
other->sk_max_ack_backlog) {
err = -EAGAIN;
if (!timeo)
@@ -1381,7 +1381,7 @@ restart:
}
 
if (unix_peer(other) != sk &&
-   (skb_queue_len(&other->sk_receive_queue) >=
+   (skb_queue_len(&other->sk_receive_queue) >
 other->sk_max_ack_backlog)) {
if (!timeo) {
err = -EAGAIN;

Re: [NET]: Please revert disallowing zero listen queues

2007-03-06 Thread David Miller

From: Rick Jones <[EMAIL PROTECTED]>
Date: Tue, 06 Mar 2007 10:54:00 -0800

> > So we're not "disallowing" a backlog argument of zero to
> > listen().  We'll accept that just fine, the only thing that
> > happens is that you'll get what you ask for, that being
> > no connections :-)
> 
> I'm not sure where HP-UX inherited the 0 = 1 bit - perhaps from BSD, nor 
> am I sure there is official chapter and verse, but:
> 
> 
> backlog is limited to the range of 0 to SOMAXCONN, which is defined in 
> .  SOMAXCONN is currently set to 4096.  If any other 
> value is specified, the system automatically assigns the closest value 
> within the range.  A backlog of 0 specifies only 1 pending 
> connection is allowed at any given time.
> 
> 
> I don't have a Solaris, BSD or AIX manpage for listen handy to check 
> them but would not be surprised to see they are similar.

Ok, that seals the deal for me, I'll revert the change :)

[PATCH] ixgb: Use ARRAY_SIZE macro when appropriate.

2007-03-06 Thread Auke Kok

From: Ahmed S. Darwish <[EMAIL PROTECTED]>

Signed-off-by: Ahmed S. Darwish <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/ixgb/ixgb_param.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgb/ixgb_param.c b/drivers/net/ixgb/ixgb_param.c
index b27442a..c38ce73 100644
--- a/drivers/net/ixgb/ixgb_param.c
+++ b/drivers/net/ixgb/ixgb_param.c
@@ -245,8 +245,6 @@ ixgb_validate_option(int *value, struct ixgb_option *opt)
return -1;
 }
 
-#define LIST_LEN(l) (sizeof(l) / sizeof(l[0]))
-
 /**
  * ixgb_check_options - Range Checking for Command Line Parameters
  * @adapter: board private structure
@@ -335,7 +333,7 @@ ixgb_check_options(struct ixgb_adapter *adapter)
.name = "Flow Control",
.err  = "reading default settings from EEPROM",
.def  = ixgb_fc_tx_pause,
-   .arg  = { .l = { .nr = LIST_LEN(fc_list),
+   .arg  = { .l = { .nr = ARRAY_SIZE(fc_list),
 .p = fc_list }}
};
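Both the removed LIST_LEN() and the kernel's ARRAY_SIZE() divide the size of the whole array by the size of one element; the patch just drops the driver's private copy. A user-space illustration, with the macro and table renamed so they are clearly hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer), as LIST_LEN() computed it. */
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the driver's fc_list table. */
static const int fc_list_sketch[] = { 0, 1, 2, 3 };

/* Checked at compile time (C11 static_assert from <assert.h>). */
static_assert(SKETCH_ARRAY_SIZE(fc_list_sketch) == 4, "four entries");

static size_t fc_list_len(void)
{
	return SKETCH_ARRAY_SIZE(fc_list_sketch);
}
```

Passing a pointer instead of an array still compiles and silently yields sizeof(pointer) / sizeof(element), which is the classic misuse of this kind of macro.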
 

Re: [RFC] ARP notify option

2007-03-06 Thread Chris Wright

* Stephen Hemminger ([EMAIL PROTECTED]) wrote:
> This adds another inet device option to enable gratuitous ARP
> when the device is brought up or its address changes. This is handy for
> clusters or virtualization.

This looks good.  I'll test with Xen.  What about the source
addr selection?

thanks,
-chris

Re: [RFC] ARP notify option

2007-03-06 Thread Jeremy Fitzhardinge

Stephen Hemminger wrote:
> This adds another inet device option to enable gratuitous ARP
> when the device is brought up or its address changes. This is handy for
> clusters or virtualization.
>   

Thanks Stephen.  Haven't tested this yet, but it definitely cleans up a
warty corner of netfront.

J

Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED

2007-03-06 Thread Greg KH

On Tue, Mar 06, 2007 at 12:10:09AM -0600, Matt Mackall wrote:
> On Mon, Mar 05, 2007 at 08:03:50PM -0800, Greg KH wrote:
> > On Mon, Mar 05, 2007 at 09:39:47PM -0600, Matt Mackall wrote:
> > > On Mon, Mar 05, 2007 at 06:48:50PM -0800, Greg KH wrote:
> > > > If so, can you disable the option and strace it to see what program is
> > > > trying to access what?  That will put the
> > > > HAL/NetworkManager/libsysfs/distro script finger pointing to rest pretty
> > > > quickly :)
> > > 
> > > Ok, I've got straces of both good and bad (>5M each). Filtered out
> > > random pointer values and the like, diffed, and filtered for /sys/,
> > > and the result's still 1.5M. What should I be looking for?
> > 
> > Failures when trying to read from /sys/class/net/
> > 
> > Or opening the directory and iterating over the subdirs in there.  Or
> > something like that.
> > 
> > But the /sys/class/net/ stuff should hopefully help narrow it down.
> 
> Works:
> 
> 6857  open("/sys/class/net", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
> 6857  fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> 6857  fcntl64(13, F_SETFD, FD_CLOEXEC)  = 0
> 6857  getdents64(13, /* 5 entries */, 4096) = 120
> 6857  readlink("/sys/class/net/eth1", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857  readlink("/sys/class/net/eth1/device", "../../../devices/pci0000:00/0000:00:1e.0/0000:02:02.0", 256) = 53
> 6857  readlink("/sys/class/net/lo", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857  readlink("/sys/class/net/lo/device", 0x80a2450, 256) = -1 ENOENT (No such file or directory)
> 6857  readlink("/sys/class/net/eth0", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857  readlink("/sys/class/net/eth0/device", "../../../devices/pci0000:00/0000:00:1e.0/0000:02:01.0", 256) = 53
> 6857  getdents64(13, /* 0 entries */, 4096) = 0
> 6857  close(13) = 0
> 
> Breaks:
> 
> 3620  open("/sys/class/net", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
> 3620  fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> 3620  fcntl64(13, F_SETFD, FD_CLOEXEC)  = 0
> 3620  getdents64(13, /* 5 entries */, 4096) = 120
> 3620  readlink("/sys/class/net/eth1", "../../devices/pci0000:00/0000:00:1e.0/0000:02:02.0/eth1", 256) = 55
> 3620  readlink("/sys/devices/pci0000:00/0000:00:1e.0/0000:02:02.0/eth1/device", 0x809e910, 256) = -1 ENOENT (No such file or directory)
> 3620  readlink("/sys/class/net/lo", "../../devices/virtual/net/lo", 256) = 28
> 3620  readlink("/sys/devices/virtual/net/lo/device", 0x809e960, 256) = -1 ENOENT (No such file or directory)
> 3620  readlink("/sys/class/net/eth0", "../../devices/pci0000:00/0000:00:1e.0/0000:02:01.0/eth0", 256) = 55
> 3620  readlink("/sys/devices/pci0000:00/0000:00:1e.0/0000:02:01.0/eth0/device", 0x809e960, 256) = -1 ENOENT (No such file or directory)
> 3620  getdents64(13, /* 0 entries */, 4096) = 0
> 3620  close(13) = 0

Ah, that should be simple to fix in the kernel, give me an hour or so...

thanks,

greg k-h

Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Robert Olsson

Eric Dumazet writes:

 > With 2^20 entries, your actual limit of 2^19 entries in root node will 
 > probably show us quite different numbers for order-1,2,3,4... tnodes

 Yep, the trie will get deeper, and lookups will get more costly, as will
 inserts and deletes. The 2^19 limit was there because I was hitting a memory
 allocation problem that I never sorted out.

 > Yes, numbers you gave us basically showed a big root node, and mainly leaves 
 > and very few tnodes.
 > 
 > I was interested to see the distribution in case the root-node limit is hit, 
 > and we load into the table a *lot* of entries.

 Maxlength etc... Well, maybe the root restriction should be removed, and we
 should just have a maxsize instead.

 Cheers
--ro

Re: [NET]: Please revert disallowing zero listen queues

2007-03-06 Thread Rick Jones


So we're not "disallowing" a backlog argument of zero to
listen().  We'll accept that just fine, the only thing that
happens is that you'll get what you ask for, that being
no connections :-)


I'm not sure where HP-UX inherited the 0 = 1 bit - perhaps from BSD, nor 
am I sure there is official chapter and verse, but:



backlog is limited to the range of 0 to SOMAXCONN, which is defined in 
.  SOMAXCONN is currently set to 4096.  If any other 
value is specified, the system automatically assigns the closest value 
within the range.  A backlog of 0 specifies only 1 pending 
connection is allowed at any given time.



I don't have a Solaris, BSD or AIX manpage for listen handy to check 
them but would not be surprised to see they are similar.
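Taken literally, the quoted HP-UX text describes a clamp into [0, SOMAXCONN] followed by a special case that turns 0 into 1. The sketch below encodes only that text (SOMAXCONN = 4096 as quoted, under a hypothetical macro name). It makes no claim about how HP-UX or any BSD treats nonzero backlogs internally.

```c
#include <assert.h>

#define SKETCH_SOMAXCONN 4096	/* value quoted from the HP-UX man page */

/* Pending-connection limit as the quoted man page text describes it. */
static int hpux_effective_backlog(int backlog)
{
	int allowed;

	/* Out-of-range values are replaced by the closest in-range value. */
	if (backlog < 0)
		backlog = 0;
	else if (backlog > SKETCH_SOMAXCONN)
		backlog = SKETCH_SOMAXCONN;

	/* "A backlog of 0 specifies only 1 pending connection." */
	allowed = (backlog == 0) ? 1 : backlog;

	assert(allowed >= 1 && allowed <= SKETCH_SOMAXCONN);
	return allowed;
}
```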


rick jones

[PATCH 2/2] pcnet32: change to use netdev_priv

2007-03-06 Thread Don Fry

use netdev_priv() instead of dev->priv
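Background for this conversion: the driver's private data lives in the same allocation as the net_device, at a fixed aligned offset, so it can be computed from dev itself rather than loaded through a separate dev->priv pointer. A simplified user-space model of that layout; all names and the 32-byte alignment are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_ALIGN 32	/* hypothetical alignment for the private area */

struct netdev_sketch {
	char name[16];	/* stands in for the generic device fields */
};

/* Offset of the private area: generic struct size rounded up to SKETCH_ALIGN. */
static size_t priv_offset(void)
{
	return (sizeof(struct netdev_sketch) + SKETCH_ALIGN - 1) &
	       ~(size_t)(SKETCH_ALIGN - 1);
}

/* One zeroed allocation holds the generic struct followed by the private data. */
static struct netdev_sketch *alloc_netdev_sketch(size_t priv_size)
{
	return calloc(1, priv_offset() + priv_size);
}

/* Counterpart of netdev_priv(): derived from dev, no stored pointer needed. */
static void *netdev_priv_sketch(struct netdev_sketch *dev)
{
	assert(dev != NULL);
	return (char *)dev + priv_offset();
}
```

The offset is aligned relative to the start of the allocation. Because the private pointer is pure arithmetic on dev, it cannot go stale or be left unset the way a separately maintained pointer field can.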

Signed-off-by: Thomas Bogendoerfer <[EMAIL PROTECTED]>
Signed-off-by: Don Fry <[EMAIL PROTECTED]>
---
--- linux-2.6.21-rc2/drivers/net/one.pcnet32.c  2007-03-06 10:48:37.0 -0800
+++ linux-2.6.21-rc2/drivers/net/pcnet32.c  2007-03-05 18:03:32.0 -0800
@@ -653,7 +653,7 @@ static void pcnet32_realloc_rx_ring(stru
 
 static void pcnet32_purge_rx_ring(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
int i;
 
/* free all allocated skbuffs */
@@ -681,7 +681,7 @@ static void pcnet32_poll_controller(stru
 
 static int pcnet32_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r = -EOPNOTSUPP;
 
@@ -696,7 +696,7 @@ static int pcnet32_get_settings(struct n
 
 static int pcnet32_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r = -EOPNOTSUPP;
 
@@ -711,7 +711,7 @@ static int pcnet32_set_settings(struct n
 static void pcnet32_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
 
strcpy(info->driver, DRV_NAME);
strcpy(info->version, DRV_VERSION);
@@ -723,7 +723,7 @@ static void pcnet32_get_drvinfo(struct n
 
 static u32 pcnet32_get_link(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r;
 
@@ -743,19 +743,19 @@ static u32 pcnet32_get_link(struct net_d
 
 static u32 pcnet32_get_msglevel(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
return lp->msg_enable;
 }
 
 static void pcnet32_set_msglevel(struct net_device *dev, u32 value)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
lp->msg_enable = value;
 }
 
 static int pcnet32_nway_reset(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r = -EOPNOTSUPP;
 
@@ -770,7 +770,7 @@ static int pcnet32_nway_reset(struct net
 static void pcnet32_get_ringparam(struct net_device *dev,
  struct ethtool_ringparam *ering)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
 
ering->tx_max_pending = TX_MAX_RING_SIZE;
ering->tx_pending = lp->tx_ring_size;
@@ -781,7 +781,7 @@ static void pcnet32_get_ringparam(struct
 static int pcnet32_set_ringparam(struct net_device *dev,
 struct ethtool_ringparam *ering)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
unsigned int size;
ulong ioaddr = dev->base_addr;
@@ -847,7 +847,7 @@ static int pcnet32_self_test_count(struc
 static void pcnet32_ethtool_test(struct net_device *dev,
 struct ethtool_test *test, u64 * data)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
int rc;
 
if (test->flags == ETH_TEST_FL_OFFLINE) {
@@ -868,7 +868,7 @@ static void pcnet32_ethtool_test(struct 
 
 static int pcnet32_loopback_test(struct net_device *dev, uint64_t * data1)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
struct pcnet32_access *a = &lp->a;  /* access to registers */
ulong ioaddr = dev->base_addr;  /* card base I/O address */
struct sk_buff *skb;/* sk buff */
@@ -1047,7 +1047,7 @@ static int pcnet32_loopback_test(struct 
 
 static void pcnet32_led_blink_callback(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
struct pcnet32_access *a = &lp->a;
ulong ioaddr = dev->base_addr;
unsigned long flags;
@@ -1064,7 +1064,7 @@ static void pcnet32_led_blink_callback(s
 
 static int pcnet32_phys_id(struct net_device *dev, u32 data)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
struct pcnet32_access *a = &lp->a;
ulong ioaddr = dev->base_addr;
unsigned long flags;
@@ -1109,7 +1109,7 @@ static int pcnet32_suspend(struct net_de
int can_sleep)
 {
int csr5;
-   struct pcnet32_private *lp = dev->priv;
+   stru

Re: [RFC] div64_64 support

2007-03-06 Thread H. Peter Anvin


Andi Kleen wrote:


Let me see... You throw code like that and expect someone to actually 
understand it in one year, and be able to correct a bug ?


To be honest I don't expect any bugs in this function.




Please add something, an URL or even better a nice explanation, per favor...


It's straight out of Hacker's Delight, which is referenced in the commit
log.


Referencing it in a comment would have been a better idea.
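As an illustration of hpa's point (stating an algorithm's provenance next to the code), here is a 64-by-64-bit division whose comment says what technique it uses. It is deliberately a simpler and slower technique than the patch's: plain restoring shift-and-subtract long division, one quotient bit per step. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 64/64 unsigned division by restoring (shift-and-subtract) long division:
 * bring down one dividend bit per step and subtract the divisor whenever
 * it fits. The carry test handles divisors above 2^63, where the shifted
 * remainder would otherwise overflow 64 bits.
 */
static uint64_t div64_64_sketch(uint64_t dividend, uint64_t divisor)
{
	uint64_t quotient = 0, remainder = 0;
	int bit;

	assert(divisor != 0);
	for (bit = 63; bit >= 0; bit--) {
		uint64_t carry = remainder >> 63;

		remainder = (remainder << 1) | ((dividend >> bit) & 1);
		if (carry || remainder >= divisor) {
			remainder -= divisor;	/* exact even when carry was set */
			quotient |= (uint64_t)1 << bit;
		}
	}
	return quotient;
}
```

This costs 64 iterations per call regardless of operand size, which is why a normalized routine like the one in the patch is preferable in practice; the sketch only shows how short a provenance comment can be.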

-hpa

[RFC] ARP notify option

2007-03-06 Thread Stephen Hemminger

This adds another inet device option to enable gratuitous ARP
when the device is brought up or its address changes. This is handy for
clusters or virtualization.
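For readers unfamiliar with the packet involved: the arp_send() call in the patch passes the interface's own address as both the sender and the target IP. A user-space sketch of the resulting 28-byte Ethernet/IPv4 ARP payload, using the field layout from RFC 826; the function name is hypothetical, the Ethernet header is not built, and the target hardware address is simply left zero in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARP_PAYLOAD_LEN 28

/* Fill an ARP request that announces our own IP: sender IP == target IP. */
static void build_gratuitous_arp(uint8_t out[ARP_PAYLOAD_LEN],
				 const uint8_t mac[6], const uint8_t ip[4])
{
	assert(mac != NULL && ip != NULL);
	memset(out, 0, ARP_PAYLOAD_LEN);
	out[0] = 0x00; out[1] = 0x01;	/* hardware type: Ethernet */
	out[2] = 0x08; out[3] = 0x00;	/* protocol type: IPv4 */
	out[4] = 6;			/* hardware address length */
	out[5] = 4;			/* protocol address length */
	out[6] = 0x00; out[7] = 0x01;	/* opcode: request */
	memcpy(out + 8, mac, 6);	/* sender hardware address */
	memcpy(out + 14, ip, 4);	/* sender protocol address */
	/* bytes 18..23: target hardware address, left zero */
	memcpy(out + 24, ip, 4);	/* target protocol address == sender */
}
```

Because the target IP is the sender's own, no host should answer it; listeners that already cache the address just refresh their entry with the sender MAC, which is what makes this useful after a failover or VM migration.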

Tested on a normal device (not Xen).

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 Documentation/networking/ip-sysctl.txt |6 ++
 include/linux/inetdevice.h |2 ++
 include/linux/sysctl.h |1 +
 net/ipv4/devinet.c |   16 
 4 files changed, 25 insertions(+)

--- net-2.6.22.orig/Documentation/networking/ip-sysctl.txt  2007-03-05 14:35:31.0 -0800
+++ net-2.6.22/Documentation/networking/ip-sysctl.txt   2007-03-05 16:46:47.0 -0800
@@ -732,6 +732,12 @@
The max value from conf/{all,interface}/arp_ignore is used
when ARP request is received on the {interface}
 
+arp_notify - BOOLEAN
+   Define mode for notification of address and device changes.
+   0 - (default): do nothing
+   1 - Generate gratuitous arp replies when device is brought up
+   or hardware address changes.
+
 arp_accept - BOOLEAN
Define behavior when gratuitous arp replies are received:
0 - drop gratuitous arp frames
--- net-2.6.22.orig/include/linux/inetdevice.h  2007-03-05 14:35:34.0 -0800
+++ net-2.6.22/include/linux/inetdevice.h   2007-03-05 16:46:47.0 -0800
@@ -26,6 +26,7 @@
int arp_announce;
int arp_ignore;
int arp_accept;
+   int arp_notify;
int medium_id;
int no_xfrm;
int no_policy;
@@ -84,6 +85,7 @@
 #define IN_DEV_ARPFILTER(in_dev)   (ipv4_devconf.arp_filter || (in_dev)->cnf.arp_filter)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) (max(ipv4_devconf.arp_announce, (in_dev)->cnf.arp_announce))
 #define IN_DEV_ARP_IGNORE(in_dev)  (max(ipv4_devconf.arp_ignore, (in_dev)->cnf.arp_ignore))
+#define IN_DEV_ARP_NOTIFY(in_dev)  (ipv4_devconf.arp_notify || (in_dev)->cnf.arp_notify)
 
 struct in_ifaddr
 {
--- net-2.6.22.orig/include/linux/sysctl.h  2007-03-05 14:35:34.0 -0800
+++ net-2.6.22/include/linux/sysctl.h   2007-03-05 16:46:47.0 -0800
@@ -495,6 +495,7 @@
NET_IPV4_CONF_ARP_IGNORE=19,
NET_IPV4_CONF_PROMOTE_SECONDARIES=20,
NET_IPV4_CONF_ARP_ACCEPT=21,
+   NET_IPV4_CONF_ARP_NOTIFY=22,
__NET_IPV4_CONF_MAX
 };
 
--- net-2.6.22.orig/net/ipv4/devinet.c  2007-03-05 14:35:34.0 -0800
+++ net-2.6.22/net/ipv4/devinet.c   2007-03-05 16:46:47.0 -0800
@@ -1089,6 +1089,14 @@
}
}
ip_mc_up(in_dev);
+   /* fall through */
+   case NETDEV_CHANGEADDR:
+   if (IN_DEV_ARP_NOTIFY(in_dev))
+   arp_send(ARPOP_REQUEST, ETH_P_ARP,
+in_dev->ifa_list->ifa_address,
+dev,
+in_dev->ifa_list->ifa_address,
+NULL, dev->dev_addr, NULL);
break;
case NETDEV_DOWN:
ip_mc_down(in_dev);
@@ -1495,6 +1503,14 @@
.proc_handler   = &proc_dointvec,
},
{
+   .ctl_name   = NET_IPV4_CONF_ARP_NOTIFY,
+   .procname   = "arp_notify",
+   .data   = &ipv4_devconf.arp_notify,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = &proc_dointvec,
+   },
+   {
.ctl_name   = NET_IPV4_CONF_NOXFRM,
.procname   = "disable_xfrm",
.data   = &ipv4_devconf.no_xfrm,

Re: [RFC] div64_64 support II

2007-03-06 Thread H. Peter Anvin


Andi Kleen wrote:

The problem with these algorithms that trade off one or more
multiplies in order to avoid a divide is that they don't
gain anything and often lose when both multiplies and
divides are emulated in software.


Actually on rereading this: is there really any Linux port
that emulates multiplies in software? I thought that was only
done on really small microcontrollers or smart cards; but anything
32bit+ that runs Linux should have hardware multiply, shouldn't it?


SPARC < v8 does multiplies using an MSTEP instruction.
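Andi's cost argument is easier to see with a concrete emulated multiply: without a full hardware multiplier, each product costs a loop of up to 32 add-and-shift steps, so trading one divide for several multiplies may not pay off. A sketch of shift-and-add multiplication, in the same spirit as a sequence of multiply-step instructions; it is an illustration only, not the actual SPARC library routine, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 32x32 -> 64-bit unsigned multiply by shift-and-add:
 * one multiplier bit is consumed per iteration.
 */
static uint64_t mul32_shift_add(uint32_t a, uint32_t b)
{
	uint64_t acc = 0, addend = a;
	int steps = 0;

	while (b != 0) {
		if (b & 1)
			acc += addend;	/* add the shifted multiplicand */
		addend <<= 1;
		b >>= 1;
		steps++;
	}
	assert(steps <= 32);	/* at most one step per multiplier bit */
	return acc;
}
```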

-hpa

Re: TCP 2MSL on loopback

2007-03-06 Thread Rick Jones

This is probably not something that happens in real world deployments. I 
But it's not 60,000 concurrent connections, it's 60,000 within a 2 
minute span.


Sounds like a case of Doctor! Doctor! It hurts when I do this.



I'm not saying this is a high priority problem, I only encountered it in 
a test scenario where I was deliberately trying to max out the server.



Ideally the 2MSL parameter would be dynamically adjusted based on the
route to the destination and the weights associated with those routes.
In the simplest case, connections between machines on the same subnet
(i.e., no router hops involved) should have a much smaller default value
than connections that traverse any routers. I'd settle for a two-level
setting - with no router hops, use the small value; with any router hops
use the large value.


With transparent bridging, nobody knows how long the datagram may be out 
there.  Admittedly, the chances of a datagram living for a full two 
minutes these days is probably nil, but just being in the same IP subnet 
doesn't really mean anything when it comes to physical locality.


It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range - 
on my system the default port range is 32768-61000. That means if I use 
up 28232 ports in less than 2MSL then everything stops. netstat will 
show that all the available port numbers are in TIME_WAIT state. And 
this is particularly bad because while waiting for the timeout, I can't 
initiate any new outbound connections of any kind at all - telnet, ssh, 
whatever, you have to wait for at least one port to free up. 
(Interesting denial of service there)
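The arithmetic behind these figures, as a sketch under stated assumptions: the port range is counted inclusively (which gives 28233 rather than the 28232 quoted above), TIME_WAIT lasts the 2 minutes discussed in this thread, and the connection rate is a free parameter. The roughly 94,000 connections/s used below is simply the rate implied by the "0.3 seconds" estimate elsewhere in the thread, not a measured number. Function names are hypothetical.

```c
#include <assert.h>

/* Number of ports in an inclusive range [low, high]. */
static unsigned int port_count(unsigned int low, unsigned int high)
{
	assert(low <= high);
	return high - low + 1;
}

/* Seconds until every port is in TIME_WAIT at a constant connect rate. */
static double seconds_to_exhaust(unsigned int ports, double conns_per_sec)
{
	return ports / conns_per_sec;
}

/* Highest connect rate sustainable when each port is held for tw_seconds. */
static double max_sustained_rate(unsigned int ports, double tw_seconds)
{
	return ports / tw_seconds;
}
```

With 120 s per port, the default 32768-61000 range sustains only about 235 new outbound connections per second. At roughly 94,000 per second it is exhausted in about 0.3 s, and the 5000-65535 range (60536 ports) lasts about 0.64 s, matching the "0.6 seconds instead of 0.3 seconds" estimate.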


SPECweb benchmarking has had to deal with the issue of attempted 
TIME_WAIT reuse going back to 1997.  It deals with it by not relying on 
the client's configured local/anonymous/ephemeral port number range and 
instead making explicit bind() calls in the (more or less) entire 
unprivileged port range (actually it may just be from 5000 to 65535, but still).


Now, if it weren't necessary to fully randomize the ISNs, the chances of 
a successful transition from TIME_WAIT to ESTABLISHED might be greater, 
but going back to the good old days of more or less purely clock-driven 
ISNs isn't likely.


rick jones

Re: [NET]: Please revert disallowing zero listen queues

2007-03-06 Thread David Miller

From: David Miller <[EMAIL PROTECTED]>
Date: Tue, 06 Mar 2007 10:37:06 -0800 (PST)

> Everything I've ever seen clearly states that a backlog of
> zero means that zero connections are allowed.
> 
> So we're not "disallowing" a backlog argument of zero to
> listen().  We'll accept that just fine, the only thing that
> happens is that you'll get what you ask for, that being
> no connections :-)

I'm not ruling out that a backlog of zero might mean "allow one",
in which case we do need to revert the change.  Rather, I'm
trying to clarify what the real issue is here, as Gerrit's
email implied that listen() with a zero backlog now returns
an error, which is not true.

Re: wireless extensions vs. 64-bit architectures

2007-03-06 Thread Michael Buesch

On Tuesday 06 March 2007 18:13, Jean Tourrilhes wrote:
> On Tue, Mar 06, 2007 at 02:27:26AM +0100, Johannes Berg wrote:
> > Hi,
> > 
> > Wtf! After struggling with some strange problems with zd1211rw (see some
> > other mail) I decided to think again about what could possibly cause all
> > the other problems I'm having with it. The kernel seems fine, but iw*
> > userspace continually segfaults! And it also seems to be not
> > reproducible for most other people; I'd asked on IRC a while ago.
> > 
> > Well. Some thinking and stracing and thinking later it occurred to me...
> > Hell! wext is ioctls and includes this gem:
> > 
> > struct  iw_point
> > {
> >   void __user   *pointer;   /* Pointer to the data  (in user space) */
> >   __u16 length; /* number of fields or size in bytes */
> >   __u16 flags;  /* Optional params */
> > };
> > 
> > Of course nobody ever tells you this, but it's used in a shitload of
> > places.
> 
>   Yep, and it's even in fs/compat_ioctl.c. Hint, hint ;-)

Ok, it is wrapping the following ioctls:

HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl)

What about SIOCSIWSCAN, SIOCSIWENCODEEXT, SIOCGIWENCODEEXT
and some others that also use iw_point?

-- 
Greetings Michael.

Re: linux 2.6 Ipv4 routing enhancement (fwd)

2007-03-06 Thread Robert Olsson


Richard Kojedzinszky writes:

 > Sorry for sending the tgz with .svn included. And I did not send 
 > instructions.
 > To do a test with fib_trie, issue
 > $ make clean all ROUTE_ALG=TRIE && ./try a
 > with fib_radix:
 > $ make clean all ROUTE_ALG=RADIX && ./try a
 > with fib_lef:
 > $ make clean all ROUTE_ALG=LEF SBBITS=4 && ./try a
 Thanks. First, I usually do my testing in kernel context, in the 
 forwarding path with full semantic match, so it's not that easy to compare.  
 But I'll take a look. BTW, does your test check that you do correct prefix matching?
 
 FYI, some old fib work is on robur.slu.se:

 # Look with just hlist
  /pub/Linux/net-development/fib_hlist/

 # 24 bit hash lookup
  /pub/Linux/net-development/fib_hash2/
 
 And some hlist/hash2/trie comparisons in:
 /pub/Linux/tmp/trie-talk-kth.pdf
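One way to check "correct prefix matching" in a user-space benchmark like this is to compare every lookup against a trivially correct reference. Below is a minimal linear-scan longest-prefix match, assuming IPv4 addresses in host byte order. The struct and function names are hypothetical, and it ignores everything else a full semantic match in the kernel's forwarding path involves (TOS, scope, multiple tables, and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct route_sketch {
	uint32_t     prefix;	/* host byte order */
	unsigned int len;	/* prefix length, 0..32 */
	int          nexthop;	/* hypothetical next-hop id */
};

/* Netmask for a prefix length; len == 0 is special-cased to avoid a 32-bit shift. */
static uint32_t prefix_mask(unsigned int len)
{
	return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Reference longest-prefix match by linear scan; returns -1 if nothing matches. */
static int lpm_lookup(const struct route_sketch *tbl, size_t n, uint32_t addr)
{
	int best = -1, best_len = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t mask = prefix_mask(tbl[i].len);

		assert(tbl[i].len <= 32);
		if ((addr & mask) == (tbl[i].prefix & mask) &&
		    (int)tbl[i].len > best_len) {
			best_len = (int)tbl[i].len;
			best = tbl[i].nexthop;
		}
	}
	return best;
}
```

Running the same random addresses through this and through the TRIE/RADIX/LEF builds, and asserting that the answers agree, catches matching bugs before comparing speed.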

 Cheers
--ro
