date:20060223

Re: [git patches] net driver fixes

2006-02-23 Thread Wolfgang Hoffmann

On Friday 24 February 2006 06:22, Jeff Garzik wrote:
> Please pull from 'upstream-fixes' branch of
> master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
>
> [...]
> Stephen Hemminger:
>   sky2: yukon-ec-u chipset initialization
>   sky2: limit coalescing values to ring size
>   sky2: poke coalescing timer to fix hang
>   sky2: force early transmit status
>   sky2: use device iomem to access PCI config
>   sky2: close race on IRQ mask update.
>[...]

Thanks for the update.

I'm still seeing reproducible hangs with this version of sky2 (as reported in
bugzilla 6084 and discussed on netdev).

Stephen, if there is anything I can do to narrow down my hangs a bit more 
systematically, please let me know, I'd be happy to help.

Wolfgang
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[git patches] net driver fixes

2006-02-23 Thread Jeff Garzik


Please pull from 'upstream-fixes' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

to receive the following updates:

 drivers/net/r8169.c |  189 
 drivers/net/skge.c  |   75 
 drivers/net/skge.h  |    1 
 drivers/net/sky2.c  |  173 ---
 drivers/net/sky2.h  |   85 ---
 drivers/net/tlan.c  |    2 
 6 files changed, 371 insertions(+), 154 deletions(-)

Adrian Bunk:
  drivers/net/tlan.c: #ifdef CONFIG_PCI the PCI specific code

Francois Romieu:
  r8169: fix broken ring index handling in suspend/resume
  r8169: enable wake on lan

Stephen Hemminger:
  sky2: yukon-ec-u chipset initialization
  sky2: limit coalescing values to ring size
  sky2: poke coalescing timer to fix hang
  sky2: force early transmit status
  sky2: use device iomem to access PCI config
  sky2: close race on IRQ mask update.
  skge: NAPI/irq race fix
  skge: genesis phy initialzation
  skge: protect interrupt mask

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 6e10184..8cc0d0b 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -287,6 +287,20 @@ enum RTL8169_register_content {
 	TxInterFrameGapShift = 24,
 	TxDMAShift = 8, /* DMA burst value (0-7) is shift this many bits */
 
+	/* Config1 register p.24 */
+	PMEnable	= (1 << 0),	/* Power Management Enable */
+
+	/* Config3 register p.25 */
+	MagicPacket	= (1 << 5),	/* Wake up when receives a Magic Packet */
+	LinkUp		= (1 << 4),	/* Wake up when the cable connection is re-established */
+
+	/* Config5 register p.27 */
+	BWF		= (1 << 6),	/* Accept Broadcast wakeup frame */
+	MWF		= (1 << 5),	/* Accept Multicast wakeup frame */
+	UWF		= (1 << 4),	/* Accept Unicast wakeup frame */
+	LanWake		= (1 << 1),	/* LanWake enable/disable */
+	PMEStatus	= (1 << 0),	/* PME status can be reset by PCI RST# */
+
 	/* TBICSR p.28 */
 	TBIReset	= 0x8000,
 	TBILoopback	= 0x4000,
@@ -433,6 +447,7 @@ struct rtl8169_private {
 	unsigned int (*phy_reset_pending)(void __iomem *);
 	unsigned int (*link_ok)(void __iomem *);
 	struct work_struct task;
+	unsigned wol_enabled : 1;
 };
 
 MODULE_AUTHOR("Realtek and the Linux r8169 crew ");
@@ -607,6 +622,80 @@ static void rtl8169_link_option(int idx,
 	*duplex = p->duplex;
 }
 
+static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	void __iomem *ioaddr = tp->mmio_addr;
+	u8 options;
+
+	wol->wolopts = 0;
+
+#define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)
+	wol->supported = WAKE_ANY;
+
+	spin_lock_irq(&tp->lock);
+
+	options = RTL_R8(Config1);
+	if (!(options & PMEnable))
+		goto out_unlock;
+
+	options = RTL_R8(Config3);
+	if (options & LinkUp)
+		wol->wolopts |= WAKE_PHY;
+	if (options & MagicPacket)
+		wol->wolopts |= WAKE_MAGIC;
+
+	options = RTL_R8(Config5);
+	if (options & UWF)
+		wol->wolopts |= WAKE_UCAST;
+	if (options & BWF)
+		wol->wolopts |= WAKE_BCAST;
+	if (options & MWF)
+		wol->wolopts |= WAKE_MCAST;
+
+out_unlock:
+	spin_unlock_irq(&tp->lock);
+}
+
+static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	void __iomem *ioaddr = tp->mmio_addr;
+	int i;
+	static struct {
+		u32 opt;
+		u16 reg;
+		u8  mask;
+	} cfg[] = {
+		{ WAKE_ANY,   Config1, PMEnable },
+		{ WAKE_PHY,   Config3, LinkUp },
+		{ WAKE_MAGIC, Config3, MagicPacket },
+		{ WAKE_UCAST, Config5, UWF },
+		{ WAKE_BCAST, Config5, BWF },
+		{ WAKE_MCAST, Config5, MWF },
+		{ WAKE_ANY,   Config5, LanWake }
+	};
+
+	spin_lock_irq(&tp->lock);
+
+	RTL_W8(Cfg9346, Cfg9346_Unlock);
+
+	for (i = 0; i < ARRAY_SIZE(cfg); i++) {
+		u8 options = RTL_R8(cfg[i].reg) & ~cfg[i].mask;
+		if (wol->wolopts & cfg[i].opt)
+			options |= cfg[i].mask;
+		RTL_W8(cfg[i].reg, options);
+	}
+
+	RTL_W8(Cfg9346, Cfg9346_Lock);
+
+	tp->wol_enabled = (wol->wolopts) ? 1 : 0;
+
+	spin_unlock_irq(&tp->lock);
+
+	return 0;
+}
+
 static void rtl8169_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
 {
@@ -1025,6 +1114,8 @@ static struct ethtool_ops rtl8169_ethtoo
.get_tso
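
The register handling in rtl8169_get_wol()/rtl8169_set_wol() above is a table-driven read-modify-write of Config1/3/5. As a rough userspace sketch (not driver code), the same logic can be exercised against a plain array standing in for the chip registers. The REG_* indices and the apply_wol()/read_wol() names are hypothetical; the WAKE_* values are copied from <linux/ethtool.h> and the bit masks from the patch. The Cfg9346 unlock/lock and the spinlock are left out because there is no hardware here.

```c
#include <stddef.h>
#include <stdint.h>

/* Wake-on-LAN option bits; same values as in <linux/ethtool.h>. */
#define WAKE_PHY   (1u << 0)
#define WAKE_UCAST (1u << 1)
#define WAKE_MCAST (1u << 2)
#define WAKE_BCAST (1u << 3)
#define WAKE_MAGIC (1u << 5)
#define WAKE_ANY   (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)

/* Hypothetical indices into a simulated register file (not chip offsets). */
enum { REG_CONFIG1, REG_CONFIG3, REG_CONFIG5, NREGS };

/* Bit masks as defined in the patch. */
enum {
	PMEnable    = 1 << 0,	/* Config1 */
	LinkUp      = 1 << 4,	/* Config3 */
	MagicPacket = 1 << 5,	/* Config3 */
	UWF         = 1 << 4,	/* Config5 */
	MWF         = 1 << 5,	/* Config5 */
	BWF         = 1 << 6,	/* Config5 */
	LanWake     = 1 << 1,	/* Config5 */
};

/* Same option -> (register, bit) table as rtl8169_set_wol(). */
static const struct {
	uint32_t opt;
	int reg;
	uint8_t mask;
} wol_cfg[] = {
	{ WAKE_ANY,   REG_CONFIG1, PMEnable },
	{ WAKE_PHY,   REG_CONFIG3, LinkUp },
	{ WAKE_MAGIC, REG_CONFIG3, MagicPacket },
	{ WAKE_UCAST, REG_CONFIG5, UWF },
	{ WAKE_BCAST, REG_CONFIG5, BWF },
	{ WAKE_MCAST, REG_CONFIG5, MWF },
	{ WAKE_ANY,   REG_CONFIG5, LanWake },
};

static void apply_wol(uint8_t regs[NREGS], uint32_t wolopts)
{
	size_t i;

	for (i = 0; i < sizeof(wol_cfg) / sizeof(wol_cfg[0]); i++) {
		/* Clear the bit, then set it again if a requested option maps to it. */
		uint8_t v = (uint8_t)(regs[wol_cfg[i].reg] & ~wol_cfg[i].mask);

		if (wolopts & wol_cfg[i].opt)
			v |= wol_cfg[i].mask;
		regs[wol_cfg[i].reg] = v;
	}
}

/* Inverse mapping, mirroring rtl8169_get_wol(). */
static uint32_t read_wol(const uint8_t regs[NREGS])
{
	uint32_t wolopts = 0;

	if (!(regs[REG_CONFIG1] & PMEnable))
		return 0;
	if (regs[REG_CONFIG3] & LinkUp)
		wolopts |= WAKE_PHY;
	if (regs[REG_CONFIG3] & MagicPacket)
		wolopts |= WAKE_MAGIC;
	if (regs[REG_CONFIG5] & UWF)
		wolopts |= WAKE_UCAST;
	if (regs[REG_CONFIG5] & BWF)
		wolopts |= WAKE_BCAST;
	if (regs[REG_CONFIG5] & MWF)
		wolopts |= WAKE_MCAST;
	return wolopts;
}
```

With zeroed registers, apply_wol(regs, WAKE_MAGIC) leaves PMEnable in Config1, MagicPacket in Config3 and LanWake in Config5, and read_wol() then reports WAKE_MAGIC. Bits that are not listed in the table survive the read-modify-write unchanged.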

[Patch 1/1] updated: TCP/UDP getpeersec

2006-02-23 Thread Catherine Zhang

Hi,

Updated as per Herbert's comment.

Catherine

---

From: [EMAIL PROTECTED]

This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using.  The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection.  In the case of UDP, the
security context is for each individual packet.  An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.

Patch design approach:

- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association.  The application may retrieve this context using
getsockopt.  When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations.  If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.

- Design for UDP
Unlike TCP, UDP is connectionless.  This requires a somewhat different
API to retrieve the peer security context.  With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down.  With UDP, each read/write can have a
different peer, and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.

The solution is to build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).

Patch implementation details: 

- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag.  As an example (ignoring error
checking):

getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
printf("Socket peer context is: %s\n", optbuf);

The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk->sk_state) TCP sockets only.  If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations.  If
these have security associations with security contexts, the security
context is returned.  
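
A slightly more defensive version of the getsockopt() call shown earlier, assuming a Linux userspace where <sys/socket.h> provides SO_PEERSEC (the fallback value 31 is the asm-generic one and is only an assumption for headers that lack it). The helper name get_peer_context() is hypothetical. It initialises optlen, reserves room for a terminating NUL, and reports failure instead of printing an unset buffer.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_PEERSEC
#define SO_PEERSEC 31	/* asm-generic value; assumed if the headers lack it */
#endif

/*
 * Hypothetical helper: fetch the peer security context of a connected
 * socket into buf as a NUL-terminated string.  Returns the length the
 * kernel reported, or -1 on error.
 */
static int get_peer_context(int sockfd, char *buf, size_t buflen)
{
	socklen_t optlen;

	if (buflen < 2)
		return -1;

	memset(buf, 0, buflen);
	optlen = (socklen_t)(buflen - 1);	/* keep room for a trailing NUL */
	if (getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, buf, &optlen) < 0)
		return -1;

	buf[optlen] = '\0';
	return (int)optlen;
}
```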

On return from getsockopt, the buffer either contains a security
context string or is left unmodified.

- Implementation for UDP
To retrieve the security context, the application first tells the
kernel it wants the context by setting the IP_PASSSEC option via
setsockopt.  Then the application retrieves the security context using
the auxiliary data mechanism.

An example server application for UDP should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

/* setsockopt takes the option length by value, not by pointer */
setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
	cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
	if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
	    cmsg_hdr->cmsg_level == SOL_IP &&
	    cmsg_hdr->cmsg_type == SCM_SECURITY) {
		/* copy only the payload bytes actually present */
		memcpy(&scontext, CMSG_DATA(cmsg_hdr),
		       cmsg_hdr->cmsg_len - CMSG_LEN(0));
	}
}

ip_setsockopt is enhanced with a new socket option, IP_PASSSEC, to allow
a server socket to receive the security context of the peer.  A new
ancillary message type, SCM_SECURITY, is also added to carry it.

When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space.  An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space.  The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.
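
The UDP example only inspects the first control message. A sketch that walks every control message with CMSG_FIRSTHDR/CMSG_NXTHDR and copies out an SCM_SECURITY payload might look like the following. The helper name is hypothetical, and the fallback definitions of SOL_IP (0) and SCM_SECURITY (0x03, as in <linux/socket.h>) are only used when the libc headers do not provide them.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SOL_IP
#define SOL_IP 0		/* same value as IPPROTO_IP */
#endif
#ifndef SCM_SECURITY
#define SCM_SECURITY 0x03	/* value from <linux/socket.h> */
#endif

/*
 * Hypothetical helper: copy the first SCM_SECURITY payload in msg into
 * out as a NUL-terminated string (truncating if needed).  Returns the
 * number of bytes copied, or -1 if no such control message is present.
 */
static int extract_security_context(struct msghdr *msg, char *out, size_t outlen)
{
	struct cmsghdr *cmsg;

	if (outlen == 0)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		size_t len;

		if (cmsg->cmsg_level != SOL_IP || cmsg->cmsg_type != SCM_SECURITY)
			continue;
		if (cmsg->cmsg_len < CMSG_LEN(0))
			continue;	/* malformed header, skip it */

		len = cmsg->cmsg_len - CMSG_LEN(0);
		if (len >= outlen)
			len = outlen - 1;	/* truncate, keeping room for the NUL */
		memcpy(out, CMSG_DATA(cmsg), len);
		out[len] = '\0';
		return (int)len;
	}
	return -1;
}
```

Walking the whole chain matters once other options (for example IP_PKTINFO) are also enabled on the socket, since SCM_SECURITY is then not guaranteed to be the first control message.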


Testing:

We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built.  For TCP, we can then
extract the peer security context using getsockopt on either end.  For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.


---

 include/linux/in.h  |1 
 include/linux/security.h|   25 +++---
 include/linux/socket.h  |1 
 net/core/sock.c |2 -
 net/ipv4/ip_sockglue

Re: [PATCH] iproute2 -- add fwmarkmask

2006-02-23 Thread Patrick McHardy

Michael Richardson wrote:
> 
> 
>>>"Patrick" == Patrick McHardy <[EMAIL PROTECTED]> writes:
> 
> Patrick> The normal way to display masks is with a "/". Also I think
> Patrick> it shouldn't display the default mask to avoid breaking
> Patrick> scripts that parse the output.
> 
>   I generally dislike the /VALUE, since I expect /PREFIX-LEN.
>   I agree that it shouldn't show if it is default.
> 
> Patrick> ip should be able to parse its own output, and it would
> Patrick> also look nicer if I could just say "fwmark
> Patrick> 0x1/32". fwmarkmask is really an incredibly ugly expression
> Patrick> :)
> 
>   Sure. Is that a 32-bit-long mask (0xffffffff), or is it 0x00000020?
>   fwmark is not an address.
> 
>   Or would you like /32 to be a prefix-based mask, and &value and/or
> fwmarkmask to be a value? 

That was not the greatest example :) I think it should be a bitmask.
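
One way to read "fwmark VALUE/MASK" with MASK as a bitmask, as suggested above, is sketched below. The parse_fwmark()/fwmark_matches() names are hypothetical and this is not how iproute2 implements it; the match rule ((mark ^ value) & mask) == 0 is an assumption about the intended semantics. Under this reading "0x1/32" means mask 0x20, not a 32-bit prefix, which is exactly the ambiguity raised in this thread.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "VALUE[/MASK]", where MASK is a bitmask (not a prefix length).
 * Without a mask the whole 32-bit mark must match.  Returns 0 on
 * success, -1 on malformed input.
 */
static int parse_fwmark(const char *s, uint32_t *value, uint32_t *mask)
{
	char *end;
	unsigned long v, m = 0xffffffffUL;	/* default: exact match */

	v = strtoul(s, &end, 0);
	if (end == s)
		return -1;
	if (*end == '/') {
		const char *ms = end + 1;

		m = strtoul(ms, &end, 0);
		if (end == ms)
			return -1;
	}
	if (*end != '\0' || v > 0xffffffffUL || m > 0xffffffffUL)
		return -1;

	*value = (uint32_t)v;
	*mask = (uint32_t)m;
	return 0;
}

/* A packet mark matches when it agrees with value on every masked bit. */
static int fwmark_matches(uint32_t mark, uint32_t value, uint32_t mask)
{
	return ((mark ^ value) & mask) == 0;
}
```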

Re: [PATCH 02/02] add mask options to fwmark masking code

2006-02-23 Thread Patrick McHardy

Michael Richardson wrote:
> 
> 
>>>"Patrick" == Patrick McHardy <[EMAIL PROTECTED]> writes:
> 
> >>  #define RTA_FWMARK RTA_PROTOINFO
> >> +#define RTA_FWMARK_MASK RTA_CACHEINFO
> 
> Patrick> Please introduce a new attribute for this instead of
> Patrick> overloading RTA_CACHEINFO.
> 
>   I would be happy to do that.
>   Should I also un-overload FWMARK, with backwards compatibility?

No, that one is fine since it doesn't already have a different meaning.

Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Patrick McHardy

Chinh Nguyen wrote:
> Patrick McHardy wrote:
> 
>>Netfilter recalculates the checksum when NATing it.
> 
> 
> The NATing is not done by netfilter but by the NAT device between the IPsec 
> peers.

I see, so the TCP checksum includes the wrong IPs.

> [Linux ipsec client C] -- [NAT device] -- [Linux ipsec server S]
> 
> C negotiates an IPsec Transport Mode SA with S. Because of Transport Mode/NAT-T,
> two things happen to an IPsec packet.
> 
> 1. It is UDP-encapsulated, typically on port 4500/udp.
> 2. Transport Mode traffic leaves the original IP header alone whereas tunnel
> mode wraps the entire traffic in a second IP header. As such, when the packet
> passes through the NAT device, the source IP is N. However, the original
> unencrypted packet had source IP C.
> 
> S rips off the UDP-encap header, decrypts the payload, and "joins" the content
> back to the IP header. If the decrypted content is UDP or TCP, the UDP/TCP
> checksum is now incorrect because the source IP is now N not C.
> 
> (In tunnel mode, we would ignore the NAT-ted outer IP header because the
> decrypted content has an entire IP header + UDP/TCP etc)
> 
> This is a well-known problem with transport mode/NAT. One solution is to use
> NAT-OA and NAT-OR to recalculate the checksum. The Linux kernel does the simpler
> thing of ignoring the UDP/TCP checksum altogether in this particular case:
> 
> function esp_post_input (net/ipv4/esp4.c)
> 290		/*
> 291		 * 2) ignore UDP/TCP checksums in case
> 292		 *    of NAT-T in Transport Mode, or
> 293		 *    perform other post-processing fixes
> 294		 *    as per draft-ietf-ipsec-udp-encaps-06,
> 295		 *    section 3.1.2
> 296		 */
> 297		if (!x->props.mode)
> 298			skb->ip_summed = CHECKSUM_UNNECESSARY;
> 299
> 300		break;
> 
> As noted, esp_post_input is called in xfrm4_policy_check. Decrypted UDP traffic
> through transport mode/NAT also has bad checksums. However, since it is passed
> through udp_queue_rcv_skb after decryption, and this function calls
> xfrm4_policy_check before checking the UDP checksum, line 298 means the kernel
> ignores the bad checksum.
>
> Decrypted TCP traffic has bad checksums too. But since tcp_v4_rcv checks the TCP
> checksum before calling xfrm4_policy_check, the bad checksum means the TCP
> packet is dropped as a bad segment.
>
> The end result is that UDP and other traffic (e.g., ICMP) can pass through
> transport mode/NAT, but TCP cannot.
>
> I don't know what the correct fix is. Adding an extra call to xfrm4_policy_check
> in tcp_v4_rcv before the checksum check fixes this problem and doesn't seem to
> break anything else. On the other hand, moving some of the code in
> esp_post_input into esp_input (especially line 298) will work, too.

So we could move checksum validation behind xfrm4_policy_check, or
set ip_summed to CHECKSUM_UNNECESSARY earlier, in esp_input. Setting
ip_summed in esp_input looks easier. But this still leaves
one problem. With netfilter and local NAT, a decapsulated transport
mode packet might be forwarded to another host. In that case the
checksum contained in the packet is invalid. Any ideas how to fix
this anyone?
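
To make the transport-mode/NAT failure concrete: the TCP/UDP checksum covers a pseudo-header that contains the IP addresses, so once the NAT device rewrites C to N, a checksum computed by C no longer verifies at S. The userspace sketch below (helper names made up for illustration, not the kernel's csum_* API) computes an RFC 1071 checksum over an IPv4 pseudo-header plus segment, and shows an RFC 1624 incremental update, which is roughly what a NAT-OA based fix-up would have to do if it knew the original address.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement accumulation of big-endian 16-bit words (RFC 1071). */
static uint32_t csum_accumulate(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte is padded with zero */
		sum += (uint32_t)p[0] << 8;
	return sum;
}

/* Fold carries back into 16 bits and complement. */
static uint16_t csum_finish(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * TCP/UDP checksum over the IPv4 pseudo-header plus the segment.
 * Addresses are in host byte order.  With the checksum field zeroed this
 * computes the checksum; with it filled in, a valid segment yields 0.
 */
static uint16_t l4_checksum(uint32_t saddr, uint32_t daddr, uint8_t proto,
			    const uint8_t *seg, size_t len)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffff;
	sum += daddr >> 16;
	sum += daddr & 0xffff;
	sum += proto;			/* zero byte + protocol number */
	sum += (uint32_t)len;		/* TCP/UDP length */
	return csum_finish(csum_accumulate(sum, seg, len));
}

/* RFC 1624 incremental update, HC' = ~(~HC + ~m + m'), for a 32-bit address. */
static uint16_t csum_replace_addr(uint16_t check, uint32_t old_addr,
				  uint32_t new_addr)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~(old_addr >> 16);
	sum += (uint16_t)~(old_addr & 0xffff);
	sum += new_addr >> 16;
	sum += new_addr & 0xffff;
	return csum_finish(sum);
}
```

A segment whose checksum was computed with source C verifies against C but not against N; csum_replace_addr() turns C's checksum into the one S would expect for N without touching the payload.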

Re: [TCP 2.6.16-rc3] window scaling disabled issue?

2006-02-23 Thread Jesse Brandeburg

On 2/21/06, David S. Miller <[EMAIL PROTECTED]> wrote:
> From: Rick Jones <[EMAIL PROTECTED]>
> Date: Tue, 21 Feb 2006 17:21:30 -0800
>
> > My point (perhaps not as well expressed as the one on the top of my
> > head :) was that if 2.4 is "OK" with extending the window beyond
> > 32767 without adding additional semantics on those options, why
> > should 2.6 need to?
>
> 2.4.x has the same window limiting code, if it isn't limiting the
> window it's either a bug or a local change the person reporting
> that made.

It's definitely not a local change that *I* made, unless Red Hat made
that change to their kernel for some reason.  I'm running the
2.4.21-27 kernel from Red Hat Enterprise on a power system.  The 2.4
machine had window scaling enabled but didn't advertise or use it when
tcp_window_scaling was off on the 2.6 side.

I finally got 2.4.32 to compile, and it still ramps nicely to a 64k
receive window, while the 2.6 kernel limits itself to 32767 when
receiving.  Keep in mind this is with tcp_window_scaling = 0 and
tcp_adv_win_scale = 0 on the 2.6 kernel side.  I made no stack
config changes on the 2.4.32 side.

Just for grins I left the window scaling settings at default and I
noticed that the 2.4.32 kernel replies (and advertises with SYN) with
wscale 0 in the SYNACK.  Is that correct?

So I would say the 2.6 kernel with default settings is working okay,
but it is *not* the same as vanilla 2.4.32 when window scaling is
disabled.
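
For reference, the RFC 1323 rule behind the "wscale 0" question can be sketched as below: scaling applies only when both SYNs carried the option, and a negotiated shift of 0 is legal (scaling is "on" with a factor of 1). Without scaling the 16-bit field can still express up to 65535 bytes, so a 32767 limit is a policy choice rather than a protocol requirement. The function name is hypothetical and the sketch does not model the 2.6 clamping code itself.

```c
#include <stdint.h>

#define TCP_MAX_WSCALE 14	/* RFC 1323 limit on the shift count */

/*
 * Effective window in bytes for the 16-bit window field of a non-SYN
 * segment.  The window field in SYN segments is never scaled.
 */
static uint32_t effective_window(uint16_t win_field, int both_sent_wscale,
				 uint8_t shift)
{
	if (!both_sent_wscale)
		return win_field;	/* no scaling: at most 65535 bytes */
	if (shift > TCP_MAX_WSCALE)
		shift = TCP_MAX_WSCALE;
	return (uint32_t)win_field << shift;
}
```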

Jesse

PS here are the mini-dumps

*** 2.6 sending to 2.4

19:04:50.431251 arp who-has 10.0.1.7 tell 10.0.1.9
19:04:50.431500 arp reply 10.0.1.7 is-at 00:07:e9:03:68:61
19:04:50.431514 IP 10.0.1.9.56210 > 10.0.1.7.12865: S 946995500:946995500(0) win 5840
19:04:50.431873 IP 10.0.1.7.12865 > 10.0.1.9.56210: S 3054767463:3054767463(0) ack 946995501 win 5792
19:04:50.431914 IP 10.0.1.9.56210 > 10.0.1.7.12865: . ack 1 win 5840
19:04:50.443776 IP 10.0.1.9.56210 > 10.0.1.7.12865: P 1:257(256) ack 1 win 5840
19:04:50.444119 IP 10.0.1.7.12865 > 10.0.1.9.56210: . ack 257 win 6432
19:04:50.447120 IP 10.0.1.7.12865 > 10.0.1.9.56210: P 1:257(256) ack 257 win 6432
19:04:50.447129 IP 10.0.1.9.56210 > 10.0.1.7.12865: . ack 257 win 6432
19:04:50.447159 IP 10.0.1.9.53371 > 10.0.1.7.32777: S 938580246:938580246(0) win 5840
19:04:50.447369 IP 10.0.1.7.32777 > 10.0.1.9.53371: S 3061241349:3061241349(0) ack 938580247 win 5792
19:04:50.447380 IP 10.0.1.9.53371 > 10.0.1.7.32777: . ack 1 win 5840
19:04:50.447422 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 1:2897(2896) ack 1 win 5840
19:04:50.447619 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 1449 win 8688
19:04:50.447630 IP 10.0.1.9.53371 > 10.0.1.7.32777: P 2897:5793(2896) ack 1 win 5840
19:04:50.447638 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 2897 win 11584
19:04:50.447645 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 5793:8689(2896) ack 1 win 5840
19:04:50.447869 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 4345 win 14480
19:04:50.447877 IP 10.0.1.9.53371 > 10.0.1.7.32777: P 8689:11585(2896) ack 1 win 5840
19:04:50.447883 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 5793 win 17376
19:04:50.447890 IP 10.0.1.9.53371 > 10.0.1.7.32777: P 11585:14481(2896) ack 1 win 5840
19:04:50.447897 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 7241 win 20272
19:04:50.447902 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 8689 win 23168
19:04:50.447921 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 14481:15929(1448) ack 1 win 5840
19:04:50.447927 IP 10.0.1.9.53371 > 10.0.1.7.32777: P 15929:16385(456) ack 1 win 5840
19:04:50.447944 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 16385:19281(2896) ack 1 win 5840
19:04:50.448118 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 10137 win 26064
19:04:50.448126 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 11585 win 28960
19:04:50.448135 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 19281:25073(5792) ack 1 win 5840
19:04:50.448142 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 13033 win 31856
19:04:50.448147 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 14481 win 34752
19:04:50.448157 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 25073:30865(5792) ack 1 win 5840
19:04:50.448163 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 15929 win 37648
19:04:50.448245 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 16385 win 37648
19:04:50.448255 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 17833 win 40544
19:04:50.448261 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 30865:38105(7240) ack 1 win 5840
19:04:50.448269 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 19281 win 43440
19:04:50.448285 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 38105:41001(2896) ack 1 win 5840
19:04:50.448372 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 20729 win 46336
19:04:50.448381 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 22177 win 49232
19:04:50.448493 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 23625 win 52128
19:04:50.448502 IP 10.0.1.9.53371 > 10.0.1.7.32777: . 41001:49689(8688) ack 1 win 5840
19:04:50.448508 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 25073 win 55024
19:04:50.448515 IP 10.0.1.7.32777 > 10.0.1.9.53371: . ack 26521 win

Re: [PATCH]IPv4 UDP does not discard the datagram with invalid checksum

2006-02-23 Thread Wei Yongjun

Under IPv4, when I send a UDP packet with an invalid checksum, the kernel
uses udp_rcv() to pass the packet up to the UDP layer, and the application
uses udp_recvmsg() to receive the message. So if a UDP packet with an
invalid checksum arrives at the host, UDP_MIB_INDATAGRAMS will be
incremented by 1, and UDP_MIB_INERRORS should also be incremented by 1.
int udp_rcv(struct sk_buff *skb) {
	...
	udp_queue_rcv_skb();
	...
}

static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) {
	...
	if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
		if (__udp_checksum_complete(skb)) {
			UDP_INC_STATS_BH(UDP_MIB_INERRORS);
			kfree_skb(skb);
			return -1;
		}
		skb->ip_summed = CHECKSUM_UNNECESSARY;
	}

	UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
	...
}

static int udp_recvmsg(...) {
	...
csum_copy_err:
	UDP_INC_STATS_BH(UDP_MIB_INERRORS);
	...
}

In my test, I sent an IPv4 UDP packet with an invalid checksum to the
echo-udp service, and found the following message in /var/log/messages:
xinetd[4468]: service echo-dgram, recvfrom: Resource temporarily
unavailable (errno = 11)
UDP_MIB_INDATAGRAMS increased by 1, but UDP_MIB_INERRORS increased by 0.
Does xinetd receive messages through some function other than udp_recvmsg()?

My other question: why is a packet with an invalid checksum discarded
only when sk->sk_filter is set?

By the way, under IPv6 a packet with an invalid checksum is discarded in
the udpv6_rcv() path, so if a UDP packet with an invalid checksum arrives
at an IPv6 host, UDP_MIB_INDATAGRAMS is incremented by 0 and
UDP_MIB_INERRORS is incremented by 1.

static int udpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp) {
	...
	udpv6_queue_rcv_skb();
	...
}

static inline int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) {
	...
	if (skb->ip_summed != CHECKSUM_UNNECESSARY) {
		if ((unsigned short)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum))) {
			UDP6_INC_STATS_BH(UDP_MIB_INERRORS);
			kfree_skb(skb);
			return 0;
		}
		skb->ip_summed = CHECKSUM_UNNECESSARY;
	}
	...
	UDP6_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
	...
}

When a packet with an invalid checksum arrives at an IPv4 host and at an
IPv6 host, UDP_MIB_INDATAGRAMS and UDP_MIB_INERRORS are incremented
differently. Are the definitions of these two counters different between
IPv4 and IPv6?
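
The counter behaviour described above can be summarised as a small model. This is a simplification of the excerpts quoted in this message, not kernel code: it assumes the checksum has not already been verified by hardware (ip_summed != CHECKSUM_UNNECESSARY), and the hypothetical app_reads flag stands for the application consuming the datagram through udp_recvmsg().

```c
#include <stdbool.h>

struct udp_counters {
	unsigned long indatagrams;	/* UDP_MIB_INDATAGRAMS */
	unsigned long inerrors;		/* UDP_MIB_INERRORS */
};

/* IPv4 path as excerpted above. */
static void ipv4_deliver(struct udp_counters *c, bool csum_ok, bool has_filter,
			 bool app_reads)
{
	if (has_filter && !csum_ok) {	/* checked early only with sk_filter */
		c->inerrors++;
		return;			/* dropped in udp_queue_rcv_skb() */
	}
	c->indatagrams++;		/* counted before the payload is checked */
	if (app_reads && !csum_ok)
		c->inerrors++;		/* csum_copy_err in udp_recvmsg() */
}

/* IPv6 path as excerpted above. */
static void ipv6_deliver(struct udp_counters *c, bool csum_ok)
{
	if (!csum_ok) {			/* always checked in udpv6_queue_rcv_skb() */
		c->inerrors++;
		return;
	}
	c->indatagrams++;
}
```

Under this model a bad IPv4 datagram read via udp_recvmsg() bumps both counters, while the same datagram on IPv6 bumps only UDP_MIB_INERRORS, which is the asymmetry being asked about.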


> > IPv4 UDP does not discard the datagram with an invalid checksum. UDP can
> > validate UDP checksums correctly only when socket filtering instructions
> > are set. If socket filtering instructions are not set, a datagram with an
> > invalid checksum will be passed to the application.
> 
> We check the checksum later, in parallel with the copy of
> the packet data into userspace.
> 
> See udp_recvmsg(), where we do this:
> 
> 	if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
> 		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
> 					      msg->msg_iov, copied);
> 	} else if (msg->msg_flags&MSG_TRUNC) {
> 		if (__udp_checksum_complete(skb))
> 			goto csum_copy_err;
> 		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
> 					      msg->msg_iov, copied);
> 	} else {
> 		err = skb_copy_and_csum_datagram_iovec(skb,
> 					sizeof(struct udphdr), msg->msg_iov);
>
> 		if (err == -EINVAL)
> 			goto csum_copy_err;
> 	}



[PATCH] Some state changes not be counted to TCP_MIB_ATTEMPTFAILS

2006-02-23 Thread Wei Yongjun

RFC 2012 defines tcpAttemptFails as follows:
  tcpAttemptFails OBJECT-TYPE
  SYNTAX  Counter32
  MAX-ACCESS  read-only
  STATUS  current
  DESCRIPTION
  "The number of times TCP connections have made a direct
  transition to the CLOSED state from either the SYN-SENT
  state or the SYN-RCVD state, plus the number of times TCP
  connections have made a direct transition to the LISTEN
  state from the SYN-RCVD state."
  ::= { tcp 7 }
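
The RFC 2012 definition above reduces to a predicate over (old state, new state). A minimal sketch, using a hypothetical state enum rather than the kernel's TCP_* constants:

```c
#include <stdbool.h>

/* Hypothetical connection states, named after RFC 793. */
enum conn_state { ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RCVD, ST_ESTABLISHED };

/* True if the transition from -> to should bump tcpAttemptFails. */
static bool counts_as_attempt_fail(enum conn_state from, enum conn_state to)
{
	if (to == ST_CLOSED)
		return from == ST_SYN_SENT || from == ST_SYN_RCVD;
	if (to == ST_LISTEN)
		return from == ST_SYN_RCVD;
	return false;
}
```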

State changes from SYN-RCVD to CLOSED, from SYN-SENT to CLOSED, and from
SYN-RCVD to LISTEN should be counted in TCP_MIB_ATTEMPTFAILS.

The kernel does not count the following state changes in
TCP_MIB_ATTEMPTFAILS:

SYN-SENT state => CLOSED

TCP A TCP B
  
1.  LISTENCLOSED
   
2. <--   -->  SYN-SENT

3. --> SEQ=X>  -->  CLOSED

SYN-RECEIVED state(came from SYN-SENT state) => CLOSED

TCP A TCP B
  
1.  LISTENCLOSED

2. <--   -->  SYN-SENT

3. -->   SYN-SENT

4. <---->  SYN-RECEIVED

5. -->-->  CLOSED

SYN-RECEIVED state(came from SYN-SENT state) => CLOSED

TCP A TCP B
  
1.  LISTENCLOSED

2. <--   -->  SYN-SENT

3. -->   SYN-SENT

4. <---->  SYN-RECEIVED

5. -->-->  CLOSED

SYN-RECEIVED state => LISTEN

TCP A TCP B
  
1.  LISTENLISTEN
  
2.   ... -->  SYN-RECEIVED
  
3.  (??) <--<--  SYN-RECEIVED
  
4.   -->   -->  (return to
LISTEN!)
  
5.  LISTENLISTEN

SYN-RECEIVED state => LISTEN

TCP A TCP B
  
1.  LISTENLISTEN
  
2.   ... -->  SYN-RECEIVED
  
3.  (??) <--<--  SYN-RECEIVED
  
4.   -->   -->  (return to
LISTEN!)
  
5.  LISTENLISTEN

Patch against kernel 2.6.15.4 follows:

diff -Nur a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c  2006-02-23 09:20:24.659262056 +0900
+++ b/net/ipv4/tcp_input.c  2006-02-23 09:28:50.772321176 +0900
@@ -4003,6 +4003,7 @@
 */
 
if (th->rst) {
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
tcp_reset(sk);
goto discard;
}
@@ -4290,6 +4291,8 @@
 
/* step 2: check RST bit */
if(th->rst) {
+   if(sk->sk_state == TCP_SYN_RECV)
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
tcp_reset(sk);
goto discard;
}
@@ -4303,6 +4306,8 @@
 *  Check for a SYN in window.
 */
if (th->syn && !before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
+   if(sk->sk_state == TCP_SYN_RECV)
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
NET_INC_STATS_BH(LINUX_MIB_TCPABORTONSYN);
tcp_reset(sk);
return 1;
diff -Nur a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
--- a/net/ipv4/tcp_minisocks.c  2006-02-23 09:20:24.660261904 +0900
+++ b/net/ipv4/tcp_minisocks.c  2006-02-23 09:26:07.432152656 +0900
@@ -591,8 +591,10 @@
/* RFC793: "second check the RST bit" and
 * "fourth, check the SYN bit"
 */
-   if (flg & (TCP_FLAG_RST|TCP_FLAG_SYN))
+   if (flg & (TCP_FLAG_RST|TCP_FLAG_SYN)) {
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
goto embryonic_reset;
+   }
 
/* ACK sequence verified above, just make sure ACK is
 * set.  If ACK not set, just silently drop the packet.



Re: [PATCH 0/3] skge: patches for 2.6.16

2006-02-23 Thread Jeff Garzik


Francois Romieu wrote:

Stephen Hemminger <[EMAIL PROTECTED]> :


Bug fix patches to skge driver that need to go in 2.6.16.
Some of them are in -mm and some have already been sent (and ignored).



#1..#3 Applied to branch 'for-jeff' at
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git

Shortlog

$ git rev-list --pretty master..HEAD | git shortlog

Francois Romieu:
  r8169: fix broken ring index handling in suspend/resume
  r8169: enable wake on lan

Stephen Hemminger:
  sky2: yukon-ec-u chipset initialization
  sky2: limit coalescing values to ring size
  sky2: poke coalescing timer to fix hang
  sky2: force early transmit status
  sky2: use device iomem to access PCI config
  sky2: close race on IRQ mask update.
  skge: NAPI/irq race fix
  skge: genesis phy initialzation
  skge: protect interrupt mask



Pulled, thanks.  It definitely makes things easier if the patches are
rolled up like this.


Jeff



[PATCH] ip6_tunnel: release cached dst on change of tunnel params

2006-02-23 Thread Hugo Santos

Hi,

   The included patch fixes ip6_tunnel to release the cached dst entry
 when the tunnel parameters (such as tunnel endpoints) are changed, so
 the new parameters are used immediately for the next encapsulated packets.

 Signed-off-by: Hugo Santos <[EMAIL PROTECTED]>

--- linux-2.6.16-rc4/net/ipv6/ip6_tunnel.c	2006-02-17 22:23:45.0 +
+++ linux-2.6.16-rc4-new/net/ipv6/ip6_tunnel.c	2006-02-24 01:40:17.0 +
@@ -884,6 +884,7 @@ ip6ip6_tnl_change(struct ip6_tnl *t, str
 	t->parms.encap_limit = p->encap_limit;
 	t->parms.flowinfo = p->flowinfo;
 	t->parms.link = p->link;
+	ip6_tnl_dst_reset(t);
 	ip6ip6_tnl_link_config(t);
 	return 0;
 }




Re: [PATCH] pktgen: fix races between control/worker threads

2006-02-23 Thread David S. Miller

From: Robert Olsson <[EMAIL PROTECTED]>
Date: Wed, 22 Feb 2006 19:47:13 +0100

> 
> Jesse Brandeburg writes:
>  > 
>  > I looked quickly at this on a couple different machines and wasn't
>  > able to reproduce, so don't let me block the patch.  I think its a
>  > good patch FWIW
> 
>  OK! 
>  We'll ask Dave to apply it.

Applied to net-2.6.17, thanks.

Re: [PATCH 00/01] pktgen: Lindent run.

2006-02-23 Thread David S. Miller

From: Luiz Fernando Capitulino <[EMAIL PROTECTED]>
Date: Mon, 23 Jan 2006 13:44:19 -0200

> 
>  This patch is not inlined because it's 120K bytes long; you can find it at:
> 
> http://www.cpu.eti.br/patches/pktgen_lindent_1.patch

Not found:

[EMAIL PROTECTED]:~/src/GIT/net-2.6.17$ wget 
http://www.cpu.eti.br/patches/pktgen_lindent_1.patch
--17:16:50--  http://www.cpu.eti.br/patches/pktgen_lindent_1.patch
   => `pktgen_lindent_1.patch'
Resolving www.cpu.eti.br... 209.59.143.183
Connecting to www.cpu.eti.br|209.59.143.183|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
17:16:50 ERROR 404: Not Found.

Anyways, can you please regenerate these 4 patches against
net-2.6.17, as I put in Arthur's race fix and it will certainly
conflict with these.

Sorry for taking so long to get to this :-(

Re: (usagi-users 03614) Re: IPv6 setsockopt software MTU patch

2006-02-23 Thread David S. Miller

From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Fri, 24 Feb 2006 00:23:51 +0900 (JST)

> David, please apply.  Thank you.

Can you please resend the patch with a full changelog
entry and Signed-off-by lines for me?  Thank you.

This is for net-2.6 right?  Or net-2.6.17?

Thanks again.

ip6_tunnel keeping dst_cache after change of params

2006-02-23 Thread Hugo Santos

Hi,

   ip6_tunnel keeps a cached dst (dst_cache in ip6_tnl) per tunnel
 instance. This cached dst is re-used while it's not marked obsolete. A
 change of the tunnel's parameters (via SIOCCHGTUNNEL) does not
 invalidate the dst_cache directly, which results in it being used by
 ip6ip6_tnl_xmit after the tunnel is configured with new parameters.
   Shouldn't ip6ip6_tnl_change dst_release() the cached dst and leave
 ip6ip6_tnl_xmit to pick a new one based on the new local/remote
 addresses, etc.? I can provide a patch to fix this; meanwhile I just wanted
 to confirm the expected behaviour.
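
The stale-cache problem described here can be illustrated with a toy model. The structs and function names are hypothetical stand-ins for struct ip6_tnl, its dst_cache and ip6_tnl_dst_reset(); no real routing or reference counting is involved.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
	uint32_t remote;		/* endpoint the route was resolved for */
};

struct tunnel {
	uint32_t remote;		/* current tunnel endpoint */
	struct route storage;		/* backing store for the cached route */
	struct route *dst_cache;	/* NULL means "no cached route" */
	unsigned int lookups;		/* how many route lookups were done */
};

/* Transmit path: reuse the cached route, or "look one up" if there is none. */
static struct route *tnl_route(struct tunnel *t)
{
	if (t->dst_cache == NULL) {
		t->storage.remote = t->remote;
		t->dst_cache = &t->storage;
		t->lookups++;
	}
	return t->dst_cache;
}

/* Parameter change; reset_cache mimics calling ip6_tnl_dst_reset(). */
static void tnl_change(struct tunnel *t, uint32_t new_remote, int reset_cache)
{
	t->remote = new_remote;
	if (reset_cache)
		t->dst_cache = NULL;
}
```

Without the reset, the transmit path keeps using a route resolved for the old endpoint; with it, the next transmit performs a fresh lookup for the new one.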

   Thanks,
  Hugo



Re: Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call

2006-02-23 Thread Arnaldo Carvalho de Melo

On 2/23/06, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> On 2/23/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > Starting from 2.6.14, defer_accept is moved to request_sock_queue structure,
> > which is re-initialized in inet_csk_listen_start().
>
> Oops, looking into it...

culprit:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=295f7324ff8d9ea58b4d3ec93b1aaa1d80e048a9

Alexandra, can you please test by just removing the zeroing from
reqsk_queue_alloc() in net/core/request_sock.c? Just remove this
line:

queue->rskq_defer_accept = 0;

icsk->icsk_accept_queue (that maps to the queue-> above) is zeroed
at sk alloc time, so just removing this one should restore the previous
behaviour.
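To make the mechanism concrete, here is a toy user-space model (hypothetical names, not the request_sock code) of why a value stored before listen() is lost when the queue allocator zeroes it, and why removing that one line is enough when the structure is already zeroed when the socket is allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical model; not the kernel's request_sock_queue. */
struct accept_queue {
	int defer_accept;   /* written by setsockopt() before listen() */
	int backlog;
};

struct sock_model {
	struct accept_queue q;
};

/* Models "zeroed at sk alloc time". */
static struct sock_model *sock_alloc_model(void)
{
	return calloc(1, sizeof(struct sock_model));
}

/* Models the queue allocation done at listen() time. zero_defer = 1 mimics
 * the 2.6.14/2.6.15 behaviour, zero_defer = 0 the proposed fix. */
static void queue_alloc(struct accept_queue *q, int backlog, int zero_defer)
{
	q->backlog = backlog;
	if (zero_defer)
		q->defer_accept = 0;
}

static int listen_then_get(int zero_defer)
{
	struct sock_model *sk = sock_alloc_model();
	int val;

	assert(sk != NULL);
	sk->q.defer_accept = 3;              /* stands in for setsockopt() */
	queue_alloc(&sk->q, 1, zero_defer);  /* stands in for listen() */
	val = sk->q.defer_accept;            /* stands in for getsockopt() */
	free(sk);
	return val;
}
```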

Thanks,

- Arnaldo

Re: [Patch 1/6] IPSEC: core updates

2006-02-23 Thread David S. Miller

From: jamal <[EMAIL PROTECTED]>
Date: Tue, 21 Feb 2006 08:31:49 -0500

> Ok. Patch attached against net-2617
> 
> Yoshfuji-san you should probably write a little doc that should be
> available in the Doc/ directory.

If we write this, please ask Andi Kleen to review it.
His arch has the most problems in this area making him
an expert on this topic :-)

> struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly.
> 
> Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>

Applied, thanks everyone.
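An illustration of what "32-bit + 64-bit align friendly" means for a struct passed between user space and the kernel. The structs below are hypothetical, not the real struct xfrm_aevent_id. On i386 a 64-bit member inside a struct is only 4-byte aligned while on x86-64 it is 8-byte aligned, so a 32-bit process and a 64-bit kernel can disagree about the layout unless the padding is made explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout-sensitive: offsetof(value) is 4 on i386 but 8 on x86-64,
 * and sizeof() is 12 vs 16. */
struct msg_unsafe {
	uint32_t flags;
	uint64_t value;
};

/* Align-friendly: the explicit pad puts every member at the same
 * offset on both ABIs, so no implicit padding is needed anywhere. */
struct msg_safe {
	uint32_t flags;
	uint32_t pad;
	uint64_t value;
};

_Static_assert(sizeof(struct msg_safe) == 16, "same size on 32- and 64-bit");
_Static_assert(offsetof(struct msg_safe, value) == 8, "same offset on 32- and 64-bit");
```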

Re: RFC: fix first packet goes out with MAC 00:00:00:00:00:00

2006-02-23 Thread David S. Miller

From: jamal <[EMAIL PROTECTED]>
Date: Thu, 23 Feb 2006 10:06:46 -0500

> Ok, patch attached. Dave this also is needed for 2.6.16-rcXX.
> 
> Tested against a standard eth device (e1000) and tuntap.

Applied to net-2.6, thanks a lot.

Re: tg3 losing promisc rx_mode bit

2006-02-23 Thread David S. Miller

From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Thu, 23 Feb 2006 13:12:38 -0800

> On Fri, 2006-02-24 at 11:48 +1300, Ian McDonald wrote:
> 
> > Thinking out loud here without reading source... - can you check the
> > version of the firmware and make noise if they have a version like
> > this one?
>
> Probably yes. Will put this on my queue if there is no other objection.

No objection here.

Re: tg3 losing promisc rx_mode bit

2006-02-23 Thread Michael Chan

On Fri, 2006-02-24 at 11:48 +1300, Ian McDonald wrote:

> Thinking out loud here without reading source... - can you check the
> version of the firmware and make noise if they have a version like
> this one?
> 
Probably yes. Will put this on my queue if there is no other objection.


Re: tg3 losing promisc rx_mode bit

2006-02-23 Thread Ian McDonald

On 2/24/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> This is a known problem caused by ASF or IPMI firmware overwriting the
> promiscuous mode bit. I will have someone contact you to get the
> firmware upgraded.
>
> Thanks.
>
Thinking out loud here without reading source... - can you check the
version of the firmware and make noise if they have a version like
this one?

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

Re: tg3 losing promisc rx_mode bit

2006-02-23 Thread Michael Chan

On Thu, 2006-02-23 at 14:31 -0800, Jim Westfall wrote:

> I am seeing the following issue on only the first onboard nic on each of 
> the servers.  If the nic is put into promisc mode too soon after the nic 
> is brought up, the promisc bit in the rx_mode register is somehow getting 
> reset to 0;
> 

This is a known problem caused by ASF or IPMI firmware overwriting the
promiscuous mode bit. I will have someone contact you to get the
firmware upgraded.

Thanks.


tg3 losing promisc rx_mode bit

2006-02-23 Thread Jim Westfall

Hi

I have a number of IBM x336 servers that have the following 2 onboard
nics (eth0/1 are 2 other bcm57xx nics on a PCI-X card). The kernel version
is 2.6.15.4, though I have also tried 2.4.32 and 2.6.14-rc4 and they both
have the issue below.

ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Enabling bus mastering for device 0000:06:00.0
PCI: Setting latency timer of device 0000:06:00.0 to 64
eth2: Tigon3 [partno(BCM95721) rev 4101 PHY(5750)] (PCI Express) 10/100/1000BaseT Ethernet 00:0d:60:9a:81:be
eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[1]
eth2: dma_rwctrl[76180000]

ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:07:00.0 to 64
eth3: Tigon3 [partno(BCM95721) rev 4101 PHY(5750)] (PCI Express) 10/100/1000BaseT Ethernet 00:0d:60:9a:81:bf
eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[1]
eth3: dma_rwctrl[76180000]

I am seeing the following issue on only the first onboard nic on each of
the servers. If the nic is put into promisc mode too soon after it is
brought up, the promisc bit in the rx_mode register somehow gets reset
to 0.

The test cases are:

non-sleep case (promisc bit gets lost)
ifconfig eth2 up;tcpdump -n -i eth2

and

sleep case (promisc is set)
ifconfig eth2 up;sleep 1;tcpdump -n -i eth2

I added some additional debug statements to the driver to printk whenever it
updates the rx_mode register. They dump what the change is and who changed
it (via a stack dump), then re-read the register and print the value out.

This is the output of the parts that set the device into promisc mode, 
which is the same for both test cases.

ADDRCONF(NETDEV_UP): eth2: link is not ready
eth2: setting rx_mode register to 0102
 [] _tw32_flush+0x30/0xac [tg3]
 [] __tg3_set_rx_mode+0x194/0x1ac [tg3]
 [] wakeme_after_rcu+0x0/0x10
 [] tg3_set_rx_mode+0x25/0x3c [tg3]
 [] __dev_mc_upload+0x21/0x28
 [] dev_mc_upload+0x19/0x28
 [] dev_set_promiscuity+0x37/0x5c
 [] packet_dev_mc+0x67/0x7c
 [] packet_mc_add+0x126/0x13c
 [] packet_setsockopt+0xa5/0xd4
 [] sys_setsockopt+0x69/0x84
 [] sys_socketcall+0x1b6/0x208
 [] syscall_call+0x7/0xb
eth2: read 0102 from rx_mode register
device eth2 entered promiscuous mode
tg3: eth2: Link is up at 100 Mbps, full duplex.
tg3: eth2: Flow control is off for TX and off for RX.
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready

Both test cases indicate that they are setting the promisc bit (0x00000100)
and reading it back as set, but ethtool and tcpdump show otherwise in the
non-sleep case.

non-sleep case
# ethtool -d eth2 | egrep -A4 1128 | egrep -A4 1128 | head -4
1128    0x02
1129    0x00
1130    0x00
1131    0x00

sleep case
# ethtool -d eth2 | egrep -A4 1128 | egrep -A4 1128 | head -4
1128    0x02
1129    0x01
1130    0x00
1131    0x00
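Offset 1128 in these dumps is 0x468; assuming that is where the tg3 rx_mode register lives, the byte at 1129 holds bits 8-15, which is where the promisc bit (0x100) sits. A small sketch for checking the bit from such a dump, assuming decimal byte offsets and little-endian (x86) byte order; the helper names and constants are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_MODE_OFFSET  1128u        /* 0x468, the offset grepped for above */
#define RX_MODE_PROMISC 0x00000100u  /* promisc bit as described in the report */

/* Rebuild a 32-bit register from a byte-wise dump. Assumes little-endian
 * order: the byte at the lowest offset is the least significant one. */
static uint32_t reg32_from_dump(const uint8_t *dump, size_t len, size_t off)
{
	assert(off + 4 <= len);
	return (uint32_t)dump[off] |
	       (uint32_t)dump[off + 1] << 8 |
	       (uint32_t)dump[off + 2] << 16 |
	       (uint32_t)dump[off + 3] << 24;
}

static int promisc_set(const uint8_t *dump, size_t len)
{
	return (reg32_from_dump(dump, len, RX_MODE_OFFSET) & RX_MODE_PROMISC) != 0;
}
```

Fed the two dumps above, this reports 0x002 (promisc clear) for the non-sleep case and 0x102 (promisc set) for the sleep case.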

I recently got burned by this because we use eth2/3 in a bridge: eth2 wasn't
seeing any STP-related broadcasts, which triggered a loop.

thanks
jim








Re: [PATCH 0/3] skge: patches for 2.6.16

2006-02-23 Thread Francois Romieu

Stephen Hemminger <[EMAIL PROTECTED]> :
> Bug fix patches to skge driver that need to go in 2.6.16.
> Some of them are in -mm and some have already been sent (and ignored).

#1..#3 Applied to branch 'for-jeff' at
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git

Shortlog

$ git rev-list --pretty master..HEAD | git shortlog

Francois Romieu:
  r8169: fix broken ring index handling in suspend/resume
  r8169: enable wake on lan

Stephen Hemminger:
  sky2: yukon-ec-u chipset initialization
  sky2: limit coalescing values to ring size
  sky2: poke coalescing timer to fix hang
  sky2: force early transmit status
  sky2: use device iomem to access PCI config
  sky2: close race on IRQ mask update.
  skge: NAPI/irq race fix
  skge: genesis phy initialzation
  skge: protect interrupt mask

-- 
Ueimor

Re: [PATCH] Uninline kfree_skb and allow NULL argument

2006-02-23 Thread David S. Miller

From: Jörn Engel <[EMAIL PROTECTED]>
Date: Thu, 23 Feb 2006 13:52:59 +0100

> +void kfree_skb(struct sk_buff *skb);
>  extern void __kfree_skb(struct sk_buff *skb);

If you wish to contribute to a software project, you should adhere to
the coding style and conventions of that project when submitting
changes.  It doesn't matter what the reasons are for those
conventions, you should follow them until the project decides to
change them.

If you wish to discuss the merits of putting extern there or not in
function declarations, you can start a thread about that and make
proposals on linux-kernel.

Patch submissions are not the place to do that.

So please add extern here, thanks a lot.

Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Chinh Nguyen

Patrick McHardy wrote:
> Chinh Nguyen wrote:
> 
>>Patrick McHardy wrote:
>>
>>
>>>What values does skb->ip_summed have before that?
>>
>>
>>the skb->ip_summed value before the checksum check in tcp_v4_rcv is
>>CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect because
>>the checksum is with regards to the private IP but the NAT device has
>>modified the source IP.
> 
> 
> Netfilter recalculates the checksum when NATing it.

The NATing is not done by netfilter but by the NAT device between the IPsec
peers.

>>I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input
>>(net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap
>>(net/ipv4/xfrm4_input.c:101).
> 
> 
> The question is why the checksum is invalid. Please start by describing
> what you're trying to do.

[Linux ipsec client C] -- [NAT device] -- [Linux ipsec server S]

C negotiates an IPsec Transport Mode connection with S. Because of Transport Mode/NAT-T, 2
things happen to an IPsec packet.

1. It is UDP-encapsulated, typically on port 4500/udp.
2. Transport Mode traffic leaves the original IP header alone whereas tunnel
mode wraps the entire traffic in a second IP header. As such, when the packet
passes through the NAT device, the source IP is N. However, the original
unencrypted packet had source IP C.

S rips off the UDP-encap header, decrypts the payload, and "joins" the content
back to the IP header. If the decrypted content is UDP or TCP, the UDP/TCP
checksum is now incorrect because the source IP is now N not C.

(In tunnel mode, we would ignore the NAT-ted outer IP header because the
decrypted content has an entire IP header + UDP/TCP etc)

This is a well-known problem with transport mode/NAT. One solution is to use
NAT-OA and NAT-OR to recalculate the checksum. The linux kernel does the simpler
thing of ignoring the UDP/TCP checksum altogether in this particular case:

function esp_post_input (net/ipv4/esp4.c)
290     /*
291      * 2) ignore UDP/TCP checksums in case
292      *    of NAT-T in Transport Mode, or
293      *    perform other post-processing fixes
294      *    as per draft-ietf-ipsec-udp-encaps-06,
295      *    section 3.1.2
296      */
297     if (!x->props.mode)
298             skb->ip_summed = CHECKSUM_UNNECESSARY;
299
300     break;

As noted, esp_post_input is called in xfrm4_policy_check. Decrypted UDP traffic
through transport mode/nat also has bad checksums. However, since it is passed
through udp_queue_rcv_skb after decryption, and this function calls
xfrm4_policy_check before checking the UDP checksum, line 298 means the kernel
ignores the bad checksum.

Decrypted TCP traffic has bad checksums too. But since tcp_v4_rcv checks the TCP
checksum before calling xfrm4_policy_check, the bad checksum means the TCP
packet is dropped as a bad segment.

The end result is that UDP and other traffic (eg, ICMP) can pass through
transport mode/nat but not TCP.
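The ordering difference between the UDP and TCP receive paths described above can be captured in a toy model. None of this is kernel code: policy_check() merely stands in for xfrm4_policy_check() calling esp_post_input(), and the struct fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum summed { CSUM_NONE, CSUM_UNNECESSARY };

struct pkt {
	enum summed ip_summed;
	bool csum_valid;      /* false once NAT has rewritten the source IP */
	bool natt_transport;  /* arrived via NAT-T in transport mode */
};

/* Stand-in for xfrm4_policy_check() -> esp_post_input(). */
static void policy_check(struct pkt *p)
{
	if (p->natt_transport)
		p->ip_summed = CSUM_UNNECESSARY;
}

static bool csum_ok(const struct pkt *p)
{
	return p->ip_summed == CSUM_UNNECESSARY || p->csum_valid;
}

/* UDP-like path: policy check first, checksum afterwards. */
static bool udp_like_rcv(struct pkt p)
{
	policy_check(&p);
	return csum_ok(&p);
}

/* TCP-like path: checksum first, policy check later. */
static bool tcp_like_rcv(struct pkt p)
{
	if (!csum_ok(&p))
		return false;   /* dropped as a bad segment */
	policy_check(&p);
	return true;
}
```

With ip_summed = CSUM_NONE and a checksum invalidated by NAT, the UDP-like path accepts the packet while the TCP-like path drops it, matching the observed behaviour.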

I don't know what the correct fix is. Adding an extra call to xfrm4_policy_check in
tcp_v4_rcv before the checksum check fixes this problem and doesn't seem to
break anything else. On the other hand, moving some of the code in
esp_post_input into esp_input (especially line 298) will work, too.
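For the NAT-OA route mentioned earlier, the usual technique is an incremental checksum update (RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')): if S knows the original address C and sees N in the header, it can swap C for N inside the TCP/UDP checksum without touching the rest of the packet. A minimal sketch of that technique, not what the kernel does; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over an array of 16-bit words (RFC 1071). */
static uint16_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return (uint16_t)~csum_fold32(sum);
}

/* Incremental update for an IPv4 address change: remove both 16-bit
 * halves of old_addr and add both halves of new_addr. */
static uint16_t csum_replace_addr(uint16_t check, uint32_t old_addr,
				  uint32_t new_addr)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~(old_addr >> 16);
	sum += (uint16_t)~(old_addr & 0xffff);
	sum += new_addr >> 16;
	sum += new_addr & 0xffff;
	return (uint16_t)~csum_fold32(sum);
}
```

The test below compares the incremental result against a full recomputation over a made-up pseudo-header plus payload, with the source address changed from 192.168.1.2 to 203.0.113.7.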


Re: [PATCH] Uninline kfree_skb and allow NULL argument

2006-02-23 Thread Herbert Xu

On Thu, Feb 23, 2006 at 02:21:46PM +0100, Sven Schuster wrote:
> 
> static inline void kfree_skb(struct sk_buff *skb)
> {
>   if (unlikely(!skb))
>   return;
>   _kfree_skb(skb);
> }
> 
> This way the kernel with the new inlined kfree_skb should still become
> smaller while not calling the un-inlined _kfree_skb if skb is
> NULL...?? (_should_ become smaller is a claim I make without any
> proof, sorry...)

This is pointless because most callers of kfree_skb expect skb to be
non-NULL.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] iproute2 -- add fwmarkmask

2006-02-23 Thread Michael Richardson



> "Patrick" == Patrick McHardy <[EMAIL PROTECTED]> writes:
Patrick> The normal way to display masks is with a "/". Also I think
Patrick> it shouldn't display the default mask to avoid breaking
Patrick> scripts that parse the output.

  I generally dislike the /VALUE, since I expect /PREFIX-LEN.
  I agree that it shouldn't show if it is default.

Patrick> ip should be able to parse its own output, and it would
Patrick> also look nicer if I could just say "fwmark
Patrick> 0x1/32". fwmarkmask is really an incredible ugly expression
Patrick> :)

  Sure. Is that a 32-bit long mask (0xffffffff), or is it 0x00000020?
  fwmark is not an address.

  Or would you like /32 to be a prefix-based mask, and &value and/or
fwmarkmask to be a value? 
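The two readings of "fwmark 0x1/32" can be written down directly. A sketch with hypothetical helper names; the masked compare at the end is one plausible matching rule, not necessarily what the patch under discussion implements.

```c
#include <assert.h>
#include <stdint.h>

/* Reading 1: "/N" is a prefix length, as for addresses. */
static uint32_t mask_from_prefix(unsigned int len)
{
	assert(len <= 32);
	/* guard len == 0: shifting a 32-bit value by 32 is undefined */
	return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Reading 2: "/N" is the mask value itself. */
static uint32_t mask_from_value(uint32_t value)
{
	return value;
}

/* Masked match: only the bits selected by mask are compared. */
static int fwmark_match(uint32_t mark, uint32_t rule_mark, uint32_t mask)
{
	return ((mark ^ rule_mark) & mask) == 0;
}
```

A prefix mask selects the high bits of the mark, which is rarely what one wants for a mark value; that is the sense in which fwmark is not an address.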

- -- 
]   ON HUMILITY: to err is human. To moo, bovine.   |  firewalls  [
]   Michael Richardson,Xelerance Corporation, Ottawa, ON|net architect[
] [EMAIL PROTECTED]  http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [

Re: [PATCH 02/02] add mask options to fwmark masking code

2006-02-23 Thread Michael Richardson



> "Patrick" == Patrick McHardy <[EMAIL PROTECTED]> writes:
>>  #define RTA_FWMARK RTA_PROTOINFO
>> +#define RTA_FWMARK_MASK RTA_CACHEINFO

Patrick> Please introduce a new attribute for this instead of
Patrick> overloading RTA_CACHEINFO.

  I would be happy to do that.
  Should I also un-overload FWMARK, with backwards compatibility?

>> diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
>> index de327b3..69eed89 100644
>> --- a/net/ipv4/fib_rules.c
>> +++ b/net/ipv4/fib_rules.c
>> @@ -68,6 +68,7 @@ struct fib_rule
>>  	u8	r_tos;
>>  #ifdef CONFIG_IP_ROUTE_FWMARK
>>  	u32	r_fwmark;
>> +	u32	r_fwmark_mask;

Patrick> Both patches have whitespace issues. You should also change

  uhm. okay.
  I'm surprised, since I produced it with git-format-patch. Maybe there
are tabs that emacs screwed up.

- -- 
]   ON HUMILITY: to err is human. To moo, bovine.   |  firewalls  [
]   Michael Richardson,Xelerance Corporation, Ottawa, ON|net architect[
] [EMAIL PROTECTED]  http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [



Re: Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call

2006-02-23 Thread Arnaldo Carvalho de Melo

On 2/23/06, Andrew Morton <[EMAIL PROTECTED]> wrote:

> Starting from 2.6.14, defer_accept is moved to request_sock_queue structure,
> which is re-initialized in inet_csk_listen_start().

Oops, looking into it...

- Arnaldo

Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call

2006-02-23 Thread Andrew Morton



Begin forwarded message:

Date: Thu, 23 Feb 2006 07:26:28 -0800
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call


http://bugzilla.kernel.org/show_bug.cgi?id=6121

   Summary: TCP_DEFER_ACCEPT is reset on listen() call
Kernel Version: 2.6.14, 2.6.15
Status: NEW
  Severity: normal
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]


Most recent kernel where this bug did not occur: 2.6.13
Distribution:
Hardware Environment:
Software Environment:
Problem Description:
Value of TCP_DEFER_ACCEPT socket option is reset to zero when listen() is 
called.

Steps to reproduce:
The following program shows the problem:
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);
	int val = 1;
	socklen_t len = sizeof(val);

	setsockopt(s, SOL_TCP, TCP_DEFER_ACCEPT, &val, len);
	listen(s, 1);
	getsockopt(s, SOL_TCP, TCP_DEFER_ACCEPT, &val, &len);
	printf("get TCP_DEFER_ACCEPT = %d\n", val);
	return 0;
}

On <=2.6.13 output is "get TCP_DEFER_ACCEPT = 3";
On >=2.6.14 output is "get TCP_DEFER_ACCEPT = 0".


Starting from 2.6.14, defer_accept is moved to request_sock_queue structure,
which is re-initialized in inet_csk_listen_start().

--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.

[PATCH 2.6.16-rc4] e1000: revert to single descriptor for legacy receive path

2006-02-23 Thread Jesse Brandeburg

A recent patch attempted to enable more efficient memory usage by using 
only 2kB descriptors for jumbo frames.  The method used to implement this 
has since been commented upon as "illegal" and in recent kernels even 
causes a BUG when receiving IP fragments while using jumbo frames. This 
patch simply goes back to the way things were.  We expect some complaints 
to recur due to order 3 allocations failing as a result of this change.
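A rough sketch of the arithmetic behind the "order 3 allocations" remark, following the comment removed further down that dev_alloc_skb() reserves 16 bytes plus 2 for NET_IP_ALIGN. It assumes 4 KiB pages and power-of-two size classes, and ignores any other per-skb overhead, so treat it as an approximation rather than the allocator's real behaviour. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096u  /* assumption: 4 KiB pages */
#define SKB_RESERVE       18u    /* 16 (dev_alloc_skb) + 2 (NET_IP_ALIGN) */

/* Smallest power-of-two size class holding len bytes
 * (a simplification of the size-N kmalloc slabs). */
static size_t slab_size(size_t len)
{
	size_t s = 32;

	while (s < len)
		s <<= 1;
	return s;
}

/* Page order needed to back one rx buffer of buf_len bytes. */
static unsigned int alloc_order(size_t buf_len)
{
	size_t bytes = slab_size(buf_len + SKB_RESERVE);
	unsigned int order = 0;

	assert(bytes > 0);
	while (((size_t)PAGE_SIZE_ASSUMED << order) < bytes)
		order++;
	return order;
}
```

With 2048-byte buffers the 18 extra bytes already push each allocation into the 4096 size class, and a 16384-byte buffer needs 32 KiB of contiguous memory, i.e. an order-3 allocation.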


Signed-off-by: Jesse Brandeburg <[EMAIL PROTECTED]>

---

 drivers/net/e1000/e1000.h  |3 -
 drivers/net/e1000/e1000_main.c |  117 +++-
 2 files changed, 45 insertions(+), 75 deletions(-)

diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index 27c7730..99baf0e 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -225,9 +225,6 @@ struct e1000_rx_ring {
struct e1000_ps_page *ps_page;
struct e1000_ps_page_dma *ps_page_dma;

-   struct sk_buff *rx_skb_top;
-   struct sk_buff *rx_skb_prev;
-
/* cpu for rx queue */
int cpu;

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 31e3329..5b7d0f4 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -103,7 +103,7 @@ static char e1000_driver_string[] = "Int
 #else
 #define DRIVERNAPI "-NAPI"
 #endif
-#define DRV_VERSION "6.3.9-k2"DRIVERNAPI
+#define DRV_VERSION "6.3.9-k4"DRIVERNAPI
 char e1000_driver_version[] = DRV_VERSION;
 static char e1000_copyright[] = "Copyright (c) 1999-2005 Intel Corporation.";

@@ -1635,8 +1635,6 @@ setup_rx_desc_die:

rxdr->next_to_clean = 0;
rxdr->next_to_use = 0;
-   rxdr->rx_skb_top = NULL;
-   rxdr->rx_skb_prev = NULL;

return 0;
 }
@@ -1713,8 +1711,23 @@ e1000_setup_rctl(struct e1000_adapter *a
rctl |= adapter->rx_buffer_len << 0x11;
} else {
rctl &= ~E1000_RCTL_SZ_4096;
-   rctl &= ~E1000_RCTL_BSEX;
-   rctl |= E1000_RCTL_SZ_2048;
+		rctl |= E1000_RCTL_BSEX; 
+		switch (adapter->rx_buffer_len) {

+   case E1000_RXBUFFER_2048:
+   default:
+   rctl |= E1000_RCTL_SZ_2048;
+   rctl &= ~E1000_RCTL_BSEX;
+   break;
+   case E1000_RXBUFFER_4096:
+   rctl |= E1000_RCTL_SZ_4096;
+   break;
+   case E1000_RXBUFFER_8192:
+   rctl |= E1000_RCTL_SZ_8192;
+   break;
+   case E1000_RXBUFFER_16384:
+   rctl |= E1000_RCTL_SZ_16384;
+   break;
+   }
}

 #ifndef CONFIG_E1000_DISABLE_PACKET_SPLIT
@@ -2107,16 +2120,6 @@ e1000_clean_rx_ring(struct e1000_adapter
}
}

-   /* there also may be some cached data in our adapter */
-   if (rx_ring->rx_skb_top) {
-   dev_kfree_skb(rx_ring->rx_skb_top);
-
-   /* rx_skb_prev will be wiped out by rx_skb_top */
-   rx_ring->rx_skb_top = NULL;
-   rx_ring->rx_skb_prev = NULL;
-   }
-
-
size = sizeof(struct e1000_buffer) * rx_ring->count;
memset(rx_ring->buffer_info, 0, size);
size = sizeof(struct e1000_ps_page) * rx_ring->count;
@@ -3106,24 +3109,27 @@ e1000_change_mtu(struct net_device *netd
break;
}

-   /* since the driver code now supports splitting a packet across
-* multiple descriptors, most of the fifo related limitations on
-* jumbo frame traffic have gone away.
-* simply use 2k descriptors for everything.
-*
-* NOTE: dev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
-* means we reserve 2 more, this pushes us to allocate from the next
-* larger slab size
-* i.e. RXBUFFER_2048 --> size-4096 slab */

-   /* recent hardware supports 1KB granularity */
if (adapter->hw.mac_type > e1000_82547_rev_2) {
-   adapter->rx_buffer_len =
-   ((max_frame < E1000_RXBUFFER_2048) ?
-   max_frame : E1000_RXBUFFER_2048);
+   adapter->rx_buffer_len = max_frame;
E1000_ROUNDUP(adapter->rx_buffer_len, 1024);
-   } else
-   adapter->rx_buffer_len = E1000_RXBUFFER_2048;
+   } else {
+   if(unlikely((adapter->hw.mac_type < e1000_82543) &&
+  (max_frame > MAXIMUM_ETHERNET_FRAME_SIZE))) {
+   DPRINTK(PROBE, ERR, "Jumbo Frames not supported "
+   "on 82542\n");
+   return -EINVAL;
+   } else {
+   if(max_frame <= E1000_RXBUFFER_2048)
+   adapter->rx_buffer_len = E1000_RXBUFFER_2048;
+   else if(max_frame <= E1000_RXBUFFER_4096)
+   adapter->rx_buffer_len = E10

Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Patrick McHardy

Chinh Nguyen wrote:
> Patrick McHardy wrote:
> 
>>What values does skb->ip_summed have before that?
> 
> 
> the skb->ip_summed value before the checksum check in tcp_v4_rcv is
> CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect because
> the checksum is with regards to the private IP but the NAT device has
> modified the source IP.

Netfilter recalculates the checksum when NATing it.

> I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input
> (net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap
> (net/ipv4/xfrm4_input.c:101).

The question is why the checksum is invalid. Please start by describing
what you're trying to do.

Re: pktgen + napi == kaboom

2006-02-23 Thread Jesse Brandeburg

On 2/22/06, Simon Kirby <[EMAIL PROTECTED]> wrote:
> Of course, now it doesn't send as fast.  Hrmph. :)  On this older Xeon
> 2.4 Ghz w/533 FSB and e1000 & tg3 @ PCI-X 133 Mhz 64 bit, SMP kernel,
> single pktgen thread, I'm only seeing:
>
> clone_skb=0, 802.1Q tagging, 60 byte:
> e1000: 558526pps 268Mb/sec (268092480bps) errors: 0
> tg3: 621260pps 298Mb/sec (298204800bps) errors: 0
>
> clone_skb=0, no 802.1Q, 60 byte:
> e1000: 664558pps 318Mb/sec (318987840bps) errors: 0
> tg3:   772650pps 370Mb/sec (370872000bps) errors: 0
>
> clone_skb=16384, no 802.1Q, 60 byte:
> e1000: 684206pps 328Mb/sec (328418880bps) errors: 0
> tg3: 1069830pps 513Mb/sec (513518400bps) errors: 0
>
> I tried on an Opteron 140 box and it was faster for both cards, but not
> by much.  oprofile showed a lot of do_gettimeofday, so I hacked a bunch
> of calls out of pktgen -- I noticed the CPU time shifted around but the
> throughput was still the same as before, as if it's card or bus limited.
>
> Why is it so difficult to actually get 1 Gbps of small packets?  I also
> tried changing ring buffer sizes, txqueuelen, interrupt coalescing
> settings, etc... all I was able to do was make it slower or very
> slightly faster.

It's difficult because you have *loads* of transactions going over the
bus. Linux's one-packet-at-a-time transmit methodology also
exacerbates this, as we're unable to coalesce transmit tail (TDT)
writes to the bus and are probably interrupting the previous DMA *a
lot* (see thread below)
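For reference, pktgen's bps figures quoted above are just pps x 60 bytes x 8 (e.g. 558526 x 480 = 268092480), so they do not count Ethernet framing. A quick sketch of that arithmetic plus the theoretical packet ceiling of a 1 Gbps link, assuming 4 bytes of FCS, 8 bytes of preamble/SFD and a 12-byte inter-frame gap on top of each 60-byte packet; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* pktgen-style rate: packets/s x packet bytes x 8 bits. */
static uint64_t bits_per_sec(uint64_t pps, uint64_t pkt_bytes)
{
	return pps * pkt_bytes * 8;
}

/* Max packet rate on a link once per-frame wire overhead is added:
 * 4-byte FCS + 8-byte preamble/SFD + 12-byte inter-frame gap. */
static uint64_t line_rate_pps(uint64_t link_bps, uint64_t pkt_bytes)
{
	uint64_t wire_bytes = pkt_bytes + 4 + 8 + 12;

	assert(wire_bytes > 0);
	return link_bps / (wire_bytes * 8);
}
```

For 60-byte packets the ceiling comes out at 1488095 pps, which is only about 714 Mb/sec in pktgen's own units (1488095 x 480), so "1 Gbps" as pktgen counts it is out of reach with 60-byte packets even at line rate.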

You'll probably be able to get a little better throughput by switching
to a UP kernel, but from my experience you're getting pretty close to
the max for pci-x e1000 adapters.  There are some previous messages
about this like:
http://oss.sgi.com/projects/netdev/archive/2004-12/msg00017.html

beware the hardware bug when enabling TXDMAC!

Jesse

Re: ipw2200 tester needed

2006-02-23 Thread Michael Buesch

On Thursday 23 February 2006 17:17, you wrote:
> In reviewing the ieee80211 stack in order to add additional geographic
> support for wireless drivers, I have studied all the in-kernel wireless
> drivers for their interactions with the routines in ieee80211_geo.c. As
> clearly stated in the comments, ipw2200.c duplicates most of those
> routines, even though ieee80211 is required to use ipw2200. Obviously,
> this bloats both the source code and the binaries for any user of
> ipw2200. I am planning to develop a patch to have ipw2200 use the
> ieee80211 code; however, I do not have the necessary hardware to test
> the result.
> 
> Is anyone interested in testing this patch for me? Are there any comments 
> regarding this change?

Provide the patch, and I will see what I can do.

-- 
Greetings Michael.



ipw2200 tester needed

2006-02-23 Thread Larry Finger

In reviewing the ieee80211 stack in order to add additional geographic support for wireless drivers, 
I have studied all the in-kernel wireless drivers for their interactions with the routines in 
ieee80211_geo.c. As clearly stated in the comments, ipw2200.c duplicates most of those routines, 
even though ieee80211 is required to use ipw2200. Obviously, this bloats both the source code and 
the binaries for any user of ipw2200. I am planning to develop a patch to have ipw2200 use the 
ieee80211 code; however, I do not have the necessary hardware to test the result.


Is anyone interested in testing this patch for me? Are there any comments 
regarding this change?

Thanks,

Larry

Re: RFC: fix first packet goes out with MAC 00:00:00:00:00:00

2006-02-23 Thread jamal

On Thu, 2006-23-02 at 17:41 +0300, Alexey Kuznetsov wrote:

> After some thinking I suspect the deletion of this chunk could change
> behaviour of some parts which do not use neighbour cache f.e. packet socket.
> 

Thanks Alexey, this was what i was worried about ;->

> 
> I think safer approach would be to move this chunk after if (daddr).
> And the possibility to remove this completely could be analyzed later.
> 

Ok, patch attached. Dave this also is needed for 2.6.16-rcXX.

Tested against a standard eth device (e1000) and tuntap.

cheers,
jamal


For Ethernet-like netdevices, don't overwrite the first packet's dst
MAC address when it is already resolved.

Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>
---

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 9890fd9..c971f14 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -95,6 +95,12 @@ int eth_header(struct sk_buff *skb, stru
saddr = dev->dev_addr;
memcpy(eth->h_source,saddr,dev->addr_len);
 
+   if(daddr)
+   {
+   memcpy(eth->h_dest,daddr,dev->addr_len);
+   return ETH_HLEN;
+   }
+   
/*
 *  Anyway, the loopback-device should never use this function... 
 */
@@ -105,12 +111,6 @@ int eth_header(struct sk_buff *skb, stru
return ETH_HLEN;
}

-   if(daddr)
-   {
-   memcpy(eth->h_dest,daddr,dev->addr_len);
-   return ETH_HLEN;
-   }
-   
return -ETH_HLEN;
 }
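A simplified user-space model of the reordering, assuming (as the subject line suggests) that the elided check above the hunk zeroes h_dest for loopback/IFF_NOARP devices and returns early. FLAG_NOARP stands in for IFF_LOOPBACK|IFF_NOARP, and the function and macro names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN_MODEL 6
#define ETH_HLEN_MODEL 14
#define FLAG_NOARP     0x1  /* stands in for IFF_LOOPBACK|IFF_NOARP */

/* Before the patch: the NOARP check runs first and zeroes the
 * destination even when the caller already supplied daddr. */
static int fill_dest_old(unsigned char *dest, const unsigned char *daddr,
			 unsigned int flags)
{
	if (flags & FLAG_NOARP) {
		memset(dest, 0, ETH_ALEN_MODEL);
		return ETH_HLEN_MODEL;
	}
	if (daddr) {
		memcpy(dest, daddr, ETH_ALEN_MODEL);
		return ETH_HLEN_MODEL;
	}
	return -ETH_HLEN_MODEL;  /* still needs resolution */
}

/* After the patch: an already-resolved daddr wins. */
static int fill_dest_new(unsigned char *dest, const unsigned char *daddr,
			 unsigned int flags)
{
	if (daddr) {
		memcpy(dest, daddr, ETH_ALEN_MODEL);
		return ETH_HLEN_MODEL;
	}
	if (flags & FLAG_NOARP) {
		memset(dest, 0, ETH_ALEN_MODEL);
		return ETH_HLEN_MODEL;
	}
	return -ETH_HLEN_MODEL;
}
```

The NOARP zeroing is kept, only moved behind the daddr check, which is the "safer approach" rather than deleting the chunk outright.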

Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Chinh Nguyen

Patrick McHardy wrote:
> Chinh Nguyen wrote:
> 
>>I discovered that the "bug" is in the function tcp_v4_rcv for kernel
>>2.6.16-rc1.
>>
>>After the ESP packet is decapped and decrypted in xfrm4_rcv_encap_finish, the
>>unencrypted packet is pushed back through ip_local_deliver. For a UDP packet,
>>it goes (back) to function udp_queue_rcv_skb. The first thing this function
>>does is call xfrm4_policy_check. As noted previously, in xfrm4_policy_check,
>>if skb->sp != NULL, the esp_post_input function is called. The post input
>>function sets skb->ip_summed to CHECKSUM_UNNECESSARY if we are in transport
>>mode. Therefore, further down in udp_queue_rcv_skb, we skip the checksum
>>check and the packet is passed up the stack.
>>
>>However, for a decrypted TCP packet, the packet goes to tcp_v4_rcv. This
>>function does the checksum check right away if skb->ip_summed !=
>>CHECKSUM_UNNECESSARY, while xfrm4_policy_check is called a little later in
>>the function. Therefore, the esp post input has not yet set ip_summed to
>>CHECKSUM_UNNECESSARY. The decrypted packet fails the checksum and is
>>discarded.
>>
>>To confirm this, I added another call to xfrm4_policy_check before the
>>checksum check in tcp_v4_rcv (to call esp post input). Once patched, my
>>systems were able to initiate TCP connections using Transport Mode/NAT.
> 
> 
> What values does skb->ip_summed have before that?

The skb->ip_summed value before the checksum check in tcp_v4_rcv is
CHECKSUM_NONE. Hence tcp_v4_rcv verifies the checksum and finds it incorrect,
because the checksum was computed with respect to the private IP, but the NAT
device has modified the source IP. I believe that skb->ip_summed is set to
CHECKSUM_NONE by esp_input (net/ipv4/esp4.c:180), which is called by
xfrm4_rcv_encap (net/ipv4/xfrm4_input.c:101).
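The ordering problem described above can be reduced to a toy model. All names are hypothetical stand-ins: model_policy_check() plays the role of xfrm4_policy_check() calling esp_post_input(), and csum_matches stands for whether the TCP checksum verifies against the packet as received, which, per the explanation above, it does not after NAT has rewritten the source address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model; the names echo, but are not, kernel code. */
enum model_csum { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_UNNECESSARY };

struct model_skb {
	enum model_csum ip_summed;
	bool has_sec_path;     /* skb->sp != NULL after ESP decapsulation */
	bool transport_mode;   /* ESP transport mode (the NAT case)       */
	bool csum_matches;     /* does the TCP checksum verify as-is?     */
};

/* Stand-in for xfrm4_policy_check() -> esp_post_input(). */
static void model_policy_check(struct model_skb *skb)
{
	assert(skb != NULL);
	if (skb->has_sec_path && skb->transport_mode)
		skb->ip_summed = MODEL_CHECKSUM_UNNECESSARY;
}

static bool model_csum_ok(const struct model_skb *skb)
{
	return skb->ip_summed == MODEL_CHECKSUM_UNNECESSARY || skb->csum_matches;
}

/* Order described for tcp_v4_rcv(): checksum first, policy check later. */
static bool model_tcp_rcv(struct model_skb *skb)
{
	if (!model_csum_ok(skb))
		return false;          /* packet dropped */
	model_policy_check(skb);
	return true;
}

/* Order used by udp_queue_rcv_skb() and by the experiment described above:
 * policy check first, so the checksum test is skipped. */
static bool model_tcp_rcv_policy_first(struct model_skb *skb)
{
	model_policy_check(skb);
	return model_csum_ok(skb);
}
```

model_tcp_rcv() drops the packet because the checksum is tested while ip_summed is still CHECKSUM_NONE; model_tcp_rcv_policy_first() accepts it, mirroring the extra xfrm4_policy_check() call.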

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: RFC: fix first packet goes out with MAC 00:00:00:00:00:00

2006-02-23 Thread Alexey Kuznetsov

Hello!

> All devices including loopback pass a daddr. loopback in fact passes
> a 0 all the time ;-> 
> This means i can delete the check totaly or i can remove the IFF_NOARP
...
> Anyone knows the history?

I think it was me who did this crap. It was so long ago that I do not remember
why it was done.

I remember some trouble with the dummy device. It apparently tried to resolve
addresses without success and generated errors instead of blackholing. I think
the problem was eventually solved at the neighbour level.

After some thinking, I suspect that deleting this chunk could change the
behaviour of parts which do not use the neighbour cache, e.g. packet sockets.


I think a safer approach would be to move this chunk after if (daddr).
The possibility of removing it completely could be analyzed later.

Alexey

RFC: fix first packet goes out with MAC 00:00:00:00:00:00

2006-02-23 Thread jamal


This drove me nuts this morning, and I find it hard to believe that
no one has reported it before, because I went as far back as 2.4.2
and it is there ;->. I am CCing the three people who may possibly
have made this change (there are no records whatsoever in git) ;->

When you turn off ARP on a netdevice, the first packet always
goes out with a destination MAC of all zeroes. This is because the first
packet is used to resolve ARP entries. Even when the ARP entry is already
resolved (I tried setting a static ARP entry for a host I was
pinging from), it gets overwritten because the netdevice has ARP
disabled.
Subsequent packets go out fine with the correct destination MAC address
(which may be why nobody has bothered to report this issue).

To cut the story short: 
the culprit code is in net/ethernet/eth.c::eth_header()


	/*
	 *  Anyway, the loopback-device should never use this function...
	 */

	if (dev->flags & (IFF_LOOPBACK|IFF_NOARP))
	{
		memset(eth->h_dest, 0, dev->addr_len);
		return ETH_HLEN;
	}

	if(daddr)
	{
		memcpy(eth->h_dest,daddr,dev->addr_len);
		return ETH_HLEN;
	}



Note how h_dest is reset when the device has IFF_NOARP set.
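To see the effect in isolation, here is a minimal userspace model of the destination-address part of the quoted code. model_fill_dest_old() and the MODEL_* constants are hypothetical (the flag values mirror IFF_LOOPBACK and IFF_NOARP, but nothing depends on that). With IFF_NOARP set, a resolved daddr is ignored and the destination comes out as all zeroes.

```c
#include <assert.h>
#include <string.h>

#define MODEL_ETH_ALEN     6
#define MODEL_ETH_HLEN     14
#define MODEL_IFF_LOOPBACK 0x8
#define MODEL_IFF_NOARP    0x80

/* Destination logic of eth_header() in the order quoted above.
 * Hypothetical model: the device flags and daddr are passed in directly. */
static int model_fill_dest_old(unsigned char *h_dest, unsigned int dev_flags,
			       const unsigned char *daddr)
{
	assert(h_dest != NULL);

	/* Tested first, so a NOARP device never reaches the daddr copy. */
	if (dev_flags & (MODEL_IFF_LOOPBACK | MODEL_IFF_NOARP)) {
		memset(h_dest, 0, MODEL_ETH_ALEN);
		return MODEL_ETH_HLEN;
	}

	if (daddr) {
		memcpy(h_dest, daddr, MODEL_ETH_ALEN);
		return MODEL_ETH_HLEN;
	}

	return -MODEL_ETH_HLEN;
}
```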

The only reason I am asking is that this small piece of code has
a huge impact and I don't understand the history of why the IFF_NOARP
check was put there.

As a note:
All devices, including loopback, pass a daddr. Loopback in fact passes
a 0 all the time ;-> 
This means I can either delete the check totally or remove just the IFF_NOARP test.

Does anyone know the history?

cheers,
jamal




Re: [PATCH] Uninline kfree_skb and allow NULL argument

2006-02-23 Thread Sven Schuster


Hello,

> --- kfree_skb/include/linux/skbuff.h~kfree_skb_uninline_null  2006-02-23 13:35:05.0 +0100
> +++ kfree_skb/include/linux/skbuff.h  2006-02-23 13:36:23.0 +0100
> @@ -306,6 +306,7 @@ struct sk_buff {
>  
>  #include 
>  
> +void kfree_skb(struct sk_buff *skb);
>  extern void __kfree_skb(struct sk_buff *skb);
>  extern struct sk_buff *__alloc_skb(unsigned int size,
>  gfp_t priority, int fclone);
> @@ -406,22 +407,6 @@ static inline struct sk_buff *skb_get(st
>   */
>  
>  /**
> - *   kfree_skb - free an sk_buff
> - *   @skb: buffer to free
> - *
> - *   Drop a reference to the buffer and free it if the usage count has
> - *   hit zero.
> - */
> -static inline void kfree_skb(struct sk_buff *skb)
> -{
> -	if (likely(atomic_read(&skb->users) == 1))
> -		smp_rmb();
> -	else if (likely(!atomic_dec_and_test(&skb->users)))
> -		return;
> -	__kfree_skb(skb);
> -}
> -
> -/**
>   *   skb_cloned - is the buffer a clone
>   *   @skb: buffer to check
>   *
> --- kfree_skb/net/core/skbuff.c~kfree_skb_uninline_null   2006-02-23 13:35:05.0 +0100
> +++ kfree_skb/net/core/skbuff.c   2006-02-23 13:37:01.0 +0100
> @@ -355,6 +355,24 @@ void __kfree_skb(struct sk_buff *skb)
>  }
>  
>  /**
> + *   kfree_skb - free an sk_buff
> + *   @skb: buffer to free
> + *
> + *   Drop a reference to the buffer and free it if the usage count has
> + *   hit zero.
> + */
> +void kfree_skb(struct sk_buff *skb)
> +{
> +	if (unlikely(!skb))
> +		return;
> +	if (likely(atomic_read(&skb->users) == 1))
> +		smp_rmb();
> +	else if (likely(!atomic_dec_and_test(&skb->users)))
> +		return;
> +	__kfree_skb(skb);
> +}
> +

Just thinking about it a little bit: why not un-inline the current
kfree_skb as, say, _kfree_skb, and make a new inlined kfree_skb
which just does

static inline void kfree_skb(struct sk_buff *skb)
{
	if (unlikely(!skb))
		return;
	_kfree_skb(skb);
}

This way the kernel with the new inlined kfree_skb should still become
smaller while not calling the un-inlined _kfree_skb if skb is
NULL...?? (_should_ become smaller is a claim I make without any
proof, sorry...)
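Sven's split can be sketched in userspace to show where each piece ends up. model_buf, model_kfree() and model_kfree_slow() are hypothetical names, and C11 atomics stand in for the kernel's atomic_t helpers; this illustrates the idea and is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only the reference count. */
struct model_buf {
	atomic_int users;
};

static int model_frees;  /* number of buffers actually freed */

/* Out-of-line part: the body of the current kfree_skb(). In the kernel this
 * would live in net/core/skbuff.c so that only one copy of it exists. */
static void model_kfree_slow(struct model_buf *b)
{
	assert(atomic_load(&b->users) > 0);

	/* Sole owner: skip the atomic decrement (the kernel pairs this fast path
	 * with smp_rmb(); a seq_cst atomic_load is used in its place here).
	 * Otherwise drop one reference and return unless it was the last one. */
	if (atomic_load(&b->users) != 1 &&
	    atomic_fetch_sub(&b->users, 1) != 1)
		return;

	model_frees++;
	free(b);
}

/* Inlined wrapper: only the NULL test is expanded at each call site. */
static inline void model_kfree(struct model_buf *b)
{
	if (!b)
		return;
	model_kfree_slow(b);
}
```

Each call site then carries just a compare and a call, while the reference-count logic exists once, which is the size trade-off discussed in this thread.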


Sven

-- 
Linux zion.homelinux.com 2.6.16-rc3-mm1_27 #27 Wed Feb 15 17:51:36 CET 2006 
i686 athlon i386 GNU/Linux
 14:14:50 up 5 days, 18:30,  1 user,  load average: 0.14, 0.11, 0.17



Re: [RFC] Some infrastructure for interrupt-less TX

2006-02-23 Thread Jörn Engel

On Thu, 23 February 2006 02:00:50 -0800, David S. Miller wrote:
> 
> > This breaks socket buffer accounting.
> 
> That's why he's dropping the SKB sans the data.

There doesn't appear to be any fundamental opposition.  David, should
I turn this mess into a decent patch, convert one driver, do some
extensive testing and send it to you [1]?

[1] Provided that I get bored on a rainy day and actually sit down and
do it, of course.  It's not a high priority.

Jörn

-- 
You ain't got no problem, Jules. I'm on the motherfucker. Go back in
there, chill them niggers out and wait for the Wolf, who should be
coming directly.
-- Marsellus Wallace

[PATCH] Uninline kfree_skb and allow NULL argument

2006-02-23 Thread Jörn Engel

On Thu, 23 February 2006 22:26:01 +1100, Herbert Xu wrote:
> On Thu, Feb 23, 2006 at 12:22:31PM +0100, Jörn Engel wrote:
> >
> > Should I merge the two patches into one and resend?
> 
> Sounds good.

Here it is.

Jörn

-- 
Fancy algorithms are buggier than simple ones, and they're much harder
to implement. Use simple algorithms as well as simple data structures.
-- Rob Pike


o Uninline kfree_skb, which saves some 15k of object code on my notebook.

o Allow kfree_skb to be called with a NULL argument.

  Subsequent patches can remove the conditionals from drivers and further
  reduce source and object size.

Signed-off-by: Jörn Engel <[EMAIL PROTECTED]>
---

 include/linux/skbuff.h |   17 +
 net/core/skbuff.c  |   18 ++
 2 files changed, 19 insertions(+), 16 deletions(-)

--- kfree_skb/include/linux/skbuff.h~kfree_skb_uninline_null	2006-02-23 13:35:05.0 +0100
+++ kfree_skb/include/linux/skbuff.h	2006-02-23 13:36:23.0 +0100
@@ -306,6 +306,7 @@ struct sk_buff {
 
 #include 
 
+void kfree_skb(struct sk_buff *skb);
 extern void   __kfree_skb(struct sk_buff *skb);
 extern struct sk_buff *__alloc_skb(unsigned int size,
   gfp_t priority, int fclone);
@@ -406,22 +407,6 @@ static inline struct sk_buff *skb_get(st
  */
 
 /**
- * kfree_skb - free an sk_buff
- * @skb: buffer to free
- *
- * Drop a reference to the buffer and free it if the usage count has
- * hit zero.
- */
-static inline void kfree_skb(struct sk_buff *skb)
-{
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
-		return;
-	__kfree_skb(skb);
-}
-
-/**
  * skb_cloned - is the buffer a clone
  * @skb: buffer to check
  *
--- kfree_skb/net/core/skbuff.c~kfree_skb_uninline_null 2006-02-23 13:35:05.0 +0100
+++ kfree_skb/net/core/skbuff.c 2006-02-23 13:37:01.0 +0100
@@ -355,6 +355,24 @@ void __kfree_skb(struct sk_buff *skb)
 }
 
 /**
+ * kfree_skb - free an sk_buff
+ * @skb: buffer to free
+ *
+ * Drop a reference to the buffer and free it if the usage count has
+ * hit zero.
+ */
+void kfree_skb(struct sk_buff *skb)
+{
+	if (unlikely(!skb))
+		return;
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return;
+	__kfree_skb(skb);
+}
+
+/**
  * skb_clone   -   duplicate an sk_buff
  * @skb: buffer to clone
  * @gfp_mask: allocation priority
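The second point is about callers. A minimal sketch, assuming a hypothetical driver that keeps skbs in a TX ring whose empty slots are NULL; model_skb, model_kfree_skb() and model_clean_tx_ring() are illustrative names, and the plain int counter ignores the SMP ordering the real function handles with atomics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, single-threaded stand-in for struct sk_buff. */
struct model_skb {
	int users;
};

static int model_freed;  /* counts buffers actually freed */

/* NULL-tolerant free, modelled on the semantics of the patched kfree_skb(). */
static void model_kfree_skb(struct model_skb *skb)
{
	if (!skb)
		return;
	assert(skb->users > 0);
	if (--skb->users == 0) {
		model_freed++;
		free(skb);
	}
}

/* Typical teardown loop: with a NULL-tolerant free, the per-slot
 * "if (ring[i])" test that callers write today can be dropped. */
static void model_clean_tx_ring(struct model_skb **ring, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		model_kfree_skb(ring[i]);
		ring[i] = NULL;
	}
}
```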

Re: [PATCH] Uninline kfree_skb and allow NULL argument

2006-02-23 Thread Jörn Engel

On Thu, 23 February 2006 13:52:59 +0100, Jörn Engel wrote:
>  
> +void kfree_skb(struct sk_buff *skb);
>  extern void __kfree_skb(struct sk_buff *skb);
>  extern struct sk_buff *__alloc_skb(unsigned int size,

And while we're in the area...is there a good reason why all function
declarations have the "extern" added?
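For reference, in C the keyword is redundant on function declarations: a function declaration with no storage-class specifier has its linkage determined as if it had been declared extern (C99/C11 section 6.2.2). A minimal illustration with a made-up function name:

```c
#include <assert.h>

/* Both lines declare the same function with external linkage: with no
 * storage-class specifier, a function declaration behaves as if "extern"
 * had been written (C99/C11 6.2.2). Repeating a compatible declaration
 * is allowed, so this compiles cleanly. */
extern int model_add(int a, int b);
int model_add(int a, int b);

int model_add(int a, int b)
{
	return a + b;
}
```

So its presence on prototypes in headers such as skbuff.h is a matter of convention rather than semantics.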

Jörn

-- 
He that composes himself is wiser than he that composes a book.
-- B. Franklin

Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Herbert Xu

On Thu, Feb 23, 2006 at 12:22:31PM +0100, Jörn Engel wrote:
>
> Should I merge the two patches into one and resend?

Sounds good.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Jörn Engel

On Thu, 23 February 2006 03:11:12 -0800, David S. Miller wrote:
> 
> > Now there's a good idea.  After all, the great majority of callers
> > of kfree_skb expect to free the skb.  Dave, what do you think?
> 
> Absolutely.

Should I merge the two patches into one and resend?

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack

Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread David S. Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 23 Feb 2006 21:55:43 +1100

> Now there's a good idea.  After all, the great majority of callers
> of kfree_skb expect to free the skb.  Dave, what do you think?

Absolutely.

Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Herbert Xu

On Thu, Feb 23, 2006 at 11:50:41AM +0100, Jörn Engel wrote:
> 
> For my kernel, there would be 92 removals of the conditional, at the
> price of 135 bytes of extra object code.  Some of the removals would
> be in modules, so the numbers are not exactly fair.

IMHO source saving is cheap while binary bloat isn't.

> Another interesting question is: Why is kfree_skb inline in the first
> place?  After uninlining it, my patch would debloat both source and
> object code by a bit:
> 
> -rwxr-xr-x  1 joern src   4824435 Feb 23 11:46 vmlinux
> 
> 12157 bytes gained.  Plus a bit more when the 92 conditionals are
> removed.

Now there's a good idea.  After all, the great majority of callers
of kfree_skb expect to free the skb.  Dave, what do you think?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Jörn Engel

On Thu, 23 February 2006 21:10:37 +1100, Herbert Xu wrote:
> On Thu, Feb 23, 2006 at 10:54:46AM +0100, Jörn Engel wrote:
> > 
> > Wrt. the binary, you have a point.  For source code, my patch does not
> > add any new bloat and allows removal of the existing.  Lemme do a quick
> 
> Well I just did a grep in net/*/*.c and it seems that the number of
> calls to kfree_skb preceded by a NULL check is a small minority.  So
> I don't see the point of this as we'll be trading a very small amount
> of source code savings for the bloating (albeit small) of the binary.

For my kernel, there would be 92 removals of the conditional, at the
price of 135 bytes of extra object code.  Some of the removals would
be in modules, so the numbers are not exactly fair.

Another interesting question is: Why is kfree_skb inline in the first
place?  After uninlining it, my patch would debloat both source and
object code by a bit:

-rwxr-xr-x  1 joern src   4824435 Feb 23 11:46 vmlinux

12157 bytes gained.  Plus a bit more when the 92 conditionals are
removed.

Jörn

-- 
Optimizations always bust things, because all optimizations are, in
the long haul, a form of cheating, and cheaters eventually get caught.
-- Larry Wall 

--- linux-2.6.14-rc3cow/include/linux/skbuff.h~uninline_kfree_skb	2006-02-23 11:40:30.0 +0100
+++ linux-2.6.14-rc3cow/include/linux/skbuff.h	2006-02-23 11:41:38.0 +0100
@@ -302,6 +302,7 @@ struct sk_buff {
 
 #include 
 
+void kfree_skb(struct sk_buff *skb);
 extern void   __kfree_skb(struct sk_buff *skb);
 extern struct sk_buff *__alloc_skb(unsigned int size,
   unsigned int __nocast priority, int fclone);
@@ -397,24 +398,6 @@ static inline struct sk_buff *skb_get(st
  */
 
 /**
- * kfree_skb - free an sk_buff
- * @skb: buffer to free
- *
- * Drop a reference to the buffer and free it if the usage count has
- * hit zero.
- */
-static inline void kfree_skb(struct sk_buff *skb)
-{
-	if (unlikely(!skb))
-		return;
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
-		return;
-	__kfree_skb(skb);
-}
-
-/**
  * skb_cloned - is the buffer a clone
  * @skb: buffer to check
  *
--- linux-2.6.14-rc3cow/net/core/skbuff.c~uninline_kfree_skb	2006-01-18 14:56:05.0 +0100
+++ linux-2.6.14-rc3cow/net/core/skbuff.c	2006-02-23 11:41:56.0 +0100
@@ -350,6 +350,24 @@ void __kfree_skb(struct sk_buff *skb)
 }
 
 /**
+ * kfree_skb - free an sk_buff
+ * @skb: buffer to free
+ *
+ * Drop a reference to the buffer and free it if the usage count has
+ * hit zero.
+ */
+void kfree_skb(struct sk_buff *skb)
+{
+	if (unlikely(!skb))
+		return;
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return;
+	__kfree_skb(skb);
+}
+
+/**
  * skb_clone   -   duplicate an sk_buff
  * @skb: buffer to clone
  * @gfp_mask: allocation priority

Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Herbert Xu

On Thu, Feb 23, 2006 at 10:54:46AM +0100, Jörn Engel wrote:
> 
> Wrt. the binary, you have a point.  For source code, my patch does not
> add any new bloat and allows removal of the existing.  Lemme do a quick

Well I just did a grep in net/*/*.c and it seems that the number of
calls to kfree_skb preceded by a NULL check is a small minority.  So
I don't see the point of this as we'll be trading a very small amount
of source code savings for the bloating (albeit small) of the binary.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [RFC] Some infrastructure for interrupt-less TX

2006-02-23 Thread David S. Miller

From: Lennert Buytenhek <[EMAIL PROTECTED]>
Date: Thu, 23 Feb 2006 10:55:21 +0100

> This breaks socket buffer accounting.

That's why he's dropping the SKB sans the data.

Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Jörn Engel

On Thu, 23 February 2006 19:28:49 +1100, Herbert Xu wrote:
> On Thu, Feb 23, 2006 at 07:53:36AM +0100, Jörn Engel wrote:
> > 
> > How is that argument special for kfree_skb?  Both libc free and kfree
> > ignore NULL arguments and do so for good reasons.
> 
> Well with kfree there is actually a slight gain in that you are doing
> the check in one place.
> 
> kfree_skb, on the other hand, is inlined, so you're actually adding
> bloat to many places that simply don't need it.

Wrt. the binary, you have a point.  For source code, my patch does not
add any new bloat and allows removal of the existing.  Lemme do a quick
measurement for the kernel I run on my machine:

-rwxr-xr-x  1 joern src   4836592 Feb 23 10:43 vmlinux
-rwxr-xr-x  1 joern src   4836727 Feb 23 10:19 vmlinux.kfree_null

135 bytes added by my patch.  Not that much.

Jörn

-- 
He who knows others is wise.
He who knows himself is enlightened.
-- Lao Tsu

Re: [RFC] Some infrastructure for interrupt-less TX

2006-02-23 Thread Jörn Engel

On Thu, 23 February 2006 10:55:21 +0100, Lennert Buytenhek wrote:
> On Thu, Feb 23, 2006 at 08:00:32AM +0100, Jörn Engel wrote:
> 
> > > I am assuming the real goal is avoiding interrupts when
> > > transmit completions can be reported without them on a
> > > reasonably periodic basis.
> > 
> > Not necessarily on a periodic basis.  For some network driver I once
> > worked on, the hardware simply had a ring buffer of n frames.
> > Whenever an (n+1)th frame was transmitted, the first would be checked for
> > completion.  If it was completed, it was freed, else the new frame was
> > dropped (and freed).
> 
> This breaks socket buffer accounting.

Only if you keep the skb as well.  Read the patch.  The point is to
free the skb but keep the packet data.
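A toy userspace model of the scheme under discussion: frames are reclaimed lazily when the ring wraps instead of on a completion interrupt, and only the packet data stays in the ring. The ring size, model_xmit() and the hw_done flag (standing in for whatever completion bit the hardware's descriptors expose) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_RING_SIZE 4   /* "n" in the description above */

/* Hypothetical ring entry: only the packet data is kept, not the skb. */
struct model_slot {
	void *data;
	bool hw_done;   /* set by the (simulated) hardware on completion */
};

struct model_ring {
	struct model_slot slot[MODEL_RING_SIZE];
	unsigned int next;            /* oldest entry, i.e. the slot to reuse */
	unsigned int skbs_released;   /* skbs handed back at transmit time */
	unsigned int drops;
};

/* Transmit without a completion interrupt: completed frames are reclaimed
 * when their slot is needed again. Returns 0 if queued, -1 if dropped. */
static int model_xmit(struct model_ring *r, void *data)
{
	struct model_slot *s = &r->slot[r->next];

	assert(data != NULL);

	/* The skb itself is released at once; only its data stays in the ring. */
	r->skbs_released++;

	if (s->data) {
		if (!s->hw_done) {
			/* Oldest frame still in flight: drop the new one. */
			free(data);
			r->drops++;
			return -1;
		}
		free(s->data);   /* completed: reclaim its buffer now */
	}
	s->data = data;
	s->hw_done = false;
	r->next = (r->next + 1) % MODEL_RING_SIZE;
	return 0;
}
```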

Jörn

-- 
Optimizations always bust things, because all optimizations are, in
the long haul, a form of cheating, and cheaters eventually get caught.
-- Larry Wall 

Re: [RFC] Some infrastructure for interrupt-less TX

2006-02-23 Thread Lennert Buytenhek

On Thu, Feb 23, 2006 at 08:00:32AM +0100, Jörn Engel wrote:

> > I am assuming the real goal is avoiding interrupts when
> > transmit completions can be reported without them on a
> > reasonably periodic basis.
> 
> Not necessarily on a periodic basis.  For some network driver I once
> worked on, the hardware simply had a ring buffer of n frames.
> Whenever an (n+1)th frame was transmitted, the first would be checked for
> completion.  If it was completed, it was freed, else the new frame was
> dropped (and freed).

This breaks socket buffer accounting.

Re: [PATCH] prism54usb: compile fix

2006-02-23 Thread Pete Zaitcev

On Mon, 20 Feb 2006 20:39:16 +0100, Carlos Martin <[EMAIL PROTECTED]> wrote:

> diff --git a/drivers/net/wireless/prism54usb/isl_sm.h 
> b/drivers/net/wireless/prism54usb/isl_sm.h
> index 9e41587..c39bb48 100644
> --- a/drivers/net/wireless/prism54usb/isl_sm.h
> +++ b/drivers/net/wireless/prism54usb/isl_sm.h
> @@ -249,7 +249,7 @@ extern int  islsm_wait_timeo
>  
>  /* now the helper functions, for sending packets */
>  int islsm_outofband_msg(struct net_device *netdev,
> - void *buf, unsigned int size);
> + void *buf, size_t size);

I have it in my tree already. Something is inconsistent somewhere. Weird.

-- Pete

Re: no carrier detection after resume from swsusp (8139too)

2006-02-23 Thread Will Stephenson

On Wednesday 22 February 2006 16:19, Robert Love wrote:
> e100 or e1000?

8139cp here.  It seems to have picked up this behaviour around SL10.1beta2,
and it is still there in beta4.

See https://bugzilla.novell.com/show_bug.cgi?id=151892

> `carrier' returns EINVAL if the device is not UP.  It might be a bug in
> NM if the device is not UP after a resume.  What does `ifconfig eth1`
> show before and after a resume?

carrier is 1 before and after, and eth0 is UP before and after. Restarting NM
doesn't help, nor does stopping NM, rmmod, modprobe, and starting NM.

I didn't think it was NM-related, as I could not configure the network by hand
after resume. However, I just switched my network config to SUSE's
traditional ifup+ifplugd (joys of flexibility), and although eth0 does not
work on resume, rcnetwork restart fixes it, whereas when NM is in charge,
this does not help. So my understanding is that NM is not the direct cause
but is a contributing factor.
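For comparing the carrier state before and after a resume without going through NetworkManager, the value can be read from sysfs directly. read_carrier() is a hypothetical helper written for this note; the EINVAL-when-down behaviour is taken from the explanation quoted above rather than verified here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: read /sys/class/net/<ifname>/carrier.
 * Returns 1 (carrier present), 0 (no carrier), or -1 with errno set:
 * ENOENT for an unknown interface, or whatever the failed read reported
 * (EINVAL while the device is down, according to the explanation above). */
static int read_carrier(const char *ifname)
{
	char path[256];
	int n, value, saved;
	FILE *f;

	assert(ifname != NULL);
	n = snprintf(path, sizeof(path), "/sys/class/net/%s/carrier", ifname);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	f = fopen(path, "r");
	if (!f)
		return -1;              /* errno set by fopen() */

	errno = 0;
	if (fscanf(f, "%d", &value) != 1) {
		saved = errno ? errno : EIO;
		fclose(f);
		errno = saved;
		return -1;
	}
	fclose(f);
	return value != 0;
}
```

For example, read_carrier("eth0") should return 1 or 0 while the interface is up.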

Will

eth0  Link encap:Ethernet  HWaddr 00:02:3F:67:0A:E3  
  inet addr:169.254.137.164  Bcast:169.254.255.255  Mask:255.255.0.0
  inet6 addr: fe80::202:3fff:fe67:ae3/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:2769 errors:0 dropped:3144 overruns:0 frame:0
  TX packets:180 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:285833 (279.1 Kb)  TX bytes:14752 (14.4 Kb)
  Interrupt:10 Base address:0x2000 

eth1  Link encap:Ethernet  HWaddr 00:0C:F1:13:76:CB  
  UP BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  Interrupt:5 Memory:9000-9fff 

lo        Link encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:277 errors:0 dropped:0 overruns:0 frame:0
  TX packets:277 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0 
  RX bytes:21838 (21.3 Kb)  TX bytes:21838 (21.3 Kb)

eth0  Link encap:Ethernet  HWaddr 00:02:3F:67:0A:E3  
  inet addr:10.10.101.143  Bcast:10.10.255.255  Mask:255.255.0.0
  inet6 addr: 2001:780:101:a00:202:3fff:fe67:ae3/64 Scope:Global
  inet6 addr: fe80::202:3fff:fe67:ae3/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:2210 errors:0 dropped:0 overruns:0 frame:0
  TX packets:177 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:251842 (245.9 Kb)  TX bytes:14494 (14.1 Kb)
  Interrupt:10 Base address:0x2000 

eth1  Link encap:Ethernet  HWaddr 00:0C:F1:13:76:CB  
  UP BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  Interrupt:5 Memory:9000-9fff 

lo        Link encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:149 errors:0 dropped:0 overruns:0 frame:0
  TX packets:149 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0 
  RX bytes:12784 (12.4 Kb)  TX bytes:12784 (12.4 Kb)

Re: [PATCH]IPv4 UDP does not discard the datagram with invalid checksum

2006-02-23 Thread David S. Miller

From: Wei Yongjun <[EMAIL PROTECTED]>
Date: Thu, 23 Feb 2006 16:03:18 -0500

> IPv4 UDP does not discard datagrams with an invalid checksum. UDP
> validates checksums correctly only when socket filtering instructions
> are set. If no socket filtering instructions are set, a datagram with
> an invalid checksum will be passed to the application.

We check the checksum later, in parallel with the copy of
the packet data into userspace.

See udp_recvmsg(), where we do this:

	if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else if (msg->msg_flags&MSG_TRUNC) {
		if (__udp_checksum_complete(skb))
			goto csum_copy_err;
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else {
		err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);

		if (err == -EINVAL)
			goto csum_copy_err;
	}
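Verifying during the copy means the payload is touched only once. A minimal sketch of that idea, assuming the RFC 1071 Internet checksum and ignoring the UDP pseudo-header for brevity; copy_and_verify() is a hypothetical helper, not the kernel's skb_copy_and_csum_datagram_iovec().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst while accumulating the 16-bit
 * one's-complement sum (RFC 1071) in the same pass. The data is assumed to
 * already contain its checksum field. Returns 0 if it verifies, -1 if not. */
static int copy_and_verify(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += (uint32_t)src[i] << 8 | src[i + 1];
	}
	if (i < len) {                  /* odd trailing byte, padded with zero */
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}
	while (sum >> 16)               /* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	/* A correct checksum makes the folded sum all ones. */
	return sum == 0xffff ? 0 : -1;
}
```

The test data is the RFC 1071 worked example (sum 0xddf2, checksum 0x220d) with the checksum appended; a real receive path would also fold in the pseudo-header and keep the CHECKSUM_UNNECESSARY and MSG_TRUNC branches shown above.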


Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Herbert Xu

On Thu, Feb 23, 2006 at 07:53:36AM +0100, Jörn Engel wrote:
> 
> How is that argument special for kfree_skb?  Both libc free and kfree
> ignore NULL arguments and do so for good reasons.

Well with kfree there is actually a slight gain in that you are doing
the check in one place.

kfree_skb, on the other hand, is inlined, so you're actually adding
bloat to many places that simply don't need it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
