Re: [patch 12/19] fix irq problem with NAPI + NETPOLL

2007-03-13 Thread Atsushi Nemoto
On Thu, 08 Mar 2007 10:35:13 +0900 (JST), Atsushi Nemoto [EMAIL PROTECTED] 
wrote:
  netpoll_rx() should be invokable from hardware interrupt context.
  What is the crash you are seeing?
 
 The problem is not netpoll_rx().  It should be called from irq context.
 The problem is, netif_receive_skb() is called from irq context though
 it seems not designed to do so.

Unfortunately I could not reproduce the crash, but IIRC it happened
in an upper protocol layer, in hardware interrupt context.

Anyway, I think the main path of netif_receive_skb() should not be
executed in hardware interrupt context.  Is that wrong?

  It looks like perhaps the kfree_skb() calls need to be modified
  in __netpoll_rx().
 
 Well, it seems an another netpoll bug.

I suppose these kfree_skb() calls in __netpoll_rx() should be
dev_kfree_skb_any().  I also found another misuse, unrelated to
netpoll: netif_rx() calls kfree_skb() at its bottom.  netif_rx()
should be callable from hardware interrupt context, so that call
should be changed to dev_kfree_skb_any() too.  Is that right?

---
Atsushi Nemoto
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Removal of multipath cached (was Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.)

2007-03-13 Thread Jarek Poplawski
On Mon, Mar 12, 2007 at 10:22:36PM -0800, Andrew Morton wrote:
  On Mon, 12 Mar 2007 13:53:11 -0700 (PDT) David Miller [EMAIL PROTECTED] 
  wrote:
...
  And there is absolutely no negotiations about this, I've held back on
  this for nearly 2 years, and nothing has happened, this code is not
  maintained, nobody cares enough to fix the bugs, and even no
  distributions enable it because it causes crashes.
 
 Good stuff.
 
 I suggest you put a big printk explaining the above into 2.6.21.
 

Plus the official way: an entry in Documentation/feature-removal-schedule.txt
in the next rc-git.

Jarek P.


Re: Extensible hashing and RCU

2007-03-13 Thread Eric Dumazet
On Tuesday 13 March 2007 10:32, Evgeniy Polyakov wrote:
 On Fri, Mar 02, 2007 at 11:52:47AM +0300, Evgeniy Polyakov 
([EMAIL PROTECTED]) wrote:
 So, I ask network developers about testing environment for socket lookup
 benchmarking. What would be the best test case to determine performance
 of the lookup algo? Is it enough to replace algo and locking and create
 say one million of connections and try to run trivial web server (that
 is what I'm going to test if there will not be any better suggestion,
 but I only have single-core athlon 64 with 1gb of ram as a test bed and
 two core duo machines as generators, probably I can use one of them as a
 test machine too. They have gigabit adapters and are connected over
 gigabit switch)?

One million concurrent sockets on your machines will be tricky :)

$ egrep "(filp|dent|^TCP|sock_inode_cache)" /proc/slabinfo |cut -c1-40
TCP                    12     14   1152
sock_inode_cache      423    430    384
dentry_cache        36996  47850    132
filp                 4081   4680    192

that means at the minimum 1860 bytes of LOWMEM per tcp socket on 32bit kernel, 
(2512 bytes on a 64bit kernel)

I had one bench program but apparently I lost it :(
It was able to open long-lived sockets (one million, given enough memory) and
generated a kind of random traffic on all of them. Damned.
The 'server' side had to listen on many (16) ports because of the 65536
limit.
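
The numbers above can be checked with a little arithmetic. This is a minimal sketch, assuming the per-object sizes in the slabinfo output are 1152, 384, 132 and 192 bytes (they sum to the 1860 bytes quoted) and that each listening port is limited to 65536 connections; all function names are made up for illustration.

```c
#include <stdint.h>

/* Per-socket slab cost on a 32-bit kernel, using the assumed object sizes:
 * TCP + sock_inode_cache + dentry_cache + filp. */
static uint64_t bytes_per_tcp_socket_32bit(void)
{
    return 1152 + 384 + 132 + 192;
}

/* Total low memory needed for nsockets sockets (hypothetical helper). */
static uint64_t lowmem_needed(uint64_t nsockets, uint64_t bytes_per_socket)
{
    return nsockets * bytes_per_socket;
}

/* Listening ports needed when one port can take at most per_port_limit
 * connections; the division is rounded up. */
static uint64_t ports_needed(uint64_t nsockets, uint64_t per_port_limit)
{
    return (nsockets + per_port_limit - 1) / per_port_limit;
}
```

With one million sockets this comes to roughly 1.86 GB of low memory on 32-bit, and 16 listening ports, which matches the figures mentioned in the mail.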


Re: Extensible hashing and RCU

2007-03-13 Thread Evgeniy Polyakov
On Tue, Mar 13, 2007 at 11:08:27AM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
 On Tuesday 13 March 2007 10:32, Evgeniy Polyakov wrote:
  On Fri, Mar 02, 2007 at 11:52:47AM +0300, Evgeniy Polyakov 
 ([EMAIL PROTECTED]) wrote:
  So, I ask network developers about testing environment for socket lookup
  benchmarking. What would be the best test case to determine performance
  of the lookup algo? Is it enough to replace algo and locking and create
  say one million of connections and try to run trivial web server (that
  is what I'm going to test if there will not be any better suggestion,
  but I only have single-core athlon 64 with 1gb of ram as a test bed and
  two core duo machines as generators, probably I can use one of them as a
  test machine too. They have gigabit adapters and are connected over
  gigabit switch)?
 
 One million concurrent sockets on your machines will be tricky :)
 
 $ egrep "(filp|dent|^TCP|sock_inode_cache)" /proc/slabinfo |cut -c1-40
 TCP                    12     14   1152
 sock_inode_cache      423    430    384
 dentry_cache        36996  47850    132
 filp                 4081   4680    192
 
 that means at the minimum 1860 bytes of LOWMEM per tcp socket on 32bit 
 kernel, 
 (2512 bytes on a 64bit kernel)
 
 I had one bench program but apparently I lost it :(
 It was able to open long lived sockets, (one million if enough memory), and 
 was generating kind of random traffic on all sockets. damned.
 The 'server' side had to listen to many (16) ports because of the 65536 
 limit.

Yep, I was too optimistic about my hardware - given the size of a TCP
socket, it is impossible to even create that many of them with 1 or 2 GB
of RAM.

Well, I can run additional tests in userspace (ideally with hugetlb support,
but given that both the socket hash table and my algo use essentially the
same amount of RAM, it should not matter) with more precise analysis...

And then just send a patch with a detailed description.

-- 
Evgeniy Polyakov


Re: [PATCH 1/5] NetXen: Fix softlock seen on some machines during hardware writes

2007-03-13 Thread Mithlesh Thukral
On Friday 09 March 2007 21:56, Stephen Hemminger wrote:
 Linsys Contractor Mithlesh Thukral wrote:
  NetXen: This will fix a softlock seen on some machines.
  The reason was too much time was spent waiting for writes to go through.
 
  Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]
  ---
   drivers/net/netxen/netxen_nic.h |1 +
   drivers/net/netxen/netxen_nic_ethtool.c |1 +
   drivers/net/netxen/netxen_nic_init.c|   11 +--
   3 files changed, 11 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/net/netxen/netxen_nic.h
  b/drivers/net/netxen/netxen_nic.h index 38d7409..c85c2cb 100644
  --- a/drivers/net/netxen/netxen_nic.h
  +++ b/drivers/net/netxen/netxen_nic.h
  @@ -236,6 +236,7 @@ #define MPORT_MULTI_FUNCTION_MODE 0x
 
   #include "netxen_nic_phan_reg.h"
   extern unsigned long long netxen_dma_mask;
  +extern unsigned long last_schedule_time;
 
   /*
* NetXen host-peg signal message structure
  diff --git a/drivers/net/netxen/netxen_nic_ethtool.c
  b/drivers/net/netxen/netxen_nic_ethtool.c index 3752d2a..d49a7d8 100644
  --- a/drivers/net/netxen/netxen_nic_ethtool.c
  +++ b/drivers/net/netxen/netxen_nic_ethtool.c
  @@ -455,6 +455,7 @@ netxen_nic_set_eeprom(struct net_device
  }
  printk(KERN_INFO "%s: flash unlocked. \n",
  netxen_nic_driver_name);
  +   last_schedule_time = jiffies;
  ret = netxen_flash_erase_secondary(adapter);
  if (ret != FLASH_SUCCESS) {
  printk(KERN_ERR "%s: Flash erase failed.\n",
  diff --git a/drivers/net/netxen/netxen_nic_init.c
  b/drivers/net/netxen/netxen_nic_init.c index b2e776f..53ca21e 100644
  --- a/drivers/net/netxen/netxen_nic_init.c
  +++ b/drivers/net/netxen/netxen_nic_init.c
  @@ -42,6 +42,8 @@ struct crb_addr_pair {
  u32 data;
   };
 
  +unsigned long last_schedule_time;
  +
   #define NETXEN_MAX_CRB_XFORM 60
   static unsigned int crb_addr_xform[NETXEN_MAX_CRB_XFORM];
   #define NETXEN_ADDR_ERROR (0x)
  @@ -404,9 +406,14 @@ static inline int do_rom_fast_write(stru
   static inline int
   do_rom_fast_read(struct netxen_adapter *adapter, int addr, int *valp)
   {
  +   if (jiffies > (last_schedule_time + (8 * HZ))) {
  +   last_schedule_time = jiffies;
  +   schedule();
  +   }
  +
  netxen_nic_reg_write(adapter, NETXEN_ROMUSB_ROM_ADDRESS, addr);
  netxen_nic_reg_write(adapter, NETXEN_ROMUSB_ROM_ABYTE_CNT, 3);
  -   udelay(70); /* prevent bursting on CRB */
  +   udelay(100);/* prevent bursting on CRB */

 To prevent PCI write posting issues, you should always do a dummy read
 before
 any delay.

This is a good suggestion. I already have code in place that does a dummy read
of a hardware location before the delay, but so far I have tested it on only a
few machines. I would like to test it on as many hardware configurations as
possible before submitting it. Along with that, I am also trying to reduce the
delay as much as possible.

Until then, this patch makes the code work on all hardware platforms
(including those that require a longer delay) and prevents a softlockup from
occurring.
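
A side note on the throttling check in the quoted patch: it appears to compare jiffies directly against last_schedule_time + 8 * HZ (the comparison operator was lost in the archive). A plain comparison of that kind misbehaves when the tick counter wraps, which is why the kernel provides time_after(). The following is only a userspace sketch of the wrap-safe form; tick_after() and should_yield() are hypothetical names standing in for the kernel macro and for the check in do_rom_fast_read(), not driver code.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's time_after(a, b): true when tick a
 * is later than tick b, even if the counter wrapped in between.  Relies on
 * the usual two's-complement conversion, as the kernel macro does. */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical helper: has more than 'interval' ticks passed since 'last'? */
static bool should_yield(unsigned long now, unsigned long last,
                         unsigned long interval)
{
    return tick_after(now, last + interval);
}
```

The difference only shows up near the wrap point: with last close to the maximum counter value, last + interval wraps to a small number and a plain "now > last + interval" would fire immediately.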

Thanks,
Mithlesh Thukral


[PATCH 0/2] NetXen: Bug fixes

2007-03-13 Thread Linsys Contractor Mithlesh Thukral
Hi All,

I will be sending bug fixes for the NetXen 1G/10G Ethernet driver in subsequent
mails.
The patches are against netdev#upstream-fixes.

Regards,
Mithlesh Thukral


[PATCH 1/2] NetXen: Bug fix for Jumbo frames on XG card

2007-03-13 Thread Linsys Contractor Mithlesh Thukral
NetXen: Set the MTU for the right port depending upon the port number
for XG cards.

Signed-off-by: Mithlesh Thukral [EMAIL PROTECTED]
---

 drivers/net/netxen/netxen_nic_hw.c |5 -
 1 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 1be5570..6537574 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -822,7 +822,10 @@ int netxen_nic_set_mtu_xgb(struct netxen
 {
struct netxen_adapter *adapter = port->adapter;
new_mtu += NETXEN_NIU_HDRSIZE + NETXEN_NIU_TLRSIZE;
-   netxen_nic_write_w0(adapter, NETXEN_NIU_XGE_MAX_FRAME_SIZE, new_mtu);
+   if (port->portnum == 0)
+   netxen_nic_write_w0(adapter, NETXEN_NIU_XGE_MAX_FRAME_SIZE, new_mtu);
+   else if (port->portnum == 1)
+   netxen_nic_write_w0(adapter, NETXEN_NIU_XG1_MAX_FRAME_SIZE, new_mtu);
return 0;
 }
 


Re: bridge: faster compare for link local addresses

2007-03-13 Thread Andi Kleen
David Miller [EMAIL PROTECTED] writes:

 From: Rick Jones [EMAIL PROTECTED]
 Date: Mon, 12 Mar 2007 17:05:39 -0700
 
  Being paranoid - are there no worries about the alignment of dest?
 
 If it's an issue, it's an issue elsewhere too, as the places
 where Stephen took this idiomatic code from is the code
 ethernet handling and that runs on every input packet via
 eth_type_trans().

As a quick note -- when you tell gcc the expected alignment
by using correct types then modern gcc should generate fast inline code 
for memcpy/memcmp/etc. by itself. It only falls back to a slow generic
function when it cannot figure out the alignment or the size.

So I expect just using u32 * instead of char * should have the same
effect and would be somewhat cleaner and the memcmp could be kept.

-Andi


Re: bridge: faster compare for link local addresses

2007-03-13 Thread Eric Dumazet
On Tuesday 13 March 2007 15:01, Andi Kleen wrote:
 David Miller [EMAIL PROTECTED] writes:
  From: Rick Jones [EMAIL PROTECTED]
  Date: Mon, 12 Mar 2007 17:05:39 -0700
 
   Being paranoid - are there no worries about the alignment of dest?
 
  If it's an issue, it's an issue elsewhere too, as the places
  where Stephen took this idiomatic code from is the code
  ethernet handling and that runs on every input packet via
  eth_type_trans().

 As a quick note -- when you tell gcc the expected alignment
 by using correct types then modern gcc should generate fast inline code
 for memcpy/memcmp/etc. by itself. It only falls back to a slow generic
 function when it cannot figure out the alignment or the size.

 So I expect just using u32 * instead of char * should have the same
 effect and would be somewhat cleaner and the memcmp could be kept.

For memcpy(), yes, you can have some optimizations.

But memcmp() has strong semantics (in libc). memcmp(a, b, 6) should do 6 byte
compares and conditional branches, regardless of a/b alignment,
or use the x86 rep cmpsb instruction, which has basically the same cost.

The trick we use in compare_ether_addr() reduces this to some arithmetic and
one test.

return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;

I find this line as clean as memcmp(a, b, 6).

(On x86_64, where alignment is not mandatory, we could do:

((*(long *)a ^ *(long *)b) << 16) != 0

only if we can always read two extra bytes without faulting, of course :) )
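
For readers who want to try the XOR/OR trick outside the kernel, here is a minimal userspace sketch. It is not the kernel's compare_ether_addr(): the function name is made up, and it copies the bytes into 16-bit words with memcpy() so that it makes no assumption about the alignment of its arguments, whereas the kernel version casts the pointers directly.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the two 6-byte addresses differ, 0 if they are equal.
 * The three 16-bit halves are XORed pairwise and ORed together, so the
 * whole comparison ends in a single test. */
static unsigned ether_addr_differs(const uint8_t *addr1, const uint8_t *addr2)
{
    uint16_t a[3], b[3];

    memcpy(a, addr1, sizeof(a));
    memcpy(b, addr2, sizeof(b));
    return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
}
```

Whether a compiler turns this into better code than memcmp(a, b, 6) depends on the compiler and target, so it is worth checking the generated assembly before drawing conclusions.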



Re: [PATCH] natsemi: netpoll fixes

2007-03-13 Thread Sergei Shtylyov

Hello.

Mark Brown wrote:


Subject: natsemi: Fix NAPI for interrupt sharing



The interrupt status register for the natsemi chips is clear on read and
was read unconditionally from both the interrupt and from the NAPI poll
routine, meaning that if the interrupt service routine was called (for
example, due to a shared interrupt) while a NAPI poll was scheduled,
interrupts could be missed.  This patch fixes that by ensuring that the
interrupt status register is only read by the interrupt handler when
interrupts are enabled from the chip.

It also reverts a workaround for this problem from the netpoll hook and
improves the trace for interrupt events.
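
The effect of a clear-on-read status register can be shown with a toy model. This is only a userspace sketch under stated assumptions: struct fake_nic and every function below are invented for illustration and do not correspond to the natsemi driver's code; the model only captures "a read returns the pending bits and clears them".

```c
#include <stdint.h>

/* Toy device: pending event bits that are cleared by reading them. */
struct fake_nic {
    uint32_t intr_status;   /* pending events, clear-on-read */
    int intr_enabled;       /* nonzero while the chip may raise interrupts */
};

/* Clear-on-read access: returns the pending bits and clears them. */
static uint32_t read_intr_status(struct fake_nic *nic)
{
    uint32_t status = nic->intr_status;

    nic->intr_status = 0;
    return status;
}

/* Handler that reads the register unconditionally. */
static uint32_t isr_unconditional(struct fake_nic *nic)
{
    return read_intr_status(nic);
}

/* Handler that only reads while the chip's interrupts are enabled. */
static uint32_t isr_guarded(struct fake_nic *nic)
{
    if (!nic->intr_enabled)
        return 0;
    return read_intr_status(nic);
}

/* Scenario: a poll is scheduled (chip interrupts disabled) with events
 * pending, then the handler runs because another device on the shared
 * line fired.  Returns the event bits still visible to the poll. */
static uint32_t events_left_for_poll(int guarded)
{
    struct fake_nic nic = { .intr_status = 0x5, .intr_enabled = 0 };

    if (guarded)
        isr_guarded(&nic);
    else
        isr_unconditional(&nic);
    return read_intr_status(&nic);
}
```

In this model the unconditional handler leaves nothing for the poll routine, while the guarded one leaves the pending bits intact.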

Thanks to Sergei Shtylyov [EMAIL PROTECTED] for spotting the
issue, Mark Huth [EMAIL PROTECTED] for a simpler method and Simon
Blake [EMAIL PROTECTED] for testing resources.



Signed-Off-By: Mark Brown [EMAIL PROTECTED]



Index: linux-2.6/drivers/net/natsemi.c
===
--- linux-2.6.orig/drivers/net/natsemi.c 2007-03-11 02:32:43.0 +
+++ linux-2.6/drivers/net/natsemi.c 2007-03-13 00:12:29.0 +

[...]

@@ -2131,17 +2133,23 @@
   dev->name, np->intr_status,
   readl(ioaddr + IntrMask));
 
-   if (!np->intr_status)
-   return IRQ_NONE;
-
-   prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
+   if (np->intr_status) {
+   prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
 
-   if (netif_rx_schedule_prep(dev)) {
/* Disable interrupts and register for poll */
-   natsemi_irq_disable(dev);
-   __netif_rx_schedule(dev);
+   if (netif_rx_schedule_prep(dev)) {
+   natsemi_irq_disable(dev);
+   __netif_rx_schedule(dev);
+   } else
+   printk(KERN_WARNING
+  "%s: Ignoring interrupt, status %#08x, mask %#08x.\n",
+  dev->name, np->intr_status,
+  readl(ioaddr + IntrMask));
+
+   return IRQ_HANDLED;
}
-   return IRQ_HANDLED;
+
+   return IRQ_NONE;
 }


   The only complaint I have is that this restructuring seems
unnecessary: the only real change it makes is adding an else to the if
statement.


WBR, Sergei


[PATCH] Shrink struct dst_entry a bit

2007-03-13 Thread Andi Kleen

The ICMP rate limiting state can be shorts; we don't send that many ICMPs.
Change flags to short and reorder fields by size to avoid holes.
Move cold fields towards the end.
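
As a generic illustration of why sorting fields by size avoids holes, here is a small standalone sketch. The two structs are invented for the example and are not struct dst_entry; exact sizes depend on the ABI, so the code only relies on the sorted layout being smaller.

```c
#include <stddef.h>

/* Small fields interleaved with pointer-sized ones: the compiler has to
 * insert padding after each short to align the next member. */
struct unsorted {
    short a;
    void *p;
    short b;
    unsigned long l;
    short c;
};

/* Same members, largest first: the shorts pack together at the end. */
struct sorted {
    void *p;
    unsigned long l;
    short a;
    short b;
    short c;
};

/* Bytes actually occupied by the members, without any padding. */
static size_t payload_bytes(void)
{
    return 3 * sizeof(short) + sizeof(void *) + sizeof(unsigned long);
}
```

On a typical LP64 system the first layout takes 40 bytes and the second 24, for 22 bytes of payload; printing sizeof() for both on the target of interest is the quickest way to confirm.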

Signed-off-by: Andi Kleen [EMAIL PROTECTED]

Index: linux-2.6.21-rc3-net/include/net/dst.h
===
--- linux-2.6.21-rc3-net.orig/include/net/dst.h
+++ linux-2.6.21-rc3-net/include/net/dst.h
@@ -40,26 +40,24 @@ struct dst_entry
struct rcu_head rcu_head;
struct dst_entry*child;
struct net_device   *dev;
-   short   error;
-   short   obsolete;
-   int flags;
+   unsigned long   expires;
+   short   flags;
 #define DST_HOST   1
 #define DST_NOXFRM 2
 #define DST_NOPOLICY   4
 #define DST_NOHASH 8
 #define DST_BALANCED0x10
-   unsigned long   expires;
+   short   error;
+   short   obsolete;
 
unsigned short  header_len; /* more space at head required */
unsigned short  nfheader_len;   /* more non-fragment space at head required */
unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32 metrics[RTAX_MAX];
-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
-   unsigned long   rate_tokens;
+   unsigned short  rate_last;  /* rate limiting for ICMP */
+   unsigned short  rate_tokens;
 
+   struct dst_entry*path;
struct neighbour*neighbour;
struct hh_cache *hh;
struct xfrm_state   *xfrm;
@@ -67,10 +65,6 @@ struct dst_entry
int (*input)(struct sk_buff*);
int (*output)(struct sk_buff*);
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
-
struct  dst_ops *ops;

unsigned long   lastuse;
@@ -82,6 +76,13 @@ struct dst_entry
struct rt6_info   *rt6_next;
struct dn_route  *dn_next;
};
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
char info[0];
 };
 


Re: [PATCH] natsemi: netpoll fixes

2007-03-13 Thread Sergei Shtylyov

Hello.

Mark Brown wrote:


Moving netdev_rx() would fix that one but there's some others too -
there's one in the timer routine if the chip crashes.  In the case you


  Erm, sorry, I'm not seeing it -- could you point with finger please? 
  :-)



In netdev_timer() when the device is using PORT_TP if the DspCfg read
back from the chip differs from the one we think we programmed into it
then the driver thinks the PHY fell over.  It then goes through an init
sequence, including init_registers() which will reset IntrEnable among
other things.


   What's more important for us, it will also clear IntrStatus (and ignore
all pending interrupts).  Well, it will also reinit the whole TX/RX rings,
so all packets will be lost anyway...



describe above the consequences shouldn't be too bad since it tends to
only occur at high volume so further traffic will tend to occur and
cause things to recover - all the testing of that patch was done with
the bug present and no ill effects.



  Oversized packets occur only at high volume? Is it some errata?



It's an erratum - AN 1287, which you can get from the National web site.
It's not actually that the chip is getting oversized packets; what
happens is that the state machine which reads data off the wire gets
confused and eventually locks up.  Before locking up it will usually
report one or more oversized packets, so this is a useful hint that we
should reset the receive state machine in order to recover from this.


   That's all good, but why do we need to completely lose TX and other interrupts
in the meantime? High inbound traffic doesn't necessarily mean high outbound
traffic, does it?


WBR, Sergei



[PATCH 1/4] PPPoE: miscellaneous smaller cleanups

2007-03-13 Thread Michal Ostrowski
Below is a patch that just removes dead code and initializers without any
effect (where the first access is an assignment) that I stumbled across while
reading the source.

Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]
Acked-by: Michal Ostrowski [EMAIL PROTECTED]
---
 drivers/net/pppoe.c |   21 -
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index ebfa296..ec4e67d 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -207,7 +207,7 @@ static inline struct pppox_sock *get_item(unsigned long sid,
 
 static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp)
 {
-   struct net_device *dev = NULL;
+   struct net_device *dev;
int ifindex;
 
dev = dev_get_by_name(sp->sa_addr.pppoe.dev);
@@ -222,9 +222,6 @@ static inline int set_item(struct pppox_sock *po)
 {
int i;
 
-   if (!po)
-   return -EINVAL;
-
write_lock_bh(&pppoe_hash_lock);
i = __set_item(po);
write_unlock_bh(&pppoe_hash_lock);
@@ -344,7 +341,7 @@ static struct notifier_block pppoe_notifier = {
 static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
 {
struct pppox_sock *po = pppox_sk(sk);
-   struct pppox_sock *relay_po = NULL;
+   struct pppox_sock *relay_po;
 
if (sk->sk_state & PPPOX_BOUND) {
struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw;
@@ -514,7 +511,6 @@ static int pppoe_release(struct socket *sock)
 {
struct sock *sk = sock->sk;
struct pppox_sock *po;
-   int error = 0;
 
if (!sk)
return 0;
@@ -543,7 +539,7 @@ static int pppoe_release(struct socket *sock)
skb_queue_purge(&sk->sk_receive_queue);
sock_put(sk);
 
-   return error;
+   return 0;
 }
 
 
@@ -762,10 +758,10 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd,
 static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock,
  struct msghdr *m, size_t total_len)
 {
-   struct sk_buff *skb = NULL;
+   struct sk_buff *skb;
struct sock *sk = sock->sk;
struct pppox_sock *po = pppox_sk(sk);
-   int error = 0;
+   int error;
struct pppoe_hdr hdr;
struct pppoe_hdr *ph;
struct net_device *dev;
@@ -929,10 +925,10 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
  struct msghdr *m, size_t total_len, int flags)
 {
struct sock *sk = sock->sk;
-   struct sk_buff *skb = NULL;
+   struct sk_buff *skb;
int error = 0;
int len;
-   struct pppoe_hdr *ph = NULL;
+   struct pppoe_hdr *ph;
 
if (sk->sk_state & PPPOX_BOUND) {
error = -EIO;
@@ -949,7 +945,6 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
m->msg_namelen = 0;
 
if (skb) {
-   error = 0;
ph = (struct pppoe_hdr *) skb->nh.raw;
len = ntohs(ph->length);
 
@@ -991,7 +986,7 @@ out:
 
 static __inline__ struct pppox_sock *pppoe_get_idx(loff_t pos)
 {
-   struct pppox_sock *po = NULL;
+   struct pppox_sock *po;
int i = 0;
 
for (; i < PPPOE_HASH_SIZE; i++) {
-- 
1.5.0.g78e90



[PATCH 3/4] PPPOE: memory leak when socket is release()d before PPPIOCGCHAN has been called on it

2007-03-13 Thread Michal Ostrowski
below you find a patch that fixes a memory leak when a PPPoE socket is
release()d after it has been connect()ed, but before the PPPIOCGCHAN ioctl
ever has been called on it.

This is somewhat of a security problem, too, since PPPoE sockets can be
created by any user, so any user can easily allocate all the machine's
RAM to non-swappable address space and thus DoS the system.
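
The fix amounts to widening a state mask, which can be sketched in isolation. The state values below are believed to match if_pppox.h but should be treated as assumptions, and the two helper functions are hypothetical names that only mirror the condition in pppox_unbind_sock() before and after the patch.

```c
/* Socket state bits (assumed values, see the note above). */
enum {
    PPPOX_NONE      = 0,
    PPPOX_CONNECTED = 1,
    PPPOX_BOUND     = 2,
    PPPOX_RELAY     = 4,
    PPPOX_ZOMBIE    = 8,
    PPPOX_DEAD      = 16
};

/* Old condition: a socket that is only CONNECTED is skipped, so its
 * channel is never unregistered. */
static int unbinds_before_patch(int sk_state)
{
    return (sk_state & (PPPOX_BOUND | PPPOX_ZOMBIE)) != 0;
}

/* New condition: CONNECTED sockets are unbound as well. */
static int unbinds_after_patch(int sk_state)
{
    return (sk_state & (PPPOX_BOUND | PPPOX_CONNECTED | PPPOX_ZOMBIE)) != 0;
}
```

A socket that was connect()ed but never went through PPPIOCGCHAN is in the CONNECTED-only state, which is exactly the case the old mask missed.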

Is there any specific reason for PPPoE sockets being available to any
unprivileged process, BTW? After all, you need a packet socket for the
discovery stage anyway, so it's unlikely that any unprivileged process
will ever need to create a PPPoE socket, no? Allocating all session IDs
for a known AC is a kind of DoS, too, after all - with Juniper ERXes,
this is really easy, actually, since they don't ever assign session ids
above 8000 ...

Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]
Acked-by: Michal Ostrowski [EMAIL PROTECTED]
---
 drivers/net/pppox.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/pppox.c b/drivers/net/pppox.c
index 9315046..3f8115d 100644
--- a/drivers/net/pppox.c
+++ b/drivers/net/pppox.c
@@ -58,7 +58,7 @@ void pppox_unbind_sock(struct sock *sk)
 {
/* Clear connection to ppp device, if attached. */
 
-   if (sk->sk_state & (PPPOX_BOUND | PPPOX_ZOMBIE)) {
+   if (sk->sk_state & (PPPOX_BOUND | PPPOX_CONNECTED | PPPOX_ZOMBIE)) {
ppp_unregister_channel(&pppox_sk(sk)->chan);
sk->sk_state = PPPOX_DEAD;
}
-- 
1.5.0.g78e90



[PATCH 2/4] PPPOE: race between interface going down and connect()

2007-03-13 Thread Michal Ostrowski
below you find a patch that (hopefully) fixes a race between an interface
going down and a connect() to a peer on that interface. Before,
connect() would determine that an interface is up, then the interface
could go down and all entries referring to that interface in the
item_hash_table would be marked as ZOMBIEs and their references to
the device would be freed, and after that, connect() would put a new
entry into the hash table referring to the device that meanwhile is
down already - which also would cause unregister_netdevice() to wait
until the socket has been release()d.

This patch does not suffice if we are not allowed to accept connect()s
referring to a device that we already acked a NETDEV_GOING_DOWN for
(that is: all references are only guaranteed to be freed after
NETDEV_DOWN has been acknowledged, not necessarily after the
NETDEV_GOING_DOWN already). And if we are allowed to, we could avoid
looking through the hash table upon NETDEV_GOING_DOWN completely and
only do that once we get the NETDEV_DOWN ...

mostrows:
pppoe_flush_dev is called on NETDEV_GOING_DOWN and NETDEV_DOWN to deal with
this late connect issue.  Ideally one would hope to notify users at the
NETDEV_GOING_DOWN phase (just to pretend to be nice).  However, it is the
NETDEV_DOWN scan that takes all the responsibility for ensuring nobody is
hanging around at that time.

Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]
Acked-by: Michal Ostrowski [EMAIL PROTECTED]
---
 drivers/net/pppoe.c |   19 ++-
 1 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index ec4e67d..4e878c9 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -218,17 +218,6 @@ static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp)
return get_item(sp->sa_addr.pppoe.sid, sp->sa_addr.pppoe.remote, ifindex);
 }
 
-static inline int set_item(struct pppox_sock *po)
-{
-   int i;
-
-   write_lock_bh(&pppoe_hash_lock);
-   i = __set_item(po);
-   write_unlock_bh(&pppoe_hash_lock);
-
-   return i;
-}
-
 static inline struct pppox_sock *delete_item(unsigned long sid, char *addr, int ifindex)
 {
struct pppox_sock *ret;
@@ -595,14 +584,18 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
po->pppoe_dev = dev;
po->pppoe_ifindex = dev->ifindex;
 
-   if (!(dev->flags & IFF_UP))
+   write_lock_bh(&pppoe_hash_lock);
+   if (!(dev->flags & IFF_UP)){
+   write_unlock_bh(&pppoe_hash_lock);
goto err_put;
+   }
 
memcpy(&po->pppoe_pa,
   &sp->sa_addr.pppoe,
   sizeof(struct pppoe_addr));
 
-   error = set_item(po);
+   error = __set_item(po);
+   write_unlock_bh(&pppoe_hash_lock);
if (error < 0)
goto err_put;
-- 
1.5.0.g78e90



[PATCH 4/4] PPPOE: Fix device tear-down notification.

2007-03-13 Thread Michal Ostrowski
pppoe_flush_dev() kicks all sockets bound to a device that is going down.
In doing so, locks must be taken in the right order consistently (sock lock,
followed by the pppoe_hash_lock).  However, the scan itself runs with the
pppoe_hash_lock held.  So, when something is found in the scan we must
release the lock we're holding and grab the sock lock.

This patch fixes race conditions between this code and pppoe_release(),
both of which perform similar functions but would naturally prefer to grab
locks in opposing orders.  Both code paths are now going after these locks
in a consistent manner.

pppoe_hash_lock protects the contents of the pppox_sock objects that reside
inside the hash.  Thus, NULL'ing out the pppoe_dev field should be done
under the protection of this lock.
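
The drop-the-lock-and-restart pattern described above can be sketched as a single-threaded toy model. Nothing here is the driver's code: the "lock" is just a flag used to check ordering, struct entry stands in for a hashed pppox_sock, and teardown() stands in for the work done under the sock lock. All names are hypothetical.

```c
#include <stddef.h>

struct entry {
    struct entry *next;
    const void *dev;   /* bound device, NULL once flushed */
    int zombie;        /* set once the entry has been torn down */
};

static int hash_lock_held;   /* stand-in for pppoe_hash_lock */

static void hash_lock(void)   { hash_lock_held = 1; }
static void hash_unlock(void) { hash_lock_held = 0; }

/* Stand-in for the work done under the sock lock.  It must not run with
 * the hash lock held; returns -1 if that ordering rule is violated. */
static int teardown(struct entry *e)
{
    if (hash_lock_held)
        return -1;
    e->zombie = 1;
    return 0;
}

/* Flush every entry bound to dev.  Returns the number of entries flushed,
 * or -1 on a lock-order violation. */
static int flush_dev(struct entry *head, const void *dev)
{
    struct entry *e;
    int flushed = 0;

    hash_lock();
    e = head;
    while (e != NULL) {
        if (e->dev != dev) {
            e = e->next;
            continue;
        }
        e->dev = NULL;      /* done under the hash lock; ensures progress */
        hash_unlock();      /* drop it before the sock-lock work */
        if (teardown(e) != 0)
            return -1;
        flushed++;
        hash_lock();
        e = head;           /* chain may have changed: restart the scan */
    }
    hash_unlock();
    return flushed;
}

/* Small scenario: three entries, two of them bound to the device. */
static int demo_flush(void)
{
    int dev_a, dev_b;
    struct entry e3 = { NULL, &dev_a, 0 };
    struct entry e2 = { &e3, &dev_b, 0 };
    struct entry e1 = { &e2, &dev_a, 0 };
    int n = flush_dev(&e1, &dev_a);

    if (!e1.zombie || e2.zombie || !e3.zombie || e2.dev != &dev_b)
        return -1;
    return n;
}
```

Clearing the device pointer before dropping the lock is what guarantees that each restart finds one fewer matching entry, so the scan terminates.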

Signed-off-by: Michal Ostrowski [EMAIL PROTECTED]
---
 drivers/net/pppoe.c |   93 +--
 1 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 4e878c9..0961bf9 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -241,54 +241,53 @@ static inline struct pppox_sock *delete_item(unsigned long sid, char *addr, int
 static void pppoe_flush_dev(struct net_device *dev)
 {
int hash;
-
BUG_ON(dev == NULL);
 
-   read_lock_bh(&pppoe_hash_lock);
+   write_lock_bh(&pppoe_hash_lock);
for (hash = 0; hash < PPPOE_HASH_SIZE; hash++) {
struct pppox_sock *po = item_hash_table[hash];
 
while (po != NULL) {
-   if (po->pppoe_dev == dev) {
-   struct sock *sk = sk_pppox(po);
-
-   sock_hold(sk);
-   po->pppoe_dev = NULL;
-
-   /* We hold a reference to SK, now drop the
-* hash table lock so that we may attempt
-* to lock the socket (which can sleep).
-*/
-   read_unlock_bh(&pppoe_hash_lock);
-
-   lock_sock(sk);
-
-   if (sk->sk_state &
-   (PPPOX_CONNECTED | PPPOX_BOUND)) {
-   pppox_unbind_sock(sk);
-   dev_put(dev);
-   sk->sk_state = PPPOX_ZOMBIE;
-   sk->sk_state_change(sk);
-   }
-
-   release_sock(sk);
+   struct sock *sk = sk_pppox(po);
+   if (po->pppoe_dev != dev) {
+   po = po->next;
+   continue;
+   }
+   po->pppoe_dev = NULL;
+   dev_put(dev);
+
+
+   /* We always grab the socket lock, followed by the
+* pppoe_hash_lock, in that order.  Since we should
+* hold the sock lock while doing any unbinding,
+* we need to release the lock we're holding.
+* Hold a reference to the sock so it doesn't disappear
+* as we're jumping between locks.
+*/
 
-   sock_put(sk);
+   sock_hold(sk);
 
-   read_lock_bh(&pppoe_hash_lock);
+   write_unlock_bh(&pppoe_hash_lock);
+   lock_sock(sk);
 
-   /* Now restart from the beginning of this
-* hash chain.  We always NULL out pppoe_dev
-* so we are guaranteed to make forward
-* progress.
-*/
-   po = item_hash_table[hash];
-   continue;
+   if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+   pppox_unbind_sock(sk);
+   sk->sk_state = PPPOX_ZOMBIE;
+   sk->sk_state_change(sk);
}
-   po = po->next;
+
+   release_sock(sk);
+   sock_put(sk);
+
+   /* Restart scan at the beginning of this hash chain.
+* While the lock was dropped the chain contents may
+* have changed.
+*/
+   write_lock_bh(&pppoe_hash_lock);
+   po = item_hash_table[hash];
}
}
-   read_unlock_bh(&pppoe_hash_lock);
+   write_unlock_bh(&pppoe_hash_lock);
 }
 
 static int pppoe_device_event(struct notifier_block *this,
@@ -504,28 +503,42 @@ 

Re: [PATCH] Shrink struct dst_entry a bit

2007-03-13 Thread Andi Kleen
On Tuesday 13 March 2007 15:10, Eric Dumazet wrote:
 On Tuesday 13 March 2007 14:48, Andi Kleen wrote:
  The ICMP rate limiting state can be shorts, we don't send that many ICMPs.
  Changing flags to short and reorder fields to be sorted by size to avoid
  holes. Move cold fields towards the end.
 
 
 Nope, you cannot break the reordering I've done one month ago.

Ok.  When you make such changes you should always add a comment, otherwise
it will always be destroyed by the next change.

But it seems highly fragile to me anyways because it depends on the exact
value of RTAX_MAX which tends to change regularly when someone invents
a new attribute. You should probably have moved next out of the dst entry.

Anyways here's a new patch with next still at the end and a comment.

-Andi

Shrink dst_entry a bit.

The ICMP rate limiting state can be shorts, we don't send that many ICMPs.
Changing flags to short and reorder fields to be sorted by size to avoid holes.
Move cold fields towards the end.

Signed-off-by: Andi Kleen [EMAIL PROTECTED]

Index: linux-2.6.21-rc3-net/include/net/dst.h
===
--- linux-2.6.21-rc3-net.orig/include/net/dst.h
+++ linux-2.6.21-rc3-net/include/net/dst.h
@@ -40,26 +40,24 @@ struct dst_entry
struct rcu_head rcu_head;
struct dst_entry*child;
struct net_device   *dev;
-   short   error;
-   short   obsolete;
-   int flags;
+   unsigned long   expires;
+   short   flags;
 #define DST_HOST   1
 #define DST_NOXFRM 2
 #define DST_NOPOLICY   4
 #define DST_NOHASH 8
 #define DST_BALANCED   0x10
-   unsigned long   expires;
+   short   error;
+   short   obsolete;
 
unsigned short  header_len; /* more space at head required */
unsigned short  nfheader_len;   /* more non-fragment space at head required */
unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32 metrics[RTAX_MAX];
-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
-   unsigned long   rate_tokens;
+   unsigned short  rate_last;  /* rate limiting for ICMP */
+   unsigned short  rate_tokens;
 
+   struct dst_entry*path;
struct neighbour*neighbour;
struct hh_cache *hh;
struct xfrm_state   *xfrm;
@@ -67,21 +65,26 @@ struct dst_entry
int (*input)(struct sk_buff*);
int (*output)(struct sk_buff*);
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
-
struct  dst_ops *ops;

unsigned long   lastuse;
atomic_t    __refcnt;   /* client references */
int __use;
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /* Should be at the end to be on the same cache line as 
+  the flow information in rtable.  */
union {
struct dst_entry *next;
struct rtable*rt_next;
struct rt6_info   *rt6_next;
struct dn_route  *dn_next;
};
+
char    info[0];
 };
 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Shrink struct dst_entry a bit

2007-03-13 Thread Eric Dumazet
On Tuesday 13 March 2007 15:31, Andi Kleen wrote:

> Ok.  When you do such changes you should always add a comment, otherwise
> it will be always destroyed with the next change.
>
> But it seems highly fragile to me anyways because it depends on the exact
> value of RTAX_MAX which tends to change regularly when someone invents
> a new attribute. You should probably have moved next out of the dst entry.

Not an option, unfortunately. But yes, a comment is needed.
(Before my February patches, the 'next' pointer was forced to be the first
field of dst).


> Anyways here's a new patch with next still at the end and a comment.


Andi, did you actually test your patch ?

Unless I really miss something obvious, rate_last is supposed to store 
jiffies.

net/ipv4/route.c:1313:  if (time_after(jiffies, rt->u.dst.rate_last + ip_rt_redirect_silence))

So you *cannot* convert it to 'unsigned short'. Really.


However, you could convert it to a u32, and use a helper function :

static inline u32 get_jiffies_32(void)
{
return (u32)jiffies;
}

and change appropriate code using rate_last

Also, 'lastuse' could use a u32 too, I even had a patch for this one...
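
A small userspace sketch of why a 32-bit stamp works where 'unsigned short' cannot, assuming only that timestamps are compared by unsigned subtraction. The names stamp32(), delta32() and delta16() are made up for illustration and are not kernel API; the 64-bit argument merely stands in for the jiffies counter:

```c
#include <stdint.h>

/* Hypothetical stand-in for get_jiffies_32(): keep the low 32 bits. */
static uint32_t stamp32(uint64_t jiffies)
{
	return (uint32_t)jiffies;
}

/* Unsigned subtraction yields the elapsed ticks even if the counter
 * wrapped in between, as long as the true delta fits in 32 bits. */
static uint32_t delta32(uint32_t now, uint32_t then)
{
	return now - then;
}

/* Same idea with 16 bits: any delta of 65536 ticks or more is lost. */
static uint16_t delta16(uint16_t now, uint16_t then)
{
	return (uint16_t)(now - then);
}
```

With HZ=1000, 70000 ticks (70 seconds) read back as 4464 through delta16(), while delta32() still reports the right interval across a 32-bit wrap.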



Re: [PATCH] Shrink struct dst_entry a bit

2007-03-13 Thread Eric Dumazet
On Tuesday 13 March 2007 15:44, Eric Dumazet wrote:

> Also, 'lastuse' could use a u32 too, I even had a patch for this one...

Here is the patch I have here for lastuse u32 conversion, not for inclusion 
yet because not yet tested (only compiled)

[PATCH] NET : abstract lastuse (from struct dst_entry) and convert it to u32

This saves 4 bytes (possibly 8) on 64 bit archs

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index c080f61..fb23951 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -89,7 +89,16 @@ static inline u64 get_jiffies_64(void)
return (u64)jiffies;
 }
 #endif
-
+/*
+ * On 64bit archs, storing timestamps in 'unsigned long' vars
+ * may consume unnecessary memory. Using u32 is ok when deltas
+ * between current jiffie and past timestamps are known to
+ * fit in 32-1 bits. Even with HZ=1000, that's 24 days.
+ */
+static inline u32 get_jiffies_32(void)
+{
+   return (u32)jiffies;
+}
 /*
  * These inlines deal with timer wrapping correctly. You are 
  * strongly encouraged to use them
diff --git a/include/net/dst.h b/include/net/dst.h
index e12a8ce..70366b7 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -67,13 +67,13 @@ #define DST_BALANCED0x10
int (*input)(struct sk_buff*);
int (*output)(struct sk_buff*);
 
+
+   struct  dst_ops *ops;
 #ifdef CONFIG_NET_CLS_ROUTE
__u32   tclassid;
 #endif
-
-   struct  dst_ops *ops;

-   unsigned long   lastuse;
+   u32 __lastuse;
atomic_t    __refcnt;   /* client references */
int __use;
union {
@@ -103,11 +103,23 @@ struct dst_ops
int entry_size;
 
atomic_t    entries;
-   struct kmem_cache   *kmem_cachep;
+   struct kmem_cache   *kmem_cachep;
 };
 
 #ifdef __KERNEL__
 
+static inline void 
+dst_lastuse_set(struct dst_entry *dst)
+{
+   dst->__lastuse = get_jiffies_32();
+}
+
+static inline unsigned long
+dst_lastuse_delta(const struct dst_entry *dst)
+{
+   return get_jiffies_32() - dst->__lastuse;
+}
+
 static inline u32
 dst_metric(const struct dst_entry *dst, int metric)
 {
diff --git a/net/core/dst.c b/net/core/dst.c
index 764bccb..6c0a023 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -136,7 +136,7 @@ void * dst_alloc(struct dst_ops * ops)
return NULL;
atomic_set(&dst->__refcnt, 0);
dst->ops = ops;
-   dst->lastuse = jiffies;
+   dst_lastuse_set(dst);
dst->path = dst;
dst->input = dst_discard_in;
dst->output = dst_discard_out;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6055074..c3f5264 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -215,7 +215,7 @@ int rtnl_put_cacheinfo(struct sk_buff *s
   u32 ts, u32 tsage, long expires, u32 error)
 {
struct rta_cacheinfo ci = {
-   .rta_lastuse = jiffies_to_clock_t(jiffies - dst->lastuse),
+   .rta_lastuse = jiffies_to_clock_t(dst_lastuse_delta(dst)),
.rta_used = dst->__use,
.rta_clntref = atomic_read(&(dst->__refcnt)),
.rta_error = error,
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 32a7db3..3d5c46c 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -166,7 +166,7 @@ static void dn_dst_check_expire(unsigned
spin_lock(&dn_rt_hash_table[i].lock);
while((rt=*rtp) != NULL) {
if (atomic_read(&rt->u.dst.__refcnt) ||
-   (now - rt->u.dst.lastuse) < expire) {
+   dst_lastuse_delta(&rt->u.dst) < expire) {
rtp = &rt->u.dst.dn_next;
continue;
}
@@ -187,7 +187,6 @@ static int dn_dst_gc(void)
 {
struct dn_route *rt, **rtp;
int i;
-   unsigned long now = jiffies;
unsigned long expire = 10 * HZ;
 
for(i = 0; i <= dn_rt_hash_mask; i++) {
@@ -197,7 +196,7 @@ static int dn_dst_gc(void)
 
while((rt=*rtp) != NULL) {
if (atomic_read(&rt->u.dst.__refcnt) ||
-   (now - rt->u.dst.lastuse) < expire) {
+   dst_lastuse_delta(&rt->u.dst) < expire) {
rtp = &rt->u.dst.dn_next;
continue;
}
@@ -278,7 +277,6 @@ static inline int compare_keys(struct fl
 static int dn_insert_route(struct dn_route *rt, unsigned hash, struct dn_route **rp)
 {
struct dn_route *rth, **rthp;
-   unsigned long now = jiffies;
 
rthp = &dn_rt_hash_table[hash].chain;
 
@@ -293,7 +291,7 @@ 

[PATCH 1/3] PPPoE: improved hashing routine

2007-03-13 Thread Florian Zumbiehl
Hi,

I'm not sure whether this is really worth it, but it looked so
extremely inefficient that I couldn't resist - so let's hope providers
will keep PPPoE around for a while, at least until terabit dsl ;-)

The new code produces the same results as the old version and is
~ 3 to 6 times faster for 4-bit hashes on the CPUs I tested.
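
To illustrate why the two approaches agree, here is a userspace sketch under stated assumptions: a 6-byte address, a 16-bit session id and a 4-bit hash. The names hash_slices() and hash_fold() and the HASH_BITS constant are hypothetical and only model the idea; this is not the driver code. XOR is associative, so XORing whole bytes first and folding the result once gives the same low bits as XORing every 4-bit slice separately:

```c
#include <stdint.h>

#define ETH_ALEN  6
#define HASH_BITS 4			/* assumed; must divide 8 */
#define HASH_SIZE (1u << HASH_BITS)

/* Slow variant: XOR in every HASH_BITS-wide slice of every input byte. */
static unsigned int hash_slices(uint16_t sid, const unsigned char *addr)
{
	unsigned int hash = 0, i, j;

	for (i = 0; i < ETH_ALEN; i++)
		for (j = 0; j < 8 / HASH_BITS; j++)
			hash ^= addr[i] >> (j * HASH_BITS);
	for (i = 0; i < 16 / HASH_BITS; i++)
		hash ^= sid >> (i * HASH_BITS);

	return hash & (HASH_SIZE - 1);
}

/* Fast variant: XOR whole bytes, then fold the byte down by halving. */
static unsigned int hash_fold(uint16_t sid, const unsigned char *addr)
{
	unsigned char hash = 0;
	unsigned int i;

	for (i = 0; i < ETH_ALEN; i++)
		hash ^= addr[i];
	for (i = 0; i < 16; i += 8)
		hash ^= (unsigned char)(sid >> i);
	for (i = 8; (i >>= 1) >= HASH_BITS; )
		hash ^= hash >> i;

	return hash & (HASH_SIZE - 1);
}
```

The fold loop runs once for a 4-bit hash (shift by 4) and would run a second time (shift by 2) for a 2-bit hash.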

Florian

---
Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 9e51fcc..954328c 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -108,19 +108,24 @@ static inline int cmp_addr(struct pppoe_addr *a, unsigned long sid, char *addr)
(memcmp(a->remote,addr,ETH_ALEN) == 0));
 }
 
-static int hash_item(unsigned long sid, unsigned char *addr)
+#if 8%PPPOE_HASH_BITS
+#error 8 must be a multiple of PPPOE_HASH_BITS
+#endif
+
+static int hash_item(unsigned int sid, unsigned char *addr)
 {
-   char hash = 0;
-   int i, j;
+   unsigned char hash = 0;
+   unsigned int i;
 
-   for (i = 0; i < ETH_ALEN ; ++i) {
-   for (j = 0; j < 8/PPPOE_HASH_BITS ; ++j) {
-   hash ^= addr[i] >> ( j * PPPOE_HASH_BITS );
-   }
+   for (i = 0 ; i < ETH_ALEN ; i++) {
+   hash ^= addr[i];
+   }
+   for (i = 0 ; i < sizeof(sid_t)*8 ; i += 8 ){
+   hash ^= sid>>i;
+   }
+   for (i = 8 ; (i>>=1) >= PPPOE_HASH_BITS ; ) {
+   hash ^= hash>>i;
}
-
-   for (i = 0; i < (sizeof(unsigned long)*8) / PPPOE_HASH_BITS ; ++i)
-   hash ^= sid >> (i*PPPOE_HASH_BITS);
 
return hash & ( PPPOE_HASH_SIZE - 1 );
 }


[PATCH 2/3] PPPoX/E: return ENOTTY on unknown ioctl requests

2007-03-13 Thread Florian Zumbiehl
Hi,

here is another patch for the PPPoX/E code that makes sure that ENOTTY is
returned for unknown ioctl requests rather than 0 (and removes another
unneeded initializer which I didn't bother creating a separate patch for).
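
A minimal sketch of the pattern, with invented command numbers and a toy handler named demo_ioctl() (none of these are the driver's): because every case assigns err and the default branch reports -ENOTTY, the initializer becomes unnecessary and an unknown request can no longer return 0 by accident.

```c
#include <errno.h>

/* Hypothetical command numbers, for illustration only. */
enum { DEMO_GET_MRU = 1, DEMO_SET_MRU = 2 };

static int demo_ioctl(unsigned int cmd, int *mru, int *arg)
{
	int err;	/* no initializer: every path below assigns it */

	switch (cmd) {
	case DEMO_GET_MRU:
		*arg = *mru;
		err = 0;
		break;
	case DEMO_SET_MRU:
		*mru = *arg;
		err = 0;
		break;
	default:
		/* Unknown request: say so instead of claiming success. */
		err = -ENOTTY;
	}

	return err;
}
```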

Florian

---
Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 954328c..9554924 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -669,8 +669,8 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd,
 {
struct sock *sk = sock->sk;
struct pppox_sock *po = pppox_sk(sk);
-   int val = 0;
-   int err = 0;
+   int val;
+   int err;
 
switch (cmd) {
case PPPIOCGMRU:
@@ -759,8 +759,9 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd,
err = 0;
break;
 
-   default:;
-   };
+   default:
+   err = -ENOTTY;
+   }
 
return err;
 }
diff --git a/drivers/net/pppox.c b/drivers/net/pppox.c
index 3f8115d..51de561 100644
--- a/drivers/net/pppox.c
+++ b/drivers/net/pppox.c
@@ -72,7 +72,7 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
struct sock *sk = sock->sk;
struct pppox_sock *po = pppox_sk(sk);
-   int rc = 0;
+   int rc;
 
lock_sock(sk);
 
@@ -93,12 +93,9 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
break;
}
default:
-   if (pppox_protos[sk->sk_protocol]->ioctl)
-   rc = pppox_protos[sk->sk_protocol]->ioctl(sock, cmd,
- arg);
-
-   break;
-   };
+   rc = pppox_protos[sk->sk_protocol]->ioctl ?
+   pppox_protos[sk->sk_protocol]->ioctl(sock, cmd, arg) : -ENOTTY;
+   }
 
release_sock(sk);
return rc;


[PATCH 3/3] PPPoE: move lock_sock() in pppoe_sendmsg() to the right location

2007-03-13 Thread Florian Zumbiehl
Hi,

and the last one for now: Acquire the sock lock in pppoe_sendmsg()
before accessing the sock - and in particular avoid releasing the lock
even though it hasn't been acquired.

Florian

---
Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 9554924..eef8a5b 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -779,6 +779,7 @@ static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock,
struct net_device *dev;
char *start;
 
+   lock_sock(sk);
if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) {
error = -ENOTCONN;
goto end;
@@ -789,8 +790,6 @@ static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock,
hdr.code = 0;
hdr.sid = po->num;
 
-   lock_sock(sk);
-
dev = po->pppoe_dev;
 
error = -EMSGSIZE;


[PATCH] tc35815: Fix an usage of streaming DMA API.

2007-03-13 Thread Atsushi Nemoto
The tc35815 driver lacks a call to pci_dma_sync_single_for_device() on
receiving.  A recent fix to MIPS dma_sync_single_for_cpu() revealed this
bug.

Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
---
This patch can be applied to netdev-2.6 tree or 2.6.21-rc3-mm2.

diff --git a/drivers/net/tc35815.c b/drivers/net/tc35815.c
index ec888db..eed78b5 100644
--- a/drivers/net/tc35815.c
+++ b/drivers/net/tc35815.c
@@ -58,12 +58,13 @@
  * 1.34    Fix netpoll locking.  BH rule for NAPI is not enough with
  *         netpoll, hard_start_xmit might be called from irq context.
  *         PM support.
+ * 1.35    Fix an usage of streaming DMA API.
  */
 
 #ifdef TC35815_NAPI
-#define DRV_VERSION    "1.34-NAPI"
+#define DRV_VERSION    "1.35-NAPI"
 #else
-#define DRV_VERSION    "1.34"
+#define DRV_VERSION    "1.35"
 #endif
 static const char *version = "tc35815.c:v" DRV_VERSION "\n";
 #define MODNAME        "tc35815"
@@ -1551,6 +1552,11 @@ tc35815_rx(struct net_device *dev)
PCI_DMA_FROMDEVICE);
 #endif
memcpy(data + offset, rxbuf, len);
+#ifdef TC35815_DMA_SYNC_ONDEMAND
+   pci_dma_sync_single_for_device(lp->pci_dev,
+  dma, len,
+  PCI_DMA_FROMDEVICE);
+#endif
offset += len;
cur_bd++;
}


[NET] AX.25 Kconfig and docs updates and fixes

2007-03-13 Thread Ralf Baechle
 o The AX.25 Howto has been unmaintained for several years.  I've replaced it
   with a wiki at http://www.linux-ax25.org which provides more up-to-date
   information.
 o Change default for AX25_DAMA_SLAVE to Y.  AX25_DAMA_SLAVE only compiles
   in support for DAMA but doesn't activate it.  I hope this gets Linux
   distributions to ship their AX.25 kernels with AX25_DAMA_SLAVE enabled.
   The price for this would be very small.
 o Delete the historic changelog from comments; that's what SCM systems are
   meant for.
 o ---help--- in Kconfig looks so yellingly eye insulting.  Use just help.
 o Rewrite the commented out piece of old Linux 2.4 configuration language
   to Kconfig for consistency.
 o Fixup dependencies.

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

diff --git a/Documentation/networking/ax25.txt 
b/Documentation/networking/ax25.txt
index 37c25b0..8257dbf 100644
--- a/Documentation/networking/ax25.txt
+++ b/Documentation/networking/ax25.txt
@@ -1,16 +1,10 @@
 To use the amateur radio protocols within Linux you will need to get a
-suitable copy of the AX.25 Utilities. More detailed information about these
-and associated programs can be found on http://zone.pspt.fi/~jsn/.
-
-For more information about the AX.25, NET/ROM and ROSE protocol stacks, see
-the AX25-HOWTO written by Terry Dawson [EMAIL PROTECTED]
-who is also the AX.25 Utilities maintainer.
+suitable copy of the AX.25 Utilities. More detailed information about
+AX.25, NET/ROM and ROSE, associated programs and utilities can be
+found on http://www.linux-ax25.org.
 
 There is an active mailing list for discussing Linux amateur radio matters
-called linux-hams. To subscribe to it, send a message to
+called [EMAIL PROTECTED] To subscribe to it, send a message to
 [EMAIL PROTECTED] with the words subscribe linux-hams in the body
-of the message, the subject field is ignored.
-
-Jonathan G4KLX
-
[EMAIL PROTECTED]
+of the message, the subject field is ignored.  You don't need to be
+subscribed to post but of course that means you might miss an answer.
diff --git a/net/ax25/Kconfig b/net/ax25/Kconfig
index a8993a0..43dd86f 100644
--- a/net/ax25/Kconfig
+++ b/net/ax25/Kconfig
@@ -1,30 +1,27 @@
 #
 # Amateur Radio protocols and AX.25 device configuration
 #
-# 19971130 Now in an own category to make correct compilation of the
-#  AX.25 stuff easier...
-#  Joerg Reuter DL1BKE [EMAIL PROTECTED]
-# 19980129 Moved to net/ax25/Config.in, sourcing device drivers.
 
 menuconfig HAMRADIO
depends on NET
bool "Amateur Radio support"
help
  If you want to connect your Linux box to an amateur radio, answer Y
- here. You want to read http://www.tapr.org/tapr/html/pkthome.html and
- the AX25-HOWTO, available from http://www.tldp.org/docs.html#howto.
+ here. You want to read http://www.tapr.org/tapr/html/pkthome.html
+ and more specifically about AX.25 on Linux
+ http://www.linux-ax25.org/.
 
  Note that the answer to this question won't directly affect the
  kernel: saying N will just cause the configurator to skip all
  the questions about amateur radio.
 
 comment "Packet Radio protocols"
-   depends on HAMRADIO && NET
+   depends on HAMRADIO
 
 config AX25
tristate "Amateur Radio AX.25 Level 2 protocol"
-   depends on HAMRADIO && NET
-   ---help---
+   depends on HAMRADIO
+   help
  This is the protocol used for computer communication over amateur
  radio. It is either used by itself for point-to-point links, or to
  carry other protocols such as tcp/ip. To use it, you need a device
@@ -52,6 +49,7 @@ config AX25
 
 config AX25_DAMA_SLAVE
bool "AX.25 DAMA Slave support"
+   default y
depends on AX25
help
  DAMA is a mechanism to prevent collisions when doing AX.25
@@ -59,23 +57,38 @@ config AX25_DAMA_SLAVE
  from clients (called slaves) and redistributes it to other slaves.
  If you say Y here, your Linux box will act as a DAMA slave; this is
  transparent in that you don't have to do any special DAMA
- configuration. (Linux cannot yet act as a DAMA server.) If unsure,
- say N.
+ configuration. Linux cannot yet act as a DAMA server.  This option
+ only compiles DAMA slave support into the kernel.  It still needs to
+ be enabled at runtime.  For more about DAMA see
+ http://www.linux-ax25.org.  If unsure, say Y.
+
+# placeholder until implemented
+config AX25_DAMA_MASTER
+   bool 'AX.25 DAMA Master support'
+   depends on AX25_DAMA_SLAVE && BROKEN
+   help
+ DAMA is a mechanism to prevent collisions when doing AX.25
+ networking. A DAMA server (called master) accepts incoming traffic
+ from clients (called slaves) and redistributes it to other slaves.
+ If you say Y here, your Linux box will act as a DAMA master; this is
+ 

Re: SWS for rcvbuf < MTU

2007-03-13 Thread John Heffner

Alex Sidorenko wrote:
Here are the values from live kernel (obtained with 'crash') when the host was 
in SWS state:


full_space=708  full_space/2=354
free_space=393
window=76

In this case the test from my original fix, (window < full_space/2),
succeeds. But John's test


free_space > window + full_space/2
393 > 430

does not. So I suspect that the new fix will not always work. From tcpdump 
traces we can see that both hosts exchange with 76-byte packets for a long 
time. From customer's application log we see that it continues to read 
76-byte chunks per each read() call - even though more than that is available 
in the receive buffer. Technically it's OK for read() to return even after
reading one byte, so if sk->receive_queue contains multiple 76-byte skbuffs
we may return after processing just one skbuff (but we don't understand
the details of why this happens on customer's system).


Are there any particular reasons why you want to postpone window update until
free_space becomes > window + full_space/2 and not as soon as
free_space > full_space/2? As the only real-life occurrence of SWS shows
free_space oscillating slightly above full_space/2, I created the fix
specifically to match this phenomenon as seen on customer's host. We reach the
modified section only when (free_space > full_space/2) so it should be OK to
update the window at this point if mss==full_space.

So yes, we can test John's fix on customer's host but I doubt it will work for 
the reasons mentioned above, in brief:


'window = free_space' instead of 'window=full_space/2' is OK,
but the test 'free_space > window + full_space/2' is not for the specific
pattern customer sees on his hosts.
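
The arithmetic above can be checked with a tiny sketch. The two helper names are invented for illustration; they only encode the two conditions being compared in this thread, evaluated with the values captured from the live kernel (full_space=708, free_space=393, window=76):

```c
#include <stdbool.h>

/* Test from the original fix: the advertised window is under half the
 * receive buffer. */
static bool window_below_half(int window, int full_space)
{
	return window < full_space / 2;
}

/* Test from the first version of the alternative patch: free space must
 * exceed the current window by more than half the receive buffer. */
static bool free_exceeds_window_plus_half(int free_space, int window,
					  int full_space)
{
	return free_space > window + full_space / 2;
}
```

With the captured values the first test is 76 < 354 and holds, while the second is 393 > 430 and does not, which matches the description above.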



Sorry for the long delay in response, I've been on vacation.  I'm okay 
with your patch, and I can't think of any real problem with it, except 
that the behavior is non-standard.  Then again, Linux acking in general 
is non-standard, which has created the bug in the first place. :)  The 
only thing I can think where it might still ack too often is if 
free_space frequently drops just below full_space/2 for a bit then rises 
above full_space/2.


I've also attached a corrected version of my earlier patch that I think 
solves the problem you noted.


Thanks,
  -John
Do full receiver-side SWS avoidance when rcvbuf < mss.

Signed-off-by: John Heffner [EMAIL PROTECTED]

---
commit f4333661026621e15549fb75b37be785e4a1c443
tree 30d46b64ea19634875fdd4656d33f76db526a313
parent 562aa1d4c6a874373f9a48ac184f662fbbb06a04
author John Heffner [EMAIL PROTECTED] Tue, 13 Mar 2007 14:17:03 -0400
committer John Heffner [EMAIL PROTECTED] Tue, 13 Mar 2007 14:17:03 -0400

 net/ipv4/tcp_output.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..e621a63 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1605,8 +1605,15 @@ u32 __tcp_select_window(struct sock *sk)
 * We also don't do any window rounding when the free space
 * is too small.
 */
-   if (window <= free_space - mss || window > free_space)
+   if (window <= free_space - mss || window > free_space) {
window = (free_space/mss)*mss;
+   } else if (mss == full_space) {
+   /* Do full receive-side SWS avoidance
+* when rcvbuf <= mss */
+   window = tcp_receive_window(tp);
+   if (free_space > window + full_space/2)
+   window = free_space;
+   }
}
 
return window;


Re: [PATCH] tc35815: Fix an usage of streaming DMA API.

2007-03-13 Thread Stephen Hemminger
On Wed, 14 Mar 2007 01:02:20 +0900 (JST)
Atsushi Nemoto [EMAIL PROTECTED] wrote:

> The tc35815 driver lacks a call to pci_dma_sync_single_for_device() on
> receiving.  Recent fix of MIPS dma_sync_single_for_cpu() reveal this
> bug.
>
> Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
> ---
> This patch can be applied to netdev-2.6 tree or 2.6.21-rc3-mm2.
>
> diff --git a/drivers/net/tc35815.c b/drivers/net/tc35815.c
> index ec888db..eed78b5 100644
> --- a/drivers/net/tc35815.c
> +++ b/drivers/net/tc35815.c
> @@ -58,12 +58,13 @@
>   *   1.34    Fix netpoll locking.  BH rule for NAPI is not enough with
>   *           netpoll, hard_start_xmit might be called from irq context.
>   *           PM support.
> + *   1.35    Fix an usage of streaming DMA API.
>   */

Please don't use comments as changelog anymore. It gets out of date.
The use of change control systems has made this practice obsolete.

-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [patch 4/4] tcp: statistics not read_mostly

2007-03-13 Thread Andi Kleen
Stephen Hemminger [EMAIL PROTECTED] writes:
 
> /*
>  * FIXME: On x86 and some other CPUs the split into user and softirq parts
>  * is not needed because addl $1,memory is atomic against interrupts (but
>  * atomic_inc would be overkill because of the lock cycles). Wants new
>  * nonlocked_atomic_inc() primitives -AK
>  */

That exists now as local_t.  And in fact the generic (non x86) local_t
is implemented in the same way as the current network statistics
(although I'm not convinced that's the best portable way to do this)

-Andi


Re: [PATCH] natsemi: netpoll fixes

2007-03-13 Thread Mark Brown
On Tue, Mar 13, 2007 at 04:53:54PM +0300, Sergei Shtylyov wrote:
> Mark Brown wrote:

>> confused and eventually locks up.  Before locking up it will usually
>> report one or more oversized packets so this is a useful hint that we
>> should reset the receive state machine in order to recover from this.

> That's all good but why do we need to completely lose TX and other interrupts
> in the meantime? High inbound traffic doesn't necessarily mean a high
> outbound one, does it?

While the code in the driver can cope if the chip takes a while to
respond to the reset as far as I have been able to tell in testing 
it does so close enough to immediately to avoid repeating the loop at
all.  The effect on transmit processing should be minimal.

-- 
You grabbed my hand and we fell into it, like a daydream - or a fever.




Re: bridge: faster compare for link local addresses

2007-03-13 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 14:38:32 +0100

> But memcmp() has a strong semantic (in libc). memcmp(a, b, 6) should
> do 6 byte compares and conditional branches, regardless of a/b
> alignment.  Or use the x86 rep cmpsb instruction that basically
> has the same cost.

Yep, that's the issue, gcc won't make the reductions necessary
here to get it down to one comparison and one branch.


Re: wireless extensions vs. 64-bit architectures

2007-03-13 Thread Johannes Berg
On Mon, 2007-03-12 at 10:56 -0700, Jean Tourrilhes wrote:

>    I did that in the e-mail to Jouni. The problem is that most
> people are unfamiliar with decoding iwevents, so can't grasp the
> explanation.
>    Basically, for iwpoint, we have an outer length and an inner
> length. If they don't match, we have an alignment issue and just need
> to pick the payload 8 bytes after the expected location.
>    For other events, they have a well known size. If the outer
> length is not the expected size, but is expected+4, you just pick the
> payload 4 bytes after the expected location.

Ok.

So the plan now is to put this document up somewhere maybe with some
graphics or whatever, and then send this to distros so they know what
happens when people hit this bug.

Does your new version work without padding even on 64-bit arches? Then
in a few years we can actually remove the padding completely in the
kernel, right?

johannes




Re: bridge: faster compare for link local addresses

2007-03-13 Thread Stephen Hemminger
On Tue, 13 Mar 2007 12:39:54 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

> From: Eric Dumazet [EMAIL PROTECTED]
> Date: Tue, 13 Mar 2007 14:38:32 +0100
>
> > But memcmp() has a strong semantic (in libc). memcmp(a, b, 6) should
> > do 6 byte compares and conditional branches, regardless of a/b
> > alignment.  Or use the x86 rep cmpsb instruction that basically
> > has the same cost.
>
> Yep, that's the issue, gcc won't make the reductions necessary
> here to get it down to one comparison and one branch.

Also, for our usage we only care about equality, not greater/less than
return value.
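
As an illustration of an equality-only compare, here is a portable userspace sketch. The helper name mac_equal() is made up and this is not the bridge code; it loads each 6-byte address as three 16-bit words (through memcpy, so alignment does not matter here) and ORs the XOR differences, leaving a single test at the end:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Equality-only compare of two 6-byte addresses: no ordering result,
 * hence no per-byte branching is required. */
static bool mac_equal(const unsigned char *a, const unsigned char *b)
{
	uint16_t a16[3], b16[3];

	memcpy(a16, a, sizeof(a16));
	memcpy(b16, b, sizeof(b16));

	return ((a16[0] ^ b16[0]) |
		(a16[1] ^ b16[1]) |
		(a16[2] ^ b16[2])) == 0;
}
```

In kernel code the memcpy() calls would normally be replaced by direct 16-bit loads when the addresses are known to be suitably aligned; that is an assumption about the caller, not something this sketch checks.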

-- 
Stephen Hemminger [EMAIL PROTECTED]


[RFC] Get rid of netdev_nit

2007-03-13 Thread Stephen Hemminger
It isn't any faster to test a boolean global variable than do a 
simple check for empty list.
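
For reference, a self-contained model of a circular doubly linked list in the style of the kernel's list_head, written for userspace. The helpers below are simplified re-implementations for illustration (list_init() is a hypothetical name), not the kernel headers. The empty check is one pointer load and one compare, the same order of cost as testing a counter:

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* One load, one compare: no separate counter has to be kept in sync. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}
```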

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/core/dev.c |   18 +-
 1 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 3a8590c..f2ae2c9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -226,12 +226,6 @@ #endif
 
***/
 
 /*
- * For efficiency
- */
-
-static int netdev_nit;
-
-/*
  * Add a protocol ID to the list. Now that the input handler is
  * smarter we can dispense with all the messy stuff that used to be
  * here.
@@ -265,10 +259,9 @@ void dev_add_pack(struct packet_type *pt
int hash;
 
spin_lock_bh(&ptype_lock);
-   if (pt->type == htons(ETH_P_ALL)) {
-   netdev_nit++;
+   if (pt->type == htons(ETH_P_ALL))
list_add_rcu(&pt->list, &ptype_all);
-   } else {
+   else {
hash = ntohs(pt->type) & 15;
list_add_rcu(&pt->list, &ptype_base[hash]);
}
@@ -295,10 +288,9 @@ void __dev_remove_pack(struct packet_typ
 
spin_lock_bh(&ptype_lock);
 
-   if (pt->type == htons(ETH_P_ALL)) {
-   netdev_nit--;
+   if (pt->type == htons(ETH_P_ALL))
head = &ptype_all;
-   } else
+   else
head = &ptype_base[ntohs(pt->type) & 15];
 
list_for_each_entry(pt1, head, list) {
@@ -1333,7 +1325,7 @@ static int dev_gso_segment(struct sk_buf
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
if (likely(!skb->next)) {
-   if (netdev_nit)
+   if (!list_empty(&ptype_all))
dev_queue_xmit_nit(skb, dev);
 
if (netif_needs_gso(dev, skb)) {
-- 
1.4.1



Re: wireless extensions vs. 64-bit architectures

2007-03-13 Thread Jean Tourrilhes
On Tue, Mar 13, 2007 at 08:42:05PM +0100, Johannes Berg wrote:
> On Mon, 2007-03-12 at 10:56 -0700, Jean Tourrilhes wrote:
>
> >    I did that in the e-mail to Jouni. The problem is that most
> > people are unfamiliar with decoding iwevents, so can't grasp the
> > explanation.
> >    Basically, for iwpoint, we have an outer length and an inner
> > length. If they don't match, we have an alignment issue and just need
> > to pick the payload 8 bytes after the expected location.
> >    For other events, they have a well known size. If the outer
> > length is not the expected size, but is expected+4, you just pick the
> > payload 4 bytes after the expected location.
>
> Ok.
>
> So the plan now is to put this document up somewhere maybe with some
> graphics or whatever, and then send this to distros so they know what
> happens when people hit this bug.
>
> Does your new version work without padding even on 64-bit arches? Then
> in a few years we can actually remove the padding completely in the
> kernel, right?

You are too smart ;-) Yes, the second version in pre16 does
exactly that. That's why I had to change the constants.

> johannes

Jean



Re: [IPROUTE2][GENERAL] nl_mgrp to crap if base multicast groups exceeded

2007-03-13 Thread Stephen Hemminger
On Sun, 25 Feb 2007 12:02:23 -0500
jamal [EMAIL PROTECTED] wrote:

 
> cheers,
> jamal
 

applied both patches

-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [RFC IPROUTE 00/08]: Time cleanups + nano-second clock resolution support

2007-03-13 Thread Stephen Hemminger
On Sun,  4 Mar 2007 20:14:53 +0100 (MET)
Patrick McHardy [EMAIL PROTECTED] wrote:

 This patchset consists of four parts:
 
 - minor TBF time conversion fix
 
 - consolidation of time calculations: consolidate commonly used expressions
   with the goal of making it easier to audit for integer overflows when
   increasing the internally used clock resolution.
  
 - support for detecting the clock resolution used by the kernel and converting
   time values as necessary.
 
 - finally, increase the internally used clock resolution to nano-seconds
 
 These patches have been tested (well, TBF and HFSC) with both old kernels
 and patched kernels using nano-second resolution.
 
 
  tc/m_estimator.c  |4 +--
  tc/m_police.c |2 -
  tc/q_cbq.c|   15 +++--
  tc/q_hfsc.c   |   18 +++
  tc/q_htb.c|4 +--
  tc/q_netem.c  |   14 +++-
  tc/q_tbf.c|   22 +--
  tc/tc_cbq.c   |8 +++
  tc/tc_core.c  |   61 ++
  tc/tc_core.h  |   13 +++
  tc/tc_estimator.c |2 -
  tc/tc_red.c   |2 -
  tc/tc_util.c  |   40 ++-
  tc/tc_util.h  |7 +++---
  14 files changed, 125 insertions(+), 87 deletions(-)
 
 Patrick McHardy:
   [IPROUTE]: tbf: fix latency printing
   [IPROUTE]: Use tc_calc_xmittime() where appropriate
   [IPROUTE]: Introduce tc_calc_xmitsize and use where appropriate
   [IPROUTE]: Introduce TIME_UNITS_PER_SEC to represent internal clock resolution
   [IPROUTE]: Replace usec by time in function names
   [IPROUTE]: Add sprint_ticks() function and use in CBQ
   [IPROUTE]: Handle different kernel clock resolutions
   [IPROUTE]: Increase internal clock resolution to nsec

applied all

-- 
Stephen Hemminger [EMAIL PROTECTED]


[PATCHES 0/15] skb->h is now a one member union

2007-03-13 Thread Arnaldo Carvalho de Melo

Hi David,

Please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.22

We're getting close...

Thanks a lot!

- Arnaldo


[PATCH 01/15] [SK_BUFF]: Introduce skb_reset_transport_header(skb)

2007-03-13 Thread Arnaldo Carvalho de Melo

For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into an offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple cases:

skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()

The next ones will handle the slightly more complex cases.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
drivers/infiniband/hw/cxgb3/iwch_cm.c   |6 +++---
drivers/net/appletalk/cops.c|2 +-
drivers/net/appletalk/ltpc.c|4 ++--
drivers/net/cxgb3/sge.c |2 +-
include/linux/dccp.h|6 +++---
include/linux/skbuff.h  |5 +
net/appletalk/aarp.c|6 +++---
net/appletalk/ddp.c |4 ++--
net/ax25/af_ax25.c  |4 ++--
net/ax25/ax25_in.c  |8 
net/bluetooth/af_bluetooth.c|2 +-
net/bluetooth/hci_core.c|9 +
net/bluetooth/hci_sock.c|2 +-
net/core/dev.c  |2 +-
net/core/netpoll.c  |2 +-
net/decnet/dn_nsp_in.c  |2 +-
net/decnet/dn_nsp_out.c |2 +-
net/decnet/dn_route.c   |4 ++--
net/ipv4/af_inet.c  |6 --
net/ipv4/ah4.c  |3 ++-
net/ipv4/ip_input.c |2 +-
net/ipv4/ip_output.c|2 +-
net/ipv4/ipmr.c |2 +-
net/ipv4/udp.c  |3 ++-
net/ipv4/xfrm4_mode_transport.c |2 +-
net/ipv6/ip6_input.c|2 +-
net/ipv6/ip6_output.c   |8 
net/ipv6/ipv6_sockglue.c|4 ++--
net/ipv6/netfilter/nf_conntrack_reasm.c |2 +-
net/ipv6/reassembly.c   |2 +-
net/ipv6/xfrm6_mode_transport.c |2 +-
net/ipx/af_ipx.c|2 +-
net/ipx/ipx_route.c |2 +-
net/irda/af_irda.c  |4 ++--
net/irda/irlap_frame.c  |2 +-
net/iucv/af_iucv.c  |2 +-
net/key/af_key.c|2 +-
net/llc/llc_sap.c   |2 +-
net/netlink/af_netlink.c|2 +-
net/netrom/af_netrom.c  |6 +++---
net/netrom/nr_in.c  |2 +-
net/netrom/nr_loopback.c|2 +-
net/rose/af_rose.c  |2 +-
net/rose/rose_loopback.c|2 +-
net/rose/rose_route.c   |2 +-
net/unix/af_unix.c  |2 +-
net/x25/af_x25.c|3 +--
net/x25/x25_dev.c   |2 +-
net/x25/x25_in.c|2 +-
49 files changed, 82 insertions(+), 73 deletions(-)
From 410c353531e314f7c3642471b2f1b61bd8fc4ef7 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 13:06:52 -0300
Subject: [PATCH 01/15] [SK_BUFF]: Introduce skb_reset_transport_header(skb)

 
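A rough userspace sketch of why the accessor helps, assuming nothing beyond
what the changelog above says: once every call site goes through a helper,
the transport header can be stored either as a pointer or as an offset
without touching the callers. The struct and function names below
(mock_skb_ptr, mock_skb_off, mock_*) are made up for illustration and are
not the kernel definitions; a 32-bit offset is narrower than a pointer on a
typical 64-bit target, which is the size saving being aimed at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for struct sk_buff; NOT the kernel layout. */
struct mock_skb_ptr {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the current payload */
	unsigned char *h_raw;	/* transport header stored as a pointer */
};

struct mock_skb_off {
	unsigned char *head;
	unsigned char *data;
	uint32_t transport_header;	/* transport header stored as an offset from head */
};

/* Pointer flavour: wraps the open coded 'skb->h.raw = skb->data'. */
static inline void mock_reset_transport_header_ptr(struct mock_skb_ptr *skb)
{
	skb->h_raw = skb->data;
}

/* Offset flavour: same call site, different storage (assumed design). */
static inline void mock_reset_transport_header_off(struct mock_skb_off *skb)
{
	skb->transport_header = (uint32_t)(skb->data - skb->head);
}

/* Reading the header back from the offset representation. */
static inline unsigned char *mock_transport_header_off(const struct mock_skb_off *skb)
{
	return skb->head + skb->transport_header;
}
```

Both flavours end up describing the same byte in the buffer; only the field
width differs.
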

[PATCH 02/15] [SK_BUFF]: Introduce skb_transport_offset()

2007-03-13 Thread Arnaldo Carvalho de Melo

For the quite common 'skb->h.raw - skb->data' sequence.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
drivers/net/atl1/atl1_main.c   |   10 +-
drivers/net/cassini.c  |6 ++
drivers/net/cxgb3/sge.c|7 ---
drivers/net/e1000/e1000_main.c |   10 +-
drivers/net/ixgb/ixgb_main.c   |8 
drivers/net/myri10ge/myri10ge.c|5 +++--
drivers/net/netxen/netxen_nic_hw.c |2 +-
drivers/net/sk98lin/skge.c |4 ++--
drivers/net/skge.c |2 +-
drivers/net/sky2.c |2 +-
drivers/net/sungem.c   |6 ++
drivers/net/sunhme.c   |6 ++
include/linux/skbuff.h |5 +
include/net/udplite.h  |6 +++---
net/core/dev.c |2 +-
net/core/skbuff.c  |2 +-
net/ipv4/esp4.c|2 +-
net/ipv4/udp.c |2 +-
net/ipv6/esp6.c|9 +++--
net/ipv6/exthdrs.c |   12 +++-
net/ipv6/ip6_input.c   |2 +-
net/ipv6/ipcomp6.c |4 +---
net/ipv6/mip6.c|5 +++--
net/ipv6/raw.c |4 ++--
net/ipv6/reassembly.c  |3 ++-
net/sctp/input.c   |2 +-
26 files changed, 64 insertions(+), 64 deletions(-)
From 9e8e523e2f63bdb7f93f50990817d82a94d276e1 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 13:24:15 -0300
Subject: [PATCH 02/15] [SK_BUFF]: Introduce skb_transport_offset()


diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index 5d69178..c5ac46f 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -1326,8 +1326,8 @@ static int atl1_tx_csum(struct atl1_adapter *adapter, struct sk_buff *skb,
 	u8 css, cso;
 
 	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
-		cso = skb->h.raw - skb->data;
-		css = (skb->h.raw + skb->csum) - skb->data;
+		cso = skb_transport_offset(skb);
+		css = cso + skb->csum;
 		if (unlikely(cso & 0x1)) {
 			printk(KERN_DEBUG "%s: payload offset != even number\n",
 				atl1_driver_name);
@@ -1369,8 +1369,8 @@ static void atl1_tx_map(struct atl1_adapter *adapter,
 
 	if (tcp_seg) {
 		/* TSO/GSO */
-		proto_hdr_len =
-		((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
+		proto_hdr_len = (skb_transport_offset(skb) +
+				 (skb->h.th->doff << 2));
 		buffer_info->length = proto_hdr_len;
 		page = virt_to_page(skb->data);
 		offset = (unsigned long)skb->data & ~PAGE_MASK;
@@ -1562,7 +1562,7 @@ static int atl1_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	mss = skb_shinfo(skb)->gso_size;
 	if (mss) {
 		if (skb->protocol == ntohs(ETH_P_IP)) {
-			proto_hdr_len = ((skb->h.raw - skb->data) +
+			proto_hdr_len = (skb_transport_offset(skb) +
 					 (skb->h.th->doff << 2));
 			if (unlikely(proto_hdr_len > len)) {
 				dev_kfree_skb_any(skb);
diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
index 68e37a6..bd3ab64 100644
--- a/drivers/net/cassini.c
+++ b/drivers/net/cassini.c
@@ -2821,10 +2821,8 @@ static inline int cas_xmit_tx_ringN(struct cas *cp, int ring,
 
 	ctrl = 0;
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		u64 csum_start_off, csum_stuff_off;
-
-		csum_start_off = (u64) (skb->h.raw - skb->data);
-		csum_stuff_off = csum_start_off + skb->csum_offset;
+		const u64 csum_start_off = skb_transport_offset(skb);
+		const u64 csum_stuff_off = csum_start_off + skb->csum_offset;
 
 		ctrl =  

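The atl1 and cassini hunks above both derive two numbers from the transport
offset: where hardware checksumming starts, and where the result is stored.
A minimal userspace sketch of that arithmetic follows, assuming a mock
struct (mock_skb, mock_csum_offsets and friends are hypothetical names, not
kernel API) and a field called csum_offset as in the cassini hunk.

```c
#include <assert.h>

/* Userspace stand-in for the few sk_buff fields used here. */
struct mock_skb {
	unsigned char *data;	/* start of the frame */
	unsigned char *h_raw;	/* start of the transport header */
	int csum_offset;	/* checksum field offset inside that header */
};

/* Mirrors the 'skb->h.raw - skb->data' sequence the patch wraps. */
static inline int mock_transport_offset(const struct mock_skb *skb)
{
	return (int)(skb->h_raw - skb->data);
}

/* Both offsets are relative to skb->data. */
struct mock_csum_offsets {
	int start;	/* first byte covered by the checksum */
	int stuff;	/* where the computed checksum is written */
};

static inline struct mock_csum_offsets mock_csum_offsets(const struct mock_skb *skb)
{
	struct mock_csum_offsets o;

	o.start = mock_transport_offset(skb);
	o.stuff = o.start + skb->csum_offset;
	return o;
}
```

For a frame with a 14 byte Ethernet header, a 20 byte IPv4 header and a TCP
checksum field 16 bytes into the TCP header, this gives a start offset of 34
and a stuff offset of 50.
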
[PATCH 03/15] [SK_BUFF]: Introduce skb_set_transport_header

2007-03-13 Thread Arnaldo Carvalho de Melo

For the cases where the transport header is being set to an offset from
skb->data.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
include/linux/skbuff.h  |6 ++
net/ax25/af_ax25.c  |   20 
net/ax25/ax25_in.c  |2 +-
net/ipv4/esp4.c |3 ++-
net/ipv4/ip_output.c|   19 ---
net/ipv4/tcp_input.c|2 +-
net/ipv6/ah6.c  |2 +-
net/ipv6/esp6.c |4 ++--
net/ipv6/netfilter/nf_conntrack_reasm.c |2 +-
net/ipv6/xfrm6_mode_beet.c  |2 +-
net/ipv6/xfrm6_mode_ro.c|2 +-
net/ipv6/xfrm6_mode_transport.c |2 +-
12 files changed, 33 insertions(+), 33 deletions(-)
From 275275bab5d9e1887f1227fddbee875eaec82c6f Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 13:51:52 -0300
Subject: [PATCH 03/15] [SK_BUFF]: Introduce skb_set_transport_header


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f721fab..12bd740 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -960,6 +960,12 @@ static inline void skb_reset_transport_header(struct sk_buff *skb)
 	skb->h.raw = skb->data;
 }
 
+static inline void skb_set_transport_header(struct sk_buff *skb,
+	const int offset)
+{
+	skb->h.raw = skb->data + offset;
+}
+
 static inline int skb_transport_offset(const struct sk_buff *skb)
 {
 	return skb->h.raw - skb->data;
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 14db01a..75d4d69 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1425,7 +1425,6 @@ static int ax25_sendmsg(struct kiocb *iocb, struct socket *sock,
 	struct sockaddr_ax25 sax;
 	struct sk_buff *skb;
 	ax25_digi dtmp, *dp;
-	unsigned char *asmptr;
 	ax25_cb *ax25;
 	size_t size;
 	int lv, err, addr_len = msg->msg_namelen;
@@ -1551,10 +1550,8 @@ static int ax25_sendmsg(struct kiocb *iocb, struct socket *sock,
 	skb_reset_network_header(skb);
 
 	/* Add the PID if one is not supplied by the user in the skb */
-	if (!ax25->pidincl) {
-		asmptr  = skb_push(skb, 1);
-		*asmptr = sk->sk_protocol;
-	}
+	if (!ax25->pidincl)
+		*skb_push(skb, 1) = sk->sk_protocol;
 
 	SOCK_DEBUG(sk, "AX.25: Transmitting buffer\n");
 
@@ -1573,7 +1570,7 @@ static int ax25_sendmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 	}
 
-	asmptr = skb_push(skb, 1 + ax25_addr_size(dp));
+	skb_push(skb, 1 + ax25_addr_size(dp));
 
 	SOCK_DEBUG(sk, "Building AX.25 Header (dp=%p).\n", dp);
 
@@ -1581,17 +1578,16 @@ static int ax25_sendmsg(struct kiocb *iocb, struct socket *sock,
 		SOCK_DEBUG(sk, "Num digipeaters=%d\n", dp->ndigi);
 
 	/* Build an AX.25 header */
-	asmptr += (lv = ax25_addr_build(asmptr, &ax25->source_addr,
-	&sax.sax25_call, dp,
-	AX25_COMMAND, AX25_MODULUS));
+	lv = ax25_addr_build(skb->data, &ax25->source_addr, &sax.sax25_call,
+			 dp, AX25_COMMAND, AX25_MODULUS);
 
 	SOCK_DEBUG(sk, "Built header (%d bytes)\n",lv);
 
-	skb->h.raw = asmptr;
+	skb_set_transport_header(skb, lv);
 
-	SOCK_DEBUG(sk, "base=%p pos=%p\n", skb->data, asmptr);
+	SOCK_DEBUG(sk, "base=%p pos=%p\n", skb->data, skb->h.raw);
 
-	*asmptr = AX25_UI;
+	*skb->h.raw = AX25_UI;
 
 	/* Datagram frames go straight out of the door as UI */
 	ax25_queue_xmit(skb, ax25->ax25_dev->dev);
diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
index 724ad5c..31c5938 100644
--- a/net/ax25/ax25_in.c
+++ b/net/ax25/ax25_in.c
@@ -233,7 +233,7 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
 
 	/* UI frame - bypass LAPB processing */
 	if ((*skb->data & ~0x10) == AX25_UI && dp.lastrepeat + 1 == dp.ndigi) {
-		skb->h.raw = skb->data + 2;		/* skip control and pid */
+		skb_set_transport_header(skb, 2); /* skip control and pid */
 
 		ax25_send_to_raw(&dest, skb, skb->data[1]);
 
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 9576745..82543ee 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -261,7 +261,8 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 
 	iph->protocol = nexthdr[1];
 	pskb_trim(skb, 

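The ax25_sendmsg() change above replaces a running pointer (asmptr) with
"build the header at skb->data, then set the transport header lv bytes in".
A small userspace sketch of that pattern, under the assumption of a mock
struct and a stand-in header builder; mock_skb, mock_build_header,
mock_finish_frame and MOCK_UI are hypothetical names, and the 0x03 value is
only a placeholder for the control byte.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the two sk_buff fields used here. */
struct mock_skb {
	unsigned char *data;	/* start of the frame being built */
	unsigned char *h_raw;	/* transport header pointer */
};

/* Same shape as the helper introduced by the patch. */
static inline void mock_set_transport_header(struct mock_skb *skb,
					     const int offset)
{
	skb->h_raw = skb->data + offset;
}

/* Stand-in for an address header builder: fills len bytes, returns len. */
static int mock_build_header(unsigned char *buf, int len)
{
	memset(buf, 0xAA, (size_t)len);
	return len;
}

#define MOCK_UI 0x03	/* placeholder control byte */

/* Build the header, then place the control byte right after it. */
static void mock_finish_frame(struct mock_skb *skb, int hdr_len)
{
	int lv = mock_build_header(skb->data, hdr_len);

	mock_set_transport_header(skb, lv);
	*skb->h_raw = MOCK_UI;
}
```

The point of the rewrite is that no local pointer has to be carried across
the function; the position is recovered from skb->data plus the returned
length.
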
[PATCH 04/15] [SCTP]: Introduce sctp_hdr()

2007-03-13 Thread Arnaldo Carvalho de Melo

For consistency with all the other skb->h.raw accessors.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
include/linux/sctp.h |9 +
net/sctp/input.c |   14 +-
net/sctp/ipv6.c  |4 ++--
net/sctp/protocol.c  |   10 --
4 files changed, 20 insertions(+), 17 deletions(-)
From c483c1c1cc2c8e79c8eb06ef8f1e9fe8b5860721 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 13:59:32 -0300
Subject: [PATCH 04/15] [SCTP]: Introduce sctp_hdr()


diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index d4f8656..d76767d 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -63,6 +63,15 @@ typedef struct sctphdr {
 	__be32 checksum;
 } __attribute__((packed)) sctp_sctphdr_t;
 
+#ifdef __KERNEL__
+#include <linux/skbuff.h>
+
+static inline struct sctphdr *sctp_hdr(const struct sk_buff *skb)
+{
+	return (struct sctphdr *)skb->h.raw;
+}
+#endif
+
 /* Section 3.2.  Chunk Field Descriptions. */
 typedef struct sctp_chunkhdr {
 	__u8 type;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 9311b5d..3a322c5 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -79,14 +79,10 @@ static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
 /* Calculate the SCTP checksum of an SCTP packet.  */
 static inline int sctp_rcv_checksum(struct sk_buff *skb)
 {
-	struct sctphdr *sh;
-	__u32 cmp, val;
 	struct sk_buff *list = skb_shinfo(skb)->frag_list;
-
-	sh = (struct sctphdr *) skb->h.raw;
-	cmp = ntohl(sh->checksum);
-
-	val = sctp_start_cksum((__u8 *)sh, skb_headlen(skb));
+	struct sctphdr *sh = sctp_hdr(skb);
+	__u32 cmp = ntohl(sh->checksum);
+	__u32 val = sctp_start_cksum((__u8 *)sh, skb_headlen(skb));
 
 	for (; list; list = list->next)
 		val = sctp_update_cksum((__u8 *)list->data, skb_headlen(list),
@@ -138,7 +134,7 @@ int sctp_rcv(struct sk_buff *skb)
 	if (skb_linearize(skb))
 		goto discard_it;
 
-	sh = (struct sctphdr *) skb->h.raw;
+	sh = sctp_hdr(skb);
 
 	/* Pull up the IP and SCTP headers. */
 	__skb_pull(skb, skb_transport_offset(skb));
@@ -905,7 +901,7 @@ static struct sctp_association *__sctp_rcv_init_lookup(struct sk_buff *skb,
 	struct sctp_association *asoc;
 	union sctp_addr addr;
 	union sctp_addr *paddr = &addr;
-	struct sctphdr *sh = (struct sctphdr *) skb->h.raw;
+	struct sctphdr *sh = sctp_hdr(skb);
 	sctp_chunkhdr_t *ch;
 	union sctp_params params;
 	sctp_init_chunk_t *init;
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index e1bfc50..dff72e0 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -390,7 +390,7 @@ static void sctp_v6_from_skb(union sctp_addr *addr,struct sk_buff *skb,
 	addr->v6.sin6_flowinfo = 0; /* FIXME */
 	addr->v6.sin6_scope_id = ((struct inet6_skb_parm *)skb->cb)->iif;
 
-	sh = (struct sctphdr *) skb->h.raw;
+	sh = sctp_hdr(skb);
 	if (is_saddr) {
 		*port  = sh->source;
 		from = &ipv6_hdr(skb)->saddr;
@@ -765,7 +765,7 @@ static void sctp_inet6_skb_msgname(struct sk_buff *skb, char *msgname,
 	if (msgname) {
 		sctp_inet6_msgname(msgname, addr_len);
 		sin6 = (struct sockaddr_in6 *)msgname;
-		sh = (struct sctphdr *)skb->h.raw;
+		sh = sctp_hdr(skb);
 		sin6->sin6_port = sh->source;
 
 		/* Map ipv4 address into v4-mapped-on-v6 address. */
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 08f92ba..7c28c9b 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -235,7 +235,7 @@ static void sctp_v4_from_skb(union sctp_addr *addr, struct sk_buff *skb,
 	port = &addr->v4.sin_port;
 	addr->v4.sin_family = AF_INET;
 
-	sh = (struct sctphdr *) skb->h.raw;
+	sh = sctp_hdr(skb);
 	if (is_saddr) {
 		*port  = sh->source;
 		from = &ip_hdr(skb)->saddr;
@@ -731,13 +731,11 @@ static void sctp_inet_event_msgname(struct sctp_ulpevent *event, char *msgname,
 /* Initialize and copy out a msgname from an inbound skb. */
 static void sctp_inet_skb_msgname(struct sk_buff *skb, char *msgname, int *len)
 {
-	struct sctphdr *sh;
-	struct sockaddr_in *sin;
-
 	if (msgname) {
+		struct sctphdr *sh = sctp_hdr(skb);
+		struct sockaddr_in *sin = (struct sockaddr_in *)msgname;
+
 		sctp_inet_msgname(msgname, len);
-		sin = (struct sockaddr_in *)msgname;
-		sh = (struct sctphdr *)skb->h.raw;
 		sin->sin_port = sh->source;
 		sin->sin_addr.s_addr = ip_hdr(skb)->saddr;
 	}
-- 
1.5.0.2



[PATCH 09/15] [TCP]: Introduce tcp_hdrlen() and tcp_optlen()

2007-03-13 Thread Arnaldo Carvalho de Melo

The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to
avoid the longer, open coded equivalent.

Ditched a no-op in bnx2 in the process.

I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()...

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
drivers/net/atl1/atl1_main.c |7 +++
drivers/net/bnx2.c   |7 +++
drivers/net/e1000/e1000_main.c   |4 ++--
drivers/net/ehea/ehea_main.c |2 +-
drivers/net/ixgb/ixgb_main.c |2 +-
drivers/net/myri10ge/myri10ge.c  |3 +--
drivers/net/netxen/netxen_nic_hw.c   |3 +--
drivers/net/netxen/netxen_nic_main.c |2 +-
drivers/net/sky2.c   |2 +-
drivers/net/tg3.c|4 ++--
drivers/s390/net/qeth_eddp.c |8 
include/linux/tcp.h  |   10 ++
net/ipv4/tcp_ipv4.c  |2 +-
net/ipv6/tcp_ipv6.c  |2 +-
14 files changed, 32 insertions(+), 26 deletions(-)
From 89d23ed26a2c62b8d8c0e2159c6cc9c6fb47a491 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 15:04:14 -0300
Subject: [PATCH 09/15] [TCP]: Introduce tcp_hdrlen() and tcp_optlen()


diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index c5ac46f..0912d2a 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -1307,7 +1307,7 @@ static int atl1_tso(struct atl1_adapter *adapter, struct sk_buff *skb,
 
 			tso->tsopl |= (iph->ihl &
 				CSUM_PARAM_IPHL_MASK) << CSUM_PARAM_IPHL_SHIFT;
-			tso->tsopl |= ((skb->h.th->doff << 2) &
+			tso->tsopl |= (tcp_hdrlen(skb) &
 				TSO_PARAM_TCPHDRLEN_MASK) << TSO_PARAM_TCPHDRLEN_SHIFT;
 			tso->tsopl |= (skb_shinfo(skb)->gso_size &
 				TSO_PARAM_MSS_MASK) << TSO_PARAM_MSS_SHIFT;
@@ -1369,8 +1369,7 @@ static void atl1_tx_map(struct atl1_adapter *adapter,
 
 	if (tcp_seg) {
 		/* TSO/GSO */
-		proto_hdr_len = (skb_transport_offset(skb) +
-				 (skb->h.th->doff << 2));
+		proto_hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
 		buffer_info->length = proto_hdr_len;
 		page = virt_to_page(skb->data);
 		offset = (unsigned long)skb->data & ~PAGE_MASK;
@@ -1563,7 +1562,7 @@ static int atl1_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	if (mss) {
 		if (skb->protocol == ntohs(ETH_P_IP)) {
 			proto_hdr_len = (skb_transport_offset(skb) +
-					 (skb->h.th->doff << 2));
+					 tcp_hdrlen(skb));
 			if (unlikely(proto_hdr_len > len)) {
 				dev_kfree_skb_any(skb);
 				return NETDEV_TX_OK;
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 01ea2ba..f948918 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -4520,13 +4520,12 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			return NETDEV_TX_OK;
 		}
 
-		tcp_opt_len = ((skb->h.th->doff - 5) * 4);
 		vlan_tag_flags |= TX_BD_FLAGS_SW_LSO;
 
 		tcp_opt_len = 0;
-		if (skb->h.th->doff > 5) {
-			tcp_opt_len = (skb->h.th->doff - 5) << 2;
-		}
+		if (skb->h.th->doff > 5)
+			tcp_opt_len = tcp_optlen(skb);
+
 		ip_tcp_len = ip_hdrlen(skb) + sizeof(struct tcphdr);
 
 		iph = ip_hdr(skb);
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 79988b7..40e18c2 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2887,7 +2887,7 @@ e1000_tso(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring,
 return err;
 		}
 
-		hdr_len = (skb_transport_offset(skb) + (skb->h.th->doff << 2));
+		hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
 		mss = skb_shinfo(skb)->gso_size;
 		if (skb->protocol == htons(ETH_P_IP)) {
 			struct iphdr *iph = ip_hdr(skb);
@@ -3292,7 +3292,7 @@ e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 		/* TSO Workaround for 82571/2/3 Controllers -- if skb->data
 		* points to just header, pull a few bytes of payload from
 		* frags into skb->data */
-		hdr_len = (skb_transport_offset(skb) + 

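The include/linux/tcp.h hunk of this patch is cut off in the archive, so the
following is only a guess at the shape of the two helpers, written against a
mock header struct in userspace (mock_tcphdr, mock_tcp_hdrlen and
mock_tcp_optlen are hypothetical names). It relies on the TCP data offset
field counting 32-bit words, with 5 words being the option-free header. It
also shows what the BUG_ON question in the changelog is about: with doff
below 5 the unsigned subtraction wraps around.

```c
#include <assert.h>

/* Only the field that matters here; doff counts 32-bit words. */
struct mock_tcphdr {
	unsigned int doff;
};

/* Whole TCP header in bytes: same as the open coded 'doff << 2'. */
static inline unsigned int mock_tcp_hdrlen(const struct mock_tcphdr *th)
{
	return th->doff * 4;
}

/* Option bytes only: header length minus the fixed 5-word part.
 * For doff < 5 this wraps to a huge value, hence the BUG_ON idea. */
static inline unsigned int mock_tcp_optlen(const struct mock_tcphdr *th)
{
	return (th->doff - 5) * 4;
}
```

With doff = 8 this yields a 32 byte header of which 12 bytes are options;
with doff = 5 the option length is 0, which is why the explicit doff > 5
test in bnx2 only guards against the wraparound case.
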
[PATCH 11/15] [SK_BUFF]: Introduce ipip_hdr(), remove skb->h.ipiph

2007-03-13 Thread Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
drivers/net/sk98lin/skge.c   |4 ++--
drivers/net/skge.c   |2 +-
include/linux/ip.h   |5 +
include/linux/skbuff.h   |1 -
net/ipv4/xfrm4_mode_tunnel.c |6 +++---
net/ipv6/xfrm6_mode_tunnel.c |2 +-
6 files changed, 12 insertions(+), 8 deletions(-)
From 13b28cb035592e5bfeb3da05550850b0c825e01e Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 15:52:43 -0300
Subject: [PATCH 11/15] [SK_BUFF]: Introduce ipip_hdr(), remove skb->h.ipiph


diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index e4ab7a8..b987a5c 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -1565,7 +1565,7 @@ struct sk_buff	*pMessage)	/* pointer to send-message  */
 		u16 hdrlen = skb_transport_offset(pMessage);
 		u16 offset = hdrlen + pMessage->csum_offset;
 
-		if ((pMessage->h.ipiph->protocol == IPPROTO_UDP ) &&
+		if ((ipip_hdr(pMessage)->protocol == IPPROTO_UDP) &&
 			(pAC->GIni.GIChipRev == 0) &&
 			(pAC->GIni.GIChipId == CHIP_ID_YUKON)) {
 			pTxd->TBControl = BMU_TCP_CHECK;
@@ -1691,7 +1691,7 @@ struct sk_buff	*pMessage)	/* pointer to send-message  */
 		** opcode for udp is not working in the hardware yet 
 		** (Revision 2.0)
 		*/
-		if ((pMessage->h.ipiph->protocol == IPPROTO_UDP ) &&
+		if ((ipip_hdr(pMessage)->protocol == IPPROTO_UDP) &&
 			(pAC->GIni.GIChipRev == 0) &&
 			(pAC->GIni.GIChipId == CHIP_ID_YUKON)) {
 			Control |= BMU_TCP_CHECK;
diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 609cdb4..26b0fe0 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -2626,7 +2626,7 @@ static int skge_xmit_frame(struct sk_buff *skb, struct net_device *dev)
 		/* This seems backwards, but it is what the sk98lin
 		 * does.  Looks like hardware is wrong?
 		 */
-		if (skb->h.ipiph->protocol == IPPROTO_UDP
+		if (ipip_hdr(skb)->protocol == IPPROTO_UDP
 		    && hw->chip_rev == 0 && hw->chip_id == CHIP_ID_YUKON)
 			control = BMU_TCP_CHECK;
 		else
diff --git a/include/linux/ip.h b/include/linux/ip.h
index f2f26db..1957844 100644
--- a/include/linux/ip.h
+++ b/include/linux/ip.h
@@ -111,6 +111,11 @@ static inline struct iphdr *ip_hdr(const struct sk_buff *skb)
 {
 	return (struct iphdr *)skb_network_header(skb);
 }
+
+static inline struct iphdr *ipip_hdr(const struct sk_buff *skb)
+{
+	return (struct iphdr *)skb->h.raw;
+}
 #endif
 
 struct ip_auth_hdr {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c5407b7..2b1e188 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -239,7 +239,6 @@ struct sk_buff {
 	struct net_device	*input_dev;
 
 	union {
-		struct iphdr	*ipiph;
 		struct ipv6hdr	*ipv6h;
 		unsigned char	*raw;
 	} h;
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index edba756..521e52f 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -17,7 +17,7 @@
 static inline void ipip_ecn_decapsulate(struct sk_buff *skb)
 {
 	struct iphdr *outer_iph = ip_hdr(skb);
-	struct iphdr *inner_iph = skb->h.ipiph;
+	struct iphdr *inner_iph = ipip_hdr(skb);
 
 	if (INET_ECN_is_ce(outer_iph->tos))
 		IP_ECN_set_ce(inner_iph);
@@ -47,7 +47,7 @@ static int xfrm4_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 	int flags;
 
 	iph = ip_hdr(skb);
-	skb->h.ipiph = iph;
+	skb->h.raw = skb->nh.raw;
 
 	skb_push(skb, x->props.header_len);
 	skb_reset_network_header(skb);
@@ -116,7 +116,7 @@ static int xfrm4_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 	iph = ip_hdr(skb);
 	if (iph->protocol == IPPROTO_IPIP) {
 		if (x->props.flags & XFRM_STATE_DECAP_DSCP)
-			ipv4_copy_dscp(iph, skb->h.ipiph);
+			ipv4_copy_dscp(iph, ipip_hdr(skb));
 		if (!(x->props.flags & XFRM_STATE_NOECN))
 			ipip_ecn_decapsulate(skb);
 	}
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 28f36b3..9d3bd33 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -28,7 +28,7 @@ static inline void ipip6_ecn_decapsulate(struct sk_buff *skb)
 static inline void ip6ip_ecn_decapsulate(struct sk_buff *skb)
 {
 	if (INET_ECN_is_ce(ipv6_get_dsfield(ipv6_hdr(skb))))
-			IP_ECN_set_ce(skb->h.ipiph);
+			IP_ECN_set_ce(ipip_hdr(skb));
 }
 
 /* Add encapsulation header.
-- 
1.5.0.2



[PATCH 06/15] [SK_BUFF]: Introduce igmp_hdr() & friends, remove skb->h.igmph

2007-03-13 Thread Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
include/linux/igmp.h   |   21 +
include/linux/skbuff.h |1 -
net/ipv4/igmp.c|   22 +++---
net/ipv4/ipmr.c|2 +-
4 files changed, 33 insertions(+), 13 deletions(-)
From 515b800d7c7cc7f224ebc24525be83d1e0f956ed Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 14:19:23 -0300
Subject: [PATCH 06/15] [SK_BUFF]: Introduce igmp_hdr() & friends, remove skb->h.igmph


diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index a113fe6..ca28552 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -80,6 +80,27 @@ struct igmpv3_query {
 	__be32 srcs[0];
 };
 
+#ifdef __KERNEL__
+#include <linux/skbuff.h>
+
+static inline struct igmphdr *igmp_hdr(const struct sk_buff *skb)
+{
+	return (struct igmphdr *)skb->h.raw;
+}
+
+static inline struct igmpv3_report *
+			igmpv3_report_hdr(const struct sk_buff *skb)
+{
+	return (struct igmpv3_report *)skb->h.raw;
+}
+
+static inline struct igmpv3_query *
+			igmpv3_query_hdr(const struct sk_buff *skb)
+{
+	return (struct igmpv3_query *)skb->h.raw;
+}
+#endif
+
 #define IGMP_HOST_MEMBERSHIP_QUERY	0x11	/* From RFC1112 */
 #define IGMP_HOST_MEMBERSHIP_REPORT	0x12	/* Ditto */
 #define IGMP_DVMRP			0x13	/* DVMRP routing */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 12bd740..a60d1e5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -242,7 +242,6 @@ struct sk_buff {
 		struct tcphdr	*th;
 		struct udphdr	*uh;
 		struct icmphdr	*icmph;
-		struct igmphdr	*igmph;
 		struct iphdr	*ipiph;
 		struct ipv6hdr	*ipv6h;
 		unsigned char	*raw;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 596eaaa..56a838c 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -333,8 +333,8 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 	((u8*)&pip[1])[2] = 0;
 	((u8*)&pip[1])[3] = 0;
 
-	pig =(struct igmpv3_report *)skb_put(skb, sizeof(*pig));
-	skb->h.igmph = (struct igmphdr *)pig;
+	skb->h.raw = skb_put(skb, sizeof(*pig));
+	pig = igmpv3_report_hdr(skb);
 	pig->type = IGMPV3_HOST_MEMBERSHIP_REPORT;
 	pig->resv1 = 0;
 	pig->csum = 0;
@@ -346,13 +346,13 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 static int igmpv3_sendpack(struct sk_buff *skb)
 {
 	struct iphdr *pip = ip_hdr(skb);
-	struct igmphdr *pig = skb->h.igmph;
+	struct igmphdr *pig = igmp_hdr(skb);
 	const int iplen = skb->tail - skb->nh.raw;
 	const int igmplen = skb->tail - skb->h.raw;
 
 	pip->tot_len = htons(iplen);
 	ip_send_check(pip);
-	pig->csum = ip_compute_csum(skb->h.igmph, igmplen);
+	pig->csum = ip_compute_csum(igmp_hdr(skb), igmplen);
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, skb->dev,
 		   dst_output);
@@ -379,7 +379,7 @@ static struct sk_buff *add_grhead(struct sk_buff *skb, struct ip_mc_list *pmc,
 	pgr->grec_auxwords = 0;
 	pgr->grec_nsrcs = 0;
 	pgr->grec_mca = pmc->multiaddr;
-	pih = (struct igmpv3_report *)skb->h.igmph;
+	pih = igmpv3_report_hdr(skb);
 	pih->ngrec = htons(ntohs(pih->ngrec)+1);
 	*ppgr = pgr;
 	return skb;
@@ -412,7 +412,7 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ip_mc_list *pmc,
 	if (!*psf_list)
 		goto empty_source;
 
-	pih = skb ? (struct igmpv3_report *)skb->h.igmph : NULL;
+	pih = skb ? igmpv3_report_hdr(skb) : NULL;
 
 	/* EX and TO_EX get a fresh packet, if needed */
 	if (truncate) {
@@ -829,8 +829,8 @@ static void igmp_heard_report(struct in_device *in_dev, __be32 group)
 static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 	int len)
 {
-	struct igmphdr 		*ih = skb->h.igmph;
-	struct igmpv3_query *ih3 = (struct igmpv3_query *)ih;
+	struct igmphdr 		*ih = igmp_hdr(skb);
+	struct igmpv3_query *ih3 = igmpv3_query_hdr(skb);
 	struct ip_mc_list	*im;
 	__be32			group = ih->group;
 	int			max_delay;
@@ -863,12 +863,12 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 		if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
 			return;
 
-		ih3 = (struct igmpv3_query *) skb->h.raw;
+		ih3 = igmpv3_query_hdr(skb);
 		if (ih3->nsrcs) {
 			if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)
 					   + ntohs(ih3->nsrcs)*sizeof(__be32)))
 				return;
-			ih3 = (struct igmpv3_query *) skb->h.raw;
+			ih3 = igmpv3_query_hdr(skb);
 		}
 
 		max_delay = IGMPV3_MRC(ih3->code)*(HZ/IGMP_TIMER_SCALE);
@@ -945,7 +945,7 @@ int igmp_rcv(struct sk_buff *skb)
 			goto drop;
 	}
 
-	ih = skb-h.igmph;
+	ih = igmp_hdr(skb);
 	switch (ih-type) {
 	case IGMP_HOST_MEMBERSHIP_QUERY:
 		igmp_heard_query(in_dev, skb, len);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 03869d9..05bc270 100644
--- a/net/ipv4/ipmr.c

[PATCH 07/15] [SK_BUFF]: Introduce udp_hdr(), remove skb-h.uh

2007-03-13 Thread Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
drivers/net/gianfar.c |4 ++--
drivers/net/ioc3-eth.c|2 +-
drivers/net/mv643xx_eth.c |2 +-
include/linux/skbuff.h|1 -
include/linux/udp.h   |9 +
include/net/udplite.h |2 +-
net/core/netpoll.c|4 +++-
net/core/pktgen.c |4 ++--
net/ipv4/udp.c|   12 ++--
net/ipv6/udp.c|   10 +-
net/rxrpc/connection.c|4 ++--
net/rxrpc/transport.c |4 ++--
12 files changed, 34 insertions(+), 24 deletions(-)
From 2b56798afa3a60638188e29d3794263f07ef594c Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 14:28:48 -0300
Subject: [PATCH 07/15] [SK_BUFF]: Introduce udp_hdr(), remove skb-h.uh

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index c9abc96..b9f4460 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -944,9 +944,9 @@ static inline void gfar_tx_checksum(struct sk_buff *skb, struct txfcb *fcb)
 	/* And provide the already calculated phcs */
 	if (ip_hdr(skb)->protocol == IPPROTO_UDP) {
 		flags |= TXFCB_UDP;
-		fcb->phcs = skb->h.uh->check;
+		fcb->phcs = udp_hdr(skb)->check;
 	} else
 		fcb->phcs = skb->h.th->check;
 
 	/* l3os is the distance between the start of the
 	 * frame (skb->data) and the start of the IP hdr.
diff --git a/drivers/net/ioc3-eth.c b/drivers/net/ioc3-eth.c
index d375e78..ba012e1 100644
--- a/drivers/net/ioc3-eth.c
+++ b/drivers/net/ioc3-eth.c
@@ -1422,7 +1422,7 @@ static int ioc3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		csoff = ETH_HLEN + (ih->ihl << 2);
 		if (proto == IPPROTO_UDP) {
 			csoff += offsetof(struct udphdr, check);
-			skb->h.uh->check = csum;
+			udp_hdr(skb)->check = csum;
 		}
 		if (proto == IPPROTO_TCP) {
 			csoff += offsetof(struct tcphdr, check);
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 92ecf76..af99068 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1164,7 +1164,7 @@ static void eth_tx_submit_descs_for_skb(struct mv643xx_private *mp,
 		switch (ip_hdr(skb)->protocol) {
 		case IPPROTO_UDP:
 			cmd_sts |= ETH_UDP_FRAME;
-			desc->l4i_chk = skb->h.uh->check;
+			desc->l4i_chk = udp_hdr(skb)->check;
 			break;
 		case IPPROTO_TCP:
 			desc->l4i_chk = skb->h.th->check;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a60d1e5..a5d1087 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -240,7 +240,6 @@ struct sk_buff {
 
 	union {
 		struct tcphdr	*th;
-		struct udphdr	*uh;
 		struct icmphdr	*icmph;
 		struct iphdr	*ipiph;
 		struct ipv6hdr	*ipv6h;
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 7e08c07..1f58503 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -26,6 +26,15 @@ struct udphdr {
 	__sum16	check;
 };
 
+#ifdef __KERNEL__
+#include <linux/skbuff.h>
+
+static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
+{
+	return (struct udphdr *)skb->h.raw;
+}
+#endif
+
 /* UDP socket options */
 #define UDP_CORK	1	/* Never send partially complete segments */
 #define UDP_ENCAP	100	/* Set the socket to accept encapsulated packets */
diff --git a/include/net/udplite.h b/include/net/udplite.h
index 7650320..635b0ea 100644
--- a/include/net/udplite.h
+++ b/include/net/udplite.h
@@ -101,7 +101,7 @@ static inline int udplite_sender_cscov(struct udp_sock *up, struct udphdr *uh)
 
 static inline __wsum udplite_csum_outgoing(struct sock *sk, struct sk_buff *skb)
 {
-	int cscov = udplite_sender_cscov(udp_sk(sk), skb->h.uh);
+	int cscov = udplite_sender_cscov(udp_sk(sk), udp_hdr(skb));
 	__wsum csum = 0;
 
 	skb->ip_summed = CHECKSUM_NONE; /* no HW support for checksumming */
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index cac0279..ae63087 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -296,7 +296,9 @@ void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
 	memcpy(skb->data, msg, len);
 	skb->len += len;
 
-	skb->h.uh = udph = (struct udphdr *) skb_push(skb, sizeof(*udph));
+	skb_push(skb, sizeof(*udph));
+	skb_reset_transport_header(skb);
+	udph = udp_hdr(skb);
 	udph->source = htons(np->local_port);
 	udph->dest = htons(np->remote_port);
 	udph->len = htons(udp_len);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 6389693..2d49dbc 100644
--- 

[PATCH 08/15] [SK_BUFF]: Introduce icmp_hdr(), remove skb-h.icmph

2007-03-13 Thread Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
include/linux/icmp.h   |9 +
include/linux/skbuff.h |1 -
net/dccp/ipv4.c|4 ++--
net/ipv4/ah4.c |4 ++--
net/ipv4/esp4.c|4 ++--
net/ipv4/icmp.c|   14 +++---
net/ipv4/ip_gre.c  |   12 ++--
net/ipv4/ip_sockglue.c |6 +++---
net/ipv4/ipcomp.c  |4 ++--
net/ipv4/ipip.c|   12 ++--
net/ipv4/raw.c |6 +++---
net/ipv4/tcp_ipv4.c|4 ++--
net/ipv4/udp.c |4 ++--
net/ipv6/sit.c |   12 ++--
net/sctp/input.c   |4 ++--
15 files changed, 54 insertions(+), 46 deletions(-)
From 5750acd8b55567ec9e8839f352f60af0f492f509 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 14:43:18 -0300
Subject: [PATCH 08/15] [SK_BUFF]: Introduce icmp_hdr(), remove skb-h.icmph

diff --git a/include/linux/icmp.h b/include/linux/icmp.h
index 24da4fb..cd3017a 100644
--- a/include/linux/icmp.h
+++ b/include/linux/icmp.h
@@ -82,6 +82,15 @@ struct icmphdr {
   } un;
 };
 
+#ifdef __KERNEL__
+#include <linux/skbuff.h>
+
+static inline struct icmphdr *icmp_hdr(const struct sk_buff *skb)
+{
+	return (struct icmphdr *)skb->h.raw;
+}
+#endif
+
 /*
  *	constants for (set|get)sockopt
  */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a5d1087..eea512a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -240,7 +240,6 @@ struct sk_buff {
 
 	union {
 		struct tcphdr	*th;
-		struct icmphdr	*icmph;
 		struct iphdr	*ipiph;
 		struct ipv6hdr	*ipv6h;
 		unsigned char	*raw;
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index b85437d..718f2fa 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -207,8 +207,8 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 			(iph->ihl << 2));
 	struct dccp_sock *dp;
 	struct inet_sock *inet;
-	const int type = skb->h.icmph->type;
-	const int code = skb->h.icmph->code;
+	const int type = icmp_hdr(skb)->type;
+	const int code = icmp_hdr(skb)->code;
 	struct sock *sk;
 	__u64 seq;
 	int err;
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index ebcc797..e1bb9e0 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -198,8 +198,8 @@ static void ah4_err(struct sk_buff *skb, u32 info)
 	struct ip_auth_hdr *ah = (struct ip_auth_hdr*)(skb->data+(iph->ihl<<2));
 	struct xfrm_state *x;
 
-	if (skb->h.icmph->type != ICMP_DEST_UNREACH ||
-	    skb->h.icmph->code != ICMP_FRAG_NEEDED)
+	if (icmp_hdr(skb)->type != ICMP_DEST_UNREACH ||
+	    icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
 		return;
 
 	x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, ah->spi, IPPROTO_AH, AF_INET);
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 82543ee..de019f9 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -304,8 +304,8 @@ static void esp4_err(struct sk_buff *skb, u32 info)
 	struct ip_esp_hdr *esph = (struct ip_esp_hdr*)(skb->data+(iph->ihl<<2));
 	struct xfrm_state *x;
 
-	if (skb->h.icmph->type != ICMP_DEST_UNREACH ||
-	    skb->h.icmph->code != ICMP_FRAG_NEEDED)
+	if (icmp_hdr(skb)->type != ICMP_DEST_UNREACH ||
+	    icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
 		return;
 
 	x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, esph->spi, IPPROTO_ESP, AF_INET);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 4d70c21..8372f8b 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -355,7 +355,7 @@ static void icmp_push_reply(struct icmp_bxm *icmp_param,
 			   ipc, rt, MSG_DONTWAIT) < 0)
 		ip_flush_pending_frames(icmp_socket->sk);
 	else if ((skb = skb_peek(&icmp_socket->sk->sk_write_queue)) != NULL) {
-		struct icmphdr *icmph = skb->h.icmph;
+		struct icmphdr *icmph = icmp_hdr(skb);
 		__wsum csum = 0;
 		struct sk_buff *skb1;
 
@@ -613,7 +613,7 @@ static void icmp_unreach(struct sk_buff *skb)
 	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
 		goto out_err;
 
-	icmph = skb->h.icmph;
+	icmph = icmp_hdr(skb);
 	iph   = (struct iphdr *)skb->data;
 
 	if (iph->ihl < 5) /* Mangled header, drop. */
@@ -743,7 +743,7 @@ static void icmp_redirect(struct sk_buff *skb)
 
 	iph = (struct iphdr *)skb->data;
 
-	switch (skb->h.icmph->code & 7) {
+	switch (icmp_hdr(skb)->code & 7) {
 	case ICMP_REDIR_NET:
 	case ICMP_REDIR_NETTOS:
 		/*
@@ -752,7 +752,7 @@ static void icmp_redirect(struct sk_buff *skb)
 	case ICMP_REDIR_HOST:
 

[PATCH 10/15] [SK_BUFF]: Introduce tcp_hdr(), remove skb-h.th

2007-03-13 Thread Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
drivers/net/atl1/atl1_main.c   |7 ---
drivers/net/bnx2.c |8 
drivers/net/chelsio/sge.c  |2 +-
drivers/net/cxgb3/sge.c|2 +-
drivers/net/e1000/e1000_main.c |   11 ++-
drivers/net/ioc3-eth.c |2 +-
drivers/net/ixgb/ixgb_main.c   |7 ---
drivers/net/mv643xx_eth.c  |2 +-
drivers/net/tg3.c  |   15 +++
drivers/s390/net/qeth_eddp.c   |2 +-
drivers/s390/net/qeth_tso.h|4 ++--
include/linux/skbuff.h |1 -
include/linux/tcp.h|9 +++--
include/net/tcp.h  |2 +-
include/net/tcp_ecn.h  |6 +++---
net/ipv4/ip_output.c   |4 ++--
net/ipv4/syncookies.c  |   36 ++--
net/ipv4/tcp.c |   22 +++---
net/ipv4/tcp_input.c   |   28 +++-
net/ipv4/tcp_ipv4.c|   32 
net/ipv4/tcp_minisocks.c   |9 +
net/ipv4/tcp_output.c  |   13 -
net/ipv6/tcp_ipv6.c|   32 
23 files changed, 134 insertions(+), 122 deletions(-)
From 084fb923f59f068a3bdb95e7ae772811b7b09793 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 15:37:34 -0300
Subject: [PATCH 10/15] [SK_BUFF]: Introduce tcp_hdr(), remove skb-h.th

diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index 0912d2a..cc131d3 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -1298,9 +1298,10 @@ static int atl1_tso(struct atl1_adapter *adapter, struct sk_buff *skb,
 
 			iph->tot_len = 0;
 			iph->check = 0;
-			skb->h.th->check = ~csum_tcpudp_magic(iph->saddr,
-					iph->daddr, 0,
-					IPPROTO_TCP, 0);
+			tcp_hdr(skb)->check = ~csum_tcpudp_magic(iph->saddr,
+								 iph->daddr, 0,
+								 IPPROTO_TCP,
+								 0);
 			ipofst = skb_network_offset(skb);
 			if (ipofst != ENET_HEADER_SIZE) /* 802.3 frame */
 				tso->tsopl |= 1 << TSO_PARAM_ETHTYPE_SHIFT;
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index f948918..5658b46 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -4523,7 +4523,7 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		vlan_tag_flags |= TX_BD_FLAGS_SW_LSO;
 
 		tcp_opt_len = 0;
-		if (skb->h.th->doff > 5)
+		if (tcp_hdr(skb)->doff > 5)
 			tcp_opt_len = tcp_optlen(skb);
 
 		ip_tcp_len = ip_hdrlen(skb) + sizeof(struct tcphdr);
@@ -4531,9 +4531,9 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		iph = ip_hdr(skb);
 		iph->check = 0;
 		iph->tot_len = htons(mss + ip_tcp_len + tcp_opt_len);
-		skb->h.th->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-						      0, IPPROTO_TCP, 0);
-
+		tcp_hdr(skb)->check = ~csum_tcpudp_magic(iph->saddr,
+							 iph->daddr, 0,
+							 IPPROTO_TCP, 0);
 		if (tcp_opt_len || (iph->ihl > 5)) {
 			vlan_tag_flags |= ((iph->ihl - 5) +
 					   (tcp_opt_len >> 2)) << 8;
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
index a4204df..43e92f9 100644
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1872,7 +1872,7 @@ int t1_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		hdr->opcode = CPL_TX_PKT_LSO;
 		hdr->ip_csum_dis = hdr->l4_csum_dis = 0;
 		hdr->ip_hdr_words = ip_hdr(skb)->ihl;
-		hdr->tcp_hdr_words = skb->h.th->doff;
+		hdr->tcp_hdr_words = tcp_hdr(skb)->doff;
 		hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type,
 							  skb_shinfo(skb)->gso_size));
 		hdr->len = htonl(skb->len - sizeof(*hdr));
diff --git a/drivers/net/cxgb3/sge.c 

[PATCH 12/15] [SK_BUFF]: Introduce ipipv6_hdr(), remove skb-h.ipv6h

2007-03-13 Thread Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
include/linux/ipv6.h |5 +
include/linux/skbuff.h   |1 -
net/ipv6/xfrm6_mode_beet.c   |4 ++--
net/ipv6/xfrm6_mode_tunnel.c |8 
4 files changed, 11 insertions(+), 7 deletions(-)
From bb1c2a7d91b74b6d7952dda388bbafc2341f563f Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 16:15:37 -0300
Subject: [PATCH 12/15] [SK_BUFF]: Introduce ipipv6_hdr(), remove skb-h.ipv6h

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 096dcd2..df48c96 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -225,6 +225,11 @@ static inline struct ipv6hdr *ipv6_hdr(const struct sk_buff *skb)
 	return (struct ipv6hdr *)skb_network_header(skb);
 }
 
+static inline struct ipv6hdr *ipipv6_hdr(const struct sk_buff *skb)
+{
+	return (struct ipv6hdr *)skb->h.raw;
+}
+
 /* 
    This structure contains results of exthdrs parsing
    as offsets from skb->nh.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2b1e188..f69a06d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -239,7 +239,6 @@ struct sk_buff {
 	struct net_device	*input_dev;
 
 	union {
-		struct ipv6hdr	*ipv6h;
 		unsigned char	*raw;
 	} h;
 
diff --git a/net/ipv6/xfrm6_mode_beet.c b/net/ipv6/xfrm6_mode_beet.c
index abac094..0cc96ec 100644
--- a/net/ipv6/xfrm6_mode_beet.c
+++ b/net/ipv6/xfrm6_mode_beet.c
@@ -47,8 +47,8 @@ static int xfrm6_beet_output(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	top_iph = ipv6_hdr(skb);
-	skb->nh.raw = &top_iph->nexthdr;
-	skb->h.ipv6h = top_iph + 1;
+	skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr);
+	skb->nh.raw += offsetof(struct ipv6hdr, nexthdr);
 
 	ipv6_addr_copy(&top_iph->saddr, (struct in6_addr *)&x->props.saddr);
 	ipv6_addr_copy(&top_iph->daddr, (struct in6_addr *)&x->id.daddr);
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 9d3bd33..21d65df 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -19,7 +19,7 @@
 static inline void ipip6_ecn_decapsulate(struct sk_buff *skb)
 {
 	struct ipv6hdr *outer_iph = ipv6_hdr(skb);
-	struct ipv6hdr *inner_iph = skb->h.ipv6h;
+	struct ipv6hdr *inner_iph = ipipv6_hdr(skb);
 
 	if (INET_ECN_is_ce(ipv6_get_dsfield(outer_iph)))
 		IP6_ECN_set_ce(inner_iph);
@@ -55,8 +55,8 @@ static int xfrm6_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	top_iph = ipv6_hdr(skb);
-	skb->nh.raw = &top_iph->nexthdr;
-	skb->h.ipv6h = top_iph + 1;
+	skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr);
+	skb->nh.raw += offsetof(struct ipv6hdr, nexthdr);
 
 	top_iph->version = 6;
 	if (xdst->route->ops->family == AF_INET6) {
@@ -102,7 +102,7 @@ static int xfrm6_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 	nh = skb_network_header(skb);
 	if (nh[IP6CB(skb)->nhoff] == IPPROTO_IPV6) {
 		if (x->props.flags & XFRM_STATE_DECAP_DSCP)
-			ipv6_copy_dscp(ipv6_hdr(skb), skb->h.ipv6h);
+			ipv6_copy_dscp(ipv6_hdr(skb), ipipv6_hdr(skb));
 		if (!(x->props.flags & XFRM_STATE_NOECN))
 			ipip6_ecn_decapsulate(skb);
 	} else {
-- 
1.5.0.2
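
The accessors this series introduces (udp_hdr(), icmp_hdr(), tcp_hdr(), ipipv6_hdr()) all follow one pattern: keep a single untyped transport-header pointer and do the cast in one inline helper. The following is only a userspace sketch of that pattern, not kernel code. The names struct model_skb, struct model_udphdr, model_udp_hdr() and model_read_check() are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a UDP header: four 16-bit fields. */
struct model_udphdr {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Hypothetical stand-in for the part of sk_buff touched by the series:
 * once the typed members are gone, only the raw pointer remains. */
struct model_skb {
	unsigned char *data;
	union {
		unsigned char *raw;
	} h;
};

/* Typed accessor in the style of udp_hdr(): the cast lives in one place. */
static inline struct model_udphdr *model_udp_hdr(const struct model_skb *skb)
{
	return (struct model_udphdr *)skb->h.raw;
}

/* Store a checksum in a header and read it back through the accessor. */
static uint16_t model_read_check(uint16_t check)
{
	struct model_udphdr storage = { 0, 0, 0, 0 };
	struct model_skb skb;

	storage.check = check;
	skb.data = (unsigned char *)&storage;
	skb.h.raw = skb.data;	/* transport header starts at data */

	return model_udp_hdr(&skb)->check;
}
```

Callers never name a union member for a specific protocol, so removing such a member (as the skbuff.h hunks do) only affects the accessor, for example assert(model_read_check(0x1234) == 0x1234).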



[PATCH 13/15] [SK_BUFF]: More skb_reset_transport_header conversions

2007-03-13 Thread Arnaldo Carvalho de Melo

These are a bit more subtle; they are of this type:

-   skb->h.raw = payload;
    __skb_pull(skb, payload - skb->data);
+   skb_reset_transport_header(skb);

__skb_pull() results in:

skb->data = skb->data + payload - skb->data;
skb->data = payload;

So after __skb_pull() skb->data points to payload, and we can just call
skb_reset_transport_header(skb), which will do:

skb->h.raw = payload;

The others are similar, allowing us to get rid of some more cases where a
pointer was being assigned to the layer headers.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
net/ipv4/ip_sockglue.c |   12 +++-
net/ipv6/datagram.c|4 ++--
2 files changed, 9 insertions(+), 7 deletions(-)
From 7684a24cb0328e519298a3674e08db4bca6861c3 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 17:10:43 -0300
Subject: [PATCH 13/15] [SK_BUFF]: More skb_reset_transport_header conversions

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ccdc59d..fcb35cd 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -278,10 +278,12 @@ void ip_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
    skb_network_header(skb);
 	serr->port = port;
 
-	skb->h.raw = payload;
-	if (!skb_pull(skb, payload - skb->data) ||
-	    sock_queue_err_skb(sk, skb))
-		kfree_skb(skb);
+	if (skb_pull(skb, payload - skb->data) != NULL) {
+		skb_reset_transport_header(skb);
+		if (sock_queue_err_skb(sk, skb) == 0)
+			return;
+	}
+	kfree_skb(skb);
 }
 
 void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 info)
@@ -314,8 +316,8 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
 	serr->addr_offset = (u8 *)&iph->daddr - skb_network_header(skb);
 	serr->port = port;
 
-	skb->h.raw = skb->tail;
 	__skb_pull(skb, skb->tail - skb->data);
+	skb_reset_transport_header(skb);
 
 	if (sock_queue_err_skb(sk, skb))
 		kfree_skb(skb);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index feba6b1..f16f4f0 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -231,8 +231,8 @@ void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
   skb_network_header(skb);
 	serr->port = port;
 
-	skb->h.raw = payload;
 	__skb_pull(skb, payload - skb->data);
+	skb_reset_transport_header(skb);
 
 	if (sock_queue_err_skb(sk, skb))
 		kfree_skb(skb);
@@ -268,8 +268,8 @@ void ipv6_local_error(struct sock *sk, int err, struct flowi *fl, u32 info)
 	serr->addr_offset = (u8 *)&iph->daddr - skb_network_header(skb);
 	serr->port = fl->fl_ip_dport;
 
-	skb->h.raw = skb->tail;
 	__skb_pull(skb, skb->tail - skb->data);
+	skb_reset_transport_header(skb);
 
 	if (sock_queue_err_skb(sk, skb))
 		kfree_skb(skb);
-- 
1.5.0.2
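
The reordering argument in the commit message of patch 13 can be checked with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct model_skb and every model_* function are hypothetical, and they only imitate the pointer arithmetic the message describes (pull advances data, reset copies data into the transport header pointer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct model_skb {
	unsigned char *data;	/* start of the unparsed data */
	unsigned int len;	/* bytes from data to the end */
	unsigned char *h_raw;	/* transport header pointer (skb->h.raw) */
};

/* Modelled on __skb_pull(): drop len bytes from the front. */
static unsigned char *model_skb_pull(struct model_skb *skb, unsigned int len)
{
	skb->len -= len;
	return skb->data += len;
}

/* Modelled on skb_reset_transport_header(): the header starts at data. */
static void model_reset_transport_header(struct model_skb *skb)
{
	skb->h_raw = skb->data;
}

/* Old ordering: assign the pointer first, then pull. */
static unsigned char *model_old_style(struct model_skb *skb,
				      unsigned char *payload)
{
	skb->h_raw = payload;
	model_skb_pull(skb, (unsigned int)(payload - skb->data));
	return skb->h_raw;
}

/* New ordering: pull first, then reset the header to data. */
static unsigned char *model_new_style(struct model_skb *skb,
				      unsigned char *payload)
{
	model_skb_pull(skb, (unsigned int)(payload - skb->data));
	model_reset_transport_header(skb);
	return skb->h_raw;
}

/* Returns 1 when both orderings leave identical state behind. */
static int model_orderings_agree(void)
{
	unsigned char buf[64];
	struct model_skb a = { buf, sizeof(buf), NULL };
	struct model_skb b = a;
	unsigned char *payload = buf + 20;

	return model_old_style(&a, payload) == payload &&
	       model_new_style(&b, payload) == payload &&
	       a.data == b.data && a.len == b.len;
}
```

In this model the two orderings are interchangeable because the pull leaves data equal to payload, which is the value the old code stored directly; assert(model_orderings_agree()) holds.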



[PATCH 14/15] [SCTP]: Eliminate some pointer attributions to the skb layer headers

2007-03-13 Thread Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
net/sctp/input.c |8 
net/sctp/ipv6.c  |5 ++---
2 files changed, 6 insertions(+), 7 deletions(-)
From b9c0a34313240c6f74bcc5587a496493480daf7d Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 17:17:10 -0300
Subject: [PATCH 14/15] [SCTP]: Eliminate some pointer attributions to the skb layer headers

diff --git a/net/sctp/input.c b/net/sctp/input.c
index 40d0df8..f38e91b 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -506,7 +506,7 @@ void sctp_err_finish(struct sock *sk, struct sctp_association *asoc)
 void sctp_v4_err(struct sk_buff *skb, __u32 info)
 {
 	struct iphdr *iph = (struct iphdr *)skb->data;
-	struct sctphdr *sh = (struct sctphdr *)(skb->data + (iph->ihl <<2));
+	const int ihlen = iph->ihl * 4;
 	const int type = icmp_hdr(skb)->type;
 	const int code = icmp_hdr(skb)->code;
 	struct sock *sk;
@@ -516,7 +516,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
 	char *saveip, *savesctp;
 	int err;
 
-	if (skb->len < ((iph->ihl << 2) + 8)) {
+	if (skb->len < ihlen + 8) {
 		ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
 		return;
 	}
@@ -525,8 +525,8 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
 	saveip = skb->nh.raw;
 	savesctp  = skb->h.raw;
 	skb_reset_network_header(skb);
-	skb->h.raw = (char *)sh;
-	sk = sctp_err_lookup(AF_INET, skb, sh, &asoc, &transport);
+	skb_set_transport_header(skb, ihlen);
+	sk = sctp_err_lookup(AF_INET, skb, sctp_hdr(skb), &asoc, &transport);
 	/* Put back, the original pointers. */
 	skb->nh.raw = saveip;
 	skb->h.raw = savesctp;
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index dff72e0..6cad0f4 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -122,7 +122,6 @@ SCTP_STATIC void sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			 int type, int code, int offset, __be32 info)
 {
 	struct inet6_dev *idev;
-	struct sctphdr *sh = (struct sctphdr *)(skb->data + offset);
 	struct sock *sk;
 	struct sctp_association *asoc;
 	struct sctp_transport *transport;
@@ -136,8 +135,8 @@ SCTP_STATIC void sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	saveip = skb->nh.raw;
 	savesctp  = skb->h.raw;
 	skb_reset_network_header(skb);
-	skb->h.raw = (char *)sh;
-	sk = sctp_err_lookup(AF_INET6, skb, sh, &asoc, &transport);
+	skb_set_transport_header(skb, offset);
+	sk = sctp_err_lookup(AF_INET6, skb, sctp_hdr(skb), &asoc, &transport);
 	/* Put back, the original pointers. */
 	skb->nh.raw = saveip;
 	skb->h.raw = savesctp;
-- 
1.5.0.2



[PATCH 15/15] [SK_BUFF]: Introduce skb_transport_header(skb)

2007-03-13 Thread Arnaldo Carvalho de Melo

For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to, subtracting
from or setting it to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
drivers/net/appletalk/ltpc.c|7 +--
drivers/net/cxgb3/sge.c |8 +---
drivers/s390/net/qeth_eddp.c|4 ++--
include/linux/atalk.h   |4 ++--
include/linux/dccp.h|   19 ---
include/linux/icmp.h|2 +-
include/linux/icmpv6.h  |2 +-
include/linux/igmp.h|6 +++---
include/linux/ip.h  |2 +-
include/linux/ipv6.h|2 +-
include/linux/sctp.h|2 +-
include/linux/skbuff.h  |5 +
include/linux/tcp.h |2 +-
include/linux/udp.h |2 +-
include/net/ipx.h   |2 +-
include/net/pkt_cls.h   |2 +-
include/net/udp.h   |4 ++--
net/802/psnap.c |2 +-
net/ax25/af_ax25.c  |5 +++--
net/bluetooth/hci_core.c|4 ++--
net/core/dev.c  |6 +++---
net/econet/af_econet.c  |2 +-
net/ipv4/igmp.c |2 +-
net/ipv4/ip_gre.c   |2 +-
net/ipv4/ip_output.c|6 --
net/ipv4/ipconfig.c |4 ++--
net/ipv4/ipmr.c |8 +---
net/ipv4/tcp.c  |   12 +++-
net/ipv4/tcp_input.c|   13 +++--
net/ipv4/xfrm4_mode_beet.c  |2 +-
net/ipv4/xfrm4_mode_transport.c |5 +++--
net/ipv6/ah6.c  |2 +-
net/ipv6/esp6.c |2 +-
net/ipv6/exthdrs.c  |   21 ++---
net/ipv6/ipcomp6.c  |2 +-
net/ipv6/mcast.c|   16 +---
net/ipv6/mip6.c |8 
net/ipv6/ndisc.c|   17 +
net/ipv6/raw.c  |2 +-
net/ipv6/reassembly.c   |2 +-
net/ipv6/xfrm6_mode_transport.c |5 +++--
net/xfrm/xfrm_input.c   |6 +++---
42 files changed, 129 insertions(+), 102 deletions(-)
From eab29f0961397cd4e0f54b9448c7f7a4be39b3e9 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Tue, 13 Mar 2007 17:20:39 -0300
Subject: [PATCH 15/15] [SK_BUFF]: Introduce skb_transport_header(skb)

diff --git a/drivers/net/appletalk/ltpc.c b/drivers/net/appletalk/ltpc.c
index dc3bce9..43c17c8 100644
--- a/drivers/net/appletalk/ltpc.c
+++ b/drivers/net/appletalk/ltpc.c
@@ -917,6 +917,7 @@ static int ltpc_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	int i;
 	struct lt_sendlap cbuf;
+	

Re: [PATCH 09/15] [TCP]: Introduce tcp_hdrlen() and tcp_optlen()

2007-03-13 Thread Jesse Brandeburg

On 3/13/07, Arnaldo Carvalho de Melo [EMAIL PROTECTED] wrote:

Introduce tcp_hdrlen() and tcp_optlen():
The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to
avoid the longer, open coded equivalent.


+static inline unsigned int tcp_hdrlen(const struct sk_buff *skb)
+{
+   return skb->h.th->doff * 4;
+}
+
+static inline unsigned int tcp_optlen(const struct sk_buff *skb)
+{
+   return (skb->h.th->doff - 5) * 4;
+}

acme, good stuff, but does the "* 4" generate equivalent assembly
with gcc 3/4 as "<< 2"?

I could assume that the compiler would be smart enough, but every time
I assume I know what the compiler is doing I get myself in trouble.


Re: [PATCH 09/15] [TCP]: Introduce tcp_hdrlen() and tcp_optlen()

2007-03-13 Thread Jesse Brandeburg

On 3/13/07, Jesse Brandeburg [EMAIL PROTECTED] wrote:

acme, good stuff, but does the "* 4" generate equivalent assembly
with gcc 3/4 as "<< 2"?

I could assume that the compiler would be smart enough, but every time
I assume I know what the compiler is doing I get myself in trouble.


Never mind, I wrote a program myself to test it (which I should have
done first).  With x86-64 gcc 3.4.6 or 4.1.0 it always comes out to
shl 2, %eax

etc, sorry for the noise.
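
A test along the lines described above could look like the sketch below. It is not the program that was actually written for this thread; hdrlen_mul(), hdrlen_shift() and model_all_doff_agree() are hypothetical names. It only shows that the two spellings return the same value for every possible 4-bit doff. Whether the generated assembly is identical still has to be checked per compiler, for example by reading the output of gcc -O2 -S.

```c
#include <assert.h>

/* Header length in bytes, written as a multiplication. */
static unsigned int hdrlen_mul(unsigned int doff)
{
	return doff * 4;
}

/* The same length, written as a shift. */
static unsigned int hdrlen_shift(unsigned int doff)
{
	return doff << 2;
}

/* doff is a 4-bit field, so 0..15 covers every possible value. */
static int model_all_doff_agree(void)
{
	unsigned int doff;

	for (doff = 0; doff < 16; doff++)
		if (hdrlen_mul(doff) != hdrlen_shift(doff))
			return 0;
	return 1;
}
```

With the minimum doff of 5, both forms give the 20-byte base TCP header length.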


Re: [PATCH] tc35815: Fix an usage of streaming DMA API.

2007-03-13 Thread Atsushi Nemoto
On Tue, 13 Mar 2007 12:04:18 -0700, Stephen Hemminger [EMAIL PROTECTED] wrote:
  + * 1.35Fix an usage of streaming DMA API.
*/
 
 Please don't use comments as changelog anymore. It gets out of date.
 The use of change control systems has made this practice obsolete.

OK, Jeff, should I send a revised patch dropping this line?

---
Atsushi Nemoto


[2/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Adrian Bunk
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of
one of the bugs, the maintainer of an affected subsystem or driver, the
author of a patch that caused a breakage, or someone I consider possibly
involved with one or more of these issues in some other way.

Due to the huge number of recipients, please trim the Cc when answering.


Subject: ipv6 crash
References : http://lkml.org/lkml/2007/3/10/2
Submitter  : Len Brown [EMAIL PROTECTED]
Status : unknown


Subject: ThinkPad X60: bluetooth hardlocks
References : http://lkml.org/lkml/2007/3/2/85
Submitter  : Pavel Machek [EMAIL PROTECTED]
Handled-By : Marcel Holtmann [EMAIL PROTECTED]
Status : unknown


Subject: forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter  : Albert Hopkins [EMAIL PROTECTED]
Handled-By : Ayaz Abdulla [EMAIL PROTECTED]
Status : problem is being debugged




Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Benjamin LaHaise
On Tue, Mar 13, 2007 at 01:39:12AM -0700, Roland McGrath wrote:
 The OPEN_MAX constant is an arbitrary number with no useful relation to
 anything.  Nothing should be using it.  This patch changes SCM_MAX_FD to
 use NR_OPEN instead of OPEN_MAX.  This increases the size of the struct
 scm_fp_list type fourfold, to make it big enough to contain as many file
 descriptors as could be asked of it.  This size increase may not be very
 worthwhile, but at any rate if an arbitrary limit unrelated to anything
 else is being defined it should be done explicitly here with:

 -#define SCM_MAX_FD   (OPEN_MAX-1)
 +#define SCM_MAX_FD   (NR_OPEN-1)

This is a bad idea.  From linux/fs.h:

#undef NR_OPEN
#define NR_OPEN (1024*1024) /* Absolute upper limit on fd num */

There isn't anything I can see guaranteeing that net/scm.h is included 
before fs.h.  This affects networking and should really be Cc'd to 
netdev@vger.kernel.org, which will raise the issue that if SCM_MAX_FD is 
raised, the resulting simple kmalloc() must be changed.  That said, I 
doubt SCM_MAX_FD really needs to be raised, as applications using many 
file descriptors are unlikely to try to send their entire file table to 
another process in one go -- they have to handle the limits imposed by 
SCM_MAX_FD anyways.
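
To put a number on the size concern, here is a minimal user-space sketch. It assumes struct scm_fp_list is just a count followed by an array of SCM_MAX_FD file pointers; the struct names, the macro and the two helper functions are made up for illustration and are not kernel code.

```c
#include <stddef.h>

struct file;	/* opaque stand-in for the kernel's struct file */

/* Hypothetical helper: declare a list type sized like scm_fp_list. */
#define DEFINE_FP_LIST(name, max_fd) \
	struct name { int count; struct file *fp[max_fd]; }

/* SCM_MAX_FD = OPEN_MAX - 1, with OPEN_MAX assumed to be 256 */
DEFINE_FP_LIST(fp_list_open_max, 256 - 1);

/* SCM_MAX_FD = NR_OPEN - 1, with NR_OPEN as redefined in linux/fs.h */
DEFINE_FP_LIST(fp_list_nr_open, 1024 * 1024 - 1);

size_t fp_list_size_open_max(void)
{
	return sizeof(struct fp_list_open_max);
}

size_t fp_list_size_nr_open(void)
{
	return sizeof(struct fp_list_nr_open);
}
```

With 8-byte pointers this works out to 2 KB versus 8 MB per list, which is why a plain kmalloc() would no longer be adequate if the fs.h value of NR_OPEN were picked up.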

-ben
-- 
Time is of no importance, Mr. President, only life is important.
Don't Email: [EMAIL PROTECTED].


Re: [patch 4/4] [TULIP] Rev tulip version

2007-03-13 Thread Andy Gospodarek
On Mon, Mar 12, 2007 at 10:07:33AM -0400, Jeff Garzik wrote:
 Pekka Enberg wrote:
 Hi,
 
 On 3/12/07, Valerie Henson [EMAIL PROTECTED] wrote:
 --- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip_core.c
 +++ tulip-2.6-mm-linux/drivers/net/tulip/tulip_core.c
 @@ -17,11 +17,11 @@
 
  #define DRV_NAME	"tulip"
  #ifdef CONFIG_TULIP_NAPI
 -#define DRV_VERSION	"1.1.14-NAPI" /* Keep at least for test */
 +#define DRV_VERSION	"1.1.15-NAPI" /* Keep at least for test */
  #else
 -#define DRV_VERSION	"1.1.14"
 +#define DRV_VERSION	"1.1.15"
  #endif
 -#define DRV_RELDATE	"May 11, 2002"
 +#define DRV_RELDATE	"Feb 27, 2007"
 
 Why not just drop this? What purpose does a per-module revision have
 for in-kernel drivers anyway?
 
 It's the maintainer's call.  Sometimes it eases parsing bug reports, and 
 tracking changes as your drivers get backported to various enterprise 
 operating systems(tm).  Sometimes it just gets in the way.
 

It's good to keep this type of information in drivers.  I've been
thinking lately that it would be nice to even expand it a little bit
(maybe include the commit sum) so it's easier to help those who aren't
running the latest upstream kernels on their boxes.



Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Roland McGrath
  -#define SCM_MAX_FD (OPEN_MAX-1)
  +#define SCM_MAX_FD (NR_OPEN-1)
 
 This is a bad idea.  [...]

Ok.  My only agenda is to get rid of OPEN_MAX.
I then propose the following instead.


Thanks,
Roland

---
[PATCH] avoid OPEN_MAX in SCM_MAX_FD

The OPEN_MAX constant is an arbitrary number with no useful relation to
anything.  Nothing should be using it.  SCM_MAX_FD is just an arbitrary
constant and it should be clear that its value is chosen in net/scm.h
and not actually derived from anything else meaningful in the system.

Signed-off-by: Roland McGrath [EMAIL PROTECTED]
---
 include/net/scm.h |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 5637d5e..2240690 100644  
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -8,7 +8,7 @@
 /* Well, we should have at least one descriptor open
  * to accept passed FDs 8)
  */
-#define SCM_MAX_FD (OPEN_MAX-1)
+#define SCM_MAX_FD 255
 
 struct scm_fp_list
 {


Re: [PATCH] tcp_cubic: use 32 bit math

2007-03-13 Thread Willy Tarreau
Hi Stephen,

On Mon, Mar 12, 2007 at 02:11:56PM -0700, Stephen Hemminger wrote:
  Oh BTW, I have a newer version with a first approximation of the
  cbrt() before the div64_64, which allows us to reduce from 3 div64
  to only 2 div64. This results in a version which is twice as fast
  as the initial one (ncubic), but with slightly less accuracy (0.286%
  compared to 0.247). But I see that other functions such as hcbrt()
  had a 1.5% avg error, so I think this is not dramatic.
 
 Ignore my hcbrt() it was a less accurate version of andi's stuff.

OK.

  Also, I managed to remove all other divides, to be kind with CPUs
  having a slow divide instruction or no divide at all. Since we compute
  on limited range (22 bits), we can multiply then shift right. It shows
  me even slightly better time on pentium-m and athlon, with a slightly
  higher avg error (0.297% compared to 0.286%), and slightly smaller
  code.
 
 What does the code look like?

Well, I have cleaned it a little bit, there were more comments and ifdefs
than code ! I've appended it to the end of this mail.

I have changed it a bit, because I noticed that integer divide precision
was so coarse that there were other possibilities to play with the bits.

I have experimented with combinations of several methods :
  - replace integer divides with multiplies/shifts where possible.

  - compensation for divide imprecision by adding/removing small
values before/after them. Often, the integer result of 1/(x*(x-1))
is closer to (float)1/(float)x^2 than 1/(x*x). This is because
the divide always truncates the result.

  - use direct result lookup for small values. Small inputs give small
outputs which have very few moving bits. Many different values fit
in a 32bit integer, so we use a shift offset to lookup the value.
I used this in an fls function I wrote a while ago, that I should
also post because it is up to twice as fast as the kernel's.
Sometimes it seems faster to look it up from memory, sometimes it
is faster from an immediate value. Maybe more visible differences
would show up on RISC CPUs where loading 32 bits immediate needs
two instructions. I don't know yet, I've not tested on my sparc
yet.

  - use small lookup tables (64 bytes) with 6 bits inputs and at least
as many on output. We only lookup the 6 MSB and return the 2-3 MSB
of the result.

  - iterative search and manual refinement of the lookup tables for best
accuracy. The avg error rate can easily be halved this way.
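
As an illustration of the first item (divide replaced by multiply and shift), here is a minimal sketch for a division by the constant 3. It is not taken from the code discussed in this mail: the function name is made up, and it uses a 64-bit intermediate product so that the result is exact for every 32-bit input, whereas a version restricted to 22-bit inputs and 32-bit products would only be approximate.

```c
#include <stdint.h>

/*
 * x / 3 without a divide instruction: multiply by ceil(2^33 / 3)
 * and shift the 64-bit product right by 33 bits.
 */
uint32_t div3_mulshift(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

The same pattern works for other constant divisors with a different multiplier and shift; on a limited input range a smaller multiplier can be chosen so that the product fits in 32 bits, at the cost of some accuracy.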

I have tried several functions with 0, 1, 2 and 3 divides.
Several of them offer better accuracy than what we currently have, in
fewer cycles. Others offer faster results (up to 5 times) with slightly
less accuracy.

There is one function which is not to be used, but is just here for
comparison (ncubic_0div). It does no divide but has awful avg error.

But one which is interesting is ncubic_tab0. It does not use any
divide at all, not even a div64. It shows a 0.6% avg error, which I'm
not sure is good enough. It is 6.7 times faster than the initial ncubic(),
with less accuracy, and 4 times smaller. I suspect that the difference is
larger on architectures which have no divide instruction.
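
For comparison, a cube root can also be computed exactly (rounded down) with no divide and no table at all. The sketch below is the classic bit-by-bit integer cube root, not ncubic_tab0 or any other function from this mail; the name icbrt32 is made up, and it is only meant as a reference against which the error of an approximation could be measured.

```c
#include <stdint.h>

/*
 * Floor of the cube root of a 32-bit value, one result bit per
 * iteration, using only shifts, adds and small multiplies.
 */
uint32_t icbrt32(uint32_t x)
{
	uint32_t y = 0;
	uint32_t b;
	int s;

	for (s = 30; s >= 0; s -= 3) {
		y = 2 * y;
		/* (y + 1)^3 - y^3, scaled to the current bit position */
		b = (3 * y * (y + 1) + 1) << s;
		if (x >= b) {
			x -= b;
			y += 1;
		}
	}
	return y;
}
```

It needs 11 iterations for a 32-bit input, so it is unlikely to beat a table lookup on speed; its value is that the result has no approximation error.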

If a 0.6% avg error rate is too much, ncubic_tab1() uses a single div64
and is about twice as slow (still nearly 3 times faster than ncubic). It shows
a 0.195% avg error, which is better than the initial ncubic. I think that it
is a good tradeoff.

If best accuracy is an absolute requirement, then I have a variation of
ncubic (ncubic_3div) which does 0.17% in 2/3 of the time (compared to
0.247%), and which is slightly smaller.

I have also added a size column, indicating the approximate function
size, provided that the compiler does not reorder the code. With gcc 3.4
it's OK, but 4.1 returns garbage. That does not matter, it's just a
rough estimate anyway.

Here are the results classed by speed :

/* Sample output on a Pentium-M 600 MHz :

Function      clocks  mean(us)  max(us)  std(us)  Avg err  size
ncubic_tab0       79      0.66     7.20     1.04   0.613%   160
ncubic_0div       84      0.70     7.64     1.57   4.521%   192
ncubic_1div      178      1.48    16.27     1.81   0.443%   336
ncubic_tab1      179      1.49    16.34     1.85   0.195%   320
ncubic_ndiv3     263      2.18    24.04     3.59   0.250%   512
ncubic_2div      270      2.24    24.70     2.77   0.187%   512
ncubic32_1       359      2.98    32.81     3.59   0.238%   544
ncubic_3div      361      2.99    33.08     3.79   0.170%   656
ncubic32         364      3.02    33.29     3.51   0.247%   544
ncubic           529      4.39    48.39     4.92   0.247%   720
hcbrt            539      4.47    49.25     5.98   1.580%    96
ocubic           732      4.93    61.83     7.22   0.274%   320
acbrt            842      6.98    76.73     8.55   0.275%   192
bictcp          1032      6.95    86.30     9.04   0.172%   768

And now by avg error :

ncubic_3div  

Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Linus Torvalds


On Tue, 13 Mar 2007, Roland McGrath wrote:
 
 The OPEN_MAX constant is an arbitrary number with no useful relation to
 anything.  Nothing should be using it.  SCM_MAX_FD is just an arbitrary
 constant and it should be clear that its value is chosen in net/scm.h
 and not actually derived from anything else meaningful in the system.

I'd actually prefer this as part of the "remove OPEN_MAX" patch.

It's certainly nice to have small independent patches in a series, but two 
one-liners that really aren't all that independent either in practice or 
in goals doesn't make much sense to me. Much better to just be up-front 
about things and say: remove OPEN_MAX, and to do so, just rewrite that 
other arbitrary constant to not need it any more.

That said, it actually worries me that you should call _SC_OPEN_MAX. I 
think the whole POSIX config method is way over-designed (anybody who 
thinks you should ever have used _SC_HZ or whatever it was called was just 
crazy), but more importantly, and independently of that worry, I just 
suspect a lot of programs simply _don't_do_it_.

For example, I know perfectly well that I should use _SC_PATH_MAX, but a 
*lot* of code simply doesn't care. In git, I used PATH_MAX, and the reason 
is that
 - I want a constant for arrays
 - I don't care that much about the exact value, I just want a reasonable 
   value for sizing an array for some random path
 - _SC_PATH_MAX is practically unportable and simply not *useful*.

.. in short, I'm not a big believer in "programs should do Xyz according 
to some paper standard". Paper standards are written by committees, not 
programmers, and seldom take issues other than politics into account.

So, what's the likelihood that this will break some old programs? I 
realize that modern distributions don't put the kernel headers in their 
user-visible includes any more, but the breakage is most likely exactly 
for old programs and older distributions.

Linus


Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Roland McGrath
 I'd actually prefer this as part of the "remove OPEN_MAX" patch.

Ok.  (But now you're going to argue with me about "remove OPEN_MAX",
and you haven't said you have any problem with changing SCM_MAX_FD,
so why make it wait?)

 That said, it actually worries me that you should call _SC_OPEN_MAX. 
[...]
 For example, I know perfectly well that I should use _SC_PATH_MAX, but a 
 *lot* of code simply doesn't care. In git, I used PATH_MAX, and the reason 
[...]

Ok, fine.  But PATH_MAX is a real constant that has some meaning in the
kernel.  It's perfectly correct to use PATH_MAX as a constant on a system
like Linux that defines it and means what it says.  Conversely, OPEN_MAX
has no useful relationship with anything the kernel is doing at all.

 So, what's the likelihood that this will break some old programs? I 
 realize that modern distributions don't put the kernel headers in their 
 user-visible includes any more, but the breakage is most likely exactly 
 for old programs and older distributions.

Well, I don't know for sure.  It doesn't seem all that likely to me (not
like PATH_MAX), as there has been getdtablesize() since before there was
OPEN_MAX by that name (not to mention before there was Linux).  If things
use OPEN_MAX as a constant for arrays, they're already broken unless they
call setrlimit to constrain themselves.  Getting things fixed has to start
somewhere.
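
For reference, a minimal sketch of what a program can do instead of using OPEN_MAX as a compile-time constant: ask for the limit at run time. The function name current_fd_limit is made up for illustration; getrlimit(), RLIMIT_NOFILE and sysconf(_SC_OPEN_MAX) are the standard interfaces.

```c
#include <sys/resource.h>
#include <unistd.h>

/*
 * Current soft limit on open file descriptors for this process.
 * Falls back to sysconf() if the rlimit is unavailable or infinite;
 * the result may then be -1, meaning "no fixed limit".
 */
long current_fd_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) == 0 &&
	    rl.rlim_cur != RLIM_INFINITY)
		return (long)rl.rlim_cur;

	return sysconf(_SC_OPEN_MAX);
}
```

A program that wants a table indexed by descriptor would size it from this value (or grow it on demand) rather than declaring an array of OPEN_MAX entries.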


Thanks,
Roland



Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Linus Torvalds


On Tue, 13 Mar 2007, Roland McGrath wrote:
 
 Ok, fine.  But PATH_MAX is a real constant that has some meaning in the
 kernel.  It's perfectly correct to use PATH_MAX as a constant on a system
 like Linux that defines it and means what it says.  Conversely, OPEN_MAX
 has no useful relationship with anything the kernel is doing at all.

Sure. I'm just saying that some people may use OPEN_MAX the way I know 
people use PATH_MAX - whether it's what you're supposed to or not.

I do agree that PATH_MAX is much more appropriate to be used that way, and 
is more likely to have real meaning, I just worry.

Linus


[ANNOUNCE] iproute2 2.6.20-070313

2007-03-13 Thread Stephen Hemminger
This is an experimental update to the iproute2 command set.

The version number includes the kernel version to denote what features are
supported. The same source should build on older systems, but obviously the
newer kernel features won't be available. As much as possible, this package
tries to be source compatible across releases.

It can be downloaded from:
  http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.20-070313.tar.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

For more info on iproute2 see:
  http://linux-net.osdl.org/index.php/Iproute2

Changes:

Jamal Hadi Salim:
  update rest to use nl_mgrp
  nl_mgrp to crap if base multicast groups exceeded
  Old bug on tc

Mike Frysinger:
  do not ignore build failures in subdirs of iproute2

Noriaki TAKAMIYA:
  enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from 
ip.

Patrick McHardy:
  tbf: fix latency printing
  Use tc_calc_xmittime() where appropriate
  Introduce tc_calc_xmitsize and use where appropriate
  Introduce TIME_UNITS_PER_SEC to represent internal clock resolution
  Replace usec by time in function names
  Add sprint_ticks() function and use in CBQ
  Handle different kernel clock resolutions
  Increase internal clock resolution to nsec

Stephen Hemminger:
  netem use read/write for changes
  fix tc-pfifo and tc-bfifo man pages
  iptables library fix
  TC bfifo man page
  Use kernel headers from 2.6.20.y

Thomas Hisch:
  Fixes use of uninitialized string



Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-13 Thread Michal Piotrowski

On 12/03/07, Tejun Heo [EMAIL PROTECTED] wrote:

Stephen Hemminger wrote:
 On Tue, 13 Mar 2007 04:03:00 +0900
 Tejun Heo [EMAIL PROTECTED] wrote:

 Stephen Hemminger wrote:
 1. the controller has IRQ stuck high (infrequent but possible)
 2. the IRQ is already requested by another device
 3. the IRQ gets disabled due to screaming interrupts at the moment
 ata_piix does pci_enable_device().

 I think we can be much more resilient to screaming interrupts if we
 enable device with IRQ disabled and enable it after the device is
 initialized to some level, possibly when requesting IRQ.
 The first thing the skge driver does is do a chip reset, and that should
 cause IRQ to be disabled and cleared. The driver has no chance to
 fix it if the BIOS left the IRQ screaming...
 What if we do something like...

  pci_intx(pdev, 0);
  pci_enable_device(pdev);
  /* initialize */
  request_irq(blah blah...);
  pci_intx(pdev, 1);

 Would this work for skge?


 Okay for testing, but any change like this should be done in the base
 PCI layer, not one off in a particular driver.

Yeap, it was a proof-of-concept pseudo code.  I attached a patch to do
above in skge.  Please point out if it is broken (e.g. intx needs to be
enabled earlier).

Michal, can you apply the attached patch and see whether it fixes the
problem.


I think that problem is solved.

Thanks.



Thanks.

--
tejun

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index eea75a4..2c990f2 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3585,6 +3585,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 	struct skge_hw *hw;
 	int err, using_dac = 0;
 
+	pci_intx(pdev, 0);
 	err = pci_enable_device(pdev);
 	if (err) {
 		dev_err(&pdev->dev, "cannot enable PCI device\n");
@@ -3669,6 +3670,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 		       dev->name, pdev->irq);
 		goto err_out_unregister;
 	}
+	pci_intx(pdev, 1);
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {




Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)