Re: more complex processing in ing_filter ?

2005-07-27 Thread Lucas Nussbaum
On Wed, Jul 27, 2005 at 10:50:41AM -0700, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Wed, 27 Jul 2005 10:06:45 +0200
> Lucas Nussbaum <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > I'm interested in doing more complex stuff on inbound packets than
> > what is currently possible with ing_filter (I understand ingress
> > doesn't allow child classes, and can only drop/pass packets, not
> > store one to send it later).
> > 
> > While this is understandable because it would conflict with the
> > benefits of NAPI by queueing and dropping packets much later, it
> > prevents me from using Linux instead of FreeBSD's Dummynet (I'm
> > working on network emulation-related stuff).
> >
> Why not just fix netem to work on imq?

I might look at that. What's the problem with netem and imq?

Also, I'm not sure I understand the difference between the "redirect to
imq" and the "redirect to dummy" approaches.

-- 
| Lucas Nussbaum
| [EMAIL PROTECTED]   http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED] GPG: 1024D/023B3F4F |
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: more complex processing in ing_filter ?

2005-07-27 Thread Lucas Nussbaum
On Wed, Jul 27, 2005 at 08:59:56AM -0400, jamal <[EMAIL PROTECTED]> wrote:
> On Wed, 2005-27-07 at 10:06 +0200, Lucas Nussbaum wrote:
> > Hi,
> > 
> > I'm interested in doing more complex stuff on inbound packets than what
> > is currently possible with ing_filter (I understand ingress doesn't
> > allow child classes, and can only drop/pass packets, not store one to
> > send it later).
> > 
> 
> No, that's not true. You can write a tc action that will steal packets
> from that path and later reinject them.

Is there an example or mail thread I could read about this?

> But that may not be necessary
> if you use the patched dummy device since you could redirect packets to
> it and run whatever qdisc you want on it. 
>
> [...]
>
> I am not sure why you say it's unclean. If you can give the packets to
> dummy and run any qdisc on it such as netem - why would that be a
> problem?

I'm concerned about the overhead of redirecting the packets to a
dummy/imq device and then re-injecting them, compared to doing all the
processing inside ing_filter. However, I don't know enough about Linux
internals to really evaluate it. Any ideas?
-- 
| Lucas Nussbaum
| [EMAIL PROTECTED]   http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED] GPG: 1024D/023B3F4F |


Re: [PATCH][NET] Cleanup INET_REFCNT_DEBUG code

2005-07-27 Thread David S. Miller
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Thu, 28 Jul 2005 02:46:56 -0300

> Oops, sorry, I overlooked that. I did one test without it, and the last one
> was with it defined. But I guess leaving it as is for a few days wouldn't
> hurt, so that people get a bit of this debugging and perhaps find some
> problems introduced in the last six months or so? I.e. it's just a matter
> of deciding whether we disable it now or in a few days.

Ok, we can leave it on for a little while ;)


Re: [PATCH][NET] Cleanup INET_REFCNT_DEBUG code

2005-07-27 Thread Arnaldo Carvalho de Melo
Em Wed, Jul 27, 2005 at 10:43:45PM -0700, David S. Miller escreveu:
> From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
> Date: Thu, 28 Jul 2005 02:19:59 -0300
> 
> > Indeed, there were some issues about that thing, I think I have those
> > handled properly now, please take a look at the comments and tell me if
> > you find any holes.
> 
> Looks good, I'll get to pulling this in soon.
> 
> Are we going from a default of off to a default of
> on for any particular reason?

Oops, sorry, I overlooked that. I did one test without it, and the last one
was with it defined. But I guess leaving it as is for a few days wouldn't
hurt, so that people get a bit of this debugging and perhaps find some
problems introduced in the last six months or so? I.e. it's just a matter
of deciding whether we disable it now or in a few days.

Regards,

- Arnaldo


Re: [PATCH][NET] Cleanup INET_REFCNT_DEBUG code

2005-07-27 Thread David S. Miller
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Thu, 28 Jul 2005 02:19:59 -0300

> Indeed, there were some issues about that thing, I think I have those handled
> properly now, please take a look at the comments and tell me if you find any
> holes.

Looks good, I'll get to pulling this in soon.

Are we going from a default of off to a default of
on for any particular reason?


Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers

2005-07-27 Thread David S. Miller
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 22:33:32 -0700

> But with so many different workaround methods
> (TG3_FLAG_MBOX_WRITE_REORDER, TG3_FLAG_TXD_MBOX_HWBUG,
> TG3_FLG2_ICH_WORKAROUND, TG3_FLAG_5701_REG_WRITE_BUG, etc), it's
> more like:
> 
>   if (...)
>           direct_func_1()
>   else if (...)
>           direct_func_2()
>   else if (...)
>           direct_func_3()
>   else
>           direct_func_4()
> 
> At some point I suspect the indirect function pointer method will
> become better.

That's a good point.


Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers

2005-07-27 Thread Michael Chan

Jeff Garzik wrote:
> Is this theory, or has it actually been measured?
> 
> In x86-based CPUs at least (the largest tg3 platform), branch prediction
> often prefers
> 
>   if (...)
>           direct_func_1()
>   else
>           direct_func_2()
> 
> to
> 
>   tp->func()
> 
> For hot paths, branch prediction will almost always predict the correct
> path, without any need for dereferenced, indirect jumps.
> 
> The latter example may look more clean, but the former is probably
> faster in Real Life(tm).
> 

Not measured. But with so many different workaround methods
(TG3_FLAG_MBOX_WRITE_REORDER, TG3_FLAG_TXD_MBOX_HWBUG,
TG3_FLG2_ICH_WORKAROUND, TG3_FLAG_5701_REG_WRITE_BUG, etc), it's
more like:

        if (...)
                direct_func_1()
        else if (...)
                direct_func_2()
        else if (...)
                direct_func_3()
        else
                direct_func_4()

At some point I suspect the indirect function pointer method will
become better.
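
To make the two styles under discussion concrete, here is a minimal
user-space sketch. Everything in it is hypothetical (the fake_dev structure,
the QUIRK_* flags, the write_* helpers); it is not the tg3 code and it says
nothing about which style is faster, which would have to be measured. It
only shows that testing the flags on every access and choosing a function
pointer once at init time lead to the same register write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quirk flags and device model; not the tg3 definitions. */
#define QUIRK_INDIRECT   0x1u  /* must go through an address/data window */
#define QUIRK_READ_FLUSH 0x2u  /* every write needs a read back */

struct fake_dev {
	uint32_t quirks;
	uint32_t regs[16];
	unsigned int reads;    /* counts read backs, for illustration */
	void (*write32)(struct fake_dev *, uint32_t, uint32_t);
};

static void write_plain(struct fake_dev *d, uint32_t off, uint32_t val)
{
	d->regs[off % 16] = val;
}

static void write_flush(struct fake_dev *d, uint32_t off, uint32_t val)
{
	d->regs[off % 16] = val;
	d->reads++;            /* stands in for the read back */
}

static void write_indirect(struct fake_dev *d, uint32_t off, uint32_t val)
{
	/* A real driver would go through a config-space window here. */
	d->regs[off % 16] = val;
}

/* Style 1: test the quirk flags on every access. */
static void write_by_flags(struct fake_dev *d, uint32_t off, uint32_t val)
{
	if (d->quirks & QUIRK_INDIRECT)
		write_indirect(d, off, val);
	else if (d->quirks & QUIRK_READ_FLUSH)
		write_flush(d, off, val);
	else
		write_plain(d, off, val);
}

/* Style 2: decide once at init time, then call through the pointer. */
static void fake_dev_init(struct fake_dev *d, uint32_t quirks)
{
	unsigned int i;

	d->quirks = quirks;
	d->reads = 0;
	for (i = 0; i < 16; i++)
		d->regs[i] = 0;

	if (quirks & QUIRK_INDIRECT)
		d->write32 = write_indirect;
	else if (quirks & QUIRK_READ_FLUSH)
		d->write32 = write_flush;
	else
		d->write32 = write_plain;
}
```

With style 2 the chain of tests runs once in fake_dev_init(), and every
later access is a single call such as d->write32(d, off, val).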
 



[PATCH][NET] Cleanup INET_REFCNT_DEBUG code

2005-07-27 Thread Arnaldo Carvalho de Melo
Hi David,

Please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/

Regards,

- Arnaldo

tree 238b21aeaed399b847c0f0b0f7328cd69ffcd0d1
parent df2e0392536ecdd6385f4319f746045fd6fae38f
author Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> 1122526171 -0300
committer Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> 1122526171 -0300

[PATCH][NET] Cleanup INET_REFCNT_DEBUG code

> On 7/22/05, David S. Miller <[EMAIL PROTECTED]> wrote:
> > From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
> > Date: Thu, 21 Jul 2005 23:02:03 -0300
> >
> > > The second one again, also at:
> > >
> > > rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git
> >
> > How is this handling properly the case where sk_prot changes?
> >
> > Do you remember we had that problem with socket SLAB caches,
> > because of how IPV6 and IPV4 sockets can change into the other
> > type?  That's why we store the socket SLAB cache in there, as
> > well as the sk_prot.

> I think so, but that thing is so tricky at times that I'll go over most of
> the patch (re)reviewing/commenting why its safe.

Indeed, there were some issues about that thing, I think I have those handled
properly now, please take a look at the comments and tell me if you find any
holes.

> > Also, would be nice to have some "do { } while (0)" for the NOP
> > version of the debug macros just in case :-)

> I'll do that

Done.
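
For readers unfamiliar with the idiom, a minimal sketch of why the no-op
variant is written as "do { } while (0)". The macro and function names
(my_debug_inc, MY_REFCNT_DEBUG, classify) are made up for illustration and
are not the ones in this patch. The point is that the macro plus the
caller's semicolon always forms exactly one statement, so it stays safe in
an unbraced if/else; a bare "{ }" expansion followed by ";" would detach
the else branch and fail to compile.

```c
#include <assert.h>

/* Hypothetical names; define MY_REFCNT_DEBUG to enable the counter. */
#ifdef MY_REFCNT_DEBUG
#define my_debug_inc(counter)	do { (*(counter))++; } while (0)
#else
#define my_debug_inc(counter)	do { } while (0)
#endif

/* Returns 1 for positive values and -1 otherwise. */
static int classify(int value, int *debug_hits)
{
	(void)debug_hits;	/* unused when the no-op variant is compiled */

	if (value > 0)
		my_debug_inc(debug_hits);	/* exactly one statement */
	else
		return -1;

	return 1;
}
```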

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

--

 include/net/inet_common.h |    1 -
 include/net/ipv6.h        |    1 -
 include/net/sock.h        |   32 +++-
 include/net/tcp.h         |    2 +-
 net/core/sock.c           |    6 +-
 net/ipv4/af_inet.c        |   18 ++
 net/ipv4/tcp.c            |    7 +--
 net/ipv4/tcp_minisocks.c  |   20
 net/ipv6/af_inet6.c       |   31 +++
 net/ipv6/ipv6_sockglue.c  |   15 ---
 net/ipv6/tcp_ipv6.c       |   18 +-
 net/sctp/ipv6.c           |    5 +
 net/sctp/protocol.c       |    4 +---
 13 files changed, 86 insertions(+), 74 deletions(-)

--

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -29,7 +29,6 @@ extern unsigned int   inet_poll(struct fi
 extern int inet_listen(struct socket *sock, int backlog);
 
 extern void inet_sock_destruct(struct sock *sk);
-extern atomic_t inet_sock_nr;
 
 extern int inet_bind(struct socket *sock, 
  struct sockaddr *uaddr, int addr_len);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -145,7 +145,6 @@ DECLARE_SNMP_STAT(struct udp_mib, udp_st
 #define UDP6_INC_STATS(field)  SNMP_INC_STATS(udp_stats_in6, field)
 #define UDP6_INC_STATS_BH(field)   SNMP_INC_STATS_BH(udp_stats_in6, field)
 #define UDP6_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_stats_in6, field)
-extern atomic_t inet6_sock_nr;
 
 int snmp6_register_dev(struct inet6_dev *idev);
 int snmp6_unregister_dev(struct inet6_dev *idev);
diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -486,6 +486,9 @@ extern int sk_wait_data(struct sock *sk,
 
 struct request_sock_ops;
 
+/* Here is the right place to enable sock refcounting debugging */
+#define SOCK_REFCNT_DEBUG
+
 /* Networking protocol blocks we attach to sockets.
  * socket layer -> transport layer interface
  * transport -> network interface is defined by struct inet_proto
@@ -556,7 +559,9 @@ struct proto {
    char                    name[32];
 
    struct list_head        node;
-
+#ifdef SOCK_REFCNT_DEBUG
+   atomic_t                socks;
+#endif
    struct {
        int inuse;
        u8  __pad[SMP_CACHE_BYTES - sizeof(int)];
@@ -566,6 +571,31 @@ struct proto {
 extern int proto_register(struct proto *prot, int alloc_slab);
 extern void proto_unregister(struct proto *prot);
 
+#ifdef SOCK_REFCNT_DEBUG
+static inline void sk_refcnt_debug_inc(struct sock *sk)
+{
+   atomic_inc(&sk->sk_prot->socks);
+}
+
+static inline void sk_refcnt_debug_dec(struct sock *sk)
+{
+   atomic_dec(&sk->sk_prot->socks);
+   printk(KERN_DEBUG "%s socket %p released, %d are still alive\n",
+  sk->sk_prot->name, sk, atomic_read(&sk->sk_prot->socks));
+}
+
+static inline void sk_refcnt_debug_release(const struct sock *sk)
+{
+   if (atomic_read(&sk->sk_refcnt) != 1)
+   printk(KERN_DEBUG "Destruction of the %s socket %p delayed, 
refcnt

Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers

2005-07-27 Thread David S. Miller
From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 23:44:51 -0400

> In x86-based CPUs at least (the largest tg3 platform), branch prediction 
> often prefers
> 
>   if (...)
>           direct_func_1()
>   else
>           direct_func_2()
> 
> to
> 
>   tp->func()

Indirect function calls also kill cpus such as ia64, which
cannot avoid the implicit branch prediction miss unless the
function method target is prefetched several instruction groups
before the call and gcc does not emit the necessary directives
to achieve this even if it were possible.


Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers

2005-07-27 Thread Jeff Garzik

Michael Chan wrote:

> This patch adds the basic function pointers to do register accesses in
> the fast path. This was suggested by David Miller. The idea is that
> various register access methods for different hardware errata can easily
> be implemented with these function pointers and performance will not be
> degraded on chips that use normal register access methods.

Is this theory, or has it actually been measured?

In x86-based CPUs at least (the largest tg3 platform), branch prediction
often prefers

        if (...)
                direct_func_1()
        else
                direct_func_2()

to

        tp->func()

For hot paths, branch prediction will almost always predict the correct
path, without any need for dereferenced, indirect jumps.

The latter example may look more clean, but the former is probably
faster in Real Life(tm).


Jeff




[PATCH 2.6 5/5] tg3: Eliminate one register write in tg3_restart_ints()

2005-07-27 Thread Michael Chan
The register write to register 0x68 to restart interrupts is unnecessary
as the interrupt wasn't masked in that register by the irq handler. This
will save one register write in the fast path.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff -Nrup 5/drivers/net/tg3.c 6/drivers/net/tg3.c
--- 5/drivers/net/tg3.c 2005-07-27 16:40:17.0 -0700
+++ 6/drivers/net/tg3.c 2005-07-27 16:40:32.0 -0700
@@ -533,8 +533,6 @@ static inline unsigned int tg3_has_work(
  */
 static void tg3_restart_ints(struct tg3 *tp)
 {
-   tw32(TG3PCI_MISC_HOST_CTRL,
-   (tp->misc_host_ctrl & ~MISC_HOST_CTRL_MASK_PCI_INT));
tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
 tp->last_tag << 24);
mmiowb();




[PATCH 2.6 4/5] tg3: Add indirect register method for 5703 behind ICH

2005-07-27 Thread Michael Chan
This patch adds the new workaround for 5703 A1/A2 if it is behind
certain ICH bridges. The workaround disables memory and uses config.
cycles only to access all registers. The 5702/03 chips can mistakenly
decode the special cycles from the ICH chipsets as memory write cycles,
causing corruption of register and memory space. Only certain ICH
bridges will drive special cycles with non-zero data during the address
phase which can fall within the 5703's address range. This is not an ICH
bug as the PCI spec allows non-zero address during special cycles.
However, only these ICH bridges are known to drive non-zero addresses
during special cycles.

The indirect_lock is also changed from spin_lock_bh to spin_lock_irqsave
because it is taken in the irq handler when the indirect method is used to
disable interrupts.
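
As a rough illustration of the address/data window technique and of why the
two steps need to be serialized, here is a small user-space model. It is a
sketch under assumptions: the fake_chip structure, the CFG_WINDOW_* offsets
and the cfg_* helpers are all invented and do not correspond to the tg3
register map or to the PCI config accessors. The comments mark where a
driver would hold its lock; if another window access (for example from an
interrupt handler) ran between the two steps, it would change the latched
address and the data would land in the wrong register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical config-space offsets for the window; not the tg3 values. */
enum { CFG_WINDOW_ADDR = 0x78, CFG_WINDOW_DATA = 0x80 };
#define FAKE_NREGS 64u

struct fake_chip {
	uint32_t window;            /* last address latched by the window */
	uint32_t regs[FAKE_NREGS];  /* registers reachable through it */
};

/* Stand-ins for config-space accessors. */
static void cfg_write(struct fake_chip *c, int where, uint32_t val)
{
	if (where == CFG_WINDOW_ADDR)
		c->window = val;
	else if (where == CFG_WINDOW_DATA)
		c->regs[(c->window / 4) % FAKE_NREGS] = val;
}

static uint32_t cfg_read(const struct fake_chip *c, int where)
{
	if (where == CFG_WINDOW_DATA)
		return c->regs[(c->window / 4) % FAKE_NREGS];
	return c->window;
}

static void indirect_write32(struct fake_chip *c, uint32_t off, uint32_t val)
{
	/* A driver would take its lock here ... */
	cfg_write(c, CFG_WINDOW_ADDR, off);   /* step 1: select the register */
	cfg_write(c, CFG_WINDOW_DATA, val);   /* step 2: write through window */
	/* ... and release it here. */
}

static uint32_t indirect_read32(struct fake_chip *c, uint32_t off)
{
	/* Same two-step sequence, so the same locking would apply. */
	cfg_write(c, CFG_WINDOW_ADDR, off);
	return cfg_read(c, CFG_WINDOW_DATA);
}
```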

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff -Nrup 4/drivers/net/tg3.c 5/drivers/net/tg3.c
--- 4/drivers/net/tg3.c 2005-07-26 09:23:32.0 -0700
+++ 5/drivers/net/tg3.c 2005-07-27 16:40:17.0 -0700
@@ -340,10 +340,12 @@ static struct {
 
 static void tg3_write_indirect_reg32(struct tg3 *tp, u32 off, u32 val)
 {
-   spin_lock_bh(&tp->indirect_lock);
+   unsigned long flags;
+
+   spin_lock_irqsave(&tp->indirect_lock, flags);
pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off);
pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val);
-   spin_unlock_bh(&tp->indirect_lock);
+   spin_unlock_irqrestore(&tp->indirect_lock, flags);
 }
 
 static void tg3_write_flush_reg32(struct tg3 *tp, u32 off, u32 val)
@@ -352,24 +354,75 @@ static void tg3_write_flush_reg32(struct
readl(tp->regs + off);
 }
 
-static void _tw32_flush(struct tg3 *tp, u32 off, u32 val)
+static u32 tg3_read_indirect_reg32(struct tg3 *tp, u32 off)
 {
-   if ((tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) != 0) {
-   spin_lock_bh(&tp->indirect_lock);
-   pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off);
-   pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val);
-   spin_unlock_bh(&tp->indirect_lock);
-   } else {
-   void __iomem *dest = tp->regs + off;
-   writel(val, dest);
-   readl(dest);/* always flush PCI write */
+   unsigned long flags;
+   u32 val;
+
+   spin_lock_irqsave(&tp->indirect_lock, flags);
+   pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off);
+   pci_read_config_dword(tp->pdev, TG3PCI_REG_DATA, &val);
+   spin_unlock_irqrestore(&tp->indirect_lock, flags);
+   return val;
+}
+
+static void tg3_write_indirect_mbox(struct tg3 *tp, u32 off, u32 val)
+{
+   unsigned long flags;
+
+   if (off == (MAILBOX_RCVRET_CON_IDX_0 + TG3_64BIT_REG_LOW)) {
+   pci_write_config_dword(tp->pdev, TG3PCI_RCV_RET_RING_CON_IDX +
+  TG3_64BIT_REG_LOW, val);
+   return;
+   }
+   if (off == (MAILBOX_RCV_STD_PROD_IDX + TG3_64BIT_REG_LOW)) {
+   pci_write_config_dword(tp->pdev, TG3PCI_STD_RING_PROD_IDX +
+  TG3_64BIT_REG_LOW, val);
+   return;
}
+
+   spin_lock_irqsave(&tp->indirect_lock, flags);
+   pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off + 0x5600);
+   pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val);
+   spin_unlock_irqrestore(&tp->indirect_lock, flags);
+
+   /* In indirect mode when disabling interrupts, we also need
+* to clear the interrupt bit in the GRC local ctrl register.
+*/
+   if ((off == (MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW)) &&
+   (val == 0x1)) {
+   pci_write_config_dword(tp->pdev, TG3PCI_MISC_LOCAL_CTRL,
+  tp->grc_local_ctrl|GRC_LCLCTRL_CLEARINT);
+   }
+}
+
+static u32 tg3_read_indirect_mbox(struct tg3 *tp, u32 off)
+{
+   unsigned long flags;
+   u32 val;
+
+   spin_lock_irqsave(&tp->indirect_lock, flags);
+   pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off + 0x5600);
+   pci_read_config_dword(tp->pdev, TG3PCI_REG_DATA, &val);
+   spin_unlock_irqrestore(&tp->indirect_lock, flags);
+   return val;
+}
+
+static void _tw32_flush(struct tg3 *tp, u32 off, u32 val)
+{
+   tp->write32(tp, off, val);
+   if (!(tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) &&
+   !(tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG) &&
+   !(tp->tg3_flags2 & TG3_FLG2_ICH_WORKAROUND))
+   tp->read32(tp, off);/* flush */
 }
 
 static inline void tw32_mailbox_flush(struct tg3 *tp, u32 off, u32 val)
 {
tp->write32_mbox(tp, off, val);
-   tp->read32_mbox(tp, off);
+   if (!(tp->tg3_flags & TG3_FLAG_MBOX_WRITE_REORDER) &&
+   !(tp->tg3_flags2 & TG3_FLG2_ICH_WORKAROUND))
+   tp->read32_mbox(tp, off);
 }
 
 static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val)
@@ -404,24 +457,28 

[PATCH 2.6 3/5] tg3: Add mailbox read method

2005-07-27 Thread Michael Chan
This patch adds the mailbox read method and also adds an inline function
tw32_mailbox_f() for mailbox writes that require read flush.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff -Nrup 3/drivers/net/tg3.c 4/drivers/net/tg3.c
--- 3/drivers/net/tg3.c 2005-07-26 07:40:18.0 -0700
+++ 4/drivers/net/tg3.c 2005-07-26 09:23:32.0 -0700
@@ -366,6 +366,12 @@ static void _tw32_flush(struct tg3 *tp, 
}
 }
 
+static inline void tw32_mailbox_flush(struct tg3 *tp, u32 off, u32 val)
+{
+   tp->write32_mbox(tp, off, val);
+   tp->read32_mbox(tp, off);
+}
+
 static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val)
 {
void __iomem *mbox = tp->regs + off;
@@ -387,8 +393,10 @@ static u32 tg3_read32(struct tg3 *tp, u3
 }
 
 #define tw32_mailbox(reg, val) tp->write32_mbox(tp, reg, val)
+#define tw32_mailbox_f(reg, val)   tw32_mailbox_flush(tp, (reg), (val))
 #define tw32_rx_mbox(reg, val) tp->write32_rx_mbox(tp, reg, val)
 #define tw32_tx_mbox(reg, val) tp->write32_tx_mbox(tp, reg, val)
+#define tr32_mailbox(reg)  tp->read32_mbox(tp, reg)
 
 #define tw32(reg,val)  tp->write32(tp, reg, val)
 #define tw32_f(reg,val)        _tw32_flush(tp,(reg),(val))
@@ -420,8 +428,7 @@ static void tg3_disable_ints(struct tg3 
 {
tw32(TG3PCI_MISC_HOST_CTRL,
 (tp->misc_host_ctrl | MISC_HOST_CTRL_MASK_PCI_INT));
-   tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x0001);
-   tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW);
+   tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x0001);
 }
 
 static inline void tg3_cond_int(struct tg3 *tp)
@@ -437,9 +444,8 @@ static void tg3_enable_ints(struct tg3 *
 
tw32(TG3PCI_MISC_HOST_CTRL,
 (tp->misc_host_ctrl & ~MISC_HOST_CTRL_MASK_PCI_INT));
-   tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
-(tp->last_tag << 24));
-   tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW);
+   tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
+  (tp->last_tag << 24));
tg3_cond_int(tp);
 }
 
@@ -3276,9 +3282,8 @@ static irqreturn_t tg3_interrupt(int irq
/* No work, shared interrupt perhaps?  re-enable
 * interrupts, and flush that PCI write
 */
-   tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
+   tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
0x);
-   tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW);
}
} else {/* shared interrupt */
handled = 0;
@@ -3321,9 +3326,8 @@ static irqreturn_t tg3_interrupt_tagged(
/* no work, shared interrupt perhaps?  re-enable
 * interrupts, and flush that PCI write
 */
-   tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
-tp->last_tag << 24);
-   tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW);
+   tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
+  tp->last_tag << 24);
}
} else {/* shared interrupt */
handled = 0;
@@ -5800,8 +5804,7 @@ static int tg3_reset_hw(struct tg3 *tp)
tw32_f(GRC_LOCAL_CTRL, tp->grc_local_ctrl);
udelay(100);
 
-   tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0);
-   tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW);
+   tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0);
tp->last_tag = 0;
 
if (!(tp->tg3_flags2 & TG3_FLG2_5705_PLUS)) {
@@ -6190,7 +6193,8 @@ static int tg3_test_interrupt(struct tg3
   HOSTCC_MODE_NOW);
 
for (i = 0; i < 5; i++) {
-   int_mbox = tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW);
+   int_mbox = tr32_mailbox(MAILBOX_INTERRUPT_0 +
+   TG3_64BIT_REG_LOW);
if (int_mbox != 0)
break;
msleep(10);
@@ -6590,10 +6594,10 @@ static int tg3_open(struct net_device *d
 
/* Mailboxes */
printk("DEBUG: SNDHOST_PROD[%08x%08x] SNDNIC_PROD[%08x%08x]\n",
-  tr32(MAILBOX_SNDHOST_PROD_IDX_0 + 0x0),
-  tr32(MAILBOX_SNDHOST_PROD_IDX_0 + 0x4),
-  tr32(MAILBOX_SNDNIC_PROD_IDX_0 + 0x0),
-  tr32(MAILBOX_SNDNIC_PROD_IDX_0 + 0x4));
+  tr32_mailbox(MAILBOX_SNDHOST_PROD_IDX_0 + 0x0),
+  tr32_mailbox(MAILBOX_SNDHOST_PROD_IDX_0 + 0x4),
+  tr32_mailbox(MAILBOX_SNDNIC_PROD_IDX_0 + 0x0),
+  tr32_mailbox(MAILBOX_SNDNIC_PROD_IDX_0 + 0x4));
 
/* NIC side send descriptors. */
for (i = 0; i < 6; i++) {
@@ -7895,7 +7899,7 @@ static int tg3_test_loopback(struct tg3 
nu

[PATCH 2.6 2/5] tg3: Add various register methods

2005-07-27 Thread Michael Chan
This patch adds various dedicated register read/write methods for the
existing workarounds, including PCIX target workaround, write with read
flush, etc. The chips that require these workarounds will use these
dedicated access functions.
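
A minimal sketch of the pattern, for illustration only: an access macro
that calls through a per-device method, a dedicated write-with-flush
method, and a reset path that temporarily swaps the method out and then
restores it, similar in spirit to what this patch does in
tg3_chip_reset(). All toy_* names are hypothetical and the "flush" is only
counted, not performed on hardware.

```c
#include <assert.h>
#include <stdint.h>

struct toy_nic {
	uint32_t regs[8];
	unsigned int flushes;   /* number of read backs performed */
	void (*write32)(struct toy_nic *, uint32_t, uint32_t);
};

/* Access macro in the style of tw32(); expects a local named `tp`. */
#define toy_w32(reg, val)	tp->write32(tp, (reg), (val))

/* Default method: plain write. */
static void toy_write32(struct toy_nic *tp, uint32_t off, uint32_t val)
{
	tp->regs[off % 8] = val;
}

/* Workaround method: write followed by a read back. */
static void toy_write_flush32(struct toy_nic *tp, uint32_t off, uint32_t val)
{
	tp->regs[off % 8] = val;
	tp->flushes++;          /* stands in for the read back */
}

/* Temporarily drop the flushing method, then restore whatever was set. */
static void toy_reset(struct toy_nic *tp)
{
	void (*saved)(struct toy_nic *, uint32_t, uint32_t) = tp->write32;

	if (saved == toy_write_flush32)
		tp->write32 = toy_write32;

	toy_w32(0, 1);          /* the reset write itself, with no read back */

	tp->write32 = saved;
}
```

Saving and restoring the pointer replaces the older save/restore of the
flags word: the caller does not need to know which workaround was active.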

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff -Nrup 2/drivers/net/tg3.c 3/drivers/net/tg3.c
--- 2/drivers/net/tg3.c 2005-07-26 07:33:32.0 -0700
+++ 3/drivers/net/tg3.c 2005-07-26 07:40:18.0 -0700
@@ -340,16 +340,16 @@ static struct {
 
 static void tg3_write_indirect_reg32(struct tg3 *tp, u32 off, u32 val)
 {
-   if ((tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) != 0) {
-   spin_lock_bh(&tp->indirect_lock);
-   pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off);
-   pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val);
-   spin_unlock_bh(&tp->indirect_lock);
-   } else {
-   writel(val, tp->regs + off);
-   if ((tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG) != 0)
-   readl(tp->regs + off);
-   }
+   spin_lock_bh(&tp->indirect_lock);
+   pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off);
+   pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val);
+   spin_unlock_bh(&tp->indirect_lock);
+}
+
+static void tg3_write_flush_reg32(struct tg3 *tp, u32 off, u32 val)
+{
+   writel(val, tp->regs + off);
+   readl(tp->regs + off);
 }
 
 static void _tw32_flush(struct tg3 *tp, u32 off, u32 val)
@@ -366,14 +366,6 @@ static void _tw32_flush(struct tg3 *tp, 
}
 }
 
-static void tg3_write32_rx_mbox(struct tg3 *tp, u32 off, u32 val)
-{
-   void __iomem *mbox = tp->regs + off;
-   writel(val, mbox);
-   if (tp->tg3_flags & TG3_FLAG_MBOX_WRITE_REORDER)
-   readl(mbox);
-}
-
 static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val)
 {
void __iomem *mbox = tp->regs + off;
@@ -4222,7 +4214,7 @@ static void tg3_stop_fw(struct tg3 *);
 static int tg3_chip_reset(struct tg3 *tp)
 {
u32 val;
-   u32 flags_save;
+   void (*write_op)(struct tg3 *, u32, u32);
int i;
 
if (!(tp->tg3_flags2 & TG3_FLG2_SUN_570X))
@@ -4234,8 +4226,9 @@ static int tg3_chip_reset(struct tg3 *tp
 * fun things.  So, temporarily disable the 5701
 * hardware workaround, while we do the reset.
 */
-   flags_save = tp->tg3_flags;
-   tp->tg3_flags &= ~TG3_FLAG_5701_REG_WRITE_BUG;
+   write_op = tp->write32;
+   if (write_op == tg3_write_flush_reg32)
+   tp->write32 = tg3_write32;
 
/* do the reset */
val = GRC_MISC_CFG_CORECLK_RESET;
@@ -4254,8 +4247,8 @@ static int tg3_chip_reset(struct tg3 *tp
val |= GRC_MISC_CFG_KEEP_GPHY_POWER;
tw32(GRC_MISC_CFG, val);
 
-   /* restore 5701 hardware bug workaround flag */
-   tp->tg3_flags = flags_save;
+   /* restore 5701 hardware bug workaround write method */
+   tp->write32 = write_op;
 
/* Unfortunately, we have to delay before the PCI read back.
 * Some 575X chips even will not respond to a PCI cfg access
@@ -4641,7 +4634,6 @@ static int tg3_load_firmware_cpu(struct 
 int cpu_scratch_size, struct fw_info *info)
 {
int err, i;
-   u32 orig_tg3_flags = tp->tg3_flags;
void (*write_op)(struct tg3 *, u32, u32);
 
if (cpu_base == TX_CPU_BASE &&
@@ -4657,11 +4649,6 @@ static int tg3_load_firmware_cpu(struct 
else
write_op = tg3_write_indirect_reg32;
 
-   /* Force use of PCI config space for indirect register
-* write calls.
-*/
-   tp->tg3_flags |= TG3_FLAG_PCIX_TARGET_HWBUG;
-
/* It is possible that bootcode is still loading at this point.
 * Get the nvram lock first before halting the cpu.
 */
@@ -4697,7 +4684,6 @@ static int tg3_load_firmware_cpu(struct 
err = 0;
 
 out:
-   tp->tg3_flags = orig_tg3_flags;
return err;
 }
 
@@ -9331,11 +9317,25 @@ static int __devinit tg3_get_invariants(
pci_write_config_dword(tp->pdev, TG3PCI_PCISTATE, 
pci_state_reg);
}
 
+   /* Default fast path register access methods */
tp->read32 = tg3_read32;
-   tp->write32 = tg3_write_indirect_reg32;
+   tp->write32 = tg3_write32;
tp->write32_mbox = tg3_write32;
-   tp->write32_tx_mbox = tg3_write32_tx_mbox;
-   tp->write32_rx_mbox = tg3_write32_rx_mbox;
+   tp->write32_tx_mbox = tg3_write32;
+   tp->write32_rx_mbox = tg3_write32;
+
+   /* Various workaround register access methods */
+   if (tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG)
+   tp->write32 = tg3_write_indirect_reg32;
+   else if (tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG)
+   tp->write32 = tg3_write_flush_reg32;
+
+   if ((tp->tg3_flags & TG3_FLAG_TXD_MBOX_HWBUG) ||
+   (tp->tg3_flags & TG3_FLAG_MBOX_WRITE_REORDER)) {
+ 

[PATCH 2.6 1/5] tg3: Add basic register access function pointers

2005-07-27 Thread Michael Chan
This patch adds the basic function pointers to do register accesses in
the fast path. This was suggested by David Miller. The idea is that
various register access methods for different hardware errata can easily
be implemented with these function pointers and performance will not be
degraded on chips that use normal register access methods.

The various register read write macros (e.g. tw32, tr32, tw32_mailbox)
are redefined to call the function pointers.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff -Nrup 1/drivers/net/tg3.c 2/drivers/net/tg3.c
--- 1/drivers/net/tg3.c 2005-07-25 20:01:38.0 -0700
+++ 2/drivers/net/tg3.c 2005-07-26 07:33:32.0 -0700
@@ -366,7 +366,7 @@ static void _tw32_flush(struct tg3 *tp, 
}
 }
 
-static inline void _tw32_rx_mbox(struct tg3 *tp, u32 off, u32 val)
+static void tg3_write32_rx_mbox(struct tg3 *tp, u32 off, u32 val)
 {
void __iomem *mbox = tp->regs + off;
writel(val, mbox);
@@ -374,7 +374,7 @@ static inline void _tw32_rx_mbox(struct 
readl(mbox);
 }
 
-static inline void _tw32_tx_mbox(struct tg3 *tp, u32 off, u32 val)
+static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val)
 {
void __iomem *mbox = tp->regs + off;
writel(val, mbox);
@@ -384,17 +384,23 @@ static inline void _tw32_tx_mbox(struct 
readl(mbox);
 }
 
-#define tw32_mailbox(reg, val)  writel(((val) & 0x), tp->regs + (reg))
-#define tw32_rx_mbox(reg, val)  _tw32_rx_mbox(tp, reg, val)
-#define tw32_tx_mbox(reg, val)  _tw32_tx_mbox(tp, reg, val)
+static void tg3_write32(struct tg3 *tp, u32 off, u32 val)
+{
+   writel(val, tp->regs + off);
+}
 
-#define tw32(reg,val)  tg3_write_indirect_reg32(tp,(reg),(val))
+static u32 tg3_read32(struct tg3 *tp, u32 off)
+{
+   return (readl(tp->regs + off)); 
+}
+
+#define tw32_mailbox(reg, val) tp->write32_mbox(tp, reg, val)
+#define tw32_rx_mbox(reg, val) tp->write32_rx_mbox(tp, reg, val)
+#define tw32_tx_mbox(reg, val) tp->write32_tx_mbox(tp, reg, val)
+
+#define tw32(reg,val)  tp->write32(tp, reg, val)
 #define tw32_f(reg,val)        _tw32_flush(tp,(reg),(val))
-#define tw16(reg,val)  writew(((val) & 0x), tp->regs + (reg))
-#define tw8(reg,val)   writeb(((val) & 0xff), tp->regs + (reg))
-#define tr32(reg)  readl(tp->regs + (reg))
-#define tr16(reg)  readw(tp->regs + (reg))
-#define tr8(reg)   readb(tp->regs + (reg))
+#define tr32(reg)  tp->read32(tp, reg)
 
 static void tg3_write_mem(struct tg3 *tp, u32 off, u32 val)
 {
@@ -9325,6 +9331,12 @@ static int __devinit tg3_get_invariants(
pci_write_config_dword(tp->pdev, TG3PCI_PCISTATE, 
pci_state_reg);
}
 
+   tp->read32 = tg3_read32;
+   tp->write32 = tg3_write_indirect_reg32;
+   tp->write32_mbox = tg3_write32;
+   tp->write32_tx_mbox = tg3_write32_tx_mbox;
+   tp->write32_rx_mbox = tg3_write32_rx_mbox;
+
/* Get eeprom hw config before calling tg3_set_power_state().
 * In particular, the TG3_FLAG_EEPROM_WRITE_PROT flag must be
 * determined before calling tg3_set_power_state() so that
diff -Nrup 1/drivers/net/tg3.h 2/drivers/net/tg3.h
--- 1/drivers/net/tg3.h 2005-07-25 20:01:38.0 -0700
+++ 2/drivers/net/tg3.h 2005-07-25 20:05:53.0 -0700
@@ -2049,6 +2049,10 @@ struct tg3 {
    spinlock_t              lock;
    spinlock_t              indirect_lock;
 
+   u32                     (*read32) (struct tg3 *, u32);
+   void                    (*write32) (struct tg3 *, u32, u32);
+   void                    (*write32_mbox) (struct tg3 *, u32,
+                                            u32);
    void __iomem            *regs;
    struct net_device       *dev;
    struct pci_dev          *pdev;
@@ -2060,6 +2064,8 @@ struct tg3 {
u32 msg_enable;
 
/* begin "tx thread" cacheline section */
+   void                    (*write32_tx_mbox) (struct tg3 *, u32,
+                                               u32);
u32 tx_prod;
u32 tx_cons;
u32 tx_pending;
@@ -2071,6 +2077,8 @@ struct tg3 {
dma_addr_t  tx_desc_mapping;
 
/* begin "rx thread" cacheline section */
+   void                    (*write32_rx_mbox) (struct tg3 *, u32,
+                                               u32);
u32 rx_rcb_ptr;
u32 rx_std_ptr;
u32 rx_jumbo_ptr;




[RFC][PATCH] Fix up struct sockaddr_in definition

2005-07-27 Thread Kyle Moffett

Hi,

I would like to propose a cleanup for struct sockaddr_in that I think
will make the code much more obvious and remove some icky padding
math:



sockaddr_in-cleanup.patch
Description: Binary data


Thanks for all your input!

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to  
make it so simple that there are obviously no deficiencies. And the  
other way is to make it so complicated that there are no obvious  
deficiencies.

  -- C.A.R. Hoare




[PATCH 2.6 0/5] tg3: Add indirect register access for 5703

2005-07-27 Thread Michael Chan
A set of patches will follow that adds the last remaining register
access workaround for 5703 behind certain ICH bridges.

The first 3 patches add the infrastructure to use function pointers for
various register access methods. Patch #4 adds the new indirect register
access method.

It turns out that these patches improve performance on many systems with
the 82801 (ICH) bridge, including new PCIE systems. The current tg3
driver sets the TG3_FLAG_MBOX_WRITE_REORDER flag when the ICH bridge is
detected and a read flush will be added in the tx and rx data paths on
all tg3 chips. These patches will correctly apply the indirect register
method to 5703 only when necessary and the unnecessary read flush will
be eliminated on all other tg3 chips.



Re: [patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine

2005-07-27 Thread John W. Linville
On Wed, Jul 27, 2005 at 06:45:18PM -0700, cramerj wrote:
> Stupid question:  Can we assume ethtool will only be used for networking
> devices with a 6-byte hardware address?
 
I presume not...?

> If not, then the driver-specific approach would give the flexibility of
> copying anything up to MAX_ADDR_LEN.
> 
> Perhaps increasing the count to MAX_ADDR_LEN is the way to go??

Drivers would still have the option to override if they so choose.
But, since the ETH_MAX_ADDR_LEN definition is actually 32 (which
matches MAX_ADDR_LEN anyway) then it is a bit of a moot point... :-)

Jon, you should probably add a patch (or redo your current patch)
and use MAX_ADDR_LEN instead of adding the new ETH_MAX_ADDR_LEN...

John
-- 
John W. Linville
[EMAIL PROTECTED]


RE: [patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine

2005-07-27 Thread cramerj
B'ah!  Nevermind.  I'll learn to read #defines one of these days.

Sorry for the spam.

> -Original Message-
> From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
> On Behalf Of cramerj
> Sent: Wednesday, July 27, 2005 6:45 PM
> To: John W. Linville; Jon Wetzel
> Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]
> Subject: RE: [patch 2.6.13-rc3] ethtool: add generic
> ethtool_op_get_perm_addr routine
> 
> Stupid question:  Can we assume ethtool will only be used for
networking
> devices with a 6-byte hardware address?
> 
> If not, then the driver-specific approach would give the flexibility
of
> copying anything up to MAX_ADDR_LEN.
> 
> Perhaps increasing the count to MAX_ADDR_LEN is the way to go??
> 
> 6/half-dozen
> 
> -Jeb
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]
> > On Behalf Of John W. Linville
> > Sent: Wednesday, July 27, 2005 6:15 PM
> > To: Jon Wetzel
> > Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]
> > Subject: [patch 2.6.13-rc3] ethtool: add generic
> ethtool_op_get_perm_addr
> > routine
> >
> > Add generic ethtool operation for getting permanent hardware
address.
> >
> > Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
> > ---
> > This moves and renames the basically generic e1000_get_perm_addr
> > routine to ethtool_op_get_perm_addr, and causes e1000 to make use of
> > the new name.
> >
> >  drivers/net/e1000/e1000_ethtool.c |9 +
> >  include/linux/ethtool.h   |1 +
> >  net/core/ethtool.c|7 +++
> >  3 files changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/e1000/e1000_ethtool.c
> > b/drivers/net/e1000/e1000_ethtool.c
> > --- a/drivers/net/e1000/e1000_ethtool.c
> > +++ b/drivers/net/e1000/e1000_ethtool.c
> > @@ -1704,13 +1704,6 @@ e1000_get_strings(struct net_device *net
> > }
> >  }
> >
> > -static int
> > -e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr
> *eaddr)
> > -{
> > -   memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN);
> > -   return 0;
> > -}
> > -
> >  struct ethtool_ops e1000_ethtool_ops = {
> > .get_settings   = e1000_get_settings,
> > .set_settings   = e1000_set_settings,
> > @@ -1746,7 +1739,7 @@ struct ethtool_ops e1000_ethtool_ops = {
> > .phys_id= e1000_phys_id,
> > .get_stats_count= e1000_get_stats_count,
> > .get_ethtool_stats  = e1000_get_ethtool_stats,
> > -   .get_perm_addr  = e1000_get_perm_addr,
> > +   .get_perm_addr  = ethtool_op_get_perm_addr,
> >  };
> >
> >  void e1000_set_ethtool_ops(struct net_device *netdev)
> > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> > --- a/include/linux/ethtool.h
> > +++ b/include/linux/ethtool.h
> > @@ -268,6 +268,7 @@ u32 ethtool_op_get_sg(struct net_device
> >  int ethtool_op_set_sg(struct net_device *dev, u32 data);
> >  u32 ethtool_op_get_tso(struct net_device *dev);
> >  int ethtool_op_set_tso(struct net_device *dev, u32 data);
> > +int ethtool_op_get_perm_addr(struct net_device *dev, struct
> ethtool_addr
> > *);
> >
> >  /**
> >   * &ethtool_ops - Alter and report network device settings
> > diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> > --- a/net/core/ethtool.c
> > +++ b/net/core/ethtool.c
> > @@ -81,6 +81,12 @@ int ethtool_op_set_tso(struct net_device
> > return 0;
> >  }
> >
> > +int ethtool_op_get_perm_addr(struct net_device *netdev, struct
> > ethtool_addr *eaddr)
> > +{
> > +   memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN);
> > +   return 0;
> > +}
> > +
> >  /* Handlers for each ethtool command */
> >
> >  static int ethtool_get_settings(struct net_device *dev, void __user
> > *useraddr)
> > @@ -845,6 +851,7 @@ int dev_ethtool(struct ifreq *ifr)
> >
> >  EXPORT_SYMBOL(dev_ethtool);
> >  EXPORT_SYMBOL(ethtool_op_get_link);
> > +EXPORT_SYMBOL_GPL(ethtool_op_get_perm_addr);
> >  EXPORT_SYMBOL(ethtool_op_get_sg);
> >  EXPORT_SYMBOL(ethtool_op_get_tso);
> >  EXPORT_SYMBOL(ethtool_op_get_tx_csum);
> > --
> > John W. Linville
> > [EMAIL PROTECTED]


RE: [patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine

2005-07-27 Thread cramerj
Stupid question:  Can we assume ethtool will only be used for networking
devices with a 6-byte hardware address?

If not, then the driver-specific approach would give the flexibility of
copying anything up to MAX_ADDR_LEN.

Perhaps increasing the count to MAX_ADDR_LEN is the way to go??

6/half-dozen

-Jeb

> -Original Message-
> From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
> On Behalf Of John W. Linville
> Sent: Wednesday, July 27, 2005 6:15 PM
> To: Jon Wetzel
> Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]
> Subject: [patch 2.6.13-rc3] ethtool: add generic
ethtool_op_get_perm_addr
> routine
> 
> Add generic ethtool operation for getting permanent hardware address.
> 
> Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
> ---
> This moves and renames the basically generic e1000_get_perm_addr
> routine to ethtool_op_get_perm_addr, and causes e1000 to make use of
> the new name.
> 
>  drivers/net/e1000/e1000_ethtool.c |9 +
>  include/linux/ethtool.h   |1 +
>  net/core/ethtool.c|7 +++
>  3 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/e1000/e1000_ethtool.c
> b/drivers/net/e1000/e1000_ethtool.c
> --- a/drivers/net/e1000/e1000_ethtool.c
> +++ b/drivers/net/e1000/e1000_ethtool.c
> @@ -1704,13 +1704,6 @@ e1000_get_strings(struct net_device *net
>   }
>  }
> 
> -static int
> -e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr
*eaddr)
> -{
> - memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN);
> - return 0;
> -}
> -
>  struct ethtool_ops e1000_ethtool_ops = {
>   .get_settings   = e1000_get_settings,
>   .set_settings   = e1000_set_settings,
> @@ -1746,7 +1739,7 @@ struct ethtool_ops e1000_ethtool_ops = {
>   .phys_id= e1000_phys_id,
>   .get_stats_count= e1000_get_stats_count,
>   .get_ethtool_stats  = e1000_get_ethtool_stats,
> - .get_perm_addr  = e1000_get_perm_addr,
> + .get_perm_addr  = ethtool_op_get_perm_addr,
>  };
> 
>  void e1000_set_ethtool_ops(struct net_device *netdev)
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -268,6 +268,7 @@ u32 ethtool_op_get_sg(struct net_device
>  int ethtool_op_set_sg(struct net_device *dev, u32 data);
>  u32 ethtool_op_get_tso(struct net_device *dev);
>  int ethtool_op_set_tso(struct net_device *dev, u32 data);
> +int ethtool_op_get_perm_addr(struct net_device *dev, struct
ethtool_addr
> *);
> 
>  /**
>   * &ethtool_ops - Alter and report network device settings
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -81,6 +81,12 @@ int ethtool_op_set_tso(struct net_device
>   return 0;
>  }
> 
> +int ethtool_op_get_perm_addr(struct net_device *netdev, struct
> ethtool_addr *eaddr)
> +{
> + memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN);
> + return 0;
> +}
> +
>  /* Handlers for each ethtool command */
> 
>  static int ethtool_get_settings(struct net_device *dev, void __user
> *useraddr)
> @@ -845,6 +851,7 @@ int dev_ethtool(struct ifreq *ifr)
> 
>  EXPORT_SYMBOL(dev_ethtool);
>  EXPORT_SYMBOL(ethtool_op_get_link);
> +EXPORT_SYMBOL_GPL(ethtool_op_get_perm_addr);
>  EXPORT_SYMBOL(ethtool_op_get_sg);
>  EXPORT_SYMBOL(ethtool_op_get_tso);
>  EXPORT_SYMBOL(ethtool_op_get_tx_csum);
> --
> John W. Linville
> [EMAIL PROTECTED]


[patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine

2005-07-27 Thread John W. Linville
Add generic ethtool operation for getting permanent hardware address.

Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---
This moves and renames the basically generic e1000_get_perm_addr
routine to ethtool_op_get_perm_addr, and causes e1000 to make use of
the new name.

 drivers/net/e1000/e1000_ethtool.c |9 +
 include/linux/ethtool.h   |1 +
 net/core/ethtool.c|7 +++
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethtool.c 
b/drivers/net/e1000/e1000_ethtool.c
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1704,13 +1704,6 @@ e1000_get_strings(struct net_device *net
}
 }
 
-static int
-e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr *eaddr)
-{  
-   memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN);
-   return 0;
-}
-
 struct ethtool_ops e1000_ethtool_ops = {
.get_settings   = e1000_get_settings,
.set_settings   = e1000_set_settings,
@@ -1746,7 +1739,7 @@ struct ethtool_ops e1000_ethtool_ops = {
.phys_id= e1000_phys_id,
.get_stats_count= e1000_get_stats_count,
.get_ethtool_stats  = e1000_get_ethtool_stats,
-   .get_perm_addr  = e1000_get_perm_addr,
+   .get_perm_addr  = ethtool_op_get_perm_addr,
 };
 
 void e1000_set_ethtool_ops(struct net_device *netdev)
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -268,6 +268,7 @@ u32 ethtool_op_get_sg(struct net_device 
 int ethtool_op_set_sg(struct net_device *dev, u32 data);
 u32 ethtool_op_get_tso(struct net_device *dev);
 int ethtool_op_set_tso(struct net_device *dev, u32 data);
+int ethtool_op_get_perm_addr(struct net_device *dev, struct ethtool_addr *);
 
 /**
  * &ethtool_ops - Alter and report network device settings
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -81,6 +81,12 @@ int ethtool_op_set_tso(struct net_device
return 0;
 }
 
+int ethtool_op_get_perm_addr(struct net_device *netdev, struct ethtool_addr 
*eaddr)
+{  
+   memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN);
+   return 0;
+}
+
 /* Handlers for each ethtool command */
 
 static int ethtool_get_settings(struct net_device *dev, void __user *useraddr)
@@ -845,6 +851,7 @@ int dev_ethtool(struct ifreq *ifr)
 
 EXPORT_SYMBOL(dev_ethtool);
 EXPORT_SYMBOL(ethtool_op_get_link);
+EXPORT_SYMBOL_GPL(ethtool_op_get_perm_addr);
 EXPORT_SYMBOL(ethtool_op_get_sg);
 EXPORT_SYMBOL(ethtool_op_get_tso);
 EXPORT_SYMBOL(ethtool_op_get_tx_csum);
-- 
John W. Linville
[EMAIL PROTECTED]


Re: [Patch 2.6.12.2 1/2]net&ethtool: Add support for getting the permanent hardware address

2005-07-27 Thread John W. Linville
Build fixup for busted ethtool perm_addr support patch.

Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---
The hunk below is busted...

On Tue, Jul 26, 2005 at 09:32:38AM -0500, Jon Wetzel wrote:
> @@ -683,6 +683,22 @@
>   return ret;
>  }
>  
> +static int ethtool_get_perm_addr(struct net_device *dev, void __user 
> *useraddr)
> +{
> + struct ethtool_addr addr = { ETHTOOL_GPERMADDR };
> + struct ethtool_ops *ops = dev->ethtool_ops;
> + 
> + if (!ops->get_perm_addr){
> + return -EOPNOTSUPP;
> +
> + ops->get_perm_addr(dev, &addr); 
> +
> + if (copy_to_user(useraddr, &addr, sizeof(addr)))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
>  /* The main entry point in this file.  Called from net/core/dev.c */
>  
>  int dev_ethtool(struct ifreq *ifr)

Patch follows...

 net/core/ethtool.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -694,7 +694,7 @@ static int ethtool_get_perm_addr(struct 
struct ethtool_addr addr = { ETHTOOL_GPERMADDR };
struct ethtool_ops *ops = dev->ethtool_ops;

-   if (!ops->get_perm_addr){
+   if (!ops->get_perm_addr)
return -EOPNOTSUPP;
 
ops->get_perm_addr(dev, &addr); 
-- 
John W. Linville
[EMAIL PROTECTED]


Re: [Patch 2.6.12.2 2/2]e1000: Add support for getting a permanent hardware address

2005-07-27 Thread John W. Linville
On Tue, Jul 26, 2005 at 09:34:10AM -0500, Jon Wetzel wrote:
> This patch gives the e1000 driver the ability to retrieve the permanent
> hardware address of its device, via the framework established in part 1
> of this patch series.  This patch fills in the new perm_addr field on 
> probing, and implements the get_perm_addr ethtool. 

> @@ -1663,6 +1663,13 @@
>   }
>  }
>  
> +static int
> +e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr *eaddr)
> +{
> + memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN);
> + return 0;
> +}
> +
>  struct ethtool_ops e1000_ethtool_ops = {
>   .get_settings   = e1000_get_settings,
>   .set_settings   = e1000_set_settings,

This seems pretty generic, especially since you have added
perm_addr to the net_device structure.  How about if we reform
it as ethtool_op_get_perm_addr, so that all drivers can use it?
Patch to follow...

John

P.S.  Would a driver ever need to implement its own version of this
function?  Since perm_addr is in the net_device structure, is there
a cleaner way to do this?  Just thinking out-loud...
-- 
John W. Linville
[EMAIL PROTECTED]


Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.

2005-07-27 Thread Kazunori Miyazawa

Herbert Xu wrote:
> On Wed, Jul 27, 2005 at 03:18:39PM -0700, David S. Miller wrote:
>
>> One idea tossed around between Herbert Xu (also CC:'d) and myself is
>> to store a generation counter when we attach a route to a socket, then
>> sk_dst_check() can verify that this generation count matches the
>> current IPSEC flow cache generation count.
>
> Yes we did talk about having generation IDs for IPsec dst entries.
> However, it doesn't help us when IPsec SAs change.  The flow cache
> generation ID is only incremented for policy changes, not state
> changes.
>
> This particular bug report relates to the case where SAs are
> renegotiated but the policy remains unchanged.
>
> IMHO this is something that user space can and should deal with.
> All the KM has to do is to delete the old outbound SA when the
> new outbound SA has been negotiated.  This will cause all new
> traffic to start using the new SA immediately.  It will also
> allow the remote side to continue using the old SA until it
> expires since we're not removing the existing inbound SA.

The key management protocols have their own timing and procedure for
deleting old SAs during rekeying. We may not be able to delete the
outbound SA for that reason. But we can delete the old SA(s) when
implementing the delete procedure for IKEv2 and KINK.
IKEv1 does not define rekeying. We can probably delete the old outbound
SA as described above.

> We could do this in the kernel.  However, it'll end up being
> harder since the kernel doesn't really know which old SA(s)
> the new SA is meant to replace.



I'm anxious about MIPL 2.0 because it is implemented on the xfrm
architecture.

--
Kazunori Miyazawa



Re: [2.6 patch] NETCONSOLE must depend on INET

2005-07-27 Thread David S. Miller
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Tue, 26 Jul 2005 19:36:37 -0700

> # HG changeset patch
> # User [EMAIL PROTECTED]
> # Node ID 6cdd6f36d53678a016cfbf5ce667cbd91504d538
> # Parent  75716ae25f9d87ee2a5ef7c4df2d8f86e0f3f762
> Move in_aton from net/ipv4/utils.c to net/core/utils.c

This patch doesn't apply, in the current 2.6.x GIT tree
NETCONSOLE does not depend on NETDEVICES.

Please fix up this patch so that I can apply it.
Thanks.


Re: 2.6.10 Kernel Goes Crazy After Resetting MTU

2005-07-27 Thread Jesse Brandeburg
On 7/27/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I sent the following posting to linux-kernel at kernel.org. One reply
> suggested I send it to [EMAIL PROTECTED] (mailto:netdev@vger.kernel.org) , so
> I am doing  that.
> 
> Below are some rather extensive logs showing the Linux 2.6.10 kernel
> (kernel.org source) going into a tail spin after an "administration  program" 
> on my
> server tried to reset the mtu on all four of Ethernet Ports. I  believe only
> eth1 was changed -- from 9000 to 1500. But the admin program always  resets 
> all
> ports (even if the settings don't change).
> 
> In about 1/2 a  second, we generated this tremendous list of errors. Then
> everything went back  to normal (except that all of my users on all subnets 
> had
> their Windows  workstations freeze when it occurred)
> 

This is most likely due to the well known problem (we're working a
fix) where the use of jumbo frames appears to cause extreme memory
pressure due to looking for contiguous 32k pieces of memory for each
descriptor.  I don't know if this is indicative of some other problem
in the memory manager not defragmenting often enough, but we're
getting a lot of reports recently of problems like this with jumbos.

We are working on a patch to change how the driver works in order to
avoid this kind of scenario.  Its not quite complete yet however.

Jesse


Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.

2005-07-27 Thread Herbert Xu
On Wed, Jul 27, 2005 at 03:18:39PM -0700, David S. Miller wrote:
> 
> One idea tossed around between Herbert Xu (also CC:'d) and myself is
> to store a generation counter when we attach a route to a socket, then
> sk_dst_check() can verify that this generation count matches the
> current IPSEC flow cache generation count.

Yes we did talk about having generation IDs for IPsec dst entries.
However, it doesn't help us when IPsec SAs change.  The flow cache
generation ID is only incremented for policy changes, not state
changes.

This particular bug report relates to the case where SAs are
renegotiated but the policy remains unchanged.

IMHO this is something that user space can and should deal with.
All the KM has to do is to delete the old outbound SA when the
new outbound SA has been negotiated.  This will cause all new
traffic to start using the new SA immediately.  It will also
allow the remote side to continue using the old SA until it
expires since we're not removing the existing inbound SA.

We could do this in the kernel.  However, it'll end up being
harder since the kernel doesn't really know which old SA(s)
the new SA is meant to replace.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 2.6.13rc3] IPv6: Check interface bindings on IPv6 raw socket reception

2005-07-27 Thread David S. Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Sun, 24 Jul 2005 07:39:12 +0200

> [IPV4/6]: Check if packet was actually delivered to a raw socket to decide 
> whether to send an ICMP unreachable
> 
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied, thanks Patrick.


Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)

2005-07-27 Thread Randy Dunlap

> On Jul 27, 2005, at 13:08, Randy Dunlap wrote:
> 
> >
> >
> >> On Jul 25, 2005, at 16:06, Francois Romieu wrote:
> >>
> >>
> >>
>  +int mdiobus_register(struct mii_bus *bus)
>  +{
>  +int i;
>  +int err = 0;
>  +
>  +spin_lock_init(&bus->mdio_lock);
>  +
>  +if (NULL == bus || NULL == bus->name ||
>  +NULL == bus->read ||
>  +NULL == bus->write)
> 
> 
> >>>
> >>> Be spartan:
> >>> if (!bus || !bus->name || !bus->read || !bus->write)
> >>>
> >>
> >>
> >> I think we have to agree to disagree here.  I could be convinced, but
> >> I'm partial to using NULL explicitly.
> >>
> >
> > But there are 2 issues here (at least).  One is to use NULL or
> > not.  The other is using (constant == var) or (var == constant).
> >
> > It's not described in CodingStyle afaik, but most recent email
> > on the subject strongly prefers (var == constant) [in my
> > unscientific survey -- of bits in my head].
> >
> > So using the suggested style will fix both of these.  :)
> 
> 
> Ok, here I won't agree to disagree with you.  !foo as a check for  
> NULL is a reasonable idea, but not my style.  If that's the preferred  
> style for the kernel, I will do that.
> 
> But (var == constant) is a style that asks for errors.  By putting  
> the constant first in these checks, you never run the risk of leaving  
> a bug like this:
> 
> if (dev = NULL)
>  ...
> 
> This kind of error is quite frustrating to detect, and the eye will  
> often miss it when scanning for errors.  If you follow constant ==  
> var, though, then the bug looks like this:
> 
> if (NULL = dev)
> 
> which is instantly caught by the compiler.
> 
> Just my 32 cents

Yes, we know about that argument.  :)

>  +/* Otherwise, we allocate the device, and initialize the
>  + * default values */
>  +dev = kmalloc(sizeof(*dev), GFP_KERNEL);
>  +
>  +if (NULL == dev) {
>  +errno = -ENOMEM;
>  +return NULL;
>  +}
>  +
>  +memset(dev, 0, sizeof(*dev));
> 
> 
> >>>
> >>> The kernel provides kcalloc.
> >>>
> >>
> >>
> >> I went looking for it, and found it in fs/cifs/misc.c.  I'm hesitant
> >> to link to a function defined in the filesystem code just to save 1
> >> line of code
> >>
> >
> > It's more global than that.
> 
> 
> Should we move the function, then, to include/linux/slab.h?  Or  
> somewhere else?

It's there, like Francois said.  Get/use a current tree.

---
~Randy


Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.

2005-07-27 Thread David S. Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 14:38:35 -0700

>Summary: IPSec incompabilty. Linux kernel waits to long to start
> using new SA for outbound traffic.

I think this is the known bug where we don't notice that a route
attached to a socket is obsolete.  It was first pointed out to
me last year by Kazunori Miyazawa, CC:'d here.

The problem is that, when we update IPSEC rules, sockets currently
don't have a way to discover that.

Traditionally, the route "obsolete" flag served this purpose, and that
does work properly for normal route entries.  But for IPSEC, we don't
have a way to find all of the stacked routes we created that match a
particular SA, and thus get them fixed up the next time a socket tries
to send a packet.

One idea tossed around between Herbert Xu (also CC:'d) and myself is
to store a generation counter when we attach a route to a socket, then
sk_dst_check() can verify that this generation count matches the
current IPSEC flow cache generation count.

Something like the following, untested patch, demonstrates the
idea.

[NET]: Tie obsolete state of routes also to flow cache generation count.

This fixes the problem wherein IPSEC SA changes do not get noticed
by cached socket routes.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * This structure really needs to be cleaned up.
@@ -193,6 +194,7 @@ struct sock {
socket_lock_t   sk_lock;
wait_queue_head_t   *sk_sleep;
struct dst_entry*sk_dst_cache;
+   unsigned intsk_dst_cache_genid;
struct xfrm_policy  *sk_policy[2];
rwlock_tsk_dst_lock;
atomic_tsk_rmem_alloc;
@@ -924,6 +926,9 @@ __sk_dst_set(struct sock *sk, struct dst
 
old_dst = sk->sk_dst_cache;
sk->sk_dst_cache = dst;
+#ifdef CONFIG_XFRM
+   sk->sk_dst_cache_genid = atomic_read(&flow_cache_genid);
+#endif
dst_release(old_dst);
 }
 
@@ -958,7 +963,9 @@ __sk_dst_check(struct sock *sk, u32 cook
 {
struct dst_entry *dst = sk->sk_dst_cache;
 
-   if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
+   if (dst &&
+   ((dst->obsolete && dst->ops->check(dst, cookie) == NULL) ||
+(sk->sk_dst_cache_genid != atomic_read(&flow_cache_genid)))) {
sk->sk_dst_cache = NULL;
dst_release(dst);
return NULL;
@@ -972,7 +979,9 @@ sk_dst_check(struct sock *sk, u32 cookie
 {
struct dst_entry *dst = sk_dst_get(sk);
 
-   if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
+   if (dst &&
+   ((dst->obsolete && dst->ops->check(dst, cookie) == NULL) ||
+(sk->sk_dst_cache_genid != atomic_read(&flow_cache_genid)))) {
sk_dst_reset(sk);
dst_release(dst);
return NULL;


Fw: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.

2005-07-27 Thread Andrew Morton


Begin forwarded message:

Date: Wed, 27 Jul 2005 14:31:20 -0700
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to 
long to start using new SA for outbound traffic.


http://bugzilla.kernel.org/show_bug.cgi?id=4952

   Summary: IPSec incompabilty. Linux kernel waits to long to start
using new SA for outbound traffic.
Kernel Version: 2.6.12.3
Status: NEW
  Severity: high
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]


Problem Description:

Linux kernel waits too long to start using new SA for outbound traffic. This is
wrong, and creates real problems when peer stops supporting old SA faster than
it should.

Steps to reproduce:

I started pinging over IPSec tunnel. Racoon created the pair of IPsec-SA:

Jul 27 21:18:06 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel
YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=209244
158(0xc78cffe)
Jul 27 21:18:06 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel
XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=282443
5949(0xa85978ed)

Everything worked fine:

2005-07-27 21:18:24.451903 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0x1), length 116
2005-07-27 21:18:24.486625 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x0c78cffe,seq=0x1), length 116
2005-07-27 21:18:25.453360 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0x2), length 116
2005-07-27 21:18:25.482251 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x0c78cffe,seq=0x2), length 116

After ~2880s keys expired:

Jul 27 22:06:06 gw1 racoon: INFO: IPsec-SA expired: ESP/Tunnel
YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=209244158(
0xc78cffe)
Jul 27 22:06:06 gw1 racoon: INFO: initiate new phase 2 negotiation:
XXX.XX.XX.XXX[0]<=>YYY.YY.YYY.YYY[0]
Jul 27 22:06:06 gw1 racoon: INFO: IPsec-SA expired: ESP/Tunnel
XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=2824435949
(0xa85978ed)

Racoon negotiated new SA:

Jul 27 22:06:10 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel
YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=150510
678(0x8f89c56)
Jul 27 22:06:10 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel
XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=360860
9595(0xd717033b)

Linux kernel was still using old SA (spi=0xa85978ed), peer switched into new SA
(spi=0x08f89c56)

2005-07-27 22:06:10.634929 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb2c), length 116
2005-07-27 22:06:10.987012 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x0c78cffe,seq=0xb28), length 116
2005-07-27 22:06:10.992134 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x0c78cffe,seq=0xb29), length 116
2005-07-27 22:06:10.997382 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x0c78cffe,seq=0xb2a), length 116
2005-07-27 22:06:11.636814 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb2d), length 116
2005-07-27 22:06:11.665220 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x0c78cffe,seq=0xb2b), length 116
2005-07-27 22:06:12.638681 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb2e), length 116
2005-07-27 22:06:12.666848 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x08f89c56,seq=0x1), length 116
2005-07-27 22:06:13.640549 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb2f), length 116
2005-07-27 22:06:13.673727 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x08f89c56,seq=0x2), length 116
2005-07-27 22:06:14.642430 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb30), length 116
2005-07-27 22:06:14.670360 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x08f89c56,seq=0x3), length 116
2005-07-27 22:06:15.643304 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb31), length 116
2005-07-27 22:06:15.670616 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x08f89c56,seq=0x4), length 116

The IPsec peer initiated a new key renegotiation:
Jul 27 22:06:52 gw1 racoon: INFO: respond new phase 1 negotiation:
XXX.XX.XX.XXX[500]<=>YYY.YY.YYY.YYY[500]
Jul 27 22:06:52 gw1 racoon: INFO: begin Identity Protection mode.
Jul 27 22:06:58 gw1 racoon: INFO: ISAKMP-SA established
XXX.XX.XX.XXX[500]-YYY.YY.YYY.YYY[500] spi:aa103d22d3e480b3:1b470506cfc95a33
Jul 27 22:07:02 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel
YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=3835294(0x3a859e)
Jul 27 22:07:02 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel
XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=2997278258(0xb2a6d632)

Linux was still using the old SA (spi=0xa85978ed), but the peer had stopped accepting this SA:

2005-07-27 22:07:03.691323 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb61), length 116
2005-07-27 22:07:03.718660 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX:
ESP(spi=0x08f89c56,seq=0x34), length 116
2005-07-27 22:07:04.692194 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb62), length 116
2005-07-27 22:07:05.692064 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb63), length 116
2005-07-27 22:07:06.691950 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY:
ESP(spi=0xa85978ed,seq=0xb64), length 116
2005-07-

Re: [PATCH 2.6.13rc3] IPv6: Check interface bindings on IPv6 raw socket reception

2005-07-27 Thread David S. Miller
From: Andrew McDonald <[EMAIL PROTECTED]>
Date: Sat, 23 Jul 2005 19:04:43 +0100

> Take account of whether a socket is bound to a particular device when
> selecting an IPv6 raw socket to receive a packet. Also perform this
> check when receiving IPv6 packets with router alert options.
> 
> Signed-off-by: Andrew McDonald <[EMAIL PROTECTED]>

Applied, thanks Andrew.


Re: [PATCH] setsockopt locking fix

2005-07-27 Thread David S. Miller
From: Kyle Moffett <[EMAIL PROTECTED]>
Subject: Re: [PATCH] setsockopt locking fix
Date: Wed, 27 Jul 2005 16:48:21 -0400

> On Jul 27, 2005, at 16:16:02, David S. Miller wrote:
> > Fix is correct, good thing it only hits Sparc :-)
> >
> > But your patch does not apply cleanly (perhaps your
> > email client mangled it somehow) and I need to have
> > a "Signed-off-by: " line in order to apply the
> > patch.
> 
> Fixed.  Attached is below.  Patch is against a _very_
> recent linus GIT repository, and after mailing it to
> myself, it still applies to a fresh repository here,
> so I'm assuming it's ok:

Applied, thanks Kyle.


Re: 2.6.13-rc3-mm2

2005-07-27 Thread David S. Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 14:11:51 -0700

> Unbalanced netlink_table_ungrab() in the netlink stuff in git-net.patch.

Applied to net-2.6.14, thanks Andrew.


Re: [2.6 patch] NETCONSOLE must depend on INET

2005-07-27 Thread Matt Mackall
On Wed, Jul 27, 2005 at 01:19:00PM -0700, David S. Miller wrote:
> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Tue, 26 Jul 2005 19:36:37 -0700
> 
> > # HG changeset patch
> > # User [EMAIL PROTECTED]
> > # Node ID 6cdd6f36d53678a016cfbf5ce667cbd91504d538
> > # Parent  75716ae25f9d87ee2a5ef7c4df2d8f86e0f3f762
> > Move in_aton from net/ipv4/utils.c to net/core/utils.c
> 
> This patch doesn't apply, in the current 2.6.x GIT tree
> NETCONSOLE does not depend on NETDEVICES.

Odd, gitweb of Linus' tree seems to disagree. I see it depends on
NETDEVICES && INET && EXPERIMENTAL. NETDEVICES has been there since
the beginning of git history and according to my Mercurial import from
BKCVS, it's been dependent on NETDEVICES since I first submitted it.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [2.6 patch] NETCONSOLE must depend on INET

2005-07-27 Thread David S. Miller
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 13:46:22 -0700

> Odd, gitweb of Linus' tree seems to disagree. I see it depends on
> NETDEVICES && INET && EXPERIMENTAL. NETDEVICES has been there since
> the beginning of git history and according to my Mercurial import from
> BKCVS, it's been dependent on NETDEVICES since I first submitted it.

Sorry, that's a result of a local change I just added
to fix up the presentation of the net device family Kconfigs.


Re: 2.6.13-rc3-mm2

2005-07-27 Thread Andrew Morton
Andrew James Wade <[EMAIL PROTECTED]> wrote:
>
> Hello, my kernel crashes on boot with the following BUG():

Indeed it will.

> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 pin1=2 pin2=-1
> softlockup thread 0 started up.
> NET: Registered protocol family 16
> [ cut here ]
> kernel BUG at kernel/sched.c:2888!
> invalid operand:  [#1]
> PREEMPT
> last sysfs file:
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010202   (2.6.13-rc3-mm2)
> EIP is at sub_preempt_count+0x35/0x40
> eax: dff8   ebx:    ecx: 0001   edx: 0001
> esi: dffc3d18   edi:    ebp: dff81f50   esp: dff81f50
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 1, threadinfo=dff8 task=c14d9a10)
> Stack:  c038a5fe   0003 c048f5e0 c048f780 c048f780
> dff8d544 c038bcaa   c0386d30 dffc3d18 000f
>000f dff8d544  c04f2bf3 0021 0021 c04f2e8d 
> Call Trace:
>  [] netlink_create+0x5e/0x120
>  [] netlink_kernel_create+0x13a/0x240
>  [] rtnetlink_rcv+0x0/0x390
>  [] rtnetlink_init+0x53/0xa0
>  [] netlink_proto_init+0x18d/0x200
>  [] do_initcalls+0x2b/0xc0
>  [] kern_mount+0x15/0x19
>  [] init+0x0/0x110
>  [] init+0x2f/0x110
>  [] kernel_thread_helper+0x0/0x18
>  [] kernel_thread_helper+0x5/0x18
> Code: 89 e5 3b 50 14 7f 24 81 fa fe 00 00 00 76 0c b8 00 e0 ff ff 21 e0 29 50 
> 14 c9 c3 80 78 14 00 75 ee 0f 0b 4c 0b 66 50 41 c0 eb e4
> <0f> 0b 48 0b 66 50 41 c0 eb d2 90 55 8b 40 04 89 e5 c9 e9 54 f5
>  <0>Kernel panic - not syncing: Attempted to kill init!
> 

Unbalanced netlink_table_ungrab() in the netlink stuff in git-net.patch.

--- devel/net/netlink/af_netlink.c~netlink-locking-fix  2005-07-27 14:10:07.0 -0700
+++ devel-akpm/net/netlink/af_netlink.c 2005-07-27 14:10:16.0 -0700
@@ -349,12 +349,12 @@ static int netlink_create(struct socket 
 
netlink_table_grab();
if (!nl_table[protocol].hash.entries) {
-   netlink_table_ungrab();
 #ifdef CONFIG_KMOD
/* We do 'best effort'.  If we find a matching module,
 * it is loaded.  If not, we don't return an error to
 * allow pure userspace<->userspace communication. -HW
 */
+   netlink_table_ungrab();
request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol);
netlink_table_grab();
 #endif
_
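
For readers following the hunk above, here is a minimal user-space sketch of why the old ordering was unbalanced when CONFIG_KMOD is off. Everything named demo_* and the table_depth counter are stand-ins invented for this illustration, not kernel code. It also assumes the function releases the table once more further down, which the hunk itself does not show.

```c
#include <assert.h>

/* Hypothetical stand-in for the netlink table lock: counts grab/ungrab. */
static int table_depth;

static void demo_table_grab(void)   { table_depth++; }
static void demo_table_ungrab(void) { table_depth--; }

/*
 * Pre-patch flow. have_kmod plays the role of CONFIG_KMOD and entries
 * the role of nl_table[protocol].hash.entries. Returns the final depth.
 */
static int demo_create_broken(int have_kmod, int entries)
{
	table_depth = 0;
	demo_table_grab();
	if (!entries) {
		demo_table_ungrab();		/* always dropped ... */
		if (have_kmod)
			demo_table_grab();	/* ... but only sometimes retaken */
	}
	demo_table_ungrab();			/* assumed later release */
	return table_depth;
}

/* Patched flow: the drop and the retake sit under the same condition. */
static int demo_create_fixed(int have_kmod, int entries)
{
	table_depth = 0;
	demo_table_grab();
	if (!entries && have_kmod) {
		demo_table_ungrab();
		/* request_module(...) would run here, outside the lock */
		demo_table_grab();
	}
	demo_table_ungrab();			/* assumed later release */
	assert(table_depth == 0);
	return table_depth;
}
```

With have_kmod == 0 and an empty table, the broken variant ends one release too many, which is consistent with the preempt count underflow in the oops; the fixed variant ends balanced in every combination.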



Re: [PATCH] setsockopt locking fix

2005-07-27 Thread Kyle Moffett

On Jul 27, 2005, at 16:16:02, David S. Miller wrote:
> Fix is correct, good thing it only hits Sparc :-)
>
> But your patch does not apply cleanly (perhaps your
> email client mangled it somehow) and I need to have
> a "Signed-off-by: " line in order to apply the
> patch.


Fixed.  Attached is below.  Patch is against a _very_
recent linus GIT repository, and after mailing it to
myself, it still applies to a fresh repository here,
so I'm assuming it's ok:



setsockopt-locking-fix.patch
Description: Binary data


Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw
knives at people who weren't supposed to be in your machine room.
  -- Anthony de Boer




Re: [PATCH] setsockopt locking fix

2005-07-27 Thread David S. Miller
From: Kyle Moffett <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 13:47:30 -0400

> # HG changeset patch
> # User Kyle Moffett <[EMAIL PROTECTED]>
> # Node ID 77475acbe89242e63e6fd73dc66fe52643011ed7
> # Parent  43cd2abd0f4c5d2e8ee4666d6bf1f0b96e252e54
> Fix a bug where sock_reset_flag() was called without lock_sock()

Fix is correct, good thing it only hits Sparc :-)

But your patch does not apply cleanly (perhaps your
email client mangled it somehow) and I need to have
a "Signed-off-by: " line in order to apply the
patch.

So please fix this up, thanks.


[-mm patch] include/net/ieee80211.h must #include

2005-07-27 Thread Adrian Bunk
gcc found an (although perhaps harmless) bug:

<--  snip  -->

...
  CC  net/ieee80211/ieee80211_crypt.o
In file included from net/ieee80211/ieee80211_crypt.c:21:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
  CC  net/ieee80211/ieee80211_crypt_wep.o
In file included from net/ieee80211/ieee80211_crypt_wep.c:20:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
  CC  net/ieee80211/ieee80211_crypt_ccmp.o
  CC  net/ieee80211/ieee80211_crypt_tkip.o
In file included from net/ieee80211/ieee80211_crypt_tkip.c:23:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
...

<--  snip  -->


Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

This patch was already sent on:
- 22 Jul 2005

--- linux-2.6.13-rc3-mm1-full/include/net/ieee80211.h.old   2005-07-22 18:37:57.0 +0200
+++ linux-2.6.13-rc3-mm1-full/include/net/ieee80211.h   2005-07-22 18:38:10.0 +0200
@@ -22,6 +22,7 @@
 #define IEEE80211_H
 #include  /* ETH_ALEN */
 #include/* ARRAY_SIZE */
+#include 
 
 #if WIRELESS_EXT < 17
 #define IW_QUAL_QUAL_INVALID   0x10



Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)

2005-07-27 Thread Francois Romieu
Andy Fleming <[EMAIL PROTECTED]> :
[kcalloc]
> Should we move the function, then, to include/linux/slab.h?  Or  
> somewhere else?

It is already in mm/slab.c
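
A minimal user-space sketch of the point, for illustration only: calloc() stands in here for the kernel's kcalloc(), and struct demo_phy_device is a trimmed-down, hypothetical stand-in for the patch's struct phy_device. In kernel code the one-step form would be kcalloc(1, sizeof(*dev), GFP_KERNEL).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the patch's struct phy_device. */
struct demo_phy_device {
	int speed;
	int duplex;
	int link;
};

/* Two-step form, as in the patch: allocate, then clear. */
static struct demo_phy_device *demo_alloc_two_step(void)
{
	struct demo_phy_device *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	memset(dev, 0, sizeof(*dev));
	return dev;
}

/* One-step form: the allocator hands back zeroed memory directly. */
static struct demo_phy_device *demo_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct demo_phy_device));
}
```

Both helpers return an all-zero object or NULL, so the caller's error handling does not change; only the separate memset() goes away.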

[rc = request_irq(...)]

It appears in drivers/net/*c. Jeff Garzik used to suggest something
similar but it does not matter as long as you do not need to return
an error status (KERN_ERR is probably a bit too strong then).

[initialization of struct phy_setting settings]

#define NITZ(d,t,s) { .speed = s, .duplex = d, .setting = t }

static struct phy_setting settings[] = {
   NITZ(DUPLEX_FULL, SUPPORTED_10000baseT_Full, 10000),
   NITZ(DUPLEX_FULL, SUPPORTED_1000baseT_Full,  SPEED_1000),
   NITZ(DUPLEX_HALF, SUPPORTED_1000baseT_Half,  SPEED_1000),
   NITZ(DUPLEX_FULL, SUPPORTED_100baseT_Full,   SPEED_100),
   NITZ(DUPLEX_HALF, SUPPORTED_100baseT_Half,   SPEED_100),
   NITZ(DUPLEX_FULL, SUPPORTED_10baseT_Full,    SPEED_10),
   NITZ(DUPLEX_HALF, SUPPORTED_10baseT_Half,    SPEED_10),
};

#undef NITZ

--
Ueimor


Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)

2005-07-27 Thread Andy Fleming


On Jul 27, 2005, at 13:08, Randy Dunlap wrote:

> > On Jul 25, 2005, at 16:06, Francois Romieu wrote:
> >
> > >> +int mdiobus_register(struct mii_bus *bus)
> > >> +{
> > >> +int i;
> > >> +int err = 0;
> > >> +
> > >> +spin_lock_init(&bus->mdio_lock);
> > >> +
> > >> +if (NULL == bus || NULL == bus->name ||
> > >> +NULL == bus->read ||
> > >> +NULL == bus->write)
> > >
> > > Be spartan:
> > > if (!bus || !bus->name || !bus->read || !bus->write)
> >
> > I think we have to agree to disagree here.  I could be convinced, but
> > I'm partial to using NULL explicitly.
>
> But there are 2 issues here (at least).  One is to use NULL or
> not.  The other is using (constant == var) or (var == constant).
>
> It's not described in CodingStyle afaik, but most recent email
> on the subject strongly prefers (var == constant) [in my
> unscientific survey -- of bits in my head].
>
> So using the suggested style will fix both of these.  :)

Ok, here I won't agree to disagree with you.  !foo as a check for  
NULL is a reasonable idea, but not my style.  If that's the preferred  
style for the kernel, I will do that.


But (var == constant) is a style that asks for errors.  By putting  
the constant first in these checks, you never run the risk of leaving  
a bug like this:


if (dev = NULL)
...

This kind of error is quite frustrating to detect, and the eye will  
often miss it when scanning for errors.  If you follow constant ==  
var, though, then the bug looks like this:


if (NULL = dev)

which is instantly caught by the compiler.

Just my 32 cents
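
To make the argument above concrete, a small sketch follows; the demo_* helpers are hypothetical and exist only for this illustration. The constant-first form cannot be mistyped into an assignment, because "NULL = dev" does not compile. The variable-first typo does compile: it overwrites dev and the branch is never taken. (GCC with -Wall also flags the unparenthesized typo through -Wparentheses, so the compiler can help with either style.)

```c
#include <stddef.h>

/* Constant-first check, written as in the patch. Returns 1 if dev is NULL. */
static int demo_is_missing(const void *dev)
{
	if (NULL == dev)	/* the typo "NULL = dev" is a compile error */
		return 1;
	return 0;
}

/*
 * The variable-first typo. The assignment yields NULL, which is false,
 * so this returns 0 for every input and dev is lost. The extra
 * parentheses only silence GCC's -Wparentheses so the sketch builds
 * without warnings.
 */
static int demo_typo_branch_taken(const void *dev)
{
	if ((dev = NULL))
		return 1;
	return 0;
}
```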






> > >> +/* Otherwise, we allocate the device, and initialize the
> > >> + * default values */
> > >> +dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> > >> +
> > >> +if (NULL == dev) {
> > >> +errno = -ENOMEM;
> > >> +return NULL;
> > >> +}
> > >> +
> > >> +memset(dev, 0, sizeof(*dev));
> > >
> > > The kernel provides kcalloc.
> >
> > I went looking for it, and found it in fs/cifs/misc.c.  I'm hesitant
> > to link to a function defined in the filesystem code just to save 1
> > line of code
>
> It's more global than that.



Should we move the function, then, to include/linux/slab.h?  Or  
somewhere else?




Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)

2005-07-27 Thread Randy Dunlap

> On Jul 25, 2005, at 16:06, Francois Romieu wrote:
> 
> 
> >> +int mdiobus_register(struct mii_bus *bus)
> >> +{
> >> +int i;
> >> +int err = 0;
> >> +
> >> +spin_lock_init(&bus->mdio_lock);
> >> +
> >> +if (NULL == bus || NULL == bus->name ||
> >> +NULL == bus->read ||
> >> +NULL == bus->write)
> >>
> >
> > Be spartan:
> > if (!bus || !bus->name || !bus->read || !bus->write)
> 
> 
> I think we have to agree to disagree here.  I could be convinced, but  
> I'm partial to using NULL explicitly.

But there are 2 issues here (at least).  One is to use NULL or
not.  The other is using (constant == var) or (var == constant).

It's not described in CodingStyle afaik, but most recent email
on the subject strongly prefers (var == constant) [in my
unscientific survey -- of bits in my head].

So using the suggested style will fix both of these.  :)

> >> +/* Otherwise, we allocate the device, and initialize the
> >> + * default values */
> >> +dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> >> +
> >> +if (NULL == dev) {
> >> +errno = -ENOMEM;
> >> +return NULL;
> >> +}
> >> +
> >> +memset(dev, 0, sizeof(*dev));
> >>
> >
> > The kernel provides kcalloc.
> 
> 
> I went looking for it, and found it in fs/cifs/misc.c.  I'm hesitant  
> to link to a function defined in the filesystem code just to save 1  
> line of code

It's more global than that.

~Randy


Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)

2005-07-27 Thread Andy Fleming


On Jul 25, 2005, at 16:06, Francois Romieu wrote:

[snip]



>> +config DAVICOM_PHY
>> +bool "Drivers for Davicom PHYs"
>> +depends on PHYLIB
>> +---help---
>> +  Currently supports dm9161e and dm9131


[snip]



Yeah, I resisted splitting the patch up for this reason.  Suffice it  
to say, you have to apply patch #2 to not break everything.   
Splitting the PHY driver code from the PHY layer is just for  
"convenience"




>> +int mdiobus_register(struct mii_bus *bus)
>> +{
>> +int i;
>> +int err = 0;
>> +
>> +spin_lock_init(&bus->mdio_lock);
>> +
>> +if (NULL == bus || NULL == bus->name ||
>> +NULL == bus->read ||
>> +NULL == bus->write)
>
> Be spartan:
> if (!bus || !bus->name || !bus->read || !bus->write)



I think we have to agree to disagree here.  I could be convinced, but  
I'm partial to using NULL explicitly.




>> +
>> +/* Convenience function to print out the current phy status
>> + */
>> +void phy_print_status(struct phy_device *phydev)
>> +{
>> +pr_info("%s: Link is %s", phydev->dev.bus_id,
>> +phydev->link ? "Up" : "Down");
>> +if (phydev->link)
>> +printk(" - %d/%s", phydev->speed,
>
> Missing KERN_SOMETHING in the printk.



Actually, KERN_SOMETHING would muck up the line, and make it look  
like this:


phy0:0: Link is Up<3> - 1000/Full

That's why it's like that.
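
A user-space sketch of that effect, under the assumption that a log level is nothing more than a string prefix such as "<3>" pasted onto the format (DEMO_KERN_ERR and demo_status_line() are made up for this illustration; snprintf() stands in for printk()):

```c
#include <stdio.h>
#include <string.h>

/* Assumption: a log level is a plain string prefix, as in 2.6-era kernels. */
#define DEMO_KERN_ERR "<3>"

/* Builds the status line in two pieces, the way phy_print_status() does. */
static void demo_status_line(char *buf, size_t len, int with_level)
{
	size_t used;

	/* First piece: the text pr_info() emits, minus its own level prefix. */
	snprintf(buf, len, "%s: Link is %s", "phy0:0", "Up");
	used = strlen(buf);

	/* Second piece continues the same line; a level here lands mid-text. */
	snprintf(buf + used, len - used, "%s - %d/%s",
		 with_level ? DEMO_KERN_ERR : "", 1000, "Full");
}
```

Calling it with with_level set reproduces the mangled line quoted above; with the level left out, the continuation reads cleanly. Later kernels added KERN_CONT to mark this kind of continuation explicitly.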



>> +/* A mapping of all SUPPORTED settings to speed/duplex */
>> +static struct phy_setting settings[] = {
>> +{ .speed = 10000, .duplex = DUPLEX_FULL,
>> +.setting = SUPPORTED_10000baseT_Full,
>> +},
>> +{ .speed = SPEED_1000, .duplex = DUPLEX_FULL,
>> +.setting = SUPPORTED_1000baseT_Full,
>> +},
>> +{ .speed = SPEED_1000, .duplex = DUPLEX_HALF,
>> +.setting = SUPPORTED_1000baseT_Half,
>> +},
>> +{ .speed = SPEED_100, .duplex = DUPLEX_FULL,
>> +.setting = SUPPORTED_100baseT_Full,
>> +},
>> +{ .speed = SPEED_100, .duplex = DUPLEX_HALF,
>> +.setting = SUPPORTED_100baseT_Half,
>> +},
>> +{ .speed = SPEED_10, .duplex = DUPLEX_FULL,
>> +.setting = SUPPORTED_10baseT_Full,
>> +},
>> +{ .speed = SPEED_10, .duplex = DUPLEX_HALF,
>> +.setting = SUPPORTED_10baseT_Half,
>> +},
>> +};
>
> Would you veto some macro to initialise this array ?



Depends on the macro.  :)  I'm not keen on writing it, but I would  
support one that:


a) works
b) Isn't uglier than the current solution.  :)



>> +static inline int phy_find_setting(int speed, int duplex)
>> +{
>> +int idx = 0;
>> +
>> +while (idx < MAX_NUM_SETTINGS &&
>> +(settings[idx].speed != speed ||
>> +settings[idx].duplex != duplex))
>> +idx++;
>
> "for" loop in disguise ?



Well, I think it falls into the gray area.  It's searching until  
it finds something, which implies "while" to me.  Really it's more of  
a while...until.  Of course, a for loop could be used, but I often  
worry about using a for loop's iterator variable outside of the  
loop.  I will change to ARRAY_SIZE, though.







>> +
>> +return idx < MAX_NUM_SETTINGS ? idx : MAX_NUM_SETTINGS - 1;
>
> Ok (dunno if "idx % MAX_NUM_SETTINGS" is more idiomatic or not).



That would be completely different.  The current code makes sure  
that, if no valid match was found, the last value in the array is  
returned.  Using % would result in the first value being returned.  I  
was defaulting to the lowest setting.
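
The difference can be seen in a small stand-alone sketch. The table, the DEMO_* constants and demo_find_setting() below are hypothetical stand-ins, not the patch's code; the loop is written as a "for" that returns from inside the body, which sidesteps the concern about reading the iterator after the loop.

```c
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical values; the real duplex constants come from kernel headers. */
enum { DEMO_DUPLEX_HALF, DEMO_DUPLEX_FULL };

struct demo_setting {
	int speed;
	int duplex;
};

/* Ordered fastest first, so the last entry is the lowest setting. */
static const struct demo_setting demo_settings[] = {
	{ 1000, DEMO_DUPLEX_FULL },
	{ 1000, DEMO_DUPLEX_HALF },
	{ 100,  DEMO_DUPLEX_FULL },
	{ 100,  DEMO_DUPLEX_HALF },
	{ 10,   DEMO_DUPLEX_FULL },
	{ 10,   DEMO_DUPLEX_HALF },
};

/* Returns the matching index, or the last index when nothing matches. */
static size_t demo_find_setting(int speed, int duplex)
{
	size_t n = DEMO_ARRAY_SIZE(demo_settings);
	size_t idx;

	for (idx = 0; idx < n; idx++)
		if (demo_settings[idx].speed == speed &&
		    demo_settings[idx].duplex == duplex)
			return idx;
	return n - 1;
}
```

A miss leaves the search index equal to the array size, and "size % size" is 0, i.e. the fastest entry, whereas the explicit fallback above picks the slowest one.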




>> +int phy_start_interrupts(struct phy_device *phydev)
>> +{
>> +int err = 0;
>> +
>> +INIT_WORK(&phydev->phy_queue, phy_change, phydev);
>> +
>> +if (request_irq(phydev->irq, phy_interrupt,
>> +SA_SHIRQ,
>> +"phy_interrupt",
>> +phydev) < 0) {
>
> Please, don't do that :o(
>
> err = request_irq(phydev->irq, phy_interrupt, SA_SHIRQ,
>                   "phy_interrupt", phydev);
> if (err < 0)
> ...



I did a cursory search, and didn't find any other drivers which use  
this method.  Which is the method preferred in Linux?




>> +printk(KERN_ERR "%s: Can't get IRQ %d (PHY)\n",
>> +phydev->bus->name,
>> +phydev->irq);
>> +phydev->irq = PHY_POLL;
>> +return 0;
>
> The description of the function says "Returns 0 on success".



Failing to request the IRQ does not result in failure of the  
function.  It falls back to polling, instead.


However, it can fail if phy_enable_interrupts() fails, which would  
happen if a hardware issue occurred.
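
The intended control flow, as a stand-alone sketch using the "err = request_irq(...)" form suggested above. All demo_* names are stubs invented here, DEMO_PHY_POLL is an assumed sentinel value, and the stub return codes merely mimic -EBUSY and -EIO:

```c
#define DEMO_PHY_POLL (-1)	/* assumed sentinel: "no IRQ, poll instead" */

struct demo_phydev {
	int irq;
};

/* Stub standing in for request_irq(); fails when told to. */
static int demo_request_irq(int irq, int should_fail)
{
	(void)irq;
	return should_fail ? -16 : 0;	/* -16 mimics -EBUSY */
}

/* Stub standing in for phy_enable_interrupts(); fails on "bad hardware". */
static int demo_enable_interrupts(struct demo_phydev *phydev, int hw_broken)
{
	(void)phydev;
	return hw_broken ? -5 : 0;	/* -5 mimics -EIO */
}

static int demo_start_interrupts(struct demo_phydev *phydev,
				 int irq_fails, int hw_broken)
{
	int err = demo_request_irq(phydev->irq, irq_fails);

	if (err < 0) {
		/* Not a failure for the caller: fall back to polling. */
		phydev->irq = DEMO_PHY_POLL;
		return 0;
	}
	/* Only a failure to enable interrupts is reported upwards. */
	return demo_enable_interrupts(phydev, hw_broken);
}
```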




>> +/* Otherwise, we allocate the device, and initialize the
>> + * default values */
>> +dev = kmalloc(sizeof(*dev), GFP_KERNEL);
>> +
>> +if (NULL == dev) {
>> +errno = -ENOMEM;
>> +return NULL;
>> +}
>> +
>> +memset(dev, 0, sizeof(*dev));
>
> The kernel provides kcalloc.



I went looking for it, and found it in fs/cifs/misc.c.  I'm hesitant  
to link to a function defined in the filesystem code just to save 1  
line of code



I agree with all the other suggestions, and will implement them.

Re: more complex processing in ing_filter ?

2005-07-27 Thread Stephen Hemminger
On Wed, 27 Jul 2005 10:06:45 +0200
Lucas Nussbaum <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I'm interested in doing more complex stuff on inbound packets than
> what is currently possible with ing_filter (I understand ingress
> doesn't allow child classes , and can only drop/pass packets, not
> store one to send it later).
> 
> While this is understandable because it would conflict with the
> benefits of NAPI by queueing and dropping packets much later, it
> prevents me from using Linux instead of FreeBSD's Dummynet (I'm
> working on network emulation-related stuff).
>
Why not just fix netem to work on imq?


QoS for web traffic

2005-07-27 Thread Anand SVR
Hi,

I am one of the network admins of a campus with a large community, and I
try to keep the academic community happy in spite of a congested access
link. My professor keeps asking me why I can't give higher bandwidth to
web access for educational and other academic-related sites. I wonder if
I can do so using a Linux box placed between the access link and the
campus LAN.

The squid delay pools option has been tried but it doesn't serve the
purpose because squid doesn't have intimate knowledge of the bandwidth
availability of the access link, which is dynamic.

As a starting point, I would like to define classification rules such
that web access to *.edu OR *.net OR *.org is put under one bandwidth
chunk, and public mail sites such as *.yahoo.com OR gmail.com OR
*hotmail.com under a different chunk. The rest goes to a default chunk,
and so on. If any one category is not using its bandwidth share, the
others should be able to borrow the bandwidth. Of course smtp and other
kinds of traffic will be given their own quota.

Can I do this kind of classification, and the subsequent bandwidth
allocation, based on text wildcards combined with logical operators as
above, using any of the existing options available under Linux? Am I
asking for the moon? :)

Thanks for your patience.

Anand


[PATCH] setsockopt locking fix

2005-07-27 Thread Kyle Moffett
# HG changeset patch
# User Kyle Moffett <[EMAIL PROTECTED]>
# Node ID 77475acbe89242e63e6fd73dc66fe52643011ed7
# Parent  43cd2abd0f4c5d2e8ee4666d6bf1f0b96e252e54
Fix a bug where sock_reset_flag() was called without lock_sock()

diff -r 43cd2abd0f4c -r 77475acbe892 net/core/sock.c
--- a/net/core/sock.c   Wed Jul 27 04:02:15 2005
+++ b/net/core/sock.c   Wed Jul 27 17:38:27 2005
@@ -206,13 +206,14 @@
 */
 
 #ifdef SO_DONTLINGER   /* Compatibility item... */
-   switch (optname) {
-   case SO_DONTLINGER:
-   sock_reset_flag(sk, SOCK_LINGER);
-   return 0;
-   }
-#endif 
-   
+   if (optname == SO_DONTLINGER) {
+   lock_sock(sk);
+   sock_reset_flag(sk, SOCK_LINGER);
+   release_sock(sk);
+   return 0;
+   }
+#endif
+   
if(optlen
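
For illustration, the shape of the patched branch as a user-space sketch. The demo_* names, the flag bit position and the "owned" field are all invented here. The ownership check inside demo_reset_flag() is purely a teaching device to make a missing lock visible; the kernel's sock_reset_flag() is not claimed to perform such a check.

```c
#define DEMO_SOCK_LINGER (1UL << 3)	/* hypothetical bit position */

struct demo_sock {
	unsigned long flags;
	int owned;	/* 1 while "locked"; stands in for the socket lock */
};

static void demo_lock_sock(struct demo_sock *sk)    { sk->owned = 1; }
static void demo_release_sock(struct demo_sock *sk) { sk->owned = 0; }

/* Clears the flag; returns -1 if called without the lock held. */
static int demo_reset_flag(struct demo_sock *sk, unsigned long flag)
{
	if (!sk->owned)
		return -1;
	sk->flags &= ~flag;	/* read-modify-write, hence the lock */
	return 0;
}

/* Shape of the patched SO_DONTLINGER branch: lock, modify, unlock. */
static int demo_dontlinger(struct demo_sock *sk)
{
	int err;

	demo_lock_sock(sk);
	err = demo_reset_flag(sk, DEMO_SOCK_LINGER);
	demo_release_sock(sk);
	return err;
}
```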

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw
knives at people who weren't supposed to be in your machine room.
  -- Anthony de Boer




Resend: [RFC/PATCH] "safer ipv4 reassembly"

2005-07-27 Thread Arthur Kepner

Resending and requesting comments. (Patch was against 2.6.13-rc1, 
so it's a little stale by now.)

-- Forwarded message --
.

Version 2 of the rfc/patch is attached. It has been changed 
as indicated in the commentary below.

Diffstat:
 include/linux/sysctl.h |1
 net/ipv4/ip_fragment.c |  195 +
 net/ipv4/sysctl_net_ipv4.c |   11 ++

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

On Tue, 28 Jun 2005, Arthur Kepner wrote:

> 
> On Sun, 26 Jun 2005, Herbert Xu wrote:
> 
> > 
> > > +struct ipc {
> > > ..
> > > + struct rcu_head rcu;
> > 
> > Is RCU worth it here? The only time we'd be taking the locks on this
> > is when the first fragment of a packet comes in.  At that point we'll
> > be taking write_lock(&ipfrag_lock) anyway.
> > 

Yes, I think rcu is worth it here. The reason is that to 
not use rcu would necessitate grabbing the (global) 
ipfrag_lock an additional time, when we free an ipc.

Adding an "ipc" to the hashtable could be done under the
ipfrag_lock, as you mention. But removing an ipc shouldn't 
be done at the same time that fragments are destroyed, 
because the common case is that another fragment queue will 
soon be created for the same (src,dst,proto). Better to 
save the ipc for a while to avoid freeing and then 
immediately recreating it. 

Since the freeing of the ipc has to be deferred until 
well after the last associated fragment queue has been 
freed, we can't take advantage of the fact that the 
ipfrag_lock is held when the fragment queue is freed.  

So when finally freeing the ipc, we can either grab the 
global ipfrag_lock again, or use some other, finer-grained 
lock to protect the ipc_hash entries. I'd prefer to avoid 
introducing new uses of global locks.

If we use the finer-grained ipc_hash[].lock locks then 
rcu allows us to avoid taking any locks in ipc_find when we 
create a new fragment chain and there already happens to be 
an ipc for the associated (src,dst,proto). (I suspect this 
would be a fairly common case.)

> > The only other use of RCU in your patch is ip_count.  That should be
> > changed to be done in ip_defrag instead.  At that point you can simply
> > find the ipc by dereferencing ipq, so no need for __ipc_find and hence
> > RCU.
> > 
> > The reason you need to change it in this way is because you can't make
> > assumptions about ip_rcv_finish being the first place where a packet
> > is defragmented.  With connection tracking enabled conntrack is the first
> > place where defragmentation occurs.
> >   
> .

This has been fixed. ip_input.c isn't changed by this version 
of the patch. But there's the caveat that I mentioned earlier:

> 
> There is a (big) advantage to doing this in ip_defrag() - this 
> becomes a no-op for non-fragmented datagrams. The disadvantage 
> is that there could be a situation where you receive:
> 
>   1) first fragment of datagram X [for a particular (src,dst,proto)]
>   2) a zillion non-fragmented datagrams [for the same (src,dst,proto)]
>   3) last fragment of datagram X [for (src,dst,proto)]
> 
> and no "disorder" would be detected for the datagrams associated 
> with (src,dst,proto), even though the ip id could have wrapped in the 
> meantime. This seems like a very uncommon case, however.
> 
> 
> > > +#define IPC_HASHSZ   IPQ_HASHSZ
> > > +static struct {
> > > + struct hlist_head head;
> > > + spinlock_t lock;
> > > +} ipc_hash[IPC_HASHSZ];
> > 
> > I'd store ipc entries in the main ipq hash table since they can use
> > the same keys for lookup as ipq entries.  You just need to set protocol
> > to zero and map the user to values specific to ipc for ipc entries.
> > One mapping would be to set the top bit of user for ipc entries, e.g.
> > 
> > #define IP_DEFRAG_IPC 0x8000
> > ipc->user = ipq->user | IP_DEFRAG_IPC;
> > 
> > Of course you also need to make sure that the two structures share
> > the leading elements.  You can then use the user field to distinguish
> > between ipc/ipq entries.
> 

I thought about this point, but I dislike reusing the same 
structure for such different purposes, so left this unchanged. 

Comments?

--
Arthur

diff -rup linux.orig/include/linux/sysctl.h linux/include/linux/sysctl.h
--- linux.orig/include/linux/sysctl.h   2005-07-06 12:32:03.224546953 -0700
+++ linux/include/linux/sysctl.h        2005-07-07 09:56:42.854845089 -0700
@@ -343,6 +343,7 @@ enum
NET_TCP_BIC_BETA=108,
NET_IPV4_ICMP_ERRORS_USE_INBOUND_IFADDR=109,
NET_TCP_CONG_CONTROL=110,
+   NET_IPV4_REASM_COUNT=111,
 };
 
 enum {
diff -rup linux.orig/net/ipv4/ip_fragment.c linux/net/ipv4/ip_fragment.c
--- linux.orig/net/ipv4/ip_fragment.c   2005-07-06 12:30:51.033380830 -0700
+++ linux/net/ipv4/ip_fragment.c        2005-07-07 09:56:42.856798234 -0700
@@ -56,6 +56,8 @@
 int sysctl_ipfrag_high_thresh = 256*1024;
 int sysctl_ipfrag_low_thresh = 192*1024;
 
+int sysctl_ip_reassembly_count;
+
 /* Important NOTE! Frag

Re: Patch: reduce skb input dev on 64 bit machines

2005-07-27 Thread jamal

So 20 emails or so later ...

On Wed, 2005-27-07 at 08:43 -0700, Ben Greear wrote:
> I don't see a good reason for the feature, or at least nothing that
> justifies the work of trying to implement it.  

You are the one who pointed out the ifindex issue. What a waste of time.

> What benefits do you envision?

Go back and read the thread again. 

cheers,
jamal



Re: Patch: reduce skb input dev on 64 bit machines

2005-07-27 Thread Ben Greear

jamal wrote:
> On Tue, 2005-26-07 at 09:54 -0700, Ben Greear wrote:
> [..]
>
>> You will need to enforce that nothing else gets the index 34 while eth7 is
>> removed.
>> How do you do that?
>
> Thats trivial if you assume there's one management app which most of the
> router vendors implementing it have. It will get tricky when you have 10
> apps fighting to get index 34 for other devices.
> You could of course enforce a reserved set of indices right on bootup
> via a kernel option for example; but that doesnt solve a stupid admin
> with 10 scripts all fighting for index 34 each for a different device.
>
>> If you try to put that into the kernel, you
>> have a big nasty mess, and if you try to make user-space do it, any bogus
>> script can hose your system and potentially screw up your firewall and
>> worse.
>
> Refer to above. It can actually be solved and not in a big mess like you
> say. The question is whether such a feature is needed.


I don't see a good reason for the feature, or at least nothing that
justifies the work of trying to implement it.  What benefits do you
envision?

I believe this discussion originally came about because we can save
4 bytes by storing the ifindex instead of a pointer to the netdevice
(on 64-bit machines).  However, the cost of doing so is a
netdev_get_by_index() and some scheme of making the ifindexes
persistent.  I don't think the saving of 4 bytes is worth either of
these costs, much less both together.
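
The trade-off in miniature, as a user-space sketch. The device table and the demo_* helpers are hypothetical; the real lookup is a kernel function over the kernel's device list, not this array.

```c
#include <stddef.h>

struct demo_netdev {
	int ifindex;
	const char *name;
};

/* Hypothetical device table standing in for the kernel's device list. */
static struct demo_netdev demo_devs[] = {
	{ 1, "lo" }, { 2, "eth0" }, { 34, "eth7" },
};

/* The cost of storing an index: a lookup every time the device is needed. */
static struct demo_netdev *demo_get_by_index(int ifindex)
{
	size_t i;

	for (i = 0; i < sizeof(demo_devs) / sizeof(demo_devs[0]); i++)
		if (demo_devs[i].ifindex == ifindex)
			return &demo_devs[i];
	return NULL;	/* the index no longer maps to a device */
}

/* Bytes saved per packet by holding an int instead of a pointer. */
static size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_netdev *) - sizeof(int);
}
```

On an LP64 machine demo_bytes_saved() is 8 - 4 = 4. Whether a structure actually shrinks by that much also depends on how its neighbouring fields are aligned, and every use of the device pays for a demo_get_by_index()-style search that can also come back empty.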

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com



Re: more complex processing in ing_filter ?

2005-07-27 Thread jamal
On Wed, 2005-27-07 at 10:06 +0200, Lucas Nussbaum wrote:
> Hi,
> 
> I'm interested in doing more complex stuff on inbound packets than what
> is currently possible with ing_filter (I understand ingress doesn't
> allow child classes, and can only drop/pass packets, not store one to
> send it later).
> 

No, that's not true. You can write a tc action that will steal packets
from that path and later reinject them. But that may not be necessary
if you use the patched dummy device since you could redirect packets to
it and run whatever qdisc you want on it. 

> While this is understandable because it would conflict with the benefits
> of NAPI by queueing and dropping packets much later, it prevents me from
> using Linux instead of FreeBSD's Dummynet (I'm working on network
> emulation-related stuff).
> 
> What would be the disadvantages of moving the call to ing_filter earlier
> in netif_receive_skb, allowing queueing in ingress, and re-injecting packets
> inside netif_receive_skb? Does it look doable at least? I'm not sure
> I see all the problems it implies.
> 
> I know there's a solution to my problem using IMQ or dummy, but it
> doesn't look like a very clean solution.
> 

I am not sure why you say it's unclean. If you can give the packets to
dummy and run any qdisc on it such as netem - why would that be a
problem?

cheers,
jamal



Re: Patch: reduce skb input dev on 64 bit machines

2005-07-27 Thread jamal
On Tue, 2005-26-07 at 13:00 -0700, David S. Miller wrote:
> 
> Calling __dev_get_by_index() at every classification check is quite
> silly and potentially expensive, so let's call using ifindex a last
> resort, yet correct, fix.

Just double-checking (I think we are saying the same thing): using
ifindices and requiring refcounting for input_dev means you have to call
__dev_get_by_index() on a per-packet basis.

The question is whether we really care about refcounting; I don't think
we do. The only time it would really matter is when a module or device
is hotplugged out and back in.

cheers,
jamal



Re: Patch: reduce skb input dev on 64 bit machines

2005-07-27 Thread jamal
On Tue, 2005-26-07 at 09:54 -0700, Ben Greear wrote:
[..]
> You will need to enforce that nothing else gets the index 34 while eth7 is
> removed.  
> How do you do that?  

That's trivial if you assume there is a single management app, which most
of the router vendors implementing this have. It will get tricky when you
have 10 apps fighting to get index 34 for other devices.
You could of course enforce a reserved set of indices right on bootup
via a kernel option for example; but that doesn't solve a stupid admin
with 10 scripts all fighting for index 34 each for a different device.

> If you try to put that into the kernel, you
> have a big nasty mess, and if you try to make user-space do it, any bogus
> script can hose your system and potentially screw up your firewall and
> worse.
> 

Refer to above. It can actually be solved and not in a big mess like you
say. The question is whether such a feature is needed.

> Also, imagine that you remove your pro/100 pcmcia NIC and put in your
> tulip.  Both will be, say, eth1 currently, and that is probably what you
> want, but they will have different physical characteristics.  

The name is easy. Check out a utility like nameif for example which uses
MAC addresses as unique ids. DaveM mentions this in his other email on
this thread.

> Even if you
> put them in different cardbus slots, the likelihood is that you want it
> to be called eth1 and treated the same regardless of which NIC you are
> using.  If you are matching firewall rules or whatever against a device
> index instead of a device name, this will fail because the device indexes
> will be different.

Again the name is easy. You can call a NIC whatever you want. 

> And, for purely virtual devices, with no lspci relationship, and no serial
> number, how do you match those?
> 

[You are taking things too literally (which is always dangerous): when I
mentioned lspci, serial number etc., I was not defining scripture. I was
giving an example of how you could find uniqueness. DaveM mentioned MAC
addresses for example; think outside the box a little and find something
that's unique per device type if you are going to write a management
script/program.]

> Maybe we could make a small effort to keep the device indexes the same
> more often, but still not guarantee it.  That may help snmp related tools
> without overly complicating the kernel or user space.

We already almost guarantee a device will get the same ifindex and name
if created at boot time on the same kernel.
As for reserving, as I said above, an admin could be allowed to reserve
ifindices. Now you could go further with boot-time reserving of a
(name, ifindex) pair, for example "ifreserve=eth7,34", and pass a series
of those; when someone creates eth7 they get ifindex 34 etc. But this is
assuming 10 other scripts will try to map ifindex 34 to something
else.

cheers,
jamal



Re: [PATCH] add netlink module refcounting

2005-07-27 Thread Harald Welte
On Tue, Jul 26, 2005 at 04:37:19PM -0700, David S. Miller wrote:
> From: Harald Welte <[EMAIL PROTECTED]>
> Date: Sat, 23 Jul 2005 16:15:52 -0400
> 
> > The attached patch adds support for refcounting of modules implementing
> > netlink protocols.  The idea is that you prevent the module from
> > disappearing as long as someone in userspace has still a socket talking
> > to you.
> 
> Ok, the changes look mostly fine.  I've made a few slight
> modifications before integrating, some of which I've mentioned
> already:
> 
> 1) I keep nl_table[] dynamically allocated
> 2) I fixed up some white spacing, very minor stuff
> 3) I fixed a socket leak in netlink_kernel_create().  If
>    netlink_lookup() returns non-NULL, you have a reference
>    to that socket thus have to release it.

thanks for taking care of this, and especially modifying the patch.  I
would have done that based on your comments, but well, if you want to do
it, it's more convenient for me ;)

> I'm only including the af_netlink.c part of that patch I integrated
> since that's the only part I modified compared to your original
> patch.

ok, I read through it and it seems fine to me.
 
> I think there is a slight hole in this code though, which we can
> fix up in a follow-on patch.  We probably need to grab the netlink
> table from the point at which we set p_ops all the way to where
> we fully commit and netlink_insert() the kernel netlink socket.

mh, I have to think about that in more detail, will get back to you.

There's also a potential module refcounting leak when we have the
following order of events:

1) netlink_kernel_create()
2) userspace opens socket, increases refcount of kernel socket
3) sock_release(kernel_sock), resets p_ops to generic ones
4) userspace closes socket, but can no longer drop refcount on module 
   implementing the kernel socket.

I had a somewhat lengthy discussion with Thomas and Patrick about this,
and we didn't think it's worth fixing this up, esp. since all the
current users seem to sock_release only in the module unload path, which
can in turn only be invoked if the refcount is already zero.  The only
real solution for this is to split netlink_kernel_create() in two parts,
let's say netlink_proto_register and netlink_kernel_sock_create(), and
the same for sock_release() / netlink_proto_unregister().  Let me know
whether you think it's worth the effort.

Thanks,
Harald
-- 
- Harald Welte <[EMAIL PROTECTED]> http://netfilter.org/

  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."-- Paul Vixie




more complex processing in ing_filter ?

2005-07-27 Thread Lucas Nussbaum
Hi,

I'm interested in doing more complex stuff on inbound packets than what
is currently possible with ing_filter (I understand ingress doesn't
allow child classes, and can only drop/pass packets, not store one to
send it later).

While this is understandable because it would conflict with the benefits
of NAPI by queueing and dropping packets much later, it prevents me from
using Linux instead of FreeBSD's Dummynet (I'm working on network
emulation-related stuff).

What would be the disadvantages of moving the call to ing_filter earlier
in netif_receive_skb, allowing queueing in ingress, and re-injecting packets
inside netif_receive_skb? Does it look doable at least? I'm not sure
I see all the problems it implies.

I know there's a solution to my problem using IMQ or dummy, but it
doesn't look like a very clean solution.

Thank you,
-- 
| Lucas Nussbaum
| [EMAIL PROTECTED]   http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED] GPG: 1024D/023B3F4F |


[PATCH 2.6-git] Add netpoll support to cs890x driver

2005-07-27 Thread Deepak Saxena

Trivial patch adding netpoll support to cs89x0 driver.

Signed-off-by: Deepak Saxena <[EMAIL PROTECTED]>

Please apply,
~Deepak


diff --git a/drivers/net/cs89x0.c b/drivers/net/cs89x0.c
--- a/drivers/net/cs89x0.c
+++ b/drivers/net/cs89x0.c
@@ -86,6 +86,7 @@
 
   Deepak Saxena     : [EMAIL PROTECTED]
                     : Intel IXDP2x01 (XScale ixp2x00 NPU) platform support
+                    : Netpoll support
 
 */
 
@@ -247,6 +248,9 @@ static int get_eeprom_data(struct net_de
 static int get_eeprom_cksum(int off, int len, int *buffer);
 static int set_mac_address(struct net_device *dev, void *addr);
 static void count_rx_errors(int status, struct net_local *lp);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void net_poll_controller(struct net_device *dev);
+#endif
 #if ALLOW_DMA
 static void get_dma_channel(struct net_device *dev);
 static void release_dma_buff(struct net_local *lp);
@@ -405,6 +409,19 @@ get_eeprom_cksum(int off, int len, int *
 	return -1;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * Polling receive - used by netconsole and other diagnostic tools
+ * to allow network i/o with interrupts disabled.
+ */
+static void net_poll_controller(struct net_device *dev)
+{
+	disable_irq(dev->irq);
+	net_interrupt(dev->irq, dev, NULL);
+	enable_irq(dev->irq);
+}
+#endif
+
 /* This is the real probe routine.  Linux has a history of friendly device
    probes on the ISA bus.  A good device probes avoids doing writes, and
    verifies that the correct device exists and functions.
@@ -756,6 +773,9 @@ printk("PP_addr=0x%x\n", inw(ioaddr + AD
 	dev->get_stats		= net_get_stats;
 	dev->set_multicast_list = set_multicast_list;
 	dev->set_mac_address	= set_mac_address;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller	= net_poll_controller;
+#endif
 
 	printk("\n");
 	if (net_debug)


-- 
Deepak Saxena - [EMAIL PROTECTED] - http://www.plexity.net

Even a stopped clock gives the right time twice a day.