Re: more complex processing in ing_filter ?
On Wed, Jul 27, 2005 at 10:50:41AM -0700, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Wed, 27 Jul 2005 10:06:45 +0200
> Lucas Nussbaum <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I'm interested in doing more complex stuff on inbound packets than
> > what is currently possible with ing_filter (I understand ingress
> > doesn't allow child classes , and can only drop/pass packets, not
> > store one to send it later).
> >
> > While this is understandable because it would conflict with the
> > benefits of NAPI by queueing and dropping packets much later, it
> > prevents me from using Linux instead of FreeBSD's Dummynet (I'm
> > working on network emulation-related stuff).
>
> Why not just fix netem to work on imq? I might look at that.

What's the problem with netem and imq?

Also, I'm not sure I understand the difference between the "redirect to imq" and the "redirect to dummy" approaches.

--
| Lucas Nussbaum
| [EMAIL PROTECTED]   http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED]   GPG: 1024D/023B3F4F |
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: more complex processing in ing_filter ?
On Wed, Jul 27, 2005 at 08:59:56AM -0400, jamal <[EMAIL PROTECTED]> wrote:
> On Wed, 2005-27-07 at 10:06 +0200, Lucas Nussbaum wrote:
> > Hi,
> >
> > I'm interested in doing more complex stuff on inbound packets than what
> > is currently possible with ing_filter (I understand ingress doesn't
> > allow child classes , and can only drop/pass packets, not store one to
> > send it later).
> >
>
> No, thats not true. You can write a tc action that will steal packets
> from that path and later reinject them.

Is there an example or mail thread I could read about this?

> But that may not be necessary
> if you use the patched dummy device since you could redirect packets to
> it and run whatever qdisc you want on it.
>
> [...]
>
> I am not sure why you say it's unclean. If you can give the packets to
> dummy and run any qdisc on it such as netem - why would that be a
> problem?

I'm concerned about the overhead of redirecting the packets to a dummy/imq device and then re-injecting them, compared to doing all the processing inside ing_filter. However, I don't know enough Linux internals to really evaluate it. Any idea?
Re: [PATCH][NET] Cleanup INET_REFCNT_DEBUG code
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Thu, 28 Jul 2005 02:46:56 -0300

> Oops, sorry I overlooked that, did a test without an the last one was with
> it defined, but I guess that leaving it as is for a few days wouldn't harm
> so that people get a bit of this debugging and perhaps find out some
> possible problems introduced in the last six months or so? I.e. its just a
> matter of deciding if we disable it now or in a few days.

Ok, we can leave it on for a little while ;)
Re: [PATCH][NET] Cleanup INET_REFCNT_DEBUG code
On Wed, Jul 27, 2005 at 10:43:45PM -0700, David S. Miller wrote:
> From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
> Date: Thu, 28 Jul 2005 02:19:59 -0300
>
> > Indeed, there were some issues about that thing, I think I have those handled
> > properly now, please take a look at the comments and tell me if you find any
> > holes.
>
> Looks good, I'll get to pulling this in soon.
>
> Are we going from a default of off to a default of
> on for any particular reason?

Oops, sorry, I overlooked that. I did a test without it, and the last one was with it defined. But I guess leaving it as is for a few days wouldn't harm, so that people get a bit of this debugging and perhaps find some problems introduced in the last six months or so? I.e. it's just a matter of deciding whether we disable it now or in a few days.

Regards,

- Arnaldo
Re: [PATCH][NET] Cleanup INET_REFCNT_DEBUG code
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Thu, 28 Jul 2005 02:19:59 -0300

> Indeed, there were some issues about that thing, I think I have those handled
> properly now, please take a look at the comments and tell me if you find any
> holes.

Looks good, I'll get to pulling this in soon.

Are we going from a default of off to a default of
on for any particular reason?
Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 22:33:32 -0700

> But with so many different workaround methods
> (TG3_FLAG_MBOX_WRITE_REORDER, TG3_FLAG_TXD_MBOX_HWBUG,
> TG3_FLG2_ICH_WORKAROUND, TG3_FLAG_5701_REG_WRITE_BUG, etc), it's
> more like:
>
>     if (...)
>         direct_func_1()
>     else if (...)
>         direct_func_2()
>     else if (...)
>         direct_func_3()
>     else
>         direct_func_4()
>
> At some point I suspect the indirect function pointer method will
> become better.

That's a good point.
Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers
Jeff Garzik wrote:
> Is this theory, or it has been actually measured?
>
> In x86-based CPUs at least (the largest tg3 platform), branch prediction
> often prefers
>
>     if (...)
>         direct_func_1()
>     else
>         direct_func_2()
>
> to
>
>     tp->func()
>
> For hot paths, branch prediction will almost always predict the correct
> path, without any need for deferenced, indirect jumps.
>
> The latter example may look more clean, but the former is probably
> faster in Real Life(tm).

Not measured. But with so many different workaround methods (TG3_FLAG_MBOX_WRITE_REORDER, TG3_FLAG_TXD_MBOX_HWBUG, TG3_FLG2_ICH_WORKAROUND, TG3_FLAG_5701_REG_WRITE_BUG, etc), it's more like:

    if (...)
        direct_func_1()
    else if (...)
        direct_func_2()
    else if (...)
        direct_func_3()
    else
        direct_func_4()

At some point I suspect the indirect function pointer method will become better.
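To make the trade-off concrete, here is a minimal user-space sketch of the two dispatch shapes being debated: testing workaround flags on every register write, versus choosing a write method once at probe time and calling through a pointer. The flag names, `struct dev`, and the counters are hypothetical stand-ins rather than tg3 code, and the sketch says nothing about which shape is faster; that depends on the CPU's branch and indirect-call prediction, which is exactly what had not been measured.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the tg3 workaround flags discussed above. */
#define FLAG_PCIX_TARGET_HWBUG   0x1u
#define FLAG_5701_REG_WRITE_BUG  0x2u

struct dev {
    uint32_t flags;
    uint32_t regs[16];          /* simulated register file */
    unsigned flushes;           /* counts read-back flushes */
    unsigned indirect_writes;   /* counts simulated config-space writes */
    void (*write32)(struct dev *, uint32_t, uint32_t);
};

static void write_direct(struct dev *d, uint32_t off, uint32_t val)
{
    d->regs[off] = val;
}

static void write_flush(struct dev *d, uint32_t off, uint32_t val)
{
    d->regs[off] = val;
    (void)d->regs[off];         /* read back, standing in for a flushing readl() */
    d->flushes++;
}

static void write_indirect(struct dev *d, uint32_t off, uint32_t val)
{
    d->regs[off] = val;
    d->indirect_writes++;
}

/* Shape 1: test the flags on every register write. */
static void tw32_branchy(struct dev *d, uint32_t off, uint32_t val)
{
    if (d->flags & FLAG_PCIX_TARGET_HWBUG)
        write_indirect(d, off, val);
    else if (d->flags & FLAG_5701_REG_WRITE_BUG)
        write_flush(d, off, val);
    else
        write_direct(d, off, val);
}

/* Shape 2: pick the method once at probe time, then call through a pointer. */
static void dev_init(struct dev *d, uint32_t flags)
{
    d->flags = flags;
    d->flushes = 0;
    d->indirect_writes = 0;
    if (flags & FLAG_PCIX_TARGET_HWBUG)
        d->write32 = write_indirect;
    else if (flags & FLAG_5701_REG_WRITE_BUG)
        d->write32 = write_flush;
    else
        d->write32 = write_direct;
}
```

Both shapes store the same values; the pointer version moves the flag tests out of the hot path at the cost of an indirect call on every access, and the chain only grows as more errata flags are added.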
[PATCH][NET] Cleanup INET_REFCNT_DEBUG code
Hi David, Please consider pulling from: rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/ Regards, - Arnaldo tree 238b21aeaed399b847c0f0b0f7328cd69ffcd0d1 parent df2e0392536ecdd6385f4319f746045fd6fae38f author Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> 1122526171 -0300 committer Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> 1122526171 -0300 [PATCH][NET] Cleanup INET_REFCNT_DEBUG code > On 7/22/05, David S. Miller <[EMAIL PROTECTED]> wrote: > > From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> > > Date: Thu, 21 Jul 2005 23:02:03 -0300 > > > > > The second one again, also at: > > > > > > rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git > > > > How is this handling properly the case where sk_prot changes? > > > > Do you remember we had that problem with socket SLAB caches, > > because of how IPV6 and IPV4 sockets can change into the other > > type? That's why we store the socket SLAB cache in there, as > > well as the sk_prot. > I think so, but that thing is so tricky at times that I'll go over most of > the patch (re)reviewing/commenting why its safe. Indeed, there were some issues about that thing, I think I have those handled properly now, please take a look at the comments and tell me if you find any holes. > > Also, would be nice to have some "do { } while (0)" for the NOP > > version of the debug macros just in case :-) > I'll do that Done. Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> Signed-off-by: David S. 
Miller <[EMAIL PROTECTED]> -- include/net/inet_common.h | 1 - include/net/ipv6.h | 1 - include/net/sock.h | 32 +++- include/net/tcp.h | 2 +- net/core/sock.c | 6 +- net/ipv4/af_inet.c | 18 ++ net/ipv4/tcp.c | 7 +-- net/ipv4/tcp_minisocks.c | 20 net/ipv6/af_inet6.c | 31 +++ net/ipv6/ipv6_sockglue.c | 15 --- net/ipv6/tcp_ipv6.c | 18 +- net/sctp/ipv6.c | 5 + net/sctp/protocol.c | 4 +--- 13 files changed, 86 insertions(+), 74 deletions(-) -- diff --git a/include/net/inet_common.h b/include/net/inet_common.h --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -29,7 +29,6 @@ extern unsigned int inet_poll(struct fi extern int inet_listen(struct socket *sock, int backlog); extern void inet_sock_destruct(struct sock *sk); -extern atomic_t inet_sock_nr; extern int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); diff --git a/include/net/ipv6.h b/include/net/ipv6.h --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -145,7 +145,6 @@ DECLARE_SNMP_STAT(struct udp_mib, udp_st #define UDP6_INC_STATS(field) SNMP_INC_STATS(udp_stats_in6, field) #define UDP6_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_stats_in6, field) #define UDP6_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_stats_in6, field) -extern atomic_t inet6_sock_nr; int snmp6_register_dev(struct inet6_dev *idev); int snmp6_unregister_dev(struct inet6_dev *idev); diff --git a/include/net/sock.h b/include/net/sock.h --- a/include/net/sock.h +++ b/include/net/sock.h @@ -486,6 +486,9 @@ extern int sk_wait_data(struct sock *sk, struct request_sock_ops; +/* Here is the right place to enable sock refcounting debugging */ +#define SOCK_REFCNT_DEBUG + /* Networking protocol blocks we attach to sockets.
* socket layer -> transport layer interface * transport -> network interface is defined by struct inet_proto @@ -556,7 +559,9 @@ struct proto { char name[32]; struct list_head node; - +#ifdef SOCK_REFCNT_DEBUG + atomic_t socks; +#endif struct { int inuse; u8 __pad[SMP_CACHE_BYTES - sizeof(int)]; @@ -566,6 +571,31 @@ struct proto { extern int proto_register(struct proto *prot, int alloc_slab); extern void proto_unregister(struct proto *prot); +#ifdef SOCK_REFCNT_DEBUG +static inline void sk_refcnt_debug_inc(struct sock *sk) +{ + atomic_inc(&sk->sk_prot->socks); +} + +static inline void sk_refcnt_debug_dec(struct sock *sk) +{ + atomic_dec(&sk->sk_prot->socks); + printk(KERN_DEBUG "%s socket %p released, %d are still alive\n", + sk->sk_prot->name, sk, atomic_read(&sk->sk_prot->socks)); +} + +static inline void sk_refcnt_debug_release(const struct sock *sk) +{ + if (atomic_read(&sk->sk_refcnt) != 1) + printk(KERN_DEBUG "Destruction of the %s socket %p delayed, refcnt
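The `do { } while (0)` request in the review concerns the NOP variants used when refcount debugging is compiled out. A minimal user-space sketch of the pattern, assuming a single thread (so a plain `int` stands in for the kernel's `atomic_t`, and `printf` for `printk`); every `demo_` name is hypothetical:

```c
#include <stdio.h>

/* Comment this out to compile the debug hooks away entirely. */
#define DEMO_REFCNT_DEBUG

struct demo_proto {
    const char *name;
#ifdef DEMO_REFCNT_DEBUG
    int socks;                  /* live sockets of this protocol */
#endif
};

struct demo_sock {
    struct demo_proto *prot;
};

#ifdef DEMO_REFCNT_DEBUG
static inline void demo_refcnt_debug_inc(struct demo_sock *sk)
{
    sk->prot->socks++;
}

static inline void demo_refcnt_debug_dec(struct demo_sock *sk)
{
    sk->prot->socks--;
    printf("%s socket %p released, %d are still alive\n",
           sk->prot->name, (void *)sk, sk->prot->socks);
}
#else
/* NOP versions: do { } while (0) keeps each call a single statement,
 * so "if (x) demo_refcnt_debug_inc(sk); else ..." still parses. */
#define demo_refcnt_debug_inc(sk) do { } while (0)
#define demo_refcnt_debug_dec(sk) do { } while (0)
#endif
```

With `DEMO_REFCNT_DEBUG` undefined, the calls expand to empty statements that still require a trailing semicolon and nest safely inside an unbraced `if`/`else`.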
Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers
From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 23:44:51 -0400

> In x86-based CPUs at least (the largest tg3 platform), branch prediction
> often prefers
>
>     if (...)
>         direct_func_1()
>     else
>         direct_func_2()
>
> to
>
>     tp->func()

Indirect function calls also kill CPUs such as ia64, which cannot avoid the implicit branch prediction miss unless the function method target is prefetched several instruction groups before the call, and gcc does not emit the necessary directives to achieve this even if it were possible.
Re: [PATCH 2.6 1/5] tg3: Add basic register access function pointers
Michael Chan wrote:
> This patch adds the basic function pointers to do register accesses in
> the fast path. This was suggested by David Miller. The idea is that
> various register access methods for different hardware errata can
> easily be implemented with these function pointers and performance will
> not be degraded on chips that use normal register access methods.

Is this theory, or has it actually been measured?

In x86-based CPUs at least (the largest tg3 platform), branch prediction often prefers

    if (...)
        direct_func_1()
    else
        direct_func_2()

to

    tp->func()

For hot paths, branch prediction will almost always predict the correct path, without any need for dereferenced, indirect jumps.

The latter example may look cleaner, but the former is probably faster in Real Life(tm).

    Jeff
[PATCH 2.6 5/5] tg3: Eliminate one register write in tg3_restart_ints()
The register write to register 0x68 to restart interrupts is unnecessary, as the interrupt wasn't masked in that register by the irq handler. This saves one register write in the fast path.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff -Nrup 5/drivers/net/tg3.c 6/drivers/net/tg3.c
--- 5/drivers/net/tg3.c	2005-07-27 16:40:17.0 -0700
+++ 6/drivers/net/tg3.c	2005-07-27 16:40:32.0 -0700
@@ -533,8 +533,6 @@ static inline unsigned int tg3_has_work(
  */
 static void tg3_restart_ints(struct tg3 *tp)
 {
-	tw32(TG3PCI_MISC_HOST_CTRL,
-		(tp->misc_host_ctrl & ~MISC_HOST_CTRL_MASK_PCI_INT));
 	tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
 		     tp->last_tag << 24);
 	mmiowb();
[PATCH 2.6 4/5] tg3: Add indirect register method for 5703 behind ICH
This patch adds the new workaround for 5703 A1/A2 if it is behind certain ICH bridges. The workaround disables memory and uses config. cycles only to access all registers. The 5702/03 chips can mistakenly decode the special cycles from the ICH chipsets as memory write cycles, causing corruption of register and memory space. Only certain ICH bridges will drive special cycles with non-zero data during the address phase which can fall within the 5703's address range. This is not an ICH bug as the PCI spec allows non-zero address during special cycles. However, only these ICH bridges are known to drive non-zero addresses during special cycles. The indirect_lock is also changed to spin_lock_irqsave from spin_lock_bh because it is used in irq handler when using the indirect method to disable interrupts. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff -Nrup 4/drivers/net/tg3.c 5/drivers/net/tg3.c --- 4/drivers/net/tg3.c 2005-07-26 09:23:32.0 -0700 +++ 5/drivers/net/tg3.c 2005-07-27 16:40:17.0 -0700 @@ -340,10 +340,12 @@ static struct { static void tg3_write_indirect_reg32(struct tg3 *tp, u32 off, u32 val) { - spin_lock_bh(&tp->indirect_lock); + unsigned long flags; + + spin_lock_irqsave(&tp->indirect_lock, flags); pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off); pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val); - spin_unlock_bh(&tp->indirect_lock); + spin_unlock_irqrestore(&tp->indirect_lock, flags); } static void tg3_write_flush_reg32(struct tg3 *tp, u32 off, u32 val) @@ -352,24 +354,75 @@ static void tg3_write_flush_reg32(struct readl(tp->regs + off); } -static void _tw32_flush(struct tg3 *tp, u32 off, u32 val) +static u32 tg3_read_indirect_reg32(struct tg3 *tp, u32 off) { - if ((tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) != 0) { - spin_lock_bh(&tp->indirect_lock); - pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off); - pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val); - spin_unlock_bh(&tp->indirect_lock); - } else { - void __iomem 
*dest = tp->regs + off; - writel(val, dest); - readl(dest);/* always flush PCI write */ + unsigned long flags; + u32 val; + + spin_lock_irqsave(&tp->indirect_lock, flags); + pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off); + pci_read_config_dword(tp->pdev, TG3PCI_REG_DATA, &val); + spin_unlock_irqrestore(&tp->indirect_lock, flags); + return val; +} + +static void tg3_write_indirect_mbox(struct tg3 *tp, u32 off, u32 val) +{ + unsigned long flags; + + if (off == (MAILBOX_RCVRET_CON_IDX_0 + TG3_64BIT_REG_LOW)) { + pci_write_config_dword(tp->pdev, TG3PCI_RCV_RET_RING_CON_IDX + + TG3_64BIT_REG_LOW, val); + return; + } + if (off == (MAILBOX_RCV_STD_PROD_IDX + TG3_64BIT_REG_LOW)) { + pci_write_config_dword(tp->pdev, TG3PCI_STD_RING_PROD_IDX + + TG3_64BIT_REG_LOW, val); + return; } + + spin_lock_irqsave(&tp->indirect_lock, flags); + pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off + 0x5600); + pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val); + spin_unlock_irqrestore(&tp->indirect_lock, flags); + + /* In indirect mode when disabling interrupts, we also need +* to clear the interrupt bit in the GRC local ctrl register. 
+*/ + if ((off == (MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW)) && + (val == 0x1)) { + pci_write_config_dword(tp->pdev, TG3PCI_MISC_LOCAL_CTRL, + tp->grc_local_ctrl|GRC_LCLCTRL_CLEARINT); + } +} + +static u32 tg3_read_indirect_mbox(struct tg3 *tp, u32 off) +{ + unsigned long flags; + u32 val; + + spin_lock_irqsave(&tp->indirect_lock, flags); + pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off + 0x5600); + pci_read_config_dword(tp->pdev, TG3PCI_REG_DATA, &val); + spin_unlock_irqrestore(&tp->indirect_lock, flags); + return val; +} + +static void _tw32_flush(struct tg3 *tp, u32 off, u32 val) +{ + tp->write32(tp, off, val); + if (!(tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) && + !(tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG) && + !(tp->tg3_flags2 & TG3_FLG2_ICH_WORKAROUND)) + tp->read32(tp, off);/* flush */ } static inline void tw32_mailbox_flush(struct tg3 *tp, u32 off, u32 val) { tp->write32_mbox(tp, off, val); - tp->read32_mbox(tp, off); + if (!(tp->tg3_flags & TG3_FLAG_MBOX_WRITE_REORDER) && + !(tp->tg3_flags2 & TG3_FLG2_ICH_WORKAROUND)) + tp->read32_mbox(tp, off); } static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val) @@ -404,24 +457,28
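The indirect method reaches a register through a two-register window in PCI config space: first write the target offset to the base-address register, then read or write the data register. The sketch below simulates that in user space. The config offsets are arbitrary values chosen for the simulation (not taken from tg3.h), the lock is only described in a comment, and the special-cased receive mailboxes are omitted; the `0x5600` added to mailbox offsets mirrors `tg3_write_indirect_mbox()` in the patch above.

```c
#include <stdint.h>

/* Arbitrary config-space offsets for the simulated window registers. */
#define CFG_REG_BASE_ADDR 0x78
#define CFG_REG_DATA      0x80
#define MBOX_WINDOW_BASE  0x5600   /* offset added to mailboxes in the patch */

struct sim_dev {
    uint32_t window;               /* last value written to REG_BASE_ADDR */
    uint32_t regs[0x6000 / 4];     /* simulated register space; offsets must fit */
};

/* Simulated config-space write: latch the window address, or store
 * data at the register the window currently selects. */
static void cfg_write(struct sim_dev *d, int where, uint32_t val)
{
    if (where == CFG_REG_BASE_ADDR)
        d->window = val;
    else if (where == CFG_REG_DATA)
        d->regs[d->window / 4] = val;
}

static uint32_t cfg_read(struct sim_dev *d, int where)
{
    return where == CFG_REG_DATA ? d->regs[d->window / 4] : d->window;
}

/* In the driver both steps run under indirect_lock with
 * spin_lock_irqsave(), because an irq handler could move the
 * window between the two config cycles. */
static void write_indirect_reg32(struct sim_dev *d, uint32_t off, uint32_t val)
{
    cfg_write(d, CFG_REG_BASE_ADDR, off);
    cfg_write(d, CFG_REG_DATA, val);
}

static uint32_t read_indirect_reg32(struct sim_dev *d, uint32_t off)
{
    cfg_write(d, CFG_REG_BASE_ADDR, off);
    return cfg_read(d, CFG_REG_DATA);
}

static void write_indirect_mbox(struct sim_dev *d, uint32_t off, uint32_t val)
{
    write_indirect_reg32(d, off + MBOX_WINDOW_BASE, val);
}
```

Every access costs two config cycles instead of one memory cycle, which is why the series applies this method only to the affected 5703 configurations.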
[PATCH 2.6 3/5] tg3: Add mailbox read method
This patch adds the mailbox read method and also adds an inline function tw32_mailbox_f() for mailbox writes that require read flush. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff -Nrup 3/drivers/net/tg3.c 4/drivers/net/tg3.c --- 3/drivers/net/tg3.c 2005-07-26 07:40:18.0 -0700 +++ 4/drivers/net/tg3.c 2005-07-26 09:23:32.0 -0700 @@ -366,6 +366,12 @@ static void _tw32_flush(struct tg3 *tp, } } +static inline void tw32_mailbox_flush(struct tg3 *tp, u32 off, u32 val) +{ + tp->write32_mbox(tp, off, val); + tp->read32_mbox(tp, off); +} + static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val) { void __iomem *mbox = tp->regs + off; @@ -387,8 +393,10 @@ static u32 tg3_read32(struct tg3 *tp, u3 } #define tw32_mailbox(reg, val) tp->write32_mbox(tp, reg, val) +#define tw32_mailbox_f(reg, val) tw32_mailbox_flush(tp, (reg), (val)) #define tw32_rx_mbox(reg, val) tp->write32_rx_mbox(tp, reg, val) #define tw32_tx_mbox(reg, val) tp->write32_tx_mbox(tp, reg, val) +#define tr32_mailbox(reg) tp->read32_mbox(tp, reg) #define tw32(reg,val) tp->write32(tp, reg, val) #define tw32_f(reg,val)_tw32_flush(tp,(reg),(val)) @@ -420,8 +428,7 @@ static void tg3_disable_ints(struct tg3 { tw32(TG3PCI_MISC_HOST_CTRL, (tp->misc_host_ctrl | MISC_HOST_CTRL_MASK_PCI_INT)); - tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x0001); - tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); + tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x0001); } static inline void tg3_cond_int(struct tg3 *tp) @@ -437,9 +444,8 @@ static void tg3_enable_ints(struct tg3 * tw32(TG3PCI_MISC_HOST_CTRL, (tp->misc_host_ctrl & ~MISC_HOST_CTRL_MASK_PCI_INT)); - tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, -(tp->last_tag << 24)); - tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); + tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, + (tp->last_tag << 24)); tg3_cond_int(tp); } @@ -3276,9 +3282,8 @@ static irqreturn_t tg3_interrupt(int irq /* No work, shared interrupt perhaps? 
re-enable * interrupts, and flush that PCI write */ - tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, + tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x); - tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); } } else {/* shared interrupt */ handled = 0; @@ -3321,9 +3326,8 @@ static irqreturn_t tg3_interrupt_tagged( /* no work, shared interrupt perhaps? re-enable * interrupts, and flush that PCI write */ - tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, -tp->last_tag << 24); - tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); + tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, + tp->last_tag << 24); } } else {/* shared interrupt */ handled = 0; @@ -5800,8 +5804,7 @@ static int tg3_reset_hw(struct tg3 *tp) tw32_f(GRC_LOCAL_CTRL, tp->grc_local_ctrl); udelay(100); - tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0); - tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); + tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0); tp->last_tag = 0; if (!(tp->tg3_flags2 & TG3_FLG2_5705_PLUS)) { @@ -6190,7 +6193,8 @@ static int tg3_test_interrupt(struct tg3 HOSTCC_MODE_NOW); for (i = 0; i < 5; i++) { - int_mbox = tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); + int_mbox = tr32_mailbox(MAILBOX_INTERRUPT_0 + + TG3_64BIT_REG_LOW); if (int_mbox != 0) break; msleep(10); @@ -6590,10 +6594,10 @@ static int tg3_open(struct net_device *d /* Mailboxes */ printk("DEBUG: SNDHOST_PROD[%08x%08x] SNDNIC_PROD[%08x%08x]\n", - tr32(MAILBOX_SNDHOST_PROD_IDX_0 + 0x0), - tr32(MAILBOX_SNDHOST_PROD_IDX_0 + 0x4), - tr32(MAILBOX_SNDNIC_PROD_IDX_0 + 0x0), - tr32(MAILBOX_SNDNIC_PROD_IDX_0 + 0x4)); + tr32_mailbox(MAILBOX_SNDHOST_PROD_IDX_0 + 0x0), + tr32_mailbox(MAILBOX_SNDHOST_PROD_IDX_0 + 0x4), + tr32_mailbox(MAILBOX_SNDNIC_PROD_IDX_0 + 0x0), + tr32_mailbox(MAILBOX_SNDNIC_PROD_IDX_0 + 0x4)); /* NIC side send descriptors. */ for (i = 0; i < 6; i++) { @@ -7895,7 +7899,7 @@ static int tg3_test_loopback(struct tg3 nu
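The point of `tw32_mailbox_f()` is that PCI memory writes may be posted, that is, buffered by a bridge, while a read from the device cannot complete until earlier posted writes have been pushed ahead of it. A toy user-space model of that ordering rule; `struct bus`, its queue, and the register indexes are hypothetical:

```c
#include <stdint.h>

#define POST_DEPTH 8

/* Toy bridge: writes are queued ("posted") and reach the device later;
 * a read must first drain every earlier posted write. */
struct bus {
    uint32_t dev_regs[64];
    uint32_t q_off[POST_DEPTH], q_val[POST_DEPTH];
    int pending;
};

static void bus_drain(struct bus *b)
{
    for (int i = 0; i < b->pending; i++)
        b->dev_regs[b->q_off[i]] = b->q_val[i];
    b->pending = 0;
}

static void bus_write(struct bus *b, uint32_t off, uint32_t val)
{
    if (b->pending == POST_DEPTH)
        bus_drain(b);           /* buffer full: oldest writes go out */
    b->q_off[b->pending] = off;
    b->q_val[b->pending] = val;
    b->pending++;
}

static uint32_t bus_read(struct bus *b, uint32_t off)
{
    bus_drain(b);               /* reads push posted writes ahead of them */
    return b->dev_regs[off];
}

/* Same shape as tw32_mailbox_flush(): write, then read the same
 * register back so the write is known to have reached the device. */
static void write_mbox_flush(struct bus *b, uint32_t off, uint32_t val)
{
    bus_write(b, off, val);
    (void)bus_read(b, off);
}
```

A plain `bus_write()` leaves the value sitting in the queue; only the read-back guarantees the device has seen it, which is what the interrupt enable/disable paths rely on.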
[PATCH 2.6 2/5] tg3: Add various register methods
This patch adds various dedicated register read/write methods for the existing workarounds, including PCIX target workaround, write with read flush, etc. The chips that require these workarounds will use these dedicated access functions. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff -Nrup 2/drivers/net/tg3.c 3/drivers/net/tg3.c --- 2/drivers/net/tg3.c 2005-07-26 07:33:32.0 -0700 +++ 3/drivers/net/tg3.c 2005-07-26 07:40:18.0 -0700 @@ -340,16 +340,16 @@ static struct { static void tg3_write_indirect_reg32(struct tg3 *tp, u32 off, u32 val) { - if ((tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) != 0) { - spin_lock_bh(&tp->indirect_lock); - pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off); - pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val); - spin_unlock_bh(&tp->indirect_lock); - } else { - writel(val, tp->regs + off); - if ((tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG) != 0) - readl(tp->regs + off); - } + spin_lock_bh(&tp->indirect_lock); + pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off); + pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val); + spin_unlock_bh(&tp->indirect_lock); +} + +static void tg3_write_flush_reg32(struct tg3 *tp, u32 off, u32 val) +{ + writel(val, tp->regs + off); + readl(tp->regs + off); } static void _tw32_flush(struct tg3 *tp, u32 off, u32 val) @@ -366,14 +366,6 @@ static void _tw32_flush(struct tg3 *tp, } } -static void tg3_write32_rx_mbox(struct tg3 *tp, u32 off, u32 val) -{ - void __iomem *mbox = tp->regs + off; - writel(val, mbox); - if (tp->tg3_flags & TG3_FLAG_MBOX_WRITE_REORDER) - readl(mbox); -} - static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val) { void __iomem *mbox = tp->regs + off; @@ -4222,7 +4214,7 @@ static void tg3_stop_fw(struct tg3 *); static int tg3_chip_reset(struct tg3 *tp) { u32 val; - u32 flags_save; + void (*write_op)(struct tg3 *, u32, u32); int i; if (!(tp->tg3_flags2 & TG3_FLG2_SUN_570X)) @@ -4234,8 +4226,9 @@ static int tg3_chip_reset(struct tg3 *tp * fun things. 
So, temporarily disable the 5701 * hardware workaround, while we do the reset. */ - flags_save = tp->tg3_flags; - tp->tg3_flags &= ~TG3_FLAG_5701_REG_WRITE_BUG; + write_op = tp->write32; + if (write_op == tg3_write_flush_reg32) + tp->write32 = tg3_write32; /* do the reset */ val = GRC_MISC_CFG_CORECLK_RESET; @@ -4254,8 +4247,8 @@ static int tg3_chip_reset(struct tg3 *tp val |= GRC_MISC_CFG_KEEP_GPHY_POWER; tw32(GRC_MISC_CFG, val); - /* restore 5701 hardware bug workaround flag */ - tp->tg3_flags = flags_save; + /* restore 5701 hardware bug workaround write method */ + tp->write32 = write_op; /* Unfortunately, we have to delay before the PCI read back. * Some 575X chips even will not respond to a PCI cfg access @@ -4641,7 +4634,6 @@ static int tg3_load_firmware_cpu(struct int cpu_scratch_size, struct fw_info *info) { int err, i; - u32 orig_tg3_flags = tp->tg3_flags; void (*write_op)(struct tg3 *, u32, u32); if (cpu_base == TX_CPU_BASE && @@ -4657,11 +4649,6 @@ static int tg3_load_firmware_cpu(struct else write_op = tg3_write_indirect_reg32; - /* Force use of PCI config space for indirect register -* write calls. -*/ - tp->tg3_flags |= TG3_FLAG_PCIX_TARGET_HWBUG; - /* It is possible that bootcode is still loading at this point. * Get the nvram lock first before halting the cpu. 
*/ @@ -4697,7 +4684,6 @@ static int tg3_load_firmware_cpu(struct err = 0; out: - tp->tg3_flags = orig_tg3_flags; return err; } @@ -9331,11 +9317,25 @@ static int __devinit tg3_get_invariants( pci_write_config_dword(tp->pdev, TG3PCI_PCISTATE, pci_state_reg); } + /* Default fast path register access methods */ tp->read32 = tg3_read32; - tp->write32 = tg3_write_indirect_reg32; + tp->write32 = tg3_write32; tp->write32_mbox = tg3_write32; - tp->write32_tx_mbox = tg3_write32_tx_mbox; - tp->write32_rx_mbox = tg3_write32_rx_mbox; + tp->write32_tx_mbox = tg3_write32; + tp->write32_rx_mbox = tg3_write32; + + /* Various workaround register access methods */ + if (tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) + tp->write32 = tg3_write_indirect_reg32; + else if (tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG) + tp->write32 = tg3_write_flush_reg32; + + if ((tp->tg3_flags & TG3_FLAG_TXD_MBOX_HWBUG) || + (tp->tg3_flags & TG3_FLAG_MBOX_WRITE_REORDER)) { +
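The reset path above now swaps the write method itself instead of toggling a flag: it saves `tp->write32`, replaces the flushing variant with the plain one for the reset write, and then restores it. A user-space sketch of that save/override/restore step, with hypothetical types and a counter standing in for the `readl()` read-back:

```c
#include <stdint.h>

struct nic;
typedef void (*write_fn)(struct nic *, uint32_t, uint32_t);

struct nic {
    uint32_t regs[32];
    unsigned readbacks;         /* counts simulated flushing reads */
    write_fn write32;
};

static void write_plain(struct nic *n, uint32_t off, uint32_t val)
{
    n->regs[off] = val;
}

static void write_flush(struct nic *n, uint32_t off, uint32_t val)
{
    n->regs[off] = val;
    n->readbacks++;             /* stands in for readl() after writel() */
}

/* Mirrors the save/override/restore of tp->write32 in tg3_chip_reset():
 * the 5701 read-back workaround is disabled for the reset write only. */
static void chip_reset(struct nic *n, uint32_t reset_reg, uint32_t reset_val)
{
    write_fn saved = n->write32;

    if (saved == write_flush)
        n->write32 = write_plain;

    n->write32(n, reset_reg, reset_val);   /* the reset write itself */

    n->write32 = saved;                     /* restore the workaround method */
}
```

Comparing function pointers replaces the old `flags_save` dance: no unrelated flag bits are saved and restored along the way.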
[PATCH 2.6 1/5] tg3: Add basic register access function pointers
This patch adds the basic function pointers to do register accesses in the fast path. This was suggested by David Miller. The idea is that various register access methods for different hardware errata can easily be implemented with these function pointers and performance will not be degraded on chips that use normal register access methods. The various register read write macros (e.g. tw32, tr32, tw32_mailbox) are redefined to call the function pointers. Signed-off-by: Michael Chan <[EMAIL PROTECTED]> diff -Nrup 1/drivers/net/tg3.c 2/drivers/net/tg3.c --- 1/drivers/net/tg3.c 2005-07-25 20:01:38.0 -0700 +++ 2/drivers/net/tg3.c 2005-07-26 07:33:32.0 -0700 @@ -366,7 +366,7 @@ static void _tw32_flush(struct tg3 *tp, } } -static inline void _tw32_rx_mbox(struct tg3 *tp, u32 off, u32 val) +static void tg3_write32_rx_mbox(struct tg3 *tp, u32 off, u32 val) { void __iomem *mbox = tp->regs + off; writel(val, mbox); @@ -374,7 +374,7 @@ static inline void _tw32_rx_mbox(struct readl(mbox); } -static inline void _tw32_tx_mbox(struct tg3 *tp, u32 off, u32 val) +static void tg3_write32_tx_mbox(struct tg3 *tp, u32 off, u32 val) { void __iomem *mbox = tp->regs + off; writel(val, mbox); @@ -384,17 +384,23 @@ static inline void _tw32_tx_mbox(struct readl(mbox); } -#define tw32_mailbox(reg, val) writel(((val) & 0x), tp->regs + (reg)) -#define tw32_rx_mbox(reg, val) _tw32_rx_mbox(tp, reg, val) -#define tw32_tx_mbox(reg, val) _tw32_tx_mbox(tp, reg, val) +static void tg3_write32(struct tg3 *tp, u32 off, u32 val) +{ + writel(val, tp->regs + off); +} -#define tw32(reg,val) tg3_write_indirect_reg32(tp,(reg),(val)) +static u32 tg3_read32(struct tg3 *tp, u32 off) +{ + return (readl(tp->regs + off)); +} + +#define tw32_mailbox(reg, val) tp->write32_mbox(tp, reg, val) +#define tw32_rx_mbox(reg, val) tp->write32_rx_mbox(tp, reg, val) +#define tw32_tx_mbox(reg, val) tp->write32_tx_mbox(tp, reg, val) + +#define tw32(reg,val) tp->write32(tp, reg, val) #define 
tw32_f(reg,val) _tw32_flush(tp,(reg),(val)) -#define tw16(reg,val) writew(((val) & 0x), tp->regs + (reg)) -#define tw8(reg,val) writeb(((val) & 0xff), tp->regs + (reg)) -#define tr32(reg) readl(tp->regs + (reg)) -#define tr16(reg) readw(tp->regs + (reg)) -#define tr8(reg) readb(tp->regs + (reg)) +#define tr32(reg) tp->read32(tp, reg) static void tg3_write_mem(struct tg3 *tp, u32 off, u32 val) { @@ -9325,6 +9331,12 @@ static int __devinit tg3_get_invariants( pci_write_config_dword(tp->pdev, TG3PCI_PCISTATE, pci_state_reg); } + tp->read32 = tg3_read32; + tp->write32 = tg3_write_indirect_reg32; + tp->write32_mbox = tg3_write32; + tp->write32_tx_mbox = tg3_write32_tx_mbox; + tp->write32_rx_mbox = tg3_write32_rx_mbox; + /* Get eeprom hw config before calling tg3_set_power_state(). * In particular, the TG3_FLAG_EEPROM_WRITE_PROT flag must be * determined before calling tg3_set_power_state() so that diff -Nrup 1/drivers/net/tg3.h 2/drivers/net/tg3.h --- 1/drivers/net/tg3.h 2005-07-25 20:01:38.0 -0700 +++ 2/drivers/net/tg3.h 2005-07-25 20:05:53.0 -0700 @@ -2049,6 +2049,10 @@ struct tg3 { spinlock_t lock; spinlock_t indirect_lock; + u32 (*read32) (struct tg3 *, u32); + void (*write32) (struct tg3 *, u32, u32); + void (*write32_mbox) (struct tg3 *, u32, + u32); void __iomem *regs; struct net_device *dev; struct pci_dev *pdev; @@ -2060,6 +2064,8 @@ struct tg3 { u32 msg_enable; /* begin "tx thread" cacheline section */ + void (*write32_tx_mbox) (struct tg3 *, u32, + u32); u32 tx_prod; u32 tx_cons; u32 tx_pending; @@ -2071,6 +2077,8 @@ struct tg3 { dma_addr_t tx_desc_mapping; /* begin "rx thread" cacheline section */ + void (*write32_rx_mbox) (struct tg3 *, u32, + u32); u32 rx_rcb_ptr; u32 rx_std_ptr; u32 rx_jumbo_ptr;
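The redefined macros expand to calls through `tp`, so they only compile where a variable named `tp` is in scope. A trimmed-down sketch of the same arrangement, with a hypothetical `struct regs_dev` and an array standing in for MMIO space; it reuses the names `tw32`/`tr32` for readability and adds parentheses around the macro arguments:

```c
#include <stdint.h>

struct regs_dev {
    uint32_t mmio[16];                                    /* fake MMIO space */
    uint32_t (*read32)(struct regs_dev *, uint32_t);
    void (*write32)(struct regs_dev *, uint32_t, uint32_t);
};

static void dev_write32(struct regs_dev *tp, uint32_t off, uint32_t val)
{
    tp->mmio[off / 4] = val;   /* byte offset -> 32-bit slot */
}

static uint32_t dev_read32(struct regs_dev *tp, uint32_t off)
{
    return tp->mmio[off / 4];
}

/* Like the patched tg3 macros, these assume a local variable named tp. */
#define tw32(reg, val) tp->write32(tp, (reg), (val))
#define tr32(reg)      tp->read32(tp, (reg))

/* Default methods; an errata path would install different ones here. */
static void dev_setup(struct regs_dev *tp)
{
    tp->read32 = dev_read32;
    tp->write32 = dev_write32;
}

static uint32_t toggle_bit(struct regs_dev *tp, uint32_t reg, uint32_t bit)
{
    tw32(reg, tr32(reg) ^ bit);
    return tr32(reg);
}
```

Callers such as `toggle_bit()` never change when a chip needs a different access method; only the assignments in the setup function do.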
[RFC][PATCH] Fix up struct sockaddr_in definition
Hi,

I would like to propose a cleanup for struct sockaddr_in that I think will make the code much more obvious and remove some icky padding math:

Attachment: sockaddr_in-cleanup.patch

Thanks for all your input!

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare
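For reference, the "padding math" is the subtraction that sizes a pad array so the structure matches `sizeof(struct sockaddr)` (16 bytes), in the style of the `__pad` member in `<linux/in.h>`. The attached patch is not reproduced in this archive, so this sketch only illustrates that existing style, not the proposed replacement; it uses hypothetical `demo_` types so it compiles alongside the system headers:

```c
#include <stdint.h>

#define DEMO_SOCK_SIZE 16          /* sizeof(struct sockaddr) on Linux */

struct demo_in_addr {
    uint32_t s_addr;
};

/* Each subtracted sizeof must be kept in sync with a field type by hand. */
struct demo_sockaddr_in {
    unsigned short      sin_family;
    unsigned short      sin_port;
    struct demo_in_addr sin_addr;
    unsigned char       pad[DEMO_SOCK_SIZE - sizeof(short) -
                            sizeof(unsigned short) -
                            sizeof(struct demo_in_addr)];
};

_Static_assert(sizeof(struct demo_sockaddr_in) == DEMO_SOCK_SIZE,
               "padding must bring the struct to sizeof(struct sockaddr)");
```

The compile-time assertion is the safety net here: if a field type changes without the subtraction being updated, the build fails instead of silently producing a mis-sized structure.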
[PATCH 2.6 0/5] tg3: Add indirect register access for 5703
A set of patches will follow that adds the last remaining register access workaround, for the 5703 behind certain ICH bridges. The first 3 patches add the infrastructure to use function pointers for the various register access methods. Patch #4 adds the new indirect register access method.

It turns out that these patches also improve performance on many systems with the 82801 (ICH) bridge, including new PCIE systems. The current tg3 driver sets the TG3_FLAG_MBOX_WRITE_REORDER flag whenever it detects an ICH bridge, which adds a read flush to the tx and rx data paths on all tg3 chips. With these patches, the indirect register method is applied to the 5703 only when necessary, and the unnecessary read flush is eliminated on all other tg3 chips.
Re: [patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine
On Wed, Jul 27, 2005 at 06:45:18PM -0700, cramerj wrote: > Stupid question: Can we assume ethtool will only be used for networking > devices with a 6-byte hardware address? I presume not...? > If not, then the driver-specific approach would give the flexibility of > copying anything up to MAX_ADDR_LEN. > > Perhaps increasing the count to MAX_ADDR_LEN is the way to go?? Drivers would still have the option to override if they so choose. But since the ETH_MAX_ADDR_LEN definition is actually 32 (which matches MAX_ADDR_LEN anyway), it is a bit of a moot point... :-) Jon, you should probably add a patch (or redo your current patch) and use MAX_ADDR_LEN instead of adding the new ETH_MAX_ADDR_LEN... John -- John W. Linville [EMAIL PROTECTED]
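John's point is that a generic helper copying the full 32-byte buffer already covers hardware addresses longer than six bytes. Below is a minimal userspace sketch of such a helper. The `_sketch` structs are hypothetical stand-ins for `struct net_device` and `struct ethtool_addr`; only the value 32 for `MAX_ADDR_LEN` comes from the thread.

```c
#include <string.h>

#define MAX_ADDR_LEN 32		/* value cited in the thread */

/* Hypothetical stand-ins for struct net_device / struct ethtool_addr. */
struct net_device_sketch {
	unsigned char perm_addr[MAX_ADDR_LEN];	/* permanent hw address */
};

struct ethtool_addr_sketch {
	unsigned char addr[MAX_ADDR_LEN];
};

/* Generic helper: copy the whole buffer, so an address longer than six
 * bytes needs no driver-specific routine. */
static int get_perm_addr_sketch(const struct net_device_sketch *dev,
				struct ethtool_addr_sketch *eaddr)
{
	memcpy(eaddr->addr, dev->perm_addr, MAX_ADDR_LEN);
	return 0;
}
```

As John notes, a driver with unusual needs could still point `.get_perm_addr` at its own routine instead of the generic one.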
RE: [patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine
B'ah! Never mind. I'll learn to read #defines one of these days. Sorry for the spam. > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > On Behalf Of cramerj > Sent: Wednesday, July 27, 2005 6:45 PM > To: John W. Linville; Jon Wetzel > Cc: netdev@vger.kernel.org; [EMAIL PROTECTED] > Subject: RE: [patch 2.6.13-rc3] ethtool: add generic > ethtool_op_get_perm_addr routine > > Stupid question: Can we assume ethtool will only be used for networking > devices with a 6-byte hardware address? > > If not, then the driver-specific approach would give the flexibility of > copying anything up to MAX_ADDR_LEN. > > Perhaps increasing the count to MAX_ADDR_LEN is the way to go?? > > 6/half-dozen > > -Jeb > > > -----Original Message----- > > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] > > On Behalf Of John W. Linville > > Sent: Wednesday, July 27, 2005 6:15 PM > > To: Jon Wetzel > > Cc: netdev@vger.kernel.org; [EMAIL PROTECTED] > > Subject: [patch 2.6.13-rc3] ethtool: add generic > ethtool_op_get_perm_addr > > routine > > > > Add generic ethtool operation for getting permanent hardware address. > > > > Signed-off-by: John W. Linville <[EMAIL PROTECTED]> > > --- > > This moves and renames the basically generic e1000_get_perm_addr > > routine to ethtool_op_get_perm_addr, and causes e1000 to make use of > > the new name. 
> > > > drivers/net/e1000/e1000_ethtool.c |9 + > > include/linux/ethtool.h |1 + > > net/core/ethtool.c|7 +++ > > 3 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/net/e1000/e1000_ethtool.c > > b/drivers/net/e1000/e1000_ethtool.c > > --- a/drivers/net/e1000/e1000_ethtool.c > > +++ b/drivers/net/e1000/e1000_ethtool.c > > @@ -1704,13 +1704,6 @@ e1000_get_strings(struct net_device *net > > } > > } > > > > -static int > > -e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr > *eaddr) > > -{ > > - memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN); > > - return 0; > > -} > > - > > struct ethtool_ops e1000_ethtool_ops = { > > .get_settings = e1000_get_settings, > > .set_settings = e1000_set_settings, > > @@ -1746,7 +1739,7 @@ struct ethtool_ops e1000_ethtool_ops = { > > .phys_id= e1000_phys_id, > > .get_stats_count= e1000_get_stats_count, > > .get_ethtool_stats = e1000_get_ethtool_stats, > > - .get_perm_addr = e1000_get_perm_addr, > > + .get_perm_addr = ethtool_op_get_perm_addr, > > }; > > > > void e1000_set_ethtool_ops(struct net_device *netdev) > > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h > > --- a/include/linux/ethtool.h > > +++ b/include/linux/ethtool.h > > @@ -268,6 +268,7 @@ u32 ethtool_op_get_sg(struct net_device > > int ethtool_op_set_sg(struct net_device *dev, u32 data); > > u32 ethtool_op_get_tso(struct net_device *dev); > > int ethtool_op_set_tso(struct net_device *dev, u32 data); > > +int ethtool_op_get_perm_addr(struct net_device *dev, struct > ethtool_addr > > *); > > > > /** > > * &ethtool_ops - Alter and report network device settings > > diff --git a/net/core/ethtool.c b/net/core/ethtool.c > > --- a/net/core/ethtool.c > > +++ b/net/core/ethtool.c > > @@ -81,6 +81,12 @@ int ethtool_op_set_tso(struct net_device > > return 0; > > } > > > > +int ethtool_op_get_perm_addr(struct net_device *netdev, struct > > ethtool_addr *eaddr) > > +{ > > + memcpy(eaddr->addr, netdev->perm_addr, 
ETH_MAX_ADDR_LEN); > > + return 0; > > +} > > + > > /* Handlers for each ethtool command */ > > > > static int ethtool_get_settings(struct net_device *dev, void __user > > *useraddr) > > @@ -845,6 +851,7 @@ int dev_ethtool(struct ifreq *ifr) > > > > EXPORT_SYMBOL(dev_ethtool); > > EXPORT_SYMBOL(ethtool_op_get_link); > > +EXPORT_SYMBOL_GPL(ethtool_op_get_perm_addr); > > EXPORT_SYMBOL(ethtool_op_get_sg); > > EXPORT_SYMBOL(ethtool_op_get_tso); > > EXPORT_SYMBOL(ethtool_op_get_tx_csum); > > -- > > John W. Linville > > [EMAIL PROTECTED]
RE: [patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine
Stupid question: Can we assume ethtool will only be used for networking devices with a 6-byte hardware address? If not, then the driver-specific approach would give the flexibility of copying anything up to MAX_ADDR_LEN. Perhaps increasing the count to MAX_ADDR_LEN is the way to go?? 6/half-dozen -Jeb > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > On Behalf Of John W. Linville > Sent: Wednesday, July 27, 2005 6:15 PM > To: Jon Wetzel > Cc: netdev@vger.kernel.org; [EMAIL PROTECTED] > Subject: [patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr > routine > > Add generic ethtool operation for getting permanent hardware address. > > Signed-off-by: John W. Linville <[EMAIL PROTECTED]> > --- > This moves and renames the basically generic e1000_get_perm_addr > routine to ethtool_op_get_perm_addr, and causes e1000 to make use of > the new name. > > drivers/net/e1000/e1000_ethtool.c |9 + > include/linux/ethtool.h |1 + > net/core/ethtool.c|7 +++ > 3 files changed, 9 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/e1000/e1000_ethtool.c > b/drivers/net/e1000/e1000_ethtool.c > --- a/drivers/net/e1000/e1000_ethtool.c > +++ b/drivers/net/e1000/e1000_ethtool.c > @@ -1704,13 +1704,6 @@ e1000_get_strings(struct net_device *net > } > } > > -static int > -e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr *eaddr) > -{ > - memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN); > - return 0; > -} > - > struct ethtool_ops e1000_ethtool_ops = { > .get_settings = e1000_get_settings, > .set_settings = e1000_set_settings, > @@ -1746,7 +1739,7 @@ struct ethtool_ops e1000_ethtool_ops = { > .phys_id= e1000_phys_id, > .get_stats_count= e1000_get_stats_count, > .get_ethtool_stats = e1000_get_ethtool_stats, > - .get_perm_addr = e1000_get_perm_addr, > + .get_perm_addr = ethtool_op_get_perm_addr, > }; > > void e1000_set_ethtool_ops(struct net_device *netdev) > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h > --- 
a/include/linux/ethtool.h > +++ b/include/linux/ethtool.h > @@ -268,6 +268,7 @@ u32 ethtool_op_get_sg(struct net_device > int ethtool_op_set_sg(struct net_device *dev, u32 data); > u32 ethtool_op_get_tso(struct net_device *dev); > int ethtool_op_set_tso(struct net_device *dev, u32 data); > +int ethtool_op_get_perm_addr(struct net_device *dev, struct ethtool_addr > *); > > /** > * &ethtool_ops - Alter and report network device settings > diff --git a/net/core/ethtool.c b/net/core/ethtool.c > --- a/net/core/ethtool.c > +++ b/net/core/ethtool.c > @@ -81,6 +81,12 @@ int ethtool_op_set_tso(struct net_device > return 0; > } > > +int ethtool_op_get_perm_addr(struct net_device *netdev, struct > ethtool_addr *eaddr) > +{ > + memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN); > + return 0; > +} > + > /* Handlers for each ethtool command */ > > static int ethtool_get_settings(struct net_device *dev, void __user > *useraddr) > @@ -845,6 +851,7 @@ int dev_ethtool(struct ifreq *ifr) > > EXPORT_SYMBOL(dev_ethtool); > EXPORT_SYMBOL(ethtool_op_get_link); > +EXPORT_SYMBOL_GPL(ethtool_op_get_perm_addr); > EXPORT_SYMBOL(ethtool_op_get_sg); > EXPORT_SYMBOL(ethtool_op_get_tso); > EXPORT_SYMBOL(ethtool_op_get_tx_csum); > -- > John W. Linville > [EMAIL PROTECTED]
[patch 2.6.13-rc3] ethtool: add generic ethtool_op_get_perm_addr routine
Add generic ethtool operation for getting permanent hardware address. Signed-off-by: John W. Linville <[EMAIL PROTECTED]> --- This moves and renames the basically generic e1000_get_perm_addr routine to ethtool_op_get_perm_addr, and causes e1000 to make use of the new name. drivers/net/e1000/e1000_ethtool.c |9 + include/linux/ethtool.h |1 + net/core/ethtool.c|7 +++ 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c --- a/drivers/net/e1000/e1000_ethtool.c +++ b/drivers/net/e1000/e1000_ethtool.c @@ -1704,13 +1704,6 @@ e1000_get_strings(struct net_device *net } } -static int -e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr *eaddr) -{ - memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN); - return 0; -} - struct ethtool_ops e1000_ethtool_ops = { .get_settings = e1000_get_settings, .set_settings = e1000_set_settings, @@ -1746,7 +1739,7 @@ struct ethtool_ops e1000_ethtool_ops = { .phys_id= e1000_phys_id, .get_stats_count= e1000_get_stats_count, .get_ethtool_stats = e1000_get_ethtool_stats, - .get_perm_addr = e1000_get_perm_addr, + .get_perm_addr = ethtool_op_get_perm_addr, }; void e1000_set_ethtool_ops(struct net_device *netdev) diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -268,6 +268,7 @@ u32 ethtool_op_get_sg(struct net_device int ethtool_op_set_sg(struct net_device *dev, u32 data); u32 ethtool_op_get_tso(struct net_device *dev); int ethtool_op_set_tso(struct net_device *dev, u32 data); +int ethtool_op_get_perm_addr(struct net_device *dev, struct ethtool_addr *); /** * &ethtool_ops - Alter and report network device settings diff --git a/net/core/ethtool.c b/net/core/ethtool.c --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -81,6 +81,12 @@ int ethtool_op_set_tso(struct net_device return 0; } +int ethtool_op_get_perm_addr(struct net_device *netdev, struct ethtool_addr *eaddr) +{ + 
memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN); + return 0; +} + /* Handlers for each ethtool command */ static int ethtool_get_settings(struct net_device *dev, void __user *useraddr) @@ -845,6 +851,7 @@ int dev_ethtool(struct ifreq *ifr) EXPORT_SYMBOL(dev_ethtool); EXPORT_SYMBOL(ethtool_op_get_link); +EXPORT_SYMBOL_GPL(ethtool_op_get_perm_addr); EXPORT_SYMBOL(ethtool_op_get_sg); EXPORT_SYMBOL(ethtool_op_get_tso); EXPORT_SYMBOL(ethtool_op_get_tx_csum); -- John W. Linville [EMAIL PROTECTED]
Re: [Patch 2.6.12.2 1/2]net&ethtool: Add support for getting the permanent hardware address
Build fixup for busted ethtool perm_addr support patch. Signed-off-by: John W. Linville <[EMAIL PROTECTED]> --- The hunk below is busted... On Tue, Jul 26, 2005 at 09:32:38AM -0500, Jon Wetzel wrote: > @@ -683,6 +683,22 @@ > return ret; > } > > +static int ethtool_get_perm_addr(struct net_device *dev, void __user > *useraddr) > +{ > + struct ethtool_addr addr = { ETHTOOL_GPERMADDR }; > + struct ethtool_ops *ops = dev->ethtool_ops; > + > + if (!ops->get_perm_addr){ > + return -EOPNOTSUPP; > + > + ops->get_perm_addr(dev, &addr); > + > + if (copy_to_user(useraddr, &addr, sizeof(addr))) > + return -EFAULT; > + > + return 0; > +} > + > /* The main entry point in this file. Called from net/core/dev.c */ > > int dev_ethtool(struct ifreq *ifr) Patch follows... net/core/ethtool.c |2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/ethtool.c b/net/core/ethtool.c --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -694,7 +694,7 @@ static int ethtool_get_perm_addr(struct struct ethtool_addr addr = { ETHTOOL_GPERMADDR }; struct ethtool_ops *ops = dev->ethtool_ops; - if (!ops->get_perm_addr){ + if (!ops->get_perm_addr) return -EOPNOTSUPP; ops->get_perm_addr(dev, &addr); -- John W. Linville [EMAIL PROTECTED]
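In the quoted hunk, the stray `{` after `if (!ops->get_perm_addr)` opens a block that swallows the rest of the function. The braces then no longer balance, which breaks the build (hence "Build fixup"). The sketch below shows the intended control flow in userspace. `struct dev_sketch`, `struct ops_sketch` and both functions are hypothetical, and the `copy_to_user()` step is replaced by a plain output buffer.

```c
#include <errno.h>
#include <string.h>

struct dev_sketch;

/* Hypothetical ops table; a NULL member means "not supported". */
struct ops_sketch {
	int (*get_perm_addr)(struct dev_sketch *, unsigned char *out);
};

struct dev_sketch {
	const struct ops_sketch *ops;
	unsigned char perm_addr[6];
};

/* Corrected shape: the guard is a single statement, so the rest of
 * the function is reached again and the braces balance. */
static int get_perm_addr_checked(struct dev_sketch *dev, unsigned char *out)
{
	if (!dev->ops->get_perm_addr)
		return -EOPNOTSUPP;

	dev->ops->get_perm_addr(dev, out);
	return 0;	/* the kernel version copies to user space here */
}

/* Example driver op: copy the stored address. */
static int copy_addr(struct dev_sketch *dev, unsigned char *out)
{
	memcpy(out, dev->perm_addr, sizeof(dev->perm_addr));
	return 0;
}
```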
Re: [Patch 2.6.12.2 2/2]e1000: Add support for getting a permanent hardware address
On Tue, Jul 26, 2005 at 09:34:10AM -0500, Jon Wetzel wrote: > This patch gives the e1000 driver the ability to retrieve the permanent > hardware address of its device, via the framework established in part 1 > of this patch series. This patch fills in the new perm_addr field on > probing, and implements the get_perm_addr ethtool. > @@ -1663,6 +1663,13 @@ > } > } > > +static int > +e1000_get_perm_addr(struct net_device *netdev, struct ethtool_addr *eaddr) > +{ > + memcpy(eaddr->addr, netdev->perm_addr, ETH_MAX_ADDR_LEN); > + return 0; > +} > + > struct ethtool_ops e1000_ethtool_ops = { > .get_settings = e1000_get_settings, > .set_settings = e1000_set_settings, This seems pretty generic, especially since you have added perm_addr to the net_device structure. How about if we reform it as ethtool_op_get_perm_addr, so that all drivers can use it? Patch to follow... John P.S. Would a driver ever need to implement its own version of this function? Since perm_addr is in the net_device structure, is there a cleaner way to do this? Just thinking out loud... -- John W. Linville [EMAIL PROTECTED]
Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.
Herbert Xu wrote: On Wed, Jul 27, 2005 at 03:18:39PM -0700, David S. Miller wrote: One idea tossed around between Herbert Xu (also CC:'d) and myself is to store a generation counter when we attach a route to a socket, then sk_dst_check() can verify that this generation count matches the current IPSEC flow cache generation count. Yes we did talk about having generation IDs for IPsec dst entries. However, it doesn't help us when IPsec SAs change. The flow cache generation ID is only incremented for policy changes, not state changes. This particular bug report relates to the case where SAs are renegotiated but the policy remains unchanged. IMHO this is something that user space can and should deal with. All the KM has to do is to delete the old outbound SA when the new outbound SA has been negotiated. This will cause all new traffic to start using the new SA immediately. It will also allow the remote side to continue using the old SA until it expires since we're not removing the existing inbound SA. The key management protocols define when and how to delete old SAs during rekeying, so we may not be able to delete the outbound SA for that reason. But we can delete the old SA(s) when implementing the delete procedure in IKEv2 and KINK. IKEv1 does not define rekeying, so there we can probably delete the old outbound SA as described above. We could do this in the kernel. However, it'll end up being harder since the kernel doesn't really know which old SA(s) the new SA is meant to replace. I'm anxious about MIPL 2.0 because it is implemented on the xfrm architecture. -- Kazunori Miyazawa
Re: [2.6 patch] NETCONSOLE must depend on INET
From: Matt Mackall <[EMAIL PROTECTED]> Date: Tue, 26 Jul 2005 19:36:37 -0700 > # HG changeset patch > # User [EMAIL PROTECTED] > # Node ID 6cdd6f36d53678a016cfbf5ce667cbd91504d538 > # Parent 75716ae25f9d87ee2a5ef7c4df2d8f86e0f3f762 > Move in_aton from net/ipv4/utils.c to net/core/utils.c This patch doesn't apply: in the current 2.6.x GIT tree, NETCONSOLE does not depend on NETDEVICES. Please fix up this patch so that I can apply it. Thanks.
Re: 2.6.10 Kernel Goes Crazy After Resetting MTU
On 7/27/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > I sent the following posting to linux-kernel at kernel.org. One reply > suggested I send it to [EMAIL PROTECTED] (mailto:netdev@vger.kernel.org) , so > I am doing that. > > Below are some rather extensive logs showing the Linux 2.6.10 kernel > (kernel.org source) going into a tail spin after an "administration program" > on my > server tried to reset the mtu on all four of Ethernet Ports. I believe only > eth1 was changed -- from 9000 to 1500. But the admin program always resets > all > ports (even if the settings don't change). > > In about 1/2 a second, we generated this tremendous list of errors. Then > everything went back to normal (except that all of my users on all subnets > had > their Windows workstations freeze when it occurred) > This is most likely due to the well-known problem (we're working on a fix) where the use of jumbo frames appears to cause extreme memory pressure, because the driver looks for a contiguous 32k piece of memory for each descriptor. I don't know if this is indicative of some other problem in the memory manager not defragmenting often enough, but we're getting a lot of reports recently of problems like this with jumbos. We are working on a patch to change how the driver works in order to avoid this kind of scenario. It's not quite complete yet, however. Jesse
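Jesse's explanation turns on allocation size: each jumbo receive buffer needs one physically contiguous block. The arithmetic below is a sketch under stated assumptions, not the driver's code. It assumes a 16 KiB receive buffer for a 9000-byte MTU, a hypothetical 512 bytes of per-buffer bookkeeping, 4 KiB pages, and an allocator that rounds requests up to a power of two. Under those assumptions, a request just over 16 KiB lands in a 32 KiB bucket. That is an order-3 block of eight contiguous pages, which is far harder to find on a fragmented system than the single page a 1500-byte frame needs.

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u	/* assumed 4 KiB pages */

/* Round up to the next power of two, like a power-of-two allocator. */
static size_t pow2_bucket(size_t n)
{
	size_t b = 1;

	while (b < n)
		b <<= 1;
	return b;
}

/* Order (log2 of the page count) of the contiguous block needed. */
static unsigned int alloc_order(size_t bytes)
{
	size_t pages = (pow2_bucket(bytes) + PAGE_SIZE_SKETCH - 1) / PAGE_SIZE_SKETCH;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

For example, `alloc_order(16384 + 512)` is 3, while `alloc_order(1500 + 512)` is 0.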
Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.
On Wed, Jul 27, 2005 at 03:18:39PM -0700, David S. Miller wrote: > > One idea tossed around between Herbert Xu (also CC:'d) and myself is > to store a generation counter when we attach a route to a socket, then > sk_dst_check() can verify that this generation count matches the > current IPSEC flow cache generation count. Yes we did talk about having generation IDs for IPsec dst entries. However, it doesn't help us when IPsec SAs change. The flow cache generation ID is only incremented for policy changes, not state changes. This particular bug report relates to the case where SAs are renegotiated but the policy remains unchanged. IMHO this is something that user space can and should deal with. All the KM has to do is to delete the old outbound SA when the new outbound SA has been negotiated. This will cause all new traffic to start using the new SA immediately. It will also allow the remote side to continue using the old SA until it expires since we're not removing the existing inbound SA. We could do this in the kernel. However, it'll end up being harder since the kernel doesn't really know which old SA(s) the new SA is meant to replace. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
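Herbert's suggestion can be modelled with a toy outbound-SA table. This is a hypothetical sketch, not the xfrm code. `struct sa_table`, `pick_outbound()` and `km_rekey_done()` are invented names, and the "oldest SA wins" selection rule simply mimics what the bug report observes. Once the key manager deletes the old outbound SA after adding the new one, the new SA is the only remaining candidate. The inbound SAs are not touched, so the peer may keep sending on the old one until it expires. The SPI values are taken from the racoon logs in the bug report.

```c
#include <stdint.h>

#define MAX_SA 4

/* Hypothetical outbound SA list for one policy, oldest first. */
struct sa_table {
	uint32_t spi[MAX_SA];
	int n;
};

/* Toy selection rule mimicking the bug report: the oldest SA wins. */
static uint32_t pick_outbound(const struct sa_table *t)
{
	return t->n > 0 ? t->spi[0] : 0;
}

static void sa_add(struct sa_table *t, uint32_t spi)
{
	if (t->n < MAX_SA)
		t->spi[t->n++] = spi;
}

static int sa_delete(struct sa_table *t, uint32_t spi)
{
	for (int i = 0; i < t->n; i++) {
		if (t->spi[i] != spi)
			continue;
		for (int j = i; j + 1 < t->n; j++)
			t->spi[j] = t->spi[j + 1];
		t->n--;
		return 0;
	}
	return -1;	/* no such SA */
}

/* Herbert's suggestion: after the rekey, the key manager removes the old
 * outbound SA. Inbound SAs (not modelled) are left alone. */
static void km_rekey_done(struct sa_table *out, uint32_t old_spi, uint32_t new_spi)
{
	sa_add(out, new_spi);
	sa_delete(out, old_spi);
}
```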
Re: [PATCH 2.6.13rc3] IPv6: Check interface bindings on IPv6 raw socket reception
From: Patrick McHardy <[EMAIL PROTECTED]> Date: Sun, 24 Jul 2005 07:39:12 +0200 > [IPV4/6]: Check if packet was actually delivered to a raw socket to decide > whether to send an ICMP unreachable > > Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> Applied, thanks Patrick.
Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)
> On Jul 27, 2005, at 13:08, Randy Dunlap wrote: > > > > > > >> On Jul 25, 2005, at 16:06, Francois Romieu wrote: > >> > >> > >> > +int mdiobus_register(struct mii_bus *bus) > +{ > +int i; > +int err = 0; > + > +spin_lock_init(&bus->mdio_lock); > + > +if (NULL == bus || NULL == bus->name || > +NULL == bus->read || > +NULL == bus->write) > > > >>> > >>> Be spartan: > >>> if (!bus || !bus->name || !bus->read || !bus->write) > >>> > >> > >> > >> I think we have to agree to disagree here. I could be convinced, but > >> I'm partial to using NULL explicitly. > >> > > > > But there are 2 issues here (at least). One is to use NULL or > > not. The other is using (constant == var) or (var == constant). > > > > It's not described in CodingStyle afaik, but most recent email > > on the subject strongly prefers (var == constant) [in my > > unscientific survey -- of bits in my head]. > > > > So using the suggested style will fix both of these. :) > > > Ok, here I won't agree to disagree with you. !foo as a check for > NULL is a reasonable idea, but not my style. If that's the preferred > style for the kernel, I will do that. > > But (var == constant) is a style that asks for errors. By putting > the constant first in these checks, you never run the risk of leaving > a bug like this: > > if (dev = NULL) > ... > > This kind of error is quite frustrating to detect, and the eye will > often miss it when scanning for errors. If you follow constant == > var, though, then the bug looks like this: > > if (NULL = dev) > > which is instantly caught by the compiler. > > Just my 32 cents Yes, we know about that argument. :) > +/* Otherwise, we allocate the device, and initialize the > + * default values */ > +dev = kmalloc(sizeof(*dev), GFP_KERNEL); > + > +if (NULL == dev) { > +errno = -ENOMEM; > +return NULL; > +} > + > +memset(dev, 0, sizeof(*dev)); > > > >>> > >>> The kernel provides kcalloc. > >>> > >> > >> > >> I went looking for it, and found it in fs/cifs/misc.c.
I'm hesitant > >> to link to a function defined in the filesystem code just to save 1 > >> line of code > >> > > > > It's more global than that. > > > Should we move the function, then, to include/linux/slab.h? Or > somewhere else? It's there, like Francois said. Get use a current tree. --- ~Randy
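Both review points carry over to userspace C. The sketch below uses a hypothetical `struct mii_bus_sketch` rather than the real `struct mii_bus`. Its `!ptr` checks run before any field of `bus` is touched; the quoted `mdiobus_register()` calls `spin_lock_init(&bus->mdio_lock)` before its NULL test. `calloc()` plays the role that `kcalloc()` plays in the kernel: one call that returns zeroed memory. The `-EINVAL` return value is an assumption for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mii_bus, reduced to what is checked. */
struct mii_bus_sketch {
	const char *name;
	int (*read)(struct mii_bus_sketch *bus, int addr, int reg);
	int (*write)(struct mii_bus_sketch *bus, int addr, int reg,
		     unsigned short val);
	int lock_ready;		/* stands in for spin_lock_init() */
};

static int bus_register_sketch(struct mii_bus_sketch *bus)
{
	/* Validate first: nothing may dereference bus before this test. */
	if (!bus || !bus->name || !bus->read || !bus->write)
		return -EINVAL;		/* assumed error code */

	bus->lock_ready = 1;
	return 0;
}

/* calloc() plays the role of kcalloc(): allocation and zeroing in one call. */
static struct mii_bus_sketch *bus_alloc_sketch(void)
{
	return calloc(1, sizeof(struct mii_bus_sketch));
}
```

With `!bus` style, an accidental `=` cannot appear in the condition at all, which also addresses the typo concern that motivates the `constant == var` habit.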
Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.
From: Andrew Morton <[EMAIL PROTECTED]> Date: Wed, 27 Jul 2005 14:38:35 -0700 >Summary: IPSec incompabilty. Linux kernel waits to long to start > using new SA for outbound traffic. I think this is the known bug where we don't notice that a route attached to a socket is obsolete. It was first pointed out to me last year by Kazunori Miyazawa, CC:'d here. The problem is that, when we update IPSEC rules, sockets currently don't have a way to discover that. Traditionally, the route "obsolete" flag served this purpose, and that does work properly for normal route entries. But for IPSEC, we don't have a way to find all of the stacked routes we created that match a particular SA, and thus get them fixed up the next time a socket tries to send a packet. One idea tossed around between Herbert Xu (also CC:'d) and myself is to store a generation counter when we attach a route to a socket, then sk_dst_check() can verify that this generation count matches the current IPSEC flow cache generation count. Something like the following, untested patch, demonstrates the idea. [NET]: Tie obsolete state of routes also to flow cache generation count. This fixes the problem wherein IPSEC SA changes do not get noticed by cached socket routes. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/include/net/sock.h b/include/net/sock.h --- a/include/net/sock.h +++ b/include/net/sock.h @@ -54,6 +54,7 @@ #include #include #include +#include /* * This structure really needs to be cleaned up. 
@@ -193,6 +194,7 @@ struct sock { socket_lock_t sk_lock; wait_queue_head_t *sk_sleep; struct dst_entry *sk_dst_cache; + unsigned int sk_dst_cache_genid; struct xfrm_policy *sk_policy[2]; rwlock_t sk_dst_lock; atomic_t sk_rmem_alloc; @@ -924,6 +926,9 @@ __sk_dst_set(struct sock *sk, struct dst old_dst = sk->sk_dst_cache; sk->sk_dst_cache = dst; +#ifdef CONFIG_XFRM + sk->sk_dst_cache_genid = atomic_read(&flow_cache_genid); +#endif dst_release(old_dst); } @@ -958,7 +963,9 @@ __sk_dst_check(struct sock *sk, u32 cook { struct dst_entry *dst = sk->sk_dst_cache; - if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) { + if (dst && + ((dst->obsolete && dst->ops->check(dst, cookie) == NULL) || + (sk->sk_dst_cache_genid != atomic_read(&flow_cache_genid)))) { sk->sk_dst_cache = NULL; dst_release(dst); return NULL; @@ -972,7 +979,9 @@ sk_dst_check(struct sock *sk, u32 cookie { struct dst_entry *dst = sk_dst_get(sk); - if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) { + if (dst && + ((dst->obsolete && dst->ops->check(dst, cookie) == NULL) || + (sk->sk_dst_cache_genid != atomic_read(&flow_cache_genid)))) { sk_dst_reset(sk); dst_release(dst); return NULL;
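The generation-count idea in the patch reduces to a small userspace model using C11 atomics. All names here are hypothetical. `genid_sketch` stands in for `flow_cache_genid`, and reference counting (`dst_release()`) is not modelled. As Herbert points out in his reply, the real flow cache generation count is bumped only on policy changes, so this mechanism by itself would not catch an SA rekey.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Global generation count; flow_cache_genid plays this role in the patch. */
static atomic_uint genid_sketch = 0;

struct route_sketch {
	int id;
};

struct sock_sketch {
	struct route_sketch *dst_cache;	/* cached route, or NULL */
	unsigned int dst_cache_genid;	/* genid seen when it was cached */
};

static void sk_dst_set_sketch(struct sock_sketch *sk, struct route_sketch *dst)
{
	sk->dst_cache = dst;
	sk->dst_cache_genid = atomic_load(&genid_sketch);
}

/* Drop the cached route if the generation moved since it was attached,
 * forcing the caller to do a fresh route lookup. */
static struct route_sketch *sk_dst_check_sketch(struct sock_sketch *sk)
{
	if (sk->dst_cache && sk->dst_cache_genid != atomic_load(&genid_sketch))
		sk->dst_cache = NULL;
	return sk->dst_cache;
}

/* Called whenever the state the cached routes depend on changes. */
static void bump_genid_sketch(void)
{
	atomic_fetch_add(&genid_sketch, 1);
}
```

The appeal of the scheme is that invalidation costs one increment, with no need to find every socket holding a stale route. Each socket discovers staleness lazily on its next check.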
Fw: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.
Begin forwarded message: Date: Wed, 27 Jul 2005 14:31:20 -0700 From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic. http://bugzilla.kernel.org/show_bug.cgi?id=4952 Summary: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic. Kernel Version: 2.6.12.3 Status: NEW Severity: high Owner: [EMAIL PROTECTED] Submitter: [EMAIL PROTECTED] Problem Description: Linux kernel waits to long to start using new SA for outbound traffic. This is wrong, and creates real problems when peer stops supporting old SA faster than it should. Steps to reproduce: I started pinging over IPSec tunnel. Racoon created the pair of IPsec-SA: Jul 27 21:18:06 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=209244158(0xc78cffe) Jul 27 21:18:06 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=2824435949(0xa85978ed) Everything worked fine: 2005-07-27 21:18:24.451903 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0x1), length 116 2005-07-27 21:18:24.486625 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x0c78cffe,seq=0x1), length 116 2005-07-27 21:18:25.453360 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0x2), length 116 2005-07-27 21:18:25.482251 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x0c78cffe,seq=0x2), length 116 After ~2880s keys expired: Jul 27 22:06:06 gw1 racoon: INFO: IPsec-SA expired: ESP/Tunnel YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=209244158(0xc78cffe) Jul 27 22:06:06 gw1 racoon: INFO: initiate new phase 2 negotiation: XXX.XX.XX.XXX[0]<=>YYY.YY.YYY.YYY[0] Jul 27 22:06:06 gw1 racoon: INFO: IPsec-SA expired: ESP/Tunnel XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=2824435949(0xa85978ed) Racoon negotiated new SA: Jul 27 22:06:10 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=150510678(0x8f89c56)
Jul 27 22:06:10 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=3608609595(0xd717033b) Linux kernel was still using old SA (spi=0xa85978ed), peer switched into new SA (spi=0x08f89c56) 2005-07-27 22:06:10.634929 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb2c), length 116 2005-07-27 22:06:10.987012 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x0c78cffe,seq=0xb28), length 116 2005-07-27 22:06:10.992134 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x0c78cffe,seq=0xb29), length 116 2005-07-27 22:06:10.997382 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x0c78cffe,seq=0xb2a), length 116 2005-07-27 22:06:11.636814 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb2d), length 116 2005-07-27 22:06:11.665220 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x0c78cffe,seq=0xb2b), length 116 2005-07-27 22:06:12.638681 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb2e), length 116 2005-07-27 22:06:12.666848 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x08f89c56,seq=0x1), length 116 2005-07-27 22:06:13.640549 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb2f), length 116 2005-07-27 22:06:13.673727 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x08f89c56,seq=0x2), length 116 2005-07-27 22:06:14.642430 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb30), length 116 2005-07-27 22:06:14.670360 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x08f89c56,seq=0x3), length 116 2005-07-27 22:06:15.643304 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb31), length 116 2005-07-27 22:06:15.670616 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x08f89c56,seq=0x4), length 116 IPSec peer initiated new key renegotiation: Jul 27 22:06:52 gw1 racoon: INFO: respond new phase 1 negotiation: XXX.XX.XX.XXX[500]<=>YYY.YY.YYY.YYY[500] Jul 27 22:06:52 gw1 racoon: INFO: begin Identity Protection mode.
Jul 27 22:06:58 gw1 racoon: INFO: ISAKMP-SA established XXX.XX.XX.XXX[500]-YYY.YY.YYY.YYY[500] spi:aa103d22d3e480b3:1b470506cfc95a33 Jul 27 22:07:02 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel YYY.YY.YYY.YYY[0]->XXX.XX.XX.XXX[0] spi=3835294(0x3a859e) Jul 27 22:07:02 gw1 racoon: INFO: IPsec-SA established: ESP/Tunnel XXX.XX.XX.XXX[0]->YYY.YY.YYY.YYY[0] spi=2997278258(0xb2a6d632) Linux was still using old SA(spi=0xa85978ed), peer stopped accepting this SA: 2005-07-27 22:07:03.691323 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb61), length 116 2005-07-27 22:07:03.718660 IP YYY.YY.YYY.YYY > XXX.XX.XX.XXX: ESP(spi=0x08f89c56,seq=0x34), length 116 2005-07-27 22:07:04.692194 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb62), length 116 2005-07-27 22:07:05.692064 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb63), length 116 2005-07-27 22:07:06.691950 IP XXX.XX.XX.XXX > YYY.YY.YYY.YYY: ESP(spi=0xa85978ed,seq=0xb64), length 116 2005-07-
Re: [PATCH 2.6.13rc3] IPv6: Check interface bindings on IPv6 raw socket reception
From: Andrew McDonald <[EMAIL PROTECTED]> Date: Sat, 23 Jul 2005 19:04:43 +0100 > Take account of whether a socket is bound to a particular device when > selecting an IPv6 raw socket to receive a packet. Also perform this > check when receiving IPv6 packets with router alert options. > > Signed-off-by: Andrew McDonald <[EMAIL PROTECTED]> Applied, thanks Andrew.
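The two raw-socket fixes applied in this thread can be illustrated together. Andrew's change makes a device-bound socket skip packets that arrived on other interfaces. Patrick's change, applied separately, bases the ICMP-unreachable decision on whether any raw socket actually received the packet. The model below is a hypothetical sketch, not the net/ipv6/raw.c code. `bound_dev_if` loosely mirrors `sk->sk_bound_dev_if`, with 0 meaning unbound, and protocol 89 in the test is just an arbitrary protocol number.

```c
#include <stddef.h>

/* Hypothetical minimal raw socket: wanted protocol and bound interface. */
struct raw_sock_sketch {
	int protocol;
	int bound_dev_if;	/* 0 = not bound to any interface */
};

/* A bound socket only sees packets that arrived on its own interface. */
static int raw_match(const struct raw_sock_sketch *sk, int protocol, int dif)
{
	if (sk->protocol != protocol)
		return 0;
	if (sk->bound_dev_if && sk->bound_dev_if != dif)
		return 0;
	return 1;
}

/* Returns how many sockets received the packet; in this model the caller
 * would send an ICMP unreachable only when the result is 0. */
static int raw_deliver(const struct raw_sock_sketch *socks, size_t n,
		       int protocol, int dif)
{
	int delivered = 0;

	for (size_t i = 0; i < n; i++)
		if (raw_match(&socks[i], protocol, dif))
			delivered++;
	return delivered;
}
```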
Re: [PATCH] setsockopt locking fix
From: Kyle Moffett <[EMAIL PROTECTED]>
Subject: Re: [PATCH] setsockopt locking fix
Date: Wed, 27 Jul 2005 16:48:21 -0400

> On Jul 27, 2005, at 16:16:02, David S. Miller wrote:
> > Fix is correct, good thing it only hits Sparc :-)
> >
> > But your patch does not apply cleanly (perhaps your
> > email client mangled it somehow) and I need to have
> > a "Signed-off-by: " line in order to apply the
> > patch.
>
> Fixed. Attached is below. Patch is against a _very_
> recent linus GIT repository, and after mailing it to
> myself, it still applies to a fresh repository here,
> so I'm assuming it's ok:

Applied, thanks Kyle.
Re: 2.6.13-rc3-mm2
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 14:11:51 -0700

> Unbalanced netlink_table_ungrab() in the netlink stuff in git-net.patch.

Applied to net-2.6.14, thanks Andrew.
Re: [2.6 patch] NETCONSOLE must depend on INET
On Wed, Jul 27, 2005 at 01:19:00PM -0700, David S. Miller wrote:
> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Tue, 26 Jul 2005 19:36:37 -0700
>
> > # HG changeset patch
> > # User [EMAIL PROTECTED]
> > # Node ID 6cdd6f36d53678a016cfbf5ce667cbd91504d538
> > # Parent  75716ae25f9d87ee2a5ef7c4df2d8f86e0f3f762
> > Move in_aton from net/ipv4/utils.c to net/core/utils.c
>
> This patch doesn't apply, in the current 2.6.x GIT tree
> NETCONSOLE does not depend on NETDEVICES.

Odd, gitweb of Linus' tree seems to disagree. I see it depends on
NETDEVICES && INET && EXPERIMENTAL. NETDEVICES has been there since
the beginning of git history and according to my Mercurial import from
BKCVS, it's been dependent on NETDEVICES since I first submitted it.

-- 
Mathematics is the supreme nostalgia of our time.
Re: [2.6 patch] NETCONSOLE must depend on INET
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 13:46:22 -0700

> Odd, gitweb of Linus' tree seems to disagree. I see it depends on
> NETDEVICES && INET && EXPERIMENTAL. NETDEVICES has been there since
> the beginning of git history and according to my Mercurial import from
> BKCVS, it's been dependent on NETDEVICES since I first submitted it.

Sorry, that's a result of a local change I just added to fix up the
presentation of the net device family Kconfigs.
Re: 2.6.13-rc3-mm2
Andrew James Wade <[EMAIL PROTECTED]> wrote:
>
> Hello, my kernel crashes on boot with the following BUG():

Indeed it will.

> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 pin1=2 pin2=-1
> softlockup thread 0 started up.
> NET: Registered protocol family 16
> [ cut here ]
> kernel BUG at kernel/sched.c:2888!
> invalid operand: [#1]
> PREEMPT
> last sysfs file:
> CPU:    0
> EIP:    0060:[]    Not tainted VLI
> EFLAGS: 00010202   (2.6.13-rc3-mm2)
> EIP is at sub_preempt_count+0x35/0x40
> eax: dff8   ebx:    ecx: 0001   edx: 0001
> esi: dffc3d18   edi:    ebp: dff81f50   esp: dff81f50
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 1, threadinfo=dff8 task=c14d9a10)
> Stack: c038a5fe 0003 c048f5e0 c048f780 c048f780
>        dff8d544 c038bcaa c0386d30 dffc3d18 000f
>        000f dff8d544 c04f2bf3 0021 0021 c04f2e8d
> Call Trace:
>  [] netlink_create+0x5e/0x120
>  [] netlink_kernel_create+0x13a/0x240
>  [] rtnetlink_rcv+0x0/0x390
>  [] rtnetlink_init+0x53/0xa0
>  [] netlink_proto_init+0x18d/0x200
>  [] do_initcalls+0x2b/0xc0
>  [] kern_mount+0x15/0x19
>  [] init+0x0/0x110
>  [] init+0x2f/0x110
>  [] kernel_thread_helper+0x0/0x18
>  [] kernel_thread_helper+0x5/0x18
> Code: 89 e5 3b 50 14 7f 24 81 fa fe 00 00 00 76 0c b8 00 e0 ff ff 21 e0 29 50
> 14 c9 c3 80 78 14 00 75 ee 0f 0b 4c 0b 66 50 41 c0 eb e4
> <0f> 0b 48 0b 66 50 41 c0 eb d2 90 55 8b 40 04 89 e5 c9 e9 54 f5
> <0>Kernel panic - not syncing: Attempted to kill init!
>

Unbalanced netlink_table_ungrab() in the netlink stuff in git-net.patch.

--- devel/net/netlink/af_netlink.c~netlink-locking-fix	2005-07-27 14:10:07.0 -0700
+++ devel-akpm/net/netlink/af_netlink.c	2005-07-27 14:10:16.0 -0700
@@ -349,12 +349,12 @@ static int netlink_create(struct socket
 	netlink_table_grab();
 
 	if (!nl_table[protocol].hash.entries) {
-		netlink_table_ungrab();
 #ifdef CONFIG_KMOD
 		/* We do 'best effort'. If we find a matching module,
 		 * it is loaded. If not, we don't return an error to
 		 * allow pure userspace<->userspace communication.
 		 * -HW */
+		netlink_table_ungrab();
 		request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol);
 		netlink_table_grab();
 #endif
_
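The bug is a lock-balance problem: before the fix, the table lock was released whenever the protocol had no entries, but it was only re-taken inside `#ifdef CONFIG_KMOD`. The sketch below models both versions with a depth counter, assuming (as the hunk context and the `sub_preempt_count()` BUG suggest) that `netlink_create()` releases the lock once more on its way out. Everything here is hypothetical instrumentation: `lock_depth`/`underflow` replace the real lock, `load_module()` stands in for `request_module()`, and the `kmod_enabled` argument replaces the compile-time `CONFIG_KMOD` test.

```c
#include <stdbool.h>

/* Hypothetical instrumentation: grab/ungrab only track lock depth so
 * an unbalanced release becomes visible (the kernel BUG()s instead). */
static int lock_depth;
static bool underflow;

static void table_grab(void) { lock_depth++; }

static void table_ungrab(void)
{
    if (lock_depth == 0)
        underflow = true;       /* one release too many */
    else
        lock_depth--;
}

static void load_module(int protocol) { (void)protocol; /* may sleep */ }

/* Shape of the fixed code: drop the lock only around the sleeping call. */
static void sketch_create_fixed(int protocol, bool have_entries,
                                bool kmod_enabled)
{
    table_grab();
    if (!have_entries && kmod_enabled) {
        table_ungrab();
        load_module(protocol);
        table_grab();
    }
    /* ... look up and register the socket under the lock ... */
    table_ungrab();
}

/* Shape of the buggy code: the early release is unconditional. */
static void sketch_create_buggy(int protocol, bool have_entries,
                                bool kmod_enabled)
{
    table_grab();
    if (!have_entries) {
        table_ungrab();         /* released even without CONFIG_KMOD ... */
        if (kmod_enabled) {
            load_module(protocol);
            table_grab();
        }
    }
    table_ungrab();             /* ... so this release underflows */
}
```

With `kmod_enabled` false and no entries for the protocol, the buggy shape releases twice after a single grab, while the fixed shape stays balanced in every configuration.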
Re: [PATCH] setsockopt locking fix
On Jul 27, 2005, at 16:16:02, David S. Miller wrote:
> Fix is correct, good thing it only hits Sparc :-)
>
> But your patch does not apply cleanly (perhaps your
> email client mangled it somehow) and I need to have
> a "Signed-off-by: " line in order to apply the
> patch.

Fixed. Attached is below. Patch is against a _very_
recent linus GIT repository, and after mailing it to
myself, it still applies to a fresh repository here,
so I'm assuming it's ok:

setsockopt-locking-fix.patch
Description: Binary data

Cheers,
Kyle Moffett

-- 
I lost interest in "blade servers" when I found they didn't throw knives
at people who weren't supposed to be in your machine room.
  -- Anthony de Boer
Re: [PATCH] setsockopt locking fix
From: Kyle Moffett <[EMAIL PROTECTED]>
Date: Wed, 27 Jul 2005 13:47:30 -0400

> # HG changeset patch
> # User Kyle Moffett <[EMAIL PROTECTED]>
> # Node ID 77475acbe89242e63e6fd73dc66fe52643011ed7
> # Parent  43cd2abd0f4c5d2e8ee4666d6bf1f0b96e252e54
> Fix a bug where sock_reset_flag() was called without lock_sock()

Fix is correct, good thing it only hits Sparc :-)

But your patch does not apply cleanly (perhaps your
email client mangled it somehow) and I need to have
a "Signed-off-by: " line in order to apply the
patch.

So please fix this up, thanks.
[-mm patch] include/net/ieee80211.h must #include
gcc found an (although perhaps harmless) bug:

<--  snip  -->

...
  CC      net/ieee80211/ieee80211_crypt.o
In file included from net/ieee80211/ieee80211_crypt.c:21:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
  CC      net/ieee80211/ieee80211_crypt_wep.o
In file included from net/ieee80211/ieee80211_crypt_wep.c:20:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
  CC      net/ieee80211/ieee80211_crypt_ccmp.o
  CC      net/ieee80211/ieee80211_crypt_tkip.o
In file included from net/ieee80211/ieee80211_crypt_tkip.c:23:
include/net/ieee80211.h:26:5: warning: "WIRELESS_EXT" is not defined
...

<--  snip  -->


Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

This patch was already sent on:
- 22 Jul 2005

--- linux-2.6.13-rc3-mm1-full/include/net/ieee80211.h.old	2005-07-22 18:37:57.0 +0200
+++ linux-2.6.13-rc3-mm1-full/include/net/ieee80211.h	2005-07-22 18:38:10.0 +0200
@@ -22,6 +22,7 @@
 #define IEEE80211_H
 #include  /* ETH_ALEN */
 #include	/* ARRAY_SIZE */
+#include 
 
 #if WIRELESS_EXT < 17
 #define IW_QUAL_QUAL_INVALID   0x10
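The warning is about how the preprocessor treats an identifier it does not know: inside `#if`, an undefined macro evaluates to 0, so `#if WIRELESS_EXT < 17` silently takes the compatibility branch whenever the header that defines `WIRELESS_EXT` has not been included (gcc reports this under `-Wundef`). A minimal standalone demonstration; `legacy_branch_taken()` is a hypothetical name.

```c
/* WIRELESS_EXT is deliberately left undefined here, as it is in
 * ieee80211.h when the header that defines it was not included. */

int legacy_branch_taken(void)
{
#if WIRELESS_EXT < 17   /* undefined identifier evaluates to 0 */
    return 1;           /* 0 < 17, so this branch is compiled in */
#else
    return 0;
#endif
}
```

Compiling this file with `gcc -Wundef -c` should produce the same kind of warning as the build log above, and the function returns 1 even though no wireless-extensions version was ever selected.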
Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)
Andy Fleming <[EMAIL PROTECTED]> :
[kcalloc]
> Should we move the function, then, to include/linux/slab.h?  Or
> somewhere else?

It is already in mm/slab.c

[rc = request_irq(...)]
It appears in drivers/net/*c. Jeff Garzik used to suggest something
similar but it does not matter as long as you do not need to return
an error status (KERN_ERR is probably a bit too strong then).

[initialization of struct phy_setting settings]

#define NITZ(d,t,s) { .speed = s, .duplex = d, .setting = t }

static struct phy_setting settings[] = {
	NITZ(DUPLEX_FULL, SUPPORTED_10000baseT_Full, 10000),
	NITZ(DUPLEX_FULL, SUPPORTED_1000baseT_Full,  SPEED_1000),
	NITZ(DUPLEX_HALF, SUPPORTED_1000baseT_Half,  SPEED_1000),
	NITZ(DUPLEX_FULL, SUPPORTED_100baseT_Full,   SPEED_100),
	NITZ(DUPLEX_HALF, SUPPORTED_100baseT_Half,   SPEED_100),
	NITZ(DUPLEX_FULL, SUPPORTED_10baseT_Full,    SPEED_10),
	NITZ(DUPLEX_HALF, SUPPORTED_10baseT_Half,    SPEED_10),
};

#undef NITZ

-- 
Ueimor
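Francois's initializer macro can be checked in isolation, together with an `ARRAY_SIZE`-bounded version of the `phy_find_setting()` lookup from the patch. This is a userspace sketch: the `DUPLEX_*`, `SPEED_*` and `SUPPORTED_*` macros are redefined locally so the file compiles outside the kernel (their bit values are placeholders, only distinctness matters), and the `struct phy_setting` layout is assumed from the fields used in the patch.

```c
#include <stddef.h>

/* Local placeholders for the ethtool constants. */
#define DUPLEX_HALF 0
#define DUPLEX_FULL 1
#define SPEED_10    10
#define SPEED_100   100
#define SPEED_1000  1000
#define SUPPORTED_10baseT_Half    (1u << 0)
#define SUPPORTED_10baseT_Full    (1u << 1)
#define SUPPORTED_100baseT_Half   (1u << 2)
#define SUPPORTED_100baseT_Full   (1u << 3)
#define SUPPORTED_1000baseT_Half  (1u << 4)
#define SUPPORTED_1000baseT_Full  (1u << 5)
#define SUPPORTED_10000baseT_Full (1u << 12)

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct phy_setting {
    int speed;
    int duplex;
    unsigned int setting;
};

/* One line per entry, fastest first, as in the proposed macro. */
#define NITZ(d, t, s) { .speed = (s), .duplex = (d), .setting = (t) }
static const struct phy_setting settings[] = {
    NITZ(DUPLEX_FULL, SUPPORTED_10000baseT_Full, 10000),
    NITZ(DUPLEX_FULL, SUPPORTED_1000baseT_Full,  SPEED_1000),
    NITZ(DUPLEX_HALF, SUPPORTED_1000baseT_Half,  SPEED_1000),
    NITZ(DUPLEX_FULL, SUPPORTED_100baseT_Full,   SPEED_100),
    NITZ(DUPLEX_HALF, SUPPORTED_100baseT_Half,   SPEED_100),
    NITZ(DUPLEX_FULL, SUPPORTED_10baseT_Full,    SPEED_10),
    NITZ(DUPLEX_HALF, SUPPORTED_10baseT_Half,    SPEED_10),
};
#undef NITZ

/* Same contract as the posted lookup: index of the exact match, or of
 * the last (slowest) entry when nothing matches. */
static size_t phy_find_setting(int speed, int duplex)
{
    size_t idx;

    for (idx = 0; idx < ARRAY_SIZE(settings); idx++)
        if (settings[idx].speed == speed && settings[idx].duplex == duplex)
            return idx;
    return ARRAY_SIZE(settings) - 1;
}
```

Returning from inside the loop avoids using the loop variable after the loop ends, which was Andy's stated reason for preferring `while`, and `ARRAY_SIZE` keeps the bound in sync with the table.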
Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)
On Jul 27, 2005, at 13:08, Randy Dunlap wrote:

>> On Jul 25, 2005, at 16:06, Francois Romieu wrote:
>>
>> >> +int mdiobus_register(struct mii_bus *bus)
>> >> +{
>> >> +	int i;
>> >> +	int err = 0;
>> >> +
>> >> +	spin_lock_init(&bus->mdio_lock);
>> >> +
>> >> +	if (NULL == bus || NULL == bus->name ||
>> >> +			NULL == bus->read ||
>> >> +			NULL == bus->write)
>> >
>> > Be spartan:
>> > 	if (!bus || !bus->name || !bus->read || !bus->write)
>>
>> I think we have to agree to disagree here.  I could be convinced, but
>> I'm partial to using NULL explicitly.
>
> But there are 2 issues here (at least).  One is to use NULL or not.
> The other is using (constant == var) or (var == constant).
> It's not described in CodingStyle afaik, but most recent email
> on the subject strongly prefers (var == constant) [in my unscientific
> survey -- of bits in my head].
>
> So using the suggested style will fix both of these.  :)

Ok, here I won't agree to disagree with you.  !foo as a check for NULL
is a reasonable idea, but not my style.  If that's the preferred style
for the kernel, I will do that.

But (var == constant) is a style that asks for errors.  By putting the
constant first in these checks, you never run the risk of leaving a bug
like this:

if (dev = NULL)
	...

This kind of error is quite frustrating to detect, and the eye will
often miss it when scanning for errors.  If you follow constant == var,
though, then the bug looks like this:

if (NULL = dev)

which is instantly caught by the compiler.

Just my 32 cents

>> >> +	/* Otherwise, we allocate the device, and initialize the
>> >> +	 * default values */
>> >> +	dev = kmalloc(sizeof(*dev), GFP_KERNEL);
>> >> +
>> >> +	if (NULL == dev) {
>> >> +		errno = -ENOMEM;
>> >> +		return NULL;
>> >> +	}
>> >> +
>> >> +	memset(dev, 0, sizeof(*dev));
>> >
>> > The kernel provides kcalloc.
>>
>> I went looking for it, and found it in fs/cifs/misc.c.  I'm hesitant
>> to link to a function defined in the filesystem code just to save 1
>> line of code
>
> It's more global than that.

Should we move the function, then, to include/linux/slab.h?  Or
somewhere else?
Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)
> On Jul 25, 2005, at 16:06, Francois Romieu wrote:
>
> >
> >> +int mdiobus_register(struct mii_bus *bus)
> >> +{
> >> +	int i;
> >> +	int err = 0;
> >> +
> >> +	spin_lock_init(&bus->mdio_lock);
> >> +
> >> +	if (NULL == bus || NULL == bus->name ||
> >> +			NULL == bus->read ||
> >> +			NULL == bus->write)
> >>
> >
> > Be spartan:
> > 	if (!bus || !bus->name || !bus->read || !bus->write)
> >
>
> I think we have to agree to disagree here.  I could be convinced, but
> I'm partial to using NULL explicitly.

But there are 2 issues here (at least).  One is to use NULL or not.
The other is using (constant == var) or (var == constant).
It's not described in CodingStyle afaik, but most recent email
on the subject strongly prefers (var == constant) [in my unscientific
survey -- of bits in my head].

So using the suggested style will fix both of these.  :)

> >> +	/* Otherwise, we allocate the device, and initialize the
> >> +	 * default values */
> >> +	dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> >> +
> >> +	if (NULL == dev) {
> >> +		errno = -ENOMEM;
> >> +		return NULL;
> >> +	}
> >> +
> >> +	memset(dev, 0, sizeof(*dev));
> >>
> >
> > The kernel provides kcalloc.
> >
>
> I went looking for it, and found it in fs/cifs/misc.c.  I'm hesitant
> to link to a function defined in the filesystem code just to save 1
> line of code

It's more global than that.

~Randy
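Independently of the NULL-style question, the quoted hunk calls `spin_lock_init(&bus->mdio_lock)` before it tests `bus` for NULL, so a NULL `bus` is dereferenced before the check can reject it. A userspace sketch of the validate-first ordering, written in the suggested `!ptr` style; the trimmed `struct mii_bus`, the `lock_initialised` field (a stand-in for the spinlock), the stub callbacks and the `-EINVAL` return value are all assumptions for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down bus: only what the check touches.
 * lock_initialised stands in for spin_lock_init(&bus->mdio_lock). */
struct mii_bus {
    const char *name;
    int (*read)(struct mii_bus *bus, int phy_id, int regnum);
    int (*write)(struct mii_bus *bus, int phy_id, int regnum,
                 unsigned short val);
    int lock_initialised;
};

static int mdiobus_register_sketch(struct mii_bus *bus)
{
    /* Validate first: nothing may touch *bus before this test. */
    if (!bus || !bus->name || !bus->read || !bus->write)
        return -EINVAL;

    bus->lock_initialised = 1;   /* now safe to initialise the lock */
    /* ... scan the bus for PHYs ... */
    return 0;
}

/* Stub callbacks so a fully populated bus can be built. */
static int stub_read(struct mii_bus *bus, int phy_id, int regnum)
{
    (void)bus; (void)phy_id; (void)regnum;
    return 0xffff;
}

static int stub_write(struct mii_bus *bus, int phy_id, int regnum,
                      unsigned short val)
{
    (void)bus; (void)phy_id; (void)regnum; (void)val;
    return 0;
}
```

Moving the `spin_lock_init()` call below the check is enough to make the NULL test meaningful, whichever comparison style is chosen.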
Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)
On Jul 25, 2005, at 16:06, Francois Romieu wrote:

[snip]
>> +config DAVICOM_PHY
>> +	bool "Drivers for Davicom PHYs"
>> +	depends on PHYLIB
>> +	---help---
>> +	  Currently supports dm9161e and dm9131
[snip]

Yeah, I resisted splitting the patch up for this reason.  Suffice it to
say, you have to apply patch #2 to not break everything.  Splitting the
PHY driver code from the PHY layer is just for "convenience".

>> +int mdiobus_register(struct mii_bus *bus)
>> +{
>> +	int i;
>> +	int err = 0;
>> +
>> +	spin_lock_init(&bus->mdio_lock);
>> +
>> +	if (NULL == bus || NULL == bus->name ||
>> +			NULL == bus->read ||
>> +			NULL == bus->write)
>
> Be spartan:
> 	if (!bus || !bus->name || !bus->read || !bus->write)

I think we have to agree to disagree here.  I could be convinced, but
I'm partial to using NULL explicitly.

>> +
>> +/* Convenience function to print out the current phy status
>> + */
>> +void phy_print_status(struct phy_device *phydev)
>> +{
>> +	pr_info("%s: Link is %s", phydev->dev.bus_id,
>> +			phydev->link ? "Up" : "Down");
>> +	if (phydev->link)
>> +		printk(" - %d/%s", phydev->speed,
>
> Missing KERN_SOMETHING in the printk.

Actually, KERN_SOMETHING would muck up the line, and make it look like
this:

phy0:0: Link is Up<3> - 1000/Full

That's why it's like that.

>> +/* A mapping of all SUPPORTED settings to speed/duplex */
>> +static struct phy_setting settings[] = {
>> +	{ .speed = 10000, .duplex = DUPLEX_FULL,
>> +		.setting = SUPPORTED_10000baseT_Full,
>> +	},
>> +	{ .speed = SPEED_1000, .duplex = DUPLEX_FULL,
>> +		.setting = SUPPORTED_1000baseT_Full,
>> +	},
>> +	{ .speed = SPEED_1000, .duplex = DUPLEX_HALF,
>> +		.setting = SUPPORTED_1000baseT_Half,
>> +	},
>> +	{ .speed = SPEED_100, .duplex = DUPLEX_FULL,
>> +		.setting = SUPPORTED_100baseT_Full,
>> +	},
>> +	{ .speed = SPEED_100, .duplex = DUPLEX_HALF,
>> +		.setting = SUPPORTED_100baseT_Half,
>> +	},
>> +	{ .speed = SPEED_10, .duplex = DUPLEX_FULL,
>> +		.setting = SUPPORTED_10baseT_Full,
>> +	},
>> +	{ .speed = SPEED_10, .duplex = DUPLEX_HALF,
>> +		.setting = SUPPORTED_10baseT_Half,
>> +	},
>> +};
>
> Would you veto some macro to initialise this array ?

Depends on the macro.
:)  I'm not keen on writing it, but I would support one that:

a) works
b) Isn't uglier than the current solution.  :)

>> +static inline int phy_find_setting(int speed, int duplex)
>> +{
>> +	int idx = 0;
>> +
>> +	while (idx < MAX_NUM_SETTINGS &&
>> +			(settings[idx].speed != speed ||
>> +			settings[idx].duplex != duplex))
>> +		idx++;
>
> "for" loop in disguise ?

Well I think it falls into the gray area.  It's searching until it
finds something, which implies "while" to me.  Really it's more of a
while...until.  Of course, a for loop could be used, but I often worry
about using a for loop's iterator variable outside of the loop.

I will change to ARRAY_SIZE, though.

>> +
>> +	return idx < MAX_NUM_SETTINGS ? idx : MAX_NUM_SETTINGS - 1;
>
> Ok (dunno if "idx % MAX_NUM_SETTINGS" is more idiomatic or not).

That would be completely different.  The current code makes sure that,
if no valid match was found, the last value in the array is returned.
Using % would result in the first value being returned.  I was
defaulting to the lowest setting.

>> +int phy_start_interrupts(struct phy_device *phydev)
>> +{
>> +	int err = 0;
>> +
>> +	INIT_WORK(&phydev->phy_queue, phy_change, phydev);
>> +
>> +	if (request_irq(phydev->irq, phy_interrupt,
>> +				SA_SHIRQ,
>> +				"phy_interrupt",
>> +				phydev) < 0) {
>
> Please, don't do that :o(
>
> 	err = request_irq(phydev->irq, phy_interrupt, SA_SHIRQ,
> 			  "phy_interrupt", phydev);
> 	if (err < 0)
> 		...

I did a cursory search, and didn't find any other drivers which use
this method.  Which is the method preferred in Linux?

>> +		printk(KERN_ERR "%s: Can't get IRQ %d (PHY)\n",
>> +				phydev->bus->name,
>> +				phydev->irq);
>> +		phydev->irq = PHY_POLL;
>> +		return 0;
>
> The description of the function says "Returns 0 on success".

Failing to request the IRQ does not result in failure of the function.
It falls back to polling, instead.  However, it can fail if
phy_enable_interrupts() fails, which would happen if a hardware issue
occurred.
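The two points in this exchange combine naturally: keep `request_irq()`'s return code in a variable, as Francois asks, and treat a failure as "fall back to polling" rather than as an error return, as Andy describes. A userspace sketch under stated assumptions: `fake_request_irq()` stands in for `request_irq()` and pretends IRQ 5 is busy, `PHY_POLL_SKETCH` is a hypothetical "poll instead" marker, and `struct phy_dev_sketch` is a trimmed-down stand-in for `struct phy_device`.

```c
#include <errno.h>
#include <stdio.h>

#define PHY_POLL_SKETCH (-1)   /* hypothetical "no IRQ, poll" marker */

struct phy_dev_sketch {
    const char *bus_name;
    int irq;
};

/* Stand-in for request_irq(): pretend IRQ 5 is already taken. */
static int fake_request_irq(int irq)
{
    return irq == 5 ? -EBUSY : 0;
}

static int phy_start_interrupts_sketch(struct phy_dev_sketch *phydev)
{
    int err;

    /* Keep the return code instead of testing the call inline. */
    err = fake_request_irq(phydev->irq);
    if (err < 0) {
        fprintf(stderr, "%s: Can't get IRQ %d (PHY)\n",
                phydev->bus_name, phydev->irq);
        phydev->irq = PHY_POLL_SKETCH;  /* fall back to polling ... */
        return 0;                       /* ... which is not a failure */
    }

    /* Enabling interrupts in the PHY itself would go here; per the
     * thread, that hardware step is what can make the function fail. */
    return 0;
}
```

Because the IRQ failure is recovered from, the message arguably reads as informational rather than as an error, which is the point Francois makes about `KERN_ERR` being too strong.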
>> +	/* Otherwise, we allocate the device, and initialize the
>> +	 * default values */
>> +	dev = kmalloc(sizeof(*dev), GFP_KERNEL);
>> +
>> +	if (NULL == dev) {
>> +		errno = -ENOMEM;
>> +		return NULL;
>> +	}
>> +
>> +	memset(dev, 0, sizeof(*dev));
>
> The kernel provides kcalloc.

I went looking for it, and found it in fs/cifs/misc.c.  I'm hesitant
to link to a function defined in the filesystem code just to save 1
line of code

I agree with all the other suggestions, and will implement them.
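The kcalloc suggestion replaces an allocate-then-`memset()` pair with a single zeroing allocation. The userspace analogue is `calloc()`; in the kernel the equivalent call would be `kcalloc(1, sizeof(*dev), GFP_KERNEL)`. A minimal sketch; `struct phy_device_sketch` and `phy_device_alloc()` are hypothetical names with placeholder fields.

```c
#include <stdlib.h>

/* Hypothetical device record; the fields are placeholders. */
struct phy_device_sketch {
    int addr;
    int speed;
    int duplex;
};

/* One zeroing allocation instead of malloc() + memset(). Returns NULL
 * on failure; the caller would turn that into -ENOMEM. */
static struct phy_device_sketch *phy_device_alloc(int addr)
{
    struct phy_device_sketch *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->addr = addr;   /* every other field is already zero */
    return dev;
}
```

Besides saving a line, a zeroing allocator cannot be paired with a `memset()` of the wrong size, which is the usual way the two-step version goes wrong.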
Re: more complex processing in ing_filter ?
On Wed, 27 Jul 2005 10:06:45 +0200
Lucas Nussbaum <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm interested in doing more complex stuff on inbound packets than
> what is currently possible with ing_filter (I understand ingress
> doesn't allow child classes, and can only drop/pass packets, not
> store one to send it later).
>
> While this is understandable because it would conflict with the
> benefits of NAPI by queueing and dropping packets much later, it
> prevents me from using Linux instead of FreeBSD's Dummynet (I'm
> working on network emulation-related stuff).
>

Why not just fix netem to work on imq?
QoS for web traffic
Hi,

I am one of the network admins of a large campus community, and I try to
make sure that the academic community is kept happy in spite of a
congested access link. My professor keeps asking me why I can't give
higher bandwidth to web access for educational and other academic sites.
I wonder if I can do so using a Linux box placed between the access link
and the campus LAN.

The squid delay pools option has been tried, but it doesn't serve the
purpose because squid doesn't have intimate knowledge of the bandwidth
available on the access link, which is dynamic.

As a starting point, I would like to define classification rules such
that web access to *.edu OR *.net OR *.org is put under one bandwidth
chunk, public mail sites such as *.yahoo.com OR gmail.com OR *hotmail.com
under a different chunk, the rest under a default chunk, and so on. If
any one category is not using its bandwidth share, the others should be
able to borrow it. Of course, SMTP and other kinds of traffic will be
given their own quota.

Can I do this kind of classification, and the subsequent bandwidth
allocation, based on text wildcards combined with logical operators as
above, using any of the options currently available under Linux? Am I
asking for the moon? :)

Thanks for your patience.
Anand
[PATCH] setsockopt locking fix
# HG changeset patch
# User Kyle Moffett <[EMAIL PROTECTED]>
# Node ID 77475acbe89242e63e6fd73dc66fe52643011ed7
# Parent  43cd2abd0f4c5d2e8ee4666d6bf1f0b96e252e54
Fix a bug where sock_reset_flag() was called without lock_sock()

diff -r 43cd2abd0f4c -r 77475acbe892 net/core/sock.c
--- a/net/core/sock.c	Wed Jul 27 04:02:15 2005
+++ b/net/core/sock.c	Wed Jul 27 17:38:27 2005
@@ -206,13 +206,14 @@
 	 */
 
 #ifdef SO_DONTLINGER		/* Compatibility item... */
-	switch (optname) {
-		case SO_DONTLINGER:
-			sock_reset_flag(sk, SOCK_LINGER);
-			return 0;
-	}
-#endif
-
+	if (optname == SO_DONTLINGER) {
+		lock_sock(sk);
+		sock_reset_flag(sk, SOCK_LINGER);
+		release_sock(sk);
+		return 0;
+	}
+#endif
+
 	if(optlen

Cheers,
Kyle Moffett

-- 
I lost interest in "blade servers" when I found they didn't throw knives
at people who weren't supposed to be in your machine room.
  -- Anthony de Boer
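The fix takes the socket lock around a flag update that was previously done unlocked. The same discipline in a userspace sketch, assuming a pthread mutex as a stand-in for `lock_sock()`/`release_sock()` and a plain flag word for `sk->sk_flags`; `struct sock_sketch`, `SOCK_LINGER_SKETCH` and `sk_dontlinger()` are hypothetical names. In this sketch the clear is a non-atomic read-modify-write, which is exactly why it must not run concurrently with other updates made under the lock.

```c
#include <pthread.h>

/* Hypothetical socket: a mutex in place of the socket lock and a
 * flag word in place of sk->sk_flags. */
struct sock_sketch {
    pthread_mutex_t lock;
    unsigned long flags;
};

#define SOCK_LINGER_SKETCH (1UL << 0)   /* placeholder bit */

/* Plain read-modify-write: only safe with the lock held. */
static void sock_reset_flag_sketch(struct sock_sketch *sk, unsigned long bit)
{
    sk->flags &= ~bit;
}

/* Shape of the patched SO_DONTLINGER path: lock, update, unlock. */
static int sk_dontlinger(struct sock_sketch *sk)
{
    pthread_mutex_lock(&sk->lock);
    sock_reset_flag_sketch(sk, SOCK_LINGER_SKETCH);
    pthread_mutex_unlock(&sk->lock);
    return 0;
}
```

Other flag bits in the same word survive the update, and the lock is free again when the function returns.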
Resend: [RFC/PATCH] "safer ipv4 reassembly"
Resending and requesting comments. (Patch was against 2.6.13-rc1, so
it's a little stale by now.)

-- Forwarded message --
.
Version 2 of the rfc/patch is attached. It has been changed as
indicated in the commentary below.

Diffstat:

 include/linux/sysctl.h     |    1
 net/ipv4/ip_fragment.c     |  195 +
 net/ipv4/sysctl_net_ipv4.c |   11 ++

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

On Tue, 28 Jun 2005, Arthur Kepner wrote:

>
> On Sun, 26 Jun 2005, Herbert Xu wrote:
>
> > >
> > > +struct ipc {
> > > ..
> > > +	struct rcu_head rcu;
> >
> > Is RCU worth it here? The only time we'd be taking the locks on this
> > is when the first fragment of a packet comes in. At that point we'll
> > be taking write_lock(&ipfrag_lock) anyway.
>

Yes, I think rcu is worth it here. The reason is that not using rcu
would necessitate grabbing the (global) ipfrag_lock an additional time,
when we free an ipc.

Adding an "ipc" to the hashtable could be done under the ipfrag_lock,
as you mention. But removing an ipc shouldn't be done at the same time
that fragments are destroyed, because the common case is that another
fragment queue will soon be created for the same (src,dst,proto).
Better to save the ipc for a while to avoid freeing and then
immediately recreating it.

Since the freeing of the ipc has to be deferred until well after the
last associated fragment queue has been freed, we can't take advantage
of the fact that the ipfrag_lock is held when the fragment queue is
freed. So when finally freeing the ipc, we can either grab the global
ipfrag_lock again, or use some other, finer-grained lock to protect the
ipc_hash entries. I'd prefer to avoid introducing new uses of global
locks.

If we use the finer-grained ipc_hash[].lock locks then rcu allows us to
avoid taking any locks in ipc_find when we create a new fragment chain
and there already happens to be an ipc for the associated
(src,dst,proto). (I suspect this would be a fairly common case.)

> > The only other use of RCU in your patch is ip_count.
> > That should be changed to be done in ip_defrag instead. At that
> > point you can simply find the ipc by dereferencing ipq, so no need
> > for __ipc_find and hence RCU.
> >
> > The reason you need to change it in this way is because you can't make
> > assumptions about ip_rcv_finish being the first place where a packet
> > is defragmented. With connection tracking enabled conntrack is the first
> > place where defragmentation occurs.
> >
>
.
This has been fixed. ip_input.c isn't changed by this version of the
patch. But there's the caveat that I mentioned earlier:

>
> There is a (big) advantage to doing this in ip_defrag() - this
> becomes a no-op for non-fragmented datagrams. The disadvantage
> is that there could be a situation where you receive:
>
> 1) first fragment of datagram X [for a particular (src,dst,proto)]
> 2) a zillion non-fragmented datagrams [for the same (src,dst,proto)]
> 3) last fragment of datagram X [for (src,dst,proto)]
>
> and no "disorder" would be detected for the datagrams associated
> with (src,dst,proto), even though the ip id could have wrapped in the
> meantime. This seems like a very uncommon case, however.
>
> > >
> > > +#define IPC_HASHSZ IPQ_HASHSZ
> > > +static struct {
> > > +	struct hlist_head head;
> > > +	spinlock_t lock;
> > > +} ipc_hash[IPC_HASHSZ];
> >
> > I'd store ipc entries in the main ipq hash table since they can use
> > the same keys for lookup as ipq entries. You just need to set protocol
> > to zero and map the user to values specific to ipc for ipc entries.
> > One mapping would be to set the top bit of user for ipc entries, e.g.
> >
> > #define IP_DEFRAG_IPC 0x8000
> > ipc->user = ipq->user | IP_DEFRAG_IPC;
> >
> > Of course you also need to make sure that the two structures share
> > the leading elements. You can then use the user field to distinguish
> > between ipc/ipq entries.
>

I thought about this point, but I dislike reusing the same structure
for such different purposes, so I left this unchanged. Comments?
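Herbert's alternative, tagging ipc entries by setting the top bit of the `user` field so they can live in the ipq hash table, can be sketched in a few lines. This is only an illustration of the tagging idea under assumptions: `user` is taken to be 32 bits wide, `IP_DEFRAG_IPC_FLAG`, `struct frag_key` and both helpers are hypothetical, and the sketch leaves out the other key adjustment he mentions (setting protocol to zero).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: the top bit of a 32-bit 'user' field marks an
 * ipc entry rather than an ipq entry. */
#define IP_DEFRAG_IPC_FLAG (UINT32_C(1) << 31)

/* Leading members both entry types would have to share so that one
 * hash table and one comparison routine can serve both. */
struct frag_key {
    uint32_t saddr;
    uint32_t daddr;
    uint8_t  protocol;
    uint32_t user;
};

/* Derive the ipc key for a fragment queue's key. */
static struct frag_key ipc_key_from_ipq(const struct frag_key *q)
{
    struct frag_key k = *q;

    k.user = q->user | IP_DEFRAG_IPC_FLAG;
    return k;
}

/* Distinguish the two entry types from the key alone. */
static bool key_is_ipc(const struct frag_key *k)
{
    return (k->user & IP_DEFRAG_IPC_FLAG) != 0;
}
```

The trade-off Arthur objects to is visible even here: every lookup path must agree on what the flag bit means, in exchange for dropping the separate `ipc_hash[]` table and its locks.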
-- 
Arthur

diff -rup linux.orig/include/linux/sysctl.h linux/include/linux/sysctl.h
--- linux.orig/include/linux/sysctl.h	2005-07-06 12:32:03.224546953 -0700
+++ linux/include/linux/sysctl.h	2005-07-07 09:56:42.854845089 -0700
@@ -343,6 +343,7 @@ enum
 	NET_TCP_BIC_BETA=108,
 	NET_IPV4_ICMP_ERRORS_USE_INBOUND_IFADDR=109,
 	NET_TCP_CONG_CONTROL=110,
+	NET_IPV4_REASM_COUNT=111,
 };
 
 enum {
diff -rup linux.orig/net/ipv4/ip_fragment.c linux/net/ipv4/ip_fragment.c
--- linux.orig/net/ipv4/ip_fragment.c	2005-07-06 12:30:51.033380830 -0700
+++ linux/net/ipv4/ip_fragment.c	2005-07-07 09:56:42.856798234 -0700
@@ -56,6 +56,8 @@
 int sysctl_ipfrag_high_thresh = 256*1024;
 int sysctl_ipfrag_low_thresh = 192*1024;
 
+int sysctl_ip_reassembly_count;
+
 /* Important NOTE! Frag
Re: Patch: reduce skb input dev on 64 bit machines
So 20 emails or so later ...

On Wed, 2005-27-07 at 08:43 -0700, Ben Greear wrote:

> I don't see a good reason for the feature, or at least nothing that
> justifies the work of trying to implement it.

You are the one who pointed out the ifindex issue. What a waste of time.

> What benefits do you envision?

Go back and read the thread again.

cheers,
jamal
Re: Patch: reduce skb input dev on 64 bit machines
jamal wrote:
> On Tue, 2005-26-07 at 09:54 -0700, Ben Greear wrote:
> [..]
>
>> You will need to enforce that nothing else gets the index 34 while eth7 is
>> removed.
>> How do you do that?
>
> That's trivial if you assume there's one management app, which most of
> the router vendors implementing it have. It will get tricky when you
> have 10 apps fighting to get index 34 for other devices. You could of
> course enforce a reserved set of indices right on bootup via a kernel
> option, for example; but that doesn't solve a stupid admin with 10
> scripts all fighting for index 34, each for a different device.
>
>> If you try to put that into the kernel, you
>> have a big nasty mess, and if you try to make user-space do it, any bogus
>> script can hose your system and potentially screw up your firewall and
>> worse.
>
> Refer to above. It can actually be solved, and not in a big mess like
> you say. The question is whether such a feature is needed.

I don't see a good reason for the feature, or at least nothing that
justifies the work of trying to implement it.

What benefits do you envision?

I believe this discussion originally came about because we can save 4
bytes by storing the ifindex instead of a pointer to the netdevice (on
64-bit machines). However, the cost of doing so is a
netdev_get_by_index() and some scheme of making the ifindexes
persistent. I don't think the saving of 4 bytes is worth either of these
costs, much less both together.

Ben

-- 
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
Re: more complex processing in ing_filter ?
On Wed, 2005-27-07 at 10:06 +0200, Lucas Nussbaum wrote:
> Hi,
>
> I'm interested in doing more complex stuff on inbound packets than what
> is currently possible with ing_filter (I understand ingress doesn't
> allow child classes, and can only drop/pass packets, not store one to
> send it later).
>

No, that's not true. You can write a tc action that will steal packets
from that path and later reinject them. But that may not be necessary
if you use the patched dummy device, since you could redirect packets to
it and run whatever qdisc you want on it.

> While this is understandable because it would conflict with the benefits
> of NAPI by queueing and dropping packets much later, it prevents me from
> using Linux instead of FreeBSD's Dummynet (I'm working on network
> emulation-related stuff).
>
> What would be the disadvantages of moving the call to ing_filter earlier
> in netif_receive_skb, allow queueing in ingress, and re-inject packets
> inside netif_receive_skb ? Does it look do-able at least ? I'm not sure
> I see all the problems it implies.
>
> I know there's a solution to my problem using IMQ or dummy, but it
> doesn't look like a very clean solution.
>

I am not sure why you say it's unclean. If you can give the packets to
dummy and run any qdisc on it such as netem - why would that be a
problem?

cheers,
jamal
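jamal's first option, a tc action that steals packets and re-injects them later, boils down to holding each packet together with a release time. The sketch below models only that queueing logic in userspace, with a constant netem-style delay; it is not the tc action API, and every name in it (`struct delay_queue`, `dq_steal()`, `dq_reinject()`, `QLEN`) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define QLEN 64   /* hypothetical fixed capacity */

/* A held packet: an opaque id plus the time it may be re-injected. */
struct held_pkt {
    int id;
    unsigned long release_at;   /* e.g. milliseconds */
};

struct delay_queue {
    struct held_pkt slot[QLEN];
    size_t head, count;
    unsigned long delay;        /* constant delay applied to every packet */
};

/* "Steal" a packet from the ingress path: hold it instead of passing
 * it on. Returns false (the caller would drop) when the queue is full. */
static bool dq_steal(struct delay_queue *q, int id, unsigned long now)
{
    size_t tail;

    if (q->count == QLEN)
        return false;
    tail = (q->head + q->count) % QLEN;
    q->slot[tail].id = id;
    q->slot[tail].release_at = now + q->delay;
    q->count++;
    return true;
}

/* Re-inject the oldest packet if its time has come. With a constant
 * delay, FIFO order equals release order. Returns -1 if none is due. */
static int dq_reinject(struct delay_queue *q, unsigned long now)
{
    int id;

    if (q->count == 0 || q->slot[q->head].release_at > now)
        return -1;
    id = q->slot[q->head].id;
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return id;
}
```

A constant delay keeps the queue a simple FIFO; emulating jitter or reordering, as netem does, would need the queue ordered by release time instead.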
Re: Patch: reduce skb input dev on 64 bit machines
On Tue, 2005-26-07 at 13:00 -0700, David S. Miller wrote:

>
> Calling __dev_get_by_index() at every classification check is quite
> silly and potentially expensive, so let's call using ifindex a last
> resort, yet correct, fix.

Just double checking (I think we are saying the same thing) that using
ifindices and requiring refcounting for input_dev means you have to use
__dev_get_by_index() on a per-packet basis.
The question is whether we really care about refcounting: I don't think
we do. The only time it would really matter is when a module or device
is hotplugged out and back in.

cheers,
jamal
Re: Patch: reduce skb input dev on 64 bit machines
On Tue, 2005-26-07 at 09:54 -0700, Ben Greear wrote: [..] > You will need to enforce that nothing else gets the index 34 while eth7 is > removed. > How do you do that? Thats trivial if you assume there's one management app which most of the router vendors implementing it have. It will get tricky when you have 10 apps fighting to get index 34 for other devices. You could of course enforce a reserved set of indices right on bootup via a kernel option for example; but that doesnt solve a stupid admin with 10 scripts all fighting for index 34 each for a different device. > If you try to put that into the kernel, you > have a big nasty mess, and if you try to make user-space do it, any bogus > script can hose your system and potentially screw up your firewall and > worse. > Refer to above. It can actually be solved and not in a big mess like you say. The question is whether such a feature is needed. > Also, imagine that you remove your pro/100 pcmcia NIC and put in your > tulip. Both will be, say, eth1 currently, and that is probably what you > want, but they will have different physical characteristics. The name is easy. Check out a utility like nameif for example which uses MAC addresses as unique ids. DaveM mentions this in his other email on this thread. > Even if you > put them in different cardbus slots, the likelyhood is that you want it > to be called eth1 and treated the same regardless of which NIC you are > using. If you are matching firewall rules or whatever against a device > index instead of a device name, this will fail because the device indexes > will be different. Again the name is easy. You can call a NIC whatever you want. > And, for purely virtual devices, with no lspci relationship, and no serial > number, how do you match those? > [You are taking things too literally (which is always dangerous): When i mentioned lspci, serial number etc - I was not defining scripture. I was giving an example of how you could find uniqueness. 
DaveM mentioned MAC addresses, for example; think outside the box a little and find something that's unique per device type if you are going to write a management script/program.]

> Maybe we could make a small effort to keep the device indexes the same
> more often, but still not guarantee it. That may help snmp related tools
> without overly complicating the kernel or user space.

We already almost guarantee a device will get the same ifindex and name if created at boot time on the same kernel. As for reserving, as I said above, an admin could be allowed to reserve ifindices. Now you could go further with boot-time reserving of a name,ifindex pair, for example "ifreserve=eth7,34", and pass a series of those; when someone creates eth7 they get ifindex 34, etc. But this assumes 10 other scripts will try to map ifindex 34 to something else.

cheers,
jamal
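To make the boot-time reservation idea concrete, here is a minimal user-space sketch in C of a table filled from "name,ifindex" pairs such as the "ifreserve=eth7,34" option floated above. Everything in it is hypothetical: no such kernel option exists, and ifreserve_parse(), ifindex_for(), the table size and the 16-byte name limit are illustrative assumptions, not kernel APIs. A device with a reservation gets its reserved index; any other device gets the next index nobody has reserved.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reservation table built from "name,ifindex" pairs. */
#define MAX_RESERVED 16
#define NAME_LEN 16		/* assumed to mirror IFNAMSIZ */

struct if_reservation {
	char name[NAME_LEN];
	int ifindex;
};

static struct if_reservation reserved[MAX_RESERVED];
static int n_reserved;

/* Parse one "name,ifindex" pair; returns 0 on success, -1 on bad input. */
static int ifreserve_parse(const char *arg)
{
	const char *comma = strchr(arg, ',');
	size_t len;
	char *end;
	long idx;

	if (!comma || n_reserved == MAX_RESERVED)
		return -1;
	len = (size_t)(comma - arg);
	if (len == 0 || len >= NAME_LEN)
		return -1;
	idx = strtol(comma + 1, &end, 10);
	if (*end != '\0' || idx <= 0 || idx > INT_MAX)
		return -1;
	memcpy(reserved[n_reserved].name, arg, len);
	reserved[n_reserved].name[len] = '\0';
	reserved[n_reserved].ifindex = (int)idx;
	n_reserved++;
	return 0;
}

/* Is this index promised to some named device? */
static int is_reserved_index(int ifindex)
{
	for (int i = 0; i < n_reserved; i++)
		if (reserved[i].ifindex == ifindex)
			return 1;
	return 0;
}

/* Choose an ifindex for a newly created device: its reserved index if it
 * has one, otherwise the next index at or above *next_free that is not
 * reserved for anyone else. */
static int ifindex_for(const char *name, int *next_free)
{
	assert(name != NULL && next_free != NULL);
	for (int i = 0; i < n_reserved; i++)
		if (strcmp(reserved[i].name, name) == 0)
			return reserved[i].ifindex;
	while (is_reserved_index(*next_free))
		(*next_free)++;
	return (*next_free)++;
}
```

For example, after ifreserve_parse("eth7,34"), creating eth7 yields 34 while unreserved devices skip over 34. The sketch deliberately does nothing about several scripts racing for the same index; that policy question is exactly what the thread is debating.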
Re: [PATCH] add netlink module refcounting
On Tue, Jul 26, 2005 at 04:37:19PM -0700, David S. Miller wrote:
> From: Harald Welte <[EMAIL PROTECTED]>
> Date: Sat, 23 Jul 2005 16:15:52 -0400
>
> > The attached patch adds support for refcounting of modules implementing
> > netlink protocols. The idea is that you prevent the module from
> > disappearing as long as someone in userspace has still a socket talking
> > to you.
>
> Ok, the changes look mostly fine. I've made a few slight
> modifications before integrating, some of which I've mentioned
> already:
>
> 1) I keep nl_table[] dynamically allocated
> 2) I fixed up some white spacing, very minor stuff
> 3) I fixed a socket leak in netlink_kernel_create(). If
>    netlink_lookup() returns non-NULL, you have a reference
>    to that socket thus have to release it.

thanks for taking care of this, and especially for modifying the patch. I would have done that based on your comments, but well, if you want to do it, it's more convenient for me ;)

> I'm only including the af_netlink.c part of that patch I integrated
> since that's the only part I modified compared to your original
> patch.

ok, I read through it and it seems fine to me.

> I think there is a slight hole in this code though, which we can
> fixup as a followon patch. We probably need to grab the netlink
> table from the point at which we set p_ops all the way to where
> we full commit and netlink_insert() the kernel netlink socket.

mh, I have to think about that in more detail, will get back to you.

There's also a potential module refcounting leak when we have the following order of events:

1) netlink_kernel_create()
2) userspace opens socket, increases refcount of kernel socket
3) sock_release(kernel_sock), resets p_ops to generic ones
4) userspace closes socket, but can no longer drop refcount on module implementing the kernel socket.

I had a somewhat lengthy discussion with Thomas and Patrick about this, and we didn't think it's worth fixing this up, esp. since all the current users seem to sock_release only in the module unload path, which can in turn only be invoked if the refcount is already zero.

The only real solution for this is to split netlink_kernel_create() in two parts, let's say netlink_proto_register and netlink_kernel_sock_create(), and the same for sock_release() / netlink_proto_unregister().

Let me know whether you think it's worth the effort.

Thanks,
Harald
--
- Harald Welte <[EMAIL PROTECTED]> http://netfilter.org/
"Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie
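The four-step sequence described above can be modelled in a few lines. This is a simplified user-space sketch, not af_netlink.c: the model_* types and functions are hypothetical stand-ins, and the one assumption is that a user socket takes and drops its module reference through whatever protocol ops are registered at that moment.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (not kernel code): a module with a use count and an
 * ops table that either pins that module or is the generic fallback. */
struct model_module {
	int refcnt;
};

struct model_proto_ops {
	struct model_module *owner;	/* NULL for the generic ops */
};

static struct model_proto_ops generic_ops = { NULL };

struct model_kernel_sock {
	struct model_proto_ops *p_ops;	/* ops currently registered */
};

/* Step 2: userspace opens a socket and pins the owner of the current ops. */
static void model_user_open(struct model_kernel_sock *ks)
{
	if (ks->p_ops->owner)
		ks->p_ops->owner->refcnt++;
}

/* Step 3: the kernel socket is released; ops fall back to generic ones. */
static void model_kernel_release(struct model_kernel_sock *ks)
{
	ks->p_ops = &generic_ops;
}

/* Step 4: userspace closes; it can only drop a reference through the ops
 * it sees now, so if step 3 came first the reference is never dropped. */
static void model_user_close(struct model_kernel_sock *ks)
{
	if (ks->p_ops->owner) {
		assert(ks->p_ops->owner->refcnt > 0);
		ks->p_ops->owner->refcnt--;
	}
}
```

Running open, release, close in that order leaves the module count at 1, which is the leak. The proposed split into a protocol registration step and a kernel-socket creation step presumably aims to keep the owning ops registered until an explicit unregister, independent of the kernel socket's lifetime.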
more complex processing in ing_filter ?
Hi,

I'm interested in doing more complex stuff on inbound packets than what is currently possible with ing_filter (I understand ingress doesn't allow child classes, and can only drop/pass packets, not store one to send it later).

While this is understandable because it would conflict with the benefits of NAPI by queueing and dropping packets much later, it prevents me from using Linux instead of FreeBSD's Dummynet (I'm working on network emulation-related stuff).

What would be the disadvantages of moving the call to ing_filter earlier in netif_receive_skb, allowing queueing in ingress, and re-injecting packets inside netif_receive_skb? Does it at least look doable? I'm not sure I see all the problems it implies.

I know there's a solution to my problem using IMQ or dummy, but it doesn't look like a very clean solution.

Thank you,
--
| Lucas Nussbaum
| [EMAIL PROTECTED] http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED] GPG: 1024D/023B3F4F |
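The behaviour asked for here, holding an inbound packet and releasing it later as Dummynet or netem do, boils down to a delay queue. Below is a minimal user-space sketch in C of a fixed-delay FIFO. It assumes packets are reduced to integer ids and that time is an abstract counter supplied by the caller; the delayq_* names are hypothetical, and this is not how ing_filter or netem are implemented.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of a fixed-delay packet queue: hold each packet for
 * 'delay' time units, then hand it back for re-injection. */
#define DELAYQ_CAP 64

struct delayed_pkt {
	int id;
	unsigned long release_at;	/* time at which to re-inject */
};

struct delay_queue {
	struct delayed_pkt slots[DELAYQ_CAP];	/* ring buffer */
	size_t head, count;
	unsigned long delay;			/* same delay for every packet */
};

/* Enqueue a packet arriving at time 'now'; returns 0, or -1 (tail drop)
 * if the queue is full. Arrivals must be enqueued in time order. */
static int delayq_enqueue(struct delay_queue *q, int id, unsigned long now)
{
	size_t tail;

	assert(q != NULL);
	if (q->count == DELAYQ_CAP)
		return -1;
	tail = (q->head + q->count) % DELAYQ_CAP;
	q->slots[tail].id = id;
	q->slots[tail].release_at = now + q->delay;
	q->count++;
	return 0;
}

/* Dequeue the oldest packet if its delay has expired at time 'now'.
 * Returns the packet id, or -1 if nothing is due yet. */
static int delayq_dequeue(struct delay_queue *q, unsigned long now)
{
	int id;

	if (q->count == 0 || q->slots[q->head].release_at > now)
		return -1;
	id = q->slots[q->head].id;
	q->head = (q->head + 1) % DELAYQ_CAP;
	q->count--;
	return id;
}
```

Because every packet gets the same delay and arrivals are enqueued in time order, release times never decrease, so checking only the head is enough. Per-packet variable delay (jitter) would need a structure ordered by release time instead of a plain FIFO.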
[PATCH 2.6-git] Add netpoll support to cs89x0 driver
Trivial patch adding netpoll support to cs89x0 driver.

Signed-off-by: Deepak Saxena <[EMAIL PROTECTED]>

Please apply,
~Deepak

diff --git a/drivers/net/cs89x0.c b/drivers/net/cs89x0.c
--- a/drivers/net/cs89x0.c
+++ b/drivers/net/cs89x0.c
@@ -86,6 +86,7 @@
 Deepak Saxena : [EMAIL PROTECTED]
 : Intel IXDP2x01 (XScale ixp2x00 NPU) platform support
+: Netpoll support
 */
@@ -247,6 +248,9 @@ static int get_eeprom_data(struct net_de
 static int get_eeprom_cksum(int off, int len, int *buffer);
 static int set_mac_address(struct net_device *dev, void *addr);
 static void count_rx_errors(int status, struct net_local *lp);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void net_poll_controller(struct net_device *dev);
+#endif
 #if ALLOW_DMA
 static void get_dma_channel(struct net_device *dev);
 static void release_dma_buff(struct net_local *lp);
@@ -405,6 +409,19 @@ get_eeprom_cksum(int off, int len, int *
 	return -1;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * Polling receive - used by netconsole and other diagnostic tools
+ * to allow network i/o with interrupts disabled.
+ */
+static void net_poll_controller(struct net_device *dev)
+{
+	disable_irq(dev->irq);
+	net_interrupt(dev->irq, dev, NULL);
+	enable_irq(dev->irq);
+}
+#endif
+
 /* This is the real probe routine.  Linux has a history of friendly device
    probes on the ISA bus.  A good device probes avoids doing writes, and
    verifies that the correct device exists and functions.
@@ -756,6 +773,9 @@ printk("PP_addr=0x%x\n", inw(ioaddr + AD
 	dev->get_stats = net_get_stats;
 	dev->set_multicast_list = set_multicast_list;
 	dev->set_mac_address = set_mac_address;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = net_poll_controller;
+#endif
 
 	printk("\n");
 	if (net_debug)

--
Deepak Saxena - [EMAIL PROTECTED] - http://www.plexity.net

Even a stopped clock gives the right time twice a day.
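The hook this patch adds follows a common pattern: mask the device's IRQ line, call the interrupt handler directly, then unmask it, so netconsole-style tools can service the NIC without waiting for an interrupt. Here is a user-space model of that pattern. model_dev and the other model_* names are hypothetical, and the irq_disabled flag merely stands in for disable_irq()/enable_irq().

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the poll_controller pattern (not kernel code). */
struct model_dev {
	int irq_disabled;	/* stands in for the disable_irq()/enable_irq() state */
	int rx_pending;		/* frames waiting in the NIC */
	int rx_handled;		/* frames the handler has processed */
	void (*poll_controller)(struct model_dev *dev);
};

/* Stand-in for net_interrupt(): drains pending frames. In this model it is
 * only reached via the poll path, which has masked the IRQ first. */
static void model_interrupt(struct model_dev *dev)
{
	assert(dev->irq_disabled);
	dev->rx_handled += dev->rx_pending;
	dev->rx_pending = 0;
}

/* Mirrors net_poll_controller() in the patch. */
static void model_poll_controller(struct model_dev *dev)
{
	dev->irq_disabled = 1;		/* disable_irq(dev->irq) */
	model_interrupt(dev);		/* net_interrupt(dev->irq, dev, NULL) */
	dev->irq_disabled = 0;		/* enable_irq(dev->irq) */
}

/* A netconsole-like caller: only devices that set the hook can be polled. */
static int model_netpoll(struct model_dev *dev)
{
	if (!dev->poll_controller)
		return -1;
	dev->poll_controller(dev);
	return 0;
}
```

A driver that never sets poll_controller, as cs89x0 did before this patch, cannot be polled this way; the -1 return roughly models that refusal.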