Re: [RFT] proxy arp deadlock possible
On Wed, Apr 04, 2007 at 06:10:42PM -0700, Arjan van de Ven wrote:
> On Thu, 2007-04-05 at 10:44 +1000, Herbert Xu wrote:
> > Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > > Thanks Dave, there is a classic AB BA deadlock here.
> > > We should break the dependency like this.
> > >
> > > Could someone who uses proxy ARP test this?
> >
> > Sorry Stephen, this isn't necessary. The lockdep thing is
> > simply confused here. It's treating tbl->proxy_queue as the
> > same thing as neigh->arp_queue when they're clearly different.
> >
> > I'm disappointed that after all this time lockdep is still
> > producing bogus reports like this. I'm sure we've been
> > through this particular issue many times already.
>
> what's the exact lockdep output here?

http://www.mail-archive.com/netdev@vger.kernel.org/msg35266.html

Dave

--
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20.4: NETDEV WATCHDOG and lockups
On Wed, Apr 04, 2007 at 02:20:23PM +0100, Christian Kujau wrote:
> On Wed, 4 Apr 2007, Jarek Poplawski wrote:
> >So, it's a lot sooner than before. (BTW, isn't there anything
> >in debug log?)
>
> No, nothing. I've set up remote-syslogging to the other node (node1
> logging to node2 and vice versa) - nothing :(
>
> >I see both CPUs did interrupt handling again.
>
> Yes, when booting with 'lapic' both CPUs/cores are handling interrupts
> again. However, since 'lapic' seems to lead to crashes here, we would be
> more than happy to just boot with 'noapic' but have 'irqbalance'
> working. Unfortunately, irqbalance is unable to write to
> /proc/irq/*/smp_affinity (did not help to disable CONFIG_IRQBALANCE).

I hope you are right, but maybe it's not lapic's fault? Probably
the fastest way to find out would be to try with some other card.

> >Maybe it's a real locking problem. Here are some more
> >suggestions for testing (if you don't find anything better):
> >- try without SMP, so: 'acpi=off lapic nosmp'

BTW, I'm not sure acpi should be turned off with any modern
hardware. Did you try to compile with CONFIG_ACPI=y, all other
acpi options off, and maybe to tweak only the 'pci=...' boot
parameter?

Regards,
Jarek P.
Re: [RFT] proxy arp deadlock possible
On Wed, Apr 04, 2007 at 06:10:42PM -0700, Arjan van de Ven wrote:
>
> what's the exact lockdep output here?

Here's the original report:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-1.2933.fc6debug #1
-------------------------------------------------------
swapper/0 is trying to acquire lock:
 (&tbl->lock){-+-+}, at: [] neigh_lookup+0x43/0xa2

but task is already holding lock:
 (&list->lock#4){-+..}, at: [] neigh_proxy_process+0x20/0xc2

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&list->lock#4){-+..}:
       [] __lock_acquire+0x913/0xa43
       [] lock_acquire+0x56/0x6f
       [] _spin_lock_irqsave+0x34/0x44
       [] skb_dequeue+0x12/0x43
       [] skb_queue_purge+0x14/0x1b
       [] neigh_update+0x349/0x3a5
       [] arp_process+0x4d1/0x50a
       [] arp_rcv+0xe3/0x100
       [] netif_receive_skb+0x2db/0x35a
       [] process_backlog+0x95/0xf6
       [] net_rx_action+0xa1/0x1a8
       [] __do_softirq+0x6f/0xe2
       [] do_softirq+0x61/0xd0
       [] 0x

-> #1 (&n->lock){-+-+}:
       [] __lock_acquire+0x913/0xa43
       [] lock_acquire+0x56/0x6f
       [] _write_lock+0x2b/0x38
       [] neigh_periodic_timer+0x99/0x138
       [] run_timer_softirq+0x104/0x168
       [] __do_softirq+0x6f/0xe2
       [] do_softirq+0x61/0xd0
       [] 0x

-> #0 (&tbl->lock){-+-+}:
       [] __lock_acquire+0x814/0xa43
       [] lock_acquire+0x56/0x6f
       [] _read_lock_bh+0x30/0x3d
       [] neigh_lookup+0x43/0xa2
       [] neigh_event_ns+0x2c/0x7a
       [] arp_process+0x386/0x50a
       [] parp_redo+0x8/0xa
       [] neigh_proxy_process+0x66/0xc2
       [] run_timer_softirq+0x104/0x168
       [] __do_softirq+0x6f/0xe2
       [] do_softirq+0x61/0xd0
       [] 0x

other info that might help us debug this:

1 lock held by swapper/0:
 #0:  (&list->lock#4){-+..}, at: [] neigh_proxy_process+0x20/0xc2

stack backtrace:
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] print_circular_bug_tail+0x5f/0x68
 [] __lock_acquire+0x814/0xa43
 [] lock_acquire+0x56/0x6f
 [] _read_lock_bh+0x30/0x3d
 [] neigh_lookup+0x43/0xa2
 [] neigh_event_ns+0x2c/0x7a
 [] arp_process+0x386/0x50a
 [] parp_redo+0x8/0xa
 [] neigh_proxy_process+0x66/0xc2
 [] run_timer_softirq+0x104/0x168
 [] __do_softirq+0x6f/0xe2
 [] do_softirq+0x61/0xd0
=======================================================

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
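The disagreement in this thread comes down to whether the two sk_buff_head locks belong to one lock class. A rough userspace sketch of that idea follows; it is not lockdep's actual implementation. It records "acquired while held" edges between lock classes and refuses an edge that would close a cycle. The three edges are my reading of chains #2, #1 and #0 in the report (queue lock under n->lock, n->lock under tbl->lock, tbl->lock under the queue lock); the enum values and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock-class ids for this sketch; lockdep's real keys differ. */
enum lock_class {
	TBL_LOCK,		/* &tbl->lock */
	N_LOCK,			/* &n->lock */
	SHARED_LIST_LOCK,	/* one class shared by every sk_buff_head lock */
	ARP_QUEUE_LOCK,		/* neigh->arp_queue lock as its own class */
	PROXY_QUEUE_LOCK	/* tbl->proxy_queue lock as its own class */
};

struct lock_graph {
	/* edge[a][b] is true when class b was acquired while class a was held */
	bool edge[MAX_CLASSES][MAX_CLASSES];
};

/* Depth-first search: can we get from class 'from' to class 'to'? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/* Record "acquired taken while held"; false means it would close a cycle. */
bool add_dependency(struct lock_graph *g, int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(acquired >= 0 && acquired < MAX_CLASSES);
	if (reachable(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}

/* Replay the three chains of the report with the given queue classes. */
bool replay_report(int arp_queue_class, int proxy_queue_class)
{
	struct lock_graph g;
	bool ok = true;

	memset(&g, 0, sizeof(g));
	/* #2: neigh_update purges arp_queue while holding n->lock */
	ok = add_dependency(&g, N_LOCK, arp_queue_class) && ok;
	/* #1: neigh_periodic_timer takes n->lock while holding tbl->lock */
	ok = add_dependency(&g, TBL_LOCK, N_LOCK) && ok;
	/* #0: neigh_proxy_process holds proxy_queue, neigh_lookup takes tbl->lock */
	ok = add_dependency(&g, proxy_queue_class, TBL_LOCK) && ok;
	return ok;
}
```

With both queues mapped to SHARED_LIST_LOCK the third edge closes a cycle and replay_report() returns false; with ARP_QUEUE_LOCK and PROXY_QUEUE_LOCK kept apart all three edges are accepted. That matches the claim that the report is an artifact of classification, provided the two queues really never nest the other way.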
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
* Avi Kivity <[EMAIL PROTECTED]> wrote:

> [...] But the difference in cruftiness between kvm and qemu code
> should not enter into the discussion of where to do things.

i agree that it doesnt enter the discussion for the *PIC question, but
it very much enters the discussion for the question that i replied to:

> > > You didn't quote Anthony's point about "it's more about there not
> > > being good enough userspace interfaces to do network IO."
> > >
> > > It's easier to write a kernel-space network driver, but it's not
> > > obviously the right thing to do until we can show that an
> > > efficient packet-level userspace interface isn't possible. I
> > > don't think that's been done, and it would be interesting to try.

prototyping new kernel APIs to implement user-space network drivers, on
a crufty codebase is not something that should be done lightly. Any
negative result will not bring us any real conclusion. (was the failure
due to the concept, due to the API or due to the crufty codebase?)

(but ... this is really a side-track issue for the *PIC question at
hand. PICs are not network devices, they are essential platform
components and almost an extended part of the CPU.)

	Ingo
[BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
Hi all

When I came in this morning to check the IrNET / PPP test I started
yesterday, the device was dead and OOM messages were scrolling up the
terminal. I captured a task trace, and the ftp process seems to have been
the original culprit. Below is the backtrace. It looks like a recursive
spinlock, since, I assume, it is this BUG() that has triggered:

	BUG_ON(rt_mutex_owner(lock) == current);

and ppp is indeed entered recursively in the trace. I'll see if I can find
the reason and a solution, but would also be grateful for any hints.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

ftp D [c3c9e460] C01E5838 0 18445 1 20756 14588 (L-TLB)
[] (__schedule+0x0/0x7e8) from [] (schedule+0x54/0x124)
[] (schedule+0x0/0x124) from [] (lock_sock_nested+0x94/0xd0)
 r5 = C329F06C r4 = C14B9780
[] (lock_sock_nested+0x0/0xd0) from [] (sock_fasync+0x40/0x154)
 r7 = C329F040 r6 = C2238A5C r5 = C0AD8B60 r4 = C0AD8B60
[] (sock_fasync+0x0/0x154) from [] (sock_close+0x24/0x44)
[] (sock_close+0x0/0x44) from [] (__fput+0x194/0x1c8)
 r4 = 0008
[] (__fput+0x0/0x1c8) from [] (fput+0x38/0x3c)
 r8 = r7 = C3251380 r6 = r5 = C3251380 r4 = C0AD8B60
[] (fput+0x0/0x3c) from [] (filp_close+0x5c/0x88)
[] (filp_close+0x0/0x88) from [] (put_files_struct+0x9c/0xdc)
 r6 = C3251388 r5 = 0007 r4 = 0001
[] (put_files_struct+0x0/0xdc) from [] (do_exit+0x168/0x8b0)
[] (do_exit+0x0/0x8b0) from [] (die+0x29c/0x2e8)
[] (die+0x0/0x2e8) from [] (__do_kernel_fault+0x70/0x80)
[] (__do_kernel_fault+0x0/0x80) from [] (do_page_fault+0x60/0x214)
 r7 = C1B3B8C0 r6 = C0264418 r5 = C3C9E460 r4 = C02643A8
[] (do_page_fault+0x0/0x214) from [] (do_DataAbort+0x3c/0xa4)
[] (do_DataAbort+0x0/0xa4) from [] (__dabt_svc+0x40/0x60)
 r8 = 0001 r7 = A013 r6 = C3A43780 r5 = C14B99E0 r4 =
[] (__bug+0x0/0x2c) from [] (rt_spin_lock_slowlock+0x1c8/0x1f8)
[] (rt_spin_lock_slowlock+0x0/0x1f8) from [] (__lock_text_start+0x44/0x48)
[] (__lock_text_start+0x0/0x48) from [] (ppp_channel_push+0x1c/0xc8 [ppp_generic])
[] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [] (ppp_output_wakeup+0x18/0x1c [ppp_generic])
 r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 =
[] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [] (irnet_flow_indication+0x38/0x3c [irnet])
[] (irnet_flow_indication+0x0/0x3c [irnet]) from [] (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
[] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [] (irttp_data_request+0x128/0x4f8 [irda])
 r8 = BF121560 r7 = 0002 r6 = C38F4200 r5 = C21418B8 r4 = C21418B8
[] (irttp_data_request+0x0/0x4f8 [irda]) from [] (ppp_irnet_send+0x134/0x238 [irnet])
[] (ppp_irnet_send+0x0/0x238 [irnet]) from [] (ppp_push+0x80/0xb8 [ppp_generic])
 r7 = C3A436E0 r6 = r5 = C21418B8 r4 = C1489600
[] (ppp_push+0x0/0xb8 [ppp_generic]) from [] (ppp_xmit_process+0x34/0x50c [ppp_generic])
 r7 = 0021 r6 = C21418B8 r5 = C1489600 r4 =
[] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [] (ppp_start_xmit+0x128/0x254 [ppp_generic])
[] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [] (dev_hard_start_xmit+0x170/0x268)
[] (dev_hard_start_xmit+0x0/0x268) from [] (__qdisc_run+0x60/0x270)
 r8 = C1BBC914 r7 = C21418B8 r6 = r5 = C21418B8 r4 = C1BBC800
[] (__qdisc_run+0x0/0x270) from [] (dev_queue_xmit+0x1b4/0x25c)
[] (dev_queue_xmit+0x0/0x25c) from [] (ip_output+0x150/0x254)
 r7 = C329F040 r6 = C21418B8 r5 = r4 = C0D02EE0
[] (ip_output+0x0/0x254) from [] (ip_queue_xmit+0x360/0x4b4)
[] (ip_queue_xmit+0x0/0x4b4) from [] (tcp_transmit_skb+0x5ec/0x8c0)
[] (tcp_transmit_skb+0x0/0x8c0) from [] (tcp_push_one+0xb4/0x13c)
[] (tcp_push_one+0x0/0x13c) from [] (tcp_sendmsg+0x9a8/0xcdc)
 r8 = C2EF30A0 r7 = 05A8 r6 = r5 = C329F040 r4 = C2141820
[] (tcp_sendmsg+0x0/0xcdc) from [] (inet_sendmsg+0x60/0x64)
[] (inet_sendmsg+0x0/0x64) from [] (sock_aio_write+0x100/0x104)
 r7 = C14B9E94 r6 = 0001 r5 = C14B9E9C r4 = C2238A20
[] (sock_aio_write+0x4/0x104) from [] (do_sync_write+0xc8/0x114)
 r8 = C14B9E94 r7 = C14B9EE4 r6 = C14B9E9C r5 = r4 =
[] (do_sync_write+0x0/0x114) from [] (vfs_write+0x178/0x18c)
[] (vfs_write+0x0/0x18c) from [] (sys_write+0x4c/0x7c)
[] (sys_write+0x0/0x7c) from [] (ret_fast_syscall+0x0/0x2c)
 r8 = C0020084 r7 = 0004 r6 = 00082000 r5 = 2000 r4 = BE974C9C
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Ingo Molnar wrote:
> so right now the only option for a clean codebase is the KVM in-kernel
> code.

I strongly disagree with this. Bad code in userspace is not an excuse
for shoving stuff into the kernel, where maintaining it is much more
expensive, and the cause of a mistake can be system crashes and data
loss, affecting unrelated processes. If we move something into the
kernel, we'd better have a really good reason for it.

Qemu code _is_ crufty. We can do one of three things:

1. live with it
2. fork it and clean it up
3. clean it up incrementally and merge it upstream

Currently we're doing (1). You're suggesting a variant of (2), fork plus
move into the kernel. The right thing to do IMO is (3), but I don't see
anybody volunteering. Qemu picked up additional committers recently and
I believe they would be receptive to cleanups.

[In the *pic/pit case, we have other reasons to push things into the
kernel. But "this code is crap, let's rewrite it in the kernel" is not
a justification I'll accept. I'd be much happier if we could quantify
these other reasons.]

--
error compiling committee.c: too many arguments to function
[PATCH] Move sk_setup_caps out of line
It is far too large to be an inline and not in any hot paths.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc1-net/include/net/sock.h
===================================================================
--- linux-2.6.21-rc1-net.orig/include/net/sock.h
+++ linux-2.6.21-rc1-net/include/net/sock.h
@@ -1083,19 +1083,7 @@ static inline int sk_can_gso(const struc
 	return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
 }
 
-static inline void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
-{
-	__sk_dst_set(sk, dst);
-	sk->sk_route_caps = dst->dev->features;
-	if (sk->sk_route_caps & NETIF_F_GSO)
-		sk->sk_route_caps |= NETIF_F_GSO_MASK;
-	if (sk_can_gso(sk)) {
-		if (dst->header_len)
-			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
-		else
-			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
-	}
-}
+extern void sk_setup_caps(struct sock *sk, struct dst_entry *dst);
 
 static inline void sk_charge_skb(struct sock *sk, struct sk_buff *skb)
 {
Index: linux-2.6.21-rc1-net/net/core/sock.c
===================================================================
--- linux-2.6.21-rc1-net.orig/net/core/sock.c
+++ linux-2.6.21-rc1-net/net/core/sock.c
@@ -970,6 +970,21 @@ out:
 
 EXPORT_SYMBOL_GPL(sk_clone);
 
+void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
+{
+	__sk_dst_set(sk, dst);
+	sk->sk_route_caps = dst->dev->features;
+	if (sk->sk_route_caps & NETIF_F_GSO)
+		sk->sk_route_caps |= NETIF_F_GSO_MASK;
+	if (sk_can_gso(sk)) {
+		if (dst->header_len)
+			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		else
+			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
+	}
+}
+EXPORT_SYMBOL_GPL(sk_setup_caps);
+
 void __init sk_init(void)
 {
 	if (num_physpages <= 4096) {
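For readers who want to poke at the capability logic being moved, here is a small userspace model of it. It is only a sketch: the F_* bit values, struct sock_model and the function names are made up for illustration and do not match the kernel's NETIF_F_* layout; only the control flow follows the function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch; the kernel's NETIF_F_* differ. */
#define F_SG		(1u << 0)
#define F_HW_CSUM	(1u << 1)
#define F_GSO		(1u << 2)
#define F_GSO_SHIFT	16
#define F_GSO_MASK	0xffff0000u

struct sock_model {
	uint32_t route_caps;	/* stands in for sk->sk_route_caps */
	uint32_t gso_type;	/* stands in for sk->sk_gso_type */
};

/* True when every GSO type the socket needs is present in route_caps. */
bool can_gso(const struct sock_model *sk)
{
	uint32_t need = sk->gso_type << F_GSO_SHIFT;

	return (sk->route_caps & need) == need;
}

/* Same decision tree as sk_setup_caps(), minus the dst bookkeeping. */
void setup_caps(struct sock_model *sk, uint32_t dev_features, int header_len)
{
	assert(sk != NULL);
	sk->route_caps = dev_features;
	if (sk->route_caps & F_GSO)
		sk->route_caps |= F_GSO_MASK;	/* software GSO covers all types */
	if (can_gso(sk)) {
		if (header_len)
			sk->route_caps &= ~F_GSO_MASK;	/* extra headers: no GSO */
		else
			sk->route_caps |= F_SG | F_HW_CSUM;
	}
}
```

A device advertising only F_GSO ends up with scatter-gather and checksum bits set when header_len is zero, and with the GSO type bits stripped when it is not.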
Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
Ok, a simple analysis reveals the recursive spinlock: On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote: > [] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [] > (ppp_output_wakeup+0x18/0x1c [ppp_generic]) ===> > r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 = ===>spin_lock_bh(&pch->downl); > [] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [] > (irnet_flow_indication+0x38/0x3c [irnet]) > [] (irnet_flow_indication+0x0/0x3c [irnet]) from [] > (irttp_run_tx_queue+0x1c0/0x1d4 [irda]) > [] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [] > (irttp_data_request+0x128/0x4f8 [irda]) > r8 = BF121560 r7 = 0002 r6 = C38F4200 r5 = C21418B8 > r4 = C21418B8 > [] (irttp_data_request+0x0/0x4f8 [irda]) from [] > (ppp_irnet_send+0x134/0x238 [irnet]) > [] (ppp_irnet_send+0x0/0x238 [irnet]) from [] > (ppp_push+0x80/0xb8 [ppp_generic]) > r7 = C3A436E0 r6 = r5 = C21418B8 r4 = C1489600 > [] (ppp_push+0x0/0xb8 [ppp_generic]) from [] > (ppp_xmit_process+0x34/0x50c [ppp_generic]) ===> > r7 = 0021 r6 = C21418B8 r5 = C1489600 r4 = ===>spin_lock_bh(&pch->downl); > [] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [] > (ppp_start_xmit+0x128/0x254 [ppp_generic]) > [] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [] > (dev_hard_start_xmit+0x170/0x268) > [] (dev_hard_start_xmit+0x0/0x268) from [] > (__qdisc_run+0x60/0x270) > r8 = C1BBC914 r7 = C21418B8 r6 = r5 = C21418B8 > r4 = C1BBC800 > [] (__qdisc_run+0x0/0x270) from [] > (dev_queue_xmit+0x1b4/0x25c) Now, does anyone have an idea how best to fix it? 1. Should re-entrance in ppp_channel_push() be prevented, and if yes - at which level? Or 2. Should re-entrance be allowed and only recursive spin_lock_bh() avoided? Thanks Guennadi - Guennadi Liakhovetski, Ph.D. DSA Daten- und Systemtechnik GmbH Pascalstr. 28 D-52076 Aachen Germany - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
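To make option 2 concrete, here is a toy userspace model, not ppp_generic code and not a proposed patch. The lock remembers its owner, so a second acquire by the same task is reported instead of tripping the BUG_ON(), and the re-entered push simply records that it was deferred. struct owned_lock, struct channel and all function names are hypothetical; the model is single-threaded and ignores contention from other tasks.

```c
#include <assert.h>
#include <stddef.h>

struct owned_lock {
	const void *owner;	/* NULL while the lock is free */
};

enum lock_result { LOCK_TAKEN, LOCK_RECURSION };

enum lock_result owned_lock_acquire(struct owned_lock *l, const void *task)
{
	if (l->owner == task)
		return LOCK_RECURSION;	/* where -rt hits the BUG_ON() */
	assert(l->owner == NULL);	/* single-threaded model: no waiting */
	l->owner = task;
	return LOCK_TAKEN;
}

void owned_lock_release(struct owned_lock *l, const void *task)
{
	assert(l->owner == task);
	l->owner = NULL;
}

struct channel {
	struct owned_lock downl;	/* stands in for pch->downl */
	int pushes;			/* pushes that ran under the lock */
	int deferred;			/* re-entrant pushes that backed off */
};

/*
 * driver_calls_back models the path in the trace where the driver's send
 * routine ends up in ppp_output_wakeup() before the outer push returns.
 */
void channel_push(struct channel *ch, const void *task, int driver_calls_back)
{
	if (owned_lock_acquire(&ch->downl, task) == LOCK_RECURSION) {
		ch->deferred++;		/* let the outer push finish the work */
		return;
	}
	ch->pushes++;
	if (driver_calls_back)
		channel_push(ch, task, 0);
	owned_lock_release(&ch->downl, task);
}
```

Whether backing off like this is acceptable depends on the outer push noticing that more work arrived, which this model does not show; option 1 (preventing the re-entry in the first place) avoids that question.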
[SC92031]: Fix priv->lock context (was: 2.6.21-rc5-mm4)
On Thu, Apr 05, 2007 at 02:04:02AM +, Andrew Morton wrote:
>
> This looks like a locking bug in the ipv6 changes in davem's devel tree.
> There are no relevant changes to drivers/net/sc92031.c in rc5-mm4.

Actually, this looks like a latent bug in sc92031. It's calling
spin_lock in the dev->open function on a lock that's held in BH
context.

[SC92031]: Fix priv->lock context

The spin_lock calls made in dev->open and dev->close must disable BH
since open/close are made in process context. Conversely, the call in
dev->hard_start_xmit does not need to disable BH since it is already
executing with BH disabled.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] Eliminate some unnecessary gotos in tcp v4 hash handling
The compiler eliminates them anyways and this makes the code easier to
read and shorter.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3-net/net/ipv4/tcp_ipv4.c
===================================================================
--- linux-2.6.21-rc3-net.orig/net/ipv4/tcp_ipv4.c
+++ linux-2.6.21-rc3-net/net/ipv4/tcp_ipv4.c
@@ -2052,8 +2052,7 @@ static void *established_get_first(struc
 			if (sk->sk_family != st->family) {
 				continue;
 			}
-			rc = sk;
-			goto out;
+			return sk;
 		}
 		st->state = TCP_SEQ_STATE_TIME_WAIT;
 		inet_twsk_for_each(tw, node,
@@ -2061,13 +2060,11 @@ static void *established_get_first(struc
 			if (tw->tw_family != st->family) {
 				continue;
 			}
-			rc = tw;
-			goto out;
+			return tw;
 		}
 		read_unlock(&tcp_hashinfo.ehash[st->bucket].lock);
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 	}
-out:
 	return rc;
 }
 
@@ -2088,10 +2085,8 @@ get_tw:
 		while (tw && tw->tw_family != st->family) {
 			tw = tw_next(tw);
 		}
-		if (tw) {
-			cur = tw;
-			goto out;
-		}
+		if (tw)
+			return tw;
 		read_unlock(&tcp_hashinfo.ehash[st->bucket].lock);
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 
@@ -2111,16 +2106,12 @@ get_tw:
 
 	sk_for_each_from(sk, node) {
 		if (sk->sk_family == st->family)
-			goto found;
+			return sk;
 	}
 
 	st->state = TCP_SEQ_STATE_TIME_WAIT;
 	tw = tw_head(&tcp_hashinfo.ehash[st->bucket].twchain);
 	goto get_tw;
-found:
-	cur = sk;
-out:
-	return cur;
 }
 
 static void *established_get_idx(struct seq_file *seq, loff_t pos)
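The transformation in this patch is the usual one: when the label does nothing but return the value just assigned, the assignment plus goto can become a direct return. A standalone sketch of the two shapes, using a made-up struct entry table rather than the TCP hash:

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	int family;
	int id;
};

/* Before: remember the hit in rc, then jump to a label that only returns. */
const struct entry *find_with_goto(const struct entry *tab, size_t n,
				   int family)
{
	const struct entry *rc = NULL;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tab[i].family != family)
			continue;
		rc = &tab[i];
		goto out;
	}
out:
	return rc;
}

/* After: return the hit directly; the fall-through case returns NULL. */
const struct entry *find_with_return(const struct entry *tab, size_t n,
				     int family)
{
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tab[i].family == family)
			return &tab[i];
	}
	return NULL;
}
```

The two are only interchangeable because nothing is cleaned up at the label. If `out:` dropped a lock or freed memory, the early return would skip it.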
Re: [PATCH 3/4] 8139too: RTNL and flush_scheduled_work deadlock
Ben Greear <[EMAIL PROTECTED]> :
[...]
> It looks like this has not made it into the 2.6.20 stable series
> patches... Any reason not to add it there?

No. Go ahead and submit it.

--
Ueimor

Anybody got a battery for my Ultra 10 ?
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> * Rusty Russell <[EMAIL PROTECTED]> wrote:
>
> > It's easier to write a kernel-space network driver, but it's not
> > obviously the right thing to do until we can show that an efficient
> > packet-level userspace interface isn't possible. I don't think
> > that's been done, and it would be interesting to try.
>
> yes, i agree in theory, [...]

let me explain my position a bit more verbosely:

i agree in terms of 'network driver' (and more generally in terms of
'device', which includes network, storage, console, etc. devices):
having a user-space driver option should still be possible and it should
be integrated well. Qemu is quite rich and flexible in these areas and
we dont want to throw away or isolate that body of code.

but i dont agree in terms of PIC code, which is the main argument in
this particular thread. There's little precedent for any add-ons for
PICs in user-space, nor any particular PIC handling richness in Qemu
that we'd like to preserve.

	Ingo
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Ingo Molnar wrote:
> * Avi Kivity <[EMAIL PROTECTED]> wrote:
>
> > > so right now the only option for a clean codebase is the KVM
> > > in-kernel code.
> >
> > I strongly disagree with this.
>
> are you disagreeing with my statement that the KVM kernel-side code is
> the only clean codebase here? To me this is a clear fact :)

No, I agree with that. I just disagree with choosing to put the *pic
code (or other code) into the kernel on *that* basis. The selection
should be on design/performance issues alone, *not* the state of
existing code.

> I only pointed out that the only clean codebase at the moment is the KVM
> in-kernel code - i did not make the argument (at all) that every new
> piece of KVM code should be done in the kernel. That would be stupid -
> do you think i'd advocate for example moving command line argument
> parsing into the kernel?

No. But the difference in cruftiness between kvm and qemu code should
not enter into the discussion of where to do things.

> and as i said in the mail: "the kernel _is_ the best place to do this
> particular stuff".

I agree with this, maybe for different reasons.

--
error compiling committee.c: too many arguments to function
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
* Avi Kivity <[EMAIL PROTECTED]> wrote:

> > so right now the only option for a clean codebase is the KVM
> > in-kernel code.
>
> I strongly disagree with this.

are you disagreeing with my statement that the KVM kernel-side code is
the only clean codebase here? To me this is a clear fact :)

I only pointed out that the only clean codebase at the moment is the KVM
in-kernel code - i did not make the argument (at all) that every new
piece of KVM code should be done in the kernel. That would be stupid -
do you think i'd advocate for example moving command line argument
parsing into the kernel?

and as i said in the mail: "the kernel _is_ the best place to do this
particular stuff".

	Ingo
Re: Silent corruption with r8169
> I'll try to get to testing this, but I'm wondering if people may have
> misunderstood my original post. I don't get any corruption over
> Ethernet; it's just corruption on the filesystem during certain load
> patterns that involve the Realtek ethernet card.

If disabling hardware checksums helps, then you know the corruption is
on the Ethernet side. Otherwise it's somewhere else.

-Andi
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
* Rusty Russell <[EMAIL PROTECTED]> wrote:

> It's easier to write a kernel-space network driver, but it's not
> obviously the right thing to do until we can show that an efficient
> packet-level userspace interface isn't possible. I don't think that's
> been done, and it would be interesting to try.

yes, i agree in theory, but IMO this is largely beside the point. What
matters most for developing a project is _the quality of the codebase_.
That attracts developers, developers improve the code, which then
attracts users, which attracts more developers, etc., etc. As long as
the quality of the codebase is maintained, this is a self-sustaining
process. You've seen that happen with Linux. [ And of course, the
crucial step #0 is: a sane, open-minded maintainer with good taste ;-) ]

qemu's code quality is not really suitable for that basic OSS model, in
my opinion. It has been a mostly one-man show for a long time with
various hostile forks, bin-only kernel module and other actions that
easily poison an OSS project.

the result is not surprising: important portions of qemu have grown into
a hard to hack, hard to maintain codebase with poor code quality, with
gems like:

 #ifdef _WIN32
 void CALLBACK host_alarm_handler(UINT uTimerID, UINT uMsg,
                                  DWORD_PTR dwUser, DWORD_PTR dw1, DWORD_PTR dw2)
 #else
 static void host_alarm_handler(int host_signum)
 #endif
 {
 #if 0
 #define DISP_FREQ 1000

and that's not just some random driver - this is _the_ main central
timer code of qemu.

so right now the only option for a clean codebase is the KVM in-kernel
code. It's clean and sweet and integrates nicely into the rest of the
kernel. The kernel is also obviously the final place where most
virtualization technologies want to show up because it's the entity that
is the closest to the guest context: we _dont_ want to _force_ network
traffic (let alone interrupt handling) through a userspace context, only
if the functionality of the task absolutely requires it. (but in most
cases we'll try to come up with a maximally flexible scheme that can
just drive things straight via the kernel. netfilter/iptables isnt in
user-space either, partly for that reason.)

but architectural issues aside (ignoring that the kernel _is_ the best
place to do this particular stuff), this question is still mainly
dominated by the basic question of code quality. I'd rather move
something into the Linux kernel, enforce its code quality that way, and
_then_ add whatever clean infrastructure is needed to push it back into
user-space again (into a different codebase), than having to hack the
monolithic 200 KLOC+ qemu codebase that is shackled with support for
tons of arcane architectures nobody uses and tons of arcane OS variants
that no-one cares about. Now qemu is a very important enabler and
platform-reference-implementation for KVM to fall back to, but it's not
the place to put crucial new code into, at least currently.

	Ingo
Re: [PATCH] ethtool: additional 10Gig niceness
Rick Jones wrote:
> teach ethtool to print "1Mb/s" for a 10G NIC and prepare for 10G NICs
> where it is possible to run something other than 10G
>
> update the ethtool.8 manpage with info re same and some grammar fixes
>
> Signed-off-by: Rick Jones <[EMAIL PROTECTED]>
>
> the likely required asbestos at the ready :)

applied
[PATCH] ehea: fix for dlpar and sysfs entries
This patch includes: - dlpar fix: certain resources may only be allocated when first logical port is available, and must be removed when last logical port has been removed - sysfs entries: create symbolic link from each logical port to ehea driver Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> --- This patch applies on top of the netdev upstream branch for 2.6.22 diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h index 1405d0b..173994d 100644 --- a/drivers/net/ehea/ehea.h +++ b/drivers/net/ehea/ehea.h @@ -39,7 +39,7 @@ #include #include #define DRV_NAME "ehea" -#define DRV_VERSION"EHEA_0055" +#define DRV_VERSION"EHEA_0056" #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \ | NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index a36a023..f9f3133 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -78,6 +78,28 @@ MODULE_PARM_DESC(sq_entries, " Number of __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")"); MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 1 "); +static int port_name_cnt = 0; + +static int __devinit ehea_probe_adapter(struct ibmebus_dev *dev, +const struct of_device_id *id); + +static int __devexit ehea_remove(struct ibmebus_dev *dev); + +static struct of_device_id ehea_device_table[] = { + { + .name = "lhea", + .compatible = "IBM,lhea", + }, + {}, +}; + +static struct ibmebus_driver ehea_driver = { + .name = "ehea", + .id_table = ehea_device_table, + .probe = ehea_probe_adapter, + .remove = ehea_remove, +}; + void ehea_dump(void *adr, int len, char *msg) { int x; unsigned char *deb = adr; @@ -2108,6 +2130,28 @@ static int ehea_clean_all_portres(struct return ret; } +static void ehea_remove_adapter_mr (struct ehea_adapter *adapter) +{ + int i; + + for (i=0; i < EHEA_MAX_PORTS; i++) + if (adapter->port[i]) + return; + + ehea_rem_mr(&adapter->mr); +} + +static int ehea_add_adapter_mr (struct ehea_adapter *adapter) +{ + int i; 
+ + for (i=0; i < EHEA_MAX_PORTS; i++) + if (adapter->port[i]) + return 0; + + return ehea_reg_kernel_mr(adapter, &adapter->mr); +} + static int ehea_up(struct net_device *dev) { int ret, i; @@ -2361,6 +2405,34 @@ static void __devinit logical_port_relea of_node_put(port->ofdev.node); } +static int ehea_driver_sysfs_add(struct device *dev, + struct device_driver *driver) +{ + int ret; + + ret = sysfs_create_link(&driver->kobj, &dev->kobj, + kobject_name(&dev->kobj)); + if (ret == 0) { + ret = sysfs_create_link(&dev->kobj, &driver->kobj, + "driver"); + if (ret) + sysfs_remove_link(&driver->kobj, + kobject_name(&dev->kobj)); + } + return ret; +} + +static void ehea_driver_sysfs_remove(struct device *dev, + struct device_driver *driver) +{ + struct device_driver *drv = driver; + + if (drv) { + sysfs_remove_link(&drv->kobj, kobject_name(&dev->kobj)); + sysfs_remove_link(&dev->kobj, "driver"); + } +} + static struct device *ehea_register_port(struct ehea_port *port, struct device_node *dn) { @@ -2368,8 +2440,9 @@ static struct device *ehea_register_port port->ofdev.node = of_node_get(dn); port->ofdev.dev.parent = &port->adapter->ebus_dev->ofdev.dev; + port->ofdev.dev.bus = &ibmebus_bus_type; - sprintf(port->ofdev.dev.bus_id, "port%d", port->logical_port_id); + sprintf(port->ofdev.dev.bus_id, "port%d", port_name_cnt++); port->ofdev.dev.release = logical_port_release; ret = of_device_register(&port->ofdev); @@ -2384,8 +2457,16 @@ static struct device *ehea_register_port goto out_unreg_of_dev; } + ret = ehea_driver_sysfs_add(&port->ofdev.dev, &ehea_driver.driver); + if (ret) { + ehea_error("failed to register sysfs driver link"); + goto out_rem_dev_file; + } + return &port->ofdev.dev; +out_rem_dev_file: + device_remove_file(&port->ofdev.dev, &dev_attr_log_port_id); out_unreg_of_dev: of_device_unregister(&port->ofdev); out: @@ -2394,6 +2475,7 @@ out: static void ehea_unregister_port(struct ehea_port *port) { + ehea_driver_sysfs_remove(&port->ofdev.dev, &ehea_driver.driver); 
device_remove_file(&port->ofdev.dev, &dev_attr_log_port_id); of_device_unregister(&port->ofdev); } @@ -2520,7 +2602
Re: [PATCH] [IPv6] Exclude truncated packets from InHdrErrors statistics
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Thu, 05 Apr 2007 02:21:52 +0900 (JST)

> In article <[EMAIL PROTECTED]> (at Tue, 3 Apr 2007 15:55:51 +0900), Mitsuru Chinen <[EMAIL PROTECTED]> says:
>
> > Incoming truncated packets are counted as not only InTruncatedPkts but
> > also InHdrErrors. They should be counted as InTruncatedPkts only.
> >
> > Signed-off-by: Mitsuru Chinen <[EMAIL PROTECTED]>
>
> Acked-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Applied, thanks everyone.
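The accounting rule agreed on here can be sketched as a tiny classifier: a packet whose header claims more payload than actually arrived bumps the truncated counter and nothing else. This is an illustration only, since the patch itself is not quoted in this message. struct in_stats and account_packet() are hypothetical names, and which checks feed the header-error counter is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define IPV6_HDR_LEN 40	/* size of the fixed IPv6 header in bytes */

struct in_stats {
	unsigned long in_truncated_pkts;	/* models InTruncatedPkts */
	unsigned long in_hdr_errors;		/* models InHdrErrors */
};

enum verdict { PKT_ACCEPT, PKT_DROP };

/*
 * payload_len is the length claimed by the header, received_len the
 * number of bytes that actually arrived (header included).
 */
enum verdict account_packet(struct in_stats *st, unsigned int version,
			    size_t payload_len, size_t received_len)
{
	assert(st != NULL);

	/* Assumed header checks: too short for a header, or wrong version. */
	if (received_len < IPV6_HDR_LEN || version != 6) {
		st->in_hdr_errors++;
		return PKT_DROP;
	}

	/* Truncated: count it once, and do not also count a header error. */
	if (payload_len + IPV6_HDR_LEN > received_len) {
		st->in_truncated_pkts++;
		return PKT_DROP;
	}

	return PKT_ACCEPT;
}
```

Keeping the two drop paths separate is the whole point: if the truncated case falls through into a shared error exit, it gets counted twice, which is the behaviour the commit message describes.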
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote:
> You didn't quote Anthony's point about "it's more about there not being
> good enough userspace interfaces to do network IO."
>
> It's easier to write a kernel-space network driver, but it's not
> obviously the right thing to do until we can show that an efficient
> packet-level userspace interface isn't possible. I don't think that's
> been done, and it would be interesting to try.

In the case of networking, the copyful interfaces on receive are driven
by the hardware not knowing how to split the header from the data. On
transmit I agree, it could be made copyless from userspace (something
like sendfilev, only not file oriented).

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[PATCH] Uninline tcp_done
The function is quite big and has several call sites and nothing to
collapse by compiler optimization on inlining. Besides, it's nicer to
read in a .c file.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc3-net/include/net/tcp.h
===
--- linux-2.6.21-rc3-net.orig/include/net/tcp.h
+++ linux-2.6.21-rc3-net/include/net/tcp.h
@@ -918,21 +918,7 @@ static inline void tcp_set_state(struct
 #endif
 }
-static inline void tcp_done(struct sock *sk)
-{
-	if(sk->sk_state == TCP_SYN_SENT || sk->sk_state == TCP_SYN_RECV)
-		TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
-
-	tcp_set_state(sk, TCP_CLOSE);
-	tcp_clear_xmit_timers(sk);
-
-	sk->sk_shutdown = SHUTDOWN_MASK;
-
-	if (!sock_flag(sk, SOCK_DEAD))
-		sk->sk_state_change(sk);
-	else
-		inet_csk_destroy_sock(sk);
-}
+extern void tcp_done(struct sock *sk);
 static inline void tcp_sack_reset(struct tcp_options_received *rx_opt)
 {
Index: linux-2.6.21-rc3-net/net/ipv4/tcp.c
===
--- linux-2.6.21-rc3-net.orig/net/ipv4/tcp.c
+++ linux-2.6.21-rc3-net/net/ipv4/tcp.c
@@ -2372,6 +2372,23 @@ void __tcp_put_md5sig_pool(void)
 EXPORT_SYMBOL(__tcp_put_md5sig_pool);
 #endif
+void tcp_done(struct sock *sk)
+{
+	if(sk->sk_state == TCP_SYN_SENT || sk->sk_state == TCP_SYN_RECV)
+		TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
+
+	tcp_set_state(sk, TCP_CLOSE);
+	tcp_clear_xmit_timers(sk);
+
+	sk->sk_shutdown = SHUTDOWN_MASK;
+
+	if (!sock_flag(sk, SOCK_DEAD))
+		sk->sk_state_change(sk);
+	else
+		inet_csk_destroy_sock(sk);
+}
+EXPORT_SYMBOL_GPL(tcp_done);
+
 extern void __skb_cb_too_small_for_tcp(int, int);
 extern struct tcp_congestion_ops tcp_reno;
Re: tg3 reset_task question
On Wed, Apr 04, 2007 at 10:10:07PM +, Stephen Hemminger wrote:
>
> > > Yes, you're right. Perhaps we should get the rtnl first before
> > > tg3_full_lock(), or turn irq_sync into an atomic counter that allows
> > > nesting.
>
> > When you start reinventing windows locks or the BKL, you know
> > you are going down the wrong path

Actually, I think what Michael's suggesting is quite different. This
would be a simple counter that tells the IRQ handlers to not process
any events. So this isn't really a lock as such.

FWIW I think the counter sounds better than using the RTNL since with
the latter you'd have to figure out whether you're in an RTNL context
or not (e.g., tg3_suspend would also need to grab it).

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
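For readers following the thread, the "counter that allows nesting" idea can be sketched roughly as below. This is only an illustration in portable C11, not tg3 code: `struct demo_dev`, `demo_irq_quiesce()`, `demo_irq_resume()` and `demo_irq_handler()` are hypothetical names, and a real driver would also have to wait for handlers that are already running before relying on the counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver's private state (not tg3's struct). */
struct demo_dev {
	atomic_int irq_sync;	/* > 0 means: handlers must not process events */
	int events_handled;	/* how many events the handler accepted */
};

/* Ask the handlers to stay quiet. Calls may nest. */
static void demo_irq_quiesce(struct demo_dev *d)
{
	atomic_fetch_add(&d->irq_sync, 1);
}

/* Undo one demo_irq_quiesce(). Handlers resume when the count hits 0. */
static void demo_irq_resume(struct demo_dev *d)
{
	atomic_fetch_sub(&d->irq_sync, 1);
}

/* Handler: returns true if it processed the event, false if told to skip. */
static bool demo_irq_handler(struct demo_dev *d)
{
	if (atomic_load(&d->irq_sync) > 0)
		return false;
	d->events_handled++;
	return true;
}
```

Because the handler only reads the counter and never blocks on it, two independent callers can each quiesce and resume without knowing about the other, which is the property a plain 0/1 flag lacks.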
Re: IPsec PMTUD problem
Herbert Xu wrote:
> On Tue, Apr 03, 2007 at 06:32:07PM +0200, Patrick McHardy wrote:
>
>>I'm not sure I understand how this would work, the ICMP message
>>looks the same in both cases. Or are you suggesting to
>>differentiate based on the source of the ICMP message?
>
> Actually you're right, this can't work in the general case. Even
> if we had real devices for IPsec tunnels, there is still no way to
> reliably figure out which device we should attribute a given MTU
> event to if the same address appears on more than one device.
>
>>Yes, that would work as a workaround, but it still seems like
>>something worth fixing.
>
> One possible solution is to not send MTU errors to ourselves since
> it wouldn't give us any new information. We'd need to audit the
> users of icmp_send to make sure that there isn't a legitimate case
> where we'd want to do that.

One such case is delivery of errors to sockets. We'd need to make
sure the errors are delivered some other way.
Re: IPsec PMTUD problem
Herbert Xu wrote:
> On Thu, Apr 05, 2007 at 02:09:20PM +0200, Patrick McHardy wrote:
>
>>>One possible solution is to not send MTU errors to ourselves since
>>>it wouldn't give us any new information. We'd need to audit the
>>>users of icmp_send to make sure that there isn't a legitimate case
>>>where we'd want to do that.
>>
>>One such case is delivery of errors to sockets. We'd need to make
>>sure the errors are delivered some other way.
>
> Alternatively we can still send the ICMP error but avoid a PMTU
> update if we received it from ourselves.

That sounds easier. I'm currently working in that area anyway, I'll
give it a try.
Bug in SCTP with SCTP_BINDX_REM_ADDR
Folks,

while doing some testing of SCTP recently I came across a scenario
where the behavior I see is not what I expect. Here is the scenario:

I have 2 interfaces on a system, each has both an IPv4 and an IPv6
address, e.g.

eth0  192.168.1.130  ::ffff:192.168.1.130
eth1  192.168.3.130  ::ffff:192.168.3.130

I have a test program that creates an IPv6 socket and then does a
sctp_bindx() to add the first IPv6 address and a sctp_bindx() to add
the second IPv6 address using the SCTP_BINDX_ADD_ADDR option to
sctp_bindx(). I then call sctp_bindx() to remove the second IPv6
address using the SCTP_BINDX_REM_ADDR option. This call to sctp_bindx()
fails with EINVAL. Since there is still 1 address associated with the
endpoint I would expect this call to succeed.

I traced what was going on and here is what I observed.

The sctp_bindx() call in user space eventually turns into a call to
sctp_bindx_add() in the kernel which in turn calls sctp_do_bind(). In
sctp_do_bind() there is the following code:

	/* PF specific bind() address verification. */
	if (!sp->pf->bind_verify(sp, addr))
		return -EADDRNOTAVAIL;

bind_verify() is a function pointer which resolves into a call to
sctp_inet6_bind_verify() since I'm dealing with an IPv6 socket. This
function verifies that the sockaddr looks bindable. After doing some
checks it calls sctp_v6_available() through another function pointer.
The 2 IPv6 addresses assigned to the 2 interfaces are IPv6 addresses
which are mapped to IPv4 addresses. In sctp_v6_available() there is the
following code:

	if (type == IPV6_ADDR_MAPPED) {
		if (sp && !sp->v4mapped)
			return 0;
		if (sp && ipv6_only_sock(sctp_opt2sk(sp)))
			return 0;
		sctp_v6_map_v4(addr);
		return sctp_get_af_specific(AF_INET)->available(addr, sp);
	}

Since my IPv6 addresses are IPv4 mapped addresses sctp_v6_map_v4() gets
called which converts the IPv6 address to an IPv4 address. So what
originally started as an IPv6 address in sctp_bindx_add() has now been
converted to an IPv4 address.

We then return back to sctp_do_bind(). We do some more checks and then
we call sctp_add_bind_addr() to actually complete the bind process.
sctp_add_bind_addr() adds the address we just processed to the bind
address list. So at this point in time we have 2 entries on
&bp->address_list (see net/sctp/bind_addr.c), each entry contains an
address family value of 2 (AF_INET), a port number and an IPv4 address.
So far so good.

Now we get to the core of the problem. When we call sctp_bindx() with
the SCTP_BINDX_REM_ADDR option to remove the IPv6 address we call
sctp_bindx_rem(). This function does some validation and then calls
sctp_del_bind_addr() to remove the address from the bind address list.
sctp_del_bind_addr() walks the bind address list &bp->address_list
looking for a match. Since we are processing an IPv6 address the
address family is 10 (AF_INET6). The call to sctp_cmp_addr_exact() in
sctp_del_bind_addr() never finds a match and so sctp_del_bind_addr()
returns EINVAL. sctp_cmp_addr_exact() fails on the compare of the
address family, what's on the list is AF_INET and what it's comparing
against is AF_INET6.

What is happening is that the check for IPV6_ADDR_MAPPED that occurs
during the add is missing when you do the remove and hence the IPv6
address is never mapped to the IPv4 address causing the lookup to fail.

Below is the patch to add the necessary checks to do the mapping. This
patch is against 2.6.21-rc5

Does this make sense? Any comments are appreciated.

Thank you,
Paolo

I've attached the test program - compile as

gcc -o bindx-test-ipv6 bindx-test-ipv6.c -lsctp

>8 ==
--- net/sctp/socket.c.orig	2007-04-04 13:22:59.0 -0700
+++ net/sctp/socket.c	2007-04-04 13:25:35.0 -0700
@@ -627,6 +627,27 @@ int sctp_bindx_rem(struct sock *sk, stru
 			retval = -EINVAL;
 			goto err_bindx_rem;
 		}
+		/*
+		 * It's possible that we mapped an IPV6 addr to an IPV4 addr
+		 * during the sctp_bindx_add() operation. This will happen if
+		 * the IPV6 address we assigned to an interface is a mapped
+		 * address, e.g. ::ffff:192.0.2.128. If we have mapped an IPV6
+		 * address to an IPV4 address during the add we need to make
+		 * sure we do the same thing during the remove, otherwise we
+		 * wont find a match on the address_list.
+		 */
+
+		if (af->sa_family == AF_INET6) {
+			struct in6_addr *in6;
+			int type;
+
+			in6 = (struct in6_addr *)&sa_addr->v6.sin6_addr;
+			type = ipv6_addr_type(in6);
+
+			if (type == IPV6_ADDR_MAPPED)
+				sctp_v6_map_v4(sa_addr);
+
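The mismatch described in this report can be reproduced in miniature in user space. The sketch below is an illustration under stated assumptions, not kernel code: `union demo_addr`, `demo_map_v4()`, `demo_cmp_exact()` and `demo_make_v6()` are hypothetical stand-ins for the kernel's address union, sctp_v6_map_v4() and sctp_cmp_addr_exact(), and the addresses are written in the standard `::ffff:a.b.c.d` mapped form.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the kernel's per-family address union. */
union demo_addr {
	struct sockaddr     sa;
	struct sockaddr_in  v4;
	struct sockaddr_in6 v6;
};

/* Rewrite ::ffff:a.b.c.d in place as AF_INET a.b.c.d; leave others alone. */
static void demo_map_v4(union demo_addr *a)
{
	struct in_addr in4;
	in_port_t port;

	if (a->sa.sa_family != AF_INET6 ||
	    !IN6_IS_ADDR_V4MAPPED(&a->v6.sin6_addr))
		return;

	port = a->v6.sin6_port;
	memcpy(&in4, &a->v6.sin6_addr.s6_addr[12], sizeof(in4));
	memset(a, 0, sizeof(*a));
	a->v4.sin_family = AF_INET;
	a->v4.sin_port = port;
	a->v4.sin_addr = in4;
}

/* Exact compare: the family must match before address and port are checked. */
static bool demo_cmp_exact(const union demo_addr *x, const union demo_addr *y)
{
	if (x->sa.sa_family != y->sa.sa_family)
		return false;
	if (x->sa.sa_family == AF_INET)
		return x->v4.sin_port == y->v4.sin_port &&
		       x->v4.sin_addr.s_addr == y->v4.sin_addr.s_addr;
	return x->v6.sin6_port == y->v6.sin6_port &&
	       memcmp(&x->v6.sin6_addr, &y->v6.sin6_addr,
		      sizeof(struct in6_addr)) == 0;
}

/* Build an AF_INET6 address from text; port is given in host byte order. */
static union demo_addr demo_make_v6(const char *text, unsigned short port)
{
	union demo_addr a;

	memset(&a, 0, sizeof(a));
	a.v6.sin6_family = AF_INET6;
	a.v6.sin6_port = htons(port);
	inet_pton(AF_INET6, text, &a.v6.sin6_addr);
	return a;
}
```

If the stored entry went through `demo_map_v4()` on the add path, a later lookup with the unmapped AF_INET6 form fails on the family check alone; applying the same mapping to the lookup key first makes the compare succeed.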
Re: IPsec PMTUD problem
On Thu, Apr 05, 2007 at 02:16:53PM +0200, Patrick McHardy wrote:
>
> That sounds easier. I'm currently working in that area anyway, I'll
> give it a try.

Thanks Patrick!
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: IPsec PMTUD problem
On Thu, Apr 05, 2007 at 02:09:20PM +0200, Patrick McHardy wrote:
>
> > One possible solution is to not send MTU errors to ourselves since
> > it wouldn't give us any new information. We'd need to audit the
> > users of icmp_send to make sure that there isn't a legitimate case
> > where we'd want to do that.
>
> One such case is delivery of errors to sockets. We'd need to make
> sure the errors are delivered some other way.

Alternatively we can still send the ICMP error but avoid a PMTU
update if we received it from ourselves.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: IPsec PMTUD problem
On Tue, Apr 03, 2007 at 06:32:07PM +0200, Patrick McHardy wrote:
>
> I'm not sure I understand how this would work, the ICMP message
> looks the same in both cases. Or are you suggesting to
> differentiate based on the source of the ICMP message?

Actually you're right, this can't work in the general case. Even
if we had real devices for IPsec tunnels, there is still no way to
reliably figure out which device we should attribute a given MTU
event to if the same address appears on more than one device.

> Yes, that would work as a workaround, but it still seems like
> something worth fixing.

One possible solution is to not send MTU errors to ourselves since
it wouldn't give us any new information. We'd need to audit the
users of icmp_send to make sure that there isn't a legitimate case
where we'd want to do that.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Ingo Molnar wrote:
> * Rusty Russell <[EMAIL PROTECTED]> wrote:
>
> > It's easier to write a kernel-space network driver, but it's not
> > obviously the right thing to do until we can show that an efficient
> > packet-level userspace interface isn't possible. I don't think
> > that's been done, and it would be interesting to try.
>
> yes, i agree in theory, but IMO this is largely beside the point. What
> matters most for developing a project is _the quality of the
> codebase_. That attracts developers, developers improve the code,
> which then attracts users, which attracts more developers, etc., etc.
> As long as the quality of the codebase is maintained, this is a
> self-sustaining process. You've seen that happen with Linux.
>
> [ And of course, the crucial step #0 is: a sane, open-minded
>   maintainer with good taste ;-) ]
>
> qemu's code quality is not really suitable for that basic OSS model,
> in my opinion.

I think you may want to step off your high horse there. QEMU's code may
not be Linux kernel quality but it's certainly not anywhere near the
worst that is out there.

Linux is over a decade old. QEMU is only around 3 years old. Did Linux
have extremely high quality code in 1994?

Instead of posting code snippets to LKML, it would be much more
constructive to post patches to qemu-devel. It's not like the QEMU
maintainers are actively ignoring your efforts to improve the code.

> but architectural issues aside (ignoring that the kernel _is_ the best
> place to do this particular of stuff),

Right. We don't put things in the kernel just because we don't like the
way the userspace code is written. If that logic was valid, then Linus
would be working on moving all of Gnome into the kernel.

This discussion has two parts. The first is whether or not the kernel
is the right place for a paravirtual network driver backend. My current
belief is that we could not get enough performance from something like
tun to do it in userspace. I also believe that we could improve tun (or
create a replacement) so that we could implement a PV network driver
backend in userspace. Admittedly, I'm not an expert in networking
though, so I could be wrong here.

The second part is whether the platform devices should go in the
kernel. I agree with you that having the PIT in the kernel is probably
a good idea. I also agree that we probably have no choice but to move
the APIC into the kernel (not for PV drivers, but for TPR performance
and SMP support).

Regards,

Anthony Liguori

> this question is still mainly dominated by the basic question of code
> quality. I'd rather move something into the Linux kernel, enforce its
> code quality that way, and _then_ add whatever clean infrastructure is
> needed to push it back into user-space again (into a different
> codebase), than having to hack the monolithic 200 KLOC+ qemu codebase
> that is shackled with support for tons of arcane architectures nobody
> uses and tons of arcane OS variants that no-one cares about. Now qemu
> is a very important enabler and platform-reference-implementation for
> KVM to fall back to, but it's not the place to put crucial new code
> into, at least currently.
>
> 	Ingo
[RTNL]: Improve error codes for unsupported operations
The most common trigger of these errors is that the config option which
would make the functionality available hasn't been enabled. Therefore
returning EOPNOTSUPP gives a better idea of what is going wrong.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.22/net/core/rtnetlink.c
===
--- net-2.6.22.orig/net/core/rtnetlink.c	2007-04-05 13:22:14.0 +0200
+++ net-2.6.22/net/core/rtnetlink.c	2007-04-05 13:22:51.0 +0200
@@ -861,7 +861,7 @@ static int rtnetlink_rcv_msg(struct sk_b
 	type = nlh->nlmsg_type;
 	if (type > RTM_MAX)
-		return -EINVAL;
+		return -EOPNOTSUPP;
 	type -= RTM_BASE;
@@ -884,7 +884,7 @@ static int rtnetlink_rcv_msg(struct sk_b
 		dumpit = rtnl_get_dumpit(family, type);
 		if (dumpit == NULL)
-			return -EINVAL;
+			return -EOPNOTSUPP;
 		return netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
 	}
@@ -912,7 +912,7 @@ static int rtnetlink_rcv_msg(struct sk_b
 	doit = rtnl_get_doit(family, type);
 	if (doit == NULL)
-		return -EINVAL;
+		return -EOPNOTSUPP;
 	return doit(skb, nlh, (void *)&rta_buf[0]);
 }
Re: [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:

> Ok, a simple analysis reveals the recursive spinlock:
>
> On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
>
> > [] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from []
> > (ppp_output_wakeup+0x18/0x1c [ppp_generic])
> ===>
> > r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 =
>
> ===> spin_lock_bh(&pch->downl);
>
> > [] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from []
> > (irnet_flow_indication+0x38/0x3c [irnet])
> > [] (irnet_flow_indication+0x0/0x3c [irnet]) from []
> > (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
> > [] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from []
> > (irttp_data_request+0x128/0x4f8 [irda])
> > r8 = BF121560 r7 = 0002 r6 = C38F4200 r5 = C21418B8
> > r4 = C21418B8
> > [] (irttp_data_request+0x0/0x4f8 [irda]) from []
> > (ppp_irnet_send+0x134/0x238 [irnet])
> > [] (ppp_irnet_send+0x0/0x238 [irnet]) from []
> > (ppp_push+0x80/0xb8 [ppp_generic])
> > r7 = C3A436E0 r6 = r5 = C21418B8 r4 = C1489600
> > [] (ppp_push+0x0/0xb8 [ppp_generic]) from []
> > (ppp_xmit_process+0x34/0x50c [ppp_generic])
> ===>
> > r7 = 0021 r6 = C21418B8 r5 = C1489600 r4 =
>
> ===> spin_lock_bh(&pch->downl);

For comments, below is a patch I am testing ATM. It doesn't look right
nor very pretty, but I couldn't come up with anything better. I do
sign-off for it in case nothing better is proposed and it is decided to
push it for 2.6.21.

Thanks
Guennadi
-
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

Fix recursion with PPP over IrNET.

Signed-off-by: G. Liakhovetski <[EMAIL PROTECTED]>

diff -u -r1.1.1.19.4.1 ppp_generic.c
--- a/drivers/net/ppp_generic.c	26 Mar 2007 09:21:32 -	1.1.1.19.4.1
+++ b/drivers/net/ppp_generic.c	5 Apr 2007 15:01:45 -
@@ -155,6 +155,7 @@
 	struct ppp_channel *chan;	/* public channel data structure */
 	struct rw_semaphore chan_sem;	/* protects `chan' during chan ioctl */
 	spinlock_t	downl;		/* protects `chan', file.xq dequeue */
+	struct task_struct *locker;	/* owner of the downl lock */
 	struct ppp	*ppp;		/* ppp unit we're connected to */
 	struct list_head clist;		/* link in list of channels per unit */
 	rwlock_t	upl;		/* protects `ppp' */
@@ -1214,8 +1215,11 @@
 		spin_lock_bh(&pch->downl);
 		if (pch->chan) {
+			/* Prevent recursive locking */
+			pch->locker = current;
 			if (pch->chan->ops->start_xmit(pch->chan, skb))
 				ppp->xmit_pending = NULL;
+			pch->locker = NULL;
 		} else {
 			/* channel got unregistered */
 			kfree_skb(skb);
@@ -1435,6 +1439,9 @@
 	struct sk_buff *skb;
 	struct ppp *ppp;
+	if (pch->locker == current)
+		return;
+
 	spin_lock_bh(&pch->downl);
 	if (pch->chan != 0) {
 		while (!skb_queue_empty(&pch->file.xq)) {
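The owner-tracking guard in this patch can be modelled without any kernel machinery. The sketch below is a single-threaded illustration only: a depth counter stands in for the downl spinlock (a depth of 2 is where a real non-recursive lock would deadlock), the address of `demo_task` stands in for `current`, and every `demo_*` name is hypothetical rather than taken from ppp_generic.c.

```c
#include <stddef.h>

/* Hypothetical model of a channel; lock_depth stands in for the spinlock. */
struct demo_chan {
	int lock_depth;		/* how many times the "lock" is currently held */
	int max_depth;		/* highest lock_depth ever observed */
	const void *locker;	/* owner while start_xmit runs, else NULL */
	int use_guard;		/* 1 = apply the owner check from the patch */
};

static const int demo_task = 0;	/* its address plays the role of "current" */

static void demo_lock(struct demo_chan *c)
{
	c->lock_depth++;
	if (c->lock_depth > c->max_depth)
		c->max_depth = c->lock_depth;
}

static void demo_unlock(struct demo_chan *c)
{
	c->lock_depth--;
}

/* Models the wakeup path, which takes the lock again to drain the queue. */
static void demo_channel_push(struct demo_chan *c)
{
	if (c->use_guard && c->locker == &demo_task)
		return;		/* we already hold the lock further up the stack */
	demo_lock(c);
	demo_unlock(c);
}

/* Models a lower layer whose transmit routine calls straight back up. */
static void demo_start_xmit(struct demo_chan *c)
{
	demo_channel_push(c);
}

/* Models the transmit path: lock, record the owner, call down, unlock. */
static void demo_push(struct demo_chan *c)
{
	demo_lock(c);
	c->locker = &demo_task;
	demo_start_xmit(c);
	c->locker = NULL;
	demo_unlock(c);
}
```

Without the guard the lock is taken twice on one call chain; with it, the inner call returns early and the depth never exceeds 1. Note that the early return also means the inner drain is skipped, which is one reason the author calls the approach "not very pretty".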
Re: [SC92031]: Fix priv->lock context
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 21:44:31 +1000

> On Thu, Apr 05, 2007 at 02:04:02AM +, Andrew Morton wrote:
> >
> > This looks like a locking bug in the ipv6 changes in davem's devel tree.
> > There are no relevant changes to drivers/net/sc92031.c in rc5-mm4.
>
> Actually, this looks like a latent bug in sc92031. It's calling
> spin_lock in the dev->open function on a lock that's held in BH
> context.
>
> [SC92031]: Fix priv->lock context

Where is the patch? :-)

Second time you've done this in two days Herbert, tsk tsk :)))
Re: [PATCH] ethtool: additional 10Gig niceness
> applied

Thanks.

One thing I noticed while making the changes is that the reported speed
is kept in a u16. With 10G we are already 1/6 of the way to the
maximum. I've no idea when 100G will arrive, but euros to berliners it
will probably arrive "some day", which means something will have to
give.

I've not thought it through completely, but my initial reaction would
be to suggest just making the thing a 64-bit quantity reporting bits
per second and not worrying about it again. And then one doesn't have
to worry if ethtool starts being applied to links which do not run at
integral multiples of a Mbit/s.

rick jones
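The arithmetic behind the "1/6 of the way" remark is easy to check. The sketch below assumes, as the message implies, that the speed is stored in Mbit/s; `demo_speed_u16()` and `demo_speed_bps()` are hypothetical helpers written for this illustration and are not part of ethtool.

```c
#include <stdint.h>

/* Speed in Mbit/s squeezed into 16 bits, as the message describes. */
static uint16_t demo_speed_u16(uint64_t mbps)
{
	return (uint16_t)mbps;	/* values above 65535 wrap modulo 65536 */
}

/* The alternative floated in the message: bits per second in 64 bits. */
static uint64_t demo_speed_bps(uint64_t mbps)
{
	return mbps * 1000000ULL;
}
```

10000 Mbit/s fits, and 65535 / 10000 is a little over 6, hence "1/6 of the way". 100000 Mbit/s does not fit: truncated to 16 bits it becomes 100000 - 65536 = 34464, while the 64-bit bits-per-second form holds 100G with plenty of headroom.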
Re: Bug in SCTP with SCTP_BINDX_REM_ADDR
Hi Paolo

Paolo Galtieri wrote:
> What is happening is that the check for IPV6_ADDR_MAPPED that occurs
> during the add is missing when you do the remove and hence the IPv6
> address is never mapped to the IPv4 address causing the lookup to fail.
>
> Below is the patch to add the necessary checks to do the mapping. This
> patch is against 2.6.21-rc5
>
> Does this make sense? Any comments are appreciated.

Yes, it makes perfect sense; however, I think you can just use
af->addr_valid() instead of adding a special case below.

If that works, can you regenerate the patch and provide a Signed-off-by
line so I can incorporate that.

Thanks
-vlad

> Thank you,
> Paolo
>
> I've attached the test program - compile as
>
> gcc -o bindx-test-ipv6 bindx-test-ipv6.c -lsctp
>
> >8 ==
> --- net/sctp/socket.c.orig	2007-04-04 13:22:59.0 -0700
> +++ net/sctp/socket.c	2007-04-04 13:25:35.0 -0700
> @@ -627,6 +627,27 @@ int sctp_bindx_rem(struct sock *sk, stru
>  			retval = -EINVAL;
>  			goto err_bindx_rem;
>  		}
> +		/*
> +		 * It's possible that we mapped an IPV6 addr to an IPV4 addr
> +		 * during the sctp_bindx_add() operation. This will happen if
> +		 * the IPV6 address we assigned to an interface is a mapped
> +		 * address, e.g. ::ffff:192.0.2.128. If we have mapped an IPV6
> +		 * address to an IPV4 address during the add we need to make
> +		 * sure we do the same thing during the remove, otherwise we
> +		 * wont find a match on the address_list.
> +		 */
> +
> +		if (af->sa_family == AF_INET6) {
> +			struct in6_addr *in6;
> +			int type;
> +
> +			in6 = (struct in6_addr *)&sa_addr->v6.sin6_addr;
> +			type = ipv6_addr_type(in6);
> +
> +			if (type == IPV6_ADDR_MAPPED)
> +				sctp_v6_map_v4(sa_addr);
> +		}
> +
>  		if (sa_addr->v4.sin_port != htons(bp->port)) {
>  			retval = -EINVAL;
>  			goto err_bindx_rem;
[PATCH] 8139too: harden against TX ring overflow
This driver's 4-packet deep TX queue is too sensitive to "careless"
callers that ignore its state (like netpoll in trapped mode), so add a
"queue full" check at the start of the hard_start_xmit() method (only
under #ifndef RTL8139_NDEBUG, otherwise the queue will get stuck once
the dirty pointer gets out of sync); switch to using appropriate
mnemonics for the return values while at it.

Also, the out-of-sync dirty pointer check is misplaced in
rtl8139_tx_interrupt(), which causes TX descriptors to be inspected
more than once in case the pointer really gets out of sync (and always
incrementing the dirty pointer by 4 is just not enough, e.g. KGDBoE
managed to stuff 20+ extra buffers into the queue) -- place it before
the loop and limit the loop to look through 4 descriptors at most, so
that already overwritten descriptors are just not counted.

Signed-off-by: Sergei Shtylyov <[EMAIL PROTECTED]>

---
 drivers/net/8139too.c | 31 ---
 1 files changed, 20 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/net/8139too.c
===
--- linux-2.6.orig/drivers/net/8139too.c
+++ linux-2.6/drivers/net/8139too.c
@@ -90,7 +90,7 @@
 */
 #define DRV_NAME	"8139too"
-#define DRV_VERSION	"0.9.28"
+#define DRV_VERSION	"0.9.31"
 #include
@@ -1708,6 +1708,13 @@ static int rtl8139_start_xmit (struct sk
 	unsigned int len = skb->len;
 	unsigned long flags;
+#ifndef RTL8139_NDEBUG
+	if (unlikely((tp->cur_tx - tp->dirty_tx) >= NUM_TX_DESC)) {
+		printk(KERN_ERR "%s: TX queue full!\n", dev->name);
+		return NETDEV_TX_BUSY;
+	}
+#endif
+
 	/* Calculate the next Tx descriptor entry. */
 	entry = tp->cur_tx % NUM_TX_DESC;
@@ -1720,7 +1727,7 @@ static int rtl8139_start_xmit (struct sk
 	} else {
 		dev_kfree_skb(skb);
 		tp->stats.tx_dropped++;
-		return 0;
+		return NETDEV_TX_OK;
 	}
 	spin_lock_irqsave(&tp->lock, flags);
@@ -1740,7 +1747,7 @@ static int rtl8139_start_xmit (struct sk
 		printk (KERN_DEBUG "%s: Queued Tx packet size %u to slot %d.\n",
 			dev->name, len, entry);
-	return 0;
+	return NETDEV_TX_OK;
 }
@@ -1755,6 +1762,16 @@ static void rtl8139_tx_interrupt (struct
 	dirty_tx = tp->dirty_tx;
 	tx_left = tp->cur_tx - dirty_tx;
+
+#ifndef RTL8139_NDEBUG
+	if (unlikely(tx_left > NUM_TX_DESC)) {
+		printk(KERN_ERR "%s: Out-of-sync dirty pointer, %ld vs. %ld.\n",
+		       dev->name, dirty_tx, tp->cur_tx);
+		tx_left = NUM_TX_DESC;
+		dirty_tx = tp->cur_tx - NUM_TX_DESC;
+	}
+#endif /* RTL8139_NDEBUG */
+
 	while (tx_left > 0) {
 		int entry = dirty_tx % NUM_TX_DESC;
 		int txstatus;
@@ -1797,14 +1814,6 @@ static void rtl8139_tx_interrupt (struct
 		tx_left--;
 	}
-#ifndef RTL8139_NDEBUG
-	if (tp->cur_tx - dirty_tx > NUM_TX_DESC) {
-		printk (KERN_ERR "%s: Out-of-sync dirty pointer, %ld vs. %ld.\n",
-			dev->name, dirty_tx, tp->cur_tx);
-		dirty_tx += NUM_TX_DESC;
-	}
-#endif /* RTL8139_NDEBUG */
-
 	/* only wake the queue if we did work, and the queue is stopped */
 	if (tp->dirty_tx != dirty_tx) {
 		tp->dirty_tx = dirty_tx;
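The two checks in this patch rely on free-running unsigned counters whose difference gives the number of outstanding descriptors even after the counters wrap. A stand-alone sketch of that arithmetic follows; `struct demo_ring` and the `demo_ring_*()` helpers are hypothetical names for illustration and only mirror the shape of the checks, not the driver's actual data structures.

```c
#include <stdbool.h>

#define DEMO_NUM_TX_DESC 4	/* ring size, matching the 4-deep queue above */

struct demo_ring {
	unsigned long cur_tx;	/* next slot to fill; free-running */
	unsigned long dirty_tx;	/* oldest slot not yet reclaimed; free-running */
};

/* Full when the producer is a whole ring ahead of the reclaimer. */
static bool demo_ring_full(const struct demo_ring *r)
{
	return (r->cur_tx - r->dirty_tx) >= DEMO_NUM_TX_DESC;
}

/* A well-behaved producer: refuses to queue into a full ring. */
static bool demo_ring_queue(struct demo_ring *r)
{
	if (demo_ring_full(r))
		return false;
	r->cur_tx++;
	return true;
}

/*
 * Reclaim-side clamp: if a careless caller overran the ring, only the
 * newest DEMO_NUM_TX_DESC slots still hold meaningful descriptors, so
 * move dirty_tx forward and report how many slots to inspect.
 */
static unsigned long demo_ring_resync(struct demo_ring *r)
{
	unsigned long tx_left = r->cur_tx - r->dirty_tx;

	if (tx_left > DEMO_NUM_TX_DESC) {
		tx_left = DEMO_NUM_TX_DESC;
		r->dirty_tx = r->cur_tx - DEMO_NUM_TX_DESC;
	}
	return tx_left;
}
```

Because the subtraction is done in unsigned arithmetic, the checks stay correct when `cur_tx` wraps past zero, and the clamp bounds the reclaim loop to one pass over the ring no matter how far the producer overran it.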
pci-e e1000 not responding to ARPs sometimes?
We're seeing some strange behavior on several systems with onboard
PCI-e pro/1000 NICs.

The behavior is generally that we cannot connect into a system until we
ping from that system out. It often happens with our tainted 2.6.18.2
kernel, but we also see similar problems on the 2.6.20.? FC5 kernel.

It appears to me that the NIC gets into some funky state where it will
not receive traffic until a packet is sent out of the interface. After
that, it works fine, at least for a while.

At this point, I am just curious if anyone else is seeing anything
similar, or if there are any known problems of this nature with the
2.6.18 e1000 driver.

Here is lspci for one of the affected systems:

00:00.0 Host bridge: Intel Corporation Server Memory Contoller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation Server PCI Express x8 Port 2-3 (rev b1)
00:08.0 System peripheral: Intel Corporation Server DMA Controller (rev b1)
00:10.0 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1)
00:10.1 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1)
00:10.2 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1)
00:11.0 Host bridge: Intel Corporation Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation Server FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation Server FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation Enterprise Southbridge EHCI USB (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation Enterprise Southbridge LPC (rev 09)
00:1f.2 IDE interface: Intel Corporation Enterprise Southbridge SATA cc=IDE (rev 09)
00:1f.3 SMBus: Intel Corporation Enterprise Southbridge SMBus (rev 09)
01:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Downstream Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation Enterprise Southbridge DPT LAN Copper (rev 01)
04:00.1 Ethernet controller: Intel Corporation Enterprise Southbridge DPT LAN Copper (rev 01)
05:01.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 03)
06:04.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
06:04.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
06:06.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
06:06.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
08:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc http://www.candelatech.com
Re: Bug in SCTP with SCTP_BINDX_REM_ADDR
Here's the revised patch

Paolo

Signed-off-by: Paolo Galtieri <[EMAIL PROTECTED]>

--- net/sctp/socket.c.orig	2007-04-05 12:59:15.0 -0700
+++ net/sctp/socket.c	2007-04-05 13:11:37.0 -0700
@@ -627,6 +627,12 @@ int sctp_bindx_rem(struct sock *sk, stru
 			retval = -EINVAL;
 			goto err_bindx_rem;
 		}
+
+		if (!af->addr_valid(&saveaddr, sp)) {
+			retval = -EADDRNOTAVAIL;
+			goto err_bindx_rem;
+		}
+
 		if (sa_addr->v4.sin_port != htons(bp->port)) {
 			retval = -EINVAL;
 			goto err_bindx_rem;

Vlad Yasevich wrote:
> Hi Paolo
>
> Paolo Galtieri wrote:
> > What is happening is that the check for IPV6_ADDR_MAPPED that occurs
> > during the add is missing when you do the remove and hence the IPv6
> > address is never mapped to the IPv4 address causing the lookup to
> > fail.
> >
> > Below is the patch to add the necessary checks to do the mapping.
> > This patch is against 2.6.21-rc5
> >
> > Does this make sense? Any comments are appreciated.
>
> Yes, it makes perfect sense; however, I think you can just use
> af->addr_valid() instead of adding a special case below.
>
> If that works, can you regenerate the patch and provide a Signed-off-by
> line so I can incorporate that.
>
> Thanks
> -vlad
>
> > Thank you,
> > Paolo
> >
> > I've attached the test program - compile as
> >
> > gcc -o bindx-test-ipv6 bindx-test-ipv6.c -lsctp
> >
> > >8 ==
> > --- net/sctp/socket.c.orig	2007-04-04 13:22:59.0 -0700
> > +++ net/sctp/socket.c	2007-04-04 13:25:35.0 -0700
> > @@ -627,6 +627,27 @@ int sctp_bindx_rem(struct sock *sk, stru
> >  			retval = -EINVAL;
> >  			goto err_bindx_rem;
> >  		}
> > +		/*
> > +		 * It's possible that we mapped an IPV6 addr to an IPV4 addr
> > +		 * during the sctp_bindx_add() operation. This will happen if
> > +		 * the IPV6 address we assigned to an interface is a mapped
> > +		 * address, e.g. ::ffff:192.0.2.128. If we have mapped an IPV6
> > +		 * address to an IPV4 address during the add we need to make
> > +		 * sure we do the same thing during the remove, otherwise we
> > +		 * wont find a match on the address_list.
> > +		 */
> > +
> > +		if (af->sa_family == AF_INET6) {
> > +			struct in6_addr *in6;
> > +			int type;
> > +
> > +			in6 = (struct in6_addr *)&sa_addr->v6.sin6_addr;
> > +			type = ipv6_addr_type(in6);
> > +
> > +			if (type == IPV6_ADDR_MAPPED)
> > +				sctp_v6_map_v4(sa_addr);
> > +		}
> > +
> >  		if (sa_addr->v4.sin_port != htons(bp->port)) {
> >  			retval = -EINVAL;
> >  			goto err_bindx_rem;
Re: Bug in SCTP with SCTP_BINDX_REM_ADDR
Oops, the patch I sent previously was for an older 2.6 kernel. I'm testing on a 2.6.10+ SCTP patches up to 2.6.17. Here is a revised patch for 2.6.21: Paolo Signed-off-by: Paolo Galtieri <[EMAIL PROTECTED]> --- linux-2.6.21/net/sctp/socket.c 2007-03-26 06:58:14.0 -0700 +++ linux-2.6.21build/net/sctp/socket.c 2007-04-05 14:04:51.0 -0700 @@ -627,6 +627,12 @@ int sctp_bindx_rem(struct sock *sk, stru retval = -EINVAL; goto err_bindx_rem; } + + if (!af->addr_valid(sa_addr, sp, NULL)) { + retval = -EADDRNOTAVAIL; + goto err_bindx_rem; + } + if (sa_addr->v4.sin_port != htons(bp->port)) { retval = -EINVAL; goto err_bindx_rem; Paolo Galtieri wrote: Here's the revises patch Paolo Signed-off-by: Paolo Galtieri <[EMAIL PROTECTED]> --- net/sctp/socket.c.orig 2007-04-05 12:59:15.0 -0700 +++ net/sctp/socket.c 2007-04-05 13:11:37.0 -0700 @@ -627,6 +627,12 @@ int sctp_bindx_rem(struct sock *sk, stru retval = -EINVAL; goto err_bindx_rem; } + + if (!af->addr_valid(&saveaddr, sp)) { + retval = -EADDRNOTAVAIL; + goto err_bindx_rem; + } + if (sa_addr->v4.sin_port != htons(bp->port)) { retval = -EINVAL; goto err_bindx_rem; Vlad Yasevich wrote: Hi Paolo Paolo Galtieri wrote: What is happening is that the check for IPV6_ADDR_MAPPED that occurs during the add is missing when you do the remove and hence the IPv6 address is never mapped to the IPv4 address causing the lookup to fail. Below is the patch to add the necessary checks to do the mapping. This patch is against 2.6.21-rc5 Does this make sense? Any comments are appreciated. Yes, it makes perfect sense; however, I think you can just use af->addr_valid() instead of adding a special case below. If that works, can you regenerate the patch and provide a Signed-off-by line so I can incorporate that. 
Thanks -vlad Thank you, Paolo I've attached the test program - compile as gcc -o bindx-test-ipv6 bindx-test-ipv6.c -lsctp >8 == --- net/sctp/socket.c.orig 2007-04-04 13:22:59.0 -0700 +++ net/sctp/socket.c 2007-04-04 13:25:35.0 -0700 @@ -627,6 +627,27 @@ int sctp_bindx_rem(struct sock *sk, stru retval = -EINVAL; goto err_bindx_rem; } + /* +* It's possible that we mapped an IPV6 addr to an IPV4 addr +* during the sctp_bindx_add() operation. This will happen if +* the IPV6 address we assigned to an interface is a mapped +* address, e.g. :::192.0.2.128. If we have mapped an IPV6 +* address to an IPV4 address during the add we need to make +* sure we do the same thing during the remove, otherwise we +* wont find a match on the address_list. +*/ + + if (af->sa_family == AF_INET6) { + struct in6_addr *in6; + int type; + + in6 = (struct in6_addr *)&sa_addr->v6.sin6_addr; + type = ipv6_addr_type(in6); + + if (type == IPV6_ADDR_MAPPED) + sctp_v6_map_v4(sa_addr); + } + if (sa_addr->v4.sin_port != htons(bp->port)) { retval = -EINVAL; goto err_bindx_rem; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
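For readers following this thread outside the kernel tree, the mapping step being discussed can be illustrated in userspace. This is only a minimal sketch under stated assumptions: map_v6_to_v4() is a hypothetical helper written for illustration, not the kernel's sctp_v6_map_v4() and not part of either patch. It uses the standard IN6_IS_ADDR_V4MAPPED() macro to recognise an address of the form ::ffff:a.b.c.d and copies the trailing four bytes into an AF_INET address, so that a later lookup compares like with like.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical userspace helper (not the kernel's sctp_v6_map_v4()).
 * If *in6 holds an IPv4-mapped address (::ffff:a.b.c.d), fill *out with
 * the equivalent AF_INET address and return 1. Otherwise return 0 and
 * leave *out untouched.
 */
int map_v6_to_v4(const struct sockaddr_in6 *in6, struct sockaddr_in *out)
{
	assert(in6 != NULL && out != NULL);

	if (!IN6_IS_ADDR_V4MAPPED(&in6->sin6_addr))
		return 0;

	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = in6->sin6_port;	/* already in network byte order */
	/* The IPv4 address is the last 4 of the 16 address bytes. */
	memcpy(&out->sin_addr, &in6->sin6_addr.s6_addr[12], 4);
	return 1;
}
```

If the add path maps such an address and the remove path does not, the remove path ends up searching the bound-address list for the IPv6 form and finds nothing, which is the failure described above.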
[PATCH] NET: Multiqueue network device support, replayed.
David, Please consider pulling from my git tree: git-pull git://lost.foo-projects.org/~ppwaskie/git/net-2.6.22 multiqueue This is a branch named 'multiqueue' of a recent pull from your tree with an updated implementation of the multiqueue implementation we've been working on. Included in this is: 1) A simplified API where all drivers will allocate a queue struct for each queue on the NIC. 2) A transparent stack change for both non-mq and mq devices to use the same codepath. 3) Remove the per-queue locking model, and just use the per-queue netif_{start|stop|wake}_subqueue() functions for multiqueue management. 4) Updated multiqueue documentation in Documentation/networking describing the implementation and base driver requirements to implement multiqueue support. Cheers, PJ Waskiewicz Intel Corp. [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RTNL]: Improve error codes for unsupported operations
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 16:34:02 +0200

> The most common trigger of these errors is that the
> config option which would make the functionality
> available hasn't been enabled. Therefore returning
> EOPNOTSUPP gives a better idea of what is going wrong.
>
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, thanks a lot Thomas.
Re: [PATCH] fix MCA when shutting down tulip quad-NIC
On Tue, Apr 03, 2007 at 11:19:16PM +0200, Olaf Hering wrote:
> From: [EMAIL PROTECTED]
>
> https://bugzilla.novell.com/show_bug.cgi?id=SUSE39204

Wow, registering for Novell's bugzilla is painful. And in the end I
get "Access denied" on that bug. Can you give us this information
some other way?

> Shutting down the network causes an MCA because of an IO TLB error when
> a DEC quad 10/100 card is in any slot. This problem was originally seen
> on an HP rx4640.

I'm not clear on why pci_disable_device() would fix this bug. Do you
have an explanation (or can copy one out of the bug report)? I'm
hesitant to make even obviously correct changes to the tulip driver
without good evidence, given the incredible variety of buggy hardware
out there.

This looks to me like another iteration of the shutdown DMA/irq race
at first glance. Grant has a patch for it; I'm working on one I
consider cleaner.

-VAL

>
>
> Signed-off-by: Olaf Hering <[EMAIL PROTECTED]>
>
> ---
>
> Andrew: Why is it tp->pdev instead of pdev?
>
>  drivers/net/tulip/tulip_core.c |    1 +
>  1 file changed, 1 insertion(+)
>
> Index: b/drivers/net/tulip/tulip_core.c
> ===================================================================
> --- a/drivers/net/tulip/tulip_core.c
> +++ b/drivers/net/tulip/tulip_core.c
> @@ -1798,6 +1798,7 @@ static void __devexit tulip_remove_one (
>  		return;
>
>  	tp = netdev_priv(dev);
> +	pci_disable_device(tp->pdev);
>  	unregister_netdev(dev);
>  	pci_free_consistent (pdev,
>  			     sizeof (struct tulip_rx_desc) * RX_RING_SIZE +
Re: [PATCH] NET: Multiqueue network device support, replayed.
From: "Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]> Date: Thu, 5 Apr 2007 14:33:19 -0700 > David, > Please consider pulling from my git tree: > > git-pull git://lost.foo-projects.org/~ppwaskie/git/net-2.6.22 > multiqueue > > This is a branch named 'multiqueue' of a recent pull from your tree with > an updated implementation of the multiqueue implementation we've been > working on. Included in this is: > > 1) A simplified API where all drivers will allocate a queue struct for > each queue on the NIC. > 2) A transparent stack change for both non-mq and mq devices to use the > same codepath. > 3) Remove the per-queue locking model, and just use the per-queue > netif_{start|stop|wake}_subqueue() functions for multiqueue management. > 4) Updated multiqueue documentation in Documentation/networking > describing the implementation and base driver requirements to implement > multiqueue support. Thanks for following up on this work, but it really needs to be reviewed here on netdev, I really can't just pull this into my tree until it is reviewed properly here. So, please post the new feature as a set of patches. Thanks a lot. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: pci-e e1000 not responding to ARPs sometimes?
Sorry for the top post.

I believe this is a known issue and it was fixed in the kernel driver as
well (maybe it didn't make it into the FC5 kernel?). See the following:

http://article.gmane.org/gmane.linux.network/52168/match=e1000+factps+mngcg

If I'm not mistaken, the part relevant to your problem is this:

--- linux-2.6.orig/drivers/net/e1000/e1000_hw.c
+++ linux-2.6/drivers/net/e1000/e1000_hw.c
@@ -7817,9 +7817,8 @@ e1000_enable_mng_pass_thru(struct e1000_
 		fwsm = E1000_READ_REG(hw, FWSM);
 		factps = E1000_READ_REG(hw, FACTPS);
 
-		if (((fwsm & E1000_FWSM_MODE_MASK) ==
-		    (e1000_mng_mode_pt << E1000_FWSM_MODE_SHIFT)) &&
-		    (factps & E1000_FACTPS_MNGCG))
+		if ((((fwsm & E1000_FWSM_MODE_MASK) >> E1000_FWSM_MODE_SHIFT) ==
+		    e1000_mng_mode_pt) && !(factps & E1000_FACTPS_MNGCG))
 			return TRUE;
 	} else if ((manc & E1000_MANC_SMBUS_EN) && !(manc & E1000_MANC_ASF_EN))

Emil

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ben Greear
Sent: Thursday, April 05, 2007 1:18 PM
To: NetDev
Subject: pci-e e1000 not responding to ARPs sometimes?

We're seeing some strange behavior on several systems with onboard
PCI-e pro/1000 NICs. The behavior is generally that we cannot connect
into a system until we ping from that system out.

It often happens with our tainted 2.6.18.2 kernel, but we also see
similar problems on the 2.6.20.? FC5 kernel.

It appears to me that the NIC gets into some funky state where it will
not receive traffic until a packet is sent out of the interface. After
that, it works fine, at least for a while.

At this point, I am just curious if anyone else is seeing anything
similar, or if there are any known problems of this nature with the
2.6.18 e1000 driver.
Here is lspci for one of the affected systems: 00:00.0 Host bridge: Intel Corporation Server Memory Contoller Hub (rev b1) 00:02.0 PCI bridge: Intel Corporation Server PCI Express x8 Port 2-3 (rev b1) 00:08.0 System peripheral: Intel Corporation Server DMA Controller (rev b1) 00:10.0 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1) 00:10.1 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1) 00:10.2 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1) 00:11.0 Host bridge: Intel Corporation Reserved Registers (rev b1) 00:13.0 Host bridge: Intel Corporation Reserved Registers (rev b1) 00:15.0 Host bridge: Intel Corporation Server FBD Registers (rev b1) 00:16.0 Host bridge: Intel Corporation Server FBD Registers (rev b1) 00:1c.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Root Port 1 (rev 09) 00:1d.0 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #1 (rev 09) 00:1d.1 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #2 (rev 09) 00:1d.2 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #3 (rev 09) 00:1d.3 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #4 (rev 09) 00:1d.7 USB Controller: Intel Corporation Enterprise Southbridge EHCI USB (rev 09) 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9) 00:1f.0 ISA bridge: Intel Corporation Enterprise Southbridge LPC (rev 09) 00:1f.2 IDE interface: Intel Corporation Enterprise Southbridge SATA cc=IDE (rev 09) 00:1f.3 SMBus: Intel Corporation Enterprise Southbridge SMBus (rev 09) 01:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Upstream Port (rev 01) 01:00.3 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express to PCI-X Bridge (rev 01) 02:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Downstream Port E1 (rev 01) 02:02.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Downstream Port E3 (rev 
01) 04:00.0 Ethernet controller: Intel Corporation Enterprise Southbridge DPT LAN Copper (rev 01) 04:00.1 Ethernet controller: Intel Corporation Enterprise Southbridge DPT LAN Copper (rev 01) 05:01.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 03) 06:04.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03) 06:04.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03) 06:06.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03) 06:06.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03) 08:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-i
Re: [SC92031]: Fix priv->lock context
On Thu, Apr 05, 2007 at 09:59:29AM -0700, David Miller wrote:
>
> Where is the patch? :-)
>
> Second time you've done this in two days Herbert, tsk tsk :)))

The patch was so easy that it was left as an exercise to the reader :)

[SC92031]: Fix priv->lock context

The spin_lock calls made in dev->open and dev->close must disable
BH since open/close are made in process context. Conversely, the
call in dev->hard_start_xmit does not need to disable BH since it
is already executing with BH disabled.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/sc92031.c b/drivers/net/sc92031.c
index 15ceeae..5b7284c 100644
--- a/drivers/net/sc92031.c
+++ b/drivers/net/sc92031.c
@@ -963,7 +963,7 @@ static int sc92031_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	spin_lock_bh(&priv->lock);
+	spin_lock(&priv->lock);
 
 	if (unlikely(!netif_carrier_ok(dev))) {
 		err = -ENOLINK;
@@ -1004,7 +1004,7 @@ static int sc92031_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_stop_queue(dev);
 
 out_unlock:
-	spin_unlock_bh(&priv->lock);
+	spin_unlock(&priv->lock);
 
 out:
 	dev_kfree_skb(skb);
@@ -1041,12 +1041,12 @@ static int sc92031_open(struct net_device *dev)
 	priv->pm_config = 0;
 
 	/* Interrupts already disabled by sc92031_stop or sc92031_probe */
-	spin_lock(&priv->lock);
+	spin_lock_bh(&priv->lock);
 
 	_sc92031_reset(dev);
 	mmiowb();
 
-	spin_unlock(&priv->lock);
+	spin_unlock_bh(&priv->lock);
 	sc92031_enable_interrupts(dev);
 
 	if (netif_carrier_ok(dev))
@@ -1076,13 +1076,13 @@ static int sc92031_stop(struct net_device *dev)
 	/* Disable interrupts, stop Tx and Rx. */
 	sc92031_disable_interrupts(dev);
 
-	spin_lock(&priv->lock);
+	spin_lock_bh(&priv->lock);
 
 	_sc92031_disable_tx_rx(dev);
 	_sc92031_tx_clear(dev);
 	mmiowb();
 
-	spin_unlock(&priv->lock);
+	spin_unlock_bh(&priv->lock);
 
 	free_irq(pdev->irq, dev);
 	pci_free_consistent(pdev, TX_BUF_TOT_LEN, priv->tx_bufs,
@@ -1538,13 +1538,13 @@ static int sc92031_suspend(struct pci_dev *pdev, pm_message_t state)
 	/* Disable interrupts, stop Tx and Rx. */
 	sc92031_disable_interrupts(dev);
 
-	spin_lock(&priv->lock);
+	spin_lock_bh(&priv->lock);
 
 	_sc92031_disable_tx_rx(dev);
 	_sc92031_tx_clear(dev);
 	mmiowb();
 
-	spin_unlock(&priv->lock);
+	spin_unlock_bh(&priv->lock);
 
 out:
 	pci_set_power_state(pdev, pci_choose_state(pdev, state));
@@ -1564,12 +1564,12 @@ static int sc92031_resume(struct pci_dev *pdev)
 		goto out;
 
 	/* Interrupts already disabled by sc92031_suspend */
-	spin_lock(&priv->lock);
+	spin_lock_bh(&priv->lock);
 
 	_sc92031_reset(dev);
 	mmiowb();
 
-	spin_unlock(&priv->lock);
+	spin_unlock_bh(&priv->lock);
 	sc92031_enable_interrupts(dev);
 
 	netif_device_attach(dev);
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
On Thu, 2007-04-05 at 10:17 +0300, Avi Kivity wrote: > Rusty Russell wrote: > > You didn't quote Anthony's point about "it's more about there not being > > good enough userspace interfaces to do network IO." > > > > It's easier to write a kernel-space network driver, but it's not > > obviously the right thing to do until we can show that an efficient > > packet-level userspace interface isn't possible. I don't think that's > > been done, and it would be interesting to try. > > > > In the case of networking, the copyful interfaces on receive are driven > by the hardware not knowing how to split the header from the data. On > transmit I agree, it could be made copyless from userspace (somthing > like sendfilev, only not file oriented). Hi Avi, I don't think you've thought about this very hard. The receive copy is completely independent with whether the packet is going to the guest via a kernel driver or via userspace, so not relevant. And if all packets from the card are going to the guest, you can deliver directly. Userspace or kernel, no difference. And we have a "sendfilev not file oriented": it's called "writev" 8) An in-kernel driver can avoid system call overhead and page references. But a better tap device helps more than just KVM. Rusty. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
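As an illustration of the writev() point above, here is a minimal userspace sketch. The helper name send_frame() and its parameters are hypothetical and chosen only for this example; the only real API used is POSIX writev(). It hands the kernel a header and a payload that live in separate buffers, so userspace does not have to copy them into one contiguous buffer first. Whether the kernel side can then avoid its own copy depends on the device behind the descriptor, which this sketch does not address.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: submit a header and a payload held in separate
 * buffers with a single writev() call. Returns the number of bytes
 * written, or -1 with errno set, exactly like writev().
 */
ssize_t send_frame(int fd, const void *hdr, size_t hdr_len,
		   const void *payload, size_t payload_len)
{
	struct iovec iov[2];

	assert(hdr != NULL && payload != NULL);

	/* iov_base is not const-qualified, hence the casts. */
	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len = hdr_len;
	iov[1].iov_base = (void *)payload;
	iov[1].iov_len = payload_len;

	return writev(fd, iov, 2);
}
```

The same call works on any writable descriptor, so it can be tried on a pipe or socket before pointing it at a tap device.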
Re: [SC92031]: Fix priv->lock context
> On Thu, Apr 05, 2007 at 09:59:29AM -0700, David Miller wrote: >> >> Where is the patch? :-) >> >> Second time you've done this in two days Herbert, tsk tsk :))) > > The patch was so easy that it was left as an exercise to the reader :) I did try, but it was too much for me :-) > > [SC92031]: Fix priv->lock context > > The spin_lock calls made in dev->open and dev->close must disable > BH since open/close are made in process context. Conversely, the > call in dev->hard_start_xmit does not need to disable BH since it > is already executing with BH disabled. > Yes, this patch did the trick. Thanks. Tony - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
On Thu, 2007-04-05 at 13:36 +0200, Ingo Molnar wrote: > prototyping new kernel APIs to implement user-space network drivers, on > a crufty codebase is not something that should be done lightly. I think you overestimate my radicalism. I was considering readv() and writev() on the tap device. Qemu's infrastructure may hurt kvm here, but lguest won't be able to use that excuse. > track issue for the *PIC question at > hand. PICs are not network devices, they are essential platform > components and almost an extended part of the CPU.) Definitely, I'm only interested in stealing^H^H^Hsharing KVM devices. The subject is now deeply misleading 8( Cheers, Rusty. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] 8139too: harden against TX ring overflow
Sergei Shtylyov <[EMAIL PROTECTED]> wrote:
> This driver's 4-packet deep TX queue is too sensitive to the "careless"
> callers ignoring its state (like netpoll in trapped mode), so add "queue
> full" check at the start of the hard_start_xmit() method (only under
> #ifndef RTL8139_NDEBUG, otherwise the queue will get stuck once dirty
> pointer gets out of sync); switch to using appropriate mnemonics for the
> return values while at it.

Could you please describe this netpoll scenario in more detail?
More importantly, why wouldn't we fix netpoll instead?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 1/4] [IPv6] Add link and site-local scope inline
In article <[EMAIL PROTECTED]> (at Thu, 05 Apr 2007 23:21:05 -0400), Brian Haley <[EMAIL PROTECTED]> says: > Add link and site-local scope inline to avoid calls to ipv6_addr_type(). > I disagree. Multicast scopes should also be handled appropriately. --yoshfuji - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Kgdb-bugreport] [PATCH] 8139too: harden against TX ring overflow
On Friday 06 April 2007 07:18, Herbert Xu wrote:
> Sergei Shtylyov <[EMAIL PROTECTED]> wrote:
> > This driver's 4-packet deep TX queue is too sensitive to the "careless"
> > callers ignoring its state (like netpoll in trapped mode), so add "queue
> > full" check at the start of the hard_start_xmit() method (only under
> > #ifndef RTL8139_NDEBUG, otherwise the queue will get stuck once dirty
> > pointer gets out of sync); switch to using appropriate mnemonics for the
> > return values while at it.
>
> Could you please describe this netpoll scenario in more detail?
> More importantly, why wouldn't we fix netpoll instead?

We're trying to figure out a way of fixing netpoll. We don't know what the
solution is yet.

Here is what happens: in KGDB we set the netpoll trapped flag. This prevents
stopping and starting of a netdev queue. Interfaces that have a small ring
(8139) run into a problem because of this: when the ring goes full, the
driver can't stop the queue. This doesn't make sense, since in the absence
of ring descriptors the device can't transmit any more packets.

Sergei posted one more patch last week that lets us start and stop queues in
the trapped state. This patch fixes the 8139 side behavior in this context.

-Amit
Re: [PATCH 2/4] [IPv6] Add multicast address type inline
In article <[EMAIL PROTECTED]> (at Thu, 05 Apr 2007 23:21:16 -0400), Brian Haley <[EMAIL PROTECTED]> says:

> Add multicast address type inline to avoid calls to ipv6_addr_type().
>
> Signed-off-by: Brian Haley <[EMAIL PROTECTED]>
> ---
>  include/net/ipv6.h    |    5 +++++
>  net/ipv6/icmp.c       |   12 ++++--------
>  net/ipv6/ip6_tunnel.c |    4 ++--
>  net/ipv6/route.c      |    4 ++--
>  4 files changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index d473789..a888b0e 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -439,6 +439,11 @@ static inline int ipv6_addr_scope_sitelocal(const struct in6_addr *a)
>  	return ((a->s6_addr32[0] & htonl(0xFFC00000)) == htonl(0xFEC00000));
>  }
>
> +static inline int ipv6_addr_type_multicast(const struct in6_addr *a)
> +{
> +	return ((a->s6_addr32[0] & htonl(0xFF000000)) == htonl(0xFF000000));
> +}
> +

Matter of taste, but I prefer ipv6_addr_multicast() to align with
ipv6_addr_any().

> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index e94992a..709037f 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -312,7 +312,6 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
>  	struct flowi fl;
>  	struct icmpv6_msg msg;
>  	int iif = 0;
> -	int addr_type = 0;
>  	int len;
>  	int hlimit, tclass;
>  	int err = 0;
> @@ -327,8 +326,6 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
>  	 *	Rule (e.1) is enforced by not using icmpv6_send
>  	 *	in any code that processes icmp errors.
>  	 */
> -	addr_type = ipv6_addr_type(&hdr->daddr);
> -
>  	if (ipv6_chk_addr(&hdr->daddr, skb->dev, 0))
>  		saddr = &hdr->daddr;
>
> @@ -336,7 +333,7 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
>  	 *	Dest addr check
>  	 */
>
> -	if ((addr_type & IPV6_ADDR_MULTICAST || skb->pkt_type != PACKET_HOST)) {
> +	if (ipv6_addr_type_multicast(&hdr->daddr) || skb->pkt_type != PACKET_HOST) {
>  		if (type != ICMPV6_PKT_TOOBIG &&
>  		    !(type == ICMPV6_PARAMPROB &&
>  		      code == ICMPV6_UNK_OPTION &&

I think this is okay.

> @@ -346,13 +343,11 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
>  		saddr = NULL;
>  	}
>
> -	addr_type = ipv6_addr_type(&hdr->saddr);
> -
>  	/*
>  	 *	Source addr check
>  	 */
>
> -	if (addr_type & IPV6_ADDR_LINKLOCAL)
> +	if (ipv6_addr_scope_linklocal(&hdr->saddr))
>  		iif = skb->dev->ifindex;
>
>  	/*

No, this is not identical.

> @@ -361,7 +356,8 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
>  	 *	We check unspecified / multicast addresses here,
>  	 *	and anycast addresses will be checked later.
>  	 */
> -	if ((addr_type == IPV6_ADDR_ANY) || (addr_type & IPV6_ADDR_MULTICAST)) {
> +	if (ipv6_addr_any(&hdr->saddr) ||
> +	    ipv6_addr_type_multicast(&hdr->saddr)) {
>  		LIMIT_NETDEBUG(KERN_DEBUG "icmpv6_send: addr_any/mcast source\n");
>  		return;
>  	}

I guess ipv6_addr_multicast() || ipv6_addr_any() is better.

> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index a0902fb..0dd1f63 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -,8 +,8 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
>  	dev->iflink = p->link;
>
>  	if (p->flags & IP6_TNL_F_CAP_XMIT) {
> -		int strict = (ipv6_addr_type(&p->raddr) &
> -			      (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL));
> +		int strict = ipv6_addr_type_multicast(&p->raddr) ||
> +			     ipv6_addr_scope_linklocal(&p->raddr);
>
>  		struct rt6_info *rt = rt6_lookup(&p->raddr, &p->laddr,
>  						 p->link, strict);

Different logic, but seems sane.

> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 53d79ac..32c6398 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -227,8 +227,8 @@ static __inline__ int rt6_check_expired(const struct rt6_info *rt)
>
>  static inline int rt6_need_strict(struct in6_addr *daddr)
>  {
> -	return (ipv6_addr_type(daddr) &
> -		(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL));
> +	return (ipv6_addr_is_multicast(daddr) ||
> +		ipv6_addr_scope_linklocal(daddr));
>  }
>

ditto.

--yoshfuji
[PATCH 2/4] [IPv6] Add multicast address type inline
Add multicast address type inline to avoid calls to ipv6_addr_type().

Signed-off-by: Brian Haley <[EMAIL PROTECTED]>
---
 include/net/ipv6.h    |    5 +++++
 net/ipv6/icmp.c       |   12 ++++--------
 net/ipv6/ip6_tunnel.c |    4 ++--
 net/ipv6/route.c      |    4 ++--
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index d473789..a888b0e 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -439,6 +439,11 @@ static inline int ipv6_addr_scope_sitelocal(const struct in6_addr *a)
 	return ((a->s6_addr32[0] & htonl(0xFFC00000)) == htonl(0xFEC00000));
 }
 
+static inline int ipv6_addr_type_multicast(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] & htonl(0xFF000000)) == htonl(0xFF000000));
+}
+
 /*
  *	Prototypes exported by ipv6
  */
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index e94992a..709037f 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -312,7 +312,6 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
 	struct flowi fl;
 	struct icmpv6_msg msg;
 	int iif = 0;
-	int addr_type = 0;
 	int len;
 	int hlimit, tclass;
 	int err = 0;
@@ -327,8 +326,6 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
 	 *	Rule (e.1) is enforced by not using icmpv6_send
 	 *	in any code that processes icmp errors.
 	 */
-	addr_type = ipv6_addr_type(&hdr->daddr);
-
 	if (ipv6_chk_addr(&hdr->daddr, skb->dev, 0))
 		saddr = &hdr->daddr;
 
@@ -336,7 +333,7 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
 	 *	Dest addr check
 	 */
 
-	if ((addr_type & IPV6_ADDR_MULTICAST || skb->pkt_type != PACKET_HOST)) {
+	if (ipv6_addr_type_multicast(&hdr->daddr) || skb->pkt_type != PACKET_HOST) {
 		if (type != ICMPV6_PKT_TOOBIG &&
 		    !(type == ICMPV6_PARAMPROB &&
 		      code == ICMPV6_UNK_OPTION &&
@@ -346,13 +343,11 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
 		saddr = NULL;
 	}
 
-	addr_type = ipv6_addr_type(&hdr->saddr);
-
 	/*
 	 *	Source addr check
 	 */
 
-	if (addr_type & IPV6_ADDR_LINKLOCAL)
+	if (ipv6_addr_scope_linklocal(&hdr->saddr))
 		iif = skb->dev->ifindex;
 
 	/*
@@ -361,7 +356,8 @@ void icmpv6_send(struct sk_buff *skb, int type, int code, __u32 info,
 	 *	We check unspecified / multicast addresses here,
 	 *	and anycast addresses will be checked later.
 	 */
-	if ((addr_type == IPV6_ADDR_ANY) || (addr_type & IPV6_ADDR_MULTICAST)) {
+	if (ipv6_addr_any(&hdr->saddr) ||
+	    ipv6_addr_type_multicast(&hdr->saddr)) {
 		LIMIT_NETDEBUG(KERN_DEBUG "icmpv6_send: addr_any/mcast source\n");
 		return;
 	}
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index a0902fb..0dd1f63 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -,8 +,8 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
 	dev->iflink = p->link;
 
 	if (p->flags & IP6_TNL_F_CAP_XMIT) {
-		int strict = (ipv6_addr_type(&p->raddr) &
-			      (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL));
+		int strict = ipv6_addr_type_multicast(&p->raddr) ||
+			     ipv6_addr_scope_linklocal(&p->raddr);
 
 		struct rt6_info *rt = rt6_lookup(&p->raddr, &p->laddr,
 						 p->link, strict);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 53d79ac..32c6398 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -227,8 +227,8 @@ static __inline__ int rt6_check_expired(const struct rt6_info *rt)
 
 static inline int rt6_need_strict(struct in6_addr *daddr)
 {
-	return (ipv6_addr_type(daddr) &
-		(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL));
+	return (ipv6_addr_is_multicast(daddr) ||
+		ipv6_addr_scope_linklocal(daddr));
 }
 
 /*
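The prefix tests these inlines rely on can be tried outside the kernel. The following is a minimal userspace sketch, not the patch's code: addr_is_multicast() and addr_is_linklocal() are hypothetical names, and the link-local mask is an assumption based on the fe80::/10 prefix, since patch 1/4 is not shown here. Each check masks the first 32 bits of the address, kept in network byte order, and compares against the prefix, which is why no call to a general classifier such as ipv6_addr_type() is needed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Load the first 32 bits of the address, still in network byte order. */
static uint32_t first_word(const struct in6_addr *a)
{
	uint32_t w;

	assert(a != NULL);
	memcpy(&w, a->s6_addr, sizeof(w));
	return w;
}

/* ff00::/8: the top 8 bits are all ones. */
int addr_is_multicast(const struct in6_addr *a)
{
	return (first_word(a) & htonl(0xFF000000)) == htonl(0xFF000000);
}

/* fe80::/10 (assumed prefix for the link-local scope check). */
int addr_is_linklocal(const struct in6_addr *a)
{
	return (first_word(a) & htonl(0xFFC00000)) == htonl(0xFE800000);
}
```

Both constants are passed through htonl() so the comparison works on little-endian and big-endian hosts alike.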
[PATCH 0/4] [IPv6] Add new scope and address type inlines
[sorry if anyone got these multiple times, git-send-email weirdness] This set of patches adds new IPv6 scope and address type inlines to both clean-up the code (inspired by Arnaldo's skb cleanup) and reduce calls to ipv6_addr_type() when we can just compare the address directly. No functionality is changed. I'm only cc'ing the DCCP and LkSCTP lists on the patches that actually touch their code. -Brian - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] Add mapped address type inline
Add mapped address type inline to avoid calls to ipv6_addr_type().

Signed-off-by: Brian Haley <[EMAIL PROTECTED]>
---
 include/net/ipv6.h       |    6 ++
 net/ipv6/ip6_flowlabel.c |    6 ++
 net/ipv6/ipv6_sockglue.c |    2 +-
 net/ipv6/tcp_ipv6.c      |   13 +
 net/ipv6/udp.c           |    2 +-
 net/sctp/ipv6.c          |    4 ++--
 6 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index a888b0e..f3e13db 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -444,6 +444,12 @@ static inline int ipv6_addr_type_multicast(const struct in6_addr *a)
 	return ((a->s6_addr32[0] & htonl(0xFF000000)) == htonl(0xFF000000));
 }
 
+static inline int ipv6_addr_type_mapped(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] | a->s6_addr32[1]) == 0 &&
+		a->s6_addr32[2] == htonl(0x0000FFFF));
+}
+
 /*
  * Prototypes exported by ipv6
  */
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index c206a15..b1bd088 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -282,7 +282,6 @@ fl_create(struct in6_flowlabel_req *freq, char __user *optval, int optlen, int *
 {
 	struct ip6_flowlabel *fl;
 	int olen;
-	int addr_type;
 	int err;
 
 	err = -ENOMEM;
@@ -328,9 +327,8 @@ fl_create(struct in6_flowlabel_req *freq, char __user *optval, int optlen, int *
 	if (err)
 		goto done;
 	fl->share = freq->flr_share;
-	addr_type = ipv6_addr_type(&freq->flr_dst);
-	if ((addr_type&IPV6_ADDR_MAPPED)
-	    || addr_type == IPV6_ADDR_ANY) {
+	if (ipv6_addr_type_mapped(&freq->flr_dst) ||
+	    ipv6_addr_any(&freq->flr_dst)) {
 		err = -EINVAL;
 		goto done;
 	}
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index aa3d07c..d83e982 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -249,7 +249,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		}
 
 		if (ipv6_only_sock(sk) ||
-		    !(ipv6_addr_type(&np->daddr) & IPV6_ADDR_MAPPED)) {
+		    !ipv6_addr_type_mapped(&np->daddr)) {
 			retv = -EADDRNOTAVAIL;
 			break;
 		}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 537978c..a47d23d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -132,7 +132,6 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	struct in6_addr *saddr = NULL, *final_p = NULL, final;
 	struct flowi fl;
 	struct dst_entry *dst;
-	int addr_type;
 	int err;
 
 	if (addr_len < SIN6_LEN_RFC2133)
@@ -163,12 +162,10 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	if(ipv6_addr_any(&usin->sin6_addr))
 		usin->sin6_addr.s6_addr[15] = 0x1;
 
-	addr_type = ipv6_addr_type(&usin->sin6_addr);
-
-	if(addr_type & IPV6_ADDR_MULTICAST)
+	if (ipv6_addr_type_multicast(&usin->sin6_addr))
 		return -ENETUNREACH;
 
-	if (addr_type&IPV6_ADDR_LINKLOCAL) {
+	if (ipv6_addr_scope_linklocal(&usin->sin6_addr)) {
 		if (addr_len >= sizeof(struct sockaddr_in6) &&
 		    usin->sin6_scope_id) {
 			/* If interface is set while binding, indices
@@ -200,7 +197,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	 *	TCP over IPv4
 	 */
 
-	if (addr_type == IPV6_ADDR_MAPPED) {
+	if (ipv6_addr_type_mapped(&usin->sin6_addr)) {
 		u32 exthdrlen = icsk->icsk_ext_hdr_len;
 		struct sockaddr_in sin;
 
@@ -703,7 +703,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 	if (!cmd.tcpm_keylen) {
 		if (!tcp_sk(sk)->md5sig_info)
 			return -ENOENT;
-		if (ipv6_addr_type(&sin6->sin6_addr) & IPV6_ADDR_MAPPED)
+		if (ipv6_addr_type_mapped(&sin6->sin6_addr))
 			return tcp_v4_md5_do_del(sk, sin6->sin6_addr.s6_addr32[3]);
 		return tcp_v6_md5_do_del(sk, &sin6->sin6_addr);
 	}
@@ -725,7 +725,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);
 	if (!newkey)
 		return -ENOMEM;
-	if (ipv6_addr_type(&sin6->sin6_addr) & IPV6_ADDR_MAPPED) {
+	if (ipv6_addr_type_mapped(&sin6->sin6_addr)) {
 		return tcp_v4_md5_do_add(sk, sin6->sin6_addr.s6_addr32[3],
 					 newkey, cmd.tcpm_keylen);
 	}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c0b5fe3..6636431 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -610,7 +610,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct so
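
The v4-mapped test in this patch can be tried outside the kernel. What follows is only a userspace sketch, assuming the intended layout is ::ffff:a.b.c.d (first two 32-bit words zero, third word 0x0000ffff). The names sketch_addr_mapped() and sketch_mapped_to_v4() are hypothetical and are not part of the patch; the second one mirrors the way the md5 hunks pass s6_addr32[3] on as the IPv4 address.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct in6_addr, struct in_addr */
#include <stdint.h>
#include <string.h>      /* memcpy() */

/*
 * Hypothetical userspace counterpart of the mapped-address inline.
 * Returns 1 for ::ffff:a.b.c.d and 0 for anything else.
 */
static int sketch_addr_mapped(const struct in6_addr *a)
{
	uint32_t w[4];

	/* Copy out four network-order words; s6_addr32 is not portable. */
	memcpy(w, a->s6_addr, sizeof(w));
	return (w[0] | w[1]) == 0 && w[2] == htonl(0x0000FFFF);
}

/* Hypothetical helper: the embedded IPv4 address is the last word. */
static struct in_addr sketch_mapped_to_v4(const struct in6_addr *a)
{
	struct in_addr v4;

	memcpy(&v4.s_addr, &a->s6_addr[12], sizeof(v4.s_addr));
	return v4;
}
```

On systems that provide it, the result should agree with the IN6_IS_ADDR_V4MAPPED() macro from <netinet/in.h>.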
[PATCH 1/4] [IPv6] Add link and site-local scope inline
Add link and site-local scope inline to avoid calls to ipv6_addr_type().

Signed-off-by: Brian Haley <[EMAIL PROTECTED]>
---
 include/net/ipv6.h               |   10 ++
 net/dccp/ipv6.c                  |    2 +-
 net/ipv6/addrconf.c              |    6 +++---
 net/ipv6/af_inet6.c              |    2 +-
 net/ipv6/datagram.c              |   11 ---
 net/ipv6/inet6_connection_sock.c |    2 +-
 net/ipv6/ip6_output.c            |    2 +-
 net/ipv6/mcast.c                 |    8 +++-
 net/ipv6/ndisc.c                 |    8
 net/ipv6/raw.c                   |    4 ++--
 net/ipv6/tcp_ipv6.c              |    2 +-
 net/ipv6/udp.c                   |    4 ++--
 net/sctp/ipv6.c                  |   16 +++-
 13 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 00328b7..d473789 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -429,6 +429,16 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
 	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
 }
 
+static inline int ipv6_addr_scope_linklocal(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] & htonl(0xFFC00000)) == htonl(0xFE800000));
+}
+
+static inline int ipv6_addr_scope_sitelocal(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] & htonl(0xFFC00000)) == htonl(0xFEC00000));
+}
+
 /*
  * Prototypes exported by ipv6
  */
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 64eac25..14a0f12 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -476,7 +476,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 
 	/* So that link locals have meaning */
 	if (!sk->sk_bound_dev_if &&
-	    ipv6_addr_type(&ireq6->rmt_addr) & IPV6_ADDR_LINKLOCAL)
+	    ipv6_addr_scope_linklocal(&ireq6->rmt_addr))
 		ireq6->iif = inet6_iif(skb);
 
 	/*
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47d3adf..2d4fe24 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2634,7 +2634,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 	if (ifp->idev->cnf.forwarding == 0 &&
 	    ifp->idev->cnf.rtr_solicits > 0 &&
 	    (dev->flags&IFF_LOOPBACK) == 0 &&
-	    (ipv6_addr_type(&ifp->addr) & IPV6_ADDR_LINKLOCAL)) {
+	    ipv6_addr_scope_linklocal(&ifp->addr)) {
 		struct in6_addr all_routers;
 
 		ipv6_addr_all_routers(&all_routers);
@@ -3155,7 +3155,7 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
 	u8 scope = RT_SCOPE_UNIVERSE;
 	int ifindex = ifmca->idev->dev->ifindex;
 
-	if (ipv6_addr_scope(&ifmca->mca_addr) & IFA_SITE)
+	if (ipv6_addr_scope_sitelocal(&ifmca->mca_addr))
 		scope = RT_SCOPE_SITE;
 
 	nlh = nlmsg_put(skb, pid, seq, event, sizeof(struct ifaddrmsg), flags);
@@ -3180,7 +3180,7 @@ static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
 	u8 scope = RT_SCOPE_UNIVERSE;
 	int ifindex = ifaca->aca_idev->dev->ifindex;
 
-	if (ipv6_addr_scope(&ifaca->aca_addr) & IFA_SITE)
+	if (ipv6_addr_scope_sitelocal(&ifaca->aca_addr))
 		scope = RT_SCOPE_SITE;
 
 	nlh = nlmsg_put(skb, pid, seq, event, sizeof(struct ifaddrmsg), flags);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index df31cdd..24618cf 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -431,7 +431,7 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
 		sin->sin6_port = inet->sport;
 	}
 
-	if (ipv6_addr_type(&sin->sin6_addr) & IPV6_ADDR_LINKLOCAL)
+	if (ipv6_addr_scope_linklocal(&sin->sin6_addr))
 		sin->sin6_scope_id = sk->sk_bound_dev_if;
 	*uaddr_len = sizeof(*sin);
 	return(0);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 4a355fe..a8612b2 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -323,7 +323,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len)
 			sin->sin6_flowinfo = (*(__be32 *)(nh + serr->addr_offset - 24) &
 					      IPV6_FLOWINFO_MASK);
 
-			if (ipv6_addr_type(&sin->sin6_addr) & IPV6_ADDR_LINKLOCAL)
+			if (ipv6_addr_scope_linklocal(&sin->sin6_addr))
 				sin->sin6_scope_id = IP6CB(skb)->iif;
 		} else {
 			ipv6_addr_set(&sin->sin6_addr, 0, 0,
@@ -343,7 +343,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len)
 			ipv6_addr_copy(&sin->sin6_addr, &ipv6_hdr(skb)->saddr);
 			if (np->rxopt.all)
 				datagram_recv_ctl(sk, msg, skb);
-			if (ipv6_addr_type(&sin->sin6_addr) & IPV6_ADDR_LINKLOCAL)
+
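
As a rough illustration of what the two scope inlines test, here is a userspace sketch that assumes the masks are the usual fe80::/10 and fec0::/10 unicast prefixes. The sketch_* names and first_word() are hypothetical and exist only for this example. Note that a prefix test of this kind matches unicast addresses only; a multicast address such as ff02::1 carries its scope in a different field and is not matched.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct in6_addr */
#include <stdint.h>
#include <string.h>      /* memcpy() */

/* Hypothetical helper: first 32 bits of the address, in network order. */
static uint32_t first_word(const struct in6_addr *a)
{
	uint32_t w;

	memcpy(&w, a->s6_addr, sizeof(w));
	return w;
}

/* fe80::/10: the top ten bits are 1111 1110 10. */
static int sketch_unicast_linklocal(const struct in6_addr *a)
{
	return (first_word(a) & htonl(0xFFC00000)) == htonl(0xFE800000);
}

/* fec0::/10: the top ten bits are 1111 1110 11. */
static int sketch_unicast_sitelocal(const struct in6_addr *a)
{
	return (first_word(a) & htonl(0xFFC00000)) == htonl(0xFEC00000);
}
```

Comparing whole words against htonl() constants keeps the test independent of host byte order, which is presumably why the patch is written that way.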
[PATCH 4/4] Add loopback address type inline
Add loopback address type inline to avoid calls to ipv6_addr_type().

Signed-off-by: Brian Haley <[EMAIL PROTECTED]>
---
 include/net/ipv6.h    |    7 +++
 net/ipv6/ip6_output.c |    5 +++--
 net/ipv6/route.c      |    8 +++-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index f3e13db..d87f421 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -388,6 +388,13 @@ static inline int ipv6_addr_any(const struct in6_addr *a)
 		 a->s6_addr32[2] | a->s6_addr32[3] ) == 0);
 }
 
+static inline int ipv6_addr_loopback(const struct in6_addr *a)
+{
+	return ((a->s6_addr32[0] | a->s6_addr32[1] |
+		 a->s6_addr32[2] ) == 0 &&
+		a->s6_addr32[3] == htonl(0x00000001));
+}
+
 /*
  * find the first different bit between two addresses
  * length of address must be a multiple of 32bits
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f6aa338..7f1aabe 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -455,8 +455,9 @@ int ip6_forward(struct sk_buff *skb)
 		 */
 		if (xrlim_allow(dst, 1*HZ))
 			ndisc_send_redirect(skb, n, target);
-	} else if (ipv6_addr_type(&hdr->saddr)&(IPV6_ADDR_MULTICAST|IPV6_ADDR_LOOPBACK
-						|IPV6_ADDR_LINKLOCAL)) {
+	} else if (ipv6_addr_type_multicast(&hdr->saddr) ||
+		   ipv6_addr_loopback(&hdr->saddr) ||
+		   ipv6_addr_scope_linklocal(&hdr->saddr)) {
 		/* This check is security critical. */
 		goto error;
 	}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 32c6398..06ee92d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1067,7 +1067,6 @@ int ip6_route_add(struct fib6_config *cfg)
 	struct net_device *dev = NULL;
 	struct inet6_dev *idev = NULL;
 	struct fib6_table *table;
-	int addr_type;
 
 	if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
 		return -EINVAL;
@@ -1108,9 +1107,7 @@ int ip6_route_add(struct fib6_config *cfg)
 		cfg->fc_protocol = RTPROT_BOOT;
 	rt->rt6i_protocol = cfg->fc_protocol;
 
-	addr_type = ipv6_addr_type(&cfg->fc_dst);
-
-	if (addr_type & IPV6_ADDR_MULTICAST)
+	if (ipv6_addr_type_multicast(&cfg->fc_dst))
 		rt->u.dst.input = ip6_mc_input;
 	else
 		rt->u.dst.input = ip6_forward;
@@ -1133,7 +1130,8 @@ int ip6_route_add(struct fib6_config *cfg)
 	   they would result in kernel looping; promote them to reject routes
 	 */
 	if ((cfg->fc_flags & RTF_REJECT) ||
-	    (dev && (dev->flags&IFF_LOOPBACK) && !(addr_type&IPV6_ADDR_LOOPBACK))) {
+	    (dev && (dev->flags&IFF_LOOPBACK) &&
+	     !ipv6_addr_loopback(&cfg->fc_dst))) {
 		/* hold loopback dev/idev if we haven't done so. */
 		if (dev != &loopback_dev) {
 			if (dev) {
-- 
1.5.0.3
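
The loopback test from this patch is easy to check in userspace. This is a sketch only, assuming the intended address is ::1 (first three words zero, last word 1 in network order). sketch_addr_loopback() is a hypothetical name, not the kernel inline.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct in6_addr */
#include <stdint.h>
#include <string.h>      /* memcpy() */

/* Returns 1 only for ::1, the IPv6 loopback address. */
static int sketch_addr_loopback(const struct in6_addr *a)
{
	uint32_t w[4];

	/* Copy out four network-order words; s6_addr32 is not portable. */
	memcpy(w, a->s6_addr, sizeof(w));
	return (w[0] | w[1] | w[2]) == 0 && w[3] == htonl(0x00000001);
}
```

The result should agree with the IN6_IS_ADDR_LOOPBACK() macro from <netinet/in.h> for any input.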
Re: [PATCH 3/4] Add mapped address type inline
In article <[EMAIL PROTECTED]> (at Thu, 05 Apr 2007 23:21:25 -0400), Brian Haley <[EMAIL PROTECTED]> says:

> Add mapped address type inline to avoid calls to ipv6_addr_type().
>
> Signed-off-by: Brian Haley <[EMAIL PROTECTED]>
> ---
>  include/net/ipv6.h       |    6 ++
>  net/ipv6/ip6_flowlabel.c |    6 ++
>  net/ipv6/ipv6_sockglue.c |    2 +-
>  net/ipv6/tcp_ipv6.c      |   13 +
>  net/ipv6/udp.c           |    2 +-
>  net/sctp/ipv6.c          |    4 ++--
>  6 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index a888b0e..f3e13db 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -444,6 +444,12 @@ static inline int ipv6_addr_type_multicast(const struct in6_addr *a)
>  	return ((a->s6_addr32[0] & htonl(0xFF000000)) == htonl(0xFF000000));
>  }
>
> +static inline int ipv6_addr_type_mapped(const struct in6_addr *a)
> +{
> +	return ((a->s6_addr32[0] | a->s6_addr32[1]) == 0 &&
> +		a->s6_addr32[2] == htonl(0x0000FFFF));
> +}
> +
>  /*
>   * Prototypes exported by ipv6
>   */

I prefer ipv6_addr_v4mapped() to align with ipv6_addr_any() (and IN6_IS_ADDR_V4MAPPED() macro in RFC3493).

> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 537978c..a47d23d 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -132,7 +132,6 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
>  	struct in6_addr *saddr = NULL, *final_p = NULL, final;
>  	struct flowi fl;
>  	struct dst_entry *dst;
> -	int addr_type;
>  	int err;
>
>  	if (addr_len < SIN6_LEN_RFC2133)
> @@ -163,12 +162,10 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
>  	if(ipv6_addr_any(&usin->sin6_addr))
>  		usin->sin6_addr.s6_addr[15] = 0x1;
>
> -	addr_type = ipv6_addr_type(&usin->sin6_addr);
> -
> -	if(addr_type & IPV6_ADDR_MULTICAST)
> +	if (ipv6_addr_type_multicast(&usin->sin6_addr))
>  		return -ENETUNREACH;
>
> -	if (addr_type&IPV6_ADDR_LINKLOCAL) {
> +	if (ipv6_addr_scope_linklocal(&usin->sin6_addr)) {
>  		if (addr_len >= sizeof(struct sockaddr_in6) &&
>  		    usin->sin6_scope_id) {
>  			/* If interface is set while binding, indices
> @@ -200,7 +197,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
>  	 *	TCP over IPv4
>  	 */

different commit?

--yoshfuji
Re: [PATCH 4/4] Add loopback address type inline
In article <[EMAIL PROTECTED]> (at Thu, 05 Apr 2007 23:21:33 -0400), Brian Haley <[EMAIL PROTECTED]> says:

> Add loopback address type inline to avoid calls to ipv6_addr_type().

> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index f6aa338..7f1aabe 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -455,8 +455,9 @@ int ip6_forward(struct sk_buff *skb)
>  		 */
>  		if (xrlim_allow(dst, 1*HZ))
>  			ndisc_send_redirect(skb, n, target);
> -	} else if (ipv6_addr_type(&hdr->saddr)&(IPV6_ADDR_MULTICAST|IPV6_ADDR_LOOPBACK
> -						|IPV6_ADDR_LINKLOCAL)) {
> +	} else if (ipv6_addr_type_multicast(&hdr->saddr) ||
> +		   ipv6_addr_loopback(&hdr->saddr) ||
> +		   ipv6_addr_scope_linklocal(&hdr->saddr)) {
>  		/* This check is security critical. */
>  		goto error;
>  	}
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 32c6398..06ee92d 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1067,7 +1067,6 @@ int ip6_route_add(struct fib6_config *cfg)
>  	struct net_device *dev = NULL;
>  	struct inet6_dev *idev = NULL;
>  	struct fib6_table *table;
> -	int addr_type;
>
>  	if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
>  		return -EINVAL;
> @@ -1108,9 +1107,7 @@ int ip6_route_add(struct fib6_config *cfg)
>  		cfg->fc_protocol = RTPROT_BOOT;
>  	rt->rt6i_protocol = cfg->fc_protocol;
>
> -	addr_type = ipv6_addr_type(&cfg->fc_dst);
> -
> -	if (addr_type & IPV6_ADDR_MULTICAST)
> +	if (ipv6_addr_type_multicast(&cfg->fc_dst))
>  		rt->u.dst.input = ip6_mc_input;
>  	else
>  		rt->u.dst.input = ip6_forward;

different commit...

--yoshfuji
Re: pci-e e1000 not responding to ARPs sometimes?
Tantilov, Emil S wrote:
> Sorry for the top post.
>
> I believe this is a known issue and it was fixed in the kernel driver as well (maybe it didn't make it into FC5 kernel?). See the following:
> http://article.gmane.org/gmane.linux.network/52168/match=e1000+factps+mngcg

Thank you! I've added this patch and we will do some testing on it tomorrow. I will let you know how it works out.

Ben

-- 
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc http://www.candelatech.com
[Fwd: [PATCH 3/4] 8139too: RTNL and flush_scheduled_work deadlock]
Please consider this for the 2.6.20 stable branch. This fixes a deadlock between the bridging code and the 8139too driver.

I am not the author of this patch, but I have tested a slightly modified version (so that it works with the 2.6.18 kernel) extensively and it solves the deadlock.

Mr. Romieu suggested I forward this to you...

Thanks,
Ben

-------- Original Message --------
Subject: [PATCH 3/4] 8139too: RTNL and flush_scheduled_work deadlock
Date: Thu, 15 Feb 2007 23:37:44 +0100
From: Francois Romieu <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: Stephen Hemminger <[EMAIL PROTECTED]>, [EMAIL PROTECTED], netdev@vger.kernel.org, Ben Greear <[EMAIL PROTECTED]>, Kyle Lucke <[EMAIL PROTECTED]>, Raghavendra Koushik <[EMAIL PROTECTED]>, Al Viro <[EMAIL PROTECTED]>

Your usual dont-flush_scheduled_work-with-RTNL-held stuff.

It is a bit different here since the thread runs permanently or is only occasionally kicked for recovery depending on the hardware revision.

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
---
 drivers/net/8139too.c |   40 +---
 1 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index 35ad5cf..99304b2 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -1109,6 +1109,8 @@ static void __devexit rtl8139_remove_one (struct pci_dev *pdev)
 
 	assert (dev != NULL);
 
+	flush_scheduled_work();
+
 	unregister_netdev (dev);
 
 	__rtl8139_cleanup_dev (dev);
@@ -1603,18 +1605,21 @@ static void rtl8139_thread (struct work_struct *work)
 	struct net_device *dev = tp->mii.dev;
 	unsigned long thr_delay = next_tick;
 
+	rtnl_lock();
+
+	if (!netif_running(dev))
+		goto out_unlock;
+
 	if (tp->watchdog_fired) {
 		tp->watchdog_fired = 0;
 		rtl8139_tx_timeout_task(work);
-	} else if (rtnl_trylock()) {
-		rtl8139_thread_iter (dev, tp, tp->mmio_addr);
-		rtnl_unlock ();
-	} else {
-		/* unlikely race. mitigate with fast poll. */
-		thr_delay = HZ / 2;
-	}
+	} else
+		rtl8139_thread_iter(dev, tp, tp->mmio_addr);
 
-	schedule_delayed_work(&tp->thread, thr_delay);
+	if (tp->have_thread)
+		schedule_delayed_work(&tp->thread, thr_delay);
+out_unlock:
+	rtnl_unlock ();
 }
 
 static void rtl8139_start_thread(struct rtl8139_private *tp)
@@ -1626,19 +1631,11 @@ static void rtl8139_start_thread(struct rtl8139_private *tp)
 		return;
 
 	tp->have_thread = 1;
+	tp->watchdog_fired = 0;
 
 	schedule_delayed_work(&tp->thread, next_tick);
 }
 
-static void rtl8139_stop_thread(struct rtl8139_private *tp)
-{
-	if (tp->have_thread) {
-		cancel_rearming_delayed_work(&tp->thread);
-		tp->have_thread = 0;
-	} else
-		flush_scheduled_work();
-}
-
 static inline void rtl8139_tx_clear (struct rtl8139_private *tp)
 {
 	tp->cur_tx = 0;
@@ -1696,12 +1693,11 @@ static void rtl8139_tx_timeout (struct net_device *dev)
 {
 	struct rtl8139_private *tp = netdev_priv(dev);
 
+	tp->watchdog_fired = 1;
 	if (!tp->have_thread) {
-		INIT_DELAYED_WORK(&tp->thread, rtl8139_tx_timeout_task);
+		INIT_DELAYED_WORK(&tp->thread, rtl8139_thread);
 		schedule_delayed_work(&tp->thread, next_tick);
-	} else
-		tp->watchdog_fired = 1;
-
+	}
 }
 
 static int rtl8139_start_xmit (struct sk_buff *skb, struct net_device *dev)
@@ -2233,8 +2229,6 @@ static int rtl8139_close (struct net_device *dev)
 
 	netif_stop_queue (dev);
 
-	rtl8139_stop_thread(tp);
-
 	if (netif_msg_ifdown(tp))
 		printk(KERN_DEBUG "%s: Shutting down ethercard, status was 0x%4.4x.\n",
 			dev->name, RTL_R16 (IntrStatus));
-- 
1.4.4.4

-- 
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc http://www.candelatech.com
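
The hazard named in the patch subject (flushing scheduled work while RTNL is held, when the work itself may need RTNL) can be modelled without any kernel code. The following is a deliberately simplified, single-threaded model and not the driver's logic: every name in it (sketch_dev, sketch_flush() and so on) is hypothetical, the "lock" is just a flag, and a flush that could never complete is reported as a false return instead of actually hanging.

```c
#include <stdbool.h>

struct sketch_dev {
	bool rtnl_held;     /* models "the caller currently holds RTNL" */
	bool running;       /* models netif_running() */
	bool work_pending;  /* models a queued work item */
	int  iterations;    /* passes of the work body that did real work */
};

/* Work body in the patched style: do nothing once the device is down. */
static void sketch_work(struct sketch_dev *d)
{
	if (d->running)
		d->iterations++;
}

/*
 * Models a flush: returns false when pending work needs the lock that
 * the flushing caller holds, i.e. the case that would deadlock.
 */
static bool sketch_flush(struct sketch_dev *d)
{
	if (!d->work_pending)
		return true;
	if (d->rtnl_held)
		return false;
	d->work_pending = false;
	sketch_work(d);
	return true;
}

/* Old shape: the close path flushes while still holding the lock. */
static bool sketch_close_old(struct sketch_dev *d)
{
	bool ok;

	d->rtnl_held = true;
	ok = sketch_flush(d);
	d->running = false;
	d->rtnl_held = false;
	return ok;
}

/* New shape: close only marks the device down and never waits. */
static bool sketch_close_new(struct sketch_dev *d)
{
	d->rtnl_held = true;
	d->running = false;
	d->rtnl_held = false;
	return true;
}

/* New shape: the flush happens on the remove path, without the lock. */
static bool sketch_remove(struct sketch_dev *d)
{
	return sketch_flush(d);
}
```

In this model the old close path reports the would-deadlock case whenever work is pending, while the new close path always completes and the leftover work item turns into a no-op when it is finally flushed at remove time.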
Re: [PATCH 1/4] [IPv6] Add link and site-local scope inline
YOSHIFUJI Hideaki wrote:
> In article <[EMAIL PROTECTED]> (at Thu, 05 Apr 2007 23:21:05 -0400), Brian Haley <[EMAIL PROTECTED]> says:
>
>> Add link and site-local scope inline to avoid calls to ipv6_addr_type().
>
> I disagree. Multicast scopes should also be handled appropriately.

Yes, I totally missed that ipv6_addr_scope2type(IPV6_ADDR_MC_SCOPE(addr)) in __ipv6_addr_type(), so the linklocal inline probably isn't worth it since it would have to be something like:

static inline int ipv6_addr_scope_linklocal(const struct in6_addr *a)
{
	return ((a->s6_addr32[0] & htonl(0xFFC00000)) == htonl(0xFE800000) ||
		((a->s6_addr32[0] & htonl(0xFF000000)) == htonl(0xFF000000) &&
		 ((a)->s6_addr[1] & 0x0f) == IPV6_ADDR_SCOPE_LINKLOCAL));
}

That's not that clean an inline anymore, but still doable...

I'll clean up the rest based on your comments and re-send.

-Brian
Re: [PATCH 1/4] [IPv6] Add link and site-local scope inline
In article <[EMAIL PROTECTED]> (at Fri, 06 Apr 2007 02:37:52 -0400), Brian Haley <[EMAIL PROTECTED]> says:

> static inline int ipv6_addr_scope_linklocal(const struct in6_addr *a)
> {
> 	return ((a->s6_addr32[0] & htonl(0xFFC00000)) == htonl(0xFE800000) ||
> 		((a->s6_addr32[0] & htonl(0xFF000000)) == htonl(0xFF000000) &&
> 		 ((a)->s6_addr[1] & 0x0f) == IPV6_ADDR_SCOPE_LINKLOCAL));
> }
>
> That's not that clean an inline anymore, but still doable...

I would prefer to have ipv6_addr_linklocal() and ipv6_addr_mc_linklocal(), aligning with RFC3493.

--yoshfuji
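
One possible reading of the split suggested here, written as a userspace sketch: one helper for unicast link-local (fe80::/10) and one for multicast with link-local scope (ff00::/8 with scope nibble 0x2), with the combined test built from the two. The sketch_* names are hypothetical, and the scope value 0x2 is taken from the multicast scope definitions rather than from this thread.

```c
#include <netinet/in.h>  /* struct in6_addr */

/* Unicast link-local: fe80::/10, i.e. top ten bits 1111 1110 10. */
static int sketch_addr_linklocal(const struct in6_addr *a)
{
	return a->s6_addr[0] == 0xfe && (a->s6_addr[1] & 0xc0) == 0x80;
}

/* Multicast (ff00::/8) whose 4-bit scope field says link-local (0x2). */
static int sketch_addr_mc_linklocal(const struct in6_addr *a)
{
	return a->s6_addr[0] == 0xff && (a->s6_addr[1] & 0x0f) == 0x02;
}

/* Combined test: either kind of link-local scope. */
static int sketch_addr_scope_linklocal(const struct in6_addr *a)
{
	return sketch_addr_linklocal(a) || sketch_addr_mc_linklocal(a);
}
```

The two helpers should agree with IN6_IS_ADDR_LINKLOCAL() and IN6_IS_ADDR_MC_LINKLOCAL() from <netinet/in.h>. Keeping them separate lets a caller that only cares about unicast skip the multicast test.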