Re: [PATCH] include listenq max backlog in /proc/net/tcp and include in tcp_info
Sridhar Samudrala wrote:
> On Mon, 2007-09-10 at 16:13 -0700, Rick Jones wrote:
>> Return some useful information such as the maximum listen backlog and
>> the current listen backlog in the tcp_info structure and have that
>> match what one can see in /proc/net/tcp and /proc/net/tcp6.
>
> If we are also exporting the max listen backlog, another place to
> consider adding this is tcp_diag_get_info(), called via INET_DIAG_INFO.
> The current listen backlog is returned in inet_diag_msg->idiag_rqueue;
> the max listen backlog can be returned in inet_diag_msg->idiag_wqueue.

I agree, /proc/net/tcp is deprecated nowadays...

Rick, could you add this part to your patch, and add my Signed-off-by?

Thank you,
Eric

diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 57c5f0b..f5b6275 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -25,11 +25,13 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
 	const struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_info *info = _info;
 
-	if (sk->sk_state == TCP_LISTEN)
+	if (sk->sk_state == TCP_LISTEN) {
 		r->idiag_rqueue = sk->sk_ack_backlog;
-	else
+		r->idiag_wqueue = sk->sk_max_ack_backlog;
+	} else {
 		r->idiag_rqueue = tp->rcv_nxt - tp->copied_seq;
-	r->idiag_wqueue = tp->write_seq - tp->snd_una;
+		r->idiag_wqueue = tp->write_seq - tp->snd_una;
+	}
 	if (info != NULL)
 		tcp_get_info(sk, info);
 }
[PATCH][2/2] Add ICMPMsgStats MIB (RFC 4293)
Background: RFC 4293 deprecates the existing individual, named ICMP type
counters, replacing them with the ICMPMsgStatsTable. This table includes
entries for both IPv4 and IPv6, and requires counting of all ICMP types,
whether or not the machine implements the type.

These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6. The named
counters remain in their usual /proc places, but their values now come
from the new tables. Packets generated from raw socket output (e.g.,
OutEchoes, MLD queries, RAs from radvd, etc.) are now counted as well.

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
   listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
   from the new counters.

IPv6 patch attached.

							+-DLS

Signed-off-by: David L Stevens <[EMAIL PROTECTED]>

diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPv6MSG/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h	2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPv6MSG/include/linux/snmp.h	2007-09-10 15:02:43.0 -0700
@@ -91,35 +91,12 @@ enum
 	ICMP6_MIB_NUM = 0,
 	ICMP6_MIB_INMSGS,			/* InMsgs */
 	ICMP6_MIB_INERRORS,			/* InErrors */
-	ICMP6_MIB_INDESTUNREACHS,		/* InDestUnreachs */
-	ICMP6_MIB_INPKTTOOBIGS,			/* InPktTooBigs */
-	ICMP6_MIB_INTIMEEXCDS,			/* InTimeExcds */
-	ICMP6_MIB_INPARMPROBLEMS,		/* InParmProblems */
-	ICMP6_MIB_INECHOS,			/* InEchos */
-	ICMP6_MIB_INECHOREPLIES,		/* InEchoReplies */
-	ICMP6_MIB_INGROUPMEMBQUERIES,		/* InGroupMembQueries */
-	ICMP6_MIB_INGROUPMEMBRESPONSES,		/* InGroupMembResponses */
-	ICMP6_MIB_INGROUPMEMBREDUCTIONS,	/* InGroupMembReductions */
-	ICMP6_MIB_INROUTERSOLICITS,		/* InRouterSolicits */
-	ICMP6_MIB_INROUTERADVERTISEMENTS,	/* InRouterAdvertisements */
-	ICMP6_MIB_INNEIGHBORSOLICITS,		/* InNeighborSolicits */
-	ICMP6_MIB_INNEIGHBORADVERTISEMENTS,	/* InNeighborAdvertisements */
-	ICMP6_MIB_INREDIRECTS,			/* InRedirects */
 	ICMP6_MIB_OUTMSGS,			/* OutMsgs */
-	ICMP6_MIB_OUTDESTUNREACHS,		/* OutDestUnreachs */
-	ICMP6_MIB_OUTPKTTOOBIGS,		/* OutPktTooBigs */
-	ICMP6_MIB_OUTTIMEEXCDS,			/* OutTimeExcds */
-	ICMP6_MIB_OUTPARMPROBLEMS,		/* OutParmProblems */
-	ICMP6_MIB_OUTECHOREPLIES,		/* OutEchoReplies */
-	ICMP6_MIB_OUTROUTERSOLICITS,		/* OutRouterSolicits */
-	ICMP6_MIB_OUTNEIGHBORSOLICITS,		/* OutNeighborSolicits */
-	ICMP6_MIB_OUTNEIGHBORADVERTISEMENTS,	/* OutNeighborAdvertisements */
-	ICMP6_MIB_OUTREDIRECTS,			/* OutRedirects */
-	ICMP6_MIB_OUTGROUPMEMBRESPONSES,	/* OutGroupMembResponses */
-	ICMP6_MIB_OUTGROUPMEMBREDUCTIONS,	/* OutGroupMembReductions */
 	__ICMP6_MIB_MAX
 };
 
+#define __ICMP6MSG_MIB_MAX 512 /* Out+In for all 8-bit ICMPv6 types */
+
 /* tcp mib definitions */
 /*
  * RFC 1213:  MIB-II TCP group
diff -ruNp linux-2.6.22.5/include/net/ipv6.h linux-2.6.22.5_ICMPv6MSG/include/net/ipv6.h
--- linux-2.6.22.5/include/net/ipv6.h	2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPv6MSG/include/net/ipv6.h	2007-09-10 15:41:19.0 -0700
@@ -132,6 +132,7 @@ DECLARE_SNMP_STAT(struct ipstats_mib, ip
 	SNMP_INC_STATS_USER(ipv6_statistics, field);		\
 })
 DECLARE_SNMP_STAT(struct icmpv6_mib, icmpv6_statistics);
+DECLARE_SNMP_STAT(struct icmpv6msg_mib, icmpv6msg_statistics);
 #define ICMP6_INC_STATS(idev, field)		({		\
 	struct inet6_dev *_idev = (idev);			\
 	if (likely(_idev != NULL))				\
@@ -157,6 +158,14 @@ DECLARE_SNMP_STAT(struct icmpv6_mib, icm
 		SNMP_INC_STATS_OFFSET_BH(_idev->stats.icmpv6, field, _offset);	\
 	SNMP_INC_STATS_OFFSET_BH(icmpv6_statistics, field, _offset);	\
 })
+
+#define ICMP6MSGOUT_INC_STATS(field)	SNMP_INC_STATS(icmpv6msg_statistics, field+256)
+#define ICMP6MSGOUT_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpv6msg_statistics, field+256)
+#define ICMP6MSGOUT_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpv6msg_statistics, field+256)
+#define ICMP6MSGIN_INC_STATS(field)	SNMP_INC_STATS(icmpv6msg_statistics, field)
+#define ICMP6MSGIN_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpv6msg_statistics, field)
+#define ICMP6MSGIN_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpv6msg_statistics, field)
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
"volatile" has nothing to do with reordering. atomic_dec() writes to
memory, so it _does_ have "volatile semantics", implicitly, as long as
the compiler cannot optimise the atomic variable away completely --
any store counts as a side effect.

> Stores can be reordered. Only x86 has (mostly) implicit write ordering.
> So no, atomic_dec has no volatile semantics

Read again: I said the C "volatile" construct has nothing to do with CPU
memory access reordering.

> and may be reordered on a variety of processors. Writes to memory may
> not follow code order on several processors.

The _compiler_ isn't allowed to reorder things here. Yes, of course you
do need stronger barriers for many purposes; volatile isn't all that
useful, you know.

Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293)
Background: RFC 4293 deprecates the existing individual, named ICMP type
counters, replacing them with the ICMPMsgStatsTable. This table includes
entries for both IPv4 and IPv6, and requires counting of all ICMP types,
whether or not the machine implements the type.

These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6. The named
counters remain in their usual /proc places, but their values now come
from the new tables. Packets generated from raw socket output (e.g.,
OutEchoes, MLD queries, RAs from radvd, etc.) are now counted as well.

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
   listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
   from the new counters.

IPv4 patch attached, IPv6 patch to follow.

							+-DLS

Signed-off-by: David L Stevens <[EMAIL PROTECTED]>

diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPMSG/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h	2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/linux/snmp.h	2007-08-23 15:32:29.0 -0700
@@ -82,6 +82,8 @@ enum
 	__ICMP_MIB_MAX
 };
 
+#define __ICMPMSG_MIB_MAX 512 /* Out+In for all 8-bit ICMP types */
+
 /* icmp6 mib definitions */
 /*
  * RFC 2466:  ICMPv6-MIB
diff -ruNp linux-2.6.22.5/include/net/icmp.h linux-2.6.22.5_ICMPMSG/include/net/icmp.h
--- linux-2.6.22.5/include/net/icmp.h	2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/net/icmp.h	2007-08-23 15:56:45.0 -0700
@@ -30,9 +30,16 @@ struct icmp_err {
 extern struct icmp_err icmp_err_convert[];
 DECLARE_SNMP_STAT(struct icmp_mib, icmp_statistics);
+DECLARE_SNMP_STAT(struct icmpmsg_mib, icmpmsg_statistics);
 #define ICMP_INC_STATS(field)		SNMP_INC_STATS(icmp_statistics, field)
 #define ICMP_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmp_statistics, field)
 #define ICMP_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmp_statistics, field)
+#define ICMPMSGOUT_INC_STATS(field)	SNMP_INC_STATS(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpmsg_statistics, field+256)
+#define ICMPMSGIN_INC_STATS(field)	SNMP_INC_STATS(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpmsg_statistics, field)
 
 struct dst_entry;
 struct net_proto_family;
@@ -42,6 +49,7 @@ extern void	icmp_send(struct sk_buff *sk
 extern int	icmp_rcv(struct sk_buff *skb);
 extern int	icmp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern void	icmp_init(struct net_proto_family *ops);
+extern void	icmp_out_count(unsigned char type);
 
 /* Move into dst.h ? */
 extern int 	xrlim_allow(struct dst_entry *dst, int timeout);
diff -ruNp linux-2.6.22.5/include/net/snmp.h linux-2.6.22.5_ICMPMSG/include/net/snmp.h
--- linux-2.6.22.5/include/net/snmp.h	2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/net/snmp.h	2007-08-23 14:42:50.0 -0700
@@ -82,6 +82,11 @@ struct icmp_mib {
 	unsigned long	mibs[ICMP_MIB_MAX];
 } __SNMP_MIB_ALIGN__;
 
+#define ICMPMSG_MIB_MAX	__ICMPMSG_MIB_MAX
+struct icmpmsg_mib {
+	unsigned long	mibs[ICMPMSG_MIB_MAX];
+} __SNMP_MIB_ALIGN__;
+
 /* ICMP6 (IPv6-ICMP) */
 #define ICMP6_MIB_MAX	__ICMP6_MIB_MAX
 struct icmpv6_mib {
diff -ruNp linux-2.6.22.5/net/ipv4/af_inet.c linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c
--- linux-2.6.22.5/net/ipv4/af_inet.c	2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c	2007-08-23 14:47:26.0 -0700
@@ -1296,6 +1296,10 @@ static int __init init_ipv4_mibs(void)
 			  sizeof(struct icmp_mib),
 			  __alignof__(struct icmp_mib)) < 0)
 		goto err_icmp_mib;
+	if (snmp_mib_init((void **)icmpmsg_statistics,
+			  sizeof(struct icmpmsg_mib),
+			  __alignof__(struct icmpmsg_mib)) < 0)
+		goto err_icmpmsg_mib;
 	if (snmp_mib_init((void **)tcp_statistics,
 			  sizeof(struct tcp_mib),
 			  __alignof__(struct tcp_mib)) < 0)
@@ -1318,6 +1322,8 @@ err_udplite_mib:
 err_udp_mib:
 	snmp_mib_free((void **)tcp_statistics);
 err_tcp_mib:
+	snmp_mib_free((void **)icmpmsg_statistics);
+err_icmpmsg_mib:
 	snmp_mib_free((void **)icmp_statistics);
 err_icmp_mib:
 	snmp_mib_free((void **)ip_statistics);
diff -ruNp linux-2.6.22.5/net/ipv4/icmp.c linux-2.6.22.5_ICMPMSG/net/ipv4/icmp.c
--- linux-2.6.22.5/net/ipv4/icmp.c	2007
Re: [PATCH resend] Fix a lock problem in generic phy code
On Mon, Sep 10, 2007 at 08:45:50PM +0200, Hans-Jürgen Koch wrote:
>
>> Could you please audit all instances of physdev->lock and add
>> _bh where necessary? I can see that at least phys_stop also
>> needs the _bh.
>
> I think the patch does all that's necessary. At least, there're no error
> messages in the logs anymore. I didn't check if there's an error on
> unload, though.

Sorry, but you can't rely on the non-existence of lockdep messages as a
proof of correctness :)

If we're going to fix the obvious bugs here, we should fix the subtle
ones too, as otherwise they'll be much harder to notice with this patch
merged.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: 2.6.23-rc5: possible irq lock inversion dependency detected
On Mon, Sep 10, 2007 at 08:04:41PM -0400, jamal wrote:
>
> disabling BH would make it more symmetric to the way we handle
> egress. I couldnt reproduce the issue, but this should hopefully
> resolve it.
>
> Christian, can you test with this patch?

Jamal, it's the police_lock that we need to make _bh. The ingress_lock
is already _bh because of the spin_lock_bh that directly precedes it.

Oh, and I think the same thing applies to the other actions too.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] cfg80211: fix initialisation if built-in
On 9/10/07, Magnus Damm <[EMAIL PROTECTED]> wrote:
> -module_init(rate_control_simple_init);
> +//module_init(rate_control_simple_init);
> +postcore_initcall(rate_control_simple_init);
> module_exit(rate_control_simple_exit);
>
> MODULE_DESCRIPTION("Simple rate control algorithm for ieee80211");

Same problem here, except with the rt2x00 driver. I changed it to
subsys_initcall(rate_control_simple_init), which also worked.

I also found that without this change, it was failing at this point in
ieee80211_rate.c:

ieee80211_try_rate_control_ops_get(const char *name)
{
	struct rate_control_alg *alg;
	struct rate_control_ops *ops = NULL;

	mutex_lock(&rate_ctrl_mutex);
	list_for_each_entry(alg, &rate_ctrl_algs, list) {	<= Here
		if (!name || !strcmp(alg->ops->name, name))
			if (try_module_get(alg->ops->module)) {
				ops = alg->ops;
				break;
			}
Re: [PATCH] cfg80211: fix initialisation if built-in
On 9/10/07, Johannes Berg <[EMAIL PROTECTED]> wrote:
> When cfg80211 is built into the kernel it needs to init earlier
> so that device registrations are run after it has initialised.
>
> Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>

Yep, I need this fix as well. Without it, the ath5k driver built in
bombs out during module_init(). Something with kref and a struct device
pointing to an uninitialized ieee80211_class.

I need a similar fix for net/mac80211/rc80211_simple.c as well to get
ath5k working, though I'm not sure why at the moment. There may be some
bug with request_module() not being called properly.
ieee80211_register_hw() calls ieee80211_init_rate_ctrl_alg() with NULL
as name, which calls rate_control_alloc() with NULL, which always seems
to fail when built in.

This hack works around that problem; not sure what the real fix is.

--- 0002/net/mac80211/rc80211_simple.c
+++ work/net/mac80211/rc80211_simple.c	2007-09-09 18:11:48.0 +0900
@@ -431,7 +431,8 @@ static void __exit rate_control_simple_exit
 }
 
-module_init(rate_control_simple_init);
+//module_init(rate_control_simple_init);
+postcore_initcall(rate_control_simple_init);
 module_exit(rate_control_simple_exit);
 
 MODULE_DESCRIPTION("Simple rate control algorithm for ieee80211");

Thanks,

/ magnus
Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.
Quoting Eric W. Biederman ([EMAIL PROTECTED]):
> "Serge E. Hallyn" <[EMAIL PROTECTED]> writes:
> >>
> >> +static struct net *get_net_ns_by_pid(pid_t pid)
> >> +{
> >> +	struct task_struct *tsk;
> >> +	struct net *net;
> >> +
> >> +	/* Lookup the network namespace */
> >> +	net = ERR_PTR(-ESRCH);
> >> +	rcu_read_lock();
> >> +	tsk = find_task_by_pid(pid);
> >> +	if (tsk) {
> >> +		task_lock(tsk);
> >> +		if (tsk->nsproxy)
> >> +			net = get_net(tsk->nsproxy->net_ns);
> >> +		task_unlock(tsk);
> >
> > Thinking... OK, I'm not sure this is 100% safe in the target tree, but
> > the long-term correct way probably isn't yet implemented in the net
> > tree. Eventually you will want to:
> >
> >	net_ns = NULL;
> >	rcu_read_lock();
> >	tsk = find_task_by_pid();	/* or _pidns equiv? */
> >	nsproxy = task_nsproxy(tsk);
> >	if (nsproxy)
> >		net_ns = get_net(nsproxy->net_ns);
> >	rcu_read_unlock();
> >
> > What you have here is probably unsafe if tsk is the last task pointing
> > to its nsproxy and it does an unshare, because unshare isn't protected
> > by task_lock, and you're not rcu_dereferencing tsk->nsproxy (which
> > task_nsproxy does). At one point we floated a patch to reuse the same
> > nsproxy in that case, which would prevent you having to worry about it,
> > but that isn't being done in -mm now, so I doubt it's in -net.
>
> That change isn't merged upstream yet, so it isn't in David's
> net-2.6.24 tree. Currently task->nsproxy is protected by
> task_lock(current). So the code is fine.
>
> I am aware that removing the task_lock(current) for the setting
> of current->nsproxy is currently in the works, and I have planned
> to revisit this later when all of these pieces come together.
>
> For now the code is fine.
>
> If need be we can drop this patch to remove the potential merge
> conflict.

No, no. Like you say, it's correct at the moment. Just something we need
to watch out for when it does get merged with the newer changes.

> But I figured it was useful for this part of the user space
> interface to be available for review.

Agreed. And the rest of the patchset looks good to me.

Thanks.

-serge
Re: [PATCH] include listenq max backlog in /proc/net/tcp and include in tcp_info
On Mon, 2007-09-10 at 16:13 -0700, Rick Jones wrote:
> Return some useful information such as the maximum listen backlog and
> the current listen backlog in the tcp_info structure and have that
> match what one can see in /proc/net/tcp and /proc/net/tcp6.

If we are also exporting the max listen backlog, another place to
consider adding this is tcp_diag_get_info(), called via INET_DIAG_INFO.
The current listen backlog is returned in inet_diag_msg->idiag_rqueue;
the max listen backlog can be returned in inet_diag_msg->idiag_wqueue.

Thanks
Sridhar

> Signed-off-by: Rick Jones <[EMAIL PROTECTED]>
> ---
>
> diff -r bdcdd0e1ee9d Documentation/networking/proc_net_tcp.txt
> --- a/Documentation/networking/proc_net_tcp.txt	Sat Sep 01 07:00:31 2007 +0000
> +++ b/Documentation/networking/proc_net_tcp.txt	Mon Sep 10 16:09:46 2007 -0700
> @@ -20,8 +20,8 @@ up into 3 parts because of the length of
>     || | | |--> number of unrecovered RTO timeouts
>     || | |----> number of jiffies until timer expires
>     || |------> timer_active (see below)
> -   ||--------> receive-queue
> -   |---------> transmit-queue
> +   ||--------> receive-queue or connection backlog
> +   |---------> transmit-queue or connection limit
>
>    10000 54165785 4 cd1e6040 25 4 27 3 -1
>    | ||  ||  |   |  | |  |  |--> slow start size threshold,
> diff -r bdcdd0e1ee9d net/ipv4/tcp.c
> --- a/net/ipv4/tcp.c	Sat Sep 01 07:00:31 2007 +0000
> +++ b/net/ipv4/tcp.c	Mon Sep 10 16:09:46 2007 -0700
> @@ -2030,8 +2030,14 @@ void tcp_get_info(struct sock *sk, struc
>  	info->tcpi_snd_mss = tp->mss_cache;
>  	info->tcpi_rcv_mss = icsk->icsk_ack.rcv_mss;
>
> -	info->tcpi_unacked = tp->packets_out;
> -	info->tcpi_sacked = tp->sacked_out;
> +	if (sk->sk_state == TCP_LISTEN) {
> +		info->tcpi_unacked = sk->sk_ack_backlog;
> +		info->tcpi_sacked = sk->sk_max_ack_backlog;
> +	}
> +	else {
> +		info->tcpi_unacked = tp->packets_out;
> +		info->tcpi_sacked = tp->sacked_out;
> +	}
>  	info->tcpi_lost = tp->lost_out;
>  	info->tcpi_retrans = tp->retrans_out;
>  	info->tcpi_fackets = tp->fackets_out;
> diff -r bdcdd0e1ee9d net/ipv4/tcp_ipv4.c
> --- a/net/ipv4/tcp_ipv4.c	Sat Sep 01 07:00:31 2007 +0000
> +++ b/net/ipv4/tcp_ipv4.c	Mon Sep 10 16:09:46 2007 -0700
> @@ -2320,7 +2320,8 @@ static void get_tcp4_sock(struct sock *s
>  	sprintf(tmpbuf, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX "
>  			"%08X %5d %8d %lu %d %p %u %u %u %u %d",
>  		i, src, srcp, dest, destp, sk->sk_state,
> -		tp->write_seq - tp->snd_una,
> +		sk->sk_state == TCP_LISTEN ? sk->sk_max_ack_backlog :
> +					     (tp->write_seq - tp->snd_una),
>  		sk->sk_state == TCP_LISTEN ? sk->sk_ack_backlog :
>  					     (tp->rcv_nxt - tp->copied_seq),
>  		timer_active,
> diff -r bdcdd0e1ee9d net/ipv6/tcp_ipv6.c
> --- a/net/ipv6/tcp_ipv6.c	Sat Sep 01 07:00:31 2007 +0000
> +++ b/net/ipv6/tcp_ipv6.c	Mon Sep 10 16:09:46 2007 -0700
> @@ -2005,8 +2005,10 @@ static void get_tcp6_sock(struct seq_fil
>  		   dest->s6_addr32[0], dest->s6_addr32[1],
>  		   dest->s6_addr32[2], dest->s6_addr32[3], destp,
>  		   sp->sk_state,
> -		   tp->write_seq-tp->snd_una,
> -		   (sp->sk_state == TCP_LISTEN) ? sp->sk_ack_backlog : (tp->rcv_nxt - tp->copied_seq),
> +		   (sp->sk_state == TCP_LISTEN) ? sp->sk_max_ack_backlog :
> +						  tp->write_seq-tp->snd_una,
> +		   (sp->sk_state == TCP_LISTEN) ? sp->sk_ack_backlog :
> +						  (tp->rcv_nxt - tp->copied_seq),
>  		   timer_active,
>  		   jiffies_to_clock_t(timer_expires - jiffies),
>  		   icsk->icsk_retransmits,
Re: New NAPI API: Need for netif_napi_remove() ?!
Kok, Auke wrote:
> David,
>
> From an old thread:
>
>> 5) Since, in the NETPOLL case, netif_napi_init() adds the NAPI struct
>>    to the per-device list I renamed it to netif_napi_add(). Currently
>>    no teardown is really necessary, anything that would need to be done
>>    would be driver internal, so I didn't create the corollary
>>    netif_napi_remove() for the time being. Let's not add it unless it
>>    really becomes necessary.
>
> While coding the NAPI API changes into the ixgbe driver, I noticed that
> I'm in need of an implementation of netif_napi_remove(). The ixgbe
> driver already modifies its polling routine on open() and close() based
> on whether it was able to acquire MSI-X vectors or not, and this can
> thus logically change as the system suspends/resumes and new hardware
> is inserted that changes the balance of MSI-X vectors in the system.
>
> Or, even more bluntly: all MSI support is disabled and we want the
> driver to come up in legacy mode and use a completely different poll
> routine altogether. We can't do this at probe time.
>
> In any case, I think we have a legitimate case for netif_napi_remove()
> to be implemented.

hm, I spoke too soon. I think I can get by for now by just modifying
adapter->napi.poll when needed, and this would be clean enough for now.
This might change when I enable multiqueue in this driver later, though.

Auke
New NAPI API: Need for netif_napi_remove() ?!
David,

From an old thread:

> 5) Since, in the NETPOLL case, netif_napi_init() adds the NAPI struct
>    to the per-device list I renamed it to netif_napi_add(). Currently
>    no teardown is really necessary, anything that would need to be done
>    would be driver internal, so I didn't create the corollary
>    netif_napi_remove() for the time being. Let's not add it unless it
>    really becomes necessary.

While coding the NAPI API changes into the ixgbe driver, I noticed that
I'm in need of an implementation of netif_napi_remove(). The ixgbe
driver already modifies its polling routine on open() and close() based
on whether it was able to acquire MSI-X vectors or not, and this can
thus logically change as the system suspends/resumes and new hardware
is inserted that changes the balance of MSI-X vectors in the system.

Or, even more bluntly: all MSI support is disabled and we want the
driver to come up in legacy mode and use a completely different poll
routine altogether. We can't do this at probe time.

In any case, I think we have a legitimate case for netif_napi_remove()
to be implemented.

Auke
noob dev question
I'm new to the list and new to just about everything involving kernel
development. I'm working on a project where I have successfully written
an LKM to handle IP packets for protocol type 253 (an experimental
protocol number). Now I'm working on sending my reply, and I've hit a
roadblock. I'm not sure whether I should:

1) create a new skb with my protocol data and try to pass it to the IP
   handler, OR
2) reuse the inbound skb, turn it around, and send it back out manually
   via a netfilter hook.

Ideally I would like to get my reply formatted and sent into
ip_queue_xmit(), but I'm at the limit of what I can do regarding the
manipulation of socket buffers. I'm currently working in 2.6.18.8, as
the book I'm using does not reflect some of the newer changes to the
sk_buff struct.

Any help or pointers would be nice; I'm looking mostly for online
tutorials at this point.

Regards,

-dave
Re: 2.6.23-rc5: possible irq lock inversion dependency detected
On Mon, 2007-10-09 at 21:00 +0800, Herbert Xu wrote:
> The minimal fix would be to make sure that we disable BH on
> the first CPU.

Disabling BH would make it more symmetric to the way we handle egress.
I couldn't reproduce the issue, but this should hopefully resolve it.

Christian, can you test with this patch?

cheers,
jamal

[NET_SCHED] make ingress qlock symmetric to egress

Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>

--- a/net/sched/sch_generic.c	2007/09/10 23:19:45	1.1
+++ b/net/sched/sch_generic.c	2007/09/10 23:52:45
@@ -42,12 +42,12 @@
 void qdisc_lock_tree(struct net_device *dev)
 {
 	spin_lock_bh(&dev->queue_lock);
-	spin_lock(&dev->ingress_lock);
+	spin_lock_bh(&dev->ingress_lock);
 }
 
 void qdisc_unlock_tree(struct net_device *dev)
 {
-	spin_unlock(&dev->ingress_lock);
+	spin_unlock_bh(&dev->ingress_lock);
 	spin_unlock_bh(&dev->queue_lock);
 }
Re: [PATCH] Document non-semantics of atomic_read() and atomic_set()
On Mon, Sep 10, 2007 at 07:19:44PM -0400, Chris Snook wrote:
> From: Chris Snook <[EMAIL PROTECTED]>
>
> Unambiguously document the fact that atomic_read() and atomic_set()
> do not imply any ordering or memory access, and that callers are
> obligated to explicitly invoke barriers as needed to ensure that
> changes to atomic variables are visible in all contexts that need
> to see them.

Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

> Signed-off-by: Chris Snook <[EMAIL PROTECTED]>
>
> --- a/Documentation/atomic_ops.txt	2007-07-08 19:32:17.0 -0400
> +++ b/Documentation/atomic_ops.txt	2007-09-10 19:02:50.0 -0400
> @@ -12,7 +12,11 @@
>  C integer type will fail.  Something like the following should
>  suffice:
>
> -	typedef struct { volatile int counter; } atomic_t;
> +	typedef struct { int counter; } atomic_t;
> +
> +	Historically, counter has been declared volatile.  This is now
> +discouraged.  See Documentation/volatile-considered-harmful.txt for the
> +complete rationale.
>
>  The first operations to implement for atomic_t's are the
>  initializers and plain reads.
> @@ -42,6 +46,22 @@
>
>  which simply reads the current value of the counter.
>
> +*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
> +
> +Some architectures may choose to use the volatile keyword, barriers, or
> +inline assembly to guarantee some degree of immediacy for atomic_read()
> +and atomic_set().  This is not uniformly guaranteed, and may change in
> +the future, so all users of atomic_t should treat atomic_read() and
> +atomic_set() as simple C assignment statements that may be reordered or
> +optimized away entirely by the compiler or processor, and explicitly
> +invoke the appropriate compiler and/or memory barrier for each use case.
> +Failure to do so will result in code that may suddenly break when used with
> +different architectures or compiler optimizations, or even changes in
> +unrelated code which changes how the compiler optimizes the section
> +accessing atomic_t variables.
> +
> +*** YOU HAVE BEEN WARNED! ***
> +
>  Now, we move onto the actual atomic operation interfaces.
>
>  	void atomic_add(int i, atomic_t *v);
[PATCH] [-MM, FIX V4] e1000e: incorporate napi_struct changes from net-2.6.24.git
This incorporates the new napi_struct changes into e1000e. It includes
the bugfix for the ifdown hang from Krishna Kumar for e1000. Disabling
polling is no longer needed at init time, so the napi_disable() call is
removed from _probe(). This also fixes an endless polling loop where the
driver signalled "polling done" improperly back to the stack.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e1000e/e1000.h  |    2 ++
 drivers/net/e1000e/netdev.c |   40
 2 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index c57e35a..d2499bb 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -187,6 +187,8 @@ struct e1000_adapter {
 	struct e1000_ring *tx_ring /* One per active queue */
 						cacheline_aligned_in_smp;
 
+	struct napi_struct napi;
+
 	unsigned long tx_queue_len;
 	unsigned int restart_queue;
 	u32 txd_cmd;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 372da46..eeb40cc 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1149,12 +1149,12 @@ static irqreturn_t e1000_intr_msi(int irq, void *data)
 		mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}
 
-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1212,12 +1212,12 @@ static irqreturn_t e1000_intr(int irq, void *data)
 		mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}
 
-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1662,10 +1662,10 @@ set_itr_now:
  * e1000_clean - NAPI Rx polling callback
  * @adapter: board private structure
  **/
-static int e1000_clean(struct net_device *poll_dev, int *budget)
+static int e1000_clean(struct napi_struct *napi, int budget)
 {
-	struct e1000_adapter *adapter;
-	int work_to_do = min(*budget, poll_dev->quota);
+	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
+	struct net_device *poll_dev = adapter->netdev;
 	int tx_cleaned = 0, work_done = 0;
 
 	/* Must NOT use netdev_priv macro here. */
@@ -1684,25 +1684,19 @@ static int e1000_clean(struct net_device *poll_dev, int *budget)
 		spin_unlock(&adapter->tx_queue_lock);
 	}
 
-	adapter->clean_rx(adapter, &work_done, work_to_do);
-	*budget -= work_done;
-	poll_dev->quota -= work_done;
+	adapter->clean_rx(adapter, &work_done, budget);
 
 	/* If no Tx and not enough Rx work done, exit the polling mode */
-	if ((!tx_cleaned && (work_done == 0)) ||
+	if ((!tx_cleaned && (work_done < budget)) ||
 	    !netif_running(poll_dev)) {
 quit_polling:
 		if (adapter->itr_setting & 3)
 			e1000_set_itr(adapter);
-		netif_rx_complete(poll_dev);
-		if (test_bit(__E1000_DOWN, &adapter->state))
-			atomic_dec(&adapter->irq_sem);
-		else
-			e1000_irq_enable(adapter);
-		return 0;
+		netif_rx_complete(poll_dev, napi);
+		e1000_irq_enable(adapter);
 	}
 
-	return 1;
+	return work_done;
 }
 
 static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
@@ -2439,7 +2433,7 @@ int e1000e_up(struct e1000_adapter *adapter)
 
 	clear_bit(__E1000_DOWN, &adapter->state);
 
-	netif_poll_enable(adapter->netdev);
+	napi_enable(&adapter->napi);
 	e1000_irq_enable(adapter);
 
 	/* fire a link change interrupt to start the watchdog */
@@ -2472,7 +2466,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 	e1e_flush();
 	msleep(10);
 
-	netif_poll_disable(netdev);
+	napi_disable(&adapter->napi);
 	e1000_irq_disable(adapter);
 	del_timer_sync(&adapter->watchdog_timer);
@@ -2605,7 +2599,7 @@ static int e1000_open(struct net_device *netdev)
 	/* From here on the code is the same as e1000e_up() */
 	clear_bit(__E1000_DOWN, &adapter->state);
 
-	netif_poll_enable
[PATCH] Document non-semantics of atomic_read() and atomic_set()
From: Chris Snook <[EMAIL PROTECTED]> Unambiguously document the fact that atomic_read() and atomic_set() do not imply any ordering or memory access, and that callers are obligated to explicitly invoke barriers as needed to ensure that changes to atomic variables are visible in all contexts that need to see them. Signed-off-by: Chris Snook <[EMAIL PROTECTED]> --- a/Documentation/atomic_ops.txt 2007-07-08 19:32:17.0 -0400 +++ b/Documentation/atomic_ops.txt 2007-09-10 19:02:50.0 -0400 @@ -12,7 +12,11 @@ C integer type will fail. Something like the following should suffice: - typedef struct { volatile int counter; } atomic_t; + typedef struct { int counter; } atomic_t; + + Historically, counter has been declared volatile. This is now +discouraged. See Documentation/volatile-considered-harmful.txt for the +complete rationale. The first operations to implement for atomic_t's are the initializers and plain reads. @@ -42,6 +46,22 @@ which simply reads the current value of the counter. +*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! *** + +Some architectures may choose to use the volatile keyword, barriers, or +inline assembly to guarantee some degree of immediacy for atomic_read() +and atomic_set(). This is not uniformly guaranteed, and may change in +the future, so all users of atomic_t should treat atomic_read() and +atomic_set() as simple C assignment statements that may be reordered or +optimized away entirely by the compiler or processor, and explicitly +invoke the appropriate compiler and/or memory barrier for each use case. +Failure to do so will result in code that may suddenly break when used with +different architectures or compiler optimizations, or even changes in +unrelated code which changes how the compiler optimizes the section +accessing atomic_t variables. + +*** YOU HAVE BEEN WARNED! *** + Now, we move onto the actual atomic operation interfaces. 
void atomic_add(int i, atomic_t *v); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] rfkill: Add rfkill documentation
On Mon, 10 Sep 2007 19:56:03 +0200 Ivo van Doorn wrote: > Add a documentation file which contains > a short description about rfkill with some > notes about drivers and the userspace interface. Thanks. I have noted a few typo/editorial changes below. > Signed-off-by: Ivo van Doorn <[EMAIL PROTECTED]> > Acked-by: Dmitry Torokhov <[EMAIL PROTECTED]> > --- > Documentation/rfkill.txt | 88 > ++ > 1 files changed, 88 insertions(+), 0 deletions(-) > create mode 100644 Documentation/rfkill.txt > > diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt > new file mode 100644 > index 000..93c76fc > --- /dev/null > +++ b/Documentation/rfkill.txt > @@ -0,0 +1,88 @@ > +rfkill - RF switch subsystem support > + > + > +1 Implementation details > +2 Driver support > +3 Userspace support > + > +=== > +1: Implementation details > + > +The rfkill switch subsystem offers support for keys often found on laptops > +to enable wireless devices like WiFi and Bluetooth. > + > +This is done by providing the user 3 possibilities: > + - The rfkill system handles all events, userspace is not aware of events. > + - The rfkill system handles all events, userspace is informed about the > event. > + - The rfkill system does not handle events, userspace handles all events. I would s/,/;/ in the 3 lines above. > +The buttons to enable and disable the wireless radios are important in > +situations where the user is for example using his laptop on a location where > +wireless radios _must_ be disabled (e.g airplanes). > +Because of this requirement, userspace support for the keys should not be > +made mandatory. Because userspace might want to perform some additional > smarter > +tasks when the key is pressed, rfkill still provides userspace the > possibility > +to take over the task to handle the key events. 
> + > +The system inside the kernel has been split into 2 seperate sections: separate > + 1 - RFKILL > + 2 - RFKILL_INPUT > + > +The first option enables rfkill support and will make sure userspace will > +be notified of any events through the input device. It also creates several > +sysfs entries which can be used by userspace. See section "Userspace > support". > + > +The second option provides a rfkill input handler. This handler will an > +listen to all rfkill key events and will toggle the radio accordingly, end above with ; or . If '.', s/with/With/ on next line. > +with this option enabled userspace could either do nothing or simply > +perform monitoring tasks. > + > + > +2: Driver support > + > +Drivers who wish to build in rfkill subsystem support should Drivers that But, drivers can't/don't wish, so it would be better to say something like: To build a driver with rfkill subsystem support, the driver should depend on the Kconfig symbol RFKILL; it should _not_ depend on RKFILL_INPUT. > +make sure their driver depends of the Kconfig option RFKILL, it should > +_not_ depend on RFKILL_INPUT. > + > +Unless key events trigger a interrupt to which the driver listens, polling an interrupt > +will be required to determine the key state changes. For this the input > +layer providers the input-polldev handler. > + > +A driver should implement a few steps to correctly make use of the > +rfkill subsystem. First for non-polling drivers: > + > + - rfkill_allocate() > + - input_allocate_device() > + - rfkill_register() > + - input_register_device() > + > +For polling drivers: > + > + - rfkill_allocate() > + - input_allocate_polled_device() > + - rfkill_register() > + - input_register_polled_device() > + > +When a key event has been detected, the correct event should be > +send over the input device which has been registered by the driver. 
sent > + > + > +3: Userspace support > + > +For each key a input device will be created which will send out the correct an > +key event when the rfkill key has been pressed. > + > +The following sysfs entries will be created: > + > + name: Name assigned by driver to this key (interface or driver name). > + type: Name of the key type ("wlan", "bluetooth", etc). > + state: Current state of the key. 1: On, 0: Off. > + claim: 1: Userspace handles events, 0: Kernel handles events > + > +Both the "state" and "claim" entries are also writable. For the "state" entry > +this means that when 1 or 0 is written all radios will be toggled > accordingly. will be written even if they are already in that state? > +For the "claim" entry writing 1 to it will mean that the kernel will no > longer s/will mean/means/ s/will no longer handle/no longer handles/ > +hand
[PATCH] include listenq max backlog in /proc/net/tcp and include in tcp_info
Return some useful information such as the maximum listen backlog and the current listen backlog in the tcp_info structure and have that match what one can see in /proc/net/tcp and /proc/net/tcp6. Signed-off-by: Rick Jones <[EMAIL PROTECTED]> --- diff -r bdcdd0e1ee9d Documentation/networking/proc_net_tcp.txt --- a/Documentation/networking/proc_net_tcp.txt Sat Sep 01 07:00:31 2007 + +++ b/Documentation/networking/proc_net_tcp.txt Mon Sep 10 16:09:46 2007 -0700 @@ -20,8 +20,8 @@ up into 3 parts because of the length of || | | |--> number of unrecovered RTO timeouts || | |--> number of jiffies until timer expires || |> timer_active (see below) - ||--> receive-queue - |---> transmit-queue + ||--> receive-queue or connection backlog + |---> transmit-queue or connection limit 10000 54165785 4 cd1e6040 25 4 27 3 -1 | || || | | | | |--> slow start size threshold, diff -r bdcdd0e1ee9d net/ipv4/tcp.c --- a/net/ipv4/tcp.cSat Sep 01 07:00:31 2007 + +++ b/net/ipv4/tcp.cMon Sep 10 16:09:46 2007 -0700 @@ -2030,8 +2030,14 @@ void tcp_get_info(struct sock *sk, struc info->tcpi_snd_mss = tp->mss_cache; info->tcpi_rcv_mss = icsk->icsk_ack.rcv_mss; - info->tcpi_unacked = tp->packets_out; - info->tcpi_sacked = tp->sacked_out; + if (sk->sk_state == TCP_LISTEN) { + info->tcpi_unacked = sk->sk_ack_backlog; + info->tcpi_sacked = sk->sk_max_ack_backlog; + } + else { + info->tcpi_unacked = tp->packets_out; + info->tcpi_sacked = tp->sacked_out; + } info->tcpi_lost = tp->lost_out; info->tcpi_retrans = tp->retrans_out; info->tcpi_fackets = tp->fackets_out; diff -r bdcdd0e1ee9d net/ipv4/tcp_ipv4.c --- a/net/ipv4/tcp_ipv4.c Sat Sep 01 07:00:31 2007 + +++ b/net/ipv4/tcp_ipv4.c Mon Sep 10 16:09:46 2007 -0700 @@ -2320,7 +2320,8 @@ static void get_tcp4_sock(struct sock *s sprintf(tmpbuf, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX " "%08X %5d %8d %lu %d %p %u %u %u %u %d", i, src, srcp, dest, destp, sk->sk_state, - tp->write_seq - tp->snd_una, + sk->sk_state == TCP_LISTEN ? 
sk->sk_max_ack_backlog : +(tp->write_seq - tp->snd_una), sk->sk_state == TCP_LISTEN ? sk->sk_ack_backlog : (tp->rcv_nxt - tp->copied_seq), timer_active, diff -r bdcdd0e1ee9d net/ipv6/tcp_ipv6.c --- a/net/ipv6/tcp_ipv6.c Sat Sep 01 07:00:31 2007 + +++ b/net/ipv6/tcp_ipv6.c Mon Sep 10 16:09:46 2007 -0700 @@ -2005,8 +2005,10 @@ static void get_tcp6_sock(struct seq_fil dest->s6_addr32[0], dest->s6_addr32[1], dest->s6_addr32[2], dest->s6_addr32[3], destp, sp->sk_state, - tp->write_seq-tp->snd_una, - (sp->sk_state == TCP_LISTEN) ? sp->sk_ack_backlog : (tp->rcv_nxt - tp->copied_seq), + (sp->sk_state == TCP_LISTEN) ? sp->sk_max_ack_backlog: + tp->write_seq-tp->snd_una, + (sp->sk_state == TCP_LISTEN) ? sp->sk_ack_backlog : + (tp->rcv_nxt - tp->copied_seq), timer_active, jiffies_to_clock_t(timer_expires - jiffies), icsk->icsk_retransmits, - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: why does tcp_v[46]_conn_request not inc MIB stats
Sridhar Samudrala wrote: On Mon, 2007-09-10 at 11:42 -0700, Rick Jones wrote: I've been digging around to see about inducing /proc/net/tcp to show some "interesting" things for listen sockets (eg backlog depth, its max, and dropped connection requests). backlog depth(acceptq length) for a listening socket should be available with the newer kernels. The following patch exports this value via the rx_queue field in /proc/net/tcp. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47da8ee681d04e68ca1b1812c10e28162150d453 Yep, I see it in the 2.6.23-rc5 tree I'm using. At the risk of yet another "merely practice" patching exercise I'm putting together a patch which will also return that in a TCP_INFO and add the max backlog (to the tx_queue field). While doing that, I've noticed that Documentation/networking/proc-net-tcp.txt (?) talks about a tcp6_get_info which I cannot find anywhere in the tree. I'm not sure if that simply means that tcp_get_info is what is used for a "tcp6" connection and the text can simply be removed from the documentation, or if it is called something else. While there I've noticed that both tcp_v[46]_syn_recv_sock and tcp_v[46]_conn_request both check that the listen queue is full, but only tcp_v[46]_syn_recv_sock increments some mib stats for dropped connection requests. Is that deliberate, or is that a hole in the stats? looks like it is a hole in the stats. I think we should increment LISTENOVERFLOWS or LISTENDROPS in tcp_v[46]_conn_request too if the SYN is dropped. OK. Now, can we get a third to Sridhar's second?-) rick jones struggling through the maze of twisty routines for connection establishment
Re: [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
On Mon, Sep 10, 2007 at 03:46:30PM -0400, Vlad Yasevich wrote: > Since the sctp_sockaddr_entry is now RCU enabled as part of > the patch to synchronize sctp_localaddr_list, it makes sense to > change all handling of these entries to RCU. This includes the > sctp_bind_addrs structure and it's list of bound addresses. > > This list is currently protected by an external rw_lock and that > looks like an overkill. There are only 2 writers to the list: > bind()/bindx() calls, and BH processing of ASCONF-ACK chunks. > These are already seriealized via the socket lock, so they will > not step on each other. These are also relatively rare, so we > should be good with RCU. > > The readers are varied and they are easily converted to RCU. Again, good start -- similar questions as for the other patch in this series. Also a few places where it looks like you are letting a pointer to an RCU-protected data structure slip out of rcu_read_lock() protection, and a case of mixing rcu_read_lock() and rcu_read_lock_bh() within the same RCU-protected data structure. Thanx, Paul > Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]> > --- > include/net/sctp/structs.h |3 - > net/sctp/associola.c | 14 +-- > net/sctp/bind_addr.c | 59 ++-- > net/sctp/endpointola.c | 26 - > net/sctp/ipv6.c| 12 ++--- > net/sctp/protocol.c| 25 +--- > net/sctp/sm_make_chunk.c | 17 +++- > net/sctp/socket.c | 92 > ++-- > 8 files changed, 97 insertions(+), 151 deletions(-) > > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h > index 2591c49..1d46f7d 100644 > --- a/include/net/sctp/structs.h > +++ b/include/net/sctp/structs.h > @@ -1222,9 +1222,6 @@ struct sctp_ep_common { >* bind_addr.address_list is our set of local IP addresses. >*/ > struct sctp_bind_addr bind_addr; > - > - /* Protection during address list comparisons. 
*/ > - rwlock_t addr_lock; > }; > > > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > index 2ad1caf..9bad8ba 100644 > --- a/net/sctp/associola.c > +++ b/net/sctp/associola.c > @@ -99,7 +99,6 @@ static struct sctp_association > *sctp_association_init(struct sctp_association *a > > /* Initialize the bind addr area. */ > sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port); > - rwlock_init(&asoc->base.addr_lock); > > asoc->state = SCTP_STATE_CLOSED; > > @@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct > sctp_association *asoc, > { > struct sctp_transport *transport; > > - sctp_read_lock(&asoc->base.addr_lock); > - > if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) && > (htons(asoc->peer.port) == paddr->v4.sin_port)) { > transport = sctp_assoc_lookup_paddr(asoc, paddr); > @@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct > sctp_association *asoc, > transport = NULL; > > out: > - sctp_read_unlock(&asoc->base.addr_lock); > return transport; > } > > @@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct > sctp_association *asoc, > int sctp_assoc_lookup_laddr(struct sctp_association *asoc, > const union sctp_addr *laddr) > { > - int found; > + int found = 0; > > - sctp_read_lock(&asoc->base.addr_lock); > if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) && > sctp_bind_addr_match(&asoc->base.bind_addr, laddr, > - sctp_sk(asoc->base.sk))) { > + sctp_sk(asoc->base.sk))) > found = 1; > - goto out; > - } > > - found = 0; > -out: > - sctp_read_unlock(&asoc->base.addr_lock); > return found; > } > > diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c > index 7fc369f..9c7db1f 100644 > --- a/net/sctp/bind_addr.c > +++ b/net/sctp/bind_addr.c > @@ -167,7 +167,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union > sctp_addr *new, > > INIT_LIST_HEAD(&addr->list); > INIT_RCU_HEAD(&addr->rcu); > - list_add_tail(&addr->list, &bp->address_list); > + > + rcu_read_lock(); > 
+ list_add_tail_rcu(&addr->list, &bp->address_list); > + rcu_read_unlock(); Given the original code, we presumably hold the update-side lock. If so, the rcu_read_lock() and rcu_read_unlock() are (harmlessly) redundant. > SCTP_DBG_OBJCNT_INC(addr); > > return 0; > @@ -178,20 +181,23 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union > sctp_addr *new, > */ > int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr) > { > - struct list_head *pos, *temp; > - struct sctp_sockaddr_entry *addr; > + struct sctp_sockaddr_entry *addr, *temp; > > - list_for_each_safe(pos, tem
Re: [PATCH] ipconfig.c: De-clutter IP configuration report
On Sep 10 2007 13:09, Maciej W. Rozycki wrote: > The new code builds fine; no semantic changes. > > Please apply, > > Maciej > >patch-mips-2.6.23-rc5-20070904-ipconfig-printk-2 >diff -up --recursive --new-file >linux-mips-2.6.23-rc5-20070904.macro/net/ipv4/ipconfig.c >linux-mips-2.6.23-rc5-20070904/net/ipv4/ipconfig.c >--- linux-mips-2.6.23-rc5-20070904.macro/net/ipv4/ipconfig.c 2007-09-04 >04:56:22.0 + >+++ linux-mips-2.6.23-rc5-20070904/net/ipv4/ipconfig.c 2007-09-10 >11:53:19.0 + >@@ -1364,17 +1364,17 @@ static int __init ip_auto_config(void) > /* >* Clue in the operator. >*/ >- printk("IP-Config: Complete:"); >- printk("\n device=%s", ic_dev->name); >- printk(", addr=%u.%u.%u.%u", NIPQUAD(ic_myaddr)); >- printk(", mask=%u.%u.%u.%u", NIPQUAD(ic_netmask)); >- printk(", gw=%u.%u.%u.%u", NIPQUAD(ic_gateway)); >- printk(",\n host=%s, domain=%s, nis-domain=%s", >- utsname()->nodename, ic_domain, utsname()->domainname); >- printk(",\n bootserver=%u.%u.%u.%u", NIPQUAD(ic_servaddr)); >- printk(", rootserver=%u.%u.%u.%u", NIPQUAD(root_server_addr)); >- printk(", rootpath=%s", root_server_path); >- printk("\n"); >+ pr_info("IP-Config: Complete:\n"); >+ pr_info(" device=%s, addr=%u.%u.%u.%u, " >+ "mask=%u.%u.%u.%u, gw=%u.%u.%u.%u,\n", >+ ic_dev->name, NIPQUAD(ic_myaddr), >+ NIPQUAD(ic_netmask), NIPQUAD(ic_gateway)); >+ pr_info(" host=%s, domain=%s, nis-domain=%s,\n", >+ utsname()->nodename, ic_domain, utsname()->domainname); >+ pr_info(" bootserver=%u.%u.%u.%u, " >+ "rootserver=%u.%u.%u.%u, rootpath=%s\n", >+ NIPQUAD(ic_servaddr), >+ NIPQUAD(root_server_addr), root_server_path); > #endif /* !SILENT */ > > return 0; It should really be done in userspace. And ripped from the kernel. Jan -- - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: why does tcp_v[46]_conn_request not inc MIB stats
On Mon, 2007-09-10 at 11:42 -0700, Rick Jones wrote: > I've been digging around to see about inducing /proc/net/tcp to show > some "interesting" things for listen sockets (eg backlog depth, its max, > and dropped connection requests). backlog depth(acceptq length) for a listening socket should be available with the newer kernels. The following patch exports this value via the rx_queue field in /proc/net/tcp. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47da8ee681d04e68ca1b1812c10e28162150d453 > While there I've noticed that both > tcp_v[46]_syn_recv_sock and tcp_v[46]_conn_request both check that the > listen queue is full, but only tcp_v[46]_syn_recv_sock increments some > mib stats for dropped connection requests. > > Is that deliberate, or is that a hole in the stats? looks like it is a hole in the stats. I think we should increment LISTENOVERFLOWS or LISTENDROPS in tcp_v[46]_conn_request too if the SYN is dropped. Thanks Sridhar
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Sep 10, 2007 at 02:36:26PM -0700, Christoph Lameter wrote: > On Mon, 10 Sep 2007, Paul E. McKenney wrote: > > > The one exception to this being the case where process-level code is > > communicating to an interrupt handler running on that same CPU -- on > > all CPUs that I am aware of, a given CPU always sees its own writes > > in order. > > Yes but that is due to the code path effectively continuing in the > interrupt handler. The cpu makes sure that op codes being executed always > see memory in a consistent way. The basic ordering problem with out of > order writes is therefore coming from other processors concurrently > executing code and holding variables in registers that are modified > elsewhere. The only solution that I know of are one or the other form of > barrier. So we are agreed then -- volatile accesses may be of some assistance when interacting with interrupt handlers running on the same CPU (presumably when using per-CPU variables), but are generally useless when sharing variables among CPUs. Correct? Thanx, Paul
Re: [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
On Mon, Sep 10, 2007 at 03:46:29PM -0400, Vlad Yasevich wrote: > sctp_localaddr_list is modified dynamically via NETDEV_UP > and NETDEV_DOWN events, but there is not synchronization > between writer (even handler) and readers. As a result, > the readers can access an entry that has been freed and > crash the sytem. Good start, but few questions interspersed below... Thanx, Paul > Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]> > --- > include/net/sctp/sctp.h|1 + > include/net/sctp/structs.h |2 + > net/sctp/bind_addr.c |2 + > net/sctp/ipv6.c| 33 > net/sctp/protocol.c| 50 ++- > net/sctp/socket.c | 38 ++--- > 6 files changed, 88 insertions(+), 38 deletions(-) > > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > index d529045..c9cc00c 100644 > --- a/include/net/sctp/sctp.h > +++ b/include/net/sctp/sctp.h > @@ -123,6 +123,7 @@ > * sctp/protocol.c > */ > extern struct sock *sctp_get_ctl_sock(void); > +extern void sctp_local_addr_free(struct rcu_head *head); > extern int sctp_copy_local_addr_list(struct sctp_bind_addr *, >sctp_scope_t, gfp_t gfp, >int flags); > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h > index c0d5848..2591c49 100644 > --- a/include/net/sctp/structs.h > +++ b/include/net/sctp/structs.h > @@ -737,8 +737,10 @@ const union sctp_addr *sctp_source(const struct > sctp_chunk *chunk); > /* This is a structure for holding either an IPv6 or an IPv4 address. 
*/ > struct sctp_sockaddr_entry { > struct list_head list; > + struct rcu_head rcu; > union sctp_addr a; > __u8 use_as_src; > + __u8 valid; > }; > > typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association > *); > diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c > index fdb287a..7fc369f 100644 > --- a/net/sctp/bind_addr.c > +++ b/net/sctp/bind_addr.c > @@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union > sctp_addr *new, > addr->a.v4.sin_port = htons(bp->port); > > addr->use_as_src = use_as_src; > + addr->valid = 1; > > INIT_LIST_HEAD(&addr->list); > + INIT_RCU_HEAD(&addr->rcu); > list_add_tail(&addr->list, &bp->address_list); > SCTP_DBG_OBJCNT_INC(addr); > > diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c > index f8aa23d..fc2e4e2 100644 > --- a/net/sctp/ipv6.c > +++ b/net/sctp/ipv6.c > @@ -77,13 +77,18 @@ > > #include > > -/* Event handler for inet6 address addition/deletion events. */ > +/* Event handler for inet6 address addition/deletion events. > + * This even is part of the atomic notifier call chain > + * and thus happens atomically and can NOT sleep. As a result > + * we can't and really don't need to add any locks to guard the > + * RCU. 
> + */ > static int sctp_inet6addr_event(struct notifier_block *this, unsigned long > ev, > void *ptr) > { > struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr; > - struct sctp_sockaddr_entry *addr; > - struct list_head *pos, *temp; > + struct sctp_sockaddr_entry *addr = NULL; > + struct sctp_sockaddr_entry *temp; > > switch (ev) { > case NETDEV_UP: > @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block > *this, unsigned long ev, > memcpy(&addr->a.v6.sin6_addr, &ifa->addr, >sizeof(struct in6_addr)); > addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex; > - list_add_tail(&addr->list, &sctp_local_addr_list); > + addr->valid = 1; > + rcu_read_lock(); > + list_add_tail_rcu(&addr->list, &sctp_local_addr_list); > + rcu_read_unlock(); If we are under the protection of the update-side mutex, the rcu_read_lock() and rcu_read_unlock() are (harmlessly) redundant. If we are not under the protection of some mutex, what prevents concurrent list_add_tail_rcu() calls from messing up the sctp_sockaddr_entry list? > } > break; > case NETDEV_DOWN: > - list_for_each_safe(pos, temp, &sctp_local_addr_list) { > - addr = list_entry(pos, struct sctp_sockaddr_entry, > list); > - if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) > { > - list_del(pos); > - kfree(addr); > + rcu_read_lock(); > + list_for_each_entry_safe(addr, temp, > + &sctp_local_addr_list, list) { > + if (ipv6_addr_equal(&addr->a.v6.sin6_addr, > + &ifa->addr)) { >
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007, Paul E. McKenney wrote: > The one exception to this being the case where process-level code is > communicating to an interrupt handler running on that same CPU -- on > all CPUs that I am aware of, a given CPU always sees its own writes > in order. Yes but that is due to the code path effectively continuing in the interrupt handler. The cpu makes sure that op codes being executed always see memory in a consistent way. The basic ordering problem with out of order writes is therefore coming from other processors concurrently executing code and holding variables in registers that are modified elsewhere. The only solution that I know of are one or the other form of barrier.
Re: ne driver crashes when unloaded in 2.6.22.6
On Sun, 2007-09-09 at 23:12 +0100, Chris Rankin wrote: > --- Dan Williams <[EMAIL PROTECTED]> wrote: > > Offhand question, does your ne2000 card support carrier detection? > > Err... there is a /sys/class/net/eth0/carrier entry (I think - not in front > of that machine right > now). IIRC it said "1" when I read it. Does it read '0' when you unplug the cable? Dan > Cheers, > Chris
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Sep 10, 2007 at 11:59:29AM -0700, Christoph Lameter wrote: > On Fri, 17 Aug 2007, Segher Boessenkool wrote: > > > "volatile" has nothing to do with reordering. atomic_dec() writes > > to memory, so it _does_ have "volatile semantics", implicitly, as > > long as the compiler cannot optimise the atomic variable away > > completely -- any store counts as a side effect. > > Stores can be reordered. Only x86 has (mostly) implicit write ordering. So > no atomic_dec has no volatile semantics and may be reordered on a variety > of processors. Writes to memory may not follow code order on several > processors. The one exception to this being the case where process-level code is communicating to an interrupt handler running on that same CPU -- on all CPUs that I am aware of, a given CPU always sees its own writes in order. Thanx, Paul
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sep 10, 2007, at 12:46:33, Denys Vlasenko wrote: My point is that people are confused as to what atomic_read() exactly means, and this is bad. Same for cpu_relax(). First one says "read", and second one doesn't say "barrier". Q&A: Q: When is it OK to use atomic_read()? A: You are asking the question, so never. Q: But I need to check the value of the atomic at this point in time... A: Your code is buggy if it needs to do that on an atomic_t for anything other than debugging or optimization. Use either atomic_*_return() or a lock and some normal integers. Q: "So why can't the atomic_read DTRT magically?" A: Because "the right thing" depends on the situation and is usually best done with something other than atomic_t. If somebody can post some non-buggy code which is correctly using atomic_read() *and* depends on the compiler generating extra nonsensical loads due to "volatile" then the issue *might* be reconsidered. This also includes samples of code which uses atomic_read() and needs memory barriers (so that we can fix the buggy code, not so we can change atomic_read()). So far the only code samples anybody has posted are buggy regardless of whether or not the value and/or accessors are flagged "volatile" or not. And hey, maybe the volatile ops *should* be implemented in inline ASM for future-proofness, but that's a separate issue. Cheers, Kyle Moffett
[RFC PATCH 0/2] Add RCU locking to SCTP address management
Hi All This is my first attempt to add RCU synchronization to pieces of SCTP and I want to make sure I do this right. The RCU docs are somewhat outdated, and the calling conventions differ between subsystems, so I am using what I've been able to find. A bit of a background... The whole problem started with a panic that led me to a NULL deref while walking a global list of addresses that SCTP maintains. That list is modified during the NETDEV_UP and NETDEV_DOWN notifier processing and all readers are run in user context. It looks like the list was modified while a user app was walking it and we crashed. Easy enough to fix by adding locking. However, some notifiers (ipv6 addrconf) run in atomic context and it says that they can't sleep. (Q1. Why are ipv6 notifiers atomic, while IPv4 notifiers are blocking?) So, I decided to add RCU locking around that list. These events are already synchronized, so no additional locking on the writer side is needed, so it should make for a nice RCU conversion. While doing this conversion, I saw that the same structures are used to maintain a list of bound addresses. This seemed like another opportunity to use RCU. In this case, the readers are widespread, but there are only 2 writers: BH processing of specific chunks, and bind() calls. Again the writers are already synchronized on the socket lock, so they will keep out of each other's way. Readers are widespread, but rcu takes care of that nicely. So, can folks please take a look and make sure I didn't mess up the rcu conventions. Thanks a lot -vlad
[RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
sctp_localaddr_list is modified dynamically via NETDEV_UP and NETDEV_DOWN events, but there is not synchronization between writer (even handler) and readers. As a result, the readers can access an entry that has been freed and crash the sytem. Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]> --- include/net/sctp/sctp.h|1 + include/net/sctp/structs.h |2 + net/sctp/bind_addr.c |2 + net/sctp/ipv6.c| 33 net/sctp/protocol.c| 50 ++- net/sctp/socket.c | 38 ++--- 6 files changed, 88 insertions(+), 38 deletions(-) diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index d529045..c9cc00c 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -123,6 +123,7 @@ * sctp/protocol.c */ extern struct sock *sctp_get_ctl_sock(void); +extern void sctp_local_addr_free(struct rcu_head *head); extern int sctp_copy_local_addr_list(struct sctp_bind_addr *, sctp_scope_t, gfp_t gfp, int flags); diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index c0d5848..2591c49 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -737,8 +737,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk); /* This is a structure for holding either an IPv6 or an IPv4 address. 
*/ struct sctp_sockaddr_entry { struct list_head list; + struct rcu_head rcu; union sctp_addr a; __u8 use_as_src; + __u8 valid; }; typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *); diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c index fdb287a..7fc369f 100644 --- a/net/sctp/bind_addr.c +++ b/net/sctp/bind_addr.c @@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new, addr->a.v4.sin_port = htons(bp->port); addr->use_as_src = use_as_src; + addr->valid = 1; INIT_LIST_HEAD(&addr->list); + INIT_RCU_HEAD(&addr->rcu); list_add_tail(&addr->list, &bp->address_list); SCTP_DBG_OBJCNT_INC(addr); diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index f8aa23d..fc2e4e2 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -77,13 +77,18 @@ #include -/* Event handler for inet6 address addition/deletion events. */ +/* Event handler for inet6 address addition/deletion events. + * This event is part of the atomic notifier call chain + * and thus happens atomically and can NOT sleep. As a result + * we can't and really don't need to add any locks to guard the + * RCU.
+ */ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev, void *ptr) { struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr; - struct sctp_sockaddr_entry *addr; - struct list_head *pos, *temp; + struct sctp_sockaddr_entry *addr = NULL; + struct sctp_sockaddr_entry *temp; switch (ev) { case NETDEV_UP: @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev, memcpy(&addr->a.v6.sin6_addr, &ifa->addr, sizeof(struct in6_addr)); addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex; - list_add_tail(&addr->list, &sctp_local_addr_list); + addr->valid = 1; + rcu_read_lock(); + list_add_tail_rcu(&addr->list, &sctp_local_addr_list); + rcu_read_unlock(); } break; case NETDEV_DOWN: - list_for_each_safe(pos, temp, &sctp_local_addr_list) { - addr = list_entry(pos, struct sctp_sockaddr_entry, list); - if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) { - list_del(pos); - kfree(addr); + rcu_read_lock(); + list_for_each_entry_safe(addr, temp, + &sctp_local_addr_list, list) { + if (ipv6_addr_equal(&addr->a.v6.sin6_addr, +&ifa->addr)) { + addr->valid = 0; + list_del_rcu(&addr->list); break; } } - + rcu_read_unlock(); + if (addr && !addr->valid) + call_rcu(&addr->rcu, sctp_local_addr_free); break; } @@ -368,6 +380,7 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist, addr->a.v6.sin6_addr = ifp->addr; addr->a.v6.sin6_scope_id = dev->ifindex;
[RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
Since the sctp_sockaddr_entry is now RCU enabled as part of the patch to synchronize sctp_localaddr_list, it makes sense to change all handling of these entries to RCU. This includes the sctp_bind_addrs structure and its list of bound addresses. This list is currently protected by an external rw_lock and that looks like overkill. There are only 2 writers to the list: bind()/bindx() calls, and BH processing of ASCONF-ACK chunks. These are already serialized via the socket lock, so they will not step on each other. These are also relatively rare, so we should be good with RCU. The readers are varied and they are easily converted to RCU. Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]> --- include/net/sctp/structs.h |3 - net/sctp/associola.c | 14 +-- net/sctp/bind_addr.c | 59 ++-- net/sctp/endpointola.c | 26 - net/sctp/ipv6.c| 12 ++--- net/sctp/protocol.c| 25 +--- net/sctp/sm_make_chunk.c | 17 +++- net/sctp/socket.c | 92 ++-- 8 files changed, 97 insertions(+), 151 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 2591c49..1d46f7d 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1222,9 +1222,6 @@ struct sctp_ep_common { * bind_addr.address_list is our set of local IP addresses. */ struct sctp_bind_addr bind_addr; - - /* Protection during address list comparisons. */ - rwlock_t addr_lock; }; diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 2ad1caf..9bad8ba 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a /* Initialize the bind addr area.
*/ sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port); - rwlock_init(&asoc->base.addr_lock); asoc->state = SCTP_STATE_CLOSED; @@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc, { struct sctp_transport *transport; - sctp_read_lock(&asoc->base.addr_lock); - if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) && (htons(asoc->peer.port) == paddr->v4.sin_port)) { transport = sctp_assoc_lookup_paddr(asoc, paddr); @@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc, transport = NULL; out: - sctp_read_unlock(&asoc->base.addr_lock); return transport; } @@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc, int sctp_assoc_lookup_laddr(struct sctp_association *asoc, const union sctp_addr *laddr) { - int found; + int found = 0; - sctp_read_lock(&asoc->base.addr_lock); if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) && sctp_bind_addr_match(&asoc->base.bind_addr, laddr, -sctp_sk(asoc->base.sk))) { +sctp_sk(asoc->base.sk))) found = 1; - goto out; - } - found = 0; -out: - sctp_read_unlock(&asoc->base.addr_lock); return found; } diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c index 7fc369f..9c7db1f 100644 --- a/net/sctp/bind_addr.c +++ b/net/sctp/bind_addr.c @@ -167,7 +167,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new, INIT_LIST_HEAD(&addr->list); INIT_RCU_HEAD(&addr->rcu); - list_add_tail(&addr->list, &bp->address_list); + + rcu_read_lock(); + list_add_tail_rcu(&addr->list, &bp->address_list); + rcu_read_unlock(); SCTP_DBG_OBJCNT_INC(addr); return 0; @@ -178,20 +181,23 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new, */ int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr) { - struct list_head *pos, *temp; - struct sctp_sockaddr_entry *addr; + struct sctp_sockaddr_entry *addr, *temp; - list_for_each_safe(pos, temp, 
&bp->address_list) { - addr = list_entry(pos, struct sctp_sockaddr_entry, list); + rcu_read_lock_bh(); + list_for_each_entry_safe(addr, temp, &bp->address_list, list) { if (sctp_cmp_addr_exact(&addr->a, del_addr)) { /* Found the exact match. */ - list_del(pos); - kfree(addr); - SCTP_DBG_OBJCNT_DEC(addr); - - return 0; + addr->valid = 0; + list_del_rcu(&addr->list); + break; } } + rcu_read_unlock_bh(); + + if (addr && !addr->valid) { + call_rcu_bh(&addr->rcu, sctp
Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.
"Serge E. Hallyn" <[EMAIL PROTECTED]> writes: >> >> +static struct net *get_net_ns_by_pid(pid_t pid) >> +{ >> +struct task_struct *tsk; >> +struct net *net; >> + >> +/* Lookup the network namespace */ >> +net = ERR_PTR(-ESRCH); >> +rcu_read_lock(); >> +tsk = find_task_by_pid(pid); >> +if (tsk) { >> +task_lock(tsk); >> +if (tsk->nsproxy) >> +net = get_net(tsk->nsproxy->net_ns); >> +task_unlock(tsk); > > Thinking... Ok, I'm not sure this is 100% safe in the target tree, but > the long-term correct way probably isn't yet implemented in the net- > tree. Eventually you will want to: > > net_ns = NULL; > rcu_read_lock(); > tsk = find_task_by_pid(); /* or _pidns equiv? */ > nsproxy = task_nsproxy(tsk); > if (nsproxy) > net_ns = get_net(nsproxy->net_ns); > rcu_read_unlock; > > What you have here is probably unsafe if tsk is the last task pointing > to it's nsproxy and it does an unshare, bc unshare isn't protected by > task_lock, and you're not rcu_dereferencing tsk->nsproxy (which > task_nsproxy does). At one point we floated a patch to reuse the same > nsproxy in that case which would prevent you having to worry about it, > but that isn't being done in -mm now so i doubt it's in -net. That change isn't merged upstream yet, so it isn't in David's net-2.6.24 tree. Currently task->nsproxy is protected but task_lock(current). So the code is fine. I am aware that removing the task_lock(current) for the setting of current->nsproxy is currently in the works, and I have planned to revisit this later when all of these pieces come together. For now the code is fine. If need be we can drop this patch to remove the potential merge conflict. But I figured it was useful for this part of the user space interface to be available for review. Eric - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.
Quoting Eric W. Biederman ([EMAIL PROTECTED]): > > The simplest thing to implement is moving network devices between > namespaces. However with the same attribute IFLA_NET_NS_PID we can > easily implement creating devices in the destination network > namespace as well. However that is a little bit trickier so this > patch sticks to what is simple and easy. > > A pid is used to identify a process that happens to be a member > of the network namespace we want to move the network device to. > > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> > --- > include/linux/if_link.h |1 + > net/core/rtnetlink.c| 35 +++ > 2 files changed, 36 insertions(+), 0 deletions(-) > > diff --git a/include/linux/if_link.h b/include/linux/if_link.h > index 422084d..84c3492 100644 > --- a/include/linux/if_link.h > +++ b/include/linux/if_link.h > @@ -78,6 +78,7 @@ enum > IFLA_LINKMODE, > IFLA_LINKINFO, > #define IFLA_LINKINFO IFLA_LINKINFO > + IFLA_NET_NS_PID, > __IFLA_MAX > }; > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index 44f91bb..1b9c32d 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -35,6 +35,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -727,6 +728,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = { > [IFLA_WEIGHT] = { .type = NLA_U32 }, > [IFLA_OPERSTATE]= { .type = NLA_U8 }, > [IFLA_LINKMODE] = { .type = NLA_U8 }, > + [IFLA_NET_NS_PID] = { .type = NLA_U32 }, > }; > > static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = { > @@ -734,12 +736,45 @@ static const struct nla_policy > ifla_info_policy[IFLA_INFO_MAX+1] = { > [IFLA_INFO_DATA]= { .type = NLA_NESTED }, > }; > > +static struct net *get_net_ns_by_pid(pid_t pid) > +{ > + struct task_struct *tsk; > + struct net *net; > + > + /* Lookup the network namespace */ > + net = ERR_PTR(-ESRCH); > + rcu_read_lock(); > + tsk = find_task_by_pid(pid); > + if (tsk) { > + task_lock(tsk); > + if (tsk->nsproxy) > + net = get_net(tsk->nsproxy->net_ns); 
> + task_unlock(tsk); Thinking... Ok, I'm not sure this is 100% safe in the target tree, but the long-term correct way probably isn't yet implemented in the net tree. Eventually you will want to: net_ns = NULL; rcu_read_lock(); tsk = find_task_by_pid(); /* or _pidns equiv? */ nsproxy = task_nsproxy(tsk); if (nsproxy) net_ns = get_net(nsproxy->net_ns); rcu_read_unlock(); What you have here is probably unsafe if tsk is the last task pointing to its nsproxy and it does an unshare, bc unshare isn't protected by task_lock, and you're not rcu_dereferencing tsk->nsproxy (which task_nsproxy does). At one point we floated a patch to reuse the same nsproxy in that case which would prevent you having to worry about it, but that isn't being done in -mm now so i doubt it's in -net. > + } > + rcu_read_unlock(); > + return net; > +} > + > static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm, > struct nlattr **tb, char *ifname, int modified) > { > int send_addr_notify = 0; > int err; > > + if (tb[IFLA_NET_NS_PID]) { > + struct net *net; > + net = get_net_ns_by_pid(nla_get_u32(tb[IFLA_NET_NS_PID])); > + if (IS_ERR(net)) { > + err = PTR_ERR(net); > + goto errout; > + } > + err = dev_change_net_namespace(dev, net, ifname); > + put_net(net); > + if (err) > + goto errout; > + modified = 1; > + } > + > if (tb[IFLA_MAP]) { > struct rtnl_link_ifmap *u_map; > struct ifmap k_map; > -- > 1.5.3.rc6.17.g1911 > > ___ > Containers mailing list > [EMAIL PROTECTED] > https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007, Linus Torvalds wrote: > The fact is, "volatile" *only* makes things worse. It generates worse > code, and never fixes any real bugs. This is a *fact*. Yes, let's just drop the volatiles now! We need a patch that gets rid of them. Volunteers?
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Segher Boessenkool wrote: > "volatile" has nothing to do with reordering. atomic_dec() writes > to memory, so it _does_ have "volatile semantics", implicitly, as > long as the compiler cannot optimise the atomic variable away > completely -- any store counts as a side effect. Stores can be reordered. Only x86 has (mostly) implicit write ordering. So no, atomic_dec() has no volatile semantics and may be reordered on a variety of processors. Writes to memory may not follow code order on several processors.
Re: [PATCH resend] Fix a lock problem in generic phy code
On Monday 10 September 2007, Herbert Xu wrote: > Hans-Jürgen Koch <[EMAIL PROTECTED]> wrote: > > The following patch fixes it. Tested on an AT91SAM9263-EK board, kernel > > 2.6.23-rc4 and -rc3-mm1. > > Could you please audit all instances of physdev->lock and add > _bh where necessary? I can see that at least phys_stop also > needs the _bh. I think the patch does all that's necessary. At least, there are no error messages in the logs anymore. I didn't check if there's an error on unload, though. > > We should also consider whether it makes sense to move the > timer into a work queue. Somebody who's more involved in that PHY stuff should probably do that. I'm on holidays, sitting in a hotel room, and will probably not have the time to have a deeper look into that. Thanks, Hans
why does tcp_v[46]_conn_request not inc MIB stats
I've been digging around to see about inducing /proc/net/tcp to show some "interesting" things for listen sockets (e.g. backlog depth, its max, and dropped connection requests). While there I've noticed that tcp_v[46]_syn_recv_sock and tcp_v[46]_conn_request both check that the listen queue is full, but only tcp_v[46]_syn_recv_sock increments some mib stats for dropped connection requests. Is that deliberate, or is that a hole in the stats? rick jones
Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
On Mon, Sep 10, 2007 at 01:20:38PM +0100, Maciej W. Rozycki wrote: > Remove typedefs, volatiles and convert kmalloc()/memset() pairs to > kcalloc(). Also reformat the surrounding clutter. > > Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]> > --- > Per your request, Andrew, a while ago. It builds, runs, passes > checkpatch.pl and sparse. No semantic changes. One step closer to sanity for this driver. So it's got my ACK. Ralf
[PATCH 2/3] rfkill: Add support for ultrawideband
This patch adds support for UWB keys to rfkill; support for this has been requested by Inaky. Signed-off-by: Ivo van Doorn <[EMAIL PROTECTED]> Acked-by: Dmitry Torokhov <[EMAIL PROTECTED]> --- include/linux/input.h |1 + include/linux/rfkill.h|2 ++ net/rfkill/rfkill-input.c |9 + net/rfkill/rfkill.c |3 +++ 4 files changed, 15 insertions(+), 0 deletions(-) diff --git a/include/linux/input.h b/include/linux/input.h index cf2b561..8e5828d 100644 --- a/include/linux/input.h +++ b/include/linux/input.h @@ -360,6 +360,7 @@ struct input_absinfo { #define KEY_BLUETOOTH 237 #define KEY_WLAN 238 +#define KEY_UWB239 #define KEY_UNKNOWN240 diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h index c4546e1..f9a50da 100644 --- a/include/linux/rfkill.h +++ b/include/linux/rfkill.h @@ -31,10 +31,12 @@ * enum rfkill_type - type of rfkill switch. * RFKILL_TYPE_WLAN: switch is no a Wireless network devices. * RFKILL_TYPE_BlUETOOTH: switch is on a bluetooth device. + * RFKILL_TYPE_UWB: switch is on a Ultra wideband device.
*/ enum rfkill_type { RFKILL_TYPE_WLAN , RFKILL_TYPE_BLUETOOTH, + RFKILL_TYPE_UWB, RFKILL_TYPE_MAX, }; diff --git a/net/rfkill/rfkill-input.c b/net/rfkill/rfkill-input.c index 9f746be..8e4516a 100644 --- a/net/rfkill/rfkill-input.c +++ b/net/rfkill/rfkill-input.c @@ -81,6 +81,7 @@ static void rfkill_schedule_toggle(struct rfkill_task *task) static DEFINE_RFKILL_TASK(rfkill_wlan, RFKILL_TYPE_WLAN); static DEFINE_RFKILL_TASK(rfkill_bt, RFKILL_TYPE_BLUETOOTH); +static DEFINE_RFKILL_TASK(rfkill_uwb, RFKILL_TYPE_UWB); static void rfkill_event(struct input_handle *handle, unsigned int type, unsigned int code, int down) @@ -93,6 +94,9 @@ static void rfkill_event(struct input_handle *handle, unsigned int type, case KEY_BLUETOOTH: rfkill_schedule_toggle(&rfkill_bt); break; + case KEY_UWB: + rfkill_schedule_toggle(&rfkill_uwb); + break; default: break; } @@ -148,6 +152,11 @@ static const struct input_device_id rfkill_ids[] = { .evbit = { BIT(EV_KEY) }, .keybit = { [LONG(KEY_BLUETOOTH)] = BIT(KEY_BLUETOOTH) }, }, + { + .flags = INPUT_DEVICE_ID_MATCH_EVBIT | INPUT_DEVICE_ID_MATCH_KEYBIT, + .evbit = { BIT(EV_KEY) }, + .keybit = { [LONG(KEY_UWB)] = BIT(KEY_UWB) }, + }, { } }; diff --git a/net/rfkill/rfkill.c b/net/rfkill/rfkill.c index 50e0102..03ed7fd 100644 --- a/net/rfkill/rfkill.c +++ b/net/rfkill/rfkill.c @@ -106,6 +106,9 @@ static ssize_t rfkill_type_show(struct device *dev, case RFKILL_TYPE_BLUETOOTH: type = "bluetooth"; break; + case RFKILL_TYPE_UWB: + type = "ultrawideband"; + break; default: BUG(); } -- 1.5.3
[PATCH 3/3] rfkill: Add rfkill documentation
Add a documentation file which contains a short description about rfkill with some notes about drivers and the userspace interface. Signed-off-by: Ivo van Doorn <[EMAIL PROTECTED]> Acked-by: Dmitry Torokhov <[EMAIL PROTECTED]> --- Documentation/rfkill.txt | 88 ++ 1 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 Documentation/rfkill.txt diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt new file mode 100644 index 000..93c76fc --- /dev/null +++ b/Documentation/rfkill.txt @@ -0,0 +1,88 @@ +rfkill - RF switch subsystem support + + +1 Implementation details +2 Driver support +3 Userspace support + +=== +1: Implementation details + +The rfkill switch subsystem offers support for keys often found on laptops +to enable wireless devices like WiFi and Bluetooth. + +This is done by providing the user 3 possibilities: + - The rfkill system handles all events, userspace is not aware of events. + - The rfkill system handles all events, userspace is informed about the event. + - The rfkill system does not handle events, userspace handles all events. + +The buttons to enable and disable the wireless radios are important in +situations where the user is for example using his laptop in a location where +wireless radios _must_ be disabled (e.g. airplanes). +Because of this requirement, userspace support for the keys should not be +made mandatory. Because userspace might want to perform some additional smarter +tasks when the key is pressed, rfkill still provides userspace the possibility +to take over the task to handle the key events. + +The system inside the kernel has been split into 2 separate sections: + 1 - RFKILL + 2 - RFKILL_INPUT + +The first option enables rfkill support and will make sure userspace will +be notified of any events through the input device. It also creates several +sysfs entries which can be used by userspace. See section "Userspace support". + +The second option provides an rfkill input handler.
This handler will +listen to all rfkill key events and will toggle the radio accordingly; +with this option enabled userspace could either do nothing or simply +perform monitoring tasks. + + +2: Driver support + +Drivers who wish to build in rfkill subsystem support should +make sure their driver depends on the Kconfig option RFKILL; it should +_not_ depend on RFKILL_INPUT. + +Unless key events trigger an interrupt to which the driver listens, polling +will be required to determine the key state changes. For this the input +layer provides the input-polldev handler. + +A driver should implement a few steps to correctly make use of the +rfkill subsystem. First for non-polling drivers: + + - rfkill_allocate() + - input_allocate_device() + - rfkill_register() + - input_register_device() + +For polling drivers: + + - rfkill_allocate() + - input_allocate_polled_device() + - rfkill_register() + - input_register_polled_device() + +When a key event has been detected, the correct event should be +sent over the input device which has been registered by the driver. + + +3: Userspace support + +For each key an input device will be created which will send out the correct +key event when the rfkill key has been pressed. + +The following sysfs entries will be created: + + name: Name assigned by driver to this key (interface or driver name). + type: Name of the key type ("wlan", "bluetooth", etc). + state: Current state of the key. 1: On, 0: Off. + claim: 1: Userspace handles events, 0: Kernel handles events + +Both the "state" and "claim" entries are also writable. For the "state" entry +this means that when 1 or 0 is written all radios will be toggled accordingly. +For the "claim" entry writing 1 to it will mean that the kernel will no longer +handle key events even though RFKILL_INPUT input was enabled.
When "claim" has +been set to 0, userspace should make sure it will listen for the input events +or check the sysfs "state" entry regularly to correctly perform the required +tasks when the rkfill key is pressed. -- 1.5.3 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] rfkill: Remove IRDA
As Dmitry pointed out earlier, rfkill-input.c doesn't support irda because there are no users and we shouldn't add unrequired KEY_ defines. However, RFKILL_TYPE_IRDA was defined in the rfkill.h header file and would confuse people about whether it is implemented or not. This patch removes IRDA support completely, so it can be added whenever a driver wants the feature. Signed-off-by: Ivo van Doorn <[EMAIL PROTECTED]> Acked-by: Dmitry Torokhov <[EMAIL PROTECTED]> --- include/linux/rfkill.h |8 +++- net/rfkill/Kconfig |2 +- net/rfkill/rfkill.c|5 + 3 files changed, 5 insertions(+), 10 deletions(-) diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h index a8a6ea8..c4546e1 100644 --- a/include/linux/rfkill.h +++ b/include/linux/rfkill.h @@ -31,13 +31,11 @@ * enum rfkill_type - type of rfkill switch. * RFKILL_TYPE_WLAN: switch is no a Wireless network devices. * RFKILL_TYPE_BlUETOOTH: switch is on a bluetooth device. - * RFKILL_TYPE_IRDA: switch is on an infrared devices. */ enum rfkill_type { - RFKILL_TYPE_WLAN = 0, - RFKILL_TYPE_BLUETOOTH = 1, - RFKILL_TYPE_IRDA = 2, - RFKILL_TYPE_MAX = 3, + RFKILL_TYPE_WLAN , + RFKILL_TYPE_BLUETOOTH, + RFKILL_TYPE_MAX, }; enum rfkill_state { diff --git a/net/rfkill/Kconfig b/net/rfkill/Kconfig index 8b31759..d28a6d9 100644 --- a/net/rfkill/Kconfig +++ b/net/rfkill/Kconfig @@ -5,7 +5,7 @@ menuconfig RFKILL tristate "RF switch subsystem support" help Say Y here if you want to have control over RF switches - found on many WiFi, Bluetooth and IRDA cards. + found on many WiFi and Bluetooth cards. To compile this driver as a module, choose M here: the module will be called rfkill. 
diff --git a/net/rfkill/rfkill.c b/net/rfkill/rfkill.c index db3395b..50e0102 100644 --- a/net/rfkill/rfkill.c +++ b/net/rfkill/rfkill.c @@ -106,9 +106,6 @@ static ssize_t rfkill_type_show(struct device *dev, case RFKILL_TYPE_BLUETOOTH: type = "bluetooth"; break; - case RFKILL_TYPE_IRDA: - type = "irda"; - break; default: BUG(); } @@ -281,7 +278,7 @@ static void rfkill_remove_switch(struct rfkill *rfkill) /** * rfkill_allocate - allocate memory for rfkill structure. * @parent: device that has rf switch on it - * @type: type of the switch (wlan, bluetooth, irda) + * @type: type of the switch (RFKILL_TYPE_*) * * This function should be called by the network driver when it needs * rfkill structure. Once the structure is allocated the driver shoud -- 1.5.3
Re: auto recycling of TIME_WAIT connections
Pádraig Brady wrote: Rick Jones wrote: This was an issue over a decade ago with SPECweb96 benchmarking. The initial solution was to make the explicit bind() calls and not rely on the anonymous/ephemeral port space. After that, one starts adding additional IP's into the mix (at least where possible). And if that fails, one has to go back to the beginning and ask oneself exactly why a client is trying to churn through so many connections per second in the first place. right. This is for benchmarking mainly. Sane applications would use persistent connections, or a different form of IPC. All the more reason to go the "add more client IP's" path then. It gives you more connections per second, and gives you a much broader number of "client" IP's hitting the server which will be more realistic. That is one thing I like very much about polygraph (based on what I've read) - its use of _lots_ of client IPs to better simulate reality. I think other web-oriented benchmarks should start to include that as well, for there are stacks which do indeed make "decisions" based on whether or not a destination is perceived to be "local" or not. rick jones
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 16:09, Linus Torvalds wrote: > On Mon, 10 Sep 2007, Denys Vlasenko wrote: > > static inline int > > qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha) > > { > > int return_status = QLA_SUCCESS; > > unsigned long loop_timeout ; > > scsi_qla_host_t *pha = to_qla_parent(ha); > > > > /* wait for 5 min at the max for loop to be ready */ > > loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ); > > > > while ((!atomic_read(&pha->loop_down_timer) && > > atomic_read(&pha->loop_state) == LOOP_DOWN) || > > atomic_read(&pha->loop_state) != LOOP_READY) { > > if (atomic_read(&pha->loop_state) == LOOP_DEAD) { > ... > > Is above correct or buggy? Correct, because msleep is a barrier. > > Is it obvious? No. > > It's *buggy*. But it has nothing to do with any msleep() in the loop, or > anything else. > > And more importantly, it would be equally buggy even *with* a "volatile" > atomic_read(). I am not saying that this code is okay, this isn't the point. (The code is in fact awful for several more reasons). My point is that people are confused as to what atomic_read() exactly means, and this is bad. Same for cpu_relax(). First one says "read", and second one doesn't say "barrier". This is real code from current kernel which demonstrates this: "I don't know that cpu_relax() is a barrier already": drivers/kvm/kvm_main.c while (atomic_read(&completed) != needed) { cpu_relax(); barrier(); } "I think that atomic_read() is a read from memory and therefore I don't need a barrier": arch/x86_64/kernel/crash.c msecs = 1000; /* Wait at most a second for the other cpus to stop */ while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { mdelay(1); msecs--; } Since neither camp seems to give up, I am proposing renaming them to something less confusing, making everybody happy. cpu_relax_barrier() atomic_value(&x) atomic_fetch(&x) I'm not a native English speaker; do these sound better?
-- vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007 15:38:23 +0100 Denys Vlasenko <[EMAIL PROTECTED]> wrote: > On Monday 10 September 2007 15:51, Arjan van de Ven wrote: > > On Mon, 10 Sep 2007 11:56:29 +0100 > > Denys Vlasenko <[EMAIL PROTECTED]> wrote: > > > > > > > > Well, if you insist on having it again: > > > > > > Waiting for atomic value to be zero: > > > > > > while (atomic_read(&x)) > > > continue; > > > > > > > and this I would say is buggy code all the way. > > > > Not from a pure C level semantics, but from a "busy waiting is > > buggy" semantics level and a "I'm inventing my own locking" > > semantics level. > > After inspecting arch/*, I cannot agree with you. the arch/ people obviously are allowed to do their own locking stuff... BECAUSE THEY HAVE TO IMPLEMENT THAT! the arch maintainers know EXACTLY how their hw behaves (well, we hope) so they tend to be the exception to many rules in the kernel.
Re: [PATCH 03/16] net: Basic network namespace infrastructure.
Pavel Emelyanov <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: > > [snip] > >> --- /dev/null >> +++ b/include/net/net_namespace.h >> @@ -0,0 +1,68 @@ >> +/* >> + * Operations on the network namespace >> + */ >> +#ifndef __NET_NET_NAMESPACE_H >> +#define __NET_NET_NAMESPACE_H >> + >> +#include >> +#include >> +#include >> + >> +struct net { > > Isn't this name too generic? Why not net_namespace? My fingers went on strike. struct netns probably wouldn't be bad. The very long and spelled out version seemed painful. I don't really care except that not changing it is easier for me. I'm happy to do whatever is considered most maintainable. >> +/* Walk through the list backwards calling the exit functions >> + * for the pernet modules whose init functions did not fail. >> + */ >> +for (ptr = ptr->prev; ptr != &pernet_list; ptr = ptr->prev) { > > Good reason for adding list_for_each_continue_reverse :) Sounds reasonable. >> +static int register_pernet_operations(struct list_head *list, >> + struct pernet_operations *ops) >> +{ >> +struct net *net, *undo_net; >> +int error; >> + >> +error = 0; >> +list_add_tail(&ops->list, list); >> +for_each_net(net) { >> +if (ops->init) { > > Maybe it's better to do it vice-versa? > if (ops->init) >for_each_net(net) >ops->init(net); > ... My gut feel says it is more readable with the test on the inside. Although something like if (ops->init) goto out; might be more readable. >> +int register_pernet_device(struct pernet_operations *ops) >> +{ >> +int error; >> +mutex_lock(&net_mutex); >> +error = register_pernet_operations(&pernet_list, ops); >> +if (!error && (first_device == &pernet_list)) > > Very minor: why do you give the name "device" to some pernet_operations? Subsystems need to be initialized before and cleaned up after network devices. We don't have much in the way that needs this distinction, but we have just enough that it is useful to make this distinction.
Looking at my complete patchset all I have in this category is the loopback device, and it is important on the teardown side of things that the loopback device be unregistered before I clean up the protocols or else I get weird leaks. Reflecting on it I'm not quite certain about the setup side of things. I'm on the fence as to whether I need to completely dynamically allocate the loopback device or whether I need to embed it in struct net. There may also be a call for some other special network devices to have one-off instances in each network namespace. With the new netlink creation API it isn't certain we need that idiom anymore, but before that point I was certain we would have other network devices besides the loopback that would care. Eric - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
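The registration/unwind pattern discussed in this thread — run init for every entry, and on failure call exit only for the entries whose init already succeeded, in reverse order — can be sketched outside the kernel. Everything below is illustrative stand-in code, not the kernel API.

```c
#include <stddef.h>

/* Illustrative stand-in for struct pernet_operations. */
struct ops {
    int  (*init)(void);   /* may be NULL, may fail */
    void (*exit)(void);   /* undoes a successful init */
};

/* Mirror of the setup_net() error path quoted above: on a failed
 * init, walk backwards over the entries already initialized and
 * call their exit hooks, then report the error. */
int setup_all(struct ops *list, size_t n)
{
    size_t i;
    int error = 0;

    for (i = 0; i < n; i++) {
        if (list[i].init && (error = list[i].init()) != 0)
            break;
    }
    if (!error)
        return 0;

    while (i-- > 0)           /* reverse walk, skipping the failed entry */
        if (list[i].exit)
            list[i].exit();
    return error;
}

/* Tiny self-check: two good entries followed by a failing one must
 * run the exits in reverse order (exit_b, then exit_a) and return
 * the error. */
static int trace[4], ntrace;
static int init_a(void)   { trace[ntrace++] = 1;  return 0; }
static int init_b(void)   { trace[ntrace++] = 2;  return 0; }
static int init_bad(void) { return -1; }
static void exit_a(void)  { trace[ntrace++] = -1; }
static void exit_b(void)  { trace[ntrace++] = -2; }

int demo_unwind(void)
{
    struct ops list[] = {
        { init_a, exit_a },
        { init_b, exit_b },
        { init_bad, NULL },
    };
    int err = setup_all(list, 3);
    /* expect trace 1, 2, -2, -1 and a nonzero error */
    return err != 0 && ntrace == 4 && trace[2] == -2 && trace[3] == -1;
}
```

The reverse walk is the key property: each exit undoes an init that definitely ran, and the failed entry itself is never "exited".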
Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace
Pavel Emelyanov <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> Until we support multiple network namespaces with netfilter only allow >> netfilter configuration in the initial network namespace. > > PATCH 17/16? :) Exactly! If my target was the core of the networking stack I figured I had better include the change that keeps netfilter commands isolated to the initial network namespace, and in my review for completeness I had missed that in my first pass through my patches. Eric
Re: [PATCH 12/16] net: Support multiple network namespaces with netlink
Pavel Emelyanov <[EMAIL PROTECTED]> writes: > > Rr. This is the 5th or even the 6th patch that changes tens of files > but (!) most of these changes are just propagating some core thing into > protocols, drivers, etc. E.g. you add an argument to some function and > then make all the rest use it, but the chunk adding the argument itself > is buried in these changes. > > Why not make reviewers' lives easier and make (by hand) the core > hunks go first and the "propagation" ones at the end? For RFC purposes > I would even break git-bisect safety and split these patches > into two parts: those with the core and those with the propagation. Agreed, this is an issue for easy review. My apologies. I guess it was a failure of my imagination on how to prepare these patches for review in a way that was both reviewer and preparer friendly. There is at least one /proc idiom change that needs to be made, so I will keep this in mind for the resend. Eric
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007, Denys Vlasenko wrote: > > static inline int > qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha) > { > int return_status = QLA_SUCCESS; > unsigned long loop_timeout ; > scsi_qla_host_t *pha = to_qla_parent(ha); > > /* wait for 5 min at the max for loop to be ready */ > loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ); > > while ((!atomic_read(&pha->loop_down_timer) && > atomic_read(&pha->loop_state) == LOOP_DOWN) || > atomic_read(&pha->loop_state) != LOOP_READY) { > if (atomic_read(&pha->loop_state) == LOOP_DEAD) { ... > Is the above correct or buggy? Correct, because msleep is a barrier. > Is it obvious? No. It's *buggy*. But it has nothing to do with any msleep() in the loop, or anything else. And more importantly, it would be equally buggy even *with* a "volatile" atomic_read(). Why is this so hard for people to understand? You're all acting like morons. The reason it is buggy has absolutely nothing to do with whether the read is done or not, it has to do with the fact that the CPU may re-order the reads *regardless* of whether the read is done in some specific order by the compiler or not! In effect, there is zero ordering between all those three reads, and if you don't have memory barriers (or a lock or other serialization), that code is buggy. So stop this idiotic discussion thread already. The above kind of code needs memory barriers to be non-buggy. The whole "volatile or not" discussion is totally idiotic, and pointless, and anybody who doesn't understand that by now needs to just shut up and think about it more, rather than make this discussion drag out even further. The fact is, "volatile" *only* makes things worse. It generates worse code, and never fixes any real bugs. This is a *fact*. Linus
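Restated as a userspace sketch (hypothetical names, nothing from the qla2x00 driver): three plain loads of related variables carry no ordering or consistency guarantee between them. If the reader needs a consistent pair of values, it must use the same lock (or explicit barriers) as the writer.

```c
#include <pthread.h>

/* Two related fields that only make sense as a consistent pair. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int loop_state;       /* stand-in for pha->loop_state */
static int loop_down_timer;  /* stand-in for pha->loop_down_timer */

/* The writer updates both fields under the lock. */
void set_loop(int state, int timer)
{
    pthread_mutex_lock(&state_lock);
    loop_state = state;
    loop_down_timer = timer;
    pthread_mutex_unlock(&state_lock);
}

/* The reader takes the same lock, so the two loads cannot be
 * reordered or interleaved with a concurrent update.  Three bare
 * atomic_read()s in a row, as in the loop quoted above, give no
 * such guarantee. */
void get_loop(int *state, int *timer)
{
    pthread_mutex_lock(&state_lock);
    *state = loop_state;
    *timer = loop_down_timer;
    pthread_mutex_unlock(&state_lock);
}

/* Self-check: a snapshot observes the pair that was set together. */
int demo_snapshot(void)
{
    int s, t;
    set_loop(3, 7);
    get_loop(&s, &t);
    return s == 3 && t == 7;
}
```

The point is not that the mutex is the only fix — barriers or a seqlock-style scheme would also do — but that some serialization primitive must connect the reads to the writes.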
sky2.c: length mismatch errors with vlan frames
Hi list, I have been running a recent Linux kernel on Nexcom NSA 1086's equipped with SysKonnect NICs. Like some people before me on this list, I am running into problems with these NICs and seeing frequent errors in my dmesg: sky2 eth4: rx error, status 0x402300 length 60 printk: 17 messages suppressed. sky2 eth4: rx error, status 0x402300 length 60 printk: 32 messages suppressed. sky2 eth4: rx error, status 0x602300 length 92 printk: 25 messages suppressed. sky2 eth4: rx error, status 0x6e2300 length 106 printk: 16 messages suppressed. sky2 eth4: rx error, status 0x402300 length 60 printk: 10 messages suppressed. sky2 eth4: rx error, status 0x402300 length 60 printk: 17 messages suppressed. sky2 eth4: rx error, status 0x402300 length 60 I have investigated a bit and the status doesn't match any of the errors in GMR_FS_ANY_ERR. The block generating the error is now len_mismatch, due to the fact that on packets having the GMR_FS_VLAN bit set, the length argument to sky2_receive is 4 bytes shorter than the 16-bit value in status (status >> 16). Since both these values are read from the card I don't know how to solve this other than by incrementing the length arg by 4 when GMR_FS_VLAN is set in status. 
So here's a diff that does this, although I'm not sure it's an elegant solution: --- sky2.c.orig 2007-09-10 15:34:15.0 +0200 +++ sky2.c 2007-09-10 16:20:28.0 +0200 @@ -2059,13 +2059,16 @@ static struct sk_buff *sky2_receive(stru sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending; prefetch(sky2->rx_ring + sky2->rx_next); + if (status & GMR_FS_VLAN) + length += 4; + if (status & GMR_FS_ANY_ERR) goto error; if (!(status & GMR_FS_RX_OK)) goto resubmit; - if (status >> 16 != length) + if ((status >> 16) != length) goto len_mismatch; if (length < copybreak) @@ -2081,6 +2084,8 @@ len_mismatch: /* Truncation of overlength packets causes PHY length to not match MAC length */ ++sky2->net_stats.rx_length_errors; + printk(KERN_INFO PFX "%s: rx length mismatch: length %d != %d\n", + dev->name, length, status >> 16); error: ++sky2->net_stats.rx_errors; Any thoughts on how to solve this ? Further testing shows that sometimes there are packets without the GMR_FS_VLAN bit set which have a 4-byte length difference, as shown by this log excerpt: sky2 eth4: rx length mismatch: length 243 != 247 sky2 eth4: rx error, status 0xf70300 length 243 For debugging purposes, here is the little program with excerpts from sky2.c that I use to see what bits are set in the status field: #include #include #include #include enum { GMR_FS_LEN = 0xffff<<16, /* Bit 31..16: Rx Frame Length */ GMR_FS_VLAN = 1<<13, /* VLAN Packet */ GMR_FS_JABBER = 1<<12, /* Jabber Packet */ GMR_FS_UN_SIZE = 1<<11, /* Undersize Packet */ GMR_FS_MC = 1<<10, /* Multicast Packet */ GMR_FS_BC = 1<<9, /* Broadcast Packet */ GMR_FS_RX_OK = 1<<8, /* Receive OK (Good Packet) */ GMR_FS_GOOD_FC = 1<<7, /* Good Flow-Control Packet */ GMR_FS_BAD_FC = 1<<6, /* Bad Flow-Control Packet */ GMR_FS_MII_ERR = 1<<5, /* MII Error */ GMR_FS_LONG_ERR = 1<<4, /* Too Long Packet */ GMR_FS_FRAGMENT = 1<<3, /* Fragment */ GMR_FS_CRC_ERR = 1<<1, /* CRC Error */ GMR_FS_RX_FF_OV = 1<<0, /* Rx FIFO Overflow */ GMR_FS_ANY_ERR = GMR_FS_RX_FF_OV | 
GMR_FS_CRC_ERR | GMR_FS_FRAGMENT | GMR_FS_LONG_ERR | GMR_FS_MII_ERR | GMR_FS_BAD_FC | GMR_FS_UN_SIZE | GMR_FS_JABBER, }; struct bitdesc { const char *name; u_int32_t field; }; struct bitdesc bits[] = { { "GMR_FS_VLAN", 1 << 13 }, { "GMR_FS_JABBER", 1 << 12 }, { "GMR_FS_UN_SIZE", 1 << 11 }, { "GMR_FS_MC", 1 << 10 }, { "GMR_FS_BC", 1 << 9 }, { "GMR_FS_RX_OK", 1 << 8 }, { "GMR_FS_GOOD_FC", 1 << 7 }, { "GMR_FS_BAD_FC", 1 << 6 }, { "GMR_FS_MII_ERR", 1 << 5 }, { "GMR_FS_LONG_ERR", 1 << 4 }, { "GMR_FS_FRAGMENT", 1 << 3 }, { "GMR_FS_CRC_ERR", 1 << 1 }, { "GMR_FS_RX_FF_OV", 1 << 0 } }; int main(int argc, const char *argv[]) { int status; int i; if (argc < 2) return (1); status = strtol(argv[1], NULL, 16); printf("status: 0x%08x\n", status); if (status & GMR_FS_ANY_ERR) printf("packet has an error\n"); printf("length: %u\n", status >> 16); for (i = 0; i < (sizeof(bits) / sizeof(bits[0])); i++) { if (status & bits[i].field) { printf("has bit %s\n", bits[i].name); } } return (0); }
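The length relationship described in the report can also be checked in isolation. This helper is hypothetical — it is not part of sky2.c — and simply applies the proposed 4-byte VLAN adjustment before comparing against the hardware length in the top 16 bits of status.

```c
#include <stdint.h>

#define GMR_FS_VLAN (1 << 13)   /* VLAN Packet bit, as in sky2.h */

/* Returns 1 if the driver's length argument matches the 16-bit
 * hardware length (status >> 16) once the proposed VLAN-tag
 * adjustment is applied. */
int length_matches(uint32_t status, unsigned int length)
{
    if (status & GMR_FS_VLAN)
        length += 4;            /* 802.1Q tag stripped by the MAC */
    return (status >> 16) == length;
}
```

For the reported status 0x402300 with length 60, the VLAN bit (0x2000) is set and 60 + 4 equals the hardware length 0x40 = 64, so the adjustment makes the lengths agree; for the later 0xf70300 / length 243 case the VLAN bit is clear and the 4-byte gap remains, matching the observation that the fix does not cover every mismatch.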
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 15:51, Arjan van de Ven wrote: > On Mon, 10 Sep 2007 11:56:29 +0100 > Denys Vlasenko <[EMAIL PROTECTED]> wrote: > > > > > Well, if you insist on having it again: > > > > Waiting for atomic value to be zero: > > > > while (atomic_read(&x)) > > continue; > > > > and this I would say is buggy code all the way. > > Not from a pure C level semantics, but from a "busy waiting is buggy" > semantics level and a "I'm inventing my own locking" semantics level. After inspecting arch/*, I cannot agree with you. Otherwise almost all major architectures use "conceptually buggy busy-waiting": arch/alpha arch/i386 arch/ia64 arch/m32r arch/mips arch/parisc arch/powerpc arch/sh arch/sparc64 arch/um arch/x86_64 All of the above contain busy-waiting on atomic_read. Including these loops without barriers: arch/mips/kernel/smtc.c while (atomic_read(&idle_hook_initialized) < 1000) ; arch/mips/sgi-ip27/ip27-nmi.c while (atomic_read(&nmied_cpus) != num_online_cpus()); [Well maybe num_online_cpus() is a barrier, I didn't check] arch/sh/kernel/smp.c if (wait) while (atomic_read(&smp_fn_call.finished) != (nr_cpus - 1)); Bugs? -- vda
Re: [ofa-general] [PATCH V4 0/10] net/bonding: ADD IPoIB support for the bonding driver
Hi all, This patch series is a bit neglected. Since our goal is to have bonding support for IPoIB in kernel 2.6.24, it is very important for us to get comments soon. We would appreciate it if you could take some time to look at this and help us push this code upstream. thanks MoniS
Re: [PATCH 03/16] net: Basic network namespace infrastructure.
Eric W. Biederman wrote: [snip] > --- /dev/null > +++ b/include/net/net_namespace.h > @@ -0,0 +1,68 @@ > +/* > + * Operations on the network namespace > + */ > +#ifndef __NET_NET_NAMESPACE_H > +#define __NET_NET_NAMESPACE_H > + > +#include > +#include > +#include > + > +struct net { Isn't this name too generic? Why not net_namespace? > + atomic_t count; /* To decided when the network > + * namespace should be freed. > + */ > + atomic_t use_count; /* To track references we > + * destroy on demand > + */ > + struct list_head list; /* list of network namespaces */ > + struct work_struct work; /* work struct for freeing */ > +}; [snip] > --- /dev/null > +++ b/net/core/net_namespace.c [snip] > +static int setup_net(struct net *net) > +{ > + /* Must be called with net_mutex held */ > + struct pernet_operations *ops; > + struct list_head *ptr; > + int error; > + > + memset(net, 0, sizeof(struct net)); > + atomic_set(&net->count, 1); > + atomic_set(&net->use_count, 0); > + > + error = 0; > + list_for_each(ptr, &pernet_list) { > + ops = list_entry(ptr, struct pernet_operations, list); > + if (ops->init) { > + error = ops->init(net); > + if (error < 0) > + goto out_undo; > + } > + } > +out: > + return error; > +out_undo: > + /* Walk through the list backwards calling the exit functions > + * for the pernet modules whose init functions did not fail. 
> + */ > + for (ptr = ptr->prev; ptr != &pernet_list; ptr = ptr->prev) { Good reason for adding list_for_each_continue_reverse :) > + ops = list_entry(ptr, struct pernet_operations, list); > + if (ops->exit) > + ops->exit(net); > + } > + goto out; > +} > + > +static int __init net_ns_init(void) > +{ > + int err; > + > + printk(KERN_INFO "net_namespace: %zd bytes\n", sizeof(struct net)); > + net_cachep = kmem_cache_create("net_namespace", sizeof(struct net), > + SMP_CACHE_BYTES, > + SLAB_PANIC, NULL); > + mutex_lock(&net_mutex); > + err = setup_net(&init_net); > + > + net_lock(); > + list_add_tail(&init_net.list, &net_namespace_list); > + net_unlock(); > + > + mutex_unlock(&net_mutex); > + if (err) > + panic("Could not setup the initial network namespace"); > + > + return 0; > +} > + > +pure_initcall(net_ns_init); > + > +static int register_pernet_operations(struct list_head *list, > + struct pernet_operations *ops) > +{ > + struct net *net, *undo_net; > + int error; > + > + error = 0; > + list_add_tail(&ops->list, list); > + for_each_net(net) { > + if (ops->init) { Maybe it's better to do it vice-versa? if (ops->init) for_each_net(net) ops->init(net); ... > + error = ops->init(net); > + if (error) > + goto out_undo; > + } > + } > +out: > + return error; > + > +out_undo: > + /* If I have an error cleanup all namespaces I initialized */ > + list_del(&ops->list); > + for_each_net(undo_net) { > + if (undo_net == net) > + goto undone; > + if (ops->exit) > + ops->exit(undo_net); > + } > +undone: > + goto out; > +} > + > +static void unregister_pernet_operations(struct pernet_operations *ops) > +{ > + struct net *net; > + > + list_del(&ops->list); > + for_each_net(net) > + if (ops->exit) The same here. 
> + ops->exit(net); > +} > + > +/** > + * register_pernet_subsys - register a network namespace subsystem > + * @ops: pernet operations structure for the subsystem > + * > + * Register a subsystem which has init and exit functions > + * that are called when network namespaces are created and > + * destroyed respectively. > + * > + * When registered all network namespace init functions are > + * called for every existing network namespace. Allowing kernel > + * modules to have a race free view of the set of network namespaces. > + * > + * When a new network namespace is created all of the init > + * methods are called in the order in which they were registered. > + * > + * When a network namespace is destroyed all of the exit methods > + * are called in the reverse of the order with which they were > + * registered. > + */ > +int register_pernet_subsys(struct pernet_operations *ops) > +{ > + int error; > + mutex_lock(&net_mutex);
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 14:38, Denys Vlasenko wrote: > You are basically trying to educate me how to use atomic properly. > You don't need to do it, as I am (currently) not a driver author. > > I am saying that people who are already using atomic_read() > (and who unfortunately did not read your explanation above) > will still sometimes use atomic_read() as a way to read an atomic value > *from memory*, and will create nasty heisenbugs for you to debug. static inline int qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha) { int return_status = QLA_SUCCESS; unsigned long loop_timeout ; scsi_qla_host_t *pha = to_qla_parent(ha); /* wait for 5 min at the max for loop to be ready */ loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ); while ((!atomic_read(&pha->loop_down_timer) && atomic_read(&pha->loop_state) == LOOP_DOWN) || atomic_read(&pha->loop_state) != LOOP_READY) { if (atomic_read(&pha->loop_state) == LOOP_DEAD) { return_status = QLA_FUNCTION_FAILED; break; } msleep(1000); if (time_after_eq(jiffies, loop_timeout)) { return_status = QLA_FUNCTION_FAILED; break; } } return (return_status); } Is the above correct or buggy? Correct, because msleep is a barrier. Is it obvious? No. static void qla2x00_rst_aen(scsi_qla_host_t *ha) { if (ha->flags.online && !ha->flags.reset_active && !atomic_read(&ha->loop_down_timer) && !(test_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags))) { do { clear_bit(RESET_MARKER_NEEDED, &ha->dpc_flags); /* * Issue marker command only when we are going to start * the I/O. */ ha->marker_needed = 1; } while (!atomic_read(&ha->loop_down_timer) && (test_bit(RESET_MARKER_NEEDED, &ha->dpc_flags))); } } Is the above correct? I honestly don't know. Correct, because set_bit is a barrier on _all _memory_? Will it break if set_bit is changed to be a barrier only on its operand? Probably yes. drivers/kvm/kvm_main.c while (atomic_read(&completed) != needed) { cpu_relax(); barrier(); } Obviously the author did not know that cpu_relax is already a barrier. 
See why I think driver authors will be confused? arch/x86_64/kernel/crash.c static void nmi_shootdown_cpus(void) { ... msecs = 1000; /* Wait at most a second for the other cpus to stop */ while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { mdelay(1); msecs--; } ... } Is mdelay(1) a barrier? Yes, because it is a function on x86_64. Absolutely the same code will be buggy on an arch where mdelay(1) == udelay(1000), and udelay is implemented as inline busy-wait. arch/sparc64/kernel/smp.c /* Wait for response */ while (atomic_read(&data.finished) != cpus) cpu_relax(); ...later in the same file... while (atomic_read(&smp_capture_registry) != ncpus) rmb(); I'm confused. Do we need cpu_relax() or rmb()? Does cpu_relax() imply rmb()? (No it doesn't). Which of those two while loops needs correcting? -- vda
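The distinction being asked about can be written down in C11 terms — this is a sketch of the concepts, not of any kernel API. The "memory" clobber behind cpu_relax() is a compiler-only barrier (the compiler must reload memory operands afterwards, but the CPU may still reorder), while rmb() corresponds to a hardware read fence, which C11 spells as an acquire fence.

```c
#include <stdatomic.h>

/* Compiler-only barrier: forbids the compiler from caching memory
 * values in registers across it, but emits no fence instruction.
 * This is essentially what the "memory" clobber in x86's
 * cpu_relax()/rep_nop() provides. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hardware read ordering on top of that: the C11 analogue of rmb(). */
static inline void read_fence(void)
{
    atomic_thread_fence(memory_order_acquire);
}

/* Busy-wait on a flag, then read data published before the flag was
 * set.  The relaxed load cannot be hoisted out of the loop, and the
 * fence orders the data read after the final flag read. */
int wait_for_flag(atomic_int *flag, int *data)
{
    while (atomic_load_explicit(flag, memory_order_relaxed) == 0)
        compiler_barrier();     /* spin; the kernel would use cpu_relax() */
    read_fence();
    return *data;
}

/* Single-threaded self-check: the flag is already set, so the loop
 * exits immediately and the published value is returned. */
int demo_wait(void)
{
    static atomic_int flag;
    static int data;
    data = 42;
    atomic_store(&flag, 1);
    return wait_for_flag(&flag, &data);
}
```

In these terms, the sparc64 loop calling rmb() is asking for hardware ordering of the subsequent reads, while the loop calling cpu_relax() only guarantees the flag itself is re-read — two different requirements, which is exactly why the two loops look inconsistent.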
Re: [Lksctp-developers] [-mm patch] net/sctp/socket.c: make 3 variables static
On Sun, Sep 09, 2007 at 10:25:54PM +0200, Adrian Bunk wrote: > On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote: > >... > > Changes since 2.6.23-rc3-mm1: > >... > > git-net.patch > >... > > git trees > >... > > This patch makes the following needlessly global variables static: > - sctp_memory_pressure > - sctp_memory_allocated > - sctp_sockets_allocated > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> > Looks fine to me Acked-by: Neil Horman <[EMAIL PROTECTED]> Neil -- /*** *Neil Horman [EMAIL PROTECTED] *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007 11:56:29 +0100 Denys Vlasenko <[EMAIL PROTECTED]> wrote: > > Well, if you insist on having it again: > > Waiting for atomic value to be zero: > > while (atomic_read(&x)) > continue; > and this I would say is buggy code all the way. Not from a pure C level semantics, but from a "busy waiting is buggy" semantics level and a "I'm inventing my own locking" semantics level.
Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace
Eric W. Biederman wrote: > Until we support multiple network namespaces with netfilter only allow > netfilter configuration in the initial network namespace. PATCH 17/16? :) Sorry, Pavel
Re: [PATCH 12/16] net: Support multiple network namespaces with netlink
Eric W. Biederman wrote: > Each netlink socket will live in exactly one network namespace, > this includes the controlling kernel sockets. > > This patch updates all of the existing netlink protocols > to only support the initial network namespace. Request > by clients in other namespaces will get -ECONREFUSED. > As they would if the kernel did not have the support for > that netlink protocol compiled in. > > As each netlink protocol is updated to be multiple network > namespace safe it can register multiple kernel sockets > to acquire a presence in the rest of the network namespaces. > > The implementation in af_netlink is a simple filter implementation > at hash table insertion and hash table look up time. > > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> > --- > drivers/connector/connector.c |2 +- > drivers/scsi/scsi_netlink.c |2 +- > drivers/scsi/scsi_transport_iscsi.c |2 +- > fs/ecryptfs/netlink.c |2 +- > include/linux/netlink.h |6 ++- > kernel/audit.c |4 +- > lib/kobject_uevent.c|5 +- > net/bridge/netfilter/ebt_ulog.c |5 +- > net/core/rtnetlink.c|4 +- > net/decnet/netfilter/dn_rtmsg.c |3 +- > net/ipv4/fib_frontend.c |4 +- > net/ipv4/inet_diag.c|4 +- > net/ipv4/netfilter/ip_queue.c |6 +- > net/ipv4/netfilter/ipt_ULOG.c |3 +- > net/ipv6/netfilter/ip6_queue.c |6 +- > net/netfilter/nfnetlink.c |2 +- > net/netfilter/nfnetlink_log.c |3 +- > net/netfilter/nfnetlink_queue.c |3 +- > net/netlink/af_netlink.c| 106 > ++- > net/netlink/genetlink.c |4 +- > net/xfrm/xfrm_user.c|2 +- > security/selinux/netlink.c |5 +- > 22 files changed, 122 insertions(+), 61 deletions(-) Rr. This is the 5th or even the 6th patch that changes tens of files but (!) most of these changes are just propagating some core thing into protocols, drivers, etc. E.g. you add an argument to some function and then make all the rest use it, but the chunk adding the argument itself is buried in these changes. 
Why not make reviewers' lives easier and make (by hand) the core hunks go first and the "propagation" ones at the end? For RFC purposes I would even break git-bisect safety and split these patches into two parts: those with the core and those with the propagation. Thanks, Pavel
Re: [Lksctp-developers] [2.6 patch] make sctp_addto_param() static
On Sun, Sep 09, 2007 at 10:25:50PM +0200, Adrian Bunk wrote: > sctp_addto_param() can become static. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> > ACK, seems reasonable to me. Neil -- /*** *Neil Horman [EMAIL PROTECTED] *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***/
Re: [2.6 patch] make sctp_addto_param() static
Adrian Bunk wrote: > sctp_addto_param() can become static. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> Ack -vlad
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 13:22, Kyle Moffett wrote: > On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote: > > On Sunday 09 September 2007 19:18, Arjan van de Ven wrote: > >> On Sun, 9 Sep 2007 19:02:54 +0100 > >> Denys Vlasenko <[EMAIL PROTECTED]> wrote: > >> > >>> Why is all this fixation on "volatile"? I don't think people want > >>> "volatile" keyword per se, they want atomic_read(&x) to _always_ > >>> compile into an memory-accessing instruction, not register access. > >> > >> and ... why is that? is there any valid, non-buggy code sequence > >> that makes that a reasonable requirement? > > > > Well, if you insist on having it again: > > > > Waiting for atomic value to be zero: > > > > while (atomic_read(&x)) > > continue; > > > > gcc may happily convert it into: > > > > reg = atomic_read(&x); > > while (reg) > > continue; > > Bzzt. Even if you fixed gcc to actually convert it to a busy loop on > a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that > does the conversion, it may be that the CPU does the caching of the > memory value. GCC has no mechanism to do cache-flushes or memory- > barriers except through our custom inline assembly. CPU can cache the value all right, but it cannot use that cached value *forever*, it has to react to invalidate cycles on the shared bus and re-fetch new data. IOW: atomic_read(&x) which compiles down to memory accessor will work properly. > the CPU. Thirdly, on a large system it may take some arbitrarily > large amount of time for cache-propagation to update the value of the > variable in your local CPU cache. Yes, but "arbitrarily large amount of time" is actually measured in nanoseconds here. Let's say 1000ns max for hundreds of CPUs? > Also, you > probably want a cpu_relax() in there somewhere to avoid overheating > the CPU. Yes, but 1. CPU shouldn't overheat (in a sense that it gets damaged), it will only use more power than needed. 2. 
cpu_relax() just throttles down my CPU, so it's a performance optimization only. Wait, it isn't, it's a barrier too. Wow, "cpu_relax" is a barrier? How am I supposed to know that without reading lkml flamewars and/or header files? Let's try reading headers. asm-x86_64/processor.h: #define cpu_relax() rep_nop() So, is it a barrier? No clue yet. /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */ static inline void rep_nop(void) { __asm__ __volatile__("rep;nop": : :"memory"); } Comment explicitly says that it is "a good thing" (doesn't say that it is mandatory) and says NOTHING about barriers! Barrier-ness is not mentioned and is hidden in the "memory" clobber. Do you think it's obvious enough for the average driver writer? I think not, especially as it's unlikely for him to even start suspecting that it is a memory barrier based on the "cpu_relax" name. > You simply CANNOT use an atomic_t as your sole synchronizing > primitive, it doesn't work! You virtually ALWAYS want to use an > atomic_t in the following types of situations: > > (A) As an object refcount. The value is never read except as part of > an atomic_dec_return(). Why aren't you using "struct kref"? > > (B) As an atomic value counter (number of processes, for example). > Just "reading" the value is racy anyways, if you want to enforce a > limit or something then use atomic_inc_return(), check the result, > and use atomic_dec() if it's too big. If you just want to return the > statistics then you are going to be instantaneous-point-in-time anyways. > > (C) As an optimization value (statistics-like, but exact accuracy > isn't important). > > Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like > completions, mutexes, semaphores, spinlocks, krefs, etc. It's not > useful for synchronization, only for keeping track of simple integer > RMW values. Note that atomic_read() and atomic_set() aren't very > useful RMW primitives (read-nomodify-nowrite and read-set-zero-write). 
Code which assumes anything else is probably buggy in other > ways too. You are basically trying to educate me how to use atomic properly. You don't need to do it, as I am (currently) not a driver author. I am saying that people who are already using atomic_read() (and who unfortunately did not read your explanation above) will still sometimes use atomic_read() as a way to read an atomic value *from memory*, and will create nasty heisenbugs for you to debug. -- vda
Re: 2.6.23-rc5: possible irq lock inversion dependency detected
On Sun, Sep 02, 2007 at 01:11:29PM +, Christian Kujau wrote: > > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep > was quite noisy when I tried to shape my external (wireless) interface: > > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock: > [ 6400.534713] (&dev->ingress_lock){-+..}, at: [] > netif_receive_skb+0x2d5/0x3c0 > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the > past: > [ 6400.535145] (police_lock){-.--} This is a genuine dead-lock. The police lock can be taken for reading with softirqs on. If a second CPU tries to take the police lock for writing, while holding the ingress lock, then a softirq on the first CPU can dead-lock when it tries to get the ingress lock. The minimal fix would be to make sure that we disable BH on the first CPU. Jamal, could you take a look at this please? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote: On Sunday 09 September 2007 19:18, Arjan van de Ven wrote: On Sun, 9 Sep 2007 19:02:54 +0100 Denys Vlasenko <[EMAIL PROTECTED]> wrote: Why is all this fixation on "volatile"? I don't think people want "volatile" keyword per se, they want atomic_read(&x) to _always_ compile into an memory-accessing instruction, not register access. and ... why is that? is there any valid, non-buggy code sequence that makes that a reasonable requirement? Well, if you insist on having it again: Waiting for atomic value to be zero: while (atomic_read(&x)) continue; gcc may happily convert it into: reg = atomic_read(&x); while (reg) continue; Bzzt. Even if you fixed gcc to actually convert it to a busy loop on a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that does the conversion, it may be that the CPU does the caching of the memory value. GCC has no mechanism to do cache-flushes or memory-barriers except through our custom inline assembly. Also, you probably want a cpu_relax() in there somewhere to avoid overheating the CPU. Thirdly, on a large system it may take some arbitrarily large amount of time for cache-propagation to update the value of the variable in your local CPU cache. Finally, if atomics are based on spinlock+interrupt-disable then you will sit in a tight busy-loop of spin_lock_irqsave()->spin_unlock_irqrestore(). Depending on your system's internal model this may practically lock up your core because the spin_lock() will take the cacheline for exclusive access and doing that in a loop can prevent any other CPU from doing any operation on it! Since your IRQs are disabled you even have a very small window that an IRQ will come along and free it up long enough for the update to take place. 
The earlier code segment of: while(atomic_read(&x) > 0) atomic_dec(&x); is *completely* buggy because you could very easily have 4 CPUs doing this on an atomic variable with a value of 1 and end up with it at negative 3 by the time you are done. Moreover all the alternatives are also buggy, with the sole exception of this rather obvious-seeming one: atomic_set(&x, 0); You simply CANNOT use an atomic_t as your sole synchronizing primitive, it doesn't work! You virtually ALWAYS want to use an atomic_t in the following types of situations: (A) As an object refcount. The value is never read except as part of an atomic_dec_return(). Why aren't you using "struct kref"? (B) As an atomic value counter (number of processes, for example). Just "reading" the value is racy anyways, if you want to enforce a limit or something then use atomic_inc_return(), check the result, and use atomic_dec() if it's too big. If you just want to return the statistics then you are going to be instantaneous-point-in-time anyways. (C) As an optimization value (statistics-like, but exact accuracy isn't important). Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like completions, mutexes, semaphores, spinlocks, krefs, etc. It's not useful for synchronization, only for keeping track of simple integer RMW values. Note that atomic_read() and atomic_set() aren't very useful RMW primitives (read-nomodify-nowrite and read-set-zero-write). Code which assumes anything else is probably buggy in other ways too. So while I see no real reason for the "volatile" on the atomics, I also see no real reason why it's terribly harmful. Regardless of the "volatile" on the operation the CPU is perfectly happy to cache it anyways so it doesn't buy you any actual "always-access-memory" guarantees. If you are just interested in it as an optimization you could probably just read the properly-aligned integer counter directly, an atomic read on most CPUs. 
If you really need it to hit main memory *every* *single* *time* (Why? Are you using it instead of the proper kernel subsystem?) then you probably need a custom inline assembly helper anyways. Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] remove asm/bitops.h includes
On Sat, Sep 08, 2007 at 09:00:08PM +0100, Jiri Slaby wrote: > > remove asm/bitops.h includes > > including asm/bitops directly may cause compile errors. don't include it > and include linux/bitops instead. next patch will deny including asm header > directly. > > Cc: Adrian Bunk <[EMAIL PROTECTED]> > Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]> For the MIPS and hamradio bits: Acked-by: Ralf Baechle <[EMAIL PROTECTED]> Ralf
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
On Mon, 2007-10-09 at 10:20 +0100, James Chapman wrote: > jamal wrote: > > > If the problem i am trying to solve is "reduce cpu use at lower rate", > > then this is not the right answer because your cpu use has gone up. > > The problem I'm trying to solve is "reduce the max interrupt rate from > NAPI drivers while minimizing latency". As long as what you are saying above translates to "there is one interrupt per packet per napi poll" then we are saying the same thing. > In modern systems, the interrupt > rate can be so high that the CPU spends too much time processing > interrupts, resulting in the system's behavior seen by the user being > degraded. Modern systems can also handle interrupts a lot better. If you can amortize two packets per interrupt per napi poll then you have done better than the breakeven point; however, I think it is fair to ask for proof that, on modern hardware, the breakeven point is actually met by amortizing two packets. > Having the poll() called when idle will always increase CPU usage. But > the feedback you and others are giving encourages me to find a better > compromise. I don't mean in any way to discourage you - just making you work better ;-> It is very refreshing to see that you understand the scope is performance, not vomiting endless versions of code - and for this i feel obligated to help. > I'll go away and do some tests. I'll post results here for discussion later. way to go. cheers, jamal
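The breakeven arithmetic in this exchange (more than one packet serviced per interrupt) can be made concrete with a toy model of the NAPI contract: an interrupt schedules polling, poll() drains up to a budget per pass, and interrupts stay masked until a pass comes in under budget. toy_poll() and toy_service_irq() are invented for this sketch and are not the 2.6.23 driver API (real drivers use netif_rx_schedule()/netif_rx_complete()):

```c
#include <assert.h>

/* A rx ring reduced to a backlog count. */
struct toy_ring { int backlog; };

/* One poll pass: drain up to `budget` packets, return how many. */
static int toy_poll(struct toy_ring *r, int budget)
{
    int done = r->backlog < budget ? r->backlog : budget;
    r->backlog -= done;
    return done;
}

/* Service one interrupt: keep polling while each pass uses its whole
 * budget, then "re-enable interrupts".  Returns packets handled and
 * stores the number of poll passes in *polls. */
int toy_service_irq(struct toy_ring *r, int budget, int *polls)
{
    int total = 0, n;

    *polls = 0;
    do {
        n = toy_poll(r, budget);
        total += n;
        (*polls)++;
    } while (n == budget);  /* under-budget pass => ring is drained */
    return total;
}
```

With a backlog of five packets and a budget of two, a single simulated interrupt is amortized over five packets, past the two-packet breakeven jamal mentions; with a backlog of one, the ratio collapses to one packet per interrupt, which is the low-rate case the thread is about.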
[PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
Remove typedefs, volatiles and convert kmalloc()/memset() pairs to kcalloc(). Also reformat the surrounding clutter. Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]> --- Per your request, Andrew, a while ago. It builds, runs, passes checkpatch.pl and sparse. No semantic changes. Please apply, Maciej patch-mips-2.6.23-rc5-20070904-sb1250-mac-typedef-7 diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/drivers/net/sb1250-mac.c linux-mips-2.6.23-rc5-20070904/drivers/net/sb1250-mac.c --- linux-mips-2.6.23-rc5-20070904.macro/drivers/net/sb1250-mac.c 2007-07-10 04:57:46.0 + +++ linux-mips-2.6.23-rc5-20070904/drivers/net/sb1250-mac.c 2007-09-10 01:01:09.0 + @@ -140,17 +140,17 @@ MODULE_PARM_DESC(int_timeout_rx, "RX tim * */ -typedef enum { sbmac_speed_auto, sbmac_speed_10, - sbmac_speed_100, sbmac_speed_1000 } sbmac_speed_t; +enum sbmac_speed { sbmac_speed_auto, sbmac_speed_10, + sbmac_speed_100, sbmac_speed_1000 }; -typedef enum { sbmac_duplex_auto, sbmac_duplex_half, - sbmac_duplex_full } sbmac_duplex_t; +enum sbmac_duplex { sbmac_duplex_auto, sbmac_duplex_half, + sbmac_duplex_full }; -typedef enum { sbmac_fc_auto, sbmac_fc_disabled, sbmac_fc_frame, - sbmac_fc_collision, sbmac_fc_carrier } sbmac_fc_t; +enum sbmac_fc { sbmac_fc_auto, sbmac_fc_disabled, sbmac_fc_frame, + sbmac_fc_collision, sbmac_fc_carrier } sbmac_fc_t; -typedef enum { sbmac_state_uninit, sbmac_state_off, sbmac_state_on, - sbmac_state_broken } sbmac_state_t; +enum sbmac_state { sbmac_state_uninit, sbmac_state_off, sbmac_state_on, + sbmac_state_broken }; /** @@ -176,55 +176,61 @@ typedef enum { sbmac_state_uninit, sbmac * DMA Descriptor structure * */ -typedef struct sbdmadscr_s { +struct sbdmadscr { uint64_t dscr_a; uint64_t dscr_b; -} sbdmadscr_t; - -typedef unsigned long paddr_t; +}; /** * DMA Controller structure * */ -typedef struct sbmacdma_s { +struct sbmacdma { /* * This stuff is used to identify the channel and the registers * associated with it. 
*/ - - struct sbmac_softc *sbdma_eth; /* back pointer to associated MAC */ - int sbdma_channel; /* channel number */ - int sbdma_txdir; /* direction (1=transmit) */ - int sbdma_maxdescr;/* total # of descriptors in ring */ + struct sbmac_softc *sbdma_eth; /* back pointer to associated + MAC */ + int sbdma_channel; /* channel number */ + int sbdma_txdir;/* direction (1=transmit) */ + int sbdma_maxdescr; /* total # of descriptors + in ring */ #ifdef CONFIG_SBMAC_COALESCE - int sbdma_int_pktcnt; /* # descriptors rx/tx before interrupt*/ - int sbdma_int_timeout; /* # usec rx/tx interrupt */ -#endif - - volatile void __iomem *sbdma_config0; /* DMA config register 0 */ - volatile void __iomem *sbdma_config1; /* DMA config register 1 */ - volatile void __iomem *sbdma_dscrbase; /* Descriptor base address */ - volatile void __iomem *sbdma_dscrcnt; /* Descriptor count register */ - volatile void __iomem *sbdma_curdscr; /* current descriptor address */ - volatile void __iomem *sbdma_oodpktlost;/* pkt drop (rx only) */ - + int sbdma_int_pktcnt; + /* # descriptors rx/tx + before interrupt */ + int sbdma_int_timeout; + /* # usec rx/tx interrupt */ +#endif + void __iomem*sbdma_config0; /* DMA config register 0 */ + void __iomem*sbdma_config1; /* DMA config register 1 */ + void __iomem*sbdma_dscrbase; + /* descriptor base address */ + void __iomem*sbdma_dscrcnt; /* descriptor count register */ + void __iomem*sbdma_curdscr; /* current descriptor + address */ + void __iomem*sbdma_oodpktlost; + /* pkt drop (rx only) */ /* * This stuff is for maintenance of the ring */ - - sbdmadscr_
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
On Sat, 2007-08-09 at 09:42 -0700, Mandeep Singh Baines wrote: > Reading the "interrupt pending" register would require an MMIO read. > MMIO reads are very expensive. In some systems the latency of an MMIO > read can be 1000x that of an L1 cache access. Indeed. > However, work_done() doesn't have to be inefficient. For newer > devices you can implement work_done() without an MMIO read by polling > the next ring entry status in memory or some other mechanism. Since > PCI is coherent, accesses to this memory location could be cached > after the first miss. For architectures where PCI is not coherent you'd > have to go to memory for every poll. So for these architectures has_work() > will be moderately expensive (memory access) even when has_work() does > not require an MMIO read. This might affect home routers: not sure if MIPS or > ARM have coherent PCI. I think the effect would be clearly experimentally observable in smaller devices, e.g. the Geode you seem to be experimenting on. One other suggestion I made in the paper is to do something along the lines of cached_irq_mask for the i8259. cheers, jamal
[PATCH] ipconfig.c: De-clutter IP configuration report
Reformat the printk() calls removing leading new-line characters, making output being done line-by-line rather than partially and defining the log level used. Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]> --- The new code builds fine; no semantic changes. Please apply, Maciej patch-mips-2.6.23-rc5-20070904-ipconfig-printk-2 diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/net/ipv4/ipconfig.c linux-mips-2.6.23-rc5-20070904/net/ipv4/ipconfig.c --- linux-mips-2.6.23-rc5-20070904.macro/net/ipv4/ipconfig.c2007-09-04 04:56:22.0 + +++ linux-mips-2.6.23-rc5-20070904/net/ipv4/ipconfig.c 2007-09-10 11:53:19.0 + @@ -1364,17 +1364,17 @@ static int __init ip_auto_config(void) /* * Clue in the operator. */ - printk("IP-Config: Complete:"); - printk("\n device=%s", ic_dev->name); - printk(", addr=%u.%u.%u.%u", NIPQUAD(ic_myaddr)); - printk(", mask=%u.%u.%u.%u", NIPQUAD(ic_netmask)); - printk(", gw=%u.%u.%u.%u", NIPQUAD(ic_gateway)); - printk(",\n host=%s, domain=%s, nis-domain=%s", - utsname()->nodename, ic_domain, utsname()->domainname); - printk(",\n bootserver=%u.%u.%u.%u", NIPQUAD(ic_servaddr)); - printk(", rootserver=%u.%u.%u.%u", NIPQUAD(root_server_addr)); - printk(", rootpath=%s", root_server_path); - printk("\n"); + pr_info("IP-Config: Complete:\n"); + pr_info(" device=%s, addr=%u.%u.%u.%u, " + "mask=%u.%u.%u.%u, gw=%u.%u.%u.%u,\n", + ic_dev->name, NIPQUAD(ic_myaddr), + NIPQUAD(ic_netmask), NIPQUAD(ic_gateway)); + pr_info(" host=%s, domain=%s, nis-domain=%s,\n", + utsname()->nodename, ic_domain, utsname()->domainname); + pr_info(" bootserver=%u.%u.%u.%u, " + "rootserver=%u.%u.%u.%u, rootpath=%s\n", + NIPQUAD(ic_servaddr), + NIPQUAD(root_server_addr), root_server_path); #endif /* !SILENT */ return 0; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc5: possible irq lock inversion dependency detected
On Sun, 2007-09-02 at 15:11 +0200, Christian Kujau wrote: > Hi, > > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep > was quite noisy when I tried to shape my external (wireless) interface: > > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock: > [ 6400.534713] (&dev->ingress_lock){-+..}, at: [] > netif_receive_skb+0x2d5/0x3c0 > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the > past: > [ 6400.535145] (police_lock){-.--} > > This happened when I executed: > http://nerdbynature.de/bits/2.6.23-rc5/qos.sh.txt > (using iproute2-ss070313). The machine is still running, I just noticed a short > hiccup, probably when it was busy writing the warning to the disk. > > More details and .config: http://nerdbynature.de/bits/2.6.23-rc5/ seems unavailable at this time, please submit the whole lockdep report if possible.
[PATCH] cfg80211: fix initialisation if built-in
When cfg80211 is built into the kernel it needs to init earlier so that device registrations are run after it has initialised. Signed-off-by: Johannes Berg <[EMAIL PROTECTED]> --- net/wireless/core.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- wireless-dev.orig/net/wireless/core.c 2007-09-10 13:00:18.567257487 +0200 +++ wireless-dev/net/wireless/core.c2007-09-10 13:00:59.837219347 +0200 @@ -363,7 +363,7 @@ out_fail_notifier: out_fail_sysfs: return err; } -module_init(cfg80211_init); +subsys_initcall(cfg80211_init); static void cfg80211_exit(void) {
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Sep 10, 2007 at 11:56:29AM +0100, Denys Vlasenko wrote: > > Expecting every driver writer to remember that atomic_read is not in fact > a "read from memory" is naive. That won't happen. Face it, majority of > driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;) > The name of the macro is saying that it's a read. > We are confusing users here. For driver authors who're too busy to learn the intricacies of atomic operations, we have the plain old spin lock which then lets you use normal data structures such as u32 safely. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [2/2] 2.6.23-rc5: known regressions with patches v2
On Mon, 2007-09-10 at 13:09 +0200, Johannes Berg wrote: > On Sat, 2007-09-08 at 13:11 +0200, Michal Piotrowski wrote: > > > Subject : Unable to access memory card reader anymore > Last known good/caused-by doesn't apply, the bug has been in there ever > since the mac80211 code was merged into the tree. Eh, no, I quoted the wrong one by accident, Subject : ifconfig eth1 - scheduling while atomic: ifconfig/0x0002/4170 is the one I was thinking of. johannes
Re: [2/2] 2.6.23-rc5: known regressions with patches v2
On Sat, 2007-09-08 at 13:11 +0200, Michal Piotrowski wrote: > Subject : Unable to access memory card reader anymore > References : http://bugzilla.kernel.org/show_bug.cgi?id=8885 > Last known good : ? > Submitter : Christian Casteyde <[EMAIL PROTECTED]> > Caused-By : ? > Handled-By : Alan Stern <[EMAIL PROTECTED]> > Patch : http://bugzilla.kernel.org/attachment.cgi?id=12438 > Status : patch available Last known good/caused-by doesn't apply, the bug has been in there ever since the mac80211 code was merged into the tree. johannes
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sunday 09 September 2007 19:18, Arjan van de Ven wrote: > On Sun, 9 Sep 2007 19:02:54 +0100 > Denys Vlasenko <[EMAIL PROTECTED]> wrote: > > > Why is all this fixation on "volatile"? I don't think > > people want "volatile" keyword per se, they want atomic_read(&x) to > > _always_ compile into a memory-accessing instruction, not register > > access. > > and ... why is that? > is there any valid, non-buggy code sequence that makes that a > reasonable requirement? Well, if you insist on having it again: Waiting for atomic value to be zero: while (atomic_read(&x)) continue; gcc may happily convert it into: reg = atomic_read(&x); while (reg) continue; Expecting every driver writer to remember that atomic_read is not in fact a "read from memory" is naive. That won't happen. Face it, majority of driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;) The name of the macro is saying that it's a read. We are confusing users here. It's doubly confusing that cpu_relax(), which says _nothing_ about barriers in its name, is actually a barrier you need to insert here. -- vda
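For what it's worth, the loop in question behaves the way Denys expects once the read is a genuine atomic load: in C11, atomic_load() is a fresh load on every iteration, which the compiler may not hoist into a register the way it legally can with a plain int. The sketch below is a userspace analogue only; the step() callback simulates the concurrent writer (in real code, another CPU or an interrupt handler) so the example stays single-threaded, and the names are invented:

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace analogue of "while (atomic_read(&x)) ...": atomic_load()
 * cannot be cached in a register across iterations, unlike a plain
 * int read.  A real busy-wait would also contain cpu_relax(). */
static atomic_int x;

/* Spin until x reaches zero, invoking step() once per pass to model
 * the other CPU's progress.  Returns the number of iterations. */
int wait_for_zero(void (*step)(void))
{
    int iters = 0;

    while (atomic_load(&x)) {   /* fresh load each pass, never "reg" */
        step();                 /* stand-in for concurrent writer */
        iters++;
    }
    return iters;
}

/* The simulated writer: one decrement per poll, like the other CPU
 * slowly draining the counter. */
static void decrement(void)
{
    atomic_fetch_sub(&x, 1);
}
```

Note this only addresses the compiler-hoisting half of the thread; Kyle's points about cache propagation delay and spinlock-based atomics apply regardless of how the load is spelled.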
Re: auto recycling of TIME_WAIT connections
Rick Jones wrote: >> The first issue, requires a large timeout, and >> the TIME_WAIT timeout is currently 60 seconds on linux. >> That timeout effectively limits the connection rate between >> local TCP clients and a server to 32k/60s or around 500 >> connections/second. > > Actually, it would be more like 60k/60s if the application were making > explicit calls to bind() as arguably it should if it is going to be > churning through so many connections. > This was an issue over a decade ago with SPECweb96 benchmarking. The > initial solution was to make the explicit bind() calls and not rely on > the anonymous/ephemeral port space. After that, one starts adding > additional IP's into the mix (at least where possible). And if that > fails, one has to go back to the beginning and ask oneself exactly why a > client is trying to churn through so many connections per second in the > first place. right. This is for benchmarking mainly. Sane applications would use persistent connections, or a different form of IPC. > > If we were slavishly conformant to the RFC's :) that 60 seconds would be > 240 seconds... > >> But that issue can't really happen when the client >> and server are on the same machine can it, and >> even if it could, the timeouts involved would be shorter. >> >> Now linux does have an (undocumented) >> /proc/sys/net/ipv4/tcp_tw_recycle flag >> to enable recycling of TIME_WAIT connections. This is global however >> and could cause >> problems in general for external connections. > > Rampant speculation begins... > > If the client can be convinced to just call shutdown(SHUT_RDWR) rather > than close(), and be the first to do so, ahead of the server, I think it > will retain a link to the TCP endpoint in TIME_WAIT. It could then, in > TCP theory, call connect() again, and go through a path that allows > transition from TIME_WAIT to ESTABLISHED if all the right things wrt > Initial Sequence Number selection happen. 
Whether randomization of the ISN allows that today is questionable. Sounds good, unfortunately connect() returns EISCONN unless you do a close(). > >> So how about auto enabling recycling for local connections? > > I think the standard response is that one can never _really_ know what > is local and what not, particularly in the presence of netfilter and the > rewriting of headers behind one's back. Hmm, I was afraid someone would say that :) thanks for the feedback, Pádraig.
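Rick's explicit-bind() suggestion can be sketched for a loopback client as below. Error handling is trimmed, the helper names are invented, and the use of SO_REUSEADDR is this sketch's assumption (it lets the port be re-bound while an older endpoint for it sits in TIME_WAIT), not code from the thread:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* A client that churns through many connections binds its local port
 * explicitly instead of drawing from the ~32k anonymous/ephemeral
 * range, which is what limits the rate to roughly 32k/60s.  Returns a
 * bound (not yet connected) socket, or -1 on error. */
int make_bound_client(uint16_t local_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    struct sockaddr_in local;

    if (fd < 0)
        return -1;
    /* Allow rebinding the port while a prior endpoint is in TIME_WAIT. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    local.sin_port = htons(local_port);
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* caller connect()s next; the local port is now fixed */
}

/* Report which local port the socket ended up bound to. */
uint16_t bound_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);

    getsockname(fd, (struct sockaddr *)&a, &len);
    return ntohs(a.sin_port);
}
```

The application then walks its own port range across connect() calls, which is the SPECweb96-era workaround Rick describes for sidestepping the ephemeral-port/TIME_WAIT bottleneck.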
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
Mandeep Singh Baines wrote: Why would using a timer to hold off the napi_complete() rather than jiffy count limit the polls per packet to 2? I was thinking a timer could be used in the way suggested in Jamal's paper. The driver would do nothing (park) until the timer expires. So there would be no calls to poll for the duration of the timer. Hence, this approach would add extra latency not present in a jiffy polling approach. Ah, ok. I wasn't planning to test timer-driven polling. :) Why wouldn't it be efficient? It would usually be done by reading an "interrupt pending" register. Reading the "interrupt pending" register would require an MMIO read. MMIO reads are very expensive. In some systems the latency of an MMIO read can be 1000x that of an L1 cache access. Agreed. Testing for any work being available should be as efficient as possible and would be driver specific. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
Andi Kleen wrote: James Chapman <[EMAIL PROTECTED]> writes: On some platforms the precise timers (like ktime_get()) can be slow, but often they are fast. It might make sense to use a shorter constant time wait on those with fast timers at least. Right now this cannot be known by portable code, but there was a proposal some time ago to export some global estimate to tell how fast ktime_get() et al. are. That could be reviewed. Interesting. Is ktime_get() fast enough on P4 systems? I'll be using those to test with. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
Jason Lunz wrote: I'd be particularly interested to see what happens to your latency when other apps are hogging the cpu. I assume from your description that your cpu is mostly free to schedule the niced softirqd for the device polling duration, but this won't always be the case. If other tasks are running at high priority, it could be nearly a full jiffy before softirqd gets to check the poll list again and the latency introduced could be much higher than you've yet measured. Indeed. The effect of cpu load on all of this is important to consider. The challenge will be how to test it fairly on different test runs. One thing to bear in mind is that interrupts are processed at highest priority, above any scheduled work. Reducing interrupt rate gives the scheduler more chance to run what it thinks is the next highest priority work. This might be at the expense of network processing. Is it better to give other runnable tasks a fair chunk of the cpu pie? I think so. I'll try to incorporate application cpu load into my tests. Thanks for your feedback. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development
Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
jamal wrote: If the problem i am trying to solve is "reduce cpu use at lower rate", then this is not the right answer because your cpu use has gone up. The problem I'm trying to solve is "reduce the max interrupt rate from NAPI drivers while minimizing latency". In modern systems, the interrupt rate can be so high that the CPU spends too much time processing interrupts, resulting in the system's behavior seen by the user being degraded. Having the poll() called when idle will always increase CPU usage. But the feedback you and others are giving encourages me to find a better compromise. At the end of the day, it's going to be a trade-off of cpu and latency. The trade-off will be different for each hardware system and application environment so parameters need to be tunable. I'll go away and do some tests. I'll post results here for discussion later. Thanks for your feedback. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development
Re: [PATCH resend] Fix a lock problem in generic phy code
Hans-Jürgen Koch <[EMAIL PROTECTED]> wrote: > > The following patch fixes it. Tested on an AT91SAM9263-EK board, kernel > 2.6.23-rc4 and -rc3-mm1. Could you please audit all instances of physdev->lock and add _bh where necessary? I can see that at least phys_stop also needs the _bh. We should also consider whether it makes sense to move the timer into a work queue. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [IPv6] BUG: NULL pointer dereference in(?) ip6_flush_pending_frames
YOSHIFUJI Hideaki / 吉藤英明: Hi, BUG: unable to handle kernel NULL pointer dereference at virtual address 008c : EIP is at ip6_flush_pending_frames+0x97/0x121 I think I've found a bug. [...] Anyway, please try this. FTR, I tried 2.6.22.6 without the patch and it failed as well. The patched kernel has been running since yesterday evening (about 8h now) and seems to be stable so far. Too early to tell for sure, but I guess we have a fix. Thanks Yoshifuji! Regards, Bernhard
[INET_DIAG]: Fix oops in netlink_rcv_skb
Fix the oidentd oops reported by Athanasius <[EMAIL PROTECTED]> in http://bugzilla.kernel.org/show_bug.cgi?id=8961 The oops is a 2.6.22 regression and triggerable by normal users. The patch applies cleanly to current -git and stable-2.6.22. [INET_DIAG]: Fix oops in netlink_rcv_skb netlink_run_queue() doesn't handle multiple processes processing the queue concurrently. Serialize queue processing in inet_diag to fix a oops in netlink_rcv_skb caused by netlink_run_queue passing a NULL for the skb. BUG: unable to handle kernel NULL pointer dereference at virtual address 0054 [349587.500454] printing eip: [349587.500457] c03318ae [349587.500459] *pde = [349587.500464] Oops: [#1] [349587.500466] PREEMPT SMP [349587.500474] Modules linked in: w83627hf hwmon_vid i2c_isa [349587.500483] CPU:0 [349587.500485] EIP:0060:[]Not tainted VLI [349587.500487] EFLAGS: 00010246 (2.6.22.3 #1) [349587.500499] EIP is at netlink_rcv_skb+0xa/0x7e [349587.500506] eax: ebx: ecx: c148d2a0 edx: c0398819 [349587.500510] esi: edi: c0398819 ebp: c7a21c8c esp: c7a21c80 [349587.500517] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [349587.500521] Process oidentd (pid: 17943, ti=c7a2 task=cee231c0 task.ti=c7a2) [349587.500527] Stack: c7a21cac f7c8ba78 c7a21ca4 c0331962 c0398819 f7c8ba00 004c [349587.500542]f736f000 c7a21cb4 c03988e3 0001 f7c8ba00 c7a21cc4 c03312a5 004c [349587.500558]f7c8ba00 c7a21cd4 c0330681 f7c8ba00 e4695280 c7a21d00 c03307c6 7fff [349587.500578] Call Trace: [349587.500581] [] show_trace_log_lvl+0x1c/0x33 [349587.500591] [] show_stack_log_lvl+0x8d/0xaa [349587.500595] [] show_registers+0x1cb/0x321 [349587.500604] [] die+0x112/0x1e1 [349587.500607] [] do_page_fault+0x229/0x565 [349587.500618] [] error_code+0x72/0x78 [349587.500625] [] netlink_run_queue+0x40/0x76 [349587.500632] [] inet_diag_rcv+0x1f/0x2c [349587.500639] [] netlink_data_ready+0x57/0x59 [349587.500643] [] netlink_sendskb+0x24/0x45 [349587.500651] [] netlink_unicast+0x100/0x116 [349587.500656] [] 
netlink_sendmsg+0x1c2/0x280 [349587.500664] [] sock_sendmsg+0xba/0xd5 [349587.500671] [] sys_sendmsg+0x17b/0x1e8 [349587.500676] [] sys_socketcall+0x230/0x24d [349587.500684] [] syscall_call+0x7/0xb [349587.500691] === [349587.500693] Code: f0 ff 4e 18 0f 94 c0 84 c0 0f 84 66 ff ff ff 89 f0 e8 86 e2 fc ff e9 5a ff ff ff f0 ff 40 10 eb be 55 89 e5 57 89 d7 56 89 c6 53 <8b> 50 54 83 fa 10 72 55 8b 9e 9c 00 00 00 31 c9 8b 03 83 f8 0f Reported by Athanasius <[EMAIL PROTECTED]> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index dbeacd8..def007e 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -836,12 +836,16 @@ static int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) return inet_diag_get_exact(skb, nlh); } +static DEFINE_MUTEX(inet_diag_mutex); + static void inet_diag_rcv(struct sock *sk, int len) { unsigned int qlen = 0; do { + mutex_lock(&inet_diag_mutex); netlink_run_queue(sk, &qlen, &inet_diag_rcv_msg); + mutex_unlock(&inet_diag_mutex); } while (qlen); }
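The shape of the fix (bracketing a non-reentrant queue walk with a mutex, as inet_diag_rcv() now does around netlink_run_queue() with inet_diag_mutex) can be sketched in userspace. The names here are invented and pthreads stand in for the kernel mutex; a counter stands in for the netlink queue:

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogue of the serialization fix: the drain step is not
 * safe for concurrent callers, so every caller takes the mutex around
 * it, just as the patch wraps netlink_run_queue(). */
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static int queue_len;       /* stands in for the netlink queue */
static int processed;       /* total messages handled */

void queue_fill(int n)
{
    pthread_mutex_lock(&queue_mutex);
    queue_len = n;
    pthread_mutex_unlock(&queue_mutex);
}

/* Drain the queue under the mutex and return how many entries this
 * caller handled.  Without the lock, two callers could both observe
 * the same entry and one would end up handed a NULL skb, which is the
 * shape of the oops above. */
int queue_drain(void)
{
    int n = 0;

    pthread_mutex_lock(&queue_mutex);
    while (queue_len > 0) {
        queue_len--;
        processed++;
        n++;
    }
    pthread_mutex_unlock(&queue_mutex);
    return n;
}
```

Under the mutex, concurrent drainers simply take turns; whichever caller enters second finds the queue already empty and handles zero entries instead of dereferencing a stale pointer.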
Re: sh: add support for ax88796 and 93cx6 to highlander boards
On Mon, Sep 10, 2007 at 03:36:26PM +0900, Magnus Damm wrote: > --- 0004/arch/sh/boards/renesas/r7780rp/setup.c > +++ work/arch/sh/boards/renesas/r7780rp/setup.c 2007-09-06 > 15:35:49.0 +0900 > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > static struct resource r8a66597_usb_host_resources[] = { > [0] = { > @@ -136,11 +137,50 @@ static struct platform_device heartbeat_ > .resource = heartbeat_resources, > }; > > +static struct ax_plat_data ax88796_platdata = { > + .flags = AXFLG_HAS_93CX6, > + .wordlength = 2, > + .dcr_val= 0x1, > + .rcr_val= 0x40, > +}; > + > +static struct resource ax88796_resources[] = { > + { > +#ifdef CONFIG_SH_R7780RP > + .start = 0xa5800400, > + .end= 0xa5800400 + (0x20 * 0x2) - 1, > +#else > + .start = 0xa4100400, > + .end= 0xa4100400 + (0x20 * 0x2) - 1, > +#endif > + .flags = IORESOURCE_MEM, > + }, > + { > + .start = IRQ_AX88796, > + .end= IRQ_AX88796, > + .flags = IORESOURCE_IRQ, > + }, > +}; > + > +static struct platform_device ax88796_device = { > + .name = "ax88796", > + .id = 0, > + > + .dev= { > + .platform_data = &ax88796_platdata, > + }, > + > + .num_resources = ARRAY_SIZE(ax88796_resources), > + .resource = ax88796_resources, > +}; > + > + > static struct platform_device *r7780rp_devices[] __initdata = { > &r8a66597_usb_host_device, > &m66592_usb_peripheral_device, > &cf_ide_device, > &heartbeat_device, > + &ax88796_device, > }; > > static int __init r7780rp_devices_setup(void) These bits I'll merge separately once the other patches have been applied. 
> --- 0001/drivers/net/Kconfig > +++ work/drivers/net/Kconfig 2007-09-06 15:35:41.0 +0900 > @@ -218,13 +218,20 @@ source "drivers/net/arm/Kconfig" > > config AX88796 > tristate "ASIX AX88796 NE2000 clone support" > - depends on ARM || MIPS > + depends on ARM || MIPS || SUPERH > select CRC32 > select MII > help > AX88796 driver, using platform bus to provide > chip detection and resources > > +config AX88796_93CX6 > + bool "ASIX AX88796 external 93CX6 eeprom support" > + depends on AX88796 > + select EEPROM_93CX6 > + help > + Select this if your platform comes with an external 93CX6 eeprom. > + > config MACE > tristate "MACE (Power Mac ethernet) support" > depends on PPC_PMAC && PPC32 There are two different changes here, these should probably be split up and applied independently of each other, given that there's no real dependency between them.