Re: [PATCH] ipv4: kernel panic when only one unsecured port available
Hi,

"Denis V. Lunev" <[EMAIL PROTECTED]> writes:
> This code is broken from the very beginning.
>
> iris den # cat /proc/sys/net/ipv4/ip_local_port_range
> 32768 61000
> iris den # echo 32768 32 >/proc/sys/net/ipv4/ip_local_port_range
> iris den # cat /proc/sys/net/ipv4/ip_local_port_range
> 32768 32
> iris den # echo 32768 61000 >/proc/sys/net/ipv4/ip_local_port_range

If you're talking about checks in sysctl, I believe that should be a separate patch for sysctl only, and I'm going to push it via the -mm tree.

The division by zero exists in inet_connection_sock.c and must be fixed for sure, because setting the same min and max port numbers in sysctl is possible and not prohibited.

Cheers!
--
Anton Arapov, <[EMAIL PROTECTED]>
GPG Key ID: 0x6FA8C812
Re: [RFC/PATCH 2/4] UDP memory usage accounting (take 4): accounting unit and variable
Hi Evgeniy,

Thank you for your comment.

> Hi.
>
> On Sat, Oct 06, 2007 at 12:01:07AM +0900, Satoshi OSHIMA ([EMAIL PROTECTED]) wrote:
>> --- 2.6.23-rc3-udp_limit.orig/net/ipv4/udp.c
>> +++ 2.6.23-rc3-udp_limit/net/ipv4/udp.c
>> @@ -113,6 +113,10 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta
>> struct hlist_head udp_hash[UDP_HTABLE_SIZE];
>> DEFINE_RWLOCK(udp_hash_lock);
>>
>> +atomic_t udp_memory_allocated;
>> +
>> +EXPORT_SYMBOL(udp_memory_allocated);
>> +
>
> Why do you export this variable?
> It is not accessed from modules in your patchset.

Good point! I'll fix it.

Satoshi Oshima
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ipv4: kernel panic when only one unsecured port available
"Denis V. Lunev" <[EMAIL PROTECTED]> writes: > Anton Arapov wrote: >> "Denis V. Lunev" <[EMAIL PROTECTED]> writes: >>> This code is broken from the very beginning. >>> >>> iris den # cat /proc/sys/net/ipv4/ip_local_port_range >>> 32768 61000 >>> iris den # echo 32768 32 >/proc/sys/net/ipv4/ip_local_port_range >>> iris den # cat /proc/sys/net/ipv4/ip_local_port_range >>> 32768 32 >>> iris den # echo 32768 61000 >/proc/sys/net/ipv4/ip_local_port_range >> >> If you're talking about checks in sysctl, I believe it should be >> another patch for sysctl only, and I'm going to push it via -mm tree. >> >> the devision by zero exists in inet_connection_socket.c, and must be >> fixed for sure because the situation with the same min and max port >> numbers in sysctl are possible and not prohibited. >> >> Cheers! > > your patch change nothing :( unfortunately. If I set '32768 32767' it > will oops again. Patch prevents the system crash. System traps on division by zero. Your case(MAX Kernel Development, Red Hat GPG Key ID: 0x6FA8C812 pgpMdgddHvlK9.pgp Description: PGP signature
Re: [PATCH] ipv4: kernel panic when only one unsecured port available
Anton Arapov wrote:
> "Denis V. Lunev" <[EMAIL PROTECTED]> writes:
>> Anton Arapov wrote:
>>> "Denis V. Lunev" <[EMAIL PROTECTED]> writes:
>>>> This code is broken from the very beginning.
>>>>
>>>> iris den # cat /proc/sys/net/ipv4/ip_local_port_range
>>>> 32768 61000
>>>> iris den # echo 32768 32 >/proc/sys/net/ipv4/ip_local_port_range
>>>> iris den # cat /proc/sys/net/ipv4/ip_local_port_range
>>>> 32768 32
>>>> iris den # echo 32768 61000 >/proc/sys/net/ipv4/ip_local_port_range
>>> If you're talking about checks in sysctl, I believe it should be
>>> another patch for sysctl only, and I'm going to push it via -mm tree.
>>>
>>> the division by zero exists in inet_connection_sock.c, and must be
>>> fixed for sure because the situation with the same min and max port
>>> numbers in sysctl are possible and not prohibited.
>>>
>>> Cheers!
>> your patch changes nothing :( unfortunately. If I set '32768 32767' it
>> will oops again.
>
> Patch prevents the system crash. System traps on division by zero.
>
> Your case(MAX
> that I have to join patch for sysctl.c to this one? It's bad idea.
>

Both versions of the settings, yours and mine, are _useless_ in real life. So, we do some sanity fixes. Am I right? If so, we must prevent all versions of OOPS (aka division by zero here).

I'll send my vision in a moment...

Regards,
Den
Re: [PATCH] ipv4: kernel panic when only one unsecured port available
From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 12:38:37 +0400

> both versions of settings, your ones and my ones are _useless_ in real
> life. So, we do some sanity fixes. Am I right? If so, we must prevent
> all versions of OOPS (aka division by zero here).
>
> I'll send my vision in a moment...

I agree with Denis that we should plug all of the holes when fixing this.
Re: [PATCH] ipv4: kernel panic when only one unsecured port available
Anton Arapov wrote:
> Hi,
>
> "Denis V. Lunev" <[EMAIL PROTECTED]> writes:
>> This code is broken from the very beginning.
>>
>> iris den # cat /proc/sys/net/ipv4/ip_local_port_range
>> 32768 61000
>> iris den # echo 32768 32 >/proc/sys/net/ipv4/ip_local_port_range
>> iris den # cat /proc/sys/net/ipv4/ip_local_port_range
>> 32768 32
>> iris den # echo 32768 61000 >/proc/sys/net/ipv4/ip_local_port_range
>
> If you're talking about checks in sysctl, I believe it should be
> another patch for sysctl only, and I'm going to push it via -mm tree.
>
> the division by zero exists in inet_connection_sock.c, and must be
> fixed for sure because the situation with the same min and max port
> numbers in sysctl are possible and not prohibited.
>
> Cheers!

Unfortunately, your patch changes nothing :( If I set '32768 32767' it will oops again.

Regards,
Den
[PATCH] division-by-zero in inet_csk_get_port
This patch fixes a possible division-by-zero in inet_csk_get_port by treating the situation low > high as if low == high.

Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
CC: Anton Arapov <[EMAIL PROTECTED]>

--- ./net/ipv4/inet_connection_sock.c.getport 2007-10-09 15:16:02.0 +0400
+++ ./net/ipv4/inet_connection_sock.c 2007-10-10 12:44:04.0 +0400
@@ -80,7 +80,14 @@ int inet_csk_get_port(struct inet_hashin
 		int low = sysctl_local_port_range[0];
 		int high = sysctl_local_port_range[1];
 		int remaining = (high - low) + 1;
-		int rover = net_random() % (high - low) + low;
+		int rover;
+
+		/* Treat low > high as high == low */
+		if (remaining <= 1) {
+			remaining = 1;
+			rover = low;
+		} else
+			rover = net_random() % (high - low) + low;

 		do {
 			head = &hashinfo->bhash[inet_bhashfn(rover, hashinfo->bhash_size)];
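To make the guard in the patch above easier to follow, here is a small user-space sketch of the same decision. It is not kernel code: `pick_first_port` and `struct port_pick` are hypothetical names, and the `rnd` parameter stands in for whatever `net_random()` would return.

```c
#include <stdint.h>

/* Hypothetical result type: where to start probing and how many ports to scan. */
struct port_pick {
    int rover;      /* first port to try */
    int remaining;  /* number of ports left to scan */
};

/* Hypothetical helper mirroring the guard from the patch.
 * `rnd` is an assumed stand-in for the value net_random() would return. */
static struct port_pick pick_first_port(int low, int high, uint32_t rnd)
{
    struct port_pick p;

    p.remaining = (high - low) + 1;
    if (p.remaining <= 1) {
        /* Treat low > high as high == low: one port, no modulo at all. */
        p.remaining = 1;
        p.rover = low;
    } else {
        /* Here high - low >= 1, so the modulo cannot divide by zero. */
        p.rover = (int)(rnd % (uint32_t)(high - low)) + low;
    }
    return p;
}
```

With a range of `32768 32768`, `32768 32767` or `32768 32`, the sketch always returns port 32768 with one port remaining, which matches the "treat low > high as high == low" comment in the patch.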
Re: [PATCH][NETNS] Make ifindex generation per-namespace
Eric W. Biederman wrote:
> Pavel Emelyanov <[EMAIL PROTECTED]> writes:
>
>> Currently indexes for netdevices come sequentially one by
>> one, and the same stays true even for devices that are
>> created for namespaces.
>>
>> Side effects of this are:
>> * lo device has not 1 index in a namespace. This may break
>>   some userspace that relies on it (and AFAIR something
>>   really broke in OpenVZ VEs without this);
>
> As it happens lo hasn't been registered first for some time
> so it hasn't had ifindex of 1 in the normal kernel.
>
>> * after some time namespaces will have devices with indexes
>>   like 100 or similar. This might be confusing for a
>>   human (tools will not mind).
>
> Only if we wind up creating that many devices.

Nope. Create and destroy new net ns for 1 times and you'll get it.

>> So move the (currently "global" and static) ifindex variable
>> on the struct net, making the indexes allocation look more
>> like on a standalone machine.
>>
>> Moreover - when we have indexes intersect between namespaces,
>> we may catch more BUGs in the future related to "wrong device
>> was found for a given index".
>
> Not yet.
>
> I know there are several data structures internal to the kernel that
> are indexed by ifindex, and not struct net_device *. There is the
> iflink field in struct net_device. We need a way to refer to network
> devices in other namespaces in rtnetlink in an unambiguous way. I
> don't see any real problems with a global ifindex assignment until
> we start migrating applications.
>
> So please hold off on this until the kernel has been audited and
> we have removed all of the uses of ifindex that assume ifindex is
> global, that we can find.

Ok.

> Right now a namespace local ifindex seems to be just asking for
> trouble.

You said the same about caching the global pid on the task_struct, but it looks like you were wrong ;) Just kidding.

> Eric
PROBLEM: skb_clone SMP race?
Hello,

I'm studying the implementation of sk_buff and I think there's a possible race condition in skb_clone (2.6.22.9).

The code is:

	struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
	{
		struct sk_buff *n;

		n = skb + 1;
		if (skb->fclone == SKB_FCLONE_ORIG &&
		    n->fclone == SKB_FCLONE_UNAVAILABLE) {
			atomic_t *fclone_ref = (atomic_t *) (n + 1);
			n->fclone = SKB_FCLONE_CLONE;
			atomic_inc(fclone_ref);
		} else {
			n = kmem_cache_alloc(skbuff_head_cache, gfp_mask);
			if (!n)
				return NULL;
			n->fclone = SKB_FCLONE_UNAVAILABLE;
		}

Suppose an skb with a fast clone available (first "if" true) has references on different CPUs (skb->users > 1); I cannot find explicit checks that make this impossible. If skb_clone is called simultaneously on that skb, both callers can get the same clone (the "fast" clone), and several problems follow: a wrong "clone_skb->users" (1 as expected by the caller, when it should really be 2), fclone_ref set to 3, causing further problems, ...

IMO, the same problem arises even when the calls to skb_clone are not simultaneous: there isn't a memory barrier after the change of "n->fclone" to guarantee the visibility of that change to other CPUs (but that barrier would not solve anything; I mention this only as another reason I see for the race to happen).

Is that correct?

Thank you in advance.

Santiago Font Arquer
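The pattern in question is a check followed by a separate store. As an illustration only, the following user-space sketch shows how such a "claim the slot" step could be made atomic with a C11 compare-and-exchange. It is not the kernel's code or a proposed fix: `struct fake_skb`, the `FCLONE_*` constants and `claim_fast_clone` are hypothetical stand-ins, and whether the kernel needs anything like this depends on whether callers are allowed to share the skb at that point, which the message above leaves open.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SKB_FCLONE_* states quoted above. */
enum { FCLONE_UNAVAILABLE = 0, FCLONE_ORIG = 1, FCLONE_CLONE = 2 };

/* Hypothetical, heavily reduced model of the companion clone slot. */
struct fake_skb {
    _Atomic int fclone;
};

/* Try to claim the fast-clone slot.
 * The test and the update happen in one atomic step, so when two
 * callers race, exactly one of them sees `true`. */
static bool claim_fast_clone(struct fake_skb *n)
{
    int expected = FCLONE_UNAVAILABLE;

    return atomic_compare_exchange_strong(&n->fclone, &expected, FCLONE_CLONE);
}
```

A caller that gets `false` would take the slow path (the `kmem_cache_alloc` branch in the quoted code) instead of reusing the slot.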
Re: [PATCH] division-by-zero in inet_csk_get_port
Ok, I've got it, so we have to do the same with the following code, quoted from inet_hashtables.c and inet6_hashtables.c. I'll prepare the patch.

And just out of curiosity: will the /* Treat low > high as high == low */ idea be kept after the sysctl is patched?

	int inet_hash_connect(struct inet_timewait_death_row *death_row,
			      struct sock *sk)
	{
		struct inet_hashinfo *hinfo = death_row->hashinfo;
		const unsigned short snum = inet_sk(sk)->num;
		struct inet_bind_hashbucket *head;
		struct inet_bind_bucket *tb;
		int ret;

		if (!snum) {
			int low = sysctl_local_port_range[0];
			int high = sysctl_local_port_range[1];
>			int range = high - low;
			int i;
			int port;
			static u32 hint;
			u32 offset = hint + inet_sk_port_offset(sk);
			struct hlist_node *node;
			struct inet_timewait_sock *tw = NULL;

			local_bh_disable();
			for (i = 1; i <= range; i++) {
>				port = low + (i + offset) % range;

"Denis V. Lunev" <[EMAIL PROTECTED]> writes:
> This patch fixed a possible division-by-zero in inet_csk_get_port
> treating situation low > high as if low == high.
>
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
> CC: Antov Arapov <[EMAIL PROTECTED]>
>
> --- ./net/ipv4/inet_connection_sock.c.getport 2007-10-09 15:16:02.0 +0400
> +++ ./net/ipv4/inet_connection_sock.c 2007-10-10 12:44:04.0 +0400
> @@ -80,7 +80,14 @@ int inet_csk_get_port(struct inet_hashin
> int low = sysctl_local_port_range[0];
> int high = sysctl_local_port_range[1];
> int remaining = (high - low) + 1;
> - int rover = net_random() % (high - low) + low;
> + int rover;
> +
> + /* Treat low > high as high == low */
> + if (remaining <= 1) {
> + remaining = 1;
> + rover = low;
> + } else
> + rover = net_random() % (high - low) + low;
>
> do {
> head = &hashinfo->bhash[inet_bhashfn(rover,
> hashinfo->bhash_size)];

--
Anton Arapov, <[EMAIL PROTECTED]>
Kernel Development, Red Hat
GPG Key ID: 0x6FA8C812
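Reading only the fragment quoted above, a `range` of zero or less means the `for` loop body never runs, so in that function the visible effect may be that no port is tried at all rather than a trap. The sketch below shows one way the candidate-port arithmetic could count the ports inclusively so that a single-port range still yields that port. It is a user-space illustration under that assumption; `candidate_port` is a hypothetical helper, not an existing kernel function.

```c
#include <stdint.h>

/* Hypothetical helper: the candidate port for step `i` of the search.
 * `offset` stands in for the per-socket offset used in the quoted loop. */
static int candidate_port(int low, int high, uint32_t i, uint32_t offset)
{
    /* Count ports inclusively: "34000 34000" is one port, not zero. */
    int count = (high - low) + 1;

    /* Treat low > high as high == low, as in the inet_csk_get_port patch. */
    if (count < 1)
        count = 1;

    /* count >= 1 here, so the modulo is always defined. */
    return low + (int)((i + offset) % (uint32_t)count);
}
```

A caller would then iterate `i` from 1 to `count` rather than from 1 to `high - low`, so that every port in the range is visited once.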
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
> A 256 entry TX hw queue fills up trivially on 1GB and 10GB, but if you

With TSO really?

> increase the size much more performance starts to go down due to L2
> cache thrashing.

Another possibility would be to consider using cache avoidance instructions while updating the TX ring (e.g. write combining on x86)

-Andi
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 11:16:44 +0200

> > A 256 entry TX hw queue fills up trivially on 1GB and 10GB, but if you
>
> With TSO really?

Yes.

> > increase the size much more performance starts to go down due to L2
> > cache thrashing.
>
> Another possibility would be to consider using cache avoidance
> instructions while updating the TX ring (e.g. write combining
> on x86)

The chip I was working with at the time (UltraSPARC-IIi) compressed all the linear stores into 64-byte full cacheline transactions via the store buffer.

It's true that it would allocate in the L2 cache on a miss, which is different from your suggestion.

In fact, such a thing might not pan out well, because most of the time you write a single descriptor or two, and that isn't a full cacheline, which means a read/modify/write is the only coherent way to make such a write to RAM.

Sure you could batch, but I'd rather give the chip work to do unless I unequivocally knew I'd have enough pending to fill a cacheline's worth of descriptors. And since you suggest we shouldn't queue in software... :-)
Re: [PATCH 1/6][NET-2.6.24] Introduce the seq_open_private()
From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 19:52:58 +0400

> This function allocates the zeroed chunk of memory and
> call seq_open(). The __seq_open_private() helper returns
> the allocated memory to make it possible for the caller
> to initialize it.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

Applied, nice cleanup Pavel.
Re: [PATCH 2/6][NET-2.6.24] Make core networking code use seq_open_private
From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 19:55:28 +0400

> This concerns the ipv4 and ipv6 code mostly, but also the netlink
> and unix sockets.
>
> The netlink code is an example of how to use the __seq_open_private()
> call - it saves the net namespace on this private.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

Applied.
Re: [PATCH 3/6][NET-2.6.24] Make netfilter code use the seq_open_private
From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 19:57:29 +0400

> Just switch to the consolidated calls.
>
> ipt_recent() has to initialize the private, so use
> the __seq_open_private() helper.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Cc: Patrick McHardy <[EMAIL PROTECTED]>

Applied.
Re: [PATCH 4/6][NET-2.6.24] Make decnet code use the seq_open_private()
From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 19:59:38 +0400

> Just switch to the consolidated code.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Cc: Patrick Caulfield <[EMAIL PROTECTED]>

Applied.
Re: [PATCH 5/6][NET-2.6.24] Make the IRDA use the seq_open_private()
From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 20:01:32 +0400

> Just switch to the consolidated code
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Cc: Samuel Ortiz <[EMAIL PROTECTED]>

Applied.
Re: [PATCH] division-by-zero in inet_csk_get_port
From: Anton Arapov <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 11:00:17 +0200

> Ok, I've got it, so we have to do the same with the following:
> quote from inet_hashtables.c and inet6_hashtables.c. I'll prepare the
> patch.
>
> And just a curious, does the /* Treat low > high as high == low */
> idea will keep after the sysctl will be patched?

I'm beginning to think that we should do the sysctl validation in this patch too, instead of duplicating this grotty check in all of these port selection functions.
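As a rough picture of what validating at write time could look like, here is a user-space sketch that rejects an inverted range before it is ever stored, so the port selection code never sees low > high. Everything in it is hypothetical: `struct port_range`, `set_port_range` and the 1..65535 bounds are assumptions for illustration and do not describe the actual sysctl handler.

```c
#include <errno.h>

/* Hypothetical holder for the two values of ip_local_port_range. */
struct port_range {
    int low;
    int high;
};

/* Hypothetical setter: accept the new range only if it is sane.
 * Returns 0 on success and -EINVAL if the range is rejected;
 * on rejection the previously stored range is left untouched. */
static int set_port_range(struct port_range *r, int low, int high)
{
    if (low < 1 || high > 65535 || low > high)
        return -EINVAL;

    r->low = low;
    r->high = high;
    return 0;
}
```

Note that a range such as `34000 34000` still passes this check, since it describes exactly one port, so the consumers of the range would still need to handle the single-port case without dividing by zero.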
Re: [RFC PATCH] [TCP]: Limit processing lost_retrans loop to work-to-do cases
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Tue, 9 Oct 2007 15:20:02 +0300

> This addition of lost_retrans_low to tcp_sock might be
> unnecessary, it's not clear how often lost_retrans worker is
> executed when there wasn't work to do.
>
> Cc: TAKANO Ryousei <[EMAIL PROTECTED]>
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

I wanted to apply this, but it doesn't go cleanly on top of net-2.6.24, can you respin this patch for me?

Thanks!
Re: [RFC PATCH] [TCP]: Limit processing lost_retrans loop to work-to-do cases
From: David Miller <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 02:44:03 -0700 (PDT)

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Tue, 9 Oct 2007 15:20:02 +0300
>
> > This addition of lost_retrans_low to tcp_sock might be
> > unnecessary, it's not clear how often lost_retrans worker is
> > executed when there wasn't work to do.
> >
> > Cc: TAKANO Ryousei <[EMAIL PROTECTED]>
> > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
>
> I wanted to apply this, but it doesn't go cleanly on top of
> net-2.6.24, can you respin this patch for me?

Never mind, I misinterpreted the ordering of the 3 patches, sorry...
Re: [PATCH] [TCP]: Separate lost_retrans loop into own function
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Tue, 9 Oct 2007 15:20:00 +0300

> Follows own function for each task principle, this is really
> somewhat separate task being done in sacktag. Also reduces
> indentation.
>
> In addition, added ack_seq local var to break some long
> lines & fixed coding style things.
>
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied, thanks Ilpo!
Re: [PATCH][NET-2.6.24] Remove double dev->flags checking when calling dev_close()
From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 14:50:54 +0400

> The unregister_netdevice() and dev_change_net_namespace()
> both check for dev->flags to be IFF_UP before calling the
> dev_close(), but the dev_close() checks for IFF_UP itself,
> so remove those unneeded checks.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

Applied, thanks Pavel.
Re: [RFC PATCH net-2.6.24 0/3]: Attempt to fix lost_retrans brokeness
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Tue, 9 Oct 2007 16:03:29 +0300 (EEST)

> On Tue, 9 Oct 2007, Ilpo Järvinen wrote:
>
> > Lost_retrans handling of sacktag was found to be flawed, two
> > problems that were found have an intertwined solution. Fastpath
> > problem has existed since hints got added and the other problem
> > has probably been there even longer than that. ...This change
> > may add non-trivial processing cost.
> >
> > Initial sketch, only compile tested. This will become more and
> > more useful, when sacktag starts to process less and less skbs,
> > which hopefully happens quite soon... :-) Sadly enough it will
> > probably then be consuming part of the benefits we're able to
> > achieve by less skb walking...
> >
> > First one is trivial, so Dave might want to apply it already.
>
> Hmm, forgot to add -n to git-format-patch. Since it's currently
> RFC, I won't bother to resubmit with numbers unless somebody
> really wants that. Here's the correct ordering, if it's not
> obvious from the patches alone:

I'm going to leave the 2nd and 3rd patches alone for now so they can cook a little bit longer.

Thanks!
Re: [PATCH 1/8][BNX2X] resubmit as attachments: add bnx2x to Kconfig and Makefile
From: "Eliezer Tamir" <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 18:20:01 +0200

> Almost all of the zero-filled tables will be removed.
> The rest of the registers do need to be initialized.
>
> I agree that the number of registers that needs to be initialized is
> huge, but that is caused by the way the hardware was designed.
>
> The values for the initialization come from several sources:
> Some are derived from HW code (the XML files used to derive the verilog
> code),
> Others (along with much of the machine generated .h files) are generated
> at microcode build time, adding a microcode routine will cause the init
> values to change, using a new variable can cause an .h file to change.
> In the last group which is very small, are registers that are controlled
> by the driver.
>
> The values in this file really are machine generated, they really are
> not meant to be modified directly by editing the file.
>
> The registers that are under the driver's control are in the main .c
> and .h files.
 ...
> The idle check code is not a manufacturing test, it is meant to help
> debug the driver and microcode.
> If the driver sends an invalid command to one of the CPUs which then
> chokes on it, this will tell you which one of them died and the general
> whereabouts of the problem. (ingress CPU X is stuck because output queue
> Y is full)
 ...
> ( Michael has showed me the trick of how to post with evolution, so I
> hope that the mangled patch problem is behind us and I think that I can
> now post everything without a problem, Hallelujah!)

Thanks for the explanations, I look forward to your next submission.
Re: [PATCH 6/6][NET-2.6.24] Make the sunrpc use the seq_open_private()
From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Tue, 09 Oct 2007 20:04:23 +0400

> Just switch to the consolidated code.
>
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Cc: Neil Brown <[EMAIL PROTECTED]>

Applied.
Re: [PATCH] [IPV6] Defer IPv6 device initialization until a valid qdisc is specified
From: Mitsuru Chinen <[EMAIL PROTECTED]>
Date: Tue, 9 Oct 2007 16:21:58 +0900

> To judge the timing for DAD, netif_carrier_ok() is used. However,
> there is a possibility that dev->qdisc stays noop_qdisc even if
> netif_carrier_ok() returns true. In that case, DAD NS is not sent out.
> We need to defer the IPv6 device initialization until a valid qdisc
> is specified.
>
> Signed-off-by: Mitsuru Chinen <[EMAIL PROTECTED]>
> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Thanks for submitting the fix.

Although Herbert is right that this does not fix the problem universally, it does make things better, so I will apply this patch.

Thanks!
Re: [PATCH] division-by-zero in inet_csk_get_port
From: Anton Arapov <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 11:56:23 +0200

> Yep, that's exactly I'm talking about. I'm sure that
> [...] % (high - low) [...] erroneous from the beginning, because
> in such places we want to have 1 in denominator, for the cases when we
> have only one port. Because 34000 34000 in sysctl's
> ip_local_port_range means 1(one) port, not 0(zero).
>
> So it seems to me that we have to fix mentioned denominators in
> kernel/net to have 1, that will be correct logically. And do the
> MAX
> From this point of view, it's best idea to have two patches: one for
> the kernel/net denominators and another one for the sysctl.c's
> function dointvec_minmax(). Because they can live independently. And
> the patch for the kernel/net will do the work at least because we
> prevent kernel trap at all.
>
> Dave, am I right?

Sure, two patches is fine.
[PATCH] natsemi: Use round_jiffies() for slow timers
Unless we have failed to fill the RX ring the timer used by the natsemi driver is not particularly urgent and can use round_jiffies() to allow grouping with other timers.

Signed-off-by: Mark Brown <[EMAIL PROTECTED]>
---
Rediffed against current netdev-2.6.git#upstream

 drivers/net/natsemi.c | 10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/natsemi.c b/drivers/net/natsemi.c
index 527f9dc..b881786 100644
--- a/drivers/net/natsemi.c
+++ b/drivers/net/natsemi.c
@@ -1576,7 +1576,7 @@ static int netdev_open(struct net_device *dev)

 	/* Set the timer to check for link beat. */
 	init_timer(&np->timer);
-	np->timer.expires = jiffies + NATSEMI_TIMER_FREQ;
+	np->timer.expires = round_jiffies(jiffies + NATSEMI_TIMER_FREQ);
 	np->timer.data = (unsigned long)dev;
 	np->timer.function = &netdev_timer; /* timer handler */
 	add_timer(&np->timer);
@@ -1856,7 +1856,11 @@ static void netdev_timer(unsigned long data)
 			next_tick = 1;
 		}
 	}
-	mod_timer(&np->timer, jiffies + next_tick);
+
+	if (next_tick > 1)
+		mod_timer(&np->timer, round_jiffies(jiffies + next_tick));
+	else
+		mod_timer(&np->timer, jiffies + next_tick);
 }

 static void dump_ring(struct net_device *dev)
@@ -3331,7 +3335,7 @@ static int natsemi_resume (struct pci_dev *pdev)
 		spin_unlock_irq(&np->lock);
 		enable_irq(dev->irq);

-		mod_timer(&np->timer, jiffies + 1*HZ);
+		mod_timer(&np->timer, round_jiffies(jiffies + 1*HZ));
 	}
 	netif_device_attach(dev);
 out:
--
1.5.3.4
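For readers unfamiliar with the idea behind rounding timer expiries, the following user-space sketch models it in a simplified form: a non-urgent expiry is moved to a whole-second boundary so that several such timers can fire together. This is only a model under stated assumptions. `round_to_second` is a hypothetical name, the `HZ` value of 250 is assumed, and details the real round_jiffies() helper may handle (per-CPU skew, counter wraparound) are deliberately left out.

```c
/* Assumed tick rate for this sketch; real kernels configure HZ at build time. */
#define HZ 250UL

/* Hypothetical model of rounding an absolute expiry `j` (in ticks) to a
 * whole-second boundary, given the current time `now` (also in ticks). */
static unsigned long round_to_second(unsigned long j, unsigned long now)
{
    unsigned long rem = j % HZ;
    unsigned long rounded;

    if (rem < HZ / 4)
        rounded = j - rem;        /* just past a boundary: round down */
    else
        rounded = j - rem + HZ;   /* otherwise round up to the next one */

    /* Never hand back an expiry that is already in the past. */
    return rounded > now ? rounded : j;
}
```

This also suggests why the patch keeps the unrounded `jiffies + next_tick` when `next_tick` is 1: an urgent one-tick retry should not be pushed out to the next second.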
Re: [RFC PATCH net-2.6.24 0/5]: TCP sacktag cache usage recoded
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 24 Sep 2007 13:28:42 +0300

> After couple of wrong-wayed before/after()s and one infinite
> loopy version, here's the current trial version of a sacktag
> cache usage recode
>
> Two first patches come from tcp-2.6 (rebased and rotated).
> This series apply cleanly only on top of the other three patch
> series I posted earlier today. The last debug patch provides
> some statistics for those interested enough.
>
> Dave, please DO NOT apply! ...Some thoughts could be nice
> though :-).

Ilpo, I have not forgotten about this patch set.

It is something I plan to look over after the madness of merging net-2.6.24 to Linus is complete.
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
On Wed, Oct 10, 2007 at 11:16:44AM +0200, Andi Kleen wrote:
> > A 256 entry TX hw queue fills up trivially on 1GB and 10GB, but if you
>
> With TSO really?

Hardware queues are generally per-page rather than per-skb so it'd fill up quicker than a software queue even with TSO.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] division-by-zero in inet_csk_get_port
David Miller <[EMAIL PROTECTED]> writes:
>> Ok, I've got it, so we have to do the same with the following:
>> quote from inet_hashtables.c and inet6_hashtables.c. I'll prepare the
>> patch.
>>
>> And just a curious, does the /* Treat low > high as high == low */
>> idea will keep after the sysctl will be patched?
>
> I'm beginning to think that we should do the sysctl validation
> in this patch too, instead of duplicating this grotty check
> in all of these port selection functions.

Yep, that's exactly what I'm talking about. I'm sure that [...] % (high - low) [...] has been erroneous from the beginning, because in such places we want to have 1 in the denominator for the case when we have only one port: 34000 34000 in sysctl's ip_local_port_range means 1 (one) port, not 0 (zero).

So it seems to me that we have to fix the mentioned denominators in kernel/net to have 1, which will be logically correct. And do the MAX

Kernel Development, Red Hat
GPG Key ID: 0x6FA8C812
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
On Wed, Oct 10, 2007 at 02:25:50AM -0700, David Miller wrote:
> The chip I was working with at the time (UltraSPARC-IIi) compressed
> all the linear stores into 64-byte full cacheline transactions via
> the store buffer.

That's a pretty old CPU. Conclusions on more modern ones might be different.

> In fact, such a thing might not pan out well, because most of the time
> you write a single descriptor or two, and that isn't a full cacheline,
> which means a read/modify/write is the only coherent way to make such
> a write to RAM.

x86 WC does R-M-W and is coherent of course. The main difference is just that the result is not cached. When the hardware accesses the cache line then the cache should be also invalidated.

> Sure you could batch, but I'd rather give the chip work to do unless
> I unequivocably knew I'd have enough pending to fill a cacheline's
> worth of descriptors. And since you suggest we shouldn't queue in
> software... :-)

Hmm, it probably would need to be coupled with batched submission if multiple packets are available, you're right. Probably not worth doing explicit queueing though.

I suppose it would be an interesting experiment at least.

-Andi
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 12:23:31 +0200

> On Wed, Oct 10, 2007 at 02:25:50AM -0700, David Miller wrote:
> > The chip I was working with at the time (UltraSPARC-IIi) compressed
> > all the linear stores into 64-byte full cacheline transactions via
> > the store buffer.
>
> That's a pretty old CPU. Conclusions on more modern ones might be different.

Cache matters, just scale the numbers.

> I suppose it would be an interesting experiment at least.

Absolutely.

I've always gotten very poor results when increasing the TX queue a lot; for example, with NIU the point of diminishing returns seems to be in the range of 256-512 TX descriptor entries, and this was with 1.6GHz CPUs.
Re: [RFC PATCH net-2.6.24 0/5]: TCP sacktag cache usage recoded
On Wed, 10 Oct 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Mon, 24 Sep 2007 13:28:42 +0300
>
> > After couple of wrong-wayed before/after()s and one infinite
> > loopy version, here's the current trial version of a sacktag
> > cache usage recode
> >
> > Two first patches come from tcp-2.6 (rebased and rotated).
> > This series apply cleanly only on top of the other three patch
> > series I posted earlier today. The last debug patch provides
> > some statistics for those interested enough.
> >
> > Dave, please DO NOT apply! ...Some thoughts could be nice
> > though :-).
>
> Ilpo, I have not forgotten about this patch set.
>
> It is something I plan to look over after the madness of merging
> net-2.6.24 to Linus is complete.

Thanks. There's probably going to be some trouble though: I'd bet it no longer applies cleanly to net-2.6.23 HEAD because of something else that got applied (I don't remember exactly, but I guess the highest_sack reno fix did that). I'll try to get them resent soon, but currently my thoughts are on solving the DSACK ignored bug (and doing the associated cleanups), which will again cause those code move conflicts to reoccur. Therefore I'd love to postpone the rebase a bit...

Hmm, the SACK code is in such flux currently that I'll have to deal with conflicts almost daily due to overlapping ideas...

--
 i.
Re: [RFC PATCH net-2.6.24 0/5]: TCP sacktag cache usage recoded
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 13:26:05 +0300 (EEST)

> Hmm, SACK code is under such flux currently that I'll
> have to deal conflicts almost daily due to overlapping ideas...

Welcome to my world, just scale it to 800 patches and the entire networking tree :-)
Re: Possible 2.6.22 -> 2.6.23 HTB regression?
Denys wrote:
>here is try to switch clocksource to acpi_pm
>
>
> Time: acpi_pm clocksource has been installed.
> Clockevents: could not switch to one-shot mode: lapic is not functional.
> Could not switch to high resolution mode on CPU 0

What does /proc/net/psched contain?
[re] Possible 2.6.22 -> 2.6.23 HTB regression?
> What does /proc/net/psched contain?

visp-1 ~ # cat /proc/net/psched
03e8 0400 000f4240 3b9aca00

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Re: [re] Possible 2.6.22 -> 2.6.23 HTB regression?
Denys wrote:
>>What does /proc/net/psched contain?
>
> visp-1 ~ # cat /proc/net/psched
> 03e8 0400 000f4240 3b9aca00

OK, hrtimers are disabled on your system, but we still announce the usec clock resolution to userspace, which is used by HTB to calculate the burst rate. But actually that can't be the reason, since that has already been the case in 2.6.22. Please post a diff of the bootlog from 2.6.22 and 2.6.23.
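For readers unfamiliar with the /proc/net/psched line, here is a small user-space sketch that decodes it. It relies on the psched_show() code quoted later in this thread, which prints NSEC_PER_USEC, PSCHED_US2NS(1), a constant, and NSEC_PER_SEC divided by the timer resolution in nanoseconds, all in hex. The struct, its field names and both functions are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the four hex fields of /proc/net/psched. */
struct psched_info {
	uint32_t ns_per_usec;	/* first field: NSEC_PER_USEC */
	uint32_t ns_per_tick;	/* second field: PSCHED_US2NS(1) */
	uint32_t constant;	/* third field: fixed value */
	uint32_t clock_hz;	/* fourth field: NSEC_PER_SEC / resolution in ns */
};

/* Parse one line such as "03e8 0400 000f4240 3b9aca00"; 0 on success. */
static int psched_parse(const char *line, struct psched_info *out)
{
	unsigned int a, b, c, d;

	if (sscanf(line, "%x %x %x %x", &a, &b, &c, &d) != 4)
		return -1;
	out->ns_per_usec = a;
	out->ns_per_tick = b;
	out->constant = c;
	out->clock_hz = d;
	return 0;
}

/* Timer resolution in nanoseconds implied by the fourth field. */
static uint32_t psched_resolution_ns(const struct psched_info *p)
{
	return p->clock_hz ? 1000000000u / p->clock_hz : 0;
}
```

For the line posted above, the fields decode to 1000, 1024, 1000000 and 1000000000, so the announced resolution is 1 ns regardless of whether high resolution timers are actually in use.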
Re: [PATCH] division-by-zero in inet_csk_get_port
David Miller <[EMAIL PROTECTED]> writes:

> From: Anton Arapov <[EMAIL PROTECTED]>
> Date: Wed, 10 Oct 2007 11:56:23 +0200
>
>> Yep, that's exactly I'm talking about. I'm sure that
>> [...] % (high - low) [...] erroneous from the begining, because
>> in such places we want to have 1 in denominator, for the cases when we
>> have only one port. Because 34000 34000 in sysctl's
>> ip_local_port_range means 1(one) port, not 0(zero).
>>
>> So it seems to me that we have to fix mentioned denominators in
>> kernel/net to have 1, that will be correct logically. And do the
>> MAX [...]
>> From this point of view, it's best idea to have two patches: one for
>> the kernel/net denominators and another one for the sysctl.c's
>> function dointvec_minmax(). Because they can live independently. And
>> the patch for the kernel/net will do the work at least because we
>> prevent kernel trap at all.
>>
>> Dave, am I right?
>
> Sure, two patches is fine.

I was mistaken. We can't modify the sysctl code itself to do checks like (MAX_VAL < MIN_VAL): we have generic functions there, and to implement something like this we would have to add completely new functionality, which would be insane. :)

It seems to me that all we can do is to make this check in the code where the MAX_VAL [...]

Kernel Development, Red Hat
GPG Key ID: 0x6FA8C812
Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Eric W. Biederman wrote:
> Before I can enable rtnetlink to work in all network namespaces
> I need to be certain that something won't break. So this
> patch deliberately disables all of the rtnetlink methods in everything
> except the initial network namespace. After the methods have been
> audited this extra check can be disabled.
>
[...]
> static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
> {
> +	struct net *net = skb->sk->sk_net;
> 	struct net_device *dev;
> 	int idx;
>

Today I read some code while grepping for 'init_net.loopback_dev' and found an issue that is interesting and, for me, non-trivial.

In TCP the network namespace is extracted from the packet in two different ways: from the socket on the outgoing path and from the device on the incoming path. However, some code is called uniformly from both the incoming and the outgoing path. A typical example is netfilter: its hooks are called uniformly all around the code. The prototype is the following:

static unsigned int reject6_target(struct sk_buff **pskb,
			   const struct net_device *in,
			   const struct net_device *out,
			   unsigned int hooknum,
			   const struct xt_target *target,
			   const void *targinfo);

So, we are bound to the following options:
- perform additional non-uniform hacks to place 'struct net' into more and more structures such as xt_target
- add a 7th parameter here and elsewhere
- introduce an skb_net field in 'struct sk_buff', making all code uniform, at least wherever we have an skb

I think that this is not the last place with such a parameter list, and we should make a decision now, while the code is not in mainline yet. As far as I understand, netfilter is not touched by Eric's patches, and we can face some non-trivial problems there.

So, if my point about uniformity is valid, this patchset looks wrong and should be re-worked :(

Regards,
	Den
Possible 2.6.22 -> 2.6.23 HTB regression?
Hi 2 all again I have shaper, running very simple HTB tree (about 10 classes). Traffic is coming via eth0, and going over eth0.1000, shaper installed on eth0.1000 (802.1Q vlan), total rate is 85Mbit/s. On kernel 2.6.22 with bnx2 eth0 everything was working fine. I have another server with similar task, with e1000, and was shaping rate lower than expected on 2.6.22. On full load it was 65-66Mbit/s instead 88Mbit/ s. So i postpone troubleshooting, and moved it as backup server. When i upgrade server with bnx2 to 2.6.23 i got the same behaviour on bnx2 too. Even with extended burst/cburst it is around 82.5Mbit/s instead 85Mbit/s. (calculation with 5 second delay) 82319/82354 KBit/S (52684221/52706721) (0/0) 82375/82368 KBit/S (52720558/52716058) (52684221/52706721) 82500/82500 KBit/S (52800425/52800425) (105404779/105422779) 82406/82406 KBit/S (52740119/52740119) (158205204/158223204) 82615/82631 KBit/S (52873964/52884464) (210945323/210963323) 82459/82464 KBit/S (52774379/52777379) (263819287/263847787) Here is tc -s -d class show dev eth0.2022 when i run without custom burst/ cburst (after 60 seconds) class htb 1:100 root rate 85000Kbit ceil 85000Kbit burst 1583b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 7 Sent 511732440 bytes 348298 pkt (dropped 0, overlimits 0 requeues 0) rate 67637Kbit 5774pps backlog 0b 0p requeues 0 lended: 4029 borrowed: 0 giants: 0 tokens: -764 ctokens: -764 class htb 1:200 parent 1:100 leaf 200: prio 0 quantum 2000 rate 5000Kbit ceil 85000Kbit burst 1600b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 0 Sent 1785787 bytes 8180 pkt (dropped 0, overlimits 0 requeues 0) rate 312608bit 162pps backlog 0b 0p requeues 0 lended: 8090 borrowed: 90 giants: 0 tokens: 2426 ctokens: 143 class htb 1:923 parent 1:920 leaf 923: prio 0 quantum 2000 rate 5000Kbit ceil 85000Kbit burst 1600b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 0 Sent 16491797 bytes 10896 pkt (dropped 0, overlimits 0 requeues 0) rate 
2171Kbit 179pps backlog 0b 0p requeues 0 lended: 10673 borrowed: 223 giants: 0 tokens: 138 ctokens: 8 class htb 1:910 parent 1:900 leaf 910: prio 0 quantum 2000 rate 5Kbit ceil 85000Kbit burst 1600b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 0 Sent 276577122 bytes 185876 pkt (dropped 7427, overlimits 0 requeues 0) rate 36497Kbit 3065pps backlog 0b 444p requeues 0 lended: 182521 borrowed: 2911 giants: 0 tokens: -455 ctokens: -117 class htb 1:922 parent 1:920 leaf 922: prio 5 quantum 2000 rate 5000Kbit ceil 85000Kbit burst 1600b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 0 Sent 56949991 bytes 37706 pkt (dropped 34663, overlimits 0 requeues 0) rate 7298Kbit 604pps backlog 0b 823p requeues 0 lended: 24844 borrowed: 12039 giants: 0 tokens: -4641 ctokens: -110 class htb 1:900 parent 1:100 rate 8Kbit ceil 85000Kbit burst 1590b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 6 Sent 509946653 bytes 340118 pkt (dropped 0, overlimits 0 requeues 0) rate 67324Kbit 5612pps backlog 0b 0p requeues 0 lended: 10152 borrowed: 3939 giants: 0 tokens: -850 ctokens: -764 class htb 1:921 parent 1:920 leaf 921: prio 5 quantum 2000 rate 2Kbit ceil 85000Kbit burst 1600b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 0 Sent 162349726 bytes 107246 pkt (dropped 51305, overlimits 0 requeues 0) rate 21252Kbit 1755pps backlog 0b 339p requeues 0 lended: 94785 borrowed: 12122 giants: 0 tokens: -1113 ctokens: -239 class htb 1:920 parent 1:900 rate 3Kbit ceil 85000Kbit burst 1593b/8 mpu 0b overhead 0b cburst 1583b/8 mpu 0b overhead 0b level 5 Sent 234033699 bytes 154686 pkt (dropped 0, overlimits 0 requeues 0) rate 30719Kbit 2538pps backlog 0b 0p requeues 0 lended: 13204 borrowed: 11180 giants: 0 tokens: -1780 ctokens: -505 Here is tc -s -d class show dev eth0.2022 when i run WITH custom burst/cburst (after 60 seconds) class htb 1:100 root rate 85000Kbit ceil 85000Kbit burst 16Kb/8 mpu 0b overhead 0b cburst 8Kb/8 mpu 0b overhead 0b level 7 
Sent 637707898 bytes 430930 pkt (dropped 0, overlimits 0 requeues 0) rate 83982Kbit 7082pps backlog 0b 0p requeues 0 lended: 22809 borrowed: 0 giants: 0 tokens: -2453 ctokens: -3206 class htb 1:200 parent 1:100 leaf 200: prio 0 quantum 2000 rate 5000Kbit ceil 85000Kbit burst 16Kb/8 mpu 0b overhead 0b cburst 8Kb/8 mpu 0b overhead 0b level 0 Sent 1266780 bytes 7041 pkt (dropped 0, overlimits 0 requeues 0) rate 128920bit 102pps backlog 0b 0p requeues 0 lended: 6983 borrowed: 58 giants: 0 tokens: 23250 ctokens: 615 class htb 1:923 parent 1:920 leaf 923: prio 0 quantum 2000 rate 5000Kbit ceil 85000Kbit burst 16Kb/8 mpu 0b overhead 0b cburst 8Kb/8 mpu 0b overhead 0b level 0 Sent 16484432 bytes 10888 pkt (dropped 0, overlimits 0 requeues 0) rate 2172Kbit 179pps backlog 0b 0p requeues 0 lended: 10888 borrowed: 0 giants: 0 tokens: 18537 ctokens: 362 class htb 1:910 parent 1:900 leaf 910: prio 0
Possible 2.6.22 -> 2.6.23 HTB regression?
P.S. dmesg Linux version 2.6.23-insat1 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1- r3)) #1 SMP Wed Oct 10 01:41:17 EEST 2007 BIOS-provided physical RAM map: BIOS-e820: 0100 - 000a (usable) BIOS-e820: 0010 - 3ffa8000 (usable) BIOS-e820: 3ffa8000 - 3ffb7c00 (ACPI data) BIOS-e820: 3ffb7c00 - 4000 (reserved) BIOS-e820: e000 - f000 (reserved) BIOS-e820: fe00 - 0001 (reserved) 127MB HIGHMEM available. 896MB LOWMEM available. found SMP MP-table at 000fe710 Entering add_active_range(0, 0, 262056) 0 entries of 256 used Zone PFN ranges: DMA 0 -> 4096 Normal 4096 -> 229376 HighMem229376 -> 262056 Movable zone start PFN for each node early_node_map[1] active PFN ranges 0:0 -> 262056 On node 0 totalpages: 262056 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 4064 pages, LIFO batch:0 Normal zone: 1760 pages used for memmap Normal zone: 223520 pages, LIFO batch:31 HighMem zone: 255 pages used for memmap HighMem zone: 32425 pages, LIFO batch:7 Movable zone: 0 pages used for memmap DMI 2.4 present. 
Using APIC driver default ACPI: RSDP 000F2620, 0024 (r2 DELL ) ACPI: XSDT 000F26A0, 004C (r1 DELL PE_SC3 1 DELL1) ACPI: FACP 000F27A8, 00F4 (r3 DELL PE_SC3 1 DELL1) ACPI: DSDT 3FFA8000, 3C53 (r1 DELL PE_SC3 1 MSFT 10E) ACPI: FACS 3FFB7C00, 0040 ACPI: APIC 000F289C, 00E0 (r1 DELL PE_SC3 1 DELL1) ACPI: SPCR 000F297D, 0050 (r1 DELL PE_SC3 1 DELL1) ACPI: HPET 000F29CD, 0038 (r1 DELL PE_SC3 1 DELL1) ACPI: MCFG 000F2A05, 003C (r1 DELL PE_SC3 1 DELL1) ACPI: PM-Timer IO Port: 0x808 ACPI: Local APIC address 0xfee0 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 15:6 APIC version 20 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) Processor #2 15:6 APIC version 20 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled) Processor #1 15:6 APIC version 20 ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled) Processor #3 15:6 APIC version 20 ACPI: LAPIC (acpi_id[0x05] lapic_id[0x14] disabled) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x15] disabled) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x16] disabled) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x17] disabled) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1]) ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 4, version 32, address 0xfec0, GSI 0-23 ACPI: IOAPIC (id[0x05] address[0xfec8] gsi_base[32]) IOAPIC[1]: apic_id 5, version 32, address 0xfec8, GSI 32-55 ACPI: IOAPIC (id[0x06] address[0xfec81000] gsi_base[64]) IOAPIC[2]: apic_id 6, version 32, address 0xfec81000, GSI 64-87 ACPI: IOAPIC (id[0x07] address[0xfec82000] gsi_base[96]) IOAPIC[3]: apic_id 7, version 32, address 0xfec82000, GSI 96-119 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) 
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Enabling APIC mode: Flat. Using 4 I/O APICs ACPI: HPET id: 0x8086a201 base: 0xfed0 Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 5000 (gap: 4000:a000) Built 1 zonelists in Zone order. Total pages: 260009 Kernel command line: root=/dev/sda3 panic=10 nmi_watchdog=1 mapped APIC to b000 (fee0) mapped IOAPIC to a000 (fec0) mapped IOAPIC to 9000 (fec8) mapped IOAPIC to 8000 (fec81000) mapped IOAPIC to 7000 (fec82000) Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Initializing CPU#0 CPU 0 irqstacks, hard=c050e000 soft=c04ee000 PID hash table entries: 4096 (order: 12, 16384 bytes) Detected 3192.148 MHz processor. Console: colour VGA+ 80x25 console [tty0] enabled Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 1033564k/1048224k available (2453k kernel code, 14048k reserved, 1305k data, 232k init, 130720k highmem) virtual kernel memory layout: fixmap : 0xffe14000 - 0xf000 (1964 kB) pkmap : 0xff80 - 0xffc0 (4096 kB) vmalloc : 0xf880 - 0xff7fe000 ( 111 MB) lowmem : 0xc000 - 0xf800 ( 896 MB) .init : 0xc04b1000 - 0xc04eb000 ( 232 kB) .data : 0xc036544c - 0xc04ab97c (1305 kB) .text : 0xc010 - 0xc036544c (2453 kB) Checking if this processor honours the
Re: Possible 2.6.22 -> 2.6.23 HTB regression?
Seems i am lost a bit. Now 2.6.22, i am not sure that working well. Possible it is related, that i booted kernel over kexec. I will try to do full power cycle reboot if required, but it will cause for me serious downtime. Please tell me, which kernel prefferable to boot? If it is interesting 2.6.22 (but also non-functional now). visp-1 ~ # cat /proc/net/psched 03e8 0400 000f4240 3b9aca00 maybe it is related to visp-1 ~ # dmesg|grep hpet hpet0: at MMIO 0xfed0, IRQs 2, 8, 0 hpet0: 3 64-bit timers, 14318180 Hz Time: hpet clocksource has been installed. hpet_resources: 0xfed0 is busy <<< - this? Diff 2.6.22 -> 2.6.23 dmesg --- log.2.6.22 2007-10-10 16:08:04.0 +0300 +++ log.2.6.23 2007-10-10 16:06:19.0 +0300 @@ -1,4 +1,4 @@ -Linux version 2.6.22-gentoo-r5-insat1 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #4 SMP Tue Sep 4 14:32:32 EEST 2007 +Linux version 2.6.23-insat1 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1- r3)) #1 SMP Wed Oct 10 01:41:17 EEST 2007 BIOS-provided physical RAM map: BIOS-e820: 0100 - 000a (usable) BIOS-e820: 0010 - 3ffa8000 (usable) @@ -14,6 +14,7 @@ DMA 0 -> 4096 Normal 4096 -> 229376 HighMem229376 -> 262056 +Movable zone start PFN for each node early_node_map[1] active PFN ranges 0:0 -> 262056 On node 0 totalpages: 262056 @@ -24,6 +25,7 @@ Normal zone: 223520 pages, LIFO batch:31 HighMem zone: 255 pages used for memmap HighMem zone: 32425 pages, LIFO batch:7 + Movable zone: 0 pages used for memmap DMI 2.4 present. Using APIC driver default ACPI: RSDP 000F2620, 0024 (r2 DELL ) @@ -74,45 +76,46 @@ ACPI: HPET id: 0x8086a201 base: 0xfed0 Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 5000 (gap: 4000:a000) -Built 1 zonelists. Total pages: 260009 +Built 1 zonelists in Zone order. 
Total pages: 260009 Kernel command line: root=/dev/sda3 panic=10 nmi_watchdog=1 -mapped APIC to d000 (fee0) -mapped IOAPIC to c000 (fec0) -mapped IOAPIC to b000 (fec8) -mapped IOAPIC to a000 (fec81000) -mapped IOAPIC to 9000 (fec82000) +mapped APIC to b000 (fee0) +mapped IOAPIC to a000 (fec0) +mapped IOAPIC to 9000 (fec8) +mapped IOAPIC to 8000 (fec81000) +mapped IOAPIC to 7000 (fec82000) Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Initializing CPU#0 -CPU 0 irqstacks, hard=c0508000 soft=c04e8000 +CPU 0 irqstacks, hard=c050e000 soft=c04ee000 PID hash table entries: 4096 (order: 12, 16384 bytes) -Detected 3192.172 MHz processor. +Detected 3192.148 MHz processor. Console: colour VGA+ 80x25 +console [tty0] enabled Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) -Memory: 1033592k/1048224k available (2456k kernel code, 14036k reserved, 1265k data, 236k init, 130720k highmem) +Memory: 1033564k/1048224k available (2453k kernel code, 14048k reserved, 1305k data, 232k init, 130720k highmem) virtual kernel memory layout: -fixmap : 0xffe16000 - 0xf000 (1956 kB) +fixmap : 0xffe14000 - 0xf000 (1964 kB) pkmap : 0xff80 - 0xffc0 (4096 kB) vmalloc : 0xf880 - 0xff7fe000 ( 111 MB) lowmem : 0xc000 - 0xf800 ( 896 MB) - .init : 0xc04a8000 - 0xc04e3000 ( 236 kB) - .data : 0xc03660d1 - 0xc04a289c (1265 kB) - .text : 0xc010 - 0xc03660d1 (2456 kB) + .init : 0xc04b1000 - 0xc04eb000 ( 232 kB) + .data : 0xc036544c - 0xc04ab97c (1305 kB) + .text : 0xc010 - 0xc036544c (2453 kB) Checking if this processor honours the WP bit even in supervisor mode... Ok. SLUB: Genslabs=22, HWalign=64, Order=0-1, MinObjects=4, CPUs=4, Nodes=1 hpet0: at MMIO 0xfed0, IRQs 2, 8, 0 hpet0: 3 64-bit timers, 14318180 Hz -Calibrating delay using timer specific routine.. 6388.07 BogoMIPS (lpj=3194038) +Calibrating delay using timer specific routine.. 
6388.11 BogoMIPS (lpj=3194056) Mount-cache hash table entries: 512 -CPU: After generic identify, caps: bfebfbff 2010 e43d 0001 +CPU: After generic identify, caps: bfebfbff 2010 e43d 0001 monitor/mwait feature present. using mwait in idle threads. CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 2048K CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 -CPU: After all inits, caps: bfebfbff 2010 b180 e43d 0001 +CPU: After all inits, caps: bfebfbff 2010 b180 e43d 0001 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU0: Intel P4/Xeon Extended MCE MSRs (24) available @@ -123,114 +126,81 @@ ACPI: Core revision 20070126 Parsing
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
On Wed, 2007-10-10 at 03:44 -0700, David Miller wrote:

> I've always gotten very poor results when increasing the TX queue a
> lot, for example with NIU the point of diminishing returns seems to
> be in the range of 256-512 TX descriptor entries and this was with
> 1.6Ghz cpus.

Is it interrupt per packet? From my experience, you may find interesting results by varying the tx interrupt mitigation parameters in addition to the ring parameters. Unfortunately, when you do that, the optimal parameters also depend on packet size, so what works for 64B won't work well for 1400B.

cheers,
jamal
Re: [re] Possible 2.6.22 -> 2.6.23 HTB regression?
On Wed, 2007-10-10 at 14:45 +0200, Patrick McHardy wrote:

> OK, hrtimers are disabled on your system, but we still announce
> the usec clock resolution to userspace, which is used by HTB to
> calculate the burst rate. But actually that can't be the reason
> since that has already been the case in 2.6.22. Please post a diff
> of the bootlog from 2.6.22 and 2.6.23.

Any possible relation to the clock source? The logs seem to indicate the acpi source; how does tsc or jiffies do? BTW, I could be wrong about this, but IIRC on a Xeon I had access to, I saw that I could not guarantee the same clock source would be selected across reboots with about 2.6.22.

cheers,
jamal
Re: Possible 2.6.22 -> 2.6.23 HTB regression?
Denys wrote: > Seems i am lost a bit. Now 2.6.22, i am not sure that working well. Possible > it is related, that i booted kernel over kexec. Possibly. > I will try to do full power cycle reboot if required, but it will cause for > me serious downtime. Please tell me, which kernel prefferable to boot? > > If it is interesting > 2.6.22 (but also non-functional now). > > visp-1 ~ # cat /proc/net/psched > 03e8 0400 000f4240 3b9aca00 > > maybe it is related to > visp-1 ~ # dmesg|grep hpet > hpet0: at MMIO 0xfed0, IRQs 2, 8, 0 > hpet0: 3 64-bit timers, 14318180 Hz > Time: hpet clocksource has been installed. > hpet_resources: 0xfed0 is busy <<< - this? Thats appears on both 2.6.22 and 2.6.23. > --- log.2.6.22 2007-10-10 16:08:04.0 +0300 > +++ log.2.6.23 2007-10-10 16:06:19.0 +0300 > @@ -314,12 +235,20 @@ > usbcore: registered new device driver usb > PCI: Using ACPI for IRQ routing > PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a > report > +Time: hpet clocksource has been installed. > +Clockevents: could not switch to one-shot mode: lapic is not functional. > +Could not switch to high resolution mode on CPU 0 > +Clockevents: could not switch to one-shot mode:<6>Clockevents: could not > switch to one-shot mode: lapic is not functional. > + lapic is not functional. > +Could not switch to high resolution mode on CPU 2 > +Could not switch to high resolution mode on CPU 3 > +Clockevents: could not switch to one-shot mode: lapic is not functional. > +Could not switch to high resolution mode on CPU 1 > pnp: 00:08: ioport range 0x800-0x87f has been reserved > pnp: 00:08: ioport range 0x880-0x8bf has been reserved > pnp: 00:08: ioport range 0x8c0-0x8df has been reserved > pnp: 00:08: ioport range 0x8e0-0x8e3 has been reserved > pnp: 00:08: ioport range 0xc00-0xc7f has been reserved > -Time: hpet clocksource has been installed. 
> pnp: 00:08: ioport range 0xca0-0xca7 has been reserved > pnp: 00:08: ioport range 0xca9-0xcab has been reserved > pnp: 00:08: ioport range 0xcad-0xcaf has been reserved > Real Time Clock Driver v1.12ac > -[ACPI Debug] String: [0x09] "HPET _CRS" > -[ACPI Debug] Buffer: [0x1C] > hpet_resources: 0xfed0 is busy > -ACPI Error (utglobal-0126): Unknown exception code: 0xFFF0 [20070126] > intel_rng: FWH not detected > Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 > seconds). > Hangcheck: Using get_cycles(). > -input: Power Button (FF) as /class/input/input0 > -ACPI: Power Button (FF) [PWRF] > Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled > +serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A > +serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A > 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A > 00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A > +Clockevents: could not switch to one-shot mode:<6>Clockevents: could not > switch to one-shot mode:<6>Clockevents: could not switch to one-shot mode: > lapic is not functional. > + lapic is not functional. > +Could not switch to high resolution mode on CPU 3 > +Could not switch to high resolution mode on CPU 2 > +Clockevents: could not switch to one-shot mode: lapic is not functional. > +Could not switch to high resolution mode on CPU 0 > + lapic is not functional. > +Could not switch to high resolution mode on CPU 1 hrtimers seem to have worked on your system in 2.6.22 and not in 2.6.23 anymore. This patch should fix the incorrectly announced /proc/net/psched timer resolution I mentioned earlier, causing HTB to use larger burst rates by default, but that still won't be as precise as with hrtimers. Looking at the code, the reason for not using the lapic seems to be nmi_watchdog=1: +APIC timer registered as dummy, due to nmi_watchdog=1! 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index dee0d5f..8f1bcf6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1225,10 +1225,13 @@ EXPORT_SYMBOL(tcf_destroy_chain);
 #ifdef CONFIG_PROC_FS
 static int psched_show(struct seq_file *seq, void *v)
 {
+	struct timespec ts;
+
+	hrtimer_get_res(CLOCK_MONOTONIC, &ts);
 	seq_printf(seq, "%08x %08x %08x %08x\n",
 		   (u32)NSEC_PER_USEC, (u32)PSCHED_US2NS(1),
 		   1000000,
-		   (u32)NSEC_PER_SEC/(u32)ktime_to_ns(KTIME_MONOTONIC_RES));
+		   (u32)NSEC_PER_SEC/(u32)ktime_to_ns(timespec_to_ktime(ts)));
 
 	return 0;
 }
Re: Possible 2.6.22 -> 2.6.23 HTB regression?
I did a complete reboot (without kexec) into 2.6.23 (same configuration) and it seems to be working better (not stuck at 60-70Mbit/s as before).

In all cases current_clocksource was hpet; I only tried changing it, which did not help at all.

visp-1 ~ # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
visp-1 ~ # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm jiffies tsc

Output from the pcap analyser I wrote (it just filters by the expression "ip" and counts bytes on eth0.1000):

82957/82957 KBit/S (53092530/53092530) (3887135786/3889389904)
82931/82931 KBit/S (53076469/53076469) (3940228316/3942482434)
82965/82965 KBit/S (53097615/53097615) (3993304785/3995558903)
82946/82946 KBit/S (53085988/53085988) (4046402400/4048656518)
82867/82867 KBit/S (53035341/53035341) (4099488388/4101742506)
82941/82941 KBit/S (53082260/53082260) (4152523729/4154777847)
82952/82952 KBit/S (53089348/53089348) (4205605989/4207860107)
82948/82945 KBit/S (53086915/53085415) (4258695337/4260949455)

visp-1 ~ # cat /proc/net/psched
03e8 0400 000f4240 3b9aca00

How can I help more?

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
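The analyser figures in the message above are consistent with a plain bytes-to-rate conversion. The sketch below rests on two assumptions that are inferred from the numbers rather than stated by the author: each sample covers the 5-second window mentioned earlier in the thread, and one "KBit" is 1024 bits. Both helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rate in units of 1024 bits per second for a byte
 * count sampled over interval_sec seconds (assumed analyser convention).
 */
static uint64_t kbit1024_per_sec(uint64_t bytes, unsigned int interval_sec)
{
	return bytes * 8 / interval_sec / 1024;
}

/*
 * Hypothetical helper: the same sample expressed in units of 1000 bits
 * per second.
 */
static uint64_t kbit1000_per_sec(uint64_t bytes, unsigned int interval_sec)
{
	return bytes * 8 / interval_sec / 1000;
}
```

Under these assumptions 53092530 bytes in 5 seconds give 82957, matching the first line of the output, while the same sample is 84948 in units of 1000 bits. If the shaper's 85000Kbit is counted in units of 1000 bits, the second figure is the one to compare against it.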
Re: [re] Possible 2.6.22 -> 2.6.23 HTB regression?
Patch applied. Rebooted over kexec into 2.6.23 without nmi_watchdog; for now all seems fine.

visp-1 ~ # cat /proc/net/psched
03e8 0400 000f4240 3b9aca00

82942/82949 KBit/S (53083445/53087945) (106109614/106105114)
82955/82955 KBit/S (53091631/53091631) (159193059/159193059)
82955/82955 KBit/S (53091351/53091351) (212284690/212284690)
82951/82951 KBit/S (53088902/53088902) (265376041/265376041)
82940/82940 KBit/S (53081605/53081605) (318464943/318464943)
82959/82959 KBit/S (53094269/53094269) (371546548/371546548)
81596/81596 KBit/S (52221918/52221918) (424640817/424640817)
82909/82909 KBit/S (53062055/53062055) (476862735/476862735)
82939/82939 KBit/S (53081402/53081402) (529924790/529924790)
82963/82963 KBit/S (53096554/53096554) (583006192/583006192)
82954/82954 KBit/S (53090871/53090871) (636102746/636102746)
82030/82943 KBit/S (52499816/53084066) (689193617/689193617)
82945/82945 KBit/S (53085182/53085182) (741693433/742277683)
82964/82954 KBit/S (53097002/53091002) (794778615/795362865)

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Denis V. Lunev wrote:
> Eric W. Biederman wrote:
>> Before I can enable rtnetlink to work in all network namespaces
>> I need to be certain that something won't break. So this
>> patch deliberately disables all of the rtnetlink methods in everything
>> except the initial network namespace. After the methods have been
>> audited this extra check can be disabled.
>>
> [...]
>>  static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
>>  {
>> +	struct net *net = skb->sk->sk_net;
>>  	struct net_device *dev;
>>  	int idx;
>
> I've read some code today grepping 'init_net.loopback_dev' and found
> an interesting issue that is non-trivial for me.
>
> Network namespace is extracted from the packet in two different ways in
> TCP. This is a socket for outgoing path and a device for incoming.
> Though, there are some places called uniformly both from incoming and
> outgoing path. Typical example is netfilters. They are called uniformly
> all around the code. The prototype is the following:
>
> static unsigned int reject6_target(struct sk_buff **pskb,
>                                    const struct net_device *in,
>                                    const struct net_device *out,
>                                    unsigned int hooknum,
>                                    const struct xt_target *target,
>                                    const void *targinfo);

Thanks Denis for auditing the code.

As far as I see, struct net_device *in is NULL for outgoing traffic and
struct net_device *out is NULL for ingress traffic. Except for the
FORWARD rules where both are filled.
If we are following network namespace semantic, we should not have two
network devices belonging to two different namespaces, right?

In this case, the following line of code should be sufficient to
retrieve the network namespace, no?

    struct net *net = in?in->nd_net:out->nd_net;

> So, we are bound to the following options:
> - perform additional non-uniform hacks around to place 'struct net' into
>   other and other structures like xt_target
> - add 7th parameter here and over
> - introduce an skb_net field in the 'struct sk_buff' making all code
>   uniform, at least when we have an skb
>
> I think that this is not the last place with such a parameter list and
> we should make a decision at this point when the code is not mainline
> yet.
>
> As far as I understand, netfilters are not touched by the Eric and we
> can face some non-trivial problems there.

In Eric's git tree:
http://git.kernel.org/?p=linux/kernel/git/ebiederm/linux-2.6-netns.git

There are some modifications concerning
net/ipv4/netfilter/iptable_filter.c and at the ipt_hook function, there is:

    struct net *net = (in?in:out)->nd_net;

> So, if my point about uniformity is valid, this patchset looks wrong and
> should be re-worked :(

As Eric said, we want to build the network namespace step by step,
taking care of not breaking the init network namespace.

If you want to make iptables per namespace or catch problems before the
code goes to Dave's tree, IMHO it will be more convenient to post to
containers@ the patches against netns49, where the modifications will be
in a network namespace big picture.

Regards.
-- Daniel
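For reference, the one-line lookup quoted above can be tried in isolation. This is only a user-space sketch: `struct net` and `struct net_device` below are stand-in types carrying nothing but the `nd_net` field mentioned in the thread, and `hook_net()` is a hypothetical helper name, not a kernel function.

```c
#include <stddef.h>

/* Stand-in types: only the nd_net field from the thread is modelled. */
struct net {
	int id;
};

struct net_device {
	struct net *nd_net;
};

/*
 * Hypothetical helper: take the namespace from whichever device is set.
 * 'in' is NULL on the output path and 'out' is NULL on the input path.
 * On the forward path both are set and are assumed here to belong to
 * the same namespace, which is exactly the assumption under discussion.
 */
static struct net *hook_net(const struct net_device *in,
			    const struct net_device *out)
{
	return (in ? in : out)->nd_net;
}
```

The helper dereferences `out` without a check when `in` is NULL, so it relies on at least one of the two devices being set, as the hook prototype implies.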
[PATCH] ip_local_port_range low > high check
This patch adds a low > high check for ip_local_port_range.

Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 53ef0f4..686c0a4 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -186,6 +186,61 @@ static int strategy_allowed_congestion_control(ctl_table *table, int __user *nam
 
 }
 
+static int proc_port_range(ctl_table *table, int write, struct file *filp,
+		void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+	ctl_table tbl = {
+		.maxlen = sizeof(sysctl_local_port_range),
+		.extra1 = ip_local_port_range_min,
+		.extra2 = ip_local_port_range_max
+	};
+	tbl.data = kmalloc(tbl.maxlen, GFP_USER);
+	if (tbl.data == NULL)
+		return -ENOMEM;
+	memcpy(tbl.data, sysctl_local_port_range, tbl.maxlen);
+
+	ret = proc_dointvec_minmax(&tbl, write, filp, buffer, lenp, ppos);
+	if (write && ret == 0) {
+		int *data = (int *)tbl.data;
+		if (data[0] > data[1])
+			ret = -EINVAL;
+		else
+			memcpy(sysctl_local_port_range, data,
+					sizeof(sysctl_local_port_range));
+	}
+	kfree(tbl.data);
+	return ret;
+}
+
+int sysctl_strategy_port_range(ctl_table *table, int __user *name, int nlen,
+		void __user *oldval, size_t __user *oldlenp,
+		void __user *newval, size_t newlen)
+{
+	int ret;
+	ctl_table tbl = {
+		.maxlen = sizeof(sysctl_local_port_range),
+		.extra1 = ip_local_port_range_min,
+		.extra2 = ip_local_port_range_max
+	};
+	tbl.data = kmalloc(tbl.maxlen, GFP_USER);
+	if (tbl.data == NULL)
+		return -ENOMEM;
+	memcpy(tbl.data, sysctl_local_port_range, tbl.maxlen);
+
+	ret = sysctl_intvec(&tbl, name, nlen, oldval, oldlenp, newval, newlen);
+	if (ret == 0 && newval && newlen) {
+		int *data = (int *)tbl.data;
+		if (data[0] > data[1])
+			ret = -EINVAL;
+		else
+			memcpy(sysctl_local_port_range, data,
+					sizeof(sysctl_local_port_range));
+	}
+	kfree(tbl.data);
+	return ret;
+}
+
 ctl_table ipv4_table[] = {
 	{
 		.ctl_name	= NET_IPV4_TCP_TIMESTAMPS,
@@ -427,8 +482,8 @@ ctl_table ipv4_table[] = {
 		.data		= &sysctl_local_port_range,
 		.maxlen		= sizeof(sysctl_local_port_range),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec_minmax,
-		.strategy	= &sysctl_intvec,
+		.proc_handler	= &proc_port_range,
+		.strategy	= &sysctl_strategy_port_range,
 		.extra1		= ip_local_port_range_min,
 		.extra2		= ip_local_port_range_max
 	},
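The logic of the new handlers can be illustrated outside the kernel. The following is a user-space sketch, not the sysctl code itself: `set_port_range()` and the `local_port_range` array are hypothetical names, and the per-element bounds 1 and 65535 are assumed from the `ip_local_port_range_min`/`ip_local_port_range_max` tables. As in the patch, the new values are checked on a temporary copy and only copied into place when low <= high.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for sysctl_local_port_range. */
static int local_port_range[2] = { 32768, 61000 };

/*
 * Validate a new {low, high} pair on a temporary copy and commit it
 * only if it is acceptable. Returns 0 on success, -EINVAL otherwise.
 */
static int set_port_range(const int new_range[2])
{
	int tmp[2];

	memcpy(tmp, new_range, sizeof(tmp));

	/* Per-element bounds: the part proc_dointvec_minmax() covers. */
	if (tmp[0] < 1 || tmp[0] > 65535 || tmp[1] < 1 || tmp[1] > 65535)
		return -EINVAL;

	/* The check added by the patch: reject low > high. */
	if (tmp[0] > tmp[1])
		return -EINVAL;

	memcpy(local_port_range, tmp, sizeof(local_port_range));
	return 0;
}
```

With this shape, writing `32768 32` is rejected and leaves the previous range untouched, while `32768 32768` is still accepted because equal bounds are not prohibited.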
[PATCH] do not give access to 1-1024 ports for autobinding
This patch prevents the 1-1024 port range from being given out for autobinding. A minimum of {1, 1} may only make sense for deeply embedded systems.

Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

--- ./net/ipv4/sysctl_net_ipv4.c.port2	2007-10-10 17:46:48.0 +0400
+++ ./net/ipv4/sysctl_net_ipv4.c	2007-10-10 18:08:00.0 +0400
@@ -25,7 +25,7 @@ extern int sysctl_ip_nonlocal_bind;
 #ifdef CONFIG_SYSCTL
 static int zero;
 static int tcp_retr1_max = 255;
-static int ip_local_port_range_min[] = { 1, 1 };
+static int ip_local_port_range_min[] = { 1024, 1024 };
 static int ip_local_port_range_max[] = { 65535, 65535 };
 #endif
 
[NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched
Fix incorrect HTB burst rate calculation in userspace when clock and timer resolution differ. I guess this should go in stable 2.6.22/23 as well.

[NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched

The fourth parameter of /proc/net/psched is supposed to show the timer resolution and is used by HTB userspace to calculate the necessary burst rate. Currently we show the clock resolution, which results in a too low burst rate when the two differ.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit a3885788169f2f70634f8142344e5131ccf32595
tree 62bcf28c9706547228521dc4402ebea273326331
parent 0e52ab8ceb41df2104279938484267ab474286d1
author Patrick McHardy <[EMAIL PROTECTED]> Wed, 10 Oct 2007 16:29:14 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Wed, 10 Oct 2007 16:29:14 +0200

 net/sched/sch_api.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index dee0d5f..8f1bcf6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1225,10 +1225,13 @@ EXPORT_SYMBOL(tcf_destroy_chain);
 #ifdef CONFIG_PROC_FS
 static int psched_show(struct seq_file *seq, void *v)
 {
+	struct timespec ts;
+
+	hrtimer_get_res(CLOCK_MONOTONIC, &ts);
 	seq_printf(seq, "%08x %08x %08x %08x\n",
 		   (u32)NSEC_PER_USEC, (u32)PSCHED_US2NS(1),
 		   1000000,
-		   (u32)NSEC_PER_SEC/(u32)ktime_to_ns(KTIME_MONOTONIC_RES));
+		   (u32)NSEC_PER_SEC/(u32)ktime_to_ns(timespec_to_ktime(ts)));
 
 	return 0;
 }
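To make the four fields concrete, here is a small user-space sketch that parses one line of /proc/net/psched and recovers the timer resolution from the fourth field by inverting the division done in psched_show(). `struct psched_info`, `parse_psched()` and `timer_res_ns()` are hypothetical names chosen for this example; this is not the code tc actually uses.

```c
#include <stdio.h>

#define NSEC_PER_SEC 1000000000u

/* The four hexadecimal fields of /proc/net/psched, in file order. */
struct psched_info {
	unsigned int f[4];
};

/* Parse one line; returns 0 on success, -1 if fewer than 4 fields. */
static int parse_psched(const char *line, struct psched_info *p)
{
	int n = sscanf(line, "%x %x %x %x",
		       &p->f[0], &p->f[1], &p->f[2], &p->f[3]);

	return n == 4 ? 0 : -1;
}

/*
 * The fourth field is NSEC_PER_SEC divided by the resolution in ns,
 * so dividing again gives the resolution back (0 if the field is 0).
 */
static unsigned int timer_res_ns(const struct psched_info *p)
{
	return p->f[3] ? NSEC_PER_SEC / p->f[3] : 0;
}
```

For the line `03e8 0400 000f4240 3b9aca00` from Denys's test report, the fourth field is 0x3b9aca00 = 1000000000, which corresponds to a resolution of 1 ns.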
Re: [PATCH] Evict tmp variable from the stack in ip6_evictor
Pavel Emelyanov wrote:
> The list_head *tmp is used to help getting the first entry in
> the ip6_frag_lru_list list. There is a simpler way to do it

The exact same code exists in ip_fragment.c and nf_conntrack_reasm.c, please also change it there.
[0/7] IPsec: More input/output clean-ups
Hi Dave:

Here are a few more clean-ups on the IPsec input/output path.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH 1/7] [IPSEC] esp: Remove NAT-T checksum invalidation for BEET
[IPSEC] esp: Remove NAT-T checksum invalidation for BEET

I pointed this out back when this patch was first proposed but it looks like it got lost along the way.

The checksum only needs to be ignored for NAT-T in transport mode where we lose the original inner addresses due to NAT. With BEET the inner addresses will be intact so the checksum remains valid.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
---

 net/ipv4/esp4.c |    3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 452910d..1af332d 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -261,8 +261,7 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 		 * as per draft-ietf-ipsec-udp-encaps-06,
 		 * section 3.1.2
 		 */
-		if (x->props.mode == XFRM_MODE_TRANSPORT ||
-		    x->props.mode == XFRM_MODE_BEET)
+		if (x->props.mode == XFRM_MODE_TRANSPORT)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 	}
 
[PATCH 2/7] [IPSEC] beet: Fix extension header support on output
[IPSEC] beet: Fix extension header support on output The beet output function completely kills any extension headers by replacing them with the IPv6 header. This is because it essentially ignores the result of ip6_find_1stfragopt by simply acting as if there aren't any extension headers. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- net/ipv6/xfrm6_mode_beet.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv6/xfrm6_mode_beet.c b/net/ipv6/xfrm6_mode_beet.c index 65e6b2a..d9366df 100644 --- a/net/ipv6/xfrm6_mode_beet.c +++ b/net/ipv6/xfrm6_mode_beet.c @@ -44,9 +44,9 @@ static int xfrm6_beet_output(struct xfrm_state *x, struct sk_buff *skb) hdr_len = ip6_find_1stfragopt(skb, &prevhdr); memmove(skb->data, iph, hdr_len); - skb_set_mac_header(skb, offsetof(struct ipv6hdr, nexthdr)); + skb_set_mac_header(skb, (prevhdr - x->props.header_len) - skb->data); skb_reset_network_header(skb); - skb_set_transport_header(skb, sizeof(struct ipv6hdr)); + skb_set_transport_header(skb, hdr_len); top_iph = ipv6_hdr(skb); ipv6_addr_copy(&top_iph->saddr, (struct in6_addr *)&x->props.saddr); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] [IPSEC]: Set skb->data to payload in x->mode->output
[IPSEC]: Set skb->data to payload in x->mode->output This patch changes the calling convention so that on entry from x->mode->output and before entry into x->type->output skb->data will point to the payload instead of the IP header. This is essentially a redistribution of skb_push/skb_pull calls with the aim of minimising them on the common path of tunnel + ESP. It'll also let us use the same calling convention between IPv4 and IPv6 with the next patch. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- net/ipv4/ah4.c |1 + net/ipv4/esp4.c |6 ++ net/ipv4/ipcomp.c |1 + net/ipv4/xfrm4_mode_beet.c |5 +++-- net/ipv4/xfrm4_mode_transport.c |4 ++-- net/ipv4/xfrm4_mode_tunnel.c|3 +-- net/ipv4/xfrm4_tunnel.c |1 + net/ipv6/ah6.c |1 + net/ipv6/esp6.c |9 ++--- net/ipv6/ipcomp6.c |5 - net/ipv6/mip6.c |2 ++ net/ipv6/xfrm6_mode_beet.c | 13 +++-- net/ipv6/xfrm6_mode_ro.c| 12 ++-- net/ipv6/xfrm6_mode_transport.c | 12 ++-- net/ipv6/xfrm6_mode_tunnel.c| 13 +++-- net/ipv6/xfrm6_tunnel.c |1 + 16 files changed, 47 insertions(+), 42 deletions(-) diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index 3513149..dbb1f11 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -66,6 +66,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb) charbuf[60]; } tmp_iph; + skb_push(skb, -skb_network_offset(skb)); top_iph = ip_hdr(skb); iph = &tmp_iph.iph; diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 1af332d..0f5e838 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -28,9 +28,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) int alen; int nfrags; - /* Strip IP+ESP header. 
*/ - __skb_pull(skb, skb_transport_offset(skb)); - /* Now skb is pure payload to encrypt */ + /* skb is pure payload to encrypt */ err = -ENOMEM; @@ -60,7 +58,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) tail[clen - skb->len - 2] = (clen - skb->len) - 2; pskb_put(skb, trailer, clen - skb->len); - __skb_push(skb, -skb_network_offset(skb)); + skb_push(skb, -skb_network_offset(skb)); top_iph = ip_hdr(skb); esph = (struct ip_esp_hdr *)(skb_network_header(skb) + top_iph->ihl * 4); diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c index e787044..1929d45 100644 --- a/net/ipv4/ipcomp.c +++ b/net/ipv4/ipcomp.c @@ -134,6 +134,7 @@ static int ipcomp_output(struct xfrm_state *x, struct sk_buff *skb) int hdr_len = 0; struct iphdr *iph = ip_hdr(skb); + skb_push(skb, -skb_network_offset(skb)); iph->tot_len = htons(skb->len); hdr_len = iph->ihl * 4; if ((skb->len - hdr_len) < ipcd->threshold) { diff --git a/net/ipv4/xfrm4_mode_beet.c b/net/ipv4/xfrm4_mode_beet.c index a73e710..77888f5 100644 --- a/net/ipv4/xfrm4_mode_beet.c +++ b/net/ipv4/xfrm4_mode_beet.c @@ -40,10 +40,11 @@ static int xfrm4_beet_output(struct xfrm_state *x, struct sk_buff *skb) if (unlikely(optlen)) hdrlen += IPV4_BEET_PHMAXLEN - (optlen & 4); - skb_push(skb, x->props.header_len - IPV4_BEET_PHMAXLEN + hdrlen); - skb_reset_network_header(skb); + skb_set_network_header(skb, IPV4_BEET_PHMAXLEN - x->props.header_len - + hdrlen); top_iph = ip_hdr(skb); skb->transport_header += sizeof(*iph) - hdrlen; + __skb_pull(skb, sizeof(*iph) - hdrlen); memmove(top_iph, iph, sizeof(*iph)); if (unlikely(optlen)) { diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c index 6010471..10499d2 100644 --- a/net/ipv4/xfrm4_mode_transport.c +++ b/net/ipv4/xfrm4_mode_transport.c @@ -27,8 +27,8 @@ static int xfrm4_transport_output(struct xfrm_state *x, struct sk_buff *skb) int ihl = iph->ihl * 4; skb->transport_header = skb->network_header + ihl; - skb_push(skb, x->props.header_len); - 
skb_reset_network_header(skb); + skb_set_network_header(skb, -x->props.header_len); + __skb_pull(skb, ihl); memmove(skb_network_header(skb), iph, ihl); return 0; } diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c index 9963700..bac1a91 100644 --- a/net/ipv4/xfrm4_mode_tunnel.c +++ b/net/ipv4/xfrm4_mode_tunnel.c @@ -49,8 +49,7 @@ static int xfrm4_tunnel_output(struct xfrm_state *x, struct sk_buff *skb) iph = ip_hdr(skb); skb->transport_header = skb->network_header; - skb_push(skb, x->props.header_len); - skb_reset_network_header(skb); + skb_set_network_header(skb, -x->props.header_len); top_iph = ip_hdr(skb); top_i
[PATCH 5/7] [IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr
[IPSEC]: Get rid of ipv6_{auth,esp,comp}_hdr This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since they're identical to the IPv4 versions. Duplicating them would only create problems for ourselves later when we need to add things like extended sequence numbers. I've also added transport header type conversion headers for these types which are now used by the transforms. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- include/linux/ipv6.h | 21 - include/net/ah.h |7 +++ include/net/esp.h|7 +++ include/net/ipcomp.h | 11 ++- net/ipv4/ah4.c | 18 +- net/ipv4/esp4.c | 10 +- net/ipv4/ipcomp.c|2 +- net/ipv6/ah6.c | 16 net/ipv6/esp6.c | 18 +- net/ipv6/ipcomp6.c | 17 - 10 files changed, 64 insertions(+), 63 deletions(-) diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 4ca60c3..5d35a4c 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -96,27 +96,6 @@ struct ipv6_destopt_hao { struct in6_addr addr; } __attribute__ ((__packed__)); -struct ipv6_auth_hdr { - __u8 nexthdr; - __u8 hdrlen; /* This one is measured in 32 bit units! */ - __be16 reserved; - __be32 spi; - __be32 seq_no; /* Sequence number */ - __u8 auth_data[0]; /* Length variable but >=4. Mind the 64 bit alignment! */ -}; - -struct ipv6_esp_hdr { - __be32 spi; - __be32 seq_no; /* Sequence number */ - __u8 enc_data[0]; /* Length variable but >=8. Mind the 64 bit alignment! 
*/ -}; - -struct ipv6_comp_hdr { - __u8 nexthdr; - __u8 flags; - __be16 cpi; -}; - /* * IPv6 fixed header * diff --git a/include/net/ah.h b/include/net/ah.h index 5e758c2..ae1c322 100644 --- a/include/net/ah.h +++ b/include/net/ah.h @@ -38,4 +38,11 @@ out: return err; } +struct ip_auth_hdr; + +static inline struct ip_auth_hdr *ip_auth_hdr(const struct sk_buff *skb) +{ + return (struct ip_auth_hdr *)skb_transport_header(skb); +} + #endif diff --git a/include/net/esp.h b/include/net/esp.h index e793d76..c1bc529 100644 --- a/include/net/esp.h +++ b/include/net/esp.h @@ -53,4 +53,11 @@ static inline int esp_mac_digest(struct esp_data *esp, struct sk_buff *skb, return crypto_hash_final(&desc, esp->auth.work_icv); } +struct ip_esp_hdr; + +static inline struct ip_esp_hdr *ip_esp_hdr(const struct sk_buff *skb) +{ + return (struct ip_esp_hdr *)skb_transport_header(skb); +} + #endif diff --git a/include/net/ipcomp.h b/include/net/ipcomp.h index 87c1af3..330b74e 100644 --- a/include/net/ipcomp.h +++ b/include/net/ipcomp.h @@ -1,14 +1,23 @@ #ifndef _NET_IPCOMP_H #define _NET_IPCOMP_H -#include #include #define IPCOMP_SCRATCH_SIZE 65400 +struct crypto_comp; + struct ipcomp_data { u16 threshold; struct crypto_comp **tfms; }; +struct ip_comp_hdr; +struct sk_buff; + +static inline struct ip_comp_hdr *ip_comp_hdr(const struct sk_buff *skb) +{ + return (struct ip_comp_hdr *)skb_transport_header(skb); +} + #endif diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index e4f7aa3..d697064 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -82,7 +82,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb) goto error; } - ah = (struct ip_auth_hdr *)skb_transport_header(skb); + ah = ip_auth_hdr(skb); ah->nexthdr = *skb_mac_header(skb); *skb_mac_header(skb) = IPPROTO_AH; @@ -93,8 +93,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb) top_iph->check = 0; ahp = x->data; - ah->hdrlen = (XFRM_ALIGN8(sizeof(struct ip_auth_hdr) + - ahp->icv_trunc_len) >> 2) - 2; + 
ah->hdrlen = (XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2; ah->reserved = 0; ah->spi = x->id.spi; @@ -134,15 +133,15 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb) struct ah_data *ahp; char work_buf[60]; - if (!pskb_may_pull(skb, sizeof(struct ip_auth_hdr))) + if (!pskb_may_pull(skb, sizeof(*ah))) goto out; - ah = (struct ip_auth_hdr*)skb->data; + ah = (struct ip_auth_hdr *)skb->data; ahp = x->data; ah_hlen = (ah->hdrlen + 2) << 2; - if (ah_hlen != XFRM_ALIGN8(sizeof(struct ip_auth_hdr) + ahp->icv_full_len) && - ah_hlen != XFRM_ALIGN8(sizeof(struct ip_auth_hdr) + ahp->icv_trunc_len)) + if (ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_full_len) && + ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len)) goto out; if (!pskb_may_pull(skb, ah_hlen)) @@ -156,7 +155,7 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb) skb->ip_summed = CHECKSUM_NONE;
[PATCH 6/7] [IPSEC]: Move IP length/checksum setting out of transforms
[IPSEC]: Move IP length/checksum setting out of transforms This patch moves the setting of the IP length and checksum fields out of the transforms and into the xfrmX_output functions. This would help future efforts in merging the transforms themselves. It also adds an optimisation to ipcomp due to the fact that the transport offset is guaranteed to be zero. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- net/ipv4/ah4.c |2 -- net/ipv4/esp4.c |7 +-- net/ipv4/ipcomp.c| 22 +- net/ipv4/xfrm4_mode_beet.c |3 --- net/ipv4/xfrm4_mode_tunnel.c |5 + net/ipv4/xfrm4_output.c |5 + net/ipv4/xfrm4_tunnel.c |5 - net/ipv6/esp6.c |3 --- net/ipv6/ipcomp6.c | 19 ++- net/ipv6/mip6.c |2 -- net/ipv6/xfrm6_mode_beet.c |2 -- net/ipv6/xfrm6_mode_tunnel.c |4 +--- net/ipv6/xfrm6_output.c |4 net/ipv6/xfrm6_tunnel.c |5 - 14 files changed, 23 insertions(+), 65 deletions(-) diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index d697064..60925fe 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -115,8 +115,6 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb) memcpy(top_iph+1, iph+1, top_iph->ihl*4 - sizeof(struct iphdr)); } - ip_send_check(top_iph); - err = 0; error: diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 66eb496..8377bed 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -16,7 +16,6 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) { int err; - struct iphdr *top_iph; struct ip_esp_hdr *esph; struct crypto_blkcipher *tfm; struct blkcipher_desc desc; @@ -59,9 +58,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) pskb_put(skb, trailer, clen - skb->len); skb_push(skb, -skb_network_offset(skb)); - top_iph = ip_hdr(skb); esph = ip_esp_hdr(skb); - top_iph->tot_len = htons(skb->len + alen); *(skb_tail_pointer(trailer) - 1) = *skb_mac_header(skb); *skb_mac_header(skb) = IPPROTO_ESP; @@ -76,7 +73,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) uh = (struct udphdr *)esph; uh->source = encap->encap_sport; uh->dest = 
encap->encap_dport; - uh->len = htons(skb->len + alen - top_iph->ihl*4); + uh->len = htons(skb->len + alen - skb_transport_offset(skb)); uh->check = 0; switch (encap->encap_type) { @@ -136,8 +133,6 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) unlock: spin_unlock_bh(&x->lock); - ip_send_check(top_iph); - error: return err; } diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c index 78d6ddb..32b02de 100644 --- a/net/ipv4/ipcomp.c +++ b/net/ipv4/ipcomp.c @@ -98,10 +98,9 @@ out: static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb) { struct ipcomp_data *ipcd = x->data; - const int ihlen = skb_transport_offset(skb); - const int plen = skb->len - ihlen; + const int plen = skb->len; int dlen = IPCOMP_SCRATCH_SIZE; - u8 *start = skb_transport_header(skb); + u8 *start = skb->data; const int cpu = get_cpu(); u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu); struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu); @@ -118,7 +117,7 @@ static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb) memcpy(start + sizeof(struct ip_comp_hdr), scratch, dlen); put_cpu(); - pskb_trim(skb, ihlen + dlen + sizeof(struct ip_comp_hdr)); + pskb_trim(skb, dlen + sizeof(struct ip_comp_hdr)); return 0; out: @@ -131,13 +130,8 @@ static int ipcomp_output(struct xfrm_state *x, struct sk_buff *skb) int err; struct ip_comp_hdr *ipch; struct ipcomp_data *ipcd = x->data; - int hdr_len = 0; - struct iphdr *iph = ip_hdr(skb); - skb_push(skb, -skb_network_offset(skb)); - iph->tot_len = htons(skb->len); - hdr_len = iph->ihl * 4; - if ((skb->len - hdr_len) < ipcd->threshold) { + if (skb->len < ipcd->threshold) { /* Don't bother compressing */ goto out_ok; } @@ -146,25 +140,19 @@ static int ipcomp_output(struct xfrm_state *x, struct sk_buff *skb) goto out_ok; err = ipcomp_compress(x, skb); - iph = ip_hdr(skb); if (err) { goto out_ok; } /* Install ipcomp header, convert into ipcomp datagram. 
*/ - iph->tot_len = htons(skb->len); ipch = ip_comp_hdr(skb); ipch->nexthdr = *skb_mac_header(skb); ipch->flags = 0; ipch->cpi = htons((u16 )ntohl(x->id.spi)); *skb_mac_header(skb) = IPPROTO_COMP; - ip_send_
[PATCH 7/7] [IPSEC]: Move IP protocol setting from transforms into xfrm4_input.c
[IPSEC]: Move IP protocol setting from transforms into xfrm4_input.c This patch makes the IPv4 x->type->input functions return the next protocol instead of setting it directly. This is identical to how we do things in IPv6 and will help us merge common code on the input path. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- net/ipv4/ah4.c |5 +++-- net/ipv4/esp4.c |3 +-- net/ipv4/ipcomp.c |7 --- net/ipv4/xfrm4_input.c |7 ++- net/ipv4/xfrm4_tunnel.c |2 +- 5 files changed, 15 insertions(+), 9 deletions(-) diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index 60925fe..4e8e3b0 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -125,6 +125,7 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb) { int ah_hlen; int ihl; + int nexthdr; int err = -EINVAL; struct iphdr *iph; struct ip_auth_hdr *ah; @@ -136,6 +137,7 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb) ah = (struct ip_auth_hdr *)skb->data; ahp = x->data; + nexthdr = ah->nexthdr; ah_hlen = (ah->hdrlen + 2) << 2; if (ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_full_len) && @@ -182,13 +184,12 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb) goto out; } } - ((struct iphdr*)work_buf)->protocol = ah->nexthdr; skb->network_header += ah_hlen; memcpy(skb_network_header(skb), work_buf, ihl); skb->transport_header = skb->network_header; __skb_pull(skb, ah_hlen + ihl); - return 0; + return nexthdr; out: return err; diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 8377bed..6b1a31a 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -257,12 +257,11 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb) skb->ip_summed = CHECKSUM_UNNECESSARY; } - iph->protocol = nexthdr[1]; pskb_trim(skb, skb->len - alen - padlen - 2); __skb_pull(skb, sizeof(*esph) + esp->conf.ivlen); skb_set_transport_header(skb, -ihl); - return 0; + return nexthdr[1]; out: return -EINVAL; diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c index 32b02de..0bfeb02 100644 --- a/net/ipv4/ipcomp.c +++ 
b/net/ipv4/ipcomp.c @@ -75,7 +75,6 @@ out: static int ipcomp_input(struct xfrm_state *x, struct sk_buff *skb) { int err = -ENOMEM; - struct iphdr *iph; struct ip_comp_hdr *ipch; if (skb_linearize_cow(skb)) @@ -84,12 +83,14 @@ static int ipcomp_input(struct xfrm_state *x, struct sk_buff *skb) skb->ip_summed = CHECKSUM_NONE; /* Remove ipcomp header and decompress original payload */ - iph = ip_hdr(skb); ipch = (void *)skb->data; - iph->protocol = ipch->nexthdr; skb->transport_header = skb->network_header + sizeof(*ipch); __skb_pull(skb, sizeof(*ipch)); err = ipcomp_decompress(x, skb); + if (err) + goto out; + + err = ipch->nexthdr; out: return err; diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c index 2fa1082..e9bbfde 100644 --- a/net/ipv4/xfrm4_input.c +++ b/net/ipv4/xfrm4_input.c @@ -54,12 +54,14 @@ static int xfrm4_rcv_encap(struct sk_buff *skb, __u16 encap_type) int xfrm_nr = 0; int decaps = 0; int err = xfrm4_parse_spi(skb, ip_hdr(skb)->protocol, &spi, &seq); + unsigned int nhoff = offsetof(struct iphdr, protocol); if (err != 0) goto drop; do { const struct iphdr *iph = ip_hdr(skb); + int nexthdr; if (xfrm_nr == XFRM_MAX_DEPTH) goto drop; @@ -82,9 +84,12 @@ static int xfrm4_rcv_encap(struct sk_buff *skb, __u16 encap_type) if (xfrm_state_check_expire(x)) goto drop_unlock; - if (x->type->input(x, skb)) + nexthdr = x->type->input(x, skb); + if (nexthdr <= 0) goto drop_unlock; + skb_network_header(skb)[nhoff] = nexthdr; + /* only the first xfrm gets the encap type */ encap_type = 0; diff --git a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c index e1fafc1..1312417 100644 --- a/net/ipv4/xfrm4_tunnel.c +++ b/net/ipv4/xfrm4_tunnel.c @@ -18,7 +18,7 @@ static int ipip_output(struct xfrm_state *x, struct sk_buff *skb) static int ipip_xfrm_rcv(struct xfrm_state *x, struct sk_buff *skb) { - return 0; + return IPPROTO_IP; } static int ipip_init_state(struct xfrm_state *x) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a 
message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
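The calling convention this patch introduces can be sketched in plain C. The example is a user-space illustration under stated assumptions: `struct mini_iphdr`, `demo_input()` and `rcv_one()` are hypothetical stand-ins, not kernel code. Only the pattern is taken from the patch: the input handler returns the next protocol number, any value <= 0 is treated as failure, and the caller stores the result in the protocol byte of the IP header at offset `nhoff`.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal IPv4 header layout, used only to locate the protocol byte. */
struct mini_iphdr {
	unsigned char  ver_ihl, tos;
	unsigned short tot_len, id, frag_off;
	unsigned char  ttl, protocol;
	unsigned short check;
	unsigned int   saddr, daddr;
};

/* Handler type: returns the next protocol, or <= 0 on failure. */
typedef int (*input_fn)(const unsigned char *payload, size_t len);

/* Toy handler: pretends the first payload byte is the nexthdr field. */
static int demo_input(const unsigned char *payload, size_t len)
{
	if (len < 1)
		return -EINVAL;
	return payload[0];
}

/* Caller side: run the handler, then patch the outer header. */
static int rcv_one(unsigned char *hdr, const unsigned char *payload,
		   size_t len, input_fn input)
{
	size_t nhoff = offsetof(struct mini_iphdr, protocol);
	int nexthdr = input(payload, len);

	if (nexthdr <= 0)
		return -1;	/* drop: header is left untouched */

	hdr[nhoff] = (unsigned char)nexthdr;
	return 0;
}
```

Keeping the header write in the caller means the handler needs no knowledge of where the protocol field lives, which is what makes the same handler shape usable for both address families.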
[PATCH 4/7] [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output
[IPSEC]: Use IPv6 calling convention as the convention for x->mode->output The IPv6 calling convention for x->mode->output is more general and could help an eventual protocol-generic x->type->output implementation. This patch adopts it for IPv4 as well and modifies the IPv4 type output functions accordingly. It also rewrites the IPv6 mac/transport header calculation to be based off the network header where practical. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> --- include/net/xfrm.h | 12 net/ipv4/ah4.c |6 +++--- net/ipv4/esp4.c | 11 +-- net/ipv4/ipcomp.c | 10 +- net/ipv4/xfrm4_mode_beet.c | 17 +++-- net/ipv4/xfrm4_mode_transport.c |7 +++ net/ipv4/xfrm4_mode_tunnel.c|7 +++ net/ipv6/xfrm6_mode_beet.c |9 + net/ipv6/xfrm6_mode_ro.c|9 + net/ipv6/xfrm6_mode_transport.c |9 + net/ipv6/xfrm6_mode_tunnel.c| 14 +++--- 11 files changed, 44 insertions(+), 67 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 1c116dc..77be396 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -300,6 +300,18 @@ extern void xfrm_put_type(struct xfrm_type *type); struct xfrm_mode { int (*input)(struct xfrm_state *x, struct sk_buff *skb); + + /* +* Add encapsulation header. +* +* On exit, the transport header will be set to the start of the +* encapsulation header to be filled in by x->type->output and +* the mac header will be set to the nextheader (protocol for +* IPv4) field of the extension header directly preceding the +* encapsulation header, or in its absence, that of the top IP +* header. The value of the network header will always point +* to the top IP header while skb->data will point to the payload. 
+*/ int (*output)(struct xfrm_state *x,struct sk_buff *skb); struct module *owner; diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index dbb1f11..e4f7aa3 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -82,14 +82,14 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb) goto error; } - ah = (struct ip_auth_hdr *)((char *)top_iph+top_iph->ihl*4); - ah->nexthdr = top_iph->protocol; + ah = (struct ip_auth_hdr *)skb_transport_header(skb); + ah->nexthdr = *skb_mac_header(skb); + *skb_mac_header(skb) = IPPROTO_AH; top_iph->tos = 0; top_iph->tot_len = htons(skb->len); top_iph->frag_off = 0; top_iph->ttl = 0; - top_iph->protocol = IPPROTO_AH; top_iph->check = 0; ahp = x->data; diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 0f5e838..93153d1 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -60,10 +60,10 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) skb_push(skb, -skb_network_offset(skb)); top_iph = ip_hdr(skb); - esph = (struct ip_esp_hdr *)(skb_network_header(skb) + -top_iph->ihl * 4); + esph = (struct ip_esp_hdr *)skb_transport_header(skb); top_iph->tot_len = htons(skb->len + alen); - *(skb_tail_pointer(trailer) - 1) = top_iph->protocol; + *(skb_tail_pointer(trailer) - 1) = *skb_mac_header(skb); + *skb_mac_header(skb) = IPPROTO_ESP; spin_lock_bh(&x->lock); @@ -91,9 +91,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) break; } - top_iph->protocol = IPPROTO_UDP; - } else - top_iph->protocol = IPPROTO_ESP; + *skb_mac_header(skb) = IPPROTO_UDP; + } esph->spi = x->id.spi; esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq); diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c index 1929d45..bf74f64 100644 --- a/net/ipv4/ipcomp.c +++ b/net/ipv4/ipcomp.c @@ -98,10 +98,10 @@ out: static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb) { struct ipcomp_data *ipcd = x->data; - const int ihlen = ip_hdrlen(skb); + const int ihlen = skb_transport_offset(skb); const int plen = skb->len - ihlen; int dlen = 
IPCOMP_SCRATCH_SIZE; - u8 *start = skb->data + ihlen; + u8 *start = skb_transport_header(skb); const int cpu = get_cpu(); u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu); struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu); @@ -154,11 +154,11 @@ static int ipcomp_output(struct xfrm_state *x, struct sk_buff *skb) /* Install ipcomp header, convert into ipcomp datagram. */ iph->tot_len = htons(skb->len); - ipch = (struct ip_comp_hdr *)((char *)iph + iph->ihl * 4); - ipch->nexthdr = iph->protocol; + ipch = (struct ip_comp_hdr *)skb_transport_header(skb); + ipch->nexthdr =
Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
Daniel Lezcano wrote:
>     struct net *net = in?in->nd_net:out->nd_net;
>
>> So, we are bound to the following options:
>> - perform additional non-uniform hacks around to place 'struct net' into
>>   other and other structures like xt_target
>> - add 7th parameter here and over
>> - introduce an skb_net field in the 'struct sk_buff' making all code
>>   uniform, at least when we have an skb
>>
>> I think that this is not the last place with such a parameter list and
>> we should make a decision at this point when the code is not mainline
>> yet.
>>
>> As far as I understand, netfilters are not touched by the Eric and we
>> can face some non-trivial problems there.
>
> In Eric's git tree:
> http://git.kernel.org/?p=linux/kernel/git/ebiederm/linux-2.6-netns.git
>
> There are some modifications concerning
> net/ipv4/netfilter/iptable_filter.c and at the ipt_hook function, there is:
>
>     struct net *net = (in?in:out)->nd_net;
>
>> So, if my point about uniformity is valid, this patchset looks wrong and
>> should be re-worked :(
>
> As Eric said, we want to build the network namespace step by step,
> taking care of not breaking the init network namespace.
>
> If you want to make iptables per namespace or catch problems before the
> code goes to Dave's tree, IMHO it will be more convenient to post to
> containers@ the patches against netns49, where the modifications will be
> in a network namespace big picture.
>

My point is somewhat different. Yes, this is enough for that place. If so, I must scatter these checks all around the netfilter code. Brr.

In the forward chain the situation is different for Layer 3 switching. Let's assume that we have an OpenVZ scheme, where the packet flows from socket to device and after that from device to device via the forwarding path. You can't call skb_orphan on namespace switching as this breaks UDP flow regulation. The virtual network device is fast while real Ethernet is slow, so packets will be dropped on the queue in the real device.

So, a packet on the send path with a socket from another namespace is possible :(

I just pray for uniformity, to concentrate on the code rather than on guesses about which path we are on :(

Regards,
	Den
Re: [PATCH] Evict tmp variable from the stack in ip6_evictor
Patrick McHardy wrote:
> Pavel Emelyanov wrote:
>> The list_head *tmp is used to help getting the first entry in
>> the ip6_frag_lru_list list. There is a simpler way to do it
>
>
> The exact same code exists in ip_fragment.c and nf_conntrack_reasm.c,
> please also change it there.

Hm, indeed. But I see that the structs frag_queue in reassembly.c, ipq in ip_fragment.c and nf_ct_frag6_queue in nf code look VERY similar and very much of code (like link/unlink or evict) looks the same too.

Maybe it's worth creating something like struct skb_fragment and consolidate all the common stuff into some net/core/lib_frag.c?

Or is there some hidden reason for keeping this code split?

Thanks,
Pavel
[PATCH] Evict tmp variable from the stack in ip6_evictor
The list_head *tmp is used to help getting the first entry in
the ip6_frag_lru_list list. There is a simpler way to do it.

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 31601c9..8fad98b 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -261,7 +261,6 @@ static __inline__ void fq_kill(struct fr
 static void ip6_evictor(struct inet6_dev *idev)
 {
 	struct frag_queue *fq;
-	struct list_head *tmp;
 	int work;

 	work = atomic_read(&ip6_frag_mem) - sysctl_ip6frag_low_thresh;
@@ -274,8 +273,9 @@ static void ip6_evictor(struct inet6_dev
 			read_unlock(&ip6_frag_lock);
 			return;
 		}
-		tmp = ip6_frag_lru_list.next;
-		fq = list_entry(tmp, struct frag_queue, lru_list);
+
+		fq = list_first_entry(&ip6_frag_lru_list,
+				struct frag_queue, lru_list);
 		atomic_inc(&fq->refcnt);
 		read_unlock(&ip6_frag_lock);
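Editorial sketch, not part of the posted patch: a minimal userspace model of the two ways of fetching the first list entry that the patch above swaps. The list helpers are simplified stand-ins for the kernel's <linux/list.h>, and struct demo_queue is a hypothetical substitute for struct frag_queue. Both forms assume the list is non-empty, which the caller has to check first.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical stand-in for struct frag_queue; only the fields used here. */
struct demo_queue {
	int id;
	struct list_head lru_list;
};

/* Old style: a temporary list_head pointer, then list_entry(). */
static struct demo_queue *first_with_tmp(struct list_head *lru)
{
	struct list_head *tmp = lru->next;

	return list_entry(tmp, struct demo_queue, lru_list);
}

/* New style: list_first_entry() needs no temporary. */
static struct demo_queue *first_direct(struct list_head *lru)
{
	return list_first_entry(lru, struct demo_queue, lru_list);
}
```

Both helpers return the same element, so the change only affects readability and removes one local variable.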
Re: [PATCH] division-by-zero in inet_csk_get_port
Anton Arapov wrote:
> So, now the way suggested by Denis looks reasonable. What do you think?

If that's the case then you should fix __udp_lib_get_port() the same way.

Prevent division by zero in __udp_lib_get_port() when only one unsecured port is available.

-Brian

Signed-off-by: Brian Haley <[EMAIL PROTECTED]>

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ef4d901..61faa38 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -150,10 +150,11 @@ int __udp_lib_get_port(struct sock *sk, unsigned short snum,
 		int i;
 		int low = sysctl_local_port_range[0];
 		int high = sysctl_local_port_range[1];
+		int remaining = (high - low) + 1;
 		unsigned rover, best, best_size_so_far;

 		best_size_so_far = UINT_MAX;
-		best = rover = net_random() % (high - low) + low;
+		best = rover = net_random() % remaining + low;

 		/* 1st pass: look for empty (or shortest) hash chain */
 		for (i = 0; i < UDP_HTABLE_SIZE; i++) {
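Editorial sketch, not part of the posted patch: a small userspace model of the rover computation discussed above. pick_rover() is a hypothetical helper and rnd stands in for the result of net_random(). With low == high the old modulus (high - low) is zero, while (high - low) + 1 is one. The extra guard for an inverted range such as "32768 32767" is an assumption added here, based on the objection raised earlier in the thread; it is not in the patch.

```c
/*
 * Pick a starting port in the inclusive range [low, high].
 * Returns -1 for an inverted range instead of dividing by a
 * zero or negative modulus.
 */
static int pick_rover(unsigned int rnd, int low, int high)
{
	int remaining = (high - low) + 1;	/* number of ports in the range */

	if (remaining <= 0)			/* assumption: guard not in the patch */
		return -1;
	return (int)(rnd % (unsigned int)remaining) + low;
}
```

Without the guard, a range where high == low - 1 still yields a zero modulus, which is why validating the sysctl input is discussed separately in this thread.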
[IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493
Hi,

From RFC 3493, Section 5.2:

    IPV6_MULTICAST_IF

    Set the interface to use for outgoing multicast packets. The
    argument is the index of the interface to use. If the interface
    index is specified as zero, the system selects the interface (for
    example, by looking up the address in a routing table and using the
    resulting interface).

This patch adds support for (index == 0) to reset the value to its original state, allowing the system to choose the best interface. IPv4 already behaves this way.

-Brian

Signed-off-by: Brian Haley <[EMAIL PROTECTED]>

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 532425d..309284e 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -539,6 +539,13 @@ done:
 	case IPV6_MULTICAST_IF:
 		if (sk->sk_type == SOCK_STREAM)
 			goto e_inval;
+
+		if (val == 0) {
+			np->mcast_oif = 0;
+			retv = 0;
+			break;
+		}
+
 		if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != val)
 			goto e_inval;
Re: [PATCH] Evict tmp variable from the stack in ip6_evictor
Pavel Emelyanov wrote:
> Patrick McHardy wrote:
>> Pavel Emelyanov wrote:
>>> The list_head *tmp is used to help getting the first entry in
>>> the ip6_frag_lru_list list. There is a simpler way to do it
>>
>> The exact same code exists in ip_fragment.c and nf_conntrack_reasm.c,
>> please also change it there.
>
> Hm, indeed. But I see that the structs frag_queue in reassembly.c,
> ipq in ip_fragment.c and nf_ct_frag6_queue in nf code look VERY
> similar and very much of code (like link/unlink or evict) looks the
> same too.
>
> Maybe it's worth creating something like struct skb_fragment and
> consolidate all the common stuff into some net/core/lib_frag.c?
>
> Or is there some hidden reason for keeping this code split?

I'm not sure if it's possible between IPv4 and IPv6, but sharing code between IPv6 reassembly and netfilter/ipv6 would be nice.
RE: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: Wed, 10 Oct 2007 12:23:31 +0200
>
> > On Wed, Oct 10, 2007 at 02:25:50AM -0700, David Miller wrote:
> > > The chip I was working with at the time (UltraSPARC-IIi) compressed
> > > all the linear stores into 64-byte full cacheline transactions via
> > > the store buffer.
> >
> > That's a pretty old CPU. Conclusions on more modern ones might be different.
>
> Cache matters, just scale the numbers.
>
> > I suppose it would be an interesting experiment at least.
>
> Absolutely.
>
> I've always gotten very poor results when increasing the TX
> queue a lot, for example with NIU the point of diminishing
> returns seems to be in the range of 256-512 TX descriptor
> entries and this was with 1.6Ghz cpus.

We've done similar testing with ixgbe to push maximum descriptor counts, and we lost performance very quickly in the same range you're quoting on NIU.

Cheers,
-PJ Waskiewicz
Re: [PATCH] do not give access to 1-1024 ports for autobinding
On Wed, 10 Oct 2007 18:34:49 +0400
"Denis V. Lunev" <[EMAIL PROTECTED]> wrote:

> This patch prevents possibility to give 1-1024 port range for autobinding.
> {1, 1} may only takes some sense for deep embedded people.
>
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
>
> --- ./net/ipv4/sysctl_net_ipv4.c.port2	2007-10-10 17:46:48.0 +0400
> +++ ./net/ipv4/sysctl_net_ipv4.c	2007-10-10 18:08:00.0 +0400
> @@ -25,7 +25,7 @@ extern int sysctl_ip_nonlocal_bind;
>  #ifdef CONFIG_SYSCTL
>  static int zero;
>  static int tcp_retr1_max = 255;
> -static int ip_local_port_range_min[] = { 1, 1 };
> +static int ip_local_port_range_min[] = { 1024, 1024 };
>  static int ip_local_port_range_max[] = { 65535, 65535 };
>  #endif

That only limits the sysctl, which seems completely counter productive.
Sounds like more of the "stop root from shooting themselves" patches.

--
Stephen Hemminger <[EMAIL PROTECTED]>
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
> We've done similar testing with ixgbe to push maximum descriptor counts,
> and we lost performance very quickly in the same range you're quoting on
> NIU.

Did you try it with WC writes to the ring or CLFLUSH?

-Andi
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
On Tue, 09 Oct 2007, David Miller wrote:

> From: jamal <[EMAIL PROTECTED]>
> Date: Tue, 09 Oct 2007 17:56:46 -0400
>
> > if the h/ware queues are full because of link pressure etc, you drop. We
> > drop today when the s/ware queues are full. The driver txmit lock takes
> > place of the qdisc queue lock etc. I am assuming there is still need for
> > that locking. The filter/classification scheme still works as is and
> > select classes which map to rings. tc still works as is etc.
>
> I understand your suggestion.
>
> We have to keep in mind, however, that the sw queue right now is 1000
> packets. I heavily discourage any driver author to try and use any
> single TX queue of that size. Which means that just dropping on back
> pressure might not work so well.
>
> Or it might be perfect and signal TCP to backoff, who knows! :-)

I can't remember the details anymore, but for 10-GigE, I have encountered cases where I was able to significantly increase TCP performance by increasing the txqueuelen to 1, which is the setting I now use for any 10-GigE testing.

-Bill
Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
Jay Vosburgh wrote:
> David Miller <[EMAIL PROTECTED]> wrote:
>
>> From: Jeff Garzik <[EMAIL PROTECTED]>
>> Date: Tue, 09 Oct 2007 20:56:35 -0400
>>
>>> Jeff Garzik wrote:
>>>> applied patches 1-9
>>>>
>>>> the only thing that was a hiccup during submission is that your email
>>>> subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
>>>> But other than that, the git-send-email went flawlessly.
>>> unfortunately it does not seem to build flawlessly:
>> Yeah it doesn't handle Stephen Hemmingers headerops change
>> in net-2.6.24
>
> Gaah. I'll sort it out and repost.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Hi Jay, Jeff

Thanks for the help with making the patch compile under 2.6.24. However, patch #3 has a missing line in bond_setup_by_slave that should look like this:

	bond_dev->header_ops = slave_dev->header_ops;

I rewrote the patch and also fixed patch #8 that became broken. I would send the new patches now but there is more: I also ran a test for the code in the branch of 2.6.24 and found a problem. I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?). I am trying to solve it now so I'd like to wait a short time before applying these patches. I guess that I'll need to add something.

thanks
MoniS
[PATCH] natsemi: Check return value for pci_enable_device()
pci_enable_device() is __must_check so do that in natsemi_resume().

Signed-off-by: Mark Brown <[EMAIL PROTECTED]>
---
 drivers/net/natsemi.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/natsemi.c b/drivers/net/natsemi.c
index b881786..50e1ec6 100644
--- a/drivers/net/natsemi.c
+++ b/drivers/net/natsemi.c
@@ -3314,13 +3314,19 @@ static int natsemi_resume (struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata (pdev);
 	struct netdev_private *np = netdev_priv(dev);
+	int ret = 0;

 	rtnl_lock();
 	if (netif_device_present(dev))
 		goto out;
 	if (netif_running(dev)) {
 		BUG_ON(!np->hands_off);
-		pci_enable_device(pdev);
+		ret = pci_enable_device(pdev);
+		if (ret < 0) {
+			dev_err(&pdev->dev,
+				"pci_enable_device() failed: %d\n", ret);
+			goto out;
+		}
 	/*	pci_power_on(pdev); */

 		napi_enable(&np->napi);
@@ -3340,7 +3346,7 @@ static int natsemi_resume (struct pci_dev *pdev)
 	netif_device_attach(dev);
 out:
 	rtnl_unlock();
-	return 0;
+	return ret;
 }

 #endif /* CONFIG_PM */
--
1.5.3.4
RE: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
> -----Original Message-----
> From: Andi Kleen [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 10, 2007 9:02 AM
> To: Waskiewicz Jr, Peter P
> Cc: David Miller; [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED];
> netdev@vger.kernel.org; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]
> Subject: Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
>
> > We've done similar testing with ixgbe to push maximum descriptor
> > counts, and we lost performance very quickly in the same range you're
> > quoting on NIU.
>
> Did you try it with WC writes to the ring or CLFLUSH?
>
> -Andi

Hmm, I think it might be slightly different, but it still shows queue depth vs. performance. I was actually referring to how many descriptors we can represent a packet with before it becomes a problem wrt performance. This morning I tried to actually push my ixgbe NIC hard enough to come close to filling the ring with packets (384-byte packets), and even on my 8-core Xeon I can't do it. My system can't generate enough I/O to fill the hardware queues before CPUs max out.

-PJ Waskiewicz
Re: [PATCH] do not give access to 1-1024 ports for autobinding
Stephen Hemminger wrote:
> On Wed, 10 Oct 2007 18:34:49 +0400
> "Denis V. Lunev" <[EMAIL PROTECTED]> wrote:
>
>> This patch prevents possibility to give 1-1024 port range for autobinding.
>> {1, 1} may only takes some sense for deep embedded people.
>>
>> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
>>
>> --- ./net/ipv4/sysctl_net_ipv4.c.port2	2007-10-10 17:46:48.0 +0400
>> +++ ./net/ipv4/sysctl_net_ipv4.c	2007-10-10 18:08:00.0 +0400
>> @@ -25,7 +25,7 @@ extern int sysctl_ip_nonlocal_bind;
>>  #ifdef CONFIG_SYSCTL
>>  static int zero;
>>  static int tcp_retr1_max = 255;
>> -static int ip_local_port_range_min[] = { 1, 1 };
>> +static int ip_local_port_range_min[] = { 1024, 1024 };
>>  static int ip_local_port_range_max[] = { 65535, 65535 };
>>  #endif
>
> That only limits the sysctl, which seems completely counter productive.
> Sounds like more of the "stop root from shooting themselves" patches.
>

They have sense for the case of multiple network namespaces, where root in the other namespace can be treated as a user to initial namespace.
[RFC] more robust inet range checking
More complete version of local port range checking. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. 3. Add port randomization to SCTP. This is a new feature but easier than maintaining old code that was broken if range changed. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- drivers/infiniband/core/cma.c | 24 ++-- include/net/ip.h|3 + net/ipv4/inet_connection_sock.c | 26 ++--- net/ipv4/inet_hashtables.c | 13 +++--- net/ipv4/sysctl_net_ipv4.c | 77 net/ipv4/tcp_ipv4.c |1 net/ipv4/udp.c | 18 - net/ipv6/inet6_hashtables.c | 13 +++--- net/sctp/protocol.c |1 net/sctp/socket.c | 26 - security/selinux/hooks.c| 37 ++- 11 files changed, 157 insertions(+), 82 deletions(-) --- a/include/net/ip.h 2007-10-10 08:26:57.0 -0700 +++ b/include/net/ip.h 2007-10-10 09:35:26.0 -0700 @@ -171,7 +171,8 @@ extern unsigned long snmp_fold_field(voi extern int snmp_mib_init(void *ptr[2], size_t mibsize, size_t mibalign); extern void snmp_mib_free(void *ptr[2]); -extern int sysctl_local_port_range[2]; +extern void inet_get_local_port_range(int range[2]); + extern int sysctl_ip_default_ttl; extern int sysctl_ip_nonlocal_bind; --- a/net/ipv4/inet_connection_sock.c 2007-10-10 09:29:03.0 -0700 +++ b/net/ipv4/inet_connection_sock.c 2007-10-10 09:52:49.0 -0700 @@ -33,6 +33,19 @@ EXPORT_SYMBOL(inet_csk_timer_bug_msg); * This array holds the first and last local port number. 
*/ int sysctl_local_port_range[2] = { 32768, 61000 }; +DEFINE_SEQLOCK(sysctl_port_range_lock); + +void inet_get_local_port_range(int range[2]) +{ + unsigned seq; + do { + seq = read_seqbegin(&sysctl_port_range_lock); + + range[0] = sysctl_local_port_range[0]; + range[1] = sysctl_local_port_range[1]; + } while (read_seqretry(&sysctl_port_range_lock, seq)); +} +EXPORT_SYMBOL(inet_get_local_port_range); int inet_csk_bind_conflict(const struct sock *sk, const struct inet_bind_bucket *tb) @@ -77,10 +90,11 @@ int inet_csk_get_port(struct inet_hashin local_bh_disable(); if (!snum) { - int low = sysctl_local_port_range[0]; - int high = sysctl_local_port_range[1]; - int remaining = (high - low) + 1; - int rover = net_random() % (high - low) + low; + int remaining, range[2], rover; + + inet_get_local_port_range(range); + remaining = range[1] - range[0]; + rover = net_random() % (range[1] - range[0]) + range[0]; do { head = &hashinfo->bhash[inet_bhashfn(rover, hashinfo->bhash_size)]; @@ -91,8 +105,8 @@ int inet_csk_get_port(struct inet_hashin break; next: spin_unlock(&head->lock); - if (++rover > high) - rover = low; + if (++rover > range[1]) + rover = range[0]; } while (--remaining > 0); /* Exhausted local port range during search? 
It is not --- a/net/ipv4/inet_hashtables.c2007-10-10 09:27:02.0 -0700 +++ b/net/ipv4/inet_hashtables.c2007-10-10 09:40:39.0 -0700 @@ -279,19 +279,18 @@ int inet_hash_connect(struct inet_timewa int ret; if (!snum) { - int low = sysctl_local_port_range[0]; - int high = sysctl_local_port_range[1]; - int range = high - low; - int i; - int port; + int i, count, range[2], port; static u32 hint; u32 offset = hint + inet_sk_port_offset(sk); struct hlist_node *node; struct inet_timewait_sock *tw = NULL; + inet_get_local_port_range(range); + count = range[1] - range[0]; + local_bh_disable(); - for (i = 1; i <= range; i++) { - port = low + (i + offset) % range; + for (i = 1; i <= count; i++) { + port = range[0] + (i + offset) % count; head = &hinfo->bhash[inet_bhashfn(port, hinfo->bhash_size)]; spin_lock(&head->lock); --- a/net/ipv4/sysctl_net_ipv4.c2007-10-10 08:27:00.0 -0700 +++ b/net/ipv4/sysctl_net_ipv4.c2007-10-10 09:46:12.0 -0700 @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -25,8 +26,6 @@ extern int sysctl_ip_nonlocal_bind; #ifdef CONFIG_SYSCTL static int zero; static int tcp_retr1_max = 255; -static int ip_local_port_range_min[]
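Editorial sketch, not part of the posted RFC: a userspace model of points 1 and 2 of the patch description above (reject low >= high on update; read both bounds consistently with a sequence counter). It uses C11 atomics instead of the kernel's seqlock_t, all names are hypothetical, and it assumes writers are serialized by some outside lock, as the kernel seqlock's embedded spinlock would do.

```c
#include <stdatomic.h>

/* Even value: range is stable. Odd value: a write is in progress. */
static atomic_uint range_seq;
static atomic_int port_range[2] = { 32768, 61000 };

/*
 * Writer side. Rejects low >= high, then updates both bounds while the
 * sequence is odd. Assumption: callers never run concurrently.
 */
static int set_local_port_range(int low, int high)
{
	unsigned int seq;

	if (low >= high)
		return -1;

	seq = atomic_load(&range_seq);
	atomic_store(&range_seq, seq + 1);	/* mark write in progress */
	atomic_store(&port_range[0], low);
	atomic_store(&port_range[1], high);
	atomic_store(&range_seq, seq + 2);	/* publish */
	return 0;
}

/* Reader side: retry until both bounds come from the same update. */
static void get_local_port_range(int range[2])
{
	unsigned int seq;

	do {
		seq = atomic_load(&range_seq);
		range[0] = atomic_load(&port_range[0]);
		range[1] = atomic_load(&port_range[1]);
	} while ((seq & 1) || seq != atomic_load(&range_seq));
}
```

The reader never blocks the writer; it simply repeats the copy if the counter changed underneath it, which mirrors the read_seqbegin()/read_seqretry() loop in the RFC.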
Re: [PATCH] do not give access to 1-1024 ports for autobinding
On Wed, 10 Oct 2007 20:59:13 +0400
"Denis V. Lunev" <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > On Wed, 10 Oct 2007 18:34:49 +0400
> > "Denis V. Lunev" <[EMAIL PROTECTED]> wrote:
> >
> >> This patch prevents possibility to give 1-1024 port range for autobinding.
> >> {1, 1} may only takes some sense for deep embedded people.
> >>
> >> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
> >>
> >> --- ./net/ipv4/sysctl_net_ipv4.c.port2	2007-10-10 17:46:48.0 +0400
> >> +++ ./net/ipv4/sysctl_net_ipv4.c	2007-10-10 18:08:00.0 +0400
> >> @@ -25,7 +25,7 @@ extern int sysctl_ip_nonlocal_bind;
> >>  #ifdef CONFIG_SYSCTL
> >>  static int zero;
> >>  static int tcp_retr1_max = 255;
> >> -static int ip_local_port_range_min[] = { 1, 1 };
> >> +static int ip_local_port_range_min[] = { 1024, 1024 };
> >>  static int ip_local_port_range_max[] = { 65535, 65535 };
> >>  #endif
> >
> > That only limits the sysctl, which seems completely counter productive.
> > Sounds like more of the "stop root from shooting themselves" patches.
> >
>
> They have sense for the case of multiple network namespaces, where root
> in the other namespace can be treated as a user to initial namespace.

IMHO don't want to treat root as a complete idiot like normal users. As long as what root requests doesn't create a security problem, it should be allowed.

The port space is per namespace right? The sysctl values should be per namespace as well.

--
Stephen Hemminger <[EMAIL PROTECTED]>
Re: net-2.6.24 rebased...
David Miller wrote:
> From: Urs Thuermann <[EMAIL PROTECTED]>
> Date: 09 Oct 2007 23:13:42 +0200
>
>> Last week I have sent another version of our patch series for PF_CAN.
>> The changes after the last review feedback were only cosmetics. Do
>> you have any plans with that code for this or the next release?
>
> I think PF_CAN will go in 2.6.25

Good news. Thanks!

I'll keep on tracking the current patch flow to be sure that we're still on the head of development, when net-2.6.25 hits the ground.

Best regards,
Oliver
Re: [RFC] more robust inet range checking
Stephen Hemminger wrote:
> More complete version of local port range checking.
>
> 1. Enforce that low < high when setting.
> 2. Use seqlock to ensure atomic update.
> 3. Add port randomization to SCTP. This is a new feature but
>    easier than maintaining old code that was broken if range
>    changed.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>

Ack the SCTP portion. Much nicer and a much needed improvement.

Thanks
-vlad
authenc compile warnings in current net-2.6.24
Hi Herbert,

  CC [M]  crypto/authenc.o
crypto/authenc.c: In function ‘crypto_authenc_hash’:
crypto/authenc.c:88: warning: ‘cryptlen’ may be used uninitialized in this function
crypto/authenc.c:87: warning: ‘dst’ may be used uninitialized in this function
crypto/authenc.c: In function ‘crypto_authenc_decrypt’:
crypto/authenc.c:163: warning: ‘cryptlen’ may be used uninitialized in this function
crypto/authenc.c:163: note: ‘cryptlen’ was declared here
crypto/authenc.c:162: warning: ‘src’ may be used uninitialized in this function
crypto/authenc.c:162: note: ‘src’ was declared here

do you already know these warnings?

Oliver
Re: [PATCH][NETNS] Make ifindex generation per-namespace
Pavel Emelyanov <[EMAIL PROTECTED]> writes:

>> I know there are several data structures internal to the kernel that
>> are indexed by ifindex, and not struct net_device *. There is the
>> iflink field in struct net_device. We need a way to refer to network
>> devices in other namespaces in rtnetlink in an unambiguous way. I
>> don't see any real problems with a global ifindex assignment until
>> we start migrating applications.
>>
>> So please hold off on this until the kernel has been audited and
>> we have removed all of the uses of ifindex that assume ifindex is
>> global, that we can find.
>
> Ok.

Thanks.

Eric
Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
> I also ran a test for the code in the branch of 2.6.24 and found a problem.
> I see that ifconfig down doesn't return (for IPoIB interfaces) and it's
> stuck in napi_disable() in the kernel (any idea why?)

For what it's worth, I took the upstream 2.6.23 git tree and merged in Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried that. I brought up an IPoIB interface, sent a few pings, and did ifconfig down, and it worked fine.

Can you try the same thing without the bonding patches to see if your setup works OK too?

Also can you give more details about what you do to get ifconfig down stuck?

- R.
Re: [PATCH][NETNS] Make ifindex generation per-namespace
On Tue, 2007-10-09 at 11:41 -0600, Eric W. Biederman wrote:

> So please hold off on this until the kernel has been audited and
> we have removed all of the uses of ifindex that assume ifindex is
> global, that we can find.

I certainly have this assumption in the wireless code (cfg80211). How would I go about removing it? Are netlink sockets per-namespace so I can use the namespace of the netlink socket to look up a netdev?

johannes
Re: [RFC] more robust inet range checking
Stephen Hemminger wrote: int inet_csk_bind_conflict(const struct sock *sk, const struct inet_bind_bucket *tb) @@ -77,10 +90,11 @@ int inet_csk_get_port(struct inet_hashin local_bh_disable(); if (!snum) { - int low = sysctl_local_port_range[0]; - int high = sysctl_local_port_range[1]; - int remaining = (high - low) + 1; - int rover = net_random() % (high - low) + low; + int remaining, range[2], rover; + + inet_get_local_port_range(range); + remaining = range[1] - range[0]; + rover = net_random() % (range[1] - range[0]) + range[0]; nit-pick: rover = net_random() % remaining + range[0]; --- a/net/ipv4/udp.c2007-10-10 08:27:00.0 -0700 +++ b/net/ipv4/udp.c2007-10-10 09:44:35.0 -0700 @@ -147,13 +147,13 @@ int __udp_lib_get_port(struct sock *sk, write_lock_bh(&udp_hash_lock); if (!snum) { - int i; - int low = sysctl_local_port_range[0]; - int high = sysctl_local_port_range[1]; + int i, range[2]; unsigned rover, best, best_size_so_far; Should these be signed ints? They're the only ones that are unsigned, but I don't know why. --- a/net/sctp/protocol.c 2007-10-10 08:27:00.0 -0700 +++ b/net/sctp/protocol.c 2007-10-10 09:58:21.0 -0700 @@ -1173,7 +1173,6 @@ SCTP_STATIC __init int sctp_init(void) } spin_lock_init(&sctp_port_alloc_lock); - sctp_port_rover = sysctl_local_port_range[0] - 1; I think you can remove the port_rover definition in sctp/structs.h and also the lock that protects it. Patch below for that which can be applied on-top of yours. -Brian Remove SCTP port_rover and port_alloc_lock as they're no longer required. Signed-off-by: Brian Haley <[EMAIL PROTECTED]> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 448f713..c1a083c 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -197,8 +197,6 @@ extern struct sctp_globals { /* This is the sctp port control hash. */ int port_hashsize; - int port_rover; - spinlock_t port_alloc_lock; /* Protects port_rover. 
*/ struct sctp_bind_hashbucket *port_hashtable; /* This is the global local address list. @@ -245,8 +243,6 @@ extern struct sctp_globals { #define sctp_assoc_hashsize (sctp_globals.assoc_hashsize) #define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) #define sctp_port_hashsize (sctp_globals.port_hashsize) -#define sctp_port_rover (sctp_globals.port_rover) -#define sctp_port_alloc_lock (sctp_globals.port_alloc_lock) #define sctp_port_hashtable (sctp_globals.port_hashtable) #define sctp_local_addr_list (sctp_globals.local_addr_list) #define sctp_local_addr_lock (sctp_globals.addr_list_lock) diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 80df457..81b26c5 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1172,8 +1172,6 @@ SCTP_STATIC __init int sctp_init(void) sctp_port_hashtable[i].chain = NULL; } - spin_lock_init(&sctp_port_alloc_lock); - printk(KERN_INFO "SCTP: Hash tables configured " "(established %d bind %d)\n", sctp_assoc_hashsize, sctp_port_hashsize); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index e1e2d2c..293200d 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -5321,7 +5321,6 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr) remaining = range[1] - range[0]; rover = net_random() % remaining + range[0]; - sctp_spin_lock(&sctp_port_alloc_lock); do { rover++; if ((rover < range[0]) || (rover > range[1])) @@ -5337,7 +5336,6 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr) next: sctp_spin_unlock(&head->lock); } while (--remaining > 0); - sctp_spin_unlock(&sctp_port_alloc_lock); /* Exhausted local port range during search? */ ret = 1;
Re: [Devel] [PATCH 1/5] net: Modify all rtnetlink methods to only work in the initial namespace
"Denis V. Lunev" <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> Before I can enable rtnetlink to work in all network namespaces >> I need to be certain that something won't break. So this >> patch deliberately disables all of the rtnletlink methods in everything >> except the initial network namespace. After the methods have been >> audited this extra check can be disabled. >> > [...] >> static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) >> { >> +struct net *net = skb->sk->sk_net; >> struct net_device *dev; >> int idx; >> > > I've read some code today greping 'init_net.loopback_dev' and found > interesting non-trivial for me issue. > > Network namespace is extracted from the packet in two different ways in > TCP. This is a socket for outgoing path and a device for incoming. > Though, there are some places called uniformly both from incoming and > outgoing path. > > Typical example is netfilters. They are called uniformly all around the > code. The prototype is the following: > > static unsigned int reject6_target(struct sk_buff **pskb, >const struct net_device *in, >const struct net_device *out, >unsigned int hooknum, >const struct xt_target *target, >const void *targinfo); > > So, we are bound to the following options: > - perform additional non-uniform hacks around to place 'struct net' into > other and other structures like xt_target > - add 7th parameter here and over > - introduce an skb_net field in the 'struct sk_buff' making all code > uniform, at least when we have an skb No. That bloats a sk_buff, changes the semantics of moving a skb around, and decreases performance (because we have to maintain the field on a fast path). There will not be a skb_net field. The entire concept of skb_net is a maintenance disaster. > I think that this is not the last place with such a parameter list and > we should make a decision at this point when the code in not mainline yet. Certainly that is what I have a proof of concept tree for. 
So we can see how these things look before we merge them. > As far as I understand, netfilters are not touched by the Eric and we > can face some non-trivial problems there. No. In my proof of concept tree I should have working per network namespace netfilter code. My intention was to just do enough to see what the impact would be so most of the netfilter code (in my tree) insists on running in the initial network namespace. But there are a few pieces that are fully converted. Please take a look. > So, if my point about uniformity is valid, this patchset looks wrong and > should be re-worked :( This patchset does need to get rebased on top of net-2.6.25 when it opens and hopefully your patchset to remove the unnecessary work in rtnl_unlock, and to really process netlink requests in process context. I see a need for the more fundamental change you seem to be advocating. Differentiating between the incoming and the outgoing code paths is something we already do permission checking, for locking, for sleeping, etc. Modifying the code requires reading and understanding it in context. That is the nature of code. This does make large patches going across the entire networking stack making something a network namespace parameter difficult, but it should not cause any problem for maintenance or other work on the code. As shown by the fact that even outside the tree rebasing my network namespace patches has not been all that difficult. So no I don't think uniformity, or beauty or elegance is what we are after right now. Trying to hard in that direction ultimately obfuscates the code. What we want is something that is simple, straight forward, and doesn't require you to be an expert in network namespaces to understand the code or the patches. 
In the particular case of the netfilter hooks we don't have a network namespace parameter lying around before we call NF_HOOK, and the idiom "net = (in?in:out)->nd_net" seems perfectly accurate so it seems reasonable to me to derive the network namespace that way in generic code. Although, thinking about this, we know which hooks we are being called from, so we may in fact know which of in or out must be valid when we get to the netfilter hook. Eric
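The "net = (in?in:out)->nd_net" idiom can be tried out in user space with stand-in types. This is only a rough sketch: struct net and struct net_device below are stripped-down mock-ups rather than the kernel definitions, hook_net() is a hypothetical helper name, and the NULL return for the case where neither device is set is an added assumption that the message does not discuss.

```c
#include <stddef.h>

/* Mock-ups of the kernel types; only the nd_net field name is taken
 * from the message, everything else is simplified for illustration. */
struct net {
	int id;
};

struct net_device {
	struct net *nd_net;
};

/*
 * Derive the network namespace for a netfilter hook invocation from
 * whichever device pointer is set: "in" on the incoming path, "out" on
 * the outgoing path. When both are set, "in" wins, as in the idiom.
 * Returns NULL if neither device is available (assumption of this sketch).
 */
struct net *hook_net(const struct net_device *in,
		     const struct net_device *out)
{
	const struct net_device *dev = in ? in : out;

	return dev ? dev->nd_net : NULL;
}
```

A hook that is only ever reached on the incoming path could read in->nd_net directly, which is the refinement hinted at in the last sentence of the message.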
Re: [PATCH][BNX2X] round three
Eliezer Tamir wrote: The full patch is available at: [EMAIL PROTECTED]://ftp1.broadcom.com/0001-BNX2X-0.40.10a-net-2.6.24.patch Just when I thought I had beaten the line beast (or maybe it's just too much work and not enough sleep). The right links are of course: ftp://[EMAIL PROTECTED]/0001-BNX2X-0.40.10a-net-2.6.24.patch and ftp://[EMAIL PROTECTED]/0001-BNX2X-0.40.10a-net-2.6.24.patch.gz
Re: [PATCH][NETNS] Make ifindex generation per-namespace
Johannes Berg <[EMAIL PROTECTED]> writes: > On Tue, 2007-10-09 at 11:41 -0600, Eric W. Biederman wrote: > >> So please hold off on this until the kernel has been audited and >> we have removed all of the uses of ifindex that assume ifindex is >> global, that we can find. > > I certainly have this assumption in the wireless code (cfg80211). How > would I go about removing it? Are netlink sockets per-namespace so I can > use the namespace of the netlink socket to look up a netdev? Yes. Netlink sockets are per-namespace and you can use the namespace of a netlink socket to look up a netdev. Eric
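To make the answer concrete, here is a small user-space sketch of why an ifindex is only meaningful together with a namespace. All types and function names are hypothetical mock-ups (mock_net, mock_nlsock, mock_dev_get_by_index and so on), not kernel APIs; the only point carried over from the thread is that the lookup starts from the namespace of the netlink socket that delivered the request.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a namespace owns its own device table. */
struct mock_netdev {
	int ifindex;
	const char *name;
};

struct mock_net {
	const struct mock_netdev *devs;
	size_t ndevs;
};

/* A netlink socket belongs to exactly one namespace. */
struct mock_nlsock {
	const struct mock_net *net;
};

/* Look up a device by ifindex within one namespace only. */
const struct mock_netdev *mock_dev_get_by_index(const struct mock_net *net,
						 int ifindex)
{
	size_t i;

	for (i = 0; i < net->ndevs; i++)
		if (net->devs[i].ifindex == ifindex)
			return &net->devs[i];
	return NULL;
}

/* Resolve the ifindex carried in a request against the namespace of the
 * socket the request arrived on, never against a global table. */
const struct mock_netdev *mock_lookup_for_request(const struct mock_nlsock *sk,
						   int ifindex)
{
	return mock_dev_get_by_index(sk->net, ifindex);
}
```

With per-namespace ifindex generation, two namespaces may both have a device with ifindex 1, so a lookup that ignored the socket's namespace could return the wrong device.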
Re: authenc compile warnings in current net-2.6.24
* Oliver Hartkopp | 2007-10-10 19:53:53 [+0200]: > CC [M] crypto/authenc.o > crypto/authenc.c: In function 'crypto_authenc_hash': > crypto/authenc.c:88: warning: 'cryptlen' may be used uninitialized in this > function > crypto/authenc.c:87: warning: 'dst' may be used uninitialized in this > function > crypto/authenc.c: In function 'crypto_authenc_decrypt': > crypto/authenc.c:163: warning: 'cryptlen' may be used uninitialized in this > function > crypto/authenc.c:163: note: 'cryptlen' was declared here > crypto/authenc.c:162: warning: 'src' may be used uninitialized in this > function > crypto/authenc.c:162: note: 'src' was declared here > > do you already know about these warnings? Those warnings look like a compiler bug to me. > Oliver Sebastian
Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493
What about just checking for 0 in the later test? if (val && __dev_get_by_index(val) == NULL) { ... +-DLS
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
The hack to use a socket and bind it to claim the port was just for demonstrating the idea. The correct solution, IMO, is to enhance the core low level 4-tuple allocation services to be more generic (eg: not be tied to a struct sock). Then the host tcp stack and the host rdma stack can allocate TCP/iWARP ports/4tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters like iscsi adapters if needed. Since iWarp runs on top of TCP, the port space is really the same. FWIW, I agree that this proposal is the correct solution to support iWarp. - Sean
Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493
David Stevens wrote: What about just checking for 0 in the later test? if (val && __dev_get_by_index(val) == NULL) { We could fail the next check right before that though: if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != val) goto e_inval; I just mimicked what the IPv4 code does in do_ip_setsockopt(). -Brian
Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493
Brian Haley <[EMAIL PROTECTED]> wrote on 10/10/2007 02:20:45 PM: > David Stevens wrote: > > What about just checking for 0 in the later test? > > > > if (val && __dev_get_by_index(val) == NULL) { > > We could fail the next check right before that though: Right, the semantics there would be "if we have a bound dev if, that's the only legal value here." Setting it to '0' in that case doesn't really do anything, anyway. But I don't care about that semantic difference-- could even add "val &&" to the bound_dev_if check. What I don't like is that your "if" creates an identical duplicate code path for the functional part of it. In this case it's trivial (the assignment), but makes the code look more complex than it really is. If v4 does it that way, I don't like that either. :-) I agree with it in general, and it may not be worth the trouble, but I'd personally prefer something like: if (sk->sk_type == SOCK_STREAM) goto e_inval; if (val && sk->sk_bound_dev_if && sk->sk_bound_dev_if != val) goto e_inval; if (val && __dev_get_by_index(val) == NULL) { retv = -ENODEV; break; } [at this point all validity checks are done and we're following one code path to do the work; each check is easily identifiable.] np->mcast_oif = val; retv = 0; break; Or maybe: if (sk->sk_type == SOCK_STREAM) goto e_inval; if (val) { if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != val) goto e_inval; if (__dev_get_by_index(val) == NULL) { retv = -ENODEV; break; } } np->mcast_oif = val; retv = 0; break; But anyway, I made the comment; I think some form of it should go in. :-) If you like the original better, that's ok with me, too. +-DLS
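The second layout suggested above (all validity checks first, then a single path that does the assignment) can be exercised in user space. The sketch below is an illustration under stated assumptions, not the kernel code: struct mock_sock, the MOCK_SOCK_* constants, mock_ifindex_exists() and set_mcast_oif() are hypothetical stand-ins, the interface lookup pretends that only ifindex 1 and 2 exist, and -ENODEV is returned when the lookup finds no device.

```c
#include <errno.h>

#define MOCK_SOCK_STREAM 1
#define MOCK_SOCK_DGRAM  2

/* Hypothetical stand-in for the socket fields the check touches. */
struct mock_sock {
	int type;		/* MOCK_SOCK_STREAM or MOCK_SOCK_DGRAM */
	int bound_dev_if;	/* 0 means "not bound to a device" */
	int mcast_oif;		/* multicast outgoing interface, 0 = unset */
};

/* Hypothetical device lookup: only ifindex 1 and 2 exist. */
int mock_ifindex_exists(int ifindex)
{
	return ifindex == 1 || ifindex == 2;
}

/*
 * Validate first, then follow one code path to do the work.
 * A value of 0 clears the setting and skips the device checks.
 */
int set_mcast_oif(struct mock_sock *sk, int val)
{
	if (sk->type == MOCK_SOCK_STREAM)
		return -EINVAL;

	if (val) {
		if (sk->bound_dev_if && sk->bound_dev_if != val)
			return -EINVAL;
		if (!mock_ifindex_exists(val))
			return -ENODEV;
	}

	sk->mcast_oif = val;
	return 0;
}
```

Because every check sits in front of the single assignment, passing 0 resets mcast_oif even on a socket that is bound to a device, which is the semantic difference mentioned in the message.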
[PATCH 0/2] QE clock source improvements
This patch set adds a new property to make specifying QE clock sources easier, adds a function to help parse the property, updates some other functions to use an enum instead of an integer, and updates the ucc_geth driver to take advantage of all this.
[PATCH 1/2] qe: add function qe_clock_source
Add function qe_clock_source() which takes a string containing the name of a QE clock source (as is typically found in device trees) and returns the matching enum qe_clock value. Update booting-without-of.txt to indicate that the UCC properties rx-clock and tx-clock are deprecated and replaced with rx-clock-name and tx-clock-name, which use strings instead of numbers to indicate QE clock sources. Update qe_setbrg() to take an enum qe_clock instead of an integer as its first parameter. Signed-off-by: Timur Tabi <[EMAIL PROTECTED]> --- This patch applies to Kumar's for-2.6.24 branch. arch/powerpc/sysdev/qe_lib/qe.c | 13 +++-- include/asm-powerpc/qe.h| 98 +++ 2 files changed, 56 insertions(+), 55 deletions(-) diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c index 3ccd360..8551e74 100644 --- a/arch/powerpc/sysdev/qe_lib/qe.c +++ b/arch/powerpc/sysdev/qe_lib/qe.c @@ -167,7 +167,7 @@ unsigned int get_brg_clk(void) /* Program the BRG to the given sampling rate and multiplier * - * @brg: the BRG, 1-16 + * @brg: the BRG, QE_BRG1 - QE_BRG16 * @rate: the desired sampling rate * @multiplier: corresponds to the value programmed in GUMR_L[RDCR] or * GUMR_L[TDCR]. E.g., if this BRG is the RX clock, and GUMR_L[RDCR]=01, @@ -175,11 +175,14 @@ unsigned int get_brg_clk(void) * * Also note that the value programmed into the BRGC register must be even. 
*/ -void qe_setbrg(unsigned int brg, unsigned int rate, unsigned int multiplier) +void qe_setbrg(enum qe_clock brg, unsigned int rate, unsigned int multiplier) { u32 divisor, tempval; u32 div16 = 0; + if ((brg < QE_BRG1) || (brg > QE_BRG16)) + return; + divisor = get_brg_clk() / (rate * multiplier); if (divisor > QE_BRGC_DIVISOR_MAX + 1) { @@ -196,7 +199,7 @@ void qe_setbrg(unsigned int brg, unsigned int rate, unsigned int multiplier) tempval = ((divisor - 1) << QE_BRGC_DIVISOR_SHIFT) | QE_BRGC_ENABLE | div16; - out_be32(&qe_immr->brg.brgc[brg - 1], tempval); + out_be32(&qe_immr->brg.brgc[brg - QE_BRG1], tempval); } /* Convert a string to a QE clock source enum @@ -214,7 +217,7 @@ enum qe_clock qe_clock_source(const char *source) if (strncasecmp(source, "brg", 3) == 0) { i = simple_strtoul(source + 3, NULL, 10); if ((i >= 1) && (i <= 16)) - return QE_BRG1 + i - 1; + return (QE_BRG1 - 1) + i; else return QE_CLK_DUMMY; } @@ -222,7 +225,7 @@ enum qe_clock qe_clock_source(const char *source) if (strncasecmp(source, "clk", 3) == 0) { i = simple_strtoul(source + 3, NULL, 10); if ((i >= 1) && (i <= 24)) - return QE_CLK1 + i - 1; + return (QE_CLK1 - 1) + i; else return QE_CLK_DUMMY; } diff --git a/include/asm-powerpc/qe.h b/include/asm-powerpc/qe.h index 7d53750..81403ee 100644 --- a/include/asm-powerpc/qe.h +++ b/include/asm-powerpc/qe.h @@ -28,6 +28,52 @@ #define MEM_PART_SECONDARY 1 #define MEM_PART_MURAM 2 +/* Clocks and BRGs */ +enum qe_clock { + QE_CLK_NONE = 0, + QE_BRG1,/* Baud Rate Generator 1 */ + QE_BRG2,/* Baud Rate Generator 2 */ + QE_BRG3,/* Baud Rate Generator 3 */ + QE_BRG4,/* Baud Rate Generator 4 */ + QE_BRG5,/* Baud Rate Generator 5 */ + QE_BRG6,/* Baud Rate Generator 6 */ + QE_BRG7,/* Baud Rate Generator 7 */ + QE_BRG8,/* Baud Rate Generator 8 */ + QE_BRG9,/* Baud Rate Generator 9 */ + QE_BRG10, /* Baud Rate Generator 10 */ + QE_BRG11, /* Baud Rate Generator 11 */ + QE_BRG12, /* Baud Rate Generator 12 */ + QE_BRG13, /* Baud Rate Generator 13 */ + 
QE_BRG14, /* Baud Rate Generator 14 */ + QE_BRG15, /* Baud Rate Generator 15 */ + QE_BRG16, /* Baud Rate Generator 16 */ + QE_CLK1,/* Clock 1 */ + QE_CLK2,/* Clock 2 */ + QE_CLK3,/* Clock 3 */ + QE_CLK4,/* Clock 4 */ + QE_CLK5,/* Clock 5 */ + QE_CLK6,/* Clock 6 */ + QE_CLK7,/* Clock 7 */ + QE_CLK8,/* Clock 8 */ + QE_CLK9,/* Clock 9 */ + QE_CLK10, /* Clock 10 */ + QE_CLK11, /* Clock 11 */ + QE_CLK12, /* Clock 12 */ + QE_CLK13, /* Clock 13 */ + QE_CLK14, /* Clock 14 */ + QE_CLK15, /* Clock 15 */ +
[PATCH 2/2] ucc_geth: use rx-clock-name and tx-clock-name device tree properties
This patch updates the ucc_geth device driver to check the new rx-clock-name and tx-clock-name properties first. If present, it uses the new function qe_clock_source() to obtain the clock source. Otherwise, it checks the deprecated rx-clock and tx-clock properties. The device trees for 832x, 836x, and 8568 have been updated to contain the new property names only. Signed-off-by: Timur Tabi <[EMAIL PROTECTED]> --- This patch applies to Kumar's for-2.6.24 branch, on top of my other patch titled "qe: add function qe_clock_source". arch/powerpc/boot/dts/mpc832x_mds.dts |8 arch/powerpc/boot/dts/mpc832x_rdb.dts |8 arch/powerpc/boot/dts/mpc836x_mds.dts |8 arch/powerpc/boot/dts/mpc8568mds.dts |8 drivers/net/ucc_geth.c| 12 +--- 5 files changed, 21 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/boot/dts/mpc832x_mds.dts b/arch/powerpc/boot/dts/mpc832x_mds.dts index fcd333c..b57485b 100644 --- a/arch/powerpc/boot/dts/mpc832x_mds.dts +++ b/arch/powerpc/boot/dts/mpc832x_mds.dts @@ -217,8 +217,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <19>; - tx-clock = <1a>; + rx-clock-name = "clk9"; + tx-clock-name = "clk10"; phy-handle = < &phy3 >; pio-handle = < &pio3 >; }; @@ -238,8 +238,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <17>; - tx-clock = <18>; + rx-clock-name = "clk7"; + tx-clock-name = "clk8"; phy-handle = < &phy4 >; pio-handle = < &pio4 >; }; diff --git a/arch/powerpc/boot/dts/mpc832x_rdb.dts b/arch/powerpc/boot/dts/mpc832x_rdb.dts index 388c8a7..e68a08b 100644 --- a/arch/powerpc/boot/dts/mpc832x_rdb.dts +++ b/arch/powerpc/boot/dts/mpc832x_rdb.dts @@ -202,8 +202,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <20>; - tx-clock = <13>; + rx-clock-name = "clk16"; + tx-clock-name = "clk3"; phy-handle = <&phy00>; pio-handle = <&ucc2pio>; }; @@ -223,8 +223,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; 
local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <19>; - tx-clock = <1a>; + rx-clock-name = "clk9"; + tx-clock-name = "clk10"; phy-handle = <&phy04>; pio-handle = <&ucc3pio>; }; diff --git a/arch/powerpc/boot/dts/mpc836x_mds.dts b/arch/powerpc/boot/dts/mpc836x_mds.dts index fbd1573..7a54072 100644 --- a/arch/powerpc/boot/dts/mpc836x_mds.dts +++ b/arch/powerpc/boot/dts/mpc836x_mds.dts @@ -245,8 +245,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <0>; - tx-clock = <19>; + rx-clock-name = "none"; + tx-clock-name = "clk9"; phy-handle = < &phy0 >; phy-connection-type = "rgmii-id"; pio-handle = < &pio1 >; @@ -267,8 +267,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <0>; - tx-clock = <14>; + rx-clock-name = "none"; + tx-clock-name = "clk4"; phy-handle = < &phy1 >; phy-connection-type = "rgmii-id"; pio-handle = < &pio2 >; diff --git a/arch/powerpc/boot/dts/mpc8568mds.dts b/arch/powerpc/boot/dts/mpc8568mds.dts index 5439437..cf45aab 100644 --- a/arch/powerpc/boot/dts/mpc8568mds.dts +++ b/arch/powerpc/boot/dts/mpc8568mds.dts @@ -333,8 +333,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <0>; - tx-clock = <20>; + rx-clock-name = "none"; + tx-clock-name = "clk16"; pio-handle = <&pio1>;
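The lookup order this patch describes (new string property first, deprecated numeric property as a fallback) can be sketched in user space as follows. Everything here is a hypothetical mock-up: struct mock_ucc_node stands in for a device-tree node, a NULL pointer means the property is absent, and mock_clock_from_name() is a simplified parser written for this sketch rather than the kernel's qe_clock_source(). The numbering (0 for none, 1 to 16 for BRG1 to BRG16, 17 to 40 for CLK1 to CLK24) is inferred from the device tree changes in the patch, where for example rx-clock = <19> (hex) becomes "clk9".

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_CLK_INVALID (-1)

/* Hypothetical stand-in for a UCC device-tree node. */
struct mock_ucc_node {
	const char *rx_clock_name;	/* new property, NULL if absent */
	const unsigned int *rx_clock;	/* deprecated property, NULL if absent */
};

/* Simplified name parser: "none", "brg1".."brg16", "clk1".."clk24". */
int mock_clock_from_name(const char *name)
{
	unsigned long n;

	if (strcmp(name, "none") == 0)
		return 0;
	if (strncmp(name, "brg", 3) == 0) {
		n = strtoul(name + 3, NULL, 10);
		return (n >= 1 && n <= 16) ? (int)n : MOCK_CLK_INVALID;
	}
	if (strncmp(name, "clk", 3) == 0) {
		n = strtoul(name + 3, NULL, 10);
		return (n >= 1 && n <= 24) ? (int)(16 + n) : MOCK_CLK_INVALID;
	}
	return MOCK_CLK_INVALID;
}

/* Prefer rx-clock-name; fall back to the deprecated rx-clock value. */
int mock_rx_clock(const struct mock_ucc_node *np)
{
	if (np->rx_clock_name)
		return mock_clock_from_name(np->rx_clock_name);
	if (np->rx_clock)
		return (*np->rx_clock <= 40) ? (int)*np->rx_clock
					     : MOCK_CLK_INVALID;
	return MOCK_CLK_INVALID;
}
```

Checking the string property first means an updated device tree takes effect even if a stale numeric property is still present, while older device trees that only carry rx-clock keep working.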
Re: [PATCH 0/2] QE clock source improvements
Sorry, please ignore this set. Something got screwed up with the patches. I'm going to resend. Timur Tabi wrote: This patch set adds a new property to make specifying QE clock sources easier, adds a function to help parse the property, updates some other functions to use an enum instead of an integer, and updates the ucc_geth driver to take advantage of all this. -- Timur Tabi Linux Kernel Developer @ Freescale
Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
From: jamal <[EMAIL PROTECTED]> Date: Wed, 10 Oct 2007 09:08:48 -0400 > On Wed, 2007-10-10 at 03:44 -0700, David Miller wrote: > > > I've always gotten very poor results when increasing the TX queue a > > lot, for example with NIU the point of diminishing returns seems to > > be in the range of 256-512 TX descriptor entries and this was with > > 1.6GHz cpus. > > Is it interrupt per packet? From my experience, you may find interesting > results varying tx interrupt mitigation parameters in addition to the > ring parameters. > Unfortunately when you do that, optimal parameters also depend on > packet size. so what may work for 64B, won't work well for 1400B. No, it was not interrupt per-packet, I was telling the chip to interrupt me every 1/4 of the ring.
[PATCH 0/2] QE clock source improvements
(Replaces all previous versions of this patch) This patch set adds a new property to make specifying QE clock sources easier, adds a function to help parse the property, updates some other functions to use an enum instead of an integer, and updates the ucc_geth driver to take advantage of all this.
[PATCH 1/2] qe: add function qe_clock_source
Add function qe_clock_source() which takes a string containing the name of a QE clock source (as is typically found in device trees) and returns the matching enum qe_clock value. Update booting-without-of.txt to indicate that the UCC properties rx-clock and tx-clock are deprecated and replaced with rx-clock-name and tx-clock-name, which use strings instead of numbers to indicate QE clock sources. Update qe_setbrg() to take an enum qe_clock instead of an integer as its first parameter. Signed-off-by: Timur Tabi <[EMAIL PROTECTED]> --- This patch applies to Kumar's for-2.6.24 branch. Documentation/powerpc/booting-without-of.txt | 13 arch/powerpc/sysdev/qe_lib/qe.c | 41 ++- include/asm-powerpc/qe.h | 95 +- 3 files changed, 99 insertions(+), 50 deletions(-) diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index 7a6c5f2..d8306ee 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -1615,6 +1615,19 @@ platforms are moved over to use the flattened-device-tree model. - interrupt-parent : the phandle for the interrupt controller that services interrupts for this device. - pio-handle : The phandle for the Parallel I/O port configuration. + - rx-clock-name: the UCC receive clock source + "none": clock source is disabled + "brg1" through "brg16": clock source is BRG1-BRG16, respectively + "clk1" through "clk24": clock source is CLK1-CLK24, respectively + - tx-clock-name: the UCC transmit clock source + "none": clock source is disabled + "brg1" through "brg16": clock source is BRG1-BRG16, respectively + "clk1" through "clk24": clock source is CLK1-CLK24, respectively + The following two properties are deprecated. rx-clock has been replaced + with rx-clock-name, and tx-clock has been replaced with tx-clock-name. 
+ Drivers that currently use the deprecated properties should continue to + do so, in order to support older device trees, but they should be updated + to check for the new properties first. - rx-clock : represents the UCC receive clock source. 0x00 : clock source is disabled; 0x1~0x10 : clock source is BRG1~BRG16 respectively; diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c index 3d57d38..8551e74 100644 --- a/arch/powerpc/sysdev/qe_lib/qe.c +++ b/arch/powerpc/sysdev/qe_lib/qe.c @@ -167,7 +167,7 @@ unsigned int get_brg_clk(void) /* Program the BRG to the given sampling rate and multiplier * - * @brg: the BRG, 1-16 + * @brg: the BRG, QE_BRG1 - QE_BRG16 * @rate: the desired sampling rate * @multiplier: corresponds to the value programmed in GUMR_L[RDCR] or * GUMR_L[TDCR]. E.g., if this BRG is the RX clock, and GUMR_L[RDCR]=01, @@ -175,11 +175,14 @@ unsigned int get_brg_clk(void) * * Also note that the value programmed into the BRGC register must be even. */ -void qe_setbrg(unsigned int brg, unsigned int rate, unsigned int multiplier) +void qe_setbrg(enum qe_clock brg, unsigned int rate, unsigned int multiplier) { u32 divisor, tempval; u32 div16 = 0; + if ((brg < QE_BRG1) || (brg > QE_BRG16)) + return; + divisor = get_brg_clk() / (rate * multiplier); if (divisor > QE_BRGC_DIVISOR_MAX + 1) { @@ -196,8 +199,40 @@ void qe_setbrg(unsigned int brg, unsigned int rate, unsigned int multiplier) tempval = ((divisor - 1) << QE_BRGC_DIVISOR_SHIFT) | QE_BRGC_ENABLE | div16; - out_be32(&qe_immr->brg.brgc[brg - 1], tempval); + out_be32(&qe_immr->brg.brgc[brg - QE_BRG1], tempval); +} + +/* Convert a string to a QE clock source enum + * + * This function takes a string, typically from a property in the device + * tree, and returns the corresponding "enum qe_clock" value. 
+*/ +enum qe_clock qe_clock_source(const char *source) +{ + unsigned int i; + + if (strcasecmp(source, "none") == 0) + return QE_CLK_NONE; + + if (strncasecmp(source, "brg", 3) == 0) { + i = simple_strtoul(source + 3, NULL, 10); + if ((i >= 1) && (i <= 16)) + return (QE_BRG1 - 1) + i; + else + return QE_CLK_DUMMY; + } + + if (strncasecmp(source, "clk", 3) == 0) { + i = simple_strtoul(source + 3, NULL, 10); + if ((i >= 1) && (i <= 24)) + return (QE_CLK1 - 1) + i; + else + return QE_CLK_DUMMY; + } + + return QE_CLK_DUMMY; } +EXPORT_SYMBOL(qe_clock_source); /* Initialize SNUMs (thread serial numbers) according to * QE Module Control chapter, SNUM table diff --git a/include/asm-powerpc/qe.h b/include/asm-powerpc/qe.h index 0dabe46..81403ee 100644 --- a/include/asm-powerpc/qe.h +++ b/include/asm-powerpc/qe.h @@ -28,6 +28,52 @@ #de
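For experimenting with the name-to-enum mapping outside the kernel, the parser from this patch can be approximated in user space. This is a sketch under assumptions, not the patch itself: the function is renamed qe_clock_from_name() to keep it distinct from the kernel's qe_clock_source(), strtoul() replaces simple_strtoul(), the POSIX strcasecmp()/strncasecmp() from <strings.h> are used, and the enum is abbreviated to its boundary values with QE_CLK_DUMMY placed after QE_CLK24, a position the truncated header above does not confirm.

```c
#include <stdlib.h>
#include <strings.h>

/* Abbreviated version of the enum from the patch: only the boundary
 * values are spelled out; the placement of QE_CLK_DUMMY is assumed. */
enum qe_clock {
	QE_CLK_NONE = 0,
	QE_BRG1 = 1,	/* Baud Rate Generator 1 */
	QE_BRG16 = 16,	/* Baud Rate Generator 16 */
	QE_CLK1 = 17,	/* Clock 1 */
	QE_CLK24 = 40,	/* Clock 24 */
	QE_CLK_DUMMY
};

/*
 * Convert a clock source name to its enum value, ignoring case.
 * Accepts "none", "brg1".."brg16" and "clk1".."clk24"; anything else
 * yields QE_CLK_DUMMY.
 */
enum qe_clock qe_clock_from_name(const char *source)
{
	unsigned long i;

	if (strcasecmp(source, "none") == 0)
		return QE_CLK_NONE;

	if (strncasecmp(source, "brg", 3) == 0) {
		i = strtoul(source + 3, NULL, 10);
		if (i >= 1 && i <= 16)
			return (enum qe_clock)((QE_BRG1 - 1) + i);
		return QE_CLK_DUMMY;
	}

	if (strncasecmp(source, "clk", 3) == 0) {
		i = strtoul(source + 3, NULL, 10);
		if (i >= 1 && i <= 24)
			return (enum qe_clock)((QE_CLK1 - 1) + i);
		return QE_CLK_DUMMY;
	}

	return QE_CLK_DUMMY;
}
```

Like the code in the patch, this sketch does not reject trailing characters after the number, so a stricter caller might pass an end pointer to strtoul() and check that it reached the end of the string.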
[PATCH 2/2] ucc_geth: use rx-clock-name and tx-clock-name device tree properties
This patch updates the ucc_geth device driver to check the new rx-clock-name and tx-clock-name properties first. If present, it uses the new function qe_clock_source() to obtain the clock source. Otherwise, it checks the deprecated rx-clock and tx-clock properties. The device trees for 832x, 836x, and 8568 have been updated to contain the new property names only. Signed-off-by: Timur Tabi <[EMAIL PROTECTED]> --- This patch applies to Kumar's for-2.6.24 branch, on top of my other patch titled "qe: add function qe_clock_source". arch/powerpc/boot/dts/mpc832x_mds.dts |8 ++-- arch/powerpc/boot/dts/mpc832x_rdb.dts |8 ++-- arch/powerpc/boot/dts/mpc836x_mds.dts |8 ++-- arch/powerpc/boot/dts/mpc8568mds.dts |8 ++-- drivers/net/ucc_geth.c| 55 ++-- 5 files changed, 67 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/boot/dts/mpc832x_mds.dts b/arch/powerpc/boot/dts/mpc832x_mds.dts index fcd333c..b57485b 100644 --- a/arch/powerpc/boot/dts/mpc832x_mds.dts +++ b/arch/powerpc/boot/dts/mpc832x_mds.dts @@ -217,8 +217,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <19>; - tx-clock = <1a>; + rx-clock-name = "clk9"; + tx-clock-name = "clk10"; phy-handle = < &phy3 >; pio-handle = < &pio3 >; }; @@ -238,8 +238,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <17>; - tx-clock = <18>; + rx-clock-name = "clk7"; + tx-clock-name = "clk8"; phy-handle = < &phy4 >; pio-handle = < &pio4 >; }; diff --git a/arch/powerpc/boot/dts/mpc832x_rdb.dts b/arch/powerpc/boot/dts/mpc832x_rdb.dts index 388c8a7..e68a08b 100644 --- a/arch/powerpc/boot/dts/mpc832x_rdb.dts +++ b/arch/powerpc/boot/dts/mpc832x_rdb.dts @@ -202,8 +202,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <20>; - tx-clock = <13>; + rx-clock-name = "clk16"; + tx-clock-name = "clk3"; phy-handle = <&phy00>; pio-handle = <&ucc2pio>; }; @@ -223,8 +223,8 @@ */ mac-address = [ 00 00 00 
00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <19>; - tx-clock = <1a>; + rx-clock-name = "clk9"; + tx-clock-name = "clk10"; phy-handle = <&phy04>; pio-handle = <&ucc3pio>; }; diff --git a/arch/powerpc/boot/dts/mpc836x_mds.dts b/arch/powerpc/boot/dts/mpc836x_mds.dts index fbd1573..7a54072 100644 --- a/arch/powerpc/boot/dts/mpc836x_mds.dts +++ b/arch/powerpc/boot/dts/mpc836x_mds.dts @@ -245,8 +245,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <0>; - tx-clock = <19>; + rx-clock-name = "none"; + tx-clock-name = "clk9"; phy-handle = < &phy0 >; phy-connection-type = "rgmii-id"; pio-handle = < &pio1 >; @@ -267,8 +267,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <0>; - tx-clock = <14>; + rx-clock-name = "none"; + tx-clock-name = "clk4"; phy-handle = < &phy1 >; phy-connection-type = "rgmii-id"; pio-handle = < &pio2 >; diff --git a/arch/powerpc/boot/dts/mpc8568mds.dts b/arch/powerpc/boot/dts/mpc8568mds.dts index 5439437..cf45aab 100644 --- a/arch/powerpc/boot/dts/mpc8568mds.dts +++ b/arch/powerpc/boot/dts/mpc8568mds.dts @@ -333,8 +333,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = <0>; - tx-clock = <20>; + rx-clock-name = "none"; + tx-clock-name = "clk16"; pio-handle = <&pi