Re: [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Sun, 18 Nov 2007 10:07:37 +0800

> On Sat, Nov 17, 2007 at 04:45:42PM -0800, David Miller wrote:
> >
> > Herbert, you asked about just nop'ing out cond_resched() when we're
> > doing real preemption.
> >
> > A lot of code goes:
> >
> >     if (need_resched()) {
> >         /* drop some locks, etc. */
> >         cond_resched();
> >         /* reacquire locks, etc. */
> >     }
> >
> > So it has to do something even with real preemption enabled.
>
> Actually that shouldn't be necessary, because things like spin_unlock
> do preempt_enable, which in turn does:
>
>     #define preempt_enable() \
>     do { \
>         preempt_enable_no_resched(); \
>         barrier(); \
>         preempt_check_resched(); \
>     } while (0)
>
> when CONFIG_PREEMPT is enabled. So at least in this case the
> cond_resched call is superfluous.

I see what you mean, ok yes that would catch it.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
On Sat, Nov 17, 2007 at 04:45:42PM -0800, David Miller wrote:
>
> Herbert, you asked about just nop'ing out cond_resched() when we're
> doing real preemption.
>
> A lot of code goes:
>
>     if (need_resched()) {
>         /* drop some locks, etc. */
>         cond_resched();
>         /* reacquire locks, etc. */
>     }
>
> So it has to do something even with real preemption enabled.

Actually that shouldn't be necessary, because things like spin_unlock
do preempt_enable, which in turn does:

    #define preempt_enable() \
    do { \
        preempt_enable_no_resched(); \
        barrier(); \
        preempt_check_resched(); \
    } while (0)

when CONFIG_PREEMPT is enabled. So at least in this case the
cond_resched call is superfluous.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
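The argument above can be played through in a small user-space model. This is only a sketch, not kernel code: every model_* name, the preempt_count variable and the resched_pending flag are stand-ins invented for illustration, and the model assumes the CONFIG_PREEMPT behaviour described in the message (the unlock path ends in a resched check).

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only: the names echo the kernel's, nothing here is kernel code. */
static int preempt_count;     /* > 0 means preemption is disabled */
static bool resched_pending;  /* stands in for the need_resched() flag */
static int schedule_calls;    /* how many times the model "scheduled" */

static void model_schedule(void)
{
	resched_pending = false;
	schedule_calls++;
}

static void model_preempt_disable(void)
{
	preempt_count++;
}

/* Follows the quoted macro: drop the count, then check for a pending resched. */
static void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;                         /* preempt_enable_no_resched() */
	if (preempt_count == 0 && resched_pending)
		model_schedule();                /* preempt_check_resched() */
}

static void model_spin_lock(void)   { model_preempt_disable(); }
static void model_spin_unlock(void) { model_preempt_enable(); }

/* Returns true only if the explicit call actually had to schedule. */
static bool model_cond_resched(void)
{
	if (!resched_pending)
		return false;
	model_schedule();
	return true;
}

/*
 * The lock-break pattern from the thread. Returns true if the explicit
 * cond_resched() in the middle did any work.
 */
static bool lock_break(void)
{
	bool did_work = false;

	if (resched_pending) {
		model_spin_unlock();             /* drop the lock */
		did_work = model_cond_resched();
		model_spin_lock();               /* reacquire the lock */
	}
	return did_work;
}
```

In this model, taking the lock, setting resched_pending and calling lock_break() leads to exactly one model_schedule() call, and it happens inside the unlock, so the explicit cond_resched step finds nothing left to do.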
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
Kevin Winchester wrote:
> However, I got around the problem by making the code change manually -
> and my network connection is now working. Looking at the code being
> bypassed:
>
>     if (pE.cap[i] || pP.cap[i] || pP.cap[i])
>
> looks somewhat weird as it is testing the same condition twice. Should
> it have been:
>
>     if (pE.cap[i] || pP.cap[i] || pI.cap[i])

Yes, that was also a bug. However, upon reflection (and as per my "0 &&"
hack), I now believe these few lines of code are problematic in general.

Thanks for reporting this bug. I'll post a clearer patch (that isn't
GPG'd).

Cheers

Andrew
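For reference, the duplicated-operand typo discussed above can be shown in isolation. The following is a stand-alone sketch, not the kernel function: struct cap_set, CAP_U32S and check_legacy_representable() are hypothetical names, and the code only illustrates the corrected three-way test. As the thread notes, fixing the typo alone did not restore networking, so this is not the eventual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define CAP_U32S 2  /* hypothetical stand-in for _LINUX_CAPABILITY_U32S */

/* Hypothetical container for one capability set split into 32-bit words. */
struct cap_set {
	uint32_t cap[CAP_U32S];
};

/*
 * Returns 0 if every word at index >= legacy_words is clear in the
 * effective, permitted and inheritable sets, and -ERANGE otherwise.
 * The third operand is pI, not a second pP as in the quoted code.
 */
static int check_legacy_representable(const struct cap_set *pE,
				      const struct cap_set *pP,
				      const struct cap_set *pI,
				      unsigned int legacy_words)
{
	unsigned int i;

	assert(legacy_words <= CAP_U32S);
	for (i = legacy_words; i < CAP_U32S; i++) {
		if (pE->cap[i] || pP->cap[i] || pI->cap[i])
			return -ERANGE;  /* cannot represent w/ legacy structure */
	}
	return 0;
}
```

With the original typo, a bit set only in the upper word of the inheritable set would have gone unnoticed; with the corrected test it yields -ERANGE.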
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
Andrew Morgan wrote:
> Kevin,
>
> Can you try this quick hack?
>
> diff --git a/kernel/capability.c b/kernel/capability.c
> index e57d1aa..4088610 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -109,7 +109,7 @@ out:
>  		kdata[i].permitted = pP.cap[i];
>  		kdata[i].inheritable = pI.cap[i];
>  	}
> -	while (i < _LINUX_CAPABILITY_U32S) {
> +	while (0 && (i < _LINUX_CAPABILITY_U32S)) {
>  		if (pE.cap[i] || pP.cap[i] || pP.cap[i]) {
>  			/* Cannot represent w/ legacy structure */
>  			return -ERANGE;
>

Oh, and the reason your patch showed up incorrectly in my mailer and on
lkml seems to be the PGP signature. I didn't have your public key, so my
mail client just left the full PGP-signed text in, which includes
escaping of '-' characters. LKML must also ignore the signature. Once I
added your public key, the patch shows up correctly in my client at
least.

(I guess everyone else probably knew this already...but at least I
learned something new today)

--
Kevin Winchester
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
Kevin Winchester wrote:
> Looking at the code being bypassed:
>
>     if (pE.cap[i] || pP.cap[i] || pP.cap[i])
>
> looks somewhat weird as it is testing the same condition twice. Should
> it have been:
>
>     if (pE.cap[i] || pP.cap[i] || pI.cap[i])
>
> ?
>
> I'm about to test that change instead of bypassing the loop, so I'll let
> you know the results.
>

No, this still results in a dead network connection, although it is
probably a correct change. I suppose giving the loop even more reasons to
return -ERANGE wasn't going to be helpful.

--
Kevin Winchester
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
Andrew Morgan wrote:
> Kevin,
>
> Can you try this quick hack?
>
> diff --git a/kernel/capability.c b/kernel/capability.c
> index e57d1aa..4088610 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -109,7 +109,7 @@ out:
>  		kdata[i].permitted = pP.cap[i];
>  		kdata[i].inheritable = pI.cap[i];
>  	}
> -	while (i < _LINUX_CAPABILITY_U32S) {
> +	while (0 && (i < _LINUX_CAPABILITY_U32S)) {
>  		if (pE.cap[i] || pP.cap[i] || pP.cap[i]) {
>  			/* Cannot represent w/ legacy structure */
>  			return -ERANGE;
>

Well, something went wrong with the patch - it has extra negative signs
in my mail reader, and on lkml, but now that I've hit reply and it's been
quoted, it looks fine in my mail client. So I have no idea what went on.

However, I got around the problem by making the code change manually -
and my network connection is now working. Looking at the code being
bypassed:

    if (pE.cap[i] || pP.cap[i] || pP.cap[i])

looks somewhat weird as it is testing the same condition twice. Should
it have been:

    if (pE.cap[i] || pP.cap[i] || pI.cap[i])

?

I'm about to test that change instead of bypassing the loop, so I'll let
you know the results.

--
Kevin Winchester
Re: [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Sun, 18 Nov 2007 00:29:39 +0800

> However, since you're already working on this as your next step
> I can wait :)

Me too.

Herbert, you asked about just nop'ing out cond_resched() when we're
doing real preemption.

A lot of code goes:

    if (need_resched()) {
        /* drop some locks, etc. */
        cond_resched();
        /* reacquire locks, etc. */
    }

So it has to do something even with real preemption enabled.
Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Sat, 17 Nov 2007 22:46:40 +0100

> > The 25.000.000 ns and 88.000.000 ns numbers were on an empty table, but
> > large (16 MB of memory)
>
> This would mean that cond_resched() needs ~4x as much time as checking
> an empty bucket. I find that somewhat hard to believe.

Based upon Arjan's analysis of the stall created by the cond_resched()
assembler, this doesn't surprise me.

I also don't believe Eric would make up such numbers or measure them
carelessly and then present them as fact.
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Sat, 17 Nov 2007 14:18:46 +0100

> Wang Chen <[EMAIL PROTECTED]> writes:
>
> > Herbert Xu said the following on 2007-11-16 12:11:
> >> Wang Chen <[EMAIL PROTECTED]> wrote:
> >>> So, I think the checksum in udp_queue_rcv_skb() actually does
> >>> the work, not that in udp_recvmsg() and udp_poll().
> >>>
> >>> If I am wrong, please point out.
> >>
> >> We may have a bug in the accounting area. Check the recent
> >> patch made to UDP/IPv6 which is probably needed here as well.
> >>
> >
> > Like dave said, decrementing the InDataGrams in this case is an
> > option.
> > I will check the same place of UDP/IPv6.
>
> And like Benny pointed out it's probably a bad idea because
> decrementing counters will be an unexpected ABI change for monitoring
> programs who have no other way to detect overflow.

We could defer the increment until we check the checksum, but that is
likely to break even more things because people (as Wang Chen did
initially) will send a packet to some port with an app that doesn't
eat the packets, and expect the InDatagrams counter to increase once
the stack eats the packet. But it won't until the application does
the read.

All of our options suck, we just have to choose the least sucking one
and right now to me that's decrementing the counter as much as I
empathize with the SNMP application overflow detection issue.
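The overflow-detection concern can be made concrete with a little arithmetic. The sketch below assumes a poller that samples a 32-bit counter and relies on unsigned wraparound; counter_delta() and delta_is_suspicious() are hypothetical helpers for illustration, not taken from any SNMP agent or from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Difference between two samples of a 32-bit counter that is assumed to
 * only ever increase. Unsigned subtraction is modulo 2^32, so a single
 * wrap between the two samples is absorbed automatically.
 */
static uint32_t counter_delta(uint32_t prev, uint32_t cur)
{
	return (uint32_t)(cur - prev);
}

/*
 * A poller cannot tell a decrement apart from a nearly complete wrap:
 * both show up as a huge delta. max_plausible is a threshold chosen by
 * the hypothetical poller.
 */
static int delta_is_suspicious(uint32_t delta, uint32_t max_plausible)
{
	assert(max_plausible > 0);
	return delta > max_plausible;
}
```

For example, samples 0xFFFFFFFE and then 3 give a delta of 5 (a real wrap), while samples 105 and then 104 give 0xFFFFFFFF: a single decrement looks like almost 2^32 received datagrams, which is the ambiguity raised in the quoted objection.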
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
Kevin,

Can you try this quick hack?

diff --git a/kernel/capability.c b/kernel/capability.c
index e57d1aa..4088610 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -109,7 +109,7 @@ out:
 		kdata[i].permitted = pP.cap[i];
 		kdata[i].inheritable = pI.cap[i];
 	}
-	while (i < _LINUX_CAPABILITY_U32S) {
+	while (0 && (i < _LINUX_CAPABILITY_U32S)) {
 		if (pE.cap[i] || pP.cap[i] || pP.cap[i]) {
 			/* Cannot represent w/ legacy structure */
 			return -ERANGE;

Thanks

Andrew

Kevin Winchester wrote:
> On November 17, 2007 01:16:58 am Andrew Morgan wrote:
>> Hi,
>>
>> This warning is just saying that you might want to reconsider
>> recompiling your dhclient with a newer libcap - which has native support
>> for 64-bit capabilities. This is supposed to be informative, and not be
>> associated with any particular error.
>>
>> From your comments, you believe that this patch causes something in your
>> boot process to fail. Can you supply some detail about the version of
>> dhclient you are using? I'd like to understand exactly what it is doing
>> (via libcap).
>>
>> Thanks
>>
>
> The boot succeeds (and appears to initialize the network adapter
> properly - it autonegotiates a 100Mbps link speed), but the dhcp client is
> never able to get an address. However, applying the rc2-mm1 patch series up
> to just before:
>
> add-64-bit-capability-support-to-the-kernel.patch
>
> results in a working kernel. Applying just this patch causes the failure.
> To be sure, I also tried applying the above patch plus the following ones:
>
> add-64-bit-capability-support-to-the-kernel-checkpatch-fixes.patch
> add-64-bit-capability-support-to-the-kernel-fix.patch
> add-64-bit-capability-support-to-the-kernel-fix-fix.patch
> remove-unnecessary-include-from-include-linux-capabilityh.patch
>
> but the problem still occurs even with all of these.
>
> As to versions, I'm running Kubuntu gutsy, so I have the default:
>
> dhcp3-client    3.0.5-3ubuntu4
> libcap1         1:1.10-14build1
>
> packages installed.
>
> Let me know if you need any other information, or if you have a patch you
> would like tested.
>
Re: r8169 crash
When it happened before on 2.6.22, I tried attaching a good cable,
plugging and unplugging it, and bringing the interface up and down, but
the card still remained dead.
I also tried plugging the cable into a laptop with an rtl8139 and a PC
with an e100: both kept working when I shook the bad cable (the interface
went up and down as well), the good cable also worked fine, and neither
ever crashed.

On 2.6.22 I was able to reproduce it easily, but I could not test further
afterwards because it is a server in an internet cafe, and the customers
were going crazy.

On Sat, 17 Nov 2007 23:07:59 +0100, Francois Romieu wrote
> Denys <[EMAIL PROTECTED]> :
> [...]
> > After that i have.
> > Nov 15 22:11:37 vzone NETDEV WATCHDOG: eth1: transmit timed out
> > Nov 15 22:11:49 vzone NETDEV WATCHDOG: eth1: transmit timed out
> > Nov 15 22:12:01 vzone NETDEV WATCHDOG: eth1: transmit timed out
> > Nov 15 22:12:13 vzone NETDEV WATCHDOG: eth1: transmit timed out
> > Nov 15 22:12:25 vzone NETDEV WATCHDOG: eth1: transmit timed out
> > Nov 15 22:12:37 vzone NETDEV WATCHDOG: eth1: transmit timed out
> > Nov 15 22:12:49 vzone NETDEV WATCHDOG: eth1: transmit timed out
> > Nov 15 22:13:01 vzone NETDEV WATCHDOG: eth1: transmit timed out
>
> Is there a chance for you to verify that the network interface does
> not recover if a real cable is plugged at this point ?
>
> --
> Ueimor

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
[PATCH, REPOST] Fix/Rewrite of the mipsnet driver
Hello All,

currently the mipsnet driver fails after transmitting a number of
packets because SKBs are allocated but never freed. I fixed that
and couldn't refrain from removing the most egregious warts.

- mipsnet.h folded into mipsnet.c, as it doesn't provide any
  useful external interface.
- Free SKB after transmission.
- Call free_irq in mipsnet_close, to balance the request_irq in
  mipsnet_open.
- Removed duplicate read of rxDataCount.
- Some identifiers are now less verbose.
- Removed dead and/or unnecessarily complex code.
- Code formatting fixes.

Tested on Qemu's mipssim emulation, with this patch it can boot a
Debian NFSroot.


Thiemo


Signed-off-by: Thiemo Seufer <[EMAIL PROTECTED]>
---
 b/drivers/net/mipsnet.c |  201
 drivers/net/mipsnet.h   |  112 --
 2 files changed, 134 insertions(+), 179 deletions(-)

diff --git a/drivers/net/mipsnet.c b/drivers/net/mipsnet.c
index aafc3ce..6d343ef 100644
--- a/drivers/net/mipsnet.c
+++ b/drivers/net/mipsnet.c
@@ -4,8 +4,6 @@
  * for more details.
  */
 
-#define DEBUG
-
 #include
 #include
 #include
@@ -15,11 +13,93 @@
 #include
 #include
 
-#include "mipsnet.h" /* actual device IO mapping */
+#define MIPSNET_VERSION "2007-11-17"
+
+/*
+ * Net status/control block as seen by sw in the core.
+ */
+struct mipsnet_regs {
+	/*
+	 * Device info for probing, reads as MIPSNET%d where %d is some
+	 * form of version.
+	 */
+	u64 devId;		/*0x00 */
 
-#define MIPSNET_VERSION "2005-06-20"
+	/*
+	 * read only busy flag.
+	 * Set and cleared by the Net Device to indicate that an rx or a tx
+	 * is in progress.
+	 */
+	u32 busy;		/*0x08 */
 
-#define mipsnet_reg_address(dev, field) (dev->base_addr + field_offset(field))
+	/*
+	 * Set by the Net Device.
+	 * The device will set it once data has been received.
+	 * The value is the number of bytes that should be read from
+	 * rxDataBuffer. The value will decrease till 0 until all the data
+	 * from rxDataBuffer has been read.
+	 */
+	u32 rxDataCount;	/*0x0c */
+#define MIPSNET_MAX_RXTX_DATACOUNT (1 << 16)
+
+	/*
+	 * Settable from the MIPS core, cleared by the Net Device.
+	 * The core should set the number of bytes it wants to send,
+	 * then it should write those bytes of data to txDataBuffer.
+	 * The device will clear txDataCount has been processed (not
+	 * necessarily sent).
+	 */
+	u32 txDataCount;	/*0x10 */
+
+	/*
+	 * Interrupt control
+	 *
+	 * Used to clear the interrupted generated by this dev.
+	 * Write a 1 to clear the interrupt. (except bit31).
+	 *
+	 * Bit0 is set if it was a tx-done interrupt.
+	 * Bit1 is set when new rx-data is available.
+	 *    Until this bit is cleared there will be no other RXs.
+	 *
+	 * Bit31 is used for testing, it clears after a read.
+	 *    Writing 1 to this bit will cause an interrupt to be generated.
+	 *    To clear the test interrupt, write 0 to this register.
+	 */
+	u32 interruptControl;	/*0x14 */
+#define MIPSNET_INTCTL_TXDONE	(1u << 0)
+#define MIPSNET_INTCTL_RXDONE	(1u << 1)
+#define MIPSNET_INTCTL_TESTBIT	(1u << 31)
+
+	/*
+	 * Readonly core-specific interrupt info for the device to signal
+	 * the core. The meaning of the contents of this field might change.
+	 */
+	/* XXX: the whole memIntf interrupt scheme is messy: the device
+	 * should have no control what so ever of what VPE/register set is
+	 * being used.
+	 * The MemIntf should only expose interrupt lines, and something in
+	 * the config should be responsible for the line<->core/vpe bindings.
+	 */
+	u32 interruptInfo;	/*0x18 */
+
+	/*
+	 * This is where the received data is read out.
+	 * There is more data to read until rxDataReady is 0.
+	 * Only 1 byte at this regs offset is used.
+	 */
+	u32 rxDataBuffer;	/*0x1c */
+
+	/*
+	 * This is where the data to transmit is written.
+	 * Data should be written for the amount specified in the
+	 * txDataCount register.
+	 * Only 1 byte at this regs offset is used.
+	 */
+	u32 txDataBuffer;	/*0x20 */
+};
+
+#define regaddr(dev, field) \
+	(dev->base_addr + offsetof(struct mipsnet_regs, field))
 
 static char mipsnet_string[] = "mipsnet";
 
@@ -29,32 +109,27 @@ static char mipsnet_string[] = "mipsnet";
 static int ioiocpy_frommipsnet(struct net_device *dev, unsigned char *kdata,
 			int len)
 {
-	uint32_t available_len = inl(mipsnet_reg_address(dev, rxDataCount));
-
-	if (available_len < len)
-		return -EFAULT;
-
 	for (; len > 0; len--, kdata++)
-		*
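The register offsets noted in the comments of the patch above can be cross-checked in user space with offsetof(). This is only a sketch: it substitutes the fixed-width stdint types for the kernel's u64/u32, REGADDR() is a hypothetical variant of the patch's regaddr() macro that takes a plain base address instead of a net_device, and the base address used in the usage note is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space copy of the register block, with stdint types instead of u64/u32. */
struct mipsnet_regs {
	uint64_t devId;            /* 0x00 */
	uint32_t busy;             /* 0x08 */
	uint32_t rxDataCount;      /* 0x0c */
	uint32_t txDataCount;      /* 0x10 */
	uint32_t interruptControl; /* 0x14 */
	uint32_t interruptInfo;    /* 0x18 */
	uint32_t rxDataBuffer;     /* 0x1c */
	uint32_t txDataBuffer;     /* 0x20 */
};

/* Hypothetical variant of regaddr(): base address plus field offset. */
#define REGADDR(base, field) \
	((base) + offsetof(struct mipsnet_regs, field))

/* Aborts if the struct layout disagrees with the offsets in the comments. */
static void check_mipsnet_layout(void)
{
	assert(offsetof(struct mipsnet_regs, devId) == 0x00);
	assert(offsetof(struct mipsnet_regs, busy) == 0x08);
	assert(offsetof(struct mipsnet_regs, rxDataCount) == 0x0c);
	assert(offsetof(struct mipsnet_regs, txDataCount) == 0x10);
	assert(offsetof(struct mipsnet_regs, interruptControl) == 0x14);
	assert(offsetof(struct mipsnet_regs, interruptInfo) == 0x18);
	assert(offsetof(struct mipsnet_regs, rxDataBuffer) == 0x1c);
	assert(offsetof(struct mipsnet_regs, txDataBuffer) == 0x20);
}
```

With an arbitrary base of 0x4200, REGADDR(0x4200UL, rxDataCount) evaluates to 0x420c, which is the kind of port address the driver would pass to inl() or outl().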
Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0
Bill Fink wrote, On 11/16/2007 08:26 PM:
...
> Regarding the Target IP, RFC 826 says:
>
> "The target protocol address is necessary in the request form
> of the packet so that a machine can determine whether or not
> to enter the sender information in a table or to send a reply.
> It is not necessarily needed in the reply form if one assumes
> a reply is only provoked by a request. It is included for
> completeness, network monitoring, and to simplify the suggested
> processing algorithm described above (which does not look at
> the opcode until AFTER putting the sender information in a
> table).
>
> So it's ambiguous about the target IP address in an ARP reply packet,
> but a value of 0.0.0.0 makes more logical sense to me than using
> 192.168.0.1 in this example case, since it should reflect the requestor
> IP address, which is unknown in this case.

IMHO, you are mostly right, but according to this, the only ambiguity
is whether there is a target IP or no target IP, so here: 0.0.0.0
or 0.0.0.0...

Regards,
Jarek P.
Re: r8169 crash
Denys <[EMAIL PROTECTED]> :
[...]
> After that i have.
> Nov 15 22:11:37 vzone NETDEV WATCHDOG: eth1: transmit timed out
> Nov 15 22:11:49 vzone NETDEV WATCHDOG: eth1: transmit timed out
> Nov 15 22:12:01 vzone NETDEV WATCHDOG: eth1: transmit timed out
> Nov 15 22:12:13 vzone NETDEV WATCHDOG: eth1: transmit timed out
> Nov 15 22:12:25 vzone NETDEV WATCHDOG: eth1: transmit timed out
> Nov 15 22:12:37 vzone NETDEV WATCHDOG: eth1: transmit timed out
> Nov 15 22:12:49 vzone NETDEV WATCHDOG: eth1: transmit timed out
> Nov 15 22:13:01 vzone NETDEV WATCHDOG: eth1: transmit timed out

Is there a chance for you to verify that the network interface does
not recover if a real cable is plugged in at this point?

--
Ueimor
Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()
> The 25.000.000 ns and 88.000.000 ns numbers were on an empty table, but
> large (16 MB of memory)

This would mean that cond_resched() needs ~4x as much time as checking
an empty bucket. I find that somewhat hard to believe.

-Andi
Re: Netconsole and logging everything from /dev/console
* Matt Mackall <[EMAIL PROTECTED]> [2007-11-16 12:50]:
> > Do you think that would be possible?
>
> It is, definitely, you just need to wire up a tty struct's write
> method to netconsole's and add it to the console registration. But I
> haven't had any time to work on this in a while.

Where would I look for examples I could follow?
--
Martin Michlmayr
http://www.cyrius.com/
Re: [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
On Sat, Nov 17, 2007 at 05:18:35PM +0100, Eric Dumazet wrote:
>
> >This seems to be the only potentially softirq caller of rt_run_flush.
> >However, I just checked the callers of it and most of them seem to
> >hold the RTNL which would indicate that they're in process context.
> >
> >So do you know if we have any real softirq callers left?
> >If we do perhaps we can look at either moving them out or see
> >if they can cope with the flush occurring after the call returns.
> >
> >If not we can get rid of the softirq special case.
>
> Unfortunately we have softirq callers left. But my goal is to move
> everything to process context, yes. I chose small patches, so that they
> can be more easily reviewed and accepted.
>
> The most common case is triggered by "ip route flush cache",
> since it arms a 2 second timer (ip_rt_min_delay). When this
> timer fires (softirq), it flushes the table.
>
> Then, every call to rt_cache_flush(-1) asks for the same thing, while
> calls to rt_cache_flush(0) are synchronous (immediate flushing unless a
> flush is already in flight)

Right. Obviously the ones with non-zero arguments aren't an issue
because it's delayed anyway. What I meant above is: which of the ones
that call it with zero are really in a softirq context?

As the ones I looked at all seem to hold the RTNL it would suggest
that most of them are already in process context.

However, since you're already working on this as your next step
I can wait :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()
Andi Kleen wrote:
> Eric Dumazet <[EMAIL PROTECTED]> writes:
>> So it may sound unnecessary but in the rt_check_expire() case, with a
>> loop potentially doing XXX.XXX iterations, being able to bypass the
>> function call is a clear win (in my bench case, 25 ms instead of 88
>> ms). Impact on I-cache is irrelevant here as this rt_check_expires()
>
> Measuring what? And really milli-seconds? The number does not sound
> plausible to me.

You know Andi, I have seen production servers that needed several
seconds to perform the flush. When you have millions of entries in this
table, can you imagine the number of memory transactions (including
atomic ops) needed to flush them all?

The 25.000.000 ns and 88.000.000 ns numbers were on an empty table, but
large (16 MB of memory)
Re: [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
Herbert Xu wrote:
> On Sat, Nov 17, 2007 at 09:41:47AM +, Eric Dumazet wrote:
>> [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to
>> workqueue
>
> Thanks for your work on this Eric! It's very much needed.

Thanks :)

>> @@ -667,7 +697,7 @@ void rt_cache_flush(int delay)
>>
>>  	if (delay <= 0) {
>>  		spin_unlock_bh(&rt_flush_lock);
>> -		rt_run_flush(0);
>> +		rt_run_flush(user_mode);
>>  		return;
>>  	}
>
> This seems to be the only potentially softirq caller of rt_run_flush.
> However, I just checked the callers of it and most of them seem to
> hold the RTNL which would indicate that they're in process context.
>
> So do you know if we have any real softirq callers left?
> If we do perhaps we can look at either moving them out or see
> if they can cope with the flush occurring after the call returns.
>
> If not we can get rid of the softirq special case.

Unfortunately we have softirq callers left. But my goal is to move
everything to process context, yes. I chose small patches, so that they
can be more easily reviewed and accepted.

The most common case is triggered by "ip route flush cache",
since it arms a 2 second timer (ip_rt_min_delay). When this
timer fires (softirq), it flushes the table.

Then, every call to rt_cache_flush(-1) asks for the same thing, while
calls to rt_cache_flush(0) are synchronous (immediate flushing unless a
flush is already in flight)

net/decnet/dn_table.c:621:      dn_rt_cache_flush(-1);
net/decnet/dn_table.c:625:      dn_rt_cache_flush(-1);
net/decnet/dn_table.c:700:      dn_rt_cache_flush(-1);
net/decnet/dn_table.c:707:      dn_rt_cache_flush(-1);
net/decnet/dn_table.c:876:      dn_rt_cache_flush(-1);
net/decnet/dn_rules.c:234:      dn_rt_cache_flush(-1);
net/decnet/dn_fib.c:632:        dn_rt_cache_flush(0);
net/decnet/dn_fib.c:644:        dn_rt_cache_flush(-1);
net/decnet/dn_fib.c:651:        dn_rt_cache_flush(-1);
net/decnet/dn_route.c:339:void dn_rt_cache_flush(int delay)
net/ipv4/devinet.c:1344:        rt_cache_flush(0);
net/ipv4/devinet.c:1359:        rt_cache_flush(0);
net/ipv4/devinet.c:1374:        rt_cache_flush(0);
net/ipv4/devinet.c:1387:        rt_cache_flush(0);
net/ipv4/fib_frontend.c:126:    rt_cache_flush(-1);
net/ipv4/fib_frontend.c:833:    rt_cache_flush(0);
net/ipv4/fib_frontend.c:847:    rt_cache_flush(-1);
net/ipv4/fib_frontend.c:857:    rt_cache_flush(-1);
net/ipv4/fib_frontend.c:888:    rt_cache_flush(-1);
net/ipv4/fib_frontend.c:895:    rt_cache_flush(0);
net/ipv4/fib_rules.c:274:       rt_cache_flush(-1);
net/ipv4/fib_trie.c:1235:       rt_cache_flush(-1);
net/ipv4/fib_trie.c:1288:       rt_cache_flush(-1);
net/ipv4/fib_trie.c:1654:       rt_cache_flush(-1);
net/ipv4/route.c:671:void rt_cache_flush(int delay)
net/ipv4/route.c:2688:          rt_cache_flush(0);
net/ipv4/route.c:2700:          rt_cache_flush(flush_delay);
net/ipv4/route.c:2720:          rt_cache_flush(delay);
net/ipv4/arp.c:1215:            rt_cache_flush(0);
net/ipv4/fib_hash.c:459:        rt_cache_flush(-1);
net/ipv4/fib_hash.c:526:        rt_cache_flush(-1);
net/ipv4/fib_hash.c:608:        rt_cache_flush(-1);
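The delay convention described in this message can be summarised in a few lines. This is a simplified user-space sketch based only on the description above and on the "if (delay <= 0)" test visible in the quoted hunk: plan_flush(), struct flush_plan and the enum are hypothetical names, the treatment of a negative delay as "use the minimum delay" is my reading of the message, and the real function's locking and timer handling are left out.

```c
#include <assert.h>

enum flush_mode {
	FLUSH_IMMEDIATE,  /* flush synchronously, in the caller's context */
	FLUSH_DEFERRED    /* arm a timer; the flush then runs from softirq */
};

struct flush_plan {
	enum flush_mode mode;
	int delay;        /* effective delay, in the caller's time unit */
};

/*
 * Decide what a flush request with the given delay amounts to.
 * min_delay stands in for ip_rt_min_delay (2 seconds in the message).
 */
static struct flush_plan plan_flush(int delay, int min_delay)
{
	struct flush_plan plan;

	assert(min_delay >= 0);
	if (delay < 0)
		delay = min_delay;  /* -1 means "use the default delay" */

	plan.mode = (delay <= 0) ? FLUSH_IMMEDIATE : FLUSH_DEFERRED;
	plan.delay = delay;
	return plan;
}
```

Under these assumptions, plan_flush(-1, 2) is a deferred flush after 2 time units, which is the timer path that still ends up in softirq context, while plan_flush(0, 2) is the immediate path whose callers Herbert asks about.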
Re: [PATCH] via-velocity: don't oops on MTU change.
Jon Nelson wrote, On 11/17/2007 01:59 AM:
...
> OK. This is what I did.
> Using git I grabbed a copy of Linus' tree and using the latest files
> for via-velocity.[c,h], commit
> 99fee6d7e5748d96884667a4628118f7fc130ea0, I determined that if I
> backed out change 44c10138fd4bbc4b6d6bff0873c24902f2a9da65 (PCI:
> Change all drivers to use pci_device->revision) I could get it to
> compile. This gets me a more recent driver.
>
> Then I applied both of the patches you have provided me, and built and
> tried that.
> No sigseg, no oops on initial MTU, no sigseg or oops on subsequent MTU
> changes.

Good news! But, if I got it right, your method could be tricky. These
current via-velocity files could depend on other files being current as
well. So, it's safer to use the whole new kernel (e.g. 2.6.24-rc3) or to
stay with your older one. But, if you want to check the effect of these
two new patches only, without any additional 'features', your older
kernel should be a better choice.

Regards,
Jarek P.
Re: [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
On Sat, Nov 17, 2007 at 09:41:47AM +, Eric Dumazet wrote:
>
> [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to
> workqueue

Thanks for your work on this Eric! It's very much needed.

> @@ -667,7 +697,7 @@ void rt_cache_flush(int delay)
>
>  	if (delay <= 0) {
>  		spin_unlock_bh(&rt_flush_lock);
> -		rt_run_flush(0);
> +		rt_run_flush(user_mode);
>  		return;
>  	}

This seems to be the only potentially softirq caller of rt_run_flush.
However, I just checked the callers of it and most of them seem to
hold the RTNL which would indicate that they're in process context.

So do you know if we have any real softirq callers left?
If we do perhaps we can look at either moving them out or see
if they can cope with the flush occurring after the call returns.

If not we can get rid of the softirq special case.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] tun: use iov_length()
Use iov_length() instead of tun's homemade iov_total().

Cc: Maxim Krasnyansky <[EMAIL PROTECTED]>
Signed-off-by: Akinobu Mita <[EMAIL PROTECTED]>

---
 drivers/net/tun.c |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

Index: 2.6-mm/drivers/net/tun.c
===================================================================
--- 2.6-mm.orig/drivers/net/tun.c
+++ 2.6-mm/drivers/net/tun.c
@@ -292,17 +292,6 @@ static __inline__ ssize_t tun_get_user(s
 	return count;
 }
 
-static inline size_t iov_total(const struct iovec *iv, unsigned long count)
-{
-	unsigned long i;
-	size_t len;
-
-	for (i = 0, len = 0; i < count; i++)
-		len += iv[i].iov_len;
-
-	return len;
-}
-
 static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
 			      unsigned long count, loff_t pos)
 {
@@ -313,7 +302,7 @@ static ssize_t tun_chr_aio_write(struct
 
 	DBG(KERN_INFO "%s: tun_chr_write %ld\n", tun->dev->name, count);
 
-	return tun_get_user(tun, (struct iovec *) iv, iov_total(iv, count));
+	return tun_get_user(tun, (struct iovec *) iv, iov_length(iv, count));
 }
 
 /* Put packet to the user space buffer */
@@ -364,7 +353,7 @@ static ssize_t tun_chr_aio_read(struct k
 
 	DBG(KERN_INFO "%s: tun_chr_read\n", tun->dev->name);
 
-	len = iov_total(iv, count);
+	len = iov_length(iv, count);
 	if (len < 0)
 		return -EINVAL;
 
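What the removed helper computed, and what the replacement is expected to compute, can be tried out in user space with the POSIX struct iovec. The sketch below is an illustration only: iov_total_len() is a hypothetical user-space name, not the kernel's iov_length(), and it assumes the same semantics as the removed iov_total(), namely the plain sum of all segment lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Sum the lengths of nr_segs segments of an I/O vector, like the
 * removed iov_total() helper in the patch above.
 */
static size_t iov_total_len(const struct iovec *iov, unsigned long nr_segs)
{
	unsigned long seg;
	size_t total = 0;

	assert(iov != NULL || nr_segs == 0);
	for (seg = 0; seg < nr_segs; seg++)
		total += iov[seg].iov_len;

	return total;
}
```

For a two-segment vector describing a 3-byte and a 5-byte buffer the helper returns 8; an empty vector yields 0.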
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
On Fri, Nov 16, 2007 at 09:16:58PM -0800, Andrew Morgan wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> This warning is just saying that you might want to reconsider
> recompiling your dhclient with a newer libcap - which has native support
> for 64-bit capabilities. This is supposed to be informative, and not be
> associated with any particular error.
>
> - From your comments, you believe that this patch causes something in your
> boot process to fail. Can you supply some detail about the version of
> dhclient you are using? I'd like to understand exactly what it is doing
> (via libcap).
>
> Thanks

The machines which show this problem for me are using static network
configurations, so I don't know if libcap is still in the mix there.

I've just compared the boot logs from a successful and an unsuccessful
boot on this kernel, and I don't see that particular message, nor do I
see any significant differences overall.

Perplexed.

-apw
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
Wang Chen <[EMAIL PROTECTED]> writes:

> Herbert Xu said the following on 2007-11-16 12:11:
>> Wang Chen <[EMAIL PROTECTED]> wrote:
>>> So, I think the checksum in udp_queue_rcv_skb() actually does
>>> the work, not that in udp_recvmsg() and udp_poll().
>>>
>>> If I am wrong, please point out.
>>
>> We may have a bug in the accounting area. Check the recent
>> patch made to UDP/IPv6 which is probably needed here as well.
>>
>
> Like dave said, decrementing the InDataGrams in this case is an
> option.
> I will check the same place of UDP/IPv6.

And as Benny pointed out, it's probably a bad idea, because decrementing
counters would be an unexpected ABI change for monitoring programs, which
have no other way to detect overflow.

-Andi
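The overflow concern can be made concrete with a small user-space sketch. It assumes a monitoring program that samples a 32-bit counter and computes the increase between samples with unsigned subtraction, a common way to tolerate wraparound. The helper name counter_delta and the 32-bit width are assumptions for illustration, not taken from any particular tool.

```c
#include <stdint.h>

/*
 * Hypothetical helper: increase of a 32-bit counter between two samples.
 * Unsigned subtraction is modulo 2^32, so a counter that wrapped once
 * still yields the correct delta. This only works if the counter never
 * decreases: a decrement looks exactly like a wrap.
 */
static uint32_t counter_delta(uint32_t prev, uint32_t now)
{
	return now - prev;
}
```

Under these assumptions, a counter going from 100 to 99 is indistinguishable from one that wrapped all the way around, so the monitor would report a delta of 4294967295 instead of -1.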
Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()
Eric Dumazet <[EMAIL PROTECTED]> writes:

> So it may sound unnecessary but in the rt_check_expire() case, with a
> loop potentially doing XXX.XXX iterations, being able to bypass the
> function call is a clear win (in my bench case, 25 ms instead of 88
> ms). Impact on I-cache is irrelevant here as this rt_check_expires()

Measuring what? And really milli-seconds? The number does not sound
plausible to me.

-Andi
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
On November 17, 2007 01:16:58 am Andrew Morgan wrote:
> Hi,
>
> This warning is just saying that you might want to reconsider
> recompiling your dhclient with a newer libcap - which has native support
> for 64-bit capabilities. This is supposed to be informative, and not be
> associated with any particular error.
>
> From your comments, you believe that this patch causes something in your
> boot process to fail. Can you supply some detail about the version of
> dhclient you are using? I'd like to understand exactly what it is doing
> (via libcap).
>
> Thanks
>

The boot succeeds (and appears to initialize the network adapter
properly - it autonegotiates a 100Mbps link speed), but the dhcp client
is never able to get an address.

However, applying the rc2-mm1 patch series up to just before:

add-64-bit-capability-support-to-the-kernel.patch

results in a working kernel. Applying just this patch causes the failure.

To be sure, I also tried applying the above patch plus the following ones:

add-64-bit-capability-support-to-the-kernel-checkpatch-fixes.patch
add-64-bit-capability-support-to-the-kernel-fix.patch
add-64-bit-capability-support-to-the-kernel-fix-fix.patch
remove-unnecessary-include-from-include-linux-capabilityh.patch

but the problem still occurs even with all of these.

As to versions, I'm running Kubuntu gutsy, so I have the default:

dhcp3-client 3.0.5-3ubuntu4
libcap1 1:1.10-14build1

packages installed.

Let me know if you need any other information, or if you have a patch you
would like tested.

-- 
Kevin Winchester
[2.6 patch] net/core/request_sock.c: remove unused exports
This patch removes the following unused EXPORT_SYMBOL's:
- reqsk_queue_alloc
- __reqsk_queue_destroy
- reqsk_queue_destroy

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 net/core/request_sock.c | 5 -
 1 file changed, 5 deletions(-)

3761f092ccd5d87a1517b55e2001ac9ef189b901
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index 45aed75..2d3035d 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -69,8 +69,6 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
 	return 0;
 }
 
-EXPORT_SYMBOL(reqsk_queue_alloc);
-
 void __reqsk_queue_destroy(struct request_sock_queue *queue)
 {
 	struct listen_sock *lopt;
@@ -91,8 +89,6 @@ void __reqsk_queue_destroy(struct request_sock_queue *queue)
 	kfree(lopt);
 }
 
-EXPORT_SYMBOL(__reqsk_queue_destroy);
-
 static inline struct listen_sock *reqsk_queue_yank_listen_sk(
 		struct request_sock_queue *queue)
 {
@@ -134,4 +130,3 @@ void reqsk_queue_destroy(struct request_sock_queue *queue)
 	kfree(lopt);
 }
 
-EXPORT_SYMBOL(reqsk_queue_destroy);
Re: [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
David Miller wrote:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Fri, 16 Nov 2007 17:40:27 +0100
>
>> +	unsigned long fake = 0, *flag_ptr;
>> ...
>> +	/*
>> +	 * This is a fast version of :
>> +	 * if (process_context && need_resched())
>> +	 */
>> +	if (unlikely(test_bit(TIF_NEED_RESCHED, flag_ptr)))
>> +		cond_resched();
>
> Too much exposure to internals for me to apply this, really.
>
> I have no problem with the change conceptually at all, this
> detail is just too dirty.

Fair enough ;)

Re-sending this patch from Thunderbird on a WinXP machine, I hope you won't mind...

Have a nice weekend.

[PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue

Every 600 seconds (ip_rt_secret_interval), a softirq flush of the whole
ip route cache is triggered. On loaded machines, this can starve softirq for
many seconds and can eventually crash.

This patch moves this flush to a workqueue context, using the worker we
introduced in commit 39c90ece7565f5c47110c2fa77409d7a9478bd5b
(IPV4: Convert rt_check_expire() from softirq processing to workqueue.)

Also, immediate flushes (echo 0 >/proc/sys/net/ipv4/route/flush) now use the
rt_do_flush() helper function, which takes care of rescheduling.

Next step will be to handle delayed flushes
("echo -1 >/proc/sys/net/ipv4/route/flush" or "ip route flush cache")

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

 net/ipv4/route.c | 83 +++--
 1 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 856807c..ad297f4 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -133,13 +133,14 @@ static int ip_rt_mtu_expires = 10 * 60 * HZ;
 static int ip_rt_min_pmtu = 512 + 20 + 20;
 static int ip_rt_min_advmss = 256;
 static int ip_rt_secret_interval = 10 * 60 * HZ;
+static int ip_rt_flush_expected;
 static unsigned long rt_deadline;
 
 #define RTprint(a...)	printk(KERN_DEBUG a)
 
 static struct timer_list rt_flush_timer;
-static void rt_check_expire(struct work_struct *work);
-static DECLARE_DELAYED_WORK(expires_work, rt_check_expire);
+static void rt_worker_func(struct work_struct *work);
+static DECLARE_DELAYED_WORK(expires_work, rt_worker_func);
 static struct timer_list rt_secret_timer;
 
 /*
@@ -561,7 +562,36 @@ static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 		(fl1->iif ^ fl2->iif)) == 0;
 }
 
-static void rt_check_expire(struct work_struct *work)
+/*
+ * Perform a full scan of hash table and free all entries.
+ * Can be called by a softirq or a process.
+ * In the later case, we want to be reschedule if necessary
+ */
+static void rt_do_flush(int process_context)
+{
+	unsigned int i;
+	struct rtable *rth, *next;
+
+	for (i = 0; i <= rt_hash_mask; i++) {
+		if (process_context && need_resched())
+			cond_resched();
+		rth = rt_hash_table[i].chain;
+		if (!rth)
+			continue;
+
+		spin_lock_bh(rt_hash_lock_addr(i));
+		rth = rt_hash_table[i].chain;
+		rt_hash_table[i].chain = NULL;
+		spin_unlock_bh(rt_hash_lock_addr(i));
+
+		for (; rth; rth = next) {
+			next = rth->u.dst.rt_next;
+			rt_free(rth);
+		}
+	}
+}
+
+static void rt_check_expire(void)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
@@ -607,33 +637,33 @@ static void rt_check_expire(struct work_struct *work)
 		spin_unlock_bh(rt_hash_lock_addr(i));
 	}
 	rover = i;
+}
+
+/*
+ * rt_worker_func() is run in process context.
+ * If a whole flush was scheduled, it is done.
+ * Else, we call rt_check_expire() to scan part of the hash table
+ */
+static void rt_worker_func(struct work_struct *work)
+{
+	if (ip_rt_flush_expected) {
+		ip_rt_flush_expected = 0;
+		rt_do_flush(1);
+	} else
+		rt_check_expire();
 	schedule_delayed_work(&expires_work, ip_rt_gc_interval);
 }
 
 /* This can run from both BH and non-BH contexts, the latter
  * in the case of a forced flush event.
*/ -static void rt_run_flush(unsigned long dummy) +static void rt_run_flush(unsigned long process_context) { - int i; - struct rtable *rth, *next; - rt_deadline = 0; get_random_bytes(&rt_hash_rnd, 4); - for (i = rt_hash_mask; i >= 0; i--) { - spin_lock_bh(rt_hash_lock_addr(i)); - rth = rt_hash_table[i].chain; - if (rth) - rt_hash_table[i].chain = NULL; - spin_unlock_bh(rt_hash_lock_addr(i)); - - for (; rth; rth = next) { - next = rth->u.dst.rt_next; - rt_free(rth); -