Re: [PATCH][RESEND 3] hwrng: add randomness to system from rng sources
On Wed, 2014-03-05 at 16:11 -0500, Jason Cooper wrote:
> > In other words, if there are 4096 bits of "unknownness" in X to start
> > with, and I can get those same 4096 bits of "unknownness" back by
> > unmixing X' and Y, then there must still be 4096 bits of "unknownness"
> > in X'. If X' is 4096 bits long, then we've just proven that
> > reversibility means the attacker can know nothing about the contents of
> > X' by his choice of Y.
>
> Well, this reinforces my comfortability with loadable modules. The pool
> is already initialized by the point at which the driver is loaded.
>
> Unfortunately, any of the drivers in hw_random can be built in. When
> built in, hwrng_register is going to be called during the kernel
> initialization process. In that case, the unknownness in X is not 4096
> bits, but far less. Also, the items that may have seeded X (MAC addr,
> time, etc) are discoverable by a potential attacker. This is also well
> before random-seed has been fed in.

To which I would respond.. so?

If the pool is in an attacker-knowable state at early boot, adding
attacker-controlled data does not make the situation any worse. In fact,
if the attacker has less-than-perfect control of the inputs, mixing more
things in will make things exponentially harder for the attacker.

Put another way: mixing can't ever remove unknownness from the pool, it
can only add more. So the only reason you should ever choose not to mix
something into the pool is performance.

--
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
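The reversibility argument in this thread can be tried out on a toy pool. The sketch below is not the mixing function from random.c; `toy_mix`, `toy_unmix`, `pool_equal` and `POOL_WORDS` are hypothetical names made up for illustration, and the rotate/XOR/add step is just one simple construction that happens to be invertible when Y is known. It only shows the property being discussed: whatever Y an attacker supplies, X can be recovered from X' and Y, so mixing Y in cannot have destroyed anything that was in X.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_WORDS 4 /* toy pool size, an assumption for this sketch */

static uint32_t rol32(uint32_t w, unsigned r) { return (w << r) | (w >> (32u - r)); }
static uint32_t ror32(uint32_t w, unsigned r) { return (w >> r) | (w << (32u - r)); }

/* X' = mix(X, Y): fold n input words into the pool, one invertible step each. */
static void toy_mix(uint32_t pool[POOL_WORDS], const uint32_t *y, size_t n)
{
    assert(pool && (y || n == 0));
    for (size_t i = 0; i < n; i++) {
        size_t j = i % POOL_WORDS, k = (i + 1) % POOL_WORDS;
        pool[j] = rol32(pool[j], 7) ^ y[i]; /* rotate, then XOR the input in */
        pool[k] += pool[j];                 /* spread into the next word */
    }
}

/* X = unmix(X', Y): undo the steps of toy_mix in reverse order. */
static void toy_unmix(uint32_t pool[POOL_WORDS], const uint32_t *y, size_t n)
{
    assert(pool && (y || n == 0));
    for (size_t i = n; i-- > 0; ) {
        size_t j = i % POOL_WORDS, k = (i + 1) % POOL_WORDS;
        pool[k] -= pool[j];                 /* undo the spread */
        pool[j] = ror32(pool[j] ^ y[i], 7); /* undo XOR, then undo rotate */
    }
}

/* Compare two pool states word for word. */
static int pool_equal(const uint32_t a[POOL_WORDS], const uint32_t b[POOL_WORDS])
{
    return memcmp(a, b, POOL_WORDS * sizeof a[0]) == 0;
}
```

Calling `toy_mix` and then `toy_unmix` with the same input buffer should leave the pool exactly as it started, for any choice of input. That round trip is the whole point: the map from X to X' is a bijection for each fixed Y.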
Re: [PATCH][RESEND 3] hwrng: add randomness to system from rng sources
On Tue, 2014-03-04 at 11:59 -0800, Kees Cook wrote:
> On Tue, Mar 4, 2014 at 11:53 AM, Jason Cooper wrote:
> > On Tue, Mar 04, 2014 at 11:01:49AM -0800, Kees Cook wrote:
> >> On Tue, Mar 4, 2014 at 7:38 AM, Jason Cooper wrote:
> >> > Kees, Ted,
> >> >
> >> > On Mon, Mar 03, 2014 at 03:51:48PM -0800, Kees Cook wrote:
> >> >> When bringing a new RNG source online, it seems like it would make sense
> >> >> to use some of its bytes to make the system entropy pool more random,
> >> >> as done with all sorts of other devices that contain per-device or
> >> >> per-boot differences.
> >> >
> >> > Why is this necessary? init_std_data() already calls
> >> > arch_get_random_long() while stirring each of the pools.
> >>
> >> I may be misunderstanding something here, but hwrng isn't going to get
> >> hit by an arch_get_random_long().
> >
> > ahh, you are correct. It appears it's only used on x86 and powerpc.
> > Bad assumption on my part.
> >
> >> That's just for arch-specific RNGs (e.g. RDRAND), whereas hwrng is
> >> for, effectively, add-on devices (e.g. TPMs).
> >>
> >> > I'm a little concerned here because this gives potentially untrusted
> >> > hwrngs more influence over the entropy pools' initial state than most
> >> > users of random.c expect. Many of the drivers in hw_random/ are
> >> > platform drivers and are initialized before random.c.
> >> >
> >> > I'm comfortable with the design decisions Ted has made wrt random.c and
> >> > hwrngs. However, I think that this changes that trust relationship in a
> >> > fundamental way. I'm ok with building support into my kernels for
> >> > hwrngs as long as random.c's internal use of them is limited to the
> >> > mixing in extract_buf() and init_std_data().
> >> >
> >> > By adding this patch, even without crediting entropy to the pool, a
> >> > rogue hwrng now has significantly more influence over the initial state
> >> > of the entropy pools. Or, am I missing something?
> >>
> >> I wasn't viewing this as dealing with rogue hwrngs (though shouldn't
> >> that state still be covered due to the existing mixing), but more as a
> >> "hey this thing has some randomness associated with it", similar to
> >> the mixing done for things like NIC MAC, etc. (Better, actually, since
> >> NIC MAC is going to be the same every boot.) It seemed silly to ignore
> >> an actual entropy source when seeding.
> >
> > Agreed, but I think we need to be careful about how random.c interacts
> > with any hwrng. Ideally, the drivers in hw_random/ could provide
> > arch_get_random_long(). This way, random.c still determines when and
> > how to use the hwrng.
> >
> > Ultimately, the user (person compiling the kernel) will decide to trust
> > or not trust the hwrng by enabling support for it or not. My concern
> > with this patch is that it changes the magnitude of that trust decision.
> > And only the most diligent user would discover the change.
> >
> > To date, all discussion wrt random.c and hwrngs are that the output of
> > the hwrng (in particular, RDRAND) is XORd with the output of the mixer.
> > Now, we're saying it can provide input as well.
>
> Well, I think there's confusion here over "the" hwrng and "a" hwrng. I
> have devices with multiple entropy sources, and all my hwrngs are
> built as modules, so I choose when to load them into my kernel. "The"
> arch-specific entropy source (e.g. RDRAND) is very different.
>
> >
> > Please understand, my point-of-view is as someone who installs Linux on
> > equipment *after* purchase (hobbyist, tinkers). If I control the part
> > selection and sourcing of the board components, of course I have more
> > trust in the hwrng.
> >
> > So my situation is similar to buying an Intel based laptop. I can't do
> > a special order at Bestbuy and ask for a system without the RDRAND
> > instruction. Same with the hobbyist market. We buy the gear, but we
> > have no control over what's inside it.
> >
> > In that situation, without this patch, I would enable the hwrng for the
> > board. With the patch in its current form, I would start looking for
> > research papers and discussions regarding using the hwrng for input. If
> > the patch provided arch_get_random_long(), I would feel comfortable
> > enabling the hwrng.
> >
> > Perhaps I'm being too conservative, but I'd rather have the discussion
> > now and have concerns proven unfounded than have someone say "How the
> > hell did this happen?" three releases down the road.
>
> Sure, and I don't want to be the one weakening the entropy pool.

[temporarily coming out of retirement to provide a clue]

The pool mixing function is intentionally _reversible_. This is a
crucial security property.

That means, if I have an initial secret pool state X, and hostile
attacker controlled data Y, then we can do:

 X' = mix(X, Y)

and

 X = unmix(X', Y)

We can see from this that the combination of (X' and Y) still contains
the information that was originally in X. Since it's clearly not in Y..
it must all remain in X'.

In other words, if there are 4096 bits of "unknownness" in X to start
with, and I can get those same 4096 bits of "unknownness" back by
unmixing X' and Y, then there must still be 4096 bits of "unknownness"
in X'. If X' is 4096 bits long, then we've just proven that
reversibility means the attacker can know nothing about the contents of
X' by his choice of Y.
Re: [PATCH RFC] random: Account for entropy loss due to overwrites
On Mon, 2012-08-13 at 10:26 -0700, H. Peter Anvin wrote:
> From: "H. Peter Anvin"
>
> When we write entropy into a non-empty pool, we currently don't
> account at all for the fact that we will probabilistically overwrite
> some of the entropy in that pool.

Technically, no, nothing is overwritten. The key fact is that the mixing
function is -reversible-. Thus, even if you mix in known data, you can't
learn anything about the state and thus can't destroy any of the
existing entropy.

But you are correct, mixing new actual entropy is not purely additive
(with saturation). For that to happen, we'd need an input mixing
function with perfect maximal cascading. Instead we effectively cascade
across somewhere between 6 and 64 bits. So the truth lies somewhere
between linear and your exponential estimate (which would be the case
for mixing a single bit into the pool with XOR), but much closer to
linear due to combinatoric expansion.

On the other hand, I don't think this sort of thing matters at all.
There is so much more fundamentally wrong with even trying to do entropy
accounting in the first place that these sorts of details don't even
matter. Instead we should stop fooling ourselves and just drop the
pretense of accounting entirely. Now that we've got a much richer set of
inputs, I think the time is ripe... but of course, I'm no longer the
maintainer.

--
Mathematics is the supreme nostalgia of our time.
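The "nothing is overwritten" claim can be written in entropy terms. This is only a sketch, under the assumption (implied but not spelled out in the message) that for every fixed input y the map from old pool state to new pool state is a bijection. The notation m_y and the use of Shannon entropy H are introduced here for illustration and do not come from the thread.

```latex
\begin{align*}
X' &= m_Y(X), \qquad m_y(x) := \operatorname{mix}(x, y)
      \ \text{ a bijection for each fixed } y \\
H(X' \mid Y) &= H\bigl(m_Y(X) \mid Y\bigr) = H(X \mid Y) \\
H(X \mid Y) &= H(X) \quad \text{if } X \text{ and } Y \text{ are independent}
\end{align*}
```

The second line holds because applying a known bijection to a random variable does not change its conditional entropy. So someone who knows Y is left with exactly as much uncertainty about X' as they had about X.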
Re: [PATCH] dmi: Feed DMI table to /dev/random driver
On Fri, 2012-07-20 at 13:15 -0700, Tony Luck wrote:
> Send the entire DMI (SMBIOS) table to the /dev/random driver to
> help seed its pools.
>
> Signed-off-by: Tony Luck
> ---
>
> This looks a useful addition to your /dev/random series. There are
> lots of platform specific goodies in this table (BIOS version, system
> serial number and UUID, count and version number of processors, DIMM
> slot population and serial numbers, etc.)
>
> On the system I tested the patch on the table is 9866 bytes. Is it
> OK to dump that much into add_device_randomness() in one shot?

Yes, that's fine.

We should also consider doing something similar with various bus
enumerations (PCI, USB, SCSI) and hotplug, we might pick up similar
goodies. Also, we should feed in the OF device tree on platforms that
use it.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH 01/10] random: make 'add_interrupt_randomness()' do something sane
On Fri, 2012-07-06 at 12:52 -0400, Theodore Ts'o wrote:
> On Fri, Jul 06, 2012 at 09:24:00AM -0700, Linus Torvalds wrote:
> > On Fri, Jul 6, 2012 at 6:01 AM, Theodore Ts'o wrote:
> > > What in the world is "fast count"? I've grepped for it,
> > > and I can't find it.
> >
> > It's your own fast-pool counter that Matt was talking about.
>
> When he said "check it against HZ", it confused me, since there's no
> way to compare it against HZ. But yes, I can certainly not give any
> credit for entropy if __IRQF_TIMER is set, or keep track of whether
> the previous interrupt had __IRQF_TIMER set in its descriptor. That's
> simple enough.
>
> I thought he was saying there was some way to distinguish between
> interrupts triggered by the clock interrupt versus other devices on
> the same irq channel --- and I couldn't figure out any way to do that
> in an architecture independent way.

Sorry.. offline for the weekend. Let me restate:

- on some architectures, we will call into the RNG on timer interrupts
- this is generally desirable, as most time sources are asynchronous to
  sched_clock() and thus a source of entropy
- we also want to keep conditional checks like IRQF_TIMER off the fast
  path
- but on systems where the timer interrupt is the primary time source,
  we may get effectively no entropy when the system is quiescent
- so we should check the fast pool count against HZ before crediting
- but even then, we still should mix the fast pool

Something like:

 add_some_randomness(...) /* always mix */
 if (fast_pool->count > HZ) {
	fast_pool->count = 0;
	credit_entropy_pool(...); /* only credit when we've got > HZ events */
 }

That should be safe on all systems.

--
Mathematics is the supreme nostalgia of our time.
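The "always mix, credit only past HZ events" rule above can be modelled in userspace. This is a sketch, not kernel code: the `struct fast_pool` layout, the `HZ` value of 100, the one-word rotate/XOR mixing step and the function name `add_event` are all assumptions made for illustration, and the real fast pool in random.c looks different. The sketch also assumes the counter is incremented once per mixed event.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 100 /* assumed tick rate for this model */

/* Hypothetical stand-in for the per-CPU fast pool. */
struct fast_pool {
    uint32_t pool;  /* toy one-word pool state */
    unsigned count; /* events mixed since the last credit */
};

static uint32_t rol32(uint32_t w, unsigned r) { return (w << r) | (w >> (32u - r)); }

/*
 * Mix one event into the fast pool. Returns 1 when the caller should
 * credit entropy (more than HZ events since the last credit), else 0.
 */
static int add_event(struct fast_pool *f, uint32_t sample)
{
    assert(f);
    f->pool = rol32(f->pool, 7) ^ sample; /* always mix */
    f->count++;
    if (f->count > HZ) {                  /* only credit past HZ events */
        f->count = 0;
        return 1;
    }
    return 0;
}
```

In this model, a source that fires at most HZ times before the counter is reset never earns a credit on its own, but every one of its samples still goes into the pool.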
Re: [RFC] [PATCH] x86: Use ELF section to list CPU vendor specific code (Linux Tiny)
On Mon, 2008-02-25 at 18:53 +0100, Thomas Petazzoni wrote:
> On Mon, 25 Feb 2008 09:03:12 -0800,
> Matt Mackall <[EMAIL PROTECTED]> wrote:
>
> > > > This is not quite what Peter and I were thinking of, I think.
> > > > It's not at all generic. How about a section that simply contains
> > > > a set of function pointers, a macro to add things to that
> > > > section, and a function that calls all the pointers in that
> > > > section. Eg:
> > > >
> > > > CALLBACK_SECTION(init_cpu_amd, "cpuvendor.init");
> > > > invoke_callback_section("cpuvendor.init");
> > > >
> > > > ..which would give us a generic facility we could use in various
> > > > places.
> > >
> > > I see. Probably doable. How would it work in the LD script file ?
> > > Your mechanism allows specifying any section name, but AFAIK, the
> > > sections must be explicitly listed in the kernel LD script in order
> > > to be included in the final kernel image. Am I missing something ?
> >
> > I can't see any way to avoid it, but we can leave it to future
> > generations to come up with something more clever.
>
> After a quick look at the LD documentation, it seems that wildcards are
> supported in the input section names of the linker script. So that the
> CALLBACK_SECTION() macro could add the function pointer to a section
> named:
>
>    gcm. ## name
>
> (gcm standing for "generic callback mechanism") and then, in the linker
> script, do:
>
>    *(gcm.*)
>
> I'm going to try that.

Sounds great! But I'd rather the base name be "callback" so it'll be
obvious what it is when people dump section names.

--
Mathematics is the supreme nostalgia of our time.
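For readers who want to try the function-pointer-section idea outside the kernel, here is a userspace sketch. It is not the patch under discussion and makes several assumptions: an ELF target with GCC or Clang and GNU ld or lld, which define `__start_<section>` and `__stop_<section>` symbols automatically when the section name is a valid C identifier. That is why the name here is the identifier `cpuvendor_init` rather than the quoted "cpuvendor.init" string from the proposal, and why no linker script is needed. The helper `invoke_range`, the `DECLARE_CALLBACK_SECTION` macro and the counting callbacks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*callback_fn)(void);

/* Put a pointer to fn into the ELF section "callback_<name>". */
#define CALLBACK_SECTION(fn, name)                              \
    static callback_fn callback_entry_##name##_##fn             \
    __attribute__((used, section("callback_" #name))) = fn

/* Linker-provided bounds of the section "callback_<name>". */
#define DECLARE_CALLBACK_SECTION(name)                          \
    extern callback_fn __start_callback_##name[];               \
    extern callback_fn __stop_callback_##name[]

/* Call every pointer in [begin, end); returns how many were called. */
static size_t invoke_range(callback_fn *begin, callback_fn *end)
{
    size_t n = 0;
    assert(begin <= end);
    for (callback_fn *p = begin; p < end; p++, n++)
        (*p)();
    return n;
}

#define invoke_callback_section(name) \
    invoke_range(__start_callback_##name, __stop_callback_##name)

/* Two illustrative callbacks that just count their invocations. */
static int amd_calls, intel_calls;
static void init_cpu_amd(void)   { amd_calls++; }
static void init_cpu_intel(void) { intel_calls++; }

CALLBACK_SECTION(init_cpu_amd, cpuvendor_init);
CALLBACK_SECTION(init_cpu_intel, cpuvendor_init);
DECLARE_CALLBACK_SECTION(cpuvendor_init);
```

A caller then writes `invoke_callback_section(cpuvendor_init);` once, and every function registered with `CALLBACK_SECTION` in any translation unit of the program is called. The order in which they run follows link order, which this sketch does not try to control.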
Re: [RFC] [PATCH] x86: Use ELF section to list CPU vendor specific code (Linux Tiny)
On Mon, 2008-02-25 at 09:29 +0100, Thomas Petazzoni wrote:
> On Sat, 23 Feb 2008 10:43:37 +0800,
> Matt Mackall <[EMAIL PROTECTED]> wrote:
>
> > This is not quite what Peter and I were thinking of, I think. It's not
> > at all generic. How about a section that simply contains a set of
> > function pointers, a macro to add things to that section, and a
> > function that calls all the pointers in that section. Eg:
> >
> > CALLBACK_SECTION(init_cpu_amd, "cpuvendor.init");
> > invoke_callback_section("cpuvendor.init");
> >
> > ..which would give us a generic facility we could use in various
> > places.
>
> I see. Probably doable. How would it work in the LD script file ? Your
> mechanism allows specifying any section name, but AFAIK, the sections
> must be explicitly listed in the kernel LD script in order to be
> included in the final kernel image. Am I missing something ?

I can't see any way to avoid it, but we can leave it to future
generations to come up with something more clever.

--
Mathematics is the supreme nostalgia of our time.
Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size
On Sat, 2008-02-23 at 00:06 -0800, Andrew Morton wrote:
> On Wed, 20 Feb 2008 14:57:43 +0100 "Hans Rosenfeld" <[EMAIL PROTECTED]> wrote:
>
> > The current code for /proc/pid/pagemap does not work with huge pages (on
> > x86). The code will make no difference between a normal pmd and a huge
> > page pmd, trying to parse the contents of the huge page as ptes. Another
> > problem is that there is no way to get information about the page size a
> > specific mapping uses.
> >
> > Also, the current way the "not present" and "swap" bits are encoded in
> > the returned pfn isn't very clean, especially not if this interface is
> > going to be extended.
> >
> > I propose to change /proc/pid/pagemap to return a pseudo-pte instead of
> > just a raw pfn. The pseudo-pte will contain:
> >
> > - 58 bits for the physical address of the first byte in the page, even
> >   fewer bits would probably be sufficient for quite a while
> >
> > - 4 bits for the page size, with 0 meaning native page size (4k on x86,
> >   8k on alpha, ...) and values 1-15 being specific to the architecture
> >   (I used 1 for 2M, 2 for 4M and 3 for 1G for x86)
> >
> > - a "swap" bit indicating that a not present page is paged out, with the
> >   physical address field containing page file number and block number
> >   just like before
> >
> > - a "present" bit just like in a real pte
> >
> > By shortening the field for the physical address, some more interesting
> > information could be included, like read/write permissions and the like.
> > The page size could also be returned directly, 6 bits could be used to
> > express any page shift in a 64 bit system, but I found the encoded page
> > size more useful for my specific use case.
> >
> >
> > The attached patch changes the /proc/pid/pagemap code to use such a
> > pseudo-pte. The huge page handling is currently limited to 2M/4M pages
> > on x86, 1G pages will need some more work. To keep the simple mapping of
> > virtual addresses to file index intact, any huge page pseudo-pte is
> > replicated in the user buffer to map the equivalent range of small
> > pages.
> >
> > Note that I had to move the pmd_pfn() macro from asm-x86/pgtable_64.h to
> > asm-x86/pgtable.h, it applies to both 32 bit and 64 bit x86.
> >
> > Other architectures will probably need other changes to support huge
> > pages and return the page size.
> >
> > I think that the definition of the pseudo-pte structure and the page
> > size codes should be made available through a header file, but I didn't
> > do this for now.
> >
>
> If we're going to do this, we need to do it *fast*. Once 2.6.25 goes out
> our hands are tied.
>
> That means talking with the maintainers of other hugepage-capable
> architectures.
>
> > +struct ppte {
> > +	uint64_t paddr:58;
> > +	uint64_t psize:4;
> > +	uint64_t swap:1;
> > +	uint64_t present:1;
> > +};
>
> This is part of the exported kernel interface and hence should be in a
> header somewhere, shouldn't it?

The old stuff should have been too. I think we're better off not using
bitfields here.

> u64 is a bit more conventional than uint64_t, and if we move this to a
> userspace-visible header then __u64 is the type to use, I think. Although
> one would expect uint64_t to be OK as well.
>
> > +#ifdef CONFIG_X86
> > +#define PM_PSIZE_1G 3
> > +#define PM_PSIZE_4M 2
> > +#define PM_PSIZE_2M 1
> > +#endif
>
> No, we should factor this correctly and get the CONFIG_X86 stuff out of here.

Perhaps my "continuation bit" idea.

> Matt? Help?

Did my previous message make it out? This is probably my last message
for 24+ hours.

--
Mathematics is the supreme nostalgia of our time.
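The remark about not using bitfields can be illustrated with explicit shifts and masks. The sketch below packs the four proposed fields into a plain 64-bit value. The bit positions (address in bits 0-57, page size in bits 58-61, swap in bit 62, present in bit 63) are an assumption chosen to mirror the order of the `struct ppte` members, since bitfield layout is compiler- and ABI-dependent. The `pm_*` helper names and `PM_*` constants other than the field widths are hypothetical, and this is not the format that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: [63] present, [62] swap, [61:58] psize, [57:0] paddr. */
#define PM_PADDR_BITS  58
#define PM_PADDR_MASK  ((UINT64_C(1) << PM_PADDR_BITS) - 1)
#define PM_PSIZE_SHIFT PM_PADDR_BITS
#define PM_PSIZE_MASK  UINT64_C(0xf)
#define PM_SWAP        (UINT64_C(1) << 62)
#define PM_PRESENT     (UINT64_C(1) << 63)

/* Pack the four fields into one 64-bit pseudo-pte. */
static uint64_t pm_encode(uint64_t paddr, unsigned psize, int swap, int present)
{
    assert(paddr <= PM_PADDR_MASK && psize <= PM_PSIZE_MASK);
    return (paddr & PM_PADDR_MASK)
         | ((uint64_t)psize << PM_PSIZE_SHIFT)
         | (swap ? PM_SWAP : 0)
         | (present ? PM_PRESENT : 0);
}

/* Field accessors: the same layout read back with shifts and masks. */
static uint64_t pm_paddr(uint64_t e)   { return e & PM_PADDR_MASK; }
static unsigned pm_psize(uint64_t e)   { return (unsigned)((e >> PM_PSIZE_SHIFT) & PM_PSIZE_MASK); }
static int      pm_swap(uint64_t e)    { return (e & PM_SWAP) != 0; }
static int      pm_present(uint64_t e) { return (e & PM_PRESENT) != 0; }
```

With explicit masks the layout is the same on every compiler and endianness, so a userspace reader of the file can decode entries without sharing the kernel's struct definition.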
Re: [RFC] [PATCH] x86: Use ELF section to list CPU vendor specific code (Linux Tiny)
On Fri, 2008-02-15 at 12:00 +0100, Thomas Petazzoni wrote:
> Hi,
>
> Le Mon, 11 Feb 2008 16:54:30 -0800,
> "H. Peter Anvin" <[EMAIL PROTECTED]> a écrit :
>
> > b) would be my first choice, and yes, it would be a good thing to
> > have a generalized mechanism for this. For the registrant, it's
> > pretty easy: just add a macro that adds a pointer to a named
> > section. We then need a way to get the base address and length of
> > each such section in order to be able to execute each function in
> > sequence.
>
> You'll find below a tentative patch that implements this. Tuple
> (vendor, pointer to cpu_dev structure) are stored in a
> x86cpuvendor.init section of the kernel, which is then read by the
> generic CPU code in arch/x86/kernel/cpu/common.c to fill the cpu_devs[]
> function.

This is not quite what Peter and I were thinking of, I think. It's not
at all generic. How about a section that simply contains a set of
function pointers, a macro to add things to that section, and a function
that calls all the pointers in that section. Eg:

	CALLBACK_SECTION(init_cpu_amd, "cpuvendor.init");

	invoke_callback_section("cpuvendor.init");

..which would give us a generic facility we could use in various places.

--
Mathematics is the supreme nostalgia of our time.
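For readers who want to see the shape of such a facility, here is a minimal userspace sketch, assuming an ELF target and a GNU-style toolchain. It is not the kernel implementation. It relies on the linker defining __start_<name> and __stop_<name> for sections whose names are valid C identifiers, which is why the section is called cpuvendor_init here rather than "cpuvendor.init" (in the kernel the bounds would more likely come from the linker script). The one-argument CALLBACK_SECTION(), the callback_fn type, the counters and init_cpu_intel are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*callback_fn)(void);

/* Emit a pointer to fn into the ELF section "cpuvendor_init". */
#define CALLBACK_SECTION(fn) \
    static callback_fn callback_entry_##fn \
        __attribute__((used, section("cpuvendor_init"))) = (fn)

/* Section bounds, defined by the linker because the name is a C identifier. */
extern callback_fn __start_cpuvendor_init[];
extern callback_fn __stop_cpuvendor_init[];

/* Two stand-in registrants; the counters only exist to observe the calls. */
static int amd_calls, intel_calls;
static void init_cpu_amd(void)   { amd_calls++; }
static void init_cpu_intel(void) { intel_calls++; }

CALLBACK_SECTION(init_cpu_amd);
CALLBACK_SECTION(init_cpu_intel);

/* Call every pointer stored in the section; return how many were called. */
static size_t invoke_callback_section(void)
{
    size_t n = 0;
    callback_fn *p;

    for (p = __start_cpuvendor_init; p < __stop_cpuvendor_init; p++) {
        assert(*p != NULL);
        (*p)();
        n++;
    }
    return n;
}
```

The registrant side is a single line per function and needs no ifdefs in the caller: code that is configured out simply contributes no pointer to the section.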
Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size
(sorry for the delay, travelling)

On Wed, 2008-02-20 at 14:57 +0100, Hans Rosenfeld wrote:
> The current code for /proc/pid/pagemap does not work with huge pages (on
> x86). The code will make no difference between a normal pmd and a huge
> page pmd, trying to parse the contents of the huge page as ptes. Another
> problem is that there is no way to get information about the page size a
> specific mapping uses.
>
> Also, the current way the "not present" and "swap" bits are encoded in
> the returned pfn isn't very clean, especially not if this interface is
> going to be extended.

Fair.

> I propose to change /proc/pid/pagemap to return a pseudo-pte instead of
> just a raw pfn. The pseudo-pte will contain:
>
> - 58 bits for the physical address of the first byte in the page, even
>   less bits would probably be sufficient for quite a while
>
> - 4 bits for the page size, with 0 meaning native page size (4k on x86,
>   8k on alpha, ...) and values 1-15 being specific to the architecture
>   (I used 1 for 2M, 2 for 4M and 3 for 1G for x86)
>
> - a "swap" bit indicating that a not present page is paged out, with the
>   physical address field containing page file number and block number
>   just like before
>
> - a "present" bit just like in a real pte

This is ok-ish, but I can't say I like it much. Especially the page size
field.

But I don't really have many ideas here. Perhaps having a bit saying
"this entry is really a continuation of the previous one". Then any page
size can be trivially represented. This might also make the code on both
sides simpler?

> By shortening the field for the physical address, some more interesting
> information could be included, like read/write permissions and the like.
> The page size could also be returned directly, 6 bits could be used to
> express any page shift in a 64 bit system, but I found the encoded page
> size more useful for my specific use case.
>
>
> The attached patch changes the /proc/pid/pagemap code to use such a
> pseudo-pte. The huge page handling is currently limited to 2M/4M pages
> on x86, 1G pages will need some more work. To keep the simple mapping of
> virtual addresses to file index intact, any huge page pseudo-pte is
> replicated in the user buffer to map the equivalent range of small
> pages.
>
> Note that I had to move the pmd_pfn() macro from asm-x86/pgtable_64.h to
> asm-x86/pgtable.h, it applies to both 32 bit and 64 bit x86.
>
> Other architectures will probably need other changes to support huge
> pages and return the page size.
>
> I think that the definition of the pseudo-pte structure and the page
> size codes should be made available through a header file, but I didn't
> do this for now.
>
> Signed-Off-By: Hans Rosenfeld <[EMAIL PROTECTED]>
>
> ---
>  fs/proc/task_mmu.c           |   68 +
>  include/asm-x86/pgtable.h    |    2 +
>  include/asm-x86/pgtable_64.h |    1 -
>  3 files changed, 50 insertions(+), 21 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 49958cf..58af588 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -527,16 +527,23 @@ struct pagemapread {
>  	char __user *out, *end;
>  };
>
> -#define PM_ENTRY_BYTES      sizeof(u64)
> -#define PM_RESERVED_BITS    3
> -#define PM_RESERVED_OFFSET  (64 - PM_RESERVED_BITS)
> -#define PM_RESERVED_MASK    (((1LL<<PM_RESERVED_BITS)-1) << PM_RESERVED_OFFSET)
> -#define PM_SPECIAL(nr)      (((nr) << PM_RESERVED_OFFSET) | PM_RESERVED_MASK)
> -#define PM_NOT_PRESENT      PM_SPECIAL(1LL)
> -#define PM_SWAP             PM_SPECIAL(2LL)
> -#define PM_END_OF_BUFFER    1
> -
> -static int add_to_pagemap(unsigned long addr, u64 pfn,
> +struct ppte {
> +	uint64_t paddr:58;
> +	uint64_t psize:4;
> +	uint64_t swap:1;
> +	uint64_t present:1;
> +};
> +
> +#ifdef CONFIG_X86
> +#define PM_PSIZE_1G      3
> +#define PM_PSIZE_4M      2
> +#define PM_PSIZE_2M      1
> +#endif
> +
> +#define PM_ENTRY_BYTES   sizeof(struct ppte)
> +#define PM_END_OF_BUFFER 1
> +
> +static int add_to_pagemap(unsigned long addr, struct ppte ppte,
>  			  struct pagemapread *pm)
>  {
>  	/*
> @@ -545,13 +552,13 @@ static int add_to_pagemap(unsigned long addr, u64 pfn,
>  	 * the pfn.
>  	 */
>  	if (pm->out + PM_ENTRY_BYTES >= pm->end) {
> -		if (copy_to_user(pm->out, &pfn, pm->end - pm->out))
> +		if (copy_to_user(pm->out, &ppte, pm->end - pm->out))
>  			return -EFAULT;
>  		pm->out = pm->end;
>  		return PM_END_OF_BUFFER;
>  	}
>
> -	if (put_user(pfn, pm->out))
> +	if (copy_to_user(pm->out, &ppte, sizeof(ppte)))
>  		return -EFAULT;
>  	pm->out += PM_ENTRY_BYTES;
>  	return 0;
> @@ -564,7 +571,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>  	unsigned long addr;
>  	int err = 0;
>  	for (addr = start; addr < end; addr += PAGE_SIZE) {
> -		err = add_to_pagemap(addr, PM_NOT_PRESENT, pm);
> +		err = add_to_pagemap(addr, (struct ppte) {0, 0, 0, 0}, pm);
>  		if
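To make the "continuation bit" suggestion in the reply above a little more concrete, here is one way a userspace reader might consume such a format. This is only a sketch under stated assumptions: the flag name PM_CONTINUATION, its position at bit 61 and the helper pagemap_page_span() are all invented for illustration; nothing like them appears in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: "this entry continues the page begun by the previous one". */
#define PM_CONTINUATION (UINT64_C(1) << 61)

/*
 * Return how many base-page-sized entries make up the page that starts at
 * entries[i]: the entry itself plus every directly following entry that has
 * the continuation bit set.
 */
static size_t pagemap_page_span(const uint64_t *entries, size_t n, size_t i)
{
    size_t span = 1;

    assert(i < n);
    while (i + span < n && (entries[i + span] & PM_CONTINUATION))
        span++;
    return span;
}
```

The page size then falls out as span times the native page size, so no per-architecture size codes are needed: a 2M page on x86 with 4k base pages would show up as one entry followed by 511 continuation entries.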
Re: [PATCH 2/2] x86 : relocate uninitialized variable in init DATA section into init BSS section
On Thu, 2008-02-21 at 10:53 +0100, Ingo Molnar wrote:
> * Huang, Ying <[EMAIL PROTECTED]> wrote:
>
> > > > -int __initdata early_ioremap_debug;
> > > > +int __initbss early_ioremap_debug;
> > >
> > > will we get some sort of build error if we accidentally do:
> > >
> > >    int __initbss early_ioremap_debug = 1;
> > >
> > > ?
> >
> > I tested it just now, and there is no build error.
>
> well, that's bad. We'd silently ignore the " = 1" and boot up with that
> value at 0, right? At minimum we need some really prominent build-time
> _errors_ (i.e. aborted builds) if this ever happens. But ideally,
> shouldnt this whole thing be done at link time? Couldnt the linker sort
> the variables that are zero initialized into the right section, and move
> this constant maintenance pressure off the programmer's shoulder?

I'm not sure if it's possible currently. But it might be possible to
instead tag objects as "init" with an attribute other than section and
then move such objects into init sections "by hand" late in the build.

--
Mathematics is the supreme nostalgia of our time.
Re: arch/x86/mm/ioremap unification grew by 10x
On Fri, 2008-02-15 at 15:21 -0600, Matt Mackall wrote:
> On Fri, 2008-02-15 at 21:32 +0100, Sam Ravnborg wrote:
> > On Fri, Feb 15, 2008 at 02:25:54PM -0600, Matt Mackall wrote:
> > > In 2.6.24 defconfig, my build stats show ioremap_32.o was 1.8k. In
> > > 2.6.25-rc1, the unified ioremap.o is 20.8k.
> >
> > Just an observation - 17 commits touches said file after
> > the unification (at least in latest -linus).
>
> Correction: those numbers should be halved. So we're going from .9k to
> 10.4k.

And here's most of the cause:

02b8 0124 T early_ioremap
1000 1000 t bm_pte
2000 0004 T early_ioremap_debug

static __initdata pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)]
		__attribute__((aligned(PAGE_SIZE)));

Double ouch. First, this isn't in BSS. Second, even though it's
initdata, the alignment slop won't get recovered. Don't we have a
special section for page-aligned crap so it doesn't waste most of two
pages?

--
Mathematics is the supreme nostalgia of our time.
Re: arch/x86/mm/ioremap unification grew by 10x
On Fri, 2008-02-15 at 21:32 +0100, Sam Ravnborg wrote:
> On Fri, Feb 15, 2008 at 02:25:54PM -0600, Matt Mackall wrote:
> > In 2.6.24 defconfig, my build stats show ioremap_32.o was 1.8k. In
> > 2.6.25-rc1, the unified ioremap.o is 20.8k.
>
> Just an observation - 17 commits touches said file after
> the unification (at least in latest -linus).

Correction: those numbers should be halved. So we're going from .9k to
10.4k.

--
Mathematics is the supreme nostalgia of our time.
arch/x86/mm/ioremap unification grew by 10x
In 2.6.24 defconfig, my build stats show ioremap_32.o was 1.8k. In
2.6.25-rc1, the unified ioremap.o is 20.8k.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] Configure out doublefault exception handler (Linux Tiny)
On Fri, 2008-02-15 at 19:04 +0100, Andi Kleen wrote:
> > do when it does". There's very little point in having this sort of code
> > in a mass-market camera, phone, DVR, TV, etc. (of which there are
>
> Do any of them run with a x86 CPU?

Yes. The last PVR I worked on was just such a device, as was the very
first device I put Linux on (1.2 era). There are several families of x86
CPUs targeted at embedded so this shouldn't be a surprise.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] Configure out doublefault exception handler (Linux Tiny)
On Fri, 2008-02-15 at 13:00 +0100, Andi Kleen wrote:
> Matt Mackall <[EMAIL PROTECTED]> writes:
> >
> > I bet there's some doublefault-handling code hiding somewhere. It's not
> > the sort of thing it'd make sense to take out of the architecture.
>
> The big question is if it makes sense taking out of a kernel at all.
> I still think the answer is no.
>
> Or have you considered replacing die() and show_trace() etc. with a single
> panic("the tiny gods say this won't happen") yet? That would be roughly
> equivalent.

It's not a matter of "won't happen" so much as "not a damn thing we can
do when it does". There's very little point in having this sort of code
in a mass-market camera, phone, DVR, TV, etc. (of which there are
already millions running Linux). These devices have no console and
basically zero serviceability beyond firmware upgrades.

If taking these vestigial debugging features out means we can cram in
more features that consumers can actually see and will pay for, that's
precisely what's going to happen.

--
Mathematics is the supreme nostalgia of our time.
Re: [2.6 patch] make swap_pte_to_pagemap_entry() static
On Wed, 2008-02-13 at 23:30 +0200, Adrian Bunk wrote:
> This patch makes the needlessly global swap_pte_to_pagemap_entry()
> static.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Thanks.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] Configure out DMI scanning code v2 (Linux Tiny)
On Tue, 2008-02-12 at 10:04 +0100, Thomas Petazzoni wrote:
> Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in
> order to be able to remove the DMI table scanning code if it's not
> needed, and then reduce the kernel code size.
>
> With CONFIG_DMI (i.e before) :
>
>    text    data     bss     dec     hex filename
> 1076076  128656   98304 1303036  13e1fc vmlinux
>
> Without CONFIG_DMI (i.e after) :
>
>    text    data     bss     dec     hex filename
> 1068092  126308   98304 1292704  13b9a0 vmlinux
>
> Result:
>
>    text    data     bss     dec     hex filename
>   -7984   -2348       0  -10332   -285c vmlinux
>
> The new option appears in "Processor type and features", only when
> CONFIG_EMBEDDED is defined.
>
> This patch is part of the Linux Tiny project, and is based on previous
> work done by Matt Mackall <[EMAIL PROTECTED]>.
>
> Signed-off-by: Thomas Petazzoni <[EMAIL PROTECTED]>

Thanks for working on this.

Acked-by: Matt Mackall <[EMAIL PROTECTED]>

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] Configure out doublefault exception handler (Linux Tiny)
On Tue, 2008-02-12 at 15:00 +0100, Thomas Petazzoni wrote:
> Hi Sam,
>
> Le Tue, 12 Feb 2008 14:04:28 +0100,
> Sam Ravnborg <[EMAIL PROTECTED]> a écrit :
>
> > We already have this in arch/x86/Kconfig.debug:
>
> Oops, my usual "find . -name Kconfig" missed it. Thanks for pointing it
> out!

The fact that you didn't have to add any makefile bits should have been
a hint.

> > It may need a small update if this is valid for both 32 and 64 bit.
>
> Doesn't seem so: there's only a doublefault_32.c, no doublefault_64.c.
> However, I don't know the details of x86_64.

I bet there's some doublefault-handling code hiding somewhere. It's not
the sort of thing it'd make sense to take out of the architecture.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] Configure out DMI scanning code v2 (Linux Tiny)
On Tue, 2008-02-12 at 10:04 +0100, Thomas Petazzoni wrote: Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in order to be able to remove the DMI table scanning code if it's not needed, and then reduce the kernel code size. With CONFIG_DMI (i.e before) : textdata bss dec hex filename 1076076 128656 98304 1303036 13e1fc vmlinux Without CONFIG_DMI (i.e after) : textdata bss dec hex filename 1068092 126308 98304 1292704 13b9a0 vmlinux Result: textdata bss dec hex filename -7984 -2348 0 -10332 -285c vmlinux The new option appears in Processor type and features, only when CONFIG_EMBEDDED is defined. This patch is part of the Linux Tiny project, and is based on previous work done by Matt Mackall [EMAIL PROTECTED]. Signed-off-by: Thomas Petazzoni [EMAIL PROTECTED] Thanks for working on this. Acked-by: Matt Mackall [EMAIL PROTECTED] -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86 (Linux Tiny): configure out support for some processors
On Mon, 2008-02-11 at 16:54 -0800, H. Peter Anvin wrote:
> Matt Mackall wrote:
> > On Mon, 2008-02-11 at 15:01 -0800, H. Peter Anvin wrote:
> >> Matt Mackall wrote:
> >>> Best would be to have no ifdefs and do it all with linker magic, of
> >>> course. But that's trickier.
> >>>
> >> I concur with this, definitely.
> >
> > Ok, so let's come up with a plan. We can:
> >
> > a) use weak symbols, ala cond_syscall
> > b) use a special section
> > c) use early_init code (is it early enough?)
> > d) have some sort of registration list
> >
> > Having a generic cond_call of some sort might be nice for this sort of
> > thing.
> >
>
> c) is out, because this has to be executed after the early generic code
> and before the late generic code.
>
> b) would be my first choice, and yes, it would be a good thing to have a
> generalized mechanism for this. For the registrant, it's pretty easy:
> just add a macro that adds a pointer to a named section. We then need a
> way to get the base address and length of each such section in order to
> be able to execute each function in sequence.

I like the idea of making a generalized hook section. But this is a bit
burdensome for Michael's little patch (unless you have time to whip
something up) so I think we should probably explore it separately.

--
Mathematics is the supreme nostalgia of our time.
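[Editorial sketch] The special-section approach described above (a macro that drops a function pointer into a named section, plus a way to find the section's bounds) might look roughly like the following in userspace C. This is only an illustration under stated assumptions: the section name cpu_hooks, the CPU_HOOK macro, the hook functions and run_cpu_hooks() are all hypothetical and do not come from the thread, and the sketch relies on an ELF toolchain where the linker (GNU ld or a compatible one) defines __start_<name> and __stop_<name> for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type; nothing here is taken from the actual patch. */
typedef int (*cpu_hook_t)(void);

/* Registrant side: place a pointer to fn in the "cpu_hooks" section.
 * "used" keeps the otherwise unreferenced pointer from being discarded. */
#define CPU_HOOK(fn) \
	static cpu_hook_t const hook_ptr_##fn \
	__attribute__((used, section("cpu_hooks"))) = fn

/* Section bounds, provided by the linker on ELF targets (assumption). */
extern cpu_hook_t const __start_cpu_hooks[];
extern cpu_hook_t const __stop_cpu_hooks[];

static int calls;

/* Two stand-in vendor hooks; each just counts that it ran. */
static int intel_hook(void) { calls++; return 0; }
static int amd_hook(void)   { calls++; return 0; }

CPU_HOOK(intel_hook);
CPU_HOOK(amd_hook);

/* Caller side: run every registered hook in section order and
 * return how many were executed. */
static size_t run_cpu_hooks(void)
{
	size_t n = 0;
	const cpu_hook_t *h;

	for (h = __start_cpu_hooks; h < __stop_cpu_hooks; h++, n++)
		(*h)();
	return n;
}
```

With this shape, leaving a vendor's file out of the build simply leaves its pointer out of the section, so the caller needs no ifdefs. Ordering between hooks is whatever the link order happens to be, which is one reason a real design would likely need more than this.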
Re: [PATCH] x86 (Linux Tiny): configure out support for some processors
On Mon, 2008-02-11 at 15:01 -0800, H. Peter Anvin wrote:
> Matt Mackall wrote:
> >
> > Best would be to have no ifdefs and do it all with linker magic, of
> > course. But that's trickier.
> >
>
> I concur with this, definitely.

Ok, so let's come up with a plan. We can:

a) use weak symbols, ala cond_syscall
b) use a special section
c) use early_init code (is it early enough?)
d) have some sort of registration list

Having a generic cond_call of some sort might be nice for this sort of
thing.

--
Mathematics is the supreme nostalgia of our time.
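[Editorial sketch] Option a) above could be approximated with weak default definitions, in the spirit of cond_syscall but not copied from it. The COND_CALL macro and the choice of -ENOSYS as the "not built in" result are assumptions made for illustration; cyrix_init_cpu is borrowed from the patch only as a convenient name.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: emit a weak fallback for an optional init
 * function. If another object file provides a normal (strong)
 * definition of the same name, the linker picks that one instead. */
#define COND_CALL(name) \
	__attribute__((weak)) int name(void) { return -ENOSYS; }

/* In a real tree the strong definition would live in its own .c file
 * that is only compiled when the matching config option is set. This
 * single-file sketch has no such file, so the weak fallback is used. */
COND_CALL(cyrix_init_cpu)

/* Generic code calls the function unconditionally, with no ifdef. */
static int init_optional_cpu_support(void)
{
	return cyrix_init_cpu();
}
```

The cost of this variant is one small stub per optional function, which is the trade-off against the section-based approach.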
Re: [PATCH] x86 (Linux Tiny): configure out support for some processors
On Mon, 2008-02-11 at 23:42 +0100, Michael Opdenacker wrote:
>  /* Specific CPU type init functions */
> -int intel_cpu_init(void);
> -int amd_init_cpu(void);
> -int cyrix_init_cpu(void);
> -int nsc_init_cpu(void);
> -int centaur_init_cpu(void);
> -int transmeta_init_cpu(void);
> -int nexgen_init_cpu(void);
> -int umc_init_cpu(void);
> +
> +#ifdef CONFIG_CPU_SUP_INTEL
> +int __cpuinit __ppro_with_ram_bug(void);
> +static inline int __cpuinit ppro_with_ram_bug(void)
> +{
> +	return __ppro_with_ram_bug();
> +}

I know Ingo said to do this, but I think he was flat-out wrong. If the
tradeoff is between having a dozen ifdefs contained in a single function
in one .c file vs wrapping a dozen functions in a .h file, I say stick
them in the .c file.

Best would be to have no ifdefs and do it all with linker magic, of
course. But that's trickier.

Now the patch is 90% fiddling with wrappers and it's impossible to find
the interesting bits anymore.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] slob: fix linking for user mode linux
On Tue, 2008-02-12 at 00:32 +0200, Pekka J Enberg wrote:
> From: Pekka Enberg <[EMAIL PROTECTED]>
>
> UML has some header magic that expects a non-inline __kmalloc() function to be
> available. Fixes the following link time errors:
>
> arch/um/drivers/built-in.o: In function `kmalloc':
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> arch/um/drivers/built-in.o:/home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: more undefined references to `__kmalloc' follow

Can someone explain why the magic is needed (and preferably capture it
in a comment somewhere sensible)? I took a peek at this and have no idea
what's going on.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] Configure out DMI scanning code
On Mon, 2008-02-11 at 17:58 +0100, Thomas Petazzoni wrote:
> Hi,
>
> The enclosed patch allows to remove the DMI scanning code when
> CONFIG_EMBEDDED is defined. It's basically the dma_blacklist patch of
> Linux-Tiny ported to 2.6.25-rc1, with the required modifications. It
> allows to remove ~10k from the kernel code/data size.

Looks ok. Please preserve original authorship (i.e. me) in some fashion
in your description.

> On top of this patch, I've tested if removing the big dmi tables in the
> code (for example in arch/x86/kernel/reboot.c) would allow to make more
> space optimizations. However, it seems that simply defining
> dmi_check_system() to an empty static inlined function already allows
> gcc to optimize out the dmi tables, because they are not present in
> the code. Is that possible, or is my understanding incorrect ?

That's possible with modern gccs, yes.

--
Mathematics is the supreme nostalgia of our time.
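[Editorial sketch] The effect discussed above, where stubbing dmi_check_system() as an empty static inline lets the compiler drop callers' tables, can be illustrated with a cut-down stand-in. The struct layout, the table contents and reboot_init() below are simplified assumptions and not the kernel's actual definitions; whether a given compiler really discards the table depends on its version and optimisation level.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct dmi_system_id. */
struct dmi_system_id {
	const char *ident;
	int (*callback)(const struct dmi_system_id *);
};

#ifdef CONFIG_DMI
/* Real scanner, implemented elsewhere when DMI support is built in. */
int dmi_check_system(const struct dmi_system_id *list);
#else
/* Stub: nothing is scanned, so the argument is never read. */
static inline int dmi_check_system(const struct dmi_system_id *list)
{
	(void)list;
	return 0;
}
#endif

static int set_bios_reboot(const struct dmi_system_id *d)
{
	(void)d;
	return 0;
}

/* A static table like the ones in reboot.c. Once the stub is inlined,
 * nothing reads it, so an optimising compiler is free to drop both the
 * table and the callback it points to. */
static const struct dmi_system_id reboot_dmi_table[] = {
	{ "Example board", set_bios_reboot },
	{ NULL, NULL }
};

static int reboot_init(void)
{
	return dmi_check_system(reboot_dmi_table);
}
```

Comparing the output of size on objects built with and without -DCONFIG_DMI (and with optimisation enabled) is one way to check whether the table was actually removed.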
Re: [PATCH] [RESENDING] netconsole: register cmdline netconsole configs to configfs
On Mon, 2008-02-11 at 18:08 +0900, Joonwoo Park wrote:
> This patch introduces cmdline netconsole configs to register to
> configfs with dynamic netconsole. Satyam Sharma who designed shiny
> dynamic reconfiguration for netconsole, mentioned about this issue
> already. (http://lkml.org/lkml/2007/7/29/360)
> But I think, without separately managing of two kind of netconsole
> target objects, it's possible by using config_group instead of
> config_item in the netconsole_target and default_groups feature of
> configfs.
>
> Patch was tested with configuration creation/destruction by kernel and
> module.
> And it makes possible to enable/disable, modify and review netconsole
> target configs from cmdline.

I'm afraid I'm going to have to leave review of this to someone who is
clueful about configfs. But it seems reasonable.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] x86 (Linux Tiny): configure out support for some processors
On Fri, 2008-02-08 at 23:47 +0100, Michael Opdenacker wrote:
> This patch against x86/mm tries to revive an original patch
> from Matt Mackall which didn't get merged at that time. It makes
> it possible to disable support code for some processors. This can
> be useful to support only the exact processor type used
> in a given system.
>
> I may have made wrong assumptions with the code handling
> force_mwait. As force_mwait is only declared in
> arch/x86/kernel/cpu/amd.c, which is only compiled
> when CONFIG_X86_32 is set, I thought it was safe
> to make the code depend on CONFIG_CPU_SUP_AMD,
> but I could be wrong.
>
> Your comments are more than welcome! To make the code
> cleaner, I could use empty inline functions instead
> of ifdef's, as suggested in Documentation/SubmittingPatches.

Please include the output of size with all these options on and off.

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index dabdbef..8f9a123 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -287,8 +287,10 @@ static void mwait_idle(void)
>
>  static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
>  {
> +#ifdef CONFIG_CPU_SUP_AMD
>  	if (force_mwait)
>  		return 1;
> +#endif

Probably makes sense to move force_mwait (one word) here and eliminate
these ifdefs.

> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 347a8cd..812bfa0 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -211,12 +211,14 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
>  	}
>  }
>
> +#ifdef CONFIG_CPU_SUP_INTEL
>  static inline int page_kills_ppro(unsigned long pagenr)
>  {
>  	if (pagenr >= 0x70000 && pagenr <= 0x7003F)
>  		return 1;
>  	return 0;
>  }
> +#endif
>  /*
>   * devmem_is_allowed() checks to see if /dev/mem access to a certain address
> @@ -287,7 +289,11 @@ static void __meminit free_new_highpage(struct page *page)
>
>  void __init add_one_highpage_init(struct page *page, int pfn, int bad_ppro)
>  {
> -	if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) {
> +	if (page_is_ram(pfn)
> +#ifdef CONFIG_CPU_SUP_INTEL
> +	    && !(bad_ppro && page_kills_ppro(pfn))
> +#endif

Yuck. A better way to do this is move the bad_ppro check into
page_kills_ppro and then ifdef out -the body- of the inline.

> @@ -592,7 +598,11 @@ void __init mem_init(void)
>  #ifdef CONFIG_FLATMEM
>  	BUG_ON(!mem_map);
>  #endif
> +#ifdef CONFIG_CPU_SUP_INTEL
>  	bad_ppro = ppro_with_ram_bug();
> +#else
> +	bad_ppro = 0;
> +#endif

Again, move the storage for this, let it get initialized to zero
automatically, and initialize it in the CPU-specific code (if ordering
allows).

--
Mathematics is the supreme nostalgia of our time.
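[Editorial sketch] The restructuring suggested in the review above (fold the bad_ppro test into page_kills_ppro() and ifdef out only the body) could look something like this. It is a standalone approximation, not the patch that was eventually merged: bad_ppro is shown as a file-scope variable, highpage_usable() is a hypothetical stand-in for the condition in add_one_highpage_init(), the 0x70000 to 0x7003F page range is assumed from the mainline helper, and CONFIG_CPU_SUP_INTEL is defined by hand so the Intel path is exercised.

```c
#include <assert.h>

/* Normally set by Kconfig; defined here so the sketch takes the Intel
 * path. Remove it to model a build without Intel support. */
#define CONFIG_CPU_SUP_INTEL 1

/* Zero by default; CPU-specific setup code would set it when it
 * detects the affected Pentium Pro parts. */
static int bad_ppro;

/* The flag check now lives inside the helper, and only the body is
 * conditional, so callers need no ifdef of their own. */
static inline int page_kills_ppro(unsigned long pagenr)
{
#ifdef CONFIG_CPU_SUP_INTEL
	if (bad_ppro && pagenr >= 0x70000 && pagenr <= 0x7003F)
		return 1;
#else
	(void)pagenr;
#endif
	return 0;
}

/* Hypothetical stand-in for the test in add_one_highpage_init(). */
static int highpage_usable(int is_ram, unsigned long pfn)
{
	return is_ram && !page_kills_ppro(pfn);
}
```

Without the config option the helper collapses to "return 0", so the call site compiles to the plain page_is_ram() test.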
Re: [PATCH] stub out is_swap_pte for !MMU
On Fri, 2008-02-08 at 14:05 -0800, Andrew Morton wrote:
> On Fri, 08 Feb 2008 15:41:42 -0600
> Matt Mackall <[EMAIL PROTECTED]> wrote:
>
> > Fix compile error on nommu for is_swap_pte
> >
> > Does it ever make sense to ask "is this pte a swap entry?" on a machine
> > with no MMU? Presumably this also means it has no ptes too, right? In
> > which case, it's better to comment the whole function out. Then when
> > someone tries to ask the above meaningless question, they get a compile
> > error rather than a meaningless answer.
> >
> > Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>
> >
> > diff -r 50a6e531a9f2 include/linux/swapops.h
> > --- a/include/linux/swapops.h	Mon Feb 04 20:23:02 2008 -0600
> > +++ b/include/linux/swapops.h	Fri Feb 08 15:38:01 2008 -0600
> > @@ -42,11 +42,13 @@
> >  	return entry.val & SWP_OFFSET_MASK(entry);
> >  }
> >
> > +#ifdef CONFIG_MMU
> >  /* check whether a pte points to a swap entry */
> >  static inline int is_swap_pte(pte_t pte)
> >  {
> >  	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
> >  }
> > +#endif
> >
>
> Seems contradictory. Is there _really_ a compilation error at present?
> The changelog seems to imply otherwise and no compiler error output is
> quoted and it all compiled OK for me on nommu superh.

Sorry, here's the compile error from the original thread (where the
original copy of the above patch was posted).

...
  CC      mm/vmscan.o
In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/vmscan.c:44:
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h: In function 'is_swap_pte':
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_none'
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_present'
make[2]: *** [mm/vmscan.o] Error 1

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] stub out is_swap_pte for !MMU
On Fri, 2008-02-08 at 16:25 -0500, Mike Frysinger wrote:
> On Friday 08 February 2008, Matt Mackall wrote:
> > On Fri, 2008-02-08 at 15:02 -0500, Mike Frysinger wrote:
> > > With commit 698dd4ba6b12e34e1e432c944c01478c0b2cd773, swap_pte() was
> > > moved into view of both MMU and !MMU, but uses functions only provided by
> > > MMU. Here we stub out the function for !MMU ports.
> >
> > I'm not sure if this is right compared to my original patch. Does it
> > ever make sense to ask "is this pte a swap entry?" on a machine with no
> > MMU? Presumably this also means it has no ptes too, right? In which
> > case, it's better to comment the whole function out. Then when someone
> > tries to ask the above meaningless question, they get a compile error
> > rather than a meaningless answer.
>
> honestly, doesnt matter to me since none of the code that currently utilizes
> this function is used in no-mmu context. if you want to just put the whole
> thing in CONFIG_MMU, then go for it.

Here it is again, I'll leave it up to Andrew:

Fix compile error on nommu for is_swap_pte

Does it ever make sense to ask "is this pte a swap entry?" on a machine
with no MMU? Presumably this also means it has no ptes too, right? In
which case, it's better to comment the whole function out. Then when
someone tries to ask the above meaningless question, they get a compile
error rather than a meaningless answer.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 include/linux/swapops.h
--- a/include/linux/swapops.h	Mon Feb 04 20:23:02 2008 -0600
+++ b/include/linux/swapops.h	Fri Feb 08 15:38:01 2008 -0600
@@ -42,11 +42,13 @@
 	return entry.val & SWP_OFFSET_MASK(entry);
 }

+#ifdef CONFIG_MMU
 /* check whether a pte points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
 	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
 }
+#endif

 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] stub out is_swap_pte for !MMU
On Fri, 2008-02-08 at 15:02 -0500, Mike Frysinger wrote:
> With commit 698dd4ba6b12e34e1e432c944c01478c0b2cd773, swap_pte() was moved
> into view of both MMU and !MMU, but uses functions only provided by MMU.
> Here we stub out the function for !MMU ports.

I'm not sure if this is right compared to my original patch. Does it
ever make sense to ask "is this pte a swap entry?" on a machine with no
MMU? Presumably this also means it has no ptes too, right? In which
case, it's better to comment the whole function out. Then when someone
tries to ask the above meaningless question, they get a compile error
rather than a meaningless answer.

> Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
> ---
>  include/linux/swapops.h |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index 7bf2d14..e6b54f7 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -45,7 +45,11 @@ static inline pgoff_t swp_offset(swp_entry_t entry)
>  /* check whether a pte points to a swap entry */
>  static inline int is_swap_pte(pte_t pte)
>  {
> +#ifdef CONFIG_MMU
>  	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
> +#else
> +	return 0;
> +#endif
>  }
>
>  /*

--
Mathematics is the supreme nostalgia of our time.
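[Editorial sketch] The difference between the two proposals above can be seen in a toy model. Here the whole helper sits inside CONFIG_MMU, as in Matt's version, so a build without that define has no is_swap_pte() at all and any caller fails to compile; Mike's version would instead keep the function and return 0. The pte_t layout, the bit values and the three pte_* helpers are invented for this illustration and do not correspond to any real architecture.

```c
#include <assert.h>

/* Defined by hand for the sketch; comment it out to model a nommu
 * build, where the helper below simply does not exist. */
#define CONFIG_MMU 1

/* Toy page table entry. */
typedef struct { unsigned long val; } pte_t;

#ifdef CONFIG_MMU
/* Invented flag bits standing in for arch-provided definitions. */
#define TOY_PTE_PRESENT 0x1UL
#define TOY_PTE_FILE    0x2UL

static inline int pte_none(pte_t pte)    { return pte.val == 0; }
static inline int pte_present(pte_t pte) { return (pte.val & TOY_PTE_PRESENT) != 0; }
static inline int pte_file(pte_t pte)    { return (pte.val & TOY_PTE_FILE) != 0; }

/* check whether a pte points to a swap entry */
static inline int is_swap_pte(pte_t pte)
{
	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
}
#endif /* CONFIG_MMU */
```

With the define removed, a caller of is_swap_pte() gets an implicit-declaration diagnostic at the point of use, which is the "compile error rather than a meaningless answer" behaviour argued for in the message.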
Re: blackfin compile error
On Wed, 2008-02-06 at 17:18 +0200, Adrian Bunk wrote:
> Commit 698dd4ba6b12e34e1e432c944c01478c0b2cd773 broke blackfin:
>
> <-- snip -->
>
> ...
>   CC      mm/vmscan.o
> In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/vmscan.c:44:
> /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h: In function 'is_swap_pte':
> /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_none'
> /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_present'
> make[2]: *** [mm/vmscan.o] Error 1

This suggests that no one's tried to compile -mm on Blackfin since
before September, I think.

Is there somewhere more appropriate to move it? I can't find one.
Failing that, we can wrap it in CONFIG_MMU, I suppose.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 include/linux/swapops.h
--- a/include/linux/swapops.h	Mon Feb 04 20:23:02 2008 -0600
+++ b/include/linux/swapops.h	Wed Feb 06 10:21:32 2008 -0600
@@ -42,11 +42,13 @@
 	return entry.val & SWP_OFFSET_MASK(entry);
 }

+#ifdef CONFIG_MMU
 /* check whether a pte points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
 	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
 }
+#endif

 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an

--
Mathematics is the supreme nostalgia of our time.
Re: [2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()
On Mon, 2008-02-04 at 17:36 -0800, Andrew Morton wrote:
> On Tue, 05 Feb 2008 10:28:43 +0900 Tetsuo Handa <[EMAIL PROTECTED]> wrote:
>
> > Hello.
> >
> > Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-mm1
> >
> > 2.6.24 works fine.
>
> err, Matt?

random: revert braindamage that snuck into checkpatch cleanup

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 drivers/char/random.c
--- a/drivers/char/random.c	Mon Feb 04 20:23:02 2008 -0600
+++ b/drivers/char/random.c	Mon Feb 04 20:28:08 2008 -0600
@@ -1306,7 +1306,7 @@
  * Rotation is separate from addition to prevent recomputation
  */
 #define ROUND(f, a, b, c, d, x, s)	\
-	(a += f(b, c, d) + in[x], a = (a << s) | (a >> (32 - s)))
+	(a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))
 #define K1 0
 #define K2 013240474631UL
 #define K3 015666365641UL
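To make the one-line fix above concrete, here is a small standalone sketch of the corrected macro, under the assumption that callers hand ROUND() the message word itself (something like in[0] + K1) rather than an index. If that holds, an in[x] inside the macro would index the array with a data value, which would be consistent with the oops reported in this thread. round1_step() is a hypothetical helper for illustration and is not in random.c.

```c
#include <stdint.h>

/* MD4 round-1 selection function: picks bits of y or z depending on x. */
#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))

/* Add the already-loaded word x, then rotate left by s (0 < s < 32).
 * Note that x is used as a value, not as an index into in[]. */
#define ROUND(f, a, b, c, d, x, s) \
	(a += f(b, c, d) + (x), a = (a << (s)) | (a >> (32 - (s))))

/* Hypothetical helper: apply one round-1 step and return the new 'a'. */
static uint32_t round1_step(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
			    const uint32_t *in, unsigned int idx,
			    unsigned int s)
{
	/* The caller does the array load; the macro only sees the value. */
	ROUND(F, a, b, c, d, in[idx], s);
	return a;
}
```

The extra parentheses around x and s are an addition in this sketch, there only so that an expression argument cannot change precedence inside the macro.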
Re: Integration of SCST in the mainstream Linux kernel
On Mon, 2008-02-04 at 16:24 -0800, Linus Torvalds wrote:
>
> On Mon, 4 Feb 2008, Matt Mackall wrote:
> >
> > But ATAoE is boring because it's not IP. Which means no routing,
> > firewalls, tunnels, congestion control, etc.
>
> The thing is, that's often an advantage. Not just for performance.
>
> > NBD and iSCSI (for all its hideous growths) can take advantage of these
> > things.
>
> .. and all this could equally well be done by a simple bridging protocol
> (completely independently of any AoE code).
>
> The thing is, iSCSI does things at the wrong level. It *forces* people to
> use the complex protocols, when it's a known that a lot of people don't
> want it.

I frankly think NBD is at a pretty comfortable level. It's internally
very simple (and hardware-agnostic). And moderately easy to do in
silicon.

But I'm not going to defend iSCSI. I worked on the first implementation
(what became the Cisco iSCSI driver) and I have no love for iSCSI at
all. It should have been (and started out as) a nearly trivial
encapsulation of SCSI over TCP much like ATA over Ethernet but quickly
lost the plot when committees got ahold of it.
Re: Integration of SCST in the mainstream Linux kernel
On Mon, 2008-02-04 at 22:43 +, Alan Cox wrote:
> > better. So for example, I personally suspect that ATA-over-ethernet is way
> > better than some crazy SCSI-over-TCP crap, but I'm biased for simple and
> > low-level, and against those crazy SCSI people to begin with.
>
> Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP
> would probably trash iSCSI for latency if nothing else.

But ATAoE is boring because it's not IP. Which means no routing,
firewalls, tunnels, congestion control, etc.

NBD and iSCSI (for all its hideous growths) can take advantage of these
things.
Re: [2.6 patch] unexport add_disk_randomness
On Wed, 2008-01-30 at 22:02 +0200, Adrian Bunk wrote:
> This patch removes the no longer used EXPORT_SYMBOL(add_disk_randomness).
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Acked-by: Matt Mackall <[EMAIL PROTECTED]>

> ---
> f1a195a30248eae541ba006633aa70385d1eb785
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 5fee056..c511a83 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -667,8 +667,6 @@ void add_disk_randomness(struct gendisk *disk)
>  	add_timer_randomness(disk->random,
>  			     0x100 + MKDEV(disk->major, disk->first_minor));
>  }
> -
> -EXPORT_SYMBOL(add_disk_randomness);
>  #endif
>
>  #define EXTRACT_SIZE 10
Re: [PATCH] Improve Documentation/stable_api_nonsense.txt
On Tue, 2008-01-29 at 16:14 +0200, Heikki Orsila wrote:
> > > Imo,
> > > "same exact C compiler" is just bad language, because C compilers are
> > > always "exact". "exactly same C compiler" would do.
> >
> > No, "exactly same C compiler" doesn't parse well in English.
>
> "Same exact C compiler" does not mean what you try to say.

Actually, it does. It is perfectly idiomatic English.

"Use the exact same C compiler."   -> OK, idiomatic
"Use the same exact C compiler."   -> OK, idiomatic
"Use the exactly same C compiler." -> very awkward
"Use exactly the same C compiler." -> formally correct

http://www.bartleby.com/68/24/2324.html
Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory
On Wed, 2008-01-30 at 18:28 +0100, Peter Zijlstra wrote:
> Subject: mm: MADV_WILLNEED implementation for anonymous memory
>
> Implement MADV_WILLNEED for anonymous pages by walking the page tables and
> starting asynchonous swap cache reads for all encountered swap pages.
>
> Doing so required a modification to the page table walking library functions.
> Previously ->pte_entry() could be called while holding a kmap_atomic, to
> overcome this problem the pte walker is changed to copy batches of the pmd
> and iterate them.

That's a pretty reasonable approach. My original approach was to buffer
a page worth of PTEs with all the attendant malloc annoyances. Then
Andrew and I came up with another fix a bit ago by effectively doing a
batch of size 1: mapping and immediately unmapping per PTE. That's
basically a no-op on !HIGHPTE but could potentially be expensive in the
HIGHPTE case. Your approach might be a good complexity/performance
middle ground.

Unfortunately, I think we only implemented our fix in one of the
relevant places: the /proc/pid/pagemap code hooks a callback at the pte
table level and then does its own walk across the table. Perhaps I
should refactor this so that it hooks in at the pte entry level of the
walker instead.

> +/*
> + * Much of the complication here is to work around CONFIG_HIGHPTE which needs
> + * to kmap the pmd. So copy batches of ptes from the pmd and iterate over
> + * those.
> + */
> +#define WALK_BATCH_SIZE 32
> +
>  static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  			  const struct mm_walk *walk, void *private)
>  {
>  	pte_t *pte;
> +	pte_t ptes[WALK_BATCH_SIZE];
> +	unsigned long start;
> +	unsigned int i;
>  	int err = 0;
>
> -	pte = pte_offset_map(pmd, addr);
>  	do {
> -		err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, private);
> -		if (err)
> -			break;
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> +		start = addr;
>
> -	pte_unmap(pte);
> +		pte = pte_offset_map(pmd, addr);
> +		for (i = 0; i < WALK_BATCH_SIZE && addr != end;
> +				i++, pte++, addr += PAGE_SIZE)
> +			ptes[i] = *pte;

Looks like this could be:

	for (i = 0; i < WALK_BATCH_SIZE && addr + i * PAGE_SIZE != end; i++)
		ptes[i] = pte[i];

> +		pte_unmap(pte);
> +
> +		for (i = 0, pte = ptes, addr = start;
> +				i < WALK_BATCH_SIZE && addr != end;
> +				i++, pte++, addr += PAGE_SIZE) {
> +			err = walk->pte_entry(pte, addr, addr + PAGE_SIZE,
> +					private);

	for (i = 0; i < WALK_BATCH_SIZE && addr != end;
	     i++, addr += PAGE_SIZE) {
		err = walk->pte_entry(ptes[i], addr, addr + PAGE_SIZE,
				      private);

And we can ditch start.

Also, one wonders if setting batch size to 1 will then convince the
compiler to collapse this into a more trivial loop in the !HIGHPTE case.
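As a rough userspace illustration of the copy-a-batch-then-iterate idea discussed above, here is a sketch with no kernel APIs involved. entry_t stands in for pte_t, and walk_batched() and sum_entry() are hypothetical names; the "map window" is only a comment here, where the kernel version would hold the kmap.

```c
#include <stddef.h>

#define WALK_BATCH_SIZE 32

typedef unsigned long entry_t;	/* stand-in for pte_t */
typedef int (*entry_fn)(entry_t *entry, size_t index, void *priv);

/* Walk n table entries, calling fn on private copies taken in batches. */
static int walk_batched(const entry_t *table, size_t n, entry_fn fn,
			void *priv)
{
	entry_t batch[WALK_BATCH_SIZE];
	size_t done = 0;

	while (done < n) {
		size_t count = n - done;
		size_t i;

		if (count > WALK_BATCH_SIZE)
			count = WALK_BATCH_SIZE;

		/* "Map" window: copy only, call nothing that might sleep. */
		for (i = 0; i < count; i++)
			batch[i] = table[done + i];
		/* Window closed; callbacks now run on the copies. */

		for (i = 0; i < count; i++) {
			int err = fn(&batch[i], done + i, priv);

			if (err)
				return err;
		}
		done += count;
	}
	return 0;
}

/* Example callback: accumulate the entries into *priv. */
static int sum_entry(entry_t *entry, size_t index, void *priv)
{
	(void)index;
	*(unsigned long *)priv += *entry;
	return 0;
}
```

One consequence worth noting: the callback sees a snapshot, so anything it writes through its entry pointer does not reach the real table. That is fine for a read-only walk such as kicking off swap readahead, but it would matter for a walker that modifies entries.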
Re: [PATCH UPDATE] x86: ignore spurious faults
On Wed, 2008-01-23 at 16:28 -0800, Jeremy Fitzhardinge wrote:
> When changing a kernel page from RO->RW, it's OK to leave stale TLB
> entries around, since doing a global flush is expensive and they pose
> no security problem. They can, however, generate a spurious fault,
> which we should catch and simply return from (which will have the
> side-effect of reloading the TLB to the current PTE).
>
> This can occur when running under Xen, because it frequently changes
> kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
> It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids
> doing a global TLB flush after changing page permissions.

There's perhaps an opportunity to do this lazy TLB trick in the mmap
path as well, where RW mappings are initially mapped as RO so we can
catch processes dirtying them and then switched to RW. If the mapping is
shared across threads on multiple cores, we can defer synchronizing the
TLBs on the others.
Re: [PATCH -v8 3/4] Enable the MS_ASYNC functionality in sys_msync()
On Thu, 2008-01-24 at 12:36 +1100, Nick Piggin wrote:
> On Thursday 24 January 2008 04:05, Linus Torvalds wrote:
> > On Wed, 23 Jan 2008, Anton Salikhmetov wrote:
> > > +
> > > +	if (pte_dirty(*pte) && pte_write(*pte)) {
> >
> > Not correct.
> >
> > You still need to check "pte_present()" before you can test any other
> > bits. For a non-present pte, none of the other bits are defined, and for
> > all we know there might be architectures out there that require them to
> > be non-dirty.
> >
> > As it is, you just possibly randomly corrupted the pte.
> >
> > Yeah, on all architectures I know of, it the pte is clear, neither of
> > those tests will trigger, so it just happens to work, but it's still
> > wrong.
>
> Probably it can fail for !present nonlinear mappings on many
> architectures.

Definitely.
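The ordering being asked for in the quoted review can be shown with a toy model. The bit layout and the demo_* names below are invented for illustration and match no real architecture; the only point is that the present test gates every other bit test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pte encoding, for illustration only. */
typedef uint32_t demo_pte_t;
#define DEMO_PRESENT 0x1u
#define DEMO_WRITE   0x2u
#define DEMO_DIRTY   0x4u

static bool demo_pte_present(demo_pte_t pte) { return (pte & DEMO_PRESENT) != 0; }
static bool demo_pte_write(demo_pte_t pte)   { return (pte & DEMO_WRITE) != 0; }
static bool demo_pte_dirty(demo_pte_t pte)   { return (pte & DEMO_DIRTY) != 0; }

/* True only for a present pte that is both dirty and writable. */
static bool demo_pte_needs_clean(demo_pte_t pte)
{
	/* For a non-present pte the other bits carry no defined meaning
	 * (they may encode a swap or file offset), so bail out first. */
	if (!demo_pte_present(pte))
		return false;
	return demo_pte_dirty(pte) && demo_pte_write(pte);
}
```

In this toy encoding, a non-present entry whose payload happens to set the dirty and write bits (0x6) is correctly skipped, which is the case the unguarded test in the patch would get wrong.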
Re: Rescheduling interrupts
On Wed, 2008-01-23 at 09:53 +0100, Andi Kleen wrote:
> Ingo Molnar <[EMAIL PROTECTED]> writes:
>
> > that would probably be the case if it's multiple sockets - but for
> > multiple cores exactly the opposite is true: the sooner _both_ cores
> > finish processing, the deeper power use the CPU can reach.
>
> That's only true on setups where the cores don't have
> separate sleep states. But that's not generally true anymore.
> e.g. AMD Fam10h has completely separate power planes for
> the cores and I believe newer Intel CPUs can also let their
> cores go to at least some sleep states independently (although
> the deepest sleep modi still require all cores idle)

I think we can expect everyone to rapidly evolve towards full
independence of core power states. In fact, it wouldn't surprise me if
we eventually get to the point of shutting down individual functional
units like the FPU.
Re: [PATCH rc8-mm1] hotfix libata-scsi corruption
On Tue, 2008-01-22 at 22:59 +, Hugh Dickins wrote:
> On Tue, 22 Jan 2008, James Bottomley wrote:
> >
> > libsas looks to be OK because it specifically kmallocs a 512 byte buffer
> > which should (for off slab data) be 512 byte aligned.
>
> I don't remember the various SLAB and SLOB and SLUB rules offhand:
> I'm not sure it's safe to rely on such alignment on all of them

It doesn't work that way with SLOB kmalloc (nor did it in pre-slabified
kmalloc). One shouldn't be surprised if a SLAB/SLUB debugging feature
breaks that alignment either.
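The general lesson, that alignment should be requested rather than inferred from the allocation size, can be sketched in userspace with C11 aligned_alloc(). This is only an analogue and says nothing about what the kernel-side fix should be; alloc_sector_buffer() and is_aligned() are hypothetical helpers.

```c
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_SIZE 512

/* Ask for the alignment explicitly instead of assuming that a 512-byte
 * allocation happens to come back 512-byte aligned.
 * C11 aligned_alloc() requires size to be a multiple of the alignment. */
static void *alloc_sector_buffer(void)
{
	return aligned_alloc(SECTOR_SIZE, SECTOR_SIZE);
}

/* Nonzero if p is a multiple of alignment. */
static int is_aligned(const void *p, size_t alignment)
{
	return ((uintptr_t)p % alignment) == 0;
}
```

A caller would check the returned pointer for NULL and release it with free(); an is_aligned() assertion at the point of use would catch an allocator, or a debugging option, that does not provide the alignment a driver silently depends on.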
Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c?compiling
On Tue, 2008-01-22 at 19:58 +0100, Sam Ravnborg wrote:
> On Tue, Jan 22, 2008 at 10:37:19AM -0600, Matt Mackall wrote:
> >
> > On Tue, 2008-01-22 at 15:39 +0100, Ingo Molnar wrote:
> > > threadinfo-ool.patch: doesnt this break the scheduler?
> >
> > It didn't when I wrote it, 3+ years ago. But I'm sure it needs to be
> > revisited.
> >
> > > tiny-cflags.patch: obsolete? Isnt CFLAGS already extendable? Question to
> > > Sam i guess.
> And what was the question then?
>
> We have today the possibility to say:
> make KCFLAGS=-whatever
>
> and we have plenty of kconfig adjustmenst affecting the gcc options.
>
> I do not know if this covers it.

Basically the idea was you could specify various flags that affected
kernel size, in particular overriding the various bloated alignment
defaults. If I were to do this today (if they haven't already become the
default), I'd probably add a config var to request minimal alignment
instead.
Re: Rescheduling interrupts
On Tue, 2008-01-22 at 17:05 +0100, Ingo Molnar wrote:
> * S.Çağlar Onur <[EMAIL PROTECTED]> wrote:
>
> > > My theory is that for whatever reason we get "repeat" IPIs: multiple
> > > reschedule IPIs although the other CPU only initiated one.
> >
> > Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)
>
> hm, the IPI sending and receiving is nicely paired up:
>
> [ 625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
> [ 625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on
> CPU#1:
>
> amarokapp does wake up threads every 20 microseconds - that could
> explain it. It's probably Xorg running on one core, amarokapp on the
> other core. That's already 100 reschedules/sec.

That suggests we want an "anti-load-balancing" heuristic when CPU usage
is very low. Migrating everything onto one core when we're close to idle
will save power and probably reduce latencies.
Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c?compiling
On Tue, 2008-01-22 at 15:39 +0100, Ingo Molnar wrote:
> threadinfo-ool.patch: doesnt this break the scheduler?

It didn't when I wrote it, 3+ years ago. But I'm sure it needs to be
revisited.

> tiny-cflags.patch: obsolete? Isnt CFLAGS already extendable? Question to
> Sam i guess.

Yup.
Re: [PATCHSET] printk: implement printk_header() and merging printk
On Tue, 2008-01-22 at 10:00 +0900, Tejun Heo wrote:
> Matt Mackall wrote:
> > I suppose. I still find this approach less than ideal, especially
> > putting something potentially large on the stack. The dangers are
> > perhaps worse than a malloc, really.
>
> I pondered on this a bit but the thing is we already use several
> hundreds bytes in a function which builds complex messages.

Well there are lots of current and potential future users of this
function, many of them down at the end of long call chains. So I'm more
worried about the new cases that embrace this approach and suddenly add
300 bytes of stack. In fact, if this is at all popular, we can expect to
have more than one of these frames on the stack in various paths. Given
that it only exists to make output prettier, it doesn't -really- justify
increased stack usage.

> > I also don't like your interface much. Consider this alternative:
> >
> > struct mprintk *mp = mprintk_begin(KERN_INFO "ata%u.%2u: ", 1, 0);
> > mprintk(mp, "ATA %d", 7);
> > mprintk(mp, ", %u sectors\n", 1024);
> > mprintk(mp, "everything seems dandy\n");
> > mprintk_end(mp);
> >
> > That keeps all the "normal" printks short and makes the flush more
> > explict.
>
> I like that the more used function is shorter. Hmmm... The reason why I
> first used mprintk_push() is to make it clear that the function
> accumulates messages unlike mprintk() which flushes what's accumulated
> and prints its own message.
>
> > Now we make mprintk_begin attempt to do a kmalloc of a moderate size
> > (512 bytes?) and failing that, return null. Then mprintk can fall
> > through to printk in the NULL case.
>
> If you wanna do that implicitly, you need GFP_ flag in mprintk_begin()
> and atomic allocation should be used from interrupt handlers and friends
> and they fail easily under the right (or wrong) conditions. Forcing
> kmalloc isn't a good idea. Having multiple initializers is one way to
> do it. Any suggestions?

Adding a gfp_flags arg isn't too painful. And we've generally avoided
having separate function calls for atomic vs non-atomic allocation.

Btw, we can also easily hide Willy or Rusty's stringbuf implementation
under the covers here and still have a scheme that automatically falls
back to direct printk..
Re: [PATCHSET] printk: implement printk_header() and merging printk
On Sat, 2008-01-19 at 07:58 +0900, Tejun Heo wrote:
> Matt Mackall wrote:
> > On Wed, 2008-01-16 at 10:00 +0900, Tejun Heo wrote:
> >> And mprintk the following.
> >>
> >> code:
> >> DEFINE_MPRINTK(mp, 2 * 80);
> >>
> >> mprintk_set_header(&mp, KERN_INFO "ata%u.%2u: ", 1, 0);
> >> mprintk_push(&mp, "ATA %d", 7);
> >> mprintk_push(&mp, ", %u sectors\n", 1024);
> >> mprintk(&mp, "everything seems dandy\n");
> >
> > I prefer Matthew Wilcox's stringbuf approach which does proper memory
> > management and isn't specific to printk:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0710.3/0517.html
>
> Yeap, that's generic and nice but I think both 'generic' and 'proper
> memory management' are weaknesses if what you're trying to do is to
> support collecting messages in pieces and putting it out via printk.
> Please consider the following scenario.
>
> You're in an interrupt handler and detected a severe error condition
> which should be notified to the user but the information is rather
> complex and best built in pieces, so you create a stringbuf and do
> sb_printf() to it w/ GFP_ATOMIC but alas memory allocation failed and
> you end up printing "out of memory" unless you detect the failure and go
> back and printk messages piece-by-piece manually. I would rather
> assemble the message manually from the get-go into an on-stack buffer.

I suppose. I still find this approach less than ideal, especially putting something potentially large on the stack. The dangers are perhaps worse than a malloc, really.

I also don't like your interface much. Consider this alternative:

struct mprintk *mp = mprintk_begin(KERN_INFO "ata%u.%2u: ", 1, 0);
mprintk(mp, "ATA %d", 7);
mprintk(mp, ", %u sectors\n", 1024);
mprintk(mp, "everything seems dandy\n");
mprintk_end(mp);

That keeps all the "normal" printks short and makes the flush more explicit.

Now we make mprintk_begin attempt to do a kmalloc of a moderate size (512 bytes?) and failing that, return null. Then mprintk can fall through to printk in the NULL case.
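For readers following the thread, the begin/accumulate/end scheme with a NULL fallback might look roughly like the user-space sketch below. It is an illustration under assumptions, not code from any posted patch: malloc() stands in for kmalloc(), stdout stands in for printk(), the KERN_INFO prefix and any gfp_flags argument are left out, and the struct layout, the MP_BUF_SIZE constant and the mp_vappend() helper are hypothetical names. Printing the header immediately when the allocation fails is also an assumption; the message only says that mprintk falls through to printk in the NULL case.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define MP_BUF_SIZE 512	/* the "moderate size" floated in the thread */

/* Hypothetical layout: a fixed-size accumulation buffer. */
struct mprintk {
	size_t len;		/* bytes used in buf, excluding the NUL */
	char buf[MP_BUF_SIZE];
};

/* Append formatted text to the buffer, truncating if it would overflow. */
static void mp_vappend(struct mprintk *mp, const char *fmt, va_list ap)
{
	size_t room = sizeof(mp->buf) - mp->len;	/* always >= 1 */
	int n = vsnprintf(mp->buf + mp->len, room, fmt, ap);

	if (n < 0)
		mp->buf[mp->len] = '\0';	/* formatting error: keep old text */
	else if ((size_t)n >= room)
		mp->len = sizeof(mp->buf) - 1;	/* output was truncated */
	else
		mp->len += (size_t)n;
}

/* Start a merged message. Returns NULL if no buffer could be allocated. */
struct mprintk *mprintk_begin(const char *fmt, ...)
{
	struct mprintk *mp;
	va_list ap;

	assert(fmt != NULL);
	mp = malloc(sizeof(*mp));	/* kmalloc() in the real thing */

	va_start(ap, fmt);
	if (mp) {
		mp->len = 0;
		mp->buf[0] = '\0';
		mp_vappend(mp, fmt, ap);
	} else {
		vprintf(fmt, ap);	/* assumed fallback: emit header now */
	}
	va_end(ap);
	return mp;
}

/* Accumulate into the buffer, or print directly in the NULL case. */
void mprintk(struct mprintk *mp, const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	va_start(ap, fmt);
	if (mp)
		mp_vappend(mp, fmt, ap);
	else
		vprintf(fmt, ap);
	va_end(ap);
}

/* Flush the accumulated message in one go and release the buffer. */
void mprintk_end(struct mprintk *mp)
{
	if (!mp)
		return;		/* everything was already printed piecewise */
	fputs(mp->buf, stdout);
	free(mp);
}
```

Callers use the same five lines as in the message whether or not the allocation succeeded; the only visible difference in the failure case is that the pieces reach the log one at a time instead of as a single merged message.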
Re: [PATCH] random - add async I/O support
On Mon, 2008-01-21 at 13:18 -0500, Jeff Dike wrote:
> Add async notification support to /dev/random.

This conflicts just about everywhere with my latest code, but I'll fix that up.
Re: PROBLEM: Celeron Core
On Sun, 2008-01-20 at 12:24 -0600, Robert Hancock wrote:
> David Newall wrote:
> > Andi Kleen wrote:
> >>> Isn't it the case that an idle machine will use
> >>> less power when throttled than when not?
> >>>
> >> No that is not the case (not even on old CPUs)
> >>
> > Then why would it run cooler? What generates the heat when not
> > throttled? What stops generating heat when throttled? And you say this
> > happens without reducing power consumption? I'm not convinced. I'm a
> > long way from that.
>
> I believe that all throttling does is forcibly halt the CPU on a
> particular duty cycle. This will reduce the rate of power consumption,
> but reduces the CPU performance by a greater amount (since even at 100%
> halted the CPU still consumes power) and so actually reduces performance
> per watt. It will spread the heat and power usage produced from a given
> workload task out in time (thus its usefulness in limiting CPU
> temperature) but will consume more power overall.

Your usage of "overall power" here is wrong. Power is an instantaneous quantity (1/s) like velocity, and you are comparing it to energy, which is not an instantaneous quantity, more like distance.

If we throttle the velocity of a car from 100km/h to 50km/h, it'll obviously take longer for it to travel a given distance. Now what will it mean when we ask about its "overall velocity" when it reaches its destination? We surely don't mean the distance travelled - that's not a velocity! We can perhaps talk about its average velocity, which will obviously be smaller.

> Real CPU clock throttling schemes like SpeedStep, PowerNow, etc.
> actually do increase performance per watt when they kick in.

That may be true. But the statement "throttling does not reduce power usage" remains false. And the statement "throttling reduces heat production but not power usage" remains physically impossible.

It might be true that "throttling increases energy usage per unit of computation relative to no power saving measures at all", but that is not incompatible with "throttling lets you run your laptop on battery longer than no power saving measures at all", which is often what people care about.

Voltage/frequency reduction is obviously a much better solution if it's available, as reducing voltage reduces power usage quadratically rather than linearly. But beyond the quadratic/linear thing, the concept is the same: use less power and your battery lasts longer.
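The power/energy distinction argued above can be put into numbers with a small C sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the wattages are invented, the CPU is modelled as either running or halted with no deeper sleep states, the workload is assumed to keep the CPU busy whenever it is allowed to run, and the voltage/frequency model is the usual dynamic-power approximation (power proportional to V^2 * f) with static leakage ignored.

```c
#include <assert.h>

/*
 * Average power (watts) of a CPU that runs for a fraction `duty` of the
 * time and is forcibly halted for the rest. p_halt is non-zero because a
 * halted CPU still draws some power.
 */
double throttled_power(double p_run, double p_halt, double duty)
{
	assert(duty >= 0.0 && duty <= 1.0);
	return duty * p_run + (1.0 - duty) * p_halt;
}

/* Battery life (hours) = capacity (watt-hours) / average power (watts). */
double battery_hours(double capacity_wh, double avg_power_w)
{
	assert(avg_power_w > 0.0);
	return capacity_wh / avg_power_w;
}

/*
 * Energy (joules) spent on a fixed job that needs t_full seconds of CPU
 * time at full speed. Throttled to `duty`, it takes t_full / duty seconds
 * of wall-clock time at the lower average power.
 */
double job_energy(double p_run, double p_halt, double duty, double t_full)
{
	assert(duty > 0.0 && duty <= 1.0);
	return throttled_power(p_run, p_halt, duty) * (t_full / duty);
}

/*
 * Dynamic power after scaling voltage and frequency by the given ratios,
 * using the P ~ V^2 * f approximation (leakage ignored).
 */
double dvfs_power(double p_base, double v_ratio, double f_ratio)
{
	return p_base * v_ratio * v_ratio * f_ratio;
}
```

With made-up figures of 20 W running and 5 W halted, a 50% duty cycle gives an average of 12.5 W, so a 50 Wh battery lasts 4 hours instead of 2.5. A job needing 100 s at full speed then costs 2500 J instead of 2000 J: lower power, more energy per unit of work, which is the same pair of statements made in the message. Halving both voltage and frequency in this model takes the 20 W figure down to 2.5 W, which is the quadratic-versus-linear point.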
Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling
On Sat, 2008-01-19 at 22:59 -0600, Rob Landley wrote:
> On Friday 18 January 2008 11:10:19 Matt Mackall wrote:
> > > * Disable support for readahead, page writeback, pdflush and swap
> > >   when we have no storage at all (typically booting from an
> > >   initramfs). This corresponds to 69 KB of source code!
> >
> > That'd be nice, yes. It would probably make sense to be able to disable
> > just readahead support when we're working with only solid-state devices.
>
> Very nice. From a UI standpoint, shouldn't disabling the block layer take at
> least some of that out?

There are a number of laptops now that ship with solid-state disks. These things look like normal IDE block devices to the kernel, but have zero seek time and zero rotational latency. So here, prefetch is a waste of memory and probably increases latency on average.

This will also be true for using prefetch on a typical embedded board using compact flash through an IDE interface controller (extremely common).
Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files
On Sat, 2008-01-19 at 11:22 +0100, Miklos Szeredi wrote:
> > Reminds me, I've got a patch here for addressing that problem with loop
> > mounts:
> >
> > Writes to loop should update the mtime of the underlying file.
> >
> > Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>
> >
> > Index: l/drivers/block/loop.c
> > ===
> > --- l.orig/drivers/block/loop.c 2007-11-05 17:50:07.0 -0600
> > +++ l/drivers/block/loop.c 2007-11-05 19:03:51.0 -0600
> > @@ -221,6 +221,7 @@ static int do_lo_send_aops(struct loop_d
> >  	offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
> >  	bv_offs = bvec->bv_offset;
> >  	len = bvec->bv_len;
> > +	file_update_time(file);
> >  	while (len > 0) {
> >  		sector_t IV;
> >  		unsigned size;
> > @@ -299,6 +300,7 @@ static int __do_lo_send_write(struct fil
> >
> >  	set_fs(get_ds());
> >  	bw = file->f_op->write(file, buf, len, &pos);
> > +	file_update_time(file);
>
> ->write should have already updated the times, no?

Yes, this second case is redundant. Still needed in the first case.
Re: PROBLEM: Celeron Core
On Sat, 2008-01-19 at 05:27 +0100, Andi Kleen wrote:
> > So while throttling may be less efficient in terms of watt seconds used
> > to compile something than running at full speed, it is incorrect to say
> > it does not use less power. One machine running for an hour throttled to
> > 50% uses less power (and therefore less battery and cooling) than another
> > running at full speed for that same hour.
>
> Not for the same unit of work. If you just run endless loops you
> might be right, but most systems don't do that.

Yes, most systems idle.

> In terms of laptops (or rather in most other systems too) you usually care
> about battery life time while the system is mostly idling (waiting
> for your key strokes etc.). In this case enabling throttling
> as a cpufreq driver will not make your battery last longer.

It will relative to not throttling.

You made a claim that is -physically impossible- as stated, a claim I've seen here before and I'm correcting it. If something reduces heat, it must save power *by the definition of heat and power*. And if you reduce power usage, you will make your battery last longer.

Make any other statement you want about the efficiency of throttling per unit work or the effectiveness of throttling relative to other methods, just stop repeating the claim that "throttling reduces heat but doesn't save power". It goes against the law of conservation of energy.
Re: PROBLEM: Celeron Core
On Sat, 2008-01-19 at 02:15 +0100, Andi Kleen wrote:
> On Fri, Jan 18, 2008 at 06:27:57PM -0600, Matt Mackall wrote:
> >
> > On Fri, 2008-01-18 at 22:11 +0100, Andi Kleen wrote:
> > > Chodorenko Michail <[EMAIL PROTECTED]> writes:
> > >
> > > > I have a laptop "Extensa 5220", with the processor Celeron based on
> > > > 'core' technology.
> > > > There is ~ / arch/i386/kernel/cpu/cpufreq/p4-clockmod.c in the kernel
> > > > source code
> > > > but there's no line identification of my CPU for apply frequency change
> > > > need to add a ID line 0x16
> > >
> > > Note that driver will likely do clock throttling on your CPU.
> > > Using that is usually a bad idea because it does not actually
> > > save power. It's only intended to let the CPU cool down in some
> > > situations.
> >
> > Power consumption is more or less exactly equal to heat production
> > (that's where the power goes, after all!), so either clock throttling
> > DOES save power or it DOES NOT cool the CPU.
>
> No actually the way it works on modern x86 CPUs is that the best
> strategy for saving power is to do things quickly and then
> idle longer. That means on anything that has reasonably
> deep sleep modes e.g. on older server/desktop systems things might
> be slightly different because they had very little power saving
> features enabled, but it's definitely true for all
> laptop systems from the last several years. But even
> on desktop/server throttling tends to be a bad idea.

Dominik is measuring energy expended (watts * seconds) vs work done (CPU cycles). But your claim above is "clock throttling...does not save power [but it lets] the CPU cool down", which talks about power (watts) and heat (also watts, in fact the *very same* watts) and is physically impossible. A CPU turns power into heat. Less heat out implies less power in.

So while throttling may be less efficient in terms of watt seconds used to compile something than running at full speed, it is incorrect to say it does not use less power. One machine running for an hour throttled to 50% uses less power (and therefore less battery and cooling) than another running at full speed for that same hour. The first machine may take significantly longer to complete its task (or it may not, if the task is reading email or watching video), but that's another matter entirely. And whether it's more or less efficient than other power-saving approaches is also another matter. Throttling does save power.
Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files
On Fri, 2008-01-18 at 17:54 -0500, Rik van Riel wrote:
> On Fri, 18 Jan 2008 14:47:33 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> > - keep it simple. Let's face it, Linux has never ever given those
> >   guarantees before, and it's not as if anybody has really cared. Even
> >   now, the issue seems to be more about paper standards conformance than
> >   anything else.
>
> There is one issue which is way more than just standards conformance.
>
> When a program changes file data through mmap(), at some point the
> mtime needs to be updated so that backup programs know to back up the
> new version of the file.
>
> Backup programs not seeing an updated mtime is a really big deal.

And that's fixed with the 4-line approach.

Reminds me, I've got a patch here for addressing that problem with loop mounts:

Writes to loop should update the mtime of the underlying file.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: l/drivers/block/loop.c
===
--- l.orig/drivers/block/loop.c 2007-11-05 17:50:07.0 -0600
+++ l/drivers/block/loop.c 2007-11-05 19:03:51.0 -0600
@@ -221,6 +221,7 @@ static int do_lo_send_aops(struct loop_d
 	offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
 	bv_offs = bvec->bv_offset;
 	len = bvec->bv_len;
+	file_update_time(file);
 	while (len > 0) {
 		sector_t IV;
 		unsigned size;
@@ -299,6 +300,7 @@ static int __do_lo_send_write(struct fil

 	set_fs(get_ds());
 	bw = file->f_op->write(file, buf, len, &pos);
+	file_update_time(file);
 	set_fs(old_fs);
 	if (likely(bw == len))
 		return 0;
Re: PROBLEM: Celeron Core
On Fri, 2008-01-18 at 22:11 +0100, Andi Kleen wrote:
> Chodorenko Michail <[EMAIL PROTECTED]> writes:
>
> > I have a laptop "Extensa 5220", with the processor Celeron based on 'core'
> > technology.
> > There is ~ / arch/i386/kernel/cpu/cpufreq/p4-clockmod.c in the kernel
> > source code
> > but there's no line identification of my CPU for apply frequency change
> > need to add a ID line 0x16
>
> Note that driver will likely do clock throttling on your CPU.
> Using that is usually a bad idea because it does not actually
> save power. It's only intended to let the CPU cool down in some situations.

Power consumption is more or less exactly equal to heat production (that's where the power goes, after all!), so either clock throttling DOES save power or it DOES NOT cool the CPU.
Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling
On Fri, 2008-01-18 at 22:09 +0100, Ingo Molnar wrote:
> * Matt Mackall <[EMAIL PROTECTED]> wrote:
>
> > > Sounds fine! Don't hesitate to let us know about the lower-hanging
> > > fruit you're thinking about. Here are the main things I have so far:
> > >
> > > * Ideas in the existing Linux-Tiny patchset.
> > > * Disable support for non-Intel processors in x86 (cyrix.c,
> > >   centaur.c, transmeta.c, nexgen.c, umc.c in arch/x86/kernel/cpu).
> > >   As far as I remember, I saved 15 KB when I first experimented with
> > >   this.
> >
> > Isn't that already in -tiny?
>
> btw., are there any pending arch/x86 bits in -tiny? (stupid question:
> where can I get the most up-to-date version of -tiny from?)

It's not a stupid question. I dropped updating the tree regularly some time ago to focus on merging bits and then got a bit side-tracked by this little thing called "version control".

Michael is attempting to get the tree started again and has put a quilt up here:

http://elinux.org/images/3/3c/Tiny-quilt-2.6.23-0.tar.bz2