from:"Matt Mackall"

Re: [PATCH][RESEND 3] hwrng: add randomness to system from rng sources

2014-03-05 Thread Matt Mackall

On Wed, 2014-03-05 at 16:11 -0500, Jason Cooper wrote:
> > In other words, if there are 4096 bits of "unknownness" in X to start
> > with, and I can get those same 4096 bits of "unknownness" back by
> > unmixing X' and Y, then there must still be 4096 bits of "unknownness"
> > in X'. If X' is 4096 bits long, then we've just proven that
> > reversibility means the attacker can know nothing about the contents of
> > X' by his choice of Y.
> 
> Well, this reinforces my comfortability with loadable modules.  The pool
> is already initialized by the point at which the driver is loaded.
> 
> Unfortunately, any of the drivers in hw_random can be built in.  When
> built in, hwrng_register is going to be called during the kernel
> initialization process.  In that case, the unknownness in X is not 4096
> bits, but far less.  Also, the items that may have seeded X (MAC addr,
> time, etc) are discoverable by a potential attacker.  This is also well
> before random-seed has been fed in.

To which I would respond.. so?

If the pool is in an attacker-knowable state at early boot, adding
attacker-controlled data does not make the situation any worse. In fact,
if the attacker has less-than-perfect control of the inputs, mixing more
things in will make things exponentially harder for the attacker.

Put another way: mixing can't ever remove unknownness from the pool, it
can only add more. So the only reason you should ever choose not to mix
something into the pool is performance.
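
To make the counting argument concrete, here is a minimal sketch in C. It
is an illustration only: toy_mix() is a hypothetical 8-bit rotate-and-XOR
mixer, not the kernel's mixing function, and count_possible_states() is an
invented helper that counts how many pool states an attacker still cannot
rule out after mixing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy reversible mixer on an 8-bit "pool": rotate left by 3, then XOR in
 * the input. Hypothetical stand-in, not the kernel's mixing function. */
uint8_t toy_mix(uint8_t pool, uint8_t input)
{
    uint8_t rotated = (uint8_t)((pool << 3) | (pool >> 5));
    return (uint8_t)(rotated ^ input);
}

/* Count the distinct pool states that remain possible after mixing.
 * pools[]  : states the attacker cannot rule out beforehand
 * inputs[] : input values the attacker cannot rule out
 *            (a single entry models full attacker control) */
unsigned count_possible_states(const uint8_t *pools, size_t n_pools,
                               const uint8_t *inputs, size_t n_inputs)
{
    unsigned char seen[256] = {0};
    unsigned distinct = 0;

    assert(pools != NULL && inputs != NULL);

    for (size_t i = 0; i < n_pools; i++) {
        for (size_t j = 0; j < n_inputs; j++) {
            uint8_t out = toy_mix(pools[i], inputs[j]);
            if (!seen[out]) {
                seen[out] = 1;
                distinct++;
            }
        }
    }
    return distinct;
}
```

With 16 candidate pool states (the values 0 to 15) and one fully
attacker-chosen input such as 0xA5, the count stays at 16, so the attacker
has gained nothing. If the attacker can only narrow the input down to two
values (0xA5 or 0xA4), the count for this example grows to 32.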

-- 
Mathematics is the supreme nostalgia of our time.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH][RESEND 3] hwrng: add randomness to system from rng sources

2014-03-04 Thread Matt Mackall

On Tue, 2014-03-04 at 11:59 -0800, Kees Cook wrote:
> On Tue, Mar 4, 2014 at 11:53 AM, Jason Cooper wrote:
> > On Tue, Mar 04, 2014 at 11:01:49AM -0800, Kees Cook wrote:
> >> On Tue, Mar 4, 2014 at 7:38 AM, Jason Cooper wrote:
> >> > Kees, Ted,
> >> >
> >> > On Mon, Mar 03, 2014 at 03:51:48PM -0800, Kees Cook wrote:
> >> >> When bringing a new RNG source online, it seems like it would make sense
> >> >> to use some of its bytes to make the system entropy pool more random,
> >> >> as done with all sorts of other devices that contain per-device or
> >> >> per-boot differences.
> >> >
> >> > Why is this necessary?  init_std_data() already calls
> >> > arch_get_random_long() while stirring each of the pools.
> >>
> >> I may be misunderstanding something here, but hwrng isn't going to get
> >> hit by an arch_get_random_long().
> >
> > ahh, you are correct.  It appears it's only used on x86 and powerpc.
> > Bad assumption on my part.
> >
> >> That's just for arch-specific RNGs (e.g. RDRAND), whereas hwrng is
> >> for, effectively, add-on devices (e.g. TPMs).
> >>
> >> > I'm a little concerned here because this gives potentially untrusted
> >> > hwrngs more influence over the entropy pools' initial state than most
> >> > users of random.c expect.  Many of the drivers in hw_random/ are
> >> > platform drivers and are initialized before random.c.
> >> >
> >> > I'm comfortable with the design decisions Ted has made wrt random.c and
> >> > hwrngs.  However, I think that this changes that trust relationship in a
> >> > fundamental way.  I'm ok with building support into my kernels for
> >> > hwrngs as long as random.c's internal use of them is limited to the
> >> > mixing in extract_buf() and init_std_data().
> >> >
> >> > By adding this patch, even without crediting entropy to the pool, a
> >> > rogue hwrng now has significantly more influence over the initial state
> >> > of the entropy pools.  Or, am I missing something?
> >>
> >> I wasn't viewing this as dealing with rogue hwrngs (though shouldn't
> >> that state still be covered due to the existing mixing), but more as a
> >> "hey this thing has some randomness associated with it", similar to
> >> the mixing done for things like NIC MAC, etc. (Better, actually, since
> >> NIC MAC is going to be the same every boot.) It seemed silly to ignore
> >> an actual entropy source when seeding.
> >
> > Agreed, but I think we need to be careful about how random.c interacts
> > with any hwrng.  Ideally, the drivers in hw_random/ could provide
> > arch_get_random_long().  This way, random.c still determines when and
> > how to use the hwrng.
> >
> > Ultimately, the user (person compiling the kernel) will decide to trust
> > or not trust the hwrng by enabling support for it or not.  My concern
> > with this patch is that it changes the magnitude of that trust decision.
> > And only the most diligent user would discover the change.
> >
> > To date, all discussion wrt random.c and hwrngs are that the output of
> > the hwrng (in particular, RDRAND) is XORd with the output of the mixer.
> > Now, we're saying it can provide input as well.
> 
> Well, I think there's confusion here over "the" hwrng and "a" hwrng. I
> have devices with multiple entropy sources, and all my hwrngs are
> built as modules, so I choose when to load them into my kernel. "The"
> arch-specific entropy source (e.g. RDRAND) is very different.
> 
> >
> > Please understand, my point-of-view is as someone who installs Linux on
> > equipment *after* purchase (hobbyist, tinkerer).  If I control the part
> > selection and sourcing of the board components, of course I have more
> > trust in the hwrng.
> >
> > So my situation is similar to buying an Intel based laptop.  I can't do
> > a special order at Bestbuy and ask for a system without the RDRAND
> > instruction.  Same with the hobbyist market.  We buy the gear, but we
> > have no control over what's inside it.
> >
> > In that situation, without this patch, I would enable the hwrng for the
> > board.  With the patch in its current form, I would start looking for
> > research papers and discussions regarding using the hwrng for input.  If
> > the patch provided arch_get_random_long(), I would feel comfortable
> > enabling the hwrng.
> >
> > Perhaps I'm being too conservative, but I'd rather have the discussion
> > now and have concerns proven unfounded than have someone say "How the
> > hell did this happen?" three releases down the road.
> 
> Sure, and I don't want to be the one weakening the entropy pool.

[temporarily coming out of retirement to provide a clue]

The pool mixing function is intentionally _reversible_. This is a
crucial security property.

That means, if I have an initial secret pool state X, and hostile
attacker controlled data Y, then we can do:

X' = mix(X, Y)

 and 

X = unmix(X', Y)

We can see from this that the combination of (X' and Y) still contains
the information that was originally in X. Since it's clearly not in Y..
it must all remain in X'.

In other words, if there are 4096 bits of "unknownness" in X to start
with, and I can get those same 4096 bits of "unknownness" back by
unmixing X' and Y, then there must still be 4096 bits of "unknownness"
in X'. If X' is 4096 bits long, then we've just proven that
reversibility means the attacker can know nothing about the contents of
X' by his choice of Y.
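
The mix/unmix round trip can be sketched in a few lines of C. This is a
toy under stated assumptions, not the kernel's pool: the four-word pool,
the rotate-by-7 step and the names toy_pool_mix() and toy_pool_unmix()
are all invented for illustration. The only property it is meant to show
is that anyone holding X' and Y can recover X exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_WORDS 4

uint32_t toy_rotl32(uint32_t v) { return (v << 7) | (v >> 25); }
uint32_t toy_rotr32(uint32_t v) { return (v >> 7) | (v << 25); }

/* X' = mix(X, Y): each word is rotated, then XORed with its right-hand
 * neighbour and with one word of input. Words are updated in place, in
 * ascending order. */
void toy_pool_mix(uint32_t pool[TOY_POOL_WORDS],
                  const uint32_t in[TOY_POOL_WORDS])
{
    assert(pool != NULL && in != NULL);
    for (size_t i = 0; i < TOY_POOL_WORDS; i++)
        pool[i] = toy_rotl32(pool[i])
                  ^ pool[(i + 1) % TOY_POOL_WORDS] ^ in[i];
}

/* X = unmix(X', Y): undo the steps of toy_pool_mix() in reverse order,
 * so each step sees the same neighbour value the forward step used. */
void toy_pool_unmix(uint32_t pool[TOY_POOL_WORDS],
                    const uint32_t in[TOY_POOL_WORDS])
{
    assert(pool != NULL && in != NULL);
    for (size_t i = TOY_POOL_WORDS; i-- > 0; )
        pool[i] = toy_rotr32(pool[i]
                             ^ pool[(i + 1) % TOY_POOL_WORDS] ^ in[i]);
}
```

Calling toy_pool_mix(x, y) followed by toy_pool_unmix(x, y) leaves x
unchanged for any y, which is the reversibility the argument relies on.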

Re: [PATCH RFC] random: Account for entropy loss due to overwrites

2012-08-15 Thread Matt Mackall

On Mon, 2012-08-13 at 10:26 -0700, H. Peter Anvin wrote:
> From: "H. Peter Anvin" 
> 
> When we write entropy into a non-empty pool, we currently don't
> account at all for the fact that we will probabilistically overwrite
> some of the entropy in that pool.

Technically, no, nothing is overwritten. The key fact is that the mixing
function is -reversible-. Thus, even if you mix in known data, you can't
learn anything about the state and thus can't destroy any of the
existing entropy.

But you are correct, mixing new actual entropy is not purely additive
(with saturation). For that to happen, we'd need an input mixing
function with perfect maximal cascading. Instead we effectively cascade
across somewhere between 6 and 64 bits. So the truth lies somewhere
between linear and your exponential estimate (which would be the case
for mixing a single bit into the pool with XOR), but much closer to
linear due to combinatoric expansion.
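
For a rough sense of the gap between the two estimates, here is a worked
comparison. The formulas are an assumption about what the two models look
like, since this excerpt of the thread does not spell them out: P is the
pool size in bits, E the entropy already credited, and n the number of
fresh entropy bits being added.

```latex
% Linear crediting with saturation:
E'_{\mathrm{linear}} = \min(P,\; E + n)

% Exponential estimate (each new bit is assumed to be wasted on
% already-unknown state with probability E/P, so dE/dn = 1 - E/P):
E'_{\mathrm{exp}} = E + (P - E)\left(1 - e^{-n/P}\right)

% Example with P = 4096, E = 2048, n = 64:
E'_{\mathrm{linear}} = 2048 + 64 = 2112
E'_{\mathrm{exp}} = 2048 + 2048\left(1 - e^{-64/4096}\right)
                  \approx 2048 + 31.75 = 2079.75
```

Under these assumed models a half-full pool credits about half of the
incoming bits in the exponential case and all of them in the linear case.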

On the other hand, I don't think this sort of thing matters at all.
There is so much more fundamentally wrong with even trying to do entropy
accounting in the first place that these sorts of details don't even
matter. Instead we should stop fooling ourselves and just drop the
pretense of accounting entirely. Now that we've got a much richer set of
inputs, I think the time is ripe... but of course, I'm no longer the
maintainer.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] dmi: Feed DMI table to /dev/random driver

2012-07-20 Thread Matt Mackall

On Fri, 2012-07-20 at 13:15 -0700, Tony Luck wrote:
> Send the entire DMI (SMBIOS) table to the /dev/random driver to
> help seed its pools.
> 
> Signed-off-by: Tony Luck 
> ---
> 
> This looks a useful addition to your /dev/random series. There are
> lots of platform specific goodies in this table (BIOS version, system
> serial number and UUID, count and version number of processors, DIMM
> slot population and serial numbers, etc.)
> 
> On the system I tested the patch on the table is 9866 bytes. Is it
> OK to dump that much into add_device_randomness() in one shot?

Yes, that's fine. We should also consider doing something similar with
various bus enumerations (PCI, USB, SCSI) and hotplug; we might pick up
similar goodies. Also, we should feed in the OF device tree on platforms
that use it.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 01/10] random: make 'add_interrupt_randomness()' do something sane

2012-07-09 Thread Matt Mackall

On Fri, 2012-07-06 at 12:52 -0400, Theodore Ts'o wrote:
> On Fri, Jul 06, 2012 at 09:24:00AM -0700, Linus Torvalds wrote:
> > On Fri, Jul 6, 2012 at 6:01 AM, Theodore Ts'o wrote:
> > > What in the world is "fast count"?  I've grepped for it,
> > > and I can't find it.
> > 
> > It's your own fast-pool counter that Matt was talking about.
> 
> When he said "check it against HZ", it confused me, since there's no
> way to compare it against HZ.  But yes, I can certainly not give any
> credit for entropy if __IRQF_TIMER is set, or keep track of whether
> the previous interrupt had __IRQF_TIMER set in its descriptor.  That's
> simple enough.
> 
> I thought he was saying there was some way to distinguish between
> interrupts triggered by the clock interrupt versus other devices on
> the same irq channel --- and I couldn't figure out any way to do that in
> an architecture independent way.

Sorry.. offline for the weekend.

Let me restate:

- on some architectures, we will call into the RNG on timer interrupts
- this is generally desirable, as most time sources are asynchronous to
  sched_clock() and thus a source of entropy
- we also want to keep conditional checks like IRQF_TIMER off the fast
  path
- but on systems where the timer interrupt is the primary time source,
  we may get effectively no entropy when the system is quiescent
- so we should check the fast pool count against HZ before crediting
- but even then, we still should mix the fast pool

Something like:

add_some_randomness(...) /* always mix */
if (fast_pool->count > HZ) {
  fast_pool->count = 0;
  credit_entropy_pool(...); /* only credit when we've got > HZ events */
}

That should be safe on all systems.
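
The always-mix, sometimes-credit rule above can be sketched as a small
userspace model. Everything named here is hypothetical: TOY_HZ stands in
for the kernel's HZ, struct toy_fast_pool and toy_add_event() are invented
for the example, the mixing step is a placeholder rotate-and-XOR, and the
counter increment is an assumption, since the pseudocode does not show
where the count is bumped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HZ 250u /* assumed tick rate; the real HZ is a kernel config value */

struct toy_fast_pool {
    uint32_t pool;       /* mixed state */
    unsigned int count;  /* events seen since the last credit */
};

/* Mix one event sample into the fast pool and credit entropy only once
 * more than TOY_HZ events have accumulated. Returns 1 if this call
 * credited, 0 otherwise. */
int toy_add_event(struct toy_fast_pool *fp, unsigned int *credited_bits,
                  uint32_t sample)
{
    assert(fp != NULL && credited_bits != NULL);

    /* always mix, whether or not any entropy is credited */
    fp->pool = ((fp->pool << 5) | (fp->pool >> 27)) ^ sample;
    fp->count++;

    /* only credit when we've got > TOY_HZ events */
    if (fp->count > TOY_HZ) {
        fp->count = 0;
        *credited_bits += 1;
        return 1;
    }
    return 0;
}
```

Starting from a zeroed pool, the first 250 calls mix without crediting
and the 251st call credits one bit and resets the counter. On a quiescent
system where only the timer fires, that caps crediting at roughly one bit
per second in this model.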

-- 
Mathematics is the supreme nostalgia of our time.


Re: [RFC] [PATCH] x86: Use ELF section to list CPU vendor specific code (Linux Tiny)

2008-02-25 Thread Matt Mackall


On Mon, 2008-02-25 at 18:53 +0100, Thomas Petazzoni wrote:
> On Mon, 25 Feb 2008 09:03:12 -0800,
> Matt Mackall <[EMAIL PROTECTED]> wrote:
> 
> > > > This is not quite what Peter and I were thinking of, I think.
> > > > It's not at all generic. How about a section that simply contains
> > > > a set of function pointers, a macro to add things to that
> > > > section, and a function that calls all the pointers in that
> > > > section. Eg:
> > > > 
> > > > CALLBACK_SECTION(init_cpu_amd, "cpuvendor.init");
> > > > invoke_callback_section("cpuvendor.init");
> > > > 
> > > > ..which would give us a generic facility we could use in various
> > > > places.
> > > 
> > > I see. Probably doable. How would it work in the LD script file ?
> > > Your mechanism allows to specify any section name, but AFAIK, the
> > > sections must be explicitly listed in the kernel LD script in order
> > > to be included in the final kernel image. Am I missing something ?
> > 
> > I can't see any way to avoid it, but we can leave it to future
> > generations to come up with something more clever.
> 
> After a quick look at the LD documentation, it seems that wildcards are
> supported in the input section names of the linker script. So that the
> CALLBACK_SECTION() macro could add the function pointer to a section
> named:
> 
>     gcm. ## name
> 
> (gcm standing for "generic callback mechanism") and then, in the linker
> script, do:
> 
>     *(gcm.*)
> 
> I'm going to try that.

Sounds great! But I'd rather the base name be "callback" so it'll be
obvious what it is when people dump section names.
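
A userspace approximation of the idea, as a sketch rather than the
eventual kernel patch: it assumes GCC or Clang with a GNU-style ELF
linker, which provides __start_<section> and __stop_<section> symbols for
any section whose name is a valid C identifier. That is why the section
here is named "callback_cpuvendor_init" with underscores and passed as a
bare identifier instead of a dotted string; inside the kernel, the linker
script wildcard discussed above would do the collecting instead. The
macro bodies, invoke_callbacks() and the cpu_init_calls counter are all
invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*callback_fn)(void);

/* Put a pointer to fn into the ELF section "callback_<name>". */
#define CALLBACK_SECTION(fn, name)                                  \
    static callback_fn callback_entry_##name##_##fn                 \
        __attribute__((used, section("callback_" #name))) = fn

/* Declare the section bounds that the linker generates. */
#define DECLARE_CALLBACK_SECTION(name)                              \
    extern callback_fn __start_callback_##name[];                   \
    extern callback_fn __stop_callback_##name[]

/* Call every function pointer stored in the named section. */
#define invoke_callback_section(name)                               \
    invoke_callbacks(__start_callback_##name, __stop_callback_##name)

/* Walk the pointers in [start, stop) and call each one.
 * Returns the number of callbacks invoked. */
size_t invoke_callbacks(callback_fn *start, callback_fn *stop)
{
    size_t n = 0;

    assert(start <= stop);
    for (callback_fn *p = start; p < stop; p++, n++)
        (*p)();
    return n;
}

int cpu_init_calls; /* how many vendor init callbacks have run */

void init_cpu_amd(void)   { cpu_init_calls++; }
void init_cpu_intel(void) { cpu_init_calls++; }

CALLBACK_SECTION(init_cpu_amd, cpuvendor_init);
CALLBACK_SECTION(init_cpu_intel, cpuvendor_init);
DECLARE_CALLBACK_SECTION(cpuvendor_init);
```

With both registrations linked in, invoke_callback_section(cpuvendor_init)
runs both init functions without the caller naming either of them, so
dropping a vendor file from the build simply shortens the section.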

-- 
Mathematics is the supreme nostalgia of our time.

Re: [RFC] [PATCH] x86: Use ELF section to list CPU vendor specific code (Linux Tiny)

2008-02-25 Thread Matt Mackall


On Mon, 2008-02-25 at 09:29 +0100, Thomas Petazzoni wrote:
> On Sat, 23 Feb 2008 10:43:37 +0800,
> Matt Mackall <[EMAIL PROTECTED]> wrote:
> 
> > This is not quite what Peter and I were thinking of, I think. It's not
> > at all generic. How about a section that simply contains a set of
> > function pointers, a macro to add things to that section, and a
> > function that calls all the pointers in that section. Eg:
> > 
> > CALLBACK_SECTION(init_cpu_amd, "cpuvendor.init");
> > invoke_callback_section("cpuvendor.init");
> > 
> > ..which would give us a generic facility we could use in various
> > places.
> 
> I see. Probably doable. How would it work in the LD script file ? Your
> mechanism allows to specify any section name, but AFAIK, the sections
> must be explicitly listed in the kernel LD script in order to be
> included in the final kernel image. Am I missing something ?

I can't see any way to avoid it, but we can leave it to future
generations to come up with something more clever.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size

2008-02-23 Thread Matt Mackall


On Sat, 2008-02-23 at 00:06 -0800, Andrew Morton wrote:
> On Wed, 20 Feb 2008 14:57:43 +0100 "Hans Rosenfeld" <[EMAIL PROTECTED]> wrote:
> 
> > The current code for /proc/pid/pagemap does not work with huge pages (on
> > x86). The code will make no difference between a normal pmd and a huge
> > page pmd, trying to parse the contents of the huge page as ptes. Another
> > problem is that there is no way to get information about the page size a
> > specific mapping uses.
> > 
> > Also, the current way the "not present" and "swap" bits are encoded in
> > the returned pfn isn't very clean, especially not if this interface is
> > going to be extended.
> > 
> > I propose to change /proc/pid/pagemap to return a pseudo-pte instead of
> > just a raw pfn. The pseudo-pte will contain:
> > 
> > - 58 bits for the physical address of the first byte in the page, even
> >   fewer bits would probably be sufficient for quite a while
> > 
> > - 4 bits for the page size, with 0 meaning native page size (4k on x86,
> >   8k on alpha, ...) and values 1-15 being specific to the architecture
> >   (I used 1 for 2M, 2 for 4M and 3 for 1G for x86)
> > 
> > - a "swap" bit indicating that a not present page is paged out, with the
> >   physical address field containing page file number and block number
> >   just like before
> > 
> > - a "present" bit just like in a real pte
> >   
> > By shortening the field for the physical address, some more interesting
> > information could be included, like read/write permissions and the like.
> > The page size could also be returned directly, 6 bits could be used to
> > express any page shift in a 64 bit system, but I found the encoded page
> > size more useful for my specific use case.
> > 
> > 
> > The attached patch changes the /proc/pid/pagemap code to use such a
> > pseudo-pte. The huge page handling is currently limited to 2M/4M pages
> > on x86, 1G pages will need some more work. To keep the simple mapping of
> > virtual addresses to file index intact, any huge page pseudo-pte is
> > replicated in the user buffer to map the equivalent range of small
> > pages. 
> > 
> > Note that I had to move the pmd_pfn() macro from asm-x86/pgtable_64.h to
> > asm-x86/pgtable.h, it applies to both 32 bit and 64 bit x86.
> > 
> > Other architectures will probably need other changes to support huge
> > pages and return the page size.
> > 
> > I think that the definition of the pseudo-pte structure and the page
> > size codes should be made available through a header file, but I didn't
> > do this for now.
> > 
> 
> If we're going to do this, we need to do it *fast*.  Once 2.6.25 goes out
> our hands are tied.
> 
> That means talking with the maintainers of other hugepage-capable
> architectures.
> 
> > +struct ppte {
> > +   uint64_t paddr:58;
> > +   uint64_t psize:4;
> > +   uint64_t swap:1;
> > +   uint64_t present:1;
> > +};
> 
> This is part of the exported kernel interface and hence should be in a
> header somewhere, shouldn't it?  The old stuff should have been too.

I think we're better off not using bitfields here.
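
A likely reason to avoid bitfields in an exported format is that C
leaves the order of bitfield allocation to the implementation, which is
awkward for something userspace has to parse. A minimal sketch of the
same pseudo-pte built with explicit shifts and masks follows. The PM_*
names, the pm_* helpers and the bit positions (fields packed low to high
in the order of the proposed struct) are illustrative assumptions, not
part of the patch.

```c
#include <stdint.h>

/* Hypothetical layout: paddr in bits 0-57, psize in 58-61,
 * swap in bit 62, present in bit 63. */
#define PM_PADDR_BITS   58
#define PM_PADDR_MASK   ((UINT64_C(1) << PM_PADDR_BITS) - 1)
#define PM_PSIZE_SHIFT  58
#define PM_PSIZE_MASK   UINT64_C(0xf)
#define PM_SWAP         (UINT64_C(1) << 62)
#define PM_PRESENT      (UINT64_C(1) << 63)

/* Pack the four fields into one 64-bit entry. */
static inline uint64_t pm_encode(uint64_t paddr, unsigned int psize,
                                 int swap, int present)
{
    uint64_t e = paddr & PM_PADDR_MASK;

    e |= ((uint64_t)psize & PM_PSIZE_MASK) << PM_PSIZE_SHIFT;
    if (swap)
        e |= PM_SWAP;
    if (present)
        e |= PM_PRESENT;
    return e;
}

/* Accessors: each one masks out exactly one field. */
static inline uint64_t pm_paddr(uint64_t e)
{
    return e & PM_PADDR_MASK;
}

static inline unsigned int pm_psize(uint64_t e)
{
    return (unsigned int)((e >> PM_PSIZE_SHIFT) & PM_PSIZE_MASK);
}

static inline int pm_is_swap(uint64_t e)
{
    return (e & PM_SWAP) != 0;
}

static inline int pm_is_present(uint64_t e)
{
    return (e & PM_PRESENT) != 0;
}
```

With this shape the entry stays a plain 64-bit integer, so it can be
written to userspace as a single value and decoded with the same masks
on the other side.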

> u64 is a bit more conventional than uint64_t, and if we move this to a
> userspace-visible header then __u64 is the type to use, I think.  Although
> one would expect uint64_t to be OK as well.
> 
> > +#ifdef CONFIG_X86
> > +#define PM_PSIZE_1G  3
> > +#define PM_PSIZE_4M  2
> > +#define PM_PSIZE_2M  1
> > +#endif
> 
> No, we should factor this correctly and get the CONFIG_X86 stuff out of here.

Perhaps my "continuation bit" idea.

> Matt?  Help?

Did my previous message make it out? This is probably my last message
for 24+ hours.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [RFC] [PATCH] x86: Use ELF section to list CPU vendor specific code (Linux Tiny)

2008-02-22 Thread Matt Mackall


On Fri, 2008-02-15 at 12:00 +0100, Thomas Petazzoni wrote:
> Hi,
> 
> Le Mon, 11 Feb 2008 16:54:30 -0800,
> "H. Peter Anvin" <[EMAIL PROTECTED]> a écrit :
> 
> > b) would be my first choice, and yes, it would be a good thing to
> > have a generalized mechanism for this.  For the registrant, it's
> > pretty easy: just add a macro that adds a pointer to a named
> > section.  We then need a way to get the base address and length of
> > each such section in order to be able to execute each function in
> > sequence.
> 
> You'll find below a tentative patch that implements this. Tuple
> (vendor, pointer to cpu_dev structure) are stored in a
> x86cpuvendor.init section of the kernel, which is then read by the
> generic CPU code in arch/x86/kernel/cpu/common.c to fill the cpu_devs[]
> function.

This is not quite what Peter and I were thinking of, I think. It's not
at all generic. How about a section that simply contains a set of
function pointers, a macro to add things to that section, and a function
that calls all the pointers in that section. Eg:

CALLBACK_SECTION(init_cpu_amd, "cpuvendor.init");
invoke_callback_section("cpuvendor.init");

..which would give us a generic facility we could use in various places.
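
A rough userspace sketch of that facility, assuming GCC or Clang with a
GNU-compatible linker on ELF. It relies on the linker synthesizing
__start_<name> and __stop_<name> symbols for sections whose names are
valid C identifiers, so the section here is called callback_cpuvendor
rather than the dotted "cpuvendor.init" (a dotted name would need bounds
symbols defined in the linker script instead). DECLARE_CALLBACK_SECTION,
invoke_callbacks and the counters are hypothetical helpers for the
example.

```c
#include <stddef.h>

typedef void (*callback_t)(void);

/*
 * Put a pointer to fn in the ELF section sec. "used" keeps the
 * otherwise unreferenced pointer from being discarded by the compiler.
 */
#define CALLBACK_SECTION(fn, sec) \
    static callback_t callback_ptr_##fn \
        __attribute__((used, section(#sec))) = fn

/* Bounds symbols the linker provides for the section sec. */
#define DECLARE_CALLBACK_SECTION(sec) \
    extern callback_t __start_##sec[], __stop_##sec[]

/* Call every pointer between start and stop; return how many ran. */
static size_t invoke_callbacks(callback_t *start, callback_t *stop)
{
    size_t n = 0;

    for (callback_t *cb = start; cb < stop; cb++, n++)
        (*cb)();
    return n;
}

#define invoke_callback_section(sec) \
    invoke_callbacks(__start_##sec, __stop_##sec)

/* Example registrants, standing in for per-vendor init functions. */
static int amd_called, intel_called;

static void init_cpu_amd(void)
{
    amd_called++;
}

static void init_cpu_intel(void)
{
    intel_called++;
}

DECLARE_CALLBACK_SECTION(callback_cpuvendor);
CALLBACK_SECTION(init_cpu_amd, callback_cpuvendor);
CALLBACK_SECTION(init_cpu_intel, callback_cpuvendor);
```

Calling invoke_callback_section(callback_cpuvendor) then runs both
registrants without either one being named at the call site, which is
the point of the scheme.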

-- 
Mathematics is the supreme nostalgia of our time.


Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size

2008-02-22 Thread Matt Mackall

(sorry for the delay, travelling)

On Wed, 2008-02-20 at 14:57 +0100, Hans Rosenfeld wrote:
> The current code for /proc/pid/pagemap does not work with huge pages (on
> x86). The code will make no difference between a normal pmd and a huge
> page pmd, trying to parse the contents of the huge page as ptes. Another
> problem is that there is no way to get information about the page size a
> specific mapping uses.
> 
> Also, the current way the "not present" and "swap" bits are encoded in
> the returned pfn isn't very clean, especially not if this interface is
> going to be extended.

Fair.

> I propose to change /proc/pid/pagemap to return a pseudo-pte instead of
> just a raw pfn. The pseudo-pte will contain:
> 
> - 58 bits for the physical address of the first byte in the page, even
>   less bits would probably be sufficient for quite a while
> 
> - 4 bits for the page size, with 0 meaning native page size (4k on x86,
>   8k on alpha, ...) and values 1-15 being specific to the architecture
>   (I used 1 for 2M, 2 for 4M and 3 for 1G for x86)
> 
> - a "swap" bit indicating that a not present page is paged out, with the
>   physical address field containing page file number and block number
>   just like before
> 
> - a "present" bit just like in a real pte

This is ok-ish, but I can't say I like it much. Especially the page size
field.

But I don't really have many ideas here. Perhaps having a bit saying
"this entry is really a continuation of the previous one". Then any page
size can be trivially represented. This might also make the code on both
sides simpler?
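
To make the continuation idea concrete, here is a small sketch under
stated assumptions: the flag positions, the PM_* names and both helper
functions are made up for illustration, and continuation entries are
assumed to carry the pfn of their own small page. A 2M page on x86 with
4k base pages would then show up as one head entry followed by 511
continuation entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the thread does not fix any positions. */
#define PM_PRESENT  (UINT64_C(1) << 63)
#define PM_CONT     (UINT64_C(1) << 62)
#define PM_PFN_MASK ((UINT64_C(1) << 62) - 1)

/*
 * Emit one entry per small page for a mapping of npages small pages
 * starting at pfn. Every entry after the first is flagged as a
 * continuation of the previous one.
 */
static void pm_emit(uint64_t *out, uint64_t pfn, size_t npages)
{
    for (size_t i = 0; i < npages; i++)
        out[i] = PM_PRESENT | ((pfn + i) & PM_PFN_MASK) |
                 (i ? PM_CONT : 0);
}

/*
 * Count the small pages covered by the mapping whose first entry is
 * e[i], given n entries in total.
 */
static size_t pm_mapping_pages(const uint64_t *e, size_t i, size_t n)
{
    size_t len = 1;

    while (i + len < n && (e[i + len] & PM_CONT))
        len++;
    return len;
}
```

The reader recovers the page size by counting, so no per-architecture
size codes are needed, and the file index still maps one entry to one
small page.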
  
> By shortening the field for the physical address, some more interesting
> information could be included, like read/write permissions and the like.
> The page size could also be returned directly, 6 bits could be used to
> express any page shift in a 64 bit system, but I found the encoded page
> size more useful for my specific use case.
> 
> 
> The attached patch changes the /proc/pid/pagemap code to use such a
> pseudo-pte. The huge page handling is currently limited to 2M/4M pages
> on x86, 1G pages will need some more work. To keep the simple mapping of
> virtual addresses to file index intact, any huge page pseudo-pte is
> replicated in the user buffer to map the equivalent range of small
> pages. 
> 
> Note that I had to move the pmd_pfn() macro from asm-x86/pgtable_64.h to
> asm-x86/pgtable.h, it applies to both 32 bit and 64 bit x86.
> 
> Other architectures will probably need other changes to support huge
> pages and return the page size.
> 
> I think that the definition of the pseudo-pte structure and the page
> size codes should be made available through a header file, but I didn't
> do this for now.
> 
> Signed-Off-By: Hans Rosenfeld <[EMAIL PROTECTED]>
> 
> ---
>  fs/proc/task_mmu.c           |   68 +
>  include/asm-x86/pgtable.h    |    2 +
>  include/asm-x86/pgtable_64.h |    1 -
>  3 files changed, 50 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 49958cf..58af588 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -527,16 +527,23 @@ struct pagemapread {
>   char __user *out, *end;
>  };
>  
> -#define PM_ENTRY_BYTES      sizeof(u64)
> -#define PM_RESERVED_BITS    3
> -#define PM_RESERVED_OFFSET  (64 - PM_RESERVED_BITS)
> -#define PM_RESERVED_MASK    (((1LL<<PM_RESERVED_BITS)-1) << PM_RESERVED_OFFSET)
> -#define PM_SPECIAL(nr)      (((nr) << PM_RESERVED_OFFSET) | PM_RESERVED_MASK)
> -#define PM_NOT_PRESENT      PM_SPECIAL(1LL)
> -#define PM_SWAP             PM_SPECIAL(2LL)
> -#define PM_END_OF_BUFFER    1
> -
> -static int add_to_pagemap(unsigned long addr, u64 pfn,
> +struct ppte {
> + uint64_t paddr:58;
> + uint64_t psize:4;
> + uint64_t swap:1;
> + uint64_t present:1;
> +};
> +
> +#ifdef CONFIG_X86
> +#define PM_PSIZE_1G  3
> +#define PM_PSIZE_4M  2
> +#define PM_PSIZE_2M  1
> +#endif
> +
> +#define PM_ENTRY_BYTES   sizeof(struct ppte)
> +#define PM_END_OF_BUFFER 1
> +
> +static int add_to_pagemap(unsigned long addr, struct ppte ppte,
> struct pagemapread *pm)
>  {
>   /*
> @@ -545,13 +552,13 @@ static int add_to_pagemap(unsigned long addr, u64 pfn,
>* the pfn.
>*/
>   if (pm->out + PM_ENTRY_BYTES >= pm->end) {
> - if (copy_to_user(pm->out, &pfn, pm->end - pm->out))
> + if (copy_to_user(pm->out, &ppte, pm->end - pm->out))
>   return -EFAULT;
>   pm->out = pm->end;
>   return PM_END_OF_BUFFER;
>   }
>  
> - if (put_user(pfn, pm->out))
> + if (copy_to_user(pm->out, &ppte, sizeof(ppte)))
>   return -EFAULT;
>   pm->out += PM_ENTRY_BYTES;
>   return 0;
> @@ -564,7 +571,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned 
> long end,
>   unsigned long addr;
>   int err = 0;
>   for (addr = start; addr < end; addr += PAGE_SIZE) {
> - err = add_to_pagemap(addr, PM_NOT_PRESENT, pm);
> + err = add_to_pagemap(addr, (struct ppte) {0, 0, 0, 0}, pm);

Re: [PATCH 2/2] x86 : relocate uninitialized variable in init DATA section into init BSS section

2008-02-21 Thread Matt Mackall


On Thu, 2008-02-21 at 10:53 +0100, Ingo Molnar wrote:
> * Huang, Ying <[EMAIL PROTECTED]> wrote:
> 
> > > > -int __initdata early_ioremap_debug;
> > > > +int __initbss early_ioremap_debug;
> > > 
> > > will we get some sort of build error if we accidentally do:
> > > 
> > >int __initbss early_ioremap_debug = 1;
> > > 
> > > ?
> > 
> > I tested it just now, and there is no build error.
> 
> well, that's bad. We'd silently ignore the " = 1" and boot up with that 
> value at 0, right? At minimum we need some really prominent build-time 
> _errors_ (i.e. aborted builds) if this ever happens. But ideally, 
> shouldnt this whole thing be done at link time? Couldnt the linker sort 
> the variables that are zero initialized into the right section, and move 
> this constant maintenance pressure off the programmer's shoulder?

I'm not sure if it's possible currently. But it might be possible to
instead tag objects as "init" with an attribute other than section and
then move such objects into init sections "by hand" late in the build.

-- 
Mathematics is the supreme nostalgia of our time.


Re: arch/x86/mm/ioremap unification grew by 10x

2008-02-15 Thread Matt Mackall


On Fri, 2008-02-15 at 15:21 -0600, Matt Mackall wrote:
> On Fri, 2008-02-15 at 21:32 +0100, Sam Ravnborg wrote:
> > On Fri, Feb 15, 2008 at 02:25:54PM -0600, Matt Mackall wrote:
> > > In 2.6.24 defconfig, my build stats show ioremap_32.o was 1.8k. In
> > > 2.6.25-rc1, the unified ioremap.o is 20.8k.
> > 
> > Just an observation - 17 commits touches said file after
> > the unification (at least in latest -linus).
> 
> Correction: those numbers should be halved. So we're going from .9k to
> 10.4k.

And here's most of the cause:

02b8 0124 T early_ioremap
1000 1000 t bm_pte
2000 0004 T early_ioremap_debug

static __initdata pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)]
__attribute__((aligned(PAGE_SIZE)));

Double ouch. First, this isn't in BSS. Second, even though it's
initdata, the alignment slop won't get recovered.

Don't we have a special section for page-aligned crap so it doesn't
waste most of two pages?
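
As an analogy for the alignment slop (a sketch, not the kernel's actual
layout): when a page-aligned object is laid out next to small ones, the
gap up to the next page boundary is pure padding. The struct below uses
the GCC/Clang aligned attribute and assumes 4k pages; slop_demo and
slop_bytes are hypothetical names for the example.

```c
#include <stddef.h>

#define PAGE_SIZE 4096  /* assumed 4k pages, as on x86 */

/* Stand-in for a small object sitting next to a page-aligned table. */
struct slop_demo {
    int debug_flag;  /* 4 bytes of real data */
    unsigned long table[PAGE_SIZE / sizeof(unsigned long)]
        __attribute__((aligned(PAGE_SIZE)));  /* one page of real data */
};

/* Bytes lost to padding between debug_flag and table. */
static inline size_t slop_bytes(void)
{
    return offsetof(struct slop_demo, table) - sizeof(int);
}
```

Here the struct occupies two full pages to hold one page plus four
bytes. Grouping all page-aligned objects into a section of their own
avoids paying that padding once per object.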

-- 
Mathematics is the supreme nostalgia of our time.


Re: arch/x86/mm/ioremap unification grew by 10x

2008-02-15 Thread Matt Mackall


On Fri, 2008-02-15 at 21:32 +0100, Sam Ravnborg wrote:
> On Fri, Feb 15, 2008 at 02:25:54PM -0600, Matt Mackall wrote:
> > In 2.6.24 defconfig, my build stats show ioremap_32.o was 1.8k. In
> > 2.6.25-rc1, the unified ioremap.o is 20.8k.
> 
> Just an observation - 17 commits touches said file after
> the unification (at least in latest -linus).

Correction: those numbers should be halved. So we're going from .9k to
10.4k.

-- 
Mathematics is the supreme nostalgia of our time.


arch/x86/mm/ioremap unification grew by 10x

2008-02-15 Thread Matt Mackall

In 2.6.24 defconfig, my build stats show ioremap_32.o was 1.8k. In
2.6.25-rc1, the unified ioremap.o is 20.8k.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH] Configure out doublefault exception handler (Linux Tiny)

2008-02-15 Thread Matt Mackall


On Fri, 2008-02-15 at 19:04 +0100, Andi Kleen wrote:
> > do when it does". There's very little point in having this sort of code
> > in a mass-market camera, phone, DVR, TV, etc. (of which there are
> 
> Do any of them run with a x86 CPU?

Yes. The last PVR I worked on was just such a device, as was the very
first device I put Linux on (1.2 era). There are several families of x86
CPUs targeted at embedded so this shouldn't be a surprise.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH] Configure out doublefault exception handler (Linux Tiny)

2008-02-15 Thread Matt Mackall

On Fri, 2008-02-15 at 13:00 +0100, Andi Kleen wrote:
> Matt Mackall <[EMAIL PROTECTED]> writes:
> >
> > I bet there's some doublefault-handling code hiding somewhere. It's not
> > the sort of thing it'd make sense to take out of the architecture.
> 
> The big question is if it makes sense taking out of a kernel at all.
> I still think the answer is no.
> 
> Or have you considered replacing die() and show_trace() etc. with a single
> panic("the tiny gods say this won't happen") yet? That would be roughly 
> equivalent.

It's not a matter of "won't happen" so much as "not a damn thing we can
do when it does". There's very little point in having this sort of code
in a mass-market camera, phone, DVR, TV, etc. (of which there are
already millions running Linux). These devices have no console and
basically zero serviceability beyond firmware upgrades. If taking these
vestigial debugging features out means we can cram in more features that
consumers can actually see and will pay for, that's precisely what's
going to happen.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [2.6 patch] make swap_pte_to_pagemap_entry() static

2008-02-13 Thread Matt Mackall


On Wed, 2008-02-13 at 23:30 +0200, Adrian Bunk wrote:
> This patch makes the needlessly global swap_pte_to_pagemap_entry() 
> static.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Thanks.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] Configure out DMI scanning code v2 (Linux Tiny)

2008-02-12 Thread Matt Mackall


On Tue, 2008-02-12 at 10:04 +0100, Thomas Petazzoni wrote:

> Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in
> order to be able to remove the DMI table scanning code if it's not
> needed, and then reduce the kernel code size.
> 
> With CONFIG_DMI (i.e before) :
> 
>    text    data     bss     dec     hex filename
> 1076076  128656   98304 1303036  13e1fc vmlinux
> 
> Without CONFIG_DMI (i.e after) :
> 
>    text    data     bss     dec     hex filename
> 1068092  126308   98304 1292704  13b9a0 vmlinux
> 
> Result:
> 
>    text    data     bss     dec     hex filename
>   -7984   -2348       0  -10332   -285c vmlinux
> 
> The new option appears in "Processor type and features", only when
> CONFIG_EMBEDDED is defined.
> 
> This patch is part of the Linux Tiny project, and is based on previous
> work done by Matt Mackall <[EMAIL PROTECTED]>.
> 
> Signed-off-by: Thomas Petazzoni <[EMAIL PROTECTED]>

Thanks for working on this.

Acked-by: Matt Mackall <[EMAIL PROTECTED]>

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] Configure out doublefault exception handler (Linux Tiny)

2008-02-12 Thread Matt Mackall


On Tue, 2008-02-12 at 15:00 +0100, Thomas Petazzoni wrote:
> Hi Sam,
> 
> Le Tue, 12 Feb 2008 14:04:28 +0100,
> Sam Ravnborg <[EMAIL PROTECTED]> a écrit :
> 
> > We already have this in arch/x86/Kconfig.debug:
> 
> Oops, my usual "find . -name Kconfig" missed it. Thanks for pointing it
> out!

The fact that you didn't have to add any makefile bits should have been
a hint.

> > It may need a small update if this is valid for both 32 and 64 bit.
> 
> Doesn't seem so: there's only a doublefault_32.c, no doublefault_64.c.
> However, I don't know the details of x86_64.

I bet there's some doublefault-handling code hiding somewhere. It's not
the sort of thing it'd make sense to take out of the architecture.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86 (Linux Tiny): configure out support for some processors

2008-02-11 Thread Matt Mackall


On Mon, 2008-02-11 at 16:54 -0800, H. Peter Anvin wrote:
> Matt Mackall wrote:
> > On Mon, 2008-02-11 at 15:01 -0800, H. Peter Anvin wrote:
> >> Matt Mackall wrote:
> >>> Best would be to have no ifdefs and do it all with linker magic, of
> >>> course. But that's trickier.
> >>>
> >> I concur with this, definitely.
> > 
> > Ok, so let's come up with a plan. We can:
> > 
> > a) use weak symbols, ala cond_syscall
> > b) use a special section
> > c) use early_init code (is it early enough?)
> > d) have some sort of registration list
> > 
> > Having a generic cond_call of some sort might be nice for this sort of
> > thing.
> > 
> 
> c) is out, because this has to be executed after the early generic code 
> and before the late generic code.
> 
> b) would be my first choice, and yes, it would be a good thing to have a 
> generalized mechanism for this.  For the registrant, it's pretty easy: 
> just add a macro that adds a pointer to a named section.  We then need a 
> way to get the base address and length of each such section in order to 
> be able to execute each function in sequence.

I like the idea of making a generalized hook section. But this is a bit
burdensome for Michael's little patch (unless you have time to whip
something up) so I think we should probably explore it separately.
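
For reference, a minimal userspace sketch of what such a hook section
could look like. It assumes GCC or clang with an ELF linker that
provides __start_/__stop_ symbols for sections whose names are valid C
identifiers (GNU ld does). CPU_HOOK, the cpu_hooks section and the two
hook functions are hypothetical names, not anything in the tree.

```c
typedef int (*cpu_hook_t)(void);

/*
 * Registrant side: drop a pointer to fn into the "cpu_hooks" section.
 * 'used' keeps the otherwise unreferenced pointer from being discarded.
 */
#define CPU_HOOK(fn) \
	static cpu_hook_t cpu_hook_ptr_##fn \
	__attribute__((used, section("cpu_hooks"))) = fn

static int hooks_called;

static int intel_early_hook(void) { hooks_called++; return 0; }
static int amd_early_hook(void)   { hooks_called++; return 0; }

CPU_HOOK(intel_early_hook);
CPU_HOOK(amd_early_hook);

/* Section bounds, supplied by the linker. */
extern cpu_hook_t __start_cpu_hooks[];
extern cpu_hook_t __stop_cpu_hooks[];

/* Caller side: run every registered hook in sequence, return the count. */
int run_cpu_hooks(void)
{
	cpu_hook_t *hook;
	int n = 0;

	for (hook = __start_cpu_hooks; hook < __stop_cpu_hooks; hook++) {
		(*hook)();
		n++;
	}
	return n;
}
```

The caller never names a vendor, so code that is configured out simply
contributes no entries. Ordering between entries from different object
files would depend on link order, which is one of the details a real
implementation would have to pin down.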

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86 (Linux Tiny): configure out support for some processors

2008-02-11 Thread Matt Mackall


On Mon, 2008-02-11 at 15:01 -0800, H. Peter Anvin wrote:
> Matt Mackall wrote:
> > 
> > Best would be to have no ifdefs and do it all with linker magic, of
> > course. But that's trickier.
> > 
> 
> I concur with this, definitely.

Ok, so let's come up with a plan. We can:

a) use weak symbols, ala cond_syscall
b) use a special section
c) use early_init code (is it early enough?)
d) have some sort of registration list

Having a generic cond_call of some sort might be nice for this sort of
thing.
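
A rough sketch of option a), assuming GCC or clang on an ELF target.
The kernel's cond_syscall is implemented differently; this variant uses
the weak and alias function attributes instead, and cond_call(),
cpu_init_ni() and init_optional_cpus() are hypothetical names.

```c
#include <errno.h>

/* Fallback that every unconfigured entry point resolves to. */
int cpu_init_ni(void)
{
	return -ENOSYS;
}

/*
 * Declare 'name' as a weak alias of the fallback. A strong definition
 * of 'name' in another object file (built only when the matching
 * config option is set) would override the alias at link time.
 */
#define cond_call(name) \
	int name(void) __attribute__((weak, alias("cpu_init_ni")))

cond_call(amd_init_cpu);
cond_call(cyrix_init_cpu);

/* The caller needs no ifdefs; missing vendors just report -ENOSYS. */
int init_optional_cpus(void)
{
	int supported = 0;

	if (amd_init_cpu() == 0)
		supported++;
	if (cyrix_init_cpu() == 0)
		supported++;
	return supported;
}
```

In this single-file form nothing overrides the aliases, so both calls
land in the fallback and init_optional_cpus() counts no supported
vendors.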

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86 (Linux Tiny): configure out support for some processors

2008-02-11 Thread Matt Mackall

On Mon, 2008-02-11 at 23:42 +0100, Michael Opdenacker wrote:
>  /* Specific CPU type init functions */
> -int intel_cpu_init(void);
> -int amd_init_cpu(void);
> -int cyrix_init_cpu(void);
> -int nsc_init_cpu(void);
> -int centaur_init_cpu(void);
> -int transmeta_init_cpu(void);
> -int nexgen_init_cpu(void);
> -int umc_init_cpu(void);
> +
> +#ifdef CONFIG_CPU_SUP_INTEL
> +int __cpuinit __ppro_with_ram_bug(void);
> +static inline int __cpuinit ppro_with_ram_bug(void)
> +{
> + return __ppro_with_ram_bug();
> +}

I know Ingo said to do this, but I think he was flat-out wrong. If the
tradeoff is between having a dozen ifdefs contained in a single function
in one .c file vs wrapping a dozen functions in a .h file, I say stick
them in the .c file.

Best would be to have no ifdefs and do it all with linker magic, of
course. But that's trickier.

Now the patch is 90% fiddling with wrappers and it's impossible to find
the interesting bits anymore..

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] slob: fix linking for user mode linux

2008-02-11 Thread Matt Mackall


On Tue, 2008-02-12 at 00:32 +0200, Pekka J Enberg wrote:
> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> UML has some header magic that expects a non-inline __kmalloc() function to be
> available. Fixes the following link time errors:
> 
> arch/um/drivers/built-in.o: In function `kmalloc':
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> /home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: undefined reference to `__kmalloc'
> arch/um/drivers/built-in.o:/home/penberg/linux-2.6/arch/um/include/um_malloc.h:14: more undefined references to `__kmalloc' follow

Can someone explain why the magic is needed (and preferably capture it
in a comment somewhere sensible)? I took a peek at this and have no idea
what's going on. 

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] Configure out DMI scanning code

2008-02-11 Thread Matt Mackall


On Mon, 2008-02-11 at 17:58 +0100, Thomas Petazzoni wrote:
> Hi,
> 
> The enclosed patch allows to remove the DMI scanning code when
> CONFIG_EMBEDDED is defined. It's basically the dma_blacklist patch of
> Linux-Tiny ported to 2.6.25-rc1, with the required modifications. It
> allows to remove ~10k from the kernel code/data size.

Looks ok. Please preserve original authorship (i.e. me) in some fashion in
your description.

> On top of this patch, I've tested if removing the big dmi tables in the
> code (for example in arch/x86/kernel/reboot.c) would allow to make more
> space optimizations. However, it seems that simply defining
> dmi_check_system() to an empty static inlined function already allows
> gcc to optimize out the dmi tables, because there are not present in
> the code. Is that possible, or is my understanding incorrect ?

That's possible with modern gccs, yes.
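
A small sketch of why that works, assuming CONFIG_DMI is unset. The
struct below is a cut-down stand-in for the kernel's struct
dmi_system_id, and the table entries are placeholders rather than real
reboot.c entries.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct dmi_system_id. */
struct dmi_system_id {
	int (*callback)(const struct dmi_system_id *);
	const char *ident;
};

#ifdef CONFIG_DMI
int dmi_check_system(const struct dmi_system_id *list);
#else
/* Stub: nothing is ever scanned, so nothing ever matches. */
static inline int dmi_check_system(const struct dmi_system_id *list)
{
	(void)list;
	return 0;
}
#endif

static int set_bios_reboot(const struct dmi_system_id *d)
{
	(void)d;
	return 0;
}

/* Placeholder table in the style of arch/x86/kernel/reboot.c. */
static const struct dmi_system_id reboot_dmi_table[] = {
	{ set_bios_reboot, "Example Board A" },
	{ set_bios_reboot, "Example Board B" },
	{ NULL, NULL }
};

int reboot_quirks_found(void)
{
	return dmi_check_system(reboot_dmi_table);
}
```

Once the stub is inlined, the static table and its callback have no
remaining users, so an optimizing build is free to drop both. Comparing
the output of size or nm with and without -O2 should show the effect.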

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] [RESENDING] netconsole: register cmdline netconsole configs to configfs

2008-02-11 Thread Matt Mackall


On Mon, 2008-02-11 at 18:08 +0900, Joonwoo Park wrote:
> This patch introduces cmdline netconsole configs to register to configfs
> with dynamic netconsole. Satyam Sharma who designed shiny dynamic
> reconfiguration for netconsole, mentioned about this issue already.
> (http://lkml.org/lkml/2007/7/29/360)
> But I think, without separately managing of two kind of netconsole target
> objects, it's possible by using config_group instead of
> config_item in the netconsole_target and default_groups feature of
> configfs.
> 
> Patch was tested with configuration creation/destruction by kernel and
> module.
> And it makes possible to enable/disable, modify and review netconsole
> target configs from cmdline.

I'm afraid I'm going to have to leave review of this to someone who is
clueful about configfs. But it seems reasonable.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86 (Linux Tiny): configure out support for some processors

2008-02-08 Thread Matt Mackall


On Fri, 2008-02-08 at 23:47 +0100, Michael Opdenacker wrote:
> This patch against x86/mm tries to revive an original patch
> from Matt Mackall which didn't get merged at that time. It makes
> it possible to disable support code for some processors. This can
> be useful to support only the exact processor type used
> in a given system.
> 
> I may have made wrong assumptions with the code handling
> force_mwait. As force_mwait is only declared in
> arch/x86/kernel/cpu/amd.c, which is only compiled
> when CONFIG_X86_32 is set, I thought it was safe
> to make the code depend on CONFIG_CPU_SUP_AMD,
> but I could be wrong.
> 
> Your comments are more than welcome! To make the code
> cleaner, I could use empty inline functions instead
> of ifdef's, as suggested in Documentation/SubmittingPatches.

Please include the output of size with all these options on and off.

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index dabdbef..8f9a123 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -287,8 +287,10 @@ static void mwait_idle(void)
>  
>  static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
>  {
> +#ifdef CONFIG_CPU_SUP_AMD
>   if (force_mwait)
>   return 1;
> +#endif

Probably makes sense to move force_mwait (one word) here and eliminate
these ifdefs.

> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 347a8cd..812bfa0 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -211,12 +211,14 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
>   }
>  }
>  
> +#ifdef CONFIG_CPU_SUP_INTEL
>  static inline int page_kills_ppro(unsigned long pagenr)
>  {
>   if (pagenr >= 0x70000 && pagenr <= 0x7003F)
>   return 1;
>   return 0;
>  }
> +#endif
>  /*
>   * devmem_is_allowed() checks to see if /dev/mem access to a certain address
> @@ -287,7 +289,11 @@ static void __meminit free_new_highpage(struct page *page)
>  
>  void __init add_one_highpage_init(struct page *page, int pfn, int bad_ppro)
>  {
> - if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) {
> + if (page_is_ram(pfn)
> +#ifdef CONFIG_CPU_SUP_INTEL
> + && !(bad_ppro && page_kills_ppro(pfn))
> +#endif

Yuck. A better way to do this is move the bad_ppro check into
page_kills_ppro and then ifdef out -the body- of the inline.
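
Roughly like this, as a userspace sketch. The 0x70000..0x7003F range
is assumed to be the one the existing function checks; page_is_ram()
is a trivial stand-in and highpage_usable() is a made-up name for the
condition in add_one_highpage_init().

```c
#define CONFIG_CPU_SUP_INTEL 1	/* remove to model a non-Intel build */

static int bad_ppro;	/* stays zero unless CPU-specific code sets it */

static inline int page_kills_ppro(unsigned long pagenr)
{
#ifdef CONFIG_CPU_SUP_INTEL
	if (bad_ppro && pagenr >= 0x70000 && pagenr <= 0x7003F)
		return 1;
#else
	(void)pagenr;
#endif
	return 0;
}

/* Stand-in for the kernel's page_is_ram(); treats every pfn as RAM. */
static int page_is_ram(unsigned long pfn)
{
	(void)pfn;
	return 1;
}

/* The caller's condition no longer needs an ifdef of its own. */
static int highpage_usable(unsigned long pfn)
{
	return page_is_ram(pfn) && !page_kills_ppro(pfn);
}
```

With the option unset the inline collapses to "return 0" and the
caller's test reduces to page_is_ram(pfn) at compile time.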

> @@ -592,7 +598,11 @@ void __init mem_init(void)
>  #ifdef CONFIG_FLATMEM
>   BUG_ON(!mem_map);
>  #endif
> +#ifdef CONFIG_CPU_SUP_INTEL
>   bad_ppro = ppro_with_ram_bug();
> +#else
> + bad_ppro = 0;
> +#endif

Again, move the storage for this, let it get initialized to zero
automatically, and initialize it in the CPU-specific code (if ordering
allows).
 
-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] stub out is_swap_pte for !MMU

2008-02-08 Thread Matt Mackall


On Fri, 2008-02-08 at 14:05 -0800, Andrew Morton wrote:
> On Fri, 08 Feb 2008 15:41:42 -0600
> Matt Mackall <[EMAIL PROTECTED]> wrote:
> 
> > Fix compile error on nommu for is_swap_pte
> > 
> > Does it ever make sense to ask "is this pte a swap entry?" on a machine
> > with no MMU? Presumably this also means it has no ptes too, right? In
> > which case, it's better to comment the whole function out. Then when
> > someone tries to ask the above meaningless question, they get a compile
> > error rather than a meaningless answer.
> > 
> > Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>
> > 
> > diff -r 50a6e531a9f2 include/linux/swapops.h
> > --- a/include/linux/swapops.h   Mon Feb 04 20:23:02 2008 -0600
> > +++ b/include/linux/swapops.h   Fri Feb 08 15:38:01 2008 -0600
> > @@ -42,11 +42,13 @@
> >         return entry.val & SWP_OFFSET_MASK(entry);
> >  }
> >  
> > +#ifdef CONFIG_MMU
> >  /* check whether a pte points to a swap entry */
> >  static inline int is_swap_pte(pte_t pte)
> >  {
> >         return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
> >  }
> > +#endif
> >  
> 
> Seems contradictory.  Is there _really_ a compilation error at present? 
> The changelog seems to imply otherwise and no compiler error output is
> quoted and it all compiled OK for me on nommu superh.

Sorry, here's the compile error from the original thread (where the
original copy of the above patch was posted).

...
  CC  mm/vmscan.o
In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/vmscan.c:44:
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h: In function 'is_swap_pte':
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_none'
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_present'
make[2]: *** [mm/vmscan.o] Error 1

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] stub out is_swap_pte for !MMU

2008-02-08 Thread Matt Mackall


On Fri, 2008-02-08 at 16:25 -0500, Mike Frysinger wrote:
> On Friday 08 February 2008, Matt Mackall wrote:
> > On Fri, 2008-02-08 at 15:02 -0500, Mike Frysinger wrote:
> > > With commit 698dd4ba6b12e34e1e432c944c01478c0b2cd773, swap_pte() was
> > > moved into view of both MMU and !MMU, but uses functions only provided by
> > > MMU. Here we stub out the function for !MMU ports.
> >
> > I'm not sure if this is right compared to my original patch. Does it
> > ever make sense to ask "is this pte a swap entry?" on a machine with no
> > MMU? Presumably this also means it has no ptes too, right? In which
> > case, it's better to comment the whole function out. Then when someone
> > tries to ask the above meaningless question, they get a compile error
> > rather than a meaningless answer.
> 
> honestly, doesnt matter to me since none of the code that currently utilizes 
> this function is used in no-mmu context.  if you want to just put the whole 
> thing in CONFIG_MMU, then go for it.

Here it is again, I'll leave it up to Andrew:

Fix compile error on nommu for is_swap_pte

Does it ever make sense to ask "is this pte a swap entry?" on a machine
with no MMU? Presumably this also means it has no ptes too, right? In
which case, it's better to comment the whole function out. Then when
someone tries to ask the above meaningless question, they get a compile
error rather than a meaningless answer.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 include/linux/swapops.h
--- a/include/linux/swapops.h   Mon Feb 04 20:23:02 2008 -0600
+++ b/include/linux/swapops.h   Fri Feb 08 15:38:01 2008 -0600
@@ -42,11 +42,13 @@
         return entry.val & SWP_OFFSET_MASK(entry);
 }
 
+#ifdef CONFIG_MMU
 /* check whether a pte points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
         return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
 }
+#endif
 
 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] stub out is_swap_pte for !MMU

2008-02-08 Thread Matt Mackall


On Fri, 2008-02-08 at 15:02 -0500, Mike Frysinger wrote:
> With commit 698dd4ba6b12e34e1e432c944c01478c0b2cd773, swap_pte() was moved
> into view of both MMU and !MMU, but uses functions only provided by MMU.
> Here we stub out the function for !MMU ports.

I'm not sure if this is right compared to my original patch. Does it
ever make sense to ask "is this pte a swap entry?" on a machine with no
MMU? Presumably this also means it has no ptes too, right? In which
case, it's better to comment the whole function out. Then when someone
tries to ask the above meaningless question, they get a compile error
rather than a meaningless answer.
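
A userspace sketch of the guarded version, for comparison. The pte_t
layout, the PTE_PRESENT and PTE_FILE bit values and the three pte_*()
helpers are invented stand-ins; real architectures define their own.

```c
#define CONFIG_MMU 1	/* remove to model a nommu build */

/* Invented stand-ins for the arch-provided pte type and helpers. */
typedef struct { unsigned long val; } pte_t;

#define PTE_PRESENT	0x1UL
#define PTE_FILE	0x2UL

static inline int pte_none(pte_t pte)    { return pte.val == 0; }
static inline int pte_present(pte_t pte) { return (pte.val & PTE_PRESENT) != 0; }
static inline int pte_file(pte_t pte)    { return (pte.val & PTE_FILE) != 0; }

#ifdef CONFIG_MMU
/* check whether a pte points to a swap entry */
static inline int is_swap_pte(pte_t pte)
{
	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
}
#endif
```

With CONFIG_MMU removed, any caller of is_swap_pte() stops compiling,
so the bogus question is caught at build time. The stubbed variant
would compile everywhere and quietly return 0.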

> Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
> ---
>  include/linux/swapops.h |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index 7bf2d14..e6b54f7 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -45,7 +45,11 @@ static inline pgoff_t swp_offset(swp_entry_t entry)
>  /* check whether a pte points to a swap entry */
>  static inline int is_swap_pte(pte_t pte)
>  {
> +#ifdef CONFIG_MMU
>   return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
> +#else
> + return 0;
> +#endif
>  }
>  
>  /*
-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] stub out is_swap_pte for !MMU

2008-02-08 Thread Matt Mackall


On Fri, 2008-02-08 at 16:25 -0500, Mike Frysinger wrote:
> On Friday 08 February 2008, Matt Mackall wrote:
> > On Fri, 2008-02-08 at 15:02 -0500, Mike Frysinger wrote:
> > > With commit 698dd4ba6b12e34e1e432c944c01478c0b2cd773, swap_pte() was
> > > moved into view of both MMU and !MMU, but uses functions only provided by
> > > MMU. Here we stub out the function for !MMU ports.
> >
> > I'm not sure if this is right compared to my original patch. Does it
> > ever make sense to ask "is this pte a swap entry?" on a machine with no
> > MMU? Presumably this also means it has no ptes too, right? In which
> > case, it's better to comment the whole function out. Then when someone
> > tries to ask the above meaningless question, they get a compile error
> > rather than a meaningless answer.
>
> honestly, doesnt matter to me since none of the code that currently utilizes
> this function is used in no-mmu context.  if you want to just put the whole
> thing in CONFIG_MMU, then go for it.

Here it is again, I'll leave it up to Andrew:

Fix compile error on nommu for is_swap_pte

Does it ever make sense to ask "is this pte a swap entry?" on a machine
with no MMU? Presumably this also means it has no ptes too, right? In
which case, it's better to comment the whole function out. Then when
someone tries to ask the above meaningless question, they get a compile
error rather than a meaningless answer.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 include/linux/swapops.h
--- a/include/linux/swapops.h   Mon Feb 04 20:23:02 2008 -0600
+++ b/include/linux/swapops.h   Fri Feb 08 15:38:01 2008 -0600
@@ -42,11 +42,13 @@
return entry.val & SWP_OFFSET_MASK(entry);
 }
 
+#ifdef CONFIG_MMU
 /* check whether a pte points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
 }
+#endif
 
 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an
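
The effect of wrapping the whole function can be sketched in userspace. This is not kernel code: `pte_t`, the `DEMO_PTE_*` bits and the helper bodies are hypothetical stand-ins so the fragment compiles on its own, and `CONFIG_MMU` is defined by hand where Kconfig would normally supply it. With the definition inside the `#ifdef`, a !MMU build that calls `is_swap_pte()` stops at compile time instead of quietly getting 0 back.

```c
#include <assert.h>

/* Stand-in for the Kconfig symbol; remove this line to model a !MMU build. */
#define CONFIG_MMU 1

/* Hypothetical pte layout, for illustration only. */
typedef struct { unsigned long val; } pte_t;
#define DEMO_PTE_PRESENT 0x1UL
#define DEMO_PTE_FILE    0x2UL

#ifdef CONFIG_MMU
static inline int pte_none(pte_t pte)    { return pte.val == 0; }
static inline int pte_present(pte_t pte) { return (pte.val & DEMO_PTE_PRESENT) != 0; }
static inline int pte_file(pte_t pte)    { return (pte.val & DEMO_PTE_FILE) != 0; }

/* check whether a pte points to a swap entry */
static inline int is_swap_pte(pte_t pte)
{
	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
}

/* Usage: an empty pte and a mapped pte are not swap entries, the third one is. */
static void demo_is_swap_pte(void)
{
	pte_t none = { 0 };
	pte_t mapped = { DEMO_PTE_PRESENT };
	pte_t swapped = { 0x100UL };

	assert(!is_swap_pte(none));
	assert(!is_swap_pte(mapped));
	assert(is_swap_pte(swapped));
}
#endif
```

Deleting the `#define CONFIG_MMU` line makes any remaining caller fail to build, which is the behaviour argued for above; the stubbed variant would instead return 0 for every input.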

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86 (Linux Tiny): configure out support for some processors

2008-02-08 Thread Matt Mackall


On Fri, 2008-02-08 at 23:47 +0100, Michael Opdenacker wrote:
> This patch against x86/mm tries to revive an original patch
> from Matt Mackall which didn't get merged at that time. It makes
> it possible to disable support code for some processors. This can
> be useful to support only the exact processor type used
> in a given system.
>
> I may have made wrong assumptions with the code handling
> force_mwait. As force_mwait is only declared in
> arch/x86/kernel/cpu/amd.c, which is only compiled
> when CONFIG_X86_32 is set, I thought it was safe
> to make the code depend on CONFIG_CPU_SUP_AMD,
> but I could be wrong.
>
> Your comments are more than welcome! To make the code
> cleaner, I could use empty inline functions instead
> of ifdef's, as suggested in Documentation/SubmittingPatches.

Please include the output of size with all these options on and off.

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index dabdbef..8f9a123 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -287,8 +287,10 @@ static void mwait_idle(void)
>  
>  static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
>  {
> +#ifdef CONFIG_CPU_SUP_AMD
>   if (force_mwait)
>   return 1;
> +#endif

Probably makes sense to move force_mwait (one word) here and eliminate
these ifdefs.

> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 347a8cd..812bfa0 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -211,12 +211,14 @@ static void __init kernel_physical_mapping_init(pgd_t 
> *pgd_base)
>   }
>  }
>  
> +#ifdef CONFIG_CPU_SUP_INTEL
>  static inline int page_kills_ppro(unsigned long pagenr)
>  {
>   if (pagenr >= 0x70000 && pagenr <= 0x7003F)
>   return 1;
>   return 0;
>  }
> +#endif
>  /*
>   * devmem_is_allowed() checks to see if /dev/mem access to a certain address
> @@ -287,7 +289,11 @@ static void __meminit free_new_highpage(struct page 
> *page)
>  
>  void __init add_one_highpage_init(struct page *page, int pfn, int bad_ppro)
>  {
> - if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) {
> + if (page_is_ram(pfn)
> +#ifdef CONFIG_CPU_SUP_INTEL
> + && !(bad_ppro && page_kills_ppro(pfn))
> +#endif

Yuck. A better way to do this is move the bad_ppro check into
page_kills_ppro and then ifdef out -the body- of the inline.

> @@ -592,7 +598,11 @@ void __init mem_init(void)
>  #ifdef CONFIG_FLATMEM
>   BUG_ON(!mem_map);
>  #endif
> +#ifdef CONFIG_CPU_SUP_INTEL
>   bad_ppro = ppro_with_ram_bug();
> +#else
> + bad_ppro = 0;
> +#endif

Again, move the storage for this, let it get initialized to zero
automatically, and initialize it in the CPU-specific code (if ordering
allows).
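
The two suggestions about `bad_ppro` can be sketched together as follows. This is a userspace illustration under stated assumptions, not the patch itself: `CONFIG_CPU_SUP_INTEL` is defined by hand in place of Kconfig, the page-number range is what the mainline source of that era is believed to use, and `page_is_usable()` is a hypothetical caller standing in for `add_one_highpage_init()`.

```c
#include <assert.h>

/* Stand-in for the Kconfig symbol; remove it to model a build without Intel support. */
#define CONFIG_CPU_SUP_INTEL 1

/* File-scope storage starts out as zero, so no "#else bad_ppro = 0;" branch is needed. */
static int bad_ppro;

/* The bad_ppro test lives inside the helper and only the body is ifdef'd out. */
static inline int page_kills_ppro(unsigned long pagenr)
{
#ifdef CONFIG_CPU_SUP_INTEL
	if (bad_ppro && pagenr >= 0x70000 && pagenr <= 0x7003F)
		return 1;
#endif
	(void)pagenr;
	return 0;
}

/* Hypothetical caller: the condition no longer needs an #ifdef of its own. */
static int page_is_usable(int is_ram, unsigned long pfn)
{
	return is_ram && !page_kills_ppro(pfn);
}

/* Usage: the same page is usable until the CPU-specific code sets bad_ppro. */
static void demo_ppro(void)
{
	bad_ppro = 0;
	assert(page_is_usable(1, 0x70000UL));
	bad_ppro = 1;
	assert(!page_is_usable(1, 0x70000UL));
}
```

When the config symbol is absent the helper collapses to `return 0`, so the compiler can drop the check at every call site without any preprocessor noise in the callers.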
 
-- 
Mathematics is the supreme nostalgia of our time.

Re: blackfin compile error

2008-02-06 Thread Matt Mackall


On Wed, 2008-02-06 at 17:18 +0200, Adrian Bunk wrote:
> Commit 698dd4ba6b12e34e1e432c944c01478c0b2cd773 broke blackfin:
> 
> <--  snip  -->
> 
> ...
>   CC  mm/vmscan.o
> In file included from 
> /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/vmscan.c:44:
> /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h: In 
> function 'is_swap_pte':
> /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: 
> implicit declaration of function 'pte_none'
> /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: 
> implicit declaration of function 'pte_present'
> make[2]: *** [mm/vmscan.o] Error 1

This suggests that no one's tried to compile -mm on Blackfin since
before September, I think.

Is there somewhere more appropriate to move it? I can't find one.
Failing that, we can wrap it in CONFIG_MMU, I suppose.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 include/linux/swapops.h
--- a/include/linux/swapops.h   Mon Feb 04 20:23:02 2008 -0600
+++ b/include/linux/swapops.h   Wed Feb 06 10:21:32 2008 -0600
@@ -42,11 +42,13 @@
return entry.val & SWP_OFFSET_MASK(entry);
 }
 
+#ifdef CONFIG_MMU
 /* check whether a pte points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
 }
+#endif
 
 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an

-- 
Mathematics is the supreme nostalgia of our time.

Re: [2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()

2008-02-04 Thread Matt Mackall


On Mon, 2008-02-04 at 17:36 -0800, Andrew Morton wrote:
> On Tue, 05 Feb 2008 10:28:43 +0900 Tetsuo Handa <[EMAIL PROTECTED]> wrote:
> 
> > Hello.
> > 
> > Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-mm1
> > 
> > 2.6.24 works fine.

> err, Matt?

random: revert braindamage that snuck into checkpatch cleanup

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

diff -r 50a6e531a9f2 drivers/char/random.c
--- a/drivers/char/random.c Mon Feb 04 20:23:02 2008 -0600
+++ b/drivers/char/random.c Mon Feb 04 20:28:08 2008 -0600
@@ -1306,7 +1306,7 @@
  * Rotation is separate from addition to prevent recomputation
  */
 #define ROUND(f, a, b, c, d, x, s) \
-   (a += f(b, c, d) + in[x], a = (a << s) | (a >> (32 - s)))
+   (a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))
 #define K1 0
 #define K2 013240474631UL
 #define K3 015666365641UL
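
Why the `in[x]` form oopses can be sketched in isolation. The assumption here, not stated in the patch, is that callers pass an already-computed data word such as `in[0] + K1` as `x`; indexing `in[]` with that value reads far outside the array. `F` is the usual MD4-style selection function and `round_once()` is a hypothetical wrapper added only so the macro can be exercised on its own.

```c
#include <assert.h>
#include <stdint.h>

/* MD4-style selection function (assumed to match the caller's F). */
#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))

/* Fixed form from the patch: x is a value to add, not an index into in[]. */
#define ROUND(f, a, b, c, d, x, s) \
	(a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))

/* Hypothetical one-step wrapper; s must be between 1 and 31. */
static uint32_t round_once(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
			   uint32_t word, unsigned int s)
{
	ROUND(F, a, b, c, d, word, s);
	return a;
}

/* Usage: with b, c, d and the data word all zero, the step is a plain rotate. */
static void demo_round(void)
{
	assert(round_once(1, 0, 0, 0, 0, 3) == 8);
}
```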

-- 
Mathematics is the supreme nostalgia of our time.

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Matt Mackall

On Mon, 2008-02-04 at 16:24 -0800, Linus Torvalds wrote:
> 
> On Mon, 4 Feb 2008, Matt Mackall wrote:
> > 
> > But ATAoE is boring because it's not IP. Which means no routing,
> > firewalls, tunnels, congestion control, etc.
> 
> The thing is, that's often an advantage. Not just for performance.
> 
> > NBD and iSCSI (for all its hideous growths) can take advantage of these
> > things.
> 
> .. and all this could equally well be done by a simple bridging protocol 
> (completely independently of any AoE code).
> 
> The thing is, iSCSI does things at the wrong level. It *forces* people to 
> use the complex protocols, when it's known that a lot of people don't 
> want it. 

I frankly think NBD is at a pretty comfortable level. It's internally
very simple (and hardware-agnostic). And moderately easy to do in
silicon.

But I'm not going to defend iSCSI. I worked on the first implementation
(what became the Cisco iSCSI driver) and I have no love for iSCSI at
all. It should have been (and started out as) a nearly trivial
encapsulation of SCSI over TCP much like ATA over Ethernet but quickly
lost the plot when committees got ahold of it.

-- 
Mathematics is the supreme nostalgia of our time.

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Matt Mackall


On Mon, 2008-02-04 at 22:43 +0000, Alan Cox wrote:
> > better. So for example, I personally suspect that ATA-over-ethernet is way 
> > better than some crazy SCSI-over-TCP crap, but I'm biased for simple and 
> > low-level, and against those crazy SCSI people to begin with.
> 
> Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP
> would probably trash iSCSI for latency if nothing else.

But ATAoE is boring because it's not IP. Which means no routing,
firewalls, tunnels, congestion control, etc.

NBD and iSCSI (for all its hideous growths) can take advantage of these
things.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [2.6 patch] unexport add_disk_randomness

2008-01-30 Thread Matt Mackall


On Wed, 2008-01-30 at 22:02 +0200, Adrian Bunk wrote:
> This patch removes the no longer used EXPORT_SYMBOL(add_disk_randomness).
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Acked-by: Matt Mackall <[EMAIL PROTECTED]>

> ---
> f1a195a30248eae541ba006633aa70385d1eb785 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 5fee056..c511a83 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -667,8 +667,6 @@ void add_disk_randomness(struct gendisk *disk)
>   add_timer_randomness(disk->random,
>0x100 + MKDEV(disk->major, disk->first_minor));
>  }
> -
> -EXPORT_SYMBOL(add_disk_randomness);
>  #endif
>  
>  #define EXTRACT_SIZE 10
-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] Improve Documentation/stable_api_nonsense.txt

2008-01-30 Thread Matt Mackall


On Tue, 2008-01-29 at 16:14 +0200, Heikki Orsila wrote:

> > > Imo,
> > > "same exact C compiler" is just bad language, because C compilers are 
> > > always "exact". "exactly same C compiler" would do.
> > 
> > No, "exactly same C compiler" doesn't parse well in English.
> 
> "Same exact C compiler" does not mean what you try to say.

Actually, it does. It is perfectly idiomatic English.

"Use the exact same C compiler." -> OK, idiomatic
"Use the same exact C compiler." -> OK, idiomatic
"Use the exactly same C compiler." -> very awkward
"Use exactly the same C compiler." -> formally correct

http://www.bartleby.com/68/24/2324.html
-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-30 Thread Matt Mackall

On Wed, 2008-01-30 at 18:28 +0100, Peter Zijlstra wrote:
> Subject: mm: MADV_WILLNEED implementation for anonymous memory
> 
> Implement MADV_WILLNEED for anonymous pages by walking the page tables and
> starting asynchronous swap cache reads for all encountered swap pages.
> 
> Doing so required a modification to the page table walking library functions.
> Previously ->pte_entry() could be called while holding a kmap_atomic, to
> overcome this problem the pte walker is changed to copy batches of the pmd
> and iterate them.

That's a pretty reasonable approach. My original approach was to buffer
a page worth of PTEs with all the attendant malloc annoyances. Then
Andrew and I came up with another fix a bit ago by effectively doing a
batch of size 1: mapping and immediately unmapping per PTE. That's
basically a no-op on !HIGHPTE but could potentially be expensive in the
HIGHPTE case. Your approach might be a good complexity/performance
middle ground.

Unfortunately, I think we only implemented our fix in one of the
relevant places: the /proc/pid/pagemap code hooks a callback at the pte
table level and then does its own walk across the table. Perhaps I
should refactor this so that it hooks in at the pte entry level of the
walker instead.

> +/*
> + * Much of the complication here is to work around CONFIG_HIGHPTE which needs
> + * to kmap the pmd. So copy batches of ptes from the pmd and iterate over
> + * those.
> + */
> +#define WALK_BATCH_SIZE  32
> +
>  static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> const struct mm_walk *walk, void *private)
>  {
>   pte_t *pte;
> + pte_t ptes[WALK_BATCH_SIZE];
> + unsigned long start;
> + unsigned int i;
>   int err = 0;
>  
> - pte = pte_offset_map(pmd, addr);
>   do {
> - err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, private);
> - if (err)
> -break;
> - } while (pte++, addr += PAGE_SIZE, addr != end);
> + start = addr;
>  
> - pte_unmap(pte);
> + pte = pte_offset_map(pmd, addr);
> + for (i = 0; i < WALK_BATCH_SIZE && addr != end;
> + i++, pte++, addr += PAGE_SIZE)
> + ptes[i] = *pte;

Looks like this could be:

	for (i = 0; i < WALK_BATCH_SIZE && addr + i * PAGE_SIZE != end; i++)
		ptes[i] = pte[i];

> + pte_unmap(pte);
> +
> + for (i = 0, pte = ptes, addr = start;
> + i < WALK_BATCH_SIZE && addr != end;
> + i++, pte++, addr += PAGE_SIZE) {
> + err = walk->pte_entry(pte, addr, addr + PAGE_SIZE,
> + private);

	for (i = 0; i < WALK_BATCH_SIZE && addr != end;
	     i++, addr += PAGE_SIZE) {
		err = walk->pte_entry(&ptes[i], addr, addr + PAGE_SIZE,
				      private);

And we can ditch start.

Also, one wonders if setting batch size to 1 will then convince the
compiler to collapse this into a more trivial loop in the !HIGHPTE case.
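
Put together, the batched walk might look roughly like the userspace sketch below. Everything outside the loop structure is an assumption made so it compiles alone: `pte_t` is a plain integer, `table` stands in for the mapped pte page, `PAGE_SIZE` is defined locally, and the map and unmap calls are only marked by comments. It counts the batch in `n` rather than re-testing `addr`, which is one way of doing without `start`.

```c
#include <assert.h>

#define PAGE_SIZE       4096UL
#define WALK_BATCH_SIZE 32

typedef unsigned long pte_t;	/* hypothetical stand-in */
typedef int (*pte_entry_fn)(pte_t *pte, unsigned long addr,
			    unsigned long next, void *private);

/* Copy up to WALK_BATCH_SIZE entries, then run the callback on the copies. */
static int walk_pte_range(const pte_t *table, unsigned long addr,
			  unsigned long end, pte_entry_fn pte_entry,
			  void *private)
{
	pte_t ptes[WALK_BATCH_SIZE];
	unsigned int i, n;
	int err = 0;

	do {
		/* the kernel version would map the pte page here */
		for (n = 0; n < WALK_BATCH_SIZE &&
			    addr + n * PAGE_SIZE != end; n++)
			ptes[n] = table[n];
		/* ... and unmap it here, before any callback runs */
		table += n;

		for (i = 0; i < n; i++, addr += PAGE_SIZE) {
			err = pte_entry(&ptes[i], addr, addr + PAGE_SIZE,
					private);
			if (err)
				return err;
		}
	} while (addr != end);

	return err;
}

/* Example callback: counts entries and sums their values. */
struct demo_state { unsigned long count; unsigned long sum; };

static int demo_entry(pte_t *pte, unsigned long addr, unsigned long next,
		      void *private)
{
	struct demo_state *st = private;

	assert(next == addr + PAGE_SIZE);
	st->count++;
	st->sum += *pte;
	return 0;
}
```

Because the callback only ever sees the local copy, it is free to sleep or take locks; the cost is that it cannot modify the live pte through the pointer it is given.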

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH UPDATE] x86: ignore spurious faults

2008-01-24 Thread Matt Mackall


On Wed, 2008-01-23 at 16:28 -0800, Jeremy Fitzhardinge wrote:
> When changing a kernel page from RO->RW, it's OK to leave stale TLB
> entries around, since doing a global flush is expensive and they pose
> no security problem.  They can, however, generate a spurious fault,
> which we should catch and simply return from (which will have the
> side-effect of reloading the TLB to the current PTE).
> 
> This can occur when running under Xen, because it frequently changes
> kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
> It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids
> doing a global TLB flush after changing page permissions.

There's perhaps an opportunity to do this lazy TLB trick in the mmap
path as well, where RW mappings are initially mapped as RO so we can
catch processes dirtying them and then switched to RW. If the mapping is
shared across threads on multiple cores, we can defer synchronizing the
TLBs on the others.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH -v8 3/4] Enable the MS_ASYNC functionality in sys_msync()

2008-01-24 Thread Matt Mackall


On Thu, 2008-01-24 at 12:36 +1100, Nick Piggin wrote:
> On Thursday 24 January 2008 04:05, Linus Torvalds wrote:
> > On Wed, 23 Jan 2008, Anton Salikhmetov wrote:
> > > +
> > > + if (pte_dirty(*pte) && pte_write(*pte)) {
> >
> > Not correct.
> >
> > You still need to check "pte_present()" before you can test any other
> > bits. For a non-present pte, none of the other bits are defined, and for
> > all we know there might be architectures out there that require them to
> > be non-dirty.
> >
> > As it is, you just possibly randomly corrupted the pte.
> >
> > Yeah, on all architectures I know of, if the pte is clear, neither of
> > those tests will trigger, so it just happens to work, but it's still
> > wrong.
> 
> Probably it can fail for !present nonlinear mappings on many
> architectures.

Definitely.
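
The ordering being asked for can be sketched like this. The pte layout and bit values are hypothetical, picked so that a non-present entry can happen to have the same bits set that mean "dirty" and "writable" in a present one, which is the situation with swap and nonlinear entries; `pte_needs_writeprotect()` is an invented name for the check.

```c
#include <assert.h>

/* Hypothetical pte layout, for illustration only. */
typedef struct { unsigned long val; } pte_t;
#define DEMO_PTE_PRESENT 0x01UL
#define DEMO_PTE_WRITE   0x02UL
#define DEMO_PTE_DIRTY   0x40UL

static inline int pte_present(pte_t pte) { return (pte.val & DEMO_PTE_PRESENT) != 0; }
static inline int pte_write(pte_t pte)   { return (pte.val & DEMO_PTE_WRITE) != 0; }
static inline int pte_dirty(pte_t pte)   { return (pte.val & DEMO_PTE_DIRTY) != 0; }

/* Only interpret the dirty and write bits once the pte is known to be present. */
static int pte_needs_writeprotect(pte_t pte)
{
	if (!pte_present(pte))
		return 0;	/* swap or nonlinear entry: the other bits mean something else */
	return pte_dirty(pte) && pte_write(pte);
}

/* Usage: identical low bits, different answer depending on the present bit. */
static void demo_present_first(void)
{
	pte_t offset_bits = { DEMO_PTE_WRITE | DEMO_PTE_DIRTY };
	pte_t live = { DEMO_PTE_PRESENT | DEMO_PTE_WRITE | DEMO_PTE_DIRTY };

	assert(!pte_needs_writeprotect(offset_bits));
	assert(pte_needs_writeprotect(live));
}
```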

-- 
Mathematics is the supreme nostalgia of our time.

Re: Rescheduling interrupts

2008-01-23 Thread Matt Mackall


On Wed, 2008-01-23 at 09:53 +0100, Andi Kleen wrote:
> Ingo Molnar <[EMAIL PROTECTED]> writes:
> 
> > that would probably be the case if it's multiple sockets - but for 
> > multiple cores exactly the opposite is true: the sooner _both_ cores 
> > finish processing, the deeper power use the CPU can reach. 
> 
> That's only true on setups where the cores don't have 
> separate sleep states. But that's not generally true anymore.
> e.g. AMD Fam10h has completely separate power planes for
> the cores and I believe newer Intel CPUs can also let their 
> cores go to at least some sleep states independently (although
> the deepest sleep modi still require all cores idle) 

I think we can expect everyone to rapidly evolve towards full
independence of core power states. In fact, it wouldn't surprise me if
we eventually get to the point of shutting down individual functional
units like the FPU.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH rc8-mm1] hotfix libata-scsi corruption

2008-01-22 Thread Matt Mackall


On Tue, 2008-01-22 at 22:59 +, Hugh Dickins wrote:
> On Tue, 22 Jan 2008, James Bottomley wrote:
> > 
> > libsas looks to be OK because it specifically kmallocs a 512 byte buffer
> > which should (for off slab data) be 512 byte aligned.
> 
> I don't remember the various SLAB and SLOB and SLUB rules offhand:
> I'm not sure it's safe to rely on such alignment on all of them 

It doesn't work that way with SLOB kmalloc (nor did it in pre-slabified
kmalloc). One shouldn't be surprised if a SLAB/SLUB debugging feature
breaks that alignment either.
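
As a user-space analogy only (this is not kernel code, and the helper
names is_aligned() and alloc_sector_buf() are made up for illustration):
plain malloc(), like kmalloc() in the cases above, promises nothing about
size-aligned buffers. Code that needs 512-byte alignment should ask for it
explicitly and can check it. A minimal sketch, assuming C11
aligned_alloc() is available:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_SIZE 512u

/* Return nonzero if p is aligned to 'align' bytes (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Hypothetical helper: allocate one sector-sized buffer whose alignment is
 * requested explicitly rather than assumed from the allocator's behavior.
 * Returns NULL on failure; release with free().
 */
void *alloc_sector_buf(void)
{
    return aligned_alloc(SECTOR_SIZE, SECTOR_SIZE);
}
```

The point is only that the alignment is part of the request, so a change
of allocator or a debugging option cannot silently take it away.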

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling

2008-01-22 Thread Matt Mackall


On Tue, 2008-01-22 at 19:58 +0100, Sam Ravnborg wrote:
> On Tue, Jan 22, 2008 at 10:37:19AM -0600, Matt Mackall wrote:
> > 
> > On Tue, 2008-01-22 at 15:39 +0100, Ingo Molnar wrote:
> > > threadinfo-ool.patch: doesnt this break the scheduler? 
> > 
> > It didn't when I wrote it, 3+ years ago. But I'm sure it needs to be
> > revisited.
> > 
> > > tiny-cflags.patch: obsolete? Isnt CFLAGS already extendable? Question to 
> > > Sam i guess.
> And what was the question then?
> 
> We have today the possibility to say:
> make KCFLAGS=-whatever
> 
> and we have plenty of kconfig adjustments affecting the gcc options.
> 
> I do not know if this covers it.

Basically the idea was you could specify various flags that affected
kernel size, in particular overriding the various bloated alignment
defaults.

If I were to do this today (if they haven't already become the default),
I'd probably add a config var to request minimal alignment instead.

-- 
Mathematics is the supreme nostalgia of our time.

Re: Rescheduling interrupts

2008-01-22 Thread Matt Mackall


On Tue, 2008-01-22 at 17:05 +0100, Ingo Molnar wrote:
> * S.Çağlar Onur <[EMAIL PROTECTED]> wrote:
> 
> > > My theory is that for whatever reason we get "repeat" IPIs: multiple 
> > > reschedule IPIs although the other CPU only initiated one.
> > 
> > Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)
> 
> hm, the IPI sending and receiving is nicely paired up:
> 
> [  625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
> [  625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on 
> CPU#1:
> 
> amarokapp does wake up threads every 20 microseconds - that could 
> explain it. It's probably Xorg running on one core, amarokapp on the 
> other core. That's already 100 reschedules/sec.

That suggests we want an "anti-load-balancing" heuristic when CPU usage
is very low. Migrating everything onto one core when we're close to idle
will save power and probably reduce latencies.
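
A rough sketch of what such a heuristic could look like, as a policy
function only. This is not scheduler code: consolidation_target(), the
util[] array of per-CPU utilization (0.0 to 1.0) and the low_water
threshold are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical policy sketch: util[i] is the recent utilization of CPU i
 * in the range 0.0..1.0, and low_water is an assumed tunable meaning
 * "close to idle".
 *
 * Returns the CPU that should absorb all runnable tasks, or -1 if the
 * system is busy enough that normal load balancing should continue.
 */
int consolidation_target(const double *util, size_t ncpu, double low_water)
{
    double total = 0.0;
    size_t busiest = 0;

    assert(util != NULL && ncpu > 0);

    for (size_t i = 0; i < ncpu; i++) {
        total += util[i];
        if (util[i] > util[busiest])
            busiest = i;
    }

    /* Only consolidate when one core could comfortably carry everything. */
    if (total > low_water)
        return -1;

    /* Prefer the busiest core, since it is already awake. */
    return (int)busiest;
}
```

With two cores at 2% and 5% and a threshold of 25%, this picks the second
core; at 60% and 50% it declines and leaves balancing alone.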

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling

2008-01-22 Thread Matt Mackall


On Tue, 2008-01-22 at 15:39 +0100, Ingo Molnar wrote:
> threadinfo-ool.patch: doesnt this break the scheduler? 

It didn't when I wrote it, 3+ years ago. But I'm sure it needs to be
revisited.

> tiny-cflags.patch: obsolete? Isnt CFLAGS already extendable? Question to 
> Sam i guess.

Yup.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCHSET] printk: implement printk_header() and merging printk

2008-01-21 Thread Matt Mackall


On Tue, 2008-01-22 at 10:00 +0900, Tejun Heo wrote:
> Matt Mackall wrote:
> > I suppose. I still find this approach less than ideal, especially
> > putting something potentially large on the stack. The dangers are
> > perhaps worse than a malloc, really. 
> 
> I pondered on this a bit but the thing is we already use several
> hundred bytes in a function which builds complex messages.

Well there are lots of current and potential future users of this
function, many of them down at the end of long call chains. So I'm more
worried about the new cases that embrace this approach and suddenly add
300 bytes of stack. In fact, if this is at all popular, we can expect to
have more than one of these frames on the stack in various paths. 

Given that it only exists to make output prettier, it doesn't -really-
justify increased stack usage.

> > I also don't like your interface much. Consider this alternative:
> > 
> > struct mprintk *mp = mprintk_begin(KERN_INFO "ata%u.%2u: ", 1, 0);
> > mprintk(mp, "ATA %d", 7);
> > mprintk(mp, ", %u sectors\n", 1024);
> > mprintk(mp, "everything seems dandy\n");
> > mprintk_end(mp);
> > 
> > That keeps all the "normal" printks short and makes the flush more
> > explicit.
> 
> I like that the more used function is shorter.  Hmmm... The reason why I
> first used mprintk_push() is to make it clear that the function
> accumulates messages unlike mprintk() which flushes what's accumulated
> and prints its own message.
> 
> > Now we make mprintk_begin attempt to do a kmalloc of a moderate size
> > (512 bytes?) and failing that, return null. Then mprintk can fall
> > through to printk in the NULL case.
> 
> If you wanna do that implicitly, you need GFP_ flag in mprintk_begin()
> and atomic allocation should be used from interrupt handlers and friends
> and they fail easily under the right (or wrong) conditions.  Forcing
> kmalloc isn't a good idea.  Having multiple initializers is one way to
> do it.  Any suggestions?

Adding a gfp_flags arg isn't too painful. And we've generally avoided
having separate function calls for atomic vs non-atomic allocation.

Btw, we can also easily hide Willy or Rusty's stringbuf implementation
under the covers here and still have a scheme that automatically falls
back to direct printk..

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCHSET] printk: implement printk_header() and merging printk

2008-01-21 Thread Matt Mackall


On Sat, 2008-01-19 at 07:58 +0900, Tejun Heo wrote:
> Matt Mackall wrote:
> > On Wed, 2008-01-16 at 10:00 +0900, Tejun Heo wrote:
> >> And mprintk the following.
> >>
> >>  code:
> >>   DEFINE_MPRINTK(mp, 2 * 80);
> >>
> >>   mprintk_set_header(&mp, KERN_INFO "ata%u.%2u: ", 1, 0);
> >>   mprintk_push(&mp, "ATA %d", 7);
> >>   mprintk_push(&mp, ", %u sectors\n", 1024);
> >>   mprintk(&mp, "everything seems dandy\n");
> > 
> > I prefer Matthew Wilcox's stringbuf approach which does proper memory
> > management and isn't specific to printk:
> > 
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0710.3/0517.html
> 
> Yeap, that's generic and nice but I think both 'generic' and 'proper
> memory management' are weakness if what you're trying to do is to
> support collecting messages in pieces and putting it out via printk.
> Please consider the following scenario.
> 
> You're in an interrupt handler and detected a severe error condition
> which should be notified to the user but the information is rather
> complex and best built in pieces, so you create a stringbuf and does
> sb_printf() to it w/ GFP_ATOMIC but alas memory allocation failed and
> you end up printing "out of memory" unless you detect the failure and go
> back and printk messages piece-by-piece manually.  I would rather
> assemble the message manually from the get-go into an on-stack buffer.

I suppose. I still find this approach less than ideal, especially
putting something potentially large on the stack. The dangers are
perhaps worse than a malloc, really. 

I also don't like your interface much. Consider this alternative:

struct mprintk *mp = mprintk_begin(KERN_INFO "ata%u.%2u: ", 1, 0);
mprintk(mp, "ATA %d", 7);
mprintk(mp, ", %u sectors\n", 1024);
mprintk(mp, "everything seems dandy\n");
mprintk_end(mp);

That keeps all the "normal" printks short and makes the flush more
explicit.

Now we make mprintk_begin attempt to do a kmalloc of a moderate size
(512 bytes?) and failing that, return null. Then mprintk can fall
through to printk in the NULL case.
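
A user-space sketch of that fallback scheme, to make the control flow
concrete. Everything here is an assumption for illustration: the struct
layout, the fixed 512-byte buffer, malloc()/vprintf() standing in for
kmalloc()/printk(), and the mprintk_fail_alloc test hook. KERN_INFO is
left out because it has no user-space equivalent.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define MPRINTK_BUF_SIZE 512

/* Hypothetical layout; the thread never defines struct mprintk. */
struct mprintk {
    size_t len;                   /* bytes used, excluding the NUL */
    char buf[MPRINTK_BUF_SIZE];   /* message accumulated so far */
};

/* Test hook standing in for a failed kmalloc(). */
int mprintk_fail_alloc;

/* Append formatted text to mp, truncating if the buffer is full. */
static void mprintk_vappend(struct mprintk *mp, const char *fmt, va_list ap)
{
    size_t room = sizeof(mp->buf) - mp->len;   /* always >= 1 */
    int n = vsnprintf(mp->buf + mp->len, room, fmt, ap);

    if (n < 0) {
        mp->buf[mp->len] = '\0';               /* keep the old contents */
        return;
    }
    mp->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Start a merged message; on allocation failure print directly, return NULL. */
struct mprintk *mprintk_begin(const char *fmt, ...)
{
    struct mprintk *mp = mprintk_fail_alloc ? NULL : malloc(sizeof(*mp));
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    if (mp) {
        mp->len = 0;
        mp->buf[0] = '\0';
        mprintk_vappend(mp, fmt, ap);
    } else {
        vprintf(fmt, ap);                      /* fallback: plain output */
    }
    va_end(ap);
    return mp;
}

/* Accumulate into mp, or print immediately when mp is NULL. */
void mprintk(struct mprintk *mp, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    if (mp)
        mprintk_vappend(mp, fmt, ap);
    else
        vprintf(fmt, ap);
    va_end(ap);
}

/* Flush the accumulated message in one go and release the buffer. */
void mprintk_end(struct mprintk *mp)
{
    if (!mp)
        return;
    fputs(mp->buf, stdout);
    free(mp);
}
```

Callers never check for NULL themselves: the same five calls work whether
or not the allocation succeeded, and the only difference is whether the
output arrives merged or piecewise.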

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] random - add async I/O support

2008-01-21 Thread Matt Mackall


On Mon, 2008-01-21 at 13:18 -0500, Jeff Dike wrote:
> Add async notification support to /dev/random.

This conflicts just about everywhere with my latest code, but I'll fix
that up.

-- 
Mathematics is the supreme nostalgia of our time.

Re: PROBLEM: Celeron Core

2008-01-20 Thread Matt Mackall

On Sun, 2008-01-20 at 12:24 -0600, Robert Hancock wrote:
> David Newall wrote:
> > Andi Kleen wrote:
> >>> Isn't it the case that an idle machine will use
> >>> less power when throttled than when not?
> >>> 
> >> No that is not the case (not even on old CPUs) 
> >>   
> > Then why would it run cooler?  What generates the heat when not
> > throttled?  What stops generating heat when throttled?  And you say this
> > happens without reducing power consumption?  I'm not convinced.  I'm a
> > long way from that.
> 
> I believe that all throttling does is forcibly halt the CPU on a 
> particular duty cycle. This will reduce the rate of power consumption, 
> but reduces the CPU performance by a greater amount (since even at 100% 
> halted the CPU still consumes power) and so actually reduces performance 
> per watt. It will spread the heat and power usage produced from a given 
> workload task out in time (thus its usefulness in limiting CPU 
> temperature) but will consume more power overall.

Your usage of "overall power" here is wrong. Power is an instantaneous
quantity (a rate, per second) like velocity, and you are comparing it to
energy, which is not an instantaneous quantity, more like distance.

If we throttle the velocity of a car from 100km/h to 50km/h, it'll
obviously take longer for it to travel a given distance. Now what will it
mean when we ask about its "overall velocity" when it reaches its
destination? We surely don't mean the distance travelled - that's not a
velocity! We can perhaps talk about its average velocity, which will
obviously be smaller.

> Real CPU clock throttling schemes like SpeedStep, PowerNow, etc. 
> actually do increase performance per watt when they kick in.

That may be true. But the statement "throttling does not reduce power
usage" remains false. And the statement "throttling reduces heat
production but not power usage" remains physically impossible.

It might be true that "throttling increases energy usage per unit of
computation relative to no power saving measures at all", but that is
not incompatible with "throttling lets you run your laptop on battery
longer than no power saving measures at all", which is often what people
care about.

Voltage/frequency reduction is obviously a much better solution if it's
available as reducing voltage reduces power usage quadratically rather
than linearly. But beyond the quadratic/linear thing, the concept is the
same: use less power and your battery lasts longer.
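
To spell out the quadratic/linear distinction with the usual first-order
CMOS model (brought in here as an assumption, it is not from the thread):
dynamic power depends on activity factor alpha, switched capacitance C,
supply voltage V and clock frequency f, while duty-cycle throttling with
duty d only changes how much of the time the CPU spends active.

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f
\qquad\text{versus}\qquad
\bar{P}_{\text{throttle}} = d\, P_{\text{active}} + (1 - d)\, P_{\text{halt}},
\quad 0 < d \le 1
```

Halving d at best halves the active term, whereas lowering V by 10%
already scales the V^2 factor to 0.81, on top of whatever reduction in f
comes with it.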

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling

2008-01-20 Thread Matt Mackall

On Sat, 2008-01-19 at 22:59 -0600, Rob Landley wrote:
> On Friday 18 January 2008 11:10:19 Matt Mackall wrote:
> > > * Disable support for readahead, page writeback, pdflush and swap
> > >   when we have no storage at all (typically booting from an
> > >   initramfs). This corresponds to 69 KB of source code!
> >
> > That'd be nice, yes. It would probably make sense to be able to disable
> > just readahead support when we're working with only solid-state devices.
> 
> Very nice.  From a UI standpoint, shouldn't disabling the block layer take at 
> least some of that out?

There are a number of laptops now that ship with solid-state disks.
These things look like normal IDE block devices to the kernel, but have
zero seek time and zero rotational latency. So here, prefetch is a waste
of memory and probably increases latency on average.

This will also be true for using prefetch on a typical embedded board
using compact flash through an IDE interface controller (extremely
common).

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-19 Thread Matt Mackall


On Sat, 2008-01-19 at 11:22 +0100, Miklos Szeredi wrote:
> > Reminds me, I've got a patch here for addressing that problem with loop 
> > mounts:
> > 
> > Writes to loop should update the mtime of the underlying file.
> > 
> > Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>
> > 
> > Index: l/drivers/block/loop.c
> > ===
> > --- l.orig/drivers/block/loop.c 2007-11-05 17:50:07.0 -0600
> > +++ l/drivers/block/loop.c  2007-11-05 19:03:51.0 -0600
> > @@ -221,6 +221,7 @@ static int do_lo_send_aops(struct loop_d
> > offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
> > bv_offs = bvec->bv_offset;
> > len = bvec->bv_len;
> > +   file_update_time(file);
> > while (len > 0) {
> > sector_t IV;
> > unsigned size;
> > @@ -299,6 +300,7 @@ static int __do_lo_send_write(struct fil
> >  
> > set_fs(get_ds());
> > bw = file->f_op->write(file, buf, len, &pos);
> > +   file_update_time(file);
> 
> ->write should have already updated the times, no?

Yes, this second case is redundant. Still needed in the first case.

-- 
Mathematics is the supreme nostalgia of our time.

Re: PROBLEM: Celeron Core

2008-01-18 Thread Matt Mackall

On Sat, 2008-01-19 at 05:27 +0100, Andi Kleen wrote:
> > So while throttling may be less efficient in terms of watt seconds used
> > to compile something than running at full speed, it is incorrect to say
> > it uses less power. One machine running for an hour throttled to 50%
> > uses less power (and therefore less battery and cooling) than another
> > running at full speed for that same hour.
> 
> Not for the same unit of work. If you just run endless loops you 
> might be true, but most systems don't do that. 

Yes, most systems idle.

> In terms of laptops (or rather in most other systems too) you usually care 
> about battery life time while the system is mostly idling (waiting
> for your key strokes etc.). In this case enabling throttling
> as a cpufreq driver will not make your battery last longer.

It will relative to not throttling.

You made a claim that is -physically impossible- as stated, a claim I've
seen here before and I'm correcting it. If something reduces heat, it
must save power *by the definition of heat and power*. And if you reduce
power usage, you will make your battery last longer.

Make any other statement you want about the efficiency of throttling per
unit work or the effectiveness of throttling relative to other methods,
just stop repeating the claim that "throttling reduces heat but doesn't
save power". It goes against the law of conservation of energy.
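
The battery argument above comes down to one division. A minimal sketch
in C, assuming a hypothetical 48 Wh battery and made-up power figures
(the function names are illustrative, not from any real API):

```c
/* Hours of runtime for a battery of capacity_wh watt-hours at a steady draw. */
double runtime_hours(double capacity_wh, double average_watts)
{
    return capacity_wh / average_watts;
}

/*
 * Average draw of a system that is busy for busy_fraction of the time
 * (0.0 to 1.0) and idle for the rest. All inputs are assumed values.
 */
double average_watts(double busy_watts, double idle_watts, double busy_fraction)
{
    return busy_fraction * busy_watts + (1.0 - busy_fraction) * idle_watts;
}
```

With these assumed numbers, a machine averaging 12 W runs for 4 hours on
48 Wh. Anything that lowers the average draw, and therefore the heat
produced, makes runtime_hours() come out larger.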

-- 
Mathematics is the supreme nostalgia of our time.

Re: PROBLEM: Celeron Core

2008-01-18 Thread Matt Mackall

On Sat, 2008-01-19 at 02:15 +0100, Andi Kleen wrote:
> On Fri, Jan 18, 2008 at 06:27:57PM -0600, Matt Mackall wrote:
> > 
> > On Fri, 2008-01-18 at 22:11 +0100, Andi Kleen wrote:
> > > Chodorenko Michail <[EMAIL PROTECTED]> writes:
> > > 
> > > > I have a laptop "Extensa 5220", with the processor Celeron based on 
> > > > 'core'
> > > > technology.
> > > > There is ~ / arch/i386/kernel/cpu/cpufreq/p4-clockmod.c in the kernel
> > > > source code
> > > > but there's no line identification of my CPU for apply freqency change
> > > > need to add a ID line 0х16
> > > 
> > > Note that driver will likely do clock throttling on your CPU.
> > > Using that is usually a bad idea because it does not actually
> > > safe power. It's only intended to let the CPU cool down in some 
> > > situations.
> > 
> > Power consumption is more or less exactly equal to heat production
> > (that's where the power goes, after all!), so either clock throttling
> > DOES save power or it DOES NOT cool the CPU.
> 
> No actually the way it works on modern x86 CPUs is that the best
> strategy for saving power is to do things quickly and then
> idle longer. That means on anything that has reasonably
> deep sleep modi e.g. on older server/desktop systems things might
> be slightly different because they had very little power saving
> features enabled, but it's definitely true for all
> laptop systems from the last several years. But even
> on desktop/server throttling tends to be a bad idea.

Dominik is measuring energy expended (watts * seconds) vs work done (CPU
cycles). But your claim above is "clock throttling...does not save power
[but it lets] the CPU cool down", which talks about power (watts) and
heat (also watts, in fact the *very same* watts) and is physically
impossible. A CPU turns power into heat. Less heat out implies less
power in.

So while throttling may be less efficient in terms of watt seconds used
to compile something than running at full speed, it is incorrect to say
it doesn't use less power. One machine running for an hour throttled to
50% uses less power (and therefore less battery and cooling) than
another running at full speed for that same hour.

The first machine may take significantly longer to complete its task (or
it may not, if the task is reading email or watching video), but that's
another matter entirely. And whether it's more or less efficient than
other power-saving approaches is also another matter. Throttling does
save power.
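
The distinction between power (watts) and energy per unit of work (watt
seconds per task) can be made concrete with a little arithmetic. This is
only a sketch in C: the wattages and the "half the work" figure are
hypothetical, and the function names are illustrative.

```c
/* Energy in joules (watt seconds) drawn at a steady power for a given time. */
double energy_joules(double watts, double seconds)
{
    return watts * seconds;
}

/* Energy spent per unit of work completed; lower means more efficient. */
double joules_per_work(double joules, double work_units)
{
    return joules / work_units;
}
```

Assume a CPU draws 30 W at full speed and 18 W when throttled to 50%.
Over the same hour, energy_joules(30.0, 3600.0) is 108000 J and
energy_joules(18.0, 3600.0) is 64800 J, so the throttled machine uses
less power. If it finishes only half the work in that hour, it spends
129600 J per unit of work against 108000 J at full speed, so it is less
efficient per task. Both statements hold at once.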

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Matt Mackall


On Fri, 2008-01-18 at 17:54 -0500, Rik van Riel wrote:
> On Fri, 18 Jan 2008 14:47:33 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> >  - keep it simple. Let's face it, Linux has never ever given those 
> >guarantees before, and it's not is if anybody has really cared. Even 
> >now, the issue seems to be more about paper standards conformance than 
> >anything else.
> 
> There is one issue which is way more than just standards conformance.
> 
> When a program changes file data through mmap(), at some point the
> mtime needs to be update so that backup programs know to back up the
> new version of the file.
> 
> Backup programs not seeing an updated mtime is a really big deal.

And that's fixed with the 4-line approach.
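
For reference, a minimal userspace sketch in C of the situation Rik
describes: one helper modifies a file through a shared mapping, the
other does the mtime comparison a backup program typically relies on.
The helper names are hypothetical, and whether the mtime actually
advances after the mapped write depends on the kernel, which is the
point under discussion.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Overwrite the first byte of an existing, non-empty file through a
 * shared mapping and flush it. Returns 0 on success, -1 on error.
 */
int poke_via_mmap(const char *path, char value)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *map = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    map[0] = value;                  /* the write never goes through write() */
    int rc = msync(map, 1, MS_SYNC); /* push the dirty page to the file */
    munmap(map, 1);
    close(fd);
    return rc;
}

/*
 * The check a backup tool typically makes: is the file's mtime later
 * than the last backup? Returns 1 if so, 0 if not, -1 on error.
 */
int modified_since(const char *path, time_t since)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return st.st_mtime > since;
}
```

If modified_since() still returns 0 for a timestamp taken before
poke_via_mmap() changed the data, an incremental backup would skip the
file even though its contents differ.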

Reminds me, I've got a patch here for addressing that problem with loop mounts:

Writes to loop should update the mtime of the underlying file.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: l/drivers/block/loop.c
===
--- l.orig/drivers/block/loop.c 2007-11-05 17:50:07.0 -0600
+++ l/drivers/block/loop.c  2007-11-05 19:03:51.0 -0600
@@ -221,6 +221,7 @@ static int do_lo_send_aops(struct loop_d
    offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
    bv_offs = bvec->bv_offset;
    len = bvec->bv_len;
+   file_update_time(file);
    while (len > 0) {
        sector_t IV;
        unsigned size;
@@ -299,6 +300,7 @@ static int __do_lo_send_write(struct fil
 
    set_fs(get_ds());
    bw = file->f_op->write(file, buf, len, &pos);
+   file_update_time(file);
    set_fs(old_fs);
    if (likely(bw == len))
        return 0;


-- 
Mathematics is the supreme nostalgia of our time.

Re: PROBLEM: Celeron Core

2008-01-18 Thread Matt Mackall


On Fri, 2008-01-18 at 22:11 +0100, Andi Kleen wrote:
> Chodorenko Michail <[EMAIL PROTECTED]> writes:
> 
> > I have a laptop "Extensa 5220", with the processor Celeron based on 'core'
> > technology.
> > There is ~ / arch/i386/kernel/cpu/cpufreq/p4-clockmod.c in the kernel
> > source code
> > but there's no line identification of my CPU for apply freqency change
> > need to add a ID line 0х16
> 
> Note that driver will likely do clock throttling on your CPU.
> Using that is usually a bad idea because it does not actually
> safe power. It's only intended to let the CPU cool down in some situations.

Power consumption is more or less exactly equal to heat production
(that's where the power goes, after all!), so either clock throttling
DOES save power or it DOES NOT cool the CPU.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling

2008-01-18 Thread Matt Mackall


On Fri, 2008-01-18 at 22:09 +0100, Ingo Molnar wrote:
> * Matt Mackall <[EMAIL PROTECTED]> wrote:
> 
> > > Sounds fine! Don't hesitate to let us know about the lower-hanging 
> > > fruit you're thinking about. Here are the main things I have so far:
> > > 
> > > * Ideas in the existing Linux-Tiny patchset.
> > > * Disable support for non-Intel processors in x86 (cyrix.c,
> > >   centaur.c, transmeta.c, nexgen.c, umc.c in arch/x86/kernel/cpu).
> > >   As far as I remember, I saved 15 KB when I first experimented with
> > >   this).
> > 
> > Isn't that already in -tiny?
> 
> btw., are there any pending arch/x86 bits in -tiny? (stupid question: 
> were can i get the most uptodate version of -tiny from?)

It's not a stupid question. I dropped updating the tree regularly some
time ago to focus on merging bits and then got a bit side-tracked by
this little thing called "version control".

Michael is attempting to get the tree started again and has put a quilt
up here:

http://elinux.org/images/3/3c/Tiny-quilt-2.6.23-0.tar.bz2

-- 
Mathematics is the supreme nostalgia of our time.
