Re: test13-pre5

2001-01-01 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author:    Geert Uytterhoeven <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
> 
> What about defining new types for this? Like e.g. `x8', being `u8' on platforms
> where that's OK, and `u32' on platforms where that's more efficient?
> 

You may just want to look at how C99 handles this using <stdint.h>;
<stdint.h> defines types whose names are built from three parts:

 int, uint      ... signed/unsigned

 <size>         ... exact size
 _least<size>   ... no smaller than
 _fast<size>    ... no smaller than, and efficient

 _t

E.g. uint32_t, int_least64_t, uint_fast8_t (the last could easily be
a 32-bit type, for example.)

In addition, constructor macros for constants are defined, as well as
(u)intmax_t and (u)intptr_t, which are defined as the largest
supported integer type and an integer type large enough to hold a
(void *), respectively.

In other words:

(void *)(uintptr_t)(void *)foo == (void *)foo
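
A small sketch of what those promises look like in code, assuming a hosted
C99 compiler. The function name check_stdint_promises is made up for
illustration; everything else comes from <stdint.h> and <limits.h>.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Checks a few things C99 <stdint.h> promises; foo may be any object pointer. */
static void check_stdint_promises(void *foo)
{
    /* Exact-width types have exactly <size> bits and no padding. */
    assert(sizeof(uint32_t) * CHAR_BIT == 32);

    /* _least and _fast types are only guaranteed to be "no smaller than". */
    assert(UINT_LEAST16_MAX >= 65535);
    assert(UINT_FAST8_MAX >= 255);      /* may well be a 32-bit type */

    /* Constructor macros build constants of the matching type. */
    assert(UINT32_C(4294967295) == UINT32_MAX);

    /* uintmax_t is at least as wide as any other unsigned type. */
    assert(UINTMAX_MAX >= UINT32_MAX);

    /* A (void *) survives the round trip through uintptr_t. */
    assert((void *)(uintptr_t)foo == foo);
}
```

Calling check_stdint_promises(&some_object) should pass silently on any
conforming implementation that provides the optional exact-width and
uintptr_t types.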

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test13-pre5

2000-12-31 Thread Andi Kleen

On Sun, Dec 31, 2000 at 11:15:51AM -0800, Linus Torvalds wrote:
> In article <[EMAIL PROTECTED]>,
> Matti Aarnio  <[EMAIL PROTECTED]> wrote:
> >
> > Actually nothing SMP specific in that problem sphere.
> > Alpha has  load-locked/store-conditional  pair for
> > this type of memory accesses to automatically detect,
> > and (conditionally) restart the operation - to form
> > classical  ``locked-read-modify-write'' operations.
> 
> Sure, we could make the older alphas use ldl_l stl_c for byte accesses,
> but if you thought byte accesses on those machines were kind-of slow
> before, just WAIT until that happens.

The older Alphas would just typedef x8/x16 (or granular_u8, granular_u16
or whatever it is called) to u32 and be the same as today. Just most
other boxes would benefit.

This actually all assumes that gcc really uses the byte instructions
for byte stores in structures, which is to be determined.

-Andi



Re: test13-pre5

2000-12-31 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Matti Aarnio  <[EMAIL PROTECTED]> wrote:
>
>   Actually nothing SMP specific in that problem sphere.
>   Alpha has  load-locked/store-conditional  pair for
>   this type of memory accesses to automatically detect,
>   and (conditionally) restart the operation - to form
>   classical  ``locked-read-modify-write'' operations.

Sure, we could make the older alphas use ldl_l stl_c for byte accesses,
but if you thought byte accesses on those machines were kind-of slow
before, just WAIT until that happens.

Old alpha machines (the same ones that would need this code) were
HORRIBLE at ldl_l<->stl_c: they go out all the way to the bus to set the
lock.  So suddenly your every byte access ends up being a few hundred
cycles!

So ldl_l/stl_c is not the answer.  It would work, but it would be so
slow that you'd be a lot better off not doing it. 

I think they fixed ldl/stc later on (so that it only sets a bit locally
that gets cleared by the cache coherency protocol), but as later alphas
have the byte accesses anyway that doesn't matter here. The faster
ldl/stc makes for much faster spinlocks on newer alphas, though.
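
For readers who have not met load-locked/store-conditional: the idea maps
onto a compare-and-swap retry loop. The sketch below uses C11 <stdatomic.h>,
which did not exist when this thread was written, and atomic_store_byte is a
made-up name. It is not what gcc or the kernel emits on Alpha; it only shows
the shape of the operation whose cost is being discussed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Replace byte `index` (0 = least significant) of *word with `value`
 * without disturbing the other three bytes, even if another context
 * updates them at the same time.
 */
static void atomic_store_byte(_Atomic uint32_t *word, unsigned index,
                              uint8_t value)
{
    assert(index < 4);

    unsigned shift = index * 8u;
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = atomic_load(word);          /* "load-locked"        */
    uint32_t desired;

    do {
        desired = (old & ~mask) | ((uint32_t)value << shift);
        /* On failure `old` is refreshed with the current contents,
         * so the loop retries against up-to-date neighbours. */
    } while (!atomic_compare_exchange_weak(word, &old, desired));
}
```

Every byte store becomes a load, a merge and a conditional store that may
have to retry, which is why doing this for ordinary byte accesses is so
unattractive on hardware where the locked pair is slow.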

Linus



Re: test13-pre5

2000-12-31 Thread Andrea Arcangeli

On Sun, Dec 31, 2000 at 06:36:50PM +0100, Andi Kleen wrote:
> AFAIK alpha has byte instructions now.

See other post. Only from ev6 (at least as far as gcc is concerned). I have a
userspace testcase here (it was originally an obscure alpha userspace MM
corruption bug report that I sorted out some time ago) that works only
when compiled for ev6 because it needs `short' granularity (not even byte
granularity).

Andrea



Re: test13-pre5

2000-12-31 Thread Andrea Arcangeli

On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
> The alpha systems I remember this problem on were all [..]

Yes, the granularity issue has nothing to do with SMP (with a preemptive kernel
it can trigger even without any interrupts involved in the code). Also,
CONFIG_SPACE_EFFICIENT doesn't look necessary.

The x8 name is confusing IMHO (when I read `8' I expect 8 bits only, the x
isn't explicit enough). But by using a better name we could save some bytes on
alpha ev6 and x86. Something like granular_char/granular_short/granular_int
looks nicer.  For the generic 64bit cpu they need to be _unconditionally_
defined to `long'.
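
A rough sketch of how such types might be declared. ARCH_HAS_BYTE_GRANULARITY
is a hypothetical per-architecture switch invented for this example (it is
not an existing kernel symbol), and struct page_sketch and store_age are
illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch: 1 where aligned 8/16-bit stores leave the
 * neighbouring bytes alone (e.g. x86, alpha ev6), 0 otherwise. */
#ifndef ARCH_HAS_BYTE_GRANULARITY
#define ARCH_HAS_BYTE_GRANULARITY 1
#endif

#if ARCH_HAS_BYTE_GRANULARITY
typedef uint8_t  granular_char;
typedef uint16_t granular_short;
typedef uint32_t granular_int;
#else
/* Generic fallback: every field gets its own `long'. */
typedef unsigned long granular_char;
typedef unsigned long granular_short;
typedef unsigned long granular_int;
#endif

/* Two small fields that may be updated from different contexts. */
struct page_sketch {
    granular_char age;
    granular_char flags;
};

static void store_age(struct page_sketch *p, unsigned age)
{
    /* Callers must stay inside the narrowest range on every platform. */
    assert(age <= 255);
    p->age = (granular_char)age;
}
```

With the switch set to 1 the structure is two bytes; with it set to 0 each
field grows to a full `long' and a store to one can no longer clobber the
other.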

BTW, only old chips (ev[45]) don't provide byte granularity. In fact a linux
kernel compiled for ev6 can handle byte granularity also on alpha (it uses
-mcpu=ev6).

alpha reference manual 5.2.2:

[..] For each region, an implementation must support aligned quadword
access and may optionally support aligned longword access or byte
access. If byte access is supported in a region, aligned word access
and aligned longword access are also supported. [..]

21264hrm:

[..] The 21264-generated external references to memory space are
always of a fixed 64-byte size, though the internal access granularity
is byte, word, longword, or quad-word. [..]

Andrea



Re: test13-pre5

2000-12-31 Thread Matti Aarnio

On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
> On Sun, 31 Dec 2000, Andi Kleen wrote:
> > 
> > Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> > embedded systems, where you could trade a bit of CPU for less memory overhead 
> > even on systems where u8 is slow and atomicity doesn't come into play
> > because it's UP anyways. 
> 
> UP has nothing to do with it.
> The alpha systems I remember this problem on were all SMP.

Actually nothing SMP specific in that problem sphere.
Alpha has  load-locked/store-conditional  pair for
this type of memory accesses to automatically detect,
and (conditionally) restart the operation - to form
classical  ``locked-read-modify-write'' operations.

In what situations the compiler will use those instructions,
that I don't know.   Volatiles, at the very least, use them.
Will closely packed bytes be processed with them without
being volatiles ?  How about bitfields ?

Newer Alphas have byte/short load/store instructions,
so things really aren't that straight-forward...


> I don't think it's a good idea.
> 
>   Linus

/Matti Aarnio



Re: test13-pre5

2000-12-31 Thread Andi Kleen

On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 31 Dec 2000, Andi Kleen wrote:
> > 
> > Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> > embedded systems, where you could trade a bit of CPU for less memory overhead 
> > even on systems where u8 is slow and atomicity doesn't come into play
> > because it's UP anyways. 
> 
> UP has nothing to do with it.
> 
> The alpha systems I remember this problem on were all SMP.
[...]

I just checked all architecture manuals I could lay my hands on
(sparcv9, ppc32, mips r4400, parisc 1.1, alpha; sh is somewhere in
storage but as I remember it has it too) 
and they all seem to have at least store-byte and mostly store-halfword
instructions. 

> 
> Imagine an architecture where you need to do a
> 
>   load_32() 
>   mask-and-insert-byte
>   store_32()

iirc the Alpha guys found out that they couldn't drive half of the
available devices without byte store, and since then nobody has 
repeated that mistake @)


> 
> I don't think it's a good idea.

I don't see it. Just define x8 to u32 on old alpha and let most other architectures
be happy. 


-Andi




Re: test13-pre5

2000-12-31 Thread Andi Kleen

On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 31 Dec 2000, Andi Kleen wrote:
> > 
> > Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> > embedded systems, where you could trade a bit of CPU for less memory overhead 
> > even on systems where u8 is slow and atomicity doesn't come into play
> > because it's UP anyways. 
> 
> UP has nothing to do with it.
> 
> The alpha systems I remember this problem on were all SMP.

AFAIK alpha has byte instructions now.


-Andi



Re: test13-pre5

2000-12-31 Thread Linus Torvalds



On Sun, 31 Dec 2000, Andi Kleen wrote:
> 
> Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> embedded systems, where you could trade a bit of CPU for less memory overhead 
> even on systems where u8 is slow and atomicity doesn't come into play
> because it's UP anyways. 

UP has nothing to do with it.

The alpha systems I remember this problem on were all SMP.

Imagine an architecture where you need to do a

load_32()
mask-and-insert-byte
store_32()

and imagine that an interrupt comes in:

load_32()
mask-and-insert-byte

* INTERRUPT *

load_32()
mask-and-insert-ANOTHER-byte
store_32()

interrupt return

store_32()

and notice how the value written by the interrupt is gone, gone, gone,
even though it was to a completely different byte.

Now, imagine that the first byte is the "age", and imagine that the thing
the interrupt tries to update is "flags".

Yes, you're screwed.
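
The interleaving above can be replayed deterministically in plain C. This is
only an illustration: the layout (byte 0 as "age", byte 1 as "flags") and the
names insert_byte and replay_lost_update are assumptions made for the
example, and the "interrupt" is simply inlined at the point where it would
hit.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t word;   /* byte 0 = "age", byte 1 = "flags" (assumed layout) */

/* mask-and-insert-byte on a private copy of the 32-bit word */
static uint32_t insert_byte(uint32_t w, unsigned index, uint8_t value)
{
    assert(index < 4);
    unsigned shift = index * 8u;
    uint32_t mask = (uint32_t)0xff << shift;
    return (w & ~mask) | ((uint32_t)value << shift);
}

/* Replays the sequence from the message and returns the final word. */
static uint32_t replay_lost_update(void)
{
    uint32_t main_copy, irq_copy;

    word = 0;

    main_copy = word;                              /* load_32()            */
    main_copy = insert_byte(main_copy, 0, 0x11);   /* insert "age"         */

    /* * INTERRUPT * */
    irq_copy = word;                               /* load_32()            */
    irq_copy = insert_byte(irq_copy, 1, 0x22);     /* insert "flags"       */
    word = irq_copy;                               /* store_32()           */
    /* interrupt return */

    word = main_copy;                              /* store_32(): stale    */
    return word;
}
```

The function returns 0x00000011: the "age" byte made it, but the "flags"
byte written by the interrupt has been overwritten by the stale copy.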

I don't think it's a good idea.

Linus




Re: test13-pre5

2000-12-31 Thread Andi Kleen

On Sat, Dec 30, 2000 at 02:24:06PM +0100, Geert Uytterhoeven wrote:
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
> > On Thu, 28 Dec 2000, Andi Kleen wrote:
> > > - Instead of having a zone pointer mask use a 8 or 16 byte index into a 
> > > zone table. On a modern CPU it is much cheaper to do the and/shifts than
> > > to do even a single cache miss during page aging. On a lot of systems 
> > > that zone index could be hardcoded to 0 anyways, giving better code.
> > > - Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
> > > has the same swapping algorithm even only uses 8bit) 
> > 
> > This would be good, but can be hard.
> > 
> > FreeBSD doesn't try to be portable any more, but Linux does, and there are
> > architectures where 8- and 16-bit accesses aren't atomic but have to be
> > done with read-modify-write cycles.
> > 
> > And even for fields like "age", where we don't care whether the age itself
> > is 100% accurate, we _do_ care that the fields close-by don't get strange
> > effects from updating "age". We used to have exactly this problem on alpha
> > back in the 2.1.x timeframe.
> > 
> > This is why a lot of fields are 32-bit, even though we wouldn't need more
> > than 8 or 16 bits of them.
> 
> What about defining new types for this? Like e.g. `x8', being `u8' on platforms
> where that's OK, and `u32' on platforms where that's more efficient?

Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
embedded systems, where you could trade a bit of CPU for less memory overhead 
even on systems where u8 is slow and atomicity doesn't come into play
because it's UP anyways. 

Only problem I see is that when the programmer was wrong about the possible
range (which sometimes happens) then it could mysteriously work on some
machines and fail on others. This is already the case e.g. with atomic_t,
which is shorter than 32bit e.g. on sparc32, so it is probably not too
big a problem. 
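
To make the range concern concrete, here is a small sketch of the same
counter code behaving differently depending on what x8 resolves to. The
names x8_narrow, x8_wide, age_narrow and age_wide are invented for the
example; a real x8 would be a single typedef chosen per architecture.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  x8_narrow;   /* x8 where byte stores are cheap and safe */
typedef uint32_t x8_wide;     /* x8 where it is widened to 32 bits       */

/* Advance an age counter `ticks` times when x8 is really 8 bits. */
static unsigned age_narrow(unsigned start, unsigned ticks)
{
    assert(start <= UINT8_MAX);
    x8_narrow age = (x8_narrow)start;
    while (ticks--)
        age++;                /* wraps modulo 256 */
    return age;
}

/* The same loop when x8 is a 32-bit type. */
static unsigned age_wide(unsigned start, unsigned ticks)
{
    assert(start <= UINT8_MAX);
    x8_wide age = (x8_wide)start;
    while (ticks--)
        age++;                /* keeps counting past 255 */
    return age;
}
```

Starting at 250 and ticking 10 times gives 4 with the narrow type and 260
with the wide one, so code that silently relies on values above 255 would
work on one class of machine and fail on the other.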


-Andi

/* asm/types.h for a random 32bit machine with no byte access */ 
#if defined(CONFIG_SPACE_EFFICIENT) && !defined(CONFIG_SMP)
typedef __u8 x8; 
typedef __u16 x16; 
typedef __u32 x32; 
#else
typedef __u32 x8; 
typedef __u32 x16; 
typedef __u32 x32; 
#endif

-Andi




Re: test13-pre5

2000-12-31 Thread Geert Uytterhoeven

On Thu, 28 Dec 2000, Linus Torvalds wrote:
> On Thu, 28 Dec 2000, Andi Kleen wrote:
> > - Instead of having a zone pointer mask use a 8 or 16 byte index into a 
> > zone table. On a modern CPU it is much cheaper to do the and/shifts than
> > to do even a single cache miss during page aging. On a lot of systems 
> > that zone index could be hardcoded to 0 anyways, giving better code.
> > - Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
> > has the same swapping algorithm even only uses 8bit) 
> 
> This would be good, but can be hard.
> 
> FreeBSD doesn't try to be portable any more, but Linux does, and there are
> architectures where 8- and 16-bit accesses aren't atomic but have to be
> done with read-modify-write cycles.
> 
> And even for fields like "age", where we don't care whether the age itself
> is 100% accurate, we _do_ care that the fields close-by don't get strange
> effects from updating "age". We used to have exactly this problem on alpha
> back in the 2.1.x timeframe.
> 
> This is why a lot of fields are 32-bit, even though we wouldn't need more
> than 8 or 16 bits of them.

What about defining new types for this? Like e.g. `x8', being `u8' on platforms
where that's OK, and `u32' on platforms where that's more efficient?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds




Re: test13-pre5 + char-major-145??

2000-12-29 Thread Keith Owens

On Fri, 29 Dec 2000 14:07:57 -0700, 
Frank Jacobberger <[EMAIL PROTECTED]> wrote:
>modprobe: Can't locate module char-major-145
>
>From /usr/src/linux/Documentation/devices.txt
>
>10 char        Non-serial mice, misc features
>                145 = /dev/hfmodem  Soundcard shortwave modem control {2.6}

That is major 10, minor 145.  Search /145 *char/ to find char-major-145.




Re: test13-pre5

2000-12-29 Thread Linus Torvalds



On Fri, 29 Dec 2000, David S. Miller wrote:
> 
>  For my development testing, I'm running a _heavily_ hacked
>kernel.  One of these hacks is to pull the wait_queue_head out of
>struct page; the waitq-heads are in a separate allocated area of
>memory, with a waitq-head pointer embedded in the page structure
>(allocated/initialised in free_area_init_core()).  This gives a
>page structure of 60bytes, giving me one free double-word to play
>with (which I'm using as a pointer to a release function).
> 
> Not something like those damn Solaris turnstiles, no please

If you want to have a release function, please just use "page->mapping",
which gives you much more, including memory pressure indicators etc. Now
_that_ can be useful for doing things like slab caches.

Linus




Re: test13-pre5

2000-12-29 Thread David S. Miller

   Date: Fri, 29 Dec 2000 15:46:22 +0000 (GMT)
   From: Mark Hemment <[EMAIL PROTECTED]>

 For my development testing, I'm running a _heavily_ hacked
   kernel.  One of these hacks is to pull the wait_queue_head out of
   struct page; the waitq-heads are in a separate allocated area of
   memory, with a waitq-head pointer embedded in the page structure
   (allocated/initialised in free_area_init_core()).  This gives a
   page structure of 60bytes, giving me one free double-word to play
   with (which I'm using as a pointer to a release function).

Not something like those damn Solaris turnstiles, no please

Later,
David S. Miller
[EMAIL PROTECTED]



Re: test13-pre5 + char-major-145??

2000-12-29 Thread Bernd Eckenfels

In article <[EMAIL PROTECTED]> you wrote:
> What may be calling this? Any advice where to go ferreting?

Somebody may try to open the device file.

Greetings
Bernd



Re: test13-pre5

2000-12-29 Thread Tom Rini

On Thu, Dec 28, 2000 at 12:25:23PM -0800, Linus Torvalds wrote:

>  - pre5:
>- NIIBE Yutaka: SuperH update
>- Geert Uytterhoeven: m68k update
>- David Miller: TCP RTO calc fix, UDP multicast fix etc
>- Duncan Laurie: ServerWorks PIRQ routing definition.
>- mm PageDirty cleanups, added sanity checks, and don't lose the bit. 

I just noticed this (playing with some other stuff), but ext2 as a module
is currently broken:
$ make INSTALL_MOD_PATH=/tmp/foo modules_install
...
if [ -r System.map ]; then /sbin/depmod -ae -F System.map -b /tmp/foo -r 2.4.0-test12; fi
depmod: *** Unresolved symbols in /tmp/foo/lib/modules/2.4.0-test12/kernel/fs/ext2/ext2.o
depmod: buffer_insert_inode_queue
depmod: fsync_inode_buffers

I tried the following locally and it fixes it.
--
--- fs/Makefile.orig  Fri Dec 29 10:35:50 2000
+++ fs/Makefile  Fri Dec 29 10:36:06 2000
@@ -7,7 +7,7 @@
 
 O_TARGET := fs.o
 
-export-objs := filesystems.o
+export-objs := filesystems.o buffer.o
 mod-subdirs := nls
 
 obj-y :=   open.o read_write.o devices.o file_table.o buffer.o \
--- fs/buffer.c.orig  Fri Dec 29 10:33:21 2000
+++ fs/buffer.c  Fri Dec 29 10:35:46 2000
@@ -29,6 +29,7 @@
 /* async buffer flushing, 1999 Andrea Arcangeli <[EMAIL PROTECTED]> */
 
 #include <linux/config.h>
+#include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/fs.h>
 #include <linux/malloc.h>
@@ -579,6 +580,8 @@
         spin_unlock(&lru_list_lock);
 }
 
+EXPORT_SYMBOL(buffer_insert_inode_queue);
+
 /* The caller must have the lru_list lock before calling the remove_inode_queue functions.  */
 static void __remove_inode_queue(struct buffer_head *bh)
@@ -900,6 +903,7 @@
         return err2;
 }
 
+EXPORT_SYMBOL(fsync_inode_buffers);
 
 /*
  * osync is designed to support O_SYNC io.  It waits synchronously for
--

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/




Re: test13-pre5

2000-12-29 Thread Mark Hemment

On Fri, 29 Dec 2000, Tim Wright wrote:
> Yes, this is a very important point if we ever want to make serious use
> of large memory machines on ia32. We ran into this with DYNIX/ptx when the
> P6 added 36-bit physical addressing. Conserving KVA (kernel virtual address
> space), became a very high priority. Eventually, we had to add code to play
> silly segment games and "magically" materialize and dematerialize a 4GB
> kernel virtual address space instead of the 1GB. This only comes into play
> with really large amounts of memory, and is almost certainly not worth the
> agony of implementation on Linux, but we'll need to be careful elsewhere to
> conserve it as much as possible.

  Indeed.  I'm compiling my kernels with 2GB virtual.  Not as I want more
NORMAL pages in the page cache (HIGH memory is fine), but as I need
NORMAL pages for kernel data/structures (memory allocated from  
slab-caches) which need to be constantly mapped in.

Mark





Re: test13-pre5

2000-12-29 Thread Tim Wright

On Fri, Dec 29, 2000 at 03:46:22PM +0000, Mark Hemment wrote:
>   Note, for those of us running on 32bit with lots of physical memory, the
> available virtual address-space is of major consideration.  Reducing the
> size of the page structure is more than just reducing cache misses - it
> gives us more virtual to play with...
> 
> Mark
> 

Yes, this is a very important point if we ever want to make serious use
of large memory machines on ia32. We ran into this with DYNIX/ptx when the
P6 added 36-bit physical addressing. Conserving KVA (kernel virtual address
space), became a very high priority. Eventually, we had to add code to play
silly segment games and "magically" materialize and dematerialize a 4GB
kernel virtual address space instead of the 1GB. This only comes into play
with really large amounts of memory, and is almost certainly not worth the
agony of implementation on Linux, but we'll need to be careful elsewhere to
conserve it as much as possible.

Regards,

Tim


-- 
Tim Wright - [EMAIL PROTECTED] or [EMAIL PROTECTED] or [EMAIL PROTECTED]
IBM Linux Technology Center, Beaverton, Oregon
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI



Re: test13-pre5

2000-12-29 Thread Mark Hemment

Hi,

On Thu, 28 Dec 2000, David S. Miller wrote:
>Date:  Thu, 28 Dec 2000 23:17:22 +0100
>From: Andi Kleen <[EMAIL PROTECTED]>
> 
>Would you consider patches for any of these points? 
> 
> To me it seems just as important to make sure struct page is
> a power of 2 in size, with the waitq debugging turned off this
> is true for both 32-bit and 64-bit hosts last time I checked.

  Checking test11 (which I'm running here), even with waitq debugging
turned off, on 32-bit (IA32) the struct page is 68 bytes (since
the "age" member was re-introduced a while back).

  For my development testing, I'm running a _heavily_ hacked kernel.  One
of these hacks is to pull the wait_queue_head out of struct page; the
waitq-heads are in a separately allocated area of memory, with a waitq-head
pointer embedded in the page structure (allocated/initialised in
free_area_init_core()).  This gives a page structure of 60 bytes, giving me
one free double-word to play with (which I'm using as a pointer to a
release function).

  In fact, there doesn't need to be a waitq-head allocated for each page
structure - they can share, at the cost of a performance overhead on a false
wakeup in __wait_on_page().

  Note, for those of us running on 32-bit with lots of physical memory, the
available virtual address-space is of major consideration.  Reducing the
size of the page structure is more than just reducing cache misses - it
gives us more virtual to play with...

Mark
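
The idea of letting several pages share one waitq-head can be illustrated in plain userspace C. This is only a sketch under stated assumptions, not code from the hacked kernel described above: the table size, the `sketch_` names and the choice of hashing by page index are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's real definitions. */
struct sketch_waitq_head { int nr_waiters; };
struct sketch_page { unsigned long flags; };

#define SKETCH_WAITQ_BITS 4
#define SKETCH_WAITQ_SIZE (1u << SKETCH_WAITQ_BITS)

/* One small table of heads shared by every page. */
static struct sketch_waitq_head sketch_waitq_table[SKETCH_WAITQ_SIZE];

/* Pick the shared head for a page from its index in the page array. */
static struct sketch_waitq_head *
sketch_page_waitq(const struct sketch_page *base, const struct sketch_page *page)
{
    size_t idx;

    assert(page >= base);               /* page must belong to the array */
    idx = (size_t)(page - base);
    return &sketch_waitq_table[idx & (SKETCH_WAITQ_SIZE - 1)];
}
```

With a 16-entry table, pages 3 and 19 map to the same head. A wakeup meant for one of them also wakes anyone waiting on the other, so a waiter has to re-check its own page after waking; that re-check is the false-wakeup overhead mentioned above.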




Re: test13-pre5

2000-12-29 Thread Stefan Traby

On Fri, Dec 29, 2000 at 01:06:30AM +, Albert Cranford wrote:
> Simply executing
>  *p++ = htonl(fl->fl_pid);
> before 
>  start = loff_t_to_s64(fl->fl_start);
> also works.

Yes, confirmed.
Since you're located in Florida I vote for this and I hope that
Linus will elect it. :)


--- linux/fs/lockd/xdr4.c.orig  Fri Dec 29 01:35:32 2000
+++ linux/fs/lockd/xdr4.c   Fri Dec 29 14:56:07 2000
@@ -167,13 +167,13 @@
 || (fl->fl_end > NLM4_OFFSET_MAX && fl->fl_end != OFFSET_MAX))
return NULL;
 
+   *p++ = htonl(fl->fl_pid);
start = loff_t_to_s64(fl->fl_start);
if (fl->fl_end == OFFSET_MAX)
len = 0;
else
len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1);
 
-   *p++ = htonl(fl->fl_pid);
p = xdr_encode_hyper(p, start);
p = xdr_encode_hyper(p, len);

-- 

  ciao - 
Stefan

"export PS1="rms# "  "

Stefan Traby                Linux/ia32               fax:  +43-3133-6107-9
Mitterlasznitzstr. 13       Linux/alpha            phone:  +43-3133-6107-2
8302 Nestelbach             Linux/sparc       http://www.hello-penguin.com
Austria                                           mailto:[EMAIL PROTECTED]
Europe                                            mailto:[EMAIL PROTECTED]



Re: test13-pre5

2000-12-29 Thread Daniel Phillips

Marcelo Tosatti wrote:
> 
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
> 
> >  - make "SetPageDirty()" do something like
> >
> >   if (!test_and_set(PG_dirty, &page->flags)) {
> >           spin_lock(&page_cache_lock);
> >           list_del(page->list);
> >           list_add(page->list, page->mapping->dirty_pages);
> >           spin_unlock(&page_cache_lock);
> >   }
> 
> We also want to move the page to the per-address-space clean list in
> ClearPageDirty I suppose.

I'd like to suggest taking this opportunity to regularize the notation
by going to set_page_dirty/clear_page_dirty which will call
SetPageDirty/ClearPageDirty.

--
Daniel



Re: test13-pre5 + char-major-145??

2000-12-29 Thread Keith Owens

On Fri, 29 Dec 2000 14:07:57 -0700, 
Frank Jacobberger <[EMAIL PROTECTED]> wrote:
>modprobe: Can't locate module char-major-145

From /usr/src/linux/Documentation/devices.txt

 10 char        Non-serial mice, misc features
                145 = /dev/hfmodem      Soundcard shortwave modem control {2.6}

That is major 10, minor 145.  Search /145 *char/ to find char-major-145.




Re: test13-pre5

2000-12-28 Thread Albert Cranford

Simply executing
 *p++ = htonl(fl->fl_pid);
before 
 start = loff_t_to_s64(fl->fl_start);
also works.
Later,
Albert

Linus Torvalds wrote:
> 
> On Fri, 29 Dec 2000, Stefan Traby wrote:
> > On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
> >
> > > Too bad. Maybe somebody should tell gcc maintainers about programmers that
> > > know more than the compiler again.
> >
> > I know that {p,}gcc-2.95.2{,.1} are not officially supported.
> 
> Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
> comfortable with it, it does count as "supported" even if it has known
> problems with "long long". pgcc isn't.
> 
> > Did you know that it's impossible to compile nfsv4 because of
> > register allocation problems with long long since (long long) month ?
> 
> lockd v4 (for NFS v3), I assume.
> 
> No, I wasn't aware of this particular bug.
> 
> > The following does not hurt, it's just a fix for a broken
> > compiler:
> 
> Ugh, that's ugly.
> 
> Can you test if it is sufficient to just simplify the math a bit, instead
> of uglifying that function more? The nlm4_encode_lock() function already
> tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
> be ok to just re-code the function to not do the extra "loff_t_to_s64()"
> stuff, and simplify it enough that the compiler will be happy to compile
> the simpler function. Something along the lines of
> 
> if (.. NLM4_OFFSET_MAX tests ..)
> ..
> 
> *p++ = htonl(fl->fl_pid);
> 
> start = fl->fl_start;
> len = fl->fl_end - start;
> if (fl->fl_end == OFFSET_MAX)
> len = 0;
> 
> p = xdr_encode_hyper(p, start);
> p = xdr_encode_hyper(p, len);
> 
> return p;
> 
> Where it tries to minimize the liveness of the 64-bit values, and tries to
> avoid extra complications.
> 
> Linus
> 

-- 
Albert Cranford Deerfield Beach FL USA
[EMAIL PROTECTED]



Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Fri, 29 Dec 2000, Stefan Traby wrote:
> On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
> 
> > Too bad. Maybe somebody should tell gcc maintainers about programmers that
> > know more than the compiler again.
> 
> I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
comfortable with it, it does count as "supported" even if it has known
problems with "long long". pgcc isn't.

> Did you know that it's impossible to compile nfsv4 because of
> register allocation problems with long long since (long long) month ?

lockd v4 (for NFS v3), I assume. 

No, I wasn't aware of this particular bug. 

> The following does not hurt, it's just a fix for a broken
> compiler:

Ugh, that's ugly.

Can you test if it is sufficient to just simplify the math a bit, instead
of uglifying that function more? The nlm4_encode_lock() function already
tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
be ok to just re-code the function to not do the extra "loff_t_to_s64()"
stuff, and simplify it enough that the compiler will be happy to compile
the simpler function. Something along the lines of

        if (.. NLM4_OFFSET_MAX tests ..)
                ..

        *p++ = htonl(fl->fl_pid);

        start = fl->fl_start;
        len = fl->fl_end - start;
        if (fl->fl_end == OFFSET_MAX)
                len = 0;

        p = xdr_encode_hyper(p, start);
        p = xdr_encode_hyper(p, len);

        return p;

Where it tries to minimize the liveness of the 64-bit values, and tries to
avoid extra complications.

Linus
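
For readers who want to see the resulting byte layout, here is a userspace sketch of the encoding order suggested above: pid first, then the two 64-bit values as big-endian hypers. It is an illustration only, not the lockd code. `SKETCH_OFFSET_MAX`, `put_be32`, `put_be64` and `encode_lock_sketch` are hypothetical names, and the length follows the `end - start + 1` form used in the existing xdr4.c code quoted elsewhere in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_OFFSET_MAX INT64_MAX   /* hypothetical stand-in for OFFSET_MAX */

/* Store one 32-bit word in network (big-endian) byte order. */
static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* A 64-bit value goes out as two big-endian 32-bit words, high word first. */
static unsigned char *put_be64(unsigned char *p, uint64_t v)
{
    p = put_be32(p, (uint32_t)(v >> 32));
    return put_be32(p, (uint32_t)v);
}

/* Write pid, start and len; a lock running to SKETCH_OFFSET_MAX has len 0. */
static unsigned char *
encode_lock_sketch(unsigned char *p, uint32_t pid, int64_t start, int64_t end)
{
    int64_t len;

    assert(start >= 0 && end >= start);   /* range checks happen before encoding */
    p = put_be32(p, pid);                 /* 32-bit value written first */
    len = (end == SKETCH_OFFSET_MAX) ? 0 : end - start + 1;
    p = put_be64(p, (uint64_t)start);
    return put_be64(p, (uint64_t)len);
}
```

The function consumes 20 bytes of output: 4 for the pid and 8 each for start and len. Writing the pid before any 64-bit value is computed matches the ordering that was reported in the thread to avoid the compiler problem; the sketch itself makes no claim about register allocation.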




Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
> 
> We also want to move the page to the per-address-space clean list in
> ClearPageDirty I suppose.

I would actually advise against this.

 - it's ok to have too many pages on the dirty list (think of the dirty
   list as a "these pages _can_ be dirty")

 - whenever we do a ClearPageDirty() we're likely to remove the page from
   the lists altogether, so it's not worth it doing extra work.

The exception, of course, is the actual "filemap_fdatasync()" function,
but that one would probably look something like

        spin_lock(&page_cache_lock);
        while (!list_empty(&mapping->dirty_pages)) {
                struct page *page = list_entry(mapping->dirty_pages.next, struct page, list);

                list_del(&page->list);
                list_add(&page->list, &mapping->clean_pages);

                if (!PageDirty(page))
                        continue;
                page_get(page);
                spin_unlock(&page_cache_lock);

                lock_page(page);
                if (PageDirty(page)) {
                        ClearPageDirty(page);
                        page->mapping->writepage(page);
                }
                UnlockPage(page);
                page_cache_put(page);
                spin_lock(&page_cache_lock);
        }
        spin_unlock(&page_cache_lock);

and again note how we can move it to the clean list early and we don't
have to keep the PageDirty bit 100% in sync with which list it is on. If
somebody marks it dirty later on (and the dirty bit is still set), that
somebody won't move it back to the dirty list (because it noticed that
the dirty bit is already set), but that's ok: as long as we do the
"ClearPageDirty(page);" call before starting the actual writeout(), we're
fine.

So the "mapping->dirty_pages" list is maybe not so much a _dirty_ list, as
a "scheduled for writeout" list. Marking the page clean doesn't remove it
from that list - it can happily stay on the list and then when the
writeout is started we'd just skip it.

Ok?

Linus
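
The "scheduled for writeout" reading of the dirty list can be modelled in a few lines of userspace C. This is a sketch under simplifying assumptions, not kernel code: there is no locking, list membership is reduced to an enum, and every `sketch_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_list { SKETCH_ON_CLEAN, SKETCH_ON_DIRTY };

struct sketch_page {
    bool dirty;             /* models the PG_dirty bit */
    enum sketch_list list;  /* models which per-mapping list the page is on */
    int writes;             /* how often the page was written out */
};

/* Queue the page for writeout only on the clean -> dirty transition. */
static void sketch_set_dirty(struct sketch_page *page)
{
    assert(page != NULL);
    if (!page->dirty) {
        page->dirty = true;
        page->list = SKETCH_ON_DIRTY;
    }
}

/* Clearing the bit deliberately leaves the page on whatever list it is on. */
static void sketch_clear_dirty(struct sketch_page *page)
{
    page->dirty = false;
}

/* Walk everything queued for writeout; return how many pages were written. */
static int sketch_fdatasync(struct sketch_page *pages, size_t n)
{
    int written = 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_page *page = &pages[i];

        if (page->list != SKETCH_ON_DIRTY)
            continue;
        page->list = SKETCH_ON_CLEAN;  /* moved early, before testing the bit */
        if (!page->dirty)
            continue;                  /* stale entry: cleaned after queueing */
        page->dirty = false;           /* clear before the write starts */
        page->writes++;
        written++;
    }
    return written;
}
```

A page that is dirtied and then cleaned stays queued, and the writeout pass simply skips it. That is the cheap case the message argues for: the list may hold too many pages, but no extra list work is done when the bit is cleared.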




Re: test13-pre5

2000-12-28 Thread Stefan Traby

On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:

> Too bad. Maybe somebody should tell gcc maintainers about programmers that
> know more than the compiler again.

I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Did you know that it's impossible to compile nfsv4 because of
register allocation problems with long long since (long long) month ?

The following does not hurt, it's just a fix for a broken
compiler:

--- linux/fs/lockd/xdr4.c.orig  Fri Dec 29 01:35:32 2000
+++ linux/fs/lockd/xdr4.c   Fri Dec 29 01:36:36 2000
@@ -156,7 +156,7 @@
 nlm4_encode_lock(u32 *p, struct nlm_lock *lock)
 {
        struct file_lock        *fl = &lock->fl;
-       __s64                   start, len;
+       volatile __s64          start, len;
 
        if (!(p = xdr_encode_string(p, lock->caller))
         || !(p = nlm4_encode_fh(p, &lock->fh))


Here is an example without this patch (pgcc-2.95.2.1 this time which
is bug-compatible to gcc-2.95.2.1).

gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -O2 
-fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 
-march=i686 -DMODULE   -c -o xdr4.o xdr4.c
xdr4.c: In function `nlm4_encode_lock':
xdr4.c:181: internal error--insn does not satisfy its constraints:
(insn/i 313 585 315 (set (reg:SI 1 %edx)
(subreg:SI (lshiftrt:DI (reg:DI 0 %eax)
(const_int 32 [0x20])) 0)) 323 {lshrdi3_const_int_subreg} (nil)
(nil))
gcc: Internal compiler error: program cpp got fatal signal 13
make[2]: *** [xdr4.o] Error 1
make[2]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs/lockd'
make[1]: *** [_modsubdir_lockd] Error 2
make[1]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs'
make: *** [_mod_fs] Error 2

The question is: is it worth applying?

-- 

  ciao - 
Stefan

"export PS1="rms# "  "

Stefan Traby                Linux/ia32               fax:  +43-3133-6107-9
Mitterlasznitzstr. 13       Linux/alpha            phone:  +43-3133-6107-2
8302 Nestelbach             Linux/sparc       http://www.hello-penguin.com
Austria                                           mailto:[EMAIL PROTECTED]
Europe                                            mailto:[EMAIL PROTECTED]



Re: test13-pre5

2000-12-28 Thread Marcelo Tosatti



On Thu, 28 Dec 2000, Linus Torvalds wrote:

>  - make "SetPageDirty()" do something like
> 
>       if (!test_and_set(PG_dirty, &page->flags)) {
>               spin_lock(&page_cache_lock);
>               list_del(page->list);
>               list_add(page->list, page->mapping->dirty_pages);
>               spin_unlock(&page_cache_lock);
>       }
> 
>    This will require making sure that every place that does a
>    SetPageDirty() will be ok with this (ie double-check that they all have
>    a mapping: right now the free_pte() code in mm/memory.c doesn't care,
>    because it knew that it could mark even anonymous pages dirty and
>    they'd just get ignored).
>  - make a filemap_fdatasync() that walks the dirty pages and does a
>    writepage() on them all and moves them to the clean list.

We also want to move the page to the per-address-space clean list in
ClearPageDirty I suppose.




Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
> 
> 
> On Fri, 29 Dec 2000, Andi Kleen wrote:
> > 
> > Hopefully all the "goto out" micro optimizations can be taken out then too,
> 
> "goto out" often generates much more readable code, so the optimization is
> secondary.

I was more thinking of cases like the scheduler's gotos, which have gotten
rather spaghetti recently. Admittedly classic goto out is often more readable
than many nested if()s with error handling.

> 
> > I recently found out that gcc 2.97's block moving pass has the tendency
> > to move the outlined blocks inline again ;) 
> 
> Too bad. Maybe somebody should tell gcc maintainers about programmers that
> know more than the compiler again.

In x86-64 which relies on 2.97 I'm using __builtin_expect, defined to 
likely() and unlikely(), which seems to generate good code.


-Andi
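
A minimal sketch of what such a definition usually looks like, for readers unfamiliar with the builtin. The macro bodies follow the common idiom and are an assumption, not a copy of the x86-64 tree mentioned above; `find_byte` is a hypothetical example function.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_expect is a GCC builtin; fall back to a plain test elsewhere. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Return the first index of needle in buf, or -1; a hit is hinted as rare. */
static long find_byte(const unsigned char *buf, size_t len, unsigned char needle)
{
    assert(buf != NULL || len == 0);    /* caller contract */
    for (size_t i = 0; i < len; i++) {
        if (unlikely(buf[i] == needle))
            return (long)i;
    }
    return -1;
}
```

The hint only tells the compiler which branch to lay out as the fall-through path. It does not change the result of the test, so a wrong hint costs speed rather than correctness.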



Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 03:14:56PM -0800, David S. Miller wrote:
>Date: Fri, 29 Dec 2000 00:17:21 +0100
>From: Andi Kleen <[EMAIL PROTECTED]>
> 
>On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
>> To make things like "page - mem_map" et al. use shifts instead of
>> expensive multiplies...
> 
>I thought that is what ->index is for ? 
> 
> It is for the page cache identity Andi... you know, page_hash(mapping, index)...

Oops, I confused it with the 2.0 page->map_nr, which did exactly that.

I should have known better.  Thanks for correcting this brainfart.

> And the add/sub/shift expansion of a multiply/divide by constant even
> in its most optimal form is often not trivial, it is something on the
> order of 7 instructions with waitq debugging enabled last time I
> checked.

Wonder if it looks better with wq debugging turned off or a compressed
->zone.


-Andi



Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Fri, 29 Dec 2000, Andi Kleen wrote:
> 
> Hopefully all the "goto out" micro optimizations can be taken out then too,

"goto out" often generates much more readable code, so the optimization is
secondary.

> I recently found out that gcc 2.97's block moving pass has the tendency
> to move the outlined blocks inline again ;) 

Too bad. Maybe somebody should tell gcc maintainers about programmers that
know more than the compiler again.

Linus
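
For reference, the "goto out" shape under discussion looks roughly like the following userspace sketch. The function `join_copy` is hypothetical and exists only to show a single exit label with centralised cleanup; it is not taken from the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate two strings into a fresh buffer; returns NULL on any failure. */
static char *join_copy(const char *a, const char *b)
{
    char *tmp = NULL;
    char *result = NULL;
    size_t la, lb;

    if (a == NULL || b == NULL)
        goto out;
    la = strlen(a);
    lb = strlen(b);

    tmp = malloc(la + lb + 1);
    if (tmp == NULL)
        goto out;

    memcpy(tmp, a, la);
    memcpy(tmp + la, b, lb + 1);        /* copies the terminating NUL too */
    assert(strlen(tmp) == la + lb);

    result = tmp;
    tmp = NULL;                         /* ownership moved to result */
out:
    free(tmp);                          /* free(NULL) is a no-op */
    return result;
}
```

Every failure path jumps to the same label, so the cleanup is written once and the success path reads straight down without nested if() blocks.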




Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Fri, 29 Dec 2000, Andi Kleen wrote:

> On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
> >Date: Thu, 28 Dec 2000 23:58:36 +0100
> >From: Andi Kleen <[EMAIL PROTECTED]>
> > 
> >Why exactly a power of two ? To get rid of ->index ? 
> > 
> > To make things like "page - mem_map" et al. use shifts instead of
> > expensive multiplies...
> 
> I thought that is what ->index is for ? 

No. "index" only gives the virtual index.

"page - mem_map" is how you get the _physical_ index in the zone in
question, which is common for physical translations (ie "pte_page()",
"page_to_virt()" or "page_to_phys()")

> Also gcc seems to be already quite clever at dividing through small
> integers, e.g. using mul and shift and sub, so it may not be even worth to reach
> for a real power-of-two. 

Look at the code - it's a big multiply to do a divide by 68 or similar.
Quite expensive.

Doing "page->address - TASK_SIZE" on x86 for the non-highmem case would
probably be faster.

Linus




Re: test13-pre5

2000-12-28 Thread David S. Miller

   Date: Fri, 29 Dec 2000 00:17:21 +0100
   From: Andi Kleen <[EMAIL PROTECTED]>

   On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
   > To make things like "page - mem_map" et al. use shifts instead of
   > expensive multiplies...

   I thought that is what ->index is for ? 

It is for the page cache identity Andi... you know, page_hash(mapping, index)...

And the add/sub/shift expansion of a multiply/divide by constant even
in its most optimal form is often not trivial; it is something on the
order of 7 instructions with waitq debugging enabled last time I
checked.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: test13-pre5

2000-12-28 Thread Rik van Riel

On Fri, 29 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
> >Date: Thu, 28 Dec 2000 23:58:36 +0100
> >From: Andi Kleen <[EMAIL PROTECTED]>
> > 
> >Why exactly a power of two ? To get rid of ->index ? 
> > 
> > To make things like "page - mem_map" et al. use shifts instead of
> > expensive multiplies...
> 
> I thought that is what ->index is for ? 

Nope, ->index is there to identify which offset the page
has in ->mapping, read mm/filemap.c::__find_page_nolock()
for more info.

> Also gcc seems to be already quite clever at dividing through
> small integers, e.g. using mul and shift and sub, so it may not
> be even worth to reach for a real power-of-two.
> 
> I suspect doing the arithmetics is at least faster than eating the 
> cache misses because of ->index. 

I'm pretty confident that arithmetic is faster than cache
misses ... but an unlucky size of the page struct will cause
extra cache misses due to misalignment.

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 03:15:01PM -0800, Linus Torvalds wrote:
> > (first number for 32bit, second for 64bit) 
> > 
> > - Do not compile virtual in when the kernel does not support highmem
> > (saves 4/8 bytes) 
> 
> Even on UP, "virtual" often helps. The conversion from "struct page" to
> the linear address is quite common, and if "struct page" isn't a
> power-of-two it gets slow.

Are you sure? Last time I checked gcc did a very good job at optimizing
small divisions with small integers, without using div. It just has to 
be a good integer with not too many set bits.

> is 100% accurate, we _do_ care that the fields close-by don't get strange
> effects from updating "age". We used to have exactly this problem on alpha
> back in the 2.1.x timeframe.

When it is shared with a constant field (like zone index) it shouldn't
matter, no ? At worst you can see outdated data, and when the outdated
data is constant all is fine.

> > - flags can be __u32 on 64bit hosts, sharing 64bit with something that
> > is tolerant to async updates (e.g. the zone table index or the index) 
> > - index could be probably u32 instead of unsigned long, saving 4 bytes
> > on i386
> 
> It already _is_ 32-bit on x86. 

Oops. It was a typo. I meant to write "saving 4 bytes on 64bit"

> Anyway, I don't want to increase "struct page" in size, but I also don't
> think it's worth micro-optimizing some of these things if the code gets
> harder to maintain (like the partial-word stuff would be).

Ok pity :-/

Hopefully all the "goto out" micro optimizations can be taken out then too,
I recently found out that gcc 2.97's block moving pass has the tendency
to move the outlined blocks inline again ;) 



-Andi



Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
>Date: Thu, 28 Dec 2000 23:58:36 +0100
>From: Andi Kleen <[EMAIL PROTECTED]>
> 
>Why exactly a power of two ? To get rid of ->index ? 
> 
> To make things like "page - mem_map" et al. use shifts instead of
> expensive multiplies...

I thought that is what ->index is for ? 

Also gcc seems to be already quite clever at dividing through small
integers, e.g. using mul and shift and sub, so it may not be even worth to reach
for a real power-of-two. 

I suspect doing the arithmetics is at least faster than eating the 
cache misses because of ->index. 

-Andi





Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Thu, 28 Dec 2000, Andi Kleen wrote:
> 
> BTW..
> 
> The current 2.4 struct page could be already shortened a lot, saving a lot
> of cache.

Not that much, but some.

> (first number for 32bit, second for 64bit) 
> 
> - Do not compile virtual in when the kernel does not support highmem
> (saves 4/8 bytes) 

Even on UP, "virtual" often helps. The conversion from "struct page" to
the linear address is quite common, and if "struct page" isn't a
power-of-two it gets slow.

> - Instead of having a zone pointer mask use an 8 or 16 bit index into a 
> zone table. On a modern CPU it is much cheaper to do the and/shifts than
> to do even a single cache miss during page aging. On a lot of systems 
> that zone index could be hardcoded to 0 anyways, giving better code.
> - Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
> has the same swapping algorithm even only uses 8bit) 

This would be good, but can be hard.

FreeBSD doesn't try to be portable any more, but Linux does, and there are
architectures where 8- and 16-bit accesses aren't atomic but have to be
done with read-modify-write cycles.

And even for fields like "age", where we don't care whether the age itself
is 100% accurate, we _do_ care that the fields close-by don't get strange
effects from updating "age". We used to have exactly this problem on alpha
back in the 2.1.x timeframe.

This is why a lot of fields are 32-bit, even though we wouldn't need more
than 8 or 16 bits of them.

> - Remove the waitqueue debugging (obvious @)

Not obvious enough. There are magic things that could be done, like hiding
the wait-queue lock bit in the waitqueue lists themselves etc. That could
be done with some per-architecture magic etc.

> - flags can be __u32 on 64bit hosts, sharing 64bit with something that
> is tolerant to async updates (e.g. the zone table index or the index) 
> - index could be probably u32 instead of unsigned long, saving 4 bytes
> on i386

It already _is_ 32-bit on x86. 

Only the LSF patches made it 64-bit. That never made it into the standard
kernel.

Sure, we could make it "u32" and thus force it to be 32-bit even on 64-bit
architectures, but some day somebody will want to have more than 46 bits
of file mappings, and while 46 bits is _huge_ on a 32-bit machine, on a
64-bit one in 5 years it will not be entirely unreasonable. 

Anyway, I don't want to increase "struct page" in size, but I also don't
think it's worth micro-optimizing some of these things if the code gets
harder to maintain (like the partial-word stuff would be).

The biggest win by far would come from increasing the page-size, something
we can do even in software. Having a "kernel page size" of 8kB even on x86
would basically cut the overhead in half. As that would also improve some
other things (like having better throughput due to bigger contiguous
chunks), that's something I'd like to see some day.

(And user space wouldn't ever have to know - we could map in "half pages"
aka "hardware pages" without mapping the whole page).

Linus




Re: test13-pre5

2000-12-28 Thread David S. Miller

   Date: Thu, 28 Dec 2000 23:58:36 +0100
   From: Andi Kleen <[EMAIL PROTECTED]>

   Why exactly a power of two ? To get rid of ->index ? 

To make things like "page - mem_map" et al. use shifts instead of
expensive multiplies...

Later,
David S. Miller
[EMAIL PROTECTED]



Re: test13-pre5

2000-12-28 Thread Rik van Riel

On Thu, 28 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
> >Date: Thu, 28 Dec 2000 23:17:22 +0100
> >From: Andi Kleen <[EMAIL PROTECTED]>
> > 
> >Would you consider patches for any of these points? 
> > 
> > To me it seems just as important to make sure struct page is
> > a power of 2 in size, with the waitq debugging turned off this
> > is true for both 32-bit and 64-bit hosts last time I checked.
> 
> Why exactly a power of two ? To get rid of ->index ? 

Most likely to minimise the number of cache misses needed
to access a complete page_struct.

Then again, I guess 48 bytes would _also_ guarantee that
we never need more than 2 cache misses to access every
part of the page_struct.

And the memory wasted in the page_struct may well be a
bigger factor than the cache misses on lots of systems...

(time for another CONFIG option? ;))

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
>Date:  Thu, 28 Dec 2000 23:17:22 +0100
>From: Andi Kleen <[EMAIL PROTECTED]>
> 
>Would you consider patches for any of these points? 
> 
> To me it seems just as important to make sure struct page is
> a power of 2 in size, with the waitq debugging turned off this
> is true for both 32-bit and 64-bit hosts last time I checked.

Why exactly a power of two ? To get rid of ->index ? 


-Andi



Re: test13-pre5

2000-12-28 Thread David S. Miller

   Date: Thu, 28 Dec 2000 23:17:22 +0100
   From: Andi Kleen <[EMAIL PROTECTED]>

   Would you consider patches for any of these points? 

To me it seems just as important to make sure struct page is
a power of 2 in size, with the waitq debugging turned off this
is true for both 32-bit and 64-bit hosts last time I checked.

Later,
David S. Miller
[EMAIL PROTECTED]




Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 12:59:22PM -0800, Linus Torvalds wrote:
>  - we absolutely do _not_ want to make "struct page" bigger. We can't
>afford to just throw away another 8 bytes per page on adding a new list
>structure, I feel. Even if this would be the simplest solution.

BTW..

The current 2.4 struct page could be already shortened a lot, saving a lot
of cache.
(first number for 32bit, second for 64bit) 

- Do not compile virtual in when the kernel does not support highmem
(saves 4/8 bytes) 
- Instead of having a zone pointer mask use an 8 or 16 bit index into a 
zone table. On a modern CPU it is much cheaper to do the and/shifts than
to do even a single cache miss during page aging. On a lot of systems 
that zone index could be hardcoded to 0 anyways, giving better code.
- Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
has the same swapping algorithm even only uses 8bit) 
- Remove the waitqueue debugging (obvious @)
- flags can be __u32 on 64bit hosts, sharing 64bit with something that
is tolerant to async updates (e.g. the zone table index or the index) 
- index could be probably u32 instead of unsigned long, saving 4 bytes
on i386

Would you consider patches for any of these points? 


-Andi





Re: test13-pre5

2000-12-28 Thread Daniel Phillips

Linus Torvalds wrote:
>  - global dirty list for global sync(). We don't have one, and I don't
>think we want one. We could add a few lists, and split up the active
>list into "active" and "active_dirty", for example, but I don't like
>the implications that would probably have for the LRU ordering.

This has been the subject of a lot of flam^H^H^H^H discussion on
#kernelnewbies and it's still an open question.  The only way
to see if a separate active_dirty hurts or helps is to try it.  Later.
:-)

I don't see how a separate active_dirty list can hurt LRU ordering.  We
could still take the pages off the two lists in the same order we did
with one list if we wanted to, or at least, statistically the same in
terms of number, age, time since entering the list, etc.  That better
not cause radically different or undesirable behaviour or something is
really broken.

By breaking active into two lists we'd get a very interesting tuning
parameter to play with: the relative rate at which pages are moved from
active to inactive.  Beyond that, the active_dirty list could be pressed
into service quite easily as a page-oriented version of kflushd, and
would obviously be useful as a global sync list.

--
Daniel



Re: test13-pre5 (via82cxxx_audio.c)

2000-12-28 Thread Jonathan Hudson


In article <[EMAIL PROTECTED]>,
Linus Torvalds <[EMAIL PROTECTED]> writes:
LT> 
LT> The mm cleanups also include removing "swapout()" as a VM operation, as

swapout was not removed from drivers/sound/via82cxxx_audio.c; the
following does so (compiles and produces sound, someone who
understands this please check).


--- drivers/sound/via82cxxx_audio.c.orig	Thu Dec 28 21:02:03 2000
+++ drivers/sound/via82cxxx_audio.c	Thu Dec 28 21:12:58 2000
@@ -1727,20 +1727,8 @@
 }
 
 
-#ifndef VM_RESERVE
-static int via_mm_swapout (struct page *page, struct file *filp)
-{
-   return 0;
-}
-#endif /* VM_RESERVE */
-
-
 struct vm_operations_struct via_mm_ops = {
 	nopage:		via_mm_nopage,
-
-#ifndef VM_RESERVE
-	swapout:	via_mm_swapout,
-#endif
 };
 
 
 
 



Re: [wildly off-topic] Re: test13-pre5

2000-12-28 Thread Marcelo Tosatti


On Thu, 28 Dec 2000, Rik van Riel wrote:

> On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
> > On Thu, 28 Dec 2000, Linus Torvalds wrote:
> > 
> > > If somebody (you? hint, hint) wants to do this,
> > 
> > Ok, I'll do it because I love Tove. 
> 
> Marcelo, you should buy some glasses ;)
> 
> Tove != Tux
> 
> It's ok and probably safe to love Tux, the nice cuddly
> penguin everybody loves.
> 
> However, loving the (6-time ??) Finnish female karate
> champion, who happens to be married to Linus is probably
> quite a bit less safe ...

Marcelo runs like hell. 




[wildly off-topic] Re: test13-pre5

2000-12-28 Thread Rik van Riel

On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
> 
> > If somebody (you? hint, hint) wants to do this,
> 
> Ok, I'll do it because I love Tove. 

Marcelo, you should buy some glasses ;)

Tove != Tux

It's ok and probably safe to love Tux, the nice cuddly
penguin everybody loves.

However, loving the (6-time ??) Finnish female karate
champion, who happens to be married to Linus is probably
quite a bit less safe ...

cheers,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: test13-pre5

2000-12-28 Thread Marcelo Tosatti


On Thu, 28 Dec 2000, Linus Torvalds wrote:

> If somebody (you? hint, hint) wants to do this,

Ok, I'll do it because I love Tove. 

> I'd be very happy - I can do it myself, but because it's my birthday
> I'm supposed to drag myself off the computer soon and be social, or
> Tove will be grumpy.




Re: test13-pre5

2000-12-28 Thread Linus Torvalds


On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
> 
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
> 
> > This still doesn't tell "sync()" about dirty pages (ie the "innd loses the
> > active file after a reboot" bug), but now the places that mark pages dirty
> > are under control. Next step..
> 
> Do you really want to split the per-address-space pages list in dirty and
> clean lists for 2.4 ?
> 
> Or do you think walking the current per-address-space page list searching
> for dirty pages and syncing them is ok?

There are a few issues:

 - fdatasync/fsync is often quite critical for databases. It's _possibly_
   ok to just walk all the pages for an inode, but I'm fairly certain that
   this is an area where if we don't have a separate dirty queue we _will_
   need to add one later.

 - global dirty list for global sync(). We don't have one, and I don't
   think we want one. We could add a few lists, and split up the active
   list into "active" and "active_dirty", for example, but I don't like
   the implications that would probably have for the LRU ordering.

 - we absolutely do _not_ want to make "struct page" bigger. We can't
   afford to just throw away another 8 bytes per page on adding a new list
   structure, I feel. Even if this would be the simplest solution.

So right now I think the right idea is to

 - split up "address_space->pages" into "->clean_pages" and
   "->dirty_pages".  This is fairly easily done, it requires small changes
   like making "truncate_inode_pages()" instead be
   "truncate_list_pages()", and making "truncate_inode_pages()" call that
   for both the dirty and the clean lists. That's about 10 lines of diff
   (I already tried this), and that's about the biggest example of
   something like that. Most other areas don't much care about the inode
   page lists.

 - make "SetPageDirty()" do something like

	if (!test_and_set(PG_dirty, &page->flags)) {
		spin_lock(&page_cache_lock);
		list_del(page->list);
		list_add(page->list, page->mapping->dirty_pages);
		spin_unlock(&page_cache_lock);
	}

   This will require making sure that every place that does a
   SetPageDirty() will be ok with this (ie double-check that they all have
   a mapping: right now the free_pte() code in mm/memory.c doesn't care,
   because it knew that it could mark even anonymous pages dirty and
   they'd just get ignored).

 - make a filemap_fdatasync() that walks the dirty pages and does a
   writepage() on them all and moves them to the clean list.

 - make fsync() and fdatasync() call the above function before they even
   call the low-level per-FS code.

 - make sync_inodes() use that same filemap_fdatasync() function so that
   the sync() case is handled.

All done. It looks something like 5-10 places, most of which are about 10
lines of diff each, if even that.

The only real worry would be that the locking isn't right, but getting the
pagemap lock should be the safe thing, and from a lock contention
standpoint it should be ok (if we move a lot of pages back and forth, lock
contention is going to be the least of our worries, because it implies
that we'd be doing a LOT of IO to actually write the pages out).

If somebody (you? hint, hint) wants to do this, I'd be very happy - I can
do it myself, but because it's my birthday I'm supposed to drag myself off
the computer soon and be social, or Tove will be grumpy.

And you don't want Tove grumpy.

Linus




Re: test13-pre5

2000-12-28 Thread Marcelo Tosatti


On Thu, 28 Dec 2000, Linus Torvalds wrote:

> This still doesn't tell "sync()" about dirty pages (ie the "innd loses the
> active file after a reboot" bug), but now the places that mark pages dirty
> are under control. Next step..

Do you really want to split the per-address-space pages list in dirty and
clean lists for 2.4 ?

Or do you think walking the current per-address-space page list searching
for dirty pages and syncing them is ok?




Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Thu, 28 Dec 2000, Andi Kleen wrote:
 
 BTW..
 
 The current 2.4 struct page could be already shortened a lot, saving a lot
 of cache.

Not that much, but some.

 (first number for 32bit, second for 64bit) 
 
 - Do not compile virtual in when the kernel does not support highmem
 (saves 4/8 bytes) 

Even on UP, "virtual" often helps. The conversion from "struct page" to
the linear address is quite common, and if "struct page" isn't a
power-of-two it gets slow.

 - Instead of having a zone pointer mask use a 8 or 16 byte index into a 
 zone table. On a modern CPU it is much cheaper to do the and/shifts than
 to do even a single cache miss during page aging. On a lot of systems 
 that zone index could be hardcoded to 0 anyways, giving better code.
 - Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
 has the same swapping algorithm even only uses 8bit) 

This would be good, but can be hard.

FreeBSD doesn't try to be portable any more, but Linux does, and there are
architectures where 8- and 16-bit accesses aren't atomic but have to be
done with read-modify-write cycles.

And even for fields like "age", where we don't care whether the age itself
is 100% accurate, we _do_ care that the fields close-by don't get strange
effects from updating "age". We used to have exactly this problem on alpha
back in the 2.1.x timeframe.

This is why a lot of fields are 32-bit, even though we wouldn't need more
than 8 or 16 bits of them.

 - Remove the waitqueue debugging (obvious @)

Not obvious enough. There are magic things that could be done, like hiding
the wait-queue lock bit in the waitqueue lists themselves etc. That could
be done with some per-architecture magic etc.

 - flags can be __u32 on 64bit hosts, sharing 64bit with something that
 is tolerant to async updates (e.g. the zone table index or the index) 
 - index could be probably u32 instead of unsigned long, saving 4 bytes
 on i386

It already _is_ 32-bit on x86. 

Only the LSF patches made it 64-bit. That never made it into the standard
kernel.

Sure, we could make it "u32" and thus force it to be 32-bit even on 64-bit
architectures, but some day somebody will want to have more than 46 bits
of file mappings, and which 46 bits is _huge_ on a 32-bit machine, on a
64-bit one in 5 years it will not be entirely unreasonable. 

Anyway, I don't want to increase "struct page" in size, but I also don't
think it's worth micro-optimizing some of these things if the code gets
harder to maintain (like the partial-word stuff would be).

The biggest win by far would come from increasing the page-size, something
we can do even in software. Having a "kernel page size" of 8kB even on x86
would basically cut the overhead in half. As that would also improve some
other things (like having better throughput due to bigger contiguous
chunks), that's something I'd like to see some day.

(And user space wouldn't ever have to know - we could map in "half pages"
aka "hardware pages" without mapping the whole page).
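
The "cut the overhead in half" arithmetic can be sketched as follows. The 64-byte descriptor size and the 1 GB of RAM used in the usage note are assumed figures for illustration, not numbers from this thread, and `mem_map_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of per-page descriptors needed to describe ram_bytes of memory,
 * with one descriptor of desc_bytes per page of page_bytes. */
static uint64_t mem_map_bytes(uint64_t ram_bytes, uint64_t page_bytes,
                              uint64_t desc_bytes)
{
    assert(page_bytes != 0);
    return (ram_bytes / page_bytes) * desc_bytes;
}
```

With those assumed numbers, `mem_map_bytes(1 << 30, 4096, 64)` gives 16 MB of descriptors, while `mem_map_bytes(1 << 30, 8192, 64)` gives 8 MB.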

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 03:15:01PM -0800, Linus Torvalds wrote:
  (first number for 32bit, second for 64bit) 
  
  - Do not compile virtual in when the kernel does not support highmem
  (saves 4/8 bytes) 
 
 Even on UP, "virtual" often helps. The conversion from "struct page" to
 the linear address is quite common, and if "struct page" isn't a
 power-of-two it gets slow.

Are you sure? Last time I checked gcc did a very good job at optimizing
small divisions with small integers, without using div. It just has to 
be a good integer with not too many set bits.

 is 100% accurate, we _do_ care that the fields close-by don't get strange
 effects from updating "age". We used to have exactly this problem on alpha
 back in the 2.1.x timeframe.

When it is shared with a constant field (like zone index) it shouldn't
matter, no ? At worst you can see outdated data, and when the outdated
data is constant all is fine.

  - flags can be __u32 on 64bit hosts, sharing 64bit with something that
  is tolerant to async updates (e.g. the zone table index or the index) 
  - index could be probably u32 instead of unsigned long, saving 4 bytes
  on i386
 
 It already _is_ 32-bit on x86. 

Oops. It was a typo. I meant to write "saving 4 bytes on 64bit"

 Anyway, I don't want to increase "struct page" in size, but I also don't
 think it's worth micro-optimizing some of these things if the code gets
 harder to maintain (like the partial-word stuff would be).

Ok pity :-/

Hopefully all the "goto out" micro optimizations can be taken out then too,
I recently found out that gcc 2.97's block moving pass has the tendency
to move the outlined blocks inline again ;) 



-Andi



Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Fri, 29 Dec 2000, Andi Kleen wrote:
 
 Hopefully all the "goto out" micro optimizations can be taken out then too,

"goto out" often generates much more readable code, so the optimization is
secondary.

 I recently found out that gcc 2.97's block moving pass has the tendency
 to move the outlined blocks inline again ;) 

Too bad. Maybe somebody should tell gcc maintainers about programmers that
know more than the compiler again.

Linus




Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
 
 
 On Fri, 29 Dec 2000, Andi Kleen wrote:
  
  Hopefully all the "goto out" micro optimizations can be taken out then too,
 
 "goto out" often generates much more readable code, so the optimization is
 secondary.

I was more thinking of cases like the scheduler's gotos, which have gotten
rather spaghetti recently. Admittedly classic goto out is often more readable
than many nested if()s with error handling.
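
For readers unfamiliar with the idiom, here is a small user-space sketch of the "classic goto out" style of error handling. The function `copy_twice` is a made-up example, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: copy src into two freshly allocated buffers and
 * compare them. Every failure path funnels through the labels at the
 * bottom, so each resource is released in exactly one place. */
static int copy_twice(const char *src, size_t len)
{
    int ret = -1;
    char *a, *b;

    assert(src != NULL);

    a = malloc(len);
    if (!a)
        goto out;
    b = malloc(len);
    if (!b)
        goto out_free_a;

    memcpy(a, src, len);
    memcpy(b, a, len);
    ret = memcmp(a, b, len) == 0 ? 0 : -1;

    free(b);
out_free_a:
    free(a);
out:
    return ret;
}
```

The equivalent with nested if()s needs one more indentation level per resource, which is the readability argument being made here.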

 
  I recently found out that gcc 2.97's block moving pass has the tendency
  to move the outlined blocks inline again ;) 
 
 Too bad. Maybe somebody should tell gcc maintainers about programmers that
 know more than the compiler again.

In x86-64, which relies on 2.97, I'm using __builtin_expect, wrapped as
likely() and unlikely() macros, which seems to generate good code.
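
A minimal sketch of such wrappers, assuming a GCC-compatible compiler that provides `__builtin_expect`. The macro bodies are the commonly used form, not necessarily the exact x86-64 port definitions, and `sum_bytes` is a hypothetical caller.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang-specific: tell the compiler which way a branch usually goes.
 * The !! folds any non-zero value to 1 so it matches the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum a buffer, treating a NULL pointer as the rare
 * error case so the compiler can keep the common path as the fall-through. */
static long sum_bytes(const unsigned char *buf, size_t len)
{
    long total = 0;
    size_t i;

    if (unlikely(buf == NULL))
        return -1;
    for (i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

The hint changes only code layout, never the result, so the error path can stay written inline in the source instead of being moved out by hand with a goto.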


-Andi



Re: test13-pre5

2000-12-28 Thread Marcelo Tosatti



On Thu, 28 Dec 2000, Linus Torvalds wrote:

  - make "SetPageDirty()" do something like
 
   if (!test_and_set(PG_dirty, &page->flags)) {
           spin_lock(&page_cache_lock);
           list_del(&page->list);
           list_add(&page->list, &page->mapping->dirty_pages);
           spin_unlock(&page_cache_lock);
   }
 
This will require making sure that every place that does a
SetPageDirty() will be ok with this (ie double-check that they all have
a mapping: right now the free_pte() code in mm/memory.c doesn't care,
because it knew that it could mark even anonymous pages dirty and
they'd just get ignored).
  - make a filemap_fdatasync() that walks the dirty pages and does a
writepage() on them all and moves them to the clean list.

We also want to move the page to the per-address-space clean list in
ClearPageDirty I suppose.




Re: test13-pre5

2000-12-28 Thread Stefan Traby

On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:

 Too bad. Maybe somebody should tell gcc maintainers about programmers that
 know more than the compiler again.

I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Did you know that it's impossible to compile nfsv4 because of
register allocation problems with long long since (long long) month ?

The following does not hurt, it's just a fix for a broken
compiler:

--- linux/fs/lockd/xdr4.c.orig  Fri Dec 29 01:35:32 2000
+++ linux/fs/lockd/xdr4.c   Fri Dec 29 01:36:36 2000
@@ -156,7 +156,7 @@
 nlm4_encode_lock(u32 *p, struct nlm_lock *lock)
 {
        struct file_lock        *fl = &lock->fl;
-       __s64                   start, len;
+       volatile __s64          start, len;
 
        if (!(p = xdr_encode_string(p, lock->caller))
         || !(p = nlm4_encode_fh(p, &lock->fh))


Here is an example without this patch (pgcc-2.95.2.1 this time which
is bug-compatible to gcc-2.95.2.1).

gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -O2 
-fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 
-march=i686 -DMODULE   -c -o xdr4.o xdr4.c
xdr4.c: In function `nlm4_encode_lock':
xdr4.c:181: internal error--insn does not satisfy its constraints:
(insn/i 313 585 315 (set (reg:SI 1 %edx)
(subreg:SI (lshiftrt:DI (reg:DI 0 %eax)
(const_int 32 [0x20])) 0)) 323 {lshrdi3_const_int_subreg} (nil)
(nil))
gcc: Internal compiler error: program cpp got fatal signal 13
make[2]: *** [xdr4.o] Error 1
make[2]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs/lockd'
make[1]: *** [_modsubdir_lockd] Error 2
make[1]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs'
make: *** [_mod_fs] Error 2

The question is: is it worth applying?

-- 

  ciao - 
Stefan

"export PS1="rms# "  "

Stefan TrabyLinux/ia32   fax:  +43-3133-6107-9
Mitterlasznitzstr. 13   Linux/alphaphone:  +43-3133-6107-2
8302 Nestelbach Linux/sparc   http://www.hello-penguin.com
Austriamailto:[EMAIL PROTECTED]
Europe   mailto:[EMAIL PROTECTED]



Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 03:14:56PM -0800, David S. Miller wrote:
Date: Fri, 29 Dec 2000 00:17:21 +0100
From: Andi Kleen [EMAIL PROTECTED]
 
On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
 To make things like "page - mem_map" et al. use shifts instead of
 expensive multiplies...
 
I thought that is what ->index is for ? 
 
 It is for the page cache identity Andi... you know, page_hash(mapping, index)...

Oops, I confused it with the 2.0 page->map_nr, which did exactly that.

I should have known better.  Thanks for correcting this brainfart.

 And the add/sub/shift expansion of a multiply/divide by constant even
 in its' most optimal form is often not trivial, it is something on the
 order of 7 instructions with waitq debugging enabled last time I
 checked.

Wonder if it looks better with wq debugging turned off or a compressed
->zone.


-Andi



Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
Date:  Thu, 28 Dec 2000 23:17:22 +0100
From: Andi Kleen [EMAIL PROTECTED]
 
Would you consider patches for any of these points? 
 
 To me it seems just as important to make sure struct page is
 a power of 2 in size, with the waitq debugging turned off this
 is true for both 32-bit and 64-bit hosts last time I checked.

Why exactly a power of two ? To get rid of ->index ? 


-Andi



Re: test13-pre5

2000-12-28 Thread Rik van Riel

On Thu, 28 Dec 2000, Andi Kleen wrote:
 On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
 Date:Thu, 28 Dec 2000 23:17:22 +0100
 From: Andi Kleen [EMAIL PROTECTED]
  
 Would you consider patches for any of these points? 
  
  To me it seems just as important to make sure struct page is
  a power of 2 in size, with the waitq debugging turned off this
  is true for both 32-bit and 64-bit hosts last time I checked.
 
 Why exactly a power of two ? To get rid of ->index ? 

Most likely to minimise the number of cache misses needed
to access a complete page_struct.

Then again, I guess 48 bytes would _also_ guarantee that
we never need more than 2 cache misses to access every
part of the page_struct.
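
The cache-line claim can be checked with a small sketch. It assumes a line-aligned array base and, in the usage note, a 32-byte cache line, which is an assumption on my part and not stated above; `max_lines_touched` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case number of cache lines touched by one element of a densely
 * packed array whose base is line-aligned. Sizes are in bytes. */
static size_t max_lines_touched(size_t elem_size, size_t line_size)
{
    size_t worst = 0;
    size_t i;

    assert(elem_size > 0 && line_size > 0);
    /* Start offsets repeat with a period of at most line_size elements. */
    for (i = 0; i < line_size; i++) {
        size_t first = (i * elem_size) / line_size;
        size_t last = (i * elem_size + elem_size - 1) / line_size;
        size_t lines = last - first + 1;

        if (lines > worst)
            worst = lines;
    }
    return worst;
}
```

With 32-byte lines, both a 48-byte and a 64-byte element never span more than 2 lines, while a 68-byte element always spans 3.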

And the memory wasted in the page_struct may well be a
bigger factor than the cache misses on lots of systems...

(time for another CONFIG option? ;))

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: test13-pre5

2000-12-28 Thread David S. Miller

   Date: Thu, 28 Dec 2000 23:58:36 +0100
   From: Andi Kleen [EMAIL PROTECTED]

   Why exactly a power of two ? To get rid of ->index ? 

To make things like "page - mem_map" et al. use shifts instead of
expensive multiplies...

Later,
David S. Miller
[EMAIL PROTECTED]



Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
 
 We also want to move the page to the per-address-space clean list in
 ClearPageDirty I suppose.

I would actually advise against this.

 - it's ok to have too many pages on the dirty list (think of the dirty
   list as a "these pages _can_ be dirty")

 - whenever we do a ClearPageDirty() we're likely to remove the page from
   the lists altogether, so it's not worth it doing extra work.

The exception, of course, is the actual "filemap_fdatasync()" function,
but that one would probably look something like

        spin_lock(&page_cache_lock);
        while (!list_empty(&mapping->dirty_pages)) {
                struct page *page = list_entry(mapping->dirty_pages.next,
                                               struct page, list);

                list_del(&page->list);
                list_add(&page->list, &mapping->clean_pages);

                if (!PageDirty(page))
                        continue;
                page_get(page);
                spin_unlock(&page_cache_lock);

                lock_page(page);
                if (PageDirty(page)) {
                        ClearPageDirty(page);
                        page->mapping->writepage(page);
                }
                UnlockPage(page);
                page_cache_put(page);
                spin_lock(&page_cache_lock);
        }
        spin_unlock(&page_cache_lock);

and again note how we can move it to the clean list early and we don't
have to keep the PageDirty bit 100% in sync with which list it is on. If
somebody marks it dirty later on (and the dirty bit is still set), that
somebody won't move it back to the dirty list (because it noticed that
the dirty bit is already set), but that's ok: as long as we do the
"ClearPageDirty(page);" call before starting the actual writeout(), we're
fine.

So the "mapping->dirty_pages" list is maybe not so much a _dirty_ list, as
a "scheduled for writeout" list. Marking the page clean doesn't remove it
from that list - it can happily stay on the list and then when the
writeout is started we'd just skip it.
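
The "scheduled for writeout" behaviour can be modelled in user space as below. This is a single-threaded sketch with the locking left out, and every name in it (`set_page_dirty`, `fdatasync_model`, the `writes` counter) is a hypothetical stand-in rather than the kernel's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
    struct page *prev, *next;   /* links in one of the mapping's lists */
    bool dirty;                 /* stands in for PG_dirty */
    int writes;                 /* how often "writepage" ran */
};

struct mapping {
    struct page clean, dirty;   /* circular list heads */
};

static void list_init(struct page *head) { head->prev = head->next = head; }

static void list_move(struct page *p, struct page *head)
{
    p->prev->next = p->next;    /* unlink from the current list */
    p->next->prev = p->prev;
    p->next = head->next;       /* relink right after head */
    p->prev = head;
    head->next->prev = p;
    head->next = p;
}

static void mapping_init(struct mapping *m)
{
    list_init(&m->clean);
    list_init(&m->dirty);
}

static void page_init(struct mapping *m, struct page *p)
{
    list_init(p);
    p->dirty = false;
    p->writes = 0;
    list_move(p, &m->clean);
}

/* Only the clean-to-dirty transition touches the lists. */
static void set_page_dirty(struct mapping *m, struct page *p)
{
    if (!p->dirty) {
        p->dirty = true;
        list_move(p, &m->dirty);
    }
}

/* Clearing the bit leaves the page on whatever list it is on. */
static void clear_page_dirty(struct page *p) { p->dirty = false; }

/* Drain the dirty list: move each page to the clean list first, then write
 * it only if its bit is still set. Returns the number of pages written. */
static int fdatasync_model(struct mapping *m)
{
    int written = 0;

    while (m->dirty.next != &m->dirty) {
        struct page *p = m->dirty.next;

        list_move(p, &m->clean);
        if (!p->dirty)
            continue;           /* cleaned earlier: just skip it */
        p->dirty = false;       /* clear before starting the writeout */
        p->writes++;
        written++;
    }
    return written;
}
```

A page that was dirtied and then cleaned stays on the dirty list until the next sync, where it is moved to the clean list without being written.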

Ok?

Linus




Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Fri, 29 Dec 2000, Stefan Traby wrote:
 On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
 
  Too bad. Maybe somebody should tell gcc maintainers about programmers that
  know more than the compiler again.
 
 I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
comfortable with it, it does count as "supported" even if it has known
problems with "long long". pgcc isn't.

 Did you know that it's impossible to compile nfsv4 because of
 register allocation problems with long long since (long long) month ?

lockd v4 (for NFS v3), I assume. 

No, I wasn't aware of this particular bug. 

 The following does not hurt, it's just a fix for a broken
 compiler:

Ugh, that's ugly.

Can you test if it is sufficient to just simplify the math a bit, instead
of uglifying that function more? The nlm4_encode_lock() function already
tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
be ok to just re-code the function to not do the extra "loff_t_to_s64()"
stuff, and simplify it enough that the compiler will be happy to compile
the simpler function. Something along the lines of

        if (.. NLM4_OFFSET_MAX tests ..)
                ..

        *p++ = htonl(fl->fl_pid);

        start = fl->fl_start;
        len = fl->fl_end - start;
        if (fl->fl_end == OFFSET_MAX)
                len = 0;

        p = xdr_encode_hyper(p, start);
        p = xdr_encode_hyper(p, len);

        return p;

Where it tries to minimize the liveness of the 64-bit values, and tries to
avoid extra complications.
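
As a user-space illustration of what the tail of that function does, here is a sketch that encodes a (start, len) pair as XDR hypers, i.e. two big-endian 32-bit words each. `encode_hyper`, `encode_range` and `MODEL_OFFSET_MAX` are hypothetical stand-ins for the kernel's helpers and constant, and the length arithmetic follows the sketch above as written.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>  /* htonl(), POSIX */

#define MODEL_OFFSET_MAX INT64_MAX   /* stand-in for the kernel's OFFSET_MAX */

/* Emit a 64-bit value as two big-endian 32-bit words, high word first. */
static uint32_t *encode_hyper(uint32_t *p, uint64_t val)
{
    *p++ = htonl((uint32_t)(val >> 32));
    *p++ = htonl((uint32_t)(val & 0xffffffffu));
    return p;
}

/* Encode a lock range as (start, len). A range running to the maximum
 * offset is sent with len = 0. Each 64-bit value is computed right before
 * it is consumed, which keeps its live range short. */
static uint32_t *encode_range(uint32_t *p, int64_t fl_start, int64_t fl_end)
{
    int64_t start = fl_start;
    int64_t len = fl_end - start;

    assert(fl_start >= 0 && fl_end >= fl_start);
    if (fl_end == MODEL_OFFSET_MAX)
        len = 0;

    p = encode_hyper(p, (uint64_t)start);
    p = encode_hyper(p, (uint64_t)len);
    return p;
}
```

Each call consumes four 32-bit words of the output buffer, so the caller needs room for at least that many.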

Linus




Re: test13-pre5

2000-12-28 Thread Andi Kleen

On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
Date: Thu, 28 Dec 2000 23:58:36 +0100
From: Andi Kleen [EMAIL PROTECTED]
 
Why exactly a power of two ? To get rid of ->index ? 
 
 To make things like "page - mem_map" et al. use shifts instead of
 expensive multiplies...

I thought that is what ->index is for ? 

Also gcc seems to be already quite clever at dividing through small
integers, e.g. using mul and shift and sub, so it may not be even worth to reach
for a real power-of-two. 

I suspect doing the arithmetics is at least faster than eating the 
cache misses because of ->index. 

-Andi





Re: test13-pre5

2000-12-28 Thread Rik van Riel

On Fri, 29 Dec 2000, Andi Kleen wrote:
 On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
 Date: Thu, 28 Dec 2000 23:58:36 +0100
 From: Andi Kleen [EMAIL PROTECTED]
  
 Why exactly a power of two ? To get rid of ->index ? 
  
  To make things like "page - mem_map" et al. use shifts instead of
  expensive multiplies...
 
 I thought that is what ->index is for ? 

Nope, ->index is there to identify which offset the page
has in ->mapping, read mm/filemap.c::__find_page_nolock()
for more info.

 Also gcc seems to be already quite clever at dividing through
 small integers, e.g. using mul and shift and sub, so it may not
 be even worth to reach for a real power-of-two.
 
 I suspect doing the arithmetics is at least faster than eating the 
 cache misses because of ->index. 

I'm pretty confident that arithmetic is faster than cache
misses ... but an unlucky size of the page struct will cause
extra cache misses due to misalignment.

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: test13-pre5

2000-12-28 Thread David S. Miller

   Date: Fri, 29 Dec 2000 00:17:21 +0100
   From: Andi Kleen [EMAIL PROTECTED]

   On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
To make things like "page - mem_map" et al. use shifts instead of
expensive multiplies...

   I thought that is what ->index is for ? 

It is for the page cache identity Andi... you know, page_hash(mapping, index)...

And the add/sub/shift expansion of a multiply/divide by constant even
in its most optimal form is often not trivial, it is something on the
order of 7 instructions with waitq debugging enabled last time I
checked.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: test13-pre5

2000-12-28 Thread Linus Torvalds



On Fri, 29 Dec 2000, Andi Kleen wrote:

 On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
 Date: Thu, 28 Dec 2000 23:58:36 +0100
 From: Andi Kleen [EMAIL PROTECTED]
  
 Why exactly a power of two ? To get rid of ->index ? 
  
  To make things like "page - mem_map" et al. use shifts instead of
  expensive multiplies...
 
 I thought that is what ->index is for ? 

No. "index" only gives the virtual index.

"page - mem_map" is how you get the _physical_ index in the zone in
question, which is common for physical translations (ie "pte_page()",
"page_to_virt()" or "page_to_phys()")

 Also gcc seems to be already quite clever at dividing through small
 integers, e.g. using mul and shift and sub, so it may not be even worth to reach
 for a real power-of-two. 

Look at the code - it's a big multiply to do a divide by 68 or similar.
Quite expensive.
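
To make the cost difference concrete, here is a sketch of the two index computations behind "page - mem_map". It assumes a 64-byte element for the power-of-two case and the 68-byte element mentioned above, and it shows one way an exact division by 68 can be expanded (shift, then multiply by a modular inverse). What a given gcc version actually emits may differ, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset -> element index when the element is 64 bytes: one shift. */
static uint32_t index_pow2(uint32_t byte_off)
{
    return byte_off >> 6;
}

/* Byte offset -> element index when the element is 68 bytes (4 * 17).
 * The offset is known to be an exact multiple of 68, so: shift out the
 * factor 4, then multiply by the inverse of 17 modulo 2^32, which is
 * 0xF0F0F0F1 (17 * 0xF0F0F0F1 == 1 modulo 2^32). */
static uint32_t index_68(uint32_t byte_off)
{
    assert(byte_off % 68 == 0);
    return (byte_off >> 2) * UINT32_C(0xF0F0F0F1);
}
```

Both functions return the same index for matching offsets; the second one simply needs a full 32-bit multiply where the first needs a single shift.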

Doing "page->address - TASK_SIZE" on x86 for the non-highmem case would
probably be faster.

Linus




Re: test13-pre5

2000-12-28 Thread Albert Cranford

Simply executing
 *p++ = htonl(fl->fl_pid);
before 
 start = loff_t_to_s64(fl->fl_start);
also works.
Later,
Albert

Linus Torvalds wrote:
 
 On Fri, 29 Dec 2000, Stefan Traby wrote:
  On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
 
   Too bad. Maybe somebody should tell gcc maintainers about programmers that
   know more than the compiler again.
 
  I know that {p,}gcc-2.95.2{,.1} are not officially supported.
 
 Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
 comfortable with it, it does count as "supported" even if it has known
 problems with "long long". pgcc isn't.
 
  Did you know that it's impossible to compile nfsv4 because of
  register allocation problems with long long since (long long) month ?
 
 lockd v4 (for NFS v3), I assume.
 
 No, I wasn't aware of this particular bug.
 
  The following does not hurt, it's just a fix for a broken
  compiler:
 
 Ugh, that's ugly.
 
 Can you test if it is sufficient to just simplify the math a bit, instead
 of uglifying that function more? The nlm4_encode_lock() function already
 tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
 be ok to just re-code the function to not do the extra "loff_t_to_s64()"
 stuff, and simplify it enough that the compiler will be happy to compile
 the simpler function. Something along the lines of
 
         if (.. NLM4_OFFSET_MAX tests ..)
                 ..
 
         *p++ = htonl(fl->fl_pid);
 
         start = fl->fl_start;
         len = fl->fl_end - start;
         if (fl->fl_end == OFFSET_MAX)
                 len = 0;
 
         p = xdr_encode_hyper(p, start);
         p = xdr_encode_hyper(p, len);
 
         return p;
 
 Where it tries to minimize the liveness of the 64-bit values, and tries to
 avoid extra complications.
 
 Linus
 

-- 
Albert Cranford Deerfield Beach FL USA
[EMAIL PROTECTED]