Re: test13-pre5
Followup to: <[EMAIL PROTECTED]>
By author: Geert Uytterhoeven <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
>
> What about defining new types for this? Like e.g. `x8', being `u8' on platforms
> where that's OK, and `u32' on platforms where that's more efficient?
>

You may just want to look at how C99 handles this using <stdint.h>;
stdint.h defines types of the following format:

    int, uint       ... signed/unsigned
    <size>          ... exact size
    _least<size>    ... no smaller than
    _fast<size>     ... no smaller than, and efficient
    _t

E.g. uint32_t, int_least64_t, uint_fast8_t (the latter could easily be a
32-bit type, for example.)

In addition, constructor macros are defined, as well as (u)intmax_t and
(u)intptr_t, which are defined as the largest possible integer and an
integer large enough to hold a (void *), respectively. In other words:

    (void *)(uintptr_t)(void *)foo == (void *)foo

    -hpa
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
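As a small illustration of the stdint.h naming scheme described in this message, here is a minimal sketch in plain C99. The helper names (fast8_bits, pointer_roundtrip_ok, big_constant) are made up for this example and do not come from the thread or from any kernel header; uintptr_t is formally optional in C99, so the sketch assumes a platform that provides it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width in bits of the "fast" 8-bit unsigned type on this platform.
 * C99 only promises at least 8 bits; 32 or 64 would also conform. */
size_t fast8_bits(void)
{
    size_t bits = sizeof(uint_fast8_t) * CHAR_BIT;
    assert(bits >= 8);
    return bits;
}

/* Round-trip a pointer through uintptr_t, mirroring
 * (void *)(uintptr_t)(void *)foo == (void *)foo from the message. */
int pointer_roundtrip_ok(void *foo)
{
    uintptr_t bits = (uintptr_t)foo;
    return (void *)bits == foo;
}

/* Constructor macro: UINT64_C() builds a constant of type uint_least64_t. */
uint_least64_t big_constant(void)
{
    return UINT64_C(0x100000000);
}
```

Calling fast8_bits() on different machines is a quick way to see whether the platform's uint_fast8_t really is wider than a byte, which is the property the proposed `x8' type is after.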
Re: test13-pre5
On Sun, Dec 31, 2000 at 11:15:51AM -0800, Linus Torvalds wrote:
> In article <[EMAIL PROTECTED]>,
> Matti Aarnio <[EMAIL PROTECTED]> wrote:
> >
> > Actually nothing SMP specific in that problem sphere.
> > Alpha has load-locked/store-conditional pair for
> > this type of memory accesses to automatically detect,
> > and (conditionally) restart the operation - to form
> > classical ``locked-read-modify-write'' operations.
>
> Sure, we could make the older alphas use ldl_l stl_c for byte accesses,
> but if you thought byte accesses on those machines were kind-of slow
> before, just WAIT until that happens.

The older Alphas would just typedef x8/x16 (or granular_u8, granular_u16 or
whatever it is called) to u32 and be the same as today. Just most other boxes
would benefit.

This actually all assumes that gcc really uses the byte instructions for byte
stores in structures, which is to be determined.

-Andi
Re: test13-pre5
In article <[EMAIL PROTECTED]>,
Matti Aarnio <[EMAIL PROTECTED]> wrote:
>
> Actually nothing SMP specific in that problem sphere.
> Alpha has load-locked/store-conditional pair for
> this type of memory accesses to automatically detect,
> and (conditionally) restart the operation - to form
> classical ``locked-read-modify-write'' operations.

Sure, we could make the older alphas use ldl_l stl_c for byte accesses,
but if you thought byte accesses on those machines were kind-of slow
before, just WAIT until that happens.

Old alpha machines (the same ones that would need this code) were HORRIBLE
at ldl_l<->stl_c: they go out all the way to the bus to set the lock. So
suddenly your every byte access ends up being a few hundred cycles!

So ldl_l/stl_c is not the answer. It would work, but it would be so slow
that you'd be a lot better off not doing it.

I think they fixed ldl/stc later on (so that it only sets a bit locally
that gets cleared by the cache coherency protocol), but as later alphas
have the byte accesses anyway that doesn't matter here. The faster ldl/stc
makes for much faster spinlocks on newer alphas, though.

        Linus
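For readers who have not seen the load-locked/store-conditional idea spelled out, the following is a rough sketch of a byte store built as a retry loop over a 32-bit word. It uses C11 compare-and-swap from <stdatomic.h> as a portable stand-in, not the actual Alpha ldl_l/stl_c instructions, and the function name store_byte_cas is invented for this example. It only shows the shape of the loop; it says nothing about the cost on old Alphas, which is the objection raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Replace one byte inside a 32-bit word without disturbing its neighbours.
 * shift selects the byte (0, 8, 16 or 24). The loop retries whenever some
 * other writer changed the word between our read and our update. */
void store_byte_cas(_Atomic uint32_t *w, unsigned shift, uint8_t val)
{
    assert(shift <= 24 && shift % 8 == 0);

    uint32_t old = atomic_load(w);
    uint32_t desired;
    do {
        desired = (old & ~((uint32_t)0xff << shift))
                | ((uint32_t)val << shift);
        /* On failure, old is refreshed with the current value of *w. */
    } while (!atomic_compare_exchange_weak(w, &old, desired));
}
```

On architectures with LL/SC a compiler will typically lower this loop to a load-locked/store-conditional pair, so every byte store pays for at least one such round trip.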
Re: test13-pre5
On Sun, Dec 31, 2000 at 06:36:50PM +0100, Andi Kleen wrote:
> AFAIK alpha has byte instructions now.

See other post. Only from ev6 (at least as far as gcc is concerned). I have a
userspace testcase here (it was originally an obscure alpha userspace MM
corruption bug report that I sorted out some time ago) that works only
when compiled for ev6 because it needs `short' granularity (not even byte
granularity).

Andrea
Re: test13-pre5
On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
> The alpha systems I remember this problem on were all [..]

Yes, the granularity issue has nothing to do with SMP (with a preemptive kernel
it can trigger even without interrupts involved in the code).

Also, CONFIG_SPACE_EFFICIENT does not look necessary. The x8 name is confusing
IMHO (when I read `8' I expect 8 bits only, the x isn't explicit enough). But by
using a better name we could save some bytes on alpha ev6 and x86. Something
like granular_char/granular_short/granular_int looks nicer. For the generic
64bit cpu they need to be _unconditionally_ defined to `long'.

BTW, only old chips (ev[45]) don't provide byte granularity. In fact a linux
kernel compiled for ev6 can handle byte granularity also on alpha (it uses
-mcpu=ev6).

alpha reference manual 5.2.2:

    [..] For each region, an implementation must support aligned quadword
    access and may optionally support aligned longword access or byte access.
    If byte access is supported in a region, aligned word access and aligned
    longword access are also supported. [..]

21264hrm:

    [..] The 21264-generated external references to memory space are always
    of a fixed 64-byte size, though the internal access granularity is byte,
    word, longword, or quad-word. [..]

Andrea
Re: test13-pre5
On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
> On Sun, 31 Dec 2000, Andi Kleen wrote:
> >
> > Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> > embedded systems, where you could trade a bit of CPU for less memory overhead
> > even on systems where u8 is slow and atomicity doesn't come into play
> > because it's UP anyways.
>
> UP has nothing to do with it.
> The alpha systems I remember this problem on were all SMP.

    Actually nothing SMP specific in that problem sphere.
    Alpha has load-locked/store-conditional pair for
    this type of memory accesses to automatically detect,
    and (conditionally) restart the operation - to form
    classical ``locked-read-modify-write'' operations.

    In what situations the compiler will use those
    instructions, that I don't know. Volatiles, at the very
    least, use them. Will closely packed bytes be processed
    with it without them being volatiles? How about
    bitfields?

    Newer Alphas have byte/short load/store instructions,
    so things really aren't that straight-forward...

> I don't think it's a good idea.
>
>         Linus

/Matti Aarnio
Re: test13-pre5
On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
>
>
> On Sun, 31 Dec 2000, Andi Kleen wrote:
> >
> > Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> > embedded systems, where you could trade a bit of CPU for less memory overhead
> > even on systems where u8 is slow and atomicity doesn't come into play
> > because it's UP anyways.
>
> UP has nothing to do with it.
>
> The alpha systems I remember this problem on were all SMP.

[...]

I just checked all architecture manuals I could lay my hands on (sparcv9,
ppc32, mips r4400, parisc 1.1, alpha, sh is somewhere in storage but as I
remember it has it too) and they all seem to have at least store byte and
mostly store half word instructions.

>
> Imagine an architecture where you need to do a
>
>     load_32()
>     mask-and-insert-byte
>     store_32()

iirc the Alpha guys found out that they couldn't drive half of the available
devices without byte store, and since then nobody has repeated that mistake @)

>
> I don't think it's a good idea.

I don't see it. Just define x8 to u32 on old alpha and let most other
architectures be happy.

-Andi
Re: test13-pre5
On Sun, Dec 31, 2000 at 09:27:23AM -0800, Linus Torvalds wrote:
>
>
> On Sun, 31 Dec 2000, Andi Kleen wrote:
> >
> > Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> > embedded systems, where you could trade a bit of CPU for less memory overhead
> > even on systems where u8 is slow and atomicity doesn't come into play
> > because it's UP anyways.
>
> UP has nothing to do with it.
>
> The alpha systems I remember this problem on were all SMP.

AFAIK alpha has byte instructions now.

-Andi
Re: test13-pre5
On Sun, 31 Dec 2000, Andi Kleen wrote:
>
> Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
> embedded systems, where you could trade a bit of CPU for less memory overhead
> even on systems where u8 is slow and atomicity doesn't come into play
> because it's UP anyways.

UP has nothing to do with it.

The alpha systems I remember this problem on were all SMP.

Imagine an architecture where you need to do a

    load_32()
    mask-and-insert-byte
    store_32()

and imagine that an interrupt comes in:

    load_32()
    mask-and-insert-byte
        * INTERRUPT *
        load_32()
        mask-and-insert-ANOTHER-byte
        store_32()
        interrupt return
    store_32()

and notice how the value written by the interrupt is gone, gone, gone,
even though it was to a completely different byte.

Now, imagine that the first byte is the "age", and imagine that the thing
the interrupt tries to update is "flags".

Yes, you're screwed.

I don't think it's a good idea.

        Linus
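The interleaving described in this message can be replayed deterministically in ordinary C. The sketch below is only a single-threaded simulation under stated assumptions: the field layout (age in bits 0-7, flags in bits 8-15), the function names, and the trick of calling the "interrupt" as a callback between the load and the store are all invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit word shared by two unrelated byte fields (assumed layout). */
static uint32_t word;

#define AGE_SHIFT   0u
#define FLAGS_SHIFT 8u

/* Byte store on a machine without byte instructions:
 * load_32(), mask-and-insert-byte, store_32().
 * If irq is non-NULL it runs between the load and the store,
 * standing in for an interrupt arriving at the worst moment. */
static void store_byte_rmw(unsigned shift, uint8_t val, void (*irq)(void))
{
    assert(shift <= 24);
    uint32_t tmp = word;                   /* load_32()           */
    tmp &= ~((uint32_t)0xff << shift);     /* mask ...            */
    tmp |= (uint32_t)val << shift;         /* ... and insert byte */
    if (irq)
        irq();                             /* * INTERRUPT *       */
    word = tmp;                            /* store_32()          */
}

/* The "interrupt handler": updates only the flags byte. */
static void irq_set_flags(void)
{
    store_byte_rmw(FLAGS_SHIFT, 0x80, 0);
}

/* Flags byte after age is updated while the interrupt fires mid-update. */
uint8_t flags_after_race(void)
{
    word = 0;
    store_byte_rmw(AGE_SHIFT, 5, irq_set_flags);
    return (uint8_t)(word >> FLAGS_SHIFT);
}

/* The same two updates, strictly one after the other. */
uint8_t flags_without_race(void)
{
    word = 0;
    store_byte_rmw(AGE_SHIFT, 5, 0);
    irq_set_flags();
    return (uint8_t)(word >> FLAGS_SHIFT);
}
```

flags_after_race() returns 0 because the outer store_32() writes back a word that was loaded before the handler ran, while flags_without_race() returns 0x80. The age byte is 5 in both cases, which is why the bug is easy to miss when only looking at the field being updated.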
Re: test13-pre5
On Sat, Dec 30, 2000 at 02:24:06PM +0100, Geert Uytterhoeven wrote:
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
> > On Thu, 28 Dec 2000, Andi Kleen wrote:
> > > - Instead of having a zone pointer mask use a 8 or 16 byte index into a
> > >   zone table. On a modern CPU it is much cheaper to do the and/shifts than
> > >   to do even a single cache miss during page aging. On a lot of systems
> > >   that zone index could be hardcoded to 0 anyways, giving better code.
> > > - Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
> > >   has the same swapping algorithm even only uses 8bit)
> >
> > This would be good, but can be hard.
> >
> > FreeBSD doesn't try to be portable any more, but Linux does, and there are
> > architectures where 8- and 16-bit accesses aren't atomic but have to be
> > done with read-modify-write cycles.
> >
> > And even for fields like "age", where we don't care whether the age itself
> > is 100% accurate, we _do_ care that the fields close-by don't get strange
> > effects from updating "age". We used to have exactly this problem on alpha
> > back in the 2.1.x timeframe.
> >
> > This is why a lot of fields are 32-bit, even though we wouldn't need more
> > than 8 or 16 bits of them.
>
> What about defining new types for this? Like e.g. `x8', being `u8' on platforms
> where that's OK, and `u32' on platforms where that's more efficient?

Sounds good. It could also be controlled by a CONFIG_SPACE_EFFICIENT for
embedded systems, where you could trade a bit of CPU for less memory overhead
even on systems where u8 is slow and atomicity doesn't come into play
because it's UP anyways.

The only problem I see is that when the programmer was wrong about the possible
range (which sometimes happens) then it could mysteriously work on some
machines and fail on others. This is already the case e.g. with atomic_t,
which is shorter than 32bit e.g. on sparc32, so it is probably not too big a
problem.

-Andi

/* asm/types.h for a random 32bit machine with no byte access */

#if defined(CONFIG_SPACE_EFFICIENT) && !defined(CONFIG_SMP)
typedef __u8 x8;
typedef __u16 x16;
typedef __u32 x32;
#else
typedef __u32 x8;
typedef __u32 x16;
typedef __u32 x32;
#endif

-Andi
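To make the trade-off in this proposal concrete, here is a userspace sketch of both effects mentioned above: the space saved when `x8'/`x16' are really narrow, and the range surprise when a value does not fit. NO_BYTE_ACCESS is a made-up macro standing in for "old Alpha" and is not a kernel config option; struct small_fields and the helper functions are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NO_BYTE_ACCESS is a hypothetical switch, not a real kernel option. */
#ifdef NO_BYTE_ACCESS
typedef uint32_t x8;
typedef uint32_t x16;
#else
typedef uint8_t  x8;
typedef uint16_t x16;
#endif

/* Hypothetical group of small per-page fields. */
struct small_fields {
    x16 age;
    x8  zone_index;
    x8  misc;
};

/* 4 bytes with narrow types on common ABIs, 12 with the u32 fallback. */
size_t small_fields_size(void)
{
    assert(sizeof(struct small_fields) <= 3 * sizeof(uint32_t));
    return sizeof(struct small_fields);
}

/* Store a value that may not fit: it is truncated where x8 is 8 bits
 * wide and preserved where x8 falls back to 32 bits. */
unsigned store_in_x8(unsigned v)
{
    x8 field = (x8)v;
    return field;
}
```

Built without NO_BYTE_ACCESS, store_in_x8(300) yields 44; built with it, the same call yields 300. That is the "works on some machines, fails on others" case the message warns about.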
Re: test13-pre5
On Thu, 28 Dec 2000, Linus Torvalds wrote:
> On Thu, 28 Dec 2000, Andi Kleen wrote:
> > - Instead of having a zone pointer mask use a 8 or 16 byte index into a
> >   zone table. On a modern CPU it is much cheaper to do the and/shifts than
> >   to do even a single cache miss during page aging. On a lot of systems
> >   that zone index could be hardcoded to 0 anyways, giving better code.
> > - Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
> >   has the same swapping algorithm even only uses 8bit)
>
> This would be good, but can be hard.
>
> FreeBSD doesn't try to be portable any more, but Linux does, and there are
> architectures where 8- and 16-bit accesses aren't atomic but have to be
> done with read-modify-write cycles.
>
> And even for fields like "age", where we don't care whether the age itself
> is 100% accurate, we _do_ care that the fields close-by don't get strange
> effects from updating "age". We used to have exactly this problem on alpha
> back in the 2.1.x timeframe.
>
> This is why a lot of fields are 32-bit, even though we wouldn't need more
> than 8 or 16 bits of them.

What about defining new types for this? Like e.g. `x8', being `u8' on platforms
where that's OK, and `u32' on platforms where that's more efficient?

Gr{oetje,eeting}s,

        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
        -- Linus Torvalds
Re: test13-pre5 + char-major-145??
On Fri, 29 Dec 2000 14:07:57 -0700,
Frank Jacobberger <[EMAIL PROTECTED]> wrote:
>modprobe: Can't locate module char-major-145
>
>From /usr/src/linux/Documentation/devices.txt
>
>10 char        Non-serial mice, misc features
>               145 = /dev/hfmodem   Soundcard shortwave modem control {2.6}

That is major 10, minor 145. Search /145 *char/ to find char-major-145.
Re: test13-pre5
On Fri, 29 Dec 2000, David S. Miller wrote:
>
>      For my development testing, I'm running a _heavily_ hacked
>    kernel. One of these hacks is to pull the wait_queue_head out of
>    struct page; the waitq-heads are in a separate allocated area of
>    memory, with a waitq-head pointer embedded in the page structure
>    (allocated/initialised in free_area_init_core()). This gives a
>    page structure of 60bytes, giving me one free double-word to play
>    with (which I'm using as a pointer to a release function).
>
> Not something like those damn Solaris turnstiles, no please

If you want to have a release function, please just use "page->mapping",
which gives you much more, including memory pressure indicators etc.

Now _that_ can be useful for doing things like slab caches.

        Linus
Re: test13-pre5
   Date: Fri, 29 Dec 2000 15:46:22 + (GMT)
   From: Mark Hemment <[EMAIL PROTECTED]>

     For my development testing, I'm running a _heavily_ hacked
   kernel. One of these hacks is to pull the wait_queue_head out of
   struct page; the waitq-heads are in a separate allocated area of
   memory, with a waitq-head pointer embedded in the page structure
   (allocated/initialised in free_area_init_core()). This gives a
   page structure of 60bytes, giving me one free double-word to play
   with (which I'm using as a pointer to a release function).

Not something like those damn Solaris turnstiles, no please

Later,
David S. Miller
[EMAIL PROTECTED]
Re: test13-pre5 + char-major-145??
In article <[EMAIL PROTECTED]> you wrote:
> What may be calling this? Any advice where to go ferreting?

Somebody may try to open the device file.

Greetings
Bernd
Re: test13-pre5
On Thu, Dec 28, 2000 at 12:25:23PM -0800, Linus Torvalds wrote:

>  - pre5:
>    - NIIBE Yutaka: SuperH update
>    - Geert Uytterhoeven: m68k update
>    - David Miller: TCP RTO calc fix, UDP multicast fix etc
>    - Duncan Laurie: ServerWorks PIRQ routing definition.
>    - mm PageDirty cleanups, added sanity checks, and don't lose the bit.

I just noticed this (playing with some other stuff), but ext2 as a module is
currently broken:

$ make INSTALL_MOD_PATH=/tmp/foo modules_install
...
if [ -r System.map ]; then /sbin/depmod -ae -F System.map -b /tmp/foo -r 2.4.0-test12; fi
depmod: *** Unresolved symbols in /tmp/foo/lib/modules/2.4.0-test12/kernel/fs/ext2/ext2.o
depmod:         buffer_insert_inode_queue
depmod:         fsync_inode_buffers

I tried the following locally and it fixes it.

--
--- fs/Makefile.orig    Fri Dec 29 10:35:50 2000
+++ fs/Makefile Fri Dec 29 10:36:06 2000
@@ -7,7 +7,7 @@
 
 O_TARGET := fs.o
 
-export-objs := filesystems.o
+export-objs := filesystems.o buffer.o
 mod-subdirs := nls
 
 obj-y := open.o read_write.o devices.o file_table.o buffer.o \
--- fs/buffer.c.orig    Fri Dec 29 10:33:21 2000
+++ fs/buffer.c Fri Dec 29 10:35:46 2000
@@ -29,6 +29,7 @@
 /* async buffer flushing, 1999 Andrea Arcangeli <[EMAIL PROTECTED]> */
 
 #include
+#include
 #include
 #include
 #include
@@ -579,6 +580,8 @@
        spin_unlock(_list_lock);
 }
 
+EXPORT_SYMBOL(buffer_insert_inode_queue);
+
 /* The caller must have the lru_list lock before calling the
    remove_inode_queue functions. */
 static void __remove_inode_queue(struct buffer_head *bh)
@@ -900,6 +903,7 @@
        return err2;
 }
 
+EXPORT_SYMBOL(fsync_inode_buffers);
 
 /*
  * osync is designed to support O_SYNC io. It waits synchronously for
--

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Re: test13-pre5
On Fri, 29 Dec 2000, Tim Wright wrote:
> Yes, this is a very important point if we ever want to make serious use
> of large memory machines on ia32. We ran into this with DYNIX/ptx when the
> P6 added 36-bit physical addressing. Conserving KVA (kernel virtual address
> space) became a very high priority. Eventually, we had to add code to play
> silly segment games and "magically" materialize and dematerialize a 4GB
> kernel virtual address space instead of the 1GB. This only comes into play
> with really large amounts of memory, and is almost certainly not worth the
> agony of implementation on Linux, but we'll need to be careful elsewhere to
> conserve it as much as possible.

Indeed.

I'm compiling my kernels with 2GB virtual. Not as I want more NORMAL
pages in the page cache (HIGH memory is fine), but as I need NORMAL pages
for kernel data/structures (memory allocated from slab-caches) which need
to be constantly mapped in.

Mark
Re: test13-pre5
On Fri, Dec 29, 2000 at 03:46:22PM +0000, Mark Hemment wrote:
> Note, for those of us running on 32bit with lots of physical memory, the
> available virtual address-space is of major consideration. Reducing the
> size of the page structure is more than just reducing cache misses - it
> gives us more virtual to play with...
>
> Mark
>

Yes, this is a very important point if we ever want to make serious use
of large memory machines on ia32. We ran into this with DYNIX/ptx when the
P6 added 36-bit physical addressing. Conserving KVA (kernel virtual address
space) became a very high priority. Eventually, we had to add code to play
silly segment games and "magically" materialize and dematerialize a 4GB
kernel virtual address space instead of the 1GB. This only comes into play
with really large amounts of memory, and is almost certainly not worth the
agony of implementation on Linux, but we'll need to be careful elsewhere to
conserve it as much as possible.

Regards,

Tim

-- 
Tim Wright - [EMAIL PROTECTED] or [EMAIL PROTECTED] or [EMAIL PROTECTED]
IBM Linux Technology Center, Beaverton, Oregon
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI
Re: test13-pre5
Hi,

On Thu, 28 Dec 2000, David S. Miller wrote:
>    Date: Thu, 28 Dec 2000 23:17:22 +0100
>    From: Andi Kleen <[EMAIL PROTECTED]>
>
>    Would you consider patches for any of these points?
>
> To me it seems just as important to make sure struct page is
> a power of 2 in size, with the waitq debugging turned off this
> is true for both 32-bit and 64-bit hosts last time I checked.

Checking test11 (which I'm running here), even with waitq debugging
turned off, on 32-bit (IA32) the struct page is 68 bytes (since
the "age" member was re-introduced a while back).

For my development testing, I'm running a _heavily_ hacked kernel. One
of these hacks is to pull the wait_queue_head out of struct page; the
waitq-heads are in a separate allocated area of memory, with a waitq-head
pointer embedded in the page structure (allocated/initialised in
free_area_init_core()). This gives a page structure of 60 bytes, giving me
one free double-word to play with (which I'm using as a pointer to a
release function).

In fact, there doesn't need to be a waitq-head allocated for each page
structure - they can share; with a performance overhead on a false
wakeup in __wait_on_page().

Note, for those of us running on 32bit with lots of physical memory, the
available virtual address-space is of major consideration. Reducing the
size of the page structure is more than just reducing cache misses - it
gives us more virtual to play with...

Mark
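A minimal userspace sketch of the shared wait-queue idea described in the message above, not Mark's actual patch. It assumes the heads live in a fixed-size table and that a page finds its head by hashing its index in the page array; `waitq_table`, `WAITQ_TABLE_SIZE`, `struct page_sketch` and `page_waitq()` are hypothetical names. Because several pages map to one head, a woken waiter would have to re-check its own page before proceeding, which is the false-wakeup cost mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define WAITQ_TABLE_SIZE 256	/* assumed: a power of two, far fewer heads than pages */

struct waitq_head {		/* stand-in for the kernel's wait_queue_head_t */
	int nr_waiters;
};

struct page_sketch {		/* hypothetical, heavily reduced struct page */
	unsigned long flags;
};

static struct waitq_head waitq_table[WAITQ_TABLE_SIZE];

/* Map a page to its shared head by hashing the page's index in the array. */
static struct waitq_head *page_waitq(const struct page_sketch *mem_map,
				     const struct page_sketch *page)
{
	size_t idx;

	assert(page >= mem_map);
	idx = (size_t)(page - mem_map);

	/* Pages whose indices differ by a multiple of the table size share a head. */
	return &waitq_table[idx & (WAITQ_TABLE_SIZE - 1)];
}
```

With this layout the page structure needs no per-page head at all; whether the lookup cost is acceptable is exactly the trade-off the thread is arguing about.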
Re: test13-pre5
On Fri, Dec 29, 2000 at 01:06:30AM +, Albert Cranford wrote:
> Simply executing
>	*p++ = htonl(fl->fl_pid);
> before
>	start = loff_t_to_s64(fl->fl_start);
> also works.

Yes, confirmed.
Since you're located in Florida I vote for this and
I hope that Linus will elect it. :)

--- linux/fs/lockd/xdr4.c.orig	Fri Dec 29 01:35:32 2000
+++ linux/fs/lockd/xdr4.c	Fri Dec 29 14:56:07 2000
@@ -167,13 +167,13 @@
 	 || (fl->fl_end > NLM4_OFFSET_MAX && fl->fl_end != OFFSET_MAX))
 		return NULL;

+	*p++ = htonl(fl->fl_pid);
 	start = loff_t_to_s64(fl->fl_start);
 	if (fl->fl_end == OFFSET_MAX)
 		len = 0;
 	else
 		len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1);

-	*p++ = htonl(fl->fl_pid);
 	p = xdr_encode_hyper(p, start);
 	p = xdr_encode_hyper(p, len);

-- 

  ciao -
    Stefan

"export PS1="rms# " "

Stefan Traby             Linux/ia32      fax:   +43-3133-6107-9
Mitterlasznitzstr. 13    Linux/alpha     phone: +43-3133-6107-2
8302 Nestelbach          Linux/sparc     http://www.hello-penguin.com
Austria                                  mailto:[EMAIL PROTECTED]
Europe                                   mailto:[EMAIL PROTECTED]
Re: test13-pre5
Marcelo Tosatti wrote:
>
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
>
> >  - make "SetPageDirty()" do something like
> >
> >	if (!test_and_set(PG_dirty, &page->flags)) {
> >		spin_lock(&page_cache_lock);
> >		list_del(page->list);
> >		list_add(page->list, page->mapping->dirty_pages);
> >		spin_unlock(&page_cache_lock);
> >	}
>
> We also want to move the page to the per-address-space clean list in
> ClearPageDirty I suppose.

I'd like to suggest taking this opportunity to regularize the notation
by going to set_page_dirty/clear_page_dirty which will call
SetPageDirty/ClearPageDirty.

--
Daniel
Re: test13-pre5
   Date: Fri, 29 Dec 2000 15:46:22 +0000 (GMT)
   From: Mark Hemment <[EMAIL PROTECTED]>

   For my development testing, I'm running a _heavily_ hacked
   kernel. One of these hacks is to pull the wait_queue_head out of
   struct page; the waitq-heads are in a separate allocated area of
   memory, with a waitq-head pointer embedded in the page structure
   (allocated/initialised in free_area_init_core()). This gives a
   page structure of 60bytes, giving me one free double-word to play
   with (which I'm using as a pointer to a release function).

Not something like those damn Solaris turnstiles, no please

Later,
David S. Miller
[EMAIL PROTECTED]
Re: test13-pre5
On Fri, 29 Dec 2000, David S. Miller wrote:
>
>    For my development testing, I'm running a _heavily_ hacked
>    kernel. One of these hacks is to pull the wait_queue_head out of
>    struct page; the waitq-heads are in a separate allocated area of
>    memory, with a waitq-head pointer embedded in the page structure
>    (allocated/initialised in free_area_init_core()). This gives a
>    page structure of 60bytes, giving me one free double-word to play
>    with (which I'm using as a pointer to a release function).
>
> Not something like those damn Solaris turnstiles, no please

If you want to have a release function, please just use "page->mapping",
which gives you much more, including memory pressure indicators etc.

Now _that_ can be useful for doing things like slab caches.

		Linus
Re: test13-pre5 + char-major-145??
On Fri, 29 Dec 2000 14:07:57 -0700,
Frank Jacobberger [EMAIL PROTECTED] wrote:
>modprobe: Can't locate module char-major-145
>
>From /usr/src/linux/Documentation/devices.txt
>
> 10 char	Non-serial mice, misc features
>		145 = /dev/hfmodem	Soundcard shortwave modem control {2.6}

That is major 10, minor 145. Search /145 *char/ to find char-major-145.
Re: test13-pre5
Simply executing
	*p++ = htonl(fl->fl_pid);
before
	start = loff_t_to_s64(fl->fl_start);
also works.
Later,
Albert

Linus Torvalds wrote:
>
> On Fri, 29 Dec 2000, Stefan Traby wrote:
> > On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
> >
> > > Too bad. Maybe somebody should tell gcc maintainers about programmers that
> > > know more than the compiler again.
> >
> > I know that {p,}gcc-2.95.2{,.1} are not officially supported.
>
> Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
> comfortable with it, it does count as "supported" even if it has known
> problems with "long long". pgcc isn't.
>
> > Did you know that it's impossible to compile nfsv4 because of
> > register allocation problems with long long since (long long) month ?
>
> lockd v4 (for NFS v3), I assume.
>
> No, I wasn't aware of this particular bug.
>
> > The following does not hurt, it's just a fix for a broken
> > compiler:
>
> Ugh, that's ugly.
>
> Can you test if it is sufficient to just simplify the math a bit, instead
> of uglifying that function more? The nlm4_encode_lock() function already
> tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
> be ok to just re-code the function to not do the extra "loff_t_to_s64()"
> stuff, and simplify it enough that the compiler will be happy to compile
> the simpler function. Something along the lines of
>
>	if (.. NLM4_OFFSET_MAX tests ..)
>		..
>
>	*p++ = htonl(fl->fl_pid);
>
>	start = fl->fl_start;
>	len = fl->fl_end - start;
>	if (fl->fl_end == OFFSET_MAX)
>		len = 0;
>
>	p = xdr_encode_hyper(p, start);
>	p = xdr_encode_hyper(p, len);
>
>	return p;
>
> Where it tries to minimize the liveness of the 64-bit values, and tries to
> avoid extra complications.
>
>		Linus

-- 
Albert Cranford Deerfield Beach FL USA
[EMAIL PROTECTED]
Re: test13-pre5
On Fri, 29 Dec 2000, Stefan Traby wrote:
> On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
>
> > Too bad. Maybe somebody should tell gcc maintainers about programmers that
> > know more than the compiler again.
>
> I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
comfortable with it, it does count as "supported" even if it has known
problems with "long long". pgcc isn't.

> Did you know that it's impossible to compile nfsv4 because of
> register allocation problems with long long since (long long) month ?

lockd v4 (for NFS v3), I assume.

No, I wasn't aware of this particular bug.

> The following does not hurt, it's just a fix for a broken
> compiler:

Ugh, that's ugly.

Can you test if it is sufficient to just simplify the math a bit, instead
of uglifying that function more? The nlm4_encode_lock() function already
tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
be ok to just re-code the function to not do the extra "loff_t_to_s64()"
stuff, and simplify it enough that the compiler will be happy to compile
the simpler function. Something along the lines of

	if (.. NLM4_OFFSET_MAX tests ..)
		..

	*p++ = htonl(fl->fl_pid);

	start = fl->fl_start;
	len = fl->fl_end - start;
	if (fl->fl_end == OFFSET_MAX)
		len = 0;

	p = xdr_encode_hyper(p, start);
	p = xdr_encode_hyper(p, len);

	return p;

Where it tries to minimize the liveness of the 64-bit values, and tries to
avoid extra complications.

		Linus
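The (start, len) arithmetic from the outline above can be tried in isolation. This is a userspace sketch under assumptions, not the lockd code: `struct lock_range`, `range_to_wire()` and `OFFSET_MAX_SKETCH` are hypothetical stand-ins. It treats the end offset as inclusive and therefore keeps the `+ 1` that the code quoted in Stefan's diff uses, which the outline above leaves out. It says nothing about whether the simplified form avoids the compiler problem.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_MAX_SKETCH INT64_MAX	/* stand-in for the kernel's OFFSET_MAX */

struct lock_range {		/* hypothetical: only the two fields the encoder reads */
	int64_t start;
	int64_t end;		/* inclusive; OFFSET_MAX_SKETCH means "to end of file" */
};

/* Compute the (start, len) pair to encode; len == 0 means "to end of file". */
static void range_to_wire(const struct lock_range *r, int64_t *start, int64_t *len)
{
	assert(r->start >= 0 && r->end >= r->start);

	*start = r->start;
	if (r->end == OFFSET_MAX_SKETCH)
		*len = 0;
	else
		*len = r->end - r->start + 1;	/* inclusive end, as in the existing code */
}
```

For a lock covering bytes 10 through 19 this yields start 10 and length 10; a lock running to `OFFSET_MAX_SKETCH` yields length 0.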
Re: test13-pre5
On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
>
> We also want to move the page to the per-address-space clean list in
> ClearPageDirty I suppose.

I would actually advise against this.

 - it's ok to have too many pages on the dirty list (think of the dirty
   list as a "these pages _can_ be dirty")
 - whenever we do a ClearPageDirty() we're likely to remove the page from
   the lists altogether, so it's not worth it doing extra work.

The exception, of course, is the actual "filemap_fdatasync()" function,
but that one would probably look something like

	spin_lock(&page_cache_lock);
	while (!list_empty(&mapping->dirty_pages)) {
		struct page *page = list_entry(mapping->dirty_pages.next, struct page, list);

		list_del(&page->list);
		list_add(&page->list, &mapping->clean_pages);

		if (!PageDirty(page))
			continue;
		page_get(page);
		spin_unlock(&page_cache_lock);

		lock_page(page);
		if (PageDirty(page)) {
			ClearPageDirty(page);
			page->mapping->writepage(page);
		}
		UnlockPage(page);
		page_cache_put(page);
		spin_lock(&page_cache_lock);
	}
	spin_unlock(&page_cache_lock);

and again note how we can move it to the clean list early and we don't
have to keep the PageDirty bit 100% in sync with which list it is on. If
somebody marks it dirty later on (and the dirty bit is still set), that
somebody won't move it back to the dirty list (because it noticed that the
dirty bit is already set), but that's ok: as long as we do the
"ClearPageDirty(page);" call before starting the actual writeout, we're
fine.

So the "mapping->dirty_pages" list is maybe not so much a _dirty_ list, as
a "scheduled for writeout" list. Marking the page clean doesn't remove it
from that list - it can happily stay on the list and then when the
writeout is started we'd just skip it.

Ok?

		Linus
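The "scheduled for writeout" reading of the dirty list can be modelled in a few lines of userspace C. This is a single-threaded sketch with no locking and no reference counting, so it only illustrates the list bookkeeping, not the races the real code has to handle. All names ending in `_model`, and the `writes` counter standing in for `->writepage()`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_add(struct list_head *e, struct list_head *h)
{
	e->next = h->next;
	e->prev = h;
	h->next->prev = e;
	h->next = e;
}

struct mapping_model {
	struct list_head clean_pages, dirty_pages;
	int writes;			/* counts simulated writepage() calls */
};

struct page_model {
	struct list_head list;		/* must stay the first member, see the cast below */
	int dirty;
	struct mapping_model *mapping;
};

static void mapping_init(struct mapping_model *m)
{
	list_init(&m->clean_pages);
	list_init(&m->dirty_pages);
	m->writes = 0;
}

static void page_init(struct page_model *p, struct mapping_model *m)
{
	p->dirty = 0;
	p->mapping = m;
	list_add(&p->list, &m->clean_pages);
}

/* Only the clean -> dirty transition moves the page to the dirty list. */
static void set_page_dirty_model(struct page_model *p)
{
	assert(p->mapping != NULL);	/* every page marked dirty needs a mapping */
	if (!p->dirty) {
		p->dirty = 1;
		list_del(&p->list);
		list_add(&p->list, &p->mapping->dirty_pages);
	}
}

/* Clearing the flag leaves the page on whichever list it is on. */
static void clear_page_dirty_model(struct page_model *p) { p->dirty = 0; }

static void fdatasync_model(struct mapping_model *m)
{
	while (!list_empty(&m->dirty_pages)) {
		/* list is the first member, so the cast recovers the page */
		struct page_model *p = (struct page_model *)m->dirty_pages.next;

		list_del(&p->list);
		list_add(&p->list, &m->clean_pages);
		if (!p->dirty)
			continue;	/* cleaned earlier: skip the write */
		p->dirty = 0;
		m->writes++;
	}
}
```

A page that was dirtied and then cleaned before the sync stays on the dirty list, is moved to the clean list by the walk, and is never written.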
Re: test13-pre5
On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:

> Too bad. Maybe somebody should tell gcc maintainers about programmers that
> know more than the compiler again.

I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Did you know that it's impossible to compile nfsv4 because of
register allocation problems with long long since (long long) month ?

The following does not hurt, it's just a fix for a broken
compiler:

--- linux/fs/lockd/xdr4.c.orig	Fri Dec 29 01:35:32 2000
+++ linux/fs/lockd/xdr4.c	Fri Dec 29 01:36:36 2000
@@ -156,7 +156,7 @@
 nlm4_encode_lock(u32 *p, struct nlm_lock *lock)
 {
 	struct file_lock	*fl = &lock->fl;
-	__s64			start, len;
+	volatile __s64		start, len;

 	if (!(p = xdr_encode_string(p, lock->caller))
 	 || !(p = nlm4_encode_fh(p, &lock->fh))

Here is an example without this patch (pgcc-2.95.2.1 this time which is
bug-compatible to gcc-2.95.2.1).

gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 -march=i686 -DMODULE -c -o xdr4.o xdr4.c
xdr4.c: In function `nlm4_encode_lock':
xdr4.c:181: internal error--insn does not satisfy its constraints:
(insn/i 313 585 315 (set (reg:SI 1 %edx)
        (subreg:SI (lshiftrt:DI (reg:DI 0 %eax)
                (const_int 32 [0x20])) 0)) 323 {lshrdi3_const_int_subreg} (nil)
    (nil))
gcc: Internal compiler error: program cpp got fatal signal 13
make[2]: *** [xdr4.o] Error 1
make[2]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs/lockd'
make[1]: *** [_modsubdir_lockd] Error 2
make[1]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs'
make: *** [_mod_fs] Error 2

The question is: Is it worth applying?

-- 

  ciao -
    Stefan

"export PS1="rms# " "

Stefan Traby             Linux/ia32      fax:   +43-3133-6107-9
Mitterlasznitzstr. 13    Linux/alpha     phone: +43-3133-6107-2
8302 Nestelbach          Linux/sparc     http://www.hello-penguin.com
Austria                                  mailto:[EMAIL PROTECTED]
Europe                                   mailto:[EMAIL PROTECTED]
Re: test13-pre5
On Thu, 28 Dec 2000, Linus Torvalds wrote:

>  - make "SetPageDirty()" do something like
>
>	if (!test_and_set(PG_dirty, &page->flags)) {
>		spin_lock(&page_cache_lock);
>		list_del(page->list);
>		list_add(page->list, page->mapping->dirty_pages);
>		spin_unlock(&page_cache_lock);
>	}
>
>    This will require making sure that every place that does a
>    SetPageDirty() will be ok with this (ie double-check that they all have
>    a mapping: right now the free_pte() code in mm/memory.c doesn't care,
>    because it knew that it could mark even anonymous pages dirty and
>    they'd just get ignored.
>  - make a filemap_fdatasync() that walks the dirty pages and does a
>    writepage() on them all and moves them to the clean list.

We also want to move the page to the per-address-space clean list in
ClearPageDirty I suppose.
Re: test13-pre5
On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
>
>
> On Fri, 29 Dec 2000, Andi Kleen wrote:
> >
> > Hopefully all the "goto out" micro optimizations can be taken out then too,
>
> "goto out" often generates much more readable code, so the optimization is
> secondary.

I was more thinking of cases like the scheduler's gotos, which has gotten
rather spaghetti recently. Admittedly classic goto out is often more
readable than many nested if()s with error handling.

>
> > I recently found out that gcc 2.97's block moving pass has the tendency
> > to move the outlined blocks inline again ;)
>
> Too bad. Maybe somebody should tell gcc maintainers about programmers that
> know more than the compiler again.

In x86-64 which relies on 2.97 I'm using __builtin_expect, defined to
likely() and unlikely(), which seems to generate good code.

-Andi
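A short sketch of what wrapping `__builtin_expect` in `likely()` and `unlikely()` can look like. The message above does not show the actual definitions, so the two macros below are an assumption (the `!!` normalises any non-zero value to 1), and `table_get()` is a hypothetical helper that exists only to give the hints something to annotate. It requires GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definitions; the branch hint does not change the result of the test. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical helper: fetch slot i, treating an out-of-range index as rare. */
static int table_get(const int *table, size_t len, size_t i, int *out)
{
	assert(table != NULL && out != NULL);

	if (unlikely(i >= len))
		return -1;	/* cold path: the compiler may move this out of line */

	*out = table[i];	/* hot path stays in the fall-through position */
	return 0;
}
```

The hint only affects code layout and static prediction, so the error handling can stay next to the check it belongs to instead of being moved behind a `goto` by hand.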
Re: test13-pre5
On Thu, Dec 28, 2000 at 03:14:56PM -0800, David S. Miller wrote:
>    Date: Fri, 29 Dec 2000 00:17:21 +0100
>    From: Andi Kleen <[EMAIL PROTECTED]>
>
>    On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
>    > To make things like "page - mem_map" et al. use shifts instead of
>    > expensive multiplies...
>
>    I thought that is what ->index is for ?
>
> It is for the page cache identity Andi... you know,
> page_hash(mapping, index)...

Oops, I confused it with the 2.0 page->map_nr, which did exactly that.
I should have known better. Thanks for correcting this brainfart.

> And the add/sub/shift expansion of a multiply/divide by constant even
> in its most optimal form is often not trivial, it is something on the
> order of 7 instructions with waitq debugging enabled last time I
> checked.

Wonder if it looks better with wq debugging turned off or a compressed
->zone.

-Andi
Re: test13-pre5
On Fri, 29 Dec 2000, Andi Kleen wrote:
>
> Hopefully all the "goto out" micro optimizations can be taken out then too,

"goto out" often generates much more readable code, so the optimization is
secondary.

> I recently found out that gcc 2.97's block moving pass has the tendency
> to move the outlined blocks inline again ;)

Too bad. Maybe somebody should tell gcc maintainers about programmers that
know more than the compiler again.

		Linus
Re: test13-pre5
On Fri, 29 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
> >    Date: Thu, 28 Dec 2000 23:58:36 +0100
> >    From: Andi Kleen <[EMAIL PROTECTED]>
> >
> >    Why exactly a power of two ? To get rid of ->index ?
> >
> > To make things like "page - mem_map" et al. use shifts instead of
> > expensive multiplies...
>
> I thought that is what ->index is for ?

No. "index" only gives the virtual index.

"page - mem_map" is how you get the _physical_ index in the zone in
question, which is common for physical translations (ie "pte_page()",
"page_to_virt()" or "page_to_phys()")

> Also gcc seems to be already quite clever at dividing through small
> integers, e.g. using mul and shift and sub, so it may not be even worth to reach
> for a real power-of-two.

Look at the code - it's a big multiply to do a divide by 68 or similar.
Quite expensive.

Doing "page->address - TASK_SIZE" on x86 for the non-highmem case would
probably be faster.

		Linus
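The cost being argued about comes from C pointer subtraction: `page - mem_map` is the byte distance divided by `sizeof(struct page)`. The sketch below uses two stand-in structs that only reproduce the sizes mentioned in the thread (68 and 64 bytes); the struct and function names are hypothetical. Compiling it with `gcc -O2 -S` and comparing `index68()` with `index64()` is one way to see what the compiler emits for each divide.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the two layouts under discussion; only the sizes matter. */
struct page68 { unsigned char pad[68]; };
struct page64 { unsigned char pad[64]; };

/* Index by pointer subtraction: the byte offset is divided by 68. */
static size_t index68(const struct page68 *base, const struct page68 *p)
{
	assert(p >= base);
	return (size_t)(p - base);
}

/* Same operation on a 64-byte struct: the divide can become a shift by 6. */
static size_t index64(const struct page64 *base, const struct page64 *p)
{
	assert(p >= base);
	return (size_t)(p - base);
}

/* The shift written out by hand; valid only because the size is 1 << 6. */
static size_t index64_shift(const struct page64 *base, const struct page64 *p)
{
	assert(p >= base);
	return (size_t)((const char *)p - (const char *)base) >> 6;
}
```

All three return the same index for the same element; the difference is purely in the instructions needed to get there.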
Re: test13-pre5
   Date: Fri, 29 Dec 2000 00:17:21 +0100
   From: Andi Kleen <[EMAIL PROTECTED]>

   On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
   > To make things like "page - mem_map" et al. use shifts instead of
   > expensive multiplies...

   I thought that is what ->index is for ?

It is for the page cache identity Andi... you know,
page_hash(mapping, index)...

And the add/sub/shift expansion of a multiply/divide by constant even
in its most optimal form is often not trivial, it is something on the
order of 7 instructions with waitq debugging enabled last time I
checked.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: test13-pre5
On Fri, 29 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
> >    Date: Thu, 28 Dec 2000 23:58:36 +0100
> >    From: Andi Kleen <[EMAIL PROTECTED]>
> >
> >    Why exactly a power of two ? To get rid of ->index ?
> >
> > To make things like "page - mem_map" et al. use shifts instead of
> > expensive multiplies...
>
> I thought that is what ->index is for ?

Nope, ->index is there to identify which offset the page has
in ->mapping, read mm/filemap.c::__find_page_nolock() for
more info.

> Also gcc seems to be already quite clever at dividing through
> small integers, e.g. using mul and shift and sub, so it may not
> be even worth to reach for a real power-of-two.
>
> I suspect doing the arithmetics is at least faster than eating the
> cache misses because of ->index.

I'm pretty confident that arithmetic is faster than cache
misses ... but an unlucky size of the page struct will cause
extra cache misses due to misalignment.

regards,

Rik
--
Hollywood goes for world dumbination,
	Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/
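The misalignment point can be put into numbers with a small helper. This is an illustrative calculation only: it assumes the array of structures starts on a cache-line boundary and counts how many lines one element covers. `lines_touched()` is a hypothetical name, and the 32- and 64-byte line sizes are assumed values rather than figures taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of cache lines covered by element `index` of an array of
 * `size`-byte structures whose first element starts on a line boundary.
 */
static size_t lines_touched(size_t size, size_t index, size_t line)
{
	size_t first, last;

	assert(size > 0 && line > 0);
	first = (index * size) / line;			/* line holding the first byte */
	last = (index * size + size - 1) / line;	/* line holding the last byte */
	return last - first + 1;
}
```

With 32-byte lines a 64-byte element always covers 2 lines while a 68-byte element covers 3; with 64-byte lines the counts are 1 and 2. How much that costs in practice depends on which fields are actually touched.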
Re: test13-pre5
On Thu, Dec 28, 2000 at 03:15:01PM -0800, Linus Torvalds wrote:
> > (first number for 32bit, second for 64bit)
> >
> > - Do not compile virtual in when the kernel does not support highmem
> >   (saves 4/8 bytes)
>
> Even on UP, "virtual" often helps. The conversion from "struct page" to
> the linear address is quite common, and if "struct page" isn't a
> power-of-two it gets slow.

Are you sure? Last time I checked gcc did a very good job at optimizing small
divisions with small integers, without using div. It just has to be a good
integer with not too many set bits.

> is 100% accurate, we _do_ care that the fields close-by don't get strange
> effects from updating "age". We used to have exactly this problem on alpha
> back in the 2.1.x timeframe.

When it is shared with a constant field (like zone index) it shouldn't matter,
no ? At worst you can see outdated data, and when the outdated data is
constant all is fine.

> > - flags can be __u32 on 64bit hosts, sharing 64bit with something that
> >   is tolerant to async updates (e.g. the zone table index or the index)
> > - index could be probably u32 instead of unsigned long, saving 4 bytes
> >   on i386
>
> It already _is_ 32-bit on x86.

Oops. It was a typo. I meant to write "saving 4 bytes on 64bit"

> Anyway, I don't want to increase "struct page" in size, but I also don't
> think it's worth micro-optimizing some of these things if the code gets
> harder to maintain (like the partial-word stuff would be).

Ok pity :-/

Hopefully all the "goto out" micro optimizations can be taken out then too,
I recently found out that gcc 2.97's block moving pass has the tendency
to move the outlined blocks inline again ;)

-Andi
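The concern about neighbouring fields can be replayed deterministically. The sketch below is a single-threaded simulation of one interleaving on a machine that has no 16-bit store and therefore rewrites the whole 32-bit word to update "age". The packing (zone index in the high half, age in the low half) and all function names are hypothetical. It shows both sides of the exchange above: a neighbour that changes loses its update, a neighbour that is constant is unaffected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing: zone index in the high 16 bits, age in the low 16. */
static uint32_t pack(uint16_t zone, uint16_t age)
{
	return ((uint32_t)zone << 16) | age;
}

static uint16_t word_zone(uint32_t w) { return (uint16_t)(w >> 16); }
static uint16_t word_age(uint32_t w) { return (uint16_t)(w & 0xffffu); }

/* Setting age without a 16-bit store: merge into a previously loaded word. */
static uint32_t store_age_rmw(uint32_t loaded, uint16_t age)
{
	return (loaded & 0xffff0000u) | age;
}

/*
 * Replay one interleaving: A loads the word, B rewrites the zone field in
 * memory, then A stores its merged word. Returns the final memory contents.
 */
static uint32_t interleave(uint32_t mem, uint16_t b_zone, uint16_t a_age)
{
	uint32_t a_loaded = mem;		/* A: load the full word */

	mem = pack(b_zone, word_age(mem));	/* B: update the zone field */
	assert(word_zone(mem) == b_zone);	/* B's store is visible here */

	mem = store_age_rmw(a_loaded, a_age);	/* A: store from its stale copy */
	return mem;
}
```

Starting from zone 1 and age 5, if B writes zone 2 and A then writes age 6, memory ends up with zone 1 again. If B writes the same zone value that was already there, the stale copy is harmless.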
Re: test13-pre5
On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
>    Date: Thu, 28 Dec 2000 23:58:36 +0100
>    From: Andi Kleen <[EMAIL PROTECTED]>
>
>    Why exactly a power of two ? To get rid of ->index ?
>
> To make things like "page - mem_map" et al. use shifts instead of
> expensive multiplies...

I thought that is what ->index is for ?

Also gcc seems to be already quite clever at dividing through
small integers, e.g. using mul and shift and sub, so it may not
be even worth to reach for a real power-of-two.

I suspect doing the arithmetics is at least faster than eating the
cache misses because of ->index.

-Andi
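The "mul and shift" trick mentioned in the message above can be sketched as follows. This is only an illustration under stated assumptions: a hypothetical 48-byte element size and 32-bit offsets. The names `ELEM_SIZE`, `INV3` and `index_from_offset` are invented for this example, and the instruction sequence a given gcc version actually emits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical element size: 48 = 16 * 3, not a power of two. */
#define ELEM_SIZE 48u

/*
 * 0xAAAAAAAB is the multiplicative inverse of 3 modulo 2^32:
 * 3 * 0xAAAAAAAB == 0x200000001, which truncates to 1 in 32 bits.
 */
#define INV3 0xAAAAAAABu

/*
 * Convert a byte offset that is known to be an exact multiple of
 * ELEM_SIZE into an element index without a divide instruction:
 * shift out the factor 16, then multiply by the inverse of 3.
 */
static uint32_t index_from_offset(uint32_t byte_off)
{
	assert(byte_off % ELEM_SIZE == 0);	/* exact division only */
	return (uint32_t)((byte_off >> 4) * INV3);
}
```

The trick only works because a pointer difference within an array is always an exact multiple of the element size; for arbitrary dividends the compiler has to use a longer sequence.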
Re: test13-pre5
On Thu, 28 Dec 2000, Andi Kleen wrote:
>
> BTW..
>
> The current 2.4 struct page could be already shortened a lot, saving a lot
> of cache.

Not that much, but some.

> (first number for 32bit, second for 64bit)
>
> - Do not compile virtual in when the kernel does not support highmem
>   (saves 4/8 bytes)

Even on UP, "virtual" often helps. The conversion from "struct page" to
the linear address is quite common, and if "struct page" isn't a
power-of-two it gets slow.

> - Instead of having a zone pointer mask use a 8 or 16 byte index into a
>   zone table. On a modern CPU it is much cheaper to do the and/shifts than
>   to do even a single cache miss during page aging. On a lot of systems
>   that zone index could be hardcoded to 0 anyways, giving better code.
> - Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
>   has the same swapping algorithm even only uses 8bit)

This would be good, but can be hard.

FreeBSD doesn't try to be portable any more, but Linux does, and there are
architectures where 8- and 16-bit accesses aren't atomic but have to be
done with read-modify-write cycles.

And even for fields like "age", where we don't care whether the age itself
is 100% accurate, we _do_ care that the fields close-by don't get strange
effects from updating "age". We used to have exactly this problem on alpha
back in the 2.1.x timeframe.

This is why a lot of fields are 32-bit, even though we wouldn't need more
than 8 or 16 bits of them.

> - Remove the waitqueue debugging (obvious @)

Not obvious enough. There are magic things that could be done, like hiding
the wait-queue lock bit in the waitqueue lists themselves etc. That could
be done with some per-architecture magic etc.

> - flags can be __u32 on 64bit hosts, sharing 64bit with something that
>   is tolerant to async updates (e.g. the zone table index or the index)
> - index could be probably u32 instead of unsigned long, saving 4 bytes
>   on i386

It already _is_ 32-bit on x86. Only the LSF patches made it 64-bit. That
never made it into the standard kernel.

Sure, we could make it "u32" and thus force it to be 32-bit even on 64-bit
architectures, but some day somebody will want to have more than 46 bits
of file mappings, and while 46 bits is _huge_ on a 32-bit machine, on a
64-bit one in 5 years it will not be entirely unreasonable.

Anyway, I don't want to increase "struct page" in size, but I also don't
think it's worth micro-optimizing some of these things if the code gets
harder to maintain (like the partial-word stuff would be).

The biggest win by far would come from increasing the page-size, something
we can do even in software. Having a "kernel page size" of 8kB even on x86
would basically cut the overhead in half. As that would also improve some
other things (like having better throughput due to bigger contiguous
chunks), that's something I'd like to see some day.

(And user space wouldn't ever have to know - we could map in "half pages"
aka "hardware pages" without mapping the whole page).

		Linus
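The read-modify-write hazard described in the message above can be replayed by hand. The sketch below is an illustration only: the field layout, the masks and the function names are assumptions made for this example, and the two "CPUs" are simulated by ordering three plain assignments rather than by real concurrency.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit memory word holding two 16-bit fields (illustrative layout). */
#define AGE_MASK	0x0000ffffu
#define COUNT_SHIFT	16

/* Return the word with its low 16 bits replaced by age. */
static uint32_t word_with_age(uint32_t word, uint16_t age)
{
	return (word & ~AGE_MASK) | age;
}

/* Return the word with its high 16 bits replaced by count. */
static uint32_t word_with_count(uint32_t word, uint16_t count)
{
	return (word & AGE_MASK) | ((uint32_t)count << COUNT_SHIFT);
}

/*
 * Replay the problematic interleaving on a machine that can only
 * store whole words:
 *   CPU A loads the word in order to update "age",
 *   CPU B stores a new "count" into the same word,
 *   CPU A stores its stale copy back.
 * Returns the final contents of the word.
 */
static uint32_t replay_lost_update(uint32_t mem)
{
	uint32_t a_copy = mem;			/* A: load word     */
	mem = word_with_count(mem, 7);		/* B: count = 7     */
	mem = word_with_age(a_copy, 3);		/* A: store age = 3 */
	return mem;
}
```

Starting from a zeroed word, the final value has the new age but the neighbouring count is back at zero: the update by the second CPU is silently lost, even though nobody cared about the accuracy of "age" itself.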
Re: test13-pre5
   Date: Thu, 28 Dec 2000 23:58:36 +0100
   From: Andi Kleen <[EMAIL PROTECTED]>

   Why exactly a power of two ? To get rid of ->index ?

To make things like "page - mem_map" et al. use shifts instead of
expensive multiplies...

Later,
David S. Miller
[EMAIL PROTECTED]
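As a small illustration of the point above, here is a sketch assuming a hypothetical 64-byte descriptor. `struct fake_page` and both helper names are invented for this example and do not correspond to the real `struct page`. With a power-of-two size, the subtraction reduces to a byte difference followed by a single shift.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 64-byte descriptor; the contents are a placeholder. */
struct fake_page {
	unsigned char pad[64];
};

/* What "page - mem_map" boils down to when the size is 1 << 6. */
static size_t index_by_shift(const struct fake_page *page,
			     const struct fake_page *mem_map)
{
	size_t bytes = (size_t)((const char *)page - (const char *)mem_map);

	return bytes >> 6;	/* 64 == 1 << 6 */
}

/* The same conversion written as ordinary pointer arithmetic. */
static size_t index_by_subtraction(const struct fake_page *page,
				   const struct fake_page *mem_map)
{
	return (size_t)(page - mem_map);
}
```

For a size that is not a power of two, the compiler has to replace the shift with a division by a constant, which it typically expands into a multiply plus shifts.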
Re: test13-pre5
On Thu, 28 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
> >    Date: Thu, 28 Dec 2000 23:17:22 +0100
> >    From: Andi Kleen <[EMAIL PROTECTED]>
> >
> >    Would you consider patches for any of these points?
> >
> > To me it seems just as important to make sure struct page is
> > a power of 2 in size, with the waitq debugging turned off this
> > is true for both 32-bit and 64-bit hosts last time I checked.
>
> Why exactly a power of two ? To get rid of ->index ?

Most likely to minimise the number of cache misses needed
to access a complete page_struct.

Then again, I guess 48 bytes would _also_ guarantee that we
never need more than 2 cache misses to access every part of
the page_struct.

And the memory wasted in the page_struct may well be a bigger
factor than the cache misses on lots of systems...

(time for another CONFIG option? ;))

regards,

Rik
--
Hollywood goes for world dumbination,
    Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/    http://distro.conectiva.com.br/
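The 48-byte claim above can be checked with a few lines of arithmetic. The sketch below assumes 32-byte cache lines and an array that starts on a line boundary; neither assumption is stated in the message, and both function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines covered by the bytes [start, start + size). */
static size_t lines_touched(size_t start, size_t size, size_t line)
{
	return (start + size - 1) / line - start / line + 1;
}

/*
 * Worst case over the first n elements of an array of size-byte
 * structures whose first element starts on a cache-line boundary.
 */
static size_t worst_case_lines(size_t size, size_t line, size_t n)
{
	size_t worst = 0;

	for (size_t i = 0; i < n; i++) {
		size_t t = lines_touched(i * size, size, line);

		if (t > worst)
			worst = t;
	}
	return worst;
}
```

Under these assumptions a 48-byte structure never straddles more than two lines, the same as a 64-byte one, while a smaller but "unlucky" size such as 44 bytes can straddle three.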
Re: test13-pre5
On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
>    Date: Thu, 28 Dec 2000 23:17:22 +0100
>    From: Andi Kleen <[EMAIL PROTECTED]>
>
>    Would you consider patches for any of these points?
>
> To me it seems just as important to make sure struct page is
> a power of 2 in size, with the waitq debugging turned off this
> is true for both 32-bit and 64-bit hosts last time I checked.

Why exactly a power of two ? To get rid of ->index ?

-Andi
Re: test13-pre5
   Date: Thu, 28 Dec 2000 23:17:22 +0100
   From: Andi Kleen <[EMAIL PROTECTED]>

   Would you consider patches for any of these points?

To me it seems just as important to make sure struct page is
a power of 2 in size, with the waitq debugging turned off this
is true for both 32-bit and 64-bit hosts last time I checked.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: test13-pre5
On Thu, Dec 28, 2000 at 12:59:22PM -0800, Linus Torvalds wrote:
>  - we absolutely do _not_ want to make "struct page" bigger. We can't
>    afford to just throw away another 8 bytes per page on adding a new list
>    structure, I feel. Even if this would be the simplest solution.

BTW..

The current 2.4 struct page could be already shortened a lot, saving a lot
of cache.
(first number for 32bit, second for 64bit)

- Do not compile virtual in when the kernel does not support highmem
  (saves 4/8 bytes)
- Instead of having a zone pointer mask use a 8 or 16 byte index into a
  zone table. On a modern CPU it is much cheaper to do the and/shifts than
  to do even a single cache miss during page aging. On a lot of systems
  that zone index could be hardcoded to 0 anyways, giving better code.
- Instead of using 4/8 bytes for the age use only 16bit (FreeBSD which
  has the same swapping algorithm even only uses 8bit)
- Remove the waitqueue debugging (obvious @)
- flags can be __u32 on 64bit hosts, sharing 64bit with something that
  is tolerant to async updates (e.g. the zone table index or the index)
- index could be probably u32 instead of unsigned long, saving 4 bytes
  on i386

Would you consider patches for any of these points?

-Andi
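The zone-table and 16-bit age proposals above could look roughly like the sketch below. It reads the "8 or 16 byte index" as an 8- or 16-bit index, which is an assumption, and every name in it (`zone_table`, `struct page_wide`, `struct page_compact`, `page_zone`) is hypothetical rather than taken from the 2.4 sources. It deliberately ignores the partial-word atomicity concern raised elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical zone descriptor and a small fixed table of zones. */
struct zone {
	int id;
};

#define MAX_ZONES 4
static struct zone zone_table[MAX_ZONES];

/* Layout in the style being criticised: a pointer and a full-word age. */
struct page_wide {
	struct zone *zone;
	unsigned long age;
};

/* Proposed layout: a 16-bit table index and a 16-bit age share one word. */
struct page_compact {
	uint16_t zone_idx;
	uint16_t age;
};

/* Recover the zone from the index with one table lookup. */
static struct zone *page_zone(const struct page_compact *p)
{
	return &zone_table[p->zone_idx];
}
```

On typical 32-bit and 64-bit ABIs the compact pair takes 4 bytes where the wide pair takes 8 or 16.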
Re: test13-pre5
Linus Torvalds wrote:
>  - global dirty list for global sync(). We don't have one, and I don't
>    think we want one. We could add a few lists, and split up the active
>    list into "active" and "active_dirty", for example, but I don't like
>    the implications that would probably have for the LRU ordering.

This has been the subject of a lot of flam^H^H^H^H discussion on
#kernelnewbies and it's still an open question. The only way
to see if a separate active_dirty hurts or helps is to try it. Later.
:-)

I don't see how a separate active_dirty list can hurt LRU ordering. We
could still take the pages off the two lists in the same order we did
with one list if we wanted to, or at least, statistically the same in
terms of number, age, time since entering the list, etc. That better
not cause radically different or undesirable behaviour or something is
really broken.

By breaking active into two lists we'd get a very interesting tuning
parameter to play with: the relative rate at which pages are moved from
active to inactive. Beyond that, the active_dirty list could be pressed
into service quite easily as a page-oriented version of kflushd, and
would obviously be useful as a global sync list.

--
Daniel
Re: test13-pre5 (via82cxxx_audio.c)
In article <[EMAIL PROTECTED]>,
Linus Torvalds <[EMAIL PROTECTED]> writes:

LT>
LT> The mm cleanups also include removing "swapout()" as a VM operation, as

swapout was not removed from drivers/sound/via82cxxx_audio.c; the following
does so (compiles and produces sound, someone who understands this please
check).

--- drivers/sound/via82cxxx_audio.c.orig	Thu Dec 28 21:02:03 2000
+++ drivers/sound/via82cxxx_audio.c	Thu Dec 28 21:12:58 2000
@@ -1727,20 +1727,8 @@
 }
 
 
-#ifndef VM_RESERVE
-static int via_mm_swapout (struct page *page, struct file *filp)
-{
-	return 0;
-}
-#endif /* VM_RESERVE */
-
-
 struct vm_operations_struct via_mm_ops = {
 	nopage:		via_mm_nopage,
-
-#ifndef VM_RESERVE
-	swapout:	via_mm_swapout,
-#endif
 };
 
 
Re: [wildly off-topic] Re: test13-pre5
On Thu, 28 Dec 2000, Rik van Riel wrote:
> On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
> > On Thu, 28 Dec 2000, Linus Torvalds wrote:
> >
> > > If somebody (you? hint, hint) wants to do this,
> >
> > Ok, I'll do it because I love Tove.
>
> Marcelo, you should buy some glasses ;)
>
> Tove != Tux
>
> It's ok and probably safe to love Tux, the nice cuddly
> penguin everybody loves.
>
> However, loving the (6-time ??) Finnish female karate
> champion, who happens to be married to Linus is probably
> quite a bit less safe ...

Marcelo runs like hell.
[wildly off-topic] Re: test13-pre5
On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
>
> > If somebody (you? hint, hint) wants to do this,
>
> Ok, I'll do it because I love Tove.

Marcelo, you should buy some glasses ;)

Tove != Tux

It's ok and probably safe to love Tux, the nice cuddly
penguin everybody loves.

However, loving the (6-time ??) Finnish female karate
champion, who happens to be married to Linus is probably
quite a bit less safe ...

cheers,

Rik
--
Hollywood goes for world dumbination,
    Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/    http://distro.conectiva.com.br/
Re: test13-pre5
On Thu, 28 Dec 2000, Linus Torvalds wrote:

> If somebody (you? hint, hint) wants to do this,

Ok, I'll do it because I love Tove.

> I'd be very happy - I can do it myself, but because it's my birthday
> I'm supposed to drag myself off the computer soon and be social, or
> Tove will be grumpy.
Re: test13-pre5
On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
>
> On Thu, 28 Dec 2000, Linus Torvalds wrote:
>
> > This still doesn't tell "sync()" about dirty pages (ie the "innd loses the
> > active file after a reboot" bug), but now the places that mark pages dirty
> > are under control. Next step..
>
> Do you really want to split the per-address-space pages list in dirty and
> clean lists for 2.4 ?
>
> Or do you think walking the current per-address-space page list searching
> for dirty pages and syncing them is ok?

There are a few issues:

 - fdatasync/fsync is often quite critical for databases. It's _possibly_
   ok to just walk all the pages for an inode, but I'm fairly certain that
   this is an area where if we don't have a separate dirty queue we _will_
   need to add one later.

 - global dirty list for global sync(). We don't have one, and I don't
   think we want one. We could add a few lists, and split up the active
   list into "active" and "active_dirty", for example, but I don't like
   the implications that would probably have for the LRU ordering.

 - we absolutely do _not_ want to make "struct page" bigger. We can't
   afford to just throw away another 8 bytes per page on adding a new list
   structure, I feel. Even if this would be the simplest solution.

So right now I think the right idea is to

 - split up "address_space->pages" into "->clean_pages" and
   "->dirty_pages". This is fairly easily done, it requires small changes
   like making "truncate_inode_pages()" instead be "truncate_list_pages()",
   and making "truncate_inode_pages()" call that for both the dirty and
   the clean lists. That's about 10 lines of diff (I already tried this),
   and that's about the biggest example of something like that. Most other
   areas don't much care about the inode page lists.

 - make "SetPageDirty()" do something like

	if (!test_and_set(PG_dirty, &page->flags)) {
		spin_lock(&page_cache_lock);
		list_del(page->list);
		list_add(page->list, page->mapping->dirty_pages);
		spin_unlock(&page_cache_lock);
	}

   This will require making sure that every place that does a
   SetPageDirty() will be ok with this (ie double-check that they all have
   a mapping: right now the free_pte() code in mm/memory.c doesn't care,
   because it knew that it could mark even anonymous pages dirty and
   they'd just get ignored).

 - make a filemap_fdatasync() that walks the dirty pages and does a
   writepage() on them all and moves them to the clean list.

 - make fsync() and fdatasync() call the above function before they even
   call the low-level per-FS code.

 - make sync_inodes() use that same filemap_fdatasync() function so that
   the sync() case is handled.

All done. It looks something like 5-10 places, most of which are about 10
lines of diff each, if even that.

The only real worry would be that the locking isn't right, but getting the
pagemap lock should be the safe thing, and from a lock contention
standpoint it should be ok (if we move a lot of pages back and forth, lock
contention is going to be the least of our worries, because it implies
that we'd be doing a LOT of IO to actually write the pages out).

If somebody (you? hint, hint) wants to do this, I'd be very happy - I can
do it myself, but because it's my birthday I'm supposed to drag myself off
the computer soon and be social, or Tove will be grumpy.

And you don't want Tove grumpy.

		Linus
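The clean/dirty split outlined in the message above can be modelled in user space. What follows is a minimal single-threaded sketch, not kernel code: the list helpers are simplified stand-ins for the kernel list API, `dirty` stands in for the PG_dirty bit, `fdatasync_model()` stands in for the proposed filemap_fdatasync(), the `writes` counter stands in for writepage(), and the locking is reduced to a comment. All of these substitutions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n right after the list head h. */
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static int list_empty(const struct list_head *h) { return h->next == h; }

struct address_space { struct list_head clean_pages, dirty_pages; };

struct page {
	struct list_head list;		/* must stay the first member */
	struct address_space *mapping;
	int dirty;			/* stands in for the PG_dirty bit */
	int writes;			/* counts simulated writepage() calls */
};

static void mapping_init(struct address_space *m)
{
	list_init(&m->clean_pages);
	list_init(&m->dirty_pages);
}

/* New pages start out clean. */
static void page_init(struct page *p, struct address_space *m)
{
	p->mapping = m;
	p->dirty = 0;
	p->writes = 0;
	list_add(&p->list, &m->clean_pages);
}

/* Move the page to the dirty list only on the clean -> dirty transition. */
static void set_page_dirty(struct page *p)
{
	if (!p->dirty) {
		p->dirty = 1;
		/* real code would hold the page cache lock around this */
		list_del(&p->list);
		list_add(&p->list, &p->mapping->dirty_pages);
	}
}

/* "Write" every dirty page and move it back to the clean list. */
static int fdatasync_model(struct address_space *m)
{
	int written = 0;

	while (!list_empty(&m->dirty_pages)) {
		/* valid cast because list is the first member of struct page */
		struct page *p = (struct page *)m->dirty_pages.next;

		p->writes++;		/* stands in for writepage() */
		p->dirty = 0;
		list_del(&p->list);
		list_add(&p->list, &m->clean_pages);
		written++;
	}
	return written;
}
```

Dirtying the same page twice leaves it on the dirty list once, and a sync only has to look at pages that actually need writing instead of walking every page of the mapping.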
Re: test13-pre5
On Thu, 28 Dec 2000, Linus Torvalds wrote:

> This still doesn't tell "sync()" about dirty pages (ie the "innd loses the
> active file after a reboot" bug), but now the places that mark pages dirty
> are under control. Next step..

Do you really want to split the per-address-space pages list in dirty and
clean lists for 2.4 ?

Or do you think walking the current per-address-space page list searching
for dirty pages and syncing them is ok?
Re: test13-pre5
On Fri, 29 Dec 2000, Andi Kleen wrote:
>
> Hopefully all the "goto out" micro optimizations can be taken out then too,

"goto out" often generates much more readable code, so the optimization is
secondary.

> I recently found out that gcc 2.97's block moving pass has the tendency
> to move the outlined blocks inline again ;)

Too bad. Maybe somebody should tell gcc maintainers about programmers that
know more than the compiler again.

		Linus
Re: test13-pre5
On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
> On Fri, 29 Dec 2000, Andi Kleen wrote:
> >
> > Hopefully all the "goto out" micro optimizations can be taken out then too,
>
> "goto out" often generates much more readable code, so the optimization is
> secondary.

I was more thinking of cases like the scheduler's gotos, which have gotten
rather spaghetti recently. Admittedly classic goto out is often more
readable than many nested if()s with error handling.

> > I recently found out that gcc 2.97's block moving pass has the tendency
> > to move the outlined blocks inline again ;)
>
> Too bad. Maybe somebody should tell gcc maintainers about programmers that
> know more than the compiler again.

In x86-64 which relies on 2.97 I'm using __builtin_expect, defined to
likely() and unlikely(), which seems to generate good code.

-Andi
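For readers unfamiliar with the two idioms discussed above, here is a minimal sketch that combines them. It assumes a compiler that provides `__builtin_expect` (GCC and Clang do); the macro definitions follow the common `!!(x)` form, and `sum_nonnegative()` is a made-up function used only as a vehicle.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints built on the GCC/Clang builtin. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * Sum n non-negative values; return -1 on bad input.
 * The error checks are marked unlikely so the compiler can keep the
 * common path straight-line, and a single "out" label handles exit.
 */
static long sum_nonnegative(const int *v, size_t n)
{
	long ret = -1;
	long sum = 0;

	if (unlikely(v == NULL))
		goto out;
	for (size_t i = 0; i < n; i++) {
		if (unlikely(v[i] < 0))
			goto out;
		sum += v[i];
	}
	ret = sum;
out:
	return ret;
}
```

The hint expresses the programmer's expectation and leaves block placement to the compiler, instead of forcing the layout by hand with out-of-line labels.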
Re: test13-pre5
On Thu, 28 Dec 2000, Linus Torvalds wrote:

>  - make "SetPageDirty()" do something like
>
>	if (!test_and_set(PG_dirty, &page->flags)) {
>		spin_lock(&page_cache_lock);
>		list_del(page->list);
>		list_add(page->list, page->mapping->dirty_pages);
>		spin_unlock(&page_cache_lock);
>	}
>
>    This will require making sure that every place that does a
>    SetPageDirty() will be ok with this (ie double-check that they all have
>    a mapping: right now the free_pte() code in mm/memory.c doesn't care,
>    because it knew that it could mark even anonymous pages dirty and
>    they'd just get ignored.
>
>  - make a filemap_fdatasync() that walks the dirty pages and does a
>    writepage() on them all and moves them to the clean list.

We also want to move the page to the per-address-space clean list in
ClearPageDirty I suppose.
Re: test13-pre5
On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
> Too bad. Maybe somebody should tell gcc maintainers about programmers that
> know more than the compiler again.

I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Did you know that it's impossible to compile nfsv4 because of
register allocation problems with long long since (long long) month ?

The following does not hurt, it's just a fix for a broken compiler:

--- linux/fs/lockd/xdr4.c.orig	Fri Dec 29 01:35:32 2000
+++ linux/fs/lockd/xdr4.c	Fri Dec 29 01:36:36 2000
@@ -156,7 +156,7 @@
 nlm4_encode_lock(u32 *p, struct nlm_lock *lock)
 {
 	struct file_lock	*fl = &lock->fl;
-	__s64			start, len;
+	volatile __s64		start, len;
 
 	if (!(p = xdr_encode_string(p, lock->caller))
 	 || !(p = nlm4_encode_fh(p, &lock->fh))

Here is an example without this patch
(pgcc-2.95.2.1 this time which is bug-compatible to gcc-2.95.2.1).

gcc -D__KERNEL__ -I/usr/src/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 -march=i686 -DMODULE -c -o xdr4.o xdr4.c
xdr4.c: In function `nlm4_encode_lock':
xdr4.c:181: internal error--insn does not satisfy its constraints:
(insn/i 313 585 315 (set (reg:SI 1 %edx)
        (subreg:SI (lshiftrt:DI (reg:DI 0 %eax)
                (const_int 32 [0x20])) 0)) 323 {lshrdi3_const_int_subreg} (nil)
    (nil))
gcc: Internal compiler error: program cpp got fatal signal 13
make[2]: *** [xdr4.o] Error 1
make[2]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs/lockd'
make[1]: *** [_modsubdir_lockd] Error 2
make[1]: Leaving directory `/.localvol000/usr/src/linux-2.4.0-test13pre5/fs'
make: *** [_mod_fs] Error 2

The question is: Is it worth to apply ?

--
  ciao -
    Stefan

"     export PS1="rms# "     "

Stefan Traby                Linux/ia32               fax:  +43-3133-6107-9
Mitterlasznitzstr. 13       Linux/alpha            phone:  +43-3133-6107-2
8302 Nestelbach             Linux/sparc       http://www.hello-penguin.com
Austria                                    mailto:[EMAIL PROTECTED]
Europe                                     mailto:[EMAIL PROTECTED]
Re: test13-pre5
On Thu, Dec 28, 2000 at 03:14:56PM -0800, David S. Miller wrote:
>    Date: Fri, 29 Dec 2000 00:17:21 +0100
>    From: Andi Kleen [EMAIL PROTECTED]
>
>    On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
>    > To make things like "page - mem_map" et al. use shifts instead of
>    > expensive multiplies...
>
>    I thought that is what ->index is for ?
>
> It is for the page cache identity Andi... you know,
> page_hash(mapping, index)...

Oops, I confused it with the 2.0 page->map_nr, which did exactly that.
I should have known better. Thanks for correcting this brainfart.

> And the add/sub/shift expansion of a multiply/divide by constant even
> in its most optimal form is often not trivial, it is something on the
> order of 7 instructions with waitq debugging enabled last time I
> checked.

Wonder if it looks better with wq debugging turned off or a compressed ->zone.

-Andi
Re: test13-pre5
On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
>    Date: Thu, 28 Dec 2000 23:17:22 +0100
>    From: Andi Kleen [EMAIL PROTECTED]
>
>    Would you consider patches for any of these points?
>
> To me it seems just as important to make sure struct page is
> a power of 2 in size, with the waitq debugging turned off this
> is true for both 32-bit and 64-bit hosts last time I checked.

Why exactly a power of two ? To get rid of ->index ?

-Andi
Re: test13-pre5
On Thu, 28 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:33:07PM -0800, David S. Miller wrote:
> >    Date: Thu, 28 Dec 2000 23:17:22 +0100
> >    From: Andi Kleen [EMAIL PROTECTED]
> >
> >    Would you consider patches for any of these points?
> >
> > To me it seems just as important to make sure struct page is
> > a power of 2 in size, with the waitq debugging turned off this
> > is true for both 32-bit and 64-bit hosts last time I checked.
>
> Why exactly a power of two ? To get rid of ->index ?

Most likely to minimise the number of cache misses needed to
access a complete page_struct.

Then again, I guess 48 bytes would _also_ guarantee that we
never need more than 2 cache misses to access every part of
the page_struct.

And the memory wasted in the page_struct may well be a bigger
factor than the cache misses on lots of systems... (time for
another CONFIG option? ;))

regards,

Rik
--
Hollywood goes for world dumbination,
        Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/       http://distro.conectiva.com.br/
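Rik's remark about 48 bytes can be checked with a little arithmetic. The sketch below counts how many cache lines one element of a densely packed array can straddle in the worst case. It assumes the array starts on a line boundary, and the function names are invented for this illustration; the thread itself gives no cache line size, so the 32-byte figure used in the note after the code is an assumption too.

```c
#include <stddef.h>

/* Number of cache lines touched by `size` bytes starting at byte `offset`. */
size_t lines_touched(size_t offset, size_t size, size_t line)
{
    size_t first = offset / line;
    size_t last = (offset + size - 1) / line;

    return last - first + 1;
}

/*
 * Worst case over all elements of a packed array of `size`-byte objects
 * whose first element starts on a cache line boundary.
 */
size_t worst_case_lines(size_t size, size_t line)
{
    size_t worst = 0;
    size_t i;

    /* Element offsets modulo `line` repeat after at most `line` elements. */
    for (i = 0; i < line; i++) {
        size_t n = lines_touched(i * size, size, line);

        if (n > worst)
            worst = n;
    }
    return worst;
}
```

With an assumed 32-byte line, worst_case_lines(48, 32) and worst_case_lines(64, 32) both come out as 2, while a 68-byte structure (the size mentioned elsewhere in this thread) can touch 3 lines.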
Re: test13-pre5
   Date: Thu, 28 Dec 2000 23:58:36 +0100
   From: Andi Kleen [EMAIL PROTECTED]

   Why exactly a power of two ? To get rid of ->index ?

To make things like "page - mem_map" et al. use shifts instead of
expensive multiplies...

Later,
David S. Miller
[EMAIL PROTECTED]
Re: test13-pre5
On Thu, 28 Dec 2000, Marcelo Tosatti wrote:
>
> We also want to move the page to the per-address-space clean list in
> ClearPageDirty I suppose.

I would actually advise against this.

 - it's ok to have too many pages on the dirty list (think of the dirty
   list as a "these pages _can_ be dirty")

 - whenever we do a ClearPageDirty() we're likely to remove the page from
   the lists altogether, so it's not worth it doing extra work.

The exception, of course, is the actual "filemap_fdatasync()" function,
but that one would probably look something like

        spin_lock(&page_cache_lock);
        while (!list_empty(&mapping->dirty_pages)) {
                struct page *page = list_entry(mapping->dirty_pages.next, struct page, list);

                list_del(&page->list);
                list_add(&page->list, &mapping->clean_pages);

                if (!PageDirty(page))
                        continue;
                page_get(page);
                spin_unlock(&page_cache_lock);

                lock_page(page);
                if (PageDirty(page)) {
                        ClearPageDirty(page);
                        page->mapping->writepage(page);
                }
                UnlockPage(page);
                page_cache_put(page);
                spin_lock(&page_cache_lock);
        }
        spin_unlock(&page_cache_lock);

and again note how we can move it to the clean list early and we don't
have to keep the PageDirty bit 100% in sync with which list it is on. If
somebody marks it dirty later on (and the dirty bit is still set), that
somebody won't move it back to the dirty list (because it noticed that the
dirty bit is already set), but that's ok: as long as we do the
"ClearPageDirty(page);" call before starting the actual writeout(), we're
fine.

So the "mapping->dirty_pages" list is maybe not so much a _dirty_ list, as
a "scheduled for writeout" list. Marking the page clean doesn't remove it
from that list - it can happily stay on the list and then when the
writeout is started we'd just skip it.

Ok?

                Linus
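The writeout loop from the mail above can be exercised with a small user-space model. This is a sketch under heavy simplification, not the kernel function: it is single-threaded, so the locking, the page reference and the second PageDirty() check after lock_page() are reduced to comments, and every `toy_` name and the `writes` counter are invented for the illustration.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Insert e right after head. */
void list_add(struct list_head *e, struct list_head *head)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

int list_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical stand-ins for struct page and struct address_space. */
struct toy_page {
    struct list_head list;
    int dirty;                  /* stands in for the PG_dirty bit */
    int writes;                 /* how often "writepage" ran on this page */
};

struct toy_mapping { struct list_head clean_pages, dirty_pages; };

/* Equivalent of list_entry(ptr, struct toy_page, list). */
#define toy_page_of(ptr) \
    ((struct toy_page *)((char *)(ptr) - offsetof(struct toy_page, list)))

/* Returns the number of pages that were actually written. */
int toy_fdatasync(struct toy_mapping *mapping)
{
    int written = 0;

    /* kernel: spin_lock(&page_cache_lock) */
    while (!list_empty(&mapping->dirty_pages)) {
        struct toy_page *page = toy_page_of(mapping->dirty_pages.next);

        /* Move to the clean list early; the flag decides what happens. */
        list_del(&page->list);
        list_add(&page->list, &mapping->clean_pages);

        if (!page->dirty)
            continue;           /* cleaned while still on the dirty list */

        /* kernel: take a reference, drop the spinlock, lock_page() */
        page->dirty = 0;        /* clear the flag before the writeout starts */
        page->writes++;         /* stands in for ->writepage(page) */
        written++;
        /* kernel: UnlockPage(), drop the reference, retake the spinlock */
    }
    /* kernel: spin_unlock(&page_cache_lock) */
    return written;
}
```

A page whose flag was cleared while it sat on the dirty list is simply moved to the clean list and skipped, which is the "scheduled for writeout" reading of the list described in the mail.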
Re: test13-pre5
On Fri, 29 Dec 2000, Stefan Traby wrote:
> On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
>
> > Too bad. Maybe somebody should tell gcc maintainers about programmers that
> > know more than the compiler again.
>
> I know that {p,}gcc-2.95.2{,.1} are not officially supported.

Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
comfortable with it, it does count as "supported" even if it has known
problems with "long long". pgcc isn't.

> Did you know that it's impossible to compile nfsv4 because of
> register allocation problems with long long since (long long) month ?

lockd v4 (for NFS v3), I assume.

No, I wasn't aware of this particular bug.

> The following does not hurt, it's just a fix for a broken compiler:

Ugh, that's ugly.

Can you test if it is sufficient to just simplify the math a bit, instead
of uglifying that function more? The nlm4_encode_lock() function already
tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
be ok to just re-code the function to not do the extra "loff_t_to_s64()"
stuff, and simplify it enough that the compiler will be happy to compile
the simpler function. Something along the lines of

        if (.. NLM4_OFFSET_MAX tests ..)
                ..

        *p++ = htonl(fl->fl_pid);

        start = fl->fl_start;
        len = fl->fl_end - start;
        if (fl->fl_end == OFFSET_MAX)
                len = 0;

        p = xdr_encode_hyper(p, start);
        p = xdr_encode_hyper(p, len);

        return p;

Where it tries to minimize the liveness of the 64-bit values, and tries to
avoid extra complications.

                Linus
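For readers who want to try the simplified encoding outside the kernel, here is a user-space sketch of the tail of the function as outlined in the mail above. Everything prefixed `toy_`/`TOY_` is a stand-in invented for this example: the structure holds only the three fields used, TOY_OFFSET_MAX is assumed to be the largest signed 64-bit value, and the NLM4_OFFSET_MAX range tests are left out, as they are in the mail. The hyper encoder writes a 64-bit value as two big-endian 32-bit words, high word first, which is the XDR layout.

```c
#include <stdint.h>
#include <arpa/inet.h>          /* htonl() */

#define TOY_OFFSET_MAX INT64_MAX    /* assumption: stands in for OFFSET_MAX */

/* Hypothetical cut-down version of struct file_lock. */
struct toy_file_lock {
    int32_t fl_pid;
    int64_t fl_start;
    int64_t fl_end;
};

/* Encode a 64-bit value as two 32-bit big-endian words, high word first. */
uint32_t *toy_encode_hyper(uint32_t *p, int64_t val)
{
    uint64_t u = (uint64_t)val;

    *p++ = htonl((uint32_t)(u >> 32));
    *p++ = htonl((uint32_t)(u & 0xffffffffu));
    return p;
}

/* Encode pid, start and length; returns the advanced buffer pointer. */
uint32_t *toy_encode_lock_tail(uint32_t *p, const struct toy_file_lock *fl)
{
    int64_t start, len;

    *p++ = htonl((uint32_t)fl->fl_pid);

    /* Keep the 64-bit values live only for as long as they are needed. */
    start = fl->fl_start;
    len = fl->fl_end - start;
    if (fl->fl_end == TOY_OFFSET_MAX)
        len = 0;                /* a lock reaching to the end has length 0 */

    p = toy_encode_hyper(p, start);
    p = toy_encode_hyper(p, len);
    return p;
}
```

The length computation follows the outline in the mail literally; whether the real function needs any further adjustment of `len` is not something the thread settles.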
Re: test13-pre5
On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
>    Date: Thu, 28 Dec 2000 23:58:36 +0100
>    From: Andi Kleen [EMAIL PROTECTED]
>
>    Why exactly a power of two ? To get rid of ->index ?
>
> To make things like "page - mem_map" et al. use shifts instead of
> expensive multiplies...

I thought that is what ->index is for ?

Also gcc seems to be already quite clever at dividing through small
integers, e.g. using mul and shift and sub, so it may not be even worth
to reach for a real power-of-two.

I suspect doing the arithmetics is at least faster than eating the
cache misses because of ->index.

-Andi
Re: test13-pre5
On Fri, 29 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
> >    Date: Thu, 28 Dec 2000 23:58:36 +0100
> >    From: Andi Kleen [EMAIL PROTECTED]
> >
> >    Why exactly a power of two ? To get rid of ->index ?
> >
> > To make things like "page - mem_map" et al. use shifts instead of
> > expensive multiplies...
>
> I thought that is what ->index is for ?

Nope, ->index is there to identify which offset the page has
in ->mapping, read mm/filemap.c::__find_page_nolock() for more
info.

> Also gcc seems to be already quite clever at dividing through small
> integers, e.g. using mul and shift and sub, so it may not be even worth
> to reach for a real power-of-two.
>
> I suspect doing the arithmetics is at least faster than eating the
> cache misses because of ->index.

I'm pretty confident that arithmetic is faster than cache
misses ... but an unlucky size of the page struct will cause
extra cache misses due to misalignment.

regards,

Rik
--
Hollywood goes for world dumbination,
        Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/       http://distro.conectiva.com.br/
Re: test13-pre5
   Date: Fri, 29 Dec 2000 00:17:21 +0100
   From: Andi Kleen [EMAIL PROTECTED]

   On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
   > To make things like "page - mem_map" et al. use shifts instead of
   > expensive multiplies...

   I thought that is what ->index is for ?

It is for the page cache identity Andi... you know,
page_hash(mapping, index)...

And the add/sub/shift expansion of a multiply/divide by constant even
in its most optimal form is often not trivial, it is something on the
order of 7 instructions with waitq debugging enabled last time I
checked.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: test13-pre5
On Fri, 29 Dec 2000, Andi Kleen wrote:
> On Thu, Dec 28, 2000 at 02:54:52PM -0800, David S. Miller wrote:
> >    Date: Thu, 28 Dec 2000 23:58:36 +0100
> >    From: Andi Kleen [EMAIL PROTECTED]
> >
> >    Why exactly a power of two ? To get rid of ->index ?
> >
> > To make things like "page - mem_map" et al. use shifts instead of
> > expensive multiplies...
>
> I thought that is what ->index is for ?

No. "index" only gives the virtual index.

"page - mem_map" is how you get the _physical_ index in the zone in
question, which is common for physical translations (ie "pte_page()",
"page_to_virt()" or "page_to_phys()")

> Also gcc seems to be already quite clever at dividing through small
> integers, e.g. using mul and shift and sub, so it may not be even worth
> to reach for a real power-of-two.

Look at the code - it's a big multiply to do a divide by 68 or similar.
Quite expensive.

Doing "page->address - TASK_SIZE" on x86 for the non-highmem case would
probably be faster.

                Linus
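To make the cost argument concrete, the sketch below shows the two ways a byte difference between two array elements can be turned into an index. It is an illustration only: the function names are invented, 64 is just one example of a power-of-two size, and the multiply-by-inverse trick is one well-known way to divide exactly by a constant, not necessarily the instruction sequence gcc emitted for the kernel at the time. The 68-byte size is the figure from the mail above.

```c
#include <stdint.h>

/* Element size 64 (a power of two): the index is a single shift. */
uint32_t index_pow2(uint32_t byte_diff)
{
    return byte_diff >> 6;
}

/*
 * Element size 68 = 4 * 17: shift out the factor 4, then multiply by
 * the multiplicative inverse of 17 modulo 2^32 (17 * 0xF0F0F0F1 == 1
 * mod 2^32).  Only correct when byte_diff is an exact multiple of 68,
 * which holds for the difference of two pointers into the same array.
 */
uint32_t index_68(uint32_t byte_diff)
{
    return (byte_diff >> 2) * UINT32_C(0xF0F0F0F1);
}
```

Both functions return the same index for matching inputs, for example index_pow2(64 * 1000) and index_68(68 * 1000) are both 1000, but the second needs a full 32-bit multiply with a large constant where the first needs one shift.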
Re: test13-pre5
Simply executing
        *p++ = htonl(fl->fl_pid);
before
        start = loff_t_to_s64(fl->fl_start);
also works.
Later,
Albert

Linus Torvalds wrote:
>
> On Fri, 29 Dec 2000, Stefan Traby wrote:
> > On Thu, Dec 28, 2000 at 03:37:51PM -0800, Linus Torvalds wrote:
> >
> > > Too bad. Maybe somebody should tell gcc maintainers about programmers that
> > > know more than the compiler again.
> >
> > I know that {p,}gcc-2.95.2{,.1} are not officially supported.
>
> Hmm, I use gcc-2.95.2 myself on some machines, and while I'm not 100%
> comfortable with it, it does count as "supported" even if it has known
> problems with "long long". pgcc isn't.
>
> > Did you know that it's impossible to compile nfsv4 because of
> > register allocation problems with long long since (long long) month ?
>
> lockd v4 (for NFS v3), I assume.
>
> No, I wasn't aware of this particular bug.
>
> > The following does not hurt, it's just a fix for a broken compiler:
>
> Ugh, that's ugly.
>
> Can you test if it is sufficient to just simplify the math a bit, instead
> of uglifying that function more? The nlm4_encode_lock() function already
> tests for NLM4_OFFSET_MAX explicitly for both start and end, so it should
> be ok to just re-code the function to not do the extra "loff_t_to_s64()"
> stuff, and simplify it enough that the compiler will be happy to compile
> the simpler function. Something along the lines of
>
>         if (.. NLM4_OFFSET_MAX tests ..)
>                 ..
>
>         *p++ = htonl(fl->fl_pid);
>
>         start = fl->fl_start;
>         len = fl->fl_end - start;
>         if (fl->fl_end == OFFSET_MAX)
>                 len = 0;
>
>         p = xdr_encode_hyper(p, start);
>         p = xdr_encode_hyper(p, len);
>
>         return p;
>
> Where it tries to minimize the liveness of the 64-bit values, and tries to
> avoid extra complications.
>
>                 Linus

-- 
Albert Cranford Deerfield Beach FL USA
[EMAIL PROTECTED]