Re: [Patch] performance enhancement for simple_strtoul
Pavel Machek wrote: > > [about strtoul] > > Perhaps I am mistaken but I'd expect it to be called what, ten times at > > boot time, and a couple of times when X loads the MTRRs? > > On second thought, ps -auxl maybe stresses simple_strtoul a little > bit. Not sure. Nah. proc_pid_lookup does its own conversion from string to number, and the rest are conversions from numbers to strings in sprintf. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Pavel Machek wrote: [about strtoul] Perhaps I am mistaken but I'd expect it to be called what, ten times at boot time, and a couple of times when X loads the MTRRs? On second thought, ps -auxl maybe stresses simple_strtoul a little bit. Not sure. Nah. proc_pid_lookup does its own conversion from string to number, and the rest are conversions from numbers to strings in sprintf. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hi! > > It seems gcc creates much better code with the variables set to register > > types. > > Curious. GCC should be generating the same code regardless; ah well. > > Is strtoul actually used in the kernel other than for the occasional > (rare) write to /proc/sys and parsing boot options? > > > But this is the kernel and there are people that would reject my patch > > purely on the basis that it adds precious bytes to the kernel. > > Perhaps I am mistaken but I'd expect it to be called what, ten times at > boot time, and a couple of times when X loads the MTRRs? On second thought, ps -auxl maybe stresses simple_strtoul a little bit. Not sure. > Sounds like the neatest trick would be reducing bytes used here... Pavel -- I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hi! > The following patch is a faster implementation of the simple_strtoul > function. This function differs from the original in that it reduces the > multiplies to shifts and logical operations wherever possible. My testing > shows that it adds around 100 bytes, but is about 6% faster on a K6-2. (It > is 40% faster that glibc's strtoul, but that's a different story.) My guess > is that the performance gain will be higher on platforms with slower > multiplication instructions. I have tested it for numerical accuracy so I > think this is safe to apply. If anyone is interested, I can also supply a > test application that demonstrates the performance gain. This patch was > generated against 2.2.16, but should apply to 2.2.19 cleanly. In > 2.4.0-test9, simple_strtoul starts on line 19 rather than 17, hopefully > that's not a problem. Simple question: who cares about performance of simple_strtoul? Original is shorter, and simple_strtoul is not performace critical. Keep it as it is. -- I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hi! The following patch is a faster implementation of the simple_strtoul function. This function differs from the original in that it reduces the multiplies to shifts and logical operations wherever possible. My testing shows that it adds around 100 bytes, but is about 6% faster on a K6-2. (It is 40% faster that glibc's strtoul, but that's a different story.) My guess is that the performance gain will be higher on platforms with slower multiplication instructions. I have tested it for numerical accuracy so I think this is safe to apply. If anyone is interested, I can also supply a test application that demonstrates the performance gain. This patch was generated against 2.2.16, but should apply to 2.2.19 cleanly. In 2.4.0-test9, simple_strtoul starts on line 19 rather than 17, hopefully that's not a problem. Simple question: who cares about performance of simple_strtoul? Original is shorter, and simple_strtoul is not performace critical. Keep it as it is. -- I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hi! It seems gcc creates much better code with the variables set to register types. Curious. GCC should be generating the same code regardless; ah well. Is strtoul actually used in the kernel other than for the occasional (rare) write to /proc/sys and parsing boot options? But this is the kernel and there are people that would reject my patch purely on the basis that it adds precious bytes to the kernel. Perhaps I am mistaken but I'd expect it to be called what, ten times at boot time, and a couple of times when X loads the MTRRs? On second thought, ps -auxl maybe stresses simple_strtoul a little bit. Not sure. Sounds like the neatest trick would be reducing bytes used here... Pavel -- I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
> On Wed, 20 Dec 2000, Steve Grubb wrote: > > > +while (isdigit(c)) { > > +result = (result*10) + (c & 0x0f); > > +c = *(++cp); > > +} > > x * 10 can be written as: > > (x << 2 + x) << 1 = (4x+x) * 2 > (x << 3) + (x << 1) = 8x + 2x Since when has printk been performance critical. It isnt worth microoptimising (or in your case for some cpus micropessimising) that stuff. Besides, gcc should work it out if its worth doing - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
On Thu, 21 Dec 2000, Matthias Andree wrote: > > x * 10 can be written as: > > (x << 2 + x) << 1 = (4x+x) * 2 > (x << 3) + (x << 1) = 8x + 2x Or as "x * 10". Which has the advantage of actually being readable, and letting the compiler optimize it into one of the other forms if that's profitable on the machine you are compiling for. Why do you guys bother making strtoul run fast anyway? Surely it's not on any critical path in the kernel? Bernd - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
On Wed, 20 Dec 2000, Steve Grubb wrote: > +while (isdigit(c)) { > +result = (result*10) + (c & 0x0f); > +c = *(++cp); > +} x * 10 can be written as: (x << 2 + x) << 1 = (4x+x) * 2 (x << 3) + (x << 1) = 8x + 2x Not sure if that/which one is faster, you may want to benchmark. However, on machines that I have seen, multiplication times are either constant or depend on the count of set bits in the second divisor, so it's something like 6 + 2s. However, I have only m68k data books here, and it will gain nothing on an 'C68060 since those beasts ram down multiplications in 2 cycles, so we'd gain nothing on those chips (OK, the shifts take 1 cycles each and are scheduled in parallel, and the add takes an additional cycle after the shifts have completed). Not sure about the ix86, alpha or sparc series. -- Matthias Andree - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
On Wed, 20 Dec 2000, Steve Grubb wrote: +while (isdigit(c)) { +result = (result*10) + (c 0x0f); +c = *(++cp); +} x * 10 can be written as: (x 2 + x) 1 = (4x+x) * 2 (x 3) + (x 1) = 8x + 2x Not sure if that/which one is faster, you may want to benchmark. However, on machines that I have seen, multiplication times are either constant or depend on the count of set bits in the second divisor, so it's something like 6 + 2s. However, I have only m68k data books here, and it will gain nothing on an 'C68060 since those beasts ram down multiplications in 2 cycles, so we'd gain nothing on those chips (OK, the shifts take 1 cycles each and are scheduled in parallel, and the add takes an additional cycle after the shifts have completed). Not sure about the ix86, alpha or sparc series. -- Matthias Andree - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
On Thu, 21 Dec 2000, Matthias Andree wrote: x * 10 can be written as: (x 2 + x) 1 = (4x+x) * 2 (x 3) + (x 1) = 8x + 2x Or as "x * 10". Which has the advantage of actually being readable, and letting the compiler optimize it into one of the other forms if that's profitable on the machine you are compiling for. Why do you guys bother making strtoul run fast anyway? Surely it's not on any critical path in the kernel? Bernd - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
On Wed, 20 Dec 2000, Steve Grubb wrote: +while (isdigit(c)) { +result = (result*10) + (c 0x0f); +c = *(++cp); +} x * 10 can be written as: (x 2 + x) 1 = (4x+x) * 2 (x 3) + (x 1) = 8x + 2x Since when has printk been performance critical. It isnt worth microoptimising (or in your case for some cpus micropessimising) that stuff. Besides, gcc should work it out if its worth doing - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Steve Grubb wrote: > It seems gcc creates much better code with the variables set to register > types. Curious. GCC should be generating the same code regardless; ah well. Is strtoul actually used in the kernel other than for the occasional (rare) write to /proc/sys and parsing boot options? > But this is the kernel and there are people that would reject my patch > purely on the basis that it adds precious bytes to the kernel. Perhaps I am mistaken but I'd expect it to be called what, ten times at boot time, and a couple of times when X loads the MTRRs? Sounds like the neatest trick would be reducing bytes used here... -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hello, I continued experimenting with the Test Case and found a further speed improvement & I am re-submiting the patch. It is the same as the first one with the two local variables changed to register storage types. On a K6-2, I now see: Base 10 - 28% speedup Base 16 - 24% speedup Base 8 - 30% speedup On a P3 system, I now see: Base 10 - 25% speedup Base 16 - 17% speedup Base 8 - 20% speedup It seems gcc creates much better code with the variables set to register types. Please apply the following patch. It should apply to any recent 2.2.x without problems. In 2.4 the function starts 2 lines later. Cheers, Steve Grubb -- --- lib/vsprintf.orig Fri Dec 1 08:58:02 2000 +++ lib/vsprintf.c Wed Dec 20 13:14:13 2000 @@ -14,10 +14,13 @@ #include #include +/* +* This function converts base 8, 10, or 16 only - Steve Grubb +*/ unsigned long simple_strtoul(const char *cp,char **endp,unsigned int base) { - unsigned long result = 0,value; - + register unsigned char c; + register unsigned long result = 0; if (!base) { base = 10; if (*cp == '0') { @@ -29,11 +32,36 @@ } } } - while (isxdigit(*cp) && (value = isdigit(*cp) ? *cp-'0' : (islower(*cp) - ? toupper(*cp) : *cp)-'A'+10) < base) { - result = result*base + value; - cp++; - } + c = *cp; +switch (base) { +case 10: +while (isdigit(c)) { +result = (result*10) + (c & 0x0f); +c = *(++cp); +} +break; +case 16: +while (isxdigit(c)) { +result = result<<4; +if (c&0x40) + result += (c & 0x07) + 9; +else +result += c & 0x0f; +c = *(++cp); +} +break; +case 8: +while (isdigit(c)) { +if ((c&0x37) == c) +result = (result<<3) + (c & 0x07); +else +break; +c = *(++cp); +} +break; +default: /* Is anything else used by the kernel? */ +break; +} if (endp) *endp = (char *)cp; return result; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hello, I thought about that. This would be my recommendation for glibc where the general public may be doing scientific applications. But this is the kernel and there are people that would reject my patch purely on the basis that it adds precious bytes to the kernel. But since the kernel is "controllable" & printf() and its variants only support 8, 10, & 16, perhaps a better solution might be to trap the odd case and write something for it if its that important, or simply don't allow it. The base guessing part at the beginning of the function only supports base 8, 10, & 16. Therefore, the only way to require another base is to specify it in the function call (param - unsigned int base). A quick scan of the current linux source shows no one using something odd. So... If the maintainers of vsprintf.c want support for all number bases, that's fine with me. Just say the word & I'll gen up another patch...but it will be more bytes. Cheers, Steve Grubb - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
On Wed, Dec 20, 2000 at 09:09:03AM -0500, Steve Grubb wrote: > Hello, > > The following patch is a faster implementation of the simple_strtoul > function. [snip] Why not preserve the existing code for bases other than 8, 10, and 16? Admittedly, the only other case that is likely to be used would be base 2, but surely there's only a penalty of a few dozen bytes for the following code.. > - while (isxdigit(*cp) && (value = isdigit(*cp) ? *cp-'0' : (islower(*cp) > - ? toupper(*cp) : *cp)-'A'+10) < base) { > - result = result*base + value; > - cp++; > - } Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[Patch] performance enhancement for simple_strtoul
Hello, The following patch is a faster implementation of the simple_strtoul function. This function differs from the original in that it reduces the multiplies to shifts and logical operations wherever possible. My testing shows that it adds around 100 bytes, but is about 6% faster on a K6-2. (It is 40% faster that glibc's strtoul, but that's a different story.) My guess is that the performance gain will be higher on platforms with slower multiplication instructions. I have tested it for numerical accuracy so I think this is safe to apply. If anyone is interested, I can also supply a test application that demonstrates the performance gain. This patch was generated against 2.2.16, but should apply to 2.2.19 cleanly. In 2.4.0-test9, simple_strtoul starts on line 19 rather than 17, hopefully that's not a problem. Cheers, Steve Grubb - --- lib/vsprintf.orig Fri Dec 1 08:58:02 2000 +++ lib/vsprintf.c Wed Dec 20 08:42:26 2000 @@ -14,10 +14,13 @@ #include #include +/* +* This function converts base 8, 10, or 16 only - Steve Grubb +*/ unsigned long simple_strtoul(const char *cp,char **endp,unsigned int base) { - unsigned long result = 0,value; - + unsigned char c; + unsigned long result = 0; if (!base) { base = 10; if (*cp == '0') { @@ -29,11 +32,36 @@ } } } - while (isxdigit(*cp) && (value = isdigit(*cp) ? *cp-'0' : (islower(*cp) - ? toupper(*cp) : *cp)-'A'+10) < base) { - result = result*base + value; - cp++; - } + c = *cp; +switch (base) { +case 10: +while (isdigit(c)) { +result = (result*10) + (c & 0x0f); +c = *(++cp); +} +break; +case 16: +while (isxdigit(c)) { +result = result<<4; +if (c&0x40) + result += (c & 0x07) + 9; +else +result += c & 0x0f; +c = *(++cp); +} +break; +case 8: +while (isdigit(c)) { +if ((c&0x37) == c) +result = (result<<3) + (c & 0x07); +else +break; +c = *(++cp); +} +break; +default: /* Is anything else used by the kernel? */ +break; +} if (endp) *endp = (char *)cp; return result; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[Patch] performance enhancement for simple_strtoul
Hello, The following patch is a faster implementation of the simple_strtoul function. This function differs from the original in that it reduces the multiplies to shifts and logical operations wherever possible. My testing shows that it adds around 100 bytes, but is about 6% faster on a K6-2. (It is 40% faster that glibc's strtoul, but that's a different story.) My guess is that the performance gain will be higher on platforms with slower multiplication instructions. I have tested it for numerical accuracy so I think this is safe to apply. If anyone is interested, I can also supply a test application that demonstrates the performance gain. This patch was generated against 2.2.16, but should apply to 2.2.19 cleanly. In 2.4.0-test9, simple_strtoul starts on line 19 rather than 17, hopefully that's not a problem. Cheers, Steve Grubb - --- lib/vsprintf.orig Fri Dec 1 08:58:02 2000 +++ lib/vsprintf.c Wed Dec 20 08:42:26 2000 @@ -14,10 +14,13 @@ #include linux/string.h #include linux/ctype.h +/* +* This function converts base 8, 10, or 16 only - Steve Grubb +*/ unsigned long simple_strtoul(const char *cp,char **endp,unsigned int base) { - unsigned long result = 0,value; - + unsigned char c; + unsigned long result = 0; if (!base) { base = 10; if (*cp == '0') { @@ -29,11 +32,36 @@ } } } - while (isxdigit(*cp) (value = isdigit(*cp) ? *cp-'0' : (islower(*cp) - ? toupper(*cp) : *cp)-'A'+10) base) { - result = result*base + value; - cp++; - } + c = *cp; +switch (base) { +case 10: +while (isdigit(c)) { +result = (result*10) + (c 0x0f); +c = *(++cp); +} +break; +case 16: +while (isxdigit(c)) { +result = result4; +if (c0x40) + result += (c 0x07) + 9; +else +result += c 0x0f; +c = *(++cp); +} +break; +case 8: +while (isdigit(c)) { +if ((c0x37) == c) +result = (result3) + (c 0x07); +else +break; +c = *(++cp); +} +break; +default: /* Is anything else used by the kernel? */ +break; +} if (endp) *endp = (char *)cp; return result; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
On Wed, Dec 20, 2000 at 09:09:03AM -0500, Steve Grubb wrote: Hello, The following patch is a faster implementation of the simple_strtoul function. [snip] Why not preserve the existing code for bases other than 8, 10, and 16? Admittedly, the only other case that is likely to be used would be base 2, but surely there's only a penalty of a few dozen bytes for the following code.. - while (isxdigit(*cp) (value = isdigit(*cp) ? *cp-'0' : (islower(*cp) - ? toupper(*cp) : *cp)-'A'+10) base) { - result = result*base + value; - cp++; - } Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hello, I thought about that. This would be my recommendation for glibc where the general public may be doing scientific applications. But this is the kernel and there are people that would reject my patch purely on the basis that it adds precious bytes to the kernel. But since the kernel is "controllable" printf() and its variants only support 8, 10, 16, perhaps a better solution might be to trap the odd case and write something for it if its that important, or simply don't allow it. The base guessing part at the beginning of the function only supports base 8, 10, 16. Therefore, the only way to require another base is to specify it in the function call (param - unsigned int base). A quick scan of the current linux source shows no one using something odd. So... If the maintainers of vsprintf.c want support for all number bases, that's fine with me. Just say the word I'll gen up another patch...but it will be more bytes. Cheers, Steve Grubb - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Hello, I continued experimenting with the Test Case and found a further speed improvement I am re-submiting the patch. It is the same as the first one with the two local variables changed to register storage types. On a K6-2, I now see: Base 10 - 28% speedup Base 16 - 24% speedup Base 8 - 30% speedup On a P3 system, I now see: Base 10 - 25% speedup Base 16 - 17% speedup Base 8 - 20% speedup It seems gcc creates much better code with the variables set to register types. Please apply the following patch. It should apply to any recent 2.2.x without problems. In 2.4 the function starts 2 lines later. Cheers, Steve Grubb -- --- lib/vsprintf.orig Fri Dec 1 08:58:02 2000 +++ lib/vsprintf.c Wed Dec 20 13:14:13 2000 @@ -14,10 +14,13 @@ #include linux/string.h #include linux/ctype.h +/* +* This function converts base 8, 10, or 16 only - Steve Grubb +*/ unsigned long simple_strtoul(const char *cp,char **endp,unsigned int base) { - unsigned long result = 0,value; - + register unsigned char c; + register unsigned long result = 0; if (!base) { base = 10; if (*cp == '0') { @@ -29,11 +32,36 @@ } } } - while (isxdigit(*cp) (value = isdigit(*cp) ? *cp-'0' : (islower(*cp) - ? toupper(*cp) : *cp)-'A'+10) base) { - result = result*base + value; - cp++; - } + c = *cp; +switch (base) { +case 10: +while (isdigit(c)) { +result = (result*10) + (c 0x0f); +c = *(++cp); +} +break; +case 16: +while (isxdigit(c)) { +result = result4; +if (c0x40) + result += (c 0x07) + 9; +else +result += c 0x0f; +c = *(++cp); +} +break; +case 8: +while (isdigit(c)) { +if ((c0x37) == c) +result = (result3) + (c 0x07); +else +break; +c = *(++cp); +} +break; +default: /* Is anything else used by the kernel? */ +break; +} if (endp) *endp = (char *)cp; return result; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] performance enhancement for simple_strtoul
Steve Grubb wrote: It seems gcc creates much better code with the variables set to register types. Curious. GCC should be generating the same code regardless; ah well. Is strtoul actually used in the kernel other than for the occasional (rare) write to /proc/sys and parsing boot options? But this is the kernel and there are people that would reject my patch purely on the basis that it adds precious bytes to the kernel. Perhaps I am mistaken but I'd expect it to be called what, ten times at boot time, and a couple of times when X loads the MTRRs? Sounds like the neatest trick would be reducing bytes used here... -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/