Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
Michal Nazarewicz wrote: > On Fri, Aug 03 2012, George Spelvin wrote: >> Shrink the reciprocal approximations used in put_dec_full4 >> based on the comments in put_dec_full9. > > Have you verified that the comment is correct? I rechecked all the validity limits myself. >> r = (q * 0xcd) >> 11; > If you are changing everything, this could also be changed to: > > r = (q * 0x67) >> 10; > > no? Also in those comments is a statement I did *not* recheck, as it didn't affect correctness, saying that 0xcd produces shorter code than 0x67 on x86 (if the code is generated using shifts and adds). That's why I left it that way. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
>> +/* See comment in put_dec_full9 for choice of constants */ >> static noinline_for_stack >> char *put_dec_full4(char *buf, unsigned q) >> { >> unsigned r; >> - r = (q * 0xcccd) >> 19; >> + r = (q * 0xccd) >> 15; >> *buf++ = (q - 10 * r) + '0'; >> - q = (r * 0x199a) >> 16; >> + q = (r * 0xcd) >> 11; > I would use 16-bit shifts instead of smaller ones. > There may be CPUs on which "get upper half of 32-bit reg" > operation is cheaper or smaller than a shift. Good point, but wouldn't those CPUs *also* have multi-cycle multiply, or have to synthesize it out of shift-and-add, in which case smaller constants would save even more cycles? I'm thinking original MC68010 here, which I'm not sure is even meaningful any more. ColdFire has single-cycle shifts. Can you think of a processor where that would actually be an improvement? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
On Fri, Aug 3, 2012 at 7:21 AM, George Spelvin wrote: > Shrink the reciprocal approximations used in put_dec_full4 > based on the comments in put_dec_full9. > > Signed-off-by: George Spelvin > Cc: Denys Vlasenko > Cc: Michal Nazarewicz > --- > lib/vsprintf.c |5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > I was looking over the code and noticed that the constants could be smaller. > > diff --git a/lib/vsprintf.c b/lib/vsprintf.c > index c3f36d41..2f32fe8 100644 > --- a/lib/vsprintf.c > +++ b/lib/vsprintf.c > @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n) > > /* Second algorithm: valid only for 64-bit long longs */ > > +/* See comment in put_dec_full9 for choice of constants */ > static noinline_for_stack > char *put_dec_full4(char *buf, unsigned q) > { > unsigned r; > - r = (q * 0xcccd) >> 19; > + r = (q * 0xccd) >> 15; > *buf++ = (q - 10 * r) + '0'; > - q = (r * 0x199a) >> 16; > + q = (r * 0xcd) >> 11; I would use 16-bit shifts instead of smaller ones. There may be CPUs on which "get upper half of 32-bit reg" operation is cheaper or smaller than a shift. -- vda -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
On Fri, Aug 3, 2012 at 7:21 AM, George Spelvin li...@horizon.com wrote: Shrink the reciprocal approximations used in put_dec_full4 based on the comments in put_dec_full9. Signed-off-by: George Spelvin li...@horizon.com Cc: Denys Vlasenko vda.li...@googlemail.com Cc: Michal Nazarewicz min...@mina86.com --- lib/vsprintf.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) I was looking over the code and noticed that the constants could be smaller. diff --git a/lib/vsprintf.c b/lib/vsprintf.c index c3f36d41..2f32fe8 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n) /* Second algorithm: valid only for 64-bit long longs */ +/* See comment in put_dec_full9 for choice of constants */ static noinline_for_stack char *put_dec_full4(char *buf, unsigned q) { unsigned r; - r = (q * 0xcccd) 19; + r = (q * 0xccd) 15; *buf++ = (q - 10 * r) + '0'; - q = (r * 0x199a) 16; + q = (r * 0xcd) 11; I would use 16-bit shifts instead of smaller ones. There may be CPUs on which get upper half of 32-bit reg operation is cheaper or smaller than a shift. -- vda -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
+/* See comment in put_dec_full9 for choice of constants */ static noinline_for_stack char *put_dec_full4(char *buf, unsigned q) { unsigned r; - r = (q * 0xcccd) 19; + r = (q * 0xccd) 15; *buf++ = (q - 10 * r) + '0'; - q = (r * 0x199a) 16; + q = (r * 0xcd) 11; I would use 16-bit shifts instead of smaller ones. There may be CPUs on which get upper half of 32-bit reg operation is cheaper or smaller than a shift. Good point, but wouldn't those CPUs *also* have multi-cycle multiply, or have to synthesize it out of shift-and-add, in which case smaller constants would save even more cycles? I'm thinking original MC68010 here, which I'm not sure is even meaningful any more. ColdFire has single-cycle shifts. Can you think of a processor where that would actually be an improvement? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
Michal Nazarewicz m...@google.com wrote: On Fri, Aug 03 2012, George Spelvin li...@horizon.com wrote: Shrink the reciprocal approximations used in put_dec_full4 based on the comments in put_dec_full9. Have you verified that the comment is correct? I rechecked all the validity limits myself. r = (q * 0xcd) 11; If you are changing everything, this could also be changed to: r = (q * 0x67) 10; no? Also in those comments is a statement I did *not* recheck, as it didn't affect correctness, saying that 0xcd produces shorter code than 0x67 on x86 (if the code is generated using shifts and adds). That's why I left it that way. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
On Fri, Aug 03 2012, George Spelvin wrote: > Shrink the reciprocal approximations used in put_dec_full4 > based on the comments in put_dec_full9. > > Signed-off-by: George Spelvin > Cc: Denys Vlasenko > Cc: Michal Nazarewicz Have you verified that the comment is correct? > --- > lib/vsprintf.c |5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > I was looking over the code and noticed that the constants could be smaller. > > diff --git a/lib/vsprintf.c b/lib/vsprintf.c > index c3f36d41..2f32fe8 100644 > --- a/lib/vsprintf.c > +++ b/lib/vsprintf.c > @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n) > > /* Second algorithm: valid only for 64-bit long longs */ > > +/* See comment in put_dec_full9 for choice of constants */ > static noinline_for_stack > char *put_dec_full4(char *buf, unsigned q) > { > unsigned r; > - r = (q * 0xcccd) >> 19; > + r = (q * 0xccd) >> 15; > *buf++ = (q - 10 * r) + '0'; > - q = (r * 0x199a) >> 16; > + q = (r * 0xcd) >> 11; > *buf++ = (r - 10 * q) + '0'; > r = (q * 0xcd) >> 11; If you are changing everything, this could also be changed to: r = (q * 0x67) >> 10; no? > *buf++ = (q - 10 * r) + '0'; -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michał “mina86” Nazarewicz(o o) ooo +--ooO--(_)--Ooo-- pgphxPh3rOuGX.pgp Description: PGP signature
Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
On Fri, Aug 03 2012, George Spelvin li...@horizon.com wrote: Shrink the reciprocal approximations used in put_dec_full4 based on the comments in put_dec_full9. Signed-off-by: George Spelvin li...@horizon.com Cc: Denys Vlasenko vda.li...@googlemail.com Cc: Michal Nazarewicz min...@mina86.com Have you verified that the comment is correct? --- lib/vsprintf.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) I was looking over the code and noticed that the constants could be smaller. diff --git a/lib/vsprintf.c b/lib/vsprintf.c index c3f36d41..2f32fe8 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n) /* Second algorithm: valid only for 64-bit long longs */ +/* See comment in put_dec_full9 for choice of constants */ static noinline_for_stack char *put_dec_full4(char *buf, unsigned q) { unsigned r; - r = (q * 0xcccd) 19; + r = (q * 0xccd) 15; *buf++ = (q - 10 * r) + '0'; - q = (r * 0x199a) 16; + q = (r * 0xcd) 11; *buf++ = (r - 10 * q) + '0'; r = (q * 0xcd) 11; If you are changing everything, this could also be changed to: r = (q * 0x67) 10; no? *buf++ = (q - 10 * r) + '0'; -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michał “mina86” Nazarewicz(o o) ooo +email/xmpp: m...@google.com--ooO--(_)--Ooo-- pgphxPh3rOuGX.pgp Description: PGP signature
[PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
Shrink the reciprocal approximations used in put_dec_full4 based on the comments in put_dec_full9. Signed-off-by: George Spelvin Cc: Denys Vlasenko Cc: Michal Nazarewicz --- lib/vsprintf.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) I was looking over the code and noticed that the constants could be smaller. diff --git a/lib/vsprintf.c b/lib/vsprintf.c index c3f36d41..2f32fe8 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n) /* Second algorithm: valid only for 64-bit long longs */ +/* See comment in put_dec_full9 for choice of constants */ static noinline_for_stack char *put_dec_full4(char *buf, unsigned q) { unsigned r; - r = (q * 0xcccd) >> 19; + r = (q * 0xccd) >> 15; *buf++ = (q - 10 * r) + '0'; - q = (r * 0x199a) >> 16; + q = (r * 0xcd) >> 11; *buf++ = (r - 10 * q) + '0'; r = (q * 0xcd) >> 11; *buf++ = (q - 10 * r) + '0'; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.
Shrink the reciprocal approximations used in put_dec_full4 based on the comments in put_dec_full9. Signed-off-by: George Spelvin li...@horizon.com Cc: Denys Vlasenko vda.li...@googlemail.com Cc: Michal Nazarewicz min...@mina86.com --- lib/vsprintf.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) I was looking over the code and noticed that the constants could be smaller. diff --git a/lib/vsprintf.c b/lib/vsprintf.c index c3f36d41..2f32fe8 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n) /* Second algorithm: valid only for 64-bit long longs */ +/* See comment in put_dec_full9 for choice of constants */ static noinline_for_stack char *put_dec_full4(char *buf, unsigned q) { unsigned r; - r = (q * 0xcccd) 19; + r = (q * 0xccd) 15; *buf++ = (q - 10 * r) + '0'; - q = (r * 0x199a) 16; + q = (r * 0xcd) 11; *buf++ = (r - 10 * q) + '0'; r = (q * 0xcd) 11; *buf++ = (q - 10 * r) + '0'; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/