Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-24 Thread George Spelvin
Michal Nazarewicz  wrote:
> On Fri, Aug 03 2012, George Spelvin  wrote:
>> Shrink the reciprocal approximations used in put_dec_full4
>> based on the comments in put_dec_full9.
>
> Have you verified that the comment is correct?

I rechecked all the validity limits myself.

>>  r  = (q * 0xcd) >> 11;

> If you are changing everything, this could also be changed to:
>
>   r  = (q * 0x67) >> 10;
>
> no?

Also in those comments is a statement I did *not* recheck, as it didn't
affect correctness, saying that 0xcd produces shorter code than 0x67 on
x86 (if the code is generated using shifts and adds).

That's why I left it that way.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-24 Thread George Spelvin
>> +/* See comment in put_dec_full9 for choice of constants */
>>  static noinline_for_stack
>>  char *put_dec_full4(char *buf, unsigned q)
>>  {
>> unsigned r;
>> -   r  = (q * 0xcccd) >> 19;
>> +   r  = (q * 0xccd) >> 15;
>> *buf++ = (q - 10 * r) + '0';
>> -   q  = (r * 0x199a) >> 16;
>> +   q  = (r * 0xcd) >> 11;

> I would use 16-bit shifts instead of smaller ones.
> There may be CPUs on which "get upper half of 32-bit reg"
> operation is cheaper or smaller than a shift.

Good point, but wouldn't those CPUs *also* have multi-cycle multiply,
or have to synthesize it out of shift-and-add, in which case smaller
constants would save even more cycles?

I'm thinking original MC68010 here, which I'm not sure is even
meaningful any more.  ColdFire has single-cycle shifts.

Can you think of a processor where that would actually be
an improvement?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-24 Thread Denys Vlasenko
On Fri, Aug 3, 2012 at 7:21 AM, George Spelvin  wrote:
> Shrink the reciprocal approximations used in put_dec_full4
> based on the comments in put_dec_full9.
>
> Signed-off-by: George Spelvin 
> Cc: Denys Vlasenko 
> Cc: Michal Nazarewicz 
> ---
>  lib/vsprintf.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> I was looking over the code and noticed that the constants could be smaller.
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index c3f36d41..2f32fe8 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n)
>
>  /* Second algorithm: valid only for 64-bit long longs */
>
> +/* See comment in put_dec_full9 for choice of constants */
>  static noinline_for_stack
>  char *put_dec_full4(char *buf, unsigned q)
>  {
> unsigned r;
> -   r  = (q * 0xcccd) >> 19;
> +   r  = (q * 0xccd) >> 15;
> *buf++ = (q - 10 * r) + '0';
> -   q  = (r * 0x199a) >> 16;
> +   q  = (r * 0xcd) >> 11;

I would use 16-bit shifts instead of smaller ones.
There may be CPUs on which "get upper half of 32-bit reg"
operation is cheaper or smaller than a shift.

-- 
vda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-24 Thread Denys Vlasenko
On Fri, Aug 3, 2012 at 7:21 AM, George Spelvin li...@horizon.com wrote:
 Shrink the reciprocal approximations used in put_dec_full4
 based on the comments in put_dec_full9.

 Signed-off-by: George Spelvin li...@horizon.com
 Cc: Denys Vlasenko vda.li...@googlemail.com
 Cc: Michal Nazarewicz min...@mina86.com
 ---
  lib/vsprintf.c |5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

 I was looking over the code and noticed that the constants could be smaller.

 diff --git a/lib/vsprintf.c b/lib/vsprintf.c
 index c3f36d41..2f32fe8 100644
 --- a/lib/vsprintf.c
 +++ b/lib/vsprintf.c
 @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n)

  /* Second algorithm: valid only for 64-bit long longs */

 +/* See comment in put_dec_full9 for choice of constants */
  static noinline_for_stack
  char *put_dec_full4(char *buf, unsigned q)
  {
 unsigned r;
 -   r  = (q * 0xcccd)  19;
 +   r  = (q * 0xccd)  15;
 *buf++ = (q - 10 * r) + '0';
 -   q  = (r * 0x199a)  16;
 +   q  = (r * 0xcd)  11;

I would use 16-bit shifts instead of smaller ones.
There may be CPUs on which get upper half of 32-bit reg
operation is cheaper or smaller than a shift.

-- 
vda
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-24 Thread George Spelvin
 +/* See comment in put_dec_full9 for choice of constants */
  static noinline_for_stack
  char *put_dec_full4(char *buf, unsigned q)
  {
 unsigned r;
 -   r  = (q * 0xcccd)  19;
 +   r  = (q * 0xccd)  15;
 *buf++ = (q - 10 * r) + '0';
 -   q  = (r * 0x199a)  16;
 +   q  = (r * 0xcd)  11;

 I would use 16-bit shifts instead of smaller ones.
 There may be CPUs on which get upper half of 32-bit reg
 operation is cheaper or smaller than a shift.

Good point, but wouldn't those CPUs *also* have multi-cycle multiply,
or have to synthesize it out of shift-and-add, in which case smaller
constants would save even more cycles?

I'm thinking original MC68010 here, which I'm not sure is even
meaningful any more.  ColdFire has single-cycle shifts.

Can you think of a processor where that would actually be
an improvement?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-24 Thread George Spelvin
Michal Nazarewicz m...@google.com wrote:
 On Fri, Aug 03 2012, George Spelvin li...@horizon.com wrote:
 Shrink the reciprocal approximations used in put_dec_full4
 based on the comments in put_dec_full9.

 Have you verified that the comment is correct?

I rechecked all the validity limits myself.

  r  = (q * 0xcd)  11;

 If you are changing everything, this could also be changed to:

   r  = (q * 0x67)  10;

 no?

Also in those comments is a statement I did *not* recheck, as it didn't
affect correctness, saying that 0xcd produces shorter code than 0x67 on
x86 (if the code is generated using shifts and adds).

That's why I left it that way.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-23 Thread Michal Nazarewicz
On Fri, Aug 03 2012, George Spelvin  wrote:
> Shrink the reciprocal approximations used in put_dec_full4
> based on the comments in put_dec_full9.
>
> Signed-off-by: George Spelvin 
> Cc: Denys Vlasenko 
> Cc: Michal Nazarewicz 

Have you verified that the comment is correct?

> ---
>  lib/vsprintf.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> I was looking over the code and noticed that the constants could be smaller.
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index c3f36d41..2f32fe8 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n)
>  
>  /* Second algorithm: valid only for 64-bit long longs */
>  
> +/* See comment in put_dec_full9 for choice of constants */
>  static noinline_for_stack
>  char *put_dec_full4(char *buf, unsigned q)
>  {
>   unsigned r;
> - r  = (q * 0xcccd) >> 19;
> + r  = (q * 0xccd) >> 15;
>   *buf++ = (q - 10 * r) + '0';
> - q  = (r * 0x199a) >> 16;
> + q  = (r * 0xcd) >> 11;
>   *buf++ = (r - 10 * q)  + '0';
>   r  = (q * 0xcd) >> 11;

If you are changing everything, this could also be changed to:

r  = (q * 0x67) >> 10;

no?

>   *buf++ = (q - 10 * r)  + '0';

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--

pgphxPh3rOuGX.pgp
Description: PGP signature


Re: [PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-09-23 Thread Michal Nazarewicz
On Fri, Aug 03 2012, George Spelvin li...@horizon.com wrote:
 Shrink the reciprocal approximations used in put_dec_full4
 based on the comments in put_dec_full9.

 Signed-off-by: George Spelvin li...@horizon.com
 Cc: Denys Vlasenko vda.li...@googlemail.com
 Cc: Michal Nazarewicz min...@mina86.com

Have you verified that the comment is correct?

 ---
  lib/vsprintf.c |5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

 I was looking over the code and noticed that the constants could be smaller.

 diff --git a/lib/vsprintf.c b/lib/vsprintf.c
 index c3f36d41..2f32fe8 100644
 --- a/lib/vsprintf.c
 +++ b/lib/vsprintf.c
 @@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n)
  
  /* Second algorithm: valid only for 64-bit long longs */
  
 +/* See comment in put_dec_full9 for choice of constants */
  static noinline_for_stack
  char *put_dec_full4(char *buf, unsigned q)
  {
   unsigned r;
 - r  = (q * 0xcccd)  19;
 + r  = (q * 0xccd)  15;
   *buf++ = (q - 10 * r) + '0';
 - q  = (r * 0x199a)  16;
 + q  = (r * 0xcd)  11;
   *buf++ = (r - 10 * q)  + '0';
   r  = (q * 0xcd)  11;

If you are changing everything, this could also be changed to:

r  = (q * 0x67)  10;

no?

   *buf++ = (q - 10 * r)  + '0';

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +email/xmpp: m...@google.com--ooO--(_)--Ooo--

pgphxPh3rOuGX.pgp
Description: PGP signature


[PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-08-02 Thread George Spelvin
Shrink the reciprocal approximations used in put_dec_full4
based on the comments in put_dec_full9.

Signed-off-by: George Spelvin 
Cc: Denys Vlasenko 
Cc: Michal Nazarewicz 
---
 lib/vsprintf.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

I was looking over the code and noticed that the constants could be smaller.

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c3f36d41..2f32fe8 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n)
 
 /* Second algorithm: valid only for 64-bit long longs */
 
+/* See comment in put_dec_full9 for choice of constants */
 static noinline_for_stack
 char *put_dec_full4(char *buf, unsigned q)
 {
unsigned r;
-   r  = (q * 0xcccd) >> 19;
+   r  = (q * 0xccd) >> 15;
*buf++ = (q - 10 * r) + '0';
-   q  = (r * 0x199a) >> 16;
+   q  = (r * 0xcd) >> 11;
*buf++ = (r - 10 * q)  + '0';
r  = (q * 0xcd) >> 11;
*buf++ = (q - 10 * r)  + '0';
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/4] lib: vsprintf: Optimize division by 10 for small integers.

2012-08-02 Thread George Spelvin
Shrink the reciprocal approximations used in put_dec_full4
based on the comments in put_dec_full9.

Signed-off-by: George Spelvin li...@horizon.com
Cc: Denys Vlasenko vda.li...@googlemail.com
Cc: Michal Nazarewicz min...@mina86.com
---
 lib/vsprintf.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

I was looking over the code and noticed that the constants could be smaller.

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c3f36d41..2f32fe8 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -243,13 +243,14 @@ char *put_dec(char *buf, unsigned long long n)
 
 /* Second algorithm: valid only for 64-bit long longs */
 
+/* See comment in put_dec_full9 for choice of constants */
 static noinline_for_stack
 char *put_dec_full4(char *buf, unsigned q)
 {
unsigned r;
-   r  = (q * 0xcccd)  19;
+   r  = (q * 0xccd)  15;
*buf++ = (q - 10 * r) + '0';
-   q  = (r * 0x199a)  16;
+   q  = (r * 0xcd)  11;
*buf++ = (r - 10 * q)  + '0';
r  = (q * 0xcd)  11;
*buf++ = (q - 10 * r)  + '0';
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/