RE: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread Bill Stoddard
I ran a quick profile with this patch and it eliminated a couple of
divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
mileage may vary) which was good for 493 instructions. Still have 3 __divu64
and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT
function, so there is not much we can do about these directly.  One __divi64
is in apr_poll (convert microseconds to milliseconds. This can probably be
optimized away). The other __divi64 is somewhere in cached_explode
(util_time.c).

Bill

 At 10:38 PM 7/10/2002, William A. Rowe, Jr. wrote:
 At 10:03 PM 7/10/2002, Brian Pane wrote:
 Bill Stoddard wrote:
 
 I've not looked at the generated code, but profiling indicates that an
 additional division is happening, adding an extra 231 instructions.
 (xlc_r -O2)
 
 If you redefine the macro as a shift, does the profile look better?

 Ok, attached is the code redone as binary math.  I'm tired, could be
 any number of major blunders in it, but on first pass, it looked right.

 Bill



RE: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread Ryan Bloom

 From: Bill Stoddard [mailto:[EMAIL PROTECTED]
 
 I ran a quick profile with this patch and it eliminated a couple of
 divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
 mileage may vary) which was good for 493 instructions. Still have 3
 __divu64 and 2 __divi64 calls. The three __divu64 calls are in the
 gettimeofday() CRT function, so there is not much we can do about
 these directly.  One

Has anybody looked at ExtendedStatus to make sure that we are
eliminating all of the time calls that we can during request processing?

 __divi64 is in apr_poll (convert microseconds to milliseconds. This can
 probably be

If we move to using the macros, it should go away.

Ryan



Re: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread Greg Marr
At 01:33 AM 07/11/2002, William A. Rowe, Jr. wrote:
Ok, attached is the code redone as binary math.  I'm tired, could be 
any number of major blunders in it, but on first pass, it looked right.

-/** number of microseconds per second */
-#define APR_USEC_PER_SEC APR_TIME_C(1000000)
+/** number of binary microseconds per second (2^20) */
+#define APR_USEC_PER_SEC APR_TIME_C(1048576)
+#define APR_USEC_BITS 20
I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now
(1 << APR_USEC_BITS) instead of the magical constant.  I have no way 
of verifying with a quick glance that 1048576 is really 2^20.
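
A minimal sketch of that suggestion, not the code from the attached patch. The SKETCH_ names and the sketch_time_t typedef are hypothetical stand-ins for the APR definitions, and the compile-time check assumes a C11 compiler:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for apr_time_t (assumed to be a signed 64-bit value). */
typedef int64_t sketch_time_t;

/* Derive the per-second constant from the bit count instead of
 * spelling out the literal 1048576. */
#define SKETCH_USEC_BITS    20
#define SKETCH_USEC_PER_SEC ((sketch_time_t)1 << SKETCH_USEC_BITS)

/* Let the compiler confirm that the shift and the literal agree. */
_Static_assert(SKETCH_USEC_PER_SEC == 1048576, "1 << 20 must equal 1048576");

/* With a power-of-two unit, extracting whole seconds is a shift, not a division. */
static sketch_time_t sketch_whole_seconds(sketch_time_t t)
{
    assert(t >= 0);  /* the sketch only covers non-negative times */
    return t >> SKETCH_USEC_BITS;
}
```

Writing the constant as a shift keeps it tied to the bit count, so changing one without the other is no longer possible.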

--
Greg Marr
[EMAIL PROTECTED]
We thought you were dead.
I was, but I'm better now. - Sheridan, The Summoning


Re: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread Jim Jagielski
Greg Marr wrote:
 
 At 01:33 AM 07/11/2002, William A. Rowe, Jr. wrote:
 Ok, attached is the code redone as binary math.  I'm tired, could be 
 any number of major blunders in it, but on first pass, it looked right.
 
 -/** number of microseconds per second */
 -#define APR_USEC_PER_SEC APR_TIME_C(1000000)
 +/** number of binary microseconds per second (2^20) */
 +#define APR_USEC_PER_SEC APR_TIME_C(1048576)
 +#define APR_USEC_BITS 20
 
 I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now
 (1 << APR_USEC_BITS) instead of the magical constant.  I have no way 
 of verifying with a quick glance that 1048576 is really 2^20.
 

It is :)

-- 
===
   Jim Jagielski   [|]   [EMAIL PROTECTED]   [|]   http://www.jaguNET.com/
  A society that will trade a little liberty for a little order
 will lose both and deserve neither - T.Jefferson


Re: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread Greg Marr
At 10:27 AM 07/11/2002, Jim Jagielski wrote:
Greg Marr wrote:
 I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now
 (1 << APR_USEC_BITS) instead of the magical constant.  I have no way
 of verifying with a quick glance that 1048576 is really 2^20.


It is :)
Well, yes, I did check it, but it's not immediately obvious.  :)
--
Greg Marr


RE: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread Brian Pane
On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:

 I ran a quick profile with this patch and it eliminated a couple of
 divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
 mileage may vary) which was good for 493 instructions. Still have 3 __divu64
 and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT
 function, so there is not much we can do about these directly.  One __divi64
 is in apr_poll (convert microseconds to milliseconds. This can probably be
 optimized away). The other __divi64 is somewhere in cached_explode
 (util_time.c).

The only division that I know of in cached_explode() is:

struct exploded_time_cache_element *cache_element =
    &(cache[seconds % TIME_CACHE_SIZE]);

Is that where the division operation is being generated
on your test system?

TIME_CACHE_SIZE is 16, specifically so that the compiler
can optimize away the division, but if there's still a
division being generated, we can replace it with
  cache[seconds & TIME_CACHE_MASK]
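
A rough sketch of the two indexing forms, assuming TIME_CACHE_MASK would be defined as TIME_CACHE_SIZE - 1 (that definition is an assumption, as are the helper function names). For a signed 64-bit operand, `%` must give a negative result for negative inputs, so a compiler cannot always reduce it to a single AND; the mask form has no such constraint:

```c
#include <assert.h>
#include <stdint.h>

#define TIME_CACHE_SIZE 16
/* Assumed definition: valid only while TIME_CACHE_SIZE is a power of two. */
#define TIME_CACHE_MASK (TIME_CACHE_SIZE - 1)

/* Fail the build if someone picks a size that is not a power of two. */
_Static_assert((TIME_CACHE_SIZE & TIME_CACHE_MASK) == 0,
               "TIME_CACHE_SIZE must be a power of two");

/* Modulo form: for signed input the result takes the sign of the
 * dividend, so it is only a safe array index when seconds >= 0. */
static unsigned index_by_modulo(int64_t seconds)
{
    assert(seconds >= 0);
    return (unsigned)(seconds % TIME_CACHE_SIZE);
}

/* Mask form: a single AND, always in the range 0..TIME_CACHE_SIZE-1. */
static unsigned index_by_mask(int64_t seconds)
{
    return (unsigned)(seconds & TIME_CACHE_MASK);
}
```

For non-negative seconds both forms select the same slot, so the swap would not change which cache element is used.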

--Brian




RE: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread William A. Rowe, Jr.
At 10:41 AM 7/11/2002, Brian Pane wrote:
On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:
 I ran a quick profile with this patch and it eliminated a couple of
 divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
 mileage may vary) which was good for 493 instructions. Still have 3 __divu64
 and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT
 function, so there is not much we can do about these directly.  One __divi64
 is in apr_poll (convert microseconds to milliseconds. This can probably be
 optimized away). The other __divi64 is somewhere in cached_explode
 (util_time.c).

The only division that I know of in cached_explode() is:
struct exploded_time_cache_element *cache_element =
    &(cache[seconds % TIME_CACHE_SIZE]);
Is that where the division operation is being generated
on your test system?
Actually, we still have a division when constructing busec from usecs.
That would be the one /1000000 which can never be optimized
away.  Fortunately, it should be infrequent.
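
A sketch of where that one division lands, under an assumed layout (whole seconds in the high bits, a 20-bit binary-microsecond fraction in the low bits). The layout and all names here are hypothetical and are not taken from the attached patch:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_USEC_BITS 20

/* Build a binary-microsecond time from seconds plus decimal microseconds.
 * Scaling usec (0..999999) into 1/2^20-second units needs a real divide
 * by 1000000, because 1000000 is not a power of two. */
static int64_t sketch_busec_from_usec(int64_t sec, int32_t usec)
{
    assert(sec >= 0);
    assert(usec >= 0 && usec < 1000000);
    int64_t frac = ((int64_t)usec << SKETCH_USEC_BITS) / 1000000;
    return (sec << SKETCH_USEC_BITS) | frac;
}

/* Going the other way for whole seconds is only a shift. */
static int64_t sketch_sec(int64_t t)
{
    return t >> SKETCH_USEC_BITS;
}
```

Under this layout the divide is paid once, when a time value is constructed, and later reads of the seconds part are shifts.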
Bill



Re: [PATCH 2] example binary BUSEC patch for benchmarking only

2002-07-11 Thread Cliff Woolley
On Thu, 11 Jul 2002, Greg Marr wrote:

 I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now
 (1 << APR_USEC_BITS) instead of the magical constant.  I have no way
 of verifying with a quick glance that 1048576 is really 2^20.

You don't know your powers of 2?  Memorize, Greg, Memorize.  ;)