RE: [PATCH 2] example binary BUSEC patch for benchmarking only
I ran a quick profile with this patch and it eliminated a couple of divisions (calls to __divi64 reduced from 4 to 2 in my test setup; your mileage may vary), which was good for 493 instructions. Still have 3 __divu64 and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT function, so there is not much we can do about these directly. One __divi64 is in apr_poll (converting microseconds to milliseconds; this can probably be optimized away). The other __divi64 is somewhere in cached_explode (util_time.c).

Bill

At 10:38 PM 7/10/2002, William A. Rowe, Jr. wrote:
> At 10:03 PM 7/10/2002, Brian Pane wrote:
> > Bill Stoddard wrote:
> > > I've not looked at the generated code, but profiling indicates that an additional division is happening, adding an extra 231 instructions. (xlc_r -O2)
> >
> > If you redefine the macro as a shift, does the profile look better?
>
> Ok, attached is the code redone as binary math. I'm tired, could be any number of major blunders in it, but on first pass, it looked right.
>
> Bill
RE: [PATCH 2] example binary BUSEC patch for benchmarking only
From: Bill Stoddard [mailto:[EMAIL PROTECTED]

> I ran a quick profile with this patch and it eliminated a couple of divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your mileage may vary) which was good for 493 instructions. Still have 3 __divu64 and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT function, so there is not much we can do about these directly. One

Has anybody looked at ExtendedStatus to make sure that we are eliminating all of the time calls that we can during request processing?

> __divi64 is in apr_poll (convert microseconds to milliseconds. This can probably be

If we move to using the macros, it should go away.

Ryan
Re: [PATCH 2] example binary BUSEC patch for benchmarking only
At 01:33 AM 07/11/2002, William A. Rowe, Jr. wrote:
> Ok, attached is the code redone as binary math. I'm tired, could be any number of major blunders in it, but on first pass, it looked right.
>
> -/** number of microseconds per second */
> -#define APR_USEC_PER_SEC APR_TIME_C(1000000)
> +/** number of binary microseconds per second (2^20) */
> +#define APR_USEC_PER_SEC APR_TIME_C(1048576)
> +#define APR_USEC_BITS 20

I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now (1 << APR_USEC_BITS), instead of the magical constant. I have no way of verifying with a quick glance that 1048576 is really 2^20.

--
Greg Marr
[EMAIL PROTECTED]
We thought you were dead. I was, but I'm better now. - Sheridan, The Summoning
Re: [PATCH 2] example binary BUSEC patch for benchmarking only
Greg Marr wrote:
> At 01:33 AM 07/11/2002, William A. Rowe, Jr. wrote:
> > Ok, attached is the code redone as binary math. I'm tired, could be any number of major blunders in it, but on first pass, it looked right.
> >
> > -/** number of microseconds per second */
> > -#define APR_USEC_PER_SEC APR_TIME_C(1000000)
> > +/** number of binary microseconds per second (2^20) */
> > +#define APR_USEC_PER_SEC APR_TIME_C(1048576)
> > +#define APR_USEC_BITS 20
>
> I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now (1 << APR_USEC_BITS), instead of the magical constant. I have no way of verifying with a quick glance that 1048576 is really 2^20.

It is :)

--
===
Jim Jagielski [|] [EMAIL PROTECTED] [|] http://www.jaguNET.com/
A society that will trade a little liberty for a little order will lose both and deserve neither - T.Jefferson
Re: [PATCH 2] example binary BUSEC patch for benchmarking only
At 10:27 AM 07/11/2002, Jim Jagielski wrote:
> Greg Marr wrote:
> > I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now (1 << APR_USEC_BITS), instead of the magical constant. I have no way of verifying with a quick glance that 1048576 is really 2^20.
>
> It is :)

Well, yes, I did check it, but it's not immediately obvious. :)

--
Greg Marr
[EMAIL PROTECTED]
We thought you were dead. I was, but I'm better now. - Sheridan, The Summoning
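
For readers following along, here is a minimal sketch of the shift-based definition Greg is asking for. The names `busec_time_t`, `BUSEC_BITS`, `BUSEC_PER_SEC`, and `busec_whole_seconds` are hypothetical stand-ins chosen for illustration, not the APR macros from the patch, and the sketch assumes a C11 compiler so that `static_assert` is available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the names in the patch; not the real APR headers. */
typedef int64_t busec_time_t;

#define BUSEC_BITS    20
/* Derive the constant from the bit count so the two can never drift apart. */
#define BUSEC_PER_SEC ((busec_time_t)1 << BUSEC_BITS)

/* Compile-time check that the shift really equals the literal in the patch. */
static_assert(BUSEC_PER_SEC == 1048576, "1 << 20 must equal 1048576");

/* Whole seconds in a non-negative binary-microsecond value: a shift, no division. */
busec_time_t busec_whole_seconds(busec_time_t t)
{
    return t >> BUSEC_BITS;
}
```

Writing the constant as a shift makes the power of two visible at a glance, and the static assertion documents the equivalence with the literal without costing anything at run time.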
RE: [PATCH 2] example binary BUSEC patch for benchmarking only
On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:
> I ran a quick profile with this patch and it eliminated a couple of divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your mileage may vary) which was good for 493 instructions. Still have 3 __divu64 and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT function, so there is not much we can do about these directly. One __divi64 is in apr_poll (convert microseconds to milliseconds. This can probably be optimized away). The other __divi64 is somewhere in cached_explode (util_time.c).

The only division that I know of in cached_explode() is:

    struct exploded_time_cache_element *cache_element =
        &(cache[seconds % TIME_CACHE_SIZE]);

Is that where the division operation is being generated on your test system? TIME_CACHE_SIZE is 16, specifically so that the compiler can optimize away the division, but if there's still a division being generated, we can replace it with

    cache[seconds & TIME_CACHE_MASK]

--Brian
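
A small sketch of the two indexing forms, for comparison. It assumes TIME_CACHE_MASK is TIME_CACHE_SIZE - 1 (the thread names the macro but does not show its value), and the helper names `slot_by_mod` and `slot_by_mask` are hypothetical. One possible reason a compiler keeps a helper call for the `%` form is that a signed 64-bit remainder is not a plain mask, since it must round toward zero for negative operands; that is a guess, not something the profile here confirms.

```c
#include <assert.h>
#include <stdint.h>

#define TIME_CACHE_SIZE 16
/* Assumption: the mask is size - 1, which is only valid for a power-of-two size. */
#define TIME_CACHE_MASK (TIME_CACHE_SIZE - 1)

static_assert((TIME_CACHE_SIZE & TIME_CACHE_MASK) == 0,
              "TIME_CACHE_SIZE must be a power of two");

/* Remainder form: may compile to a 64-bit helper call on some targets. */
unsigned slot_by_mod(int64_t seconds)
{
    return (unsigned)(seconds % TIME_CACHE_SIZE);
}

/* Mask form: a single AND; matches the remainder form for non-negative input. */
unsigned slot_by_mask(int64_t seconds)
{
    return (unsigned)(seconds & TIME_CACHE_MASK);
}
```

The two forms agree only for non-negative `seconds`; for negative values the remainder is negative while the mask result is not, so the mask form is also the safer array index.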
RE: [PATCH 2] example binary BUSEC patch for benchmarking only
At 10:41 AM 7/11/2002, Brian Pane wrote:
> On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:
> > I ran a quick profile with this patch and it eliminated a couple of divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your mileage may vary) which was good for 493 instructions. Still have 3 __divu64 and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT function, so there is not much we can do about these directly. One __divi64 is in apr_poll (convert microseconds to milliseconds. This can probably be optimized away). The other __divi64 is somewhere in cached_explode (util_time.c).
>
> The only division that I know of in cached_explode() is:
>
>     struct exploded_time_cache_element *cache_element =
>         &(cache[seconds % TIME_CACHE_SIZE]);
>
> Is that where the division operation is being generated on your test system?

Actually, we still have a division when constructing a busec value from usecs. That would be the one /1000000, which can never be optimized away. Fortunately, it should be infrequent.

Bill
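
To make the remaining division concrete, here is a rough sketch. It assumes a layout with whole seconds in the high bits and the 0..999999 microsecond remainder in the low 20 bits; that layout is an assumption made for illustration, not something the patch text quoted in this thread confirms. All names are hypothetical, and inputs are assumed non-negative.

```c
#include <assert.h>
#include <stdint.h>

#define BUSEC_BITS   20
#define BUSEC_MASK   ((INT64_C(1) << BUSEC_BITS) - 1)
#define USEC_PER_SEC INT64_C(1000000)

/* From a flat microsecond count: this is the one division (and remainder)
 * by 1000000 that cannot be turned into a shift. */
int64_t busec_from_usec(int64_t usec)
{
    return ((usec / USEC_PER_SEC) << BUSEC_BITS) | (usec % USEC_PER_SEC);
}

/* From separate seconds and microseconds, as gettimeofday() supplies them:
 * no division is needed. */
int64_t busec_make(int64_t sec, int64_t usec)
{
    return (sec << BUSEC_BITS) | usec;
}

/* Splitting a value back apart is a shift and a mask. */
int64_t busec_sec(int64_t t)  { return t >> BUSEC_BITS; }
int64_t busec_usec(int64_t t) { return t & BUSEC_MASK; }
```

Under this assumed layout, only callers that start from a flat microsecond count pay for the division; code that already has seconds and microseconds separately avoids it.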
Re: [PATCH 2] example binary BUSEC patch for benchmarking only
On Thu, 11 Jul 2002, Greg Marr wrote:
> I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now (1 << APR_USEC_BITS), instead of the magical constant. I have no way of verifying with a quick glance that 1048576 is really 2^20.

You don't know your powers of 2? Memorize, Greg, Memorize. ;)