I’m not sure there’s any single correct way to do benchmarks without information about what you’re trying to optimize.
If you’re trying to optimize the experience of people using your code, I think it’s important to use means rather than medians, because you want a metric that’s affected by the entire shape of the distribution of times and not entirely determined by the "center" of that distribution.

If you want a theoretically pure measurement for an algorithm, I think measuring time is kind of problematic. For algorithms, I’d prefer seeing a count of CPU instructions.

— John

On Jun 2, 2014, at 7:32 PM, Kevin Squire <[email protected]> wrote:

> I think that, for many algorithms, triggering the gc() is simply a matter of
> running a simulation for enough iterations. By calling gc() ahead of time,
> you should be able to get the same number (n > 0) of gc calls, which isn't
> ignoring gc().
>
> That said, it can take some effort to figure out the number of iterations and
> the time needed to run the experiment.
>
> Cheers, Kevin
>
> On Monday, June 2, 2014, Stefan Karpinski <[email protected]> wrote:
>
> I feel that ignoring gc can be a bit of a cheat, since it does happen and it's
> quite expensive – and other systems may be better or worse at it. Of course,
> it can still be good to separate the causes of slowness explicitly into
> execution time and overhead for things like gc.
>
> On Mon, Jun 2, 2014 at 5:21 PM, Kevin Squire <[email protected]> wrote:
>
> Thanks, John. His argument definitely makes sense: algorithms that cause
> more garbage collection won't get penalized by the median unless, of course,
> they cause gc() to occur more than 50% of the time.
>
> Most benchmarks of Julia code that I've done (or seen) have made some attempt
> to take gc() differences out of the equation, usually by explicitly calling
> gc() before the timing begins. For most algorithms, that would mean that the
> same number of gc() calls should occur for each repetition, in which case I
> would think that any measure of central tendency (including the mean and the
> median) would be useful.
>
> Is there a problem with this reasoning?
>
> Cheers,
> Kevin
>
> On Mon, Jun 2, 2014 at 1:04 PM, John Myles White <[email protected]> wrote:
>
> For some reasons why one might not want to use the median, see
> http://radfordneal.wordpress.com/2014/02/02/inaccurate-results-from-microbenchmark/
>
> -- John
>
> On Jun 2, 2014, at 11:06 AM, Kevin Squire <[email protected]> wrote:
>
>> The median is probably also useful. I like it a little better in cases where
>> the code being tested triggers gc() more than half the time.
>>
>> On Monday, June 2, 2014, Steven G. Johnson <[email protected]> wrote:
>>
>> On Monday, June 2, 2014 1:01:25 AM UTC-4, Jameson wrote:
>>
>> Therefore, for benchmarks, you should execute your code in a loop enough
>> times that the measurement error (of the hardware and OS) is not too
>> significant.
>>
>> You can also often benchmark multiple times and take the minimum (not the
>> mean!) time for reasonable results with fairly small time intervals.
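
[Editorial sketch, not from the thread.] Pulling the suggestions above together, here is a minimal timing harness in current Julia, where the thread-era `gc()` is spelled `GC.gc()`. The workload `work` and the harness name `benchtimes` are hypothetical stand-ins for the code under test; the sketch just combines the collect-ahead-of-time, loop-enough-times, and report-minimum/median/mean ideas in one place:

    using Statistics  # mean and median live here in current Julia

    # Hypothetical workload standing in for the code under test.
    work(n) = sum(sqrt(i) for i in 1:n)

    function benchtimes(f, args...; reps = 100)
        f(args...)   # warm-up call so JIT compilation isn't included in the timings
        GC.gc()      # collect ahead of time, per Kevin's suggestion (`gc()` in 2014-era Julia)
        times = [@elapsed f(args...) for _ in 1:reps]
        # The minimum approximates the cost with the least hardware/OS noise,
        # while the mean is pulled upward by gc pauses and other tail effects —
        # closer to what users of the code actually experience.
        (minimum = minimum(times), median = median(times), mean = mean(times))
    end

    benchtimes(work, 10^6)

John's preference for instruction counts over wall time has to be measured outside the language; on Linux, for instance, `perf stat` reports the number of instructions a process retires, which sidesteps timer resolution and OS scheduling noise in a way that timings cannot.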

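[Editorial sketch, not from the thread.] A small hypothetical illustration of the median-vs-mean point: with an allocation-heavy workload, gc pauses land in some repetitions but not others, so the median can stay flat while the mean moves.

    using Statistics

    # Allocation-heavy stand-in: each call creates a fresh array of n Float64s,
    # so garbage collection fires on some repetitions but not others.
    allocating(n) = sum(abs.(randn(n)))

    allocating(10^6)  # warm-up
    ts = [@elapsed allocating(10^6) for _ in 1:200]
    println("median = ", median(ts), ", mean = ", mean(ts))
    # As long as gc pauses hit fewer than half the repetitions, the median
    # barely moves; the mean reflects them, which is John's case for the mean
    # when the goal is the experience of people running the code.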