IMO, unless your distribution has a very irregular shape (more than a few modes) and/or rare events (frequency < 1e-3), you can get estimates of all the quantiles between 0.05 and 0.95 that are good enough for most practical purposes from 500-2000 samples, using either simple nonparametric or semiparametric statistical methods.
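A minimal sketch of such a nonparametric estimate, in present-day Julia syntax (the Statistics stdlib postdates this thread, and "workload" is a made-up placeholder for the code being measured):

    using Statistics

    # Placeholder for the code being measured.
    workload() = sum(rand(1_000))

    # Collect ~1000 wall-clock samples.
    samples = [@elapsed(workload()) for _ in 1:1_000]

    # Simple empirical quantile estimates over the practical range.
    quantile(samples, [0.05, 0.25, 0.50, 0.75, 0.95])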
That said, if there is large variance or multiple modes, and I had the resources (time/expertise), my most important goal would not be refining my estimate of the distribution, but figuring out what's going on (which, admittedly, can be a pain).

I have only skimmed through the profiler functions in Julia (still a newbie, learning the language), but I noticed that https://github.com/timholy/ProfileView.jl supports gathering some information about GC.

Best,

Tamas

On Thu, Jun 05 2014, John Myles White <[email protected]> wrote:

> Ignoring pipelining and modern CPU trickery, the CPU instruction count is a
> potentially low-variance lower bound on the speed of an operation. The lower
> bound property is all I meant by purity. In addition, because instruction
> counts are often lower variance than other metrics, they're potentially
> useful for comparing two proposed implementations/algorithms. Other metrics
> like mean wall clock time may have such high variance that you can't collect
> enough measurements to conclude that a change had any impact.
>
> Personally, I think you can never do better than an estimate of the full
> distribution of wall clock timings. But you almost never collect enough data
> to even come close to having a proper estimate of this distribution. In that
> regard, I totally agree that there is no perfect measurement. But you do
> need to make decisions. Lower bounds from CPU instructions are nice for
> doing that, although you want to make sure your wall clock timings aren't
> doing something nuts to create the change in CPU instructions.
>
> -- John
>
> On Jun 4, 2014, at 9:36 AM, Tamas Papp <[email protected]> wrote:
>
>> Sorry if I missed something in this discussion, but why are you assuming
>> that the CPU instructions per se are the bottleneck, instead of, say,
>> memory access, and especially how your algorithm interacts with the
>> various levels of CPU cache? Heavily optimized numerical algorithms make
>> a lot of effort to optimize the latter, since fetching something from
>> memory costs on the order of ~100 CPU instructions.
>>
>> I am not a computer scientist, but I am not sure that such a
>> theoretically pure measurement exists. If there is a lot of variation in
>> execution times _with the same input_, I guess it would be worthwhile
>> to understand the cause. GC would be a prime suspect in dynamic
>> languages, but I don't yet know Julia well enough to see whether this
>> generalizes to your case.
>>
>> Best,
>>
>> Tamas
>>
>> On Wed, Jun 04 2014, Joshua Job <[email protected]> wrote:
>>
>>> I want a theoretically pure measurement for an algorithm. I'm not sure
>>> how I would go about counting CPU instructions, however, since the
>>> algorithm is only a heuristic, and my only guide to how long it takes on
>>> a given set of instances is to run it and see how long it took. Is there
>>> some way of getting a count of CPU instructions due to the execution of
>>> a particular function in Julia?
>>>
>>> I want to estimate how long a user would have to run the algorithm in
>>> order to achieve 99% confidence that it will have succeeded at least
>>> once in finding the true solution to the problem(s). The obvious way to
>>> do that is to run the algorithm many times (say, 10^4) and take the 99th
>>> percentile of that distribution and call it T. If one ran the algorithm
>>> for time T, then we know empirically that 99 times out of 100 it will
>>> find a solution in a time less than or equal to T.
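A sketch of the empirical 99th-percentile procedure described above, in present-day Julia syntax ("solve" is a hypothetical stand-in for the heuristic, and the Statistics stdlib postdates this thread):

    using Statistics

    # Hypothetical heuristic solver; replace with the real algorithm,
    # run until it finds the true solution.
    solve() = sleep(0.001)

    # Time many independent runs (10^4, as suggested above).
    times = [@elapsed(solve()) for _ in 1:10_000]

    # T: empirically, 99 runs out of 100 finish within this time.
    T = quantile(times, 0.99)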
>>> I could use means or medians of course, but those are much less
>>> sensitive to the tails of the distribution, and I'm really interested in
>>> how long it takes to achieve confidence in the results, not just the
>>> average time to solution. Hence the desire to time the runs of the
>>> algorithm.
>>>
>>> In any case, I may have underestimated the runtime of the algorithm in
>>> question: it appears that for interesting problem sizes the C code for
>>> it takes on the order of a millisecond or longer, as opposed to tens of
>>> microseconds. I suspect then that the @time macro or tic/toc should be
>>> accurate enough for that purpose.
>>>
>>> On Tuesday, June 3, 2014 7:36:12 PM UTC-7, John Myles White wrote:
>>>>
>>>> I'm not sure there's any single correct way to do benchmarks without
>>>> information about what you're trying to optimize.
>>>>
>>>> If you're trying to optimize the experience of people using your code,
>>>> I think it's important to use means rather than medians, because you
>>>> want a metric that's affected by the entire shape of the distribution
>>>> of times and not entirely determined by the "center" of that
>>>> distribution.
>>>>
>>>> If you want a theoretically pure measurement for an algorithm, I think
>>>> measuring time is kind of problematic. For algorithms, I'd prefer
>>>> seeing a count of CPU instructions.
>>>>
>>>> -- John
>>>>
>>>> On Jun 2, 2014, at 7:32 PM, Kevin Squire <[email protected]> wrote:
>>>>
>>>> I think that, for many algorithms, triggering the gc() is simply a
>>>> matter of running a simulation for enough iterations. By calling gc()
>>>> ahead of time, you should be able to get the same number (n > 0) of gc
>>>> calls, which isn't ignoring gc().
>>>>
>>>> That said, it can take some effort to figure out the number of
>>>> iterations and the time needed to run the experiment.
>>>>
>>>> Cheers, Kevin
>>>>
>>>> On Monday, June 2, 2014, Stefan Karpinski <[email protected]> wrote:
>>>>
>>>>> I feel that ignoring gc can be a bit of a cheat, since it does happen
>>>>> and it's quite expensive, and other systems may be better or worse at
>>>>> it. Of course, it can still be good to separate the causes of slowness
>>>>> explicitly into execution time and overhead for things like gc.
>>>>>
>>>>> On Mon, Jun 2, 2014 at 5:21 PM, Kevin Squire <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks, John. His argument definitely makes sense (that algorithms
>>>>>> that cause more garbage collection won't get penalized by the median,
>>>>>> unless, of course, they cause gc() to occur more than 50% of the
>>>>>> time).
>>>>>>
>>>>>> Most benchmarks of Julia code that I've done (or seen) have made some
>>>>>> attempt to take gc() differences out of the equation, usually by
>>>>>> explicitly calling gc() before the timing begins. For most
>>>>>> algorithms, that would mean that the same number of gc() calls should
>>>>>> occur for each repetition, in which case I would think that any
>>>>>> measure of central tendency (including the mean and the median) would
>>>>>> be useful.
>>>>>>
>>>>>> Is there a problem with this reasoning?
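Kevin's gc()-before-timing idea, sketched in present-day Julia, where the gc() of this thread is now spelled GC.gc() ("f" here is a placeholder for the code under test):

    # Force a full collection before each repetition, so that every run
    # starts from a freshly collected heap and gc pauses inside the timed
    # region reflect only the garbage that run itself creates.
    function timed_runs(f, n)
        map(1:n) do _
            GC.gc()
            @elapsed f()
        end
    end

    times = timed_runs(() -> sum(rand(10_000)), 100)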
>>>>>> Cheers,
>>>>>> Kevin
>>>>>>
>>>>>> On Mon, Jun 2, 2014 at 1:04 PM, John Myles White
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> For some reasons why one might not want to use the median, see
>>>>>>> http://radfordneal.wordpress.com/2014/02/02/inaccurate-results-from-microbenchmark/
>>>>>>>
>>>>>>> -- John
>>>>>>>
>>>>>>> On Jun 2, 2014, at 11:06 AM, Kevin Squire <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> The median is probably also useful. I like it a little better in
>>>>>>> cases where the code being tested triggers gc() more than half the
>>>>>>> time.
>>>>>>>
>>>>>>> On Monday, June 2, 2014, Steven G. Johnson <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Monday, June 2, 2014 1:01:25 AM UTC-4, Jameson wrote:
>>>>>>>>>
>>>>>>>>> Therefore, for benchmarks, you should execute your code in a loop
>>>>>>>>> enough times that the measurement error (of the hardware and OS)
>>>>>>>>> is not too significant.
>>>>>>>>>
>>>>>>>> You can also often benchmark multiple times and take the minimum
>>>>>>>> (not the mean!) time, for reasonable results even with fairly small
>>>>>>>> time intervals.
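Steven's minimum-of-many-runs idiom, sketched in present-day Julia ("workload" is a made-up placeholder; BenchmarkTools.jl later adopted the minimum as a robust estimator of the true run time for much the same reason):

    # OS jitter, cache misses, and gc pauses only ever add time, so the
    # fastest of many repetitions is the least contaminated estimate.
    workload() = sum(rand(1_000))
    best = minimum(@elapsed(workload()) for _ in 1:1_000)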
