> On 1 Oct 2016, at 01:33, Quincey Morris
> wrote:
>
> On Sep 30, 2016, at 02:57 , Gerriet M. Denkmann wrote:
>
>> Any ideas where to look for a reason?
>
> The next step is probably to clarify the times between:
>
> a. Accumulated execution time — the amount of time your code actually spends
> executing in CPUs.
>
> b. Elapsed time in your process — the amount of time that’s accounted for by
> your process, whether executing or waiting.
>
> c. Elapsed time outside your process — the amount of time that’s accounted
> for by system code, also whether executing or waiting.
Time Profiler tells me that 99.9 % of the time is spent Running or Blocked, each
very roughly half of the total (or 2/3 to 1/3).
Running is almost 100 % in my function.
Blocked is split roughly equally between my function, mach_msg_trap (from the
run loop) and workq_kernreturn (from start_wq_thread).
There are some minor variations between 8 and 20,000 iterations, but nothing
that explains a factor-of-8 difference.
My function reports its running time:
start = [NSDate date];
…
dispatch_apply(…);
time = -start.timeIntervalSinceNow;
which shows the same factor of 8 between 8 and 20,000 iterations.
>
> You can also play around with change isolation. Instead of changing two
> contextual conditions (the number of dispatch_apply calls, the number of
> iterations in a single block’s loop), change only one of them and observe the
> effect in Instruments.
Well, for a fair comparison I want the following in all cases (independent of
the number of iterations):
The iterations have disjoint working ranges, and the union of the working
ranges of all iterations covers the whole bigArray.
The same number of operations (at the same indices) should be performed on
bigArray.
The operations within each working range should be done randomly (well: at
least not sequentially).
One thing is quite clear: each iteration of my function accesses its working
range more or less randomly.
With sequential access, a very different behaviour emerges.
> You can also try out some other instruments speculatively. For example, is
> there a different pattern in the Allocations instrument, indicating that one
> form of your code is doing vast numbers of memory allocations for some
> (unknown) reason? Or is I/O being done, unexpectedly?
There are no allocations (except one huge 400 MB malloc at the start).
There is no I/O.
Then I tried the System Trace instrument and learned:
2k iterations (180 msec):
zero-filling is being done the whole time (very rarely a page fault).
8 iterations (1500 msec):
for the first 100 msec there is zero-filling, then the 8 threads just keep
slogging along.
There are far fewer context switches (almost none after the zero-filling has
ceased).
But still, I cannot see any reason why this should take so much longer.
My hypothesis is: with a large number of iterations (each having a working
range ≤ 500 KB), any 8 iterations running concurrently together use ≤ 4 MB,
which might just fit into some cache.
With 8 iterations (each using a working range of 50 MB) there is probably a
lot of cache reloading going on.
But I have not been able to find any evidence for this hypothesis in
Instruments.
Kind regards,
Gerriet.
___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to arch...@mail-archive.com