On 9 Feb 2014, at 15:53, Greg Parker <gpar...@apple.com> wrote:

> On Feb 9, 2014, at 12:19 AM, Gerriet M. Denkmann <gerr...@mdenkmann.de> wrote:
>> The real app (which I am trying to optimise) has actually two loops: one is 
>> counting, the other one is modifying. Which seems to be good news.
>> 
>> But I would really like to understand what I should do. Trial and error (or 
>> blindly groping in the mist) is not really my preferred way of working.
> 
> Optimizing small loops like this is a black art. Very small effects become 
> critically important, such as the alignment of your loop instructions or the 
> associativity of that CPU's L1 cache. 

[...]

> Cache associativity can mean that there are some array split sizes that are 
> much worse than others. If you choose the wrong size then each thread's 
> working memory is on different cache lines, but those cache lines collide 
> with each other in memory caches. Changing the work size to avoid collisions 
> can help.

sysctl hw.cachelinesize returns: hw.cachelinesize: 64

I divided my huge array (malloc'ed; its address is a multiple of 0x1000) into at 
most [[NSProcessInfo processInfo] processorCount] chunks, with each chunk 
starting at a multiple of 2^n (using fewer chunks when this rule requires it).
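
For anyone who wants to reproduce this, here is a minimal sketch of the chunk 
arithmetic described above. It is not the code from my app: the helper names 
(aligned_chunk_size, chunk_count, chunk_bounds) are made up for illustration, 
and I am assuming that offsets are counted from the start of the array in 
whatever unit the array is indexed by. The chunk index i would be the iteration 
index that dispatch_apply hands to its block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: chunk size such that every chunk starts at a
 * multiple of 2^n, using at most max_chunks chunks for total items. */
static size_t aligned_chunk_size(size_t total, size_t max_chunks, unsigned n)
{
    assert(max_chunks > 0);
    size_t align = (size_t)1 << n;
    /* Even split, rounded up so that max_chunks chunks cover everything. */
    size_t raw = (total + max_chunks - 1) / max_chunks;
    /* Round up to the next multiple of 2^n; this may reduce the chunk count. */
    return (raw + align - 1) / align * align;
}

/* Number of chunks actually needed for the given chunk size. */
static size_t chunk_count(size_t total, size_t chunk_size)
{
    if (chunk_size == 0) {
        return 0;               /* empty array: nothing to do */
    }
    return (total + chunk_size - 1) / chunk_size;
}

/* Half-open range [*start, *end) covered by chunk i; the last chunk is
 * clipped to the end of the array. */
static void chunk_bounds(size_t i, size_t chunk_size, size_t total,
                         size_t *start, size_t *end)
{
    *start = i * chunk_size;
    *end = (*start + chunk_size < total) ? *start + chunk_size : total;
}
```

As an example of the "fewer chunks" rule: for 1000 items and 8 processors, n = 6 
gives a chunk size of 128 and 8 chunks, while n = 8 gives a chunk size of 256 
and only 4 chunks.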

The result of using dispatch_apply:

n       time
0       10
1       5.5
2       4
3       3
4       2
5       1.7
6       1.6
7       1.5
16      1.4

That is, your statement "that there are some array split sizes that are much 
worse than others" is strongly backed up by my tests.

Kind regards,

Gerriet.

