I have a big array (a few GB) which is operated on by some functions.
As these functions act purely locally, an obvious idea is:

- (void)someFunction
{
        nbrOfThreads = ...
        sizeOfBigArray = ...  a few GB
        stride = sizeOfBigArray / nbrOfThreads
        
        dispatch_apply( nbrOfThreads, queue, ^void(size_t idx)
                {
                        start = idx * stride
                        end = start + stride
                        
                        index = start   
                        while ( index < end )
                        {
                                mask = ...
                                bigArray[index] |= mask
                                index += … something positive…
                        }
                }
        )
}
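
To make the partitioning concrete, here is a minimal C sketch of the per-chunk range computation. The names `ChunkRange` and `chunk_range` are hypothetical and not part of my real code. One assumption differs from the pseudocode: with stride = sizeOfBigArray / nbrOfThreads the integer division drops up to nbrOfThreads - 1 trailing elements when the size is not an exact multiple, so this sketch lets the last chunk absorb the remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: half-open index range [start, end). */
typedef struct {
    size_t start;
    size_t end;
} ChunkRange;

/* Returns the range handled by chunk `idx` out of `chunkCount` chunks
 * covering `total` elements. Assumption (not in the pseudocode above):
 * the last chunk also takes the elements left over by the division. */
static ChunkRange chunk_range(size_t total, size_t chunkCount, size_t idx)
{
    assert(chunkCount > 0 && idx < chunkCount);

    size_t stride = total / chunkCount;   /* same stride as in the pseudocode */
    ChunkRange r;
    r.start = idx * stride;
    r.end = (idx + 1 == chunkCount) ? total : r.start + stride;
    return r;
}
```

Inside the dispatch_apply block, start and end would then come from chunk_range(sizeOfBigArray, nbrOfThreads, idx).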

As my computer has just 8 CPUs, I thought that using nbrOfThreads > 8 would be 
silly: adding overhead without gaining anything.

Turns out this is quite wrong. One function (called threadHappyFunction) works 
more than 10 times faster using nbrOfThreads = a few tens of thousands (compared 
to nbrOfThreads = 8).

This is nice, but I would like to know why this happens.

Another function does not like threads at all:

- (void)threadShyFunction
{
        nbrOfThreads = ...
        uint64_t *bigArrayAsLongs = (uint64_t *)bigArray
        sizeOfBigArrayInLongs = ... 
        stride = sizeOfBigArrayInLongs / nbrOfThreads
        
        uint64_t *template = ...
        sizeOfTemplate = not more than a few dozen longs
        
        dispatch_apply( nbrOfThreads, queue, ^void(size_t idx)
                {
                        start = idx * stride
                        end = start + stride
                        
                        offset = start
                        
                        while ( offset + sizeOfTemplate < end )
                        {
                                for ( i = 0 ..< sizeOfTemplate ) 
                                        bigArrayAsLongs[offset + i] |= template[i]
                                offset += sizeOfTemplate
                        }
                }
        )
}
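
For comparison, the body of that block can be written as a plain C function and run on a single thread, which gives a baseline to time the dispatch_apply version against. This is only a sketch: `or_template_range` and its parameter names are hypothetical, and the loop condition is copied unchanged from the pseudocode, so the final template-sized slot of each range is left untouched even when it would fit exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ORs `tmpl` (tmplLen words) repeatedly into
 * words[start, end), using the same loop as the block body above.
 * Returns how many copies of the template were applied. */
static size_t or_template_range(uint64_t *words, size_t start, size_t end,
                                const uint64_t *tmpl, size_t tmplLen)
{
    assert(tmplLen > 0);

    size_t offset = start;
    size_t copies = 0;

    /* Same condition as the pseudocode: strictly less than `end`. */
    while (offset + tmplLen < end) {
        for (size_t i = 0; i < tmplLen; i++) {
            words[offset + i] |= tmpl[i];
        }
        offset += tmplLen;
        copies++;
    }
    return copies;
}
```

Calling it once with start = 0 and end = sizeOfBigArrayInLongs corresponds to nbrOfThreads = 1.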

This works, but for nbrOfThreads > 1 it gets slower instead of faster: up to 
a hundred times slower for moderately big nbrOfThreads.
This really bothers me. 
Why are the threads seemingly blocking each other?
What is going on here?
How can I investigate this?

Gerriet.

P.S. macOS 10.12, Xcode 8, ObjC or Swift.

