Re: Threading - How its done?

Army Research Lab Wed, 07 May 2008 06:13:42 -0700

Karl von Moller wrote on Wed, 7 May 2008 09:51:56 +1000:

> Thanks to all that posted.
> 
>> To the original poster:
>>
>> How much experience do you have with threads?  I'm a little confused
>> reading through your posts, I can't tell if you are familiar with
>> pthreads, and just need to figure out NSThreads, or if you have no
>> threading experience at all.
> 
> As I said earlier on, I don't have ANY experience with threading at
> all - this was my first attempt to try and understand how to implement
> a threading scheme. The Apple documentation was my first port of call
> and at least the basics of it, I got from those articles. I found a
> great deal of reading material online however I think the noise of
> hearing the same comments "Threading is hard ..." etc made this all
> the more difficult. This mailing list has in one day at least, given
> me some clear guidance on my next approach which I really appreciate.
> At some point though, to take things out of the theoretical, you have
> to put things into practice. Granted my approach may have been very
> hit and miss but it certainly accelerated my learning!


In that case, I suggest getting used to pthreads with some simple C
programs first.  This will reduce the number of things that can go wrong
to a fairly small minimum (you don't have to wonder if it's you or
Cocoa).  POSIX requires most of the C standard library functions to be
thread-safe, so things like printf() will work out of the box.  A fairly
good book to learn from is Programming with POSIX Threads by David R.
Butenhof.
(http://www.amazon.com/Programming-Threads-Addison-Wesley-Professional-Computing/dp/0201633922/ref=pd_bbs_1/104-4945869-2700757?ie=UTF8&s=books&qid=1210159650&sr=8-1)
Once you understand the basics of what threads are, then you can start
working on NSThreads (which, IIRC, wrap pthreads).

As for why threads are 'hard'; in reality, they aren't hard, as long as
you are absolutely fastidious in following the rules:

1) Don't access shared variables unless you've got possession of the
lock.  If you must own several different locks at the same time, enforce
an ordering on how you grab the locks; that is, if each thread must own
locks A, B, and C in order to get any work done, then the threads must
lock the locks in that order each time.  I'm sure you've read about
deadlock by now; the fastest way to cause it is to have one thread grab
locks A, B and then another one to grab them B, A.  Somewhere along the
line, one thread will have A and be waiting for B, while the other has B
and is waiting for A.  Enforcing an order prevents this.

2) Check the condition you are waiting for in a while loop around
pthread_cond_wait(), to make sure you weren't woken up by accident.

3) I don't care how 'small' or 'atomic' the shared variable is, use a
lock.

4) Signals are bad in multi-threaded environments; accept that the
system may pass signals to you, but don't use signals yourself.  BTW,
the pthreads API is a little confusing here; there is a function in
there called 'pthread_cond_signal()'.  That one is safe to use, although
'pthread_cond_broadcast()' is better for beginners.  The one that can be
dangerous is 'signal()' (see 'man 3 signal').

5) WHAT PART OF 'USE A LOCK' DIDN'T YOU GET???

Sorry, that last one is for all the code I've had to debug over the
years...
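
Rule 2 can be sketched as follows.  This is a minimal illustration, not
anything from the original thread; the names (readyLock, readyCond,
dataReady, sharedValue, runCondDemo) are hypothetical.  The waiter
rechecks its flag in a while loop every time pthread_cond_wait()
returns, and the flag itself is only touched while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static pthread_mutex_t readyLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t readyCond = PTHREAD_COND_INITIALIZER;
static bool dataReady = false;  /* the condition; protected by readyLock */
static int sharedValue = 0;     /* also protected by readyLock */

/* Publishes a value and wakes up everyone waiting for it. */
static void *producer(void *arg)
{
    (void)arg;
    int rc = pthread_mutex_lock(&readyLock);
    assert(rc == 0);
    sharedValue = 42;
    dataReady = true;
    rc = pthread_cond_broadcast(&readyCond);
    assert(rc == 0);
    rc = pthread_mutex_unlock(&readyLock);
    assert(rc == 0);
    return NULL;
}

/* Blocks until the producer has published a value, then returns it. */
static int waitForValue(void)
{
    int rc = pthread_mutex_lock(&readyLock);
    assert(rc == 0);
    while (!dataReady)  /* 'while', not 'if': wakeups can be spurious */
    {
        /* Atomically releases readyLock while asleep, and re-acquires
         * it before returning. */
        rc = pthread_cond_wait(&readyCond, &readyLock);
        assert(rc == 0);
    }
    int value = sharedValue;
    rc = pthread_mutex_unlock(&readyLock);
    assert(rc == 0);
    return value;
}

/* Starts the producer, waits for its value, and returns that value. */
int runCondDemo(void)
{
    pthread_t id;
    int rc = pthread_create(&id, NULL, producer, NULL);
    assert(rc == 0);
    int value = waitForValue();
    rc = pthread_join(id, NULL);
    assert(rc == 0);
    return value;
}
```

It doesn't matter whether the producer or the waiter gets the lock
first; if the producer wins, the waiter sees dataReady already set and
never sleeps at all.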

Here are some of the reasons that threads are 'hard':

1) Threads introduce non-deterministic behavior into your program; just
because it didn't crash this go around, doesn't mean it won't crash the
next time, or on someone else's machine.  This non-determinism is VERY
hard to debug, because the mere act of loading the code up in a debugger
will cause the program's behavior to change, possibly masking what went
wrong.

2) Compilers are smart; they'll keep stuff in registers as long as
possible, including things that are shared between threads.  Anything
that is shared, but in a register, won't be seen between threads, so code
that should be sharing stuff won't be. That is one reason why you MUST
use a lock around all shared variable access, and around any 'sensitive'
code (code that temporarily breaks invariants).  Part of the code of the
lock includes something like OSMemoryBarrier() in
/usr/include/libkern/OSAtomic.h.  A memory barrier forces all of the
reads and writes issued before it to complete, and become visible to the
other processors, before any that come after it are started; that
ensures that shared variables are updated properly.

3) 'Small' variables like chars or ints look like they should be written
atomically to main memory, but this isn't always the case; on some
systems, the only way to write out a single character is to read in 32
or 64 bits, mask off the portions of the variable you don't want to
change, write the character into the variable, and then write the whole
32 or 64 bit chunk back into memory.  That means that without a lock,
even a char can cause problems.

4) Threads all share the same address space, but have different stacks.
The stack is the important problem here; it is possible to have a stack
that is so deep that one thread's stack overwrites another thread's
stack (look up 'stack smashing', which is similar).  This might not seem
likely until you've run into someone who puts large arrays onto the
stack, or finds highly recursive functions to be a pleasure to use.
Valgrind (http://valgrind.org/) is particularly nice for finding this,
but it is Linux only for right now.  There are other tricks to use to
solve this as well, but they are basically hacks (e.g. guard pages and
other canaries).
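
To make reasons 2 and 3 concrete, here is a minimal sketch of several
threads bumping one shared counter.  The names (counterLock, counter,
runCounterDemo) and the thread and loop counts are made up for
illustration.  Because every increment happens with the lock held, the
final count always comes out to the number of threads times the number
of increments.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Hypothetical names, for illustration only. */
static pthread_mutex_t counterLock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;  /* shared; only touched with counterLock held */

static void *incrementer(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
    {
        int rc = pthread_mutex_lock(&counterLock);
        assert(rc == 0);
        counter++;  /* a read-modify-write; not atomic on its own */
        rc = pthread_mutex_unlock(&counterLock);
        assert(rc == 0);
    }
    return NULL;
}

/* Runs all of the workers to completion and returns the final count. */
long runCounterDemo(void)
{
    pthread_t ids[NUM_THREADS];

    counter = 0;  /* no other threads exist yet, so no lock is needed */
    for (int i = 0; i < NUM_THREADS; i++)
    {
        int rc = pthread_create(&ids[i], NULL, incrementer, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
    {
        int rc = pthread_join(ids[i], NULL);
        assert(rc == 0);
    }
    return counter;  /* safe to read: every worker has been joined */
}
```

If you take the lock and unlock calls out of incrementer(), the result
will often come up short, but not on every run and not on every
machine, which is exactly the non-determinism described in reason 1.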

In general, I'd say threading isn't any more or less difficult than any
other programming; it's just that if you're used to hacking something
out and then using the debugger to fix the problems, then it will be hard.
If you plan and design carefully, choosing what will be shared and what
will be private, then it won't be too bad.
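
As one example of that kind of up-front design, the lock ordering from
rule 1 can be built into a helper so that no caller can get it wrong.
The sketch below is only an illustration; the account structure and the
function names are hypothetical, and ordering by address is just one
possible way to pick a global order.  Two threads move money in opposite
directions, which is the classic A-then-B versus B-then-A setup, yet
both end up locking in the same order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type and names, for illustration only. */
struct account
{
    pthread_mutex_t lock;
    long balance;  /* protected by lock */
};

static struct account accountA = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct account accountB = { PTHREAD_MUTEX_INITIALIZER, 1000 };

/* Locks both accounts in one global order (lowest address first), no
 * matter which order the caller names them in.  x and y must differ. */
static void lockBoth(struct account *x, struct account *y)
{
    struct account *first = ((uintptr_t)x < (uintptr_t)y) ? x : y;
    struct account *second = (first == x) ? y : x;
    int rc = pthread_mutex_lock(&first->lock);
    assert(rc == 0);
    rc = pthread_mutex_lock(&second->lock);
    assert(rc == 0);
}

static void unlockBoth(struct account *x, struct account *y)
{
    int rc = pthread_mutex_unlock(&x->lock);
    assert(rc == 0);
    rc = pthread_mutex_unlock(&y->lock);
    assert(rc == 0);
}

static void transfer(struct account *from, struct account *to, long amount)
{
    lockBoth(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlockBoth(from, to);
}

static void *moveAToB(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        transfer(&accountA, &accountB, 1);
    return NULL;
}

static void *moveBToA(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        transfer(&accountB, &accountA, 1);
    return NULL;
}

/* Runs both movers to completion and returns the combined balance. */
long runTransferDemo(void)
{
    pthread_t t1, t2;
    int rc = pthread_create(&t1, NULL, moveAToB, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, moveBToA, NULL);
    assert(rc == 0);
    rc = pthread_join(t1, NULL);
    assert(rc == 0);
    rc = pthread_join(t2, NULL);
    assert(rc == 0);
    return accountA.balance + accountB.balance;
}
```

If transfer() simply locked 'from' and then 'to', the two threads could
each end up holding one lock while waiting forever for the other.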

>> to have a pool of worker threads that render the thumbnails/PDFviews
>> in the background.  It would simply be a matter of having a
>> thread-safe priority queue (so if the user clicks on a thumbnail, it
>> gets jumped to the head of the queue), and let the threads grab
>> whatever happens to be the highest priority to work on, returning the
>> results to the main thread to display later.  Alternatively, each
>> time you click on a thumbnail, it could spawn a thread, with the
>> thread returning the results to the main thread when it is done.
> 
> This idea is exactly what I was trying to achieve - The idea of the
> thumbnail creation and the PDF loading happening in a separate thread.
> However to have a thread safe priority queue to organize the "Worker
> Threads" sounds like the missing link - And with that some safe way to
> cancel Threads that have been outdated by the user before that Thread
> had time to set the PDF View! I am assuming in a priority queue this
> function could exist?

A priority queue is simply a way of organizing chunks of data, much like
a regular queue, a tree, or another data structure.  The thread safety
comes from protecting all accesses via locking; e.g., the queue's
appendData() function will internally lock a lock, do all changes that
are necessary, and then release the lock before returning from the call.
One of the simplest ways to do this is to write an ordinary queue (or
use something like std::priority_queue<>) and then write a wrapper using
a decorator pattern that handles the locking.  This lets you use return
statements anywhere you want in the ordinary queue, while the
thread-safe wrapper catches the return value, and does the necessary
unlocking.  E.g. (all code written in my mail client, may have bugs):

#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct nonLockQueueContext
{
    uint8_t *data;
    size_t dataSize;
};

struct lockingQueueContext
{
    struct nonLockQueueContext nonLockingContext;
    pthread_mutex_t lock;
};

int nonLockingInsert(   struct nonLockQueueContext *context,
                        void *data, size_t dataSize)
{
    if (context == NULL)
        return -1;
    
    if (data == NULL)
        return -2;
        
    if (dataSize == 0)
        return -3;
    
    // Any other checks and return values you can think of
    
    // Put the data in the queue
    
    return 0;
}

int lockingInsert(  struct lockingQueueContext *context,
                    void *data, size_t dataSize)
{
    int result = 0;

    if (context == NULL)
        return -1;
        
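    // Call the lock functions outside of assert(); if NDEBUG is defined,
    // assert() and everything inside it is compiled out entirely.
    int rc = pthread_mutex_lock(&(context->lock));
    assert(rc == 0);
        result = nonLockingInsert(  &(context->nonLockingContext),
                                    data, dataSize);
    rc = pthread_mutex_unlock(&(context->lock));
    assert(rc == 0);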
    
    return result;
}
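
For the thumbnail case, the same wrapper idea might be filled in along
the following lines.  This is only a sketch under assumptions of my own:
a small fixed-capacity queue of integer job IDs, where a larger priority
number means more urgent, and made-up names throughout (pqItem, plainPQ,
lockingPQ, and so on).  The plain functions return early wherever they
like; only the locking wrappers touch the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PQ_CAPACITY 16

/* Hypothetical types and names, for illustration only. */
struct pqItem { int priority; int jobID; };
struct plainPQ { struct pqItem items[PQ_CAPACITY]; size_t count; };
struct lockingPQ { struct plainPQ queue; pthread_mutex_t lock; };

/* Ordinary, non-thread-safe push. */
static int plainPush(struct plainPQ *q, int priority, int jobID)
{
    if (q == NULL)
        return -1;
    if (q->count == PQ_CAPACITY)
        return -2;  /* full */
    q->items[q->count].priority = priority;
    q->items[q->count].jobID = jobID;
    q->count++;
    return 0;
}

/* Ordinary, non-thread-safe pop of the highest-priority job. */
static int plainPop(struct plainPQ *q, int *jobID)
{
    if (q == NULL || jobID == NULL)
        return -1;
    if (q->count == 0)
        return -2;  /* empty */
    size_t best = 0;
    for (size_t i = 1; i < q->count; i++)
        if (q->items[i].priority > q->items[best].priority)
            best = i;
    *jobID = q->items[best].jobID;
    q->items[best] = q->items[q->count - 1];  /* fill the hole */
    q->count--;
    return 0;
}

int lockingPQInit(struct lockingPQ *q)
{
    if (q == NULL)
        return -1;
    q->queue.count = 0;
    return pthread_mutex_init(&q->lock, NULL);
}

int lockingPush(struct lockingPQ *q, int priority, int jobID)
{
    if (q == NULL)
        return -1;
    int rc = pthread_mutex_lock(&q->lock);
    assert(rc == 0);
    int result = plainPush(&q->queue, priority, jobID);
    rc = pthread_mutex_unlock(&q->lock);
    assert(rc == 0);
    return result;
}

int lockingPop(struct lockingPQ *q, int *jobID)
{
    if (q == NULL)
        return -1;
    int rc = pthread_mutex_lock(&q->lock);
    assert(rc == 0);
    int result = plainPop(&q->queue, jobID);
    rc = pthread_mutex_unlock(&q->lock);
    assert(rc == 0);
    return result;
}
```

The main thread would call lockingPush() with a high priority when the
user clicks on a thumbnail, and each worker thread would call
lockingPop() to pick up the most urgent job.  A worker that finds the
queue empty gets -2 back here; a real version would probably sleep on a
condition variable instead of polling.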

Canceling threads is another thing entirely; pthreads DOES have the
pthread_cancel() function, but if threading is a way of shooting
yourself in the foot, pthread_cancel() involves an unmarked minefield;
avoid it if you can.  (For all you pthreads experts out there; do YOU
know what ALL the cancellation points defined by the API are?  Better
yet, since NSThreads are based on pthreads, do you know when Cocoa will
drop out due to a cancellation point???)

A relatively safe way of canceling a thread is to create a cancel
variable and a lock to go with it (and I actually tested this code!):

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t myLock = PTHREAD_MUTEX_INITIALIZER;
bool dieNow = false;

void *threadFunc(void *arg)
{
    printf("About to start loop.\n");

    while (true)
    {
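        // Lock outside of assert(), so that the calls still happen when
        // NDEBUG is defined and assert() is compiled out.
        int rc = pthread_mutex_lock(&myLock);
        assert(rc == 0);
            if (dieNow)
            {
                rc = pthread_mutex_unlock(&myLock);
                assert(rc == 0);
                break;
            }
        rc = pthread_mutex_unlock(&myLock);
        assert(rc == 0);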
        
        // Whatever processing you need to do, which may require another
        // lock.  Be sure it is short!  Otherwise, it will take a while
        // for the thread to notice that it has been asked to quit.
    }
    
    printf("Got message to quit.\n");
    
    return NULL;
}

int main(void)
{
    pthread_t ID;
    pthread_create(&ID, NULL, threadFunc, NULL);
    
    sleep(1);
    
    // As above, keep the calls themselves outside of assert().
    int rc = pthread_mutex_lock(&myLock);
    assert(rc == 0);
        dieNow = true;
    rc = pthread_mutex_unlock(&myLock);
    assert(rc == 0);
    
    rc = pthread_join(ID, NULL);
    assert(rc == 0);
    
    return 0;
}

I keep saying I'll write a book on all of this, but I never get around to
it...

Good luck,
Cem Karan

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
