On Fri, Mar 15, 2002, Malcolm Kavalsky wrote about "Re: pthreads question":
> Linux has a very efficient process model and if you are writing any 
> reasonable
> size program and don't want to get into trouble, then I suggest you 
> split it into
> multiple processes and use IPC.  

Threads, especially on Linux (the common "LinuxThreads" implementation),
are essentially the same thing as processes, with two things shared between
all of them:

 1. All the memory is shared (there is a separate stack for each thread,
    but threads can still reach another thread's stack via pointers).

 2. All file descriptors are shared.

(see clone(2) for more information).
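The shared address space is easy to see in a few lines. This is a minimal
sketch using the portable POSIX threads API rather than raw clone(2); the
function names here are my own invention for illustration:

```cpp
#include <pthread.h>

int shared_counter = 0;   // one copy of this variable, visible to every thread

static void* bump(void*) {
    shared_counter = 42;  // the child thread writes through the shared address space
    return nullptr;
}

// Spawn a thread, wait for it, and observe its write from the parent thread.
int shared_after_thread() {
    pthread_t t;
    pthread_create(&t, nullptr, bump, nullptr);
    pthread_join(t, nullptr);
    return shared_counter;  // 42: no pipe, no shmget, the write is just there
}
```

Had bump() run in a fork()ed child instead, the parent's copy of
shared_counter would have stayed 0; that one difference is most of what
separates threads from processes.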

So if, in writing (or designing) a multi-process program, you find yourself
using so much shared memory that you start wishing malloc() would just give
you shared memory, if you find yourself sending file descriptors to other
processes (over Unix-domain sockets, which is the way to do that...), if you
find yourself using a lot of semaphores (argh, those System V semaphores are
annoying...) to do mutual exclusion or wakeups on these shared memory areas -
well... threads might be a more appropriate framework for your program.
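For comparison, here is what the mutual-exclusion part looks like with
threads: a pthread mutex instead of semget()/semop(). A minimal sketch
(the worker/counter names are mine):

```cpp
#include <pthread.h>

long counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

// Each worker increments the shared counter 100000 times under the mutex.
static void* worker(void*) {
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&counter_lock);    // mutual exclusion, no SysV semaphores
        ++counter;
        pthread_mutex_unlock(&counter_lock);
    }
    return nullptr;
}

// Run two workers concurrently; with the lock, no increment is ever lost.
long run_two_workers() {
    counter = 0;
    pthread_t a, b;
    pthread_create(&a, nullptr, worker, nullptr);
    pthread_create(&b, nullptr, worker, nullptr);
    pthread_join(a, nullptr);
    pthread_join(b, nullptr);
    return counter;   // 200000
}
```

Remove the lock/unlock pair and the two threads race on ++counter, which is
exactly the class of bug the rest of this thread is arguing about.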

Moreover, threads can yield real performance benefits on SMP machines.
Again, only if you know what you're doing (threads aren't one of those "let's
just write a few random lines of code and see if it works" paradigms).

> In my 20 years of programming experience, there are very few cases which I
> have come across that warrant the use of threads.

I agree. But I *did* use them, and found that when you know what you're
doing, you can actually get beautiful results.

> The initial appeal of using threads (easy sharing of global data structures,
> concurrent programming, low overhead task switching) is quickly dispelled
> the minute you start wondering why your program is crashing/dead-locking.

It is indeed harder to debug a multithreaded program than a single-process
program, but not necessarily harder than a multi-process program whose
processes actually communicate a lot and have a lot of shared memory and
semaphores set up.

Comparing a multithreaded program to an "embarrassingly parallel" program (a
multi-process program whose processes just work alone and never communicate)
is irrelevant. You wouldn't use threads to implement embarrassingly parallel
programs; this is also why processes (rather than threads) can make a lot
of sense when implementing web servers and the like (of course, both processes
and threads have a severe limitation when implementing a web server, but
that's an issue for another post).

> After adding in lots of mutexes to protect all your data structures, 
> your program
> slows to a crawl, and tracing it reveals that 99% of the time is spent 
> in lock/unlock
> calls.

This would never happen if you actually understand what you're doing: not
just the syntax of the API, but also the reasons why things should be done
the way they are. Students might want to check out courses like "Distributed
and Parallel Programming" at the Technion, or you can read a good book
to get a feel for the theory.

I have written a relatively large threaded program (there were about 8
threads doing different things), and my experience is that I always understood
why and where mutexes and condition variables were needed, and I never had
to go back and add them in places I had forgotten. The locking overhead was
minimal because I designed the program that way (you lock mutexes only in
really critical sections and try to work on local variables most of the time).
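That design rule, accumulate into a local variable and take the lock only
to publish the result, can be sketched like this (the names are mine, and
the thread count is capped just to keep the example short):

```cpp
#include <pthread.h>

long total = 0;
pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

// All the real work happens on a local variable, with no lock held;
// the mutex protects only the one-line publish step at the end.
static void* sum_range(void*) {
    long local = 0;
    for (int i = 0; i < 1000; ++i)   // lock-free work: 0 + 1 + ... + 999
        local += i;
    pthread_mutex_lock(&total_lock);
    total += local;                  // brief critical section
    pthread_mutex_unlock(&total_lock);
    return nullptr;
}

// Run nthreads workers (at most 16 here) and return the combined total.
long parallel_sum(int nthreads) {
    total = 0;
    pthread_t t[16];
    if (nthreads > 16) nthreads = 16;
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&t[i], nullptr, sum_range, nullptr);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(t[i], nullptr);
    return total;   // nthreads * 499500
}
```

Each thread takes the lock once instead of a thousand times, which is why
a well-designed program doesn't spend "99% of the time" in lock/unlock calls.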

You're right though, that a bad programmer might make a real mess of things
by using threads (where all memory is shared) and will have a really hard
time debugging...

> Note also that C++  has certain effects that make use of threads 
> dangerous

What kind of effects? The threaded program I mentioned was in C++, and
not only did I not see any ill effects, I was actually very happy I chose
C++ for it: one of the dangers of threads is that you have "too easy"
access to variables your thread was not supposed to access (or access
without holding a lock), and C++ makes it very easy to force you to,
say, access some variable only through a method which also grabs a lock.
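For instance, something along these lines (a sketch in today's C++, not
code from the program I mentioned): the data member is private, so the only
way any thread can touch it is through methods that take the lock first.

```cpp
#include <pthread.h>

// A counter whose value can only be reached through lock-taking methods;
// no code path can "forget" the mutex, because the raw variable is private.
class LockedCounter {
public:
    LockedCounter() { pthread_mutex_init(&lock_, nullptr); }
    ~LockedCounter() { pthread_mutex_destroy(&lock_); }

    void add(long n) {
        pthread_mutex_lock(&lock_);
        value_ += n;
        pthread_mutex_unlock(&lock_);
    }

    long get() {
        pthread_mutex_lock(&lock_);
        long v = value_;
        pthread_mutex_unlock(&lock_);
        return v;
    }

private:
    long value_ = 0;          // unreachable except through the methods above
    pthread_mutex_t lock_;
};
```

In C you would have to rely on every caller remembering the locking
convention; here the compiler enforces it.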

> and
> any library calls that you use automatically, need to be checked that 
> they are MT-safe.

Luckily this is not a problem any more for glibc, except in a small number
of functions (say, inet_ntoa) whose manual pages say the return value is a
statically allocated buffer (so-called "non-reentrant" routines).
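For those few routines there is usually a reentrant replacement. inet_ntoa
writes into one static buffer, so two threads calling it can clobber each
other; inet_ntop writes into a buffer the caller supplies. A small sketch
(the wrapper name is mine):

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>

// Thread-safe replacement for inet_ntoa: the result lands in a local
// buffer on this thread's own stack, not in a shared static one.
std::string safe_ntoa(in_addr addr) {
    char buf[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &addr, buf, sizeof buf);
    return buf;
}
```

The same pattern covers the other classic offenders: strtok has strtok_r,
gethostbyname has gethostbyname_r, and so on.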

Until a few years ago, this was a serious problem in most Unix versions
and Linux, which is why threaded programming was almost unheard of in
the Unix world.

> and the OS protects each task from the other. You need to work a little 
> harder in the
> beginning to setup the IPC, but once that is done, you are home free, 

Again, if you use a lot of shared memory and so on, this begins to become
annoying. With shared memory you need to allocate fixed-size shared-memory
areas and "allocate" space within them yourself, or use some sort of special
allocation library like "mm". It's doable, but it isn't as easy as just
calling malloc() as in threads.
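To make the contrast concrete, here is the fixed-size-area version of the
trivial sharing that threads get for free. A sketch using an anonymous
shared mapping (the function name is mine; on Linux this uses
mmap's MAP_SHARED | MAP_ANONYMOUS):

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Map one fixed-size shared page before fork(); the child's write to it
// is visible in the parent, unlike ordinary (copy-on-write) memory.
int shared_int_after_fork() {
    int* p = static_cast<int*>(mmap(nullptr, sizeof(int),
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0));
    *p = 0;
    pid_t pid = fork();
    if (pid == 0) {          // child: write into the shared page and exit
        *p = 99;
        _exit(0);
    }
    waitpid(pid, nullptr, 0);
    int v = *p;              // parent sees 99, not its own copy-on-write 0
    munmap(p, sizeof(int));
    return v;
}
```

Note that the size is fixed at mmap() time; to share anything of dynamic
size you are back to carving up this area by hand, which is exactly the
annoyance described above.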

Also, what if a library function returns a malloc()ed area, and you need
it to be allocated in shared memory instead? See, badly designed libraries
can hurt you even when you are not using threads.

> Most windows programmers that I have met, are used to working with 
> threads, and it
> is hard to change their habits to use processes.

Supposedly Windows' process implementation sucks (or sucked?) bigtime,
being very inefficient, which is why Windows programmers became used to
programming only with threads. Compare this to Unix's (or Linux's) thread
implementation sucking bigtime until a few years ago which is why Unix
programmers became used to programming only with processes.

Both "fanaticisms" are silly. You should know about both methods and use
the one that best fits your needs. I don't know if you'll end up with
a 10%-90% multiprocess/multithread ratio, 50%-50%, or 90%-10% - what I do
know is that most programs should be neither threaded nor multi-process at
all...

> did then the context switches were too high. Linux, on the other hand, 
> has an excellent
> low-overhead process model which removes 99% of the reason to use threads.

Right, but I don't agree about the 99% figure. As I said, not everybody
wants to use threads just for lower overhead in context switching.


-- 
Nadav Har'El                        |        Friday, Mar 15 2002, 2 Nisan 5762
[EMAIL PROTECTED]             |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |The knowledge that you are an idiot, is
http://nadav.harel.org.il           |what distinguishes you from one.
